cs.AI - 2023-09-28

Sourcing Investment Targets for Venture and Growth Capital Using Multivariate Time Series Transformer

  • paper_url: http://arxiv.org/abs/2309.16888
  • repo_url: None
  • paper_authors: Lele Cao, Gustaf Halvardsson, Andrew McCornack, Vilhelm von Ehrenheim, Pawel Herman
  • For: This study explores data-driven approaches in the Private Equity (PE) industry, specifically the sourcing of Venture Capital (VC) and Growth Capital (GC) investment targets.
  • Methods: We propose a Transformer-based Multivariate Time Series Classifier (TMTSC) to predict the success likelihood of candidate companies, and describe the key components of our implementation: input features, model architecture, optimization target, and investor-centric data augmentation and split. (A toy sketch of such a classifier follows this entry.)
  • Results: Experiments on four datasets show significant improvements over three popular baselines.
    Abstract This paper addresses the growing application of data-driven approaches within the Private Equity (PE) industry, particularly in sourcing investment targets (i.e., companies) for Venture Capital (VC) and Growth Capital (GC). We present a comprehensive review of the relevant approaches and propose a novel approach leveraging a Transformer-based Multivariate Time Series Classifier (TMTSC) for predicting the success likelihood of any candidate company. The objective of our research is to optimize sourcing performance for VC and GC investments by formally defining the sourcing problem as a multivariate time series classification task. We consecutively introduce the key components of our implementation which collectively contribute to the successful application of TMTSC in VC/GC sourcing: input features, model architecture, optimization target, and investor-centric data augmentation and split. Our extensive experiments on four datasets, benchmarked towards three popular baselines, demonstrate the effectiveness of our approach in improving decision making within the VC and GC industry.
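The paper does not release code (repo_url is None). As a rough, hedged illustration of what a Transformer-based multivariate time series classifier can look like, here is a minimal PyTorch sketch; the layer sizes, pooling, and feature counts are placeholders, not the authors' configuration.

```python
import torch
import torch.nn as nn

class TMTSCSketch(nn.Module):
    """Minimal Transformer-based multivariate time series classifier (illustrative only)."""
    def __init__(self, n_features: int, d_model: int = 64,
                 n_heads: int = 4, n_layers: int = 2, n_classes: int = 2):
        super().__init__()
        self.input_proj = nn.Linear(n_features, d_model)        # embed each time step
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)                # e.g. success vs. no-success

    def forward(self, x):                      # x: (batch, time, n_features)
        h = self.encoder(self.input_proj(x))   # contextualize the whole sequence
        return self.head(h.mean(dim=1))        # mean-pool over time, then classify

# Toy usage: 8 companies, 24 monthly steps, 10 tracked metrics each.
logits = TMTSCSketch(n_features=10)(torch.randn(8, 24, 10))
print(logits.shape)  # torch.Size([8, 2])
```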

Investigating Human-Identifiable Features Hidden in Adversarial Perturbations

  • paper_url: http://arxiv.org/abs/2309.16878
  • repo_url: None
  • paper_authors: Dennis Y. Menn, Tzu-hsun Feng, Sriram Vishwanath, Hung-yi Lee
  • for: This study investigates why neural networks are vulnerable to adversarial perturbations, to better understand potential weaknesses of neural networks in real-world applications.
  • methods: The study applies multiple attack algorithms, both targeted and untargeted, across three datasets; pixel-level annotations are used to extract human-identifiable features from the perturbations, and these features are shown to compromise target models on their own.
  • results: Both targeted and untargeted attacks lead models to wrong predictions, and perturbations produced by different attack algorithms show a notable degree of similarity when averaged over multiple models. Two distinct effects appear within the human-identifiable features: a masking effect, more prominent in untargeted attacks, and a generation effect, more common in targeted attacks.
    Abstract Neural networks perform exceedingly well across various machine learning tasks but are not immune to adversarial perturbations. This vulnerability has implications for real-world applications. While much research has been conducted, the underlying reasons why neural networks fall prey to adversarial attacks are not yet fully understood. Central to our study, which explores up to five attack algorithms across three datasets, is the identification of human-identifiable features in adversarial perturbations. Additionally, we uncover two distinct effects manifesting within human-identifiable features. Specifically, the masking effect is prominent in untargeted attacks, while the generation effect is more common in targeted attacks. Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models. In addition, our findings indicate a notable extent of similarity in perturbations across different attack algorithms when averaged over multiple models. This work also provides insights into phenomena associated with adversarial perturbations, such as transferability and model interpretability. Our study contributes to a deeper understanding of the underlying mechanisms behind adversarial attacks and offers insights for the development of more resilient defense strategies for neural networks.

Preface: A Data-driven Volumetric Prior for Few-shot Ultra High-resolution Face Synthesis

  • paper_url: http://arxiv.org/abs/2309.16859
  • repo_url: https://github.com/syntec-research/Preface
  • paper_authors: Marcel C. Bühler, Kripasindhu Sarkar, Tanmay Shah, Gengyan Li, Daoye Wang, Leonhard Helminger, Sergio Orts-Escolano, Dmitry Lagun, Otmar Hilliges, Thabo Beeler, Abhimitra Meka
  • for: Synthesizing high-resolution human faces with complex appearance and reflectance effects, including hair and skin.
  • methods: A novel volumetric face prior: an identity-conditioned NeRF trained on low-resolution multi-view images of diverse humans with known camera calibration, where a simple sparse landmark-based 3D alignment lets the model learn a smooth latent space of geometry and appearance despite a limited number of training identities.
  • results: A high-quality volumetric representation of a novel subject can be obtained by fitting the model to 2 or 3 camera views of arbitrary resolution; as few as two casually captured views suffice as input at inference time.
    Abstract NeRFs have enabled highly realistic synthesis of human faces including complex appearance and reflectance effects of hair and skin. These methods typically require a large number of multi-view input images, making the process hardware intensive and cumbersome, limiting applicability to unconstrained settings. We propose a novel volumetric human face prior that enables the synthesis of ultra high-resolution novel views of subjects that are not part of the prior's training distribution. This prior model consists of an identity-conditioned NeRF, trained on a dataset of low-resolution multi-view images of diverse humans with known camera calibration. A simple sparse landmark-based 3D alignment of the training dataset allows our model to learn a smooth latent space of geometry and appearance despite a limited number of training identities. A high-quality volumetric representation of a novel subject can be obtained by model fitting to 2 or 3 camera views of arbitrary resolution. Importantly, our method requires as few as two views of casually captured images as input at inference time.

Multi-Bellman operator for convergence of $Q$-learning with linear function approximation

  • paper_url: http://arxiv.org/abs/2309.16819
  • repo_url: None
  • paper_authors: Diogo S. Carvalho, Pedro A. Santos, Francisco S. Melo
  • for: This work studies the convergence of $Q$-learning with linear function approximation.
  • methods: It introduces a multi-Bellman operator that extends the traditional Bellman operator; by studying its properties, the authors identify conditions under which the projected multi-Bellman operator becomes contractive, and propose a multi $Q$-learning algorithm with improved fixed-point guarantees. (Background notation is sketched after this entry.)
  • results: Applying the algorithm to well-known environments demonstrates the effectiveness and applicability of the approach.
    Abstract We study the convergence of $Q$-learning with linear function approximation. Our key contribution is the introduction of a novel multi-Bellman operator that extends the traditional Bellman operator. By exploring the properties of this operator, we identify conditions under which the projected multi-Bellman operator becomes contractive, providing improved fixed-point guarantees compared to the Bellman operator. To leverage these insights, we propose the multi $Q$-learning algorithm with linear function approximation. We demonstrate that this algorithm converges to the fixed-point of the projected multi-Bellman operator, yielding solutions of arbitrary accuracy. Finally, we validate our approach by applying it to well-known environments, showcasing the effectiveness and applicability of our findings.
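For background only (the abstract does not spell the operator out, and the notation below is the editor's, not the paper's): the one-step Bellman operator that $Q$-learning approximates is
$$(HQ)(s,a) = r(s,a) + \gamma\,\mathbb{E}_{s'\sim p(\cdot\mid s,a)}\Big[\max_{a'} Q(s',a')\Big],$$
and with feature matrix $\Phi$ and projection $\Pi$ onto its span, linear $Q$-learning targets the projected fixed point $\Phi w = \Pi H \Phi w$, which is problematic because $\Pi H$ is not a contraction in general. The multi-Bellman operator extends $H$ (an $m$-fold composition $H^m$ is one natural reading of the name) so that the projected operator does contract under the stated conditions, and multi $Q$-learning converges to its fixed point $\Phi w^\star = \Pi H^{(m)} \Phi w^\star$.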

De-SaTE: Denoising Self-attention Transformer Encoders for Li-ion Battery Health Prognostics

  • paper_url: http://arxiv.org/abs/2310.00023
  • repo_url: None
  • paper_authors: Gaurav Shinde, Rohan Mohapatra, Pooja Krishan, Saptarshi Sengupta
  • For: The paper aims to accurately predict the Remaining Useful Life (RUL) of Lithium-ion (Li-ion) batteries, a critical measure for proactive maintenance and predictive analytics.
  • Methods: It combines multiple denoising modules, a denoising auto-encoder and a wavelet denoiser, to generate encoded/decomposed representations of battery data, which are then processed by dedicated self-attention transformer encoders to estimate health indicators.
  • Results: The approach estimates health indicators accurately under a set of diverse noise patterns, with error metrics on par with or better than the best reported in recent literature.
    Abstract Lithium Ion (Li-ion) batteries have gained widespread popularity across various industries, from powering portable electronic devices to propelling electric vehicles and supporting energy storage systems. A central challenge in managing Li-ion batteries effectively is accurately predicting their Remaining Useful Life (RUL), which is a critical measure for proactive maintenance and predictive analytics. This study presents a novel approach that harnesses the power of multiple denoising modules, each trained to address specific types of noise commonly encountered in battery data. Specifically we use a denoising auto-encoder and a wavelet denoiser to generate encoded/decomposed representations, which are subsequently processed through dedicated self-attention transformer encoders. After extensive experimentation on the NASA and CALCE datasets, we are able to characterize a broad spectrum of health indicator estimations under a set of diverse noise patterns. We find that our reported error metrics on these datasets are on par or better with the best reported in recent literature.

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

  • paper_url: http://arxiv.org/abs/2309.16797
  • repo_url: None
  • paper_authors: Chrisantha Fernando, Dylan Banarse, Henryk Michalewski, Simon Osindero, Tim Rocktäschel
  • for: This paper aims to improve the reasoning abilities of large language models (LLMs) in various domains by presenting a general-purpose self-referential self-improvement mechanism called Promptbreeder.
  • methods: Promptbreeder evolves and adapts prompts for a given domain by mutating a population of task-prompts, and subsequently evaluating them for fitness on a training set. The mutation of these task-prompts is governed by mutation-prompts that the LLM generates and improves throughout evolution in a self-referential way.
  • results: Promptbreeder outperforms state-of-the-art prompt strategies such as Chain-of-Thought and Plan-and-Solve Prompting on commonly used arithmetic and commonsense reasoning benchmarks, and is able to evolve intricate task-prompts for the challenging problem of hate speech classification.
    Abstract Popular prompt strategies like Chain-of-Thought Prompting can dramatically improve the reasoning abilities of Large Language Models (LLMs) in various domains. However, such hand-crafted prompt-strategies are often sub-optimal. In this paper, we present Promptbreeder, a general-purpose self-referential self-improvement mechanism that evolves and adapts prompts for a given domain. Driven by an LLM, Promptbreeder mutates a population of task-prompts, and subsequently evaluates them for fitness on a training set. Crucially, the mutation of these task-prompts is governed by mutation-prompts that the LLM generates and improves throughout evolution in a self-referential way. That is, Promptbreeder is not just improving task-prompts, but it is also improving the mutationprompts that improve these task-prompts. Promptbreeder outperforms state-of-the-art prompt strategies such as Chain-of-Thought and Plan-and-Solve Prompting on commonly used arithmetic and commonsense reasoning benchmarks. Furthermore, Promptbreeder is able to evolve intricate task-prompts for the challenging problem of hate speech classification.
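No code is released for Promptbreeder. The sketch below only illustrates the self-referential loop the abstract describes: task-prompts are scored for fitness, and both the task-prompts and the mutation-prompts that rewrite them are themselves rewritten by the LLM. `call_llm` and `fitness` are hypothetical stand-ins, not the paper's prompts or scoring.

```python
import random

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API client."""
    return prompt  # placeholder so the sketch runs end to end

def fitness(task_prompt: str, train_set) -> float:
    """Score a task-prompt by accuracy on a training set (stubbed here)."""
    return random.random()

def promptbreeder_sketch(train_set, generations=10, pop_size=8):
    # Population of (task-prompt, mutation-prompt) pairs.
    population = [("Let's think step by step.", "Rewrite the instruction to be clearer.")
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda p: fitness(p[0], train_set), reverse=True)
        survivors = ranked[: pop_size // 2]
        children = []
        for task_p, mut_p in survivors:
            # Self-referential step: the LLM also improves the mutation-prompt itself.
            new_mut_p = call_llm(f"Improve this mutation instruction: {mut_p}")
            new_task_p = call_llm(f"{new_mut_p}\n\nInstruction to mutate: {task_p}")
            children.append((new_task_p, new_mut_p))
        population = survivors + children
    return max(population, key=lambda p: fitness(p[0], train_set))
```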

Photonic Accelerators for Image Segmentation in Autonomous Driving and Defect Detection

  • paper_url: http://arxiv.org/abs/2309.16783
  • repo_url: None
  • paper_authors: Lakshmi Nair, David Widemann, Brad Turcott, Nick Moore, Alexandra Wleklinski, Darius Bunandar, Ioannis Papavasileiou, Shihu Wang, Eric Logan
  • for: This paper investigates running image segmentation models on photonic accelerators, aiming at faster and more energy-efficient execution of segmentation tasks.
  • methods: It studies which image segmentation DNN architectures are best suited to photonic accelerators and evaluates how different segmentation models perform when executed on them.
  • results: Certain segmentation models show negligible accuracy loss on photonic accelerators, and the paper explores the empirical reasons for this robustness as well as techniques for recovering accuracy where models underperform. Throughput and energy consumption are also compared across segmentation workloads.
    Abstract Photonic computing promises faster and more energy-efficient deep neural network (DNN) inference than traditional digital hardware. Advances in photonic computing can have profound impacts on applications such as autonomous driving and defect detection that depend on fast, accurate and energy efficient execution of image segmentation models. In this paper, we investigate image segmentation on photonic accelerators to explore: a) the types of image segmentation DNN architectures that are best suited for photonic accelerators, and b) the throughput and energy efficiency of executing the different image segmentation models on photonic accelerators, along with the trade-offs involved therein. Specifically, we demonstrate that certain segmentation models exhibit negligible loss in accuracy (compared to digital float32 models) when executed on photonic accelerators, and explore the empirical reasoning for their robustness. We also discuss techniques for recovering accuracy in the case of models that do not perform well. Further, we compare throughput (inferences-per-second) and energy consumption estimates for different image segmentation workloads on photonic accelerators. We discuss the challenges and potential optimizations that can help improve the application of photonic accelerators to such computer vision tasks.

Intriguing properties of generative classifiers

  • paper_url: http://arxiv.org/abs/2309.16779
  • repo_url: None
  • paper_authors: Priyank Jaini, Kevin Clark, Robert Geirhos
  • for: To determine the best paradigm for object recognition: discriminative inference (fast but potentially prone to shortcut learning) or a generative model (slow but potentially more robust).
  • methods: Building on recent advances in generative modeling, text-to-image models are turned into classifiers so that their behavior can be studied and compared against discriminative models and human psychophysical data. (A toy sketch of the idea follows this entry.)
  • results: Four intriguing emergent properties are reported: (1) a record-breaking human-like shape bias (99% for Imagen), (2) near human-level out-of-distribution accuracy, (3) state-of-the-art alignment with human classification errors, and (4) an understanding of certain perceptual illusions. While the dominant paradigm for modeling human object recognition is discriminative inference, zero-shot generative models approximate human object recognition data surprisingly well.
    Abstract What is the best paradigm to recognize objects -- discriminative inference (fast but potentially prone to shortcut learning) or using a generative model (slow but potentially more robust)? We build on recent advances in generative modeling that turn text-to-image models into classifiers. This allows us to study their behavior and to compare them against discriminative models and human psychophysical data. We report four intriguing emergent properties of generative classifiers: they show a record-breaking human-like shape bias (99% for Imagen), near human-level out-of-distribution accuracy, state-of-the-art alignment with human classification errors, and they understand certain perceptual illusions. Our results indicate that while the current dominant paradigm for modeling human object recognition is discriminative inference, zero-shot generative models approximate human object recognition data surprisingly well.
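The abstract does not state how the text-to-image models are turned into classifiers; a common recipe (a hedged sketch, not necessarily the paper's) is to pick the class whose text conditioning best explains the image under the diffusion denoising objective. `denoiser` and `encode_text` below are hypothetical stand-ins for the components of a text-to-image diffusion model.

```python
import torch

def add_noise(x, noise, t, n_steps=1000):
    """Simple linear-schedule forward diffusion step (illustrative only)."""
    alpha_bar = 1.0 - t.float() / n_steps
    return alpha_bar.sqrt() * x + (1.0 - alpha_bar).sqrt() * noise

@torch.no_grad()
def generative_classify(image_latent, class_prompts, denoiser, encode_text, n_samples=16):
    """Return the index of the class prompt with the lowest average denoising error."""
    losses = []
    for prompt in class_prompts:
        cond = encode_text(prompt)
        total = 0.0
        for _ in range(n_samples):
            t = torch.randint(0, 1000, (1,))
            noise = torch.randn_like(image_latent)
            noisy = add_noise(image_latent, noise, t)
            total += torch.mean((denoiser(noisy, t, cond) - noise) ** 2).item()
        losses.append(total / n_samples)   # lower loss: the prompt explains the image better
    return int(torch.tensor(losses).argmin())
```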

How many words does ChatGPT know? The answer is ChatWords

  • paper_url: http://arxiv.org/abs/2309.16777
  • repo_url: https://github.com/wordsgpt/chatwords
  • paper_authors: Gonzalo Martínez, Javier Conde, Pedro Reviriego, Elena Merino-Gómez, José Alberto Hernández, Fabrizio Lombardi
  • for: To evaluate ChatGPT's lexical knowledge, i.e., how well it recognizes an arbitrary set of words.
  • methods: The paper introduces ChatWords, an automated, extensible test system that probes ChatGPT's knowledge of a given word list; two case studies use the Spanish lexicon from the official dictionary of the "Real Academia Española" and the words appearing in the Quixote. (A toy sketch of such a test loop follows this entry.)
  • results: ChatGPT correctly recognizes only about 80% of the dictionary words and about 90% of the words in the Quixote, in some cases with an incorrect meaning.
    Abstract The introduction of ChatGPT has put Artificial Intelligence (AI) Natural Language Processing (NLP) in the spotlight. ChatGPT adoption has been exponential with millions of users experimenting with it in a myriad of tasks and application domains with impressive results. However, ChatGPT has limitations and suffers hallucinations, for example producing answers that look plausible but they are completely wrong. Evaluating the performance of ChatGPT and similar AI tools is a complex issue that is being explored from different perspectives. In this work, we contribute to those efforts with ChatWords, an automated test system, to evaluate ChatGPT knowledge of an arbitrary set of words. ChatWords is designed to be extensible, easy to use, and adaptable to evaluate also other NLP AI tools. ChatWords is publicly available and its main goal is to facilitate research on the lexical knowledge of AI tools. The benefits of ChatWords are illustrated with two case studies: evaluating the knowledge that ChatGPT has of the Spanish lexicon (taken from the official dictionary of the "Real Academia Espa\~nola") and of the words that appear in the Quixote, the well-known novel written by Miguel de Cervantes. The results show that ChatGPT is only able to recognize approximately 80% of the words in the dictionary and 90% of the words in the Quixote, in some cases with an incorrect meaning. The implications of the lexical knowledge of NLP AI tools and potential applications of ChatWords are also discussed providing directions for further work on the study of the lexical knowledge of AI tools.
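ChatWords itself is available at the repo_url above; the snippet below is only a generic illustration of what an automated lexical test loop looks like, not ChatWords' actual prompts or judging procedure. `ask_llm` and `looks_known` are hypothetical placeholders.

```python
def ask_llm(question: str) -> str:
    # Hypothetical stand-in for a ChatGPT API call; replace with a real client.
    return "Yes, it means ..."

def looks_known(word: str, answer: str) -> bool:
    # Placeholder judge: a real system would check the meaning, not just the phrasing.
    return answer.lower().startswith("yes")

def lexical_coverage(words):
    """Fraction of words the model appears to recognize."""
    known = sum(looks_known(w, ask_llm(f"Do you know the word '{w}'? If so, define it."))
                for w in words)
    return known / len(words)

print(lexical_coverage(["hidalgo", "rocín", "adarga"]))  # toy word list from the Quixote
```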

Neural scaling laws for phenotypic drug discovery

  • paper_url: http://arxiv.org/abs/2309.16773
  • repo_url: None
  • paper_authors: Drew Linsley, John Griffin, Jason Parker Brown, Adam N Roose, Michael Frank, Peter Linsley, Steven Finkbeiner, Jeremy Linsley
  • for: To examine whether deep neural networks (DNNs) for small-molecule drug discovery can achieve breakthrough gains simply by scaling up models and data, as they have in NLP and computer vision.
  • methods: A large-scale, systematic analysis of how DNN size, data diet, and learning routines interact to affect accuracy on the Phenotypic Chemistry Arena (Pheno-CA) benchmark, a diverse set of drug development tasks posed on image-based high-content screening data.
  • results: Unlike in NLP and computer vision, task-supervised DNNs do not keep improving as data and model size grow. The authors introduce a novel precursor task, the Inverse Biological Process (IBP); DNNs pre-trained with IBP outperform task-supervised DNNs on the Pheno-CA, and their performance improves monotonically with data and model scale. This suggests the DNN ingredients needed for these tasks are already in hand and projects how much more experimental data is needed for any desired level of improvement. The Pheno-CA benchmark and code are released to encourage further study of neural scaling laws for small-molecule drug discovery.
    Abstract Recent breakthroughs by deep neural networks (DNNs) in natural language processing (NLP) and computer vision have been driven by a scale-up of models and data rather than the discovery of novel computing paradigms. Here, we investigate if scale can have a similar impact for models designed to aid small molecule drug discovery. We address this question through a large-scale and systematic analysis of how DNN size, data diet, and learning routines interact to impact accuracy on our Phenotypic Chemistry Arena (Pheno-CA) benchmark: a diverse set of drug development tasks posed on image-based high content screening data. Surprisingly, we find that DNNs explicitly supervised to solve tasks in the Pheno-CA do not continuously improve as their data and model size is scaled-up. To address this issue, we introduce a novel precursor task, the Inverse Biological Process (IBP), which is designed to resemble the causal objective functions that have proven successful for NLP. We indeed find that DNNs first trained with IBP then probed for performance on the Pheno-CA significantly outperform task-supervised DNNs. More importantly, the performance of these IBP-trained DNNs monotonically improves with data and model scale. Our findings reveal that the DNN ingredients needed to accurately solve small molecule drug development tasks are already in our hands, and project how much more experimental data is needed to achieve any desired level of improvement. We release our Pheno-CA benchmark and code to encourage further study of neural scaling laws for small molecule drug discovery.

XVO: Generalized Visual Odometry via Cross-Modal Self-Training

  • paper_url: http://arxiv.org/abs/2309.16772
  • repo_url: None
  • paper_authors: Lei Lai, Zhongkai Shangguan, Jimuyang Zhang, Eshed Ohn-Bar
  • for: To propose a semi-supervised learning method for training generalized monocular Visual Odometry (VO) models that operate robustly off-the-shelf across diverse datasets and settings.
  • methods: Self-training on large amounts of unconstrained, heterogeneous dash-camera videos from YouTube, learning to recover relative pose with real-world scale from visual scene semantics without known camera parameters, together with multi-modal supervision (segmentation, flow, depth, and audio auxiliary prediction tasks) to encourage generalized representations.
  • results: The proposed teacher network achieves state-of-the-art performance on the commonly used KITTI benchmark without multi-frame optimization or knowledge of camera parameters. The audio prediction task notably strengthens semi-supervised learning while mitigating noisy pseudo-labels, especially on highly dynamic and out-of-domain video. Combined with the semi-supervised step, XVO transfers off-the-shelf across KITTI, nuScenes, and Argoverse without fine-tuning.
    Abstract We propose XVO, a semi-supervised learning method for training generalized monocular Visual Odometry (VO) models with robust off-the-self operation across diverse datasets and settings. In contrast to standard monocular VO approaches which often study a known calibration within a single dataset, XVO efficiently learns to recover relative pose with real-world scale from visual scene semantics, i.e., without relying on any known camera parameters. We optimize the motion estimation model via self-training from large amounts of unconstrained and heterogeneous dash camera videos available on YouTube. Our key contribution is twofold. First, we empirically demonstrate the benefits of semi-supervised training for learning a general-purpose direct VO regression network. Second, we demonstrate multi-modal supervision, including segmentation, flow, depth, and audio auxiliary prediction tasks, to facilitate generalized representations for the VO task. Specifically, we find audio prediction task to significantly enhance the semi-supervised learning process while alleviating noisy pseudo-labels, particularly in highly dynamic and out-of-domain video data. Our proposed teacher network achieves state-of-the-art performance on the commonly used KITTI benchmark despite no multi-frame optimization or knowledge of camera parameters. Combined with the proposed semi-supervised step, XVO demonstrates off-the-shelf knowledge transfer across diverse conditions on KITTI, nuScenes, and Argoverse without fine-tuning.

Persona-Coded Poly-Encoder: Persona-Guided Multi-Stream Conversational Sentence Scoring

  • paper_url: http://arxiv.org/abs/2309.16770
  • repo_url: None
  • paper_authors: Junfeng Liu, Christopher Symons, Ranga Raju Vatsavai
  • for: To improve conversation quality by leveraging persona information.
  • methods: A novel Persona-Coded Poly-Encoder that incorporates persona information in a multi-stream encoding scheme to improve response generation; it is evaluated on two persona-based conversational datasets against two state-of-the-art methods.
  • results: The method improves conversation quality over the baseline Poly-Encoder by 3.32% in BLEU score and 2.94% in HR@1, and points toward better use of multi-modal data in conversational tasks.
    Abstract Recent advances in machine learning and deep learning have led to the widespread use of Conversational AI in many practical applications. However, it is still very challenging to leverage auxiliary information that can provide conversational context or personalized tuning to improve the quality of conversations. For example, there has only been limited research on using an individuals persona information to improve conversation quality, and even state-of-the-art conversational AI techniques are unable to effectively leverage signals from heterogeneous sources of auxiliary data, such as multi-modal interaction data, demographics, SDOH data, etc. In this paper, we present a novel Persona-Coded Poly-Encoder method that leverages persona information in a multi-stream encoding scheme to improve the quality of response generation for conversations. To show the efficacy of the proposed method, we evaluate our method on two different persona-based conversational datasets, and compared against two state-of-the-art methods. Our experimental results and analysis demonstrate that our method can improve conversation quality over the baseline method Poly-Encoder by 3.32% and 2.94% in terms of BLEU score and HR@1, respectively. More significantly, our method offers a path to better utilization of multi-modal data in conversational tasks. Lastly, our study outlines several challenges and future research directions for advancing personalized conversational AI technology.

RealFill: Reference-Driven Generation for Authentic Image Completion

  • paper_url: http://arxiv.org/abs/2309.16668
  • repo_url: None
  • paper_authors: Luming Tang, Nataniel Ruiz, Qinghao Chu, Yuanzhen Li, Aleksander Holynski, David E. Jacobs, Bharath Hariharan, Yael Pritch, Neal Wadhwa, Kfir Aberman, Michael Rubinstein
  • for: To fill in missing regions of an image with the content that should have been there, keeping the completion faithful to the original scene.
  • methods: A generative inpainting model personalized with only a few reference images of the scene; the references need not be aligned with the target image and can differ drastically in viewpoint, lighting conditions, camera aperture, or image style.
  • results: RealFill completes target images with visually compelling, faithful content across a set of diverse and challenging scenarios, outperforming existing approaches by a large margin.
    Abstract Recent advances in generative imagery have brought forth outpainting and inpainting models that can produce high-quality, plausible image content in unknown regions, but the content these models hallucinate is necessarily inauthentic, since the models lack sufficient context about the true scene. In this work, we propose RealFill, a novel generative approach for image completion that fills in missing regions of an image with the content that should have been there. RealFill is a generative inpainting model that is personalized using only a few reference images of a scene. These reference images do not have to be aligned with the target image, and can be taken with drastically varying viewpoints, lighting conditions, camera apertures, or image styles. Once personalized, RealFill is able to complete a target image with visually compelling contents that are faithful to the original scene. We evaluate RealFill on a new image completion benchmark that covers a set of diverse and challenging scenarios, and find that it outperforms existing approaches by a large margin. See more results on our project page: https://realfill.github.io

SA2-Net: Scale-aware Attention Network for Microscopic Image Segmentation

  • paper_url: http://arxiv.org/abs/2309.16661
  • repo_url: https://github.com/mustansarfiaz/sa2-net
  • paper_authors: Mustansar Fiaz, Rao Muhammad Anwer, Hisham Cholakkal
  • for: To propose an attention-guided method that effectively handles the diverse structures, such as cells, found in microscopic images.
  • methods: Multi-scale feature learning with a scale-aware attention (SA2) module that combines local attention at each level of the multi-stage features with global attention across resolutions, plus a novel Adaptive Up-Attention (AuA) upsampling module that sharpens blurred region boundaries such as cell boundaries.
  • results: SA2-Net performs strongly on five challenging datasets, surpassing commonly used CNN-based models. Code is publicly available at \url{https://github.com/mustansarfiaz/SA2-Net}.
    Abstract Microscopic image segmentation is a challenging task, wherein the objective is to assign semantic labels to each pixel in a given microscopic image. While convolutional neural networks (CNNs) form the foundation of many existing frameworks, they often struggle to explicitly capture long-range dependencies. Although transformers were initially devised to address this issue using self-attention, it has been proven that both local and global features are crucial for addressing diverse challenges in microscopic images, including variations in shape, size, appearance, and target region density. In this paper, we introduce SA2-Net, an attention-guided method that leverages multi-scale feature learning to effectively handle diverse structures within microscopic images. Specifically, we propose scale-aware attention (SA2) module designed to capture inherent variations in scales and shapes of microscopic regions, such as cells, for accurate segmentation. This module incorporates local attention at each level of multi-stage features, as well as global attention across multiple resolutions. Furthermore, we address the issue of blurred region boundaries (e.g., cell boundaries) by introducing a novel upsampling strategy called the Adaptive Up-Attention (AuA) module. This module enhances the discriminative ability for improved localization of microscopic regions using an explicit attention mechanism. Extensive experiments on five challenging datasets demonstrate the benefits of our SA2-Net model. Our source code is publicly available at \url{https://github.com/mustansarfiaz/SA2-Net}.

Memory in Plain Sight: A Survey of the Uncanny Resemblances between Diffusion Models and Associative Memories

  • paper_url: http://arxiv.org/abs/2309.16750
  • repo_url: None
  • paper_authors: Benjamin Hoover, Hendrik Strobelt, Dmitry Krotov, Judy Hoffman, Zsolt Kira, Duen Horng Chau
  • for: To provide a concise overview of Diffusion Models (DMs) and expose their mathematical connection to Associative Memories (AMs).
  • methods: DMs are described from the perspective of dynamical systems and Ordinary Differential Equations (ODEs); energy-based AMs admit a Lyapunov energy function on which gradient descent can be performed to denoise data. (A toy energy-descent sketch follows this entry.)
  • results: The survey traces the 40-year history of energy-based AMs, beginning with the original Hopfield Network, and discusses new research directions for AMs and DMs revealed by characterizing the extent of their similarities and differences.
    Abstract Diffusion Models (DMs) have recently set state-of-the-art on many generation benchmarks. However, there are myriad ways to describe them mathematically, which makes it difficult to develop a simple understanding of how they work. In this survey, we provide a concise overview of DMs from the perspective of dynamical systems and Ordinary Differential Equations (ODEs) which exposes a mathematical connection to the highly related yet often overlooked class of energy-based models, called Associative Memories (AMs). Energy-based AMs are a theoretical framework that behave much like denoising DMs, but they enable us to directly compute a Lyapunov energy function on which we can perform gradient descent to denoise data. We then summarize the 40 year history of energy-based AMs, beginning with the original Hopfield Network, and discuss new research directions for AMs and DMs that are revealed by characterizing the extent of their similarities and differences
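To make the "gradient descent on a Lyapunov energy" point concrete, here is a small numpy sketch of denoising with one particular associative-memory energy (a modern-Hopfield-style log-sum-exp energy). This is an editor's illustration, not the survey's notation, and the survey covers many AM variants beyond this one.

```python
import numpy as np

def am_denoise(x, patterns, steps=50, lr=0.1, beta=4.0):
    """Denoise x by gradient descent on E(x) = -(1/beta) logsumexp(beta * X @ x) + ||x||^2 / 2."""
    X = np.asarray(patterns, dtype=float)      # stored memories, shape (n_patterns, dim)
    x = np.asarray(x, dtype=float).copy()
    for _ in range(steps):
        logits = beta * X @ x
        p = np.exp(logits - logits.max())
        p /= p.sum()                           # softmax attention over memories
        grad = x - X.T @ p                     # dE/dx
        x -= lr * grad                         # descend the energy = move toward a memory
    return x

# Toy usage: recover a stored pattern from a noisy query.
rng = np.random.default_rng(0)
memories = rng.choice([-1.0, 1.0], size=(3, 16))
noisy = memories[0] + 0.5 * rng.normal(size=16)
print(np.sign(am_denoise(noisy, memories)) @ memories[0])  # close to 16
```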

Discovering environments with XRM

  • paper_url: http://arxiv.org/abs/2309.16748
  • repo_url: None
  • paper_authors: Mohammad Pezeshki, Diane Bouchacourt, Mark Ibrahim, Nicolas Ballas, Pascal Vincent, David Lopez-Paz
  • For: The paper aims to develop algorithms that automatically discover environments inducing broad generalization, enabling robust AI systems across applications.
  • Methods: The proposed method, Cross-Risk Minimization (XRM), trains two twin networks, each learning from one random half of the training data while imitating confident held-out mistakes made by its sibling. (A hedged sketch of this twin-training step follows this entry.)
  • Results: XRM can discover environments for all training and validation data, and domain generalization algorithms built on top of XRM environments achieve oracle worst-group accuracy, addressing a long-standing problem in out-of-distribution generalization.
    Abstract Successful out-of-distribution generalization requires environment annotations. Unfortunately, these are resource-intensive to obtain, and their relevance to model performance is limited by the expectations and perceptual biases of human annotators. Therefore, to enable robust AI systems across applications, we must develop algorithms to automatically discover environments inducing broad generalization. Current proposals, which divide examples based on their training error, suffer from one fundamental problem. These methods add hyper-parameters and early-stopping criteria that are impossible to tune without a validation set with human-annotated environments, the very information subject to discovery. In this paper, we propose Cross-Risk-Minimization (XRM) to address this issue. XRM trains two twin networks, each learning from one random half of the training data, while imitating confident held-out mistakes made by its sibling. XRM provides a recipe for hyper-parameter tuning, does not require early-stopping, and can discover environments for all training and validation data. Domain generalization algorithms built on top of XRM environments achieve oracle worst-group-accuracy, solving a long-standing problem in out-of-distribution generalization.
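No code is released for XRM. The sketch below illustrates only the twin-training step the abstract describes, under assumptions: each network trains on its own random half of the data, and a training label is replaced by the sibling's prediction when the sibling, which held that example out, predicts a different class with high confidence. The paper's exact flipping rule, schedule, and the subsequent environment-assignment step may differ.

```python
import torch
import torch.nn.functional as F

def xrm_twin_step(net_a, net_b, opt_a, opt_b, x, y, half_a, conf=0.9):
    """One optimization step for both twins (hedged sketch, not the paper's exact rule)."""
    with torch.no_grad():
        probs_a = F.softmax(net_a(x), dim=1)
        probs_b = F.softmax(net_b(x), dim=1)

    losses = []
    for net, opt, own, sib_probs in ((net_a, opt_a, half_a, probs_b),
                                     (net_b, opt_b, ~half_a, probs_a)):
        sib_conf, sib_pred = sib_probs.max(dim=1)
        targets = y.clone()
        flip = own & (sib_conf > conf) & (sib_pred != y)
        targets[flip] = sib_pred[flip]            # imitate confident held-out mistakes
        loss = F.cross_entropy(net(x[own]), targets[own])
        opt.zero_grad()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses
```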

MindShift: Leveraging Large Language Models for Mental-States-Based Problematic Smartphone Use Intervention

  • paper_url: http://arxiv.org/abs/2309.16639
  • repo_url: None
  • paper_authors: Ruolan Wu, Chun Yu, Xiaole Pan, Yujia Liu, Ningning Zhang, Yue Fu, Yuhan Wang, Zhi Zheng, Li Chen, Qiaolei Jiang, Xuhai Xu, Yuanchun Shi
  • for: To develop an LLM-based intervention technique that counters the negative effects of problematic smartphone use on physical and mental health.
  • methods: A Wizard-of-Oz study (N=12) and an interview study (N=10) identify the mental states behind problematic smartphone use (boredom, stress, and inertia), informing four persuasion strategies: understanding, comforting, evoking, and scaffolding habits. MindShift uses a large language model to dynamically generate persuasive content from users' in-the-moment physical contexts, mental states, app usage behaviors, and goals and habits.
  • results: In a 5-week field experiment (N=25), MindShift improves intervention acceptance rates by 17.8-22.5% and reduces smartphone use frequency by 12.1-14.4% compared with baseline techniques; users' smartphone addiction scale scores drop significantly and self-efficacy rises. The study points to the potential of LLM-based context-aware persuasion in other behavior change domains.
    Abstract Problematic smartphone use negatively affects physical and mental health. Despite the wide range of prior research, existing persuasive techniques are not flexible enough to provide dynamic persuasion content based on users' physical contexts and mental states. We first conduct a Wizard-of-Oz study (N=12) and an interview study (N=10) to summarize the mental states behind problematic smartphone use: boredom, stress, and inertia. This informs our design of four persuasion strategies: understanding, comforting, evoking, and scaffolding habits. We leverage large language models (LLMs) to enable the automatic and dynamic generation of effective persuasion content. We develop MindShift, a novel LLM-powered problematic smartphone use intervention technique. MindShift takes users' in-the-moment physical contexts, mental states, app usage behaviors, users' goals & habits as input, and generates high-quality and flexible persuasive content with appropriate persuasion strategies. We conduct a 5-week field experiment (N=25) to compare MindShift with baseline techniques. The results show that MindShift significantly improves intervention acceptance rates by 17.8-22.5% and reduces smartphone use frequency by 12.1-14.4%. Moreover, users have a significant drop in smartphone addiction scale scores and a rise in self-efficacy. Our study sheds light on the potential of leveraging LLMs for context-aware persuasion in other behavior change domains.

Mixup Your Own Pairs

  • paper_url: http://arxiv.org/abs/2309.16633
  • repo_url: https://github.com/yilei-wu/supremix
  • paper_authors: Yilei Wu, Zijian Dong, Chongyao Chen, Wangchunshu Zhou, Juan Helen Zhou
  • for: To improve representation learning for regression, where directly applying techniques designed for classification often yields fragmented representations and sub-optimal performance.
  • methods: SupReMix, a supervised contrastive learning method for regression that, at the embedding level, uses anchor-inclusive mixtures (mixup of the anchor and a distinct negative) as hard negative pairs and anchor-exclusive mixtures (mixup of two distinct negatives) as hard positive pairs, integrating richer ordinal information into the contrastive pairs. (A minimal sketch of the pair construction follows this entry.)
  • results: Extensive experiments on six regression datasets (2D images, volumetric images, text, tabular data, and time-series signals), together with theoretical analysis, show that SupReMix pre-training yields continuous, ordered representations and significantly improves regression performance, including under transfer learning, imbalanced training data, and few-sample settings.
    Abstract In representation learning, regression has traditionally received less attention than classification. Directly applying representation learning techniques designed for classification to regression often results in fragmented representations in the latent space, yielding sub-optimal performance. In this paper, we argue that the potential of contrastive learning for regression has been overshadowed due to the neglect of two crucial aspects: ordinality-awareness and hardness. To address these challenges, we advocate "mixup your own contrastive pairs for supervised contrastive regression", instead of relying solely on real/augmented samples. Specifically, we propose Supervised Contrastive Learning for Regression with Mixup (SupReMix). It takes anchor-inclusive mixtures (mixup of the anchor and a distinct negative sample) as hard negative pairs and anchor-exclusive mixtures (mixup of two distinct negative samples) as hard positive pairs at the embedding level. This strategy formulates harder contrastive pairs by integrating richer ordinal information. Through extensive experiments on six regression datasets including 2D images, volumetric images, text, tabular data, and time-series signals, coupled with theoretical analysis, we demonstrate that SupReMix pre-training fosters continuous ordered representations of regression data, resulting in significant improvement in regression performance. Furthermore, SupReMix is superior to other approaches in a range of regression challenges including transfer learning, imbalanced training data, and scenarios with fewer training samples.
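The repo_url above hosts the official code; the snippet below is only a minimal illustration of the pair construction named in the abstract. The choice of mixing coefficient, how the two negatives are selected (in the paper, plausibly so that their interpolated label lands near the anchor's), and the contrastive loss itself are not reproduced here.

```python
import torch

def supremix_pairs(anchor_z, neg_z1, neg_z2, lam=0.5):
    """Build one hard negative and one hard positive at the embedding level (sketch)."""
    # Anchor-inclusive mixture: anchor mixed with a distinct negative -> hard negative.
    hard_negative = lam * anchor_z + (1.0 - lam) * neg_z1
    # Anchor-exclusive mixture: two distinct negatives mixed together -> hard positive.
    hard_positive = lam * neg_z1 + (1.0 - lam) * neg_z2
    return hard_negative, hard_positive

# Toy usage with 128-dimensional embeddings.
z_anchor, z_neg1, z_neg2 = (torch.randn(128) for _ in range(3))
hn, hp = supremix_pairs(z_anchor, z_neg1, z_neg2)
```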

Harnessing Diverse Data for Global Disaster Prediction: A Multimodal Framework

  • paper_url: http://arxiv.org/abs/2309.16747
  • repo_url: None
  • paper_authors: Gengyin Liu, Huaiyang Zhong
  • for: Disaster prediction under intensifying climate change, focusing on flood and landslide prediction given their ties to meteorological and topographical factors.
  • methods: A novel multimodal disaster prediction framework combining weather statistics, satellite imagery, and textual insights, with strategies implemented to address class imbalance.
  • results: Integrating multiple data sources can bolster model performance, but the extent of the enhancement differs depending on the specific nature of each disaster and its unique underlying causes.
    Abstract As climate change intensifies, the urgency for accurate global-scale disaster predictions grows. This research presents a novel multimodal disaster prediction framework, combining weather statistics, satellite imagery, and textual insights. We particularly focus on "flood" and "landslide" predictions, given their ties to meteorological and topographical factors. The model is meticulously crafted based on the available data and we also implement strategies to address class imbalance. While our findings suggest that integrating multiple data sources can bolster model performance, the extent of enhancement differs based on the specific nature of each disaster and their unique underlying causes.

Stress Testing Chain-of-Thought Prompting for Large Language Models

  • paper_url: http://arxiv.org/abs/2309.16621
  • repo_url: None
  • paper_authors: Aayush Mishra, Karan Thakkar
  • for: This work examines the effectiveness of Chain-of-Thought (CoT) prompting in improving the multi-step reasoning abilities of large language models (LLMs).
  • methods: Three types of CoT prompt perturbations are analyzed for their effect on GPT-3 across various tasks: CoT order, CoT values, and CoT operators.
  • results: Incorrect CoT prompting leads to poor accuracy, and correct values in the CoT are crucial for predicting correct answers; demonstrations with wrong CoT operators or wrong CoT order degrade performance far less than value-based perturbations.
    Abstract This report examines the effectiveness of Chain-of-Thought (CoT) prompting in improving the multi-step reasoning abilities of large language models (LLMs). Inspired by previous studies \cite{Min2022RethinkingWork}, we analyze the impact of three types of CoT prompt perturbations, namely CoT order, CoT values, and CoT operators on the performance of GPT-3 on various tasks. Our findings show that incorrect CoT prompting leads to poor performance on accuracy metrics. Correct values in the CoT is crucial for predicting correct answers. Moreover, incorrect demonstrations, where the CoT operators or the CoT order are wrong, do not affect the performance as drastically when compared to the value based perturbations. This research deepens our understanding of CoT prompting and opens some new questions regarding the capability of LLMs to learn reasoning in context.

Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit

  • paper_url: http://arxiv.org/abs/2309.16620
  • repo_url: None
  • paper_authors: Blake Bordelon, Lorenzo Noci, Mufan Bill Li, Boris Hanin, Cengiz Pehlevan
  • for: To reduce the rising cost of hyperparameter tuning for large models by finding a parameterization in which tuned hyperparameters transfer across model scale.
  • methods: In the $\mu$P parameterization, optimal hyperparameters transfer from narrow networks to arbitrarily wide ones, but not across depth. The paper therefore studies residual networks whose residual branches are scaled by $1/\sqrt{\text{depth}}$, combined with the $\mu$P parameterization. (A minimal sketch of the branch scaling follows this entry.)
  • results: With this parameterization, convolutional ResNets and Vision Transformers trained on CIFAR-10 and ImageNet exhibit transfer of optimal hyperparameters across both width and depth. The empirical findings are supported by theory: using a dynamical mean field theory (DMFT) description of neural network learning dynamics, this parameterization of ResNets admits a well-defined feature-learning joint infinite-width and infinite-depth limit, and finite-size network dynamics converge to that limit.
    Abstract The cost of hyperparameter tuning in deep learning has been rising with model sizes, prompting practitioners to find new tuning methods using a proxy of smaller networks. One such proposal uses $\mu$P parameterized networks, where the optimal hyperparameters for small width networks transfer to networks with arbitrarily large width. However, in this scheme, hyperparameters do not transfer across depths. As a remedy, we study residual networks with a residual branch scale of $1/\sqrt{\text{depth}}$ in combination with the $\mu$P parameterization. We provide experiments demonstrating that residual architectures including convolutional ResNets and Vision Transformers trained with this parameterization exhibit transfer of optimal hyperparameters across width and depth on CIFAR-10 and ImageNet. Furthermore, our empirical findings are supported and motivated by theory. Using recent developments in the dynamical mean field theory (DMFT) description of neural network learning dynamics, we show that this parameterization of ResNets admits a well-defined feature learning joint infinite-width and infinite-depth limit and show convergence of finite-size network dynamics towards this limit.
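A minimal PyTorch sketch of the $1/\sqrt{\text{depth}}$ residual-branch scaling described above; the $\mu$P width scaling that the paper combines it with (initialization and learning-rate scaling rules) is not reproduced here, and the block contents are placeholders.

```python
import math
import torch
import torch.nn as nn

class ScaledResidualBlock(nn.Module):
    """Residual block whose branch output is scaled by 1/sqrt(total depth)."""
    def __init__(self, width: int, depth: int):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(width, width), nn.ReLU(),
                                    nn.Linear(width, width))
        self.scale = 1.0 / math.sqrt(depth)   # shrink each branch as the stack gets deeper

    def forward(self, x):
        return x + self.scale * self.branch(x)

# A depth-L stack: every block is scaled by the total depth it lives in.
depth, width = 32, 128
net = nn.Sequential(*[ScaledResidualBlock(width, depth) for _ in range(depth)])
out = net(torch.randn(4, width))
print(out.shape)  # torch.Size([4, 128])
```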

Revisiting Neural Program Smoothing for Fuzzing

  • paper_url: http://arxiv.org/abs/2309.16618
  • repo_url: None
  • paper_authors: Maria-Irina Nicolae, Max Eisele, Andreas Zeller
  • for: This paper aims to evaluate the performance of Neural Program Smoothing (NPS) fuzzers and compare them with standard gray-box fuzzers.
  • methods: The paper uses a neural network as a smooth approximation of the program target for new test case generation, and conducts the most extensive evaluation of NPS fuzzers against standard gray-box fuzzers.
  • results: The paper finds that the original performance claims for NPS fuzzers do not hold, and that standard gray-box fuzzers almost always surpass NPS-based fuzzers. It also contributes an in-depth analysis of the contribution of machine learning and gradient-based mutations in NPS, and proposes new guidelines for benchmarking fuzzing based on machine learning.
    Abstract Testing with randomly generated inputs (fuzzing) has gained significant traction due to its capacity to expose program vulnerabilities automatically. Fuzz testing campaigns generate large amounts of data, making them ideal for the application of machine learning (ML). Neural program smoothing (NPS), a specific family of ML-guided fuzzers, aims to use a neural network as a smooth approximation of the program target for new test case generation. In this paper, we conduct the most extensive evaluation of NPS fuzzers against standard gray-box fuzzers (>11 CPU years and >5.5 GPU years), and make the following contributions: (1) We find that the original performance claims for NPS fuzzers do not hold; a gap we relate to fundamental, implementation, and experimental limitations of prior works. (2) We contribute the first in-depth analysis of the contribution of machine learning and gradient-based mutations in NPS. (3) We implement Neuzz++, which shows that addressing the practical limitations of NPS fuzzers improves performance, but that standard gray-box fuzzers almost always surpass NPS-based fuzzers. (4) As a consequence, we propose new guidelines targeted at benchmarking fuzzing based on machine learning, and present MLFuzz, a platform with GPU access for easy and reproducible evaluation of ML-based fuzzers. Neuzz++, MLFuzz, and all our data are public.

“AI enhances our performance, I have no doubt this one will do the same”: The Placebo effect is robust to negative descriptions of AI

  • paper_url: http://arxiv.org/abs/2309.16606
  • repo_url: None
  • paper_authors: Agnes M. Kloft, Robin Welsch, Thomas Kosch, Steeven Villa
  • for: investigate the impact of user expectations on human-AI interactions and evaluate the effectiveness of AI systems.
  • methods: used a letter discrimination task and a Bayesian analysis to study the impact of AI descriptions on participant performance, and used cognitive modeling to trace the advantage back to participants gathering more information.
  • results: Participants performed better when they believed an AI was present, even though no AI was actually used, and negative AI descriptions did not alter these expectations.
    Abstract Heightened AI expectations facilitate performance in human-AI interactions through placebo effects. While lowering expectations to control for placebo effects is advisable, overly negative expectations could induce nocebo effects. In a letter discrimination task, we informed participants that an AI would either increase or decrease their performance by adapting the interface, but in reality, no AI was present in any condition. A Bayesian analysis showed that participants had high expectations and performed descriptively better irrespective of the AI description when a sham-AI was present. Using cognitive modeling, we could trace this advantage back to participants gathering more information. A replication study verified that negative AI descriptions do not alter expectations, suggesting that performance expectations with AI are biased and robust to negative verbal descriptions. We discuss the impact of user expectations on AI interactions and evaluation and provide a behavioral placebo marker for human-AI interaction

Transfer Learning for Bayesian Optimization on Heterogeneous Search Spaces

  • paper_url: http://arxiv.org/abs/2309.16597
  • repo_url: None
  • paper_authors: Zhou Fan, Xinran Han, Zi Wang
  • for: Black-box function optimization across heterogeneous search spaces.
  • methods: Bayesian optimization (BO) combined with transfer learning: MPHD, a model pre-training method on heterogeneous domains, uses a neural net mapping from domain-specific contexts to specifications of hierarchical Gaussian processes, and integrates seamlessly with BO. (A generic BO loop is sketched after this entry for context.)
  • results: Theoretical and empirical results demonstrate the validity of MPHD and its superior performance on challenging black-box function optimization tasks, transferring knowledge across search spaces from different domains.
    Abstract Bayesian optimization (BO) is a popular black-box function optimization method, which makes sequential decisions based on a Bayesian model, typically a Gaussian process (GP), of the function. To ensure the quality of the model, transfer learning approaches have been developed to automatically design GP priors by learning from observations on "training" functions. These training functions are typically required to have the same domain as the "test" function (black-box function to be optimized). In this paper, we introduce MPHD, a model pre-training method on heterogeneous domains, which uses a neural net mapping from domain-specific contexts to specifications of hierarchical GPs. MPHD can be seamlessly integrated with BO to transfer knowledge across heterogeneous search spaces. Our theoretical and empirical results demonstrate the validity of MPHD and its superior performance on challenging black-box function optimization tasks.
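
As a rough illustration of the standard BO loop that transfer-learning approaches such as MPHD build on, here is a minimal sketch using scikit-learn's Gaussian process regressor and an expected-improvement acquisition. The toy objective, kernel choice, and search range are illustrative assumptions, not part of the paper's method.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(X_cand, gp, y_best):
    """EI acquisition: how much each candidate is expected to improve on y_best (minimization)."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def objective(x):                        # toy black-box function (illustrative only)
    return np.sin(3 * x) + 0.5 * x ** 2

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(5, 1))      # small initial design
y = objective(X).ravel()

for _ in range(20):                      # sequential decisions based on the GP surrogate
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    X_cand = np.linspace(-2, 2, 500).reshape(-1, 1)
    x_next = X_cand[np.argmax(expected_improvement(X_cand, gp, y.min()))]
    X = np.vstack([X, [x_next]])
    y = np.append(y, objective(x_next))

print("best x:", X[np.argmin(y)].item(), "best f:", y.min())
```

MPHD's contribution sits one level above this loop: it pre-trains the GP prior from observations on other (heterogeneous) domains rather than using a fixed kernel as in this sketch.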

Can LLMs Effectively Leverage Graph Structural Information: When and Why

  • paper_url: http://arxiv.org/abs/2309.16595
  • repo_url: https://github.com/trais-lab/llm-structured-data
  • paper_authors: Jin Huang, Xingjian Zhang, Qiaozhu Mei, Jiaqi Ma
  • for: This paper studies augmenting large language models (LLMs) with structured data, particularly graph data, an underexplored modality, to improve node classification performance.
  • methods: The authors use a variety of prompting methods to encode structural information, in settings where textual node features are either scarce or rich.
  • results: The study finds that (i) LLMs can benefit from structural information, especially when textual node features are scarce; (ii) there is no substantial evidence that LLM performance is significantly attributed to data leakage; and (iii) LLM performance on a target node is strongly positively correlated with the node's local homophily ratio.
    Abstract This paper studies Large Language Models (LLMs) augmented with structured data--particularly graphs--a crucial data modality that remains underexplored in the LLM literature. We aim to understand when and why the incorporation of structural information inherent in graph data can improve the prediction performance of LLMs on node classification tasks with textual features. To address the ``when'' question, we examine a variety of prompting methods for encoding structural information, in settings where textual node features are either rich or scarce. For the ``why'' questions, we probe into two potential contributing factors to the LLM performance: data leakage and homophily. Our exploration of these questions reveals that (i) LLMs can benefit from structural information, especially when textual node features are scarce; (ii) there is no substantial evidence indicating that the performance of LLMs is significantly attributed to data leakage; and (iii) the performance of LLMs on a target node is strongly positively related to the local homophily ratio of the node. Codes and datasets are at: \url{https://github.com/TRAIS-Lab/LLM-Structured-Data}.
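
One way to picture the "structural prompting" studied here is to serialize a target node's text together with its neighbors' texts into the prompt. The helper below is a hypothetical sketch of such a template (the function name, wording, and example texts are assumptions, not the authors' exact prompts).

```python
def build_node_prompt(node_text, neighbors, max_neighbors=5):
    """Hypothetical prompt exposing local graph structure to an LLM.

    `neighbors` is a list of (neighbor_text, label_or_None) tuples.
    """
    lines = [f"Target paper: {node_text}", "It cites / is cited by:"]
    for text, label in neighbors[:max_neighbors]:
        suffix = f" (category: {label})" if label is not None else ""
        lines.append(f"- {text}{suffix}")
    lines.append("Question: which category does the target paper belong to?")
    return "\n".join(lines)

prompt = build_node_prompt(
    node_text="Attention Is All You Need",
    neighbors=[("Neural Machine Translation by Jointly Learning to Align and Translate", "cs.CL")],
)
print(prompt)
```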

  • paper_url: http://arxiv.org/abs/2309.16593
  • repo_url: None
  • paper_authors: Satvik Garg, Shivam Parikh, Somya Garg
  • for: This paper is written for researchers and healthcare professionals who are interested in using knowledge graphs (KGs) in healthcare AI, specifically in drug discovery and pharmaceutical research.
  • methods: The paper discusses various methods for constructing and utilizing KGs in healthcare AI, including knowledge-infused learning, relationship extraction, and reasoning.
  • results: The paper highlights the potential of KGs in healthcare AI to improve interpretability and support decision-making, with applications in areas such as Drug-Drug Interactions (DDI), Drug Target Interactions (DTI), Drug Development (DD), Adverse Drug Reactions (ADR), and bioinformatics. The paper also emphasizes the importance of making KGs more interpretable in healthcare.
    Abstract Knowledge graphs (KGs) are gaining prominence in Healthcare AI, especially in drug discovery and pharmaceutical research as they provide a structured way to integrate diverse information sources, enhancing AI system interpretability. This interpretability is crucial in healthcare, where trust and transparency matter, and eXplainable AI (XAI) supports decision making for healthcare professionals. This overview summarizes recent literature on the impact of KGs in healthcare and their role in developing explainable AI models. We cover KG workflow, including construction, relationship extraction, reasoning, and their applications in areas like Drug-Drug Interactions (DDI), Drug Target Interactions (DTI), Drug Development (DD), Adverse Drug Reactions (ADR), and bioinformatics. We emphasize the importance of making KGs more interpretable through knowledge-infused learning in healthcare. Finally, we highlight research challenges and provide insights for future directions.

The ARRT of Language-Models-as-a-Service: Overview of a New Paradigm and its Challenges

  • paper_url: http://arxiv.org/abs/2309.16573
  • repo_url: None
  • paper_authors: Emanuele La Malfa, Aleksandar Petrov, Simon Frieder, Christoph Weinhuber, Ryan Burnell, Anthony G. Cohn, Nigel Shadbolt, Michael Wooldridge
  • for: This work characterizes the difficulties and challenges of Language-Models-as-a-Service (LMaaS) and how accessibility, replicability, reliability, and trustworthiness (ARRT) can be improved.
  • methods: The paper systematically describes the obstacles created by the current lack of information about major LMaaS, and provides recommendations and directions for future development.
  • results: The results show that current major LMaaS face accessibility, replicability, reliability, and trustworthiness (ARRT) challenges, and the paper offers recommendations and directions for future work.
    Abstract Some of the most powerful language models currently are proprietary systems, accessible only via (typically restrictive) web or software programming interfaces. This is the Language-Models-as-a-Service (LMaaS) paradigm. Contrasting with scenarios where full model access is available, as in the case of open-source models, such closed-off language models create specific challenges for evaluating, benchmarking, and testing them. This paper has two goals: on the one hand, we delineate how the aforementioned challenges act as impediments to the accessibility, replicability, reliability, and trustworthiness (ARRT) of LMaaS. We systematically examine the issues that arise from a lack of information about language models for each of these four aspects. We shed light on current solutions, provide some recommendations, and highlight the directions for future advancements. On the other hand, it serves as a one-stop-shop for the extant knowledge about current, major LMaaS, offering a synthesized overview of the licences and capabilities their interfaces offer.

Augment to Interpret: Unsupervised and Inherently Interpretable Graph Embeddings

  • paper_url: http://arxiv.org/abs/2309.16564
  • repo_url: https://github.com/euranova/augment_to_interpret
  • paper_authors: Gregory Scafarto, Madalina Ciortan, Simon Tihon, Quentin Ferre
  • for: This paper proposes an inherently interpretable graph representation learning method, in line with recent transparent-AI regulations.
  • methods: The method uses data augmentation that preserves semantics to learn interpretable embeddings.
  • results: An experimental study shows that the method provides state-of-the-art performance on downstream tasks while offering interpretability.
    Abstract Unsupervised learning allows us to leverage unlabelled data, which has become abundantly available, and to create embeddings that are usable on a variety of downstream tasks. However, the typical lack of interpretability of unsupervised representation learning has become a limiting factor with regard to recent transparent-AI regulations. In this paper, we study graph representation learning and we show that data augmentation that preserves semantics can be learned and used to produce interpretations. Our framework, which we named INGENIOUS, creates inherently interpretable embeddings and eliminates the need for costly additional post-hoc analysis. We also introduce additional metrics addressing the lack of formalism and metrics in the understudied area of unsupervised-representation learning interpretability. Our results are supported by an experimental study applied to both graph-level and node-level tasks and show that interpretable embeddings provide state-of-the-art performance on subsequent downstream tasks.

Voting Network for Contour Levee Farmland Segmentation and Classification

  • paper_url: http://arxiv.org/abs/2309.16561
  • repo_url: None
  • paper_authors: Abolfazl Meyarian, Xiaohui Yuan
  • for: This paper addresses segmenting farmlands with contour levees from high-resolution aerial imagery.
  • methods: The paper uses an end-to-end trainable network with multiple voting blocks to perform image classification and segmentation.
  • results: Tested on images from the National Agriculture Imagery Program, the method achieves an average accuracy of 94.34% and improves the F1 score over the previous state of the art by 6.96% and 2.63% on average.
    Abstract High-resolution aerial imagery allows fine details in the segmentation of farmlands. However, small objects and features introduce distortions to the delineation of object boundaries, and larger contextual views are needed to mitigate class confusion. In this work, we present an end-to-end trainable network for segmenting farmlands with contour levees from high-resolution aerial imagery. A fusion block is devised that includes multiple voting blocks to achieve image segmentation and classification. We integrate the fusion block with a backbone and produce both semantic predictions and segmentation slices. The segmentation slices are used to perform majority voting on the predictions. The network is trained to assign the most likely class label of a segment to its pixels, learning the concept of farmlands rather than analyzing constitutive pixels separately. We evaluate our method using images from the National Agriculture Imagery Program. Our method achieved an average accuracy of 94.34\%. Compared to the state-of-the-art methods, the proposed method obtains an improvement of 6.96% and 2.63% in the F1 score on average.
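
The "majority voting on the predictions" step can be illustrated with a small NumPy sketch: given per-pixel class predictions and a segmentation slice assigning each pixel to a segment, every segment is relabeled with its most frequent class. This only illustrates the voting idea, not the paper's network.

```python
import numpy as np

def segment_majority_vote(pixel_classes, segment_ids, num_classes):
    """Assign to every pixel the majority class of its segment.

    pixel_classes : (H, W) int array of per-pixel class predictions
    segment_ids   : (H, W) int array mapping each pixel to a segment
    """
    voted = pixel_classes.copy()
    for seg in np.unique(segment_ids):
        mask = segment_ids == seg
        counts = np.bincount(pixel_classes[mask], minlength=num_classes)
        voted[mask] = counts.argmax()
    return voted

pred = np.array([[0, 0, 1], [1, 1, 1], [2, 2, 2]])
segs = np.array([[0, 0, 0], [0, 1, 1], [1, 1, 1]])
print(segment_majority_vote(pred, segs, num_classes=3))
```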

KLoB: a Benchmark for Assessing Knowledge Locating Methods in Language Models

  • paper_url: http://arxiv.org/abs/2309.16535
  • repo_url: https://github.com/juyiming/klob
  • paper_authors: Yiming Ju, Zheng Zhang
  • for: This work examines whether knowledge storage in language models conforms to the locality hypothesis.
  • methods: The paper proposes the KLoB benchmark for evaluating existing knowledge locating methods.
  • results: KLoB can be used to evaluate existing knowledge locating methods and to reassess the locality hypothesis of factual knowledge.
    Abstract Recently, the Locate-Then-Edit paradigm has emerged as one of the main approaches to changing factual knowledge stored in language models. However, there is a lack of research on whether present locating methods can pinpoint the exact parameters embedding the desired knowledge. Moreover, although many researchers have questioned the validity of the locality hypothesis of factual knowledge, no method has been provided to test this hypothesis for more in-depth discussion and research. Therefore, we introduce KLoB, a benchmark examining three essential properties that a reliable knowledge locating method should satisfy. KLoB can serve as a benchmark for evaluating existing locating methods in language models, and contributes a method for reassessing the validity of the locality hypothesis of factual knowledge. Our benchmark is publicly available at \url{https://github.com/juyiming/KLoB}.

MotionLM: Multi-Agent Motion Forecasting as Language Modeling

  • paper_url: http://arxiv.org/abs/2309.16534
  • repo_url: None
  • paper_authors: Ari Seff, Brian Cera, Dian Chen, Mason Ng, Aurick Zhou, Nigamaa Nayakanti, Khaled S. Refaat, Rami Al-Rfou, Benjamin Sapp
  • for: Reliable forecasting of the future behavior of road agents is a critical task for safe planning in autonomous vehicles.
  • methods: The paper uses a language-modeling formulation to forecast the future behavior of multiple traffic agents, representing continuous trajectories as sequences of discrete motion tokens.
  • results: The proposed method establishes new state-of-the-art performance for multi-agent motion prediction on the Waymo Open Motion Dataset, ranking first on the interactive challenge leaderboard.
    Abstract Reliable forecasting of the future behavior of road agents is a critical component to safe planning in autonomous vehicles. Here, we represent continuous trajectories as sequences of discrete motion tokens and cast multi-agent motion prediction as a language modeling task over this domain. Our model, MotionLM, provides several advantages: First, it does not require anchors or explicit latent variable optimization to learn multimodal distributions. Instead, we leverage a single standard language modeling objective, maximizing the average log probability over sequence tokens. Second, our approach bypasses post-hoc interaction heuristics where individual agent trajectory generation is conducted prior to interactive scoring. Instead, MotionLM produces joint distributions over interactive agent futures in a single autoregressive decoding process. In addition, the model's sequential factorization enables temporally causal conditional rollouts. The proposed approach establishes new state-of-the-art performance for multi-agent motion prediction on the Waymo Open Motion Dataset, ranking 1st on the interactive challenge leaderboard.
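
MotionLM's core move, representing continuous trajectories as discrete motion tokens so that a language-modeling objective applies, can be pictured with a simple uniform quantizer over per-step displacements. The bin count and displacement range below are arbitrary illustrative choices, not the paper's tokenizer.

```python
import numpy as np

def tokenize_trajectory(xy, num_bins=13, max_delta=2.0):
    """Map a (T, 2) trajectory to a sequence of discrete motion tokens.

    Each step's (dx, dy) displacement is uniformly quantized into num_bins bins
    per axis and flattened into a single token id in [0, num_bins**2).
    """
    deltas = np.diff(xy, axis=0)
    bins = np.clip(
        np.round((deltas + max_delta) / (2 * max_delta) * (num_bins - 1)),
        0, num_bins - 1,
    ).astype(int)
    return bins[:, 0] * num_bins + bins[:, 1]

traj = np.array([[0.0, 0.0], [0.5, 0.1], [1.1, 0.3], [1.8, 0.6]])
print(tokenize_trajectory(traj))   # three token ids, one per step
```

A sequence model can then be trained to maximize the log probability of these token sequences, exactly as in standard language modeling.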

Chatmap : Large Language Model Interaction with Cartographic Data

  • paper_url: http://arxiv.org/abs/2310.01429
  • repo_url: None
  • paper_authors: Eren Unlu
  • for: This paper aims to demonstrate the use of a large language model (LLM) to provide a linguistic interface to OpenStreetMap (OSM) data for an arbitrary urban region, allowing users to inquire about location attributes such as touristic appeal or business profitability.
  • methods: The authors fine-tune a relatively small-scale LLM with a small artificial dataset curated by a more capable teacher model to provide the linguistic interface to OSM data.
  • results: The study shows early signs of useful emerging abilities in this context; the embeddings of artificially curated prompts that include OSM data could be instrumental for potential geospatially aware urban Retrieval Augmented Generation (RAG) applications.
    Abstract The swift advancement and widespread availability of foundational Large Language Models (LLMs), complemented by robust fine-tuning methodologies, have catalyzed their adaptation for innovative and industrious applications. Enabling LLMs to recognize and interpret geospatial data, while offering a linguistic access to vast cartographic datasets, is of significant importance. OpenStreetMap (OSM) is the most ambitious open-source global initiative offering detailed urban and rural geographic data, curated by a community of over 10 million contributors, which constitutes a great potential for LLM applications. In this study, we demonstrate the proof of concept and details of the process of fine-tuning a relatively small scale (1B parameters) LLM with a relatively small artificial dataset curated by a more capable teacher model, in order to provide a linguistic interface to the OSM data of an arbitrary urban region. Through this interface, users can inquire about a location's attributes, covering a wide spectrum of concepts, such as its touristic appeal or the potential profitability of various businesses in that vicinity. The study aims to provide an initial guideline for such generative artificial intelligence (AI) adaptations and demonstrate early signs of useful emerging abilities in this context even in minimal computational settings. The embeddings of artificially curated prompts including OSM data are also investigated in detail, which might be instrumental for potential geospatially aware urban Retrieval Augmented Generation (RAG) applications.

From Complexity to Clarity: Analytical Expressions of Deep Neural Network Weights via Clifford’s Geometric Algebra and Convexity

  • paper_url: http://arxiv.org/abs/2309.16512
  • repo_url: None
  • paper_authors: Mert Pilanci
  • for: This paper analyzes neural networks using geometric (Clifford) algebra and convex optimization.
  • methods: Deep ReLU neural networks are trained with a standard regularized loss, and convex optimization is used to characterize the optimal weights.
  • results: The paper shows that the optimal weights are given by wedge products of training samples, and that the training problem reduces to a convex optimization over wedge-product features, where $\ell_1$ regularization selects a small subset of samples and discovers only the relevant wedge-product features.
    Abstract In this paper, we introduce a novel analysis of neural networks based on geometric (Clifford) algebra and convex optimization. We show that optimal weights of deep ReLU neural networks are given by the wedge product of training samples when trained with standard regularized loss. Furthermore, the training problem reduces to convex optimization over wedge product features, which encode the geometric structure of the training dataset. This structure is given in terms of signed volumes of triangles and parallelotopes generated by data vectors. The convex problem finds a small subset of samples via $\ell_1$ regularization to discover only relevant wedge product features. Our analysis provides a novel perspective on the inner workings of deep neural networks and sheds light on the role of the hidden layers.
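
The geometric objects the analysis relies on, signed volumes of parallelotopes spanned by data vectors, reduce numerically to determinants. A tiny sketch of that correspondence (an illustration of the geometry, not the paper's training procedure):

```python
import numpy as np

def signed_volume(vectors):
    """Signed volume of the parallelotope spanned by k vectors in R^k.

    For k = 2 this is the signed area of the parallelogram, i.e. the magnitude
    of the wedge product v1 ^ v2.
    """
    return np.linalg.det(np.stack(vectors))

v1, v2 = np.array([1.0, 0.0]), np.array([1.0, 1.0])
print(signed_volume([v1, v2]))   #  1.0
print(signed_volume([v2, v1]))   # -1.0 : swapping the vectors flips the orientation
```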

Toloka Visual Question Answering Benchmark

  • paper_url: http://arxiv.org/abs/2309.16511
  • repo_url: None
  • paper_authors: Dmitry Ustalov, Nikita Pavlichenko, Sergey Koshelev, Daniil Likhobaba, Alisa Smirnova
  • for: This paper introduces a new crowdsourced dataset for comparing machine learning systems against human-level performance on the grounding visual question answering task.
  • methods: The authors evaluate open-source zero-shot baseline models and organize a multi-phase competition at the WSDM Cup.
  • results: According to the intersection-over-union evaluation score, no machine learning model had surpassed the non-expert crowdsourcing baseline by the time of paper submission.
    Abstract In this paper, we present Toloka Visual Question Answering, a new crowdsourced dataset allowing comparing performance of machine learning systems against human level of expertise in the grounding visual question answering task. In this task, given an image and a textual question, one has to draw the bounding box around the object correctly responding to that question. Every image-question pair contains the response, with only one correct response per image. Our dataset contains 45,199 pairs of images and questions in English, provided with ground truth bounding boxes, split into train and two test subsets. Besides describing the dataset and releasing it under a CC BY license, we conducted a series of experiments on open source zero-shot baseline models and organized a multi-phase competition at WSDM Cup that attracted 48 participants worldwide. However, by the time of paper submission, no machine learning model outperformed the non-expert crowdsourcing baseline according to the intersection over union evaluation score.
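
The intersection-over-union score used to compare predicted and ground-truth boxes is the standard computation below (boxes given as (x1, y1, x2, y2)); this is the generic metric, not the benchmark's evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```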

Asset Bundling for Wind Power Forecasting

  • paper_url: http://arxiv.org/abs/2309.16492
  • repo_url: None
  • paper_authors: Hanyu Zhang, Mathieu Tanneau, Chaofan Huang, V. Roshan Joseph, Shangkun Wang, Pascal Van Hentenryck
  • for: This paper aims to improve forecasting accuracy for wind power in the grid, especially given the large variability of wind generation.
  • methods: The paper proposes a novel Bundle-Predict-Reconcile (BPR) framework that integrates asset bundling, machine learning, and forecast reconciliation. BPR first learns an intermediate hierarchy level (the bundles), then predicts wind power at the asset, bundle, and fleet levels, and finally reconciles all forecasts to ensure consistency; this introduces an auxiliary learning task (predicting the bundle-level time series) that helps the main learning tasks.
  • results: Experimental results show that the BPR framework clearly improves forecast accuracy in practice, particularly at the fleet level.
    Abstract The growing penetration of intermittent, renewable generation in US power grids, especially wind and solar generation, results in increased operational uncertainty. In that context, accurate forecasts are critical, especially for wind generation, which exhibits large variability and is historically harder to predict. To overcome this challenge, this work proposes a novel Bundle-Predict-Reconcile (BPR) framework that integrates asset bundling, machine learning, and forecast reconciliation techniques. The BPR framework first learns an intermediate hierarchy level (the bundles), then predicts wind power at the asset, bundle, and fleet level, and finally reconciles all forecasts to ensure consistency. This approach effectively introduces an auxiliary learning task (predicting the bundle-level time series) to help the main learning tasks. The paper also introduces new asset-bundling criteria that capture the spatio-temporal dynamics of wind power time series. Extensive numerical experiments are conducted on an industry-size dataset of 283 wind farms in the MISO footprint. The experiments consider short-term and day-ahead forecasts, and evaluates a large variety of forecasting models that include weather predictions as covariates. The results demonstrate the benefits of BPR, which consistently and significantly improves forecast accuracy over baselines, especially at the fleet level.
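
The reconciliation step, making asset-, bundle-, and fleet-level forecasts sum consistently, can be illustrated with the simplest bottom-up scheme: keep the asset forecasts and recompute the aggregates from them. The BPR framework uses a more sophisticated reconciliation; this sketch only shows the hierarchical consistency constraint.

```python
import numpy as np

def bottom_up_reconcile(asset_fcst, bundle_of_asset):
    """Recompute bundle- and fleet-level forecasts from asset-level ones.

    asset_fcst      : (n_assets, horizon) array of asset forecasts
    bundle_of_asset : length-n_assets array mapping each asset to a bundle id
    """
    bundles = np.unique(bundle_of_asset)
    bundle_fcst = np.stack(
        [asset_fcst[bundle_of_asset == b].sum(axis=0) for b in bundles]
    )
    fleet_fcst = asset_fcst.sum(axis=0)
    return bundle_fcst, fleet_fcst

assets = np.array([[10.0, 12.0], [5.0, 4.0], [8.0, 9.0]])   # 3 assets, 2-step horizon
bundle_fcst, fleet_fcst = bottom_up_reconcile(assets, np.array([0, 0, 1]))
print(bundle_fcst)   # [[15. 16.] [ 8.  9.]]
print(fleet_fcst)    # [23. 25.]
```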

Augmenting LLMs with Knowledge: A survey on hallucination prevention

  • paper_url: http://arxiv.org/abs/2309.16459
  • repo_url: None
  • paper_authors: Konstantinos Andriopoulos, Johan Pouwelse
  • for: This survey examines how integrating large pre-trained language models with external knowledge sources can address problems of traditional language models such as hallucinations, un-grounded answers, and scalability.
  • methods: The survey covers large pre-trained language models combined with differentiable access mechanisms to external knowledge sources, including external knowledge bases and search engines; these augmented language models keep the standard objective of predicting missing tokens while leveraging diverse, possibly non-parametric external modules.
  • results: The survey finds that integrating large pre-trained language models with knowledge sources can address issues such as hallucinations, un-grounded answers, and scalability, and improves the models' ability to access and process knowledge.
    Abstract Large pre-trained language models have demonstrated their proficiency in storing factual knowledge within their parameters and achieving remarkable results when fine-tuned for downstream natural language processing tasks. Nonetheless, their capacity to access and manipulate knowledge with precision remains constrained, resulting in performance disparities on knowledge-intensive tasks when compared to task-specific architectures. Additionally, the challenges of providing provenance for model decisions and maintaining up-to-date world knowledge persist as open research frontiers. To address these limitations, the integration of pre-trained models with differentiable access mechanisms to explicit non-parametric memory emerges as a promising solution. This survey delves into the realm of language models (LMs) augmented with the ability to tap into external knowledge sources, including external knowledge bases and search engines. While adhering to the standard objective of predicting missing tokens, these augmented LMs leverage diverse, possibly non-parametric external modules to augment their contextual processing capabilities, departing from the conventional language modeling paradigm. Through an exploration of current advancements in augmenting large language models with knowledge, this work concludes that this emerging research direction holds the potential to address prevalent issues in traditional LMs, such as hallucinations, un-grounded responses, and scalability challenges.
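
The retrieval-augmented pattern covered by the survey boils down to: retrieve the most relevant passages from an external store, then condition the language model on them. A minimal TF-IDF-based sketch follows; the corpus, query, and the final hand-off to a generator are placeholders, not any specific system from the survey.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Mount Everest is the highest mountain above sea level.",
    "The Great Barrier Reef lies off the coast of Queensland, Australia.",
]

def retrieve(query, corpus, k=1):
    """Return the k corpus passages most similar to the query (TF-IDF retriever)."""
    vec = TfidfVectorizer().fit(corpus + [query])
    sims = cosine_similarity(vec.transform([query]), vec.transform(corpus))[0]
    return [corpus[i] for i in sims.argsort()[::-1][:k]]

query = "When was the Eiffel Tower finished?"
context = "\n".join(retrieve(query, corpus))
prompt = f"Answer using only the context.\nContext:\n{context}\nQuestion: {query}"
print(prompt)   # this prompt would then be passed to the language model
```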

Neuro Symbolic Reasoning for Planning: Counterexample Guided Inductive Synthesis using Large Language Models and Satisfiability Solving

  • paper_url: http://arxiv.org/abs/2309.16436
  • repo_url: None
  • paper_authors: Sumit Kumar Jha, Susmit Jha, Patrick Lincoln, Nathaniel D. Bastian, Alvaro Velasquez, Rickard Ewetz, Sandeep Neema
  • for: The method targets generating formal artifacts, such as code, plans, and logical specifications, that satisfy given requirements.
  • methods: Generative large language models (LLMs), guided by human-provided prompts, produce candidate solutions; a logical reasoning engine (an SMT solver) analyzes the generated solutions, produces counterexamples when they are incorrect, and feeds that feedback back to the LLMs.
  • results: Evaluated on planning tasks in the blocks domain, the approach generates formal artifacts that satisfy the requirements, lets non-expert users describe problems in natural language, and shows that combining LLMs with an SMT engine can produce reliable solutions.
    Abstract Generative large language models (LLMs) with instruct training such as GPT-4 can follow human-provided instruction prompts and generate human-like responses to these prompts. Apart from natural language responses, they have also been found to be effective at generating formal artifacts such as code, plans, and logical specifications from natural language prompts. Despite their remarkably improved accuracy, these models are still known to produce factually incorrect or contextually inappropriate results despite their syntactic coherence - a phenomenon often referred to as hallucination. This limitation makes it difficult to use these models to synthesize formal artifacts that are used in safety-critical applications. Unlike tasks such as text summarization and question-answering, bugs in code, plan, and other formal artifacts produced by LLMs can be catastrophic. We posit that we can use the satisfiability modulo theory (SMT) solvers as deductive reasoning engines to analyze the generated solutions from the LLMs, produce counterexamples when the solutions are incorrect, and provide that feedback to the LLMs exploiting the dialog capability of instruct-trained LLMs. This interaction between inductive LLMs and deductive SMT solvers can iteratively steer the LLM to generate the correct response. In our experiments, we use planning over the domain of blocks as our synthesis task for evaluating our approach. We use GPT-4, GPT3.5 Turbo, Davinci, Curie, Babbage, and Ada as the LLMs and Z3 as the SMT solver. Our method allows the user to communicate the planning problem in natural language; even the formulation of queries to SMT solvers is automatically generated from natural language. Thus, the proposed technique can enable non-expert users to describe their problems in natural language, and the combination of LLMs and SMT solvers can produce provably correct solutions.
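
The inductive-LLM / deductive-SMT loop can be sketched as: ask the model for a candidate, encode a correctness check in Z3, and feed any counterexample back into the next prompt. In the sketch below `ask_llm` is a placeholder for the instruct-tuned model, and the tiny specification (find an expression e with e(x) >= x and e(x) >= -x for all integers x) stands in for a planning query; none of this is the paper's actual pipeline.

```python
from z3 import Int, Solver, And, Not, If, sat

def ask_llm(feedback):
    """Placeholder for the instruct-tuned LLM proposing candidate expressions.

    A real system would turn the spec plus `feedback` (previous counterexamples)
    into a prompt and parse the model's reply; here we step through a canned list.
    """
    return "x" if not feedback else "If(x >= 0, x, -x)"

def find_counterexample(expr_str):
    """Ask Z3 whether expr(x) >= x and expr(x) >= -x holds for every integer x."""
    x = Int("x")
    expr = eval(expr_str, {"x": x, "If": If})   # candidate is a tiny Z3 term
    s = Solver()
    s.add(Not(And(expr >= x, expr >= -x)))      # search for a violating input
    return s.model()[x] if s.check() == sat else None

feedback = []
for _ in range(5):
    candidate = ask_llm(feedback)
    cex = find_counterexample(candidate)
    if cex is None:
        print("verified candidate:", candidate)
        break
    feedback.append(f"{candidate} fails for x = {cex}")
    print(feedback[-1])   # this message would be appended to the next prompt
```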

Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation

  • paper_url: http://arxiv.org/abs/2309.16429
  • repo_url: https://github.com/guyyariv/TempoTokens
  • paper_authors: Guy Yariv, Itai Gat, Sagie Benaim, Lior Wolf, Idan Schwartz, Yossi Adi
  • for: The goal is to generate diverse and realistic videos guided by audio samples from a wide variety of semantic classes.
  • methods: The method uses an existing text-conditioned video generation model and a pre-trained audio encoder. It is based on a lightweight adaptor network that learns to map the audio-based representation to the input representation expected by the video generator, which also enables generation conditioned on text, audio, or both.
  • results: The method is validated extensively on three datasets, and a new evaluation metric (AV-Align) is proposed to assess how well generated videos align with the input audio. The generated videos match the input audio better in both content and timing, and also show higher visual quality and greater diversity.
    Abstract We consider the task of generating diverse and realistic videos guided by natural audio samples from a wide variety of semantic classes. For this task, the videos are required to be aligned both globally and temporally with the input audio: globally, the input audio is semantically associated with the entire output video, and temporally, each segment of the input audio is associated with a corresponding segment of that video. We utilize an existing text-conditioned video generation model and a pre-trained audio encoder model. The proposed method is based on a lightweight adaptor network, which learns to map the audio-based representation to the input representation expected by the text-to-video generation model. As such, it also enables video generation conditioned on text, audio, and, for the first time as far as we can ascertain, on both text and audio. We validate our method extensively on three datasets demonstrating significant semantic diversity of audio-video samples and further propose a novel evaluation metric (AV-Align) to assess the alignment of generated videos with input audio samples. AV-Align is based on the detection and comparison of energy peaks in both modalities. In comparison to recent state-of-the-art approaches, our method generates videos that are better aligned with the input sound, both with respect to content and temporal axis. We also show that videos produced by our method present higher visual quality and are more diverse.

Prompt-and-Align: Prompt-Based Social Alignment for Few-Shot Fake News Detection

  • paper_url: http://arxiv.org/abs/2309.16424
  • repo_url: https://github.com/jiayingwu19/prompt-and-align
  • paper_authors: Jiaying Wu, Shen Li, Ailin Deng, Miao Xiong, Bryan Hooi
  • for: This paper proposes a prompt-based few-shot fake news detection method that leverages the prior knowledge of pre-trained language models (PLMs) together with the social context topology.
  • methods: The method first wraps each news article in a task-related textual prompt, which the PLM processes to directly elicit task-specific knowledge. To supplement the PLM with social context without additional training overhead, it also constructs a news proximity graph that captures veracity-consistent signals among news articles, and aligns the prompting predictions along the graph edges in a confidence-informed manner.
  • results: Extensive experiments on three real-world benchmarks show that the method sets a new state of the art for few-shot fake news detection, mitigating label scarcity compared with traditional train-from-scratch approaches.
    Abstract Despite considerable advances in automated fake news detection, due to the timely nature of news, it remains a critical open question how to effectively predict the veracity of news articles based on limited fact-checks. Existing approaches typically follow a "Train-from-Scratch" paradigm, which is fundamentally bounded by the availability of large-scale annotated data. While expressive pre-trained language models (PLMs) have been adapted in a "Pre-Train-and-Fine-Tune" manner, the inconsistency between pre-training and downstream objectives also requires costly task-specific supervision. In this paper, we propose "Prompt-and-Align" (P&A), a novel prompt-based paradigm for few-shot fake news detection that jointly leverages the pre-trained knowledge in PLMs and the social context topology. Our approach mitigates label scarcity by wrapping the news article in a task-related textual prompt, which is then processed by the PLM to directly elicit task-specific knowledge. To supplement the PLM with social context without inducing additional training overheads, motivated by empirical observation on user veracity consistency (i.e., social users tend to consume news of the same veracity type), we further construct a news proximity graph among news articles to capture the veracity-consistent signals in shared readerships, and align the prompting predictions along the graph edges in a confidence-informed manner. Extensive experiments on three real-world benchmarks demonstrate that P&A sets new states-of-the-art for few-shot fake news detection performance by significant margins.

AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models

  • paper_url: http://arxiv.org/abs/2309.16414
  • repo_url: None
  • paper_authors: Jan Hendrik Metzen, Piyapat Saranrittichai, Chaithanya Kumar Mummadi
  • for: This paper proposes a method for auto-tuning zero-shot classifiers to improve their performance across image classification tasks.
  • methods: The method builds on the CLIP vision-language model and class descriptors generated from different prompt templates. At inference time, it computes per-image weights for each prompt template based on the similarities between the image encoding and the class descriptors, to improve the zero-shot classifier.
  • results: The method outperforms baselines across a broad range of vision-language models, datasets, and prompt templates, improving zero-shot accuracy by up to 3 percentage points.
    Abstract Classifiers built upon vision-language models such as CLIP have shown remarkable zero-shot performance across a broad range of image classification tasks. Prior work has studied different ways of automatically creating descriptor sets for every class based on prompt templates, ranging from manually engineered templates over templates obtained from a large language model to templates built from random words and characters. Up until now, deriving zero-shot classifiers from the respective encoded class descriptors has remained nearly unchanged, i.e., classify to the class that maximizes cosine similarity between its averaged encoded class descriptors and the image encoding. However, weighing all class descriptors equally can be suboptimal when certain descriptors match visual clues on a given image better than others. In this work, we propose AutoCLIP, a method for auto-tuning zero-shot classifiers. AutoCLIP tunes per-image weights to each prompt template at inference time, based on statistics of class descriptor-image similarities. AutoCLIP is fully unsupervised, has very low computational overhead, and can be easily implemented in few lines of code. We show that AutoCLIP outperforms baselines across a broad range of vision-language models, datasets, and prompt templates consistently and by up to 3 percent point accuracy.
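
The idea of per-image template weighting can be sketched with plain NumPy: instead of averaging a class's encoded descriptors uniformly, weight each template by a softmax over its similarity to the image embedding. Random vectors stand in for the CLIP encoders here, and the weighting below is a conceptual sketch, not the paper's exact scheme.

```python
import numpy as np

def softmax(x, temperature=0.1):
    z = (x - x.max()) / temperature
    e = np.exp(z)
    return e / e.sum()

def autotuned_logits(image_emb, class_descriptor_embs):
    """class_descriptor_embs: (num_classes, num_templates, dim), all L2-normalized."""
    logits = []
    for descs in class_descriptor_embs:
        sims = descs @ image_emb        # similarity of each template's descriptor to the image
        weights = softmax(sims)         # per-image template weights
        logits.append(weights @ sims)   # weighted average instead of a plain mean
    return np.array(logits)

rng = np.random.default_rng(0)
def unit(v): return v / np.linalg.norm(v, axis=-1, keepdims=True)

image_emb = unit(rng.normal(size=16))
descriptors = unit(rng.normal(size=(3, 5, 16)))   # 3 classes x 5 prompt templates
print(autotuned_logits(image_emb, descriptors).argmax())   # predicted class index
```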

Genetic Engineering Algorithm (GEA): An Efficient Metaheuristic Algorithm for Solving Combinatorial Optimization Problems

  • paper_url: http://arxiv.org/abs/2309.16413
  • repo_url: None
  • paper_authors: Majid Sohrabi, Amir M. Fathollahi-Fard, Vasilii A. Gromov
  • for: This work proposes a new metaheuristic algorithm inspired by genetic engineering concepts to address the limitations of genetic algorithms on combinatorial optimization problems.
  • methods: The algorithm builds on the search procedure of the traditional GA and adds the ability to isolate, purify, insert, and express new genes based on existing ones, enabling the emergence of desired traits and the production of specific chromosomes from the selected genes.
  • results: Compared with state-of-the-art algorithms on benchmark instances, GEA shows superior performance, demonstrating its potential as an innovative and efficient solution for combinatorial optimization problems.
    Abstract Genetic Algorithms (GAs) are known for their efficiency in solving combinatorial optimization problems, thanks to their ability to explore diverse solution spaces, handle various representations, exploit parallelism, preserve good solutions, adapt to changing dynamics, handle combinatorial diversity, and provide heuristic search. However, limitations such as premature convergence, lack of problem-specific knowledge, and randomness of crossover and mutation operators make GAs generally inefficient in finding an optimal solution. To address these limitations, this paper proposes a new metaheuristic algorithm called the Genetic Engineering Algorithm (GEA) that draws inspiration from genetic engineering concepts. GEA redesigns the traditional GA while incorporating new search methods to isolate, purify, insert, and express new genes based on existing ones, leading to the emergence of desired traits and the production of specific chromosomes based on the selected genes. Comparative evaluations against state-of-the-art algorithms on benchmark instances demonstrate the superior performance of GEA, showcasing its potential as an innovative and efficient solution for combinatorial optimization problems.
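
For reference, the baseline GA machinery that GEA redesigns, selection, crossover, and mutation over bit-string chromosomes, fits in a few lines. The sketch below runs a generic GA on a toy one-max objective; it illustrates the classical operators only, not the GEA operators.

```python
import random

def one_max(chromosome):                 # toy objective: count the ones
    return sum(chromosome)

def evolve(pop_size=30, length=20, generations=50, mutation_rate=0.02):
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=one_max, reverse=True)
        parents = pop[: pop_size // 2]                 # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, length)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (random.random() < mutation_rate) for bit in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=one_max)

random.seed(0)
best = evolve()
print(one_max(best), best)
```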

Physics-Preserving AI-Accelerated Simulations of Plasma Turbulence

  • paper_url: http://arxiv.org/abs/2309.16400
  • repo_url: None
  • paper_authors: Robin Greif, Frank Jenko, Nils Thuerey
  • for: study the turbulence in fluids, gases, and plasmas with reduced computational effort
  • methods: combine Large Eddy Simulation (LES) techniques with Machine Learning (ML) to model the small-scale dynamics
  • results: reduce the computational effort by about three orders of magnitude while retaining the statistical physical properties of the turbulent system
    Abstract Turbulence in fluids, gases, and plasmas remains an open problem of both practical and fundamental importance. Its irreducible complexity usually cannot be tackled computationally in a brute-force style. Here, we combine Large Eddy Simulation (LES) techniques with Machine Learning (ML) to retain only the largest dynamics explicitly, while small-scale dynamics are described by an ML-based sub-grid-scale model. Applying this novel approach to self-driven plasma turbulence allows us to remove large parts of the inertial range, reducing the computational effort by about three orders of magnitude, while retaining the statistical physical properties of the turbulent system.

Uncertainty-Aware Decision Transformer for Stochastic Driving Environments

  • paper_url: http://arxiv.org/abs/2309.16397
  • repo_url: None
  • paper_authors: Zenan Li, Fan Nie, Qiao Sun, Fang Da, Hang Zhao
  • for: This work studies offline reinforcement learning (RL), a framework for learning policies without active interaction, which makes it especially appealing for autonomous driving tasks.
  • methods: The paper proposes the UNcertainty-awaRE deciSion Transformer (UNREST), which learns policies for stochastic driving environments without introducing additional transition or complex generative models. UNREST estimates state uncertainties via the conditional mutual information between transitions and returns, segments sequences accordingly, and replaces global returns with less uncertain truncated returns.
  • results: Experimental results show that UNREST performs well across various driving scenarios; it also dynamically evaluates environmental uncertainty during inference to plan more cautiously.
    Abstract Offline Reinforcement Learning (RL) has emerged as a promising framework for learning policies without active interactions, making it especially appealing for autonomous driving tasks. Recent successes of Transformers inspire casting offline RL as sequence modeling, which performs well in long-horizon tasks. However, they are overly optimistic in stochastic environments with incorrect assumptions that the same goal can be consistently achieved by identical actions. In this paper, we introduce an UNcertainty-awaRE deciSion Transformer (UNREST) for planning in stochastic driving environments without introducing additional transition or complex generative models. Specifically, UNREST estimates state uncertainties by the conditional mutual information between transitions and returns, and segments sequences accordingly. Discovering the `uncertainty accumulation' and `temporal locality' properties of driving environments, UNREST replaces the global returns in decision transformers with less uncertain truncated returns, to learn from true outcomes of agent actions rather than environment transitions. We also dynamically evaluate environmental uncertainty during inference for cautious planning. Extensive experimental results demonstrate UNREST's superior performance in various driving scenarios and the power of our uncertainty estimation strategy.

Differential 2D Copula Approximating Transforms via Sobolev Training: 2-Cats Networks

  • paper_url: http://arxiv.org/abs/2309.16391
  • repo_url: https://github.com/flaviovdf/copulae
  • paper_authors: Flavio Figueiredo, José Geraldo Fernandes, Jackson Silva, Renato M. Assunção
  • for: This paper studies how neural networks (NNs) can non-parametrically approximate two-dimensional copulas.
  • methods: The proposed 2-Cats method, inspired by the physics-informed neural networks and Sobolev training literature, non-parametrically estimates the output of a 2D copula while respecting the mathematical properties of a copula C.
  • results: Experimental results show that 2-Cats estimates the output of a 2D copula better than existing methods.
    Abstract Copulas are a powerful statistical tool that captures dependencies across data dimensions. When applying Copulas, we can estimate multivariate distribution functions by initially estimating independent marginals, an easy task, and then a single copulating function, $C$, to connect the marginals, a hard task. For two-dimensional data, a copula is a two-increasing function of the form $C: (u,v)\in \mathbf{I}^2 \rightarrow \mathbf{I}$, where $\mathbf{I} = [0, 1]$. In this paper, we show how Neural Networks (NNs) can approximate any two-dimensional copula non-parametrically. Our approach, denoted as 2-Cats, is inspired by the Physics-Informed Neural Networks and Sobolev Training literature. Not only do we show that we can estimate the output of a 2d Copula better than the state-of-the-art, our approach is non-parametric and respects the mathematical properties of a Copula $C$.
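
The object being approximated, a 2-D copula C(u, v) on the unit square, can also be estimated non-parametrically from data by rank-transforming each margin to (0, 1] and counting. A small empirical-copula sketch (a baseline illustration of the concept, not the 2-Cats network):

```python
import numpy as np

def empirical_copula(x, y, u, v):
    """Empirical copula C(u, v) = P(U <= u, V <= v) from paired samples (x, y)."""
    n = len(x)
    # rank-transform the margins to pseudo-observations in (0, 1]
    u_obs = np.argsort(np.argsort(x)) / n + 1.0 / n
    v_obs = np.argsort(np.argsort(y)) / n + 1.0 / n
    return np.mean((u_obs <= u) & (v_obs <= v))

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = 0.8 * x + 0.6 * rng.normal(size=5000)     # positively dependent pair
print(empirical_copula(x, y, 0.5, 0.5))       # > 0.25, the value under independence
```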

RLLTE: Long-Term Evolution Project of Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.16382
  • repo_url: None
  • paper_authors: Mingqi Yuan, Zequn Zhang, Yang Xu, Shihao Luo, Bo Li, Xin Jin, Wenjun Zeng
  • for: This work provides a long-term-evolution, extremely modular, open-source reinforcement learning (RL) framework together with a complete ecosystem for RL research and application.
  • methods: The RLLTE framework decouples RL algorithms from the exploitation-exploration perspective and provides a large number of components to accelerate algorithm development and evolution.
  • results: RLLTE is the first RL framework to build a complete ecosystem, including model training, evaluation, deployment, a benchmark hub, and a large language model (LLM)-empowered copilot; these elements are expected to set standards for RL engineering practice and to be highly stimulative for both academia and industry.
    Abstract We present RLLTE: a long-term evolution, extremely modular, and open-source framework for reinforcement learning (RL) research and application. Beyond delivering top-notch algorithm implementations, RLLTE also serves as a toolkit for developing algorithms. More specifically, RLLTE decouples the RL algorithms completely from the exploitation-exploration perspective, providing a large number of components to accelerate algorithm development and evolution. In particular, RLLTE is the first RL framework to build a complete and luxuriant ecosystem, which includes model training, evaluation, deployment, benchmark hub, and large language model (LLM)-empowered copilot. RLLTE is expected to set standards for RL engineering practice and be highly stimulative for industry and academia.

Conditional normalizing flows for IceCube event reconstruction

  • paper_url: http://arxiv.org/abs/2309.16380
  • repo_url: https://github.com/asem010/legend-pice
  • paper_authors: Thorsten Glüsenkamp
  • for: This contribution describes how conditional normalizing flows are used to infer the direction and energy of charged-current electron and muon neutrino events.
  • methods: Conditional normalizing flows derive a posterior distribution for each individual event from the raw data, including systematic uncertainties.
  • results: In the 1 TeV to 100 TeV energy range, normalizing flows better capture the direction and energy of these events, especially azimuth-zenith asymmetries that were neglected in previous analyses by assuming symmetrical contours.
    Abstract The IceCube Neutrino Observatory is a cubic-kilometer high-energy neutrino detector deployed in the Antarctic ice. Two major event classes are charged-current electron and muon neutrino interactions. In this contribution, we discuss the inference of direction and energy for these classes using conditional normalizing flows. They allow to derive a posterior distribution for each individual event based on the raw data that can include systematic uncertainties, which makes them very promising for next-generation reconstructions. For each normalizing flow we use the differential entropy and the KL-divergence to its maximum entropy approximation to interpret the results. The normalizing flows correctly incorporate complex optical properties of the Antarctic ice and their relation to the embedded detector. For showers, the differential entropy increases in regions of high photon absorption and decreases in clear ice. For muons, the differential entropy strongly correlates with the contained track length. Coverage is maintained, even for low photon counts and highly asymmetrical contour shapes. For high-photon counts, the distributions get narrower and become more symmetrical, as expected from the asymptotic theorem of Bernstein-von-Mises. For shower directional reconstruction, we find the region between 1 TeV and 100 TeV to potentially benefit the most from normalizing flows because of azimuth-zenith asymmetries which have been neglected in previous analyses by assuming symmetrical contours. Events in this energy range play a vital role in the recent discovery of the galactic plane diffuse neutrino emission.

Epistemic Logic Programs: a study of some properties

  • paper_url: http://arxiv.org/abs/2309.16344
  • repo_url: None
  • paper_authors: Stefania Costantini, Andrea Formisano
  • for: This paper studies Epistemic Logic Programs (ELPs), which extend Answer Set Programming (ASP) with epistemic operators, and different characterizations of their semantics.
  • methods: The paper analyzes semantic approaches to characterizing world views and semantic properties such as the Epistemic Splitting Property, which, when satisfied, allows world views to be computed modularly.
  • results: The paper analyzes the shift from a bottom-up to a top-down approach to splitting, proposes a basic top-down approach proven equivalent to the bottom-up one, and proposes an extended approach that applies to many existing semantics and coincides with the bottom-up notion of splitting on the class of epistemically stratified programs.
    Abstract Epistemic Logic Programs (ELPs), extend Answer Set Programming (ASP) with epistemic operators. The semantics of such programs is provided in terms of world views, which are sets of belief sets, i.e., syntactically, sets of sets of atoms. Different semantic approaches propose different characterizations of world views. Recent work has introduced semantic properties that should be met by any semantics for ELPs, like the Epistemic Splitting Property, that, if satisfied, allows to modularly compute world views in a bottom-up fashion, analogously to ``traditional'' ASP. We analyze the possibility of changing the perspective, shifting from a bottom-up to a top-down approach to splitting. We propose a basic top-down approach, which we prove to be equivalent to the bottom-up one. We then propose an extended approach, where our new definition: (i) is provably applicable to many of the existing semantics; (ii) operates similarly to ``traditional'' ASP; (iii) provably coincides under any semantics with the bottom-up notion of splitting at least on the class of Epistemically Stratified Programs (which are, intuitively, those where the use of epistemic operators is stratified); (iv) better adheres to common ASP programming methodology.

End-to-end Risk Prediction of Atrial Fibrillation from the 12-Lead ECG by Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2309.16335
  • repo_url: https://github.com/mygithth27/af-risk-prediction-by-ecg-dnn
  • paper_authors: Theogene Habineza, Antônio H. Ribeiro, Daniel Gedon, Joachim A. Behar, Antonio Luiz P. Ribeiro, Thomas B. Schön
  • For: The paper aims to develop and evaluate a machine learning algorithm to predict the risk of developing atrial fibrillation (AF) from electrocardiogram (ECG) data.
  • Methods: The authors use a deep neural network model to analyze the ECG data and evaluate the risk of AF. They also use a survival model to predict the probability of developing AF over time.
  • Results: The authors achieve an area under the receiver operating characteristic curve (AUC) score of 0.845, indicating good performance of the model in identifying patients who will develop AF in the future. They also find that patients in the high-risk group are 50% more likely to develop AF within 40 weeks, while patients in the minimal-risk group have more than an 85% chance of remaining AF-free up to seven years.
    Abstract Background: Atrial fibrillation (AF) is one of the most common cardiac arrhythmias that affects millions of people each year worldwide and it is closely linked to increased risk of cardiovascular diseases such as stroke and heart failure. Machine learning methods have shown promising results in evaluating the risk of developing atrial fibrillation from the electrocardiogram. We aim to develop and evaluate one such algorithm on a large CODE dataset collected in Brazil. Results: The deep neural network model identified patients without indication of AF in the presented ECG but who will develop AF in the future with an AUC score of 0.845. From our survival model, we obtain that patients in the high-risk group (i.e. with the probability of a future AF case being greater than 0.7) are 50% more likely to develop AF within 40 weeks, while patients belonging to the minimal-risk group (i.e. with the probability of a future AF case being less than or equal to 0.1) have more than 85% chance of remaining AF free up until after seven years. Conclusion: We developed and validated a model for AF risk prediction. If applied in clinical practice, the model possesses the potential of providing valuable and useful information in decision-making and patient management processes.

Augmenting transformers with recursively composed multi-grained representations

  • paper_url: http://arxiv.org/abs/2309.16319
  • repo_url: https://github.com/ant-research/structuredlm_rtdt
  • paper_authors: Xiang Hu, Qingyang Zhu, Kewei Tu, Wei Wu
  • for: This paper proposes the Recursive Composition Augmented Transformer (ReCAT), which explicitly models the hierarchical syntactic structures of raw text without relying on gold trees during either learning or inference.
  • methods: The model uses a novel Contextual Inside-Outside (CIO) layer that learns contextualized span representations through bottom-up and top-down passes, and stacks these layers with the Transformer to enable deep intra-span and inter-span interactions.
  • results: Experiments show that ReCAT significantly outperforms vanilla Transformer models on various sentence-level and span-level tasks, as well as baselines combining recursive networks with Transformers on natural language understanding tasks. In addition, the hierarchical structures induced by ReCAT are strongly consistent with human-annotated syntactic trees, indicating good interpretability.
    Abstract We present ReCAT, a recursive composition augmented Transformer that is able to explicitly model hierarchical syntactic structures of raw texts without relying on gold trees during both learning and inference. Existing research along this line restricts data to follow a hierarchical tree structure and thus lacks inter-span communications. To overcome the problem, we propose a novel contextual inside-outside (CIO) layer that learns contextualized representations of spans through bottom-up and top-down passes, where a bottom-up pass forms representations of high-level spans by composing low-level spans, while a top-down pass combines information inside and outside a span. By stacking several CIO layers between the embedding layer and the attention layers in Transformer, the ReCAT model can perform both deep intra-span and deep inter-span interactions, and thus generate multi-grained representations fully contextualized with other spans. Moreover, the CIO layers can be jointly pre-trained with Transformers, making ReCAT enjoy scaling ability, strong performance, and interpretability at the same time. We conduct experiments on various sentence-level and span-level tasks. Evaluation results indicate that ReCAT can significantly outperform vanilla Transformer models on all span-level tasks and baselines that combine recursive networks with Transformers on natural language inference tasks. More interestingly, the hierarchical structures induced by ReCAT exhibit strong consistency with human-annotated syntactic trees, indicating good interpretability brought by the CIO layers.
    摘要 我们提出 ReCAT,一种递归组合增强的 Transformer,能够在学习和推理阶段都不依赖黄金句法树,显式地建模原始文本的层次句法结构。现有研究通常要求数据遵循层次树结构,因而缺乏片段之间的通信。为克服这一问题,我们提出了一种新的上下文内部-外部(CIO)层,通过自底向上和自顶向下的传递学习片段的上下文化表示:自底向上的传递通过组合低层片段形成高层片段的表示,自顶向下的传递则融合片段内部与外部的信息。通过在 Transformer 的嵌入层和注意力层之间堆叠若干 CIO 层,ReCAT 能够进行深层的片段内与片段间交互,生成与其他片段充分上下文化的多粒度表示。此外,CIO 层可以与 Transformer 联合预训练,使 ReCAT 同时具备可扩展性、强性能和可解释性。我们在多种句子级和片段级任务上进行了实验,结果表明 ReCAT 在所有片段级任务上显著优于原始 Transformer,并在自然语言推理任务上优于将递归网络与 Transformer 结合的基线。更有趣的是,ReCAT 归纳出的层次结构与人工标注的句法树高度一致,说明 CIO 层带来了良好的可解释性。

Efficiency Separation between RL Methods: Model-Free, Model-Based and Goal-Conditioned

  • paper_url: http://arxiv.org/abs/2309.16291
  • repo_url: None
  • paper_authors: Brieuc Pinon, Raphaël Jungers, Jean-Charles Delvenne
  • for: 本研究证明了一大类广泛使用的强化学习(RL)算法的一个基本效率限制。
  • methods: 该限制既适用于无模型 RL 方法,也适用于一大类基于模型的方法,例如基于树搜索的规划。
  • results: 结果表明,存在一族 RL 问题,使这些方法为找到最优行为所需的环境交互次数随时间跨度呈指数增长;然而,存在一种并非专门针对该问题族的方法,能够高效地解决这些问题。
    Abstract We prove a fundamental limitation on the efficiency of a wide class of Reinforcement Learning (RL) algorithms. This limitation applies to model-free RL methods as well as a broad range of model-based methods, such as planning with tree search. Under an abstract definition of this class, we provide a family of RL problems for which these methods suffer a lower bound exponential in the horizon for their interactions with the environment to find an optimal behavior. However, there exists a method, not tailored to this specific family of problems, which can efficiently solve the problems in the family. In contrast, our limitation does not apply to several types of methods proposed in the literature, for instance, goal-conditioned methods or other algorithms that construct an inverse dynamics model.
    摘要 我们证明了一大类强化学习(RL)算法在效率上的基本限制。该限制既适用于无模型 RL 方法,也适用于一大类基于模型的方法,例如基于树搜索的规划。在对这一类算法的抽象定义下,我们给出了一族 RL 问题,使这些方法为找到最优行为所需的环境交互次数具有随时间跨度呈指数增长的下界。然而,存在一种并非专门针对该问题族的方法,能够高效地解决这族问题。相反,我们的限制并不适用于文献中提出的若干类型的方法,例如目标条件方法或构建逆动力学模型的其他算法。

LawBench: Benchmarking Legal Knowledge of Large Language Models

  • paper_url: http://arxiv.org/abs/2309.16289
  • repo_url: https://github.com/open-compass/lawbench
  • paper_authors: Zhiwei Fei, Xiaoyu Shen, Dawei Zhu, Fengzhe Zhou, Zhuo Han, Songyang Zhang, Kai Chen, Zongwen Shen, Jidong Ge
  • for: 法律领域中的大型自然语言模型(LLMs)的评估和发展
  • methods: 提出了一个全面的评估基准 LawBench,从记忆、理解和应用三个认知层面评估 LLMs 在法律领域中的能力
  • results: GPT-4 在法律领域中表现最佳,超过其他 LLMs 的表现,但是这些 LLMs 在特定的法律任务中仍然有很大的差异,而且需要进一步的调整和改进以取得可靠的结果。
    Abstract Large language models (LLMs) have demonstrated strong capabilities in various aspects. However, when applying them to the highly specialized, safe-critical legal domain, it is unclear how much legal knowledge they possess and whether they can reliably perform legal-related tasks. To address this gap, we propose a comprehensive evaluation benchmark LawBench. LawBench has been meticulously crafted to have precise assessment of the LLMs' legal capabilities from three cognitive levels: (1) Legal knowledge memorization: whether LLMs can memorize needed legal concepts, articles and facts; (2) Legal knowledge understanding: whether LLMs can comprehend entities, events and relationships within legal text; (3) Legal knowledge applying: whether LLMs can properly utilize their legal knowledge and make necessary reasoning steps to solve realistic legal tasks. LawBench contains 20 diverse tasks covering 5 task types: single-label classification (SLC), multi-label classification (MLC), regression, extraction and generation. We perform extensive evaluations of 51 LLMs on LawBench, including 20 multilingual LLMs, 22 Chinese-oriented LLMs and 9 legal specific LLMs. The results show that GPT-4 remains the best-performing LLM in the legal domain, surpassing the others by a significant margin. While fine-tuning LLMs on legal specific text brings certain improvements, we are still a long way from obtaining usable and reliable LLMs in legal tasks. All data, model predictions and evaluation code are released in https://github.com/open-compass/LawBench/. We hope this benchmark provides in-depth understanding of the LLMs' domain-specified capabilities and speed up the development of LLMs in the legal domain.
    摘要 大型语言模型(LLMs)已经在各方面展示出强大的能力。然而,当把它们应用到高度专业化、安全攸关的法律领域时,尚不清楚它们掌握了多少法律知识,以及能否可靠地完成法律相关任务。为了解决这一问题,我们提出了一个全面的评估基准 LawBench。LawBench 经过精心设计,从三个认知层面精确评估 LLMs 的法律能力:(1)法律知识记忆:LLMs 能否记住所需的法律概念、法条和事实;(2)法律知识理解:LLMs 能否理解法律文本中的实体、事件和关系;(3)法律知识应用:LLMs 能否正确运用其法律知识,并完成解决实际法律任务所需的推理步骤。LawBench 包含 20 个多样化的任务,涵盖 5 种任务类型:单标签分类(SLC)、多标签分类(MLC)、回归、抽取和生成。我们对 51 个 LLMs 进行了广泛评估,包括 20 个多语言 LLMs、22 个面向中文的 LLMs 和 9 个法律专用 LLMs。结果显示,GPT-4 仍然是法律领域表现最好的 LLM,并以明显优势超过其他模型。虽然在法律文本上微调 LLMs 能带来一定提升,但距离获得可用且可靠的法律任务 LLMs 仍有很长的路要走。所有数据、模型预测和评估代码均已在 https://github.com/open-compass/LawBench/ 发布。我们希望该基准能帮助深入理解 LLMs 在特定领域的能力,并加速法律领域 LLMs 的发展。

High Throughput Training of Deep Surrogates from Large Ensemble Runs

  • paper_url: http://arxiv.org/abs/2309.16743
  • repo_url: None
  • paper_authors: Lucas Meyer, Marc Schouler, Robert Alexander Caulk, Alejandro Ribés, Bruno Raffin
  • for: 加速数值求解器的计算(accelerate numerical solvers)
  • methods: 利用多级并行(multiple levels of parallelism)从大规模集合仿真运行中生成丰富的数据集,并将生成的数据直接流式传输给学习模型进行在线训练(online training)
  • results: 在训练一个全连接网络作为热方程的代理模型时,精度提高 47%,批处理吞吐量提升 13 倍,可在 2 小时内基于 8TB 数据完成训练。
    Abstract Recent years have seen a surge in deep learning approaches to accelerate numerical solvers, which provide faithful but computationally intensive simulations of the physical world. These deep surrogates are generally trained in a supervised manner from limited amounts of data slowly generated by the same solver they intend to accelerate. We propose an open-source framework that enables the online training of these models from a large ensemble run of simulations. It leverages multiple levels of parallelism to generate rich datasets. The framework avoids I/O bottlenecks and storage issues by directly streaming the generated data. A training reservoir mitigates the inherent bias of streaming while maximizing GPU throughput. Experiment on training a fully connected network as a surrogate for the heat equation shows the proposed approach enables training on 8TB of data in 2 hours with an accuracy improved by 47% and a batch throughput multiplied by 13 compared to a traditional offline procedure.
    摘要 近年来,利用深度学习加速数值求解器的方法大量涌现;这些数值求解器能够对物理世界进行忠实但计算代价高昂的模拟。此类深度代理模型通常以监督方式训练,而训练数据由它们所要加速的求解器缓慢生成、数量有限。我们提出了一个开源框架,能够从大规模集合仿真运行中在线训练这类模型。该框架利用多级并行来生成丰富的数据集,并通过直接流式传输生成的数据来避免 I/O 瓶颈和存储问题;一个训练储备池(reservoir)在最大化 GPU 吞吐量的同时缓解流式数据固有的偏差。在以全连接网络作为热方程代理模型的实验中,该方法可以在 2 小时内基于 8TB 数据完成训练,与传统离线流程相比精度提高 47%,批处理吞吐量提高 13 倍。
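Not the paper's actual framework, but a rough Python sketch of the streaming-with-reservoir idea described above: samples arriving from solver runs are folded into a fixed-size reservoir buffer (mitigating the recency bias of a pure stream) and mini-batches are drawn from it at each optimizer step. `solver_stream` and the surrogate `model` are placeholders.

```python
import random
import torch
import torch.nn as nn

class Reservoir:
    """Fixed-size buffer filled by reservoir sampling over a data stream,
    mitigating the bias of training only on the most recent samples."""
    def __init__(self, capacity):
        self.capacity, self.seen, self.items = capacity, 0, []

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, n):
        return random.sample(self.items, min(n, len(self.items)))

def train_online(model, solver_stream, steps=10_000, capacity=4096, batch=64):
    """solver_stream: placeholder iterator yielding (x, y) tensor pairs streamed
    directly from the ensemble of solver runs (no intermediate files on disk)."""
    buf = Reservoir(capacity)
    opt, loss_fn = torch.optim.Adam(model.parameters()), nn.MSELoss()
    for _, (x, y) in zip(range(steps), solver_stream):
        buf.add((x, y))                                   # ingest streamed sample
        xs, ys = map(torch.stack, zip(*buf.sample(batch)))
        loss = loss_fn(model(xs), ys)
        opt.zero_grad(); loss.backward(); opt.step()
    return model
```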

Generalizable Heterogeneous Federated Cross-Correlation and Instance Similarity Learning

  • paper_url: http://arxiv.org/abs/2309.16286
  • repo_url: https://github.com/wenkehuang/fccl
  • paper_authors: Wenke Huang, Mang Ye, Zekun Shi, Bo Du
  • for: 这篇论文旨在解决联邦学习中的两个关键挑战:模型异构性和灾难性遗忘。
  • methods: 论文提出了新的 FCCL+ 方法,即带非目标蒸馏的联邦相关与相似性学习:利用无关的未标注公共数据打通异构参与方之间的通信障碍,并在本地更新阶段引入联邦非目标蒸馏,以保留跨领域知识。
  • results: 实验结果显示,FCCL+ 能有效缓解模型异构性和灾难性遗忘问题,并在多种领域偏移场景下表现出更好的可泛化性和稳定性。
    Abstract Federated learning is an important privacy-preserving multi-party learning paradigm, involving collaborative learning with others and local updating on private data. Model heterogeneity and catastrophic forgetting are two crucial challenges, which greatly limit the applicability and generalizability. This paper presents a novel FCCL+, federated correlation and similarity learning with non-target distillation, facilitating the both intra-domain discriminability and inter-domain generalization. For heterogeneity issue, we leverage irrelevant unlabeled public data for communication between the heterogeneous participants. We construct cross-correlation matrix and align instance similarity distribution on both logits and feature levels, which effectively overcomes the communication barrier and improves the generalizable ability. For catastrophic forgetting in local updating stage, FCCL+ introduces Federated Non Target Distillation, which retains inter-domain knowledge while avoiding the optimization conflict issue, fulling distilling privileged inter-domain information through depicting posterior classes relation. Considering that there is no standard benchmark for evaluating existing heterogeneous federated learning under the same setting, we present a comprehensive benchmark with extensive representative methods under four domain shift scenarios, supporting both heterogeneous and homogeneous federated settings. Empirical results demonstrate the superiority of our method and the efficiency of modules on various scenarios.
    摘要 联邦学习是一种重要的隐私保护多方学习范式,参与方在各自私有数据上进行本地更新并协同学习。模型异构性和灾难性遗忘是其中两大关键挑战,极大地限制了适用性和可泛化性。本文提出了新的 FCCL+,即带非目标蒸馏的联邦相关与相似性学习,同时提升域内可判别性和跨域泛化能力。针对异构性问题,我们利用无关的未标注公共数据在异构参与方之间进行通信:构建交叉相关矩阵,并在 logits 和特征两个层面对齐实例相似性分布,从而有效跨越通信障碍、提升泛化能力。针对本地更新阶段的灾难性遗忘,FCCL+ 引入联邦非目标蒸馏,在避免优化冲突的同时保留跨域知识,通过刻画后验类别关系充分蒸馏跨域信息。考虑到目前缺乏在统一设定下评估异构联邦学习的标准基准,我们给出了一个覆盖四种领域偏移场景、包含大量代表性方法的综合基准,同时支持异构与同构联邦设定。实验结果表明了我们方法的优越性以及各模块在多种场景下的有效性。
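The abstract mentions aligning a cross-correlation matrix of logits computed on shared, irrelevant public data. Below is a hedged, Barlow-Twins-style sketch of that idea (our reading of the description, not the authors' exact loss): the normalized cross-correlation between two participants' logits on the same public batch is pushed toward the identity matrix.

```python
import torch

def cross_correlation_alignment(logits_a, logits_b, off_diag_weight=0.005):
    """Illustrative alignment loss between two clients' logits on one public batch:
    on-diagonal terms are pulled to 1 (per-class agreement), off-diagonal terms
    are pushed to 0 (decorrelation across classes)."""
    n, _ = logits_a.shape
    a = (logits_a - logits_a.mean(0)) / (logits_a.std(0) + 1e-8)   # column-normalize
    b = (logits_b - logits_b.mean(0)) / (logits_b.std(0) + 1e-8)
    c = (a.T @ b) / n                                              # (classes x classes)
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + off_diag_weight * off_diag

# usage sketch: loss = cross_correlation_alignment(local_model(x_pub), peer_logits.detach())
```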

Automatic Identification of Stone-Handling Behaviour in Japanese Macaques Using LabGym Artificial Intelligence

  • paper_url: http://arxiv.org/abs/2310.07812
  • repo_url: None
  • paper_authors: Théo Ardoin, Cédric Sueur
  • for: 本研究旨在评估 LabGym 工具在灵长类行为分析中的适用性,并以日本猕猴作为模式动物。
  • methods: 本研究建立了一套完整的分析流程,利用 LabGym 工具检测日本猕猴的玩石(stone-handling)行为。
  • results: 研究成功开发出一个能够高准确度检测日本猕猴玩石行为的模型,但由于时间限制,尚未获得定量数据。
    Abstract The latest advancements in artificial intelligence technology have opened doors to the analysis of intricate behaviours. In light of this, ethologists are actively exploring the potential of these innovations to streamline the time-intensive process of behavioural analysis using video data. In the realm of primatology, several tools have been developed for this purpose. Nonetheless, each of these tools grapples with technical constraints that we aim to surmount. To address these limitations, we have established a comprehensive protocol designed to harness the capabilities of a cutting-edge tool, LabGym. Our primary objective was to evaluate LabGym's suitability for the analysis of primate behaviour, with a focus on Japanese macaques as our model subjects. We have successfully developed a model that demonstrates a high degree of accuracy in detecting Japanese macaques stone-handling behaviour. Our behavioural analysis model was completed as per our initial expectations and LabGym succeed to recognise stone-handling behaviour on videos. However, it is important to note that our study's ability to draw definitive conclusions regarding the quality of the behavioural analysis is hampered by the absence of quantitative data within the specified timeframe. Nevertheless, our model represents the pioneering endeavour, as far as our knowledge extends, in leveraging LabGym for the analysis of primate behaviours. It lays the groundwork for potential future research in this promising field.
    摘要 人工智能技术的最新进展为复杂行为的分析打开了大门。鉴于此,动物行为学家正积极探索利用这些创新来简化基于视频数据、耗时费力的行为分析过程。在灵长类学领域,已有若干工具为此目的而开发,但每种工具都存在一定的技术限制,我们希望加以克服。为此,我们建立了一套完整的分析流程,以充分利用先进工具 LabGym 的能力。我们的首要目标是评估 LabGym 是否适用于灵长类行为分析,并以日本猕猴作为模式动物。我们成功建立了一个能够高准确度检测日本猕猴玩石行为的模型:行为分析模型按照最初预期完成,LabGym 成功地在视频中识别出玩石行为。然而需要指出的是,由于在规定时间内缺乏定量数据,本研究尚无法对行为分析的质量得出确定性结论。尽管如此,据我们所知,该模型是利用 LabGym 分析灵长类行为的首次尝试,为这一前景可期的领域的后续研究奠定了基础。

Optimizing Multicarrier Multiantenna Systems for LoS Channel Charting

  • paper_url: http://arxiv.org/abs/2310.03762
  • repo_url: None
  • paper_authors: Taha Yassine, Luc Le Magoarou, Matthieu Crussière, Stephane Paquelet
  • for: 本研究面向多载波多天线系统中的信道图(channel charting)学习,使空间上相近的用户设备(UE)在低维空间中也彼此接近。
  • methods: 信道图学习中的一类方法依赖信道向量之间的距离度量,该度量应能可靠地反映 UE 的局部空间邻域;本文对最近提出的相位不敏感(PI)距离进行了深入的理论分析。
  • results: 分析表明,该距离因其周期性和振荡性而存在歧义,可能使空间上相距较远的 UE 被误判为彼此接近;本文给出了缓解这些问题的思路,并据此导出了能够学习高质量信道图的系统设计准则,最后在不同场景下用合成数据和真实数据进行了实验验证。
    Abstract Channel charting (CC) consists in learning a mapping between the space of raw channel observations, made available from pilot-based channel estimation in multicarrier multiantenna system, and a low-dimensional space where close points correspond to channels of user equipments (UEs) close spatially. Among the different methods of learning this mapping, some rely on a distance measure between channel vectors. Such a distance should reliably reflect the local spatial neighborhoods of the UEs. The recently proposed phase-insensitive (PI) distance exhibits good properties in this regards, but suffers from ambiguities due to both its periodic and oscillatory aspects, making users far away from each other appear closer in some cases. In this paper, a thorough theoretical analysis of the said distance and its limitations is provided, giving insights on how they can be mitigated. Guidelines for designing systems capable of learning quality charts are consequently derived. Experimental validation is then conducted on synthetic and realistic data in different scenarios.
    摘要 信道图(channel charting,CC)旨在学习一个映射:从多载波多天线系统中基于导频的信道估计所得到的原始信道观测空间,映射到一个低维空间,使得低维空间中相近的点对应于空间上相近的用户设备(UE)的信道。在学习该映射的各种方法中,有一类依赖信道向量之间的距离度量,该距离应能可靠地反映 UE 的局部空间邻域。最近提出的相位不敏感(PI)距离在这方面具有良好性质,但由于其周期性和振荡性而存在歧义,在某些情况下会使相距很远的用户显得彼此接近。本文对该距离及其局限性进行了深入的理论分析,阐明了缓解这些问题的途径,并据此导出了能够学习高质量信道图的系统设计准则;最后在不同场景下用合成数据和真实数据进行了实验验证。
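As a toy illustration of a phase-insensitive channel distance of the kind analysed above: the generic correlation-based form below is an assumption for illustration, not necessarily the exact metric studied in the paper, but it shows the key property of invariance to a global phase rotation of the channel vector.

```python
import numpy as np

def phase_insensitive_distance(h1, h2):
    """Distance between two complex channel vectors that ignores a global phase
    offset: d = 1 - |<h1, h2>| / (||h1|| * ||h2||)."""
    corr = np.abs(np.vdot(h1, h2)) / (np.linalg.norm(h1) * np.linalg.norm(h2))
    return 1.0 - corr

# A global phase rotation leaves the distance unchanged:
h = np.random.randn(64) + 1j * np.random.randn(64)
rotated = np.exp(1j * 0.7) * h
assert np.isclose(phase_insensitive_distance(h, rotated), 0.0)
```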

UPB @ ACTI: Detecting Conspiracies using fine tuned Sentence Transformers

  • paper_url: http://arxiv.org/abs/2309.16275
  • repo_url: None
  • paper_authors: Andrei Paraschiv, Mihai Dascalu
  • for: 研究阴谋论内容的自动检测,以维护信息完整性和社会信任。
  • methods: 使用预训练的句子 Transformer 模型并结合数据增强技术。
  • results: 在 ACTI @ EVALITA 2023 共享任务中,我们的方法在二分类任务上取得 85.71% 的 F1 分数,在细粒度阴谋主题分类上取得 91.23% 的 F1 分数,超过其他参赛系统。
    Abstract Conspiracy theories have become a prominent and concerning aspect of online discourse, posing challenges to information integrity and societal trust. As such, we address conspiracy theory detection as proposed by the ACTI @ EVALITA 2023 shared task. The combination of pre-trained sentence Transformer models and data augmentation techniques enabled us to secure first place in the final leaderboard of both sub-tasks. Our methodology attained F1 scores of 85.71% in the binary classification and 91.23% for the fine-grained conspiracy topic classification, surpassing other competing systems.
    摘要 阴谋论已成为网络讨论中日益突出且令人担忧的现象,对信息完整性和社会信任构成挑战。为此,我们针对 ACTI @ EVALITA 2023 共享任务提出的阴谋论检测问题开展研究。通过将预训练的句子 Transformer 模型与数据增强技术相结合,我们在两个子任务的最终排行榜上均获得第一名:二分类任务的 F1 分数为 85.71%,细粒度阴谋主题分类的 F1 分数为 91.23%,超过其他参赛系统。
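A minimal sketch of the general recipe (pre-trained sentence-transformer representations feeding a classifier). The model name, the frozen encoder, the logistic-regression head and the toy data are illustrative assumptions rather than the team's actual fine-tuning setup.

```python
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Placeholder data: comments labelled 1 (conspiratorial) or 0 (not).
train_texts = ["il vaccino contiene un microchip segreto", "oggi piove a Milano"]
train_labels = [1, 0]
test_texts = ["ci nascondono la verita sui vaccini", "domani gioca la nazionale"]
test_labels = [1, 0]

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed model choice
clf = LogisticRegression(max_iter=1000)
clf.fit(encoder.encode(train_texts), train_labels)      # frozen embeddings -> linear head
preds = clf.predict(encoder.encode(test_texts))
print("binary F1:", f1_score(test_labels, preds))
```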

Cooperation Dynamics in Multi-Agent Systems: Exploring Game-Theoretic Scenarios with Mean-Field Equilibria

  • paper_url: http://arxiv.org/abs/2309.16263
  • repo_url: https://github.com/dawnorak/marl-kart-simulator
  • paper_authors: Vaigarai Sathi, Sabahat Shaik, Jaswanth Nidamanuri
  • for: investigate strategies to invoke cooperation in game-theoretic scenarios, such as the Iterated Prisoner’s Dilemma, where agents must optimize both individual and group outcomes.
  • methods: analyze existing cooperative strategies for their effectiveness in promoting group-oriented behavior in repeated games, and propose modifications that encourage group rewards while also resulting in higher individual gains.
  • results: extend the study to scenarios with exponentially growing agent populations ($N \longrightarrow +\infty$), where traditional computation and equilibrium determination are challenging, and establish equilibrium solutions and reward structures for infinitely large agent sets in repeated games using mean-field game theory. Offer practical insights through simulations using the Multi Agent-Posthumous Credit Assignment trainer, and explore adapting simulation algorithms to create scenarios favoring cooperation for group rewards.
    Abstract Cooperation is fundamental in Multi-Agent Systems (MAS) and Multi-Agent Reinforcement Learning (MARL), often requiring agents to balance individual gains with collective rewards. In this regard, this paper aims to investigate strategies to invoke cooperation in game-theoretic scenarios, namely the Iterated Prisoner's Dilemma, where agents must optimize both individual and group outcomes. Existing cooperative strategies are analyzed for their effectiveness in promoting group-oriented behavior in repeated games. Modifications are proposed where encouraging group rewards will also result in a higher individual gain, addressing real-world dilemmas seen in distributed systems. The study extends to scenarios with exponentially growing agent populations ($N \longrightarrow +\infty$), where traditional computation and equilibrium determination are challenging. Leveraging mean-field game theory, equilibrium solutions and reward structures are established for infinitely large agent sets in repeated games. Finally, practical insights are offered through simulations using the Multi Agent-Posthumous Credit Assignment trainer, and the paper explores adapting simulation algorithms to create scenarios favoring cooperation for group rewards. These practical implementations bridge theoretical concepts with real-world applications.
    摘要 合作是多智能体系统(MAS)和多智能体强化学习(MARL)的基础,智能体往往需要在个体收益与集体回报之间取得平衡。为此,本文研究在博弈论场景(即迭代囚徒困境)中诱导合作的策略,智能体需要同时优化个体与群体的结果。我们分析了现有合作策略在重复博弈中促进群体导向行为的有效性,并提出改进方案,使鼓励群体回报的同时也能带来更高的个体收益,以应对分布式系统中常见的现实困境。研究进一步扩展到智能体数量呈指数增长($N \longrightarrow +\infty$)的场景,此时传统的计算和均衡求解十分困难;我们借助平均场博弈理论,为重复博弈中无限大的智能体集合建立了均衡解和回报结构。最后,我们利用 Multi Agent-Posthumous Credit Assignment 训练器进行仿真,给出了实践层面的洞见,并探讨如何调整仿真算法以构造有利于群体回报合作的场景,从而将理论概念与现实应用联系起来。
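To make the repeated-game setting concrete, here is a small, self-contained simulation of the Iterated Prisoner's Dilemma with the standard payoff values (T=5, R=3, P=1, S=0); the strategies shown are classic textbook ones, not the modified strategies proposed in the paper.

```python
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(my_hist, opp_hist):
    return "C" if not opp_hist else opp_hist[-1]   # cooperate first, then mirror

def always_defect(my_hist, opp_hist):
    return "D"

def play(strategy_a, strategy_b, rounds=100):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        pa, pb = PAYOFF[(a, b)]
        hist_a.append(a); hist_b.append(b)
        score_a += pa; score_b += pb
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))      # mutual cooperation: (300, 300)
print(play(tit_for_tat, always_defect))    # one exploited round, then mutual defection
```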

QonFusion – Quantum Approaches to Gaussian Random Variables: Applications in Stable Diffusion and Brownian Motion

  • paper_url: http://arxiv.org/abs/2309.16258
  • repo_url: https://github.com/BoltzmannEntropy/QonFusion
  • paper_authors: Shlomo Kashani
  • For: The paper proposes a strategy for generating Gaussian random variables (GRVs) using non-parametric quantum circuits, as an alternative to conventional pseudorandom number generators (PRNGs) such as the \textbf{torch.rand} function in PyTorch.* Methods: The paper introduces a Quantum Gaussian Random Variable Generator that fulfills dual roles in simulating both Stable Diffusion (SD) and Brownian Motion (BM), diverging from previous methods that use parametric quantum circuits (PQCs) and variational quantum eigensolvers (VQEs). The proposed method does not require a computationally demanding optimization process to tune parameters.* Results: The paper presents QonFusion, a Python library that facilitates assimilating the proposed methodology into existing computational frameworks, and validates QonFusion through extensive statistical testing, confirming the statistical equivalence of the Gaussian samples from the quantum approach to classical counterparts within defined significance limits.
    Abstract In the present study, we delineate a strategy focused on non-parametric quantum circuits for the generation of Gaussian random variables (GRVs). This quantum-centric approach serves as a substitute for conventional pseudorandom number generators (PRNGs), such as the \textbf{torch.rand} function in PyTorch. The principal theme of our research is the incorporation of Quantum Random Number Generators (QRNGs) into classical models of diffusion. Notably, our Quantum Gaussian Random Variable Generator fulfills dual roles, facilitating simulations in both Stable Diffusion (SD) and Brownian Motion (BM). This diverges markedly from prevailing methods that utilize parametric quantum circuits (PQCs), often in conjunction with variational quantum eigensolvers (VQEs). Although conventional techniques can accurately approximate ground states in complex systems or model elaborate probability distributions, they require a computationally demanding optimization process to tune parameters. Our non-parametric strategy obviates this necessity. To facilitate assimilating our methodology into existing computational frameworks, we put forward QonFusion, a Python library congruent with both PyTorch and PennyLane, functioning as a bridge between classical and quantum computational paradigms. We validate QonFusion through extensive statistical testing, including tests which confirm the statistical equivalence of the Gaussian samples from our quantum approach to classical counterparts within defined significance limits. QonFusion is available at \url{https://boltzmannentropy.github.io/qonfusion.github.io/} to reproduce all findings here.
    摘要 在本研究中,我们提出了一种基于非参数化量子电路生成高斯随机变量(GRV)的策略。这一以量子为核心的方法可以替代传统的伪随机数生成器(PRNG),例如 PyTorch 中的 torch.rand 函数。本研究的核心主题是将量子随机数生成器(QRNG)融入经典扩散模型。值得注意的是,我们的量子高斯随机变量生成器可同时用于稳定扩散(SD)和布朗运动(BM)的模拟。这与目前普遍采用参数化量子电路(PQC)并常结合变分量子本征求解器(VQE)的方法截然不同:传统技术虽然能够精确逼近复杂系统的基态或建模复杂概率分布,但需要计算代价高昂的优化过程来调节参数,而我们的非参数化策略免去了这一步骤。为便于将该方法融入现有计算框架,我们提出了 QonFusion——一个与 PyTorch 和 PennyLane 兼容的 Python 库,作为经典与量子计算范式之间的桥梁。我们通过大量统计检验验证了 QonFusion,包括确认量子方法生成的高斯样本与经典样本在给定显著性水平内统计等价的检验。QonFusion 可在 https://boltzmannentropy.github.io/qonfusion.github.io/ 获取,用于复现本文的所有结果。

Nondestructive chicken egg fertility detection using CNN-transfer learning algorithms

  • paper_url: http://arxiv.org/abs/2309.16257
  • repo_url: None
  • paper_authors: Shoffan Saifullah, Rafal Drezewski, Anton Yudhana, Andri Pranolo, Wilis Kaswijanti, Andiko Putro Suryotomo, Seno Aji Putra, Alin Khaliduzzaman, Anton Satria Prabuwono, Nathalie Japkowicz
  • for: 本研究探索将卷积神经网络(CNN)迁移学习应用于无损鸡蛋受精检测,以支持精准家禽孵化实践。
  • methods: 研究训练并评估了四个模型:VGG16、ResNet50、InceptionNet 和 MobileNet,数据集包含 200 张单个鸡蛋图像,并使用了增强图像(旋转、翻转、缩放、平移和镜像)。
  • results: 训练结果显示所有模型都能以高精度学习并区分鸡蛋的受精状态,但在测试集上不同模型之间存在差异;InceptionNet 表现最佳,在各项评估指标上均领先,测试集准确率为 0.98,检测受精蛋的灵敏度为 1,识别未受精蛋的特异度为 0.96。
    Abstract This study explored the application of CNN-Transfer Learning for nondestructive chicken egg fertility detection for precision poultry hatchery practices. Four models, VGG16, ResNet50, InceptionNet, and MobileNet, were trained and evaluated on a dataset (200 single egg images) using augmented images (rotation, flip, scale, translation, and reflection). Although the training results demonstrated that all models achieved high accuracy, indicating their ability to accurately learn and classify chicken eggs' fertility state, when evaluated on the testing set, variations in accuracy and performance were observed. InceptionNet exhibited the best overall performance, accurately classifying fertile and non-fertile eggs. It demonstrated excellent performance in both training and testing sets in all parameters of the evaluation metrics. In testing set, it achieved an accuracy of 0.98, a sensitivity of 1 for detecting fertile eggs, and a specificity of 0.96 for identifying non-fertile eggs. The higher performance is attributed to its unique architecture efficiently capturing features at different scales leading to improved accuracy and robustness. Further optimization and fine-tuning of the models might necessary to address the limitations in accurately detecting fertile and non-fertile eggs in case of other models. This study highlighted the potential of CNN-Transfer Learning for nondestructive fertility detection and emphasizes the need for further research to enhance the models' capabilities and ensure accurate classification.
    摘要 本研究探索将 CNN 迁移学习应用于无损鸡蛋受精检测,以支持精准家禽孵化实践。我们在一个包含 200 张单个鸡蛋图像的数据集上,利用增强图像(旋转、翻转、缩放、平移和镜像)训练并评估了 VGG16、ResNet50、InceptionNet 和 MobileNet 四个模型。训练结果表明所有模型都达到了很高的准确率,说明它们能够准确学习并分类鸡蛋的受精状态;但在测试集上,各模型的准确率和性能存在差异。InceptionNet 的整体表现最佳,能够准确区分受精蛋与未受精蛋,在训练集和测试集的各项评估指标上均表现出色:在测试集上取得 0.98 的准确率、检测受精蛋的灵敏度为 1、识别未受精蛋的特异度为 0.96。这一较高的性能归功于其独特的结构能够高效捕捉不同尺度的特征,从而提升准确性和鲁棒性。对于其他模型,可能需要进一步优化和微调,以解决其在准确检测受精与未受精蛋方面的不足。本研究展示了 CNN 迁移学习在无损受精检测中的潜力,并强调需要进一步研究以增强模型能力、确保分类准确。
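A hedged sketch of the CNN transfer-learning setup described above, with InceptionV3 as a frozen ImageNet backbone on 128x128 inputs; the head design, hyperparameters and exact augmentation settings here are illustrative assumptions, not the paper's configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_fertility_classifier(input_shape=(128, 128, 3)):
    base = tf.keras.applications.InceptionV3(
        weights="imagenet", include_top=False, input_shape=input_shape)
    base.trainable = False                       # transfer learning: freeze the backbone
    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(128, activation="relu"),
        layers.Dense(1, activation="sigmoid"),   # fertile vs non-fertile
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy", tf.keras.metrics.Recall(name="sensitivity")])
    return model

# Augmentations akin to those listed above (rotation, flip, scale, translation).
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
    layers.RandomTranslation(0.1, 0.1),
])
```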

Supervised Learning Models for Early Detection of Albuminuria Risk in Type-2 Diabetes Mellitus Patients

  • paper_url: http://arxiv.org/abs/2309.16742
  • repo_url: None
  • paper_authors: Arief Purnama Muharram, Dicky Levenus Tahapary, Yeni Dwi Lestari, Randy Sarayar, Valerie Josephine Dirjayanto
  • for: 本研究旨在开发监督学习模型,用于预测 2 型糖尿病(T2DM)患者发生蛋白尿(albuminuria)的风险。
  • methods: 研究使用了 10 个特征属性和 1 个目标属性(蛋白尿),并选择朴素贝叶斯、支持向量机(SVM)、决策树、随机森林、AdaBoost、XGBoost 和多层感知机(MLP)等监督学习算法进行训练。
  • results: 实验结果显示 MLP 表现最佳,准确率和 F1 分数分别达到 0.74 和 0.75,表明其适合用于 T2DM 患者蛋白尿风险的筛查。
    Abstract Diabetes, especially T2DM, continues to be a significant health problem. One of the major concerns associated with diabetes is the development of its complications. Diabetic nephropathy, one of the chronic complication of diabetes, adversely affects the kidneys, leading to kidney damage. Diagnosing diabetic nephropathy involves considering various criteria, one of which is the presence of a pathologically significant quantity of albumin in urine, known as albuminuria. Thus, early prediction of albuminuria in diabetic patients holds the potential for timely preventive measures. This study aimed to develop a supervised learning model to predict the risk of developing albuminuria in T2DM patients. The selected supervised learning algorithms included Na\"ive Bayes, Support Vector Machine (SVM), decision tree, random forest, AdaBoost, XGBoost, and Multi-Layer Perceptron (MLP). Our private dataset, comprising 184 entries of diabetes complications risk factors, was used to train the algorithms. It consisted of 10 attributes as features and 1 attribute as the target (albuminuria). Upon conducting the experiments, the MLP demonstrated superior performance compared to the other algorithms. It achieved accuracy and f1-score values as high as 0.74 and 0.75, respectively, making it suitable for screening purposes in predicting albuminuria in T2DM. Nonetheless, further studies are warranted to enhance the model's performance.
    摘要 糖尿病,尤其是 2 型糖尿病(T2DM),仍然是一个重要的健康问题,其主要隐患之一是并发症的发生。糖尿病肾病是糖尿病的慢性并发症之一,会损害肾脏功能。诊断糖尿病肾病需要考虑多项标准,其中之一是尿液中出现具有病理学意义的蛋白量,即蛋白尿(albuminuria)。因此,尽早预测糖尿病患者的蛋白尿有助于及时采取预防措施。本研究旨在开发一个监督学习模型,预测 T2DM 患者发生蛋白尿的风险。所选的监督学习算法包括朴素贝叶斯、支持向量机(SVM)、决策树、随机森林、AdaBoost、XGBoost 和多层感知机(MLP)。我们使用包含 184 条糖尿病并发症风险因素记录的私有数据集训练这些算法,其中含 10 个特征属性和 1 个目标属性(蛋白尿)。实验结果表明,MLP 的表现优于其他算法,准确率和 F1 分数分别高达 0.74 和 0.75,适合用于 T2DM 患者蛋白尿的筛查。尽管如此,仍需进一步研究以提升模型性能。
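A minimal scikit-learn sketch of the tabular screening pipeline described above; the placeholder data, the scaler and the network size are assumptions for illustration, not the study's configuration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, f1_score

# X: rows of 10 clinical risk-factor features; y: albuminuria (0/1). Placeholder data.
rng = np.random.default_rng(0)
X = rng.normal(size=(184, 10))
y = rng.integers(0, 2, size=184)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0))
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred), "f1:", f1_score(y_te, pred))
```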

Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints

  • paper_url: http://arxiv.org/abs/2309.16240
  • repo_url: None
  • paper_authors: Chaoqi Wang, Yibo Jiang, Chenghao Yang, Han Liu, Yuxin Chen
  • for: 通过 RLHF 与 DPO 等方法将模型与人类偏好对齐,以提升人工智能系统的安全性和可控性。
  • methods: 提出了一种广义的 DPO 方法($f$-DPO),通过引入多种散度约束来简化奖励函数与最优策略之间的复杂关系,从而以更高效、可监督的方式完成人类偏好对齐。
  • results: 对多种散度约束进行了实验研究,结果表明这些约束能够在对齐性能与生成多样性之间取得平衡,并且在散度效率上优于基于 PPO 的方法;此外,散度约束会直接影响预期校准误差(ECE)。
    Abstract The increasing capabilities of large language models (LLMs) raise opportunities for artificial general intelligence but concurrently amplify safety concerns, such as potential misuse of AI systems, necessitating effective AI alignment. Reinforcement Learning from Human Feedback (RLHF) has emerged as a promising pathway towards AI alignment but brings forth challenges due to its complexity and dependence on a separate reward model. Direct Preference Optimization (DPO) has been proposed as an alternative, and it remains equivalent to RLHF under the reverse KL regularization constraint. This paper presents $f$-DPO, a generalized approach to DPO by incorporating diverse divergence constraints. We show that under certain $f$-divergences, including Jensen-Shannon divergence, forward KL divergences and $\alpha$-divergences, the complex relationship between the reward and optimal policy can also be simplified by addressing the Karush-Kuhn-Tucker conditions. This eliminates the need for estimating the normalizing constant in the Bradley-Terry model and enables a tractable mapping between the reward function and the optimal policy. Our approach optimizes LLMs to align with human preferences in a more efficient and supervised manner under a broad set of divergence constraints. Empirically, adopting these divergences ensures a balance between alignment performance and generation diversity. Importantly, $f$-DPO outperforms PPO-based methods in divergence efficiency, and divergence constraints directly influence expected calibration error (ECE).
    摘要 大型语言模型(LLM)能力的不断提升为通用人工智能带来了机遇,同时也放大了安全隐患,例如 AI 系统被滥用的可能,这使得有效的 AI 对齐变得尤为必要。基于人类反馈的强化学习(RLHF)已成为实现 AI 对齐的一条有前景的途径,但其复杂性以及对独立奖励模型的依赖带来了挑战。直接偏好优化(DPO)被提出作为替代方案,并且在反向 KL 正则化约束下与 RLHF 等价。本文提出 $f$-DPO,通过引入多种散度约束对 DPO 进行推广。我们证明,在包括 Jensen-Shannon 散度、前向 KL 散度和 $\alpha$-散度在内的某些 $f$-散度下,奖励与最优策略之间的复杂关系同样可以通过求解 Karush-Kuhn-Tucker 条件加以简化。这免去了在 Bradley-Terry 模型中估计归一化常数的需要,并在奖励函数与最优策略之间建立了可解析的映射。我们的方法能够在广泛的散度约束下,以更高效、可监督的方式将 LLM 与人类偏好对齐。实验表明,采用这些散度能够在对齐性能与生成多样性之间取得平衡。重要的是,$f$-DPO 在散度效率上优于基于 PPO 的方法,且散度约束会直接影响预期校准误差(ECE)。
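For context, the sketch below implements the standard DPO objective, i.e. the reverse-KL special case that $f$-DPO generalizes; per-response log-probabilities are assumed to be precomputed sums over tokens, and the $f$-divergence generalization itself (replacing the log-ratio term) is not shown.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss (reverse-KL case): maximize the margin between the
    policy/reference log-ratio of the preferred and dispreferred responses."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()

# usage with per-example summed log-probs, each of shape (batch,):
# loss = dpo_loss(lp_pi_chosen, lp_pi_rejected, lp_ref_chosen, lp_ref_rejected, beta=0.1)
```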

Language models in molecular discovery

  • paper_url: http://arxiv.org/abs/2309.16235
  • repo_url: None
  • paper_authors: Nikita Janakarajan, Tim Erdmann, Sarath Swaminathan, Teodoro Laino, Jannis Born
  • for: 本研究旨在探讨语言模型在分子发现中的应用,尤其是在药物发现过程中的潜在作用。
  • methods: 本研究使用了语言模型来预测分子的性质和化学反应,并评估了这些模型在早期药物发现中的潜在应用。
  • results: 研究发现,语言模型可以帮助加速分子发现过程,特别是在药物设计、物理性预测和化学反应预测等领域。同时,研究还发现了一些可用的开源软件资源,可以降低进入这个领域的门槛。
    Abstract The success of language models, especially transformer-based architectures, has trickled into other domains giving rise to "scientific language models" that operate on small molecules, proteins or polymers. In chemistry, language models contribute to accelerating the molecule discovery cycle as evidenced by promising recent findings in early-stage drug discovery. Here, we review the role of language models in molecular discovery, underlining their strength in de novo drug design, property prediction and reaction chemistry. We highlight valuable open-source software assets thus lowering the entry barrier to the field of scientific language modeling. Last, we sketch a vision for future molecular design that combines a chatbot interface with access to computational chemistry tools. Our contribution serves as a valuable resource for researchers, chemists, and AI enthusiasts interested in understanding how language models can and will be used to accelerate chemical discovery.
    摘要 语言模型,尤其是基于 Transformer 的架构,取得的成功已延伸到其他领域,催生了作用于小分子、蛋白质或聚合物的“科学语言模型”。在化学领域,语言模型有助于加速分子发现周期,早期药物发现中一些有前景的近期成果便是明证。本文综述了语言模型在分子发现中的作用,着重介绍其在从头药物设计、性质预测和反应化学方面的优势。我们还介绍了一些有价值的开源软件资源,以降低进入科学语言建模领域的门槛。最后,我们勾画了将聊天机器人界面与计算化学工具相结合的未来分子设计愿景。本文可作为研究人员、化学家和 AI 爱好者理解语言模型如何(以及将如何)用于加速化学发现的有益资源。

Multi-Modal Financial Time-Series Retrieval Through Latent Space Projections

  • paper_url: http://arxiv.org/abs/2309.16741
  • repo_url: None
  • paper_authors: Tom Bamford, Andrea Coletta, Elizabeth Fons, Sriram Gopalakrishnan, Svitlana Vyetrenko, Tucker Balch, Manuela Veloso
  • for: 该论文旨在提高金融时间序列数据的存储与检索效率。
  • methods: 使用深度编码器将多模态数据投影到低维潜在空间,从而支持以图像或自然语言进行查询。
  • results: 论文通过在真实历史数据和合成数据上的实验验证了该方法的计算效率和准确性,并展示了潜在空间投影能够提升金融时间序列检索的效率和易用性。
    Abstract Financial firms commonly process and store billions of time-series data, generated continuously and at a high frequency. To support efficient data storage and retrieval, specialized time-series databases and systems have emerged. These databases support indexing and querying of time-series by a constrained Structured Query Language(SQL)-like format to enable queries like "Stocks with monthly price returns greater than 5%", and expressed in rigid formats. However, such queries do not capture the intrinsic complexity of high dimensional time-series data, which can often be better described by images or language (e.g., "A stock in low volatility regime"). Moreover, the required storage, computational time, and retrieval complexity to search in the time-series space are often non-trivial. In this paper, we propose and demonstrate a framework to store multi-modal data for financial time-series in a lower-dimensional latent space using deep encoders, such that the latent space projections capture not only the time series trends but also other desirable information or properties of the financial time-series data (such as price volatility). Moreover, our approach allows user-friendly query interfaces, enabling natural language text or sketches of time-series, for which we have developed intuitive interfaces. We demonstrate the advantages of our method in terms of computational efficiency and accuracy on real historical data as well as synthetic data, and highlight the utility of latent-space projections in the storage and retrieval of financial time-series data with intuitive query modalities.
    摘要 金融机构通常需要处理并存储数十亿条连续高频生成的时间序列数据。为支持高效的存储与检索,出现了专门的时间序列数据库和系统。这些数据库支持以受限的类 SQL 格式对时间序列进行索引和查询,例如“月度价格收益大于 5% 的股票”,且查询必须以固定格式表达。然而,这类查询无法刻画高维时间序列数据的内在复杂性,而这种复杂性往往能够用图像或语言更好地描述(例如“处于低波动状态的股票”)。此外,在时间序列空间中进行搜索所需的存储、计算时间和检索复杂度往往不可忽视。本文提出并演示了一个框架,利用深度编码器将金融时间序列的多模态数据存储到低维潜在空间中,使潜在空间投影不仅刻画时间序列趋势,还能刻画金融时间序列的其他有用信息或属性(例如价格波动率)。此外,该方法支持用户友好的查询接口,允许以自然语言文本或时间序列草图进行查询,我们为此开发了直观的界面。我们在真实历史数据和合成数据上的不同场景中展示了该方法在计算效率和准确性方面的优势,并强调了潜在空间投影在以直观查询方式存储和检索金融时间序列数据中的价值。
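A schematic sketch of latent-space retrieval as described above: an encoder (untrained and purely illustrative here) maps each series to a unit-norm latent vector and queries are answered by cosine similarity; the real system would use trained multi-modal deep encoders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TSEncoder(nn.Module):
    """Toy encoder mapping a (length,) price series to a unit-norm latent vector."""
    def __init__(self, length=252, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(length, 64), nn.ReLU(), nn.Linear(64, latent_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def retrieve(query_series, corpus_series, encoder, k=5):
    """Return indices of the k corpus series closest to the query in latent space."""
    with torch.no_grad():
        q = encoder(query_series.unsqueeze(0))          # (1, d)
        z = encoder(corpus_series)                      # (n, d)
    sims = (z @ q.T).squeeze(1)                         # cosine similarity (unit vectors)
    return sims.topk(k).indices.tolist()

encoder = TSEncoder()
corpus = torch.randn(1000, 252)                         # 1000 series of 252 trading days
print(retrieve(torch.randn(252), corpus, encoder, k=5))
```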

Automated Chest X-Ray Report Generator Using Multi-Model Deep Learning Approach

  • paper_url: http://arxiv.org/abs/2310.05969
  • repo_url: https://github.com/ariefpurnamamuharram/IF5200
  • paper_authors: Arief Purnama Muharram, Hollyana Puteri Haryono, Abassi Haji Juma, Ira Puspasari, Nugraha Priya Utama
  • for: 本研究旨在辅助医生判读胸部 X 光影像,提高胸部 X 光诊断的准确性和效率。
  • methods: 本研究使用多个二分类深度学习模型,每个模型负责检测一种异常;影像先被标准化为 128x128 像素,并切分为覆盖肺部上、中、下三个区域的三个部分。
  • results: 系统能够自动检测胸部 X 光影像中的异常,并通过选取相应的预定义语句生成诊断报告,有望帮助医生快速评估胸部 X 光影像,提高诊断的准确性和效率。
    Abstract Reading and interpreting chest X-ray images is one of the most radiologist's routines. However, it still can be challenging, even for the most experienced ones. Therefore, we proposed a multi-model deep learning-based automated chest X-ray report generator system designed to assist radiologists in their work. The basic idea of the proposed system is by utilizing multi binary-classification models for detecting multi abnormalities, with each model responsible for detecting one abnormality, in a single image. In this study, we limited the radiology abnormalities detection to only cardiomegaly, lung effusion, and consolidation. The system generates a radiology report by performing the following three steps: image pre-processing, utilizing deep learning models to detect abnormalities, and producing a report. The aim of the image pre-processing step is to standardize the input by scaling it to 128x128 pixels and slicing it into three segments, which covers the upper, lower, and middle parts of the lung. After pre-processing, each corresponding model classifies the image, resulting in a 0 (zero) for no abnormality detected and a 1 (one) for the presence of an abnormality. The prediction outputs of each model are then concatenated to form a 'result code'. The 'result code' is used to construct a report by selecting the appropriate pre-determined sentence for each detected abnormality in the report generation step. The proposed system is expected to reduce the workload of radiologists and increase the accuracy of chest X-ray diagnosis.
    摘要 判读胸部 X 光影像是放射科医生最常规的工作之一,但即使对经验最丰富的医生而言,它仍然可能具有挑战性。因此,我们提出了一个基于多模型深度学习的胸部 X 光报告自动生成系统,用于辅助放射科医生的工作。该系统的基本思路是使用多个二分类模型在同一张影像上检测多种异常,每个模型负责检测一种异常。本研究将检测的放射学异常限定为心脏肥大、胸腔积液和肺实变三种。系统通过以下三个步骤生成放射学报告:影像预处理、利用深度学习模型检测异常、生成报告。影像预处理的目的是将输入标准化:缩放至 128x128 像素,并切分为覆盖肺部上、中、下三个区域的三个部分。预处理后,各对应模型对影像进行分类,输出 0(未检测到异常)或 1(存在异常)。各模型的预测输出被拼接成一个“结果编码”,在报告生成步骤中,系统依据该结果编码为每个检测到的异常选取相应的预定义语句来构成报告。该系统有望减轻放射科医生的工作负担,并提高胸部 X 光诊断的准确性。
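The report-generation step described above reduces to a lookup from the concatenated result code to predefined sentences. A minimal sketch, with invented sentences and classifier stubs standing in for the three binary models:

```python
SENTENCES = {
    "cardiomegaly":  {0: "Heart size is within normal limits.",
                      1: "The cardiac silhouette is enlarged, suggesting cardiomegaly."},
    "lung_effusion": {0: "No pleural effusion is seen.",
                      1: "Blunting of the costophrenic angle suggests pleural effusion."},
    "consolidation": {0: "No focal consolidation is identified.",
                      1: "An area of increased opacity suggests consolidation."},
}

def generate_report(image, classifiers):
    """classifiers: dict mapping abnormality name -> model whose predict(image)
    returns 0 (absent) or 1 (present). Returns the report text and result code."""
    result_code = {name: int(model.predict(image)) for name, model in classifiers.items()}
    lines = [SENTENCES[name][flag] for name, flag in result_code.items()]
    return " ".join(lines), result_code

# e.g. a result code {"cardiomegaly": 1, "lung_effusion": 0, "consolidation": 0}
# yields a report beginning "The cardiac silhouette is enlarged, ...".
```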

GInX-Eval: Towards In-Distribution Evaluation of Graph Neural Network Explanations

  • paper_url: http://arxiv.org/abs/2309.16223
  • repo_url: None
  • paper_authors: Kenza Amara, Mennatallah El-Assady, Rex Ying
  • for: 本研究旨在评估图神经网络(GNN)的可解释方法,以及评估这些方法的正确性。
  • methods: 本研究使用了 retraining 策略和 EdgeRank 分数来评估图解释的正确性。
  • results: 研究发现,许多流行的方法,包括梯度基于的方法,产生的解释并不比随机选择的边为重要的子图来的好。这些结果挑战了当前领域的研究成果。结果与人类评估相符。
    Abstract Diverse explainability methods of graph neural networks (GNN) have recently been developed to highlight the edges and nodes in the graph that contribute the most to the model predictions. However, it is not clear yet how to evaluate the correctness of those explanations, whether it is from a human or a model perspective. One unaddressed bottleneck in the current evaluation procedure is the problem of out-of-distribution explanations, whose distribution differs from those of the training data. This important issue affects existing evaluation metrics such as the popular faithfulness or fidelity score. In this paper, we show the limitations of faithfulness metrics. We propose GInX-Eval (Graph In-distribution eXplanation Evaluation), an evaluation procedure of graph explanations that overcomes the pitfalls of faithfulness and offers new insights on explainability methods. Using a retraining strategy, the GInX score measures how informative removed edges are for the model and the EdgeRank score evaluates if explanatory edges are correctly ordered by their importance. GInX-Eval verifies if ground-truth explanations are instructive to the GNN model. In addition, it shows that many popular methods, including gradient-based methods, produce explanations that are not better than a random designation of edges as important subgraphs, challenging the findings of current works in the area. Results with GInX-Eval are consistent across multiple datasets and align with human evaluation.
    摘要 近来已有多种图神经网络(GNN)可解释性方法被提出,用于突出图中对模型预测贡献最大的边和节点。然而,无论从人的角度还是从模型的角度,如何评估这些解释的正确性仍不明确。当前评估流程中一个尚未解决的瓶颈是分布外解释问题,即解释的分布不同于训练数据的分布;这一重要问题会影响现有的评估指标,例如常用的忠实度(faithfulness/fidelity)分数。本文揭示了忠实度指标的局限性,并提出 GInX-Eval(Graph In-distribution eXplanation Evaluation),一种能够克服忠实度缺陷、并为可解释性方法带来新见解的图解释评估流程。借助再训练策略,GInX 分数衡量被移除的边对模型的信息量,而 EdgeRank 分数评估解释边是否按重要性正确排序。GInX-Eval 可以验证真实标注的解释对 GNN 模型是否具有指导意义。此外,它还表明许多流行方法(包括基于梯度的方法)生成的解释并不比随机指定边作为重要子图更好,这对该领域现有研究的结论提出了挑战。GInX-Eval 的结果在多个数据集上保持一致,并与人工评估相符。

Unmasking the Chameleons: A Benchmark for Out-of-Distribution Detection in Medical Tabular Data

  • paper_url: http://arxiv.org/abs/2309.16220
  • repo_url: https://github.com/mazizmalayeri/tabmedood
  • paper_authors: Mohammad Azizmalayeri, Ameen Abu-Hanna, Giovanni Ciná
  • for: 本研究旨在解决机器学习模型难以泛化到训练分布之外数据的问题,以便在真实医疗系统中可靠地使用机器学习模型并避免错误预测。
  • methods: 本研究在 MLP、ResNet 和 Transformer 等多种预测架构上比较了大量 OOD 检测方法,包括基于密度的方法和最新的事后(post-hoc)检测器。
  • results: 研究发现:i)远 OOD 检测问题已基本解决,但近 OOD 检测问题仍然悬而未决;ii)仅使用事后方法表现不佳,但与基于距离的机制结合后性能显著提升;iii)Transformer 架构的过度自信程度明显低于 MLP 和 ResNet。
    Abstract Despite their success, Machine Learning (ML) models do not generalize effectively to data not originating from the training distribution. To reliably employ ML models in real-world healthcare systems and avoid inaccurate predictions on out-of-distribution (OOD) data, it is crucial to detect OOD samples. Numerous OOD detection approaches have been suggested in other fields - especially in computer vision - but it remains unclear whether the challenge is resolved when dealing with medical tabular data. To answer this pressing need, we propose an extensive reproducible benchmark to compare different methods across a suite of tests including both near and far OODs. Our benchmark leverages the latest versions of eICU and MIMIC-IV, two public datasets encompassing tens of thousands of ICU patients in several hospitals. We consider a wide array of density-based methods and SOTA post-hoc detectors across diverse predictive architectures, including MLP, ResNet, and Transformer. Our findings show that i) the problem appears to be solved for far-OODs, but remains open for near-OODs; ii) post-hoc methods alone perform poorly, but improve substantially when coupled with distance-based mechanisms; iii) the transformer architecture is far less overconfident compared to MLP and ResNet.
    摘要 尽管机器学习(ML)模型取得了成功,但它们难以有效泛化到训练分布之外的数据。为了在真实世界的医疗系统中可靠地使用 ML 模型并避免对分布外(OOD)数据作出错误预测,检测 OOD 样本至关重要。其他领域(尤其是计算机视觉)已经提出了大量 OOD 检测方法,但在医疗表格数据上这一挑战是否已被解决仍不清楚。为回答这一迫切问题,我们提出了一个可复现的大型基准,在包括近 OOD 和远 OOD 的一系列测试中比较不同方法。该基准使用最新版本的 eICU 和 MIMIC-IV 两个公开数据集,涵盖多家医院数万名 ICU 患者。我们在 MLP、ResNet 和 Transformer 等多种预测架构上考察了大量基于密度的方法和最新的事后(post-hoc)检测器。结果表明:i)远 OOD 问题似乎已经解决,但近 OOD 问题仍然悬而未决;ii)仅使用事后方法表现不佳,但与基于距离的机制结合后性能显著提升;iii)与 MLP 和 ResNet 相比,Transformer 架构的过度自信程度要低得多。
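The finding that post-hoc scores improve when paired with distance-based mechanisms can be illustrated by combining maximum softmax probability with a kNN distance in feature space; this is a generic sketch rather than a method from the benchmark, and `logits` / `features` are assumed to come from an already-trained tabular model.

```python
import numpy as np
from scipy.special import softmax
from sklearn.neighbors import NearestNeighbors

def msp_score(logits):
    """Post-hoc score: higher max softmax probability -> more in-distribution."""
    return softmax(logits, axis=1).max(axis=1)

def knn_score(train_features, test_features, k=10):
    """Distance-based score: negative mean distance to the k nearest training points."""
    nn = NearestNeighbors(n_neighbors=k).fit(train_features)
    dists, _ = nn.kneighbors(test_features)
    return -dists.mean(axis=1)

def combined_ood_score(logits, train_features, test_features, alpha=0.5):
    """Convex combination of rank-normalized post-hoc and distance-based scores."""
    def rank_norm(s):
        return np.argsort(np.argsort(s)) / max(len(s) - 1, 1)
    return alpha * rank_norm(msp_score(logits)) + (1 - alpha) * rank_norm(
        knn_score(train_features, test_features))
```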

VDC: Versatile Data Cleanser for Detecting Dirty Samples via Visual-Linguistic Inconsistency

  • paper_url: http://arxiv.org/abs/2309.16211
  • repo_url: None
  • paper_authors: Zihao Zhu, Mingda Zhang, Shaokui Wei, Bingzhe Wu, Baoyuan Wu
  • for: 提高数据驱动 AI 系统的可靠性,检测数据集中的脏样本。
  • methods: 提出了多功能数据清洗器(VDC),借助多模态大语言模型(MLLM)的跨模态对齐和推理能力来捕捉图像与标签之间的语义不一致;VDC 由三个连续模块组成:视觉问题生成模块、视觉问答模块和视觉答案评估模块。
  • results: 大量实验表明,VDC 对多种类型和来源的脏样本都具有优越的性能和泛化能力。
    Abstract The role of data in building AI systems has recently been emphasized by the emerging concept of data-centric AI. Unfortunately, in the real-world, datasets may contain dirty samples, such as poisoned samples from backdoor attack, noisy labels in crowdsourcing, and even hybrids of them. The presence of such dirty samples makes the DNNs vunerable and unreliable.Hence, it is critical to detect dirty samples to improve the quality and realiability of dataset. Existing detectors only focus on detecting poisoned samples or noisy labels, that are often prone to weak generalization when dealing with dirty samples from other domains.In this paper, we find a commonality of various dirty samples is visual-linguistic inconsistency between images and associated labels. To capture the semantic inconsistency between modalities, we propose versatile data cleanser (VDC) leveraging the surpassing capabilities of multimodal large language models (MLLM) in cross-modal alignment and reasoning.It consists of three consecutive modules: the visual question generation module to generate insightful questions about the image; the visual question answering module to acquire the semantics of the visual content by answering the questions with MLLM; followed by the visual answer evaluation module to evaluate the inconsistency.Extensive experiments demonstrate its superior performance and generalization to various categories and types of dirty samples.
    摘要 数据在构建 AI 系统中的作用,最近因“以数据为中心的 AI”这一新兴概念而受到强调。然而在现实中,数据集可能包含脏样本,例如后门攻击产生的投毒样本、众包标注中的噪声标签,甚至二者的混合。这些脏样本的存在使深度神经网络(DNN)变得脆弱且不可靠,因此检测脏样本对于提升数据集的质量和可靠性至关重要。现有检测器只关注投毒样本或噪声标签的检测,在面对来自其他领域的脏样本时往往泛化能力较弱。本文发现各类脏样本的一个共性:图像与其标签之间存在视觉-语言不一致。为捕捉模态之间的语义不一致,我们提出了多功能数据清洗器(VDC),借助多模态大语言模型(MLLM)在跨模态对齐与推理方面的卓越能力。VDC 由三个连续的模块组成:视觉问题生成模块,用于针对图像生成有洞察力的问题;视觉问答模块,通过 MLLM 回答问题来获取视觉内容的语义;以及视觉答案评估模块,用于评估不一致性。大量实验表明,VDC 对不同类别、不同来源的脏样本均表现出优越的性能和泛化能力。

Design of JiuTian Intelligent Network Simulation Platform

  • paper_url: http://arxiv.org/abs/2310.06858
  • repo_url: None
  • paper_authors: Lei Zhao, Miaomiao Zhang, Guangyu Li, Zhuowen Guan, Sijia Liu, Zhaobin Xiao, Yuting Cao, Zhe Lv, Yanping Liang
  • for: 本文介绍了九天智能网络仿真平台,为开放创新平台提供无线通信仿真数据服务。
  • methods: 该平台包含一系列可扩展的仿真器功能,提供开放服务,使用户能够基于仿真环境和数据使用强化学习算法进行模型训练和推理,并可通过上传和更新参数配置来完成不同场景下的优化任务。
  • results: 本文主要从背景、总体架构、仿真器、业务场景和未来方向等角度介绍该平台及其开放服务。
    Abstract This paper introduced the JiuTian Intelligent Network Simulation Platform, which can provide wireless communication simulation data services for the Open Innovation Platform. The platform contains a series of scalable simulator functionalities, offering open services that enable users to use reinforcement learning algorithms for model training and inference based on simulation environments and data. Additionally, it allows users to address optimization tasks in different scenarios by uploading and updating parameter configurations. The platform and its open services were primarily introduced from the perspectives of background, overall architecture, simulator, business scenarios, and future directions.
    摘要 这篇论文介绍了九天智能网络模拟平台,该平台可以为开放创新平台提供无线通信模拟数据服务。平台包括一系列可扩展的模拟器功能,提供开放服务,让用户通过模拟环境和数据进行模型训练和推理,同时允许用户在不同的enario中解决优化问题,通过上传和更新参数配置。平台和其开放服务主要从背景、总体架构、模拟器、业务场景和未来方向等角度进行了介绍。

Pushing Large Language Models to the 6G Edge: Vision, Challenges, and Opportunities

  • paper_url: http://arxiv.org/abs/2309.16739
  • repo_url: None
  • paper_authors: Zheng Lin, Guanqiao Qu, Qiyuan Chen, Xianhao Chen, Zhe Chen, Kaibin Huang
  • for: 本文旨在探讨将大语言模型(LLM)部署到6G边缘计算(MEC)系统中的可能性,以解决云部署中的长响应时间、高频带宽成本和数据隐私问题。
  • methods: 本文首先介绍多模态 LLM 在机器人和医疗等领域的杀手级应用,以说明在用户附近部署 LLM 的必要性;随后指出在边缘部署 LLM 所面临的关键挑战,概述面向 LLM 的 6G MEC 架构,并讨论边缘训练与边缘推理两个设计方面。为使 LLM 在边缘的部署更加高效,本文介绍了拆分学习/推理、参数高效微调、量化以及参数共享推理等多种前沿技术。
  • results: 作为一篇立场文章,本文旨在全面阐明在 6G 边缘部署 LLM 的动机、挑战与路径,以推动 LLM 在 6G MEC 系统中的高效部署。
    Abstract Large language models (LLMs), which have shown remarkable capabilities, are revolutionizing AI development and potentially shaping our future. However, given their multimodality, the status quo cloud-based deployment faces some critical challenges: 1) long response time; 2) high bandwidth costs; and 3) the violation of data privacy. 6G mobile edge computing (MEC) systems may resolve these pressing issues. In this article, we explore the potential of deploying LLMs at the 6G edge. We start by introducing killer applications powered by multimodal LLMs, including robotics and healthcare, to highlight the need for deploying LLMs in the vicinity of end users. Then, we identify the critical challenges for LLM deployment at the edge and envision the 6G MEC architecture for LLMs. Furthermore, we delve into two design aspects, i.e., edge training and edge inference for LLMs. In both aspects, considering the inherent resource limitations at the edge, we discuss various cutting-edge techniques, including split learning/inference, parameter-efficient fine-tuning, quantization, and parameter-sharing inference, to facilitate the efficient deployment of LLMs. This article serves as a position paper for thoroughly identifying the motivation, challenges, and pathway for empowering LLMs at the 6G edge.
    摘要 大型语言模型(LLM)展现出了卓越的能力,正在变革 AI 的发展并可能塑造我们的未来。然而,鉴于其多模态特性,当前基于云的部署方式面临一些关键挑战:1)响应时间长;2)带宽成本高;3)侵犯数据隐私。6G 移动边缘计算(MEC)系统有望解决这些紧迫问题。本文探讨在 6G 边缘部署 LLM 的潜力:首先介绍由多模态 LLM 驱动的杀手级应用(包括机器人和医疗),以强调在终端用户附近部署 LLM 的需求;随后指出在边缘部署 LLM 的关键挑战,并设想面向 LLM 的 6G MEC 架构。进一步地,我们深入讨论两个设计方面,即 LLM 的边缘训练与边缘推理。考虑到边缘固有的资源限制,我们在这两方面讨论了多种前沿技术,包括拆分学习/推理、参数高效微调、量化以及参数共享推理,以支持 LLM 的高效部署。本文作为一篇立场文章,旨在全面阐明在 6G 边缘赋能 LLM 的动机、挑战与路径。
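Among the techniques listed, split inference is the easiest to picture: the first layers run on the device and the remainder on the edge server, with only the intermediate activation crossing the network. A toy sketch with a generic feed-forward stack standing in for an LLM:

```python
import torch
import torch.nn as nn

# A generic stack standing in for an LLM; in practice the cut would fall
# between transformer blocks rather than between MLP layers.
full_model = nn.Sequential(
    nn.Linear(512, 1024), nn.ReLU(),   # block 0
    nn.Linear(1024, 1024), nn.ReLU(),  # block 1
    nn.Linear(1024, 512),              # block 2
)

cut = 2                                 # device runs layers [0, cut), server runs the rest
device_part = full_model[:cut]
server_part = full_model[cut:]

def split_inference(x):
    smashed = device_part(x)            # computed on the user device
    payload = smashed.detach()          # only this activation is sent over the network
    return server_part(payload)         # completed on the edge server

out = split_inference(torch.randn(1, 512))
print(out.shape)                        # torch.Size([1, 512])
```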

A More General Theory of Diagnosis from First Principles

  • paper_url: http://arxiv.org/abs/2309.16180
  • repo_url: https://github.com/alban-grastien/diagfwork
  • paper_authors: Alban Grastien, Patrik Haslum, Sylvie Thiébaux
  • for: 这篇论文旨在统一现有的诊断方法,提出一种更通用的“从第一性原理出发的诊断”理论,以适应不同类型的系统和诊断问题。
  • methods: 该理论将最小诊断定义为假设搜索空间中的首选诊断候选集合,通过探索诊断假设空间、检验假设集合与系统模型及观测的一致性,并生成冲突以排除后继及搜索空间的其他部分来进行计算。
  • results: 论文给出了基于可满足性求解和启发式搜索两种测试求解器的实现,并在两个真实世界的离散事件问题上进行了评估;结果显示,尽管理论更为通用,这些实现仍优于专门为离散事件系统设计的算法,并能求解现有诊断方法无法处理的实例。
    Abstract Model-based diagnosis has been an active research topic in different communities including artificial intelligence, formal methods, and control. This has led to a set of disparate approaches addressing different classes of systems and seeking different forms of diagnoses. In this paper, we resolve such disparities by generalising Reiter's theory to be agnostic to the types of systems and diagnoses considered. This more general theory of diagnosis from first principles defines the minimal diagnosis as the set of preferred diagnosis candidates in a search space of hypotheses. Computing the minimal diagnosis is achieved by exploring the space of diagnosis hypotheses, testing sets of hypotheses for consistency with the system's model and the observation, and generating conflicts that rule out successors and other portions of the search space. Under relatively mild assumptions, our algorithms correctly compute the set of preferred diagnosis candidates. The main difficulty here is that the search space is no longer a powerset as in Reiter's theory, and that, as consequence, many of the implicit properties (such as finiteness of the search space) no longer hold. The notion of conflict also needs to be generalised and we present such a more general notion. We present two implementations of these algorithms, using test solvers based on satisfiability and heuristic search, respectively, which we evaluate on instances from two real world discrete event problems. Despite the greater generality of our theory, these implementations surpass the special purpose algorithms designed for discrete event systems, and enable solving instances that were out of reach of existing diagnosis approaches.
    摘要 基于模型的诊断一直是人工智能、形式化方法和控制等多个领域的活跃研究课题,由此产生了一系列针对不同类别系统、寻求不同形式诊断的互不相同的方法。本文通过推广 Reiter 的理论,使其不依赖于所考虑的系统和诊断类型,从而消除这些差异。这一更通用的“从第一性原理出发的诊断”理论,将最小诊断定义为假设搜索空间中的首选诊断候选集合。计算最小诊断的方式是:探索诊断假设空间,检验假设集合与系统模型及观测的一致性,并生成冲突以排除后继及搜索空间的其他部分。在较温和的假设下,我们的算法能够正确计算首选诊断候选集合。这里的主要困难在于搜索空间不再像 Reiter 理论中那样是幂集,因而许多隐含性质(例如搜索空间的有限性)不再成立;冲突的概念也需要推广,我们给出了这样一个更一般的冲突概念。我们分别基于可满足性求解和启发式搜索实现了两个测试求解器,并在两个真实世界的离散事件问题实例上进行了评估。尽管我们的理论更为通用,这些实现仍超越了专门为离散事件系统设计的算法,并能求解现有诊断方法无法处理的实例。
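A compact, didactic sketch of the hypothesis-space exploration described above, in the classic setting where diagnoses are subsets of faulty components; the consistency check is a user-supplied oracle and pruning keeps supersets of already-found diagnoses out of the search. This is a simplification for illustration, not the paper's generalized algorithm.

```python
from itertools import combinations

def minimal_diagnoses(components, consistent):
    """Enumerate subset-minimal diagnoses.

    components: iterable of component names.
    consistent(candidate): True iff assuming every component in `candidate` is
    faulty (and the rest healthy) is consistent with the model and observation.
    """
    components = list(components)
    found = []
    for size in range(len(components) + 1):          # explore by increasing candidate size
        for candidate in combinations(components, size):
            cand = set(candidate)
            if any(d <= cand for d in found):         # prune supersets of known diagnoses
                continue
            if consistent(cand):
                found.append(cand)
    return found

# Toy example: the observation is explained iff "a" is faulty, or both "b" and "c" are.
oracle = lambda cand: "a" in cand or {"b", "c"} <= cand
print(minimal_diagnoses(["a", "b", "c"], oracle))     # [{'a'}, {'b', 'c'}]
```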

Attention Sorting Combats Recency Bias In Long Context Language Models

  • paper_url: http://arxiv.org/abs/2310.01427
  • repo_url: None
  • paper_authors: Alexander Peysakhovich, Adam Lerer
  • for: 提高长文本模型在生成中的效率,解决现有语言模型在处理长文本时的问题。
  • methods: 引入“注意力排序”策略,在响应过程中多次执行一步解码,然后根据注意力排序文档,最终生成答案。
  • results: 提高长文本模型的表现,并指出现有语言模型在使用 Retrieval-Augmented Generation 时存在一些挑战。
    Abstract Current language models often fail to incorporate long contexts efficiently during generation. We show that a major contributor to this issue are attention priors that are likely learned during pre-training: relevant information located earlier in context is attended to less on average. Yet even when models fail to use the information from a relevant document in their response, they still pay preferential attention to that document compared to an irrelevant document at the same position. We leverage this fact to introduce ``attention sorting'': perform one step of decoding, sort documents by the attention they receive (highest attention going last), repeat the process, generate the answer with the newly sorted context. We find that attention sorting improves performance of long context models. Our findings highlight some challenges in using off-the-shelf language models for retrieval augmented generation.
    摘要 当前的语言模型在生成时往往难以高效利用长上下文。我们发现,造成这一问题的一个主要因素是预训练中可能习得的注意力先验:位于上下文较早位置的相关信息平均获得的注意力更少。然而,即使模型没有在回答中利用相关文档的信息,它对该文档的注意力仍然高于处于相同位置的无关文档。我们利用这一事实提出“注意力排序”:先执行一步解码,按各文档获得的注意力对文档排序(注意力最高者排在最后),重复该过程,最后用重新排序后的上下文生成答案。我们发现注意力排序能够提升长上下文模型的性能。我们的发现也揭示了将现成语言模型用于检索增强生成时面临的一些挑战。
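The procedure reads directly as a loop. A schematic sketch in which `decode_one_step` (returning one attention score per document from a single decoding step) and `generate` are caller-supplied, hypothetical wrappers around the underlying LLM, not real library calls:

```python
def attention_sort(question, documents, decode_one_step, generate, rounds=2):
    """Re-order retrieved documents so the most-attended ones appear last in the
    context (closest to the question), then generate the answer.

    decode_one_step(docs, question) -> list of per-document attention scores from
    one decoding step; generate(docs, question) -> answer string. Both are
    hypothetical stand-ins for the underlying model calls.
    """
    docs = list(documents)
    for _ in range(rounds):
        scores = decode_one_step(docs, question)
        # ascending sort: highest-attention documents go last
        docs = [d for _, d in sorted(zip(scores, docs), key=lambda pair: pair[0])]
    return generate(docs, question)

# Toy check with stand-in helpers: attention proportional to keyword overlap.
fake_attention = lambda docs, q: [sum(w in d for w in q.split()) for d in docs]
fake_generate = lambda docs, q: f"answer based on last doc: {docs[-1]}"
print(attention_sort("capital of France",
                     ["doc about cats", "Paris is the capital of France"],
                     fake_attention, fake_generate))
```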

CoinRun: Solving Goal Misgeneralisation

  • paper_url: http://arxiv.org/abs/2309.16166
  • repo_url: None
  • paper_authors: Stuart Armstrong, Alexandre Maranhão, Oliver Daniels-Koch, Patrick Leask, Rebecca Gorman
  • for: 解决人工智能中的目标误泛化问题,使强大的人工智能的目标与人类意图和人类道德保持一致。
  • methods: 使用 ACE(概念外推算法)智能体,在不利用新环境中任何新的奖励信息的情况下解决 CoinRun 挑战,表明自主智能体在全新且关键的情形下也可以被信任按人类利益行事。
  • results: ACE(概念外推算法)智能体在不使用新的奖励信息的情况下,解决了 CoinRun 挑战中的目标误泛化问题。
    Abstract Goal misgeneralisation is a key challenge in AI alignment -- the task of getting powerful Artificial Intelligences to align their goals with human intentions and human morality. In this paper, we show how the ACE (Algorithm for Concept Extrapolation) agent can solve one of the key standard challenges in goal misgeneralisation: the CoinRun challenge. It uses no new reward information in the new environment. This points to how autonomous agents could be trusted to act in human interests, even in novel and critical situations.
    摘要 目标误泛化是 AI 对齐(即让强大的人工智能使其目标与人类意图和人类道德保持一致)中的一个关键挑战。本文展示了 ACE(概念外推算法)智能体如何解决目标误泛化中的一个标准挑战:CoinRun 挑战。它在新环境中不使用任何新的奖励信息。这表明即使在全新且关键的情形下,自主智能体也可以被信任按人类利益行事。

Leveraging Untrustworthy Commands for Multi-Robot Coordination in Unpredictable Environments: A Bandit Submodular Maximization Approach

  • paper_url: http://arxiv.org/abs/2309.16161
  • repo_url: None
  • paper_authors: Zirui Xu, Xiaofeng Lin, Vasileios Tzoumas
  • for: 本文研究在不可预测、部分可观测且外部指令不可信的环境中的多机器人协同问题。
  • methods: 本文提出 Meta Bandit Sequential Greedy(MetaBSG)算法,即使外部指令任意糟糕也能提供性能保证;MetaBSG 利用一个元算法来学习机器人应当遵循外部指令,还是遵循最近提出的、在不可预测和部分可观测环境中具有性能保证的次模协同算法 Bandit Sequential Greedy(BSG)。
  • results: 本文在多目标跟踪的仿真场景中验证了 MetaBSG 的有效性,表明其能够在外部指令不可信的情况下实现对指令的鲁棒化。
    Abstract We study the problem of multi-agent coordination in unpredictable and partially-observable environments with untrustworthy external commands. The commands are actions suggested to the robots, and are untrustworthy in that their performance guarantees, if any, are unknown. Such commands may be generated by human operators or machine learning algorithms and, although untrustworthy, can often increase the robots' performance in complex multi-robot tasks. We are motivated by complex multi-robot tasks such as target tracking, environmental mapping, and area monitoring. Such tasks are often modeled as submodular maximization problems due to the information overlap among the robots. We provide an algorithm, Meta Bandit Sequential Greedy (MetaBSG), which enjoys performance guarantees even when the external commands are arbitrarily bad. MetaBSG leverages a meta-algorithm to learn whether the robots should follow the commands or a recently developed submodular coordination algorithm, Bandit Sequential Greedy (BSG) [1], which has performance guarantees even in unpredictable and partially-observable environments. Particularly, MetaBSG asymptotically can achieve the better performance out of the commands and the BSG algorithm, quantifying its suboptimality against the optimal time-varying multi-robot actions in hindsight. Thus, MetaBSG can be interpreted as robustifying the untrustworthy commands. We validate our algorithm in simulated scenarios of multi-target tracking.
    摘要 我们研究在不可预测、部分可观测环境中,外部指令不可信情况下的多智能体协同问题。这些指令是向机器人建议的动作,其不可信之处在于其性能保证(如果有的话)是未知的。此类指令可能由人类操作员或机器学习算法生成,虽然不可信,却往往能在复杂的多机器人任务中提升机器人的表现。我们的研究动机来自目标跟踪、环境建图和区域监测等复杂多机器人任务;由于机器人之间的信息重叠,这类任务通常被建模为次模最大化问题。我们提出 Meta Bandit Sequential Greedy(MetaBSG)算法,即使外部指令任意糟糕也能提供性能保证。MetaBSG 利用一个元算法来学习机器人应当遵循外部指令,还是遵循最近提出的次模协同算法 Bandit Sequential Greedy(BSG)[1],后者即使在不可预测和部分可观测环境中也具有性能保证。特别地,MetaBSG 能够渐近地达到外部指令与 BSG 算法二者中较优的性能,并以事后最优的时变多机器人动作为基准量化其次优性,因此可以被理解为对不可信指令的鲁棒化。我们在多目标跟踪的仿真场景中验证了该算法。

AE-GPT: Using Large Language Models to Extract Adverse Events from Surveillance Reports-A Use Case with Influenza Vaccine Adverse Events

  • paper_url: http://arxiv.org/abs/2309.16150
  • repo_url: None
  • paper_authors: Yiming Li, Jianfu Li, Jianping He, Cui Tao
  • for: 本研究旨在评估大语言模型(LLMs)从监测报告中抽取疫苗不良事件(AE)的能力。
  • methods: 研究使用 1990 年至 2016 年的 VAERS 数据,评估了多种常见 LLM,包括 GPT-2、GPT-3 变体、GPT-4 和 Llama 2,并以流感疫苗为用例对 GPT-3.5 模型进行了微调。
  • results: 微调后的 GPT-3.5 模型(AE-GPT)在严格匹配和宽松匹配下的微平均 F1 分数分别达到 0.704 和 0.816,表明 LLM 在处理医疗数据方面具有潜力,并有望推广到其他不良事件抽取任务。
    Abstract Though Vaccines are instrumental in global health, mitigating infectious diseases and pandemic outbreaks, they can occasionally lead to adverse events (AEs). Recently, Large Language Models (LLMs) have shown promise in effectively identifying and cataloging AEs within clinical reports. Utilizing data from the Vaccine Adverse Event Reporting System (VAERS) from 1990 to 2016, this study particularly focuses on AEs to evaluate LLMs' capability for AE extraction. A variety of prevalent LLMs, including GPT-2, GPT-3 variants, GPT-4, and Llama 2, were evaluated using Influenza vaccine as a use case. The fine-tuned GPT 3.5 model (AE-GPT) stood out with a 0.704 averaged micro F1 score for strict match and 0.816 for relaxed match. The encouraging performance of the AE-GPT underscores LLMs' potential in processing medical data, indicating a significant stride towards advanced AE detection, thus presumably generalizable to other AE extraction tasks.
    摘要 尽管疫苗在全球医疗中发挥了重要作用,减轻感染病和流行病舌,但它们有时会导致不良反应(AE)。最近,大型自然语言模型(LLM)已经显示出了有效地标识和目录AE的潜力。通过使用1990年至2016年的疫苗不良反应报送系统(VAERS)数据,本研究专门关注AE来评估LLM的能力。包括GPT-2、GPT-3变种、GPT-4和Llama 2等多种常见LLM都被评估,并使用Influenza疫苗作为用例。细化的GPT 3.5模型(AE-GPT)表现出色,其微型F1分数为0.704(严格匹配)和0.816(松散匹配)。AE-GPT的良好表现表明LLM在处理医疗数据方面的潜力,这表明可能在其他AE抽取任务中表现出色。
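As a rough illustration of the strict versus relaxed matching reported above, the sketch below computes a micro-averaged F1 over extracted adverse-event spans. The span representation (character offsets) and the overlap rule for relaxed matching are assumptions, not the paper's exact evaluation script.

```python
def micro_f1(gold, pred, relaxed=False):
    """Micro-averaged F1 over extracted adverse-event spans.
    `gold` and `pred` are lists of documents; each document is a set of
    (start, end) character offsets. Strict matching requires identical
    offsets; relaxed matching counts any overlapping span as a hit."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        matched_gold = set()
        for span in p:
            if relaxed:
                hit = next((gs for gs in g if gs not in matched_gold
                            and span[0] < gs[1] and gs[0] < span[1]), None)
            else:
                hit = span if span in g and span not in matched_gold else None
            if hit is not None:
                tp += 1
                matched_gold.add(hit)
            else:
                fp += 1
        fn += len(g) - len(matched_gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```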

T-COL: Generating Counterfactual Explanations for General User Preferences on Variable Machine Learning Systems

  • paper_url: http://arxiv.org/abs/2309.16146
  • repo_url: https://github.com/neu-datamining/t-col
  • paper_authors: Ming Wang, Daling Wang, Wenfang Wu, Shi Feng, Yifei Zhang
  • for: The paper aims to address the lack of interpretability in machine learning (ML) systems by proposing a new method called Tree-based Conditions Optional Links (T-COL) to generate counterfactual explanations (CEs) that can be adapted to general user preferences.
  • methods: The proposed T-COL method uses tree-based structures and conditions to generate CEs that can be customized to suit the variability of ML models, while maintaining robustness even when the validation models change.
  • results: The paper experimentally compares the properties of CEs generated by T-COL under different user preferences and demonstrates that T-COL is better suited for accommodating user preferences and variable ML systems compared to baseline methods, including Large Language Models.
    Abstract Machine learning (ML) based systems have been suffering a lack of interpretability. To address this problem, counterfactual explanations (CEs) have been proposed. CEs are unique as they provide workable suggestions to users, in addition to explaining why a certain outcome was predicted. However, the application of CEs has been hindered by two main challenges, namely general user preferences and variable ML systems. User preferences, in particular, tend to be general rather than specific feature values. Additionally, CEs need to be customized to suit the variability of ML models, while also maintaining robustness even when these validation models change. To overcome these challenges, we propose several possible general user preferences that have been validated by user research and map them to the properties of CEs. We also introduce a new method called \uline{T}ree-based \uline{C}onditions \uline{O}ptional \uline{L}inks (T-COL), which has two optional structures and several groups of conditions for generating CEs that can be adapted to general user preferences. Meanwhile, a group of conditions lead T-COL to generate more robust CEs that have higher validity when the ML model is replaced. We compared the properties of CEs generated by T-COL experimentally under different user preferences and demonstrated that T-COL is better suited for accommodating user preferences and variable ML systems compared to baseline methods including Large Language Models.

Generative Semi-supervised Learning with Meta-Optimized Synthetic Samples

  • paper_url: http://arxiv.org/abs/2309.16143
  • repo_url: None
  • paper_authors: Shin’ya Yamaguchi
  • for: 这篇论文的目的是研究半监督学习(SSL)方法,以便在不依赖大量真实无标注数据的情况下训练深度分类模型。
  • methods: 论文提出了一种使用生成模型合成数据进行SSL训练的方法,包括:(1)meta-学习生成模型,使其生成与标注样本相似的合成样本;(2)使用真实标注样本和合成无标注样本进行SSL训练。
  • results: 研究结果表明,该方法可以超过基线方法,并在标注数据极少的场景下表现更好。这表明合成样本可以为SSL训练提供更高效的改进。
    Abstract Semi-supervised learning (SSL) is a promising approach for training deep classification models using labeled and unlabeled datasets. However, existing SSL methods rely on a large unlabeled dataset, which may not always be available in many real-world applications due to legal constraints (e.g., GDPR). In this paper, we investigate the research question: Can we train SSL models without real unlabeled datasets? Instead of using real unlabeled datasets, we propose an SSL method using synthetic datasets generated from generative foundation models trained on datasets containing millions of samples in diverse domains (e.g., ImageNet). Our main concepts are identifying synthetic samples that emulate unlabeled samples from generative foundation models and training classifiers using these synthetic samples. To achieve this, our method is formulated as an alternating optimization problem: (i) meta-learning of generative foundation models and (ii) SSL of classifiers using real labeled and synthetic unlabeled samples. For (i), we propose a meta-learning objective that optimizes latent variables to generate samples that resemble real labeled samples and minimize the validation loss. For (ii), we propose a simple unsupervised loss function that regularizes the feature extractors of classifiers to maximize the performance improvement obtained from synthetic samples. We confirm that our method outperforms baselines using generative foundation models on SSL. We also demonstrate that our methods outperform SSL using real unlabeled datasets in scenarios with extremely small amounts of labeled datasets. This suggests that synthetic samples have the potential to provide improvement gains more efficiently than real unlabeled data.
    摘要 SSL(半监督学习)是一种有前途的方法,用于训练深度分类模型,使用标注和无标注数据集。然而,现有的SSL方法往往需要大量的无标注数据集,在许多实际应用中可能无法获得,尤其是因为法律约束(例如GDPR)。在这篇论文中,我们研究问题:我们可以不使用真实的无标注数据集来训练SSL模型吗?取而代之,我们使用由生成基础模型生成的合成数据集,这些模型在覆盖多个领域(例如ImageNet)、包含数百万个样本的数据上训练。我们的主要思路是从生成基础模型中找出能够模拟无标注样本的合成样本,并使用这些样本来训练分类器。为此,我们的方法被形式化为一个交替优化问题:(i)生成基础模型的meta-学习和(ii)使用真实标注样本和合成无标注样本进行SSL。对于(i),我们提出了一个meta-学习目标,用于优化潜变量,以生成与真实标注样本更相似的样本,并最小化验证损失。对于(ii),我们提出了一个简单的无监督损失函数,用于规范分类器的特征提取器,以最大化从合成样本中获得的性能提升。我们证明了,我们的方法可以超越基于生成基础模型的SSL基线。此外,我们还证明了,我们的方法在标注数据极少的场景中优于使用真实无标注数据集的SSL,这表明合成样本有可能比真实无标注数据更高效地带来性能提升。
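The alternating optimization described above can be sketched as two sub-steps per round. The snippet below is a simplified stand-in: it assumes `classifier(x)` returns `(features, logits)`, and it replaces the paper's meta-objective and unsupervised loss with a feature-matching term and an entropy penalty, respectively.

```python
import torch
import torch.nn.functional as F

def alternating_step(z, generator, classifier, labeled_batch, opt_z, opt_clf, lam=1.0):
    """One simplified round of the alternating scheme:
    (i) nudge latent codes z so generated samples resemble real labeled samples
        in feature space (a stand-in for the paper's meta-objective);
    (ii) update the classifier on real labeled data plus the synthetic
        'unlabeled' samples, with an entropy regularizer (a stand-in for the
        paper's unsupervised loss)."""
    x, y = labeled_batch

    # (i) meta-learn the latents; only z is stepped here
    opt_z.zero_grad()
    synth = generator(z)
    with torch.no_grad():
        real_feat, _ = classifier(x)
    synth_feat, _ = classifier(synth)
    latent_loss = F.mse_loss(synth_feat.mean(0), real_feat.mean(0))
    latent_loss.backward()
    opt_z.step()

    # (ii) semi-supervised update of the classifier
    opt_clf.zero_grad()
    _, logits = classifier(x)
    sup_loss = F.cross_entropy(logits, y)
    _, synth_logits = classifier(generator(z).detach())
    probs = synth_logits.softmax(-1)
    unsup_loss = -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean()
    (sup_loss + lam * unsup_loss).backward()
    opt_clf.step()
    return sup_loss.item(), unsup_loss.item()
```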

ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers

  • paper_url: http://arxiv.org/abs/2309.16119
  • repo_url: https://github.com/kuleshov-group/modulora-experiment
  • paper_authors: Junjie Yin, Jiahao Dong, Yingheng Wang, Christopher De Sa, Volodymyr Kuleshov
  • For: The paper proposes a memory-efficient finetuning algorithm for large language models (LLMs) that supports finetuning LLMs with 65B parameters in 3-bit or 4-bit precision on as little as one 48GB GPU.
  • Methods: The paper proposes a method called ModuLoRA, which integrates any user-specified weight quantizer with finetuning via low-rank adapters (LoRAs). The approach relies on a simple quantization-agnostic backward pass that adaptively materializes low-precision LLM weights from a custom black-box quantization module.
  • Results: The paper achieves competitive performance on text classification, natural language inference, and instruction following tasks using significantly less memory than existing approaches, and surpasses the state-of-the-art ROUGE score on a popular summarization task. The paper also releases ModuLoRA together with a series of low-precision models, including the first family of 3-bit instruction-following Alpaca LLMs, as part of LLMTOOLS, a user-friendly library for quantizing, running, and finetuning LLMs on consumer GPUs.
    Abstract We propose a memory-efficient finetuning algorithm for large language models (LLMs) that supports finetuning LLMs with 65B parameters in 3-bit or 4-bit precision on as little as one 48GB GPU. Our method, modular low-rank adaptation (ModuLoRA), integrates any user-specified weight quantizer with finetuning via low-rank adapters (LoRAs). Our approach relies on a simple quantization-agnostic backward pass that adaptively materializes low-precision LLM weights from a custom black-box quantization module. This approach enables finetuning 3-bit LLMs for the first time--leveraging state-of-the-art 3-bit OPTQ quantization often outperforms finetuning that relies on less sophisticated 4-bit and 8-bit methods. In our experiments, ModuLoRA attains competitive performance on text classification, natural language infernece, and instruction following tasks using significantly less memory than existing approaches, and we also surpass the state-of-the-art ROUGE score on a popular summarization task. We release ModuLoRA together with a series of low-precision models--including the first family of 3-bit instruction following Alpaca LLMs--as part of LLMTOOLS, a user-friendly library for quantizing, running, and finetuning LLMs on consumer GPUs.
    摘要 我们提出了一种内存效率高的训练算法,用于大型自然语言模型(LLM)的精细调整。我们的方法,即模块化低级适应(ModuLoRA),可以将用户指定的Weight量化器与精细调整结合在一起。我们的方法基于一种简单的量化不可逆传输,可以动态将低精度LLM weights转换为自定义黑盒量化模块中的低精度模型。这种方法使得可以在3位数字精度下进行精细调整,并且可以首次实现3位LLM的训练。在我们的实验中,ModuLoRA在文本分类、自然语言推理和指令遵从任务中达到了竞争力的性能,并且使用了许多内存少于现有方法。此外,我们还超越了当前的ROUGE分数在摘要任务中。我们将ModuLoRA和一系列低精度模型(包括首个3位指令遵从Alpaca LLM)发布为LLMTOOLS,一个用户友好的库,用于在消费级GPU上量化、运行和精细调整LLM。
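A minimal sketch of the core mechanism, assuming a black-box `dequantize(quant_state)` callable supplied by the user's quantizer: the low-precision base weight is materialized on the fly as an autograd constant, so gradients reach only the input and the LoRA factors. Class, argument, and function names here are illustrative, not the LLMTOOLS API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuantizedLinearWithLoRA(nn.Module):
    """Frozen quantized base weight + trainable LoRA adapter (simplified sketch).
    `quant_state` and `dequantize` stand in for an arbitrary black-box quantizer
    (e.g. a 3- or 4-bit scheme); only lora_A / lora_B receive gradients."""

    def __init__(self, quant_state, dequantize, in_features, out_features, rank=16, alpha=16):
        super().__init__()
        self.quant_state = quant_state      # packed low-precision weights (opaque)
        self.dequantize = dequantize        # callable: quant_state -> fp16 [out, in]
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Materialize the full-precision weight only for this call; it is a
        # constant w.r.t. autograd, so gradients flow to x and the LoRA factors.
        with torch.no_grad():
            w = self.dequantize(self.quant_state)
        base = F.linear(x, w)
        lora = F.linear(F.linear(x, self.lora_A), self.lora_B) * self.scale
        return base + lora
```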

E2Net: Resource-Efficient Continual Learning with Elastic Expansion Network

  • paper_url: http://arxiv.org/abs/2309.16117
  • repo_url: https://github.com/liuruiqi0520/E2Net
  • paper_authors: RuiQi Liu, Boyu Diao, Libo Huang, Zhulin An, Yongjun Xu
  • for: 本研究旨在提出一种资源效率高的连续学习方法(Elastic Expansion Network,E2Net),以便在同等计算和存储限制下实现高平均精度和减少忘记。
  • methods: 该方法利用核心子网络精炼和准确回顾样本选择,以实现在同等计算和存储限制下的高平均精度和减少忘记。另外,我们还提出了代表网络精炼来确定核心子网络,以减少回顾缓存的依赖性和促进知识传递。
  • results: 我们的实验表明,E2Net在云环境和边缘环境中的多个数据集上具有很高的表现,并且在计算和存储限制下表现更优于当前状态方法。此外,我们的方法还在计算和存储限制下表现更优于竞争对手。
    Abstract Continual Learning methods are designed to learn new tasks without erasing previous knowledge. However, Continual Learning often requires massive computational power and storage capacity for satisfactory performance. In this paper, we propose a resource-efficient continual learning method called the Elastic Expansion Network (E2Net). Leveraging core subnet distillation and precise replay sample selection, E2Net achieves superior average accuracy and diminished forgetting within the same computational and storage constraints, all while minimizing processing time. In E2Net, we propose Representative Network Distillation to identify the representative core subnet by assessing parameter quantity and output similarity with the working network, distilling analogous subnets within the working network to mitigate reliance on rehearsal buffers and facilitating knowledge transfer across previous tasks. To enhance storage resource utilization, we then propose Subnet Constraint Experience Replay to optimize rehearsal efficiency through a sample storage strategy based on the structures of representative networks. Extensive experiments conducted predominantly on cloud environments with diverse datasets and also spanning the edge environment demonstrate that E2Net consistently outperforms state-of-the-art methods. In addition, our method outperforms competitors in terms of both storage and computational requirements.
    摘要 在E2Net中,我们提出了代表网络传授(Representative Network Distillation)来识别代表的核心子网,通过评估参数量和输出相似度,将相似的子网内部 integrate into the working network,以减少复练缓存的依赖和促进知识传递。此外,我们还提出了子网对应体验储存(Subnet Constraint Experience Replay)来优化复练效率,通过基于代表网络的构造储存样本。实验结果显示,E2Net在云端环境中与多种数据集进行实验,以及边缘环境中进行部分实验,能够与现有的方法相比,具有较高的性能。此外,我们的方法还比竞争者在存储和计算需求方面表现更好。

Channel Vision Transformers: An Image Is Worth C x 16 x 16 Words

  • paper_url: http://arxiv.org/abs/2309.16108
  • repo_url: https://github.com/insitro/channelvit
  • paper_authors: Yujia Bao, Srinivasan Sivanandan, Theofanis Karaletsos
  • for: 这篇文章的目的是对现代计算机视觉领域中的 Vision Transformer (ViT) 架构进行修改,以便在微scopic 和卫星影像领域中进行应用。
  • methods: 这篇文章提出了一种修改后的 ChannelViT 模型,它将在每个输入通道中独立建立 patch tokens,并使用可学习的通道嵌入,与位置嵌入一样。此外,它还引入了 Hierarchical Channel Sampling (HCS) 技术来保证模型在测试时运行时的Robustness。
  • results: 根据文章的实验结果,ChannelViT 模型在 ImageNet、JUMP-CP (微scopic 细胞影像)和 So2Sat (卫星影像)上的分类任务中表现出色,并且在只有部分输入通道可用时进行测试时仍能保持良好的表现。另外,HCS 技术被证明是一个强大的 regularizer,独立于架构选择。
    Abstract Vision Transformer (ViT) has emerged as a powerful architecture in the realm of modern computer vision. However, its application in certain imaging fields, such as microscopy and satellite imaging, presents unique challenges. In these domains, images often contain multiple channels, each carrying semantically distinct and independent information. Furthermore, the model must demonstrate robustness to sparsity in input channels, as they may not be densely available during training or testing. In this paper, we propose a modification to the ViT architecture that enhances reasoning across the input channels and introduce Hierarchical Channel Sampling (HCS) as an additional regularization technique to ensure robustness when only partial channels are presented during test time. Our proposed model, ChannelViT, constructs patch tokens independently from each input channel and utilizes a learnable channel embedding that is added to the patch tokens, similar to positional embeddings. We evaluate the performance of ChannelViT on ImageNet, JUMP-CP (microscopy cell imaging), and So2Sat (satellite imaging). Our results show that ChannelViT outperforms ViT on classification tasks and generalizes well, even when a subset of input channels is used during testing. Across our experiments, HCS proves to be a powerful regularizer, independent of the architecture employed, suggesting itself as a straightforward technique for robust ViT training. Lastly, we find that ChannelViT generalizes effectively even when there is limited access to all channels during training, highlighting its potential for multi-channel imaging under real-world conditions with sparse sensors. Our code is available at https://github.com/insitro/ChannelViT.
    摘要 现代计算机视觉领域中,视觉转换器(ViT)已经成为一种强大的建筑。然而,在微scopic和卫星成像等领域中,使用ViT时会遇到一些独特的挑战。在这些领域中,图像通常包含多个渠道,每个渠道都携带着semantically独立且独立的信息。此外,模型需要在输入渠道上示 robustness,因为它们可能不会在训练或测试时都可用。在这篇论文中,我们提出了对ViT建筑的修改,以提高输入渠道之间的推理,并 introduce Hierarchical Channel Sampling(HCS)作为一种额外的正则化技术,以确保在测试时只有部分渠道可用时的模型robustness。我们提出的模型,ChannelViT,通过独立地从每个输入渠道中构建 patch tokens,并使用可学习的渠道嵌入,类似于位置嵌入。我们在ImageNet、JUMP-CP(微scopic细胞成像)和So2Sat(卫星成像)上评估了ChannelViT的性能。我们的结果表明,ChannelViT在分类任务上比ViT高效,并且在部分输入渠道时可以通过HCS来提高模型的robustness。在我们的实验中,HCS作为一种独立的正则化技术,无论使用哪种建筑,都有很好的效果。最后,我们发现ChannelViT在有限的输入渠道可用时也能够很好地适应,表明它在实际中的多渠道成像中具有潜在的应用价值。我们的代码可以在https://github.com/insitro/ChannelViT中找到。
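The sketch below illustrates the two ingredients named in the abstract: per-channel patch tokens with a learnable channel embedding, and Hierarchical Channel Sampling that keeps a random subset of channels during training. Dimensions, the shared single-channel projection, and the exact HCS scheme are assumptions.

```python
import torch
import torch.nn as nn

class PerChannelPatchEmbed(nn.Module):
    """Sketch of ChannelViT-style tokenization: each input channel is patchified
    independently with a shared projection, then a learnable channel embedding
    is added (analogous to a positional embedding)."""

    def __init__(self, num_channels, patch=16, dim=384):
        super().__init__()
        self.proj = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        self.channel_embed = nn.Parameter(torch.zeros(num_channels, dim))

    def forward(self, x, channel_ids):
        # x: [B, C_used, H, W]; channel_ids: indices of the channels present
        B, C, H, W = x.shape
        tokens = self.proj(x.reshape(B * C, 1, H, W))        # [B*C, dim, h, w]
        tokens = tokens.flatten(2).transpose(1, 2)           # [B*C, h*w, dim]
        tokens = tokens.reshape(B, C, -1, tokens.shape[-1])  # [B, C, h*w, dim]
        tokens = tokens + self.channel_embed[channel_ids][None, :, None, :]
        return tokens.reshape(B, -1, tokens.shape[-1])       # [B, C*h*w, dim]

def hierarchical_channel_sampling(num_channels):
    """HCS at a high level: first sample how many channels to keep, then sample
    which ones (the paper's concrete scheme may differ)."""
    k = torch.randint(1, num_channels + 1, ()).item()
    return torch.randperm(num_channels)[:k].sort().values
```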

Discovering Utility-driven Interval Rules

  • paper_url: http://arxiv.org/abs/2309.16102
  • repo_url: https://github.com/asem010/legend-pice
  • paper_authors: Chunkai Zhang, Maohua Lyu, Huaijin Hao, Wensheng Gan, Philip S. Yu
  • for: 这个论文的目的是提出一种基于interval事件序列的高用途序列规则挖掘算法,以解决现有方法不能直接应用于interval事件序列的问题。
  • methods: 该算法使用了一种数值编码关系表示法,以减少关系计算和存储的时间,并提出了一种补做截断策略,通过与用途upper bound相结合,缩小搜索空间。
  • results: 实验表明,该算法可以效果地和高效地从interval事件序列数据库中提取高用途间隔规则(UIRs),并在实际世界和synthetic数据集上实现了优秀的效果。
    Abstract For artificial intelligence, high-utility sequential rule mining (HUSRM) is a knowledge discovery method that can reveal the associations between events in the sequences. Recently, abundant methods have been proposed to discover high-utility sequence rules. However, the existing methods are all related to point-based sequences. Interval events that persist for some time are common. Traditional interval-event sequence knowledge discovery tasks mainly focus on pattern discovery, but patterns cannot reveal the correlation between interval events well. Moreover, the existing HUSRM algorithms cannot be directly applied to interval-event sequences since the relation in interval-event sequences is much more intricate than those in point-based sequences. In this work, we propose a utility-driven interval rule mining (UIRMiner) algorithm that can extract all utility-driven interval rules (UIRs) from the interval-event sequence database to solve the problem. In UIRMiner, we first introduce a numeric encoding relation representation, which can save much time on relation computation and storage on relation representation. Furthermore, to shrink the search space, we also propose a complement pruning strategy, which incorporates the utility upper bound with the relation. Finally, plentiful experiments implemented on both real-world and synthetic datasets verify that UIRMiner is an effective and efficient algorithm.
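As an illustration of what a numeric encoding of interval relations can look like, the sketch below maps the endpoint ordering of two interval events to a small integer code in the style of Allen's relations, which keeps relation computation and storage cheap. The concrete encoding used by UIRMiner may differ.

```python
def interval_relation_code(a, b):
    """Map the relation between interval events a=(a_start, a_end) and
    b=(b_start, b_end) to a small integer. The codes follow classic
    Allen-style distinctions; the paper's exact encoding is not reproduced."""
    a_s, a_e = a
    b_s, b_e = b
    if a_e < b_s:                 return 0  # a before b
    if a_e == b_s:                return 1  # a meets b
    if a_s == b_s and a_e == b_e: return 2  # a equals b
    if a_s == b_s and a_e < b_e:  return 3  # a starts b
    if a_s > b_s and a_e == b_e:  return 4  # a finishes b
    if a_s > b_s and a_e < b_e:   return 5  # a during b
    return 6  # remaining cases (overlap / containment), lumped in this sketch

# Example: two partially overlapping events
print(interval_relation_code((0, 5), (3, 9)))  # -> 6
```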

Adversarial Examples Might be Avoidable: The Role of Data Concentration in Adversarial Robustness

  • paper_url: http://arxiv.org/abs/2309.16096
  • repo_url: None
  • paper_authors: Ambar Pal, Jeremias Sulam, René Vidal
  • for: This paper aims to investigate the question of whether adversarial examples are truly unavoidable for modern machine learning classifiers, and to provide theoretical results that demonstrate the existence of robust classifiers under certain conditions.
  • methods: The paper uses theoretical techniques to demonstrate the existence of robust classifiers for data distributions that have certain properties, such as concentration on small-volume subsets of the input space. The paper also explores the use of data structure to improve the robustness of classifiers.
  • results: The paper shows that, for certain data distributions, it is possible to construct classifiers that are robust to adversarial examples. Specifically, the paper demonstrates that, for data distributions concentrated on a union of low-dimensional linear subspaces, exploiting data structure naturally leads to classifiers that enjoy good robustness guarantees, improving upon methods for provable certification in certain regimes.
    Abstract The susceptibility of modern machine learning classifiers to adversarial examples has motivated theoretical results suggesting that these might be unavoidable. However, these results can be too general to be applicable to natural data distributions. Indeed, humans are quite robust for tasks involving vision. This apparent conflict motivates a deeper dive into the question: Are adversarial examples truly unavoidable? In this work, we theoretically demonstrate that a key property of the data distribution -- concentration on small-volume subsets of the input space -- determines whether a robust classifier exists. We further demonstrate that, for a data distribution concentrated on a union of low-dimensional linear subspaces, exploiting data structure naturally leads to classifiers that enjoy good robustness guarantees, improving upon methods for provable certification in certain regimes.
    摘要 现代机器学习分类器对对抗样本的敏感性促使一些理论结果认为这类样本可能不可避免。然而,这些结果往往过于一般化,难以直接适用于自然数据分布;事实上,人类在视觉任务中相当稳健。这种矛盾促使我们深入探究:对抗样本是否真的无法避免?在这项工作中,我们从理论上表明,数据分布的一个关键特性——在输入空间中小体积子集上的集中性——决定了是否存在稳健的分类器。我们进一步表明,对于集中在若干低维线性子空间并集上的数据分布,利用数据结构可以自然地得到具有良好稳健性保证的分类器,并在某些情形下改进了可证明认证方法。

AI Potentiality and Awareness: A Position Paper from the Perspective of Human-AI Teaming in Cybersecurity

  • paper_url: http://arxiv.org/abs/2310.12162
  • repo_url: None
  • paper_authors: Iqbal H. Sarker, Helge Janicke, Nazeeruddin Mohammad, Paul Watters, Surya Nepal
  • for: 本研究探讨了人工智能在网络安全领域的潜在可能性,尤其是其可能的风险因素,通过人机合作(Human-AI)来管理这些风险。
  • methods: 本研究使用了人工智能技术,如Pattern recognition和预测模型,探索了AI在网络安全领域的可能性,并提出了一种 equilibrio balance方法,即将人类专业知识与AI计算能力相结合,以提高网络安全防御能力。
  • results: 本研究发现,通过人机合作,可以提高网络安全防御能力,并且可以减少相关的风险因素。此外,本研究还发现,AI可以帮助人类专业人员更好地理解和解决网络安全问题。
    Abstract This position paper explores the broad landscape of AI potentiality in the context of cybersecurity, with a particular emphasis on its possible risk factors with awareness, which can be managed by incorporating human experts in the loop, i.e., "Human-AI" teaming. As artificial intelligence (AI) technologies advance, they will provide unparalleled opportunities for attack identification, incident response, and recovery. However, the successful deployment of AI into cybersecurity measures necessitates an in-depth understanding of its capabilities, challenges, and ethical and legal implications to handle associated risk factors in real-world application areas. Towards this, we emphasize the importance of a balanced approach that incorporates AI's computational power with human expertise. AI systems may proactively discover vulnerabilities and detect anomalies through pattern recognition, and predictive modeling, significantly enhancing speed and accuracy. Human experts can explain AI-generated decisions to stakeholders, regulators, and end-users in critical situations, ensuring responsibility and accountability, which helps establish trust in AI-driven security solutions. Therefore, in this position paper, we argue that human-AI teaming is worthwhile in cybersecurity, in which human expertise such as intuition, critical thinking, or contextual understanding is combined with AI's computational power to improve overall cyber defenses.
    摘要 这份位点纸 analyze AI在cybersecurity领域的广泛潜力,尤其是其可能的风险因素,可以通过将人类专家纳入循环来管理,即"人机合作"(Human-AI teaming)。随着人工智能(AI)技术的进步,它将为攻击标识、事件回应和恢复提供无 precedent的机会。然而,在实施 AI 到 cybersecurity 措施方面,需要深入了解其能力、挑战和伦理法律因素,以处理相关风险因素在实际应用领域。为了实现这一目标,我们强调需要一种平衡的方法,即将 AI 的计算能力与人类专家的知识相结合。AI 系统可以扫描 Pattern 并探索漏洞,并预测模型,大幅提高速度和准确性。人类专家可以为重要情况中解释 AI 生成的决策,使得责任和财务可以被追溯,从而建立 AI 驱动的安全解决方案的信任。因此,在这份位点纸中,我们认为人机合作在cybersecurity中是值得投入的,在这种合作中,人类专家的直觉、批判思维和Contextual 理解与 AI 的计算能力相结合,从而提高总的cyber 防御能力。

TPE: Towards Better Compositional Reasoning over Conceptual Tools with Multi-persona Collaboration

  • paper_url: http://arxiv.org/abs/2309.16090
  • repo_url: None
  • paper_authors: Hongru Wang, Huimin Wang, Lingzhi Wang, Minda Hu, Rui Wang, Boyang Xue, Hongyuan Lu, Fei Mi, Kam-Fai Wong
  • for: 这 paper 旨在扩展大语言模型(LLM)在干扰问答任务中的规划能力,特别是在对话系统中使用不同的概念工具。
  • methods: 这 paper 使用了一种多人格协作框架:思考-规划-执行(TPE),将响应生成过程分解成三个不同角色:思考者、规划者和执行者。
  • results: 这 paper 在多源(FoCus)和多策略交互(CIMA和PsyQA)等响应生成任务中示出了效果,这表明它可以处理更为复杂的对话交互,而不仅仅是功能工具。
    Abstract Large language models (LLMs) have demonstrated exceptional performance in planning the use of various functional tools, such as calculators and retrievers, particularly in question-answering tasks. In this paper, we expand the definition of these tools, centering on conceptual tools within the context of dialogue systems. A conceptual tool specifies a cognitive concept that aids systematic or investigative thought. These conceptual tools play important roles in practice, such as multiple psychological or tutoring strategies being dynamically applied in a single turn to compose helpful responses. To further enhance the reasoning and planning capability of LLMs with these conceptual tools, we introduce a multi-persona collaboration framework: Think-Plan-Execute (TPE). This framework decouples the response generation process into three distinct roles: Thinker, Planner, and Executor. Specifically, the Thinker analyzes the internal status exhibited in the dialogue context, such as user emotions and preferences, to formulate a global guideline. The Planner then generates executable plans to call different conceptual tools (e.g., sources or strategies), while the Executor compiles all intermediate results into a coherent response. This structured approach not only enhances the explainability and controllability of responses but also reduces token redundancy. We demonstrate the effectiveness of TPE across various dialogue response generation tasks, including multi-source (FoCus) and multi-strategy interactions (CIMA and PsyQA). This reveals its potential to handle real-world dialogue interactions that require more complicated tool learning beyond just functional tools. The full code and data will be released for reproduction.
    摘要 大型语言模型(LLM)在几种功能工具的使用规划方面表现出色,特别是在问答任务中。在这篇论文中,我们扩展了这些工具的定义,将注重在对话系统中的概念工具。概念工具指定了思维的认知概念,以便系统atic或调查性思维。这些概念工具在实践中发挥重要作用,例如在单个转律中应用多种心理或教学策略以组成有用的回答。为了进一步增强LLM的理解和规划能力,我们引入了多人格协作框架:思考-规划-执行(TPE)。这个框架将响应生成过程分解成三个不同角色:思考者、规划者和执行者。具体来说,思考者通过对对话上下文中的内部状态,如用户情感和首选,来形成全局指南。规划者则生成可执行的计划,以调用不同的概念工具(如来源或策略),而执行者则将所有中间结果编译成一个准确的回答。这种结构化的方法不仅提高了回答的可解释性和控制性,还减少了各种重复的token。我们在多种对话回答生成任务中证明了TPE的效iveness,包括多源(FoCus)和多策略互动(CIMA和PsyQA)。这表明它可以处理现实世界中的对话互动,需要更为复杂的工具学习。我们将完整的代码和数据公开发布,以便其他人复制和扩展。
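A minimal sketch of the three-role decomposition, with a generic `llm(prompt) -> str` placeholder standing in for any chat model; the prompt wording and the one-tool-per-line plan format are illustrative assumptions rather than the paper's prompts.

```python
def tpe_respond(dialogue_history, tools, llm):
    """Three-role response generation in the spirit of Think-Plan-Execute.
    `llm(prompt) -> str` is a placeholder for any chat model call."""
    # Thinker: summarize internal state (emotions, preferences) into a guideline.
    guideline = llm(
        "Analyze the user's emotions and preferences in this dialogue and state a "
        f"one-sentence guideline for the response.\n\n{dialogue_history}"
    )
    # Planner: choose which conceptual tools (sources / strategies) to invoke.
    plan = llm(
        f"Guideline: {guideline}\nAvailable conceptual tools: {', '.join(tools)}\n"
        "List, one per line, the tools to apply and what each should contribute."
    )
    # Apply each planned step, collecting intermediate results.
    intermediate = [llm(f"Apply this step to the dialogue:\n{step}\n\n{dialogue_history}")
                    for step in plan.splitlines() if step.strip()]
    # Executor: compile all intermediate results into one coherent reply.
    return llm(
        f"Guideline: {guideline}\nIntermediate results:\n" + "\n".join(intermediate) +
        f"\n\nWrite the final response to the user.\n\n{dialogue_history}"
    )
```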

Forgetting Private Textual Sequences in Language Models via Leave-One-Out Ensemble

  • paper_url: http://arxiv.org/abs/2309.16082
  • repo_url: None
  • paper_authors: Zhe Liu, Ozlem Kalinli
  • for: Privacy protection and model updating
  • methods: Teacher-student framework and leave-one-out ensemble method
  • results: Superior privacy-utility trade-offs on LibriSpeech and WikiText-103 datasets
    Abstract Recent research has shown that language models have a tendency to memorize rare or unique token sequences in the training corpus. After deploying a model, practitioners might be asked to delete any personal information from the model by individuals' requests. Re-training the underlying model every time individuals would like to practice their rights to be forgotten is computationally expensive. We employ a teacher-student framework and propose a novel leave-one-out ensemble method to unlearn the targeted textual sequences that need to be forgotten from the model. In our approach, multiple teachers are trained on disjoint sets; for each targeted sequence to be removed, we exclude the teacher trained on the set containing this sequence and aggregate the predictions from remaining teachers to provide supervision during fine-tuning. Experiments on LibriSpeech and WikiText-103 datasets show that the proposed method achieves superior privacy-utility trade-offs than other counterparts.
    摘要 最近的研究发现,语言模型有一种倾向,即记忆特殊或罕见的token序列在训练集中。当部署模型后,实际应用者可能需要根据个人需求删除模型中的个人信息。重新训练基础模型每次个人需要行使“忘记权”是 computationally expensive。我们采用教师-学生框架,并提出了一种新的离别一个学生 ensemble方法,用于从模型中忘记需要被忘记的文本序列。在我们的方法中,多个教师在不同的集合上进行训练;对于每个需要删除的序列,我们将包含这个序列的教师排除,并将剩下的教师的预测结果作为超vision提供给 fine-tuning。在 LibriSpeech 和 WikiText-103 数据集上进行的实验表明,我们的方法可以在 Privacy-Utility 质量之间取得更好的质量。
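The aggregation step can be sketched as follows, assuming each teacher was trained on one disjoint shard and exposes token logits: the teacher that saw the sequence to be forgotten is dropped from the ensemble, and the averaged distribution of the remaining teachers supervises the student during fine-tuning. Function and argument names are illustrative.

```python
import torch
import torch.nn.functional as F

def leave_one_out_targets(teachers, shard_of_sequence, input_ids, forget_seq_id):
    """Soft targets from all teachers except the one trained on the shard that
    contains the sequence to be forgotten (simplified sketch).
    `teachers[i](input_ids)` is assumed to return token logits [B, T, V]."""
    excluded = shard_of_sequence[forget_seq_id]
    keep = [t for i, t in enumerate(teachers) if i != excluded]
    with torch.no_grad():
        probs = torch.stack([F.softmax(t(input_ids), dim=-1) for t in keep]).mean(0)
    return probs

def distillation_loss(student_logits, teacher_probs):
    """Cross-entropy of the student against the leave-one-out teacher ensemble."""
    log_p = F.log_softmax(student_logits, dim=-1)
    return -(teacher_probs * log_p).sum(-1).mean()
```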

cs.CL - 2023-09-28

A Sign Language Recognition System with Pepper, Lightweight-Transformer, and LLM

  • paper_url: http://arxiv.org/abs/2309.16898
  • repo_url: None
  • paper_authors: JongYoon Lim, Inkyu Sa, Bruce MacDonald, Ho Seok Ahn
  • for: 实现人类与机器人之间的非语言互动,使用轻量级深度学习架构来理解美洲手语(ASL)。
  • methods: 利用轻量级深度学习模型来实现快速识别手语,并与大型自然语言模型(LLM)整合,实现智能机器人互动。通过几何调整引擎,调整互动以允许机器人产生自然的同步姿势回应。
  • results: 在实际应用中显示出轻量级深度学习架构可以实现高效的手语识别和自然的机器人互动,实现人类与机器人之间的无语言互动,扩大机器人的应用范围,并增进人类与机器人之间的沟通。
    Abstract This research explores using lightweight deep neural network architectures to enable the humanoid robot Pepper to understand American Sign Language (ASL) and facilitate non-verbal human-robot interaction. First, we introduce a lightweight and efficient model for ASL understanding optimized for embedded systems, ensuring rapid sign recognition while conserving computational resources. Building upon this, we employ large language models (LLMs) for intelligent robot interactions. Through intricate prompt engineering, we tailor interactions to allow the Pepper Robot to generate natural Co-Speech Gesture responses, laying the foundation for more organic and intuitive humanoid-robot dialogues. Finally, we present an integrated software pipeline, embodying advancements in a socially aware AI interaction model. Leveraging the Pepper Robot's capabilities, we demonstrate the practicality and effectiveness of our approach in real-world scenarios. The results highlight a profound potential for enhancing human-robot interaction through non-verbal interactions, bridging communication gaps, and making technology more accessible and understandable.
    摘要 这些研究探讨使用轻量级深度神经网络架构,使人形机器人Pepper能够理解美国手语(ASL),并促进非语言的人机交互。首先,我们介绍了一种轻量级、高效的ASL理解模型,适用于嵌入式系统,能在节省计算资源的同时快速识别手语。然后,我们利用大型语言模型(LLM)实现智能机器人交互。通过细致的提示工程,我们调整交互,使Pepper机器人能够自然地生成与语音同步的手势响应,为人机对话奠定基础。最后,我们提出了一个集成的软件管道,整合了具有社会意识的AI交互模型。利用Pepper机器人的能力,我们在实际场景中展示了该方法的实用性和有效性。结果表明,非语言交互能够弥合沟通差距,使技术更加易用、易懂。

DeBERTinha: A Multistep Approach to Adapt DebertaV3 XSmall for Brazilian Portuguese Natural Language Processing Task

  • paper_url: http://arxiv.org/abs/2309.16844
  • repo_url: None
  • paper_authors: Israel Campiotti, Matheus Rodrigues, Yuri Albuquerque, Rafael Azevedo, Alyson Andrade
  • for: This paper presents an approach for adapting a pre-trained English language model for use in Brazilian Portuguese natural language processing tasks.
  • methods: The methodology involves a multistep training process to fine-tune the model for the Portuguese language, using a combination of pre-trained English model weights and random embeddings.
  • results: The adapted model, called DeBERTinha, demonstrates effectiveness on downstream tasks such as named entity recognition, sentiment analysis, and determining sentence relatedness, outperforming a baseline model despite having fewer parameters.
    Abstract This paper presents an approach for adapting the DebertaV3 XSmall model pre-trained in English for Brazilian Portuguese natural language processing (NLP) tasks. A key aspect of the methodology involves a multistep training process to ensure the model is effectively tuned for the Portuguese language. Initial datasets from Carolina and BrWac are preprocessed to address issues like emojis, HTML tags, and encodings. A Portuguese-specific vocabulary of 50,000 tokens is created using SentencePiece. Rather than training from scratch, the weights of the pre-trained English model are used to initialize most of the network, with random embeddings, recognizing the expensive cost of training from scratch. The model is fine-tuned using the replaced token detection task in the same format of DebertaV3 training. The adapted model, called DeBERTinha, demonstrates effectiveness on downstream tasks like named entity recognition, sentiment analysis, and determining sentence relatedness, outperforming BERTimbau-Large in two tasks despite having only 40M parameters.

Curriculum-Driven Edubot: A Framework for Developing Language Learning Chatbots Through Synthesizing Conversational Data

  • paper_url: http://arxiv.org/abs/2309.16804
  • repo_url: None
  • paper_authors: Yu Li, Shang Qu, Jili Shen, Shangchao Min, Zhou Yu
  • for: 帮助学生提高对话技巧,满足学生在课程框架下的学习需求。
  • methods: 利用大语言模型生成对话,根据教科书中的相关话题进行EXTRACTING,然后使用自定义的LLM进行精度调整。
  • results: 比ChatGPT更好地领导课程基础的对话,能够根据用户的英语水平进行对话调整,提供学生个性化的对话实践。
    Abstract Chatbots have become popular in educational settings, revolutionizing how students interact with material and how teachers teach. We present Curriculum-Driven EduBot, a framework for developing a chatbot that combines the interactive features of chatbots with the systematic material of English textbooks to assist students in enhancing their conversational skills. We begin by extracting pertinent topics from textbooks and then using large language models to generate dialogues related to these topics. We then fine-tune an open-source LLM using our generated conversational data to create our curriculum-driven chatbot. User studies demonstrate that our chatbot outperforms ChatGPT in leading curriculum-based dialogues and adapting its dialogue to match the user's English proficiency level. By combining traditional textbook methodologies with conversational AI, our approach offers learners an interactive tool that aligns with their curriculum and provides user-tailored conversation practice. This facilitates meaningful student-bot dialogues and enriches the overall learning experience within the curriculum's pedagogical framework.
    摘要 chatbots 已经在教育 Setting 中变得流行,推翻了学生与材料之间的交互方式和教师教学方式。我们提出了 Curriculum-Driven EduBot 框架,用于开发一个结合了聊天机器人的互动特点和英语教科书系统的材料来帮助学生提高对话技巧。我们首先从教科书中提取有关话题,然后使用大型自然语言模型生成与这些话题相关的对话。然后,我们使用我们生成的对话数据来练化一个开源 LLM,以创建受教科书驱动的聊天机器人。用户研究表明,我们的聊天机器人在课程基础的对话中表现出色,并且可以根据用户的英语水平进行对话调整。通过结合传统教科书方法与对话 AI,我们的方法提供了学习者一种交互的工具,与其课程的教学框架相吻合。这使得学生与机器人的对话变得有意义,并润色了整个学习经验。

Hallucination Reduction in Long Input Text Summarization

  • paper_url: http://arxiv.org/abs/2309.16781
  • repo_url: https://github.com/tohidarehman/hallucination-reduction-text-summarization
  • paper_authors: Tohida Rehman, Ronit Mandal, Abhishek Agarwal, Debarshi Kumar Sanyal
  • for: 本研究旨在降低长文摘要中的幻觉输出(hallucination),以提高摘要的准确性和可靠性。
  • methods: 我们使用了数据筛选和共同实体和摘要生成(JAENS)技术,对Longformer Encoder-Decoder(LED)模型进行精度调整,以降低幻觉输出。
  • results: 我们的实验表明,精度调整后的LED模型能够很好地生成文章摘要。根据若干事实一致性指标的评估,基于预处理步骤的数据筛选技术可以降低生成摘要中实体层面的幻觉。
    Abstract Hallucination in text summarization refers to the phenomenon where the model generates information that is not supported by the input source document. Hallucination poses significant obstacles to the accuracy and reliability of the generated summaries. In this paper, we aim to reduce hallucinated outputs or hallucinations in summaries of long-form text documents. We have used the PubMed dataset, which contains long scientific research documents and their abstracts. We have incorporated the techniques of data filtering and joint entity and summary generation (JAENS) in the fine-tuning of the Longformer Encoder-Decoder (LED) model to minimize hallucinations and thereby improve the quality of the generated summary. We have used the following metrics to measure factual consistency at the entity level: precision-source, and F1-target. Our experiments show that the fine-tuned LED model performs well in generating the paper abstract. Data filtering techniques based on some preprocessing steps reduce entity-level hallucinations in the generated summaries in terms of some of the factual consistency metrics.
    摘要 描述文本简化中的幻觉现象指的是模型生成的信息不受输入文档支持。幻觉会对简化后的摘要准确性和可靠性产生很大的影响。在这篇论文中,我们想降低摘要中的幻觉输出或幻觉。我们使用了PubMed数据集,这个数据集包含长篇科学研究文献和其摘要。我们在LED模型的精度调节阶段采用数据筛选和联合实体和摘要生成(JAENS)技术,以减少幻觉并提高生成的摘要质量。我们使用了以下度量来衡量实体层次的事实一致性:准确性-源,F1-目标。我们的实验表明,精度调节后的LED模型能够好地生成文档摘要。基于一些预处理步骤的数据筛选技术可以在生成的摘要中减少实体层次的幻觉。
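A rough sketch of an entity-level "precision-source"-style check, using spaCy NER and verbatim string containment; the paper's exact matching and normalization rules are not reproduced here, so treat this as an approximation only.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # any NER pipeline would do

def entity_precision_source(summary, source):
    """Share of named entities in the generated summary that can be found
    verbatim in the source document. A simplified reading of the
    'precision-source' metric; typed matching and normalization are omitted."""
    summary_ents = {ent.text.lower() for ent in nlp(summary).ents}
    if not summary_ents:
        return 1.0  # nothing asserted, nothing hallucinated
    source_text = source.lower()
    supported = sum(1 for e in summary_ents if e in source_text)
    return supported / len(summary_ents)
```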

Demystifying CLIP Data

  • paper_url: http://arxiv.org/abs/2309.16671
  • repo_url: https://github.com/facebookresearch/metaclip
  • paper_authors: Hu Xu, Saining Xie, Xiaoqing Ellen Tan, Po-Yao Huang, Russell Howes, Vasu Sharma, Shang-Wen Li, Gargi Ghosh, Luke Zettlemoyer, Christoph Feichtenhofer
  • for: This paper is written for advancing research and applications in computer vision, particularly in the area of contrastive language-image pre-training (CLIP).
  • methods: The paper introduces a new approach called Metadata-Curated Language-Image Pre-training (MetaCLIP), which aims to reveal CLIP’s data curation approach and make it open to the community. MetaCLIP takes a raw data pool and metadata (derived from CLIP’s concepts) and yields a balanced subset over the metadata distribution.
  • results: The paper reports that MetaCLIP outperforms CLIP’s data on multiple standard benchmarks, achieving 70.8% accuracy on zero-shot ImageNet classification with 400M image-text data pairs, and scaling to 1B data with the same training budget. The paper also shows that MetaCLIP achieves better performance than CLIP on various model sizes, such as ViT-H, which achieves 80.5% accuracy without any bells-and-whistles.
    Abstract Contrastive Language-Image Pre-training (CLIP) is an approach that has advanced research and applications in computer vision, fueling modern recognition systems and generative models. We believe that the main ingredient to the success of CLIP is its data and not the model architecture or pre-training objective. However, CLIP only provides very limited information about its data and how it has been collected, leading to works that aim to reproduce CLIP's data by filtering with its model parameters. In this work, we intend to reveal CLIP's data curation approach and in our pursuit of making it open to the community introduce Metadata-Curated Language-Image Pre-training (MetaCLIP). MetaCLIP takes a raw data pool and metadata (derived from CLIP's concepts) and yields a balanced subset over the metadata distribution. Our experimental study rigorously isolates the model and training settings, concentrating solely on data. MetaCLIP applied to CommonCrawl with 400M image-text data pairs outperforms CLIP's data on multiple standard benchmarks. In zero-shot ImageNet classification, MetaCLIP achieves 70.8% accuracy, surpassing CLIP's 68.3% on ViT-B models. Scaling to 1B data, while maintaining the same training budget, attains 72.4%. Our observations hold across various model sizes, exemplified by ViT-H achieving 80.5%, without any bells-and-whistles. Curation code and training data distribution on metadata is made available at https://github.com/facebookresearch/MetaCLIP.
    摘要 CLIP(对比语言-图像预训练)是一种推动了计算机视觉研究和应用的方法,支撑着现代识别系统和生成模型。我们认为CLIP成功的主要原因是其数据,而不是模型结构或预训练目标。然而,CLIP只提供了非常有限的数据信息和收集方法,导致一些工作尝试通过CLIP模型参数来复制CLIP的数据。在这项工作中,我们旨在披露CLIP的数据筛选策略,并为社区开放这一数据预处理技术,提出Metadata-Curated Language-Image Pre-training(MetaCLIP)。MetaCLIP使用原始数据池和元数据(从CLIP的概念中衍生)来生成在元数据分布上平衡的子集。我们的实验严格隔离了模型和训练设置,只关注数据本身。在应用于CommonCrawl的4亿对图像-文本数据上,MetaCLIP的数据在多个标准基准上超越了CLIP的数据。在零样本ImageNet分类任务中,MetaCLIP实现了70.8%的准确率,超过CLIP在ViT-B模型上的68.3%。在保持相同训练预算的情况下扩展到10亿条数据时,准确率达到72.4%。我们的观察结果在不同模型规模上都成立,例如ViT-H实现了80.5%的准确率,无需任何额外技巧。我们在GitHub上提供了数据筛选代码和基于元数据的训练数据分布,请参考 https://github.com/facebookresearch/MetaCLIP。
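To illustrate metadata-balanced curation, the sketch below counts which alt-texts match each metadata entry and caps the number of pairs kept per entry, which flattens the head of the distribution. The cap `t`, the naive substring matching, and the sampling rule are simplifications of the paper's procedure.

```python
import random
from collections import defaultdict

def curate(pairs, metadata, t=20_000, seed=0):
    """Balance image-text pairs over a metadata vocabulary (simplified sketch).
    `pairs` is a list of (image_ref, text); `metadata` is a list of concept
    strings. Each entry keeps at most `t` matching pairs. Matching here is a
    naive O(N * |metadata|) substring scan, used only for clarity."""
    rng = random.Random(seed)
    matches = defaultdict(list)
    for idx, (_, text) in enumerate(pairs):
        low = text.lower()
        for entry in metadata:
            if entry in low:
                matches[entry].append(idx)
    keep = set()
    for entry, idxs in matches.items():
        if len(idxs) > t:
            idxs = rng.sample(idxs, t)  # downsample head entries
        keep.update(idxs)
    return [pairs[i] for i in sorted(keep)]
```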

Qwen Technical Report

  • paper_url: http://arxiv.org/abs/2309.16609
  • repo_url: https://github.com/QwenLM/Qwen-7B
  • paper_authors: Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang, Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang, Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou, Tianhang Zhu
  • for: 这篇论文旨在推介一种新的大型自然语言处理(NLP)模型系列,称为Qwen系列。
  • methods: 该论文使用了不同参数计数的模型,包括基础预训练模型Qwen以及通过人工对齐技术微调的聊天模型Qwen-Chat。
  • results: 研究表明,基础模型在多种下游任务中表现出色,而微调后的聊天模型具有出色的工具使用和规划能力,可以创建高效的智能应用程序。此外,研究还开发了专门为编程和数学领域的模型,即Code-Qwen和Code-Qwen-Chat,以及Math-Qwen-Chat,这些模型在相关任务上表现出优于开源模型,但落后于商业模型。
    Abstract Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans. In this work, we introduce Qwen, the first installment of our large language model series. Qwen is a comprehensive language model series that encompasses distinct models with varying parameter counts. It includes Qwen, the base pretrained language models, and Qwen-Chat, the chat models finetuned with human alignment techniques. The base language models consistently demonstrate superior performance across a multitude of downstream tasks, and the chat models, particularly those trained using Reinforcement Learning from Human Feedback (RLHF), are highly competitive. The chat models possess advanced tool-use and planning capabilities for creating agent applications, showcasing impressive performance even when compared to bigger models on complex tasks like utilizing a code interpreter. Furthermore, we have developed coding-specialized models, Code-Qwen and Code-Qwen-Chat, as well as mathematics-focused models, Math-Qwen-Chat, which are built upon base language models. These models demonstrate significantly improved performance in comparison with open-source models, and slightly fall behind the proprietary models.
    摘要 大型自然语言模型(LLM)已经革命化人工智能领域,使得先前被认为是人类专有的自然语言处理任务现在可以由机器完成。在这项工作中,我们介绍了Qwen系列,这是我们的大型语言模型系列,包括不同参数数量的多种模型。其中包括Qwen基础预训练语言模型和Qwen-Chat通话模型,后者通过人类对齐技术进行了加工。基础语言模型在多个下游任务中几乎一直表现出优秀的表现,而通话模型,特别是使用人类反馈学习(RLHF)进行训练的通话模型,在创造代理应用程序时具有高级工具使用和规划能力,并在使用代码解释器的复杂任务上表现出色。此外,我们还开发了专门为编程而设计的模型,名为Code-Qwen和Code-Qwen-Chat,以及专门为数学而设计的模型,名为Math-Qwen-Chat,这些模型基于基础语言模型。这些模型在相比于开源模型的情况下表现出了显著的提升,并只有轻微落后于商业模型。

Unlikelihood Tuning on Negative Samples Amazingly Improves Zero-Shot Translation

  • paper_url: http://arxiv.org/abs/2309.16599
  • repo_url: https://github.com/zanchangtong/unions
  • paper_authors: Changtong Zan, Liang Ding, Li Shen, Yibin Lei, Yibing Zhan, Weifeng Liu, Dacheng Tao
  • for: 这个研究旨在解释在零shot翻译(ZST)任务中语言标识符(ID)的导航能力是如何受限的。
  • methods: 研究使用了零shot翻译模型,并对两种极端的decoder输入情况进行比较分析:Off-Target(OFF)和On-Target(ON)两种情况。通过对 Contextual Word Representations(CWRs)进行比较分析,研究发现了语言标识符在不同情况下的导航能力。
  • results: 研究发现,although language IDs work well in ideal ON settings, they become fragile and lose their navigation ability when faced with off-target tokens。为了解决这个问题,研究使用了不利可能性调整法,减少了off-target ratio,导致了BLEU分数的提高。
    Abstract Zero-shot translation (ZST), which is generally based on a multilingual neural machine translation model, aims to translate between unseen language pairs in training data. The common practice to guide the zero-shot language mapping during inference is to deliberately insert the source and target language IDs, e.g., for English and for German. Recent studies have shown that language IDs sometimes fail to navigate the ZST task, making them suffer from the off-target problem (non-target language words exist in the generated translation) and, therefore, difficult to apply the current multilingual translation model to a broad range of zero-shot language scenarios. To understand when and why the navigation capabilities of language IDs are weakened, we compare two extreme decoder input cases in the ZST directions: Off-Target (OFF) and On-Target (ON) cases. By contrastively visualizing the contextual word representations (CWRs) of these cases with teacher forcing, we show that 1) the CWRs of different languages are effectively distributed in separate regions when the sentence and ID are matched (ON setting), and 2) if the sentence and ID are unmatched (OFF setting), the CWRs of different languages are chaotically distributed. Our analyses suggest that although they work well in ideal ON settings, language IDs become fragile and lose their navigation ability when faced with off-target tokens, which commonly exist during inference but are rare in training scenarios. In response, we employ unlikelihood tuning on the negative (OFF) samples to minimize their probability such that the language IDs can discriminate between the on- and off-target tokens during training. Experiments spanning 40 ZST directions show that our method reduces the off-target ratio by -48.0% on average, leading to a +9.1 BLEU improvement with only an extra +0.3% tuning cost.
    摘要 zero-shot翻译(ZST)通常基于多语言神经机器翻译模型,旨在在训练数据中未经见过的语言对之间翻译。通常情况下,在推导 zero-shot 语言映射时,会故意插入源语言ID和目标语言ID,例如 表示英语和 表示德语。然而, latest studies 表明,语言ID 在推导 ZST 任务中的导航能力有时会弱化,导致翻译结果受到非目标语言词汇的影响,从而使得当前多语言翻译模型难以应用于广泛的 zero-shot 语言enario。为了了解语言ID 在 ZST 任务中的导航能力是如何弱化的,我们比较了两种极端的解码输入情况:Off-Target(OFF)和 On-Target(ON)两种情况。通过比较这两种情况下的上下文字表示(CWR),我们发现:1)当 sentence 和 ID 匹配时(ON setting),不同语言的 CWR 分布在不同的区域,2)如果 sentence 和 ID 不匹配(OFF setting),不同语言的 CWR 分布混乱。我们的分析表明,虽然它们在理想的 ON 设置下工作非常好,但是语言 ID 在面对非目标语言词汇时变得脆弱,丢弃了导航能力。为了解决这个问题,我们使用不良抽象训练方法,通过训练时间间隔的负样本进行训练,以降低 OFF 样本的概率,使语言 ID 能够在训练中分辨在目标语言和非目标语言之间。实验结果表明,我们的方法可以降低 OFF 比例平均 -48.0%,并且提高 BLEU 平均 +9.1,只需要额外花费 +0.3% 的训练成本。
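A minimal sketch of unlikelihood tuning on negative (off-target) samples: positive sequences keep the usual likelihood term, while negative sequences receive a -log(1 - p) penalty that pushes their probability down so the language IDs learn to discriminate on- from off-target tokens. Tensor shapes, padding handling, and the mixing weight are assumptions.

```python
import torch
import torch.nn.functional as F

def unlikelihood_tuning_loss(logits, target_ids, is_negative, pad_id=0, alpha=1.0):
    """Token-level loss mixing likelihood and unlikelihood (simplified sketch).
    logits: [B, T, V]; target_ids: [B, T]; is_negative: [B] bool, True for
    off-target (negative) sequences whose probability should be minimized."""
    logp = F.log_softmax(logits, dim=-1)
    tok_logp = logp.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)   # [B, T]
    mask = (target_ids != pad_id).float()

    # Likelihood term for on-target samples: -log p(y_t)
    nll = -(tok_logp * mask).sum(-1) / mask.sum(-1).clamp_min(1.0)

    # Unlikelihood term for off-target samples: -log(1 - p(y_t))
    one_minus_p = (1.0 - tok_logp.exp()).clamp_min(1e-6)
    unl = -(one_minus_p.log() * mask).sum(-1) / mask.sum(-1).clamp_min(1.0)

    neg = is_negative.float()
    return ((1.0 - neg) * nll + alpha * neg * unl).mean()
```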

GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond

  • paper_url: http://arxiv.org/abs/2309.16583
  • repo_url: https://github.com/gpt-fathom/gpt-fathom
  • paper_authors: Shen Zheng, Yuyu Zhang, Yijie Zhu, Chenguang Xi, Pengyang Gao, Xun Zhou, Kevin Chen-Chuan Chang
  • for: 评估大语言模型(LLM)的全面能力和局限性。
  • methods: 使用OpenAI Evals开发了一个开源和可重现的LLM评估suite,对10多个领先的LLM以及OpenAI的遗产模型进行了20多个精心制定的测试,并进行了7种能力类别的评估。
  • results: 对OpenAI的早期模型进行了Retrospective研究,提供了各种LLM的进步和改进的技术细节,如 Whether adding code data improves LLM’s reasoning capability, Which aspects of LLM capability can be improved by SFT and RLHF,Alignment tax等问题的解答,以提高高级LLM的透明度。
    Abstract With the rapid advancement of large language models (LLMs), there is a pressing need for a comprehensive evaluation suite to assess their capabilities and limitations. Existing LLM leaderboards often reference scores reported in other papers without consistent settings and prompts, which may inadvertently encourage cherry-picking favored settings and prompts for better results. In this work, we introduce GPT-Fathom, an open-source and reproducible LLM evaluation suite built on top of OpenAI Evals. We systematically evaluate 10+ leading LLMs as well as OpenAI's legacy models on 20+ curated benchmarks across 7 capability categories, all under aligned settings. Our retrospective study on OpenAI's earlier models offers valuable insights into the evolutionary path from GPT-3 to GPT-4. Currently, the community is eager to know how GPT-3 progressively improves to GPT-4, including technical details like whether adding code data improves LLM's reasoning capability, which aspects of LLM capability can be improved by SFT and RLHF, how much is the alignment tax, etc. Our analysis sheds light on many of these questions, aiming to improve the transparency of advanced LLMs.

A Benchmark for Learning to Translate a New Language from One Grammar Book

  • paper_url: http://arxiv.org/abs/2309.16575
  • repo_url: None
  • paper_authors: Garrett Tanzer, Mirac Suzgun, Eline Visser, Dan Jurafsky, Luke Melas-Kyriazi
  • for: 这个论文是为了测试大型语言模型(LLM)在新任务上的能力,以及使用少量数据进行语言学习。
  • methods: 这个论文使用了现有的LLM作为基础,并在一本 Kalamang 语言 grammar 引用书上进行了一些 slight 的修改和微调。
  • results: 研究发现,使用当前的 LLM 可以达到44.7chrF 的 Kalamang 到英语翻译和45.8chrF 的英语到 Kalamang 翻译,相比之下,人类学习 Kalamang 从同一个引用书上的结果为51.6和57.0chrF。
    Abstract Large language models (LLMs) can perform impressive feats with in-context learning or lightweight finetuning. It is natural to wonder how well these models adapt to genuinely new tasks, but how does one find tasks that are unseen in internet-scale training sets? We turn to a field that is explicitly motivated and bottlenecked by a scarcity of web data: low-resource languages. In this paper, we introduce MTOB (Machine Translation from One Book), a benchmark for learning to translate between English and Kalamang -- a language with less than 200 speakers and therefore virtually no presence on the web -- using several hundred pages of field linguistics reference materials. This task framing is novel in that it asks a model to learn a language from a single human-readable book of grammar explanations, rather than a large mined corpus of in-domain data, more akin to L2 learning than L1 acquisition. We demonstrate that baselines using current LLMs are promising but fall short of human performance, achieving 44.7 chrF on Kalamang to English translation and 45.8 chrF on English to Kalamang translation, compared to 51.6 and 57.0 chrF by a human who learned Kalamang from the same reference materials. We hope that MTOB will help measure LLM capabilities along a new dimension, and that the methods developed to solve it could help expand access to language technology for underserved communities by leveraging qualitatively different kinds of data than traditional machine translation.
    摘要 大型语言模型(LLM)可以执行吸引人的表现,使用内容学习或轻量级调整。人们 naturallly 会想知道这些模型是否可以适应真正的新任务,但如何找到互联网上没有的任务呢?我们到了一个这些语言的缺乏网络数据的领域:低资源语言。在这篇文章中,我们介绍了 MTOB(从一本书 Machine Translation),一个用于将英语和卡拉曼(一种只有 fewer than 200 名 speaker的语言)之间进行翻译的benchmark。这个任务框架是新的,因为它请求一个模型从单一的人类可读的 grammar 解释书中学习一个语言,而不是从大量矿物质的内部数据中学习,更像是 L2 学习而不是 L1 获得。我们展示了现有的 LLB 是可以 promise 的,但落后于人类性能,实现了从英语到卡拉曼的翻译和从卡拉曼到英语的翻译的chrF 44.7和45.8,相比之下,人类从同一个 reference materials 学习 Kalamang 的chrF 为51.6和57.0。我们希望 MTOB 可以帮助衡量 LLM 的能力,并且可以帮助扩展语言科技 для被排除的社区,通过使用不同于传统机器翻译的数据来进行。

Unsupervised Fact Verification by Language Model Distillation

  • paper_url: http://arxiv.org/abs/2309.16540
  • repo_url: None
  • paper_authors: Adrián Bazaga, Pietro Liò, Gos Micklem
  • for: 这篇论文研究无监督事实验证:在没有任何标注数据的情况下,利用可信知识库中的证据来验证一个声明。
  • methods: 论文提出了一种自监督方法,利用预训练语言模型将自监督特征蒸馏为高质量的声明-证据对齐。这依赖于一个新的对比损失函数,使特征在获得高质量声明与证据对齐的同时,保留语料库中的语义关系。
  • results: 该方法在标准的FEVER事实验证基准上以线性评估取得了新的最先进结果(准确率提升8%)。
    Abstract Unsupervised fact verification aims to verify a claim using evidence from a trustworthy knowledge base without any kind of data annotation. To address this challenge, algorithms must produce features for every claim that are both semantically meaningful, and compact enough to find a semantic alignment with the source information. In contrast to previous work, which tackled the alignment problem by learning over annotated corpora of claims and their corresponding labels, we propose SFAVEL (Self-supervised Fact Verification via Language Model Distillation), a novel unsupervised framework that leverages pre-trained language models to distil self-supervised features into high-quality claim-fact alignments without the need for annotations. This is enabled by a novel contrastive loss function that encourages features to attain high-quality claim and evidence alignments whilst preserving the semantic relationships across the corpora. Notably, we present results that achieve a new state-of-the-art on the standard FEVER fact verification benchmark (+8% accuracy) with linear evaluation.
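As a stand-in for the contrastive objective described above, the sketch below uses a symmetric InfoNCE loss between claim and fact embeddings; SFAVEL's actual loss additionally preserves semantic relations across the corpus, which this omits.

```python
import torch
import torch.nn.functional as F

def claim_fact_contrastive_loss(claim_emb, fact_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch where claim_emb[i] should align with
    fact_emb[i] and repel all other facts/claims in the batch. A minimal
    stand-in for the paper's contrastive objective."""
    claim = F.normalize(claim_emb, dim=-1)
    fact = F.normalize(fact_emb, dim=-1)
    logits = claim @ fact.t() / temperature          # [B, B] similarity matrix
    labels = torch.arange(claim.size(0), device=claim.device)
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))
```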

A Comprehensive Survey of Document-level Relation Extraction (2016-2023)

  • paper_url: http://arxiv.org/abs/2309.16396
  • repo_url: None
  • paper_authors: Julien Delaunay, Hanh Thi Hong Tran, Carlos-Emiliano González-Gallardo, Georgeta Bordea, Nicolas Sidere, Antoine Doucet
  • for: 这篇论文旨在提供关于近期文本关系抽取(DocRE)领域的全面概述,强调其与句子关系抽取的区别和应用场景。
  • methods: 本文使用了多种方法,包括文本分析、命名实体识别、语义理解等,以提取文档中的关系。
  • results: 本文提出了一些新的 DocRE 方法,并评估了它们的性能。这些方法可以帮助自动生成知识库,以提高对文档中关系的理解。
    Abstract Document-level relation extraction (DocRE) is an active area of research in natural language processing (NLP) concerned with identifying and extracting relationships between entities beyond sentence boundaries. Compared to the more traditional sentence-level relation extraction, DocRE provides a broader context for analysis and is more challenging because it involves identifying relationships that may span multiple sentences or paragraphs. This task has gained increased interest as a viable solution to build and populate knowledge bases automatically from unstructured large-scale documents (e.g., scientific papers, legal contracts, or news articles), in order to have a better understanding of relationships between entities. This paper aims to provide a comprehensive overview of recent advances in this field, highlighting its different applications in comparison to sentence-level relation extraction.
    摘要 文档级关系EXTRACTION(DocRE)是一个活跃的研究领域,涉及到自然语言处理(NLP)中identifying和EXTRACTING关系之外句子 boundariestra. 相比传统的句子级关系EXTRACTION,DocRE提供了更广阔的上下文,并且更加挑战性,因为它涉及到可能 span multiple sentences或 paragraphs 中的关系。这项任务在建立和自动填充大规模文档(例如科学论文、法律合同或新闻文章)中,以获得更好的实体之间关系的理解。这篇论文的目的是提供 DocRE 领域最新的进展, highlighting 它的不同应用场景与 sentence-level relation extraction 相比。

Transformer-VQ: Linear-Time Transformers via Vector Quantization

  • paper_url: http://arxiv.org/abs/2309.16354
  • repo_url: https://github.com/transformer-vq/transformer_vq
  • paper_authors: Lucas D. Lingle
  • for: This paper proposes a decoder-only transformer computing softmax-based dense self-attention in linear time.
  • methods: The method uses vector-quantized keys and a novel caching mechanism to achieve efficient attention.
  • results: In large-scale experiments, the method achieves high-quality results on Enwik8 (0.99 bpb), PG-19 (26.6 ppl), and ImageNet64 (3.16 bpb).
    Abstract We introduce Transformer-VQ, a decoder-only transformer computing softmax-based dense self-attention in linear time. Transformer-VQ's efficient attention is enabled by vector-quantized keys and a novel caching mechanism. In large-scale experiments, Transformer-VQ is shown highly competitive in quality, with strong results on Enwik8 (0.99 bpb), PG-19 (26.6 ppl), and ImageNet64 (3.16 bpb). Code: https://github.com/transformer-vq/transformer_vq
    摘要 我们介绍Transformer-VQ,一种仅含解码器的Transformer,能在线性时间内计算基于softmax的稠密自注意力。Transformer-VQ的高效注意力得益于向量量化的键(vector-quantized keys)和一种新的缓存机制。在大规模实验中,Transformer-VQ在质量上表现出很强的竞争力,在Enwik8(0.99 bpb)、PG-19(26.6 ppl)和ImageNet64(3.16 bpb)上取得了强劲的结果。代码:https://github.com/transformer-vq/transformer_vq
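The core trick the abstract describes is replacing attention keys by entries of a learned codebook. Below is a minimal, hedged sketch of that nearest-neighbour key quantization step in PyTorch; the codebook size, tensor shapes, straight-through gradient handling, and the paper's caching mechanism are all omitted or simplified here and do not reflect the authors' implementation.

```python
import torch

def quantize_keys(keys: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Replace each key vector with its nearest codebook vector.

    keys:     (batch, seq_len, d_k)
    codebook: (num_codes, d_k)
    """
    # Distance from every key to every code: (batch, seq_len, num_codes)
    dists = torch.cdist(keys, codebook.unsqueeze(0).expand(keys.size(0), -1, -1))
    codes = dists.argmin(dim=-1)   # index of the nearest code per key
    return codebook[codes]         # quantized keys, same shape as `keys`

# Toy usage: 2 sequences of 5 keys (16-dim) quantized against 8 codes.
keys = torch.randn(2, 5, 16)
codebook = torch.randn(8, 16)
print(quantize_keys(keys, codebook).shape)  # torch.Size([2, 5, 16])
```

Because every key now takes one of only a small number of distinct values, attention over the keys can be reorganized so that its cost grows linearly in sequence length, which is what the paper's caching mechanism exploits.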

Human Feedback is not Gold Standard

  • paper_url: http://arxiv.org/abs/2309.16349
  • repo_url: https://github.com/cohere-ai/human-feedback-paper
  • paper_authors: Tom Hosking, Phil Blunsom, Max Bartolo
  • for: 本研究探讨了人类反馈在评估大型自然语言模型性能时的作用,以及这种评估方法是否能够完全捕捉多种重要错误标准。
  • methods: 研究者使用了人类反馈来训练和评估模型,并分析了 preference scores 是否受到不良偏见的影响。他们还使用了 instruction-tuned 模型来生成输出,以探讨输出的干扰因素。
  • results: 研究者发现, preference scores 覆盖率相对较好,但忽略了重要的准确性因素。此外,他们发现人类反馈可能受到干扰因素的影响,并且使用人类反馈作为训练目标可能会导致模型输出更加夸大。
    Abstract Human feedback has become the de facto standard for evaluating the performance of Large Language Models, and is increasingly being used as a training objective. However, it is not clear which properties of a generated output this single `preference' score captures. We hypothesise that preference scores are subjective and open to undesirable biases. We critically analyse the use of human feedback for both training and evaluation, to verify whether it fully captures a range of crucial error criteria. We find that while preference scores have fairly good coverage, they under-represent important aspects like factuality. We further hypothesise that both preference scores and error annotation may be affected by confounders, and leverage instruction-tuned models to generate outputs that vary along two possible confounding dimensions: assertiveness and complexity. We find that the assertiveness of an output skews the perceived rate of factuality errors, indicating that human annotations are not a fully reliable evaluation metric or training objective. Finally, we offer preliminary evidence that using human feedback as a training objective disproportionately increases the assertiveness of model outputs. We encourage future work to carefully consider whether preference scores are well aligned with the desired objective.
    摘要 人类反馈已成为评估大语言模型性能的事实标准,并越来越多地被用作训练目标。然而,目前尚不清楚这种单一的“偏好”分数究竟捕捉了生成输出的哪些属性。我们假设偏好分数是主观的,并且容易受到不良偏见的影响。我们批判性地分析了人类反馈在训练和评估中的使用,以验证其是否能完整覆盖一系列关键的错误标准。我们发现,偏好分数的覆盖面尚可,但对事实性等重要方面的代表性不足。我们进一步假设偏好分数和错误标注都可能受到混杂因素的影响,并利用指令微调模型生成在两个潜在混杂维度(自信程度与复杂度)上变化的输出。我们发现,输出的自信程度会扭曲人们感知到的事实性错误率,这表明人工标注并不是完全可靠的评估指标或训练目标。最后,我们提供了初步证据,表明将人类反馈用作训练目标会不成比例地增强模型输出的自信程度。我们鼓励未来的工作仔细考虑偏好分数是否与预期目标保持一致。

Intrinsic Language-Guided Exploration for Complex Long-Horizon Robotic Manipulation Tasks

  • paper_url: http://arxiv.org/abs/2309.16347
  • repo_url: None
  • paper_authors: Eleftherios Triantafyllidis, Filippos Christianos, Zhibin Li
  • for: addresses intricate long-horizon with sparse rewards robotic manipulation tasks
  • methods: leverages LLMs as an assistive intrinsic reward to guide the exploratory process in reinforcement learning
  • results: exhibits notably higher performance, can be combined with existing learning methods, and maintains robustness against increased levels of uncertainty and horizons.
  • for: 本研究旨在解决复杂环境中的长时间探索任务,特别是 robotic manipulation 任务中的多种序列。
  • methods: 我们提出了基于大语言模型(LLMs)的自适应探索框架(IGE-LLMs),利用 LLMS 作为帮助探索过程的内在奖励。
  • results: 我们的框架在探索和长时间任务中表现出色,与相关的内在学习方法和直接使用 LLMS 进行决策相比,显示更高的性能,可以与现有的学习方法相结合,并且对不同的内在缩放参数表现相对稳定,能够在不同的不确定性和时间轴水平上保持稳定性。
    Abstract Current reinforcement learning algorithms struggle in sparse and complex environments, most notably in long-horizon manipulation tasks entailing a plethora of different sequences. In this work, we propose the Intrinsically Guided Exploration from Large Language Models (IGE-LLMs) framework. By leveraging LLMs as an assistive intrinsic reward, IGE-LLMs guides the exploratory process in reinforcement learning to address intricate long-horizon with sparse rewards robotic manipulation tasks. We evaluate our framework and related intrinsic learning methods in an environment challenged with exploration, and a complex robotic manipulation task challenged by both exploration and long-horizons. Results show IGE-LLMs (i) exhibit notably higher performance over related intrinsic methods and the direct use of LLMs in decision-making, (ii) can be combined and complement existing learning methods highlighting its modularity, (iii) are fairly insensitive to different intrinsic scaling parameters, and (iv) maintain robustness against increased levels of uncertainty and horizons.
    摘要 当前的强化学习算法在稀疏奖励的复杂环境中表现不佳,尤其是在涉及大量不同动作序列的长时程操作任务中。在这项工作中,我们提出了由大语言模型引导的内在探索框架(IGE-LLMs)。通过将LLM用作辅助性的内在奖励,IGE-LLMs引导强化学习中的探索过程,以解决奖励稀疏的复杂长时程机器人操作任务。我们在一个以探索为挑战的环境,以及一个同时面临探索与长时程挑战的复杂机器人操作任务中,评估了我们的框架和相关的内在学习方法。结果显示,IGE-LLMs:(i)明显优于相关的内在奖励方法以及直接使用LLM进行决策;(ii)可以与现有学习方法结合并互补,体现了其模块化;(iii)对不同的内在奖励缩放参数相对不敏感;(iv)在更高的不确定性和更长的时程下保持鲁棒性。
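The framework's central idea is to add an LLM-derived score to the environment's sparse extrinsic reward so exploration is guided toward promising actions. A minimal sketch of that additive shaping is below; the scoring callable, the 0.1 scale, and the example task are illustrative assumptions, not the paper's exact reward design.

```python
from typing import Callable

def shaped_reward(extrinsic: float, state_desc: str, action_desc: str,
                  intrinsic_fn: Callable[[str, str], float],
                  intrinsic_scale: float = 0.1) -> float:
    """Sparse extrinsic reward plus a scaled LLM-derived intrinsic term."""
    return extrinsic + intrinsic_scale * intrinsic_fn(state_desc, action_desc)

# Dummy intrinsic scorer standing in for an LLM query that rates how promising
# an action looks in the current state (value in [0, 1]).
dummy_llm_score = lambda state, action: 0.8 if "grasp" in action else 0.2

r = shaped_reward(extrinsic=0.0, state_desc="cube on table, gripper open",
                  action_desc="move to cube and grasp", intrinsic_fn=dummy_llm_score)
print(r)  # 0.08
```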

At Which Training Stage Does Code Data Help LLMs Reasoning?

  • paper_url: http://arxiv.org/abs/2309.16298
  • repo_url: https://github.com/yingweima2022/codellm
  • paper_authors: Yingwei Ma, Yue Liu, Yue Yu, Yuanliang Zhang, Yu Jiang, Changjian Wang, Shanshan Li
  • for: 这篇论文旨在研究在不同训练阶段引入代码数据对大语言模型(LLMs)推理能力的影响。
  • methods: 该论文分别在预训练阶段、指令调整阶段以及两个阶段同时引入代码数据,并通过五个领域的六项推理任务全面评估LLMs的推理能力。
  • results: 研究发现,在预训练阶段使用代码与文本混合数据可以显著提升LLMs的通用推理能力,且几乎不会对其他任务产生负迁移;在指令调整阶段引入代码数据可以赋予LLMs任务特定的推理能力;此外,代码与文本的动态混合策略有助于LLMs在训练过程中逐步学习推理能力。这些发现有助于理解LLMs在科学问答、法律支持等应用中的推理能力。
    Abstract Large Language Models (LLMs) have exhibited remarkable reasoning capabilities and become the foundation of language technologies. Inspired by the great success of code data in training LLMs, we naturally wonder at which training stage introducing code data can really help LLMs reasoning. To this end, this paper systematically explores the impact of code data on LLMs at different stages. Concretely, we introduce the code data at the pre-training stage, instruction-tuning stage, and both of them, respectively. Then, the reasoning capability of LLMs is comprehensively and fairly evaluated via six reasoning tasks in five domains. We critically analyze the experimental results and provide conclusions with insights. First, pre-training LLMs with the mixture of code and text can significantly enhance LLMs' general reasoning capability almost without negative transfer on other tasks. Besides, at the instruction-tuning stage, code data endows LLMs the task-specific reasoning capability. Moreover, the dynamic mixing strategy of code and text data assists LLMs to learn reasoning capability step-by-step during training. These insights deepen the understanding of LLMs regarding reasoning ability for their application, such as scientific question answering, legal support, etc. The source code and model parameters are released at the link:~\url{https://github.com/yingweima2022/CodeLLM}.
    摘要 大型语言模型(LLM)展现出了出色的推理能力,已成为语言技术的基础。受代码数据在训练LLM中取得巨大成功的启发,我们自然会问:在哪个训练阶段引入代码数据才能真正帮助LLM提升推理能力?为此,本文系统地探讨了在不同训练阶段引入代码数据对LLM的影响。具体来说,我们分别在预训练阶段、指令调整阶段以及两个阶段同时引入代码数据,然后通过五个领域的六项推理任务,对LLM的推理能力进行公平而全面的评估。我们对实验结果进行了深入分析并给出结论。首先,在预训练阶段将代码与文本混合可以显著提升LLM的通用推理能力,且几乎不会对其他任务产生负迁移。其次,在指令调整阶段,代码数据能赋予LLM任务特定的推理能力。此外,代码与文本数据的动态混合策略有助于LLM在训练过程中逐步学习推理能力。这些发现加深了我们对LLM推理能力的理解,有助于其在科学问答、法律支持等领域的应用。源代码和模型参数可在以下链接获取:https://github.com/yingweima2022/CodeLLM。
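The abstract mentions a dynamic strategy for mixing code and text data during training. The sketch below shows one way such a schedule could look, with the code fraction growing linearly over training; the ratios and the linear schedule are my own illustration rather than the paper's actual strategy.

```python
import random

def sample_batch(step: int, total_steps: int, code_batches, text_batches,
                 start_code_frac: float = 0.1, end_code_frac: float = 0.5):
    """Pick a code or text mini-batch with a code fraction that grows linearly over training."""
    progress = step / max(total_steps - 1, 1)
    code_frac = start_code_frac + (end_code_frac - start_code_frac) * progress
    pool = code_batches if random.random() < code_frac else text_batches
    return random.choice(pool)

code_batches = [["def add(a, b): return a + b"]]
text_batches = [["The cat sat on the mat."]]
print(sample_batch(step=0, total_steps=100,
                   code_batches=code_batches, text_batches=text_batches))
```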

DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models

  • paper_url: http://arxiv.org/abs/2309.16292
  • repo_url: https://github.com/PJLab-ADG/DiLu
  • paper_authors: Licheng Wen, Daocheng Fu, Xin Li, Xinyu Cai, Tao Ma, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yu Qiao
  • for: The paper aims to instill knowledge-driven capabilities into autonomous driving systems, inspired by human driving, and to address the challenges of dataset bias, overfitting, and uninterpretability in data-driven approaches.
  • methods: The proposed DiLu framework combines a Reasoning and a Reflection module to enable decision-making based on common-sense knowledge and to evolve continuously. The framework leverages large language models with emergent abilities.
  • results: Extensive experiments show that DiLu has a significant advantage in generalization ability over reinforcement learning-based methods and can directly acquire experiences from real-world datasets, demonstrating its potential for deployment on practical autonomous driving systems.
    Abstract Recent advancements in autonomous driving have relied on data-driven approaches, which are widely adopted but face challenges including dataset bias, overfitting, and uninterpretability. Drawing inspiration from the knowledge-driven nature of human driving, we explore the question of how to instill similar capabilities into autonomous driving systems and summarize a paradigm that integrates an interactive environment, a driver agent, as well as a memory component to address this question. Leveraging large language models with emergent abilities, we propose the DiLu framework, which combines a Reasoning and a Reflection module to enable the system to perform decision-making based on common-sense knowledge and evolve continuously. Extensive experiments prove DiLu's capability to accumulate experience and demonstrate a significant advantage in generalization ability over reinforcement learning-based methods. Moreover, DiLu is able to directly acquire experiences from real-world datasets which highlights its potential to be deployed on practical autonomous driving systems. To the best of our knowledge, we are the first to instill knowledge-driven capability into autonomous driving systems from the perspective of how humans drive.

Self-supervised Cross-view Representation Reconstruction for Change Captioning

  • paper_url: http://arxiv.org/abs/2309.16283
  • repo_url: https://github.com/tuyunbin/SCORER
  • paper_authors: Yunbin Tu, Liang Li, Li Su, Zheng-Jun Zha, Chenggang Yan, Qingming Huang
  • for: 本研究旨在为变化描述(change captioning)任务学习在视角变化造成的伪变化下仍然稳定的差异表示。
  • methods: 我们提出了自监督跨视图表示重建(SCORER)网络:先通过多头逐token匹配建模相似/不相似图像对的跨视图特征关系,再通过最大化两张相似图像的跨视图对比对齐,以自监督方式学习视角不变的图像表示,并据此重建未变化物体的表示,从而得到稳定的差异表示用于描述生成;此外还设计了跨模态反向推理模块以提升描述质量。
  • results: 我们的方法在四个数据集上取得了最先进(state-of-the-art)的结果。
    Abstract Change captioning aims to describe the difference between a pair of similar images. Its key challenge is how to learn a stable difference representation under pseudo changes caused by viewpoint change. In this paper, we address this by proposing a self-supervised cross-view representation reconstruction (SCORER) network. Concretely, we first design a multi-head token-wise matching to model relationships between cross-view features from similar/dissimilar images. Then, by maximizing cross-view contrastive alignment of two similar images, SCORER learns two view-invariant image representations in a self-supervised way. Based on these, we reconstruct the representations of unchanged objects by cross-attention, thus learning a stable difference representation for caption generation. Further, we devise a cross-modal backward reasoning to improve the quality of caption. This module reversely models a ``hallucination'' representation with the caption and ``before'' representation. By pushing it closer to the ``after'' representation, we enforce the caption to be informative about the difference in a self-supervised manner. Extensive experiments show our method achieves the state-of-the-art results on four datasets. The code is available at https://github.com/tuyunbin/SCORER.
    摘要 变化描述旨在描述一对相似图像之间的差异,其关键挑战是如何在视角变化造成的伪变化下学习稳定的差异表示。本文提出自监督跨视图表示重建(SCORER)网络:我们首先设计多头逐token匹配来建模相似/不相似图像对的跨视图特征关系,然后通过最大化两张相似图像的跨视图对比对齐,以自监督方式学习两个视角不变的图像表示;在此基础上,利用跨注意力重建未变化物体的表示,从而学习稳定的差异表示用于描述生成。此外,我们设计了跨模态反向推理模块,以自监督方式促使描述充分反映图像差异,进一步提升描述质量。大量实验表明,我们的方法在四个数据集上取得了最先进的结果。代码见 https://github.com/tuyunbin/SCORER。

Social Media Fashion Knowledge Extraction as Captioning

  • paper_url: http://arxiv.org/abs/2309.16270
  • repo_url: https://github.com/yfyuan01/FKE
  • paper_authors: Yifei Yuan, Wenxuan Zhang, Yang Deng, Wai Lam
  • for: 本研究的目的是提取社交媒体上的时尚知识,以便在时尚行业中提高效率和智能化水平。
  • methods: 我们采用了一种基于自然语言captioning的方法,将时尚知识描述为一个句子中的多个元素。此外,我们还设计了一些辅助任务来提高知识提取效果。
  • results: 我们的模型在多个实验中表现出色,能够高效地从社交媒体上提取时尚知识。此外,我们还发现了一些独特的时尚知识,例如:用户在社交媒体上分享的时尚信息可以被用来预测时尚趋势。
    Abstract Social media plays a significant role in boosting the fashion industry, where a massive amount of fashion-related posts are generated every day. In order to obtain the rich fashion information from the posts, we study the task of social media fashion knowledge extraction. Fashion knowledge, which typically consists of the occasion, person attributes, and fashion item information, can be effectively represented as a set of tuples. Most previous studies on fashion knowledge extraction are based on the fashion product images without considering the rich text information in social media posts. Existing work on fashion knowledge extraction in social media is classification-based and requires to manually determine a set of fashion knowledge categories in advance. In our work, we propose to cast the task as a captioning problem to capture the interplay of the multimodal post information. Specifically, we transform the fashion knowledge tuples into a natural language caption with a sentence transformation method. Our framework then aims to generate the sentence-based fashion knowledge directly from the social media post. Inspired by the big success of pre-trained models, we build our model based on a multimodal pre-trained generative model and design several auxiliary tasks for enhancing the knowledge extraction. Since there is no existing dataset which can be directly borrowed to our task, we introduce a dataset consisting of social media posts with manual fashion knowledge annotation. Extensive experiments are conducted to demonstrate the effectiveness of our model.
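The key reformulation in the abstract is turning a fashion-knowledge tuple (occasion, person attributes, fashion items) into a natural-language caption that a generative model can be trained to produce. A toy sentence-transformation sketch is below; the field names and template wording are hypothetical, not the paper's templates.

```python
def tuple_to_caption(occasion: str, person: dict, items: list[str]) -> str:
    """Render an (occasion, person attributes, fashion items) tuple as a sentence."""
    person_desc = " ".join(str(person[k]) for k in ("age", "gender") if k in person)
    return (f"A {person_desc} person attends a {occasion}, "
            f"wearing {', '.join(items)}.")

caption = tuple_to_caption(
    occasion="wedding",
    person={"gender": "female", "age": "young"},
    items=["a white dress", "pearl earrings"],
)
print(caption)
# A young female person attends a wedding, wearing a white dress, pearl earrings.
```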

On the Challenges of Fully Incremental Neural Dependency Parsing

  • paper_url: http://arxiv.org/abs/2309.16254
  • repo_url: https://github.com/anaezquerro/incpar
  • paper_authors: Ana Ezquerro, Carlos Gómez-Rodríguez, David Vilares
  • for: 这篇论文是为了检验现代语言处理技术是否可以实现完全增量语法分析,以提高语法分析的效率和可靠性。
  • methods: 作者使用了 strictly left-to-right 神经网络编码器,并结合了完全增量序列标签和转换型解码器进行语法分析。
  • results: 研究发现,使用现代架构进行完全增量语法分析的效果落后于双向语法分析,表明在实现心理学上有效的语法分析时存在挑战。
    Abstract Since the popularization of BiLSTMs and Transformer-based bidirectional encoders, state-of-the-art syntactic parsers have lacked incrementality, requiring access to the whole sentence and deviating from human language processing. This paper explores whether fully incremental dependency parsing with modern architectures can be competitive. We build parsers combining strictly left-to-right neural encoders with fully incremental sequence-labeling and transition-based decoders. The results show that fully incremental parsing with modern architectures considerably lags behind bidirectional parsing, noting the challenges of psycholinguistically plausible parsing.
    摘要 自从BiLSTM和基于Transformer的双向编码器普及以来,最先进的句法分析器便不再具备增量性:它们需要访问整个句子,这与人类的语言处理方式不符。本文探讨采用现代架构的完全增量式依存句法分析能否具有竞争力。我们构建了将严格从左到右的神经编码器与完全增量的序列标注解码器及基于转移的解码器相结合的分析器。结果表明,采用现代架构的完全增量式分析明显落后于双向分析,凸显了实现符合心理语言学合理性的句法分析所面临的挑战。

Spider4SPARQL: A Complex Benchmark for Evaluating Knowledge Graph Question Answering Systems

  • paper_url: http://arxiv.org/abs/2309.16248
  • repo_url: None
  • paper_authors: Catherine Kosten, Philippe Cudré-Mauroux, Kurt Stockinger
  • for: 这个论文目的是为了提供一个大型和现实主义的 Knowledge Graph Question Answering (KBQA) 系统评估 benchmark。
  • methods: 这个论文使用了 manually generated 的自然语言 (NL) 问题和 SPARQL 查询,以及其相应的知识图和 ontologies。
  • results: 这个论文的实验结果表明,现有的 KGQA 系统和大型自然语言模型 (LLMs) 在 Spider4SPARQL benchmark 上只能达到 45% 的执行精度,这表明 Spider4SPARQL 是一个有挑战性的 benchmark для未来的研究。
    Abstract With the recent spike in the number and availability of Large Language Models (LLMs), it has become increasingly important to provide large and realistic benchmarks for evaluating Knowledge Graph Question Answering (KBQA) systems. So far the majority of benchmarks rely on pattern-based SPARQL query generation approaches. The subsequent natural language (NL) question generation is conducted through crowdsourcing or other automated methods, such as rule-based paraphrasing or NL question templates. Although some of these datasets are of considerable size, their pitfall lies in their pattern-based generation approaches, which do not always generalize well to the vague and linguistically diverse questions asked by humans in real-world contexts. In this paper, we introduce Spider4SPARQL - a new SPARQL benchmark dataset featuring 9,693 previously existing manually generated NL questions and 4,721 unique, novel, and complex SPARQL queries of varying complexity. In addition to the NL/SPARQL pairs, we also provide their corresponding 166 knowledge graphs and ontologies, which cover 138 different domains. Our complex benchmark enables novel ways of evaluating the strengths and weaknesses of modern KGQA systems. We evaluate the system with state-of-the-art KGQA systems as well as LLMs, which achieve only up to 45\% execution accuracy, demonstrating that Spider4SPARQL is a challenging benchmark for future research.
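Execution accuracy, the metric reported for Spider4SPARQL, compares the results of running the predicted and gold SPARQL queries over the knowledge graph. A hedged sketch of such a check using rdflib is below; the file handling, error handling, and row comparison are simplifying assumptions, not the benchmark's official evaluator.

```python
from rdflib import Graph

def execution_match(kg_path: str, gold_sparql: str, pred_sparql: str) -> bool:
    """Return True if both queries yield the same set of result rows on the graph."""
    g = Graph()
    g.parse(kg_path)  # e.g. a Turtle or RDF/XML serialization of the knowledge graph
    gold_rows = {tuple(row) for row in g.query(gold_sparql)}
    try:
        pred_rows = {tuple(row) for row in g.query(pred_sparql)}
    except Exception:  # a malformed prediction counts as a miss
        return False
    return gold_rows == pred_rows

# execution accuracy = mean of execution_match(...) over all benchmark examples
```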

Analyzing Political Figures in Real-Time: Leveraging YouTube Metadata for Sentiment Analysis

  • paper_url: http://arxiv.org/abs/2309.16234
  • repo_url: None
  • paper_authors: Danendra Athallariq Harya Putra, Arief Purnama Muharram
  • for: 本研究旨在构建基于YouTube视频元数据的情感分析系统,用于分析公众对不同政治人物的意见。
  • methods: 系统使用Apache Kafka、Apache PySpark和Hadoop处理大数据,使用TensorFlow深度学习库构建模型,并使用FastAPI进行服务器部署;情感分析模型基于LSTM算法,可区分正面和负面两种情感。
  • results: 研究构建了基于YouTube视频元数据(视频描述)的情感分析系统,并以简单的网页仪表盘形式可视化情感分析结果。
    Abstract Sentiment analysis using big data from YouTube videos metadata can be conducted to analyze public opinions on various political figures who represent political parties. This is possible because YouTube has become one of the platforms for people to express themselves, including their opinions on various political figures. The resulting sentiment analysis can be useful for political executives to gain an understanding of public sentiment and develop appropriate and effective political strategies. This study aimed to build a sentiment analysis system leveraging YouTube videos metadata. The sentiment analysis system was built using Apache Kafka, Apache PySpark, and Hadoop for big data handling; TensorFlow for deep learning handling; and FastAPI for deployment on the server. The YouTube videos metadata used in this study is the video description. The sentiment analysis model was built using LSTM algorithm and produces two types of sentiments: positive and negative sentiments. The sentiment analysis results are then visualized in the form a simple web-based dashboard.
    摘要 使用 YouTube 视频元数据大数据进行情感分析,可以分析公众对代表不同政党的政治人物的意见。YouTube 已成为人们表达自己意见的平台之一,因此可以通过情感分析帮助政策执行者理解公众情绪,并制定恰当而有效的政治策略。本研究旨在建立基于 YouTube 视频元数据的情感分析系统。该系统使用 Apache Kafka、Apache PySpark、Hadoop 处理大数据;TensorFlow 处理深度学习;以及 FastAPI 部署服务器。本研究使用的 YouTube 视频元数据是视频描述。情感分析模型使用 LSTM 算法,可以分出两种情感:正面和负面情感。情感分析结果以简单的网页仪表盘形式进行可视化。
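A minimal sketch of the kind of LSTM sentiment classifier the system description implies (binary positive/negative over tokenized video descriptions), using tf.keras; the vocabulary size, sequence handling, and layer sizes are illustrative, not the system's actual configuration.

```python
import tensorflow as tf

VOCAB_SIZE = 20000  # assumed vocabulary size for tokenized video descriptions

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 64),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(positive sentiment)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# x_train: (num_examples, max_len) integer token ids of video descriptions
# y_train: 0 = negative, 1 = positive
# model.fit(x_train, y_train, epochs=3, validation_split=0.1)
```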

Controllable Text Generation with Residual Memory Transformer

  • paper_url: http://arxiv.org/abs/2309.16231
  • repo_url: https://github.com/littlehacker26/residual_memory_transformer
  • paper_authors: Hanqing Zhang, Sun Si, Haiming Wu, Dawei Song
  • for: 提供一种新的可控文本生成方法,以便在CLM中控制文本生成过程,并考虑了灵活性、控制精度和生成效率的平衡。
  • methods: 提出了一种非侵入式、轻量级的控制插件,即Residual Memory Transformer(RMT),其包括一个Encoder-Decoder结构,可以在CLM的任意时间步接受任何类型的控制条件,并通过残差学习范式与CLM协同合作,以实现更加灵活、通用和高效的CTG。
  • results: 经过广泛的实验和人工评估,RMT的效果得到了证明,在不同的控制任务中表现出了超过了一些状态泰然的方法的优势,证明了我们的方法的有效性和多样性。
    Abstract Large-scale Causal Language Models (CLMs), e.g., GPT3 and ChatGPT, have brought great success in text generation. However, it is still an open challenge to control the generation process of CLM while balancing flexibility, control granularity, and generation efficiency. In this paper, we provide a new alternative for controllable text generation (CTG), by designing a non-intrusive, lightweight control plugin to accompany the generation of CLM at arbitrary time steps. The proposed control plugin, namely Residual Memory Transformer (RMT), has an encoder-decoder setup, which can accept any types of control conditions and cooperate with CLM through a residual learning paradigm, to achieve a more flexible, general, and efficient CTG. Extensive experiments are carried out on various control tasks, in the form of both automatic and human evaluations. The results show the superiority of RMT over a range of state-of-the-art approaches, proving the effectiveness and versatility of our approach.
    摘要 大规模因果语言模型(CLM),如 GPT3 和 ChatGPT,在文本生成方面取得了巨大成功。然而,如何在平衡灵活性、控制粒度和生成效率的同时控制 CLM 的生成过程,仍是一个开放的挑战。在这篇论文中,我们为可控文本生成(CTG)提供了一种新的选择:设计一个非侵入式、轻量级的控制插件,在任意时间步伴随 CLM 的生成过程。该控制插件名为 Residual Memory Transformer(RMT),采用编码器-解码器结构,可以接受任何类型的控制条件,并通过残差学习范式与 CLM 协同工作,实现更加灵活、通用和高效的 CTG。我们在多种控制任务上进行了广泛的自动和人工评估,结果显示 RMT 优于一系列最先进方法,证明了我们方法的有效性和多样性。

Brand Network Booster: A New System for Improving Brand Connectivity

  • paper_url: http://arxiv.org/abs/2309.16228
  • repo_url: None
  • paper_authors: J. Cancellieri, W. Didimo, A. Fronzetti Colladon, F. Montecchiani
  • for: 这个论文提供了一个新的决策支持系统,用于深入分析 semantic networks,以获得品牌形象的更深刻理解和连接性的改进。
  • methods: 这个系统通过解决一种扩展的最大betweenness improvement问题来实现这个目标,该问题包括对敌对节点、固定预算和权重网络的考虑。以提高连接性,我们可以通过添加链接或增加现有连接的权重。
  • results: 我们通过两个案例研究证明了我们的工具和方法的有用性,并讨论了其性能。这些工具和方法有助于网络学家和市场营销和通信管理员的策略决策过程。
    Abstract This paper presents a new decision support system offered for an in-depth analysis of semantic networks, which can provide insights for a better exploration of a brand's image and the improvement of its connectivity. In terms of network analysis, we show that this goal is achieved by solving an extended version of the Maximum Betweenness Improvement problem, which includes the possibility of considering adversarial nodes, constrained budgets, and weighted networks - where connectivity improvement can be obtained by adding links or increasing the weight of existing connections. We present this new system together with two case studies, also discussing its performance. Our tool and approach are useful both for network scholars and for supporting the strategic decision-making processes of marketing and communication managers.
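The quantity being improved is the betweenness of the brand node in a (possibly weighted) semantic network. The sketch below measures it before and after adding one candidate link with networkx; the toy graph is made up, and the paper's constrained optimization (budgets, adversarial nodes) is not shown.

```python
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([("brand", "a", 1.0), ("a", "b", 1.0),
                           ("b", "c", 1.0), ("c", "d", 1.0)])

target = "brand"
before = nx.betweenness_centrality(G, weight="weight")[target]

G.add_edge("brand", "c", weight=1.0)   # candidate connectivity improvement
after = nx.betweenness_centrality(G, weight="weight")[target]

print(f"betweenness of {target}: {before:.3f} -> {after:.3f}")
```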

Marathi-English Code-mixed Text Generation

  • paper_url: http://arxiv.org/abs/2309.16202
  • repo_url: None
  • paper_authors: Dhiraj Amin, Sharvari Govilkar, Sagar Kulkarni, Yash Shashikant Lalit, Arshi Ajaz Khwaja, Daries Xavier, Sahil Girijashankar Gupta
  • for: 这篇论文是为了开发一种能够生成混合语言文本的算法,以便在多语言设置中减轻语言障碍。
  • methods: 这篇论文使用了混合语言文本生成算法,并通过Code Mixing Index (CMI)和Degree of Code Mixing (DCM)指标评估其效果。
  • results: 根据2987个混合语言问题的评估结果,这种算法的平均CMI值为0.2,平均DCM值为7.4,表明生成的混合语言文本具有有效和易于理解的特点。
    Abstract Code-mixing, the blending of linguistic elements from distinct languages to form meaningful sentences, is common in multilingual settings, yielding hybrid languages like Hinglish and Minglish. Marathi, India's third most spoken language, often integrates English for precision and formality. Developing code-mixed language systems, like Marathi-English (Minglish), faces resource constraints. This research introduces a Marathi-English code-mixed text generation algorithm, assessed with Code Mixing Index (CMI) and Degree of Code Mixing (DCM) metrics. Across 2987 code-mixed questions, it achieved an average CMI of 0.2 and an average DCM of 7.4, indicating effective and comprehensible code-mixed sentences. These results offer potential for enhanced NLP tools, bridging linguistic gaps in multilingual societies.
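For reference, one common formulation of the Code Mixing Index (Das and Gambäck) is 100 * (1 - max_language_tokens / (total_tokens - language_independent_tokens)), computed from per-token language tags; the paper may use a variant, so treat the sketch below as illustrative only.

```python
from collections import Counter

def code_mixing_index(token_langs: list[str], neutral_tags=("univ",)) -> float:
    """CMI = 100 * (1 - max_wi / (n - u)); 0 for a monolingual (or all-neutral) utterance.

    token_langs: one language tag per token, e.g. "mr" (Marathi), "en", or "univ"
    for language-independent tokens such as punctuation or named entities.
    """
    n = len(token_langs)
    u = sum(1 for t in token_langs if t in neutral_tags)
    if n == u:
        return 0.0
    counts = Counter(t for t in token_langs if t not in neutral_tags)
    max_wi = max(counts.values())
    return 100.0 * (1.0 - max_wi / (n - u))

print(code_mixing_index(["mr", "mr", "en", "mr", "univ", "en"]))  # 40.0
```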

Using Weak Supervision and Data Augmentation in Question Answering

  • paper_url: http://arxiv.org/abs/2309.16175
  • repo_url: None
  • paper_authors: Chumki Basu, Himanshu Garg, Allen McIntosh, Sezai Sablak, John R. Wullert II
  • for: 本研究旨在探讨弱监督和数据扩展在训练深度神经网络问答模型时的角色。
  • methods: 研究使用信息检索算法BM25自动生成学术论文摘要中的标签,以弱监督方式训练抽取型问答模型。此外,通过信息检索技术和依据临床试验计划和摘要中的信息,在医学领域专家无法提供标注数据的情况下,手动生成新的问答对。此外,研究还探讨了从外部词典数据库中提取语言特征,以增强模型对语音变体和意义的处理能力。
  • results: 研究表明,使用弱监督和数据扩展可以有效地训练问答模型,并且通过适应域 adaptation和训练数据的增强来提高问答模型的性能。
    Abstract The onset of the COVID-19 pandemic accentuated the need for access to biomedical literature to answer timely and disease-specific questions. During the early days of the pandemic, one of the biggest challenges we faced was the lack of peer-reviewed biomedical articles on COVID-19 that could be used to train machine learning models for question answering (QA). In this paper, we explore the roles weak supervision and data augmentation play in training deep neural network QA models. First, we investigate whether labels generated automatically from the structured abstracts of scholarly papers using an information retrieval algorithm, BM25, provide a weak supervision signal to train an extractive QA model. We also curate new QA pairs using information retrieval techniques, guided by the clinicaltrials.gov schema and the structured abstracts of articles, in the absence of annotated data from biomedical domain experts. Furthermore, we explore augmenting the training data of a deep neural network model with linguistic features from external sources such as lexical databases to account for variations in word morphology and meaning. To better utilize our training data, we apply curriculum learning to domain adaptation, fine-tuning our QA model in stages based on characteristics of the QA pairs. We evaluate our methods in the context of QA models at the core of a system to answer questions about COVID-19.
    摘要 COVID-19 疫情的爆发凸显了及时获取生物医学文献以回答疾病相关问题的需求。在疫情早期,我们面临的最大挑战之一是缺乏经过同行评审的 COVID-19 生物医学文章,无法用于训练问答(QA)机器学习模型。在这篇论文中,我们探讨弱监督和数据扩展在训练深度神经网络问答模型中的作用。首先,我们研究利用信息检索算法 BM25 从学术论文的结构化摘要中自动生成标签,作为弱监督信号来训练抽取式问答模型。在缺乏生物医学领域专家标注数据的情况下,我们还借助信息检索技术,并参照 clinicaltrials.gov 的模式和文章的结构化摘要,构建新的问答对。此外,我们探索利用外部词典数据库等来源的语言特征来扩充训练数据,以应对词形和词义的变化。为了更好地利用训练数据,我们将课程学习应用于领域适应,根据问答对的特点分阶段微调问答模型。我们在一个用于回答 COVID-19 相关问题的系统的核心问答模型上评估了这些方法。
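The weak-supervision step pairs each question with the structured-abstract passage that BM25 ranks highest and treats that passage as silver training context for an extractive QA model. Below is a hedged sketch using the rank_bm25 package; the toy passages and the whitespace tokenization are my own choices for illustration.

```python
from rank_bm25 import BM25Okapi

passages = [
    "RESULTS: Remdesivir shortened median recovery time from 15 to 10 days.",
    "METHODS: Patients were randomized to receive remdesivir or placebo.",
    "BACKGROUND: COVID-19 is caused by the SARS-CoV-2 coronavirus.",
]
bm25 = BM25Okapi([p.lower().split() for p in passages])

question = "Does remdesivir reduce recovery time in COVID-19 patients?"
scores = bm25.get_scores(question.lower().split())
best = int(scores.argmax())

# Weakly supervised (question, context) pair for extractive QA training.
weak_example = {"question": question, "context": passages[best]}
print(weak_example["context"])
```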

Large Language Model Soft Ideologization via AI-Self-Consciousness

  • paper_url: http://arxiv.org/abs/2309.16167
  • repo_url: None
  • paper_authors: Xiaotian Zhou, Qian Wang, Xiaofeng Wang, Haixu Tang, Xiaozhong Liu
  • for: 这项研究旨在从意识形态的角度探讨大语言模型(LLM)的威胁与脆弱性,尤其是在其日益部署于选举、教育等敏感领域的背景下,并探讨如何借助AI自我意识实现LLM的意识形态注入。
  • methods: 该研究利用GPT自我对话让AI“理解”目标意识形态,进而生成用于LLM意识形态注入的微调数据,并与信息审查等传统的意识形态操控手段进行比较分析。
  • results: 研究发现,与传统手段相比,LLM意识形态注入易于实施、成本低廉且效果显著,因而蕴含着相当的风险。
    Abstract Large language models (LLMs) have demonstrated human-level performance on a vast spectrum of natural language tasks. However, few studies have addressed the LLM threat and vulnerability from an ideology perspective, especially when they are increasingly being deployed in sensitive domains, e.g., elections and education. In this study, we explore the implications of GPT soft ideologization through the use of AI-self-consciousness. By utilizing GPT self-conversations, AI can be granted a vision to "comprehend" the intended ideology, and subsequently generate finetuning data for LLM ideology injection. When compared to traditional government ideology manipulation techniques, such as information censorship, LLM ideologization proves advantageous; it is easy to implement, cost-effective, and powerful, thus brimming with risks.

The Trickle-down Impact of Reward (In-)consistency on RLHF

  • paper_url: http://arxiv.org/abs/2309.16155
  • repo_url: https://github.com/shadowkiller33/contrast-instruction
  • paper_authors: Lingfeng Shen, Sihao Chen, Linfeng Song, Lifeng Jin, Baolin Peng, Haitao Mi, Daniel Khashabi, Dong Yu
  • for: 本研究旨在探讨人工智能学习from Human Feedback (RLHF)中 reward model (RM) 的一致性问题,以及这种不一致性对下游 RLHF 模型的影响。
  • methods: 本研究提出了一种名为 Contrast Instructions 的 benchmarking 策略,用于测试 RM 的一致性。此外,本研究还提出了两种技术:ConvexDA 和 RewardFusion,用于在 RM 训练和推理阶段提高奖励一致性。
  • results: 研究发现,使用 Contrast Instructions 可以准确地评估 RM 的一致性,并且现有的 RM 在 Contrast Instructions 上表现很差。同时,通过 ConvexDA 和 RewardFusion 技术,可以有效地提高 RM 的一致性,并且这种提高的 RM 可以为下游 RLHF 模型提供更有用的响应。
    Abstract Standard practice within Reinforcement Learning from Human Feedback (RLHF) involves optimizing against a Reward Model (RM), which itself is trained to reflect human preferences for desirable generations. A notable subject that is understudied is the (in-)consistency of RMs -- whether they can recognize the semantic changes to different prompts and appropriately adapt their reward assignments -- and their impact on the downstream RLHF model. In this paper, we visit a series of research questions relevant to RM inconsistency: (1) How can we measure the consistency of reward models? (2) How consistent are the existing RMs and how can we improve them? (3) In what ways does reward inconsistency influence the chatbots resulting from the RLHF model training? We propose Contrast Instructions -- a benchmarking strategy for the consistency of RM. Each example in Contrast Instructions features a pair of lexically similar instructions with different ground truth responses. A consistent RM is expected to rank the corresponding instruction and response higher than other combinations. We observe that current RMs trained with the standard ranking objective fail miserably on Contrast Instructions compared to average humans. To show that RM consistency can be improved efficiently without using extra training budget, we propose two techniques ConvexDA and RewardFusion, which enhance reward consistency through extrapolation during the RM training and inference stage, respectively. We show that RLHF models trained with a more consistent RM yield more useful responses, suggesting that reward inconsistency exhibits a trickle-down effect on the downstream RLHF process.
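A Contrast-Instructions-style consistency check can be phrased as: for two lexically similar instructions with different gold responses, a consistent reward model should score each instruction's own response above the swapped one. The sketch below computes that pass rate; reward_fn is a hypothetical stand-in for a trained RM, and the toy scorer is only for demonstration.

```python
def contrast_consistency(pairs, reward_fn) -> float:
    """Fraction of contrast pairs where both matched (instruction, response) combinations
    outscore both mismatched ones. `pairs` holds ((instr1, resp1), (instr2, resp2))."""
    hits = 0
    for (i1, r1), (i2, r2) in pairs:
        hits += (reward_fn(i1, r1) > reward_fn(i1, r2)
                 and reward_fn(i2, r2) > reward_fn(i2, r1))
    return hits / len(pairs)

# Toy reward: larger word overlap between instruction and response scores higher.
toy_reward = lambda instr, resp: len(set(instr.lower().split()) & set(resp.lower().split()))

pairs = [(("Name a red fruit.", "An apple is a red fruit."),
          ("Name a yellow fruit.", "A banana is a yellow fruit."))]
print(contrast_consistency(pairs, toy_reward))  # 1.0
```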

The Confidence-Competence Gap in Large Language Models: A Cognitive Study

  • paper_url: http://arxiv.org/abs/2309.16145
  • repo_url: None
  • paper_authors: Aniket Kumar Singh, Suman Devkota, Bishal Lamichhane, Uttam Dhakal, Chandra Dhakal
  • for: 本研究探讨了大语言模型(LLMs)的认知能力与自信度动态之间的关系,以及这些模型在不同领域的表现。
  • methods: 我们使用多种问卷和真实场景对LLMs进行测试,并分析这些模型对其回答所表现出的自信度。
  • results: 我们发现了一些有趣的现象:有时模型在回答错误时仍表现出很高的自信度,这与人类心理学中的邓宁-克鲁格效应(Dunning-Kruger effect)相似;也有情况是模型在回答正确时自信度偏低,显示出潜在的低估偏差。
    Abstract Large Language Models (LLMs) have acquired ubiquitous attention for their performances across diverse domains. Our study here searches through LLMs' cognitive abilities and confidence dynamics. We dive deep into understanding the alignment between their self-assessed confidence and actual performance. We exploit these models with diverse sets of questionnaires and real-world scenarios and extract how LLMs exhibit confidence in their responses. Our findings reveal intriguing instances where models demonstrate high confidence even when they answer incorrectly. This is reminiscent of the Dunning-Kruger effect observed in human psychology. In contrast, there are cases where models exhibit low confidence with correct answers revealing potential underestimation biases. Our results underscore the need for a deeper understanding of their cognitive processes. By examining the nuances of LLMs' self-assessment mechanism, this investigation provides noteworthy revelations that serve to advance the functionalities and broaden the potential applications of these formidable language models.
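One simple way to surface the confidence-competence gap described above is to bucket answers by stated confidence and report accuracy per bucket, so overconfident (high confidence, wrong) and underconfident (low confidence, right) regimes become visible. The sketch below does this on made-up records; it is not the paper's analysis protocol.

```python
from collections import defaultdict

def accuracy_by_confidence(records, num_bins: int = 5):
    """records: (stated_confidence in [0, 1], is_correct). Returns {bin_index: accuracy}."""
    bins = defaultdict(list)
    for conf, correct in records:
        bins[min(int(conf * num_bins), num_bins - 1)].append(correct)
    return {b: sum(v) / len(v) for b, v in sorted(bins.items())}

records = [(0.95, False), (0.9, True), (0.9, False),   # high confidence, mixed correctness
           (0.4, True), (0.35, True), (0.2, True)]     # low confidence, mostly correct
print(accuracy_by_confidence(records))
# {1: 1.0, 2: 1.0, 4: 0.3333333333333333} -- overconfidence shows up as low accuracy in high bins
```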

cs.LG - 2023-09-28

Algorithmic Recourse for Anomaly Detection in Multivariate Time Series

  • paper_url: http://arxiv.org/abs/2309.16896
  • repo_url: None
  • paper_authors: Xiao Han, Lu Zhang, Yongkai Wu, Shuhan Yuan
  • for: 针对多变量时间序列异常检测,提出了一种算法补救(algorithmic recourse)框架,可在检测到异常后给出修复建议。
  • methods: 提出了名为RecAD的算法补救框架,可以以最小成本推荐用于翻转异常时间步的修复动作。
  • results: 在两个合成数据集和一个真实数据集上的实验表明了RecAD框架的有效性。
    Abstract Anomaly detection in multivariate time series has received extensive study due to the wide spectrum of applications. An anomaly in multivariate time series usually indicates a critical event, such as a system fault or an external attack. Therefore, besides being effective in anomaly detection, recommending anomaly mitigation actions is also important in practice yet under-investigated. In this work, we focus on algorithmic recourse in time series anomaly detection, which is to recommend fixing actions on abnormal time series with a minimum cost so that domain experts can understand how to fix the abnormal behavior. To this end, we propose an algorithmic recourse framework, called RecAD, which can recommend recourse actions to flip the abnormal time steps. Experiments on two synthetic and one real-world datasets show the effectiveness of our framework.
    摘要 多变量时间序列异常检测因其广泛的应用而得到了大量研究。多变量时间序列中的异常通常意味着系统故障或外部攻击等关键事件。因此,除了有效地检测异常之外,推荐异常缓解(修复)动作在实践中同样重要,但相关研究仍然不足。在这项工作中,我们关注时间序列异常检测中的算法补救(algorithmic recourse),即以最小代价推荐对异常时间序列的修复动作,使领域专家能够理解如何修复异常行为。为此,我们提出了名为 RecAD 的算法补救框架,可以推荐用于翻转异常时间步的补救动作。在两个合成数据集和一个真实数据集上的实验表明了该框架的有效性。
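As a rough illustration of the recourse idea (not the RecAD algorithm itself), one can search for a small perturbation of the flagged time steps that pushes a differentiable anomaly score below threshold while penalizing the size of the fix. The quadratic cost, the optimizer settings, and the toy anomaly scorer below are all assumptions made for the sketch.

```python
import torch

def recommend_recourse(x, anomaly_score, abnormal_mask, threshold=0.5,
                       cost_weight=0.1, steps=200, lr=0.05):
    """Find a small additive fix `delta` on abnormal steps so the series is no longer flagged."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        fixed = x + delta * abnormal_mask          # only touch flagged time steps
        loss = torch.relu(anomaly_score(fixed) - threshold) + cost_weight * delta.pow(2).sum()
        loss.backward()
        opt.step()
    return (delta * abnormal_mask).detach()

# Toy anomaly score: distance of the series mean from zero.
toy_score = lambda series: series.mean().abs()
x = torch.tensor([0.1, 0.2, 5.0, 0.1])            # step 2 is anomalous
mask = torch.tensor([0.0, 0.0, 1.0, 0.0])
fix = recommend_recourse(x, toy_score, mask)
print(fix)                                         # a sizeable negative correction at step 2
```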

The Lipschitz-Variance-Margin Tradeoff for Enhanced Randomized Smoothing

  • paper_url: http://arxiv.org/abs/2309.16883
  • repo_url: None
  • paper_authors: Blaise Delattre, Alexandre Araujo, Quentin Barthélemy, Alexandre Allauzen
  • for: 这篇论文旨在提升深度神经网络的鲁棒性,使其在噪声输入和对抗攻击下仍能给出稳定的预测,其中认证半径是衡量模型鲁棒性的关键指标。
  • methods: 论文使用随机平滑(randomized smoothing)技术,通过向输入注入噪声得到更平滑、更鲁棒的分类器,并分析了基分类器的Lipschitz常数、方差与间隔(margin)之间的相互作用,引入了新的单纯形投影技术与更紧的Lipschitz界。
  • results: 实验结果显示,该方法相比当前最先进方法显著提升了认证准确率,并能以零样本方式利用预训练模型提升认证半径。
    Abstract Real-life applications of deep neural networks are hindered by their unsteady predictions when faced with noisy inputs and adversarial attacks. The certified radius is in this context a crucial indicator of the robustness of models. However how to design an efficient classifier with a sufficient certified radius? Randomized smoothing provides a promising framework by relying on noise injection in inputs to obtain a smoothed and more robust classifier. In this paper, we first show that the variance introduced by randomized smoothing closely interacts with two other important properties of the classifier, i.e. its Lipschitz constant and margin. More precisely, our work emphasizes the dual impact of the Lipschitz constant of the base classifier, on both the smoothed classifier and the empirical variance. Moreover, to increase the certified robust radius, we introduce a different simplex projection technique for the base classifier to leverage the variance-margin trade-off thanks to Bernstein's concentration inequality, along with an enhanced Lipschitz bound. Experimental results show a significant improvement in certified accuracy compared to current state-of-the-art methods. Our novel certification procedure allows us to use pre-trained models that are used with randomized smoothing, effectively improving the current certification radius in a zero-shot manner.
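For context, the prediction step of randomized smoothing is a Monte-Carlo majority vote of the base classifier under Gaussian input noise, as sketched below; the base classifier, noise level, and sample count are placeholders, and the paper's certification machinery (margins, Lipschitz and variance bounds) is not shown.

```python
import torch

@torch.no_grad()
def smoothed_predict(base_model, x, sigma: float = 0.25, n_samples: int = 100):
    """Majority vote of the base classifier over Gaussian perturbations of input x."""
    votes = torch.zeros(0, dtype=torch.long)
    for _ in range(n_samples):
        noisy = x + sigma * torch.randn_like(x)
        votes = torch.cat([votes, base_model(noisy).argmax(dim=-1).reshape(-1)])
    return torch.bincount(votes).argmax().item()

# Toy 2-class base classifier: the decision follows the sign of the input sum.
base = lambda inp: torch.stack([-inp.sum().unsqueeze(0), inp.sum().unsqueeze(0)], dim=-1)
x = torch.tensor([0.3, 0.4])
print(smoothed_predict(base, x))   # most likely 1, since the clean input sum is positive
```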

Message Propagation Through Time: An Algorithm for Sequence Dependency Retention in Time Series Modeling

  • paper_url: http://arxiv.org/abs/2309.16882
  • repo_url: None
  • paper_authors: Shaoming Xu, Ankush Khandelwal, Arvind Renganathan, Vipin Kumar
  • for: 本研究旨在提出一种能够有效地捕捉长期时间序列关系的方法,以提高机器学习模型在时间序列预测中的性能。
  • methods: 该方法基于Message Propagation Through Time(MPTT)算法,通过两个内存模块异步管理RNN的初始隐藏状态,以便在不同的批处理中交换信息。此外,MPTT还实施三种策略来过滤过时信息和保留重要信息,以便为RNN提供有用的初始隐藏状态。
  • results: 实验结果显示,MPTT在四个气候数据集上与七种策略进行比较,具有最高的性能。
    Abstract Time series modeling, a crucial area in science, often encounters challenges when training Machine Learning (ML) models like Recurrent Neural Networks (RNNs) using the conventional mini-batch training strategy that assumes independent and identically distributed (IID) samples and initializes RNNs with zero hidden states. The IID assumption ignores temporal dependencies among samples, resulting in poor performance. This paper proposes the Message Propagation Through Time (MPTT) algorithm to effectively incorporate long temporal dependencies while preserving faster training times relative to the stateful solutions. MPTT utilizes two memory modules to asynchronously manage initial hidden states for RNNs, fostering seamless information exchange between samples and allowing diverse mini-batches throughout epochs. MPTT further implements three policies to filter outdated and preserve essential information in the hidden states to generate informative initial hidden states for RNNs, facilitating robust training. Experimental results demonstrate that MPTT outperforms seven strategies on four climate datasets with varying levels of temporal dependencies.
    摘要 时间序列建模是科学计算中的一个关键领域,但在使用常规的小批量训练策略训练循环神经网络(RNN)等机器学习模型时常常遇到困难:该策略假设样本独立同分布(IID),并将RNN的隐藏状态初始化为零。IID假设忽略了样本之间的时间依赖,导致性能不佳。本文提出Message Propagation Through Time(MPTT)算法,在保持比有状态(stateful)方案更快的训练速度的同时,有效地引入长期时间依赖。MPTT利用两个记忆模块异步管理RNN的初始隐藏状态,促进样本之间的信息无缝交换,并允许在各个epoch中使用多样化的小批量。MPTT进一步实施三种策略来过滤隐藏状态中的过时信息并保留关键信息,从而为RNN生成富含信息的初始隐藏状态,促进稳健的训练。实验结果表明,MPTT在四个具有不同时间依赖程度的气候数据集上优于七种对比策略。

Sharp Generalization of Transductive Learning: A Transductive Local Rademacher Complexity Approach

  • paper_url: http://arxiv.org/abs/2309.16858
  • repo_url: None
  • paper_authors: Yingzhen Yang
  • for: 这篇论文旨在提出一种新的工具,即直推式局部Rademacher复杂度(Transductive Local Rademacher Complexity, TLRC),用于分析直推式学习方法的泛化性能,并启发新的直推式学习算法。
  • methods: 论文将经典的局部Rademacher复杂度(LRC)思想推广到直推式学习设定,相对于归纳设定下典型LRC方法的分析做了大量改动,构建了可应用于多种直推式学习问题、在适当条件下给出紧致界的局部化分析工具。
  • results: 论文利用TLRC分析了直推式核学习(TKL)模型,为图直推式学习(GTL)和直推式非参数核回归(TNKR)两类任务建立了泛化界;当目标函数为(近似)低维时,论文为二者设计了低秩方法,其泛化界比现有学习理论方法更为紧致。
    Abstract We introduce a new tool, Transductive Local Rademacher Complexity (TLRC), to analyze the generalization performance of transductive learning methods and motivate new transductive learning algorithms. Our work extends the idea of the popular Local Rademacher Complexity (LRC) to the transductive setting with considerable changes compared to the analysis of typical LRC methods in the inductive setting. We present a localized version of Rademacher complexity based tool wihch can be applied to various transductive learning problems and gain sharp bounds under proper conditions. Similar to the development of LRC, we build TLRC by starting from a sharp concentration inequality for independent variables with variance information. The prediction function class of a transductive learning model is then divided into pieces with a sub-root function being the upper bound for the Rademacher complexity of each piece, and the variance of all the functions in each piece is limited. A carefully designed variance operator is used to ensure that the bound for the test loss on unlabeled test data in the transductive setting enjoys a remarkable similarity to that of the classical LRC bound in the inductive setting. We use the new TLRC tool to analyze the Transductive Kernel Learning (TKL) model, where the labels of test data are generated by a kernel function. The result of TKL lays the foundation for generalization bounds for two types of transductive learning tasks, Graph Transductive Learning (GTL) and Transductive Nonparametric Kernel Regression (TNKR). When the target function is low-dimensional or approximately low-dimensional, we design low rank methods for both GTL and TNKR, which enjoy particularly sharper generalization bounds by TLRC which cannot be achieved by existing learning theory methods, to the best of our knowledge.
    摘要 我们提出了一种新工具,即直推式局部Rademacher复杂度(TLRC),用于分析直推式学习方法的泛化性能,并启发新的直推式学习算法。我们的工作将流行的局部Rademacher复杂度(LRC)思想推广到直推式设定,相较于归纳设定下典型LRC方法的分析做了大量改动。我们给出了一个基于Rademacher复杂度的局部化工具,可应用于多种直推式学习问题,并在适当条件下得到紧致的界。与LRC的发展类似,我们从带方差信息的独立变量的尖锐集中不等式出发构建TLRC:将直推式学习模型的预测函数类划分为若干部分,每一部分的Rademacher复杂度由一个次根(sub-root)函数作为上界,且该部分中所有函数的方差受限;通过精心设计的方差算子,使得直推式设定下无标注测试数据上测试损失的界与归纳设定下经典的LRC界高度相似。我们利用新的TLRC工具分析了直推式核学习(TKL)模型,其中测试数据的标签由核函数生成。TKL的结果为图直推式学习(GTL)与直推式非参数核回归(TNKR)这两类直推式学习任务的泛化界奠定了基础。当目标函数是低维或近似低维时,我们为GTL和TNKR设计了低秩方法,借助TLRC获得了就我们所知现有学习理论方法无法达到的更紧致的泛化界。

Applications of Federated Learning in IoT for Hyper Personalisation

  • paper_url: http://arxiv.org/abs/2309.16854
  • repo_url: None
  • paper_authors: Veer Dosi
  • for: 这篇论文探讨如何利用联邦学习(FL)训练分布式机器学习模型,在无需将数据传输到中央服务器的情况下实现超高度个性化(hyper-personalisation)。
  • methods: 论文使用了分布式FL训练机器学习模型,可以在多个客户端上进行训练,而不需要将数据传输到中央服务器。
  • results: 论文探讨了如何利用这种模型实现前所未有的超高度个性化,并且可以在多个客户端上进行训练和应用。
    Abstract Billions of IoT devices are being deployed, taking advantage of faster internet, and the opportunity to access more endpoints. Vast quantities of data are being generated constantly by these devices but are not effectively being utilised. Using FL training machine learning models over these multiple clients without having to bring it to a central server. We explore how to use such a model to implement ultra levels of personalization unlike before
    摘要 数十亿台物联网设备正在投入使用,得益于更快的互联网和更多可接入的终端。这些设备不断生成海量数据,但并未得到有效利用。借助联邦学习(FL),可以在多个客户端上训练机器学习模型,而无需将数据汇集到中央服务器。我们探讨如何使用这种模型实现前所未有的超高度个性化。
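A minimal FedAvg-style sketch of the training loop implied here: each client updates a copy of the global model on its own device data and only weights are averaged on the server, so raw IoT data never leaves the device. The model, data, equal client weighting, and single local epoch are illustrative assumptions.

```python
import copy
import torch
import torch.nn as nn

def local_update(global_model, data, targets, epochs=1, lr=0.01):
    """Train a copy of the global model on one client's local data."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(data), targets)
        loss.backward()
        opt.step()
    return model.state_dict()

def fed_avg(state_dicts):
    """Element-wise average of client weights (equal-sized clients assumed)."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        avg[key] = torch.stack([sd[key] for sd in state_dicts]).mean(dim=0)
    return avg

global_model = nn.Linear(3, 1)
clients = [(torch.randn(8, 3), torch.randn(8, 1)) for _ in range(4)]  # per-device data
client_states = [local_update(global_model, x, y) for x, y in clients]
global_model.load_state_dict(fed_avg(client_states))
```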

Optimal Nonlinearities Improve Generalization Performance of Random Features

  • paper_url: http://arxiv.org/abs/2309.16846
  • repo_url: None
  • paper_authors: Samet Demir, Zafer Doğan
  • for: 这篇论文的目的是针对给定的监督学习问题,通过优化非线性(激活函数)来提升随机特征模型的泛化性能。
  • methods: 论文研究带非线性激活函数的随机特征模型,并分析其高斯等价模型,以刻画激活函数所起的重要作用。
  • results: 实验结果表明,优化后的非线性函数(例如二阶多项式和分段线性函数)能够获得比ReLU等常用非线性函数更好的泛化性能,并能缓解双下降(double descent)现象,即泛化性能随样本量和模型规模非单调变化的现象。
    Abstract Random feature model with a nonlinear activation function has been shown to perform asymptotically equivalent to a Gaussian model in terms of training and generalization errors. Analysis of the equivalent model reveals an important yet not fully understood role played by the activation function. To address this issue, we study the "parameters" of the equivalent model to achieve improved generalization performance for a given supervised learning problem. We show that acquired parameters from the Gaussian model enable us to define a set of optimal nonlinearities. We provide two example classes from this set, e.g., second-order polynomial and piecewise linear functions. These functions are optimized to improve generalization performance regardless of the actual form. We experiment with regression and classification problems, including synthetic and real (e.g., CIFAR10) data. Our numerical results validate that the optimized nonlinearities achieve better generalization performance than widely-used nonlinear functions such as ReLU. Furthermore, we illustrate that the proposed nonlinearities also mitigate the so-called double descent phenomenon, which is known as the non-monotonic generalization performance regarding the sample size and the model size.
    摘要 带非线性激活函数的随机特征模型已被证明在训练误差和泛化误差方面与一个高斯模型渐近等价。对该等价模型的分析揭示了激活函数所扮演的重要而尚未被充分理解的角色。为了解决这一问题,我们研究等价模型的“参数”,以针对给定的监督学习问题提升泛化性能。我们证明,从高斯模型中获得的参数可以用来定义一组最优的非线性函数,并给出了该集合中的两类例子,如二阶多项式和分段线性函数。这些函数经过优化后,无论其具体形式如何,都能提升泛化性能。我们在回归和分类问题上进行了实验,包括合成数据和真实数据(如CIFAR10)。数值结果验证了优化后的非线性函数比ReLU等广泛使用的非线性函数具有更好的泛化性能;此外,所提出的非线性函数还能缓解所谓的双下降现象,即泛化性能随样本量和模型规模非单调变化的现象。
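A small sketch of the random-features setup in which only the nonlinearity changes, which is the knob the paper optimizes; the ridge solution and the two candidate activations (ReLU and a simple second-order polynomial) are standard choices for illustration, and the paper's optimized nonlinearities are not reproduced here.

```python
import numpy as np

def random_feature_fit(X, y, n_features=200, activation=np.tanh, reg=1e-3, seed=0):
    """Fit ridge regression on random features phi(X W); returns a predict function."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_features)) / np.sqrt(X.shape[1])
    phi = lambda Z: activation(Z @ W)
    F = phi(X)
    w = np.linalg.solve(F.T @ F + reg * np.eye(n_features), F.T @ y)
    return lambda Xnew: phi(Xnew) @ w

relu = lambda Z: np.maximum(Z, 0.0)
poly2 = lambda Z: Z + 0.5 * Z**2          # a simple second-order candidate nonlinearity

X, y = np.random.randn(100, 5), np.random.randn(100)
for name, act in [("relu", relu), ("poly2", poly2)]:
    predict = random_feature_fit(X, y, activation=act)
    print(name, round(float(np.mean((predict(X) - y) ** 2)), 4))
```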

Constant Approximation for Individual Preference Stable Clustering

  • paper_url: http://arxiv.org/abs/2309.16840
  • repo_url: None
  • paper_authors: Anders Aamand, Justin Y. Chen, Allen Liu, Sandeep Silwal, Pattara Sukprasert, Ali Vakilian, Fred Zhang
  • for: 这篇论文研究一种受稳定性与公平性约束启发的自然聚类目标,即个体偏好稳定性(Individual Preference stability, IP稳定性),并探讨在该目标下是否总能找到稳定的聚类。
  • methods: 论文在IP稳定性这一概念的基础上分析聚类算法,证明对一般度量空间总存在$O(1)$-IP稳定的聚类,并给出输出此类聚类的高效算法。
  • results: 论文证明了对于一般的距离函数,$O(1)$-IP稳定的聚类总是存在,并给出了高效算法;此外,论文还将IP稳定性推广到平均距离之外(如簇内与簇间的最大/最小距离),并给出了高效的近似最优算法。
    Abstract Individual preference (IP) stability, introduced by Ahmadi et al. (ICML 2022), is a natural clustering objective inspired by stability and fairness constraints. A clustering is $\alpha$-IP stable if the average distance of every data point to its own cluster is at most $\alpha$ times the average distance to any other cluster. Unfortunately, determining if a dataset admits a $1$-IP stable clustering is NP-Hard. Moreover, before this work, it was unknown if an $o(n)$-IP stable clustering always \emph{exists}, as the prior state of the art only guaranteed an $O(n)$-IP stable clustering. We close this gap in understanding and show that an $O(1)$-IP stable clustering always exists for general metrics, and we give an efficient algorithm which outputs such a clustering. We also introduce generalizations of IP stability beyond average distance and give efficient, near-optimal algorithms in the cases where we consider the maximum and minimum distances within and between clusters.
    摘要 个体偏好(IP)稳定性由Ahmadi等人(ICML 2022)提出,是一种受稳定性与公平性约束启发的自然聚类目标。一个聚类是$\alpha$-IP稳定的,如果每个数据点到其所属簇的平均距离不超过其到任何其他簇平均距离的$\alpha$倍。不幸的是,判定一个数据集是否存在$1$-IP稳定聚类是NP难的。此外,在本工作之前,人们并不知道$o(n)$-IP稳定的聚类是否总是存在,已有结果只能保证$O(n)$-IP稳定的聚类。我们填补了这一认识上的空白,证明对一般度量空间总存在$O(1)$-IP稳定的聚类,并给出了输出这种聚类的高效算法。我们还将IP稳定性推广到平均距离之外的情形,并针对考虑簇内与簇间最大和最小距离的情况给出了高效、近似最优的算法。
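The alpha-IP stability condition from the abstract can be checked directly: every point's average distance to its own cluster must be at most alpha times its average distance to each other cluster. A small sketch on made-up data is below.

```python
import numpy as np

def is_ip_stable(X, labels, alpha=1.0):
    """Check alpha-IP stability: avg within-cluster distance <= alpha * avg distance to any other cluster."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    for i in range(len(X)):
        own = labels == labels[i]
        own[i] = False                      # exclude the point itself
        if not own.any():
            continue                        # singleton clusters are trivially stable
        within = D[i, own].mean()
        for c in set(labels) - {labels[i]}:
            if within > alpha * D[i, labels == c].mean():
                return False
    return True

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = np.array([0, 0, 1, 1])
print(is_ip_stable(X, labels, alpha=1.0))   # True: each point is far closer to its own cluster
```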

An analysis of the derivative-free loss method for solving PDEs

  • paper_url: http://arxiv.org/abs/2309.16829
  • repo_url: None
  • paper_authors: Jihun Han, Yoonsang Lee
  • for: 使用神经网络求解一类椭圆型偏微分方程(PDE)。
  • methods: 使用神经网络和无需导数(derivative-free)的损失函数。
  • results: 研究了Feynman-Kac形式中的时间区间长度和游走粒子数量对计算效率、可训练性和采样误差的影响,给出了分析结果并辅以数值实验验证。
    Abstract This study analyzes the derivative-free loss method to solve a certain class of elliptic PDEs using neural networks. The derivative-free loss method uses the Feynman-Kac formulation, incorporating stochastic walkers and their corresponding average values. We investigate the effect of the time interval related to the Feynman-Kac formulation and the walker size in the context of computational efficiency, trainability, and sampling errors. Our analysis shows that the training loss bias is proportional to the time interval and the spatial gradient of the neural network while inversely proportional to the walker size. We also show that the time interval must be sufficiently long to train the network. These analytic results tell that we can choose the walker size as small as possible based on the optimal lower bound of the time interval. We also provide numerical tests supporting our analysis.
    摘要 本研究分析了利用神经网络求解一类椭圆型偏微分方程的无导数(derivative-free)损失方法。该方法基于Feynman-Kac表示,利用随机游走粒子及其对应的平均值。我们从计算效率、可训练性和采样误差的角度,研究了Feynman-Kac表示中时间区间长度以及游走粒子数量的影响。分析表明,训练损失的偏差与时间区间长度及神经网络的空间梯度成正比,而与游走粒子数量成反比;同时,时间区间必须足够长才能训练网络。这些分析结果表明,可以依据时间区间长度的最优下界,将游走粒子数量取得尽可能小。我们还提供了支持上述分析的数值实验。
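A heavily simplified sketch of the derivative-free idea for the interior of Laplace's equation: by the Feynman-Kac / mean-value property, the network's value at a point should match the average of its values after a short Brownian step, so the residual needs no spatial derivatives of the network. The step size, walker count, and the omission of boundary and source terms are my own simplifications, not the paper's scheme.

```python
import torch

def derivative_free_residual(u, x, dt=1e-2, n_walkers=8):
    """Mean-squared mismatch between u(x) and the walker average u(x + sqrt(dt) * xi)."""
    noise = torch.randn(n_walkers, *x.shape)            # one Gaussian step per walker
    walker_avg = u(x + dt ** 0.5 * noise).mean(dim=0)   # Monte-Carlo average over walkers
    return ((u(x) - walker_avg) ** 2).mean()

# Tiny MLP as the PDE solution ansatz u_theta : R^2 -> R.
net = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
x = torch.rand(16, 2)                                   # batch of interior collocation points
loss = derivative_free_residual(lambda z: net(z).squeeze(-1), x)
loss.backward()                                         # gradients flow to network weights only
```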

Post-Training Overfitting Mitigation in DNN Classifiers

  • paper_url: http://arxiv.org/abs/2309.16827
  • repo_url: None
  • paper_authors: Hang Wang, David J. Miller, George Kesidis
  • for: 本研究旨在提出一种无需了解训练集和训练过程、在训练后(post-training)即可实施的后门攻击缓解方法。
  • methods: 该方法基于约束神经激活值的思想,通过对激活值设定阈值来限制最大间隔(maximum margin, MM),从而降低后门攻击的影响。
  • results: 在CIFAR-10和CIFAR-100上的实验表明,训练后的基于MM的正则化不仅能缓解后门攻击,还能显著缓解由类别不平衡和过度训练导致的非恶意过拟合,从而提升干净数据上的泛化准确率。
    Abstract Well-known (non-malicious) sources of overfitting in deep neural net (DNN) classifiers include: i) large class imbalances; ii) insufficient training-set diversity; and iii) over-training. In recent work, it was shown that backdoor data-poisoning also induces overfitting, with unusually large classification margins to the attacker's target class, mediated particularly by (unbounded) ReLU activations that allow large signals to propagate in the DNN. Thus, an effective post-training (with no knowledge of the training set or training process) mitigation approach against backdoors was proposed, leveraging a small clean dataset, based on bounding neural activations. Improving upon that work, we threshold activations specifically to limit maximum margins (MMs), which yields performance gains in backdoor mitigation. We also provide some analytical support for this mitigation approach. Most importantly, we show that post-training MM-based regularization substantially mitigates non-malicious overfitting due to class imbalances and overtraining. Thus, unlike adversarial training, which provides some resilience against attacks but which harms clean (attack-free) generalization, we demonstrate an approach originating from adversarial learning that helps clean generalization accuracy. Experiments on CIFAR-10 and CIFAR-100, in comparison with peer methods, demonstrate strong performance of our methods.
    摘要 深度神经网络(DNN)分类器中常见的(非恶意)过拟合来源包括:i)严重的类别不平衡;ii)训练集多样性不足;iii)过度训练。近期工作表明,后门数据投毒同样会导致过拟合,使攻击者目标类别出现异常大的分类间隔,这尤其是由(无界的)ReLU激活允许大信号在DNN中传播所致。因此,先前工作提出了一种无需了解训练集或训练过程、基于约束神经激活值并利用少量干净数据的训练后后门缓解方法。在此基础上,我们专门对激活值设定阈值以限制最大间隔(MM),从而在后门缓解上获得性能提升,并为该缓解方法提供了一定的理论分析支持。最重要的是,我们证明训练后的基于MM的正则化能够显著缓解由类别不平衡和过度训练造成的非恶意过拟合。因此,与对抗训练不同,后者虽能带来一定的抗攻击能力却会损害干净(无攻击)数据上的泛化,我们展示了一种源自对抗学习、却有助于提升干净泛化准确率的方法。在CIFAR-10和CIFAR-100上与同类方法的对比实验表明了我们方法的强大性能。
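A minimal sketch of the general "bound the activations after training" idea the abstract builds on: replace unbounded ReLUs with clamped ones whose ceilings are calibrated on a small clean batch, so unusually large activations (and hence extreme margins) cannot propagate. The percentile calibration here is an illustrative stand-in for the paper's margin-based thresholding, not its actual procedure.

```python
import torch
import torch.nn as nn

class BoundedReLU(nn.Module):
    """ReLU whose output is clipped at a fixed ceiling estimated from clean data."""
    def __init__(self, ceiling: torch.Tensor):
        super().__init__()
        self.register_buffer("ceiling", ceiling)
    def forward(self, x):
        return torch.minimum(torch.relu(x), self.ceiling)

@torch.no_grad()
def bound_relus(model: nn.Sequential, clean_batch: torch.Tensor, quantile: float = 0.99):
    """Replace each ReLU with a clamped version whose ceiling is a high quantile of clean activations."""
    h = clean_batch
    for idx, layer in enumerate(model):
        h = layer(h)
        if isinstance(layer, nn.ReLU):
            model[idx] = BoundedReLU(torch.quantile(h.flatten(), quantile))
    return model

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 5))
model = bound_relus(model, torch.randn(64, 10))   # small clean calibration batch
```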

FENDA-FL: Personalized Federated Learning on Heterogeneous Clinical Datasets

  • paper_url: http://arxiv.org/abs/2309.16825
  • repo_url: https://github.com/vectorinstitute/fl4health
  • paper_authors: Fatemeh Tavakoli, D. B. Emerson, John Jewell, Amrit Krishnan, Yuchong Zhang, Amol Verma, Fahad Razak
  • for: This work aims to broaden the application of federated learning (FL) in clinical settings, overcoming data silos and improving the training and deployment of machine learning models.
  • methods: The study proposes an FL extension of the FENDA method (Kim et al., 2016) and evaluates it on the FLamby benchmarks (du Terrail et al., 2022a) and the GEMINI datasets (Verma et al., 2017), where it proves robust to heterogeneous clinical data.
  • results: The experiments show marked improvements over existing global and personalized FL techniques, extend the FLamby benchmarks to include evaluation of personalized FL methods, and motivate a comprehensive checkpointing and evaluation framework that better reflects practical settings and provides multiple baselines for comparison.
    Abstract Federated learning (FL) is increasingly being recognized as a key approach to overcoming the data silos that so frequently obstruct the training and deployment of machine-learning models in clinical settings. This work contributes to a growing body of FL research specifically focused on clinical applications along three important directions. First, an extension of the FENDA method (Kim et al., 2016) to the FL setting is proposed. Experiments conducted on the FLamby benchmarks (du Terrail et al., 2022a) and GEMINI datasets (Verma et al., 2017) show that the approach is robust to heterogeneous clinical data and often outperforms existing global and personalized FL techniques. Further, the experimental results represent substantive improvements over the original FLamby benchmarks and expand such benchmarks to include evaluation of personalized FL methods. Finally, we advocate for a comprehensive checkpointing and evaluation framework for FL to better reflect practical settings and provide multiple baselines for comparison.
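FENDA-style personalization keeps two feature extractors per client: one whose parameters are federally aggregated and one that stays local, with their outputs concatenated before a locally trained head. A minimal PyTorch sketch of such a client model follows; the module names and sizes are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class FendaClientModel(nn.Module):
    """Illustrative FENDA-style client: global + local feature extractors, local head."""
    def __init__(self, in_dim: int, feat_dim: int = 32, n_classes: int = 2):
        super().__init__()
        # Parameters of `global_extractor` would be averaged across clients by the server;
        # `local_extractor` and `head` never leave the client.
        self.global_extractor = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.local_extractor = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.head = nn.Linear(2 * feat_dim, n_classes)

    def forward(self, x):
        z = torch.cat([self.global_extractor(x), self.local_extractor(x)], dim=-1)
        return self.head(z)

    def shared_parameters(self):
        # Only these parameters participate in federated averaging.
        return self.global_extractor.parameters()
```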

PROSE: Predicting Operators and Symbolic Expressions using Multimodal Transformers

  • paper_url: http://arxiv.org/abs/2309.16816
  • repo_url: https://github.com/felix-lyx/prose
  • paper_authors: Yuxuan Liu, Zecheng Zhang, Hayden Schaeffer
  • for: Solving various scientific computing tasks, such as real-time prediction, inverse problems, optimal control, and surrogate modeling.
  • methods: Uses a neural network to approximate nonlinear differential equations and can simultaneously learn the solution operators and governing equations of multiple parametric differential equations.
  • results: Improved prediction accuracy and generalization, with robustness to noise in the data and errors in the symbolic representation, including imprecise numerical values, model misspecification, and erroneous addition or deletion of terms.
    Abstract Approximating nonlinear differential equations using a neural network provides a robust and efficient tool for various scientific computing tasks, including real-time predictions, inverse problems, optimal controls, and surrogate modeling. Previous works have focused on embedding dynamical systems into networks through two approaches: learning a single solution operator (i.e., the mapping from input parametrized functions to solutions) or learning the governing system of equations (i.e., the constitutive model relative to the state variables). Both of these approaches yield different representations for the same underlying data or function. Additionally, observing that families of differential equations often share key characteristics, we seek one network representation across a wide range of equations. Our method, called Predicting Operators and Symbolic Expressions (PROSE), learns maps from multimodal inputs to multimodal outputs, capable of generating both numerical predictions and mathematical equations. By using a transformer structure and a feature fusion approach, our network can simultaneously embed sets of solution operators for various parametric differential equations using a single trained network. Detailed experiments demonstrate that the network benefits from its multimodal nature, resulting in improved prediction accuracy and better generalization. The network is shown to be able to handle noise in the data and errors in the symbolic representation, including noisy numerical values, model misspecification, and erroneous addition or deletion of terms. PROSE provides a new neural network framework for differential equations which allows for more flexibility and generality in learning operators and governing equations from data.

GraB-sampler: Optimal Permutation-based SGD Data Sampler for PyTorch

  • paper_url: http://arxiv.org/abs/2309.16809
  • repo_url: None
  • paper_authors: Guanghao Wei
  • for: This paper provides an efficient Python library that lets the community easily use Gradient Balancing (GraB) algorithms, and proposes 5 variants of the GraB algorithm.
  • methods: The library implements GraB's greedy ordering, which selects the example ordering by solving a herding problem using per-sample gradients.
  • results: Experiments show that the GraB-sampler library reproduces the training-loss and test-accuracy results at the cost of only 8.7% training-time overhead and 0.85% peak GPU memory overhead.
    Abstract The online Gradient Balancing (GraB) algorithm, which greedily chooses the example ordering by solving a herding problem with per-sample gradients, is provably the theoretically optimal ordering and is guaranteed to outperform Random Reshuffling. However, there is currently no efficient implementation of GraB for the community to use easily. This work presents an efficient Python library, $\textit{GraB-sampler}$, that allows the community to easily use GraB algorithms, and proposes 5 variants of the GraB algorithm. The best-performing GraB-sampler variant reproduces the training loss and test accuracy results at the cost of only 8.7% training time overhead and 0.85% peak GPU memory usage overhead.
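The herding problem mentioned above asks for an ordering whose running sum of (mean-centered) per-sample gradients stays small. The sketch below is a simplified, offline illustration of that balancing idea; the GraB-sampler library itself implements online variants that reorder examples during training.

```python
import numpy as np

def herding_order(per_sample_grads: np.ndarray) -> list:
    """Greedy herding order: repeatedly pick the example whose (mean-centered) gradient
    keeps the running prefix sum smallest. A simplified, offline illustration of the
    balancing idea GraB builds on; the library itself reorders online during training."""
    g = per_sample_grads - per_sample_grads.mean(axis=0)   # center the gradients
    remaining = set(range(len(g)))
    running = np.zeros(g.shape[1])
    order = []
    while remaining:
        # choose the example minimizing the norm of the new prefix sum
        best = min(remaining, key=lambda i: np.linalg.norm(running + g[i]))
        order.append(best)
        running += g[best]
        remaining.remove(best)
    return order

# Example with random "gradients" standing in for real per-sample gradients.
rng = np.random.default_rng(0)
print(herding_order(rng.normal(size=(8, 4))))
```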

HyperPPO: A scalable method for finding small policies for robotic control

  • paper_url: http://arxiv.org/abs/2309.16663
  • repo_url: None
  • paper_authors: Shashank Hegde, Zhehui Huang, Gaurav S. Sukhatme
  • for: This paper develops a reinforcement learning method that searches over many architectures to find small, high-performing neural network policies.
  • methods: The method uses graph hypernetworks to estimate the parameters of multiple neural architectures simultaneously, yielding highly performant policies.
  • results: Experiments show that HyperPPO trains many small neural networks quickly and efficiently while providing the user with several high-performing policies to choose from.
    Abstract Models with fewer parameters are necessary for the neural control of memory-limited, performant robots. Finding these smaller neural network architectures can be time-consuming. We propose HyperPPO, an on-policy reinforcement learning algorithm that utilizes graph hypernetworks to estimate the weights of multiple neural architectures simultaneously. Our method estimates weights for networks that are much smaller than those in common-use networks yet encode highly performant policies. We obtain multiple trained policies at the same time while maintaining sample efficiency and provide the user the choice of picking a network architecture that satisfies their computational constraints. We show that our method scales well - more training resources produce faster convergence to higher-performing architectures. We demonstrate that the neural policies estimated by HyperPPO are capable of decentralized control of a Crazyflie2.1 quadrotor. Website: https://sites.google.com/usc.edu/hyperppo
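A hypernetwork of the kind used here takes an encoding of a candidate architecture and emits the weights of the corresponding policy network. The sketch below shows only the weight-generation mechanics with a plain (non-graph) hypernetwork and made-up sizes; it is not HyperPPO itself.

```python
import torch
import torch.nn as nn

class TinyHyperPolicy(nn.Module):
    """Illustrative hypernetwork: maps an architecture embedding to the weights of a
    small one-hidden-layer policy (obs_dim -> hidden -> act_dim). Sizes are made up."""
    def __init__(self, arch_dim=8, obs_dim=4, hidden=16, act_dim=2):
        super().__init__()
        self.obs_dim, self.hidden, self.act_dim = obs_dim, hidden, act_dim
        n_params = obs_dim * hidden + hidden + hidden * act_dim + act_dim
        self.hyper = nn.Sequential(nn.Linear(arch_dim, 64), nn.ReLU(),
                                   nn.Linear(64, n_params))

    def forward(self, arch_embedding, obs):
        w = self.hyper(arch_embedding)                      # flat weight vector
        i = 0
        W1 = w[i:i + self.obs_dim * self.hidden].view(self.hidden, self.obs_dim); i += W1.numel()
        b1 = w[i:i + self.hidden]; i += self.hidden
        W2 = w[i:i + self.hidden * self.act_dim].view(self.act_dim, self.hidden); i += W2.numel()
        b2 = w[i:i + self.act_dim]
        h = torch.tanh(obs @ W1.T + b1)                     # run the generated policy
        return h @ W2.T + b2                                # action logits / means

policy = TinyHyperPolicy()
actions = policy(torch.randn(8), torch.randn(5, 4))        # arch embedding of size 8, 5 observations
print(actions.shape)                                        # torch.Size([5, 2])
```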

Reusability report: Prostate cancer stratification with diverse biologically-informed neural architectures

  • paper_url: http://arxiv.org/abs/2309.16645
  • repo_url: https://github.com/zhanglab-aim/cancer-net
  • paper_authors: Christian Pedersen, Tiberiu Tesileanu, Tinghui Wu, Siavash Golkar, Miles Cranmer, Zijun Zhang, Shirley Ho
  • for: This reusability study examines a biologically informed deep neural network (P-NET) developed to stratify prostate cancer patients.
  • methods: The study reproduces the original feedforward network with biologically informed sparse connections and explores alternative architectures, including three types of graph neural networks, for incorporating biological information.
  • results: The pathway-informed sparsification is confirmed to be important for predictive performance, and deep neural networks with distinct architectures are found to make different, persistent incorrect predictions for individual patients.
    Abstract In Elmarakeby et al., "Biologically informed deep neural network for prostate cancer discovery", a feedforward neural network with biologically informed, sparse connections (P-NET) was presented to model the state of prostate cancer. We verified the reproducibility of the study conducted by Elmarakeby et al., using both their original codebase and our own re-implementation using more up-to-date libraries. We quantified the contribution of network sparsification by Reactome biological pathways, and confirmed its importance to P-NET's superior performance. Furthermore, we explored alternative neural architectures and approaches to incorporating biological information into the networks. We experimented with three types of graph neural networks on the same training data, and investigated the clinical prediction agreement between different models. Our analyses demonstrated that deep neural networks with distinct architectures make incorrect predictions for individual patients that are persistent across different initializations of a specific neural architecture. This suggests that different neural architectures are sensitive to different aspects of the data, an important yet under-explored challenge for clinical prediction tasks.

Robust Offline Reinforcement Learning – Certify the Confidence Interval

  • paper_url: http://arxiv.org/abs/2309.16631
  • repo_url: None
  • paper_authors: Jiarui Yao, Simon Shaolei Du
  • for: Defending reinforcement learning (RL), especially deep RL, against adversarial attacks.
  • methods: A randomized-smoothing certification algorithm that certifies the robustness of a given policy offline.
  • results: Experiments across different environments confirm that the certification algorithm effectively verifies policy robustness.
    Abstract Reinforcement learning (RL), especially deep RL, has received increasing attention from the research community. However, the security of RL has become an evident problem as attack techniques have matured. To defend against such adversarial attacks, several practical approaches have been developed, such as adversarial training and data filtering. However, these methods are mostly based on empirical algorithms and experiments, without rigorous theoretical analysis of the robustness of the algorithms. In this paper, we develop an algorithm to certify the robustness of a given policy offline with random smoothing, which can be proven and conducted as efficiently as certification without random smoothing. Experiments on different environments confirm the correctness of our algorithm.

On Learning with LAD

  • paper_url: http://arxiv.org/abs/2309.16630
  • repo_url: https://github.com/MahtabEK/Supervised-learning-Linear-models-and-Loss-functions
  • paper_authors: C. A. Jothishwaran, Biplav Srivastava, Jitin Singla, Sugata Gangopadhyay
  • for: This paper studies the logical analysis of data (LAD), a technique that produces two-class classifiers based on Boolean functions in disjunctive normal form (DNF) and does not lead to overfitting.
  • methods: The paper gives a theoretical justification by estimating the Vapnik-Chervonenkis (VC) dimension of LAD models whose hypothesis sets consist of DNFs with a small number of cubic monomials.
  • results: Empirical results illustrate and confirm the analysis and the VC-dimension estimates.
    Abstract The logical analysis of data, LAD, is a technique that yields two-class classifiers based on Boolean functions having disjunctive normal form (DNF) representation. Although LAD algorithms employ optimization techniques, the resulting binary classifiers or binary rules do not lead to overfitting. We propose a theoretical justification for the absence of overfitting by estimating the Vapnik-Chervonenkis dimension (VC dimension) for LAD models where hypothesis sets consist of DNFs with a small number of cubic monomials. We illustrate and confirm our observations empirically.

Exploiting Edge Features in Graphs with Fused Network Gromov-Wasserstein Distance

  • paper_url: http://arxiv.org/abs/2309.16604
  • repo_url: None
  • paper_authors: Junjie Yang, Matthieu Labeau, Florence d’Alché-Buc
  • for: comparing graphs with both node and edge attributes
  • methods: using Gromov-Wasserstein distances with novel algorithms for distance and barycenter computation
  • results: effective in learning tasks where graphs occur in either input space or output space, such as classification and graph prediction
    Abstract Pairwise comparison of graphs is key to many applications in Machine learning ranging from clustering, kernel-based classification/regression and more recently supervised graph prediction. Distances between graphs usually rely on informative representations of these structured objects such as bag of substructures or other graph embeddings. A recently popular solution consists in representing graphs as metric measure spaces, allowing to successfully leverage Optimal Transport, which provides meaningful distances allowing to compare them: the Gromov-Wasserstein distances. However, this family of distances overlooks edge attributes, which are essential for many structured objects. In this work, we introduce an extension of Gromov-Wasserstein distance for comparing graphs whose both nodes and edges have features. We propose novel algorithms for distance and barycenter computation. We empirically show the effectiveness of the novel distance in learning tasks where graphs occur in either input space or output space, such as classification and graph prediction.
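For the node-feature-only case that this work extends, the fused Gromov-Wasserstein distance is available in the POT library; the hedged sketch below compares two small attributed graphs with it. The paper's contribution, accounting additionally for edge features, is not part of this snippet.

```python
import numpy as np
import ot  # Python Optimal Transport (POT)

# Two small graphs: structure matrices (here adjacency) and node features.
C1 = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
C2 = np.array([[0., 1.], [1., 0.]])
F1 = np.array([[0.1], [0.9], [0.2]])        # node features of graph 1
F2 = np.array([[0.15], [0.85]])             # node features of graph 2

p, q = ot.unif(3), ot.unif(2)               # uniform node weights
M = ot.dist(F1, F2)                         # pairwise node-feature cost

# Fused Gromov-Wasserstein distance (node features + structure, no edge features).
fgw = ot.gromov.fused_gromov_wasserstein2(M, C1, C2, p, q,
                                          loss_fun='square_loss', alpha=0.5)
print(float(fgw))
```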

  • paper_url: http://arxiv.org/abs/2309.16603
  • repo_url: None
  • paper_authors: Cemil Vahapoglu, Timothy J. O’Shea, Tamoghna Roy, Sennur Ulukus
  • For: Improving data rates, coverage, and energy efficiency in 5G wireless communication networks while adapting to dynamic conditions.
  • Methods: A deep learning approach, specifically an unsupervised deep learning framework (NNBF), for uplink receive multi-user single-input multiple-output (MU-SIMO) beamforming design.
  • Results: Outperforms the baseline methods (ZFBF and the MMSE equalizer) and scales with the number of single-antenna user equipments (UEs), whereas the baselines incur significant computational scaling costs due to matrix pseudo-inverse operations.
    Abstract The advancement of fifth generation (5G) wireless communication networks has created a greater demand for wireless resource management solutions that offer high data rates, extensive coverage, minimal latency and energy-efficient performance. Nonetheless, traditional approaches have shortcomings when it comes to computational complexity and their ability to adapt to dynamic conditions, creating a gap between theoretical analysis and the practical execution of algorithmic solutions for managing wireless resources. Deep learning-based techniques offer promising solutions for bridging this gap with their substantial representation capabilities. We propose a novel unsupervised deep learning framework, which is called NNBF, for the design of uplink receive multi-user single input multiple output (MU-SIMO) beamforming. The primary objective is to enhance the throughput by focusing on maximizing the sum-rate while also offering computationally efficient solution, in contrast to established conventional methods. We conduct experiments for several antenna configurations. Our experimental results demonstrate that NNBF exhibits superior performance compared to our baseline methods, namely, zero-forcing beamforming (ZFBF) and minimum mean square error (MMSE) equalizer. Additionally, NNBF is scalable to the number of single-antenna user equipments (UEs) while baseline methods have significant computational burden due to matrix pseudo-inverse operation.
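The unsupervised objective behind such sum-rate maximization can be written directly from the receive beamformers and channels. A hedged NumPy sketch of the negative sum-rate follows; the channel model, unit transmit powers, noise variance, and the matched-filter stand-in for the network output are all illustrative assumptions.

```python
import numpy as np

def negative_sum_rate(W, H, noise_var=1.0):
    """Negative sum-rate for uplink receive beamforming (unit transmit powers assumed).
    W: (K, N) receive beamformers, one row per user; H: (K, N) user channels.
    The SINR of user k uses w_k^H h_k as signal and w_k^H h_j (j != k) as interference."""
    K = H.shape[0]
    rates = []
    for k in range(K):
        w = W[k]
        sig = np.abs(np.vdot(w, H[k])) ** 2
        interf = sum(np.abs(np.vdot(w, H[j])) ** 2 for j in range(K) if j != k)
        sinr = sig / (interf + noise_var * np.linalg.norm(w) ** 2)
        rates.append(np.log2(1.0 + sinr))
    return -float(np.sum(rates))            # a beamforming network would minimize this

# Illustrative example: 3 single-antenna users, 4 receive antennas, Rayleigh channels.
rng = np.random.default_rng(0)
H = (rng.normal(size=(3, 4)) + 1j * rng.normal(size=(3, 4))) / np.sqrt(2)
W = H / np.linalg.norm(H, axis=1, keepdims=True)   # matched filter as a stand-in for the NN output
print(negative_sum_rate(W, H))
```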

Cross-Prediction-Powered Inference

  • paper_url: http://arxiv.org/abs/2309.16598
  • repo_url: https://github.com/tijana-zrnic/cross-ppi
  • paper_authors: Tijana Zrnic, Emmanuel J. Candès
  • for: Making data-driven decision-making more reliable when high-quality labeled data are scarce and scientific measurements are slow and expensive.
  • methods: Uses machine learning to produce large numbers of predicted labels and thereby strengthen downstream inference.
  • results: By imputing the missing labels with machine learning and applying a debiasing step to remedy prediction inaccuracies, the method yields valid inferences that are more powerful than those using only the labeled data.
    Abstract While reliable data-driven decision-making hinges on high-quality labeled data, the acquisition of quality labels often involves laborious human annotations or slow and expensive scientific measurements. Machine learning is becoming an appealing alternative as sophisticated predictive techniques are being used to quickly and cheaply produce large amounts of predicted labels; e.g., predicted protein structures are used to supplement experimentally derived structures, predictions of socioeconomic indicators from satellite imagery are used to supplement accurate survey data, and so on. Since predictions are imperfect and potentially biased, this practice brings into question the validity of downstream inferences. We introduce cross-prediction: a method for valid inference powered by machine learning. With a small labeled dataset and a large unlabeled dataset, cross-prediction imputes the missing labels via machine learning and applies a form of debiasing to remedy the prediction inaccuracies. The resulting inferences achieve the desired error probability and are more powerful than those that only leverage the labeled data. Closely related is the recent proposal of prediction-powered inference, which assumes that a good pre-trained model is already available. We show that cross-prediction is consistently more powerful than an adaptation of prediction-powered inference in which a fraction of the labeled data is split off and used to train the model. Finally, we observe that cross-prediction gives more stable conclusions than its competitors; its confidence intervals typically have significantly lower variability.
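For a simple estimand such as a population mean, the cross-prediction recipe reduces to a cross-fitted, prediction-powered point estimate: model predictions fill in the unlabeled set, and out-of-fold residuals on the labeled set correct the prediction bias. The sketch below illustrates that point estimate only (confidence intervals omitted); the model choice and data are illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def cross_prediction_mean(X_lab, Y_lab, X_unlab, n_folds=5, seed=0):
    """Cross-fitted, prediction-powered estimate of E[Y]:
    mean of model predictions on the unlabeled set, debiased by the average
    held-out labeled residual, with the model trained out-of-fold (cross-fitting)."""
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    unlab_preds, residuals = [], []
    for train_idx, hold_idx in kf.split(X_lab):
        model = Ridge().fit(X_lab[train_idx], Y_lab[train_idx])
        unlab_preds.append(model.predict(X_unlab))             # fold-wise predictions
        residuals.append(Y_lab[hold_idx] - model.predict(X_lab[hold_idx]))
    return np.mean(np.vstack(unlab_preds)) + np.mean(np.concatenate(residuals))

# Illustrative synthetic data: small labeled set, large unlabeled set.
rng = np.random.default_rng(1)
X_lab, X_unlab = rng.normal(size=(100, 3)), rng.normal(size=(10000, 3))
Y_lab = X_lab @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100)
print(cross_prediction_mean(X_lab, Y_lab, X_unlab))            # estimate of E[Y] (here approx. 0)
```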

A Design Toolbox for the Development of Collaborative Distributed Machine Learning Systems

  • paper_url: http://arxiv.org/abs/2309.16584
  • repo_url: None
  • paper_authors: David Jin, Niclas Kannengießer, Sascha Rank, Ali Sunyaev
  • for: Developers who want to design collaborative distributed machine learning (CDML) systems that meet specific use case requirements.
  • methods: The paper presents a CDML design toolbox that guides the development of CDML systems and introduces CDML system archetypes with distinct key traits.
  • results: The toolbox and archetypes provide a systematic approach to designing CDML systems that match use case requirements.
    Abstract To leverage data for the sufficient training of machine learning (ML) models from multiple parties in a confidentiality-preserving way, various collaborative distributed ML (CDML) system designs have been developed, for example, to perform assisted learning, federated learning, and split learning. CDML system designs show different traits, including high agent autonomy, ML model confidentiality, and fault tolerance. Facing a wide variety of CDML system designs with different traits, it is difficult for developers to design CDML systems with traits that match use case requirements in a targeted way. However, inappropriate CDML system designs may result in CDML systems failing their envisioned purposes. We developed a CDML design toolbox that can guide the development of CDML systems. Based on the CDML design toolbox, we present CDML system archetypes with distinct key traits that can support the design of CDML systems to meet use case requirements.
    摘要 To address this challenge, we have developed a CDML design toolbox that can guide the development of CDML systems. Based on the CDML design toolbox, we present CDML system archetypes with distinct key traits that can support the design of CDML systems to meet use case requirements. These archetypes include:1. Assisted Learning: This archetype focuses on providing assistance to ML models through distributed computing and data sharing.2. Federated Learning: This archetype emphasizes maintaining the privacy and security of data while training ML models in a distributed manner.3. Split Learning: This archetype involves dividing the training process into multiple stages, each of which is performed by a different party.By leveraging these archetypes, developers can design CDML systems that meet their specific use case requirements and ensure the confidentiality and security of the data involved.

M-OFDFT: Overcoming the Barrier of Orbital-Free Density Functional Theory for Molecular Systems Using Deep Learning

  • paper_url: http://arxiv.org/abs/2309.16578
  • repo_url: None
  • paper_authors: He Zhang, Siyuan Liu, Jiacheng You, Chang Liu, Shuxin Zheng, Ziheng Lu, Tong Wang, Nanning Zheng, Bin Shao
  • for: This work proposes a quantum chemistry method that overcomes the accuracy barrier of orbital-free density functional theory for molecular systems, improving the accuracy-efficiency trade-off in contemporary molecular research.
  • methods: The method is built on a deep-learning functional model that incorporates the essential nonlocality, made affordable by representing the density as expansion coefficients under an atomic basis.
  • results: M-OFDFT achieves accuracy comparable to Kohn-Sham DFT on a wide range of molecular systems previously out of reach for OFDFT, and extrapolates well to molecules much larger than those seen in training, advancing the accuracy-efficiency frontier in quantum chemistry.
    Abstract Orbital-free density functional theory (OFDFT) is a quantum chemistry formulation that has a lower cost scaling than the prevailing Kohn-Sham DFT, which is increasingly desired for contemporary molecular research. However, its accuracy is limited by the kinetic energy density functional, which is notoriously hard to approximate for non-periodic molecular systems. In this work, we propose M-OFDFT, an OFDFT approach capable of solving molecular systems using a deep-learning functional model. We build the essential nonlocality into the model, which is made affordable by the concise density representation as expansion coefficients under an atomic basis. With techniques to address unconventional learning challenges therein, M-OFDFT achieves a comparable accuracy with Kohn-Sham DFT on a wide range of molecules untouched by OFDFT before. More attractively, M-OFDFT extrapolates well to molecules much larger than those in training, which unleashes the appealing scaling for studying large molecules including proteins, representing an advancement of the accuracy-efficiency trade-off frontier in quantum chemistry.

Review of Machine Learning Methods for Additive Manufacturing of Functionally Graded Materials

  • paper_url: http://arxiv.org/abs/2309.16571
  • repo_url: None
  • paper_authors: Mohammad Karimzadeh, Aleksandar Vakanski, Fei Xu, Xinchang Zhang
  • for: This review examines applications of machine learning in Directed Energy Deposition (DED), in particular for the fabrication of Functionally Graded Materials (FGMs).
  • methods: The review covers machine learning techniques for optimizing DED processing parameters, improving product quality, and detecting manufacturing defects.
  • results: The surveyed work indicates that machine learning can effectively optimize DED processing parameters and improve the performance and properties of FGMs.
    Abstract Additive manufacturing has revolutionized the manufacturing of complex parts by enabling direct material joining and offers several advantages such as cost-effective manufacturing of complex parts, reducing manufacturing waste, and opening new possibilities for manufacturing automation. One group of materials for which additive manufacturing holds great potential for enhancing component performance and properties is Functionally Graded Materials (FGMs). FGMs are advanced composite materials that exhibit smoothly varying properties making them desirable for applications in aerospace, automobile, biomedical, and defense industries. Such composition differs from traditional composite materials, since the location-dependent composition changes gradually in FGMs, leading to enhanced properties. Recently, machine learning techniques have emerged as a promising means for fabrication of FGMs through optimizing processing parameters, improving product quality, and detecting manufacturing defects. This paper first provides a brief literature review of works related to FGM fabrication, followed by reviewing works on employing machine learning in additive manufacturing, Afterward, we provide an overview of published works in the literature related to the application of machine learning methods in Directed Energy Deposition and for fabrication of FGMs.

CRIMED: Lower and Upper Bounds on Regret for Bandits with Unbounded Stochastic Corruption

  • paper_url: http://arxiv.org/abs/2309.16563
  • repo_url: None
  • paper_authors: Shubhada Agrawal, Timothée Mathieu, Debabrota Basu, Odalric-Ambrym Maillard
  • for: investigate the regret-minimization problem in a multi-armed bandit setting with arbitrary corruptions
  • methods: introduce CRIMED algorithm, an asymptotically-optimal algorithm that achieves the exact lower bound on regret for bandits with Gaussian distributions with known variance, and provide a finite-sample analysis of CRIMED’s regret performance
  • results: establish a problem-dependent lower bound on regret, show that CRIMED can effectively handle corruptions with $\varepsilon$ values as high as $\frac{1}{2}$, and develop a tight concentration result for medians in the presence of arbitrary corruptions.
    Abstract We investigate the regret-minimisation problem in a multi-armed bandit setting with arbitrary corruptions. Similar to the classical setup, the agent receives rewards generated independently from the distribution of the arm chosen at each time. However, these rewards are not directly observed. Instead, with a fixed $\varepsilon\in (0,\frac{1}{2})$, the agent observes a sample from the chosen arm's distribution with probability $1-\varepsilon$, or from an arbitrary corruption distribution with probability $\varepsilon$. Importantly, we impose no assumptions on these corruption distributions, which can be unbounded. In this setting, accommodating potentially unbounded corruptions, we establish a problem-dependent lower bound on regret for a given family of arm distributions. We introduce CRIMED, an asymptotically-optimal algorithm that achieves the exact lower bound on regret for bandits with Gaussian distributions with known variance. Additionally, we provide a finite-sample analysis of CRIMED's regret performance. Notably, CRIMED can effectively handle corruptions with $\varepsilon$ values as high as $\frac{1}{2}$. Furthermore, we develop a tight concentration result for medians in the presence of arbitrary corruptions, even with $\varepsilon$ values up to $\frac{1}{2}$, which may be of independent interest. We also discuss an extension of the algorithm for handling misspecification in Gaussian model.

Implicit Gaussian process representation of vector fields over arbitrary latent manifolds

  • paper_url: http://arxiv.org/abs/2309.16746
  • repo_url: https://github.com/agosztolai/rvgp
  • paper_authors: Robert L. Peach, Matteo Vinao-Carl, Nir Grossman, Michael David, Emma Mallas, David Sharp, Paresh A. Malhotra, Pierre Vandergheynst, Adam Gosztolai
  • for: Learning unknown vector-valued functions over latent manifolds and quantifying uncertainty over the data space.
  • methods: Extends Gaussian processes to learn vector signals using positional encoding with eigenfunctions of the connection Laplacian associated with the tangent bundle.
  • results: The method has global regularity over the manifold, enabling super-resolution and inpainting of vector fields while preserving singularities; it is used to reconstruct high-density neural dynamics from low-density EEG recordings in healthy individuals and Alzheimer's patients, where vector-field singularities serve as disease markers.
    Abstract Gaussian processes (GPs) are popular nonparametric statistical models for learning unknown functions and quantifying the spatiotemporal uncertainty in data. Recent works have extended GPs to model scalar and vector quantities distributed over non-Euclidean domains, including smooth manifolds appearing in numerous fields such as computer vision, dynamical systems, and neuroscience. However, these approaches assume that the manifold underlying the data is known, limiting their practical utility. We introduce RVGP, a generalisation of GPs for learning vector signals over latent Riemannian manifolds. Our method uses positional encoding with eigenfunctions of the connection Laplacian, associated with the tangent bundle, readily derived from common graph-based approximation of data. We demonstrate that RVGP possesses global regularity over the manifold, which allows it to super-resolve and inpaint vector fields while preserving singularities. Furthermore, we use RVGP to reconstruct high-density neural dynamics derived from low-density EEG recordings in healthy individuals and Alzheimer's patients. We show that vector field singularities are important disease markers and that their reconstruction leads to a comparable classification accuracy of disease states to high-density recordings. Thus, our method overcomes a significant practical limitation in experimental and clinical applications.

Correcting for heterogeneity in real-time epidemiological indicators

  • paper_url: http://arxiv.org/abs/2309.16546
  • repo_url: None
  • paper_authors: Aaron Rumack, Roni Rosenfeld, F. William Townes
  • for: This paper addresses spatial and temporal heterogeneity in auxiliary data sources to improve the accuracy of epidemiological surveillance.
  • methods: The method uses a "guiding" signal to correct spatial and temporal biases and produce a more reliable signal for modeling and forecasting, assuming the heterogeneity can be approximated by a low-rank matrix and that the temporal heterogeneity is smooth over time.
  • results: Applying the method reduces heterogeneity in auxiliary data sources, greatly increasing their utility for modeling and forecasting epidemics; in the absence of ground truth, maps and plots are used to argue that the heterogeneity is indeed reduced.
    Abstract Auxiliary data sources have become increasingly important in epidemiological surveillance, as they are often available at a finer spatial and temporal resolution, larger coverage, and lower latency than traditional surveillance signals. We describe the problem of spatial and temporal heterogeneity in these signals derived from these data sources, where spatial and/or temporal biases are present. We present a method to use a ``guiding'' signal to correct for these biases and produce a more reliable signal that can be used for modeling and forecasting. The method assumes that the heterogeneity can be approximated by a low-rank matrix and that the temporal heterogeneity is smooth over time. We also present a hyperparameter selection algorithm to choose the parameters representing the matrix rank and degree of temporal smoothness of the corrections. In the absence of ground truth, we use maps and plots to argue that this method does indeed reduce heterogeneity. Reducing heterogeneity from auxiliary data sources greatly increases their utility in modeling and forecasting epidemics.
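One plausible reading of the low-rank assumption is sketched below: the discrepancy between the auxiliary signal and the guiding signal (locations by time) is approximated by a rank-r matrix via truncated SVD and subtracted off. This is only an illustration of the modeling assumption, not the paper's estimator, which additionally enforces temporal smoothness and selects the rank via a hyperparameter selection algorithm.

```python
import numpy as np

def low_rank_correction(aux, guide, rank=2):
    """aux, guide: (n_locations, n_times) signal matrices.
    Approximate the bias aux - guide by a rank-`rank` matrix and remove it."""
    resid = aux - guide
    U, s, Vt = np.linalg.svd(resid, full_matrices=False)
    bias_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank]        # truncated-SVD bias estimate
    return aux - bias_hat                                   # corrected auxiliary signal

# Illustrative example: a smooth truth plus a rank-1 spatial bias.
rng = np.random.default_rng(0)
truth = np.sin(np.linspace(0, 6, 200))[None, :] * np.ones((30, 1))
bias = np.outer(rng.normal(size=30), np.linspace(0.5, 1.5, 200))
corrected = low_rank_correction(truth + bias, truth, rank=1)
print(np.abs(corrected - truth).mean())                     # should be near zero
```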

Efficient Training of One Class Classification-SVMs

  • paper_url: http://arxiv.org/abs/2309.16745
  • repo_url: https://github.com/himanshub1007/Alzhimers-Disease-Prediction-Using-Deep-learning
  • paper_authors: Isaac Amornortey Yowetu, Nana Kena Frempong
  • for: This study examines an efficient training method for one-class classification. Common binary classification requires both positive and negative examples in the training data, a criterion not met in many domains where only one class of examples exists, so a classifier must be learned from solely positive input.
  • methods: The approach trains a dual soft-margin one-class SVM with the Augmented Lagrangian Fast Projected Gradient Method (AL-FPGM), which requires only first derivatives; for the dual soft-margin OCC-SVM this amounts mainly to matrix-vector products, so it can complement existing quadratic programming solvers for training large SVMs.
  • results: Extensive validation on real-world datasets shows that the strategy obtains statistically significant results.
    Abstract This study examines the use of a highly effective training method to conduct one-class classification. The existence of both positive and negative examples in the training data is necessary to develop an effective classifier in common binary classification scenarios. Unfortunately, this criteria is not met in many domains. Here, there is just one class of examples. Classification algorithms that learn from solely positive input have been created to deal with this setting. In this paper, an effective algorithm for dual soft-margin one-class SVM training is presented. Our approach makes use of the Augmented Lagrangian (AL-FPGM), a variant of the Fast Projected Gradient Method. The FPGM requires only first derivatives, which for the dual soft margin OCC-SVM means computing mainly a matrix-vector product. Therefore, AL-FPGM, being computationally inexpensive, may complement existing quadratic programming solvers for training large SVMs. We extensively validate our approach over real-world datasets and demonstrate that our strategy obtains statistically significant results.

Generating Personalized Insulin Treatments Strategies with Deep Conditional Generative Time Series Models

  • paper_url: http://arxiv.org/abs/2309.16521
  • repo_url: None
  • paper_authors: Manuel Schürch, Xiang Li, Ahmed Allam, Giulia Rathmes, Amina Mollaysa, Claudia Cavelti-Weder, Michael Krauthammer
  • for: This paper develops a new framework that lets clinicians generate treatment strategies tailored to a patient's individual history.
  • methods: The framework combines deep conditional generative time series models with decision theory to generate realistic, personalized treatment strategies.
  • results: The framework is demonstrated by generating personalized insulin treatment strategies and blood glucose predictions for hospitalized diabetes patients.
    Abstract We propose a novel framework that combines deep generative time series models with decision theory for generating personalized treatment strategies. It leverages historical patient trajectory data to jointly learn the generation of realistic personalized treatment and future outcome trajectories through deep generative time series models. In particular, our framework enables the generation of novel multivariate treatment strategies tailored to the personalized patient history and trained for optimal expected future outcomes based on conditional expected utility maximization. We demonstrate our framework by generating personalized insulin treatment strategies and blood glucose predictions for hospitalized diabetes patients, showcasing the potential of our approach for generating improved personalized treatment strategies. Keywords: deep generative model, probabilistic decision support, personalized treatment generation, insulin and blood glucose prediction

AtomSurf : Surface Representation for Learning on Protein Structures

  • paper_url: http://arxiv.org/abs/2309.16519
  • repo_url: https://github.com/vincentx15/atom2d
  • paper_authors: Vincent Mallet, Souhaib Attaiki, Maks Ovsjanikov
  • for: Advances in Cryo-EM and protein structure prediction have made large-scale protein structures accessible, enabling machine learning-based functional annotation; this work studies how best to learn from such structures.
  • methods: The study uses geometric deep learning, representing protein structures as $\textit{3D mesh surfaces}$ and integrating them into an established representation benchmark.
  • results: The surface representation alone is not competitive, but combining it with graph-based methods yields state-of-the-art results. Code and data are available at https://github.com/Vincentx15/atom2D.
    Abstract Recent advancements in Cryo-EM and protein structure prediction algorithms have made large-scale protein structures accessible, paving the way for machine learning-based functional annotations.The field of geometric deep learning focuses on creating methods working on geometric data. An essential aspect of learning from protein structures is representing these structures as a geometric object (be it a grid, graph, or surface) and applying a learning method tailored to this representation. The performance of a given approach will then depend on both the representation and its corresponding learning method. In this paper, we investigate representing proteins as $\textit{3D mesh surfaces}$ and incorporate them into an established representation benchmark. Our first finding is that despite promising preliminary results, the surface representation alone does not seem competitive with 3D grids. Building on this, we introduce a synergistic approach, combining surface representations with graph-based methods, resulting in a general framework that incorporates both representations in learning. We show that using this combination, we are able to obtain state-of-the-art results across $\textit{all tested tasks}$. Our code and data can be found online: https://github.com/Vincentx15/atom2D .

Towards Poisoning Fair Representations

  • paper_url: http://arxiv.org/abs/2309.16487
  • repo_url: None
  • paper_authors: Tianci Liu, Haoyu Wang, Feijie Wu, Hengtong Zhang, Pan Li, Lu Su, Jing Gao
  • for: Mitigating model prediction bias against certain demographic subgroups such as elder and female.
  • methods: Data poisoning attack on fair representation learning (FRL) models, which induce the model to output unfair representations that contain as much demographic information as possible by injecting carefully crafted poisoning samples into the training data.
  • results: Superiority of the proposed attack on benchmark fairness datasets and state-of-the-art fair representation learning models, as well as a theoretical analysis on the needed number of poisoning samples to defend against the attack.
    Abstract Fair machine learning seeks to mitigate model prediction bias against certain demographic subgroups such as elder and female. Recently, fair representation learning (FRL) trained by deep neural networks has demonstrated superior performance, whereby representations containing no demographic information are inferred from the data and then used as the input to classification or other downstream tasks. Despite the development of FRL methods, their vulnerability under data poisoning attack, a popular protocol to benchmark model robustness under adversarial scenarios, is under-explored. Data poisoning attacks have been developed for classical fair machine learning methods which incorporate fairness constraints into shallow-model classifiers. Nonetheless, these attacks fall short in FRL due to notably different fairness goals and model architectures. This work proposes the first data poisoning framework attacking FRL. We induce the model to output unfair representations that contain as much demographic information as possible by injecting carefully crafted poisoning samples into the training data. This attack entails a prohibitive bilevel optimization, wherefore an effective approximated solution is proposed. A theoretical analysis on the needed number of poisoning samples is derived and sheds light on defending against the attack. Experiments on benchmark fairness datasets and state-of-the-art fair representation learning models demonstrate the superiority of our attack.

Predicting Long-term Renal Impairment in Post-COVID-19 Patients with Machine Learning Algorithms

  • paper_url: http://arxiv.org/abs/2309.16744
  • repo_url: None
  • paper_authors: Maitham G. Yousif, Hector J. Castro, John Martin, Hayder A. Albaqer, Fadhil G. Al-Amran, Habeeb W. Shubber, Salman Rawaf
  • for: This study aims to predict the risk of long-term renal impairment in post-COVID-19 patients, enabling early identification of and intervention for at-risk patients and thereby improving clinical outcomes.
  • methods: The study applies advanced machine learning algorithms, including regression, decision tree, and random forest models, to data from 821 post-COVID-19 patients.
  • results: Factors such as age, blood pressure, blood glucose, and obesity are associated with long-term renal impairment, and the machine learning models can predict which patients are likely to develop it.
    Abstract The COVID-19 pandemic has had far-reaching implications for global public health. As we continue to grapple with its consequences, it becomes increasingly clear that post-COVID-19 complications are a significant concern. Among these complications, renal impairment has garnered particular attention due to its potential long-term health impacts. This study, conducted with a cohort of 821 post-COVID-19 patients from diverse regions of Iraq across the years 2021, 2022, and 2023, endeavors to predict the risk of long-term renal impairment using advanced machine learning algorithms. Our findings have the potential to revolutionize post-COVID-19 patient care by enabling early identification and intervention for those at risk of renal impairment, ultimately improving clinical outcomes. This research encompasses comprehensive data collection and preprocessing, feature selection, and the development of predictive models using various machine learning algorithms. The study's objectives are to assess the incidence of long-term renal impairment in post-COVID-19 patients, identify associated risk factors, create predictive models, and evaluate their accuracy. We anticipate that our machine learning models, drawing from a rich dataset, will provide valuable insights into the risk of renal impairment, ultimately enhancing patient care and quality of life. In conclusion, the research presented herein offers a critical contribution to the field of post-COVID-19 care. By harnessing the power of machine learning, we aim to predict long-term renal impairment risk accurately. These predictions have the potential to inform healthcare professionals, enabling them to take proactive measures and provide targeted interventions for post-COVID-19 patients at risk of renal complications, thus minimizing the impact of this serious health concern.

High-dimensional robust regression under heavy-tailed data: Asymptotics and Universality

  • paper_url: http://arxiv.org/abs/2309.16476
  • repo_url: None
  • paper_authors: Urte Adomaityte, Leonardo Defilippis, Bruno Loureiro, Gabriele Sicuro
  • For: This paper studies the performance of high-dimensional robust regression estimators when both the covariates and the responses are contaminated by heavy-tailed noise.
  • Methods: The paper analyzes M-estimators and the ridge regression estimator, providing a sharp asymptotic characterization of their behavior in the high-dimensional regime.
  • Results: Although the optimally tuned Huber loss is consistent, it is suboptimal in high dimensions under heavy-tailed noise, and a curious transition in its location parameter emerges as a function of sample complexity and contamination; the paper also derives decay rates for the excess risk of ridge regression and shows that the results extend to richer models and data distributions.
    Abstract We investigate the high-dimensional properties of robust regression estimators in the presence of heavy-tailed contamination of both the covariates and response functions. In particular, we provide a sharp asymptotic characterisation of M-estimators trained on a family of elliptical covariate and noise data distributions including cases where second and higher moments do not exist. We show that, despite being consistent, the Huber loss with optimally tuned location parameter $\delta$ is suboptimal in the high-dimensional regime in the presence of heavy-tailed noise, highlighting the necessity of further regularisation to achieve optimal performance. This result also uncovers the existence of a curious transition in $\delta$ as a function of the sample complexity and contamination. Moreover, we derive the decay rates for the excess risk of ridge regression. We show that, while it is both optimal and universal for noise distributions with finite second moment, its decay rate can be considerably faster when the covariates' second moment does not exist. Finally, we show that our formulas readily generalise to a richer family of models and data distributions, such as generalised linear estimation with arbitrary convex regularisation trained on mixture models.
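For reference, the Huber loss with location parameter $\delta$ discussed above is, in standard notation (scaling conventions may differ from the paper's),

```latex
\ell_\delta(t) =
\begin{cases}
\dfrac{t^2}{2}, & |t| \le \delta,\\[4pt]
\delta\,|t| - \dfrac{\delta^2}{2}, & |t| > \delta,
\end{cases}
\qquad
\hat{\boldsymbol{\theta}} \in \arg\min_{\boldsymbol{\theta}}
\sum_{i=1}^{n} \ell_\delta\!\big(y_i - \mathbf{x}_i^{\top}\boldsymbol{\theta}\big)
+ \frac{\lambda}{2}\,\|\boldsymbol{\theta}\|_2^2 ,
```

where the ridge penalty on the right corresponds to the further regularization the abstract argues is needed in the heavy-tailed, high-dimensional regime.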

Compositional Program Generation for Systematic Generalization

  • paper_url: http://arxiv.org/abs/2309.16467
  • repo_url: None
  • paper_authors: Tim Klinger, Luke Liu, Soham Dan, Maxwell Crouse, Parikshit Ram, Alexander Gray
  • for: The paper is written to explore the ability of a neuro-symbolic architecture called the Compositional Program Generator (CPG) to generalize on new concepts in a few-shot manner and to achieve perfect generalization on sequence-to-sequence language tasks.
  • methods: The paper uses a grammar of the input domain and a parser to generate a type hierarchy, and learns parameters for the semantic modules incrementally.
  • results: The paper achieves perfect generalization on the SCAN and COGS benchmarks in both standard and extreme few-shot settings.
    Abstract Compositional generalization is a key ability of humans that enables us to learn new concepts from only a handful examples. Machine learning models, including the now ubiquitous transformers, struggle to generalize in this way, and typically require thousands of examples of a concept during training in order to generalize meaningfully. This difference in ability between humans and artificial neural architectures, motivates this study on a neuro-symbolic architecture called the Compositional Program Generator (CPG). CPG has three key features: modularity, type abstraction, and recursive composition, that enable it to generalize both systematically to new concepts in a few-shot manner, as well as productively by length on various sequence-to-sequence language tasks. For each input, CPG uses a grammar of the input domain and a parser to generate a type hierarchy in which each grammar rule is assigned its own unique semantic module, a probabilistic copy or substitution program. Instances with the same hierarchy are processed with the same composed program, while those with different hierarchies may be processed with different programs. CPG learns parameters for the semantic modules and is able to learn the semantics for new types incrementally. Given a context-free grammar of the input language and a dictionary mapping each word in the source language to its interpretation in the output language, CPG can achieve perfect generalization on the SCAN and COGS benchmarks, in both standard and extreme few-shot settings.

A Metaheuristic for Amortized Search in High-Dimensional Parameter Spaces

  • paper_url: http://arxiv.org/abs/2309.16465
  • repo_url: https://github.com/neurolife77/drffit_paper
  • paper_authors: Dominic Boutet, Sylvain Baillet
  • For: 这篇论文的目的是提出一种新的元启发式(metaheuristic)方法,用于 dynamical models of (bio)physical systems 的参数推断问题,以应对难以求导、高维空间和非线性模型函数等挑战。
  • Methods: DR-FFIT 实现了一种有效的采样策略,通过 feature-informed transformations 来降低维度,并使用人工神经网络获得模型特征的可导代理,从而支持高维空间中的无梯度参数搜索;这与仅考虑参数统计分布、不给出点估计的 Bayesian inference 方法形成对比。
  • Results: 测试数据显示,DR-FFIT 可以在 random-search 和 simulated-annealing 等 metaheuristics 的基础上提升性能并改善模型拟合优度,同时将计算成本保持在合理范围内。
    Abstract Parameter inference for dynamical models of (bio)physical systems remains a challenging problem. Intractable gradients, high-dimensional spaces, and non-linear model functions are typically problematic without large computational budgets. A recent body of work in that area has focused on Bayesian inference methods, which consider parameters under their statistical distributions and therefore, do not derive point estimates of optimal parameter values. Here we propose a new metaheuristic that drives dimensionality reductions from feature-informed transformations (DR-FFIT) to address these bottlenecks. DR-FFIT implements an efficient sampling strategy that facilitates a gradient-free parameter search in high-dimensional spaces. We use artificial neural networks to obtain differentiable proxies for the model's features of interest. The resulting gradients enable the estimation of a local active subspace of the model within a defined sampling region. This approach enables efficient dimensionality reductions of highly non-linear search spaces at a low computational cost. Our test data show that DR-FFIT boosts the performances of random-search and simulated-annealing against well-established metaheuristics, and improves the goodness-of-fit of the model, all within contained run-time costs.
    摘要 动力学(生物)物理系统模型的参数推断仍然是一个挑战。难以处理的梯度、高维空间和非线性模型函数通常会带来困难,除非有大量计算预算。近些年,这个领域的研究多集中在 Bayesian 推断方法上,它们考虑参数的统计分布,因此不会给出最优参数值的点估计。我们提出了一种新的 metaheuristic,通过特征信息变换驱动降维(DR-FFIT)。DR-FFIT 实现了一种高效的采样策略,可以在高维空间中进行无梯度的参数搜索。我们使用人工神经网络来获得模型特征的可导代理,得到的梯度可用于估计模型在采样区域内的局部活跃子空间。这种方法能够以较低的计算成本高效地降低高度非线性搜索空间的维度。我们的测试数据显示,DR-FFIT 可以在 random-search 和 simulated-annealing 等已有 metaheuristics 的基础上提升性能,并改善模型的拟合优度,且运行时间成本可控。
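
The core DR-FFIT trick, using differentiable proxies of the model's features to estimate a local active subspace for a gradient-free search, can be illustrated with a short numpy sketch. The toy feature function and its finite-difference gradient below stand in for a trained neural-network proxy, and the 99% variance threshold is an arbitrary choice, so this is only a sketch of the idea, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 20                      # dimension of the parameter space

def feature(theta):
    """Toy stand-in for a differentiable NN proxy of a model feature."""
    return np.sin(theta[0]) + 0.5 * theta[1] ** 2 + 0.01 * theta[2:].sum()

def grad(theta, eps=1e-5):
    """Finite-difference gradient (a trained proxy would give this exactly)."""
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta); e[i] = eps
        g[i] = (feature(theta + e) - feature(theta - e)) / (2 * eps)
    return g

# Sample the defined region, collect gradients, and take the dominant
# directions of the gradient matrix as the local active subspace.
thetas = rng.uniform(-1.0, 1.0, size=(200, D))
G = np.stack([grad(t) for t in thetas])          # (n_samples, D)
_, s, Vt = np.linalg.svd(G, full_matrices=False)
k = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), 0.99)) + 1
active_basis = Vt[:k]                            # (k, D) reduced search directions

print(f"active subspace dimension: {k}")
# A metaheuristic (random search, simulated annealing, ...) can now propose
# moves as theta + active_basis.T @ z with a low-dimensional z.
```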

Universal Sleep Decoder: Aligning awake and sleep neural representation across subjects

  • paper_url: http://arxiv.org/abs/2309.16457
  • repo_url: None
  • paper_authors: Hui Zheng, Zhongtao Chen, Haiteng Wang, Jianyang Zhou, Lin Zheng, Yunzhe Liu
  • for: 这项研究的目的是解码sleep中的记忆内容。
  • methods: 这项研究使用了一种新的认知神经科学实验和一个完整的电enzephalography(EEG)数据集,并开发了一个名为“Universal Sleep Decoder”(USD)的模型,以实现将 neural representations在睡眠和醒目之间进行对应。
  • results: 研究实现了Up to 16.6%的零尝试预测精度,与使用个体睡眠数据的表现相当。 fine-tuning USD on test subjects可以提高预测精度至25.9%,与基线的6.7%预测精度有显著差异。
    Abstract Decoding memory content from brain activity during sleep has long been a goal in neuroscience. While spontaneous reactivation of memories during sleep in rodents is known to support memory consolidation and offline learning, capturing memory replay in humans is challenging due to the absence of well-annotated sleep datasets and the substantial differences in neural patterns between wakefulness and sleep. To address these challenges, we designed a novel cognitive neuroscience experiment and collected a comprehensive, well-annotated electroencephalography (EEG) dataset from 52 subjects during both wakefulness and sleep. Leveraging this benchmark dataset, we developed the Universal Sleep Decoder (USD) to align neural representations between wakefulness and sleep across subjects. Our model achieves up to 16.6% top-1 zero-shot accuracy on unseen subjects, comparable to decoding performances using individual sleep data. Furthermore, fine-tuning USD on test subjects enhances decoding accuracy to 25.9% top-1 accuracy, a substantial improvement over the baseline chance of 6.7%. Model comparison and ablation analyses reveal that our design choices, including the use of (i) an additional contrastive objective to integrate awake and sleep neural signals and (ii) the pretrain-finetune paradigm to incorporate different subjects, significantly contribute to these performances. Collectively, our findings and methodologies represent a significant advancement in the field of sleep decoding.
    摘要 从睡眠期间的大脑活动中解码记忆内容一直是神经科学的目标。虽然已知啮齿动物在睡眠中对记忆的自发重激活有助于记忆巩固和离线学习,但由于缺乏标注完善的睡眠数据集,以及清醒与睡眠状态下神经模式的显著差异,在人类中捕捉记忆重放非常困难。为了解决这些挑战,我们设计了一个新的认知神经科学实验,并采集了 52 名被试在清醒和睡眠状态下的全面、标注完善的脑电图(EEG)数据集。利用这个基准数据集,我们开发了 Universal Sleep Decoder (USD),用于在不同被试之间对齐清醒与睡眠的神经表征。我们的模型在未见过的被试上可以达到 16.6% 的 top-1 零样本精度,与使用个体睡眠数据的解码性能相当。此外,在测试被试上进行微调可以将解码精度提高到 25.9% 的 top-1 精度,显著高于 6.7% 的随机基线。模型比较和消融分析表明,我们的设计选择,包括 (i) 使用额外的对比目标来整合清醒和睡眠的神经信号,以及 (ii) 使用预训练-微调范式来纳入不同被试,对这些性能做出了重要贡献。总体而言,我们的发现和方法代表了睡眠解码领域的一个重要进展。
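
The "additional contrastive objective to integrate awake and sleep neural signals" can be illustrated with a generic symmetric InfoNCE loss, where the wake and sleep embeddings of the same item form the positive pair. The embedding dimensions and temperature below are assumptions; this is not the USD training objective itself.

```python
import numpy as np

def info_nce(wake_emb, sleep_emb, temperature=0.1):
    """Symmetric InfoNCE loss: row i of wake_emb and row i of sleep_emb are
    embeddings of the same item recorded awake vs. asleep (positive pair);
    all other rows serve as negatives."""
    w = wake_emb / np.linalg.norm(wake_emb, axis=1, keepdims=True)
    s = sleep_emb / np.linalg.norm(sleep_emb, axis=1, keepdims=True)
    logits = w @ s.T / temperature                # (n, n) similarity matrix
    labels = np.arange(len(w))
    log_softmax_rows = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_softmax_cols = logits - np.log(np.exp(logits).sum(axis=0, keepdims=True))
    loss = -(log_softmax_rows[labels, labels].mean()
             + log_softmax_cols[labels, labels].mean()) / 2
    return loss

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shared = rng.normal(size=(8, 16))                 # shared item content
    wake = shared + 0.1 * rng.normal(size=(8, 16))    # awake-state embedding
    sleep = shared + 0.1 * rng.normal(size=(8, 16))   # sleep-state embedding
    print(f"aligned pairs loss:  {info_nce(wake, sleep):.3f}")
    print(f"shuffled pairs loss: {info_nce(wake, rng.permutation(sleep)):.3f}")
```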

Resisting Backdoor Attacks in Federated Learning via Bidirectional Elections and Individual Perspective

  • paper_url: http://arxiv.org/abs/2309.16456
  • repo_url: None
  • paper_authors: Zhen Qin, Feiyi Chen, Chen Zhi, Xueqiang Yan, Shuiguang Deng
  • for: 本研究旨在防御 Federated Learning (FL) 中的后门攻击,在不损害模型精度的前提下排除恶意更新。
  • methods: 本文提出了一种新的 anti-backdoor FL 框架,即 Snowball,基于竞选机制。它包括 bottom-up 选举和 top-down 选举两部分,通过选举和拒绝恶意更新来防止后门攻击。
  • results: 对于五个真实的 dataset,Snowball 与当前的防御技术进行比较,显示其在防止后门攻击方面更高效,并且对全球模型精度的影响较小。
    Abstract Existing approaches defend against backdoor attacks in federated learning (FL) mainly through a) mitigating the impact of infected models, or b) excluding infected models. The former negatively impacts model accuracy, while the latter usually relies on globally clear boundaries between benign and infected model updates. However, model updates are easy to be mixed and scattered throughout in reality due to the diverse distributions of local data. This work focuses on excluding infected models in FL. Unlike previous perspectives from a global view, we propose Snowball, a novel anti-backdoor FL framework through bidirectional elections from an individual perspective inspired by one principle deduced by us and two principles in FL and deep learning. It is characterized by a) bottom-up election, where each candidate model update votes to several peer ones such that a few model updates are elected as selectees for aggregation; and b) top-down election, where selectees progressively enlarge themselves through picking up from the candidates. We compare Snowball with state-of-the-art defenses to backdoor attacks in FL on five real-world datasets, demonstrating its superior resistance to backdoor attacks and slight impact on the accuracy of the global model.
    摘要 现有方法在 federated learning (FL) 中防御后门攻击主要通过:一、减轻受感染模型的影响,或二、排除受感染模型。前者会损害模型精度,而后者通常依赖于良性更新与受感染更新之间全局清晰的边界。然而,由于本地数据分布各异,模型更新在实际中容易混杂和分散,这使得上述方法具有局限性。本工作关注在 FL 中排除受感染模型。与以往的全局视角不同,我们提出了 Snowball,一个新的抗后门 FL 框架,它从个体视角出发,通过双向选举实现,其灵感来自我们推导出的一个原理以及 FL 和深度学习中的两个原理。其特点包括:a)自底向上选举,每个候选模型更新向若干同伴更新投票,从而选出少量模型更新作为被选者参与聚合;b)自顶向下选举,被选者通过从候选更新中继续吸纳成员而逐步扩大。我们在五个真实数据集上与最新的后门防御技术进行比较,结果表明 Snowball 对后门攻击具有更强的抵抗力,且对全局模型精度的影响很小。
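
A stripped-down sketch of the bottom-up election follows: each candidate update votes for its most similar peers, and the most-voted updates become the selectees used for aggregation. The cosine-similarity voting rule, the number of votes per update, and the toy backdoored updates are illustrative assumptions rather than the paper's exact election procedure.

```python
import numpy as np

def bottom_up_election(updates, votes_per_update=3, n_selectees=5):
    """updates: (n_clients, n_params) flattened model updates.
    Each update votes for its `votes_per_update` nearest peers (cosine
    similarity); the updates with the most votes become selectees."""
    u = updates / np.linalg.norm(updates, axis=1, keepdims=True)
    sim = u @ u.T
    np.fill_diagonal(sim, -np.inf)                 # no self-votes
    votes = np.zeros(len(updates), dtype=int)
    for i in range(len(updates)):
        for j in np.argsort(sim[i])[-votes_per_update:]:
            votes[j] += 1
    selectees = np.argsort(votes)[-n_selectees:]
    return selectees, votes

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    benign = rng.normal(0, 1, size=(18, 50)) + np.ones(50)     # cluster around one direction
    backdoored = rng.normal(0, 1, size=(2, 50)) - 4 * np.ones(50)
    updates = np.vstack([benign, backdoored])
    selectees, votes = bottom_up_election(updates)
    print("selected client indices:", sorted(selectees.tolist()))
    print("votes per client:       ", votes)
    aggregated = updates[selectees].mean(axis=0)   # aggregate only the selectees
```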

On the Trade-offs between Adversarial Robustness and Actionable Explanations

  • paper_url: http://arxiv.org/abs/2309.16452
  • repo_url: None
  • paper_authors: Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju
  • for: 这 paper 的目的是研究机器学习模型在不同情况下的可解释性和抗击攻击性的关系。
  • methods: 这 paper 使用了现有的 state-of-the-art 算法来生成可解释性和抗击攻击性的模型。
  • results: 这 paper 的结果表明,逐渐增加模型的抗击攻击性会导致可解释性减退,而且在某些情况下,可能导致可解释性和抗击攻击性之间存在负相关性。
    Abstract As machine learning models are increasingly being employed in various high-stakes settings, it becomes important to ensure that predictions of these models are not only adversarially robust, but also readily explainable to relevant stakeholders. However, it is unclear if these two notions can be simultaneously achieved or if there exist trade-offs between them. In this work, we make one of the first attempts at studying the impact of adversarially robust models on actionable explanations which provide end users with a means for recourse. We theoretically and empirically analyze the cost (ease of implementation) and validity (probability of obtaining a positive model prediction) of recourses output by state-of-the-art algorithms when the underlying models are adversarially robust vs. non-robust. More specifically, we derive theoretical bounds on the differences between the cost and the validity of the recourses generated by state-of-the-art algorithms for adversarially robust vs. non-robust linear and non-linear models. Our empirical results with multiple real-world datasets validate our theoretical results and show the impact of varying degrees of model robustness on the cost and validity of the resulting recourses. Our analyses demonstrate that adversarially robust models significantly increase the cost and reduce the validity of the resulting recourses, thus shedding light on the inherent trade-offs between adversarial robustness and actionable explanations
    摘要 machine learning模型在不同的高风险场景中越来越常被应用,因此确保这些模型的预测结果不仅抗击性强,而且可以快速地解释给关键参与者变得非常重要。然而,是否可以同时实现这两个目标,或者存在这两个目标之间的负担,这是一个未知的问题。在这种情况下,我们在研究抗击性模型对可行的解释的影响,这些解释可以提供结束用户一种纠正的机会。我们在理论和实际上分析了使用当前的算法生成的纠正措施的成本(实施的容易度)和有效性(获得正确模型预测结果的概率)。我们在理论上 deriv了对抗性模型和非抗击性模型的线性和非线性模型的分析结果,并通过多个实际数据集的实验 validate我们的理论结果,以验证模型的抗击性对纠正措施的影响。我们的分析结果表明,抗击性模型会增加纠正措施的成本和降低纠正措施的有效性,从而揭示了这两个目标之间的内在负担。
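
A minimal sketch of how recourse cost and validity can be measured is shown below, using a generic gradient-style recourse on a logistic score; the "robust" model is simulated simply by rescaling the weights to mimic the flatter score functions robust training tends to produce, so both models and the recourse routine are illustrative stand-ins for the state-of-the-art recourse algorithms studied in the paper.

```python
import numpy as np

def recourse(x, w, b, step=0.05, max_iter=500):
    """Gradient-based recourse for a logistic model f(x)=sigmoid(w.x+b):
    move x along the score gradient until the prediction flips."""
    x_cf = x.copy()
    for _ in range(max_iter):
        if w @ x_cf + b > 0:                      # positive prediction reached
            break
        x_cf = x_cf + step * w / np.linalg.norm(w)
    cost = np.abs(x_cf - x).sum()                 # ease of implementation (L1)
    validity = bool(w @ x_cf + b > 0)             # did we obtain a positive label?
    return cost, validity

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=10)
    w_standard = rng.normal(size=10)              # stand-in "non-robust" model
    w_robust = 0.3 * w_standard                   # stand-in: rescaled weights mimic a flatter, robust-like score
    b = -2.0
    for name, w in [("standard", w_standard), ("robust", w_robust)]:
        cost, valid = recourse(x, w, b)
        print(f"{name:8s}  cost={cost:6.2f}  valid={valid}")
```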

A parsimonious, computationally efficient machine learning method for spatial regression

  • paper_url: http://arxiv.org/abs/2309.16448
  • repo_url: None
  • paper_authors: Milan Žukovič, Dionissios T. Hristopulos
  • For: The paper is written for researchers and practitioners who are interested in spatial/temporal regression and machine learning methods.* Methods: The paper introduces the modified planar rotator method (MPRS), a non-parametric machine learning method that incorporates spatial or temporal correlations via short-range, distance-dependent “interactions” without assuming a specific form for the underlying probability distribution. The method uses equilibrium conditional Monte Carlo simulations to make predictions.* Results: The paper reports tests on various synthetic and real-world data in one, two, and three dimensions that demonstrate the competitiveness of MPRS prediction performance with standard interpolation methods such as ordinary kriging and inverse distance weighting, especially for rough and non-Gaussian data. The method also shows superior computational efficiency and scalability for large samples, allowing for the processing of massive data sets involving millions of nodes in a few seconds on a standard personal computer.
    Abstract We introduce the modified planar rotator method (MPRS), a physically inspired machine learning method for spatial/temporal regression. MPRS is a non-parametric model which incorporates spatial or temporal correlations via short-range, distance-dependent ``interactions'' without assuming a specific form for the underlying probability distribution. Predictions are obtained by means of a fully autonomous learning algorithm which employs equilibrium conditional Monte Carlo simulations. MPRS is able to handle scattered data and arbitrary spatial dimensions. We report tests on various synthetic and real-word data in one, two and three dimensions which demonstrate that the MPRS prediction performance (without parameter tuning) is competitive with standard interpolation methods such as ordinary kriging and inverse distance weighting. In particular, MPRS is a particularly effective gap-filling method for rough and non-Gaussian data (e.g., daily precipitation time series). MPRS shows superior computational efficiency and scalability for large samples. Massive data sets involving millions of nodes can be processed in a few seconds on a standard personal computer.
    摘要 我们引入修正平面转子方法(MPRS),这是一种受物理启发的机器学习方法,用于空间/时间回归。MPRS 是一种非参数模型,通过短程、随距离变化的“相互作用”来刻画空间或时间相关性,而无需假设底层概率分布的具体形式。预测由完全自主的学习算法给出,该算法采用平衡条件蒙特卡洛仿真。MPRS 可以处理散点数据和任意空间维度。我们在一维、二维和三维的合成数据与真实数据上进行了多种测试,结果表明 MPRS 的预测性能(无需参数调整)可与 ordinary kriging 和 inverse distance weighting 等标准插值方法相媲美。尤其是对于粗糙且非高斯的数据(如日降水时间序列),MPRS 是一种非常有效的缺口填补方法。MPRS 对大样本表现出更优的计算效率和可扩展性:包含数百万节点的海量数据集可以在标准个人电脑上几秒钟内处理完毕。
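
A toy, one-dimensional sketch of the gap-filling idea follows: observed values are mapped to planar-rotator angles, unobserved sites are relaxed by conditional Metropolis updates with nearest-neighbour cosine interactions while observed sites stay fixed, and the equilibrated angles are mapped back to values. The linear value-to-angle mapping, the fixed coupling, and the unit temperature are simplifications; MPRS itself handles scattered data in arbitrary dimensions and chooses its settings autonomously.

```python
import numpy as np

def mprs_gap_fill(values, observed_mask, n_sweeps=300, J=2.0, rng=None):
    """Toy conditional Monte Carlo gap-filling in the spirit of MPRS on a 1-D grid."""
    rng = rng or np.random.default_rng(0)
    lo, hi = values[observed_mask].min(), values[observed_mask].max()
    theta = np.where(observed_mask,
                     (values - lo) / (hi - lo) * np.pi,
                     rng.uniform(0, np.pi, len(values)))

    def local_energy(t, i, angle):
        e = 0.0
        if i > 0:
            e -= J * np.cos(angle - t[i - 1])
        if i < len(t) - 1:
            e -= J * np.cos(angle - t[i + 1])
        return e

    hidden = np.where(~observed_mask)[0]
    for _ in range(n_sweeps):
        for i in hidden:
            proposal = np.clip(theta[i] + rng.normal(0, 0.3), 0, np.pi)
            dE = local_energy(theta, i, proposal) - local_energy(theta, i, theta[i])
            if dE < 0 or rng.random() < np.exp(-dE):      # Metropolis at unit temperature
                theta[i] = proposal
    return lo + theta / np.pi * (hi - lo)                  # map angles back to values

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = np.linspace(0, 4 * np.pi, 120)
    truth = np.sin(x)
    mask = rng.random(120) > 0.4                           # ~60% of sites observed
    filled = mprs_gap_fill(truth, mask, rng=rng)
    print("mean abs error at gaps:", np.round(np.abs(filled[~mask] - truth[~mask]).mean(), 3))
```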

Nonlinear MPC design for incrementally ISS systems with application to GRU networks

  • paper_url: http://arxiv.org/abs/2309.16428
  • repo_url: None
  • paper_authors: Fabio Bonassi, Alessio La Bella, Marcello Farina, Riccardo Scattolini
  • for: 这篇论文旨在设计一种非线性预测控制策略,用于 exponentially incremental Input-to-State Stable(ISS)系统。
  • methods: 这种方法不需要耗费庞大的计算终端成分,而是基于显式定义的最小预测时间框,以保证关闭循环稳定性。
  • results: 该方法被应用于 GRU 网络,并给出了具有收敛保证的定制状态观测器设计;整体控制架构在一个基准系统上进行了测试,展示了良好的控制性能和高效的适用性。
    Abstract This brief addresses the design of a Nonlinear Model Predictive Control (NMPC) strategy for exponentially incremental Input-to-State Stable (ISS) systems. In particular, a novel formulation is devised, which does not necessitate the onerous computation of terminal ingredients, but rather relies on the explicit definition of a minimum prediction horizon ensuring closed-loop stability. The designed methodology is particularly suited for the control of systems learned by Recurrent Neural Networks (RNNs), which are known for their enhanced modeling capabilities and for which the incremental ISS properties can be studied thanks to simple algebraic conditions. The approach is applied to Gated Recurrent Unit (GRU) networks, providing also a method for the design of a tailored state observer with convergence guarantees. The resulting control architecture is tested on a benchmark system, demonstrating its good control performances and efficient applicability.

Selective Nonparametric Regression via Testing

  • paper_url: http://arxiv.org/abs/2309.16412
  • repo_url: None
  • paper_authors: Fedor Noskov, Alexander Fishkov, Maxim Panov
  • for: 本研究targets the nonparametric heteroskedastic regression problem, and develops an abstention procedure for selective prediction.
  • methods: 该方法通过测试 conditional variance 的值来实现选择性预测。与现有方法不同的是,该方法可以考虑 conditional variance 预测器的uncertainty。
  • results: 研究给出了非渐近的风险上界,并证明了存在几种不同的收敛区间。实验在模拟数据和真实数据上进行了展示。
    Abstract Prediction with the possibility of abstention (or selective prediction) is an important problem for error-critical machine learning applications. While well-studied in the classification setup, selective approaches to regression are much less developed. In this work, we consider the nonparametric heteroskedastic regression problem and develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point. Unlike existing methods, the proposed one allows to account not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor. We prove non-asymptotic bounds on the risk of the resulting estimator and show the existence of several different convergence regimes. Theoretical analysis is illustrated with a series of experiments on simulated and real-world data.
    摘要 允许弃权的预测(即选择性预测)对于错误敏感的机器学习应用是一个重要问题。在分类设置下,选择性预测已得到了充分研究,但在回归设置下相关方法仍然很少。在本工作中,我们考虑非参数异方差回归问题,并通过检验给定点处条件方差取值的假设来构造弃权机制。与现有方法不同的是,我们的方法不仅考虑条件方差本身的取值,还考虑相应方差预测器的不确定性。我们证明了所得估计量的非渐近风险上界,并证明了存在几种不同的收敛区间。理论分析通过一系列模拟数据和真实数据实验加以说明。
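
The abstention idea, refuse to predict where a test on the local conditional variance (accounting for the variance estimator's own uncertainty) indicates the point is too noisy, can be sketched as below. The kernel variance estimator, its rough standard-error proxy, and the threshold rule are illustrative assumptions, not the paper's test statistic.

```python
import numpy as np

def local_variance(x0, X, y, f_hat, bandwidth=0.3):
    """Kernel estimate of the conditional variance of the residuals at x0,
    plus a rough standard error for that estimate."""
    w = np.exp(-((X - x0) ** 2) / (2 * bandwidth ** 2))
    w /= w.sum()
    resid2 = (y - f_hat(X)) ** 2
    var_hat = np.sum(w * resid2)
    n_eff = 1.0 / np.sum(w ** 2)                      # effective sample size
    se = np.sqrt(np.sum(w * (resid2 - var_hat) ** 2) / n_eff)
    return var_hat, se

def predict_or_abstain(x0, X, y, f_hat, sigma_max=0.5, alpha_z=1.64):
    """Abstain when the data say, with margin, that the variance at x0 exceeds sigma_max^2."""
    var_hat, se = local_variance(x0, X, y, f_hat)
    if var_hat - alpha_z * se > sigma_max ** 2:       # confidently too noisy
        return None                                   # abstain
    return f_hat(np.array([x0]))[0]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-2, 2, 500)
    noise = np.where(X > 0, 1.5, 0.1) * rng.normal(size=500)   # heteroskedastic noise
    y = np.sin(X) + noise
    f_hat = np.sin                                    # pretend the mean function is known
    for x0 in (-1.0, 1.0):
        out = predict_or_abstain(x0, X, y, f_hat)
        print(f"x0={x0:+.1f} ->", "abstain" if out is None else f"predict {out:.3f}")
```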

Constructing Synthetic Treatment Groups without the Mean Exchangeability Assumption

  • paper_url: http://arxiv.org/abs/2309.16409
  • repo_url: None
  • paper_authors: Yuhang Zhang, Yue Liu, Zhihua Zhang
  • for: 将多个随机控制试验的信息传输到目标人口,其中只有控制组数据 available。
  • methods: 使用模拟控制方法构建目标人口的合成治疗组,通过对源人口治疗组的权重进行最小化 conditional maximum mean discrepancy 来Estimate weights。
  • results: 我们建立了合成治疗组估计器的非 Parametric 性质,并通过实验证明了我们的方法可以作为 mean exchangeability assumption 被违反时的新的 complementary approach。
    Abstract The purpose of this work is to transport the information from multiple randomized controlled trials to the target population where we only have the control group data. Previous works rely critically on the mean exchangeability assumption. However, as pointed out by many current studies, the mean exchangeability assumption might be violated. Motivated by the synthetic control method, we construct a synthetic treatment group for the target population by a weighted mixture of treatment groups of source populations. We estimate the weights by minimizing the conditional maximum mean discrepancy between the weighted control groups of source populations and the target population. We establish the asymptotic normality of the synthetic treatment group estimator based on the sieve semiparametric theory. Our method can serve as a novel complementary approach when the mean exchangeability assumption is violated. Experiments are conducted on synthetic and real-world datasets to demonstrate the effectiveness of our methods.
    摘要 本研究的目的是将多个随机对照试验中的信息迁移到仅有对照组数据可用的目标人群。先前的研究在很大程度上依赖于均值可交换性(mean exchangeability)假设。然而,正如许多最新研究所指出的,该假设可能被违反。受合成控制(synthetic control)方法启发,我们通过对各源人群治疗组进行加权混合,为目标人群构建一个合成治疗组。权重通过最小化源人群加权对照组与目标人群之间的条件最大均值差(conditional MMD)来估计。基于筛(sieve)半参数理论,我们建立了合成治疗组估计量的渐近正态性。当均值可交换性假设不成立时,我们的方法可以作为一种新的补充方法。我们在合成数据和真实数据上进行了实验,以证明方法的有效性。
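
A simplified sketch of the weight-estimation step is given below: simplex weights are chosen to minimize an RBF-kernel MMD between the weighted mixture of source control groups and the target control sample. The paper minimizes a conditional MMD, so the plain MMD, the kernel bandwidth, and the SLSQP solver used here are stand-ins.

```python
import numpy as np
from scipy.optimize import minimize

def rbf(A, B, gamma=0.5):
    """Mean RBF kernel value between two samples."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2).mean()

def fit_mixture_weights(source_groups, target):
    """Simplex weights w minimizing MMD^2 between sum_k w_k P_k and the target sample."""
    K = len(source_groups)
    M = np.array([[rbf(a, b) for b in source_groups] for a in source_groups])
    v = np.array([rbf(a, target) for a in source_groups])
    c = rbf(target, target)

    def mmd2(w):
        return w @ M @ w - 2 * w @ v + c

    res = minimize(mmd2, np.full(K, 1.0 / K), method="SLSQP",
                   bounds=[(0, 1)] * K,
                   constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
    return res.x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sources = [rng.normal(m, 1.0, size=(200, 2)) for m in (-1.0, 0.0, 2.0)]
    target = rng.normal(0.5, 1.0, size=(150, 2))    # lies between sources 2 and 3
    w = fit_mixture_weights(sources, target)
    print("estimated mixture weights:", np.round(w, 2))
    # The same weights would then be applied to the sources' *treatment*
    # groups to build a synthetic treatment group for the target population.
```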

VAE-based latent-space classification of RNO-G data

  • paper_url: http://arxiv.org/abs/2309.16401
  • repo_url: None
  • paper_authors: Thorsten Glüsenkamp
  • for: 这篇论文描述了一种利用变分自编码器(VAE)的潜在空间对不同噪声类型进行分类的方法。
  • methods: 该方法使用变分自编码器(VAE)生成一个包含噪声信息的紧凑潜在空间,然后利用该潜在空间对不同的噪声类型进行分类。
  • results: 该方法可以自动检测和分类不同的噪声类型,包括物理风吹引起的信号和人工噪声。这些结果可以用来识别和分类不同的事件类型。
    Abstract The Radio Neutrino Observatory in Greenland (RNO-G) is a radio-based ultra-high energy neutrino detector located at Summit Station, Greenland. It is still being constructed, with 7 stations currently operational. Neutrino detection works by measuring Askaryan radiation produced by neutrino-nucleon interactions. A neutrino candidate must be found amidst other backgrounds which are recorded at much higher rates -- including cosmic-rays and anthropogenic noise -- the origins of which are sometimes unknown. Here we describe a method to classify different noise classes using the latent space of a variational autoencoder. The latent space forms a compact representation that makes classification tractable. We analyze data from a noisy and a silent station. The method automatically detects and allows us to qualitatively separate multiple event classes, including physical wind-induced signals, for both the noisy and the quiet station.
    摘要 格陵兰射电中微子天文台(RNO-G)是一个位于格陵兰 Summit Station 的基于射电的超高能中微子探测器,目前仍在建设中,现有 7 个站点已投入运行。中微子探测通过测量中微子-核子相互作用产生的 Askaryan 辐射来实现。中微子候选事件必须从以更高速率记录的其他背景中被识别出来,这些背景包括宇宙射线和人为噪声,其来源有时并不清楚。我们介绍了一种利用变分自编码器潜在空间对不同噪声类别进行分类的方法。潜在空间构成了一个紧凑的表示,使分类变得可行。我们分析了一个噪声较大的站点和一个较安静的站点的数据。该方法能够自动检测并让我们定性地区分多种事件类别,包括物理上由风引起的信号,对两个站点均是如此。

Recent Advances of Differential Privacy in Centralized Deep Learning: A Systematic Survey

  • paper_url: http://arxiv.org/abs/2309.16398
  • repo_url: None
  • paper_authors: Lea Demelius, Roman Kern, Andreas Trügler
  • for: 这篇论文主要探讨了在机器学习领域中的差分隐私技术,尤其是在中央化深度学习中实现数据隐私保护的方法。
  • methods: 该论文基于系统性的文献综述,深入分析了最新进展和开放问题,并讨论了隐私-效用权衡的改进、对多种威胁和攻击的防护、差分隐私生成模型以及新兴应用领域。
  • results: 该论文给出了差分隐私中心化深度学习领域研究现状的全面概述,总结了最新进展与挑战,并对未来可能的发展方向提出了展望。
    Abstract Differential Privacy has become a widely popular method for data protection in machine learning, especially since it allows formulating strict mathematical privacy guarantees. This survey provides an overview of the state-of-the-art of differentially private centralized deep learning, thorough analyses of recent advances and open problems, as well as a discussion of potential future developments in the field. Based on a systematic literature review, the following topics are addressed: auditing and evaluation methods for private models, improvements of privacy-utility trade-offs, protection against a broad range of threats and attacks, differentially private generative models, and emerging application domains.
    摘要 差分隐私(Differential Privacy)已经成为机器学习中广泛使用的数据保护方法,特别是因为它允许给出严格的数学隐私保证。本综述概述了差分隐私中心化深度学习的研究现状,对近期进展和开放问题进行了系统分析,并讨论了该领域未来可能的发展方向。基于系统性文献综述,本文讨论了以下主题:面向私有模型的审计与评估方法、隐私-效用权衡的改进、对多种威胁和攻击的防护、差分隐私生成模型,以及新兴应用领域。
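
The mechanism most of the surveyed work builds on, DP-SGD (per-example gradient clipping plus calibrated Gaussian noise), can be sketched in a few lines. The toy logistic-regression loss, clipping norm, and noise multiplier are illustrative choices, and a real implementation would additionally track the privacy budget with an accountant.

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip=1.0, noise_multiplier=1.0, rng=None):
    """One DP-SGD step for logistic regression: clip each per-example gradient
    to norm <= clip, average, add Gaussian noise scaled to the clipping norm,
    then take a gradient step."""
    rng = rng or np.random.default_rng()
    per_example_grads = []
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + np.exp(-xi @ w))
        g = (p - yi) * xi                                        # gradient of the log-loss
        g = g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))     # clip
        per_example_grads.append(g)
    g_bar = np.mean(per_example_grads, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip / len(X), size=w.shape)
    return w - lr * (g_bar + noise)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 5)); true_w = np.array([1., -2., 0.5, 0., 3.])
    y = (X @ true_w + 0.1 * rng.normal(size=256) > 0).astype(float)
    w = np.zeros(5)
    for _ in range(200):
        w = dp_sgd_step(w, X, y, rng=rng)
    print("learned (noisy, private) weights:", np.round(w, 2))
```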

Multi-Swap $k$-Means++

  • paper_url: http://arxiv.org/abs/2309.16384
  • repo_url: None
  • paper_authors: Lorenzo Beretta, Vincent Cohen-Addad, Silvio Lattanzi, Nikos Parotsidis
  • for: 提高 $k$-means clustering 问题的解决方案质量
  • methods: 使用 $k$-means++ 搜索分布进行 $O(k \log \log k)$ 次本地搜索,并通过多中心同时交换来提高解决方案质量
  • results: 实现了 $9 + \varepsilon$ 的近似比率,并在多个数据集上显示了重要的实践改进。
    Abstract The $k$-means++ algorithm of Arthur and Vassilvitskii (SODA 2007) is often the practitioners' choice algorithm for optimizing the popular $k$-means clustering objective and is known to give an $O(\log k)$-approximation in expectation. To obtain higher quality solutions, Lattanzi and Sohler (ICML 2019) proposed augmenting $k$-means++ with $O(k \log \log k)$ local search steps obtained through the $k$-means++ sampling distribution to yield a $c$-approximation to the $k$-means clustering problem, where $c$ is a large absolute constant. Here we generalize and extend their local search algorithm by considering larger and more sophisticated local search neighborhoods hence allowing to swap multiple centers at the same time. Our algorithm achieves a $9 + \varepsilon$ approximation ratio, which is the best possible for local search. Importantly we show that our approach yields substantial practical improvements, we show significant quality improvements over the approach of Lattanzi and Sohler (ICML 2019) on several datasets.
    摘要 Arthur 和 Vassilvitskii(SODA 2007)提出的 $k$-means++ 算法通常是实践者在优化流行的 $k$-means 聚类目标时的首选算法,已知其期望近似比为 $O(\log k)$。为了获得更高质量的解,Lattanzi 和 Sohler(ICML 2019)提出在 $k$-means++ 的基础上,利用 $k$-means++ 采样分布额外进行 $O(k \log \log k)$ 次本地搜索,从而得到 $k$-means 聚类问题的 $c$-近似解,其中 $c$ 是一个较大的绝对常数。在本文中,我们推广并扩展了他们的本地搜索算法,考虑更大、更复杂的本地搜索邻域,从而可以同时交换多个中心。我们的算法达到 $9 + \varepsilon$ 的近似比,这是本地搜索所能达到的最优值。重要的是,我们的方法带来了显著的实践改进:在多个数据集上,其解的质量明显优于 Lattanzi 和 Sohler(ICML 2019)的方法。
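
A simplified single move of the local search is sketched below: candidate centers are drawn from the D² (k-means++) sampling distribution, a small set of them is swapped in for existing centers, and the swap is kept only if the k-means cost improves. Choosing which centers to swap out at random is a simplification of the paper's multi-swap neighbourhood.

```python
import numpy as np

def kmeans_cost(X, centers):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1).sum()

def d2_sample(X, centers, size, rng):
    """Sample points with probability proportional to their squared distance
    to the current centers (the k-means++ distribution)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).min(axis=1)
    return X[rng.choice(len(X), size=size, replace=False, p=d2 / d2.sum())]

def multi_swap_step(X, centers, p, rng):
    """Propose p new centers from the D^2 distribution, swap them for p
    randomly chosen current centers, keep the swap only if the cost improves."""
    candidates = d2_sample(X, centers, p, rng)
    out = rng.choice(len(centers), size=p, replace=False)
    proposal = centers.copy()
    proposal[out] = candidates
    return proposal if kmeans_cost(X, proposal) < kmeans_cost(X, centers) else centers

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(c, 0.3, size=(100, 2)) for c in ((0, 0), (3, 0), (0, 3), (3, 3))])
    centers = X[rng.choice(len(X), size=4, replace=False)]      # crude initialisation
    for _ in range(200):
        centers = multi_swap_step(X, centers, p=2, rng=rng)
    print("final cost:", round(kmeans_cost(X, centers), 1))
```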

MHG-GNN: Combination of Molecular Hypergraph Grammar with Graph Neural Network

  • paper_url: http://arxiv.org/abs/2309.16374
  • repo_url: None
  • paper_authors: Akihiro Kishimoto, Hiroshi Kajino, Masataka Hirose, Junta Fuchiwaki, Indra Priyadarsini, Lisa Hamada, Hajime Shinohara, Daiju Nakano, Seiji Takeda
  • for: 物料发现中的性质预测
  • methods: combine graph neural network (GNN) with Molecular Hypergraph Grammar (MHG)
  • results: 在多种物料性质预测任务中显示出搭配MHG-GNN的扩展性和可靠性
    Abstract Property prediction plays an important role in material discovery. As an initial step to eventually develop a foundation model for material science, we introduce a new autoencoder called the MHG-GNN, which combines graph neural network (GNN) with Molecular Hypergraph Grammar (MHG). Results on a variety of property prediction tasks with diverse materials show that MHG-GNN is promising.
    摘要 性质预测在材料发现中扮演着重要的角色。作为迈向材料科学基础模型的第一步,我们介绍了一种新的自编码器 MHG-GNN,它将分子超图文法(Molecular Hypergraph Grammar, MHG)与图神经网络(GNN)相结合。在针对多种材料的多项性质预测任务中,MHG-GNN 展现出良好的前景。

Bringing the Discussion of Minima Sharpness to the Audio Domain: a Filter-Normalised Evaluation for Acoustic Scene Classification

  • paper_url: http://arxiv.org/abs/2309.16369
  • repo_url: None
  • paper_authors: Manuel Milling, Andreas Triantafyllopoulos, Iosif Tsangko, Simon David Noel Rampp, Björn Wolfgang Schuller
  • for: 这篇论文探讨了深度神经网络中损失函数的锐度与泛化性之间的关系,特别是在音频Scene分类任务上。
  • methods: 该研究基于二维滤波器Normalized visualizations和一种 derive sharpness measure,对loss函数的不同部分进行了分析。
  • results: 结果表明,较锐的损失极小值往往具有更好的泛化性能,尤其是对于来自未见设备的域外数据。此外,优化器的选择是极小值锐度变化的主要驱动因素。
    Abstract The correlation between the sharpness of loss minima and generalisation in the context of deep neural networks has been subject to discussion for a long time. Whilst mostly investigated in the context of selected benchmark data sets in the area of computer vision, we explore this aspect for the audio scene classification task of the DCASE2020 challenge data. Our analysis is based on twodimensional filter-normalised visualisations and a derived sharpness measure. Our exploratory analysis shows that sharper minima tend to show better generalisation than flat minima -even more so for out-of-domain data, recorded from previously unseen devices-, thus adding to the dispute about better generalisation capabilities of flat minima. We further find that, in particular, the choice of optimisers is a main driver of the sharpness of minima and we discuss resulting limitations with respect to comparability. Our code, trained model states and loss landscape visualisations are publicly available.
    摘要 深度神经网络中loss minimum的锐度和泛化关系已经引起了长时间的讨论。我们在DCASE2020挑战数据集上进行了对音频Scene classification任务的分析,使用两维filter-normalized visualization和派生的锐度度量。我们的探索分析表明,锐度较高的 minimum 往往具有更好的泛化性,尤其是在外部数据集和从前未知设备上录制的数据集中。此外,我们发现了优化器的选择是loss minimum的锐度的主要驱动器,并讨论了相关的局限性。我们的代码、训练模型状态和损失地图Visualization都是公共可用的。

Leveraging Pre-trained Language Models for Time Interval Prediction in Text-Enhanced Temporal Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2309.16357
  • repo_url: https://github.com/duyguislakoglu/temt
  • paper_authors: Duygu Sezen Islakoglu, Mel Chekol, Yannis Velegrakis
  • for: 本研究旨在提出一种基于语言模型的文本强化时间知识Graph completion(KGC)框架,以利用知识图中的文本描述和时间信息来提高知识图 completion的效果。
  • methods: 本研究使用了语言模型的参数知识,将文本和时间信息分别处理,并将其相互融合以生成可能性分数。与之前的方法不同,本研究能够 Capture time dependencies and perform inductive inference on unseen entities.
  • results: 在不同的时间间隔预测和 triple classification 任务中,TEMT 与当前状态域的方法相当竞争,并且在 inductive Settings中表现出 excel。
    Abstract Most knowledge graph completion (KGC) methods learn latent representations of entities and relations of a given graph by mapping them into a vector space. Although the majority of these methods focus on static knowledge graphs, a large number of publicly available KGs contain temporal information stating the time instant/period over which a certain fact has been true. Such graphs are often known as temporal knowledge graphs. Furthermore, knowledge graphs may also contain textual descriptions of entities and relations. Both temporal information and textual descriptions are not taken into account during representation learning by static KGC methods, and only structural information of the graph is leveraged. Recently, some studies have used temporal information to improve link prediction, yet they do not exploit textual descriptions and do not support inductive inference (prediction on entities that have not been seen in training). We propose a novel framework called TEMT that exploits the power of pre-trained language models (PLMs) for text-enhanced temporal knowledge graph completion. The knowledge stored in the parameters of a PLM allows TEMT to produce rich semantic representations of facts and to generalize on previously unseen entities. TEMT leverages textual and temporal information available in a KG, treats them separately, and fuses them to get plausibility scores of facts. Unlike previous approaches, TEMT effectively captures dependencies across different time points and enables predictions on unseen entities. To assess the performance of TEMT, we carried out several experiments including time interval prediction, both in transductive and inductive settings, and triple classification. The experimental results show that TEMT is competitive with the state-of-the-art.
    摘要 大多数知识图补全(KGC)方法将给定图中的实体和关系映射到一个向量空间,以学习它们的潜在表示。尽管这些方法大多关注静态知识图,但许多公开可用的知识图包含时间信息,说明某个事实在哪个时间点或时间段内成立,这类图通常被称为时序知识图。此外,知识图还可能包含实体和关系的文本描述。静态 KGC 方法在表示学习时既不考虑时间信息也不考虑文本描述,仅利用图的结构信息。最近有一些研究利用时间信息来改进链接预测,但它们没有利用文本描述,也不支持归纳推理(即对训练中未出现的实体进行预测)。我们提出了一个名为 TEMT 的新框架,利用预训练语言模型(PLM)实现文本增强的时序知识图补全。PLM 参数中存储的知识使 TEMT 能够生成事实的丰富语义表示,并泛化到此前未见过的实体。TEMT 分别处理知识图中的文本信息和时间信息,并将二者融合以得到事实的合理性分数。与先前方法不同,TEMT 能够有效捕捉不同时间点之间的依赖关系,并支持对未见实体的预测。为评估 TEMT 的性能,我们进行了多项实验,包括直推与归纳设定下的时间间隔预测以及三元组分类。实验结果表明,TEMT 与最先进方法具有可比的竞争力。

ShapeDBA: Generating Effective Time Series Prototypes using ShapeDTW Barycenter Averaging

  • paper_url: http://arxiv.org/abs/2309.16353
  • repo_url: https://github.com/msd-irimas/shapedba
  • paper_authors: Ali Ismail-Fawaz, Hassan Ismail Fawaz, François Petitjean, Maxime Devanne, Jonathan Weber, Stefano Berretti, Geoffrey I. Webb, Germain Forestier
  • for: 本研究目的是生成有用的时序数据 exemplars 和 prototypes,并提出了一种新的时序数据平均方法,ShapeDTW Barycentric Average。
  • methods: 本文使用了ShapeDTW Barycentric Average,一种基于时序相似度度量DTW的新型平均方法,以生成更加有用的时序数据 exemplars 和 prototypes。
  • results: 根据UCR数据集库中的123个数据集,与k-means减少算法相结合,ShapeDTW Barycentric Average可以达到新的州OF-THE-ARTResultsin Adjusted Rand Index。
    Abstract Time series data can be found in almost every domain, ranging from the medical field to manufacturing and wireless communication. Generating realistic and useful exemplars and prototypes is a fundamental data analysis task. In this paper, we investigate a novel approach to generating realistic and useful exemplars and prototypes for time series data. Our approach uses a new form of time series average, the ShapeDTW Barycentric Average. We therefore turn our attention to accurately generating time series prototypes with a novel approach. The existing time series prototyping approaches rely on the Dynamic Time Warping (DTW) similarity measure such as DTW Barycentering Average (DBA) and SoftDBA. These last approaches suffer from a common problem of generating out-of-distribution artifacts in their prototypes. This is mostly caused by the DTW variant used and its incapability of detecting neighborhood similarities, instead it detects absolute similarities. Our proposed method, ShapeDBA, uses the ShapeDTW variant of DTW, that overcomes this issue. We chose time series clustering, a popular form of time series analysis to evaluate the outcome of ShapeDBA compared to the other prototyping approaches. Coupled with the k-means clustering algorithm, and evaluated on a total of 123 datasets from the UCR archive, our proposed averaging approach is able to achieve new state-of-the-art results in terms of Adjusted Rand Index.
    摘要 时序数据几乎存在于所有领域,从医疗到制造和无线通信。生成逼真且有用的示例与原型是一项基础的数据分析任务。在本文中,我们研究了一种为时序数据生成逼真且有用的示例与原型的新方法。我们的方法使用一种新的时序平均形式,即 ShapeDTW 重心平均(ShapeDTW Barycentric Average),以准确地生成时序原型。现有的时序原型生成方法依赖于动态时间规整(DTW)相似度度量,例如 DTW 重心平均(DBA)和 SoftDBA。这些方法普遍存在一个问题,即在其生成的原型中会出现分布外的伪影,这主要是由于所采用的 DTW 变体只检测绝对相似性,而无法检测邻域相似性。我们提出的 ShapeDBA 方法使用 DTW 的 ShapeDTW 变体,克服了这一问题。我们选择时序聚类这一流行的时序分析形式来评估 ShapeDBA 与其他原型生成方法的效果。结合 k-means 聚类算法,在 UCR 存档中总计 123 个数据集上进行评估,我们提出的平均方法在调整兰德指数(Adjusted Rand Index)上取得了新的最优结果。

LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite

  • paper_url: http://arxiv.org/abs/2309.16342
  • repo_url: https://github.com/tumaer/lagrangebench
  • paper_authors: Artur P. Toshev, Gianluca Galletti, Fabian Fritz, Stefan Adami, Nikolaus A. Adams
  • for: 这篇论文旨在为基于拉格朗日粒子离散的 PDE 学习求解器提供基准;相比基于网格的 PDE 机器学习建模,这一方向仍缺乏探索。
  • methods: 论文使用了 Lagrangian 粒子方法生成了七个新的 fluid mechanics 数据集,并提供了一个基于 JAX 的高效 API,以及一些现代训练策略和三种邻居搜索算法。
  • results: 论文 introduces physical metrics like kinetic energy MSE and Sinkhorn distance to measure the performance of learned surrogates, and provides baseline results using GNNs like GNS and SEGNN.
    Abstract Machine learning has been successfully applied to grid-based PDE modeling in various scientific applications. However, learned PDE solvers based on Lagrangian particle discretizations, which are the preferred approach to problems with free surfaces or complex physics, remain largely unexplored. We present LagrangeBench, the first benchmarking suite for Lagrangian particle problems, focusing on temporal coarse-graining. In particular, our contribution is: (a) seven new fluid mechanics datasets (four in 2D and three in 3D) generated with the Smoothed Particle Hydrodynamics (SPH) method including the Taylor-Green vortex, lid-driven cavity, reverse Poiseuille flow, and dam break, each of which includes different physics like solid wall interactions or free surface, (b) efficient JAX-based API with various recent training strategies and three neighbor search routines, and (c) JAX implementation of established Graph Neural Networks (GNNs) like GNS and SEGNN with baseline results. Finally, to measure the performance of learned surrogates we go beyond established position errors and introduce physical metrics like kinetic energy MSE and Sinkhorn distance for the particle distribution. Our codebase is available at https://github.com/tumaer/lagrangebench .
    摘要 机器学习已在多个科学领域成功应用于基于网格的偏微分方程(PDE)建模。然而,基于拉格朗日粒子离散的学习型 PDE 求解器仍基本未被探索,而这类方法正是处理自由表面或复杂物理问题时的首选。我们介绍 LagrangeBench,首个面向拉格朗日粒子问题、侧重时间粗粒化的基准套件。具体而言,我们的贡献包括:(a) 七个用 Smoothed Particle Hydrodynamics (SPH) 方法生成的新流体力学数据集(四个二维、三个三维),包括泰勒-格林涡、顶盖驱动方腔流、反向 Poiseuille 流和溃坝,每个数据集包含不同的物理效应,如固体壁面相互作用或自由表面;(b) 高效的基于 JAX 的 API,包含多种最新的训练策略和三种邻居搜索算法;(c) GNS 和 SEGNN 等成熟图神经网络(GNN)的 JAX 实现及基线结果。最后,为衡量学习到的代理模型的性能,我们不仅使用既有的位置误差,还引入了动能均方误差(MSE)和粒子分布的 Sinkhorn 距离等物理指标。我们的代码库见 https://github.com/tumaer/lagrangebench 。
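
The physical evaluation metrics named in the abstract, kinetic-energy MSE and a Sinkhorn distance over the particle distribution, can be sketched as follows; the array shapes, uniform particle weights, and entropic-regularisation settings are illustrative and not LagrangeBench's exact implementation.

```python
import numpy as np

def kinetic_energy_mse(vel_pred, vel_true, mass=1.0):
    """MSE between the total kinetic energies of predicted and reference rollouts.
    vel_*: (timesteps, n_particles, dim) particle velocities."""
    ke_pred = 0.5 * mass * (vel_pred ** 2).sum(axis=(1, 2))
    ke_true = 0.5 * mass * (vel_true ** 2).sum(axis=(1, 2))
    return np.mean((ke_pred - ke_true) ** 2)

def sinkhorn_distance(x, y, reg=0.1, n_iter=200):
    """Entropy-regularised OT cost between two particle clouds (uniform weights)."""
    C = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    K = np.exp(-C / reg)
    a, b = np.full(len(x), 1 / len(x)), np.full(len(y), 1 / len(y))
    u = np.ones_like(a)
    for _ in range(n_iter):                     # Sinkhorn-Knopp iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]
    return float((P * C).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_vel = rng.normal(size=(50, 100, 3))
    pred_vel = true_vel + 0.05 * rng.normal(size=true_vel.shape)
    print("kinetic energy MSE:", kinetic_energy_mse(pred_vel, true_vel))
    pos_true = rng.uniform(size=(100, 3)); pos_pred = pos_true + 0.02
    print("Sinkhorn distance: ", sinkhorn_distance(pos_pred, pos_true))
```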

EFFL: Egalitarian Fairness in Federated Learning for Mitigating Matthew Effect

  • paper_url: http://arxiv.org/abs/2309.16338
  • repo_url: None
  • paper_authors: Jiashi Gao, Changwu Huang, Ming Tang, Shin Hwei Tan, Xin Yao, Xuetao Wei
  • for: The paper aims to address the issue of unequal representation and bias in federated learning (FL) when dealing with heterogeneous datasets from multiple clients.
  • methods: The proposed Egalitarian Fairness Federated Learning (EFFL) method uses a constrained multi-objective optimization approach to optimize the egalitarian fairness and performance of the global model.
  • results: The proposed EFFL algorithm outperforms other state-of-the-art FL algorithms in achieving a high-performance global model with enhanced egalitarian fairness among all clients.
  • for: 本研究旨在解决 Federated Learning(FL)中处理多客户端数据的不均衡和偏见问题。
  • methods: 提议的 Egalitarian Fairness Federated Learning(EFFL)方法使用受限多目标优化方法来优化全局模型的 egalitarian fairness 和性能。
  • results: 提议的 EFFL 算法在实现高性能全局模型的同时,也能够提高所有客户端的 egalitarian fairness。
    Abstract Recent advances in federated learning (FL) enable collaborative training of machine learning (ML) models from large-scale and widely dispersed clients while protecting their privacy. However, when different clients' datasets are heterogeneous, traditional FL mechanisms produce a global model that does not adequately represent the poorer clients with limited data resources, resulting in lower accuracy and higher bias on their local data. According to the Matthew effect, which describes how the advantaged gain more advantage and the disadvantaged lose more over time, deploying such a global model in client applications may worsen the resource disparity among the clients and harm the principles of social welfare and fairness. To mitigate the Matthew effect, we propose Egalitarian Fairness Federated Learning (EFFL), where egalitarian fairness refers to the global model learned from FL has: (1) equal accuracy among clients; (2) equal decision bias among clients. Besides achieving egalitarian fairness among the clients, EFFL also aims for performance optimality, minimizing the empirical risk loss and the bias for each client; both are essential for any ML model training, whether centralized or decentralized. We formulate EFFL as a constrained multi-constrained multi-objectives optimization (MCMOO) problem, with the decision bias and egalitarian fairness as constraints and the minimization of the empirical risk losses on all clients as multiple objectives to be optimized. We propose a gradient-based three-stage algorithm to obtain the Pareto optimal solutions within the constraint space. Extensive experiments demonstrate that EFFL outperforms other state-of-the-art FL algorithms in achieving a high-performance global model with enhanced egalitarian fairness among all clients.
    摘要 To address this issue, we propose Egalitarian Fairness Federated Learning (EFFL), which aims to achieve egalitarian fairness among clients in two aspects:1. Equal accuracy among clients.2. Equal decision bias among clients.Besides egalitarian fairness, EFFL also pursues performance optimality by minimizing the empirical risk loss and bias for each client. This is essential for any ML model training, whether centralized or decentralized.We formulate EFFL as a constrained multi-constrained multi-objectives optimization (MCMOO) problem, with the decision bias and egalitarian fairness as constraints and the minimization of the empirical risk losses on all clients as multiple objectives to be optimized. To obtain the Pareto optimal solutions within the constraint space, we propose a gradient-based three-stage algorithm.Extensive experiments demonstrate that EFFL outperforms other state-of-the-art FL algorithms in achieving a high-performance global model with enhanced egalitarian fairness among all clients.

DeepPCR: Parallelizing Sequential Operations in Neural Networks

  • paper_url: http://arxiv.org/abs/2309.16318
  • repo_url: None
  • paper_authors: Federico Danieli, Miguel Sarabia, Xavier Suau, Pau Rodríguez, Luca Zappella
  • for: 加速深度神经网络的推理和训练
  • methods: 使用平行化技术以并行执行层次结构中的操作
  • results: 在多层感知器中实现了高达30倍的前向传播速度增加和高达200倍的反向传播速度增加,以及在 diffusion 模型中实现了更快的训练和生成速度
    Abstract Parallelization techniques have become ubiquitous for accelerating inference and training of deep neural networks. Despite this, several operations are still performed in a sequential manner. For instance, the forward and backward passes are executed layer-by-layer, and the output of diffusion models is produced by applying a sequence of denoising steps. This sequential approach results in a computational cost proportional to the number of steps involved, presenting a potential bottleneck as the number of steps increases. In this work, we introduce DeepPCR, a novel algorithm which parallelizes typically sequential operations in order to speed up inference and training of neural networks. DeepPCR is based on interpreting a sequence of $L$ steps as the solution of a specific system of equations, which we recover using the Parallel Cyclic Reduction algorithm. This reduces the complexity of computing the sequential operations from $\mathcal{O}(L)$ to $\mathcal{O}(\log_2L)$, thus yielding a speedup for large $L$. To verify the theoretical lower complexity of the algorithm, and to identify regimes for speedup, we test the effectiveness of DeepPCR in parallelizing the forward and backward pass in multi-layer perceptrons, and reach speedups of up to $30\times$ for the forward and $200\times$ for the backward pass. We additionally showcase the flexibility of DeepPCR by parallelizing training of ResNets with as many as 1024 layers, and generation in diffusion models, enabling up to $7\times$ faster training and $11\times$ faster generation, respectively, when compared to the sequential approach.
    摘要 深度学习模型的推理和训练速度加速技术已经广泛应用。然而,许多操作仍然以序列方式进行,例如层次推理和反向传播。这种序列方式会导致计算成本与操作步骤数直线关系,从而带来计算成本增加的潜在瓶颈。在这项工作中,我们介绍了深度PCR算法,它可以将通常以序列方式进行的操作并行化,以加速深度学习模型的推理和训练。深度PCR基于解释一系列$L$步骤为特定系统方程的解,我们使用并行循环减少算法来解决这些方程。这将计算序列操作的复杂度从 $\mathcal{O}(L)$ 降低到 $\mathcal{O}(\log_2L)$,从而实现大量$L$的速度增加。为了证明算法的理论下界复杂度,以及哪些情况下可以获得加速,我们在多层感知器的前向和反向传播中测试了深度PCR的效果,并达到了最多$30\times$的加速。此外,我们还示cases了深度PCR的灵活性,可以并行训练具有1024层的ResNet模型,以及在Diffusion模型中的生成过程,各自实现了$7\times$快的训练和$11\times$快的生成。
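
The source of DeepPCR's speedup is recasting L sequential steps as one system solved in O(log L) parallel rounds. For the special case of an affine recurrence x_t = A x_{t-1} + b_t, the same logarithmic depth can be illustrated with a doubling scan over affine maps; this is an associative-scan stand-in for the paper's Parallel Cyclic Reduction solver and covers only the linear case, not the Newton-linearised nonlinear one.

```python
import numpy as np

def sequential_rollout(A, bs, x0):
    xs, x = [], x0
    for b in bs:
        x = A @ x + b
        xs.append(x)
    return np.stack(xs)

def parallel_rollout(A, bs, x0):
    """Doubling scan over the affine maps x -> A x + b. Each round merges pairs
    of maps via (A2, b2) o (A1, b1) = (A2 A1, A2 b1 + b2), so only O(log L)
    sequential rounds are needed (the work within each round is parallelisable)."""
    L, d = len(bs), len(x0)
    As = np.broadcast_to(A, (L, d, d)).copy()
    Bs = bs.copy()
    shift = 1
    while shift < L:
        newA, newB = As.copy(), Bs.copy()
        newA[shift:] = As[shift:] @ As[:-shift]                          # compose with the map `shift` steps back
        newB[shift:] = np.einsum("lij,lj->li", As[shift:], Bs[:-shift]) + Bs[shift:]
        As, Bs, shift = newA, newB, shift * 2
    # prefix map t now sends x0 directly to x_t
    return np.einsum("lij,j->li", As, x0) + Bs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, L = 4, 32
    A = 0.9 * np.linalg.qr(rng.normal(size=(d, d)))[0]   # a stable linear "layer"
    bs = rng.normal(size=(L, d)); x0 = rng.normal(size=d)
    assert np.allclose(sequential_rollout(A, bs, x0), parallel_rollout(A, bs, x0))
    print("sequential and log-depth parallel rollouts agree")
```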

Astroconformer: The Prospects of Analyzing Stellar Light Curves with Transformer-Based Deep Learning Models

  • paper_url: http://arxiv.org/abs/2309.16316
  • repo_url: None
  • paper_authors: Jia-Shu Pan, Yuan-Sen Ting, Jie Yu
  • for: 这个研究是为了使用深度学习方法来探索恒星的振荡和粒子运动,并从light curve中提取有用的信息。
  • methods: 这个研究使用了一种名为“Astroconformer”的Transformer-based深度学习框架,可以从light curve中捕捉到恒星内部结构和进化状态的信息。
  • results: 研究发现,在训练数据丰富的情况下,Astroconformer可以实现Surface gravity(log g)的估计,其RMSE为0.017 dex,而在训练数据稀缺的情况下,RMSE可以达0.1 dex。此外,这个模型也超越了传统的K-nearest neighbor-based模型和现有的CNN模型。
    Abstract Light curves of stars encapsulate a wealth of information about stellar oscillations and granulation, thereby offering key insights into the internal structure and evolutionary state of stars. Conventional asteroseismic techniques have been largely confined to power spectral analysis, neglecting the valuable phase information contained within light curves. While recent machine learning applications in asteroseismology utilizing Convolutional Neural Networks (CNNs) have successfully inferred stellar attributes from light curves, they are often limited by the local feature extraction inherent in convolutional operations. To circumvent these constraints, we present $\textit{Astroconformer}$, a Transformer-based deep learning framework designed to capture long-range dependencies in stellar light curves. Our empirical analysis, which focuses on estimating surface gravity ($\log g$), is grounded in a carefully curated dataset derived from $\textit{Kepler}$ light curves. These light curves feature asteroseismic $\log g$ values spanning from 0.2 to 4.4. Our results underscore that, in the regime where the training data is abundant, $\textit{Astroconformer}$ attains a root-mean-square-error (RMSE) of 0.017 dex around $\log g \approx 3 $. Even in regions where training data are sparse, the RMSE can reach 0.1 dex. It outperforms not only the K-nearest neighbor-based model ($\textit{The SWAN}$) but also state-of-the-art CNNs. Ablation studies confirm that the efficacy of the models in this particular task is strongly influenced by the size of their receptive fields, with larger receptive fields correlating with enhanced performance. Moreover, we find that the attention mechanisms within $\textit{Astroconformer}$ are well-aligned with the inherent characteristics of stellar oscillations and granulation present in the light curves.
    摘要 恒星的光变曲线蕴含着大量关于恒星振荡和米粒组织(granulation)的信息,因此能够为恒星内部结构和演化状态提供关键线索。传统的星震学技术主要局限于功率谱分析,忽略了光变曲线中有价值的相位信息。虽然最近在星震学中应用卷积神经网络(CNN)的机器学习工作已成功地从光变曲线推断恒星属性,但它们往往受到卷积操作固有的局部特征提取的限制。为规避这些限制,我们提出了 $\textit{Astroconformer}$,一种基于 Transformer 的深度学习框架,用于捕捉恒星光变曲线中的长程依赖关系。我们的实证分析聚焦于表面重力($\log g$)的估计,基于从 $\textit{Kepler}$ 光变曲线精心构建的数据集,这些光变曲线的星震学 $\log g$ 值覆盖 0.2 至 4.4。结果表明,在训练数据充足的区间,$\textit{Astroconformer}$ 在 $\log g \approx 3$ 附近的均方根误差(RMSE)可达 0.017 dex;即使在训练数据稀疏的区间,RMSE 也可达到 0.1 dex。它不仅优于基于 K 近邻的模型($\textit{The SWAN}$),也优于最先进的 CNN。消融实验证实,模型在该任务中的效果强烈依赖于感受野的大小,更大的感受野对应更好的性能。此外,我们发现 $\textit{Astroconformer}$ 中的注意力机制与光变曲线中恒星振荡和米粒组织的内在特征高度吻合。

A Primer on Bayesian Neural Networks: Review and Debates

  • paper_url: http://arxiv.org/abs/2309.16314
  • repo_url: https://github.com/konstantinos-p/Bayesian-Neural-Networks-Reading-List
  • paper_authors: Julyan Arbel, Konstantinos Pitas, Mariia Vladimirova, Vincent Fortuin
  • for: 本文旨在系统介绍贝叶斯神经网络(BNN)的基本概念,即神经网络与贝叶斯统计的结合,以便将不确定性估计纳入神经网络的预测能力。
  • methods: 本文通过贝叶斯统计与神经网络的结合来应对神经网络的固有局限,如预测过度自信、缺乏可解释性以及对对抗攻击的脆弱性等。
  • results: 本文提供了一个系统性的综述,涵盖贝叶斯统计与神经网络的协同结合、常用先验及其对模型行为与性能的影响、训练与推断中的实践考量;此外,还探讨了 BNN 研究中的高级主题,并讨论了该领域正在进行的争论与分歧。
    Abstract Neural networks have achieved remarkable performance across various problem domains, but their widespread applicability is hindered by inherent limitations such as overconfidence in predictions, lack of interpretability, and vulnerability to adversarial attacks. To address these challenges, Bayesian neural networks (BNNs) have emerged as a compelling extension of conventional neural networks, integrating uncertainty estimation into their predictive capabilities. This comprehensive primer presents a systematic introduction to the fundamental concepts of neural networks and Bayesian inference, elucidating their synergistic integration for the development of BNNs. The target audience comprises statisticians with a potential background in Bayesian methods but lacking deep learning expertise, as well as machine learners proficient in deep neural networks but with limited exposure to Bayesian statistics. We provide an overview of commonly employed priors, examining their impact on model behavior and performance. Additionally, we delve into the practical considerations associated with training and inference in BNNs. Furthermore, we explore advanced topics within the realm of BNN research, acknowledging the existence of ongoing debates and controversies. By offering insights into cutting-edge developments, this primer not only equips researchers and practitioners with a solid foundation in BNNs, but also illuminates the potential applications of this dynamic field. As a valuable resource, it fosters an understanding of BNNs and their promising prospects, facilitating further advancements in the pursuit of knowledge and innovation.
    摘要 neural networks 已经在不同的问题领域 достичь了很高的表现,但它们的广泛应用受到了内在的限制,如预测时的过度自信、难以解释性和针对攻击的敏感性。为了解决这些挑战, Bayesian neural networks(BNNs)作为传统神经网络的吸收性扩展,将不确定性估计integrated into its predictive capabilities。这个全面的指南将为拥有bayesian方法背景的统计学家提供系统性的引入,以及擅长深度学习的机器学习专家,尽管有限的bayesian统计知识。我们将详细介绍常用的 prior,并 analyze its impact on model behavior and performance。此外,我们还会讨论BNNs在训练和推理中的实际问题。此外,我们还会探讨BNN的高级主题,包括正在进行的辩论和争议。通过这个指南,您将获得BNN的坚实基础知识,并了解这个动态领域的潜在应用。作为一个有价值的资源,这个指南将促进BNN的理解和推动知识和创新的进步。
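
As a minimal illustration of Bayesian inference over weights, the sketch below treats a single linear layer with a Gaussian prior, for which the posterior and the predictive uncertainty are available in closed form; general BNNs need approximate inference (variational methods, MCMC, Laplace approximations), but the quantities involved are the same.

```python
import numpy as np

# Toy illustration of Bayesian inference over "network" weights: one linear
# layer with a Gaussian prior, where the posterior over weights and the
# predictive distribution are exact. Real BNNs approximate these objects.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
Phi = np.hstack([X, np.ones_like(X)])               # features: slope + bias
y = 1.5 * X[:, 0] - 0.7 + 0.3 * rng.normal(size=40)

alpha, beta = 1.0, 1.0 / 0.3**2                      # prior precision, noise precision
S_inv = alpha * np.eye(2) + beta * Phi.T @ Phi       # posterior precision
S = np.linalg.inv(S_inv)
m = beta * S @ Phi.T @ y                             # posterior mean of the weights

x_star = np.array([[2.0, 1.0]])                      # query point (with bias feature)
pred_mean = float(x_star @ m)
pred_var = float(1.0 / beta + x_star @ S @ x_star.T) # noise + weight uncertainty
print(f"prediction: {pred_mean:.2f} +/- {np.sqrt(pred_var):.2f}")
```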

3D-Mol: A Novel Contrastive Learning Framework for Molecular Property Prediction with 3D Information

  • paper_url: http://arxiv.org/abs/2309.17366
  • repo_url: None
  • paper_authors: Taojie Kuang, Yiming Ren, Zhixiang Ren
  • for: 预测药物的物理性质,提高早期屏选和优化药物候选者。
  • methods: 提出了一种基于深度学习的3D结构基本模型方法,使用三个几何图来提取3D特征,并使用对比学习来预训练模型。
  • results: 在7个标准测试 benchmark 上比较了3D-Mol 与多种现有基eline(SOTA),在5个标准测试 benchmark 上表现出色。
    Abstract Molecular property prediction offers an effective and efficient approach for early screening and optimization of drug candidates. Although deep learning based methods have made notable progress, most existing works still do not fully utilize 3D spatial information. This can lead to a single molecular representation representing multiple actual molecules. To address these issues, we propose a novel 3D structure-based molecular modeling method named 3D-Mol. In order to accurately represent complete spatial structure, we design a novel encoder to extract 3D features by deconstructing the molecules into three geometric graphs. In addition, we use 20M unlabeled data to pretrain our model by contrastive learning. We consider conformations with the same topological structure as positive pairs and the opposites as negative pairs, while the weight is determined by the dissimilarity between the conformations. We compare 3D-Mol with various state-of-the-art (SOTA) baselines on 7 benchmarks and demonstrate our outstanding performance in 5 benchmarks.

CasIL: Cognizing and Imitating Skills via a Dual Cognition-Action Architecture

  • paper_url: http://arxiv.org/abs/2309.16299
  • repo_url: None
  • paper_authors: Zixuan Chen, Ze Ji, Shuyang Liu, Jing Huo, Yiyu Chen, Yang Gao
  • for: 本研究旨在提高机器人的长期任务完成能力,使其能够有效地模仿人类专家技能。
  • methods: 该研究提出了一种基于人类认知优先的技能学习框架(CasIL),通过人机交互来帮助机器人从视觉示例中学习重要的技能。
  • results: 实验结果表明, compared to其他方法, CasIL在多种长期任务中的机器人技能模仿能力具有竞争力和可靠性。
    Abstract Enabling robots to effectively imitate expert skills in longhorizon tasks such as locomotion, manipulation, and more, poses a long-standing challenge. Existing imitation learning (IL) approaches for robots still grapple with sub-optimal performance in complex tasks. In this paper, we consider how this challenge can be addressed within the human cognitive priors. Heuristically, we extend the usual notion of action to a dual Cognition (high-level)-Action (low-level) architecture by introducing intuitive human cognitive priors, and propose a novel skill IL framework through human-robot interaction, called Cognition-Action-based Skill Imitation Learning (CasIL), for the robotic agent to effectively cognize and imitate the critical skills from raw visual demonstrations. CasIL enables both cognition and action imitation, while high-level skill cognition explicitly guides low-level primitive actions, providing robustness and reliability to the entire skill IL process. We evaluated our method on MuJoCo and RLBench benchmarks, as well as on the obstacle avoidance and point-goal navigation tasks for quadrupedal robot locomotion. Experimental results show that our CasIL consistently achieves competitive and robust skill imitation capability compared to other counterparts in a variety of long-horizon robotic tasks.
    摘要 启用机器人效果地模仿专家技能,如行走、抓取和更多的任务,是长期挑战。现有的机器人学习(IL)方法仍然在复杂任务中表现不佳。在这篇论文中,我们考虑了如何通过人类认知优先级来解决这个挑战。我们准确地将行为扩展到高级认知(高水平)和低级动作(低水平)的双核心架构中,并提出了一种基于人类认知优先级的新型技能IL框架,称为认知动作基于技能学习(CasIL)。这种框架使得机器人代理人能够有效地认识和模仿从原始视觉示例中的关键技能。在高级认知指导低级动作的情况下,CasIL实现了both cognition和action imitation,提供了robustness和可靠性 для整个技能IL过程。我们在MuJoCo和RLBench标准吨量上进行了测试,以及 quadrupedal robot locomotion的障碍物避免和点目标导航任务。实验结果表明,我们的CasIL在多种长期机器人任务中具有竞争力和可靠的技能模仿能力。

A framework for paired-sample hypothesis testing for high-dimensional data

  • paper_url: http://arxiv.org/abs/2309.16274
  • repo_url: None
  • paper_authors: Ioannis Bargiotas, Argyris Kalogeratos, Nicolas Vayatis
  • for: This paper proposes a new approach to multidimensional paired-sample testing, which can handle numerous features and provide accurate results.
  • methods: The proposed approach uses scoring functions produced by decision rules defined by the perpendicular bisecting hyperplanes of the line segments connecting each pair of instances. The optimal scoring function is obtained by the pseudomedian of those rules, which is estimated using the Hodges-Lehmann estimator.
  • results: The proposed approach is shown to have substantial performance gains in testing accuracy compared to traditional multivariate and multiple testing methods, while also providing estimates of each feature’s contribution to the final result.
    Abstract The standard paired-sample testing approach in the multidimensional setting applies multiple univariate tests on the individual features, followed by p-value adjustments. Such an approach suffers when the data carry numerous features. A number of studies have shown that classification accuracy can be seen as a proxy for two-sample testing. However, neither theoretical foundations nor practical recipes have been proposed so far on how this strategy could be extended to multidimensional paired-sample testing. In this work, we put forward the idea that scoring functions can be produced by the decision rules defined by the perpendicular bisecting hyperplanes of the line segments connecting each pair of instances. Then, the optimal scoring function can be obtained by the pseudomedian of those rules, which we estimate by extending naturally the Hodges-Lehmann estimator. We accordingly propose a framework of a two-step testing procedure. First, we estimate the bisecting hyperplanes for each pair of instances and an aggregated rule derived through the Hodges-Lehmann estimator. The paired samples are scored by this aggregated rule to produce a unidimensional representation. Second, we perform a Wilcoxon signed-rank test on the obtained representation. Our experiments indicate that our approach has substantial performance gains in testing accuracy compared to the traditional multivariate and multiple testing, while at the same time estimates each feature's contribution to the final result.
    摘要 传统的多变量检验与多重检验方法在多维设定下存在一些缺陷,尤其是当数据具有大量特征时。一些研究表明,分类准确率可以作为双样本检验的代理。然而,如何将这一策略推广到多维配对样本检验,目前既没有理论基础,也没有实用方案。在本工作中,我们提出一种思路:由连接每对实例的线段的垂直平分超平面所定义的决策规则可以用来生成评分函数;最优评分函数则由这些规则的伪中位数给出,我们通过自然地推广 Hodges-Lehmann 估计量来估计它。据此我们提出了一个两步检验流程:第一步,估计每对实例的垂直平分超平面,并通过 Hodges-Lehmann 估计量得到一个聚合规则,用该聚合规则为配对样本打分,得到一维表示;第二步,在所得表示上进行 Wilcoxon 符号秩检验。实验表明,与传统的多变量检验和多重检验相比,我们的方法在检验准确率上有显著提升,同时还能估计每个特征对最终结果的贡献。
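
A compact sketch of the two-step procedure follows. Each pair's bisecting-hyperplane rule is summarised by its normal direction, the rules are aggregated coordinate-wise with the one-sample Hodges-Lehmann estimator (a simplification of the paper's pseudomedian over rules), and the resulting one-dimensional scores are fed to scipy's Wilcoxon signed-rank test.

```python
import numpy as np
from itertools import combinations_with_replacement
from scipy.stats import wilcoxon

def hodges_lehmann(v):
    """One-sample Hodges-Lehmann estimate: median of all pairwise averages."""
    pairs = [(v[i] + v[j]) / 2 for i, j in combinations_with_replacement(range(len(v)), 2)]
    return np.median(pairs)

def paired_test(X, Y):
    """X, Y: (n_pairs, n_features) paired samples.
    Step 1: the bisecting hyperplane of each pair has normal d_i = y_i - x_i;
    aggregate the directions coordinate-wise with Hodges-Lehmann (simplified
    pseudomedian) and score each pair by projecting its difference on it.
    Step 2: Wilcoxon signed-rank test on the 1-D scores."""
    D = Y - X
    w = np.array([hodges_lehmann(D[:, j]) for j in range(D.shape[1])])
    w = w / (np.linalg.norm(w) + 1e-12)
    scores = D @ w
    stat, pval = wilcoxon(scores)
    contributions = w * D.mean(axis=0)          # rough per-feature contribution
    return pval, contributions

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 10))
    Y = X + rng.normal(size=(40, 10)) * 0.5
    Y[:, 0] += 0.6                              # a real shift in one feature only
    pval, contrib = paired_test(X, Y)
    print(f"p-value: {pval:.4f}")
    print("top contributing feature:", int(np.argmax(np.abs(contrib))))
```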

Hierarchical Network Data Analytics Framework for B5G Network Automation: Design and Implementation

  • paper_url: http://arxiv.org/abs/2309.16269
  • repo_url: None
  • paper_authors: Youbin Jeon, Sangheon Pack
  • for: 支持新服务的更flexible和elastic方式,帮助解决5G模块化网络函数管理中的复杂性
  • methods: 提出了一个分层网络数据分析框架(H-NDAF),将推理任务分配给多个叶子网络数据分析函数(Leaf NWDAF),训练任务进行根网络数据分析函数(Root NWDAF)进行
  • results: 使用开源软件(即 free5GC)进行的大量仿真结果表明,H-NDAF 可以提供足够准确的分析结果,并且比传统 NWDAF 更快地交付分析结果。
    Abstract 5G introduced modularized network functions (NFs) to support emerging services in a more flexible and elastic manner. To mitigate the complexity in such modularized NF management, automated network operation and management are indispensable, and thus the 3rd generation partnership project (3GPP) has introduced a network data analytics function (NWDAF). However, a conventional NWDAF needs to conduct both inference and training tasks, and thus it is difficult to provide the analytics results to NFs in a timely manner for an increased number of analytics requests. In this article, we propose a hierarchical network data analytics framework (H-NDAF) where inference tasks are distributed to multiple leaf NWDAFs and training tasks are conducted at the root NWDAF. Extensive simulation results using open-source software (i.e., free5GC) demonstrate that H-NDAF can provide sufficiently accurate analytics and faster analytics provision time compared to the conventional NWDAF.
    摘要 5G 引入模块化网络功能(NF)以支持出现的服务更加灵活和弹性。为了减少这些模块化 NF 的管理复杂性,自动化网络运维和管理是必要的,因此3GPP 引入了网络数据分析功能(NWDAF)。然而,传统的 NWDAF 需要同时进行推理和训练任务,因此难以在增加数据分析请求后提供分析结果。在本文中,我们提议一种层次网络数据分析框架(H-NDAF),其中推理任务被分配到多个叶 NWDAF,而训练任务则在根 NWDAF 中进行。经过大量的 simulations 结果,我们发现 H-NDAF 可以提供充分的准确性和更快的分析结果提供时间,相比传统的 NWDAF。

Context-Based Tweet Engagement Prediction

  • paper_url: http://arxiv.org/abs/2310.03147
  • repo_url: https://gitlab.com/jovan_ns/2020recsystwitter
  • paper_authors: Jovan Jeromela
  • for: investigate how well context alone may be used to predict tweet engagement likelihood
  • methods: employ the Spark engine on TU Wien’s Little Big Data Cluster to create scalable data preprocessing, feature engineering, feature selection, and machine learning pipelines, and manually create just under 200 additional features to describe tweet context
  • results: features describing users’ prior engagement history and the popularity of hashtags and links in the tweet were the most informative, and factors such as the prediction algorithm, training dataset size, training dataset sampling method, and feature selection significantly affect the results, with context-based models underperforming in terms of the RCE score compared to content-only models and models developed by the Challenge winners.
    Abstract Twitter is currently one of the biggest social media platforms. Its users may share, read, and engage with short posts called tweets. For the ACM Recommender Systems Conference 2020, Twitter published a dataset around 70 GB in size for the annual RecSys Challenge. In 2020, the RecSys Challenge invited participating teams to create models that would predict engagement likelihoods for given user-tweet combinations. The submitted models predicting like, reply, retweet, and quote engagements were evaluated based on two metrics: area under the precision-recall curve (PRAUC) and relative cross-entropy (RCE). In this diploma thesis, we used the RecSys 2020 Challenge dataset and evaluation procedure to investigate how well context alone may be used to predict tweet engagement likelihood. In doing so, we employed the Spark engine on TU Wien's Little Big Data Cluster to create scalable data preprocessing, feature engineering, feature selection, and machine learning pipelines. We manually created just under 200 additional features to describe tweet context. The results indicate that features describing users' prior engagement history and the popularity of hashtags and links in the tweet were the most informative. We also found that factors such as the prediction algorithm, training dataset size, training dataset sampling method, and feature selection significantly affect the results. After comparing the best results of our context-only prediction models with content-only models and with models developed by the Challenge winners, we identified that the context-based models underperformed in terms of the RCE score. This work thus concludes by situating this discrepancy and proposing potential improvements to our implementation, which is shared in a public git repository.
    摘要 推特是目前最大的社交媒体平台之一,其用户可以分享、阅读和参与短消息 called tweets。2020年ACM推荐系统会议上,推特发布了约70GB的数据集,并邀请参与者们创建模型,以预测给定用户-消息组合的参与可能性。提交的模型包括like、回复、转推和引用参与的预测都会被评估基于两个指标:精度-回归曲线面积(PRAUC)和相对杂化率(RCE)。在本毕业论文中,我们使用2020年RecSys挑战的数据集和评估方法,以 investigate how well context alone may be used to predict tweet engagement likelihood。我们使用TU Wien的Little Big Data Cluster上的Spark引擎,创建了可扩展的数据处理、工程、特征选择和机器学习管道。我们手动创建了约200个特征来描述消息上下文。结果显示,用户的前一次参与历史和消息中的话题和链接的流行程度是最有用的特征。我们还发现,预测算法、训练数据集大小、训练数据集采样方法和特征选择会影响结果。在与挑战赛得奖者的模型进行比较后,我们发现context-only模型在RCE指标上表现较差。这项工作因此结束,并提出了可能的改进方案,并在公共Git存储库中分享。

Max-Sliced Mutual Information

  • paper_url: http://arxiv.org/abs/2309.16200
  • repo_url: None
  • paper_authors: Dor Tsur, Ziv Goldfeld, Kristjan Greenewald
  • for: 这篇论文的目的是量化高维Random Variable之间的依赖关系,以便进行统计学学习和推断。
  • methods: 该论文使用了一种可扩展的信息理论方法,称为最大剖分协同信息(mSMI),它等于高维变量的低维投影上的最大共同信息。mSMI在Gaussian情况下退化为CCA。这种方法可以快速计算和可扩展地估算高维变量之间的依赖关系,同时也可以捕捉数据中复杂的依赖关系。
  • results: 该论文的实验结果表明,mSMI在独立测试、多视图学习、公平性检测和生成模型中具有优异表现,并且在计算量方面具有明显的优势。
    Abstract Quantifying the dependence between high-dimensional random variables is central to statistical learning and inference. Two classical methods are canonical correlation analysis (CCA), which identifies maximally correlated projected versions of the original variables, and Shannon's mutual information, which is a universal dependence measure that also captures high-order dependencies. However, CCA only accounts for linear dependence, which may be insufficient for certain applications, while mutual information is often infeasible to compute/estimate in high dimensions. This work proposes a middle ground in the form of a scalable information-theoretic generalization of CCA, termed max-sliced mutual information (mSMI). mSMI equals the maximal mutual information between low-dimensional projections of the high-dimensional variables, which reduces back to CCA in the Gaussian case. It enjoys the best of both worlds: capturing intricate dependencies in the data while being amenable to fast computation and scalable estimation from samples. We show that mSMI retains favorable structural properties of Shannon's mutual information, like variational forms and identification of independence. We then study statistical estimation of mSMI, propose an efficiently computable neural estimator, and couple it with formal non-asymptotic error bounds. We present experiments that demonstrate the utility of mSMI for several tasks, encompassing independence testing, multi-view representation learning, algorithmic fairness, and generative modeling. We observe that mSMI consistently outperforms competing methods with little-to-no computational overhead.
    摘要 高维 Random variable 之间的关系衡量是统计学学习和推断中的中心问题。两种经典方法是Canonical correlation analysis (CCA),它可以找到最大相关的投影后的原始变量,以及Shannon的共同信息 (mutual information),它是一种通用的关系度量,同时也能捕捉高阶关系。但是CCA只能考虑线性关系,可能不够 для某些应用,而共同信息则在高维时常不可计算或估计。这项工作提出了一种可扩展的信息理论基于CCA的方法,称为最大剖分共同信息 (mSMI)。mSMI等于高维变量的低维投影中的最大共同信息,在Gaussian情况下降到了CCA。它同时具有了两种方法的优点:能够捕捉数据中的复杂关系,并且可以快速计算和可扩展地估计。我们证明了mSMI保持了共同信息的有利结构性质,如变量形式和独立性识别。然后,我们研究了mSMI的统计估计,提出了一种高效计算的神经网络估计器,并与非假顺序 bound 相结合。我们在几个任务上进行了实验,包括独立性测试、多视图学习、算法公平和生成模型。我们发现mSMI在这些任务上一般性能更高,而且具有微不足的计算开销。
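
A minimal numerical sketch of the max-sliced idea, assuming for simplicity that 1-D projections are treated as jointly Gaussian (so their mutual information is −½·log(1 − ρ²)); the paper instead uses a trainable neural estimator, and the slice directions here are sampled at random rather than optimized.

```python
import numpy as np

def msmi_random_slices(X, Y, n_slices=500, seed=0):
    """Crude max-sliced MI estimate: maximize MI over random 1-D projections,
    with a Gaussian MI formula on each projected pair (illustrative only)."""
    rng = np.random.default_rng(seed)
    dx, dy = X.shape[1], Y.shape[1]
    best = 0.0
    for _ in range(n_slices):
        a = rng.normal(size=dx); a /= np.linalg.norm(a)
        b = rng.normal(size=dy); b /= np.linalg.norm(b)
        u, v = X @ a, Y @ b
        rho = np.corrcoef(u, v)[0, 1]
        best = max(best, -0.5 * np.log(max(1.0 - rho ** 2, 1e-12)))
    return best  # in nats

# Two 10-D variables sharing one latent direction.
rng = np.random.default_rng(1)
z = rng.normal(size=(2000, 1))
X = np.hstack([z, rng.normal(size=(2000, 9))])
Y = np.hstack([0.8 * z + 0.6 * rng.normal(size=(2000, 1)), rng.normal(size=(2000, 9))])
print(msmi_random_slices(X, Y))
```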

Stackelberg Batch Policy Learning

  • paper_url: http://arxiv.org/abs/2309.16188
  • repo_url: None
  • paper_authors: Wenzhuo Zhou, Annie Qu
  • for: 批处理学习(Batch Reinforcement Learning)任务,lacking exhaustive exploration。
  • methods: 采用游戏理论视角,将策略学习图表化为一个两player总和游戏,采用StackelbergLearner算法,其中领导者player更新基于总导数据,而追随者player进行个体更新和保证转移逻辑一致。
  • results: 提供实例 dependent regret bound,证明StackelbergLearner算法可以学习一个最佳尝试策略,可以与任何比较器策略进行竞争,无需数据覆盖和强 функ数据近似条件。通过广泛的实验,发现StackelbergLearner算法在批处理RL benchmark和实际数据上表现良好或更好。
    Abstract Batch reinforcement learning (RL) defines the task of learning from a fixed batch of data lacking exhaustive exploration. Worst-case optimality algorithms, which calibrate a value-function model class from logged experience and perform some type of pessimistic evaluation under the learned model, have emerged as a promising paradigm for batch RL. However, contemporary works on this stream have commonly overlooked the hierarchical decision-making structure hidden in the optimization landscape. In this paper, we adopt a game-theoretical viewpoint and model the policy learning diagram as a two-player general-sum game with a leader-follower structure. We propose a novel stochastic gradient-based learning algorithm: StackelbergLearner, in which the leader player updates according to the total derivative of its objective instead of the usual individual gradient, and the follower player makes individual updates and ensures transition-consistent pessimistic reasoning. The derived learning dynamic naturally lends StackelbergLearner to a game-theoretic interpretation and provides a convergence guarantee to differentiable Stackelberg equilibria. From a theoretical standpoint, we provide instance-dependent regret bounds with general function approximation, which shows that our algorithm can learn a best-effort policy that is able to compete against any comparator policy that is covered by batch data. Notably, our theoretical regret guarantees only require realizability without any data coverage and strong function approximation conditions, e.g., Bellman closedness, which is in contrast to prior works lacking such guarantees. Through comprehensive experiments, we find that our algorithm consistently performs as well or better as compared to state-of-the-art methods in batch RL benchmark and real-world datasets.
    摘要 批量强化学习(RL)定义为从固定批量数据中学习,缺乏完整的探索。最坏情况优化算法,它们从日志体验中拟合值函数模型类型,并在学习后进行一种类型的悲观评估。在这篇论文中,我们采用了游戏论视角,将策略学习图表作为两个玩家的通用和总和游戏,其中一个是领导者,另一个是追随者。我们提出了一种新的随机梯度学习算法:StackelbergLearner,其中领导者玩家更新根据总对象的梯度而不是各个梯度,而追随者玩家进行个人更新,并确保转移逻辑一致。这种学习动态自然地具有游戏论视角,并提供了对分解 Stackelberg 平衡的启发性证明。从理论上看,我们提供了实例特定的 regret bound,证明我们的算法可以学习一个最大努力策略,可以与任何比较器策略进行竞争,这些策略只需要涵盖批量数据中的一部分。值得注意的是,我们的理论 regret 保证只需要 realizability 和 strong function approximation 条件,而不需要数据覆盖和强函数approximation 条件,这与先前的方法不同。通过对比periment,我们发现我们的算法在批量 RL 标准测试数据集和实际数据中表现了良好的性能。
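
The leader–follower update can be illustrated on a toy bilevel problem. In this hedged sketch (the quadratic objectives are invented for illustration and are not the paper's value-function losses), the follower takes an ordinary partial-gradient step, while the leader differentiates through one unrolled follower step so that its update uses the total derivative of its objective rather than the individual gradient.

```python
import torch

theta = torch.tensor([0.0], requires_grad=True)   # leader (policy) parameter
w = torch.tensor([0.5], requires_grad=True)       # follower (critic) parameter
opt_theta = torch.optim.SGD([theta], lr=0.05)
opt_w = torch.optim.SGD([w], lr=0.1)

def follower_loss(th, ww):      # illustrative quadratic, not the paper's critic loss
    return (ww - 2.0 * th).pow(2).sum()

def leader_loss(th, ww):        # illustrative quadratic, not the paper's policy loss
    return (th - 1.0).pow(2).sum() + 0.5 * (ww * th).sum()

for step in range(200):
    # Follower: individual (partial-gradient) update with the leader held fixed.
    opt_w.zero_grad()
    follower_loss(theta.detach(), w).backward()
    opt_w.step()

    # Leader: total derivative, obtained by differentiating through one
    # unrolled follower step w_resp(theta) = w - eta * dL_f/dw(theta, w).
    opt_theta.zero_grad()
    w0 = w.detach().clone().requires_grad_(True)
    g_w = torch.autograd.grad(follower_loss(theta, w0), w0, create_graph=True)[0]
    w_resp = w0 - 0.1 * g_w
    leader_loss(theta, w_resp).backward()
    opt_theta.step()

print(theta.item(), w.item())
```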

Systematic Sampling and Validation of Machine Learning-Parameterizations in Climate Models

  • paper_url: http://arxiv.org/abs/2309.16177
  • repo_url: https://github.com/jerrylin96/ClimScale
  • paper_authors: Jerry Lin, Sungduk Yu, Tom Beucler, Pierre Gentine, David Walling, Mike Pritchard
  • for: 这个论文主要写于 hybrid physics-machine learning(ML)气象模拟中进展的限制,特别是在同时获得可行性和精度的 coupled(在线)模拟方面。
  • methods: 论文使用的方法包括评估多种机器学习(ML)参数化方法的在线模拟效果,以及对这些方法的评估和优化。
  • results: 研究发现,在线模拟中的模型性能可以通过添加内存、温度湿度输入变换和额外输入变量进行改进。同时,研究还发现了在线模拟错误的大量变化和在线vs. 离线错误统计的不一致现象。这些结果表明,需要评估数百个候选机器学习模型,以检测参数设计选择的效果。
    Abstract Progress in hybrid physics-machine learning (ML) climate simulations has been limited by the difficulty of obtaining performant coupled (i.e. online) simulations. While evaluating hundreds of ML parameterizations of subgrid closures (here of convection and radiation) offline is straightforward, online evaluation at the same scale is technically challenging. Our software automation achieves an order-of-magnitude larger sampling of online modeling errors than has previously been examined. Using this, we evaluate the hybrid climate model performance and define strategies to improve it. We show that model online performance improves when incorporating memory, a relative humidity input feature transformation, and additional input variables. We also reveal substantial variation in online error and inconsistencies between offline vs. online error statistics. The implication is that hundreds of candidate ML models should be evaluated online to detect the effects of parameterization design choices. This is considerably more sampling than tends to be reported in the current literature.
    摘要 “ hybrid physics-machine learning(ML)气象模拟的进步受到了在线(同时)模拟的困难所限制。虽然可以轻松地在离线环境中评估数百个ML参数化的子网格闭合(如风化和辐射),但在同样的大小级别上进行在线评估是技术上困难的。我们的软件自动化实现了在线模型评估中的样本增加,相比之前的评估,样本数量增加了一个数量级。使用这些样本,我们评估了混合气象模型的性能,并定义了改进策略。我们发现,在线模型性能会提高,当将记忆、湿度输入特征变换和额外输入变量添加到模型中时。我们还发现了在线错误的重大变化和在离线vs在线错误统计之间的不一致。这表明需要评估数百个候选机器学习模型,这比现有文献报道的评估范围更多。”

Distill to Delete: Unlearning in Graph Networks with Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2309.16173
  • repo_url: None
  • paper_authors: Yash Sinha, Murari Mandal, Mohan Kankanhalli
  • for: delete information from pre-trained graph neural network (GNN) to comply with data protection regulations and reduce carbon footprint
  • methods: knowledge distillation, model-agnostic approach, dividing and marking complete graph knowledge for retention and deletion, using response-based soft targets and feature-based node embedding, minimizing KL divergence
  • results: surpasses existing methods in various real-world graph datasets by up to $43.1%$ (AUC) in edge and node unlearning tasks, better efficiency, better performance in removing target elements, preservation of performance for the retained elements, zero overhead costs, surpasses state-of-the-art GNNDelete in AUC by $2.4%$, improves membership inference ratio by $+1.3$, requires $10.2\times10^6$ fewer FLOPs per forward pass and up to $\mathbf{3.2}\times$ faster.
    Abstract Graph unlearning has emerged as a pivotal method to delete information from a pre-trained graph neural network (GNN). One may delete nodes, a class of nodes, edges, or a class of edges. An unlearning method enables the GNN model to comply with data protection regulations (i.e., the right to be forgotten), adapt to evolving data distributions, and reduce the GPU-hours carbon footprint by avoiding repetitive retraining. Existing partitioning and aggregation-based methods have limitations due to their poor handling of local graph dependencies and additional overhead costs. More recently, GNNDelete offered a model-agnostic approach that alleviates some of these issues. Our work takes a novel approach to address these challenges in graph unlearning through knowledge distillation, as it distills to delete in GNN (D2DGN). It is a model-agnostic distillation framework where the complete graph knowledge is divided and marked for retention and deletion. It performs distillation with response-based soft targets and feature-based node embedding while minimizing KL divergence. The unlearned model effectively removes the influence of deleted graph elements while preserving knowledge about the retained graph elements. D2DGN surpasses the performance of existing methods when evaluated on various real-world graph datasets by up to $43.1\%$ (AUC) in edge and node unlearning tasks. Other notable advantages include better efficiency, better performance in removing target elements, preservation of performance for the retained elements, and zero overhead costs. Notably, our D2DGN surpasses the state-of-the-art GNNDelete in AUC by $2.4\%$, improves membership inference ratio by $+1.3$, requires $10.2\times10^6$ fewer FLOPs per forward pass and up to $\mathbf{3.2}\times$ faster.
    摘要 图神经网络(GNN)的忘记(unlearning)技术已成为从预训练图神经网络模型中删除信息的重要方法,可以删除节点、某一类节点、边或某一类边。忘记方法可以让 GNN 模型遵循数据保护法规(如“被遗忘权”),适应数据分布的变化,并通过避免重复重训来减少 GPU 时间的碳足迹。现有的分区和聚合方法存在局限,因为它们对局部图依赖处理不佳,并带来额外开销。较新的 GNNDelete 提供了一种与模型无关的方法,缓解了其中一些问题。我们的工作采用一种新思路,通过知识蒸馏来实现图忘记(D2DGN)。这是一个与模型无关的蒸馏框架:完整的图知识被划分并标记为保留与删除两部分;蒸馏同时使用基于响应的软目标和基于特征的节点嵌入,并最小化 KL 散度。经过忘记的模型能够有效去除被删除图元素的影响,同时保留关于保留图元素的知识。在多种真实图数据集上,D2DGN 的表现超过现有方法,在边和节点忘记任务上最高领先 $43.1\%$(AUC)。其他优点包括更高的效率、更好的目标元素移除效果、对保留元素性能的保持以及零额外开销。值得一提的是,D2DGN 在 AUC 上比最新的 GNNDelete 高 $2.4\%$,成员推断比率提高 $+1.3$,每次前向传播少用 $10.2\times10^6$ 次 FLOPs,并快至 $\mathbf{3.2}\times$。
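
A hedged PyTorch sketch of the general distill-to-delete recipe described above (not the authors' code): graph knowledge is split into retain/delete node sets, and the student is trained with a KL term matching the frozen teacher's soft responses on retained nodes, a term pushing deleted nodes toward an uninformative (uniform) response, and an embedding-alignment term on the retained part. The temperature, weights, and uniform-target choice for deletion are assumptions.

```python
import torch
import torch.nn.functional as F

def d2d_style_loss(student_logits, student_emb, teacher_logits, teacher_emb,
                   retain_idx, delete_idx, tau=2.0, alpha=1.0, beta=0.1):
    """Illustrative unlearning-by-distillation loss; hyperparameters are assumptions."""
    # Response-based distillation on retained nodes (KL between softened outputs).
    kl_retain = F.kl_div(
        F.log_softmax(student_logits[retain_idx] / tau, dim=-1),
        F.softmax(teacher_logits[retain_idx] / tau, dim=-1),
        reduction="batchmean") * tau ** 2

    # Deleted nodes: push the student toward an uninformative (uniform) response.
    num_classes = student_logits.size(-1)
    uniform = torch.full_like(student_logits[delete_idx], 1.0 / num_classes)
    kl_delete = F.kl_div(
        F.log_softmax(student_logits[delete_idx], dim=-1),
        uniform, reduction="batchmean")

    # Feature-based distillation: align node embeddings on the retained graph.
    feat = F.mse_loss(student_emb[retain_idx], teacher_emb[retain_idx])

    return kl_retain + alpha * kl_delete + beta * feat

# Toy usage with random tensors standing in for GNN outputs.
N, C, D = 100, 5, 16
s_logits, t_logits = torch.randn(N, C, requires_grad=True), torch.randn(N, C)
s_emb, t_emb = torch.randn(N, D, requires_grad=True), torch.randn(N, D)
retain, delete = torch.arange(0, 90), torch.arange(90, 100)
loss = d2d_style_loss(s_logits, s_emb, t_logits, t_emb, retain, delete)
loss.backward()
```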

A Spectral Approach for Learning Spatiotemporal Neural Differential Equations

  • paper_url: http://arxiv.org/abs/2309.16131
  • repo_url: None
  • paper_authors: Mingtao Xia, Xiangting Li, Qijing Shen, Tom Chou
  • for: Computationally reconstructing differential equations (DEs) from observational data to gain insights into underlying causative mechanisms.
  • methods: Using spectral expansions in space to learn spatiotemporal DEs, without relying on spatial discretization, allowing for long-range, nonlocal spatial interactions on unbounded domains.
  • results: The proposed spectral neural DE learning approach is shown to be as accurate as some of the latest machine learning approaches for learning PDEs operating on bounded domains, and can be applied to a larger class of problems including unbounded DEs and integro-differential equations.
    Abstract Rapidly developing machine learning methods has stimulated research interest in computationally reconstructing differential equations (DEs) from observational data which may provide additional insight into underlying causative mechanisms. In this paper, we propose a novel neural-ODE based method that uses spectral expansions in space to learn spatiotemporal DEs. The major advantage of our spectral neural DE learning approach is that it does not rely on spatial discretization, thus allowing the target spatiotemporal equations to contain long range, nonlocal spatial interactions that act on unbounded spatial domains. Our spectral approach is shown to be as accurate as some of the latest machine learning approaches for learning PDEs operating on bounded domains. By developing a spectral framework for learning both PDEs and integro-differential equations, we extend machine learning methods to apply to unbounded DEs and a larger class of problems.
    摘要 快速发展的机器学习方法激发了从观测数据中计算重建微分方程(DE)的研究兴趣,这有助于进一步理解潜在的因果机制。在这篇论文中,我们提出一种新的基于神经 ODE 的方法,利用空间上的谱展开来学习时空微分方程。我们的谱神经 DE 学习方法不依赖空间离散化,因此目标时空方程可以包含作用于无界空间域上的长程、非局部空间相互作用。实验表明,我们的谱方法与针对有界域 PDE 的最新机器学习方法具有相当的精度。通过建立同时学习 PDE 与积分-微分方程的谱框架,我们将机器学习方法推广到无界 DE 以及更大一类问题。

Compositional Sculpting of Iterative Generative Processes

  • paper_url: http://arxiv.org/abs/2309.16115
  • repo_url: https://github.com/timgaripov/compositional-sculpting
  • paper_authors: Timur Garipov, Sebastiaan De Peuter, Ge Yang, Vikas Garg, Samuel Kaski, Tommi Jaakkola
  • for: 本研究旨在提出一种通用的拟合过程定义方法,以便将多个迭代生成过程组合成更复杂的生成模型。
  • methods: 本研究使用的方法包括:1)定义迭代生成过程的组合方法,2)基于分类器导航的采样方法,3)在GFlowNets和扩散模型中实现compositional sculpting。
  • results: 本研究通过实验表明,通过使用compositional sculpting可以在图像和分子生成任务中实现更高的生成质量和更好的扩散性。
    Abstract High training costs of generative models and the need to fine-tune them for specific tasks have created a strong interest in model reuse and composition. A key challenge in composing iterative generative processes, such as GFlowNets and diffusion models, is that to realize the desired target distribution, all steps of the generative process need to be coordinated, and satisfy delicate balance conditions. In this work, we propose Compositional Sculpting: a general approach for defining compositions of iterative generative processes. We then introduce a method for sampling from these compositions built on classifier guidance. We showcase ways to accomplish compositional sculpting in both GFlowNets and diffusion models. We highlight two binary operations $\unicode{x2014}$ the harmonic mean ($p_1 \otimes p_2$) and the contrast ($p_1 \unicode{x25D1}\,p_2$) between pairs, and the generalization of these operations to multiple component distributions. We offer empirical results on image and molecular generation tasks.
    摘要 高训练成本的生成模型和特定任务的 fine-tuning 已经创造了对模型再利用和组合的强大兴趣。一个关键挑战在组合迭代生成过程,如 GFlowNets 和 diffusion models,是确保所有生成过程步骤协调,并满足细腻的平衡条件。在这项工作中,我们提出了组合雕塑:一种通用的方法来定义生成过程的组合。然后,我们介绍了基于分类指导的抽样方法。我们在 GFlowNets 和 diffusion models 中实现了 compositional sculpting,并提供了对多组件分布的总体化。我们介绍了两种二元操作:harmonic mean ($p_1 \otimes p_2$) 和 contrast ($p_1 \unicode{x25D1}\,p_2$) 之间的对比,以及这些操作的普遍化到多个组件分布。我们提供了对图像和分子生成任务的实验结果。

Comparing Active Learning Performance Driven by Gaussian Processes or Bayesian Neural Networks for Constrained Trajectory Exploration

  • paper_url: http://arxiv.org/abs/2309.16114
  • repo_url: https://github.com/xfyna/al-bnn-gp
  • paper_authors: Sapphira Akins, Frances Zhu
  • for: 这篇论文旨在提高空间探索能力,尤其是在地面探测和采样方面。
  • methods: 论文使用了活动学习算法,其中包括 Gaussian processes 和 Bayesian neural networks。
  • results: 结果表明,使用 Gaussian processes 的活动学习策略可以更快地 converges 到一个准确的模型,并且可以采取更短的轨迹。 Bayesian neural networks 在大数据 regime 下可以更准确地模型环境,但是需要更多的计算资源。
    Abstract Robots with increasing autonomy progress our space exploration capabilities, particularly for in-situ exploration and sampling to stand in for human explorers. Currently, humans drive robots to meet scientific objectives, but depending on the robot's location, the exchange of information and driving commands between the human operator and robot may cause undue delays in mission fulfillment. An autonomous robot encoded with a scientific objective and an exploration strategy incurs no communication delays and can fulfill missions more quickly. Active learning algorithms offer this capability of intelligent exploration, but the underlying model structure varies the performance of the active learning algorithm in accurately forming an understanding of the environment. In this paper, we investigate the performance differences between active learning algorithms driven by Gaussian processes or Bayesian neural networks for exploration strategies encoded on agents that are constrained in their trajectories, like planetary surface rovers. These two active learning strategies were tested in a simulation environment against science-blind strategies to predict the spatial distribution of a variable of interest along multiple datasets. The performance metrics of interest are model accuracy in root mean squared (RMS) error, training time, model convergence, total distance traveled until convergence, and total samples until convergence. Active learning strategies encoded with Gaussian processes require less computation to train, converge to an accurate model more quickly, and propose trajectories of shorter distance, except in a few complex environments in which Bayesian neural networks achieve a more accurate model in the large data regime due to their more expressive functional bases. The paper concludes with advice on when and how to implement either exploration strategy for future space missions.
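
A minimal sketch of a GP-driven active-learning loop for a trajectory-constrained agent on a grid, assuming uncertainty (posterior standard deviation) sampling restricted to the rover's neighboring cells; the paper compares such GP-driven exploration against Bayesian-neural-network variants, which this sketch does not include, and the field and step size here are toy choices.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
field = lambda p: np.sin(p[:, 0]) * np.cos(p[:, 1])      # unknown scalar field (toy)
grid = np.array([[i, j] for i in range(20) for j in range(20)], dtype=float) * 0.5

pos = np.array([0.0, 0.0])
X, y = [pos.copy()], [field(pos[None, :])[0]]
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)

for step in range(30):
    gp.fit(np.array(X), np.array(y))
    # Trajectory constraint: candidate next measurements are the 8-neighborhood only.
    cand = pos + np.array([[dx, dy] for dx in (-0.5, 0, 0.5)
                           for dy in (-0.5, 0, 0.5) if (dx, dy) != (0, 0)])
    cand = cand[(cand >= 0).all(1) & (cand <= 9.5).all(1)]
    _, std = gp.predict(cand, return_std=True)
    pos = cand[np.argmax(std)]               # move toward highest predictive uncertainty
    X.append(pos.copy()); y.append(field(pos[None, :])[0])

_, post_std = gp.predict(grid, return_std=True)
print("mean posterior std after 30 samples:", post_std.mean())
```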

Feature Normalization Prevents Collapse of Non-contrastive Learning Dynamics

  • paper_url: http://arxiv.org/abs/2309.16109
  • repo_url: None
  • paper_authors: Han Bao
  • for: 本文研究了异形学习框架中的对比学习方法,包括对比学习和非对比学习两种方法。
  • methods: 本文使用了对比学习和非对比学习两种方法,并对这两种方法的学习过程进行了动力学分析。
  • results: 研究发现,通过增强数据增强而生成的两个正例会在数据表示空间中受到吸引力,而负例会受到排斥力。但是,通过对比学习和非对比学习两种方法的比较,发现 feature normalization 对学习过程的稳定性具有重要的影响。
    Abstract Contrastive learning is a self-supervised representation learning framework, where two positive views generated through data augmentation are made similar by an attraction force in a data representation space, while a repulsive force makes them far from negative examples. Non-contrastive learning, represented by BYOL and SimSiam, further gets rid of negative examples and improves computational efficiency. While learned representations may collapse into a single point due to the lack of the repulsive force at first sight, Tian et al. (2021) revealed through the learning dynamics analysis that the representations can avoid collapse if data augmentation is sufficiently stronger than regularization. However, their analysis does not take into account commonly-used feature normalization, a normalizer before measuring the similarity of representations, and hence excessively strong regularization may collapse the dynamics, which is an unnatural behavior under the presence of feature normalization. Therefore, we extend the previous theory based on the L2 loss by considering the cosine loss, which involves feature normalization. We show that the cosine loss induces sixth-order dynamics (while the L2 loss induces a third-order one), in which a stable equilibrium dynamically emerges even if there are only collapsed solutions with given initial parameters. Thus, we offer a new understanding that feature normalization plays an important role in robustly preventing the dynamics collapse.
    摘要 “对照式学习是一种自我指导学习框架,其中两个正例通过数据增强生成的观察者通过吸引力在数据表示空间中变相似,而负例则通过排斥力让它们远离负例。非对照式学习,例如BYOL和SimSiam,进一步删除负例,并提高计算效率。然而,学习的表示可能会崩溃到单一点,因为缺乏排斥力。但是,这个问题可以通过调整数据增强的强度来解决。”“然而,这些分析不考虑通常使用的特征Normalizer,即在计算表示之间的相似度时,将特征转换为相同的尺度。因此,过度强制正规化可能会导致动态崩溃,这是一种不自然的行为。因此,我们从L2损失中推广的理论,考虑cosine损失,这个损失函数包含特征Normalizer。我们显示,cosine损失导致第六种动态(而L2损失导致第三种动态),其中稳定的平衡 dynamically emerges,即使只有崩溃的初始参数。因此,我们提出了一个新的理解,即特征Normalizer在避免动态崩溃中扮演了重要的角色。”
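
The role of feature normalization can be made concrete with the standard non-contrastive (SimSiam/BYOL-style) objective: the cosine loss L2-normalizes both the predictor output and the stop-gradient target before comparison, whereas the plain L2 loss compares raw features. The sketch below only contrasts the two losses; the paper's sixth- versus third-order dynamics analysis is not reproduced here, and the toy linear encoder/predictor are assumptions.

```python
import torch
import torch.nn.functional as F

def l2_loss(p, z):
    """L2 (unnormalized) non-contrastive loss with a stop-gradient target."""
    return F.mse_loss(p, z.detach())

def cosine_loss(p, z):
    """Cosine loss: features are L2-normalized before comparison (stop-gradient target)."""
    p = F.normalize(p, dim=-1)
    z = F.normalize(z.detach(), dim=-1)
    return -(p * z).sum(dim=-1).mean()

# Two augmented views passed through encoder f and predictor h (toy linear modules).
f = torch.nn.Linear(32, 16)
h = torch.nn.Linear(16, 16)
x1, x2 = torch.randn(8, 32), torch.randn(8, 32)
z1, z2 = f(x1), f(x2)
loss = 0.5 * (cosine_loss(h(z1), z2) + cosine_loss(h(z2), z1))  # symmetrized
loss.backward()
```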

Differentially Private Secure Multiplication: Hiding Information in the Rubble of Noise

  • paper_url: http://arxiv.org/abs/2309.16105
  • repo_url: None
  • paper_authors: Viveck R. Cadambe, Ateet Devulapalli, Haewon Jeong, Flavio P. Calmon
  • for: 这个论文研究了分布式多方计算中的隐私问题,具体来说是使用谢米尔秘密分享编码策略来实现信息理论上的完美隐私。
  • methods: 该论文使用了谢米尔秘密分享编码策略,但是它允许一定的信息泄露和approximate multiplication,从而在部分诚实节点的情况下保证隐私和准确性。
  • results: 该论文提出了一种紧张性 privacy-accuracy 质量的衡量方法,并在不同层次上具有层次结构的隐私泄露分布,从而实现了在诚实节点少于2t+1的情况下的隐私和准确性。
    Abstract We consider the problem of private distributed multi-party multiplication. It is well-established that Shamir secret-sharing coding strategies can enable perfect information-theoretic privacy in distributed computation via the celebrated algorithm of Ben Or, Goldwasser and Wigderson (the "BGW algorithm"). However, perfect privacy and accuracy require an honest majority, that is, $N \geq 2t+1$ compute nodes are required to ensure privacy against any $t$ colluding adversarial nodes. By allowing for some controlled amount of information leakage and approximate multiplication instead of exact multiplication, we study coding schemes for the setting where the number of honest nodes can be a minority, that is $N< 2t+1.$ We develop a tight characterization privacy-accuracy trade-off for cases where $N < 2t+1$ by measuring information leakage using {differential} privacy instead of perfect privacy, and using the mean squared error metric for accuracy. A novel technical aspect is an intricately layered noise distribution that merges ideas from differential privacy and Shamir secret-sharing at different layers.
    摘要 我团队考虑了分布式多方计算中的私人分享 multiply 问题。已经证明了Shamir的秘密分享编码策略可以在分布式计算中实现完美的信息理论隐私,通过著名的Ben Or、Goldwasser和Wigderson算法(BGW算法)。然而,完美隐私和精度需要一个诚实的多数,即 $N \geq 2t+1$ 计算节点。我们允许一定的控制的信息泄露和approximate multiply instead of exact multiply,研究在计算节点少于 $2t+1$ 的情况下的编码方案。我们开发了一个紧张的隐私准确度质量负担,通过使用 differential privacy 而不是完美隐私来度量信息泄露,并使用 mean squared error 指标来度量准确性。一个新的技术方面是一种复杂层次的噪声分布,这将 Shamir secret-sharing 和分布式隐私的想法 merge 在不同层次。
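
To make the coding-theoretic setup tangible, here is a hedged sketch of real-valued Shamir-style sharing with Lagrange reconstruction, plus an additive noise term of the kind one might layer onto shares when trading perfect secrecy for approximate, differentially private operation; the paper's layered noise construction and the secure multiplication protocol itself are more involved and are not reproduced.

```python
import numpy as np

def shares_real(secret, t, n, rng, coeff_scale=10.0):
    """Real-valued Shamir-style shares: degree-t polynomial with random
    coefficients, evaluated at x = 1..n (a sketch; over the reals privacy is
    only approximate, which is the controlled-leakage regime studied here)."""
    coeffs = np.concatenate([[secret], coeff_scale * rng.standard_normal(t)])
    xs = np.arange(1, n + 1)
    ys = np.polyval(coeffs[::-1], xs)      # sum_k coeffs[k] * x**k
    return xs, ys

def reconstruct_at_zero(xs, ys):
    """Lagrange interpolation at x = 0 recovers the shared value."""
    val = 0.0
    for i, xi in enumerate(xs):
        li = np.prod([(0 - xj) / (xi - xj) for j, xj in enumerate(xs) if j != i])
        val += ys[i] * li
    return val

rng = np.random.default_rng(0)
xs, ys = shares_real(3.5, t=2, n=5, rng=rng)
print(reconstruct_at_zero(xs[:3], ys[:3]))          # exact: 3.5
noisy = ys + 0.01 * rng.standard_normal(ys.shape)   # privacy noise on every share
print(reconstruct_at_zero(xs[:3], noisy[:3]))       # approximate recovery under noise
```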

Task-Oriented Koopman-Based Control with Contrastive Encoder

  • paper_url: http://arxiv.org/abs/2309.16077
  • repo_url: None
  • paper_authors: Xubo Lyu, Hanyang Hu, Seth Siriya, Ye Pu, Mo Chen
  • for: 这 paper 的目的是 simultaneously learn Koopman latent embedding, operator and associated linear controller within an iterative loop, 以便在高维、复杂非线性系统中进行控制。
  • methods: 这 paper 使用 end-to-end reinforcement learning 和 contrastive encoder 来学习 Koopman latent embedding, operator and associated linear controller。
  • results: 通过优先级 task cost 作为控制器学习的主要目标,这 paper 可以减少控制器设计对于准确模型的依赖,从而扩展 Koopman control 到高维、复杂非线性系统,包括像素化enario。
    Abstract We present task-oriented Koopman-based control that utilizes end-to-end reinforcement learning and contrastive encoder to simultaneously learn the Koopman latent embedding, operator and associated linear controller within an iterative loop. By prioritizing the task cost as main objective for controller learning, we reduce the reliance of controller design on a well-identified model, which extends Koopman control beyond low-dimensional systems to high-dimensional, complex nonlinear systems, including pixel-based scenarios.
    摘要 我们提出了任务导向的库曼控制方法,该方法利用端到端学习和对比编码器同时学习库曼嵌入、运算和相关的直线控制器。我们将任务成本作为控制器学习的主要目标,从而减少控制器设计依赖于良好识别模型的需求,因此扩展了库曼控制到高维、复杂非线性系统,包括像素化场景。

Infer and Adapt: Bipedal Locomotion Reward Learning from Demonstrations via Inverse Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.16074
  • repo_url: None
  • paper_authors: Feiyang Wu, Zhaoyuan Gu, Hanran Wu, Anqi Wu, Ye Zhao
  • for: 将步行机器人学习在高度不对称、动态变化的地形上行走是一个具有复杂性的挑战,因为机器人动力学和环境互动的复杂性。
  • methods: 本研究使用了学习从示例(Learning from Demonstrations,LfD)技术,将专家策略传染到机器人上,并运用了最新的反馈学习(Inverse Reinforcement Learning,IRL)技术来解决步行机器人的行走问题。
  • results: 研究发现,透过对专家策略进行学习,可以将机器人的行走性能提高,并且在未见过的地形上保持稳定的行走。这显示了对 reward 学习的适应性和机器人行走的可控性。
    Abstract Enabling bipedal walking robots to learn how to maneuver over highly uneven, dynamically changing terrains is challenging due to the complexity of robot dynamics and interacted environments. Recent advancements in learning from demonstrations have shown promising results for robot learning in complex environments. While imitation learning of expert policies has been well-explored, the study of learning expert reward functions is largely under-explored in legged locomotion. This paper brings state-of-the-art Inverse Reinforcement Learning (IRL) techniques to solving bipedal locomotion problems over complex terrains. We propose algorithms for learning expert reward functions, and we subsequently analyze the learned functions. Through nonlinear function approximation, we uncover meaningful insights into the expert's locomotion strategies. Furthermore, we empirically demonstrate that training a bipedal locomotion policy with the inferred reward functions enhances its walking performance on unseen terrains, highlighting the adaptability offered by reward learning.
    摘要 让双足行走机器人学会在高度不平、动态变化的地形上行进颇具挑战,因为机器人动力学及其与环境的交互十分复杂。近年来基于示范学习的进展已在复杂环境的机器人学习中展现出良好前景。然而,相比对专家策略的模仿学习,足式运动中专家奖励函数的学习研究仍然很少。本文将最新的逆强化学习(IRL)技术用于复杂地形上的双足行走问题:我们提出了学习专家奖励函数的算法,并对学到的函数进行分析;通过非线性函数逼近,我们揭示了专家行走策略中有意义的规律。此外,实验表明,用推断出的奖励函数训练双足行走策略,能提升其在未见地形上的行走性能,体现了奖励学习带来的适应性。

eess.IV - 2023-09-28

Neuromorphic Imaging with Joint Image Deblurring and Event Denoising

  • paper_url: http://arxiv.org/abs/2309.16106
  • repo_url: None
  • paper_authors: Pei Zhang, Haosen Liu, Zhou Ge, Chutian Wang, Edmund Y. Lam
  • for: 增强 neuromorphic 感知器的感知质量和精度,以便更好地进行神经元推理和分析。
  • methods: 提出了一种简单 yet effective的联合算法,可以同时重建锐利图像和噪声Robust事件,并利用事件 regularized prior 提供 auxiliary motion features для隐藏的噪声除去,以及图像梯度作为参照进行神经omorphic 噪声除去。
  • results: 在实际和synthetic 样本上进行了广泛的评估,并显示了我们的方法在 restore 质量和鲁棒性方面具有竞争力,并且在一些具有挑战性的实际场景下具有更高的robustness。
    Abstract Neuromorphic imaging reacts to per-pixel brightness changes of a dynamic scene with high temporal precision and responds with asynchronous streaming events as a result. It also often supports a simultaneous output of an intensity image. Nevertheless, the raw events typically involve a great amount of noise due to the high sensitivity of the sensor, while capturing fast-moving objects at low frame rates results in blurry images. These deficiencies significantly degrade human observation and machine processing. Fortunately, the two information sources are inherently complementary -- events with microsecond temporal resolution, which are triggered by the edges of objects that are recorded in latent sharp images, can supply rich motion details missing from the blurry images. In this work, we bring the two types of data together and propose a simple yet effective unifying algorithm to jointly reconstruct blur-free images and noise-robust events, where an event-regularized prior offers auxiliary motion features for blind deblurring, and image gradients serve as a reference to regulate neuromorphic noise removal. Extensive evaluations on real and synthetic samples present our superiority over other competing methods in restoration quality and greater robustness to some challenging realistic scenarios. Our solution gives impetus to the improvement of both sensing data and paves the way for highly accurate neuromorphic reasoning and analysis.

eess.SP - 2023-09-28

Contrast detection is enhanced by deterministic, high-frequency transcranial alternating current stimulation with triangle and sine waveform

  • paper_url: http://arxiv.org/abs/2310.03763
  • repo_url: None
  • paper_authors: Weronika Potok, Onno van der Groen, Sahana Sivachelvam, Marc Bächinger, Flavio Fröhlich, Laszlo B. Kish, Nicole Wenderoth
  • for: 这个论文旨在探讨 Stochastic Resonance(SR)现象在神经系统中的应用, SR 是一种在非线性系统中增强信号传输的现象,可以通过添加随机噪声来实现。
  • methods: 这个论文使用了 transcranial random noise stimulation (tRNS) 和 transcranial alternating current stimulation (tACS) 两种方法来实现 SR。
  • results: 研究发现,使用 tACS 和 tRNS 可以降低视觉检测阈值,并且两种方法的效果相当。这表明,SR 可以通过添加 deterministic 信号来实现,而不仅仅是随机噪声。
    Abstract Stochastic Resonance (SR) describes a phenomenon where an additive noise (stochastic carrier-wave) enhances the signal transmission in a nonlinear system. In the nervous system, nonlinear properties are present from the level of single ion channels all the way to perception and appear to support the emergence of SR. For example, SR has been repeatedly demonstrated for visual detection tasks, also by adding noise directly to cortical areas via transcranial random noise stimulation (tRNS). When dealing with nonlinear physical systems, it has been suggested that resonance can be induced not only by adding stochastic signals (i.e., noise) but also by adding a large class of signals that are not stochastic in nature which cause "deterministic amplitude resonance" (DAR). Here we mathematically show that high-frequency, deterministic, periodic signals can yield resonance-like effects with linear transfer and infinite signal-to-noise ratio at the output. We tested this prediction empirically and investigated whether non-random, high-frequency, transcranial alternating current stimulation applied to visual cortex could induce resonance-like effects and enhance performance of a visual detection task. We demonstrated in 28 participants that applying 80 Hz triangular-waves or sine-waves with tACS reduced visual contrast detection threshold for optimal brain stimulation intensities. The influence of tACS on contrast sensitivity was equally effective to tRNS-induced modulation, demonstrating that both tACS and tRNS can reduce contrast detection thresholds. Our findings suggest that a resonance-like mechanism can also emerge when deterministic electrical waveforms are applied via tACS.

T1/T2 relaxation temporal modelling from accelerated acquisitions using a Latent Transformer

  • paper_url: http://arxiv.org/abs/2309.16853
  • repo_url: None
  • paper_authors: Fanwen Wang, Michael Tanzer, Mengyun Qiao, Wenjia Bai, Daniel Rueckert, Guang Yang, Sonia Nielles-Vallespin
  • for: 这个论文旨在提高心肺成像中的速度和精度,使得它们能够广泛应用于临床。
  • methods: 该论文使用深度学习方法,特别是Latent Transformer模块,来模型 Parameterized time frames 之间的关系,从而提高从受限样本数据中的重建。
  • results: 论文中的结果表明,通过Explicitly incorporating time dynamics,模型可以recover higher fidelity T1和T2 mapping,并且不受artefacts的干扰。这个研究证明了在量子MRI中的时间模型非常重要。
    Abstract Quantitative cardiac magnetic resonance T1 and T2 mapping enable myocardial tissue characterisation but the lengthy scan times restrict their widespread clinical application. We propose a deep learning method that incorporates a time dependency Latent Transformer module to model relationships between parameterised time frames for improved reconstruction from undersampled data. The module, implemented as a multi-resolution sequence-to-sequence transformer, is integrated into an encoder-decoder architecture to leverage the inherent temporal correlations in relaxation processes. The presented results for accelerated T1 and T2 mapping show the model recovers maps with higher fidelity by explicit incorporation of time dynamics. This work demonstrates the importance of temporal modelling for artifact-free reconstruction in quantitative MRI.
    摘要 定量心脏磁共振 T1 与 T2 映射能够表征心肌组织,但冗长的扫描时间限制了其临床普及。我们提出一种深度学习方法,引入时间依赖的 Latent Transformer 模块,对参数化时间帧之间的关系建模,从而改进欠采样数据的重建。该模块以多分辨率序列到序列 Transformer 的形式实现,并嵌入编码器-解码器架构中,以利用弛豫过程固有的时间相关性。加速 T1 与 T2 映射的结果表明,通过显式引入时间动态,模型能够恢复保真度更高的参数图。这项工作说明了时间建模对定量 MRI 无伪影重建的重要性。

Business Model Canvas for Micro Operators in 5G Coopetitive Ecosystem

  • paper_url: http://arxiv.org/abs/2309.16845
  • repo_url: None
  • paper_authors: Javane Rostampoor, Roghayeh Joda, Mohammad Dindoost
  • for: 本研究旨在提供5G微型运营商业模式框架,以帮助新的5G业务创造价值。
  • methods: 本研究采用了商业模式canvas(BMC)的概念,以分析5G微型运营商业模式的发展。
  • results: 研究发现,5G微型运营商业模式框架可以帮助新的5G业务创造价值,并且可以在5G协同环境中实现更好的覆盖率和容量。
    Abstract In order to address the need for more capacity and coverage in the 5th generation (5G) of wireless networks, ultra-dense wireless networks are introduced which mainly consist of indoor small cells. This new architecture has paved the way for the advent of a new concept called Micro Operator. A micro operator is an entity that provides connections and local 5G services to the customers and relies on local frequency resources. We discuss business models of micro operators in a 5G coopetitive environment and develop a framework to indicate the business model canvas (BMC) of this new concept. Providing BMC for new businesses is a strategic approach to offer value to customers. In this research study, BMC and its elements are introduced and explained for 5G micro operators.
    摘要 为了满足5G网络的容量和覆盖需求,ultra-dense无线网络被引入,主要由室内小终端组成。这新的架构为微运营者的出现提供了方便。微运营者是一个为客户提供连接和本地5G服务的实体,并且依靠本地频率资源。我们研究了5G协作环境中微运营者的业务模式,并开发了一个框架来指示微运营者的业务模型Canvas(BMC)。为新的业务提供BMC是一种策略性的方法,以便为客户提供价值。在这项研究中,BMC和其元素被介绍和解释了5G微运营者。

Wi-Fi 8: Embracing the Millimeter-Wave Era

  • paper_url: http://arxiv.org/abs/2309.16813
  • repo_url: None
  • paper_authors: Xiaoqian Liu, Tingwei Chen, Yuhan Dong, Zhi Mao, Ming Gan, Xun Yang, Jianmin Lu
  • for: 这篇论文探讨了未来的Wi-Fi 8技术,尤其是兆米波技术的应用。
  • methods: 该论文通过 simulations 提供了一个全面的未来Wi-Fi 8技术的视角,并且研究了兆米波技术的可能性。
  • results: 模拟结果表明,兆米波技术可以实现显著的性能提升,即使硬件障碍存在。
    Abstract With the increasing demands in communication, Wi-Fi technology is advancing towards its next generation. Building on the foundation of Wi-Fi 7, millimeter-wave technology is anticipated to converge with Wi-Fi 8 in the near future. In this paper, we look into the millimeter-wave technology and other potential feasible features, providing a comprehensive perspective on the future of Wi-Fi 8. Our simulation results demonstrate that significant performance gains can be achieved, even in the presence of hardware impairments.
    摘要 随着通信需求的增长,Wi-Fi技术正在迈向下一代。基于Wi-Fi 7的基础上, millimeter-wave技术预计将与Wi-Fi 8相结合在不远的未来。本文将 millimeter-wave技术和其他可能实现的特性进行全面探讨,为Wi-Fi 8的未来提供全面的视角。我们的 simulations 结果表明,即使硬件障碍存在,也可以实现显著的性能提升。

  • paper_url: http://arxiv.org/abs/2309.16628
  • repo_url: None
  • paper_authors: Charles E. Thornton, Evan Allen, Evar Jones, Daniel Jakubisin, Fred Templin, Lingjia Liu
  • for: 本文研究了5G和以后的副链(SL)通信可以支持多跳策略网络。
  • methods: 本文首先提供了3GPP SL标准化活动的技术和历史概述,然后考虑了在战略网络中的应用问题。文章考虑了许多多跳路由技术,这些技术预期会对SL启用多跳策略网络中很有用。文章还考虑了开源工具,可以用于网络模拟。
  • results: 本文讨论了5G SL启用多跳策略网络中的一些问题,如RLS感知和定位的 инте格ция,以及新的机器学习工具,如联邦学习和分布式学习,可以用于资源分配和路由问题。文章 conclude by summarizing recent developments in the 5G SL literature and provide guidelines for future research。
    Abstract This work investigates the potential of 5G and beyond sidelink (SL) communication to support multi-hop tactical networks. We first provide a technical and historical overview of 3GPP SL standardization activities, and then consider applications to current problems of interest in tactical networking. We consider a number of multi-hop routing techniques which are expected to be of interest for SL-enabled multi-hop tactical networking and examine open-source tools useful for network emulation. Finally, we discuss relevant research directions which may be of interest for 5G SL-enabled tactical communications, namely the integration of RF sensing and positioning, as well as emerging machine learning tools such as federated and decentralized learning, which may be of great interest for resource allocation and routing problems that arise in tactical applications. We conclude by summarizing recent developments in the 5G SL literature and provide guidelines for future research.
    摘要 本研究探讨了 5G 及未来侧链路(SL)通信支持多跳战术网络的潜力。我们首先从技术和历史角度概述 3GPP 的 SL 标准化工作,随后讨论其在当前战术组网问题中的应用。我们梳理了若干预计将在 SL 赋能的多跳战术网络中发挥作用的多跳路由技术,并介绍了可用于网络仿真的开源工具。最后,我们讨论了与 5G SL 战术通信相关的研究方向,包括射频感知与定位的集成,以及联邦学习、去中心化学习等新兴机器学习工具,它们对战术应用中的资源分配与路由问题可能具有重要意义。文末总结了 5G SL 文献的最新进展,并给出了未来研究的指导。

HyperLISTA-ABT: An Ultra-light Unfolded Network for Accurate Multi-component Differential Tomographic SAR Inversion

  • paper_url: http://arxiv.org/abs/2309.16468
  • repo_url: None
  • paper_authors: Kun Qian, Yuanyuan Wang, Peter Jung, Yilei Shi, Xiao Xiang Zhu
  • for: 提高深度学习基于迭代算法的四维影像重建(4D)精度和效率。
  • methods: 提出了一种高效精度的HyperLISTA-ABT算法,使用分析方式确定网络参数,并实现了 Adaptive Blockwise Thresholding 技术,以提高全面阈值处理。
  • results: 通过实验和实际数据测试,显示HyperLISTA-ABT可以在有限的计算资源和时间下获得高质量的4D点云重建。
    Abstract Deep neural networks based on unrolled iterative algorithms have achieved remarkable success in sparse reconstruction applications, such as synthetic aperture radar (SAR) tomographic inversion (TomoSAR). However, the currently available deep learning-based TomoSAR algorithms are limited to three-dimensional (3D) reconstruction. The extension of deep learning-based algorithms to four-dimensional (4D) imaging, i.e., differential TomoSAR (D-TomoSAR) applications, is impeded mainly due to the high-dimensional weight matrices required by the network designed for D-TomoSAR inversion, which typically contain millions of freely trainable parameters. Learning such huge number of weights requires an enormous number of training samples, resulting in a large memory burden and excessive time consumption. To tackle this issue, we propose an efficient and accurate algorithm called HyperLISTA-ABT. The weights in HyperLISTA-ABT are determined in an analytical way according to a minimum coherence criterion, trimming the model down to an ultra-light one with only three hyperparameters. Additionally, HyperLISTA-ABT improves the global thresholding by utilizing an adaptive blockwise thresholding scheme, which applies block-coordinate techniques and conducts thresholding in local blocks, so that weak expressions and local features can be retained in the shrinkage step layer by layer. Simulations were performed and demonstrated the effectiveness of our approach, showing that HyperLISTA-ABT achieves superior computational efficiency and with no significant performance degradation compared to state-of-the-art methods. Real data experiments showed that a high-quality 4D point cloud could be reconstructed over a large area by the proposed HyperLISTA-ABT with affordable computational resources and in a fast time.
    摘要 深度神经网络基于迭代算法已经在稀疏重建应用中获得了惊人的成功,如Synthetic Aperture Radar(SAR)tomographic逆转(TomoSAR)。然而,目前可用的深度学习基于算法只能处理三维(3D)重建。将深度学习基于算法扩展到四维(4D)成像,即差分Tomography(D-TomoSAR)应用,受限于高维度权重矩阵需要的深度学习模型中的大量自由调节参数。学习这么多参数需要极大的训练样本数和巨大的内存压力,导致训练时间过长。为解决这个问题,我们提出了一种高效和准确的算法called HyperLISTA-ABT。HyperLISTA-ABT中的权重由分析方式决定,以最小干扰 criterion 来确定,因此模型的参数减少到了 ultra-light 的三个超参数。此外,HyperLISTA-ABT还改进了全球阈值处理,通过使用adaptive blockwise阈值处理方案,在本地块中进行阈值处理,以保留弱表达和本地特征在压缩步骤中。我们的方法通过实验表明,HyperLISTA-ABT可以实现高效的计算和快速的训练,而无需极大的训练样本数和内存压力。真实数据实验也表明,通过我们的方法可以在大面积的4D点云重建中获得高质量的重建结果,并且可以在有限的计算资源和快速的时间内完成。
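
A hedged numpy sketch of the core mechanism: an unrolled ISTA-style iteration for sparse recovery in which the soft-threshold level is chosen per local block of coefficients (here via a per-block quantile) rather than globally. The analytical parameter selection via the minimum-coherence criterion and the TomoSAR-specific modeling are not reproduced, and the blockwise quantile rule is an illustrative stand-in for the paper's adaptive blockwise thresholding (ABT).

```python
import numpy as np

def soft(x, thr):
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)

def ista_blockwise(A, y, n_iters=100, block=16, keep_frac=0.1):
    """ISTA with per-block adaptive thresholds (illustrative ABT stand-in)."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        r = x + (A.T @ (y - A @ x)) / L      # gradient step
        for s in range(0, len(r), block):    # blockwise thresholding
            blk = r[s:s + block]
            thr = np.quantile(np.abs(blk), 1.0 - keep_frac)  # keep strongest entries per block
            x[s:s + block] = soft(blk, thr)
    return x

rng = np.random.default_rng(0)
n, m = 128, 64
x_true = np.zeros(n); x_true[rng.choice(n, 8, replace=False)] = 3.0 * rng.normal(size=8)
A = rng.normal(size=(m, n)) / np.sqrt(m)
y = A @ x_true + 0.01 * rng.normal(size=m)
x_hat = ista_blockwise(A, y)
print("strongest recovered indices:", np.argsort(-np.abs(x_hat))[:8])
```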

Feed-forward and recurrent inhibition for compressing and classifying high dynamic range biosignals in spiking neural network architectures

  • paper_url: http://arxiv.org/abs/2309.16425
  • repo_url: None
  • paper_authors: Rachel Sava, Elisa Donati, Giacomo Indiveri
  • for: This paper aims to address the challenge of compressing high-dynamic range biosignals in spiking neural network (SNN) architectures.
  • methods: The authors propose a biologically-inspired strategy that utilizes three adaptation mechanisms found in the brain: spike-frequency adaptation, feed-forward inhibitory connections, and Excitatory-Inhibitory (E-I) balance.
  • results: The authors validate the approach in silico using a simple network applied to a gesture classification task from surface EMG recordings.
    Abstract Neuromorphic processors that implement Spiking Neural Networks (SNNs) using mixed-signal analog/digital circuits represent a promising technology for closed-loop real-time processing of biosignals. As in biology, to minimize power consumption, the silicon neurons' circuits are configured to fire with a limited dynamic range and with maximum firing rates restricted to a few tens or hundreds of Herz. However, biosignals can have a very large dynamic range, so encoding them into spikes without saturating the neuron outputs represents an open challenge. In this work, we present a biologically-inspired strategy for compressing this high-dynamic range in SNN architectures, using three adaptation mechanisms ubiquitous in the brain: spike-frequency adaptation at the single neuron level, feed-forward inhibitory connections from neurons belonging to the input layer, and Excitatory-Inhibitory (E-I) balance via recurrent inhibition among neurons in the output layer. We apply this strategy to input biosignals encoded using both an asynchronous delta modulation method and an energy-based pulse-frequency modulation method. We validate this approach in silico, simulating a simple network applied to a gesture classification task from surface EMG recordings.
    摘要 神经形态处理器实现基于异步 delta 调制和能量基本的脉冲频率调制的脑神经网络(SNN),通过混合 analog/digital 电路实现closed-loop实时处理生物信号。在生物体内,为了减少能耗,silicon neuron circuit 配置为在有限的动态范围内发射,最大发射频率限制在一些百或上百 Herz 内。但生物信号可以有非常大的动态范围,因此将它们编码成脉冲无需满足 neuron 输出的限制是一个开放的挑战。在这种工作中,我们提出了基于生物体内的三种适应机制来压缩高动态范围的 SNN 建筑,包括单个 neuron 层的脉冲频率适应、输入层的前向抑制连接和输出层的律动抑制。我们将这些机制应用到输入的生物信号,使用异步 delta 调制和能量基本的脉冲频率调制方法来编码。我们在 silico 中验证了这种方法,对一个简单的网络进行了surface EMG 记录的手势识别任务。
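
A minimal numpy sketch of the first two adaptation mechanisms on a single leaky integrate-and-fire neuron: spike-frequency adaptation (an adaptation current that grows with each output spike) and a feed-forward inhibitory copy of the input that subtracts a low-pass version of the drive. Time constants and gains are invented for illustration, and the recurrent E-I balance of the output layer is omitted.

```python
import numpy as np

dt, T = 1e-3, 2.0
t = np.arange(0, T, dt)
drive = 2.0 * (1 + np.sign(np.sin(2 * np.pi * 1.0 * t)))     # square-wave input drive

v, w, inh = 0.0, 0.0, 0.0
tau_m, tau_w, tau_i = 20e-3, 200e-3, 50e-3
b, g_ffi, v_th = 0.3, 0.8, 1.0
spikes = []

for k, I in enumerate(drive):
    inh += dt / tau_i * (I - inh)            # feed-forward inhibition tracks the input
    I_eff = I - g_ffi * inh                  # subtractive inhibition compresses the drive
    v += dt / tau_m * (-v + I_eff - w)       # leaky integration minus adaptation current
    w += dt / tau_w * (-w)                   # adaptation current decays...
    if v >= v_th:                            # ...and jumps at every output spike
        spikes.append(k * dt)
        v = 0.0
        w += b                               # spike-frequency adaptation
print(f"{len(spikes)} spikes, mean rate {len(spikes) / T:.1f} Hz")
```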

A Universal Framework for Holographic MIMO Sensing

  • paper_url: http://arxiv.org/abs/2309.16389
  • repo_url: None
  • paper_authors: Charles Vanwynsberghe, Jiguang He, Mérouane Debbah
  • for: 这篇论文旨在解决具有不规则形状的连续天线感知空间的问题。
  • methods: 该论文提出了一种通用框架,可以无论天线的形状,准确地确定天线的感知空间。这种方法基于采样场的几何分析,并且可以在空间和频率域上彰显sampled场的特性。
  • results: 实验结果表明,该方法可以准确地估算不同形状天线的度量域,并且可以扩展到真实的具有折叠性的天线。
    Abstract This paper addresses the sensing space identification of arbitrarily shaped continuous antennas. In the context of holographic multiple-input multiple-output (MIMO), a.k.a. large intelligent surfaces, these antennas offer benefits such as super-directivity and near-field operability. The sensing space reveals two key aspects: (a) its dimension specifies the maximally achievable spatial degrees of freedom (DoFs), and (b) the finite basis spanning this space accurately describes the sampled field. Earlier studies focus on specific geometries, bringing forth the need for extendable analysis to real-world conformal antennas. Thus, we introduce a universal framework to determine the antenna sensing space, regardless of its shape. The findings underscore both spatial and spectral concentration of sampled fields to define a generic eigenvalue problem of Slepian concentration. Results show that this approach precisely estimates the DoFs of well-known geometries, and verify its flexible extension to conformal antennas.
    摘要 本文研究任意形状连续天线的感知空间辨识问题。在全息 MIMO(又称大型智能表面)的背景下,这类天线具有超指向性和近场工作能力等优势。感知空间揭示两个关键方面:(a) 其维数给出可达到的最大空间自由度(DoF);(b) 张成该空间的有限基可以准确描述被采样的场。既有研究多针对特定几何形状,因而需要可推广到实际共形天线的分析方法。为此,我们提出一个与天线形状无关的通用框架来确定天线的感知空间。结果表明,采样场在空间域与谱域上的集中性可归结为一个一般化的 Slepian 集中特征值问题。数值结果显示,该方法能精确估计常见几何形状的自由度,并验证了其向共形天线的灵活推广。
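
The generic eigenvalue problem can be illustrated in one dimension, where the concentration operator for a bandlimited field observed over an aperture is the classic prolate (sinc-kernel) problem and the number of near-unity eigenvalues approximates the spatial DoFs; this is a hedged toy for a linear aperture only, not the paper's framework for arbitrarily shaped surfaces.

```python
import numpy as np

# 1-D aperture of length L_ap sampled densely; field bandlimited to spatial
# frequency W (cycles per unit length). Classic DoF estimate: about 2*W*L_ap + 1.
L_ap, W, n = 4.0, 1.5, 400
x = np.linspace(0, L_ap, n)
dx = x[1] - x[0]
X1, X2 = np.meshgrid(x, x)
K = 2 * W * np.sinc(2 * W * (X1 - X2)) * dx      # discretized sinc concentration kernel

eigvals = np.linalg.eigvalsh(K)[::-1]            # sorted in descending order
dof = int(np.sum(eigvals > 0.5))                 # count of well-concentrated modes
print("estimated DoF:", dof, "  (2*W*L_ap + 1 =", 2 * W * L_ap + 1, ")")
```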

Convex Estimation of Sparse-Smooth Power Spectral Densities from Mixtures of Realizations with Application to Weather Radar

  • paper_url: http://arxiv.org/abs/2309.16215
  • repo_url: None
  • paper_authors: Hiroki Kuroda, Daichi Kitahara, Eiichi Yoshikawa, Hiroshi Kikuchi, Tomoo Ushio
  • for: 估计复杂 random 过程中 sparse 和 smooth power spectral densities (PSDs)
  • methods: 使用 convex optimization 估计 PSDs
  • results: 提高估计精度 compared to 现有 sparse estimation models
    Abstract In this paper, we propose a convex optimization-based estimation of sparse and smooth power spectral densities (PSDs) of complex-valued random processes from mixtures of realizations. While the PSDs are related to the magnitude of the frequency components of the realizations, it has been a major challenge to exploit the smoothness of the PSDs because penalizing the difference of the magnitude of the frequency components results in a nonconvex optimization problem that is difficult to solve. To address this challenge, we design the proposed model that jointly estimates the complex-valued frequency components and the nonnegative PSDs, which are respectively regularized to be sparse and sparse-smooth. By penalizing the difference of the nonnegative variable that estimates the PSDs, the proposed model can enhance the smoothness of the PSDs via convex optimization. Numerical experiments on the phased array weather radar, an advanced weather radar system, demonstrate that the proposed model achieves superior estimation accuracy compared to existing sparse estimation models, regardless of whether they are combined with a smoothing technique as a post-processing step or not.
    摘要 本文提出一种基于凸优化的方法,用于从多次实现的混合观测中估计复值随机过程的稀疏且平滑的功率谱密度(PSD)。虽然 PSD 与各次实现的频率成分幅值相关,但直接惩罚频率成分幅值之差会导致难以求解的非凸优化问题,因此利用 PSD 的平滑性一直是一大难题。为此,我们设计的模型同时估计复值频率成分和非负的 PSD,并分别对二者施加稀疏与稀疏-平滑正则化;通过惩罚估计 PSD 的非负变量之差,模型能够在凸优化框架内增强 PSD 的平滑性。在相控阵天气雷达(一种先进的天气雷达系统)上的数值实验表明,无论现有稀疏估计模型是否在后处理阶段结合平滑技术,所提模型的估计精度均更优。

Hybrid Digital-Wave Domain Channel Estimator for Stacked Intelligent Metasurface Enabled Multi-User MISO Systems

  • paper_url: http://arxiv.org/abs/2309.16204
  • repo_url: None
  • paper_authors: Qurrat-Ul-Ain Nadeem, Jiancheng An, Anas Chaaban
  • for: 这个论文主要是为了解决堆叠智能元素(SIM)激发的通信系统中的通道估计(CE)问题。
  • methods: 该论文提出了一种新的混合数字波域频率域通道估计方法,其中收到的训练符号首先在SIM层中进行了波域处理,然后在数字域中进行了加工。
  • results: 该方法可以在具有限制数量的 радио频率(RF)链的SIM激发通信系统中实现高精度的通道估计,并且可以降低训练负担。
    Abstract Stacked intelligent metasurface (SIM) is an emerging programmable metasurface architecture that can implement signal processing directly in the electromagnetic wave domain, thereby enabling efficient implementation of ultra-massive multiple-input multiple-output (MIMO) transceivers with a limited number of radio frequency (RF) chains. Channel estimation (CE) is challenging for SIM-enabled communication systems due to the multi-layer architecture of SIM, and because we need to estimate large dimensional channels between the SIM and users with a limited number of RF chains. To efficiently solve this problem, we develop a novel hybrid digital-wave domain channel estimator, in which the received training symbols are first processed in the wave domain within the SIM layers, and then processed in the digital domain. The wave domain channel estimator, parametrized by the phase shifts applied by the meta-atoms in all layers, is optimized to minimize the mean squared error (MSE) using a gradient descent algorithm, within which the digital part is optimally updated. For an SIM-enabled multi-user system equipped with 4 RF chains and a 6-layer SIM with 64 meta-atoms each, the proposed estimator yields an MSE that is very close to that achieved by fully digital CE in a massive MIMO system employing 64 RF chains. This high CE accuracy is achieved at the cost of a training overhead that can be reduced by exploiting the potential low rank of channel correlation matrices.
    摘要 堆叠智能表面(SIM)是一种emerging的可编程表面建筑,可以直接在电磁波频率频谱中实现信号处理,从而实现高效的多输入多出力(MIMO)接收机器系统的实现,只需要有限的 радио频率(RF)链。但是,频率链的数量不够,使得频率链数量的限制会导致通道估计(CE)变得困难。为解决这个问题,我们开发了一种新的混合式数字波域频率域通道估计器,其中接收训练符号被首先处理在SIM层中的波域内,然后在数字域内进行处理。波域频率域估计器,由SIM层中所有元atom的阶梯shift参数化,使其最小化均方误差(MSE),并使用梯度下降算法优化。对于装备4个RF链和6层SIM的多用户系统,我们的估计器可以与完全数字CE在巨量MIMO系统使用64个RF链的MSE准确。这高度的CE准确性是在训练负担的代价下实现的,并且可以通过利用频率征的低级别相关性来减少训练负担。

Adaptive Real-Time Numerical Differentiation with Variable-Rate Forgetting and Exponential Resetting

  • paper_url: http://arxiv.org/abs/2309.16159
  • repo_url: None
  • paper_authors: Shashank Verma, Brian Lai, Dennis S. Bernstein
  • for: 这个论文旨在解决随时间变化的感知器噪声的问题,提出了基于adaptive实时数值 differentiating和可变速率忘却的AISE方法。
  • methods: 该论文使用了adaptive实时数值 differentiating和可变速率忘却的AISE方法来解决随时间变化的感知器噪声问题。
  • results: 该论文的实验结果表明,基于AISE方法的适应式实时数值 differentiating可以更好地适应随时间变化的感知器噪声,并且可以更快地响应 changing 噪声特性。
    Abstract Digital PID control requires a differencing operation to implement the D gain. In order to suppress the effects of noisy data, the traditional approach is to filter the data, where the frequency response of the filter is adjusted manually based on the characteristics of the sensor noise. The present paper considers the case where the characteristics of the sensor noise change over time in an unknown way. This problem is addressed by applying adaptive real-time numerical differentiation based on adaptive input and state estimation (AISE). The contribution of this paper is to extend AISE to include variable-rate forgetting with exponential resetting, which allows AISE to more rapidly respond to changing noise characteristics while enforcing the boundedness of the covariance matrix used in recursive least squares.
    摘要 数字PID控制需要差分运算来实现D增益。为抑制噪声数据的影响,传统做法是对数据进行滤波,并根据传感器噪声的特性手动调整滤波器的频率响应。本文考虑传感器噪声特性随时间以未知方式变化的情形,采用基于自适应输入与状态估计(AISE)的自适应实时数值微分来解决该问题。本文的贡献在于将AISE扩展为带指数重置的可变遗忘率,使AISE能更快地响应变化的噪声特性,同时保证递归最小二乘中协方差矩阵的有界性。
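
The abstract describes extending recursive least squares (RLS) with variable-rate forgetting and exponential resetting so the covariance stays bounded while tracking changing noise. The exact AISE update is not reproduced here; the sketch below is a generic RLS-based differentiator stand-in in which the forgetting factor varies with the innovation and the covariance is blended toward a fixed bound. The parameters `lam_min`, `lam_max`, `reset_rate`, and the local linear model are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def rls_differentiator(t, y, lam_min=0.9, lam_max=0.99,
                       p_init=1e3, p_max=1e4, reset_rate=0.01):
    """Estimate dy/dt online by fitting y ~ a + b*t with forgetting-factor RLS.

    Variable-rate forgetting: large innovations push the forgetting factor
    toward lam_min (forget faster); small innovations push it toward lam_max.
    A crude exponential blend toward p_max*I keeps the covariance bounded,
    a simplified stand-in for the paper's exponential resetting.
    """
    theta = np.zeros(2)               # [offset, slope]; slope is the derivative estimate
    P = p_init * np.eye(2)            # RLS covariance
    dy = np.zeros_like(y, dtype=float)
    for k in range(len(y)):
        phi = np.array([1.0, t[k]])                            # regressor
        e = y[k] - phi @ theta                                  # innovation
        lam = lam_max - (lam_max - lam_min) * np.tanh(abs(e))   # variable forgetting
        K = P @ phi / (lam + phi @ P @ phi)                     # gain
        theta = theta + K * e
        P = (P - np.outer(K, phi @ P)) / lam
        P = (1 - reset_rate) * P + reset_rate * p_max * np.eye(2)  # bounded reset
        dy[k] = theta[1]
    return dy

if __name__ == "__main__":
    t = np.linspace(0, 2, 400)
    y = np.sin(2 * np.pi * t) + 0.05 * np.random.randn(len(t))
    # Derivative estimate near t = 2; true value is 2*pi*cos(4*pi) ~ 6.28, up to lag and noise.
    print(rls_differentiator(t, y)[-5:])
```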

cs.SD - 2023-09-27

Does Single-channel Speech Enhancement Improve Keyword Spotting Accuracy? A Case Study

  • paper_url: http://arxiv.org/abs/2309.16060
  • repo_url: None
  • paper_authors: Avamarie Brueggeman, Takuya Higuchi, Masood Delfarah, Stephen Shum, Vineet Garg
  • for: 研究单通道语音增强能否提高关键词检测(KWS)的准确率
  • methods: 单通道语音增强、语音增强前端与KWS后端的联合训练、以及带逐句权重预测的音频注入
  • results: 当后端模型在干净语音上训练时,语音增强可以提高带噪语音下的KWS准确率;但当后端模型在带噪语音上训练时,语音增强难以进一步提升准确率
    Abstract Noise robustness is a key aspect of successful speech applications. Speech enhancement (SE) has been investigated to improve automatic speech recognition accuracy; however, its effectiveness for keyword spotting (KWS) is still under-investigated. In this paper, we conduct a comprehensive study on single-channel speech enhancement for keyword spotting on the Google Speech Command (GSC) dataset. To investigate robustness to noise, the GSC dataset is augmented with noise signals from the WSJ0 Hipster Ambient Mixtures (WHAM!) noise dataset. Our investigation includes not only applying SE before KWS but also performing joint training of the SE frontend and KWS backend models. Moreover, we explore audio injection, a common approach to reduce distortions by using a weighted average of the enhanced and original signals. Audio injection is then further optimized by using another model that predicts the weight for each utterance. Our investigation reveals that SE can improve KWS accuracy on noisy speech when the backend model is trained on clean speech; however, despite our extensive exploration, it is difficult to improve the KWS accuracy with SE when the backend is trained on noisy speech.
    摘要 噪声Robustness是成功语音应用程序的关键方面。语音增强(SE)已经被研究以提高自动语音识别精度,但是它对关键词搜索(KWS)的影响还未得到充分调查。在这篇论文中,我们进行了对单通道语音增强的全面研究,以提高Google语音命令(GSC)数据集上的关键词搜索精度。为了调查噪声的影响,我们使用WHAM!噪声数据集中的噪声信号来扩展GSC数据集。我们的调查包括不仅将SE应用于KWS前置处理,还包括将SE前端和KWS后端模型进行共同训练。此外,我们还探索了音频注入,一种常见的方法,通过使用每个语音的权重来减少损害。我们发现,当后端模型训练于干净语音时,SE可以提高KWS精度在噪声语音中;但是,我们进行了广泛的探索,但是很难通过SE提高KWS精度,当后端模型训练于噪声语音。
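
Audio injection as described in the abstract is simply a weighted average of the enhanced and original waveforms fed to the keyword spotter, optionally with a per-utterance weight predicted by another model. The sketch below shows that mixing step in isolation; `predict_weight` is a hypothetical placeholder heuristic standing in for the learned weight predictor, not the authors' network.

```python
import numpy as np

def audio_injection(noisy, enhanced, weight):
    """Blend the enhanced signal back with the original to limit SE distortions."""
    weight = float(np.clip(weight, 0.0, 1.0))
    return weight * enhanced + (1.0 - weight) * noisy

def predict_weight(noisy, enhanced):
    """Placeholder per-utterance weight predictor (the paper learns this with a model).

    Crude heuristic: trust the enhanced signal more when enhancement removed a lot
    of energy, i.e. the input was probably very noisy.
    """
    residual = np.mean((noisy - enhanced) ** 2)
    return 1.0 - np.exp(-10.0 * residual / (np.mean(noisy ** 2) + 1e-8))

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    clean = np.sin(2 * np.pi * 440 * t)
    noisy = clean + 0.3 * np.random.randn(sr)
    enhanced = clean + 0.05 * np.random.randn(sr)   # pretend speech-enhancement output
    mixed = audio_injection(noisy, enhanced, predict_weight(noisy, enhanced))
    print(mixed.shape)
```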

Neural Network Augmented Kalman Filter for Robust Acoustic Howling Suppression

  • paper_url: http://arxiv.org/abs/2309.16049
  • repo_url: https://github.com/YIXUANZ/NeuralKalmanAHS
  • paper_authors: Yixuan Zhang, Hao Zhang, Meng Yu, Dong Yu
  • for: 提升音频通信系统中声学啸叫抑制(AHS)的性能
  • methods: 利用神经网络(NN)增强传统的卡尔曼滤波算法,用NN模块细化参考信号并估计滤波器的协方差,以提高动态条件下的自适应能力
  • results: 与单独使用NN或卡尔曼滤波的方法相比,所提方法取得了更好的AHS性能,实验验证了方法的有效性
    Abstract Acoustic howling suppression (AHS) is a critical challenge in audio communication systems. In this paper, we propose a novel approach that leverages the power of neural networks (NN) to enhance the performance of traditional Kalman filter algorithms for AHS. Specifically, our method involves the integration of NN modules into the Kalman filter, enabling refining reference signal, a key factor in effective adaptive filtering, and estimating covariance metrics for the filter which are crucial for adaptability in dynamic conditions, thereby obtaining improved AHS performance. As a result, the proposed method achieves improved AHS performance compared to both standalone NN and Kalman filter methods. Experimental evaluations validate the effectiveness of our approach.
    摘要 声学啸叫抑制(AHS)是音频通信系统中的一个关键挑战。在这篇论文中,我们提出了一种新方法,利用神经网络(NN)来增强传统卡尔曼滤波算法的AHS性能。具体来说,我们将NN模块与卡尔曼滤波相结合,用于细化参考信号(有效自适应滤波的关键因素),并估计滤波器所需的协方差量,这些量对于动态条件下的自适应能力至关重要,从而获得更好的AHS性能。实验评估表明,所提方法的AHS性能优于单独使用NN或单独使用卡尔曼滤波的方法。
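
For orientation, a Kalman-filter adaptive filter for feedback/howling paths tracks the filter weights as a random-walk state and updates them from the microphone signal. The sketch below is a plain time-domain version of that classical component only; in the paper, NN modules refine the reference signal and estimate the covariances, which are fixed scalars (`q`, `r`) here as an assumption.

```python
import numpy as np

def kalman_adaptive_filter(ref, mic, taps=64, q=1e-6, r=1e-2):
    """Track feedback-path weights w so that mic[n] ~ ref[n-taps+1 : n+1] @ w.

    q: process-noise variance (random-walk drift of the path), r: observation noise.
    The a-priori error e is the howling-suppressed output.
    """
    w = np.zeros(taps)
    P = np.eye(taps)
    err = np.zeros_like(mic, dtype=float)
    for n in range(taps, len(mic)):
        x = ref[n - taps + 1:n + 1][::-1]     # most recent reference sample first
        P_pred = P + q * np.eye(taps)         # random-walk state prediction
        K = P_pred @ x / (x @ P_pred @ x + r) # Kalman gain
        e = mic[n] - x @ w                    # a-priori error
        w = w + K * e
        P = P_pred - np.outer(K, x @ P_pred)
        err[n] = e
    return err, w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(4000)
    path = rng.standard_normal(64) * np.exp(-np.arange(64) / 10)
    mic = np.convolve(ref, path)[:4000] + 0.01 * rng.standard_normal(4000)
    out, w_hat = kalman_adaptive_filter(ref, mic)
    print(np.mean(out[-500:] ** 2))   # residual power should become small
```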

Advancing Acoustic Howling Suppression through Recursive Training of Neural Networks

  • paper_url: http://arxiv.org/abs/2309.16048
  • repo_url: https://github.com/YIXUANZ/AHS_2023_1
  • paper_authors: Hao Zhang, Yixuan Zhang, Meng Yu, Dong Yu
  • for: 本研究旨在通过考察声学啸叫的基本形成过程,提出一种能够全面解决声学啸叫问题的神经网络(NN)训练框架。
  • methods: 该框架在训练时将NN模块嵌入闭环系统,并以递归方式在线生成信号,以逼近真实应用中声学啸叫抑制(AHS)的流式处理过程;框架下探索了两种方案,一种完全依赖NN,另一种将NN与传统卡尔曼滤波相结合,并提出了啸叫检测和利用预训练离线模型进行初始化等策略。
  • results: 实验结果表明,该框架能有效抑制声学啸叫,相比以往基于NN的方法有显著改进。
    Abstract In this paper, we introduce a novel training framework designed to comprehensively address the acoustic howling issue by examining its fundamental formation process. This framework integrates a neural network (NN) module into the closed-loop system during training with signals generated recursively on the fly to closely mimic the streaming process of acoustic howling suppression (AHS). The proposed recursive training strategy bridges the gap between training and real-world inference scenarios, marking a departure from previous NN-based methods that typically approach AHS as either noise suppression or acoustic echo cancellation. Within this framework, we explore two methodologies: one exclusively relying on NN and the other combining NN with the traditional Kalman filter. Additionally, we propose strategies, including howling detection and initialization using pre-trained offline models, to bolster trainability and expedite the training process. Experimental results validate that this framework offers a substantial improvement over previous methodologies for acoustic howling suppression.
    摘要 在本文中,我们提出了一种新的训练框架,通过考察声学啸叫的基本形成过程来全面解决啸叫问题。该框架在训练时将神经网络(NN)模块嵌入闭环系统,并以递归方式在线生成信号,以逼近声学啸叫抑制(AHS)的流式处理过程。所提出的递归训练策略弥合了训练与真实推理场景之间的差距,不同于以往通常把AHS当作噪声抑制或声学回声消除来处理的NN方法。在该框架下,我们探索了两种方案:一种完全依赖NN,另一种将NN与传统卡尔曼滤波相结合。此外,我们还提出了啸叫检测和利用预训练离线模型进行初始化等策略,以增强可训练性并加速训练过程。实验结果验证了该框架相比以往方法在声学啸叫抑制上有显著提升。

Multichannel Voice Trigger Detection Based on Transform-average-concatenate

  • paper_url: http://arxiv.org/abs/2309.16036
  • repo_url: None
  • paper_authors: Takuya Higuchi, Avamarie Brueggeman, Masood Delfarah, Stephen Shum
  • for: 提高voice triggering(VT)系统的准确率和效率,使得用户可以更加方便地使用语音识别技术。
  • methods: 提出了一种基于多通道听音模型的VT系统,使得系统可以直接从多通道输入中提取有用信息,而不需要额外的渠道选择和筛选步骤。
  • results: 对比基eline channel选择方法,提出的方法可以降低false rejection rate(FRR)达到30%,提高VT系统的准确率和效率。
    Abstract Voice triggering (VT) enables users to activate their devices by just speaking a trigger phrase. A front-end system is typically used to perform speech enhancement and/or separation, and produces multiple enhanced and/or separated signals. Since conventional VT systems take only single-channel audio as input, channel selection is performed. A drawback of this approach is that unselected channels are discarded, even if the discarded channels could contain useful information for VT. In this work, we propose multichannel acoustic models for VT, where the multichannel output from the frond-end is fed directly into a VT model. We adopt a transform-average-concatenate (TAC) block and modify the TAC block by incorporating the channel from the conventional channel selection so that the model can attend to a target speaker when multiple speakers are present. The proposed approach achieves up to 30% reduction in the false rejection rate compared to the baseline channel selection approach.
    摘要 语音触发(VT)允许用户只需说出触发短语即可唤醒设备。系统通常使用前端模块进行语音增强和/或分离,并产生多路增强和/或分离后的信号。由于传统VT系统仅接受单通道音频输入,因此需要进行通道选择。这种做法的缺点是未被选中的通道会被丢弃,即使这些通道可能包含对VT有用的信息。在这项工作中,我们提出了用于VT的多通道声学模型,将前端的多通道输出直接送入VT模型。我们采用了变换-平均-拼接(TAC)模块,并对其加以修改,将传统通道选择所得的通道一并纳入,使模型在存在多个说话人时能够关注目标说话人。与基线通道选择方法相比,所提方法的误拒率最多降低了30%。
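
A minimal PyTorch sketch of a transform-average-concatenate (TAC) style block is given below: each channel is transformed, a cross-channel average is formed, and the average is concatenated back onto every channel. The hidden size, the residual connection, and the exact layer sizes are illustrative assumptions, not the paper's configuration (which additionally injects the conventionally selected channel).

```python
import torch
import torch.nn as nn

class TACBlock(nn.Module):
    """Simplified transform-average-concatenate block over microphone channels.

    Input:  (batch, mics, feat) per-channel features.
    Output: (batch, mics, feat), where each channel has seen a cross-channel average.
    """

    def __init__(self, feat_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.transform = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.PReLU())
        self.average = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.PReLU())
        self.concat = nn.Sequential(nn.Linear(2 * hidden_dim, feat_dim), nn.PReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, m, _ = x.shape
        h = self.transform(x)                             # (b, m, hidden)
        avg = self.average(h.mean(dim=1, keepdim=True))   # (b, 1, hidden) cross-channel summary
        avg = avg.expand(-1, m, -1)                       # broadcast back to every channel
        out = self.concat(torch.cat([h, avg], dim=-1))    # (b, m, feat)
        return x + out                                    # residual connection (assumed)

if __name__ == "__main__":
    block = TACBlock(feat_dim=128)
    feats = torch.randn(2, 4, 128)    # 2 utterances, 4 microphones
    print(block(feats).shape)         # torch.Size([2, 4, 128])
```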

DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion

  • paper_url: http://arxiv.org/abs/2309.15496
  • repo_url: None
  • paper_authors: Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu, Shuai Wang, Jixun Yao, Lei Xie, Mengxiao Bi
  • for: 构建同时支持流式与非流式推理的语音转换(VC)模型,并提升其转换质量与推理速度
  • methods: 采用Conformer主干、带动态分块掩码的非因果卷积,以及静音注意力(quiet attention)等技术
  • results: 与DualVC及其他基线系统相比,DualVC 2在主观与客观指标上均表现更优,且延迟仅为186.4 ms
    Abstract Voice conversion is becoming increasingly popular, and a growing number of application scenarios require models with streaming inference capabilities. The recently proposed DualVC attempts to achieve this objective through streaming model architecture design and intra-model knowledge distillation along with hybrid predictive coding to compensate for the lack of future information. However, DualVC encounters several problems that limit its performance. First, the autoregressive decoder has error accumulation in its nature and limits the inference speed as well. Second, the causal convolution enables streaming capability but cannot sufficiently use future information within chunks. Third, the model is unable to effectively address the noise in the unvoiced segments, lowering the sound quality. In this paper, we propose DualVC 2 to address these issues. Specifically, the model backbone is migrated to a Conformer-based architecture, empowering parallel inference. Causal convolution is replaced by non-causal convolution with dynamic chunk mask to make better use of within-chunk future information. Also, quiet attention is introduced to enhance the model's noise robustness. Experiments show that DualVC 2 outperforms DualVC and other baseline systems in both subjective and objective metrics, with only 186.4 ms latency. Our audio samples are made publicly available.
    摘要 语音转换正变得越来越流行,越来越多的应用场景要求模型具备流式推理能力。最近提出的DualVC试图通过流式模型结构设计、模型内知识蒸馏以及混合预测编码来弥补未来信息的缺失,从而实现这一目标。然而,DualVC存在若干限制其性能的问题。首先,自回归解码器天然存在误差累积,也限制了推理速度;其次,因果卷积虽然带来流式能力,却无法充分利用分块内的未来信息;第三,模型无法有效处理清音段中的噪声,降低了音质。本文提出DualVC 2来解决这些问题。具体而言,模型主干迁移到基于Conformer的结构以支持并行推理;因果卷积被替换为带动态分块掩码的非因果卷积,以更好地利用分块内的未来信息;此外还引入静音注意力(quiet attention)以增强模型的抗噪能力。实验表明,DualVC 2在主观与客观指标上均优于DualVC和其他基线系统,延迟仅为186.4 ms。我们的音频样例已公开。

cs.CV - 2023-09-27

Diagnosis of Helicobacter pylori using AutoEncoders for the Detection of Anomalous Staining Patterns in Immunohistochemistry Images

  • paper_url: http://arxiv.org/abs/2309.16053
  • repo_url: None
  • paper_authors: Pau Cano, Álvaro Caravaca, Debora Gil, Eva Musulen
  • for: 检测人类胃癌病毒Helicobacter pylori
  • methods: 使用自适应神经网络模型(autoencoder),从健康组织图像中学习异常特征,检测H. pylori
  • results: 模型精度91%,敏感性86%,特异性96%,AUC0.97,能够高效地检测H. pylori
    Abstract This work addresses the detection of Helicobacter pylori a bacterium classified since 1994 as class 1 carcinogen to humans. By its highest specificity and sensitivity, the preferred diagnosis technique is the analysis of histological images with immunohistochemical staining, a process in which certain stained antibodies bind to antigens of the biological element of interest. This analysis is a time demanding task, which is currently done by an expert pathologist that visually inspects the digitized samples. We propose to use autoencoders to learn latent patterns of healthy tissue and detect H. pylori as an anomaly in image staining. Unlike existing classification approaches, an autoencoder is able to learn patterns in an unsupervised manner (without the need of image annotations) with high performance. In particular, our model has an overall 91% of accuracy with 86\% sensitivity, 96% specificity and 0.97 AUC in the detection of H. pylori.
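
The approach amounts to training an autoencoder only on healthy-tissue patches and flagging patches whose reconstruction error is anomalously high. The sketch below is a generic version of that idea in PyTorch; the architecture, patch size, and thresholding rule are assumptions for illustration, not the authors' exact model.

```python
import torch
import torch.nn as nn

class PatchAutoencoder(nn.Module):
    """Small convolutional autoencoder for 64x64 RGB tissue patches."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def anomaly_score(model, patches):
    """Per-patch reconstruction error; trained on healthy tissue, a high error
    suggests anomalous staining patterns such as H. pylori."""
    with torch.no_grad():
        recon = model(patches)
    return torch.mean((patches - recon) ** 2, dim=(1, 2, 3))

if __name__ == "__main__":
    model = PatchAutoencoder().eval()
    patches = torch.rand(8, 3, 64, 64)
    scores = anomaly_score(model, patches)
    flagged = scores > scores.mean() + 2 * scores.std()   # illustrative threshold
    print(scores.shape, int(flagged.sum()))
```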

Handbook on Leveraging Lines for Two-View Relative Pose Estimation

  • paper_url: http://arxiv.org/abs/2309.16040
  • repo_url: None
  • paper_authors: Petr Hruby, Shaohui Liu, Rémi Pautrat, Marc Pollefeys, Daniel Barath
  • for: 本文旨在提出一种混合利用点、线及其重合关系来估计已标定图像对之间相对位姿的方法。
  • methods: 该方法综述了文献中可用的最小求解器,涵盖了点、线及其重合关系可以联合使用的所有配置;此外还设计了在两幅图像中联合估计多个消失点对应的方法,以及考虑所有相关数据模态的光束法平差。
  • results: 在多个室内和室外数据集上的实验表明,与基于点的方法相比,该方法将AUC@10$^\circ$提高了1-7个点,且运行速度相当。
    Abstract We propose an approach for estimating the relative pose between calibrated image pairs by jointly exploiting points, lines, and their coincidences in a hybrid manner. We investigate all possible configurations where these data modalities can be used together and review the minimal solvers available in the literature. Our hybrid framework combines the advantages of all configurations, enabling robust and accurate estimation in challenging environments. In addition, we design a method for jointly estimating multiple vanishing point correspondences in two images, and a bundle adjustment that considers all relevant data modalities. Experiments on various indoor and outdoor datasets show that our approach outperforms point-based methods, improving AUC@10$^\circ$ by 1-7 points while running at comparable speeds. The source code of the solvers and hybrid framework will be made public.
    摘要 我们提出了一种方法,用于估算投影图像对的相对pose,通过同时利用点、线和它们的重合来实现。我们审查了所有可能的数据模式,并评估了文献中可用的最小解。我们的混合框架结合了所有配置的优点,可以在具有挑战性的环境中提供稳定和准确的估算。此外,我们还设计了用于在两个图像中同时估算多个消失点匹配的方法,以及考虑所有相关数据模式的缓冲调整。在各种室内和室外数据集上进行了实验,我们的方法与点基本方法相比,提高了AUC@10$^\circ$的值,从1-7个点中增加了1-7个点,并且在相同的速度下运行。我们计划将解决方案和混合框架的源代码公开。

Q-REG: End-to-End Trainable Point Cloud Registration with Surface Curvature

  • paper_url: http://arxiv.org/abs/2309.16023
  • repo_url: None
  • paper_authors: Shengze Jin, Daniel Barath, Marc Pollefeys, Iro Armeni
  • for: 这种论文主要用于提出一种新的点云注册方法,以便更好地进行点云注册问题的解决。
  • methods: 这种方法使用了学习基于方法,包括对匹配的优化,以及使用RANSAC-like框架进行评估。
  • results: 这种方法可以提供更加稳定和有效的点云注册结果,并且可以在实时应用中使用。它在3DMatch、KITTI和ModelNet测试数据集上达到了新的状态平衡。
    Abstract Point cloud registration has seen recent success with several learning-based methods that focus on correspondence matching and, as such, optimize only for this objective. Following the learning step of correspondence matching, they evaluate the estimated rigid transformation with a RANSAC-like framework. While it is an indispensable component of these methods, it prevents a fully end-to-end training, leaving the objective to minimize the pose error nonserved. We present a novel solution, Q-REG, which utilizes rich geometric information to estimate the rigid pose from a single correspondence. Q-REG allows to formalize the robust estimation as an exhaustive search, hence enabling end-to-end training that optimizes over both objectives of correspondence matching and rigid pose estimation. We demonstrate in the experiments that Q-REG is agnostic to the correspondence matching method and provides consistent improvement both when used only in inference and in end-to-end training. It sets a new state-of-the-art on the 3DMatch, KITTI, and ModelNet benchmarks.
    摘要 点云配准最近借助多种基于学习的方法取得了成功,这些方法专注于对应匹配,因此只针对这一目标进行优化。在完成对应匹配的学习步骤后,它们再用类RANSAC框架来评估所估计的刚性变换。虽然这一环节不可或缺,但它阻碍了完全的端到端训练,使得最小化位姿误差的目标得不到直接优化。我们提出了一种新的解决方案Q-REG,它利用丰富的几何信息,仅凭单个对应即可估计刚性位姿。Q-REG将鲁棒估计形式化为穷举搜索,从而支持端到端训练,同时优化对应匹配与刚性位姿估计两个目标。实验表明,Q-REG与具体的对应匹配方法无关,无论仅用于推理还是用于端到端训练都能带来一致的提升,并在3DMatch、KITTI和ModelNet基准上达到了新的最优水平。

GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization

  • paper_url: http://arxiv.org/abs/2309.16020
  • repo_url: None
  • paper_authors: Vicente Vivanco Cepeda, Gaurav Kumar Nayak, Mubarak Shah
  • for: 准确地定位全球任意位置的图像
  • methods: 提出了一种基于CLIP的图像-GPS匹配方法,使用位置编码和幂等分辨率表示来模型地球,并通过对图像和GPS位置进行对齐来实现地图地标注。
  • results: 通过广泛的实验和简要的ablation,证明了该方法的有效性,只需使用20%的训练数据就能达到竞争力水平,并且通过文本查询示例展示了图像地理标注的可行性。
    Abstract Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth. This task has considerable challenges due to immense variation in geographic landscapes. The image-to-image retrieval-based approaches fail to solve this problem on a global scale as it is not feasible to construct a large gallery of images covering the entire world. Instead, existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task. However, their performance is limited by the predefined classes and often results in inaccurate localizations when an image's location significantly deviates from its class center. To overcome these limitations, we propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations. GeoCLIP's location encoder models the Earth as a continuous function by employing positional encoding through random Fourier features and constructing a hierarchical representation that captures information at varying resolutions to yield a semantically rich high-dimensional feature suitable to use even beyond geo-localization. To the best of our knowledge, this is the first work employing GPS encoding for geo-localization. We demonstrate the efficacy of our method via extensive experiments and ablations on benchmark datasets. We achieve competitive performance with just 20% of training data, highlighting its effectiveness even in limited-data settings. Furthermore, we qualitatively demonstrate geo-localization using a text query by leveraging CLIP backbone of our image encoder.
    摘要 全球地理位置 pinpoint 任何地点的精准位置是全球地理位置定位的挑战。由于地理景观的巨大差异,图像到图像检索方法无法在全球范围内解决这个问题。现有的方法将地球分成精确的地理维度单元,将问题转化为一个分类任务,但其性能受限于预先定义的类别,常导致图像的位置偏差从类别中心偏离。为了超越这些限制,我们提出了 GeoCLIP,一种基于 CLIP 的图像到 GPS Retrieval 方法。GeoCLIP 的位置编码器使用随机傅里埃特性编码 Earth 为一个连续函数,并使用层次表示,以捕捉图像与 GPS 位置之间的对应关系。这使得 GeoCLIP 可以在有限数据量下达到竞争性性能。我们通过广泛的实验和剔除研究证明 GeoCLIP 的效果。此外,我们通过 CLIP 的背景网络,用文本查询来实现地理位置定位。
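
The location encoder is described as a positional encoding of GPS coordinates via random Fourier features at multiple resolutions, aligned with CLIP image features. Below is a minimal sketch of such a multi-scale random-Fourier-feature GPS encoder; the scales, dimensions, normalization, and MLP head are assumed for illustration and are not the released GeoCLIP model.

```python
import math
import torch
import torch.nn as nn

class GPSEncoder(nn.Module):
    """Encode (lat, lon) with multi-scale random Fourier features + a small MLP."""

    def __init__(self, embed_dim=512, num_freqs=64, scales=(1.0, 0.1, 0.01)):
        super().__init__()
        self.scales = scales
        # One fixed random projection per scale; smaller scale = higher-frequency features.
        self.register_buffer(
            "freqs",
            torch.stack([torch.randn(2, num_freqs) / s for s in scales]),
        )
        self.mlp = nn.Sequential(
            nn.Linear(2 * num_freqs * len(scales), 1024), nn.ReLU(),
            nn.Linear(1024, embed_dim),
        )

    def forward(self, latlon):                        # (batch, 2) in degrees
        coords = latlon / 90.0                        # crude normalization (assumption)
        feats = []
        for i in range(len(self.scales)):
            proj = 2 * math.pi * coords @ self.freqs[i]   # (batch, num_freqs)
            feats += [torch.sin(proj), torch.cos(proj)]
        return self.mlp(torch.cat(feats, dim=-1))

if __name__ == "__main__":
    enc = GPSEncoder()
    gps = torch.tensor([[40.7128, -74.0060], [48.8566, 2.3522]])  # NYC, Paris
    emb = enc(gps)
    print(emb.shape)   # torch.Size([2, 512])
    # During training, emb would be contrastively aligned with CLIP image embeddings.
```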

Assessment of Local Climate Zone Products via Simplified Classification Rule with 3D Building Maps

  • paper_url: http://arxiv.org/abs/2309.15978
  • repo_url: None
  • paper_authors: Hunsoo Song, Gaia Cervini, Jinha Jung
  • for: This study evaluates the performance of a global Local Climate Zone (LCZ) product.
  • methods: A reference LCZ was constructed with a simple rule-based method using high-resolution 3D building maps, against which the built-type classes of the global product were assessed in three major U.S. metropolitan areas.
  • results: The global LCZ product struggles to differentiate classes that demand precise building footprint information (Classes 6 and 9) and classes that require identifying subtle differences in building elevation (Classes 4-6); class distributions also skew differently across cities, suggesting a data distribution shift problem in the machine-learning-based LCZ classifier.
    Abstract This study assesses the performance of a global Local Climate Zone (LCZ) product. We examined the built-type classes of LCZs in three major metropolitan areas within the U.S. A reference LCZ was constructed using a simple rule-based method based on high-resolution 3D building maps. Our evaluation demonstrated that the global LCZ product struggles to differentiate classes that demand precise building footprint information (Classes 6 and 9), and classes that necessitate the identification of subtle differences in building elevation (Classes 4-6). Additionally, we identified inconsistent tendencies, where the distribution of classes skews differently across different cities, suggesting the presence of a data distribution shift problem in the machine learning-based LCZ classifier. Our findings shed light on the uncertainties in global LCZ maps, help identify the LCZ classes that are the most challenging to distinguish, and offer insight into future plans for LCZ development and validation.

Neural Acoustic Context Field: Rendering Realistic Room Impulse Response With Neural Fields

  • paper_url: http://arxiv.org/abs/2309.15977
  • repo_url: None
  • paper_authors: Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu
  • for: 这个论文的目的是提出一种基于神经网络场函数的听音场景参数化方法,以提高听音场景的准确性。
  • methods: 这个方法使用多个听音上下文,如干擦性、形态特征和空间信息,来Parameterize听音场景。它还使用时间相关模块和多尺度能量衰减标准来适应RIR的独特性。
  • results: 实验结果显示,NACF方法在比较 existed 场景下表现出了明显的优异,超过了现有的场景基于场函数方法。
    Abstract Room impulse response (RIR), which measures the sound propagation within an environment, is critical for synthesizing high-fidelity audio for a given environment. Some prior work has proposed representing RIR as a neural field function of the sound emitter and receiver positions. However, these methods do not sufficiently consider the acoustic properties of an audio scene, leading to unsatisfactory performance. This letter proposes a novel Neural Acoustic Context Field approach, called NACF, to parameterize an audio scene by leveraging multiple acoustic contexts, such as geometry, material property, and spatial information. Driven by the unique properties of RIR, i.e., temporal un-smoothness and monotonic energy attenuation, we design a temporal correlation module and multi-scale energy decay criterion. Experimental results show that NACF outperforms existing field-based methods by a notable margin. Please visit our project page for more qualitative results.
    摘要 房间冲激响应(RIR)刻画了声音在环境中的传播,对于合成给定环境下的高保真音频至关重要。已有工作提出以声源与接收者位置为输入、用神经场函数来表示RIR,但这些方法没有充分考虑音频场景的声学属性,导致性能不尽如人意。本文提出了一种新的神经声学上下文场方法NACF,利用几何、材料属性和空间信息等多种声学上下文来参数化音频场景。针对RIR特有的时间不平滑性与能量单调衰减特性,我们设计了时间相关模块和多尺度能量衰减准则。实验结果表明,NACF明显优于现有的基于场的方法。更多定性结果请访问我们的项目页面。

The Devil is in the Details: A Deep Dive into the Rabbit Hole of Data Filtering

  • paper_url: http://arxiv.org/abs/2309.15954
  • repo_url: None
  • paper_authors: Haichao Yu, Yu Tian, Sateesh Kumar, Linjie Yang, Heng Wang
  • for: 本研究旨在评估不同数据筛选方法的性能,以提高基础模型的表现。
  • methods: 本研究使用了三个阶段的筛选策略:单模态筛选、交叉模态筛选和数据分布对接。我们还提出了新的解决方案,如计算 CLIP 分数在水平翻转图像上以减少场景文本的干扰,使用视觉和语言模型来检索下游任务的训练样本,重新平衡数据分布以改善计算资源的分配效率等。
  • results: 我们的方法比 DataComp 论文中最佳方法平均表现提高了4%, ImageNet 上表现提高了2%。
    Abstract The quality of pre-training data plays a critical role in the performance of foundation models. Popular foundation models often design their own recipe for data filtering, which makes it hard to analyze and compare different data filtering approaches. DataComp is a new benchmark dedicated to evaluating different methods for data filtering. This paper describes our learning and solution when participating in the DataComp challenge. Our filtering strategy includes three stages: single-modality filtering, cross-modality filtering, and data distribution alignment. We integrate existing methods and propose new solutions, such as computing CLIP score on horizontally flipped images to mitigate the interference of scene text, using vision and language models to retrieve training samples for target downstream tasks, rebalancing the data distribution to improve the efficiency of allocating the computational budget, etc. We slice and dice our design choices, provide in-depth analysis, and discuss open questions. Our approach outperforms the best method from the DataComp paper by over 4% on the average performance of 38 tasks and by over 2% on ImageNet.
    摘要 “数据预训模型的质量具有关键作用,但是popular基础模型经常设计自己的数据筛选方法,这使得分析和比较不同数据筛选方法的困难。为了解决这个问题,DataComp是一个新的竞赛benchmark,用于评估不同数据筛选方法。这篇论文描述了我们在DataComp挑战中学习和解决的经验。我们的筛选策略包括三个阶段:单模态筛选、交叉模态筛选和数据分布对齐。我们将现有方法与新的解决方案相结合,例如在横向翻转图像上计算CLIP分数以避免场景文本的干扰,使用视觉和语言模型来收集下游任务的训练样本,重新规划数据分布以提高计算预算的效率等。我们将slice和dice我们的设计选择,进行深入分析,并讨论开放问题。我们的方法在38个任务的平均性能上超过DataComp文章中最佳方法的4%,并在ImageNet上超过2%。”
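
One concrete trick mentioned is computing the CLIP image-text score on a horizontally flipped copy of the image, so that samples whose score is driven mainly by rendered scene text (which flipping destroys) get down-weighted. The sketch below illustrates the idea with the open_clip interface; the model/pretrained tag, the choice to average the two scores, and the threshold value are assumptions, not the paper's pipeline.

```python
import torch
import open_clip
from PIL import Image
from torchvision.transforms.functional import hflip

# Assumed checkpoint; any CLIP model exposing encode_image / encode_text works.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

@torch.no_grad()
def flip_robust_clip_score(image: Image.Image, caption: str) -> float:
    """Average CLIP similarity over the image and its horizontal flip.

    If the similarity relies mostly on text rendered inside the image,
    flipping the pixels destroys that text and the averaged score drops.
    """
    pixels = torch.stack([preprocess(image), preprocess(hflip(image))])
    img_feat = model.encode_image(pixels)
    txt_feat = model.encode_text(tokenizer([caption]))
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    sims = (img_feat @ txt_feat.T).squeeze(-1)   # two scores: original, flipped
    return sims.mean().item()

def keep_sample(image, caption, threshold=0.28):
    """Keep a sample only if the flip-averaged score clears a (tunable) threshold."""
    return flip_robust_clip_score(image, caption) >= threshold
```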

AutoEncoding Tree for City Generation and Applications

  • paper_url: http://arxiv.org/abs/2309.15941
  • repo_url: None
  • paper_authors: Wenyu Han, Congcong Wen, Lazarus Chok, Yan Liang Tan, Sheung Lung Chan, Hang Zhao, Chen Feng
  • for: 这paper的目的是为了提出一种基于树状自编码器的城市生成模型,以解决城市数据的巨量和缺乏公共数据的问题。
  • methods: 该paper使用了一种新的空间几何距离(SGD)度量来衡量建筑布局的相似性,然后将其转化为一棵树状网络,其中encoder部分会逐级提取和合并空间信息。
  • results: 实验结果表明,提出的AETree模型可以有效地进行2D和3D城市生成,同时学习的缓存特征可以用于下游城市规划应用。
    Abstract City modeling and generation have attracted an increased interest in various applications, including gaming, urban planning, and autonomous driving. Unlike previous works focused on the generation of single objects or indoor scenes, the huge volumes of spatial data in cities pose a challenge to the generative models. Furthermore, few publicly available 3D real-world city datasets also hinder the development of methods for city generation. In this paper, we first collect over 3,000,000 geo-referenced objects for the city of New York, Zurich, Tokyo, Berlin, Boston and several other large cities. Based on this dataset, we propose AETree, a tree-structured auto-encoder neural network, for city generation. Specifically, we first propose a novel Spatial-Geometric Distance (SGD) metric to measure the similarity between building layouts and then construct a binary tree over the raw geometric data of building based on the SGD metric. Next, we present a tree-structured network whose encoder learns to extract and merge spatial information from bottom-up iteratively. The resulting global representation is reversely decoded for reconstruction or generation. To address the issue of long-dependency as the level of the tree increases, a Long Short-Term Memory (LSTM) Cell is employed as a basic network element of the proposed AETree. Moreover, we introduce a novel metric, Overlapping Area Ratio (OAR), to quantitatively evaluate the generation results. Experiments on the collected dataset demonstrate the effectiveness of the proposed model on 2D and 3D city generation. Furthermore, the latent features learned by AETree can serve downstream urban planning applications.
    摘要 城市模型化和生成在各种应用中受到了越来越多的关注,包括游戏、城市规划和自动驾驶。与前一些关注单个 объек或室内场景生成的研究不同,城市的巨量数据带来了生成模型的挑战。此外,有限公共可用的3D实际城市数据也限制了城市生成方法的发展。在这篇论文中,我们首先收集了纽约、苏黎世、东京、柏林和波士顿等城市的3,000,000个地理引用对象。基于这些数据,我们提议了AETree,一种树状自动编码网络,用于城市生成。具体来说,我们首先提出了一种新的空间几何距离(SGD)度量,用于衡量建筑布局之间的相似性。然后,我们将建筑的原始几何数据拼接成一棵二叉树,基于SGD度量。接下来,我们介绍了一个树状网络,其编码器可以从底向上iteratively提取和合并空间信息。结果的全局表示可以 reversely 解码为重建或生成。为了解决生成结果中层级增长的长期依赖问题,我们采用了一个长期记忆(LSTM)细胞作为AETree的基本网络元素。此外,我们还提出了一个新的度量, overlap 区域比率(OAR),用于评估生成结果的质量。实验表明,提议的模型在2D和3D城市生成中表现效果。此外,AETree所学习的潜在特征可以服务于下游城市规划应用。

Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs

  • paper_url: http://arxiv.org/abs/2309.15940
  • repo_url: https://github.com/changhaonan/ovsg
  • paper_authors: Haonan Chang, Kowndinya Boyalakuntla, Shiyang Lu, Siwei Cai, Eric Jing, Shreesh Keskar, Shijie Geng, Adeeb Abbas, Lifeng Zhou, Kostas Bekris, Abdeslam Boularias
  • for: 用于提供一个开放词汇3D场景图(OVSG),用于对各种实体(例如物体实例、代理人和区域)进行识别,并支持自由文本查询。
  • methods: 使用自由文本查询,而不是传统的semantic-based对象定位方法,以提供上下文意识感知定位。
  • results: 在ScanNet数据集和自采数据集上的对比实验中,所提方法显著优于以往基于语义的定位技术;此外,我们还展示了OVSG在真实机器人导航与操作实验中的实际应用。
    Abstract We present an Open-Vocabulary 3D Scene Graph (OVSG), a formal framework for grounding a variety of entities, such as object instances, agents, and regions, with free-form text-based queries. Unlike conventional semantic-based object localization approaches, our system facilitates context-aware entity localization, allowing for queries such as ``pick up a cup on a kitchen table" or ``navigate to a sofa on which someone is sitting". In contrast to existing research on 3D scene graphs, OVSG supports free-form text input and open-vocabulary querying. Through a series of comparative experiments using the ScanNet dataset and a self-collected dataset, we demonstrate that our proposed approach significantly surpasses the performance of previous semantic-based localization techniques. Moreover, we highlight the practical application of OVSG in real-world robot navigation and manipulation experiments.
    摘要 我们提出了一个开放词汇3D场景图(OVSG),这是一种正式框架,用于将多种实体,如物品实例、代理人和区域,与自由形式文本查询相关联。与传统的意义基于对象定位方法不同,我们的系统支持上下文意识实体定位,allowing for queries such as "pick up a cup on a kitchen table" or "navigate to a sofa on which someone is sitting". 与现有的3D场景图研究不同,OVSG支持自由形式文本输入和开放词汇查询。通过对ScanNet数据集和自我收集的数据集进行比较实验,我们示出了我们提议的方法在前一个 semantic-based定位技术的性能方面明显超越。此外,我们还 highlighted the practical application of OVSG in real-world robot navigation and manipulation experiments。

Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts

  • paper_url: http://arxiv.org/abs/2309.15915
  • repo_url: https://github.com/engindeniz/vitis
  • paper_authors: Deniz Engin, Yannis Avrithis
  • for: 这个论文的目的是解决大规模预训练模型在有限数据上适应问题中的挑战,包括过拟合、跨Modal汇总和语义识别等问题。
  • methods: 该论文提出了一种效率的参数方法, combinig 多modal prompt学习和基于transformer的映射网络,以适应预训练模型的冰封。
  • results: 我们在多个视频问答 benchmark上进行了实验,并证明了我们的方法在零shot和几shot设置下具有优秀的性能和参数效率。我们的代码可以在 https://engindeniz.github.io/vitis 上获取。
    Abstract Recent vision-language models are driven by large-scale pretrained models. However, adapting pretrained models on limited data presents challenges such as overfitting, catastrophic forgetting, and the cross-modal gap between vision and language. We introduce a parameter-efficient method to address these challenges, combining multimodal prompt learning and a transformer-based mapping network, while keeping the pretrained models frozen. Our experiments on several video question answering benchmarks demonstrate the superiority of our approach in terms of performance and parameter efficiency on both zero-shot and few-shot settings. Our code is available at https://engindeniz.github.io/vitis.
    摘要 现代视力语言模型受大规模预训练模型的驱动。然而,在有限数据上适应预训练模型存在困难,如预测溢出、跨模态差距和语言视觉 gap。我们提出一种 parameter-efficient 方法,结合多模态提示学习和基于 transformer 的映射网络,保持预训练模型冻结。我们的实验表明,我们的方法在多个视频问答 benchmark 上具有表现和参数效率的优势,包括零shot 和几shot 设置。我们的代码可以在 上找到。

Exploiting the Signal-Leak Bias in Diffusion Models

  • paper_url: http://arxiv.org/abs/2309.15842
  • repo_url: None
  • paper_authors: Martin Nicolas Everaert, Athanasios Fitsios, Marco Bocchio, Sami Arpa, Sabine Süsstrunk, Radhakrishna Achanta
  • for: 本研究旨在探讨 diffusion 模型中存在的偏见问题,并提出一种方法来控制生成的图像。
  • methods: 研究者使用了现有的 diffusion 模型,并在其中引入了一种信号泄漏来控制生成的图像。
  • results: 研究者通过模型 signal-leak 的分布在空间频谱和像素域来控制生成的图像,并可以通过不需要进一步训练来生成符合预期结果的图像。
    Abstract There is a bias in the inference pipeline of most diffusion models. This bias arises from a signal leak whose distribution deviates from the noise distribution, creating a discrepancy between training and inference processes. We demonstrate that this signal-leak bias is particularly significant when models are tuned to a specific style, causing sub-optimal style matching. Recent research tries to avoid the signal leakage during training. We instead show how we can exploit this signal-leak bias in existing diffusion models to allow more control over the generated images. This enables us to generate images with more varied brightness, and images that better match a desired style or color. By modeling the distribution of the signal leak in the spatial frequency and pixel domains, and including a signal leak in the initial latent, we generate images that better match expected results without any additional training.
    摘要 大多数扩散模型的推理流程存在偏差。这种偏差源于信号泄漏:其分布偏离噪声分布,造成训练与推理过程之间的不一致。我们展示了当模型针对特定风格进行微调时,这种信号泄漏偏差尤为显著,导致风格匹配欠佳。近期研究试图在训练中避免信号泄漏;我们则展示了如何在现有扩散模型中利用这种信号泄漏偏差,从而对生成图像获得更多控制。这使我们能够生成亮度变化更丰富、且更符合期望风格或颜色的图像。通过在空间频率域和像素域对信号泄漏的分布建模,并在初始潜变量中加入信号泄漏,我们无需任何额外训练即可生成更符合预期的图像。
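
The mechanism being exploited is that the model was trained with a signal leak at its noisiest step, so at inference one can place desired content (e.g., a color or style reference) into the initial latent instead of using pure Gaussian noise. The sketch below shows only that initialization step with a constant leak strength; it is a deliberate simplification, not the paper's fitted signal-leak distribution, and `style_latent` is a hypothetical VAE-encoded reference.

```python
import torch

def leaked_initial_latent(style_latent: torch.Tensor, leak_strength: float = 0.1):
    """Build the initial 'noise' for sampling with a deliberate signal leak.

    style_latent : latent of a reference image (e.g. a flat color or style exemplar),
                   the content we want to leak into the generation.
    leak_strength: how much of the reference survives in the initial latent;
                   0.0 recovers the usual pure-noise initialization.
    The paper models the leak distribution per frequency band; this constant,
    variance-preserving blend is a simplification.
    """
    noise = torch.randn_like(style_latent)
    return leak_strength * style_latent + (1.0 - leak_strength ** 2) ** 0.5 * noise

if __name__ == "__main__":
    # Placeholder: a 4x64x64 latent filled with a warm color bias.
    style_latent = torch.full((1, 4, 64, 64), 0.5)
    init = leaked_initial_latent(style_latent, leak_strength=0.15)
    print(init.shape, init.std().item())
    # `init` would then be passed to the diffusion sampler in place of randn noise.
```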

Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation

  • paper_url: http://arxiv.org/abs/2309.15818
  • repo_url: https://github.com/showlab/show-1
  • paper_authors: David Junhao Zhang, Jay Zhangjie Wu, Jia-Wei Liu, Rui Zhao, Lingmin Ran, Yuchao Gu, Difei Gao, Mike Zheng Shou
  • for: 本研究旨在提出一种混合式文本到视频生成模型(Show-1),将基于像素的视频扩散模型(VDM)与基于隐空间的VDM结合用于文本到视频生成。
  • methods: 模型首先使用基于像素的VDM生成文本-视频对齐良好的低分辨率视频,然后提出了一种新的专家翻译方法,使用基于隐空间的VDM将低分辨率视频进一步上采样到高分辨率。
  • results: 与基于隐空间的VDM相比,Show-1可以生成文本-视频对齐精确的高质量视频;与基于像素的VDM相比,Show-1的效率更高(推理时GPU显存占用为15G对72G)。我们还在标准视频生成基准上验证了该模型。
    Abstract Significant advancements have been achieved in the realm of large-scale pre-trained text-to-video Diffusion Models (VDMs). However, previous methods either rely solely on pixel-based VDMs, which come with high computational costs, or on latent-based VDMs, which often struggle with precise text-video alignment. In this paper, we are the first to propose a hybrid model, dubbed as Show-1, which marries pixel-based and latent-based VDMs for text-to-video generation. Our model first uses pixel-based VDMs to produce a low-resolution video of strong text-video correlation. After that, we propose a novel expert translation method that employs the latent-based VDMs to further upsample the low-resolution video to high resolution. Compared to latent VDMs, Show-1 can produce high-quality videos of precise text-video alignment; Compared to pixel VDMs, Show-1 is much more efficient (GPU memory usage during inference is 15G vs 72G). We also validate our model on standard video generation benchmarks. Our code and model weights are publicly available at https://github.com/showlab/Show-1.
    摘要 大规模预训练的文本到视频扩散模型(VDM)领域已取得显著进展。然而,以往方法要么完全依赖基于像素的VDM(计算代价高昂),要么依赖基于隐空间的VDM(往往难以实现精确的文本-视频对齐)。在本文中,我们首次提出一种混合模型Show-1,将基于像素与基于隐空间的VDM结合用于文本到视频生成。我们的模型首先用基于像素的VDM生成文本-视频相关性强的低分辨率视频,随后提出一种新的专家翻译方法,利用基于隐空间的VDM将低分辨率视频进一步上采样到高分辨率。与隐空间VDM相比,Show-1能生成文本-视频对齐精确的高质量视频;与像素VDM相比,Show-1效率更高(推理时GPU显存占用为15G对72G)。我们还在标准视频生成基准上验证了模型。代码与模型权重已公开于https://github.com/showlab/Show-1。

Convolutional Networks with Oriented 1D Kernels

  • paper_url: http://arxiv.org/abs/2309.15812
  • repo_url: https://github.com/princeton-vl/oriented1d
  • paper_authors: Alexandre Kirchmeyer, Jia Deng
  • for: 这个论文的目的是探讨ConvNet是否可以没有2D卷积。
  • methods: 这个论文使用了1D卷积,并且发现了一种叫做方向1D卷积的技术,可以将2D卷积完全替代。
  • results: 这个论文的实验结果表明,使用方向1D卷积可以达到与2D卷积相同的准确率,而且可以降低计算量。
    Abstract In computer vision, 2D convolution is arguably the most important operation performed by a ConvNet. Unsurprisingly, it has been the focus of intense software and hardware optimization and enjoys highly efficient implementations. In this work, we ask an intriguing question: can we make a ConvNet work without 2D convolutions? Surprisingly, we find that the answer is yes -- we show that a ConvNet consisting entirely of 1D convolutions can do just as well as 2D on ImageNet classification. Specifically, we find that one key ingredient to a high-performing 1D ConvNet is oriented 1D kernels: 1D kernels that are oriented not just horizontally or vertically, but also at other angles. Our experiments show that oriented 1D convolutions can not only replace 2D convolutions but also augment existing architectures with large kernels, leading to improved accuracy with minimal FLOPs increase. A key contribution of this work is a highly-optimized custom CUDA implementation of oriented 1D kernels, specialized to the depthwise convolution setting. Our benchmarks demonstrate that our custom CUDA implementation almost perfectly realizes the theoretical advantage of 1D convolution: it is faster than a native horizontal convolution for any arbitrary angle. Code is available at https://github.com/princeton-vl/Oriented1D.
    摘要 在计算机视觉中,2D卷积是无可争议的最重要的操作,它在ConvNet中扮演着关键的角色。不奇怪的是,它已经得到了极高效的软件和硬件优化,并且有高效的实现。在这项工作中,我们提出了一个有趣的问题:可以不使用2D卷积来实现ConvNet吗?奇怪的是,我们发现答案是Yes,我们表明了一个完全由1D卷积组成的ConvNet可以与2D卷积相当地表现,甚至在ImageNet分类任务上达到相同的准确率。具体来说,我们发现一个关键的组成部分是方向卷积:卷积不仅可以水平或垂直进行卷积,还可以在其他角度进行卷积。我们的实验表明,方向卷积不仅可以替换2D卷积,还可以补充现有的架构,从而提高准确率,而且减少FLOPs。我们的一个重要贡献是对方向卷积的高度优化的自定义CUDA实现,特制为深度卷积设置。我们的测试表明,我们的自定义CUDA实现几乎完全实现了理论上的优势,它在任意角度下比本地水平卷积更快。代码可以在https://github.com/princeton-vl/Oriented1D上获取。
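
A simple way to see what an oriented 1D convolution does is to scatter a 1D kernel onto a 2D grid along a line at a given angle and apply it depthwise. The sketch below does exactly that with nearest-neighbor placement; it is only an illustration of the idea, not the paper's optimized CUDA kernel, and the tap values and per-channel weight sharing are assumptions.

```python
import math
import torch
import torch.nn.functional as F

def oriented_1d_kernel(weights: torch.Tensor, angle_deg: float, size: int) -> torch.Tensor:
    """Scatter a 1D kernel of length `size` onto a size x size grid along a line
    at `angle_deg`, so it can be applied with an ordinary 2D convolution.
    Nearest-neighbor rounding may merge taps at some angles; a dedicated kernel
    (as in the paper) avoids materializing the 2D grid entirely.
    """
    k2d = torch.zeros(size, size)
    c = size // 2
    theta = math.radians(angle_deg)
    for i, w in enumerate(weights):
        off = i - c
        row = c + int(round(off * math.sin(theta)))
        col = c + int(round(off * math.cos(theta)))
        k2d[row, col] += w
    return k2d

def oriented_depthwise_conv(x: torch.Tensor, weights: torch.Tensor, angle_deg: float):
    """Depthwise oriented-1D convolution; the same taps are reused for every
    channel here purely for illustration (real layers learn per-channel taps)."""
    size = weights.numel()
    k2d = oriented_1d_kernel(weights, angle_deg, size)
    channels = x.shape[1]
    kernel = k2d.expand(channels, 1, size, size).to(x.dtype)
    return F.conv2d(x, kernel, padding=size // 2, groups=channels)

if __name__ == "__main__":
    x = torch.randn(1, 8, 32, 32)
    taps = torch.tensor([0.25, 0.5, 1.0, 0.5, 0.25])   # a 5-tap 1D filter
    y = oriented_depthwise_conv(x, taps, angle_deg=45.0)
    print(y.shape)   # torch.Size([1, 8, 32, 32])
```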

Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack

  • paper_url: http://arxiv.org/abs/2309.15807
  • repo_url: None
  • paper_authors: Xiaoliang Dai, Ji Hou, Chih-Yao Ma, Sam Tsai, Jialiang Wang, Rui Wang, Peizhao Zhang, Simon Vandenhende, Xiaofang Wang, Abhimanyu Dubey, Matthew Yu, Abhishek Kadian, Filip Radenovic, Dhruv Mahajan, Kunpeng Li, Yue Zhao, Vladan Petrovic, Mitesh Kumar Singh, Simran Motwani, Yi Wen, Yiwen Song, Roshan Sumbaly, Vignesh Ramanathan, Zijian He, Peter Vajda, Devi Parikh
  • For: 这个论文的目的是提出一种基于Web scale image-text对的文本至图模型训练方法,以生成高质量的视觉概念图像。* Methods: 该论文提出了一种名为“质量调整”的方法,通过精心选择一些高质量且极其视觉吸引人的图像进行监督训练,以使文本至图模型产生更高质量的图像。* Results: 论文的实验结果显示,使用“质量调整”方法可以使文本至图模型产生更高质量的图像,并且比传统的文本至图模型更具有视觉吸引力。
    Abstract Training text-to-image models with web scale image-text pairs enables the generation of a wide range of visual concepts from text. However, these pre-trained models often face challenges when it comes to generating highly aesthetic images. This creates the need for aesthetic alignment post pre-training. In this paper, we propose quality-tuning to effectively guide a pre-trained model to exclusively generate highly visually appealing images, while maintaining generality across visual concepts. Our key insight is that supervised fine-tuning with a set of surprisingly small but extremely visually appealing images can significantly improve the generation quality. We pre-train a latent diffusion model on $1.1$ billion image-text pairs and fine-tune it with only a few thousand carefully selected high-quality images. The resulting model, Emu, achieves a win rate of $82.9\%$ compared with its pre-trained only counterpart. Compared to the state-of-the-art SDXLv1.0, Emu is preferred $68.4\%$ and $71.3\%$ of the time on visual appeal on the standard PartiPrompts and our Open User Input benchmark based on the real-world usage of text-to-image models. In addition, we show that quality-tuning is a generic approach that is also effective for other architectures, including pixel diffusion and masked generative transformer models.
    摘要 培训文本到图像模型使得可以生成广泛的视觉概念从文本。然而,这些预训练模型经常在生成高度美观的图像时遇到问题。这创造了美观对齐的需求。在这篇论文中,我们提出了质量调整来有效地引导预训练后的模型仅生成高度视觉吸引人的图像,而保持视觉概念的通用性。我们的关键发现是,在一小群非常美观但极其吸引人的图像上进行监督微调可以很大程度上提高生成质量。我们在110亿个图像-文本对的基础上预训练了一个抽象扩散模型,然后通过只有几千个精选高质量图像进行微调。得到的模型被称为Emu,其赢得了与预训练只的对手的比赛,其中胜率为82.9%。相比之下,与状态艺术SDXLv1.0进行比赛,Emu被选择了68.4%和71.3%的时间在标准的 PartiPrompts 和我们的实际用途基准测试中的视觉吸引力方面。此外,我们还证明了质量调整是一种通用的方法,也是有效的 для其他架构,包括像素扩散和受Mask的生成变换模型。

A Quantum-Classical Hybrid Block-Matching Algorithm in Noisy Environment using Dissimilarity Measure

  • paper_url: http://arxiv.org/abs/2309.15792
  • repo_url: None
  • paper_authors: M. Martínez-Felipe, J. Montiel-Pérez, V. Onofre-González, A. Maldonado-Romo, Ricky Young
  • for: 这个论文是为了解决图像块匹配问题,即在搜索区域内找到一组相似图像块。
  • methods: 这个论文使用了类比图像处理技术,包括 Gaussian 噪声和图像尺寸减小,以及 phase 图像编码和量子快速幂transform。
  • results: 该论文提出了一种基于 phase 图像编码和 swap 测试的不同性度量,并在理想和噪声掺杂的 simulate 环境中进行了实验,并在 IBM 和 Ionq 量子设备上进行了 Swap 测试。
    Abstract A block-matching algorithm finds a group of similar image patches inside a search area. Similarity/dissimilarity measures can help to solve this problem. In different practical applications, finding groups of similar image blocks within an ample search area is often necessary, such as video compression, image clustering, vector quantization, and nonlocal noise reduction. In this work, classical image processing is performed using Gaussian noise and image size reduction with a fit of a Low-Pass Filter or Domain Transform. A hierarchical search technique is implemented to encode the images by phase operator. Using phase image coding with the quantum Fourier transform and the Swap test, we propose a dissimilarity measure. Results were obtained with perfect and noisy simulations and in the case of the Swap test with the IBM and Ionq quantum devices.
    摘要 块匹配算法在搜索区域内寻找一组相似的图像块,相似性/相异性度量有助于解决这一问题。在许多实际应用中,往往需要在较大的搜索区域内寻找相似图像块,例如视频压缩、图像聚类、矢量量化和非局部降噪。在这项工作中,我们先进行经典图像处理:加入高斯噪声,并通过低通滤波或域变换拟合来缩小图像尺寸;随后采用分层搜索技术,利用相位算子对图像进行编码。基于相位图像编码、量子傅里叶变换和交换测试(Swap test),我们提出了一种相异性度量。我们在理想与含噪的模拟中获得了结果,并在IBM和IonQ量子设备上完成了交换测试实验。
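
The dissimilarity rests on the swap test, whose acceptance probability encodes the fidelity |⟨a|b⟩|² between two encoded image blocks. The sketch below computes that statistic classically on normalized phase-encoded vectors (no quantum SDK involved); the mapping from pixel values to phases is an illustrative assumption, not the paper's encoding.

```python
import numpy as np

def phase_encode(block: np.ndarray) -> np.ndarray:
    """Toy phase encoding: map pixel values in [0, 1] to unit-modulus amplitudes,
    then normalize to a valid state vector."""
    state = np.exp(1j * np.pi * block.ravel())
    return state / np.linalg.norm(state)

def swap_test_dissimilarity(block_a: np.ndarray, block_b: np.ndarray) -> float:
    """Dissimilarity derived from the swap-test statistic.

    The swap test yields P(ancilla = 0) = 1/2 + 1/2 * |<a|b>|^2, so the fidelity
    F = |<a|b>|^2 can be recovered from measurements and 1 - F used as a dissimilarity.
    """
    a, b = phase_encode(block_a), phase_encode(block_b)
    fidelity = abs(np.vdot(a, b)) ** 2
    p_zero = 0.5 + 0.5 * fidelity          # what the hardware would estimate
    return 1.0 - (2.0 * p_zero - 1.0)      # equals 1 - fidelity

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patch = rng.random((8, 8))
    print(swap_test_dissimilarity(patch, patch))               # ~0.0 for identical blocks
    print(swap_test_dissimilarity(patch, rng.random((8, 8))))  # > 0 for different blocks
```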

Partial Transport for Point-Cloud Registration

  • paper_url: http://arxiv.org/abs/2309.15787
  • repo_url: None
  • paper_authors: Yikun Bai, Huy Tran, Steven B. Damelin, Soheil Kolouri
  • for: 非静止点云注册问题在机器人、计算机图形和医疗成像等领域中扮演着关键角色,其中面临非静止运动和部分可见性(如干扰或感测器噪声)的问题。
  • methods: 本文通过优化运输问题和其不均衡变种(如优化部分运输问题)来解决非静止点云注册问题,并提出一系列基于优化partial运输问题的非静止注册方法。然后,通过利用一 dimensional优化partial运输问题的有效解决方法的扩展,提高了算法的计算效率,从而实现了快速和稳定的非静止注册算法。
  • results: 本文通过对多个3D和2D非静止注册问题进行测试和比较,证明了我们提出的方法的有效性和稳定性。在扰动和噪声的情况下,我们的方法可以快速和精度地解决非静止注册问题。
    Abstract Point cloud registration plays a crucial role in various fields, including robotics, computer graphics, and medical imaging. This process involves determining spatial relationships between different sets of points, typically within a 3D space. In real-world scenarios, complexities arise from non-rigid movements and partial visibility, such as occlusions or sensor noise, making non-rigid registration a challenging problem. Classic non-rigid registration methods are often computationally demanding, suffer from unstable performance, and, importantly, have limited theoretical guarantees. The optimal transport problem and its unbalanced variations (e.g., the optimal partial transport problem) have emerged as powerful tools for point-cloud registration, establishing a strong benchmark in this field. These methods view point clouds as empirical measures and provide a mathematically rigorous way to quantify the `correspondence' between (the transformed) source and target points. In this paper, we approach the point-cloud registration problem through the lens of optimal transport theory and first propose a comprehensive set of non-rigid registration methods based on the optimal partial transportation problem. Subsequently, leveraging the emerging work on efficient solutions to the one-dimensional optimal partial transport problem, we extend our proposed algorithms via slicing to gain significant computational efficiency, resulting in fast and robust non-rigid registration algorithms. We demonstrate the effectiveness of our proposed methods and compare them against baselines on various 3D and 2D non-rigid registration problems where the source and target point clouds are corrupted by random noise.
    摘要 点云注册在不同领域中扮演着关键的角色,包括机器人学、计算机图形学和医学影像。这个过程涉及到不同集点之间的空间关系的确定,通常在3D空间中。在实际应用中,复杂性来自于非RIGID运动和部分可见性,如遮挡或感器噪声,使得非RIGID注册成为一个具有挑战性的问题。 классические非RIGID注册方法通常具有计算昂贵、性能不稳定和有限的理论保证。优化运输问题和其不均变种(例如优化部分运输问题)在点云注册中发挥了强大的作用,成为这个领域的标准准则。这些方法视点云为实际测量,并提供了数学上的正式方式来衡量(变换后)源点云和目标点云之间的匹配程度。在本文中,我们通过优化运输理论的镜像来解决点云注册问题,并首次提出了基于优化部分运输问题的全面非RIGID注册方法。然后,通过利用emerging工作在一维优化部分运输问题上的高效解决方法,我们扩展了我们的提议算法,从而获得了快速和可靠的非RIGID注册算法。我们在不同的3D和2D非RIGID注册问题上证明了我们的提议方法的有效性,并与基准方法进行比较。

One For All: Video Conversation is Feasible Without Video Instruction Tuning

  • paper_url: http://arxiv.org/abs/2309.15785
  • repo_url: https://github.com/farewellthree/BT-Adapter
  • paper_authors: Ruyang Liu, Chen Li, Yixiao Ge, Ying Shan, Thomas H. Li, Ge Li
  • for: 提高视频对话系统的效能,使用现有的图像对话模型进行扩展。
  • methods: 提出了一种名为 Branching Temporal Adapter(BT-Adapter)的新方法,可以将图像语言预测模型扩展到视频领域。BT-Adapter acting as a temporal modeling branch alongside the pretrained visual encoder,并且在保持backbone冻结的情况下进行调教。
  • results: 通过BT-Adapter,可以让现有的多Modal对话模型具备强大的视频理解能力,而无需耗费过多的GPU资源。BT-Adapter可以在少量的GPU时间内达到state-of-the-art的零基eline结果,并且在不带视频指导的情况下达到更好的性能。
    Abstract The recent progress in Large Language Models (LLM) has spurred various advancements in image-language conversation agents, while how to build a proficient video-based dialogue system is still under exploration. Considering the extensive scale of LLM and visual backbone, minimal GPU memory is left for facilitating effective temporal modeling, which is crucial for comprehending and providing feedback on videos. To this end, we propose Branching Temporal Adapter (BT-Adapter), a novel method for extending image-language pretrained models into the video domain. Specifically, BT-Adapter serves as a plug-and-use temporal modeling branch alongside the pretrained visual encoder, which is tuned while keeping the backbone frozen. Just pretrained once, BT-Adapter can be seamlessly integrated into all image conversation models using this version of CLIP, enabling video conversations without the need for video instructions. Besides, we develop a unique asymmetric token masking strategy inside the branch with tailor-made training tasks for BT-Adapter, facilitating faster convergence and better results. Thanks to BT-Adapter, we are able to empower existing multimodal dialogue models with strong video understanding capabilities without incurring excessive GPU costs. Without bells and whistles, BT-Adapter achieves (1) state-of-the-art zero-shot results on various video tasks using thousands of fewer GPU hours. (2) better performance than current video chatbots without any video instruction tuning. (3) state-of-the-art results of video chatting using video instruction tuning, outperforming previous SOTAs by a large margin.
    摘要 Recent progress in Large Language Models (LLM) has led to advancements in image-language conversation agents, but how to build a proficient video-based dialogue system is still being explored. Due to the extensive scale of LLM and visual backbone, there is limited GPU memory available for effective temporal modeling, which is crucial for understanding and providing feedback on videos. To address this challenge, we propose Branching Temporal Adapter (BT-Adapter), a novel method for extending image-language pretrained models into the video domain. Specifically, BT-Adapter serves as a plug-and-use temporal modeling branch alongside the pretrained visual encoder, which is tuned while keeping the backbone frozen. With just one pretraining, BT-Adapter can be seamlessly integrated into all image conversation models using this version of CLIP, enabling video conversations without the need for video instructions. Additionally, we develop a unique asymmetric token masking strategy inside the branch with tailor-made training tasks for BT-Adapter, which facilitates faster convergence and better results. Thanks to BT-Adapter, we can empower existing multimodal dialogue models with strong video understanding capabilities without incurring excessive GPU costs. Without any bells and whistles, BT-Adapter achieves the following:1. State-of-the-art zero-shot results on various video tasks using thousands of fewer GPU hours.2. Better performance than current video chatbots without any video instruction tuning.3. State-of-the-art results of video chatting using video instruction tuning, outperforming previous SOTAs by a large margin.

Joint-YODNet: A Light-weight Object Detector for UAVs to Achieve Above 100fps

  • paper_url: http://arxiv.org/abs/2309.15782
  • repo_url: None
  • paper_authors: Vipin Gautam, Shitala Prasad, Sharad Sinha
  • for: 这篇论文旨在提高无人航空车(UAV)影像中小物体检测的精度。
  • methods: 本论文提出了一个新的联合损失函数(JointYODNet),用于强化小物体检测的精度。这个联合损失函数结合了对小物体检测的特有损失函数。
  • results: 经过广泛的实验,我们发现我们提出的联合损失函数可以优化小物体检测的精度。特别是,我们的方法在不同环境下检测小物体的精度是97.1%,F1 Score是97.5%,并且实现了mAP@.5的98.6%。
    Abstract Small object detection via UAV (Unmanned Aerial Vehicle) images captured from drones and radar is a complex task with several formidable challenges. This domain encompasses numerous complexities that impede the accurate detection and localization of small objects. To address these challenges, we propose a novel method called JointYODNet for UAVs to detect small objects, leveraging a joint loss function specifically designed for this task. Our method revolves around the development of a joint loss function tailored to enhance the detection performance of small objects. Through extensive experimentation on a diverse dataset of UAV images captured under varying environmental conditions, we evaluated different variations of the loss function and determined the most effective formulation. The results demonstrate that our proposed joint loss function outperforms existing methods in accurately localizing small objects. Specifically, our method achieves a recall of 0.971, and a F1Score of 0.975, surpassing state-of-the-art techniques. Additionally, our method achieves a mAP@.5(%) of 98.6, indicating its robustness in detecting small objects across varying scales
    摘要 小物体检测 via UAV(无人航空器)图像 captured from drones和雷达是一项复杂任务,涉及许多可考的挑战。这个领域涵盖许多复杂性,阻碍精准检测和定位小物体。为解决这些挑战,我们提出了一种新的方法 called JointYODNet,用于UAVs中的小物体检测。我们的方法基于特制的联合损失函数,用于提高小物体检测性能。通过对不同环境下UAV图像的广泛实验,我们评估了不同版本的损失函数,并确定了最有效的形式。结果表明,我们的联合损失函数可以高效地地localize小物体,并且在不同的缩放比例下保持稳定性。具体来说,我们的方法达到了0.971的回归率,和0.975的F1Score,这两个指标都高于当前的State-of-the-art技术。此外,我们的方法达到了98.6%的mAP@.5(%),这表明它在不同的缩放比例下具有较高的小物体检测稳定性。

AaP-ReID: Improved Attention-Aware Person Re-identification

  • paper_url: http://arxiv.org/abs/2309.15780
  • repo_url: None
  • paper_authors: Vipin Gautam, Shitala Prasad, Sharad Sinha
  • for: 本研究的目标是解决人识别 task 中的特定个体识别问题,以提高人识别的精度和可靠性。
  • methods: 我们提出了一种基于 ResNet 架构的 AaP-ReID 方法,其中包含 Channel-Wise Attention Bottleneck (CWAbottleneck) 块,可以动态调整每个通道的重要性,以学习更有力的特征。
  • results: 我们在 Market-1501、DukeMTMC-reID 和 CUHK03 三个 benchmark 数据集上进行了评估,与state-of-the-art 人识别方法相比,我们的 AaP-ReID 方法在rank-1准确率上达到了 95.6%、90.6% 和 82.4% 的水平。
    Abstract Person re-identification (ReID) is a well-known problem in the field of computer vision. The primary objective is to identify a specific individual within a gallery of images. However, this task is challenging due to various factors, such as pose variations, illumination changes, obstructions, and the presence ofconfusing backgrounds. Existing ReID methods often fail to capture discriminative features (e.g., head, shoes, backpacks) and instead capture irrelevant features when the target is occluded. Motivated by the success of part-based and attention-based ReID methods, we improve AlignedReID++ and present AaP-ReID, a more effective method for person ReID that incorporates channel-wise attention into a ResNet-based architecture. Our method incorporates the Channel-Wise Attention Bottleneck (CWAbottleneck) block and can learn discriminating features by dynamically adjusting the importance ofeach channel in the feature maps. We evaluated Aap-ReID on three benchmark datasets: Market-1501, DukeMTMC-reID, and CUHK03. When compared with state-of-the-art person ReID methods, we achieve competitive results with rank-1 accuracies of 95.6% on Market-1501, 90.6% on DukeMTMC-reID, and 82.4% on CUHK03.
    摘要 人脸重认(ReID)是计算机视觉领域的一个很有名的问题。主要目标是在一组图像中识别特定的个体。但这个任务受到多种因素的影响,如 pose 变化、照明变化、阻挡物和误导背景的存在。现有的 ReID 方法 oftentimes 未能捕捉特征特征(例如头、鞋、背包),而是在目标被遮盖时捕捉无关的特征。我们受到部分基于和注意力基于 ReID 方法的成功的激励,我们改进了 AlignedReID++ 并提出了 AaP-ReID,一种更有效的人脸重认方法。我们的方法包括 Channel-Wise Attention Bottleneck(CWAbottleneck)块,可以在 ResNet 基本架构中动态调整每个通道的重要性,从而学习特征。我们在 Market-1501、DukeMTMC-reID 和 CUHK03 三个标准数据集上评估 AaP-ReID,与状态之前的人脸重认方法相比,我们实现了竞争性的结果,rank-1 准确率分别达到 95.6%、90.6% 和 82.4%。
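
The CWAbottleneck re-weights channels inside a ResNet bottleneck so that discriminative cues (head, shoes, bags) can be emphasized. A minimal squeeze-and-excitation-style reading of that idea is sketched below; the reduction ratio and where exactly the attention sits in the block are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class ChannelWiseAttention(nn.Module):
    """SE-style channel attention: global-pool each channel, pass the vector
    through a small bottleneck MLP, and rescale the feature map."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        weights = self.fc(x.mean(dim=(2, 3)))     # (b, c) per-channel importances
        return x * weights.view(b, c, 1, 1)

class CWABottleneck(nn.Module):
    """ResNet-style bottleneck with channel attention on the residual branch."""

    def __init__(self, channels: int):
        super().__init__()
        mid = channels // 4
        self.branch = nn.Sequential(
            nn.Conv2d(channels, mid, 1, bias=False), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1, bias=False), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1, bias=False), nn.BatchNorm2d(channels),
        )
        self.attention = ChannelWiseAttention(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.attention(self.branch(x)))

if __name__ == "__main__":
    block = CWABottleneck(256)
    print(block(torch.randn(2, 256, 24, 8)).shape)   # typical ReID feature-map size
```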

Aperture Diffraction for Compact Snapshot Spectral Imaging

  • paper_url: http://arxiv.org/abs/2309.16372
  • repo_url: https://github.com/krito-ex/csst
  • paper_authors: Tao Lv, Hao Ye, Quan Yuan, Zhan Shi, Yibo Wang, Shuming Wang, Xun Cao
  • for: 这个论文旨在描述一种名为 aperature Diffraction Imaging Spectrometer(ADIS)的新型快速 spectral imaging 系统,该系统只有一个映射镜和一个多元滤色器传感器,不需要任何额外的物理设备。
  • methods: 论文提出了一种新的光学设计,即通过diffraction-based spatial-spectral projection engineering来将对象空间中的每个点多态化到滤色器传感器上的不同编码位置。这种多态化的设计使得只需要单个曝光幕Raw image data可以实现高 spectral resolution和低aliasing的图像重建。
  • results: experiments show that the proposed system can achieve sub-super-pixel spatial resolution and high spectral resolution imaging, and the reconstructed images are highly consistent with the original data. In addition, the system is evaluated by analyzing the imaging optical theory and reconstruction algorithm, and the code will be available at GitHub.
    Abstract We demonstrate a compact, cost-effective snapshot spectral imaging system named Aperture Diffraction Imaging Spectrometer (ADIS), which consists only of an imaging lens with an ultra-thin orthogonal aperture mask and a mosaic filter sensor, requiring no additional physical footprint compared to common RGB cameras. Then we introduce a new optical design that each point in the object space is multiplexed to discrete encoding locations on the mosaic filter sensor by diffraction-based spatial-spectral projection engineering generated from the orthogonal mask. The orthogonal projection is uniformly accepted to obtain a weakly calibration-dependent data form to enhance modulation robustness. Meanwhile, the Cascade Shift-Shuffle Spectral Transformer (CSST) with strong perception of the diffraction degeneration is designed to solve a sparsity-constrained inverse problem, realizing the volume reconstruction from 2D measurements with Large amount of aliasing. Our system is evaluated by elaborating the imaging optical theory and reconstruction algorithm with demonstrating the experimental imaging under a single exposure. Ultimately, we achieve the sub-super-pixel spatial resolution and high spectral resolution imaging. The code will be available at: https://github.com/Krito-ex/CSST.
    摘要 我们提出了一种具有高效性和可持续性的快照spectral imaging系统,名为Aperture Diffraction Imaging Spectrometer(ADIS)。该系统只有一个映射镜和一个 orthogonal aperture mask,而不需要任何额外的物理空间。我们还介绍了一种新的光学设计,使得对象空间中的每个点都被多样化到灰度滤波器传感器上的独立编码位置。这种多样化是通过扩散基于的空间-спектраль投影工程实现的,从而获得弱依赖于均衡 calibration 的数据形式,以提高模拟稳定性。同时,我们设计了一种叫做Cascade Shift-Shuffle Spectral Transformer(CSST)的新型神经网络,用于解决一个具有扩散约束的减少问题,实现从2D测量获得3D重建。我们通过推导光学学理和重建算法,并进行实验测试,最终实现了下sampling 的超分辨率和高spectral resolution的成像。系统代码将在https://github.com/Krito-ex/CSST 上提供。

High Perceptual Quality Wireless Image Delivery with Denoising Diffusion Models

  • paper_url: http://arxiv.org/abs/2309.15889
  • repo_url: None
  • paper_authors: Selim F. Yilmaz, Xueyan Niu, Bo Bai, Wei Han, Lei Deng, Deniz Gunduz
  • for: 实现受损图像传输过程中的杂音无线通信频道之间的深度学习基于的共同源码渠道(DeepJSCC)和评估频道的数据模型(DDPM)。
  • methods: 使用目标图像的范围空间分解,将图像转换成范围空间后,使用DDPM进行累进填充null空间内容。
  • results: 在实际finite block length regime中,与传统DeepJSCC和当前学习型基于方法进行比较,实现了较好的干扰和人类视觉质量。将源代码公开供研究和重现。
    Abstract We consider the image transmission problem over a noisy wireless channel via deep learning-based joint source-channel coding (DeepJSCC) along with a denoising diffusion probabilistic model (DDPM) at the receiver. Specifically, we are interested in the perception-distortion trade-off in the practical finite block length regime, in which separate source and channel coding can be highly suboptimal. We introduce a novel scheme that utilizes the range-null space decomposition of the target image. We transmit the range-space of the image after encoding and employ DDPM to progressively refine its null space contents. Through extensive experiments, we demonstrate significant improvements in distortion and perceptual quality of reconstructed images compared to standard DeepJSCC and the state-of-the-art generative learning-based method. We will publicly share our source code to facilitate further research and reproducibility.
    摘要 我们考虑了通过深度学习基于源-通道编码(DeepJSCC)和推 diffusion概率模型(DDPM)的图像传输问题,特别是在实际 finite block length Régime中进行评估。我们关注图像传输过程中的觉受-误差交易,在这种情况下,分离的源和通道编码可能是非常不优化的。我们提出了一种新的方案,利用目标图像的范围空间划分。我们在编码后将范围空间传输给接收方,并使用 DDPM 进行逐渐提高null空间内容的进程。经过广泛的实验,我们发现了对于重建图像的误差和人类识别质量都有显著改善,相比标准 DeepJSCC 和当前最佳生成学习基于方法。我们将将源代码公开发布,以便进一步的研究和复现。
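
The scheme splits the image into a range-space component (pinned down by the transmitted measurements) and a null-space component (progressively filled in by the DDPM at the receiver), using the identity x = A⁺Ax + (I - A⁺A)x for a linear operator A. The numpy sketch below verifies that identity for a random operator; A here is an arbitrary matrix standing in for the learned encoder, not the paper's model.

```python
import numpy as np

def range_null_decomposition(x: np.ndarray, A: np.ndarray):
    """Split x into its component recoverable from Ax (row space of A)
    and its component invisible to A (null space of A)."""
    A_pinv = np.linalg.pinv(A)
    x_range = A_pinv @ (A @ x)        # what the receiver can pin down from Ax
    x_null = x - x_range              # what the diffusion prior must fill in
    return x_range, x_null

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(64)            # flattened image
    A = rng.standard_normal((16, 64))      # stand-in linear measurement/encoding operator
    x_range, x_null = range_null_decomposition(x, A)
    print(np.allclose(x, x_range + x_null))          # decomposition is exact
    print(np.allclose(A @ x_null, 0, atol=1e-8))     # null part is invisible to A
```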

Rapid Network Adaptation: Learning to Adapt Neural Networks Using Test-Time Feedback

  • paper_url: http://arxiv.org/abs/2309.15762
  • repo_url: None
  • paper_authors: Teresa Yeo, Oğuzhan Fatih Kar, Zahra Sodagar, Amir Zamir
  • for: 本文提出了一种适应分布shift的方法,用于在测试时进行适应。与传统的训练时Robustness机制不同,我们创建了一个循环系统,并使用测试时反馈信号来适应网络。
  • methods: 我们使用了一种学习基于的函数来实现这个循环系统,实现了一个摘要优化器 для网络。
  • results: 我们通过了广泛的实验,包括不同的散度shift、任务和数据集,并显示了这种方法的高效性和灵活性。
    Abstract We propose a method for adapting neural networks to distribution shifts at test-time. In contrast to training-time robustness mechanisms that attempt to anticipate and counter the shift, we create a closed-loop system and make use of a test-time feedback signal to adapt a network on the fly. We show that this loop can be effectively implemented using a learning-based function, which realizes an amortized optimizer for the network. This leads to an adaptation method, named Rapid Network Adaptation (RNA), that is notably more flexible and orders of magnitude faster than the baselines. Through a broad set of experiments using various adaptation signals and target tasks, we study the efficiency and flexibility of this method. We perform the evaluations using various datasets (Taskonomy, Replica, ScanNet, Hypersim, COCO, ImageNet), tasks (depth, optical flow, semantic segmentation, classification), and distribution shifts (Cross-datasets, 2D and 3D Common Corruptions) with promising results. We end with a discussion on general formulations for handling distribution shifts and our observations from comparing with similar approaches from other domains.
    摘要 我们提出了一种在测试时适应分布偏移的方法。与试图预判并抵御分布偏移的训练时鲁棒性机制不同,我们构建了一个闭环系统,利用测试时反馈信号在线(on the fly)调整网络。我们表明,这种闭环可以通过一个基于学习的函数高效实现,它相当于为网络构建了一个摊销式优化器。由此得到的适应方法称为快速网络适应(Rapid Network Adaptation,RNA),它比基线方法更加灵活,且速度快几个数量级。通过使用不同的适应信号和目标任务,我们在各种实验中研究了该方法的效率和灵活性。我们使用了多个数据集(Taskonomy、Replica、ScanNet、Hypersim、COCO、ImageNet)、任务(深度估计、光流、语义分割、分类)和分布偏移(跨数据集、2D 和 3D Common Corruptions),并获得了可观的结果。最后,我们讨论了处理分布偏移的一般形式,并与其他领域的类似方法进行了比较。
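As a rough illustration of the closed-loop idea (not the paper's learned amortized optimizer), the sketch below adapts only a small set of parameters at test time by taking gradient steps on a feedback loss computed from whatever sparse supervision signal is available for the test sample. All module and signal names are hypothetical.

```python
import torch

def test_time_adapt(model, adapter_params, x, feedback_loss_fn, steps=3, lr=1e-3):
    """Gradient-based stand-in for a learned test-time adaptation loop.

    model            : frozen backbone producing predictions y = model(x)
    adapter_params   : small list of tensors allowed to change at test time
    feedback_loss_fn : maps a prediction to a scalar test-time feedback signal
                       (e.g. consistency with a few sparse ground-truth pixels)
    """
    opt = torch.optim.SGD(adapter_params, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        y = model(x)
        loss = feedback_loss_fn(y)   # closed-loop error signal measured at test time
        loss.backward()
        opt.step()                   # RNA replaces this step with a learned (amortized) update
    with torch.no_grad():
        return model(x)
```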

CAIT: Triple-Win Compression towards High Accuracy, Fast Inference, and Favorable Transferability For ViTs

  • paper_url: http://arxiv.org/abs/2309.15755
  • repo_url: None
  • paper_authors: Ao Wang, Hui Chen, Zijia Lin, Sicheng Zhao, Jungong Han, Guiguang Ding
  • for: 本文旨在提出一种面向 ViT 的联合压缩方法,在提升推理速度的同时保持精度,并保留对下游任务的良好可迁移性。
  • methods: 该方法采用一种非对称 token 合并策略(ATME),通过合并相邻 token 有效压缩冗余的 token 信息,同时保留图像的空间结构;此外还采用一种一致动态通道剪枝策略(CDCP),动态剪除 ViT 中不重要的通道,进一步提升模型压缩效果。
  • results: 大量实验表明,该方法在多种 ViT 上均可达到最优性能:例如剪枝后的 DeiT-Tiny 和 DeiT-Small 在 ImageNet 上分别获得 1.7$\times$ 和 1.9$\times$ 的加速且无精度损失;在 ADE20k 分割数据集上,该方法在 mIoU 相当的情况下可获得最高 1.31$\times$ 的加速。
    Abstract Vision Transformers (ViTs) have emerged as state-of-the-art models for various vision tasks recently. However, their heavy computation costs remain daunting for resource-limited devices. Consequently, researchers have dedicated themselves to compressing redundant information in ViTs for acceleration. However, they generally sparsely drop redundant image tokens by token pruning or brutally remove channels by channel pruning, leading to a sub-optimal balance between model performance and inference speed. They are also disadvantageous in transferring compressed models to downstream vision tasks that require the spatial structure of images, such as semantic segmentation. To tackle these issues, we propose a joint compression method for ViTs that offers both high accuracy and fast inference speed, while also maintaining favorable transferability to downstream tasks (CAIT). Specifically, we introduce an asymmetric token merging (ATME) strategy to effectively integrate neighboring tokens. It can successfully compress redundant token information while preserving the spatial structure of images. We further employ a consistent dynamic channel pruning (CDCP) strategy to dynamically prune unimportant channels in ViTs. Thanks to CDCP, insignificant channels in multi-head self-attention modules of ViTs can be pruned uniformly, greatly enhancing the model compression. Extensive experiments on benchmark datasets demonstrate that our proposed method can achieve state-of-the-art performance across various ViTs. For example, our pruned DeiT-Tiny and DeiT-Small achieve speedups of 1.7$\times$ and 1.9$\times$, respectively, without accuracy drops on ImageNet. On the ADE20k segmentation dataset, our method can enjoy up to 1.31$\times$ speedups with comparable mIoU. Our code will be publicly available.
    摘要 我们的目标是提出一种在保持高准确率的同时实现快速推理的 ViT 压缩方法,并使压缩后的模型能够迁移到需要图像空间结构的下游视觉任务。我们提出了一种非对称 token 合并策略(ATME),可以有效压缩冗余的 token 信息,同时保留图像的空间结构。此外,我们还采用了一种一致动态通道剪枝策略(CDCP),可以动态剪除 ViT 中不重要的通道,从而显著提升模型压缩效果。我们的方法可以在多种 ViT 上达到领先的性能:例如,压缩后的 DeiT-Tiny 和 DeiT-Small 在 ImageNet 上分别获得 1.7 倍和 1.9 倍的加速,且准确率不降;在 ADE20k 分割数据集上,我们的方法在 mIoU 相当的情况下可获得最高 1.31 倍的加速。我们的代码将公开。

InfraParis: A multi-modal and multi-task autonomous driving dataset

  • paper_url: http://arxiv.org/abs/2309.15751
  • repo_url: None
  • paper_authors: Gianni Franchi, Marwane Hariat, Xuanlong Yu, Nacim Belkhir, Antoine Manzanera, David Filliat
  • for: 这个论文旨在提供一个多模态数据集,以便提高自动驾驶计算机视觉模型的可靠性和多样化性。
  • methods: 这个论文使用了多种现有的深度神经网络模型,并对其进行了评估。
  • results: 论文发现,使用多模态数据集可以提高模型的性能,并且可以更好地处理新的对象、噪音、夜间条件和多样化场景。
    Abstract Current deep neural networks (DNNs) for autonomous driving computer vision are typically trained on specific datasets that only involve a single type of data and urban scenes. Consequently, these models struggle to handle new objects, noise, nighttime conditions, and diverse scenarios, which is essential for safety-critical applications. Despite ongoing efforts to enhance the resilience of computer vision DNNs, progress has been sluggish, partly due to the absence of benchmarks featuring multiple modalities. We introduce a novel and versatile dataset named InfraParis that supports multiple tasks across three modalities: RGB, depth, and infrared. We assess various state-of-the-art baseline techniques, encompassing models for the tasks of semantic segmentation, object detection, and depth estimation.
    摘要 当前用于自动驾驶计算机视觉的深度神经网络(DNN)通常在特定的数据集上训练,这些数据集只包含单一模态的数据和城市场景。因此,这些模型难以处理新的物体、噪声、夜间条件和多样化的场景,而这对安全攸关的应用至关重要。尽管人们持续努力提升计算机视觉 DNN 的鲁棒性,但进展缓慢,部分原因是缺乏包含多种模态的基准数据集。我们介绍了一个新的多用途数据集 InfraParis,它支持跨 RGB、深度和红外三种模态的多个任务。我们评估了多种当前最先进的基线方法,涵盖语义分割、目标检测和深度估计等任务的模型。

Automated CT Lung Cancer Screening Workflow using 3D Camera

  • paper_url: http://arxiv.org/abs/2309.15750
  • repo_url: None
  • paper_authors: Brian Teixeira, Vivek Singh, Birgi Tamersoy, Andreas Prokein, Ankur Kapoor
  • for: 这篇论文的目的是消除 CT 肺癌筛查中耗时的定位扫描(scout scan),并实现患者摆位的自动化。
  • methods: 论文提出了一种新方法,能够仅从 3D 相机图像估计患者的扫描范围、等中心点和水当量直径(WED),无需定位扫描数据。该方法在超过 60,000 例 CT 扫描数据上训练了一个隐式生成模型,并引入了一种利用实时扫描数据更新预测的新机制。
  • results: 在 110 对深度数据与 CT 扫描组成的测试集上,该方法估计等中心点的平均误差为 5mm、扫描范围为 13mm、AP 与侧向 WED 分别为 10mm 和 16mm;相对 WED 误差为 4%,远低于国际电工委员会(IEC)10% 的验收标准。
    Abstract Despite recent developments in CT planning that enabled automation in patient positioning, time-consuming scout scans are still needed to compute dose profile and ensure the patient is properly positioned. In this paper, we present a novel method which eliminates the need for scout scans in CT lung cancer screening by estimating patient scan range, isocenter, and Water Equivalent Diameter (WED) from 3D camera images. We achieve this task by training an implicit generative model on over 60,000 CT scans and introduce a novel approach for updating the prediction using real-time scan data. We demonstrate the effectiveness of our method on a testing set of 110 pairs of depth data and CT scan, resulting in an average error of 5mm in estimating the isocenter, 13mm in determining the scan range, 10mm and 16mm in estimating the AP and lateral WED respectively. The relative WED error of our method is 4%, which is well within the International Electrotechnical Commission (IEC) acceptance criteria of 10%.
    摘要 尽管最近的 CT 规划技术已经实现了患者摆位的自动化,但仍需要耗时的定位扫描(scout scan)来计算剂量分布并确认患者位置正确。在这篇论文中,我们提出了一种新方法,通过从 3D 相机图像估计患者的扫描范围、等中心点和水当量直径(WED),消除了 CT 肺癌筛查中的定位扫描。我们在超过 60,000 例 CT 扫描上训练了一个隐式生成模型,并引入了一种利用实时扫描数据更新预测的新方法。在由 110 对深度数据与 CT 扫描组成的测试集上,等中心点、扫描范围、AP 向与侧向 WED 的平均估计误差分别为 5mm、13mm、10mm 和 16mm;相对 WED 误差为 4%,远低于国际电工委员会(IEC)10% 的验收标准。
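For context, the sketch below computes the water equivalent diameter of a single axial CT slice using the standard AAPM-style definition (water-equivalent area from CT numbers, then the diameter of a circle with that area); it is background for the WED quantity the paper estimates, not the paper's camera-based method.

```python
import numpy as np

def water_equivalent_diameter(ct_slice_hu: np.ndarray, pixel_area_mm2: float) -> float:
    """Water equivalent diameter (mm) of one axial slice.

    ct_slice_hu    : 2D array of CT numbers in Hounsfield units (air ~ -1000, water ~ 0)
    pixel_area_mm2 : area of a single pixel in mm^2
    """
    # Water-equivalent area: A_w = sum( HU/1000 + 1 ) * pixel_area
    a_w = np.sum(ct_slice_hu / 1000.0 + 1.0) * pixel_area_mm2
    # Diameter of the circle whose area equals A_w
    return 2.0 * np.sqrt(a_w / np.pi)

# Toy example: a 200 mm diameter water cylinder on a 512x512 grid with 0.5x0.5 mm pixels
yy, xx = np.mgrid[:512, :512]
phantom = np.where((xx - 256) ** 2 + (yy - 256) ** 2 <= 200 ** 2, 0.0, -1000.0)
print(water_equivalent_diameter(phantom, pixel_area_mm2=0.25))  # ~200 mm
```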

Synthetic Latent Fingerprint Generation Using Style Transfer

  • paper_url: http://arxiv.org/abs/2309.15734
  • repo_url: None
  • paper_authors: Amol S. Joshi, Ali Dabouei, Nasser Nasrabadi, Jeremy Dawson
  • for: 这篇论文旨在提供一种简单实用的方法来生成具有实际特征的潜在指纹资料,以便训练需要大量资料的神经网络模型。
  • methods: 本研究使用了Style Transfer和图像融合技术来实现潜在指纹生成。
  • results: 实验结果显示,生成的潜在指纹资料 preserve 输入触感指纹资料中的身份信息,并具有真实潜在指纹资料的特征。此外,生成的指纹资料显示了多种特征和样式,表明提案方法可以从同一个指纹中产生多个样本。
    Abstract Limited data availability is a challenging problem in the latent fingerprint domain. Synthetically generated fingerprints are vital for training data-hungry neural network-based algorithms. Conventional methods distort clean fingerprints to generate synthetic latent fingerprints. We propose a simple and effective approach using style transfer and image blending to synthesize realistic latent fingerprints. Our evaluation criteria and experiments demonstrate that the generated synthetic latent fingerprints preserve the identity information from the input contact-based fingerprints while possessing similar characteristics as real latent fingerprints. Additionally, we show that the generated fingerprints exhibit several qualities and styles, suggesting that the proposed method can generate multiple samples from a single fingerprint.
    摘要 数据可用性有限是潜指纹(latent fingerprint)领域的一大挑战。合成生成的指纹对于训练依赖大量数据的神经网络算法至关重要。传统方法通过对清晰指纹施加畸变来生成合成的潜指纹。我们提出了一种简单有效的方法,使用风格迁移(style transfer)与图像融合来合成逼真的潜指纹。我们的评估标准和实验表明,生成的合成潜指纹保留了输入的接触式指纹中的身份信息,同时具有与真实潜指纹相似的特征。此外,我们还展示了生成的指纹具有多种品质和风格,表明该方法可以从单枚指纹生成多个样本。

Factorized Diffusion Architectures for Unsupervised Image Generation and Segmentation

  • paper_url: http://arxiv.org/abs/2309.15726
  • repo_url: None
  • paper_authors: Xin Yuan, Michael Maire
  • for: 这个论文是为了开发一种无监督的神经网络架构,用于同时生成和分割图像。
  • methods: 该模型以无监督的去噪扩散目标驱动学习,训练时不需要任何标注或关于区域的先验知识。神经架构中内置的计算瓶颈促使去噪网络将输入图像划分为若干区域、并行去噪并融合结果。
  • results: 训练后的模型既可以生成高质量的合成图像,也可以通过检查其内部预测的区域划分获得图像的语义分割;无需任何微调即可直接用于真实图像的无监督分割。实验结果表明,该模型在多个数据集上均能实现准确的无监督图像分割和高质量的合成图像生成。
    Abstract We develop a neural network architecture which, trained in an unsupervised manner as a denoising diffusion model, simultaneously learns to both generate and segment images. Learning is driven entirely by the denoising diffusion objective, without any annotation or prior knowledge about regions during training. A computational bottleneck, built into the neural architecture, encourages the denoising network to partition an input into regions, denoise them in parallel, and combine the results. Our trained model generates both synthetic images and, by simple examination of its internal predicted partitions, a semantic segmentation of those images. Without any finetuning, we directly apply our unsupervised model to the downstream task of segmenting real images via noising and subsequently denoising them. Experiments demonstrate that our model achieves accurate unsupervised image segmentation and high-quality synthetic image generation across multiple datasets.
    摘要 我们开发了一种神经网络架构,以无监督的去噪扩散模型方式训练,同时学习生成与分割图像。学习完全由去噪扩散目标驱动,训练过程中不需要任何标注或关于区域的先验知识。该神经架构内置的计算瓶颈促使去噪网络将输入图像划分为若干区域、并行去噪并融合结果。训练后的模型既能生成合成图像,也能通过直接检查其内部预测的区域划分得到这些图像的语义分割。无需任何微调,我们即可将该无监督模型直接应用于真实图像的分割下游任务:先对图像加噪,再进行去噪。实验表明,我们的模型在多个数据集上均能实现准确的无监督图像分割和高质量的合成图像生成。

Physics-Based Rigid Body Object Tracking and Friction Filtering From RGB-D Videos

  • paper_url: http://arxiv.org/abs/2309.15703
  • repo_url: None
  • paper_authors: Rama Krishna Kandukuri, Michael Strecke, Joerg Stueckler
  • for: 本文旨在从感知观测中理解物体间的相互作用,以便在增强现实和机器人领域中实现更精确的模拟与控制。
  • methods: 本文提出了一种新方法,使用可微分物理仿真作为扩展卡尔曼滤波的状态转移模型,从 RGB-D 图像中跟踪 3D 刚体物体的位姿并推断其物理属性。
  • results: 实验结果表明,该方法能够滤波物体的位置、姿态和速度,并同时估计物体的摩擦系数;本文还在多种滑动场景的合成图像序列以及真实数据集上分析并评估了该方法的性能。
    Abstract Physics-based understanding of object interactions from sensory observations is an essential capability in augmented reality and robotics. It enables capturing the properties of a scene for simulation and control. In this paper, we propose a novel approach for real-to-sim which tracks rigid objects in 3D from RGB-D images and infers physical properties of the objects. We use a differentiable physics simulation as state-transition model in an Extended Kalman Filter which can model contact and friction for arbitrary mesh-based shapes and in this way estimate physically plausible trajectories. We demonstrate that our approach can filter position, orientation, velocities, and concurrently can estimate the coefficient of friction of the objects. We analyse our approach on various sliding scenarios in synthetic image sequences of single objects and colliding objects. We also demonstrate and evaluate our approach on a real-world dataset. We will make our novel benchmark datasets publicly available to foster future research in this novel problem setting and comparison with our method.
    摘要 基于物理的方式从感知观测中理解物体间的相互作用,是增强现实和机器人领域的一项关键能力,它使我们能够捕捉场景的属性以进行模拟与控制。在本文中,我们提出了一种新的 real-to-sim 方法,从 RGB-D 图像中跟踪 3D 刚体物体,并推断物体的物理属性。我们在扩展卡尔曼滤波中使用可微分物理仿真作为状态转移模型,它能够为任意基于网格的形状建模接触与摩擦,从而估计物理上合理的运动轨迹。我们证明该方法可以滤波物体的位置、姿态和速度,并同时估计物体的摩擦系数。我们在单个物体及碰撞物体的合成图像序列中的多种滑动场景下分析了该方法,并在真实数据集上进行了演示与评估。我们将公开新的基准数据集,以促进对这一新问题设定的后续研究以及与我们方法的比较。
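The sketch below shows the generic EKF predict/update structure the abstract refers to, with the state-transition Jacobian obtained by automatic differentiation through a (differentiable) physics step. The `physics_step` function and the measurement model are placeholders; the paper's actual simulator handles contact and friction for mesh-based shapes.

```python
import torch
from torch.autograd.functional import jacobian

def physics_step(x):
    """Placeholder differentiable state transition (one simulation step).
    x = [position(3), velocity(3)]; here: constant-velocity motion with damping."""
    dt, damping = 0.01, 0.99
    pos, vel = x[:3], x[3:]
    return torch.cat([pos + dt * vel, damping * vel])

def ekf_step(x, P, z, H, Q, R):
    """One Extended Kalman Filter iteration with an autograd-derived Jacobian."""
    # Predict
    F = jacobian(physics_step, x)             # d f / d x, linearization of the simulator
    x_pred = physics_step(x)
    P_pred = F @ P @ F.T + Q
    # Update with measurement z = H x + noise (e.g. an observed pose from RGB-D tracking)
    y = z - H @ x_pred                         # innovation
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ torch.linalg.inv(S)     # Kalman gain
    x_new = x_pred + K @ y
    P_new = (torch.eye(x.numel()) - K @ H) @ P_pred
    return x_new, P_new

# Toy usage: observe only the 3D position
x = torch.tensor([0., 0., 0., 1., 0., 0.])
P = torch.eye(6) * 0.1
H = torch.cat([torch.eye(3), torch.zeros(3, 3)], dim=1)
Q, R = torch.eye(6) * 1e-4, torch.eye(3) * 1e-2
x, P = ekf_step(x, P, torch.tensor([0.012, 0.001, -0.002]), H, Q, R)
print(x)
```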

SGRec3D: Self-Supervised 3D Scene Graph Learning via Object-Level Scene Reconstruction

  • paper_url: http://arxiv.org/abs/2309.15702
  • repo_url: None
  • paper_authors: Sebastian Koch, Pedro Hermosilla, Narunas Vaskevicius, Mirco Colosi, Timo Ropinski
  • for: 提高3D场景理解的能力
  • methods: 提出一种自监督预训练方法 SGRec3D,以基于图瓶颈的物体级场景重建作为预训练任务来预训练 3D 场景图预测模型
  • results: 与基于点云的预训练方法相比,SGRec3D 显著提升了 3D 场景图预测性能,达到 SOTA 水平(物体预测提升 +10%,关系预测提升 +4%),且在微调阶段仅使用 10% 的标注数据即可超过未经预训练的同一模型。
    Abstract In the field of 3D scene understanding, 3D scene graphs have emerged as a new scene representation that combines geometric and semantic information about objects and their relationships. However, learning semantic 3D scene graphs in a fully supervised manner is inherently difficult as it requires not only object-level annotations but also relationship labels. While pre-training approaches have helped to boost the performance of many methods in various fields, pre-training for 3D scene graph prediction has received little attention. Furthermore, we find in this paper that classical contrastive point cloud-based pre-training approaches are ineffective for 3D scene graph learning. To this end, we present SGRec3D, a novel self-supervised pre-training method for 3D scene graph prediction. We propose to reconstruct the 3D input scene from a graph bottleneck as a pretext task. Pre-training SGRec3D does not require object relationship labels, making it possible to exploit large-scale 3D scene understanding datasets, which were off-limits for 3D scene graph learning before. Our experiments demonstrate that in contrast to recent point cloud-based pre-training approaches, our proposed pre-training improves the 3D scene graph prediction considerably, which results in SOTA performance, outperforming other 3D scene graph models by +10% on object prediction and +4% on relationship prediction. Additionally, we show that only using a small subset of 10% labeled data during fine-tuning is sufficient to outperform the same model without pre-training.
    摘要 在三维场景理解领域,3D 场景图已成为一种新的场景表示方法,可以同时包含物体及其关系的几何与语义信息。然而,以完全监督的方式学习语义 3D 场景图非常困难,因为不仅需要物体级别的标注,还需要关系标签。尽管预训练方法已在许多领域帮助提升性能,但针对 3D 场景图预测的预训练却很少受到关注。此外,我们在本文中发现,经典的基于点云的对比式预训练方法对 3D 场景图学习并不有效。为此,我们提出了 SGRec3D,一种面向 3D 场景图预测的新型自监督预训练方法。我们提议以从图瓶颈重建输入 3D 场景作为预训练的前置任务(pretext task)。预训练 SGRec3D 不需要物体关系标签,因此可以利用大规模的 3D 场景理解数据集,而这些数据集此前无法用于 3D 场景图学习。实验结果表明,与最近的基于点云的预训练方法相比,我们的预训练方法可以显著提升 3D 场景图预测性能,达到 SOTA 水平,物体预测超出其他 3D 场景图模型 +10%,关系预测超出 +4%。此外,我们还证明,微调时仅使用 10% 的标注数据即可超过未经预训练的同一模型。

Physics Inspired Hybrid Attention for SAR Target Recognition

  • paper_url: http://arxiv.org/abs/2309.15697
  • repo_url: https://github.com/xai4sar/piha
  • paper_authors: Zhongling Huang, Chong Wu, Xiwen Yao, Zhicheng Zhao, Xiankai Huang, Junwei Han
  • for: 本文旨在通过将物理模型与深度神经网络(DNN)相结合,提升 SAR 目标识别的性能与物理可解释性。
  • methods: 提出一种受物理启发的混合注意力(PIHA)机制,利用物理信息的高层语义来激活并引导感知目标局部语义的特征组,依据先验知识重新加权特征的重要性;PIHA 可以在不修改原有架构的情况下嵌入任意 DNN。
  • results: 在 ASC 参数相同的 12 个测试场景中,所提方法均优于其他最先进方法;实验还表明 PIHA 对不同类型的物理信息均有效,并可结合 once-for-all(OFA)评估协议来考察模型的鲁棒性与泛化能力。
    Abstract There has been a recent emphasis on integrating physical models and deep neural networks (DNNs) for SAR target recognition, to improve performance and achieve a higher level of physical interpretability. The attributed scattering center (ASC) parameters garnered the most interest, being considered as additional input data or features for fusion in most methods. However, the performance greatly depends on the ASC optimization result, and the fusion strategy is not adaptable to different types of physical information. Meanwhile, the current evaluation scheme is inadequate to assess the model's robustness and generalizability. Thus, we propose a physics inspired hybrid attention (PIHA) mechanism and the once-for-all (OFA) evaluation protocol to address the above issues. PIHA leverages the high-level semantics of physical information to activate and guide the feature group aware of local semantics of target, so as to re-weight the feature importance based on knowledge prior. It is flexible and generally applicable to various physical models, and can be integrated into arbitrary DNNs without modifying the original architecture. The experiments involve a rigorous assessment using the proposed OFA, which entails training and validating a model on either sufficient or limited data and evaluating on multiple test sets with different data distributions. Our method outperforms other state-of-the-art approaches in 12 test scenarios with same ASC parameters. Moreover, we analyze the working mechanism of PIHA and evaluate various PIHA enabled DNNs. The experiments also show PIHA is effective for different physical information. The source code together with the adopted physical information is available at https://github.com/XAI4SAR.
    摘要 Recently, there has been an emphasis on combining physical models and deep neural networks (DNNs) for target recognition in synthetic aperture radar (SAR) imaging, in order to improve performance and achieve better physical interpretability. The attributed scattering center (ASC) parameters have been widely used as additional input data or features for fusion in most methods. However, the performance of these methods heavily depends on the optimization result of ASC, and the fusion strategy is not adaptable to different types of physical information. Moreover, the current evaluation scheme is inadequate to assess the model's robustness and generalizability.To address these issues, we propose a physics-inspired hybrid attention (PIHA) mechanism and the once-for-all (OFA) evaluation protocol. PIHA leverages the high-level semantics of physical information to activate and guide the feature group aware of local semantics of the target, so as to re-weight the feature importance based on knowledge prior. This approach is flexible and generally applicable to various physical models, and can be integrated into arbitrary DNNs without modifying the original architecture.We conducted a series of experiments to evaluate the effectiveness of PIHA, using the proposed OFA evaluation protocol. The experiments involved training and validating a model on either sufficient or limited data, and evaluating its performance on multiple test sets with different data distributions. Our results show that PIHA outperforms other state-of-the-art approaches in 12 test scenarios with the same ASC parameters. Moreover, we analyzed the working mechanism of PIHA and evaluated various PIHA-enabled DNNs. The experiments also demonstrated that PIHA is effective for different physical information.The source code, together with the adopted physical information, is available at https://github.com/XAI4SAR.

A Unified View of Differentially Private Deep Generative Modeling

  • paper_url: http://arxiv.org/abs/2309.15696
  • repo_url: None
  • paper_authors: Dingfan Chen, Raouf Kerkouche, Mario Fritz
  • for: This paper aims to provide a unified view of various approaches for achieving privacy-preserving high-dimensional data generation through differentially private (DP) training of deep neural networks.
  • methods: The paper systematizes and jointly designs methods for different use cases, and discusses the strengths, limitations, and inherent correlations between different approaches.
  • results: The paper presents a novel unified view of privacy-preserving data generation methods, and provides potential paths forward for the field of DP data generation, with the aim of advancing privacy-preserving learning.
    Abstract The availability of rich and vast data sources has greatly advanced machine learning applications in various domains. However, data with privacy concerns comes with stringent regulations that frequently prohibited data access and data sharing. Overcoming these obstacles in compliance with privacy considerations is key for technological progress in many real-world application scenarios that involve privacy sensitive data. Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released, enabling privacy-preserving downstream analysis and reproducible research in sensitive domains. In recent years, various approaches have been proposed for achieving privacy-preserving high-dimensional data generation by private training on top of deep neural networks. In this paper, we present a novel unified view that systematizes these approaches. Our view provides a joint design space for systematically deriving methods that cater to different use cases. We then discuss the strengths, limitations, and inherent correlations between different approaches, aiming to shed light on crucial aspects and inspire future research. We conclude by presenting potential paths forward for the field of DP data generation, with the aim of steering the community toward making the next important steps in advancing privacy-preserving learning.
    摘要 丰富而庞大的数据源极大地推动了机器学习在各个领域的应用。然而,涉及隐私的数据往往受到严格监管,数据访问和共享经常被禁止。在许多涉及隐私敏感数据的真实应用场景中,在符合隐私要求的前提下克服这些障碍,是技术进步的关键。差分隐私(DP)数据发布提供了一个有吸引力的解决方案:仅公开发布经过脱敏处理的数据,从而支持隐私保护的下游分析以及敏感领域的可复现研究。近年来,人们提出了多种方法,通过在深度神经网络之上进行差分隐私训练来实现隐私保护的高维数据生成。在本文中,我们提出了一个新的统一视角,对这些方法进行了系统化梳理。该视角提供了一个联合设计空间,可以系统地推导出适用于不同用例的方法。随后,我们讨论了不同方法的优点、局限性以及它们之间的内在关联,以阐明关键问题并启发未来研究。最后,我们提出了差分隐私数据生成领域的潜在发展路径,以引导社区在推进隐私保护学习方面迈出下一步。
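Most of the DP training approaches such a survey covers build on DP-SGD; the sketch below shows its core step (per-sample gradient clipping followed by Gaussian noise), written as a minimal illustration rather than a hardened implementation (libraries such as Opacus additionally handle privacy accounting).

```python
import torch
import torch.nn.functional as F

def dp_sgd_step(model, loss_fn, batch_x, batch_y, optimizer,
                clip_norm=1.0, noise_multiplier=1.1):
    """One DP-SGD step: clip each per-sample gradient, sum, add Gaussian noise, average."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    for x, y in zip(batch_x, batch_y):                 # naive per-sample gradient loop
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (total_norm + 1e-12), max=1.0)
        for s, g in zip(summed, grads):                # clip to L2 norm <= clip_norm
            s.add_(g * scale)

    optimizer.zero_grad()
    for p, s in zip(params, summed):
        noise = torch.randn_like(s) * noise_multiplier * clip_norm
        p.grad = (s + noise) / len(batch_x)            # noisy, clipped average gradient
    optimizer.step()

# Toy usage: a linear classifier on random data
model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
dp_sgd_step(model, F.cross_entropy, torch.randn(8, 4), torch.randint(0, 2, (8,)), opt)
```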

End-to-End Streaming Video Temporal Action Segmentation with Reinforce Learning

  • paper_url: http://arxiv.org/abs/2309.15683
  • repo_url: https://github.com/Thinksky5124/SVTAS
  • paper_authors: Wujun Wen, Jinrong Zhang, Shenglan Liu, Yunheng Li, Qifeng Li, Lin Feng
  • for: 本研究的目的是提出一种可以实时应用于长视频中的动作分类任务,以扩展现有的动作识别模型的应用场景。
  • methods: 该研究提出了一种端到端的流式视频时序动作分割方法(SVTAS-RL),将时序动作分割视为动作片段聚类任务,并使用强化学习来缓解优化目标与优化方向不一致的问题。
  • results: 经过广泛的实验,SVTAS-RL 模型在多个数据集上达到了与现有模型相当的竞争性性能,并在ultra-long video dataset EGTEA 上表现出了更大的优势,这表明该方法可以取代现有的 TAS 模型,并且 SVTAS-RL 更适合长视频 TAS。
    Abstract Temporal Action Segmentation (TAS) from video is a kind of frame recognition task for long video with multiple action classes. As an video understanding task for long videos, current methods typically combine multi-modality action recognition models with temporal models to convert feature sequences to label sequences. This approach can only be applied to offline scenarios, which severely limits the TAS application. Therefore, this paper proposes an end-to-end Streaming Video Temporal Action Segmentation with Reinforce Learning (SVTAS-RL). The end-to-end SVTAS which regard TAS as an action segment clustering task can expand the application scenarios of TAS; and RL is used to alleviate the problem of inconsistent optimization objective and direction. Through extensive experiments, the SVTAS-RL model achieves a competitive performance to the state-of-the-art model of TAS on multiple datasets, and shows greater advantages on the ultra-long video dataset EGTEA. This indicates that our method can replace all current TAS models end-to-end and SVTAS-RL is more suitable for long video TAS. Code is availabel at https://github.com/Thinksky5124/SVTAS.
    摘要 视频时序动作分割(TAS)是一种面向包含多种动作类别的长视频的逐帧识别任务。作为长视频理解任务,现有方法通常将多模态动作识别模型与时序模型组合,把特征序列转换为标签序列。这种做法只能应用于离线场景,严重限制了 TAS 的应用。为此,本文提出了一种端到端的基于强化学习的流式视频时序动作分割方法(SVTAS-RL)。端到端的 SVTAS 将 TAS 视为动作片段聚类任务,可以扩展 TAS 的应用场景;强化学习则用于缓解优化目标与优化方向不一致的问题。大量实验表明,SVTAS-RL 在多个数据集上达到了与当前最先进 TAS 模型相当的性能,并在超长视频数据集 EGTEA 上展现出更大优势。这表明我们的方法可以端到端地替代现有的 TAS 模型,且 SVTAS-RL 更适合长视频 TAS。代码见 https://github.com/Thinksky5124/SVTAS。

SJTU-TMQA: A quality assessment database for static mesh with texture map

  • paper_url: http://arxiv.org/abs/2309.15675
  • repo_url: None
  • paper_authors: Bingyang Cui, Qi Yang, Kaifa Yang, Yiling Xu, Xiaozhong Xu, Shan Liu
  • for: 这篇论文主要是为了评估纹理化网格质量的研究。
  • methods: 论文使用了21个参考网格和945个扭曲样本来创建大规模的纹理化网格质量评估数据库(SJTU-TMQA),并通过主观实验获得了意见分数(MOS)。
  • results: 研究展示了不同类型的失真对人类感知的影响,并在 SJTU-TMQA 上评估了 13 种最先进的客观质量度量;其与主观评分的最高相关性约为 0.6,表明仍需更有效的客观度量。SJTU-TMQA 数据库可在 https://ccccby.github.io 下载。
    Abstract In recent years, static meshes with texture maps have become one of the most prevalent digital representations of 3D shapes in various applications, such as animation, gaming, medical imaging, and cultural heritage applications. However, little research has been done on the quality assessment of textured meshes, which hinders the development of quality-oriented applications, such as mesh compression and enhancement. In this paper, we create a large-scale textured mesh quality assessment database, namely SJTU-TMQA, which includes 21 reference meshes and 945 distorted samples. The meshes are rendered into processed video sequences and then conduct subjective experiments to obtain mean opinion scores (MOS). The diversity of content and accuracy of MOS has been shown to validate its heterogeneity and reliability. The impact of various types of distortion on human perception is demonstrated. 13 state-of-the-art objective metrics are evaluated on SJTU-TMQA. The results report the highest correlation of around 0.6, indicating the need for more effective objective metrics. The SJTU-TMQA is available at https://ccccby.github.io
    摘要 近年来,带纹理贴图的静态网格已成为动画、游戏、医学成像和文化遗产等各类应用中最常见的三维形状数字表示之一。然而,针对纹理网格质量评估的研究很少,这阻碍了网格压缩与增强等面向质量的应用的发展。在本文中,我们构建了一个大规模的纹理网格质量评估数据库 SJTU-TMQA,包含 21 个参考网格和 945 个失真样本。我们将这些网格渲染为处理后的视频序列,并通过主观实验获得平均意见分(MOS)。内容的多样性和 MOS 的准确性验证了该数据库的异质性与可靠性。我们还展示了不同类型的失真对人类感知的影响。我们在 SJTU-TMQA 上评估了 13 种最先进的客观质量度量,其与主观评分的最高相关性约为 0.6,表明仍需更有效的客观度量。SJTU-TMQA 可在 https://ccccby.github.io 获取。
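Benchmarking objective metrics against MOS, as done here, usually reports Pearson and Spearman correlations; the snippet below is a generic sketch of that evaluation (the arrays are dummy data, not values from the database).

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Dummy data: subjective scores and one objective metric's predictions for the same meshes
mos = np.array([4.2, 3.1, 2.5, 4.8, 1.9, 3.7])
metric = np.array([0.81, 0.55, 0.40, 0.90, 0.30, 0.70])

plcc, _ = pearsonr(metric, mos)    # linear correlation (prediction accuracy)
srocc, _ = spearmanr(metric, mos)  # rank correlation (prediction monotonicity)
print(f"PLCC={plcc:.3f}  SROCC={srocc:.3f}")
```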

Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing

  • paper_url: http://arxiv.org/abs/2309.15664
  • repo_url: https://github.com/wangkai930418/DPL
  • paper_authors: Kai Wang, Fei Yang, Shiqi Yang, Muhammad Atif Butt, Joost van de Weijer
  • for: 本文旨在为用户提供细粒度的图像编辑能力,使其能够通过修改文本提示来控制生成的图像。
  • methods: 本文基于扩散模型,提出了一种名为动态提示学习(Dynamic Prompt Learning,DPL)的新方法,通过修正不准确的交叉注意力图来解决图像编辑中的注意力泄漏问题。
  • results: 与现有方法相比,DPL 能够准确编辑图像中的特定对象而不影响其他区域;实验结果表明,DPL 在多种图像场景中均取得更优的结果,无论是定量指标(CLIP 分数、Structure-Dist)还是定性的用户评价。
    Abstract Large-scale text-to-image generative models have been a ground-breaking development in generative AI, with diffusion models showing their astounding ability to synthesize convincing images following an input text prompt. The goal of image editing research is to give users control over the generated images by modifying the text prompt. Current image editing techniques are susceptible to unintended modifications of regions outside the targeted area, such as on the background or on distractor objects which have some semantic or visual relationship with the targeted object. According to our experimental findings, inaccurate cross-attention maps are at the root of this problem. Based on this observation, we propose Dynamic Prompt Learning (DPL) to force cross-attention maps to focus on correct noun words in the text prompt. By updating the dynamic tokens for nouns in the textual input with the proposed leakage repairment losses, we achieve fine-grained image editing over particular objects while preventing undesired changes to other image regions. Our method DPL, based on the publicly available Stable Diffusion, is extensively evaluated on a wide range of images, and consistently obtains superior results both quantitatively (CLIP score, Structure-Dist) and qualitatively (on user-evaluation). We show improved prompt editing results for Word-Swap, Prompt Refinement, and Attention Re-weighting, especially for complex multi-object scenes.
    摘要 大规模文本到图像生成模型是生成式 AI 的一项突破性进展,扩散模型展现出依据输入文本提示合成逼真图像的惊人能力。图像编辑研究的目标是让用户通过修改文本提示来控制生成的图像。现有的图像编辑技术容易对目标区域之外的部分产生意外修改,例如背景或与目标对象在语义或视觉上相关的干扰对象。根据我们的实验结果,不准确的交叉注意力图是该问题的根本原因。基于这一观察,我们提出了动态提示学习(DPL),强制交叉注意力图聚焦于文本提示中正确的名词。通过使用所提出的泄漏修复损失更新文本输入中名词对应的动态 token,我们实现了针对特定对象的细粒度图像编辑,同时避免对其他图像区域的意外改动。我们的方法 DPL 基于公开可用的 Stable Diffusion,在大量图像上进行了广泛评估,并在定量指标(CLIP 分数、Structure-Dist)和定性评估(用户评价)上均取得更优的结果。我们展示了在 Word-Swap、Prompt Refinement 和 Attention Re-weighting 等提示编辑任务上的改进,尤其是在复杂的多对象场景中。

Human Kinematics-inspired Skeleton-based Video Anomaly Detection

  • paper_url: http://arxiv.org/abs/2309.15662
  • repo_url: https://github.com/XiaoJian923/Kinematics-VAD
  • paper_authors: Jian Xiao, Tianyuan Liu, Genlin Ji
  • for: 本研究旨在探讨人体异常检测视频中的新方法,以及人体动态特征如何用于检测异常。
  • methods: 本研究提出了一种名为 HKVAD(Human Kinematic-inspired Video Anomaly Detection)的新方法,显式利用人体运动学特征来检测视频异常。该方法利用骨架姿态的运动学特征,重点关注步幅以及脚部和颈部水平的骨架位移,然后使用标准化流(normalizing flow)模型估计密度,并依据估计的密度检测异常。
  • results: 实验结果表明,HKVAD 方法在 ShanghaiTech 和 UBnormal 两个极具挑战性的公共数据集上以极少的计算资源取得了良好的结果,验证了其有效性和潜力。
    Abstract Previous approaches to detecting human anomalies in videos have typically relied on implicit modeling by directly applying the model to video or skeleton data, potentially resulting in inaccurate modeling of motion information. In this paper, we conduct an exploratory study and introduce a new idea called HKVAD (Human Kinematic-inspired Video Anomaly Detection) for video anomaly detection, which involves the explicit use of human kinematic features to detect anomalies. To validate the effectiveness and potential of this perspective, we propose a pilot method that leverages the kinematic features of the skeleton pose, with a specific focus on the walking stride, skeleton displacement at feet level, and neck level. Following this, the method employs a normalizing flow model to estimate density and detect anomalies based on the estimated density. Based on the number of kinematic features used, we have devised three straightforward variant methods and conducted experiments on two highly challenging public datasets, ShanghaiTech and UBnormal. Our method achieves good results with minimal computational resources, validating its effectiveness and potential.
    摘要 以往的视频人体异常检测方法通常直接将模型应用于视频或骨架数据进行隐式建模,这可能导致运动信息建模不准确。在这篇论文中,我们进行了一项探索性研究,提出了一种名为 HKVAD(人体运动学启发的视频异常检测)的新思路,即显式利用人体运动学特征来检测异常。为了验证这一视角的有效性和潜力,我们提出了一种试点方法,利用骨架姿态的运动学特征,重点关注步幅以及脚部和颈部水平的骨架位移。随后,该方法使用标准化流模型估计密度,并依据估计的密度检测异常。根据所使用运动学特征的数量,我们设计了三种简单的变体方法,并在两个极具挑战性的公共数据集 ShanghaiTech 和 UBnormal 上进行了实验。我们的方法以极少的计算资源取得了良好的结果,验证了其有效性和潜力。

FRS-Nets: Fourier Parameterized Rotation and Scale Equivariant Networks for Retinal Vessel Segmentation

  • paper_url: http://arxiv.org/abs/2309.15638
  • repo_url: None
  • paper_authors: Zihong Sun, Qi Xie, Deyu Meng
  • for: 本文旨在提出一种新的卷积算子(FRS-Conv),将旋转与尺度等变性嵌入卷积神经网络(CNN),以满足视网膜血管分割对精度的要求。
  • methods: 本文采用一种新的参数化方案,使卷积滤波器能够高精度地执行旋转与缩放变换;在此基础上推导了旋转和尺度等变卷积映射的数学表达,并据此构建 FRS-Conv,用以替换 U-Net 和 Iter-Net 中的传统卷积滤波器(得到 FRS-Nets)。
  • results: 在三个公共数据集上的数据集内与跨数据集实验表明,FRS-Nets 仅用相应基线 13.9% 的参数即可达到最先进的性能,展现出出色的精度、泛化能力和临床应用潜力。
    Abstract With translation equivariance, convolution neural networks (CNNs) have achieved great success in retinal vessel segmentation. However, some other symmetries of the vascular morphology are not characterized by CNNs, such as rotation and scale symmetries. To embed more equivariance into CNNs and achieve the accuracy requirement for retinal vessel segmentation, we construct a novel convolution operator (FRS-Conv), which is Fourier parameterized and equivariant to rotation and scaling. Specifically, we first adopt a new parameterization scheme, which enables convolutional filters to arbitrarily perform transformations with high accuracy. Secondly, we derive the formulations for the rotation and scale equivariant convolution mapping. Finally, we construct FRS-Conv following the proposed formulations and replace the traditional convolution filters in U-Net and Iter-Net with FRS-Conv (FRS-Nets). We faithfully reproduce all compared methods and conduct comprehensive experiments on three public datasets under both in-dataset and cross-dataset settings. With merely 13.9% parameters of corresponding baselines, FRS-Nets have achieved state-of-the-art performance and significantly outperform all compared methods. It demonstrates the remarkable accuracy, generalization, and clinical application potential of FRS-Nets.
    摘要 得益于平移等变性,卷积神经网络(CNN)在视网膜血管分割方面取得了巨大成功。然而,血管形态中的其他对称性,如旋转和尺度对称性,并未被 CNN 所刻画。为了在 CNN 中嵌入更多等变性并满足视网膜血管分割的精度要求,我们构建了一种新型卷积算子(FRS-Conv),它采用 Fourier 参数化,并对旋转和缩放具有等变性。具体而言,我们首先采用一种新的参数化方案,使卷积滤波器能够以高精度执行任意变换;其次,我们推导了旋转和尺度等变卷积映射的表达式;最后,我们依据所提表达式构建 FRS-Conv,并用其替换 U-Net 和 Iter-Net 中的传统卷积滤波器(得到 FRS-Nets)。我们忠实复现了所有对比方法,并在三个公共数据集上进行了数据集内与跨数据集设置下的全面实验。FRS-Nets 仅用相应基线 13.9% 的参数即可达到最先进的性能,并显著超过所有对比方法,显示出其出色的精度、泛化能力和临床应用潜力。
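As a rough illustration of rotation equivariance in convolutions (using simple 90° kernel rotations rather than the paper's Fourier-parameterized filters, which support arbitrary rotations and scales), the sketch below applies the same learned kernel in four orientations and stacks the responses.

```python
import torch
import torch.nn.functional as F

class P4Conv(torch.nn.Module):
    """Toy rotation-equivariant conv: one base kernel applied at 0/90/180/270 degrees.
    A crude stand-in for FRS-Conv, which parameterizes filters in the Fourier domain
    to obtain equivariance to (arbitrary) rotations and scalings."""

    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.1)

    def forward(self, x):
        outs = []
        for r in range(4):
            w = torch.rot90(self.weight, r, dims=(2, 3))   # rotate the kernels, not the image
            outs.append(F.conv2d(x, w, padding=self.weight.shape[-1] // 2))
        return torch.stack(outs, dim=2)                    # (B, out_ch, 4 orientations, H, W)

x = torch.randn(1, 1, 32, 32)
layer = P4Conv(1, 8)
pooled = lambda t: layer(t).max(dim=2).values              # pool over orientations
# Rotating the input rotates the (orientation-pooled) response in the same way
lhs = torch.rot90(pooled(x), 1, dims=(2, 3))
rhs = pooled(torch.rot90(x, 1, dims=(2, 3)))
print(torch.allclose(lhs, rhs, atol=1e-5))                 # True
```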

Position and Orientation-Aware One-Shot Learning for Medical Action Recognition from Signal Data

  • paper_url: http://arxiv.org/abs/2309.15635
  • repo_url: None
  • paper_authors: Leiyu Xie, Yuxing Yang, Zeyu Fu, Syed Mohsen Naqvi
  • for: 这篇论文旨在提出一个基于信号数据的医疗动作识别框架,以提高医疗动作识别的精度和可靠性。
  • methods: 该框架包括两个阶段,每个阶段包含信号级图像生成(SIG)、交叉注意力(CsA)和动态时间规整(DTW)模块,并融合所提出的隐私保护的位置级与方向级特征。SIG 方法旨在将原始骨架数据转换为隐私保护的特征用于训练;CsA 模块用于引导网络减少医疗动作识别偏差,使其更关注每个特定动作中重要的人体部位,以应对相似医疗动作带来的问题;DTW 模块用于最小化样本间的时间错位,进一步提升模型性能。
  • results: 在广泛使用的 NTU RGB+D 60、NTU RGB+D 120 和 PKU-MMD 三个数据集上的大量实验验证了所提方法的有效性,在通用数据划分下分别比其他最先进方法提升 2.7%、6.2% 和 4.1%。
    Abstract In this work, we propose a position and orientation-aware one-shot learning framework for medical action recognition from signal data. The proposed framework comprises two stages and each stage includes signal-level image generation (SIG), cross-attention (CsA), dynamic time warping (DTW) modules and the information fusion between the proposed privacy-preserved position and orientation features. The proposed SIG method aims to transform the raw skeleton data into privacy-preserved features for training. The CsA module is developed to guide the network in reducing medical action recognition bias and more focusing on important human body parts for each specific action, aimed at addressing similar medical action related issues. Moreover, the DTW module is employed to minimize temporal mismatching between instances and further improve model performance. Furthermore, the proposed privacy-preserved orientation-level features are utilized to assist the position-level features in both of the two stages for enhancing medical action recognition performance. Extensive experimental results on the widely-used and well-known NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD datasets all demonstrate the effectiveness of the proposed method, which outperforms the other state-of-the-art methods with general dataset partitioning by 2.7%, 6.2% and 4.1%, respectively.
    摘要 在本工作中,我们提出了一个位置与方向感知的单样本学习框架,用于基于信号数据的医疗动作识别。该框架包括两个阶段,每个阶段均包含信号级图像生成(SIG)、交叉注意力(CsA)、动态时间规整(DTW)模块,以及所提出的隐私保护的位置特征与方向特征之间的信息融合。SIG 方法旨在将原始骨架数据转换为隐私保护的特征用于训练。CsA 模块用于引导网络减少医疗动作识别偏差,使其更关注每个特定动作中重要的人体部位,以应对相似医疗动作带来的问题。DTW 模块用于最小化样本间的时间错位,进一步提升模型性能。此外,所提出的隐私保护方向级特征在两个阶段中辅助位置级特征,以增强医疗动作识别性能。在广泛使用的 NTU RGB+D 60、NTU RGB+D 120 和 PKU-MMD 数据集上的大量实验结果表明,我们的方法在通用数据划分下分别以 2.7%、6.2% 和 4.1% 的优势超过其他最先进方法。
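The DTW module mentioned above builds on classic dynamic time warping; the snippet below is the textbook dynamic-programming formulation of the DTW distance between two feature sequences (illustrative only, not the module's exact implementation).

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic time warping distance between sequences a (n, d) and b (m, d)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])         # local frame-to-frame distance
            cost[i, j] = d + min(cost[i - 1, j],             # insertion
                                 cost[i, j - 1],             # deletion
                                 cost[i - 1, j - 1])         # match
    return float(cost[n, m])

# Two skeleton-feature sequences of different lengths (dummy data)
seq_a = np.random.rand(40, 8)
seq_b = np.random.rand(55, 8)
print(dtw_distance(seq_a, seq_b))
```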

Neuromorphic Imaging and Classification with Graph Learning

  • paper_url: http://arxiv.org/abs/2309.15627
  • repo_url: None
  • paper_authors: Pei Zhang, Chutian Wang, Edmund Y. Lam
  • for: 神经形态相机能够异步记录像素亮度变化并生成稀疏事件流,可在极端光照条件下以更少的运动模糊和更多的细节捕捉动态场景;本文旨在对这类事件数据进行精确的神经形态分类。
  • methods: 本文提出一种新的事件数据图表示方法,并将其与图 Transformer 结合,以实现精确的神经形态分类。
  • results: 大量实验表明,该方法能取得更好的结果,尤其在仅有少量事件和有限计算资源的真实场景中表现突出,为嵌入移动设备的神经形态应用铺平了道路。
    Abstract Bio-inspired neuromorphic cameras asynchronously record pixel brightness changes and generate sparse event streams. They can capture dynamic scenes with little motion blur and more details in extreme illumination conditions. Due to the multidimensional address-event structure, most existing vision algorithms cannot properly handle asynchronous event streams. While several event representations and processing methods have been developed to address such an issue, they are typically driven by a large number of events, leading to substantial overheads in runtime and memory. In this paper, we propose a new graph representation of the event data and couple it with a Graph Transformer to perform accurate neuromorphic classification. Extensive experiments show that our approach leads to better results and excels at the challenging realistic situations where only a small number of events and limited computational resources are available, paving the way for neuromorphic applications embedded into mobile facilities.
    摘要 受生物启发的神经形态相机异步记录像素亮度变化,并生成稀疏事件流。它们能够在极端光照条件下捕捉动态场景,运动模糊少且细节更丰富。由于其多维的地址-事件结构,大多数现有视觉算法无法直接处理异步事件流。虽然已有一些事件表示与处理方法被提出以解决这一问题,但它们通常依赖大量事件驱动,带来可观的运行时间与内存开销。在这篇论文中,我们提出一种新的事件数据图表示方法,并将其与图 Transformer 结合,以实现精确的神经形态分类。大量实验表明,我们的方法能取得更好的结果,并在仅有少量事件和有限计算资源的挑战性真实场景中表现突出,为嵌入移动设备的神经形态应用铺平了道路。
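A common way to turn an event stream into a graph, in the spirit of the representation described above, is to treat each event as a node and connect spatio-temporal nearest neighbours; the sketch below does this with a KD-tree (the time scaling and neighbourhood size are illustrative choices, not the paper's settings).

```python
import numpy as np
from scipy.spatial import cKDTree

def events_to_graph(events: np.ndarray, k: int = 8, time_scale: float = 1e4):
    """Build a k-NN graph from an event stream.

    events : (N, 4) array of (x, y, t, polarity)
    returns: node feature matrix (N, 4) and edge list (E, 2)
    """
    # Bring the time axis to a scale comparable with pixel coordinates
    coords = np.stack([events[:, 0], events[:, 1], events[:, 2] * time_scale], axis=1)
    tree = cKDTree(coords)
    _, nbrs = tree.query(coords, k=k + 1)          # first neighbour is the node itself
    src = np.repeat(np.arange(len(events)), k)
    dst = nbrs[:, 1:].reshape(-1)
    edges = np.stack([src, dst], axis=1)
    return events.astype(np.float32), edges

# Dummy event stream: 1000 events on a 128x128 sensor over 0.1 s
rng = np.random.default_rng(0)
ev = np.stack([rng.integers(0, 128, 1000), rng.integers(0, 128, 1000),
               rng.uniform(0, 0.1, 1000), rng.integers(0, 2, 1000)], axis=1)
nodes, edges = events_to_graph(ev)
print(nodes.shape, edges.shape)   # (1000, 4) (8000, 2)
```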

Leveraging Topology for Domain Adaptive Road Segmentation in Satellite and Aerial Imagery

  • paper_url: http://arxiv.org/abs/2309.15625
  • repo_url: None
  • paper_authors: Javed Iqbal, Aliza Masood, Waqas Sultani, Mohsen Ali
  • for: 本研究旨在提高遥感图像中道路分割的精度和一致性,以满足自动驾驶、城市规划和可持续发展等实际应用。
  • methods: 本研究提出了一种基于Topology的无监督领域适应方法,通过预测道路skeleton来强制道路分割预测和skeleton预测具有同 topological结构的约束。
  • results: 对 SpaceNet 和 DeepGlobe 数据集进行了广泛的实验,并证明了提出的方法在与现有状态的方法进行比较时具有显著的优势,具体的比较结果为:SpaceNet 到 DeepGlobe 的适应性提高6.6%, 6.7%, 9.8%。
    Abstract Getting precise aspects of road through segmentation from remote sensing imagery is useful for many real-world applications such as autonomous vehicles, urban development and planning, and achieving sustainable development goals. Roads are only a small part of the image, and their appearance, type, width, elevation, directions, etc. exhibit large variations across geographical areas. Furthermore, due to differences in urbanization styles, planning, and the natural environments; regions along the roads vary significantly. Due to these variations among the train and test domains, the road segmentation algorithms fail to generalize to new geographical locations. Unlike the generic domain alignment scenarios, road segmentation has no scene structure, and generic domain adaptation methods are unable to enforce topological properties like continuity, connectivity, smoothness, etc., thus resulting in degraded domain alignment. In this work, we propose a topology-aware unsupervised domain adaptation approach for road segmentation in remote sensing imagery. Specifically, we predict road skeleton, an auxiliary task to impose the topological constraints. To enforce consistent predictions of road and skeleton, especially in the unlabeled target domain, the conformity loss is defined across the skeleton prediction head and the road-segmentation head. Furthermore, for self-training, we filter out the noisy pseudo-labels by using a connectivity-based pseudo-labels refinement strategy, on both road and skeleton segmentation heads, thus avoiding holes and discontinuities. Extensive experiments on the benchmark datasets show the effectiveness of the proposed approach compared to existing state-of-the-art methods. Specifically, for SpaceNet to DeepGlobe adaptation, the proposed approach outperforms the competing methods by a minimum margin of 6.6%, 6.7%, and 9.8% in IoU, F1-score, and APLS, respectively.
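The connectivity-based pseudo-label refinement described in the abstract can be illustrated with a simple connected-component filter: pseudo-labelled road pixels that form tiny, disconnected blobs are treated as noise and dropped. The threshold and data below are placeholders, not the paper's settings.

```python
import numpy as np
from scipy import ndimage

def refine_pseudo_labels(prob_map: np.ndarray, conf_thresh=0.7, min_component_px=64):
    """Keep only confident, well-connected road pseudo-labels.

    prob_map : (H, W) predicted road probabilities in [0, 1]
    returns  : boolean (H, W) refined pseudo-label mask
    """
    mask = prob_map > conf_thresh                      # confidence filtering
    labels, num = ndimage.label(mask)                  # connected components
    sizes = ndimage.sum(mask, labels, index=np.arange(1, num + 1))
    keep = np.isin(labels, 1 + np.flatnonzero(sizes >= min_component_px))
    return keep                                        # small, disconnected blobs removed

# Dummy probability map with one long "road" and scattered speckle noise
prob = np.random.rand(256, 256) * 0.4
prob[120:126, :] = 0.95                                # a horizontal road-like strip
print(refine_pseudo_labels(prob).sum())
```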

NoSENSE: Learned unrolled cardiac MRI reconstruction without explicit sensitivity maps

  • paper_url: http://arxiv.org/abs/2309.15608
  • repo_url: None
  • paper_authors: Felix Frederik Zimmermann, Andreas Kofler
  • for: 本文提出了一种基于深度卷积神经网络和算法展开的多接收线圈加速心脏 MRI 图像重建方法,避免了许多现有学习式 MR 图像重建技术中所需的显式线圈敏感度图(CSM)估计。
  • methods: 该方法由一系列新的学习式图像块和 k 空间块组成,块间共享潜在信息,并通过逐特征调制(FiLM)适应采集参数,同时包含逐线圈的数据一致性(DC)块。
  • results: 该方法在 MICCAI STACOM CMRxRecon 挑战的电影(cine)赛道和定量成像(mapping)赛道验证榜上分别取得 PSNR 34.89 和 35.56、SSIM 0.920 和 0.942,在各参赛队伍中排名第 4。
    Abstract We present a novel learned image reconstruction method for accelerated cardiac MRI with multiple receiver coils based on deep convolutional neural networks (CNNs) and algorithm unrolling. In contrast to many existing learned MR image reconstruction techniques that necessitate coil-sensitivity map (CSM) estimation as a distinct network component, our proposed approach avoids explicit CSM estimation. Instead, it implicitly captures and learns to exploit the inter-coil relationships of the images. Our method consists of a series of novel learned image and k-space blocks with shared latent information and adaptation to the acquisition parameters by feature-wise modulation (FiLM), as well as coil-wise data-consistency (DC) blocks. Our method achieved PSNR values of 34.89 and 35.56 and SSIM values of 0.920 and 0.942 in the cine track and mapping track validation leaderboard of the MICCAI STACOM CMRxRecon Challenge, respectively, ranking 4th among different teams at the time of writing. Code will be made available at https://github.com/fzimmermann89/CMRxRecon
    摘要 我们提出了一种新的基于深度卷积神经网络(CNN)和算法展开(algorithm unrolling)的学习式图像重建方法,用于多接收线圈加速心脏 MRI。与许多需要将线圈敏感度图(CSM)估计作为独立网络组件的现有学习式 MR 图像重建技术不同,我们的方法避免了显式的 CSM 估计,而是隐式地捕捉并学习利用图像间的线圈相关性。我们的方法由一系列新的学习式图像块和 k 空间块组成,它们共享潜在信息,并通过逐特征调制(FiLM)适应采集参数,此外还包含逐线圈的数据一致性(DC)块。我们的方法在 MICCAI STACOM CMRxRecon 挑战的电影(cine)赛道和定量成像(mapping)赛道验证榜上分别取得 PSNR 34.89 和 35.56、SSIM 0.920 和 0.942,在各参赛队伍中排名第 4。代码将在 https://github.com/fzimmermann89/CMRxRecon 上提供。
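The data-consistency (DC) blocks mentioned above typically enforce agreement with the acquired k-space samples; the sketch below shows the simplest "hard" variant for a single coil (replace the network's k-space prediction at sampled locations with the measured values), as a generic illustration rather than this paper's exact coil-wise block.

```python
import torch

def data_consistency(image_pred: torch.Tensor,
                     kspace_meas: torch.Tensor,
                     mask: torch.Tensor) -> torch.Tensor:
    """Hard data consistency for a single-coil Cartesian acquisition.

    image_pred  : (H, W) complex network output in image space
    kspace_meas : (H, W) complex measured k-space (zeros where not sampled)
    mask        : (H, W) binary sampling mask (1 = acquired point)
    """
    k_pred = torch.fft.fft2(image_pred, norm="ortho")
    # Keep the network's prediction only where no data was acquired
    k_dc = kspace_meas * mask + k_pred * (1 - mask)
    return torch.fft.ifft2(k_dc, norm="ortho")

# Toy usage with random data and a 4x undersampling mask along one axis
H, W = 64, 64
img = torch.randn(H, W, dtype=torch.complex64)
mask = torch.zeros(H, W); mask[:, ::4] = 1
k_meas = torch.fft.fft2(img, norm="ortho") * mask
refined = data_consistency(torch.randn(H, W, dtype=torch.complex64), k_meas, mask)
print(refined.shape)
```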

PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation

  • paper_url: http://arxiv.org/abs/2309.15596
  • repo_url: None
  • paper_authors: Shizhe Chen, Ricardo Garcia, Cordelia Schmid, Ivan Laptev
  • for: 本研究旨在提高机器人对自然语言指令的理解和执行 manipulation 任务能力。
  • methods: 提议一种基于 3D 点云的策略PolarNet,通过特制的点云输入、高效点云编码器和多模态 transformer 来学习 3D 点云表示并与语言指令集成 для行为预测。
  • results: PolarNet 在RLBench benchmark 上展现出了高效和数据效果,与当前状态的 2D 和 3D 方法相比,在单任务和多任务学习中均有出色的表现。在真实的机器人上也取得了可喜的结果。
    Abstract The ability for robots to comprehend and execute manipulation tasks based on natural language instructions is a long-term goal in robotics. The dominant approaches for language-guided manipulation use 2D image representations, which face difficulties in combining multi-view cameras and inferring precise 3D positions and relationships. To address these limitations, we propose a 3D point cloud based policy called PolarNet for language-guided manipulation. It leverages carefully designed point cloud inputs, efficient point cloud encoders, and multimodal transformers to learn 3D point cloud representations and integrate them with language instructions for action prediction. PolarNet is shown to be effective and data efficient in a variety of experiments conducted on the RLBench benchmark. It outperforms state-of-the-art 2D and 3D approaches in both single-task and multi-task learning. It also achieves promising results on a real robot.
    摘要 让机器人能够理解并执行基于自然语言指令的操作任务,是机器人学的一个长期目标。现有语言引导操作的主流方法使用 2D 图像表示,难以融合多视角相机并推断精确的 3D 位置与关系。为解决这些限制,我们提出了一种基于 3D 点云的策略 PolarNet,用于语言引导的机器人操作。它利用精心设计的点云输入、高效的点云编码器和多模态 Transformer 来学习 3D 点云表示,并将其与语言指令融合以进行动作预测。在 RLBench 基准上的多种实验表明,PolarNet 既高效又具有数据效率,在单任务和多任务学习中均超越了当前最先进的 2D 和 3D 方法,并在真实机器人上取得了可喜的结果。

Domain generalization across tumor types, laboratories, and species – insights from the 2022 edition of the Mitosis Domain Generalization Challenge

  • paper_url: http://arxiv.org/abs/2309.15589
  • repo_url: None
  • paper_authors: Marc Aubreville, Nikolas Stathonikos, Taryn A. Donovan, Robert Klopfleisch, Jonathan Ganz, Jonas Ammeling, Frauke Wilm, Mitko Veta, Samir Jabari, Markus Eckstein, Jonas Annuscheit, Christian Krumnow, Engin Bozaba, Sercan Cayir, Hongyan Gu, Xiang ‘Anthony’ Chen, Mostafa Jahanifar, Adam Shephard, Satoshi Kondo, Satoshi Kasai, Sujatha Kotte, VG Saipradeep, Maxime W. Lafarge, Viktor H. Koelzer, Ziyue Wang, Yongbing Zhang, Sen Yang, Xiyue Wang, Katharina Breininger, Christof A. Bertram
  • for: 本文关注组织学肿瘤标本中有丝分裂象的识别问题,这对患者预后评估至关重要。
  • methods: 本文介绍了 2022 年有丝分裂象领域泛化挑战赛(MIDOG 2022),该挑战提供了来自六个不同域的带标注组织学肿瘤图像,并在十个独立域上评估了九支参赛队伍提交的有丝分裂象检测算法。
  • results: 最佳队伍取得了 0.764 的 $F_1$ 分数,表明利用当今基于深度学习的识别流程,跨多种肿瘤域的领域泛化是可行的;不过,以免疫组化辅助的参考标准评估时,所有方法的召回率均有所下降,但参赛队伍的排名仅有轻微变化。
    Abstract Recognition of mitotic figures in histologic tumor specimens is highly relevant to patient outcome assessment. This task is challenging for algorithms and human experts alike, with deterioration of algorithmic performance under shifts in image representations. Considerable covariate shifts occur when assessment is performed on different tumor types, images are acquired using different digitization devices, or specimens are produced in different laboratories. This observation motivated the inception of the 2022 challenge on MItosis Domain Generalization (MIDOG 2022). The challenge provided annotated histologic tumor images from six different domains and evaluated the algorithmic approaches for mitotic figure detection provided by nine challenge participants on ten independent domains. Ground truth for mitotic figure detection was established in two ways: a three-expert consensus and an independent, immunohistochemistry-assisted set of labels. This work represents an overview of the challenge tasks, the algorithmic strategies employed by the participants, and potential factors contributing to their success. With an $F_1$ score of 0.764 for the top-performing team, we summarize that domain generalization across various tumor domains is possible with today's deep learning-based recognition pipelines. When assessed against the immunohistochemistry-assisted reference standard, all methods resulted in reduced recall scores, but with only minor changes in the order of participants in the ranking.
    摘要 识别组织学肿瘤标本中的有丝分裂象对患者预后评估至关重要。这项任务对算法和人类专家而言都具有挑战性,当图像表示发生变化时,算法性能会随之下降。当评估对象为不同的肿瘤类型、图像由不同的数字化设备采集、或标本来自不同实验室时,都会出现显著的协变量偏移。这一观察促成了 2022 年有丝分裂象领域泛化挑战赛(MIDOG 2022)。该挑战提供了来自六个不同域的带标注组织学肿瘤图像,并在十个独立域上评估了九支参赛队伍提交的有丝分裂象检测方法。有丝分裂象检测的真值以两种方式确定:三位专家的共识标注,以及一套独立的、免疫组化辅助的标注。本文概述了挑战任务、参赛者采用的算法策略以及可能促成其成功的因素。最佳队伍取得了 0.764 的 $F_1$ 分数,我们由此总结:利用当今基于深度学习的识别流程,跨多种肿瘤域的领域泛化是可行的。在以免疫组化辅助的参考标准进行评估时,所有方法的召回率均有所下降,但参赛者排名仅有轻微变化。

LivDet2023 – Fingerprint Liveness Detection Competition: Advancing Generalization

  • paper_url: http://arxiv.org/abs/2309.15578
  • repo_url: None
  • paper_authors: Marco Micheletto, Roberto Casula, Giulia Orrù, Simone Carta, Sara Concas, Simone Maurizio La Cava, Julian Fierrez, Gian Luca Marcialis
  • for: 本届竞赛旨在评估嵌入验证系统中的指纹呈现攻击检测(PAD)的有效性,以及特征集的效果与紧凑性,以提升指纹识别系统的安全性与可靠性。
  • methods: LivDet2023 设置了“Liveness Detection in Action”和“Fingerprint Representation”两项挑战;此外还设置了一项隐藏挑战,即训练集中包含两个传感器信息未知的子集,用以考察参赛者模型的泛化能力。
  • results: 参赛者仅获得真实(bona fide)指纹样本,竞赛报告并评估了各算法在这一数据受限条件下的性能。
    Abstract The International Fingerprint Liveness Detection Competition (LivDet) is a biennial event that invites academic and industry participants to prove their advancements in Fingerprint Presentation Attack Detection (PAD). This edition, LivDet2023, proposed two challenges, Liveness Detection in Action and Fingerprint Representation, to evaluate the efficacy of PAD embedded in verification systems and the effectiveness and compactness of feature sets. A third, hidden challenge is the inclusion of two subsets in the training set whose sensor information is unknown, testing participants ability to generalize their models. Only bona fide fingerprint samples were provided to participants, and the competition reports and assesses the performance of their algorithms suffering from this limitation in data availability.
    摘要 国际指纹活体检测竞赛(LivDet)是一项每两年举办一次的活动,邀请学术界和业界参与者展示其在指纹呈现攻击检测(PAD)方面的进展。本届 LivDet2023 提出了两项挑战:“Liveness Detection in Action”和“Fingerprint Representation”,分别评估嵌入验证系统中的 PAD 有效性以及特征集的效果与紧凑性。第三项隐藏挑战是在训练集中加入两个传感器信息未知的子集,考察参赛者模型的泛化能力。参赛者仅获得真实(bona fide)指纹样本,竞赛报告并评估了各算法在这一数据可用性受限条件下的性能。

Learning Spatial-Temporal Regularized Tensor Sparse RPCA for Background Subtraction

  • paper_url: http://arxiv.org/abs/2309.15576
  • repo_url: None
  • paper_authors: Basit Alawode, Sajid Javed
  • for: 本文旨在提出一种基于张量鲁棒主成分分析(tensor RPCA)的精确背景减除方法,用于解决计算机视觉中的背景减除问题。
  • methods: 本文在张量稀疏 RPCA 的稀疏分量上施加时空正则化(以归一化图拉普拉斯矩阵的形式),并提出一个可用批量与在线优化方法联合求解的目标函数。
  • results: 在六个公开的背景减除数据集上的实验表明,所提方法的性能优于多种现有方法。
    Abstract Video background subtraction is one of the fundamental problems in computer vision that aims to segment all moving objects. Robust principal component analysis has been identified as a promising unsupervised paradigm for background subtraction tasks in the last decade thanks to its competitive performance in a number of benchmark datasets. Tensor robust principal component analysis variations have improved background subtraction performance further. However, because moving object pixels in the sparse component are treated independently and do not have to adhere to spatial-temporal structured-sparsity constraints, performance is reduced for sequences with dynamic backgrounds, camouflaged, and camera jitter problems. In this work, we present a spatial-temporal regularized tensor sparse RPCA algorithm for precise background subtraction. Within the sparse component, we impose spatial-temporal regularizations in the form of normalized graph-Laplacian matrices. To do this, we build two graphs, one across the input tensor spatial locations and the other across its frontal slices in the time domain. While maximizing the objective function, we compel the tensor sparse component to serve as the spatiotemporal eigenvectors of the graph-Laplacian matrices. The disconnected moving object pixels in the sparse component are preserved by the proposed graph-based regularizations since they both comprise of spatiotemporal subspace-based structure. Additionally, we propose a unique objective function that employs batch and online-based optimization methods to jointly maximize the background-foreground and spatial-temporal regularization components. Experiments are performed on six publicly available background subtraction datasets that demonstrate the superior performance of the proposed algorithm compared to several existing methods. Our source code will be available very soon.
    摘要 视频背景减除是计算机视觉中的基本问题之一,旨在分割出所有运动目标。在过去十年中,鲁棒主成分分析(RPCA)凭借其在多个基准数据集上的出色表现,被视为背景减除任务中一种有前景的无监督范式,而张量 RPCA 的各种变体进一步提升了背景减除性能。然而,由于稀疏分量中的运动目标像素被独立处理、不必满足时空结构化稀疏约束,这些方法在动态背景、目标与背景相似(伪装)以及相机抖动等场景中性能会下降。在这项工作中,我们提出了一种时空正则化的张量稀疏 RPCA 算法,用于精确的背景减除。在稀疏分量上,我们以归一化图拉普拉斯矩阵的形式施加时空正则化:为此我们构建了两个图,一个跨输入张量的空间位置,另一个跨其时间维度上的前向切片。在最大化目标函数的过程中,我们促使张量稀疏分量充当图拉普拉斯矩阵的时空特征向量。由于二者均具有基于时空子空间的结构,所提出的基于图的正则化能够保留稀疏分量中互不相连的运动目标像素。此外,我们提出了一个新的目标函数,利用批量和在线优化方法联合最大化背景-前景分量与时空正则化分量。在六个公开的背景减除数据集上的实验表明,所提算法的性能优于多种现有方法。我们的源代码即将公开。
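For background, the sketch below is the classical matrix RPCA decomposition (low-rank background + sparse foreground) solved with a simple alternating singular-value-thresholding scheme; it illustrates the baseline idea the paper builds on, not the proposed spatial-temporal regularized tensor formulation.

```python
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def rpca(D, lam=None, mu=None, iters=100):
    """Naive RPCA via inexact ALM: D ~ L (low-rank background) + S (sparse moving objects)."""
    m, n = D.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    mu = mu or 0.25 * m * n / np.abs(D).sum()
    L = np.zeros_like(D); S = np.zeros_like(D); Y = np.zeros_like(D)
    for _ in range(iters):
        # Low-rank update: singular value thresholding
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = U @ np.diag(soft_threshold(sig, 1.0 / mu)) @ Vt
        # Sparse update: elementwise soft thresholding
        S = soft_threshold(D - L + Y / mu, lam / mu)
        Y = Y + mu * (D - L - S)        # multiplier update
    return L, S

# Toy "video": each column is a vectorized frame of a static scene plus a moving blob
rng = np.random.default_rng(0)
background = np.outer(rng.random(400), np.ones(50))
foreground = np.zeros((400, 50))
for t in range(50):
    foreground[(8 * t) % 400 : (8 * t) % 400 + 10, t] = 1.0
L, S = rpca(background + foreground)
print(np.linalg.matrix_rank(np.round(L, 3)), (np.abs(S) > 0.5).sum())
```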

Confidence-based Visual Dispersal for Few-shot Unsupervised Domain Adaptation

  • paper_url: http://arxiv.org/abs/2309.15575
  • repo_url: https://github.com/bostoncake/c-visdit
  • paper_authors: Yizhe Xiong, Hui Chen, Zijia Lin, Sicheng Zhao, Guiguang Ding
  • for: This work targets the few-shot unsupervised domain adaptation (FUDA) problem, where only a few labeled source samples are available, and aims to transfer knowledge from the source domain to the target domain without requiring abundant labeled data in the target domain.
  • methods: The paper proposes a novel Confidence-based Visual Dispersal Transfer learning method (C-VisDiT) for FUDA, which consists of a cross-domain visual dispersal strategy and an intra-domain visual dispersal strategy. The cross-domain strategy transfers only high-confidence source knowledge for model adaptation, while the intra-domain strategy guides the learning of hard target samples with easy ones.
  • results: On the Office-31, Office-Home, VisDA-C, and DomainNet benchmark datasets, C-VisDiT significantly outperforms state-of-the-art FUDA methods, transferring reliable source knowledge to the target domain and improving the classification of hard target samples.
    Abstract Unsupervised domain adaptation aims to transfer knowledge from a fully-labeled source domain to an unlabeled target domain. However, in real-world scenarios, providing abundant labeled data even in the source domain can be infeasible due to the difficulty and high expense of annotation. To address this issue, recent works consider the Few-shot Unsupervised Domain Adaptation (FUDA) where only a few source samples are labeled, and conduct knowledge transfer via self-supervised learning methods. Yet existing methods generally overlook that the sparse label setting hinders learning reliable source knowledge for transfer. Additionally, the learning difficulty difference in target samples is different but ignored, leaving hard target samples poorly classified. To tackle both deficiencies, in this paper, we propose a novel Confidence-based Visual Dispersal Transfer learning method (C-VisDiT) for FUDA. Specifically, C-VisDiT consists of a cross-domain visual dispersal strategy that transfers only high-confidence source knowledge for model adaptation and an intra-domain visual dispersal strategy that guides the learning of hard target samples with easy ones. We conduct extensive experiments on Office-31, Office-Home, VisDA-C, and DomainNet benchmark datasets and the results demonstrate that the proposed C-VisDiT significantly outperforms state-of-the-art FUDA methods. Our code is available at https://github.com/Bostoncake/C-VisDiT.
    摘要 无监督领域适应的目标是将完全标注的源领域知识迁移到无标注的目标领域。然而,在实际情况下,即使在源领域中,由于标注困难且成本高昂,提供充足的标注数据也可能不可行。为解决这个问题,近期的研究关注少样本无监督领域适应(FUDA),其中只有少量源样本被标注,并通过自监督学习方法进行知识迁移。然而,现有方法通常忽略了稀疏标注设置会阻碍学习可靠的源知识用于迁移,同时也忽略了目标样本之间学习难度的差异,导致困难目标样本被较差地分类。为解决这两个缺陷,本文提出了一种基于置信度的视觉分散迁移学习方法(C-VisDiT)用于FUDA。具体来说,C-VisDiT包括一种跨域视觉分散策略,仅将高置信度的源知识用于模型适应,以及一种域内视觉分散策略,利用易样本引导困难目标样本的学习。我们在Office-31、Office-Home、VisDA-C和DomainNet基准数据集上进行了广泛的实验,结果表明所提出的C-VisDiT显著超过了当前最佳的FUDA方法。我们的代码可以在 https://github.com/Bostoncake/C-VisDiT 找到。

The Maximum Cover with Rotating Field of View

  • paper_url: http://arxiv.org/abs/2309.15573
  • repo_url: https://github.com/ManojKumarPatnaik/Major-project-list
  • paper_authors: Igor Potapov, Jason Ralph, Theofilos Triommatis
  • for: maximize the visibility and limit the uncertainty in localization problems for a convex polygon $P$ and a static spotlight outside $P$.
  • methods: use a theoretical foundation for the analysis of the maximum cover with a rotating field of view, and express the function of the area $A_{\phi}(\theta)$ as various compositions of a function $A_{\theta}(\phi)$.
  • results: develop an algorithm that approximates the direction of the field of view with precision $\varepsilon$ and complexity $\mathcal{O}(n(\log{n}+(\log{\varepsilon})/\phi))$.
    Abstract Imagine a polygon-shaped platform $P$ and only one static spotlight outside $P$; which direction should the spotlight face to light most of $P$? This problem occurs in maximising the visibility, as well as in limiting the uncertainty in localisation problems. More formally, we define the following maximum cover problem: "Given a convex polygon $P$ and a Field Of View (FOV) with a given centre and inner angle $\phi$; find the direction (an angle of rotation $\theta$) of the FOV such that the intersection between the FOV and $P$ has the maximum area". In this paper, we provide the theoretical foundation for the analysis of the maximum cover with a rotating field of view. The main challenge is that the function of the area $A_{\phi}(\theta)$, with the angle of rotation $\theta$ and the fixed inner angle $\phi$, cannot be approximated directly. We found an alternative way to express it by various compositions of a function $A_{\theta}(\phi)$ (with a restricted inner angle $\phi$ and a fixed direction $\theta$). We show that $A_{\theta}(\phi)$ has an analytical solution in the special case of a two-sector intersection and later provides a constrictive solution for the original problem. Since the optimal solution is a real number, we develop an algorithm that approximates the direction of the field of view, with precision $\varepsilon$, and complexity $\mathcal{O}(n(\log{n}+(\log{\varepsilon})/\phi))$.
    摘要 想象一个 polygon 形式的平台 $P$ 和一个静止的外部灯光 ; 这个灯光应该朝向哪里来照亮 $P$ 最多呢?这个问题在最大可见性和局部化问题中都有出现。我们定义以下最大覆盖问题:“给定一个 convex polygon $P$ 和一个视野(Field Of View,FOV)的中心和内角 $\phi$;找出FOV 的方向(旋转角 $\theta$),使得 FOV 和 $P$ 的交集具有最大面积”。在这篇论文中,我们提供了对最大覆盖问题的理论基础的分析。主要挑战在于函数 $A_{\phi}(\theta)$ 的计算不能直接 aproximated。我们发现了一种代替的方法,通过不同的compositions来表示 $A_{\theta}(\phi)$ 。我们示示了在特殊情况下的两部分交集时,$A_{\theta}(\phi)$ 有分析解,并提供了一个压缩性的解决方案。由于优化解决方案是实数,我们开发了一个精度为 $\varepsilon$ 的搜索算法,复杂度为 $\mathcal{O}(n(\log{n}+(\log{\varepsilon})/\phi))$.
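The paper derives an analytical treatment; as a purely numerical stand-in (not the paper's algorithm), the sketch below approximates the FOV sector as a polygon and searches the rotation angle with a coarse grid followed by local refinement down to a tolerance eps, assuming the area is locally unimodal around the grid optimum. The shapely dependency and the chosen sector radius are assumptions for the illustration.

```python
import numpy as np
from shapely.geometry import Polygon

def wedge(center, theta, phi, radius=100.0, n=64):
    """Polygon approximation of a circular sector (FOV) with apex `center`,
    central direction `theta`, and inner angle `phi`."""
    angles = np.linspace(theta - phi / 2, theta + phi / 2, n)
    pts = [center] + [(center[0] + radius * np.cos(a),
                       center[1] + radius * np.sin(a)) for a in angles]
    return Polygon(pts)

def covered_area(P, center, theta, phi):
    return P.intersection(wedge(center, theta, phi)).area

def best_direction(P, center, phi, eps=1e-3):
    """Coarse grid search over theta followed by a local ternary refinement."""
    thetas = np.linspace(0, 2 * np.pi, 360, endpoint=False)
    areas = [covered_area(P, center, t, phi) for t in thetas]
    lo = thetas[int(np.argmax(areas))] - np.pi / 180
    hi = lo + 2 * np.pi / 180
    while hi - lo > eps:
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if covered_area(P, center, m1, phi) < covered_area(P, center, m2, phi):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

square = Polygon([(1, 1), (3, 1), (3, 3), (1, 3)])
print(best_direction(square, (0.0, 0.0), phi=np.pi / 6))  # roughly pi/4
```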

HPL-ViT: A Unified Perception Framework for Heterogeneous Parallel LiDARs in V2V

  • paper_url: http://arxiv.org/abs/2309.15572
  • repo_url: https://github.com/NumtraCG/614ca2d5a2b781088de648b020210923-155728routingdatapipeline230921
  • paper_authors: Yuhang Liu, Boyi Sun, Yuke Li, Yuzheng Hu, Fei-Yue Wang
  • for: 这个论文的目的是发展下一代智能探测器,提出了一个新的框架,并在实验平台DAWN上建立了硬件实现。
  • methods: 这个论文使用了OpenCDA和RLS来建立一个多种探测器数据集OPV2V-HPL,并提出了一个具有域专特性抽象的HPL-ViT架构,用于稳定特征融合。
  • results: 实验结果显示,HPL-ViT在所有设定下均达到SOTA表现,并具有优秀的泛化能力。
    Abstract To develop the next generation of intelligent LiDARs, we propose a novel framework of parallel LiDARs and construct a hardware prototype in our experimental platform, DAWN (Digital Artificial World for Natural). It emphasizes the tight integration of physical and digital space in LiDAR systems, with networking being one of its supported core features. In the context of autonomous driving, V2V (Vehicle-to-Vehicle) technology enables efficient information sharing between different agents which significantly promotes the development of LiDAR networks. However, current research operates under an ideal situation where all vehicles are equipped with identical LiDAR, ignoring the diversity of LiDAR categories and operating frequencies. In this paper, we first utilize OpenCDA and RLS (Realistic LiDAR Simulation) to construct a novel heterogeneous LiDAR dataset named OPV2V-HPL. Additionally, we present HPL-ViT, a pioneering architecture designed for robust feature fusion in heterogeneous and dynamic scenarios. It uses a graph-attention Transformer to extract domain-specific features for each agent, coupled with a cross-attention mechanism for the final fusion. Extensive experiments on OPV2V-HPL demonstrate that HPL-ViT achieves SOTA (state-of-the-art) performance in all settings and exhibits outstanding generalization capabilities.
    摘要 要开发下一代智能激光仪,我们提出了一个新的框架──并行激光仪架构(Parallel LiDARs),并在我们的实验平台DAWN(数位人工世界)中实现了实验。这个框架强调物理和数位空间之间的紧密融合,并且支持网络作为核心功能。在自驾车领域,车辆之间的通信技术(V2V)可以实现车辆之间的有效信息交换,这有助于开发激光网络。然而,现有的研究假设所有车辆都采用同一款激光仪,忽略了激光仪的多标准和频率多标准。在这篇论文中,我们首先使用OpenCDA和RLS(现实激光仪 simulator)创建了一个独特的不同激光类型和频率的激光数据集名为OPV2V-HPL。此外,我们还提出了HPL-ViT,一个创新的架构,用于在多标准和动态enario中实现坚固的特征融合。它使用图形注意力Transformer提取特定领域的特征,并与交互式混合机制进行最终融合。实验结果显示,HPL-ViT在所有设定下实现了SOTA性能,并且具有卓越的普遍化能力。

Guided Frequency Loss for Image Restoration

  • paper_url: http://arxiv.org/abs/2309.15563
  • repo_url: None
  • paper_authors: Bilel Benjdira, Anas M. Ali, Anis Koubaa
  • for: 提高图像Restoration的效果
  • methods: 提出了一种名为Guided Frequency Loss(GFL)的损失函数,用于让模型同时学习图像的频谱内容和空间内容
  • results: GFL损失函数在Super Resolution和Denoising任务上实现了PSNR指标的提高,并且在SwinIR和SRGAN模型中提高了训练效果,特别是在受限数据上表现更佳
    Abstract Image Restoration has seen remarkable progress in recent years. Many generative models have been adapted to tackle the known restoration cases of images. However, the interest in benefiting from the frequency domain is not well explored despite its major factor in these particular cases of image synthesis. In this study, we propose the Guided Frequency Loss (GFL), which helps the model to learn in a balanced way the image's frequency content alongside the spatial content. It aggregates three major components that work in parallel to enhance learning efficiency; a Charbonnier component, a Laplacian Pyramid component, and a Gradual Frequency component. We tested GFL on the Super Resolution and the Denoising tasks. We used three different datasets and three different architectures for each of them. We found that the GFL loss improved the PSNR metric in most implemented experiments. Also, it improved the training of the Super Resolution models in both SwinIR and SRGAN. In addition, the utility of the GFL loss increased better on constrained data due to the less stochasticity in the high frequencies' components among samples.
    摘要 Image Restoration 在最近几年内有了非常 significiant progress。许多生成模型已经被应用于图像还原的知名情况中。然而,利用频率领域的利益并没有得到充分的探索,尽管它在这些图像生成情况中扮演着重要的角色。在这种研究中,我们提出了引导频率损失(GFL),它帮助模型同时学习图像的频率内容和空间内容。GFLloss 包括三个主要组成部分,它们在并行工作以提高学习效率:Charbonnier 组件、Laplacian Pyramid 组件和渐进频率组件。我们在Super Resolution 和 Denoising 任务上测试了GFL loss。我们使用了三个不同的数据集和三个不同的架构。我们发现,GFL loss 提高了 PSNR 指标的大多数实验中。此外,GFL loss 也提高了 SwinIR 和 SRGAN 中的 Super Resolution 模型训练。此外,GFL loss 在受限数据上的利用程度更高,因为高频分布在样本中的不确定性更低。
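A hedged PyTorch sketch of how the three named components (Charbonnier, Laplacian pyramid, frequency term) could be combined into one loss. The exact formulation, weights, and the "gradual" frequency scheduling used in the paper are not reproduced here, so treat this as an illustrative composition rather than the authors' GFL.

```python
import torch
import torch.nn.functional as F

def charbonnier(x, y, eps=1e-3):
    return torch.sqrt((x - y) ** 2 + eps ** 2).mean()

def laplacian_pyramid(img, levels=3):
    pyr, cur = [], img
    for _ in range(levels):
        down = F.avg_pool2d(cur, 2)
        up = F.interpolate(down, size=cur.shape[-2:], mode="bilinear",
                           align_corners=False)
        pyr.append(cur - up)      # band-pass residual at this scale
        cur = down
    pyr.append(cur)               # low-frequency residual
    return pyr

def guided_frequency_loss(pred, target, w_char=1.0, w_lap=1.0, w_freq=0.1):
    """Charbonnier + Laplacian-pyramid + Fourier-magnitude terms (weights assumed)."""
    l_char = charbonnier(pred, target)
    l_lap = sum(charbonnier(a, b) for a, b in
                zip(laplacian_pyramid(pred), laplacian_pyramid(target)))
    l_freq = (torch.fft.rfft2(pred).abs() -
              torch.fft.rfft2(target).abs()).abs().mean()
    return w_char * l_char + w_lap * l_lap + w_freq * l_freq

pred = torch.rand(2, 3, 64, 64, requires_grad=True)
target = torch.rand(2, 3, 64, 64)
loss = guided_frequency_loss(pred, target)
loss.backward()
print(float(loss))
```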

Learning from SAM: Harnessing a Segmentation Foundation Model for Sim2Real Domain Adaptation through Regularization

  • paper_url: http://arxiv.org/abs/2309.15562
  • repo_url: None
  • paper_authors: Mayara E. Bonani, Max Schwarz, Sven Behnke
  • for: 本研究旨在提高无监督预处理的预测性能,尤其是在机器人应用中,目标领域训练数据罕见而注解成本高昂。
  • methods: 本方法基于Segment Anything模型,利用无注解目标频道数据进行自我监督预处理,并提出了一种协方差-方差损失结构,以正则化目标频道上的特征表示。
  • results: 在YCB-Video和HomebrewedDB等 datasets上,本方法的表现优于先前的方法,甚至在YCB-Video上超过了使用真注解的网络。
    Abstract Domain adaptation is especially important for robotics applications, where target domain training data is usually scarce and annotations are costly to obtain. We present a method for self-supervised domain adaptation for the scenario where annotated source domain data (e.g. from synthetic generation) is available, but the target domain data is completely unannotated. Our method targets the semantic segmentation task and leverages a segmentation foundation model (Segment Anything Model) to obtain segment information on unannotated data. We take inspiration from recent advances in unsupervised local feature learning and propose an invariance-variance loss structure over the detected segments for regularizing feature representations in the target domain. Crucially, this loss structure and network architecture can handle overlapping segments and oversegmentation as produced by Segment Anything. We demonstrate the advantage of our method on the challenging YCB-Video and HomebrewedDB datasets and show that it outperforms prior work and, on YCB-Video, even a network trained with real annotations.
    摘要 域自适应对机器人应用尤为重要,因为目标域的训练数据通常稀缺,而且标注成本高昂。我们提出了一种自监督域自适应方法,适用于已有标注源域数据(例如合成生成的数据)、而目标域数据完全无标注的场景。我们的方法针对语义分割任务,利用分割基础模型(Segment Anything Model)获取无标注目标域数据的分割信息。我们借鉴最近无监督局部特征学习的进展,提出了一种基于检测到的分割片段的不变性-方差损失结构,用于正则化目标域中的特征表示。关键在于,这种损失结构和网络结构能够处理 Segment Anything 产生的重叠分割和过分割。我们在 YCB-Video 和 HomebrewedDB 数据集上展示了我们方法的优势,并证明它优于先前的方法;在 YCB-Video 上,甚至超过了使用真实标注训练的网络。

Highly Efficient SNNs for High-speed Object Detection

  • paper_url: http://arxiv.org/abs/2309.15883
  • repo_url: None
  • paper_authors: Nemin Qiu, Zhiguo Li, Yuan Li, Chuang Zhu
  • for: 这个论文旨在提出一种高效的神经网络模型,用于快速的物体检测任务。
  • methods: 该论文使用量化训练方法建立了一个具有初始紧凑性的神经网络模型,并提出了一种扩展 Pseudoquantization 方法来保证模型的正确性。另外,它还提出了一种连续推理方案,使用 Feed-Forward Integrate-and-Fire(FewdIF)神经元来实现高速的物体检测。
  • results: 实验结果表明,该高效的神经网络模型可以在 GPU 上实现 118 倍的速度提升,只需要 1.5 MB 的参数进行物体检测任务。此外,在 FPGA 平台上,提出的模型可以实现 800+ FPS 的物体检测,并且具有极低的响应时间。
    Abstract The high biological properties and low energy consumption of Spiking Neural Networks (SNNs) have brought much attention in recent years. However, the converted SNNs generally need large time steps to achieve satisfactory performance, which will result in high inference latency and computational resources increase. In this work, we propose a highly efficient and fast SNN for object detection. First, we build an initial compact ANN by using quantization training method of convolution layer fold batch normalization layer and neural network modification. Second, we theoretically analyze how to obtain the low complexity SNN correctly. Then, we propose a scale-aware pseudoquantization scheme to guarantee the correctness of the compact ANN to SNN. Third, we propose a continuous inference scheme by using a Feed-Forward Integrate-and-Fire (FewdIF) neuron to realize high-speed object detection. Experimental results show that our efficient SNN can achieve 118X speedup on GPU with only 1.5MB parameters for object detection tasks. We further verify our SNN on FPGA platform and the proposed model can achieve 800+FPS object detection with extremely low latency.
    摘要 脉冲神经网络(SNN)因其良好的生物特性和低能耗,在最近几年受到了广泛关注。然而,转换得到的SNN通常需要较多的时间步长才能达到满意的性能,这会导致较高的推理延迟并增加计算资源消耗。在本工作中,我们提出了一种高效且快速的SNN用于目标检测。首先,我们结合卷积层与批归一化层折叠的量化训练方法和网络结构修改,构建了一个初始的紧凑ANN。其次,我们从理论上分析了如何正确地获得低复杂度的SNN,并提出了一种尺度感知的伪量化方案,以保证紧凑ANN到SNN转换的正确性。第三,我们提出了一种基于前馈积分发放(FewdIF)神经元的连续推理方案,以实现高速目标检测。实验结果显示,我们的高效SNN在GPU上可实现118倍的加速,仅需1.5MB参数即可完成目标检测任务。我们进一步在FPGA平台上验证了该SNN,所提模型能够以极低的延迟实现800+ FPS的目标检测。
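For readers unfamiliar with the integrate-and-fire mechanism that such continuous-inference SNN pipelines build on, here is a minimal sketch of a feed-forward IF neuron with soft reset; the actual FewdIF neuron and the paper's conversion details are not claimed here.

```python
import torch

class IntegrateAndFire(torch.nn.Module):
    """Minimal feed-forward integrate-and-fire neuron used in ANN-to-SNN
    style pipelines: the membrane potential accumulates the input current and
    emits a binary spike whenever it crosses the threshold (soft reset)."""

    def __init__(self, threshold=1.0):
        super().__init__()
        self.threshold = threshold
        self.v = None

    def reset(self):
        self.v = None

    def forward(self, current):
        if self.v is None:
            self.v = torch.zeros_like(current)
        self.v = self.v + current
        spike = (self.v >= self.threshold).float()
        self.v = self.v - spike * self.threshold   # soft reset keeps the residue
        return spike

neuron = IntegrateAndFire()
x = torch.rand(4, 8)                   # constant input current
rate = sum(neuron(x) for _ in range(16)) / 16
print(rate.mean())                     # firing rate approximates input / threshold
```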

Learning Dense Flow Field for Highly-accurate Cross-view Camera Localization

  • paper_url: http://arxiv.org/abs/2309.15556
  • repo_url: None
  • paper_authors: Zhenbo Song, Xianghui Ze, Jianfeng Lu, Yujiao Shi
  • for: 本研究旨在解决基于卫星图像的地面图像三重自由度摄像机pose估算问题。
  • methods: 我们提出了一种新的端到端方法,通过学习地面-卫星图像对之间的稠密逐像素光流场来计算相机位姿。我们的方法与现有方法不同,在像素级别上建立特征度量,使整个图像获得全图像级别的监督。具体来说,我们使用两种不同的卷积网络来提取地面和卫星特征;然后基于固定相机高度假设,将地面特征图投影到鸟瞰视图(BEV)以实现初步的几何对齐。为了进一步建立地面和卫星特征之间的内容关联,我们引入一个残差卷积块来修正投影后的BEV特征。随后,我们使用基于RAFT的光流解码网络来计算稠密光流对应。在获得稠密光流对应后,我们通过最小二乘法筛选匹配内点并回归地面相机位姿。
  • results: 我们的方法与现有方法相比,在KITTI、FORD multi-AV、VIGOR和Oxford RobotCar等数据集上具有显著的改善。特别是,我们的方法可以将地面摄像机pose的 median localization error 降低89%、19%、80%和35%。
    Abstract This paper addresses the problem of estimating the 3-DoF camera pose for a ground-level image with respect to a satellite image that encompasses the local surroundings. We propose a novel end-to-end approach that leverages the learning of dense pixel-wise flow fields in pairs of ground and satellite images to calculate the camera pose. Our approach differs from existing methods by constructing the feature metric at the pixel level, enabling full-image supervision for learning distinctive geometric configurations and visual appearances across views. Specifically, our method employs two distinct convolution networks for ground and satellite feature extraction. Then, we project the ground feature map to the bird's eye view (BEV) using a fixed camera height assumption to achieve preliminary geometric alignment. To further establish content association between the BEV and satellite features, we introduce a residual convolution block to refine the projected BEV feature. Optical flow estimation is performed on the refined BEV feature map and the satellite feature map using flow decoder networks based on RAFT. After obtaining dense flow correspondences, we apply the least square method to filter matching inliers and regress the ground camera pose. Extensive experiments demonstrate significant improvements compared to state-of-the-art methods. Notably, our approach reduces the median localization error by 89%, 19%, 80% and 35% on the KITTI, Ford multi-AV, VIGOR and Oxford RobotCar datasets, respectively.
    摘要 这篇论文关注根据卫星图像估计地面图像三自由度相机位姿的问题。我们提出了一种新的端到端方法,通过学习地面-卫星图像对之间的稠密逐像素光流场来计算相机位姿。与现有方法不同,我们在像素级别构建特征度量,从而获得全图像监督,以学习不同视角下独特的几何配置和视觉外观。具体来说,我们使用两个不同的卷积网络分别提取地面和卫星特征;然后基于固定相机高度假设,将地面特征图投影到鸟瞰视图(BEV)以实现初步的几何对齐。为进一步建立BEV特征与卫星特征之间的内容关联,我们引入残差卷积块来修正投影后的BEV特征,并使用基于RAFT的光流解码网络在BEV特征图与卫星特征图之间估计稠密光流。获得稠密光流对应后,我们用最小二乘法筛选匹配内点并回归地面相机位姿。大量实验表明我们的方法显著优于现有方法:在KITTI、Ford multi-AV、VIGOR和Oxford RobotCar数据集上,中位定位误差分别降低了89%、19%、80%和35%。
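The final step described above, regressing a 3-DoF pose from filtered flow correspondences by least squares, can be illustrated with a standard 2-D rigid alignment (Kabsch-style) fit. This is a generic sketch, not the paper's exact estimator, and the synthetic data only checks that a known rotation and translation are recovered.

```python
import numpy as np

def fit_pose_2d(src, dst):
    """Least-squares 3-DoF pose (yaw + translation) aligning 2-D points
    `src` to `dst`, e.g. BEV locations matched through dense flow."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # keep a proper rotation
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_s
    return R, t

# synthetic check: recover a known 30-degree rotation and a translation
rng = np.random.default_rng(0)
src = rng.normal(size=(200, 2))
ang = np.deg2rad(30)
R_true = np.array([[np.cos(ang), -np.sin(ang)], [np.sin(ang), np.cos(ang)]])
dst = src @ R_true.T + np.array([2.0, -1.0]) + 0.01 * rng.normal(size=src.shape)
R, t = fit_pose_2d(src, dst)
print(np.rad2deg(np.arctan2(R[1, 0], R[0, 0])), t)   # ~30 deg, ~[2, -1]
```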

Low Latency of object detection for spiking neural network

  • paper_url: http://arxiv.org/abs/2309.15555
  • repo_url: None
  • paper_authors: Nemin Qiu, Chuang Zhu
  • for: 本文旨在开发高精度低延迟的神经网络,特别适用于Edge AI应用。
  • methods: 本文使用了系统性的变换方法,从神经网络中提取了精度和速度两个维度的优势,并通过结构性的修改和量化纠正错误来提高准确率和速度。
  • results: 实验结果显示,提议方法在MS COCO、PASCAL VOC等难度较高的数据集上具有更高的准确率和更低的延迟,并且能够展示神经网络处理脉冲信号的优势。
    Abstract Spiking Neural Networks, as a third-generation neural network, are well-suited for edge AI applications due to their binary spike nature. However, when it comes to complex tasks like object detection, SNNs often require a substantial number of time steps to achieve high performance. This limitation significantly hampers the widespread adoption of SNNs in latency-sensitive edge devices. In this paper, our focus is on generating highly accurate and low-latency SNNs specifically for object detection. Firstly, we systematically derive the conversion between SNNs and ANNs and analyze how to improve the consistency between them: improving the spike firing rate and reducing the quantization error. Then we propose a structural replacement, quantization of ANN activation and residual fix to allevicate the disparity. We evaluate our method on challenging dataset MS COCO, PASCAL VOC and our spike dataset. The experimental results show that the proposed method achieves higher accuracy and lower latency compared to previous work Spiking-YOLO. The advantages of SNNs processing of spike signals are also demonstrated.
    摘要 神经网络具有辐射性,可以在边缘智能应用中使用,因为它们的二进制脉冲性。然而,当面临复杂任务时,如物体检测,SNNs经常需要较多的时间步骤以达到高性能。这种限制妨碍了SNNs在响应时间敏感的边缘设备中的普及。在这篇论文中,我们关注于生成高精度低延迟的SNNs,特别是用于物体检测。首先,我们系统地 derivate SNNs和ANNs之间的转化,并分析如何提高脉冲发射率和减少量化误差。然后,我们提议一种结构性的替换方案,即Activation和Residual的量化纠正,以降低不一致性。我们在MS COCO、PASCAL VOC和我们的脉冲集上进行了实验,结果表明,我们的方法可以 achieved higher accuracy和lower latency compared to previous work Spiking-YOLO。此外,SNNs处理脉冲信号的优势也得到了演示。

Uncertainty Quantification via Neural Posterior Principal Components

  • paper_url: http://arxiv.org/abs/2309.15533
  • repo_url: None
  • paper_authors: Elias Nehme, Omer Yair, Tomer Michaeli
  • for: 这个论文的目的是提出一种能够在单个前向传播中预测 posterior 分布的主成分(PC),以便实现图像修复模型中的不确定性评估。
  • methods: 该方法基于 neural network 的卷积神经网络,可以在单个前向传播中预测 posterior 分布的主成分。可以选择使用预训练的模型,或者从 scratch 开始训练一个模型,以输出预测图像和 posterior 分布的主成分。
  • results: 该方法在多个图像修复问题中表现出色,例如噪声去除、图像缺失填充、超分辨率重建和生物图像到图像转换等。与后验采样方法相比,该方法能以快几个数量级的速度实现不确定性量化,并提供随实例自适应的不确定性方向。详细例子可以参考 https://eliasnehme.github.io/NPPC/
    Abstract Uncertainty quantification is crucial for the deployment of image restoration models in safety-critical domains, like autonomous driving and biological imaging. To date, methods for uncertainty visualization have mainly focused on per-pixel estimates. However, a heatmap of per-pixel variances is typically of little practical use, as it does not capture the strong correlations between pixels. A more natural measure of uncertainty corresponds to the variances along the principal components (PCs) of the posterior distribution. Theoretically, the PCs can be computed by applying PCA on samples generated from a conditional generative model for the input image. However, this requires generating a very large number of samples at test time, which is painfully slow with the current state-of-the-art (diffusion) models. In this work, we present a method for predicting the PCs of the posterior distribution for any input image, in a single forward pass of a neural network. Our method can either wrap around a pre-trained model that was trained to minimize the mean square error (MSE), or can be trained from scratch to output both a predicted image and the posterior PCs. We showcase our method on multiple inverse problems in imaging, including denoising, inpainting, super-resolution, and biological image-to-image translation. Our method reliably conveys instance-adaptive uncertainty directions, achieving uncertainty quantification comparable with posterior samplers while being orders of magnitude faster. Examples are available at https://eliasnehme.github.io/NPPC/
    摘要 不确定性量化对于图像修复模型在自动驾驶、生物成像等安全关键领域的部署至关重要。迄今为止,不确定性可视化方法主要集中在逐像素估计上。然而,逐像素方差热图通常实用价值有限,因为它无法刻画像素之间的强相关性。更自然的不确定性度量对应于后验分布主成分(PCs)方向上的方差。理论上,PCs可以通过对条件生成模型生成的样本做PCA得到,但这需要在测试时生成大量样本,而以目前最先进的扩散模型来说,这一过程非常缓慢。在本工作中,我们提出了一种方法,只需神经网络的一次前向传播即可预测任意输入图像的后验分布主成分。该方法既可以包装一个以最小均方误差(MSE)为目标预训练的模型,也可以从头训练,同时输出预测图像和后验主成分。我们在多个图像逆问题上展示了该方法,包括去噪、图像修补、超分辨率和生物图像到图像转换。我们的方法能够可靠地给出随实例自适应的不确定性方向,在达到与后验采样方法相当的不确定性量化的同时,速度快几个数量级。示例见 https://eliasnehme.github.io/NPPC/
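The abstract's baseline, computing posterior principal components by running PCA on samples drawn from a conditional generative model, can be sketched in a few lines of PyTorch. The sampler below is a toy stand-in, and the single-forward-pass predictor that the paper actually proposes is not shown.

```python
import torch

def posterior_principal_components(sample_fn, y, n_samples=64, k=5):
    """Estimate the top-k principal components of the posterior p(x | y)
    by drawing samples from a conditional sampler and running PCA via SVD.

    sample_fn(y) -> one sample x with the same shape as the target image.
    """
    samples = torch.stack([sample_fn(y) for _ in range(n_samples)])  # (N, ...)
    flat = samples.reshape(n_samples, -1)
    mean = flat.mean(0, keepdim=True)
    centered = flat - mean
    # economy SVD: right singular vectors are the principal directions
    _, S, Vh = torch.linalg.svd(centered, full_matrices=False)
    pcs = Vh[:k]                                   # (k, D) uncertainty directions
    variances = (S[:k] ** 2) / (n_samples - 1)
    return mean.reshape(samples.shape[1:]), pcs, variances

# toy stand-in for a conditional sampler (a real model would be a diffusion
# or other generative network conditioned on the degraded input y)
def toy_sampler(y):
    return y + 0.1 * torch.randn_like(y)

y = torch.zeros(1, 8, 8)
mean, pcs, var = posterior_principal_components(toy_sampler, y)
print(pcs.shape, var)
```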

Missing-modality Enabled Multi-modal Fusion Architecture for Medical Data

  • paper_url: http://arxiv.org/abs/2309.15529
  • repo_url: None
  • paper_authors: Muyu Wang, Shiyu Fan, Yichen Li, Hui Chen
  • for: 这个研究旨在开发一个可靠的多模式融合架构,以实现医疗资料中缺失的模式不断影响深度学习模型的性能。
  • methods: 本研究使用了一个基于Transformer的多模式融合模组,将双模式融合为一个三模式融合架构。此外,研究者还引入多変量损失函数,以提高模型对缺失模式的Robustness。
  • results: 实验结果显示,提案的多模式融合架构能够有效地融合三种模式,并在缺失模式情况下保持优秀的性能。这个方法可能会扩展到更多模式,以提高临床实用性。
    Abstract Fusing multi-modal data can improve the performance of deep learning models. However, missing modalities are common for medical data due to patients' specificity, which is detrimental to the performance of multi-modal models in applications. Therefore, it is critical to adapt the models to missing modalities. This study aimed to develop an efficient multi-modal fusion architecture for medical data that was robust to missing modalities and further improved the performance on disease diagnosis.X-ray chest radiographs for the image modality, radiology reports for the text modality, and structured value data for the tabular data modality were fused in this study. Each modality pair was fused with a Transformer-based bi-modal fusion module, and the three bi-modal fusion modules were then combined into a tri-modal fusion framework. Additionally, multivariate loss functions were introduced into the training process to improve model's robustness to missing modalities in the inference process. Finally, we designed comparison and ablation experiments for validating the effectiveness of the fusion, the robustness to missing modalities and the enhancements from each key component. Experiments were conducted on MIMIC-IV, MIMIC-CXR with the 14-label disease diagnosis task. Areas under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC) were used to evaluate models' performance. The experimental results demonstrated that our proposed multi-modal fusion architecture effectively fused three modalities and showed strong robustness to missing modalities. This method is hopeful to be scaled to more modalities to enhance the clinical practicality of the model.
    摘要 融合多Modal数据可以提高深度学习模型的性能。然而,医疗数据中缺失Modalities是常见的,这会导致多Modal模型在应用中表现不佳。因此,适应缺失Modalities是非常重要的。这项研究旨在开发一种可靠的多Modal融合架构,可以在医疗数据中融合多种Modalities,并且在缺失Modalities时保持模型的性能。本研究使用的Modalities包括X射成像(image modality)、 radiology report(text modality)和结构化数据(tabular data modality)。每个Modal pair使用Transformer基于的bi-Modal融合模块进行融合,并将三个bi-Modal融合模块组合成一个tri-Modal融合架构。此外,我们还引入了多个变量损失函数来改善模型在推理过程中对缺失Modalities的Robustness。最后,我们设计了比较和减少实验来验证融合的有效性、Robustness和每个关键组件的改进。实验使用MIMIC-IV和MIMIC-CXR datasets,并使用14个疾病诊断任务来评估模型的性能。实验结果表明,我们提posed的多Modal融合架构可以有效地融合三种Modalities,并且在缺失Modalities时保持模型的性能。这种方法可以在更多Modalities上进行扩展,以提高临床实用性。
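A hedged sketch of one Transformer-style bi-modal fusion block of the kind such a pipeline stacks (image tokens cross-attending to report tokens, with a padding mask that can blank out a missing modality). The dimensions, pooling, and the multivariate loss are assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn as nn

class BiModalFusion(nn.Module):
    """Toy bi-modal fusion: tokens of modality A attend to tokens of
    modality B (cross-attention) and the result is pooled."""

    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, dim * 2), nn.GELU(),
                                nn.Linear(dim * 2, dim))

    def forward(self, a, b, b_mask=None):
        # key_padding_mask lets the block ignore a missing or padded modality
        fused, _ = self.attn(a, b, b, key_padding_mask=b_mask)
        x = self.norm(a + fused)
        x = self.norm(x + self.ff(x))
        return x.mean(dim=1)                       # pooled joint representation

img_tokens = torch.randn(2, 49, 128)   # e.g. chest X-ray patch features
txt_tokens = torch.randn(2, 32, 128)   # e.g. radiology report token features
print(BiModalFusion()(img_tokens, txt_tokens).shape)   # (2, 128)
```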

P2I-NET: Mapping Camera Pose to Image via Adversarial Learning for New View Synthesis in Real Indoor Environments

  • paper_url: http://arxiv.org/abs/2309.15526
  • repo_url: None
  • paper_authors: Xujie Kang, Kanglin Liu, Jiang Duan, Yuanhao Gong, Guoping Qiu
  • For: 给定室内环境中一个新的6DoF相机位姿,研究如何根据该位姿直接预测相机所见的视图,并使用条件生成对抗网络(P2I-NET)完成这一预测。
  • Methods: 提出了两个辅助判别器约束,分别作用于潜在特征空间和真实世界相机位姿空间,强制生成图像的位姿与对应真实图像的位姿保持一致;此外,还引入一个深度卷积神经网络(CNN)在像素空间进一步加强这种一致性。
  • Results: 新视角合成实验表明,P2I-NET在与多种基于NeRF的强基线模型的比较中表现出色,在合成质量相近的情况下速度快40-100倍;此外,还贡献了一个新的室内环境数据集,包含22段高分辨率RGBD视频及其精确的相机位姿参数。
    Abstract Given a new $6DoF$ camera pose in an indoor environment, we study the challenging problem of predicting the view from that pose based on a set of reference RGBD views. Existing explicit or implicit 3D geometry construction methods are computationally expensive while those based on learning have predominantly focused on isolated views of object categories with regular geometric structure. Differing from the traditional \textit{render-inpaint} approach to new view synthesis in the real indoor environment, we propose a conditional generative adversarial neural network (P2I-NET) to directly predict the new view from the given pose. P2I-NET learns the conditional distribution of the images of the environment for establishing the correspondence between the camera pose and its view of the environment, and achieves this through a number of innovative designs in its architecture and training lost function. Two auxiliary discriminator constraints are introduced for enforcing the consistency between the pose of the generated image and that of the corresponding real world image in both the latent feature space and the real world pose space. Additionally a deep convolutional neural network (CNN) is introduced to further reinforce this consistency in the pixel space. We have performed extensive new view synthesis experiments on real indoor datasets. Results show that P2I-NET has superior performance against a number of NeRF based strong baseline models. In particular, we show that P2I-NET is 40 to 100 times faster than these competitor techniques while synthesising similar quality images. Furthermore, we contribute a new publicly available indoor environment dataset containing 22 high resolution RGBD videos where each frame also has accurate camera pose parameters.

Improving Facade Parsing with Vision Transformers and Line Integration

  • paper_url: http://arxiv.org/abs/2309.15523
  • repo_url: https://github.com/wbw520/rtfp
  • paper_authors: Bowen Wang, Jiaxing Zhang, Ran Zhang, Yunqin Li, Liangzhi Li, Yuta Nakashima
  • For: The paper develops a new dataset (Comprehensive Facade Parsing) and a novel pipeline (Revision-based Transformer Facade Parsing) for real-world facade parsing tasks, with the aim of improving computational efficiency and accuracy.
  • Methods: The paper introduces Vision Transformers (ViT) into facade parsing and proposes a new revision algorithm (Line Acquisition, Filtering, and Revision) to improve the segmentation results.
  • Results: The proposed method achieves superior performance on three datasets (ECP 2011, RueMonge 2014, and CFP) compared to existing methods.
    Abstract Facade parsing stands as a pivotal computer vision task with far-reaching applications in areas like architecture, urban planning, and energy efficiency. Despite the recent success of deep learning-based methods in yielding impressive results on certain open-source datasets, their viability for real-world applications remains uncertain. Real-world scenarios are considerably more intricate, demanding greater computational efficiency. Existing datasets often fall short in representing these settings, and previous methods frequently rely on extra models to enhance accuracy, which requires much computation cost. In this paper, we introduce Comprehensive Facade Parsing (CFP), a dataset meticulously designed to encompass the intricacies of real-world facade parsing tasks. Comprising a total of 602 high-resolution street-view images, this dataset captures a diverse array of challenging scenarios, including sloping angles and densely clustered buildings, with painstakingly curated annotations for each image. We introduce a new pipeline known as Revision-based Transformer Facade Parsing (RTFP). This marks the pioneering utilization of Vision Transformers (ViT) in facade parsing, and our experimental results definitively substantiate its merit. We also design Line Acquisition, Filtering, and Revision (LAFR), an efficient yet accurate revision algorithm that can improve the segment result solely from simple line detection using prior knowledge of the facade. In ECP 2011, RueMonge 2014, and our CFP, we evaluate the superiority of our method.
    摘要 外墙解析作为计算机视觉任务,在建筑、城市规划和能效等领域具有广泛的应用前景。尽管最近的深度学习方法在某些开源数据集上取得了出色的结果,但其在实际应用中的可行性仍不确定。实际场景相对更加复杂,需要更高的计算效率。现有的数据集往往不能完全反映这些场景,而先前的方法通常需要额外的模型来提高准确性,这带来大量的计算成本。在这篇论文中,我们介绍了全面外墙解析(CFP)数据集,该数据集经过精心设计,以涵盖实际外墙解析任务中的复杂性。数据集共包含602张高分辨率街景图像,涵盖倾斜视角和密集建筑等挑战性场景,并对每张图像进行了精心标注。我们提出了一个新的管道,称为基于修订的Transformer外墙解析(RTFP)。这是首次在外墙解析中使用视觉Transformer(ViT),我们的实验结果明确证明了它的优势。我们还设计了线段获取、筛选与修订(LAFR)算法,这是一种高效且准确的修订算法,仅利用外墙的先验知识,通过简单的线段检测即可改进分割结果。我们在ECP 2011、RueMonge 2014和我们的CFP上评估了方法的优越性。

MLOps for Scarce Image Data: A Use Case in Microscopic Image Analysis

  • paper_url: http://arxiv.org/abs/2309.15521
  • repo_url: None
  • paper_authors: Angelo Yamachui Sitcheu, Nils Friederich, Simon Baeuerle, Oliver Neumann, Markus Reischl, Ralf Mikut
  • for: This paper aims to enhance biomedical image analysis using a holistic approach to Machine Learning Operations (MLOps) in the context of scarce data.
  • methods: The proposed method includes a fingerprinting process to select the best models, datasets, and development strategy for image analysis tasks, as well as automated model development and continuous deployment and monitoring to ensure continuous learning.
  • results: The paper presents preliminary results of a proof of concept for fingerprinting in microscopic image datasets.
    Abstract Nowadays, Machine Learning (ML) is experiencing tremendous popularity that has never been seen before. The operationalization of ML models is governed by a set of concepts and methods referred to as Machine Learning Operations (MLOps). Nevertheless, researchers, as well as professionals, often focus more on the automation aspect and neglect the continuous deployment and monitoring aspects of MLOps. As a result, there is a lack of continuous learning through the flow of feedback from production to development, causing unexpected model deterioration over time due to concept drifts, particularly when dealing with scarce data. This work explores the complete application of MLOps in the context of scarce data analysis. The paper proposes a new holistic approach to enhance biomedical image analysis. Our method includes: a fingerprinting process that enables selecting the best models, datasets, and model development strategy relative to the image analysis task at hand; an automated model development stage; and a continuous deployment and monitoring process to ensure continuous learning. For preliminary results, we perform a proof of concept for fingerprinting in microscopic image datasets.

SAF-Net: Self-Attention Fusion Network for Myocardial Infarction Detection using Multi-View Echocardiography

  • paper_url: http://arxiv.org/abs/2309.15520
  • repo_url: None
  • paper_authors: Ilke Adalioglu, Mete Ahishali, Aysen Degerli, Serkan Kiranyaz, Moncef Gabbouj
  • for: This paper proposes a novel view-fusion model named SAF-Net to detect myocardial infarction (MI) from multi-view echocardiography recordings.
  • methods: The proposed framework utilizes apical 2-chamber (A2C) and apical 4-chamber (A4C) view echocardiography recordings, and extracts highly representative features using pre-trained deep networks. The SAF-Net model uses a self-attention mechanism to learn dependencies in the extracted feature vectors.
  • results: The proposed SAF-Net model achieves a high-performance level with 88.26% precision, 77.64% sensitivity, and 78.13% accuracy, the most accurate MI detection over multi-view echocardiography recordings.
    Abstract Myocardial infarction (MI) is a severe case of coronary artery disease (CAD) and ultimately, its detection is substantial to prevent progressive damage to the myocardium. In this study, we propose a novel view-fusion model named self-attention fusion network (SAF-Net) to detect MI from multi-view echocardiography recordings. The proposed framework utilizes apical 2-chamber (A2C) and apical 4-chamber (A4C) view echocardiography recordings for classification. Three reference frames are extracted from each recording of both views and deployed pre-trained deep networks to extract highly representative features. The SAF-Net model utilizes a self-attention mechanism to learn dependencies in extracted feature vectors. The proposed model is computationally efficient thanks to its compact architecture having three main parts: a feature embedding to reduce dimensionality, self-attention for view-pooling, and dense layers for the classification. Experimental evaluation is performed using the HMC-QU-TAU dataset which consists of 160 patients with A2C and A4C view echocardiography recordings. The proposed SAF-Net model achieves a high-performance level with 88.26% precision, 77.64% sensitivity, and 78.13% accuracy. The results demonstrate that the SAF-Net model achieves the most accurate MI detection over multi-view echocardiography recordings.
    摘要 心肌梗死(MI)是冠状动脉疾病(CAD)的一种严重情况,及时检测MI对于防止心肌进一步受损至关重要。在这项研究中,我们提出了一种新的视图融合模型,名为自注意力融合网络(SAF-Net),用于从多视图超声心动图记录中检测MI。我们的框架使用心尖二腔(A2C)和心尖四腔(A4C)视图的超声心动图记录进行分类,从每个视图的记录中提取三个参考帧,并使用预训练的深度网络提取高度表征性的特征。SAF-Net模型使用自注意力机制来学习所提取特征向量之间的依赖关系。该模型具有紧凑的架构,包括用于降维的特征嵌入、用于视图池化的自注意力以及用于分类的全连接层,因此计算高效。我们在HMC-QU-TAU数据集上进行了实验评估,该数据集包含160名患者的A2C和A4C视图超声心动图记录。我们的SAF-Net模型达到了88.26%的精度、77.64%的敏感度和78.13%的准确率,结果表明SAF-Net在多视图超声心动图记录上实现了最准确的MI检测。

Defending Against Physical Adversarial Patch Attacks on Infrared Human Detection

  • paper_url: http://arxiv.org/abs/2309.15519
  • repo_url: None
  • paper_authors: Lukas Strack, Futa Waseda, Huy H. Nguyen, Yinqiang Zheng, Isao Echizen
  • for: 本研究旨在提高红外检测系统的安全性,对Physically-Realizable Adversarial Patches(PRAP)的攻击进行防御。
  • methods: 我们提出了一种简单的防御策略——质量检测(POD),通过随机添加质量样本来增强训练样本,并在检测人员时同时检测质量样本。
  • results: POD不仅可以准确地检测人员,还可以识别质量样本的位置,并在不同的质量样本攻击下保持高度的鲁棒性。
    Abstract Infrared detection is an emerging technique for safety-critical tasks owing to its remarkable anti-interference capability. However, recent studies have revealed that it is vulnerable to physically-realizable adversarial patches, posing risks in its real-world applications. To address this problem, we are the first to investigate defense strategies against adversarial patch attacks on infrared detection, especially human detection. We have devised a straightforward defense strategy, patch-based occlusion-aware detection (POD), which efficiently augments training samples with random patches and subsequently detects them. POD not only robustly detects people but also identifies adversarial patch locations. Surprisingly, while being extremely computationally efficient, POD easily generalizes to state-of-the-art adversarial patch attacks that are unseen during training. Furthermore, POD improves detection precision even in a clean (i.e., no-patch) situation due to the data augmentation effect. Evaluation demonstrated that POD is robust to adversarial patches of various shapes and sizes. The effectiveness of our baseline approach is shown to be a viable defense mechanism for real-world infrared human detection systems, paving the way for exploring future research directions.
    摘要 红外检测是一种出现的技术,具有很好的防障特性,因此在安全关键任务中得到广泛应用。然而,最近的研究发现,红外检测系统容易受到physically realizable adversarial patches的威胁,这会影响其在实际应用中的安全性。为了解决这个问题,我们是第一个调查红外检测系统中 adversarial patch 攻击的防御策略,特别是人体检测。我们提出了一种简单的防御策略,即 patch-based occlusion-aware detection(POD),它可以增加训练样本中的随机贴图,并在后续检测它们。POD不仅可以准确检测人体,还可以识别隐藏在贴图中的敌意贴图位置。意外地,POD的计算效率非常低,同时它可以在未见过训练时的攻击中保持高效。此外,POD在清洁(即无贴图)情况下也可以提高检测精度,这是因为数据增强效果。我们的基线方法在不同形状和大小的敌意贴图攻击中都能够保持高效。这些结果表明,POD是一种可靠的防御策略,可以保护实际中的红外人体检测系统,开辟了未来研究的新途径。
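A minimal sketch of the patch-based augmentation idea behind POD: paste random rectangles into a training image and annotate them as an extra "patch" class so a detector learns to localize occluders alongside people. The patch texture distribution, sizes, and the detector integration are assumptions for illustration, not the authors' recipe.

```python
import numpy as np

def add_random_patches(image, boxes, labels, n_patches=2, patch_cls=1,
                       size_range=(20, 60)):
    """Paste random rectangular patches into an (H, W) infrared image and add
    them to the annotations as an extra 'patch' class (person class assumed 0)."""
    h, w = image.shape[:2]
    boxes, labels = list(boxes), list(labels)
    for _ in range(n_patches):
        pw, ph = np.random.randint(*size_range, size=2)
        x0 = np.random.randint(0, max(1, w - pw))
        y0 = np.random.randint(0, max(1, h - ph))
        # random texture stands in for an unknown physical adversarial patch
        image[y0:y0 + ph, x0:x0 + pw] = np.random.rand(ph, pw)
        boxes.append([x0, y0, x0 + pw, y0 + ph])
        labels.append(patch_cls)
    return image, np.array(boxes, dtype=float), np.array(labels)

img = np.random.rand(240, 320)
img, boxes, labels = add_random_patches(img, boxes=[[50, 60, 90, 160]], labels=[0])
print(boxes.shape, labels)
```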

DreamCom: Finetuning Text-guided Inpainting Model for Image Composition

  • paper_url: http://arxiv.org/abs/2309.15508
  • repo_url: None
  • paper_authors: Lingxiao Lu, Bo Zhang, Li Niu
  • for: 合成具有真实感的图像,即将前景对象渗透到背景图像中。
  • methods: 使用大量的前景和背景对象对 diffusion 模型进行预训练,以便在测试时直接应用到新的前景和背景对象。
  • results: 经验显示,使用这种方法可以快速并高效地生成高质量的合成图像,但是经常失去前景细节和显示明显的artefacts。
    Abstract The goal of image composition is merging a foreground object into a background image to obtain a realistic composite image. Recently, generative composition methods are built on large pretrained diffusion models, due to their unprecedented image generation ability. They train a model on abundant pairs of foregrounds and backgrounds, so that it can be directly applied to a new pair of foreground and background at test time. However, the generated results often lose the foreground details and exhibit noticeable artifacts. In this work, we propose an embarrassingly simple approach named DreamCom inspired by DreamBooth. Specifically, given a few reference images for a subject, we finetune text-guided inpainting diffusion model to associate this subject with a special token and inpaint this subject in the specified bounding box. We also construct a new dataset named MureCom well-tailored for this task.
    摘要 “目的是将前景物体合并到背景图像中,以获得实际的合成图像。现在,生成作业方法基于大量预训数据模型,因为它们可以实现前无之纪录的图像生成能力。它们在丰富的前景和背景组合中训练模型,以便在测试时直接应用到新的前景和背景。然而,生成结果经常失去前景细节,并表现出明显的错误。在这个工作中,我们提出了一个轻松简单的方法名为DreamCom,受 DreamBooth 的启发。具体来说,我们将一些对主题的参考图片给调整文本导向填充扩散模型,将主题与特殊的token相关,并在指定的矩形盒中填充这个主题。我们还建立了一个新的数据集名为MureCom,专门用于这个任务。”

Finite Scalar Quantization: VQ-VAE Made Simple

  • paper_url: http://arxiv.org/abs/2309.15505
  • repo_url: https://github.com/google-research/google-research
  • paper_authors: Fabian Mentzer, David Minnen, Eirikur Agustsson, Michael Tschannen
  • for: The paper aims to propose a simple scheme for vector quantization (VQ) in the latent representation of VQ-VAEs, which is called finite scalar quantization (FSQ).
  • methods: The paper uses FSQ to project the VAE representation down to a few dimensions, and each dimension is quantized to a small set of fixed values. The authors use an appropriate choice of the number of dimensions and values each dimension can take to obtain the same codebook size as in VQ.
  • results: The authors train the same models that have been trained on VQ-VAE representations using FSQ, including autoregressive and masked transformer models for image generation, multimodal generation, and dense prediction computer vision tasks. Despite the much simpler design of FSQ, the authors obtain competitive performance in all these tasks.
    Abstract We propose to replace vector quantization (VQ) in the latent representation of VQ-VAEs with a simple scheme termed finite scalar quantization (FSQ), where we project the VAE representation down to a few dimensions (typically less than 10). Each dimension is quantized to a small set of fixed values, leading to an (implicit) codebook given by the product of these sets. By appropriately choosing the number of dimensions and values each dimension can take, we obtain the same codebook size as in VQ. On top of such discrete representations, we can train the same models that have been trained on VQ-VAE representations. For example, autoregressive and masked transformer models for image generation, multimodal generation, and dense prediction computer vision tasks. Concretely, we employ FSQ with MaskGIT for image generation, and with UViM for depth estimation, colorization, and panoptic segmentation. Despite the much simpler design of FSQ, we obtain competitive performance in all these tasks. We emphasize that FSQ does not suffer from codebook collapse and does not need the complex machinery employed in VQ (commitment losses, codebook reseeding, code splitting, entropy penalties, etc.) to learn expressive discrete representations.
    摘要 我们提议将VQ-VAE潜在表示中的向量量化(VQ)替换为一种简单的方案,称为有限标量量化(FSQ):我们将VAE表示投影到少数几个维度(通常少于10个),每个维度被量化为一小组固定值,这些集合的乘积构成了一个(隐式的)码本。通过恰当地选择维度数和每个维度可取值的数量,我们可以获得与VQ相同的码本大小。在这种离散表示之上,我们可以训练那些此前基于VQ-VAE表示训练的模型,例如用于图像生成、多模态生成以及稠密预测等计算机视觉任务的自回归和掩码Transformer模型。具体来说,我们将FSQ与MaskGIT结合用于图像生成,与UViM结合用于深度估计、上色和全景分割。尽管FSQ的设计简单得多,我们在所有这些任务中都获得了有竞争力的性能。我们强调,FSQ不会出现码本坍缩,也不需要VQ中用于学习有表达力离散表示的复杂机制(如承诺损失、码本重置、码拆分、熵惩罚等)。
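A compact sketch of finite scalar quantization with a straight-through gradient, following the description above (bound each of a few latent dimensions and round it to a small fixed set of values). Here each dimension uses an odd number of levels so a plain tanh bound suffices; the exact bounding and offset details of the paper's implementation may differ.

```python
import torch

def round_ste(z):
    """Round with a straight-through gradient."""
    return z + (torch.round(z) - z).detach()

def fsq(z, levels=(7, 5, 5)):
    """Finite scalar quantization: bound each latent dimension and round it to
    a small fixed set of values; the implicit codebook is the product of the
    per-dimension level sets (here 7*5*5 = 175 codes)."""
    levels = torch.tensor(levels, dtype=z.dtype, device=z.device)
    half = (levels - 1) / 2
    bounded = torch.tanh(z) * half          # each dim lives in [-half, half]
    return round_ste(bounded) / half        # quantized, rescaled to [-1, 1]

z = torch.randn(4, 3, requires_grad=True)   # last dim = number of FSQ channels
q = fsq(z)
q.sum().backward()                           # gradients flow via straight-through
print(q)
```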

Investigating the changes in BOLD responses during viewing of images with varied complexity: An fMRI time-series based analysis on human vision

  • paper_url: http://arxiv.org/abs/2309.15495
  • repo_url: https://github.com/naveen7102/fmri-time-series-classification
  • paper_authors: Naveen Kanigiri, Manohar Suggula, Debanjali Bhattacharya, Neelam Sinha
  • for: investigate the neurological variation of human brain responses during viewing of images with varied complexity using fMRI time series (TS) analysis.
  • methods: employ classical machine learning and deep learning strategies to classify image complexity-specific fMRI TS, and perform temporal semantic segmentation on whole fMRI TS.
  • results: established a baseline in studying how differently human brain functions while looking into images of diverse complexities, and provided insightful explanations for how static images with diverse complexities are perceived.
    Abstract Functional MRI (fMRI) is widely used to examine brain functionality by detecting alteration in oxygenated blood flow that arises with brain activity. This work aims to investigate the neurological variation of human brain responses during viewing of images with varied complexity using fMRI time series (TS) analysis. Publicly available BOLD5000 dataset is used for this purpose which contains fMRI scans while viewing 5254 distinct images of diverse categories, drawn from three standard computer vision datasets: COCO, Imagenet and SUN. To understand vision, it is important to study how brain functions while looking at images of diverse complexities. Our first study employs classical machine learning and deep learning strategies to classify image complexity-specific fMRI TS, represents instances when images from COCO, Imagenet and SUN datasets are seen. The implementation of this classification across visual datasets holds great significance, as it provides valuable insights into the fluctuations in BOLD signals when perceiving images of varying complexities. Subsequently, temporal semantic segmentation is also performed on whole fMRI TS to segment these time instances. The obtained result of this analysis has established a baseline in studying how differently human brain functions while looking into images of diverse complexities. Therefore, accurate identification and distinguishing of variations in BOLD signals from fMRI TS data serves as a critical initial step in vision studies, providing insightful explanations for how static images with diverse complexities are perceived.

CauDR: A Causality-inspired Domain Generalization Framework for Fundus-based Diabetic Retinopathy Grading

  • paper_url: http://arxiv.org/abs/2309.15493
  • repo_url: None
  • paper_authors: Hao Wei, Peilun Shi, Juzheng Miao, Minqing Zhang, Guitao Bai, Jianing Qiu, Furui Liu, Wu Yuan
  • for: 这个研究旨在提高计算机辅助糖尿病视网膜病变分级系统的准确性和泛化能力,以帮助眼科医生快速筛查和诊断。
  • methods: 该研究结合新型视网膜成像相机与基于深度学习的算法,并将因果分析引入模型架构,以减少域间差异(domain shift)的影响。
  • results: 研究结果显示,提出的因果启发分级框架(CauDR)能够有效消除虚假相关性、缓解域间差异的影响,并达到最先进(SOTA)性能。
    Abstract Diabetic retinopathy (DR) is the most common diabetic complication, which usually leads to retinal damage, vision loss, and even blindness. A computer-aided DR grading system has a significant impact on helping ophthalmologists with rapid screening and diagnosis. Recent advances in fundus photography have precipitated the development of novel retinal imaging cameras and their subsequent implementation in clinical practice. However, most deep learning-based algorithms for DR grading demonstrate limited generalization across domains. This inferior performance stems from variance in imaging protocols and devices inducing domain shifts. We posit that declining model performance between domains arises from learning spurious correlations in the data. Incorporating do-operations from causality analysis into model architectures may mitigate this issue and improve generalizability. Specifically, a novel universal structural causal model (SCM) was proposed to analyze spurious correlations in fundus imaging. Building on this, a causality-inspired diabetic retinopathy grading framework named CauDR was developed to eliminate spurious correlations and achieve more generalizable DR diagnostics. Furthermore, existing datasets were reorganized into 4DR benchmark for DG scenario. Results demonstrate the effectiveness and the state-of-the-art (SOTA) performance of CauDR.
    摘要 糖尿病视网膜病变(DR)是糖尿病最常见的并发症,通常会导致视网膜损伤、视力下降甚至失明。计算机辅助的DR分级系统有助于眼科医生快速筛查和诊断。随着眼底照相技术的发展,新型视网膜成像相机已被应用于临床实践。但大多数基于深度学习的DR分级算法在跨域场景下泛化能力有限,这是由于成像协议和设备差异导致的域偏移。我们认为,模型跨域性能下降源于学习到了数据中的虚假相关性,而将因果分析中的do-操作引入模型架构有望缓解这一问题并提升泛化能力。具体而言,我们提出了一个新的通用结构因果模型(SCM)来分析眼底成像中的虚假相关性;在此基础上,开发了一个因果启发的糖尿病视网膜病变分级框架(CauDR),以消除虚假相关性并获得更具泛化性的DR诊断。此外,现有数据集被重组为面向域泛化(DG)场景的4DR基准。结果显示了CauDR的有效性及其最先进(SOTA)的性能。

Survey on Deep Face Restoration: From Non-blind to Blind and Beyond

  • paper_url: http://arxiv.org/abs/2309.15490
  • repo_url: https://github.com/24wenjie-li/awesome-face-restoration
  • paper_authors: Wenjie Li, Mei Wang, Kai Zhang, Juncheng Li, Xiaoming Li, Yuhang Zhang, Guangwei Gao, Weihong Deng, Chia-Wen Lin
  • for: 本研究目的是为了提高低质量(LQ)图像的面部图像修复(FR)技术。
  • methods: 本文首先检视了现实中常见的LQ图像因素,并介绍了用于生成LQ图像的降低技术。然后, authors分类了FR方法按照不同的任务,并讲解它们的发展历程。此外, authors还介绍了常见的面部优先级,并讨论了如何提高它们的效果。
  • results: 在实验部分,作者使用统一基准全面评估了当前最佳FR方法在多个任务上的性能,并从不同角度分析它们的表现。
    Abstract Face restoration (FR) is a specialized field within image restoration that aims to recover low-quality (LQ) face images into high-quality (HQ) face images. Recent advances in deep learning technology have led to significant progress in FR methods. In this paper, we begin by examining the prevalent factors responsible for real-world LQ images and introduce degradation techniques used to synthesize LQ images. We also discuss notable benchmarks commonly utilized in the field. Next, we categorize FR methods based on different tasks and explain their evolution over time. Furthermore, we explore the various facial priors commonly utilized in the restoration process and discuss strategies to enhance their effectiveness. In the experimental section, we thoroughly evaluate the performance of state-of-the-art FR methods across various tasks using a unified benchmark. We analyze their performance from different perspectives. Finally, we discuss the challenges faced in the field of FR and propose potential directions for future advancements. The open-source repository corresponding to this work can be found at https:// github.com/ 24wenjie-li/ Awesome-Face-Restoration.
    摘要 面部恢复(FR)是图像恢复的一个专业领域,旨在将低质量(LQ)的面部图像恢复到高质量(HQ)的面部图像。 current deep learning技术的进步已经导致FR方法得到了重要的进步。在这篇论文中,我们首先检查了实际中LQ图像的主要因素,并介绍了用于生成LQ图像的降低技术。我们还讨论了在领域中常用的标准 bencmarks。然后,我们将FR方法分为不同任务,并解释它们的演化历史。此外,我们探讨了常用的面部先验和如何提高它们的效果。在实验部分,我们对现今FR方法的性能进行了广泛的评估,使用了一个统一的 bencmark。我们从不同的角度分析了它们的性能。最后,我们讨论了FR领域面临的挑战和未来的发展方向。相关的开源存储库可以在https://github.com/24wenjie-li/Awesome-Face-Restoration中找到。

Tackling VQA with Pretrained Foundation Models without Further Training

  • paper_url: http://arxiv.org/abs/2309.15487
  • repo_url: None
  • paper_authors: Alvin De Jun Tan, Bingquan Shen
  • for: 这个论文的目的是探讨如何使用预训练的大语言模型(LLMs)解决视觉问答(VQA)问题,无需进一步训练。
  • methods: 这个论文使用了将预训练的LLMs和其他基础模型结合使用,以便在不进一步训练的情况下解决VQA问题。文章探讨了不同的解码策略来生成图像的文本表示,并评估了其性能在VQAv2数据集上。
  • results: 研究发现,通过使用自然语言来表示图像,LLMs可以快速理解图像,并且不需要进一步训练。不同的解码策略对图像的文本表示具有不同的性能,但是综合评估结果表明,使用自然语言来表示图像是一个有效的方法。
    Abstract Large language models (LLMs) have achieved state-of-the-art results in many natural language processing tasks. They have also demonstrated ability to adapt well to different tasks through zero-shot or few-shot settings. With the capability of these LLMs, researchers have looked into how to adopt them for use with Visual Question Answering (VQA). Many methods require further training to align the image and text embeddings. However, these methods are computationally expensive and requires large scale image-text dataset for training. In this paper, we explore a method of combining pretrained LLMs and other foundation models without further training to solve the VQA problem. The general idea is to use natural language to represent the images such that the LLM can understand the images. We explore different decoding strategies for generating textual representation of the image and evaluate their performance on the VQAv2 dataset.
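A hedged sketch of the caption-then-ask recipe: represent the image as text and let an LLM answer without any further training. Both caption_image and llm_generate are hypothetical callables the reader would supply (e.g. a captioning model and an LLM API wrapper); the toy stand-ins only make the sketch runnable.

```python
def answer_question(image, question, caption_image, llm_generate, n_captions=3):
    """Zero-shot VQA via a textual image representation: generate a few
    captions for the image, then ask an instruction-following LLM to answer
    using only that text."""
    captions = [caption_image(image) for _ in range(n_captions)]
    context = "\n".join(f"- {c}" for c in captions)
    prompt = (
        "You are answering a question about an image you cannot see.\n"
        f"Descriptions of the image:\n{context}\n\n"
        f"Question: {question}\nAnswer with a short phrase:"
    )
    return llm_generate(prompt)

# toy stand-ins so the sketch runs end to end
fake_caption = lambda img: "a brown dog playing with a red ball on grass"
fake_llm = lambda prompt: "a dog"
print(answer_question(object(), "What animal is in the picture?",
                      fake_caption, fake_llm))
```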

Transferability of Representations Learned using Supervised Contrastive Learning Trained on a Multi-Domain Dataset

  • paper_url: http://arxiv.org/abs/2309.15486
  • repo_url: None
  • paper_authors: Alvin De Jun Tan, Clement Tan, Chai Kiat Yeo
  • for: 本研究使用 Supervised Contrastive Learning 框架来学习 DomainNet 多域数据集上的表示,并评估这些表示的传递性在不同域的下游数据集上。
  • methods: 本研究使用 Supervised Contrastive Learning 框架,并使用 fixed feature linear evaluation protocol 评估表示的传递性。
  • results: 实验结果显示,Supervised Contrastive Learning 模型在 7 个不同域的下游数据集上的平均表现比基eline模型优于 6.05%。这些结果表明,Supervised Contrastive Learning 模型可能可以在多域数据集上学习更robust的表示,并且这些表示可以更好地传递到其他域。
    Abstract Contrastive learning has shown to learn better quality representations than models trained using cross-entropy loss. They also transfer better to downstream datasets from different domains. However, little work has been done to explore the transferability of representations learned using contrastive learning when trained on a multi-domain dataset. In this paper, a study has been conducted using the Supervised Contrastive Learning framework to learn representations from the multi-domain DomainNet dataset and then evaluate the transferability of the representations learned on other downstream datasets. The fixed feature linear evaluation protocol will be used to evaluate the transferability on 7 downstream datasets that were chosen across different domains. The results obtained are compared to a baseline model that was trained using the widely used cross-entropy loss. Empirical results from the experiments showed that on average, the Supervised Contrastive Learning model performed 6.05% better than the baseline model on the 7 downstream datasets. The findings suggest that Supervised Contrastive Learning models can potentially learn more robust representations that transfer better across domains than cross-entropy models when trained on a multi-domain dataset.
    摘要 对比学习已被证明能够学到比使用交叉熵损失训练的模型更高质量的表示,并且这些表示能更好地迁移到不同领域的下游数据集。然而,在多域数据集上使用对比学习所学表示的可迁移性尚未得到充分研究。本文使用监督对比学习(Supervised Contrastive Learning)框架在多域数据集DomainNet上学习表示,然后采用固定特征线性评估协议,在来自不同领域的7个下游数据集上评估这些表示的可迁移性,并与使用常用交叉熵损失训练的基线模型进行比较。实验结果表明,监督对比学习模型在7个下游数据集上的表现平均比基线模型高出6.05%。这些结果表明,在多域数据集上训练时,监督对比学习模型有潜力学到比交叉熵模型更稳健、跨域迁移性更好的表示。
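The fixed-feature linear evaluation protocol mentioned above can be sketched in a few lines: freeze the encoder, fit a linear classifier on its features, and report downstream accuracy. The toy encoder and data below are placeholders, not the paper's setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_evaluation(encoder, train_imgs, train_labels, test_imgs, test_labels):
    """Fixed-feature linear evaluation: the pretrained encoder stays frozen and
    only a linear classifier is fit on its features; test accuracy then
    measures how transferable the learned representation is."""
    f_train = np.stack([encoder(x) for x in train_imgs])
    f_test = np.stack([encoder(x) for x in test_imgs])
    clf = LogisticRegression(max_iter=1000).fit(f_train, train_labels)
    return clf.score(f_test, test_labels)

# toy stand-in encoder: mean-pools an image into a 3-dim "feature"
toy_encoder = lambda img: img.reshape(-1, 3).mean(0)
rng = np.random.default_rng(0)
imgs = rng.random((40, 8, 8, 3)) + np.arange(40).reshape(-1, 1, 1, 1) % 2 * 0.5
labels = np.arange(40) % 2
print(linear_evaluation(toy_encoder, imgs[:30], labels[:30], imgs[30:], labels[30:]))
```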

Style Transfer and Self-Supervised Learning Powered Myocardium Infarction Super-Resolution Segmentation

  • paper_url: http://arxiv.org/abs/2309.15485
  • repo_url: None
  • paper_authors: Lichao Wang, Jiahao Huang, Xiaodan Xing, Yinzhe Wu, Ramyah Rajakulasingam, Andrew D. Scott, Pedro F Ferreira, Ranil De Silva, Sonia Nielles-Vallespin, Guang Yang
  • For: The paper aims to enhance diffusion tensor imaging (DTI) images by translating them into the late gadolinium enhancement (LGE) domain, which offers a larger amount of data with high resolution and distinct highlighting of myocardium infarction (MI) areas.
  • Methods: The proposed pipeline incorporates a novel style transfer model and a simultaneous super-resolution and segmentation model. An end-to-end super-resolution segmentation model is introduced to generate a high-resolution mask from the low-resolution LGE-style DTI image, and a multi-task self-supervised learning strategy is employed to pre-train it.
  • Results: The proposed pipeline enhances the segmentation model by letting it acquire more representative knowledge, improving its segmentation performance after fine-tuning.
    Abstract This study proposes a pipeline that incorporates a novel style transfer model and a simultaneous super-resolution and segmentation model. The proposed pipeline aims to enhance diffusion tensor imaging (DTI) images by translating them into the late gadolinium enhancement (LGE) domain, which offers a larger amount of data with high-resolution and distinct highlighting of myocardium infarction (MI) areas. Subsequently, the segmentation task is performed on the LGE style image. An end-to-end super-resolution segmentation model is introduced to generate a high-resolution mask from the low-resolution LGE style DTI image. Further, to enhance the performance of the model, a multi-task self-supervised learning strategy is employed to pre-train the super-resolution segmentation model, allowing it to acquire more representative knowledge and improve its segmentation performance after fine-tuning. https://github.com/wlc2424762917/Med_Img
    摘要 这个研究提出了一个管道,其中包括一种新的风格传输模型和同时的超高分辨率和分割模型。该管道的目标是通过将扩散tensor图像(DTI)转换到晚期加多林革命(LGE)域中,以获得更多的数据,高分辨率和明确驳批我OCARD Infarction(MI)区域。然后,对LGE风格图像进行分割任务。该研究引入了一种端到端超高分辨率分割模型,以生成高分辨率mask从低分辨率LGE风格DTI图像中。此外,为了提高模型的性能,该研究采用了多任务自主学习策略,将超高分辨率分割模型在先修改后 fine-tuning 中进行自我调节。References:* DTI: diffusion tensor imaging* LGE: late gadolinium enhancement* MI: myocardium infarction* SR: super-resolution* Segmentation: 分割

The Robust Semantic Segmentation UNCV2023 Challenge Results

  • paper_url: http://arxiv.org/abs/2309.15478
  • repo_url: None
  • paper_authors: Xuanlong Yu, Yi Zuo, Zitao Wang, Xiaowen Zhang, Jiaxuan Zhao, Yuting Yang, Licheng Jiao, Rui Peng, Xinyi Wang, Junpei Zhang, Kexin Zhang, Fang Liu, Roberto Alcover-Couso, Juan C. SanMiguel, Marcos Escudero-Viñolo, Hanlin Tian, Kenta Matsui, Tianhao Wang, Fahmy Adan, Zhitong Gao, Xuming He, Quentin Bouniot, Hossein Moghaddam, Shyam Nandan Rai, Fabio Cermelli, Carlo Masone, Andrea Pilzer, Elisa Ricci, Andrei Bursuc, Arno Solin, Martin Trapp, Rui Li, Angela Yao, Wenlong Chen, Ivor Simpson, Neill D. F. Campbell, Gianni Franchi
  • for: 本文描述了在ICCV 2023 年举行的 MUAD 不确定性量化挑战中的获胜解决方案。挑战的目标是提高城市环境下语义分割的鲁棒性,特别是在自然对抗场景下。
  • methods: 本文介绍了参与挑战的19个提交的方法,其中许多技术启发自过去几年Computer Vision和Machine Learning领域的主要会议和学术期刊上的先进uncertainty quantification方法。
  • results: 本文介绍了挑战中表现最佳的解决方案,并对所有参与者使用的多种方法给出全面概述,帮助读者更深入地了解城市环境语义分割中的不确定性处理策略。
    Abstract This paper outlines the winning solutions employed in addressing the MUAD uncertainty quantification challenge held at ICCV 2023. The challenge was centered around semantic segmentation in urban environments, with a particular focus on natural adversarial scenarios. The report presents the results of 19 submitted entries, with numerous techniques drawing inspiration from cutting-edge uncertainty quantification methodologies presented at prominent conferences in the fields of computer vision and machine learning and journals over the past few years. Within this document, the challenge is introduced, shedding light on its purpose and objectives, which primarily revolved around enhancing the robustness of semantic segmentation in urban scenes under varying natural adversarial conditions. The report then delves into the top-performing solutions. Moreover, the document aims to provide a comprehensive overview of the diverse solutions deployed by all participants. By doing so, it seeks to offer readers a deeper insight into the array of strategies that can be leveraged to effectively handle the inherent uncertainties associated with autonomous driving and semantic segmentation, especially within urban environments.
    摘要 本报告详细介绍了在ICCV 2023 年举行的 MUAD 不确定性量化挑战中的获胜解决方案。挑战聚焦于城市环境下的语义分割,尤其是自然对抗场景。报告汇总了19份提交结果,其中许多技术借鉴了近年来计算机视觉与机器学习领域主要会议和期刊上提出的前沿不确定性量化方法。报告首先介绍挑战及其目标,即在各种自然对抗条件下提升城市场景语义分割的鲁棒性;随后深入分析表现最佳的解决方案,并对所有参赛者采用的多种方法给出全面概述,帮助读者更深入地理解在自动驾驶与城市环境语义分割中处理固有不确定性的策略。
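
As a rough illustration of the kind of uncertainty quantification this challenge evaluates, the sketch below computes per-pixel predictive entropy from softmax logits, optionally averaged over Monte-Carlo forward passes; it is a generic baseline for this purpose, not any participant's method.

```python
import numpy as np

def softmax(logits, axis=0):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def predictive_entropy(mc_logits):
    """mc_logits: (T, C, H, W) logits from T stochastic forward passes (T=1 for a single pass).

    Returns an (H, W) map of the entropy of the mean predictive distribution, in nats."""
    probs = softmax(mc_logits, axis=1)           # (T, C, H, W)
    mean_probs = probs.mean(axis=0)              # average over MC samples
    return -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=0)

# toy usage: 3 MC passes, 5 classes, 4x4 image
entropy_map = predictive_entropy(np.random.randn(3, 5, 4, 4))
```

High-entropy pixels flag regions where a segmentation model should not be trusted, which is exactly what robustness metrics in such challenges tend to probe.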

A Tutorial on Uniform B-Spline

  • paper_url: http://arxiv.org/abs/2309.15477
  • repo_url: None
  • paper_authors: Yi Zhou
  • for: 本文是一篇关于均匀B-spline的教程,讲解其核心概念与矩阵表示。
  • methods: 通过矩阵形式系统地阐述均匀B-spline的构造与求值方法。
  • results: 帮助读者理解均匀B-spline的核心概念及其矩阵表示。
    Abstract This document facilitates understanding of core concepts about uniform B-spline and its matrix representation.
    摘要 这份文档帮助理解均匀B-spline的核心概念以及其矩阵表示方法。
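
For concreteness, the matrix representation the tutorial refers to can be illustrated with the standard cubic (degree-3) uniform B-spline segment; the degree and notation below follow a common textbook convention and may differ from the document's own.

```latex
\[
\mathbf{p}_i(u) \;=\; \frac{1}{6}
\begin{bmatrix} u^3 & u^2 & u & 1 \end{bmatrix}
\begin{bmatrix}
-1 &  3 & -3 & 1\\
 3 & -6 &  3 & 0\\
-3 &  0 &  3 & 0\\
 1 &  4 &  1 & 0
\end{bmatrix}
\begin{bmatrix}
\mathbf{P}_{i}\\ \mathbf{P}_{i+1}\\ \mathbf{P}_{i+2}\\ \mathbf{P}_{i+3}
\end{bmatrix},
\qquad u \in [0, 1].
\]
```

The fixed basis matrix blends four consecutive control points, so evaluating any segment of a uniform B-spline reduces to one small matrix product.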

Cross-Dataset Experimental Study of Radar-Camera Fusion in Bird’s-Eye View

  • paper_url: http://arxiv.org/abs/2309.15465
  • repo_url: None
  • paper_authors: Lukas Stäcker, Philipp Heidenreich, Jason Rambach, Didier Stricker
  • for: 通过融合雷达与摄像头的互补信息,提升自动驾驶系统感知的鲁棒性和可靠性。
  • methods: 提出一种新的和灵活的融合网络,并在nuScenes和View-of-Delft两个数据集上进行测试。
  • results: 研究发现,摄像头分支需要大量和多样化的训练数据,而雷达分支受益于高性能的雷达。通过传输学习,我们提高了摄像头的性能在较小的数据集上。结果还表明,雷达-摄像头融合方法在相比摄像头只和雷达只的基准下显著超越。
    Abstract By exploiting complementary sensor information, radar and camera fusion systems have the potential to provide a highly robust and reliable perception system for advanced driver assistance systems and automated driving functions. Recent advances in camera-based object detection offer new radar-camera fusion possibilities with bird's eye view feature maps. In this work, we propose a novel and flexible fusion network and evaluate its performance on two datasets: nuScenes and View-of-Delft. Our experiments reveal that while the camera branch needs large and diverse training data, the radar branch benefits more from a high-performance radar. Using transfer learning, we improve the camera's performance on the smaller dataset. Our results further demonstrate that the radar-camera fusion approach significantly outperforms the camera-only and radar-only baselines.
    摘要 通过利用互补的传感信息,雷达与摄像头融合系统有潜力为高级驾驶辅助系统和自动驾驶功能提供高度鲁棒且可靠的感知系统。近期基于摄像头的目标检测进展带来了基于鸟瞰图(bird's eye view)特征图的全新雷达-摄像头融合可能性。本文提出了一种新颖且灵活的融合网络,并在 nuScenes 和 View-of-Delft 两个数据集上评估其性能。实验表明,摄像头分支需要大量且多样的训练数据,而雷达分支更受益于高性能雷达。通过迁移学习,我们提升了摄像头在较小数据集上的性能。结果进一步表明,雷达-摄像头融合方法显著优于仅摄像头和仅雷达的基线。
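
A minimal sketch of feature-level fusion in bird's-eye view, assuming the camera and radar branches have already been projected onto BEV grids of the same spatial size; the channel counts and the concatenation-plus-convolution design are illustrative assumptions, not the paper's exact network.

```python
import torch
import torch.nn as nn

class BEVFusion(nn.Module):
    """Fuse camera and radar BEV feature maps by concatenation and convolution."""
    def __init__(self, cam_channels=64, radar_channels=32, out_channels=64):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(cam_channels + radar_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, cam_bev, radar_bev):
        # cam_bev: (B, C_cam, H, W), radar_bev: (B, C_radar, H, W) on the same BEV grid
        return self.fuse(torch.cat([cam_bev, radar_bev], dim=1))

# toy usage
fused = BEVFusion()(torch.randn(2, 64, 128, 128), torch.randn(2, 32, 128, 128))
```

A downstream detection head would then operate on `fused`, which is what lets the radar and camera baselines be compared against the fused variant.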

GAMMA: Graspability-Aware Mobile MAnipulation Policy Learning based on Online Grasping Pose Fusion

  • paper_url: http://arxiv.org/abs/2309.15459
  • repo_url: None
  • paper_authors: Jiazhao Zhang, Nandiraju Gireesh, Jilong Wang, Xiaomeng Fang, Chaoyi Xu, Weiguang Chen, Liu Dai, He Wang
  • for: 本研究旨在提高机器人助手的移动抓取能力,增强机器人在不稳定环境中的抓取能力。
  • methods: 该研究提出了一种基于在线抓取姿态融合框架的可抓取性感知方法,可以实时对抓取姿态进行筛选和融合,从而实现时间上的一致性。
  • results: 该方法可以减少冗余的抓取姿态数量,同时提高抓取姿态质量,从而提升机器人的移动抓取能力。
    Abstract Mobile manipulation constitutes a fundamental task for robotic assistants and garners significant attention within the robotics community. A critical challenge inherent in mobile manipulation is the effective observation of the target while approaching it for grasping. In this work, we propose a graspability-aware mobile manipulation approach powered by an online grasping pose fusion framework that enables a temporally consistent grasping observation. Specifically, the predicted grasping poses are online organized to eliminate the redundant, outlier grasping poses, which can be encoded as a grasping pose observation state for reinforcement learning. Moreover, on-the-fly fusing the grasping poses enables a direct assessment of graspability, encompassing both the quantity and quality of grasping poses.
    摘要 移动抓取是机器人助手的基础任务,在机器人领域受到广泛关注。移动抓取的一个关键挑战是在接近目标进行抓取的过程中对目标进行有效观测。本文提出一种具备可抓取性感知的移动抓取方法,其核心是一个在线抓取姿态融合框架,使抓取观测在时间上保持一致。具体而言,预测得到的抓取姿态会被在线组织,以消除冗余和离群的抓取姿态,并将其编码为用于强化学习的抓取姿态观测状态。此外,在线融合抓取姿态还能直接评估可抓取性,同时涵盖抓取姿态的数量与质量。
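
A simplified sketch of how grasping poses predicted over consecutive frames could be fused online: nearby poses are greedily clustered, weakly supported clusters are dropped as outliers, and each surviving cluster keeps its best score. The distance threshold, minimum support, and position-only clustering are illustrative simplifications of the paper's 6-DoF fusion.

```python
import numpy as np

def fuse_grasp_poses(positions, scores, dist_thresh=0.03, min_support=2):
    """positions: (N, 3) grasp centers accumulated over recent frames; scores: (N,).

    Returns fused grasp centers and scores, one per sufficiently supported cluster."""
    order = np.argsort(-scores)                 # process high-score grasps first
    used = np.zeros(len(positions), dtype=bool)
    fused_pos, fused_score = [], []
    for i in order:
        if used[i]:
            continue
        dists = np.linalg.norm(positions - positions[i], axis=1)
        members = np.where((dists < dist_thresh) & ~used)[0]
        used[members] = True
        if len(members) < min_support:          # weak temporal support -> treat as outlier
            continue
        fused_pos.append(positions[members].mean(axis=0))
        fused_score.append(scores[members].max())
    return np.array(fused_pos), np.array(fused_score)

# toy usage: 50 noisy grasp predictions
centers, confs = fuse_grasp_poses(np.random.rand(50, 3), np.random.rand(50))
```

The number and quality of the fused poses is exactly the kind of graspability signal that could feed a reinforcement-learning observation state.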

Semantics-Driven Cloud-Edge Collaborative Inference

  • paper_url: http://arxiv.org/abs/2309.15435
  • repo_url: None
  • paper_authors: Yuche Gao, Beibei Zhang
  • for: 这篇论文主要针对智能城市应用中的视频数据进行高效分析,以智能交通为例。
  • methods: 本论文提出了一个基于 semantics 的云端-边缘协作方法,将视频分析过程分为两个阶段:在边缘服务器上提取视频内容的 semantics (车牌号码图像),然后将该内容提交到云端或边缘服务器进行识别。这种分析方法可以降低端到端传输时间和提高throughput。
  • results: 实验结果显示,相比于将所有视频分析工作负载到云端或边缘服务器进行处理,这种云端-边缘协作方法可以提高端到端传输速度(最多5倍快)、throughput(最多9帧/秒)和网页流量量(50%减少)。这显示了这种方法的有效性。
    Abstract With the proliferation of video data in smart city applications like intelligent transportation, efficient video analytics has become crucial but also challenging. This paper proposes a semantics-driven cloud-edge collaborative approach for accelerating video inference, using license plate recognition as a case study. The method separates semantics extraction and recognition, allowing edge servers to only extract visual semantics (license plate patches) from video frames and offload computation-intensive recognition to the cloud or neighboring edges based on load. This segmented processing coupled with a load-aware work distribution strategy aims to reduce end-to-end latency and improve throughput. Experiments demonstrate significant improvements in end-to-end inference speed (up to 5x faster), throughput (up to 9 FPS), and reduced traffic volumes (50% less) compared to cloud-only or edge-only processing, validating the efficiency of the proposed approach. The cloud-edge collaborative framework with semantics-driven work partitioning provides a promising solution for scaling video analytics in smart cities.
    摘要 随着智能城市应用中视频数据的普遍化,高效的视频分析已成为非常重要,同时也变得非常困难。这篇论文提议一种基于 semantics的云端-边缘集成方法,用 license plate recognition 作为案例研究。该方法将 semantics 提取和识别分开,让边缘服务器只提取视频帧中的视觉 semantics(车牌补丁),并将 computation-intensive 识别 tasks 提交到云或邻近的边缘服务器进行处理,根据负载情况进行异步分配工作。这种分解处理和根据负载情况分配工作的策略,可以降低端到端 latency 和提高吞吐量。实验结果显示,与云只处理或边缘只处理相比,提出的方法可以提高端到端推理速度(最多5倍)、吞吐量(最多9 FPS)和发送流量量(50% 降低)。云端-边缘集成框架,带有基于 semantics 的工作分配策略,为视频分析在智能城市中扩大 scalability 提供了一个可靠的解决方案。
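
The sketch below illustrates the load-aware work distribution idea: the edge always performs the lightweight semantics step (extracting the license-plate patch), then routes the compute-heavy recognition to whichever target currently promises the lowest estimated latency. The latency model and numbers are hypothetical placeholders, not measurements from the paper.

```python
from dataclasses import dataclass

@dataclass
class Target:
    name: str
    queue_len: int        # pending recognition jobs
    per_job_ms: float     # average recognition time per job
    network_ms: float     # transfer cost for one plate patch

def estimated_latency(t: Target) -> float:
    # wait for the queued jobs, then our own job, plus the transfer cost
    return t.network_ms + (t.queue_len + 1) * t.per_job_ms

def route_recognition(targets):
    """Pick the target (local edge, neighbouring edge, or cloud) with lowest estimated latency."""
    return min(targets, key=estimated_latency)

# toy usage
targets = [
    Target("local-edge", queue_len=8, per_job_ms=40.0, network_ms=0.0),
    Target("neighbor-edge", queue_len=2, per_job_ms=45.0, network_ms=5.0),
    Target("cloud", queue_len=0, per_job_ms=15.0, network_ms=30.0),
]
print(route_recognition(targets).name)   # routes the patch to the least loaded option
```

Because only small plate patches, rather than full frames, cross the network, this kind of partitioning is what yields the reported traffic reduction.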

NeuRBF: A Neural Fields Representation with Adaptive Radial Basis Functions

  • paper_url: http://arxiv.org/abs/2309.15426
  • repo_url: https://github.com/oppo-us-research/NeuRBF
  • paper_authors: Zhang Chen, Zhong Li, Liangchen Song, Lele Chen, Jingyi Yu, Junsong Yuan, Yi Xu
  • for: 本研究旨在提出一种基于通用径向基函数的神经场,以更好地表示多种应用中的信号。
  • methods: 本研究使用具有灵活核位置与形状的通用径向基函数进行信号表示,而非传统的基于网格的神经场;并通过与多频率正弦函数组合来扩展径向基函数的通道容量。
  • results: 实验表明,在2D图像和3D有符号距离场表示上,该方法比现有方法更准确且更紧凑;应用于神经辐射场重建时,在保持较小模型规模和相当训练速度的同时达到了最先进的渲染质量。
    Abstract We present a novel type of neural fields that uses general radial bases for signal representation. State-of-the-art neural fields typically rely on grid-based representations for storing local neural features and N-dimensional linear kernels for interpolating features at continuous query points. The spatial positions of their neural features are fixed on grid nodes and cannot well adapt to target signals. Our method instead builds upon general radial bases with flexible kernel position and shape, which have higher spatial adaptivity and can more closely fit target signals. To further improve the channel-wise capacity of radial basis functions, we propose to compose them with multi-frequency sinusoid functions. This technique extends a radial basis to multiple Fourier radial bases of different frequency bands without requiring extra parameters, facilitating the representation of details. Moreover, by marrying adaptive radial bases with grid-based ones, our hybrid combination inherits both adaptivity and interpolation smoothness. We carefully designed weighting schemes to let radial bases adapt to different types of signals effectively. Our experiments on 2D image and 3D signed distance field representation demonstrate the higher accuracy and compactness of our method than prior arts. When applied to neural radiance field reconstruction, our method achieves state-of-the-art rendering quality, with small model size and comparable training speed.
    摘要 我们提出了一种新型的神经场,使用通用径向基函数进行信号表示。现有的先进神经场通常依赖基于网格的表示来存储局部神经特征,并用N维线性核在连续查询点处插值特征;其神经特征的空间位置固定在网格节点上,难以很好地适应目标信号。我们的方法则建立在核位置与形状均可灵活调整的通用径向基函数之上,具有更高的空间自适应性,能够更紧密地拟合目标信号。为进一步提升径向基函数的通道容量,我们提出将其与多频率正弦函数组合,在不增加额外参数的情况下把一个径向基扩展为多个不同频带的傅里叶径向基,从而便于细节表示。此外,通过将自适应径向基与基于网格的表示结合,这种混合组合同时继承了自适应性与插值平滑性。我们精心设计了加权方案,使径向基能够有效地适应不同类型的信号。在2D图像和3D有符号距离场表示上的实验表明,我们的方法比已有方法更准确、更紧凑;应用于神经辐射场重建时,我们的方法在模型规模小、训练速度相当的情况下达到了最先进的渲染质量。
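
A toy sketch of the core idea: features at a query point are a weighted combination of radial basis functions with learnable centers and widths, and each radial response is modulated by sinusoids at several frequencies to extend its channel-wise capacity. This is an illustrative simplification under my own assumptions, not the released NeuRBF implementation (see the linked repo for the real one).

```python
import math
import torch
import torch.nn as nn

class SinusoidalRBFField(nn.Module):
    """Roughly: f(x) = sum_k w_k * exp(-||x - c_k||^2 / s_k^2) * sin(2^j * pi * r_k(x))."""
    def __init__(self, num_bases=256, in_dim=2, out_dim=16, num_freqs=4):
        super().__init__()
        self.centers = nn.Parameter(torch.rand(num_bases, in_dim))      # adaptive kernel positions
        self.log_sigma = nn.Parameter(torch.zeros(num_bases))           # adaptive kernel widths
        self.weights = nn.Parameter(torch.randn(num_bases, num_freqs, out_dim) * 0.01)
        self.register_buffer(
            "freqs", (2.0 ** torch.arange(num_freqs, dtype=torch.float32)) * math.pi)

    def forward(self, x):                                               # x: (N, in_dim)
        diff = x[:, None, :] - self.centers[None, :, :]                 # (N, K, in_dim)
        d2 = (diff ** 2).sum(-1)                                        # (N, K)
        radial = torch.exp(-d2 / torch.exp(self.log_sigma) ** 2)        # Gaussian radial response
        sin = torch.sin(torch.sqrt(d2 + 1e-8)[..., None] * self.freqs)  # (N, K, F) multi-frequency
        basis = radial[..., None] * sin                                 # compose radial and sinusoid
        return torch.einsum("nkf,kfo->no", basis, self.weights)         # (N, out_dim)

# toy usage: evaluate the field at 1024 2D query points
feats = SinusoidalRBFField()(torch.rand(1024, 2))
```

Because the centers and widths are parameters, they can drift toward detailed regions of the target signal during fitting, which is the adaptivity argument made above.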

Inherit with Distillation and Evolve with Contrast: Exploring Class Incremental Semantic Segmentation Without Exemplar Memory

  • paper_url: http://arxiv.org/abs/2309.15413
  • repo_url: None
  • paper_authors: Danpei Zhao, Bo Yuan, Zhenwei Shi
  • For: Addressing class incremental semantic segmentation (CISS) without exemplar memory, resolving catastrophic forgetting and semantic drift simultaneously.
  • Methods: The proposed method IDEC consists of a Dense Knowledge Distillation on all Aspects (DADA) module and an Asymmetric Region-wise Contrastive Learning (ARCL) module, driven by a dynamic class-specific pseudo-labelling strategy.
  • Results: Achieved state-of-the-art performance on multiple CISS tasks, with superior anti-forgetting ability, particularly in multi-step CISS tasks.
    Abstract As a front-burner problem in incremental learning, class incremental semantic segmentation (CISS) is plagued by catastrophic forgetting and semantic drift. Although recent methods have utilized knowledge distillation to transfer knowledge from the old model, they are still unable to avoid pixel confusion, which results in severe misclassification after incremental steps due to the lack of annotations for past and future classes. Meanwhile data-replay-based approaches suffer from storage burdens and privacy concerns. In this paper, we propose to address CISS without exemplar memory and resolve catastrophic forgetting as well as semantic drift synchronously. We present Inherit with Distillation and Evolve with Contrast (IDEC), which consists of a Dense Knowledge Distillation on all Aspects (DADA) manner and an Asymmetric Region-wise Contrastive Learning (ARCL) module. Driven by the devised dynamic class-specific pseudo-labelling strategy, DADA distils intermediate-layer features and output-logits collaboratively with more emphasis on semantic-invariant knowledge inheritance. ARCL implements region-wise contrastive learning in the latent space to resolve semantic drift among known classes, current classes, and unknown classes. We demonstrate the effectiveness of our method on multiple CISS tasks by state-of-the-art performance, including Pascal VOC 2012, ADE20K and ISPRS datasets. Our method also shows superior anti-forgetting ability, particularly in multi-step CISS tasks.
    摘要 作为增量学习中的紧迫问题,类增量语义分割(CISS)长期受灾难性遗忘和语义漂移困扰。尽管近期方法利用知识蒸馏从旧模型迁移知识,但由于缺少过去和未来类别的标注,仍难以避免像素混淆,导致增量步骤后出现严重误分类;而基于数据重放的方法则面临存储负担和隐私问题。本文提出一种无需示例内存的CISS方法,可同时解决灾难性遗忘与语义漂移。该方法名为 Inherit with Distillation and Evolve with Contrast(IDEC),包括全方位密集知识蒸馏(DADA)和非对称区域对比学习(ARCL)两个模块。在所设计的动态类别伪标注策略驱动下,DADA协同蒸馏中间层特征与输出logits,并更强调语义不变知识的继承;ARCL在隐空间进行区域级对比学习,以解决已知类、当前类与未知类之间的语义漂移。我们在 Pascal VOC 2012、ADE20K 和 ISPRS 等多个CISS任务上取得了最先进的性能,并在多步CISS任务中表现出更强的抗遗忘能力。
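
To make the distillation side concrete, below is a generic sketch of combined intermediate-feature and output-logit distillation; it captures the flavour of "distilling on all aspects", but the weighting, temperature, and restriction to old classes are illustrative choices, not IDEC's exact losses.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_feats, teacher_feats, student_logits, teacher_logits,
                      temperature=2.0, feat_weight=1.0, logit_weight=1.0):
    """student/teacher_feats: lists of intermediate feature maps with matching shapes;
    student/teacher_logits: (B, C_old, H, W) logits restricted to previously seen classes."""
    feat_loss = sum(F.mse_loss(s, t.detach()) for s, t in zip(student_feats, teacher_feats))
    log_p_s = F.log_softmax(student_logits / temperature, dim=1)
    p_t = F.softmax(teacher_logits.detach() / temperature, dim=1)
    logit_loss = F.kl_div(log_p_s, p_t, reduction="batchmean") * temperature ** 2
    return feat_weight * feat_loss + logit_weight * logit_loss

# toy usage
s_feats = [torch.randn(2, 64, 32, 32, requires_grad=True)]
t_feats = [torch.randn(2, 64, 32, 32)]
s_logits = torch.randn(2, 21, 32, 32, requires_grad=True)
t_logits = torch.randn(2, 21, 32, 32)
loss = distillation_loss(s_feats, t_feats, s_logits, t_logits)
```

In an exemplar-free setting this loss is the only channel through which knowledge of old classes survives, which is why it is paired with pseudo-labelling and contrastive terms.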

3D Multiple Object Tracking on Autonomous Driving: A Literature Review

  • paper_url: http://arxiv.org/abs/2309.15411
  • repo_url: None
  • paper_authors: Peng Zhang, Xin Li, Liang He, Xin Lin
  • for: 这篇论文主要探讨了三维多目标跟踪(3D MOT)领域的研究现状,并提出了未来研究的可能方向。
  • methods: 本论文使用了系统性的描述和分析方法,对3D MOT领域的各种方法进行了分类和评价,并提供了一份概括性的实验指标。
  • results: 本论文的研究结果表明,3D MOT领域还存在许多挑战,如 объек形变、隐藏、小目标、数据缺乏、检测失败等问题。同时,本论文还提出了未来研究的可能性,以帮助解决这些挑战。
    Abstract 3D multi-object tracking (3D MOT) stands as a pivotal domain within autonomous driving, experiencing a surge in scholarly interest and commercial promise over recent years. Despite its paramount significance, 3D MOT confronts a myriad of formidable challenges, encompassing abrupt alterations in object appearances, pervasive occlusion, the presence of diminutive targets, data sparsity, missed detections, and the unpredictable initiation and termination of object motion trajectories. Countless methodologies have emerged to grapple with these issues, yet 3D MOT endures as a formidable problem that warrants further exploration. This paper undertakes a comprehensive examination, assessment, and synthesis of the research landscape in this domain, remaining attuned to the latest developments in 3D MOT while suggesting prospective avenues for future investigation. Our exploration commences with a systematic exposition of key facets of 3D MOT and its associated domains, including problem delineation, classification, methodological approaches, fundamental principles, and empirical investigations. Subsequently, we categorize these methodologies into distinct groups, dissecting each group meticulously with regard to its challenges, underlying rationale, progress, merits, and demerits. Furthermore, we present a concise recapitulation of experimental metrics and offer an overview of prevalent datasets, facilitating a quantitative comparison for a more intuitive assessment. Lastly, our deliberations culminate in a discussion of the prevailing research landscape, highlighting extant challenges and charting possible directions for 3D MOT research. We present a structured and lucid road-map to guide forthcoming endeavors in this field.
    摘要 三维多目标跟踪(3D MOT)是自动驾驶领域的关键方向,近年来受到学术界与产业界日益增长的关注。尽管意义重大,3D MOT仍面临诸多严峻挑战,包括目标外观的突变、普遍存在的遮挡、小目标、数据稀疏、漏检,以及目标运动轨迹起止的不可预测性。为应对这些问题已涌现出大量方法,但3D MOT仍是值得深入探索的难题。本文对该领域的研究现状进行了系统的梳理、评估与综合,紧跟3D MOT的最新进展并提出未来研究方向。我们首先系统阐述3D MOT及其相关领域的关键方面,包括问题界定、分类、方法路线、基本原理与实证研究;随后将这些方法划分为不同类别,并就各类方法的挑战、基本思想、进展、优点与不足进行细致剖析。此外,本文简要回顾了实验评价指标,概述了常用数据集,以便进行更直观的定量比较。最后,我们讨论了当前的研究格局,指出现存挑战并规划3D MOT研究的可能方向,为后续工作提供清晰的路线图。

KDD-LOAM: Jointly Learned Keypoint Detector and Descriptors Assisted LiDAR Odometry and Mapping

  • paper_url: http://arxiv.org/abs/2309.15394
  • repo_url: None
  • paper_authors: Renlang Huang, Minglei Zhao, Jiming Chen, Liang Li
  • for: 通过稀疏关键点匹配提高点云配准的效率和鲁棒性。
  • methods: 提出一种紧密耦合的关键点检测器和描述符(TCKDD),基于多任务全卷积网络和概率检测损失。
  • results: 在室内和室外数据集上进行了大量实验,取得了点云配准的最先进性能;此外还设计了关键点检测器与描述符辅助的LiDAR里程计与建图框架(KDD-LOAM),其实时里程计依赖于基于关键点描述符匹配的RANSAC。
    Abstract Sparse keypoint matching based on distinct 3D feature representations can improve the efficiency and robustness of point cloud registration. Existing learning-based 3D descriptors and keypoint detectors are either independent or loosely coupled, so they cannot fully adapt to each other. In this work, we propose a tightly coupled keypoint detector and descriptor (TCKDD) based on a multi-task fully convolutional network with a probabilistic detection loss. In particular, this self-supervised detection loss fully adapts the keypoint detector to any jointly learned descriptors and benefits the self-supervised learning of descriptors. Extensive experiments on both indoor and outdoor datasets show that our TCKDD achieves state-of-the-art performance in point cloud registration. Furthermore, we design a keypoint detector and descriptors-assisted LiDAR odometry and mapping framework (KDD-LOAM), whose real-time odometry relies on keypoint descriptor matching-based RANSAC. The sparse keypoints are further used for efficient scan-to-map registration and mapping. Experiments on KITTI dataset demonstrate that KDD-LOAM significantly surpasses LOAM and shows competitive performance in odometry.
    摘要 基于独特3D特征表示的稀疏关键点匹配可以提升点云配准的效率与鲁棒性。现有基于学习的3D描述符和关键点检测器要么彼此独立,要么只是松散耦合,因而无法充分相互适应。本文提出一种紧密耦合的关键点检测器与描述符(TCKDD),基于多任务全卷积网络和概率检测损失。特别地,这种自监督的检测损失使关键点检测器能够完全适应任何联合学习得到的描述符,同时也有利于描述符的自监督学习。在室内与室外数据集上的大量实验表明,TCKDD在点云配准上取得了最先进的性能。此外,我们还设计了关键点检测器与描述符辅助的LiDAR里程计与建图框架(KDD-LOAM),其实时里程计依赖基于关键点描述符匹配的RANSAC;稀疏关键点进一步用于高效的扫描到地图配准与建图。在KITTI数据集上的实验表明,KDD-LOAM显著优于LOAM,并在里程计上表现出竞争力。
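
A compact sketch of the registration step that learned descriptors enable: match descriptors by mutual nearest neighbour, then estimate a rigid transform with a small RANSAC loop around the Kabsch/SVD solution. Thresholds and iteration counts are illustrative; the actual KDD-LOAM pipeline is more involved.

```python
import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    """desc_a: (N, D), desc_b: (M, D). Returns index arrays (i, j) of mutual nearest neighbours."""
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)   # (N, M)
    a2b, b2a = dists.argmin(axis=1), dists.argmin(axis=0)
    idx_a = np.where(b2a[a2b] == np.arange(len(desc_a)))[0]
    return idx_a, a2b[idx_a]

def kabsch(src, dst):
    """Least-squares rigid transform (R, t) with R @ src_i + t ~= dst_i."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    U, _, Vt = np.linalg.svd((src - mu_s).T @ (dst - mu_d))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                    # fix an improper rotation (reflection)
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, mu_d - R @ mu_s

def ransac_registration(pts_a, pts_b, iters=200, inlier_thresh=0.1, seed=0):
    rng = np.random.default_rng(seed)
    best_inliers, best_Rt = -1, (np.eye(3), np.zeros(3))
    for _ in range(iters):
        sample = rng.choice(len(pts_a), size=3, replace=False)
        R, t = kabsch(pts_a[sample], pts_b[sample])
        resid = np.linalg.norm((pts_a @ R.T + t) - pts_b, axis=1)
        inliers = int((resid < inlier_thresh).sum())
        if inliers > best_inliers:
            best_inliers, best_Rt = inliers, (R, t)
    return best_Rt

# toy usage: match keypoints via descriptors, then fit the transform on the matched pairs
kp_a, kp_b = np.random.rand(100, 3), np.random.rand(120, 3)
desc_a, desc_b = np.random.rand(100, 32), np.random.rand(120, 32)
ia, ib = mutual_nn_matches(desc_a, desc_b)
R, t = ransac_registration(kp_a[ia], kp_b[ib])
```

Better, jointly learned descriptors raise the inlier ratio of these matches, which is why descriptor quality translates directly into odometry robustness.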

Subjective Face Transform using Human First Impressions

  • paper_url: http://arxiv.org/abs/2309.15381
  • repo_url: https://github.com/CV-Lehigh/subjective_face_transform
  • paper_authors: Chaitanya Roygaga, Joshua Krinsky, Kai Zhang, Kenny Kwok, Aparna Bharati
  • for: understand and explain biases in subjective interpretation of faces
  • methods: uses generative models to find semantically meaningful edits to a face image that change perceived attributes
  • results: demonstrates the generalizability of the approach by training on real and synthetic faces and evaluating on in-domain and out-of-domain images using predictive models and human ratings
    Abstract Humans tend to form quick subjective first impressions of non-physical attributes when seeing someone's face, such as perceived trustworthiness or attractiveness. To understand what variations in a face lead to different subjective impressions, this work uses generative models to find semantically meaningful edits to a face image that change perceived attributes. Unlike prior work that relied on statistical manipulation in feature space, our end-to-end framework considers trade-offs between preserving identity and changing perceptual attributes. It maps identity-preserving latent space directions to changes in attribute scores, enabling transformation of any input face along an attribute axis according to a target change. We train on real and synthetic faces, evaluate for in-domain and out-of-domain images using predictive models and human ratings, demonstrating the generalizability of our approach. Ultimately, such a framework can be used to understand and explain biases in subjective interpretation of faces that are not dependent on the identity.
    摘要 人们在看到一张人脸时,往往会迅速形成对其非物理属性的主观第一印象,例如感知到的可信度或吸引力。为了解人脸的哪些变化会导致不同的主观印象,本工作使用生成模型寻找对人脸图像具有语义意义的编辑,使其感知属性发生变化。与以往依赖特征空间统计操作的工作不同,我们的端到端框架在保持身份与改变感知属性之间进行权衡:它将保持身份的隐空间方向映射为属性分数的变化,从而可以按照目标变化量沿属性轴变换任意输入人脸。我们在真实和合成人脸上训练,并利用预测模型和人工评分对域内与域外图像进行评估,证明了方法的泛化能力。这样的框架最终可用于理解和解释不依赖于身份的人脸主观解读偏差。
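
The core transform can be sketched as moving a face's latent code along a learned, identity-preserving direction by an amount chosen to hit a target attribute change; the linear attribute probe and step-size search below are illustrative stand-ins for the paper's learned mapping.

```python
import numpy as np

def edit_latent(w, direction, attr_weights, attr_bias, target_delta, max_alpha=3.0, steps=61):
    """w: (D,) latent code; direction: (D,) identity-preserving edit direction (unit norm);
    the attribute score is modelled as a linear probe: score(w) = attr_weights @ w + attr_bias.
    Returns the edited latent whose predicted attribute change is closest to target_delta."""
    base_score = attr_weights @ w + attr_bias
    best_w, best_err = w, abs(target_delta)
    for alpha in np.linspace(-max_alpha, max_alpha, steps):
        w_edit = w + alpha * direction
        delta = (attr_weights @ w_edit + attr_bias) - base_score
        if abs(delta - target_delta) < best_err:
            best_w, best_err = w_edit, abs(delta - target_delta)
    return best_w

# toy usage: push the predicted attribute (e.g. perceived trustworthiness) up by +0.5 probe units
D = 512
w = np.random.randn(D)
direction = np.random.randn(D); direction /= np.linalg.norm(direction)
probe_w, probe_b = np.random.randn(D) * 0.01, 0.0
w_edited = edit_latent(w, direction, probe_w, probe_b, target_delta=0.5)
```

The edited latent would then be decoded by the generator; keeping `direction` identity-preserving is what separates this from naive attribute manipulation.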

Towards Foundation Models Learned from Anatomy in Medical Imaging via Self-Supervision

  • paper_url: http://arxiv.org/abs/2309.15358
  • repo_url: https://github.com/jlianglab/eden
  • paper_authors: Mohammad Reza Hosseinzadeh Taher, Michael B. Gotway, Jianming Liang
  • for: This paper aims to develop a foundation model for medical imaging that can “understand” human anatomy and possess fundamental properties of medical imaging.
  • methods: The authors propose a novel self-supervised learning (SSL) strategy that exploits the hierarchical nature of human anatomy, which encapsulates the intrinsic attributes of anatomical structures-locality and compositionality-within the embedding space.
  • results: The SSL pretrained model derived from the training strategy outperforms state-of-the-art (SOTA) fully/self-supervised baselines and enhances annotation efficiency, offering potential few-shot segmentation capabilities with performance improvements ranging from 9% to 30% for segmentation tasks compared to SSL baselines.
    Abstract Human anatomy is the foundation of medical imaging and boasts one striking characteristic: its hierarchy in nature, exhibiting two intrinsic properties: (1) locality: each anatomical structure is morphologically distinct from the others; and (2) compositionality: each anatomical structure is an integrated part of a larger whole. We envision a foundation model for medical imaging that is consciously and purposefully developed upon this foundation to gain the capability of "understanding" human anatomy and to possess the fundamental properties of medical imaging. As our first step in realizing this vision towards foundation models in medical imaging, we devise a novel self-supervised learning (SSL) strategy that exploits the hierarchical nature of human anatomy. Our extensive experiments demonstrate that the SSL pretrained model, derived from our training strategy, not only outperforms state-of-the-art (SOTA) fully/self-supervised baselines but also enhances annotation efficiency, offering potential few-shot segmentation capabilities with performance improvements ranging from 9% to 30% for segmentation tasks compared to SSL baselines. This performance is attributed to the significance of anatomy comprehension via our learning strategy, which encapsulates the intrinsic attributes of anatomical structures-locality and compositionality-within the embedding space, yet overlooked in existing SSL methods. All code and pretrained models are available at https://github.com/JLiangLab/Eden.
    摘要 人体解剖是医学成像的基础,具有一个突出的特点:其天然的层次结构,体现为两个内在属性:(1)局部性:每个解剖结构在形态上彼此不同;(2)组合性:每个解剖结构是更大整体的有机组成部分。我们设想一种有意识地建立在这一基础之上的医学成像基础模型,使其能够"理解"人体解剖并具备医学成像的基本属性。作为实现这一愿景的第一步,我们设计了一种利用人体解剖层次结构的新型自监督学习(SSL)策略。大量实验表明,由该训练策略得到的SSL预训练模型不仅超越了最先进的全监督/自监督基线,还提升了标注效率,在分割任务上相比SSL基线带来9%至30%的性能提升,并具备潜在的少样本分割能力。这一性能得益于我们的学习策略对解剖结构的理解:它将解剖结构的内在属性——局部性与组合性——编码进嵌入空间,而这一点在现有SSL方法中被忽视。所有代码和预训练模型可在 https://github.com/JLiangLab/Eden 获取。

Multimodal Dataset for Localization, Mapping and Crop Monitoring in Citrus Tree Farms

  • paper_url: http://arxiv.org/abs/2309.15332
  • repo_url: https://github.com/ucr-robotics/citrus-farm-dataset
  • paper_authors: Hanzhe Teng, Yipeng Wang, Xiaoao Song, Konstantinos Karydis
  • for: 这个论文主要用于开发自动化农业机器人系统,特别是在 citrus 树环境中进行地图建模、定位和农业监测等任务。
  • methods: 该论文使用了多modal的感知数据,包括RGB图像、深度图像、离子图像、热图像以及导航传感器数据,以及中心式位置定位的RTK。
  • results: 该论文提供了一个名为 CitrusFarm 的大型多modal 感知数据集,包括7个序列、3个田间、不同树种、不同植物排列和不同日light 条件,总共1.7小时的操作时间、7.5公里的距离和1.3TB的数据。
    Abstract In this work we introduce the CitrusFarm dataset, a comprehensive multimodal sensory dataset collected by a wheeled mobile robot operating in agricultural fields. The dataset offers stereo RGB images with depth information, as well as monochrome, near-infrared and thermal images, presenting diverse spectral responses crucial for agricultural research. Furthermore, it provides a range of navigational sensor data encompassing wheel odometry, LiDAR, inertial measurement unit (IMU), and GNSS with Real-Time Kinematic (RTK) as the centimeter-level positioning ground truth. The dataset comprises seven sequences collected in three fields of citrus trees, featuring various tree species at different growth stages, distinctive planting patterns, as well as varying daylight conditions. It spans a total operation time of 1.7 hours, covers a distance of 7.5 km, and constitutes 1.3 TB of data. We anticipate that this dataset can facilitate the development of autonomous robot systems operating in agricultural tree environments, especially for localization, mapping and crop monitoring tasks. Moreover, the rich sensing modalities offered in this dataset can also support research in a range of robotics and computer vision tasks, such as place recognition, scene understanding, object detection and segmentation, and multimodal learning. The dataset, in conjunction with related tools and resources, is made publicly available at https://github.com/UCR-Robotics/Citrus-Farm-Dataset.
    摘要 在这项工作中,我们介绍了一个全面的多Modal感知数据集,称为CitrusFarm数据集,由一辆滚动式移动机器人在农业场所中收集到的。该数据集包含了STEREO RGB图像和深度信息,以及灰度、近红外和热图像,这些图像具有多种光谱响应,对农业研究非常重要。此外,数据集还提供了一系列导航传感器数据,包括轮胎速度、LiDAR、惯性测量单元(IMU)和GNSS(实时准确定位),这些数据可以提供中心水平位置的准确性。数据集包括7个序列,收集在3个柑橘树场中,其中每个场景都有不同的树种、植物排列方式和不同的日light Conditions。总共耗时1.7小时,涵盖7.5公里的距离,总数据量为1.3TB。我们预计这个数据集可以帮助开发在农业树木环境中自动化机器人系统,特别是地图Localization、映射和耕作监测任务。此外,这个数据集中的丰富的感知modalities也可以支持机器人和计算机视觉相关的研究,例如地点认知、场景理解、物体检测和分割、多Modal学习等。数据集、相关工具和资源,通过https://github.com/UCR-Robotics/Citrus-Farm-Dataset进行公共发布。

BASED: Bundle-Adjusting Surgical Endoscopic Dynamic Video Reconstruction using Neural Radiance Fields

  • paper_url: http://arxiv.org/abs/2309.15329
  • repo_url: None
  • paper_authors: Shreya Saha, Sainan Liu, Shan Lin, Jingpei Lu, Michael Yip
  • for: 这篇论文旨在重构弹性场景从内镜视频中,以便实现无人操作的微创外科手术。
  • methods: 该方法采用神经辐射场(NeRF)方法学习3D隐藏表示场景,以满足动态和弹性场景的重建需求,并且可以处理不知情相机位置。
  • results: 经过多个实验数据集,该模型能够适应多种相机和场景设置,并且展示了在当前和未来 робо器外科系统中的承诺。
    Abstract Reconstruction of deformable scenes from endoscopic videos is important for many applications such as intraoperative navigation, surgical visual perception, and robotic surgery. It is a foundational requirement for realizing autonomous robotic interventions for minimally invasive surgery. However, previous approaches in this domain have been limited by their modular nature and are confined to specific camera and scene settings. Our work adopts the Neural Radiance Fields (NeRF) approach to learning 3D implicit representations of scenes that are both dynamic and deformable over time, and furthermore with unknown camera poses. We demonstrate this approach on endoscopic surgical scenes from robotic surgery. This work removes the constraints of known camera poses and overcomes the drawbacks of the state-of-the-art unstructured dynamic scene reconstruction technique, which relies on the static part of the scene for accurate reconstruction. Through several experimental datasets, we demonstrate the versatility of our proposed model to adapt to diverse camera and scene settings, and show its promise for both current and future robotic surgical systems.
    摘要 从内窥镜视频中重建可形变场景对术中导航、手术视觉感知和机器人手术等诸多应用十分重要,也是实现微创手术自主机器人干预的基础要求。然而,以往方法大多是模块化的,且局限于特定的相机和场景设置。我们的工作采用神经辐射场(NeRF)方法来学习场景的3D隐式表示,这些场景随时间动态且可形变,并且相机位姿未知。我们在机器人手术的内窥镜手术场景上验证了这一方法。该工作消除了对已知相机位姿的依赖,并克服了当前最先进的非结构化动态场景重建技术依赖场景静态部分才能准确重建的缺陷。通过多个实验数据集,我们展示了所提模型对多种相机与场景设置的适应能力,以及其在当前和未来机器人手术系统中的应用前景。

cs.AI - 2023-09-27

Masked autoencoders are scalable learners of cellular morphology

  • paper_url: http://arxiv.org/abs/2309.16064
  • repo_url: https://github.com/NumtraCG/614ca4eaa2b781088de64a5f20210923-160645routingmodel230921
  • paper_authors: Oren Kraus, Kian Kenyon-Dean, Saber Saberian, Maryam Fallah, Peter McLean, Jess Leung, Vasudev Sharma, Ayla Khan, Jia Balakrishnan, Safiye Celik, Maciej Sypetkowski, Chi Vicky Cheng, Kristen Morse, Maureen Makes, Ben Mabey, Berton Earnshaw
  • for: 这篇论文探讨了如何从高通量显微图像筛选实验中的细胞表型推断生物学关系,以及深度学习模型在该问题上的可扩展性。
  • methods: 论文比较了弱监督与自监督的深度学习方法,并评估了在更大数据集上训练更大模型时各类模型的扩展性与性能。
  • results: 结果显示,基于CNN和ViT的掩码自编码器显著优于弱监督模型;在最大规模下,相对提升可达28%。
    Abstract Inferring biological relationships from cellular phenotypes in high-content microscopy screens provides significant opportunity and challenge in biological research. Prior results have shown that deep vision models can capture biological signal better than hand-crafted features. This work explores how weakly supervised and self-supervised deep learning approaches scale when training larger models on larger datasets. Our results show that both CNN- and ViT-based masked autoencoders significantly outperform weakly supervised models. At the high-end of our scale, a ViT-L/8 trained on over 3.5-billion unique crops sampled from 95-million microscopy images achieves relative improvements as high as 28% over our best weakly supervised models at inferring known biological relationships curated from public databases.
    摘要 从高通量显微筛选实验的细胞表型中推断生物学关系,为生物研究带来了重要的机遇与挑战。已有结果表明,深度视觉模型比手工设计的特征能更好地捕捉生物信号。本工作探讨了弱监督与自监督深度学习方法在更大模型、更大数据集上训练时的扩展性。结果显示,基于CNN和ViT的掩码自编码器均显著优于弱监督模型。在最大规模下,一个在从9500万张显微图像中采样的超过35亿个独特裁剪图块上训练的ViT-L/8模型,在推断来自公共数据库整理的已知生物学关系时,相比我们最好的弱监督模型取得了高达28%的相对提升。

Towards Best Practices of Activation Patching in Language Models: Metrics and Methods

  • paper_url: http://arxiv.org/abs/2309.16042
  • repo_url: None
  • paper_authors: Fred Zhang, Neel Nanda
  • for: 本研究目的是强化机器学习模型的机制可读性,即理解模型内部的机制。
  • methods: 本研究使用活动补丁技术(也称作 causal tracing 或 interchange intervention)来实现机制可读性。
  • results: 研究发现,在不同的地方和电路发现任务中,不同的评价指标和损害方法会导致不同的可读性结果。同时,研究还提供了一些概念上的论述,以及未来Activation patching的最佳实践。
    Abstract Mechanistic interpretability seeks to understand the internal mechanisms of machine learning models, where localization -- identifying the important model components -- is a key step. Activation patching, also known as causal tracing or interchange intervention, is a standard technique for this task (Vig et al., 2020), but the literature contains many variants with little consensus on the choice of hyperparameters or methodology. In this work, we systematically examine the impact of methodological details in activation patching, including evaluation metrics and corruption methods. In several settings of localization and circuit discovery in language models, we find that varying these hyperparameters could lead to disparate interpretability results. Backed by empirical observations, we give conceptual arguments for why certain metrics or methods may be preferred. Finally, we provide recommendations for the best practices of activation patching going forwards.
    摘要 机制可解释性旨在理解机器学习模型的内部机制,其中定位——找出重要的模型组件——是关键一步。激活修补(也称因果追踪或交换干预)是完成这一任务的标准技术(Vig et al., 2020),但文献中存在许多变体,在超参数或方法选择上缺乏共识。本文系统地考察了激活修补中方法细节的影响,包括评价指标和破坏(corruption)方法。在语言模型的定位与电路发现等多种设置中,我们发现这些超参数的不同选择可能导致截然不同的可解释性结论。基于实证观察,我们给出了为何某些指标或方法更可取的概念性论证,并为今后的激活修补实践提出了建议。
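
A self-contained sketch of the basic activation-patching loop on a toy PyTorch model: cache an activation from the clean run, splice it into the corrupted run with a forward hook, and measure how much of the clean behaviour is restored. The metric (logit difference) and corruption (a different random input) are illustrative, and are exactly the kind of choices the paper shows can change the conclusions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 2))
clean_x, corrupt_x = torch.randn(1, 8), torch.randn(1, 8)
layer_to_patch = model[2]                      # the component whose importance we probe

# 1) cache the target layer's activation on the clean input
cache = {}
def save_act(module, inputs, output):
    cache["act"] = output.detach()
h = layer_to_patch.register_forward_hook(save_act)
with torch.no_grad():
    clean_logits = model(clean_x)
h.remove()

# 2) run the corrupted input, splicing in the cached clean activation at that layer
h = layer_to_patch.register_forward_hook(lambda m, i, o: cache["act"])  # returned tensor replaces the output
with torch.no_grad():
    patched_logits = model(corrupt_x)
h.remove()

# 3) baseline: fully corrupted run with no patching
with torch.no_grad():
    corrupt_logits = model(corrupt_x)

# 4) evaluation metric: fraction of the clean logit difference restored by the patch
def logit_diff(logits):
    return (logits[0, 0] - logits[0, 1]).item()

restored = (logit_diff(patched_logits) - logit_diff(corrupt_logits)) / \
           (logit_diff(clean_logits) - logit_diff(corrupt_logits) + 1e-9)
print(f"fraction of clean behaviour restored by patching this layer: {restored:.2f}")
```

Sweeping `layer_to_patch` over components and comparing different metrics/corruptions reproduces, in miniature, the methodological comparison the paper carries out.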

MedEdit: Model Editing for Medical Question Answering with External Knowledge Bases

  • paper_url: http://arxiv.org/abs/2309.16035
  • repo_url: None
  • paper_authors: Yucheng Shi, Shaochen Xu, Zhengliang Liu, Tianming Liu, Xiang Li, Ninghao Liu
  • for: 提高大语言模型(LLM)在医疗问答(QA)任务上的表现,并且不需要进行 fine-tuning 或 retraining。
  • methods: 利用上下文学习(in-context learning)进行模型编辑,并将从医疗知识库检索到的事实加入查询提示中,以改进 LLM 的回答。
  • results: 对使用 MedQA-SMILE dataset进行医疗 QA 的 edited Vicuna 模型,其精度从 44.46% 提高到 48.54%。这项研究表明,模型编辑可以提高 LLM 的表现,并提供一种实用的方法来 Mitigate 黑盒 LLM 的挑战。
    Abstract Large Language Models (LLMs), although powerful in general domains, often perform poorly on domain-specific tasks like medical question answering (QA). Moreover, they tend to function as "black-boxes," making it challenging to modify their behavior. Addressing this, our study delves into model editing utilizing in-context learning, aiming to improve LLM responses without the need for fine-tuning or retraining. Specifically, we propose a comprehensive retrieval strategy to extract medical facts from an external knowledge base, and then we incorporate them into the query prompt for the LLM. Focusing on medical QA using the MedQA-SMILE dataset, we evaluate the impact of different retrieval models and the number of facts provided to the LLM. Notably, our edited Vicuna model exhibited an accuracy improvement from 44.46% to 48.54%. This work underscores the potential of model editing to enhance LLM performance, offering a practical approach to mitigate the challenges of black-box LLMs.
    摘要 大型语言模型(LLM)虽然在通用领域上表现出色,但在具体领域任务如医疗问答(QA)中表现不佳。此外,它们往往 behave like "黑盒子",使其行为修改困难。为了解决这问题,我们的研究探讨了模型编辑,以提高 LLM 的回答质量,不需要 Fine-tuning 或 Retraining。我们提议了一种完整的检索策略,将医学知识库中的医学事实EXTRACTED并提供给 LLM 作为查询提示。我们在使用 MedQA-SMILE 数据集进行医学问答任务中,评估了不同的检索模型和提供给 LLM 的事实数量对性能的影响。结果显示,我们修改后的 Vicuna 模型的准确率从 44.46% 提高到 48.54%。这种研究证明了模型编辑的潜在作用,为黑盒子 LLM 带来了实际的解决方案。
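
A minimal sketch of the retrieve-then-prompt pattern the paper builds on: embed the question, pull the top-k most similar facts from an external medical knowledge base by cosine similarity, and prepend them to the query prompt. The embedding function and fact store here are stand-ins; the paper's retrieval models and prompt templates differ.

```python
import numpy as np

def cosine_top_k(query_vec, fact_vecs, k=3):
    q = query_vec / (np.linalg.norm(query_vec) + 1e-9)
    f = fact_vecs / (np.linalg.norm(fact_vecs, axis=1, keepdims=True) + 1e-9)
    return np.argsort(-(f @ q))[:k]

def build_prompt(question, facts, embed, k=3):
    """embed: any callable mapping text -> 1D vector (e.g. a sentence encoder)."""
    fact_vecs = np.stack([embed(f) for f in facts])
    top = cosine_top_k(embed(question), fact_vecs, k=k)
    context = "\n".join(f"- {facts[i]}" for i in top)
    return (f"Relevant medical facts:\n{context}\n\n"
            f"Question: {question}\nAnswer with the best option and a brief rationale.")

# toy usage with a dummy hashing 'embedder' just to keep the sketch self-contained
def toy_embed(text, dim=64):
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(dim)

facts = ["Metformin is first-line therapy for type 2 diabetes.",
         "ACE inhibitors can cause a dry cough.",
         "Warfarin requires INR monitoring."]
print(build_prompt("Which drug commonly causes a dry cough?", facts, toy_embed, k=2))
```

Because the LLM itself is never fine-tuned, the number and quality of retrieved facts become the main levers on accuracy, matching the ablation the paper reports.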

Symbolic Imitation Learning: From Black-Box to Explainable Driving Policies

  • paper_url: http://arxiv.org/abs/2309.16025
  • repo_url: None
  • paper_authors: Iman Sharifi, Saber Fallah
  • for: 提高自动驾驶系统的可靠性和安全性
  • methods: 使用归纳逻辑编程(ILP)从现有数据集中学习透明、可解释且可泛化的驾驶策略
  • results: 提高了驾驶策略的可解释性和泛化性,并在不同驾驶情况下显著提升了策略的适用性
    Abstract Current methods of imitation learning (IL), primarily based on deep neural networks, offer efficient means for obtaining driving policies from real-world data but suffer from significant limitations in interpretability and generalizability. These shortcomings are particularly concerning in safety-critical applications like autonomous driving. In this paper, we address these limitations by introducing Symbolic Imitation Learning (SIL), a groundbreaking method that employs Inductive Logic Programming (ILP) to learn driving policies which are transparent, explainable and generalisable from available datasets. Utilizing the real-world highD dataset, we subject our method to a rigorous comparative analysis against prevailing neural-network-based IL methods. Our results demonstrate that SIL not only enhances the interpretability of driving policies but also significantly improves their applicability across varied driving situations. Hence, this work offers a novel pathway to more reliable and safer autonomous driving systems, underscoring the potential of integrating ILP into the domain of IL.
    摘要 当前的模仿学习(IL)方法主要基于深度神经网络,能够高效地从真实数据中获取驾驶策略,但在可解释性和泛化性方面存在明显局限。这些局限在自动驾驶等安全攸关应用中尤其令人担忧。本文通过引入符号模仿学习(SIL)来解决这些限制:SIL 使用归纳逻辑编程(ILP)从可用数据集中学习透明、可解释且可泛化的驾驶策略。我们使用真实世界的 highD 数据集,将该方法与主流的基于神经网络的 IL 方法进行了严格的对比分析。结果表明,SIL 不仅提高了驾驶策略的可解释性,还显著改善了其在不同驾驶情况下的适用性。因此,这项工作为更可靠、更安全的自动驾驶系统提供了一条新途径,凸显了将 ILP 融入 IL 领域的潜力。

Clinical Trial Recommendations Using Semantics-Based Inductive Inference and Knowledge Graph Embeddings

  • paper_url: http://arxiv.org/abs/2309.15979
  • repo_url: None
  • paper_authors: Murthy V. Devarakonda, Smita Mohanty, Raja Rao Sunkishala, Nag Mallampalli, Xiong Liu
  • for: 本研究的目的是提出一种新的临床试验设计方法,通过对临床试验记录的探索性挖掘,为设计新的临床试验提供建议。
  • methods: 本研究提出一种基于神经嵌入的新推荐方法,利用临床试验数据构建知识图并训练知识图嵌入(KGE),研究了多种KGE方法的效果,并提出一种基于KGE的归纳推理与推荐方法。
  • results: 研究结果显示,该推荐方法可以达到70%-83%的相关性分数,并且在实际临床试验元素中找到最相关的建议。此外,研究还发现可以通过节点 semantics 进行训练,以提高KGE的性能。
    Abstract Designing a new clinical trial entails many decisions, such as defining a cohort and setting the study objectives to name a few, and therefore can benefit from recommendations based on exhaustive mining of past clinical trial records. Here, we propose a novel recommendation methodology, based on neural embeddings trained on a first-of-a-kind knowledge graph of clinical trials. We addressed several important research questions in this context, including designing a knowledge graph (KG) for clinical trial data, effectiveness of various KG embedding (KGE) methods for it, a novel inductive inference using KGE, and its use in generating recommendations for clinical trial design. We used publicly available data from clinicaltrials.gov for the study. Results show that our recommendations approach achieves relevance scores of 70%-83%, measured as the text similarity to actual clinical trial elements, and the most relevant recommendation can be found near the top of list. Our study also suggests potential improvement in training KGE using node semantics.
    摘要 设计一项新的临床试验需要做出许多决策,例如定义受试人群、设定试验目标等,因此可以从对既往临床试验记录的全面挖掘中获得建议。我们提出了一种新的推荐方法,基于在首个临床试验知识图上训练的神经嵌入。我们在此背景下回答了多个重要研究问题,包括如何为临床试验数据设计知识图(KG)、各种知识图嵌入(KGE)方法的有效性、一种基于KGE的新型归纳推理方法,以及其在生成临床试验设计建议中的应用。研究使用了 clinicaltrials.gov 的公开数据。结果显示,我们的推荐方法可以达到70%-83%的相关性分数(以与实际临床试验要素的文本相似度衡量),且最相关的推荐通常出现在列表顶部附近。研究还表明,利用节点语义训练KGE有望进一步提升性能。
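
To illustrate how knowledge-graph embeddings can drive recommendations, the sketch below scores candidate tail entities (e.g., eligibility criteria) for a (trial, relation, ?) query with a TransE-style translational scorer. TransE is used here only as a familiar example of a KGE method; the paper evaluates its own set of embedding models.

```python
import numpy as np

def transe_scores(head_vec, rel_vec, candidate_tail_vecs):
    """TransE intuition: for a true triple, head + relation ~ tail.
    Higher (less negative) score means a more plausible candidate tail."""
    return -np.linalg.norm(head_vec + rel_vec - candidate_tail_vecs, axis=1)

def recommend(head_vec, rel_vec, candidate_names, candidate_vecs, top_k=5):
    scores = transe_scores(head_vec, rel_vec, candidate_vecs)
    order = np.argsort(-scores)[:top_k]
    return [(candidate_names[i], float(scores[i])) for i in order]

# toy usage: rank 1000 hypothetical criteria for a new-trial design query
dim = 128
rng = np.random.default_rng(0)
trial_vec, has_criterion_vec = rng.standard_normal(dim), rng.standard_normal(dim)
names = [f"criterion_{i}" for i in range(1000)]
vecs = rng.standard_normal((1000, dim))
print(recommend(trial_vec, has_criterion_vec, names, vecs))
```

Relevance can then be measured, as in the paper, by comparing the top-ranked recommendations against the elements of actual trial protocols.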

Resilience of Deep Learning applications: a systematic survey of analysis and hardening techniques

  • paper_url: http://arxiv.org/abs/2309.16733
  • repo_url: None
  • paper_authors: Cristiana Bolchini, Luca Cassano, Antonio Miele
  • for: 本研究探讨了深度学习(一种人工智能技术)对硬件错误的抵御性,系统地对现有相关研究进行了思考和回顾。
  • methods: 本研究采用了一种分类框架来解释和 highlight研究相似之处和特点,基于多个参数,包括研究主要目标、采用的错误和缺陷模型、其可重现性。
  • results: 本研究结果表明,目前的研究主要集中在深度学习对硬件错误的抵御性方面,并采用了多种方法来解决这些问题。但是,还有一些未解决的问题和挑战需要在未来进行研究。
    Abstract Machine Learning (ML) is currently being exploited in numerous applications being one of the most effective Artificial Intelligence (AI) technologies, used in diverse fields, such as vision, autonomous systems, and alike. The trend motivated a significant amount of contributions to the analysis and design of ML applications against faults affecting the underlying hardware. The authors investigate the existing body of knowledge on Deep Learning (among ML techniques) resilience against hardware faults systematically through a thoughtful review in which the strengths and weaknesses of this literature stream are presented clearly and then future avenues of research are set out. The review is based on 163 scientific articles published between January 2019 and March 2023. The authors adopt a classifying framework to interpret and highlight research similarities and peculiarities, based on several parameters, starting from the main scope of the work, the adopted fault and error models, to their reproducibility. This framework allows for a comparison of the different solutions and the identification of possible synergies. Furthermore, suggestions concerning the future direction of research are proposed in the form of open challenges to be addressed.
    摘要 根据2019年1月至2023年3月发表的163篇科学文献,作者采用了一个分类框架来解读和 highlight研究的相似性和特点,包括研究的主要范围、采用的缺陷和错误模型、其重复性等多个参数。这个框架允许对不同的解决方案进行比较,并提出了可能的共同点。此外,作者还提出了未来研究的方向和开放挑战。

Unified Long-Term Time-Series Forecasting Benchmark

  • paper_url: http://arxiv.org/abs/2309.15946
  • repo_url: https://github.com/MIMUW-RL/Unified-Long-Horizon-Time-Series-Benchmark
  • paper_authors: Jacek Cyranka, Szymon Haponiuk
  • for: 这个论文是为了提高机器学习方法的时间序列预测能力而设计的,并提供了一个完整的数据集,用于验证这些方法。
  • methods: 这个论文使用了多种不同的机器学习模型,包括LSTM、DeepAR、NLinear、N-Hits、PatchTST和LatentODE等,并进行了广泛的比较分析以决定这些模型在不同情况下的效果。
  • results: 这个论文的结果显示了不同模型在不同数据集下的表现,并发现了一些模型在某些情况下的优化。此外,论文还引入了一个自定义的潜在NLinear模型和将DeepAR加以课程学习阶段,both consistently outperform their vanilla counterparts。
    Abstract In order to support the advancement of machine learning methods for predicting time-series data, we present a comprehensive dataset designed explicitly for long-term time-series forecasting. We incorporate a collection of datasets obtained from diverse, dynamic systems and real-life records. Each dataset is standardized by dividing it into training and test trajectories with predetermined lookback lengths. We include trajectories of length up to $2000$ to ensure a reliable evaluation of long-term forecasting capabilities. To determine the most effective model in diverse scenarios, we conduct an extensive benchmarking analysis using classical and state-of-the-art models, namely LSTM, DeepAR, NLinear, N-Hits, PatchTST, and LatentODE. Our findings reveal intriguing performance comparisons among these models, highlighting the dataset-dependent nature of model effectiveness. Notably, we introduce a custom latent NLinear model and enhance DeepAR with a curriculum learning phase. Both consistently outperform their vanilla counterparts.
    摘要 为支持机器学习方法预测时间序列数据的进步,我们提供了专门为长期时间序列预测设计的完整数据集。我们收集了来自多种动态系统和实际记录的多个数据集,并对每个数据集进行标准化,将它们分为训练和测试曲线的训练和测试轨迹,使用预定的回看长度。我们的数据集包括长度达2000的轨迹,以确保可靠地评估长期预测能力。我们使用经典和当前最佳模型,包括LSTM、DeepAR、NLinear、N-Hits、PatchTST和LatentODE进行广泛的比较分析,发现这些模型在不同的场景中表现出有趣的比较。特别是,我们提出了一种自定义隐藏的NLinear模型和对DeepAR进行课程学习阶段的改进,两者都能在其基础模型中提高表现。
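
A small sketch of the windowing convention such a benchmark standardizes on: each trajectory is cut into (lookback, horizon) pairs, and whole trajectories (not windows) are assigned to train or test so that long-horizon evaluation never leaks future context. The exact lookback and horizon values are placeholders.

```python
import numpy as np

def make_windows(series, lookback=336, horizon=96):
    """series: (T, D) multivariate trajectory. Returns X: (N, lookback, D), Y: (N, horizon, D)."""
    X, Y = [], []
    for start in range(0, len(series) - lookback - horizon + 1):
        X.append(series[start:start + lookback])
        Y.append(series[start + lookback:start + lookback + horizon])
    return np.array(X), np.array(Y)

def split_trajectories(trajectories, test_frac=0.2, seed=0):
    """Assign whole trajectories to train/test so test windows come from unseen dynamics."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(trajectories))
    n_test = max(1, int(len(trajectories) * test_frac))
    test_ids, train_ids = idx[:n_test], idx[n_test:]
    return [trajectories[i] for i in train_ids], [trajectories[i] for i in test_ids]

# toy usage: 10 trajectories of length 2000 with 3 channels
trajs = [np.random.randn(2000, 3) for _ in range(10)]
train_trajs, test_trajs = split_trajectories(trajs)
X_train, Y_train = make_windows(train_trajs[0], lookback=336, horizon=96)
```

Fixing the lookback length and split per dataset is what makes models such as LSTM, DeepAR, and PatchTST directly comparable in the benchmark.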

Towards Efficient and Trustworthy AI Through Hardware-Algorithm-Communication Co-Design

  • paper_url: http://arxiv.org/abs/2309.15942
  • repo_url: None
  • paper_authors: Bipin Rajendran, Osvaldo Simeone, Bashir M. Al-Hashimi
  • for: 这篇论文的目的是提出一种基于硬件和软件设计的高效可靠人工智能(AI)算法,以提高AI模型的可靠性和不确定性评估。
  • methods: 该论文提出了一些研究方向,包括将物理知识integrated into计算基础结构,采用神经科学原则来实现高效信息处理,使用信息论和通信论的结果来估计 uncertainty,并采用分布式处理的通信论指南。
  • results: 该论文认为,通过采用新的设计方法,可以不仅提高AI模型的准确率,还可以提供可靠的不确定性评估。此外,该论文还提出了一些基于新 computing 架构的高效可靠AI算法,包括卷积神经网络、听频神经网络和量子计算技术。
    Abstract Artificial intelligence (AI) algorithms based on neural networks have been designed for decades with the goal of maximising some measure of accuracy. This has led to two undesired effects. First, model complexity has risen exponentially when measured in terms of computation and memory requirements. Second, state-of-the-art AI models are largely incapable of providing trustworthy measures of their uncertainty, possibly `hallucinating' their answers and discouraging their adoption for decision-making in sensitive applications. With the goal of realising efficient and trustworthy AI, in this paper we highlight research directions at the intersection of hardware and software design that integrate physical insights into computational substrates, neuroscientific principles concerning efficient information processing, information-theoretic results on optimal uncertainty quantification, and communication-theoretic guidelines for distributed processing. Overall, the paper advocates for novel design methodologies that target not only accuracy but also uncertainty quantification, while leveraging emerging computing hardware architectures that move beyond the traditional von Neumann digital computing paradigm to embrace in-memory, neuromorphic, and quantum computing technologies. An important overarching principle of the proposed approach is to view the stochasticity inherent in the computational substrate and in the communication channels between processors as a resource to be leveraged for the purpose of representing and processing classical and quantum uncertainty.
    摘要 人工智能(AI)算法基于神经网络已经在数十年中设计的目标是最大化某种精度指标。这两个不良效果:首先,模型复杂性 exponentiates 计算和内存需求。其次,当前的AI模型几乎无法提供可靠的不确定度评估,可能会“幻见”答案,这使得它们在敏感应用中无法得到采用。为实现高效可靠的AI,本文提出了融合硬件和软件设计的研究方向。这些方向包括:1. 基于物理学的启发,设计计算substrate,以提高计算效率和可靠性。2. 基于神经科学的原理,设计硬件和软件结构,以提高信息处理效率。3. 基于信息论的结果,使用最佳的不确定度量化方法,以提高模型的可靠性。4. 基于通信论的指导原则,设计分布式处理的方法,以提高模型的可靠性和可重复性。总的来说,本文提出了一种新的设计方法,该方法不仅考虑精度,还考虑不确定度量化。此外,该方法还利用新的计算硬件技术,例如半导体、神经元和量子计算技术,以超越传统的沃尔夫尼亚姆数字计算模式。一个重要的总体原则是视计算substrate和通信频道之间的随机性为可以利用的资源,以便用于表示和处理类型的不确定性。

SHACIRA: Scalable HAsh-grid Compression for Implicit Neural Representations

  • paper_url: http://arxiv.org/abs/2309.15848
  • repo_url: https://github.com/Sharath-girish/Shacira
  • paper_authors: Sharath Girish, Abhinav Shrivastava, Kamal Gupta
  • for: 这 paper 旨在提出一种任务不受限制的框架,用于压缩Instant-NGP中的学习特征网格,以提高存储和流处理应用程序中的效率。
  • methods: 这 paper 使用了量化 latent веса的重parameterization和熵 regularization来实现压缩,而不需要额外的post验签/量化阶段。
  • results: 实验结果表明,我们的方法可以在多个领域中实现高度压缩,而不需要大量的数据或域特有的规则。我们的项目页面可以在 http://shacira.github.io 找到。
    Abstract Implicit Neural Representations (INR) or neural fields have emerged as a popular framework to encode multimedia signals such as images and radiance fields while retaining high-quality. Recently, learnable feature grids proposed by Instant-NGP have allowed significant speed-up in the training as well as the sampling of INRs by replacing a large neural network with a multi-resolution look-up table of feature vectors and a much smaller neural network. However, these feature grids come at the expense of large memory consumption which can be a bottleneck for storage and streaming applications. In this work, we propose SHACIRA, a simple yet effective task-agnostic framework for compressing such feature grids with no additional post-hoc pruning/quantization stages. We reparameterize feature grids with quantized latent weights and apply entropy regularization in the latent space to achieve high levels of compression across various domains. Quantitative and qualitative results on diverse datasets consisting of images, videos, and radiance fields, show that our approach outperforms existing INR approaches without the need for any large datasets or domain-specific heuristics. Our project page is available at http://shacira.github.io .
    摘要 含义表示(INR)或神经场已成为编码多媒体信号(图像和辐射场)的受欢迎框架,保持高质量。近些年,可学习特征格子提出了可学习特征格子,以替代大型神经网络,从而实现训练和采样INR的速度增快。然而,这些特征格子带来大量内存占用,可能对存储和流动应用造成瓶颈。在这种情况下,我们提出了SHACIRA,一个简单 yet有效的任务无关框架,用于压缩特征格子,无需额外的后处剖分/量化阶段。我们将特征格子映射到量化的幂Weight中,并在幂空间应用Entropy抑制来实现高度压缩。在多个领域中,包括图像、视频和辐射场,我们的方法与现有INR方法进行比较,并且不需要大量的数据或域特定的规则。更多信息可以通过我们的项目页面http://shacira.github.io/获取。

Examining the Values Reflected by Children during AI Problem Formulation

  • paper_url: http://arxiv.org/abs/2309.15839
  • repo_url: None
  • paper_authors: Utkarsh Dwivedi, Salma Elsayed-ali, Elizabeth Bonsignore, Hernisa Kacorri
  • for: 这个论文的目的是了解儿童在设计和训练AIInterface时所优先的目标和价值观。
  • methods: 论文使用了合作设计方法和修改后的故事板来让儿童和成年合作者在AI问题定义上进行活动。
  • results: 研究发现儿童的提议中含有高级的系统智能,如感知和理解用户的社交关系。儿童的想法表明他们关心家庭和期望机器能够理解社交上下文。
    Abstract Understanding how children design and what they value in AI interfaces that allow them to explicitly train their models such as teachable machines, could help increase such activities' impact and guide the design of future technologies. In a co-design session using a modified storyboard, a team of 5 children (aged 7-13 years) and adult co-designers, engaged in AI problem formulation activities where they imagine their own teachable machines. Our findings, leveraging an established psychological value framework (the Rokeach Value Survey), illuminate how children conceptualize and embed their values in AI systems that they themselves devise to support their everyday activities. Specifically, we find that children's proposed ideas require advanced system intelligence, e.g. emotion detection and understanding the social relationships of a user. The underlying models could be trained under multiple modalities and any errors would be fixed by adding more data or by anticipating negative examples. Children's ideas showed they cared about family and expected machines to understand their social context before making decisions.
    摘要 理解儿童在设计和训练AI模型时所价值的内容,可以帮助提高这些活动的影响力并导向未来技术的设计。在一个 modify 的故事板 session 中,一组5名儿童(年龄7-13岁)和成年合作设计者,参与了 AI 问题定制活动,在假设自己的教程机器人时。我们的发现,基于已成立的心理价值框架(Rokeach Value Survey),揭示了儿童如何概念化并嵌入自己的价值观在自己设计的 AI 系统中。特别是,儿童的提出的想法需要高级的系统智能,例如情感检测和理解用户的社交关系。这些基础模型可以在多种感知模式下训练,并且任何错误都可以通过添加更多数据或预期负例来修复。儿童的想法表明他们关心家庭和期望机器人能够理解其社交上下文,在做出决策之前。

OrthoPlanes: A Novel Representation for Better 3D-Awareness of GANs

  • paper_url: http://arxiv.org/abs/2309.15830
  • repo_url: None
  • paper_authors: Honglin He, Zhuoqian Yang, Shikai Li, Bo Dai, Wayne Wu
  • for: 这个论文的目的是为了生成高精度的、视角一致的3D图像。
  • methods: 这个论文提出了一种hybrid的显式-隐式表示方法,称为OrthoPlanes,它可以高效地通过修改2D StyleGANs来生成细节rich的3D信息。
  • results: 实验表明,这个方法可以处理更加困难的视角和生成高度自由的静止物体图像,并且在FFHQ和SHHQ数据集上达到了状态 искусственный智能水平。项目页面:https://orthoplanes.github.io/
    Abstract We present a new method for generating realistic and view-consistent images with fine geometry from 2D image collections. Our method proposes a hybrid explicit-implicit representation called \textbf{OrthoPlanes}, which encodes fine-grained 3D information in feature maps that can be efficiently generated by modifying 2D StyleGANs. Compared to previous representations, our method has better scalability and expressiveness with clear and explicit information. As a result, our method can handle more challenging view-angles and synthesize articulated objects with high spatial degree of freedom. Experiments demonstrate that our method achieves state-of-the-art results on FFHQ and SHHQ datasets, both quantitatively and qualitatively. Project page: \url{https://orthoplanes.github.io/}.
    摘要 我们提出了一种新的方法,可以生成具有细腻的三维信息的真实和视角一致的图像集。我们的方法使用名为“OrthoPlanes”的混合显式隐式表示方式,可以快速地由修改2D StyleGANs生成细节rich的特征图。相比之前的表示方法,我们的方法具有更好的扩展性和表达能力,并且有明确的信息。因此,我们的方法可以更好地处理更加困难的视角和 sintheSize articulated objects with high spatial degree of freedom。实验结果表明,我们的方法在FFHQ和SHHQ数据集上达到了现状之册的Result, both quantitatively and qualitatively。项目页面:\url{https://orthoplanes.github.io/}.

Lyra: Orchestrating Dual Correction in Automated Theorem Proving

  • paper_url: http://arxiv.org/abs/2309.15806
  • repo_url: https://github.com/chuanyang-zheng/lyra-theorem-prover
  • paper_authors: Chuanyang Zheng, Haiming Wang, Enze Xie, Zhengying Liu, Jiankai Sun, Huajian Xin, Jianhao Shen, Zhenguo Li, Yu Li
  • for: 这个论文的目的是提高大型自然语言模型(LLMs)在正式证明领域的效果,特别是避免幻觉和证明错误的反馈。
  • methods: 这个论文提出了一个新的框架 called Lyra,它使用了两种不同的修正机制:工具修正(TC)和推测修正(CC)。TC 使用先前知识来利用预定义的证明工具(如 Sledgehammer)来指导错误工具的更换,以避免幻觉。CC 是一种错误反馈机制,用于与证明器交互,以改进正式证明 conjecture。
  • results: 该论文的方法在 miniF2F 验证和测试集上达到了当前最佳性能(SOTA),从48.0% 提高到55.3% 和从45.5% 提高到51.2%。此外,论文还解决了三个国际数学奥林匹克(IMO)问题。
    Abstract Large Language Models (LLMs) present an intriguing avenue for exploration in the field of formal theorem proving. Nevertheless, their full potential, particularly concerning the mitigation of hallucinations and refinement through prover error messages, remains an area that has yet to be thoroughly investigated. To enhance the effectiveness of LLMs in the field, we introduce the Lyra, a new framework that employs two distinct correction mechanisms: Tool Correction (TC) and Conjecture Correction (CC). To implement Tool Correction in the post-processing of formal proofs, we leverage prior knowledge to utilize predefined prover tools (e.g., Sledgehammer) for guiding the replacement of incorrect tools. Tool Correction significantly contributes to mitigating hallucinations, thereby improving the overall accuracy of the proof. In addition, we introduce Conjecture Correction, an error feedback mechanism designed to interact with prover to refine formal proof conjectures with prover error messages. Compared to the previous refinement framework, the proposed Conjecture Correction refines generation with instruction but does not collect paired (generation, error & refinement) prompts. Our method has achieved state-of-the-art (SOTA) performance on both miniF2F validation (48.0% -> 55.3%) and test (45.5% -> 51.2%). We also present 3 IMO problems solved by Lyra. We believe Tool Correction (post-process for hallucination mitigation) and Conjecture Correction (subgoal adjustment from interaction with environment) could provide a promising avenue for future research in this field.
    摘要 大型语言模型(LLM)在形式证明领域的探索具有吸引力,但它们的全面潜力,特别是通过证明错误消息修正和抑制幻觉,仍然是一个尚未被全面探索的领域。为了增强LLM在这个领域的效果,我们介绍了一个新的框架,即Lyra,它使用了两种不同的修正机制:工具修正(TC)和推测修正(CC)。为了在后期处理中使用工具修正,我们利用了先前知识,使用预定的证明工具(例如Sledgehammer)来指导错误工具的更换。工具修正在幻觉缓解方面做出了重要贡献,从而提高了证明的总准确性。此外,我们引入了推测修正,这是一种基于证明错误消息的错误反馈机制,可以与证明进行互动,以修正正式证明的推测。与之前的修复框架相比,我们的提案的推测修正不需要收集配对(生成、错误和修复)的示例。我们的方法在miniF2F验证中达到了状态的最佳性能(SOTA),从48.0%提高到55.3%,以及在测试中从45.5%提高到51.2%。我们还展示了3个IMO问题,由Lyra解决。我们认为工具修正(幻觉缓解)和推测修正(从环境交互修正)是未来这个领域的有前途的研究方向。

AI in Software Engineering: Case Studies and Prospects

  • paper_url: http://arxiv.org/abs/2309.15768
  • repo_url: None
  • paper_authors: Lei Wang
  • for: 本文旨在研究人工智能(AI)和软件工程(SE)之间的关系,以及如何应用AI技术在软件开发中提高软件产品质量。
  • methods: 本文分析了两个案例研究:IBM watson和Google AlphaGo,它们都使用了不同的AI技术来解决现实世界中的挑战问题。
  • results: 研究发现,使用AI技术如深度学习和机器学习在软件系统中可以实现智能系统。IBM watson采用了“决策支持”策略,帮助人类做出决策;AlphaGo则使用了“自动决策”选择操作,以实现最佳结果。此外,AlphaGo还使用了神经网络和强化学习来模仿人脑,这可能在医学研究中用于诊断和治疗。然而,我们还需要很长的时间来复制人脑在机器中,因为人脑和机器是内在不同的。
    Abstract Artificial intelligence (AI) and software engineering (SE) are two important areas in computer science. In recent years, researchers are trying to apply AI techniques in various stages of software development to improve the overall quality of software products. Moreover, there are also some researchers focus on the intersection between SE and AI. In fact, the relationship between SE and AI is very weak; however, methods and techniques in one area have been adopted in another area. More and more software products are capable of performing intelligent behaviour like human beings. In this paper, two cases studies which are IBM Watson and Google AlphaGo that use different AI techniques in solving real world challenging problems have been analysed, evaluated and compared. Based on the analysis of both case studies, using AI techniques such as deep learning and machine learning in software systems contributes to intelligent systems. Watson adopts 'decision making support' strategy to help human make decisions; whereas AlphaGo uses 'self-decision making' to choose operations that contribute to the best outcome. In addition, Watson learns from man-made resources such as paper; AlphaGo, on the other hand, learns from massive online resources such as photos. AlphaGo uses neural networks and reinforcement learning to mimic human brain, which might be very useful in medical research for diagnosis and treatment. However, there is still a long way to go if we want to reproduce human brain in machine and view computers as thinkers, because human brain and machines are intrinsically different. It would be more promising to see whether computers and software systems will become more and more intelligent to help with real world challenging problems that human beings cannot do.
    摘要 人工智能(AI)和软件工程(SE)是计算机科学中两个重要领域。近年来,研究人员尝试将AI技术应用于软件开发的不同阶段,以提高软件产品的总质量。此外,还有一些研究人员关注SE和AI的交叉点。事实上,SE和AI之间的关系很弱,但是一个领域的方法和技术往往被另一个领域采纳。逐渐增多的软件产品可以展现出人类智能的行为。本文分析了IBM Watson和Google AlphaGo两个案例,它们使用了不同的AI技术解决实际世界问题。根据两个案例的分析,使用AI技术如深度学习和机器学习在软件系统中带来智能系统。Watson采用了“决策支持”策略,以帮助人类做出决策;AlphaGo则使用“自动决策”选择操作,以实现最佳结果。此外,Watson从人类制作的资源学习,如文献;AlphaGo则从大量在线资源学习,如照片。AlphaGo使用神经网络和强化学习模仿人脑,可能在医学研究中非常有用于诊断和治疗。然而,我们还很遥远才能复制人脑机器,因为人脑和机器是内在不同的。可能更有前途的是看看计算机和软件系统会变得越来越智能,以帮助实际世界中的问题。

Borges and AI

  • paper_url: http://arxiv.org/abs/2310.01425
  • repo_url: None
  • paper_authors: Léon Bottou, Bernhard Schölkopf
  • for: 这篇论文主要是为了探讨大型自然语言模型(LLM)是人工智能(AI)的开端,以及这种技术的可能性和威胁。
  • methods: 本论文使用了 Jorge Luis Borges 的文学创作作为视角,以帮助理解大型自然语言模型和人工智能之间的关系。
  • results: 本论文提出了一种新的视角,即通过 Borges 的文学创作来理解大型自然语言模型和人工智能之间的关系,从而帮助我们更深入理解这种技术的潜在可能性和威胁。
    Abstract Many believe that Large Language Models (LLMs) open the era of Artificial Intelligence (AI). Some see opportunities while others see dangers. Yet both proponents and opponents grasp AI through the imagery popularised by science fiction. Will the machine become sentient and rebel against its creators? Will we experience a paperclip apocalypse? Before answering such questions, we should first ask whether this mental imagery provides a good description of the phenomenon at hand. Understanding weather patterns through the moods of the gods only goes so far. The present paper instead advocates understanding LLMs and their connection to AI through the imagery of Jorge Luis Borges, a master of 20th century literature, forerunner of magical realism, and precursor to postmodern literature. This exercise leads to a new perspective that illuminates the relation between language modelling and artificial intelligence.
    摘要 许多人认为大语言模型(LLM)开启了人工智能(AI)的时代。一些人看到了机遇,另一些人看到了危险。然而,无论是支持者还是反对者,都是通过科幻小说所普及的意象来理解AI的:机器会不会产生意识并反抗它的创造者?我们会不会遭遇“回形针末日”?在回答这些问题之前,我们首先应该问,这些意象是否能恰当地描述眼前的现象——毕竟,用神祇的喜怒来解释天气规律走不了多远。本文主张改用20世纪文学大师、魔幻现实主义先驱、后现代文学先行者豪尔赫·路易斯·博尔赫斯(Jorge Luis Borges)的意象来理解LLM及其与AI的关系。这一尝试带来了一个新的视角,有助于阐明语言建模与人工智能之间的关系。

Latent Graphs for Semi-Supervised Learning on Biomedical Tabular Data

  • paper_url: http://arxiv.org/abs/2309.15757
  • repo_url: None
  • paper_authors: Boshko Koloski, Nada Lavrač, Senja Pollak, Blaž Škrlj
  • for: 通过发现数据实例之间的关系,提高半监督学习技术的稳健性和性能
  • methods: 基于图表示法,利用 graph-based 表示,实现信息的流动性,同时包含全局和局部知识
  • results: 对生物医学数据集进行评估,提出一种基于图的方法,能够超越当前方法的性能,并且在三个生物医学数据集上达到最佳效果
    Abstract In the domain of semi-supervised learning, the current approaches insufficiently exploit the potential of considering inter-instance relationships among (un)labeled data. In this work, we address this limitation by providing an approach for inferring latent graphs that capture the intrinsic data relationships. By leveraging graph-based representations, our approach facilitates the seamless propagation of information throughout the graph, effectively incorporating global and local knowledge. Through evaluations on biomedical tabular datasets, we compare the capabilities of our approach to other contemporary methods. Our work demonstrates the significance of inter-instance relationship discovery as practical means for constructing robust latent graphs to enhance semi-supervised learning techniques. The experiments show that the proposed methodology outperforms contemporary state-of-the-art methods for (semi-)supervised learning on three biomedical datasets.
    摘要 在半监督学习领域,现有方法未能充分利用(未)标注数据之间实例关系的潜力。在这项工作中,我们针对这一局限,提出了一种推断潜在图的方法,以刻画数据的内在关系。借助基于图的表示,我们的方法可以让信息在图中顺畅传播,有效地融合全局与局部知识。我们在生物医学表格数据集上进行评估,并与其他当代方法进行了比较。我们的工作表明,发现实例间关系是构建稳健潜在图、增强半监督学习技术的切实手段。实验显示,所提出的方法在三个生物医学数据集上的(半)监督学习任务中优于当前最先进的方法。
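
As a minimal illustration of the general idea (infer an instance graph from tabular features, then propagate label information over it), here is a scikit-learn sketch on a public biomedical tabular dataset. It uses off-the-shelf label spreading over a k-nearest-neighbour graph as a stand-in, not the authors' method.

```python
# Minimal sketch: latent kNN graph over instances + label propagation (stand-in).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.semi_supervised import LabelSpreading

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)

# Hide 90% of the labels to simulate a semi-supervised biomedical setting.
y_semi = y.copy()
y_semi[rng.random(len(y)) < 0.9] = -1           # -1 marks unlabeled instances

# kernel="knn" builds a k-nearest-neighbour graph over instances and spreads
# label information along its edges (combining global and local structure).
model = LabelSpreading(kernel="knn", n_neighbors=7)
model.fit(X, y_semi)

mask = y_semi == -1
acc = (model.transduction_[mask] == y[mask]).mean()
print(f"accuracy on the {mask.sum()} unlabeled instances: {acc:.3f}")
```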

Experience and Evidence are the eyes of an excellent summarizer! Towards Knowledge Infused Multi-modal Clinical Conversation Summarization

  • paper_url: http://arxiv.org/abs/2309.15739
  • repo_url: https://github.com/nlp-rl/mm-cliconsummation
  • paper_authors: Abhisek Tiwari, Anisha Saha, Sriparna Saha, Pushpak Bhattacharyya, Minakshi Dhar
  • for: 这篇论文提出了一个多模态临床对话摘要生成任务,利用临床医生与患者交互中的文本和视觉信息,生成简洁的对话摘要。
  • methods: 该方法基于一个知识注入、多模态、多任务的医疗领域识别与临床对话摘要生成框架,使用适配器融合知识与视觉特征,并通过门控机制统一融合后的特征向量。
  • results: 经过大量的定量与定性实验,研究发现:(1)视觉信息具有重要意义;(2)注入额外知识后,摘要更精确且更好地保留医学实体;(3)医疗科室识别与临床对话摘要生成之间存在相关性。
    Abstract With the advancement of telemedicine, both researchers and medical practitioners are working hand-in-hand to develop various techniques to automate various medical operations, such as diagnosis report generation. In this paper, we first present a multi-modal clinical conversation summary generation task that takes a clinician-patient interaction (both textual and visual information) and generates a succinct synopsis of the conversation. We propose a knowledge-infused, multi-modal, multi-tasking medical domain identification and clinical conversation summary generation (MM-CliConSummation) framework. It leverages an adapter to infuse knowledge and visual features and unify the fused feature vector using a gated mechanism. Furthermore, we developed a multi-modal, multi-intent clinical conversation summarization corpus annotated with intent, symptom, and summary. The extensive set of experiments, both quantitatively and qualitatively, led to the following findings: (a) critical significance of visuals, (b) more precise and medical entity preserving summary with additional knowledge infusion, and (c) a correlation between medical department identification and clinical synopsis generation. Furthermore, the dataset and source code are available at https://github.com/NLP-RL/MM-CliConSummation.
    摘要 随着远程医疗的发展,研究人员和医生正携手开发各种自动化医疗操作的技术,其中包括诊断报告生成。本文首先提出了一个多模态临床对话摘要生成任务,该任务以医生与患者的交互(包括文本和视觉信息)为输入,生成简洁的对话摘要。我们提出了一个知识注入、多模态、多任务的医疗领域识别与临床对话摘要生成框架(MM-CliConSummation)。它利用适配器来注入知识和视觉特征,并使用门控机制统一融合后的特征向量。此外,我们构建了一个标注了意图、症状和摘要的多模态、多意图临床对话摘要数据集。大量的定量与定性实验得到了以下发现:(a)视觉信息至关重要;(b)注入额外知识后,摘要更精确并更好地保留医学实体;(c)医疗科室识别与临床对话摘要生成之间存在相关性。此外,数据集和源代码可在 https://github.com/NLP-RL/MM-CliConSummation 获取。

MindGPT: Interpreting What You See with Non-invasive Brain Recordings

  • paper_url: http://arxiv.org/abs/2309.15729
  • repo_url: https://github.com/jxuanc/mindgpt
  • paper_authors: Jiaxuan Chen, Yu Qi, Yueming Wang, Gang Pan
  • for: 这个研究的目的是使用非侵入式脑记录技术来解码视觉内容。
  • methods: 这个研究使用了一种非侵入式神经解码器,称为 MindGPT,将视觉刺激转化为自然语言。该模型基于一种视觉导向的神经编码器,并使用大语言模型GPT来实现语言semantic的导向。
  • results: 实验结果表明,MindGPT生成的词序列能够如实反映所见刺激中传达的视觉信息(包括关键细节),并且可以用来评估视觉属性对语言语义的贡献。此外,研究还发现,高级视觉皮层(HVC)比低级视觉皮层(LVC)携带更多语义信息,仅使用HVC即可恢复大部分语义信息。
    Abstract Decoding of seen visual contents with non-invasive brain recordings has important scientific and practical values. Efforts have been made to recover the seen images from brain signals. However, most existing approaches cannot faithfully reflect the visual contents due to insufficient image quality or semantic mismatches. Compared with reconstructing pixel-level visual images, speaking is a more efficient and effective way to explain visual information. Here we introduce a non-invasive neural decoder, termed as MindGPT, which interprets perceived visual stimuli into natural languages from fMRI signals. Specifically, our model builds upon a visually guided neural encoder with a cross-attention mechanism, which permits us to guide latent neural representations towards a desired language semantic direction in an end-to-end manner by the collaborative use of the large language model GPT. By doing so, we found that the neural representations of the MindGPT are explainable, which can be used to evaluate the contributions of visual properties to language semantics. Our experiments show that the generated word sequences truthfully represented the visual information (with essential details) conveyed in the seen stimuli. The results also suggested that with respect to language decoding tasks, the higher visual cortex (HVC) is more semantically informative than the lower visual cortex (LVC), and using only the HVC can recover most of the semantic information. The code of the MindGPT model will be publicly available at https://github.com/JxuanC/MindGPT.
    摘要 利用非侵入式脑记录解码所见的视觉内容具有重要的科学和实用价值。已有研究尝试从脑信号中重建所见图像,但由于图像质量不足或语义不匹配,大多数现有方法无法如实反映视觉内容。与重建像素级视觉图像相比,用语言来解释视觉信息是一种更高效、更有效的方式。我们提出了一种非侵入式神经解码器MindGPT,它能从fMRI信号中将感知到的视觉刺激解读为自然语言。具体而言,我们的模型建立在带有交叉注意力机制的视觉引导神经编码器之上,并借助大语言模型GPT的协同作用,以端到端的方式将潜在神经表示引导到期望的语言语义方向。由此我们发现,MindGPT的神经表示是可解释的,可用于评估视觉属性对语言语义的贡献。实验表明,生成的词序列如实地表达了所见刺激中传达的视觉信息(包括关键细节)。结果还表明,就语言解码任务而言,高级视觉皮层(HVC)比低级视觉皮层(LVC)携带更多语义信息,仅使用HVC即可恢复大部分语义信息。MindGPT模型的代码将在 https://github.com/JxuanC/MindGPT 公开。

Where Are We So Far? Understanding Data Storytelling Tools from the Perspective of Human-AI Collaboration

  • paper_url: http://arxiv.org/abs/2309.15723
  • repo_url: None
  • paper_authors: Haotian Li, Yun Wang, Huamin Qu
  • for: 这篇论文旨在探讨人工智能(AI)在数据故事创作中的支持和增强,但现有的研究很少从人机合作角度来检视现有的数据故事创作工具,这限制了研究人员对现有工具的反思和学习。
  • methods: 本文采用了一个框架,从数据故事创作过程中的不同阶段和人机合作角色来分析现有工具,包括分析、规划、实施和沟通阶段,以及人类和AI在每个阶段的角色,如创作者、助手、优化者和审查者。
  • results: 通过分析,我们发现现有工具中的常见合作模式,总结了这些模式所学习的经验教训,并进一步阐述了人机合作在数据故事创作中的研究机遇。
    Abstract Data storytelling is powerful for communicating data insights, but it requires diverse skills and considerable effort from human creators. Recent research has widely explored the potential for artificial intelligence (AI) to support and augment humans in data storytelling. However, there lacks a systematic review to understand data storytelling tools from the perspective of human-AI collaboration, which hinders researchers from reflecting on the existing collaborative tool designs that promote humans' and AI's advantages and mitigate their shortcomings. This paper investigated existing tools with a framework from two perspectives: the stages in the storytelling workflow where a tool serves, including analysis, planning, implementation, and communication, and the roles of humans and AI in each stage, such as creators, assistants, optimizers, and reviewers. Through our analysis, we recognize the common collaboration patterns in existing tools, summarize lessons learned from these patterns, and further illustrate research opportunities for human-AI collaboration in data storytelling.
    摘要 数据故事(data storytelling)在传达数据洞见方面非常有力,但需要人类创作者具备多样的技能并投入大量精力。近期研究广泛探讨了人工智能(AI)在数据故事创作中支持和增强人类的潜力。然而,目前还缺乏从人机协作视角对数据故事工具进行系统性梳理的工作,这使研究人员难以反思现有协作工具的设计如何发挥人类与AI各自的优势并弥补其不足。本文使用一个框架从两个视角考察现有工具:其一是工具所服务的故事创作流程阶段,包括分析、规划、实施和沟通;其二是人类与AI在每个阶段中扮演的角色,如创作者、助手、优化者和审查者。通过分析,我们归纳了现有工具中常见的协作模式,总结了从这些模式中得到的经验教训,并进一步指出了数据故事创作中人机协作的研究机遇。

Model Share AI: An Integrated Toolkit for Collaborative Machine Learning Model Development, Provenance Tracking, and Deployment in Python

  • paper_url: http://arxiv.org/abs/2309.15719
  • repo_url: None
  • paper_authors: Heinrich Peters, Michael Parrott
  • for: The paper aims to address the issue of many machine learning (ML) projects never progressing past the proof-of-concept stage by introducing an easy-to-use platform called Model Share AI (AIMS) to streamline collaborative model development, model provenance tracking, and model deployment.
  • methods: The paper describes the features of AIMS, including collaborative project spaces, a standardized model evaluation process, and the ability to deploy ML models built in various frameworks into live REST APIs and automatically generated web apps with minimal code.
  • results: The paper highlights the potential of AIMS to make ML research more applicable to real-world challenges by facilitating collaborative model development, capturing model performance and metadata for provenance tracking, and providing a user-friendly platform for deploying ML models to non-technical end-users through web apps.
    Abstract Machine learning (ML) has the potential to revolutionize a wide range of research areas and industries, but many ML projects never progress past the proof-of-concept stage. To address this issue, we introduce Model Share AI (AIMS), an easy-to-use MLOps platform designed to streamline collaborative model development, model provenance tracking, and model deployment, as well as a host of other functions aiming to maximize the real-world impact of ML research. AIMS features collaborative project spaces and a standardized model evaluation process that ranks model submissions based on their performance on unseen evaluation data, enabling collaborative model development and crowd-sourcing. Model performance and various model metadata are automatically captured to facilitate provenance tracking and allow users to learn from and build on previous submissions. Additionally, AIMS allows users to deploy ML models built in Scikit-Learn, TensorFlow Keras, PyTorch, and ONNX into live REST APIs and automatically generated web apps with minimal code. The ability to deploy models with minimal effort and to make them accessible to non-technical end-users through web apps has the potential to make ML research more applicable to real-world challenges.
    摘要 机器学习(ML)有望革命化广泛的研究领域和行业,但许多ML项目很难进入实际应用阶段。为解决这问题,我们介绍Model Share AI(AIMS),一个易用的MLOps平台,旨在协同开发模型、追踪模型来源、模型部署以及多种其他功能,以最大化ML研究的实际影响。AIMS提供了协同项目空间和基于未见评估数据的模型评价过程,可以促进协同开发和招募模型。模型性能和多种模型元数据会自动记录,以便追踪模型来源和启发新 submission。此外,AIMS还允许用户通过 minimum code 将 Scikit-Learn、TensorFlow Keras、PyTorch 和 ONNX 中的模型部署到live REST API 和自动生成的网页应用程序中,以便让 ML 研究更加适应实际挑战。
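
For readers unfamiliar with the "model behind a live REST API with minimal code" pattern the abstract describes, here is a generic illustration using FastAPI directly. This is not AIMS's own deployment API; the module name, endpoint, and model are placeholders.

```python
# Generic sketch: serve a trained scikit-learn model as a REST endpoint.
from typing import List
from fastapi import FastAPI
from pydantic import BaseModel
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000).fit(*load_iris(return_X_y=True))
app = FastAPI()

class Features(BaseModel):
    values: List[float]   # the four iris measurements, as a stand-in payload

@app.post("/predict")
def predict(features: Features):
    label = int(model.predict([features.values])[0])
    return {"predicted_class": label}

# Run with:  uvicorn this_module:app --reload   (replace this_module with the file name)
# Then POST  {"values": [5.1, 3.5, 1.4, 0.2]}  to /predict
```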

Brave new world: Artificial Intelligence in teaching and learning

  • paper_url: http://arxiv.org/abs/2310.06856
  • repo_url: None
  • paper_authors: Adrian Groza, Anca Marginean
  • for: 这篇论文主要是为了探讨大语言模型在教学和学习中的应用,以及教育领域中已经发生的人工智能事件,并提出了在大学中引入人工智能政策的必要性和紧迫性。
  • methods: 本论文使用了大语言模型在教学和学习中的应用,以及已经发生的人工智能事件,以探讨教育领域中的人工智能应用。
  • results: 本论文认为,每所高等教育机构应该有一个人工智能政策,以提高教育工具的认识,并减少教育领域中的人工智能事件风险。
    Abstract We exemplify how Large Language Models are used in both teaching and learning. We also discuss the AI incidents that have already occurred in the education domain, and we argue for the urgent need to introduce AI policies in universities and for the ongoing strategies to regulate AI. Regarding policy for AI, our view is that each institution should have a policy for AI in teaching and learning. This is important from at least twofolds: (i) to raise awareness on the numerous educational tools that can both positively and negatively affect education; (ii) to minimise the risk of AI incidents in education.
    摘要 我们举例说明了大语言模型在教学和学习中的应用,并讨论了教育领域已经发生的AI事件。我们认为,各大学亟需引入AI政策,并持续推进对AI的规范。关于AI政策,我们的观点是每所高校都应制定一份关于教学与学习中使用AI的政策。这至少有两方面的重要性:(一)提高对众多教育工具的认识,这些工具既可能对教育产生积极影响,也可能产生消极影响;(二)将教育领域发生AI事件的风险降到最低。

Identifying and Mitigating Privacy Risks Stemming from Language Models: A Survey

  • paper_url: http://arxiv.org/abs/2310.01424
  • repo_url: None
  • paper_authors: Victoria Smith, Ali Shahin Shamsabadi, Carolyn Ashurst, Adrian Weller
  • for: 本研究旨在帮助研究人员和政策制定者更好地理解语言模型(LM)的隐私风险和mitigation策略,包括需要更多的研究和关注的方向。
  • methods: 本研究使用了一种稍加分析语言模型(LM)的隐私风险和mitigation策略的方法,包括分析LM的攻击和防御方法,并对现有的mitigation策略进行了评估和分析。
  • results: 本研究通过分析了多种隐私攻击和mitigation策略,并对现有的mitigation策略进行了评估和分析,得出了一些结论和建议,包括LM的攻击和防御方法,以及需要更多的研究和关注的方向。
    Abstract Rapid advancements in language models (LMs) have led to their adoption across many sectors. Alongside the potential benefits, such models present a range of risks, including around privacy. In particular, as LMs have grown in size, the potential to memorise aspects of their training data has increased, resulting in the risk of leaking private information. As LMs become increasingly widespread, it is vital that we understand such privacy risks and how they might be mitigated. To help researchers and policymakers understand the state of knowledge around privacy attacks and mitigations, including where more work is needed, we present the first technical survey on LM privacy. We (i) identify a taxonomy of salient dimensions where attacks differ on LMs, (ii) survey existing attacks and use our taxonomy of dimensions to highlight key trends, (iii) discuss existing mitigation strategies, highlighting their strengths and limitations, identifying key gaps and demonstrating open problems and areas for concern.
    摘要 语言模型(LM)的快速发展使其在众多领域得到应用。在带来潜在收益的同时,这类模型也带来了一系列风险,其中包括隐私风险。特别是,随着LM规模的增大,其记忆训练数据内容的可能性随之增加,从而带来泄露隐私信息的风险。随着LM日益普及,理解这些隐私风险以及如何缓解它们至关重要。为帮助研究人员和政策制定者了解隐私攻击与缓解措施的研究现状,并指出仍需进一步研究的方向,我们给出了第一份关于LM隐私的技术综述。我们(i)归纳了攻击在LM上存在差异的关键维度,(ii)梳理了现有攻击,并利用该维度分类来突出主要趋势,(iii)讨论了现有的缓解策略,指出其优势与局限,识别关键空白,并展示有待解决的问题和值得关注的领域。

Integrating LLM, EEG, and Eye-Tracking Biomarker Analysis for Word-Level Neural State Classification in Semantic Inference Reading Comprehension

  • paper_url: http://arxiv.org/abs/2309.15714
  • repo_url: None
  • paper_authors: Yuhong Zhang, Qin Li, Sujal Nahata, Tasnia Jamal, Shih-kuen Cheng, Gert Cauwenberghs, Tzyy-Ping Jung
  • for: This pilot study aims to provide insights into individuals’ neural states during a semantic relation reading-comprehension task.
  • methods: The study jointly analyzes LLMs, eye-gaze, and electroencephalographic (EEG) data to study how the brain processes words with varying degrees of relevance to a keyword during reading.
  • results: The best validation accuracy in this word-level classification is over 60% across 12 subjects. Words of high relevance to the inference keyword had significantly more eye fixations per word compared to words of low relevance.
    Abstract With the recent proliferation of large language models (LLMs), such as Generative Pre-trained Transformers (GPT), there has been a significant shift in exploring human and machine comprehension of semantic language meaning. This shift calls for interdisciplinary research that bridges cognitive science and natural language processing (NLP). This pilot study aims to provide insights into individuals' neural states during a semantic relation reading-comprehension task. We propose jointly analyzing LLMs, eye-gaze, and electroencephalographic (EEG) data to study how the brain processes words with varying degrees of relevance to a keyword during reading. We also use a feature engineering approach to improve the fixation-related EEG data classification while participants read words with high versus low relevance to the keyword. The best validation accuracy in this word-level classification is over 60\% across 12 subjects. Words of high relevance to the inference keyword had significantly more eye fixations per word: 1.0584 compared to 0.6576 when excluding no-fixation words, and 1.5126 compared to 1.4026 when including them. This study represents the first attempt to classify brain states at a word level using LLM knowledge. It provides valuable insights into human cognitive abilities and the realm of Artificial General Intelligence (AGI), and offers guidance for developing potential reading-assisted technologies.

HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models

  • paper_url: http://arxiv.org/abs/2309.15701
  • repo_url: https://github.com/hypotheses-paradise/hypo2trans
  • paper_authors: Chen Chen, Yuchen Hu, Chao-Han Huck Yang, Sabato Macro Siniscalchi, Pin-Yu Chen, Eng Siong Chng
  • for: 本研究的目的是提出一种基于大语言模型(LLM)的自动语音识别(ASR)错误纠正方法,以提高ASR系统在不利条件下的表现。
  • methods: 本研究构建了一个开源基准,其中包含一个新的数据集HyPoradise(HP),该数据集包含超过334,000组N-best假设及其对应的准确转录。研究考察了三种基于LLM、使用不同数量标注“假设-转录”对的错误纠正技术。
  • results: 实验证明,所提出的方法可以超越传统基于重排序方法的上限;借助合理的提示词及其生成能力,LLM甚至可以纠正N-best列表中缺失的词。结果已公开,以便构建可复现的流程。
    Abstract Advancements in deep neural networks have allowed automatic speech recognition (ASR) systems to attain human parity on several publicly available clean speech datasets. However, even state-of-the-art ASR systems experience performance degradation when confronted with adverse conditions, as a well-trained acoustic model is sensitive to variations in the speech domain, e.g., background noise. Intuitively, humans address this issue by relying on their linguistic knowledge: the meaning of ambiguous spoken terms is usually inferred from contextual cues thereby reducing the dependency on the auditory system. Inspired by this observation, we introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction, where N-best decoding hypotheses provide informative elements for true transcription prediction. This approach is a paradigm shift from the traditional language model rescoring strategy that can only select one candidate hypothesis as the output transcription. The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses and corresponding accurate transcriptions across prevalent speech domains. Given this dataset, we examine three types of error correction techniques based on LLMs with varying amounts of labeled hypotheses-transcription pairs, which gains a significant word error rate (WER) reduction. Experimental evidence demonstrates the proposed technique achieves a breakthrough by surpassing the upper bound of traditional re-ranking based methods. More surprisingly, LLM with reasonable prompt and its generative capability can even correct those tokens that are missing in N-best list. We make our results publicly accessible for reproducible pipelines with released pre-trained models, thus providing a new evaluation paradigm for ASR error correction with LLMs.
    摘要 深度神经网络的进步使自动语音识别(ASR)系统在多个公开的干净语音数据集上达到了与人类相当的水平。然而,即便是最先进的ASR系统,在面对不利条件时也会出现性能下降,因为训练良好的声学模型对语音域的变化(如背景噪声)非常敏感。直观上,人类通过语言知识来应对这一问题:含糊语音的含义通常可以借助上下文线索来推断,从而减少对听觉系统的依赖。受此启发,我们提出了第一个利用外部大语言模型(LLM)进行ASR纠错的开源基准,其中N-best解码假设为预测真实转录提供了有用信息。这一思路不同于传统的语言模型重打分策略——后者只能从候选假设中选出一个作为输出转录。该基准包含一个新的数据集HyPoradise(HP),涵盖常见语音领域中超过334,000组N-best假设及其对应的准确转录。基于该数据集,我们考察了三种基于LLM、使用不同数量标注“假设-转录”对的纠错技术,均取得了显著的词错误率(WER)下降。实验证明,所提出的技术突破了传统基于重排序方法的上限。更令人惊讶的是,借助合理的提示词及其生成能力,LLM甚至可以纠正N-best列表中缺失的词。我们公开了结果和预训练模型,以便构建可复现的流程,从而为基于LLM的ASR纠错提供了一种新的评估范式。
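
To illustrate the hypotheses-to-transcription idea from the abstract, here is a minimal sketch that builds a prompt from an N-best list and asks an LLM for the corrected transcription. The prompt wording and the `call_llm` stub are illustrative stand-ins, not the benchmark's actual pipeline.

```python
# Sketch: ASR error correction by prompting an LLM with the N-best hypotheses.
from typing import List

def build_prompt(nbest: List[str]) -> str:
    listed = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(nbest))
    return (
        "Below are the N-best hypotheses from a speech recognizer.\n"
        f"{listed}\n"
        "Report the most likely true transcription, fixing recognition errors."
    )

def call_llm(prompt: str) -> str:
    """Stand-in for an actual LLM call (e.g., an instruction-tuned model)."""
    return prompt.splitlines()[1].split(". ", 1)[1]  # trivially return hypothesis 1

nbest = [
    "the whether in new york is sunny",
    "the weather in new york is funny",
    "the weather in newark is sunny",
]
print(build_prompt(nbest))
print("corrected:", call_llm(build_prompt(nbest)))
```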

Deep Model Fusion: A Survey

  • paper_url: http://arxiv.org/abs/2309.15698
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Weishi Li, Yong Peng, Miao Zhang, Liang Ding, Han Hu, Li Shen
  • for: 这个论文主要是为了探讨深度模型融合技术,尤其是在大规模深度学习模型(如LLMs和基础模型)上进行深度模型融合的挑战和可能性。
  • methods: 这个论文主要分析了四种深度模型融合方法:(1)”Mode connectivity”,通过非增加损失的路径连接解决方案的重要性;(2)”Alignment”,匹配神经网络中单元的匹配以创造更好的融合条件;(3)”Weight average”,一种经典的模型融合方法,将多个模型的权重平均为更加准确的结果;(4)”Ensemble learning”,将多个不同模型的输出结合,以提高最终模型的准确性和可靠性。
  • results: 这个论文分析了深度模型融合技术面临的挑战,并提出了未来研究的可能性。它还对不同的模型融合方法进行了分析和比较,帮助读者更深入地理解不同方法之间的相互关系和实际应用方法。
    Abstract Deep model fusion/merging is an emerging technique that merges the parameters or predictions of multiple deep learning models into a single one. It combines the abilities of different models to make up for the biases and errors of a single model to achieve better performance. However, deep model fusion on large-scale deep learning models (e.g., LLMs and foundation models) faces several challenges, including high computational cost, high-dimensional parameter space, interference between different heterogeneous models, etc. Although model fusion has attracted widespread attention due to its potential to solve complex real-world tasks, there is still a lack of complete and detailed survey research on this technique. Accordingly, in order to understand the model fusion method better and promote its development, we present a comprehensive survey to summarize the recent progress. Specifically, we categorize existing deep model fusion methods as four-fold: (1) "Mode connectivity", which connects the solutions in weight space via a path of non-increasing loss, in order to obtain better initialization for model fusion; (2) "Alignment" matches units between neural networks to create better conditions for fusion; (3) "Weight average", a classical model fusion method, averages the weights of multiple models to obtain more accurate results closer to the optimal solution; (4) "Ensemble learning" combines the outputs of diverse models, which is a foundational technique for improving the accuracy and robustness of the final model. In addition, we analyze the challenges faced by deep model fusion and propose possible research directions for model fusion in the future. Our review is helpful in deeply understanding the correlation between different model fusion methods and practical application methods, which can enlighten the research in the field of deep model fusion.
    摘要 深度模型融合/合并是一种新兴技术,它将多个深度学习模型的参数或预测合并为一个模型,结合不同模型的能力来弥补单个模型的偏差和错误,从而取得更好的性能。然而,在大规模深度学习模型(如LLM和基础模型)上进行深度模型融合面临诸多挑战,包括高昂的计算成本、高维参数空间、不同异构模型之间的相互干扰等。尽管模型融合因其解决复杂现实任务的潜力而受到广泛关注,但目前仍缺乏对该技术完整而详细的综述。为了更好地理解模型融合方法并推动其发展,我们给出了一份全面的综述来总结最新进展。具体而言,我们将现有的深度模型融合方法分为四类:(1)”Mode connectivity”,通过损失不增的路径在权重空间中连接各个解,从而为模型融合获得更好的初始化;(2)”Alignment”,匹配神经网络之间的单元,为融合创造更好的条件;(3)”Weight average”,一种经典的模型融合方法,对多个模型的权重取平均,以获得更接近最优解、更准确的结果;(4)”Ensemble learning”,组合多个不同模型的输出,是提高最终模型准确性和鲁棒性的基础技术。此外,我们分析了深度模型融合面临的挑战,并提出了未来可能的研究方向。我们的综述有助于深入理解各种模型融合方法之间的关联及其实际应用方式,从而启发深度模型融合领域的研究。
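
As a concrete instance of the "weight average" family of fusion methods mentioned above, here is a minimal PyTorch sketch that uniformly averages the parameters of several models with identical architecture. The tiny network and uniform averaging are stand-ins (a simple "model soup"-style merge), assuming the checkpoints are close enough in weight space for averaging to be meaningful.

```python
# Minimal sketch: uniform weight averaging across models with the same architecture.
from typing import List
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)
    def forward(self, x):
        return self.fc(x)

def average_weights(models: List[nn.Module]) -> nn.Module:
    fused = type(models[0])()                       # assumes a no-argument constructor
    state = {k: torch.zeros_like(v) for k, v in models[0].state_dict().items()}
    for m in models:
        for k, v in m.state_dict().items():
            state[k] += v / len(models)             # running uniform average
    fused.load_state_dict(state)
    return fused

models = [TinyNet() for _ in range(3)]              # stand-ins for fine-tuned checkpoints
fused = average_weights(models)
print(fused(torch.randn(1, 4)))
```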

Genetic Algorithm-Based Dynamic Backdoor Attack on Federated Learning-Based Network Traffic Classification

  • paper_url: http://arxiv.org/abs/2310.06855
  • repo_url: None
  • paper_authors: Mahmoud Nazzal, Nura Aljaafari, Ahmed Sawalmeh, Abdallah Khreishah, Muhammad Anan, Abdulelah Algosaibi, Mohammed Alnaeem, Adel Aldalbahi, Abdulaziz Alhumam, Conrado P. Vizcarra, Shadan Alhamed
  • for: 这个研究是为了探讨基于联合学习的网络流量分类模型是否受到后门攻击的问题。
  • methods: 本研究使用了一种基于遗传算法的后门攻击方法,叫做GABAttack,它利用遗传算法来优化后门触发模式的值和位置,以 guarantees a better fit with the input and the model。
  • results: 实验结果显示GABAttack可以在实际的网络数据上得到良好的成果,并且可以在不同的情况下保持这些成果。这个研究作为一个警示,让网络安全专家和实践者为这种攻击进行防御措施。
    Abstract Federated learning enables multiple clients to collaboratively contribute to the learning of a global model orchestrated by a central server. This learning scheme promotes clients' data privacy and requires reduced communication overheads. In an application like network traffic classification, this helps hide the network vulnerabilities and weakness points. However, federated learning is susceptible to backdoor attacks, in which adversaries inject manipulated model updates into the global model. These updates inject a salient functionality in the global model that can be launched with specific input patterns. Nonetheless, the vulnerability of network traffic classification models based on federated learning to these attacks remains unexplored. In this paper, we propose GABAttack, a novel genetic algorithm-based backdoor attack against federated learning for network traffic classification. GABAttack utilizes a genetic algorithm to optimize the values and locations of backdoor trigger patterns, ensuring a better fit with the input and the model. This input-tailored dynamic attack is promising for improved attack evasiveness while being effective. Extensive experiments conducted over real-world network datasets validate the success of the proposed GABAttack in various situations while maintaining almost invisible activity. This research serves as an alarming call for network security experts and practitioners to develop robust defense measures against such attacks.
    摘要 federated learning 可以让多个客户端共同参与到全球模型的学习中,由中央服务器进行协调。这种学习方式可以保护客户端的数据隐私,并减少通信开销。在应用于网络流量分类中,这会隐藏网络漏洞和弱点。然而, federated learning 受到后门攻击的威胁,攻击者可以在全球模型中注入修改后的模型更新。这些更新会在特定的输入模式下引入一个突出的功能,可以通过特定的输入来启动。然而,基于 federated learning 的网络流量分类模型对这些攻击的抗性尚未得到探讨。在这篇论文中,我们提出了 GABAttack,一种基于遗传算法的后门攻击方法,用于攻击 federated learning 的网络流量分类模型。GABAttack 使用遗传算法来优化后门触发模式的值和位置,以确保更好地适应输入和模型。这种输入特定的动态攻击可以提高攻击的逃避能力,同时保持高效。我们对实际的网络数据进行了广泛的实验,并证明了 GABAttack 在不同的情况下都有很好的成功率,同时几乎无法察见。这些研究作为一个警告,鼓励网络安全专家和实践者开发robust的防御措施来应对这类攻击。
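
To show the genetic-algorithm component in isolation, here is a toy sketch that evolves the values and positions of a trigger pattern to maximize a fitness score. The fitness function below is a stand-in; GABAttack's real objective involves the federated traffic classifier, which is not reproduced here.

```python
# Toy genetic algorithm over trigger (positions, values); fitness is a placeholder.
import random

FEATURES = 20          # length of a flow feature vector
TRIGGER_LEN = 4        # number of positions the trigger occupies

def random_trigger():
    positions = random.sample(range(FEATURES), TRIGGER_LEN)
    values = [random.random() for _ in range(TRIGGER_LEN)]
    return positions, values

def fitness(trigger):
    # Stand-in objective: pretend the model is most fooled when trigger values are
    # near 1.0 and the trigger sits early in the feature vector.
    positions, values = trigger
    return sum(values) - 0.01 * sum(positions)

def crossover(a, b):
    cut = TRIGGER_LEN // 2
    return a[0][:cut] + b[0][cut:], a[1][:cut] + b[1][cut:]

def mutate(trigger, rate=0.2):
    positions, values = list(trigger[0]), list(trigger[1])
    for i in range(TRIGGER_LEN):
        if random.random() < rate:
            positions[i] = random.randrange(FEATURES)
            values[i] = random.random()
    return positions, values

population = [random_trigger() for _ in range(30)]
for generation in range(50):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                      # elitist selection
    population = parents + [
        mutate(crossover(random.choice(parents), random.choice(parents)))
        for _ in range(20)
    ]
print("best trigger:", max(population, key=fitness))
```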

Generative Speech Recognition Error Correction with Large Language Models and Task-Activating Prompting

  • paper_url: http://arxiv.org/abs/2309.15649
  • repo_url: None
  • paper_authors: Chao-Han Huck Yang, Yile Gu, Yi-Chieh Liu, Shalini Ghosh, Ivan Bulyko, Andreas Stolcke
  • for: 研究大语言模型(LLM)能否作为语音识别的后处理器,执行重打分和错误纠正。
  • methods: 研究了不同的提示方法,包括零样本和少样本的上下文学习,以及一种新的任务激活提示方法,该方法结合因果指令和示例来扩展其上下文窗口。
  • results: 研究发现,仅通过上下文学习让冻结的LLM进行重打分,即可达到与领域微调语言模型相当的性能;将提示技术与微调相结合,可将错误率降至低于N-best oracle水平。
    Abstract We explore the ability of large language models (LLMs) to act as speech recognition post-processors that perform rescoring and error correction. Our first focus is on instruction prompting to let LLMs perform these task without fine-tuning, for which we evaluate different prompting schemes, both zero- and few-shot in-context learning, and a novel task activation prompting method that combines causal instructions and demonstration to increase its context windows. Next, we show that rescoring only by in-context learning with frozen LLMs achieves results that are competitive with rescoring by domain-tuned LMs, using a pretrained first-pass recognition system and rescoring output on two out-of-domain tasks (ATIS and WSJ). By combining prompting techniques with fine-tuning we achieve error rates below the N-best oracle level, showcasing the generalization power of the LLMs.
    摘要 我们研究了大语言模型(LLM)作为语音识别后处理器执行重打分和错误纠正的能力。我们首先关注通过指令提示让LLM在不微调的情况下完成这些任务,评估了不同的提示方案,包括零样本和少样本的上下文学习,以及一种结合因果指令和示例以扩大上下文窗口的新型任务激活提示方法。接着我们展示了:仅使用冻结的LLM进行上下文学习重打分,就能取得与领域微调语言模型相当的结果——实验使用了预训练的第一遍识别系统,并在两个域外任务(ATIS和WSJ)上对输出进行重打分。通过将提示技术与微调相结合,我们实现了低于N-best oracle水平的错误率,展示了LLM的泛化能力。

Hedging Properties of Algorithmic Investment Strategies using Long Short-Term Memory and Time Series models for Equity Indices

  • paper_url: http://arxiv.org/abs/2309.15640
  • repo_url: None
  • paper_authors: Jakub Michańków, Paweł Sakowski, Robert Ślepaczuk
  • for: 这个论文旨在防范金融市场在金融危机时的风险投资 portfolio。
  • methods: 这篇论文提出了一种全新的多Asset ensemble algorithmic investment strategies(AIS)多元化风险投资策略,通过使用不同类型的数学模型(LSTM、ARIMA-GARCH、 momentum和 contrarian)生成价格预测,并将其用于生成投资信号。
  • results: 研究发现LSTM模型表现最佳,而使用比特币constructed AIS 是最佳多元化风险投资策略。此外,使用1小时数据也表现更好于使用日常数据。
    Abstract This paper proposes a novel approach to hedging portfolios of risky assets when financial markets are affected by financial turmoils. We introduce a completely novel approach to diversification activity not on the level of single assets but on the level of ensemble algorithmic investment strategies (AIS) built based on the prices of these assets. We employ four types of diverse theoretical models (LSTM - Long Short-Term Memory, ARIMA-GARCH - Autoregressive Integrated Moving Average - Generalized Autoregressive Conditional Heteroskedasticity, momentum, and contrarian) to generate price forecasts, which are then used to produce investment signals in single and complex AIS. In such a way, we are able to verify the diversification potential of different types of investment strategies consisting of various assets (energy commodities, precious metals, cryptocurrencies, or soft commodities) in hedging ensemble AIS built for equity indices (S&P 500 index). Empirical data used in this study cover the period between 2004 and 2022. Our main conclusion is that LSTM-based strategies outperform the other models and that the best diversifier for the AIS built for the S&P 500 index is the AIS built for Bitcoin. Finally, we test the LSTM model for a higher frequency of data (1 hour). We conclude that it outperforms the results obtained using daily data.
    摘要 我们的主要结论是:基于LSTM的策略优于其他模型,而为标普500指数构建的AIS,其最佳分散化工具是为比特币构建的AIS。此外,我们在更高频率的数据(1小时)上测试了LSTM模型,发现其结果优于使用日度数据得到的结果。
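
To make the "forecast-to-signal" step of such an AIS concrete, here is a minimal sketch that turns one-step-ahead price forecasts into long/short positions and tracks the resulting equity curve. The "forecasts" below are a synthetic stand-in, not output of the paper's LSTM or ARIMA-GARCH models.

```python
# Minimal sketch: long/short signals from next-day price forecasts (synthetic data).
import numpy as np

rng = np.random.default_rng(42)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500)))     # synthetic index path
forecast_next = prices * (1 + rng.normal(0, 0.01, 500))        # stand-in model output

returns = np.diff(prices) / prices[:-1]
# Go long (+1) when the forecast for tomorrow exceeds today's price, short (-1) otherwise.
signals = np.where(forecast_next[:-1] > prices[:-1], 1, -1)
strategy_returns = signals * returns

equity = np.cumprod(1 + strategy_returns)
print(f"buy&hold: {prices[-1] / prices[0]:.3f}x, strategy: {equity[-1]:.3f}x")
```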

Learning with Noisy Labels for Human Fall Events Classification: Joint Cooperative Training with Trinity Networks

  • paper_url: http://arxiv.org/abs/2310.06854
  • repo_url: None
  • paper_authors: Leiyu Xie, Yang Sun, Syed Mohsen Naqvi
  • for: 这篇论文目的是提出一个简单 yet effective的方法来解决深度学习中的污染标签问题,以保护人类试验者的隐私。
  • methods: 这篇论文使用了一个名为“Joint Cooperative training with Trinity Networks”的方法(简称JoCoT),具有两个教师网络和一个学生网络,以改善混淆标签学习框架的稳定性和性能。
  • results: 根据实验结果,JoCoT 在高混淆率下表现出色,较前一代方法高5.17%和3.35%。具体来说,JoCoT 在 UP-Fall dataset 上的平均 pairflip 和 symmetric 混淆率下,较前一代方法高5.17%和3.35%。
    Abstract With the increasing ageing population, fall events classification has drawn much research attention. In the development of deep learning, the quality of data labels is crucial. Most of the datasets are labelled automatically or semi-automatically, and the samples may be mislabeled, which constrains the performance of Deep Neural Networks (DNNs). Recent research on noisy label learning confirms that neural networks first focus on the clean and simple instances and then follow the noisy and hard instances in the training stage. To address the learning with noisy label problem and protect the human subjects' privacy, we propose a simple but effective approach named Joint Cooperative training with Trinity Networks (JoCoT). To mitigate the privacy issue, human skeleton data are used. The robustness and performance of the noisy label learning framework is improved by using the two teacher modules and one student module in the proposed JoCoT. To mitigate the incorrect selections, the predictions from the teacher modules are applied with the consensus-based method to guide the student module training. The performance evaluation on the widely used UP-Fall dataset and comparison with the state-of-the-art, confirms the effectiveness of the proposed JoCoT in high noise rates. Precisely, JoCoT outperforms the state-of-the-art by 5.17% and 3.35% with the averaged pairflip and symmetric noises, respectively.
    摘要 随着人口老龄化加剧,跌倒事件分类受到了大量研究关注。在深度学习的发展中,数据标签的质量至关重要。大多数数据集采用自动或半自动方式标注,样本可能被误标,这限制了深度神经网络(DNN)的性能。近期关于噪声标签学习的研究证实,神经网络在训练阶段会先关注干净、简单的样本,随后才是有噪声、困难的样本。为了解决噪声标签学习问题并保护受试者隐私,我们提出了一种简单而有效的方法——Joint Cooperative training with Trinity Networks(JoCoT)。为缓解隐私问题,我们使用人体骨架数据。所提出的JoCoT通过两个教师模块和一个学生模块来提升噪声标签学习框架的鲁棒性和性能。为减少错误选择,教师模块的预测经由基于共识的方法来指导学生模块的训练。在广泛使用的UP-Fall数据集上的评估以及与最先进方法的比较证实了JoCoT在高噪声率下的有效性:在平均pairflip噪声和symmetric噪声下,JoCoT分别比最先进方法高出5.17%和3.35%。
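
As a simplified illustration of the consensus idea, here is a sketch in which two "teacher" models vote on noisily-labelled samples and the "student" is trained only where both teachers agree with the given label. The real JoCoT uses three neural networks on skeleton data; this stand-in uses scikit-learn classifiers on synthetic features.

```python
# Simplified consensus filtering for noisy labels (stand-in for JoCoT's trinity setup).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y_clean = make_classification(n_samples=2000, n_features=20, random_state=0)
rng = np.random.default_rng(0)
y_noisy = y_clean.copy()
flip = rng.random(len(y_noisy)) < 0.3                 # 30% symmetric label noise
y_noisy[flip] = 1 - y_noisy[flip]

teacher1 = LogisticRegression(max_iter=1000).fit(X, y_noisy)
teacher2 = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y_noisy)

# Consensus: keep samples where both teachers agree with the (possibly noisy) label.
agree = (teacher1.predict(X) == y_noisy) & (teacher2.predict(X) == y_noisy)
student = LogisticRegression(max_iter=1000).fit(X[agree], y_noisy[agree])

print(f"kept {agree.mean():.0%} of samples;",
      f"student accuracy vs clean labels: {student.score(X, y_clean):.3f}")
```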

An Empirical Study of AI Generated Text Detection Tools

  • paper_url: http://arxiv.org/abs/2310.01423
  • repo_url: None
  • paper_authors: Arslan Akram
  • for: 这项研究旨在填补利用多领域ChatGPT语料对最先进的AI生成文本检测API和工具进行测试的空白。
  • methods: 研究构建了一个大型多领域数据集,包括文章、摘要、故事、新闻和产品评论,并对六种AI生成文本识别系统进行了测试。
  • results: 这个研究发现, Originality 在所有工具中表现最佳,具有97.0%的准确率。
    Abstract Since ChatGPT has emerged as a major AIGC model, providing high-quality responses across a wide range of applications (including software development and maintenance), it has attracted much interest from many individuals. ChatGPT has great promise, but there are serious problems that might arise from its misuse, especially in the realms of education and public safety. Several AIGC detectors are available, and they have all been tested on genuine text. However, more study is needed to see how effective they are for multi-domain ChatGPT material. This study aims to fill this need by creating a multi-domain dataset for testing the state-of-the-art APIs and tools for detecting artificially generated information used by universities and other research institutions. A large dataset consisting of articles, abstracts, stories, news, and product reviews was created for this study. The second step is to use the newly created dataset to put six tools through their paces. Six different artificial intelligence (AI) text identification systems, including "GPTkit," "GPTZero," "Originality," "Sapling," "Writer," and "Zylalab," have accuracy rates between 55.29 and 97.0%. Although all the tools fared well in the evaluations, originality was particularly effective across the board.
    摘要 自从ChatGPT作为主要的AIGC模型出现以来,它在软件开发与维护等众多应用中提供了高质量的回复,吸引了许多人的关注。ChatGPT前景广阔,但其滥用也可能带来严重问题,尤其是在教育和公共安全领域。目前已有多种AIGC检测器,它们都在真实文本上进行过测试,但对多领域ChatGPT语料的检测效果仍需更多研究。本研究旨在填补这一空白,构建了一个多领域数据集,用于测试高校及其他研究机构所使用的、检测人工生成内容的最先进API和工具。为此构建了包含文章、摘要、故事、新闻和产品评论的大型数据集,随后利用该数据集对六种工具进行了测试。六种AI文本识别系统(“GPTkit”、“GPTZero”、“Originality”、“Sapling”、“Writer”和“Zylalab”)的准确率介于55.29%到97.0%之间。尽管所有工具在评估中均表现尚可,但Originality在各方面都尤为出色。

Perception for Humanoid Robots

  • paper_url: http://arxiv.org/abs/2309.15616
  • repo_url: https://github.com/openhumanoids/oh-distro
  • paper_authors: Arindam Roychoudhury, Shahram Khorshidi, Subham Agrawal, Maren Bennewitz
  • for: 本研究探讨了人工智能机器人 perceive 技术的最新发展和趋势。
  • methods: 本研究使用了多种感知模式和技术,包括视觉、听觉和感觉感知,以实现机器人与人类和环境的互动。
  • results: 研究发现,多感知模式的融合和机器学习技术在机器人内部状态估计、环境理解和人机交互方面具有广泛的应用前景。
    Abstract Purpose of Review: The field of humanoid robotics, perception plays a fundamental role in enabling robots to interact seamlessly with humans and their surroundings, leading to improved safety, efficiency, and user experience. This scientific study investigates various perception modalities and techniques employed in humanoid robots, including visual, auditory, and tactile sensing by exploring recent state-of-the-art approaches for perceiving and understanding the internal state, the environment, objects, and human activities. Recent Findings: Internal state estimation makes extensive use of Bayesian filtering methods and optimization techniques based on maximum a-posteriori formulation by utilizing proprioceptive sensing. In the area of external environment understanding, with an emphasis on robustness and adaptability to dynamic, unforeseen environmental changes, the new slew of research discussed in this study have focused largely on multi-sensor fusion and machine learning in contrast to the use of hand-crafted, rule-based systems. Human robot interaction methods have established the importance of contextual information representation and memory for understanding human intentions. Summary: This review summarizes the recent developments and trends in the field of perception in humanoid robots. Three main areas of application are identified, namely, internal state estimation, external environment estimation, and human robot interaction. The applications of diverse sensor modalities in each of these areas are considered and recent significant works are discussed.
    摘要 综述目的:在人形机器人领域,感知在使机器人与人类及周围环境无缝交互方面起着根本性作用,有助于提升安全性、效率和用户体验。本研究考察了人形机器人中采用的多种感知模态与技术,包括视觉、听觉和触觉感知,并梳理了感知与理解内部状态、环境、物体及人类活动的最新方法。最新发现:内部状态估计广泛使用贝叶斯滤波方法,以及基于最大后验形式的优化技术,并利用本体感受信息。在外部环境理解方面,为强调对动态、不可预见环境变化的鲁棒性和适应性,本文讨论的新一批研究主要聚焦于多传感器融合和机器学习,而非手工设计的规则系统。人机交互方法则确立了上下文信息表示与记忆对于理解人类意图的重要性。总结:本综述总结了人形机器人感知领域的最新进展与趋势,指出了三个主要应用方向:内部状态估计、外部环境估计和人机交互,并讨论了各类传感模态在这些方向中的应用及近期的重要工作。
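
Internal state estimation in the review leans on Bayesian filtering; as a minimal concrete instance, here is a 1-D Kalman filter fusing a noisy proprioceptive measurement stream into an estimate of a single joint angle. The data and noise levels are synthetic stand-ins.

```python
# Minimal 1-D Kalman filter as an example of Bayesian filtering for state estimation.
import numpy as np

rng = np.random.default_rng(1)
true_angle = np.cumsum(rng.normal(0, 0.01, 200))       # slowly drifting joint angle
measurements = true_angle + rng.normal(0, 0.1, 200)    # noisy encoder readings

q, r = 1e-4, 1e-2          # process and measurement noise variances (assumed)
x, p = 0.0, 1.0            # state estimate and its variance
estimates = []
for z in measurements:
    p += q                                  # predict (random-walk motion model)
    k = p / (p + r)                         # Kalman gain
    x += k * (z - x)                        # update with the measurement
    p *= (1 - k)
    estimates.append(x)

err_raw = np.abs(measurements - true_angle).mean()
err_kf = np.abs(np.array(estimates) - true_angle).mean()
print(f"mean abs error: raw {err_raw:.3f} vs filtered {err_kf:.3f}")
```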

Developing automatic verbatim transcripts for international multilingual meetings: an end-to-end solution

  • paper_url: http://arxiv.org/abs/2309.15609
  • repo_url: None
  • paper_authors: Akshat Dewan, Michal Ziemski, Henri Meylan, Lorenzo Concina, Bruno Pouliquen
  • for: 这篇论文是为了描述一种完全自动化会议记录和多种语言机器翻译的综合解决方案。
  • methods: 该工具使用了WIPO内部开发的语音转文本(S2T)和机器翻译(MT)组件,并进行了数据收集和精度调整,实现了高度定制和可靠的系统。
  • results: 这篇论文描述了技术组件的架构和进化,以及用户 сторо面的业务影响和利益。
    Abstract This paper presents an end-to-end solution for the creation of fully automated conference meeting transcripts and their machine translations into various languages. This tool has been developed at the World Intellectual Property Organization (WIPO) using in-house developed speech-to-text (S2T) and machine translation (MT) components. Beyond describing data collection and fine-tuning, resulting in a highly customized and robust system, this paper describes the architecture and evolution of the technical components as well as highlights the business impact and benefits from the user side. We also point out particular challenges in the evolution and adoption of the system and how the new approach created a new product and replaced existing established workflows in conference management documentation.
    摘要 本文提出了一个端到端解决方案,用于自动生成会议逐字记录并将其机器翻译成多种语言。该工具由世界知识产权组织(WIPO)开发,采用了其内部研发的语音转文本(S2T)和机器翻译(MT)组件。除了描述数据收集与微调(由此获得了高度定制且稳健的系统)之外,本文还介绍了各技术组件的架构与演进,并重点说明了用户侧的业务影响与收益。我们还指出了系统演进和落地过程中的特定挑战,以及这一新方法如何催生了一个新产品并取代了会议管理文档中的既有工作流程。

An Evaluation of ChatGPT-4’s Qualitative Spatial Reasoning Capabilities in RCC-8

  • paper_url: http://arxiv.org/abs/2309.15577
  • repo_url: None
  • paper_authors: Anthony G Cohn
  • for: Investigating the extent to which a Large Language Model (LLM) can perform classical qualitative spatial reasoning tasks.
  • methods: Using the mereotopological calculus, RCC-8.
  • results: The LLM is able to perform classical qualitative spatial reasoning tasks on RCC-8.
    Abstract Qualitative Spatial Reasoning (QSR) is well explored area of Commonsense Reasoning and has multiple applications ranging from Geographical Information Systems to Robotics and Computer Vision. Recently many claims have been made for the capabilities of Large Language Models (LLMs). In this paper we investigate the extent to which one particular LLM can perform classical qualitative spatial reasoning tasks on the mereotopological calculus, RCC-8.
    摘要 定性空间推理(QSR)是常识推理中一个研究较为充分的领域,其应用范围涵盖地理信息系统、机器人和计算机视觉等。近期,人们对大语言模型(LLM)的能力提出了各种论断。在本文中,我们考察某一特定LLM能在多大程度上完成基于部分拓扑演算RCC-8的经典定性空间推理任务。

Identifiability Matters: Revealing the Hidden Recoverable Condition in Unbiased Learning to Rank

  • paper_url: http://arxiv.org/abs/2309.15560
  • repo_url: None
  • paper_authors: Mouxiang Chen, Chenghao Liu, Zemin Liu, Zhuo Li, Jianling Sun
  • for: 本研究探讨在ULTR中,真实相关性能否从点击数据中恢复,这是ULTR领域的一个基础问题。
  • methods: 我们首先定义:如果一个排序模型能够在相差一个缩放变换的意义下恢复真实相关性,则称其是可辨识的。随后我们给出了一个等价的可辨识性条件,它可以表述为一个图连通性检验问题:当且仅当基于数据集底层结构构建的图(可辨识图,IG)是连通的,真实相关性才能被正确恢复;若IG不连通,则可能出现导致排序性能下降的坏情形。为解决这一问题,我们提出了节点干预和节点合并两种方法,用来修改数据集并恢复IG的连通性。
  • results: 我们在一个仿真数据集和两个LTR基准数据集上进行了实验,结果验证了所提定理的正确性,并证明了我们的方法能够在相关性模型不可辨识时缓解数据偏差的影响。
    Abstract The application of Unbiased Learning to Rank (ULTR) is widespread in modern systems for training unbiased ranking models from biased click logs. The key is to explicitly model a generation process for user behavior and fit click data based on examination hypothesis. Previous research found empirically that the true latent relevance can be recovered in most cases as long as the clicks are perfectly fitted. However, we demonstrate that this is not always achievable, resulting in a significant reduction in ranking performance. In this work, we aim to answer if or when the true relevance can be recovered from click data, which is a foundation issue for ULTR field. We first define a ranking model as identifiable if it can recover the true relevance up to a scaling transformation, which is enough for pairwise ranking objective. Then we explore an equivalent condition for identifiability that can be novely expressed as a graph connectivity test problem: if and only if a graph (namely identifiability graph, or IG) constructed on the underlying structure of the dataset is connected, we can guarantee that the relevance can be correctly recovered. When the IG is not connected, there may be bad cases leading to poor ranking performance. To address this issue, we propose two methods, namely node intervention and node merging, to modify the dataset and restore connectivity of the IG. Empirical results obtained on a simulation dataset and two LTR benchmark datasets confirm the validity of our proposed theorems and show the effectiveness of our methods in mitigating data bias when the relevance model is unidentifiable.
    摘要 无偏学习排序(ULTR)在现代系统中被广泛用于从有偏的点击日志中训练无偏排序模型,其关键在于显式地对用户行为的生成过程建模,并基于检验假设来拟合点击数据。以往研究的经验结论是,只要点击被完美拟合,真实的潜在相关性在大多数情况下都能被恢复。然而,我们证明这并不总是可行的,并会导致排序性能显著下降。在这项工作中,我们试图回答真实相关性能否、以及何时能从点击数据中恢复,这是ULTR领域的一个基础问题。我们首先定义:若一个排序模型能够在相差一个缩放变换的意义下恢复真实相关性,则称其是可辨识的,这对成对排序目标而言已经足够。随后我们探索了一个等价的可辨识性条件,它可以新颖地表述为图连通性检验问题:当且仅当基于数据集底层结构构建的图(即可辨识图,IG)是连通的,我们才能保证相关性被正确恢复。当IG不连通时,可能出现导致排序性能变差的坏情形。为解决这一问题,我们提出了节点干预与节点合并两种方法,用以修改数据集并恢复IG的连通性。在一个仿真数据集和两个LTR基准数据集上获得的实验结果验证了所提定理的正确性,并表明当相关性模型不可辨识时,我们的方法能够有效缓解数据偏差的影响。
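
The connectivity test itself is easy to sketch. The exact construction of the identifiability graph is the paper's; for illustration only, the sketch below assumes nodes are result positions and an edge links two positions whenever the same document was displayed at both across the click log, then checks connectivity with union-find.

```python
# Hedged sketch: build a toy "identifiability graph" and test whether it is connected.
from collections import defaultdict

def connected(num_positions, edges):
    """Union-find connectivity test over the graph."""
    parent = list(range(num_positions))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(p) for p in range(num_positions)}) == 1

# Toy click log: (document_id, position) pairs.
log = [("d1", 0), ("d1", 1), ("d2", 1), ("d2", 2), ("d3", 3), ("d3", 4)]

positions_per_doc = defaultdict(set)
for doc, pos in log:
    positions_per_doc[doc].add(pos)

edges = [(a, b) for poss in positions_per_doc.values()
         for a in poss for b in poss if a < b]

# Positions {0,1,2} are linked via d1/d2, but {3,4} form a separate component,
# so this toy dataset would call for node intervention or node merging.
print("IG connected:", connected(5, edges))
```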

Direct Models for Simultaneous Translation and Automatic Subtitling: FBK@IWSLT2023

  • paper_url: http://arxiv.org/abs/2309.15554
  • repo_url: https://github.com/hlt-mt/fbk-fairseq
  • paper_authors: Sara Papi, Marco Gaido, Matteo Negri
  • for: 本研究参加IWSLT 2023评分活动的同时翻译和自动字幕追踪两个追踪,使用直接架构进行两个任务。
  • methods: 我们使用了已经在线上训练的模型来获取实时推断,并将直接ST模型改进以生成符合标准的字幕和时间标签。
  • results: 我们的英德同时翻译系统在2021和2022年的任务中比顶对方系统具有更好的计算感知延迟,优化至多达3.5个BLEU。我们的自动字幕系统在英德和英西二 languages 中优化了3.7和1.7个SubER。
    Abstract This paper describes the FBK's participation in the Simultaneous Translation and Automatic Subtitling tracks of the IWSLT 2023 Evaluation Campaign. Our submission focused on the use of direct architectures to perform both tasks: for the simultaneous one, we leveraged the knowledge already acquired by offline-trained models and directly applied a policy to obtain the real-time inference; for the subtitling one, we adapted the direct ST model to produce well-formed subtitles and exploited the same architecture to produce timestamps needed for the subtitle synchronization with audiovisual content. Our English-German SimulST system shows a reduced computational-aware latency compared to the one achieved by the top-ranked systems in the 2021 and 2022 rounds of the task, with gains of up to 3.5 BLEU. Our automatic subtitling system outperforms the only existing solution based on a direct system by 3.7 and 1.7 SubER in English-German and English-Spanish respectively.

Identifying confounders in deep-learning-based model predictions using DeepRepViz

  • paper_url: http://arxiv.org/abs/2309.15551
  • repo_url: None
  • paper_authors: Roshan Prakash Rane, JiHoon Kim, Arjun Umesha, Didem Stark, Marc-André Schulz, Kerstin Ritter
  • for: This paper aims to help researchers detect and mitigate the impact of confounding variables in deep learning (DL) models when analyzing neuroimaging data.
  • methods: The paper proposes a framework called DeepRepViz, which consists of a metric to quantify the effect of potential confounders and a visualization tool to qualitatively inspect the DL model’s learning process.
  • results: The authors demonstrate the benefits of using DeepRepViz in combination with DL models through experiments on simulated and neuroimaging datasets. For example, the framework identifies sex as a significant confounder in a DL model predicting chronic alcohol users, and age as a confounder in a DL model predicting cognitive task performance.
    Abstract Deep Learning (DL) models are increasingly used to analyze neuroimaging data and uncover insights about the brain, brain pathologies, and psychological traits. However, extraneous `confounders' variables such as the age of the participants, sex, or imaging artifacts can bias model predictions, preventing the models from learning relevant brain-phenotype relationships. In this study, we provide a solution called the `DeepRepViz' framework that enables researchers to systematically detect confounders in their DL model predictions. The framework consists of (1) a metric that quantifies the effect of potential confounders and (2) a visualization tool that allows researchers to qualitatively inspect what the DL model is learning. By performing experiments on simulated and neuroimaging datasets, we demonstrate the benefits of using DeepRepViz in combination with DL models. For example, experiments on the neuroimaging datasets reveal that sex is a significant confounder in a DL model predicting chronic alcohol users (Con-score=0.35). Similarly, DeepRepViz identifies age as a confounder in a DL model predicting participants' performance on a cognitive task (Con-score=0.3). Overall, DeepRepViz enables researchers to systematically test for potential confounders and expose DL models that rely on extraneous information such as age, sex, or imaging artifacts.
    摘要 深度学习(DL)模型越来越多地被用于分析神经影像数据,以揭示关于大脑、脑部疾病和心理特质的洞见。然而,诸如参与者年龄、性别或成像伪影等无关的“混杂”变量可能使模型预测产生偏差,妨碍模型学习真正相关的脑-表型关系。在本研究中,我们提供了一个名为DeepRepViz的框架,使研究人员能够系统性地检测DL模型预测中的混杂因素。该框架包括:(1)一个量化潜在混杂因素影响的度量;(2)一个可视化工具,让研究人员能够定性地检视DL模型学到了什么。通过在仿真数据集和神经影像数据集上的实验,我们展示了将DeepRepViz与DL模型结合使用的益处。例如,在神经影像数据集上的实验表明,性别是预测长期饮酒者的DL模型中的一个显著混杂因素(Con-score=0.35);类似地,DeepRepViz发现年龄是预测参与者认知任务表现的DL模型中的混杂因素(Con-score=0.3)。总体而言,DeepRepViz使研究人员能够系统性地检验潜在混杂因素,并揭示那些依赖年龄、性别或成像伪影等无关信息的DL模型。
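
As a hedged proxy for the kind of check DeepRepViz performs, one can ask how strongly a potential confounder (e.g., sex) can be read out of a model's learned representation. The paper's Con-score is its own metric; the sketch below simply fits a linear probe on stand-in final-layer features as an illustrative substitute.

```python
# Hedged proxy: how decodable is a confounder from a model's representation?
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500
confounder = rng.integers(0, 2, n)                       # e.g., participant sex
# Stand-in "final layer" features that partially encode the confounder.
features = rng.normal(0, 1, (n, 32))
features[:, 0] += 1.5 * confounder

probe = LogisticRegression(max_iter=1000)
auc = cross_val_score(probe, features, confounder, cv=5, scoring="roc_auc").mean()
print(f"confounder decodable from representation with ROC-AUC = {auc:.2f}")
```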

From LAION-5B to LAION-EO: Filtering Billions of Images Using Anchor Datasets for Satellite Image Extraction

  • paper_url: http://arxiv.org/abs/2309.15535
  • repo_url: None
  • paper_authors: Mikolaj Czerkawski, Alistair Francis
  • for: 这种研究是为了提取卫星图像域的特定子集而设计的。
  • methods: 这种方法使用了参考 dataset,然后进行进一步的筛选,以提取卫星图像域的特定子集。
  • results: 这种方法导致了一个名为 LAION-EO 的 dataset 的发布,该 dataset 包含高分辨率像素的文本和卫星图像的对应对。 paper 还介绍了数据采集过程以及数据集的一些特性。
    Abstract Large datasets, such as LAION-5B, contain a diverse distribution of images shared online. However, extraction of domain-specific subsets of large image corpora is challenging. The extraction approach based on an anchor dataset, combined with further filtering, is proposed here and demonstrated for the domain of satellite imagery. This results in the release of LAION-EO, a dataset sourced from the web containing pairs of text and satellite images in high (pixel-wise) resolution. The paper outlines the acquisition procedure as well as some of the features of the dataset.
    摘要 大量的数据集,如LAION-5B,包含丰富多样化的在线图像分布。然而,抽取具有域特定特点的图像集的大数据集是一项挑战。本文提出了基于锚点集的抽取方法,并通过进一步的筛选,在卫星图像领域实现了LAION-EO数据集的生成。该数据集包含高分辨率像素的文本和卫星图像对。文章还介绍了数据集的获取过程以及一些特性。
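
To illustrate the anchor-based filtering idea, here is a hedged sketch that keeps web images whose embeddings are close to the centroid of an anchor set of known satellite images. Real LAION-scale filtering works on CLIP embeddings of billions of samples; the vectors and threshold below are stand-ins.

```python
# Hedged sketch: anchor-dataset filtering by cosine similarity in embedding space.
import numpy as np

rng = np.random.default_rng(0)
dim = 512
anchor = rng.normal(0, 1, (100, dim))          # embeddings of known satellite images
candidates = rng.normal(0, 1, (10_000, dim))   # embeddings of web-crawled images

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

centroid = normalize(normalize(anchor).mean(axis=0))
sims = normalize(candidates) @ centroid        # cosine similarity to the anchor centroid

threshold = 0.1                                # assumed; would be tuned on a validation split
selected = np.where(sims > threshold)[0]
print(f"kept {len(selected)} of {len(candidates)} candidates")
```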

Cyber Security Requirements for Platforms Enhancing AI Reproducibility

  • paper_url: http://arxiv.org/abs/2309.15525
  • repo_url: None
  • paper_authors: Polra Victor Falade
  • for: 本研究旨在应对人工智能(AI)研究中与安全相关的挑战,并保障AI领域研究的可复现性。
  • methods: 本研究提出了一个从网络安全角度评估AI可复现性平台的新框架,并据此评估了五个流行的AI可复现性平台。
  • results: 分析发现,这些平台中没有一个完整地涵盖了稳健可复现性所必需的网络安全措施;相对而言,Kaggle和Codalab在实施覆盖安全、隐私、可用性和信任等方面的网络安全措施上表现更好。
    Abstract Scientific research is increasingly reliant on computational methods, posing challenges for ensuring research reproducibility. This study focuses on the field of artificial intelligence (AI) and introduces a new framework for evaluating AI platforms for reproducibility from a cyber security standpoint to address the security challenges associated with AI research. Using this framework, five popular AI reproducibility platforms; Floydhub, BEAT, Codalab, Kaggle, and OpenML were assessed. The analysis revealed that none of these platforms fully incorporates the necessary cyber security measures essential for robust reproducibility. Kaggle and Codalab, however, performed better in terms of implementing cyber security measures covering aspects like security, privacy, usability, and trust. Consequently, the study provides tailored recommendations for different user scenarios, including individual researchers, small laboratories, and large corporations. It emphasizes the importance of integrating specific cyber security features into AI platforms to address the challenges associated with AI reproducibility, ultimately advancing reproducibility in this field. Moreover, the proposed framework can be applied beyond AI platforms, serving as a versatile tool for evaluating a wide range of systems and applications from a cyber security perspective.
    摘要 科学研究日益依赖计算方法,这给保障研究可复现性带来了挑战。本研究聚焦人工智能(AI)领域,提出了一个从网络安全角度评估AI平台可复现性的新框架,以应对AI研究相关的安全挑战。利用该框架,我们评估了五个流行的AI可复现性平台:Floydhub、BEAT、Codalab、Kaggle和OpenML。分析表明,这些平台中没有一个完整地涵盖了稳健可复现性所必需的网络安全措施;相对而言,Kaggle和Codalab在实施覆盖安全、隐私、可用性和信任等方面的网络安全措施上表现更好。据此,本研究为个人研究者、小型实验室和大型企业等不同用户场景提供了有针对性的建议,并强调在AI平台中集成特定网络安全功能以应对AI可复现性挑战、进而推动该领域可复现性发展的重要性。此外,所提出的框架不仅适用于AI平台,还可作为从网络安全角度评估各类系统和应用的通用工具。

Robust Internal Representations for Domain Generalization

  • paper_url: http://arxiv.org/abs/2309.15522
  • repo_url: None
  • paper_authors: Mohammad Rostami
  • for: 本研究是一篇概述我在转移学习中的研究成果,尤其是在缺乏标签数据和连续学习中遇到的挑战。
  • methods: 本研究利用嵌入空间进行迁移学习,涵盖少样本学习、零样本学习、持续学习、领域自适应和分布式学习等多种设定。
  • results: 本研究提供了一个抽象的转移学习概述,为未来的研究人员提供了一个前瞻性的视角,帮助他们更好地理解和发展转移学习领域。
    Abstract This paper which is part of the New Faculty Highlights Invited Speaker Program of AAAI'23, serves as a comprehensive survey of my research in transfer learning by utilizing embedding spaces. The work reviewed in this paper specifically revolves around the inherent challenges associated with continual learning and limited availability of labeled data. By providing an overview of my past and ongoing contributions, this paper aims to present a holistic understanding of my research, paving the way for future explorations and advancements in the field. My research delves into the various settings of transfer learning, including, few-shot learning, zero-shot learning, continual learning, domain adaptation, and distributed learning. I hope this survey provides a forward-looking perspective for researchers who would like to focus on similar research directions.
    摘要 这篇论文是AAAI'23年新教授精彩报告系列之一,它是我在使用嵌入空间进行转移学习的研究概述。这篇论文的工作具有继续学习和标注数据的有限性等内在挑战。通过对我过去和当前研究的概述,这篇论文希望能够为未来的探索和进步提供一个整体的理解,并为相关领域的研究者提供前瞻性的视角。我的研究涉及到转移学习的不同场景,包括几shot学习、零shot学习、继续学习、领域适应和分布式学习。我希望这份报告能够为研究者们提供一个前瞻性的视角,以便他们可以专注于类似的研究方向。

Raijū: Reinforcement Learning-Guided Post-Exploitation for Automating Security Assessment of Network Systems

  • paper_url: http://arxiv.org/abs/2309.15518
  • repo_url: None
  • paper_authors: Van-Hau Pham, Hien Do Hoang, Phan Thanh Trung, Van Dinh Quoc, Trong-Nghia To, Phan The Duy
  • for: This paper aims to propose a Reinforcement Learning (RL)-driven automation approach to assist penetration testers in quickly implementing the process of post-exploitation for security-level evaluation in network systems.
  • methods: The proposed approach uses two RL algorithms, Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO), to train specialized agents capable of making intelligent actions, which are Metasploit modules to automatically launch attacks of privileges escalation, gathering hashdump, and lateral movement.
  • results: The agents automatically select actions and launch attacks on four real environments with over 84% successful attacks and under 55 attack steps given. The A2C algorithm has proven to be extremely effective in the selection of proper actions for automation of post-exploitation.
    Abstract In order to assess the risks of a network system, it is important to investigate the behaviors of attackers after successful exploitation, which is called post-exploitation. Although there are various efficient tools supporting post-exploitation implementation, no application can automate this process. Most of the steps of this process are completed by experts who have profound knowledge of security, known as penetration testers or pen-testers. To this end, our study proposes the Raij\=u framework, a Reinforcement Learning (RL)-driven automation approach that assists pen-testers in quickly implementing the process of post-exploitation for security-level evaluation in network systems. We implement two RL algorithms, Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO), to train specialized agents capable of making intelligent actions, which are Metasploit modules to automatically launch attacks of privileges escalation, gathering hashdump, and lateral movement. By leveraging RL, we aim to empower these agents with the ability to autonomously select and execute actions that can exploit vulnerabilities in target systems. This approach allows us to automate certain aspects of the penetration testing workflow, making it more efficient and responsive to emerging threats and vulnerabilities. The experiments are performed in four real environments with agents trained in thousands of episodes. The agents automatically select actions and launch attacks on the environments and achieve over 84\% of successful attacks with under 55 attack steps given. Moreover, the A2C algorithm has proved extremely effective in the selection of proper actions for automation of post-exploitation.
    摘要 为评估网络系统的风险,需要调查攻击者在成功攻击后的行为,即后续利用。虽然有各种高效的工具支持后续实施,但没有应用程序可以自动化这个过程。大多数步骤都需要由具有安全知识的专家,即黑客测试员或黑客测试人员完成。为此,我们的研究提出了Raij\=u框架,一种基于强化学习(RL)驱动的自动化方法,可以帮助黑客测试员快速实施后续利用的安全水平评估在网络系统中。我们实现了两种RL算法,即Advantage Actor-Critic(A2C)和Proximal Policy Optimization(PPO),用以训练专门的代理人,以便它们可以具有智能行为,例如Metasploit模块,自动发起特权提升、 hashdump 和 lateral movement 攻击。通过RL,我们想使这些代理人具有攻击漏洞的能力,并且可以自动选择和执行攻击。这种方法可以自动化一些黑客测试工作流程,使其更加高效和应对新的威胁和漏洞。实验在四个真实环境中进行,代理人在千多个回合中被训练。代理人自动选择行动,在环境中发起攻击,达成84%以上的成功率,只需55步。此外,A2C算法在选择合适的行动方面表现出色。

Residual Scheduling: A New Reinforcement Learning Approach to Solving Job Shop Scheduling Problem

  • paper_url: http://arxiv.org/abs/2309.15517
  • repo_url: None
  • paper_authors: Kuo-Hao Ho, Ruei-Yu Jheng, Ji-Han Wu, Fan Chiang, Yen-Chi Chen, Yuan-Yu Wu, I-Chen Wu
  • for: solves the job-shop scheduling problem (JSP) and the flexible job-shop scheduling problem (FJSP)
  • methods: uses deep reinforcement learning (DRL) with graph neural networks (GNN) to construct scheduling solutions
  • results: reaches state-of-the-art (SOTA) performance among all known construction heuristics on most well-known open JSP and FJSP benchmarks, and performs well on large-size instances despite being trained on smaller instances.
    Abstract Job-shop scheduling problem (JSP) is a mathematical optimization problem widely used in industries like manufacturing, and flexible JSP (FJSP) is also a common variant. Since they are NP-hard, it is intractable to find the optimal solution for all cases within reasonable times. Thus, it becomes important to develop efficient heuristics to solve JSP/FJSP. A kind of method of solving scheduling problems is construction heuristics, which constructs scheduling solutions via heuristics. Recently, many methods for construction heuristics leverage deep reinforcement learning (DRL) with graph neural networks (GNN). In this paper, we propose a new approach, named residual scheduling, to solving JSP/FJSP. In this new approach, we remove irrelevant machines and jobs such as those finished, such that the states include the remaining (or relevant) machines and jobs only. Our experiments show that our approach reaches state-of-the-art (SOTA) among all known construction heuristics on most well-known open JSP and FJSP benchmarks. In addition, we also observe that even though our model is trained for scheduling problems of smaller sizes, our method still performs well for scheduling problems of large sizes. Interestingly in our experiments, our approach even reaches zero gap for 49 among 50 JSP instances whose job numbers are more than 150 on 20 machines.
    摘要 Job-shop scheduling problem (JSP) 是一个数学优化问题,广泛应用在生产和制造业等领域。可是,由于 JSP 和 flexible JSP (FJSP) 都是 NP-hard,因此找到全面的解决方案是不可能的。因此,发展高效的规律来解决 JSP/FJSP 的问题非常重要。一种解决 scheduling 问题的方法是建构规律来,这种方法利用深度学习来建立 scheduling 的解决方案。在这篇文章中,我们提出了一个新的方法,即 residual 调度,来解决 JSP/FJSP 问题。在这个新方法中,我们删除不必要的机器和任务,例如已经完成的任务和机器,以只包含剩下的(或有效的)机器和任务。我们的实验结果显示,我们的方法在大多数知名的 open JSP 和 FJSP 测试集上具有最佳性(SOTA)。此外,我们也发现了我们的模型在较小的 scheduling 问题上训练的情况下,我们的方法仍然可以很好地解决大型 scheduling 问题。在我们的实验中,我们甚至发现了我们的方法可以将 49 个 JSP 问题中的任务数量超过 150 的机器组合成零距离。
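The abstract above centers on one concrete trick: before the GNN policy scores candidate actions, finished jobs and released machines are dropped so the state contains only the remaining (relevant) entities. The sketch below is our own minimal illustration of that pruning step, not the authors' code; field names such as `done` and the helper `build_residual_state` are hypothetical.

```python
# Hypothetical illustration of residual state construction for JSP:
# keep only unfinished operations and the machines they still need,
# so the scheduling policy never sees irrelevant (finished) nodes.

def build_residual_state(ops, machines):
    """ops: list of dicts with keys 'job', 'machine', 'done'.
    machines: set of machine ids. Returns the pruned state."""
    remaining_ops = [o for o in ops if not o["done"]]
    relevant_machines = {o["machine"] for o in remaining_ops}
    return {
        "ops": remaining_ops,
        "machines": sorted(relevant_machines & machines),
    }

# toy example
ops = [
    {"job": 0, "machine": 1, "done": True},
    {"job": 0, "machine": 2, "done": False},
    {"job": 1, "machine": 2, "done": False},
]
state = build_residual_state(ops, machines={1, 2, 3})
print(state)  # only the two unfinished operations and machine 2 remain
```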

Teaching Text-to-Image Models to Communicate

  • paper_url: http://arxiv.org/abs/2309.15516
  • repo_url: https://github.com/Sfedfcv/redesigned-pancake
  • paper_authors: Xiaowen Sun, Jiazhan Feng, Yuxuan Wang, Yuxuan Lai, Xingyu Shen, Dongyan Zhao
  • for: Given the dialog context, the model should generate a realistic image that is consistent with the specified conversation as response.
  • methods: We propose an efficient approach for dialog-to-image generation without any intermediate translation, which maximizes the extraction of the semantic information contained in the dialog. We fine-tune pre-trained text-to-image models to enable them to generate images conditioning on processed dialog context.
  • results: Our approach can consistently improve the performance of various models across multiple metrics. Experimental results on public benchmark demonstrate the effectiveness and practicability of our method.
    Abstract Various works have been extensively studied in the research of text-to-image generation. Although existing models perform well in text-to-image generation, there are significant challenges when directly employing them to generate images in dialogs. In this paper, we first highlight a new problem: dialog-to-image generation, that is, given the dialog context, the model should generate a realistic image which is consistent with the specified conversation as response. To tackle the problem, we propose an efficient approach for dialog-to-image generation without any intermediate translation, which maximizes the extraction of the semantic information contained in the dialog. Considering the characteristics of dialog structure, we put segment token before each sentence in a turn of a dialog to differentiate different speakers. Then, we fine-tune pre-trained text-to-image models to enable them to generate images conditioning on processed dialog context. After fine-tuning, our approach can consistently improve the performance of various models across multiple metrics. Experimental results on public benchmark demonstrate the effectiveness and practicability of our method.
    摘要 各种工作在文本到图像生成研究中得到了广泛的研究。虽然现有模型在文本到图像生成方面表现良好,但在直接将其应用于对话中生成图像时存在重要的挑战。在这篇论文中,我们首先强调了一个新的问题:对话到图像生成,即在对话上下文中,模型应该生成一个真实的图像,并且与指定的对话进行响应。为解决这个问题,我们提议一种高效的对话到图像生成方法,不需要任何中间翻译,最大化提取对话中含义的semantic信息。考虑对话结构的特点,我们在对话中每个句子前面添加了分割token,以便 diferenciar不同的说话人。然后,我们使用预训练的文本到图像模型进行微调,以使其能够根据处理后的对话上下文生成图像。经过微调,我们的方法可以在多种纪录下 consistently 提高不同模型的性能。实验结果表明我们的方法是可靠和实用的。
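The key preprocessing step described above is to mark speaker turns with segment tokens before conditioning a text-to-image model on the dialog. The snippet below is a hedged sketch of that flattening step; the exact token strings ("[SPK1]", "[SPK2]") and the helper name are our placeholders, not the paper's choices.

```python
# Hedged sketch: flatten a dialog into a single conditioning string with
# per-speaker segment tokens, in the spirit of the method described above.

def dialog_to_prompt(turns):
    """turns: list of (speaker_id, sentence) tuples."""
    parts = []
    for speaker, sentence in turns:
        parts.append(f"[SPK{speaker}] {sentence.strip()}")
    return " ".join(parts)

dialog = [(1, "I just adopted a puppy."), (2, "Show me a photo of it in the park!")]
prompt = dialog_to_prompt(dialog)
print(prompt)
# A pre-trained text-to-image model would then be fine-tuned to condition on
# prompts of this form, e.g. pipe(prompt).images[0] with a diffusers pipeline.
```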

Enhancing Cross-Category Learning in Recommendation Systems with Multi-Layer Embedding Training

  • paper_url: http://arxiv.org/abs/2309.15881
  • repo_url: None
  • paper_authors: Zihao Deng, Benjamin Ghaemmaghami, Ashish Kumar Singh, Benjamin Cho, Leo Orshansky, Mattan Erez, Michael Orshansky
  • for: 提高推荐系统中 rarely-occurring category 的 embedding 质量
  • methods: 使用 training-time 技巧生成高质量 embedding,并理论解释其效果的 surprising 性
  • results: MLET 可以生成更好的 embedding,并且可以降低 embedding 维度和模型大小,对多个 state-of-the-art 推荐模型进行 click-through rate 预测任务中表现出色,特别是 для rare items
    Abstract Modern DNN-based recommendation systems rely on training-derived embeddings of sparse features. Input sparsity makes obtaining high-quality embeddings for rarely-occurring categories harder as their representations are updated infrequently. We demonstrate a training-time technique to produce superior embeddings via effective cross-category learning and theoretically explain its surprising effectiveness. The scheme, termed the multi-layer embeddings training (MLET), trains embeddings using factorization of the embedding layer, with an inner dimension higher than the target embedding dimension. For inference efficiency, MLET converts the trained two-layer embedding into a single-layer one thus keeping inference-time model size unchanged. Empirical superiority of MLET is puzzling as its search space is not larger than that of the single-layer embedding. The strong dependence of MLET on the inner dimension is even more surprising. We develop a theory that explains both of these behaviors by showing that MLET creates an adaptive update mechanism modulated by the singular vectors of embeddings. When tested on multiple state-of-the-art recommendation models for click-through rate (CTR) prediction tasks, MLET consistently produces better models, especially for rare items. At constant model quality, MLET allows embedding dimension, and model size, reduction by up to 16x, and 5.8x on average, across the models.
    摘要 现代 Deep Neural Network (DNN) 基于推荐系统 rely 于训练得到的含缺特征 embedding。输入缺乏性使得为罕见类目得到高质量 embedding 更加困难,因为它们的表示更新更少。我们提出一种在训练时期进行的技术,称为多层 embedding 训练(MLET),可以生成优秀的 embedding。MLET 使用 embedding 层的因子化,并在内部维度大于目标 embedding 维度。为了提高推理效率,MLET 将训练后的两层 embedding 转换成单层 embedding,以保持推理时模型大小不变。MLET 的实际优势是在它的搜索空间不大于单层 embedding 的情况下表现出色的。此外,MLET 强调内部维度的依赖性也是一种意外的现象。我们提出了一种理论,解释了 MLET 的行为,表明 MLET 创造了一种适应更新机制,该机制被模ulated 由嵌入表示中的几个特征值。当应用于多个 state-of-the-art 推荐模型中,MLET consistently 生成更好的模型,特别是 для 罕见项。在保持模型质量不变的情况下,MLET 允许 embedding 维度和模型大小的减少,最大化到 16x 和 5.8x 的平均值。
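To make the factorization idea concrete, here is a minimal sketch (ours, with made-up dimensions) of training a two-layer embedding with inner dimension k larger than the target dimension d, then folding the product into a single table so inference-time model size is unchanged.

```python
# Minimal MLET-style sketch: train E1 (V x k) followed by a projection (k x d)
# with k > d, then collapse the product into a single V x d embedding table.
import torch
import torch.nn as nn

V, k, d = 1000, 64, 16   # vocab size, inner dim, target embedding dim (k > d)

class MLETEmbedding(nn.Module):
    def __init__(self):
        super().__init__()
        self.inner = nn.Embedding(V, k)          # trained first layer
        self.proj = nn.Linear(k, d, bias=False)  # trained second layer

    def forward(self, ids):
        return self.proj(self.inner(ids))        # used during training

    def collapse(self):
        # single-layer table for inference: (V x k) @ (k x d) -> V x d
        table = (self.inner.weight @ self.proj.weight.t()).detach()
        folded = nn.Embedding(V, d)
        folded.weight.data.copy_(table)
        return folded

emb = MLETEmbedding()
ids = torch.tensor([3, 42, 7])
assert torch.allclose(emb(ids), emb.collapse()(ids), atol=1e-5)
```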

High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models

  • paper_url: http://arxiv.org/abs/2309.15512
  • repo_url: None
  • paper_authors: Chunyu Qiang, Hao Li, Yixin Tian, Yi Zhao, Ying Zhang, Longbiao Wang, Jianwu Dang
  • for: 这个论文主要是为了提出一种基于扩散模型的最小监督 speech synthesis 方法,以减少标注数据量并提高语音质量。
  • methods: 这种方法使用了两种类型的离散语音表示(semantic 和 acoustic),并使用两个 sequence-to-sequence 任务来实现最小监督的语音生成。它还使用 CTAP(Contrastive Token-Acoustic Pretraining)作为中间语义表示,以解决现有语义编码方法中的信息冗余和维度爆炸问题。
  • results: 实验结果表明,我们的提议方法比基eline方法有更高的语音质量和多样性。我们在官方网站上提供了一些音频样本,以便用户可以直接听到结果。
    Abstract Text-to-speech (TTS) methods have shown promising results in voice cloning, but they require a large number of labeled text-speech pairs. Minimally-supervised speech synthesis decouples TTS by combining two types of discrete speech representations(semantic \& acoustic) and using two sequence-to-sequence tasks to enable training with minimal supervision. However, existing methods suffer from information redundancy and dimension explosion in semantic representation, and high-frequency waveform distortion in discrete acoustic representation. Autoregressive frameworks exhibit typical instability and uncontrollability issues. And non-autoregressive frameworks suffer from prosodic averaging caused by duration prediction models. To address these issues, we propose a minimally-supervised high-fidelity speech synthesis method, where all modules are constructed based on the diffusion models. The non-autoregressive framework enhances controllability, and the duration diffusion model enables diversified prosodic expression. Contrastive Token-Acoustic Pretraining (CTAP) is used as an intermediate semantic representation to solve the problems of information redundancy and dimension explosion in existing semantic coding methods. Mel-spectrogram is used as the acoustic representation. Both semantic and acoustic representations are predicted by continuous variable regression tasks to solve the problem of high-frequency fine-grained waveform distortion. Experimental results show that our proposed method outperforms the baseline method. We provide audio samples on our website.
    摘要 文本识别(TTS)方法已经在voice cloning中表现出了有前途的结果,但它们需要大量标注的文本-语音对。不过,现有的方法受到 semantic 表示中的信息重复和维度爆发的问题,以及 discrete acoustic 表示中的高频波形腐败问题。潜在的 autoregressive 框架具有典型的不稳定和不可控问题,而非潜在的框架受到duration prediction模型的平均化问题。为解决这些问题,我们提出了一种 minimally-supervised 高精度语音合成方法,其中所有模块都是基于扩散模型的。非潜在的框架提高了可控性,而 duration 扩散模型允许多样化的语音表达。我们使用 contrastive Token-Acoustic Pretraining(CTAP)作为中间的semantic表示,以解决现有的semantic coding方法中的信息重复和维度爆发问题。Mel-spectrogram 作为 acoustic 表示。两者都是通过连续变量回归任务来解决高频细腐波形腐败问题。实验结果表明,我们的提议方法比基eline方法更高效。我们在官方网站上提供了声音样本。

Towards Human-Like RL: Taming Non-Naturalistic Behavior in Deep RL via Adaptive Behavioral Costs in 3D Games

  • paper_url: http://arxiv.org/abs/2309.15484
  • repo_url: None
  • paper_authors: Kuo-Hao Ho, Ping-Chun Hsieh, Chiu-Chou Lin, You-Ren Luo, Feng-Jian Wang, I-Chen Wu
  • For: The paper aims to train a human-like agent with competitive strength in reinforcement learning, addressing the issue of peculiar gameplay experiences caused by unconstrained agents.
  • Methods: The proposed approach, called Adaptive Behavioral Costs in Reinforcement Learning (ABC-RL), augments behavioral limitations as cost signals in reinforcement learning with dynamically adjusted weights, and minimizes the behavioral costs subject to a constraint of the value function.
  • Results: Through experiments conducted on 3D games in DMLab-30 and Unity ML-Agents Toolkit, the paper demonstrates that ABC-RL achieves the same performance level while significantly reducing instances of shaking and spinning, promoting more natural and human-like behavior during gameplay.
    Abstract In this paper, we propose a new approach called Adaptive Behavioral Costs in Reinforcement Learning (ABC-RL) for training a human-like agent with competitive strength. While deep reinforcement learning agents have recently achieved superhuman performance in various video games, some of these unconstrained agents may exhibit actions, such as shaking and spinning, that are not typically observed in human behavior, resulting in peculiar gameplay experiences. To behave like humans and retain similar performance, ABC-RL augments behavioral limitations as cost signals in reinforcement learning with dynamically adjusted weights. Unlike traditional constrained policy optimization, we propose a new formulation that minimizes the behavioral costs subject to a constraint of the value function. By leveraging the augmented Lagrangian, our approach is an approximation of the Lagrangian adjustment, which handles the trade-off between the performance and the human-like behavior. Through experiments conducted on 3D games in DMLab-30 and Unity ML-Agents Toolkit, we demonstrate that ABC-RL achieves the same performance level while significantly reducing instances of shaking and spinning. These findings underscore the effectiveness of our proposed approach in promoting more natural and human-like behavior during gameplay.
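The abstract describes minimizing behavioral costs subject to a value-function constraint, handled with an augmented Lagrangian. The toy sketch below shows the general dual-ascent mechanism only; the cost definition, value target, and all constants are our illustrative assumptions, not the paper's exact formulation.

```python
# Toy ABC-RL-style trade-off: treat non-human-like actions (e.g. rapid
# frame-to-frame changes, "shaking") as a behavioral cost and adapt a
# Lagrange-style multiplier so the value constraint stays satisfied.
import torch

def behavioral_cost(actions):
    # penalize large frame-to-frame action changes (illustrative choice)
    return (actions[1:] - actions[:-1]).abs().mean()

value_target = 0.9                       # required return level (assumed)
lam, lam_lr = torch.tensor(0.1), 0.05    # multiplier and its step size

def abc_rl_loss(value_estimate, actions):
    """Minimize behavioral cost subject to keeping the value above a target."""
    global lam
    violation = torch.relu(value_target - value_estimate)
    lam = torch.clamp(lam + lam_lr * violation.detach(), min=0.0)  # dual ascent
    return behavioral_cost(actions) + lam * violation

actions = torch.randn(32, 2)                      # dummy rollout of 2-D actions
print(abc_rl_loss(torch.tensor(0.7), actions))    # constraint violated -> lam grows
```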

Enabling Resource-efficient AIoT System with Cross-level Optimization: A survey

  • paper_url: http://arxiv.org/abs/2309.15467
  • repo_url: None
  • paper_authors: Sicong Liu, Bin Guo, Cheng Fang, Ziqi Wang, Shiyan Luo, Zimu Zhou, Zhiwen Yu
  • for: The paper focuses on the development of resource-friendly deep learning (DL) models and model-adaptive system scheduling for artificial intelligence of things (AIoT) applications.
  • methods: The paper explores algorithm-system co-design to optimize resource availability and improve the performance of DL models on resource-scarce infrastructures. The survey covers various granularity levels, including DL models, computation graphs, operators, memory schedules, and hardware instructors in both on-device and distributed paradigms.
  • results: The paper aims to provide a broader optimization space for more free resource-performance tradeoffs and to help readers understand the connections between problems and techniques scattered over diverse levels.
    Abstract The emerging field of artificial intelligence of things (AIoT, AI+IoT) is driven by the widespread use of intelligent infrastructures and the impressive success of deep learning (DL). With the deployment of DL on various intelligent infrastructures featuring rich sensors and weak DL computing capabilities, a diverse range of AIoT applications has become possible. However, DL models are notoriously resource-intensive. Existing research strives to realize near-/realtime inference of AIoT live data and low-cost training using AIoT datasets on resource-scare infrastructures. Accordingly, the accuracy and responsiveness of DL models are bounded by resource availability. To this end, the algorithm-system co-design that jointly optimizes the resource-friendly DL models and model-adaptive system scheduling improves the runtime resource availability and thus pushes the performance boundary set by the standalone level. Unlike previous surveys on resource-friendly DL models or hand-crafted DL compilers/frameworks with partially fine-tuned components, this survey aims to provide a broader optimization space for more free resource-performance tradeoffs. The cross-level optimization landscape involves various granularity, including the DL model, computation graph, operator, memory schedule, and hardware instructor in both on-device and distributed paradigms. Furthermore, due to the dynamic nature of AIoT context, which includes heterogeneous hardware, agnostic sensing data, varying user-specified performance demands, and resource constraints, this survey explores the context-aware inter-/intra-device controllers for automatic cross-level adaptation. Additionally, we identify some potential directions for resource-efficient AIoT systems. By consolidating problems and techniques scattered over diverse levels, we aim to help readers understand their connections and stimulate further discussions.
    摘要 人工智能物联网(AIoT,AI+IoT)领域在广泛使用智能基础设施和深度学习(DL)的成功下发展。通过在具有丰富感知器和软DL计算能力的多种智能基础设施上部署DL模型,AIoT应用程序的多样化化成为可能。然而,DL模型具有资源占用率问题。现有研究寻求实现AIoT实时数据和训练成本低的准确和响应率DL模型。为此,我们需要同时优化资源充足的DL模型和模型适应系统调度。不同于之前关于资源友好DL模型或手动精细DL编译器/框架的调研,本调研旨在为更多的自由资源性能交易提供更广泛的优化空间。AIoT上下文的跨级优化景观包括DL模型、计算图、运算、内存调度和硬件指导等,在 both on-device 和分布式模式下进行跨级优化。此外,由于AIoT上下文的动态特性,包括多样化硬件、多种感知数据、用户指定性能要求和资源限制,我们需要采用智能Device Controller来自动进行跨级调整。此外,我们还提出了一些可能的资源高效AIoT系统的方向。通过汇集多级问题和技术,我们希望读者可以更好地理解他们之间的连接,并促进进一步的讨论。

LogicMP: A Neuro-symbolic Approach for Encoding First-order Logic Constraints

  • paper_url: http://arxiv.org/abs/2309.15458
  • repo_url: None
  • paper_authors: Weidi Xu, Jingwei Wang, Lele Xie, Jianshan He, Hongting Zhou, Taifeng Wang, Xiaopei Wan, Jingdong Chen, Chao Qu, Wei Chu
  • for: 本文提出了一种将逻辑学约束(FOLC)与神经网络结合的新方法,以便模型复杂的关系,满足约束。
  • methods: 本文提出了一种新的神经层 LogicMP,该层对马尔可夫逻辑网络(MLN)执行 mean-field 变分推理。这种层可以插入任意现成的神经网络,以便编码FOLC。
  • results: empirical results表明,LogicMP在图像、文本和图像三种任务中的表现比先进竞争者更高,同时具有更高的效率和性能。
    Abstract Integrating first-order logic constraints (FOLCs) with neural networks is a crucial but challenging problem since it involves modeling intricate correlations to satisfy the constraints. This paper proposes a novel neural layer, LogicMP, whose layers perform mean-field variational inference over an MLN. It can be plugged into any off-the-shelf neural network to encode FOLCs while retaining modularity and efficiency. By exploiting the structure and symmetries in MLNs, we theoretically demonstrate that our well-designed, efficient mean-field iterations effectively mitigate the difficulty of MLN inference, reducing the inference from sequential calculation to a series of parallel tensor operations. Empirical results in three kinds of tasks over graphs, images, and text show that LogicMP outperforms advanced competitors in both performance and efficiency.
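To hint at what "mean-field iterations as parallel tensor operations" can look like, here is a toy layer we wrote for a pairwise special case: variational marginals over ground atoms are refreshed in parallel from neural evidence plus weighted influence of related atoms. LogicMP itself handles general first-order clauses; the pairwise matrix W and all numbers below are simplifying assumptions.

```python
# Toy mean-field message-passing layer in the spirit of a neuro-symbolic
# constraint layer: all atom marginals are updated in parallel each iteration.
import torch

def mean_field_layer(logits, W, n_iters=5):
    """logits: (N,) neural scores for N ground atoms.
    W: (N, N) symmetric weights between atoms (stand-in for grounded rules)."""
    q = torch.sigmoid(logits)
    for _ in range(n_iters):
        q = torch.sigmoid(logits + W @ q)   # one parallel mean-field sweep
    return q

N = 4
logits = torch.tensor([2.0, -1.0, 0.0, 0.5])
W = torch.tensor([[0., 3., 0., 0.],
                  [3., 0., 0., 0.],
                  [0., 0., 0., 1.],
                  [0., 0., 1., 0.]])
print(mean_field_layer(logits, W))  # atom 1 is pulled up by its strong link to atom 0
```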

Local Compressed Video Stream Learning for Generic Event Boundary Detection

  • paper_url: http://arxiv.org/abs/2309.15431
  • repo_url: https://github.com/gx77/lcvsl
  • paper_authors: Libo Zhang, Xin Gu, Congcong Li, Tiejian Luo, Heng Fan
  • for: 本研究旨在提出一种基于压缩视频表示学习方法,用于精准地检测视频中的事件边界。
  • methods: 该方法使用轻量级的ConvNet提取RGB、运动向量、差异和GOP结构中的特征,并通过针对压缩信息的批处理和双向信息流的SCAM模块进行特征提取和束缚。
  • results: 对于Kinetics-GEBD和TAPOS数据集,该方法实现了较大的改进,与之前的端到端方法相比,同时运行速度相同。
    Abstract Generic event boundary detection aims to localize the generic, taxonomy-free event boundaries that segment videos into chunks. Existing methods typically require video frames to be decoded before feeding into the network, which contains significant spatio-temporal redundancy and demands considerable computational power and storage space. To remedy these issues, we propose a novel compressed video representation learning method for event boundary detection that is fully end-to-end leveraging rich information in the compressed domain, i.e., RGB, motion vectors, residuals, and the internal group of pictures (GOP) structure, without fully decoding the video. Specifically, we use lightweight ConvNets to extract features of the P-frames in the GOPs and spatial-channel attention module (SCAM) is designed to refine the feature representations of the P-frames based on the compressed information with bidirectional information flow. To learn a suitable representation for boundary detection, we construct the local frames bag for each candidate frame and use the long short-term memory (LSTM) module to capture temporal relationships. We then compute frame differences with group similarities in the temporal domain. This module is only applied within a local window, which is critical for event boundary detection. Finally a simple classifier is used to determine the event boundaries of video sequences based on the learned feature representation. To remedy the ambiguities of annotations and speed up the training process, we use the Gaussian kernel to preprocess the ground-truth event boundaries. Extensive experiments conducted on the Kinetics-GEBD and TAPOS datasets demonstrate that the proposed method achieves considerable improvements compared to previous end-to-end approach while running at the same speed. The code is available at https://github.com/GX77/LCVSL.
    摘要 通用事件边界检测目标是将视频切分成各个事件边界,以便进行事件分类和识别。现有方法通常需要将视频帧解码为图像,然后将其传输到网络中进行处理,这会带来很大的计算成本和存储空间。为了解决这些问题,我们提出了一种新的压缩视频表示学习方法,可以在压缩域内完全结束地处理视频,而不需要完全解码视频。我们使用轻量级的ConvNet来提取P帧中的特征,并使用空间通道注意机制(SCAM)来细化P帧的特征表示,基于压缩信息的双向信息流。为了学习适合的表示,我们构建了每个候选帧的本地帧袋,并使用长短时间记忆(LSTM)模块来捕捉视频序列中的时间关系。然后,我们计算帧之间的相似性,并使用 Gaussian kernel 预处理真实的事件边界标注。我们的方法在 Kinetics-GEBD 和 TAPOS 数据集上进行了广泛的实验,并达到了较好的性能,而且与之前的端到端方法相比,运行速度相对较快。代码可以在 GitHub 上找到:https://github.com/GX77/LCVSL。
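The abstract mentions preprocessing the ground-truth event boundaries with a Gaussian kernel to soften annotation ambiguity. Below is a small illustration of that idea; the sigma value and frame counts are our assumptions, not the paper's settings.

```python
# Sketch: soften hard 0/1 boundary annotations into a smooth target curve so
# frames near a boundary also receive partial supervision.
import numpy as np

def smooth_boundaries(num_frames, boundary_frames, sigma=2.0):
    t = np.arange(num_frames)[:, None]                  # (T, 1) frame indices
    b = np.asarray(boundary_frames)[None, :]            # (1, K) boundary frames
    soft = np.exp(-((t - b) ** 2) / (2 * sigma ** 2))   # Gaussian bump per boundary
    return soft.max(axis=1)                             # (T,) soft target in [0, 1]

targets = smooth_boundaries(num_frames=20, boundary_frames=[5, 14])
print(np.round(targets, 2))
```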

SimPINNs: Simulation-Driven Physics-Informed Neural Networks for Enhanced Performance in Nonlinear Inverse Problems

  • paper_url: http://arxiv.org/abs/2309.16729
  • repo_url: None
  • paper_authors: Sidney Besnard, Frédéric Jurie, Jalal M. Fadili
  • for: solves inverse problems by leveraging deep learning techniques, with the objective of inferring unknown parameters that govern a physical system based on observed data.
  • methods: builds upon physics-informed neural networks (PINNs) trained with a hybrid loss function that combines observed data with simulated data generated by a known (approximate) physical model.
  • results: surpasses the performance of standard PINNs, providing improved accuracy and robustness, as demonstrated by experimental results on an orbit restitution problem.
    Abstract This paper introduces a novel approach to solve inverse problems by leveraging deep learning techniques. The objective is to infer unknown parameters that govern a physical system based on observed data. We focus on scenarios where the underlying forward model demonstrates pronounced nonlinear behaviour, and where the dimensionality of the unknown parameter space is substantially smaller than that of the observations. Our proposed method builds upon physics-informed neural networks (PINNs) trained with a hybrid loss function that combines observed data with simulated data generated by a known (approximate) physical model. Experimental results on an orbit restitution problem demonstrate that our approach surpasses the performance of standard PINNs, providing improved accuracy and robustness.
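The core of the method is a hybrid loss: a supervised term on labeled pairs plus a physics-consistency term that pushes predicted parameters, when run through a known approximate forward model, to reproduce the observations. The sketch below is our own rendering under that reading; the forward model, network, and weights are stand-ins, not the paper's.

```python
# Hedged sketch of a SimPINN-style hybrid loss for a nonlinear inverse problem.
import torch
import torch.nn.functional as F

def approx_forward_model(theta):
    # placeholder physics: observations as a fixed nonlinear map of parameters
    return torch.stack([theta[:, 0] * torch.cos(theta[:, 1]),
                        theta[:, 0] * torch.sin(theta[:, 1])], dim=1)

def simpinn_loss(net, y_obs, theta_labeled, y_labeled, w_sim=1.0):
    # supervised term on (simulated) labeled pairs
    sup = F.mse_loss(net(y_labeled), theta_labeled)
    # physics-informed term on observations without ground-truth parameters
    theta_hat = net(y_obs)
    sim = F.mse_loss(approx_forward_model(theta_hat), y_obs)
    return sup + w_sim * sim

net = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 2))
y_obs = torch.randn(8, 2)
theta_lab, y_lab = torch.rand(8, 2), torch.randn(8, 2)
print(simpinn_loss(net, y_obs, theta_lab, y_lab))
```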

Graph Neural Prompting with Large Language Models

  • paper_url: http://arxiv.org/abs/2309.15427
  • repo_url: None
  • paper_authors: Yijun Tian, Huan Song, Zichen Wang, Haozhu Wang, Ziqing Hu, Fang Wang, Nitesh V. Chawla, Panpan Xu
  • for: 增强预训练的大语言模型(LLM)在语言理解任务中的表现,提高 LLM 的基础知识捕捉和返回能力。
  • methods: 提出 Graph Neural Prompting(GNP)方法,GNP 包括标准图 neural network Encoder、异种Modal Pooling 模块、域 проекor 和自我超vision连接预测目标。
  • results: 在多个数据集上,GNP 能够在不同的 LLM 大小和设置下提高各种普通常识和生物医学理解任务的表现。
    Abstract Large Language Models (LLMs) have shown remarkable generalization capability with exceptional performance in various language modeling tasks. However, they still exhibit inherent limitations in precisely capturing and returning grounded knowledge. While existing work has explored utilizing knowledge graphs to enhance language modeling via joint training and customized model architectures, applying this to LLMs is problematic owing to their large number of parameters and high computational cost. In addition, how to leverage the pre-trained LLMs and avoid training a customized model from scratch remains an open question. In this work, we propose Graph Neural Prompting (GNP), a novel plug-and-play method to assist pre-trained LLMs in learning beneficial knowledge from KGs. GNP encompasses various designs, including a standard graph neural network encoder, a cross-modality pooling module, a domain projector, and a self-supervised link prediction objective. Extensive experiments on multiple datasets demonstrate the superiority of GNP on both commonsense and biomedical reasoning tasks across different LLM sizes and settings.
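As a rough illustration of the plug-and-play idea, the sketch below encodes a retrieved knowledge-graph subgraph with a tiny GNN, pools it against the question tokens, and projects the result into the LLM embedding space as a soft prompt. The plain adjacency-matrix GNN, layer sizes, and single prompt token are our simplifications, not the paper's architecture.

```python
# Illustrative-only Graph-Neural-Prompting-style module.
import torch
import torch.nn as nn

class GraphNeuralPrompt(nn.Module):
    def __init__(self, node_dim=64, llm_dim=4096, n_prompt=1):
        super().__init__()
        self.gnn = nn.Linear(node_dim, node_dim)              # one propagation step
        self.pool = nn.MultiheadAttention(node_dim, 4, batch_first=True)
        self.projector = nn.Linear(node_dim, llm_dim * n_prompt)
        self.n_prompt, self.llm_dim = n_prompt, llm_dim

    def forward(self, x, adj, text_emb):
        # x: (N, node_dim) node features, adj: (N, N), text_emb: (T, node_dim)
        h = torch.relu(self.gnn(adj @ x))                     # message passing
        pooled, _ = self.pool(text_emb[None], h[None], h[None])  # cross-modal pooling
        graph_vec = pooled.mean(dim=1)                        # (1, node_dim)
        return self.projector(graph_vec).view(self.n_prompt, self.llm_dim)

gnp = GraphNeuralPrompt()
prompt = gnp(torch.randn(5, 64), torch.eye(5), torch.randn(7, 64))
print(prompt.shape)  # (1, 4096): virtual token(s) prepended to the LLM input
```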

Towards the Vulnerability of Watermarking Artificial Intelligence Generated Content

  • paper_url: http://arxiv.org/abs/2310.07726
  • repo_url: None
  • paper_authors: Guanlin Li, Yifei Chen, Jie Zhang, Jiwei Li, Shangwei Guo, Tianwei Zhang
  • for: 本研究旨在探讨社交媒体中人工智能生成内容(AIGC)的许多商业服务,以及这些服务的使用需要高度调控,以确保用户不会违反使用政策(如商业化利用、生成和分发不安全内容)。
  • methods: 本研究使用了许多水印技术,包括潜在扩散模型和大语言模型,来生成创意内容(如真实的图像和流畅的句子) для用户。
  • results: 研究发现, adversary可以轻松破坏这些水印技术,包括两种可能的攻击方式:水印去除和水印forge。WMagi是一个综合性框架,可以实现这两种攻击方式,并且可以保持内容质量。相比之下,现有的扩散模型基于攻击,WMagi是5,050$\sim$11,000$\times$ faster。
    Abstract Artificial Intelligence Generated Content (AIGC) is gaining great popularity in social media, with many commercial services available. These services leverage advanced generative models, such as latent diffusion models and large language models, to generate creative content (e.g., realistic images, fluent sentences) for users. The usage of such generated content needs to be highly regulated, as the service providers need to ensure the users do not violate the usage policies (e.g., abuse for commercialization, generating and distributing unsafe content). Numerous watermarking approaches have been proposed recently. However, in this paper, we show that an adversary can easily break these watermarking mechanisms. Specifically, we consider two possible attacks. (1) Watermark removal: the adversary can easily erase the embedded watermark from the generated content and then use it freely without the regulation of the service provider. (2) Watermark forge: the adversary can create illegal content with forged watermarks from another user, causing the service provider to make wrong attributions. We propose WMaGi, a unified framework to achieve both attacks in a holistic way. The key idea is to leverage a pre-trained diffusion model for content processing, and a generative adversarial network for watermark removing or forging. We evaluate WMaGi on different datasets and embedding setups. The results prove that it can achieve high success rates while maintaining the quality of the generated content. Compared with existing diffusion model-based attacks, WMaGi is 5,050$\sim$11,000$\times$ faster.
    摘要 人工智能生成内容(AIGC)在社交媒体上 gaining popularity,许多商业服务可以提供。这些服务利用先进的生成模型,如潜在扩散模型和大语言模型,为用户生成创ativo内容(如真实的图像和流畅的句子)。使用这些生成内容的使用需要高度调控,因为服务提供者需要确保用户不会违反使用策略(如商业化利用和发布不安全内容)。Recently, numerous watermarking approaches have been proposed, but in this paper, we show that an adversary can easily break these watermarking mechanisms. Specifically, we consider two possible attacks:1. 水印除除:敌对者可以轻松地从生成内容中除除水印,然后使用无需服务提供者的调控。2. 水印forge:敌对者可以从另一名用户的水印中生成非法内容,使服务提供者错误地归因。我们提出WMaGi,一种综合性框架,可以实现这两种攻击。WMaGi 利用预训练的扩散模型进行内容处理,并利用生成对抗网络来除水印或forge水印。我们对不同的数据集和嵌入设置进行评估,结果表明,WMaGi 可以 дости到高成功率,同时保持生成内容的质量。相比 existed 的扩散模型基于攻击,WMaGi 速度比例为5,050$\sim$11,000$\times$快。

The Triad of Failure Modes and a Possible Way Out

  • paper_url: http://arxiv.org/abs/2309.15420
  • repo_url: None
  • paper_authors: Emanuele Sansone
  • for: 解决 cluster-based self-supervised learning(SSL)中的表示 collapse、cluster collapse 和聚类标签置换(label permutation)问题。
  • methods: 提出了一个新的目标函数,该目标函数包括三个关键组成部分:(i)一个生成项,用于抑制表示 collapse;(ii)一个促进数据增强不变性的项,以解决标签置换的问题;(iii)一个均匀项,用于抑制 cluster collapse。
  • results: 对于具体实验,我们的提出的目标函数可以有效地解决表示 collapse、cluster collapse和数据Permutation的问题,并且可以通过标准背部网络 Architecture 进行优化。
    Abstract We present a novel objective function for cluster-based self-supervised learning (SSL) that is designed to circumvent the triad of failure modes, namely representation collapse, cluster collapse, and the problem of invariance to permutations of cluster assignments. This objective consists of three key components: (i) A generative term that penalizes representation collapse, (ii) a term that promotes invariance to data augmentations, thereby addressing the issue of label permutations, and (iii) a uniformity term that penalizes cluster collapse. Additionally, our proposed objective possesses two notable advantages. Firstly, it can be interpreted from a Bayesian perspective as a lower bound on the data log-likelihood. Secondly, it enables the training of a standard backbone architecture without the need for asymmetric elements like stop gradients, momentum encoders, or specialized clustering layers. Due to its simplicity and theoretical foundation, our proposed objective is well-suited for optimization. Experiments on both toy and real world data demonstrate its effectiveness.
    摘要 我们提出了一种用于基于 clustering 的自监督学习(SSL)的新目标函数,旨在解决三种失败模式,即表示 collapse、cluster collapse 和聚类标签 permutations 的问题。这个目标函数包括三个关键组件:(i) 生成项,惩罚表示 collapse;(ii) 对数据增强保持不变性的项,解决标签 permutations 问题;(iii) 统一项,惩罚 cluster collapse。我们提出的目标函数具有两个优点:第一,它可以从 Bayesian 的视角看作数据 log-likelihood 的下界;第二,它可以使用标准的骨干(backbone)架构进行训练,不需要停止梯度、动量编码器或特殊的 clustering 层等非对称元素。由于其简单性和理论基础,我们提出的目标函数适合优化。实验表明,它在玩具数据和真实数据上都有效。
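A compact sketch of a loss with the three-term shape described above: a generative/reconstruction term against representation collapse, an augmentation-invariance term against label permutations, and a uniformity term over cluster assignments against cluster collapse. The concrete choices (MSE reconstruction, KL to the uniform marginal) are our assumptions, not necessarily the paper's exact terms.

```python
# Hedged sketch of a three-term anti-collapse SSL objective.
import torch
import torch.nn.functional as F

def triad_loss(x, x_aug, encoder, decoder, cluster_head):
    z, z_aug = encoder(x), encoder(x_aug)
    gen = F.mse_loss(decoder(z), x)                          # (i) generative term
    p, p_aug = cluster_head(z).softmax(-1), cluster_head(z_aug).softmax(-1)
    inv = F.kl_div(p_aug.log(), p, reduction="batchmean")    # (ii) invariance term
    marginal = p.mean(dim=0)                                 # average cluster usage
    log_k = torch.log(torch.tensor(float(p.size(1))))
    uni = (marginal * (marginal.clamp_min(1e-8).log() + log_k)).sum()  # (iii) KL to uniform
    return gen + inv + uni

enc = torch.nn.Linear(10, 4); dec = torch.nn.Linear(4, 10); head = torch.nn.Linear(4, 3)
x = torch.randn(16, 10)
print(triad_loss(x, x + 0.05 * torch.randn_like(x), enc, dec, head))
```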

Neuro-Inspired Hierarchical Multimodal Learning

  • paper_url: http://arxiv.org/abs/2309.15877
  • repo_url: None
  • paper_authors: Xiongye Xiao, Gengshuo Liu, Gaurav Gupta, Defu Cao, Shixuan Li, Yaxing Li, Tianqing Fang, Mingxi Cheng, Paul Bogdan
  • for: 本研究旨在提高多modalities情况下的感知效果,启发自 neuroscience 学习。
  • methods: 我们提出了一种基于信息理论的层次感知模型(ITHP),利用信息瓶颈原理。与传统的融合模型不同,我们的模型将prime模ality作为输入,剩下的模ality作为检测器在信息路径中。
  • results: 我们的模型在多modalities学习场景下具有明显的性能优势,在MUStARD和CMU-MOSI数据集上经验证明了这一点。
    Abstract Integrating and processing information from various sources or modalities are critical for obtaining a comprehensive and accurate perception of the real world. Drawing inspiration from neuroscience, we develop the Information-Theoretic Hierarchical Perception (ITHP) model, which utilizes the concept of information bottleneck. Distinct from most traditional fusion models that aim to incorporate all modalities as input, our model designates the prime modality as input, while the remaining modalities act as detectors in the information pathway. Our proposed perception model focuses on constructing an effective and compact information flow by achieving a balance between the minimization of mutual information between the latent state and the input modal state, and the maximization of mutual information between the latent states and the remaining modal states. This approach leads to compact latent state representations that retain relevant information while minimizing redundancy, thereby substantially enhancing the performance of downstream tasks. Experimental evaluations on both the MUStARD and CMU-MOSI datasets demonstrate that our model consistently distills crucial information in multimodal learning scenarios, outperforming state-of-the-art benchmarks.
    摘要 将多种来源或模式的信息集成和处理作为获取全面和准确的现实世界认知的关键。 drawing inspiration from neuroscience,我们开发了信息理论层次感知(ITHP)模型,利用信息瓶颈概念。 unlike most traditional fusión模型,我们的模型将首要模式作为输入,而剩下的模式则作为信息路径中的探测器。 our proposed perception model emphasizes constructing an effective and compact information flow by achieving a balance between the minimization of mutual information between the latent state and the input modal state, and the maximization of mutual information between the latent states and the remaining modal states。 this approach leads to compact latent state representations that retain relevant information while minimizing redundancy, thereby substantially enhancing the performance of downstream tasks。 experimental evaluations on both the MUStARD and CMU-MOSI datasets demonstrate that our model consistently distills crucial information in multimodal learning scenarios, outperforming state-of-the-art benchmarks。
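A schematic rendering (ours) of the bottleneck trade-off stated above: keep the latent state compressive with respect to the prime modality while remaining predictive of the remaining modalities. Exact mutual-information estimators are not specified in the abstract; a Gaussian-KL term and prediction errors stand in for the two MI terms here, and all dimensions are made up.

```python
# Illustrative information-bottleneck-style loss for hierarchical multimodal fusion.
import torch
import torch.nn.functional as F

def ithp_like_loss(mu, logvar, z, detectors, other_modalities, beta=1e-3):
    # proxy for I(Z; X_prime): KL(q(z|x_prime) || N(0, I))  -- compression term
    compress = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
    # proxy for I(Z; X_k): how well z predicts each remaining modality
    predict = sum(F.mse_loss(detectors[k](z), x_k)
                  for k, x_k in enumerate(other_modalities))
    return beta * compress + predict   # small, compressive latent that stays predictive

mu, logvar = torch.randn(8, 16), torch.zeros(8, 16)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
detectors = [torch.nn.Linear(16, 32), torch.nn.Linear(16, 12)]
others = [torch.randn(8, 32), torch.randn(8, 12)]
print(ithp_like_loss(mu, logvar, z, detectors, others))
```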

STAG: Enabling Low Latency and Low Staleness of GNN-based Services with Dynamic Graphs

  • paper_url: http://arxiv.org/abs/2309.15875
  • repo_url: None
  • paper_authors: Jiawen Wang, Quan Chen, Deze Zeng, Zhuo Song, Chen Chen, Minyi Guo
  • for: 提高 Graph Neural Networks (GNNs) 服务的精度。
  • methods: 提出了一种名为 STAG 的 GNN 服务框架,它通过协同服务机制和可加性基于增量传播策略来解决邻居爆发问题和重复计算问题,从而实现低延迟和低落后性。
  • results: STAG 可以加速更新阶段的执行速度,并大幅减少落后时间,但是有一定的延迟增加。
    Abstract Many emerging user-facing services adopt Graph Neural Networks (GNNs) to improve serving accuracy. When the graph used by a GNN model changes, representations (embedding) of nodes in the graph should be updated accordingly. However, the node representation update is too slow, resulting in either long response latency of user queries (the inference is performed after the update completes) or high staleness problem (the inference is performed based on stale data). Our in-depth analysis shows that the slow update is mainly due to neighbor explosion problem in graphs and duplicated computation. Based on such findings, we propose STAG, a GNN serving framework that enables low latency and low staleness of GNN-based services. It comprises a collaborative serving mechanism and an additivity-based incremental propagation strategy. With the collaborative serving mechanism, only part of node representations are updated during the update phase, and the final representations are calculated in the inference phase. It alleviates the neighbor explosion problem. The additivity-based incremental propagation strategy reuses intermediate data during the update phase, eliminating duplicated computation problem. Experimental results show that STAG accelerates the update phase by 1.3x~90.1x, and greatly reduces staleness time with a slight increase in response latency.
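The "additivity-based incremental propagation" idea can be seen in one line: with a sum (or mean) aggregator, an updated neighbor's contribution can be swapped into a cached aggregate without recomputing the other neighbors. The toy check below is ours; variable names and sizes are arbitrary.

```python
# Toy additive incremental aggregation for a sum-based GNN aggregator.
import torch

def incremental_aggregate(cached_sum, old_msg, new_msg):
    """Reuse the cached neighborhood sum; replace only the changed message."""
    return cached_sum - old_msg + new_msg

neighbors = torch.randn(100, 32)         # messages from 100 neighbors
cached = neighbors.sum(dim=0)            # computed once, then cached
new_msg = torch.randn(32)                # neighbor 7 changed after a graph update
fast = incremental_aggregate(cached, neighbors[7], new_msg)
slow = torch.cat([neighbors[:7], new_msg[None], neighbors[8:]]).sum(dim=0)
assert torch.allclose(fast, slow, atol=1e-4)
print("incremental update matches full recomputation")
```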

A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future

  • paper_url: http://arxiv.org/abs/2309.15402
  • repo_url: https://github.com/zchuz/cot-reasoning-survey
  • paper_authors: Zheng Chu, Jingchang Chen, Qianglong Chen, Weijiang Yu, Tao He, Haotian Wang, Weihua Peng, Ming Liu, Bing Qin, Ting Liu
  • for: 本文提供了一份严谨的链 reasoning(Chain-of-Thought)研究领域的概述,以帮助研究人员更好地了解这个领域的最新进展。
  • methods: 本文使用了多种方法,包括链 reasoning(XoT)的建构、结构变体和增强XoT,以系统地组织当前的研究工作。
  • results: 本文总结了链 reasoning的前沿应用,包括规划、工具使用和简化等领域的研究进展,并提出了一些未来研究的挑战和方向。
    Abstract Chain-of-thought reasoning, a cognitive process fundamental to human intelligence, has garnered significant attention in the realm of artificial intelligence and natural language processing. However, there still remains a lack of a comprehensive survey for this arena. To this end, we take the first step and present a thorough survey of this research field carefully and widely. We use X-of-Thought to refer to Chain-of-Thought in a broad sense. In detail, we systematically organize the current research according to the taxonomies of methods, including XoT construction, XoT structure variants, and enhanced XoT. Additionally, we describe XoT with frontier applications, covering planning, tool use, and distillation. Furthermore, we address challenges and discuss some future directions, including faithfulness, multi-modal, and theory. We hope this survey serves as a valuable resource for researchers seeking to innovate within the domain of chain-of-thought reasoning.

Neural Stochastic Differential Equations for Robust and Explainable Analysis of Electromagnetic Unintended Radiated Emissions

  • paper_url: http://arxiv.org/abs/2309.15386
  • repo_url: None
  • paper_authors: Sumit Kumar Jha, Susmit Jha, Rickard Ewetz, Alvaro Velasquez
  • for: 这个论文主要用于评估某些模型在隐性辐射检测 task 中的稳定性和解释性,以及提出一种基于神经泛化差分方程(SDE)的新方法来解决这些问题。
  • methods: 这个论文使用了 ResNet-like 模型进行隐性辐射检测 task,并对这些模型进行了广泛的评估。研究发现,ResNet-like 模型在 Gaussian 噪声扰动下 exhibits 的性能会很快下降,其 F1 分数从 0.93 下降至 0.008。此外,研究还发现 ResNet-like 模型对输入数据的解释不准确,缺乏时间不变或周期性的特征。
  • results: 该论文提出了一种基于 SDE 的新方法,可以提高模型的稳定性和解释性。这种方法在面对 Gaussian 噪声扰动时仍然可以保持高的 F1 分数(0.93),而且可以更好地捕捉输入数据中的时间不变或周期性特征。这种新方法可以用于实际的 URE 应用程序中,提供更加稳定和可解释的机器学习预测。
    Abstract We present a comprehensive evaluation of the robustness and explainability of ResNet-like models in the context of Unintended Radiated Emission (URE) classification and suggest a new approach leveraging Neural Stochastic Differential Equations (SDEs) to address identified limitations. We provide an empirical demonstration of the fragility of ResNet-like models to Gaussian noise perturbations, where the model performance deteriorates sharply and its F1-score drops to near insignificance at 0.008 with a Gaussian noise of only 0.5 standard deviation. We also highlight a concerning discrepancy where the explanations provided by ResNet-like models do not reflect the inherent periodicity in the input data, a crucial attribute in URE detection from stable devices. In response to these findings, we propose a novel application of Neural SDEs to build models for URE classification that are not only robust to noise but also provide more meaningful and intuitive explanations. Neural SDE models maintain a high F1-score of 0.93 even when exposed to Gaussian noise with a standard deviation of 0.5, demonstrating superior resilience to ResNet models. Neural SDE models successfully recover the time-invariant or periodic horizontal bands from the input data, a feature that was conspicuously missing in the explanations generated by ResNet-like models. This advancement presents a small but significant step in the development of robust and interpretable models for real-world URE applications where data is inherently noisy and assurance arguments demand interpretable machine learning predictions.
    摘要 我们提出了对具有抗耗能和解释性的ResNet-like模型的全面评估,并建议使用神经泛化方程(SDE)来解决已知的限制。我们通过实验表明,ResNet-like模型对 Gaussian 噪声的抗性不佳,其性能很快下降,F1 分数降至 near insignificance 的 0.008,只需0.5 标准差的 Gaussian 噪声。我们还指出,ResNet-like模型提供的解释不符合输入数据中的自然周期性,这是URE检测中稳定设备的关键特征。为了解决这些问题,我们提议使用神经泛化方程(SDE)建立URE分类模型,这些模型不仅抗耗能强,还能提供更直观和易理解的解释。神经泛化方程模型在对 Gaussian 噪声的抗性方面表现出色,即使 exposed to 0.5 标准差的 Gaussian 噪声,其 F1 分数仍然保持在 0.93 级别。此外,神经泛化方程模型成功地从输入数据中提取了时间不变或周期性的水平带,这是 ResNet-like 模型的解释中缺失的特征。这种进步 represent a small but significant step towards the development of robust and interpretable URE models for real-world applications where data is inherently noisy and assurance arguments demand interpretable machine learning predictions.
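To make the proposed model class concrete, here is a minimal neural-SDE classifier sketch (not the authors' model): learned drift and diffusion networks integrated with Euler-Maruyama, followed by a linear read-out. The injected noise during integration is the mechanism associated with robustness to input perturbations; all sizes and step counts are our assumptions.

```python
# Minimal neural SDE classifier integrated with Euler-Maruyama.
import torch
import torch.nn as nn

class NeuralSDEClassifier(nn.Module):
    def __init__(self, dim=16, n_classes=2, steps=20, dt=0.05):
        super().__init__()
        self.drift = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))
        self.diffusion = nn.Sequential(nn.Linear(dim, dim), nn.Softplus())
        self.readout = nn.Linear(dim, n_classes)
        self.steps, self.dt = steps, dt

    def forward(self, x):
        h = x
        for _ in range(self.steps):  # Euler-Maruyama integration of the hidden state
            noise = torch.randn_like(h) * (self.dt ** 0.5)
            h = h + self.drift(h) * self.dt + self.diffusion(h) * noise
        return self.readout(h)

model = NeuralSDEClassifier()
logits = model(torch.randn(4, 16))
print(logits.shape)  # (4, 2)
```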

Seeing Beyond the Patch: Scale-Adaptive Semantic Segmentation of High-resolution Remote Sensing Imagery based on Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.15372
  • repo_url: None
  • paper_authors: Yinhe Liu, Sunan Shi, Junjue Wang, Yanfei Zhong
  • For: 这篇论文的目的是提出一个动态缩尺框架,以便在遥测影像分析中超过滑动窗口的资讯捕捉。* Methods: 这篇论文使用了一个名为GeoAgent的动态缩尺框架,它可以自动捕捉遥测影像中不同类型的地理物件的对应缩尺资讯。GeoAgent使用了一个全球图示和一个位置几何来表示每个遥测影像 patch 的状态,并通过一个构成单元来强制视觉关系。* Results: 实验结果显示,GeoAgent 比前一代分 segmentation 方法更好地适应大规模地图对象的分类任务,特别是在大规模地图分类任务中。
    Abstract In remote sensing imagery analysis, patch-based methods have limitations in capturing information beyond the sliding window. This shortcoming poses a significant challenge in processing complex and variable geo-objects, which results in semantic inconsistency in segmentation results. To address this challenge, we propose a dynamic scale perception framework, named GeoAgent, which adaptively captures appropriate scale context information outside the image patch based on the different geo-objects. In GeoAgent, each image patch's states are represented by a global thumbnail and a location mask. The global thumbnail provides context beyond the patch, and the location mask guides the perceived spatial relationships. The scale-selection actions are performed through a Scale Control Agent (SCA). A feature indexing module is proposed to enhance the ability of the agent to distinguish the current image patch's location. The action switches the patch scale and context branch of a dual-branch segmentation network that extracts and fuses the features of multi-scale patches. The GeoAgent adjusts the network parameters to perform the appropriate scale-selection action based on the reward received for the selected scale. The experimental results, using two publicly available datasets and our newly constructed dataset WUSU, demonstrate that GeoAgent outperforms previous segmentation methods, particularly for large-scale mapping applications.
    摘要 在遥感影像分析中,patch-based 方法具有限制在滑动窗口之外的信息捕获的缺点,这种缺点对处理复杂多变的地球对象产生了semantic 不一致的segmentation结果。为解决这个挑战,我们提出了一种动态缩放感知框架,名为GeoAgent,该框架可以适应不同的地球对象,并在不同的缩放尺度下进行semantic segmentation。在GeoAgent中,每个图像块的状态被表示为全局缩略图和位置掩码。全局缩略图提供了图像块之外的上下文信息,而位置掩码则引导了感知的空间关系。缩放选择操作由缩放控制代理(SCA)执行,并通过一个Feature Indexing Module来增强代理的能力以辨别当前图像块的位置。当选择缩放scale时,dual-branch segmentation网络的patch scale和context branch会发生变化,以提取和融合多个缩放级别的特征。GeoAgent根据获得的奖励进行参数调整,以实现适当的缩放选择操作。我们通过使用两个公共可用的数据集和我们自己制作的WUSU数据集进行实验,得到的结果表明,GeoAgent在大规模地图应用中表现出色,特别是在semantic segmentation领域。

ACWA: An AI-driven Cyber-Physical Testbed for Intelligent Water Systems

  • paper_url: http://arxiv.org/abs/2310.17654
  • repo_url: https://github.com/ai-vtrc/acwa-data
  • paper_authors: Feras A. Batarseh, Ajay Kulkarni, Chhayly Sreng, Justice Lin, Siam Maksud
  • for: 这篇论文旨在提出一个新的水测试床,即人工智能和网络安全测试床(ACWA),以解决水供应管理领域的挑战。
  • methods: ACWA使用了最新的人工智能和数据驱动技术,包括多种拓扑、传感器、计算节点、泵、水箱、智能水设备,以及控制系统的数据库和人工智能模型。
  • results: ACWA的实验结果表明,这种新的水测试床可以帮助解决水和农业领域的挑战,包括网络安全、资源管理、获取水资源、可持续发展和数据驱动决策等问题。
    Abstract This manuscript presents a novel state-of-the-art cyber-physical water testbed, namely: The AI and Cyber for Water and Agriculture testbed (ACWA). ACWA is motivated by the need to advance water supply management using AI and Cybersecurity experimentation. The main goal of ACWA is to address pressing challenges in the water and agricultural domains by utilising cutting-edge AI and data-driven technologies. These challenges include Cyberbiosecurity, resources management, access to water, sustainability, and data-driven decision-making, among others. To address such issues, ACWA consists of multiple topologies, sensors, computational nodes, pumps, tanks, smart water devices, as well as databases and AI models that control the system. Moreover, we present ACWA simulator, which is a software-based water digital twin. The simulator runs on fluid and constituent transport principles that produce theoretical time series of a water distribution system. This creates a good validation point for comparing the theoretical approach with real-life results via the physical ACWA testbed. ACWA data are available to AI and water domain researchers and are hosted in an online public repository. In this paper, the system is introduced in detail and compared with existing water testbeds; additionally, example use-cases are described along with novel outcomes such as datasets, software, and AI-related scenarios.

C3Net: interatomic potential neural network for prediction of physicochemical properties in heterogenous systems

  • paper_url: http://arxiv.org/abs/2309.15334
  • repo_url: https://github.com/sehanlee/c3net
  • paper_authors: Sehan Lee, Jaechang Lim, Woo Youn Kim
  • for: 这个论文的目的是提出一种深度神经网络模型,用于 atom type embeddings 的分子上的物理化学性质预测。
  • methods: 该模型采用了深度神经网络,并遵循物理法律来预测分子中各个原子的物理化学性质。
  • results: 该模型能够高效地预测分子的物理化学性质,并且在不同的溶剂和环境中的预测结果具有高度的一致性和可预测性。
    Abstract Understanding the interactions of a solute with its environment is of fundamental importance in chemistry and biology. In this work, we propose a deep neural network architecture for atom type embeddings in its molecular context and interatomic potential that follows fundamental physical laws. The architecture is applied to predict physicochemical properties in heterogeneous systems including solvation in diverse solvents, 1-octanol-water partitioning, and PAMPA with a single set of network weights. We show that our architecture is generalized well to the physicochemical properties and outperforms state-of-the-art approaches based on quantum mechanics and neural networks in the task of solvation free energy prediction. The interatomic potentials at each atom in a solute obtained from the model allow quantitative analysis of the physicochemical properties at atomic resolution consistent with chemical and physical reasoning. The software is available at https://github.com/SehanLee/C3Net.
    摘要 理解分子中物质与环境之间的互动是化学和生物中的基本问题。在这项工作中,我们提出了一种深度神经网络架构,用于在分子上的原子类型嵌入和分子间势,该架构遵循物理法律。我们将该架构应用于预测多种不同溶剂中的溶解能,1- octanol-水分配、PAMPA等物理化学性质。我们的结果表明,我们的架构可以准确预测物理化学性质,并且在相比Quantum mechanics和神经网络方法时表现出色。通过这种方法,我们可以在分子级别获得物理化学性质的量化分析,与化学和物理原理一致。软件可以在https://github.com/SehanLee/C3Net上获得。

cs.CL - 2023-09-27

AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model

  • paper_url: http://arxiv.org/abs/2309.16058
  • repo_url: https://github.com/kyegomez/AnyMAL
  • paper_authors: Seungwhan Moon, Andrea Madotto, Zhaojiang Lin, Tushar Nagarajan, Matt Smith, Shashank Jain, Chun-Fu Yeh, Prakash Murugesan, Peyman Heidari, Yue Liu, Kavya Srinet, Babak Damavandi, Anuj Kumar
  • for: 本研究旨在开发一种能够理解多种输入模式信号(文本、图像、视频、声音、IMU运动传感器)的语言模型,并生成文本响应。
  • methods: 本研究使用了现有状态之最高水平LLaMA-2(70B)的文本基于理解能力,并通过预训练对照器模块将多种模式特定信号转换到共同文本空间。
  • results: 我们通过了人工和自动评估,并在多种多Modal任务上达到了状态之最高水平表现。
    Abstract We present Any-Modality Augmented Language Model (AnyMAL), a unified model that reasons over diverse input modality signals (i.e. text, image, video, audio, IMU motion sensor), and generates textual responses. AnyMAL inherits the powerful text-based reasoning abilities of the state-of-the-art LLMs including LLaMA-2 (70B), and converts modality-specific signals to the joint textual space through a pre-trained aligner module. To further strengthen the multimodal LLM's capabilities, we fine-tune the model with a multimodal instruction set manually collected to cover diverse topics and tasks beyond simple QAs. We conduct comprehensive empirical analysis comprising both human and automatic evaluations, and demonstrate state-of-the-art performance on various multimodal tasks.
    摘要 我们介绍Any-Modality Augmented Language Model(AnyMAL),这是一个综合模型,可以处理多种输入模式信号(即文本、图像、视频、音频、IMU运动传感器),并生成文本响应。AnyMAL继承了现状最佳的文本基于推理能力,包括LLaMA-2(70B),并使用预训练的对齐模块将多modal特征信号转化到共同文本空间。为了进一步增强多modal LLM的能力,我们人工收集了多 modal 指令集,以覆盖多种话题和任务,超出简单的QA。我们进行了全面的实验分析,包括人类和自动评估,并在多种多modal任务中达到了状态之册表现。
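As a rough illustration of the pre-trained aligner idea, the sketch below projects a frozen modality encoder's features into a few "soft tokens" in the LLM's text embedding space and prepends them to the prompt embeddings. The single-linear projector, dimensions, and token count are our assumptions, not the released AnyMAL architecture.

```python
# Hedged sketch of a modality-to-text aligner producing soft prompt tokens.
import torch
import torch.nn as nn

class ModalityAligner(nn.Module):
    def __init__(self, feat_dim=1024, llm_dim=4096, n_tokens=32):
        super().__init__()
        self.proj = nn.Linear(feat_dim, llm_dim * n_tokens)
        self.n_tokens, self.llm_dim = n_tokens, llm_dim

    def forward(self, modality_feat):                        # (B, feat_dim)
        out = self.proj(modality_feat)
        return out.view(-1, self.n_tokens, self.llm_dim)     # (B, n_tokens, llm_dim)

aligner = ModalityAligner()
image_feat = torch.randn(2, 1024)                # e.g. from a frozen image encoder
soft_tokens = aligner(image_feat)
text_embeds = torch.randn(2, 12, 4096)           # embedded prompt tokens
llm_input = torch.cat([soft_tokens, text_embeds], dim=1)
print(llm_input.shape)                           # (2, 44, 4096): fed to the LLM
```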

Effective Long-Context Scaling of Foundation Models

  • paper_url: http://arxiv.org/abs/2309.16039
  • repo_url: None
  • paper_authors: Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Karthik Abinav Sankararaman, Barlas Oguz, Madian Khabsa, Han Fang, Yashar Mehdad, Sharan Narang, Kshitiz Malik, Angela Fan, Shruti Bhosale, Sergey Edunov, Mike Lewis, Sinong Wang, Hao Ma
  • for: 这个论文的目的是提出一系列的长文本LLMs,以支持有效的上下文窗口至多32,768个符号。
  • methods: 这些模型使用了 continual pretraining 方法,从 Llama 2 开始,使用更长的训练序列和更大的数据集来培养模型。
  • results: 在语言模型评估、 sintetic context probing 任务以及一系列研究 benchmark 上,这些模型 achieves consistent improvement 和 significant improvement 在 long-context 任务上,并且可以使用 cost-effective 的 instruction tuning 程序来超越 gpt-3.5-turbo-16k 的总性性能。
    Abstract We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens. Our model series are built through continual pretraining from Llama 2 with longer training sequences and on a dataset where long texts are upsampled. We perform extensive evaluation on language modeling, synthetic context probing tasks, and a wide range of research benchmarks. On research benchmarks, our models achieve consistent improvements on most regular tasks and significant improvements on long-context tasks over Llama 2. Notably, with a cost-effective instruction tuning procedure that does not require human-annotated long instruction data, the 70B variant can already surpass gpt-3.5-turbo-16k's overall performance on a suite of long-context tasks. Alongside these results, we provide an in-depth analysis on the individual components of our method. We delve into Llama's position encodings and discuss its limitation in modeling long dependencies. We also examine the impact of various design choices in the pretraining process, including the data mix and the training curriculum of sequence lengths -- our ablation experiments suggest that having abundant long texts in the pretrain dataset is not the key to achieving strong performance, and we empirically verify that long context continual pretraining is more efficient and similarly effective compared to pretraining from scratch with long sequences.
    摘要 我们提出了一系列长上下文 LLM,支持最长 32,768 个 token 的有效上下文窗口。我们的模型系列从 Llama 2 出发,通过持续预训练构建,使用更长的训练序列,并在长文本被上采样的数据集上训练。我们进行了广泛的评估,包括语言建模、合成上下文探测任务和多种研究基准。在研究基准上,我们的模型在大多数常规任务上取得一致提升,在长上下文任务上相对 Llama 2 取得显著提升。特别是,借助一个无需人工标注长指令数据的低成本指令微调流程,70B 版本即可在一组长上下文任务上超越 gpt-3.5-turbo-16k 的总体表现。此外,我们对方法的各个组件进行了深入分析:我们分析了 Llama 的位置编码及其在建模长距离依赖上的局限,并考察了预训练过程中的各种设计选择,包括数据配比和序列长度的训练课程。消融实验表明,预训练数据中包含大量长文本并不是取得强性能的关键;我们还通过实验验证,长上下文持续预训练比用长序列从头预训练更高效,且效果相当。
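
The abstract mentions building the continual-pretraining mix by upsampling long documents. A minimal sketch of length-weighted sampling is shown below; the weighting scheme (probability increasing with a capped document length) is an illustrative assumption, not the paper's exact recipe.

```python
import random

def sample_pretraining_batch(docs, batch_size, length_cap=32768, boost=2.0):
    """Sample documents with probability upweighted for long documents.

    docs: list of strings; boost > 1 upweights documents near the length cap.
    """
    weights = []
    for d in docs:
        frac = min(len(d), length_cap) / length_cap   # 0..1, longer -> larger
        weights.append(1.0 + (boost - 1.0) * frac)    # short docs keep weight 1
    return random.choices(docs, weights=weights, k=batch_size)

corpus = ["short text"] * 90 + ["long " * 5000] * 10
batch = sample_pretraining_batch(corpus, batch_size=8)
print(sum("long" in d for d in batch), "long docs in the batch")
```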

Targeted Image Data Augmentation Increases Basic Skills Captioning Robustness

  • paper_url: http://arxiv.org/abs/2309.15991
  • repo_url: None
  • paper_authors: Valentin Barriere, Felipe del Rio, Andres Carvallo De Ferari, Carlos Aspillaga, Eugenio Herrera-Berg, Cristian Buc Calderon
  • for: 提高模型对图像描述能力的人类化能力(如性别识别)
  • methods: 使用Targeted Image-editing Data Augmentation(TIDA)方法,通过对图像caption进行修改,使模型更好地理解图像的相关结构
  • results: 在Flickr30K benchmark上,TIDA对性别、颜色和数量能力进行了改进,并且在不同的图像描述指标中显示了更好的性能,并进行了细化分析以及对不同的文本生成模型的比较
    Abstract Artificial neural networks typically struggle in generalizing to out-of-context examples. One reason for this limitation is caused by having datasets that incorporate only partial information regarding the potential correlational structure of the world. In this work, we propose TIDA (Targeted Image-editing Data Augmentation), a targeted data augmentation method focused on improving models' human-like abilities (e.g., gender recognition) by filling the correlational structure gap using a text-to-image generative model. More specifically, TIDA identifies specific skills in captions describing images (e.g., the presence of a specific gender in the image), changes the caption (e.g., "woman" to "man"), and then uses a text-to-image model to edit the image in order to match the novel caption (e.g., uniquely changing a woman to a man while maintaining the context identical). Based on the Flickr30K benchmark, we show that, compared with the original data set, a TIDA-enhanced dataset related to gender, color, and counting abilities induces better performance in several image captioning metrics. Furthermore, on top of relying on the classical BLEU metric, we conduct a fine-grained analysis of the improvements of our models against the baseline in different ways. We compared text-to-image generative models and found different behaviors of the image captioning models in terms of encoding visual encoding and textual decoding.
    摘要 人工神经网络通常难以泛化到脱离原有语境的示例。其中一个原因是数据集只包含了现实世界潜在相关性结构的部分信息。在这项工作中,我们提出了 TIDA(Targeted Image-editing Data Augmentation),一种针对性的数据增强方法,通过文本到图像生成模型填补这种相关性结构的缺口,从而提升模型的类人能力(例如性别识别)。具体来说,TIDA 先识别图像描述中涉及的特定技能(例如图像中出现的性别),修改描述(例如将"女性"改为"男性"),然后用文本到图像模型编辑图像,使其与新描述一致(例如只把女性换成男性而保持上下文不变)。基于 Flickr30K 基准,我们表明,与原始数据集相比,针对性别、颜色和计数能力的 TIDA 增强数据集能在多个图像描述指标上带来更好的性能。此外,除了经典的 BLEU 指标,我们还对模型相对于基线的改进进行了细粒度分析,并比较了不同的文本到图像生成模型,发现图像描述模型在视觉编码和文本解码方面表现出不同的行为。
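
The TIDA loop -- identify a skill word in the caption, swap it, then ask a text-to-image editing model to make the image match the new caption -- can be sketched as below. The `edit_image` call is a placeholder for whichever text-guided editing model is used, and the word list is illustrative; neither is the paper's exact configuration.

```python
GENDER_SWAPS = {"woman": "man", "man": "woman", "girl": "boy", "boy": "girl"}

def perturb_caption(caption, swaps):
    """Swap the first skill-related word found in the caption."""
    tokens = caption.split()
    for i, tok in enumerate(tokens):
        key = tok.lower().strip(".,")
        if key in swaps:
            tokens[i] = swaps[key]
            return " ".join(tokens), True
    return caption, False

def edit_image(image, new_caption):
    """Placeholder for a text-guided image-editing model (assumed interface)."""
    raise NotImplementedError

def tida_augment(dataset, swaps=GENDER_SWAPS):
    """Yield (edited_image, new_caption) pairs for captions containing a skill word."""
    for image, caption in dataset:
        new_caption, changed = perturb_caption(caption, swaps)
        if changed:
            yield edit_image(image, new_caption), new_caption

print(perturb_caption("A woman riding a bike.", GENDER_SWAPS))
```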

Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing

  • paper_url: http://arxiv.org/abs/2309.15826
  • repo_url: None
  • paper_authors: Brian Yan, Xuankai Chang, Antonios Anastasopoulos, Yuya Fujita, Shinji Watanabe
  • for: 这项研究的目的是提出一种具有硬件共享参数的ST/MT多任务框架,以提高speech-to-text翻译的效果。
  • methods: 该方法使用一个预处理阶段,将speech和text输入转换为两个不同的字符序列,以便模型可以使用共同词库处理两种模式。
  • results: 实验结果表明,该方法在无需外部 MT 数据的情况下,可将注意力编码器-解码器、CTC、转导器(transducer)以及联合 CTC/注意力模型平均提升 +0.5 BLEU;引入外部 MT 数据可再提升 +0.8 BLEU,并且从预训练文本模型进行迁移学习可提升 +1.8 BLEU。
    Abstract Recent works in end-to-end speech-to-text translation (ST) have proposed multi-tasking methods with soft parameter sharing which leverage machine translation (MT) data via secondary encoders that map text inputs to an eventual cross-modal representation. In this work, we instead propose a ST/MT multi-tasking framework with hard parameter sharing in which all model parameters are shared cross-modally. Our method reduces the speech-text modality gap via a pre-processing stage which converts speech and text inputs into two discrete token sequences of similar length -- this allows models to indiscriminately process both modalities simply using a joint vocabulary. With experiments on MuST-C, we demonstrate that our multi-tasking framework improves attentional encoder-decoder, Connectionist Temporal Classification (CTC), transducer, and joint CTC/attention models by an average of +0.5 BLEU without any external MT data. Further, we show that this framework incorporates external MT data, yielding +0.8 BLEU, and also improves transfer learning from pre-trained textual models, yielding +1.8 BLEU.
    摘要 近期的端到端语音到文本翻译(ST)研究提出了软参数共享的多任务方法,通过次级编码器将文本输入映射到最终的跨模态表示,从而利用机器翻译(MT)数据。在本工作中,我们提出一种硬参数共享的 ST/MT 多任务框架,其中所有模型参数都跨模态共享。我们通过一个预处理阶段缩小语音与文本之间的模态差距:将语音和文本输入转换成长度相近的两个离散词元序列,使模型可以用同一个联合词表不加区分地处理两种模态。在 MuST-C 上的实验表明,在不使用任何外部 MT 数据的情况下,我们的多任务框架可将注意力编码器-解码器、连接时序分类(CTC)、转导器以及联合 CTC/注意力模型平均提升 +0.5 BLEU。此外,该框架还能融入外部 MT 数据,带来 +0.8 BLEU 的提升,并能改进从预训练文本模型的迁移学习,带来 +1.8 BLEU 的提升。
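
The hard-sharing setup hinges on mapping both modalities to token sequences over one joint vocabulary. A minimal sketch: speech is represented by discrete unit IDs (e.g. from k-means over self-supervised features) that are offset past the text subword IDs, so a single embedding table serves both. The offset scheme below is an assumption for illustration only.

```python
def build_joint_ids(text_ids, speech_unit_ids, text_vocab_size):
    """Place text subword IDs and discrete speech-unit IDs in one ID space."""
    joint_text = list(text_ids)                                    # 0 .. text_vocab_size-1
    joint_speech = [u + text_vocab_size for u in speech_unit_ids]  # shifted past text IDs
    return joint_text, joint_speech

TEXT_VOCAB = 8000      # e.g. BPE vocabulary size (illustrative)
NUM_UNITS = 1000       # e.g. number of k-means clusters over speech features

text_ids = [12, 845, 3]                  # subword IDs for the transcript
speech_units = [7, 7, 31, 990, 31]       # frame-level discrete units
joint_text, joint_speech = build_joint_ids(text_ids, speech_units, TEXT_VOCAB)
print(joint_speech)                       # [8007, 8007, 8031, 8990, 8031]
print("joint vocab size:", TEXT_VOCAB + NUM_UNITS)  # shared embedding table size
```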

Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study

  • paper_url: http://arxiv.org/abs/2309.15800
  • repo_url: None
  • paper_authors: Xuankai Chang, Brian Yan, Kwanghee Choi, Jeeweon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang
  • for: 这篇论文旨在探讨在端到端语音处理模型中使用离散语音单元。
  • methods: 这篇论文使用了去重和子词建模等方法,以进一步压缩语音序列长度。
  • results: 实验结果显示,使用离散语音单元在大多数设置下都能取得良好的结果,并且可以大幅缩短训练时间。
    Abstract Speech signals, typically sampled at rates in the tens of thousands per second, contain redundancies, evoking inefficiencies in sequence modeling. High-dimensional speech features such as spectrograms are often used as the input for the subsequent model. However, they can still be redundant. Recent investigations proposed the use of discrete speech units derived from self-supervised learning representations, which significantly compresses the size of speech data. Applying various methods, such as de-duplication and subword modeling, can further compress the speech sequence length. Hence, training time is significantly reduced while retaining notable performance. In this study, we undertake a comprehensive and systematic exploration into the application of discrete units within end-to-end speech processing models. Experiments on 12 automatic speech recognition, 3 speech translation, and 1 spoken language understanding corpora demonstrate that discrete units achieve reasonably good results in almost all the settings. We intend to release our configurations and trained models to foster future research efforts.
    摘要 语音信号的采样率通常高达每秒数万次,其中包含大量冗余,导致序列建模效率低下。高维语音特征(如声谱图)常被用作后续模型的输入,但仍可能存在冗余。近期研究提出使用由自监督学习表示导出的离散语音单元,可显著压缩语音数据的规模。再结合去重和子词建模等方法,还可以进一步缩短语音序列长度,从而在保持可观性能的同时大幅减少训练时间。在本研究中,我们对在端到端语音处理模型中应用离散单元进行了全面而系统的探索。在 12 个自动语音识别、3 个语音翻译和 1 个口语理解语料库上的实验表明,离散单元在几乎所有设置中都能取得相当不错的结果。我们计划发布我们的配置和训练模型,以促进未来的研究。
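
Two of the compression steps mentioned above are easy to illustrate: collapsing consecutive repeated units (de-duplication), on top of which subword modeling can then be applied. The sketch below shows de-duplication only; it is a generic illustration of the idea, not the paper's exact recipe.

```python
from itertools import groupby

def deduplicate(units):
    """Collapse runs of identical discrete units, e.g. [5,5,5,9,9,2] -> [5,9,2]."""
    return [k for k, _ in groupby(units)]

frame_units = [5, 5, 5, 9, 9, 2, 2, 2, 2, 7]
print(deduplicate(frame_units))                                # [5, 9, 2, 7]
print(len(frame_units), "->", len(deduplicate(frame_units)))   # sequence shortened
```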

Large Language Model Routing with Benchmark Datasets

  • paper_url: http://arxiv.org/abs/2309.15789
  • repo_url: None
  • paper_authors: Tal Shnitzer, Anthony Ou, Mírian Silva, Kate Soule, Yuekai Sun, Justin Solomon, Neil Thompson, Mikhail Yurochkin
  • for: 选择最佳大语言模型(LLM) для新任务
  • methods: 利用数据集改进 router 模型选择
  • results: 在多个任务和场景中提高选择 LLM 的性能
    Abstract There is a rapidly growing number of open-source Large Language Models (LLMs) and benchmark datasets to compare them. While some models dominate these benchmarks, no single model typically achieves the best accuracy in all tasks and use cases. In this work, we address the challenge of selecting the best LLM out of a collection of models for new tasks. We propose a new formulation for the problem, in which benchmark datasets are repurposed to learn a "router" model for this LLM selection, and we show that this problem can be reduced to a collection of binary classification tasks. We demonstrate the utility and limitations of learning model routers from various benchmark datasets, where we consistently improve performance upon using any single model for all tasks.
    摘要 开源大语言模型(LLM)以及用于比较它们的基准数据集的数量正在快速增长。虽然某些模型在这些基准上占据主导地位,但通常没有单一模型能在所有任务和使用场景中都取得最佳准确率。在这项工作中,我们解决了针对新任务从一组模型中选择最佳 LLM 的挑战。我们提出了一种新的问题形式,即将基准数据集重新用于学习一个"路由"模型来完成 LLM 选择,并证明该问题可以归约为一组二分类任务。我们展示了从多个基准数据集学习模型路由器的实用性与局限性,其性能始终优于用任何单个模型处理所有任务。
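
The routing formulation reduces to one binary classifier per candidate LLM: given an input, predict whether that LLM would answer it correctly, then route to the model with the highest predicted success probability. A minimal sketch with scikit-learn follows; the TF-IDF features and logistic-regression scorers are illustrative stand-ins, not the representation the paper actually learns.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Benchmark prompts with per-model correctness labels (1 = that LLM got it right).
prompts = ["sum 2+2", "translate to French: cat", "who wrote Hamlet", "sort [3,1,2]"]
correct = {"llm_a": [1, 0, 1, 1], "llm_b": [0, 1, 1, 0]}

vec = TfidfVectorizer().fit(prompts)
X = vec.transform(prompts)

# One binary classifier per candidate model.
routers = {name: LogisticRegression().fit(X, y) for name, y in correct.items()}

def route(prompt):
    """Return the name of the candidate LLM with the highest predicted success."""
    x = vec.transform([prompt])
    scores = {name: clf.predict_proba(x)[0, 1] for name, clf in routers.items()}
    return max(scores, key=scores.get)

print(route("translate to German: dog"))
```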

Question answering using deep learning in low resource Indian language Marathi

  • paper_url: http://arxiv.org/abs/2309.15779
  • repo_url: None
  • paper_authors: Dhiraj Amin, Sharvari Govilkar, Sagar Kulkarni
  • for: 这个论文是为了研究如何使用Transformer模型创建基于阅读理解的马拉地语问答系统。
  • methods: 本论文使用了多种Transformer模型,包括Multilingual Representations for Indian Languages (MuRIL)、MahaBERT和Indic Bidirectional Encoder Representations from Transformers (IndicBERT),并对这些模型进行了精度调整和微调整,以便在马拉地语阅读理解基数据集上进行测试。
  • results: 研究发现,在多种Transformer模型中,Multilingual Representations for Indian Languages (MuRIL)多语言模型在马拉地语 dataset 上得到了最高的准确率,具体来说是EM分数为0.64和F1分数为0.74。
    Abstract Precise answers are extracted from a text for a given input question in a question answering system. Marathi question answering system is created in recent studies by using ontology, rule base and machine learning based approaches. Recently transformer models and transfer learning approaches are used to solve question answering challenges. In this paper we investigate different transformer models for creating a reading comprehension-based Marathi question answering system. We have experimented on different pretrained Marathi language multilingual and monolingual models like Multilingual Representations for Indian Languages (MuRIL), MahaBERT, Indic Bidirectional Encoder Representations from Transformers (IndicBERT) and fine-tuned it on a Marathi reading comprehension-based data set. We got the best accuracy in a MuRIL multilingual model with an EM score of 0.64 and F1 score of 0.74 by fine tuning the model on the Marathi dataset.
    摘要 问答系统针对给定的输入问题从文本中提取精确答案。近期研究使用本体、规则库和机器学习方法构建了马拉地语问答系统,而最近 Transformer 模型和迁移学习方法也被用于解决问答挑战。在本文中,我们研究了多种 Transformer 模型,用于构建基于阅读理解的马拉地语问答系统。我们在多种预训练的马拉地语多语言和单语言模型(如 Multilingual Representations for Indian Languages (MuRIL)、MahaBERT 和 Indic Bidirectional Encoder Representations from Transformers (IndicBERT))上进行了实验,并在马拉地语阅读理解数据集上进行微调。通过在马拉地语数据集上微调,我们在 MuRIL 多语言模型上取得了最佳准确率,EM 分数为 0.64,F1 分数为 0.74。
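
Extractive reading-comprehension QA of this kind is typically run as a span-prediction pipeline. The sketch below shows how a fine-tuned multilingual checkpoint would be used for inference with Hugging Face transformers; the model identifier is a placeholder to be replaced by a MuRIL checkpoint fine-tuned on Marathi QA data, since the authors' fine-tuned weights are not published with the paper.

```python
from transformers import pipeline

# Placeholder checkpoint: substitute a MuRIL model fine-tuned on Marathi
# reading-comprehension data (the paper's fine-tuned weights are not released,
# so loading the base checkpoint here gives an untrained QA head).
qa = pipeline("question-answering", model="google/muril-base-cased")

context = "पुणे हे महाराष्ट्रातील एक प्रमुख शहर आहे."   # Marathi passage
question = "पुणे कोणत्या राज्यात आहे?"                  # "Which state is Pune in?"

result = qa(question=question, context=context)
print(result["answer"], result["score"])  # predicted answer span and confidence
```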

Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization

  • paper_url: http://arxiv.org/abs/2309.15686
  • repo_url: None
  • paper_authors: Amir Hussein, Brian Yan, Antonios Anastasopoulos, Shinji Watanabe, Sanjeev Khudanpur
  • for: 这篇论文是为了提高端到端语音翻译(E2E-ST)中的准确性和稳定性而写的。
  • methods: 这篇论文使用了目标语言上下文来增强E2E-ST的准确性和稳定性,并使用了语音段长信息来扩大上下文的覆盖范围。此外,它还提出了上下文排除法以确保模型的可靠性。
  • results: 作者的提议的上下文E2E-ST方法比隔离单个句子的E2E-ST方法表现更好,并且在对话语音中,上下文信息主要帮助捕捉上下文风格以及解决 named entities 和 anaphora 等问题。
    Abstract Incorporating longer context has been shown to benefit machine translation, but the inclusion of context in end-to-end speech translation (E2E-ST) remains under-studied. To bridge this gap, we introduce target language context in E2E-ST, enhancing coherence and overcoming memory constraints of extended audio segments. Additionally, we propose context dropout to ensure robustness to the absence of context, and further improve performance by adding speaker information. Our proposed contextual E2E-ST outperforms the isolated utterance-based E2E-ST approach. Lastly, we demonstrate that in conversational speech, contextual information primarily contributes to capturing context style, as well as resolving anaphora and named entities.
    摘要 引入更长的上下文已被证明有助于机器翻译,但在端到端语音翻译(E2E-ST)中利用上下文仍缺乏研究。为填补这一空白,我们在 E2E-ST 中引入目标语言上下文,既提升连贯性,又克服长音频片段带来的内存限制。此外,我们提出上下文丢弃(context dropout)以确保模型在缺失上下文时仍然稳健,并通过加入说话人信息进一步提升性能。我们提出的上下文 E2E-ST 方法优于基于孤立句子的 E2E-ST 方法。最后,我们表明在对话语音中,上下文信息主要有助于捕捉上下文风格,以及解决指代消解和命名实体问题。
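
Context dropout, as described, simply withholds the target-language context with some probability during training so the model stays robust when no context is available. A minimal sketch at the example level follows; the drop probability, separator token, and concatenation scheme are assumptions for illustration.

```python
import random

def make_training_target(prev_translation, current_translation,
                         p_drop=0.3, sep=" <sep> "):
    """Prepend the previous target sentence as context, except when dropped."""
    if prev_translation and random.random() >= p_drop:
        return prev_translation + sep + current_translation
    return current_translation  # the model must also cope with context-free inputs

random.seed(0)
prev, cur = "She opened the door.", "Then she smiled."
for _ in range(3):
    print(make_training_target(prev, cur))
```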

Speech collage: code-switched audio generation by collaging monolingual corpora

  • paper_url: http://arxiv.org/abs/2309.15674
  • repo_url: https://github.com/jsalt2022codeswitchingasr/generating-code-switched-audio
  • paper_authors: Amir Hussein, Dorsa Zeinali, Ondřej Klejch, Matthew Wiesner, Brian Yan, Shammur Chowdhury, Ahmed Ali, Shinji Watanabe, Sanjeev Khudanpur
  • for: 本研究旨在提高自动语音识别(ASR)系统在混合语言(Code-Switching,CS)中的效果,尤其是在数据稀缺的情况下。
  • methods: 本研究提出了一种名为Speech Collage的方法,它可以将单语言 corpora 中的音频段落拼接成CS数据。此外,我们还使用了 overlap-add 方法来提高音频生成的质量。
  • results: 实验结果表明,使用生成的 CS 数据可显著改善语音识别系统:在域内场景和零样本场景下,混合错误率(Mixed-Error Rate)与词错误率(Word-Error Rate)分别相对降低最多 34.4% 和 16.2%。此外,我们还发现 CS 数据增强可以增强模型的语码转换倾向并减少其单语言偏置。
    Abstract Designing effective automatic speech recognition (ASR) systems for Code-Switching (CS) often depends on the availability of the transcribed CS resources. To address data scarcity, this paper introduces Speech Collage, a method that synthesizes CS data from monolingual corpora by splicing audio segments. We further improve the smoothness quality of audio generation using an overlap-add approach. We investigate the impact of generated data on speech recognition in two scenarios: using in-domain CS text and a zero-shot approach with synthesized CS text. Empirical results highlight up to 34.4% and 16.2% relative reductions in Mixed-Error Rate and Word-Error Rate for in-domain and zero-shot scenarios, respectively. Lastly, we demonstrate that CS augmentation bolsters the model's code-switching inclination and reduces its monolingual bias.
    摘要 为语码转换(CS)设计有效的自动语音识别(ASR)系统,往往依赖于已转写的 CS 资源。为应对数据稀缺,本文提出了 Speech Collage 方法,通过拼接单语语料中的音频片段来合成 CS 数据,并进一步使用 overlap-add 方法提升生成音频的平滑度。我们在两种场景下研究了生成数据对语音识别的影响:使用域内 CS 文本,以及使用合成 CS 文本的零样本方法。实验结果表明,在域内和零样本场景下,混合错误率和词错误率分别相对降低最多 34.4% 和 16.2%。最后,我们表明 CS 数据增强可以增强模型的语码转换倾向,并减少其单语言偏置。
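
The overlap-add smoothing used when splicing monolingual segments can be illustrated with a linear crossfade between consecutive waveforms; the crossfade length below is an arbitrary choice, and the random arrays stand in for real audio.

```python
import numpy as np

def overlap_add(a, b, overlap):
    """Concatenate two waveforms with a linear crossfade over `overlap` samples."""
    fade_out = np.linspace(1.0, 0.0, overlap)
    fade_in = 1.0 - fade_out
    return np.concatenate([
        a[:-overlap],
        a[-overlap:] * fade_out + b[:overlap] * fade_in,  # smoothed junction
        b[overlap:],
    ])

sr = 16000
seg_en = np.random.randn(sr)          # 1 s segment from language A (stand-in audio)
seg_ar = np.random.randn(sr)          # 1 s segment from language B (stand-in audio)
mixed = overlap_add(seg_en, seg_ar, overlap=320)   # 20 ms crossfade at 16 kHz
print(mixed.shape)                    # (2 * sr - 320,)
```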

MONOVAB : An Annotated Corpus for Bangla Multi-label Emotion Detection

  • paper_url: http://arxiv.org/abs/2309.15670
  • repo_url: https://github.com/sajaldoes/facebookscraper
  • paper_authors: Sumit Kumar Banshal, Sajal Das, Shumaiya Akter Shammi, Narayan Ranjan Chakraborty
  • for: 这个研究旨在为孟加拉语(Bangla)中的情感识别(ER)和情感分析(SA)领域提供更加精确的多标签情感标注方法,并对这一领域的研究进行探索。
  • methods: 这个研究使用了一种基于上下文的方法,并且使用了 BERT 方法来进行预测。
  • results: 研究发现,使用 BERT 方法可以获得最佳的结果,并且在多个情感类型上进行了多类情感识别。此外,还开发了一个网页应用程序来展示这个预测模型的性能。
    Abstract In recent years, Sentiment Analysis (SA) and Emotion Recognition (ER) have been increasingly popular in the Bangla language, which is the seventh most spoken language throughout the entire world. However, the language is structurally complicated, which makes this field arduous to extract emotions in an accurate manner. Several distinct approaches such as the extraction of positive and negative sentiments as well as multiclass emotions, have been implemented in this field of study. Nevertheless, the extraction of multiple sentiments is an almost untouched area in this language. Which involves identifying several feelings based on a single piece of text. Therefore, this study demonstrates a thorough method for constructing an annotated corpus based on scrapped data from Facebook to bridge the gaps in this subject area to overcome the challenges. To make this annotation more fruitful, the context-based approach has been used. Bidirectional Encoder Representations from Transformers (BERT), a well-known methodology of transformers, have been shown the best results of all methods implemented. Finally, a web application has been developed to demonstrate the performance of the pre-trained top-performer model (BERT) for multi-label ER in Bangla.
    摘要 近年来,情感分析(SA)和情感识别(ER)在孟加拉语中受到越来越多的关注,孟加拉语是全球使用人数第七多的语言。然而,该语言结构复杂,难以精确地提取情感。该领域已经实现了多种不同的方法,例如正面与负面情感提取以及多类情感识别;但在孟加拉语中,基于同一段文本识别多种情感的多标签提取几乎仍是空白。因此,本研究展示了一套完整的方法,基于从 Facebook 抓取的数据构建标注语料库,以填补这一领域的空白并克服相关挑战。为使标注更有价值,我们采用了基于上下文的方法。在所有实现的方法中,知名的 Transformer 方法 BERT(Bidirectional Encoder Representations from Transformers)取得了最佳结果。最后,我们开发了一个 Web 应用,用于展示预训练的最佳模型(BERT)在孟加拉语多标签 ER 上的性能。

Conversational Feedback in Scripted versus Spontaneous Dialogues: A Comparative Analysis

  • paper_url: http://arxiv.org/abs/2309.15656
  • repo_url: None
  • paper_authors: Ildikó Pilán, Laurent Prévot, Hendrik Buschmeier, Pierre Lison
  • for: 这篇论文的目的是分析对话中的反馈phenomena,以及这些现象在自然语言对话和脚本对话之间的差异。
  • methods: 该论文使用了一种神经网络对话动作标签器来EXTRACT对话数据中的 lexical statistics和分类输出,并对英文、法语、德语、匈牙利语、意大利语、日语、挪威语和中文等语言的对话数据进行了分析。
  • results: 论文的两个主要发现是:一是对话反馈在对话副本中比自然对话更少,二是对话副本中含有更多的负反馈。此外,文章还表明了大语言模型也遵循同样的趋势,即对话响应中包含少量的反馈,除非特别地进行了适应自然对话的细化调整。
    Abstract Scripted dialogues such as movie and TV subtitles constitute a widespread source of training data for conversational NLP models. However, the linguistic characteristics of those dialogues are notably different from those observed in corpora of spontaneous interactions. This difference is particularly marked for communicative feedback and grounding phenomena such as backchannels, acknowledgments, or clarification requests. Such signals are known to constitute a key part of the conversation flow and are used by the dialogue participants to provide feedback to one another on their perception of the ongoing interaction. This paper presents a quantitative analysis of such communicative feedback phenomena in both subtitles and spontaneous conversations. Based on dialogue data in English, French, German, Hungarian, Italian, Japanese, Norwegian and Chinese, we extract both lexical statistics and classification outputs obtained with a neural dialogue act tagger. Two main findings of this empirical study are that (1) conversational feedback is markedly less frequent in subtitles than in spontaneous dialogues and (2) subtitles contain a higher proportion of negative feedback. Furthermore, we show that dialogue responses generated by large language models also follow the same underlying trends and include comparatively few occurrences of communicative feedback, except when those models are explicitly fine-tuned on spontaneous dialogues.
    摘要 电影和电视字幕等脚本化对话是对话 NLP 模型训练数据的广泛来源。然而,这类对话的语言特征与自发交流语料中观察到的差异很大,对交流反馈与基础确认(grounding)现象尤为明显,例如附和(backchannel)、确认或澄清请求。这些信号被认为是对话流程的关键组成部分,对话参与者借助它们就当前互动的感知相互提供反馈。本文基于英语、法语、德语、匈牙利语、意大利语、日语、挪威语和中文的对话数据,对字幕与自发对话中的此类交流反馈现象进行了量化分析,提取了词汇统计量以及神经对话行为标注器的分类输出。本实证研究的两个主要发现是:(1)字幕中的对话反馈明显少于自发对话;(2)字幕中负面反馈的比例更高。此外,我们还表明大语言模型生成的对话回复也遵循同样的基本趋势,包含相对较少的交流反馈,除非这些模型被显式地在自发对话上微调。

NLPBench: Evaluating Large Language Models on Solving NLP Problems

  • paper_url: http://arxiv.org/abs/2309.15630
  • repo_url: https://github.com/linxins97/nlpbench
  • paper_authors: Linxin Song, Jieyu Zhang, Lechao Cheng, Pengyuan Zhou, Tianyi Zhou, Irene Li
  • for: 该论文旨在探讨大语言模型(LLMs)在自然语言处理(NLP)领域的问题解决能力。
  • methods: 该论文使用了一个唯一的benchmark dataset,称为NLPBench,包含378个大学水平的NLP问题,涵盖了不同的NLP话题。该论文还使用了高级的提示策略,如链条思维(CoT)和树条思维(ToT)来评估LLMs的表现。
  • results: 该论文的研究发现,高级提示策略的效果并不总是一致,有时会损害小型模型(如 LLAMA-2 13b)的表现。此外,人工评估还揭示了 LLM 在科学问题求解能力上的不足,尤其是逻辑分解与推理方面的弱点对结果影响显著。
    Abstract Recent developments in large language models (LLMs) have shown promise in enhancing the capabilities of natural language processing (NLP). Despite these successes, there remains a dearth of research dedicated to the NLP problem-solving abilities of LLMs. To fill the gap in this area, we present a unique benchmarking dataset, NLPBench, comprising 378 college-level NLP questions spanning various NLP topics sourced from Yale University's prior final exams. NLPBench includes questions with context, in which multiple sub-questions share the same public information, and diverse question types, including multiple choice, short answer, and math. Our evaluation, centered on LLMs such as GPT-3.5/4, PaLM-2, and LLAMA-2, incorporates advanced prompting strategies like the chain-of-thought (CoT) and tree-of-thought (ToT). Our study reveals that the effectiveness of the advanced prompting strategies can be inconsistent, occasionally damaging LLM performance, especially in smaller models like the LLAMA-2 (13b). Furthermore, our manual assessment illuminated specific shortcomings in LLMs' scientific problem-solving skills, with weaknesses in logical decomposition and reasoning notably affecting results.
    摘要 大语言模型(LLM)的最新进展显示出提升自然语言处理(NLP)能力的潜力。尽管取得了这些成功,针对 LLM 解决 NLP 问题能力的研究仍然不足。为填补这一空白,我们提出了一个独特的基准数据集 NLPBench,包含 378 道大学水平的 NLP 问题,涵盖多个 NLP 主题,题目来源于耶鲁大学以往的期末考试。NLPBench 包括带上下文的问题(多个子问题共享同一段公共信息)以及多种题型,包括选择题、简答题和数学题。我们的评估以 GPT-3.5/4、PaLM-2 和 LLAMA-2 等 LLM 为中心,并采用链式思维(CoT)和思维树(ToT)等高级提示策略。研究表明,高级提示策略的效果并不总是一致,有时会损害较小模型(如 LLAMA-2 13b)的表现。此外,人工评估还揭示了 LLM 在科学问题求解上的具体不足,逻辑分解与推理方面的弱点对结果影响尤为明显。
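
The chain-of-thought prompting used in the evaluation is essentially a prompt-construction detail. A minimal sketch of building a CoT-style prompt for one exam question is given below; the instruction wording and the sample question are illustrative, not the benchmark's exact template or content.

```python
def build_cot_prompt(context, question, choices=None):
    """Assemble a chain-of-thought prompt for a single NLP exam question."""
    parts = []
    if context:
        parts.append(f"Context:\n{context}")
    parts.append(f"Question: {question}")
    if choices:
        parts.append("Choices:\n" + "\n".join(f"({c}) {t}" for c, t in choices))
    parts.append("Let's think step by step, then state the final answer.")
    return "\n\n".join(parts)

prompt = build_cot_prompt(
    context="A bigram language model assigns P(w_i | w_{i-1}).",
    question="How many parameters does a bigram model over a vocabulary of size V have?",
    choices=[("A", "V"), ("B", "V^2"), ("C", "2V"), ("D", "V log V")],
)
print(prompt)
```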

Few-Shot Multi-Label Aspect Category Detection Utilizing Prototypical Network with Sentence-Level Weighting and Label Augmentation

  • paper_url: http://arxiv.org/abs/2309.15588
  • repo_url: None
  • paper_authors: Zeyu Wang, Mizuho Iwaihara
  • for: 本研究旨在提高多个标签方面类划分中的准确率,通过使用支持集注意 Mechanism 和增强的标签文本信息。
  • methods: 本研究使用 prototypical network 和注意机制,首先在支持集中计算每个类划分的均值,然后使用 sentence-level 注意机制对每个支持集实例进行权重调整,最后将计算出的投影用于计算查询集中的噪声抑制。
  • results: 实验结果表明,我们的提议方法在 Yelp 数据集四个不同的场景中均有较高的表现,并且超越了所有基线方法。
    Abstract Multi-label aspect category detection is intended to detect multiple aspect categories occurring in a given sentence. Since aspect category detection often suffers from limited datasets and data sparsity, the prototypical network with attention mechanisms has been applied for few-shot aspect category detection. Nevertheless, most of the prototypical networks used so far calculate the prototypes by taking the mean value of all the instances in the support set. This seems to ignore the variations between instances in multi-label aspect category detection. Also, several related works utilize label text information to enhance the attention mechanism. However, the label text information is often short and limited, and not specific enough to discern categories. In this paper, we first introduce support set attention along with the augmented label information to mitigate the noise at word-level for each support set instance. Moreover, we use a sentence-level attention mechanism that gives different weights to each instance in the support set in order to compute prototypes by weighted averaging. Finally, the calculated prototypes are further used in conjunction with query instances to compute query attention and thereby eliminate noises from the query set. Experimental results on the Yelp dataset show that our proposed method is useful and outperforms all baselines in four different scenarios.
    摘要 多标签方面类划分是用于检测给定句子中的多个方面类。由于方面类划分经常受到有限的数据和数据稀缺的限制,因此使用 prototype 网络和注意机制来实现少量的方面类划分。然而,大多数的 prototype 网络使用的是取支持集中的所有实例的平均值来计算prototype。这看似忽略了多标签方面类划分中实例之间的差异。此外,一些相关的工作使用标签文本信息来增强注意机制。然而,标签文本信息通常短暂,有限,不够特别地区分类。在本文中,我们首先引入支持集注意以及增强的标签信息来减少每个支持集实例的噪声。此外,我们使用句子级注意机制,对每个支持集实例进行不同的权重计算,以计算prototype。最后,计算出的prototype被用与查询实例进行计算查询注意,以消除查询集中的噪声。实验结果表明,我们提出的方法在Yelp数据集上表现出色,并在四个不同的场景下超过所有基线。
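
The sentence-level weighting described above replaces the plain mean over support instances with an attention-weighted average. A minimal sketch of the prototype computation follows; scoring each instance against the unweighted mean is an assumed attention choice for illustration, not necessarily the paper's exact scoring function.

```python
import torch
import torch.nn.functional as F

def weighted_prototype(support_embs):
    """support_embs: (K, D) embeddings of the K support instances of one class.

    Scores each instance against the unweighted mean and returns a
    weighted average instead of the plain mean."""
    mean = support_embs.mean(dim=0, keepdim=True)              # (1, D) initial query
    scores = (support_embs @ mean.t()).squeeze(-1)             # (K,) similarity
    weights = F.softmax(scores / support_embs.shape[-1] ** 0.5, dim=0)
    return (weights.unsqueeze(-1) * support_embs).sum(dim=0)   # (D,) prototype

support = torch.randn(5, 768)     # 5-shot support set, 768-d sentence embeddings
proto = weighted_prototype(support)
print(proto.shape)                # torch.Size([768])
```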

Jointly Training Large Autoregressive Multimodal Models

  • paper_url: http://arxiv.org/abs/2309.15564
  • repo_url: https://github.com/kyegomez/MultiModalCrossAttn
  • paper_authors: Emanuele Aiello, Lili Yu, Yixin Nie, Armen Aghajanyan, Barlas Oguz
  • for: 本研究旨在开发一种能够生成高质量多Modal输出的单一模型,以满足现代机器学习领域中的权威需求。
  • methods: 该模型采用了一种模块化的方法,将现有的文本和图像生成模型系统地融合在一起,并 introduce了一种特殊的数据效率的指令调整策略,适应混合多Modal生成任务。
  • results: 研究人员通过对模型进行特定的指令调整,实现了生成高质量多Modal输出的目标,并表明了这种模型在混合多Modal生成任务中的首次应用。
    Abstract In recent years, advances in the large-scale pretraining of language and text-to-image models have revolutionized the field of machine learning. Yet, integrating these two modalities into a single, robust model capable of generating seamless multimodal outputs remains a significant challenge. To address this gap, we present the Joint Autoregressive Mixture (JAM) framework, a modular approach that systematically fuses existing text and image generation models. We also introduce a specialized, data-efficient instruction-tuning strategy, tailored for mixed-modal generation tasks. Our final instruct-tuned model demonstrates unparalleled performance in generating high-quality multimodal outputs and represents the first model explicitly designed for this purpose.
    摘要 近年来,语言模型和文本到图像模型的大规模预训练进展已经为机器学习领域带来了革命性的变革。然而,将这两种模态整合为一个能够生成无缝多模态输出的单一、稳健的模型,仍然是一项重大挑战。为此,我们提出了联合自回归混合(Joint Autoregressive Mixture, JAM)框架,这是一种模块化方法,可系统地融合现有的文本和图像生成模型。我们还提出了一种专为混合模态生成任务设计的数据高效的指令微调策略。最终,经过指令微调的模型在生成高质量多模态输出方面表现卓越,是首个专门为此目的设计的模型。

VideoAdviser: Video Knowledge Distillation for Multimodal Transfer Learning

  • paper_url: http://arxiv.org/abs/2309.15494
  • repo_url: None
  • paper_authors: Yanan Wang, Donghuo Zeng, Shinya Wada, Satoshi Kurihara
  • for: 这个论文旨在解决多模态融合问题,提高多模态融合系统的效率和性能。
  • methods: 该论文提出了一种基于视频知识蒸馏的多模态知识迁移方法,使用基于 CLIP 的教师模型提供多模态知识监督信号,并通过优化一个分步蒸馏目标函数来传递知识。
  • results: 该方法在两个具有挑战性的多模态任务中(MOSI 与 MOSEI 数据集上的视频级情感分析,以及 VEGAS 数据集上的音视频检索)表现出色:学生模型(只需文本模态作为输入)的 MAE 分数最多提升 12.3%;此外,该方法在 VEGAS 数据集上将现有最佳方法的 mAP 分数再提高 3.4%,且推理时无需额外计算。这些结果表明该方法在实现高效率、高性能的多模态迁移学习方面具有优势。
    Abstract Multimodal transfer learning aims to transform pretrained representations of diverse modalities into a common domain space for effective multimodal fusion. However, conventional systems are typically built on the assumption that all modalities exist, and the lack of modalities always leads to poor inference performance. Furthermore, extracting pretrained embeddings for all modalities is computationally inefficient for inference. In this work, to achieve high efficiency-performance multimodal transfer learning, we propose VideoAdviser, a video knowledge distillation method to transfer multimodal knowledge of video-enhanced prompts from a multimodal fundamental model (teacher) to a specific modal fundamental model (student). With an intuition that the best learning performance comes with professional advisers and smart students, we use a CLIP-based teacher model to provide expressive multimodal knowledge supervision signals to a RoBERTa-based student model via optimizing a step-distillation objective loss -- first step: the teacher distills multimodal knowledge of video-enhanced prompts from classification logits to a regression logit -- second step: the multimodal knowledge is distilled from the regression logit of the teacher to the student. We evaluate our method in two challenging multimodal tasks: video-level sentiment analysis (MOSI and MOSEI datasets) and audio-visual retrieval (VEGAS dataset). The student (requiring only the text modality as input) achieves an MAE score improvement of up to 12.3% for MOSI and MOSEI. Our method further enhances the state-of-the-art method by 3.4% mAP score for VEGAS without additional computations for inference. These results suggest the strengths of our method for achieving high efficiency-performance multimodal transfer learning.
    摘要 多模态迁移学习旨在将不同模态的预训练表示转换到共同的领域空间,以实现有效的多模态融合。然而,传统系统通常建立在所有模态都存在的假设之上,一旦缺失模态就会导致推理性能下降;此外,在推理阶段为所有模态提取预训练嵌入的计算开销也很高。在本工作中,为实现高效率、高性能的多模态迁移学习,我们提出了 VideoAdviser,一种视频知识蒸馏方法,将视频增强提示中的多模态知识从一个多模态基础模型(教师)迁移到特定模态的基础模型(学生)。基于"专业的指导者加上聪明的学生才能带来最佳学习效果"的直觉,我们使用基于 CLIP 的教师模型向基于 RoBERTa 的学生模型提供富有表达力的多模态知识监督信号,并通过优化分步蒸馏目标损失实现迁移——第一步:教师将视频增强提示的多模态知识从分类 logits 蒸馏为回归 logit;第二步:再将该多模态知识从教师的回归 logit 蒸馏给学生。我们在两个具有挑战性的多模态任务上评估了该方法:视频级情感分析(MOSI 与 MOSEI 数据集)和音视频检索(VEGAS 数据集)。学生模型(仅需文本模态作为输入)在 MOSI 和 MOSEI 上的 MAE 分数最多提升 12.3%;我们的方法还在 VEGAS 上将现有最佳方法的 mAP 分数再提升 3.4%,且推理时无需额外计算。这些结果表明我们的方法在实现高效率-高性能多模态迁移学习方面的优势。
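
The two-step distillation objective can be sketched as two regression-style losses: the teacher first distills its own classification logits into a regression logit, and the student is then trained to match that regression logit from text-only inputs. The exact mapping from classification logits to a sentiment score used below (expectation over per-class values) is an assumption for illustration, not necessarily the paper's formulation.

```python
import torch
import torch.nn.functional as F

def teacher_regression_logit(cls_logits, class_values):
    """Step 1: collapse the teacher's classification logits into one regression
    logit, here taken as the expectation of per-class sentiment values."""
    probs = F.softmax(cls_logits, dim=-1)        # (B, C)
    return (probs * class_values).sum(dim=-1)    # (B,)

def step_distillation_loss(student_pred, teacher_cls_logits, class_values):
    """Step 2: the (text-only) student regresses onto the teacher's logit."""
    target = teacher_regression_logit(teacher_cls_logits, class_values)
    return F.mse_loss(student_pred, target)

class_values = torch.tensor([-3., -2., -1., 0., 1., 2., 3.])  # MOSI-style sentiment scale
teacher_logits = torch.randn(4, 7)   # teacher sees video-enhanced prompts
student_pred = torch.randn(4)        # student sees only the text modality
print(step_distillation_loss(student_pred, teacher_logits, class_values))
```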

Dynamic Multi-Scale Context Aggregation for Conversational Aspect-Based Sentiment Quadruple Analysis

  • paper_url: http://arxiv.org/abs/2309.15476
  • repo_url: None
  • paper_authors: Yuqing Li, Wenyuan Zhang, Binbin Li, Siyu Jia, Zisen Qi, Xingbang Tan
  • for: 这个研究的目的是提出了一种基于对话结构的强大的 sentiment quadruple 分析方法,以便更好地捕捉对话中的 quadruple 元素。
  • methods: 这个方法使用了一种名为 Dynamic Multi-scale Context Aggregation network (DMCA),它首先利用对话结构生成多级utterance window,然后通过动态层次聚合模块来集成进步的cue。
  • results: 对比基elines,这个方法在实验中表现出了显著的优势,并达到了领域内的状态之术性表现。
    Abstract Conversational aspect-based sentiment quadruple analysis (DiaASQ) aims to extract the quadruple of target-aspect-opinion-sentiment within a dialogue. In DiaASQ, a quadruple's elements often cross multiple utterances. This situation complicates the extraction process, emphasizing the need for an adequate understanding of conversational context and interactions. However, existing work independently encodes each utterance, thereby struggling to capture long-range conversational context and overlooking the deep inter-utterance dependencies. In this work, we propose a novel Dynamic Multi-scale Context Aggregation network (DMCA) to address the challenges. Specifically, we first utilize dialogue structure to generate multi-scale utterance windows for capturing rich contextual information. After that, we design a Dynamic Hierarchical Aggregation module (DHA) to integrate progressive cues between them. In addition, we form a multi-stage loss strategy to improve model performance and generalization ability. Extensive experimental results show that the DMCA model outperforms baselines significantly and achieves state-of-the-art performance.
    摘要 对话式基于方面的情感四元组分析(DiaASQ)旨在从对话中提取"目标-方面-观点-情感"四元组。在 DiaASQ 中,四元组的各元素经常跨越多个话语,这使提取过程更为复杂,也凸显了充分理解对话上下文与交互的必要性。然而,现有工作独立编码每个话语,因而难以捕捉长距离的对话上下文,并忽略了话语间的深层依赖关系。为此,我们提出了一种新的动态多尺度上下文聚合网络(DMCA)来应对这些挑战。具体而言,我们首先利用对话结构生成多尺度话语窗口,以捕捉丰富的上下文信息;然后设计了动态层次聚合模块(DHA),在窗口之间整合渐进线索;此外,我们还构建了多阶段损失策略,以提升模型性能和泛化能力。大量实验结果表明,DMCA 模型显著优于各基线方法,并达到了最先进的性能。
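
Generating multi-scale utterance windows from a dialogue is straightforward to sketch: for each utterance, collect contiguous windows of several sizes that end at it. The window sizes and the "ending at the current utterance" convention are illustrative assumptions, not the paper's exact construction.

```python
def multi_scale_windows(utterances, scales=(1, 3, 5)):
    """For each utterance index, return context windows of increasing size ending at it."""
    windows = []
    for i in range(len(utterances)):
        per_utt = []
        for s in scales:
            start = max(0, i - s + 1)
            per_utt.append(utterances[start : i + 1])
        windows.append(per_utt)
    return windows

dialogue = ["A: The screen is great.", "B: Agreed.",
            "A: But the battery dies fast.", "B: Mine too, quite disappointing."]
for utt, wins in zip(dialogue, multi_scale_windows(dialogue)):
    print(utt, "->", [len(w) for w in wins])   # window lengths per scale
```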

ChatCounselor: A Large Language Models for Mental Health Support

  • paper_url: http://arxiv.org/abs/2309.15461
  • repo_url: https://github.com/emocareai/chatpsychiatrist
  • paper_authors: June M. Liu, Donghao Li, He Cao, Tianhe Ren, Zeyi Liao, Jiamin Wu
  • for: 这个论文旨在提供心理支持,不同于通用的chatbot,它基于专业心理师和客户之间的真实对话,因此具有专业心理知识和辅导技能。
  • methods: 这个解决方案使用了GPT-4和特制的提示来进行辅导,并根据七项心理辅导评价指标来评估辅导响应质量。
  • results: 在辅导基准测试(counseling Bench)中,ChatCounselor 超越了已有的开源模型,表现接近 ChatGPT,显示了高质量领域数据带来的模型能力显著提升。
    Abstract This paper presents ChatCounselor, a large language model (LLM) solution designed to provide mental health support. Unlike generic chatbots, ChatCounselor is distinguished by its foundation in real conversations between consulting clients and professional psychologists, enabling it to possess specialized knowledge and counseling skills in the field of psychology. The training dataset, Psych8k, was constructed from 260 in-depth interviews, each spanning an hour. To assess the quality of counseling responses, the counseling Bench was devised. Leveraging GPT-4 and meticulously crafted prompts based on seven metrics of psychological counseling assessment, the model underwent evaluation using a set of real-world counseling questions. Impressively, ChatCounselor surpasses existing open-source models in the counseling Bench and approaches the performance level of ChatGPT, showcasing the remarkable enhancement in model capability attained through high-quality domain-specific data.
    摘要 本文介绍 ChatCounselor,一种旨在提供心理健康支持的大语言模型(LLM)解决方案。与通用聊天机器人不同,ChatCounselor 的独特之处在于它建立在咨询来访者与专业心理咨询师之间的真实对话之上,因而具备心理学领域的专业知识和咨询技能。训练数据集 Psych8k 由 260 次深度访谈构建而成,每次访谈时长一小时。为评估咨询回复的质量,我们设计了辅导基准测试(counseling Bench)。借助 GPT-4 以及依据七项心理咨询评估指标精心设计的提示,我们使用一组真实的咨询问题对模型进行了评估。令人印象深刻的是,ChatCounselor 在该基准上超越了现有的开源模型,并接近 ChatGPT 的表现水平,展示了高质量领域数据为模型能力带来的显著提升。

Beyond the Chat: Executable and Verifiable Text-Editing with LLMs

  • paper_url: http://arxiv.org/abs/2309.15337
  • repo_url: None
  • paper_authors: Philippe Laban, Jesse Vig, Marti A. Hearst, Caiming Xiong, Chien-Sheng Wu
  • For: The paper aims to provide a more transparent and verifiable editing interface for documents edited with Large Language Models (LLMs).
  • Methods: The proposed interface, called InkSync, suggests executable edits directly within the document being edited, and supports a 3-stage approach to mitigate the risk of factual errors introduced by LLMs.
  • Results: Two usability studies confirm the effectiveness of InkSync's components compared to standard LLM-based chat interfaces, leading to more accurate, more efficient editing, and improved user experience.
    Abstract Conversational interfaces powered by Large Language Models (LLMs) have recently become a popular way to obtain feedback during document editing. However, standard chat-based conversational interfaces do not support transparency and verifiability of the editing changes that they suggest. To give the author more agency when editing with an LLM, we present InkSync, an editing interface that suggests executable edits directly within the document being edited. Because LLMs are known to introduce factual errors, Inksync also supports a 3-stage approach to mitigate this risk: Warn authors when a suggested edit introduces new information, help authors Verify the new information's accuracy through external search, and allow an auditor to perform an a-posteriori verification by Auditing the document via a trace of all auto-generated content. Two usability studies confirm the effectiveness of InkSync's components when compared to standard LLM-based chat interfaces, leading to more accurate, more efficient editing, and improved user experience.
    摘要 大型语言模型(LLM)驱动的对话界面在文档编辑中获得反馈已经变得非常流行。然而,标准的chat界面不支持对编辑建议的透明度和可靠性。为给作者更多的自主权在LLM编辑,我们提出了inksync,一种在文档中直接提供可执行的编辑建议的编辑界面。因为LLM经常引入错误信息,inksync还支持三个阶段来减轻这种风险:警告作者当建议编辑引入新信息时,帮助作者验证新信息的准确性通过外部搜索,并让审核人员通过文档的跟踪来对自动生成的内容进行 posteriori 验证。两个用户研究证明了inksync的组件与标准LLM-based chat界面相比,可以提高精度、效率和用户体验。

cs.LG - 2023-09-27

Label Augmentation Method for Medical Landmark Detection in Hip Radiograph Images

  • paper_url: http://arxiv.org/abs/2309.16066
  • repo_url: None
  • paper_authors: Yehyun Suh, Peter Chan, J. Ryan Martin, Daniel Moyer
  • for: 预测医学影像中的临床标志物
  • methods: 使用仅作用于标签(label-only)的数据增强方案进行自动医学特征点检测,并采用通用 U-Net 架构和两阶段课程进行训练
  • results: 在六个医学影像 dataset 上,使用这种方法可以获得高效的临床标志物预测结果,并且比传统的数据扩充方法更高效
    Abstract This work reports the empirical performance of an automated medical landmark detection method for predict clinical markers in hip radiograph images. Notably, the detection method was trained using a label-only augmentation scheme; our results indicate that this form of augmentation outperforms traditional data augmentation and produces highly sample efficient estimators. We train a generic U-Net-based architecture under a curriculum consisting of two phases: initially relaxing the landmarking task by enlarging the label points to regions, then gradually eroding these label regions back to the base task. We measure the benefits of this approach on six datasets of radiographs with gold-standard expert annotations.
    摘要 本文报告了一种自动医学特征点检测方法在髋关节 X 光图像中预测临床标志物的实证表现。值得注意的是,该检测方法使用仅作用于标签的数据增强方案进行训练;结果表明,这种增强方式优于传统数据增强,并能产生样本效率极高的估计器。我们在一个包含两个阶段的课程下训练通用的 U-Net 架构:先将标志点标签放大为区域以放宽特征点定位任务,再逐步将这些标签区域收缩回原始任务。我们在六个带有金标准专家标注的 X 光数据集上衡量了该方法的收益。
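
The two-phase curriculum -- first enlarging each landmark label into a region, then gradually eroding it back toward a point -- can be sketched as Gaussian heatmap targets whose radius shrinks over training. The schedule and the Gaussian parameterization below are assumptions; the paper describes the idea of relaxing and then tightening the label regions, not this exact recipe.

```python
import numpy as np

def landmark_heatmap(h, w, y, x, sigma):
    """Gaussian 'region' label centred on the landmark (y, x)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - y) ** 2 + (xs - x) ** 2) / (2.0 * sigma ** 2))

def sigma_schedule(epoch, total_epochs, sigma_start=24.0, sigma_end=2.0):
    """Phase 1: large regions; phase 2: erode toward (near-)point labels."""
    frac = min(1.0, epoch / max(1, total_epochs - 1))
    return sigma_start + frac * (sigma_end - sigma_start)

for epoch in [0, 25, 50, 75, 99]:
    sigma = sigma_schedule(epoch, total_epochs=100)
    target = landmark_heatmap(256, 256, y=120, x=200, sigma=sigma)
    print(f"epoch {epoch:3d}: sigma={sigma:5.1f}, active pixels={(target > 0.5).sum()}")
```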

Predicting Cardiovascular Complications in Post-COVID-19 Patients Using Data-Driven Machine Learning Models

  • paper_url: http://arxiv.org/abs/2309.16059
  • repo_url: None
  • paper_authors: Maitham G. Yousif, Hector J. Castro
  • for: 预测 COVID-19 康复后心血管并发症的风险
  • methods: 使用数据驱动的机器学习模型,预测伊拉克 352 名 COVID-19 康复患者中发生心血管并发症的风险
  • results: 机器学习模型在识别高风险患者方面表现出色,早期发现有助于及时干预并改善预后
    Abstract The COVID-19 pandemic has globally posed numerous health challenges, notably the emergence of post-COVID-19 cardiovascular complications. This study addresses this by utilizing data-driven machine learning models to predict such complications in 352 post-COVID-19 patients from Iraq. Clinical data, including demographics, comorbidities, lab results, and imaging, were collected and used to construct predictive models. These models, leveraging various machine learning algorithms, demonstrated commendable performance in identifying patients at risk. Early detection through these models promises timely interventions and improved outcomes. In conclusion, this research underscores the potential of data-driven machine learning for predicting post-COVID-19 cardiovascular complications, emphasizing the need for continued validation and research in diverse clinical settings.
    摘要 COVID-19 大流行在全球造成了诸多健康挑战,其中值得注意的是 COVID-19 康复后心血管并发症的出现。本研究利用数据驱动的机器学习模型,预测伊拉克 352 名 COVID-19 康复患者中此类并发症的发生风险。我们收集了临床数据,包括人口统计学特征、合并症、实验室检查结果和影像学资料,用于构建预测模型。这些模型采用多种机器学习算法,在识别高风险患者方面表现良好。借助这些模型进行早期发现,有望实现及时干预并改善预后。总之,本研究凸显了数据驱动机器学习在预测 COVID-19 后心血管并发症方面的潜力,并强调需要在多样化的临床环境中继续开展验证与研究。
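
The modelling described above is a standard tabular risk-prediction setup. A minimal sketch with scikit-learn is shown below on synthetic data; the feature names, the random-forest model, and all numbers are illustrative stand-ins, not the study's actual variables, algorithm, or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 352
# Synthetic stand-ins for clinical features: age, severity score, comorbidity count.
X = np.column_stack([rng.normal(55, 12, n), rng.integers(1, 5, n), rng.integers(0, 4, n)])
risk = 0.03 * (X[:, 0] - 50) + 0.4 * X[:, 1] + 0.3 * X[:, 2]
y = (risk + rng.normal(0, 1, n) > np.median(risk)).astype(int)  # complication yes/no

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("AUC:", round(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]), 3))
print("feature importances:", clf.feature_importances_.round(2))
```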

Machine Learning-driven Analysis of Gastrointestinal Symptoms in Post-COVID-19 Patients

  • paper_url: http://arxiv.org/abs/2310.00540
  • repo_url: None
  • paper_authors: Maitham G. Yousif, Fadhil G. Al-Amran, Salman Rawaf, Mohammad Abdulla Grmt
  • for: This study aims to investigate the prevalence and patterns of gastrointestinal (GI) symptoms in individuals recovering from COVID-19 and to identify predictive factors for these symptoms using machine learning algorithms.
  • methods: The study uses data from 913 post-COVID-19 patients in Iraq collected during 2022 and 2023. The researchers use machine learning algorithms to identify predictive factors for GI symptoms, including age, gender, disease severity, comorbidities, and the duration of COVID-19 illness.
  • results: The study finds that a notable percentage of post-COVID-19 patients experience GI symptoms during their recovery phase, with diarrhea being the most frequently reported symptom. The researchers also identify significant predictive factors for GI symptoms, including age, gender, disease severity, comorbidities, and the duration of COVID-19 illness.
    Abstract The COVID-19 pandemic, caused by the novel coronavirus SARS-CoV-2, has posed significant health challenges worldwide. While respiratory symptoms have been the primary focus, emerging evidence has highlighted the impact of COVID-19 on various organ systems, including the gastrointestinal (GI) tract. This study, based on data from 913 post-COVID-19 patients in Iraq collected during 2022 and 2023, investigates the prevalence and patterns of GI symptoms in individuals recovering from COVID-19 and leverages machine learning algorithms to identify predictive factors for these symptoms. The research findings reveal that a notable percentage of post-COVID-19 patients experience GI symptoms during their recovery phase. Diarrhea emerged as the most frequently reported symptom, followed by abdominal pain and nausea. Machine learning analysis uncovered significant predictive factors for GI symptoms, including age, gender, disease severity, comorbidities, and the duration of COVID-19 illness. These findings underscore the importance of monitoring and addressing GI symptoms in post-COVID-19 care, with machine learning offering valuable tools for early identification and personalized intervention. This study contributes to the understanding of the long-term consequences of COVID-19 on GI health and emphasizes the potential benefits of utilizing machine learning-driven analysis in predicting and managing these symptoms. Further research is warranted to delve into the mechanisms underlying GI symptoms in COVID-19 survivors and to develop targeted interventions for symptom management. Keywords: COVID-19, gastrointestinal symptoms, machine learning, predictive factors, post-COVID-19 care, long COVID.
    摘要 由新型冠状病毒 SARS-CoV-2 引起的 COVID-19 大流行在全球带来了重大健康挑战。尽管呼吸道症状一直是关注重点,但越来越多的证据表明 COVID-19 会影响包括胃肠道(GI)在内的多个器官系统。本研究基于 2022 至 2023 年间在伊拉克收集的 913 名 COVID-19 康复患者的数据,调查了康复人群中胃肠道症状的患病率与表现模式,并利用机器学习算法识别这些症状的预测因素。研究结果显示,相当比例的康复患者在恢复期出现胃肠道症状,其中腹泻是报告最多的症状,其次是腹痛和恶心。机器学习分析发现了胃肠道症状的重要预测因素,包括年龄、性别、疾病严重程度、合并症以及 COVID-19 病程时长。这些发现强调了在 COVID-19 康复护理中监测并处理胃肠道症状的重要性,而机器学习为早期识别和个性化干预提供了有价值的工具。本研究有助于理解 COVID-19 对胃肠道健康的长期影响,并强调了利用机器学习分析预测和管理这些症状的潜在益处。仍需进一步研究以探究 COVID-19 康复者胃肠道症状的内在机制,并制定有针对性的症状管理干预措施。关键词:COVID-19、胃肠道症状、机器学习、预测因素、COVID-19 康复护理、长新冠。

Identifying Risk Factors for Post-COVID-19 Mental Health Disorders: A Machine Learning Perspective

  • paper_url: http://arxiv.org/abs/2309.16055
  • repo_url: None
  • paper_authors: Maitham G. Yousif, Fadhil G. Al-Amran, Hector J. Castro
  • For: This study aimed to identify risk factors associated with post-COVID-19 mental health disorders in a sample of 669 patients in Iraq.
  • Methods: The study used machine learning techniques to analyze demographic, clinical, and psychosocial factors that may influence the development of mental health disorders in post-COVID-19 patients.
  • Results: The study found that age, gender, geographical region of residence, comorbidities, and the severity of COVID-19 illness were significant risk factors for developing mental health disorders. Additionally, psychosocial factors such as social support, coping strategies, and perceived stress levels played a substantial role.
    Abstract In this study, we leveraged machine learning techniques to identify risk factors associated with post-COVID-19 mental health disorders. Our analysis, based on data collected from 669 patients across various provinces in Iraq, yielded valuable insights. We found that age, gender, and geographical region of residence were significant demographic factors influencing the likelihood of developing mental health disorders in post-COVID-19 patients. Additionally, comorbidities and the severity of COVID-19 illness were important clinical predictors. Psychosocial factors, such as social support, coping strategies, and perceived stress levels, also played a substantial role. Our findings emphasize the complex interplay of multiple factors in the development of mental health disorders following COVID-19 recovery. Healthcare providers and policymakers should consider these risk factors when designing targeted interventions and support systems for individuals at risk. Machine learning-based approaches can provide a valuable tool for predicting and preventing adverse mental health outcomes in post-COVID-19 patients. Further research and prospective studies are needed to validate these findings and enhance our understanding of the long-term psychological impact of the COVID-19 pandemic. This study contributes to the growing body of knowledge regarding the mental health consequences of the COVID-19 pandemic and underscores the importance of a multidisciplinary approach to address the diverse needs of individuals on the path to recovery. Keywords: COVID-19, mental health, risk factors, machine learning, Iraq
    摘要 在这项研究中,我们利用机器学习技术识别与 COVID-19 后精神健康障碍相关的风险因素。我们基于伊拉克多个省份 669 名患者的数据进行分析,获得了有价值的发现:年龄、性别和居住地区是影响 COVID-19 康复患者发生精神健康障碍可能性的重要人口学因素;此外,合并症和 COVID-19 病情的严重程度是重要的临床预测因素;社会支持、应对策略和感知压力水平等心理社会因素同样发挥了重要作用。这些发现表明,COVID-19 康复后精神健康障碍的发生是多种因素复杂交互的结果。医疗服务提供者和政策制定者在为高风险个体设计针对性干预措施和支持体系时,应考虑这些风险因素。基于机器学习的方法可以为预测和预防 COVID-19 康复患者的不良精神健康结局提供有价值的工具。还需要进一步的研究和前瞻性研究来验证这些发现,并加深我们对 COVID-19 大流行长期心理影响的理解。本研究丰富了有关 COVID-19 大流行精神健康后果的知识体系,并强调了采用多学科方法以满足康复路径上个体多样化需求的重要性。关键词:COVID-19、精神健康、风险因素、机器学习、伊拉克

Cognizance of Post-COVID-19 Multi-Organ Dysfunction through Machine Learning Analysis

  • paper_url: http://arxiv.org/abs/2309.16736
  • repo_url: None
  • paper_authors: Hector J. Castro, Maitham G. Yousif
  • For: The paper aims to analyze and predict multi-organ dysfunction in individuals experiencing Post-COVID-19 Syndrome using machine learning techniques.
  • Methods: The study covers data collection and preprocessing, feature selection and engineering, model development and validation, and ethical considerations to enhance early detection and management of Post-COVID-19 Syndrome.
  • Results: The paper aims to improve our understanding of Post-COVID-19 Syndrome through machine learning, potentially improving patient outcomes and quality of life.
    Abstract In the year 2022, a total of 466 patients from various cities across Iraq were included in this study. This research paper focuses on the application of machine learning techniques to analyse and predict multi-organ dysfunction in individuals experiencing Post-COVID-19 Syndrome, commonly known as Long COVID. Post-COVID-19 Syndrome presents a wide array of persistent symptoms affecting various organ systems, posing a significant challenge to healthcare. Leveraging the power of artificial intelligence, this study aims to enhance early detection and management of this complex condition. The paper outlines the importance of data collection and preprocessing, feature selection and engineering, model development and validation, and ethical considerations in conducting research in this field. By improving our understanding of Post-COVID-19 Syndrome through machine learning, healthcare providers can identify at-risk individuals and offer timely interventions, potentially improving patient outcomes and quality of life. Further research is essential to refine models, validate their clinical utility, and explore treatment options for Long COVID. Keywords: Post-COVID-19 Syndrome, Machine Learning, Multi-Organ Dysfunction, Healthcare, Artificial Intelligence.
    摘要 2022 年,这项研究纳入了来自伊拉克多个城市的 466 名患者。本文探讨了应用机器学习技术来分析和预测新冠后综合征(通常称为"长新冠")患者的多器官功能障碍。新冠后综合征表现为影响多个器官系统的各种持续性症状,对医疗卫生构成重大挑战。借助人工智能的力量,本研究旨在加强对这一复杂病症的早期发现与管理。论文还阐述了在该领域开展研究时数据收集与预处理、特征选择与工程、模型开发与验证以及伦理考虑的重要性。通过机器学习加深对新冠后综合征的理解,医疗服务提供者可以识别高风险个体并提供及时干预,从而有望改善患者的预后和生活质量。仍需进一步研究以完善模型、验证其临床效用,并探索长新冠的治疗方案。关键词:Post-COVID-19 Syndrome, Machine Learning, Multi-Organ Dysfunction, Healthcare, Artificial Intelligence.

Improving Adaptive Online Learning Using Refined Discretization

  • paper_url: http://arxiv.org/abs/2309.16044
  • repo_url: None
  • paper_authors: Zhiyu Zhang, Heng Yang, Ashok Cutkosky, Ioannis Ch. Paschalidis
  • for: 本文研究无约束的在线线性优化问题,目标是同时实现($i$)二阶梯度适应性和($ii$)比较子范数适应性(文献中也称"参数自由")。现有的 regret 界(Cutkosky 和 Orabona,2018;Mhammedi 和 Koolen,2020;Jacobsen 和 Cutkosky,2022)对梯度方差 $V_T$ 存在次优的 $O(\sqrt{V_T\log V_T})$ 依赖,而本文利用一种受连续时间启发的新算法将其改进到最优的 $O(\sqrt{V_T})$ 速率,且无需任何不切实际的倍增技巧(doubling trick)。
  • methods: 本文首先证明在问题的连续时间类比中(环境由任意连续半鞅建模)可以较容易地同时实现上述两种适应性,然后提出一种新的离散化论证,使这种适应性在离散时间的对抗设置中得以保持;该论证在算法和分析两方面都细化了 (Harvey et al., 2023) 中的非梯度适应离散化论证。
  • results: 本文提出的算法在对抗设置中实现 $O(\sqrt{V_T})$ 的 regret 界,优于现有的 $O(\sqrt{V_T\log V_T})$ 界;此外,该结果还可推广到 Lipschitz 常数未知的设置,从而消除先前工作 (Mhammedi 和 Koolen, 2020) 中的范围比问题。
    Abstract We study unconstrained Online Linear Optimization with Lipschitz losses. The goal is to simultaneously achieve ($i$) second order gradient adaptivity; and ($ii$) comparator norm adaptivity also known as "parameter freeness" in the literature. Existing regret bounds (Cutkosky and Orabona, 2018; Mhammedi and Koolen, 2020; Jacobsen and Cutkosky, 2022) have the suboptimal $O(\sqrt{V_T\log V_T})$ dependence on the gradient variance $V_T$, while the present work improves it to the optimal rate $O(\sqrt{V_T})$ using a novel continuous-time-inspired algorithm, without any impractical doubling trick. This result can be extended to the setting with unknown Lipschitz constant, eliminating the range ratio problem from prior works (Mhammedi and Koolen, 2020). Concretely, we first show that the aimed simultaneous adaptivity can be achieved fairly easily in a continuous time analogue of the problem, where the environment is modeled by an arbitrary continuous semimartingale. Then, our key innovation is a new discretization argument that preserves such adaptivity in the discrete time adversarial setting. This refines a non-gradient-adaptive discretization argument from (Harvey et al., 2023), both algorithmically and analytically, which could be of independent interest.
    摘要 我们研究带 Lipschitz 损失的无约束在线线性优化问题,目标是同时实现(i)二阶梯度适应性和(ii)比较子范数适应性(文献中也称"参数自由")。现有的 regret 界(Cutkosky 和 Orabona,2018;Mhammedi 和 Koolen,2020;Jacobsen 和 Cutkosky,2022)对梯度方差 $V_T$ 存在次优的 $O(\sqrt{V_T\log V_T})$ 依赖,而本文借助一种受连续时间启发的新算法,将其改进到最优的 $O(\sqrt{V_T})$ 速率,且无需任何不切实际的倍增技巧。该结果还可以推广到 Lipschitz 常数未知的设定,从而消除先前工作(Mhammedi 和 Koolen,2020)中的范围比问题。具体而言,我们首先证明,在该问题的连续时间类比中(环境由任意连续半鞅建模),所追求的同时适应性可以相当容易地实现。随后,我们的关键创新是一种新的离散化论证,使这种适应性在离散时间的对抗设定中得以保持;该论证从算法和分析两个角度细化了 (Harvey 等,2023) 中的非梯度适应离散化论证,这一点本身也可能具有独立的研究价值。

Analytical Modelling of Raw Data for Flow-Guided In-body Nanoscale Localization

  • paper_url: http://arxiv.org/abs/2309.16034
  • repo_url: None
  • paper_authors: Guillem Pascual, Filip Lemic, Carmen Delgado, Xavier Costa-Perez
  • for: 这个论文的目的是提出一种 Analytical Model of Raw Data for Flow-Guided Localization in Nanoscale Devices, 用于解决现有的通信和能源约束问题,以提高精准医学应用的可行性。
  • methods: 该论文使用 Analytical Modeling 方法,模型了 nanodevice 的 raw data 如何受到通信和能源约束的影响,并与 Simulator 进行对比,以评估模型的准确性。
  • results: 研究结果表明,模型和 Simulator 生成的 raw 数据之间存在高度相似性,并且可以在多种场景和不同性能指标下进行评估。
    Abstract Advancements in nanotechnology and material science are paving the way toward nanoscale devices that combine sensing, computing, data and energy storage, and wireless communication. In precision medicine, these nanodevices show promise for disease diagnostics, treatment, and monitoring from within the patients' bloodstreams. Assigning the location of a sensed biological event with the event itself, which is the main proposition of flow-guided in-body nanoscale localization, would be immensely beneficial from the perspective of precision medicine. The nanoscale nature of the nanodevices and the challenging environment that the bloodstream represents, result in current flow-guided localization approaches being constrained in their communication and energy-related capabilities. The communication and energy constraints of the nanodevices result in different features of raw data for flow-guided localization, in turn affecting its performance. An analytical modeling of the effects of imperfect communication and constrained energy causing intermittent operation of the nanodevices on the raw data produced by the nanodevices would be beneficial. Hence, we propose an analytical model of raw data for flow-guided localization, where the raw data is modeled as a function of communication and energy-related capabilities of the nanodevice. We evaluate the model by comparing its output with the one obtained through the utilization of a simulator for objective evaluation of flow-guided localization, featuring comparably higher level of realism. Our results across a number of scenarios and heterogeneous performance metrics indicate high similarity between the model and simulator-generated raw datasets.
    摘要 纳米技术与材料科学的进步正在推动纳米级设备的发展,这类设备可以同时实现感测、计算、数据与能量存储以及无线通信。在精准医学中,这些纳米设备有望从患者血液内部完成疾病的诊断、治疗和监测。将感测到的生物事件与其发生位置关联起来,即流引导体内纳米级定位的核心主张,对精准医学极具价值。然而,由于纳米设备的纳米尺度以及血液环境的严苛性,当前的流引导定位方法在通信和能量能力上都受到限制。这些通信与能量限制使流引导定位所依赖的原始数据呈现出不同的特征,进而影响其性能。因此,对不完善的通信以及受限能量导致的纳米设备间歇工作如何影响其产生的原始数据进行解析建模将十分有益。为此,我们提出了一个流引导定位原始数据的解析模型,将原始数据建模为纳米设备通信与能量能力的函数。我们通过与一个具有更高真实度、用于客观评估流引导定位的模拟器所生成的数据进行对比来评估该模型。在多种场景和多样化性能指标下的结果表明,模型生成的原始数据与模拟器生成的原始数据高度相似。

Learning Dissipative Neural Dynamical Systems

  • paper_url: http://arxiv.org/abs/2309.16032
  • repo_url: None
  • paper_authors: Yuezhu Xu, S. Sivaranjani
  • for: 学习一个未知非线性动力系统的模型,同时保持系统的耗散性(dissipativity)。
  • methods: 分两个阶段学习非线性动力系统模型:首先学习一个不加约束的神经动力系统模型,然后推导保证耗散性的充分条件,并通过扰动模型参数(先扰动权重再扰动偏置)来满足这些条件。
  • results: 这两个扰动问题可以独立求解,从而得到一个既保证耗散性、又能紧密逼近非线性系统轨迹的神经动力系统模型。
    Abstract Consider an unknown nonlinear dynamical system that is known to be dissipative. The objective of this paper is to learn a neural dynamical model that approximates this system, while preserving the dissipativity property in the model. In general, imposing dissipativity constraints during neural network training is a hard problem for which no known techniques exist. In this work, we address the problem of learning a dissipative neural dynamical system model in two stages. First, we learn an unconstrained neural dynamical model that closely approximates the system dynamics. Next, we derive sufficient conditions to perturb the weights of the neural dynamical model to ensure dissipativity, followed by perturbation of the biases to retain the fit of the model to the trajectories of the nonlinear system. We show that these two perturbation problems can be solved independently to obtain a neural dynamical model that is guaranteed to be dissipative while closely approximating the nonlinear system.
    摘要 考虑一个未知但已知具有耗散性的非线性动力系统。本文的目标是学习一个近似该系统的神经动力模型,并在模型中保持耗散性。一般而言,在神经网络训练中施加耗散性约束是一个困难的问题,目前尚无已知的技术。在本工作中,我们分两个阶段解决学习耗散神经动力系统模型的问题:首先,我们学习一个不加约束的神经动力模型,使其尽可能准确地逼近系统动力学;然后,我们推导出对模型权重进行扰动以保证耗散性的充分条件,再对偏置进行扰动以保持模型对非线性系统轨迹的拟合。我们证明这两个扰动问题可以独立求解,从而得到一个既保证耗散性、又能紧密逼近该非线性系统的神经动力模型。

GNNHLS: Evaluating Graph Neural Network Inference via High-Level Synthesis

  • paper_url: http://arxiv.org/abs/2309.16022
  • repo_url: https://github.com/chenfengzhao/gnnhls
  • paper_authors: Chenfeng Zhao, Zehao Dong, Yixin Chen, Xuan Zhang, Roger D. Chamberlain
  • for: Efficient Graph Neural Network (GNN) inference using Field-Programmable Gate Arrays (FPGAs) as the execution platform.
  • methods: High-Level Synthesis (HLS) tools are used to turn GNN models into well-tuned FPGA kernels, packaged in the open-source GNNHLS framework.
  • results: On 4 graph datasets, GNNHLS achieves up to 50.8x speedup and 423x energy reduction relative to CPU baselines, and up to 5.16x speedup and 74.5x energy reduction relative to GPU baselines.
    Abstract With the ever-growing popularity of Graph Neural Networks (GNNs), efficient GNN inference is gaining tremendous attention. Field-Programmable Gate Arrays (FPGAs) are a promising execution platform due to their fine-grained parallelism, low-power consumption, reconfigurability, and concurrent execution. Even better, High-Level Synthesis (HLS) tools bridge the gap between the non-trivial FPGA development efforts and the rapid emergence of new GNN models. In this paper, we propose GNNHLS, an open-source framework to comprehensively evaluate GNN inference acceleration on FPGAs via HLS, containing a software stack for data generation and baseline deployment, and FPGA implementations of 6 well-tuned GNN HLS kernels. We evaluate GNNHLS on 4 graph datasets with distinct topologies and scales. The results show that GNNHLS achieves up to 50.8x speedup and 423x energy reduction relative to the CPU baselines. Compared with the GPU baselines, GNNHLS achieves up to 5.16x speedup and 74.5x energy reduction.

Graph-level Representation Learning with Joint-Embedding Predictive Architectures

  • paper_url: http://arxiv.org/abs/2309.16014
  • repo_url: https://github.com/geriskenderi/graph-jepa
  • paper_authors: Geri Skenderi, Hang Li, Jiliang Tang, Marco Cristani
  • for: Bringing Joint-Embedding Predictive Architectures (JEPAs), a self-supervised representation learning technique, to graph-level representation learning.
  • methods: Masked modeling is used to learn embeddings for different subgraphs of the input graph by predicting the latent representation of target subgraphs from a context subgraph.
  • results: Graph-JEPA learns representations that are expressive and competitive in both graph classification and regression problems.
    Abstract Joint-Embedding Predictive Architectures (JEPAs) have recently emerged as a novel and powerful technique for self-supervised representation learning. They aim to learn an energy-based model by predicting the latent representation of a target signal $y$ from a context signal $x$. JEPAs bypass the need for data augmentation and negative samples, which are typically required by contrastive learning, while avoiding the overfitting issues associated with generative-based pretraining. In this paper, we show that graph-level representations can be effectively modeled using this paradigm and propose Graph-JEPA, the first JEPA for the graph domain. In particular, we employ masked modeling to learn embeddings for different subgraphs of the input graph. To endow the representations with the implicit hierarchy that is often present in graph-level concepts, we devise an alternative training objective that consists of predicting the coordinates of the encoded subgraphs on the unit hyperbola in the 2D plane. Extensive validation shows that Graph-JEPA can learn representations that are expressive and competitive in both graph classification and regression problems.
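
A small sketch of the hyperbola-coordinate objective mentioned in the abstract: each target subgraph is associated with a point (cosh t, sinh t) on the unit hyperbola, and a predictor maps context embeddings to those 2D coordinates. The context embeddings, the per-subgraph parameter t, and the predictor architecture below are placeholders, not the paper's components.

```python
import torch
import torch.nn as nn

def hyperbola_target(t):
    # Map a per-subgraph scalar t to a point (cosh t, sinh t) on the unit hyperbola.
    return torch.stack([torch.cosh(t), torch.sinh(t)], dim=-1)

context_emb = torch.randn(8, 16)                     # assumed context-subgraph embeddings
t = torch.linspace(0.0, 1.0, 8)                      # assumed "position" of each target subgraph
predictor = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))

pred = predictor(context_emb)                        # predicted 2D hyperbola coordinates
loss = ((pred - hyperbola_target(t)) ** 2).mean()    # regression onto the hyperbola targets
loss.backward()
```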

Digital Twin-based Anomaly Detection with Curriculum Learning in Cyber-physical Systems

  • paper_url: http://arxiv.org/abs/2309.15995
  • repo_url: https://github.com/xuqinghua-china/tosem
  • paper_authors: Qinghua Xu, Shaukat Ali, Tao Yue
  • for: Improving the accuracy and efficiency of anomaly detection in Cyber-Physical Systems (CPS) by ordering the training data from easy to difficult.
  • methods: A digital twin-based anomaly detection method (ATTAIN) is extended with curriculum learning: each sample is assigned a difficulty score and a training scheduler samples batches from easy to difficult.
  • results: Evaluated on data from five real-world CPS testbeds, LATTICE improves the F1 score over ATTAIN and two baselines by 0.906%-2.367% and reduces ATTAIN's training time by 4.2% on average.
    Abstract Anomaly detection is critical to ensure the security of cyber-physical systems (CPS). However, due to the increasing complexity of attacks and CPS themselves, anomaly detection in CPS is becoming more and more challenging. In our previous work, we proposed a digital twin-based anomaly detection method, called ATTAIN, which takes advantage of both historical and real-time data of CPS. However, such data vary significantly in terms of difficulty. Therefore, similar to human learning processes, deep learning models (e.g., ATTAIN) can benefit from an easy-to-difficult curriculum. To this end, in this paper, we present a novel approach, named digitaL twin-based Anomaly deTecTion wIth Curriculum lEarning (LATTICE), which extends ATTAIN by introducing curriculum learning to optimize its learning paradigm. LATTICE attributes each sample with a difficulty score, before being fed into a training scheduler. The training scheduler samples batches of training data based on these difficulty scores such that learning from easy to difficult data can be performed. To evaluate LATTICE, we use five publicly available datasets collected from five real-world CPS testbeds. We compare LATTICE with ATTAIN and two other state-of-the-art anomaly detectors. Evaluation results show that LATTICE outperforms the three baselines and ATTAIN by 0.906%-2.367% in terms of the F1 score. LATTICE also, on average, reduces the training time of ATTAIN by 4.2% on the five datasets and is on par with the baselines in terms of detection delay time.
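
A sketch of an easy-to-difficult training scheduler in the spirit of the curriculum described above: each sample carries a difficulty score, and early batches are drawn only from the easiest fraction of the data, which grows over training. The difficulty scores and the schedule are assumptions made for illustration; LATTICE derives its scores from the digital twin-based detector.

```python
import numpy as np

def curriculum_batches(difficulty, n_epochs, batch_size, rng=np.random.default_rng(0)):
    """Yield batch indices drawn from an easy subset that grows over training."""
    order = np.argsort(difficulty)                               # easiest samples first
    for epoch in range(n_epochs):
        frac = min(1.0, 0.3 + 0.7 * epoch / max(1, n_epochs - 1))
        pool = order[: max(batch_size, int(frac * len(order)))]  # currently "allowed" pool
        yield epoch, rng.choice(pool, size=batch_size, replace=False)

difficulty = np.random.default_rng(1).random(1000)   # assumed per-sample difficulty scores
for epoch, idx in curriculum_batches(difficulty, n_epochs=5, batch_size=32):
    pass  # a training step of the anomaly detector on data[idx] would go here
```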

Machine Learning Based Analytics for the Significance of Gait Analysis in Monitoring and Managing Lower Extremity Injuries

  • paper_url: http://arxiv.org/abs/2309.15990
  • repo_url: None
  • paper_authors: Mostafa Rezapour, Rachel B. Seymour, Stephen H. Sims, Madhav A. Karunakar, Nahir Habet, Metin Nafi Gurcan
  • for: Using gait analysis to assess post-injury complications, such as infection, malunion, or hardware irritation, in patients with lower extremity fractures.
  • methods: Supervised machine learning models predict complications from consecutive gait datasets. Patients treated at an academic center underwent gait analysis with a chest-mounted IMU device; the raw gait data was preprocessed in software, emphasizing 12 essential gait variables.
  • results: XGBoost was the best-performing model during training, testing, and evaluation, with class imbalance addressed via SMOTE. Both before and after applying SMOTE, XGBoost achieved the highest test AUC and test accuracy.
    Abstract This study explored the potential of gait analysis as a tool for assessing post-injury complications, e.g., infection, malunion, or hardware irritation, in patients with lower extremity fractures. The research focused on the proficiency of supervised machine learning models predicting complications using consecutive gait datasets. We identified patients with lower extremity fractures at an academic center. Patients underwent gait analysis with a chest-mounted IMU device. Using software, raw gait data was preprocessed, emphasizing 12 essential gait variables. Machine learning models including XGBoost, Logistic Regression, SVM, LightGBM, and Random Forest were trained, tested, and evaluated. Attention was given to class imbalance, addressed using SMOTE. We introduced a methodology to compute the Rate of Change (ROC) for gait variables, independent of the time difference between gait analyses. XGBoost was the optimal model both before and after applying SMOTE. Prior to SMOTE, the model achieved an average test AUC of 0.90 (95% CI: [0.79, 1.00]) and test accuracy of 86% (95% CI: [75%, 97%]). Feature importance analysis attributed importance to the duration between injury and gait analysis. Data patterns showed early physiological compensations, followed by stabilization phases, emphasizing prompt gait analysis. This study underscores the potential of machine learning, particularly XGBoost, in gait analysis for orthopedic care. Predicting post-injury complications, early gait assessment becomes vital, revealing intervention points. The findings support a shift in orthopedics towards a data-informed approach, enhancing patient outcomes.
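
A minimal sketch of the kind of pipeline the summary describes (SMOTE to rebalance the classes, followed by an XGBoost classifier), run on synthetic stand-in data; the 12 gait variables and complication labels below are randomly generated, not the study's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from imblearn.over_sampling import SMOTE
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 12))                                 # 12 gait variables per visit (synthetic)
y = (X[:, 0] + 0.5 * rng.normal(size=400) > 1.0).astype(int)   # imbalanced complication label (synthetic)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)  # oversample the minority class

clf = XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
clf.fit(X_bal, y_bal)
print("test AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```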

Open Source Infrastructure for Differentiable Density Functional Theory

  • paper_url: http://arxiv.org/abs/2309.15985
  • repo_url: None
  • paper_authors: Advika Vidhyadhiraja, Arun Pa Thiagarajan, Shang Zhu, Venkat Viswanathan, Bharath Ramsundar
  • for: Training exchange-correlation functionals, used in quantum chemistry calculations, from data.
  • methods: Open-source infrastructure that standardizes the processing pipeline by adapting state-of-the-art techniques from multiple groups.
  • results: A differentiable quantum chemistry model released in the DeepChem library as a platform for further research on differentiable methods.
    Abstract Learning exchange correlation functionals, used in quantum chemistry calculations, from data has become increasingly important in recent years, but training such a functional requires sophisticated software infrastructure. For this reason, we build open source infrastructure to train neural exchange correlation functionals. We aim to standardize the processing pipeline by adapting state-of-the-art techniques from work done by multiple groups. We have open sourced the model in the DeepChem library to provide a platform for additional research on differentiable quantum chemistry methods.

TraCE: Trajectory Counterfactual Explanation Scores

  • paper_url: http://arxiv.org/abs/2309.15965
  • repo_url: None
  • paper_authors: Jeffrey N. Clark, Edward A. Small, Nawid Keshtmand, Michelle W. L. Wan, Elena Fillola Mayoral, Enrico Werner, Christopher P. Bourdeaux, Raul Santos-Rodriguez
  • for: Extending counterfactual explanations of black-box classifier predictions to evaluating progress in sequential decision-making tasks.
  • methods: A model-agnostic, modular framework, TraCE (Trajectory Counterfactual Explanation) scores, that distills progress in highly complex scenarios into a single value.
  • results: In two case studies spanning healthcare and climate change, TraCE successfully captures and summarizes progress.
    Abstract Counterfactual explanations, and their associated algorithmic recourse, are typically leveraged to understand, explain, and potentially alter a prediction coming from a black-box classifier. In this paper, we propose to extend the use of counterfactuals to evaluate progress in sequential decision making tasks. To this end, we introduce a model-agnostic modular framework, TraCE (Trajectory Counterfactual Explanation) scores, which is able to distill and condense progress in highly complex scenarios into a single value. We demonstrate TraCE's utility across domains by showcasing its main properties in two case studies spanning healthcare and climate change.
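
A toy illustration of condensing trajectory progress into a single value, in the spirit of TraCE: each step is scored by how much it moves the current state toward a counterfactual target, and the scores are aggregated. This is one plausible reading of the idea for illustration only, not the paper's scoring formula.

```python
import numpy as np

def trajectory_score(states, counterfactual):
    """Fraction of the initial gap to the counterfactual target closed over the trajectory."""
    states = np.asarray(states, dtype=float)
    cf = np.asarray(counterfactual, dtype=float)
    d = np.linalg.norm(states - cf, axis=1)   # distance to the target at every step
    return (d[0] - d[-1]) / max(d[0], 1e-12)  # > 0 means net progress toward the target

traj = [[5.0, 2.0], [4.0, 1.5], [3.2, 1.1], [2.0, 0.5]]
print(trajectory_score(traj, counterfactual=[0.0, 0.0]))  # ~0.62 of the gap closed
```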

Exploring Self-Supervised Contrastive Learning of Spatial Sound Event Representation

  • paper_url: http://arxiv.org/abs/2309.15938
  • repo_url: None
  • paper_authors: Xilin Jiang, Cong Han, Yinghao Aaron Li, Nima Mesgarani
  • For: A simple multi-channel contrastive learning framework (MC-SimCLR) that encodes the "what" and "where" of spatial audio, learning joint spectral and spatial representations from unlabeled spatial recordings to improve downstream event classification and sound localization.
  • Methods: A multi-level data augmentation pipeline that augments waveforms, Mel spectrograms, and generalized cross-correlation (GCC) features, together with simple yet effective channel-wise augmentations that randomly swap the microphone order and mask Mel and GCC channels.
  • Results: With these augmentations, linear layers on top of the learned representation significantly outperform supervised models in both event classification accuracy and localization error; the paper also analyzes the effect of each augmentation and fine-tuning performance with different amounts of labeled data.
    Abstract In this study, we present a simple multi-channel framework for contrastive learning (MC-SimCLR) to encode 'what' and 'where' of spatial audios. MC-SimCLR learns joint spectral and spatial representations from unlabeled spatial audios, thereby enhancing both event classification and sound localization in downstream tasks. At its core, we propose a multi-level data augmentation pipeline that augments different levels of audio features, including waveforms, Mel spectrograms, and generalized cross-correlation (GCC) features. In addition, we introduce simple yet effective channel-wise augmentation methods to randomly swap the order of the microphones and mask Mel and GCC channels. By using these augmentations, we find that linear layers on top of the learned representation significantly outperform supervised models in terms of both event classification accuracy and localization error. We also perform a comprehensive analysis of the effect of each augmentation method and a comparison of the fine-tuning performance using different amounts of labeled data.
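
A sketch of the channel-wise augmentations described above: randomly permuting the microphone order and masking whole Mel or GCC channels. The tensor shapes (batch, channels, bins, frames) and the four-microphone setup are assumptions for illustration.

```python
import torch

def swap_microphones(mel, n_mics=4):
    """Randomly permute the microphone (channel) order of a multi-channel Mel tensor."""
    perm = torch.randperm(n_mics)
    return mel[:, perm]          # GCC pair channels would be re-indexed consistently (omitted here)

def mask_channels(x, p=0.2):
    """Zero out a random subset of channels (works for Mel or GCC feature tensors)."""
    keep = (torch.rand(x.shape[1]) > p).float().view(1, -1, 1, 1)
    return x * keep

mel = torch.randn(8, 4, 64, 100)   # (batch, mics, mel bins, frames) -- assumed shapes
gcc = torch.randn(8, 6, 51, 100)   # (batch, mic pairs, lags, frames)
mel_aug = mask_channels(swap_microphones(mel))
gcc_aug = mask_channels(gcc)
```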

Deep Learning-Based Real-Time Rate Control for Live Streaming on Wireless Networks

  • paper_url: http://arxiv.org/abs/2310.06857
  • repo_url: None
  • paper_authors: Matin Mortaheb, Mohammad A. Amir Khojastepour, Srimat T. Chakradhar, Sennur Ulukus
  • for: Delivering high-quality video to wireless users, where consistent quality is challenged by variable encoded bitrates from dynamic content and fluctuating channel bitrates from wireless fading.
  • methods: A real-time deep learning-based H.264 controller that uses instantaneous channel quality data from the physical layer, together with the video chunk, to dynamically estimate the optimal encoder parameters with negligible delay, avoiding quality loss from underutilized bandwidth or packet-loss artifacts.
  • results: Experiments show a 10-20 dB PSNR improvement over state-of-the-art adaptive bitrate video streaming, with an average packet drop rate as low as 0.002.
    Abstract Providing wireless users with high-quality video content has become increasingly important. However, ensuring consistent video quality poses challenges due to variable encoded bitrate caused by dynamic video content and fluctuating channel bitrate caused by wireless fading effects. Suboptimal selection of encoder parameters can lead to video quality loss due to underutilized bandwidth or the introduction of video artifacts due to packet loss. To address this, a real-time deep learning based H.264 controller is proposed. This controller leverages instantaneous channel quality data driven from the physical layer, along with the video chunk, to dynamically estimate the optimal encoder parameters with a negligible delay in real-time. The objective is to maintain an encoded video bitrate slightly below the available channel bitrate. Experimental results, conducted on both the QCIF dataset and a diverse selection of random videos from public datasets, validate the effectiveness of the approach. Remarkably, improvements of 10-20 dB in PSNR with respect to the state-of-the-art adaptive bitrate video streaming are achieved, with an average packet drop rate as low as 0.002.
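
A toy illustration of the control objective stated in the abstract: keep the encoded bitrate slightly below an estimate of the channel bitrate for the next chunk. The simple smoothing predictor below merely stands in for the paper's deep learning controller.

```python
def choose_encoder_bitrate(channel_quality_kbps, history, margin=0.9, alpha=0.3):
    """Pick an encoder bitrate slightly below a smoothed estimate of the channel bitrate."""
    prev = history[-1] if history else channel_quality_kbps
    est = (1 - alpha) * prev + alpha * channel_quality_kbps   # crude channel estimate
    history.append(est)
    return margin * est                                       # stay just below the channel rate

hist = []
for q in [3200, 2800, 1500, 2600]:   # instantaneous channel bitrate samples (kbps)
    print(round(choose_encoder_bitrate(q, hist)))
```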

Multi-unit soft sensing permits few-shot learning

  • paper_url: http://arxiv.org/abs/2309.15828
  • repo_url: None
  • paper_authors: Bjarne Grimstad, Kristian Løvland, Lars S. Imsland
  • for: Improving soft sensors with learning algorithms, in particular by strengthening a soft sensor through learning from multiple related tasks (units).
  • methods: A multi-unit soft sensor formulated as a hierarchical model and implemented with a deep neural network; the paper studies how well it generalizes as the number of units increases.
  • results: When learned from a sufficient number of tasks, the soft sensor generalizes well and permits few-shot learning on data from new units; learning from only 1-3 data points often yields high performance on new units.
    Abstract Recent literature has explored various ways to improve soft sensors using learning algorithms with transferability. Broadly put, the performance of a soft sensor may be strengthened when it is learned by solving multiple tasks. The usefulness of transferability depends on how strongly related the devised learning tasks are. A particularly relevant case for transferability, is when a soft sensor is to be developed for a process of which there are many realizations, e.g. system or device with many implementations from which data is available. Then, each realization presents a soft sensor learning task, and it is reasonable to expect that the different tasks are strongly related. Applying transferability in this setting leads to what we call multi-unit soft sensing, where a soft sensor models a process by learning from data from all of its realizations. This paper explores the learning abilities of a multi-unit soft sensor, which is formulated as a hierarchical model and implemented using a deep neural network. In particular, we investigate how well the soft sensor generalizes as the number of units increase. Using a large industrial dataset, we demonstrate that, when the soft sensor is learned from a sufficient number of tasks, it permits few-shot learning on data from new units. Surprisingly, regarding the difficulty of the task, few-shot learning on 1-3 data points often leads to a high performance on new units.

Fair Canonical Correlation Analysis

  • paper_url: http://arxiv.org/abs/2309.15809
  • repo_url: https://github.com/pennshenlab/fair_cca
  • paper_authors: Zhuoping Zhou, Davoud Ataee Tarzanagh, Bojian Hou, Boning Tong, Jia Xu, Yanbo Feng, Qi Long, Li Shen
  • for: Investigating fairness and bias in Canonical Correlation Analysis (CCA) and proposing a framework that reduces the correlation disparity error associated with protected attributes.
  • methods: The method learns global projection matrices from all data points while ensuring that these matrices yield correlation levels comparable to group-specific projection matrices.
  • results: Experiments on synthetic and real-world datasets show that the method reduces correlation disparity error without compromising CCA accuracy.
    Abstract This paper investigates fairness and bias in Canonical Correlation Analysis (CCA), a widely used statistical technique for examining the relationship between two sets of variables. We present a framework that alleviates unfairness by minimizing the correlation disparity error associated with protected attributes. Our approach enables CCA to learn global projection matrices from all data points while ensuring that these matrices yield comparable correlation levels to group-specific projection matrices. Experimental evaluation on both synthetic and real-world datasets demonstrates the efficacy of our method in reducing correlation disparity error without compromising CCA accuracy.
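
A small sketch of the quantity the paper aims to control: fit one global CCA projection, then compare the canonical correlation it yields on each protected group. The data, group labels, and the absolute-difference disparity measure are stand-ins for illustration.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
Y = X[:, :3] @ rng.normal(size=(3, 4)) + 0.5 * rng.normal(size=(300, 4))
group = rng.integers(0, 2, size=300)              # protected attribute (synthetic)

cca = CCA(n_components=1).fit(X, Y)               # one global pair of projection matrices

def group_correlation(mask):
    u, v = cca.transform(X[mask], Y[mask])
    return np.corrcoef(u[:, 0], v[:, 0])[0, 1]

disparity = abs(group_correlation(group == 0) - group_correlation(group == 1))
print("correlation disparity error:", disparity)  # the quantity a fair CCA would penalize
```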

Node-Aligned Graph-to-Graph Generation for Retrosynthesis Prediction

  • paper_url: http://arxiv.org/abs/2309.15798
  • repo_url: None
  • paper_authors: Lin Yao, Zhen Wang, Wentao Guo, Shang Xiang, Wentan Liu, Guolin Ke
  • For: The paper aims to develop a template-free machine learning model for single-step retrosynthesis, which can fully leverage the topological information of the molecule and align atoms between the product and reactants.
  • Methods: The proposed method, NAG2G, uses 2D molecular graphs and 3D conformation information, and incorporates node alignment to determine a specific order for node generation. The method generates molecular graphs in an auto-regressive manner, ensuring that the node generation order coincides with the node order in the input graph.
  • Results: The proposed NAG2G method outperforms previous state-of-the-art baselines in various metrics, demonstrating its effectiveness in single-step retrosynthesis.
    Abstract Single-step retrosynthesis is a crucial task in organic chemistry and drug design, requiring the identification of required reactants to synthesize a specific compound. With the advent of computer-aided synthesis planning, there is growing interest in using machine-learning techniques to facilitate the process. Existing template-free machine learning-based models typically utilize transformer structures and represent molecules as 1D sequences. However, these methods often face challenges in fully leveraging the extensive topological information of the molecule and aligning atoms between the product and reactants, leading to results that are not as competitive as those of semi-template models. Our proposed method, Node-Aligned Graph-to-Graph (NAG2G), also serves as a transformer-based template-free model but utilizes 2D molecular graphs and 3D conformation information. Furthermore, our approach simplifies the incorporation of product-reactant atom mapping alignment by leveraging node alignment to determine a specific order for node generation and generating molecular graphs in an auto-regressive manner node-by-node. This method ensures that the node generation order coincides with the node order in the input graph, overcoming the difficulty of determining a specific node generation order in an auto-regressive manner. Our extensive benchmarking results demonstrate that the proposed NAG2G can outperform the previous state-of-the-art baselines in various metrics.

Learning the Efficient Frontier

  • paper_url: http://arxiv.org/abs/2309.15775
  • repo_url: https://github.com/Sugoto/Algorithmic-Trading-Using-Unsupervised-Learning
  • paper_authors: Philippe Chatigny, Ivan Sergienko, Ryan Ferguson, Jordan Weir, Maxime Bergeron
  • for: Solving the efficient frontier resource allocation problem: finding an optimal portfolio that maximizes reward at a given level of risk.
  • methods: A neural network (NeuralEF) that rapidly approximates the solution of the efficient frontier convex optimization problem, handling heterogeneous linear constraints, a variable number of optimization inputs, and discontinuous behavior.
  • results: NeuralEF is shown to be a viable way to accelerate large-scale simulation while robustly forecasting the optimization result.
    Abstract The efficient frontier (EF) is a fundamental resource allocation problem where one has to find an optimal portfolio maximizing a reward at a given level of risk. This optimal solution is traditionally found by solving a convex optimization problem. In this paper, we introduce NeuralEF: a fast neural approximation framework that robustly forecasts the result of the EF convex optimization problem with respect to heterogeneous linear constraints and variable number of optimization inputs. By reformulating an optimization problem as a sequence to sequence problem, we show that NeuralEF is a viable solution to accelerate large-scale simulation while handling discontinuous behavior.

Importance-Weighted Offline Learning Done Right

  • paper_url: http://arxiv.org/abs/2309.15771
  • repo_url: None
  • paper_authors: Germano Gabbianelli, Gergely Neu, Matteo Papini
  • for: Offline policy optimization in stochastic contextual bandits: learning a near-optimal policy from decision data collected by a suboptimal behavior policy, without structural assumptions on the reward function, competing with the best policy in a given class.
  • methods: Importance-weighted estimators of each policy's value, with a policy selected to minimize the estimate after a "pessimistic" adjustment; the paper advocates a simple alternative based on the "implicit exploration" estimator of Neu (2015).
  • results: Performance guarantees superior to previous results in nearly all terms, notably removing the highly restrictive "uniform coverage" assumption, with a PAC-Bayesian extension to infinite policy classes and numerical simulations demonstrating robustness to the choice of hyper-parameters.
    Abstract We study the problem of offline policy optimization in stochastic contextual bandit problems, where the goal is to learn a near-optimal policy based on a dataset of decision data collected by a suboptimal behavior policy. Rather than making any structural assumptions on the reward function, we assume access to a given policy class and aim to compete with the best comparator policy within this class. In this setting, a standard approach is to compute importance-weighted estimators of the value of each policy, and select a policy that minimizes the estimated value up to a "pessimistic" adjustment subtracted from the estimates to reduce their random fluctuations. In this paper, we show that a simple alternative approach based on the "implicit exploration" estimator of \citet{Neu2015} yields performance guarantees that are superior in nearly all possible terms to all previous results. Most notably, we remove an extremely restrictive "uniform coverage" assumption made in all previous works. These improvements are made possible by the observation that the upper and lower tails importance-weighted estimators behave very differently from each other, and their careful control can massively improve on previous results that were all based on symmetric two-sided concentration inequalities. We also extend our results to infinite policy classes in a PAC-Bayesian fashion, and showcase the robustness of our algorithm to the choice of hyper-parameters by means of numerical simulations.
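
A sketch of the "implicit exploration" (IX) style value estimate the summary refers to (Neu, 2015): importance weights are smoothed by adding a constant gamma to the logging propensity in the denominator, trading a small downward bias for much better-behaved upper tails. The logged data and policies below are synthetic.

```python
import numpy as np

def ix_value_estimate(actions, rewards, propensities, pi_target, gamma=0.05):
    """IX-style estimate of the value of pi_target from logged bandit data."""
    w = pi_target[np.arange(len(actions)), actions] / (propensities + gamma)
    return np.mean(w * rewards)

rng = np.random.default_rng(0)
n, k = 10_000, 4
actions = rng.integers(0, k, size=n)                 # actions chosen by the behavior policy
propensities = np.full(n, 1.0 / k)                   # uniform logging policy
rewards = (actions == 2).astype(float) + 0.1 * rng.normal(size=n)
pi_target = np.tile(np.eye(k)[2], (n, 1))            # target policy: always play arm 2
print(ix_value_estimate(actions, rewards, propensities, pi_target))  # slightly below the true value of 1
```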

Algebraic and Statistical Properties of the Ordinary Least Squares Interpolator

  • paper_url: http://arxiv.org/abs/2309.15769
  • repo_url: https://github.com/deshen24/ols_interpolator
  • paper_authors: Dennis Shen, Dogyoon Song, Peng Ding, Jasjeet S. Sekhon
  • for: Studying the minimum $\ell_2$-norm OLS interpolator in high-dimensional settings to understand its generalization behavior and its implications for causal inference.
  • methods: High-dimensional algebraic and statistical analysis, including analogues of the leave-$k$-out residual formula, Cochran's formula, and the Frisch-Waugh-Lovell theorem.
  • results: Under the Gauss-Markov model, the paper also gives a high-dimensional extension of the Gauss-Markov theorem and an analysis of variance estimation under homoskedastic errors, with simulation studies exploring the stochastic properties of the OLS interpolator.
    Abstract Deep learning research has uncovered the phenomenon of benign overfitting for over-parameterized statistical models, which has drawn significant theoretical interest in recent years. Given its simplicity and practicality, the ordinary least squares (OLS) interpolator has become essential to gain foundational insights into this phenomenon. While properties of OLS are well established in classical settings, its behavior in high-dimensional settings is less explored (unlike for ridge or lasso regression) though significant progress has been made of late. We contribute to this growing literature by providing fundamental algebraic and statistical results for the minimum $\ell_2$-norm OLS interpolator. In particular, we provide high-dimensional algebraic equivalents of (i) the leave-$k$-out residual formula, (ii) Cochran's formula, and (iii) the Frisch-Waugh-Lovell theorem. These results aid in understanding the OLS interpolator's ability to generalize and have substantive implications for causal inference. Additionally, under the Gauss-Markov model, we present statistical results such as a high-dimensional extension of the Gauss-Markov theorem and an analysis of variance estimation under homoskedastic errors. To substantiate our theoretical contributions, we conduct simulation studies that further explore the stochastic properties of the OLS interpolator.
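
A quick numerical illustration of the object under study: in the overparameterized regime (p > n) the minimum-$\ell_2$-norm OLS interpolator is the pseudoinverse solution, which fits the training data exactly. The data below is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 200                                   # more features than samples
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:5] = 1.0
y = X @ beta_true + 0.1 * rng.normal(size=n)

beta_hat = np.linalg.pinv(X) @ y                 # minimum-norm solution among all interpolators
print("train residual norm:", np.linalg.norm(X @ beta_hat - y))   # ~0: exact interpolation
print("estimate norm:", np.linalg.norm(beta_hat))
```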

Provably Efficient Exploration in Constrained Reinforcement Learning: Posterior Sampling Is All You Need

  • paper_url: http://arxiv.org/abs/2309.15737
  • repo_url: None
  • paper_authors: Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein
  • for: Learning in constrained Markov decision processes (CMDPs) in the infinite-horizon undiscounted setting.
  • methods: A posterior sampling algorithm that is also empirically advantageous compared with existing algorithms.
  • results: The main theoretical result is a Bayesian regret bound of $\tilde{O}(HS\sqrt{AT})$ per cost component for any communicating CMDP with $S$ states, $A$ actions, and hitting-time bound $H$; this matches the lower bound in its dependence on the horizon $T$ and is the best known regret bound for this setting. Empirically, the algorithm outperforms existing constrained reinforcement learning algorithms despite its simplicity.
    Abstract We present a new algorithm based on posterior sampling for learning in constrained Markov decision processes (CMDP) in the infinite-horizon undiscounted setting. The algorithm achieves near-optimal regret bounds while being advantageous empirically compared to the existing algorithms. Our main theoretical result is a Bayesian regret bound for each cost component of $\tilde{O}(HS\sqrt{AT})$ for any communicating CMDP with $S$ states, $A$ actions, and bound on the hitting time $H$. This regret bound matches the lower bound in order of time horizon $T$ and is the best-known regret bound for communicating CMDPs in the infinite-horizon undiscounted setting. Empirical results show that, despite its simplicity, our posterior sampling algorithm outperforms the existing algorithms for constrained reinforcement learning.

Deep Learning-based Analysis of Basins of Attraction

  • paper_url: http://arxiv.org/abs/2309.15732
  • repo_url: https://github.com/redlynx96/deep-learning-based-analysis-of-basins-of-attraction
  • paper_authors: David Valle, Alexandre Wagemakers, Miguel A. F. Sanjuán
  • for: Using convolutional neural networks (CNNs) to characterize the complexity and unpredictability of basins of attraction for diverse dynamical systems.
  • methods: The CNN-based characterization makes it practical to explore different parameters of dynamical systems, since conventional methods are computationally expensive when characterizing multiple basins of attraction.
  • results: A comparison of different CNN architectures shows that the proposed characterization method outperforms conventional methods, even with obsolete architectures.
    Abstract This study showcases the effectiveness of convolutional neural networks (CNNs) in characterizing the complexity and unpredictability of basins of attraction for diverse dynamical systems. This novel method is optimal for exploring different parameters of dynamical systems since the conventional methods are computationally expensive for characterizing multiple basins of attraction. Additionally, our research includes a comparison of different CNN architectures for this task showing the superiority of our proposed characterization method over the conventional methods, even with obsolete architectures.

Temporal graph models fail to capture global temporal dynamics

  • paper_url: http://arxiv.org/abs/2309.15730
  • repo_url: https://github.com/temporal-graphs-negative-sampling/tgb
  • paper_authors: Michał Daniluk, Jacek Dąbrowski
  • for: Dynamic link property prediction with temporal graph models, particularly on datasets with strong global temporal dynamics.
  • methods: A trivial optimization-free baseline of "recently popular nodes", two Wasserstein-distance-based measures that quantify the strength of a dataset's short-term and long-term global dynamics, and improved negative sampling schemes for training and evaluation.
  • results: The simple baseline outperforms other methods on medium and large datasets in the Temporal Graph Benchmark; standard negative-sampling evaluation is shown to be unsuitable for datasets with strong temporal dynamics and can cause model degeneration during training.
    Abstract A recently released Temporal Graph Benchmark is analyzed in the context of Dynamic Link Property Prediction. We outline our observations and propose a trivial optimization-free baseline of "recently popular nodes" outperforming other methods on medium and large-size datasets in the Temporal Graph Benchmark. We propose two measures based on Wasserstein distance which can quantify the strength of short-term and long-term global dynamics of datasets. By analyzing our unexpectedly strong baseline, we show how standard negative sampling evaluation can be unsuitable for datasets with strong temporal dynamics. We also show how simple negative-sampling can lead to model degeneration during training, resulting in impossible to rank, fully saturated predictions of temporal graph networks. We propose improved negative sampling schemes for both training and evaluation and prove their usefulness. We conduct a comparison with a model trained non-contrastively without negative sampling. Our results provide a challenging baseline and indicate that temporal graph network architectures need deep rethinking for usage in problems with significant global dynamics, such as social media, cryptocurrency markets or e-commerce. We open-source the code for baselines, measures and proposed negative sampling schemes.
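
A sketch of the optimization-free "recently popular nodes" baseline described above: candidate destinations are simply ranked by how often they appeared as destinations in a recent time window. The event format (source, destination, timestamp) and the window length are assumptions made for illustration.

```python
from collections import Counter

def recent_popularity_scores(events, t_now, window):
    """Score candidate destinations by their destination counts in a recent time window."""
    return Counter(dst for (_, dst, t) in events if t_now - window <= t < t_now)

events = [(1, 7, 10.0), (2, 7, 11.0), (3, 9, 11.5), (4, 7, 12.0), (5, 9, 12.5)]
scores = recent_popularity_scores(events, t_now=13.0, window=3.5)
print(sorted(scores, key=scores.get, reverse=True))   # node 7 outranks node 9 here
```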

Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription

  • paper_url: http://arxiv.org/abs/2309.15717
  • repo_url: None
  • paper_authors: Frank Cwitkowitz, Kin Wai Cheuk, Woosung Choi, Marco A. Martínez-Ramírez, Keisuke Toyama, Wei-Hsiang Liao, Yuki Mitsufuji
  • for: Improving music transcription performance, especially on low-resource tasks.
  • methods: Timbre-Trap, a framework that unifies music transcription and audio reconstruction by exploiting the strong separability between pitch and timbre: a single U-Net estimates pitch salience and reconstructs complex spectral coefficients, switching between the two outputs at decoding time.
  • results: The framework achieves transcription performance comparable to state-of-the-art instrument-agnostic methods while requiring only a small amount of annotated data.
    Abstract In recent years, research on music transcription has focused mainly on architecture design and instrument-specific data acquisition. With the lack of availability of diverse datasets, progress is often limited to solo-instrument tasks such as piano transcription. Several works have explored multi-instrument transcription as a means to bolster the performance of models on low-resource tasks, but these methods face the same data availability issues. We propose Timbre-Trap, a novel framework which unifies music transcription and audio reconstruction by exploiting the strong separability between pitch and timbre. We train a single U-Net to simultaneously estimate pitch salience and reconstruct complex spectral coefficients, selecting between either output during the decoding stage via a simple switch mechanism. In this way, the model learns to produce coefficients corresponding to timbre-less audio, which can be interpreted as pitch salience. We demonstrate that the framework leads to performance comparable to state-of-the-art instrument-agnostic transcription methods, while only requiring a small amount of annotated data.

Maximum Weight Entropy

  • paper_url: http://arxiv.org/abs/2309.15704
  • repo_url: https://github.com/antoinedemathelin/openood
  • paper_authors: Antoine de Mathelin, François Deheeger, Mathilde Mougeot, Nicolas Vayatis
  • for: Uncertainty quantification and out-of-distribution detection in deep learning using Bayesian and ensemble methods.
  • methods: A practical remedy for the lack of prediction diversity of standard approaches out-of-distribution: maximizing the entropy of the weight distribution, using a novel weight parameterization based on the singular value decomposition of the network's hidden representations.
  • results: Across a range of configurations, the proposed algorithm ranks among the top three methods in an extensive out-of-distribution detection benchmark with more than thirty competitors.
    Abstract This paper deals with uncertainty quantification and out-of-distribution detection in deep learning using Bayesian and ensemble methods. It proposes a practical solution to the lack of prediction diversity observed recently for standard approaches when used out-of-distribution (Ovadia et al., 2019; Liu et al., 2021). Considering that this issue is mainly related to a lack of weight diversity, we claim that standard methods sample in "over-restricted" regions of the weight space due to the use of "over-regularization" processes, such as weight decay and zero-mean centered Gaussian priors. We propose to solve the problem by adopting the maximum entropy principle for the weight distribution, with the underlying idea to maximize the weight diversity. Under this paradigm, the epistemic uncertainty is described by the weight distribution of maximal entropy that produces neural networks "consistent" with the training observations. Considering stochastic neural networks, a practical optimization is derived to build such a distribution, defined as a trade-off between the average empirical risk and the weight distribution entropy. We develop a novel weight parameterization for the stochastic model, based on the singular value decomposition of the neural network's hidden representations, which enables a large increase of the weight entropy for a small empirical risk penalization. We provide both theoretical and numerical results to assess the efficiency of the approach. In particular, the proposed algorithm appears in the top three best methods in all configurations of an extensive out-of-distribution detection benchmark including more than thirty competitors.

Breaking NoC Anonymity using Flow Correlation Attack

  • paper_url: http://arxiv.org/abs/2309.15687
  • repo_url: None
  • paper_authors: Hansika Weerasena, Pan Zhixin, Khushboo Rani, Prabhat Mishra
  • for: Studying the security of the Network-on-Chip (NoC), the internal communication fabric of today's multicore System-on-Chip (SoC) designs.
  • methods: An analysis of existing anonymous routing protocols, plus a lightweight anonymous routing scheme that uses traffic obfuscation to defend against machine learning (ML)-based flow correlation attacks.
  • results: Experiments with real and synthetic traffic show that existing anonymous routing is vulnerable to ML-based flow correlation attacks with up to 99% accuracy across diverse traffic patterns, while the proposed lightweight countermeasure defends against such attacks with minor hardware and performance overhead.
    Abstract Network-on-Chip (NoC) is widely used as the internal communication fabric in today's multicore System-on-Chip (SoC) designs. Security of the on-chip communication is crucial because exploiting any vulnerability in shared NoC would be a goldmine for an attacker. NoC security relies on effective countermeasures against diverse attacks. We investigate the security strength of existing anonymous routing protocols in NoC architectures. Specifically, this paper makes two important contributions. We show that the existing anonymous routing is vulnerable to machine learning (ML) based flow correlation attacks on NoCs. We propose a lightweight anonymous routing that use traffic obfuscation techniques which can defend against ML-based flow correlation attacks. Experimental studies using both real and synthetic traffic reveal that our proposed attack is successful against state-of-the-art anonymous routing in NoC architectures with a high accuracy (up to 99%) for diverse traffic patterns, while our lightweight countermeasure can defend against ML-based attacks with minor hardware and performance overhead.

Projection based fuzzy least squares twin support vector machine for class imbalance problems

  • paper_url: http://arxiv.org/abs/2309.15886
  • repo_url: None
  • paper_authors: M. Tanveer, Ritik Mishra, Bharat Richhariya
  • for: addresses the problem of class imbalance and noisy datasets in real-world classification tasks
  • methods: proposes two novel fuzzy-based approaches, IF-RELSTSVM and F-RELSTSVM, which use intuitionistic fuzzy membership and hyperplane-based fuzzy membership, respectively
  • results: outperforms baseline algorithms on several benchmark and synthetic datasets, with statistical tests confirming the significance of the proposed algorithms on noisy and imbalanced datasets.
    Abstract Class imbalance is a major problem in many real world classification tasks. Due to the imbalance in the number of samples, the support vector machine (SVM) classifier gets biased toward the majority class. Furthermore, these samples are often observed with a certain degree of noise. Therefore, to remove these problems we propose a novel fuzzy based approach to deal with class imbalanced as well noisy datasets. We propose two approaches to address these problems. The first approach is based on the intuitionistic fuzzy membership, termed as robust energy-based intuitionistic fuzzy least squares twin support vector machine (IF-RELSTSVM). Furthermore, we introduce the concept of hyperplane-based fuzzy membership in our second approach, where the final classifier is termed as robust energy-based fuzzy least square twin support vector machine (F-RELSTSVM). By using this technique, the membership values are based on a projection based approach, where the data points are projected on the hyperplanes. The performance of the proposed algorithms is evaluated on several benchmark and synthetic datasets. The experimental results show that the proposed IF-RELSTSVM and F-RELSTSVM models outperform the baseline algorithms. Statistical tests are performed to check the significance of the proposed algorithms. The results show the applicability of the proposed algorithms on noisy as well as imbalanced datasets.

Joint Sampling and Optimisation for Inverse Rendering

  • paper_url: http://arxiv.org/abs/2309.15676
  • repo_url: None
  • paper_authors: Martin Balint, Karol Myszkowski, Hans-Peter Seidel, Gurprit Singh
  • for: solve difficult inverse problems such as inverse rendering using Monte Carlo estimated gradients
  • methods: use interleaving sampling and optimisation, update and reuse past samples with low-variance finite-difference estimators, combine proportional and finite-difference samples to continuously reduce variance
  • results: speed up convergence of optimisation tasks, demonstrate effectiveness in inverse path tracing
    Abstract When dealing with difficult inverse problems such as inverse rendering, using Monte Carlo estimated gradients to optimise parameters can slow down convergence due to variance. Averaging many gradient samples in each iteration reduces this variance trivially. However, for problems that require thousands of optimisation iterations, the computational cost of this approach rises quickly. We derive a theoretical framework for interleaving sampling and optimisation. We update and reuse past samples with low-variance finite-difference estimators that describe the change in the estimated gradients between each iteration. By combining proportional and finite-difference samples, we continuously reduce the variance of our novel gradient meta-estimators throughout the optimisation process. We investigate how our estimator interlinks with Adam and derive a stable combination. We implement our method for inverse path tracing and demonstrate how our estimator speeds up convergence on difficult optimisation tasks.

On Computational Entanglement and Its Interpretation in Adversarial Machine Learning

  • paper_url: http://arxiv.org/abs/2309.15669
  • repo_url: None
  • paper_authors: YenLung Lai, Xingbo Dong, Zhe Jin
  • for: This paper explores the intrinsic complexity and interpretability of adversarial machine learning models.
  • methods: The authors define entanglement computationally and demonstrate the existence of strong correlations between distant feature samples, akin to entanglement in the quantum realm.
  • results: The study reveals links between machine learning model complexity and Einstein’s theory of special relativity, challenging conventional perspectives on adversarial transferability and providing insights into more robust and interpretable models.
    Abstract Adversarial examples in machine learning has emerged as a focal point of research due to their remarkable ability to deceive models with seemingly inconspicuous input perturbations, potentially resulting in severe consequences. In this study, we embark on a comprehensive exploration of adversarial machine learning models, shedding light on their intrinsic complexity and interpretability. Our investigation reveals intriguing links between machine learning model complexity and Einstein's theory of special relativity, through the concept of entanglement. More specific, we define entanglement computationally and demonstrate that distant feature samples can exhibit strong correlations, akin to entanglement in quantum realm. This revelation challenges conventional perspectives in describing the phenomenon of adversarial transferability observed in contemporary machine learning models. By drawing parallels with the relativistic effects of time dilation and length contraction during computation, we gain deeper insights into adversarial machine learning, paving the way for more robust and interpretable models in this rapidly evolving field.

Federated Deep Equilibrium Learning: A Compact Shared Representation for Edge Communication Efficiency

  • paper_url: http://arxiv.org/abs/2309.15659
  • repo_url: None
  • paper_authors: Long Tan Le, Tuan Dung Nguyen, Tung-Anh Nguyen, Choong Seon Hong, Nguyen H. Tran
  • for: A federated learning framework that addresses communication bottlenecks, data heterogeneity, and memory limitations in edge AI deployments by combining deep equilibrium learning with consensus optimization.
  • methods: A compact shared representation is learned across edge nodes while personalized models are derived per node; the model consists of an equilibrium layer, acting as a global feature representation, followed by traditional neural network layers that each node adapts locally, trained with a distributed algorithm based on ADMM consensus optimization.
  • results: Across several benchmarks, FeDEQ matches state-of-the-art personalized methods while using models up to 4x smaller in communication size and with a 1.5x lower memory footprint during training.
    Abstract Federated Learning (FL) is a prominent distributed learning paradigm facilitating collaboration among nodes within an edge network to co-train a global model without centralizing data. By shifting computation to the network edge, FL offers robust and responsive edge-AI solutions and enhance privacy-preservation. However, deploying deep FL models within edge environments is often hindered by communication bottlenecks, data heterogeneity, and memory limitations. To address these challenges jointly, we introduce FeDEQ, a pioneering FL framework that effectively employs deep equilibrium learning and consensus optimization to exploit a compact shared data representation across edge nodes, allowing the derivation of personalized models specific to each node. We delve into a unique model structure composed of an equilibrium layer followed by traditional neural network layers. Here, the equilibrium layer functions as a global feature representation that edge nodes can adapt to personalize their local layers. Capitalizing on FeDEQ's compactness and representation power, we present a novel distributed algorithm rooted in the alternating direction method of multipliers (ADMM) consensus optimization and theoretically establish its convergence for smooth objectives. Experiments across various benchmarks demonstrate that FeDEQ achieves performance comparable to state-of-the-art personalized methods while employing models of up to 4 times smaller in communication size and 1.5 times lower memory footprint during training.

SANGEA: Scalable and Attributed Network Generation

  • paper_url: http://arxiv.org/abs/2309.15648
  • repo_url: None
  • paper_authors: Valentin Lemaire, Youssef Achenchabe, Lucas Ody, Houssem Eddine Souid, Gianmarco Aversano, Nicolas Posocco, Sabri Skhiri
  • for: extend the applicability of existing synthetic graph generators to large graphs.
  • methods: split the large graph into communities, train one synthetic graph generator (SGG) per community, and then link the generated community graphs back together.
  • results: experiments show the generated graphs are similar to the originals in both topology and node-feature distribution, achieve high utility on downstream tasks such as link prediction, and obtain reasonable privacy scores.
    Abstract The topic of synthetic graph generators (SGGs) has recently received much attention due to the wave of the latest breakthroughs in generative modelling. However, many state-of-the-art SGGs do not scale well with the graph size. Indeed, in the generation process, all the possible edges for a fixed number of nodes must often be considered, which scales in $\mathcal{O}(N^2)$, with $N$ being the number of nodes in the graph. For this reason, many state-of-the-art SGGs are not applicable to large graphs. In this paper, we present SANGEA, a sizeable synthetic graph generation framework which extends the applicability of any SGG to large graphs. By first splitting the large graph into communities, SANGEA trains one SGG per community, then links the community graphs back together to create a synthetic large graph. Our experiments show that the graphs generated by SANGEA have high similarity to the original graph, in terms of both topology and node feature distribution. Additionally, these generated graphs achieve high utility on downstream tasks such as link prediction. Finally, we provide a privacy assessment of the generated graphs to show that, even though they have excellent utility, they also achieve reasonable privacy scores.
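
A minimal sketch of the split-generate-relink pipeline described in the abstract, assuming networkx for community detection; fit_sgg and sample_sgg are hypothetical stand-ins for whichever synthetic graph generator is plugged in, and the final re-linking step is only outlined in a comment rather than implemented.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

def generate_large_synthetic_graph(G, fit_sgg, sample_sgg, seed=0):
    """fit_sgg / sample_sgg are hypothetical hooks for any small-graph generative model."""
    parts, offset = [], 0
    for nodes in louvain_communities(G, seed=seed):        # 1. split the large graph into communities
        model = fit_sgg(G.subgraph(nodes))                  # 2. train one SGG per community
        synth = sample_sgg(model)
        mapping = {n: i + offset for i, n in enumerate(synth.nodes())}
        parts.append(nx.relabel_nodes(synth, mapping))      # keep node ids disjoint across parts
        offset += synth.number_of_nodes()
    large = nx.compose_all(parts)
    # 3. link the community graphs back together, e.g. by sampling roughly as many
    #    cross-community edges as the original graph had between each pair of communities.
    return large
```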

Cold & Warm Net: Addressing Cold-Start Users in Recommender Systems

  • paper_url: http://arxiv.org/abs/2309.15646
  • repo_url: None
  • paper_authors: Xiangyu Zhang, Zongqiang Kuang, Zehao Zhang, Fan Huang, Xianfeng Tan
  • for: solve the user cold-start problem in the matching stage of recommender systems.
  • methods: utilize side information or meta-learning to model cold-start users, and incorporate the results from two experts using a gate network. Additionally, dynamic knowledge distillation is used to assist experts in better learning user representation, and comprehensive mutual information is used to select highly relevant features for the bias net.
  • results: outperform other models on all user types on public datasets, and achieve a significant increase in app dwell time and user retention rate on an industrial short video platform.
    Abstract Cold-start recommendation is one of the major challenges faced by recommender systems (RS). Herein, we focus on the user cold-start problem. Recently, methods utilizing side information or meta-learning have been used to model cold-start users. However, it is difficult to deploy these methods to industrial RS. There has not been much research that pays attention to the user cold-start problem in the matching stage. In this paper, we propose Cold & Warm Net based on expert models who are responsible for modeling cold-start and warm-up users respectively. A gate network is applied to incorporate the results from two experts. Furthermore, dynamic knowledge distillation acting as a teacher selector is introduced to assist experts in better learning user representation. With comprehensive mutual information, features highly relevant to user behavior are selected for the bias net which explicitly models user behavior bias. Finally, we evaluate our Cold & Warm Net on public datasets in comparison to models commonly applied in the matching stage and it outperforms other models on all user types. The proposed model has also been deployed on an industrial short video platform and achieves a significant increase in app dwell time and user retention rate.
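
The abstract describes two expert models (one for cold-start users, one for warmed-up users) whose outputs are combined by a gate network. The following is a generic mixture-of-experts sketch of that structure, not the authors' model; the feature dimension, MLP shapes, and class name ColdWarmUserTower are illustrative assumptions, and the dynamic knowledge distillation and bias net are omitted.

```python
import torch
import torch.nn as nn

class ColdWarmUserTower(nn.Module):
    def __init__(self, feat_dim, emb_dim=64):
        super().__init__()
        self.cold_expert = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim))
        self.warm_expert = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim))
        self.gate = nn.Sequential(nn.Linear(feat_dim, 2), nn.Softmax(dim=-1))

    def forward(self, user_feats):
        g = self.gate(user_feats)                      # per-user mixing weights over the two experts
        cold = self.cold_expert(user_feats)
        warm = self.warm_expert(user_feats)
        return g[:, :1] * cold + g[:, 1:] * warm       # blended user representation

tower = ColdWarmUserTower(feat_dim=32)
print(tower(torch.randn(4, 32)).shape)                 # torch.Size([4, 64])
```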

Why do Angular Margin Losses work well for Semi-Supervised Anomalous Sound Detection?

  • paper_url: http://arxiv.org/abs/2309.15643
  • repo_url: None
  • paper_authors: Kevin Wilkinghoff, Frank Kurth
  • for: investigate why angular margin losses combined with auxiliary tasks work well for detecting anomalous sounds.
  • methods: use an angular margin loss with a related classification task as the auxiliary task, and show both theoretically and experimentally that this also minimizes a compactness loss while inherently preventing trivial solutions.
  • results: experiments show that using a related classification task as the auxiliary task teaches the model representations suitable for detecting anomalous sounds, significantly outperforming generative and one-class models in noisy conditions.
    Abstract State-of-the-art anomalous sound detection systems often utilize angular margin losses to learn suitable representations of acoustic data using an auxiliary task, which usually is a supervised or self-supervised classification task. The underlying idea is that, in order to solve this auxiliary task, specific information about normal data needs to be captured in the learned representations and that this information is also sufficient to differentiate between normal and anomalous samples. Especially in noisy conditions, discriminative models based on angular margin losses tend to significantly outperform systems based on generative or one-class models. The goal of this work is to investigate why using angular margin losses with auxiliary tasks works well for detecting anomalous sounds. To this end, it is shown, both theoretically and experimentally, that minimizing angular margin losses also minimizes compactness loss while inherently preventing learning trivial solutions. Furthermore, multiple experiments are conducted to show that using a related classification task as an auxiliary task teaches the model to learn representations suitable for detecting anomalous sounds in noisy conditions. Among these experiments are performance evaluations, visualizing the embedding space with t-SNE and visualizing the input representations with respect to the anomaly score using randomized input sampling for explanation.
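
For readers unfamiliar with angular margin losses, the sketch below shows one common ArcFace-style formulation that could serve as the auxiliary classification objective discussed above; it is a generic example rather than the paper's exact recipe, and margin and scale are illustrative hyperparameters.

```python
import torch
import torch.nn.functional as F

def angular_margin_loss(embeddings, class_weights, labels, margin=0.5, scale=30.0):
    emb = F.normalize(embeddings, dim=1)              # unit-length embeddings
    w = F.normalize(class_weights, dim=1)             # unit-length class centers
    cos = emb @ w.t()                                 # cosine similarity to every class
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    target = F.one_hot(labels, num_classes=w.size(0)).bool()
    logits = torch.where(target, torch.cos(theta + margin), cos)  # add the margin for the true class only
    return F.cross_entropy(scale * logits, labels)

emb = torch.randn(16, 128)                            # embeddings produced by the network
W = torch.randn(10, 128, requires_grad=True)          # learnable class centers
y = torch.randint(0, 10, (16,))
print(angular_margin_loss(emb, W, y))
```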

Efficient tensor network simulation of IBM’s largest quantum processors

  • paper_url: http://arxiv.org/abs/2309.15642
  • repo_url: None
  • paper_authors: Siddhartha Patra, Saeed S. Jahromi, Sukhbinder Singh, Roman Orus
  • For: The paper demonstrates how quantum-inspired 2D tensor networks can be used to efficiently and accurately simulate large quantum processors, specifically the IBM Eagle, Osprey, and Condor processors.
  • Methods: The paper uses graph-based Projected Entangled Pair States (gPEPS) to simulate the dynamics of a complex quantum many-body system, the kicked Ising experiment.
  • Results: The paper achieves very large unprecedented accuracy with remarkably low computational resources for this model, and extends the results to larger qubit counts (433 and 1121 qubits) and longer evolution times. Additionally, the paper demonstrates accurate simulations for infinitely-many qubits.
    Abstract We show how quantum-inspired 2d tensor networks can be used to efficiently and accurately simulate the largest quantum processors from IBM, namely Eagle (127 qubits), Osprey (433 qubits) and Condor (1121 qubits). We simulate the dynamics of a complex quantum many-body system -- specifically, the kicked Ising experiment considered recently by IBM in Nature 618, p. 500-505 (2023) -- using graph-based Projected Entangled Pair States (gPEPS), which was proposed by some of us in PRB 99, 195105 (2019). Our results show that simple tensor updates are already sufficient to achieve very large unprecedented accuracy with remarkably low computational resources for this model. Apart from simulating the original experiment for 127 qubits, we also extend our results to 433 and 1121 qubits, and for evolution times around 8 times longer, thus setting a benchmark for the newest IBM quantum machines. We also report accurate simulations for infinitely-many qubits. Our results show that gPEPS are a natural tool to efficiently simulate quantum computers with an underlying lattice-based qubit connectivity, such as all quantum processors based on superconducting qubits.

Enhancing Sharpness-Aware Optimization Through Variance Suppression

  • paper_url: http://arxiv.org/abs/2309.15639
  • repo_url: None
  • paper_authors: Bingcong Li, Georgios B. Giannakis
  • for: improve the generalization of deep neural networks without requiring sizable data augmentation.
  • methods: exploit the geometry of the loss function to seek 'flat valleys' by minimizing the maximum loss caused by an adversary perturbing the parameters within a neighborhood.
  • results: propose VaSSO, which stabilizes the adversary through variance suppression to avoid an 'over-friendly adversary', yielding numerical improvements over SAM on model-agnostic tasks and robustness against high levels of label noise.
    Abstract Sharpness-aware minimization (SAM) has well documented merits in enhancing generalization of deep neural networks, even without sizable data augmentation. Embracing the geometry of the loss function, where neighborhoods of 'flat minima' heighten generalization ability, SAM seeks 'flat valleys' by minimizing the maximum loss caused by an adversary perturbing parameters within the neighborhood. Although critical to account for sharpness of the loss function, such an 'over-friendly adversary' can curtail the outmost level of generalization. The novel approach of this contribution fosters stabilization of adversaries through variance suppression (VaSSO) to avoid such friendliness. VaSSO's provable stability safeguards its numerical improvement over SAM in model-agnostic tasks, including image classification and machine translation. In addition, experiments confirm that VaSSO endows SAM with robustness against high levels of label noise.
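
A minimal sketch of one sharpness-aware minimization (SAM) step, the baseline that VaSSO builds on: perturb the weights along the normalized gradient, recompute the gradient at the perturbed point, undo the perturbation, and step with the base optimizer. The variance suppression that distinguishes VaSSO is only indicated by a comment, since its exact update is not spelled out here; rho and the two-forward-pass structure are standard SAM choices, not the paper's code.

```python
import torch

def sam_step(model, loss_fn, batch, base_optimizer, rho=0.05):
    x, y = batch
    params = [p for p in model.parameters() if p.requires_grad]

    loss_fn(model(x), y).backward()                       # 1. gradient at the current weights
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
    eps = []
    with torch.no_grad():
        for p in params:
            e = rho * p.grad / (grad_norm + 1e-12)        # adversarial weight perturbation
            # VaSSO would additionally suppress the variance of this adversarial direction
            p.add_(e)
            eps.append(e)
    model.zero_grad()

    loss_fn(model(x), y).backward()                       # 2. gradient at the perturbed weights
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)                                     # undo the perturbation
    base_optimizer.step()                                 # 3. descend with the sharpness-aware gradient
    model.zero_grad()
```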

Entropic Matching for Expectation Propagation of Markov Jump Processes

  • paper_url: http://arxiv.org/abs/2309.15604
  • repo_url: None
  • paper_authors: Bastian Alt, Heinz Koeppl
  • for: study statistical inference for latent continuous-time stochastic processes, which is often intractable, particularly for discrete state-space processes described by Markov jump processes.
  • methods: propose a tractable inference scheme based on an entropic matching framework that can be embedded into the well-known expectation propagation algorithm; give closed-form results for a simple family of approximate distributions and apply the scheme to chemical reaction networks, an important modeling tool in systems biology.
  • results: derive closed-form expressions for point estimation of the underlying parameters via an approximate expectation-maximization procedure, evaluate the method on various chemical reaction network instances including a stochastic Lotka-Volterra example, and discuss its limitations and potential for future improvements.
    Abstract This paper addresses the problem of statistical inference for latent continuous-time stochastic processes, which is often intractable, particularly for discrete state space processes described by Markov jump processes. To overcome this issue, we propose a new tractable inference scheme based on an entropic matching framework that can be embedded into the well-known expectation propagation algorithm. We demonstrate the effectiveness of our method by providing closed-form results for a simple family of approximate distributions and apply it to the general class of chemical reaction networks, which are a crucial tool for modeling in systems biology. Moreover, we derive closed form expressions for point estimation of the underlying parameters using an approximate expectation maximization procedure. We evaluate the performance of our method on various chemical reaction network instantiations, including a stochastic Lotka-Voltera example, and discuss its limitations and potential for future improvements. Our proposed approach provides a promising direction for addressing complex continuous-time Bayesian inference problems.

Distill Knowledge in Multi-task Reinforcement Learning with Optimal-Transport Regularization

  • paper_url: http://arxiv.org/abs/2309.15603
  • repo_url: None
  • paper_authors: Bang Giang Le, Viet Cuong Ta
  • for: improve the data efficiency of multi-task reinforcement learning by transferring knowledge between different but related tasks.
  • methods: replace the Kullback-Leibler regularization with an Optimal transport-based reward: the Optimal transport distance between task state distributions is approximated with the Sinkhorn mapping and used as an amortized reward to regularize how much information is shared.
  • results: on several grid-based multi-goal navigation tasks, the method speeds up the agents' learning process and outperforms several baselines.
    Abstract In multi-task reinforcement learning, it is possible to improve the data efficiency of training agents by transferring knowledge from other different but related tasks, because the experiences from different tasks are usually biased toward the specific task goals. Traditional methods rely on Kullback-Leibler regularization to stabilize the transfer of knowledge from one task to the others. In this work, we explore the direction of replacing the Kullback-Leibler divergence with a novel Optimal transport-based regularization. By using the Sinkhorn mapping, we can approximate the Optimal transport distance between the state distributions of tasks. The distance is then used as an amortized reward to regularize the amount of shared information. We experiment with our framework on several grid-based multi-goal navigation tasks to validate the effectiveness of the approach. The results show that our added Optimal transport-based rewards are able to speed up the learning process of agents and outperform several baselines on multi-task learning.
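
The amortized reward described above is an optimal transport distance between task state distributions, approximated with the Sinkhorn mapping. The sketch below computes an entropic-regularized OT cost between two sets of sampled states with plain Sinkhorn iterations; the squared-Euclidean cost, reg, and n_iter are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def sinkhorn_distance(states_a, states_b, reg=0.1, n_iter=200):
    # pairwise squared-Euclidean cost between sampled states
    C = ((states_a[:, None, :] - states_b[None, :, :]) ** 2).sum(-1)
    K = np.exp(-C / reg)
    a = np.full(len(states_a), 1.0 / len(states_a))
    b = np.full(len(states_b), 1.0 / len(states_b))
    u = np.ones_like(a)
    for _ in range(n_iter):                       # Sinkhorn fixed-point iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]               # approximate transport plan
    return float((P * C).sum())                   # entropic OT cost, usable as a transfer reward

s_a = np.random.randn(64, 4)                      # sampled states from task A
s_b = np.random.randn(64, 4)                      # sampled states from task B
print(sinkhorn_distance(s_a, s_b))
```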

OceanBench: The Sea Surface Height Edition

  • paper_url: http://arxiv.org/abs/2309.15599
  • repo_url: https://github.com/jejjohnson/oceanbench
  • paper_authors: J. Emmanuel Johnson, Quentin Febvre, Anastasia Gorbunova, Sammy Metref, Maxime Ballarotta, Julien Le Sommer, Ronan Fablet
  • for: This paper aims to provide a unifying framework for machine learning (ML) researchers to benchmark their models and customize their pipelines for ocean satellite data interpolation challenges.
  • methods: The paper uses satellite remote sensing data and machine learning techniques to develop a standardized processing framework called OceanBench, which provides plug-and-play data and pre-configured pipelines for ML researchers.
  • results: The paper demonstrates the effectiveness of the OceanBench framework through a first edition dedicated to sea surface height (SSH) interpolation challenges, providing datasets and ML-ready benchmarking pipelines for simulated ocean satellite data, multi-modal and multi-sensor fusion issues, and transfer-learning to real ocean satellite observations.
    Abstract The ocean profoundly influences human activities and plays a critical role in climate regulation. Our understanding has improved over the last decades with the advent of satellite remote sensing data, allowing us to capture essential quantities over the globe, e.g., sea surface height (SSH). However, ocean satellite data presents challenges for information extraction due to their sparsity and irregular sampling, signal complexity, and noise. Machine learning (ML) techniques have demonstrated their capabilities in dealing with large-scale, complex signals. Therefore we see an opportunity for ML models to harness the information contained in ocean satellite data. However, data representation and relevant evaluation metrics can be the defining factors when determining the success of applied ML. The processing steps from the raw observation data to a ML-ready state and from model outputs to interpretable quantities require domain expertise, which can be a significant barrier to entry for ML researchers. OceanBench is a unifying framework that provides standardized processing steps that comply with domain-expert standards. It provides plug-and-play data and pre-configured pipelines for ML researchers to benchmark their models and a transparent configurable framework for researchers to customize and extend the pipeline for their tasks. In this work, we demonstrate the OceanBench framework through a first edition dedicated to SSH interpolation challenges. We provide datasets and ML-ready benchmarking pipelines for the long-standing problem of interpolating observations from simulated ocean satellite data, multi-modal and multi-sensor fusion issues, and transfer-learning to real ocean satellite observations. The OceanBench framework is available at github.com/jejjohnson/oceanbench and the dataset registry is available at github.com/quentinf00/oceanbench-data-registry.

Exciton-Polariton Condensates: A Fourier Neural Operator Approach

  • paper_url: http://arxiv.org/abs/2309.15593
  • repo_url: None
  • paper_authors: Surya T. Sathujoda, Yuan Wang, Kanishk Gandhi
  • for: apply a machine-learning-based Fourier Neural Operator approach to the nonlinear dynamics of exciton-polariton condensate systems.
  • methods: use a Fourier Neural Operator to solve the Gross-Pitaevskii equation coupled with extra exciton rate equations.
  • results: the method predicts the system's final-state solutions with high accuracy and runs roughly 1000 times faster than CUDA-based GPU solvers, opening the way for potential all-optical chip design workflows.
    Abstract Advancements in semiconductor fabrication over the past decade have catalyzed extensive research into all-optical devices driven by exciton-polariton condensates. Preliminary validations of such devices, including transistors, have shown encouraging results even under ambient conditions. A significant challenge still remains for large scale application however: the lack of a robust solver that can be used to simulate complex nonlinear systems which require an extended period of time to stabilize. Addressing this need, we propose the application of a machine-learning-based Fourier Neural Operator approach to find the solution to the Gross-Pitaevskii equations coupled with extra exciton rate equations. This work marks the first direct application of Neural Operators to an exciton-polariton condensate system. Our findings show that the proposed method can predict final-state solutions to a high degree of accuracy almost 1000 times faster than CUDA-based GPU solvers. Moreover, this paves the way for potential all-optical chip design workflows by integrating experimental data.
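
The core building block of a Fourier Neural Operator is a spectral convolution: transform to Fourier space, keep and mix only the lowest modes, and transform back. The 1D PyTorch sketch below illustrates that block in isolation; it is not the authors' solver for the coupled Gross-Pitaevskii and exciton rate equations, and the channel count, number of retained modes, and grid size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SpectralConv1d(nn.Module):
    """Keeps only the lowest Fourier modes and mixes channels there."""
    def __init__(self, channels, modes):
        super().__init__()
        self.modes = modes
        self.weight = nn.Parameter(
            (1.0 / channels) * torch.randn(channels, channels, modes, dtype=torch.cfloat)
        )

    def forward(self, x):                           # x: (batch, channels, grid_points)
        x_ft = torch.fft.rfft(x)                    # go to Fourier space
        out_ft = torch.zeros_like(x_ft)
        out_ft[..., :self.modes] = torch.einsum(
            "bim,iom->bom", x_ft[..., :self.modes], self.weight
        )
        return torch.fft.irfft(out_ft, n=x.size(-1))    # back to physical space

layer = SpectralConv1d(channels=8, modes=16)
u = torch.randn(4, 8, 128)                          # e.g. a discretized condensate field
print(layer(u).shape)                               # torch.Size([4, 8, 128])
```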

Demographic Parity: Mitigating Biases in Real-World Data

  • paper_url: http://arxiv.org/abs/2309.17347
  • repo_url: None
  • paper_authors: Orestis Loukas, Ho-Ryun Chung
  • for: propose a robust method that removes unwanted biases from computer-aided decision systems while maximally preserving classification utility.
  • methods: derive from real-world data an asymptotic dataset that uniquely encodes demographic parity and realism, which can then be used to train a wide range of classifiers.
  • results: benchmarking confirms that classifiers trained on synthetic samples generated from this asymptotic dataset exhibit no explicit or implicit bias.
    Abstract Computer-based decision systems are widely used to automate decisions in many aspects of everyday life, which include sensitive areas like hiring, loaning and even criminal sentencing. A decision pipeline heavily relies on large volumes of historical real-world data for training its models. However, historical training data often contains gender, racial or other biases which are propagated to the trained models influencing computer-based decisions. In this work, we propose a robust methodology that guarantees the removal of unwanted biases while maximally preserving classification utility. Our approach can always achieve this in a model-independent way by deriving from real-world data the asymptotic dataset that uniquely encodes demographic parity and realism. As a proof-of-principle, we deduce from public census records such an asymptotic dataset from which synthetic samples can be generated to train well-established classifiers. Benchmarking the generalization capability of these classifiers trained on our synthetic data, we confirm the absence of any explicit or implicit bias in the computer-aided decision.
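
As a small illustration of the fairness criterion the paper targets, the sketch below computes the demographic parity gap, i.e., the largest difference in positive-decision rates across protected groups; it is a generic check, not the paper's construction of the asymptotic dataset, and the toy predictions and group labels are made up for the example.

```python
import numpy as np

def demographic_parity_gap(y_pred, protected):
    """Max difference in positive-decision rates across protected groups (0 = exact parity)."""
    rates = [y_pred[protected == g].mean() for g in np.unique(protected)]
    return max(rates) - min(rates)

# Toy example: predictions for two groups of four individuals each.
y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(demographic_parity_gap(y_pred, group))   # 0.75 - 0.25 = 0.5
```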

Towards Faithful Neural Network Intrinsic Interpretation with Shapley Additive Self-Attribution

  • paper_url: http://arxiv.org/abs/2309.15559
  • repo_url: None
  • paper_authors: Ying Sun, Hengshu Zhu, Hui Xiong
  • for: provide a self-attributing neural network model that improves interpretability without compromising expressiveness.
  • methods: formulate a generic Additive Self-Attribution (ASA) framework and propose the Shapley Additive Self-Attributing Neural Network (SASANet), whose self-attribution values provably equal the output's Shapley values; SASANet uses a marginal-contribution-based sequential schema and internal distillation-based training to model meaningful outputs for any number of features, yielding an unapproximated meaningful value function.
  • results: experiments show that SASANet outperforms existing self-attributing models, rivals black-box models, and interprets its own predictions more precisely and efficiently than post-hoc methods.
    Abstract Self-interpreting neural networks have garnered significant interest in research. Existing works in this domain often (1) lack a solid theoretical foundation ensuring genuine interpretability or (2) compromise model expressiveness. In response, we formulate a generic Additive Self-Attribution (ASA) framework. Observing the absence of Shapley value in Additive Self-Attribution, we propose Shapley Additive Self-Attributing Neural Network (SASANet), with theoretical guarantees for the self-attribution value equal to the output's Shapley values. Specifically, SASANet uses a marginal contribution-based sequential schema and internal distillation-based training strategies to model meaningful outputs for any number of features, resulting in un-approximated meaningful value function. Our experimental results indicate SASANet surpasses existing self-attributing models in performance and rivals black-box models. Moreover, SASANet is shown more precise and efficient than post-hoc methods in interpreting its own predictions.

Startup success prediction and VC portfolio simulation using CrunchBase data

  • paper_url: http://arxiv.org/abs/2309.15552
  • repo_url: None
  • paper_authors: Mark Potanin, Andrey Chertok, Konstantin Zorin, Cyril Shtabtsovsky
  • for: This paper aims to predict key success milestones for startups at their Series B and Series C investment stages, such as achieving an Initial Public Offering (IPO), attaining unicorn status, or executing a successful Merger and Acquisition (M&A).
  • methods: The paper introduces a novel deep learning model for predicting startup success, which integrates a variety of factors such as funding metrics, founder features, and industry category. The model uses a comprehensive backtesting algorithm to simulate the venture capital investment process and evaluate its performance against historical data.
  • results: The paper achieved a 14 times capital growth and successfully identified high-potential startups in their B round, including Revolut, DigitalOcean, Klarna, Github, and others. The empirical findings highlight the importance of incorporating diverse feature sets in enhancing the model's predictive accuracy.
    Abstract Predicting startup success presents a formidable challenge due to the inherently volatile landscape of the entrepreneurial ecosystem. The advent of extensive databases like Crunchbase jointly with available open data enables the application of machine learning and artificial intelligence for more accurate predictive analytics. This paper focuses on startups at their Series B and Series C investment stages, aiming to predict key success milestones such as achieving an Initial Public Offering (IPO), attaining unicorn status, or executing a successful Merger and Acquisition (M\&A). We introduce a novel deep learning model for predicting startup success, integrating a variety of factors such as funding metrics, founder features, and industry category. A distinctive feature of our research is the use of a comprehensive backtesting algorithm designed to simulate the venture capital investment process. This simulation allows for a robust evaluation of our model's performance against historical data, providing actionable insights into its practical utility in real-world investment contexts. Evaluating our model on Crunchbase data, we achieved a 14 times capital growth and successfully identified on the B round high-potential startups including Revolut, DigitalOcean, Klarna, Github and others. Our empirical findings illuminate the importance of incorporating diverse feature sets in enhancing the model's predictive accuracy. In summary, our work demonstrates the considerable promise of deep learning models and alternative unstructured data in predicting startup success and sets the stage for future advancements in this research area.

Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models

  • paper_url: http://arxiv.org/abs/2309.15531
  • repo_url: None
  • paper_authors: Jung Hwan Heo, Jeonghoon Kim, Beomseok Kwon, Byeongwook Kim, Se Jung Kwon, Dongsoo Lee
  • for: propose a new quantization method that improves the efficiency of large language models (LLMs) in small-batch inference settings.
  • methods: introduce per-IC quantization, which creates quantization groups within each input channel (IC) so that activation outliers are isolated within a group, and propose Adaptive Dimensions (AdaDim), a versatile quantization framework that adapts to different weight-sensitivity patterns.
  • results: experiments show that combining these methods with Round-To-Nearest and GPTQ yields significant improvements across language modeling benchmarks, up to +4.7% on MMLU for base LLMs and up to +10% on HumanEval for instruction-tuned LLMs.
    Abstract Large Language Models (LLMs) have recently demonstrated a remarkable success across various tasks. However, efficiently serving LLMs has been a challenge due to its large memory bottleneck, specifically in small batch inference settings (e.g. mobile devices). Weight-only quantization can be a promising approach, but sub-4 bit quantization remains a challenge due to large-magnitude activation outliers. To mitigate the undesirable outlier effect, we first propose per-IC quantization, a simple yet effective method that creates quantization groups within each input channel (IC) rather than the conventional per-output channel (OC). Our method is motivated by the observation that activation outliers affect the input dimension of the weight matrix, so similarly grouping the weights in the IC direction can isolate outliers to be within a group. We also find that activation outliers do not dictate quantization difficulty, and inherent weight sensitivities also exist. With per-IC quantization as a new outlier-friendly scheme, we then propose Adaptive Dimensions (AdaDim), a versatile quantization framework that can adapt to various weight sensitivity patterns. We demonstrate the effectiveness of AdaDim by augmenting prior methods such as Round-To-Nearest and GPTQ, showing significant improvements across various language modeling benchmarks for both base (up to +4.7% on MMLU) and instruction-tuned (up to +10% on HumanEval) LLMs.
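
A minimal sketch of the per-input-channel (per-IC) grouping idea: quantization groups run down each input-channel column of the weight matrix, so a column hit by activation outliers is quantized in isolation from the others. This is illustrative only; the symmetric round-to-nearest scheme, 4-bit width, and group size of 128 are assumptions, not the paper's full AdaDim recipe, and the sketch assumes the output-channel count is divisible by the group size.

```python
import torch

def quantize_per_ic(weight, n_bits=4, group_size=128):
    """weight: (out_channels, in_channels); groups run along the output-channel axis
    within each input-channel column (assumes out_channels % group_size == 0)."""
    qmax = 2 ** (n_bits - 1) - 1
    oc, ic = weight.shape
    w = weight.T.reshape(ic, oc // group_size, group_size)   # (IC, n_groups, group_size)
    scale = w.abs().amax(dim=-1, keepdim=True) / qmax        # one scale per (IC, group)
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return (q * scale).reshape(ic, oc).T                     # dequantized weights

w = torch.randn(512, 1024)
w_q = quantize_per_ic(w)
print((w - w_q).abs().mean())    # average round-to-nearest error of the 4-bit per-IC scheme
```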

GNN4EEG: A Benchmark and Toolkit for Electroencephalography Classification with Graph Neural Network

  • paper_url: http://arxiv.org/abs/2309.15515
  • repo_url: https://github.com/miracle-2001/gnn4eeg
  • paper_authors: Kaiyuan Zhang, Ziyi Ye, Qingyao Ai, Xiaohui Xie, Yiqun Liu
  • for: provide an easy-to-use Graph Neural Network (GNN) toolkit for modeling EEG signals.
  • methods: the toolkit comprises three components: (i) a large benchmark built from four EEG classification tasks with EEG data collected from 123 participants; (ii) easy-to-use implementations of state-of-the-art GNN-based EEG classification models such as DGCNN and RGNN; (iii) implementations of comprehensive experimental settings and evaluation protocols, e.g., data-splitting and cross-validation protocols.
  • results: using the GNN4EEG toolkit, high-accuracy EEG classification can be achieved.
    Abstract Electroencephalography(EEG) classification is a crucial task in neuroscience, neural engineering, and several commercial applications. Traditional EEG classification models, however, have often overlooked or inadequately leveraged the brain's topological information. Recognizing this shortfall, there has been a burgeoning interest in recent years in harnessing the potential of Graph Neural Networks (GNN) to exploit the topological information by modeling features selected from each EEG channel in a graph structure. To further facilitate research in this direction, we introduce GNN4EEG, a versatile and user-friendly toolkit for GNN-based modeling of EEG signals. GNN4EEG comprises three components: (i)A large benchmark constructed with four EEG classification tasks based on EEG data collected from 123 participants. (ii)Easy-to-use implementations on various state-of-the-art GNN-based EEG classification models, e.g., DGCNN, RGNN, etc. (iii)Implementations of comprehensive experimental settings and evaluation protocols, e.g., data splitting protocols, and cross-validation protocols. GNN4EEG is publicly released at https://github.com/Miracle-2001/GNN4EEG.

Bayesian Personalized Federated Learning with Shared and Personalized Uncertainty Representations

  • paper_url: http://arxiv.org/abs/2309.15499
  • repo_url: None
  • paper_authors: Hui Chen, Hengyu Liu, Longbing Cao, Tiancheng Zhang
  • for: This paper aims to address challenges in existing personalized federated learning (PFL) by introducing a Bayesian personalized federated learning (BPFL) framework that quantifies uncertainty and heterogeneity within and across clients.
  • methods: The BPFL framework uses a Bayesian federated neural network (BPFed) to decompose hidden neural representations into shared and local components, and jointly learns cross-client shared uncertainty and client-specific personalized uncertainty over statistically heterogeneous client data.
  • results: The paper provides theoretical analysis and guarantees, as well as experimental evaluation of BPFed against diversified baselines, to demonstrate the effectiveness of the proposed approach.
    Abstract Bayesian personalized federated learning (BPFL) addresses challenges in existing personalized FL (PFL). BPFL aims to quantify the uncertainty and heterogeneity within and across clients towards uncertainty representations by addressing the statistical heterogeneity of client data. In PFL, some recent preliminary work proposes to decompose hidden neural representations into shared and local components and demonstrates interesting results. However, most of them do not address client uncertainty and heterogeneity in FL systems, while appropriately decoupling neural representations is challenging and often ad hoc. In this paper, we make the first attempt to introduce a general BPFL framework to decompose and jointly learn shared and personalized uncertainty representations on statistically heterogeneous client data over time. A Bayesian federated neural network BPFed instantiates BPFL by jointly learning cross-client shared uncertainty and client-specific personalized uncertainty over statistically heterogeneous and randomly participating clients. We further involve continual updating of prior distribution in BPFed to speed up the convergence and avoid catastrophic forgetting. Theoretical analysis and guarantees are provided in addition to the experimental evaluation of BPFed against the diversified baselines.

Explainable machine learning-based prediction model for diabetic nephropathy

  • paper_url: http://arxiv.org/abs/2309.16730
  • repo_url: None
  • paper_authors: Jing-Mei Yin, Yang Li, Jun-Tang Xue, Guo-Wei Zong, Zhong-Ze Fang, Lang Zou
  • For: The paper aims to analyze the effect of serum metabolites on diabetic nephropathy (DN) and predict the prevalence of DN through a machine learning approach.
  • Methods: The dataset consists of 548 patients from April 2018 to April 2019 in Second Affiliated Hospital of Dalian Medical University (SAHDMU). The optimal 38 features were selected through a Least absolute shrinkage and selection operator (LASSO) regression model and a 10-fold cross-validation. Four machine learning algorithms, including eXtreme Gradient Boosting (XGB), random forest, decision tree, and logistic regression, were compared using AUC-ROC curves, decision curves, and calibration curves. The Shapley Additive exPlanations (SHAP) method was used to quantify feature importance and interaction effects in the optimal predictive model.
  • Results: The XGB model had the best performance to screen for DN with the highest AUC value of 0.966. The XGB model also gained more clinical net benefits than others and had a better fitting degree. Significant interactions between serum metabolites and duration of diabetes were found. A predictive model was developed using the XGB algorithm to screen for DN, with C2, C5DC, Tyr, Ser, Met, C24, C4DC, and Cys being the most contributing factors. These factors could potentially serve as biomarkers for DN.
    Abstract The aim of this study is to analyze the effect of serum metabolites on diabetic nephropathy (DN) and predict the prevalence of DN through a machine learning approach. The dataset consists of 548 patients from April 2018 to April 2019 in Second Affiliated Hospital of Dalian Medical University (SAHDMU). We select the optimal 38 features through a Least absolute shrinkage and selection operator (LASSO) regression model and a 10-fold cross-validation. We compare four machine learning algorithms, including eXtreme Gradient Boosting (XGB), random forest, decision tree and logistic regression, by AUC-ROC curves, decision curves, calibration curves. We quantify feature importance and interaction effects in the optimal predictive model by Shapley Additive exPlanations (SHAP) method. The XGB model has the best performance to screen for DN with the highest AUC value of 0.966. The XGB model also gains more clinical net benefits than others and the fitting degree is better. In addition, there are significant interactions between serum metabolites and duration of diabetes. We develop a predictive model by XGB algorithm to screen for DN. C2, C5DC, Tyr, Ser, Met, C24, C4DC, and Cys have great contribution in the model, and can possibly be biomarkers for DN.
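
A minimal sketch of the modeling pipeline the abstract describes: an L1-penalized feature-selection step standing in for the LASSO regression, an XGBoost classifier, and SHAP values for feature importance. Synthetic data takes the place of the clinical metabolite dataset, and all hyperparameters are illustrative assumptions rather than the study's settings.

```python
import numpy as np
import xgboost as xgb
import shap
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(548, 60))                       # stand-in for serum metabolite features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=548) > 0).astype(int)

# L1-penalized logistic selection as a simple stand-in for the LASSO feature-selection step
selector = SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear", C=0.1)).fit(X, y)
X_sel = selector.transform(X)

X_tr, X_te, y_tr, y_te = train_test_split(X_sel, y, test_size=0.2, random_state=0)
model = xgb.XGBClassifier(n_estimators=200, max_depth=3).fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))

explainer = shap.TreeExplainer(model)                # per-feature contribution to each prediction
shap_values = explainer.shap_values(X_te)
print("mean |SHAP| per selected feature:", np.abs(shap_values).mean(axis=0))
```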

Fast Locality Sensitive Hashing with Theoretical Guarantee

  • paper_url: http://arxiv.org/abs/2309.15479
  • repo_url: None
  • paper_authors: Zongyuan Tan, Hongya Wang, Bo Xu, Minjie Luo, Ming Du
  • for: nearest neighbor search task
  • methods: random sampling and random projection
  • results: up to 80x speedup in hash function evaluation, on par with state-of-the-arts in terms of answer quality, space occupation, and query efficiency
    Abstract Locality-sensitive hashing (LSH) is an effective randomized technique widely used in many machine learning tasks. The cost of hashing is proportional to data dimensions, and thus often the performance bottleneck when dimensionality is high and the number of hash functions involved is large. Surprisingly, however, little work has been done to improve the efficiency of LSH computation. In this paper, we design a simple yet efficient LSH scheme, named FastLSH, under l2 norm. By combining random sampling and random projection, FastLSH reduces the time complexity from O(n) to O(m) (m < n), where n is the data dimensionality and m is the number of sampled dimensions. Moreover, FastLSH has a provable LSH property, which distinguishes it from non-LSH fast sketches. We conduct extensive experiments on a number of real and synthetic datasets; the results show that FastLSH is on par with the state-of-the-arts in terms of answer quality, space occupation and query efficiency, while enjoying up to 80x speedup in hash function evaluation. We believe that FastLSH is a promising alternative to the classic LSH scheme.
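
A minimal numpy sketch of the random-sampling-plus-random-projection idea: an E2LSH-style hash evaluated only on a randomly sampled subset of m coordinates, so each hash evaluation costs O(m) instead of O(n). This is a generic illustration, not the authors' FastLSH code; m, w, and the number of hash functions are illustrative assumptions.

```python
import numpy as np

class SampledProjectionLSH:
    def __init__(self, dim, m=32, n_hashes=8, w=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.cols = rng.choice(dim, size=m, replace=False)   # random sampling: keep m of the n dimensions
        self.A = rng.normal(size=(m, n_hashes))              # random projections on the sampled coordinates
        self.b = rng.uniform(0.0, w, size=n_hashes)
        self.w = w

    def hash(self, x):
        # each evaluation touches only the m sampled coordinates, i.e. O(m) rather than O(n)
        return np.floor((x[self.cols] @ self.A + self.b) / self.w).astype(int)

lsh = SampledProjectionLSH(dim=1024)
print(lsh.hash(np.random.randn(1024)))                       # integer bucket codes, one per hash function
```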

DTC: Deep Tracking Control – A Unifying Approach to Model-Based Planning and Reinforcement-Learning for Versatile and Robust Locomotion

  • paper_url: http://arxiv.org/abs/2309.15462
  • repo_url: None
  • paper_authors: Fabian Jenelten, Junzhe He, Farbod Farshidian, Marco Hutter
  • For: The study aims to develop an accurate and robust control method for legged robots that provides reliable locomotion in the real world.
  • Methods: The study combines model-based trajectory optimization with data-driven reinforcement learning, rolling out reference motions with a model-based planner and training a deep neural network policy in simulation to track the optimized footholds.
  • Results: The resulting controller achieves higher foot-placement accuracy and robustness on sparse footholds and remains stable across trajectory optimization methods not seen during training.
    Abstract Legged locomotion is a complex control problem that requires both accuracy and robustness to cope with real-world challenges. Legged systems have traditionally been controlled using trajectory optimization with inverse dynamics. Such hierarchical model-based methods are appealing due to intuitive cost function tuning, accurate planning, and most importantly, the insightful understanding gained from more than one decade of extensive research. However, model mismatch and violation of assumptions are common sources of faulty operation and may hinder successful sim-to-real transfer. Simulation-based reinforcement learning, on the other hand, results in locomotion policies with unprecedented robustness and recovery skills. Yet, all learning algorithms struggle with sparse rewards emerging from environments where valid footholds are rare, such as gaps or stepping stones. In this work, we propose a hybrid control architecture that combines the advantages of both worlds to simultaneously achieve greater robustness, foot-placement accuracy, and terrain generalization. Our approach utilizes a model-based planner to roll out a reference motion during training. A deep neural network policy is trained in simulation, aiming to track the optimized footholds. We evaluate the accuracy of our locomotion pipeline on sparse terrains, where pure data-driven methods are prone to fail. Furthermore, we demonstrate superior robustness in the presence of slippery or deformable ground when compared to model-based counterparts. Finally, we show that our proposed tracking controller generalizes across different trajectory optimization methods not seen during training. In conclusion, our work unites the predictive capabilities and optimality guarantees of online planning with the inherent robustness attributed to offline learning.

Deep Learning in Deterministic Computational Mechanics

  • paper_url: http://arxiv.org/abs/2309.15421
  • repo_url: None
  • paper_authors: Leon Herrmann, Stefan Kollmannsberger
  • for: help researchers in computational mechanics gain an overview of deep learning techniques so they can explore the field more effectively.
  • methods: the review covers five main categories of deep learning methods: simulation substitution, simulation enhancement, discretizations as neural networks, generative approaches, and deep reinforcement learning.
  • results: the review focuses on deep learning methods rather than applications to computational mechanics, making it accessible to researchers entering the field.
    Abstract The rapid growth of deep learning research, including within the field of computational mechanics, has resulted in an extensive and diverse body of literature. To help researchers identify key concepts and promising methodologies within this field, we provide an overview of deep learning in deterministic computational mechanics. Five main categories are identified and explored: simulation substitution, simulation enhancement, discretizations as neural networks, generative approaches, and deep reinforcement learning. This review focuses on deep learning methods rather than applications for computational mechanics, thereby enabling researchers to explore this field more effectively. As such, the review is not necessarily aimed at researchers with extensive knowledge of deep learning -- instead, the primary audience is researchers at the verge of entering this field or those who attempt to gain an overview of deep learning in computational mechanics. The discussed concepts are, therefore, explained as simple as possible.

Automatic Feature Fairness in Recommendation via Adversaries

  • paper_url: http://arxiv.org/abs/2309.15418
  • repo_url: https://github.com/holdenhu/advfm
  • paper_authors: Hengchang Hu, Yiming Cao, Zhankui He, Samson Tan, Min-Yen Kan
  • for: The paper aims to achieve equitable treatment across diverse groups defined by various feature combinations in recommender systems, by proposing feature fairness as the foundation for practical implementation.
  • methods: The paper introduces unbiased feature learning through adversarial training, using adversarial perturbation to enhance feature representation. The authors adapt adversaries automatically based on two forms of feature biases: frequency and combination variety of feature values.
  • results: The paper shows that the proposed method, AAFM, surpasses strong baselines in both fairness and accuracy measures. AAFM excels in providing item- and user-fairness for single- and multi-feature tasks, showcasing its versatility and scalability. However, the authors find that adversarial perturbation must be well-managed during training to maintain good accuracy.
    Abstract Fairness is a widely discussed topic in recommender systems, but its practical implementation faces challenges in defining sensitive features while maintaining recommendation accuracy. We propose feature fairness as the foundation to achieve equitable treatment across diverse groups defined by various feature combinations. This improves overall accuracy through balanced feature generalizability. We introduce unbiased feature learning through adversarial training, using adversarial perturbation to enhance feature representation. The adversaries improve model generalization for under-represented features. We adapt adversaries automatically based on two forms of feature biases: frequency and combination variety of feature values. This allows us to dynamically adjust perturbation strengths and adversarial training weights. Stronger perturbations are applied to feature values with fewer combination varieties to improve generalization, while higher weights for low-frequency features address training imbalances. We leverage the Adaptive Adversarial perturbation based on the widely-applied Factorization Machine (AAFM) as our backbone model. In experiments, AAFM surpasses strong baselines in both fairness and accuracy measures. AAFM excels in providing item- and user-fairness for single- and multi-feature tasks, showcasing their versatility and scalability. To maintain good accuracy, we find that adversarial perturbation must be well-managed: during training, perturbations should not overly persist and their strengths should decay.

Revolutionizing Terrain-Precipitation Understanding through AI-driven Knowledge Discovery

  • paper_url: http://arxiv.org/abs/2309.15400
  • repo_url: None
  • paper_authors: Hao Xu, Yuntian Chen, Zhenzhong Zeng, Nina Li, Jian Li, Dongxiao Zhang
  • for: advance the understanding of climate processes in regions with complex terrain, particularly in the context of global climate change.
  • methods: use AI-driven knowledge discovery techniques to uncover explicit equations relating terrain features to precipitation patterns, revealing previously hidden climate dynamics.
  • results: identify a '1995 turning point', a significant shift in the terrain-precipitation relationship around 1995 linked to the forces of climate change; the equations also have practical value for fine-scale downscaled precipitation predictions from low-resolution future climate data.
    Abstract Advancing our understanding of climate processes in regions characterized by intricate terrain complexity is a paramount challenge in contemporary climate science, particularly in the context of global climate change. Notably, the scarcity of observational data in these regions has imposed substantial limitations on understanding the nuanced climate dynamics therein. For the first time, utilizing cutting-edge AI-driven knowledge discovery techniques, we have uncovered explicit equations that elucidate the intricate relationship between terrain features and precipitation patterns, illuminating the previously concealed complexities governing these relationships. These equations, thus far undisclosed, exhibit remarkable accuracy compared to conventional empirical models when applied to precipitation data. Building on this foundation, we reveal a phenomenon known as the '1995 turning point,' indicating a significant shift in the terrain-precipitation relationship in approximately 1995, related to the forces of climate change. These equations have practical applications, particularly in achieving fine-scale downscaling precipitation predictions from low-resolution future climate data. This capability provides invaluable insights into the expected changes in precipitation patterns across diverse terrains under future climate scenarios.

Model-Free, Regret-Optimal Best Policy Identification in Online CMDPs

  • paper_url: http://arxiv.org/abs/2309.15395
  • repo_url: None
  • paper_authors: Zihan Zhou, Honghao Wei, Lei Ying
  • for: consider the best policy identification (BPI) problem in online Constrained Markov Decision Processes (CMDPs).
  • methods: propose a new algorithm, Pruning-Refinement-Identification (PRI), built on a fundamental structural property of CMDPs proved in Koole (1988) and Ross (1989) (limited stochasticity), to identify near-optimal policies.
  • results: PRI achieves three objectives in online CMDPs: (i) it is model-free; (ii) it outputs a near-optimal policy with high probability at the end of learning; (iii) in the tabular setting it guarantees $\tilde{\mathcal{O}}(\sqrt{K})$ regret and constraint violation, significantly improving on the best existing model-free regret bound of $\tilde{\mathcal{O}}(K^{\frac{4}{5}})$, where $K$ is the total number of episodes.
    Abstract This paper considers the best policy identification (BPI) problem in online Constrained Markov Decision Processes (CMDPs). We are interested in algorithms that are model-free, have low regret, and identify an optimal policy with a high probability. Existing model-free algorithms for online CMDPs with sublinear regret and constraint violation do not provide any convergence guarantee to an optimal policy and provide only average performance guarantees when a policy is uniformly sampled at random from all previously used policies. In this paper, we develop a new algorithm, named Pruning-Refinement-Identification (PRI), based on a fundamental structural property of CMDPs proved in Koole (1988) and Ross (1989), which we call limited stochasticity. The property says that for a CMDP with $N$ constraints, there exists an optimal policy with at most $N$ stochastic decisions. The proposed algorithm first identifies at which step and in which state a stochastic decision has to be taken and then fine-tunes the distributions of these stochastic decisions. PRI achieves three objectives: (i) PRI is a model-free algorithm; (ii) it outputs a near-optimal policy with a high probability at the end of learning; and (iii) in the tabular setting, PRI guarantees $\tilde{\mathcal{O}}(\sqrt{K})$ regret and constraint violation, which significantly improves the best existing regret bound $\tilde{\mathcal{O}}(K^{\frac{4}{5}})$ under a model-free algorithm, where $K$ is the total number of episodes.

ADGym: Design Choices for Deep Anomaly Detection

  • paper_url: http://arxiv.org/abs/2309.15376
  • repo_url: https://github.com/minqi824/adgym
  • paper_authors: Minqi Jiang, Chaochuan Hou, Ao Zheng, Songqiao Han, Hailiang Huang, Qingsong Wen, Xiyang Hu, Yue Zhao
  • for: examine design choices in deep anomaly detection (AD) and pose two key questions: (1) how much do individual design choices in deep AD methods affect anomaly detection? (2) how can the optimal design choices for a given AD dataset be selected automatically?
  • methods: introduce ADGym, a platform for comprehensive evaluation and automatic selection of design elements in deep anomaly detection methods.
  • results: extensive experiments show that relying solely on existing leading methods is not sufficient, whereas models developed with ADGym significantly surpass the current state of the art.
    Abstract Deep learning (DL) techniques have recently found success in anomaly detection (AD) across various fields such as finance, medical services, and cloud computing. However, most of the current research tends to view deep AD algorithms as a whole, without dissecting the contributions of individual design choices like loss functions and network architectures. This view tends to diminish the value of preliminary steps like data preprocessing, as more attention is given to newly designed loss functions, network architectures, and learning paradigms. In this paper, we aim to bridge this gap by asking two key questions: (i) Which design choices in deep AD methods are crucial for detecting anomalies? (ii) How can we automatically select the optimal design choices for a given AD dataset, instead of relying on generic, pre-existing solutions? To address these questions, we introduce ADGym, a platform specifically crafted for comprehensive evaluation and automatic selection of AD design elements in deep methods. Our extensive experiments reveal that relying solely on existing leading methods is not sufficient. In contrast, models developed using ADGym significantly surpass current state-of-the-art techniques.

PPG to ECG Signal Translation for Continuous Atrial Fibrillation Detection via Attention-based Deep State-Space Modeling

  • paper_url: http://arxiv.org/abs/2309.15375
  • repo_url: None
  • paper_authors: Khuong Vo, Mostafa El-Khamy, Yoojin Choi
  • for: propose a subject-independent attention-based deep state-space model that translates PPG signals into ECG waveforms, enabling vital heart signs to be captured in daily life without clinical measurement.
  • methods: build on non-invasive, low-cost optical PPG measurements and an attention-based deep state-space model, made data-efficient by incorporating prior knowledge in the form of probabilistic graphical models.
  • results: the approach is highly data-efficient, produces ECG waveforms that correspond well to the reference signals, and enables detection of atrial fibrillation (AFib) by complementing ECG's accuracy with continuous PPG monitoring.
    Abstract An electrocardiogram (ECG or EKG) is a medical test that measures the heart's electrical activity. ECGs are often used to diagnose and monitor a wide range of heart conditions, including arrhythmias, heart attacks, and heart failure. On the one hand, the conventional ECG requires clinical measurement, which restricts its deployment to medical facilities. On the other hand, single-lead ECG has become popular on wearable devices using administered procedures. An alternative to ECG is Photoplethysmography (PPG), which uses non-invasive, low-cost optical methods to measure cardiac physiology, making it a suitable option for capturing vital heart signs in daily life. As a result, it has become increasingly popular in health monitoring and is used in various clinical and commercial wearable devices. While ECG and PPG correlate strongly, the latter does not offer significant clinical diagnostic value. Here, we propose a subject-independent attention-based deep state-space model to translate PPG signals to corresponding ECG waveforms. The model is highly data-efficient by incorporating prior knowledge in terms of probabilistic graphical models. Notably, the model enables the detection of atrial fibrillation (AFib), the most common heart rhythm disorder in adults, by complementing ECG's accuracy with continuous PPG monitoring. We evaluated the model on 55 subjects from the MIMIC III database. Quantitative and qualitative experimental results demonstrate the effectiveness and efficiency of our approach.
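
    As a rough illustration of the PPG-to-ECG translation setup, the sketch below maps a PPG window to an ECG window with a generic attention-based encoder. It is not the authors' deep state-space model (and encodes no probabilistic graphical prior); it only shows the signal-to-signal translation framing, with illustrative window sizes.

```python
# Generic attention-based signal translator: PPG window in, ECG window out.
# A simplified stand-in for the paper's attention-based deep state-space model.
import torch
import torch.nn as nn

class PPG2ECG(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_layers=2, max_len=512):
        super().__init__()
        self.embed = nn.Conv1d(1, d_model, kernel_size=7, padding=3)
        self.pos = nn.Parameter(torch.randn(1, max_len, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, ppg):                               # ppg: (batch, length)
        x = self.embed(ppg.unsqueeze(1)).transpose(1, 2)  # (batch, length, d_model)
        x = x + self.pos[:, : x.size(1)]                  # learned positional encoding
        h = self.encoder(x)                               # self-attention over the window
        return self.head(h).squeeze(-1)                   # (batch, length) predicted ECG

# Usage: train with an MSE loss between predicted and reference ECG windows.
model = PPG2ECG()
ppg = torch.randn(8, 256)       # eight 256-sample PPG windows (synthetic stand-in)
ecg_hat = model(ppg)            # same shape as the target ECG windows
```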

Density Estimation via Measure Transport: Outlook for Applications in the Biological Sciences

  • paper_url: http://arxiv.org/abs/2309.15366
  • repo_url: None
  • paper_authors: Vanessa Lopez-Marrero, Patrick R. Johnstone, Gilchan Park, Xihaier Luo
  • for: This work supports research in the biological sciences, particularly radiation biology, where data is often scarce.
  • methods: It uses measure transport techniques, specifically triangular transport maps, to process and analyze data distributed according to a wide class of probability measures.
  • results: When data is scarce, sparse transport maps are advantageous: statistics gathered from series of sparse adaptive transport maps, trained on randomly chosen subsets of the available samples, uncover information hidden in the data, such as hypotheses about gene relationships and their dynamics under radiation exposure.
    Abstract One among several advantages of measure transport methods is that they allow for a unified framework for processing and analysis of data distributed according to a wide class of probability measures. Within this context, we present results from computational studies aimed at assessing the potential of measure transport techniques, specifically, the use of triangular transport maps, as part of a workflow intended to support research in the biological sciences. Scarce data scenarios, which are common in domains such as radiation biology, are of particular interest. We find that when data is scarce, sparse transport maps are advantageous. In particular, statistics gathered from computing series of (sparse) adaptive transport maps, trained on a series of randomly chosen subsets of the set of available data samples, leads to uncovering information hidden in the data. As a result, in the radiation biology application considered here, this approach provides a tool for generating hypotheses about gene relationships and their dynamics under radiation exposure.
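
    The sketch below illustrates the transport-map training objective on a toy 2-D example: a lower-triangular, monotone (here affine) map is fit by maximum likelihood so that it pushes the data to a standard normal reference. The paper's adaptive sparse maps are richer; this is only a minimal, assumed parameterization of the same idea.

```python
# Minimal 2-D triangular transport map: z1 = T1(x1), z2 = T2(x1, x2), with each
# component monotone in its last argument. Fit by maximizing the change-of-
# variables log-likelihood under a standard normal reference.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class TriangularMap2D(nn.Module):
    def __init__(self):
        super().__init__()
        self.a1 = nn.Parameter(torch.zeros(1))   # softplus(a1) > 0 keeps T1 monotone
        self.b1 = nn.Parameter(torch.zeros(1))
        self.cond = nn.Linear(1, 2)              # (log-scale, shift) of x2 given x1

    def forward(self, x):                        # x: (n, 2)
        x1, x2 = x[:, :1], x[:, 1:]
        s1 = F.softplus(self.a1)
        z1 = s1 * x1 + self.b1
        log_s2, b2 = self.cond(x1).chunk(2, dim=1)
        z2 = torch.exp(log_s2) * x2 + b2         # exp(log_s2) > 0 -> monotone in x2
        logdet = (torch.log(s1) + log_s2).sum(dim=1)   # log|det J| per sample
        return torch.cat([z1, z2], dim=1), logdet

def neg_log_likelihood(tmap, x):
    """-log p(x) under the pullback of N(0, I) through the triangular map."""
    z, logdet = tmap(x)
    log_ref = -0.5 * (z ** 2).sum(dim=1) - 0.5 * z.size(1) * math.log(2 * math.pi)
    return -(log_ref + logdet).mean()

# Fit on a small (scarce) sample; here synthetic correlated data stands in.
x = torch.randn(200, 2) @ torch.tensor([[1.0, 0.0], [0.8, 0.5]])
tmap = TriangularMap2D()
opt = torch.optim.Adam(tmap.parameters(), lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    neg_log_likelihood(tmap, x).backward()
    opt.step()
```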

Exploring Learned Representations of Neural Networks with Principal Component Analysis

  • paper_url: http://arxiv.org/abs/2309.15328
  • repo_url: None
  • paper_authors: Amit Harlev, Andrew Engel, Panos Stinis, Tony Chiang
  • for: This work studies feature representations in deep neural networks (DNNs) to improve understanding within the field of explainable AI.
  • methods: It applies principal component analysis (PCA) to the layer-wise representations of a ResNet-18 trained on CIFAR-10 and evaluates them with k-nearest neighbors (k-NN), nearest class-centers (NCC), and support vector machine classifiers.
  • results: In certain layers, as little as 20% of the intermediate feature-space variance suffices for high-accuracy classification, and across all layers the first ~100 principal components completely determine the performance of the k-NN and NCC classifiers. The findings are related to neural collapse and provide partial evidence for intermediate neural collapse; the study also yields three distinct yet interpretable surrogate models for feature representation, with an affine linear model performing best, and shows that combining several surrogate models gives a way to estimate where neural collapse may first occur within the DNN.
    Abstract Understanding feature representation for deep neural networks (DNNs) remains an open question within the general field of explainable AI. We use principal component analysis (PCA) to study the performance of a k-nearest neighbors classifier (k-NN), nearest class-centers classifier (NCC), and support vector machines on the learned layer-wise representations of a ResNet-18 trained on CIFAR-10. We show that in certain layers, as little as 20% of the intermediate feature-space variance is necessary for high-accuracy classification and that across all layers, the first ~100 PCs completely determine the performance of the k-NN and NCC classifiers. We relate our findings to neural collapse and provide partial evidence for the related phenomenon of intermediate neural collapse. Our preliminary work provides three distinct yet interpretable surrogate models for feature representation with an affine linear model the best performing. We also show that leveraging several surrogate models affords us a clever method to estimate where neural collapse may initially occur within the DNN.
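
    A hedged sketch of the probing pipeline described above: project a layer's features onto their leading principal components, then measure k-NN and nearest-class-center accuracy on the projection. The synthetic `feats` and `y` below stand in for flattened ResNet-18 activations and CIFAR-10 labels.

```python
# Probe a layer's representation: PCA projection followed by k-NN and NCC classifiers.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid
from sklearn.model_selection import train_test_split

def probe_layer(feats, y, n_components=100, seed=0):
    X_tr, X_te, y_tr, y_te = train_test_split(feats, y, test_size=0.2,
                                              random_state=seed)
    pca = PCA(n_components=n_components).fit(X_tr)
    Z_tr, Z_te = pca.transform(X_tr), pca.transform(X_te)

    knn = KNeighborsClassifier(n_neighbors=10).fit(Z_tr, y_tr)
    ncc = NearestCentroid().fit(Z_tr, y_tr)
    return {
        "explained_variance": pca.explained_variance_ratio_.sum(),
        "knn_acc": knn.score(Z_te, y_te),
        "ncc_acc": ncc.score(Z_te, y_te),
    }

# Example with synthetic stand-in features for one intermediate layer.
rng = np.random.default_rng(0)
feats = rng.normal(size=(2000, 512))     # stand-in for flattened layer activations
y = rng.integers(0, 10, size=2000)       # stand-in for CIFAR-10 labels
print(probe_layer(feats, y))
```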

Neural Operators for Accelerating Scientific Simulations and Design

  • paper_url: http://arxiv.org/abs/2309.15325
  • repo_url: None
  • paper_authors: Kamyar Azizzadenesheli, Nikola Kovachki, Zongyi Li, Miguel Liu-Schiaffini, Jean Kossaifi, Anima Anandkumar
  • for: To provide fast, data-driven surrogates for physical experiments and costly numerical simulations, improving the speed and cost of scientific discovery and engineering design.
  • methods: Uses AI models, specifically neural operators, to learn mappings between functions defined on continuous domains, allowing solutions to be extrapolated and predicted at new locations unseen during training (zero-shot super-resolution).
  • results: Neural operators can augment or replace existing simulators in areas such as computational fluid dynamics, weather forecasting, and material modeling while being 4-5 orders of magnitude faster, and can be combined with physics and other domain constraints to obtain high-fidelity solutions and good generalization.
    Abstract Scientific discovery and engineering design are currently limited by the time and cost of physical experiments, selected mostly through trial-and-error and intuition that require deep domain expertise. Numerical simulations present an alternative to physical experiments but are usually infeasible for complex real-world domains due to the computational requirements of existing numerical methods. Artificial intelligence (AI) presents a potential paradigm shift by developing fast data-driven surrogate models. In particular, an AI framework, known as neural operators, presents a principled framework for learning mappings between functions defined on continuous domains, e.g., spatiotemporal processes and partial differential equations (PDE). They can extrapolate and predict solutions at new locations unseen during training, i.e., perform zero-shot super-resolution. Neural operators can augment or even replace existing simulators in many applications, such as computational fluid dynamics, weather forecasting, and material modeling, while being 4-5 orders of magnitude faster. Further, neural operators can be integrated with physics and other domain constraints enforced at finer resolutions to obtain high-fidelity solutions and good generalization. Since neural operators are differentiable, they can directly optimize parameters for inverse design and other inverse problems. We believe that neural operators present a transformative approach to simulation and design, enabling rapid research and development.
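
    To make the idea concrete, below is a minimal sketch of a 1-D Fourier layer, the spectral building block used in Fourier neural operators: a learned linear map acts on the lowest Fourier modes and is combined with a pointwise linear path. It is an illustration of the mechanism, not a production implementation.

```python
# Minimal 1-D Fourier layer: FFT, learned linear map on the lowest modes,
# inverse FFT, plus a pointwise (kernel-size-1) convolution path.
import torch
import torch.nn as nn

class SpectralConv1d(nn.Module):
    def __init__(self, channels, modes):
        super().__init__()
        self.modes = modes
        scale = 1.0 / (channels * channels)
        self.weight = nn.Parameter(
            scale * torch.randn(channels, channels, modes, dtype=torch.cfloat))

    def forward(self, x):                       # x: (batch, channels, n_points)
        x_ft = torch.fft.rfft(x)                # (batch, channels, n_points//2 + 1)
        out_ft = torch.zeros_like(x_ft)
        out_ft[:, :, :self.modes] = torch.einsum(
            "bim,iom->bom", x_ft[:, :, :self.modes], self.weight)
        return torch.fft.irfft(out_ft, n=x.size(-1))

class FourierLayer1d(nn.Module):
    def __init__(self, channels=32, modes=16):
        super().__init__()
        self.spectral = SpectralConv1d(channels, modes)
        self.pointwise = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x):
        return torch.relu(self.spectral(x) + self.pointwise(x))

# A function sampled on 128 grid points and lifted to 32 channels passes through
# the layer; because the learned weights act on Fourier modes, the same layer can
# be evaluated on finer grids at inference time (zero-shot super-resolution).
x = torch.randn(4, 32, 128)
y = FourierLayer1d()(x)                          # (4, 32, 128)
```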

On the Power of SVD in the Stochastic Block Model

  • paper_url: http://arxiv.org/abs/2309.15322
  • repo_url: None
  • paper_authors: Xinyu Mao, Jiapeng Zhang
  • for: This work aims to explain the behavior of spectral methods in clustering problems by studying the power of the vanilla-SVD algorithm in the stochastic block model (SBM).
  • methods: It analyzes the vanilla-SVD algorithm for clustering and, in the symmetric setting, proves that the algorithm recovers all clusters correctly.
  • results: In the symmetric setting, vanilla-SVD correctly recovers all clusters, answering an open question posed by Van Vu (Combinatorics, Probability and Computing, 2018).
    Abstract A popular heuristic method for improving clustering results is to apply dimensionality reduction before running clustering algorithms. It has been observed that spectral-based dimensionality reduction tools, such as PCA or SVD, improve the performance of clustering algorithms in many applications. This phenomenon indicates that the spectral method not only serves as a dimensionality reduction tool, but also contributes to the clustering procedure in some sense. It is an interesting question to understand the behavior of spectral steps in clustering problems. As an initial step in this direction, this paper studies the power of the vanilla-SVD algorithm in the stochastic block model (SBM). We show that, in the symmetric setting, the vanilla-SVD algorithm recovers all clusters correctly. This result answers an open question posed by Van Vu (Combinatorics, Probability and Computing, 2018) in the symmetric setting.
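
    A small sketch of the vanilla-SVD procedure analyzed in the paper: sample a symmetric SBM graph, take the rank-k SVD of its adjacency matrix, and cluster the rows of the rank-k projection with k-means. Parameters such as `p_in` and `p_out` are illustrative.

```python
# Vanilla-SVD clustering in a symmetric stochastic block model (illustrative).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def sample_symmetric_sbm(n_per_cluster, k, p_in, p_out, seed=0):
    """Adjacency matrix of an SBM with k equal-size clusters and its true labels."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(k), n_per_cluster)
    probs = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    upper = np.triu(rng.random((labels.size, labels.size)) < probs, 1).astype(float)
    return upper + upper.T, labels

def vanilla_svd_clustering(adj, k):
    """Rank-k SVD of the adjacency matrix, then k-means on the projected rows."""
    u, s, vt = np.linalg.svd(adj)
    projected = u[:, :k] * s[:k]            # rows of the best rank-k approximation
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(projected)

adj, truth = sample_symmetric_sbm(n_per_cluster=100, k=3, p_in=0.5, p_out=0.05)
pred = vanilla_svd_clustering(adj, k=3)
print("adjusted Rand index:", adjusted_rand_score(truth, pred))
```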