cs.AI - 2023-09-28

Sourcing Investment Targets for Venture and Growth Capital Using Multivariate Time Series Transformer

  • paper_url: http://arxiv.org/abs/2309.16888
  • repo_url: None
  • paper_authors: Lele Cao, Gustaf Halvardsson, Andrew McCornack, Vilhelm von Ehrenheim, Pawel Herman
  • For: This study examines data-driven approaches in the Private Equity (PE) industry, in particular the sourcing of Venture Capital (VC) and Growth Capital (GC) investment targets.
  • Methods: We propose a Transformer-based Multivariate Time Series Classifier (TMTSC) to predict the success likelihood of candidate companies, and describe the key components of our implementation: input features, model architecture, optimization target, and investor-centric data augmentation and split.
  • Results: Experiments on four datasets show significant improvements over three popular baselines.
    Abstract This paper addresses the growing application of data-driven approaches within the Private Equity (PE) industry, particularly in sourcing investment targets (i.e., companies) for Venture Capital (VC) and Growth Capital (GC). We present a comprehensive review of the relevant approaches and propose a novel approach leveraging a Transformer-based Multivariate Time Series Classifier (TMTSC) for predicting the success likelihood of any candidate company. The objective of our research is to optimize sourcing performance for VC and GC investments by formally defining the sourcing problem as a multivariate time series classification task. We consecutively introduce the key components of our implementation which collectively contribute to the successful application of TMTSC in VC/GC sourcing: input features, model architecture, optimization target, and investor-centric data augmentation and split. Our extensive experiments on four datasets, benchmarked towards three popular baselines, demonstrate the effectiveness of our approach in improving decision making within the VC and GC industry.
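The abstract frames sourcing as multivariate time series classification with a Transformer. As a rough illustration only (the paper's exact TMTSC architecture, feature set, and output target are not specified here), the sketch below shows a generic Transformer-based multivariate time series classifier in PyTorch; the feature count, sequence length, and binary output are assumptions.

```python
# Minimal sketch of a Transformer-based multivariate time series classifier
# (illustrative only; not the paper's exact TMTSC architecture).
import torch
import torch.nn as nn

class TimeSeriesTransformerClassifier(nn.Module):
    def __init__(self, n_features=16, d_model=64, n_heads=4, n_layers=2, n_classes=2):
        super().__init__()
        self.input_proj = nn.Linear(n_features, d_model)   # project raw features
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)           # success / no-success logits

    def forward(self, x):                # x: (batch, time, n_features)
        h = self.encoder(self.input_proj(x))
        return self.head(h.mean(dim=1))  # pool over time, then classify

model = TimeSeriesTransformerClassifier()
monthly_metrics = torch.randn(8, 24, 16)   # e.g. 24 months of 16 company metrics
logits = model(monthly_metrics)            # (8, 2) success-likelihood logits
print(logits.shape)
```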

Investigating Human-Identifiable Features Hidden in Adversarial Perturbations

  • paper_url: http://arxiv.org/abs/2309.16878
  • repo_url: None
  • paper_authors: Dennis Y. Menn, Tzu-hsun Feng, Sriram Vishwanath, Hung-yi Lee
  • for: This study investigates why neural networks are vulnerable to adversarial perturbations, to better understand their potential weaknesses in real-world applications.
  • methods: The study applies multiple attack algorithms, covering both targeted and untargeted attacks, across three datasets. Pixel-level annotations are used to extract human-identifiable features from the perturbations, and these features are shown to compromise target models.
  • results: Both targeted and untargeted attacks cause models to make incorrect predictions, and perturbations produced by different attack algorithms show a notable degree of similarity when averaged over multiple models. The study also identifies two distinct effects within human-identifiable features: the masking effect is more common in untargeted attacks, while the generation effect is more common in targeted attacks.
    Abstract Neural networks perform exceedingly well across various machine learning tasks but are not immune to adversarial perturbations. This vulnerability has implications for real-world applications. While much research has been conducted, the underlying reasons why neural networks fall prey to adversarial attacks are not yet fully understood. Central to our study, which explores up to five attack algorithms across three datasets, is the identification of human-identifiable features in adversarial perturbations. Additionally, we uncover two distinct effects manifesting within human-identifiable features. Specifically, the masking effect is prominent in untargeted attacks, while the generation effect is more common in targeted attacks. Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models. In addition, our findings indicate a notable extent of similarity in perturbations across different attack algorithms when averaged over multiple models. This work also provides insights into phenomena associated with adversarial perturbations, such as transferability and model interpretability. Our study contributes to a deeper understanding of the underlying mechanisms behind adversarial attacks and offers insights for the development of more resilient defense strategies for neural networks.
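One observation in the abstract is that perturbations from different attacks look similar once averaged over multiple models. The sketch below illustrates only that averaging step, using one-step FGSM perturbations against several toy models; the models, data, label, and epsilon are placeholders, and the paper's pixel-level annotation procedure is not reproduced.

```python
# Sketch: average adversarial perturbations for one input across several models.
# Illustrative only; toy models and random data stand in for the paper's setup.
import torch
import torch.nn as nn

def fgsm_perturbation(model, x, y, eps=0.03):
    """One-step FGSM perturbation (sign of the input gradient)."""
    x = x.clone().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return eps * x.grad.sign()

torch.manual_seed(0)
models = [nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)) for _ in range(5)]
x = torch.rand(1, 3, 32, 32)          # placeholder image
y = torch.tensor([3])                 # placeholder label

# Averaging over models is what tends to surface shared, human-identifiable structure.
mean_perturbation = torch.stack(
    [fgsm_perturbation(m, x, y) for m in models]).mean(dim=0)
print(mean_perturbation.shape)        # (1, 3, 32, 32)
```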

Preface: A Data-driven Volumetric Prior for Few-shot Ultra High-resolution Face Synthesis

  • paper_url: http://arxiv.org/abs/2309.16859
  • repo_url: https://github.com/syntec-research/Preface
  • paper_authors: Marcel C. Bühler, Kripasindhu Sarkar, Tanmay Shah, Gengyan Li, Daoye Wang, Leonhard Helminger, Sergio Orts-Escolano, Dmitry Lagun, Otmar Hilliges, Thabo Beeler, Abhimitra Meka
  • for: Synthesizing high-resolution human faces with complex appearance and reflectance effects, including hair and skin.
  • methods: A novel volumetric human face prior based on an identity-conditioned NeRF; a simple sparse landmark-based 3D alignment lets the model learn a smooth latent space of volumetric geometry and appearance without requiring large numbers of multi-view input images.
  • results: A high-quality volumetric face representation of a novel subject can be obtained by fitting the model to 2 or 3 camera views of arbitrary resolution; as few as two casually captured views suffice as input at inference time.
    Abstract NeRFs have enabled highly realistic synthesis of human faces including complex appearance and reflectance effects of hair and skin. These methods typically require a large number of multi-view input images, making the process hardware intensive and cumbersome, limiting applicability to unconstrained settings. We propose a novel volumetric human face prior that enables the synthesis of ultra high-resolution novel views of subjects that are not part of the prior's training distribution. This prior model consists of an identity-conditioned NeRF, trained on a dataset of low-resolution multi-view images of diverse humans with known camera calibration. A simple sparse landmark-based 3D alignment of the training dataset allows our model to learn a smooth latent space of geometry and appearance despite a limited number of training identities. A high-quality volumetric representation of a novel subject can be obtained by model fitting to 2 or 3 camera views of arbitrary resolution. Importantly, our method requires as few as two views of casually captured images as input at inference time.

Multi-Bellman operator for convergence of $Q$-learning with linear function approximation

  • paper_url: http://arxiv.org/abs/2309.16819
  • repo_url: None
  • paper_authors: Diogo S. Carvalho, Pedro A. Santos, Francisco S. Melo
  • for: This work studies the convergence of $Q$-learning with linear function approximation.
  • methods: It introduces a novel multi-Bellman operator that extends the traditional Bellman operator. By studying the properties of this operator, conditions are identified under which the projected multi-Bellman operator becomes contractive, yielding stronger fixed-point guarantees; building on this, a multi $Q$-learning algorithm with linear function approximation is proposed.
  • results: Applying the algorithm to well-known environments demonstrates the effectiveness and applicability of the approach.
    Abstract We study the convergence of $Q$-learning with linear function approximation. Our key contribution is the introduction of a novel multi-Bellman operator that extends the traditional Bellman operator. By exploring the properties of this operator, we identify conditions under which the projected multi-Bellman operator becomes contractive, providing improved fixed-point guarantees compared to the Bellman operator. To leverage these insights, we propose the multi $Q$-learning algorithm with linear function approximation. We demonstrate that this algorithm converges to the fixed-point of the projected multi-Bellman operator, yielding solutions of arbitrary accuracy. Finally, we validate our approach by applying it to well-known environments, showcasing the effectiveness and applicability of our findings.
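The abstract does not spell out the multi-Bellman operator; one natural reading is that it composes the standard Bellman operator several times before projecting the result back onto the span of the linear features. The sketch below implements that reading on a toy MDP and is only a hedged illustration: the environment, feature map, and number of compositions m are assumptions, not the paper's construction.

```python
# Sketch: projected multi-Bellman iteration with linear function approximation
# on a toy 2-state, 2-action MDP (illustrative assumptions throughout).
import numpy as np

n_states, n_actions, gamma, m = 2, 2, 0.9, 3
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.random((n_states, n_actions))                              # rewards
Phi = rng.random((n_states * n_actions, 3))                        # linear features

def bellman(Q):
    """Standard optimality Bellman operator on an (S, A) value table."""
    return R + gamma * np.einsum("ijk,k->ij", P, Q.max(axis=1))

w = np.zeros(Phi.shape[1])
for _ in range(200):
    Q = (Phi @ w).reshape(n_states, n_actions)
    target = Q
    for _ in range(m):              # compose the Bellman operator m times
        target = bellman(target)
    # project the m-step target back onto the feature span (least squares)
    w, *_ = np.linalg.lstsq(Phi, target.ravel(), rcond=None)

print("fixed-point weights:", w)
```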

De-SaTE: Denoising Self-attention Transformer Encoders for Li-ion Battery Health Prognostics

  • paper_url: http://arxiv.org/abs/2310.00023
  • repo_url: None
  • paper_authors: Gaurav Shinde, Rohan Mohapatra, Pooja Krishan, Saptarshi Sengupta
  • For: The paper aims to accurately predict the Remaining Useful Life (RUL) of Lithium Ion (Li-ion) batteries, which is critical for proactive maintenance and predictive analytics.
  • Methods: The paper proposes a novel approach that combines multiple denoising modules, including a denoising auto-encoder and a wavelet denoiser, to generate encoded/decomposed representations of battery data. These representations are then processed through dedicated self-attention transformer encoders to estimate health indicators.
  • Results: The paper reports that the proposed approach can accurately estimate health indicators under a set of diverse noise patterns, with error metrics on par or better than the best reported in recent literature.
    Abstract Lithium Ion (Li-ion) batteries have gained widespread popularity across various industries, from powering portable electronic devices to propelling electric vehicles and supporting energy storage systems. A central challenge in managing Li-ion batteries effectively is accurately predicting their Remaining Useful Life (RUL), which is a critical measure for proactive maintenance and predictive analytics. This study presents a novel approach that harnesses the power of multiple denoising modules, each trained to address specific types of noise commonly encountered in battery data. Specifically we use a denoising auto-encoder and a wavelet denoiser to generate encoded/decomposed representations, which are subsequently processed through dedicated self-attention transformer encoders. After extensive experimentation on the NASA and CALCE datasets, we are able to characterize a broad spectrum of health indicator estimations under a set of diverse noise patterns. We find that our reported error metrics on these datasets are on par or better with the best reported in recent literature.
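The abstract describes a two-stage pipeline: denoising modules (a denoising auto-encoder and a wavelet denoiser) produce cleaned or decomposed views of the battery signal, which dedicated self-attention transformer encoders then turn into health-indicator estimates. The sketch below shows only the wavelet-denoising stage (using PyWavelets) followed by a small transformer encoder over two signal views; the wavelet family, thresholding rule, layer sizes, and fusion are assumptions rather than the De-SaTE configuration.

```python
# Sketch: wavelet denoising of a capacity signal + transformer encoding of two views.
# Illustrative only; the paper's De-SaTE sizes and fusion are not reproduced here.
import numpy as np
import pywt
import torch
import torch.nn as nn

def wavelet_denoise(signal, wavelet="db4", level=3):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    threshold = np.median(np.abs(coeffs[-1])) / 0.6745 * np.sqrt(2 * np.log(len(signal)))
    coeffs = [coeffs[0]] + [pywt.threshold(c, threshold, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(signal)]

capacity = np.linspace(1.0, 0.7, 256) + 0.02 * np.random.randn(256)  # noisy fade curve
denoised = wavelet_denoise(capacity)

# A denoising auto-encoder would supply the second view; here we simply stack both.
views = torch.tensor(np.stack([capacity, denoised]), dtype=torch.float32).T.unsqueeze(0)
encoder_layer = nn.TransformerEncoderLayer(d_model=2, nhead=1, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
health_indicator = nn.Linear(2, 1)(encoder(views).mean(dim=1))  # scalar HI estimate
print(health_indicator.shape)  # (1, 1)
```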

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

  • paper_url: http://arxiv.org/abs/2309.16797
  • repo_url: None
  • paper_authors: Chrisantha Fernando, Dylan Banarse, Henryk Michalewski, Simon Osindero, Tim Rocktäschel
  • for: This paper aims to improve the reasoning abilities of large language models (LLMs) in various domains by presenting a general-purpose self-referential self-improvement mechanism called Promptbreeder.
  • methods: Promptbreeder evolves and adapts prompts for a given domain by mutating a population of task-prompts, and subsequently evaluating them for fitness on a training set. The mutation of these task-prompts is governed by mutation-prompts that the LLM generates and improves throughout evolution in a self-referential way.
  • results: Promptbreeder outperforms state-of-the-art prompt strategies such as Chain-of-Thought and Plan-and-Solve Prompting on commonly used arithmetic and commonsense reasoning benchmarks, and is able to evolve intricate task-prompts for the challenging problem of hate speech classification.
    Abstract Popular prompt strategies like Chain-of-Thought Prompting can dramatically improve the reasoning abilities of Large Language Models (LLMs) in various domains. However, such hand-crafted prompt-strategies are often sub-optimal. In this paper, we present Promptbreeder, a general-purpose self-referential self-improvement mechanism that evolves and adapts prompts for a given domain. Driven by an LLM, Promptbreeder mutates a population of task-prompts, and subsequently evaluates them for fitness on a training set. Crucially, the mutation of these task-prompts is governed by mutation-prompts that the LLM generates and improves throughout evolution in a self-referential way. That is, Promptbreeder is not just improving task-prompts, but it is also improving the mutationprompts that improve these task-prompts. Promptbreeder outperforms state-of-the-art prompt strategies such as Chain-of-Thought and Plan-and-Solve Prompting on commonly used arithmetic and commonsense reasoning benchmarks. Furthermore, Promptbreeder is able to evolve intricate task-prompts for the challenging problem of hate speech classification.
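The abstract describes an evolutionary loop in which an LLM mutates a population of task-prompts under the control of mutation-prompts that are themselves mutated, with fitness measured on a training set. The sketch below is a hedged outline of that loop: llm() and evaluate_fitness() are placeholders, and the binary-tournament selection shown is an assumption rather than Promptbreeder's exact procedure.

```python
# Sketch of a Promptbreeder-style self-referential loop (placeholders throughout).
import random

def llm(prompt: str) -> str:
    """Placeholder for a call to a large language model."""
    return prompt + " [mutated]"

def evaluate_fitness(task_prompt: str, train_set) -> float:
    """Placeholder: score how well a task-prompt solves the training examples."""
    return random.random()

train_set = []                                    # training examples would go here
population = [("Solve the problem step by step.", "Rephrase the instruction.")
              for _ in range(8)]                  # (task-prompt, mutation-prompt) pairs

for generation in range(5):
    a, b = random.sample(range(len(population)), 2)   # binary tournament
    fit_a = evaluate_fitness(population[a][0], train_set)
    fit_b = evaluate_fitness(population[b][0], train_set)
    winner, loser = (a, b) if fit_a >= fit_b else (b, a)
    task_w, mut_w = population[winner]
    # Self-referential step: mutate the mutation-prompt, then apply it to the task-prompt.
    new_mut = llm(f"Improve this mutation instruction: {mut_w}")
    new_task = llm(f"{new_mut}\nInstruction to mutate: {task_w}")
    population[loser] = (new_task, new_mut)            # overwrite the loser

best = max(population, key=lambda p: evaluate_fitness(p[0], train_set))
print("best task-prompt:", best[0])
```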

Photonic Accelerators for Image Segmentation in Autonomous Driving and Defect Detection

  • paper_url: http://arxiv.org/abs/2309.16783
  • repo_url: None
  • paper_authors: Lakshmi Nair, David Widemann, Brad Turcott, Nick Moore, Alexandra Wleklinski, Darius Bunandar, Ioannis Papavasileiou, Shihu Wang, Eric Logan
  • for: This paper investigates running image segmentation models on photonic accelerators to improve the speed and energy efficiency of image segmentation tasks.
  • methods: The paper executes image segmentation models on photonic accelerators and studies which segmentation DNN architectures are best suited to them, along with the throughput and energy efficiency of the different models.
  • results: The paper identifies segmentation models that incur negligible accuracy loss on photonic accelerators and explores the empirical reasons for their robustness, discusses techniques for recovering accuracy for models that do not perform well, and compares throughput and energy consumption across different image segmentation workloads.
    Abstract Photonic computing promises faster and more energy-efficient deep neural network (DNN) inference than traditional digital hardware. Advances in photonic computing can have profound impacts on applications such as autonomous driving and defect detection that depend on fast, accurate and energy efficient execution of image segmentation models. In this paper, we investigate image segmentation on photonic accelerators to explore: a) the types of image segmentation DNN architectures that are best suited for photonic accelerators, and b) the throughput and energy efficiency of executing the different image segmentation models on photonic accelerators, along with the trade-offs involved therein. Specifically, we demonstrate that certain segmentation models exhibit negligible loss in accuracy (compared to digital float32 models) when executed on photonic accelerators, and explore the empirical reasoning for their robustness. We also discuss techniques for recovering accuracy in the case of models that do not perform well. Further, we compare throughput (inferences-per-second) and energy consumption estimates for different image segmentation workloads on photonic accelerators. We discuss the challenges and potential optimizations that can help improve the application of photonic accelerators to such computer vision tasks.

Intriguing properties of generative classifiers

  • paper_url: http://arxiv.org/abs/2309.16779
  • repo_url: None
  • paper_authors: Priyank Jaini, Kevin Clark, Robert Geirhos
  • for: What is the best paradigm for object recognition: discriminative inference (fast but potentially prone to shortcut learning) or a generative model (slow but potentially more robust)?
  • methods: Building on recent advances in generative modeling, text-to-image models are turned into classifiers, allowing their behavior to be studied and compared against discriminative models and human psychophysical data.
  • results: Four intriguing emergent properties are reported: (1) a record-breaking human-like shape bias (99% for Imagen), (2) near human-level out-of-distribution accuracy, (3) state-of-the-art alignment with human classification errors, and (4) an understanding of certain perceptual illusions. The results indicate that while discriminative inference remains the dominant paradigm for modeling human object recognition, zero-shot generative models approximate human object recognition data surprisingly well.
    Abstract What is the best paradigm to recognize objects -- discriminative inference (fast but potentially prone to shortcut learning) or using a generative model (slow but potentially more robust)? We build on recent advances in generative modeling that turn text-to-image models into classifiers. This allows us to study their behavior and to compare them against discriminative models and human psychophysical data. We report four intriguing emergent properties of generative classifiers: they show a record-breaking human-like shape bias (99% for Imagen), near human-level out-of-distribution accuracy, state-of-the-art alignment with human classification errors, and they understand certain perceptual illusions. Our results indicate that while the current dominant paradigm for modeling human object recognition is discriminative inference, zero-shot generative models approximate human object recognition data surprisingly well.
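Turning a text-to-image model into a classifier typically means scoring each candidate label by how well the label-conditioned model denoises the image and picking the best-scoring label. The sketch below shows that scoring loop with a placeholder conditional denoiser; it is an assumed, simplified reading of the approach, not the exact procedure the paper uses with Imagen or other models.

```python
# Sketch: classify an image with a (placeholder) text-conditioned denoiser by
# choosing the label whose conditioning gives the lowest denoising error.
import torch

def denoising_error(image, label: str, n_samples: int = 4) -> float:
    """Placeholder for the diffusion loss E||eps - eps_theta(x_t, t, label)||^2."""
    torch.manual_seed(hash(label) % 2**31)          # stand-in for a real model
    errors = []
    for _ in range(n_samples):
        noise = torch.randn_like(image)
        predicted_noise = torch.randn_like(image)    # a real model would predict this
        errors.append(((noise - predicted_noise) ** 2).mean().item())
    return sum(errors) / len(errors)

image = torch.rand(3, 64, 64)                        # placeholder input image
labels = ["cat", "dog", "airplane"]
scores = {label: denoising_error(image, label) for label in labels}
prediction = min(scores, key=scores.get)             # lowest error = best explanation
print(prediction, scores)
```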

How many words does ChatGPT know? The answer is ChatWords

  • paper_url: http://arxiv.org/abs/2309.16777
  • repo_url: https://github.com/wordsgpt/chatwords
  • paper_authors: Gonzalo Martínez, Javier Conde, Pedro Reviriego, Elena Merino-Gómez, José Alberto Hernández, Fabrizio Lombardi
  • for: The paper aims to evaluate ChatGPT's lexical knowledge, in particular its recognition of a specified set of words.
  • methods: The paper uses ChatWords, an automated test system, to evaluate ChatGPT's knowledge of an arbitrary set of words.
  • results: The study finds that ChatGPT correctly recognizes only about 80% of the words in the Spanish dictionary and about 90% of the words in the Quixote, in some cases assigning them an incorrect meaning.
    Abstract The introduction of ChatGPT has put Artificial Intelligence (AI) Natural Language Processing (NLP) in the spotlight. ChatGPT adoption has been exponential with millions of users experimenting with it in a myriad of tasks and application domains with impressive results. However, ChatGPT has limitations and suffers hallucinations, for example producing answers that look plausible but they are completely wrong. Evaluating the performance of ChatGPT and similar AI tools is a complex issue that is being explored from different perspectives. In this work, we contribute to those efforts with ChatWords, an automated test system, to evaluate ChatGPT knowledge of an arbitrary set of words. ChatWords is designed to be extensible, easy to use, and adaptable to evaluate also other NLP AI tools. ChatWords is publicly available and its main goal is to facilitate research on the lexical knowledge of AI tools. The benefits of ChatWords are illustrated with two case studies: evaluating the knowledge that ChatGPT has of the Spanish lexicon (taken from the official dictionary of the "Real Academia Espa\~nola") and of the words that appear in the Quixote, the well-known novel written by Miguel de Cervantes. The results show that ChatGPT is only able to recognize approximately 80% of the words in the dictionary and 90% of the words in the Quixote, in some cases with an incorrect meaning. The implications of the lexical knowledge of NLP AI tools and potential applications of ChatWords are also discussed providing directions for further work on the study of the lexical knowledge of AI tools.
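ChatWords automates per-word querying of a chat model. A hedged sketch of such a loop is shown below: iterate over a word list, ask whether the model knows each word, and tally recognition. ask_model() is a placeholder for whichever chat API is used, and the prompt wording and scoring rule are illustrative assumptions, not ChatWords' actual implementation.

```python
# Sketch of a ChatWords-style lexical probe (ask_model is a placeholder).
def ask_model(prompt: str) -> str:
    """Placeholder for a call to ChatGPT or another chat model."""
    return "YES: a large feline mammal."

def check_words(words):
    recognized = {}
    for word in words:
        reply = ask_model(
            f"Do you know the Spanish word '{word}'? "
            "Answer 'YES: <short definition>' or 'NO'."
        )
        recognized[word] = reply.strip().upper().startswith("YES")
    return recognized

lexicon = ["gato", "quijote", "hidalgo"]       # tiny stand-in for the RAE dictionary
results = check_words(lexicon)
coverage = sum(results.values()) / len(results)
print(results, f"coverage={coverage:.0%}")
```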

Neural scaling laws for phenotypic drug discovery

  • paper_url: http://arxiv.org/abs/2309.16773
  • repo_url: None
  • paper_authors: Drew Linsley, John Griffin, Jason Parker Brown, Adam N Roose, Michael Frank, Peter Linsley, Steven Finkbeiner, Jeremy Linsley
  • for: This paper examines whether deep neural networks (DNNs) for small molecule drug discovery can achieve breakthrough progress through scaled-up models and data.
  • methods: The authors conduct a large-scale, systematic analysis of how DNN size, data diet, and learning routines interact to affect accuracy on the Phenotypic Chemistry Arena (Pheno-CA) benchmark.
  • results: Unlike in natural language processing and computer vision, DNNs supervised directly on small molecule drug discovery tasks do not keep improving as data and model size grow. The authors introduce a new precursor task, the Inverse Biological Process (IBP), and find that DNNs first trained with IBP significantly outperform task-supervised DNNs on the Pheno-CA; moreover, the performance of IBP-trained DNNs improves monotonically with data and model scale. These results suggest the DNN ingredients needed to solve small molecule drug discovery tasks are already in hand, and project how much additional experimental data is needed to reach any desired level of improvement. The Pheno-CA benchmark and code are released to encourage further study of neural scaling laws for small molecule drug discovery.
    Abstract Recent breakthroughs by deep neural networks (DNNs) in natural language processing (NLP) and computer vision have been driven by a scale-up of models and data rather than the discovery of novel computing paradigms. Here, we investigate if scale can have a similar impact for models designed to aid small molecule drug discovery. We address this question through a large-scale and systematic analysis of how DNN size, data diet, and learning routines interact to impact accuracy on our Phenotypic Chemistry Arena (Pheno-CA) benchmark: a diverse set of drug development tasks posed on image-based high content screening data. Surprisingly, we find that DNNs explicitly supervised to solve tasks in the Pheno-CA do not continuously improve as their data and model size is scaled-up. To address this issue, we introduce a novel precursor task, the Inverse Biological Process (IBP), which is designed to resemble the causal objective functions that have proven successful for NLP. We indeed find that DNNs first trained with IBP then probed for performance on the Pheno-CA significantly outperform task-supervised DNNs. More importantly, the performance of these IBP-trained DNNs monotonically improves with data and model scale. Our findings reveal that the DNN ingredients needed to accurately solve small molecule drug development tasks are already in our hands, and project how much more experimental data is needed to achieve any desired level of improvement. We release our Pheno-CA benchmark and code to encourage further study of neural scaling laws for small molecule drug discovery.

XVO: Generalized Visual Odometry via Cross-Modal Self-Training

  • paper_url: http://arxiv.org/abs/2309.16772
  • repo_url: None
  • paper_authors: Lei Lai, Zhongkai Shangguan, Jimuyang Zhang, Eshed Ohn-Bar
  • for: This work proposes a semi-supervised learning method for training generalized monocular visual odometry (VO) models that operate robustly off-the-shelf across diverse datasets and settings.
  • methods: The model is self-trained on large amounts of unconstrained and heterogeneous dash-camera videos from YouTube, learning to recover relative pose with real-world scale from visual scene semantics. Multi-modal supervision, including segmentation, flow, depth, and audio auxiliary prediction tasks, is used to encourage generalized representations.
  • results: The proposed teacher network achieves state-of-the-art performance on the commonly used KITTI benchmark without multi-frame optimization or knowledge of camera parameters. The audio prediction task is found to strengthen the semi-supervised learning process, particularly on highly dynamic and out-of-domain video data. Combined with the semi-supervised step, XVO demonstrates off-the-shelf knowledge transfer across KITTI, nuScenes, and Argoverse without fine-tuning.
    Abstract We propose XVO, a semi-supervised learning method for training generalized monocular Visual Odometry (VO) models with robust off-the-self operation across diverse datasets and settings. In contrast to standard monocular VO approaches which often study a known calibration within a single dataset, XVO efficiently learns to recover relative pose with real-world scale from visual scene semantics, i.e., without relying on any known camera parameters. We optimize the motion estimation model via self-training from large amounts of unconstrained and heterogeneous dash camera videos available on YouTube. Our key contribution is twofold. First, we empirically demonstrate the benefits of semi-supervised training for learning a general-purpose direct VO regression network. Second, we demonstrate multi-modal supervision, including segmentation, flow, depth, and audio auxiliary prediction tasks, to facilitate generalized representations for the VO task. Specifically, we find audio prediction task to significantly enhance the semi-supervised learning process while alleviating noisy pseudo-labels, particularly in highly dynamic and out-of-domain video data. Our proposed teacher network achieves state-of-the-art performance on the commonly used KITTI benchmark despite no multi-frame optimization or knowledge of camera parameters. Combined with the proposed semi-supervised step, XVO demonstrates off-the-shelf knowledge transfer across diverse conditions on KITTI, nuScenes, and Argoverse without fine-tuning.

Persona-Coded Poly-Encoder: Persona-Guided Multi-Stream Conversational Sentence Scoring

  • paper_url: http://arxiv.org/abs/2309.16770
  • repo_url: None
  • paper_authors: Junfeng Liu, Christopher Symons, Ranga Raju Vatsavai
  • for: Improving conversation quality by leveraging persona information.
  • methods: A novel Persona-Coded Poly-Encoder method that incorporates persona information in a multi-stream encoding scheme.
  • results: Evaluated on two different persona-based conversational datasets and compared against two state-of-the-art methods, the proposed method improves conversation quality over the baseline Poly-Encoder by 3.32% in BLEU score and 2.94% in HR@1.
    Abstract Recent advances in machine learning and deep learning have led to the widespread use of Conversational AI in many practical applications. However, it is still very challenging to leverage auxiliary information that can provide conversational context or personalized tuning to improve the quality of conversations. For example, there has only been limited research on using an individuals persona information to improve conversation quality, and even state-of-the-art conversational AI techniques are unable to effectively leverage signals from heterogeneous sources of auxiliary data, such as multi-modal interaction data, demographics, SDOH data, etc. In this paper, we present a novel Persona-Coded Poly-Encoder method that leverages persona information in a multi-stream encoding scheme to improve the quality of response generation for conversations. To show the efficacy of the proposed method, we evaluate our method on two different persona-based conversational datasets, and compared against two state-of-the-art methods. Our experimental results and analysis demonstrate that our method can improve conversation quality over the baseline method Poly-Encoder by 3.32% and 2.94% in terms of BLEU score and HR@1, respectively. More significantly, our method offers a path to better utilization of multi-modal data in conversational tasks. Lastly, our study outlines several challenges and future research directions for advancing personalized conversational AI technology.

RealFill: Reference-Driven Generation for Authentic Image Completion

  • paper_url: http://arxiv.org/abs/2309.16668
  • repo_url: None
  • paper_authors: Luming Tang, Nataniel Ruiz, Qinghao Chu, Yuanzhen Li, Aleksander Holynski, David E. Jacobs, Bharath Hariharan, Yael Pritch, Neal Wadhwa, Kfir Aberman, Michael Rubinstein
  • for: Filling in missing regions of an image with the content that should have been there, making the image more complete and authentic.
  • methods: A generative inpainting model personalized using only a few reference images of a scene; the reference images need not be aligned with the target image and can be taken with drastically varying viewpoints, lighting conditions, camera apertures, or image styles.
  • results: Compared with existing approaches, RealFill completes images across diverse and challenging scenarios and produces more authentic and faithful content.
    Abstract Recent advances in generative imagery have brought forth outpainting and inpainting models that can produce high-quality, plausible image content in unknown regions, but the content these models hallucinate is necessarily inauthentic, since the models lack sufficient context about the true scene. In this work, we propose RealFill, a novel generative approach for image completion that fills in missing regions of an image with the content that should have been there. RealFill is a generative inpainting model that is personalized using only a few reference images of a scene. These reference images do not have to be aligned with the target image, and can be taken with drastically varying viewpoints, lighting conditions, camera apertures, or image styles. Once personalized, RealFill is able to complete a target image with visually compelling contents that are faithful to the original scene. We evaluate RealFill on a new image completion benchmark that covers a set of diverse and challenging scenarios, and find that it outperforms existing approaches by a large margin. See more results on our project page: https://realfill.github.io

SA2-Net: Scale-aware Attention Network for Microscopic Image Segmentation

  • paper_url: http://arxiv.org/abs/2309.16661
  • repo_url: https://github.com/mustansarfiaz/sa2-net
  • paper_authors: Mustansar Fiaz, Rao Muhammad Anwer, Hisham Cholakkal
  • for: The paper proposes an attention-guided method to effectively handle the diverse structures, such as cells, found in microscopic images.
  • methods: The method performs multi-scale feature learning, combining a scale-aware attention module with multiple resolutions to capture the varied structures in microscopic images, and introduces a novel upsampling strategy to sharpen region boundaries.
  • results: Experiments on five challenging datasets show that the SA2-Net model performs strongly and outperforms commonly used CNN-based models. Code is publicly available at \url{https://github.com/mustansarfiaz/SA2-Net}.
    Abstract Microscopic image segmentation is a challenging task, wherein the objective is to assign semantic labels to each pixel in a given microscopic image. While convolutional neural networks (CNNs) form the foundation of many existing frameworks, they often struggle to explicitly capture long-range dependencies. Although transformers were initially devised to address this issue using self-attention, it has been proven that both local and global features are crucial for addressing diverse challenges in microscopic images, including variations in shape, size, appearance, and target region density. In this paper, we introduce SA2-Net, an attention-guided method that leverages multi-scale feature learning to effectively handle diverse structures within microscopic images. Specifically, we propose scale-aware attention (SA2) module designed to capture inherent variations in scales and shapes of microscopic regions, such as cells, for accurate segmentation. This module incorporates local attention at each level of multi-stage features, as well as global attention across multiple resolutions. Furthermore, we address the issue of blurred region boundaries (e.g., cell boundaries) by introducing a novel upsampling strategy called the Adaptive Up-Attention (AuA) module. This module enhances the discriminative ability for improved localization of microscopic regions using an explicit attention mechanism. Extensive experiments on five challenging datasets demonstrate the benefits of our SA2-Net model. Our source code is publicly available at \url{https://github.com/mustansarfiaz/SA2-Net}.

Memory in Plain Sight: A Survey of the Uncanny Resemblances between Diffusion Models and Associative Memories

  • paper_url: http://arxiv.org/abs/2309.16750
  • repo_url: None
  • paper_authors: Benjamin Hoover, Hendrik Strobelt, Dmitry Krotov, Judy Hoffman, Zsolt Kira, Duen Horng Chau
  • for: The paper provides a concise overview of Diffusion Models (DMs) and describes their mathematical connection to Associative Memories (AMs).
  • methods: The paper describes DMs from the perspective of dynamical systems and Ordinary Differential Equations (ODEs), and points out a Lyapunov energy function on which gradient descent can be performed to denoise data.
  • results: The paper summarizes the 40-year history of energy-based AMs, beginning with the original Hopfield Network, and discusses new research directions revealed by characterizing the similarities and differences between AMs and DMs.
    Abstract Diffusion Models (DMs) have recently set state-of-the-art on many generation benchmarks. However, there are myriad ways to describe them mathematically, which makes it difficult to develop a simple understanding of how they work. In this survey, we provide a concise overview of DMs from the perspective of dynamical systems and Ordinary Differential Equations (ODEs) which exposes a mathematical connection to the highly related yet often overlooked class of energy-based models, called Associative Memories (AMs). Energy-based AMs are a theoretical framework that behave much like denoising DMs, but they enable us to directly compute a Lyapunov energy function on which we can perform gradient descent to denoise data. We then summarize the 40 year history of energy-based AMs, beginning with the original Hopfield Network, and discuss new research directions for AMs and DMs that are revealed by characterizing the extent of their similarities and differences
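The survey's central observation is that denoising in a diffusion model resembles gradient descent on an energy function, which is how Hopfield-style associative memories retrieve stored patterns. The sketch below retrieves a stored pattern from a corrupted query by lowering the classical Hopfield energy E(x) = -1/2 x^T W x; the toy patterns, network size, and update count are illustrative.

```python
# Sketch: pattern retrieval in a classical Hopfield associative memory by
# lowering the energy E(x) = -1/2 x^T W x (toy binary patterns, no self-weights).
import numpy as np

rng = np.random.default_rng(0)
patterns = rng.choice([-1, 1], size=(3, 64))          # stored memories
W = sum(np.outer(p, p) for p in patterns) / len(patterns)
np.fill_diagonal(W, 0)

def energy(x):
    return -0.5 * x @ W @ x

query = patterns[0].copy()
flip = rng.choice(64, size=12, replace=False)         # corrupt ("add noise to") the query
query[flip] *= -1

x = query.copy()
for _ in range(5):                                    # asynchronous updates lower E(x)
    for i in rng.permutation(64):
        x[i] = 1 if W[i] @ x >= 0 else -1

print("energy before/after:", energy(query), energy(x))
print("bits recovered:", int((x == patterns[0]).sum()), "/ 64")
```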

Discovering environments with XRM

  • paper_url: http://arxiv.org/abs/2309.16748
  • repo_url: None
  • paper_authors: Mohammad Pezeshki, Diane Bouchacourt, Mark Ibrahim, Nicolas Ballas, Pascal Vincent, David Lopez-Paz
  • For: The paper aims to develop algorithms for automatically discovering environments that induce broad generalization for robust AI systems across applications.
  • Methods: The proposed method, Cross-Risk-Minimization (XRM), trains two twin networks to learn from one random half of the training data, while imitating confident held-out mistakes made by its sibling.
  • Results: The paper shows that XRM can discover environments for all training and validation data, and domain generalization algorithms built on top of XRM environments achieve oracle worst-group-accuracy, solving a long-standing problem in out-of-distribution generalization.
    Abstract Successful out-of-distribution generalization requires environment annotations. Unfortunately, these are resource-intensive to obtain, and their relevance to model performance is limited by the expectations and perceptual biases of human annotators. Therefore, to enable robust AI systems across applications, we must develop algorithms to automatically discover environments inducing broad generalization. Current proposals, which divide examples based on their training error, suffer from one fundamental problem. These methods add hyper-parameters and early-stopping criteria that are impossible to tune without a validation set with human-annotated environments, the very information subject to discovery. In this paper, we propose Cross-Risk-Minimization (XRM) to address this issue. XRM trains two twin networks, each learning from one random half of the training data, while imitating confident held-out mistakes made by its sibling. XRM provides a recipe for hyper-parameter tuning, does not require early-stopping, and can discover environments for all training and validation data. Domain generalization algorithms built on top of XRM environments achieve oracle worst-group-accuracy, solving a long-standing problem in out-of-distribution generalization.
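The abstract states the core mechanism: two twin networks each train on one random half of the data while imitating confident held-out mistakes made by the sibling, and environments are then read off from those held-out errors. The sketch below is a simplified rendering on toy data; the confidence threshold and the rule mapping confident mistakes to environment labels are assumptions, not the paper's exact recipe.

```python
# Simplified sketch of XRM-style environment discovery (toy data, assumptions noted).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)   # toy labels

perm = rng.permutation(1000)
idx_a, idx_b = perm[:500], perm[500:]
net_a = LogisticRegression().fit(X[idx_a], y[idx_a])           # twin A
net_b = LogisticRegression().fit(X[idx_b], y[idx_b])           # twin B

def held_out_flags(net, held_idx, threshold=0.75):
    """Flag held-out points that the sibling network gets confidently wrong."""
    proba = net.predict_proba(X[held_idx])
    pred = proba.argmax(axis=1)
    confident = proba.max(axis=1) > threshold
    return (pred != y[held_idx]) & confident

# Each example is scored by the twin that did NOT train on it; confident mistakes
# define one environment, the rest another (a simplifying assumption).
env = np.zeros(1000, dtype=int)
env[idx_b[held_out_flags(net_a, idx_b)]] = 1
env[idx_a[held_out_flags(net_b, idx_a)]] = 1
print("environment sizes:", np.bincount(env))
```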

MindShift: Leveraging Large Language Models for Mental-States-Based Problematic Smartphone Use Intervention

  • paper_url: http://arxiv.org/abs/2309.16639
  • repo_url: None
  • paper_authors: Ruolan Wu, Chun Yu, Xiaole Pan, Yujia Liu, Ningning Zhang, Yue Fu, Yuhan Wang, Zhi Zheng, Li Chen, Qiaolei Jiang, Xuhai Xu, Yuanchun Shi
  • for: This work develops an LLM-powered intervention technique for problematic smartphone use, to help counteract its negative effects on physical and mental health.
  • methods: A Wizard-of-Oz study (N=12) and an interview study (N=10) were conducted to summarize the mental states behind problematic smartphone use, namely boredom, stress, and inertia; these informed the design of four persuasion strategies: understanding, comforting, evoking, and scaffolding habits.
  • results: Compared with baseline techniques in a 5-week field experiment (N=25), MindShift improves intervention acceptance rates by 17.8-22.5% and reduces smartphone use frequency by 12.1-14.4%. Users also show a significant drop in smartphone addiction scale scores and a rise in self-efficacy. The study highlights the potential of LLM-based context-aware persuasion in other behavior change domains.
    Abstract Problematic smartphone use negatively affects physical and mental health. Despite the wide range of prior research, existing persuasive techniques are not flexible enough to provide dynamic persuasion content based on users' physical contexts and mental states. We first conduct a Wizard-of-Oz study (N=12) and an interview study (N=10) to summarize the mental states behind problematic smartphone use: boredom, stress, and inertia. This informs our design of four persuasion strategies: understanding, comforting, evoking, and scaffolding habits. We leverage large language models (LLMs) to enable the automatic and dynamic generation of effective persuasion content. We develop MindShift, a novel LLM-powered problematic smartphone use intervention technique. MindShift takes users' in-the-moment physical contexts, mental states, app usage behaviors, users' goals & habits as input, and generates high-quality and flexible persuasive content with appropriate persuasion strategies. We conduct a 5-week field experiment (N=25) to compare MindShift with baseline techniques. The results show that MindShift significantly improves intervention acceptance rates by 17.8-22.5% and reduces smartphone use frequency by 12.1-14.4%. Moreover, users have a significant drop in smartphone addiction scale scores and a rise in self-efficacy. Our study sheds light on the potential of leveraging LLMs for context-aware persuasion in other behavior change domains.

Mixup Your Own Pairs

  • paper_url: http://arxiv.org/abs/2309.16633
  • repo_url: https://github.com/yilei-wu/supremix
  • paper_authors: Yilei Wu, Zijian Dong, Chongyao Chen, Wangchunshu Zhou, Juan Helen Zhou
  • for: This work aims to improve representation learning for regression tasks.
  • methods: It builds on contrastive learning and proposes SupReMix, which constructs anchor-inclusive and anchor-exclusive mixtures at the embedding level to form harder contrastive pairs for regression data.
  • results: Extensive experiments and theoretical analysis show that SupReMix provides rich ordinal information for regression data, improving regression performance. SupReMix also performs well under transfer learning, imbalanced training data, and scenarios with fewer training samples.
    Abstract In representation learning, regression has traditionally received less attention than classification. Directly applying representation learning techniques designed for classification to regression often results in fragmented representations in the latent space, yielding sub-optimal performance. In this paper, we argue that the potential of contrastive learning for regression has been overshadowed due to the neglect of two crucial aspects: ordinality-awareness and hardness. To address these challenges, we advocate "mixup your own contrastive pairs for supervised contrastive regression", instead of relying solely on real/augmented samples. Specifically, we propose Supervised Contrastive Learning for Regression with Mixup (SupReMix). It takes anchor-inclusive mixtures (mixup of the anchor and a distinct negative sample) as hard negative pairs and anchor-exclusive mixtures (mixup of two distinct negative samples) as hard positive pairs at the embedding level. This strategy formulates harder contrastive pairs by integrating richer ordinal information. Through extensive experiments on six regression datasets including 2D images, volumetric images, text, tabular data, and time-series signals, coupled with theoretical analysis, we demonstrate that SupReMix pre-training fosters continuous ordered representations of regression data, resulting in significant improvement in regression performance. Furthermore, SupReMix is superior to other approaches in a range of regression challenges including transfer learning, imbalanced training data, and scenarios with fewer training samples.
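The key construction is stated directly in the abstract: at the embedding level, mixing the anchor with a distinct negative yields a hard negative (anchor-inclusive mixture), while mixing two distinct negatives yields a hard positive (anchor-exclusive mixture). The sketch below builds one of each for a single anchor; the embedding dimension, fixed mixing coefficient, and the linear mixing of targets are illustrative assumptions.

```python
# Sketch: anchor-inclusive / anchor-exclusive mixup at the embedding level
# (illustrative; not the full SupReMix objective).
import torch

torch.manual_seed(0)
embeddings = torch.randn(16, 32)          # a batch of 16 embeddings
targets = torch.rand(16) * 80             # continuous regression targets (e.g. age)

anchor_idx = 0
anchor_z, anchor_y = embeddings[anchor_idx], targets[anchor_idx]
neg_idx = [i for i in range(16) if i != anchor_idx]
lam = 0.5                                  # mixing coefficient (assumed fixed here)
i, j = neg_idx[0], neg_idx[1]

# Hard negative: mix the anchor with a distinct negative (anchor-inclusive).
hard_negative = lam * anchor_z + (1 - lam) * embeddings[i]
hard_negative_y = lam * anchor_y + (1 - lam) * targets[i]

# Hard positive: mix two distinct negatives (anchor-exclusive); in SupReMix these
# are chosen so the mixed target lands near the anchor's target.
hard_positive = lam * embeddings[i] + (1 - lam) * embeddings[j]
hard_positive_y = lam * targets[i] + (1 - lam) * targets[j]

print(hard_negative.shape, float(hard_negative_y), float(hard_positive_y))
```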

Harnessing Diverse Data for Global Disaster Prediction: A Multimodal Framework

  • paper_url: http://arxiv.org/abs/2309.16747
  • repo_url: None
  • paper_authors: Gengyin Liu, Huaiyang Zhong
  • for: Predicting climate-change-driven disasters, in particular floods and landslides, which are tied to meteorological and topographical factors.
  • methods: The study builds a novel multimodal disaster prediction framework that combines weather statistics, satellite imagery, and textual insights, and implements strategies to address class imbalance.
  • results: The results show that integrating multiple data sources can improve model performance, but the extent of the improvement differs depending on the specific nature of each disaster and its underlying causes.
    Abstract As climate change intensifies, the urgency for accurate global-scale disaster predictions grows. This research presents a novel multimodal disaster prediction framework, combining weather statistics, satellite imagery, and textual insights. We particularly focus on "flood" and "landslide" predictions, given their ties to meteorological and topographical factors. The model is meticulously crafted based on the available data and we also implement strategies to address class imbalance. While our findings suggest that integrating multiple data sources can bolster model performance, the extent of enhancement differs based on the specific nature of each disaster and their unique underlying causes.

Stress Testing Chain-of-Thought Prompting for Large Language Models

  • paper_url: http://arxiv.org/abs/2309.16621
  • repo_url: None
  • paper_authors: Aayush Mishra, Karan Thakkar
  • for: This work examines the effectiveness of Chain-of-Thought (CoT) prompting in improving the multi-step reasoning abilities of large language models (LLMs).
  • methods: The authors analyze the impact of three types of CoT prompt perturbations, namely CoT order, CoT values, and CoT operators, on the performance of GPT-3 across various tasks.
  • results: Incorrect CoT prompting leads to poor performance on accuracy metrics, and correct values in the CoT are crucial for predicting correct answers. Incorrect demonstrations, where the CoT operators or the CoT order are wrong, do not degrade performance as drastically as value-based perturbations.
    Abstract This report examines the effectiveness of Chain-of-Thought (CoT) prompting in improving the multi-step reasoning abilities of large language models (LLMs). Inspired by previous studies \cite{Min2022RethinkingWork}, we analyze the impact of three types of CoT prompt perturbations, namely CoT order, CoT values, and CoT operators on the performance of GPT-3 on various tasks. Our findings show that incorrect CoT prompting leads to poor performance on accuracy metrics. Correct values in the CoT is crucial for predicting correct answers. Moreover, incorrect demonstrations, where the CoT operators or the CoT order are wrong, do not affect the performance as drastically when compared to the value based perturbations. This research deepens our understanding of CoT prompting and opens some new questions regarding the capability of LLMs to learn reasoning in context.
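The three perturbation types are concrete enough to illustrate: shuffle the order of reasoning steps, corrupt the numeric values, or swap the arithmetic operators in a CoT exemplar. The sketch below applies each to a toy arithmetic chain of thought; the exemplar text and substitution rules are illustrative assumptions, not the paper's exact prompts.

```python
# Sketch: the three CoT perturbations (order, values, operators) on a toy exemplar.
import random
import re

cot = ["Tom has 3 apples.",
       "He buys 4 more, so 3 + 4 = 7.",
       "He gives away 2, so 7 - 2 = 5.",
       "The answer is 5."]

def perturb_order(steps):
    body = steps[:-1]
    random.shuffle(body)          # scramble the reasoning steps, keep the answer line
    return body + steps[-1:]

def perturb_values(steps):
    return [re.sub(r"\d+", lambda m: str(int(m.group()) + random.randint(1, 9)), s)
            for s in steps]

def perturb_operators(steps):
    return [s.replace("+", "*") if "+" in s else s.replace("-", "+") for s in steps]

random.seed(0)
print("order:    ", perturb_order(cot))
print("values:   ", perturb_values(cot))
print("operators:", perturb_operators(cot))
```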

Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit

  • paper_url: http://arxiv.org/abs/2309.16620
  • repo_url: None
  • paper_authors: Blake Bordelon, Lorenzo Noci, Mufan Bill Li, Boris Hanin, Cengiz Pehlevan
  • for: This paper seeks a hyperparameter tuning approach for deep learning that reduces the computational cost of tuning large models.
  • methods: The paper builds on $\mu$P-parameterized networks, in which the optimal hyperparameters of narrow networks transfer to networks of arbitrary width. Since hyperparameters do not transfer across depth in that scheme, the paper studies residual networks whose residual branches are scaled by $1/\sqrt{\text{depth}}$ in combination with the $\mu$P parameterization.
  • results: Experiments show that this parameterization and residual architecture enable transfer of optimal hyperparameters across both width and depth on CIFAR-10 and ImageNet. The empirical findings are supported by theory: using recent developments in the dynamical mean field theory (DMFT) description of neural network learning dynamics, the paper shows that this parameterization of ResNets admits a well-defined feature learning joint infinite-width and infinite-depth limit, and proves convergence of finite-size network dynamics towards this limit.
    Abstract The cost of hyperparameter tuning in deep learning has been rising with model sizes, prompting practitioners to find new tuning methods using a proxy of smaller networks. One such proposal uses $\mu$P parameterized networks, where the optimal hyperparameters for small width networks transfer to networks with arbitrarily large width. However, in this scheme, hyperparameters do not transfer across depths. As a remedy, we study residual networks with a residual branch scale of $1/\sqrt{\text{depth}}$ in combination with the $\mu$P parameterization. We provide experiments demonstrating that residual architectures including convolutional ResNets and Vision Transformers trained with this parameterization exhibit transfer of optimal hyperparameters across width and depth on CIFAR-10 and ImageNet. Furthermore, our empirical findings are supported and motivated by theory. Using recent developments in the dynamical mean field theory (DMFT) description of neural network learning dynamics, we show that this parameterization of ResNets admits a well-defined feature learning joint infinite-width and infinite-depth limit and show convergence of finite-size network dynamics towards this limit.
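The depth side of the parameterization is simple to state: every residual branch's output is multiplied by $1/\sqrt{\text{depth}}$, on top of a $\mu$P-style width parameterization, so that hyperparameters tuned on a small network transfer to wider and deeper ones. The sketch below shows only the branch scaling in a plain residual MLP; the block design is arbitrary and the full $\mu$P width scaling is omitted.

```python
# Sketch: residual network whose branches are scaled by 1/sqrt(depth)
# (branch scaling only; the full muP width parameterization is omitted).
import math
import torch
import torch.nn as nn

class ScaledResidualMLP(nn.Module):
    def __init__(self, width=128, depth=16):
        super().__init__()
        self.branch_scale = 1.0 / math.sqrt(depth)
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(width, width), nn.ReLU(), nn.Linear(width, width))
            for _ in range(depth)
        )

    def forward(self, x):
        for block in self.blocks:
            x = x + self.branch_scale * block(x)   # branch scaled by 1/sqrt(depth)
        return x

shallow, deep = ScaledResidualMLP(depth=4), ScaledResidualMLP(depth=64)
x = torch.randn(2, 128)
print(shallow(x).std().item(), deep(x).std().item())  # activations stay comparable
```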

Revisiting Neural Program Smoothing for Fuzzing

  • paper_url: http://arxiv.org/abs/2309.16618
  • repo_url: None
  • paper_authors: Maria-Irina Nicolae, Max Eisele, Andreas Zeller
  • for: This paper aims to evaluate the performance of Neural Program Smoothing (NPS) fuzzers and compare them with standard gray-box fuzzers.
  • methods: The paper uses a neural network as a smooth approximation of the program target for new test case generation, and conducts the most extensive evaluation of NPS fuzzers against standard gray-box fuzzers.
  • results: The paper finds that the original performance claims for NPS fuzzers do not hold, and that standard gray-box fuzzers almost always surpass NPS-based fuzzers. The paper also contributes an in-depth analysis of the contribution of machine learning and gradient-based mutations in NPS, and proposes new guidelines targeted at benchmarking fuzzing based on machine learning.
    Abstract Testing with randomly generated inputs (fuzzing) has gained significant traction due to its capacity to expose program vulnerabilities automatically. Fuzz testing campaigns generate large amounts of data, making them ideal for the application of machine learning (ML). Neural program smoothing (NPS), a specific family of ML-guided fuzzers, aims to use a neural network as a smooth approximation of the program target for new test case generation. In this paper, we conduct the most extensive evaluation of NPS fuzzers against standard gray-box fuzzers (>11 CPU years and >5.5 GPU years), and make the following contributions: (1) We find that the original performance claims for NPS fuzzers do not hold; a gap we relate to fundamental, implementation, and experimental limitations of prior works. (2) We contribute the first in-depth analysis of the contribution of machine learning and gradient-based mutations in NPS. (3) We implement Neuzz++, which shows that addressing the practical limitations of NPS fuzzers improves performance, but that standard gray-box fuzzers almost always surpass NPS-based fuzzers. (4) As a consequence, we propose new guidelines targeted at benchmarking fuzzing based on machine learning, and present MLFuzz, a platform with GPU access for easy and reproducible evaluation of ML-based fuzzers. Neuzz++, MLFuzz, and all our data are public.
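Neural program smoothing trains a neural network as a differentiable surrogate of the target program, typically mapping input bytes to an edge-coverage bitmap, and uses the surrogate's input gradients to decide which bytes to mutate. The sketch below shows that gradient-guided byte selection with an untrained toy surrogate; the architecture, the target edge, and the mutation rule are illustrative assumptions, not Neuzz++ itself.

```python
# Sketch: gradient-guided byte selection in neural program smoothing (toy surrogate).
import torch
import torch.nn as nn

INPUT_LEN, N_EDGES = 64, 128
surrogate = nn.Sequential(                 # smooth approximation of program coverage
    nn.Linear(INPUT_LEN, 256), nn.ReLU(), nn.Linear(256, N_EDGES), nn.Sigmoid())

seed = torch.rand(1, INPUT_LEN, requires_grad=True)   # normalized input bytes
target_edge = 42                                      # an edge we want to flip on

coverage = surrogate(seed)
coverage[0, target_edge].backward()                   # d(edge prob)/d(input bytes)
saliency = seed.grad.abs().squeeze()

# Mutate the bytes with the largest gradient magnitude (most influential positions).
top_positions = saliency.topk(8).indices
mutated = seed.detach().clone()
mutated[0, top_positions] = torch.rand(8)
print("mutating byte positions:", sorted(top_positions.tolist()))
```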

“AI enhances our performance, I have no doubt this one will do the same”: The Placebo effect is robust to negative descriptions of AI

  • paper_url: http://arxiv.org/abs/2309.16606
  • repo_url: None
  • paper_authors: Agnes M. Kloft, Robin Welsch, Thomas Kosch, Steeven Villa
  • for: investigate the impact of user expectations on human-AI interactions and evaluate the effectiveness of AI systems.
  • methods: used a letter discrimination task and a Bayesian analysis to study the impact of AI descriptions on participant performance, and used cognitive modeling to trace the advantage back to participants gathering more information.
  • results: found that participants performed better when they believed an AI was present, even when there was no actual AI, and that negative AI descriptions did not alter expectations.
    Abstract Heightened AI expectations facilitate performance in human-AI interactions through placebo effects. While lowering expectations to control for placebo effects is advisable, overly negative expectations could induce nocebo effects. In a letter discrimination task, we informed participants that an AI would either increase or decrease their performance by adapting the interface, but in reality, no AI was present in any condition. A Bayesian analysis showed that participants had high expectations and performed descriptively better irrespective of the AI description when a sham-AI was present. Using cognitive modeling, we could trace this advantage back to participants gathering more information. A replication study verified that negative AI descriptions do not alter expectations, suggesting that performance expectations with AI are biased and robust to negative verbal descriptions. We discuss the impact of user expectations on AI interactions and evaluation and provide a behavioral placebo marker for human-AI interaction

Transfer Learning for Bayesian Optimization on Heterogeneous Search Spaces

  • paper_url: http://arxiv.org/abs/2309.16597
  • repo_url: None
  • paper_authors: Zhou Fan, Xinran Han, Zi Wang
  • for: Black-box function optimization.
  • methods: Bayesian optimization and transfer learning.
  • results: Improves performance on black-box function optimization tasks by transferring knowledge across heterogeneous search spaces.
    Abstract Bayesian optimization (BO) is a popular black-box function optimization method, which makes sequential decisions based on a Bayesian model, typically a Gaussian process (GP), of the function. To ensure the quality of the model, transfer learning approaches have been developed to automatically design GP priors by learning from observations on "training" functions. These training functions are typically required to have the same domain as the "test" function (black-box function to be optimized). In this paper, we introduce MPHD, a model pre-training method on heterogeneous domains, which uses a neural net mapping from domain-specific contexts to specifications of hierarchical GPs. MPHD can be seamlessly integrated with BO to transfer knowledge across heterogeneous search spaces. Our theoretical and empirical results demonstrate the validity of MPHD and its superior performance on challenging black-box function optimization tasks.

Can LLMs Effectively Leverage Graph Structural Information: When and Why

  • paper_url: http://arxiv.org/abs/2309.16595
  • repo_url: https://github.com/trais-lab/llm-structured-data
  • paper_authors: Jin Huang, Xingjian Zhang, Qiaozhu Mei, Jiaqi Ma
  • for: studying Large Language Models (LLMs) augmented with structured data, particularly graphs, an underexplored data modality, with the goal of improving node classification performance.
  • methods: a variety of prompting methods for encoding structural information, evaluated in settings where textual node features are either scarce or rich.
  • results: the study finds that (i) LLMs can benefit from structural information, especially when textual node features are scarce; (ii) there is no substantial evidence that LLM performance is significantly attributable to data leakage; and (iii) LLM performance on a target node is strongly positively related to the node's local homophily ratio.
    Abstract This paper studies Large Language Models (LLMs) augmented with structured data--particularly graphs--a crucial data modality that remains underexplored in the LLM literature. We aim to understand when and why the incorporation of structural information inherent in graph data can improve the prediction performance of LLMs on node classification tasks with textual features. To address the "when" question, we examine a variety of prompting methods for encoding structural information, in settings where textual node features are either rich or scarce. For the "why" questions, we probe into two potential contributing factors to the LLM performance: data leakage and homophily. Our exploration of these questions reveals that (i) LLMs can benefit from structural information, especially when textual node features are scarce; (ii) there is no substantial evidence indicating that the performance of LLMs is significantly attributed to data leakage; and (iii) the performance of LLMs on a target node is strongly positively related to the local homophily ratio of the node. (Codes and datasets are at: https://github.com/TRAIS-Lab/LLM-Structured-Data.)
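    The "local homophily ratio" referenced above is a simple graph statistic: the fraction of a node's neighbors that share its label. A minimal sketch in plain Python (the toy graph and labels are made up for illustration and are not from the paper's repository):

```python
# Local homophily ratio: fraction of a node's neighbors sharing its label.
def local_homophily(node, adj, labels):
    """Return the fraction of `node`'s neighbors with the same label as `node`."""
    neighbors = adj[node]
    if not neighbors:
        return 0.0  # convention for isolated nodes
    same = sum(1 for nb in neighbors if labels[nb] == labels[node])
    return same / len(neighbors)

# Hypothetical toy citation graph with node labels.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
labels = {0: "cs", 1: "cs", 2: "cs", 3: "physics"}
for v in adj:
    print(v, local_homophily(v, adj, labels))
```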

  • paper_url: http://arxiv.org/abs/2309.16593
  • repo_url: None
  • paper_authors: Satvik Garg, Shivam Parikh, Somya Garg
  • for: This paper is written for researchers and healthcare professionals who are interested in using knowledge graphs (KGs) in healthcare AI, specifically in drug discovery and pharmaceutical research.
  • methods: The paper discusses various methods for constructing and utilizing KGs in healthcare AI, including knowledge-infused learning, relationship extraction, and reasoning.
  • results: The paper highlights the potential of KGs in healthcare AI to improve interpretability and support decision-making, with applications in areas such as Drug-Drug Interactions (DDI), Drug Target Interactions (DTI), Drug Development (DD), Adverse Drug Reactions (ADR), and bioinformatics. The paper also emphasizes the importance of making KGs more interpretable in healthcare.
    Abstract Knowledge graphs (KGs) are gaining prominence in Healthcare AI, especially in drug discovery and pharmaceutical research as they provide a structured way to integrate diverse information sources, enhancing AI system interpretability. This interpretability is crucial in healthcare, where trust and transparency matter, and eXplainable AI (XAI) supports decision making for healthcare professionals. This overview summarizes recent literature on the impact of KGs in healthcare and their role in developing explainable AI models. We cover KG workflow, including construction, relationship extraction, reasoning, and their applications in areas like Drug-Drug Interactions (DDI), Drug Target Interactions (DTI), Drug Development (DD), Adverse Drug Reactions (ADR), and bioinformatics. We emphasize the importance of making KGs more interpretable through knowledge-infused learning in healthcare. Finally, we highlight research challenges and provide insights for future directions.

The ARRT of Language-Models-as-a-Service: Overview of a New Paradigm and its Challenges

  • paper_url: http://arxiv.org/abs/2309.16573
  • repo_url: None
  • paper_authors: Emanuele La Malfa, Aleksandar Petrov, Simon Frieder, Christoph Weinhuber, Ryan Burnell, Anthony G. Cohn, Nigel Shadbolt, Michael Wooldridge
  • for: describing the difficulties and challenges of the Language-Models-as-a-Service (LMaaS) paradigm and how to improve accessibility, replicability, reliability, and trustworthiness (ARRT).
  • methods: a systematic examination of the obstacles created by the lack of information about the current major LMaaS, together with recommendations and directions for future development.
  • results: the study shows that current major LMaaS face ARRT difficulties and challenges, and it provides recommendations and directions for future work.
    Abstract Some of the most powerful language models currently are proprietary systems, accessible only via (typically restrictive) web or software programming interfaces. This is the Language-Models-as-a-Service (LMaaS) paradigm. Contrasting with scenarios where full model access is available, as in the case of open-source models, such closed-off language models create specific challenges for evaluating, benchmarking, and testing them. This paper has two goals: on the one hand, we delineate how the aforementioned challenges act as impediments to the accessibility, replicability, reliability, and trustworthiness (ARRT) of LMaaS. We systematically examine the issues that arise from a lack of information about language models for each of these four aspects. We shed light on current solutions, provide some recommendations, and highlight the directions for future advancements. On the other hand, it serves as a one-stop-shop for the extant knowledge about current, major LMaaS, offering a synthesized overview of the licences and capabilities their interfaces offer.

Augment to Interpret: Unsupervised and Inherently Interpretable Graph Embeddings

  • paper_url: http://arxiv.org/abs/2309.16564
  • repo_url: https://github.com/euranova/augment_to_interpret
  • paper_authors: Gregory Scafarto, Madalina Ciortan, Simon Tihon, Quentin Ferre
  • for: proposing an inherently interpretable graph representation learning method, motivated by recent transparent-AI regulations.
  • methods: semantics-preserving data augmentation is learned and used to produce interpretable embeddings.
  • results: an experimental study shows that the method provides state-of-the-art performance on downstream tasks together with the benefits of interpretability.
    Abstract Unsupervised learning allows us to leverage unlabelled data, which has become abundantly available, and to create embeddings that are usable on a variety of downstream tasks. However, the typical lack of interpretability of unsupervised representation learning has become a limiting factor with regard to recent transparent-AI regulations. In this paper, we study graph representation learning and we show that data augmentation that preserves semantics can be learned and used to produce interpretations. Our framework, which we named INGENIOUS, creates inherently interpretable embeddings and eliminates the need for costly additional post-hoc analysis. We also introduce additional metrics addressing the lack of formalism and metrics in the understudied area of unsupervised-representation learning interpretability. Our results are supported by an experimental study applied to both graph-level and node-level tasks and show that interpretable embeddings provide state-of-the-art performance on subsequent downstream tasks.

Voting Network for Contour Levee Farmland Segmentation and Classification

  • paper_url: http://arxiv.org/abs/2309.16561
  • repo_url: None
  • paper_authors: Abolfazl Meyarian, Xiaohui Yuan
  • for: segmenting farmlands with contour levees from high-resolution aerial imagery.
  • methods: an end-to-end trainable network with multiple voting blocks for joint image segmentation and classification.
  • results: on images from the National Agriculture Imagery Program, the method achieves an average accuracy of 94.34%, improving the F1 score over the previous state of the art by 6.96% and 2.63% on average.
    Abstract High-resolution aerial imagery allows fine details in the segmentation of farmlands. However, small objects and features introduce distortions to the delineation of object boundaries, and larger contextual views are needed to mitigate class confusion. In this work, we present an end-to-end trainable network for segmenting farmlands with contour levees from high-resolution aerial imagery. A fusion block is devised that includes multiple voting blocks to achieve image segmentation and classification. We integrate the fusion block with a backbone and produce both semantic predictions and segmentation slices. The segmentation slices are used to perform majority voting on the predictions. The network is trained to assign the most likely class label of a segment to its pixels, learning the concept of farmlands rather than analyzing constitutive pixels separately. We evaluate our method using images from the National Agriculture Imagery Program. Our method achieved an average accuracy of 94.34%. Compared to the state-of-the-art methods, the proposed method obtains an improvement of 6.96% and 2.63% in the F1 score on average.
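    The majority-voting step described above, where every pixel in a segment receives the segment's most frequent predicted class, can be illustrated with a small NumPy sketch (array shapes and the aggregation via `np.bincount` are assumptions for illustration, not the paper's implementation):

```python
import numpy as np

def majority_vote_labels(pixel_preds, segment_ids, num_classes):
    """Assign to every pixel the majority class of its segment.

    pixel_preds: (H, W) int array of per-pixel class predictions.
    segment_ids: (H, W) int array mapping each pixel to a segment index.
    """
    out = np.empty_like(pixel_preds)
    for seg in np.unique(segment_ids):
        mask = segment_ids == seg
        votes = np.bincount(pixel_preds[mask], minlength=num_classes)
        out[mask] = votes.argmax()  # majority class wins for the whole segment
    return out

# Tiny example: the left segment votes class 1, the right segment class 0.
preds = np.array([[1, 1, 0], [1, 0, 0]])
segs  = np.array([[0, 0, 1], [0, 1, 1]])
print(majority_vote_labels(preds, segs, num_classes=2))
```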

KLoB: a Benchmark for Assessing Knowledge Locating Methods in Language Models

  • paper_url: http://arxiv.org/abs/2309.16535
  • repo_url: https://github.com/juyiming/klob
  • paper_authors: Yiming Ju, Zheng Zhang
  • for: examining whether knowledge stored in language models conforms to the locality hypothesis, and whether existing locating methods can pinpoint the parameters embedding the desired knowledge.
  • methods: KLoB, a benchmark examining three essential properties that a reliable knowledge locating method should satisfy.
  • results: KLoB can be used to evaluate existing knowledge locating methods and to reassess the validity of the locality hypothesis of factual knowledge.
    Abstract Recently, the Locate-Then-Edit paradigm has emerged as one of the main approaches to changing factual knowledge stored in language models. However, there is a lack of research on whether present locating methods can pinpoint the exact parameters embedding the desired knowledge. Moreover, although many researchers have questioned the validity of the locality hypothesis of factual knowledge, no method has been provided to test this hypothesis for more in-depth discussion and research. Therefore, we introduce KLoB, a benchmark examining three essential properties that a reliable knowledge locating method should satisfy. KLoB can serve as a benchmark for evaluating existing locating methods in language models, and it contributes a method for reassessing the validity of the locality hypothesis of factual knowledge. Our benchmark is publicly available at https://github.com/juyiming/KLoB.

MotionLM: Multi-Agent Motion Forecasting as Language Modeling

  • paper_url: http://arxiv.org/abs/2309.16534
  • repo_url: None
  • paper_authors: Ari Seff, Brian Cera, Dian Chen, Mason Ng, Aurick Zhou, Nigamaa Nayakanti, Khaled S. Refaat, Rami Al-Rfou, Benjamin Sapp
  • for: reliable forecasting of the future behavior of road agents, a critical component of safe planning for autonomous vehicles.
  • methods: multi-agent motion forecasting is cast as a language modeling task, with continuous trajectories represented as sequences of discrete motion tokens.
  • results: the proposed MotionLM establishes new state-of-the-art performance on the Waymo Open Motion Dataset, ranking first on the interactive challenge leaderboard.
    Abstract Reliable forecasting of the future behavior of road agents is a critical component to safe planning in autonomous vehicles. Here, we represent continuous trajectories as sequences of discrete motion tokens and cast multi-agent motion prediction as a language modeling task over this domain. Our model, MotionLM, provides several advantages: First, it does not require anchors or explicit latent variable optimization to learn multimodal distributions. Instead, we leverage a single standard language modeling objective, maximizing the average log probability over sequence tokens. Second, our approach bypasses post-hoc interaction heuristics where individual agent trajectory generation is conducted prior to interactive scoring. Instead, MotionLM produces joint distributions over interactive agent futures in a single autoregressive decoding process. In addition, the model's sequential factorization enables temporally causal conditional rollouts. The proposed approach establishes new state-of-the-art performance for multi-agent motion prediction on the Waymo Open Motion Dataset, ranking 1st on the interactive challenge leaderboard.
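    The core representational idea above, turning continuous trajectories into sequences of discrete motion tokens, can be sketched with a simple uniform quantization of per-step displacements (the bin count and range below are invented for illustration; the paper's actual tokenization differs in its details):

```python
import numpy as np

# Quantize per-step (dx, dy) displacements into a discrete motion vocabulary.
NUM_BINS = 13            # bins per axis (illustrative choice)
MAX_DELTA = 6.0          # metres per step covered by the bins (illustrative choice)
EDGES = np.linspace(-MAX_DELTA, MAX_DELTA, NUM_BINS + 1)

def tokenize(trajectory):
    """trajectory: (T, 2) array of positions -> (T-1,) array of token ids."""
    deltas = np.diff(trajectory, axis=0)
    ix = np.clip(np.digitize(deltas[:, 0], EDGES) - 1, 0, NUM_BINS - 1)
    iy = np.clip(np.digitize(deltas[:, 1], EDGES) - 1, 0, NUM_BINS - 1)
    return ix * NUM_BINS + iy  # one token id per (dx, dy) bin pair

def detokenize(tokens, start):
    """Approximately invert tokenize by using the bin centres."""
    centres = (EDGES[:-1] + EDGES[1:]) / 2
    dx = centres[tokens // NUM_BINS]
    dy = centres[tokens % NUM_BINS]
    return start + np.cumsum(np.stack([dx, dy], axis=1), axis=0)

traj = np.array([[0.0, 0.0], [1.1, 0.2], [2.3, 0.1], [3.0, -0.4]])
toks = tokenize(traj)
print(toks, detokenize(toks, traj[0]))
```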

Chatmap : Large Language Model Interaction with Cartographic Data

  • paper_url: http://arxiv.org/abs/2310.01429
  • repo_url: None
  • paper_authors: Eren Unlu
  • for: demonstrating the use of a large language model (LLM) to provide a linguistic interface to OpenStreetMap (OSM) data for an arbitrary urban region, allowing users to inquire about location attributes such as touristic appeal or business profitability.
  • methods: fine-tuning a relatively small-scale LLM with a small artificial dataset curated by a more capable teacher model to provide the linguistic interface to OSM data.
  • results: the study shows early signs of useful emerging abilities in this context, including embeddings of artificially curated prompts with OSM data that could be instrumental for potential geospatially aware urban Retrieval Augmented Generation (RAG) applications.
    Abstract The swift advancement and widespread availability of foundational Large Language Models (LLMs), complemented by robust fine-tuning methodologies, have catalyzed their adaptation for innovative and industrious applications. Enabling LLMs to recognize and interpret geospatial data, while offering a linguistic access to vast cartographic datasets, is of significant importance. OpenStreetMap (OSM) is the most ambitious open-source global initiative offering detailed urban and rural geographic data, curated by a community of over 10 million contributors, which constitutes a great potential for LLM applications. In this study, we demonstrate the proof of concept and details of the process of fine-tuning a relatively small scale (1B parameters) LLM with a relatively small artificial dataset curated by a more capable teacher model, in order to provide a linguistic interface to the OSM data of an arbitrary urban region. Through this interface, users can inquire about a location's attributes, covering a wide spectrum of concepts, such as its touristic appeal or the potential profitability of various businesses in that vicinity. The study aims to provide an initial guideline for such generative artificial intelligence (AI) adaptations and demonstrate early signs of useful emerging abilities in this context even in minimal computational settings. The embeddings of artificially curated prompts including OSM data are also investigated in detail, which might be instrumental for potential geospatially aware urban Retrieval Augmented Generation (RAG) applications.

From Complexity to Clarity: Analytical Expressions of Deep Neural Network Weights via Clifford’s Geometric Algebra and Convexity

  • paper_url: http://arxiv.org/abs/2309.16512
  • repo_url: None
  • paper_authors: Mert Pilanci
  • for: analyzing neural networks through geometric (Clifford) algebra and convex optimization.
  • methods: deep ReLU neural networks are trained with a standard regularized loss, and their optimal weights are characterized via convex optimization.
  • results: the optimal weights are given by the wedge product of training samples, and the training problem reduces to a convex optimization over wedge product features in which $\ell_1$ regularization selects a small subset of samples and discovers only the relevant wedge product features.
    Abstract In this paper, we introduce a novel analysis of neural networks based on geometric (Clifford) algebra and convex optimization. We show that optimal weights of deep ReLU neural networks are given by the wedge product of training samples when trained with standard regularized loss. Furthermore, the training problem reduces to convex optimization over wedge product features, which encode the geometric structure of the training dataset. This structure is given in terms of signed volumes of triangles and parallelotopes generated by data vectors. The convex problem finds a small subset of samples via $\ell_1$ regularization to discover only relevant wedge product features. Our analysis provides a novel perspective on the inner workings of deep neural networks and sheds light on the role of the hidden layers.
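    The geometric quantities this analysis is built on, signed volumes of triangles and parallelotopes generated by data vectors, are determinants. Purely to illustrate that quantity (not the paper's exact feature construction), the signed area of a triangle spanned by three 2-D points is half of a 2x2 determinant:

```python
import numpy as np

def signed_area(p, q, r):
    """Signed area of the triangle (p, q, r): positive when the points are
    in counter-clockwise order, negative when clockwise."""
    return 0.5 * np.linalg.det(np.array([q - p, r - p]))

p, q, r = np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(signed_area(p, q, r), signed_area(p, r, q))  # 0.5 and -0.5
```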

Toloka Visual Question Answering Benchmark

  • paper_url: http://arxiv.org/abs/2309.16511
  • repo_url: None
  • paper_authors: Dmitry Ustalov, Nikita Pavlichenko, Sergey Koshelev, Daniil Likhobaba, Alisa Smirnova
  • for: presenting a new crowdsourced dataset for grounding visual question answering, enabling comparison of machine learning systems against the human level of expertise.
  • methods: experiments on open-source zero-shot baseline models and a multi-phase competition at the WSDM Cup.
  • results: by the time of submission, no machine learning model outperformed the non-expert crowdsourcing baseline according to the intersection-over-union evaluation score.
    Abstract In this paper, we present Toloka Visual Question Answering, a new crowdsourced dataset allowing comparing performance of machine learning systems against human level of expertise in the grounding visual question answering task. In this task, given an image and a textual question, one has to draw the bounding box around the object correctly responding to that question. Every image-question pair contains the response, with only one correct response per image. Our dataset contains 45,199 pairs of images and questions in English, provided with ground truth bounding boxes, split into train and two test subsets. Besides describing the dataset and releasing it under a CC BY license, we conducted a series of experiments on open source zero-shot baseline models and organized a multi-phase competition at WSDM Cup that attracted 48 participants worldwide. However, by the time of paper submission, no machine learning model outperformed the non-expert crowdsourcing baseline according to the intersection over union evaluation score.
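    The intersection-over-union score used for evaluation above is straightforward to compute for a predicted and a ground-truth box; a minimal sketch (the (x_min, y_min, x_max, y_max) box format is an assumption):

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area if the boxes do not overlap).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ~= 0.143
```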

Asset Bundling for Wind Power Forecasting

  • paper_url: http://arxiv.org/abs/2309.16492
  • repo_url: None
  • paper_authors: Hanyu Zhang, Mathieu Tanneau, Chaofan Huang, V. Roshan Joseph, Shangkun Wang, Pascal Van Hentenryck
  • for: improving the accuracy of wind power forecasts in power grids, where wind generation exhibits large variability and is historically hard to predict.
  • methods: a novel Bundle-Predict-Reconcile (BPR) framework that integrates asset bundling, machine learning, and forecast reconciliation; BPR first learns an intermediate hierarchy level (the bundles), then predicts wind power at the asset, bundle, and fleet levels, and finally reconciles all forecasts to ensure consistency, which introduces an auxiliary learning task (predicting the bundle-level time series) that helps the main learning tasks.
  • results: experiments show that BPR consistently and significantly improves forecast accuracy over baselines in realistic settings, especially at the fleet level.
    Abstract The growing penetration of intermittent, renewable generation in US power grids, especially wind and solar generation, results in increased operational uncertainty. In that context, accurate forecasts are critical, especially for wind generation, which exhibits large variability and is historically harder to predict. To overcome this challenge, this work proposes a novel Bundle-Predict-Reconcile (BPR) framework that integrates asset bundling, machine learning, and forecast reconciliation techniques. The BPR framework first learns an intermediate hierarchy level (the bundles), then predicts wind power at the asset, bundle, and fleet level, and finally reconciles all forecasts to ensure consistency. This approach effectively introduces an auxiliary learning task (predicting the bundle-level time series) to help the main learning tasks. The paper also introduces new asset-bundling criteria that capture the spatio-temporal dynamics of wind power time series. Extensive numerical experiments are conducted on an industry-size dataset of 283 wind farms in the MISO footprint. The experiments consider short-term and day-ahead forecasts, and evaluates a large variety of forecasting models that include weather predictions as covariates. The results demonstrate the benefits of BPR, which consistently and significantly improves forecast accuracy over baselines, especially at the fleet level.
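    The "reconcile" step in BPR enforces consistency between asset-, bundle-, and fleet-level forecasts. One standard way to do this, shown here purely as an illustration of forecast reconciliation in general rather than the paper's exact procedure, is a least-squares projection onto the set of aggregation-consistent forecasts:

```python
import numpy as np

# Toy hierarchy: 4 assets, 2 bundles (assets {0,1} and {2,3}), 1 fleet.
# The aggregation matrix S maps bottom-level (asset) values to all levels:
# rows = [fleet, bundle0, bundle1, asset0..asset3], columns = assets.
S = np.array([
    [1, 1, 1, 1],   # fleet
    [1, 1, 0, 0],   # bundle 0
    [0, 0, 1, 1],   # bundle 1
    [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
], dtype=float)

def reconcile(y_hat, S):
    """OLS reconciliation: project independent forecasts for every level
    onto forecasts of the form S @ b, so all levels add up consistently."""
    b, *_ = np.linalg.lstsq(S, y_hat, rcond=None)  # bottom-level estimates
    return S @ b                                   # coherent forecasts

# Independent (incoherent) forecasts for fleet, bundles, and assets, in MW.
y_hat = np.array([100.0, 55.0, 40.0, 30.0, 24.0, 22.0, 20.0])
print(reconcile(y_hat, S))
```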

Augmenting LLMs with Knowledge: A survey on hallucination prevention

  • paper_url: http://arxiv.org/abs/2309.16459
  • repo_url: None
  • paper_authors: Konstantinos Andriopoulos, Johan Pouwelse
  • for: surveying how large pre-trained language models can be integrated with external knowledge sources to address issues of traditional language models such as hallucinations, inaccurate responses, and scalability challenges.
  • methods: combining large pre-trained language models with differentiable access mechanisms to external knowledge sources, including external knowledge bases and search engines; while keeping the standard objective of predicting missing tokens, these augmented models leverage diverse, possibly non-parametric external modules to enhance their contextual processing.
  • results: the survey finds that integrating language models with knowledge sources can address hallucinations, ungrounded responses, and scalability issues, while improving the models' access to and handling of knowledge on language tasks.
    Abstract Large pre-trained language models have demonstrated their proficiency in storing factual knowledge within their parameters and achieving remarkable results when fine-tuned for downstream natural language processing tasks. Nonetheless, their capacity to access and manipulate knowledge with precision remains constrained, resulting in performance disparities on knowledge-intensive tasks when compared to task-specific architectures. Additionally, the challenges of providing provenance for model decisions and maintaining up-to-date world knowledge persist as open research frontiers. To address these limitations, the integration of pre-trained models with differentiable access mechanisms to explicit non-parametric memory emerges as a promising solution. This survey delves into the realm of language models (LMs) augmented with the ability to tap into external knowledge sources, including external knowledge bases and search engines. While adhering to the standard objective of predicting missing tokens, these augmented LMs leverage diverse, possibly non-parametric external modules to augment their contextual processing capabilities, departing from the conventional language modeling paradigm. Through an exploration of current advancements in augmenting large language models with knowledge, this work concludes that this emerging research direction holds the potential to address prevalent issues in traditional LMs, such as hallucinations, un-grounded responses, and scalability challenges.

Neuro Symbolic Reasoning for Planning: Counterexample Guided Inductive Synthesis using Large Language Models and Satisfiability Solving

  • paper_url: http://arxiv.org/abs/2309.16436
  • repo_url: None
  • paper_authors: Sumit Kumar Jha, Susmit Jha, Patrick Lincoln, Nathaniel D. Bastian, Alvaro Velasquez, Rickard Ewetz, Sandeep Neema
  • for: generating formal artifacts that satisfy logical requirements, such as code, plans, and logical specifications, from natural language.
  • methods: instruct-trained large language models (LLMs) generate candidate solutions from human-provided prompts; an SMT-based reasoning engine analyzes the generated solutions, produces counterexamples when they are incorrect, and feeds that feedback back to the LLM.
  • results: evaluated on planning tasks in the blocks domain, the approach shows that non-expert users can describe problems in natural language and that combining LLMs with an SMT engine yields provably correct solutions.
    Abstract Generative large language models (LLMs) with instruct training such as GPT-4 can follow human-provided instruction prompts and generate human-like responses to these prompts. Apart from natural language responses, they have also been found to be effective at generating formal artifacts such as code, plans, and logical specifications from natural language prompts. Despite their remarkably improved accuracy, these models are still known to produce factually incorrect or contextually inappropriate results despite their syntactic coherence - a phenomenon often referred to as hallucination. This limitation makes it difficult to use these models to synthesize formal artifacts that are used in safety-critical applications. Unlike tasks such as text summarization and question-answering, bugs in code, plan, and other formal artifacts produced by LLMs can be catastrophic. We posit that we can use the satisfiability modulo theory (SMT) solvers as deductive reasoning engines to analyze the generated solutions from the LLMs, produce counterexamples when the solutions are incorrect, and provide that feedback to the LLMs exploiting the dialog capability of instruct-trained LLMs. This interaction between inductive LLMs and deductive SMT solvers can iteratively steer the LLM to generate the correct response. In our experiments, we use planning over the domain of blocks as our synthesis task for evaluating our approach. We use GPT-4, GPT3.5 Turbo, Davinci, Curie, Babbage, and Ada as the LLMs and Z3 as the SMT solver. Our method allows the user to communicate the planning problem in natural language; even the formulation of queries to SMT solvers is automatically generated from natural language. Thus, the proposed technique can enable non-expert users to describe their problems in natural language, and the combination of LLMs and SMT solvers can produce provably correct solutions.
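    The loop described above, where an LLM proposes candidates and an SMT solver checks them and returns counterexamples, follows the classic counterexample-guided pattern. The toy sketch below keeps only the shape of that loop: it verifies a far simpler specification than blocks-world planning with Z3, and replaces the LLM with a hypothetical `propose_candidate` stub:

```python
from z3 import Int, Solver, And, sat

def propose_candidate(feedback):
    """Hypothetical stand-in for the instruct-tuned LLM: it just walks through
    guesses, nudged upward whenever a counterexample is reported."""
    return 0 if feedback is None else feedback["candidate"] + 1

def verify(c):
    """Use Z3 to check the spec: for all x in [0, 10], x + c >= 12.
    We ask the solver for a violating x; `unsat` means the candidate is correct."""
    x = Int("x")
    s = Solver()
    s.add(And(x >= 0, x <= 10, x + c < 12))
    if s.check() == sat:
        return {"candidate": c, "counterexample": s.model()[x].as_long()}
    return None

feedback = None
while True:
    c = propose_candidate(feedback)
    feedback = verify(c)
    if feedback is None:
        print("verified candidate:", c)  # prints 12
        break
    print("counterexample for c =", c, "is x =", feedback["counterexample"])
```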

Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation

  • paper_url: http://arxiv.org/abs/2309.16429
  • repo_url: https://github.com/guyyariv/TempoTokens
  • paper_authors: Guy Yariv, Itai Gat, Sagie Benaim, Lior Wolf, Idan Schwartz, Yossi Adi
  • for: generating diverse and realistic videos guided by natural audio samples from a wide variety of semantic classes.
  • methods: an existing text-conditioned video generation model and a pre-trained audio encoder are bridged by a lightweight adaptor network that learns to map audio-based representations to the input representation expected by the text-to-video model, which also enables generation conditioned on text, audio, or both.
  • results: extensive validation on three datasets and a new evaluation metric (AV-Align) for measuring how well generated videos match the input audio; the method produces videos that are better aligned with the input audio in both content and timing, with higher visual quality and greater diversity.
    Abstract We consider the task of generating diverse and realistic videos guided by natural audio samples from a wide variety of semantic classes. For this task, the videos are required to be aligned both globally and temporally with the input audio: globally, the input audio is semantically associated with the entire output video, and temporally, each segment of the input audio is associated with a corresponding segment of that video. We utilize an existing text-conditioned video generation model and a pre-trained audio encoder model. The proposed method is based on a lightweight adaptor network, which learns to map the audio-based representation to the input representation expected by the text-to-video generation model. As such, it also enables video generation conditioned on text, audio, and, for the first time as far as we can ascertain, on both text and audio. We validate our method extensively on three datasets demonstrating significant semantic diversity of audio-video samples and further propose a novel evaluation metric (AV-Align) to assess the alignment of generated videos with input audio samples. AV-Align is based on the detection and comparison of energy peaks in both modalities. In comparison to recent state-of-the-art approaches, our method generates videos that are better aligned with the input sound, both with respect to content and temporal axis. We also show that videos produced by our method present higher visual quality and are more diverse.
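    AV-Align is described as detecting and comparing energy peaks in the two modalities. As a rough illustration of that idea (not the paper's exact metric), one can detect peaks in each one-dimensional energy signal and score how many audio peaks have a video peak within a small temporal window:

```python
import numpy as np
from scipy.signal import find_peaks

def peak_alignment(audio_energy, video_energy, window=2):
    """Fraction of audio-energy peaks that have a video-energy peak within
    `window` frames (both signals assumed sampled once per video frame)."""
    a_peaks, _ = find_peaks(audio_energy)
    v_peaks, _ = find_peaks(video_energy)
    if len(a_peaks) == 0:
        return 0.0
    matched = sum(1 for p in a_peaks if np.any(np.abs(v_peaks - p) <= window))
    return matched / len(a_peaks)

t = np.arange(64)
audio = np.exp(-0.5 * ((t - 20) / 2.0) ** 2) + np.exp(-0.5 * ((t - 45) / 2.0) ** 2)
video = np.exp(-0.5 * ((t - 21) / 2.0) ** 2)  # only the first event is visible
print(peak_alignment(audio, video))           # 0.5: one of two audio peaks matched
```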

Prompt-and-Align: Prompt-Based Social Alignment for Few-Shot Fake News Detection

  • paper_url: http://arxiv.org/abs/2309.16424
  • repo_url: https://github.com/jiayingwu19/prompt-and-align
  • paper_authors: Jiaying Wu, Shen Li, Ailin Deng, Miao Xiong, Bryan Hooi
  • for: proposing a prompt-based method for few-shot fake news detection that leverages both the prior knowledge in pre-trained language models (PLMs) and the social context topology.
  • methods: each news article is wrapped in a task-related textual prompt that the PLM processes to directly elicit task-specific knowledge; to supplement the PLM with social context without extra training overhead, a news proximity graph captures veracity-consistent signals in shared readerships, and the prompting predictions are aligned along the graph edges in a confidence-informed manner.
  • results: extensive experiments on three real-world benchmarks show that the method sets a new state of the art for few-shot fake news detection by significant margins, mitigating label scarcity compared with train-from-scratch approaches.
    Abstract Despite considerable advances in automated fake news detection, due to the timely nature of news, it remains a critical open question how to effectively predict the veracity of news articles based on limited fact-checks. Existing approaches typically follow a "Train-from-Scratch" paradigm, which is fundamentally bounded by the availability of large-scale annotated data. While expressive pre-trained language models (PLMs) have been adapted in a "Pre-Train-and-Fine-Tune" manner, the inconsistency between pre-training and downstream objectives also requires costly task-specific supervision. In this paper, we propose "Prompt-and-Align" (P&A), a novel prompt-based paradigm for few-shot fake news detection that jointly leverages the pre-trained knowledge in PLMs and the social context topology. Our approach mitigates label scarcity by wrapping the news article in a task-related textual prompt, which is then processed by the PLM to directly elicit task-specific knowledge. To supplement the PLM with social context without inducing additional training overheads, motivated by empirical observation on user veracity consistency (i.e., social users tend to consume news of the same veracity type), we further construct a news proximity graph among news articles to capture the veracity-consistent signals in shared readerships, and align the prompting predictions along the graph edges in a confidence-informed manner. Extensive experiments on three real-world benchmarks demonstrate that P&A sets new states-of-the-art for few-shot fake news detection performance by significant margins.
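    The confidence-informed alignment along graph edges can be pictured as smoothing each article's prompt-based prediction with those of its neighbors in the news proximity graph, trusting confident neighbors more. A minimal sketch of one such smoothing step (the exact weighting scheme is an assumption for illustration and is not the P&A formulation):

```python
import numpy as np

def align_predictions(probs, adj, alpha=0.5):
    """One smoothing step over a news proximity graph.

    probs: (N, 2) array of prompt-based class probabilities per article.
    adj:   (N, N) 0/1 adjacency matrix of the news proximity graph.
    Each node blends its own prediction with a confidence-weighted
    average of its neighbors' predictions.
    """
    conf = np.abs(probs[:, 1] - probs[:, 0])        # confidence in [0, 1]
    weights = adj * conf[None, :]                   # weight neighbors by their confidence
    norm = weights.sum(axis=1, keepdims=True)
    norm[norm == 0] = 1.0                           # isolated nodes keep their own prediction
    neighbor_avg = weights @ probs / norm
    has_neighbors = adj.sum(1, keepdims=True) > 0
    aligned = (1 - alpha) * probs + alpha * np.where(has_neighbors, neighbor_avg, probs)
    return aligned / aligned.sum(axis=1, keepdims=True)

probs = np.array([[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]])
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])   # article 1 borders 0 and 2
print(align_predictions(probs, adj))
```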

AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models

  • paper_url: http://arxiv.org/abs/2309.16414
  • repo_url: None
  • paper_authors: Jan Hendrik Metzen, Piyapat Saranrittichai, Chaithanya Kumar Mummadi
  • for: proposing a method for auto-tuning zero-shot classifiers built on vision-language models to improve performance across image classification tasks.
  • methods: class descriptors are generated from prompt templates for the CLIP vision-language model; at inference time, per-image weights are assigned to each prompt template based on statistics of class descriptor-image similarities.
  • results: the method outperforms baselines across a broad range of vision-language models, datasets, and prompt templates, consistently and by up to 3 percentage points in accuracy.
    Abstract Classifiers built upon vision-language models such as CLIP have shown remarkable zero-shot performance across a broad range of image classification tasks. Prior work has studied different ways of automatically creating descriptor sets for every class based on prompt templates, ranging from manually engineered templates over templates obtained from a large language model to templates built from random words and characters. Up until now, deriving zero-shot classifiers from the respective encoded class descriptors has remained nearly unchanged, i.e., classify to the class that maximizes cosine similarity between its averaged encoded class descriptors and the image encoding. However, weighing all class descriptors equally can be suboptimal when certain descriptors match visual clues on a given image better than others. In this work, we propose AutoCLIP, a method for auto-tuning zero-shot classifiers. AutoCLIP tunes per-image weights to each prompt template at inference time, based on statistics of class descriptor-image similarities. AutoCLIP is fully unsupervised, has very low computational overhead, and can be easily implemented in few lines of code. We show that AutoCLIP outperforms baselines across a broad range of vision-language models, datasets, and prompt templates consistently and by up to 3 percent point accuracy.
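    The key inference-time step, weighting prompt templates per image by how well their descriptors match the image embedding, can be sketched as follows (a softmax over mean descriptor-image similarities is one natural choice; AutoCLIP's actual tuning procedure may differ):

```python
import numpy as np

def softmax(z, temp=0.1):
    z = z / temp
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def weighted_zero_shot_logits(image_emb, text_embs):
    """image_emb: (D,) L2-normalized image embedding.
    text_embs: (C, T, D) L2-normalized descriptor embeddings for C classes
    and T prompt templates. Instead of averaging templates uniformly, each
    template is weighted per image by how similar its descriptors are to
    that image."""
    sims = text_embs @ image_emb                    # (C, T) descriptor-image similarities
    template_weights = softmax(sims.mean(axis=0))   # (T,) per-image template weights
    return sims @ template_weights                  # (C,) weighted class scores

rng = np.random.default_rng(0)
img = rng.normal(size=8); img /= np.linalg.norm(img)
txt = rng.normal(size=(3, 4, 8)); txt /= np.linalg.norm(txt, axis=-1, keepdims=True)
print(weighted_zero_shot_logits(img, txt).argmax())
```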

Genetic Engineering Algorithm (GEA): An Efficient Metaheuristic Algorithm for Solving Combinatorial Optimization Problems

  • paper_url: http://arxiv.org/abs/2309.16413
  • repo_url: None
  • paper_authors: Majid Sohrabi, Amir M. Fathollahi-Fard, Vasilii A. Gromov
  • for: proposing a new metaheuristic inspired by genetic engineering concepts to address the limitations of genetic algorithms on combinatorial optimization problems.
  • methods: the algorithm builds on the search procedure of the traditional GA and adds the ability to isolate, purify, insert, and express new genes based on existing ones, leading to the emergence of desired traits and the production of specific chromosomes from the selected genes.
  • results: comparative evaluations against state-of-the-art algorithms on benchmark instances show the superior performance of GEA, demonstrating its potential as an innovative and efficient solution for combinatorial optimization problems.
    Abstract Genetic Algorithms (GAs) are known for their efficiency in solving combinatorial optimization problems, thanks to their ability to explore diverse solution spaces, handle various representations, exploit parallelism, preserve good solutions, adapt to changing dynamics, handle combinatorial diversity, and provide heuristic search. However, limitations such as premature convergence, lack of problem-specific knowledge, and randomness of crossover and mutation operators make GAs generally inefficient in finding an optimal solution. To address these limitations, this paper proposes a new metaheuristic algorithm called the Genetic Engineering Algorithm (GEA) that draws inspiration from genetic engineering concepts. GEA redesigns the traditional GA while incorporating new search methods to isolate, purify, insert, and express new genes based on existing ones, leading to the emergence of desired traits and the production of specific chromosomes based on the selected genes. Comparative evaluations against state-of-the-art algorithms on benchmark instances demonstrate the superior performance of GEA, showcasing its potential as an innovative and efficient solution for combinatorial optimization problems.

Physics-Preserving AI-Accelerated Simulations of Plasma Turbulence

  • paper_url: http://arxiv.org/abs/2309.16400
  • repo_url: None
  • paper_authors: Robin Greif, Frank Jenko, Nils Thuerey
  • for: study the turbulence in fluids, gases, and plasmas with reduced computational effort
  • methods: combine Large Eddy Simulation (LES) techniques with Machine Learning (ML) to model the small-scale dynamics
  • results: reduce the computational effort by about three orders of magnitude while retaining the statistical physical properties of the turbulent system
    Abstract Turbulence in fluids, gases, and plasmas remains an open problem of both practical and fundamental importance. Its irreducible complexity usually cannot be tackled computationally in a brute-force style. Here, we combine Large Eddy Simulation (LES) techniques with Machine Learning (ML) to retain only the largest dynamics explicitly, while small-scale dynamics are described by an ML-based sub-grid-scale model. Applying this novel approach to self-driven plasma turbulence allows us to remove large parts of the inertial range, reducing the computational effort by about three orders of magnitude, while retaining the statistical physical properties of the turbulent system.

Uncertainty-Aware Decision Transformer for Stochastic Driving Environments

  • paper_url: http://arxiv.org/abs/2309.16397
  • repo_url: None
  • paper_authors: Zenan Li, Fan Nie, Qiao Sun, Fang Da, Hang Zhao
  • for: learning driving policies without active interactions, which makes offline reinforcement learning (RL) appealing for autonomous driving tasks.
  • methods: an UNcertainty-awaRE deciSion Transformer (UNREST) that plans in stochastic driving environments without introducing additional transition or complex generative models; it estimates state uncertainties via the conditional mutual information between transitions and returns and segments sequences accordingly.
  • results: experiments show UNREST's superior performance in various driving scenarios, and environmental uncertainty can be evaluated dynamically during inference for more cautious planning.
    Abstract Offline Reinforcement Learning (RL) has emerged as a promising framework for learning policies without active interactions, making it especially appealing for autonomous driving tasks. Recent successes of Transformers inspire casting offline RL as sequence modeling, which performs well in long-horizon tasks. However, they are overly optimistic in stochastic environments with incorrect assumptions that the same goal can be consistently achieved by identical actions. In this paper, we introduce an UNcertainty-awaRE deciSion Transformer (UNREST) for planning in stochastic driving environments without introducing additional transition or complex generative models. Specifically, UNREST estimates state uncertainties by the conditional mutual information between transitions and returns, and segments sequences accordingly. Discovering the `uncertainty accumulation' and `temporal locality' properties of driving environments, UNREST replaces the global returns in decision transformers with less uncertain truncated returns, to learn from true outcomes of agent actions rather than environment transitions. We also dynamically evaluate environmental uncertainty during inference for cautious planning. Extensive experimental results demonstrate UNREST's superior performance in various driving scenarios and the power of our uncertainty estimation strategy.

Differential 2D Copula Approximating Transforms via Sobolev Training: 2-Cats Networks

  • paper_url: http://arxiv.org/abs/2309.16391
  • repo_url: https://github.com/flaviovdf/copulae
  • paper_authors: Flavio Figueiredo, José Geraldo Fernandes, Jackson Silva, Renato M. Assunção
  • for: studying how neural networks (NNs) can non-parametrically approximate two-dimensional copulas.
  • methods: the 2-Cats approach, inspired by physics-informed neural networks and Sobolev training, approximates the output of a 2D copula non-parametrically while respecting the mathematical properties of a copula C.
  • results: experiments show that 2-Cats estimates the output of a 2D copula better than existing methods.
    Abstract Copulas are a powerful statistical tool that captures dependencies across data dimensions. When applying Copulas, we can estimate multivariate distribution functions by initially estimating independent marginals, an easy task, and then a single copulating function, $C$, to connect the marginals, a hard task. For two-dimensional data, a copula is a two-increasing function of the form $C: (u,v)\in \mathbf{I}^2 \rightarrow \mathbf{I}$, where $\mathbf{I} = [0, 1]$. In this paper, we show how Neural Networks (NNs) can approximate any two-dimensional copula non-parametrically. Our approach, denoted as 2-Cats, is inspired by the Physics-Informed Neural Networks and Sobolev Training literature. Not only do we show that we can estimate the output of a 2d Copula better than the state-of-the-art, our approach is non-parametric and respects the mathematical properties of a Copula $C$.
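    A copula C maps the unit square to [0, 1], and its density is the mixed partial derivative d^2C/(du dv), which is what connects this line of work to Sobolev-style training on derivatives. The sketch below shows how that derivative can be taken through a small network with PyTorch autograd; unlike the 2-Cats model, this toy network does not enforce the 2-increasing or boundary conditions a true copula must satisfy:

```python
import torch

# A small network mapping (u, v) in the unit square to a value in [0, 1].
net = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1), torch.nn.Sigmoid(),
)

def copula_density(u, v):
    """c(u, v) = d^2 C / (du dv), computed with autograd."""
    u = u.clone().requires_grad_(True)
    v = v.clone().requires_grad_(True)
    C = net(torch.stack([u, v], dim=-1)).squeeze(-1)
    dC_du, = torch.autograd.grad(C.sum(), u, create_graph=True)
    d2C_dudv, = torch.autograd.grad(dC_du.sum(), v, create_graph=True)
    return d2C_dudv

u = torch.rand(5)
v = torch.rand(5)
print(copula_density(u, v))  # density values; may be negative for this untrained net
```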

RLLTE: Long-Term Evolution Project of Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.16382
  • repo_url: None
  • paper_authors: Mingqi Yuan, Zequn Zhang, Yang Xu, Shihao Luo, Bo Li, Xin Jin, Wenjun Zeng
  • for: providing a long-term evolution, extremely modular, open-source reinforcement learning (RL) framework with a complete ecosystem for RL research and application.
  • methods: RLLTE decouples RL algorithms completely from the exploitation-exploration perspective and provides a large number of components to accelerate algorithm development and evolution.
  • results: RLLTE is the first RL framework to build a complete ecosystem, including model training, evaluation, deployment, a benchmark hub, and a large language model (LLM)-empowered copilot, which is expected to set standards for RL engineering practice and be highly stimulating for industry and academia.
    Abstract We present RLLTE: a long-term evolution, extremely modular, and open-source framework for reinforcement learning (RL) research and application. Beyond delivering top-notch algorithm implementations, RLLTE also serves as a toolkit for developing algorithms. More specifically, RLLTE decouples the RL algorithms completely from the exploitation-exploration perspective, providing a large number of components to accelerate algorithm development and evolution. In particular, RLLTE is the first RL framework to build a complete and luxuriant ecosystem, which includes model training, evaluation, deployment, benchmark hub, and large language model (LLM)-empowered copilot. RLLTE is expected to set standards for RL engineering practice and be highly stimulative for industry and academia.

Conditional normalizing flows for IceCube event reconstruction

  • paper_url: http://arxiv.org/abs/2309.16380
  • repo_url: https://github.com/asem010/legend-pice
  • paper_authors: Thorsten Glüsenkamp
  • for: inferring the direction and energy of charged-current electron and muon neutrino events using conditional normalizing flows.
  • methods: conditional normalizing flows derive a posterior distribution for each individual event from the raw data, including systematic uncertainties; the differential entropy and the KL-divergence to its maximum entropy approximation are used to interpret the results.
  • results: in the 1 TeV to 100 TeV energy range, shower directional reconstruction potentially benefits the most from normalizing flows because of azimuth-zenith asymmetries that previous analyses neglected by assuming symmetrical contours.
    Abstract The IceCube Neutrino Observatory is a cubic-kilometer high-energy neutrino detector deployed in the Antarctic ice. Two major event classes are charged-current electron and muon neutrino interactions. In this contribution, we discuss the inference of direction and energy for these classes using conditional normalizing flows. They allow to derive a posterior distribution for each individual event based on the raw data that can include systematic uncertainties, which makes them very promising for next-generation reconstructions. For each normalizing flow we use the differential entropy and the KL-divergence to its maximum entropy approximation to interpret the results. The normalizing flows correctly incorporate complex optical properties of the Antarctic ice and their relation to the embedded detector. For showers, the differential entropy increases in regions of high photon absorption and decreases in clear ice. For muons, the differential entropy strongly correlates with the contained track length. Coverage is maintained, even for low photon counts and highly asymmetrical contour shapes. For high-photon counts, the distributions get narrower and become more symmetrical, as expected from the asymptotic theorem of Bernstein-von-Mises. For shower directional reconstruction, we find the region between 1 TeV and 100 TeV to potentially benefit the most from normalizing flows because of azimuth-zenith asymmetries which have been neglected in previous analyses by assuming symmetrical contours. Events in this energy range play a vital role in the recent discovery of the galactic plane diffuse neutrino emission.
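    The differential entropy used above to interpret each event's posterior can be estimated by Monte Carlo as H = -E[log q(x)] whenever the posterior exposes sampling and log-density, as a normalizing flow does. A sketch with a Gaussian stand-in for the flow (so the estimate can be checked against the closed form):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Stand-in "posterior" for one event; a trained conditional flow would expose
# the same two operations: draw samples and evaluate log-density.
posterior = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.3], [0.3, 0.5]])

samples = posterior.rvs(size=100_000, random_state=0)
h_mc = -posterior.logpdf(samples).mean()   # H = -E[log q(x)]
h_exact = posterior.entropy()              # closed form for the Gaussian case
print(h_mc, h_exact)
```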

Epistemic Logic Programs: a study of some properties

  • paper_url: http://arxiv.org/abs/2309.16344
  • repo_url: None
  • paper_authors: Stefania Costantini, Andrea Formisano
  • for: studying Epistemic Logic Programs (ELPs), which extend Answer Set Programming (ASP) with epistemic operators, and the different characterizations of their world-view semantics.
  • methods: different semantic approaches to world views are compared, together with semantic properties that any ELP semantics should satisfy, such as the Epistemic Splitting Property, which allows world views to be computed modularly.
  • results: the paper analyzes shifting from a bottom-up to a top-down perspective on splitting, proposes a basic top-down approach proven equivalent to the bottom-up one, and an extended approach that applies to many existing semantics, operates similarly to "traditional" ASP, coincides with the bottom-up notion of splitting on Epistemically Stratified Programs, and better adheres to common ASP programming methodology.
    Abstract Epistemic Logic Programs (ELPs), extend Answer Set Programming (ASP) with epistemic operators. The semantics of such programs is provided in terms of world views, which are sets of belief sets, i.e., syntactically, sets of sets of atoms. Different semantic approaches propose different characterizations of world views. Recent work has introduced semantic properties that should be met by any semantics for ELPs, like the Epistemic Splitting Property, that, if satisfied, allows to modularly compute world views in a bottom-up fashion, analogously to ``traditional'' ASP. We analyze the possibility of changing the perspective, shifting from a bottom-up to a top-down approach to splitting. We propose a basic top-down approach, which we prove to be equivalent to the bottom-up one. We then propose an extended approach, where our new definition: (i) is provably applicable to many of the existing semantics; (ii) operates similarly to ``traditional'' ASP; (iii) provably coincides under any semantics with the bottom-up notion of splitting at least on the class of Epistemically Stratified Programs (which are, intuitively, those where the use of epistemic operators is stratified); (iv) better adheres to common ASP programming methodology.

End-to-end Risk Prediction of Atrial Fibrillation from the 12-Lead ECG by Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2309.16335
  • repo_url: https://github.com/mygithth27/af-risk-prediction-by-ecg-dnn
  • paper_authors: Theogene Habineza, Antônio H. Ribeiro, Daniel Gedon, Joachim A. Behar, Antonio Luiz P. Ribeiro, Thomas B. Schön
  • For: The paper aims to develop and evaluate a machine learning algorithm to predict the risk of developing atrial fibrillation (AF) from electrocardiogram (ECG) data.
  • Methods: The authors use a deep neural network model to analyze the ECG data and evaluate the risk of AF, together with a survival model to predict the probability of developing AF over time.
  • Results: The model achieves an area under the receiver operating characteristic curve (AUC) score of 0.845 in identifying patients who will develop AF in the future; patients in the high-risk group are 50% more likely to develop AF within 40 weeks, while patients in the minimal-risk group have more than an 85% chance of remaining AF-free for up to seven years.
    Abstract Background: Atrial fibrillation (AF) is one of the most common cardiac arrhythmias that affects millions of people each year worldwide and it is closely linked to increased risk of cardiovascular diseases such as stroke and heart failure. Machine learning methods have shown promising results in evaluating the risk of developing atrial fibrillation from the electrocardiogram. We aim to develop and evaluate one such algorithm on a large CODE dataset collected in Brazil. Results: The deep neural network model identified patients without indication of AF in the presented ECG but who will develop AF in the future with an AUC score of 0.845. From our survival model, we obtain that patients in the high-risk group (i.e. with the probability of a future AF case being greater than 0.7) are 50% more likely to develop AF within 40 weeks, while patients belonging to the minimal-risk group (i.e. with the probability of a future AF case being less than or equal to 0.1) have more than 85% chance of remaining AF free up until after seven years. Conclusion: We developed and validated a model for AF risk prediction. If applied in clinical practice, the model possesses the potential of providing valuable and useful information in decision-making and patient management processes.

Augmenting transformers with recursively composed multi-grained representations

  • paper_url: http://arxiv.org/abs/2309.16319
  • repo_url: https://github.com/ant-research/structuredlm_rtdt
  • paper_authors: Xiang Hu, Qingyang Zhu, Kewei Tu, Wei Wu
  • for: proposing the Recursive Composition Augmented Transformer (ReCAT), which explicitly models hierarchical syntactic structures of raw texts without relying on gold trees during learning or inference.
  • methods: a novel Contextual Inside-Outside (CIO) layer learns contextualized span representations through bottom-up and top-down passes; stacking several CIO layers between the embedding layer and the attention layers lets the model perform deep intra-span and inter-span interactions and produce multi-grained, fully contextualized representations.
  • results: experiments show that ReCAT significantly outperforms vanilla Transformer models on all span-level tasks, as well as baselines that combine recursive networks with Transformers on natural language inference; the hierarchical structures it induces are strongly consistent with human-annotated syntactic trees, indicating good interpretability.
    Abstract We present ReCAT, a recursive composition augmented Transformer that is able to explicitly model hierarchical syntactic structures of raw texts without relying on gold trees during both learning and inference. Existing research along this line restricts data to follow a hierarchical tree structure and thus lacks inter-span communications. To overcome the problem, we propose a novel contextual inside-outside (CIO) layer that learns contextualized representations of spans through bottom-up and top-down passes, where a bottom-up pass forms representations of high-level spans by composing low-level spans, while a top-down pass combines information inside and outside a span. By stacking several CIO layers between the embedding layer and the attention layers in Transformer, the ReCAT model can perform both deep intra-span and deep inter-span interactions, and thus generate multi-grained representations fully contextualized with other spans. Moreover, the CIO layers can be jointly pre-trained with Transformers, making ReCAT enjoy scaling ability, strong performance, and interpretability at the same time. We conduct experiments on various sentence-level and span-level tasks. Evaluation results indicate that ReCAT can significantly outperform vanilla Transformer models on all span-level tasks and baselines that combine recursive networks with Transformers on natural language inference tasks. More interestingly, the hierarchical structures induced by ReCAT exhibit strong consistency with human-annotated syntactic trees, indicating good interpretability brought by the CIO layers.

Efficiency Separation between RL Methods: Model-Free, Model-Based and Goal-Conditioned

  • paper_url: http://arxiv.org/abs/2309.16291
  • repo_url: None
  • paper_authors: Brieuc Pinon, Raphaël Jungers, Jean-Charles Delvenne
  • for: Proves a fundamental limitation on the efficiency of a broad class of reinforcement learning (RL) algorithms.
  • methods: The limitation applies to model-free RL methods as well as a broad range of model-based methods, such as planning with tree search.
  • results: On a constructed family of RL problems, these methods need a number of environment interactions exponential in the horizon to find optimal behavior, whereas a method not tailored to this family can solve the problems efficiently; the limitation does not apply to goal-conditioned methods or algorithms that build an inverse dynamics model.
    Abstract We prove a fundamental limitation on the efficiency of a wide class of Reinforcement Learning (RL) algorithms. This limitation applies to model-free RL methods as well as a broad range of model-based methods, such as planning with tree search. Under an abstract definition of this class, we provide a family of RL problems for which these methods suffer a lower bound exponential in the horizon for their interactions with the environment to find an optimal behavior. However, there exists a method, not tailored to this specific family of problems, which can efficiently solve the problems in the family. In contrast, our limitation does not apply to several types of methods proposed in the literature, for instance, goal-conditioned methods or other algorithms that construct an inverse dynamics model.

LawBench: Benchmarking Legal Knowledge of Large Language Models

  • paper_url: http://arxiv.org/abs/2309.16289
  • repo_url: https://github.com/open-compass/lawbench
  • paper_authors: Zhiwei Fei, Xiaoyu Shen, Dawei Zhu, Fengzhe Zhou, Zhuo Han, Songyang Zhang, Kai Chen, Zongwen Shen, Jidong Ge
  • for: Evaluation and development of large language models (LLMs) in the legal domain.
  • methods: Proposes LawBench, a comprehensive benchmark that assesses LLMs' legal capabilities at three cognitive levels (legal knowledge memorization, understanding, and application) across 20 tasks of 5 task types.
  • results: GPT-4 performs best in the legal domain, surpassing the other evaluated LLMs by a large margin; however, large gaps remain on specific legal tasks, and further fine-tuning and improvement are needed before the models deliver reliable results.
    Abstract Large language models (LLMs) have demonstrated strong capabilities in various aspects. However, when applying them to the highly specialized, safe-critical legal domain, it is unclear how much legal knowledge they possess and whether they can reliably perform legal-related tasks. To address this gap, we propose a comprehensive evaluation benchmark LawBench. LawBench has been meticulously crafted to have precise assessment of the LLMs' legal capabilities from three cognitive levels: (1) Legal knowledge memorization: whether LLMs can memorize needed legal concepts, articles and facts; (2) Legal knowledge understanding: whether LLMs can comprehend entities, events and relationships within legal text; (3) Legal knowledge applying: whether LLMs can properly utilize their legal knowledge and make necessary reasoning steps to solve realistic legal tasks. LawBench contains 20 diverse tasks covering 5 task types: single-label classification (SLC), multi-label classification (MLC), regression, extraction and generation. We perform extensive evaluations of 51 LLMs on LawBench, including 20 multilingual LLMs, 22 Chinese-oriented LLMs and 9 legal specific LLMs. The results show that GPT-4 remains the best-performing LLM in the legal domain, surpassing the others by a significant margin. While fine-tuning LLMs on legal specific text brings certain improvements, we are still a long way from obtaining usable and reliable LLMs in legal tasks. All data, model predictions and evaluation code are released in https://github.com/open-compass/LawBench/. We hope this benchmark provides in-depth understanding of the LLMs' domain-specified capabilities and speed up the development of LLMs in the legal domain.

High Throughput Training of Deep Surrogates from Large Ensemble Runs

  • paper_url: http://arxiv.org/abs/2309.16743
  • repo_url: None
  • paper_authors: Lucas Meyer, Marc Schouler, Robert Alexander Caulk, Alejandro Ribés, Bruno Raffin
  • for: Accelerating numerical solvers with deep surrogate models.
  • methods: An open-source framework that trains surrogates online from a large ensemble of simulation runs, exploiting multiple levels of parallelism and streaming the generated data directly to the learner, with a training reservoir that mitigates the bias inherent to streaming while keeping GPU throughput high.
  • results: When training a fully connected network as a surrogate for the heat equation, the approach processes 8 TB of data in 2 hours, improves accuracy by 47% and multiplies batch throughput by 13 compared to a traditional offline procedure.
    Abstract Recent years have seen a surge in deep learning approaches to accelerate numerical solvers, which provide faithful but computationally intensive simulations of the physical world. These deep surrogates are generally trained in a supervised manner from limited amounts of data slowly generated by the same solver they intend to accelerate. We propose an open-source framework that enables the online training of these models from a large ensemble run of simulations. It leverages multiple levels of parallelism to generate rich datasets. The framework avoids I/O bottlenecks and storage issues by directly streaming the generated data. A training reservoir mitigates the inherent bias of streaming while maximizing GPU throughput. Experiment on training a fully connected network as a surrogate for the heat equation shows the proposed approach enables training on 8TB of data in 2 hours with an accuracy improved by 47% and a batch throughput multiplied by 13 compared to a traditional offline procedure.
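The streaming idea in the abstract can be illustrated with a classic reservoir-sampling buffer: samples arrive from ensemble runs, are retained with equal probability once the buffer is full, and minibatches are drawn uniformly for training. This is a hedged sketch of the general mechanism with hypothetical names, not the framework's actual implementation.

```python
# A minimal sketch of a training reservoir fed by streamed simulation samples.
import random

class TrainingReservoir:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.samples = []
        self.seen = 0

    def add(self, sample):
        # Classic reservoir sampling: every streamed sample has equal probability
        # of being retained once the buffer is full, which mitigates ordering bias.
        self.seen += 1
        if len(self.samples) < self.capacity:
            self.samples.append(sample)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.samples[j] = sample

    def batch(self, batch_size: int):
        return random.sample(self.samples, min(batch_size, len(self.samples)))

# Hypothetical usage: stream (input, output) pairs from ensemble runs and train from the buffer
# without ever writing the data to disk.
reservoir = TrainingReservoir(capacity=100_000)
# for sample in simulation_stream(): reservoir.add(sample)
# batch = reservoir.batch(256); train_step(surrogate, batch)
```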

Generalizable Heterogeneous Federated Cross-Correlation and Instance Similarity Learning

  • paper_url: http://arxiv.org/abs/2309.16286
  • repo_url: https://github.com/wenkehuang/fccl
  • paper_authors: Wenke Huang, Mang Ye, Zekun Shi, Bo Du
  • for: Addresses two key challenges in federated learning: model heterogeneity and catastrophic forgetting.
  • methods: Proposes FCCL+, federated correlation and similarity learning with non-target distillation: irrelevant unlabeled public data is used for communication between heterogeneous participants, cross-correlation matrices and instance similarity distributions are aligned at both the logit and feature levels, and Federated Non-Target Distillation is introduced in the local update stage to retain inter-domain knowledge while avoiding optimization conflicts.
  • results: On a comprehensive benchmark covering four domain-shift scenarios in both heterogeneous and homogeneous federated settings, FCCL+ outperforms representative baselines, demonstrating the effectiveness of its modules.
    Abstract Federated learning is an important privacy-preserving multi-party learning paradigm, involving collaborative learning with others and local updating on private data. Model heterogeneity and catastrophic forgetting are two crucial challenges, which greatly limit the applicability and generalizability. This paper presents a novel FCCL+, federated correlation and similarity learning with non-target distillation, facilitating both intra-domain discriminability and inter-domain generalization. For the heterogeneity issue, we leverage irrelevant unlabeled public data for communication between the heterogeneous participants. We construct a cross-correlation matrix and align instance similarity distributions on both logit and feature levels, which effectively overcomes the communication barrier and improves generalization ability. For catastrophic forgetting in the local updating stage, FCCL+ introduces Federated Non Target Distillation, which retains inter-domain knowledge while avoiding the optimization conflict issue, fully distilling privileged inter-domain information by depicting posterior class relations. Considering that there is no standard benchmark for evaluating existing heterogeneous federated learning under the same setting, we present a comprehensive benchmark with extensive representative methods under four domain shift scenarios, supporting both heterogeneous and homogeneous federated settings. Empirical results demonstrate the superiority of our method and the efficiency of its modules in various scenarios.
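A generic sketch of aligning a cross-correlation matrix on logits computed over a shared public batch is shown below; it follows a Barlow-Twins-style decorrelation pattern and is only an illustration of the idea, not the exact FCCL+ objective.

```python
# A minimal sketch, assuming two participants' logits on the same unlabeled public batch.
import torch

def cross_correlation_loss(logits_a: torch.Tensor, logits_b: torch.Tensor,
                           off_diag_weight: float = 0.005) -> torch.Tensor:
    # logits_a, logits_b: (batch, num_classes) outputs of two participants.
    a = (logits_a - logits_a.mean(0)) / (logits_a.std(0) + 1e-6)
    b = (logits_b - logits_b.mean(0)) / (logits_b.std(0) + 1e-6)
    c = a.T @ b / a.shape[0]                                   # cross-correlation matrix
    on_diag = ((torch.diagonal(c) - 1.0) ** 2).sum()           # matched dimensions should correlate
    off_diag = (c ** 2).sum() - (torch.diagonal(c) ** 2).sum() # mismatched dimensions should not
    return on_diag + off_diag_weight * off_diag
```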

Automatic Identification of Stone-Handling Behaviour in Japanese Macaques Using LabGym Artificial Intelligence

  • paper_url: http://arxiv.org/abs/2310.07812
  • repo_url: None
  • paper_authors: Théo Ardoin, Cédric Sueur
  • for: Evaluates the suitability of the LabGym tool for primate behaviour analysis, using Japanese macaques as the model species.
  • methods: Develops a behavioural analysis protocol and model that uses LabGym to detect stone-handling behaviour in Japanese macaques from video.
  • results: The model detects stone-handling behaviour with a high degree of accuracy, although time constraints prevented the collection of quantitative data needed for definitive conclusions.
    Abstract The latest advancements in artificial intelligence technology have opened doors to the analysis of intricate behaviours. In light of this, ethologists are actively exploring the potential of these innovations to streamline the time-intensive process of behavioural analysis using video data. In the realm of primatology, several tools have been developed for this purpose. Nonetheless, each of these tools grapples with technical constraints that we aim to surmount. To address these limitations, we have established a comprehensive protocol designed to harness the capabilities of a cutting-edge tool, LabGym. Our primary objective was to evaluate LabGym's suitability for the analysis of primate behaviour, with a focus on Japanese macaques as our model subjects. We have successfully developed a model that demonstrates a high degree of accuracy in detecting Japanese macaques' stone-handling behaviour. Our behavioural analysis model was completed as per our initial expectations, and LabGym succeeded in recognising stone-handling behaviour in videos. However, it is important to note that our study's ability to draw definitive conclusions regarding the quality of the behavioural analysis is hampered by the absence of quantitative data within the specified timeframe. Nevertheless, our model represents, as far as our knowledge extends, the pioneering endeavour in leveraging LabGym for the analysis of primate behaviours. It lays the groundwork for potential future research in this promising field.

Optimizing Multicarrier Multiantenna Systems for LoS Channel Charting

  • paper_url: http://arxiv.org/abs/2310.03762
  • repo_url: None
  • paper_authors: Taha Yassine, Luc Le Magoarou, Matthieu Crussière, Stephane Paquelet
  • for: Proposes how to learn channel charts in multicarrier multiantenna systems so that nearby points in the chart correspond to spatially close user equipments (UEs).
  • methods: Builds on a distance measure between channel vectors; the recently proposed phase-insensitive (PI) distance has good properties but suffers from ambiguities due to its periodic and oscillatory behaviour, which can make distant UEs appear close.
  • results: Provides a thorough theoretical analysis of the PI distance and its limitations, derives design guidelines for systems capable of learning high-quality charts, and validates the conclusions experimentally on synthetic and realistic data in several scenarios.
    Abstract Channel charting (CC) consists in learning a mapping between the space of raw channel observations, made available from pilot-based channel estimation in multicarrier multiantenna system, and a low-dimensional space where close points correspond to channels of user equipments (UEs) close spatially. Among the different methods of learning this mapping, some rely on a distance measure between channel vectors. Such a distance should reliably reflect the local spatial neighborhoods of the UEs. The recently proposed phase-insensitive (PI) distance exhibits good properties in this regards, but suffers from ambiguities due to both its periodic and oscillatory aspects, making users far away from each other appear closer in some cases. In this paper, a thorough theoretical analysis of the said distance and its limitations is provided, giving insights on how they can be mitigated. Guidelines for designing systems capable of learning quality charts are consequently derived. Experimental validation is then conducted on synthetic and realistic data in different scenarios.
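A minimal sketch of a correlation-based, phase-insensitive dissimilarity between channel vectors is shown below; it is only meant to illustrate why removing global phase matters, and is not necessarily the exact PI distance analysed in the paper.

```python
# A minimal sketch (assumption: a generic phase-insensitive dissimilarity, not the paper's definition).
import numpy as np

def phase_insensitive_dissimilarity(h1: np.ndarray, h2: np.ndarray) -> float:
    # h1, h2: complex channel vectors (e.g. stacked antenna/subcarrier coefficients).
    # Taking the magnitude of the inner product removes any global phase offset.
    num = np.abs(np.vdot(h1, h2))
    den = np.linalg.norm(h1) * np.linalg.norm(h2)
    return 1.0 - num / den            # 0 when the channels are colinear up to a phase

h1 = np.array([1 + 1j, 0.5 - 0.2j])
h2 = np.exp(1j * 0.7) * h1            # same channel with a global phase shift
print(phase_insensitive_dissimilarity(h1, h2))   # ~0: the phase shift is ignored
```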

UPB @ ACTI: Detecting Conspiracies using fine tuned Sentence Transformers

  • paper_url: http://arxiv.org/abs/2309.16275
  • repo_url: None
  • paper_authors: Andrei Paraschiv, Mihai Dascalu
  • for: Conspiracy theory detection, to support information integrity and societal trust.
  • methods: Pre-trained sentence Transformer models combined with data augmentation techniques.
  • results: In the ACTI @ EVALITA 2023 shared task, the approach achieved F1 scores of 85.71% for binary classification and 91.23% for fine-grained conspiracy topic classification, securing first place and surpassing the other competing systems.
    Abstract Conspiracy theories have become a prominent and concerning aspect of online discourse, posing challenges to information integrity and societal trust. As such, we address conspiracy theory detection as proposed by the ACTI @ EVALITA 2023 shared task. The combination of pre-trained sentence Transformer models and data augmentation techniques enabled us to secure first place in the final leaderboard of both sub-tasks. Our methodology attained F1 scores of 85.71% in the binary classification and 91.23% for the fine-grained conspiracy topic classification, surpassing other competing systems.

Cooperation Dynamics in Multi-Agent Systems: Exploring Game-Theoretic Scenarios with Mean-Field Equilibria

  • paper_url: http://arxiv.org/abs/2309.16263
  • repo_url: https://github.com/dawnorak/marl-kart-simulator
  • paper_authors: Vaigarai Sathi, Sabahat Shaik, Jaswanth Nidamanuri
  • for: investigate strategies to invoke cooperation in game-theoretic scenarios, such as the Iterated Prisoner’s Dilemma, where agents must optimize both individual and group outcomes.
  • methods: analyze existing cooperative strategies for their effectiveness in promoting group-oriented behavior in repeated games, and propose modifications that encourage group rewards while also resulting in higher individual gains.
  • results: extend the study to scenarios with exponentially growing agent populations ($N \longrightarrow +\infty$), where traditional computation and equilibrium determination are challenging, and establish equilibrium solutions and reward structures for infinitely large agent sets in repeated games using mean-field game theory. Offer practical insights through simulations using the Multi Agent-Posthumous Credit Assignment trainer, and explore adapting simulation algorithms to create scenarios favoring cooperation for group rewards.
    Abstract Cooperation is fundamental in Multi-Agent Systems (MAS) and Multi-Agent Reinforcement Learning (MARL), often requiring agents to balance individual gains with collective rewards. In this regard, this paper aims to investigate strategies to invoke cooperation in game-theoretic scenarios, namely the Iterated Prisoner's Dilemma, where agents must optimize both individual and group outcomes. Existing cooperative strategies are analyzed for their effectiveness in promoting group-oriented behavior in repeated games. Modifications are proposed where encouraging group rewards will also result in a higher individual gain, addressing real-world dilemmas seen in distributed systems. The study extends to scenarios with exponentially growing agent populations ($N \longrightarrow +\infty$), where traditional computation and equilibrium determination are challenging. Leveraging mean-field game theory, equilibrium solutions and reward structures are established for infinitely large agent sets in repeated games. Finally, practical insights are offered through simulations using the Multi Agent-Posthumous Credit Assignment trainer, and the paper explores adapting simulation algorithms to create scenarios favoring cooperation for group rewards. These practical implementations bridge theoretical concepts with real-world applications.

QonFusion – Quantum Approaches to Gaussian Random Variables: Applications in Stable Diffusion and Brownian Motion

  • paper_url: http://arxiv.org/abs/2309.16258
  • repo_url: https://github.com/BoltzmannEntropy/QonFusion
  • paper_authors: Shlomo Kashani
  • For: The paper proposes a strategy for generating Gaussian random variables (GRVs) using non-parametric quantum circuits, as an alternative to conventional pseudorandom number generators (PRNGs) such as the torch.rand function in PyTorch.
  • Methods: The paper introduces a Quantum Gaussian Random Variable Generator that fulfills dual roles in simulating both Stable Diffusion (SD) and Brownian Motion (BM), diverging from previous methods that use parametric quantum circuits (PQCs) and variational quantum eigensolvers (VQEs). The proposed method does not require a computationally demanding optimization process to tune parameters.
  • Results: The paper presents QonFusion, a Python library that facilitates assimilating the proposed methodology into existing computational frameworks, and validates QonFusion through extensive statistical testing, confirming the statistical equivalence of the Gaussian samples from the quantum approach to classical counterparts within defined significance limits.
    Abstract In the present study, we delineate a strategy focused on non-parametric quantum circuits for the generation of Gaussian random variables (GRVs). This quantum-centric approach serves as a substitute for conventional pseudorandom number generators (PRNGs), such as the \textbf{torch.rand} function in PyTorch. The principal theme of our research is the incorporation of Quantum Random Number Generators (QRNGs) into classical models of diffusion. Notably, our Quantum Gaussian Random Variable Generator fulfills dual roles, facilitating simulations in both Stable Diffusion (SD) and Brownian Motion (BM). This diverges markedly from prevailing methods that utilize parametric quantum circuits (PQCs), often in conjunction with variational quantum eigensolvers (VQEs). Although conventional techniques can accurately approximate ground states in complex systems or model elaborate probability distributions, they require a computationally demanding optimization process to tune parameters. Our non-parametric strategy obviates this necessity. To facilitate assimilating our methodology into existing computational frameworks, we put forward QonFusion, a Python library congruent with both PyTorch and PennyLane, functioning as a bridge between classical and quantum computational paradigms. We validate QonFusion through extensive statistical testing, including tests which confirm the statistical equivalence of the Gaussian samples from our quantum approach to classical counterparts within defined significance limits. QonFusion is available at \url{https://boltzmannentropy.github.io/qonfusion.github.io/} to reproduce all findings here.
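Since the abstract says QonFusion is congruent with PyTorch and PennyLane, a hedged sketch of the general idea is shown below: draw uniform bits from a non-parametric circuit (Hadamards plus computational-basis sampling) and map them to Gaussian samples with a Box-Muller transform. This is an illustration under those assumptions, not QonFusion's actual circuits or API.

```python
# A minimal sketch (not QonFusion's implementation): quantum uniform bits -> Gaussian samples.
import numpy as np
import pennylane as qml

N_QUBITS = 16          # bits per uniform sample
SHOTS = 2048           # uniform samples per circuit execution

dev = qml.device("default.qubit", wires=N_QUBITS, shots=SHOTS)

@qml.qnode(dev)
def uniform_bits():
    # Hadamard on every wire gives a uniform superposition, so computational-basis
    # measurements yield uniformly random bitstrings.
    for w in range(N_QUBITS):
        qml.Hadamard(wires=w)
    return qml.sample()                 # shape (SHOTS, N_QUBITS), entries in {0, 1}

def quantum_gaussians():
    bits = np.asarray(uniform_bits())
    weights = 2.0 ** -np.arange(1, N_QUBITS + 1)
    u = bits @ weights                  # uniform numbers in [0, 1)
    u1, u2 = np.clip(u[0::2], 1e-12, 1.0), u[1::2]
    # Box-Muller transform: two uniforms -> two independent standard normals.
    z0 = np.sqrt(-2.0 * np.log(u1)) * np.cos(2.0 * np.pi * u2)
    z1 = np.sqrt(-2.0 * np.log(u1)) * np.sin(2.0 * np.pi * u2)
    return np.concatenate([z0, z1])

samples = quantum_gaussians()
print(samples.mean(), samples.std())    # should be close to 0 and 1
```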

Nondestructive chicken egg fertility detection using CNN-transfer learning algorithms

  • paper_url: http://arxiv.org/abs/2309.16257
  • repo_url: None
  • paper_authors: Shoffan Saifullah, Rafal Drezewski, Anton Yudhana, Andri Pranolo, Wilis Kaswijanti, Andiko Putro Suryotomo, Seno Aji Putra, Alin Khaliduzzaman, Anton Satria Prabuwono, Nathalie Japkowicz
  • for: Explores CNN transfer learning for nondestructive chicken egg fertility detection to support precision poultry hatchery practices.
  • methods: Four models (VGG16, ResNet50, InceptionNet, and MobileNet) were trained and evaluated on a dataset of 200 single-egg images using augmented images (rotation, flip, scale, translation, and reflection).
  • results: All models achieved high training accuracy, but performance varied on the testing set; InceptionNet performed best overall across all evaluation metrics, reaching a testing accuracy of 0.98 for classifying fertile and non-fertile eggs.
    Abstract This study explored the application of CNN-Transfer Learning for nondestructive chicken egg fertility detection for precision poultry hatchery practices. Four models, VGG16, ResNet50, InceptionNet, and MobileNet, were trained and evaluated on a dataset (200 single egg images) using augmented images (rotation, flip, scale, translation, and reflection). Although the training results demonstrated that all models achieved high accuracy, indicating their ability to accurately learn and classify chicken eggs' fertility state, variations in accuracy and performance were observed when the models were evaluated on the testing set. InceptionNet exhibited the best overall performance, accurately classifying fertile and non-fertile eggs. It demonstrated excellent performance in both training and testing sets in all parameters of the evaluation metrics. In the testing set, it achieved an accuracy of 0.98, a sensitivity of 1 for detecting fertile eggs, and a specificity of 0.96 for identifying non-fertile eggs. The higher performance is attributed to its unique architecture, which efficiently captures features at different scales, leading to improved accuracy and robustness. Further optimization and fine-tuning might be necessary to address the limitations of the other models in accurately detecting fertile and non-fertile eggs. This study highlighted the potential of CNN-Transfer Learning for nondestructive fertility detection and emphasizes the need for further research to enhance the models' capabilities and ensure accurate classification.
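A minimal transfer-learning sketch of the kind of pipeline described (128x128 inputs, augmentation, an ImageNet-pretrained Inception backbone, a binary fertile/non-fertile output) is shown below; the layer choices and hyperparameters are assumptions, not the authors' training setup.

```python
# A minimal Keras transfer-learning sketch, assuming 128x128 RGB images and binary labels.
import tensorflow as tf

augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal_and_vertical"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
    tf.keras.layers.RandomTranslation(0.1, 0.1),
])

base = tf.keras.applications.InceptionV3(weights="imagenet", include_top=False,
                                         input_shape=(128, 128, 3))
base.trainable = False                          # reuse ImageNet features

inputs = tf.keras.Input(shape=(128, 128, 3))
x = augment(inputs)
x = tf.keras.applications.inception_v3.preprocess_input(x)
x = base(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)   # fertile vs. non-fertile
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.Recall(name="sensitivity")])
# model.fit(train_ds, validation_data=val_ds, epochs=20)
```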

Supervised Learning Models for Early Detection of Albuminuria Risk in Type-2 Diabetes Mellitus Patients

  • paper_url: http://arxiv.org/abs/2309.16742
  • repo_url: None
  • paper_authors: Arief Purnama Muharram, Dicky Levenus Tahapary, Yeni Dwi Lestari, Randy Sarayar, Valerie Josephine Dirjayanto
  • for: Develops supervised learning models to predict the risk of developing albuminuria in type-2 diabetes mellitus (T2DM) patients.
  • methods: A private dataset with 10 feature attributes and 1 target attribute (albuminuria) was used to train Naive Bayes, Support Vector Machine (SVM), decision tree, random forest, AdaBoost, XGBoost, and a Multi-Layer Perceptron (MLP).
  • results: The MLP performed best, with accuracy and f1-score of 0.74 and 0.75 respectively, making it suitable for screening purposes in predicting albuminuria in T2DM patients.
    Abstract Diabetes, especially T2DM, continues to be a significant health problem. One of the major concerns associated with diabetes is the development of its complications. Diabetic nephropathy, one of the chronic complication of diabetes, adversely affects the kidneys, leading to kidney damage. Diagnosing diabetic nephropathy involves considering various criteria, one of which is the presence of a pathologically significant quantity of albumin in urine, known as albuminuria. Thus, early prediction of albuminuria in diabetic patients holds the potential for timely preventive measures. This study aimed to develop a supervised learning model to predict the risk of developing albuminuria in T2DM patients. The selected supervised learning algorithms included Na\"ive Bayes, Support Vector Machine (SVM), decision tree, random forest, AdaBoost, XGBoost, and Multi-Layer Perceptron (MLP). Our private dataset, comprising 184 entries of diabetes complications risk factors, was used to train the algorithms. It consisted of 10 attributes as features and 1 attribute as the target (albuminuria). Upon conducting the experiments, the MLP demonstrated superior performance compared to the other algorithms. It achieved accuracy and f1-score values as high as 0.74 and 0.75, respectively, making it suitable for screening purposes in predicting albuminuria in T2DM. Nonetheless, further studies are warranted to enhance the model's performance.
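A minimal sketch of the MLP screening setup on tabular risk factors, reporting accuracy and F1 as in the abstract, is shown below; the file and column names are hypothetical.

```python
# A minimal sketch with hypothetical data: 10 tabular features, binary albuminuria target.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, f1_score

df = pd.read_csv("t2dm_albuminuria.csv")          # hypothetical file: 10 features + target
X, y = df.drop(columns=["albuminuria"]), df["albuminuria"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=42))
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred), "f1:", f1_score(y_te, pred))
```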

Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints

  • paper_url: http://arxiv.org/abs/2309.16240
  • repo_url: None
  • paper_authors: Chaoqi Wang, Yibo Jiang, Chenghao Yang, Han Liu, Yuxin Chen
  • for: Improving the safety and controllability of AI systems by aligning them with human preferences, as pursued by RLHF and DPO.
  • methods: Proposes f-DPO, a generalization of Direct Preference Optimization that incorporates diverse f-divergence constraints; under Jensen-Shannon divergence, forward KL divergences and alpha-divergences, the relationship between the reward and the optimal policy simplifies via the Karush-Kuhn-Tucker conditions, removing the need to estimate the normalizing constant of the Bradley-Terry model.
  • results: Empirically, these divergence constraints balance alignment performance and generation diversity; f-DPO outperforms PPO-based methods in divergence efficiency, and the divergence constraints directly influence expected calibration error (ECE).
    Abstract The increasing capabilities of large language models (LLMs) raise opportunities for artificial general intelligence but concurrently amplify safety concerns, such as potential misuse of AI systems, necessitating effective AI alignment. Reinforcement Learning from Human Feedback (RLHF) has emerged as a promising pathway towards AI alignment but brings forth challenges due to its complexity and dependence on a separate reward model. Direct Preference Optimization (DPO) has been proposed as an alternative, and it remains equivalent to RLHF under the reverse KL regularization constraint. This paper presents $f$-DPO, a generalized approach to DPO by incorporating diverse divergence constraints. We show that under certain $f$-divergences, including Jensen-Shannon divergence, forward KL divergences and $\alpha$-divergences, the complex relationship between the reward and optimal policy can also be simplified by addressing the Karush-Kuhn-Tucker conditions. This eliminates the need for estimating the normalizing constant in the Bradley-Terry model and enables a tractable mapping between the reward function and the optimal policy. Our approach optimizes LLMs to align with human preferences in a more efficient and supervised manner under a broad set of divergence constraints. Empirically, adopting these divergences ensures a balance between alignment performance and generation diversity. Importantly, $f$-DPO outperforms PPO-based methods in divergence efficiency, and divergence constraints directly influence expected calibration error (ECE).
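For reference, the standard DPO objective (the reverse-KL special case that f-DPO generalises with other f-divergences) can be written compactly as below; log-probabilities are assumed to be summed over response tokens.

```python
# A minimal sketch of the standard DPO loss (reverse-KL case), not the paper's f-DPO variants.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Implicit rewards are the log-ratios between the policy and the frozen reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Bradley-Terry style preference likelihood; no normalizing constant is needed.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```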

Language models in molecular discovery

  • paper_url: http://arxiv.org/abs/2309.16235
  • repo_url: None
  • paper_authors: Nikita Janakarajan, Tim Erdmann, Sarath Swaminathan, Teodoro Laino, Jannis Born
  • for: Reviews the role of language models in molecular discovery, in particular their potential in the drug discovery process.
  • methods: Surveys how language models are used for de novo drug design, property prediction and reaction chemistry, and assesses their potential in early-stage drug discovery.
  • results: Language models can help accelerate the molecule discovery cycle, notably in drug design, property prediction and reaction prediction; the review also highlights open-source software assets that lower the entry barrier to the field.
    Abstract The success of language models, especially transformer-based architectures, has trickled into other domains giving rise to "scientific language models" that operate on small molecules, proteins or polymers. In chemistry, language models contribute to accelerating the molecule discovery cycle as evidenced by promising recent findings in early-stage drug discovery. Here, we review the role of language models in molecular discovery, underlining their strength in de novo drug design, property prediction and reaction chemistry. We highlight valuable open-source software assets thus lowering the entry barrier to the field of scientific language modeling. Last, we sketch a vision for future molecular design that combines a chatbot interface with access to computational chemistry tools. Our contribution serves as a valuable resource for researchers, chemists, and AI enthusiasts interested in understanding how language models can and will be used to accelerate chemical discovery.

Multi-Modal Financial Time-Series Retrieval Through Latent Space Projections

  • paper_url: http://arxiv.org/abs/2309.16741
  • repo_url: None
  • paper_authors: Tom Bamford, Andrea Coletta, Elizabeth Fons, Sriram Gopalakrishnan, Svitlana Vyetrenko, Tucker Balch, Manuela Veloso
  • for: Designed to make storage and retrieval of financial time-series data more efficient.
  • methods: Uses deep encoders to project multi-modal data into a lower-dimensional latent space so that it can be queried with images or natural language.
  • results: Experiments on real historical and synthetic data demonstrate the computational efficiency and accuracy of the method and show that latent-space projections enable more efficient and user-friendly retrieval of financial time-series data.
    Abstract Financial firms commonly process and store billions of time-series data, generated continuously and at a high frequency. To support efficient data storage and retrieval, specialized time-series databases and systems have emerged. These databases support indexing and querying of time-series by a constrained Structured Query Language(SQL)-like format to enable queries like "Stocks with monthly price returns greater than 5%", and expressed in rigid formats. However, such queries do not capture the intrinsic complexity of high dimensional time-series data, which can often be better described by images or language (e.g., "A stock in low volatility regime"). Moreover, the required storage, computational time, and retrieval complexity to search in the time-series space are often non-trivial. In this paper, we propose and demonstrate a framework to store multi-modal data for financial time-series in a lower-dimensional latent space using deep encoders, such that the latent space projections capture not only the time series trends but also other desirable information or properties of the financial time-series data (such as price volatility). Moreover, our approach allows user-friendly query interfaces, enabling natural language text or sketches of time-series, for which we have developed intuitive interfaces. We demonstrate the advantages of our method in terms of computational efficiency and accuracy on real historical data as well as synthetic data, and highlight the utility of latent-space projections in the storage and retrieval of financial time-series data with intuitive query modalities.
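A minimal sketch of latent-space retrieval is shown below: encode stored series and a query into a shared low-dimensional space and return the nearest neighbours. The encode function here is a crude hand-made stand-in for the paper's trained deep encoder.

```python
# A minimal sketch with a placeholder encoder, not the paper's learned latent projection.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def encode(window: np.ndarray) -> np.ndarray:
    # Stand-in for a trained deep encoder: (mean, std, net return) as a toy latent vector.
    return np.array([window.mean(), window.std(), window[-1] - window[0]])

database = np.random.default_rng(0).normal(size=(1000, 64)).cumsum(axis=1)   # 1000 toy price paths
latents = np.stack([encode(w) for w in database])

index = NearestNeighbors(n_neighbors=5).fit(latents)
query = database[42] + 0.01                       # a slightly perturbed known series
_, idx = index.kneighbors(encode(query)[None, :])
print("closest stored series:", idx[0])
```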

Automated Chest X-Ray Report Generator Using Multi-Model Deep Learning Approach

  • paper_url: http://arxiv.org/abs/2310.05969
  • repo_url: https://github.com/ariefpurnamamuharram/IF5200
  • paper_authors: Arief Purnama Muharram, Hollyana Puteri Haryono, Abassi Haji Juma, Ira Puspasari, Nugraha Priya Utama
  • for: Assist radiologists in reading chest X-ray images, improving the accuracy and efficiency of chest X-ray diagnosis.
  • methods: Uses multiple binary-classification deep learning models, each responsible for detecting one abnormality; images are standardized to 128x128 pixels and sliced into three segments covering the upper, middle, and lower parts of the chest.
  • results: The system automatically detects abnormalities in chest X-ray images and generates a report containing appropriate pre-determined sentences describing the detected abnormalities; it is expected to help radiologists evaluate chest X-rays faster and more accurately.
    Abstract Reading and interpreting chest X-ray images is one of radiologists' routine tasks. However, it can still be challenging, even for the most experienced of them. Therefore, we proposed a multi-model deep learning-based automated chest X-ray report generator system designed to assist radiologists in their work. The basic idea of the proposed system is to utilize multiple binary-classification models for detecting multiple abnormalities, with each model responsible for detecting one abnormality, in a single image. In this study, we limited the abnormality detection to cardiomegaly, lung effusion, and consolidation. The system generates a radiology report by performing the following three steps: image pre-processing, utilizing deep learning models to detect abnormalities, and producing a report. The aim of the image pre-processing step is to standardize the input by scaling it to 128x128 pixels and slicing it into three segments, which cover the upper, lower, and middle parts of the lung. After pre-processing, each corresponding model classifies the image, resulting in a 0 (zero) for no abnormality detected and a 1 (one) for the presence of an abnormality. The prediction outputs of each model are then concatenated to form a 'result code'. The 'result code' is used to construct a report by selecting the appropriate pre-determined sentence for each detected abnormality in the report generation step. The proposed system is expected to reduce the workload of radiologists and increase the accuracy of chest X-ray diagnosis.
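The 'result code' and template-sentence report generation step can be sketched as follows; the template sentences and dictionary keys are hypothetical placeholders, not the system's actual wording.

```python
# A minimal sketch: combine three binary classifier outputs into a result code and a report.
TEMPLATES = {
    "cardiomegaly": "The cardiac silhouette is enlarged, suggesting cardiomegaly.",
    "effusion": "Blunting of the costophrenic angle is consistent with pleural effusion.",
    "consolidation": "An area of increased opacity is consistent with consolidation.",
}
NORMAL_SENTENCE = "No cardiomegaly, effusion or consolidation is identified."

def generate_report(predictions: dict) -> str:
    # predictions: {"cardiomegaly": 0/1, "effusion": 0/1, "consolidation": 0/1}
    result_code = "".join(str(predictions[k]) for k in ("cardiomegaly", "effusion", "consolidation"))
    findings = [TEMPLATES[k] for k, v in predictions.items() if v == 1]
    body = " ".join(findings) if findings else NORMAL_SENTENCE
    return f"Result code {result_code}: {body}"

print(generate_report({"cardiomegaly": 1, "effusion": 0, "consolidation": 1}))
```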

GInX-Eval: Towards In-Distribution Evaluation of Graph Neural Network Explanations

  • paper_url: http://arxiv.org/abs/2309.16223
  • repo_url: None
  • paper_authors: Kenza Amara, Mennatallah El-Assady, Rex Ying
  • for: Evaluating explainability methods for graph neural networks (GNNs) and assessing the correctness of the explanations they produce.
  • methods: Uses a retraining strategy and the EdgeRank score to evaluate the correctness of graph explanations.
  • results: Many popular methods, including gradient-based ones, produce explanations that are no better than randomly designating edges as important subgraphs, challenging findings of current work in the area; the results are consistent with human evaluation.
    Abstract Diverse explainability methods of graph neural networks (GNN) have recently been developed to highlight the edges and nodes in the graph that contribute the most to the model predictions. However, it is not clear yet how to evaluate the correctness of those explanations, whether it is from a human or a model perspective. One unaddressed bottleneck in the current evaluation procedure is the problem of out-of-distribution explanations, whose distribution differs from those of the training data. This important issue affects existing evaluation metrics such as the popular faithfulness or fidelity score. In this paper, we show the limitations of faithfulness metrics. We propose GInX-Eval (Graph In-distribution eXplanation Evaluation), an evaluation procedure of graph explanations that overcomes the pitfalls of faithfulness and offers new insights on explainability methods. Using a retraining strategy, the GInX score measures how informative removed edges are for the model and the EdgeRank score evaluates if explanatory edges are correctly ordered by their importance. GInX-Eval verifies if ground-truth explanations are instructive to the GNN model. In addition, it shows that many popular methods, including gradient-based methods, produce explanations that are not better than a random designation of edges as important subgraphs, challenging the findings of current works in the area. Results with GInX-Eval are consistent across multiple datasets and align with human evaluation.

Unmasking the Chameleons: A Benchmark for Out-of-Distribution Detection in Medical Tabular Data

  • paper_url: http://arxiv.org/abs/2309.16220
  • repo_url: https://github.com/mazizmalayeri/tabmedood
  • paper_authors: Mohammad Azizmalayeri, Ameen Abu-Hanna, Giovanni Ciná
  • for: Addresses the failure of machine learning models to generalize to data outside the training distribution, so that they can be used reliably in real-world healthcare systems without making incorrect predictions on out-of-distribution (OOD) data.
  • methods: Benchmarks a wide range of OOD detection methods, including density-based and state-of-the-art post-hoc approaches, across diverse predictive architectures including MLP, ResNet, and Transformer.
  • results: i) far-OOD detection appears to be solved, but near-OOD detection remains open; ii) post-hoc methods alone perform poorly but improve substantially when combined with distance-based mechanisms; iii) the Transformer architecture is far less overconfident than MLP and ResNet.
    Abstract Despite their success, Machine Learning (ML) models do not generalize effectively to data not originating from the training distribution. To reliably employ ML models in real-world healthcare systems and avoid inaccurate predictions on out-of-distribution (OOD) data, it is crucial to detect OOD samples. Numerous OOD detection approaches have been suggested in other fields - especially in computer vision - but it remains unclear whether the challenge is resolved when dealing with medical tabular data. To answer this pressing need, we propose an extensive reproducible benchmark to compare different methods across a suite of tests including both near and far OODs. Our benchmark leverages the latest versions of eICU and MIMIC-IV, two public datasets encompassing tens of thousands of ICU patients in several hospitals. We consider a wide array of density-based methods and SOTA post-hoc detectors across diverse predictive architectures, including MLP, ResNet, and Transformer. Our findings show that i) the problem appears to be solved for far-OODs, but remains open for near-OODs; ii) post-hoc methods alone perform poorly, but improve substantially when coupled with distance-based mechanisms; iii) the transformer architecture is far less overconfident compared to MLP and ResNet.
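As an example of the distance-based mechanisms that the benchmark finds helpful, a minimal Mahalanobis-distance OOD score on tabular features is sketched below; it is a generic illustration, not one of the benchmarked implementations.

```python
# A minimal sketch: class-wise means with a tied covariance; higher score = more likely OOD.
import numpy as np

def fit_mahalanobis(X_train: np.ndarray, y_train: np.ndarray):
    classes = np.unique(y_train)
    means = {c: X_train[y_train == c].mean(axis=0) for c in classes}
    cov = np.cov(X_train, rowvar=False) + 1e-6 * np.eye(X_train.shape[1])
    prec = np.linalg.inv(cov)                     # shared (tied) precision matrix
    return means, prec

def ood_score(x: np.ndarray, means: dict, prec: np.ndarray) -> float:
    dists = [float((x - mu) @ prec @ (x - mu)) for mu in means.values()]
    return min(dists)                             # distance to the closest class mean

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 8)); y_train = rng.integers(0, 2, 500)
means, prec = fit_mahalanobis(X_train, y_train)
print(ood_score(rng.normal(size=8), means, prec),          # in-distribution-like point
      ood_score(rng.normal(size=8) + 10, means, prec))     # far-away (OOD-like) point
```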

VDC: Versatile Data Cleanser for Detecting Dirty Samples via Visual-Linguistic Inconsistency

  • paper_url: http://arxiv.org/abs/2309.16211
  • repo_url: None
  • paper_authors: Zihao Zhu, Mingda Zhang, Shaokui Wei, Bingzhe Wu, Baoyuan Wu
  • for: Improving the quality and reliability of data-centric AI systems by detecting dirty samples in datasets.
  • methods: Proposes the Versatile Data Cleanser (VDC), which leverages multimodal large language models (MLLMs) for cross-modal alignment and reasoning to capture visual-linguistic inconsistency between images and their labels; it consists of three consecutive modules: visual question generation, visual question answering, and visual answer evaluation.
  • results: Extensive experiments demonstrate superior performance and generalization across various categories and types of dirty samples.
    Abstract The role of data in building AI systems has recently been emphasized by the emerging concept of data-centric AI. Unfortunately, in the real world, datasets may contain dirty samples, such as poisoned samples from backdoor attacks, noisy labels in crowdsourcing, and even hybrids of them. The presence of such dirty samples makes DNNs vulnerable and unreliable. Hence, it is critical to detect dirty samples to improve the quality and reliability of datasets. Existing detectors only focus on detecting poisoned samples or noisy labels, and are often prone to weak generalization when dealing with dirty samples from other domains. In this paper, we find that a commonality of various dirty samples is visual-linguistic inconsistency between images and associated labels. To capture the semantic inconsistency between modalities, we propose the Versatile Data Cleanser (VDC), leveraging the capabilities of multimodal large language models (MLLMs) in cross-modal alignment and reasoning. It consists of three consecutive modules: the visual question generation module to generate insightful questions about the image; the visual question answering module to acquire the semantics of the visual content by answering the questions with the MLLM; followed by the visual answer evaluation module to evaluate the inconsistency. Extensive experiments demonstrate its superior performance and generalization to various categories and types of dirty samples.

Design of JiuTian Intelligent Network Simulation Platform

  • paper_url: http://arxiv.org/abs/2310.06858
  • repo_url: None
  • paper_authors: Lei Zhao, Miaomiao Zhang, Guangyu Li, Zhuowen Guan, Sijia Liu, Zhaobin Xiao, Yuting Cao, Zhe Lv, Yanping Liang
  • for: Introduces the JiuTian Intelligent Network Simulation Platform, which provides wireless communication simulation data services for the Open Innovation Platform.
  • methods: The platform contains a series of scalable simulator functionalities and offers open services that let users train and run inference with reinforcement learning algorithms on simulation environments and data, and upload or update parameter configurations to address optimization tasks in different scenarios.
  • results: The platform and its open services are presented from the perspectives of background, overall architecture, simulator, business scenarios, and future directions.
    Abstract This paper introduced the JiuTian Intelligent Network Simulation Platform, which can provide wireless communication simulation data services for the Open Innovation Platform. The platform contains a series of scalable simulator functionalities, offering open services that enable users to use reinforcement learning algorithms for model training and inference based on simulation environments and data. Additionally, it allows users to address optimization tasks in different scenarios by uploading and updating parameter configurations. The platform and its open services were primarily introduced from the perspectives of background, overall architecture, simulator, business scenarios, and future directions.

Pushing Large Language Models to the 6G Edge: Vision, Challenges, and Opportunities

  • paper_url: http://arxiv.org/abs/2309.16739
  • repo_url: None
  • paper_authors: Zheng Lin, Guanqiao Qu, Qiyuan Chen, Xianhao Chen, Zhe Chen, Kaibin Huang
  • for: Explores the potential of deploying large language models (LLMs) at the edge of 6G mobile edge computing (MEC) systems, to address the long response time, high bandwidth cost, and data privacy issues of cloud deployment.
  • methods: First introduces killer applications powered by multimodal LLMs, such as robotics and healthcare, to motivate deployment close to end users; then identifies the critical challenges of edge deployment, sketches a 6G MEC architecture for LLMs, and discusses two design aspects, edge training and edge inference, covering cutting-edge techniques such as split learning/inference, parameter-efficient fine-tuning, quantization, and parameter-sharing inference.
  • results: Serves as a position paper that comprehensively identifies the motivation, challenges, and pathway for efficiently deploying LLMs at the 6G edge.
    Abstract Large language models (LLMs), which have shown remarkable capabilities, are revolutionizing AI development and potentially shaping our future. However, given their multimodality, the status quo cloud-based deployment faces some critical challenges: 1) long response time; 2) high bandwidth costs; and 3) the violation of data privacy. 6G mobile edge computing (MEC) systems may resolve these pressing issues. In this article, we explore the potential of deploying LLMs at the 6G edge. We start by introducing killer applications powered by multimodal LLMs, including robotics and healthcare, to highlight the need for deploying LLMs in the vicinity of end users. Then, we identify the critical challenges for LLM deployment at the edge and envision the 6G MEC architecture for LLMs. Furthermore, we delve into two design aspects, i.e., edge training and edge inference for LLMs. In both aspects, considering the inherent resource limitations at the edge, we discuss various cutting-edge techniques, including split learning/inference, parameter-efficient fine-tuning, quantization, and parameter-sharing inference, to facilitate the efficient deployment of LLMs. This article serves as a position paper for thoroughly identifying the motivation, challenges, and pathway for empowering LLMs at the 6G edge.

A More General Theory of Diagnosis from First Principles

  • paper_url: http://arxiv.org/abs/2309.16180
  • repo_url: https://github.com/alban-grastien/diagfwork
  • paper_authors: Alban Grastien, Patrik Haslum, Sylvie Thiébaux
  • for: Unifies existing model-based diagnosis approaches by generalising Reiter's theory so that it is agnostic to the types of systems and diagnoses considered.
  • methods: Defines the minimal diagnosis as the set of preferred diagnosis candidates in a search space of hypotheses, computed by exploring the space of hypotheses, testing sets of hypotheses for consistency with the system model and the observation, and generating conflicts that rule out successors and other portions of the search space.
  • results: Two implementations, one based on satisfiability solving and one on heuristic search, are evaluated on two real-world discrete-event problems; despite the greater generality of the theory, they surpass special-purpose algorithms designed for discrete-event systems and solve instances that were out of reach of existing diagnosis approaches.
    Abstract Model-based diagnosis has been an active research topic in different communities including artificial intelligence, formal methods, and control. This has led to a set of disparate approaches addressing different classes of systems and seeking different forms of diagnoses. In this paper, we resolve such disparities by generalising Reiter's theory to be agnostic to the types of systems and diagnoses considered. This more general theory of diagnosis from first principles defines the minimal diagnosis as the set of preferred diagnosis candidates in a search space of hypotheses. Computing the minimal diagnosis is achieved by exploring the space of diagnosis hypotheses, testing sets of hypotheses for consistency with the system's model and the observation, and generating conflicts that rule out successors and other portions of the search space. Under relatively mild assumptions, our algorithms correctly compute the set of preferred diagnosis candidates. The main difficulty here is that the search space is no longer a powerset as in Reiter's theory, and that, as consequence, many of the implicit properties (such as finiteness of the search space) no longer hold. The notion of conflict also needs to be generalised and we present such a more general notion. We present two implementations of these algorithms, using test solvers based on satisfiability and heuristic search, respectively, which we evaluate on instances from two real world discrete event problems. Despite the greater generality of our theory, these implementations surpass the special purpose algorithms designed for discrete event systems, and enable solving instances that were out of reach of existing diagnosis approaches.

Attention Sorting Combats Recency Bias In Long Context Language Models

  • paper_url: http://arxiv.org/abs/2310.01427
  • repo_url: None
  • paper_authors: Alexander Peysakhovich, Adam Lerer
  • for: Improving how long-context language models use retrieved documents during generation, addressing their difficulty in incorporating long contexts efficiently.
  • methods: Introduces "attention sorting": perform one step of decoding, sort documents by the attention they receive (highest attention last), repeat, and generate the answer with the newly sorted context.
  • results: Attention sorting improves the performance of long-context models and highlights challenges in using off-the-shelf language models for retrieval-augmented generation.
    Abstract Current language models often fail to incorporate long contexts efficiently during generation. We show that a major contributor to this issue are attention priors that are likely learned during pre-training: relevant information located earlier in context is attended to less on average. Yet even when models fail to use the information from a relevant document in their response, they still pay preferential attention to that document compared to an irrelevant document at the same position. We leverage this fact to introduce ``attention sorting'': perform one step of decoding, sort documents by the attention they receive (highest attention going last), repeat the process, generate the answer with the newly sorted context. We find that attention sorting improves performance of long context models. Our findings highlight some challenges in using off-the-shelf language models for retrieval augmented generation.
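The attention-sorting loop from the abstract can be sketched as below; decode_step, attention_per_document and generate are hypothetical stand-ins for whatever per-document attention access a given model exposes, so this is a sketch of the procedure rather than a runnable recipe for any particular model.

```python
# A minimal sketch with a hypothetical model interface, following the loop described in the abstract.
def attention_sort(model, question: str, docs: list, rounds: int = 3) -> str:
    context = list(docs)
    for _ in range(rounds):
        prompt = "\n\n".join(context) + "\n\nQuestion: " + question
        step = model.decode_step(prompt)                        # one step of decoding
        scores = model.attention_per_document(step, context)    # attention mass per document
        # Highest-attention documents go last, where recency-biased models attend most.
        context = [d for _, d in sorted(zip(scores, context), key=lambda p: p[0])]
    final_prompt = "\n\n".join(context) + "\n\nQuestion: " + question
    return model.generate(final_prompt)
```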

CoinRun: Solving Goal Misgeneralisation

  • paper_url: http://arxiv.org/abs/2309.16166
  • repo_url: None
  • paper_authors: Stuart Armstrong, Alexandre Maranhão, Oliver Daniels-Koch, Patrick Leask, Rebecca Gorman
  • for: Addresses goal misgeneralisation, a key challenge in aligning powerful AI systems with human intentions and human morality.
  • methods: Uses the ACE (Algorithm for Concept Extrapolation) agent, which solves the CoinRun challenge in the new environment without using any new reward information.
  • results: ACE solves the goal misgeneralisation problem in the CoinRun challenge without new reward information, suggesting that autonomous agents could be trusted to act in human interests even in novel and critical situations.
    Abstract Goal misgeneralisation is a key challenge in AI alignment -- the task of getting powerful Artificial Intelligences to align their goals with human intentions and human morality. In this paper, we show how the ACE (Algorithm for Concept Extrapolation) agent can solve one of the key standard challenges in goal misgeneralisation: the CoinRun challenge. It uses no new reward information in the new environment. This points to how autonomous agents could be trusted to act in human interests, even in novel and critical situations.

Leveraging Untrustworthy Commands for Multi-Robot Coordination in Unpredictable Environments: A Bandit Submodular Maximization Approach

  • paper_url: http://arxiv.org/abs/2309.16161
  • repo_url: None
  • paper_authors: Zirui Xu, Xiaofeng Lin, Vasileios Tzoumas
  • for: Studies multi-robot coordination in unpredictable and partially observable environments where external commands are untrustworthy.
  • methods: Proposes Meta Bandit Sequential Greedy (MetaBSG), which provides performance guarantees even when the external commands are arbitrarily bad; a meta-algorithm learns whether the robots should follow the external commands or the recently developed submodular coordination algorithm Bandit Sequential Greedy (BSG), which itself has performance guarantees in unpredictable and partially observable environments.
  • results: MetaBSG asymptotically achieves the better of the commands and the BSG algorithm, quantifying its suboptimality against the optimal time-varying multi-robot actions in hindsight, and is validated in simulated multi-target tracking scenarios, effectively robustifying the untrustworthy commands.
    Abstract We study the problem of multi-agent coordination in unpredictable and partially-observable environments with untrustworthy external commands. The commands are actions suggested to the robots, and are untrustworthy in that their performance guarantees, if any, are unknown. Such commands may be generated by human operators or machine learning algorithms and, although untrustworthy, can often increase the robots' performance in complex multi-robot tasks. We are motivated by complex multi-robot tasks such as target tracking, environmental mapping, and area monitoring. Such tasks are often modeled as submodular maximization problems due to the information overlap among the robots. We provide an algorithm, Meta Bandit Sequential Greedy (MetaBSG), which enjoys performance guarantees even when the external commands are arbitrarily bad. MetaBSG leverages a meta-algorithm to learn whether the robots should follow the commands or a recently developed submodular coordination algorithm, Bandit Sequential Greedy (BSG) [1], which has performance guarantees even in unpredictable and partially-observable environments. Particularly, MetaBSG asymptotically can achieve the better performance out of the commands and the BSG algorithm, quantifying its suboptimality against the optimal time-varying multi-robot actions in hindsight. Thus, MetaBSG can be interpreted as robustifying the untrustworthy commands. We validate our algorithm in simulated scenarios of multi-target tracking.

AE-GPT: Using Large Language Models to Extract Adverse Events from Surveillance Reports-A Use Case with Influenza Vaccine Adverse Events

  • paper_url: http://arxiv.org/abs/2309.16150
  • repo_url: None
  • paper_authors: Yiming Li, Jianfu Li, Jianping He, Cui Tao
  • for: Evaluates the ability of large language models (LLMs) to detect adverse events (AEs) in clinical reports.
  • methods: Uses Vaccine Adverse Event Reporting System (VAERS) data from 1990 to 2016 and evaluates several prevalent LLMs, including GPT-2, GPT-3 variants, GPT-4, and Llama 2, with influenza vaccine as the use case; a GPT 3.5 model is fine-tuned for the task.
  • results: The fine-tuned GPT 3.5 model (AE-GPT) achieves averaged micro F1 scores of 0.704 for strict match and 0.816 for relaxed match, indicating the potential of LLMs for processing medical data and, presumably, for other AE extraction tasks.
    Abstract Though Vaccines are instrumental in global health, mitigating infectious diseases and pandemic outbreaks, they can occasionally lead to adverse events (AEs). Recently, Large Language Models (LLMs) have shown promise in effectively identifying and cataloging AEs within clinical reports. Utilizing data from the Vaccine Adverse Event Reporting System (VAERS) from 1990 to 2016, this study particularly focuses on AEs to evaluate LLMs' capability for AE extraction. A variety of prevalent LLMs, including GPT-2, GPT-3 variants, GPT-4, and Llama 2, were evaluated using Influenza vaccine as a use case. The fine-tuned GPT 3.5 model (AE-GPT) stood out with a 0.704 averaged micro F1 score for strict match and 0.816 for relaxed match. The encouraging performance of the AE-GPT underscores LLMs' potential in processing medical data, indicating a significant stride towards advanced AE detection, thus presumably generalizable to other AE extraction tasks.
    摘要 尽管疫苗在全球医疗中发挥了重要作用,减轻了传染病和大流行爆发的影响,但它们有时会导致不良事件(AE)。最近,大型语言模型(LLM)已经显示出有效识别和归类临床报告中AE的潜力。本研究使用1990年至2016年的疫苗不良事件报告系统(VAERS)数据,专门关注AE以评估LLM的抽取能力。研究以流感疫苗为用例,评估了包括GPT-2、GPT-3变种、GPT-4和Llama 2在内的多种常见LLM。微调后的GPT 3.5模型(AE-GPT)表现突出,其微平均F1分数为0.704(严格匹配)和0.816(宽松匹配)。AE-GPT的良好表现体现了LLM在处理医疗数据方面的潜力,并有望推广到其他AE抽取任务。
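The strict versus relaxed matching used in the evaluation can be illustrated with a small scorer: strict match requires identical entity spans and labels, while relaxed match accepts overlapping spans with the same label. The matching definitions below are assumptions of this sketch, not the paper's evaluation code.

```python
def micro_f1(gold, pred, relaxed=False):
    """gold/pred: lists of (start, end, label) adverse-event spans per report."""
    def overlaps(a, b):
        return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]

    def exact(a, b):
        return a == b

    match = overlaps if relaxed else exact
    tp = 0
    for g_spans, p_spans in zip(gold, pred):
        used = set()
        for p in p_spans:
            for j, g in enumerate(g_spans):
                if j not in used and match(p, g):
                    tp += 1
                    used.add(j)
                    break
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Example: one report with one gold AE span.
gold = [[(10, 15, "AE")]]
pred = [[(11, 15, "AE")]]
print(micro_f1(gold, pred, relaxed=False))  # 0.0 (spans differ)
print(micro_f1(gold, pred, relaxed=True))   # 1.0 (spans overlap)
```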

T-COL: Generating Counterfactual Explanations for General User Preferences on Variable Machine Learning Systems

  • paper_url: http://arxiv.org/abs/2309.16146
  • repo_url: https://github.com/neu-datamining/t-col
  • paper_authors: Ming Wang, Daling Wang, Wenfang Wu, Shi Feng, Yifei Zhang
  • for: The paper aims to address the lack of interpretability in machine learning (ML) systems by proposing a new method called Tree-based Conditions Optional Links (T-COL) to generate counterfactual explanations (CEs) that can be adapted to general user preferences.
  • methods: The proposed T-COL method uses tree-based structures and conditions to generate CEs that can be customized to suit the variability of ML models, while maintaining robustness even when the validation models change.
  • results: The paper experimentally compares the properties of CEs generated by T-COL under different user preferences and demonstrates that T-COL is better suited for accommodating user preferences and variable ML systems compared to baseline methods, including Large Language Models.
    Abstract Machine learning (ML) based systems have been suffering a lack of interpretability. To address this problem, counterfactual explanations (CEs) have been proposed. CEs are unique as they provide workable suggestions to users, in addition to explaining why a certain outcome was predicted. However, the application of CEs has been hindered by two main challenges, namely general user preferences and variable ML systems. User preferences, in particular, tend to be general rather than specific feature values. Additionally, CEs need to be customized to suit the variability of ML models, while also maintaining robustness even when these validation models change. To overcome these challenges, we propose several possible general user preferences that have been validated by user research and map them to the properties of CEs. We also introduce a new method called Tree-based Conditions Optional Links (T-COL), which has two optional structures and several groups of conditions for generating CEs that can be adapted to general user preferences. Meanwhile, a group of conditions lead T-COL to generate more robust CEs that have higher validity when the ML model is replaced. We compared the properties of CEs generated by T-COL experimentally under different user preferences and demonstrated that T-COL is better suited for accommodating user preferences and variable ML systems compared to baseline methods including Large Language Models.
    摘要 机器学习(ML)系统长期缺乏可解释性。为此,反事实解释(CE)被提出:它不仅解释某个结果为何被预测,还能向用户提供可行的建议。然而,CE的应用面临两大挑战:用户偏好通常是泛化的而非具体的特征值;同时CE需要适配多变的ML模型,并在验证模型更换时保持稳健。针对这些挑战,我们总结了若干经用户调研验证的通用用户偏好,并将其映射到CE的属性上;同时提出了树形条件可选链接(T-COL)方法,其包含两种可选结构和多组条件,可生成适配通用用户偏好的CE,其中一组条件使T-COL在ML模型被替换时生成有效性更高、更稳健的CE。实验比较了不同用户偏好下T-COL生成的CE的属性,结果表明,与包括大语言模型在内的基线方法相比,T-COL更适合满足用户偏好和多变的ML系统。
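As a rough illustration of counterfactual-explanation generation in general (not of T-COL's tree-based conditions and optional links), the sketch below greedily edits features of an input until a classifier's prediction flips, preferring edits that raise the target-class probability while staying close to the original. The classifier interface and the candidate-value dictionary are assumptions of this sketch.

```python
import numpy as np

def greedy_counterfactual(clf, x, target, candidate_values, max_changes=3):
    """Greedily edit one feature at a time to push clf toward class `target`.

    clf: classifier with predict / predict_proba (scikit-learn style); the
    sketch assumes class labels are the indices 0..K-1.
    x: 1-D feature vector; candidate_values: {feature_index: [values to try]}.
    Returns an edited copy of x, or None if the decision never flips.
    """
    x = np.asarray(x, dtype=float)
    cf = x.copy()
    for _ in range(max_changes):
        if clf.predict(cf.reshape(1, -1))[0] == target:
            return cf
        best_score, best_cf = -np.inf, None
        for i, values in candidate_values.items():
            for v in values:
                trial = cf.copy()
                trial[i] = v
                proba = clf.predict_proba(trial.reshape(1, -1))[0][target]
                # Favor edits that raise the target probability, then small edits.
                score = proba - 0.01 * np.abs(trial - x).sum()
                if score > best_score:
                    best_score, best_cf = score, trial
        if best_cf is None:
            break
        cf = best_cf
    return cf if clf.predict(cf.reshape(1, -1))[0] == target else None
```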

Generative Semi-supervised Learning with Meta-Optimized Synthetic Samples

  • paper_url: http://arxiv.org/abs/2309.16143
  • repo_url: None
  • paper_authors: Shin’ya Yamaguchi
  • for: 这篇论文研究半监督学习(SSL)方法,旨在不依赖大规模真实无标签数据的情况下训练深度分类模型。
  • methods: 论文提出了一种使用生成基础模型合成数据进行SSL训练的方法,包括:(1)对生成模型进行元学习,使其生成与有标签样本相似的合成样本;(2)使用真实有标签样本和合成无标签样本进行SSL训练。
  • results: 实验结果表明,该方法优于基于生成基础模型的基线方法,并且在有标签数据极少的场景下甚至优于使用真实无标签数据的SSL,这表明合成样本能够更高效地带来性能提升。
    Abstract Semi-supervised learning (SSL) is a promising approach for training deep classification models using labeled and unlabeled datasets. However, existing SSL methods rely on a large unlabeled dataset, which may not always be available in many real-world applications due to legal constraints (e.g., GDPR). In this paper, we investigate the research question: Can we train SSL models without real unlabeled datasets? Instead of using real unlabeled datasets, we propose an SSL method using synthetic datasets generated from generative foundation models trained on datasets containing millions of samples in diverse domains (e.g., ImageNet). Our main concepts are identifying synthetic samples that emulate unlabeled samples from generative foundation models and training classifiers using these synthetic samples. To achieve this, our method is formulated as an alternating optimization problem: (i) meta-learning of generative foundation models and (ii) SSL of classifiers using real labeled and synthetic unlabeled samples. For (i), we propose a meta-learning objective that optimizes latent variables to generate samples that resemble real labeled samples and minimize the validation loss. For (ii), we propose a simple unsupervised loss function that regularizes the feature extractors of classifiers to maximize the performance improvement obtained from synthetic samples. We confirm that our method outperforms baselines using generative foundation models on SSL. We also demonstrate that our methods outperform SSL using real unlabeled datasets in scenarios with extremely small amounts of labeled datasets. This suggests that synthetic samples have the potential to provide improvement gains more efficiently than real unlabeled data.
    摘要 半监督学习(SSL)是一种利用有标签和无标签数据训练深度分类模型的有前景方法。然而,现有的SSL方法依赖于大规模的无标签数据集,而在许多实际应用中,由于法律约束(例如GDPR),这类数据集并不总是可用。在本文中,我们研究一个问题:能否在不使用真实无标签数据集的情况下训练SSL模型?我们提出了一种使用合成数据集的SSL方法,这些合成数据集由在多个领域(例如ImageNet)数百万样本上训练的生成基础模型产生。我们的核心思想是找出能够模拟无标签样本的合成样本,并用它们训练分类器。为此,我们的方法被形式化为一个交替优化问题:(i)对生成基础模型进行元学习;(ii)使用真实有标签样本和合成无标签样本对分类器进行SSL训练。对于(i),我们提出了一个元学习目标,通过优化潜在变量来生成与真实有标签样本相似的样本,并最小化验证损失。对于(ii),我们提出了一个简单的无监督损失函数,对分类器的特征提取器进行正则化,以最大化合成样本带来的性能提升。实验证明,我们的方法优于使用生成基础模型的基线方法,并且在有标签数据极少的场景下优于使用真实无标签数据的SSL,这表明合成样本有潜力比真实无标签数据更高效地带来性能提升。
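The alternating optimization can be pictured as two interleaved updates: one on the latent codes of a frozen generator so its samples resemble the labeled data, and one on the classifier using the labeled set plus the synthetic "unlabeled" samples. The toy objectives below (a mean-matching loss and an entropy regularizer) are stand-ins chosen for brevity, not the paper's meta-learning objective or unsupervised loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins: a frozen "generator" mapping latent codes to inputs,
# and a small classifier. Real foundation models are far larger.
torch.manual_seed(0)
generator = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 16))
classifier = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 3))
for p in generator.parameters():
    p.requires_grad_(False)          # the foundation model stays frozen

x_lab = torch.randn(20, 16)          # small labeled set
y_lab = torch.randint(0, 3, (20,))
z = torch.randn(64, 8, requires_grad=True)   # learnable latent codes

opt_z = torch.optim.Adam([z], lr=1e-2)                      # (i) meta-learn latents
opt_c = torch.optim.Adam(classifier.parameters(), lr=1e-3)  # (ii) SSL step

for step in range(100):
    # (i) pull synthetic samples toward the labeled data distribution
    # (here: match feature means; the paper optimizes a validation loss).
    synth = generator(z)
    opt_z.zero_grad()
    (synth.mean(0) - x_lab.mean(0)).pow(2).sum().backward()
    opt_z.step()

    # (ii) supervised loss on labeled data plus an unsupervised
    # entropy-style regularizer on the synthetic "unlabeled" samples.
    synth = generator(z).detach()
    logits_lab = classifier(x_lab)
    probs = F.softmax(classifier(synth), dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(1).mean()
    loss = F.cross_entropy(logits_lab, y_lab) + 0.1 * entropy
    opt_c.zero_grad()
    loss.backward()
    opt_c.step()
```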

ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers

  • paper_url: http://arxiv.org/abs/2309.16119
  • repo_url: https://github.com/kuleshov-group/modulora-experiment
  • paper_authors: Junjie Yin, Jiahao Dong, Yingheng Wang, Christopher De Sa, Volodymyr Kuleshov
  • For: The paper proposes a memory-efficient finetuning algorithm for large language models (LLMs) that supports finetuning LLMs with 65B parameters in 3-bit or 4-bit precision on as little as one 48GB GPU.
  • Methods: The paper proposes a method called ModuLoRA, which integrates any user-specified weight quantizer with finetuning via low-rank adapters (LoRAs). The approach relies on a simple quantization-agnostic backward pass that adaptively materializes low-precision LLM weights from a custom black-box quantization module.
  • Results: The paper achieves competitive performance on text classification, natural language inference, and instruction following tasks using significantly less memory than existing approaches, and surpasses the state-of-the-art ROUGE score on a popular summarization task. The paper also releases ModuLoRA together with a series of low-precision models, including the first family of 3-bit instruction-following Alpaca LLMs, as part of LLMTOOLS, a user-friendly library for quantizing, running, and finetuning LLMs on consumer GPUs.
    Abstract We propose a memory-efficient finetuning algorithm for large language models (LLMs) that supports finetuning LLMs with 65B parameters in 3-bit or 4-bit precision on as little as one 48GB GPU. Our method, modular low-rank adaptation (ModuLoRA), integrates any user-specified weight quantizer with finetuning via low-rank adapters (LoRAs). Our approach relies on a simple quantization-agnostic backward pass that adaptively materializes low-precision LLM weights from a custom black-box quantization module. This approach enables finetuning 3-bit LLMs for the first time; leveraging state-of-the-art 3-bit OPTQ quantization often outperforms finetuning that relies on less sophisticated 4-bit and 8-bit methods. In our experiments, ModuLoRA attains competitive performance on text classification, natural language inference, and instruction following tasks using significantly less memory than existing approaches, and we also surpass the state-of-the-art ROUGE score on a popular summarization task. We release ModuLoRA together with a series of low-precision models, including the first family of 3-bit instruction-following Alpaca LLMs, as part of LLMTOOLS, a user-friendly library for quantizing, running, and finetuning LLMs on consumer GPUs.
    摘要 我们提出了一种内存高效的微调算法,可在仅一块48GB GPU上以3比特或4比特精度微调具有65B参数的大型语言模型(LLM)。我们的方法,模块化低秩适配(ModuLoRA),可将任意用户指定的权重量化器与基于低秩适配器(LoRA)的微调相结合。该方法依赖于一个简单的、与量化方式无关的反向传播过程,能够从自定义的黑盒量化模块中按需生成低精度的LLM权重。这使得首次微调3比特LLM成为可能;利用最先进的3比特OPTQ量化往往优于依赖较简单的4比特和8比特方法的微调。在实验中,ModuLoRA在文本分类、自然语言推理和指令遵循任务上以远低于现有方法的内存开销取得了有竞争力的性能,并在一个常用的摘要任务上超越了最先进的ROUGE分数。我们将ModuLoRA连同一系列低精度模型(包括首个3比特指令遵循Alpaca LLM系列)作为LLMTOOLS发布,这是一个用于在消费级GPU上量化、运行和微调LLM的用户友好库。
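The quantization-agnostic backward pass can be sketched with a custom autograd function that materializes dense weights from an opaque quantized representation only inside the matmul, while gradients flow solely into the LoRA adapters. The `dequantize` hook, the packing scheme, and the module below are hypothetical simplifications; ModuLoRA's actual integration with OPTQ and its memory management are more involved.

```python
import torch
import torch.nn as nn

class QuantizedMatmul(torch.autograd.Function):
    """Matmul against weights that live in a black-box quantized form.

    `dequantize` is any user-supplied callable returning a dense float
    weight matrix; it is called on the fly so the dense matrix is never
    stored as a parameter."""
    @staticmethod
    def forward(ctx, x, packed_weight, dequantize):
        w = dequantize(packed_weight)   # materialize low-precision weights
        ctx.save_for_backward(w)        # (a frugal variant would re-dequantize here)
        return x @ w.t()

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # No gradient for the frozen quantized weights or the quantizer itself.
        return grad_out @ w, None, None

class QuantLinearWithLoRA(nn.Module):
    def __init__(self, packed_weight, dequantize, in_f, out_f, rank=8):
        super().__init__()
        self.packed_weight = packed_weight   # opaque quantized representation
        self.dequantize = dequantize
        self.lora_a = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_f, rank))

    def forward(self, x):
        base = QuantizedMatmul.apply(x, self.packed_weight, self.dequantize)
        return base + (x @ self.lora_a.t()) @ self.lora_b.t()

# Toy usage: the "quantizer" here just stores int8 weights with a scale.
w_fp = torch.randn(64, 128)
scale = w_fp.abs().max() / 127
packed = torch.round(w_fp / scale).to(torch.int8)
layer = QuantLinearWithLoRA(packed, lambda p: p.float() * scale, 128, 64)
out = layer(torch.randn(4, 128))
out.sum().backward()   # gradients flow only into the LoRA adapters
```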

E2Net: Resource-Efficient Continual Learning with Elastic Expansion Network

  • paper_url: http://arxiv.org/abs/2309.16117
  • repo_url: https://github.com/liuruiqi0520/E2Net
  • paper_authors: RuiQi Liu, Boyu Diao, Libo Huang, Zhulin An, Yongjun Xu
  • for: 本研究旨在提出一种资源效率高的连续学习方法(Elastic Expansion Network,E2Net),以便在同等计算和存储限制下实现高平均精度和减少忘记。
  • methods: 该方法利用核心子网络精炼和准确回顾样本选择,以实现在同等计算和存储限制下的高平均精度和减少忘记。另外,我们还提出了代表网络精炼来确定核心子网络,以减少回顾缓存的依赖性和促进知识传递。
  • results: 我们在云环境(以及部分边缘环境)中的多个数据集上进行了大量实验,结果表明E2Net始终优于最先进方法,并且在存储和计算需求方面也优于竞争对手。
    Abstract Continual Learning methods are designed to learn new tasks without erasing previous knowledge. However, Continual Learning often requires massive computational power and storage capacity for satisfactory performance. In this paper, we propose a resource-efficient continual learning method called the Elastic Expansion Network (E2Net). Leveraging core subnet distillation and precise replay sample selection, E2Net achieves superior average accuracy and diminished forgetting within the same computational and storage constraints, all while minimizing processing time. In E2Net, we propose Representative Network Distillation to identify the representative core subnet by assessing parameter quantity and output similarity with the working network, distilling analogous subnets within the working network to mitigate reliance on rehearsal buffers and facilitating knowledge transfer across previous tasks. To enhance storage resource utilization, we then propose Subnet Constraint Experience Replay to optimize rehearsal efficiency through a sample storage strategy based on the structures of representative networks. Extensive experiments conducted predominantly on cloud environments with diverse datasets and also spanning the edge environment demonstrate that E2Net consistently outperforms state-of-the-art methods. In addition, our method outperforms competitors in terms of both storage and computational requirements.
    摘要 在E2Net中,我们提出了代表网络蒸馏(Representative Network Distillation),通过评估参数数量及与工作网络输出的相似度来识别具有代表性的核心子网,并对工作网络中相似的子网进行蒸馏,以减少对复练缓存的依赖并促进跨任务的知识迁移。为进一步提升存储资源利用率,我们提出了子网约束经验回放(Subnet Constraint Experience Replay),基于代表网络的结构设计样本存储策略,从而优化复练效率。在云端环境的多种数据集以及部分边缘环境上开展的大量实验表明,E2Net持续优于现有最先进方法,并且在存储和计算需求方面均优于竞争对手。
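One way to picture the representative-subnet selection is as scoring candidate subnets by how closely their outputs track the working network while penalizing parameter count. The scoring rule below is only an assumed illustration of that trade-off, not E2Net's actual criterion.

```python
import torch
import torch.nn.functional as F

def select_representative(working_net, candidates, probe_batch, alpha=1e-6):
    """Pick the candidate subnet whose outputs best match the working network.

    candidates: list of nn.Module subnets; probe_batch: a tensor of inputs.
    alpha trades output similarity against parameter count (both assumed).
    """
    with torch.no_grad():
        target = working_net(probe_batch)
        best, best_score = None, float("-inf")
        for net in candidates:
            out = net(probe_batch)
            similarity = F.cosine_similarity(
                out.flatten(1), target.flatten(1), dim=1).mean()
            n_params = sum(p.numel() for p in net.parameters())
            score = similarity - alpha * n_params
            if score > best_score:
                best, best_score = net, score
    return best
```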

Channel Vision Transformers: An Image Is Worth C x 16 x 16 Words

  • paper_url: http://arxiv.org/abs/2309.16108
  • repo_url: https://github.com/insitro/channelvit
  • paper_authors: Yujia Bao, Srinivasan Sivanandan, Theofanis Karaletsos
  • for: 这篇文章旨在对现代计算机视觉中的 Vision Transformer (ViT) 架构进行修改,使其适用于显微成像和卫星成像等领域。
  • methods: 文章提出了 ChannelViT 模型,它对每个输入通道独立构建 patch tokens,并加入与位置嵌入类似的可学习通道嵌入;此外还引入了 Hierarchical Channel Sampling (HCS) 技术,以保证模型在测试时仅有部分输入通道可用时的鲁棒性。
  • results: 实验表明,ChannelViT 在 ImageNet、JUMP-CP(显微细胞成像)和 So2Sat(卫星成像)上的分类任务中优于 ViT,并且在测试时只使用部分输入通道时仍能保持良好表现;HCS 被证明是一种与架构无关的强大正则化手段。
    Abstract Vision Transformer (ViT) has emerged as a powerful architecture in the realm of modern computer vision. However, its application in certain imaging fields, such as microscopy and satellite imaging, presents unique challenges. In these domains, images often contain multiple channels, each carrying semantically distinct and independent information. Furthermore, the model must demonstrate robustness to sparsity in input channels, as they may not be densely available during training or testing. In this paper, we propose a modification to the ViT architecture that enhances reasoning across the input channels and introduce Hierarchical Channel Sampling (HCS) as an additional regularization technique to ensure robustness when only partial channels are presented during test time. Our proposed model, ChannelViT, constructs patch tokens independently from each input channel and utilizes a learnable channel embedding that is added to the patch tokens, similar to positional embeddings. We evaluate the performance of ChannelViT on ImageNet, JUMP-CP (microscopy cell imaging), and So2Sat (satellite imaging). Our results show that ChannelViT outperforms ViT on classification tasks and generalizes well, even when a subset of input channels is used during testing. Across our experiments, HCS proves to be a powerful regularizer, independent of the architecture employed, suggesting itself as a straightforward technique for robust ViT training. Lastly, we find that ChannelViT generalizes effectively even when there is limited access to all channels during training, highlighting its potential for multi-channel imaging under real-world conditions with sparse sensors. Our code is available at https://github.com/insitro/ChannelViT.
    摘要 Vision Transformer(ViT)已成为现代计算机视觉中一种强大的架构。然而,在显微成像和卫星成像等领域应用ViT会面临独特的挑战:这些领域的图像通常包含多个通道,每个通道携带语义上彼此独立的信息;此外,模型还必须对输入通道的稀疏性保持鲁棒,因为在训练或测试时这些通道未必全部可用。本文提出了对ViT架构的一种修改,以增强跨输入通道的推理能力,并引入分层通道采样(HCS)作为额外的正则化技术,以确保测试时仅提供部分通道时模型依然鲁棒。我们提出的模型ChannelViT对每个输入通道独立构建patch tokens,并加入类似于位置嵌入的可学习通道嵌入。我们在ImageNet、JUMP-CP(显微细胞成像)和So2Sat(卫星成像)上评估了ChannelViT的性能。结果表明,ChannelViT在分类任务上优于ViT,并且即使在测试时只使用部分输入通道也能很好地泛化。在我们的实验中,HCS被证明是一种与所用架构无关的强大正则化手段,可作为鲁棒训练ViT的简单技术。最后,我们发现即便训练时无法获得全部通道,ChannelViT也能有效泛化,这凸显了它在传感器稀疏的真实多通道成像场景中的应用潜力。我们的代码可在 https://github.com/insitro/ChannelViT 获取。
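The per-channel patch tokens, learnable channel embedding, and hierarchical channel sampling can be sketched as follows. The module is a simplified stand-in with assumed shapes and no transformer backbone; the released ChannelViT code linked above is the authoritative implementation.

```python
import torch
import torch.nn as nn

class ChannelPatchEmbed(nn.Module):
    """Builds patch tokens independently per input channel and adds a
    learnable channel embedding (analogous to positional embeddings)."""

    def __init__(self, n_channels, patch=16, dim=192):
        super().__init__()
        self.proj = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        self.channel_emb = nn.Parameter(torch.zeros(n_channels, dim))

    def forward(self, x, channel_ids):
        # x: (B, C_used, H, W); channel_ids: indices of the channels present.
        B, C, H, W = x.shape
        tokens = self.proj(x.reshape(B * C, 1, H, W))        # (B*C, dim, h, w)
        tokens = tokens.flatten(2).transpose(1, 2)           # (B*C, P, dim)
        tokens = tokens.reshape(B, C, -1, tokens.size(-1))   # (B, C, P, dim)
        tokens = tokens + self.channel_emb[channel_ids][None, :, None, :]
        return tokens.reshape(B, -1, tokens.size(-1))        # (B, C*P, dim)

def hierarchical_channel_sampling(n_channels):
    """HCS-style regularizer: first sample how many channels to keep,
    then sample which ones (a sketch of the idea, not the exact scheme)."""
    k = int(torch.randint(1, n_channels + 1, (1,)))
    return torch.randperm(n_channels)[:k].sort().values

# Usage on a 5-channel microscopy-like batch.
embed = ChannelPatchEmbed(n_channels=5)
ids = hierarchical_channel_sampling(5)
x = torch.randn(2, 5, 64, 64)[:, ids]   # keep only the sampled channels
tokens = embed(x, ids)
print(tokens.shape)                     # (2, len(ids) * 16, 192)
```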

Discovering Utility-driven Interval Rules

  • paper_url: http://arxiv.org/abs/2309.16102
  • repo_url: https://github.com/asem010/legend-pice
  • paper_authors: Chunkai Zhang, Maohua Lyu, Huaijin Hao, Wensheng Gan, Philip S. Yu
  • for: 这篇论文的目的是提出一种针对区间事件序列的高效用序列规则挖掘算法,以解决现有方法无法直接应用于区间事件序列的问题。
  • methods: 该算法使用数值编码的关系表示法,以减少关系计算和存储的开销,并提出了一种结合效用上界的补集剪枝策略来缩小搜索空间。
  • results: 实验表明,该算法能够有效且高效地从区间事件序列数据库中挖掘出全部高效用区间规则(UIRs),并在真实世界和合成数据集上均取得了良好效果。
    Abstract For artificial intelligence, high-utility sequential rule mining (HUSRM) is a knowledge discovery method that can reveal the associations between events in the sequences. Recently, abundant methods have been proposed to discover high-utility sequence rules. However, the existing methods are all related to point-based sequences. Interval events that persist for some time are common. Traditional interval-event sequence knowledge discovery tasks mainly focus on pattern discovery, but patterns cannot reveal the correlation between interval events well. Moreover, the existing HUSRM algorithms cannot be directly applied to interval-event sequences since the relation in interval-event sequences is much more intricate than those in point-based sequences. In this work, we propose a utility-driven interval rule mining (UIRMiner) algorithm that can extract all utility-driven interval rules (UIRs) from the interval-event sequence database to solve the problem. In UIRMiner, we first introduce a numeric encoding relation representation, which can save much time on relation computation and storage on relation representation. Furthermore, to shrink the search space, we also propose a complement pruning strategy, which incorporates the utility upper bound with the relation. Finally, plentiful experiments implemented on both real-world and synthetic datasets verify that UIRMiner is an effective and efficient algorithm.
    摘要 In this work, we propose a utility-driven interval rule mining (UIRMiner) algorithm that extracts all utility-driven interval rules (UIRs) from an interval-event sequence database. Our approach proceeds in three parts: first, we introduce a numeric encoding relation representation, which saves substantial time in relation computation and storage; second, to shrink the search space, we propose a complement pruning strategy that incorporates the utility upper bound with the relation; finally, extensive experiments on both real-world and synthetic datasets show that UIRMiner is an effective and efficient algorithm.
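The numeric encoding of relations between interval events can be illustrated by mapping each pair of intervals to a small integer derived from endpoint comparisons, in the spirit of Allen's interval relations. The specific code table below is an assumed example; the paper's relation representation and utility upper bound are defined differently.

```python
def encode_relation(a, b):
    """Map a pair of intervals a=(s1, e1), b=(s2, e2) to an Allen-style code.

    Returns an integer in 0..6: before, meets, overlaps, starts, during,
    finishes, equals (symmetric cases omitted for brevity).
    """
    s1, e1 = a
    s2, e2 = b
    if e1 < s2:
        return 0  # a before b
    if e1 == s2:
        return 1  # a meets b
    if s1 == s2 and e1 == e2:
        return 6  # equal
    if s1 == s2 and e1 < e2:
        return 3  # a starts b
    if e1 == e2 and s1 > s2:
        return 5  # a finishes b
    if s1 > s2 and e1 < e2:
        return 4  # a during b
    return 2      # a overlaps b

print(encode_relation((1, 3), (5, 8)))   # 0: before
print(encode_relation((1, 5), (5, 8)))   # 1: meets
print(encode_relation((2, 6), (2, 9)))   # 3: starts
```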

Adversarial Examples Might be Avoidable: The Role of Data Concentration in Adversarial Robustness

  • paper_url: http://arxiv.org/abs/2309.16096
  • repo_url: None
  • paper_authors: Ambar Pal, Jeremias Sulam, René Vidal
  • for: This paper aims to investigate the question of whether adversarial examples are truly unavoidable for modern machine learning classifiers, and to provide theoretical results that demonstrate the existence of robust classifiers under certain conditions.
  • methods: The paper uses theoretical techniques to demonstrate the existence of robust classifiers for data distributions that have certain properties, such as concentration on small-volume subsets of the input space. The paper also explores the use of data structure to improve the robustness of classifiers.
  • results: The paper shows that, for certain data distributions, it is possible to construct classifiers that are robust to adversarial examples. Specifically, the paper demonstrates that, for data distributions concentrated on a union of low-dimensional linear subspaces, exploiting data structure naturally leads to classifiers that enjoy good robustness guarantees, improving upon methods for provable certification in certain regimes.
    Abstract The susceptibility of modern machine learning classifiers to adversarial examples has motivated theoretical results suggesting that these might be unavoidable. However, these results can be too general to be applicable to natural data distributions. Indeed, humans are quite robust for tasks involving vision. This apparent conflict motivates a deeper dive into the question: Are adversarial examples truly unavoidable? In this work, we theoretically demonstrate that a key property of the data distribution -- concentration on small-volume subsets of the input space -- determines whether a robust classifier exists. We further demonstrate that, for a data distribution concentrated on a union of low-dimensional linear subspaces, exploiting data structure naturally leads to classifiers that enjoy good robustness guarantees, improving upon methods for provable certification in certain regimes.
    摘要 现代机器学习分类器易受对抗样本攻击,这促使一些理论结果认为对抗样本可能是不可避免的。然而,这些结果往往过于一般化,未必适用于自然数据分布;事实上,人类在视觉任务上相当鲁棒。这一表面上的矛盾促使我们深入探究:对抗样本真的不可避免吗?在这项工作中,我们从理论上证明,数据分布的一个关键性质,即集中在输入空间中体积很小的子集上,决定了鲁棒分类器是否存在。我们进一步证明,对于集中在若干低维线性子空间之并上的数据分布,利用数据结构可以自然地得到具有良好鲁棒性保证的分类器,在某些情形下优于现有的可证明认证方法。
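The constructive side of the argument can be illustrated with a nearest-subspace classifier: classify a point by the subspace with the smallest projection residual, and since each residual is 1-Lipschitz in the input, half the gap between the two smallest residuals certifies an l2 radius within which the decision cannot change. The sketch below assumes orthonormal bases for each subspace and is meant only to convey the idea, not to reproduce the paper's constructions.

```python
import numpy as np

def subspace_classifier(bases, x):
    """Classify x by the linear subspace with the smallest projection residual.

    bases: list of (d, k_i) matrices with orthonormal columns, one per class.
    Returns (predicted class, certified l2 radius): each residual is
    1-Lipschitz in x, so half the gap between the two smallest residuals
    lower-bounds the perturbation needed to change the decision.
    """
    residuals = []
    for U in bases:
        proj = U @ (U.T @ x)
        residuals.append(np.linalg.norm(x - proj))
    order = np.argsort(residuals)
    best, runner_up = order[0], order[1]
    radius = (residuals[runner_up] - residuals[best]) / 2.0
    return int(best), radius

# Two 1-D subspaces in R^2 (the coordinate axes), orthonormal bases.
bases = [np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])]
x = np.array([2.0, 0.3])               # close to the first axis
print(subspace_classifier(bases, x))   # (0, certified radius (2.0 - 0.3) / 2)
```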

AI Potentiality and Awareness: A Position Paper from the Perspective of Human-AI Teaming in Cybersecurity

  • paper_url: http://arxiv.org/abs/2310.12162
  • repo_url: None
  • paper_authors: Iqbal H. Sarker, Helge Janicke, Nazeeruddin Mohammad, Paul Watters, Surya Nepal
  • for: 本研究探讨了人工智能在网络安全领域的潜在可能性,尤其是其可能的风险因素,通过人机合作(Human-AI)来管理这些风险。
  • methods: 本研究使用了人工智能技术(如模式识别和预测建模)探索AI在网络安全领域的可能性,并提出了一种平衡的方法,即将人类专业知识与AI计算能力相结合,以提高网络安全防御能力。
  • results: 本研究发现,通过人机合作,可以提高网络安全防御能力,并且可以减少相关的风险因素。此外,本研究还发现,AI可以帮助人类专业人员更好地理解和解决网络安全问题。
    Abstract This position paper explores the broad landscape of AI potentiality in the context of cybersecurity, with a particular emphasis on its possible risk factors with awareness, which can be managed by incorporating human experts in the loop, i.e., "Human-AI" teaming. As artificial intelligence (AI) technologies advance, they will provide unparalleled opportunities for attack identification, incident response, and recovery. However, the successful deployment of AI into cybersecurity measures necessitates an in-depth understanding of its capabilities, challenges, and ethical and legal implications to handle associated risk factors in real-world application areas. Towards this, we emphasize the importance of a balanced approach that incorporates AI's computational power with human expertise. AI systems may proactively discover vulnerabilities and detect anomalies through pattern recognition, and predictive modeling, significantly enhancing speed and accuracy. Human experts can explain AI-generated decisions to stakeholders, regulators, and end-users in critical situations, ensuring responsibility and accountability, which helps establish trust in AI-driven security solutions. Therefore, in this position paper, we argue that human-AI teaming is worthwhile in cybersecurity, in which human expertise such as intuition, critical thinking, or contextual understanding is combined with AI's computational power to improve overall cyber defenses.
    摘要 这篇立场文件探讨了人工智能在网络安全背景下的广泛潜力,并特别强调其潜在的风险因素,这些风险可以通过将人类专家纳入决策环路(即“人机协作”)来管理。随着人工智能(AI)技术的进步,它将为攻击识别、事件响应和恢复提供前所未有的机会。然而,要将AI成功部署到网络安全措施中,需要深入理解其能力、挑战以及伦理和法律影响,以便在实际应用领域中处理相关风险因素。为此,我们强调需要一种平衡的方法,将AI的计算能力与人类专业知识相结合。AI系统可以通过模式识别和预测建模主动发现漏洞并检测异常,大幅提升速度和准确性;而人类专家可以在关键情境下向利益相关方、监管者和最终用户解释AI生成的决策,确保责任与问责,从而建立对AI驱动安全解决方案的信任。因此,在这篇立场文件中,我们认为人机协作在网络安全中是值得的:将人类的直觉、批判性思维和情境理解等专业能力与AI的计算能力相结合,可以提升整体网络防御水平。

TPE: Towards Better Compositional Reasoning over Conceptual Tools with Multi-persona Collaboration

  • paper_url: http://arxiv.org/abs/2309.16090
  • repo_url: None
  • paper_authors: Hongru Wang, Huimin Wang, Lingzhi Wang, Minda Hu, Rui Wang, Boyang Xue, Hongyuan Lu, Fei Mi, Kam-Fai Wong
  • for: 这篇论文旨在扩展大语言模型(LLM)对各类概念工具的规划与使用能力,尤其是在对话系统的问答任务中。
  • methods: 论文提出了一种多人格协作框架:思考-规划-执行(TPE),将回复生成过程分解为思考者、规划者和执行者三个不同角色。
  • results: 论文在多来源(FoCus)和多策略交互(CIMA和PsyQA)等回复生成任务中验证了其有效性,这表明它可以处理更为复杂的对话交互,而不仅限于功能工具。
    Abstract Large language models (LLMs) have demonstrated exceptional performance in planning the use of various functional tools, such as calculators and retrievers, particularly in question-answering tasks. In this paper, we expand the definition of these tools, centering on conceptual tools within the context of dialogue systems. A conceptual tool specifies a cognitive concept that aids systematic or investigative thought. These conceptual tools play important roles in practice, such as multiple psychological or tutoring strategies being dynamically applied in a single turn to compose helpful responses. To further enhance the reasoning and planning capability of LLMs with these conceptual tools, we introduce a multi-persona collaboration framework: Think-Plan-Execute (TPE). This framework decouples the response generation process into three distinct roles: Thinker, Planner, and Executor. Specifically, the Thinker analyzes the internal status exhibited in the dialogue context, such as user emotions and preferences, to formulate a global guideline. The Planner then generates executable plans to call different conceptual tools (e.g., sources or strategies), while the Executor compiles all intermediate results into a coherent response. This structured approach not only enhances the explainability and controllability of responses but also reduces token redundancy. We demonstrate the effectiveness of TPE across various dialogue response generation tasks, including multi-source (FoCus) and multi-strategy interactions (CIMA and PsyQA). This reveals its potential to handle real-world dialogue interactions that require more complicated tool learning beyond just functional tools. The full code and data will be released for reproduction.
    摘要 大型语言模型(LLM)在规划使用各类功能工具(如计算器和检索器)方面表现出色,尤其是在问答任务中。在本文中,我们扩展了这些工具的定义,聚焦于对话系统语境下的概念工具。概念工具指定了有助于系统性或探究性思考的认知概念。这些概念工具在实践中发挥重要作用,例如在单个对话轮次中动态应用多种心理或教学策略来组织有帮助的回复。为了进一步增强LLM结合这些概念工具的推理和规划能力,我们引入了多人格协作框架:思考-规划-执行(TPE)。该框架将回复生成过程分解为三个不同角色:思考者、规划者和执行者。具体而言,思考者分析对话上下文中呈现的内部状态(如用户情绪和偏好),形成全局指南;规划者随后生成可执行的计划,以调用不同的概念工具(如信息来源或策略);执行者则将所有中间结果整合为一个连贯的回复。这种结构化方法不仅提高了回复的可解释性和可控性,还减少了token冗余。我们在多种对话回复生成任务上验证了TPE的有效性,包括多来源(FoCus)和多策略交互(CIMA和PsyQA),这表明它有潜力处理需要超越功能工具的更复杂工具学习的真实对话交互。完整代码和数据将公开发布以供复现。
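The Thinker-Planner-Executor decomposition can be sketched as three prompts chained around one LLM call. The `call_llm` function is a hypothetical placeholder for whatever completion API is available, and the prompts are illustrative rather than the paper's.

```python
from typing import Callable

def tpe_respond(dialogue: str, call_llm: Callable[[str], str]) -> str:
    """Three-role decomposition: Thinker -> Planner -> Executor.

    call_llm: any function mapping a prompt string to a completion string
    (e.g., a wrapper around a chat-completion endpoint); assumed here."""
    guideline = call_llm(
        "You are the Thinker. Analyze the user's emotions, preferences, and "
        f"needs in this dialogue and write a one-paragraph global guideline.\n\n{dialogue}"
    )
    plan = call_llm(
        "You are the Planner. Given the guideline below, list the conceptual "
        "tools (sources or strategies) to apply, in order, as numbered steps.\n\n"
        f"Guideline: {guideline}\n\nDialogue: {dialogue}"
    )
    response = call_llm(
        "You are the Executor. Follow the plan step by step and compile the "
        "intermediate results into one coherent reply to the user.\n\n"
        f"Plan: {plan}\n\nDialogue: {dialogue}"
    )
    return response

# Usage with a stub LLM (replace with a real model call).
print(tpe_respond("User: I failed my exam and feel awful.",
                  lambda prompt: f"[stubbed completion for: {prompt[:40]}...]"))
```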

Forgetting Private Textual Sequences in Language Models via Leave-One-Out Ensemble

  • paper_url: http://arxiv.org/abs/2309.16082
  • repo_url: None
  • paper_authors: Zhe Liu, Ozlem Kalinli
  • for: Privacy protection and model updating
  • methods: Teacher-student framework and leave-one-out ensemble method
  • results: Superior privacy-utility trade-offs on LibriSpeech and WikiText-103 datasets
    Abstract Recent research has shown that language models have a tendency to memorize rare or unique token sequences in the training corpus. After deploying a model, practitioners might be asked, at individuals' request, to delete any personal information from the model. Re-training the underlying model every time individuals would like to exercise their right to be forgotten is computationally expensive. We employ a teacher-student framework and propose a novel leave-one-out ensemble method to unlearn the targeted textual sequences that need to be forgotten from the model. In our approach, multiple teachers are trained on disjoint sets; for each targeted sequence to be removed, we exclude the teacher trained on the set containing this sequence and aggregate the predictions from the remaining teachers to provide supervision during fine-tuning. Experiments on the LibriSpeech and WikiText-103 datasets show that the proposed method achieves superior privacy-utility trade-offs compared to other counterparts.
    摘要 最近的研究发现,语言模型倾向于记忆训练语料中罕见或独特的token序列。模型部署后,实践者可能会应个人请求删除模型中的个人信息。而每当个人希望行使“被遗忘权”时都重新训练底层模型,计算代价十分高昂。我们采用教师-学生框架,并提出了一种新的留一法集成(leave-one-out ensemble)方法,用于让模型遗忘需要被删除的目标文本序列。在我们的方法中,多个教师分别在互不相交的数据子集上训练;对于每个需要删除的目标序列,我们排除在包含该序列的子集上训练的教师,并聚合其余教师的预测,作为微调过程中的监督信号。在LibriSpeech和WikiText-103数据集上的实验表明,所提方法比其他方法取得了更优的隐私-效用权衡。
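The leave-one-out aggregation itself is simple to sketch: teachers are trained on disjoint shards, and to forget a sequence the teacher trained on the shard containing it is excluded, with the remaining teachers' averaged predictions used as soft targets during fine-tuning. The `predict_proba`-style teacher interface below is an assumption; the paper applies this idea within speech and language-model training pipelines.

```python
import numpy as np

def leave_one_out_targets(teachers, shard_of, forget_shard, inputs):
    """Average predictions from every teacher except the one trained on the
    shard containing the sequence to be forgotten.

    teachers: list of models with predict_proba(inputs) -> (N, V) arrays.
    shard_of: shard index each teacher was trained on; forget_shard: shard
    holding the targeted sequence. Returns soft targets for fine-tuning.
    """
    kept = [t for t, s in zip(teachers, shard_of) if s != forget_shard]
    if not kept:
        raise ValueError("at least one teacher must remain")
    probs = np.stack([t.predict_proba(inputs) for t in kept])
    return probs.mean(axis=0)   # (N, V) soft labels for the student
```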