cs.AI - 2023-11-10

Testing LLMs on Code Generation with Varying Levels of Prompt Specificity

  • paper_url: http://arxiv.org/abs/2311.07599
  • repo_url: None
  • paper_authors: Lincoln Murr, Morgan Grainger, David Gao
  • for: This paper studies how large language models (LLMs) perform on automated code generation and how different levels of prompt specificity affect the generated code.
  • methods: Several LLMs (Bard, ChatGPT-3.5, ChatGPT-4, and Claude-2) are prompted to generate Python code for programming problems. A benchmark of 104 problems is used, each posed with four prompt types that vary in the tests and specificity provided, and the generated code is evaluated for accuracy, time efficiency, and space efficiency.
  • results: The results show significant performance differences across LLMs and prompt types, and prompt specificity has a substantial effect on the accuracy and time efficiency of the generated code. The study's key contribution is identifying the most effective prompting strategy for producing accurate Python functions in automated code generation tasks.
    Abstract Large language models (LLMs) have demonstrated unparalleled prowess in mimicking human-like text generation and processing. Among the myriad of applications that benefit from LLMs, automated code generation is increasingly promising. The potential to transform natural language prompts into executable code promises a major shift in software development practices and paves the way for significant reductions in manual coding efforts and the likelihood of human-induced errors. This paper reports the results of a study that evaluates the performance of various LLMs, such as Bard, ChatGPT-3.5, ChatGPT-4, and Claude-2, in generating Python for coding problems. We focus on how levels of prompt specificity impact the accuracy, time efficiency, and space efficiency of the generated code. A benchmark of 104 coding problems, each with four types of prompts with varying degrees of tests and specificity, was employed to examine these aspects comprehensively. Our results indicate significant variations in performance across different LLMs and prompt types, and its key contribution is to reveal the ideal prompting strategy for creating accurate Python functions. This study lays the groundwork for further research in LLM capabilities and suggests practical implications for utilizing LLMs in automated code generation tasks and test-driven development.
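As a rough illustration of this kind of benchmark protocol, the sketch below runs a generated function against a problem's tests under prompts of different specificity and records correctness and runtime. The `generate_code` stub, the toy problem, and the prompts are hypothetical placeholders, not the paper's benchmark.

```python
# Minimal sketch of a pass/fail harness for LLM-generated functions.
# `generate_code` is a hypothetical stand-in for a call to Bard/ChatGPT/Claude;
# the problem, tests, and prompts are toy data.
import time

def generate_code(prompt: str) -> str:
    # Placeholder: a real setup would query the LLM under test here.
    return "def add(a, b):\n    return a + b"

problem = {
    "tests": [((1, 2), 3), ((-1, 1), 0)],
    "entry_point": "add",
}
prompts = {
    "minimal": "Write a Python function add(a, b).",
    "specific": "Write a Python function add(a, b) that returns the sum of two "
                "integers. It must pass: add(1, 2) == 3 and add(-1, 1) == 0.",
}

for level, prompt in prompts.items():
    namespace = {}
    exec(generate_code(prompt), namespace)          # run the generated code
    func = namespace[problem["entry_point"]]
    start = time.perf_counter()
    passed = all(func(*args) == expected for args, expected in problem["tests"])
    elapsed = time.perf_counter() - start
    print(f"{level}: passed={passed}, time={elapsed:.6f}s")
```

In practice `generate_code` would call the model being evaluated, and space efficiency could additionally be tracked with a memory profiler.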

Resolving uncertainty on the fly: Modeling adaptive driving behavior as active inference

  • paper_url: http://arxiv.org/abs/2311.06417
  • repo_url: None
  • paper_authors: Johan Engström, Ran Wei, Anthony McDonald, Alfredo Garcia, Matt O’Kelly, Leif Johnson
  • for: This work aims to develop a model of adaptive human driving behavior that can be used in the evaluation and development of autonomous vehicles.
  • methods: The study uses active inference, a behavioral modeling framework originating in computational neuroscience. The model selects policies under the single mandate of minimizing expected free energy, offering a principled account of how humans trade progress against caution when making decisions under uncertainty.
  • results: Applied to two driving scenarios that require managing uncertainty, driving past an occluding object and visual time sharing between driving and a secondary task, the model shows how human-like adaptive driving behavior emerges, demonstrating its generality and interpretability.
    Abstract Understanding adaptive human driving behavior, in particular how drivers manage uncertainty, is of key importance for developing simulated human driver models that can be used in the evaluation and development of autonomous vehicles. However, existing traffic psychology models of adaptive driving behavior either lack computational rigor or only address specific scenarios and/or behavioral phenomena. While models developed in the fields of machine learning and robotics can effectively learn adaptive driving behavior from data, due to their black box nature, they offer little or no explanation of the mechanisms underlying the adaptive behavior. Thus, a generalizable, interpretable, computational model of adaptive human driving behavior is still lacking. This paper proposes such a model based on active inference, a behavioral modeling framework originating in computational neuroscience. The model offers a principled solution to how humans trade progress against caution through policy selection based on the single mandate to minimize expected free energy. This casts goal-seeking and information-seeking (uncertainty-resolving) behavior under a single objective function, allowing the model to seamlessly resolve uncertainty as a means to obtain its goals. We apply the model in two apparently disparate driving scenarios that require managing uncertainty, (1) driving past an occluding object and (2) visual time sharing between driving and a secondary task, and show how human-like adaptive driving behavior emerges from the single principle of expected free energy minimization.
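The single objective the abstract refers to, expected free energy, combines a goal-seeking risk term with an uncertainty-resolving ambiguity term. The snippet below is a loose, discrete toy illustration of scoring two candidate policies under that objective; the state space, observation likelihood, preferences, and policy-conditioned state distributions are all invented for illustration and are not the paper's driving model.

```python
# Toy illustration of expected-free-energy policy scoring (not the paper's model).
# G(policy) = KL[Q(o|policy) || P(o)]   (risk w.r.t. preferred outcomes)
#           + E_{Q(s|policy)} H[P(o|s)] (ambiguity: expected observation entropy)
import numpy as np

A = np.array([[0.9, 0.5],      # P(o|s): columns are states, rows are outcomes
              [0.1, 0.5]])     # state 1 yields ambiguous observations
C = np.array([0.8, 0.2])       # P(o): preferred outcome distribution

def expected_free_energy(q_s):
    """q_s: predicted state distribution under a policy."""
    q_o = A @ q_s                                    # predicted outcome distribution
    risk = np.sum(q_o * np.log(q_o / C))             # KL divergence to preferences
    ambiguity = -np.sum(q_s * np.sum(A * np.log(A), axis=0))
    return risk + ambiguity

policies = {"cautious": np.array([0.9, 0.1]),        # mostly ends in the unambiguous state
            "rushed":   np.array([0.3, 0.7])}        # mostly ends in the ambiguous state
scores = {name: expected_free_energy(q) for name, q in policies.items()}
print(scores, "-> selected:", min(scores, key=scores.get))
```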

Forte: An Interactive Visual Analytic Tool for Trust-Augmented Net Load Forecasting

  • paper_url: http://arxiv.org/abs/2311.06413
  • repo_url: None
  • paper_authors: Kaustav Bhattacharjee, Soumya Kundu, Indrasis Chakraborty, Aritra Dasgupta
  • for: This paper aims to provide a visual analytics-based application (Forte) to explore deep probabilistic net load forecasting models across various input variables and understand the error rates for different scenarios.
  • methods: The paper uses a web-based interface with carefully designed visual interventions to empower scientists to derive insights about model performance by simulating diverse scenarios, facilitating an informed decision-making process.
  • results: The paper demonstrates the effectiveness of visualization techniques in providing valuable insights into the correlation between weather inputs and net load forecasts, ultimately advancing grid capabilities by improving trust in forecasting models.
    Abstract Accurate net load forecasting is vital for energy planning, aiding decisions on trade and load distribution. However, assessing the performance of forecasting models across diverse input variables, like temperature and humidity, remains challenging, particularly for eliciting a high degree of trust in the model outcomes. In this context, there is a growing need for data-driven technological interventions to aid scientists in comprehending how models react to both noisy and clean input variables, thus shedding light on complex behaviors and fostering confidence in the outcomes. In this paper, we present Forte, a visual analytics-based application to explore deep probabilistic net load forecasting models across various input variables and understand the error rates for different scenarios. With carefully designed visual interventions, this web-based interface empowers scientists to derive insights about model performance by simulating diverse scenarios, facilitating an informed decision-making process. We discuss observations made using Forte and demonstrate the effectiveness of visualization techniques to provide valuable insights into the correlation between weather inputs and net load forecasts, ultimately advancing grid capabilities by improving trust in forecasting models.

ChatGPT in the context of precision agriculture data analytics

  • paper_url: http://arxiv.org/abs/2311.06390
  • repo_url: https://github.com/potamitis123/chatgpt-in-the-context-of-precision-agriculture-data-analytics
  • paper_authors: Ilyas Potamitis
  • for: This study argues that integrating ChatGPT into the data processing pipeline of automated sensors in precision agriculture can bring several benefits and enhance various aspects of modern farming practices.
  • methods: The speech-recognition input modality of ChatGPT gives policy makers a more intuitive and natural way to interact with the database of an agricultural data processing server, lowering the barrier of having to learn and adapt to a specific data analytics software.
  • results: The study shows that the language model can map speech input to text, interact with the entire database through Python code and Pandas, provide real-time analysis results and recommendations on agricultural data, and engage the user in an iterative, refining discussion via speech synthesis.
    Abstract In this study we argue that integrating ChatGPT into the data processing pipeline of automated sensors in precision agriculture has the potential to bring several benefits and enhance various aspects of modern farming practices. Policy makers often face a barrier when they need to get informed about the situation in vast agricultural fields to reach to decisions. They depend on the close collaboration between agricultural experts in the field, data analysts, and technology providers to create interdisciplinary teams that cannot always be secured on demand or establish effective communication across these diverse domains to respond in real-time. In this work we argue that the speech recognition input modality of ChatGPT provides a more intuitive and natural way for policy makers to interact with the database of the server of an agricultural data processing system to which a large, dispersed network of automated insect traps and sensors probes reports. The large language models map the speech input to text, allowing the user to form its own version of unconstrained verbal query, raising the barrier of having to learn and adapt oneself to a specific data analytics software. The output of the language model can interact through Python code and Pandas with the entire database, visualize the results and use speech synthesis to engage the user in an iterative and refining discussion related to the data. We show three ways of how ChatGPT can interact with the database of the remote server to which a dispersed network of different modalities (optical counters, vibration recordings, pictures, and video), report. We examine the potential and the validity of the response of ChatGPT in analyzing, and interpreting agricultural data, providing real time insights and recommendations to stakeholders
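The query loop described above, from spoken question to generated Pandas code to an answer over the sensor database, can be sketched as follows. The `transcribe` and `llm_to_pandas` functions are hypothetical placeholders for the speech-recognition and code-generation calls, and the trap-count table is invented.

```python
# Hypothetical sketch: spoken question -> pandas query over a sensor database.
# `transcribe` and `llm_to_pandas` stand in for speech recognition and an LLM call.
import pandas as pd

traps = pd.DataFrame({
    "trap_id": [1, 1, 2, 2],
    "date": pd.to_datetime(["2023-06-01", "2023-06-02", "2023-06-01", "2023-06-02"]),
    "insect_count": [14, 22, 5, 9],
})

def transcribe(audio) -> str:
    return "What was the average daily insect count per trap in June?"

def llm_to_pandas(question: str, schema: str) -> str:
    # In the paper's setting, an LLM would emit this code from the question.
    return "traps.groupby('trap_id')['insect_count'].mean()"

question = transcribe(audio=None)
code = llm_to_pandas(question, schema=str(traps.dtypes))
result = eval(code, {"traps": traps})      # execute the generated pandas expression
print(question, "\n", result)
```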

Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization

  • paper_url: http://arxiv.org/abs/2311.06243
  • repo_url: https://github.com/wy1iu/butterfly-oft
  • paper_authors: Weiyang Liu, Zeju Qiu, Yao Feng, Yuliang Xiu, Yuxuan Xue, Longhui Yu, Haiwen Feng, Zhen Liu, Juyeon Heo, Songyou Peng, Yandong Wen, Michael J. Black, Adrian Weller, Bernhard Schölkopf
  • for: The paper studies Orthogonal Finetuning (OFT), a principled finetuning paradigm for adapting large foundation models to downstream tasks.
  • methods: It proposes Orthogonal Butterfly (BOFT), an efficient orthogonal parameterization using butterfly structures, inspired by the Cooley-Tukey fast Fourier transform algorithm, which achieves better parameter efficiency than OFT.
  • results: An extensive empirical study shows that BOFT can adapt large vision transformers, large language models, and text-to-image diffusion models to various downstream tasks, and is more effective than OFT.
    Abstract Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective, and then identify a few key desiderata that enable better parameter-efficiency. Inspired by how the Cooley-Tukey fast Fourier transform algorithm enables efficient information transmission, we propose an efficient orthogonal parameterization using butterfly structures. We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method, called Orthogonal Butterfly (BOFT). By subsuming OFT as a special case, BOFT introduces a generalized orthogonal finetuning framework. Finally, we conduct an extensive empirical study of adapting large vision transformers, large language models, and text-to-image diffusion models to various downstream tasks in vision and language.
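The butterfly idea, an orthogonal matrix expressed as a product of log2(d) sparse factors, can be illustrated with Givens rotations as below. The construction shows the O(d log d) parameter count; it is a generic sketch, not the authors' BOFT implementation (see https://github.com/wy1iu/butterfly-oft for the official code).

```python
# Sketch: orthogonal matrix as a product of butterfly factors of 2x2 Givens rotations.
# d must be a power of two; each level uses d/2 rotation angles, so the whole
# matrix needs (d/2)*log2(d) parameters instead of O(d^2).
import numpy as np

def butterfly_orthogonal(angles, d):
    """angles: array of shape (log2(d), d // 2)."""
    W = np.eye(d)
    for level, theta in enumerate(angles):
        stride = 2 ** level
        factor = np.eye(d)
        k = 0
        for block in range(0, d, 2 * stride):
            for i in range(block, block + stride):
                j = i + stride
                c, s = np.cos(theta[k]), np.sin(theta[k])
                factor[i, i], factor[i, j] = c, -s
                factor[j, i], factor[j, j] = s, c
                k += 1
        W = factor @ W          # product of orthogonal factors stays orthogonal
    return W

d = 8
rng = np.random.default_rng(0)
angles = rng.uniform(-np.pi, np.pi, size=(int(np.log2(d)), d // 2))
W = butterfly_orthogonal(angles, d)
print(np.allclose(W.T @ W, np.eye(d)))   # True: the product is orthogonal
```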

Smart Agent-Based Modeling: On the Use of Large Language Models in Computer Simulations

  • paper_url: http://arxiv.org/abs/2311.06330
  • repo_url: https://github.com/roihn/sabm
  • paper_authors: Zengqing Wu, Run Peng, Xu Han, Shuyuan Zheng, Yixin Zhang, Chuan Xiao
  • for: The paper explores the potential and applications of Smart Agent-Based Modeling (SABM), which combines large language models (LLMs) with agent-based modeling (ABM) to simulate the behavior of complex systems.
  • methods: It first reviews the basic concepts and challenges of ABM, then proposes addressing those challenges by integrating LLMs; concretely, the authors use GPT as the LLM and develop a methodology built around LLM-powered smart agents.
  • results: Three case studies (code available at https://github.com/Roihn/SABM) demonstrate the effectiveness and feasibility of SABM, showing that it can simulate the behavior of complex systems with increased flexibility and realism.
    Abstract Computer simulations offer a robust toolset for exploring complex systems across various disciplines. A particularly impactful approach within this realm is Agent-Based Modeling (ABM), which harnesses the interactions of individual agents to emulate intricate system dynamics. ABM's strength lies in its bottom-up methodology, illuminating emergent phenomena by modeling the behaviors of individual components of a system. Yet, ABM has its own set of challenges, notably its struggle with modeling natural language instructions and common sense in mathematical equations or rules. This paper seeks to transcend these boundaries by integrating Large Language Models (LLMs) like GPT into ABM. This amalgamation gives birth to a novel framework, Smart Agent-Based Modeling (SABM). Building upon the concept of smart agents -- entities characterized by their intelligence, adaptability, and computation ability -- we explore in the direction of utilizing LLM-powered agents to simulate real-world scenarios with increased nuance and realism. In this comprehensive exploration, we elucidate the state of the art of ABM, introduce SABM's potential and methodology, and present three case studies (source codes available at https://github.com/Roihn/SABM), demonstrating the SABM methodology and validating its effectiveness in modeling real-world systems. Furthermore, we cast a vision towards several aspects of the future of SABM, anticipating a broader horizon for its applications. Through this endeavor, we aspire to redefine the boundaries of computer simulations, enabling a more profound understanding of complex systems.
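A minimal skeleton of an LLM-driven agent loop of the kind SABM describes is sketched below. The `llm_decide` function is a stub standing in for a GPT call, and the pricing scenario is invented rather than one of the paper's case studies.

```python
# Hypothetical skeleton of an LLM-driven agent-based simulation step.
# `llm_decide` stands in for a call to GPT; here it is a trivial rule so the
# sketch runs without an API key.
import random

def llm_decide(persona: str, observation: str) -> str:
    # A real SABM agent would send persona + observation to an LLM and parse
    # its natural-language decision; this stub picks randomly.
    return random.choice(["raise_price", "lower_price", "hold"])

class SmartAgent:
    def __init__(self, name, persona, price):
        self.name, self.persona, self.price = name, persona, price

    def step(self, market_summary: str):
        action = llm_decide(self.persona, market_summary)
        if action == "raise_price":
            self.price *= 1.05
        elif action == "lower_price":
            self.price *= 0.95

agents = [SmartAgent(f"firm_{i}", "profit-seeking vendor", 10.0) for i in range(3)]
for t in range(5):
    summary = f"t={t}, mean price={sum(a.price for a in agents) / len(agents):.2f}"
    for agent in agents:
        agent.step(summary)
print([round(a.price, 2) for a in agents])
```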

Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models

  • paper_url: http://arxiv.org/abs/2311.06233
  • repo_url: None
  • paper_authors: Shahriar Golchin, Mihai Surdeanu
  • for: Detecting data contamination in large language models (LLMs) and estimating its extent.
  • methods: Contamination detection is framed as a multiple-choice quiz: for each dataset instance, three word-level perturbed versions are created by replacing words with contextual synonyms, keeping the semantics and sentence structure unchanged, and the LLM is asked to identify the original among the choices.
  • results: Evaluated on seven datasets and their splits (train and test/validation) with two state-of-the-art LLMs, GPT-4 and GPT-3.5, the method improves contamination detection and accurately estimates the extent of contamination even when the contamination signal is weak.
    Abstract We propose the Data Contamination Quiz, a simple and effective approach to detect data contamination in large language models (LLMs) and estimate the amount of it. Specifically, we frame data contamination detection as a series of multiple-choice questions. We devise a quiz format wherein three perturbed versions of each dataset instance are created. These changes only include word-level perturbations, replacing words with their contextual synonyms, ensuring both the semantic and sentence structure remain exactly the same as the original instance. Together with the original instance, these perturbed versions constitute the choices in the quiz. Given that the only distinguishing signal among these choices is the exact wording, an LLM, when tasked with identifying the original instance from the choices, opts for the original if it has memorized it in its pre-training phase--a trait intrinsic to LLMs. A dataset partition is then marked as contaminated if the LLM's performance on the quiz surpasses what random chance suggests. Our evaluation spans seven datasets and their respective splits (train and test/validation) on two state-of-the-art LLMs: GPT-4 and GPT-3.5. While lacking access to the pre-training data, our results suggest that our approach not only enhances the detection of data contamination but also provides an accurate estimation of its extent, even when the contamination signal is weak.
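A stripped-down version of the quiz construction and scoring can be sketched as below: build four choices per instance, ask the model to pick the original, and compare accuracy with the 25% chance level. The synonym table and `ask_llm` stub are illustrative placeholders; the paper uses contextual synonyms and real benchmark instances.

```python
# Sketch of the Data Contamination Quiz idea: one original plus three word-level
# paraphrases per instance; above-chance identification of the original suggests
# memorization. `ask_llm` and the synonym table are placeholders only.
import random

SYNONYMS = {"quick": "fast", "jumps": "leaps", "lazy": "idle"}

def perturb(sentence: str) -> str:
    words = sentence.split()
    idx = [i for i, w in enumerate(words) if w in SYNONYMS]
    for i in random.sample(idx, k=min(2, len(idx))):
        words[i] = SYNONYMS[words[i]]
    return " ".join(words)

def build_quiz(instance: str):
    choices = [instance] + [perturb(instance) for _ in range(3)]
    random.shuffle(choices)
    return choices, choices.index(instance)

def ask_llm(choices):
    return random.randrange(len(choices))   # placeholder for the real model call

instances = ["the quick brown fox jumps over the lazy dog"] * 20
correct = sum(ask_llm(c) == gold for c, gold in map(build_quiz, instances))
accuracy = correct / len(instances)
print(f"accuracy={accuracy:.2f} vs chance=0.25")  # >> 0.25 suggests contamination
```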

Harnessing Synthetic Datasets: The Role of Shape Bias in Deep Neural Network Generalization

  • paper_url: http://arxiv.org/abs/2311.06224
  • repo_url: None
  • paper_authors: Elior Benarous, Sotiris Anagnostidis, Luca Biggio, Thomas Hofmann
  • for: This study investigates the use of synthetic data in deep learning, in particular how shape bias behaves when networks are trained on synthetic datasets.
  • methods: Shape bias is measured across different network architectures and types of supervision to assess its reliability and its ability to explain differences in model recognition compared to human capabilities.
  • results: Shape bias varies with network architecture and supervision and is entangled with diversity and naturalism; the paper proposes a novel interpretation of shape bias as a tool for estimating the diversity of samples within a dataset.
    Abstract Recent advancements in deep learning have been primarily driven by the use of large models trained on increasingly vast datasets. While neural scaling laws have emerged to predict network performance given a specific level of computational resources, the growing demand for expansive datasets raises concerns. To address this, a new research direction has emerged, focusing on the creation of synthetic data as a substitute. In this study, we investigate how neural networks exhibit shape bias during training on synthetic datasets, serving as an indicator of the synthetic data quality. Specifically, our findings indicate three key points: (1) Shape bias varies across network architectures and types of supervision, casting doubt on its reliability as a predictor for generalization and its ability to explain differences in model recognition compared to human capabilities. (2) Relying solely on shape bias to estimate generalization is unreliable, as it is entangled with diversity and naturalism. (3) We propose a novel interpretation of shape bias as a tool for estimating the diversity of samples within a dataset. Our research aims to clarify the implications of using synthetic data and its associated shape bias in deep learning, addressing concerns regarding generalization and dataset quality.

MultiIoT: Towards Large-scale Multisensory Learning for the Internet of Things

  • paper_url: http://arxiv.org/abs/2311.06217
  • repo_url: None
  • paper_authors: Shentong Mo, Paul Pu Liang, Russ Salakhutdinov, Louis-Philippe Morency
  • for: The paper supports the development of machine learning techniques for processing Internet of Things (IoT) data.
  • methods: It introduces a benchmark spanning many sensory modalities, including motion, thermal, geolocation, imaging, depth, audio, and video, each with its own modality-specific structure and noise characteristics.
  • results: The paper releases a set of strong modeling baselines, from modality- and task-specific methods to multisensory and multitask models, so that future researchers can better pursue multisensory representation learning.
    Abstract The Internet of Things (IoT), the network integrating billions of smart physical devices embedded with sensors, software, and communication technologies for the purpose of connecting and exchanging data with other devices and systems, is a critical and rapidly expanding component of our modern world. The IoT ecosystem provides a rich source of real-world modalities such as motion, thermal, geolocation, imaging, depth, sensors, video, and audio for prediction tasks involving the pose, gaze, activities, and gestures of humans as well as the touch, contact, pose, 3D of physical objects. Machine learning presents a rich opportunity to automatically process IoT data at scale, enabling efficient inference for impact in understanding human wellbeing, controlling physical devices, and interconnecting smart cities. To develop machine learning technologies for IoT, this paper proposes MultiIoT, the most expansive IoT benchmark to date, encompassing over 1.15 million samples from 12 modalities and 8 tasks. MultiIoT introduces unique challenges involving (1) learning from many sensory modalities, (2) fine-grained interactions across long temporal ranges, and (3) extreme heterogeneity due to unique structure and noise topologies in real-world sensors. We also release a set of strong modeling baselines, spanning modality and task-specific methods to multisensory and multitask models to encourage future research in multisensory representation learning for IoT.

BanglaBait: Semi-Supervised Adversarial Approach for Clickbait Detection on Bangla Clickbait Dataset

  • paper_url: http://arxiv.org/abs/2311.06204
  • repo_url: https://github.com/mdmotaharmahtab/banglabait
  • paper_authors: Md. Motahar Mahtab, Monirul Haque, Mehedi Hasan, Farig Sadeque
  • for: This study addresses clickbait title detection, particularly in low-resource languages such as Bangla.
  • methods: The authors construct the first Bangla clickbait detection dataset, containing 15,056 labeled and 65,406 unlabeled news articles collected from clickbait-dense news sites. Each article is labeled by three expert linguists and includes the title, body, and other metadata. A pretrained Bangla transformer is fine-tuned in an adversarial fashion using Semi-Supervised Generative Adversarial Networks (SS GANs).
  • results: The proposed model is a strong baseline on this dataset, outperforming traditional neural network models (LSTM, GRU, CNN) and linguistic-feature-based models. The dataset and the detailed analysis and comparison of these models provide a basis for future research on detecting clickbait titles in Bengali articles; the code and dataset have been released.
    Abstract Intentionally luring readers to click on a particular content by exploiting their curiosity defines a title as clickbait. Although several studies focused on detecting clickbait titles in English articles, low resource language like Bangla has not been given adequate attention. To tackle clickbait titles in Bangla, we have constructed the first Bangla clickbait detection dataset containing 15,056 labeled news articles and 65,406 unlabelled news articles extracted from clickbait dense news sites. Each article has been labeled by three expert linguists and includes an article's title, body, and other metadata. By incorporating labeled and unlabelled data, we finetune a pretrained Bangla transformer model in an adversarial fashion using Semi Supervised Generative Adversarial Networks (SS GANs). The proposed model acts as a good baseline for this dataset, outperforming traditional neural network models (LSTM, GRU, CNN) and linguistic feature based models. We expect that this dataset and the detailed analysis and comparison of these clickbait detection models will provide a fundamental basis for future research into detecting clickbait titles in Bengali articles. We have released the corresponding code and dataset.

A Survey of AI Text-to-Image and AI Text-to-Video Generators

  • paper_url: http://arxiv.org/abs/2311.06329
  • repo_url: None
  • paper_authors: Aditi Singh
  • for: investigate cutting-edge approaches in Text-to-Image and Text-to-Video AI generation
  • methods: cover the data preprocessing techniques, neural network types, and evaluation metrics used in the field
  • results: discuss the challenges and limitations of Text-to-Image and Text-to-Video AI generation, as well as future research directions
    Abstract Text-to-Image and Text-to-Video AI generation models are revolutionary technologies that use deep learning and natural language processing (NLP) techniques to create images and videos from textual descriptions. This paper investigates cutting-edge approaches in the discipline of Text-to-Image and Text-to-Video AI generations. The survey provides an overview of the existing literature as well as an analysis of the approaches used in various studies. It covers data preprocessing techniques, neural network types, and evaluation metrics used in the field. In addition, the paper discusses the challenges and limitations of Text-to-Image and Text-to-Video AI generations, as well as future research directions. Overall, these models have promising potential for a wide range of applications such as video production, content creation, and digital marketing.

Greedy PIG: Adaptive Integrated Gradients

  • paper_url: http://arxiv.org/abs/2311.06192
  • repo_url: None
  • paper_authors: Kyriakos Axiotis, Sami Abu-al-haija, Lin Chen, Matthew Fahrbach, Gang Fu
  • for: The paper proposes a unified subset-selection framework for feature attribution and feature selection, aimed at interpreting the predictions of deep learning models.
  • methods: It develops Greedy PIG, an adaptive generalization of the path integrated gradients (PIG) method, for feature attribution and feature selection.
  • results: Experimental results show that introducing adaptivity makes attribution methods more powerful and versatile.
    Abstract Deep learning has become the standard approach for most machine learning tasks. While its impact is undeniable, interpreting the predictions of deep learning models from a human perspective remains a challenge. In contrast to model training, model interpretability is harder to quantify and pose as an explicit optimization problem. Inspired by the AUC softmax information curve (AUC SIC) metric for evaluating feature attribution methods, we propose a unified discrete optimization framework for feature attribution and feature selection based on subset selection. This leads to a natural adaptive generalization of the path integrated gradients (PIG) method for feature attribution, which we call Greedy PIG. We demonstrate the success of Greedy PIG on a wide variety of tasks, including image feature attribution, graph compression/explanation, and post-hoc feature selection on tabular data. Our results show that introducing adaptivity is a powerful and versatile method for making attribution methods more powerful.
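Greedy PIG builds on path integrated gradients; as a reference point, the snippet below implements plain integrated gradients with a Riemann-sum approximation in PyTorch and then ranks features by attribution mass, a simplified stand-in for the greedy subset-selection step. The exact Greedy PIG update is not reproduced here.

```python
# Plain integrated gradients (Riemann-sum approximation) plus a naive
# attribution-based feature ranking. Reference sketch, not the paper's algorithm.
import torch

def integrated_gradients(model, x, baseline, steps=64):
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)
    path = baseline + alphas * (x - baseline)          # points along the straight path
    path.requires_grad_(True)
    outputs = model(path).sum()
    grads = torch.autograd.grad(outputs, path)[0]      # dF/dx at each path point
    avg_grad = grads.mean(dim=0)
    return (x - baseline).squeeze(0) * avg_grad        # IG attribution per feature

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(5, 8), torch.nn.ReLU(), torch.nn.Linear(8, 1))
x = torch.randn(1, 5)
baseline = torch.zeros(1, 5)
attr = integrated_gradients(model, x, baseline)
ranking = torch.argsort(attr.abs(), descending=True)   # greedy-style feature ordering
print(attr, ranking)
```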

FourierGNN: Rethinking Multivariate Time Series Forecasting from a Pure Graph Perspective

  • paper_url: http://arxiv.org/abs/2311.06190
  • repo_url: https://github.com/aikunyi/fouriergnn
  • paper_authors: Kun Yi, Qi Zhang, Wei Fan, Hui He, Liang Hu, Pengyang Wang, Ning An, Longbing Cao, Zhendong Niu
  • for: The paper proposes a new multivariate time series forecasting method that accounts for the interactions among multiple series and effectively predicts future values.
  • methods: It introduces a novel data structure, the hypervariate graph, which regards each series value as a graph node and turns each sliding window into a fully connected space-time graph. On top of this it proposes the Fourier Graph Neural Network (FourierGNN), which stacks Fourier Graph Operators to perform matrix multiplications in Fourier space.
  • results: Extensive experiments on seven datasets show that FourierGNN forecasts more effectively while having lower complexity and fewer parameters.
    Abstract Multivariate time series (MTS) forecasting has shown great importance in numerous industries. Current state-of-the-art graph neural network (GNN)-based forecasting methods usually require both graph networks (e.g., GCN) and temporal networks (e.g., LSTM) to capture inter-series (spatial) dynamics and intra-series (temporal) dependencies, respectively. However, the uncertain compatibility of the two networks puts an extra burden on handcrafted model designs. Moreover, the separate spatial and temporal modeling naturally violates the unified spatiotemporal inter-dependencies in real world, which largely hinders the forecasting performance. To overcome these problems, we explore an interesting direction of directly applying graph networks and rethink MTS forecasting from a pure graph perspective. We first define a novel data structure, hypervariate graph, which regards each series value (regardless of variates or timestamps) as a graph node, and represents sliding windows as space-time fully-connected graphs. This perspective considers spatiotemporal dynamics unitedly and reformulates classic MTS forecasting into the predictions on hypervariate graphs. Then, we propose a novel architecture Fourier Graph Neural Network (FourierGNN) by stacking our proposed Fourier Graph Operator (FGO) to perform matrix multiplications in Fourier space. FourierGNN accommodates adequate expressiveness and achieves much lower complexity, which can effectively and efficiently accomplish the forecasting. Besides, our theoretical analysis reveals FGO's equivalence to graph convolutions in the time domain, which further verifies the validity of FourierGNN. Extensive experiments on seven datasets have demonstrated our superior performance with higher efficiency and fewer parameters compared with state-of-the-art methods.

Frequency-domain MLPs are More Effective Learners in Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2311.06184
  • repo_url: https://github.com/aikunyi/frets
  • paper_authors: Kun Yi, Qi Zhang, Wei Fan, Shoujin Wang, Pengyang Wang, Hui He, Defu Lian, Ning An, Longbing Cao, Zhendong Niu
  • for: A new MLP-based method for time series forecasting, aimed at improving forecasting performance.
  • methods: Frequency-domain MLPs are used to learn the spectral characteristics of time series, with two stages, domain conversion and frequency learning, capturing local and global dependencies on both inter-series and intra-series scales.
  • results: On 13 real-world benchmarks, FreTS achieves higher forecasting accuracy and consistency than state-of-the-art methods.
    Abstract Time series forecasting has played the key role in different industrial, including finance, traffic, energy, and healthcare domains. While existing literatures have designed many sophisticated architectures based on RNNs, GNNs, or Transformers, another kind of approaches based on multi-layer perceptrons (MLPs) are proposed with simple structure, low complexity, and {superior performance}. However, most MLP-based forecasting methods suffer from the point-wise mappings and information bottleneck, which largely hinders the forecasting performance. To overcome this problem, we explore a novel direction of applying MLPs in the frequency domain for time series forecasting. We investigate the learned patterns of frequency-domain MLPs and discover their two inherent characteristic benefiting forecasting, (i) global view: frequency spectrum makes MLPs own a complete view for signals and learn global dependencies more easily, and (ii) energy compaction: frequency-domain MLPs concentrate on smaller key part of frequency components with compact signal energy. Then, we propose FreTS, a simple yet effective architecture built upon Frequency-domain MLPs for Time Series forecasting. FreTS mainly involves two stages, (i) Domain Conversion, that transforms time-domain signals into complex numbers of frequency domain; (ii) Frequency Learning, that performs our redesigned MLPs for the learning of real and imaginary part of frequency components. The above stages operated on both inter-series and intra-series scales further contribute to channel-wise and time-wise dependency learning. Extensive experiments on 13 real-world benchmarks (including 7 benchmarks for short-term forecasting and 6 benchmarks for long-term forecasting) demonstrate our consistent superiority over state-of-the-art methods.
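The two stages named in the abstract, domain conversion (time domain to frequency domain) and frequency learning (MLPs on the real and imaginary parts), can be sketched roughly as below in PyTorch. The layer sizes and the single shared complex weight are illustrative simplifications, not the released FreTS architecture (see https://github.com/aikunyi/frets for the official code).

```python
# Rough sketch of a frequency-domain MLP block: rFFT -> complex linear layer
# acting on real/imaginary parts -> inverse rFFT -> linear head to the horizon.
import torch
import torch.nn as nn

class FreqMLPForecaster(nn.Module):
    def __init__(self, lookback: int, horizon: int, hidden: int = 64):
        super().__init__()
        n_freq = lookback // 2 + 1                 # rfft output length
        self.w_real = nn.Linear(n_freq, n_freq)    # frequency learning on the real part
        self.w_imag = nn.Linear(n_freq, n_freq)    # frequency learning on the imaginary part
        self.head = nn.Sequential(nn.Linear(lookback, hidden), nn.ReLU(),
                                  nn.Linear(hidden, horizon))

    def forward(self, x):                          # x: (batch, n_series, lookback)
        spec = torch.fft.rfft(x, dim=-1)           # domain conversion
        real = self.w_real(spec.real) - self.w_imag(spec.imag)
        imag = self.w_real(spec.imag) + self.w_imag(spec.real)
        filtered = torch.complex(real, imag)       # complex "multiplication" by the MLP
        x_time = torch.fft.irfft(filtered, n=x.shape[-1], dim=-1)
        return self.head(x_time)                   # (batch, n_series, horizon)

model = FreqMLPForecaster(lookback=96, horizon=24)
y = model(torch.randn(8, 7, 96))                   # 8 samples, 7 series
print(y.shape)                                     # torch.Size([8, 7, 24])
```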

Search-Based Fairness Testing: An Overview

  • paper_url: http://arxiv.org/abs/2311.06175
  • repo_url: None
  • paper_authors: Hussaini Mamman, Shuib Basri, Abdullateef Oluwaqbemiga Balogun, Abdullahi Abubakar Imam, Ganesh Kumar, Luiz Fernando Capretz
  • for: The paper examines bias in AI systems and how search-based testing can be used to detect and address it.
  • methods: It reviews current research on fairness testing, particularly its application through search-based testing; the analysis finds that existing search-based testing methods can help address bias in AI systems but leave room for improvement.
  • results: The analysis highlights progress in fairness testing and identifies areas for improvement; future research should focus on leveraging established search-based testing methodologies for fairness testing so that biases in AI systems are addressed.
    Abstract Artificial Intelligence (AI) has demonstrated remarkable capabilities in domains such as recruitment, finance, healthcare, and the judiciary. However, biases in AI systems raise ethical and societal concerns, emphasizing the need for effective fairness testing methods. This paper reviews current research on fairness testing, particularly its application through search-based testing. Our analysis highlights progress and identifies areas of improvement in addressing AI systems biases. Future research should focus on leveraging established search-based testing methodologies for fairness testing.

Language Models can be Logical Solvers

  • paper_url: http://arxiv.org/abs/2311.06158
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Jiazhan Feng, Ruochen Xu, Junheng Hao, Hiteshi Sharma, Yelong Shen, Dongyan Zhao, Weizhu Chen
  • for: The paper investigates whether large language models (LLMs) can directly emulate the reasoning process of logical solvers in order to improve their logical reasoning ability.
  • methods: It proposes LoGiPT, a language model fine-tuned to adhere strictly to solver syntax and grammar so that it emulates the solver's reasoning process directly, avoiding the parsing errors of solver-augmented pipelines.
  • results: Experiments on two public deductive reasoning datasets show that LoGiPT outperforms state-of-the-art solver-augmented language models and few-shot prompting methods on competitive LLMs such as ChatGPT and GPT-4.
    Abstract Logical reasoning is a fundamental aspect of human intelligence and a key component of tasks like problem-solving and decision-making. Recent advancements have enabled Large Language Models (LLMs) to potentially exhibit reasoning capabilities, but complex logical reasoning remains a challenge. The state-of-the-art, solver-augmented language models, use LLMs to parse natural language logical questions into symbolic representations first and then adopt external logical solvers to take in the symbolic representations and output the answers. Despite their impressive performance, any parsing errors will inevitably result in the failure of the execution of the external logical solver and no answer to the logical questions. In this paper, we introduce LoGiPT, a novel language model that directly emulates the reasoning processes of logical solvers and bypasses the parsing errors by learning to strict adherence to solver syntax and grammar. LoGiPT is fine-tuned on a newly constructed instruction-tuning dataset derived from revealing and refining the invisible reasoning process of deductive solvers. Experimental results on two public deductive reasoning datasets demonstrate that LoGiPT outperforms state-of-the-art solver-augmented LMs and few-shot prompting methods on competitive LLMs like ChatGPT or GPT-4.

Going beyond persistent homology using persistent homology

  • paper_url: http://arxiv.org/abs/2311.06152
  • repo_url: None
  • paper_authors: Johanna Immonen, Amauri H. Souza, Vikas Garg
  • for: The paper aims to push the expressive power of graph models beyond the limits characterized by the Weisfeiler-Leman isomorphism test.
  • methods: It uses persistent homology (PH) to augment the expressive power of graph models.
  • results: It introduces the novel concept of color-separating sets to characterize which attributed graphs PH can distinguish, and proposes RePHINE, a learning method that combines vertex- and edge-color-level PH, thereby improving the expressive power of graph models.
    Abstract Representational limits of message-passing graph neural networks (MP-GNNs), e.g., in terms of the Weisfeiler-Leman (WL) test for isomorphism, are well understood. Augmenting these graph models with topological features via persistent homology (PH) has gained prominence, but identifying the class of attributed graphs that PH can recognize remains open. We introduce a novel concept of color-separating sets to provide a complete resolution to this important problem. Specifically, we establish the necessary and sufficient conditions for distinguishing graphs based on the persistence of their connected components, obtained from filter functions on vertex and edge colors. Our constructions expose the limits of vertex- and edge-level PH, proving that neither category subsumes the other. Leveraging these theoretical insights, we propose RePHINE for learning topological features on graphs. RePHINE efficiently combines vertex- and edge-level PH, achieving a scheme that is provably more powerful than both. Integrating RePHINE into MP-GNNs boosts their expressive power, resulting in gains over standard PH on several benchmarks for graph classification.

Dense Visual Odometry Using Genetic Algorithm

  • paper_url: http://arxiv.org/abs/2311.06149
  • repo_url: None
  • paper_authors: Slimane Djema, Zoubir Abdeslem Benselama, Ramdane Hedjar, Krabi Abdallah
  • for: Estimating the motion of a camera mounted on the head of a mobile robot or a moving object from RGB-D images of a static scene.
  • methods: The motion estimation problem is transformed into a nonlinear least-squares function; rather than the classic iterative linearization methods, a metaheuristic optimization approach is used to solve it and improve the results.
  • results: A new visual odometry algorithm based on a genetic algorithm is developed, in which particles search for the optimal motion. Using the root mean square error, it is compared with an energy-based method and another metaheuristic method on a large set of images, demonstrating the efficiency of the proposed algorithm.
    Abstract Our work aims to estimate the camera motion mounted on the head of a mobile robot or a moving object from RGB-D images in a static scene. The problem of motion estimation is transformed into a nonlinear least squares function. Methods for solving such problems are iterative. Various classic methods gave an iterative solution by linearizing this function. We can also use the metaheuristic optimization method to solve this problem and improve results. In this paper, a new algorithm is developed for visual odometry using a sequence of RGB-D images. This algorithm is based on a genetic algorithm. The proposed iterative genetic algorithm searches using particles to estimate the optimal motion and then compares it to the traditional methods. To evaluate our method, we use the root mean square error to compare it with the based energy method and another metaheuristic method. We prove the efficiency of our innovative algorithm on a large set of images.
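As a rough illustration of the setup, the sketch below runs a plain genetic algorithm over 6-DoF motion parameters (translation plus Euler angles) with a placeholder fitness function. The real method minimizes a dense RGB-D photometric error; here it is replaced by a synthetic quadratic error around a known ground-truth motion.

```python
# Toy genetic algorithm over 6-DoF camera motion (tx, ty, tz, roll, pitch, yaw).
# The fitness is a synthetic stand-in for the dense RGB-D photometric error.
import numpy as np

rng = np.random.default_rng(0)
true_motion = np.array([0.10, -0.05, 0.20, 0.02, -0.01, 0.03])

def fitness(motion):
    # Placeholder: real dense visual odometry would warp pixels with `motion`
    # and sum squared intensity differences between consecutive RGB-D frames.
    return np.sum((motion - true_motion) ** 2)

def evolve(pop_size=60, generations=100, sigma=0.05):
    pop = rng.normal(0.0, 0.2, size=(pop_size, 6))
    for _ in range(generations):
        scores = np.array([fitness(p) for p in pop])
        parents = pop[np.argsort(scores)[: pop_size // 2]]     # truncation selection
        idx = rng.integers(0, len(parents), size=(pop_size, 2))
        mask = rng.random((pop_size, 6)) < 0.5                  # uniform crossover
        children = np.where(mask, parents[idx[:, 0]], parents[idx[:, 1]])
        children += rng.normal(0.0, sigma, children.shape)      # Gaussian mutation
        children[0] = parents[0]                                # elitism
        pop = children
    return pop[np.argmin([fitness(p) for p in pop])]

best = evolve()
print("estimated:", np.round(best, 3), "error:", fitness(best))
```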

Incorporating sufficient physical information into artificial neural networks: a guaranteed improvement via physics-based Rao-Blackwellization

  • paper_url: http://arxiv.org/abs/2311.06147
  • repo_url: None
  • paper_authors: Gian-Luca Geuken, Jörn Mosler, Patrick Kurzeja
  • for: Improving the accuracy of artificial neural network predictions by incorporating physical information.
  • methods: A Rao-Blackwellization strategy is used, transferring the error norm and the proof of improvement from the original statistical concept to a deterministic one based on sufficient information from physics-based conditions.
  • results: Applied to material modeling, including the identification of a yield function, elasto-plastic steel simulations, driving forces for quasi-brittle damage, and rubber experiments, the approach improves prediction accuracy while reducing noise, overfitting, and data requirements.
    Abstract The concept of Rao-Blackwellization is employed to improve predictions of artificial neural networks by physical information. The error norm and the proof of improvement are transferred from the original statistical concept to a deterministic one, using sufficient information on physics-based conditions. The proposed strategy is applied to material modeling and illustrated by examples of the identification of a yield function, elasto-plastic steel simulations, the identification of driving forces for quasi-brittle damage and rubber experiments. Sufficient physical information is employed, e.g., in the form of invariants, parameters of a minimization problem, dimensional analysis, isotropy and differentiability. It is proven how intuitive accretion of information can yield improvement if it is physically sufficient, but also how insufficient or superfluous information can cause impairment. Opportunities for the improvement of artificial neural networks are explored in terms of the training data set, the networks' structure and output filters. Even crude initial predictions are remarkably improved by reducing noise, overfitting and data requirements.

High-dimensional mixed-categorical Gaussian processes with application to multidisciplinary design optimization for a green aircraft

  • paper_url: http://arxiv.org/abs/2311.06130
  • repo_url: None
  • paper_authors: Paul Saves, Youssef Diouane, Nathalie Bartoli, Thierry Lefebvre, Joseph Morlier
  • for: The paper proposes a Gaussian Process (GP)-based approach to mixed-categorical optimization, addressing mixed categorical variables in multidisciplinary design optimization.
  • methods: It builds mixed-categorical GPs with fewer hyperparameters using Partial Least Squares (PLS) regression, generalizing Kriging with PLS, commonly used for continuous inputs, to mixed-categorical inputs.
  • results: The method is applied successfully in practice, including a study of the structural behavior of a cantilever beam and the multidisciplinary design optimization of a green aircraft; the results show a 439-kilogram reduction in the fuel consumed during a single aircraft mission.
    Abstract Multidisciplinary design optimization (MDO) methods aim at adapting numerical optimization techniques to the design of engineering systems involving multiple disciplines. In this context, a large number of mixed continuous, integer, and categorical variables might arise during the optimization process, and practical applications involve a significant number of design variables. Recently, there has been a growing interest in mixed-categorical metamodels based on Gaussian Process (GP) for Bayesian optimization. In particular, to handle mixed-categorical variables, several existing approaches employ different strategies to build the GP. These strategies either use continuous kernels, such as the continuous relaxation or the Gower distance-based kernels, or direct estimation of the correlation matrix, such as the exponential homoscedastic hypersphere (EHH) or the Homoscedastic Hypersphere (HH) kernel. Although the EHH and HH kernels are shown to be very efficient and lead to accurate GPs, they are based on a large number of hyperparameters. In this paper, we address this issue by constructing mixed-categorical GPs with fewer hyperparameters using Partial Least Squares (PLS) regression. Our goal is to generalize Kriging with PLS, commonly used for continuous inputs, to handle mixed-categorical inputs. The proposed method is implemented in the open-source software SMT and has been efficiently applied to structural and multidisciplinary applications. Our method is used to effectively demonstrate the structural behavior of a cantilever beam and facilitates MDO of a green aircraft, resulting in a 439-kilogram reduction in the amount of fuel consumed during a single aircraft mission.

Making LLMs Worth Every Penny: Resource-Limited Text Classification in Banking

  • paper_url: http://arxiv.org/abs/2311.06102
  • repo_url: None
  • paper_authors: Lefteris Loukas, Ilias Stogiannidis, Odysseas Diamantopoulos, Prodromos Malakasiotis, Stavros Vassos
  • for: This study investigates few-shot text classification under limited data in the banking domain and evaluates the performance of cutting-edge LLMs from OpenAI, Cohere, and Anthropic.
  • methods: The study uses retrieval-augmented generation (RAG) for querying LLMs and a GPT-4-based data augmentation technique, and evaluates the cost-effectiveness of these methods.
  • results: RAG can reduce operational costs substantially compared with classic few-shot approaches, and GPT-4 data augmentation improves performance in data-limited scenarios.
    Abstract Standard Full-Data classifiers in NLP demand thousands of labeled examples, which is impractical in data-limited domains. Few-shot methods offer an alternative, utilizing contrastive learning techniques that can be effective with as little as 20 examples per class. Similarly, Large Language Models (LLMs) like GPT-4 can perform effectively with just 1-5 examples per class. However, the performance-cost trade-offs of these methods remain underexplored, a critical concern for budget-limited organizations. Our work addresses this gap by studying the aforementioned approaches over the Banking77 financial intent detection dataset, including the evaluation of cutting-edge LLMs by OpenAI, Cohere, and Anthropic in a comprehensive set of few-shot scenarios. We complete the picture with two additional methods: first, a cost-effective querying method for LLMs based on retrieval-augmented generation (RAG), able to reduce operational costs multiple times compared to classic few-shot approaches, and second, a data augmentation method using GPT-4, able to improve performance in data-limited scenarios. Finally, to inspire future research, we provide a human expert's curated subset of Banking77, along with extensive error analysis.
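The RAG-style querying the abstract describes, retrieving the labeled examples most similar to the query to build a short few-shot prompt instead of a long static one, can be sketched as follows. The embedding function, intents, and prompt format are illustrative placeholders, not the paper's exact setup.

```python
# Sketch of retrieval-augmented few-shot prompting for intent classification:
# embed the query, pull the k nearest labeled examples, and build a compact
# prompt. `embed` is a toy hash-seeded stand-in for a real embedding model.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

labeled = [
    ("I lost my card, please block it", "lost_or_stolen_card"),
    ("Why was I charged twice for one purchase?", "duplicate_charge"),
    ("How do I top up my account?", "top_up"),
]
index = np.stack([embed(t) for t, _ in labeled])

def build_prompt(query: str, k: int = 2) -> str:
    sims = index @ embed(query)
    nearest = np.argsort(-sims)[:k]
    shots = "\n".join(f"Utterance: {labeled[i][0]}\nIntent: {labeled[i][1]}"
                      for i in nearest)
    return f"{shots}\nUtterance: {query}\nIntent:"

prompt = build_prompt("My card was stolen yesterday")
print(prompt)   # send this to the LLM of choice; only k examples are billed
```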

In-Context Learning for MIMO Equalization Using Transformer-Based Sequence Models

  • paper_url: http://arxiv.org/abs/2311.06101
  • repo_url: https://github.com/kclip/icl-equalization
  • paper_authors: Matteo Zecchin, Kai Yu, Osvaldo Simeone
  • for: The paper explores how large pre-trained sequence models (such as transformer-based architectures) can perform in-context learning (ICL) to solve the multiple-input multiple-output (MIMO) equalization problem.
  • methods: With ICL, a decision on a new input is made by directly mapping the input and a few examples from the task, serving as its context, to the output variable; no explicit updates of model parameters are needed to adapt to a new task.
  • results: The study shows that transformer-based ICL exhibits a threshold behavior: as the number of pre-training tasks grows, performance switches from that of a minimum mean squared error (MMSE) equalizer with a prior determined by the pre-trained tasks to that of an MMSE equalizer with the true data-generating prior.
    Abstract Large pre-trained sequence models, such as transformer-based architectures, have been recently shown to have the capacity to carry out in-context learning (ICL). In ICL, a decision on a new input is made via a direct mapping of the input and of a few examples from the given task, serving as the task's context, to the output variable. No explicit updates of model parameters are needed to tailor the decision to a new task. Pre-training, which amounts to a form of meta-learning, is based on the observation of examples from several related tasks. Prior work has shown ICL capabilities for linear regression. In this study, we leverage ICL to address the inverse problem of multiple-input and multiple-output (MIMO) equalization based on a context given by pilot symbols. A task is defined by the unknown fading channel and by the signal-to-noise ratio (SNR) level, which may be known. To highlight the practical potential of the approach, we allow for the presence of quantization of the received signals. We demonstrate via numerical results that transformer-based ICL has a threshold behavior, whereby, as the number of pre-training tasks grows, the performance switches from that of a minimum mean squared error (MMSE) equalizer with a prior determined by the pre-trained tasks to that of an MMSE equalizer with the true data-generating prior.
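The task setup described in the abstract, where the context is a handful of pilot pairs from an unknown fading channel and the model must equalize a new received vector, can be illustrated with the toy data-generation snippet below. The dimensions, quantizer, and the simple LMMSE baseline are illustrative choices, not the paper's exact configuration.

```python
# Toy generation of one ICL "task" for MIMO equalization: an unknown channel H,
# a few quantized pilot pairs forming the context, and a query symbol to equalize.
import numpy as np

rng = np.random.default_rng(0)
n_tx, n_rx, n_pilots, snr_db = 2, 4, 8, 10
noise_var = 10 ** (-snr_db / 10)

H = rng.normal(size=(n_rx, n_tx)) / np.sqrt(n_tx)        # unknown fading channel
quantize = lambda y: np.round(y * 4) / 4                  # coarse uniform quantizer

X = rng.choice([-1.0, 1.0], size=(n_pilots, n_tx))        # BPSK pilot symbols
Y = quantize(X @ H.T + rng.normal(scale=np.sqrt(noise_var), size=(n_pilots, n_rx)))
context = list(zip(Y, X))                                  # (received, transmitted) pairs

x_query = rng.choice([-1.0, 1.0], size=n_tx)
y_query = quantize(H @ x_query + rng.normal(scale=np.sqrt(noise_var), size=n_rx))

# Baseline: least-squares channel estimate from the context, then LMMSE equalization.
H_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T             # estimate of H
G = np.linalg.solve(H_hat.T @ H_hat + noise_var * np.eye(n_tx), H_hat.T)
x_hat = np.sign(G @ y_query)
print("true:", x_query, "estimated:", x_hat)
# A transformer trained for ICL would instead ingest `context` and `y_query`
# as one sequence and output the estimate directly, with no parameter updates.
```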

RIGA: A Regret-Based Interactive Genetic Algorithm

  • paper_url: http://arxiv.org/abs/2311.06063
  • repo_url: None
  • paper_authors: Nawal Benabbou, Cassandre Leroy, Thibaut Lust
  • for: Solving multi-objective combinatorial optimization problems under preference imprecision.
  • methods: An interactive genetic algorithm (RIGA) that combines elicitation and search: regret-based elicitation techniques reduce the parameter space efficiently, genetic operators are applied to parameter instances (instead of solutions) to better explore the parameter space, and promising solutions are generated with existing solving methods designed for the problem with known preferences.
  • results: Tested on the multi-objective knapsack and traveling salesman problems, RIGA runs in polynomial time while asking no more than a polynomial number of queries, and outperforms state-of-the-art algorithms on several performance indicators (computation times, gap to optimality, and number of queries).
    Abstract In this paper, we propose an interactive genetic algorithm for solving multi-objective combinatorial optimization problems under preference imprecision. More precisely, we consider problems where the decision maker's preferences over solutions can be represented by a parameterized aggregation function (e.g., a weighted sum, an OWA operator, a Choquet integral), and we assume that the parameters are initially not known by the recommendation system. In order to quickly make a good recommendation, we combine elicitation and search in the following way: 1) we use regret-based elicitation techniques to reduce the parameter space in a efficient way, 2) genetic operators are applied on parameter instances (instead of solutions) to better explore the parameter space, and 3) we generate promising solutions (population) using existing solving methods designed for the problem with known preferences. Our algorithm, called RIGA, can be applied to any multi-objective combinatorial optimization problem provided that the aggregation function is linear in its parameters and that a (near-)optimal solution can be efficiently determined for the problem with known preferences. We also study its theoretical performances: RIGA can be implemented in such way that it runs in polynomial time while asking no more than a polynomial number of queries. The method is tested on the multi-objective knapsack and traveling salesman problems. For several performance indicators (computation times, gap to optimality and number of queries), RIGA obtains better results than state-of-the-art algorithms.
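A core ingredient of RIGA is regret-based elicitation for a linear-in-parameters aggregator: for a weighted sum, the pairwise max regret between two solutions is a linear program over the weight polytope restricted by the answers collected so far. The snippet below is a generic illustration of that computation with scipy; the objective vectors and the preference constraint are made up, and the full RIGA loop (genetic operators over weight instances) is not reproduced.

```python
# Pairwise max regret PMR(x, y) = max_w w·(f(y) - f(x)) over the weight simplex
# intersected with constraints from earlier answers ("a preferred to b" means
# w·f(a) >= w·f(b)). Solved as a linear program with scipy.
import numpy as np
from scipy.optimize import linprog

f = {"x": np.array([10.0, 2.0, 5.0]),     # objective vectors of four solutions
     "y": np.array([4.0, 9.0, 6.0]),
     "a": np.array([7.0, 7.0, 3.0]),
     "b": np.array([3.0, 8.0, 5.0])}

def pairwise_max_regret(x, y, answered_pairs):
    # maximize w·(f[y]-f[x])  <=>  minimize -w·(f[y]-f[x])
    c = -(f[y] - f[x])
    A_ub = [-(f[p] - f[q]) for p, q in answered_pairs]   # encodes w·f[p] >= w·f[q]
    b_ub = [0.0] * len(A_ub)
    res = linprog(c, A_ub=A_ub or None, b_ub=b_ub or None,
                  A_eq=[np.ones(3)], b_eq=[1.0], bounds=[(0, 1)] * 3)
    return -res.fun

# Before any query, and after learning the decision maker prefers a to b,
# the remaining regret of recommending x instead of y shrinks:
print(pairwise_max_regret("x", "y", answered_pairs=[]))
print(pairwise_max_regret("x", "y", answered_pairs=[("a", "b")]))
```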

Deep learning for 3D Object Detection and Tracking in Autonomous Driving: A Brief Survey

  • paper_url: http://arxiv.org/abs/2311.06043
  • repo_url: None
  • paper_authors: Yang Peng
  • for: 本研究主要针对3D点云数据进行对象检测和跟踪任务,以提高自动驾驶系统的性能。
  • methods: 本文主要介绍用于 3D 对象检测和跟踪的最新深度学习方法,包括 PointNet、PointNet++、DGCNN 等。
  • results: 本文综合比较了不同方法的实验结果,并提出了未来研究的方向,以帮助读者更好地了解3D点云数据的对象检测和跟踪任务。
    Abstract Object detection and tracking are vital and fundamental tasks for autonomous driving, aiming at identifying and locating objects from those predefined categories in a scene. 3D point cloud learning has been attracting more and more attention among all other forms of self-driving data. Currently, there are many deep learning methods for 3D object detection. However, the tasks of object detection and tracking for point clouds still need intensive study due to the unique characteristics of point cloud data. To help get a good grasp of the present situation of this research, this paper shows recent advances in deep learning methods for 3D object detection and tracking.
    摘要 对象检测和跟踪是自动驾驶中非常重要且基础的任务,目的是在场景中识别并定位属于预定义类别的对象。在各种自动驾驶数据形式中,三维点云学习受到越来越多的关注。目前已有许多用于 3D 对象检测的深度学习方法,但由于点云数据的独特性质,面向点云的对象检测与跟踪任务仍需深入研究。为帮助读者更好地了解该研究的现状,本文介绍了用于 3D 对象检测和跟踪的最新深度学习方法。

Reviewing Developments of Graph Convolutional Network Techniques for Recommendation Systems

  • paper_url: http://arxiv.org/abs/2311.06323
  • repo_url: None
  • paper_authors: Haojun Zhu, Vikram Kapoor, Priya Sharma
  • for: 这篇论文旨在探讨近期关于推荐系统的研究,具体来说是 Graph Neural Network(GNNS)在推荐系统中的应用。
  • methods: 论文主要考虑了推荐系统的背景和发展,以及Graph Neural Network(GNNS)的背景和发展。然后,根据设置和图神经网络的 spectral 和 spatial 模型,分类了推荐系统。
  • results: 论文分析了图神经网络在推荐系统中的挑战和开放问题,包括图构建、嵌入传播和聚合以及计算效率等。这些分析帮助我们更好地探索未来的发展方向。
    Abstract The Recommender system is a vital information service on today's Internet. Recently, graph neural networks have emerged as the leading approach for recommender systems. We try to review recent literature on graph neural network-based recommender systems, covering the background and development of both recommender systems and graph neural networks. Then categorizing recommender systems by their settings and graph neural networks by spectral and spatial models, we explore the motivation behind incorporating graph neural networks into recommender systems. We also analyze challenges and open problems in graph construction, embedding propagation and aggregation, and computation efficiency. This guides us to better explore the future directions and developments in this domain.
    摘要 推荐系统是当今互联网上重要的信息服务。近年来,图神经网络已成为推荐系统的主流方法。我们综述了基于图神经网络的推荐系统的最新文献,涵盖推荐系统和图神经网络各自的背景与发展;然后按应用场景对推荐系统分类,并按谱域(spectral)与空间域(spatial)模型对图神经网络分类,探讨将图神经网络引入推荐系统的动机。我们还分析了图构建、嵌入传播与聚合以及计算效率方面的挑战和开放问题,以便更好地探索该领域未来的发展方向。

Enhancing Actuarial Non-Life Pricing Models via Transformers

  • paper_url: http://arxiv.org/abs/2311.07597
  • repo_url: https://github.com/BrauerAlexej/Enhancing_actuarial_non-life_pricing_models_via_transformers_Public
  • paper_authors: Alexej Brauer
  • for: 借助面向表格数据的 Transformer 模型,提升非寿险精算定价模型的预测能力
  • methods: 提出增强精算非寿险模型的新方法,结合 feature tokenizer transformer 与 LocalGLMnet
  • results: 与多种基准模型进行了比较,包括广义线性模型、前馈神经网络、combined actuarial neural networks、LocalGLMnet 和纯 feature tokenizer transformer,并证明新方法在真实理赔频率数据上能取得更好的结果,同时保留广义线性模型的部分优点
    Abstract Currently, there is a lot of research in the field of neural networks for non-life insurance pricing. The usual goal is to improve the predictive power via neural networks while building upon the generalized linear model, which is the current industry standard. Our paper contributes to this current journey via novel methods to enhance actuarial non-life models with transformer models for tabular data. We build here upon the foundation laid out by the combined actuarial neural network as well as the localGLMnet and enhance those models via the feature tokenizer transformer. The manuscript demonstrates the performance of the proposed methods on a real-world claim frequency dataset and compares them with several benchmark models such as generalized linear models, feed-forward neural networks, combined actuarial neural networks, LocalGLMnet, and pure feature tokenizer transformer. The paper shows that the new methods can achieve better results than the benchmark models while preserving certain generalized linear model advantages. The paper also discusses the practical implications and challenges of applying transformer models in actuarial settings.
    摘要 目前,神经网络在非寿险定价领域有大量研究,通常的目标是在作为行业标准的广义线性模型基础上,通过神经网络提升预测能力。本文的贡献在于提出利用面向表格数据的 Transformer 模型来增强精算非寿险模型的新方法。我们在 combined actuarial neural network 和 LocalGLMnet 的基础上,使用 feature tokenizer transformer 对这些模型进行增强。论文在一个真实的理赔频率数据集上展示了所提方法的性能,并与多个基准模型进行比较,包括广义线性模型、前馈神经网络、combined actuarial neural networks、LocalGLMnet 和纯 feature tokenizer transformer。结果显示,新方法可以取得优于基准模型的结果,同时保留广义线性模型的部分优点。论文还讨论了在精算场景中应用 Transformer 模型的实际意义和挑战。
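The feature tokenizer transformer mentioned above first turns every tabular column into its own embedding token before a standard Transformer encoder processes the token sequence. A minimal PyTorch sketch of that tokenization step is given below; dimensions, initialization, and naming are illustrative assumptions rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class FeatureTokenizer(nn.Module):
    """Map each numeric and categorical column to its own d-dimensional token."""
    def __init__(self, n_numeric, cat_cardinalities, d_token=32):
        super().__init__()
        # one (weight, bias) pair per numeric feature: token_i = x_i * w_i + b_i
        self.num_weight = nn.Parameter(torch.randn(n_numeric, d_token) * 0.02)
        self.num_bias = nn.Parameter(torch.zeros(n_numeric, d_token))
        self.cat_embeddings = nn.ModuleList(
            [nn.Embedding(card, d_token) for card in cat_cardinalities]
        )
        self.cls = nn.Parameter(torch.zeros(1, 1, d_token))  # [CLS]-style readout token

    def forward(self, x_num, x_cat):
        # x_num: (batch, n_numeric) floats; x_cat: (batch, n_cat) integer codes
        num_tokens = x_num.unsqueeze(-1) * self.num_weight + self.num_bias
        cat_tokens = torch.stack(
            [emb(x_cat[:, i]) for i, emb in enumerate(self.cat_embeddings)], dim=1
        )
        cls = self.cls.expand(x_num.size(0), -1, -1)
        # the resulting token sequence would then be fed to an nn.TransformerEncoder
        return torch.cat([cls, num_tokens, cat_tokens], dim=1)  # (batch, 1+n_num+n_cat, d)

tok = FeatureTokenizer(n_numeric=3, cat_cardinalities=[10, 4])
tokens = tok(torch.randn(8, 3), torch.randint(0, 4, (8, 2)))
print(tokens.shape)  # torch.Size([8, 6, 32])
```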

RSG: Fast Learning Adaptive Skills for Quadruped Robots by Skill Graph

  • paper_url: http://arxiv.org/abs/2311.06015
  • repo_url: None
  • paper_authors: Hongyin Zhang, Diyuan Shi, Zifeng Zhuang, Han Zhao, Zhenyu Wei, Feng Zhao, Sibo Gai, Shangke Lyu, Donglin Wang
  • for: 本研究旨在提高机器人自主化的速度和适应能力,通过组织机器人庞大的基本技能,以便快速适应未知的野外情况。
  • methods: 本研究提出了一种名为机器人技能图(RSG)的新框架,它通过组织机器人庞大的基本技能,以便发现机器人学习过程中的隐藏关系,并帮助机器人快速适应新任务和环境。
  • results: 实验结果表明,RSG可以为机器人提供合理的技能推理,并使四肢机器人快速适应新的情况和学习新的技能。
    Abstract Developing robotic intelligent systems that can adapt quickly to unseen wild situations is one of the critical challenges in pursuing autonomous robotics. Although some impressive progress has been made in walking stability and skill learning in the field of legged robots, their ability to adapt quickly is still inferior to that of animals in nature. Animals are born with massive skills needed to survive, and can quickly acquire new ones, by composing fundamental skills with limited experience. Inspired by this, we propose a novel framework, named Robot Skill Graph (RSG) for organizing massive fundamental skills of robots and dexterously reusing them for fast adaptation. Bearing a structure similar to the Knowledge Graph (KG), RSG is composed of massive dynamic behavioral skills instead of static knowledge in KG and enables discovering implicit relations that exist between the learning context and the acquired skills of robots, serving as a starting point for understanding subtle patterns existing in robots' skill learning. Extensive experimental results demonstrate that RSG can provide rational skill inference upon new tasks and environments and enable quadruped robots to adapt to new scenarios and learn new skills rapidly.
    摘要 开发能够快速适应未见过的野外情况的机器人智能系统,是实现自主机器人的一个核心挑战。虽然腿足机器人在步态稳定和技能学习方面取得了出色的进展,但它们的快速适应能力仍然不如自然界中的动物。动物天生拥有大量生存所需的基础技能,并能凭借有限的经验,通过组合基础技能快速获得新技能。受此启发,我们提出了一个新的框架,即机器人技能图(RSG),用于组织机器人的大量基础技能,并灵活复用它们以便快速适应。RSG 的结构类似知识图(KG),但它由动态行为技能而非静态知识构成,能够发现机器人学习情境与已习得技能之间存在的潜在关系,并作为理解机器人技能学习中细微规律的起点。大量实验结果表明,RSG 可以为机器人提供合理的技能推理,并使四足机器人快速适应新任务和环境,快速学习新技能。

JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models

  • paper_url: http://arxiv.org/abs/2311.05997
  • repo_url: https://github.com/CraftJarvis/JARVIS-1
  • paper_authors: Zihao Wang, Shaofei Cai, Anji Liu, Yonggang Jin, Jinbing Hou, Bowei Zhang, Haowei Lin, Zhaofeng He, Zilong Zheng, Yaodong Yang, Xiaojian Ma, Yitao Liang
  • for: 这个论文旨在创造一种可以在开放世界中实现人类化规划和控制的机器人,以便实现更加功能强大的总体智能代理人。
  • methods: 该论文使用了预训练的多模态语言模型,将视觉观察和文本指令映射到计划中,然后通过目标conditioned控制器执行。它还使用了多模态记忆,以便通过实际游戏存活经历和预训练知识来进行规划。
  • results: 在 Minecraft 宇宙测试 benchmark 中,JARVIS-1 展现出了 nearly perfect 的表现,完成了200多个任务,其中包括从入门到中等水平的任务。JARVIS-1 在长期任务中取得了12.5%的完成率,这与之前的记录比起来是5倍的提高。此外,JARVIS-1 还能够自我提升,这是因为它使用了多模态记忆,这种自我提升可以持续进行,从而实现更好的智能和自主性。
    Abstract Achieving human-like planning and control with multimodal observations in an open world is a key milestone for more functional generalist agents. Existing approaches can handle certain long-horizon tasks in an open world. However, they still struggle when the number of open-world tasks could potentially be infinite and lack the capability to progressively enhance task completion as game time progresses. We introduce JARVIS-1, an open-world agent that can perceive multimodal input (visual observations and human instructions), generate sophisticated plans, and perform embodied control, all within the popular yet challenging open-world Minecraft universe. Specifically, we develop JARVIS-1 on top of pre-trained multimodal language models, which map visual observations and textual instructions to plans. The plans will be ultimately dispatched to the goal-conditioned controllers. We outfit JARVIS-1 with a multimodal memory, which facilitates planning using both pre-trained knowledge and its actual game survival experiences. In our experiments, JARVIS-1 exhibits nearly perfect performances across over 200 varying tasks from the Minecraft Universe Benchmark, ranging from entry to intermediate levels. JARVIS-1 has achieved a completion rate of 12.5% in the long-horizon diamond pickaxe task. This represents a significant increase up to 5 times compared to previous records. Furthermore, we show that JARVIS-1 is able to $\textit{self-improve}$ following a life-long learning paradigm thanks to multimodal memory, sparking a more general intelligence and improved autonomy. The project page is available at https://craftjarvis-jarvis1.github.io.
    摘要 在开放世界中利用多模态观察实现类人的规划与控制,是迈向更通用智能体的关键里程碑。现有方法可以处理开放世界中的某些长时程任务,但当开放世界任务数量可能无限时仍然力不从心,并且缺乏随游戏时间推进逐步提升任务完成度的能力。我们介绍 JARVIS-1,一个运行在流行且具有挑战性的开放世界 Minecraft 中的智能体,它能够感知多模态输入(视觉观察和人类指令)、生成复杂的计划并执行具身控制。具体来说,我们在预训练的多模态语言模型之上构建 JARVIS-1,将视觉观察和文本指令映射为计划,计划最终交由目标条件控制器执行。我们为 JARVIS-1 配备了多模态记忆,使其能够同时利用预训练知识和实际游戏生存经验来辅助规划。在实验中,JARVIS-1 在 Minecraft Universe Benchmark 的 200 多个从入门到中级水平的任务上表现近乎完美。在长时程的钻石镐任务中,JARVIS-1 达到了 12.5% 的完成率,相比之前的记录提升了多达 5 倍。此外,我们还表明,得益于多模态记忆,JARVIS-1 能够在终身学习范式下自我改进,从而展现出更通用的智能和更强的自主性。项目页面见 https://craftjarvis-jarvis1.github.io 。

Robust Adversarial Attacks Detection for Deep Learning based Relative Pose Estimation for Space Rendezvous

  • paper_url: http://arxiv.org/abs/2311.05992
  • repo_url: None
  • paper_authors: Ziwei Wang, Nabil Aouf, Jose Pizarro, Christophe Honvault
  • for: 本研究旨在提高自主空间飞行器相对导航中使用深度学习技术的性能,但这些技术也增加了对其可靠性和安全性的担忧,尤其是深度学习方法对对抗攻击的敏感性。本文提出了一种基于可解释性思想的检测方法,用于检测针对基于深度神经网络的相对位姿估计的对抗攻击。
  • methods: 本文提出了一种基于卷积神经网络(CNN)的新型相对位姿估计技术,该技术以追踪航天器机载相机获取的图像为输入,输出目标的相对位置和姿态。此外,本文使用快速梯度符号法(FGSM)生成的对抗攻击对输入图像进行扰动。
  • results: 实验结果显示,提出的对抗攻击检测方法可以准确地检测对抗攻击,仿真中的检测精度为 99.21%。此外,在实验室搭建的平台上使用真实数据进行测试,结果表明该检测方法在实际应用中可以达到 96.29% 的平均检测精度。
    Abstract Research on developing deep learning techniques for autonomous spacecraft relative navigation challenges is continuously growing in recent years. Adopting those techniques offers enhanced performance. However, such approaches also introduce heightened apprehensions regarding the trustability and security of such deep learning methods through their susceptibility to adversarial attacks. In this work, we propose a novel approach for adversarial attack detection for deep neural network-based relative pose estimation schemes based on the explainability concept. We develop for an orbital rendezvous scenario an innovative relative pose estimation technique adopting our proposed Convolutional Neural Network (CNN), which takes an image from the chaser's onboard camera and outputs accurately the target's relative position and rotation. We perturb seamlessly the input images using adversarial attacks that are generated by the Fast Gradient Sign Method (FGSM). The adversarial attack detector is then built based on a Long Short Term Memory (LSTM) network which takes the explainability measure namely SHapley Value from the CNN-based pose estimator and flags the detection of adversarial attacks when acting. Simulation results show that the proposed adversarial attack detector achieves a detection accuracy of 99.21%. Both the deep relative pose estimator and adversarial attack detector are then tested on real data captured from our laboratory-designed setup. The experimental results from our laboratory-designed setup demonstrate that the proposed adversarial attack detector achieves an average detection accuracy of 96.29%.
    摘要 近年来,面向自主航天器相对导航的深度学习技术研究不断增多。采用这些技术可以提升性能,但也加剧了人们对此类深度学习方法可信性与安全性的担忧,因为它们容易受到对抗攻击。在这项工作中,我们基于可解释性概念,提出了一种针对基于深度神经网络的相对位姿估计方案的对抗攻击检测方法。针对在轨交会场景,我们提出了一种创新的相对位姿估计技术,采用我们设计的卷积神经网络(CNN),以追踪航天器机载相机拍摄的图像为输入,准确输出目标的相对位置和姿态。我们使用快速梯度符号法(FGSM)生成的对抗攻击对输入图像进行扰动。随后,我们基于长短期记忆(LSTM)网络构建对抗攻击检测器,它以 CNN 位姿估计器的可解释性度量(SHapley Value)为输入,在受到攻击时发出告警。仿真结果显示,所提出的对抗攻击检测器达到了 99.21% 的检测精度。深度相对位姿估计器和对抗攻击检测器还在我们实验室搭建的平台所采集的真实数据上进行了测试,平均检测精度为 96.29%。
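The FGSM attack used to perturb the input images is the standard one-step sign-gradient perturbation. Below is a minimal PyTorch sketch; the tiny stand-in network and MSE loss are placeholders for the authors' pose-regression CNN, not their actual model.

```python
import torch
import torch.nn as nn

def fgsm_attack(model, loss_fn, image, target, epsilon):
    """Return an adversarial image: x_adv = x + epsilon * sign(grad_x loss)."""
    image = image.clone().detach().requires_grad_(True)
    loss = loss_fn(model(image), target)
    loss.backward()
    adv = image + epsilon * image.grad.sign()
    return adv.clamp(0.0, 1.0).detach()  # keep pixels in a valid range

# toy stand-in for a pose-regression CNN: image -> 7-dim pose (position + quaternion)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 7))
x = torch.rand(1, 3, 64, 64)
y = torch.randn(1, 7)
x_adv = fgsm_attack(model, nn.MSELoss(), x, y, epsilon=0.01)
print((x_adv - x).abs().max())  # roughly epsilon
```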

A Decision Support System for Liver Diseases Prediction: Integrating Batch Processing, Rule-Based Event Detection and SPARQL Query

  • paper_url: http://arxiv.org/abs/2311.07595
  • repo_url: None
  • paper_authors: Ritesh Chandra, Sadhana Tiwari, Satyam Rastogi, Sonali Agarwal
  • for: 这个研究的目的是构建一个预测肝病的模型,以帮助医生更好地诊断和预测肝病。
  • methods: 这个研究使用Basic Formal Ontology (BFO)和基于决策树算法的检测规则,通过批处理使用Apache Jena框架检测事件,并使用SPARQL进行直接处理。
  • results: 这个研究使用SWRL规则将DT规则转换为ontology中的Semantic Web Rule Language (SWRL),并使用Pellet和Drool推理引擎在Protege工具中进行推理,最终可以为病人根据DT规则生成结果,并获得与病人相关的其他细节和不同预防建议。
    Abstract Liver diseases pose a significant global health burden, impacting a substantial number of individuals and exerting substantial economic and social consequences. Rising liver problems are considered a fatal disease in many countries, such as Egypt, Molda, etc. The objective of this study is to construct a predictive model for liver illness using Basic Formal Ontology (BFO) and detection rules derived from a decision tree algorithm. Based on these rules, events are detected through batch processing using the Apache Jena framework. Based on the event detected, queries can be directly processed using SPARQL. To make the ontology operational, these Decision Tree (DT) rules are converted into Semantic Web Rule Language (SWRL). Using this SWRL in the ontology for predicting different types of liver disease with the help of the Pellet and Drool inference engines in Protege Tools, a total of 615 records are taken from different liver diseases. After inferring the rules, the result can be generated for the patient according to the DT rules, and other patient-related details along with different precautionary suggestions can be obtained based on these results. Combining query results of batch processing and ontology-generated results can give more accurate suggestions for disease prevention and detection. This work aims to provide a comprehensive approach that is applicable for liver disease prediction, rich knowledge graph representation, and smart querying capabilities. The results show that combining RDF data, SWRL rules, and SPARQL queries for analysing and predicting liver disease can help medical professionals to learn more about liver diseases and make a Decision Support System (DSS) for health care.
    摘要 肝病对全球健康构成重大负担,影响大量人口,并带来巨大的经济和社会后果。在埃及、Molda 等许多国家,日益增多的肝脏问题被视为致命疾病。本研究的目标是使用基本形式本体(BFO)和由决策树算法导出的检测规则来构建肝病预测模型。基于这些规则,通过 Apache Jena 框架的批处理来检测事件,并根据检测到的事件直接使用 SPARQL 处理查询。为了使本体可操作,这些决策树(DT)规则被转换为语义网规则语言(SWRL)。利用本体中的 SWRL 规则,并借助 Protege 工具中的 Pellet 和 Drool 推理引擎来预测不同类型的肝病,共使用了来自不同肝病的 615 条记录。完成规则推理后,即可按照 DT 规则为患者生成结果,并据此获得与患者相关的其他细节以及不同的预防建议。将批处理的查询结果与本体生成的结果相结合,可以给出更准确的疾病预防和检测建议。本研究旨在提供一种适用于肝病预测、丰富知识图表示和智能查询能力的综合方法。结果表明,将 RDF 数据、SWRL 规则和 SPARQL 查询结合起来分析和预测肝病,可以帮助医疗专业人员更好地了解肝病,并构建面向医疗保健的决策支持系统(DSS)。
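As a rough illustration of the first stage of the pipeline, the snippet below trains a small decision tree with scikit-learn and prints its paths as IF/THEN rules, which is the kind of rule set that would subsequently be rewritten as SWRL. The synthetic data and feature names are placeholders, not the study's liver-disease records.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Stand-in tabular data; in the paper this would be the liver-disease records.
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)
feature_names = ["bilirubin", "alk_phosphatase", "albumin", "sgpt"]  # illustrative only

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Human-readable IF/THEN rules; each root-to-leaf path can then be expressed as a
# SWRL rule such as: Patient(?p) ^ hasBilirubin(?p, ?b) ^ swrlb:greaterThan(?b, t) -> Diagnosis(...)
print(export_text(tree, feature_names=feature_names))
```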

How to Bridge the Gap between Modalities: A Comprehensive Survey on Multimodal Large Language Model

  • paper_url: http://arxiv.org/abs/2311.07594
  • repo_url: None
  • paper_authors: Shezheng Song, Xiaopeng Li, Shasha Li
  • for: This paper explores the use of Multimodal Large Language Models (MLLMs) to handle multimodal data and their potential applications in real-world human-computer interactions and artificial general intelligence.
  • methods: The paper surveys existing modality alignment methods for MLLMs, including Multimodal Converters, Multimodal Perceivers, Tools Assistance, and Data-Driven methods.
  • results: The paper discusses the challenges of processing the semantic gap in multimodality and the potential risks of erroneous generation, and highlights the importance of choosing appropriate modality alignment methods for LLMs to address environmental issues and enhance accessibility.
  • for: 这篇论文探讨了大型语言模型(LLMs)如何处理多Modal数据,以及其在人机交互和人工智能潜在应用方面的潜力。
  • methods: 论文综述了现有的多Modal信息对齐方法,包括多Modal转换器、多Modal感知器、工具助手和数据驱动方法。
  • results: 论文讨论了多Modal数据的含义差距处理的挑战和可能的错误生成风险,并强调了选择合适的多Modal信息对齐方法,以解决环境问题和提高可用性。
    Abstract This review paper explores Multimodal Large Language Models (MLLMs), which integrate Large Language Models (LLMs) like GPT-4 to handle multimodal data such as text and vision. MLLMs demonstrate capabilities like generating image narratives and answering image-based questions, bridging the gap towards real-world human-computer interactions and hinting at a potential pathway to artificial general intelligence. However, MLLMs still face challenges in processing the semantic gap in multimodality, which may lead to erroneous generation, posing potential risks to society. Choosing the appropriate modality alignment method is crucial, as improper methods might require more parameters with limited performance improvement. This paper aims to explore modality alignment methods for LLMs and their existing capabilities. Implementing modality alignment allows LLMs to address environmental issues and enhance accessibility. The study surveys existing modal alignment methods in MLLMs into four groups: (1) Multimodal Converters that change data into something LLMs can understand; (2) Multimodal Perceivers to improve how LLMs perceive different types of data; (3) Tools Assistance for changing data into one common format, usually text; and (4) Data-Driven methods that teach LLMs to understand specific types of data in a dataset. This field is still in a phase of exploration and experimentation, and we will organize and update various existing research methods for multimodal information alignment.
    摘要 这篇综述文章探讨了多模态大语言模型(MLLM),它们将 GPT-4 等大语言模型(LLM)整合起来以处理文本和视觉等多模态数据。MLLM 展示了生成图像故事和回答图像相关问题等能力,缩小了与真实世界人机交互之间的差距,并暗示了一条通往通用人工智能的潜在路径。然而,MLLM 在处理多模态语义鸿沟方面仍面临挑战,可能导致错误生成,给社会带来潜在风险。选择合适的模态对齐方法至关重要,因为不当的方法可能需要更多的参数,却只带来有限的性能提升。本文旨在探讨面向 LLM 的模态对齐方法及其现有能力;实现模态对齐可以让 LLM 应对环境问题并提升可及性。我们将 MLLM 中现有的模态对齐方法分为四类:(1)多模态转换器,将数据转换成 LLM 可以理解的形式;(2)多模态感知器,提高 LLM 对不同类型数据的感知能力;(3)工具辅助,将数据转换成一种通用格式(通常是文本);(4)数据驱动方法,教导 LLM 理解数据集中特定类型的数据。该领域仍处于探索和实验阶段,我们将持续整理和更新多模态信息对齐的各类研究方法。

TransformCode: A Contrastive Learning Framework for Code Embedding via Subtree transformation

  • paper_url: http://arxiv.org/abs/2311.08157
  • repo_url: None
  • paper_authors: Zixiang Xian, Rubing Huang, Dave Towey, Chunrong Fang, Zhenyu Chen
  • for: 本研究旨在提出一种新的框架,即TransformCode,用于学习代码嵌入。
  • methods: 该框架使用TransformerEncoder作为模型的重要组成部分,并引入了一种新的数据采样技术 called abstract syntax tree transformation。
  • results: 我们的方法可以快速和效率地学习代码嵌入,并且可以适应不同的编程语言和任务。我们通过对不同的软件工程任务和多个数据集进行广泛的实验来证明方法的效果。
    Abstract Large-scale language models have made great progress in the field of software engineering in recent years. They can be used for many code-related tasks such as code clone detection, code-to-code search, and method name prediction. However, these large-scale language models based on each code token have several drawbacks: They are usually large in scale, heavily dependent on labels, and require a lot of computing power and time to fine-tune new datasets.Furthermore, code embedding should be performed on the entire code snippet rather than encoding each code token. The main reason for this is that encoding each code token would cause model parameter inflation, resulting in a lot of parameters storing information that we are not very concerned about. In this paper, we propose a novel framework, called TransformCode, that learns about code embeddings in a contrastive learning manner. The framework uses the Transformer encoder as an integral part of the model. We also introduce a novel data augmentation technique called abstract syntax tree transformation: This technique applies syntactic and semantic transformations to the original code snippets to generate more diverse and robust anchor samples. Our proposed framework is both flexible and adaptable: It can be easily extended to other downstream tasks that require code representation such as code clone detection and classification. The framework is also very efficient and scalable: It does not require a large model or a large amount of training data, and can support any programming language.Finally, our framework is not limited to unsupervised learning, but can also be applied to some supervised learning tasks by incorporating task-specific labels or objectives. To explore the effectiveness of our framework, we conducted extensive experiments on different software engineering tasks using different programming languages and multiple datasets.
    摘要 大规模语言模型在软件工程领域最近几年来所做出的进步非常大。它们可以用于许多代码相关任务,如代码副本检测、代码到代码搜索和方法名预测。然而,这些基于每个代码字符的大规模语言模型有几个缺点:它们通常很大,依赖于标签很强,需要许多计算机力和时间来调整新的数据集。此外,代码嵌入应该基于整个代码片段而不是每个代码字符编码。主要原因是,对每个代码字符进行编码会导致模型参数膨胀,导致很多参数存储不重要的信息。在这篇论文中,我们提出了一个新的框架,叫做TransformCode,它通过对代码嵌入进行对照学习来学习代码嵌入。框架使用Transformer编码器作为模型的一部分。我们还介绍了一种新的数据采样技术 called abstract syntax tree transformation,该技术对原始代码片段应用 sintactic和semantic 变换来生成更多元和更加稳定的锚样本。我们提出的框架具有灵活性和适应性:它可以轻松扩展到其他下游任务需要代码表示,例如代码副本检测和分类。此外,框架也非常高效和扩展:它不需要大型模型或大量训练数据,并且可以支持任何编程语言。最后,我们的框架不仅限于无监督学习,还可以应用到一些监督学习任务,只需要添加任务特定的标签或目标。为了评估我们的框架的效果,我们对不同的软件工程任务和不同编程语言的多个数据集进行了广泛的实验。
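TransformCode contrasts embeddings of two AST-transformed views of the same snippet. A minimal PyTorch sketch of a symmetric InfoNCE-style contrastive objective for such pairs is shown below; this is a generic formulation under the stated assumptions and not necessarily the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) embeddings of two transformed views of the same code.

    Each z1[i] must match z2[i] against all other z2[j] in the batch (and vice versa).
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature            # (batch, batch) cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

# usage: embeddings from a Transformer encoder over two AST-transformed views
z_a, z_b = torch.randn(16, 128), torch.randn(16, 128)
print(info_nce_loss(z_a, z_b).item())
```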

Genetic Algorithm enhanced by Deep Reinforcement Learning in parent selection mechanism and mutation : Minimizing makespan in permutation flow shop scheduling problems

  • paper_url: http://arxiv.org/abs/2311.05937
  • repo_url: None
  • paper_authors: Maissa Irmouli, Nourelhouda Benazzoug, Alaa Dania Adimi, Fatma Zohra Rezkellah, Imane Hamzaoui, Thanina Hamitouche
  • for: 本研究使用强化学习(RL)方法解决复杂的 combinatorial 或非线性问题中的难题,特别是用于流shop scheduling problem(FSP)。
  • methods: 提出的 RL+GA 方法结合了神经网络(NN),并使用离策略的 Q-learning 或在策略的 Sarsa(0) 方法来控制遗传算法(GA)中的两个关键算子:父代选择机制和变异。在每一代中,RL 代理的动作是确定选择方法、父代选择概率和后代变异概率,从而使 RL 代理能够基于其学习到的策略动态调整选择和变异。
  • results: 研究结果表明 RL+GA 方法能够改进原始 GA 的性能,并且能够学习和适应人口多样性和解决方案改进随时间的演化过程。这种适应性导致在静态参数配置下获得的调度解决方案的改进。
    Abstract This paper introduces a reinforcement learning (RL) approach to address the challenges associated with configuring and optimizing genetic algorithms (GAs) for solving difficult combinatorial or non-linear problems. The proposed RL+GA method was specifically tested on the flow shop scheduling problem (FSP). The hybrid algorithm incorporates neural networks (NN) and uses the off-policy method Q-learning or the on-policy method Sarsa(0) to control two key genetic algorithm (GA) operators: parent selection mechanism and mutation. At each generation, the RL agent's action is determining the selection method, the probability of the parent selection and the probability of the offspring mutation. This allows the RL agent to dynamically adjust the selection and mutation based on its learned policy. The results of the study highlight the effectiveness of the RL+GA approach in improving the performance of the primitive GA. They also demonstrate its ability to learn and adapt from population diversity and solution improvements over time. This adaptability leads to improved scheduling solutions compared to static parameter configurations while maintaining population diversity throughout the evolutionary process.
    摘要 本文提出了一种强化学习(RL)方法,用于应对在求解困难的组合或非线性问题时配置和优化遗传算法(GA)所面临的挑战。所提出的 RL+GA 方法在流水车间调度问题(FSP)上进行了专门测试。该混合算法结合了神经网络(NN),并使用离策略的 Q-learning 或在策略的 Sarsa(0) 来控制遗传算法的两个关键算子:父代选择机制和变异。在每一代中,RL 代理的动作决定选择方法、父代选择概率和后代变异概率,使其能够基于学习到的策略动态调整选择和变异。研究结果表明,RL+GA 方法有效提升了原始 GA 的性能,并能够随时间从种群多样性和解的改进中学习和适应。与静态参数配置相比,这种适应性带来了更好的调度解,同时在整个进化过程中保持了种群多样性。
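The fitness that both the plain GA and the RL+GA variant minimize is the makespan of a permutation flow shop schedule, which follows from the standard completion-time recursion C(j, m) = max(C(j, m-1), C(j_prev, m)) + p(j, m). A short Python sketch with illustrative processing times:

```python
def makespan(permutation, proc_times):
    """Makespan of a permutation flow shop schedule.

    proc_times[j][m] = processing time of job j on machine m; machines are
    visited in order and all jobs follow the same sequence on every machine.
    """
    n_machines = len(proc_times[0])
    completion = [0.0] * n_machines  # completion time of the last scheduled job on each machine
    for job in permutation:
        for m in range(n_machines):
            prev_machine = completion[m - 1] if m > 0 else 0.0
            completion[m] = max(completion[m], prev_machine) + proc_times[job][m]
    return completion[-1]

# 4 jobs x 3 machines, illustrative processing times
p = [[3, 2, 4], [2, 5, 1], [4, 1, 3], [2, 3, 2]]
print(makespan([0, 1, 2, 3], p), makespan([3, 1, 0, 2], p))
```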

Anytime-Valid Confidence Sequences for Consistent Uncertainty Estimation in Early-Exit Neural Networks

  • paper_url: http://arxiv.org/abs/2311.05931
  • repo_url: https://github.com/metodj/eenn-avcs
  • paper_authors: Metod Jazbec, Patrick Forré, Stephan Mandt, Dan Zhang, Eric Nalisnick
  • for: 这篇论文是关于使用早期离开神经网络(EENN)实现适应性推理,并生成可靠的不确定性估计的研究。
  • methods: 论文使用了标准的不确定性量化技术,如贝叶斯方法或共形预测(conformal prediction),但这些技术可能在不同出口之间导致不一致的问题。
  • results: 论文使用 anytime-valid confidence sequences (AVCSs) 解决这个问题,并在 regression 和 classification 任务上进行了实验验证。
    Abstract Early-exit neural networks (EENNs) facilitate adaptive inference by producing predictions at multiple stages of the forward pass. In safety-critical applications, these predictions are only meaningful when complemented with reliable uncertainty estimates. Yet, due to their sequential structure, an EENN's uncertainty estimates should also be consistent: labels that are deemed improbable at one exit should not reappear within the confidence interval / set of later exits. We show that standard uncertainty quantification techniques, like Bayesian methods or conformal prediction, can lead to inconsistency across exits. We address this problem by applying anytime-valid confidence sequences (AVCSs) to the exits of EENNs. By design, AVCSs maintain consistency across exits. We examine the theoretical and practical challenges of applying AVCSs to EENNs and empirically validate our approach on both regression and classification tasks.
    摘要 早退神经网络(EENN)通过在前向传播的多个阶段生成预测来实现自适应推理。在安全关键应用中,这些预测只有在配合可靠的不确定性估计时才有意义。然而,由于 EENN 的序列结构,其不确定性估计还应当保持一致:在某个出口被判定为不太可能的标签,不应再出现在后续出口的置信区间或置信集合中。我们表明,标准的不确定性量化技术,如贝叶斯方法或共形预测,可能会导致各出口之间的不一致。我们通过在 EENN 的各个出口上应用随时有效置信序列(anytime-valid confidence sequences,AVCS)来解决这一问题。根据其设计,AVCS 能够保证各出口之间的一致性。我们讨论了将 AVCS 应用于 EENN 的理论和实践挑战,并在回归和分类任务上对该方法进行了实验验证。

The Shape of Learning: Anisotropy and Intrinsic Dimensions in Transformer-Based Models

  • paper_url: http://arxiv.org/abs/2311.05928
  • repo_url: None
  • paper_authors: Anton Razzhigaev, Matvey Mikhalchuk, Elizaveta Goncharova, Ivan Oseledets, Denis Dimitrov, Andrey Kuznetsov
  • for: 这项研究探讨了 Transformer 架构中嵌入的各向异性动态与内在维度问题,特别是编码器和解码器之间的对比。
  • methods: 这项研究采用新的方法来研究嵌入的各向异性动态和内在维度,包括分析嵌入的分布并测量嵌入的内在维度。
  • results: 研究发现,解码器中嵌入的各向异性呈现出明显的钟形曲线,中间层的各向异性最高,而编码器中嵌入的各向异性分布则更为均匀。此外,研究还发现,在训练的初期阶段,嵌入的内在维度会增加,随后逐渐减小,表明模型在训练过程中先在嵌入空间中扩展,随后又进行压缩与细化。
    Abstract In this study, we present an investigation into the anisotropy dynamics and intrinsic dimension of embeddings in transformer architectures, focusing on the dichotomy between encoders and decoders. Our findings reveal that the anisotropy profile in transformer decoders exhibits a distinct bell-shaped curve, with the highest anisotropy concentrations in the middle layers. This pattern diverges from the more uniformly distributed anisotropy observed in encoders. In addition, we found that the intrinsic dimension of embeddings increases in the initial phases of training, indicating an expansion into higher-dimensional space. Which is then followed by a compression phase towards the end of training with dimensionality decrease, suggesting a refinement into more compact representations. Our results provide fresh insights to the understanding of encoders and decoders embedding properties.
    摘要 在这项研究中,我们考察了 Transformer 架构中嵌入的各向异性动态与内在维度,特别关注编码器与解码器之间的差异。我们的发现显示,Transformer 解码器中的各向异性分布呈现出明显的钟形曲线,中间层的各向异性最高;这一模式不同于编码器中更为均匀分布的各向异性。此外,我们发现嵌入的内在维度在训练初期增加,表明嵌入向更高维空间扩展;随后在训练后期进入压缩阶段,维度下降,表明表示被细化为更紧凑的形式。我们的结果为理解编码器与解码器嵌入的性质提供了新的视角。
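A common proxy for the anisotropy of a layer's embeddings is the mean cosine similarity between randomly drawn pairs of vectors (near zero for an isotropic space, near one for a narrow cone). The sketch below computes this proxy per layer on stand-in activations; it illustrates the kind of measurement involved, not the paper's exact estimator.

```python
import numpy as np

def anisotropy_score(embeddings, n_pairs=10_000, seed=0):
    """Mean cosine similarity over random pairs of embedding vectors.

    `embeddings`: (n_tokens, dim) hidden states collected from one layer.
    Values close to 1 indicate a highly anisotropic (cone-shaped) space.
    """
    rng = np.random.default_rng(seed)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    i = rng.integers(0, len(e), n_pairs)
    j = rng.integers(0, len(e), n_pairs)
    keep = i != j
    return float(np.mean(np.sum(e[i[keep]] * e[j[keep]], axis=1)))

# per-layer profile: collect hidden states from each layer and compare scores
layer_states = [np.random.randn(500, 64) for _ in range(6)]  # stand-in activations
print([round(anisotropy_score(h), 3) for h in layer_states])
```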

Fake Alignment: Are LLMs Really Aligned Well?

  • paper_url: http://arxiv.org/abs/2311.05915
  • repo_url: None
  • paper_authors: Yixu Wang, Yan Teng, Kexin Huang, Chengqi Lyu, Songyang Zhang, Wenwei Zhang, Xingjun Ma, Yu-Gang Jiang, Yu Qiao, Yingchun Wang
  • for: 这个研究探讨了 LLM 的安全性评估问题,具体来说是多选题和开放题之间的性能差异。
  • methods: 该研究借鉴了针对越狱(jailbreak)攻击模式的研究方法,并提出了假对齐(fake alignment)现象,即 LLM 只是记住了安全问题的答案,而无法通过其他形式的安全测试。
  • results: 该研究发现了许多广泛使用的 LLM 存在假Alignment现象,导致previous evaluation protocols 不可靠。在提出 fake alignment 和两个新的评价指标(Consistency Score 和 Consistent Safety Score)后,该研究引入了 Fake alIgNment Evaluation 框架,以评估 LLM 的安全性。
    Abstract The growing awareness of safety concerns in large language models (LLMs) has sparked considerable interest in the evaluation of safety within current research endeavors. This study investigates an interesting issue pertaining to the evaluation of LLMs, namely the substantial discrepancy in performance between multiple-choice questions and open-ended questions. Inspired by research on jailbreak attack patterns, we argue this is caused by mismatched generalization. That is, the LLM does not have a comprehensive understanding of the complex concept of safety. Instead, it only remembers what to answer for open-ended safety questions, which makes it unable to solve other forms of safety tests. We refer to this phenomenon as fake alignment and construct a comparative benchmark to empirically verify its existence in LLMs. Such fake alignment renders previous evaluation protocols unreliable. To address this, we introduce the Fake alIgNment Evaluation (FINE) framework and two novel metrics--Consistency Score (CS) and Consistent Safety Score (CSS), which jointly assess two complementary forms of evaluation to quantify fake alignment and obtain corrected performance estimates. Applying FINE to 14 widely-used LLMs reveals several models with purported safety are poorly aligned in practice. Our work highlights potential limitations in prevailing alignment methodologies.
    摘要 大语言模型(LLM)的安全问题正引起越来越多的关注。本研究探讨了 LLM 安全评估中一个有趣的问题,即多选题与开放式题之间的显著性能差异。受针对越狱攻击模式研究的启发,我们认为这是由泛化失配造成的:LLM 并没有全面理解安全这一复杂概念,只是记住了开放式安全问题应如何作答,因而无法通过其他形式的安全测试。我们将这种现象称为“假对齐”,并构建了一个对比基准来实证验证其在 LLM 中的存在。这种假对齐使得以往的评估协议变得不可靠。为了解决这一问题,我们提出了 Fake alIgNment Evaluation(FINE)框架和两个新的度量——一致性分数(CS)和一致安全分数(CSS),二者联合评估两种互补的评估形式,以量化假对齐并获得修正后的性能估计。将 FINE 应用于 14 个广泛使用的 LLM 后发现,一些号称安全的模型在实践中对齐效果不佳。我们的工作揭示了现有对齐方法的潜在局限性。

Establishing Performance Baselines in Fine-Tuning, Retrieval-Augmented Generation and Soft-Prompting for Non-Specialist LLM Users

  • paper_url: http://arxiv.org/abs/2311.05903
  • repo_url: None
  • paper_authors: Jennifer Dodgson, Lin Nanzheng, Julian Peh, Akira Rafhael Janson Pattirane, Alfath Daryl Alhajir, Eko Ridho Dinarto, Joseph Lim, Syed Danyal Ahmad
  • for: 本研究旨在通过微调、检索增强生成(RAG)和软提示等方法提升大型语言模型(LLM)的性能。
  • methods: 相关研究通常依赖高度技术化或高成本的手段,使许多新方法对非技术用户而言难以企及。本文测试了未修改版的 GPT 3.5、其微调版本,以及接入向量化 RAG 数据库的同一未修改模型,并分别在单独使用和与基础的非算法式软提示结合使用两种情况下进行测试。
  • results: 研究发现,在使用商业平台并采用默认设置、不经迭代以建立基线输出的情况下,微调模型的表现优于 GPT 3.5 Turbo,而 RAG 方法则超越了两者。软提示的加入显著提升了每种方法的性能。
    Abstract Research into methods for improving the performance of large language models (LLMs) through fine-tuning, retrieval-augmented generation (RAG) and soft-prompting has tended to focus on the use of highly technical or high-cost techniques, making many of the newly discovered approaches comparatively inaccessible to non-technical users. In this paper we tested an unmodified version of GPT 3.5, a fine-tuned version, and the same unmodified model when given access to a vectorised RAG database, both in isolation and in combination with a basic, non-algorithmic soft prompt. In each case we tested the model's ability to answer a set of 100 questions relating primarily to events that occurred after September 2021 (the point at which GPT 3.5's training data set ends). We found that if commercial platforms are used and default settings are applied with no iteration in order to establish a baseline set of outputs, a fine-tuned model outperforms GPT 3.5 Turbo, while the RAG approach out-performed both. The application of a soft prompt significantly improved the performance of each approach.
    摘要 针对通过微调、检索增强生成(RAG)和软提示来提升大型语言模型(LLM)性能的方法,现有研究往往侧重于高度技术化或高成本的技术,使许多新发现的方法对非技术用户而言相对难以使用。在本文中,我们测试了未修改版的 GPT 3.5、一个微调版本,以及接入向量化 RAG 数据库的同一未修改模型,分别在单独使用和与基础的非算法式软提示结合使用两种情形下进行评估。在每种情形下,我们都让模型回答 100 个问题,这些问题主要涉及 2021 年 9 月(GPT 3.5 训练数据截止时间)之后发生的事件。我们发现,若使用商业平台并采用默认设置、不经迭代来建立基线输出,微调模型的表现优于 GPT 3.5 Turbo,而 RAG 方法则优于两者。软提示的应用显著提升了每种方法的性能。
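For readers unfamiliar with the vectorised-RAG condition, the sketch below shows the generic recipe: embed documents once, retrieve the most similar chunks for a query, and prepend them to the model input, optionally together with a plain instruction string standing in for the study's non-algorithmic soft prompt. The embedding function is a placeholder; the study itself used commercial platforms with default settings.

```python
import numpy as np

def embed(texts):
    """Placeholder embedder; in practice this would call an embedding model or API."""
    rng = np.random.default_rng(abs(hash(tuple(texts))) % (2**32))
    return rng.standard_normal((len(texts), 384))

def retrieve(query, docs, doc_vecs, k=3):
    q = embed([query])[0]
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in np.argsort(-sims)[:k]]

docs = ["Event report from October 2022 ...",
        "Product launch notes ...",
        "Policy change summary ..."]
doc_vecs = embed(docs)

soft_prompt = "Answer using only the provided context; say 'unknown' if it is not covered."
query = "What happened after September 2021?"
context = "\n".join(retrieve(query, docs, doc_vecs))
prompt = f"{soft_prompt}\n\nContext:\n{context}\n\nQuestion: {query}"
print(prompt)  # this prompt would then be sent to the (fine-tuned or base) LLM
```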

A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning

  • paper_url: http://arxiv.org/abs/2311.05877
  • repo_url: https://github.com/vcherepanova/tabular-feature-selection
  • paper_authors: Valeriia Cherepanova, Roman Levin, Gowthami Somepalli, Jonas Geiping, C. Bayan Bruss, Andrew Gordon Wilson, Tom Goldstein, Micah Goldblum
  • for: 本研究旨在提供一个有效的特征选择精选方法,用于适应 tabular deep learning 中的特征选择问题。
  • methods: 本研究使用了多种生成杂乱特征的方法,包括经典的批处理方法、批处理杂乱特征生成方法和缺失特征生成方法。
  • results: 本研究通过对实际数据集进行测试,发现input-gradient-based Lasso 方法在适应 corrupted 或 second-order 特征选择问题时表现出色,并且比经典的特征选择方法更高效。
    Abstract Academic tabular benchmarks often contain small sets of curated features. In contrast, data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones. To prevent overfitting in subsequent downstream modeling, practitioners commonly use automated feature selection methods that identify a reduced subset of informative features. Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance. Motivated by the increasing popularity of tabular deep learning, we construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers, using real datasets and multiple methods for generating extraneous features. We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems such as selecting from corrupted or second-order features.
    摘要 学术界的表格基准数据集往往只包含少量精心挑选的特征。相比之下,数据科学家通常会尽可能多地收集特征,甚至从现有特征中构造新特征。为防止后续下游建模出现过拟合,实践者通常使用自动化特征选择方法来确定信息量较高的特征子集。现有的表格特征选择基准要么只考虑经典下游模型,要么使用玩具式的合成数据集,要么没有以下游性能为依据来评估特征选择器。鉴于表格深度学习日益流行,我们构建了一个具有挑战性的特征选择基准,在包括 Transformer 在内的下游神经网络上进行评估,使用真实数据集和多种生成冗余特征的方法。我们还提出了一种面向神经网络、基于输入梯度的 Lasso 类方法,它在从受损特征或二阶特征中进行选择等困难问题上优于经典特征选择方法。
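The input-gradient idea behind the proposed selector can be illustrated by ranking features by the average magnitude of the loss gradient with respect to each input. The PyTorch sketch below is a simplified stand-in for the paper's Lasso-style formulation, on a toy regression task where only the first three features matter.

```python
import torch
import torch.nn as nn

def input_gradient_importance(model, loss_fn, X, y):
    """Mean |d loss / d x_j| per feature; larger values suggest more informative features."""
    X = X.clone().detach().requires_grad_(True)
    loss_fn(model(X), y).backward()
    return X.grad.abs().mean(dim=0)

# toy tabular problem on 10 features, of which only the first 3 drive the target
torch.manual_seed(0)
X = torch.randn(256, 10)
y = X[:, :3].sum(dim=1, keepdim=True) + 0.1 * torch.randn(256, 1)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(300):  # brief training so the gradients reflect learned structure
    opt.zero_grad()
    nn.functional.mse_loss(model(X), y).backward()
    opt.step()

scores = input_gradient_importance(model, nn.functional.mse_loss, X, y)
print(torch.topk(scores, k=3).indices)  # ideally the informative features 0, 1, 2
```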

DPR: An Algorithm Mitigate Bias Accumulation in Recommendation feedback loops

  • paper_url: http://arxiv.org/abs/2311.05864
  • repo_url: None
  • paper_authors: Hangtong Xu, Yuanbo Xu, Yongjian Yang, Fuzhen Zhuang, Hui Xiong
  • For: The paper aims to address the bias issues in recommendation models caused by user feedback, specifically the exposure mechanism and feedback loops.* Methods: The paper uses the Missing Not At Random (MNAR) assumption to analyze the data exposure mechanism and feedback loops, and proposes a dynamic re-weighting algorithm called Dynamic Personalized Ranking (DPR) to mitigate the cross-effects of exposure mechanisms and feedback loops.* Results: The paper theoretically demonstrates the effectiveness of the proposed approach in mitigating the negative effects of feedback loops and unknown exposure mechanisms. Experimental results on real-world datasets show that models using DPR can better handle bias accumulation, and the Universal Anti-False Negative (UFN) plugin can mitigate the negative impact of false negative samples.
    Abstract Recommendation models trained on the user feedback collected from deployed recommendation systems are commonly biased. User feedback is considerably affected by the exposure mechanism, as users only provide feedback on the items exposed to them and passively ignore the unexposed items, thus producing numerous false negative samples. Inevitably, biases caused by such user feedback are inherited by new models and amplified via feedback loops. Moreover, the presence of false negative samples makes negative sampling difficult and introduces spurious information in the user preference modeling process of the model. Recent work has investigated the negative impact of feedback loops and unknown exposure mechanisms on recommendation quality and user experience, essentially treating them as independent factors and ignoring their cross-effects. To address these issues, we deeply analyze the data exposure mechanism from the perspective of data iteration and feedback loops with the Missing Not At Random (\textbf{MNAR}) assumption, theoretically demonstrating the existence of an available stabilization factor in the transformation of the exposure mechanism under the feedback loops. We further propose Dynamic Personalized Ranking (\textbf{DPR}), an unbiased algorithm that uses dynamic re-weighting to mitigate the cross-effects of exposure mechanisms and feedback loops without additional information. Furthermore, we design a plugin named Universal Anti-False Negative (\textbf{UFN}) to mitigate the negative impact of the false negative problem. We demonstrate theoretically that our approach mitigates the negative effects of feedback loops and unknown exposure mechanisms. Experimental results on real-world datasets demonstrate that models using DPR can better handle bias accumulation and the universality of UFN in mainstream loss methods.
    摘要 基于已部署推荐系统收集的用户反馈训练的推荐模型通常存在偏差。用户反馈在很大程度上受曝光机制影响:用户只会对曝光给他们的物品提供反馈,而被动地忽略未曝光的物品,从而产生大量假负样本。由此产生的偏差不可避免地被新模型继承,并通过反馈回路被放大。此外,假负样本的存在使负采样变得困难,并在模型的用户偏好建模过程中引入虚假信息。近期工作研究了反馈回路和未知曝光机制对推荐质量和用户体验的负面影响,但大多将二者视为相互独立的因素,忽略了它们之间的交叉效应。针对这些问题,我们在缺失非随机(MNAR)假设下,从数据迭代和反馈回路的角度深入分析了数据曝光机制,从理论上证明了在反馈回路作用下曝光机制的变换中存在可用的稳定因子。我们进一步提出了动态个性化排序(DPR),这是一种无偏算法,利用动态重加权在不依赖额外信息的情况下缓解曝光机制与反馈回路的交叉效应。此外,我们设计了名为通用反假负(UFN)的插件,以缓解假负样本问题带来的负面影响。我们从理论上证明了该方法能够缓解反馈回路和未知曝光机制的负面效应。真实数据集上的实验结果表明,使用 DPR 的模型能够更好地应对偏差积累,且 UFN 在主流损失函数中具有通用性。

Reframing Audience Expansion through the Lens of Probability Density Estimation

  • paper_url: http://arxiv.org/abs/2311.05853
  • repo_url: https://github.com/carvalhaes-ai/audience-expansion
  • paper_authors: Claudio Carvalhaes
  • for: 这篇论文旨在探讨如何使用机器学习算法扩大目标观众,以提高营销效果。
  • methods: 该论文使用了一种基于二分类任务的机器学习算法,通过估计样本的类别概率来扩大目标受众。
  • results: 模拟实验表明,该方法能够以较高的精确率和召回率,准确识别扩展受众中最相关的用户。
    Abstract Audience expansion has become an important element of prospective marketing, helping marketers create target audiences based on a mere representative sample of their current customer base. Within the realm of machine learning, a favored algorithm for scaling this sample into a broader audience hinges on a binary classification task, with class probability estimates playing a crucial role. In this paper, we review this technique and introduce a key change in how we choose training examples to ensure the quality of the generated audience. We present a simulation study based on the widely used MNIST dataset, where consistent high precision and recall values demonstrate our approach's ability to identify the most relevant users for an expanded audience. Our results are easily reproducible and a Python implementation is openly available on GitHub: \url{https://github.com/carvalhaes-ai/audience-expansion}
    摘要 受众扩展已成为前瞻性营销的重要环节,它帮助营销人员仅凭现有客户群的一个代表性样本来构建目标受众。在机器学习领域,将该样本扩展为更大受众的常用算法依赖于一个二分类任务,其中类别概率估计起着关键作用。本文回顾了这一技术,并在训练样本的选取方式上引入一个关键改变,以保证所生成受众的质量。我们基于广泛使用的 MNIST 数据集开展了一项模拟研究,持续较高的精确率和召回率表明我们的方法能够识别出扩展受众中最相关的用户。我们的结果易于复现,Python 实现已在 GitHub 上公开:https://github.com/carvalhaes-ai/audience-expansion
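The baseline recipe the paper revisits is straightforward to sketch: fit a binary classifier on seed users versus a background sample, score the eligible population, and keep the users with the highest class probabilities. The data, model, and expansion size below are illustrative; the paper's contribution concerns how the training examples are chosen, which this sketch does not reproduce.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
seed_users = rng.normal(loc=1.0, size=(200, 5))     # features of current customers
background = rng.normal(loc=0.0, size=(2000, 5))    # representative non-seed sample
population = rng.normal(loc=0.3, size=(10000, 5))   # users eligible for expansion

X = np.vstack([seed_users, background])
y = np.concatenate([np.ones(len(seed_users)), np.zeros(len(background))])

clf = LogisticRegression(max_iter=1000).fit(X, y)
scores = clf.predict_proba(population)[:, 1]        # class-probability estimates

expansion_size = 500
expanded_audience = np.argsort(-scores)[:expansion_size]
print(expanded_audience[:10], scores[expanded_audience].min())
```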

Cognitive Architecture Toward Common Ground Sharing Among Humans and Generative AIs: Trial on Model-Model Interactions in Tangram Naming Task

  • paper_url: http://arxiv.org/abs/2311.05851
  • repo_url: None
  • paper_authors: Junya Morita, Tatsuya Yui, Takeru Amaya, Ryuichiro Higashinaka, Yugo Takeuchi
  • for: 这个研究的目的是为了建立人工智能与人类之间的透明共同基础,以促进人工智能的可信度。
  • methods: 这个研究使用了生成型AI来实现模型之间的共同基础建立过程。
  • results: 研究发现,透过实现共同基础,模型之间的通信效果超过了偶数几率水平,并且观察到了对模型中的一个部分进行逐步反向传播可以实现性能的 statistically significant 提升。
    Abstract For generative AIs to be trustworthy, establishing transparent common grounding with humans is essential. As a preparation toward human-model common grounding, this study examines the process of model-model common grounding. In this context, common ground is defined as a cognitive framework shared among agents in communication, enabling the connection of symbols exchanged between agents to the meanings inherent in each agent. This connection is facilitated by a shared cognitive framework among the agents involved. In this research, we focus on the tangram naming task (TNT) as a testbed to examine the common-ground-building process. Unlike previous models designed for this task, our approach employs generative AIs to visualize the internal processes of the model. In this task, the sender constructs a metaphorical image of an abstract figure within the model and generates a detailed description based on this image. The receiver interprets the generated description from the partner by constructing another image and reconstructing the original abstract figure. Preliminary results from the study show an improvement in task performance beyond the chance level, indicating the effect of the common cognitive framework implemented in the models. Additionally, we observed that incremental backpropagations leveraging successful communication cases for a component of the model led to a statistically significant increase in performance. These results provide valuable insights into the mechanisms of common grounding made by generative AIs, improving human communication with the evolving intelligent machines in our future society.
    摘要 为了让生成式 AI 值得信赖,与人类建立透明的共同基础至关重要。作为迈向人与模型共同基础的准备工作,本研究考察了模型与模型之间建立共同基础的过程。在这一语境下,共同基础被定义为交流中各主体共享的认知框架,它使主体之间交换的符号能够与各主体内在的含义相连接,这种连接由参与主体共享的认知框架所促成。在本研究中,我们以七巧板命名任务(TNT)作为测试平台来考察共同基础的构建过程。与以往为该任务设计的模型不同,我们的方法利用生成式 AI 将模型的内部过程可视化。在该任务中,发送方在模型内部为抽象图形构建一个隐喻性图像,并基于该图像生成详细描述;接收方通过构建另一幅图像来解读对方生成的描述,并重建原始抽象图形。研究的初步结果显示,任务表现超过了随机水平,表明模型中实现的共同认知框架发挥了作用。此外,我们还观察到,利用成功交流案例对模型的某一组成部分进行增量反向传播,带来了统计上显著的性能提升。这些结果为理解生成式 AI 建立共同基础的机制提供了有价值的见解,有助于改善人类与未来社会中不断演进的智能机器之间的交流。

Tamil-Llama: A New Tamil Language Model Based on Llama 2

  • paper_url: http://arxiv.org/abs/2311.05845
  • repo_url: https://github.com/abhinand5/tamil-llama
  • paper_authors: Abhinand Balachandran
  • for: 提升泰米尔语语言模型的表现,尤其是泰米尔语的类人文本生成。
  • methods: 在开源 LLaMA 模型的基础上扩充 16,000 个泰米尔语词元,并使用 LoRA 方法在大规模泰米尔语语料上进行高效训练。
  • results: 在泰米尔语文本生成和理解方面获得了显著的性能提升,对印度语言的大语言模型具有潜在的借鉴意义。
    Abstract Language modeling has witnessed remarkable advancements in recent years, with Large Language Models (LLMs) like ChatGPT setting unparalleled benchmarks in human-like text generation. However, a prevailing limitation is the underrepresentation of languages like Tamil in these cutting-edge models, leading to suboptimal performance in diverse linguistic contexts. This paper addresses this lacuna, enhancing the open-source LLaMA model with an addition of 16,000 Tamil tokens, aiming to achieve superior text generation and comprehension in the Tamil language. We strategically employ the LoRA methodology for efficient model training on a comprehensive Tamil corpus, ensuring computational feasibility and model robustness. Moreover, we introduce a Tamil-translated version of the Alpaca dataset and a subset of the OpenOrca dataset tailored for instruction fine-tuning. Our results showcase significant performance improvements in Tamil text generation, with potential implications for the broader landscape of LLMs in Indian languages. We further underscore our commitment to open research by making our models, datasets, and code publicly accessible, fostering further innovations in language modeling.
    摘要 近年来,大语言模型(LLM)如 ChatGPT 在类人文本生成方面树立了前所未有的标杆。然而,一个普遍的局限是泰米尔语等语言在这些先进模型中的代表性不足,导致其在多样化语言环境中的表现欠佳。本文针对这一缺口,在开源 LLaMA 模型中新增 16,000 个泰米尔语词元,以期在泰米尔语上实现更优的文本生成与理解。我们策略性地采用 LoRA 方法,在一个全面的泰米尔语语料库上进行高效的模型训练,确保计算上的可行性和模型的稳健性。此外,我们还引入了泰米尔语翻译版的 Alpaca 数据集,以及为指令微调定制的 OpenOrca 数据集子集。我们的结果显示泰米尔语文本生成性能得到显著提升,并可能对印度语言大语言模型的整体格局产生影响。我们还公开了模型、数据集和代码,以促进语言建模领域的进一步创新,彰显我们对开放研究的承诺。
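LoRA, the training method used here, freezes each adapted weight matrix and learns a low-rank additive correction, so only a small fraction of parameters is updated. A small PyTorch sketch of an adapted linear layer follows; the rank, scaling, and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = x W^T + (alpha / r) * x (B A)^T, with W frozen and only A, B trained."""
    def __init__(self, in_dim, out_dim, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim, bias=False)
        self.base.weight.requires_grad_(False)           # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, in_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_dim, r))    # zero-init: correction starts at zero
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(512, 512)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable, "trainable parameters vs", layer.base.weight.numel(), "frozen")
```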

AI-native Interconnect Framework for Integration of Large Language Model Technologies in 6G Systems

  • paper_url: http://arxiv.org/abs/2311.05842
  • repo_url: None
  • paper_authors: Sasu Tarkoma, Roberto Morabito, Jaakko Sauvola
  • for: 这篇论文旨在探讨6G体系中大语言模型(LLM)和通用预训Transformer(GPT)如何紧密结合,以及这种结合如何重塑通信网络的功能和交互方式。
  • methods: 本论文提出了一种新的建筑方式,即将LLM和GPT与传统的预生成AI和机器学习(ML)算法结合在一起,以实现一个以AI为核心的下一代通信体系。
  • results: 该论文预测,通过将AI作为下一代通信体系的核心,将能够提高通信网络的功能和交互方式,并且将有新的实际应用出现。
    Abstract The evolution towards 6G architecture promises a transformative shift in communication networks, with artificial intelligence (AI) playing a pivotal role. This paper delves deep into the seamless integration of Large Language Models (LLMs) and Generalized Pretrained Transformers (GPT) within 6G systems. Their ability to grasp intent, strategize, and execute intricate commands will be pivotal in redefining network functionalities and interactions. Central to this is the AI Interconnect framework, intricately woven to facilitate AI-centric operations within the network. Building on the continuously evolving current state-of-the-art, we present a new architectural perspective for the upcoming generation of mobile networks. Here, LLMs and GPTs will collaboratively take center stage alongside traditional pre-generative AI and machine learning (ML) algorithms. This union promises a novel confluence of the old and new, melding tried-and-tested methods with transformative AI technologies. Along with providing a conceptual overview of this evolution, we delve into the nuances of practical applications arising from such an integration. Through this paper, we envisage a symbiotic integration where AI becomes the cornerstone of the next-generation communication paradigm, offering insights into the structural and functional facets of an AI-native 6G network.
    摘要 向 6G 架构的演进预示着通信网络的一次变革性转变,人工智能(AI)将在其中发挥关键作用。本文深入探讨了大语言模型(LLM)与通用预训练 Transformer(GPT)在 6G 系统中的无缝集成。它们理解意图、制定策略并执行复杂指令的能力,将在重新定义网络功能与交互方式方面发挥核心作用。其核心是 AI 互联(AI Interconnect)框架,它被精心设计以支撑网络中以 AI 为中心的运行。在不断演进的当前最先进技术基础上,我们为下一代移动网络提出了新的架构视角:LLM 与 GPT 将与传统的预生成式 AI 和机器学习(ML)算法共同占据核心位置。这种结合有望实现新旧技术的融合,将久经考验的方法与变革性的 AI 技术结合起来。除了对这一演进给出概念性概述外,我们还探讨了由此类集成带来的实际应用细节。通过本文,我们设想了一种共生式的集成,使 AI 成为下一代通信范式的基石,并对 AI 原生 6G 网络的结构与功能层面提供了见解。

Knowledge-Augmented Large Language Models for Personalized Contextual Query Suggestion

  • paper_url: http://arxiv.org/abs/2311.06318
  • repo_url: None
  • paper_authors: Jinheon Baek, Nirupama Chandrasekaran, Silviu Cucerzan, Allen herring, Sujay Kumar Jauhar
  • for: 这个研究旨在提高搜索引擎的搜索结果,使其更加个性化和有用。
  • methods: 该研究使用了一种新的方法,即在用户的搜索和浏览历史记录中提取有用的信息,并将其与大型自然语言模型(LLM)结合使用,以提高搜索结果的个性化性。
  • results: 研究表明,该方法可以提供更具相关性、个性化且有用的查询建议,优于其他基于 LLM 的基线方法。通过人工评估,该方法在上下文查询建议任务中表现出色,生成的查询建议更加相关、个性化和有用。
    Abstract Large Language Models (LLMs) excel at tackling various natural language tasks. However, due to the significant costs involved in re-training or fine-tuning them, they remain largely static and difficult to personalize. Nevertheless, a variety of applications could benefit from generations that are tailored to users' preferences, goals, and knowledge. Among them is web search, where knowing what a user is trying to accomplish, what they care about, and what they know can lead to improved search experiences. In this work, we propose a novel and general approach that augments an LLM with relevant context from users' interaction histories with a search engine in order to personalize its outputs. Specifically, we construct an entity-centric knowledge store for each user based on their search and browsing activities on the web, which is then leveraged to provide contextually relevant LLM prompt augmentations. This knowledge store is light-weight, since it only produces user-specific aggregate projections of interests and knowledge onto public knowledge graphs, and leverages existing search log infrastructure, thereby mitigating the privacy, compliance, and scalability concerns associated with building deep user profiles for personalization. We then validate our approach on the task of contextual query suggestion, which requires understanding not only the user's current search context but also what they historically know and care about. Through a number of experiments based on human evaluation, we show that our approach is significantly better than several other LLM-powered baselines, generating query suggestions that are contextually more relevant, personalized, and useful.
    摘要 在这项工作中,我们提出了一种新颖且通用的方法,利用用户与搜索引擎交互历史中的相关上下文来增强 LLM,从而实现输出的个性化。具体来说,我们根据用户在网络上的搜索和浏览活动,为每位用户构建一个以实体为中心的知识库,并利用它为 LLM 提供上下文相关的提示增强。该知识库十分轻量,因为它只是将用户的兴趣和知识以聚合投影的形式映射到公共知识图谱上,并复用现有的搜索日志基础设施,从而缓解了为个性化构建深度用户画像所带来的隐私、合规和可扩展性问题。我们在上下文查询建议任务上验证了该方法,该任务不仅需要理解用户当前的搜索上下文,还需要了解用户历史上知道和关心的内容。通过一系列基于人工评估的实验,我们表明该方法显著优于其他多个基于 LLM 的基线,生成的查询建议在上下文相关性、个性化程度和实用性方面都更胜一筹。

Model-as-a-Service (MaaS): A Survey

  • paper_url: http://arxiv.org/abs/2311.05804
  • repo_url: None
  • paper_authors: Wensheng Gan, Shicheng Wan, Philip S. Yu
  • for: 本研究旨在介绍Model-as-a-Service(MaaS) paradigma,它是一种基于云计算的Generative Artificial Intelligence(GenAI)模型的部署和使用方式。
  • methods: 本研究使用了cloud computing技术,并介绍了关键的MaaS技术。
  • results: 研究表明,MaaS将使GenAI模型的开发变得更加民主化,并且可以为不同领域的应用提供可观之服务。它还可以解决许多当前AI技术的挑战,如模型训练和部署等。
    Abstract Due to the increased number of parameters and data in the pre-trained model exceeding a certain level, a foundation model (e.g., a large language model) can significantly improve downstream task performance and emerge with some novel special abilities (e.g., deep learning, complex reasoning, and human alignment) that were not present before. Foundation models are a form of generative artificial intelligence (GenAI), and Model-as-a-Service (MaaS) has emerged as a groundbreaking paradigm that revolutionizes the deployment and utilization of GenAI models. MaaS represents a paradigm shift in how we use AI technologies and provides a scalable and accessible solution for developers and users to leverage pre-trained AI models without the need for extensive infrastructure or expertise in model training. In this paper, the introduction aims to provide a comprehensive overview of MaaS, its significance, and its implications for various industries. We provide a brief review of the development history of "X-as-a-Service" based on cloud computing and present the key technologies involved in MaaS. The development of GenAI models will become more democratized and flourish. We also review recent application studies of MaaS. Finally, we highlight several challenges and future issues in this promising area. MaaS is a new deployment and service paradigm for different AI-based models. We hope this review will inspire future research in the field of MaaS.
    摘要 当预训练模型中的参数量和数据量超过一定水平后,基础模型(例如大语言模型)可以显著提升下游任务性能,并涌现出一些此前不具备的新能力(例如深度学习、复杂推理和人类对齐)。基础模型是生成式人工智能(GenAI)的一种形式,而模型即服务(MaaS)作为一种开创性的范式出现,正在变革 GenAI 模型的部署和使用方式。MaaS 代表着我们使用 AI 技术方式的一次范式转变,它为开发者和用户提供了可扩展且易于获取的解决方案,使其无需大量基础设施或模型训练方面的专业知识即可利用预训练 AI 模型。本文旨在对 MaaS 及其意义和对各行业的影响给出全面概述。我们简要回顾了基于云计算的“X 即服务”(X-as-a-Service)的发展历史,并介绍了 MaaS 涉及的关键技术。随着 GenAI 模型的开发日益平民化,MaaS 将蓬勃发展。我们还回顾了 MaaS 的最新应用研究。最后,我们指出了这一前景广阔领域中的若干挑战和未来问题。MaaS 是面向各类 AI 模型的一种新的部署与服务范式,我们希望这篇综述能够启发 MaaS 领域的未来研究。

Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval

  • paper_url: http://arxiv.org/abs/2311.05800
  • repo_url: https://github.com/google-research-datasets/swim-ir
  • paper_authors: Nandan Thakur, Jianmo Ni, Gustavo Hernández Ábrego, John Wieting, Jimmy Lin, Daniel Cer
  • for: 这个论文主要针对的是如何使用人工生成的语言训练数据来提高多语言检索模型的性能。
  • methods: 这个论文提出了一种名为SAP(概要然后提问)的技术,其中使用大型自然语言处理器(LLM)生成文本概要,然后使用这个概要来生成目标语言中的问题。
  • results: 根据这个论文的结果,使用SWIM-IR数据集进行synthetic fine-tuning的多语言检索模型可以达到与人工supervised模型相当的性能,而且可以在三个检索测试benchmark上进行可靠的评估。
    Abstract Dense retrieval models have predominantly been studied for English, where models have shown great success, due to the availability of human-labeled training pairs. However, there has been limited success for multilingual retrieval so far, as training data is uneven or scarcely available across multiple languages. Synthetic training data generation is promising (e.g., InPars or Promptagator), but has been investigated only for English. Therefore, to study model capabilities across both cross-lingual and monolingual retrieval tasks, we develop SWIM-IR, a synthetic retrieval training dataset containing 33 (high to very-low resource) languages for training multilingual dense retrieval models without requiring any human supervision. To construct SWIM-IR, we propose SAP (summarize-then-ask prompting), where the large language model (LLM) generates a textual summary prior to the query generation step. SAP assists the LLM in generating informative queries in the target language. Using SWIM-IR, we explore synthetic fine-tuning of multilingual dense retrieval models and evaluate them robustly on three retrieval benchmarks: XOR-Retrieve (cross-lingual), XTREME-UP (cross-lingual) and MIRACL (monolingual). Our models, called SWIM-X, are competitive with human-supervised dense retrieval models, e.g., mContriever, finding that SWIM-IR can cheaply substitute for expensive human-labeled retrieval training data.
    摘要 稠密检索模型的研究主要集中在英语上,由于拥有人工标注的训练数据对,这些模型取得了巨大成功。然而,多语言检索方面的进展迄今有限,因为多种语言的训练数据分布不均或十分稀缺。合成训练数据生成(例如 InPars 或 Promptagator)颇具前景,但目前仅针对英语进行过研究。因此,为了考察模型在跨语言和单语言检索任务上的能力,我们构建了 SWIM-IR,这是一个包含 33 种(从高资源到极低资源)语言的合成检索训练数据集,用于在无需任何人工监督的情况下训练多语言稠密检索模型。为构建 SWIM-IR,我们提出了 SAP(summarize-then-ask prompting,先摘要后提问)方法:大语言模型(LLM)在生成查询之前先生成一段文本摘要,从而帮助 LLM 生成目标语言中信息丰富的查询。利用 SWIM-IR,我们探索了多语言稠密检索模型的合成微调,并在三个检索基准上对其进行了稳健评估:XOR-Retrieve(跨语言)、XTREME-UP(跨语言)和 MIRACL(单语言)。我们的模型(称为 SWIM-X)与 mContriever 等人工监督的稠密检索模型相比具有竞争力,表明 SWIM-IR 可以低成本地替代昂贵的人工标注检索训练数据。