cs.CL - 2023-09-02

ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models

  • paper_url: http://arxiv.org/abs/2309.00986
  • repo_url: https://github.com/modelscope/modelscope-agent
  • paper_authors: Chenliang Li, Hehong Chen, Ming Yan, Weizhou Shen, Haiyang Xu, Zhikai Wu, Zhicheng Zhang, Wenmeng Zhou, Yingda Chen, Chen Cheng, Hongzhu Shi, Ji Zhang, Fei Huang, Jingren Zhou
  • for: The goal of this work is to develop a general agent framework that equips large language models (LLMs) with tool-use abilities so they can accomplish complex tasks.
  • methods: The framework uses open-source LLMs as controllers and provides a user-friendly system library with a customizable engine design, supporting model training on multiple open-source LLMs while enabling unified integration of both model APIs and common APIs (a minimal agent-loop sketch follows this entry).
  • results: The work proposes a comprehensive framework spanning tool-use data collection, tool retrieval, tool registration, memory control, customized model training, and evaluation to equip LLMs with tool-use abilities. It also showcases ModelScopeGPT, a real-world application built on the ModelScope-Agent framework that connects open-source LLMs with more than 1000 public AI models and localized community knowledge.
    Abstract Large language models (LLMs) have recently demonstrated remarkable capabilities to comprehend human intentions, engage in reasoning, and design planning-like behavior. To further unleash the power of LLMs to accomplish complex tasks, there is a growing trend to build agent framework that equips LLMs, such as ChatGPT, with tool-use abilities to connect with massive external APIs. In this work, we introduce ModelScope-Agent, a general and customizable agent framework for real-world applications, based on open-source LLMs as controllers. It provides a user-friendly system library, with customizable engine design to support model training on multiple open-source LLMs, while also enabling seamless integration with both model APIs and common APIs in a unified way. To equip the LLMs with tool-use abilities, a comprehensive framework has been proposed spanning over tool-use data collection, tool retrieval, tool registration, memory control, customized model training, and evaluation for practical real-world applications. Finally, we showcase ModelScopeGPT, a real-world intelligent assistant of ModelScope Community based on the ModelScope-Agent framework, which is able to connect open-source LLMs with more than 1000 public AI models and localized community knowledge in ModelScope. The ModelScope-Agent library\footnote{https://github.com/modelscope/modelscope-agent} and online demo\footnote{https://modelscope.cn/studios/damo/ModelScopeGPT/summary} are now publicly available.
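A minimal, self-contained sketch of the agent loop the abstract describes (tool registration, tool retrieval, memory, and an open-source LLM as the controller). The class and method names below are illustrative assumptions, not the actual ModelScope-Agent API; see the repository for the real interfaces.

```python
# Illustrative sketch only: hypothetical names, not the modelscope-agent API.
# It mirrors the components the paper lists: tool registration, tool retrieval,
# memory control, and an open-source LLM acting as the controller.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Tool:
    name: str
    description: str
    func: Callable[[str], str]


@dataclass
class Agent:
    llm: Callable[[str], str]                  # open-source LLM used as the controller
    tools: Dict[str, Tool] = field(default_factory=dict)
    memory: List[str] = field(default_factory=list)

    def register_tool(self, tool: Tool) -> None:
        """Tool registration: make an external API callable by the controller."""
        self.tools[tool.name] = tool

    def retrieve_tools(self, query: str, k: int = 3) -> List[Tool]:
        """Tool retrieval: naive keyword overlap standing in for a learned retriever."""
        scored = sorted(self.tools.values(),
                        key=lambda t: -sum(w in t.description for w in query.split()))
        return scored[:k]

    def run(self, user_query: str) -> str:
        candidates = self.retrieve_tools(user_query)
        prompt = "\n".join(self.memory + [
            f"Available tools: {[t.name for t in candidates]}",
            f"User: {user_query}",
        ])
        plan = self.llm(prompt)                # controller decides which tool(s) to call
        self.memory.append(f"User: {user_query}\nPlan: {plan}")
        return plan
```

After registering tools, calling `agent.run(...)` lets the controller choose among them; the real library adds retrieval models, API wrappers, and richer memory components beyond this toy version.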

Multilingual Text Representation

  • paper_url: http://arxiv.org/abs/2309.00949
  • repo_url: https://github.com/TheBauwssss/TimeInWords
  • paper_authors: Fahim Faisal
  • for: This survey reviews the development of modern NLP, in particular large multilingual models capable of performing tasks across more than 100 languages.
  • methods: It traces the iterative progression of language models from simple one-hot word representations to models that handle natural language understanding, common-sense reasoning, and question answering while capturing both the syntax and semantics of text.
  • results: The survey observes that current language models perform competitively even on very low-resource dialects of endangered languages and extend beyond known language boundaries, but problems remain in ensuring an equitable, unified representation space across languages and speakers.
    Abstract Modern NLP breakthrough includes large multilingual models capable of performing tasks across more than 100 languages. State-of-the-art language models came a long way, starting from the simple one-hot representation of words capable of performing tasks like natural language understanding, common-sense reasoning, or question-answering, thus capturing both the syntax and semantics of texts. At the same time, language models are expanding beyond our known language boundary, even competitively performing over very low-resource dialects of endangered languages. However, there are still problems to solve to ensure an equitable representation of texts through a unified modeling space across language and speakers. In this survey, we shed light on this iterative progression of multilingual text representation and discuss the driving factors that ultimately led to the current state-of-the-art. Subsequently, we discuss how the full potential of language democratization could be obtained, reaching beyond the known limits and what is the scope of improvement in that space.

BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing

  • paper_url: http://arxiv.org/abs/2309.00916
  • repo_url: https://github.com/cwang621/blsp
  • paper_authors: Chen Wang, Minpeng Liao, Zhongqiang Huang, Jinliang Lu, Junhong Wu, Yuchen Liu, Chengqing Zong, Jiajun Zhang
  • for: The paper aims to extend the language capabilities of large language models (LLMs) to speech and to address the modality alignment problem between speech and text.
  • methods: It proposes BLSP, which bootstraps language-speech pre-training via behavior alignment of continuation writing: a lightweight modality adapter is learned between a frozen speech encoder and the LLM so that the LLM exhibits the same generation behavior whether the input is a speech segment or its transcript (a two-step training sketch follows this entry).
  • results: The approach extends LLM capabilities to speech, enabling speech recognition, speech translation, spoken language understanding, and speech conversation, even in zero-shot cross-lingual scenarios.
    Abstract The emergence of large language models (LLMs) has sparked significant interest in extending their remarkable language capabilities to speech. However, modality alignment between speech and text still remains an open problem. Current solutions can be categorized into two strategies. One is a cascaded approach where outputs (tokens or states) of a separately trained speech recognition system are used as inputs for LLMs, which limits their potential in modeling alignment between speech and text. The other is an end-to-end approach that relies on speech instruction data, which is very difficult to collect in large quantities. In this paper, we address these issues and propose the BLSP approach that Bootstraps Language-Speech Pre-training via behavior alignment of continuation writing. We achieve this by learning a lightweight modality adapter between a frozen speech encoder and an LLM, ensuring that the LLM exhibits the same generation behavior regardless of the modality of input: a speech segment or its transcript. The training process can be divided into two steps. The first step prompts an LLM to generate texts with speech transcripts as prefixes, obtaining text continuations. In the second step, these continuations are used as supervised signals to train the modality adapter in an end-to-end manner. We demonstrate that this straightforward process can extend the capabilities of LLMs to speech, enabling speech recognition, speech translation, spoken language understanding, and speech conversation, even in zero-shot cross-lingual scenarios.
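A condensed sketch of the two-step recipe described in the abstract, written against a Hugging Face-style causal LM interface. The model objects, dataset format, and hyperparameters are assumptions for illustration, not the released BLSP code.

```python
# Step 1: collect text continuations of speech transcripts from the LLM.
# Step 2: train only the lightweight modality adapter so that, given the speech
# segment, the frozen LLM reproduces the continuation obtained in step 1.
import torch


def step1_collect_continuations(llm, tokenizer, transcripts):
    """Prompt the LLM with each transcript as a prefix; keep the generated
    continuation as a supervision target."""
    targets = []
    for text in transcripts:
        ids = tokenizer(text, return_tensors="pt").input_ids
        out = llm.generate(ids, max_new_tokens=64)
        targets.append(tokenizer.decode(out[0, ids.shape[1]:], skip_special_tokens=True))
    return targets


def step2_train_adapter(speech_encoder, adapter, llm, tokenizer, batches, lr=1e-4):
    """End-to-end training of the adapter with the speech encoder and LLM frozen."""
    for p in speech_encoder.parameters():
        p.requires_grad_(False)
    for p in llm.parameters():
        p.requires_grad_(False)
    optim = torch.optim.AdamW(adapter.parameters(), lr=lr)
    for speech, continuation in batches:
        speech_states = speech_encoder(speech)            # frozen speech encoder
        prefix_embeds = adapter(speech_states)            # trainable modality adapter
        labels = tokenizer(continuation, return_tensors="pt").input_ids
        label_embeds = llm.get_input_embeddings()(labels)
        inputs = torch.cat([prefix_embeds, label_embeds], dim=1)
        ignore = torch.full(prefix_embeds.shape[:2], -100, dtype=torch.long)
        loss = llm(inputs_embeds=inputs,
                   labels=torch.cat([ignore, labels], dim=1)).loss
        optim.zero_grad()
        loss.backward()
        optim.step()
```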

Evaluating Transformer’s Ability to Learn Mildly Context-Sensitive Languages

  • paper_url: http://arxiv.org/abs/2309.00857
  • repo_url: None
  • paper_authors: Shunjie Wang, Shane Steinert-Threlkeld
  • for: This work examines Transformers' ability to model natural language, which is hypothesized to be mildly context-sensitive, by testing whether they can learn mildly context-sensitive formal languages.
  • methods: Transformer models are trained on a variety of mildly context-sensitive languages of varying complexity and compared against LSTMs (an example language generator follows this entry).
  • results: Transformers generalize well to unseen in-distribution data, but their ability to extrapolate to longer strings is worse than that of LSTMs. Analyses show that the learned self-attention patterns and representations model dependency relations and exhibit counting behavior, which may help the models solve these languages.
    Abstract Despite that Transformers perform well in NLP tasks, recent studies suggest that self-attention is theoretically limited in learning even some regular and context-free languages. These findings motivated us to think about their implications in modeling natural language, which is hypothesized to be mildly context-sensitive. We test Transformer's ability to learn a variety of mildly context-sensitive languages of varying complexities, and find that they generalize well to unseen in-distribution data, but their ability to extrapolate to longer strings is worse than that of LSTMs. Our analyses show that the learned self-attention patterns and representations modeled dependency relations and demonstrated counting behavior, which may have helped the models solve the languages.
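The abstract does not enumerate the specific languages tested; as an illustration, the sketch below generates membership data for a^n b^n c^n, a canonical non-context-free language within the mildly context-sensitive class. The dataset format is an assumption, and the paper's actual experimental setup may differ.

```python
# Generate a small membership dataset for the language {a^n b^n c^n : n >= 1}.
import random


def sample_anbncn(max_n=20):
    n = random.randint(1, max_n)
    return "a" * n + "b" * n + "c" * n


def is_anbncn(s):
    n, r = len(s) // 3, len(s) % 3
    return r == 0 and s == "a" * n + "b" * n + "c" * n


def make_dataset(size=1000, max_n=20, seed=0):
    """Positives are well-formed strings; negatives are shuffled positives
    (labelled by the membership check in case a shuffle is accidentally valid)."""
    random.seed(seed)
    data = []
    for _ in range(size):
        pos = sample_anbncn(max_n)
        neg = "".join(random.sample(pos, len(pos)))
        data.append((pos, 1))
        data.append((neg, int(is_anbncn(neg))))
    return data
```

Extrapolation to longer strings, the setting where the paper finds Transformers lag behind LSTMs, corresponds to evaluating on strings generated with a larger `max_n` than seen during training.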

LinkTransformer: A Unified Package for Record Linkage with Transformer Language Models

  • paper_url: http://arxiv.org/abs/2309.00789
  • repo_url: https://github.com/dell-research-harvard/linktransformer
  • paper_authors: Abhishek Arora, Melissa Dell
  • for: The paper aims to improve record linkage in noisy datasets using large language models (LLMs) and to make it accessible to users familiar with the string matching packages popular in software such as R and Stata.
  • methods: The paper proposes an open-source package called LinkTransformer that treats record linkage as a text retrieval problem and uses transformer LLMs to perform it. The package includes a rich repository of pre-trained transformer semantic similarity models for multiple languages and supports easy integration of any transformer language model from Hugging Face or OpenAI (a usage sketch follows this entry).
  • results: The paper reports that LinkTransformer performs record linkage with high accuracy and supports standard functionality such as blocking and linking on multiple noisy fields. It also includes comprehensive tools for efficient model tuning and makes it easy for users to contribute their custom-trained models to its model hub.
    Abstract Linking information across sources is fundamental to a variety of analyses in social science, business, and government. While large language models (LLMs) offer enormous promise for improving record linkage in noisy datasets, in many domains approximate string matching packages in popular softwares such as R and Stata remain predominant. These packages have clean, simple interfaces and can be easily extended to a diversity of languages. Our open-source package LinkTransformer aims to extend the familiarity and ease-of-use of popular string matching methods to deep learning. It is a general purpose package for record linkage with transformer LLMs that treats record linkage as a text retrieval problem. At its core is an off-the-shelf toolkit for applying transformer models to record linkage with four lines of code. LinkTransformer contains a rich repository of pre-trained transformer semantic similarity models for multiple languages and supports easy integration of any transformer language model from Hugging Face or OpenAI. It supports standard functionality such as blocking and linking on multiple noisy fields. LinkTransformer APIs also perform other common text data processing tasks, e.g., aggregation, noisy de-duplication, and translation-free cross-lingual linkage. Importantly, LinkTransformer also contains comprehensive tools for efficient model tuning, to facilitate different levels of customization when off-the-shelf models do not provide the required accuracy. Finally, to promote reusability, reproducibility, and extensibility, LinkTransformer makes it easy for users to contribute their custom-trained models to its model hub. By combining transformer language models with intuitive APIs that will be familiar to many users of popular string matching packages, LinkTransformer aims to democratize the benefits of LLMs among those who may be less familiar with deep learning frameworks.
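A sketch of the advertised "four lines of code" usage pattern. The `lt.merge` call and its arguments are assumptions based on the package's description; consult the linktransformer repository for the exact API and available pre-trained models.

```python
# Hedged sketch: function and argument names are assumed for illustration.
import pandas as pd
import linktransformer as lt

firms_a = pd.DataFrame({"company": ["Intl Business Machines", "Appl Inc."]})
firms_b = pd.DataFrame({"company": ["IBM", "Apple Inc."]})

# Embedding-based merge on a noisy text field, using a pre-trained
# semantic-similarity model (here an off-the-shelf sentence-transformers model).
linked = lt.merge(firms_a, firms_b, on="company",
                  model="sentence-transformers/all-MiniLM-L6-v2")
print(linked.head())
```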