cs.CL - 2023-08-03

Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation

paper_url: http://arxiv.org/abs/2308.01831
repo_url: None
paper_authors: Minsu Kim, Jeongsoo Choi, Dahun Kim, Yong Man Ro
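for: The paper aims to learn unified representations of multilingual speech and text with a single model, with a particular focus on speech synthesis.
methods: The paper encodes speech with a self-supervised speech model and quantizes the features into speech units, which are treated as pseudo text so that speech and text share a unified representation. The authors then train an encoder-decoder model with a Unit-to-Unit Translation (UTUT) objective on multilingual data in a many-to-many translation setting.
results: Experiments across several languages show that the method is effective on multilingual tasks including Speech-to-Speech Translation, multilingual Text-to-Speech Synthesis, and Text-to-Speech Translation. The authors also show that UTUT can perform many-to-many language speech-to-speech translation, which had not previously been explored in the literature. Samples are available at https://choijeongsoo.github.io/utut.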

Abstract
In this paper, we propose a method to learn unified representations of multilingual speech and text with a single model, especially focusing on the purpose of speech synthesis. We represent multilingual speech audio with speech units, the quantized representations of speech features encoded from a self-supervised speech model. Therefore, we can focus on their linguistic content by treating the audio as pseudo text and can build a unified representation of speech and text. Then, we propose to train an encoder-decoder structured model with a Unit-to-Unit Translation (UTUT) objective on multilingual data. Specifically, by conditioning the encoder with the source language token and the decoder with the target language token, the model is optimized to translate the spoken language into that of the target language, in a many-to-many language translation setting. Therefore, the model can build the knowledge of how spoken languages are comprehended and how to relate them to different languages. A single pre-trained model with UTUT can be employed for diverse multilingual speech- and text-related tasks, such as Speech-to-Speech Translation (STS), multilingual Text-to-Speech Synthesis (TTS), and Text-to-Speech Translation (TTST). By conducting comprehensive experiments encompassing various languages, we validate the efficacy of the proposed method across diverse multilingual tasks. Moreover, we show UTUT can perform many-to-many language STS, which has not been previously explored in the literature. Samples are available on https://choijeongsoo.github.io/utut.
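
The language-token conditioning described in the abstract can be sketched as follows. This is a minimal illustration, assuming speech units are integer cluster IDs and that language tokens are written as `<en>` or `<es>`; the function name `build_utut_pair` and the token format are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def build_utut_pair(
    src_units: Sequence[int],
    tgt_units: Sequence[int],
    src_lang: str,
    tgt_lang: str,
) -> Tuple[List[str], List[str]]:
    """Build encoder and decoder token sequences for one training pair.

    The encoder input starts with the source language token and the
    decoder sequence starts with the target language token (hypothetical
    token format; the paper's exact vocabulary is not reproduced here).
    """
    encoder_tokens = [f"<{src_lang}>"] + [str(u) for u in src_units]
    decoder_tokens = [f"<{tgt_lang}>"] + [str(u) for u in tgt_units]
    return encoder_tokens, decoder_tokens


# Toy unit sequences standing in for quantized speech features.
enc, dec = build_utut_pair([5, 5, 12], [7, 3], "en", "es")
print(enc)
print(dec)
```

Because the same model sees every source and target language token during training, any language pair can in principle be requested at inference time by swapping the two tokens.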

Scaling Relationship on Learning Mathematical Reasoning with Large Language Models

paper_url: http://arxiv.org/abs/2308.01825
repo_url: https://github.com/ofa-sys/gsm8k-screl
paper_authors: Zheng Yuan, Hongyi Yuan, Chengpeng Li, Guanting Dong, Chuanqi Tan, Chang Zhou
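for: This study investigates how pre-training loss, supervised data amount, and augmented data amount influence the mathematical reasoning performance of a supervised large language model (LLM).
methods: The study applies supervised fine-tuning (SFT) and Rejection sampling Fine-Tuning (RFT) to improve the mathematical reasoning ability of LLMs.
results: Pre-training loss is a better indicator of model performance than parameter count. With varying amounts of supervised data, performance follows a log-linear relation with data amount, and better models improve less as the supervised dataset grows. RFT helps more when the augmented samples contain more distinct reasoning paths, and it brings more improvement for less performant LLMs. Combining rejection samples from multiple models pushes LLaMA-7B to an accuracy of 49.3%, significantly above the SFT accuracy of 35.9%.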

Abstract
Mathematical reasoning is a challenging task for large language models (LLMs), while its scaling relationship with respect to LLM capacity is under-explored. In this paper, we investigate how the pre-training loss, supervised data amount, and augmented data amount influence the reasoning performances of a supervised LLM. We find that pre-training loss is a better indicator of the model's performance than the model's parameter count. We apply supervised fine-tuning (SFT) with different amounts of supervised data and empirically find a log-linear relation between data amount and model performance, and we find better models improve less with enlarged supervised datasets. To augment more data samples for improving model performances without any human effort, we propose to apply Rejection sampling Fine-Tuning (RFT). RFT uses supervised models to generate and collect correct reasoning paths as augmented fine-tuning datasets. We find with augmented samples containing more distinct reasoning paths, RFT improves mathematical reasoning performance more for LLMs. We also find RFT brings more improvement for less performant LLMs. Furthermore, we combine rejection samples from multiple models, which pushes LLaMA-7B to an accuracy of 49.3% and significantly outperforms the supervised fine-tuning (SFT) accuracy of 35.9%.
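
The rejection sampling step in RFT can be sketched roughly as below: sample several reasoning paths, keep those whose final answer is correct, and drop repeats. This is a sketch under stated assumptions, not the authors' implementation. The names `extract_answer`, `rejection_sample`, and `sample_fn` are hypothetical, the final answer is assumed to be the last number in the path, and distinctness is judged by whitespace-normalized text, which may differ from the paper's criterion.

```python
import re
from typing import Callable, List, Optional


def extract_answer(path: str) -> Optional[str]:
    """Return the last number in a reasoning path (assumed to be the answer)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", path)
    return numbers[-1] if numbers else None


def rejection_sample(
    question: str,
    gold_answer: str,
    sample_fn: Callable[[str], str],
    k: int,
) -> List[str]:
    """Sample k paths and keep the correct, distinct ones."""
    kept: List[str] = []
    seen = set()
    for _ in range(k):
        path = sample_fn(question)
        if extract_answer(path) != gold_answer:
            continue  # reject paths that end in a wrong answer
        key = " ".join(path.split())  # assumed distinctness criterion
        if key in seen:
            continue  # reject exact repeats
        seen.add(key)
        kept.append(path)
    return kept


# A canned sampler stands in for a fine-tuned model (hypothetical data).
canned = iter(["2 + 3 = 5", "3 + 2 = 5", "2 + 3 = 6", "2 + 3 = 5"])
kept = rejection_sample("What is 2 + 3?", "5", lambda q: next(canned), k=4)
print(kept)
```

The kept paths would then be added to the fine-tuning set alongside the original supervised examples.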

Lexicon and Rule-based Word Lemmatization Approach for the Somali Language

paper_url: http://arxiv.org/abs/2308.01785
repo_url: https://github.com/shafieabdi/somalilemmatizer
paper_authors: Shafie Abdi Mohamed, Muhidin Abdullahi Mohamed
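for: This paper develops text lemmatization for the Somali language.
methods: The paper uses a lexicon and rules to lemmatize Somali text.
results: Tested on 120 documents, the algorithm achieves an accuracy of 57% on relatively long documents such as full news articles, 60.57% on news article extracts, and 95.87% on short texts such as social media messages.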

Abstract
Lemmatization is a Natural Language Processing (NLP) technique used to normalize text by changing morphological derivations of words to their root forms. It is used as a core pre-processing step in many NLP tasks including text indexing, information retrieval, and machine learning for NLP, among others. This paper pioneers the development of text lemmatization for the Somali language, a low-resource language with very limited or no prior effective adoption of NLP methods and datasets. We especially develop a lexicon and rule-based lemmatizer for Somali text, which is a starting point for a full-fledged Somali lemmatization system for various NLP tasks. With consideration of the language's morphological rules, we have developed an initial lexicon of 1247 root words and 7173 derivationally related terms enriched with rules for lemmatizing words not present in the lexicon. We have tested the algorithm on 120 documents of various lengths including news articles, social media posts, and text messages. Our initial results demonstrate that the algorithm achieves an accuracy of 57% for relatively long documents (e.g. full news articles), 60.57% for news article extracts, and high accuracy of 95.87% for short texts such as social media messages.
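
A lexicon-plus-rules lemmatizer of the kind described in the abstract can be sketched as a dictionary lookup with a suffix-stripping fallback. The sketch below is illustrative only: the lexicon entries and suffix rules are toy English placeholders, since the Somali lexicon and rules are not reproduced in this digest, and the names `ROOTS`, `LEXICON`, `SUFFIX_RULES`, and `lemmatize` are hypothetical.

```python
# Toy placeholder data; the real system uses 1247 Somali root words
# and 7173 derivationally related terms.
ROOTS = {"run", "child", "walk"}
LEXICON = {"ran": "run", "children": "child"}  # derived form -> root
SUFFIX_RULES = [("ing", ""), ("ed", ""), ("s", "")]  # (suffix, replacement)


def lemmatize(word: str) -> str:
    """Return the root form of a word: lexicon first, then rules."""
    w = word.lower()
    if w in ROOTS:
        return w
    if w in LEXICON:
        return LEXICON[w]
    # Fallback rules for words that are not in the lexicon.
    for suffix, replacement in SUFFIX_RULES:
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            return w[: -len(suffix)] + replacement
    return w  # leave unknown words unchanged


print([lemmatize(w) for w in ["Children", "walking", "walked", "books"]])
```

Checking the lexicon before the rules keeps irregular forms from being mangled by a generic suffix rule.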

Does Correction Remain A Problem For Large Language Models?

paper_url: http://arxiv.org/abs/2308.01776
repo_url: https://github.com/Noykarde/NoykardeRepository
paper_authors: Xiaowu Zhang, Xiaotian Zhang, Cheng Yang, Hang Yan, Xipeng Qiu
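for: This paper investigates whether correction remains a problem for large language models, through two experiments.
methods: The first experiment treats correction as a standalone task, using few-shot learning with GPT-like models for error correction. The second treats correction as a preparatory step for other NLP tasks and examines whether large language models can tolerate, and perform adequately on, text containing certain levels of noise or errors.
results: The abstract reports no quantitative findings; the two experiments are intended to clarify how much correction still matters in the era of large language models and what that implies for NLP applications.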

Abstract
As large language models, such as GPT, continue to advance the capabilities of natural language processing (NLP), the question arises: does the problem of correction still persist? This paper investigates the role of correction in the context of large language models by conducting two experiments. The first experiment focuses on correction as a standalone task, employing few-shot learning techniques with GPT-like models for error correction. The second experiment explores the notion of correction as a preparatory task for other NLP tasks, examining whether large language models can tolerate and perform adequately on texts containing certain levels of noise or errors. By addressing these experiments, we aim to shed light on the significance of correction in the era of large language models and its implications for various NLP applications.
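
The second experiment depends on text with a controlled level of noise. One simple way to produce such input is random character substitution, sketched below. This is an assumption for illustration: the paper's actual noise model is not described in the abstract, and the function name `add_char_noise` and its parameters are hypothetical.

```python
import random
import string


def add_char_noise(text: str, rate: float, seed: int = 0) -> str:
    """Replace each letter with a random lowercase letter with probability `rate`."""
    rng = random.Random(seed)  # seeded so the noise is reproducible
    chars = list(text)
    for i, ch in enumerate(chars):
        if ch.isalpha() and rng.random() < rate:
            chars[i] = rng.choice(string.ascii_lowercase)
    return "".join(chars)


clean = "The model should tolerate small errors."
noisy = add_char_noise(clean, rate=0.1)
print(noisy)
```

Sweeping `rate` over several values would give one noisy copy of a dataset per noise level, on which downstream task performance can be compared.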

Supply chain emission estimation using large language models

paper_url: http://arxiv.org/abs/2308.01741
repo_url: None
paper_authors: Ayush Jain, Manikandan Padmanaban, Jagabondhu Hazra, Shantanu Godbole, Kommy Weldemariam
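for: The paper addresses enterprise efforts to achieve the Sustainable Development Goals (SDGs), especially goal 13 on combating climate change.
methods: The paper proposes a first-of-a-kind framework that uses domain-adapted NLP foundation models to estimate enterprise Scope 3 (supply chain) emissions, using financial transactions as a proxy for purchased goods and services.
results: The domain-adapted foundation model outperforms state-of-the-art text mining techniques and performs as well as a subject matter expert (SME). The framework could accelerate Scope 3 estimation at enterprise scale and help enterprises take appropriate climate action toward SDG 13.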

Abstract
Large enterprises face a crucial imperative to achieve the Sustainable Development Goals (SDGs), especially goal 13, which focuses on combating climate change and its impacts. To mitigate the effects of climate change, reducing enterprise Scope 3 (supply chain emissions) is vital, as it accounts for more than 90% of total emission inventories. However, tracking Scope 3 emissions proves challenging, as data must be collected from thousands of upstream and downstream suppliers. To address the above-mentioned challenges, we propose a first-of-a-kind framework that uses domain-adapted NLP foundation models to estimate Scope 3 emissions, by utilizing financial transactions as a proxy for purchased goods and services. We compared the performance of the proposed framework with state-of-the-art text classification models such as TF-IDF, word2Vec, and zero-shot learning. Our results show that the domain-adapted foundation model outperforms state-of-the-art text mining techniques and performs as well as a subject matter expert (SME). The proposed framework could accelerate the Scope 3 estimation at enterprise scale and will help to take appropriate climate actions to achieve SDG 13.

Ambient Adventures: Teaching ChatGPT on Developing Complex Stories

paper_url: http://arxiv.org/abs/2308.01734
repo_url: None
paper_authors: Zexin Chen, Eric Zhou, Kenneth Eaton, Xiangyu Peng, Mark Riedl
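for: This study aims to let robots engage with the world around them in a more personified way through imaginative play.
methods: The study uses the story generation capability of large language models (LLMs) with human-written prompts to produce stories for imaginary play, then simplifies the stories and maps them into action sequences that guide the agent.
results: To evaluate whether the agent can successfully finish the imaginary play, the authors designed a text adventure game that simulates a house as the agent's playground.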

Abstract
Imaginative play is an area of creativity that could allow robots to engage with the world around them in a much more personified way. Imaginary play can be seen as taking real objects and locations and using them as imaginary objects and locations in virtual scenarios. We adopted the story generation capability of large language models (LLMs) to obtain the stories used for imaginary play with human-written prompts. Those generated stories will be simplified and mapped into action sequences that can guide the agent in imaginary play. To evaluate whether the agent can successfully finish the imaginary play, we also designed a text adventure game to simulate a house as the playground for the agent to interact.

Baby’s CoThought: Leveraging Large Language Models for Enhanced Reasoning in Compact Models

paper_url: http://arxiv.org/abs/2308.01684
repo_url: https://github.com/oooranz/baby-cothought
paper_authors: Zheyu Zhang, Han Yang, Bolei Ma, David Rügamer, Ercong Nie
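for: The paper proposes a pipeline that uses the Chain of Thought (CoT) prompting of large language models (LLMs) to efficiently train smaller "baby" language models (BabyLMs).
methods: The pipeline uses GPT-3.5-turbo to restructure a dataset of less than 100M in size into task-oriented, human-readable texts, and the BabyLM is then pretrained on the restructured dataset in a RoBERTa (Liu et al., 2019) fashion.
results: Across 4 benchmarks, the BabyLM outperforms RoBERTa-base on 10 linguistic, NLU, and question answering tasks by more than 3 points, showing a superior ability to extract contextual information.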

Abstract
Large Language Models (LLMs) demonstrate remarkable performance on a variety of Natural Language Understanding (NLU) tasks, primarily due to their in-context learning ability. This ability is utilized in our proposed "CoThought" pipeline, which efficiently trains smaller "baby" language models (BabyLMs) by leveraging the Chain of Thought (CoT) prompting of LLMs. Our pipeline restructures a dataset of less than 100M in size using GPT-3.5-turbo, transforming it into task-oriented, human-readable texts that are comparable to the school texts for language learners. The BabyLM is then pretrained on this restructured dataset in a RoBERTa (Liu et al., 2019) fashion. In evaluations across 4 benchmarks, our BabyLM outperforms the RoBERTa-base in 10 linguistic, NLU, and question answering tasks by more than 3 points, showing superior ability to extract contextual information. These results suggest that compact LMs pretrained on small, LLM-restructured data can better understand tasks and achieve improved performance. The code for data processing and model training is available at: https://github.com/oooranz/Baby-CoThought.

Evaluating ChatGPT text-mining of clinical records for obesity monitoring

paper_url: http://arxiv.org/abs/2308.01666
repo_url: None
paper_authors: Ivo S. Fins, Heather Davies, Sean Farrell, Jose R. Torres, Gina Pinchbeck, Alan D. Radford, Peter-John Noble
for: The paper compares the ability of a large language model (ChatGPT) and a previously developed regular expression (RegexT) to identify overweight body condition scores (BCS) in veterinary narratives.
methods: The study uses 4,415 anonymised clinical narratives, with BCS values extracted either by RegexT or by appending the narrative to a prompt sent to ChatGPT; the extracted data were manually reviewed for comparison.
results: The precision of RegexT (100%, 95% CI 94.81-100%) was higher than that of ChatGPT (89.3%, 95% CI 82.75-93.64%), but the recall of ChatGPT (100%, 95% CI 96.18-100%) was considerably higher than that of RegexT (72.6%, 95% CI 63.92-79.94%).

Abstract
Background: Veterinary clinical narratives remain a largely untapped resource for addressing complex diseases. Here we compare the ability of a large language model (ChatGPT) and a previously developed regular expression (RegexT) to identify overweight body condition scores (BCS) in veterinary narratives. Methods: BCS values were extracted from 4,415 anonymised clinical narratives using either RegexT or by appending the narrative to a prompt sent to ChatGPT coercing the model to return the BCS information. Data were manually reviewed for comparison. Results: The precision of RegexT was higher (100%, 95% CI 94.81-100%) than that of ChatGPT (89.3%, 95% CI 82.75-93.64%). However, the recall of ChatGPT (100%, 95% CI 96.18-100%) was considerably higher than that of RegexT (72.6%, 95% CI 63.92-79.94%). Limitations: Subtle prompt engineering is needed to improve ChatGPT output. Conclusions: Large language models create diverse opportunities and, whilst complex, present an intuitive interface to information but require careful implementation to avoid unpredictable errors.
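
The precision and recall trade-off reported above can be illustrated with a toy regular expression. Everything in this sketch is an assumption: the pattern is not the paper's RegexT, the overweight threshold of 6 on a 9-point scale is assumed for illustration, and the three narratives are invented. It only shows how a strict pattern can be precise while missing free-text mentions.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical pattern: matches forms such as "BCS 7/9" or "bcs: 4.5/9".
BCS_PATTERN = re.compile(r"\bBCS\s*[:=]?\s*(\d(?:\.\d)?)\s*/\s*9\b", re.IGNORECASE)
OVERWEIGHT_THRESHOLD = 6.0  # assumed cut-off on a 9-point scale


def extract_bcs(narrative: str) -> Optional[float]:
    """Return the first BCS value found, or None if the pattern misses."""
    match = BCS_PATTERN.search(narrative)
    return float(match.group(1)) if match else None


def is_overweight(narrative: str) -> bool:
    score = extract_bcs(narrative)
    return score is not None and score >= OVERWEIGHT_THRESHOLD


def precision_recall(predicted: List[bool], actual: List[bool]) -> Tuple[float, float]:
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


narratives = [
    "Weight stable, BCS 7/9, advised diet.",        # overweight, matched
    "Very overweight, body condition 8 out of 9.",  # overweight, missed
    "bcs: 4.5/9, healthy.",                         # not overweight
]
actual = [True, True, False]  # invented manual-review labels
predicted = [is_overweight(n) for n in narratives]
precision, recall = precision_recall(predicted, actual)
print(predicted, precision, recall)
```

On this toy data the pattern never raises a false alarm but misses the free-text mention, which is the same direction of trade-off the abstract reports for RegexT.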

BioBERT Based SNP-traits Associations Extraction from Biomedical Literature

paper_url: http://arxiv.org/abs/2308.02569
repo_url: None
paper_authors: Mohammad Dehghani, Behrouz Bokharaeian, Zahra Yazdanparast
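for: This paper extracts associations between single nucleotide polymorphisms (SNPs) and traits from biomedical literature.
methods: The paper uses a BioBERT-GRU method to identify SNP-trait associations.
results: Evaluated on the SNPPhenA dataset, BioBERT-GRU performs better than previous machine learning and deep learning based methods, with a precision of 0.883, a recall of 0.882, and an F1-score of 0.881.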

Abstract
Scientific literature contains a considerable amount of information that provides an excellent opportunity for developing text mining methods to extract biomedical relationships. An important type of information is the relationship between single nucleotide polymorphisms (SNPs) and traits. In this paper, we present a BioBERT-GRU method to identify SNP-trait associations. Based on the evaluation of our method on the SNPPhenA dataset, it is concluded that this new method performs better than previous machine learning and deep learning based methods. BioBERT-GRU achieved a precision of 0.883, a recall of 0.882, and an F1-score of 0.881.

Multimodal Neurons in Pretrained Text-Only Transformers

paper_url: http://arxiv.org/abs/2308.01544
repo_url: None
paper_authors: Sarah Schwettmann, Neil Chowdhury, Antonio Torralba
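for: The paper studies whether the ability of language models to generalize representations learned in one modality to downstream tasks in other modalities can be traced to individual neurons.
methods: The paper studies a frozen text transformer augmented with vision through a self-supervised visual encoder and a single linear projection learned on an image-to-text task.
results: Outputs of the projection layer are not immediately decodable into language describing image content; translation between modalities occurs deeper within the transformer. The authors identify "multimodal neurons" that convert visual representations into corresponding text, operate on specific visual concepts across inputs, and have a systematic causal effect on image captioning.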

Abstract
Language models demonstrate remarkable capacity to generalize representations learned in one modality to downstream tasks in other modalities. Can we trace this ability to individual neurons? We study the case where a frozen text transformer is augmented with vision using a self-supervised visual encoder and a single linear projection learned on an image-to-text task. Outputs of the projection layer are not immediately decodable into language describing image content; instead, we find that translation between modalities occurs deeper within the transformer. We introduce a procedure for identifying "multimodal neurons" that convert visual representations into corresponding text, and decoding the concepts they inject into the model's residual stream. In a series of experiments, we show that multimodal neurons operate on specific visual concepts across inputs, and have a systematic causal effect on image captioning.

Comparing scalable strategies for generating numerical perspectives

paper_url: http://arxiv.org/abs/2308.01535
repo_url: None
paper_authors: Hancheng Cao, Sofia Eleni Spatharioti, Daniel G. Goldstein, Jake M. Hofman
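for: The paper examines how to generate numerical perspectives at scale to help people understand extreme and unfamiliar numbers (e.g., $330 billion is about $1,000 per person in the United States).
methods: The paper compares three policies for large-scale perspective generation: a rule-based approach, a crowdsourced system, and a model that uses Wikipedia data and semantic similarity (via BERT embeddings) to generate context-specific perspectives.
results: The combination of the three approaches dominates any single method; different approaches excel in different settings, and users display heterogeneous preferences across approaches.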

Abstract
Numerical perspectives help people understand extreme and unfamiliar numbers (e.g., $330 billion is about $1,000 per person in the United States). While research shows perspectives to be helpful, generating them at scale is challenging both because it is difficult to identify what makes some analogies more helpful than others, and because what is most helpful can vary based on the context in which a given number appears. Here we present and compare three policies for large-scale perspective generation: a rule-based approach, a crowdsourced system, and a model that uses Wikipedia data and semantic similarity (via BERT embeddings) to generate context-specific perspectives. We find that the combination of these three approaches dominates any single method, with different approaches excelling in different settings and users displaying heterogeneous preferences across approaches. We conclude by discussing our deployment of perspectives in a widely-used online word processor.
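
The example perspective in the abstract is a simple per-capita division, which a rule-based generator could compute as sketched below. The population figure of 330 million is an approximation assumed here, and the names `US_POPULATION`, `per_person`, and `perspective` are hypothetical rather than part of the system the paper describes.

```python
US_POPULATION = 330_000_000  # assumed approximate figure


def per_person(amount_usd: float, population: int = US_POPULATION) -> float:
    """Re-express a national dollar amount as dollars per person."""
    return amount_usd / population


def perspective(amount_usd: float) -> str:
    """Phrase the amount as a per-person perspective sentence."""
    billions = amount_usd / 1e9
    return (
        f"${billions:,.0f} billion is about "
        f"${per_person(amount_usd):,.0f} per person in the United States"
    )


sentence = perspective(330e9)
print(sentence)
```

With these assumptions, 330 billion divided by 330 million gives 1,000, which matches the figure quoted in the abstract.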

Large Language Model Displays Emergent Ability to Interpret Novel Literary Metaphors

paper_url: http://arxiv.org/abs/2308.01497
repo_url: None
paper_authors: Nicholas Ichien, Dušan Stamenković, Keith J. Holyoak
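for: The paper examines whether the abilities of large language models (LLMs) extend to more creative human abilities, using the interpretation of novel metaphors as a core example.
methods: The paper assesses GPT-4's natural-language interpretations of novel literary metaphors drawn from Serbian poetry and translated into English.
results: GPT-4 consistently produced detailed and incisive interpretations, which human judges blind to the involvement of an AI model rated as superior to those of a group of college students. In interpreting reversed metaphors, GPT-4, like humans, showed sensitivity to the Gricean cooperative principle. The results indicate that LLMs such as GPT-4 have acquired an emergent ability to interpret complex novel metaphors.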

Abstract
Recent advances in the performance of large language models (LLMs) have sparked debate over whether, given sufficient training, high-level human abilities emerge in such generic forms of artificial intelligence (AI). Despite the exceptional performance of LLMs on a wide range of tasks involving natural language processing and reasoning, there has been sharp disagreement as to whether their abilities extend to more creative human abilities. A core example is the ability to interpret novel metaphors. Given the enormous and non-curated text corpora used to train LLMs, a serious obstacle to designing tests is the requirement of finding novel yet high-quality metaphors that are unlikely to have been included in the training data. Here we assessed the ability of GPT-4, a state-of-the-art large language model, to provide natural-language interpretations of novel literary metaphors drawn from Serbian poetry and translated into English. Despite exhibiting no signs of having been exposed to these metaphors previously, the AI system consistently produced detailed and incisive interpretations. Human judges, blind to the fact that an AI model was involved, rated metaphor interpretations generated by GPT-4 as superior to those provided by a group of college students. In interpreting reversed metaphors, GPT-4, as well as humans, exhibited signs of sensitivity to the Gricean cooperative principle. These results indicate that LLMs such as GPT-4 have acquired an emergent ability to interpret complex novel metaphors.

Investigating Reinforcement Learning for Communication Strategies in a Task-Initiative Setting

paper_url: http://arxiv.org/abs/2308.01479
repo_url: None
paper_authors: Baber Khalid, Matthew Stone
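for: The paper studies how a conversational system should balance initial presentation and follow-up when it must present nuanced information to users.
methods: Using simulation in a referential communication task, the paper analyzes the trade-offs between initial presentation and subsequent follow-up as a function of user clarification strategy, and compares several baseline strategies with policies derived by reinforcement learning.
results: Coherence-based representations of dialogue strategy bring minimal data requirements, explainable choices, and strong audit capabilities, while incurring little loss in predicted outcomes across a wide range of user models.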

Abstract
Many conversational domains require the system to present nuanced information to users. Such systems must follow up what they say to address clarification questions and repair misunderstandings. In this work, we explore this interactive strategy in a referential communication task. Using simulation, we analyze the communication trade-offs between initial presentation and subsequent followup as a function of user clarification strategy, and compare the performance of several baseline strategies to policies derived by reinforcement learning. We find surprising advantages to coherence-based representations of dialogue strategy, which bring minimal data requirements, explainable choices, and strong audit capabilities, but incur little loss in predicted outcomes across a wide range of user models.

Reverse Stable Diffusion: What prompt was used to generate this image?

paper_url: http://arxiv.org/abs/2308.01472
repo_url: None
paper_authors: Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Mubarak Shah
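for: The study introduces the new task of predicting the text prompt given an image generated by a generative diffusion model.
methods: The authors combine a series of white-box and black-box models (with and without access to the weights of the diffusion network) and propose a learning framework with a joint prompt regression and multi-label vocabulary classification objective. They further add a curriculum learning procedure and an unsupervised domain-adaptive kernel learning method.
results: On the DiffusionDB data set, the learning framework produces excellent results, with the highest gains on the white-box model. The authors also find that training a diffusion model on the prompt generation task makes it generate images that are much better aligned with the input prompts when the model is directly reused for text-to-image generation.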

Abstract
Text-to-image diffusion models such as Stable Diffusion have recently attracted the interest of many researchers, and inverting the diffusion process can play an important role in better understanding the generative process and how to engineer prompts in order to obtain the desired images. To this end, we introduce the new task of predicting the text prompt given an image generated by a generative diffusion model. We combine a series of white-box and black-box models (with and without access to the weights of the diffusion network) to deal with the proposed task. We propose a novel learning framework comprising a joint prompt regression and multi-label vocabulary classification objective that generates improved prompts. To further improve our method, we employ a curriculum learning procedure that promotes the learning of image-prompt pairs with lower labeling noise (i.e. that are better aligned), and an unsupervised domain-adaptive kernel learning method that uses the similarities between samples in the source and target domains as extra features. We conduct experiments on the DiffusionDB data set, predicting text prompts from images generated by Stable Diffusion. Our novel learning framework produces excellent results on the aforementioned task, yielding the highest gains when applied on the white-box model. In addition, we make an interesting discovery: training a diffusion model on the prompt generation task can make the model generate images that are much better aligned with the input prompts, when the model is directly reused for text-to-image generation.
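
A joint objective that combines prompt regression with multi-label vocabulary classification could take the shape sketched below. This is only an illustration under assumptions: the abstract does not specify the loss functions, so mean squared error and binary cross-entropy are swapped in here, and the names `joint_loss` and `cls_weight` are hypothetical.

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Regression term: distance between predicted and true prompt embeddings."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def binary_cross_entropy(
    probs: Sequence[float], labels: Sequence[int], eps: float = 1e-12
) -> float:
    """Classification term: one independent yes/no decision per vocabulary word."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def joint_loss(
    pred_emb: Sequence[float],
    target_emb: Sequence[float],
    vocab_probs: Sequence[float],
    vocab_labels: Sequence[int],
    cls_weight: float = 1.0,  # assumed weighting between the two terms
) -> float:
    return mse(pred_emb, target_emb) + cls_weight * binary_cross_entropy(
        vocab_probs, vocab_labels
    )


# Perfect embedding prediction, uninformative vocabulary predictions.
loss = joint_loss([0.1, 0.2], [0.1, 0.2], [0.5, 0.5], [1, 0])
print(loss)
```

In this toy call the regression term is zero, so the whole loss comes from the classification term, which shows how the vocabulary head adds a training signal beyond the embedding target.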

UPB at IberLEF-2023 AuTexTification: Detection of Machine-Generated Text using Transformer Ensembles

paper_url: http://arxiv.org/abs/2308.01408
repo_url: None
paper_authors: Andrei-Alexandru Preda, Dumitru-Clementin Cercel, Traian Rebedea, Costin-Gabriel Chiru
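for: This paper describes the solutions submitted by the UPB team to the AuTexTification shared task, part of IberLEF-2023.
methods: The team uses deep learning models based on Transformers, together with training techniques such as multi-task learning and virtual adversarial training.
results: The best-performing model achieved macro F1-scores of 66.63% on the English dataset and 67.10% on the Spanish dataset.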

Abstract
This paper describes the solutions submitted by the UPB team to the AuTexTification shared task, featured as part of IberLEF-2023. Our team participated in the first subtask, identifying text documents produced by large language models instead of humans. The organizers provided a bilingual dataset for this subtask, comprising English and Spanish texts covering multiple domains, such as legal texts, social media posts, and how-to articles. We experimented mostly with deep learning models based on Transformers, as well as training techniques such as multi-task learning and virtual adversarial training to obtain better results. We submitted three runs, two of which consisted of ensemble models. Our best-performing model achieved macro F1-scores of 66.63% on the English dataset and 67.10% on the Spanish dataset.
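
The macro F1-score used to rank systems averages the per-class F1 with equal weight for each class. A minimal sketch of the computation follows; the label names and the toy predictions are invented for illustration and are not the shared task's data.

```python
from typing import List, Sequence


def f1_for_class(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """F1 for one class, computed as 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    labels: List[str] = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


y_true = ["human", "human", "generated", "generated"]
y_pred = ["human", "generated", "generated", "generated"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

Because each class counts equally, a detector that favours one label is penalised even when overall accuracy looks acceptable.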

Optimizing Machine Translation through Prompt Engineering: An Investigation into ChatGPT’s Customizability

paper_url: http://arxiv.org/abs/2308.01391
repo_url: None
paper_authors: Masaru Yamada
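for: This study explores how integrating the purpose of the translation and the target audience into prompts influences the quality of translations produced by ChatGPT.
methods: Drawing on previous translation studies, industry practices, and ISO standards, the study underscores the significance of the pre-production phase in the translation process. Translations are evaluated from a practicing translator's viewpoint, both subjectively and qualitatively, supplemented by cosine similarity calculations using OpenAI's word embedding API.
results: Suitable prompts in large-scale language models like ChatGPT can yield flexible translations, which conventional Machine Translation (MT) has yet to achieve. Integrating the purpose and target audience into prompts modifies the generated translations and generally enhances translation quality by industry standards. The study also demonstrates the practical application of the "good translation" concept, particularly for marketing documents and culturally dependent idioms.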

Abstract
This paper explores the influence of integrating the purpose of the translation and the target audience into prompts on the quality of translations produced by ChatGPT. Drawing on previous translation studies, industry practices, and ISO standards, the research underscores the significance of the pre-production phase in the translation process. The study reveals that the inclusion of suitable prompts in large-scale language models like ChatGPT can yield flexible translations, a feat yet to be realized by conventional Machine Translation (MT). The research scrutinizes the changes in translation quality when prompts are used to generate translations that meet specific conditions. The evaluation is conducted from a practicing translator's viewpoint, both subjectively and qualitatively, supplemented by the use of OpenAI's word embedding API for cosine similarity calculations. The findings suggest that the integration of the purpose and target audience into prompts can indeed modify the generated translations, generally enhancing the translation quality by industry standards. The study also demonstrates the practical application of the "good translation" concept, particularly in the context of marketing documents and culturally dependent idioms.
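
The cosine similarity used to compare translations can be computed from two embedding vectors as sketched below. The vectors here are toy values; in the study they would come from an embedding API, whose calls are not reproduced, and the function name `cosine_similarity` is a generic helper rather than anything taken from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings of two translations.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

A value near 1 indicates that two translations have very similar embeddings, while lower values indicate that the prompt changed the output more substantially.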
