cs.CL - 2023-07-27

ARC-NLP at PAN 2023: Transition-Focused Natural Language Inference for Writing Style Detection

  • paper_url: http://arxiv.org/abs/2307.14913
  • repo_url: None
  • paper_authors: Izzet Emre Kucukkaya, Umitcan Sahin, Cagri Toraman
  • for: The task is to detect positions where the writing style changes between multiple authors within a text document.
  • methods: The task is cast as a natural language inference problem in which two consecutive paragraphs are paired (see the sketch after the abstract). Different Transformer-based backbone models are trained, with a warmup phase added during training.
  • results: In the experiments, the submitted version outperforms the baselines and the other proposed model versions. For the easy and medium setups, the submission is the transition-focused natural language inference model based on DeBERTa with warmup training; for the hard setup, it is the same model without the transition focus.
    Abstract The task of multi-author writing style detection aims at finding any positions of writing style change in a given text document. We formulate the task as a natural language inference problem where two consecutive paragraphs are paired. Our approach focuses on transitions between paragraphs while truncating input tokens for the task. As backbone models, we employ different Transformer-based encoders with warmup phase during training. We submit the model version that outperforms baselines and other proposed model versions in our experiments. For the easy and medium setups, we submit transition-focused natural language inference based on DeBERTa with warmup training, and the same model without transition for the hard setup.
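
A minimal sketch of the transition-focused NLI formulation described above, assuming a generic DeBERTa checkpoint and binary labels (an illustration, not the authors' released code):

```python
# Pair two consecutive paragraphs as an NLI-style input and score the
# probability of a writing style change at their boundary.
# Assumptions: model name, label semantics (1 = style change), max_length.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-base", num_labels=2)

def style_change_prob(par_a: str, par_b: str) -> float:
    # Truncation keeps the input within the token budget; the paper likewise
    # truncates tokens while focusing on the paragraph transition.
    inputs = tokenizer(par_a, par_b, truncation=True, max_length=256,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()
```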

ARC-NLP at PAN 2023: Hierarchical Long Text Classification for Trigger Detection

  • paper_url: http://arxiv.org/abs/2307.14912
  • repo_url: None
  • paper_authors: Umitcan Sahin, Izzet Emre Kucukkaya, Cagri Toraman
  • for: This paper describes an approach to detecting multiple kinds of triggering content in a given document, developed for the Trigger Detection shared task at PAN CLEF 2023.
  • methods: A hierarchical model is used: long documents are first split into smaller segments, which are used to fine-tune a Transformer-based language model; feature embeddings are then extracted from the fine-tuned Transformer and used as input to train multiple LSTM models for multi-label trigger detection (see the sketch after the abstract).
  • results: The model achieves an F1-macro score of 0.372 and an F1-micro score of 0.736 on the validation set, both higher than the baseline results shared at PAN CLEF 2023.
    Abstract Fanfiction, a popular form of creative writing set within established fictional universes, has gained a substantial online following. However, ensuring the well-being and safety of participants has become a critical concern in this community. The detection of triggering content, material that may cause emotional distress or trauma to readers, poses a significant challenge. In this paper, we describe our approach for the Trigger Detection shared task at PAN CLEF 2023, where we want to detect multiple triggering content in a given Fanfiction document. For this, we build a hierarchical model that uses recurrence over Transformer-based language models. In our approach, we first split long documents into smaller sized segments and use them to fine-tune a Transformer model. Then, we extract feature embeddings from the fine-tuned Transformer model, which are used as input in the training of multiple LSTM models for trigger detection in a multi-label setting. Our model achieves an F1-macro score of 0.372 and F1-micro score of 0.736 on the validation set, which are higher than the baseline results shared at PAN CLEF 2023.
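
A minimal sketch of the recurrence-over-Transformer idea, with encoder choice, segment length, hidden sizes, and label count as illustrative assumptions:

```python
# Encode each segment of a long document with a (fine-tuned) Transformer,
# then run an LSTM over the sequence of segment embeddings and emit
# multi-label trigger probabilities.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class HierarchicalTriggerClassifier(nn.Module):
    def __init__(self, encoder_name="roberta-base", num_labels=32):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        dim = self.encoder.config.hidden_size
        self.lstm = nn.LSTM(dim, 256, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * 256, num_labels)

    def forward(self, segments):
        # segments: list of tokenized segments for one document
        embs = [self.encoder(**s).last_hidden_state[:, 0] for s in segments]
        seq = torch.stack(embs, dim=1)               # (1, n_segments, dim)
        out, _ = self.lstm(seq)
        return torch.sigmoid(self.head(out[:, -1]))  # (1, num_labels)

tok = AutoTokenizer.from_pretrained("roberta-base")
doc = "a very long fanfiction document " * 200
chunks = [doc[i:i + 1000] for i in range(0, len(doc), 1000)]  # naive split
segments = [tok(c, truncation=True, max_length=256, return_tensors="pt")
            for c in chunks]
probs = HierarchicalTriggerClassifier()(segments)
```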

Retrieval-based Text Selection for Addressing Class-Imbalanced Data in Classification

  • paper_url: http://arxiv.org/abs/2307.14899
  • repo_url: None
  • paper_authors: Sareh Ahmadi, Aditya Shah, Edward Fox
  • for: This paper addresses the selection of texts for annotation in text classification when human annotation resources are limited, together with the additional challenge of binary categories that have very few positive instances (severe class imbalance).
  • methods: The paper proposes leveraging SHAP to construct a high-quality set of queries for Elasticsearch and semantic search to select the texts to annotate (see the sketch after the abstract). Since annotation occurs in batches, previous annotations guide the selection of the next batch.
  • results: Experiments demonstrate improved F1 scores for the minority classes, helping to address the class imbalance.
    Abstract This paper addresses the problem of selecting a set of texts for annotation in text classification using retrieval methods when there are limits on the number of annotations due to constraints on human resources. An additional challenge addressed is dealing with binary categories that have a small number of positive instances, reflecting severe class imbalance. In our situation, where annotation occurs over a long time period, the selection of texts to be annotated can be made in batches, with previous annotations guiding the choice of the next set. To address these challenges, the paper proposes leveraging SHAP to construct a quality set of queries for Elasticsearch and semantic search, to try to identify optimal sets of texts for annotation that will help with class imbalance. The approach is tested on sets of cue texts describing possible future events, constructed by participants involved in studies aimed to help with the management of obesity and diabetes. We introduce an effective method for selecting a small set of texts for annotation and building high-quality classifiers. We integrate vector search, semantic search, and machine learning classifiers to yield a good solution. Our experiments demonstrate improved F1 scores for the minority classes in binary classification.
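
A minimal sketch of the SHAP-to-query idea under stated assumptions (a fitted sklearn text pipeline `clf` with predict_proba, an index named "texts" with a "body" field, and the elasticsearch-py 8.x client):

```python
# Rank tokens by their SHAP contribution toward the rare positive class on
# already-annotated positives, then query Elasticsearch for similar texts
# to propose as the next annotation batch.
import shap
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local instance

def select_candidates(clf, positive_texts, top_k=10, size=50):
    explainer = shap.Explainer(lambda x: clf.predict_proba(x)[:, 1],
                               shap.maskers.Text(r"\W+"))
    exp = explainer(positive_texts)
    scores = {}
    for tokens, vals in zip(exp.data, exp.values):
        for tok, v in zip(tokens, vals):
            scores[tok] = scores.get(tok, 0.0) + float(v)
    terms = [t for t, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]]
    hits = es.search(index="texts",
                     query={"match": {"body": " ".join(terms)}},
                     size=size)
    return [h["_source"]["body"] for h in hits["hits"]["hits"]]
```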

MESED: A Multi-modal Entity Set Expansion Dataset with Fine-grained Semantic Classes and Hard Negative Entities

  • paper_url: http://arxiv.org/abs/2307.14878
  • repo_url: https://github.com/thukelab/mesed
  • paper_authors: Yangning Li, Tingwei Lu, Yinghui Li, Tianyu Yu, Shulin Huang, Hai-Tao Zheng, Rui Zhang, Jun Yuan
  • for: This work aims to improve entity expansion in the Entity Set Expansion (ESE) task by integrating information from multiple modalities to represent entities.
  • methods: The paper proposes Multi-modal Entity Set Expansion (MESE), in which models represent entities with multi-modal information (see the sketch after the abstract). The intuition is threefold: (1) different modalities provide complementary information; (2) multi-modal information supplies a unified signal via common visual properties for the same semantic class or entity; (3) multi-modal information offers a robust alignment signal for synonymous entities.
  • results: On the new MESED dataset, the authors propose a powerful multi-modal model, MultiExpan, pre-trained on four multi-modal pre-training tasks. Experiments and analyses demonstrate the high quality of MESED and the effectiveness of MultiExpan, and point out directions for future research.
    Abstract The Entity Set Expansion (ESE) task aims to expand a handful of seed entities with new entities belonging to the same semantic class. Conventional ESE methods are based on mono-modality (i.e., literal modality), which struggle to deal with complex entities in the real world such as: (1) Negative entities with fine-grained semantic differences. (2) Synonymous entities. (3) Polysemous entities. (4) Long-tailed entities. These challenges prompt us to propose Multi-modal Entity Set Expansion (MESE), where models integrate information from multiple modalities to represent entities. Intuitively, the benefits of multi-modal information for ESE are threefold: (1) Different modalities can provide complementary information. (2) Multi-modal information provides a unified signal via common visual properties for the same semantic class or entity. (3) Multi-modal information offers robust alignment signal for synonymous entities. To assess the performance of model in MESE and facilitate further research, we constructed the MESED dataset which is the first multi-modal dataset for ESE with large-scale and elaborate manual calibration. A powerful multi-modal model MultiExpan is proposed which is pre-trained on four multimodal pre-training tasks. The extensive experiments and analyses on MESED demonstrate the high quality of the dataset and the effectiveness of our MultiExpan, as well as pointing the direction for future research.
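
A minimal sketch of a multi-modal entity representation using a public CLIP model; it illustrates the shared-visual-signal intuition, not the MultiExpan architecture or its pre-training tasks:

```python
# Represent an entity by concatenating L2-normalized CLIP text and image
# embeddings, so entities of the same fine-grained class can share a
# common visual signal.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def entity_embedding(name: str, image_path: str) -> torch.Tensor:
    inputs = processor(text=[name], images=Image.open(image_path),
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        t = model.get_text_features(input_ids=inputs["input_ids"],
                                    attention_mask=inputs["attention_mask"])
        v = model.get_image_features(pixel_values=inputs["pixel_values"])
    t = t / t.norm(dim=-1, keepdim=True)
    v = v / v.norm(dim=-1, keepdim=True)
    return torch.cat([t, v], dim=-1).squeeze(0)  # simple late fusion
```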

Cascaded Cross-Modal Transformer for Request and Complaint Detection

  • paper_url: http://arxiv.org/abs/2307.15097
  • repo_url: https://github.com/ristea/ccmt
  • paper_authors: Nicolae-Catalin Ristea, Radu Tudor Ionescu
  • for: This work develops a novel multi-modal transformer model for detecting customer requests and complaints in phone conversations.
  • methods: Automatic speech recognition (ASR) models transcribe the speech, and the transcripts are translated into different languages. Language-specific BERT models are then combined with Wav2Vec2.0 audio features in a novel cascaded cross-attention transformer model (see the sketch after the abstract).
  • results: Applied to the Requests Sub-Challenge of the ACM Multimedia 2023 Computational Paralinguistics Challenge, the system reaches unweighted average recalls (UAR) of 65.41% and 85.87% for the complaint and request classes, respectively.
    Abstract We propose a novel cascaded cross-modal transformer (CCMT) that combines speech and text transcripts to detect customer requests and complaints in phone conversations. Our approach leverages a multimodal paradigm by transcribing the speech using automatic speech recognition (ASR) models and translating the transcripts into different languages. Subsequently, we combine language-specific BERT-based models with Wav2Vec2.0 audio features in a novel cascaded cross-attention transformer model. We apply our system to the Requests Sub-Challenge of the ACM Multimedia 2023 Computational Paralinguistics Challenge, reaching unweighted average recalls (UAR) of 65.41% and 85.87% for the complaint and request classes, respectively.
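
A minimal sketch of one cross-attention fusion stage of the kind such a cascaded model stacks; the dimensions and residual layout are assumptions, not the CCMT implementation:

```python
# Text token features attend over Wav2Vec2.0 audio frame features; several
# such blocks can be cascaded across languages and modalities.
import torch
import torch.nn as nn

class CrossModalBlock(nn.Module):
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_feats, audio_feats):
        # text_feats: (B, T_text, dim) from a BERT-style encoder
        # audio_feats: (B, T_audio, dim) from Wav2Vec2.0
        fused, _ = self.attn(query=text_feats, key=audio_feats,
                             value=audio_feats)
        return self.norm(text_feats + fused)  # residual + layer norm

block = CrossModalBlock()
out = block(torch.randn(2, 40, 768), torch.randn(2, 300, 768))
```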

ArcGPT: A Large Language Model Tailored for Real-world Archival Applications

  • paper_url: http://arxiv.org/abs/2307.14852
  • repo_url: None
  • paper_authors: Shitou Zhang, Jingrui Hou, Siyuan Peng, Zuchao Li, Qibiao Hu, Ping Wang
  • for: To better manage and utilize archival information resources and to improve the efficiency and quality of archivists' work.
  • methods: Pre-training on massive and extensive archival-domain data to improve model performance on real-world archival tasks.
  • results: ArcGPT outperforms existing state-of-the-art models on four real-world archival tasks, marking a substantial step forward in effective archival data management.
    Abstract Archives play a crucial role in preserving information and knowledge, and the exponential growth of such data necessitates efficient and automated tools for managing and utilizing archive information resources. Archival applications involve managing massive data that are challenging to process and analyze. Although LLMs have made remarkable progress in diverse domains, there are no publicly available archives tailored LLM. Addressing this gap, we introduce ArcGPT, to our knowledge, the first general-purpose LLM tailored to the archival field. To enhance model performance on real-world archival tasks, ArcGPT has been pre-trained on massive and extensive archival domain data. Alongside ArcGPT, we release AMBLE, a benchmark comprising four real-world archival tasks. Evaluation on AMBLE shows that ArcGPT outperforms existing state-of-the-art models, marking a substantial step forward in effective archival data management. Ultimately, ArcGPT aims to better serve the archival community, aiding archivists in their crucial role of preserving and harnessing our collective information and knowledge.

Turkish Native Language Identification

  • paper_url: http://arxiv.org/abs/2307.14850
  • repo_url: None
  • paper_authors: Ahmet Yavuz Uluslu, Gerold Schneider
  • for: This paper presents the first application of Native Language Identification (NLI) to the Turkish language.
  • methods: The study uses the recently constructed Turkish Learner Corpus and combines three syntactic feature types (CFG production rules, part-of-speech n-grams, and function words) with L2 texts (see the sketch after the abstract).
  • results: These syntactic features are shown to be effective in predicting the first language of authors writing in Turkish.
    Abstract In this paper, we present the first application of Native Language Identification (NLI) for the Turkish language. NLI involves predicting the writer's first language by analysing their writing in different languages. While most NLI research has focused on English, our study extends its scope to Turkish. We used the recently constructed Turkish Learner Corpus and employed a combination of three syntactic features (CFG production rules, part-of-speech n-grams, and function words) with L2 texts to demonstrate their effectiveness in this task.
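
A minimal sketch of one of the three syntactic feature families, part-of-speech n-grams; the English tagger below is a stand-in for illustration, since the study works with Turkish learner texts:

```python
# Map each text to its POS tag sequence and vectorize POS bigrams/trigrams
# as features for a downstream native-language classifier.
import nltk
from sklearn.feature_extraction.text import CountVectorizer

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

def pos_sequence(text: str) -> str:
    return " ".join(tag for _, tag in nltk.pos_tag(nltk.word_tokenize(text)))

texts = ["I am writing my first essay in English.",
         "Yesterday we go to the market together."]
pos_docs = [pos_sequence(t) for t in texts]
vectorizer = CountVectorizer(ngram_range=(2, 3), token_pattern=r"\S+")
X = vectorizer.fit_transform(pos_docs)  # POS n-gram count features
```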

What Makes a Good Paraphrase: Do Automated Evaluations Work?

  • paper_url: http://arxiv.org/abs/2307.14818
  • repo_url: None
  • paper_authors: Anna Moskvina, Bhushan Kotnis, Chris Catacata, Michael Janz, Nasrin Saef
  • for: This study investigates what makes a good paraphrase and whether automated metrics alone can evaluate paraphrase quality.
  • methods: Experiments are conducted on a German data set of automatically generated paraphrases, using both automatic metrics and expert linguistic evaluation (one possible automatic metric is sketched after the abstract).
  • results: Different paraphrasing strategies are found to yield paraphrases of differing quality.
    Abstract Paraphrasing is the task of expressing an essential idea or meaning in different words. But how different should the words be in order to be considered an acceptable paraphrase? And can we exclusively use automated metrics to evaluate the quality of a paraphrase? We attempt to answer these questions by conducting experiments on a German data set and performing automatic and expert linguistic evaluation.
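
A minimal sketch of one plausible automatic paraphrase metric (an illustration, not the paper's evaluation protocol): reward preserved meaning while penalizing surface copying:

```python
# Score = semantic similarity from sentence embeddings, discounted by
# word-level Jaccard overlap, since a good paraphrase changes the wording.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def paraphrase_score(source: str, candidate: str) -> float:
    semantic = util.cos_sim(model.encode(source),
                            model.encode(candidate)).item()
    a, b = set(source.lower().split()), set(candidate.lower().split())
    surface = len(a & b) / max(len(a | b), 1)
    return semantic * (1.0 - surface)

print(paraphrase_score("The cat sat on the mat.",
                       "A feline rested on the rug."))
```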

Models of reference production: How do they withstand the test of time?

  • paper_url: http://arxiv.org/abs/2307.14817
  • repo_url: https://github.com/fsame/reg_grec-wsj
  • paper_authors: Fahime Same, Guanyi Chen, Kees van Deemter
  • for: This work focuses on the linguistic and scientific aspects of NLP rather than on performance improvement alone.
  • methods: Generating referring expressions in context (REG-in-context) is used as a case study, starting from GREC, a comprehensive set of English shared tasks released over a decade ago. Models are assessed on more realistic datasets and with more advanced methods, using different evaluation metrics and feature-selection experiments.
  • results: The authors conclude that GREC can no longer be regarded as a reliable assessment of models' ability to mimic human reference production, because the results are highly impacted by the choice of corpus and evaluation metrics. The results also suggest that pre-trained language models are less dependent on the choice of corpus than classic machine learning models, and therefore make more robust class predictions.
    Abstract In recent years, many NLP studies have focused solely on performance improvement. In this work, we focus on the linguistic and scientific aspects of NLP. We use the task of generating referring expressions in context (REG-in-context) as a case study and start our analysis from GREC, a comprehensive set of shared tasks in English that addressed this topic over a decade ago. We ask what the performance of models would be if we assessed them (1) on more realistic datasets, and (2) using more advanced methods. We test the models using different evaluation metrics and feature selection experiments. We conclude that GREC can no longer be regarded as offering a reliable assessment of models' ability to mimic human reference production, because the results are highly impacted by the choice of corpus and evaluation metrics. Our results also suggest that pre-trained language models are less dependent on the choice of corpus than classic Machine Learning models, and therefore make more robust class predictions.

Improving Aspect-Based Sentiment with End-to-End Semantic Role Labeling Model

  • paper_url: http://arxiv.org/abs/2307.14785
  • repo_url: https://github.com/pauli31/srl-aspect-based-sentiment
  • paper_authors: Pavel Přibáň, Ondřej Pražák
  • for: Improving Aspect-Based Sentiment Analysis (ABSA) performance.
  • methods: Semantic information is extracted with a Semantic Role Labeling (SRL) model, and a novel end-to-end SRL model is proposed.
  • results: The proposed models, evaluated with ELECTRA-small models in English and Czech, improve ABSA performance in both languages and achieve new state-of-the-art results on Czech ABSA.
    Abstract This paper presents a series of approaches aimed at enhancing the performance of Aspect-Based Sentiment Analysis (ABSA) by utilizing extracted semantic information from a Semantic Role Labeling (SRL) model. We propose a novel end-to-end Semantic Role Labeling model that effectively captures most of the structured semantic information within the Transformer hidden state. We believe that this end-to-end model is well-suited for our newly proposed models that incorporate semantic information. We evaluate the proposed models in two languages, English and Czech, employing ELECTRA-small models. Our combined models improve ABSA performance in both languages. Moreover, we achieved new state-of-the-art results on the Czech ABSA.

Turning Whisper into Real-Time Transcription System

  • paper_url: http://arxiv.org/abs/2307.14743
  • repo_url: https://github.com/ufal/whisper_streaming
  • paper_authors: Dominik Macháček, Raj Dabre, Ondřej Bojar
  • for: This paper presents Whisper-Streaming, an implementation of real-time speech transcription and translation for Whisper-like models.
  • methods: Building on the Whisper model, it uses a local agreement policy with self-adaptive latency to enable streaming transcription (see the sketch after the abstract).
  • results: Whisper-Streaming achieves high quality with 3.3 seconds of latency on an unsegmented long-form speech transcription test set, and demonstrates robustness and practical usability as a component of a live transcription service at a multilingual conference.
    Abstract Whisper is one of the recent state-of-the-art multilingual speech recognition and translation models, however, it is not designed for real time transcription. In this paper, we build on top of Whisper and create Whisper-Streaming, an implementation of real-time speech transcription and translation of Whisper-like models. Whisper-Streaming uses local agreement policy with self-adaptive latency to enable streaming transcription. We show that Whisper-Streaming achieves high quality and 3.3 seconds latency on unsegmented long-form speech transcription test set, and we demonstrate its robustness and practical usability as a component in live transcription service at a multilingual conference.
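
A minimal sketch of a local agreement policy (the released implementation differs in detail): only the longest common prefix of two consecutive hypotheses over the growing audio buffer is committed as final output:

```python
# Re-run the recognizer as audio arrives; emit only words on which the last
# two hypotheses agree, leaving the unstable tail for the next update.
def local_agreement(prev_hyp: list[str], curr_hyp: list[str]) -> list[str]:
    committed = []
    for a, b in zip(prev_hyp, curr_hyp):
        if a != b:
            break
        committed.append(a)
    return committed

h1 = "the quick brown fox jumped".split()
h2 = "the quick brown fox jumps over".split()
print(local_agreement(h1, h2))  # ['the', 'quick', 'brown', 'fox']
```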

Improving Natural Language Inference in Arabic using Transformer Models and Linguistically Informed Pre-Training

  • paper_url: http://arxiv.org/abs/2307.14666
  • repo_url: https://github.com/fraunhofer-iais/arabic_nlp
  • paper_authors: Mohammad Majd Saad Al Deen, Maren Pielka, Jörn Hees, Bouthaina Soulef Abdou, Rafet Sifa
  • for: This work targets the classification of Arabic text, with a particular focus on Natural Language Inference (NLI) and Contradiction Detection (CD). Arabic is a resource-poor language with few available data sets, which limits the NLP methods that can be applied.
  • methods: The authors mitigate this limitation by creating a dedicated data set from publicly available resources, and then train and evaluate transformer-based machine learning models. They also apply linguistically informed pre-training methods such as Named Entity Recognition (NER) to improve model performance.
  • results: With linguistically informed pre-training, a language-specific model (AraBERT) performs competitively with state-of-the-art multilingual approaches on the NLI and CD tasks. To the authors' knowledge, this is the first large-scale evaluation of these tasks in Arabic, as well as the first application of multi-task pre-training in this context.
    Abstract This paper addresses the classification of Arabic text data in the field of Natural Language Processing (NLP), with a particular focus on Natural Language Inference (NLI) and Contradiction Detection (CD). Arabic is considered a resource-poor language, meaning that there are few data sets available, which leads to limited availability of NLP methods. To overcome this limitation, we create a dedicated data set from publicly available resources. Subsequently, transformer-based machine learning models are being trained and evaluated. We find that a language-specific model (AraBERT) performs competitively with state-of-the-art multilingual approaches, when we apply linguistically informed pre-training methods such as Named Entity Recognition (NER). To our knowledge, this is the first large-scale evaluation for this task in Arabic, as well as the first application of multi-task pre-training in this context.

Plug and Pray: Exploiting off-the-shelf components of Multi-Modal Models

  • paper_url: http://arxiv.org/abs/2307.14539
  • repo_url: None
  • paper_authors: Erfan Shayegani, Yue Dong, Nael Abu-Ghazaleh
  • for: This paper alerts developers to the security risks of incorporating additional modalities (e.g., vision) into large language models (LLMs).
  • methods: The work introduces adversarial embedding space attacks that target the embedding space of pre-trained encoders, which are typically publicly available and incorporated into multi-modal systems in a plug-and-play manner; unlike prior work, no access to the system's weights or parameters is required (see the sketch after the abstract).
  • results: By finding input images that reside in dangerous or targeted regions of a pre-trained encoder's embedding space, the attacks pose two major threats, 'Context Contamination' and 'Hidden Prompt Injection', both of which can compromise multi-modal models such as LLaVA and fully change the behavior of the associated language model.
    Abstract The rapid growth and increasing popularity of incorporating additional modalities (e.g., vision) into large language models (LLMs) has raised significant security concerns. This expansion of modality, akin to adding more doors to a house, unintentionally creates multiple access points for adversarial attacks. In this paper, by introducing adversarial embedding space attacks, we emphasize the vulnerabilities present in multi-modal systems that originate from incorporating off-the-shelf components like public pre-trained encoders in a plug-and-play manner into these systems. In contrast to existing work, our approach does not require access to the multi-modal system's weights or parameters but instead relies on the huge under-explored embedding space of such pre-trained encoders. Our proposed embedding space attacks involve seeking input images that reside within the dangerous or targeted regions of the extensive embedding space of these pre-trained components. These crafted adversarial images pose two major threats: 'Context Contamination' and 'Hidden Prompt Injection'-both of which can compromise multi-modal models like LLaVA and fully change the behavior of the associated language model. Our findings emphasize the need for a comprehensive examination of the underlying components, particularly pre-trained encoders, before incorporating them into systems in a plug-and-play manner to ensure robust security.
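
A heavily hedged sketch of the embedding space attack idea, using a public CLIP image encoder as the off-the-shelf component; the model choice, optimizer settings, and the plain [0, 1] pixel box are illustrative assumptions:

```python
# Optimize an input image so its embedding lands near a chosen target region
# of the frozen encoder's embedding space; no access to the downstream
# multi-modal system's weights is needed.
import torch
from transformers import CLIPVisionModelWithProjection

encoder = CLIPVisionModelWithProjection.from_pretrained(
    "openai/clip-vit-base-patch32").eval()

def craft_adversarial_image(target_emb: torch.Tensor, steps=200, lr=1e-2):
    x = torch.rand(1, 3, 224, 224, requires_grad=True)  # start from noise
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        emb = encoder(pixel_values=x).image_embeds
        loss = 1 - torch.cosine_similarity(emb, target_emb).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(0, 1)  # keep pixels in a (simplified) valid range
    return x.detach()
```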

CliniDigest: A Case Study in Large Language Model Based Large-Scale Summarization of Clinical Trial Descriptions

  • paper_url: http://arxiv.org/abs/2307.14522
  • repo_url: None
  • paper_authors: Renee D. White, Tristan Peng, Pann Sripitak, Alexander Rosenberg Johansen, Michael Snyder
  • for: This paper aims to provide a tool for summarizing clinical trials in real-time, with the goal of helping researchers keep up-to-date with the latest trials in their field.
  • methods: The tool used for summarization is called CliniDigest, which is based on GPT-3.5. It can reduce a large amount of text (up to 85 clinical trial descriptions) into a concise 200-word summary with references and limited hallucinations.
  • results: The paper reports the results of testing CliniDigest on 457 trials across 27 medical subdomains. The summaries generated by CliniDigest have a mean length of 153 words and utilize an average of 54% of the sources.
    Abstract A clinical trial is a study that evaluates new biomedical interventions. To design new trials, researchers draw inspiration from those current and completed. In 2022, there were on average more than 100 clinical trials submitted to ClinicalTrials.gov every day, with each trial having a mean of approximately 1500 words [1]. This makes it nearly impossible to keep up to date. To mitigate this issue, we have created a batch clinical trial summarizer called CliniDigest using GPT-3.5. CliniDigest is, to our knowledge, the first tool able to provide real-time, truthful, and comprehensive summaries of clinical trials. CliniDigest can reduce up to 85 clinical trial descriptions (approximately 10,500 words) into a concise 200-word summary with references and limited hallucinations. We have tested CliniDigest on its ability to summarize 457 trials divided across 27 medical subdomains. For each field, CliniDigest generates summaries of $\mu=153,\ \sigma=69 $ words, each of which utilizes $\mu=54\%,\ \sigma=30\% $ of the sources. A more comprehensive evaluation is planned and outlined in this paper.
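
A minimal sketch of batching trial descriptions into a single GPT-3.5 request using the 2023-era openai-python API; the prompt wording and batching scheme are assumptions, not CliniDigest's actual pipeline:

```python
# Concatenate a batch of trial descriptions with their IDs and request a
# ~200-word summary that cites the IDs, to discourage hallucination.
import openai  # pre-1.0 client style

def summarize_trials(trials: list[dict]) -> str:
    # trials: [{"id": "NCT...", "description": "..."}, ...]
    numbered = "\n\n".join(f"[{t['id']}] {t['description']}" for t in trials)
    prompt = ("Summarize the following clinical trials in at most 200 words. "
              "Cite trial IDs in brackets for every claim.\n\n" + numbered)
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp["choices"][0]["message"]["content"]
```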

A Predictive Model of Digital Information Engagement: Forecasting User Engagement With English Words by Incorporating Cognitive Biases, Computational Linguistics and Natural Language Processing

  • paper_url: http://arxiv.org/abs/2307.14500
  • repo_url: None
  • paper_authors: Nimrod Dvir, Elaine Friedman, Suraj Commuri, Fan yang, Jennifer Romano
  • for: This study introduces a predictive model of digital information engagement (IE), the READ model, which integrates cognitive biases with computational linguistics and natural language processing to provide a multidimensional perspective on information engagement.
  • methods: 50 randomly selected pairs of synonymous words (100 words in total) from the WordNet database had their engagement levels measured through a large-scale online survey (n = 80,500), and the READ attribute values of each word were then computed (a toy predictor is sketched after the abstract).
  • results: The READ model accurately predicts a word's engagement level and distinguishes the more engaging word in a pair of synonyms with 84% accuracy. Its potential extends across domains such as business, education, government, and healthcare, where it could enhance content engagement and inform AI language model development.
    Abstract This study introduces and empirically tests a novel predictive model for digital information engagement (IE) - the READ model, an acronym for the four pivotal attributes of engaging information: Representativeness, Ease-of-use, Affect, and Distribution. Conceptualized within the theoretical framework of Cumulative Prospect Theory, the model integrates key cognitive biases with computational linguistics and natural language processing to develop a multidimensional perspective on information engagement. A rigorous testing protocol was implemented, involving 50 randomly selected pairs of synonymous words (100 words in total) from the WordNet database. These words' engagement levels were evaluated through a large-scale online survey (n = 80,500) to derive empirical IE metrics. The READ attributes for each word were then computed and their predictive efficacy examined. The findings affirm the READ model's robustness, accurately predicting a word's IE level and distinguishing the more engaging word from a pair of synonyms with an 84% accuracy rate. The READ model's potential extends across various domains, including business, education, government, and healthcare, where it could enhance content engagement and inform AI language model development and generative text work. Future research should address the model's scalability and adaptability across different domains and languages, thereby broadening its applicability and efficacy.
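
A toy, heavily hedged sketch of a READ-style predictor; the feature values and the logistic model below are illustrative stand-ins, not the paper's fitted model or data:

```python
# Predict which word of a synonym pair is more engaging from its four READ
# attributes: Representativeness, Ease-of-use, Affect, Distribution.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical attribute rows (R, E, A, D) for four words.
X = np.array([[0.8, 0.9, 0.6, 0.7],
              [0.5, 0.4, 0.5, 0.3],
              [0.7, 0.8, 0.7, 0.6],
              [0.3, 0.5, 0.4, 0.2]])
y = np.array([1, 0, 1, 0])  # 1 = the more engaging word of its pair

model = LogisticRegression().fit(X, y)
print(model.predict_proba([[0.6, 0.7, 0.5, 0.5]])[0, 1])
```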

A Sentence is Worth a Thousand Pictures: Can Large Language Models Understand Human Language?

  • paper_url: http://arxiv.org/abs/2308.00109
  • repo_url: None
  • paper_authors: Gary Marcus, Evelina Leivada, Elliot Murphy
  • for: This paper examines the potential of large language models for language-related tasks and their role in the development of artificial intelligence.
  • methods: The authors analyze large language models, which rely on next-word prediction, weighing their contribution as theoretically informative representations of a target system against their value as atheoretical but powerful mechanistic tools.
  • results: The paper identifies key abilities that are still missing from the current state of development and exploitation of these models, and points to directions for future work.
    Abstract Artificial Intelligence applications show great potential for language-related tasks that rely on next-word prediction. The current generation of large language models have been linked to claims about human-like linguistic performance and their applications are hailed both as a key step towards Artificial General Intelligence and as major advance in understanding the cognitive, and even neural basis of human language. We analyze the contribution of large language models as theoretically informative representations of a target system vs. atheoretical powerful mechanistic tools, and we identify the key abilities that are still missing from the current state of development and exploitation of these models.

Controllable Generation of Dialogue Acts for Dialogue Systems via Few-Shot Response Generation and Ranking

  • paper_url: http://arxiv.org/abs/2307.14440
  • repo_url: https://github.com/aramir62/da-nlg
  • paper_authors: Angela Ramirez, Karik Agarwal, Juraj Juraska, Utkarsh Garg, Marilyn A. Walker
  • for: This paper aims to develop a novel few-shot overgenerate-and-rank approach for controlled generation of dialogue acts (DAs).
  • methods: The proposed approach uses pretrained language models (LLMs) with prompt-based learning, and includes eight few-shot prompt styles and six automatic ranking functions to identify outputs with both correct DA and high semantic accuracy.
  • results: The approach achieves perfect DA accuracy and near perfect semantic accuracy (99.81%) on three domains and four LLMs, outperforming fine-tuned few-shot models trained with 5 to 100 instances per DA.
    Abstract Dialogue systems need to produce responses that realize multiple types of dialogue acts (DAs) with high semantic fidelity. In the past, natural language generators (NLGs) for dialogue were trained on large parallel corpora that map from a domain-specific DA and its semantic attributes to an output utterance. Recent work shows that pretrained language models (LLMs) offer new possibilities for controllable NLG using prompt-based learning. Here we develop a novel few-shot overgenerate-and-rank approach that achieves the controlled generation of DAs. We compare eight few-shot prompt styles that include a novel method of generating from textual pseudo-references using a textual style transfer approach. We develop six automatic ranking functions that identify outputs with both the correct DA and high semantic accuracy at generation time. We test our approach on three domains and four LLMs. To our knowledge, this is the first work on NLG for dialogue that automatically ranks outputs using both DA and attribute accuracy. For completeness, we compare our results to fine-tuned few-shot models trained with 5 to 100 instances per DA. Our results show that several prompt settings achieve perfect DA accuracy, and near perfect semantic accuracy (99.81%) and perform better than few-shot fine-tuning.
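
A minimal sketch of the overgenerate-and-rank loop; generate_fn, da_classifier, and semantic_scorer are assumed components standing in for the paper's prompt styles and six automatic ranking functions:

```python
# Sample several candidate responses, then keep the one with the correct
# dialogue act and the highest semantic-accuracy score.
def overgenerate_and_rank(prompt, generate_fn, da_classifier,
                          semantic_scorer, target_da, n=10):
    candidates = [generate_fn(prompt) for _ in range(n)]
    ranked = sorted(
        candidates,
        key=lambda c: (da_classifier(c) == target_da,  # correct DA first
                       semantic_scorer(c)),            # then semantic score
        reverse=True,
    )
    return ranked[0]
```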

Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models

  • paper_url: http://arxiv.org/abs/2307.14430
  • repo_url: None
  • paper_authors: Mayee F. Chen, Nicholas Roberts, Kush Bhatia, Jue Wang, Ce Zhang, Frederic Sala, Christopher Ré
  • for: This paper studies how to select training data, given a fixed token budget, so that pre-trained large language models (LMs) perform well downstream, building on the idea that LMs learn skills from their training data in a natural order.
  • methods: The authors develop a framework that formalizes the notion of a skill and of an ordered set of skills in terms of the associated data, and propose Skill-It, an online data sampling algorithm over mixtures of skills for both continual pre-training (learning multiple skills) and fine-tuning (learning an individual skill); a toy version of the weighting idea is sketched after the abstract.
  • results: On the LEGO synthetic dataset in the continual pre-training setting, Skill-It obtains 36.5 points higher accuracy than random sampling. On the Natural Instructions dataset in the fine-tuning setting, it reduces the validation loss on the target skill by 13.6% versus training on the target skill's own data. Applied to the RedPajama dataset to continually pre-train a 3B-parameter LM, it achieves higher accuracy on the LM Evaluation Harness with 1B tokens than the baseline of sampling uniformly over data sources with 3B tokens.
    Abstract The quality of training data impacts the performance of pre-trained large language models (LMs). Given a fixed budget of tokens, we study how to best select data that leads to good downstream model performance across tasks. We develop a new framework based on a simple hypothesis: just as humans acquire interdependent skills in a deliberate order, language models also follow a natural order when learning a set of skills from their training data. If such an order exists, it can be utilized for improved understanding of LMs and for data-efficient training. Using this intuition, our framework formalizes the notion of a skill and of an ordered set of skills in terms of the associated data. First, using both synthetic and real data, we demonstrate that these ordered skill sets exist, and that their existence enables more advanced skills to be learned with less data when we train on their prerequisite skills. Second, using our proposed framework, we introduce an online data sampling algorithm, Skill-It, over mixtures of skills for both continual pre-training and fine-tuning regimes, where the objective is to efficiently learn multiple skills in the former and an individual skill in the latter. On the LEGO synthetic in the continual pre-training setting, Skill-It obtains 36.5 points higher accuracy than random sampling. On the Natural Instructions dataset in the fine-tuning setting, Skill-It reduces the validation loss on the target skill by 13.6% versus training on data associated with the target skill itself. We apply our skills framework on the recent RedPajama dataset to continually pre-train a 3B-parameter LM, achieving higher accuracy on the LM Evaluation Harness with 1B tokens than the baseline approach of sampling uniformly over data sources with 3B tokens.
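
A toy sketch in the spirit of loss-driven skill weighting; the exponential update rule and the losses below are assumptions for illustration, not the paper's exact algorithm:

```python
# Shift the training mixture toward skills whose validation loss stays high.
import numpy as np

def skill_mixture(losses, weights, eta=0.5):
    w = weights * np.exp(eta * np.asarray(losses))  # upweight unlearned skills
    return w / w.sum()

weights = np.ones(3) / 3  # three skills, initially uniform
for losses in [[2.0, 0.8, 0.5], [1.8, 0.6, 0.4], [1.5, 0.5, 0.3]]:
    weights = skill_mixture(losses, weights)
print(weights)  # most sampling mass on the hardest skill
```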

Towards Generalist Biomedical AI

  • paper_url: http://arxiv.org/abs/2307.14334
  • repo_url: https://github.com/kyegomez/Med-PaLM
  • paper_authors: Tao Tu, Shekoofeh Azizi, Danny Driess, Mike Schaekermann, Mohamed Amin, Pi-Chuan Chang, Andrew Carroll, Chuck Lau, Ryutaro Tanno, Ira Ktena, Basil Mustafa, Aakanksha Chowdhery, Yun Liu, Simon Kornblith, David Fleet, Philip Mansfield, Sushant Prakash, Renee Wong, Sunny Virmani, Christopher Semturs, S Sara Mahdavi, Bradley Green, Ewa Dominowska, Blaise Aguera y Arcas, Joelle Barral, Dale Webster, Greg S. Corrado, Yossi Matias, Karan Singhal, Pete Florence, Alan Karthikesalingam, Vivek Natarajan
  • for: This paper works toward a generalist multimodal biomedical AI system, with potential applications ranging from scientific discovery to care delivery.
  • methods: The authors first curate MultiMedBench, a new multimodal biomedical benchmark. They then introduce Med-PaLM Multimodal (Med-PaLM M), a proof of concept for a generalist biomedical AI system that flexibly encodes and interprets multimodal biomedical data, including clinical language, imaging, and genomics, with a single set of model weights.
  • results: Med-PaLM M reaches performance competitive with or exceeding the state of the art on all MultiMedBench tasks, often surpassing specialist models by a wide margin. The paper also reports zero-shot generalization to novel medical concepts and tasks, positive transfer learning across tasks, and a radiologist evaluation of model-generated chest X-ray reports in which clinicians preferred Med-PaLM M reports over radiologist-produced ones in up to 40.50% of cases.
    Abstract Medicine is inherently multimodal, with rich data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence (AI) systems that flexibly encode, integrate, and interpret this data at scale can potentially enable impactful applications ranging from scientific discovery to care delivery. To enable the development of these models, we first curate MultiMedBench, a new multimodal biomedical benchmark. MultiMedBench encompasses 14 diverse tasks such as medical question answering, mammography and dermatology image interpretation, radiology report generation and summarization, and genomic variant calling. We then introduce Med-PaLM Multimodal (Med-PaLM M), our proof of concept for a generalist biomedical AI system. Med-PaLM M is a large multimodal generative model that flexibly encodes and interprets biomedical data including clinical language, imaging, and genomics with the same set of model weights. Med-PaLM M reaches performance competitive with or exceeding the state of the art on all MultiMedBench tasks, often surpassing specialist models by a wide margin. We also report examples of zero-shot generalization to novel medical concepts and tasks, positive transfer learning across tasks, and emergent zero-shot medical reasoning. To further probe the capabilities and limitations of Med-PaLM M, we conduct a radiologist evaluation of model-generated (and human) chest X-ray reports and observe encouraging performance across model scales. In a side-by-side ranking on 246 retrospective chest X-rays, clinicians express a pairwise preference for Med-PaLM M reports over those produced by radiologists in up to 40.50% of cases, suggesting potential clinical utility. While considerable work is needed to validate these models in real-world use cases, our results represent a milestone towards the development of generalist biomedical AI systems.

Comparative Analysis of Libraries for the Sentimental Analysis

  • paper_url: http://arxiv.org/abs/2307.14311
  • repo_url: https://github.com/RAJHARINI-KRISHNASAMY/sentimental-analysis-of-news-feeds
  • paper_authors: Wendy Ccoya, Edson Pinto
  • For: The main goal of this study is to provide a comparative analysis of sentiment analysis libraries using machine learning methods.
  • Methods: The study applies sentiment analysis techniques with five Python and R libraries: NLTK, TextBlob, Vader, Transformers (pre-trained GPT and BERT), and Tidytext, together with four machine learning models: Decision Tree (DT), Support Vector Machine (SVM), Naive Bayes (NB), and K-Nearest Neighbor (KNN). Two of the libraries are illustrated in the sketch after the abstract.
  • Results: Evaluated by precision, recall, and F1 score, the BERT transformer method, with an accuracy of 0.973, is recommended for sentiment analysis.
    Abstract This study's main goal is to provide a comparative analysis of libraries using machine learning methods. Experts in natural language processing (NLP) are becoming more and more interested in sentiment analysis (SA) of text. The objective of employing NLP text analysis techniques is to recognize and categorize feelings expressed in Twitter users' utterances. This examination also looks at issues with SA and the libraries utilized, and surveys cooperative methods to classify emotional polarity. The Naive Bayes Classifier, Decision Tree Classifier, Maxent Classifier, Sklearn Classifier, Sklearn Classifier MultinomialNB, and other conjoint learning algorithms are, according to recent research, very effective. The study uses five Python and R libraries, NLTK, TextBlob, Vader, Transformers (GPT and BERT pretrained), and Tidytext, to apply sentiment analysis techniques, together with four machine learning models: Decision Tree (DT), Support Vector Machine (SVM), Naive Bayes (NB), and K-Nearest Neighbor (KNN). A comparative study was also carried out to evaluate how well libraries for SA operate in the social network environment. The measures used to assess the best algorithms in this experiment, which used a single data set for each method, were precision, recall, and F1 score. We conclude that the BERT transformer method, with an accuracy of 0.973, is recommended for sentiment analysis.
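
A minimal sketch comparing two of the surveyed libraries on one sentence; this illustrates the library APIs, not the study's full evaluation protocol:

```python
# VADER (rule-based, via NLTK) versus TextBlob (pattern-based) on one text.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
from textblob import TextBlob

nltk.download("vader_lexicon", quiet=True)

text = "The new release is fast, but the documentation is frustrating."
vader = SentimentIntensityAnalyzer().polarity_scores(text)
blob = TextBlob(text).sentiment

print("VADER compound:", vader["compound"])  # in [-1, 1]
print("TextBlob polarity:", blob.polarity)   # in [-1, 1]
```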

Automatically Evaluating Opinion Prevalence in Opinion Summarization

  • paper_url: http://arxiv.org/abs/2307.14305
  • repo_url: None
  • paper_authors: Christopher Malon
  • for: This paper studies how to automatically evaluate product-review summaries, since a human cannot plausibly remember a large number of reviews and weight opinions representatively when writing a reference summary.
  • methods: An automatic metric for the prevalence of the opinions a summary expresses is proposed, based on counting the number of source reviews consistent with each summary statement while discrediting trivial or redundant statements; several existing methods for scoring the factual consistency of a summary statement against each individual source review are considered (see the sketch after the abstract).
  • results: On a corpus of Amazon product reviews, human-authored summaries have only slightly better opinion prevalence than randomly selected extracts from the source reviews, and previous extractive and abstractive unsupervised methods perform worse than humans. A greedy construction of extractive summaries achieves twice the opinion prevalence of human summaries, and simplifying source reviews raises existing abstractive systems to the level of human performance.
    Abstract When faced with a large number of product reviews, it is not clear that a human can remember all of them and weight opinions representatively to write a good reference summary. We propose an automatic metric to test the prevalence of the opinions that a summary expresses, based on counting the number of reviews that are consistent with each statement in the summary, while discrediting trivial or redundant statements. To formulate this opinion prevalence metric, we consider several existing methods to score the factual consistency of a summary statement with respect to each individual source review. On a corpus of Amazon product reviews, we gather multiple human judgments of the opinion consistency, to determine which automatic metric best expresses consistency in product reviews. Using the resulting opinion prevalence metric, we show that a human authored summary has only slightly better opinion prevalence than randomly selected extracts from the source reviews, and previous extractive and abstractive unsupervised opinion summarization methods perform worse than humans. We demonstrate room for improvement with a greedy construction of extractive summaries with twice the opinion prevalence achieved by humans. Finally, we show that preprocessing source reviews by simplification can raise the opinion prevalence achieved by existing abstractive opinion summarization systems to the level of human performance.
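
A minimal sketch of the counting idea behind such a prevalence metric, using an off-the-shelf MNLI model as the consistency scorer; the model choice and threshold are assumptions, not the method the paper selects:

```python
# A summary statement's prevalence = fraction of source reviews that entail it.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "roberta-large-mnli"
tok = AutoTokenizer.from_pretrained(name)
nli = AutoModelForSequenceClassification.from_pretrained(name).eval()

def opinion_prevalence(statement: str, reviews: list[str],
                       thresh: float = 0.5) -> float:
    supported = 0
    for review in reviews:
        inputs = tok(review, statement, truncation=True, return_tensors="pt")
        with torch.no_grad():
            probs = torch.softmax(nli(**inputs).logits, dim=-1)[0]
        if probs[2].item() > thresh:  # MNLI label 2 = entailment
            supported += 1
    return supported / max(len(reviews), 1)
```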

Founding a mathematical diffusion model in linguistics. The case study of German syntactic features in the North-Eastern Italian dialects

  • paper_url: http://arxiv.org/abs/2307.14291
  • repo_url: None
  • paper_authors: I. Lazzizzera
  • for: To study the spread of Germanic syntactic features into the Romance dialects of North-Eastern Italy.
  • methods: Geographic Data Science tools are used to produce an interactive map representing the local distribution of the German language features.
  • results: The fraction of territory using a given German feature can be expressed as a smooth two-dimensional surface whose evolution is well described by the diffusion-convection equation used in physics for phenomena such as heat diffusion (a generic form is given after the abstract); its solutions, evaluated at the present time, fit the interpolated data well. Schmidt's 'waves' can moreover be counted among the solutions of the diffusion equation, and superimposing them on a 'tidal flooding' can reproduce the complexities of real linguistic diffusion events.
    Abstract We take as a case study the spread of Germanic syntactic features into Romance dialects of North-Eastern Italy, which occurred after the immigration of German people to the Tyrol during the High Middle Ages. An interactive map is produced using tools of what is called Geographic Data Science. A smooth two-dimensional surface $\mathcal{G}$ expresses locally which fraction of territory uses a given German language feature: it is obtained by interpolating a discrete function that says whether at any surveyed locality that feature is used or not. This surface $\mathcal{G}$ is thought of as the value at the present time of a function describing a diffusion-convection phenomenon in two dimensions (here called the 'tidal' mode), which is subjected in a very natural way to the same equation, suitably contextualized, used in physics for a number of phenomenological facts like heat diffusion. It is shown that solutions of this equation, evaluated at the present time, fit well with the data as interpolated by $\mathcal{G}$, thus providing convincing pictures of diffusion-convection of the linguistic features of the case study, albeit with simplifications and approximations. Very importantly, it is shown that Schmidt's 'waves' can be counted among the solutions of the diffusion equation: superimposing Schmidt 'waves' on a 'tidal flooding' can reproduce the complexities of real linguistic diffusion events.
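
For reference, a generic form of the governing equation; the diffusivity D and drift velocity v below are placeholder coefficients, not the paper's fitted values:

```latex
% 2D diffusion-convection (advection) equation for the feature-usage
% surface \mathcal{G}(x, y, t).
\frac{\partial \mathcal{G}}{\partial t}
  = D \,\nabla^{2}\mathcal{G} \;-\; \mathbf{v}\cdot\nabla\mathcal{G}
```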