cs.CL - 2023-07-23

X-CapsNet For Fake News Detection

  • paper_url: http://arxiv.org/abs/2307.12332
  • repo_url: None
  • paper_authors: Mohammad Hadi Goldani, Reza Safabakhsh, Saeedeh Momtazi
  • for: This work aims to reduce the impact of misinformation on social media and web forums on users' decisions by automatically detecting and combating fake news.
  • methods: The paper proposes a transformer-based model called X-CapsNet, which combines a capsule neural network (CapsNet) with a dynamic routing algorithm and a size-based classifier.
  • results: Evaluated on the Covid-19 and Liar datasets, the model outperforms state-of-the-art baselines in F1-score on Covid-19 and in accuracy on Liar.
    Abstract News consumption has significantly increased with the growing popularity and use of web-based forums and social media. This sets the stage for misinforming and confusing people. To help reduce the impact of misinformation on users' potential health-related decisions and other intents, it is desirable to have machine learning models that detect and combat fake news automatically. This paper proposes a novel transformer-based model using Capsule Neural Networks (CapsNet), called X-CapsNet. This model includes a CapsNet with a dynamic routing algorithm paralleled with a size-based classifier for detecting short and long fake news statements. We use two size-based classifiers: a Deep Convolutional Neural Network (DCNN) for detecting long fake news statements and a Multi-Layer Perceptron (MLP) for detecting short news statements. To resolve the problem of representing short news statements, we use indirect features of news created by concatenating the vector of news speaker profiles with a vector of polarity, sentiment, and word counts of news statements. For evaluating the proposed architecture, we use the Covid-19 and the Liar datasets. The results, in terms of the F1-score for the Covid-19 dataset and accuracy for the Liar dataset, show that the model performs better than the state-of-the-art baselines.
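    To make the size-based classification concrete, below is a minimal PyTorch sketch (not the authors' code; the length threshold, layer sizes, and feature layout are assumptions) of how a statement could be routed to a DCNN or MLP head by length, and how the indirect feature vector might be assembled from a speaker profile plus polarity, sentiment, and word-count cues.

```python
import torch
import torch.nn as nn

class SizeBasedClassifier(nn.Module):
    """Hypothetical sketch: long statements go to a DCNN head over token
    embeddings; short statements go to an MLP over pooled embeddings plus
    indirect features (speaker profile + polarity/sentiment/word count)."""

    def __init__(self, embed_dim=300, indirect_dim=16, length_threshold=30):
        super().__init__()
        self.length_threshold = length_threshold  # assumed cutoff, not from the paper
        self.dcnn = nn.Sequential(                # DCNN head for long statements
            nn.Conv1d(embed_dim, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
            nn.Flatten(),
            nn.Linear(128, 2),
        )
        self.mlp = nn.Sequential(                 # MLP head for short statements
            nn.Linear(embed_dim + indirect_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 2),
        )

    @staticmethod
    def indirect_features(speaker_profile, polarity, sentiment, word_count):
        # Concatenate the speaker-profile vector with scalar statement cues
        cues = torch.tensor([polarity, sentiment, float(word_count)])
        return torch.cat([speaker_profile, cues], dim=-1)

    def forward(self, token_embeds, indirect):
        # token_embeds: (batch, seq_len, embed_dim); indirect: (batch, indirect_dim)
        if token_embeds.size(1) >= self.length_threshold:   # long statement -> DCNN
            return self.dcnn(token_embeds.transpose(1, 2))
        pooled = token_embeds.mean(dim=1)                   # short statement -> MLP
        return self.mlp(torch.cat([pooled, indirect], dim=-1))
```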

Milimili. Collecting Parallel Data via Crowdsourcing

  • paper_url: http://arxiv.org/abs/2307.12282
  • repo_url: https://github.com/alantonov/milimili
  • paper_authors: Alexander Antonov
  • for: This work proposes a crowdsourcing approach to collecting and building parallel corpora that is more cost-effective than hiring professional translators.
  • methods: The method uses web platforms to attract volunteer translators for data collection, with machine-learning-based automatic scoring of contributions.
  • results: Experimental parallel data were collected and analyzed for the Chechen-Russian and Fula-English language pairs; the approach yields usable parallel data at lower cost, though quality requires further refinement.
    Abstract We present a methodology for gathering a parallel corpus through crowdsourcing, which is more cost-effective than hiring professional translators, albeit at the expense of quality. Additionally, we have made available experimental parallel data collected for Chechen-Russian and Fula-English language pairs.

Transformer-based Joint Source Channel Coding for Textual Semantic Communication

  • paper_url: http://arxiv.org/abs/2307.12266
  • repo_url: None
  • paper_authors: Shicong Liu, Zhen Gao, Gaojie Chen, Yu Su, Lu Peng
  • for: This paper proposes a textual semantic transmission framework to improve the reliability and efficiency of text transmission in jamming-prone wireless environments.
  • methods: Textual sentences are split into tokens with the wordpiece algorithm and embedded, and a Transformer-based encoder performs semantic extraction. The encoded data are quantized to fixed-length binary sequences, with transmission simulated over binary erasure, symmetric, and deletion channels.
  • results: Simulation results on semantic similarity and BLEU show that the proposed model transmits semantics reliably and efficiently and is robust against jamming in challenging wireless environments.
    Abstract The Space-Air-Ground-Sea integrated network calls for more robust and secure transmission techniques against jamming. In this paper, we propose a textual semantic transmission framework for robust transmission, which utilizes advanced natural language processing techniques to model and encode sentences. Specifically, the textual sentences are first split into tokens using the wordpiece algorithm, and are embedded into token vectors for semantic extraction by a Transformer-based encoder. The encoded data are quantized to a fixed-length binary sequence for transmission, where binary erasure, symmetric, and deletion channels are considered. The received binary sequences are further decoded by the Transformer decoders into tokens used for sentence reconstruction. Our proposed approach leverages the power of neural networks and the attention mechanism to provide reliable and efficient communication of textual data in challenging wireless environments, and simulation results on semantic similarity and bilingual evaluation understudy prove the superiority of the proposed model in semantic transmission.
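    As a concrete illustration of the channel models the paper considers, here is a small NumPy sketch (corruption probabilities and the erasure marker are assumptions) that corrupts a quantized bit sequence with binary symmetric, erasure, and deletion noise; the Transformer decoder would be trained to reconstruct tokens from such corrupted sequences.

```python
import numpy as np

rng = np.random.default_rng(0)

def binary_symmetric(bits, flip_p=0.05):
    """Flip each bit independently with probability flip_p."""
    flips = rng.random(bits.shape) < flip_p
    return np.where(flips, 1 - bits, bits)

def binary_erasure(bits, erase_p=0.05, erasure=-1):
    """Replace bits with an erasure marker (marker value is an assumption)."""
    erased = rng.random(bits.shape) < erase_p
    return np.where(erased, erasure, bits)

def binary_deletion(bits, delete_p=0.05):
    """Drop bits independently; the received sequence is shorter."""
    keep = rng.random(bits.shape) >= delete_p
    return bits[keep]

# 1-bit sign quantization of a hypothetical encoder output to a fixed length
encoded = rng.standard_normal(128)
bits = (encoded > 0).astype(np.int8)
received = binary_symmetric(bits)
```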

A meta learning scheme for fast accent domain expansion in Mandarin speech recognition

  • paper_url: http://arxiv.org/abs/2307.12262
  • repo_url: None
  • paper_authors: Ziwei Zhu, Changhao Shan, Bihong Zhang, Jian Yu
  • for: This paper addresses accent domain expansion in Mandarin automatic speech recognition (ASR).
  • methods: It applies meta-learning combined with freezing of model parameters to achieve fast accent domain expansion for Mandarin ASR.
  • results: The approach gives a roughly 3% relative improvement over other methods on the accent domain expansion task, a 37% relative improvement over the baseline with the Mandarin test set unchanged, and a 4% relative improvement on the accent test set when applied to a large amount of data.
    Abstract Spoken languages show significant variation between Mandarin and accented speech. Despite the high performance of Mandarin automatic speech recognition (ASR), accent ASR is still a challenging task. In this paper, we introduce meta-learning techniques for fast accent domain expansion in Mandarin speech recognition, which expand the range of accents without deteriorating the performance of Mandarin ASR. Meta-learning, or learning to learn, can learn general relations across multiple domains rather than over-fitting to a specific domain, so we adopt meta-learning for the domain expansion task. This more general learning leads to improved performance on accent domain extension tasks. We combine meta-learning with freezing of model parameters, which makes recognition performance more stable across cases and speeds up training by about 20%. Our approach outperforms other methods by about 3% relative on the accent domain expansion task. Compared to the baseline model, it improves by 37% relative under the condition that the Mandarin test set remains unchanged. In addition, the method also proves effective on a large amount of data, with a relative performance improvement of 4% on the accent test set.
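    The paper combines meta-learning with parameter freezing but does not spell out the exact algorithm; the sketch below assumes a first-order (Reptile-style) meta-update across accent domains, with lower encoder layers frozen via a name prefix, an HF-style model that returns a loss, and hypothetical hyperparameters.

```python
import copy
import torch

def meta_accent_update(model, accent_tasks, inner_steps=3,
                       inner_lr=1e-3, meta_lr=0.1, freeze_prefix="encoder."):
    """Reptile-style meta-update across accent domains with frozen lower layers."""
    for name, p in model.named_parameters():
        if name.startswith(freeze_prefix):        # freeze_prefix is an assumption
            p.requires_grad = False

    meta_state = copy.deepcopy(model.state_dict())
    for task_batches in accent_tasks:             # one task = one accent domain
        model.load_state_dict(meta_state)
        opt = torch.optim.SGD(
            (p for p in model.parameters() if p.requires_grad), lr=inner_lr)
        for batch in task_batches[:inner_steps]:  # a few inner adaptation steps
            loss = model(**batch).loss            # assumes the model returns a loss
            opt.zero_grad()
            loss.backward()
            opt.step()
        # Pull the meta-weights toward the task-adapted weights
        for k, v in model.state_dict().items():
            if torch.is_floating_point(v):
                meta_state[k] = meta_state[k] + meta_lr * (v - meta_state[k])
    model.load_state_dict(meta_state)
```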

MyVoice: Arabic Speech Resource Collaboration Platform

  • paper_url: http://arxiv.org/abs/2308.02503
  • repo_url: None
  • paper_authors: Yousseif Elshahawy, Yassine El Kheir, Shammur Absar Chowdhury, Ahmed Ali
  • for: Advancing research on Arabic dialectal speech technologies.
  • methods: A crowdsourcing platform that collects Arabic speech recordings from contributors and makes the resulting datasets publicly available.
  • results: The platform enables the creation of large dialectal speech datasets, lets users switch between contributor and annotator roles, and includes a quality-assurance and feedback system to safeguard data quality.
    Abstract We introduce MyVoice, a crowdsourcing platform designed to collect Arabic speech to enhance dialectal speech technologies. This platform offers an opportunity to design large dialectal speech datasets; and makes them publicly available. MyVoice allows contributors to select city/country-level fine-grained dialect and record the displayed utterances. Users can switch roles between contributors and annotators. The platform incorporates a quality assurance system that filters out low-quality and spurious recordings before sending them for validation. During the validation phase, contributors can assess the quality of recordings, annotate them, and provide feedback which is then reviewed by administrators. Furthermore, the platform offers flexibility to admin roles to add new data or tasks beyond dialectal speech and word collection, which are displayed to contributors. Thus, enabling collaborative efforts in gathering diverse and large Arabic speech data.

Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation

  • paper_url: http://arxiv.org/abs/2307.12231
  • repo_url: None
  • paper_authors: Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, Shinji Watanabe
  • for: This paper builds a multi-speaker automatic speech recognition system based on self-supervised learning representations.
  • methods: It explores multi-channel separation methods, mask-based beamforming, and complex spectral mapping, along with the best features to use in the ASR back-end model.
  • results: Using recent self-supervised learning representations (SSLR) improves recognition over filterbank features; with a carefully designed training strategy for integrating separation and recognition, the combination of TF-GridNet-based complex spectral mapping and WavLM-based SSLR achieves a 2.5% word error rate on the reverberant WHAMR! test set, significantly outperforming an existing mask-based MVDR beamforming and filterbank integration (28.9%).
    Abstract Neural speech separation has made remarkable progress and its integration with automatic speech recognition (ASR) is an important direction towards realizing multi-speaker ASR. This work provides an insightful investigation of speech separation in reverberant and noisy-reverberant scenarios as an ASR front-end. In detail, we explore multi-channel separation methods, mask-based beamforming and complex spectral mapping, as well as the best features to use in the ASR back-end model. We employ the recent self-supervised learning representation (SSLR) as a feature and improve the recognition performance from the case with filterbank features. To further improve multi-speaker recognition performance, we present a carefully designed training strategy for integrating speech separation and recognition with SSLR. The proposed integration using TF-GridNet-based complex spectral mapping and WavLM-based SSLR achieves a 2.5% word error rate in reverberant WHAMR! test set, significantly outperforming an existing mask-based MVDR beamforming and filterbank integration (28.9%).
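    As a rough illustration of using WavLM-based SSLR as ASR front-end features, here is a sketch with the Hugging Face transformers library (the checkpoint name and the plain layer average are assumptions; SSLR pipelines often learn a weighted sum over layers instead).

```python
import torch
from transformers import AutoFeatureExtractor, WavLMModel

# Checkpoint name is an assumption; the paper does not name a specific one.
extractor = AutoFeatureExtractor.from_pretrained("microsoft/wavlm-large")
wavlm = WavLMModel.from_pretrained("microsoft/wavlm-large").eval()

def sslr_features(separated_waveform, sr=16000):
    """Extract SSLR features from one separated speaker stream for the ASR back-end."""
    inputs = extractor(separated_waveform, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        out = wavlm(**inputs, output_hidden_states=True)
    # Average the hidden layers as a stand-in for a learned layer-weighted sum
    return torch.stack(out.hidden_states).mean(dim=0)   # (batch, frames, dim)
```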

Identifying Misinformation on YouTube through Transcript Contextual Analysis with Transformer Models

  • paper_url: http://arxiv.org/abs/2307.12155
  • repo_url: https://github.com/christoschr97/misinf-detection-llms
  • paper_authors: Christos Christodoulou, Nikos Salamanos, Pantelitsa Leonidou, Michail Papadakis, Michael Sirivianos
  • for: This work proposes a novel video classification approach for determining the veracity of video content.
  • methods: It converts the conventional video classification task into a text classification task over video transcripts, applying advanced machine learning techniques such as transfer learning (fine-tuning BERT, RoBERTa, and ELECTRA) and few-shot learning (sentence-transformers MPNet and RoBERTa-large).
  • results: Evaluated on three datasets (YouTube vaccine-misinformation videos, YouTube pseudoscience videos, and a fake-news article collection), the fine-tuned models achieve Matthews Correlation Coefficient > 0.81, accuracy > 0.90, and F1 score > 0.90 on two of the three datasets, while the few-shot models outperform the fine-tuned ones by 20% in both accuracy and F1 score on the YouTube pseudoscience dataset.
    Abstract Misinformation on YouTube is a significant concern, necessitating robust detection strategies. In this paper, we introduce a novel methodology for video classification, focusing on the veracity of the content. We convert the conventional video classification task into a text classification task by leveraging the textual content derived from the video transcripts. We employ advanced machine learning techniques like transfer learning to solve the classification challenge. Our approach incorporates two forms of transfer learning: (a) fine-tuning base transformer models such as BERT, RoBERTa, and ELECTRA, and (b) few-shot learning using sentence-transformers MPNet and RoBERTa-large. We apply the trained models to three datasets: (a) YouTube Vaccine-misinformation related videos, (b) YouTube Pseudoscience videos, and (c) Fake-News dataset (a collection of articles). Including the Fake-News dataset extended the evaluation of our approach beyond YouTube videos. Using these datasets, we evaluated the models distinguishing valid information from misinformation. The fine-tuned models yielded Matthews Correlation Coefficient>0.81, accuracy>0.90, and F1 score>0.90 in two of three datasets. Interestingly, the few-shot models outperformed the fine-tuned ones by 20% in both Accuracy and F1 score for the YouTube Pseudoscience dataset, highlighting the potential utility of this approach -- especially in the context of limited training data.
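    To illustrate few-shot transcript classification with sentence-transformers, here is a minimal sketch: embed a handful of labeled transcripts with MPNet and classify new ones by cosine similarity to class centroids. The centroid scheme and example texts are illustrative assumptions, not necessarily the paper's exact classifier.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")

support_texts = ["vaccines cause ...", "clinical trials showed ..."]  # toy examples
support_labels = np.array([1, 0])  # 1 = misinformation, 0 = valid

# Normalized embeddings make dot products equal to cosine similarity
emb = model.encode(support_texts, normalize_embeddings=True)
centroids = np.stack([emb[support_labels == c].mean(axis=0) for c in (0, 1)])

def classify(transcript: str) -> int:
    q = model.encode([transcript], normalize_embeddings=True)[0]
    return int(np.argmax(centroids @ q))  # index of the nearest class centroid
```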

Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding

  • paper_url: http://arxiv.org/abs/2307.12134
  • repo_url: None
  • paper_authors: Suyoun Kim, Akshat Shrivastava, Duc Le, Ju Lin, Ozlem Kalinli, Michael L. Seltzer
  • for: Improving the robustness of end-to-end spoken language understanding (SLU) systems to ASR transcription errors.
  • methods: A novel E2E SLU system that fuses audio and text representations based on the estimated modality confidence of ASR hypotheses, introducing two new techniques: 1) an effective method to encode the quality of ASR hypotheses, and 2) an effective approach to integrate it into the E2E SLU model.
  • results: Accuracy improvements on the STOP dataset, with analysis demonstrating the effectiveness of the approach.
    Abstract End-to-end (E2E) spoken language understanding (SLU) systems that generate a semantic parse from speech have become more promising recently. This approach uses a single model that utilizes audio and text representations from pre-trained speech recognition models (ASR), and outperforms traditional pipeline SLU systems in on-device streaming scenarios. However, E2E SLU systems still show weakness when text representation quality is low due to ASR transcription errors. To overcome this issue, we propose a novel E2E SLU system that enhances robustness to ASR errors by fusing audio and text representations based on the estimated modality confidence of ASR hypotheses. We introduce two novel techniques: 1) an effective method to encode the quality of ASR hypotheses and 2) an effective approach to integrate them into E2E SLU models. We show accuracy improvements on STOP dataset and share the analysis to demonstrate the effectiveness of our approach.
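    One simple way to picture confidence-aware fusion is a gate computed from an ASR-quality score that weights the text representation against the audio representation; the sketch below assumes this gating form and the dimensions, which the paper does not specify.

```python
import torch
import torch.nn as nn

class ConfidenceGatedFusion(nn.Module):
    """Sketch: gate text vs. audio representations by estimated ASR confidence."""

    def __init__(self, dim=512):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(1, dim), nn.Sigmoid())

    def forward(self, audio_repr, text_repr, asr_confidence):
        # asr_confidence: (batch, 1) estimated quality of the ASR hypothesis
        g = self.gate(asr_confidence)            # (batch, dim), values in (0, 1)
        # Trust the text more when the ASR hypothesis looks reliable
        return g * text_repr + (1 - g) * audio_repr
```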

Explainable Topic-Enhanced Argument Mining from Heterogeneous Sources

  • paper_url: http://arxiv.org/abs/2307.12131
  • repo_url: None
  • paper_authors: Jiasheng Si, Yingjie Zhu, Xingyu Shi, Deyu Zhou, Yulan He
  • for: This paper proposes a novel explainable topic-enhanced argument mining approach to improve the accuracy and effectiveness of argument mining.
  • methods: Using a neural topic model and a language model, the target information is augmented with explainable topic representations, and sentence-level topic information within arguments is captured by minimizing, through mutual learning, the distance between each sentence's latent topic distribution and its semantic representation.
  • results: Experiments on the benchmark dataset, in both in-target and cross-target settings, show clear superiority over state-of-the-art baselines.
    Abstract Given a controversial target such as "nuclear energy", argument mining aims to identify the argumentative text from heterogeneous sources. Current approaches focus on exploring better ways of integrating the target-associated semantic information with the argumentative text. Despite their empirical successes, two issues remain unsolved: (i) a target is represented by a word or a phrase, which is insufficient to cover a diverse set of target-related subtopics; (ii) the sentence-level topic information within an argument, which we believe is crucial for argument mining, is ignored. To tackle the above issues, we propose a novel explainable topic-enhanced argument mining approach. Specifically, with the use of the neural topic model and the language model, the target information is augmented by explainable topic representations. Moreover, the sentence-level topic information within the argument is captured by minimizing the distance between its latent topic distribution and its semantic representation through mutual learning. Experiments have been conducted on the benchmark dataset in both the in-target setting and the cross-target setting. Results demonstrate the superiority of the proposed model against the state-of-the-art baselines.
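    The mutual-learning objective can be sketched as projecting the sentence's semantic representation onto the topic simplex and penalizing its distance to the neural topic model's latent topic distribution; the KL form, hidden size, and topic count below are assumptions, since the paper only states that a distance between the two is minimized.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

proj = nn.Linear(768, 50)   # hidden size and topic count are assumptions

def topic_semantic_alignment_loss(topic_dist, sem_repr):
    """Penalize divergence between predicted and latent topic distributions."""
    # Project the semantic representation onto the topic simplex (log-probs)
    pred_topics = F.log_softmax(proj(sem_repr), dim=-1)   # (batch, n_topics)
    return F.kl_div(pred_topics, topic_dist, reduction="batchmean")

# Usage: total_loss = cls_loss + alpha * topic_semantic_alignment_loss(theta, h)
```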