cs.CL - 2023-11-18

Experts-in-the-Loop: Establishing an Effective Workflow in Crafting Privacy Q&A

  • paper_url: http://arxiv.org/abs/2311.11161
  • repo_url: None
  • paper_authors: Zahra Kolagar, Anna Katharina Leschanowsky, Birgit Popp
  • for: Safeguarding user privacy, as legal jurisdictions worldwide emphasize the need for transparent data processing.
  • methods: Proposes a dynamic workflow for transforming privacy policies into privacy Q&A pairs, making privacy policies accessible through conversational AI.
  • results: Facilitates interdisciplinary collaboration between legal experts and conversation designers, while also considering the generative capabilities of large language models and the associated challenges.
    Abstract Privacy policies play a vital role in safeguarding user privacy as legal jurisdictions worldwide emphasize the need for transparent data processing. While the suitability of privacy policies to enhance transparency has been critically discussed, employing conversational AI systems presents unique challenges in informing users effectively. In this position paper, we propose a dynamic workflow for transforming privacy policies into privacy question-and-answer (Q&A) pairs to make privacy policies easily accessible through conversational AI. Thereby, we facilitate interdisciplinary collaboration among legal experts and conversation designers, while also considering the utilization of large language models' generative capabilities and addressing associated challenges. Our proposed workflow underscores continuous improvement and monitoring throughout the construction of privacy Q&As, advocating for comprehensive review and refinement through an experts-in-the-loop approach.

Evaluating the Inclusiveness of Artificial Intelligence Software in Enhancing Project Management Efficiency – A Review

  • paper_url: http://arxiv.org/abs/2311.11159
  • repo_url: None
  • paper_authors: Vasileios Alevizos, Ilias Georgousis, Akebu Simasiku, Sotiria Karypidou, Antonis Messinis
  • for: This study examines the relationship between inclusiveness and efficiency in technology-integrated project management, and how technology can be used to improve project outcomes.
  • methods: The study develops approaches for defining and measuring inclusiveness, and assesses the transformative potential of these technologies.
  • results: The study finds that, with technological support, inclusiveness in project management can be significantly improved, though the feasibility of the technology and ethical considerations must be kept in mind.
    Abstract The rise of advanced technology in project management (PM) highlights a crucial need for inclusiveness. This work examines the enhancement of both inclusivity and efficiency in PM through technological integration, focusing on defining and measuring inclusiveness. This approach illuminates how inclusivity-centered technology can significantly elevate project outcomes. The research navigates through the challenges of achieving inclusivity, mainly biases in learning databases and the design process of these technologies, assessment of transformative potential of these technologies, particularly in automating tasks like data collection and analysis, thus enabling managers to prioritize human-centric aspects of projects. However, the integration of such technology transcends efficiency, indicating a paradigm shift in understanding their societal roles. This shift necessitates a new approach in the development of these systems to prevent perpetuating social inequalities. We proposed a methodology involving criteria development for evaluating the inclusiveness and effectiveness of these technologies. This methodical approach is vital to comprehensively address the challenges and limitations inherent in these systems. Emphasizing the importance of inclusivity, the study advocates for a balance between technological advancement and ethical considerations, calling for a holistic understanding and regulation. In conclusion, the paper underscores that while these technologies can significantly improve outcomes, their mindful integration, ensuring inclusivity, is paramount. This exploration into the ethical and practical aspects of technology in PM contributes to a more informed and balanced approach within the field.

Vashantor: A Large-scale Multilingual Benchmark Dataset for Automated Translation of Bangla Regional Dialects to Bangla Language

  • paper_url: http://arxiv.org/abs/2311.11142
  • repo_url: None
  • paper_authors: Fatema Tuj Johora Faria, Mukaffi Bin Moin, Ahmed Al Wase, Mehidi Ahmmed, Md. Rabius Sani, Tashreef Muhammad
  • for: This study aims to fill the gap left by prior research by translating Bangla regional dialects into standard Bangla.
  • methods: The study uses mT5 and BanglaT5 models to translate regional dialects into standard Bangla, and mBERT and Bangla-bert-base to accurately detect the region a dialect originates from.
  • results: Experiments show a BLEU score of 69.06 for the Mymensingh regional dialect and 36.75 for the Chittagong regional dialect, with average word error rates of 0.1548 (Mymensingh) and 0.3385 (Chittagong). Region detection reaches accuracies of 85.86% and 84.36%. This is the first large-scale investigation of Bangla regional dialect to Bangla machine translation.
    Abstract The Bangla linguistic variety is a fascinating mix of regional dialects that adds to the cultural diversity of the Bangla-speaking community. Despite extensive study into translating Bangla to English, English to Bangla, and Banglish to Bangla in the past, there has been a noticeable gap in translating Bangla regional dialects into standard Bangla. In this study, we set out to fill this gap by creating a collection of 32,500 sentences, encompassing Bangla, Banglish, and English, representing five regional Bangla dialects. Our aim is to translate these regional dialects into standard Bangla and detect regions accurately. To achieve this, we proposed models known as mT5 and BanglaT5 for translating regional dialects into standard Bangla. Additionally, we employed mBERT and Bangla-bert-base to determine the specific regions from where these dialects originated. Our experimental results showed the highest BLEU score of 69.06 for Mymensingh regional dialects and the lowest BLEU score of 36.75 for Chittagong regional dialects. We also observed the lowest average word error rate of 0.1548 for Mymensingh regional dialects and the highest of 0.3385 for Chittagong regional dialects. For region detection, we achieved an accuracy of 85.86% for Bangla-bert-base and 84.36% for mBERT. This is the first large-scale investigation of Bangla regional dialects to Bangla machine translation. We believe our findings will not only pave the way for future work on Bangla regional dialects to Bangla machine translation, but will also be useful in solving similar language-related challenges in low-resource language conditions.
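The word error rates reported above are conventionally computed as the word-level edit distance between hypothesis and reference, divided by the reference length. A minimal sketch of that metric (not the authors' evaluation code):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for edit distance over word tokens.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, one substituted word out of four reference words gives a WER of 0.25.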

(Why) Is My Prompt Getting Worse? Rethinking Regression Testing for Evolving LLM APIs

  • paper_url: http://arxiv.org/abs/2311.11123
  • repo_url: None
  • paper_authors: Wanqin Ma, Chenyang Yang, Christian Kästner
  • for: This study examines how to regression test continuously evolving LLM APIs, addressing problems application developers encounter when building on LLMs.
  • methods: The study takes a case-study approach, examining in practice how updates and deprecation of LLM APIs affect downstream applications.
  • results: The study finds that traditional testing approaches are insufficient for regression testing LLMs, owing to different correctness notions, prompt brittleness, and non-determinism in LLM APIs.
    Abstract Large Language Models (LLMs) are increasingly integrated into software applications. Downstream application developers often access LLMs through APIs provided as a service. However, LLM APIs are often updated silently and scheduled to be deprecated, forcing users to continuously adapt to evolving models. This can cause performance regression and affect prompt design choices, as evidenced by our case study on toxicity detection. Based on our case study, we emphasize the need for and re-examine the concept of regression testing for evolving LLM APIs. We argue that regression testing LLMs requires fundamental changes to traditional testing approaches, due to different correctness notions, prompting brittleness, and non-determinism in LLM APIs.
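One way to regression test an evolving LLM API, in the spirit of the paper's toxicity-detection case study, is to re-run a fixed prompt suite against the current model and compare its agreement with cached baseline labels to a threshold, since exact-match assertions fail under non-determinism. The harness below is a hypothetical sketch, not the authors' setup; `fake_api` stands in for a real LLM API call:

```python
def regression_check(prompts, expected, llm_classify, min_agreement=0.9):
    """Flag a regression when agreement with cached baseline labels drops
    below a threshold, rather than requiring exact output equality
    (LLM outputs are non-deterministic and correctness is fuzzy)."""
    hits = sum(llm_classify(p) == y for p, y in zip(prompts, expected))
    agreement = hits / len(prompts)
    return agreement >= min_agreement, agreement

# Toy stand-in for the API: a fixed keyword rule instead of a model call.
toxic_words = {"idiot", "stupid"}
fake_api = lambda text: any(w in text.lower() for w in toxic_words)

ok, score = regression_check(
    ["you idiot", "have a nice day", "stupid plan", "hello"],
    [True, False, True, False],
    fake_api,
)
```

In practice the cached labels would come from the API version the application was built against, and the threshold absorbs benign run-to-run variation while still catching silent model updates.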

Responsible AI Considerations in Text Summarization Research: A Review of Current Practices

  • paper_url: http://arxiv.org/abs/2311.11103
  • repo_url: None
  • paper_authors: Yu Lu Liu, Meng Cao, Su Lin Blodgett, Jackie Chi Kit Cheung, Alexandra Olteanu, Adam Trischler
  • for: This study examines possible responsible AI issues in text summarization, including potential risks and impacts, and whether researchers consider the relevant stakeholders.
  • methods: The study conducts a multi-round qualitative analysis of 333 text summarization papers from the ACL Anthology, examining responsible AI issues in research and reporting practices.
  • results: The study finds that most papers do not engage with possible stakeholders or downstream impacts, limiting consideration of potential adverse effects. Based on these findings, concrete practices and research directions are recommended to advance responsible AI in text summarization.
    Abstract AI and NLP publication venues have increasingly encouraged researchers to reflect on possible ethical considerations, adverse impacts, and other responsible AI issues their work might engender. However, for specific NLP tasks our understanding of how prevalent such issues are, or when and why these issues are likely to arise, remains limited. Focusing on text summarization -- a common NLP task largely overlooked by the responsible AI community -- we examine research and reporting practices in the current literature. We conduct a multi-round qualitative analysis of 333 summarization papers from the ACL Anthology published between 2020-2022. We focus on how, which, and when responsible AI issues are covered, which relevant stakeholders are considered, and mismatches between stated and realized research goals. We also discuss current evaluation practices and consider how authors discuss the limitations of both prior work and their own work. Overall, we find that relatively few papers engage with possible stakeholders or contexts of use, which limits their consideration of potential downstream adverse impacts or other responsible AI issues. Based on our findings, we make recommendations on concrete practices and research directions.

Bit Cipher – A Simple yet Powerful Word Representation System that Integrates Efficiently with Language Models

  • paper_url: http://arxiv.org/abs/2311.11012
  • repo_url: None
  • paper_authors: Haoran Zhao, Jake Ryland Williams
  • for: This study proposes a new word representation system to improve the computational efficiency and interpretability of word vectors.
  • methods: The study introduces the Bit-cipher algorithm, which trains word vectors without backpropagation, leveraging contextual information and highly efficient dimensionality-reduction techniques based on unigram frequency to provide strong interpretability.
  • results: The Bit-cipher algorithm trains efficient word vectors quickly and achieves competitive performance across text tasks. Moreover, replacing embedding layers with cipher embeddings in LM training and fine-tuning accelerates training and reaches better optima.
    Abstract While Large Language Models (LLMs) become ever more dominant, classic pre-trained word embeddings sustain their relevance through computational efficiency and nuanced linguistic interpretation. Drawing from recent studies demonstrating that the convergence of GloVe and word2vec optimizations all tend towards log-co-occurrence matrix variants, we construct a novel word representation system called Bit-cipher that eliminates the need of backpropagation while leveraging contextual information and hyper-efficient dimensionality reduction techniques based on unigram frequency, providing strong interpretability, alongside efficiency. We use the bit-cipher algorithm to train word vectors via a two-step process that critically relies on a hyperparameter -- bits -- that controls the vector dimension. While the first step trains the bit-cipher, the second utilizes it under two different aggregation modes -- summation or concatenation -- to produce contextually rich representations from word co-occurrences. We extend our investigation into bit-cipher's efficacy, performing probing experiments on part-of-speech (POS) tagging and named entity recognition (NER) to assess its competitiveness with classic embeddings like word2vec and GloVe. Additionally, we explore its applicability in LM training and fine-tuning. By replacing embedding layers with cipher embeddings, our experiments illustrate the notable efficiency of cipher in accelerating the training process and attaining better optima compared to conventional training paradigms. Experiments on the integration of bit-cipher embedding layers with Roberta, T5, and OPT, prior to or as a substitute for fine-tuning, showcase a promising enhancement to transfer learning, allowing rapid model convergence while preserving competitive performance.
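The abstract's starting point, that GloVe- and word2vec-style optimizations converge toward variants of the log-co-occurrence matrix, can be illustrated with a backpropagation-free toy: build a word-by-context count matrix whose columns are the most frequent context words (a unigram-frequency-based dimensionality reduction) and take log(1 + count) rows as vectors. This is a generic sketch of that idea, not the Bit-cipher algorithm itself, whose bits hyperparameter and aggregation modes are specific to the paper:

```python
import math
from collections import Counter, defaultdict

def log_cooccurrence_vectors(tokens, window=2, dim=None):
    """Backprop-free word vectors: row w holds log(1 + co-occurrence count)
    with columns restricted to the `dim` most frequent context words."""
    freq = Counter(tokens)
    # Context vocabulary ordered by unigram frequency (ties alphabetical).
    contexts = sorted(freq, key=lambda w: (-freq[w], w))[:dim]
    col = {w: i for i, w in enumerate(contexts)}
    counts = defaultdict(Counter)
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j and tokens[j] in col:
                counts[w][tokens[j]] += 1
    return {w: [math.log1p(c[ctx]) for ctx in contexts] for w, c in counts.items()}
```

The vector dimension is controlled directly by `dim`, loosely mirroring how a hyperparameter (bits, in the paper) would trade off compactness against fidelity.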

Joyful: Joint Modality Fusion and Graph Contrastive Learning for Multimodal Emotion Recognition

  • paper_url: http://arxiv.org/abs/2311.11009
  • repo_url: None
  • paper_authors: Dongyuan Li, Yusong Wang, Kotaro Funakoshi, Manabu Okumura
  • for: This paper targets multimodal emotion recognition, which aims to recognize the emotion of each utterance across multiple modalities and has important applications in human-machine interaction.
  • methods: The paper proposes Joyful, a joint modality fusion and graph contrastive learning method in which multimodal fusion, contrastive learning, and emotion recognition are jointly optimized. Specifically, a new multimodal fusion mechanism provides deep interaction and fusion between global contextual and uni-modal specific features, and a graph contrastive learning framework with inter-view and intra-view contrastive losses learns more distinguishable representations for samples with different sentiments.
  • results: On three benchmark datasets, Joyful achieves state-of-the-art (SOTA) performance compared to all baselines.
    Abstract Multimodal emotion recognition aims to recognize emotions for each utterance of multiple modalities, which has received increasing attention for its application in human-machine interaction. Current graph-based methods fail to simultaneously depict global contextual features and local diverse uni-modal features in a dialogue. Furthermore, with the number of graph layers increasing, they easily fall into over-smoothing. In this paper, we propose a method for joint modality fusion and graph contrastive learning for multimodal emotion recognition (Joyful), where multimodality fusion, contrastive learning, and emotion recognition are jointly optimized. Specifically, we first design a new multimodal fusion mechanism that can provide deep interaction and fusion between the global contextual and uni-modal specific features. Then, we introduce a graph contrastive learning framework with inter-view and intra-view contrastive losses to learn more distinguishable representations for samples with different sentiments. Extensive experiments on three benchmark datasets indicate that Joyful achieved state-of-the-art (SOTA) performance compared to all baselines.

Gendec: A Machine Learning-based Framework for Gender Detection from Japanese Names

  • paper_url: http://arxiv.org/abs/2311.11001
  • repo_url: None
  • paper_authors: Duong Tien Pham, Luan Thanh Nguyen
  • for: This study builds a dataset for gender detection from Japanese names, together with a detection framework that combines multiple approaches, to better understand the gender information conveyed by Japanese names.
  • methods: The study uses multiple approaches, including traditional machine learning techniques and recent transfer learning models, to detect the gender associated with Japanese names.
  • results: The proposed framework is expected to predict the gender associated with Japanese names accurately and to serve applications in various domains.
    Abstract Every human has their own name, a fundamental aspect of their identity and cultural heritage. The name often conveys a wealth of information, including details about an individual's background, ethnicity, and, especially, their gender. By detecting gender through the analysis of names, researchers can unlock valuable insights into linguistic patterns and cultural norms, which can be applied to practical applications. Hence, this work presents a novel dataset for Japanese name gender detection comprising 64,139 full names in romaji, hiragana, and kanji forms, along with their biological genders. Moreover, we propose Gendec, a framework for gender detection from Japanese names that leverages diverse approaches, including traditional machine learning techniques or cutting-edge transfer learning models, to predict the gender associated with Japanese names accurately. Through a thorough investigation, the proposed framework is expected to be effective and serve potential applications in various domains.
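As a toy illustration of the traditional machine-learning side of such a framework, a character n-gram Naive Bayes classifier over romanized names can be sketched as follows; the names, labels, and helper functions are hypothetical examples, not drawn from the paper's dataset or code:

```python
import math
from collections import Counter

def char_ngrams(name, n=2):
    s = f"^{name.lower()}$"  # boundary markers capture prefixes/suffixes
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def train_nb(samples):
    """samples: list of (name, label). Returns per-label n-gram counts."""
    counts, totals = {}, Counter()
    for name, label in samples:
        grams = char_ngrams(name)
        counts.setdefault(label, Counter()).update(grams)
        totals[label] += len(grams)
    return counts, totals

def predict(name, counts, totals):
    vocab = {g for c in counts.values() for g in c}
    best, best_lp = None, -math.inf
    for label, c in counts.items():
        lp = math.log(totals[label] / sum(totals.values()))  # prior
        for g in char_ngrams(name):  # add-one smoothed likelihood
            lp += math.log((c[g] + 1) / (totals[label] + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

toy = [("haruko", "F"), ("sakiko", "F"), ("yumiko", "F"),
       ("hiroshi", "M"), ("takashi", "M"), ("satoshi", "M")]
model = train_nb(toy)
```

Suffix n-grams such as `ko$` or `shi$` carry most of the signal in this toy, which is why boundary markers matter for name classification.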

Behavior Optimized Image Generation

  • paper_url: http://arxiv.org/abs/2311.10995
  • repo_url: https://github.com/chrisneagu/FTC-Skystone-Dark-Angels-Romania-2020
  • paper_authors: Varun Khurana, Yaman K Singla, Jayakumar Subramanian, Rajiv Ratn Shah, Changyou Chen, Zhiqiang Xu, Balaji Krishnamurthy
  • for: This paper asks how knowledge of a target behavior can be infused into the image generation process itself, to create images that are not only better-looking but also better-performing.
  • methods: The paper proposes BoigLLM, a language model that understands both image content and user behavior; BoigLLM knows how an image should look to achieve a required KPI. The authors also train a diffusion-based model (BoigSD) to align with a BoigLLM-defined reward.
  • results: BoigLLM outperforms 13x larger models such as GPT-3.5 and GPT-4 on this task, showing that while these state-of-the-art models can understand images, they lack information on how images perform in the real world. The authors also release the BoigBench dataset to support further research on behavior-conditioned image generation and understanding.
    Abstract The last few years have witnessed great success on image generation, which has crossed the acceptance thresholds of aesthetics, making it directly applicable to personal and commercial applications. However, images, especially in marketing and advertising applications, are often created as a means to an end as opposed to just aesthetic concerns. The goal can be increasing sales, getting more clicks, likes, or image sales (in the case of stock businesses). Therefore, the generated images need to perform well on these key performance indicators (KPIs), in addition to being aesthetically good. In this paper, we make the first endeavor to answer the question of "How can one infuse the knowledge of the end-goal within the image generation process itself to create not just better-looking images but also "better-performing'' images?''. We propose BoigLLM, an LLM that understands both image content and user behavior. BoigLLM knows how an image should look to get a certain required KPI. We show that BoigLLM outperforms 13x larger models such as GPT-3.5 and GPT-4 in this task, demonstrating that while these state-of-the-art models can understand images, they lack information on how these images perform in the real world. To generate actual pixels of behavior-conditioned images, we train a diffusion-based model (BoigSD) to align with a proposed BoigLLM-defined reward. We show the performance of the overall pipeline on two datasets covering two different behaviors: a stock dataset with the number of forward actions as the KPI and a dataset containing tweets with the total likes as the KPI, denoted as BoigBench. To advance research in the direction of utility-driven image generation and understanding, we release BoigBench, a benchmark dataset containing 168 million enterprise tweets with their media, brand account names, time of post, and total likes.

Journey of Hallucination-minimized Generative AI Solutions for Financial Decision Makers

  • paper_url: http://arxiv.org/abs/2311.10961
  • repo_url: None
  • paper_authors: Sohini Roychowdhury
  • for: This work aims to design hallucination-minimized large language model (LLM) solutions for decision makers in the financial domain.
  • methods: The work proceeds through three major stages -- prototyping, scaling, and LLM evolution using human feedback -- to ensure the reliability and quality of chatbots, automated reports, and alerts.
  • results: The work shows that controlling hallucinations with human feedback improves the reliability and quality of chatbots, and that novel data-to-answer generation modules yield high-quality automated reports and alerts.
    Abstract Generative AI has significantly reduced the entry barrier to the domain of AI owing to the ease of use and core capabilities of automation, translation, and intelligent actions in our day to day lives. Currently, Large language models (LLMs) that power such chatbots are being utilized primarily for their automation capabilities for software monitoring, report generation etc. and for specific personalized question answering capabilities, on a limited scope and scale. One major limitation of the currently evolving family of LLMs is 'hallucinations', wherein inaccurate responses are reported as factual. Hallucinations are primarily caused by biased training data, ambiguous prompts and inaccurate LLM parameters, and they majorly occur while combining mathematical facts with language-based context. Thus, monitoring and controlling for hallucinations becomes necessary when designing solutions that are meant for decision makers. In this work we present the three major stages in the journey of designing hallucination-minimized LLM-based solutions that are specialized for the decision makers of the financial domain, namely: prototyping, scaling and LLM evolution using human feedback. These three stages and the novel data to answer generation modules presented in this work are necessary to ensure that the Generative AI chatbots, autonomous reports and alerts are reliable and high-quality to aid key decision-making processes.

Deception Detection from Linguistic and Physiological Data Streams Using Bimodal Convolutional Neural Networks

  • paper_url: http://arxiv.org/abs/2311.10944
  • repo_url: None
  • paper_authors: Panfeng Li, Mohamed Abouelenien, Rada Mihalcea
  • for: This paper explores the application of convolutional neural networks to multimodal deception detection.
  • methods: The authors use a dataset built by interviewing 104 subjects about two topics, extracting linguistic and physiological features from these data to construct and train neural network models.
  • results: The authors propose a fused convolutional neural network model combining both modalities and demonstrate its advantage for multimodal deception detection. Compared with earlier multimodal deception detection methods, the approach achieves higher detection accuracy even with limited amounts of data.
    Abstract Deception detection is gaining increasing interest due to ethical and security concerns. This paper explores the application of convolutional neural networks for the purpose of multimodal deception detection. We use a dataset built by interviewing 104 subjects about two topics, with one truthful and one falsified response from each subject about each topic. In particular, we make three main contributions. First, we extract linguistic and physiological features from this data to train and construct the neural network models. Second, we propose a fused convolutional neural network model using both modalities in order to achieve an improved overall performance. Third, we compare our new approach with earlier methods designed for multimodal deception detection. We find that our system outperforms regular classification methods; our results indicate the feasibility of using neural networks for deception detection even in the presence of limited amounts of data.

Partially Randomizing Transformer Weights for Dialogue Response Diversity

  • paper_url: http://arxiv.org/abs/2311.10943
  • repo_url: None
  • paper_authors: Jing Yang Lee, Kong Aik Lee, Woon-Seng Gan
  • for: Improving response diversity in open-domain dialogue.
  • methods: Freezing the weights of selected transformer layers after random initialization.
  • results: PaRaFormer performs comparably to prior approaches without introducing additional training or inference difficulty or increasing model size and complexity.
    Abstract Despite recent progress in generative open-domain dialogue, the issue of low response diversity persists. Prior works have addressed this issue via either novel objective functions, alternative learning approaches such as variational frameworks, or architectural extensions such as the Randomized Link (RL) Transformer. However, these approaches typically entail either additional difficulties during training/inference, or a significant increase in model size and complexity. Hence, we propose the \underline{Pa}rtially \underline{Ra}ndomized trans\underline{Former} (PaRaFormer), a simple extension of the transformer which involves freezing the weights of selected layers after random initialization. Experimental results reveal that the performance of the PaRaformer is comparable to that of the aforementioned approaches, despite not entailing any additional training difficulty or increase in model complexity.
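The core idea, randomly initializing all layers and then freezing a selected subset so that only the remaining layers receive gradient updates, can be sketched framework-agnostically; the layer indices and update rule below are illustrative, not the paper's configuration:

```python
import random

def init_layers(n_layers, dim, frozen, seed=0):
    """Random-init every layer; mark the selected subset as frozen."""
    rng = random.Random(seed)
    return [{"w": [rng.gauss(0, 0.02) for _ in range(dim)],
             "trainable": i not in frozen}
            for i in range(n_layers)]

def sgd_step(layers, grads, lr=0.1):
    """Apply updates only to trainable layers; frozen layers keep their
    random initialization for the whole of training."""
    for layer, g in zip(layers, grads):
        if layer["trainable"]:
            layer["w"] = [w - lr * gi for w, gi in zip(layer["w"], g)]

layers = init_layers(n_layers=4, dim=3, frozen={1, 3})
before = [list(l["w"]) for l in layers]
sgd_step(layers, grads=[[1.0, 1.0, 1.0]] * 4)
```

In a real transformer implementation the same effect is typically achieved by disabling gradient tracking on the chosen layers' parameters, which costs nothing extra at inference time.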