cs.AI - 2023-10-24

Speakerly: A Voice-based Writing Assistant for Text Composition

  • paper_url: http://arxiv.org/abs/2310.16251
  • repo_url: None
  • paper_authors: Dhruv Kumar, Vipul Raheja, Alice Kaiser-Schatzlein, Robyn Perry, Apurva Joshi, Justin Hugues-Nuger, Samuel Lou, Navid Chowdhury
  • for: This paper describes Speakerly, a new real-time voice-based writing assistance system that helps users compose text across use cases such as emails, instant messages, and notes.
  • methods: The system combines small, task-specific models with pre-trained language models for fast and effective text composition, and supports a variety of input modes for better usability.
  • results: The system generates well-formatted, coherent documents and has been built and deployed at scale.
    Abstract We present Speakerly, a new real-time voice-based writing assistance system that helps users with text composition across various use cases such as emails, instant messages, and notes. The user can interact with the system through instructions or dictation, and the system generates a well-formatted and coherent document. We describe the system architecture and detail how we address the various challenges while building and deploying such a system at scale. More specifically, our system uses a combination of small, task-specific models as well as pre-trained language models for fast and effective text composition while supporting a variety of input modes for better usability.

A clustering tool for interrogating finite element models based on eigenvectors of graph adjacency

  • paper_url: http://arxiv.org/abs/2310.16249
  • repo_url: None
  • paper_authors: Ramaseshan Kannan
  • for: Debugging errors in finite element (FE) simulation models.
  • methods: An unsupervised learning algorithm that clusters the degrees of freedom in the FE model using numerical properties of the adjacency of its stiffness matrix.
  • results: Deployed as the `Model Stability Analysis' tool within the commercial structural FE suite Oasys GSA, it has been used successfully by end-users to debug real-world FE models; examples of the tool in action are presented.
    Abstract This note introduces an unsupervised learning algorithm to debug errors in finite element (FE) simulation models and details how it was productionised. The algorithm clusters degrees of freedom in the FE model using numerical properties of the adjacency of its stiffness matrix. The algorithm has been deployed as a tool called `Model Stability Analysis' tool within the commercial structural FE suite Oasys GSA (www.oasys-software.com/gsa). It has been used successfully by end-users for debugging real world FE models and we present examples of the tool in action.
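
As a rough illustration of the core idea (not the productionised Oasys GSA implementation), the sketch below builds an adjacency matrix from the magnitude pattern of a stiffness matrix, embeds the degrees of freedom with its leading eigenvectors, and clusters them; the toy matrix and cluster count are assumptions.

```python
# Hypothetical sketch: spectral clustering of FE degrees of freedom (DOFs)
# from the adjacency of a symmetric stiffness matrix. Not the paper's code.
import numpy as np
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans

def cluster_dofs(K, n_clusters=2, n_eig=2):
    A = np.abs(np.asarray(K, dtype=float))   # adjacency weights from |K_ij|
    np.fill_diagonal(A, 0.0)                 # ignore self-coupling
    _, vecs = eigsh(A, k=n_eig, which="LA")  # leading eigenvectors embed DOFs
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(vecs)

# Toy example: two spring pairs that are only weakly coupled to each other.
K = np.array([[ 2.0, -1.0,  0.0,  0.0],
              [-1.0,  2.0, -0.1,  0.0],
              [ 0.0, -0.1,  2.0, -1.0],
              [ 0.0,  0.0, -1.0,  2.0]])
print(cluster_dofs(K))  # e.g. [0 0 1 1]: the weak link splits the clusters
```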

Pixel-Level Clustering Network for Unsupervised Image Segmentation

  • paper_url: http://arxiv.org/abs/2310.16234
  • repo_url: None
  • paper_authors: Cuong Manh Hoang, Byeongkeun Kang
  • for: Improving image segmentation accuracy and efficiency without ground truth annotations.
  • methods: An unsupervised segmentation framework built from pixel-level feature embedding with an attention mechanism, feature statistics computation, image reconstruction, and superpixel segmentation, trained with superpixel-based consistency and similarity losses.
  • results: Experiments on three public datasets (Berkeley segmentation dataset, PASCAL VOC 2012, and COCO-Stuff) show that the method outperforms previous state-of-the-art approaches.
    Abstract While image segmentation is crucial in various computer vision applications, such as autonomous driving, grasping, and robot navigation, annotating all objects at the pixel-level for training is nearly impossible. Therefore, the study of unsupervised image segmentation methods is essential. In this paper, we present a pixel-level clustering framework for segmenting images into regions without using ground truth annotations. The proposed framework includes feature embedding modules with an attention mechanism, a feature statistics computing module, image reconstruction, and superpixel segmentation to achieve accurate unsupervised segmentation. Additionally, we propose a training strategy that utilizes intra-consistency within each superpixel, inter-similarity/dissimilarity between neighboring superpixels, and structural similarity between images. To avoid potential over-segmentation caused by superpixel-based losses, we also propose a post-processing method. Furthermore, we present an extension of the proposed method for unsupervised semantic segmentation. We conducted experiments on three publicly available datasets (Berkeley segmentation dataset, PASCAL VOC 2012 dataset, and COCO-Stuff dataset) to demonstrate the effectiveness of the proposed framework. The experimental results show that the proposed framework outperforms previous state-of-the-art methods.
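
To make the training strategy concrete, here is a hedged PyTorch sketch of one of the listed losses, the intra-consistency term that pulls pixel embeddings inside each superpixel toward their mean; the tensor shapes and names are assumptions, not the authors' code.

```python
import torch

def intra_superpixel_loss(embeddings, superpixels):
    """embeddings: (C, H, W) pixel features; superpixels: (H, W) int labels."""
    emb = embeddings.permute(1, 2, 0).reshape(-1, embeddings.shape[0])
    sp = superpixels.reshape(-1)
    loss, n = emb.new_zeros(()), 0
    for s in sp.unique():
        region = emb[sp == s]                     # features of one superpixel
        center = region.mean(dim=0, keepdim=True)
        loss = loss + (region - center).pow(2).sum(dim=1).mean()
        n += 1
    return loss / max(n, 1)

loss = intra_superpixel_loss(torch.randn(8, 4, 4), torch.randint(0, 3, (4, 4)))
```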

CleanCoNLL: A Nearly Noise-Free Named Entity Recognition Dataset

  • paper_url: http://arxiv.org/abs/2310.16225
  • repo_url: https://github.com/flairnlp/cleanconll
  • paper_authors: Susanna Rücker, Alan Akbik
  • for: Improving the annotation quality of the CoNLL-03 dataset so that named entity recognition (NER) models can be compared and analyzed objectively.
  • methods: A comprehensive relabeling of the English CoNLL-03 dataset, assisted by automatic consistency checking, with an added layer of entity linking annotation.
  • results: State-of-the-art models reach an F1-score of 97.1% on the new dataset, and the share of correct predictions falsely counted as errors due to annotation noise drops from 47% to 6%, indicating that the resource is well suited for analyzing the remaining errors of state-of-the-art models and that the theoretical upper bound on this task has not yet been reached.
    Abstract The CoNLL-03 corpus is arguably the most well-known and utilized benchmark dataset for named entity recognition (NER). However, prior works found significant numbers of annotation errors, incompleteness, and inconsistencies in the data. This poses challenges to objectively comparing NER approaches and analyzing their errors, as current state-of-the-art models achieve F1-scores that are comparable to or even exceed the estimated noise level in CoNLL-03. To address this issue, we present a comprehensive relabeling effort assisted by automatic consistency checking that corrects 7.0% of all labels in the English CoNLL-03. Our effort adds a layer of entity linking annotation both for better explainability of NER labels and as additional safeguard of annotation quality. Our experimental evaluation finds not only that state-of-the-art approaches reach significantly higher F1-scores (97.1%) on our data, but crucially that the share of correct predictions falsely counted as errors due to annotation noise drops from 47% to 6%. This indicates that our resource is well suited to analyze the remaining errors made by state-of-the-art models, and that the theoretical upper bound even on high resource, coarse-grained NER is not yet reached. To facilitate such analysis, we make CleanCoNLL publicly available to the research community.
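
The paper's automatic consistency checking is not specified here, but a minimal check in its spirit might flag surface forms whose entity type varies across the corpus; the data format below is an assumption.

```python
from collections import defaultdict

def inconsistent_mentions(corpus):
    """corpus: sentences as lists of (token, bio_label) pairs."""
    labels_seen = defaultdict(set)
    for sentence in corpus:
        for token, label in sentence:
            if label != "O":                      # only entity tokens
                labels_seen[token].add(label.split("-")[-1])
    # Mentions carrying more than one entity type are relabeling candidates.
    return {tok: labs for tok, labs in labels_seen.items() if len(labs) > 1}

corpus = [[("Washington", "B-LOC")], [("Washington", "B-PER")]]
print(inconsistent_mentions(corpus))  # {'Washington': {'LOC', 'PER'}}
```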

Hierarchical Randomized Smoothing

  • paper_url: http://arxiv.org/abs/2310.16221
  • repo_url: https://github.com/ManojKumarPatnaik/Major-project-list
  • paper_authors: Yan Scholten, Jan Schuchardt, Aleksandar Bojchevski, Stephan Günnemann
  • for: Improving models' certifiable robustness to small input perturbations while maintaining accuracy.
  • methods: Hierarchical randomized smoothing, which adds random noise only on a randomly selected subset of an object's entities (e.g. pixels), instantiated with different noising distributions to obtain novel robustness certificates for discrete and continuous domains.
  • results: On image and node classification tasks, hierarchical smoothing yields superior robustness-accuracy trade-offs compared to existing methods.
    Abstract Real-world data is complex and often consists of objects that can be decomposed into multiple entities (e.g. images into pixels, graphs into interconnected nodes). Randomized smoothing is a powerful framework for making models provably robust against small changes to their inputs - by guaranteeing robustness of the majority vote when randomly adding noise before classification. Yet, certifying robustness on such complex data via randomized smoothing is challenging when adversaries do not arbitrarily perturb entire objects (e.g. images) but only a subset of their entities (e.g. pixels). As a solution, we introduce hierarchical randomized smoothing: We partially smooth objects by adding random noise only on a randomly selected subset of their entities. By adding noise in a more targeted manner than existing methods we obtain stronger robustness guarantees while maintaining high accuracy. We initialize hierarchical smoothing using different noising distributions, yielding novel robustness certificates for discrete and continuous domains. We experimentally demonstrate the importance of hierarchical smoothing in image and node classification, where it yields superior robustness-accuracy trade-offs. Overall, hierarchical smoothing is an important contribution towards models that are both - certifiably robust to perturbations and accurate.
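
A minimal sketch of the partial-smoothing step, assuming images and Gaussian noise: noise is added only on a randomly selected subset of pixels rather than the whole image, and classification proceeds by majority vote as in standard randomized smoothing. The parameters p and sigma are illustrative.

```python
import torch

def partially_smooth(x, p=0.3, sigma=0.5):
    """x: (C, H, W). Add Gaussian noise to a random fraction p of pixels."""
    mask = (torch.rand(x.shape[1:]) < p).float()   # selected entities (pixels)
    return x + torch.randn_like(x) * sigma * mask

def smoothed_class(model, x, n_samples=100):
    # Majority vote over partially noised copies of the input.
    votes = torch.stack([model(partially_smooth(x).unsqueeze(0)).argmax(dim=-1)
                         for _ in range(n_samples)])
    return votes.mode(dim=0).values
```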

Knowledge Editing for Large Language Models: A Survey

  • paper_url: http://arxiv.org/abs/2310.16218
  • repo_url: None
  • paper_authors: Song Wang, Yaochen Zhu, Haochen Liu, Zaiyi Zheng, Chen Chen, Jundong Li
  • for: This paper aims to provide a comprehensive and in-depth overview of recent advances in Knowledge-based Model Editing (KME) for pre-trained large language models (LLMs).
  • methods: The paper uses a general formulation of KME to encompass different KME strategies, and introduces an innovative taxonomy of KME techniques based on how the new knowledge is introduced into pre-trained LLMs.
  • results: The paper provides an in-depth analysis of existing KME strategies, including their key insights, advantages, and limitations, and introduces representative metrics, datasets, and applications of KME.
    Abstract Large language models (LLMs) have recently transformed both the academic and industrial landscapes due to their remarkable capacity to understand, analyze, and generate texts based on their vast knowledge and reasoning ability. Nevertheless, one major drawback of LLMs is their substantial computational cost for pre-training due to their unprecedented amounts of parameters. The disadvantage is exacerbated when new knowledge frequently needs to be introduced into the pre-trained model. Therefore, it is imperative to develop effective and efficient techniques to update pre-trained LLMs. Traditional methods encode new knowledge in pre-trained LLMs through direct fine-tuning. However, naively re-training LLMs can be computationally intensive and risks degenerating valuable pre-trained knowledge irrelevant to the update in the model. Recently, Knowledge-based Model Editing (KME) has attracted increasing attention, which aims to precisely modify the LLMs to incorporate specific knowledge, without negatively influencing other irrelevant knowledge. In this survey, we aim to provide a comprehensive and in-depth overview of recent advances in the field of KME. We first introduce a general formulation of KME to encompass different KME strategies. Afterward, we provide an innovative taxonomy of KME techniques based on how the new knowledge is introduced into pre-trained LLMs, and investigate existing KME strategies while analyzing key insights, advantages, and limitations of methods from each category. Moreover, representative metrics, datasets, and applications of KME are introduced accordingly. Finally, we provide an in-depth analysis regarding the practicality and remaining challenges of KME and suggest promising research directions for further advancement in this field.

Length is a Curse and a Blessing for Document-level Semantics

  • paper_url: http://arxiv.org/abs/2310.16193
  • repo_url: https://github.com/gowitheflow-1998/la-ser-cubed
  • paper_authors: Chenghao Xiao, Yizhi Li, G Thomas Hudson, Chenghua Lin, Noura Al Moubayed
  • for: This paper aims to investigate the length generalizability of contrastive learning (CL) models and develop a new framework for learning semantically robust sentence representations that is not vulnerable to length-induced semantic shift.
  • methods: The authors use unsupervised CL methods that rely solely on the semantic signal provided by document length to devise a new framework for learning sentence representations.
  • results: The proposed framework, LA(SER)$^{3}$, achieves state-of-the-art unsupervised performance on the standard information retrieval benchmark, demonstrating the effectiveness of the length-agnostic self-reference approach in learning semantically robust sentence representations.
    Abstract In recent years, contrastive learning (CL) has been extensively utilized to recover sentence and document-level encoding capability from pre-trained language models. In this work, we question the length generalizability of CL-based models, i.e., their vulnerability towards length-induced semantic shift. We verify not only that length vulnerability is a significant yet overlooked research gap, but we can devise unsupervised CL methods solely depending on the semantic signal provided by document length. We first derive the theoretical foundations underlying length attacks, showing that elongating a document would intensify the high intra-document similarity that is already brought by CL. Moreover, we found that isotropy promised by CL is highly dependent on the length range of text exposed in training. Inspired by these findings, we introduce a simple yet universal document representation learning framework, LA(SER)$^{3}$: length-agnostic self-reference for semantically robust sentence representation learning, achieving state-of-the-art unsupervised performance on the standard information retrieval benchmark.

Correction with Backtracking Reduces Hallucination in Summarization

  • paper_url: http://arxiv.org/abs/2310.16176
  • repo_url: None
  • paper_authors: Zhenzhen Liu, Chao Wan, Varsha Kishore, Jin Peng Zhou, Minmin Chen, Kilian Q. Weinberger
  • for: Improving the reliability of neural abstractive summarization models by reducing hallucination (more precisely, confabulation), so that generated summaries are succinct yet faithful to the source.
  • methods: CoBa, a simple yet efficient technique with two steps: hallucination detection, based on simple statistics of conditional word probabilities and distance to context words, and mitigation via straightforward backtracking.
  • results: Thorough evaluation against prior art on three benchmark summarization datasets shows that CoBa is effective and efficient at reducing hallucination while offering great adaptability and flexibility.
    Abstract Abstractive summarization aims at generating natural language summaries of a source document that are succinct while preserving the important elements. Despite recent advances, neural text summarization models are known to be susceptible to hallucinating (or more correctly confabulating), that is to produce summaries with details that are not grounded in the source document. In this paper, we introduce a simple yet efficient technique, CoBa, to reduce hallucination in abstractive summarization. The approach is based on two steps: hallucination detection and mitigation. We show that the former can be achieved through measuring simple statistics about conditional word probabilities and distance to context words. Further, we demonstrate that straight-forward backtracking is surprisingly effective at mitigation. We thoroughly evaluate the proposed method with prior art on three benchmark datasets for text summarization. The results show that CoBa is effective and efficient in reducing hallucination, and offers great adaptability and flexibility.
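
A hedged sketch of the detect-then-backtrack loop, using real Hugging Face APIs but an invented threshold rule: a sampled token whose conditional probability is below a floor is treated as a potential confabulation, the last token is dropped, and generation is resampled.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def generate_with_backtracking(prompt, max_steps=40, p_min=0.01):
    ids = tok(prompt, return_tensors="pt").input_ids
    start = ids.shape[1]
    for _ in range(max_steps):
        with torch.no_grad():
            probs = torch.softmax(model(ids).logits[0, -1], dim=-1)
        next_id = torch.multinomial(probs, 1)
        if probs[next_id] < p_min and ids.shape[1] > start:
            ids = ids[:, :-1]        # backtrack: drop the suspect prefix token
            continue
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
    return tok.decode(ids[0, start:])
```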

Context-aware feature attribution through argumentation

  • paper_url: http://arxiv.org/abs/2310.16157
  • repo_url: None
  • paper_authors: Jinfeng Zhong, Elsa Negre
  • for: Improving the accuracy and interpretability of feature attribution methods while taking users' contexts, which can significantly influence their preferences, into account.
  • methods: Context-Aware Feature Attribution Through Argumentation (CA-FATA), a framework that treats each feature as an argument that can support, attack, or neutralize a prediction, formulating feature attribution as an argumentation procedure in which each computation has explicit semantics.
  • results: Compared with existing methods, CA-FATA better accounts for users' contexts and improves the accuracy and interpretability of feature attributions.
    Abstract Feature attribution is a fundamental task in both machine learning and data analysis, which involves determining the contribution of individual features or variables to a model's output. This process helps identify the most important features for predicting an outcome. The history of feature attribution methods can be traced back to General Additive Models (GAMs), which extend linear regression models by incorporating non-linear relationships between dependent and independent variables. In recent years, gradient-based methods and surrogate models have been applied to unravel complex Artificial Intelligence (AI) systems, but these methods have limitations. GAMs tend to achieve lower accuracy, gradient-based methods can be difficult to interpret, and surrogate models often suffer from stability and fidelity issues. Furthermore, most existing methods do not consider users' contexts, which can significantly influence their preferences. To address these limitations and advance the current state-of-the-art, we define a novel feature attribution framework called Context-Aware Feature Attribution Through Argumentation (CA-FATA). Our framework harnesses the power of argumentation by treating each feature as an argument that can either support, attack or neutralize a prediction. Additionally, CA-FATA formulates feature attribution as an argumentation procedure, and each computation has explicit semantics, which makes it inherently interpretable. CA-FATA also easily integrates side information, such as users' contexts, resulting in more accurate predictions.

Yin Yang Convolutional Nets: Image Manifold Extraction by the Analysis of Opposites

  • paper_url: http://arxiv.org/abs/2310.16148
  • repo_url: https://github.com/nosaveddata/yinyang_cnn
  • paper_authors: Augusto Seben da Rosa, Frederico Santos de Oliveira, Anderson da Silva Soares, Arnaldo Candido Junior
  • for: Proposing a more bio-inspired computer vision architecture that better mimics how the brain operates.
  • methods: The Yin Yang Convolutional Network, an architecture whose initial blocks separate the analysis of colors and forms, simulating operations of the occipital lobe.
  • results: The model reaches state-of-the-art efficiency among low-parameter architectures on CIFAR-10: the first model achieves 93.32% test accuracy, 0.8% above the previous SOTA in this category, with 150k fewer parameters (726k in total); a second 52k-parameter model loses only 3.86% test accuracy. On ImageNet, the approach reaches 66.49% validation accuracy with 1.6M parameters. Code: https://github.com/NoSavedDATA/YinYang_CNN.
    Abstract Computer vision in general presented several advances such as training optimizations, new architectures (pure attention, efficient block, vision language models, generative models, among others). This have improved performance in several tasks such as classification, and others. However, the majority of these models focus on modifications that are taking distance from realistic neuroscientific approaches related to the brain. In this work, we adopt a more bio-inspired approach and present the Yin Yang Convolutional Network, an architecture that extracts visual manifold, its blocks are intended to separate analysis of colors and forms at its initial layers, simulating occipital lobe's operations. Our results shows that our architecture provides State-of-the-Art efficiency among low parameter architectures in the dataset CIFAR-10. Our first model reached 93.32\% test accuracy, 0.8\% more than the older SOTA in this category, while having 150k less parameters (726k in total). Our second model uses 52k parameters, losing only 3.86\% test accuracy. We also performed an analysis on ImageNet, where we reached 66.49\% validation accuracy with 1.6M parameters. We make the code publicly available at: https://github.com/NoSavedDATA/YinYang_CNN.
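
The exact Yin Yang block design is in the linked repository; as a speculative sketch of the stated idea only, the toy module below routes a luminance (form) map and the raw RGB (color) input through separate initial convolutions before merging.

```python
import torch
import torch.nn as nn

class ColorFormSplit(nn.Module):
    """Toy early block: analyze form and color in separate branches."""
    def __init__(self, out_ch=32):
        super().__init__()
        self.form = nn.Conv2d(1, out_ch // 2, 3, padding=1)   # shapes/edges
        self.color = nn.Conv2d(3, out_ch // 2, 3, padding=1)  # chromatic cues

    def forward(self, x):
        luminance = x.mean(dim=1, keepdim=True)   # crude grayscale proxy
        return torch.cat([torch.relu(self.form(luminance)),
                          torch.relu(self.color(x))], dim=1)

print(ColorFormSplit()(torch.randn(1, 3, 32, 32)).shape)  # [1, 32, 32, 32]
```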

PreWoMe: Exploiting Presuppositions as Working Memory for Long Form Question Answering

  • paper_url: http://arxiv.org/abs/2310.16147
  • repo_url: None
  • paper_authors: Wookje Han, Jinsol Park, Kyungjae Lee
  • for: Handling misleading information-seeking questions in long-form question answering, which often contain ambiguity or false presuppositions.
  • methods: Extracting the presuppositions in a question and exploiting them as working memory to generate feedback and action about the question.
  • results: Experiments show the approach is effective not only on misleading questions but also on normal ones, demonstrating the value of leveraging presuppositions, feedback, and action in real-world QA settings.
    Abstract Information-seeking questions in long-form question answering (LFQA) often prove misleading due to ambiguity or false presupposition in the question. While many existing approaches handle misleading questions, they are tailored to limited questions, which are insufficient in a real-world setting with unpredictable input characteristics. In this work, we propose PreWoMe, a unified approach capable of handling any type of information-seeking question. The key idea of PreWoMe involves extracting presuppositions in the question and exploiting them as working memory to generate feedback and action about the question. Our experiment shows that PreWoMe is effective not only in tackling misleading questions but also in handling normal ones, thereby demonstrating the effectiveness of leveraging presuppositions, feedback, and action for real-world QA settings.
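
A rough sketch of the flow, assuming only a generic `llm(prompt)` completion function; the actual prompts and stages are the paper's, not the ones below.

```python
def prewome_answer(question, llm):
    # 1. Extract presuppositions and keep them as working memory.
    presuppositions = llm(
        f"List the presuppositions made by this question:\n{question}")
    # 2. Generate feedback: which presuppositions look false or ambiguous?
    feedback = llm(
        f"Point out any false or ambiguous presuppositions:\n{presuppositions}")
    # 3. Act: answer conditioned on the working memory and the feedback.
    return llm(
        f"Question: {question}\nWorking memory: {presuppositions}\n"
        f"Feedback: {feedback}\n"
        "Write a long-form answer, addressing any false presupposition.")
```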

From Heuristic to Analytic: Cognitively Motivated Strategies for Coherent Physical Commonsense Reasoning

  • paper_url: http://arxiv.org/abs/2310.18364
  • repo_url: https://github.com/sled-group/heuristic-analytic-reasoning
  • paper_authors: Zheyuan Zhang, Shane Storks, Fengyuan Hu, Sungryull Sohn, Moontae Lee, Honglak Lee, Joyce Chai
  • for: Improving the coherence and reliability of pre-trained language model (PLM) reasoning by incorporating the dual processes from cognitive psychology: fast, intuitive heuristic thinking and slower, deliberative analytic reasoning.
  • methods: The interlinked dual processes are incorporated into fine-tuning and in-context learning with PLMs, applied to two language understanding tasks that require coherent physical commonsense reasoning.
  • results: The proposed Heuristic-Analytic Reasoning (HAR) strategies drastically improve the coherence of rationalizations for model decisions, yielding state-of-the-art results on Tiered Reasoning for Intuitive Physics (TRIP); the improved coherence is a direct result of more faithful attention to relevant language context at each reasoning step.
    Abstract Pre-trained language models (PLMs) have shown impressive performance in various language tasks. However, they are prone to spurious correlations, and often generate illusory information. In real-world applications, PLMs should justify decisions with formalized, coherent reasoning chains, but this challenge remains under-explored. Cognitive psychology theorizes that humans are capable of utilizing fast and intuitive heuristic thinking to make decisions based on past experience, then rationalizing the decisions through slower and deliberative analytic reasoning. We incorporate these interlinked dual processes in fine-tuning and in-context learning with PLMs, applying them to two language understanding tasks that require coherent physical commonsense reasoning. We show that our proposed Heuristic-Analytic Reasoning (HAR) strategies drastically improve the coherence of rationalizations for model decisions, yielding state-of-the-art results on Tiered Reasoning for Intuitive Physics (TRIP). We also find that this improved coherence is a direct result of more faithful attention to relevant language context in each step of reasoning. Our findings suggest that human-like reasoning strategies can effectively improve the coherence and reliability of PLM reasoning.

Clinfo.ai: An Open-Source Retrieval-Augmented Large Language Model System for Answering Medical Questions using Scientific Literature

  • paper_url: http://arxiv.org/abs/2310.16146
  • repo_url: None
  • paper_authors: Alejandro Lozano, Scott L Fleming, Chia-Chun Chiang, Nigam Shah
  • for: This paper aims to provide an open-source WebApp called Clinfo.ai that answers clinical questions based on dynamically retrieved scientific literature, and to evaluate the performance of such retrieval-augmented language models (LLMs) using a specified information retrieval and abstractive summarization task.
  • methods: The authors use a dataset of 200 questions and corresponding answers derived from published systematic reviews, named PubMed Retrieval and Synthesis (PubMedRS-200), to evaluate the performance of Clinfo.ai and other publicly available OpenQA systems.
  • results: The authors report benchmark results for Clinfo.ai and other OpenQA systems on PubMedRS-200, demonstrating the effectiveness of their approach in answering clinical questions and summarizing relevant scientific literature.
    Abstract The quickly-expanding nature of published medical literature makes it challenging for clinicians and researchers to keep up with and summarize recent, relevant findings in a timely manner. While several closed-source summarization tools based on large language models (LLMs) now exist, rigorous and systematic evaluations of their outputs are lacking. Furthermore, there is a paucity of high-quality datasets and appropriate benchmark tasks with which to evaluate these tools. We address these issues with four contributions: we release Clinfo.ai, an open-source WebApp that answers clinical questions based on dynamically retrieved scientific literature; we specify an information retrieval and abstractive summarization task to evaluate the performance of such retrieval-augmented LLM systems; we release a dataset of 200 questions and corresponding answers derived from published systematic reviews, which we name PubMed Retrieval and Synthesis (PubMedRS-200); and report benchmark results for Clinfo.ai and other publicly available OpenQA systems on PubMedRS-200.
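
A schematic sketch of a retrieval-augmented clinical QA loop of the kind Clinfo.ai implements; `search_pubmed` and `llm` are placeholder assumptions, not the system's actual API.

```python
def answer_clinical_question(question, search_pubmed, llm, k=5):
    query = llm(f"Rewrite as a PubMed search query: {question}")
    abstracts = search_pubmed(query)[:k]            # dynamically retrieved
    summaries = [llm(f"Summarize with respect to '{question}':\n{a}")
                 for a in abstracts]
    return llm("Synthesize an evidence-based answer, citing the studies:\n"
               + "\n".join(summaries))
```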

A Language Model with Limited Memory Capacity Captures Interference in Human Sentence Processing

  • paper_url: http://arxiv.org/abs/2310.16142
  • repo_url: None
  • paper_authors: William Timkey, Tal Linzen
  • for: Developing a recurrent neural language model that more closely parallels the memory system assumed by cognitive theories of human sentence processing.
  • methods: A recurrent neural network language model with a single self-attention head, paralleling cue-based retrieval from working memory.
  • results: The model's single attention head captures the semantic and syntactic interference effects observed in human experiments.
    Abstract Two of the central factors believed to underpin human sentence processing difficulty are expectations and retrieval from working memory. A recent attempt to create a unified cognitive model integrating these two factors relied on the parallels between the self-attention mechanism of transformer language models and cue-based retrieval theories of working memory in human sentence processing (Ryu and Lewis 2021). While Ryu and Lewis show that attention patterns in specialized attention heads of GPT-2 are consistent with similarity-based interference, a key prediction of cue-based retrieval models, their method requires identifying syntactically specialized attention heads, and makes the cognitively implausible assumption that hundreds of memory retrieval operations take place in parallel. In the present work, we develop a recurrent neural language model with a single self-attention head, which more closely parallels the memory system assumed by cognitive theories. We show that our model's single attention head captures semantic and syntactic interference effects observed in human experiments.
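
A compact, assumption-level sketch (not the authors' model) of a recurrent LM with a single self-attention head over its own past hidden states, mirroring cue-based retrieval from working memory:

```python
import torch
import torch.nn as nn

class OneHeadRecurrentLM(nn.Module):
    def __init__(self, vocab_size, d=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d)
        self.cell = nn.GRUCell(2 * d, d)
        self.q, self.k, self.v = (nn.Linear(d, d) for _ in range(3))
        self.out = nn.Linear(d, vocab_size)

    def forward(self, ids):
        B, T = ids.shape
        h = ids.new_zeros(B, self.emb.embedding_dim, dtype=torch.float)
        memory, logits = [h], []
        for t in range(T):
            past = torch.stack(memory, dim=1)               # (B, t+1, d)
            # Single attention head: the current state cues past states.
            att = torch.softmax(
                (self.q(h).unsqueeze(1) * self.k(past)).sum(-1), dim=-1)
            retrieved = (att.unsqueeze(-1) * self.v(past)).sum(dim=1)
            h = self.cell(torch.cat([self.emb(ids[:, t]), retrieved], -1), h)
            memory.append(h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)                   # (B, T, vocab)

lm = OneHeadRecurrentLM(vocab_size=100)
print(lm(torch.randint(0, 100, (2, 5))).shape)  # torch.Size([2, 5, 100])
```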

Context-aware explainable recommendations over knowledge graphs

  • paper_url: http://arxiv.org/abs/2310.16141
  • repo_url: None
  • paper_authors: Jinfeng Zhong, Elsa Negre
  • for: Modeling users' preferences adapted to their contexts while incorporating the rich semantic relationships about items in knowledge graphs into recommendations.
  • methods: The Context-Aware Knowledge Graph Convolutional Network (CA-KGCN), an end-to-end framework that captures users' attention to both contexts and item features.
  • results: Experiments on three real-world datasets show the framework's effectiveness: it models users' preferences adapted to their contexts and provides explanations adapted to the given context.
    Abstract Knowledge graphs contain rich semantic relationships related to items and incorporating such semantic relationships into recommender systems helps to explore the latent connections of items, thus improving the accuracy of prediction and enhancing the explainability of recommendations. However, such explainability is not adapted to users' contexts, which can significantly influence their preferences. In this work, we propose CA-KGCN (Context-Aware Knowledge Graph Convolutional Network), an end-to-end framework that can model users' preferences adapted to their contexts and can incorporate rich semantic relationships in the knowledge graph related to items. This framework captures users' attention to different factors: contexts and features of items. More specifically, the framework can model users' preferences adapted to their contexts and provide explanations adapted to the given context. Experiments on three real-world datasets show the effectiveness of our framework: modeling users' preferences adapted to their contexts and explaining the recommendations generated.

Alquist 5.0: Dialogue Trees Meet Generative Models. A Novel Approach for Enhancing SocialBot Conversations

  • paper_url: http://arxiv.org/abs/2310.16119
  • repo_url: None
  • paper_authors: Ondřej Kobza, Jan Čuhel, Tommaso Gargiani, David Herel, Petr Marek
  • for: This paper describes the SocialBot Alquist~5.0, developed for the Alexa Prize SocialBot Grand Challenge~5, and how it integrates the NRG Barista and supports multimodal devices.
  • methods: The system introduces several innovative approaches for integrating Barista into the SocialBot and extends it to multimodal devices, improving the overall conversational experience.
  • results: The paper offers insights into the development of Alquist~5.0, which meets evolving user expectations while maintaining empathetic and knowledgeable conversational abilities across diverse topics.
    Abstract We present our SocialBot -- Alquist~5.0 -- developed for the Alexa Prize SocialBot Grand Challenge~5. Building upon previous versions of our system, we introduce the NRG Barista and outline several innovative approaches for integrating Barista into our SocialBot, improving the overall conversational experience. Additionally, we extend our SocialBot to support multimodal devices. This paper offers insights into the development of Alquist~5.0, which meets evolving user expectations while maintaining empathetic and knowledgeable conversational abilities across diverse topics.

Anatomically-aware Uncertainty for Semi-supervised Image Segmentation

  • paper_url: http://arxiv.org/abs/2310.16099
  • repo_url: https://github.com/adigasu/anatomically-aware_uncertainty_for_semi-supervised_segmentation
  • paper_authors: Sukesh Adiga V, Jose Dolz, Herve Lombaert
  • for: Proposing a new segmentation uncertainty estimation method to better guide segmentation networks in semi-supervised learning.
  • methods: An anatomically-aware representation is first learnt from the available segmentation masks; it then maps a new prediction to an anatomically-plausible segmentation, and the deviation from this plausible segmentation estimates the underlying pixel-level uncertainty from a single inference.
  • results: Evaluated on two publicly available segmentation datasets (left atria in cardiac MRIs and multiple organs in abdominal CTs), the method improves segmentation accuracy over state-of-the-art semi-supervised methods on two commonly used evaluation metrics.
    Abstract Semi-supervised learning relaxes the need of large pixel-wise labeled datasets for image segmentation by leveraging unlabeled data. A prominent way to exploit unlabeled data is to regularize model predictions. Since the predictions of unlabeled data can be unreliable, uncertainty-aware schemes are typically employed to gradually learn from meaningful and reliable predictions. Uncertainty estimation methods, however, rely on multiple inferences from the model predictions that must be computed for each training step, which is computationally expensive. Moreover, these uncertainty maps capture pixel-wise disparities and do not consider global information. This work proposes a novel method to estimate segmentation uncertainty by leveraging global information from the segmentation masks. More precisely, an anatomically-aware representation is first learnt to model the available segmentation masks. The learnt representation thereupon maps the prediction of a new segmentation into an anatomically-plausible segmentation. The deviation from the plausible segmentation aids in estimating the underlying pixel-level uncertainty in order to further guide the segmentation network. The proposed method consequently estimates the uncertainty using a single inference from our representation, thereby reducing the total computation. We evaluate our method on two publicly available segmentation datasets of left atria in cardiac MRIs and of multiple organs in abdominal CTs. Our anatomically-aware method improves the segmentation accuracy over the state-of-the-art semi-supervised methods in terms of two commonly used evaluation metrics.
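
A hedged sketch of the central mechanism: a mask autoencoder (trained on the available segmentation masks; the training loop is omitted) projects a predicted mask onto an anatomically plausible one, and the per-pixel deviation serves as an uncertainty estimate from a single inference. The architecture below is illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class MaskAE(nn.Module):
    """Toy autoencoder over segmentation masks."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, 16, 3, 2, 1), nn.ReLU(),
                                 nn.Conv2d(16, 32, 3, 2, 1), nn.ReLU())
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, 2, 1), nn.Sigmoid())

    def forward(self, mask):
        return self.dec(self.enc(mask))

def anatomical_uncertainty(ae, predicted_mask):
    plausible = ae(predicted_mask)             # anatomically plausible mapping
    return (predicted_mask - plausible).abs()  # pixel-level deviation

print(anatomical_uncertainty(MaskAE(), torch.rand(1, 1, 32, 32)).shape)
```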

Synthetic Data as Validation

  • paper_url: http://arxiv.org/abs/2310.16052
  • repo_url: https://github.com/fiu-airlab/Next-Generation-Airline-Data-Exchange-Simulator
  • paper_authors: Qixin Hu, Alan Yuille, Zongwei Zhou
  • for: The paper explores the use of synthetic data as a validation set for early cancer detection in computed tomography (CT) volumes, with a focus on improving the robustness of AI models in identifying very tiny liver tumors.
  • methods: Synthetic tumors are generated and superimposed onto healthy organs in CT volumes, creating an extensive dataset for rigorous validation; the authors also propose a continual learning framework that continuously trains AI models on a stream of out-domain data with synthetic tumors.
  • results: Using synthetic data for validation improves AI robustness on both in-domain and out-domain test sets: the DSC score for liver tumor segmentation improves from 26.7% to 34.5% on an in-domain dataset and from 31.1% to 35.4% on an out-domain dataset, with Sensitivity for very tiny liver tumors (radius < 5mm) improving from 33.1% to 55.4% in-domain and from 33.9% to 52.3% out-domain.
    Abstract This study leverages synthetic data as a validation set to reduce overfitting and ease the selection of the best model in AI development. While synthetic data have been used for augmenting the training set, we find that synthetic data can also significantly diversify the validation set, offering marked advantages in domains like healthcare, where data are typically limited, sensitive, and from out-domain sources (i.e., hospitals). In this study, we illustrate the effectiveness of synthetic data for early cancer detection in computed tomography (CT) volumes, where synthetic tumors are generated and superimposed onto healthy organs, thereby creating an extensive dataset for rigorous validation. Using synthetic data as validation can improve AI robustness in both in-domain and out-domain test sets. Furthermore, we establish a new continual learning framework that continuously trains AI models on a stream of out-domain data with synthetic tumors. The AI model trained and validated in dynamically expanding synthetic data can consistently outperform models trained and validated exclusively on real-world data. Specifically, the DSC score for liver tumor segmentation improves from 26.7% (95% CI: 22.6%-30.9%) to 34.5% (30.8%-38.2%) when evaluated on an in-domain dataset and from 31.1% (26.0%-36.2%) to 35.4% (32.1%-38.7%) on an out-domain dataset. Importantly, the performance gain is particularly significant in identifying very tiny liver tumors (radius < 5mm) in CT volumes, with Sensitivity improving from 33.1% to 55.4% on an in-domain dataset and 33.9% to 52.3% on an out-domain dataset, justifying the efficacy in early detection of cancer. The application of synthetic data, from both training and validation perspectives, underlines a promising avenue to enhance AI robustness when dealing with data from varying domains.

AI Alignment and Social Choice: Fundamental Limitations and Policy Implications

  • paper_url: http://arxiv.org/abs/2310.16048
  • repo_url: None
  • paper_authors: Abhilash Mishra
  • for: This paper investigates the challenges of building reinforcement learning with human feedback (RLHF) systems that respect democratic norms.
  • methods: The paper draws on impossibility results from social choice theory to study how RLHF systems can be aligned with the values of individuals.
  • results: Under fairly broad assumptions, there is no unique voting protocol for universally aligning AI systems with RLHF through democratic processes, and aligning an AI agent with the values of all individuals will always violate certain private ethical preferences of some individual user; the authors therefore argue for mandating transparent voting rules to hold model builders accountable, and for developing AI agents narrowly aligned to specific user groups.
    Abstract Aligning AI agents to human intentions and values is a key bottleneck in building safe and deployable AI applications. But whose values should AI agents be aligned with? Reinforcement learning with human feedback (RLHF) has emerged as the key framework for AI alignment. RLHF uses feedback from human reinforcers to fine-tune outputs; all widely deployed large language models (LLMs) use RLHF to align their outputs to human values. It is critical to understand the limitations of RLHF and consider policy challenges arising from these limitations. In this paper, we investigate a specific challenge in building RLHF systems that respect democratic norms. Building on impossibility results in social choice theory, we show that, under fairly broad assumptions, there is no unique voting protocol to universally align AI systems using RLHF through democratic processes. Further, we show that aligning AI agents with the values of all individuals will always violate certain private ethical preferences of an individual user i.e., universal AI alignment using RLHF is impossible. We discuss policy implications for the governance of AI systems built using RLHF: first, the need for mandating transparent voting rules to hold model builders accountable. Second, the need for model builders to focus on developing AI agents that are narrowly aligned to specific user groups.
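
The social-choice obstruction the paper builds on can be seen in a three-voter example: pairwise majority preferences over candidate model behaviors can form a cycle (Condorcet's paradox), so no single aggregate ranking represents "the" human values. A tiny worked check:

```python
# Three voters with cyclic preference orders over behaviors A, B, C.
voters = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]

def majority_prefers(x, y):
    wins = sum(r.index(x) < r.index(y) for r in voters)
    return wins > len(voters) / 2

for x, y in [("A", "B"), ("B", "C"), ("C", "A")]:
    print(f"majority prefers {x} over {y}:", majority_prefers(x, y))
# All three are True: A > B > C > A, a cycle with no consistent aggregate.
```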

Woodpecker: Hallucination Correction for Multimodal Large Language Models

  • paper_url: http://arxiv.org/abs/2310.16045
  • repo_url: https://github.com/bradyfu/woodpecker
  • paper_authors: Shukang Yin, Chaoyou Fu, Sirui Zhao, Tong Xu, Hao Wang, Dianbo Sui, Yunhang Shen, Ke Li, Xing Sun, Enhong Chen
  • for: mitigate hallucinations in Multimodal Large Language Models (MLLMs)
  • methods: training-free method named Woodpecker, consisting of five stages: key concept extraction, question formulation, visual knowledge validation, visual claim generation, and hallucination correction
  • results: 30.66%/24.33% improvement in accuracy over the baseline MiniGPT-4/mPLUG-Owl on the POPE benchmark, and the source code is released at https://github.com/BradyFU/Woodpecker.
    Abstract Hallucination is a big shadow hanging over the rapidly evolving Multimodal Large Language Models (MLLMs), referring to the phenomenon that the generated text is inconsistent with the image content. In order to mitigate hallucinations, existing studies mainly resort to an instruction-tuning manner that requires retraining the models with specific data. In this paper, we pave a different way, introducing a training-free method named Woodpecker. Like a woodpecker heals trees, it picks out and corrects hallucinations from the generated text. Concretely, Woodpecker consists of five stages: key concept extraction, question formulation, visual knowledge validation, visual claim generation, and hallucination correction. Implemented in a post-remedy manner, Woodpecker can easily serve different MLLMs, while being interpretable by accessing intermediate outputs of the five stages. We evaluate Woodpecker both quantitatively and qualitatively and show the huge potential of this new paradigm. On the POPE benchmark, our method obtains a 30.66%/24.33% improvement in accuracy over the baseline MiniGPT-4/mPLUG-Owl. The source code is released at https://github.com/BradyFU/Woodpecker.
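
A schematic, assumption-level sketch of the five-stage pipeline; `llm`, `vqa`, and `detect` stand in for the expert models the training-free framework chains together.

```python
def woodpecker_correct(image, answer, llm, vqa, detect):
    # 1. Key concept extraction from the MLLM's answer.
    concepts = llm(f"List the key objects mentioned:\n{answer}")
    # 2. Question formulation about those concepts.
    questions = llm(f"Write verification questions about: {concepts}")
    # 3. Visual knowledge validation via a VQA expert.
    evidence = {q: vqa(image, q) for q in questions.splitlines() if q.strip()}
    # 4. Visual claim generation from an open-vocabulary detector.
    claims = detect(image, concepts)   # e.g. object counts and locations
    # 5. Hallucination correction conditioned on the gathered evidence.
    return llm("Rewrite the answer, fixing claims that conflict with the "
               f"evidence.\nAnswer: {answer}\nVQA evidence: {evidence}\n"
               f"Detections: {claims}")
```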

WebWISE: Web Interface Control and Sequential Exploration with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.16042
  • repo_url: None
  • paper_authors: Heyi Tao, Sethuraman T V, Michal Shlapentokh-Rothman, Derek Hoiem
  • for: This paper uses a large language model (LLM) to automatically perform web software tasks using click, scroll, and text input operations.
  • methods: The method uses filtered Document Object Model (DOM) elements as observations and performs tasks step-by-step, sequentially generating small programs based on the current observations, with in-context learning from a single manually provided or automatically generated example.
  • results: On the MiniWob++ benchmark, WebWISE achieves similar or better performance than methods requiring many demonstrations or trials, using only one in-context example.
    Abstract The paper investigates using a Large Language Model (LLM) to automatically perform web software tasks using click, scroll, and text input operations. Previous approaches, such as reinforcement learning (RL) or imitation learning, are inefficient to train and task-specific. Our method uses filtered Document Object Model (DOM) elements as observations and performs tasks step-by-step, sequentially generating small programs based on the current observations. We use in-context learning, either benefiting from a single manually provided example, or an automatically generated example based on a successful zero-shot trial. We evaluate the proposed method on the MiniWob++ benchmark. With only one in-context example, our WebWISE method achieves similar or better performance than other methods that require many demonstrations or trials.
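
A high-level sketch of the sequential loop, where the browser helpers (`filtered_dom`, `run_program`) and the prompt format are assumptions rather than the paper's interface.

```python
def webwise_episode(task, llm, filtered_dom, run_program, max_steps=10):
    history = []
    for _ in range(max_steps):
        observation = filtered_dom()       # filtered DOM elements only
        program = llm(f"Task: {task}\nDOM: {observation}\n"
                      f"Previous steps: {history}\n"
                      "Write the next small click/scroll/type program:")
        history.append((program, run_program(program)))   # act in the browser
        if "DONE" in program:              # model signals task completion
            break
    return history
```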

Instruct and Extract: Instruction Tuning for On-Demand Information Extraction

  • paper_url: http://arxiv.org/abs/2310.16040
  • repo_url: https://github.com/yzjiao/on-demand-ie
  • paper_authors: Yizhu Jiao, Ming Zhong, Sha Li, Ruining Zhao, Siru Ouyang, Heng Ji, Jiawei Han
  • for: Providing a personalized, large-language-model-based information extraction system that serves the long-tail, ad hoc extraction needs of non-expert users.
  • methods: A new paradigm, termed On-Demand Information Extraction, in which the system follows a user's instructions to extract the desired content from the associated text and presents it in a structured tabular format, with table headers either user-specified or inferred contextually by the model; the InstructIE benchmark combines automatically generated training data with a human-annotated test set.
  • results: Comprehensive evaluations on InstructIE show that the ODIE model substantially outperforms existing open-source models of similar size.
    Abstract Large language models with instruction-following capabilities open the door to a wider group of users. However, when it comes to information extraction - a classic task in natural language processing - most task-specific systems cannot align well with long-tail ad hoc extraction use cases for non-expert users. To address this, we propose a novel paradigm, termed On-Demand Information Extraction, to fulfill the personalized demands of real-world users. Our task aims to follow the instructions to extract the desired content from the associated text and present it in a structured tabular format. The table headers can either be user-specified or inferred contextually by the model. To facilitate research in this emerging area, we present a benchmark named InstructIE, inclusive of both automatically generated training data, as well as the human-annotated test set. Building on InstructIE, we further develop an On-Demand Information Extractor, ODIE. Comprehensive evaluations on our benchmark reveal that ODIE substantially outperforms the existing open-source models of similar size. Our code and dataset are released on https://github.com/yzjiao/On-Demand-IE.

What’s Left? Concept Grounding with Logic-Enhanced Foundation Models

  • paper_url: http://arxiv.org/abs/2310.16035
  • repo_url: https://github.com/joyhsu0504/left
  • paper_authors: Joy Hsu, Jiayuan Mao, Joshua B. Tenenbaum, Jiajun Wu
  • for: Proposing a logic-enhanced foundation model (LEFT) that can flexibly ground and reason with concepts across domains.
  • methods: LEFT combines a large language model (LLM) interpreter, which outputs programs in a general, logic-based reasoning language shared across all domains and tasks, with a differentiable, domain-independent, first-order-logic program executor that calls trainable domain-specific grounding modules.
  • results: LEFT flexibly learns concepts in four domains (2D images, 3D scenes, human motions, and robotic manipulation) and exhibits strong reasoning ability on a wide variety of tasks, including complex tasks unseen during training, while being easily applicable to new domains.
    Abstract Recent works such as VisProg and ViperGPT have smartly composed foundation models for visual reasoning-using large language models (LLMs) to produce programs that can be executed by pre-trained vision-language models. However, they operate in limited domains, such as 2D images, not fully exploiting the generalization of language: abstract concepts like "left" can also be grounded in 3D, temporal, and action data, as in moving to your left. This limited generalization stems from these inference-only methods' inability to learn or adapt pre-trained models to a new domain. We propose the Logic-Enhanced Foundation Model (LEFT), a unified framework that learns to ground and reason with concepts across domains with a differentiable, domain-independent, first-order logic-based program executor. LEFT has an LLM interpreter that outputs a program represented in a general, logic-based reasoning language, which is shared across all domains and tasks. LEFT's executor then executes the program with trainable domain-specific grounding modules. We show that LEFT flexibly learns concepts in four domains: 2D images, 3D scenes, human motions, and robotic manipulation. It exhibits strong reasoning ability in a wide variety of tasks, including those that are complex and not seen during training, and can be easily applied to new domains.
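
A toy sketch of the split LEFT describes: a domain-independent, first-order-logic-style executor that delegates leaf concepts to domain-specific grounding modules. The program syntax and grounding function below are invented for illustration; the real executor is differentiable over soft concept scores.

```python
def substitute(expr, var, obj):
    if isinstance(expr, tuple):
        return tuple(substitute(e, var, obj) for e in expr)
    return obj if expr == var else expr

def execute(expr, scene, ground):
    op = expr[0]
    if op == "exists":                      # existential over scene objects
        _, var, body = expr
        return max(execute(substitute(body, var, o), scene, ground)
                   for o in scene)
    if op == "and":                         # soft conjunction (min)
        return min(execute(e, scene, ground) for e in expr[1:])
    return ground[op](*expr[1:])            # grounded concept score in [0, 1]

scene = [{"color": "red"}, {"color": "blue"}]
ground = {"red": lambda o: 1.0 if o["color"] == "red" else 0.0}
print(execute(("exists", "x", ("and", ("red", "x"))), scene, ground))  # 1.0
```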

Finetuning Offline World Models in the Real World

  • paper_url: http://arxiv.org/abs/2310.16029
  • repo_url: https://github.com/fyhMer/fowm
  • paper_authors: Yunhai Feng, Nicklas Hansen, Ziyan Xiong, Chandramouli Rajagopalan, Xiaolong Wang
  • for: This work aims to train model-based RL (world models) on real robots and transfer them to new tasks.
  • methods: A world model is pretrained with offline RL on pre-existing data collected on a real robot, then finetuned on online data collected by planning with the learned model; to mitigate extrapolation errors during online interaction, the planner is regularized at test time by balancing estimated returns against (epistemic) model uncertainty.
  • results: The method enables few-shot finetuning to seen and unseen visuo-motor control tasks even when offline data is limited, evaluated in simulation and on a real robot; videos, code, and data are available.
    Abstract Reinforcement Learning (RL) is notoriously data-inefficient, which makes training on a real robot difficult. While model-based RL algorithms (world models) improve data-efficiency to some extent, they still require hours or days of interaction to learn skills. Recently, offline RL has been proposed as a framework for training RL policies on pre-existing datasets without any online interaction. However, constraining an algorithm to a fixed dataset induces a state-action distribution shift between training and inference, and limits its applicability to new tasks. In this work, we seek to get the best of both worlds: we consider the problem of pretraining a world model with offline data collected on a real robot, and then finetuning the model on online data collected by planning with the learned model. To mitigate extrapolation errors during online interaction, we propose to regularize the planner at test-time by balancing estimated returns and (epistemic) model uncertainty. We evaluate our method on a variety of visuo-motor control tasks in simulation and on a real robot, and find that our method enables few-shot finetuning to seen and unseen tasks even when offline data is limited. Videos, code, and data are available at https://yunhaifeng.com/FOWM .
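
A condensed sketch of the test-time regularization: candidate action sequences are ranked by estimated return minus a penalty on epistemic uncertainty, here approximated by disagreement across an ensemble of return estimates; the models and the trade-off weight are stand-in assumptions.

```python
import numpy as np

def plan(state, sample_actions, ensemble_returns, n_candidates=64, lam=1.0):
    best, best_score = None, -np.inf
    for _ in range(n_candidates):
        actions = sample_actions(state)
        returns = np.array([r(state, actions) for r in ensemble_returns])
        # Balance estimated return against (epistemic) model uncertainty.
        score = returns.mean() - lam * returns.std()
        if score > best_score:
            best, best_score = actions, score
    return best
```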

What Algorithms can Transformers Learn? A Study in Length Generalization

  • paper_url: http://arxiv.org/abs/2310.16028
  • repo_url: None
  • paper_authors: Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran
  • for: This work investigates the scope of Transformers' abilities on algorithmic tasks, asking if and when they can learn the true algorithm for solving a task.
  • methods: Using the RASP programming language (Weiss et al., 2021), the authors propose the RASP-Generalization Conjecture: Transformers tend to length-generalize on a task if the task can be solved by a short RASP program that works for all input lengths (an illustrative RASP-style program follows the abstract below).
  • results: The conjecture captures most known instances of length generalization on algorithmic tasks, and its insights drastically improve generalization on traditionally hard tasks such as parity and addition. On the theoretical side, a simple example shows that the "min-degree-interpolator" model of Abbe et al. (2023) fails to predict Transformers' out-of-distribution behavior where the conjecture succeeds. Overall, the work offers a new perspective on compositional generalization and the algorithmic capabilities of Transformers.
    Abstract Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity. This raises the question of if and when Transformer models can learn the true algorithm for solving a task. We study the scope of Transformers' abilities in the specific setting of length generalization on algorithmic tasks. Here, we propose a unifying framework to understand when and how Transformers can exhibit strong length generalization on a given task. Specifically, we leverage RASP (Weiss et al., 2021) -- a programming language designed for the computational model of a Transformer -- and introduce the RASP-Generalization Conjecture: Transformers tend to length generalize on a task if the task can be solved by a short RASP program which works for all input lengths. This simple conjecture remarkably captures most known instances of length generalization on algorithmic tasks. Moreover, we leverage our insights to drastically improve generalization performance on traditionally hard tasks (such as parity and addition). On the theoretical side, we give a simple example where the "min-degree-interpolator" model of learning from Abbe et al. (2023) does not correctly predict Transformers' out-of-distribution behavior, but our conjecture does. Overall, our work provides a novel perspective on the mechanisms of compositional generalization and the algorithmic capabilities of Transformers.
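
A minimal, pure-Python illustration of RASP-style `select`/`aggregate` primitives and a short program (sequence reversal) that works for every input length, the kind of program the conjecture says Transformers can length-generalize on. These helpers are a simplified reading of RASP, not the official library.

```python
def select(keys, queries, predicate):
    # boolean "attention" matrix: row q marks the keys it attends to
    return [[predicate(k, q) for k in keys] for q in queries]

def aggregate(attn, values):
    # return the (first) selected value per query position; None if none selected
    out = []
    for row in attn:
        chosen = [v for v, m in zip(values, row) if m]
        out.append(chosen[0] if chosen else None)
    return out

def reverse(tokens):
    # a short RASP-style program: position q attends to key n-1-q
    n = len(tokens)
    attn = select(range(n), range(n), lambda k, q: k == n - 1 - q)
    return aggregate(attn, tokens)

print(reverse(list("hello")))   # ['o', 'l', 'l', 'e', 'h'], for any input length
```

The program is length-independent because the selector is defined relative to the sequence length, which is exactly the property the conjecture ties to length generalization.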

Physically Explainable Deep Learning for Convective Initiation Nowcasting Using GOES-16 Satellite Observations

  • paper_url: http://arxiv.org/abs/2310.16015
  • repo_url: None
  • paper_authors: Da Fan, Steven J. Greybush, David John Gagne II, Eugene E. Clothiaux
  • for: Convection initiation (CI) nowcasting remains a challenging problem for both numerical weather prediction models and existing nowcasting algorithms.
  • methods: Object-based probabilistic deep learning models are developed to predict CI from multichannel infrared GOES-R satellite observations, with potential CI events identified by an objective radar-based approach over the Great Plains region.
  • results: The deep learning models significantly outperform a classical logistic model at lead times up to 1 hour, especially on the false alarm ratio; case studies show the model's dependence on cloud and moisture characteristics at multiple levels, and model explanation with different baselines reveals which features drive the decision-making process.
    Abstract Convection initiation (CI) nowcasting remains a challenging problem for both numerical weather prediction models and existing nowcasting algorithms. In this study, object-based probabilistic deep learning models are developed to predict CI based on multichannel infrared GOES-R satellite observations. The data come from patches surrounding potential CI events identified in Multi-Radar Multi-Sensor Doppler weather radar products over the Great Plains region from June and July 2020 and June 2021. An objective radar-based approach is used to identify these events. The deep learning models significantly outperform the classical logistic model at lead times up to 1 hour, especially on the false alarm ratio. Through case studies, the deep learning model exhibits the dependence on the characteristics of clouds and moisture at multiple levels. Model explanation further reveals the model's decision-making process with different baselines. The explanation results highlight the importance of moisture and cloud features at different levels depending on the choice of baseline. Our study demonstrates the advantage of using different baselines in further understanding model behavior and gaining scientific insights.

Bayesian imaging inverse problem with SA-Roundtrip prior via HMC-pCN sampler

  • paper_url: http://arxiv.org/abs/2310.17817
  • repo_url: https://github.com/qjy415417122/sa-roundtrip
  • paper_authors: Jiayu Qian, Yuanyuan Liu, Jingya Yang, Qingping Zhou
  • for: Solving Bayesian imaging inverse problems in scientific and engineering fields.
  • methods: A deep generative prior, SA-Roundtrip, embeds a self-attention structure within a bidirectional generative adversarial network to enable controlled sampling and identify the data's intrinsic dimension; Bayesian inference is then applied to the posterior in the low-dimensional latent space with the HMC-pCN sampler, which is ergodic under specific conditions (see the sketch below).
  • results: On computed tomography (CT) reconstruction with the MNIST and TomoPhantom datasets, the method outperforms state-of-the-art comparisons, yielding a robust, superior point estimator along with precise uncertainty quantification.
    Abstract Bayesian inference with deep generative prior has received considerable interest for solving imaging inverse problems in many scientific and engineering fields. The selection of the prior distribution is learned from, and therefore an important representation learning of, available prior measurements. The SA-Roundtrip, a novel deep generative prior, is introduced to enable controlled sampling generation and identify the data's intrinsic dimension. This prior incorporates a self-attention structure within a bidirectional generative adversarial network. Subsequently, Bayesian inference is applied to the posterior distribution in the low-dimensional latent space using the Hamiltonian Monte Carlo with preconditioned Crank-Nicolson (HMC-pCN) algorithm, which is proven to be ergodic under specific conditions. Experiments conducted on computed tomography (CT) reconstruction with the MNIST and TomoPhantom datasets reveal that the proposed method outperforms state-of-the-art comparisons, consistently yielding a robust and superior point estimator along with precise uncertainty quantification.
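
The preconditioned Crank-Nicolson (pCN) move is the dimension-robust core of the paper's sampler. A minimal sketch of plain pCN under a standard Gaussian prior follows; the paper additionally incorporates Hamiltonian dynamics, which is omitted here, and `log_likelihood` is a stand-in for the PET forward model and decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_likelihood(z):
    # hypothetical data-misfit term in the latent space; the N(0, I) prior
    # is handled implicitly by the pCN proposal itself
    return -0.5 * np.sum((z - 1.0) ** 2)

def pcn_sampler(z0, n_steps=5000, beta=0.3):
    """Preconditioned Crank-Nicolson MCMC: the proposal is prior-reversible,
    so the accept ratio involves only the likelihood."""
    z, samples = z0, []
    for _ in range(n_steps):
        prop = np.sqrt(1 - beta**2) * z + beta * rng.standard_normal(z.shape)
        if np.log(rng.uniform()) < log_likelihood(prop) - log_likelihood(z):
            z = prop
        samples.append(z.copy())
    return np.array(samples)

samples = pcn_sampler(np.zeros(4))
print(samples[1000:].mean(axis=0))   # posterior mean estimate after burn-in
```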

Human-in-the-Loop Task and Motion Planning for Imitation Learning

  • paper_url: http://arxiv.org/abs/2310.16014
  • repo_url: None
  • paper_authors: Ajay Mandlekar, Caelan Garrett, Danfei Xu, Dieter Fox
  • for: This work presents Human-in-the-Loop Task and Motion Planning (HITL-TAMP), a system for teaching robots complex, contact-rich manipulation tasks.
  • methods: A TAMP-gated control mechanism selectively gives and takes control between an automated TAMP system and a human teleoperator, letting one operator manage a fleet of robots for efficient data collection; the collected human data is combined with an imitation learning framework to train a TAMP-gated policy.
  • results: Compared with a conventional teleoperation system, users gathered more than 3x the number of demonstrations in the same time budget; proficient agents (75%+ success) could be trained from just 10 minutes of non-expert teleoperation data, and 2.1K demonstrations collected across 12 contact-rich, long-horizon tasks often produced near-perfect agents.
    Abstract Imitation learning from human demonstrations can teach robots complex manipulation skills, but is time-consuming and labor intensive. In contrast, Task and Motion Planning (TAMP) systems are automated and excel at solving long-horizon tasks, but they are difficult to apply to contact-rich tasks. In this paper, we present Human-in-the-Loop Task and Motion Planning (HITL-TAMP), a novel system that leverages the benefits of both approaches. The system employs a TAMP-gated control mechanism, which selectively gives and takes control to and from a human teleoperator. This enables the human teleoperator to manage a fleet of robots, maximizing data collection efficiency. The collected human data is then combined with an imitation learning framework to train a TAMP-gated policy, leading to superior performance compared to training on full task demonstrations. We compared HITL-TAMP to a conventional teleoperation system -- users gathered more than 3x the number of demos given the same time budget. Furthermore, proficient agents (75\%+ success) could be trained from just 10 minutes of non-expert teleoperation data. Finally, we collected 2.1K demos with HITL-TAMP across 12 contact-rich, long-horizon tasks and show that the system often produces near-perfect agents. Videos and additional results at https://hitltamp.github.io .

Dissecting In-Context Learning of Translations in GPTs

  • paper_url: http://arxiv.org/abs/2310.15987
  • repo_url: None
  • paper_authors: Vikas Raunak, Hany Hassan Awadalla, Arul Menezes
  • for: This work seeks a better understanding of the role demonstration attributes play in the in-context learning of translations with large language models (LLMs) such as GPT-3.
  • methods: High-quality, in-domain demonstrations are perturbed asymmetrically on the source and target sides to probe what the model learns from them (see the sketch below).
  • results: Source-side perturbation has surprisingly little impact, while target-side perturbation drastically reduces translation quality, indicating that the output text distribution provides the most important learning signal; a proposed Zero-Shot-Context method adds this signal automatically in zero-shot prompting, improving GPT-3's zero-shot translation performance and even making it competitive with few-shot prompted translations.
    Abstract Most of the recent work in leveraging Large Language Models (LLMs) such as GPT-3 for Machine Translation (MT) has focused on selecting the few-shot samples for prompting. In this work, we try to better understand the role of demonstration attributes for the in-context learning of translations through perturbations of high-quality, in-domain demonstrations. We find that asymmetric perturbation of the source-target mappings yield vastly different results. We show that the perturbation of the source side has surprisingly little impact, while target perturbation can drastically reduce translation quality, suggesting that it is the output text distribution that provides the most important learning signal during in-context learning of translations. We propose a method named Zero-Shot-Context to add this signal automatically in Zero-Shot prompting. We demonstrate that it improves upon the zero-shot translation performance of GPT-3, even making it competitive with few-shot prompted translations.
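
A small sketch of the asymmetric perturbation setup: shuffling only the source side or only the target side of the demonstration pairs before building the prompt. The demonstration pairs and prompt template are made up for illustration.

```python
import random

random.seed(0)
demos = [("Das Haus ist alt.", "The house is old."),
         ("Ich mag Kaffee.", "I like coffee."),
         ("Es regnet heute.", "It is raining today.")]

def build_prompt(pairs, query):
    lines = [f"German: {s}\nEnglish: {t}" for s, t in pairs]
    return "\n\n".join(lines + [f"German: {query}\nEnglish:"])

# Asymmetric perturbations: shuffle only one side of the source-target mapping
sources, targets = zip(*demos)
src_perturbed = list(zip(random.sample(sources, len(sources)), targets))  # mild effect
tgt_perturbed = list(zip(sources, random.sample(targets, len(targets))))  # large effect

print(build_prompt(src_perturbed, "Der Hund schläft."))
```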

Graph Deep Learning for Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2310.15978
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Andrea Cini, Ivan Marisca, Daniele Zambon, Cesare Alippi
  • for: This paper aims to provide a systematic methodological framework for graph-based deep learning forecasting of collections of correlated time series.
  • methods: The forecasting problem is formalized, and design principles are given for graph-based predictive models (spatiotemporal graph neural networks) that condition forecasts on a possibly dynamic graph spanning the time series collection, together with methods to assess their performance (a toy cell is sketched below).
  • results: Beyond an overview of the field, the paper offers design guidelines, recommendations, and best practices, as well as an in-depth discussion of open challenges and future research directions.
    Abstract Graph-based deep learning methods have become popular tools to process collections of correlated time series. Differently from traditional multivariate forecasting methods, neural graph-based predictors take advantage of pairwise relationships by conditioning forecasts on a (possibly dynamic) graph spanning the time series collection. The conditioning can take the form of an architectural inductive bias on the neural forecasting architecture, resulting in a family of deep learning models called spatiotemporal graph neural networks. Such relational inductive biases enable the training of global forecasting models on large time-series collections, while at the same time localizing predictions w.r.t. each element in the set (i.e., graph nodes) by accounting for local correlations among them (i.e., graph edges). Indeed, recent theoretical and practical advances in graph neural networks and deep learning for time series forecasting make the adoption of such processing frameworks appealing and timely. However, most of the studies in the literature focus on proposing variations of existing neural architectures by taking advantage of modern deep learning practices, while foundational and methodological aspects have not been subject to systematic investigation. To fill the gap, this paper aims to introduce a comprehensive methodological framework that formalizes the forecasting problem and provides design principles for graph-based predictive models and methods to assess their performance. At the same time, together with an overview of the field, we provide design guidelines, recommendations, and best practices, as well as an in-depth discussion of open challenges and future research directions.
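
To make the idea of localizing predictions through graph edges concrete, here is a toy spatiotemporal cell in PyTorch: one round of message passing over the sensor graph followed by a GRU update per node. All dimensions, the random graph, and the single-hop aggregation are illustrative assumptions.

```python
import torch
import torch.nn as nn

class STGNNCell(nn.Module):
    """One step of message passing over the graph followed by a GRU update."""
    def __init__(self, d_in, d_hidden):
        super().__init__()
        self.msg = nn.Linear(d_in, d_hidden)
        self.gru = nn.GRUCell(d_hidden, d_hidden)

    def forward(self, x_t, h, adj):
        # x_t: (nodes, d_in) readings at time t; adj: row-normalized adjacency
        m = adj @ torch.relu(self.msg(x_t))   # aggregate neighbor information
        return self.gru(m, h)

n, T = 5, 12
adj = torch.softmax(torch.randn(n, n), dim=-1)   # stand-in weighted graph
x = torch.randn(T, n, 1)                         # 5 correlated series, 12 steps
cell, head = STGNNCell(1, 16), nn.Linear(16, 1)
h = torch.zeros(n, 16)
for t in range(T):
    h = cell(x[t], h, adj)
print(head(h).shape)   # one-step-ahead forecast per node: torch.Size([5, 1])
```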

Accented Speech Recognition With Accent-specific Codebooks

  • paper_url: http://arxiv.org/abs/2310.15970
  • repo_url: https://github.com/csalt-research/accented-codebooks-asr
  • paper_authors: Darshan Prabhu, Preethi Jyothi, Sriram Ganapathy, Vinit Unni
  • for: This work aims to improve the performance of state-of-the-art automatic speech recognition (ASR) systems across speech accents, especially accents unseen during training.
  • methods: A novel accent adaptation approach for end-to-end ASR uses cross-attention with a trainable set of codebooks; the learnable codebooks capture accent-specific information and are integrated within the ASR encoder layers (see the sketch below).
  • results: On the Mozilla Common Voice multi-accented dataset, the approach yields significant gains on seen English accents (up to 37% relative improvement in word error rate) and on unseen accents (up to 5% relative improvement in WER); it also shows benefits in a zero-shot transfer setup on the L2Artic dataset and compares favorably with approaches based on accent adversarial training.
    Abstract Speech accents pose a significant challenge to state-of-the-art automatic speech recognition (ASR) systems. Degradation in performance across underrepresented accents is a severe deterrent to the inclusive adoption of ASR. In this work, we propose a novel accent adaptation approach for end-to-end ASR systems using cross-attention with a trainable set of codebooks. These learnable codebooks capture accent-specific information and are integrated within the ASR encoder layers. The model is trained on accented English speech, while the test data also contained accents which were not seen during training. On the Mozilla Common Voice multi-accented dataset, we show that our proposed approach yields significant performance gains not only on the seen English accents (up to $37\%$ relative improvement in word error rate) but also on the unseen accents (up to $5\%$ relative improvement in WER). Further, we illustrate benefits for a zero-shot transfer setup on the L2Artic dataset. We also compare the performance with other approaches based on accent adversarial training.
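
A minimal sketch of cross-attention from encoder states to a trainable codebook, the core mechanism the paper describes. The model dimension, number of codes, head count, and residual placement are assumptions for illustration.

```python
import torch
import torch.nn as nn

class CodebookAttention(nn.Module):
    """Cross-attention from encoder frames to a trainable accent codebook."""
    def __init__(self, d_model=256, n_codes=64):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(n_codes, d_model))
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

    def forward(self, enc):                          # enc: (batch, frames, d_model)
        cb = self.codebook.unsqueeze(0).expand(enc.size(0), -1, -1)
        out, _ = self.attn(query=enc, key=cb, value=cb)
        return enc + out                             # inject accent-specific information

layer = CodebookAttention()
print(layer(torch.randn(2, 100, 256)).shape)         # torch.Size([2, 100, 256])
```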

Representation Learning with Large Language Models for Recommendation

  • paper_url: http://arxiv.org/abs/2310.15950
  • repo_url: https://github.com/hkuds/rlmrec
  • paper_authors: Xubin Ren, Wei Wei, Lianghao Xia, Lixin Su, Suqi Cheng, Junfeng Wang, Dawei Yin, Chao Huang
  • for: Enhancing existing ID-based recommenders, which often disregard the textual information associated with users and items and suffer from noisy, biased implicit feedback, so they better capture user preferences and behaviors.
  • methods: A model-agnostic framework, RLMRec, integrates representation learning with LLMs: it incorporates auxiliary textual signals, develops an LLM-empowered user/item profiling paradigm, and aligns the semantic space of LLMs with the representation space of collaborative relational signals through a cross-view alignment framework (see the sketch below).
  • results: A theoretical analysis shows that incorporating textual signals through mutual information maximization improves representation quality; in evaluation, RLMRec enhances state-of-the-art recommender models while remaining efficient and robust to noisy data. Implementation code is available at https://github.com/HKUDS/RLMRec.
    Abstract Recommender systems have seen significant advancements with the influence of deep learning and graph neural networks, particularly in capturing complex user-item relationships. However, these graph-based recommenders heavily depend on ID-based data, potentially disregarding valuable textual information associated with users and items, resulting in less informative learned representations. Moreover, the utilization of implicit feedback data introduces potential noise and bias, posing challenges for the effectiveness of user preference learning. While the integration of large language models (LLMs) into traditional ID-based recommenders has gained attention, challenges such as scalability issues, limitations in text-only reliance, and prompt input constraints need to be addressed for effective implementation in practical recommender systems. To address these challenges, we propose a model-agnostic framework RLMRec that aims to enhance existing recommenders with LLM-empowered representation learning. It proposes a recommendation paradigm that integrates representation learning with LLMs to capture intricate semantic aspects of user behaviors and preferences. RLMRec incorporates auxiliary textual signals, develops a user/item profiling paradigm empowered by LLMs, and aligns the semantic space of LLMs with the representation space of collaborative relational signals through a cross-view alignment framework. This work further establish a theoretical foundation demonstrating that incorporating textual signals through mutual information maximization enhances the quality of representations. In our evaluation, we integrate RLMRec with state-of-the-art recommender models, while also analyzing its efficiency and robustness to noise data. Our implementation codes are available at https://github.com/HKUDS/RLMRec.
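
One standard instantiation of mutual-information-maximizing cross-view alignment is an InfoNCE loss between the two embedding spaces. The sketch below shows that form; it is not necessarily the paper's exact objective, and the embedding dimensions are assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(cf_emb, llm_emb, tau=0.2):
    """Contrastive cross-view alignment: each user's collaborative embedding
    should match its own LLM-derived profile embedding (InfoNCE lower-bounds
    the mutual information between the two views)."""
    a = F.normalize(cf_emb, dim=-1)
    b = F.normalize(llm_emb, dim=-1)
    logits = a @ b.T / tau                     # (batch, batch) similarity matrix
    labels = torch.arange(a.size(0))           # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

cf = torch.randn(32, 64, requires_grad=True)   # recommender-side embeddings
llm = torch.randn(32, 64)                      # projected LLM profile embeddings
print(info_nce(cf, llm))
```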

Combining Behaviors with the Successor Features Keyboard

  • paper_url: http://arxiv.org/abs/2310.15940
  • repo_url: None
  • paper_authors: Wilka Carvalho, Andre Saraiva, Angelos Filos, Andrew Kyle Lampinen, Loic Matthey, Richard L. Lewis, Honglak Lee, Satinder Singh, Danilo J. Rezende, Daniel Zoran
  • for: Transferring behavioral knowledge across tasks by combining known behaviors with Successor Features (SFs) and Generalized Policy Improvement (GPI), without the hand-designed state features and task encodings required by the Option Keyboard.
  • methods: The Successor Features Keyboard (SFK) enables transfer with discovered state features and task encodings; the Categorical Successor Feature Approximator (CSFA) is a novel learning algorithm that estimates SFs while jointly discovering state features and task encodings (GPI is sketched below).
  • results: SFK and CSFA achieve the first demonstration of transfer with SFs in a challenging 3D environment where all necessary representations are discovered; CSFA is the only compared method that discovers representations compatible with SF&GPI at this scale, and SFK transfers most quickly to long-horizon tasks against transfer learning baselines.
    Abstract The Option Keyboard (OK) was recently proposed as a method for transferring behavioral knowledge across tasks. OK transfers knowledge by adaptively combining subsets of known behaviors using Successor Features (SFs) and Generalized Policy Improvement (GPI). However, it relies on hand-designed state-features and task encodings which are cumbersome to design for every new environment. In this work, we propose the "Successor Features Keyboard" (SFK), which enables transfer with discovered state-features and task encodings. To enable discovery, we propose the "Categorical Successor Feature Approximator" (CSFA), a novel learning algorithm for estimating SFs while jointly discovering state-features and task encodings. With SFK and CSFA, we achieve the first demonstration of transfer with SFs in a challenging 3D environment where all the necessary representations are discovered. We first compare CSFA against other methods for approximating SFs and show that only CSFA discovers representations compatible with SF&GPI at this scale. We then compare SFK against transfer learning baselines and show that it transfers most quickly to long-horizon tasks.
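
For readers unfamiliar with SF&GPI, the transfer mechanism itself is compact: evaluate every known policy on a new task via the dot product of its successor features with the new task encoding, then act greedily over all of them. The values below are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, d = 4, 8

# Successor features psi_pi(s, a) for three known behaviors (hypothetical values)
psis = [rng.normal(size=(n_actions, d)) for _ in range(3)]
w_new = rng.normal(size=d)   # task encoding for a new task: r = phi . w

# Generalized Policy Improvement: Q_pi(s, a; w) = psi_pi(s, a) . w for each
# known policy, then take the best action across all of them.
q = np.stack([psi @ w_new for psi in psis])   # (policies, actions)
action = q.max(axis=0).argmax()
print("GPI action:", action)
```

SFK's contribution is that both `psi` and `w` are discovered rather than hand-designed, so this selection rule can be applied in rich 3D environments.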

E-Sparse: Boosting the Large Language Model Inference through Entropy-based N:M Sparsity

  • paper_url: http://arxiv.org/abs/2310.15929
  • repo_url: None
  • paper_authors: Yun Li, Lin Niu, Xipeng Zhang, Kai Liu, Jianchen Zhu, Zhanhui Kang
  • for: Boosting the inference of Large Language Models (LLMs) for generative AI, where traditional pruning methods are impractical due to unaffordable training costs and large computational demands.
  • methods: E-Sparse introduces the information entropy of hidden-state features into a pruning metric to improve the accuracy of N:M sparsity on LLMs: (1) entropy enhances the significance of parameter weights and input feature norms as a novel pruning metric, applied without modifying the remaining weights; (2) global naive shuffle and local block shuffle quickly optimize the information distribution and cope with the impact of N:M sparsity on accuracy (the metric and 2:4 masking are sketched below).
  • results: Implemented as a Sparse-GEMM on FasterTransformer and run on NVIDIA Ampere GPUs, E-Sparse speeds up model inference over the dense model by up to 1.53x and saves up to 43.52% of memory on the LLaMA family and OPT models, with acceptable accuracy loss.
    Abstract Traditional pruning methods are known to be challenging to work in Large Language Models (LLMs) for Generative AI because of their unaffordable training process and large computational demands. For the first time, we introduce the information entropy of hidden state features into a pruning metric design, namely E-Sparse, to improve the accuracy of N:M sparsity on LLM. E-Sparse employs the information richness to leverage the channel importance, and further incorporates several novel techniques to put it into effect: (1) it introduces information entropy to enhance the significance of parameter weights and input feature norms as a novel pruning metric, and performs N:M sparsity without modifying the remaining weights. (2) it designs global naive shuffle and local block shuffle to quickly optimize the information distribution and adequately cope with the impact of N:M sparsity on LLMs' accuracy. E-Sparse is implemented as a Sparse-GEMM on FasterTransformer and runs on NVIDIA Ampere GPUs. Extensive experiments on the LLaMA family and OPT models show that E-Sparse can significantly speed up the model inference over the dense model (up to 1.53X) and obtain significant memory saving (up to 43.52%), with acceptable accuracy loss.
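
A minimal sketch of an entropy-augmented saliency metric and 2:4 (N:M) masking, the idea behind E-Sparse. The per-channel histograms standing in for hidden-state statistics are hypothetical, the exact combination of terms is an assumption, and the global/local shuffles are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))              # weight matrix (out_features, in_features)
x_norm = rng.uniform(0.5, 2.0, size=16)   # per-input-channel feature norms

# Per-channel information entropy from (hypothetical) hidden-state histograms
hist = rng.dirichlet(np.ones(10), size=16)
entropy = -(hist * np.log(hist + 1e-12)).sum(axis=1)

metric = np.abs(W) * x_norm * entropy     # entropy-augmented saliency

# N:M = 2:4 sparsity: in every group of 4 input weights, keep the top 2
mask = np.zeros_like(W, dtype=bool)
for i in range(0, W.shape[1], 4):
    group = metric[:, i:i + 4]
    keep = np.argsort(group, axis=1)[:, -2:]
    np.put_along_axis(mask[:, i:i + 4], keep, True, axis=1)

W_sparse = W * mask                        # remaining weights left unmodified
print(mask.sum(axis=1))                    # 8 of 16 weights kept per row
```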

Characterizing Mechanisms for Factual Recall in Language Models

  • paper_url: http://arxiv.org/abs/2310.15910
  • repo_url: None
  • paper_authors: Qinan Yu, Jack Merullo, Ellie Pavlick
  • for: Studying how language models (LMs) resolve conflicts between facts memorized in pretraining and new information appearing in a given context.
  • methods: On a dataset querying world capitals, distributional and mechanistic determinants of LM behavior are analyzed, including how often an LM uses a counterfactual prefix (e.g., "The capital of Poland is London") to overwrite pretraining knowledge ("Warsaw"); head attribution identifies individual attention heads that promote either the memorized answer or the in-context answer in the logits.
  • results: In Pythia and GPT2, the training frequency of both the query country ("Poland") and the in-context city ("London") strongly affects the likelihood of using the counterfactual; scaling a single head's value vector at runtime raises the rate of generating the in-context answer to 88%, a proof of concept for dynamically controlling model behavior by localizing it to specific components (see the toy sketch below).
    Abstract Language Models (LMs) often must integrate facts they memorized in pretraining with new information that appears in a given context. These two sources can disagree, causing competition within the model, and it is unclear how an LM will resolve the conflict. On a dataset that queries for knowledge of world capitals, we investigate both distributional and mechanistic determinants of LM behavior in such situations. Specifically, we measure the proportion of the time an LM will use a counterfactual prefix (e.g., "The capital of Poland is London") to overwrite what it learned in pretraining ("Warsaw"). On Pythia and GPT2, the training frequency of both the query country ("Poland") and the in-context city ("London") highly affect the models' likelihood of using the counterfactual. We then use head attribution to identify individual attention heads that either promote the memorized answer or the in-context answer in the logits. By scaling up or down the value vector of these heads, we can control the likelihood of using the in-context answer on new data. This method can increase the rate of generating the in-context answer to 88\% of the time simply by scaling a single head at runtime. Our work contributes to a body of evidence showing that we can often localize model behaviors to specific components and provides a proof of concept for how future methods might control model behavior dynamically at runtime.
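
To illustrate what "scaling a head's value vector at runtime" means mechanically, here is a toy multi-head attention module with a per-head multiplier on the value vectors. This is a self-contained toy, not Pythia or GPT2 internals, and the choice of which head to scale is arbitrary here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyMHA(nn.Module):
    def __init__(self, d_model=32, n_heads=4):
        super().__init__()
        self.h, self.dk = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        self.head_scale = torch.ones(n_heads)   # 1.0 = unmodified behavior

    def forward(self, x):
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        split = lambda t: t.view(B, T, self.h, self.dk).transpose(1, 2)
        q, k, v = split(q), split(k), split(v)
        v = v * self.head_scale.view(1, self.h, 1, 1)   # scale value vectors per head
        att = torch.softmax(q @ k.transpose(-2, -1) / self.dk ** 0.5, dim=-1)
        y = (att @ v).transpose(1, 2).reshape(B, T, D)
        return self.out(y)

mha = ToyMHA()
mha.head_scale[2] = 3.0            # upweight one head (e.g., an "in-context" head)
print(mha(torch.randn(1, 5, 32)).shape)
```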

Is Probing All You Need? Indicator Tasks as an Alternative to Probing Embedding Spaces

  • paper_url: http://arxiv.org/abs/2310.15905
  • repo_url: None
  • paper_authors: Tal Levy, Omer Goldman, Reut Tsarfaty
  • for: This paper is written to explore the use of indicator tasks for evaluating the information encoded in word embeddings, and to demonstrate the advantages of using indicator tasks over traditional probing methods.
  • methods: The paper uses two test cases to demonstrate the effectiveness of indicator tasks: one dealing with gender debiasing and another with the erasure of morphological information from embedding spaces.
  • results: The paper shows that the application of a suitable indicator provides a more accurate picture of the information captured and removed compared to probes, and thus concludes that indicator tasks should be implemented and taken into consideration when eliciting information from embedded representations.
    Abstract The ability to identify and control different kinds of linguistic information encoded in vector representations of words has many use cases, especially for explainability and bias removal. This is usually done via a set of simple classification tasks, termed probes, to evaluate the information encoded in the embedding space. However, the involvement of a trainable classifier leads to entanglement between the probe's results and the classifier's nature. As a result, contemporary works on probing include tasks that do not involve training of auxiliary models. In this work we introduce the term indicator tasks for non-trainable tasks which are used to query embedding spaces for the existence of certain properties, and claim that this kind of tasks may point to a direction opposite to probes, and that this contradiction complicates the decision on whether a property exists in an embedding space. We demonstrate our claims with two test cases, one dealing with gender debiasing and another with the erasure of morphological information from embedding spaces. We show that the application of a suitable indicator provides a more accurate picture of the information captured and removed compared to probes. We thus conclude that indicator tasks should be implemented and taken into consideration when eliciting information from embedded representations.

AdaptiX – A Transitional XR Framework for Development and Evaluation of Shared Control Applications in Assistive Robotics

  • paper_url: http://arxiv.org/abs/2310.15887
  • repo_url: https://github.com/maxpascher/AdaptiX
  • paper_authors: Max Pascher, Felix Ferdinand Goldau, Kirill Kronhardt, Udo Frese, Jens Gerken
  • for: Improving the usability and adoption of assistive technologies such as collaborative robotic arms by combining increased user autonomy with a level of computer assistance through shared control, supported by a high-resolution simulation environment for developing and evaluating such applications.
  • methods: The free and open-source AdaptiX XR framework provides a simulated robotic arm with an example Virtual Reality (VR) scenario, multiple standard control interfaces, and a specialized recording/replay system; it can be extended for specific research needs and, via ROS integration, can control a real robotic arm in a PhysicalTwin approach without a simulation-reality gap.
  • results: The paper reviews AdaptiX's capabilities and limitations in detail and presents three bodies of research based on the framework, which lets HRI researchers rapidly design and test novel interaction methods, intervention strategies, and multi-modal feedback techniques without a physical robotic arm during early ideation, prototyping, and evaluation. AdaptiX is available at https://adaptix.robot-research.de.
    Abstract With the ongoing efforts to empower people with mobility impairments and the increase in technological acceptance by the general public, assistive technologies, such as collaborative robotic arms, are gaining popularity. Yet, their widespread success is limited by usability issues, specifically the disparity between user input and software control along the autonomy continuum. To address this, shared control concepts provide opportunities to combine the targeted increase of user autonomy with a certain level of computer assistance. This paper presents the free and open-source AdaptiX XR framework for developing and evaluating shared control applications in a high-resolution simulation environment. The initial framework consists of a simulated robotic arm with an example scenario in Virtual Reality (VR), multiple standard control interfaces, and a specialized recording/replay system. AdaptiX can easily be extended for specific research needs, allowing Human-Robot Interaction (HRI) researchers to rapidly design and test novel interaction methods, intervention strategies, and multi-modal feedback techniques, without requiring an actual physical robotic arm during the early phases of ideation, prototyping, and evaluation. Also, a Robot Operating System (ROS) integration enables the controlling of a real robotic arm in a PhysicalTwin approach without any simulation-reality gap. Here, we review the capabilities and limitations of AdaptiX in detail and present three bodies of research based on the framework. AdaptiX can be accessed at https://adaptix.robot-research.de.

KirchhoffNet: A Circuit Bridging Message Passing and Continuous-Depth Models

  • paper_url: http://arxiv.org/abs/2310.15872
  • repo_url: None
  • paper_authors: Zhengqi Gao, Fan-Keng Sun, Duane S. Boning
  • for: Introducing a class of neural network models, KirchhoffNet, grounded in a fundamental principle of analog electronic circuitry (Kirchhoff's current law), with close connections to message passing neural networks and continuous-depth networks.
  • methods: Kirchhoff's current law defines the network's dynamics, so no traditional layers (such as convolution, pooling, or linear layers) are required (a toy simulation of such dynamics is sketched below).
  • results: KirchhoffNet attains 98.86% test accuracy on MNIST, comparable with state-of-the-art (SOTA) results; it can be physically realized as an analog electronic circuit, and regardless of its parameter count, its forward calculation always completes within 1/f seconds, where f is the hardware's clock frequency, a promising technology for ultra-large-scale neural networks.
    Abstract In this paper, we exploit a fundamental principle of analog electronic circuitry, Kirchhoff's current law, to introduce a unique class of neural network models that we refer to as KirchhoffNet. KirchhoffNet establishes close connections with message passing neural networks and continuous-depth networks. We demonstrate that even in the absence of any traditional layers (such as convolution, pooling, or linear layers), KirchhoffNet attains 98.86% test accuracy on the MNIST dataset, comparable with state of the art (SOTA) results. What makes KirchhoffNet more intriguing is its potential in the realm of hardware. Contemporary deep neural networks are conventionally deployed on GPUs. In contrast, KirchhoffNet can be physically realized by an analog electronic circuit. Moreover, we justify that irrespective of the number of parameters within a KirchhoffNet, its forward calculation can always be completed within 1/f seconds, with f representing the hardware's clock frequency. This characteristic introduces a promising technology for implementing ultra-large-scale neural networks.
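
A toy numerical simulation of KCL-driven dynamics: the net current into each node sets the rate of change of its voltage, and a forward pass is just integrating this ODE. The tanh "device law", the symmetric conductances, and the explicit Euler integration are assumptions for illustration, not the paper's parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6                                       # circuit nodes
G = rng.uniform(0.1, 1.0, size=(n, n))      # learnable "conductances" between nodes
G = (G + G.T) / 2
np.fill_diagonal(G, 0)

def dv_dt(v):
    # Kirchhoff's current law: the net current into node k drives dv_k/dt;
    # the current on edge (j, k) is modeled as a nonlinear device g(v_j - v_k)
    diff = v[None, :] - v[:, None]          # diff[k, j] = v_j - v_k
    return (G * np.tanh(diff)).sum(axis=1)  # sum of incoming currents per node

v = rng.normal(size=n)                      # input encoded as initial node voltages
for _ in range(100):                        # forward pass = integrating the ODE
    v = v + 0.05 * dv_dt(v)                 # explicit Euler step
print(v)                                    # final voltages = network output
```

In hardware, the integration is performed by the physics of the circuit itself, which is why the forward pass time is bounded by the clock frequency rather than the parameter count.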

CP-BCS: Binary Code Summarization Guided by Control Flow Graph and Pseudo Code

  • paper_url: http://arxiv.org/abs/2310.16853
  • repo_url: None
  • paper_authors: Tong Ye, Lingfei Wu, Tengfei Ma, Xuhong Zhang, Yangkai Du, Peiyu Liu, Shouling Ji, Wenhai Wang
  • for: Automatically generating complete, human-readable summaries for binary functions, especially for stripped binaries (no symbol table or debug information), to aid reverse engineering.
  • methods: CP-BCS, a binary code summarization framework guided by a bidirectional instruction-level control flow graph and pseudo code incorporating expert knowledge, learns the comprehensive execution behavior and logic semantics of binary functions.
  • results: Evaluated on three binary optimization levels (O1, O2, and O3) and three computer architectures (X86, X64, and ARM), CP-BCS is superior and significantly improves the efficiency of reverse engineering.
    Abstract Automatically generating function summaries for binaries is an extremely valuable but challenging task, since it involves translating the execution behavior and semantics of the low-level language (assembly code) into human-readable natural language. However, most current works on understanding assembly code are oriented towards generating function names, which involve numerous abbreviations that make them still confusing. To bridge this gap, we focus on generating complete summaries for binary functions, especially for stripped binary (no symbol table and debug information in reality). To fully exploit the semantics of assembly code, we present a control flow graph and pseudo code guided binary code summarization framework called CP-BCS. CP-BCS utilizes a bidirectional instruction-level control flow graph and pseudo code that incorporates expert knowledge to learn the comprehensive binary function execution behavior and logic semantics. We evaluate CP-BCS on 3 different binary optimization levels (O1, O2, and O3) for 3 different computer architectures (X86, X64, and ARM). The evaluation results demonstrate CP-BCS is superior and significantly improves the efficiency of reverse engineering.

Topology-aware Debiased Self-supervised Graph Learning for Recommendation

  • paper_url: http://arxiv.org/abs/2310.15858
  • repo_url: https://github.com/malajikuai/tdsgl
  • paper_authors: Lei Han, Hui Yan, Zhicheng Qiao
  • for: Improving the accuracy and effectiveness of graph-based recommendation by avoiding the false negatives and missed positives that random negative sampling introduces in GCL-based collaborative filtering.
  • methods: Topology-aware Debiased Self-supervised Graph Learning (TDSGL) constructs contrastive pairs according to the semantic similarity between users (items) computed from interaction data, which reflects purchasing intent and item characteristics; negatives are chosen among users (items) with different semantic structures, and a feature extraction module converts semantically similar users (items) into auxiliary positive samples for more informative representations.
  • results: Experiments on three public datasets show significant improvements over state-of-the-art models; implementation code is available at https://github.com/malajikuai/TDSGL.
    Abstract In recommendation, graph-based Collaborative Filtering (CF) methods mitigate the data sparsity by introducing Graph Contrastive Learning (GCL). However, the random negative sampling strategy in these GCL-based CF models neglects the semantic structure of users (items), which not only introduces false negatives (negatives that are similar to anchor user (item)) but also ignores the potential positive samples. To tackle the above issues, we propose Topology-aware Debiased Self-supervised Graph Learning (TDSGL) for recommendation, which constructs contrastive pairs according to the semantic similarity between users (items). Specifically, since the original user-item interaction data commendably reflects the purchasing intent of users and certain characteristics of items, we calculate the semantic similarity between users (items) on interaction data. Then, given a user (item), we construct its negative pairs by selecting users (items) which embed different semantic structures to ensure the semantic difference between the given user (item) and its negatives. Moreover, for a user (item), we design a feature extraction module that converts other semantically similar users (items) into an auxiliary positive sample to acquire a more informative representation. Experimental results show that the proposed model outperforms the state-of-the-art models significantly on three public datasets. Our model implementation codes are available at https://github.com/malajikuai/TDSGL.

Using Artificial French Data to Understand the Emergence of Gender Bias in Transformer Language Models

  • paper_url: http://arxiv.org/abs/2310.15852
  • repo_url: None
  • paper_authors: Lina Conti, Guillaume Wisniewski
  • for: Exploring how neural language models discover linguistic properties of words, such as gender, as well as the rules governing their usage.
  • methods: An artificial corpus generated by a PCFG based on French is used to precisely control the gender distribution in the training data (a toy PCFG sampler is sketched below).
  • results: The setup determines under which conditions a model correctly captures gender information or, on the contrary, appears gender-biased.
    Abstract Numerous studies have demonstrated the ability of neural language models to learn various linguistic properties without direct supervision. This work takes an initial step towards exploring the less researched topic of how neural models discover linguistic properties of words, such as gender, as well as the rules governing their usage. We propose to use an artificial corpus generated by a PCFG based on French to precisely control the gender distribution in the training data and determine under which conditions a model correctly captures gender information or, on the contrary, appears gender-biased.
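
A tiny weighted PCFG sampler showing how a single probability knob (`P_FEM` below) controls the gender distribution of the generated corpus. The grammar is a toy stand-in, not the paper's French grammar.

```python
import random

random.seed(0)

# Tiny French-like PCFG; P_FEM controls the gender distribution of generated NPs
P_FEM = 0.5
grammar = {
    "S":  [(("NP", "V"), 1.0)],
    "NP": [(("Det_f", "N_f"), P_FEM), (("Det_m", "N_m"), 1 - P_FEM)],
    "Det_f": [(("la",), 1.0)], "N_f": [(("table",), 1.0)],
    "Det_m": [(("le",), 1.0)], "N_m": [(("livre",), 1.0)],
    "V":  [(("tombe",), 1.0)],
}

def sample(symbol):
    if symbol not in grammar:            # terminal symbol
        return [symbol]
    rules, weights = zip(*grammar[symbol])
    rhs = random.choices(rules, weights=weights)[0]
    return [word for s in rhs for word in sample(s)]

corpus = [" ".join(sample("S")) for _ in range(5)]
print(corpus)    # e.g., ['le livre tombe', 'la table tombe', ...]
```

Sweeping `P_FEM` away from 0.5 is what lets the authors ask when a model trained on the corpus remains calibrated and when it becomes gender-biased.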

Posterior Estimation for Dynamic PET imaging using Conditional Variational Inference

  • paper_url: http://arxiv.org/abs/2310.15850
  • repo_url: None
  • paper_authors: Xiaofeng Liu, Thibault Marin, Tiss Amal, Jonghye Woo, Georges El Fakhri, Jinsong Ouyang
  • for: Efficiently estimating the posterior distribution of kinetic parameters for dynamic PET imaging given a measured time activity curve.
  • methods: A deep learning framework introduces latent variables to counteract the information loss of the forward kinetic model, then estimates the posterior with a conditional variational autoencoder (CVAE) by optimizing its evidence lower bound; the trained decoder infers the posterior from a given measurement and latent variables sampled from a simple multivariate Gaussian.
  • results: The CVAE-based method is validated against unbiased MCMC sampling as the reference on low-dimensional data (a single brain region) with the simplified reference tissue model.
    Abstract This work aims efficiently estimating the posterior distribution of kinetic parameters for dynamic positron emission tomography (PET) imaging given a measurement of time of activity curve. Considering the inherent information loss from parametric imaging to measurement space with the forward kinetic model, the inverse mapping is ambiguous. The conventional (but expensive) solution can be the Markov Chain Monte Carlo (MCMC) sampling, which is known to produce unbiased asymptotical estimation. We propose a deep-learning-based framework for efficient posterior estimation. Specifically, we counteract the information loss in the forward process by introducing latent variables. Then, we use a conditional variational autoencoder (CVAE) and optimize its evidence lower bound. The well-trained decoder is able to infer the posterior with a given measurement and the sampled latent variables following a simple multivariate Gaussian distribution. We validate our CVAE-based method using unbiased MCMC as the reference for low-dimensional data (a single brain region) with the simplified reference tissue model.

Grid Frequency Forecasting in University Campuses using Convolutional LSTM

  • paper_url: http://arxiv.org/abs/2310.16071
  • repo_url: None
  • paper_authors: Aneesh Sathe, Wen Ren Yang
  • for: This paper proposes time series forecasting models for grid frequency based on Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks, to improve prediction accuracy and bolster power grid reliability.
  • methods: Individual Convolutional LSTM (ConvLSTM) models are trained and evaluated independently for each building on a university campus, forecasting grid frequency from each building's historical power consumption; an Ensemble Model aggregates the building-specific insights into comprehensive campus-wide forecasts (a hedged forecaster sketch follows the abstract below).
  • results: The proposed models outperform traditional forecasting techniques on metrics such as Mean Square Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE), while the per-building setup preserves the privacy and security of each building's power consumption data.
    Abstract The modern power grid is facing increasing complexities, primarily stemming from the integration of renewable energy sources and evolving consumption patterns. This paper introduces an innovative methodology that harnesses Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks to establish robust time series forecasting models for grid frequency. These models effectively capture the spatiotemporal intricacies inherent in grid frequency data, significantly enhancing prediction accuracy and bolstering power grid reliability. The research explores the potential and development of individualized Convolutional LSTM (ConvLSTM) models for buildings within a university campus, enabling them to be independently trained and evaluated for each building. Individual ConvLSTM models are trained on power consumption data for each campus building and forecast the grid frequency based on historical trends. The results convincingly demonstrate the superiority of the proposed models over traditional forecasting techniques, as evidenced by performance metrics such as Mean Square Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). Additionally, an Ensemble Model is formulated to aggregate insights from the building-specific models, delivering comprehensive forecasts for the entire campus. This approach ensures the privacy and security of power consumption data specific to each building.
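
The abstract does not pin down the exact ConvLSTM layout, so here is a hedged convolution-plus-LSTM forecaster in PyTorch as one plausible reading: a Conv1d front-end extracts local patterns from a window of past frequency samples, and an LSTM models the temporal dynamics. Window length, channel counts, and the one-step-ahead head are assumptions.

```python
import torch
import torch.nn as nn

class FreqForecaster(nn.Module):
    """Hypothetical per-building forecaster: Conv1d features -> LSTM -> next value."""
    def __init__(self, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU())
        self.lstm = nn.LSTM(16, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)         # next-step grid frequency

    def forward(self, x):                        # x: (batch, window) of past samples
        f = self.conv(x.unsqueeze(1))            # (batch, 16, window)
        out, _ = self.lstm(f.transpose(1, 2))    # (batch, window, hidden)
        return self.head(out[:, -1])             # (batch, 1)

model = FreqForecaster()
x = torch.randn(8, 60)       # dummy windows of 60 past frequency samples
print(model(x).shape)        # torch.Size([8, 1])
```

In the paper's setup, one such model per building is trained on that building's data, and an ensemble averages the per-building forecasts for the campus-wide prediction.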

Clinical Decision Support System for Unani Medicine Practitioners

  • paper_url: http://arxiv.org/abs/2310.18361
  • repo_url: None
  • paper_authors: Haider Sultan, Hafiza Farwa Mahmood, Noor Fatima, Marriyam Nadeem, Talha Waheed
  • for: Developing a web-based clinical decision support system to help apprentice Unani Medicines practitioners diagnose diseases more accurately and provide treatment options to patients remotely.
  • methods: Practitioners enter a patient's symptoms through a web interface, and the system automatically analyzes them to generate a list of probable diseases; it consists of three modules (an Online Clinical Decision Support System, an Artificial Intelligence Inference Engine, and a comprehensive Unani Medicines Database), employs Decision Trees, Deep Learning, and Natural Language Processing, and is built on React, FastAPI, and MySQL with APIs exposed for integration and extension.
  • results: The system has the potential to ease access to healthcare services and information, reduce cost, boost practitioner and patient satisfaction, and improve the speed and accuracy of the diagnostic process, benefiting Unani practitioners, patients, drug regulators, software developers, and medical researchers.
    Abstract Like other fields of Traditional Medicines, Unani Medicines have been found as an effective medical practice for ages. It is still widely used in the subcontinent, particularly in Pakistan and India. However, Unani Medicines Practitioners are lacking modern IT applications in their everyday clinical practices. An Online Clinical Decision Support System may address this challenge to assist apprentice Unani Medicines practitioners in their diagnostic processes. The proposed system provides a web-based interface to enter the patient's symptoms, which are then automatically analyzed by our system to generate a list of probable diseases. The system allows practitioners to choose the most likely disease and inform patients about the associated treatment options remotely. The system consists of three modules: an Online Clinical Decision Support System, an Artificial Intelligence Inference Engine, and a comprehensive Unani Medicines Database. The system employs advanced AI techniques such as Decision Trees, Deep Learning, and Natural Language Processing. For system development, the project team used a technology stack that includes React, FastAPI, and MySQL. Data and functionality of the application is exposed using APIs for integration and extension with similar domain applications. The novelty of the project is that it addresses the challenge of diagnosing diseases accurately and efficiently in the context of Unani Medicines principles. By leveraging the power of technology, the proposed Clinical Decision Support System has the potential to ease access to healthcare services and information, reduce cost, boost practitioner and patient satisfaction, improve speed and accuracy of the diagnostic process, and provide effective treatments remotely. The application will be useful for Unani Medicines Practitioners, Patients, Government Drug Regulators, Software Developers, and Medical Researchers.

A Diffusion Weighted Graph Framework for New Intent Discovery

  • paper_url: http://arxiv.org/abs/2310.15836
  • repo_url: https://github.com/yibai-shi/dwgf
  • paper_authors: Wenkai Shi, Wenbin An, Feng Tian, Qinghua Zheng, QianYing Wang, Ping Chen
  • for: New Intent Discovery (NID): recognizing both new and known intents from unlabeled data with the aid of limited labeled data containing only known intents, while generating more sufficient and reliable supervisory signals than random negative sampling.
  • methods: A Diffusion Weighted Graph Framework (DWGF) captures both the semantic similarities and structure relationships inherent in the data by diffusing neighborhood relationships along semantic paths for multiple hops, then samples positive keys and weighs them by semantic similarity and local structure for contrastive learning; at inference, a Graph Smoothing Filter (GSF) explicitly uses structure relationships to filter high-frequency noise in semantically ambiguous samples on cluster boundaries (see the sketch below).
  • results: The method outperforms state-of-the-art models on all evaluation metrics across multiple benchmark datasets; code and data are available at https://github.com/yibai-shi/DWGF.
    Abstract New Intent Discovery (NID) aims to recognize both new and known intents from unlabeled data with the aid of limited labeled data containing only known intents. Without considering structure relationships between samples, previous methods generate noisy supervisory signals which cannot strike a balance between quantity and quality, hindering the formation of new intent clusters and effective transfer of the pre-training knowledge. To mitigate this limitation, we propose a novel Diffusion Weighted Graph Framework (DWGF) to capture both semantic similarities and structure relationships inherent in data, enabling more sufficient and reliable supervisory signals. Specifically, for each sample, we diffuse neighborhood relationships along semantic paths guided by the nearest neighbors for multiple hops to characterize its local structure discriminately. Then, we sample its positive keys and weigh them based on semantic similarities and local structures for contrastive learning. During inference, we further propose Graph Smoothing Filter (GSF) to explicitly utilize the structure relationships to filter high-frequency noise embodied in semantically ambiguous samples on the cluster boundary. Extensive experiments show that our method outperforms state-of-the-art models on all evaluation metrics across multiple benchmark datasets. Code and data are available at https://github.com/yibai-shi/DWGF.
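
A minimal sketch of graph smoothing as low-pass filtering over a sample similarity graph: each embedding is repeatedly mixed with its neighbors, suppressing high-frequency noise near cluster boundaries. The thresholded similarity graph, mixing weight, and iteration count are assumptions, not the paper's exact filter.

```python
import numpy as np

def graph_smoothing_filter(X, A, k=2, alpha=0.5):
    """Low-pass filtering over the sample graph: mix each embedding with
    its neighbors, attenuating high-frequency (noisy) components."""
    A = A / A.sum(axis=1, keepdims=True)        # row-normalize adjacency
    for _ in range(k):
        X = (1 - alpha) * X + alpha * A @ X
    return X

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))                  # utterance embeddings
S = X @ X.T                                     # pairwise similarity
A = (S > np.quantile(S, 0.95)).astype(float)    # keep only the strongest edges
np.fill_diagonal(A, 1.0)                        # self-loops avoid empty rows
print(graph_smoothing_filter(X, A).shape)       # (100, 16), smoothed
```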

Automatic Aorta Segmentation with Heavily Augmented, High-Resolution 3-D ResUNet: Contribution to the SEG.A Challenge

  • paper_url: http://arxiv.org/abs/2310.15827
  • repo_url: https://github.com/mwod/sega_mw_2023
  • paper_authors: Marek Wodzinski, Henning Müller
  • for: Automatic aorta segmentation from 3-D medical volumes, a task made difficult by possible aortic dissection and hard-to-segment small branches; this is the MedGIFT team's contribution to the SEG.A challenge organized during MICCAI 2023.
  • methods: A fully automated algorithm based on a deep encoder-decoder architecture, a variant of the traditional convolutional U-Net; the main assumption is that data preprocessing and heavy augmentation matter more than the deep architecture, especially in low-data regimes.
  • results: A Dice score above 0.9 on all test cases with the highest stability among participants, scoring 1st, 4th, and 3rd in clinical evaluation, quantitative results, and volumetric meshing quality, respectively; the source code, pretrained model, and algorithm access are freely released on the Grand-Challenge platform.
    Abstract Automatic aorta segmentation from 3-D medical volumes is an important yet difficult task. Several factors make the problem challenging, e.g. the possibility of aortic dissection or the difficulty with segmenting and annotating the small branches. This work presents a contribution by the MedGIFT team to the SEG.A challenge organized during the MICCAI 2023 conference. We propose a fully automated algorithm based on deep encoder-decoder architecture. The main assumption behind our work is that data preprocessing and augmentation are much more important than the deep architecture, especially in low data regimes. Therefore, the solution is based on a variant of traditional convolutional U-Net. The proposed solution achieved a Dice score above 0.9 for all testing cases with the highest stability among all participants. The method scored 1st, 4th, and 3rd in terms of the clinical evaluation, quantitative results, and volumetric meshing quality, respectively. We freely release the source code, pretrained model, and provide access to the algorithm on the Grand-Challenge platform.

Rosetta Stone at KSAA-RD Shared Task: A Hop From Language Modeling To Word–Definition Alignment

  • paper_url: http://arxiv.org/abs/2310.15823
  • repo_url: None
  • paper_authors: Ahmed ElBakry, Mohamed Gabr, Muhammad ElNokrashy, Badr AlKhamissi
  • for: A reverse dictionary for Arabic: discovering a word from its provided definition, meaning, or description, addressing the "Tip-of-the-Tongue" (TOT) phenomenon; this is the winning solution for the Arabic Reverse Dictionary (KSAA-RD) shared task.
  • methods: For the subtask with Arabic definitions, an ensemble of finetuned Arabic BERT-based models predicts the word embedding for a given definition, with the final representation obtained by averaging the output embeddings across the ensemble (see the sketch below); for the subtask with English definitions, the English test definitions are translated into Arabic and fed to the same finetuned models.
  • results: This straightforward approach achieved the highest score on both subtasks.
    Abstract A Reverse Dictionary is a tool enabling users to discover a word based on its provided definition, meaning, or description. Such a technique proves valuable in various scenarios, aiding language learners who possess a description of a word without its identity, and benefiting writers seeking precise terminology. These scenarios often encapsulate what is referred to as the "Tip-of-the-Tongue" (TOT) phenomena. In this work, we present our winning solution for the Arabic Reverse Dictionary shared task. This task focuses on deriving a vector representation of an Arabic word from its accompanying description. The shared task encompasses two distinct subtasks: the first involves an Arabic definition as input, while the second employs an English definition. For the first subtask, our approach relies on an ensemble of finetuned Arabic BERT-based models, predicting the word embedding for a given definition. The final representation is obtained through averaging the output embeddings from each model within the ensemble. In contrast, the most effective solution for the second subtask involves translating the English test definitions into Arabic and applying them to the finetuned models originally trained for the first subtask. This straightforward method achieves the highest score across both subtasks.
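
The ensembling step itself is simple to state in code: average the output embeddings of several definition encoders. The random-projection "models" and the embedding dimensions below are stand-ins for the finetuned Arabic BERT models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for finetuned definition encoders: each maps a
# definition representation to a predicted word embedding.
def make_model(d_in=32, d_out=300):
    W = rng.normal(size=(d_in, d_out))
    return lambda x: x @ W

ensemble = [make_model() for _ in range(4)]
definition_features = rng.normal(size=32)

# Final representation: average the output embeddings across the ensemble
pred = np.mean([m(definition_features) for m in ensemble], axis=0)
print(pred.shape)   # (300,), compared against gold word embeddings by the task
```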

Discriminator Guidance for Autoregressive Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.15817
  • repo_url: None
  • paper_authors: Filip Ekström Kelvinius, Fredrik Lindsten
  • for: Bringing discriminator guidance, previously used for continuous diffusion models, to Autoregressive Diffusion Models in the discrete setting.
  • methods: The authors derive ways of using a discriminator together with a pretrained generative model in the discrete case: an optimal discriminator corrects the pretrained model and enables exact sampling from the underlying data distribution, and for the realistic sub-optimal case, a sequential Monte Carlo algorithm iteratively incorporates the discriminator's predictions during generation (the core reweighting idea is sketched below).
  • results: On molecular graph generation, the discriminator improves generative performance over using only the pretrained model.
    Abstract We introduce discriminator guidance in the setting of Autoregressive Diffusion Models. The use of a discriminator to guide a diffusion process has previously been used for continuous diffusion models, and in this work we derive ways of using a discriminator together with a pretrained generative model in the discrete case. First, we show that using an optimal discriminator will correct the pretrained model and enable exact sampling from the underlying data distribution. Second, to account for the realistic scenario of using a sub-optimal discriminator, we derive a sequential Monte Carlo algorithm which iteratively takes the predictions from the discrimiator into account during the generation process. We test these approaches on the task of generating molecular graphs and show how the discriminator improves the generative performance over using only the pretrained model.
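A generic sequential-Monte-Carlo sketch of the discriminator-guidance idea: particles are partial sequences extended by the pretrained model, reweighted by the discriminator's odds d/(1-d) (an estimate of the data/model density ratio), and resampled when the weights degenerate. `model_step` and `discriminator` are toy stand-ins, not the paper's models.

```python
import numpy as np

rng = np.random.default_rng(0)

def model_step(prefix):              # toy "pretrained" sampler over 5 tokens
    return int(rng.integers(5))

def discriminator(seq):              # toy discriminator that prefers token 3
    return 0.6 if seq and seq[-1] == 3 else 0.4

def smc_generate(num_particles=32, num_steps=10):
    particles = [[] for _ in range(num_particles)]
    weights = np.full(num_particles, 1.0 / num_particles)
    for _ in range(num_steps):
        particles = [p + [model_step(p)] for p in particles]   # propagate
        d = np.array([discriminator(p) for p in particles])    # in (0, 1)
        weights *= d / (1.0 - d)                               # reweight
        weights /= weights.sum()
        ess = 1.0 / np.sum(weights ** 2)                       # effective sample size
        if ess < num_particles / 2:                            # resample on degeneracy
            idx = rng.choice(num_particles, num_particles, p=weights)
            particles = [list(particles[i]) for i in idx]
            weights = np.full(num_particles, 1.0 / num_particles)
    return particles[int(np.argmax(weights))]

print(smc_generate())
```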

DALE: Generative Data Augmentation for Low-Resource Legal NLP

  • paper_url: http://arxiv.org/abs/2310.15799
  • repo_url: https://github.com/sreyan88/dale
  • paper_authors: Sreyan Ghosh, Chandra Kiran Evuru, Sonal Kumar, S Ramaneswaran, S Sakshi, Utkarsh Tyagi, Dinesh Manocha
  • for: A novel and effective generative data-augmentation framework for low-resource legal NLP.
  • methods: An encoder-decoder language model is pre-trained on an unsupervised text-denoising objective based on selective masking, which exploits the language characteristics of templatized legal documents to mask collocated spans of text (see the span-masking sketch below); the model then performs conditional generation to produce synthetic augmentations.
  • results: Across 13 datasets spanning 6 tasks and 4 low-resource settings, DALE outperforms all baselines, including LLMs, with improvements of 1%-50%.
    Abstract We present DALE, a novel and effective generative Data Augmentation framework for low-resource LEgal NLP. DALE addresses the challenges existing frameworks pose in generating effective data augmentations of legal documents - legal language, with its specialized vocabulary and complex semantics, morphology, and syntax, does not benefit from data augmentations that merely rephrase the source sentence. To address this, DALE, built on an Encoder-Decoder Language Model, is pre-trained on a novel unsupervised text denoising objective based on selective masking - our masking strategy exploits the domain-specific language characteristics of templatized legal documents to mask collocated spans of text. Denoising these spans helps DALE acquire knowledge about legal concepts, principles, and language usage. Consequently, it develops the ability to generate coherent and diverse augmentations with novel contexts. Finally, DALE performs conditional generation to generate synthetic augmentations for low-resource Legal NLP tasks. We demonstrate the effectiveness of DALE on 13 datasets spanning 6 tasks and 4 low-resource settings. DALE outperforms all our baselines, including LLMs, qualitatively and quantitatively, with improvements of 1%-50%.
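A minimal sketch of span-level selective masking, the shape of DALE's denoising corruption: contiguous spans are replaced by sentinel tokens for the encoder-decoder to reconstruct. Span positions here are random; DALE's actual strategy exploits collocations in templatized legal text to choose them.

```python
import random

def mask_spans(tokens, mask_ratio=0.3, mean_span=3, sentinel="<mask>"):
    n = len(tokens)
    budget = int(n * mask_ratio)          # total number of tokens to mask
    masked, attempts = list(tokens), 0
    while budget > 0 and attempts < 10 * n:
        attempts += 1
        span = min(max(1, round(random.expovariate(1 / mean_span))), budget, n)
        start = random.randrange(0, n - span + 1)
        if sentinel in masked[start:start + span]:
            continue                       # keep spans non-overlapping
        masked[start:start + span] = [sentinel] * span
        budget -= span
    collapsed = []                         # one sentinel per masked span
    for tok in masked:
        if tok == sentinel and collapsed and collapsed[-1] == sentinel:
            continue
        collapsed.append(tok)
    return collapsed

print(mask_spans("the lessee shall pay the rent on the first day of each month".split()))
```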

Random Entity Quantization for Parameter-Efficient Compositional Knowledge Graph Representation

  • paper_url: http://arxiv.org/abs/2310.15797
  • repo_url: https://github.com/jiaangl/randomquantization
  • paper_authors: Jiaang Li, Quan Wang, Yi Liu, Licheng Zhang, Zhendong Mao
  • for: Entity representation in knowledge graph (KG) representation learning, where existing KG embedding (KGE) methods face a scalability challenge.
  • methods: A new approach, random entity quantization, which represents each entity by codewords randomly matched from predefined small-scale codebooks (a sketch follows the abstract).
  • results: Random entity quantization performs on par with existing, carefully designed quantization strategies; an analysis of code-level entropy and codeword-level Jaccard distance explains this phenomenon.
    Abstract Representation Learning on Knowledge Graphs (KGs) is essential for downstream tasks. The dominant approach, KG Embedding (KGE), represents entities with independent vectors and faces the scalability challenge. Recent studies propose an alternative way for parameter efficiency, which represents entities by composing entity-corresponding codewords matched from predefined small-scale codebooks. We refer to the process of obtaining corresponding codewords of each entity as entity quantization, for which previous works have designed complicated strategies. Surprisingly, this paper shows that simple random entity quantization can achieve similar results to current strategies. We analyze this phenomenon and reveal that entity codes, the quantization outcomes for expressing entities, have higher entropy at the code level and Jaccard distance at the codeword level under random entity quantization. Therefore, different entities become more easily distinguished, facilitating effective KG representation. The above results show that current quantization strategies are not critical for KG representation, and there is still room for improvement in entity distinguishability beyond current strategies. The code to reproduce our results is available at https://github.com/JiaangL/RandomQuantization.
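A sketch of random entity quantization under illustrative sizes (not the paper's settings): each entity gets a random subset of codewords from a shared codebook, and its embedding composes (here, averages) the selected codeword vectors. Random code sets rarely overlap, which is the distinguishability property the paper analyzes via entropy and Jaccard distance.

```python
import numpy as np

rng = np.random.default_rng(0)
num_entities, codebook_size, codes_per_entity, dim = 10_000, 512, 8, 128

codebook = rng.normal(size=(codebook_size, dim))     # shared parameters
entity_codes = np.stack([
    rng.choice(codebook_size, size=codes_per_entity, replace=False)
    for _ in range(num_entities)
])                                                   # (entities, K)

def entity_embedding(entity_id: int) -> np.ndarray:
    return codebook[entity_codes[entity_id]].mean(axis=0)

# Pairwise Jaccard similarity between random code sets stays low.
a, b = set(entity_codes[0]), set(entity_codes[1])
print(len(a & b) / len(a | b))
```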

Improving generalization in large language models by learning prefix subspaces

  • paper_url: http://arxiv.org/abs/2310.15793
  • repo_url: None
  • paper_authors: Louis Falissard, Vincent Guigue, Laure Soulier
  • for: Fine-tuning large language models (LLMs) in the scarce-data ("few-shot") setting.
  • methods: Adapts neural-network subspace optimization, which improves generalization by jointly optimizing an entire simplex of models in parameter space to find wider local optima. Because the parameter count and deterministic initialization of large pretrained transformers make the original recipe impractical, the authors show that parameter-efficient fine-tuning (PEFT) is fully compatible with it and propose learning an entire simplex of continuous prefixes (sketched after the abstract).
  • results: On a variant of the GLUE benchmark adapted to few-shot learning, the two contributions jointly improve average performance over state-of-the-art methods. Implementation: https://github.com/Liloulou/prefix_subspace
    Abstract This article focuses on large language models (LLMs) fine-tuning in the scarce data regime (also known as the "few-shot" learning setting). We propose a method to increase the generalization capabilities of LLMs based on neural network subspaces. This optimization method, recently introduced in computer vision, aims to improve model generalization by identifying wider local optima through the joint optimization of an entire simplex of models in parameter space. Its adaptation to massive, pretrained transformers, however, poses some challenges. First, their considerable number of parameters makes it difficult to train several models jointly, and second, their deterministic parameter initialization schemes make them unfit for the subspace method as originally proposed. We show in this paper that "Parameter Efficient Fine-Tuning" (PEFT) methods, however, are perfectly compatible with this original approach, and propose to learn entire simplex of continuous prefixes. We test our method on a variant of the GLUE benchmark adapted to the few-shot learning setting, and show that both our contributions jointly lead to a gain in average performances compared to sota methods. The implementation can be found at the following link: https://github.com/Liloulou/prefix_subspace
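A sketch of the "simplex of prefixes" idea under assumed shapes: several prefix tensors are learned jointly, and each forward pass uses a random convex combination of them so that a whole region of prefix space stays good. Dirichlet sampling is one standard way to draw uniformly from a simplex.

```python
import torch

num_vertices, prefix_len, hidden = 3, 16, 768
vertices = torch.nn.ParameterList(
    [torch.nn.Parameter(torch.randn(prefix_len, hidden) * 0.02)
     for _ in range(num_vertices)]
)

def sample_prefix() -> torch.Tensor:
    # Dirichlet(1, ..., 1) gives a uniform draw from the simplex.
    alpha = torch.distributions.Dirichlet(torch.ones(num_vertices)).sample()
    return sum(a * v for a, v in zip(alpha, vertices))

prefix = sample_prefix()   # would be prepended to the frozen model's inputs
print(prefix.shape)        # torch.Size([16, 768])
```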

Guiding LLM to Fool Itself: Automatically Manipulating Machine Reading Comprehension Shortcut Triggers

  • paper_url: http://arxiv.org/abs/2310.18360
  • repo_url: https://github.com/mosh0110/guiding-llm
  • paper_authors: Mosh Levy, Shauli Ravfogel, Yoav Goldberg
  • for: Examines LLMs in machine reading comprehension (MRC) systems and how shortcut mechanisms threaten their reliability.
  • methods: Analyzes LLMs from two angles, as editors guided to insert shortcut triggers into text and as readers answering questions over the edited text, and introduces a framework that guides an editor to add potential shortcut triggers; with GPT4 as the editor, the edits successfully plant triggers that fool LLMs.
  • results: Even capable LLMs can be deceived using shortcut knowledge; strikingly, GPT4 is deceived by its own edits (a 15% drop in F1). These findings expose inherent vulnerabilities of LLMs to shortcut manipulation. The ShortcutQA dataset is released for future research.
    Abstract Recent applications of LLMs in Machine Reading Comprehension (MRC) systems have shown impressive results, but the use of shortcuts, mechanisms triggered by features spuriously correlated to the true label, has emerged as a potential threat to their reliability. We analyze the problem from two angles: LLMs as editors, guided to edit text to mislead LLMs; and LLMs as readers, who answer questions based on the edited text. We introduce a framework that guides an editor to add potential shortcuts-triggers to samples. Using GPT4 as the editor, we find it can successfully edit trigger shortcut in samples that fool LLMs. Analysing LLMs as readers, we observe that even capable LLMs can be deceived using shortcut knowledge. Strikingly, we discover that GPT4 can be deceived by its own edits (15% drop in F1). Our findings highlight inherent vulnerabilities of LLMs to shortcut manipulations. We publish ShortcutQA, a curated dataset generated by our framework for future research.

SequenceMatch: Revisiting the design of weak-strong augmentations for Semi-supervised learning

  • paper_url: http://arxiv.org/abs/2310.15787
  • repo_url: None
  • paper_authors: Khanh-Binh Nguyen
  • for: An efficient semi-supervised learning (SSL) method that counters confirmation bias when training on unlabeled data.
  • methods: Uses multiple data augmentations, notably adding a medium augmentation for unlabeled data, and defines two different consistency constraints for high- and low-confidence predictions to reduce the divergence between predictions on weakly and strongly augmented examples (a loss sketch follows the abstract).
  • results: SequenceMatch is more data-efficient than ReMixMatch and faster than both ReMixMatch (by 4x) and CoMatch (by 2x) while being more accurate; it consistently outperforms prior methods on CIFAR-10/100, SVHN, and STL-10 and surpasses the prior state of the art by a large margin on ImageNet with a 38.46% error rate.
    Abstract Semi-supervised learning (SSL) has become popular in recent years because it allows the training of a model using a large amount of unlabeled data. However, one issue that many SSL methods face is the confirmation bias, which occurs when the model is overfitted to the small labeled training dataset and produces overconfident, incorrect predictions. To address this issue, we propose SequenceMatch, an efficient SSL method that utilizes multiple data augmentations. The key element of SequenceMatch is the inclusion of a medium augmentation for unlabeled data. By taking advantage of different augmentations and the consistency constraints between each pair of augmented examples, SequenceMatch helps reduce the divergence between the prediction distribution of the model for weakly and strongly augmented examples. In addition, SequenceMatch defines two different consistency constraints for high and low-confidence predictions. As a result, SequenceMatch is more data-efficient than ReMixMatch, and more time-efficient than both ReMixMatch ($\times4$) and CoMatch ($\times2$) while having higher accuracy. Despite its simplicity, SequenceMatch consistently outperforms prior methods on standard benchmarks, such as CIFAR-10/100, SVHN, and STL-10. It also surpasses prior state-of-the-art methods by a large margin on large-scale datasets such as ImageNet, with a 38.46\% error rate. Code is available at https://github.com/beandkay/SequenceMatch.
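A sketch of the two-constraint consistency idea: pseudo-labels come from the weak view, and the medium/strong views are pulled toward them with a hard cross-entropy for confident predictions and a soft KL term otherwise. The threshold and exact pairing are illustrative, not the paper's precise formulation.

```python
import torch
import torch.nn.functional as F

def sequencematch_style_loss(logits_weak, logits_medium, logits_strong,
                             high_conf=0.95):
    probs_weak = logits_weak.detach().softmax(dim=-1)
    conf, pseudo = probs_weak.max(dim=-1)
    high = conf >= high_conf

    # High-confidence examples: hard pseudo-label cross-entropy.
    ce = (F.cross_entropy(logits_medium, pseudo, reduction="none")
          + F.cross_entropy(logits_strong, pseudo, reduction="none"))
    loss_high = (ce * high.float()).mean()

    # Low-confidence examples: soft KL toward the weak distribution.
    kl = (F.kl_div(logits_medium.log_softmax(-1), probs_weak,
                   reduction="none").sum(-1)
          + F.kl_div(logits_strong.log_softmax(-1), probs_weak,
                     reduction="none").sum(-1))
    loss_low = (kl * (~high).float()).mean()
    return loss_high + loss_low

loss = sequencematch_style_loss(torch.randn(8, 10), torch.randn(8, 10),
                                torch.randn(8, 10))
```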

3D Masked Autoencoders for Enhanced Privacy in MRI Scans

  • paper_url: http://arxiv.org/abs/2310.15778
  • repo_url: None
  • paper_authors: Lennart Alexander Van der Goten, Kevin Smith
  • for: Preventing MRI scans from leaking personally identifiable information.
  • methods: De-identifies the face in MRI data using masked autoencoders (a patch-masking sketch follows the abstract), in contrast to earlier removal-based methods and GAN-based remodeling.
  • results: The proposed CP-MAE outperforms all previous approaches in downstream task performance and de-identification, synthesizes scans at up to $256^3$ resolution (an eight-fold voxel increase over the previous $128^3$), and exhibits a highly robust training stage that makes it easy to fit the network on novel data.
    Abstract MRI scans provide valuable medical information, however they also contain sensitive and personally identifiable information (PII) that needs to be protected. Whereas MRI metadata is easily sanitized, MRI image data is a privacy risk because it contains information to render highly-realistic 3D visualizations of a patient's head, enabling malicious actors to possibly identify the subject by cross-referencing a database. Data anonymization and de-identification is concerned with ensuring the privacy and confidentiality of individuals' personal information. Traditional MRI de-identification methods remove privacy-sensitive parts (e.g. eyes, nose etc.) from a given scan. This comes at the expense of introducing a domain shift that can throw off downstream analyses. Recently, a GAN-based approach was proposed to de-identify a patient's scan by remodeling it (e.g. changing the face) rather than by removing parts. In this work, we propose CP-MAE, a model that de-identifies the face using masked autoencoders and that outperforms all previous approaches in terms of downstream task performance as well as de-identification. With our method we are able to synthesize scans of resolution up to $256^3$ (previously 128 cubic) which constitutes an eight-fold increase in the number of voxels. Using our construction we were able to design a system that exhibits a highly robust training stage, making it easy to fit the network on novel data.
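A sketch of the masked-autoencoder pretext on a 3D volume, with illustrative patch size and mask ratio: the scan is cut into cubic patches, a large fraction is dropped, and only the visible patches would feed the encoder.

```python
import numpy as np

def patchify_and_mask(volume, patch=16, mask_ratio=0.75, seed=0):
    d, h, w = volume.shape
    patches = volume.reshape(d // patch, patch, h // patch, patch,
                             w // patch, patch)
    patches = patches.transpose(0, 2, 4, 1, 3, 5).reshape(-1, patch ** 3)
    rng = np.random.default_rng(seed)
    n = patches.shape[0]
    keep = rng.permutation(n)[: int(n * (1 - mask_ratio))]
    return patches[keep], keep        # visible patches and their indices

vol = np.random.rand(128, 128, 128).astype(np.float32)
visible, idx = patchify_and_mask(vol)
print(visible.shape)                  # (128, 4096): 25% of the 512 patches
```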

MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications

  • paper_url: http://arxiv.org/abs/2310.15777
  • repo_url: None
  • paper_authors: Yizhe Yang, Huashan Sun, Jiawei Li, Runheng Liu, Yinghao Li, Yuhang Liu, Heyan Huang, Yang Gao
  • for: Developing lightweight bilingual large language models as a complement to ever-larger models, given the high cost of training and deploying LLMs and the scarcity of resources.
  • methods: MindLLM models with 1.3 billion and 3 billion parameters are trained from scratch; the paper recounts experience across every step of large-model development, including data construction, model architecture, evaluation, and applications, and introduces an instruction-tuning framework tailored to smaller models.
  • results: MindLLM consistently matches or surpasses larger open-source models on some public benchmarks and demonstrates agility and adaptability in vertical domains such as law and finance.
    Abstract Large Language Models (LLMs) have demonstrated remarkable performance across various natural language tasks, marking significant strides towards general artificial intelligence. While general artificial intelligence is leveraged by developing increasingly large-scale models, there could be another branch to develop lightweight custom models that better serve certain domains, taking into account the high cost of training and deploying LLMs and the scarcity of resources. In this paper, we present MindLLM, a novel series of bilingual lightweight large language models, trained from scratch, alleviating such burdens by offering models with 1.3 billion and 3 billion parameters. A thorough account of experiences accrued during large model development is given, covering every step of the process, including data construction, model architecture, evaluation, and applications. Such insights are hopefully valuable for fellow academics and developers. MindLLM consistently matches or surpasses the performance of other open-source larger models on some public benchmarks. We also introduce an innovative instruction tuning framework tailored for smaller models to enhance their capabilities efficiently. Moreover, we explore the application of MindLLM in specific vertical domains such as law and finance, underscoring the agility and adaptability of our lightweight models.

Causal Understanding of Why Users Share Hate Speech on Social Media

  • paper_url: http://arxiv.org/abs/2310.15772
  • repo_url: None
  • paper_authors: Dominique Geissler, Abdurahman Maarouf, Stefan Feuerriegel
  • for: Understanding why users reshare hate speech on social media, to help detect at-risk users and design effective mitigation strategies.
  • methods: A novel three-step causal framework: (1) debias the observational social media data with inverse propensity scoring; (2) model users' latent vulnerability to hate speech as a latent embedding from the debiased propensity scores; (3) model the causal effects of user attributes on the probability of sharing hate speech while controlling for that latent vulnerability (an IPW sketch follows the abstract).
  • results: Users with fewer followers, fewer friends, and fewer posts share more hate speech, while younger accounts share less.
    Abstract Hate speech on social media threatens the mental and physical well-being of individuals and is further responsible for real-world violence. An important driver behind the spread of hate speech and thus why hateful posts can go viral are reshares, yet little is known about why users reshare hate speech. In this paper, we present a comprehensive, causal analysis of the user attributes that make users reshare hate speech. However, causal inference from observational social media data is challenging, because such data likely suffer from selection bias, and there is further confounding due to differences in the vulnerability of users to hate speech. We develop a novel, three-step causal framework: (1) We debias the observational social media data by applying inverse propensity scoring. (2) We use the debiased propensity scores to model the latent vulnerability of users to hate speech as a latent embedding. (3) We model the causal effects of user attributes on users' probability of sharing hate speech, while controlling for the latent vulnerability of users to hate speech. Compared to existing baselines, a particular strength of our framework is that it models causal effects that are non-linear, yet still explainable. We find that users with fewer followers, fewer friends, and fewer posts share more hate speech. Younger accounts, in return, share less hate speech. Overall, understanding the factors that drive users to share hate speech is crucial for detecting individuals at risk of engaging in harmful behavior and for designing effective mitigation strategies.
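A minimal inverse-propensity-weighting sketch for step (1), on synthetic data: a logistic model estimates exposure propensities from covariates, and reweighting by them yields a debiased effect estimate. This illustrates plain IPW, not the paper's full three-step pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                                   # covariates
t = (rng.random(1000) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)  # exposure
y = 0.5 * t + X[:, 1] + rng.normal(size=1000)    # outcome; true effect is 0.5

p = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]
p = np.clip(p, 0.05, 0.95)                       # stabilize extreme weights

# IPW estimate of the average treatment effect.
ate = np.mean(t * y / p) - np.mean((1 - t) * y / (1 - p))
print(ate)                                       # close to 0.5
```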

Debiasing, calibrating, and improving Semi-supervised Learning performance via simple Ensemble Projector

  • paper_url: http://arxiv.org/abs/2310.15764
  • repo_url: https://github.com/beandkay/epass
  • paper_authors: Khanh-Binh Nguyen
  • for: Improving existing contrastive joint-training semi-supervised learning frameworks through better embeddings, rather than through additional network components and training procedures.
  • methods: Ensemble Projectors Aided for Semi-supervised Learning (EPASS): instead of storing the embeddings of a single projector in memory banks for contrastive learning, EPASS stores ensemble embeddings from multiple projectors (sketched after the abstract).
  • results: EPASS improves generalization, strengthens feature representation, and boosts performance; for example, it improves strong SSL baselines by 39.47%/31.39%/24.70% top-1 error rate using only 100k/1%/10% of labeled data for SimMatch, and achieves 40.24%/32.64%/25.90% top-1 error rate for CoMatch on ImageNet, with consistent gains across methods, architectures, and datasets.
    Abstract Recent studies on semi-supervised learning (SSL) have achieved great success. Despite their promising performance, current state-of-the-art methods tend toward increasingly complex designs at the cost of introducing more network components and additional training procedures. In this paper, we propose a simple method named Ensemble Projectors Aided for Semi-supervised Learning (EPASS), which focuses mainly on improving the learned embeddings to boost the performance of the existing contrastive joint-training semi-supervised learning frameworks. Unlike standard methods, where the learned embeddings from one projector are stored in memory banks to be used with contrastive learning, EPASS stores the ensemble embeddings from multiple projectors in memory banks. As a result, EPASS improves generalization, strengthens feature representation, and boosts performance. For instance, EPASS improves strong baselines for semi-supervised learning by 39.47\%/31.39\%/24.70\% top-1 error rate, while using only 100k/1\%/10\% of labeled data for SimMatch, and achieves 40.24\%/32.64\%/25.90\% top-1 error rate for CoMatch on the ImageNet dataset. These improvements are consistent across methods, network architectures, and datasets, proving the general effectiveness of the proposed methods. Code is available at https://github.com/beandkay/EPASS.
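A sketch of the ensemble-projector idea with illustrative dimensions: several projection heads map the backbone feature to normalized embeddings, and their mean is what would enter the contrastive memory bank.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnsembleProjector(nn.Module):
    def __init__(self, in_dim=2048, out_dim=128, num_heads=3):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, in_dim), nn.ReLU(),
                          nn.Linear(in_dim, out_dim))
            for _ in range(num_heads)
        ])

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        embs = [F.normalize(h(features), dim=-1) for h in self.heads]
        return F.normalize(torch.stack(embs).mean(dim=0), dim=-1)

proj = EnsembleProjector()
emb = proj(torch.randn(8, 2048))   # would be stored in the memory bank
```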

Integrating Language Models into Direct Speech Translation: An Inference-Time Solution to Control Gender Inflection

  • paper_url: http://arxiv.org/abs/2310.15752
  • repo_url: https://github.com/hlt-mt/fbk-fairseq
  • paper_authors: Dennis Fucci, Marco Gaido, Sara Papi, Mauro Cettolo, Matteo Negri, Luisa Bentivogli
  • for: Controlling speaker-related gender inflection in speech translation (ST) so systems neither default to masculine generics nor rely on potentially misleading vocal traits.
  • methods: An inference-time solution, requiring no dedicated retraining, that partially replaces the (biased) internal language model implicitly learned by the ST decoder with gender-specific external LMs (a fusion sketch follows the abstract).
  • results: On en->es/fr/it, gender accuracy for feminine forms improves by up to 31.0 points over the base models and 1.6 points over the best training-time mitigation strategy; the gains grow to 32.0 and 3.4 points in the challenging condition where speakers' vocal traits conflict with their gender.
    Abstract When translating words referring to the speaker, speech translation (ST) systems should not resort to default masculine generics nor rely on potentially misleading vocal traits. Rather, they should assign gender according to the speakers' preference. The existing solutions to do so, though effective, are hardly feasible in practice as they involve dedicated model re-training on gender-labeled ST data. To overcome these limitations, we propose the first inference-time solution to control speaker-related gender inflections in ST. Our approach partially replaces the (biased) internal language model (LM) implicitly learned by the ST decoder with gender-specific external LMs. Experiments on en->es/fr/it show that our solution outperforms the base models and the best training-time mitigation strategy by up to 31.0 and 1.6 points in gender accuracy, respectively, for feminine forms. The gains are even larger (up to 32.0 and 3.4) in the challenging condition where speakers' vocal traits conflict with their gender.
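A sketch of the kind of inference-time fusion this enables: at each decoding step, the ST model's log-probabilities are combined with a gender-specific external LM while partially subtracting an internal-LM estimate. The weights and the three score tensors are illustrative stand-ins; the paper's exact integration may differ.

```python
import torch

def fused_step(st_logprobs, external_lm_logprobs, internal_lm_logprobs,
               lm_weight=0.3, ilm_weight=0.2):
    scores = (st_logprobs
              + lm_weight * external_lm_logprobs    # add gender-specific LM
              - ilm_weight * internal_lm_logprobs)  # discount biased internal LM
    return torch.log_softmax(scores, dim=-1)        # renormalize over the vocab

vocab = 100
next_token_scores = fused_step(torch.randn(vocab).log_softmax(-1),
                               torch.randn(vocab).log_softmax(-1),
                               torch.randn(vocab).log_softmax(-1))
```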

The Hyperdimensional Transform: a Holographic Representation of Functions

  • paper_url: http://arxiv.org/abs/2310.16065
  • repo_url: None
  • paper_authors: Pieter Dewulf, Michiel Stock, Bernard De Baets
  • for: The paper introduces a new type of integral transform, the hyperdimensional transform, which maps square-integrable functions into noise-robust, holographic, high-dimensional representations called hyperdimensional vectors.
  • methods: The paper uses a set of stochastic, orthogonal basis functions to approximate a function as a linear combination of random functions (a numerical sketch follows the abstract), and defines the hyperdimensional transform and its inverse.
  • results: The paper discusses general transform-related properties such as uniqueness, approximation properties of the inverse transform, and the representation of integrals and derivatives, and provides straightforward, easily understandable code for computing the transform and solving differential equations.
    Abstract Integral transforms are invaluable mathematical tools to map functions into spaces where they are easier to characterize. We introduce the hyperdimensional transform as a new kind of integral transform. It converts square-integrable functions into noise-robust, holographic, high-dimensional representations called hyperdimensional vectors. The central idea is to approximate a function by a linear combination of random functions. We formally introduce a set of stochastic, orthogonal basis functions and define the hyperdimensional transform and its inverse. We discuss general transform-related properties such as its uniqueness, approximation properties of the inverse transform, and the representation of integrals and derivatives. The hyperdimensional transform offers a powerful, flexible framework that connects closely with other integral transforms, such as the Fourier, Laplace, and fuzzy transforms. Moreover, it provides theoretical foundations and new insights for the field of hyperdimensional computing, a computing paradigm that is rapidly gaining attention for efficient and explainable machine learning algorithms, with potential applications in statistical modelling and machine learning. In addition, we provide straightforward and easily understandable code, which can function as a tutorial and allows for the reproduction of the demonstrated examples, from computing the transform to solving differential equations.
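A numerical sketch of the central idea, "approximate a function by a linear combination of random functions". Random Fourier-style features stand in for the paper's stochastic orthogonal bases, and a least-squares fit stands in for the transform's inner products, so this shows only the flavor of encoding and reconstruction.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 2000                                      # hyperdimensional dimensionality
omegas = rng.normal(scale=10.0, size=D)       # random frequencies
phases = rng.uniform(0.0, 2 * np.pi, size=D)

def phi(x):
    # D random basis functions evaluated at the points x.
    return np.sqrt(2.0 / D) * np.cos(np.outer(x, omegas) + phases)

xs = np.linspace(-1.0, 1.0, 400)
f = np.exp(-4.0 * xs ** 2)                    # function to encode

Phi = phi(xs)                                 # (400, D)
coeffs, *_ = np.linalg.lstsq(Phi, f, rcond=None)  # "hyperdimensional vector"
f_hat = Phi @ coeffs                          # reconstruct from the representation
print(float(np.max(np.abs(f - f_hat))))      # near-zero residual
```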

Recurrent Linear Transformers

  • paper_url: http://arxiv.org/abs/2310.15719
  • repo_url: https://github.com/subho406/Recurrent-Linear-Transformers
  • paper_authors: Subhojeet Pramanik, Esraa Elelimy, Marlos C. Machado, Adam White
  • for: Proposes recurrent alternatives to transformer self-attention that make transformers practical for reinforcement learning tasks.
  • methods: A recurrent attention mechanism with a context-independent inference cost that still leverages long-range dependencies effectively (the underlying linear-attention recurrence is sketched after the abstract).
  • results: On 2D and 3D pixel-based partially observable reinforcement learning environments, inference is at least 40% cheaper than the state-of-the-art GTrXL with more than 50% less memory; the approach performs similarly or better, improving on GTrXL by more than 37% on harder tasks.
    Abstract The self-attention mechanism in the transformer architecture is capable of capturing long-range dependencies and it is the main reason behind its effectiveness in processing sequential data. Nevertheless, despite their success, transformers have two significant drawbacks that still limit their broader applicability: (1) In order to remember past information, the self-attention mechanism requires access to the whole history to be provided as context. (2) The inference cost in transformers is expensive. In this paper we introduce recurrent alternatives to the transformer self-attention mechanism that offer a context-independent inference cost, leverage long-range dependencies effectively, and perform well in practice. We evaluate our approaches in reinforcement learning problems where the aforementioned computational limitations make the application of transformers nearly infeasible. We quantify the impact of the different components of our architecture in a diagnostic environment and assess performance gains in 2D and 3D pixel-based partially-observable environments. When compared to a state-of-the-art architecture, GTrXL, inference in our approach is at least 40% cheaper while reducing memory use in more than 50%. Our approach either performs similarly or better than GTrXL, improving more than 37% upon GTrXL performance on harder tasks.
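The recurrence behind context-independent inference in linear attention, which this line of work builds on: a running outer-product state and a normalizer summarize the whole history, so each step costs O(1) regardless of context length. The feature map and shapes below are the standard linear-transformer choices, shown for a single head.

```python
import torch

def feature(x):                        # positive feature map: elu(x) + 1
    return torch.nn.functional.elu(x) + 1.0

def linear_attention_step(q, k, v, S, z):
    """q, k: (dim,), v: (dim_v,), S: (dim, dim_v), z: (dim,)."""
    phi_q, phi_k = feature(q), feature(k)
    S = S + torch.outer(phi_k, v)      # accumulate key-value summary
    z = z + phi_k                      # accumulate normalizer
    out = (phi_q @ S) / (phi_q @ z + 1e-6)
    return out, S, z

dim, dim_v = 8, 8
S, z = torch.zeros(dim, dim_v), torch.zeros(dim)
for _ in range(5):                     # constant cost per token
    q, k, v = torch.randn(dim), torch.randn(dim), torch.randn(dim_v)
    out, S, z = linear_attention_step(q, k, v, S, z)
```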

Solving large flexible job shop scheduling instances by generating a diverse set of scheduling policies with deep reinforcement learning

  • paper_url: http://arxiv.org/abs/2310.15706
  • repo_url: None
  • paper_authors: Imanol Echeverria, Maialen Murua, Roberto Santana
  • for: A method for robustly solving large instances of the flexible job shop scheduling problem (FJSSP).
  • methods: Models the FJSSP as a Markov decision process using graph neural networks, and makes inference more robust in two ways: generating a diverse set of scheduling policies that can be parallelized, and limiting them with dispatching rules (DRs; a toy DR baseline is sketched after the abstract).
  • results: On synthetically generated instances and public benchmarks, the approach outperforms dispatching rules and achieves better results than three other recent deep reinforcement learning methods on larger FJSSP instances.
    Abstract The Flexible Job Shop Scheduling Problem (FJSSP) has been extensively studied in the literature, and multiple approaches have been proposed within the heuristic, exact, and metaheuristic methods. However, the industry's demand to be able to respond in real-time to disruptive events has generated the necessity to be able to generate new schedules within a few seconds. Among these methods, under this constraint, only dispatching rules (DRs) are capable of generating schedules, even though their quality can be improved. To improve the results, recent methods have been proposed for modeling the FJSSP as a Markov Decision Process (MDP) and employing reinforcement learning to create a policy that generates an optimal solution assigning operations to machines. Nonetheless, there is still room for improvement, particularly in the larger FJSSP instances which are common in real-world scenarios. Therefore, the objective of this paper is to propose a method capable of robustly solving large instances of the FJSSP. To achieve this, we propose a novel way of modeling the FJSSP as an MDP using graph neural networks. We also present two methods to make inference more robust: generating a diverse set of scheduling policies that can be parallelized and limiting them using DRs. We have tested our approach on synthetically generated instances and various public benchmarks and found that our approach outperforms dispatching rules and achieves better results than three other recent deep reinforcement learning methods on larger FJSSP instances.
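For contrast with the learned policies, a dispatching rule is just a greedy heuristic. A toy earliest-finish-time rule for the FJSSP, purely illustrative and unrelated to the paper's GNN policies, might look like:

```python
def dr_schedule(jobs, num_machines):
    """jobs: list of jobs; each job is a list of operations, and each
    operation is a dict mapping eligible machine -> processing time."""
    machine_free = [0.0] * num_machines
    job_ready = [0.0] * len(jobs)
    next_op = [0] * len(jobs)
    remaining = sum(len(ops) for ops in jobs)
    makespan = 0.0
    while remaining:
        best = None   # (finish, start, job, machine); earliest finish wins
        for j, ops in enumerate(jobs):
            if next_op[j] >= len(ops):
                continue
            for m, t in ops[next_op[j]].items():
                start = max(machine_free[m], job_ready[j])
                cand = (start + t, start, j, m)
                if best is None or cand < best:
                    best = cand
        finish, _, j, m = best
        machine_free[m] = job_ready[j] = finish
        next_op[j] += 1
        remaining -= 1
        makespan = max(makespan, finish)
    return makespan

jobs = [[{0: 3, 1: 5}, {1: 2}], [{0: 2}, {0: 4, 1: 3}]]
print(dr_schedule(jobs, num_machines=2))   # 7.0 on this toy instance
```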

Learning-based Scheduling for Information Accuracy and Freshness in Wireless Networks

  • paper_url: http://arxiv.org/abs/2310.15705
  • repo_url: None
  • paper_authors: Hitesh Gudwani
  • for: A monitoring system with multiple sources, a single communication channel, and a single monitoring station, where each source measures a time-varying quantity with unknown accuracy and unknown transmission success probability, and the reward depends on the accuracy of the last received update and the Age-of-Information (AoI).
  • methods: Models the scheduling problem as a variant of the multi-armed bandit problem with sources as arms, and compares four standard bandit policies, ETC, epsilon-greedy, UCB, and TS, suitably adjusted to the system model, via simulations (an epsilon-greedy sketch follows the abstract); analytical guarantees are provided for ETC and epsilon-greedy.
  • results: Characterizes the lower bound on the cumulative regret achievable by any policy.
    Abstract We consider a system of multiple sources, a single communication channel, and a single monitoring station. Each source measures a time-varying quantity with varying levels of accuracy and one of them sends its update to the monitoring station via the channel. The probability of success of each attempted communication is a function of the source scheduled for transmitting its update. Both the probability of correct measurement and the probability of successful transmission of all the sources are unknown to the scheduler. The metric of interest is the reward received by the system which depends on the accuracy of the last update received by the destination and the Age-of-Information (AoI) of the system. We model our scheduling problem as a variant of the multi-arm bandit problem with sources as different arms. We compare the performance of all $4$ standard bandit policies, namely, ETC, $\epsilon$-greedy, UCB, and TS suitably adjusted to our system model via simulations. In addition, we provide analytical guarantees of $2$ of these policies, ETC, and $\epsilon$-greedy. Finally, we characterize the lower bound on the cumulative regret achievable by any policy.
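One of the four compared policies, epsilon-greedy, in its plain form on a synthetic Bernoulli reward that abstracts "accurate measurement successfully transmitted"; the reward model and parameters are illustrative, not the paper's AoI-dependent reward.

```python
import numpy as np

def epsilon_greedy(true_means, horizon=10_000, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n_arms = len(true_means)                  # arms = sources
    counts = np.zeros(n_arms)
    estimates = np.zeros(n_arms)
    total = 0.0
    for _ in range(horizon):
        if rng.random() < eps:
            arm = int(rng.integers(n_arms))   # explore
        else:
            arm = int(np.argmax(estimates))   # exploit
        reward = float(rng.random() < true_means[arm])
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total, estimates

total, est = epsilon_greedy([0.2, 0.5, 0.8])
print(total, est)                             # estimates approach the true means
```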

Towards Automated Recipe Genre Classification using Semi-Supervised Learning

  • paper_url: http://arxiv.org/abs/2310.15693
  • repo_url: None
  • paper_authors: Nazmus Sakib, G. M. Shahariar, Md. Mohsinul Kabir, Md. Kamrul Hasan, Hasan Mahmud
  • for: Providing a large-scale cooking-recipe dataset, 3A2M+, to support recipe classification and related tasks.
  • methods: Two named entity recognition (NER) extraction tools extend the NER list with entities missing from recipe directions, such as heat, time, or process; the dataset contains two million recipes with title, NER, directions, and extended NER features, labeled across nine genres (bakery, drinks, non-veg, vegetables, fast food, cereals, meals, sides, and fusions).
  • results: Traditional machine learning, deep learning, and pretrained language models classify recipes into genres with an overall accuracy of 98.6%; the title feature plays the most significant role (a minimal title-based classifier is sketched after the abstract).
    Abstract Sharing cooking recipes is a great way to exchange culinary ideas and provide instructions for food preparation. However, categorizing raw recipes found online into appropriate food genres can be challenging due to a lack of adequate labeled data. In this study, we present a dataset named the ``Assorted, Archetypal, and Annotated Two Million Extended (3A2M+) Cooking Recipe Dataset" that contains two million culinary recipes labeled in respective categories with extended named entities extracted from recipe descriptions. This collection of data includes various features such as title, NER, directions, and extended NER, as well as nine different labels representing genres including bakery, drinks, non-veg, vegetables, fast food, cereals, meals, sides, and fusions. The proposed pipeline named 3A2M+ extends the size of the Named Entity Recognition (NER) list to address missing named entities like heat, time or process from the recipe directions using two NER extraction tools. 3A2M+ dataset provides a comprehensive solution to the various challenging recipe-related tasks, including classification, named entity recognition, and recipe generation. Furthermore, we have demonstrated traditional machine learning, deep learning and pre-trained language models to classify the recipes into their corresponding genre and achieved an overall accuracy of 98.6\%. Our investigation indicates that the title feature played a more significant role in classifying the genre.
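A minimal title-based classifier of the traditional-ML flavor the paper benchmarks: TF-IDF features plus logistic regression. The tiny inline dataset is fabricated for illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

titles = ["chocolate fudge brownies", "iced mint lemonade",
          "grilled chicken skewers", "roasted vegetable medley"]
genres = ["bakery", "drinks", "non-veg", "vegetables"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(titles, genres)                    # train on labeled titles
print(clf.predict(["chocolate chip cookies"]))
```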

Improving Biomedical Abstractive Summarisation with Knowledge Aggregation from Citation Papers

  • paper_url: http://arxiv.org/abs/2310.15684
  • repo_url: https://github.com/tangg555/biomed-sum
  • paper_authors: Chen Tang, Shun Wang, Tomas Goldsack, Chenghua Lin
  • for: Improving biomedical abstractive summarisation by integrating domain-specific knowledge from the papers cited within the source article.
  • methods: A novel attention-based citation aggregation model integrates domain-specific knowledge from citation papers, letting the network generate summaries from both the paper content and relevant knowledge in the cited papers (sketched after the abstract); a large-scale biomedical summarisation dataset is also constructed and released.
  • results: The model outperforms state-of-the-art approaches, with substantial improvements in abstractive biomedical text summarisation.
    Abstract Abstracts derived from biomedical literature possess distinct domain-specific characteristics, including specialised writing styles and biomedical terminologies, which necessitate a deep understanding of the related literature. As a result, existing language models struggle to generate technical summaries that are on par with those produced by biomedical experts, given the absence of domain-specific background knowledge. This paper aims to enhance the performance of language models in biomedical abstractive summarisation by aggregating knowledge from external papers cited within the source article. We propose a novel attention-based citation aggregation model that integrates domain-specific knowledge from citation papers, allowing neural networks to generate summaries by leveraging both the paper content and relevant knowledge from citation papers. Furthermore, we construct and release a large-scale biomedical summarisation dataset that serves as a foundation for our research. Extensive experiments demonstrate that our model outperforms state-of-the-art approaches and achieves substantial improvements in abstractive biomedical text summarisation.
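A sketch of attention-based aggregation over cited papers, with illustrative dimensions: the source-document states attend over citation embeddings and the fused result conditions generation. This shows the shape of the mechanism, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CitationAggregator(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, doc_states, citation_embs):
        # doc_states: (B, T, dim); citation_embs: (B, num_citations, dim)
        fused, _ = self.attn(query=doc_states, key=citation_embs,
                             value=citation_embs)
        return doc_states + fused     # residual fusion into encoder states

agg = CitationAggregator()
out = agg(torch.randn(2, 100, 512), torch.randn(2, 5, 512))
```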

Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive Survey and Evaluation

  • paper_url: http://arxiv.org/abs/2310.15676
  • repo_url: None
  • paper_authors: Yinjie Lei, Zixuan Wang, Feng Chen, Guoqing Wang, Peng Wang, Yang Yang
  • for: Surveys recent progress and trends in multi-modal 3D scene understanding.
  • methods: Formally defines the various 3D multi-modal tasks, summarizes their inherent challenges, and presents a taxonomy that categorizes existing methods by modality and task, covering 3D+2D multi-camera images and 3D+language.
  • results: Offers comparative results of recent approaches on several benchmark datasets together with insightful analysis, and discusses unresolved issues and avenues for future research.
    Abstract Multi-modal 3D scene understanding has gained considerable attention due to its wide applications in many areas, such as autonomous driving and human-computer interaction. Compared to conventional single-modal 3D understanding, introducing an additional modality not only elevates the richness and precision of scene interpretation but also ensures a more robust and resilient understanding. This becomes especially crucial in varied and challenging environments where solely relying on 3D data might be inadequate. While there has been a surge in the development of multi-modal 3D methods over past three years, especially those integrating multi-camera images (3D+2D) and textual descriptions (3D+language), a comprehensive and in-depth review is notably absent. In this article, we present a systematic survey of recent progress to bridge this gap. We begin by briefly introducing a background that formally defines various 3D multi-modal tasks and summarizes their inherent challenges. After that, we present a novel taxonomy that delivers a thorough categorization of existing methods according to modalities and tasks, exploring their respective strengths and limitations. Furthermore, comparative results of recent approaches on several benchmark datasets, together with insightful analysis, are offered. Finally, we discuss the unresolved issues and provide several potential avenues for future research.

Confounder Balancing in Adversarial Domain Adaptation for Pre-Trained Large Models Fine-Tuning

  • paper_url: http://arxiv.org/abs/2310.16062
  • repo_url: None
  • paper_authors: Shuoran Jiang, Qingcai Chen, Yang Xiang, Youcheng Pan, Xiangping Wu
  • for: This paper proposes adversarial domain adaptation (ADA) with confounder balancing for fine-tuning pre-trained large models (PLMs).
  • methods: The proposed method, ADA-CBF, includes a PLM as the foundation model for a feature extractor, a domain classifier, and a confounder classifier, which are jointly trained with an adversarial loss (a gradient-reversal sketch follows the abstract).
  • results: Compared to existing ADA methods, ADA-CBF can correctly identify confounders in domain-invariant features and eliminate confounder biases in the features extracted from PLMs. Empirical results on natural language processing and computer vision downstream tasks show that ADA-CBF outperforms the newest GPT-4, LLaMA2, ViT, and ADA methods.
    Abstract The excellent generalization, contextual learning, and emergence abilities in the pre-trained large models (PLMs) handle specific tasks without direct training data, making them the better foundation models in the adversarial domain adaptation (ADA) methods to transfer knowledge learned from the source domain to target domains. However, existing ADA methods fail to account for the confounder properly, which is the root cause of the source data distribution that differs from the target domains. This study proposes an adversarial domain adaptation with confounder balancing for PLMs fine-tuning (ADA-CBF). The ADA-CBF includes a PLM as the foundation model for a feature extractor, a domain classifier and a confounder classifier, and they are jointly trained with an adversarial loss. This loss is designed to improve the domain-invariant representation learning by diluting the discrimination in the domain classifier. At the same time, the adversarial loss also balances the confounder distribution among source and unmeasured domains in training. Compared to existing ADA methods, ADA-CBF can correctly identify confounders in domain-invariant features, thereby eliminating the confounder biases in the extracted features from PLMs. The confounder classifier in ADA-CBF is designed as a plug-and-play and can be applied in the confounder measurable, unmeasurable, or partially measurable environments. Empirical results on natural language processing and computer vision downstream tasks show that ADA-CBF outperforms the newest GPT-4, LLaMA2, ViT and ADA methods.
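One standard way to implement the adversarial domain branch is DANN-style gradient reversal, sketched below with stand-in linear modules; the paper's exact adversarial loss and confounder-balancing term may differ.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lamb * grad, None   # flip gradients into the extractor

def grad_reverse(x, lamb=1.0):
    return GradReverse.apply(x, lamb)

extractor = nn.Linear(32, 16)           # stands in for the PLM feature extractor
domain_head = nn.Linear(16, 2)
confounder_head = nn.Linear(16, 2)

x = torch.randn(4, 32)
h = extractor(x)
domain_logits = domain_head(grad_reverse(h))   # adversarial, domain-confusing
confounder_logits = confounder_head(h)         # confounder-balancing branch
```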

A Survey on Detection of LLMs-Generated Content

  • paper_url: http://arxiv.org/abs/2310.15654
  • repo_url: https://github.com/xianjun-yang/awesome_papers_on_llms_detection
  • paper_authors: Xianjun Yang, Liangming Pan, Xuandong Zhao, Haifeng Chen, Linda Petzold, William Yang Wang, Wei Cheng
  • for: The paper provides a comprehensive survey of existing detection strategies and benchmarks for identifying content generated by advanced large language models (LLMs), and identifies key challenges and prospects in the field.
  • methods: The paper scrutinizes the differences between existing detection strategies and benchmarks, and advocates a multi-faceted approach to defend against various attacks as LLM capabilities advance rapidly.
  • results: The paper offers a broad view of the current landscape of LLM-generated content detection, serving as a guiding reference for researchers and practitioners striving to uphold the integrity of digital information in an era increasingly dominated by synthetic content. The surveyed papers are summarized and consistently updated at https://github.com/Xianjun-Yang/Awesome_papers_on_LLMs_detection.git.
    Abstract The burgeoning capabilities of advanced large language models (LLMs) such as ChatGPT have led to an increase in synthetic content generation with implications across a variety of sectors, including media, cybersecurity, public discourse, and education. As such, the ability to detect LLMs-generated content has become of paramount importance. We aim to provide a detailed overview of existing detection strategies and benchmarks, scrutinizing their differences and identifying key challenges and prospects in the field, advocating for more adaptable and robust models to enhance detection accuracy. We also posit the necessity for a multi-faceted approach to defend against various attacks to counter the rapidly advancing capabilities of LLMs. To the best of our knowledge, this work is the first comprehensive survey on the detection in the era of LLMs. We hope it will provide a broad understanding of the current landscape of LLMs-generated content detection, offering a guiding reference for researchers and practitioners striving to uphold the integrity of digital information in an era increasingly dominated by synthetic content. The relevant papers are summarized and will be consistently updated at https://github.com/Xianjun-Yang/Awesome_papers_on_LLMs_detection.git.

Career Path Prediction using Resume Representation Learning and Skill-based Matching

  • paper_url: http://arxiv.org/abs/2310.15636
  • repo_url: None
  • paper_authors: Jens-Joris Decorte, Jeroen Van Hautte, Johannes Deleu, Chris Develder, Thomas Demeester
  • for: Predicting a worker's next career step (career path prediction), with applications such as turnover prevention and internal job mobility.
  • methods: Leverages the previously unexplored textual descriptions in resume work-experience sections; introduces a structured dataset of 2,164 anonymized career histories annotated with ESCO occupation labels, and CareerBERT, a novel representation-learning approach designed for work-history data.
  • results: A skill-based model and a text-based model achieve 35.24% and 39.61% recall@10 respectively on the dataset, and the two approaches are complementary: a hybrid achieves the strongest result at 43.01% recall@10.
    Abstract The impact of person-job fit on job satisfaction and performance is widely acknowledged, which highlights the importance of providing workers with next steps at the right time in their career. This task of predicting the next step in a career is known as career path prediction, and has diverse applications such as turnover prevention and internal job mobility. Existing methods to career path prediction rely on large amounts of private career history data to model the interactions between job titles and companies. We propose leveraging the unexplored textual descriptions that are part of work experience sections in resumes. We introduce a structured dataset of 2,164 anonymized career histories, annotated with ESCO occupation labels. Based on this dataset, we present a novel representation learning approach, CareerBERT, specifically designed for work history data. We develop a skill-based model and a text-based model for career path prediction, which achieve 35.24% and 39.61% recall@10 respectively on our dataset. Finally, we show that both approaches are complementary as a hybrid approach achieves the strongest result with 43.01% recall@10.

Using Slisemap to interpret physical data

  • paper_url: http://arxiv.org/abs/2310.15610
  • repo_url: None
  • paper_authors: Lauri Seppäläinen, Anton Björklund, Vitus Besel, Kai Puolamäki
  • for: Describes the application of a recently introduced manifold visualisation method, Slisemap, to datasets from physics and chemistry.
  • methods: Slisemap combines manifold visualisation with explainable artificial intelligence, which is used to investigate the decision processes of black-box machine learning models and complex simulators; it finds an embedding in which data items with similar local explanations are grouped together, giving an overview of the different behaviours of a black-box model. Because the patterns in the embedding reflect a target property, Slisemap is a supervised manifold visualisation method.
  • results: Slisemap is shown to be usable and evaluable on physical data, and it helps find meaningful information about classification and regression models trained on these datasets.
    Abstract Manifold visualisation techniques are commonly used to visualise high-dimensional datasets in physical sciences. In this paper we apply a recently introduced manifold visualisation method, called Slise, on datasets from physics and chemistry. Slisemap combines manifold visualisation with explainable artificial intelligence. Explainable artificial intelligence is used to investigate the decision processes of black box machine learning models and complex simulators. With Slisemap we find an embedding such that data items with similar local explanations are grouped together. Hence, Slisemap gives us an overview of the different behaviours of a black box model. This makes Slisemap into a supervised manifold visualisation method, where the patterns in the embedding reflect a target property. In this paper we show how Slisemap can be used and evaluated on physical data and that Slisemap is helpful in finding meaningful information on classification and regression models trained on these datasets.

tagE: Enabling an Embodied Agent to Understand Human Instructions

  • paper_url: http://arxiv.org/abs/2310.15605
  • repo_url: https://github.com/csarkar/tage
  • paper_authors: Chayan Sarkar, Avik Mitra, Pradip Pramanick, Tapas Nayak
  • for: Enabling embodied agents to understand natural-language instructions, whose inherent ambiguity and incompleteness make human intent hard to decipher.
  • methods: tagE (task and argument grounding for Embodied agents): a neural model with an encoder-decoder framework enriched with nested decoding extracts a series of tasks and their corresponding arguments from complex natural-language instructions; the extracted tasks are grounded to the robot's established skill set, and the arguments to objects in the environment.
  • results: On a curated dataset of complex instructions, tagE outperforms robust baseline models.
    Abstract Natural language serves as the primary mode of communication when an intelligent agent with a physical presence engages with human beings. While a plethora of research focuses on natural language understanding (NLU), encompassing endeavors such as sentiment analysis, intent prediction, question answering, and summarization, the scope of NLU directed at situations necessitating tangible actions by an embodied agent remains limited. The inherent ambiguity and incompleteness inherent in natural language present challenges for intelligent agents striving to decipher human intention. To tackle this predicament head-on, we introduce a novel system known as task and argument grounding for Embodied agents (tagE). At its core, our system employs an inventive neural network model designed to extract a series of tasks from complex task instructions expressed in natural language. Our proposed model adopts an encoder-decoder framework enriched with nested decoding to effectively extract tasks and their corresponding arguments from these intricate instructions. These extracted tasks are then mapped (or grounded) to the robot's established collection of skills, while the arguments find grounding in objects present within the environment. To facilitate the training and evaluation of our system, we have curated a dataset featuring complex instructions. The results of our experiments underscore the prowess of our approach, as it outperforms robust baseline models.

Emergent Communication in Interactive Sketch Question Answering

  • paper_url: http://arxiv.org/abs/2310.15597
  • repo_url: https://github.com/mediabrain-sjtu/ecisqa
  • paper_authors: Zixing Lei, Yiming Zhang, Yuxin Xiong, Siheng Chen
  • for: Studies emergent communication (EC) through sketches and how multi-round interaction, previously neglected despite being indispensable in human communication, shapes it.
  • methods: Introduces the Interactive Sketch Question Answering (ISQA) task, in which two collaborative players interact through sketches over multiple rounds to answer a question about an image, and designs a new, efficient interactive EC system that balances question-answering accuracy, drawing complexity, and human interpretability.
  • results: Experiments including human evaluation show that the multi-round interactive mechanism enables targeted, efficient communication between intelligent agents with decent human interpretability.
    Abstract Vision-based emergent communication (EC) aims to learn to communicate through sketches and demystify the evolution of human communication. Ironically, previous works neglect multi-round interaction, which is indispensable in human communication. To fill this gap, we first introduce a novel Interactive Sketch Question Answering (ISQA) task, where two collaborative players are interacting through sketches to answer a question about an image in a multi-round manner. To accomplish this task, we design a new and efficient interactive EC system, which can achieve an effective balance among three evaluation factors, including the question answering accuracy, drawing complexity and human interpretability. Our experimental results including human evaluation demonstrate that multi-round interactive mechanism facilitates targeted and efficient communication between intelligent agents with decent human interpretability.
    摘要 基于视觉的涌现通信(EC)旨在通过草图学会沟通,并揭示人类沟通的演化。讽刺的是,以往工作忽略了多轮交互,而这在人类沟通中不可或缺。为填补这一空白,我们首先提出了一个新的交互式草图问答(ISQA)任务:两名合作玩家通过草图以多轮方式回答关于一张图像的问题。为完成该任务,我们设计了一种新的高效交互式 EC 系统,能够在问答准确率、绘图复杂度和人类可解释性这三个评价因素之间取得有效平衡。包括人工评价在内的实验结果表明,多轮交互机制能够促进智能体之间有针对性且高效的沟通,并具有良好的人类可解释性。

Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression

  • paper_url: http://arxiv.org/abs/2310.15594
  • repo_url: None
  • paper_authors: Jiduan Liu, Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai, Dongyan Zhao, Ran Lucien Wang, Rui Yan
  • for: 解决大规模预训练语言模型(LLMs)体积庞大、难以在实际应用中部署的问题。
  • methods: 提出了一种新的压缩范式——基于检索的知识迁移(Retrieval-based Knowledge Transfer, RetriKT),可将 LLMs 的知识迁移到规模极小的模型(例如参数量仅为原模型的 1%)中。
  • results: 实验结果表明,该方法借助从 LLMs 迁移的知识,并结合软提示调优与近端策略优化(PPO)强化学习技术,显著提升了小型模型在低资源任务上的性能。
    Abstract Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks. However, the massive size of these models poses huge challenges for their deployment in real-world applications. While numerous model compression techniques have been proposed, most of them are not well-suited for achieving extreme model compression when there is a significant gap in model scale. In this paper, we introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT), which effectively transfers the knowledge of LLMs to extremely small-scale models (e.g., 1%). In particular, our approach extracts knowledge from LLMs to construct a knowledge store, from which the small-scale model can retrieve relevant information and leverage it for effective inference. To improve the quality of the model, soft prompt tuning and Proximal Policy Optimization (PPO) reinforcement learning techniques are employed. Extensive experiments are conducted on low-resource tasks from SuperGLUE and GLUE benchmarks. The results demonstrate that the proposed approach significantly enhances the performance of small-scale models by leveraging the knowledge from LLMs.
    摘要 大规模预训练语言模型(LLMs)在各种自然语言处理(NLP)任务中表现卓越,但其庞大的体积给实际应用部署带来巨大挑战。尽管已有许多模型压缩技术被提出,但在模型规模差距巨大时,大多数方法并不适用于极端压缩。本文提出一种新的压缩范式——基于检索的知识迁移(RetriKT),可将 LLMs 的知识有效迁移到极小规模的模型(例如 1%)中。具体而言,我们从 LLMs 中提取知识构建知识库,小型模型可从中检索相关信息并用于推理。为提升模型质量,我们采用软提示调优和近端策略优化(PPO)强化学习技术。我们在 SuperGLUE 和 GLUE 基准的低资源任务上进行了大量实验,结果表明该方法借助 LLMs 的知识显著提升了小型模型的性能。
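
To make the retrieval-based transfer concrete, here is a minimal sketch of the core loop: knowledge snippets (as an LLM might have generated them offline) are embedded into a store, and a small model's input is augmented with the top-scoring snippets. The hashing-trick embedding, the snippets, and the prompt format are all illustrative assumptions; the paper's soft prompt tuning and PPO stages are omitted.

```python
# Minimal sketch of the RetriKT idea: knowledge extracted from a large LLM is
# stored once, and a small model retrieves the most relevant snippets to
# condition its inference on. Embeddings are a toy hashing-trick bag-of-words.
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

# Knowledge store: snippets the large model generated offline (hypothetical).
store = [
    "The Eiffel Tower is located in Paris, France.",
    "Water boils at 100 degrees Celsius at sea level.",
    "GLUE is a benchmark of natural language understanding tasks.",
]
store_emb = np.stack([embed(s) for s in store])

def retrieve(query: str, k: int = 2) -> list[str]:
    sims = store_emb @ embed(query)          # cosine similarity (unit vectors)
    return [store[i] for i in np.argsort(-sims)[:k]]

query = "Which city is the Eiffel Tower in?"
context = " ".join(retrieve(query))
small_model_input = f"context: {context} question: {query}"
print(small_model_input)                     # fed to the compressed model
```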

Detecting Intentional AIS Shutdown in Open Sea Maritime Surveillance Using Self-Supervised Deep Learning

  • paper_url: http://arxiv.org/abs/2310.15586
  • repo_url: None
  • paper_authors: Pierre Bernabé, Arnaud Gotlieb, Bruno Legeard, Dusica Marijan, Frank Olaf Sem-Jacobsen, Helge Spieker
  • for: 检测非法活动,如非法捕捞或非法货物转运。
  • methods: 基于船载应答器发送的自动识别系统(AIS)消息,采用自监督深度学习技术与 Transformer 模型实时处理 AIS 消息,每月可处理超过 5 亿条消息,覆盖 6 万余艘船舶的轨迹。
  • results: 能够检测出异常的 AIS 消息接收缺失,并重新发现了此前已被确认的故意 AIS 关闭事件。
    Abstract In maritime traffic surveillance, detecting illegal activities, such as illegal fishing or transshipment of illicit products is a crucial task of the coastal administration. In the open sea, one has to rely on Automatic Identification System (AIS) messages transmitted by on-board transponders, which are captured by surveillance satellites. However, insincere vessels often intentionally shut down their AIS transponders to hide illegal activities. In the open sea, it is very challenging to differentiate intentional AIS shutdowns from missing reception due to protocol limitations, bad weather conditions or restricting satellite positions. This paper presents a novel approach for the detection of abnormal AIS missing reception based on self-supervised deep learning techniques and transformer models. Using historical data, the trained model predicts if a message should be received in the upcoming minute or not. Afterwards, the model reports on detected anomalies by comparing the prediction with what actually happens. Our method can process AIS messages in real-time, in particular, more than 500 million AIS messages per month, corresponding to the trajectories of more than 60,000 ships. The method is evaluated on one year of real-world data coming from four Norwegian surveillance satellites. Using related research results, we validated our method by rediscovering already detected intentional AIS shutdowns.
    摘要 在海上交通监视中,检测非法捕捞或非法货物转运等违法活动是沿海管理部门的一项关键任务。在公海上,只能依靠船载应答器发送、由监视卫星接收的自动识别系统(AIS)消息。然而,不守规的船舶常常故意关闭 AIS 应答器以掩盖违法活动。在公海上,很难区分故意关闭与因协议限制、恶劣天气或卫星位置不佳造成的接收缺失。本文提出了一种基于自监督深度学习技术与 Transformer 模型的异常 AIS 接收缺失检测新方法。模型利用历史数据训练,预测下一分钟内是否应接收到某条消息,随后通过比较预测与实际情况来报告检测到的异常。我们的方法可实时处理 AIS 消息,每月处理超过 5 亿条消息,对应 6 万余艘船舶的轨迹。该方法在来自四颗挪威监视卫星、为期一年的真实数据上进行了评估。结合相关研究成果,我们通过重新发现已被确认的故意 AIS 关闭事件验证了该方法。
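
The detection logic lends itself to a compact sketch: a trained model scores each upcoming minute as receivable or not, and sustained stretches where messages were expected but never arrived are flagged. The transformer is stubbed out as a constant probability here, and the thresholds are illustrative.

```python
# Sketch of the detection logic: a self-supervised model predicts, for each
# upcoming minute, whether an AIS message should be received; sustained
# disagreement with what is actually observed is flagged as a suspected
# intentional shutdown. The learned predictor is stubbed out.
import numpy as np

rng = np.random.default_rng(0)
minutes = 120
received = rng.random(minutes) < 0.8        # observed reception (toy data)
received[60:90] = False                     # inject a 30-minute silent gap

def predict_receivable(t: int) -> float:
    """Stand-in for the trained transformer: probability that a message
    should arrive at minute t given coverage, weather, etc."""
    return 0.85                             # toy constant; the paper learns this

def flag_shutdowns(min_gap: int = 10, thresh: float = 0.5) -> list[tuple[int, int]]:
    gaps, start = [], None
    for t in range(minutes):
        anomalous = (not received[t]) and predict_receivable(t) > thresh
        if anomalous and start is None:
            start = t
        elif not anomalous and start is not None:
            if t - start >= min_gap:
                gaps.append((start, t))
            start = None
    if start is not None and minutes - start >= min_gap:
        gaps.append((start, minutes))       # gap still open at end of stream
    return gaps

print(flag_shutdowns())                     # prints the detected silent window(s)
```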

CONTRASTE: Supervised Contrastive Pre-training With Aspect-based Prompts For Aspect Sentiment Triplet Extraction

  • paper_url: http://arxiv.org/abs/2310.15577
  • repo_url: https://github.com/nitkannen/contraste
  • paper_authors: Rajdeep Mukherjee, Nithish Kannen, Saurabh Kumar Pandey, Pawan Goyal
  • for: 同时提升多个基于方面的情感分析(ABSA)任务的下游性能
  • methods: 提出基于对比学习的预训练策略 CONTRASTE 以增强 ASTE 性能,并结合多任务微调方法
  • results: 取得新的 ASTE 最优结果,并通过细致的消融实验证明各组件的重要性
    Abstract Existing works on Aspect Sentiment Triplet Extraction (ASTE) explicitly focus on developing more efficient fine-tuning techniques for the task. Instead, our motivation is to come up with a generic approach that can improve the downstream performances of multiple ABSA tasks simultaneously. Towards this, we present CONTRASTE, a novel pre-training strategy using CONTRastive learning to enhance the ASTE performance. While we primarily focus on ASTE, we also demonstrate the advantage of our proposed technique on other ABSA tasks such as ACOS, TASD, and AESC. Given a sentence and its associated (aspect, opinion, sentiment) triplets, first, we design aspect-based prompts with corresponding sentiments masked. We then (pre)train an encoder-decoder model by applying contrastive learning on the decoder-generated aspect-aware sentiment representations of the masked terms. For fine-tuning the model weights thus obtained, we then propose a novel multi-task approach where the base encoder-decoder model is combined with two complementary modules, a tagging-based Opinion Term Detector, and a regression-based Triplet Count Estimator. Exhaustive experiments on four benchmark datasets and a detailed ablation study establish the importance of each of our proposed components as we achieve new state-of-the-art ASTE results.
    摘要 现有的方面情感三元组抽取(ASTE)研究主要集中于为该任务开发更高效的微调技术。与此不同,我们的动机是提出一种通用方法,能够同时提升多个 ABSA 任务的下游性能。为此,我们提出 CONTRASTE,一种利用对比学习增强 ASTE 性能的全新预训练策略。我们主要关注 ASTE,但也证明了该方法在 ACOS、TASD 和 AESC 等其他 ABSA 任务上的优势。给定一个句子及其(方面、观点、情感)三元组,我们首先设计基于方面的提示,并将相应的情感掩码。随后,我们对解码器生成的、与方面相关的情感表示应用对比学习,对编码器-解码器模型进行(预)训练。在微调阶段,我们提出一种新的多任务方法,将基础编码器-解码器模型与两个互补模块结合:基于标注的观点词检测器和基于回归的三元组数量估计器。在四个基准数据集上的大量实验和详细的消融研究证明了各组件的重要性,我们取得了新的最优 ASTE 结果。
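
As a rough illustration of the aspect-based prompt design, the snippet below builds a prompt per triplet with the sentiment replaced by a mask token; masked prompts sharing a sentiment label would then serve as positives for the supervised contrastive loss. The template wording and sentinel token are assumptions, not the paper's exact format.

```python
# Hypothetical prompt construction for the pre-training stage: the sentiment
# of each (aspect, opinion, sentiment) triplet is hidden behind a mask token,
# and the model learns to fill it.
MASK = "<extra_id_0>"   # T5-style sentinel token (assumed)

def aspect_prompt(sentence: str, aspect: str, opinion: str) -> str:
    return f"{sentence} aspect: {aspect} opinion: {opinion} sentiment: {MASK}"

sentence = "The pasta was great but the service was slow."
print(aspect_prompt(sentence, "pasta", "great"))     # label: positive
print(aspect_prompt(sentence, "service", "slow"))    # label: negative
# Masked prompts sharing a sentiment label act as positives for the
# supervised contrastive loss on the decoder's aspect-aware representations;
# prompts with different labels act as negatives.
```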

DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding

  • paper_url: http://arxiv.org/abs/2310.18359
  • repo_url: None
  • paper_authors: Xiao-Yu Guo, Yuan-Fang Li, Gholamreza Haffari
  • for: 本研究旨在检验社交智能基准数据集的可靠性(soundness)。
  • methods: 我们定义了一套全面的方法论来研究社交智能基准数据集 Social-IQ 的可靠性,包括对其偏见的分析和对模型的评估。
  • results: 我们发现 Social-IQ 存在严重偏见:一个中等强度的语言模型即使不提供上下文甚至问题,也能利用虚假相关性达到完美表现。为此我们提出了新的基准数据集 DeSIQ,显著减少了原始 Social-IQ 数据集中的偏见。
    Abstract Social intelligence is essential for understanding and reasoning about human expressions, intents and interactions. One representative benchmark for its study is Social Intelligence Queries (Social-IQ), a dataset of multiple-choice questions on videos of complex social interactions. We define a comprehensive methodology to study the soundness of Social-IQ, as the soundness of such benchmark datasets is crucial to the investigation of the underlying research problem. Our analysis reveals that Social-IQ contains substantial biases, which can be exploited by a moderately strong language model to learn spurious correlations to achieve perfect performance without being given the context or even the question. We introduce DeSIQ, a new challenging dataset, constructed by applying simple perturbations to Social-IQ. Our empirical analysis shows DeSIQ significantly reduces the biases in the original Social-IQ dataset. Furthermore, we examine and shed light on the effect of model size, model style, learning settings, commonsense knowledge, and multi-modality on the new benchmark performance. Our new dataset, observations and findings open up important research questions for the study of social intelligence.
    摘要 社会智能对理解和推理人类的表达、意图与互动至关重要。其研究的一个代表性基准是社会智能问答(Social-IQ),这是一个基于复杂社交互动视频的多项选择题数据集。此类基准数据集的可靠性对研究底层问题至关重要,因此我们定义了一套全面的方法论来研究 Social-IQ 的可靠性。我们的分析表明,Social-IQ 存在严重偏见:一个中等强度的语言模型即使不提供上下文甚至问题,也能利用虚假相关性达到完美表现。我们提出了 DeSIQ,一个通过对 Social-IQ 施加简单扰动构建的、更具挑战性的新数据集。实证分析表明,DeSIQ 显著减少了原始 Social-IQ 数据集中的偏见。此外,我们还考察并阐明了模型规模、模型风格、学习设置、常识知识和多模态对新基准性能的影响。我们的新数据集、观察与发现为社会智能研究开启了重要的研究问题。

SteloCoder: a Decoder-Only LLM for Multi-Language to Python Code Translation

  • paper_url: http://arxiv.org/abs/2310.15539
  • repo_url: https://github.com/sade-adrien/stelocoder
  • paper_authors: Jialing Pan, Adrien Sadé, Jin Kim, Eric Soriano, Guillem Sole, Sylvain Flamant
  • for: 这篇论文旨在提出一个基于 StarCoder 的仅解码器(decoder-only)LLM,用于多种编程语言到 Python 的代码翻译。
  • methods: 该模型结合专家混合(MoE)技术与低秩适配方法(LoRA),使 StarCoder 能够处理多任务。
  • results: 该模型在 XLCoST 数据集上取得 73.76 的平均 CodeBLEU 分数,比排行榜最佳成绩至少高出 3.5。
    Abstract With the recent focus on Large Language Models (LLMs), both StarCoder (Li et al., 2023) and Code Llama (Rozière et al., 2023) have demonstrated remarkable performance in code generation. However, there is still a need for improvement in code translation functionality with efficient training techniques. In response to this, we introduce SteloCoder, a decoder-only StarCoder-based LLM designed specifically for multi-programming language-to-Python code translation. In particular, SteloCoder achieves C++, C#, JavaScript, Java, or PHP-to-Python code translation without specifying the input programming language. We modified the StarCoder model architecture by incorporating a Mixture-of-Experts (MoE) technique featuring five experts and a gating network for multi-task handling. Experts are obtained by StarCoder fine-tuning. Specifically, we use a Low-Rank Adaptive Method (LoRA) technique, limiting each expert size to only 0.06% of the number of StarCoder's parameters. At the same time, to enhance training efficiency in terms of time, we adopt a curriculum learning strategy and use self-instruct data for efficient fine-tuning. As a result, each expert takes only 6 hours to train on one single 80GB A100 HBM. With experiments on XLCoST datasets, SteloCoder achieves an average of 73.76 CodeBLEU score in multi-programming language-to-Python translation, surpassing the top performance from the leaderboard by at least 3.5. This accomplishment is attributed to only 45M extra parameters with StarCoder as the backbone and 32 hours of valid training on one 80GB A100 HBM. The source code is released here: https://github.com/sade-adrien/SteloCoder.
    摘要 随着近来对大型语言模型(LLMs)的关注,StarCoder(Li et al., 2023)和 Code Llama(Rozière et al., 2023)在代码生成方面都展现了出色的性能。然而,代码翻译功能仍有待通过高效的训练技术加以改进。为此,我们提出 SteloCoder,一个基于 StarCoder 的仅解码器 LLM,专门用于多种编程语言到 Python 的代码翻译。SteloCoder 无需指定输入编程语言,即可完成 C++、C#、JavaScript、Java 或 PHP 到 Python 的翻译。我们修改了 StarCoder 的模型架构,引入包含五个专家和一个门控网络的专家混合(MoE)技术以处理多任务。专家通过对 StarCoder 微调获得:我们采用低秩适配方法(LoRA),将每个专家的参数量限制为 StarCoder 参数量的 0.06%。同时,为提高训练的时间效率,我们采用课程学习策略,并使用 self-instruct 数据进行高效微调。因此,每个专家在单张 80GB A100 HBM 上仅需 6 小时即可完成训练。在 XLCoST 数据集上的实验表明,SteloCoder 在多语言到 Python 的翻译中取得平均 73.76 的 CodeBLEU 分数,比排行榜最佳成绩至少高出 3.5。这一成果仅依赖以 StarCoder 为骨干的 4500 万额外参数,以及在单张 80GB A100 HBM 上 32 小时的有效训练。源代码发布于:https://github.com/sade-adrien/SteloCoder。
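
The MoE-over-LoRA construction can be sketched in a few lines of PyTorch: a frozen base projection is augmented with five low-rank expert updates mixed by a gating network. Dimensions, initialization, and the gating design are illustrative assumptions rather than SteloCoder's exact architecture.

```python
# Compact sketch of MoE-of-LoRA: a frozen base linear layer plus five LoRA
# experts whose low-rank updates are mixed by a learned gate per input.
import torch
import torch.nn as nn

class MoELoRALinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, n_experts: int = 5, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)   # frozen backbone weight
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(n_experts, d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_experts, rank, d_out))
        self.gate = nn.Linear(d_in, n_experts)   # routes inputs to experts

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, d_in)
        w = torch.softmax(self.gate(x), dim=-1)           # (batch, n_experts)
        # Low-rank update from every expert: (batch, n_experts, d_out)
        upd = torch.einsum("bi,eir,erd->bed", x, self.A, self.B)
        return self.base(x) + (w.unsqueeze(-1) * upd).sum(dim=1)

layer = MoELoRALinear(64, 64)
print(layer(torch.randn(2, 64)).shape)   # torch.Size([2, 64])
```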

Generative and Contrastive Paradigms Are Complementary for Graph Self-Supervised Learning

  • paper_url: http://arxiv.org/abs/2310.15523
  • repo_url: https://github.com/wyx11112/GCMAE
  • paper_authors: Yuxiang Wang, Xiao Yan, Chuang Hu, Fangcheng Fu, Wentao Zhang, Hao Wang, Shuo Shang, Jiawei Jiang
  • for: 本研究旨在提出一个统一的图自监督学习(GSSL)框架,将对比学习(CL)与掩码自编码器(MAE)两种范式结合,以提升图自监督学习的性能。
  • methods: 本研究提出了名为图对比掩码自编码器(GCMAE)的统一框架,包含一个 MAE 分支和一个 CL 分支,两个分支共享同一个编码器,使 MAE 分支能够利用 CL 分支提取的全局信息。此外,为了让 GCMAE 捕捉全局图结构,研究者训练其重建整个邻接矩阵,而非像现有工作那样仅重建被掩码的边。
  • results: 研究者在四类流行的图任务(即节点分类、节点聚类、链接预测和图分类)上进行了评估,并与 14 个最先进的基线进行比较。结果显示,GCMAE 在这些任务上均表现良好,相比最佳基线,最大性能提升达 3.2%。
    Abstract For graph self-supervised learning (GSSL), masked autoencoder (MAE) follows the generative paradigm and learns to reconstruct masked graph edges or node features. Contrastive Learning (CL) maximizes the similarity between augmented views of the same graph and is widely used for GSSL. However, MAE and CL are considered separately in existing works for GSSL. We observe that the MAE and CL paradigms are complementary and propose the graph contrastive masked autoencoder (GCMAE) framework to unify them. Specifically, by focusing on local edges or node features, MAE cannot capture global information of the graph and is sensitive to particular edges and features. On the contrary, CL excels in extracting global information because it considers the relation between graphs. As such, we equip GCMAE with an MAE branch and a CL branch, and the two branches share a common encoder, which allows the MAE branch to exploit the global information extracted by the CL branch. To force GCMAE to capture global graph structures, we train it to reconstruct the entire adjacency matrix instead of only the masked edges as in existing works. Moreover, a discrimination loss is proposed for feature reconstruction, which improves the disparity between node embeddings rather than reducing the reconstruction error to tackle the feature smoothing problem of MAE. We evaluate GCMAE on four popular graph tasks (i.e., node classification, node clustering, link prediction, and graph classification) and compare with 14 state-of-the-art baselines. The results show that GCMAE consistently provides good accuracy across these tasks, and the maximum accuracy improvement is up to 3.2% compared with the best-performing baseline.
    摘要 在图自监督学习(GSSL)中,掩码自编码器(MAE)遵循生成式范式,学习重建被掩码的图边或节点特征;对比学习(CL)则最大化同一图的增强视图之间的相似度,也被广泛用于 GSSL。然而,现有工作将 MAE 与 CL 分开考虑。我们观察到 MAE 与 CL 两种范式是互补的,因此提出图对比掩码自编码器(GCMAE)框架将二者统一。具体而言,MAE 只关注局部的边或节点特征,无法捕捉图的全局信息,且对特定的边和特征敏感;相反,CL 考虑图之间的关系,擅长提取全局信息。因此,我们为 GCMAE 配备一个 MAE 分支和一个 CL 分支,两个分支共享同一个编码器,使 MAE 分支能够利用 CL 分支提取的全局信息。为迫使 GCMAE 捕捉全局图结构,我们训练它重建整个邻接矩阵,而非像现有工作那样仅重建被掩码的边。此外,我们为特征重建提出一种判别损失,通过拉大节点嵌入之间的差异(而非单纯降低重建误差)来缓解 MAE 的特征平滑问题。我们在四类流行的图任务(节点分类、节点聚类、链接预测和图分类)上评估 GCMAE,并与 14 个最先进的基线进行比较。结果表明,GCMAE 在这些任务上均取得良好的精度,相比表现最佳的基线,最大精度提升达 3.2%。
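
A self-contained sketch of the two-branch design: one shared GCN encoder feeds an MAE branch that reconstructs the full adjacency matrix from masked node features, and a CL branch that applies an InfoNCE-style loss across two augmented views. The augmentations, losses, and their unweighted sum are simplified stand-ins for the paper's components (the discrimination loss, in particular, is omitted).

```python
# Toy two-branch GSSL objective: shared dense-GCN encoder, full-adjacency
# reconstruction (MAE branch) and cross-view InfoNCE (CL branch).
import torch
import torch.nn.functional as F

def gcn(X, A, W):                              # one dense GCN layer
    A_hat = A + torch.eye(A.size(0))
    d = A_hat.sum(1)
    A_norm = A_hat / torch.sqrt(d[:, None] * d[None, :])
    return torch.relu(A_norm @ X @ W)

n, f, h = 30, 16, 32
A = (torch.rand(n, n) < 0.1).float(); A = ((A + A.t()) > 0).float()
X = torch.randn(n, f)
W = (torch.randn(f, h) * 0.1).requires_grad_()  # shared encoder weight

# --- MAE branch: mask node features, reconstruct the whole adjacency ---
mask = torch.rand(n) < 0.3
Z = gcn(X * (~mask)[:, None], A, W)
A_rec = torch.sigmoid(Z @ Z.t())
loss_mae = F.binary_cross_entropy(A_rec, A)

# --- CL branch: two feature-dropout views of the same graph ---
Z1 = gcn(F.dropout(X, 0.2), A, W)
Z2 = gcn(F.dropout(X, 0.2), A, W)
logits = F.normalize(Z1, dim=1) @ F.normalize(Z2, dim=1).t() / 0.5
loss_cl = F.cross_entropy(logits, torch.arange(n))

loss = loss_mae + loss_cl                      # the paper balances/extends these
loss.backward()
print(float(loss))
```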

KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval

  • paper_url: http://arxiv.org/abs/2310.15511
  • repo_url: None
  • paper_authors: Marah I Abdin, Suriya Gunasekar, Varun Chandrasekaran, Jerry Li, Mert Yuksekgonul, Rahee Ghosh Peshawaria, Ranjita Naik, Besmira Nushi
  • for: 本研究旨在考察最先进的模型回答信息检索类约束满足查询(如“San Diego 的冰淇淋店列表”)的能力;此类查询过去被认为只能通过网络搜索或知识库解决。
  • methods: 本研究利用大型语言模型(LLMs)对此类查询能力进行初步研究。
  • results: 研究发现,许多现有的检索基准要么已经饱和,要么无法衡量模型的约束满足能力。为此,研究者发布了新的数据集 KITAB,以及一套相关的动态数据收集与约束验证方法,便于为其他作者获取类似的测试数据。扩展实验表明,在没有上下文的情况下,模型表现出严重局限,包括无关信息、事实错误和不完整性,且这些问题随信息流行度的降低而加剧。
    Abstract We study the ability of state-of-the art models to answer constraint satisfaction queries for information retrieval (e.g., 'a list of ice cream shops in San Diego'). In the past, such queries were considered to be tasks that could only be solved via web-search or knowledge bases. More recently, large language models (LLMs) have demonstrated initial emergent abilities in this task. However, many current retrieval benchmarks are either saturated or do not measure constraint satisfaction. Motivated by rising concerns around factual incorrectness and hallucinations of LLMs, we present KITAB, a new dataset for measuring constraint satisfaction abilities of language models. KITAB consists of book-related data across more than 600 authors and 13,000 queries, and also offers an associated dynamic data collection and constraint verification approach for acquiring similar test data for other authors. Our extended experiments on GPT4 and GPT3.5 characterize and decouple common failure modes across dimensions such as information popularity, constraint types, and context availability. Results show that in the absence of context, models exhibit severe limitations as measured by irrelevant information, factual errors, and incompleteness, many of which exacerbate as information popularity decreases. While context availability mitigates irrelevant information, it is not helpful for satisfying constraints, identifying fundamental barriers to constraint satisfaction. We open source our contributions to foster further research on improving constraint satisfaction abilities of future models.
    摘要 我们研究最先进模型回答信息检索类约束满足查询(如“San Diego 的冰淇淋店列表”)的能力。过去,此类查询被认为只能通过网络搜索或知识库解决;近来,大型语言模型(LLMs)在该任务上展现出初步的涌现能力。然而,许多现有的检索基准要么已经饱和,要么无法衡量约束满足能力。出于对 LLMs 事实错误与幻觉的日益担忧,我们提出了 KITAB,一个用于衡量语言模型约束满足能力的新数据集。KITAB 包含 600 多位作者相关的图书数据和 13,000 个查询,并提供了一套相关的动态数据收集与约束验证方法,便于为其他作者获取类似的测试数据。我们在 GPT-4 和 GPT-3.5 上的扩展实验刻画并解耦了信息流行度、约束类型和上下文可用性等维度上的常见失败模式。结果表明,在没有上下文的情况下,模型表现出严重局限,包括无关信息、事实错误和不完整性,且其中许多问题随信息流行度的降低而加剧。上下文可用性虽能减少无关信息,却无助于满足约束,这揭示了约束满足的根本性障碍。我们开源了相关贡献,以促进改进未来模型约束满足能力的研究。
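
The constraint-verification idea behind the benchmark can be illustrated with a toy checker that splits a model's answer list into factual errors (not by the author), constraint violations, correct items, and missed items. The author metadata and the constraint below are hypothetical.

```python
# Toy constraint verification in the spirit of KITAB: check each item of a
# model's answer against ground-truth metadata and the query's constraint.
ground_truth = {"The Long Road", "Silent Harbors", "Maps of Nowhere"}  # books by author X (invented)
constraint = lambda title: title.lower().startswith("s")               # e.g. "title starts with S"

model_answer = ["Silent Harbors", "The Long Road", "Borrowed Light"]

not_by_author = [t for t in model_answer if t not in ground_truth]
unsatisfied   = [t for t in model_answer if t in ground_truth and not constraint(t)]
satisfied     = [t for t in model_answer if t in ground_truth and constraint(t)]
missed        = [t for t in ground_truth if constraint(t) and t not in model_answer]

print("factual errors:", not_by_author)        # ['Borrowed Light']
print("constraint violations:", unsatisfied)   # ['The Long Road']
print("correct:", satisfied, "missed:", missed)
```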

Robust Representation Learning for Unified Online Top-K Recommendation

  • paper_url: http://arxiv.org/abs/2310.15492
  • repo_url: None
  • paper_authors: Minfang Lu, Yuchen Jiang, Huihui Dong, Qi Li, Ziru Xu, Yuanlin Liu, Lixia Wu, Haoyuan Hu, Han Zhu, Yuning Jiang, Jian Xu, Bo Zheng
  • for: The paper focuses on improving the efficiency of online recommendation systems in large-scale industrial e-commerce, particularly in delivering highly relevant item/content advertising across diverse business scenarios.
  • methods: The proposed method employs robust representation learning, including domain adversarial learning and multi-view Wasserstein distribution learning, to learn robust representations that ensure data fairness, and balances conflicting objectives through homoscedastic uncertainty weights and orthogonality constraints.
  • results: The proposed method effectively tackles multi-domain matching and the retrieval of top-k advertisements from multi-entity advertisements across different domains; experiments validate its effectiveness and rationality, and it has been successfully deployed online to serve real business scenarios.
    Abstract In large-scale industrial e-commerce, the efficiency of an online recommendation system is crucial in delivering highly relevant item/content advertising that caters to diverse business scenarios. However, most existing studies focus solely on item advertising, neglecting the significance of content advertising. This oversight results in inconsistencies within the multi-entity structure and unfair retrieval. Furthermore, the challenge of retrieving top-k advertisements from multi-entity advertisements across different domains adds to the complexity. Recent research proves that user-entity behaviors within different domains exhibit characteristics of differentiation and homogeneity. Therefore, the multi-domain matching models typically rely on the hybrid-experts framework with domain-invariant and domain-specific representations. Unfortunately, most approaches primarily focus on optimizing the combination mode of different experts, failing to address the inherent difficulty in optimizing the expert modules themselves. The existence of redundant information across different domains introduces interference and competition among experts, while the distinct learning objectives of each domain lead to varying optimization challenges among experts. To tackle these issues, we propose robust representation learning for the unified online top-k recommendation. Our approach constructs unified modeling in entity space to ensure data fairness. The robust representation learning employs domain adversarial learning and multi-view wasserstein distribution learning to learn robust representations. Moreover, the proposed method balances conflicting objectives through the homoscedastic uncertainty weights and orthogonality constraints. Various experiments validate the effectiveness and rationality of our proposed method, which has been successfully deployed online to serve real business scenarios.
    摘要 在大规模工业电商中,在线推荐系统的效率至关重要,它需要面向多样的业务场景投放高度相关的商品/内容广告。然而,现有研究大多仅关注商品广告,忽视了内容广告的重要性,导致多实体结构内部的不一致和检索的不公平。此外,跨不同领域从多实体广告中检索 Top-k 广告也增加了问题的复杂性。近期研究表明,不同领域中的用户-实体行为同时呈现差异性与同质性,因此多领域匹配模型通常依赖包含域不变与域特定表示的混合专家框架。遗憾的是,大多数方法只关注不同专家的组合方式优化,未能解决专家模块自身难以优化的内在困难:不同领域间的冗余信息会在专家之间引入干扰与竞争,而各领域不同的学习目标又给各专家带来不同的优化挑战。为解决这些问题,我们提出了面向统一在线 Top-k 推荐的鲁棒表示学习。该方法在实体空间构建统一建模以保证数据公平,并通过域对抗学习与多视角 Wasserstein 分布学习来学习鲁棒表示;此外,借助同方差不确定性权重与正交约束来平衡相互冲突的目标。多项实验验证了该方法的有效性与合理性,其已成功部署上线,服务于真实业务场景。
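
The abstract's homoscedastic uncertainty weights are commonly formulated (following Kendall et al., 2018) as learned per-task log-variances s_i that weight the task losses as sum_i exp(-s_i)·L_i + s_i; the sketch below assumes that form, which may differ in detail from the paper's.

```python
# Assumed homoscedastic-uncertainty weighting across conflicting objectives:
# each task i gets a learnable log-variance s_i, and the combined loss is
# sum_i exp(-s_i) * L_i + s_i, letting the optimizer down-weight noisy tasks.
import torch

task_losses = [torch.tensor(0.9), torch.tensor(2.3), torch.tensor(0.4)]
log_vars = torch.zeros(len(task_losses), requires_grad=True)  # s_i, learned

total = sum(torch.exp(-s) * L + s for s, L in zip(log_vars, task_losses))
total.backward()
print(float(total), log_vars.grad)
# Gradients are 1 - L_i at s_i = 0: the high-loss task gets a negative
# gradient, so gradient descent raises its s_i and softens its weight.
```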

NuTrea: Neural Tree Search for Context-guided Multi-hop KGQA

  • paper_url: http://arxiv.org/abs/2310.15484
  • repo_url: https://github.com/mlvlab/nutrea
  • paper_authors: Hyeong Kyu Choi, Seunghun Lee, Jaewon Chu, Hyunwoo J. Kim
  • for: 本研究旨在提高多跳知识图问答(KGQA)任务的性能,使得可以使用自然语言问题来检索知识图(KG)中的节点。
  • methods: 我们提出了一种基于树搜索的图神经网络(GNN)模型——神经树搜索(NuTrea),以纳入更广的知识图全局上下文。其消息传递机制会探查尚未到达的子树区域,以增强面向过去的嵌入。此外,我们还引入了关系频率-逆实体频率(RF-IEF)节点嵌入,利用全局 KG 上下文更好地刻画歧义性较强的 KG 节点。
  • results: 我们在三个主要的多跳 KGQA 基准数据集上的实验证明了方法的普遍有效性,广泛的分析进一步验证了其表达能力与鲁棒性。总体而言,NuTrea 为使用复杂自然语言问题查询知识图提供了强大工具。代码见 https://github.com/mlvlab/NuTrea。
    Abstract Multi-hop Knowledge Graph Question Answering (KGQA) is a task that involves retrieving nodes from a knowledge graph (KG) to answer natural language questions. Recent GNN-based approaches formulate this task as a KG path searching problem, where messages are sequentially propagated from the seed node towards the answer nodes. However, these messages are past-oriented, and they do not consider the full KG context. To make matters worse, KG nodes often represent proper noun entities and are sometimes encrypted, being uninformative in selecting between paths. To address these problems, we propose Neural Tree Search (NuTrea), a tree search-based GNN model that incorporates the broader KG context. Our model adopts a message-passing scheme that probes the unreached subtree regions to boost the past-oriented embeddings. In addition, we introduce the Relation Frequency-Inverse Entity Frequency (RF-IEF) node embedding that considers the global KG context to better characterize ambiguous KG nodes. The general effectiveness of our approach is demonstrated through experiments on three major multi-hop KGQA benchmark datasets, and our extensive analyses further validate its expressiveness and robustness. Overall, NuTrea provides a powerful means to query the KG with complex natural language questions. Code is available at https://github.com/mlvlab/NuTrea.
    摘要 多跳知识图问答(KGQA)任务旨在从知识图(KG)中检索节点以回答自然语言问题。近期基于 GNN 的方法将该任务表述为 KG 路径搜索问题:消息从种子节点出发,依次向答案节点传播。然而,这些消息是面向过去的,没有考虑完整的 KG 上下文。更糟的是,KG 节点通常表示专有名词实体,有时还经过加密,在路径选择上缺乏信息量。为解决这些问题,我们提出了神经树搜索(NuTrea),一种纳入更广 KG 上下文的基于树搜索的 GNN 模型。我们的模型采用能探查尚未到达的子树区域的消息传递机制,以增强面向过去的嵌入。此外,我们引入关系频率-逆实体频率(RF-IEF)节点嵌入,利用全局 KG 上下文更好地刻画歧义性较强的 KG 节点。我们在三个主要的多跳 KGQA 基准数据集上的实验证明了该方法的普遍有效性,广泛的分析进一步验证了其表达能力与鲁棒性。总体而言,NuTrea 为使用复杂自然语言问题查询 KG 提供了强大工具。代码见 https://github.com/mlvlab/NuTrea。
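
By analogy with TF-IDF, a Relation Frequency-Inverse Entity Frequency score would weight relations that are frequent around a node but attached to few entities globally. The sketch below is a hypothetical rendering of that intuition over a toy KG; the paper's exact definition may differ.

```python
# Hypothetical RF-IEF scoring by analogy with TF-IDF: relation frequency at
# a node, discounted by how many entities the relation touches globally.
import math
from collections import Counter

# (head, relation, tail) triples of a toy KG (invented).
triples = [
    ("paris", "capital_of", "france"),
    ("paris", "located_in", "europe"),
    ("lyon",  "located_in", "europe"),
    ("lyon",  "located_in", "france"),
]

entities = {e for h, _, t in triples for e in (h, t)}
ent_with_rel = {r: set() for _, r, _ in triples}
for h, r, t in triples:
    ent_with_rel[r] |= {h, t}

def rf_ief(node: str) -> dict[str, float]:
    rels = Counter(r for h, r, t in triples if node in (h, t))
    total = sum(rels.values())
    return {r: (c / total) * math.log(len(entities) / len(ent_with_rel[r]))
            for r, c in rels.items()}

print(rf_ief("lyon"))   # 'located_in' is frequent at lyon but ubiquitous -> low score
```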

AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing

  • paper_url: http://arxiv.org/abs/2310.15479
  • repo_url: None
  • paper_authors: Namjoon Suh, Xiaofeng Lin, Din-Yin Hsieh, Merhdad Honarkhah, Guang Cheng
  • for: 本研究利用扩散模型生成合成表格数据,解决表格数据生成中异构特征带来的难题。
  • methods: 我们使用 auto-encoder 架构来处理异构特征,并与现有的表格生成器进行比较。
  • results: 我们在 15 个公共数据集上进行了实验,发现我们的模型能够准确地捕捉特征之间的相关性,并在下游任务中表现良好。
    Abstract Diffusion model has become a main paradigm for synthetic data generation in many subfields of modern machine learning, including computer vision, language model, or speech synthesis. In this paper, we leverage the power of diffusion model for generating synthetic tabular data. The heterogeneous features in tabular data have been main obstacles in tabular data synthesis, and we tackle this problem by employing the auto-encoder architecture. When compared with the state-of-the-art tabular synthesizers, the resulting synthetic tables from our model show nice statistical fidelities to the real data, and perform well in downstream tasks for machine learning utilities. We conducted the experiments over 15 publicly available datasets. Notably, our model adeptly captures the correlations among features, which has been a long-standing challenge in tabular data synthesis. Our code is available upon request and will be publicly released if the paper is accepted.
    摘要 扩散模型已成为现代机器学习诸多子领域(包括计算机视觉、语言模型和语音合成)中合成数据生成的主要范式。本文利用扩散模型的能力生成合成表格数据。表格数据中的异构特征一直是表格数据合成的主要障碍,我们通过采用自编码器架构来解决这一问题。与最先进的表格合成器相比,我们模型生成的合成表格对真实数据展现出良好的统计保真度,并在面向机器学习应用的下游任务中表现出色。我们在 15 个公开数据集上进行了实验。值得一提的是,我们的模型能够很好地捕捉特征之间的相关性,而这是表格数据合成领域长期存在的挑战。代码可应要求提供,论文被接收后将公开发布。
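
A condensed sketch of the pipeline: heterogeneous columns are encoded into a continuous latent by an auto-encoder, and a diffusion model is trained in that latent space with the standard noise-prediction objective. Network sizes, the noise schedule, and the preprocessing are illustrative assumptions.

```python
# Condensed AutoDiff-style sketch: auto-encode a mixed-type table into a
# continuous latent, then train a noise-prediction (DDPM-style) model there.
import torch
import torch.nn as nn

# Toy table: one standardized numeric column + one 3-way categorical (one-hot).
num = torch.randn(128, 1)
cat = torch.eye(3)[torch.randint(0, 3, (128,))]
rows = torch.cat([num, cat], dim=1)             # (128, 4)

enc = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 8))
dec = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
eps_net = nn.Sequential(nn.Linear(8 + 1, 64), nn.ReLU(), nn.Linear(64, 8))

recon = dec(enc(rows))
loss_ae = ((recon - rows) ** 2).mean()          # the AE absorbs heterogeneity

z = enc(rows).detach()
t = torch.rand(128, 1)                          # diffusion time in [0, 1]
alpha = torch.cos(t * torch.pi / 2)             # toy noise schedule
noise = torch.randn_like(z)
z_t = alpha * z + (1 - alpha ** 2).sqrt() * noise
loss_diff = ((eps_net(torch.cat([z_t, t], 1)) - noise) ** 2).mean()

print(float(loss_ae), float(loss_diff))         # the two stages train in sequence
```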

A Communication Theory Perspective on Prompting Engineering Methods for Large Language Models

  • paper_url: http://arxiv.org/abs/2310.18358
  • repo_url: None
  • paper_authors: Yuanfeng Song, Yuanqin He, Xuefang Zhao, Hanlin Gu, Di Jiang, Haijun Yang, Lixin Fan, Qiang Yang
  • for: 本文旨在提供一个新的视角来审视现有的提示(prompting)方法,帮助读者更深入地理解该领域的发展趋势。
  • methods: 本文在通信理论框架下审视现有的提示方法,并分析其在四类典型任务中的发展趋势。
  • results: 本文指出了若干有前景的未来研究方向,以助力开发更好的提示方法。
    Abstract The springing up of Large Language Models (LLMs) has shifted the community from single-task-orientated natural language processing (NLP) research to a holistic end-to-end multi-task learning paradigm. Along this line of research endeavors in the area, LLM-based prompting methods have attracted much attention, partially due to the technological advantages brought by prompt engineering (PE) as well as the underlying NLP principles disclosed by various prompting methods. Traditional supervised learning usually requires training a model based on labeled data and then making predictions. In contrast, PE methods directly use the powerful capabilities of existing LLMs (i.e., GPT-3 and GPT-4) via composing appropriate prompts, especially under few-shot or zero-shot scenarios. Facing the abundance of studies related to the prompting and the ever-evolving nature of this field, this article aims to (i) illustrate a novel perspective to review existing PE methods, within the well-established communication theory framework; (ii) facilitate a better/deeper understanding of developing trends of existing PE methods used in four typical tasks; (iii) shed light on promising research directions for future PE methods.
    摘要 大型语言模型(LLMs)的涌现使研究社区从面向单一任务的自然语言处理(NLP)研究转向整体的端到端多任务学习范式。在这一研究脉络中,基于 LLM 的提示方法备受关注,这部分归功于提示工程(PE)带来的技术优势,以及各类提示方法所揭示的底层 NLP 原理。传统的监督学习通常需要基于标注数据训练模型再进行预测;相比之下,PE 方法直接通过编写合适的提示来利用现有 LLMs(如 GPT-3 和 GPT-4)的强大能力,尤其是在少样本或零样本场景下。面对提示相关研究的大量涌现及该领域的不断演进,本文旨在:(i)在成熟的通信理论框架下提供一个审视现有 PE 方法的新视角;(ii)帮助更深入地理解现有 PE 方法在四类典型任务中的发展趋势;(iii)为未来 PE 方法指明有前景的研究方向。

Empowering Distributed Solutions in Renewable Energy Systems and Grid Optimization

  • paper_url: http://arxiv.org/abs/2310.15468
  • repo_url: None
  • paper_authors: Mohammad Mohammadi, Ali Mohammadi
  • for: 这项研究探讨了电力行业从集中式向分布式模式的转变,尤其是机器学习(ML)技术如何在赋能可再生能源和改进电网管理方面发挥关键作用。
  • methods: 这项研究使用了多种机器学习模型(如人工神经网络、支持向量机和决策树)来预测可再生能源的发电与消耗,并采用数据分割、归一化、分解和离散化等数据预处理技术来提高预测精度。
  • results: 研究发现,将大数据与机器学习融入智能电网可以提升能效、更有效地响应需求,并更好地整合可再生能源;但仍需应对处理海量数据、保障网络安全和获取专业知识等挑战。
    Abstract This study delves into the shift from centralized to decentralized approaches in the electricity industry, with a particular focus on how machine learning (ML) advancements play a crucial role in empowering renewable energy sources and improving grid management. ML models have become increasingly important in predicting renewable energy generation and consumption, utilizing various techniques like artificial neural networks, support vector machines, and decision trees. Furthermore, data preprocessing methods, such as data splitting, normalization, decomposition, and discretization, are employed to enhance prediction accuracy. The incorporation of big data and ML into smart grids offers several advantages, including heightened energy efficiency, more effective responses to demand, and better integration of renewable energy sources. Nevertheless, challenges like handling large data volumes, ensuring cybersecurity, and obtaining specialized expertise must be addressed. The research investigates various ML applications within the realms of solar energy, wind energy, and electric distribution and storage, illustrating their potential to optimize energy systems. To sum up, this research demonstrates the evolving landscape of the electricity sector as it shifts from centralized to decentralized solutions through the application of ML innovations and distributed decision-making, ultimately shaping a more efficient and sustainable energy future.
    摘要 本研究深入探讨了电力行业从集中式向分布式模式的转变,重点关注机器学习(ML)的进展如何在赋能可再生能源和改进电网管理方面发挥关键作用。ML 模型在预测可再生能源的发电与消耗方面日益重要,所用技术包括人工神经网络、支持向量机和决策树等。此外,数据分割、归一化、分解和离散化等数据预处理方法被用于提高预测精度。将大数据与 ML 融入智能电网具有多重优势,包括提升能效、更有效地响应需求,以及更好地整合可再生能源。然而,仍需应对处理海量数据、保障网络安全和获取专业知识等挑战。研究考察了 ML 在太阳能、风能以及电力配送与储能领域的多种应用,展示了其优化能源系统的潜力。总之,本研究表明,借助 ML 创新与分布式决策,电力行业正从集中式方案走向分布式方案,最终塑造更高效、更可持续的能源未来。

UI Layout Generation with LLMs Guided by UI Grammar

  • paper_url: http://arxiv.org/abs/2310.15455
  • repo_url: None
  • paper_authors: Yuwen Lu, Ziang Tong, Qinyi Zhao, Chengzhi Zhang, Toby Jia-Jun Li
  • for: 这篇立场论文旨在研究大型语言模型(LLMs)在用户界面(UI)布局生成方面的应用。
  • methods: 我们提出了一种称为 UI 语法的新方法,用于表示 UI 界面固有的层次结构,并以此引导 LLMs 的生成能力。
  • results: 基于 GPT-4 的初步实验显示,LLMs 能够通过上下文学习(in-context learning)生成高质量的用户界面;初步对比研究也表明,基于语法的方法有望在特定方面提升生成结果的质量。
    Abstract The recent advances in Large Language Models (LLMs) have stimulated interest among researchers and industry professionals, particularly in their application to tasks concerning mobile user interfaces (UIs). This position paper investigates the use of LLMs for UI layout generation. Central to our exploration is the introduction of UI grammar -- a novel approach we proposed to represent the hierarchical structure inherent in UI screens. The aim of this approach is to guide the generative capacities of LLMs more effectively and improve the explainability and controllability of the process. Initial experiments conducted with GPT-4 showed the promising capability of LLMs to produce high-quality user interfaces via in-context learning. Furthermore, our preliminary comparative study suggested the potential of the grammar-based approach in improving the quality of generative results in specific aspects.
    摘要 近期大型语言模型(LLMs)的进展激发了研究者和业界人士的兴趣,尤其是其在移动用户界面(UI)相关任务中的应用。本立场论文研究了将 LLMs 用于 UI 布局生成。我们探索的核心是提出 UI 语法——一种用于表示 UI 界面固有层次结构的新方法,旨在更有效地引导 LLMs 的生成能力,并提高生成过程的可解释性与可控性。基于 GPT-4 的初步实验表明,LLMs 能够通过上下文学习生成高质量的用户界面。此外,我们的初步对比研究显示,基于语法的方法有望在特定方面提升生成结果的质量。
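
To make the grammar-guided idea concrete, the toy below defines a tiny production grammar over screen containers and expands it recursively; in the paper's setting, the LLM rather than a random generator would choose among grammatical expansions. The grammar itself is invented for illustration.

```python
# Toy UI grammar: productions constrain which layouts can be generated, so
# generation reduces to choosing among grammatical expansions.
import random

GRAMMAR = {
    "SCREEN":  [["TOOLBAR", "CONTENT"], ["CONTENT", "NAVBAR"]],
    "CONTENT": [["LIST"], ["CARD", "CARD"], ["IMAGE", "TEXT", "BUTTON"]],
    "LIST":    [["ITEM", "ITEM", "ITEM"]],
}

def expand(symbol, rng):
    """Recursively expand a symbol; unknown symbols are terminal widgets."""
    if symbol not in GRAMMAR:
        return symbol
    rule = rng.choice(GRAMMAR[symbol])      # an LLM would pick this instead
    return [symbol, [expand(s, rng) for s in rule]]

print(expand("SCREEN", random.Random(7)))   # one grammatical layout tree
```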

PromptInfuser: How Tightly Coupling AI and UI Design Impacts Designers’ Workflows

  • paper_url: http://arxiv.org/abs/2310.15435
  • repo_url: None
  • paper_authors: Savvas Petridis, Michael Terry, Carrie J. Cai
  • for: 本研究探讨了将 AI 提示与 UI 设计紧密耦合如何影响设计师的工作流程,以提高设计效率与原型的真实度。
  • methods: 研究者开发了名为 PromptInfuser 的 Figma 插件,让设计师能够将 UI 元素与提示的输入和输出相连接,创建半功能的原型(mockup)。
  • results: 在 14 名设计师参与的研究中,PromptInfuser 被认为比现有的 AI 原型设计工作流程更有用:能更好地传达产品想法,生成更贴近设想的原型,原型设计效率更高,并且更有助于预判 UI 问题和技术约束。
    Abstract Prototyping AI applications is notoriously difficult. While large language model (LLM) prompting has dramatically lowered the barriers to AI prototyping, designers are still prototyping AI functionality and UI separately. We investigate how coupling prompt and UI design affects designers' workflows. Grounding this research, we developed PromptInfuser, a Figma plugin that enables users to create semi-functional mockups, by connecting UI elements to the inputs and outputs of prompts. In a study with 14 designers, we compare PromptInfuser to designers' current AI-prototyping workflow. PromptInfuser was perceived to be significantly more useful for communicating product ideas, more capable of producing prototypes that realistically represent the envisioned artifact, more efficient for prototyping, and more helpful for anticipating UI issues and technical constraints. PromptInfuser encouraged iteration over prompt and UI together, which helped designers identify UI and prompt incompatibilities and reflect upon their total solution. Together, these findings inform future systems for prototyping AI applications.
    摘要 为 AI 应用制作原型向来困难。尽管大型语言模型(LLM)提示已大幅降低 AI 原型设计的门槛,设计师仍然将 AI 功能与 UI 分开进行原型设计。我们研究了将提示与 UI 设计相耦合对设计师工作流程的影响。为支撑这项研究,我们开发了 PromptInfuser,一款 Figma 插件,允许用户通过将 UI 元素连接到提示的输入和输出来创建半功能的原型。在 14 名设计师参与的研究中,我们将 PromptInfuser 与设计师当前的 AI 原型设计工作流程进行了比较。参与者认为 PromptInfuser 在传达产品想法方面显著更有用,更能产出贴近设想产物的原型,原型设计效率更高,并且更有助于预判 UI 问题和技术约束。PromptInfuser 鼓励对提示与 UI 进行共同迭代,帮助设计师发现 UI 与提示之间的不兼容之处,并反思整体解决方案。这些发现为未来支持 AI 应用原型设计的系统提供了启示。

ConstitutionMaker: Interactively Critiquing Large Language Models by Converting Feedback into Principles

  • paper_url: http://arxiv.org/abs/2310.15428
  • repo_url: None
  • paper_authors: Savvas Petridis, Ben Wedin, James Wexler, Aaron Donsbach, Mahima Pushkarna, Nitesh Goyal, Carrie J. Cai, Michael Terry
  • for: 本研究旨在开发一种帮助用户通过自然语言反馈调整 LLM 输出的工具,以便更好地引导聊天机器人的行为。
  • methods: 本研究将用户对模型输出的自然语言反馈转化为一组原则(即 constitution),用于引导聊天机器人的行为。
  • results: 研究发现,借助 ConstitutionMaker,用户能更好地引导聊天机器人的行为,更轻松、高效地将反馈转化为明确的原则,且所需心智负担更低;该工具还帮助用户发现改进聊天机器人的途径,并将直觉性反应表述为具体清晰的原则。
    Abstract Large language model (LLM) prompting is a promising new approach for users to create and customize their own chatbots. However, current methods for steering a chatbot's outputs, such as prompt engineering and fine-tuning, do not support users in converting their natural feedback on the model's outputs to changes in the prompt or model. In this work, we explore how to enable users to interactively refine model outputs through their feedback, by helping them convert their feedback into a set of principles (i.e. a constitution) that dictate the model's behavior. From a formative study, we (1) found that users needed support converting their feedback into principles for the chatbot and (2) classified the different principle types desired by users. Inspired by these findings, we developed ConstitutionMaker, an interactive tool for converting user feedback into principles, to steer LLM-based chatbots. With ConstitutionMaker, users can provide either positive or negative feedback in natural language, select auto-generated feedback, or rewrite the chatbot's response; each mode of feedback automatically generates a principle that is inserted into the chatbot's prompt. In a user study with 14 participants, we compare ConstitutionMaker to an ablated version, where users write their own principles. With ConstitutionMaker, participants felt that their principles could better guide the chatbot, that they could more easily convert their feedback into principles, and that they could write principles more efficiently, with less mental demand. ConstitutionMaker helped users identify ways to improve the chatbot, formulate their intuitive responses to the model into feedback, and convert this feedback into specific and clear principles. Together, these findings inform future tools that support the interactive critiquing of LLM outputs.
    摘要 大语言模型(LLM)提示是一种有前景的新方法,允许用户创建并定制自己的聊天机器人。然而,当前引导聊天机器人输出的方法(如提示工程和微调)并不支持用户将对模型输出的自然反馈转化为提示或模型的改动。在本工作中,我们探索如何让用户通过反馈交互式地改进模型输出——帮助他们将反馈转化为一组支配模型行为的原则(即 constitution)。通过一项形成性研究,我们(1)发现用户需要支持才能将反馈转化为聊天机器人的原则,并(2)对用户期望的不同原则类型进行了分类。受这些发现启发,我们开发了 ConstitutionMaker,一款将用户反馈转化为原则、用于引导基于 LLM 的聊天机器人的交互式工具。借助 ConstitutionMaker,用户可以用自然语言给出正面或负面反馈、选择自动生成的反馈,或改写聊天机器人的回复;每种反馈模式都会自动生成一条原则并插入聊天机器人的提示中。在 14 名参与者的用户研究中,我们将 ConstitutionMaker 与一个需要用户自行撰写原则的消融版本进行了比较。参与者认为,借助 ConstitutionMaker,他们的原则能更好地引导聊天机器人,反馈更容易转化为原则,撰写原则也更高效、心智负担更低。ConstitutionMaker 帮助用户发现改进聊天机器人的途径,将对模型的直觉性反应表述为反馈,并将这些反馈转化为具体而清晰的原则。这些发现可为未来支持交互式批评 LLM 输出的工具提供参考。

LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery

  • paper_url: http://arxiv.org/abs/2310.18356
  • repo_url: None
  • paper_authors: Tianyi Chen, Tianyu Ding, Badal Yadav, Ilya Zharkov, Luming Liang
  • for: 本研究目的是提出一种高效的语言模型结构剪裁方法,以降低大型语言模型的计算成本。
  • methods: 该方法首先构建 LoRA 模块之间的依赖关系图,以发现最小可移除结构并分析知识分布;随后对 LoRA 适配器进行渐进式结构化剪枝,并启用内在的知识迁移,以更好地保留冗余结构中的信息;为恢复剪枝过程中丢失的知识,还提出了带动态数据适配器的动态微调方案。
  • results: 数值实验表明,该方法仅用一张 GPU、耗时数个 GPU 天,即可将 LLM 的占用空间减少 20%,性能仅下降 1.0%,并显著优于现有最先进方法。
    Abstract Large Language Models (LLMs) have transformed the landscape of artificial intelligence, while their enormous size presents significant challenges in terms of computational costs. We introduce LoRAShear, a novel efficient approach to structurally prune LLMs and recover knowledge. Given general LLMs, LoRAShear first creates the dependency graphs over LoRA modules to discover minimal removal structures and analyze the knowledge distribution. It then proceeds with progressive structured pruning on LoRA adaptors and enables inherent knowledge transfer to better preserve the information in the redundant structures. To recover the lost knowledge during pruning, LoRAShear meticulously studies and proposes a dynamic fine-tuning scheme with dynamic data adaptors to effectively narrow down the performance gap to the full models. Numerical results demonstrate that by only using one GPU within a couple of GPU days, LoRAShear effectively reduced the footprint of LLMs by 20% with only 1.0% performance degradation and significantly outperforms state-of-the-art methods. The source code will be available at https://github.com/microsoft/lorashear.
    摘要 大型语言模型(LLMs)已经改变了人工智能的格局,但其庞大的规模带来了可观的计算成本挑战。我们提出 LoRAShear,一种对 LLMs 进行结构化剪枝并恢复知识的新型高效方法。给定一般的 LLMs,LoRAShear 首先构建 LoRA 模块之间的依赖关系图,以发现最小可移除结构并分析知识分布;随后对 LoRA 适配器进行渐进式结构化剪枝,并启用内在的知识迁移,以更好地保留冗余结构中的信息。为恢复剪枝过程中丢失的知识,LoRAShear 经过细致研究,提出了带动态数据适配器的动态微调方案,能有效缩小与完整模型之间的性能差距。数值结果表明,仅使用一张 GPU、耗时数个 GPU 天,LoRAShear 即可将 LLMs 的占用空间有效减少 20%,性能仅下降 1.0%,并显著优于现有最先进方法。源代码将发布于 https://github.com/microsoft/lorashear。
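
One ingredient of structured pruning over LoRA adaptors can be sketched directly: score each rank-one component of the adapter and drop the weakest, shrinking both factors consistently. The norm-product importance score used here is a common heuristic and an assumption, not LoRAShear's dependency-graph criterion.

```python
# Toy structured pruning of a LoRA adapter: each rank component (a column of
# A paired with a row of B) gets an importance score; the weakest are removed.
import torch

rank, d_in, d_out = 16, 64, 64
A = torch.randn(d_in, rank)      # LoRA down-projection
B = torch.randn(rank, d_out)     # LoRA up-projection

# Importance of rank component r ~ contribution of A[:, r] @ B[r, :].
score = A.norm(dim=0) * B.norm(dim=1)                  # (rank,)
keep = score.argsort(descending=True)[: rank // 2]     # prune 50% of the ranks

A_pruned, B_pruned = A[:, keep], B[keep, :]
err = (A @ B - A_pruned @ B_pruned).norm() / (A @ B).norm()
print(A_pruned.shape, B_pruned.shape, f"relative error {err:.3f}")
# Knowledge recovery would then fine-tune the pruned adapters on data
# selected by the dynamic data adaptors.
```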

FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions

  • paper_url: http://arxiv.org/abs/2310.15421
  • repo_url: https://github.com/skywalker023/fantom
  • paper_authors: Hyunwoo Kim, Melanie Sclar, Xuhui Zhou, Ronan Le Bras, Gunhee Kim, Yejin Choi, Maarten Sap
  • for: 本论文旨在对大型语言模型(LLM)的心智理论(Theory of Mind)能力进行压力测试。
  • methods: 本论文提出了名为 FANToM 的新基准,通过问答在信息不对称的对话情境中考察 LLM 的心智理论能力。
  • results: 研究发现,即使借助思维链推理或微调,最先进的 LLMs 的表现也显著差于人类。
    Abstract Theory of mind (ToM) evaluations currently focus on testing models using passive narratives that inherently lack interactivity. We introduce FANToM, a new benchmark designed to stress-test ToM within information-asymmetric conversational contexts via question answering. Our benchmark draws upon important theoretical requisites from psychology and necessary empirical considerations when evaluating large language models (LLMs). In particular, we formulate multiple types of questions that demand the same underlying reasoning to identify illusory or false sense of ToM capabilities in LLMs. We show that FANToM is challenging for state-of-the-art LLMs, which perform significantly worse than humans even with chain-of-thought reasoning or fine-tuning.
    摘要 当前的心智理论(ToM)评估主要依赖本质上缺乏交互性的被动叙事来测试模型。我们提出 FANToM,一个旨在通过问答、在信息不对称的对话情境中对 ToM 进行压力测试的新基准。该基准借鉴了心理学中的重要理论要件,并纳入了评估大型语言模型(LLMs)时必要的实证考虑。特别地,我们设计了多种需要相同底层推理的问题,用以识别 LLMs 中虚假或错觉性的 ToM 能力。结果表明,FANToM 对最先进的 LLMs 极具挑战性:即使借助思维链推理或微调,其表现仍显著差于人类。

Fractal Landscapes in Policy Optimization

  • paper_url: http://arxiv.org/abs/2310.15418
  • repo_url: None
  • paper_authors: Tao Wang, Sylvia Herbert, Sicun Gao
  • for: 本研究旨在探讨策略梯度方法在连续域深度强化学习控制问题中的一种内在限制。
  • methods: 本研究借鉴混沌理论与非光滑分析技术,分析了策略优化目标函数的最大李雅普诺夫指数(Lyapunov exponent)与赫尔德指数(Hölder exponent)。
  • results: 研究发现,对某些类别的 MDP,策略优化的优化地形可能极度非光滑甚至呈分形,以致根本不存在可供估计的梯度;实验表明,策略优化的一些失败案例可由这种分形地形解释。
    Abstract Policy gradient lies at the core of deep reinforcement learning (RL) in continuous domains. Despite much success, it is often observed in practice that RL training with policy gradient can fail for many reasons, even on standard control problems with known solutions. We propose a framework for understanding one inherent limitation of the policy gradient approach: the optimization landscape in the policy space can be extremely non-smooth or fractal for certain classes of MDPs, such that there does not exist gradient to be estimated in the first place. We draw on techniques from chaos theory and non-smooth analysis, and analyze the maximal Lyapunov exponents and H\"older exponents of the policy optimization objectives. Moreover, we develop a practical method that can estimate the local smoothness of objective function from samples to identify when the training process has encountered fractal landscapes. We show experiments to illustrate how some failure cases of policy optimization can be explained by such fractal landscapes.
    摘要 策略梯度是连续域深度强化学习(RL)的核心。尽管成果颇丰,实践中常常观察到,即使在已有已知解的标准控制问题上,基于策略梯度的 RL 训练也会因种种原因失败。我们提出一个框架来理解策略梯度方法的一种内在限制:对某些类别的马尔可夫决策过程(MDP),策略空间中的优化地形可能极度非光滑甚至呈分形,以致根本不存在可供估计的梯度。我们借鉴混沌理论与非光滑分析技术,分析策略优化目标的最大李雅普诺夫指数与赫尔德指数。此外,我们提出一种实用方法,可从样本中估计目标函数的局部光滑性,以识别训练过程何时遇到分形地形。我们通过实验说明,策略优化的一些失败案例可由此类分形地形解释。
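
The sample-based smoothness probe mentioned in the abstract can be illustrated by fitting the exponent h in |J(θ+δ) − J(θ)| ≈ C·|δ|^h over a range of perturbation sizes: h near 1 suggests a locally Lipschitz objective, while h well below 1 warns that a finite-difference "gradient" may be meaningless. The Weierstrass-like objective below is a toy stand-in for a policy-optimization landscape.

```python
# Estimate a local Hölder-type exponent from samples: regress log-differences
# of the objective against log-perturbation sizes.
import numpy as np

def J(theta: float) -> float:
    # Weierstrass-like toy objective: continuous everywhere but very rough.
    return sum(0.6 ** k * np.cos(3 ** k * theta) for k in range(14))

def holder_exponent(theta: float) -> float:
    sizes = np.logspace(-5, -2, 12)
    diffs = [abs(J(theta + s) - J(theta)) + 1e-30 for s in sizes]
    slope, _ = np.polyfit(np.log(sizes), np.log(diffs), 1)
    return slope

print(holder_exponent(0.3))   # typically well below 1 on this rough objective
```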

Nominality Score Conditioned Time Series Anomaly Detection by Point/Sequential Reconstruction

  • paper_url: http://arxiv.org/abs/2310.15416
  • repo_url: https://github.com/andrewlai61616/npsr
  • paper_authors: Chih-Yu Lai, Fan-Keng Sun, Zhengqi Gao, Jeffrey H. Lang, Duane S. Boning
  • for: 本研究旨在提出一种结合点重建与序列重建的无监督时间序列异常检测方法,以应对时间序列异常检测中模式复杂多样的难题。
  • methods: 本研究构建了一个点重建模型和一个序列重建模型:点重建模型用于量化点异常,序列重建模型则同时量化点异常与上下文异常;在将观测时间点视为名义时间点的两阶段偏移值这一形式化下,由两种重建误差的组合值之比计算名义性得分,并进一步融合得到诱导异常得分。
  • results: 在多个公开数据集上的大量实验表明,该方法优于大多数最先进的时间序列异常检测基线。
    Abstract Time series anomaly detection is challenging due to the complexity and variety of patterns that can occur. One major difficulty arises from modeling time-dependent relationships to find contextual anomalies while maintaining detection accuracy for point anomalies. In this paper, we propose a framework for unsupervised time series anomaly detection that utilizes point-based and sequence-based reconstruction models. The point-based model attempts to quantify point anomalies, and the sequence-based model attempts to quantify both point and contextual anomalies. Under the formulation that the observed time point is a two-stage deviated value from a nominal time point, we introduce a nominality score calculated from the ratio of a combined value of the reconstruction errors. We derive an induced anomaly score by further integrating the nominality score and anomaly score, then theoretically prove the superiority of the induced anomaly score over the original anomaly score under certain conditions. Extensive studies conducted on several public datasets show that the proposed framework outperforms most state-of-the-art baselines for time series anomaly detection.
    摘要 由于可能出现的模式复杂多样,时间序列异常检测颇具挑战性。一个主要困难在于:既要建模时间相关关系以发现上下文异常,又要保持对点异常的检测精度。本文提出一种无监督时间序列异常检测框架,同时利用基于点的重建模型与基于序列的重建模型:前者用于量化点异常,后者同时量化点异常与上下文异常。在将观测时间点视为名义时间点的两阶段偏移值这一形式化下,我们引入由两种重建误差组合值之比计算的名义性得分;通过进一步融合名义性得分与异常得分,我们导出诱导异常得分,并从理论上证明在一定条件下诱导异常得分优于原始异常得分。在多个公开数据集上的大量实验表明,该框架优于大多数最先进的时间序列异常检测基线。
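
A simplified rendering of the scoring scheme: the ratio of point-model to sequence-model reconstruction error gives a nominality score (high when only the sequence model is surprised), and weighting the anomaly score by the nominality of the recent past boosts anomalies that follow nominal stretches. The exact formulas and the theoretical conditions are in the paper; the combination below is an assumption for illustration.

```python
# Toy nominality-conditioned scoring: seq-model error serves as the base
# anomaly score, and the recent-past nominality (point/seq error ratio)
# modulates it into an induced score.
import numpy as np

rng = np.random.default_rng(1)
T = 200
pt_err  = rng.uniform(0.05, 0.15, T)    # point-model reconstruction error (toy)
seq_err = rng.uniform(0.05, 0.15, T)    # sequence-model reconstruction error
seq_err[120:130] += 1.0                 # a contextual anomaly the point model misses

anomaly    = seq_err                                # base anomaly score
nominality = (pt_err + 1e-8) / (seq_err + 1e-8)     # ~1 when nominal, small in anomalies

w = 10
prev = np.array([nominality[max(0, t - w):t].mean() if t else 1.0 for t in range(T)])
induced = anomaly * prev                # boost anomalies following nominal stretches

print(int(np.argmax(induced)))          # flags the onset of the injected window (t=120)
```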

Mind the Gap Between Conversations for Improved Long-Term Dialogue Generation

  • paper_url: http://arxiv.org/abs/2310.15415
  • repo_url: https://github.com/qzx7/mindthetime
  • paper_authors: Qiang Zhang, Jason Naradowsky, Yusuke Miyao
  • for: 这篇论文目的是让对话模型意识到时间的概念,并在不同的时间间隔下进行对话。
  • methods: 作者构建了名为 GapChat 的多会话对话数据集,其中各会话之间的时间间隔各不相同;数据集在实时构建的同时模拟说话者生活中事件的进展,并向模型暴露时间信息,比较时间与事件进展的不同表示方式。
  • results: 人工评估表明,在所选话题的相关性和从对话中获取的信息量等指标上,具备时间意识的模型表现更好。
    Abstract Knowing how to end and resume conversations over time is a natural part of communication, allowing for discussions to span weeks, months, or years. The duration of gaps between conversations dictates which topics are relevant and which questions to ask, and dialogue systems which do not explicitly model time may generate responses that are unnatural. In this work we explore the idea of making dialogue models aware of time, and present GapChat, a multi-session dialogue dataset in which the time between each session varies. While the dataset is constructed in real-time, progress on events in speakers' lives is simulated in order to create realistic dialogues occurring across a long timespan. We expose time information to the model and compare different representations of time and event progress. In human evaluation we show that time-aware models perform better in metrics that judge the relevance of the chosen topics and the information gained from the conversation.
    摘要 懂得如何随时间推移结束并恢复会话是交流的自然组成部分,它使讨论得以跨越数周、数月乃至数年。会话之间间隔的时长决定了哪些话题仍然相关、应当提出哪些问题,而不显式建模时间的对话系统可能生成不自然的回复。在本工作中,我们探索让对话模型具备时间意识的想法,并提出 GapChat——一个各会话之间时间间隔不等的多会话对话数据集。该数据集在实时构建的同时,通过模拟说话者生活中事件的进展,构造跨越较长时间段的真实对话。我们向模型暴露时间信息,并比较时间与事件进展的不同表示方式。人工评估表明,在衡量所选话题相关性与对话信息增益的指标上,具备时间意识的模型表现更好。

Diverse Conventions for Human-AI Collaboration

  • paper_url: http://arxiv.org/abs/2310.15414
  • repo_url: https://github.com/Stanford-ILIAD/Diverse-Conventions
  • paper_authors: Bidipta Sarkar, Andy Shih, Dorsa Sadigh
  • for: 生成多样化的惯例(conventions),以提升合作多智能体游戏中与新伙伴协作的泛化能力——惯例使玩家无需显式交流即可就共享策略进行协调。
  • methods: 在自博弈中最大化各惯例的奖励,同时在与先前发现的惯例进行交叉博弈(cross-play)时最小化其奖励,促使惯例在语义上相互区分;并引入混合博弈(mixed-play)以确保学到的策略仍以善意行事。
  • results: 在包括 Overcooked 在内的多种合作游戏中,该技术能够适应人类的惯例,与真实用户配对时超越人类水平的表现。
    Abstract Conventions are crucial for strong performance in cooperative multi-agent games, because they allow players to coordinate on a shared strategy without explicit communication. Unfortunately, standard multi-agent reinforcement learning techniques, such as self-play, converge to conventions that are arbitrary and non-diverse, leading to poor generalization when interacting with new partners. In this work, we present a technique for generating diverse conventions by (1) maximizing their rewards during self-play, while (2) minimizing their rewards when playing with previously discovered conventions (cross-play), stimulating conventions to be semantically different. To ensure that learned policies act in good faith despite the adversarial optimization of cross-play, we introduce \emph{mixed-play}, where an initial state is randomly generated by sampling self-play and cross-play transitions and the player learns to maximize the self-play reward from this initial state. We analyze the benefits of our technique on various multi-agent collaborative games, including Overcooked, and find that our technique can adapt to the conventions of humans, surpassing human-level performance when paired with real users.
    摘要 在合作多智能体游戏中,惯例(conventions)对取得优异表现至关重要,因为它们使玩家无需显式交流即可就共享策略进行协调。然而,标准的多智能体强化学习技术(如自博弈)会收敛到任意且缺乏多样性的惯例,导致在与新伙伴交互时泛化能力较差。在本工作中,我们提出一种生成多样化惯例的技术:(1)在自博弈中最大化惯例的奖励,同时(2)在与先前发现的惯例进行交叉博弈(cross-play)时最小化其奖励,从而促使各惯例在语义上相互区分。为了确保学到的策略在交叉博弈的对抗性优化下仍以善意行事,我们引入混合博弈(mixed-play):初始状态通过采样自博弈与交叉博弈的转移随机生成,玩家学习从该初始状态出发最大化自博弈奖励。我们在包括 Overcooked 在内的多种多智能体合作游戏上分析了该技术的优势,发现它能够适应人类的惯例,在与真实用户配对时超越人类水平的表现。
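
The diversity objective has a compact shape: keep the candidate's self-play return high while pushing down its best cross-play return against previously discovered conventions. The stubbed evaluation below only illustrates that shape; rollouts, mixed-play resets, and the good-faith constraints are omitted.

```python
# Sketch of the diverse-conventions objective: self-play return minus the
# best cross-play return against the pool of existing conventions.
def evaluate(policy_a, policy_b) -> float:
    """Stand-in for rolling out the pair in the environment."""
    return float(policy_a == policy_b)   # toy: identical conventions coordinate

def diversity_objective(new_policy, pool, lam: float = 1.0) -> float:
    self_play = evaluate(new_policy, new_policy)
    cross_play = max((evaluate(new_policy, p) for p in pool), default=0.0)
    return self_play - lam * cross_play  # maximized during training

pool = ["conv_A", "conv_B"]              # previously discovered conventions
for cand in ["conv_A", "conv_C"]:
    print(cand, diversity_objective(cand, pool))
# conv_A scores 0 (it collides with an existing convention); conv_C scores 1.
```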