cs.AI - 2023-10-01

OceanNet: A principled neural operator-based digital twin for regional oceans

  • paper_url: http://arxiv.org/abs/2310.00813
  • repo_url: None
  • paper_authors: Ashesh Chattopadhyay, Michael Gray, Tianning Wu, Anna B. Lowe, Ruoying He
  • for: This work aims to develop a neural operator-based digital twin for regional ocean circulation forecasting.
  • methods: The model combines a Fourier neural operator with a predictor-evaluate-corrector integration scheme to mitigate autoregressive error growth and improve stability over extended time scales; a spectral regularizer counteracts spectral bias at smaller scales (a small spectral-penalty sketch follows the abstract below).
  • results: Applied to the northwest Atlantic western boundary current (the Gulf Stream), the model produces seasonal forecasts of Loop Current eddies and the Gulf Stream meander. It shows forecast skill competitive with an uncoupled, state-of-the-art dynamical ocean model while reducing computation by a factor of 500,000, suggesting that physics-inspired deep neural operators can be cost-effective alternatives to high-resolution numerical ocean models.
    Abstract While data-driven approaches demonstrate great potential in atmospheric modeling and weather forecasting, ocean modeling poses distinct challenges due to complex bathymetry, land, vertical structure, and flow non-linearity. This study introduces OceanNet, a principled neural operator-based digital twin for ocean circulation. OceanNet uses a Fourier neural operator and predictor-evaluate-corrector integration scheme to mitigate autoregressive error growth and enhance stability over extended time scales. A spectral regularizer counteracts spectral bias at smaller scales. OceanNet is applied to the northwest Atlantic Ocean western boundary current (the Gulf Stream), focusing on the task of seasonal prediction for Loop Current eddies and the Gulf Stream meander. Trained using historical sea surface height (SSH) data, OceanNet demonstrates competitive forecast skill by outperforming SSH predictions by an uncoupled, state-of-the-art dynamical ocean model forecast, reducing computation by 500,000 times. These accomplishments demonstrate the potential of physics-inspired deep neural operators as cost-effective alternatives to high-resolution numerical ocean models.
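The spectral regularizer mentioned above lends itself to a short illustration. The sketch below computes an azimuthally averaged power spectrum of a 2D field (e.g., one SSH snapshot) and penalizes the mismatch between predicted and target spectra above a wavenumber threshold; the cutoff `k_min` and the log-spectrum loss form are illustrative assumptions, not OceanNet's exact formulation.

```python
import numpy as np

def radial_power_spectrum(field):
    """Azimuthally averaged power spectrum of a 2D field (e.g., one SSH snapshot)."""
    f = np.fft.fftshift(np.fft.fft2(field))
    power = np.abs(f) ** 2
    ny, nx = field.shape
    ky, kx = np.meshgrid(np.arange(ny) - ny // 2, np.arange(nx) - nx // 2, indexing="ij")
    k = np.rint(np.sqrt(kx ** 2 + ky ** 2)).astype(int)
    spec = np.bincount(k.ravel(), weights=power.ravel())
    counts = np.bincount(k.ravel())
    return spec / np.maximum(counts, 1)

def spectral_penalty(pred, target, k_min=10):
    """Mismatch of the log power spectra above wavenumber k_min (small scales)."""
    sp, st = radial_power_spectrum(pred), radial_power_spectrum(target)
    return float(np.mean((np.log1p(sp[k_min:]) - np.log1p(st[k_min:])) ** 2))

rng = np.random.default_rng(0)
pred, target = rng.standard_normal((64, 64)), rng.standard_normal((64, 64))
print(spectral_penalty(pred, target))  # would be added to the forecasting loss
```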

Sparse Backpropagation for MoE Training

  • paper_url: http://arxiv.org/abs/2310.00811
  • repo_url: None
  • paper_authors: Liyuan Liu, Jianfeng Gao, Weizhu Chen
  • for: This paper addresses gradient computation in Mixture-of-Experts (MoE) models, where expert routing enables sparse computation and excellent scalability but clashes with the dense computation that backpropagation requires.
  • methods: It proposes SparseMixer, a scalable gradient estimator that provides reliable gradient estimates for MoE training instead of neglecting certain gradient terms. Grounded in a numerical ODE framework, SparseMixer uses the mid-point method, a second-order ODE solver, to deliver accurate gradient approximations with negligible computational overhead (the mid-point idea is illustrated after the abstract below).
  • results: Applied to Switch Transformer on pre-training and machine translation tasks, SparseMixer yields considerable performance gains and accelerates training convergence by up to 2x.
    Abstract One defining characteristic of Mixture-of-Expert (MoE) models is their capacity for conducting sparse computation via expert routing, leading to remarkable scalability. However, backpropagation, the cornerstone of deep learning, requires dense computation, thereby posting challenges in MoE gradient computations. Here, we introduce SparseMixer, a scalable gradient estimator that bridges the gap between backpropagation and sparse expert routing. Unlike typical MoE training which strategically neglects certain gradient terms for the sake of sparse computation and scalability, SparseMixer provides scalable gradient approximations for these terms, enabling reliable gradient estimation in MoE training. Grounded in a numerical ODE framework, SparseMixer harnesses the mid-point method, a second-order ODE solver, to deliver precise gradient approximations with negligible computational overhead. Applying SparseMixer to Switch Transformer on both pre-training and machine translation tasks, SparseMixer showcases considerable performance gain, accelerating training convergence up to 2 times.
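SparseMixer's gradient approximation is grounded in the mid-point method, a second-order ODE solver. The estimator itself is beyond a short sketch, but the numerical idea it builds on is easy to show: compared with a first-order Euler step, evaluating the derivative at the half-way point gives a noticeably more accurate update at essentially the same cost. The toy integration below is a generic illustration of the mid-point method, not SparseMixer's estimator.

```python
def euler_step(f, x, h):
    """First-order (Euler) step: x + h * f(x)."""
    return x + h * f(x)

def midpoint_step(f, x, h):
    """Second-order mid-point step: evaluate f at the half-way point."""
    return x + h * f(x + 0.5 * h * f(x))

# Integrate dx/dt = x from x(0) = 1 over [0, 1]; the exact answer is e ~ 2.71828.
f = lambda x: x
for step in (euler_step, midpoint_step):
    x, h = 1.0, 0.1
    for _ in range(10):
        x = step(f, x, h)
    print(step.__name__, x)  # Euler ~ 2.594, mid-point ~ 2.714
```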

Towards Causal Foundation Model: on Duality between Causal Inference and Attention

  • paper_url: http://arxiv.org/abs/2310.00809
  • repo_url: None
  • paper_authors: Jiaqi Zhang, Joel Jennings, Cheng Zhang, Chao Ma
  • for: This paper takes a first step towards building causally-aware foundation models for complex tasks such as causal inference.
  • methods: It proposes a novel, theoretically sound method called Causal Inference with Attention (CInA), which performs self-supervised causal learning on multiple unlabeled datasets and then enables zero-shot causal inference on unseen tasks with new data.
  • results: Experiments show that CInA performs zero-shot causal inference through the final layer of a trained transformer-type architecture and generalizes effectively to out-of-distribution and various real-world datasets, matching or surpassing per-dataset causal inference methods.
    Abstract Foundation models have brought changes to the landscape of machine learning, demonstrating sparks of human-level intelligence across a diverse array of tasks. However, a gap persists in complex tasks such as causal inference, primarily due to challenges associated with intricate reasoning steps and high numerical precision requirements. In this work, we take a first step towards building causally-aware foundation models for complex tasks. We propose a novel, theoretically sound method called Causal Inference with Attention (CInA), which utilizes multiple unlabeled datasets to perform self-supervised causal learning, and subsequently enables zero-shot causal inference on unseen tasks with new data. This is based on our theoretical results that demonstrate the primal-dual connection between optimal covariate balancing and self-attention, facilitating zero-shot causal inference through the final layer of a trained transformer-type architecture. We demonstrate empirically that our approach CInA effectively generalizes to out-of-distribution datasets and various real-world datasets, matching or even surpassing traditional per-dataset causal inference methodologies.

Knowledge Engineering for Wind Energy

  • paper_url: http://arxiv.org/abs/2310.00804
  • repo_url: https://github.com/Planet21century/TECHALDO.
  • paper_authors: Yuriy Marykovskiy, Thomas Clark, Justin Day, Marcus Wiens, Charles Henderson, Julian Quick, Imad Abdallah, Anna Maria Sempreviva, Jean-Paul Calbimonte, Eleni Chatzi, Sarah Barber
  • for: This work aims to help wind energy domain experts convert data into domain knowledge, connect and integrate it with other knowledge sources, and make it available for next-generation artificially intelligent systems.
  • methods: The article applies knowledge engineering to the digital transformation of the wind energy sector, presenting the main concepts underpinning knowledge-based systems and summarizing previous work on knowledge engineering and knowledge representation in a form that is relevant and accessible to domain experts.
  • results: A systematic analysis of the state of the art in knowledge engineering for wind energy is performed, putting available tools into perspective by establishing the main domain actors and their needs and identifying key problem areas; guidelines for further development and improvement are provided.
    Abstract With the rapid evolution of the wind energy sector, there is an ever-increasing need to create value from the vast amounts of data made available both from within the domain, as well as from other sectors. This article addresses the challenges faced by wind energy domain experts in converting data into domain knowledge, connecting and integrating it with other sources of knowledge, and making it available for use in next generation artificially intelligent systems. To this end, this article highlights the role that knowledge engineering can play in the process of digital transformation of the wind energy sector. It presents the main concepts underpinning Knowledge-Based Systems and summarises previous work in the areas of knowledge engineering and knowledge representation in a manner that is relevant and accessible to domain experts. A systematic analysis of the current state-of-the-art on knowledge engineering in the wind energy domain is performed, with available tools put into perspective by establishing the main domain actors and their needs and identifying key problematic areas. Finally, guidelines for further development and improvement are provided.

GraphPatcher: Mitigating Degree Bias for Graph Neural Networks via Test-time Augmentation

  • paper_url: http://arxiv.org/abs/2310.00800
  • repo_url: https://github.com/jumxglhf/graphpatcher
  • paper_authors: Mingxuan Ju, Tong Zhao, Wenhao Yu, Neil Shah, Yanfang Ye
  • for: Improving the test-time generalization of graph neural networks (GNNs), in particular their performance on low-degree nodes.
  • methods: A test-time augmentation framework named GraphPatcher, which iteratively generates virtual nodes to patch artificially corrupted low-degree nodes and progressively reconstructs the target GNN's predictions over a sequence of increasingly corrupted nodes (the corruption step is sketched after the abstract below).
  • results: Extensive experiments on seven benchmark datasets show that GraphPatcher consistently improves the overall performance of common GNNs by up to 3.6% and their low-degree performance by up to 6.5%, significantly outperforming state-of-the-art baselines.
    Abstract Recent studies have shown that graph neural networks (GNNs) exhibit strong biases towards the node degree: they usually perform satisfactorily on high-degree nodes with rich neighbor information but struggle with low-degree nodes. Existing works tackle this problem by deriving either designated GNN architectures or training strategies specifically for low-degree nodes. Though effective, these approaches unintentionally create an artificial out-of-distribution scenario, where models mainly or even only observe low-degree nodes during the training, leading to a downgraded performance for high-degree nodes that GNNs originally perform well at. In light of this, we propose a test-time augmentation framework, namely GraphPatcher, to enhance test-time generalization of any GNNs on low-degree nodes. Specifically, GraphPatcher iteratively generates virtual nodes to patch artificially created low-degree nodes via corruptions, aiming at progressively reconstructing target GNN's predictions over a sequence of increasingly corrupted nodes. Through this scheme, GraphPatcher not only learns how to enhance low-degree nodes (when the neighborhoods are heavily corrupted) but also preserves the original superior performance of GNNs on high-degree nodes (when lightly corrupted). Additionally, GraphPatcher is model-agnostic and can also mitigate the degree bias for either self-supervised or supervised GNNs. Comprehensive experiments are conducted over seven benchmark datasets and GraphPatcher consistently enhances common GNNs' overall performance by up to 3.6% and low-degree performance by up to 6.5%, significantly outperforming state-of-the-art baselines. The source code is publicly available at https://github.com/jumxglhf/GraphPatcher.
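To make the corruption scheme concrete, the toy sketch below produces increasingly corrupted (lower-degree) views of a node's neighborhood, which is the kind of sequence GraphPatcher learns to patch with generated virtual nodes. The real implementation (see the linked repository) works on GNN ego-graphs and learns the patching nodes; this only shows the neighborhood-corruption side.

```python
import random

def corrupt_neighborhood(adj, node, keep_ratio, rng):
    """Keep only a fraction of `node`'s edges, simulating an artificially
    low-degree node (smaller keep_ratio = heavier corruption)."""
    neighbors = adj[node]
    k = max(1, int(len(neighbors) * keep_ratio))
    return sorted(rng.sample(neighbors, k))

# Toy graph: node 0 is a high-degree node; produce a sequence of corrupted views.
adj = {0: [1, 2, 3, 4, 5, 6, 7, 8]}
rng = random.Random(42)
for keep_ratio in (1.0, 0.5, 0.25):
    print(keep_ratio, corrupt_neighborhood(adj, 0, keep_ratio, rng))
```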

A Comprehensive Review of Generative AI in Healthcare

  • paper_url: http://arxiv.org/abs/2310.00795
  • repo_url: None
  • paper_authors: Yasin Shokrollahi, Sahar Yarmohammadtoosky, Matthew M. Nikahd, Pengfei Dong, Xianqi Li, Linxia Gu
  • for: This review surveys applications of generative artificial intelligence (AI) in healthcare, focusing on transformers and diffusion models.
  • methods: The surveyed application areas include medical imaging analysis, protein structure prediction, clinical documentation, diagnostic assistance, radiology interpretation, clinical decision support, medical coding and billing, and drug design and molecular representation.
  • results: The review summarizes progress of generative AI across these healthcare applications, including image reconstruction, image-to-image translation, image generation and classification, protein structure prediction, and clinical diagnosis and decision support, and proposes directions for future research to address current limitations and meet the evolving demands of the healthcare sector.
    Abstract The advancement of Artificial Intelligence (AI) has catalyzed revolutionary changes across various sectors, notably in healthcare. Among the significant developments in this field are the applications of generative AI models, specifically transformers and diffusion models. These models have played a crucial role in analyzing diverse forms of data, including medical imaging (encompassing image reconstruction, image-to-image translation, image generation, and image classification), protein structure prediction, clinical documentation, diagnostic assistance, radiology interpretation, clinical decision support, medical coding, and billing, as well as drug design and molecular representation. Such applications have enhanced clinical diagnosis, data reconstruction, and drug synthesis. This review paper aims to offer a thorough overview of the generative AI applications in healthcare, focusing on transformers and diffusion models. Additionally, we propose potential directions for future research to tackle the existing limitations and meet the evolving demands of the healthcare sector. Intended to serve as a comprehensive guide for researchers and practitioners interested in the healthcare applications of generative AI, this review provides valuable insights into the current state of the art, challenges faced, and prospective future directions.

  • paper_url: http://arxiv.org/abs/2310.00793
  • repo_url: https://github.com/uisim2020/uisim2020
  • paper_authors: Haitao Mao, Juanhui Li, Harry Shomer, Bingheng Li, Wenqi Fan, Yao Ma, Tong Zhao, Neil Shah, Jiliang Tang
  • for: This work explores principles of link prediction that hold across datasets from diverse domains, taking a data-centric perspective to improve the generality of link prediction models.
  • methods: Three fundamental factors for link prediction are analyzed: local structural proximity, global structural proximity, and feature proximity (simple instantiations of the three factors are sketched after the abstract below).
  • results: The study finds that global structural proximity is only effective when local structural proximity is deficient, and that feature proximity and structural proximity can be incompatible, which causes GNNs for Link Prediction (GNN4LP) to consistently underperform on edges where the feature proximity factor dominates.
    Abstract Link prediction, a fundamental task on graphs, has proven indispensable in various applications, e.g., friend recommendation, protein analysis, and drug interaction prediction. However, since datasets span a multitude of domains, they could have distinct underlying mechanisms of link formation. Evidence in existing literature underscores the absence of a universally best algorithm suitable for all datasets. In this paper, we endeavor to explore principles of link prediction across diverse datasets from a data-centric perspective. We recognize three fundamental factors critical to link prediction: local structural proximity, global structural proximity, and feature proximity. We then unearth relationships among those factors where (i) global structural proximity only shows effectiveness when local structural proximity is deficient. (ii) The incompatibility can be found between feature and structural proximity. Such incompatibility leads to GNNs for Link Prediction (GNN4LP) consistently underperforming on edges where the feature proximity factor dominates. Inspired by these new insights from a data perspective, we offer practical instruction for GNN4LP model design and guidelines for selecting appropriate benchmark datasets for more comprehensive evaluations.
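The three factors named above can be made concrete with simple heuristics: common neighbors for local structural proximity, inverse shortest-path distance for global structural proximity, and cosine similarity for feature proximity. The sketch below uses these particular heuristics as illustrative stand-ins; the paper analyzes broader families of each factor.

```python
import numpy as np

def local_proximity(adj, u, v):
    """Local structural proximity: number of common neighbors."""
    return len(adj[u] & adj[v])

def global_proximity(adj, u, v, max_hops=6):
    """Global structural proximity: inverse shortest-path distance (BFS)."""
    frontier, seen, d = {u}, {u}, 0
    while frontier and d < max_hops:
        if v in frontier:
            return 1.0 / max(d, 1)
        d += 1
        frontier = {w for n in frontier for w in adj[n]} - seen
        seen |= frontier
    return 1.0 / d if v in frontier else 0.0

def feature_proximity(x_u, x_v):
    """Feature proximity: cosine similarity of node features."""
    return float(np.dot(x_u, x_v) / (np.linalg.norm(x_u) * np.linalg.norm(x_v)))

adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
print(local_proximity(adj, 0, 3), global_proximity(adj, 0, 3),
      feature_proximity(np.array([1.0, 0.0]), np.array([1.0, 1.0])))
```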

Towards a Universal Understanding of Color Harmony: Fuzzy Approach

  • paper_url: http://arxiv.org/abs/2310.00791
  • repo_url: None
  • paper_authors: Pakizar Shamoi, Muragul Muratbekova, Assylzhan Izbassar, Atsushi Inoue, Hiroharu Kawanaka
  • for: explore color harmony using a fuzzy-based color model and evaluate its universality
  • methods: use a dataset of attractive images from five different domains, apply a fuzzy approach to identify harmony patterns and dominant color palettes
  • results: color harmony is largely universal, influenced by hue relationships, saturation, and intensity of colors, with prevalent adherence to color wheel principles in palettes with high harmony levels.
    Abstract Harmony level prediction is receiving increasing attention nowadays. Color plays a crucial role in affecting human aesthetic responses. In this paper, we explore color harmony using a fuzzy-based color model and address the question of its universality. For our experiments, we utilize a dataset containing attractive images from five different domains: fashion, art, nature, interior design, and brand logos. We aim to identify harmony patterns and dominant color palettes within these images using a fuzzy approach. It is well-suited for this task because it can handle the inherent subjectivity and contextual variability associated with aesthetics and color harmony evaluation. Our experimental results suggest that color harmony is largely universal. Additionally, our findings reveal that color harmony is not solely influenced by hue relationships on the color wheel but also by the saturation and intensity of colors. In palettes with high harmony levels, we observed a prevalent adherence to color wheel principles while maintaining moderate levels of saturation and intensity. These findings contribute to ongoing research on color harmony and its underlying principles, offering valuable insights for designers, artists, and researchers in the field of aesthetics.
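The fuzzy treatment of color can be illustrated with membership functions over the hue wheel. This summary does not specify the paper's fuzzy color model, so the category boundaries below are assumptions for illustration only.

```python
def triangular(x, a, b, c):
    """Triangular fuzzy membership: 0 outside [a, c], peaks at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Illustrative fuzzy sets over the hue wheel (degrees); boundaries are assumptions.
HUE_SETS = {"red": (-30, 0, 30), "yellow": (30, 60, 90), "green": (90, 120, 150),
            "cyan": (150, 180, 210), "blue": (210, 240, 270), "magenta": (270, 300, 330)}

def hue_memberships(hue):
    """Fuzzy membership of a hue (0-360) in each color category, with wrap-around."""
    return {name: max(triangular(hue, *abc), triangular(hue - 360, *abc))
            for name, abc in HUE_SETS.items()}

print(hue_memberships(45))  # partially "yellow" (membership 0.5) at this hue
```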

BooookScore: A systematic exploration of book-length summarization in the era of LLMs

  • paper_url: http://arxiv.org/abs/2310.00785
  • repo_url: https://github.com/lilakk/booookscore
  • paper_authors: Yapei Chang, Kyle Lo, Tanya Goyal, Mohit Iyyer
  • for: This paper develops a method to evaluate the coherence of book-length summaries generated by large language models (LLMs), addressing the lack of datasets and evaluation methods for summarizing long documents.
  • methods: Two prompting workflows are used to generate book-length summaries: (1) hierarchically merging chunk-level summaries, and (2) incrementally updating a running summary (both workflows are sketched after the abstract below). An automatic metric, BooookScore, measures the coherence of the summaries.
  • results: Human annotations on GPT-4-generated summaries of 100 recently published books reveal eight common types of coherence errors made by LLMs. Closed-source LLMs such as GPT-4 and Claude 2 produce summaries with higher BooookScore than the often-repetitive ones generated by LLaMA 2, and incremental updating yields lower BooookScore but a higher level of detail than hierarchical merging, a trade-off sometimes preferred by human annotators.
    Abstract Summarizing book-length documents (>100K tokens) that exceed the context window size of large language models (LLMs) requires first breaking the input document into smaller chunks and then prompting an LLM to merge, update, and compress chunk-level summaries. Despite the complexity and importance of this task, it has yet to be meaningfully studied due to the challenges of evaluation: existing book-length summarization datasets (e.g., BookSum) are in the pretraining data of most public LLMs, and existing evaluation methods struggle to capture errors made by modern LLM summarizers. In this paper, we present the first study of the coherence of LLM-based book-length summarizers implemented via two prompting workflows: (1) hierarchically merging chunk-level summaries, and (2) incrementally updating a running summary. We obtain 1193 fine-grained human annotations on GPT-4 generated summaries of 100 recently-published books and identify eight common types of coherence errors made by LLMs. Because human evaluation is expensive and time-consuming, we develop an automatic metric, BooookScore, that measures the proportion of sentences in a summary that do not contain any of the identified error types. BooookScore has high agreement with human annotations and allows us to systematically evaluate the impact of many other critical parameters (e.g., chunk size, base LLM) while saving $15K and 500 hours in human evaluation costs. We find that closed-source LLMs such as GPT-4 and Claude 2 produce summaries with higher BooookScore than the oft-repetitive ones generated by LLaMA 2. Incremental updating yields lower BooookScore but higher level of detail than hierarchical merging, a trade-off sometimes preferred by human annotators. We release code and annotations after blind review to spur more principled research on book-length summarization.
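The two prompting workflows and the metric definition translate into short pseudocode. In the sketch below, `llm` is any callable that maps a prompt to text (the prompts here only paraphrase the described workflows), and `has_error` stands in for the automatic error detector behind BooookScore.

```python
def hierarchical_merge(chunks, llm, group_size=2):
    """Hierarchically merge chunk-level summaries until one summary remains."""
    summaries = [llm(f"Summarize this passage:\n{c}") for c in chunks]
    while len(summaries) > 1:
        summaries = [
            llm("Merge these summaries into one coherent summary:\n"
                + "\n".join(summaries[i:i + group_size]))
            for i in range(0, len(summaries), group_size)
        ]
    return summaries[0]

def incremental_update(chunks, llm):
    """Incrementally update a running summary with each new chunk."""
    summary = ""
    for c in chunks:
        summary = llm(f"Current summary:\n{summary}\n\nUpdate it with this passage:\n{c}")
    return summary

def booookscore(sentences, has_error):
    """Proportion of summary sentences containing none of the identified error types."""
    return sum(not has_error(s) for s in sentences) / len(sentences)

# Dummy stand-in for an actual LLM call, just to exercise the control flow.
dummy_llm = lambda prompt: prompt.splitlines()[-1][:60]
print(hierarchical_merge(["chapter one ...", "chapter two ...", "chapter three ..."], dummy_llm))
print(booookscore(["Good sentence.", "Contradicts earlier events."],
                  has_error=lambda s: "Contradicts" in s))
```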

Mining Java Memory Errors using Subjective Interesting Subgroups with Hierarchical Targets

  • paper_url: http://arxiv.org/abs/2310.00781
  • repo_url: https://github.com/remilyoucef/sca-miner
  • paper_authors: Youcef Remil, Anes Bendimerad, Mathieu Chambard, Romain Mathonat, Marc Plantevit, Mehdi Kaytoue
  • for: This paper targets the maintenance of software applications, in particular Enterprise Resource Planning (ERP) systems.
  • methods: It proposes a novel Subgroup Discovery (SD) approach that automatically mines incident data and extracts discriminant patterns to identify root causes, handling complex target concepts whose attributes are organized hierarchically.
  • results: An empirical study on Java out-of-memory incidents validates the effectiveness of the approach and the quality of the identified patterns.
    Abstract Software applications, especially Enterprise Resource Planning (ERP) systems, are crucial to the day-to-day operations of many industries. Therefore, it is essential to maintain these systems effectively using tools that can identify, diagnose, and mitigate their incidents. One promising data-driven approach is the Subgroup Discovery (SD) technique, a data mining method that can automatically mine incident datasets and extract discriminant patterns to identify the root causes of issues. However, current SD solutions have limitations in handling complex target concepts with multiple attributes organized hierarchically. To illustrate this scenario, we examine the case of Java out-of-memory incidents among several possible applications. We have a dataset that describes these incidents, including their context and the types of Java objects occupying memory when it reaches saturation, with these types arranged hierarchically. This scenario inspires us to propose a novel Subgroup Discovery approach that can handle complex target concepts with hierarchies. To achieve this, we design a pattern syntax and a quality measure that ensure the identified subgroups are relevant, non-redundant, and resilient to noise. To achieve the desired quality measure, we use the Subjective Interestingness model that incorporates prior knowledge about the data and promotes patterns that are both informative and surprising relative to that knowledge. We apply this framework to investigate out-of-memory errors and demonstrate its usefulness in incident diagnosis. To validate the effectiveness of our approach and the quality of the identified patterns, we present an empirical study. The source code and data used in the evaluation are publicly accessible, ensuring transparency and reproducibility.

Pre-training with Synthetic Data Helps Offline Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.00771
  • repo_url: None
  • paper_authors: Zecheng Wang, Che Wang, Zixuan Dong, Keith Ross
  • for: This paper studies pre-training for offline deep reinforcement learning (DRL), motivated by the finding that pre-training Decision Transformer on a large language corpus can improve downstream performance (Reid et al., 2022).
  • methods: Several simple pre-training schemes are examined, including pre-training with synthetic IID data and with data generated by a one-step Markov chain (the two generators are sketched after the abstract below), applied both to Decision Transformer and to Conservative Q-Learning (CQL), a Q-learning-based algorithm that typically uses a multi-layer perceptron (MLP) backbone.
  • results: Experiments show that these simple pre-training schemes can match the gains from language-corpus pre-training and yield consistent performance improvements for CQL on the D4RL Gym locomotion datasets.
    Abstract Recently, it has been shown that for offline deep reinforcement learning (DRL), pre-training Decision Transformer with a large language corpus can improve downstream performance (Reid et al., 2022). A natural question to ask is whether this performance gain can only be achieved with language pre-training, or can be achieved with simpler pre-training schemes which do not involve language. In this paper, we first show that language is not essential for improved performance, and indeed pre-training with synthetic IID data for a small number of updates can match the performance gains from pre-training with a large language corpus; moreover, pre-training with data generated by a one-step Markov chain can further improve the performance. Inspired by these experimental results, we then consider pre-training Conservative Q-Learning (CQL), a popular offline DRL algorithm, which is Q-learning-based and typically employs a Multi-Layer Perceptron (MLP) backbone. Surprisingly, pre-training with simple synthetic data for a small number of updates can also improve CQL, providing consistent performance improvement on D4RL Gym locomotion datasets. The results of this paper not only illustrate the importance of pre-training for offline DRL but also show that the pre-training data can be synthetic and generated with remarkably simple mechanisms.
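The two synthetic pre-training sources described above, IID data and data from a one-step Markov chain, are simple to generate. The sketch below produces token sequences of each kind; the vocabulary size and sequence length are arbitrary choices here, not the paper's settings.

```python
import numpy as np

def synthetic_iid(rng, vocab_size, seq_len, n_seqs):
    """IID synthetic data: tokens drawn uniformly and independently."""
    return rng.integers(0, vocab_size, size=(n_seqs, seq_len))

def synthetic_markov(rng, vocab_size, seq_len, n_seqs):
    """One-step Markov chain data: each token depends only on the previous one."""
    T = rng.random((vocab_size, vocab_size))   # random row-stochastic transitions
    T /= T.sum(axis=1, keepdims=True)
    seqs = np.empty((n_seqs, seq_len), dtype=int)
    seqs[:, 0] = rng.integers(0, vocab_size, size=n_seqs)
    for t in range(1, seq_len):
        for i in range(n_seqs):
            seqs[i, t] = rng.choice(vocab_size, p=T[seqs[i, t - 1]])
    return seqs

rng = np.random.default_rng(0)
print(synthetic_iid(rng, vocab_size=100, seq_len=10, n_seqs=2))
print(synthetic_markov(rng, vocab_size=100, seq_len=10, n_seqs=2))
```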

Facilitating Battery Swapping Services for Freight Trucks with Spatial-Temporal Demand Prediction

  • paper_url: http://arxiv.org/abs/2310.04440
  • repo_url: None
  • paper_authors: Linyu Liu, Zhen Dai, Shiji Song, Xiaocheng Li, Guanting Chen
  • for: This paper investigates the potential and efficiency of battery-swapping services for heavy-duty freight trucks on the path toward a carbon-neutral future.
  • methods: A two-fold approach is used: spatial-temporal demand prediction models forecast traffic patterns on the highway network for the upcoming hours, and the predictions then guide an optimization module for efficient battery allocation and deployment.
  • results: Analyzing heavy-duty truck data on a highway network spanning over 2,500 miles, the study underscores the value of prediction/machine learning for future decision-making. In particular, mobile battery-swapping stations are favored in the initial phase of deployment, while fixed-location stations become preferable as the system matures.
    Abstract Electrifying heavy-duty trucks offers a substantial opportunity to curtail carbon emissions, advancing toward a carbon-neutral future. However, the inherent challenges of limited battery energy and the sheer weight of heavy-duty trucks lead to reduced mileage and prolonged charging durations. Consequently, battery-swapping services emerge as an attractive solution for these trucks. This paper employs a two-fold approach to investigate the potential and enhance the efficacy of such services. Firstly, spatial-temporal demand prediction models are adopted to predict the traffic patterns for the upcoming hours. Subsequently, the prediction guides an optimization module for efficient battery allocation and deployment. Analyzing the heavy-duty truck data on a highway network spanning over 2,500 miles, our model and analysis underscore the value of prediction/machine learning in facilitating future decision-makings. In particular, we find that the initial phase of implementing battery-swapping services favors mobile battery-swapping stations, but as the system matures, fixed-location stations are preferred.

Mind the Gap: Federated Learning Broadens Domain Generalization in Diagnostic AI Models

  • paper_url: http://arxiv.org/abs/2310.00757
  • repo_url: https://github.com/tayebiarasteh/fldomain
  • paper_authors: Soroosh Tayebi Arasteh, Christiane Kuhl, Marwin-Jonathan Saehn, Peter Isfort, Daniel Truhn, Sven Nebelung
  • for: This study assesses the impact of federated learning (FL) on AI models that interpret chest radiographs, in particular how training strategy, network architecture, and data diversity affect on-domain and off-domain diagnostic performance.
  • methods: Using 610,000 chest radiographs from five institutions across the globe, the study compares local versus collaborative training, convolutional versus transformer-based architectures, dataset sizes, and dataset diversity (a simple aggregation sketch follows the abstract below).
  • results: Large datasets show minimal gains from FL and in some cases even decreases, whereas smaller datasets improve markedly; on-domain performance is driven mainly by training data size, while off-domain performance leans more on training diversity. Models trained collaboratively across diverse external institutions consistently surpass locally trained models on off-domain tasks, so FL can improve privacy, reproducibility, and off-domain reliability, and potentially optimize healthcare outcomes.
    Abstract Developing robust artificial intelligence (AI) models that generalize well to unseen datasets is challenging and usually requires large and variable datasets, preferably from multiple institutions. In federated learning (FL), a model is trained collaboratively at numerous sites that hold local datasets without exchanging them. So far, the impact of training strategy, i.e., local versus collaborative, on the diagnostic on-domain and off-domain performance of AI models interpreting chest radiographs has not been assessed. Consequently, using 610,000 chest radiographs from five institutions across the globe, we assessed diagnostic performance as a function of training strategy (i.e., local vs. collaborative), network architecture (i.e., convolutional vs. transformer-based), generalization performance (i.e., on-domain vs. off-domain), imaging finding (i.e., cardiomegaly, pleural effusion, pneumonia, atelectasis, consolidation, pneumothorax, and no abnormality), dataset size (i.e., from n=18,000 to 213,921 radiographs), and dataset diversity. Large datasets not only showed minimal performance gains with FL but, in some instances, even exhibited decreases. In contrast, smaller datasets revealed marked improvements. Thus, on-domain performance was mainly driven by training data size. However, off-domain performance leaned more on training diversity. When trained collaboratively across diverse external institutions, AI models consistently surpassed models trained locally for off-domain tasks, emphasizing FL's potential in leveraging data diversity. In conclusion, FL can bolster diagnostic privacy, reproducibility, and off-domain reliability of AI models and, potentially, optimize healthcare outcomes.
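The summary above does not state the aggregation rule used for collaborative training, so the sketch below assumes standard FedAvg-style aggregation: each institution trains locally, and only model parameters, weighted by local dataset size, are averaged; no radiographs are exchanged.

```python
import numpy as np

def fedavg(site_weights, site_sizes):
    """Weighted average of per-site model parameters (FedAvg-style aggregation).

    site_weights: list of parameter dicts {name: ndarray}, one per institution.
    site_sizes:   number of training radiographs contributed by each institution.
    """
    total = sum(site_sizes)
    names = site_weights[0].keys()
    return {
        name: sum(w[name] * (n / total) for w, n in zip(site_weights, site_sizes))
        for name in names
    }

# Toy example: two institutions with very different dataset sizes.
site_a = {"conv.weight": np.ones((2, 2)), "fc.bias": np.zeros(3)}
site_b = {"conv.weight": np.zeros((2, 2)), "fc.bias": np.ones(3)}
print(fedavg([site_a, site_b], site_sizes=[18_000, 213_921]))
```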

TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks

  • paper_url: http://arxiv.org/abs/2310.00752
  • repo_url: None
  • paper_authors: Dongfu Jiang, Yishan Li, Ge Zhang, Wenhao Huang, Bill Yuchen Lin, Wenhu Chen
  • for: This work develops TIGERScore, an automatic metric for evaluating a wide spectrum of text generation tasks.
  • methods: TIGERScore is built on LLaMA, trained on the curated instruction-tuning dataset MetricInstruct; guided by natural language instructions, it produces explainable error analyses and requires no reference outputs (the evaluation interface is sketched after the abstract below).
  • results: TIGERScore achieves the highest overall Spearman's correlation with human ratings across five held-in and two held-out datasets, outperforming other metrics and even surpassing the best reference-based metrics.
    Abstract We present TIGERScore, a \textbf{T}rained metric that follows \textbf{I}nstruction \textbf{G}uidance to perform \textbf{E}xplainable, and \textbf{R}eference-free evaluation over a wide spectrum of text generation tasks. Different from other automatic evaluation methods that only provide arcane scores, TIGERScore is guided by the natural language instruction to provide error analysis to pinpoint the mistakes in the generated text. Our metric is based on LLaMA, trained on our meticulously curated instruction-tuning dataset MetricInstruct which covers 6 text generation tasks and 23 text generation datasets. The dataset consists of 48K quadruple in the form of (instruction, input, system output $\rightarrow$ error analysis). We collected the `system outputs' through diverse channels to cover different types of errors. To quantitatively assess our metric, we evaluate its correlation with human ratings on 5 held-in datasets, 2 held-out datasets and show that TIGERScore can achieve the highest overall Spearman's correlation with human ratings across these datasets and outperforms other metrics significantly. As a reference-free metric, its correlation can even surpass the best existing reference-based metrics. To further qualitatively assess the rationale generated by our metric, we conduct human evaluation on the generated explanations and found that the explanations are 70.8\% accurate. Through these experimental results, we believe TIGERScore demonstrates the possibility of building universal explainable metrics to evaluate any text generation task.
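Since TIGERScore is instruction-guided and reference-free, its inputs are (instruction, input, system output) triples that are mapped to an error analysis. The exact prompt wording used in MetricInstruct is not given here, so the template below is a hypothetical illustration of the interface, not the released one.

```python
TEMPLATE = """You are evaluating a system output for the following task.
Instruction: {instruction}
Input: {input}
System output: {output}
List the errors in the system output, with location, error type, and a penalty score."""

def build_eval_prompt(instruction, inp, output):
    """Assemble a TIGERScore-style evaluation prompt (wording is an assumption)."""
    return TEMPLATE.format(instruction=instruction, input=inp, output=output)

print(build_eval_prompt(
    "Summarize the article in one sentence.",
    "OceanNet is a neural operator-based digital twin for regional oceans...",
    "OceanNet is a weather model for the atmosphere.",
))
```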

NoxTrader: LSTM-Based Stock Return Momentum Prediction for Quantitative Trading

  • paper_url: http://arxiv.org/abs/2310.00747
  • repo_url: None
  • paper_authors: Hsiang-Hui Liu, Han-Jay Shu, Wei-Ning Chiu
  • for: The goal is to achieve profitable outcomes in the stock market, targeting moderate- to long-term returns.
  • methods: NoxTrader learns from historical trading data via time-series analysis, engineering features from US stock price and volume data, including Return Momentum, Week Price Momentum, and Month Price Momentum (sketched after the abstract below). A Long Short-Term Memory (LSTM) model captures continuous price trends and is dynamically updated during trading execution so it keeps adapting to current market conditions.
  • results: The system produces predictions whose correlation with actual market data lies between 0.65 and 0.75, and filtering techniques improve the initial investment return from -60% to 325%.
    Abstract We introduce NoxTrader, a sophisticated system designed for portfolio construction and trading execution with the primary objective of achieving profitable outcomes in the stock market, specifically aiming to generate moderate to long-term profits. The underlying learning process of NoxTrader is rooted in the assimilation of valuable insights derived from historical trading data, particularly focusing on time-series analysis due to the nature of the dataset employed. In our approach, we utilize price and volume data of US stock market for feature engineering to generate effective features, including Return Momentum, Week Price Momentum, and Month Price Momentum. We choose the Long Short-Term Memory (LSTM)model to capture continuous price trends and implement dynamic model updates during the trading execution process, enabling the model to continuously adapt to the current market trends. Notably, we have developed a comprehensive trading backtesting system - NoxTrader, which allows us to manage portfolios based on predictive scores and utilize custom evaluation metrics to conduct a thorough assessment of our trading performance. Our rigorous feature engineering and careful selection of prediction targets enable us to generate prediction data with an impressive correlation range between 0.65 and 0.75. Finally, we monitor the dispersion of our prediction data and perform a comparative analysis against actual market data. Through the use of filtering techniques, we improved the initial -60% investment return to 325%.
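The named features can be computed directly from a daily close-price series. The exact definitions used by NoxTrader are not given in this summary, so the sketch below assumes standard percentage-change momentum with 5-day (week) and 21-day (month) horizons.

```python
import numpy as np

def momentum_features(close, week=5, month=21):
    """Simple momentum features from a daily close-price series.

    Returns daily return momentum plus week- and month-horizon price momentum.
    """
    close = np.asarray(close, dtype=float)
    ret = np.diff(close) / close[:-1]                  # daily returns
    week_mom = close[week:] / close[:-week] - 1.0      # ~1 trading week horizon
    month_mom = close[month:] / close[:-month] - 1.0   # ~1 trading month horizon
    return ret, week_mom, month_mom

prices = 100 + np.cumsum(np.random.default_rng(0).standard_normal(60))
r, w, m = momentum_features(prices)
print(r[-1], w[-1], m[-1])
```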

RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.00746
  • repo_url: https://github.com/interactivenlp-team/rolellm-public
  • paper_authors: Zekun Moore Wang, Zhongyuan Peng, Haoran Que, Jiaheng Liu, Wangchunshu Zhou, Yuhan Wu, Hongcheng Guo, Ruitong Gan, Zehao Ni, Man Zhang, Zhaoxiang Zhang, Wanli Ouyang, Ke Xu, Wenhu Chen, Jie Fu, Junran Peng
  • for: This paper aims to benchmark, elicit, and enhance the role-playing abilities of large language models (LLMs) to enrich user interactions.
  • methods: The proposed framework, RoleLLM, comprises four stages: (1) Role Profile Construction for 100 roles; (2) Context-Based Instruction Generation (Context-Instruct) for role-specific knowledge extraction; (3) Role Prompting using GPT (RoleGPT) for speaking-style imitation; and (4) Role-Conditioned Instruction Tuning (RoCIT) for fine-tuning open-source models with role customization.
  • results: With Context-Instruct and RoleGPT, the authors build RoleBench, the first systematic, fine-grained character-level benchmark for role-playing, containing 168,093 samples. Applying RoCIT on RoleBench yields RoleLLaMA (English) and RoleGLM (Chinese), which significantly enhance role-playing abilities and even achieve results comparable to RoleGPT (using GPT-4).
    Abstract The advent of Large Language Models (LLMs) has paved the way for complex tasks such as role-playing, which enhances user interactions by enabling models to imitate various characters. However, the closed-source nature of state-of-the-art LLMs and their general-purpose training limit role-playing optimization. In this paper, we introduce RoleLLM, a framework to benchmark, elicit, and enhance role-playing abilities in LLMs. RoleLLM comprises four stages: (1) Role Profile Construction for 100 roles; (2) Context-Based Instruction Generation (Context-Instruct) for role-specific knowledge extraction; (3) Role Prompting using GPT (RoleGPT) for speaking style imitation; and (4) Role-Conditioned Instruction Tuning (RoCIT) for fine-tuning open-source models along with role customization. By Context-Instruct and RoleGPT, we create RoleBench, the first systematic and fine-grained character-level benchmark dataset for role-playing with 168,093 samples. Moreover, RoCIT on RoleBench yields RoleLLaMA (English) and RoleGLM (Chinese), significantly enhancing role-playing abilities and even achieving comparable results with RoleGPT (using GPT-4).

My Machine and I: ChatGPT and the Future of Human-Machine Collaboration in Africa

  • paper_url: http://arxiv.org/abs/2310.13704
  • repo_url: None
  • paper_authors: Munachimso Blessing Oguine, Chidera Godsfavor Oguine, Kanyifeechukwu Jane Oguine
  • for: This survey examines the effectiveness of ChatGPT for human-machine collaboration in Africa.
  • methods: Reflexive thematic analysis is applied to 51 articles published between 2019 and 2023, obtained through a literature search.
  • results: ChatGPT-mediated human-computer interaction is most prevalent in academic sectors such as education and research, and the trends indicate relatively high effectiveness of ChatGPT in improving human-machine collaboration.
    Abstract Recent advancements in technology have necessitated a paradigm shift in the people use technology necessitating a new research field called Human-Machine collaboration. ChatGPT, an Artificial intelligence (AI) assistive technology, has gained mainstream adoption and implementation in academia and industry; however, a lot is left unknown about how this new technology holds for Human-Machine Collaboration in Africa. Our survey paper highlights to answer some of these questions. To understand the effectiveness of ChatGPT on human-machine collaboration we utilized reflexive thematic analysis to analyze (N= 51) articles between 2019 and 2023 obtained from our literature search. Our findings indicate the prevalence of ChatGPT for human-computer interaction within academic sectors such as education, and research; trends also revealed the relatively high effectiveness of ChatGPT in improving human-machine collaboration.

GenAI Against Humanity: Nefarious Applications of Generative Artificial Intelligence and Large Language Models

  • paper_url: http://arxiv.org/abs/2310.00737
  • repo_url: None
  • paper_authors: Emilio Ferrara
  • for: This paper is written to raise awareness about the potential risks and challenges of Generative Artificial Intelligence (GenAI) and Large Language Models (LLMs) being misused for nefarious purposes.
  • methods: The paper uses a combination of research and analysis to identify the potential risks of GenAI and LLMs, including their use in deepfakes, malicious content generation, and the creation of synthetic identities.
  • results: The paper highlights the potential consequences of GenAI and LLMs being misused, including the blurring of the lines between the virtual and real worlds, the potential for targeted misinformation and scams, and the creation of sophisticated malware. The paper also serves as a call to action to prepare for these potential risks and challenges.
    Abstract Generative Artificial Intelligence (GenAI) and Large Language Models (LLMs) are marvels of technology; celebrated for their prowess in natural language processing and multimodal content generation, they promise a transformative future. But as with all powerful tools, they come with their shadows. Picture living in a world where deepfakes are indistinguishable from reality, where synthetic identities orchestrate malicious campaigns, and where targeted misinformation or scams are crafted with unparalleled precision. Welcome to the darker side of GenAI applications. This article is not just a journey through the meanders of potential misuse of GenAI and LLMs, but also a call to recognize the urgency of the challenges ahead. As we navigate the seas of misinformation campaigns, malicious content generation, and the eerie creation of sophisticated malware, we'll uncover the societal implications that ripple through the GenAI revolution we are witnessing. From AI-powered botnets on social media platforms to the unnerving potential of AI to generate fabricated identities, or alibis made of synthetic realities, the stakes have never been higher. The lines between the virtual and the real worlds are blurring, and the consequences of potential GenAI's nefarious applications impact us all. This article serves both as a synthesis of rigorous research presented on the risks of GenAI and misuse of LLMs and as a thought-provoking vision of the different types of harmful GenAI applications we might encounter in the near future, and some ways we can prepare for them.

Review of deep learning in healthcare

  • paper_url: http://arxiv.org/abs/2310.00727
  • repo_url: https://github.com/avadhutsonavane/Diagnosis-of-Coronavirus-using-chest-X-RAY
  • paper_authors: Hasan Hejbari Zargar, Saha Hejbari Zargar, Raziye Mehri
  • for: This review examines deep learning methods used in healthcare systems, covering current network designs, applications, and market trends.
  • methods: Deep learning approaches, in particular deep neural network models, are surveyed as tools for extracting hidden patterns and other valuable information from large quantities of health data.
  • results: Deep learning can extract valuable information from healthcare data, but it must be better connected to human healthcare interpretability to be deployed effectively; the review also outlines currently unresolved issues and potential future directions.
    Abstract Given the growing complexity of healthcare data over the last several years, using machine learning techniques like Deep Neural Network (DNN) models has gained increased appeal. In order to extract hidden patterns and other valuable information from the huge quantity of health data, which traditional analytics are unable to do in a reasonable length of time, machine learning (ML) techniques are used. Deep Learning (DL) algorithms in particular have been shown as potential approaches to pattern identification in healthcare systems. This thought has led to the contribution of this research, which examines deep learning methods used in healthcare systems via an examination of cutting-edge network designs, applications, and market trends. To connect deep learning methodologies and human healthcare interpretability, the initial objective is to provide in-depth insight into the deployment of deep learning models in healthcare solutions. And last, to outline the current unresolved issues and potential directions.

Improving Length-Generalization in Transformers via Task Hinting

  • paper_url: http://arxiv.org/abs/2310.00726
  • repo_url: None
  • paper_authors: Pranjal Awasthi, Anupam Gupta
  • for: This work addresses the length-generalization problem of transformers on certain reasoning and arithmetic tasks: a model trained on instances up to a given length (e.g., 5-digit addition) degrades sharply on longer instances. A task-hinting approach is proposed to address this.
  • methods: A multitask training framework in which, while training on task-specific data, the model is simultaneously trained to solve a simpler but related auxiliary task (sketched after the abstract below).
  • results: For sorting, models trained only on sequences of length at most 20 improve test accuracy on sequences of length 100 from less than 1% (standard training) to more than 92% (with task hinting). The study also uncovers several interesting aspects of length generalization, e.g., auxiliary tasks that seem natural a priori differ dramatically in how much they help.
    Abstract It has been observed in recent years that transformers have problems with length generalization for certain types of reasoning and arithmetic tasks. In particular, the performance of a transformer model trained on tasks (say addition) up to a certain length (e.g., 5 digit numbers) drops sharply when applied to longer instances of the same problem. This work proposes an approach based on task hinting towards addressing length generalization. Our key idea is that while training the model on task-specific data, it is helpful to simultaneously train the model to solve a simpler but related auxiliary task as well. We study the classical sorting problem as a canonical example to evaluate our approach. We design a multitask training framework and show that task hinting significantly improve length generalization. For sorting we show that it is possible to train models on data consisting of sequences having length at most $20$, and improve the test accuracy on sequences of length $100$ from less than 1% (for standard training) to more than 92% (via task hinting). Our study uncovers several interesting aspects of length generalization. We observe that while several auxiliary tasks may seem natural a priori, their effectiveness in improving length generalization differs dramatically. We further use probing and visualization-based techniques to understand the internal mechanisms via which the model performs the task, and propose a theoretical construction consistent with the observed learning behaviors of the model. Based on our construction, we show that introducing a small number of length dependent parameters into the training procedure can further boost the performance on unseen lengths. Finally, we also show the efficacy of our task hinting based approach beyond sorting, giving hope that these techniques will be applicable in broader contexts.
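Task hinting amounts to adding an auxiliary-task loss to each training step. The PyTorch sketch below shows this with a shared model and a weighted sum of the two losses; the toy model, toy data, and `aux_weight` are placeholders, not the paper's setup.

```python
import torch
from torch import nn

def multitask_step(model, main_batch, aux_batch, loss_fn, optimizer, aux_weight=0.5):
    """One training step combining the main-task loss with an auxiliary 'hint' task loss."""
    optimizer.zero_grad()
    (main_x, main_y), (aux_x, aux_y) = main_batch, aux_batch
    loss = loss_fn(model(main_x), main_y) + aux_weight * loss_fn(model(aux_x), aux_y)
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: the same model is trained on the main task and a simpler auxiliary task.
model = nn.Linear(8, 8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
main = (torch.randn(4, 8), torch.randn(4, 8))
aux = (torch.randn(4, 8), torch.randn(4, 8))
print(multitask_step(model, main, aux, nn.MSELoss(), opt))
```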

Subtractive Mixture Models via Squaring: Representation and Learning

  • paper_url: http://arxiv.org/abs/2310.00724
  • repo_url: https://github.com/anon-npc/squared-npcs
  • paper_authors: Lorenzo Loconte, Aleksanteri M. Sladek, Stefan Mengel, Martin Trapp, Arno Solin, Nicolas Gillis, Antonio Vergari
  • for: Modeling complex distributions.
  • methods: Deep subtractive mixture models learned by squaring them, represented and generalized within the framework of probabilistic circuits (a one-dimensional sketch follows the abstract below).
  • results: Squared circuits allowing subtractions can be exponentially more expressive than traditional additive mixtures, and this increased expressiveness is confirmed empirically on real-world distribution estimation tasks.
    Abstract Mixture models are traditionally represented and learned by adding several distributions as components. Allowing mixtures to subtract probability mass or density can drastically reduce the number of components needed to model complex distributions. However, learning such subtractive mixtures while ensuring they still encode a non-negative function is challenging. We investigate how to learn and perform inference on deep subtractive mixtures by squaring them. We do this in the framework of probabilistic circuits, which enable us to represent tensorized mixtures and generalize several other subtractive models. We theoretically prove that the class of squared circuits allowing subtractions can be exponentially more expressive than traditional additive mixtures; and, we empirically show this increased expressiveness on a series of real-world distribution estimation tasks.
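The squaring trick can be seen in one dimension: a mixture with possibly negative weights is squared so the result is a non-negative (unnormalized) density, and subtraction lets a single broad component acquire a "hole" that an additive mixture would need many components to carve. The sketch below normalizes numerically on a grid for illustration; in the probabilistic-circuits framework of the paper the normalizer is computed in closed form.

```python
import numpy as np

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def squared_mixture(x, weights, mus, sigmas, grid=np.linspace(-10, 10, 4001)):
    """Squared mixture c(x) = (sum_i w_i f_i(x))^2 with possibly negative weights,
    normalized numerically on a grid (closed form exists via component pairs)."""
    def c(t):
        return sum(w * gaussian(t, m, s) for w, m, s in zip(weights, mus, sigmas)) ** 2
    Z = np.sum(c(grid)) * (grid[1] - grid[0])  # simple Riemann-sum normalizer
    return c(x) / Z

# A "hole" at 0 carved by subtracting a narrow component from a broad one.
x = np.linspace(-5, 5, 5)
print(squared_mixture(x, weights=[1.0, -0.8], mus=[0.0, 0.0], sigmas=[2.0, 0.5]))
```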

A Simple Yet Effective Strategy to Robustify the Meta Learning Paradigm

  • paper_url: http://arxiv.org/abs/2310.00708
  • repo_url: None
  • paper_authors: Qi Wang, Yiqin Lv, Yanghe Feng, Zheng Xie, Jincai Huang
  • for: Improving the reliability and robustness of meta learning, especially in risk-sensitive scenarios.
  • methods: The meta-learning pipeline is optimized from a distributionally robust perspective, meta-training models with a measure of expected tail risk (sketched after the abstract below); a two-stage heuristic strategy controls the worst cases of fast adaptation at a chosen probabilistic level.
  • results: Experiments show that this simple method improves the robustness of meta learning to task distributions and reduces the conditional expectation of the worst-case fast-adaptation risk.
    Abstract Meta learning is a promising paradigm to enable skill transfer across tasks. Most previous methods employ the empirical risk minimization principle in optimization. However, the resulting worst fast adaptation to a subset of tasks can be catastrophic in risk-sensitive scenarios. To robustify fast adaptation, this paper optimizes meta learning pipelines from a distributionally robust perspective and meta trains models with the measure of expected tail risk. We take the two-stage strategy as heuristics to solve the robust meta learning problem, controlling the worst fast adaptation cases at a certain probabilistic level. Experimental results show that our simple method can improve the robustness of meta learning to task distributions and reduce the conditional expectation of the worst fast adaptation risk.
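The expected-tail-risk objective can be illustrated with a CVaR-style average over per-task adaptation losses: instead of minimizing the plain mean, meta-training weights the worst (1 - alpha) fraction of tasks. The estimator below is a simple empirical version; the paper's two-stage strategy and exact risk measure are more involved.

```python
import numpy as np

def cvar(task_losses, alpha=0.7):
    """Expected tail risk (CVaR_alpha): mean of the worst (1 - alpha) fraction of
    per-task adaptation losses, used instead of the plain average over tasks."""
    losses = np.sort(np.asarray(task_losses, dtype=float))
    tail_start = int(np.floor(alpha * len(losses)))
    return losses[tail_start:].mean()

task_losses = [0.2, 0.3, 0.25, 1.5, 0.4, 2.1, 0.35, 0.3, 0.28, 0.9]
print("average risk:", np.mean(task_losses), " tail risk (CVaR_0.7):", cvar(task_losses))
```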

Meta Semantic Template for Evaluation of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.01448
  • repo_url: None
  • paper_authors: Yachuan Liu, Liang Chen, Jindong Wang, Qiaozhu Mei, Xing Xie
  • for: 评估大语言模型(LLM)的语义理解能力,而不仅仅是其记忆训练数据的能力。
  • methods: 提出了 MSTemp 方法,通过创建元语义模板(meta semantic templates)来评估 LLM 的语义理解能力。
  • results: MSTemp 能够以现有数据集为种子生成高度分布外(OOD)的评估样本,并显著降低 LLM 在这些样本上的性能。
    Abstract Do large language models (LLMs) genuinely understand the semantics of the language, or just memorize the training data? The recent concern on potential data contamination of LLMs has raised awareness of the community to conduct research on LLMs evaluation. In this paper, we propose MSTemp, an approach that creates meta semantic templates to evaluate the semantic understanding ability of LLMs. The core of MSTemp is not to perform evaluation directly on existing benchmark datasets, but to generate new out-of-distribution (OOD) evaluation sets using existing datasets as seeds. Specifically, for a given sentence, MSTemp leverages another language model to generate new samples while preserving its semantics. The new samples are called semantic templates to the original sentence. Then, MSTemp generates evaluation samples via sentence parsing and random word replacement on the semantic templates. MSTemp is highly flexible, dynamic, and cost-effective. Our initial experiments show that MSTemp-generated samples can significantly reduce the performance of LLMs using existing datasets as seeds. We hope this initial work can shed light on future research of LLMs evaluation.
    摘要 大语言模型(LLM)究竟是真正理解语言的语义,还是仅仅记住了训练数据?近来社区对LLM潜在数据污染问题的关注,促使人们开展关于LLM评估的研究。在这篇论文中,我们提出MSTemp,一种通过构造元语义模板来评估LLM语义理解能力的方法。MSTemp的核心不是直接在现有基准数据集上进行评估,而是以现有数据集为种子生成新的分布外(OOD)评估集。具体来说,对于给定句子,MSTemp利用另一个语言模型在保持其语义的前提下生成新样本,这些新样本被称为原句的语义模板;随后,MSTemp通过句法分析和随机词替换在语义模板上生成评估样本。MSTemp具有高度的灵活性、动态性和成本效益。我们的初步实验表明,以现有数据集为种子、由MSTemp生成的样本可以显著降低LLM的表现。我们希望这项初步工作能为未来LLM评估研究提供启发。
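
A minimal sketch of an MSTemp-style generation pipeline; the semantics-preserving rewriter is stubbed out (the paper uses another language model there), and the replacement vocabulary is made up for the example.

```python
import random

random.seed(0)

def generate_semantic_template(sentence: str) -> str:
    """Placeholder for the semantics-preserving rewriter (an LLM call in practice);
    an identity stub keeps this sketch runnable end-to-end."""
    return sentence

def random_word_replacement(sentence: str, vocab, p: float = 0.2) -> str:
    """Randomly replace longer words to push the sample out of distribution."""
    tokens = sentence.split()
    out = [random.choice(vocab) if random.random() < p and len(t) > 3 else t
           for t in tokens]
    return " ".join(out)

seed_sentences = ["The quick brown fox jumps over the lazy dog."]
filler_vocab = ["river", "purple", "quietly", "ancient"]   # illustrative only

eval_set = []
for s in seed_sentences:
    template = generate_semantic_template(s)
    for _ in range(3):                                     # several OOD variants per seed
        eval_set.append(random_word_replacement(template, filler_vocab))

for sample in eval_set:
    print(sample)
```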

Exchange means change: an unsupervised single-temporal change detection framework based on intra- and inter-image patch exchange

  • paper_url: http://arxiv.org/abs/2310.00689
  • repo_url: https://github.com/chenhongruixuan/i3pe
  • paper_authors: Hongruixuan Chen, Jian Song, Chen Wu, Bo Du, Naoto Yokoya
  • for: 这个研究旨在提出一个无监控、无标注的单时间变化检测 Framework,以实现单时间 remote sensing 图像上的变化检测。
  • methods: 这个 Framework 使用了内部和外部图像块交换 (I3PE) 方法,通过交换内部图像块和外部图像块,从单时间图像中生成 pseudo-bi-temporal 图像组和变化标签。
  • results: 实验结果显示,I3PE 超越了代表性的无监督方法,相对 SOTA 方法分别实现约 10.65% 和 6.99% 的 F1 值提升;此外,I3PE 还能在无监督和半监督设置下进一步提升变化检测器的性能。
    Abstract Change detection (CD) is a critical task in studying the dynamics of ecosystems and human activities using multi-temporal remote sensing images. While deep learning has shown promising results in CD tasks, it requires a large number of labeled and paired multi-temporal images to achieve high performance. Pairing and annotating large-scale multi-temporal remote sensing images is both expensive and time-consuming. To make deep learning-based CD techniques more practical and cost-effective, we propose an unsupervised single-temporal CD framework based on intra- and inter-image patch exchange (I3PE). The I3PE framework allows for training deep change detectors on unpaired and unlabeled single-temporal remote sensing images that are readily available in real-world applications. The I3PE framework comprises four steps: 1) intra-image patch exchange method is based on an object-based image analysis method and adaptive clustering algorithm, which generates pseudo-bi-temporal image pairs and corresponding change labels from single-temporal images by exchanging patches within the image; 2) inter-image patch exchange method can generate more types of land-cover changes by exchanging patches between images; 3) a simulation pipeline consisting of several image enhancement methods is proposed to simulate the radiometric difference between pre- and post-event images caused by different imaging conditions in real situations; 4) self-supervised learning based on pseudo-labels is applied to further improve the performance of the change detectors in both unsupervised and semi-supervised cases. Extensive experiments on two large-scale datasets demonstrate that I3PE outperforms representative unsupervised approaches and achieves F1 value improvements of 10.65% and 6.99% to the SOTA method. Moreover, I3PE can improve the performance of the ... (see the original article for full abstract)
    摘要 变化检测(CD)是利用多时相遥感影像研究生态系统与人类活动动态的关键任务。深度学习在CD任务中表现出色,但需要大量成对且带标注的多时相影像才能达到高性能,而对大规模多时相遥感影像进行配对和标注既昂贵又耗时。为了让基于深度学习的CD技术更实用、更经济,我们提出了一个基于图像内与图像间块交换(I3PE)的无监督单时相CD框架,它允许在现实应用中易于获取的、无配对且无标注的单时相遥感影像上训练深度变化检测器。I3PE框架包括四个步骤:1. 图像内块交换方法基于面向对象的图像分析方法和自适应聚类算法,通过在单幅图像内部交换图像块来生成伪双时相图像对及相应的变化标签;2. 图像间块交换方法通过在不同图像之间交换图像块,生成更多类型的地物覆盖变化;3. 我们提出了一个由多种图像增强方法组成的模拟管线,用于模拟真实情形中不同成像条件导致的事件前后影像之间的辐射差异;4. 采用基于伪标签的自监督学习,进一步提升变化检测器在无监督与半监督情形下的性能。我们在两个大规模数据集上进行了广泛的实验,结果表明I3PE超越了代表性的无监督方法,相对SOTA方法分别取得10.65%和6.99%的F1值提升。此外,I3PE还可以进一步提升变化检测器的性能。
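
A simplified sketch of step 1 (intra-image patch exchange) using rectangular patches instead of the paper's object-based segments, assuming a single-temporal image stored as a NumPy array.

```python
import numpy as np

rng = np.random.default_rng(0)

def intra_image_patch_exchange(img: np.ndarray, patch: int = 64):
    """Build a pseudo post-event image and a change label from ONE image by
    swapping two randomly chosen patches (a stand-in for the object-based
    exchange described in the paper)."""
    h, w = img.shape[:2]
    post = img.copy()
    y1, x1 = rng.integers(0, h - patch), rng.integers(0, w - patch)
    y2, x2 = rng.integers(0, h - patch), rng.integers(0, w - patch)
    post[y1:y1 + patch, x1:x1 + patch], post[y2:y2 + patch, x2:x2 + patch] = (
        img[y2:y2 + patch, x2:x2 + patch].copy(),
        img[y1:y1 + patch, x1:x1 + patch].copy(),
    )
    change_label = np.abs(post.astype(np.int32) - img.astype(np.int32)).sum(-1) > 0
    return post, change_label.astype(np.uint8)

pre = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)  # stand-in single-temporal image
post, label = intra_image_patch_exchange(pre)
print("changed pixels in the pseudo pair:", int(label.sum()))
```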

The Robots are Here: Navigating the Generative AI Revolution in Computing Education

  • paper_url: http://arxiv.org/abs/2310.00658
  • repo_url: None
  • paper_authors: James Prather, Paul Denny, Juho Leinonen, Brett A. Becker, Ibrahim Albluwi, Michelle Craig, Hieke Keuning, Natalie Kiesler, Tobias Kohn, Andrew Luxton-Reilly, Stephen MacNeil, Andrew Peterson, Raymond Pettit, Brent N. Reeves, Jaromir Savelka
  • for: 这份工作组报告旨在探讨大语言模型(LLMs)在计算教育中的应用和挑战,以及如何适应和利用这些新技术。
  • methods: 本报告通过文献综述和面向计算专业学生与教师的问卷调查来探讨LLMs在计算教育中的应用,并从对22名计算教育工作者的深入访谈中收集实践经验。
  • results: 本报告的主要结论是:LLMs在计算教育中的应用可以提高学生的学习效果和创新能力,但也存在一些伦理和教学方法的挑战。同时,现有的LLMs在计算教育领域的性能水平在不断提高。
    Abstract Recent advancements in artificial intelligence (AI) are fundamentally reshaping computing, with large language models (LLMs) now effectively being able to generate and interpret source code and natural language instructions. These emergent capabilities have sparked urgent questions in the computing education community around how educators should adapt their pedagogy to address the challenges and to leverage the opportunities presented by this new technology. In this working group report, we undertake a comprehensive exploration of LLMs in the context of computing education and make five significant contributions. First, we provide a detailed review of the literature on LLMs in computing education and synthesise findings from 71 primary articles. Second, we report the findings of a survey of computing students and instructors from across 20 countries, capturing prevailing attitudes towards LLMs and their use in computing education contexts. Third, to understand how pedagogy is already changing, we offer insights collected from in-depth interviews with 22 computing educators from five continents who have already adapted their curricula and assessments. Fourth, we use the ACM Code of Ethics to frame a discussion of ethical issues raised by the use of large language models in computing education, and we provide concrete advice for policy makers, educators, and students. Finally, we benchmark the performance of LLMs on various computing education datasets, and highlight the extent to which the capabilities of current models are rapidly improving. Our aim is that this report will serve as a focal point for both researchers and practitioners who are exploring, adapting, using, and evaluating LLMs and LLM-based tools in computing classrooms.

LEGO-Prover: Neural Theorem Proving with Growing Libraries

  • paper_url: http://arxiv.org/abs/2310.00656
  • repo_url: https://github.com/wiio12/LEGO-Prover
  • paper_authors: Haiming Wang, Huajian Xin, Chuanyang Zheng, Lin Li, Zhengying Liu, Qingxing Cao, Yinya Huang, Jing Xiong, Han Shi, Enze Xie, Jian Yin, Zhenguo Li, Heng Liao, Xiaodan Liang
  • for: This paper aims to improve the ability of large language models (LLMs) to prove mathematical theorems by employing a growing skill library containing verified lemmas as skills.
  • methods: The proposed method, called LEGO-Prover, constructs the proof modularly and uses existing skills retrieved from the library to augment the capability of LLMs. The skills are further evolved by prompting an LLM to enrich the library on another scale.
  • results: The proposed method advances the state-of-the-art pass rate on miniF2F-valid and miniF2F-test, and generates over 20,000 skills (theorems/lemmas) that are added to the growing library. The ablation study shows that these newly added skills are indeed helpful for proving theorems, resulting in an improvement from a success rate of 47.1% to 50.4%.
    Abstract Despite the success of large language models (LLMs), the task of theorem proving still remains one of the hardest reasoning tasks that is far from being fully solved. Prior methods using language models have demonstrated promising results, but they still struggle to prove even middle school level theorems. One common limitation of these methods is that they assume a fixed theorem library during the whole theorem proving process. However, as we all know, creating new useful theorems or even new theories is not only helpful but crucial and necessary for advancing mathematics and proving harder and deeper results. In this work, we present LEGO-Prover, which employs a growing skill library containing verified lemmas as skills to augment the capability of LLMs used in theorem proving. By constructing the proof modularly, LEGO-Prover enables LLMs to utilize existing skills retrieved from the library and to create new skills during the proving process. These skills are further evolved (by prompting an LLM) to enrich the library on another scale. Modular and reusable skills are constantly added to the library to enable tackling increasingly intricate mathematical problems. Moreover, the learned library further bridges the gap between human proofs and formal proofs by making it easier to impute missing steps. LEGO-Prover advances the state-of-the-art pass rate on miniF2F-valid (48.0% to 57.0%) and miniF2F-test (45.5% to 47.1%). During the proving process, LEGO-Prover also manages to generate over 20,000 skills (theorems/lemmas) and adds them to the growing library. Our ablation study indicates that these newly added skills are indeed helpful for proving theorems, resulting in an improvement from a success rate of 47.1% to 50.4%. We also release our code and all the generated skills.
    摘要 尽管大语言模型(LLM)取得了成功,定理证明仍然是最困难的推理任务之一,远未被完全解决。先前使用语言模型的方法已经取得了有希望的结果,但它们连中学水平的定理也难以证明。这些方法的一个常见限制是假设在整个定理证明过程中使用固定不变的定理库。然而,众所周知,创造新的有用定理乃至新的理论,对于推进数学以及证明更难、更深入的结果不仅有帮助,而且是至关重要、必不可少的。在这项工作中,我们提出了LEGO-Prover,它使用一个不断增长的技能库(其中包含经过验证的引理作为技能)来增强LLM在定理证明中的能力。通过以模块化方式构造证明,LEGO-Prover使LLM能够利用从库中检索到的已有技能,并在证明过程中创造新技能;这些技能还会被进一步演化(通过提示LLM),从另一个尺度上丰富技能库。模块化、可复用的技能被持续加入库中,从而能够解决越来越复杂的数学问题。此外,所学习的库使补全缺失步骤变得更容易,从而缩小了人类证明与形式化证明之间的差距。LEGO-Prover将miniF2F-valid的通过率从48.0%提高到57.0%,将miniF2F-test的通过率从45.5%提高到47.1%,刷新了最新水平。在证明过程中,LEGO-Prover还生成了超过20,000个技能(定理/引理)并将其加入不断增长的库中。我们的消融实验表明,这些新增技能确实有助于定理证明,使成功率从47.1%提高到50.4%。我们还发布了代码和所有生成的技能。
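
A minimal sketch of a growing skill library: verified lemmas are stored and retrieved for a new goal by bag-of-words cosine similarity, which stands in for whatever retrieval mechanism LEGO-Prover actually uses; the example lemmas are invented.

```python
from collections import Counter
import math

class SkillLibrary:
    """A growing library of verified lemmas, retrieved by bag-of-words cosine
    similarity (a simplified stand-in for embedding-based retrieval)."""

    def __init__(self):
        self.skills = []                   # list of (statement, proof) pairs

    def add(self, statement: str, proof: str, verified: bool):
        if verified:                       # only verified lemmas become reusable skills
            self.skills.append((statement, proof))

    @staticmethod
    def _vec(text: str) -> Counter:
        return Counter(text.lower().split())

    def retrieve(self, goal: str, k: int = 2):
        q = self._vec(goal)
        def cosine(v):
            dot = sum(q[t] * v[t] for t in q)
            return dot / (math.sqrt(sum(x * x for x in q.values())) *
                          math.sqrt(sum(x * x for x in v.values())) + 1e-9)
        scored = sorted(self.skills, key=lambda s: cosine(self._vec(s[0])), reverse=True)
        return scored[:k]

lib = SkillLibrary()
lib.add("sum of two even numbers is even", "proof omitted", verified=True)
lib.add("product of odd numbers is odd", "proof omitted", verified=True)
for stmt, _ in lib.retrieve("show that 2a + 2b is even"):
    print("retrieved skill:", stmt)
```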

Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants

  • paper_url: http://arxiv.org/abs/2310.00653
  • repo_url: https://github.com/thunlp/muffin
  • paper_authors: Tianyu Yu, Jinyi Hu, Yuan Yao, Haoye Zhang, Yue Zhao, Chongyi Wang, Shan Wang, Yinxv Pan, Jiao Xue, Dahai Li, Zhiyuan Liu, Hai-Tao Zheng, Maosong Sun
  • for: 这个论文主要是为了提出一种新的视觉语言模型框架和 Multimodal 指令训练数据集,以提高现有的 Multimodal 语言模型的性能。
  • methods: 这个论文使用了一种称为 Muffin 的新框架,该框架直接使用预训练的视觉语言模型来连接视觉模块和语言模型,而不需要额外的特征Alignment预训练。此外,该论文还提出了一个名为 UniMM-Chat 的新数据集,该数据集通过将不同任务的数据集融合而成,以生成高质量和多样化的 Multimodal 指令。
  • results: 实验结果表明,Muffin框架和UniMM-Chat数据集可以提升多模态大语言模型的性能,并超越LLaVA和InstructBLIP等最新的先进模型。
    Abstract Recent Multimodal Large Language Models (MLLMs) exhibit impressive abilities to perceive images and follow open-ended instructions. The capabilities of MLLMs depend on two crucial factors: the model architecture to facilitate the feature alignment of visual modules and large language models; the multimodal instruction tuning datasets for human instruction following. (i) For the model architecture, most existing models introduce an external bridge module to connect vision encoders with language models, which needs an additional feature-alignment pre-training. In this work, we discover that compact pre-trained vision language models can inherently serve as ``out-of-the-box'' bridges between vision and language. Based on this, we propose Muffin framework, which directly employs pre-trained vision-language models to act as providers of visual signals. (ii) For the multimodal instruction tuning datasets, existing methods omit the complementary relationship between different datasets and simply mix datasets from different tasks. Instead, we propose UniMM-Chat dataset which explores the complementarities of datasets to generate 1.1M high-quality and diverse multimodal instructions. We merge information describing the same image from diverse datasets and transforms it into more knowledge-intensive conversation data. Experimental results demonstrate the effectiveness of the Muffin framework and UniMM-Chat dataset. Muffin achieves state-of-the-art performance on a wide range of vision-language tasks, significantly surpassing state-of-the-art models like LLaVA and InstructBLIP. Our model and dataset are all accessible at https://github.com/thunlp/muffin.
    摘要 最近的多模态大语言模型(MLLM)展现出了惊人的图像感知和开放式指令遵循能力。MLLM的能力取决于两个关键因素:用于实现视觉模块与大语言模型特征对齐的模型架构,以及用于训练人类指令遵循的多模态指令微调数据集。(i)在模型架构方面,大多数现有模型引入一个外部桥接模块来连接视觉编码器与语言模型,这需要额外的特征对齐预训练。我们发现,紧凑的预训练视觉语言模型本身就可以作为视觉与语言之间"开箱即用"的桥梁。基于这一点,我们提出了Muffin框架,直接采用预训练的视觉语言模型作为视觉信号的提供者。(ii)在多模态指令微调数据集方面,现有方法忽略了不同数据集之间的互补关系,仅仅简单地混合来自不同任务的数据集。我们则提出了UniMM-Chat数据集,它挖掘数据集之间的互补性,生成了110万条高质量且多样化的多模态指令:我们将来自不同数据集、描述同一图像的信息进行合并,并将其转化为更加知识密集的对话数据。实验结果表明了Muffin框架和UniMM-Chat数据集的有效性。Muffin在一系列视觉语言任务上达到了最新的先进水平,显著超越了LLaVA和InstructBLIP等先进模型。我们的模型和数据集均可在 https://github.com/thunlp/muffin 获取。

Adaptive-Solver Framework for Dynamic Strategy Selection in Large Language Model Reasoning

  • paper_url: http://arxiv.org/abs/2310.01446
  • repo_url: None
  • paper_authors: Jianpeng Zhou, Wanjun Zhong, Yanlin Wang, Jiahai Wang
  • for: 本研究旨在提高大语言模型(LLM)在复杂推理任务中的表现,并使其适应实际问题复杂度的多样性。
  • methods: 本研究提出了一种自适应求解(Adaptive-Solver)框架,可以根据问题的难度灵活调整求解策略。具体来说,该框架包括两个主要模块:初始评估模块和后续适应模块。在后续适应模块中,采用三种适应策略:(1)模型适应:当较弱的模型不足以解决问题时,切换到更强的大语言模型;(2)提示方法适应:根据问题的特点,在不同的提示技巧之间切换;(3)分解粒度适应:将复杂问题分解为更细粒度的子问题,以提高可解性。
  • results: 实验结果显示,提示方法适应与分解粒度适应在所有任务上均提升了表现,而模型适应策略可在保持优异表现的同时将API成本降低最多50%。
    Abstract Large Language Models (LLMs) are showcasing impressive ability in handling complex reasoning tasks. In real-world situations, problems often span a spectrum of complexities. Humans inherently adjust their problem-solving approaches based on task complexity. However, most methodologies that leverage LLMs tend to adopt a uniform approach: utilizing consistent models, prompting methods, and degrees of problem decomposition, regardless of the problem complexity. Inflexibility of them can bring unnecessary computational overhead or sub-optimal performance. To address this problem, we introduce an Adaptive-Solver framework. It strategically modulates solving strategies based on the difficulties of the problems. Given an initial solution, the framework functions with two primary modules. The initial evaluation module assesses the adequacy of the current solution. If improvements are needed, the subsequent adaptation module comes into play. Within this module, three key adaptation strategies are employed: (1) Model Adaptation: Switching to a stronger LLM when a weaker variant is inadequate. (2) Prompting Method Adaptation: Alternating between different prompting techniques to suit the problem's nuances. (3) Decomposition Granularity Adaptation: Breaking down a complex problem into more fine-grained sub-questions to enhance solvability. Through such dynamic adaptations, our framework not only enhances computational efficiency but also elevates the overall performance. This dual-benefit ensures both the efficiency of the system for simpler tasks and the precision required for more complex questions. Experimental results from complex reasoning tasks reveal that the prompting method adaptation and decomposition granularity adaptation enhance performance across all tasks. Furthermore, the model adaptation approach significantly reduces API costs (up to 50%) while maintaining superior performance.
    摘要 大语言模型(LLM)在复杂推理任务中表现出色。但在现实情况中,问题往往横跨不同的复杂度。人类会根据任务复杂度自然地调整解题方式,而大多数利用LLM的方法却采用统一的做法:无论问题复杂度如何,都使用一致的模型、提示方法和问题分解粒度。这种不灵活性可能带来不必要的计算开销或次优的表现。为解决这一问题,我们提出了自适应求解(Adaptive-Solver)框架,它根据问题的难度有策略地调整求解方式。给定一个初始解,框架由两个主要模块运作:初始评估模块评估当前解是否足够好;如果需要改进,后续适应模块便会发挥作用。在该模块中采用三种关键适应策略:(1)模型适应:当较弱的模型不足时切换到更强的LLM;(2)提示方法适应:在不同的提示技巧之间切换以契合问题特点;(3)分解粒度适应:将复杂问题分解为更细粒度的子问题以提高可解性。通过这些动态适应,我们的框架不仅提升了计算效率,也提升了整体性能。这种双重收益既保证了系统在简单任务上的效率,也保证了复杂问题所需的精确性。在复杂推理任务上的实验结果表明,提示方法适应与分解粒度适应在所有任务上均提升了性能;此外,模型适应策略在保持优异性能的同时显著降低了API成本(最多50%)。
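
A control-flow sketch of the Adaptive-Solver loop under the three adaptation strategies; `solve`, `adequate`, and `decompose` are hypothetical placeholders standing in for LLM calls and the paper's evaluation criteria.

```python
def solve(problem, model, prompting, subproblems=None):
    """Hypothetical solver call; in practice this would wrap an LLM API."""
    return {"answer": 42, "confidence": 0.4 if model == "small" else 0.9}

def adequate(solution, threshold=0.8):
    """Initial evaluation module: accept the solution or trigger adaptation."""
    return solution["confidence"] >= threshold

def decompose(problem):
    """Decomposition-granularity adaptation: split into finer sub-questions (stubbed)."""
    return [f"{problem} (part {i})" for i in range(2)]

def adaptive_solver(problem):
    models = ["small", "large"]                # (1) model adaptation: escalate to a stronger LLM
    prompts = ["direct", "chain-of-thought"]   # (2) prompting-method adaptation
    for model in models:
        for prompting in prompts:
            sol = solve(problem, model, prompting)
            if adequate(sol):
                return sol
    # (3) fall back to finer-grained decomposition with the strongest setting
    parts = decompose(problem)
    return [solve(p, models[-1], prompts[-1], subproblems=parts) for p in parts]

print(adaptive_solver("What is 6 * 7?"))
```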

WASA: WAtermark-based Source Attribution for Large Language Model-Generated Data

  • paper_url: http://arxiv.org/abs/2310.00646
  • repo_url: None
  • paper_authors: Jingtan Wang, Xinyang Lu, Zitong Zhao, Zhongxiang Dai, Chuan-Sheng Foo, See-Kiong Ng, Bryan Kian Hsiang Low
  • for: 这个论文是为了解决大语言模型(LLM)训练数据的知识产权问题而写的。
  • methods: 该论文使用水印技术(watermarking)来解决知识产权问题。具体而言,论文提出了面向来源归属的水印框架 WAtermarking for Source Attribution(WASA),使LLM能够生成嵌入水印的合成文本,水印中包含其来源信息。
  • results: 该论文通过实验证明,使用WASA框架可以实现有效的源归属和数据来源验证。
    Abstract The impressive performances of large language models (LLMs) and their immense potential for commercialization have given rise to serious concerns over the intellectual property (IP) of their training data. In particular, the synthetic texts generated by LLMs may infringe the IP of the data being used to train the LLMs. To this end, it is imperative to be able to (a) identify the data provider who contributed to the generation of a synthetic text by an LLM (source attribution) and (b) verify whether the text data from a data provider has been used to train an LLM (data provenance). In this paper, we show that both problems can be solved by watermarking, i.e., by enabling an LLM to generate synthetic texts with embedded watermarks that contain information about their source(s). We identify the key properties of such watermarking frameworks (e.g., source attribution accuracy, robustness against adversaries), and propose a WAtermarking for Source Attribution (WASA) framework that satisfies these key properties due to our algorithmic designs. Our WASA framework enables an LLM to learn an accurate mapping from the texts of different data providers to their corresponding unique watermarks, which sets the foundation for effective source attribution (and hence data provenance). Extensive empirical evaluations show that our WASA framework achieves effective source attribution and data provenance.
    摘要 大语言模型(LLM)令人印象深刻的表现及其巨大的商业化潜力,引发了对其训练数据知识产权(IP)的严重担忧。具体来说,LLM生成的合成文本可能侵犯用于训练LLM的数据的知识产权。因此,必须能够(a)确定哪个数据提供者的数据参与了某段由LLM生成的合成文本的生成(来源归属),以及(b)验证某个数据提供者的文本数据是否被用于训练LLM(数据溯源)。在这篇论文中,我们表明这两个问题都可以通过水印来解决,即让LLM生成带有嵌入式水印的合成文本,水印中包含其来源信息。我们明确了此类水印框架的关键属性(如来源归属的准确性、对抗攻击者的鲁棒性),并提出了一个依靠算法设计满足这些关键属性的WASA(WAtermarking for Source Attribution)框架。我们的WASA框架使LLM能够学习从不同数据提供者的文本到其对应唯一水印之间的准确映射,这为有效的来源归属(进而数据溯源)奠定了基础。大量实证评估表明,我们的WASA框架能够实现有效的来源归属与数据溯源。
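
A toy illustration of watermark-based source attribution, using zero-width characters to encode a provider id; this is not WASA's actual embedding scheme, only a sketch of the embed/attribute round trip.

```python
ZW0, ZW1 = "\u200b", "\u200c"   # zero-width characters used as watermark "bits"

def provider_watermark(provider_id, n_bits=8):
    bits = format(provider_id, f"0{n_bits}b")
    return "".join(ZW1 if b == "1" else ZW0 for b in bits)

def embed(text, provider_id):
    """Append an imperceptible watermark identifying the data provider."""
    return text + provider_watermark(provider_id)

def attribute(text, n_bits=8):
    """Recover the provider id from the trailing zero-width characters, if present."""
    tail = [c for c in text if c in (ZW0, ZW1)][-n_bits:]
    if len(tail) < n_bits:
        return None
    return int("".join("1" if c == ZW1 else "0" for c in tail), 2)

generated = embed("A synthetic sentence produced by the LLM.", provider_id=42)
print("visible text:", generated.strip(ZW0 + ZW1))
print("attributed provider id:", attribute(generated))
```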

From Bandits Model to Deep Deterministic Policy Gradient, Reinforcement Learning with Contextual Information

  • paper_url: http://arxiv.org/abs/2310.00642
  • repo_url: None
  • paper_authors: Zhendong Shi, Xiaoli Wei, Ercan E. Kuruoglu
  • for: 本研究旨在解决序贯决策过程中、在快速变化且高度不确定的复杂环境下的投资决策问题,并使用两种方法来提升强化学习的性能:上下文Thompson采样与有监督引导的强化学习。
  • methods: 本研究将传统的固定比例投资组合保险(CPPI)策略与深度确定性策略梯度(DDPG)相结合,以加速强化学习的迭代过程、寻找最优策略。
  • results: 实验结果显示,上述两种方法都能加速强化学习的迭代过程,更快地获得最优策略。
    Abstract The problem of how to take the right actions to make profits in sequential process continues to be difficult due to the quick dynamics and a significant amount of uncertainty in many application scenarios. In such complicated environments, reinforcement learning (RL), a reward-oriented strategy for optimum control, has emerged as a potential technique to address this strategic decision-making issue. However, reinforcement learning also has some shortcomings that make it unsuitable for solving many financial problems, excessive resource consumption, and inability to quickly obtain optimal solutions, making it unsuitable for quantitative trading markets. In this study, we use two methods to overcome the issue with contextual information: contextual Thompson sampling and reinforcement learning under supervision which can accelerate the iterations in search of the best answer. In order to investigate strategic trading in quantitative markets, we merged the earlier financial trading strategy known as constant proportion portfolio insurance (CPPI) into deep deterministic policy gradient (DDPG). The experimental results show that both methods can accelerate the progress of reinforcement learning to obtain the optimal solution.
    摘要 由于许多应用场景中的快速动态变化和大量不确定性,如何在序贯过程中采取正确行动以获利仍然是一个困难的问题。在这种复杂环境中,强化学习(RL)作为一种面向奖励的最优控制策略,已成为解决此类策略决策问题的潜在技术。然而,强化学习也存在一些缺陷,例如资源消耗过大、难以快速获得最优解,使其并不适合量化交易市场等许多金融问题。在本研究中,我们使用两种借助上下文信息的方法来克服这一问题:上下文Thompson采样与有监督引导的强化学习,二者都可以加速寻找最优解的迭代过程。为了研究量化市场中的策略交易,我们将早先的金融交易策略固定比例投资组合保险(CPPI)融入深度确定性策略梯度(DDPG)之中。实验结果表明,这两种方法都能加速强化学习的进程以获得最优解。
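
For reference, a minimal sketch of the classical CPPI allocation rule that the paper folds into DDPG; the floor, multiplier, and market returns below are arbitrary toy values.

```python
def cppi_allocation(portfolio_value: float, floor: float, multiplier: float = 3.0):
    """Constant Proportion Portfolio Insurance: risky exposure is a fixed multiple
    of the cushion above the protected floor; the remainder stays in the safe asset."""
    cushion = max(portfolio_value - floor, 0.0)
    risky = min(multiplier * cushion, portfolio_value)   # cap at total wealth (no leverage)
    safe = portfolio_value - risky
    return risky, safe

value, floor = 100.0, 80.0
for risky_return in (0.05, -0.10, 0.02):                 # a toy sequence of market moves
    risky, safe = cppi_allocation(value, floor)
    value = risky * (1.0 + risky_return) + safe
    print(f"risky={risky:.2f} safe={safe:.2f} -> portfolio={value:.2f}")
```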

Knowledge Engineering using Large Language Models

  • paper_url: http://arxiv.org/abs/2310.00637
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Bradley P. Allen, Lise Stork, Paul Groth
  • for: 这篇论文旨在探讨大语言模型在知识工程中的潜在作用,以及如何将其与传统的符号知识系统融合。
  • methods: 该论文提出了两个中心方向:1)创建混合神经符号知识系统;2)在自然语言中进行知识工程。
  • results: 该论文提出了一些关键的未解决问题,以便进一步探讨这两个方向。
    Abstract Knowledge engineering is a discipline that focuses on the creation and maintenance of processes that generate and apply knowledge. Traditionally, knowledge engineering approaches have focused on knowledge expressed in formal languages. The emergence of large language models and their capabilities to effectively work with natural language, in its broadest sense, raises questions about the foundations and practice of knowledge engineering. Here, we outline the potential role of LLMs in knowledge engineering, identifying two central directions: 1) creating hybrid neuro-symbolic knowledge systems; and 2) enabling knowledge engineering in natural language. Additionally, we formulate key open research questions to tackle these directions.
    摘要 知识工程是一个领域,它关注创建和维护生成和应用知识的过程。传统上,知识工程方法都是关注正式语言表达的知识。然而,大型自然语言模型的出现和它们可以有效地与自然语言进行交互,使得知识工程的基础和实践面临到了新的问题。以下是我们对大型自然语言模型在知识工程中的潜在作用的描述,以及两个中心方向:1. 创建混合神经符号知识系统;2. 实现自然语言知识工程。此外,我们还提出了关键的开放研究问题,以便解决这两个方向。

A Survey of Robustness and Safety of 2D and 3D Deep Learning Models Against Adversarial Attacks

  • paper_url: http://arxiv.org/abs/2310.00633
  • repo_url: None
  • paper_authors: Yanjie Li, Bin Xie, Songtao Guo, Yuanyuan Yang, Bin Xiao
  • for: 本研究旨在提高深度学习模型的可靠性和安全性,以应对对抗攻击以及应用场景中的物理对抗攻击。
  • methods: 本文首先从不同角度构建了通用的威胁模型,然后对最新的2D和3D对抗攻击进行了全面的文献综述;同时,本文将对抗样本的概念扩展到不可感知扰动之外,涵盖了多种类型的攻击方法。
  • results: 本文系统性地考察了3D模型对各类对抗攻击的鲁棒性,梳理了大量现有攻击方法,并分析了可能导致安全违规的物理对抗攻击。最后,本文总结了当前的热点话题,指出了未来研究的挑战与方向,以帮助构建可信的AI系统。
    Abstract Benefiting from the rapid development of deep learning, 2D and 3D computer vision applications are deployed in many safe-critical systems, such as autopilot and identity authentication. However, deep learning models are not trustworthy enough because of their limited robustness against adversarial attacks. The physically realizable adversarial attacks further pose fatal threats to the application and human safety. Lots of papers have emerged to investigate the robustness and safety of deep learning models against adversarial attacks. To lead to trustworthy AI, we first construct a general threat model from different perspectives and then comprehensively review the latest progress of both 2D and 3D adversarial attacks. We extend the concept of adversarial examples beyond imperceptive perturbations and collate over 170 papers to give an overview of deep learning model robustness against various adversarial attacks. To the best of our knowledge, we are the first to systematically investigate adversarial attacks for 3D models, a flourishing field applied to many real-world applications. In addition, we examine physical adversarial attacks that lead to safety violations. Last but not least, we summarize present popular topics, give insights on challenges, and shed light on future research on trustworthy AI.
    摘要 得益于深度学习的快速发展,2D和3D计算机视觉应用被部署在自动驾驶、身份认证等许多安全关键系统中。然而,深度学习模型对对抗攻击的鲁棒性有限,因而还不够可信;物理上可实现的对抗攻击更是对应用和人身安全构成致命威胁。已有大量论文研究深度学习模型在对抗攻击下的鲁棒性与安全性。为了迈向可信AI,我们首先从不同角度构建了一个通用威胁模型,然后全面回顾了2D和3D对抗攻击的最新进展。我们将对抗样本的概念扩展到不可感知扰动之外,整理了170余篇论文,概述了深度学习模型在各类对抗攻击下的鲁棒性。据我们所知,我们是第一个系统研究3D模型对抗攻击的工作,而3D模型已广泛应用于许多现实场景。此外,我们还考察了会导致安全违规的物理对抗攻击。最后,我们总结了当前的热点话题,给出了对挑战的见解,并展望了可信AI的未来研究方向。
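
Since the survey centers on adversarial examples, here is a self-contained toy of the classic FGSM attack applied to a linear (logistic) stand-in for a deep model; the weights and input are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy linear classifier standing in for the attacked deep model.
w = rng.normal(size=10)
b = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y_true, epsilon=0.1):
    """Fast Gradient Sign Method: perturb the input along the sign of the loss gradient.
    For a logistic model with BCE loss, d(loss)/dx = (p - y) * w."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y_true) * w
    return np.clip(x + epsilon * np.sign(grad_x), 0.0, 1.0)

x = rng.random(10)
y = 1.0
print("clean score:      ", sigmoid(w @ x + b))
print("adversarial score:", sigmoid(w @ fgsm(x, y) + b))
```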

Intelligent Client Selection for Federated Learning using Cellular Automata

  • paper_url: http://arxiv.org/abs/2310.00627
  • repo_url: https://github.com/nikopavl4/ca_client_selection
  • paper_authors: Nikolaos Pavlidis, Vasileios Perifanis, Theodoros Panagiotis Chatzinikolaou, Georgios Ch. Sirakoulis, Pavlos S. Efraimidis
  • for: 本研究旨在提出一种基于元胞自动机的联邦学习(Federated Learning)客户端选择算法,以增强隐私保护、降低延迟,并适应实际应用中快速变化的环境。
  • methods: 本研究提出了一种基于元胞自动机(Cellular Automata)的客户端选择算法(CA-CS),它综合考虑参与客户端的计算资源和通信能力,并顾及相邻客户端之间的交互,以为联邦学习过程选择最合适的客户端。
  • results: 实验结果显示,CA-CS 在达到与随机选择方法相近准确率的同时,能够有效避开高延迟的客户端。
    Abstract Federated Learning (FL) has emerged as a promising solution for privacy-enhancement and latency minimization in various real-world applications, such as transportation, communications, and healthcare. FL endeavors to bring Machine Learning (ML) down to the edge by harnessing data from million of devices and IoT sensors, thus enabling rapid responses to dynamic environments and yielding highly personalized results. However, the increased amount of sensors across diverse applications poses challenges in terms of communication and resource allocation, hindering the participation of all devices in the federated process and prompting the need for effective FL client selection. To address this issue, we propose Cellular Automaton-based Client Selection (CA-CS), a novel client selection algorithm, which leverages Cellular Automata (CA) as models to effectively capture spatio-temporal changes in a fast-evolving environment. CA-CS considers the computational resources and communication capacity of each participating client, while also accounting for inter-client interactions between neighbors during the client selection process, enabling intelligent client selection for online FL processes on data streams that closely resemble real-world scenarios. In this paper, we present a thorough evaluation of the proposed CA-CS algorithm using MNIST and CIFAR-10 datasets, while making a direct comparison against a uniformly random client selection scheme. Our results demonstrate that CA-CS achieves comparable accuracy to the random selection approach, while effectively avoiding high-latency clients.
    摘要 联邦学习(FL)已成为在交通、通信和医疗等各种现实应用中增强隐私保护并最小化延迟的有力解决方案。FL致力于把机器学习(ML)推向边缘,利用来自数百万设备和物联网传感器的数据,从而实现对动态环境的快速响应并产生高度个性化的结果。然而,不同应用中传感器数量的不断增加带来了通信与资源分配方面的挑战,阻碍了所有设备参与联邦过程,因此需要有效的FL客户端选择方法。为解决这一问题,我们提出了基于元胞自动机的客户端选择算法(CA-CS),利用元胞自动机(CA)作为模型,有效捕捉快速演化环境中的时空变化。CA-CS在客户端选择过程中考虑每个参与客户端的计算资源和通信能力,同时顾及相邻客户端之间的交互,从而为与真实场景高度相似的数据流上的在线联邦学习过程实现智能客户端选择。在本文中,我们使用MNIST和CIFAR-10数据集对所提出的CA-CS算法进行了全面评估,并与均匀随机的客户端选择方案进行了直接比较。结果表明,CA-CS在达到与随机选择方法相当的准确率的同时,能够有效避开高延迟客户端。
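
A rough sketch of CA-style client selection: clients sit on a 2D lattice, each cell's score mixes its own compute/bandwidth with a Moore-neighborhood term, and the top-k cells are selected. The scoring weights are invented, not CA-CS's actual rule.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = 8                                   # clients arranged on a CA-style 2D lattice
compute   = rng.uniform(0.1, 1.0, (grid, grid))
bandwidth = rng.uniform(0.1, 1.0, (grid, grid))
latency   = 1.0 / np.minimum(compute, bandwidth)   # crude per-client latency proxy

def neighbor_mean(x):
    """Moore-neighborhood average, mimicking local CA interactions between clients."""
    acc = np.zeros_like(x)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            acc += np.roll(np.roll(x, dy, axis=0), dx, axis=1)
    return acc / 9.0

# Own resources count positively, a slow neighborhood counts negatively (weights are made up).
score = 0.7 * (compute + bandwidth) / 2.0 - 0.3 * neighbor_mean(latency) / latency.max()

k = 10
top = np.argsort(score, axis=None)[::-1][:k]
selected = np.stack(np.unravel_index(top, score.shape), axis=1)
print("selected client cells (row, col):")
print(selected)
```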

Hierarchical Adaptation with Hypernetworks for Few-shot Molecular Property Prediction

  • paper_url: http://arxiv.org/abs/2310.00614
  • repo_url: None
  • paper_authors: Shiguang Wu, Yaqing Wang, Quanming Yao
  • for: 这篇论文旨在提出一种基于超网络(hypernetwork)的层次适应机制,以解决生物医学应用中的少样本分子性质预测问题。
  • methods: 该论文提出的层次适应机制包括:利用超网络在消息传递过程中调制节点嵌入,从而有选择地适应编码器参数;以及利用另一个超网络在预测器中进行分子级的层次适应。
  • results: 实验结果显示,基于层次适应机制的方法在少样本分子性质预测问题上取得了最先进的性能。
    Abstract Molecular property prediction (MPP) is important in biomedical applications, which naturally suffers from a lack of labels, thus forming a few-shot learning problem. State-of-the-art approaches are usually based on gradient-based meta learning strategy, which ignore difference in model parameter and molecule's learning difficulty. To address above problems, we propose a novel hierarchical adaptation mechanism for few-shot MPP (HiMPP). The model follows a encoder-predictor framework. First, to make molecular representation property-adaptive, we selectively adapt encoder's parameter by designing a hypernetwork to modulate node embeddings during message propagation. Next, we make molecule-level adaptation by design another hypernetwork, which assigns larger propagating steps for harder molecules in predictor. In this way, molecular representation is transformed by HiMPP hierarchically from property-level to molecular level. Extensive results show that HiMPP obtains the state-of-the-art performance in few-shot MPP problems, and our proposed hierarchical adaptation mechanism is rational and effective.
    摘要 分子性质预测(MPP)在生物医学应用中具有重要意义,但天然面临标签缺乏的问题,因而构成一个少样本学习问题。最先进的方法通常基于梯度元学习策略,忽略了模型参数之间的差异以及分子学习难度的差异。为解决上述问题,我们提出了一种用于少样本MPP的新型层次适应机制(HiMPP)。模型采用编码器-预测器框架。首先,为使分子表示具备性质自适应能力,我们设计了一个超网络在消息传递过程中调制节点嵌入,从而有选择地适应编码器参数。随后,我们设计了另一个超网络进行分子级适应,在预测器中为更难的分子分配更多的传播步数。通过这种方式,分子表示由HiMPP从性质层面到分子层面逐层适应地进行转换。大量实验结果表明,HiMPP在少样本MPP问题上取得了最先进的性能,并且所提出的层次适应机制是合理且有效的。
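
A small sketch of the property-level adaptation idea: a linear "hypernetwork" maps a property/task embedding to per-dimension scale and shift that modulate node embeddings (FiLM-style). Dimensions and values are random placeholders, not HiMPP's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d_task, d_node = 8, 16

# Hypernetwork: a linear map from the property/task embedding to modulation parameters.
W = rng.normal(scale=0.1, size=(2 * d_node, d_task))

def modulate(node_embeddings, task_embedding):
    gamma_beta = W @ task_embedding
    gamma = 1.0 + gamma_beta[:d_node]      # multiplicative adaptation
    beta = gamma_beta[d_node:]             # additive adaptation
    return node_embeddings * gamma + beta

nodes = rng.normal(size=(5, d_node))       # 5 atoms of a toy molecule
task = rng.normal(size=d_task)             # embedding of the target property
adapted = modulate(nodes, task)
print("adapted node embeddings:", adapted.shape)
```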

Understanding AI Cognition: A Neural Module for Inference Inspired by Human Memory Mechanisms

  • paper_url: http://arxiv.org/abs/2310.09297
  • repo_url: https://github.com/zengxyyu/A-neural-module-for-inference-inspired-by-human-memory-mechanisms
  • paper_authors: Xiangyu Zeng, Jie Lin, Piao Hu, Ruizheng Huang, Zhicheng Zhang
  • for: The paper aims to improve the ability of machines to make sense of current inputs and retain information for relation reasoning and question-answering by proposing a PMI framework inspired by human brain’s memory system and cognitive architectures.
  • methods: The PMI framework consists of perception, memory, and inference components, with a differentiable competitive write access, working memory, and long-term memory with a higher-order structure. The framework also uses outer product associations to merge working memory with long-term memory and retrieve relevant information from two separate memory origins for associative integration.
  • results: The paper exploratively applies the PMI framework to improve prevailing Transformers and CNN models on question-answering tasks like bAbI-20k and Sort-of-CLEVR datasets, as well as relation calculation and image classification tasks, and in each case, the PMI enhancements consistently outshine their original counterparts significantly. Visualization analyses reveal that memory consolidation and the interaction and integration of information from diverse memory sources substantially contribute to the model effectiveness on inference tasks.
    Abstract How humans and machines make sense of current inputs for relation reasoning and question-answering while putting the perceived information into context of our past memories, has been a challenging conundrum in cognitive science and artificial intelligence. Inspired by human brain's memory system and cognitive architectures, we propose a PMI framework that consists of perception, memory and inference components. Notably, the memory module comprises working and long-term memory, with the latter endowed with a higher-order structure to retain more accumulated knowledge and experiences. Through a differentiable competitive write access, current perceptions update working memory, which is later merged with long-term memory via outer product associations, averting memory overflow and minimizing information conflicts. In the inference module, relevant information is retrieved from two separate memory origins and associatively integrated to attain a more comprehensive and precise interpretation of current perceptions. We exploratively apply our PMI to improve prevailing Transformers and CNN models on question-answering tasks like bAbI-20k and Sort-of-CLEVR datasets, as well as relation calculation and image classification tasks, and in each case, our PMI enhancements consistently outshine their original counterparts significantly. Visualization analyses reveal that memory consolidation, along with the interaction and integration of information from diverse memory sources, substantially contributes to the model effectiveness on inference tasks.
    摘要 人类与机器如何在将感知信息置于过去记忆背景之中的同时,对当前输入进行关系推理和问答,一直是认知科学与人工智能领域的难题。受人脑记忆系统与认知架构的启发,我们提出了由感知、记忆和推理组件构成的PMI框架。其中,记忆模块包括工作记忆和长期记忆,后者具有更高阶的结构,以保留更多累积的知识与经验。通过可微分的竞争性写入访问,当前感知更新工作记忆,随后经由外积关联与长期记忆合并,避免记忆溢出并将信息冲突降到最低。在推理模块中,从两个独立的记忆来源检索相关信息并进行关联整合,以获得对当前感知更全面、更精确的解读。我们尝试将PMI应用于改进主流的Transformer和CNN模型,在bAbI-20k和Sort-of-CLEVR等问答数据集以及关系计算和图像分类任务上,PMI增强版本都显著优于其原始模型。可视化分析表明,记忆巩固以及来自不同记忆来源的信息交互与整合,对模型在推理任务上的有效性贡献显著。
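
A minimal sketch of the outer-product association step: working-memory key/value pairs are consolidated into a long-term matrix and later recalled from a noisy cue. Dimensions and vectors are random placeholders, not the PMI module itself.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64

def normalize(v):
    return v / (np.linalg.norm(v) + 1e-9)

M = np.zeros((d, d))   # long-term memory as an outer-product association matrix

def consolidate(M, key, value):
    """Merge a working-memory item into long-term memory via an outer product."""
    return M + np.outer(normalize(value), normalize(key))

def recall(M, cue):
    """Retrieve the value associated with a (possibly noisy) cue."""
    return M @ normalize(cue)

keys   = [normalize(rng.normal(size=d)) for _ in range(5)]
values = [normalize(rng.normal(size=d)) for _ in range(5)]
for k, v in zip(keys, values):
    M = consolidate(M, k, v)

noisy_cue = normalize(keys[2] + 0.3 * rng.normal(size=d))
retrieved = normalize(recall(M, noisy_cue))
print("cosine similarity to stored value:", float(retrieved @ values[2]))
```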

Adapting LLM Agents Through Communication

  • paper_url: http://arxiv.org/abs/2310.01444
  • repo_url: None
  • paper_authors: Kuan Wang, Yadong Lu, Michael Santacroce, Yeyun Gong, Chao Zhang, Yelong Shen
  • for: 这个论文旨在提出一种名为“学习通信”(LTC)的训练方法,帮助大型自然语言模型(LLM) agents 在不需要广泛人类指导下,适应新任务。
  • methods: 该方法基于 iterative exploration 和 PPO 训练,使得 LLM agents 可以通过与环境和其他代理交互,不断提高自己的能力。
  • results: 在 ALFWorld、HotpotQA 和 GSM8k 三个数据集上,LTC 方法分别比基线高出 12%、5.1% 和 3.6%,这些结果表明 LTC 方法在多种领域中具有广泛的应用前景。
    Abstract Recent advancements in large language models (LLMs) have shown potential for human-like agents. To help these agents adapt to new tasks without extensive human supervision, we propose the Learning through Communication (LTC) paradigm, a novel training approach enabling LLM agents to improve continuously through interactions with their environments and other agents. Through iterative exploration and PPO training, LTC empowers the agent to assimilate short-term experiences into long-term memory. To optimize agent interactions for task-specific learning, we introduce three structured communication patterns: Monologue, Dialogue, and Analogue-tailored for common tasks such as decision-making, knowledge-intensive reasoning, and numerical reasoning. We evaluated LTC on three datasets: ALFWorld (decision-making), HotpotQA (knowledge-intensive reasoning), and GSM8k (numerical reasoning). On ALFWorld, it exceeds the instruction tuning baseline by 12% in success rate. On HotpotQA, LTC surpasses the instruction-tuned LLaMA-7B agent by 5.1% in EM score, and it outperforms the instruction-tuned 9x larger PaLM-62B agent by 0.6%. On GSM8k, LTC outperforms the CoT-Tuning baseline by 3.6% in accuracy. The results showcase the versatility and efficiency of the LTC approach across diverse domains. We will open-source our code to promote further development of the community.
    摘要 最近大语言模型(LLM)的进步展现出了类人智能体的潜力。为了帮助这些智能体在无需大量人工监督的情况下适应新任务,我们提出了"通过交流学习"(LTC)范式,这是一种新的训练方法,使LLM智能体能够通过与环境及其他智能体的交互不断改进。通过迭代探索和PPO训练,LTC使智能体能够将短期经验吸收为长期记忆。为了针对特定任务优化智能体之间的交互,我们引入了三种结构化的交流模式:独白(Monologue)、对话(Dialogue)和类比(Analogue),分别面向决策、知识密集型推理和数值推理等常见任务。我们在三个数据集上评估了LTC:ALFWorld(决策)、HotpotQA(知识密集型推理)和GSM8k(数值推理)。在ALFWorld上,它的成功率比指令微调基线高出12%;在HotpotQA上,LTC的EM分数比指令微调的LLaMA-7B智能体高出5.1%,并比参数规模大9倍的指令微调PaLM-62B智能体高出0.6%;在GSM8k上,LTC的准确率比CoT-Tuning基线高出3.6%。这些结果展示了LTC方法在不同领域中的通用性和高效性。我们将开源代码,以促进社区的进一步发展。

Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals

  • paper_url: http://arxiv.org/abs/2310.00603
  • repo_url: None
  • paper_authors: Yair Gat, Nitay Calderon, Amir Feder, Alexander Chapanin, Amit Sharma, Roi Reichart
  • for: 为确保NLP系统的安全性与可信度,需要为黑盒模型的预测提供忠实的因果解释。
  • methods: 提出两种模型无关的反事实(CF)近似方法:其一是CF生成方法,通过提示大语言模型(LLM)在保持混淆概念不变的情况下修改特定文本概念;其二是匹配方法,在训练阶段由LLM引导,学习一个忠实于给定因果图的专用嵌入空间,用于寻找近似CF的匹配样本。
  • results: 实验结果表明,CF生成方法非常有效,但在推理阶段调用LLM的成本较高;匹配方法所需的测试资源少得多,同样能提供有效的解释,超越许多基线;此外,Top-K技术能普遍提升所有被测试的方法。
    Abstract Causal explanations of the predictions of NLP systems are essential to ensure safety and establish trust. Yet, existing methods often fall short of explaining model predictions effectively or efficiently and are often model-specific. In this paper, we address model-agnostic explanations, proposing two approaches for counterfactual (CF) approximation. The first approach is CF generation, where a large language model (LLM) is prompted to change a specific text concept while keeping confounding concepts unchanged. While this approach is demonstrated to be very effective, applying LLM at inference-time is costly. We hence present a second approach based on matching, and propose a method that is guided by an LLM at training-time and learns a dedicated embedding space. This space is faithful to a given causal graph and effectively serves to identify matches that approximate CFs. After showing theoretically that approximating CFs is required in order to construct faithful explanations, we benchmark our approaches and explain several models, including LLMs with billions of parameters. Our empirical results demonstrate the excellent performance of CF generation models as model-agnostic explainers. Moreover, our matching approach, which requires far less test-time resources, also provides effective explanations, surpassing many baselines. We also find that Top-K techniques universally improve every tested method. Finally, we showcase the potential of LLMs in constructing new benchmarks for model explanation and subsequently validate our conclusions. Our work illuminates new pathways for efficient and accurate approaches to interpreting NLP systems.
    摘要 对NLP系统预测结果的因果解释对于确保安全和建立信任至关重要。然而,现有方法往往难以有效或高效地解释模型预测,并且通常与特定模型绑定。在本文中,我们研究模型无关的解释,提出了两种反事实(CF)近似方法。第一种是CF生成方法,即提示大语言模型(LLM)在保持混淆概念不变的情况下修改特定的文本概念。该方法虽然被证明非常有效,但在推理阶段调用LLM的代价很高。因此我们提出了第二种基于匹配的方法:在训练阶段由LLM引导,学习一个专用的嵌入空间。该空间忠实于给定的因果图,能够有效地识别近似CF的匹配样本。在理论上证明近似CF是构造忠实解释的必要条件之后,我们对这两种方法进行了基准测试,并解释了包括具有数十亿参数的LLM在内的多个模型。实验结果表明,CF生成模型作为模型无关的解释器表现出色;而我们的匹配方法所需的测试资源少得多,也能提供有效的解释,超越许多基线。我们还发现Top-K技术能普遍提升所有被测试的方法。最后,我们展示了LLM在构建新的模型解释基准方面的潜力,并据此验证了我们的结论。我们的工作为高效、准确地解释NLP系统开辟了新的路径。

A Novel Computational and Modeling Foundation for Automatic Coherence Assessment

  • paper_url: http://arxiv.org/abs/2310.00598
  • repo_url: None
  • paper_authors: Aviya Maimon, Reut Tsarfaty
  • for: 这篇论文主要针对了自然语言处理(NLP)中的 coherence 评估问题,即文本听起来有意义和连贯的问题。
  • methods: 该论文采用形式语言学对连贯性的定义,包括三个条件:衔接(cohesion)、一致(consistency)和相关(relevance),并将这些条件形式化为相应的计算任务;假设一个联合学习所有任务的模型能够学习到连贯性评估所需的特征。
  • results: 在两个由人类评分的基准上进行的实验表明,无论是在单个任务上还是在整体连贯性评估上,联合模型都优于单任务模型。这些结果表明,该方法为大规模自动连贯性评估提供了坚实基础。
    Abstract Coherence is an essential property of well-written texts, that refers to the way textual units relate to one another. In the era of generative AI, coherence assessment is essential for many NLP tasks; summarization, generation, long-form question-answering, and more. However, in NLP {coherence} is an ill-defined notion, not having a formal definition or evaluation metrics, that would allow for large-scale automatic and systematic coherence assessment. To bridge this gap, in this work we employ the formal linguistic definition of \citet{Reinhart:1980} of what makes a discourse coherent, consisting of three conditions -- {\em cohesion, consistency} and {\em relevance} -- and formalize these conditions as respective computational tasks. We hypothesize that (i) a model trained on all of these tasks will learn the features required for coherence detection, and that (ii) a joint model for all tasks will exceed the performance of models trained on each task individually. On two benchmarks for coherence scoring rated by humans, one containing 500 automatically-generated short stories and another containing 4k real-world texts, our experiments confirm that jointly training on the proposed tasks leads to better performance on each task compared with task-specific models, and to better performance on assessing coherence overall, compared with strong baselines. We conclude that the formal and computational setup of coherence as proposed here provides a solid foundation for advanced methods of large-scale automatic assessment of coherence.
    摘要 连贯性是高质量文本的一个基本属性,指的是文本单元之间的相互关联方式。在生成式AI时代,连贯性评估对许多NLP任务(摘要、生成、长文问答等)都至关重要。然而在NLP中,"连贯性"是一个定义不清的概念,既没有形式化定义,也没有可用于大规模自动化、系统化评估的指标。为弥补这一差距,本工作采用Reinhart(1980)对语篇连贯性的形式语言学定义,即三个条件:衔接(cohesion)、一致(consistency)和相关(relevance),并将这些条件形式化为相应的计算任务。我们假设:(i)在所有这些任务上训练的模型会学习到连贯性检测所需的特征;(ii)针对所有任务的联合模型会优于在单个任务上分别训练的模型。在两个由人工评分的连贯性基准上(其一包含500篇自动生成的短篇故事,其二包含4000篇真实世界文本),实验证实:联合训练所提出的任务在每个任务上都优于单任务模型,在整体连贯性评估上也优于强基线。我们的结论是,这里提出的连贯性的形式化与计算化设定,为大规模自动连贯性评估的先进方法奠定了坚实基础。

Quantum generative adversarial learning in photonics

  • paper_url: http://arxiv.org/abs/2310.00585
  • repo_url: None
  • paper_authors: Yizhi Wang, Shichuan Xue, Yaxuan Wang, Yong Liu, Jiangfang Ding, Weixu Shi, Dongyang Wang, Yingwen Liu, Xiang Fu, Guangyao Huang, Anqi Huang, Mingtang Deng, Junjie Wu
  • for: 本研究旨在调查 Whether Quantum Generative Adversarial Networks (QGANs) can perform learning tasks on near-term quantum devices usually affected by noise and even defects.
  • methods: 我们使用了一个可编程的硅量子光学芯片,实验了 QGAN 模型在光学领域中,并研究了噪声和缺陷对其性能的影响。
  • results: 我们的结果表明,即使Generator的相位调制器中有一半被损坏,或Generator和Discriminator的相位调制器都受到相位噪声达0.04π,QGANs仍然可以生成高质量量子数据,其准确率高于90%。
    Abstract Quantum Generative Adversarial Networks (QGANs), an intersection of quantum computing and machine learning, have attracted widespread attention due to their potential advantages over classical analogs. However, in the current era of Noisy Intermediate-Scale Quantum (NISQ) computing, it is essential to investigate whether QGANs can perform learning tasks on near-term quantum devices usually affected by noise and even defects. In this Letter, using a programmable silicon quantum photonic chip, we experimentally demonstrate the QGAN model in photonics for the first time, and investigate the effects of noise and defects on its performance. Our results show that QGANs can generate high-quality quantum data with a fidelity higher than 90\%, even under conditions where up to half of the generator's phase shifters are damaged, or all of the generator and discriminator's phase shifters are subjected to phase noise up to 0.04$\pi$. Our work sheds light on the feasibility of implementing QGANs on NISQ-era quantum hardware.
    摘要 量子生成对抗网络(QGAN)作为量子计算与机器学习的交叉领域,因其相对经典模型的潜在优势而受到广泛关注。然而,在含噪中等规模量子(NISQ)计算时代,必须研究QGAN能否在通常受噪声甚至缺陷影响的近期量子设备上完成学习任务。在这封快报中,我们使用可编程硅基量子光子芯片,首次在光子体系中实验演示了QGAN模型,并研究了噪声和缺陷对其性能的影响。我们的结果表明,即使生成器中多达一半的相位调制器损坏,或生成器与判别器的全部相位调制器受到高达0.04π的相位噪声,QGAN仍能生成保真度高于90%的高质量量子数据。我们的工作为在NISQ时代量子硬件上实现QGAN的可行性提供了依据。

CityFM: City Foundation Models to Solve Urban Challenges

  • paper_url: http://arxiv.org/abs/2310.00583
  • repo_url: None
  • paper_authors: Pasquale Balsebre, Weiming Huang, Gao Cong, Yi Li
  • for: 本研究旨在开发一种基于自适应学习的城市基础模型(CityFM),以便在选定的地理区域内(如城市)进行自动化学习。
  • methods: CityFM 基于开源地理数据(如 OpenStreetMap)进行自我超vision,通过对不同类型实体(如路径、建筑物、区域)的多模式信息进行拟合,生成高质量的基础表示。
  • results: 对于路、建筑物和区域等下游任务,CityFM 的表示能够超过或与特定应用程序的基elines匹配。
    Abstract Pre-trained Foundation Models (PFMs) have ushered in a paradigm-shift in Artificial Intelligence, due to their ability to learn general-purpose representations that can be readily employed in a wide range of downstream tasks. While PFMs have been successfully adopted in various fields such as Natural Language Processing and Computer Vision, their capacity in handling geospatial data and answering urban questions remains limited. This can be attributed to the intrinsic heterogeneity of geospatial data, which encompasses different data types, including points, segments and regions, as well as multiple information modalities, such as a spatial position, visual characteristics and textual annotations. The proliferation of Volunteered Geographic Information initiatives, and the ever-increasing availability of open geospatial data sources, like OpenStreetMap, which is freely accessible globally, unveil a promising opportunity to bridge this gap. In this paper, we present CityFM, a self-supervised framework to train a foundation model within a selected geographical area of interest, such as a city. CityFM relies solely on open data from OSM, and produces multimodal representations of entities of different types, incorporating spatial, visual, and textual information. We analyse the entity representations generated using our foundation models from a qualitative perspective, and conduct quantitative experiments on road, building, and region-level downstream tasks. We compare its results to algorithms tailored specifically for the respective applications. In all the experiments, CityFM achieves performance superior to, or on par with, the baselines.
    摘要 预训练基础模型(PFM)能够学习可直接用于各类下游任务的通用表示,从而为人工智能带来了范式转变。尽管PFM已经在自然语言处理和计算机视觉等多个领域得到成功应用,但它们在处理地理空间数据和回答城市问题方面的能力仍然有限。这可归因于地理空间数据固有的异质性:它包含点、线段和区域等不同的数据类型,以及空间位置、视觉特征和文本标注等多种信息模态。志愿地理信息计划的蓬勃发展,以及OpenStreetMap等全球免费开放的地理空间数据源日益增多,为弥合这一差距提供了有希望的机会。在本文中,我们提出了CityFM,一个在选定的地理区域(如一座城市)内训练基础模型的自监督框架。CityFM仅依赖来自OSM的开放数据,为不同类型的实体生成融合空间、视觉与文本信息的多模态表示。我们从定性角度分析了基础模型生成的实体表示,并在道路、建筑物和区域级下游任务上进行了定量实验,将其结果与为相应应用专门设计的算法进行比较。在所有实验中,CityFM的性能均优于或不逊于基线。

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

  • paper_url: http://arxiv.org/abs/2310.00582
  • repo_url: https://github.com/sy-xuan/pink
  • paper_authors: Shiyu Xuan, Qingpei Guo, Ming Yang, Shiliang Zhang
  • for: This paper aims to enhance the Referential Comprehension (RC) ability of Multi-modal Large Language Models (MLLMs) for fine-grained perception tasks.
  • methods: The proposed method represents the referring object in the image using the coordinates of its bounding box and converts the coordinates into texts in a specific format, allowing the model to treat the coordinates as natural language. The model is trained end-to-end with a parameter-efficient tuning framework that allows both modalities to benefit from multi-modal instruction tuning.
  • results: The proposed method demonstrates superior performance on conventional vision-language and RC tasks, achieving a 12.0% absolute accuracy improvement over Instruct-BLIP on VSR and surpassing Kosmos-2 by 24.7% on RefCOCO_val under zero-shot settings. The model also attains the top position on the leaderboard of MMBench.
    Abstract Multi-modal Large Language Models (MLLMs) have shown remarkable capabilities in many vision-language tasks. Nevertheless, most MLLMs still lack the Referential Comprehension (RC) ability to identify a specific object or area in images, limiting their application in fine-grained perception tasks. This paper proposes a novel method to enhance the RC capability for MLLMs. Our model represents the referring object in the image using the coordinates of its bounding box and converts the coordinates into texts in a specific format. This allows the model to treat the coordinates as natural language. Moreover, we construct the instruction tuning dataset with various designed RC tasks at a low cost by unleashing the potential of annotations in existing datasets. To further boost the RC ability of the model, we propose a self-consistent bootstrapping method that extends dense object annotations of a dataset into high-quality referring-expression-bounding-box pairs. The model is trained end-to-end with a parameter-efficient tuning framework that allows both modalities to benefit from multi-modal instruction tuning. This framework requires fewer trainable parameters and less training data. Experimental results on conventional vision-language and RC tasks demonstrate the superior performance of our method. For instance, our model exhibits a 12.0% absolute accuracy improvement over Instruct-BLIP on VSR and surpasses Kosmos-2 by 24.7% on RefCOCO_val under zero-shot settings. We also attain the top position on the leaderboard of MMBench. The models, datasets, and codes are publicly available at https://github.com/SY-Xuan/Pink
    摘要 多模态大语言模型(MLLM)在许多视觉语言任务中表现出了很强的能力。然而,大多数MLLM仍然缺乏指称理解(RC)能力,即无法定位图像中特定的对象或区域,这限制了它们在细粒度感知任务中的应用。本文提出了一种增强MLLM指称理解能力的新方法。我们的模型使用边界框坐标来表示图像中被指称的对象,并将坐标转换为特定格式的文本,使模型能够将坐标视为自然语言处理。此外,我们通过挖掘现有数据集中标注的潜力,以低成本构建了包含多种指称理解任务的指令微调数据集。为进一步提升模型的指称理解能力,我们提出了一种自洽的自举方法,将数据集的密集对象标注扩展为高质量的"指称表达-边界框"对。模型通过一个参数高效的微调框架进行端到端训练,使两种模态都能从多模态指令微调中获益;该框架所需的可训练参数和训练数据都更少。在常规视觉语言任务与指称理解任务上的实验结果表明了我们方法的优越性能。例如,我们的模型在VSR上比Instruct-BLIP绝对准确率高出12.0%,在零样本设置下在RefCOCO_val上比Kosmos-2高出24.7%;此外,我们的模型还在MMBench排行榜上位居榜首。模型、数据集和代码均可在 https://github.com/SY-Xuan/Pink 获取。
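
A small sketch of the coordinate-as-text idea: a bounding box is normalized and serialized into a textual tag appended to the instruction. The exact format Pink uses may differ from the one assumed here.

```python
def bbox_to_text(box, img_w, img_h):
    """Serialize a bounding box as plain text so an LLM can treat coordinates as
    language; coordinates are normalized to [0, 1]."""
    x1, y1, x2, y2 = box
    norm = [x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h]
    return "[" + ",".join(f"{v:.3f}" for v in norm) + "]"

def build_referring_instruction(question, box, img_w, img_h):
    return f"{question} Please answer about the object at {bbox_to_text(box, img_w, img_h)}."

print(build_referring_instruction("What color is this car?", (48, 120, 310, 400), 640, 480))
```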

Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion

  • paper_url: http://arxiv.org/abs/2310.02279
  • repo_url: https://github.com/sony/ctm
  • paper_authors: Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yutong He, Yuki Mitsufuji, Stefano Ermon
  • for: 加速扩散模型采样,提高扩散模型的性能。
  • methods: 提出一种新的一致性轨迹模型(CTM),可以在单次前向传播中输出分数(即对数密度的梯度),并允许沿概率流ODE在扩散过程的任意初始与终止时刻之间自由遍历。
  • results: CTM在CIFAR-10和ImageNet 64x64分辨率上取得了单步扩散采样的最新FID值(分别为1.73和2.06),并且随着计算预算增加能持续提升样本质量,避免了CM中出现的质量退化。
    Abstract Consistency Models (CM) (Song et al., 2023) accelerate score-based diffusion model sampling at the cost of sample quality but lack a natural way to trade-off quality for speed. To address this limitation, we propose Consistency Trajectory Model (CTM), a generalization encompassing CM and score-based models as special cases. CTM trains a single neural network that can -- in a single forward pass -- output scores (i.e., gradients of log-density) and enables unrestricted traversal between any initial and final time along the Probability Flow Ordinary Differential Equation (ODE) in a diffusion process. CTM enables the efficient combination of adversarial training and denoising score matching loss to enhance performance and achieves new state-of-the-art FIDs for single-step diffusion model sampling on CIFAR-10 (FID 1.73) and ImageNet at 64X64 resolution (FID 2.06). CTM also enables a new family of sampling schemes, both deterministic and stochastic, involving long jumps along the ODE solution trajectories. It consistently improves sample quality as computational budgets increase, avoiding the degradation seen in CM. Furthermore, CTM's access to the score accommodates all diffusion model inference techniques, including exact likelihood computation.
    摘要 一致性模型(CM)(Song et al., 2023)能够加速基于分数的扩散模型采样,但以牺牲样本质量为代价,且缺乏在质量与速度之间自然权衡的方式。为了解决这一限制,我们提出了一致性轨迹模型(CTM),它是将CM与基于分数的模型作为特例包含在内的一种推广。CTM训练单个神经网络,使其能够在一次前向传播中输出分数(即对数密度的梯度),并允许沿扩散过程的概率流常微分方程(Probability Flow ODE)在任意初始与终止时刻之间不受限制地遍历。CTM能够高效地结合对抗训练与去噪分数匹配损失以提升性能,并在单步扩散模型采样上取得了新的最优FID值:CIFAR-10(FID 1.73)和64x64分辨率的ImageNet(FID 2.06)。CTM还支持一族新的采样方案,包括沿ODE解轨迹进行长距离跳跃的确定性与随机性方案。随着计算预算的增加,CTM能够持续提升样本质量,避免了CM中出现的质量退化。此外,由于CTM可以访问分数,它兼容所有扩散模型的推断技术,包括精确的似然计算。

LaPLACE: Probabilistic Local Model-Agnostic Causal Explanations

  • paper_url: http://arxiv.org/abs/2310.00570
  • repo_url: https://github.com/simon-tan/laplace
  • paper_authors: Sein Minn
  • for: The paper aims to provide probabilistic cause-and-effect explanations for any classifier operating on tabular data, in a human-understandable manner.
  • methods: The LaPLACE-Explainer component leverages the concept of a Markov blanket to establish statistical boundaries between relevant and non-relevant features automatically, and incorporates conditional probabilities to offer probabilistic causal explanations.
  • results: The approach outperforms LIME and SHAP in terms of local accuracy and consistency of explained features, and is validated across various classification models through experiments with both simulated and real-world datasets. The explanations provided by LaPLACE can address trust-related issues such as evaluating prediction reliability, facilitating model selection, enhancing trustworthiness, and identifying fairness-related concerns within classifiers.
    Abstract Machine learning models have undeniably achieved impressive performance across a range of applications. However, their often perceived black-box nature, and lack of transparency in decision-making, have raised concerns about understanding their predictions. To tackle this challenge, researchers have developed methods to provide explanations for machine learning models. In this paper, we introduce LaPLACE-explainer, designed to provide probabilistic cause-and-effect explanations for any classifier operating on tabular data, in a human-understandable manner. The LaPLACE-Explainer component leverages the concept of a Markov blanket to establish statistical boundaries between relevant and non-relevant features automatically. This approach results in the automatic generation of optimal feature subsets, serving as explanations for predictions. Importantly, this eliminates the need to predetermine a fixed number N of top features as explanations, enhancing the flexibility and adaptability of our methodology. Through the incorporation of conditional probabilities, our approach offers probabilistic causal explanations and outperforms LIME and SHAP (well-known model-agnostic explainers) in terms of local accuracy and consistency of explained features. LaPLACE's soundness, consistency, local accuracy, and adaptability are rigorously validated across various classification models. Furthermore, we demonstrate the practical utility of these explanations via experiments with both simulated and real-world datasets. This encompasses addressing trust-related issues, such as evaluating prediction reliability, facilitating model selection, enhancing trustworthiness, and identifying fairness-related concerns within classifiers.
    摘要 机器学习模型在多种应用场景中表现出色,但它们的很多时候被视为黑盒模型,无法准确地描述它们的预测结果。为解决这个问题,研究人员开发了一些方法来提供机器学习模型的解释。本文介绍了LaPLACE-explainer,可以为任何基于表格数据的分类器提供 probabilistic cause-and-effect 的解释,并且在人类可以理解的方式下进行解释。LaPLACE-Explainer 组件利用 Markov blanket 的概念,自动地确定相关和无关的特征。这种方法可以自动生成最佳的特征子集,作为预测的解释。这种方法不需要手动决定固定的特征数 N 作为解释,从而提高了方法的灵活性和适应性。通过 incorporating conditional probabilities,我们的方法可以提供 probabilistic causal 的解释,并且在本地准确性和解释特征的一致性方面超过 LIME 和 SHAP(已知的模型无关解释器)。LaPLACE 的准确性、一致性、本地准确性和适应性被严格验证了多种分类模型。此外,我们通过对 simulated 和实际数据进行实验,证明了这些解释的实际用途。这包括评估预测可靠性、促进模型选择、增强可靠性和识别分类器中的公平问题。

Quantum-Based Feature Selection for Multi-classification Problem in Complex Systems with Edge Computing

  • paper_url: http://arxiv.org/abs/2310.01443
  • repo_url: None
  • paper_authors: Wenjie Liu, Junxiu Chen, Yuxiang Wang, Peipei Gao, Zhibin Lei, Xu Ma
  • for: The paper proposes a quantum feature selection algorithm for multi-classification problems, aiming to improve computational efficiency and reduce resource consumption.
  • methods: Each sample's features are encoded into a quantum state via the CMP and R_y operations, amplitude estimation computes the similarity between any two samples, and the Grover-Long method finds the k nearest neighbors before the weight vector is updated; iterating this process yields the selected features (a classical ReliefF sketch follows this entry for reference).
  • results: Compared with the classical ReliefF algorithm, the method reduces the complexity of similarity calculation from O(MN) to O(M), of nearest-neighbor search from O(M) to O(sqrt(M)), and resource consumption from O(MN) to O(MlogN); compared with the quantum Relief algorithm, it is superior in nearest-neighbor search, reducing that complexity from O(M) to O(sqrt(M)). Feasibility is verified with a simple simulation experiment based on Rigetti.
    Abstract The complex systems with edge computing require a huge amount of multi-feature data to extract appropriate insights for their decision making, so it is important to find a feasible feature selection method to improve the computational efficiency and save the resource consumption. In this paper, a quantum-based feature selection algorithm for the multi-classification problem, namely, QReliefF, is proposed, which can effectively reduce the complexity of algorithm and improve its computational efficiency. First, all features of each sample are encoded into a quantum state by performing operations CMP and R_y, and then the amplitude estimation is applied to calculate the similarity between any two quantum states (i.e., two samples). According to the similarities, the Grover-Long method is utilized to find the nearest k neighbor samples, and then the weight vector is updated. After a certain number of iterations through the above process, the desired features can be selected with regards to the final weight vector and the threshold {\tau}. Compared with the classical ReliefF algorithm, our algorithm reduces the complexity of similarity calculation from O(MN) to O(M), the complexity of finding the nearest neighbor from O(M) to O(sqrt(M)), and resource consumption from O(MN) to O(MlogN). Meanwhile, compared with the quantum Relief algorithm, our algorithm is superior in finding the nearest neighbor, reducing the complexity from O(M) to O(sqrt(M)). Finally, in order to verify the feasibility of our algorithm, a simulation experiment based on Rigetti with a simple example is performed.
    摘要 带有边缘计算的复杂系统需要大量多特征数据来提取适当的洞察以支撑决策,因此需要一种可行的特征选择方法来提高计算效率、节省资源消耗。本文提出了一种面向多分类问题的基于量子的特征选择算法,即QReliefF,可以有效降低算法复杂度并提高计算效率。首先,通过CMP和R_y操作将每个样本的所有特征编码为量子态;然后,应用振幅估计计算任意两个量子态(即两个样本)之间的相似度;根据相似度,使用格罗弗-隆(Grover-Long)方法查找最近的k个邻居样本,并更新权重向量。经过一定次数的迭代后,即可依据最终的权重向量和阈值{\tau}选出所需特征。与经典的ReliefF算法相比,我们的算法将相似度计算的复杂度从O(MN)降低到O(M),将最近邻查找的复杂度从O(M)降低到O(sqrt(M)),并将资源消耗从O(MN)降低到O(MlogN)。同时,与量子Relief算法相比,我们的算法在最近邻查找方面更优,复杂度从O(M)降低到O(sqrt(M))。最后,为验证算法的可行性,我们基于Rigetti用一个简单示例进行了仿真实验。
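
As a point of reference for the complexity comparison above, here is a minimal NumPy sketch of the classical ReliefF weight update that QReliefF accelerates. It is a simplified classical baseline under stated assumptions (L1 distances, a fixed number of sampled queries), not the quantum algorithm itself; the quantum encoding, amplitude estimation, and Grover-Long search have no short classical equivalent.

```python
import numpy as np

def relieff(X, y, n_iter=100, k=5, seed=0):
    """Simplified classical ReliefF feature weighting.

    X: (M, N) feature matrix scaled to [0, 1]; y: (M,) integer class labels.
    Returns one weight per feature; features with weight above a threshold
    tau would be selected.
    """
    rng = np.random.default_rng(seed)
    M, N = X.shape
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / M
    w = np.zeros(N)
    for _ in range(n_iter):
        i = rng.integers(M)
        xi, ci = X[i], y[i]
        dists = np.abs(X - xi).sum(axis=1)              # L1 distance to every sample
        for c, p in zip(classes, priors):
            mask = (y == c)
            mask[i] = False                              # never use the query itself
            idx = np.flatnonzero(mask)
            if idx.size == 0:
                continue
            nn = idx[np.argsort(dists[idx])[:k]]         # k nearest neighbours in class c
            diff = np.abs(X[nn] - xi).mean(axis=0)       # mean per-feature difference
            if c == ci:
                w -= diff / n_iter                       # nearest hits lower the weight
            else:
                p_ci = priors[classes == ci][0]
                w += (p / (1.0 - p_ci)) * diff / n_iter  # misses raise it, prior-weighted
    return w

# Toy usage: informative feature 0 should receive a higher weight than the noise features.
rng = np.random.default_rng(1)
y = rng.integers(0, 3, size=300)
X = rng.random((300, 5)); X[:, 0] = y / 2.0 + 0.05 * rng.random(300)
print(relieff(X, y).round(3))
```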

TDCGL: Two-Level Debiased Contrastive Graph Learning for Recommendation

  • paper_url: http://arxiv.org/abs/2310.00569
  • repo_url: None
  • paper_authors: Yubo Gao, Haotian Wu
  • for: The paper aims to address the problems of over-reliance on high-quality knowledge graphs and noise issues in real-world data, which can negatively impact the performance of knowledge graph-based recommendation methods.
  • methods: The proposed method, Two-Level Debiased Contrastive Graph Learning (TDCGL), combines contrastive learning with debiasing techniques to improve the performance of knowledge graph-based recommendation methods. The method is designed to work on both User-Item and User-User pairs to model higher-order relations.
  • results: The proposed method significantly outperforms state-of-the-art baselines in terms of anti-noise capability and recommendation performance. Ablation studies demonstrate the necessity of each level of TDCGL.
    Abstract knowledge graph-based recommendation methods have achieved great success in the field of recommender systems. However, over-reliance on high-quality knowledge graphs is a bottleneck for such methods. Specifically, the long-tailed distribution of entities of KG and noise issues in the real world will make item-entity dependent relations deviate from reflecting true characteristics and significantly harm the performance of modeling user preference. Contrastive learning, as a novel method that is employed for data augmentation and denoising, provides inspiration to fill this research gap. However, the mainstream work only focuses on the long-tail properties of the number of items clicked, while ignoring that the long-tail properties of total number of clicks per user may also affect the performance of the recommendation model. Therefore, to tackle these problems, motivated by the Debiased Contrastive Learning of Unsupervised Sentence Representations (DCLR), we propose Two-Level Debiased Contrastive Graph Learning (TDCGL) model. Specifically, we design the Two-Level Debiased Contrastive Learning (TDCL) and deploy it in the KG, which is conducted not only on User-Item pairs but also on User-User pairs for modeling higher-order relations. Also, to reduce the bias caused by random sampling in contrastive learning, with the exception of the negative samples obtained by random sampling, we add a noise-based generation of negation to ensure spatial uniformity. Considerable experiments on open-source datasets demonstrate that our method has excellent anti-noise capability and significantly outperforms state-of-the-art baselines. In addition, ablation studies about the necessity for each level of TDCL are conducted.
    摘要 基于知识图(KG)的推荐方法在推荐系统领域取得了很大的成功。然而,对高质量知识图的过度依赖成为这类方法的瓶颈。具体来说,知识图中实体的长尾分布和现实数据中的噪声问题会使Item-Entity依赖关系偏离真实特性,严重损害用户偏好建模的性能。对此,用于数据增强和去噪的对比学习提供了灵感,但主流工作只关注物品点击次数的长尾特性,而忽略了每个用户总点击量的长尾特性同样可能影响推荐模型的性能。因此,为了解决这些问题,我们受 Debiased Contrastive Learning of Unsupervised Sentence Representations(DCLR)启发,提出了Two-Level Debiased Contrastive Graph Learning(TDCGL)模型。具体来说,我们设计了Two-Level Debiased Contrastive Learning(TDCL)并部署在KG上,不仅作用于用户-物品对,也作用于用户-用户对,以建模高阶关系。此外,为了减少对比学习中随机采样带来的偏差,除随机采样得到的负样本外,我们还加入基于噪声生成的负样本,以保证空间均匀性。在开源数据集上的大量实验表明,我们的方法具有出色的抗噪能力,并显著超越了最先进的基线方法。此外,我们还进行了消融实验,以验证TDCL各层次的必要性。
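
The abstract does not spell out the TDCL objective, so the PyTorch sketch below only illustrates the two-level idea in a generic form: an in-batch InfoNCE loss applied to both User-Item and User-User pairs, with additional negatives drawn from noise in the spirit of DCLR's noise-based negative generation. The embedding shapes, temperature, and noise source are illustrative assumptions rather than the paper's actual model.

```python
import torch
import torch.nn.functional as F

def debiased_info_nce(anchor, positive, tau=0.2):
    """In-batch InfoNCE with extra noise-generated negatives.

    anchor/positive: (B, d) embeddings of paired views (e.g. user/item, or two
    views of the same user). The pure-noise embeddings stand in for the
    paper's noise-based negative generation.
    """
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    n = F.normalize(torch.randn_like(positive), dim=-1)   # noise-based negatives
    logits = torch.cat([a @ p.T, a @ n.T], dim=1) / tau    # (B, 2B); diag of first block = positives
    labels = torch.arange(len(a))
    return F.cross_entropy(logits, labels)

def two_level_loss(user_emb, item_emb, user_emb_view2):
    """Two levels: User-Item pairs and User-User pairs (two views of each user)."""
    return debiased_info_nce(user_emb, item_emb) + debiased_info_nce(user_emb, user_emb_view2)

# Toy usage with random embeddings standing in for KG-encoder outputs.
loss = two_level_loss(torch.randn(16, 64), torch.randn(16, 64), torch.randn(16, 64))
print(loss.item())
```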

Understanding the Robustness of Randomized Feature Defense Against Query-Based Adversarial Attacks

  • paper_url: http://arxiv.org/abs/2310.00567
  • repo_url: None
  • paper_authors: Quang H. Nguyen, Yingjie Lao, Tung Pham, Kok-Seng Wong, Khoa D. Doan
  • for: To defend deep neural networks against query-based black-box attacks, in which the attacker only has access to the model's outputs.
  • methods: A simple, lightweight defense that adds random noise to hidden features at intermediate layers of the model at inference time, hardening it against both score-based and decision-based black-box attacks (see the sketch after this entry).
  • results: Theoretical analysis and extensive experiments confirm that the defense improves robustness against multiple black-box attacks without adversarial training and with minimal impact on accuracy, making it applicable to any pre-trained model.
    Abstract Recent works have shown that deep neural networks are vulnerable to adversarial examples that find samples close to the original image but can make the model misclassify. Even with access only to the model's output, an attacker can employ black-box attacks to generate such adversarial examples. In this work, we propose a simple and lightweight defense against black-box attacks by adding random noise to hidden features at intermediate layers of the model at inference time. Our theoretical analysis confirms that this method effectively enhances the model's resilience against both score-based and decision-based black-box attacks. Importantly, our defense does not necessitate adversarial training and has minimal impact on accuracy, rendering it applicable to any pre-trained model. Our analysis also reveals the significance of selectively adding noise to different parts of the model based on the gradient of the adversarial objective function, which can be varied during the attack. We demonstrate the robustness of our defense against multiple black-box attacks through extensive empirical experiments involving diverse models with various architectures.
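
A minimal PyTorch sketch of the defense described above, assuming a generic classifier: Gaussian noise is added to selected intermediate activations at inference time via forward hooks. The choice of layers and the noise scale sigma are illustrative assumptions the defender would tune; the paper's analysis of where and how much noise to add is not reproduced here.

```python
import torch
import torch.nn as nn

def add_random_feature_noise(model, layer_names, sigma=0.05):
    """Register forward hooks that add Gaussian noise to the outputs of the
    named intermediate layers at inference time. This is a generic sketch of
    the idea, not the paper's exact noise placement or scale."""
    modules = dict(model.named_modules())
    handles = []
    for name in layer_names:
        def hook(module, inputs, output, s=sigma):
            return output + s * torch.randn_like(output)   # fresh noise on every query
        handles.append(modules[name].register_forward_hook(hook))
    return handles   # call h.remove() on each handle to disable the defense

# Toy usage: two queries with the same input now yield slightly different logits,
# which is what disrupts the score/gradient estimation of query-based attacks.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4)).eval()
add_random_feature_noise(model, ["1"], sigma=0.1)   # perturb the ReLU output
x = torch.randn(1, 8)
print(model(x), model(x))
```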

Empowering Many, Biasing a Few: Generalist Credit Scoring through Large Language Models

  • paper_url: http://arxiv.org/abs/2310.00566
  • repo_url: https://github.com/colfeng/calm
  • paper_authors: Duanyu Feng, Yongfu Dai, Jimin Huang, Yifang Zhang, Qianqian Xie, Weiguang Han, Alejandro Lopez-Lira, Hao Wang
  • for: The paper examines whether large language models (LLMs) can be used for credit and risk assessment.
  • methods: The authors formulate three hypotheses and undertake an extensive case study, curate a benchmark for credit assessment, fine-tune a specialized Credit and Risk Assessment Large Language Model (CALM), and rigorously examine the biases LLMs may harbor (a hypothetical prompt-formatting sketch follows this entry).
  • results: LLMs can overcome limitations of conventional credit-scoring models and adapt across diverse financial evaluations, but they may also exhibit biases, underscoring the importance of impartial decision-making; the datasets, models, and benchmarks are open-sourced.
    Abstract Credit and risk assessments are cornerstones of the financial landscape, impacting both individual futures and broader societal constructs. Existing credit scoring models often exhibit limitations stemming from knowledge myopia and task isolation. In response, we formulate three hypotheses and undertake an extensive case study to investigate LLMs' viability in credit assessment. Our empirical investigations unveil LLMs' ability to overcome the limitations inherent in conventional models. We introduce a novel benchmark curated for credit assessment purposes, fine-tune a specialized Credit and Risk Assessment Large Language Model (CALM), and rigorously examine the biases that LLMs may harbor. Our findings underscore LLMs' potential in revolutionizing credit assessment, showcasing their adaptability across diverse financial evaluations, and emphasizing the critical importance of impartial decision-making in the financial sector. Our datasets, models, and benchmarks are open-sourced for other researchers.
    摘要 信用和风险评估是金融景观中的两个重要基础,对个人未来和社会构建都产生了深远的影响。现有的信用评估模型经常受到知识偏见和任务隔离的限制。为了应对这些限制,我们提出了三个假设,并进行了广泛的案例研究,以评估LLMs在信用评估中的可行性。我们的实际调查发现,LLMs可以超越传统模型中的限制。我们开发了一个专门为信用评估目的制定的benchmark,细化一个特殊的信用和风险评估大语言模型(CALM),并且严格地检查LLMs可能披露的偏见。我们的发现表明,LLMs在改变信用评估的方式方面具有启示性,并且在多种金融评估中展现出了适应性。我们的数据集、模型和benchmark都公开发布,以便其他研究人员进行进一步的研究。
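
Since the abstract does not specify CALM's data format, the sketch below only shows one plausible way to turn a tabular credit record into an instruction-tuning example for a credit-assessment LLM. The field names, prompt wording, and label mapping are hypothetical; the open-sourced repository defines the benchmark's actual format.

```python
# Hypothetical conversion of a tabular credit record into an instruction-tuning
# example. Field names, prompt wording, and the good/bad label mapping are
# illustrative assumptions, not the CALM benchmark's actual schema.

def credit_record_to_example(record: dict, label: int) -> dict:
    features = "; ".join(f"{k}: {v}" for k, v in record.items())
    prompt = (
        "Assess the credit risk of the following applicant and answer "
        "'good' or 'bad'.\n" + features
    )
    return {"instruction": prompt, "output": "good" if label == 0 else "bad"}

example = credit_record_to_example(
    {"age": 34, "income": 52000, "existing_loans": 1, "late_payments": 0},
    label=0,
)
print(example["instruction"])
print(example["output"])
```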

DYNAP-SE2: a scalable multi-core dynamic neuromorphic asynchronous spiking neural network processor

  • paper_url: http://arxiv.org/abs/2310.00564
  • repo_url: None
  • paper_authors: Ole Richter, Chenxi Wu, Adrian M. Whatley, German Köstinger, Carsten Nielsen, Ning Qiao, Giacomo Indiveri
  • for: To present a brain-inspired platform for prototyping real-time, event-based spiking neural networks (SNNs) that process sensory signals near the sensors at the edge.
  • methods: Mixed-signal analog circuits emulate biologically realistic neural dynamics, such as short-term plasticity, NMDA gating, AMPA diffusion, homeostasis, spike-frequency adaptation, conductance-based dendritic compartments, and spike transmission delays, and are paired with low-latency asynchronous digital circuits for routing and mapping events (a simple software analogue of one such dynamic follows this entry).
  • results: The platform supports real-time processing of sensory signals, the definition of different network architectures, and real-time monitoring of both population and single-neuron signals, enabling the development and validation of complex neural-processing models for basic research and edge-computing applications.
    Abstract With the remarkable progress that technology has made, the need for processing data near the sensors at the edge has increased dramatically. The electronic systems used in these applications must process data continuously, in real-time, and extract relevant information using the smallest possible energy budgets. A promising approach for implementing always-on processing of sensory signals that supports on-demand, sparse, and edge-computing is to take inspiration from biological nervous system. Following this approach, we present a brain-inspired platform for prototyping real-time event-based Spiking Neural Networks (SNNs). The system proposed supports the direct emulation of dynamic and realistic neural processing phenomena such as short-term plasticity, NMDA gating, AMPA diffusion, homeostasis, spike frequency adaptation, conductance-based dendritic compartments and spike transmission delays. The analog circuits that implement such primitives are paired with a low latency asynchronous digital circuits for routing and mapping events. This asynchronous infrastructure enables the definition of different network architectures, and provides direct event-based interfaces to convert and encode data from event-based and continuous-signal sensors. Here we describe the overall system architecture, we characterize the mixed signal analog-digital circuits that emulate neural dynamics, demonstrate their features with experimental measurements, and present a low- and high-level software ecosystem that can be used for configuring the system. The flexibility to emulate different biologically plausible neural networks, and the chip's ability to monitor both population and single neuron signals in real-time, allow to develop and validate complex models of neural processing for both basic research and edge-computing applications.
    摘要 随着技术的飞速发展,在传感器附近的边缘处处理数据的需求大幅增加。这些应用中的电子系统必须实时、连续地处理数据,并在尽可能小的能量预算下提取相关信息。一种有前途的方法是借鉴生物神经系统,实现支持按需、稀疏和边缘计算的持续感知信号处理。基于这一思路,我们提出了一个类脑平台,用于原型化实时事件驱动的脉冲神经网络(SNN)。该系统支持直接模拟动态且逼真的神经处理现象,如短期可塑性、NMDA门控、AMPA扩散、稳态调节(homeostasis)、脉冲频率适应、基于电导的树突区室以及脉冲传输延迟。实现这些原语的模拟电路与低延迟的异步数字电路相结合,用于事件的路由与映射。这一异步基础设施允许定义不同的网络架构,并提供直接的事件接口,用于转换和编码来自事件型与连续信号传感器的数据。本文描述了整体系统架构,刻画了模拟神经动力学的混合信号模拟-数字电路,通过实验测量展示了其特性,并介绍了可用于配置系统的低层与高层软件生态。系统能够灵活模拟不同的生物学合理的神经网络,并可实时监测群体神经元与单个神经元的信号,从而支持为基础研究和边缘计算应用开发并验证复杂的神经处理模型。
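
As a software-level illustration of one of the dynamics the chip emulates in analog circuits, here is a discrete-time sketch of a leaky integrate-and-fire neuron with spike-frequency adaptation. All constants are illustrative, and the silicon neuron's conductance-based dynamics are considerably richer than this toy model.

```python
import numpy as np

def adaptive_lif(input_current, dt=1e-3, tau_m=20e-3, tau_w=200e-3,
                 v_th=1.0, v_reset=0.0, b=0.2):
    """Leaky integrate-and-fire neuron with spike-frequency adaptation."""
    v, w = 0.0, 0.0                 # membrane potential and adaptation variable
    spikes = []
    for i, current in enumerate(input_current):
        dv = (-v - w + current) / tau_m   # leak, adaptation current, and input drive
        v += dt * dv
        w += dt * (-w / tau_w)            # adaptation decays between spikes
        if v >= v_th:
            spikes.append(i * dt)
            v = v_reset
            w += b                        # each spike increments adaptation -> slower firing
    return spikes

# Constant drive: the inter-spike interval grows as adaptation builds up.
times = adaptive_lif(np.full(2000, 2.0))
print(np.diff(times)[:5])
```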

Siamese Representation Learning for Unsupervised Relation Extraction

  • paper_url: http://arxiv.org/abs/2310.00552
  • repo_url: https://github.com/gxxxzhang/siamese-ure
  • paper_authors: Guangxin Zhang, Shu Chen
  • for: To discover underlying relations between named-entity pairs in open-domain plain text without prior knowledge of the relational distribution.
  • methods: A Siamese representation-learning framework that leverages only positive pairs, avoiding the spurious negative samples that damage the hierarchical structure in contrastive unsupervised relation extraction (a positive-pair-only loss sketch follows this entry).
  • results: The proposed model effectively optimizes relation representations, retains hierarchical information in the relational feature space, and significantly advances state-of-the-art results on two benchmark datasets.
    Abstract Unsupervised relation extraction (URE) aims at discovering underlying relations between named entity pairs from open-domain plain text without prior information on relational distribution. Existing URE models utilizing contrastive learning, which attract positive samples and repulse negative samples to promote better separation, have got decent effect. However, fine-grained relational semantic in relationship makes spurious negative samples, damaging the inherent hierarchical structure and hindering performances. To tackle this problem, we propose Siamese Representation Learning for Unsupervised Relation Extraction -- a novel framework to simply leverage positive pairs to representation learning, possessing the capability to effectively optimize relation representation of instances and retain hierarchical information in relational feature space. Experimental results show that our model significantly advances the state-of-the-art results on two benchmark datasets and detailed analyses demonstrate the effectiveness and robustness of our proposed model on unsupervised relation extraction.
    摘要 无监督关系抽取(URE)的目标是在不依赖关系分布先验的情况下,从开放领域纯文本中发现命名实体对之间的潜在关系。现有URE模型使用对比学习,吸引正样本并排斥负样本以促进更好的分离,取得了不错的效果。然而,关系中细粒度的关系语义会产生虚假的负样本,损害内在的层次结构并影响性能。为解决这个问题,我们提出了面向无监督关系抽取的孪生表示学习——一种仅利用正样本对进行表示学习的新框架,能够有效优化关系实例的表示,并保留关系特征空间中的层次信息。实验结果显示,我们的模型在两个 benchmark 数据集上显著刷新了最先进的结果,详细分析也证明了所提模型在无监督关系抽取中的有效性和稳健性。
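
A minimal sketch of a positive-pair-only Siamese loss in the spirit described above, using a SimSiam-style predictor head and stop-gradient to avoid representational collapse. The predictor architecture, dimensions, and the exact loss are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn.functional as F

class SiamesePairLoss(torch.nn.Module):
    """Negative cosine similarity between two views of the same relation
    instance, with a stop-gradient on the target branch (SimSiam-style)."""

    def __init__(self, dim=768, hidden=256):
        super().__init__()
        self.predictor = torch.nn.Sequential(          # small prediction head
            torch.nn.Linear(dim, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, dim))

    def forward(self, z1, z2):
        # z1, z2: (B, dim) encodings of two views of the same relation instance
        p1, p2 = self.predictor(z1), self.predictor(z2)

        def neg_cos(p, z):
            return -F.cosine_similarity(p, z.detach(), dim=-1).mean()  # stop-grad on z

        return 0.5 * (neg_cos(p1, z2) + neg_cos(p2, z1))

# Toy usage with random encoder outputs.
loss = SiamesePairLoss()(torch.randn(8, 768), torch.randn(8, 768))
print(loss.item())
```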

JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention

  • paper_url: http://arxiv.org/abs/2310.00535
  • repo_url: None
  • paper_authors: Yuandong Tian, Yiping Wang, Zhenyu Zhang, Beidi Chen, Simon Du
  • for: To understand the training dynamics of multilayer Transformer architectures.
  • methods: The paper proposes JOint MLP/Attention (JoMA) dynamics, a mathematical framework obtained by integrating out the self-attention layer in Transformers, yielding a modified dynamics of the MLP layers only and removing unrealistic assumptions (e.g., lack of residual connections) made in previous analyses.
  • results: JoMA predicts that with nonlinear activations attention first becomes sparse (learning salient tokens) and then dense (learning less salient tokens), qualitatively explains how tokens are combined into hierarchies in multilayer Transformers, and is verified by experiments on models trained on real-world datasets (Wikitext2/Wikitext103) and on various pre-trained models (OPT, Pythia).
    Abstract We propose Joint MLP/Attention (JoMA) dynamics, a novel mathematical framework to understand the training procedure of multilayer Transformer architectures. This is achieved by integrating out the self-attention layer in Transformers, producing a modified dynamics of MLP layers only. JoMA removes unrealistic assumptions in previous analysis (e.g., lack of residual connection) and predicts that the attention first becomes sparse (to learn salient tokens), then dense (to learn less salient tokens) in the presence of nonlinear activations, while in the linear case, it is consistent with existing works that show attention becomes sparse over time. We leverage JoMA to qualitatively explains how tokens are combined to form hierarchies in multilayer Transformers, when the input tokens are generated by a latent hierarchical generative model. Experiments on models trained from real-world dataset (Wikitext2/Wikitext103) and various pre-trained models (OPT, Pythia) verify our theoretical findings.
    摘要 我们提出 Joint MLP/Attention(JoMA)动力学,这是一个用于理解多层Transformer结构训练过程的新数学框架。我们通过在Transformer中对自注意层进行积分消去,得到仅包含MLP层的修正动力学。JoMA消除了先前分析中不切实际的假设(例如缺乏残差连接),并预测在非线性激活下,注意力首先变得稀疏(以学习显著的token),随后变得密集(以学习不那么显著的token);在线性情况下,结论与已有工作一致,即注意力随时间变得稀疏。当输入token由一个潜在的层次生成模型产生时,我们利用JoMA定性地解释多层Transformer如何将token组合成层次结构。在真实世界数据集(Wikitext2/Wikitext103)上训练的模型以及多种预训练模型(OPT、Pythia)上的实验验证了我们的理论发现。

SELF: Language-Driven Self-Evolution for Large Language Model

  • paper_url: http://arxiv.org/abs/2310.00533
  • repo_url: None
  • paper_authors: Jianqiao Lu, Wanjun Zhong, Wenyong Huang, Yufei Wang, Fei Mi, Baojun Wang, Weichao Wang, Lifeng Shang, Qun Liu
  • for: The paper aims to introduce an innovative approach for autonomous model development in large language models (LLMs), enabling them to undergo continual self-evolution and improve their intrinsic abilities without human intervention.
  • methods: The proposed approach, called “SELF” (Self-Evolution with Language Feedback), employs language-based feedback as a versatile and comprehensive evaluative tool to guide the model’s self-evolutionary training. SELF acquires foundational meta-skills through meta-skill learning, and uses self-curated data for perpetual training and iterative fine-tuning to enhance its capabilities.
  • results: The experimental results on representative benchmarks demonstrate that SELF can progressively advance its inherent abilities without human intervention, producing responses of superior quality. The SELF framework signifies a viable pathway for autonomous LLM development, transforming the LLM from a passive recipient of information into an active participant in its own evolution.
    Abstract Large Language Models (LLMs) have showcased remarkable versatility across diverse domains. However, the pathway toward autonomous model development, a cornerstone for achieving human-level learning and advancing autonomous AI, remains largely uncharted. We introduce an innovative approach, termed "SELF" (Self-Evolution with Language Feedback). This methodology empowers LLMs to undergo continual self-evolution. Furthermore, SELF employs language-based feedback as a versatile and comprehensive evaluative tool, pinpointing areas for response refinement and bolstering the stability of self-evolutionary training. Initiating with meta-skill learning, SELF acquires foundational meta-skills with a focus on self-feedback and self-refinement. These meta-skills are critical, guiding the model's subsequent self-evolution through a cycle of perpetual training with self-curated data, thereby enhancing its intrinsic abilities. Given unlabeled instructions, SELF equips the model with the capability to autonomously generate and interactively refine responses. This synthesized training data is subsequently filtered and utilized for iterative fine-tuning, enhancing the model's capabilities. Experimental results on representative benchmarks substantiate that SELF can progressively advance its inherent abilities without the requirement of human intervention, thereby indicating a viable pathway for autonomous model evolution. Additionally, SELF can employ online self-refinement strategy to produce responses of superior quality. In essence, the SELF framework signifies a progressive step towards autonomous LLM development, transforming the LLM from a mere passive recipient of information into an active participant in its own evolution.
    摘要 大型语言模型(LLM)在多种领域表现出了惊人的多面性。然而,通往自主模型发展的路径——实现人类水平学习与推进自主AI的基石——仍然在很大程度上未被探索。我们提出了一种创新的方法,称为“SELF”(Self-Evolution with Language Feedback,基于语言反馈的自我演进)。这种方法使LLM能够持续自我演进。此外,SELF使用基于语言的反馈作为全面而灵活的评价工具,指出答案需要改进之处,并增强自我演进训练的稳定性。SELF从元技能学习开始,获取以自我反馈和自我改进为核心的基础元技能。这些元技能至关重要,引导模型在后续的自我演进中利用自我整理的数据进行持续训练,从而增强其内在能力。给定无标注指令,SELF使模型能够自主生成答案并交互式地改进答案;合成的训练数据经过筛选后用于迭代微调,进一步提升模型能力。在代表性基准上的实验结果表明,SELF可以在无需人类干预的情况下不断提升其内在能力,表明了一条可行的自主模型演进路径。此外,SELF还可以采用在线自我改进策略生成更高质量的答案。总之,SELF框架标志着迈向自主LLM发展的进步,将LLM从信息的被动接受者转变为自身演进的积极参与者。
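
A minimal sketch of one SELF-style self-evolution round, assuming a generic text-generation callable: the model answers an unlabeled prompt, critiques its own answer in natural language, revises it, and the filtered (prompt, answer) pairs are handed to a fine-tuning step. The generate, filtering, and finetune hooks are hypothetical stand-ins, not the paper's meta-skill training stack.

```python
from typing import Callable, List, Tuple

def self_evolution_round(
    generate: Callable[[str], str],
    finetune: Callable[[List[Tuple[str, str]]], None],
    prompts: List[str],
    n_refine: int = 2,
) -> List[Tuple[str, str]]:
    """One round of generate -> self-feedback -> self-refinement -> filter -> fine-tune."""
    curated = []
    for prompt in prompts:
        answer = generate(prompt)
        for _ in range(n_refine):
            # Self-feedback: the model critiques its own answer in natural language...
            critique = generate(f"Critique this answer.\nQ: {prompt}\nA: {answer}")
            # ...then self-refinement: revise the answer conditioned on the critique.
            answer = generate(
                f"Improve the answer using the critique.\n"
                f"Q: {prompt}\nA: {answer}\nCritique: {critique}")
        if len(answer.strip()) > 0:          # stand-in for the paper's quality filtering
            curated.append((prompt, answer))
    finetune(curated)                        # iterative fine-tuning on self-curated data
    return curated

# Stub usage: swap in a real LLM call and trainer.
demo = self_evolution_round(
    generate=lambda p: "stub answer to: " + p.splitlines()[-1],
    finetune=lambda pairs: None,
    prompts=["What is 2 + 2?"],
)
print(demo)
```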

Are Graph Neural Networks Optimal Approximation Algorithms?

  • paper_url: http://arxiv.org/abs/2310.00526
  • repo_url: None
  • paper_authors: Morris Yau, Eric Lu, Nikolaos Karalias, Jessica Xu, Stefanie Jegelka
  • for: To design graph neural network architectures that yield optimal approximation algorithms for a large class of combinatorial optimization problems.
  • methods: Using algorithmic tools from semidefinite programming (SDP), the paper proves that polynomial-sized message-passing algorithms can represent the most powerful polynomial-time algorithms for Max Constraint Satisfaction Problems, assuming the Unique Games Conjecture, and builds the OptGNN architecture on this result.
  • results: OptGNN obtains high-quality approximate solutions on landmark problems such as Max Cut and maximum independent set, with strong empirical results against both neural baselines and classical algorithms on real-world and synthetic datasets; its captured convex relaxation is also used to produce dual certificates of optimality (bounds on the optimal solution) from the learned embeddings (a rounding sketch follows this entry).
    Abstract In this work we design graph neural network architectures that can be used to obtain optimal approximation algorithms for a large class of combinatorial optimization problems using powerful algorithmic tools from semidefinite programming (SDP). Concretely, we prove that polynomial-sized message passing algorithms can represent the most powerful polynomial time algorithms for Max Constraint Satisfaction Problems assuming the Unique Games Conjecture. We leverage this result to construct efficient graph neural network architectures, OptGNN, that obtain high-quality approximate solutions on landmark combinatorial optimization problems such as Max Cut and maximum independent set. Our approach achieves strong empirical results across a wide range of real-world and synthetic datasets against both neural baselines and classical algorithms. Finally, we take advantage of OptGNN's ability to capture convex relaxations to design an algorithm for producing dual certificates of optimality (bounds on the optimal solution) from the learned embeddings of OptGNN.
    摘要 在这项工作中,我们设计了可用于为一大类组合优化问题获得最优近似算法的图神经网络架构,所依托的是半定规划(SDP)的强大算法工具。具体而言,我们证明:在唯一博弈猜想(Unique Games Conjecture)成立的前提下,多项式规模的消息传递算法可以表示求解最大约束满足问题(Max-CSP)的最强多项式时间算法。我们利用这一结果构建了高效的图神经网络架构 OptGNN,在 Max Cut、最大独立集等标志性组合优化问题上获得高质量的近似解。我们的方法在各种真实与合成数据集上,相对于神经网络基线和经典算法都取得了强有力的实验结果。最后,我们利用 OptGNN 捕捉凸松弛的能力,设计了一种从学习到的嵌入生成最优性对偶证书(最优解的界)的算法。
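
OptGNN is described as capturing an SDP-style convex relaxation through per-node embeddings; a standard way to turn such embeddings into a discrete Max Cut solution is Goemans-Williamson-style random-hyperplane rounding, sketched below with random unit vectors standing in for the learned embeddings. The trial count and the toy graph are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

def random_hyperplane_rounding(embeddings, adj, n_trials=50, seed=0):
    """Round per-node embeddings to a two-sided cut by sampling random hyperplanes
    and keeping the best resulting cut value."""
    rng = np.random.default_rng(seed)
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    best_cut, best_assign = -1.0, None
    for _ in range(n_trials):
        r = rng.standard_normal(X.shape[1])      # random hyperplane normal
        assign = np.sign(X @ r)                  # side of the hyperplane = partition label
        cut = 0.25 * np.sum(adj * (1 - np.outer(assign, assign)))  # edges crossing the cut
        if cut > best_cut:
            best_cut, best_assign = cut, assign
    return best_cut, best_assign

# Toy usage on a random graph, with random vectors standing in for OptGNN embeddings.
n = 20
rng = np.random.default_rng(1)
adj = (rng.random((n, n)) < 0.3).astype(float)
adj = np.triu(adj, 1); adj = adj + adj.T         # symmetric adjacency, no self-loops
cut_value, assignment = random_hyperplane_rounding(rng.standard_normal((n, 16)), adj)
print(cut_value)
```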