cs.AI - 2023-11-22

A Survey of Blockchain, Artificial Intelligence, and Edge Computing for Web 3.0

  • paper_url: http://arxiv.org/abs/2311.13731
  • repo_url: None
  • paper_authors: Jianjun Zhu, Fan Li, Jinyuan Chen
  • for: The paper explores the intersection of blockchain, artificial intelligence, and edge computing in the context of Web 3.0, with a focus on their potential to improve data privacy and security.
  • methods: The paper provides an in-depth analysis of each of these technologies, including their relevance to Web 3.0, key components, and practical applications. The authors also propose decentralized storage and computing solutions by exploring the integration of these technologies.
  • results: The paper highlights the potential of these technologies to return control and ownership of data and digital assets back to users, and outlines key challenges and research directions for realizing this vision.
    Abstract Web 3.0, as the third generation of the World Wide Web, aims to solve contemporary problems of trust, centralization, and data ownership. Driven by the latest advances in cutting-edge technologies, Web 3.0 is moving towards a more open, decentralized, intelligent, and interconnected network. However, increasingly widespread data breaches have raised awareness of online privacy and security of personal data. Additionally, since Web 3.0 is a sophisticated and complex convergence, the technical details behind it are not as clear as the characteristics it presents. In this survey, we conduct an in-depth exploration of Web 3.0 from the perspectives of blockchain, artificial intelligence, and edge computing. Specifically, we begin with summarizing the evolution of the Internet and providing an overview of these three key technological factors. Afterward, we provide a thorough analysis of each technology separately, including its relevance to Web 3.0, key technology components, and practical applications. We also propose decentralized storage and computing solutions by exploring the integration of technologies. Finally, we highlight the key challenges alongside potential research directions. Through the combination and mutual complementation of multiple technologies, Web 3.0 is expected to return more control and ownership of data and digital assets back to users.

Studying Artist Sentiments around AI-generated Artwork

  • paper_url: http://arxiv.org/abs/2311.13725
  • repo_url: None
  • paper_authors: Safinah Ali, Cynthia Breazeal
  • for: The goal of this paper is to study artists' emotional responses to AI-generated artwork, in order to better inform the development and use of creativity support tools.
  • methods: The paper uses interviews and analysis of artists' public posts on social media platforms to understand their main concerns and hopes around AI-generated art.
  • results: The paper finds that artists' main concerns about AI-generated artwork include the potential plagiarism of their works and styles, while their hopes include using the tools for novel creative purposes. These findings can guide collaboration between artists and developers toward responsible development and use of creativity support tools.
    Abstract Art created using generated Artificial Intelligence has taken the world by storm and generated excitement for many digital creators and technologists. However, the reception and reaction from artists have been mixed. Concerns about plagiarizing their artworks and styles for datasets and uncertainty around the future of digital art sparked movements in artist communities shunning the use of AI for generating art and protecting artists' rights. Collaborating with these tools for novel creative use cases also sparked hope from some creators. Artists are an integral stakeholder in the rapidly evolving digital creativity industry and understanding their concerns and hopes inform responsible development and use of creativity support tools. In this work, we study artists' sentiments about AI-generated art. We interviewed 7 artists and analyzed public posts from artists on social media platforms Reddit, Twitter and Artstation. We report artists' main concerns and hopes around AI-generated artwork, informing a way forward for inclusive development of these tools.

Nova$^+$: Generative Language Models for Binaries

  • paper_url: http://arxiv.org/abs/2311.13721
  • repo_url: None
  • paper_authors: Nan Jiang, Chengxiao Wang, Kevin Liu, Xiangzhe Xu, Lin Tan, Xiangyu Zhang
  • for: This work aims to extend generative large language models (LLMs) to the binary domain, so that the strengths LLMs have shown in code generation, program repair, and document analysis can be brought to binaries.
  • methods: The authors develop two LLMs, Nova and Nova$^+$, both pre-trained on binary corpora. Nova is pre-trained with the standard language modeling task and performs well on five benchmarks, outperforming GPT-3.5 and other existing techniques on three downstream tasks: binary code similarity detection, binary code translation, and binary code recovery. Nova$^+$ adds two new pre-training tasks, optimization generation and optimization level prediction, designed to learn binary optimization and align equivalent binaries.
  • results: Both Nova and Nova$^+$ perform well on the five benchmarks, with Nova$^+$ achieving the best results on all three downstream tasks, demonstrating the contribution of the new pre-training tasks.
    Abstract Generative large language models (LLMs) pre-trained on code have shown impressive effectiveness in code generation, program repair, and document analysis. However, existing generative LLMs focus on source code and are not specialized for binaries. There are three main challenges for LLMs to model and learn binary code: hex-decimal values, complex global dependencies, and compiler optimization levels. To bring the benefit of LLMs to the binary domain, we develop Nova and Nova$^+$, which are LLMs pre-trained on binary corpora. Nova is pre-trained with the standard language modeling task, showing significantly better capability on five benchmarks for three downstream tasks: binary code similarity detection (BCSD), binary code translation (BCT), and binary code recovery (BCR), over GPT-3.5 and other existing techniques. We build Nova$^+$ to further boost Nova using two new pre-training tasks, i.e., optimization generation and optimization level prediction, which are designed to learn binary optimization and align equivalent binaries. Nova$^+$ shows overall the best performance for all three downstream tasks on five benchmarks, demonstrating the contributions of the new pre-training tasks.

Towards More Likely Models for AI Planning

  • paper_url: http://arxiv.org/abs/2311.13720
  • repo_url: None
  • paper_authors: Turgay Caglar, Sirine Belhaj, Tathagata Chakraborti, Michael Katz, Sarath Sreedharan
  • for: This work studies the application of large language models (LLMs) to automated planning tasks, specifically for model space edits.
  • methods: The study considers two different flavors of model space problems and examines the effect of an LLM on these tasks, comparing the LLM against combinatorial search (CS) both as a standalone model space reasoner and in combination with CS.
  • results: The experiments show that the LLM performs well on model space edits compared to CS and achieves a high success rate, demonstrating the potential of LLMs for automated planning tasks.
    Abstract This is the first work to look at the application of large language models (LLMs) for the purpose of model space edits in automated planning tasks. To set the stage for this sangam, we explore two different flavors of model space problems that have been studied in the AI planning literature and explore the effect of an LLM on those tasks. We empirically demonstrate how the performance of an LLM contrasts with combinatorial search (CS) - an approach that has been traditionally used to solve model space tasks in planning, both with the LLM in the role of a standalone model space reasoner as well as in the role of a statistical signal in concert with the CS approach as part of a two-stage process. Our experiments show promising results suggesting further forays of LLMs into the exciting world of model space reasoning for planning tasks in the future.

Deep learning-based instance segmentation for the precise automated quantification of digital breast cancer immunohistochemistry images

  • paper_url: http://arxiv.org/abs/2311.13719
  • repo_url: None
  • paper_authors: Blanca Maria Priego-Torresa, Barbara Lobato-Delgado, Lidia Atienza-Cuevas, Daniel Sanchez-Morillo
  • for: automatic quantification of biomarkers on immunohistochemistry breast cancer images for appropriate therapy and disease prognosis
  • methods: deep learning-based instance segmentation architecture for automatic segmentation of nuclear and membrane biomarkers
  • results: promising method to segment nuclei instances in IHC-stained images, integrated into a web platform for decision-support by pathologists
    Abstract The quantification of biomarkers on immunohistochemistry breast cancer images is essential for defining appropriate therapy for breast cancer patients, as well as for extracting relevant information on disease prognosis. This is an arduous and time-consuming task that may introduce a bias in the results due to intra- and inter-observer variability which could be alleviated by making use of automatic quantification tools. However, this is not a simple processing task given the heterogeneity of breast tumors that results in non-uniformly distributed tumor cells exhibiting different staining colors and intensity, size, shape, and texture, of the nucleus, cytoplasm and membrane. In this research work, we demonstrate the feasibility of using a deep learning-based instance segmentation architecture for the automatic quantification of both nuclear and membrane biomarkers applied to IHC-stained slides. We have solved the cumbersome task of training set generation with the design and implementation of a web platform, which has served as a hub for communication and feedback between researchers and pathologists as well as a system for the validation of the automatic image processing models. Through this tool, we have collected annotations over samples of HE, ER and Ki-67 (nuclear biomarkers) and HER2 (membrane biomarker) IHC-stained images. Using the same deep learning network architecture, we have trained two models, so-called nuclei- and membrane-aware segmentation models, which, once successfully validated, have revealed to be a promising method to segment nuclei instances in IHC-stained images. The quantification method proposed in this work has been integrated into the developed web platform and is currently being used as a decision-support tool by pathologists.

A Unified Approach to Count-Based Weakly-Supervised Learning

  • paper_url: http://arxiv.org/abs/2311.13718
  • repo_url: None
  • paper_authors: Vinay Shukla, Zhe Zeng, Kareem Ahmed, Guy Van den Broeck
  • for: Learning from weakly-supervised data, where high-quality labels are scarce
  • methods: Proposes a unified count-based weakly-supervised learning approach built on computing the probability that exactly k of n outputs are true, and derives a count loss from this computation
  • results: Evaluated on three common weakly-supervised learning paradigms, achieving state-of-the-art or highly competitive results on all three
    Abstract High-quality labels are often very scarce, whereas unlabeled data with inferred weak labels occurs more naturally. In many cases, these weak labels dictate the frequency of each respective class over a set of instances. In this paper, we develop a unified approach to learning from such weakly-labeled data, which we call count-based weakly-supervised learning. At the heart of our approach is the ability to compute the probability of exactly k out of n outputs being set to true. This computation is differentiable, exact, and efficient. Building upon the previous computation, we derive a count loss penalizing the model for deviations in its distribution from an arithmetic constraint defined over label counts. We evaluate our approach on three common weakly-supervised learning paradigms and observe that our proposed approach achieves state-of-the-art or highly competitive results across all three of the paradigms.
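
For intuition about the core computation described above (the probability that exactly k of n independent outputs are true), here is a minimal NumPy sketch using the standard dynamic program over per-output probabilities; the function name and example values are illustrative and not taken from the paper.

```python
import numpy as np

def prob_exactly_k(probs, k):
    """Probability that exactly k of n independent Bernoulli outputs are true.

    probs: per-output probabilities p_1..p_n (e.g. model predictions).
    Uses the standard O(n*k) dynamic program; the same recursion is
    differentiable, so it can back a count-based loss.
    """
    n = len(probs)
    dp = np.zeros(n + 1)
    dp[0] = 1.0  # dp[j] = P(exactly j true among the outputs seen so far)
    for p in probs:
        for j in range(n, 0, -1):  # update high j first so each output counts once
            dp[j] = dp[j] * (1 - p) + dp[j - 1] * p
        dp[0] *= (1 - p)
    return dp[k]

probs = np.array([0.9, 0.2, 0.7, 0.4])
print(prob_exactly_k(probs, 2))           # P(count == 2)
print(-np.log(prob_exactly_k(probs, 2)))  # a simple count-loss value for target count 2
```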

Data Acquisition: A New Frontier in Data-centric AI

  • paper_url: http://arxiv.org/abs/2311.13712
  • repo_url: None
  • paper_authors: Lingjiao Chen, Bilge Acun, Newsha Ardalani, Yifan Sun, Feiyang Kang, Hanrui Lyu, Yongchan Kwon, Ruoxi Jia, Carole-Jean Wu, Matei Zaharia, James Zou
  • for: This work examines the challenges encountered during data acquisition and introduces a data acquisition challenge (DAM) to encourage broader participation in data acquisition research.
  • methods: The study surveys existing data marketplaces and finds that they lack platforms offering detailed dataset information, transparent pricing, and standardized data formats.
  • results: The study identifies the challenges of the data acquisition process and, through the DAM challenge, draws engagement from the data-centric AI community.
    Abstract As Machine Learning (ML) systems continue to grow, the demand for relevant and comprehensive datasets becomes imperative. There is limited study on the challenges of data acquisition due to ad-hoc processes and lack of consistent methodologies. We first present an investigation of current data marketplaces, revealing lack of platforms offering detailed information about datasets, transparent pricing, standardized data formats. With the objective of inciting participation from the data-centric AI community, we then introduce the DAM challenge, a benchmark to model the interaction between the data providers and acquirers. The benchmark was released as a part of DataPerf. Our evaluation of the submitted strategies underlines the need for effective data acquisition strategies in ML.

Scalable CP Decomposition for Tensor Learning using GPU Tensor Cores

  • paper_url: http://arxiv.org/abs/2311.13693
  • repo_url: None
  • paper_authors: Zeliang Zhang, Zhuo Liu, Susan Liang, Zhiyuan Wang, Yifan Zhu, Chen Ding, Chenliang Xu
  • for: Supporting large-scale tensor analysis, particularly for gene analysis, deep learning, and quantum computation.
  • methods: Proposes a compression-based tensor decomposition framework, Exascale-Tensor, to support exascale tensor decomposition, together with a set of strategies to improve computational efficiency.
  • results: Compared to the baselines, Exascale-Tensor supports tensors 8,000x larger and achieves a speedup of up to 6.95x. The method is also applied to two real-world applications, gene analysis and tensor-layer neural networks, where the numerical results demonstrate its scalability and effectiveness.
    Abstract CP decomposition is a powerful tool for data science, especially gene analysis, deep learning, and quantum computation. However, the application of tensor decomposition is largely hindered by the exponential increment of the computational complexity and storage consumption with the size of tensors. While the data in our real world is usually presented as trillion- or even exascale-scale tensors, existing work can only support billion-scale scale tensors. In our work, we propose the Exascale-Tensor to mitigate the significant gap. Specifically, we propose a compression-based tensor decomposition framework, namely the exascale-tensor, to support exascale tensor decomposition. Then, we carefully analyze the inherent parallelism and propose a bag of strategies to improve computational efficiency. Last, we conduct experiments to decompose tensors ranging from million-scale to trillion-scale for evaluation. Compared to the baselines, the exascale-tensor supports 8,000x larger tensors and a speedup up to 6.95x. We also apply our method to two real-world applications, including gene analysis and tensor layer neural networks, of which the numeric results demonstrate the scalability and effectiveness of our method.
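
For background on the operation being scaled up, here is a minimal NumPy sketch of a plain rank-R CP decomposition of a 3-way tensor via alternating least squares (ALS); it illustrates baseline CP decomposition only, not the compression-based Exascale-Tensor framework or its GPU tensor-core strategies, and all sizes are illustrative.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding of a 3-way tensor into a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    """Column-wise Khatri-Rao product of two factor matrices."""
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

def cp_als(T, rank, n_iter=50, seed=0):
    """Plain rank-`rank` CP decomposition of a 3-way tensor via alternating least squares."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((T.shape[0], rank))
    B = rng.standard_normal((T.shape[1], rank))
    C = rng.standard_normal((T.shape[2], rank))
    for _ in range(n_iter):
        A = unfold(T, 0) @ np.linalg.pinv(khatri_rao(B, C).T)
        B = unfold(T, 1) @ np.linalg.pinv(khatri_rao(A, C).T)
        C = unfold(T, 2) @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

T = np.random.rand(6, 5, 4)
A, B, C = cp_als(T, rank=3)
T_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.linalg.norm(T - T_hat) / np.linalg.norm(T))  # relative reconstruction error
```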

Next-Generation Earth System Models: Towards Reliable Hybrid Models for Weather and Climate Applications

  • paper_url: http://arxiv.org/abs/2311.13691
  • repo_url: None
  • paper_authors: Tom Beucler, Erwan Koch, Sven Kotlarski, David Leutwyler, Adrien Michel, Jonathan Koh
  • for: This paper reviews how machine learning has transformed our ability to model the Earth system, and how recent breakthroughs are expected to benefit end-users in Switzerland in the near future.
  • methods: The paper surveys machine learning techniques, including deep learning, for modeling the Earth system.
  • results: The paper argues that machine learning can improve the accuracy and reliability of Earth system models and provide Swiss users with better weather forecasts and environmental monitoring data.
    Abstract We review how machine learning has transformed our ability to model the Earth system, and how we expect recent breakthroughs to benefit end-users in Switzerland in the near future.

MAIRA-1: A specialised large multimodal model for radiology report generation

  • paper_url: http://arxiv.org/abs/2311.13668
  • repo_url: None
  • paper_authors: Stephanie L. Hyland, Shruthi Bannur, Kenza Bouzid, Daniel C. Castro, Mercy Ranjit, Anton Schwaighofer, Fernando Pérez-García, Valentina Salvatelli, Shaury Srivastav, Anja Thieme, Noel Codella, Matthew P. Lungren, Maria Teodora Wetscherek, Ozan Oktay, Javier Alvarez-Valle
  • for: The goal of this paper is to generate radiology reports from chest X-ray images.
  • methods: The model combines an image encoder designed specifically for chest X-rays with a fine-tuned large language model and text-based data augmentation to produce high-quality reports.
  • results: The model generates high-quality reports, with significant improvements on the RadCliQ metric and across all lexical metrics. Manual review shows that the generated reports are fluent and accurate, while also revealing failure modes not captured by existing evaluation practices. More information and resources are available on the project website: https://aka.ms/maira.
    Abstract We present a radiology-specific multimodal model for the task for generating radiological reports from chest X-rays (CXRs). Our work builds on the idea that large language model(s) can be equipped with multimodal capabilities through alignment with pre-trained vision encoders. On natural images, this has been shown to allow multimodal models to gain image understanding and description capabilities. Our proposed model (MAIRA-1) leverages a CXR-specific image encoder in conjunction with a fine-tuned large language model based on Vicuna-7B, and text-based data augmentation, to produce reports with state-of-the-art quality. In particular, MAIRA-1 significantly improves on the radiologist-aligned RadCliQ metric and across all lexical metrics considered. Manual review of model outputs demonstrates promising fluency and accuracy of generated reports while uncovering failure modes not captured by existing evaluation practices. More information and resources can be found on the project website: https://aka.ms/maira.

Sample as You Infer: Predictive Coding With Langevin Dynamics

  • paper_url: http://arxiv.org/abs/2311.13664
  • repo_url: None
  • paper_authors: Umais Zahid, Qinghai Guo, Zafeirios Fountas
  • for: This paper proposes a new parameter learning algorithm for deep generative models, built on the predictive coding (PC) framework from computational neuroscience.
  • methods: The method modifies the standard PC algorithm so that it matches or exceeds standard variational autoencoder (VAE) training. Injecting Gaussian noise into the PC inference procedure recasts it as overdamped Langevin sampling, which facilitates optimization of a tight evidence lower bound (ELBO).
  • results: Injecting Gaussian noise into the Langevin sampling improves the encoder-free training method, and three different objectives are tested for providing an amortized warm start from an encoder network. A lightweight, easily computable form of preconditioning further improves robustness to the sampling step size and reduces sensitivity to curvature. Compared to standard VAE training, the method matches or exceeds performance across several metrics while converging faster.
    Abstract We present a novel algorithm for parameter learning in generic deep generative models that builds upon the predictive coding (PC) framework of computational neuroscience. Our approach modifies the standard PC algorithm to bring performance on-par and exceeding that obtained from standard variational auto-encoder (VAE) training. By injecting Gaussian noise into the PC inference procedure we re-envision it as an overdamped Langevin sampling, which facilitates optimisation with respect to a tight evidence lower bound (ELBO). We improve the resultant encoder-free training method by incorporating an encoder network to provide an amortised warm-start to our Langevin sampling and test three different objectives for doing so. Finally, to increase robustness to the sampling step size and reduce sensitivity to curvature, we validate a lightweight and easily computable form of preconditioning, inspired by Riemann Manifold Langevin and adaptive optimizers from the SGD literature. We compare against VAEs by training like-for-like generative models using our technique against those trained with standard reparameterisation-trick-based ELBOs. We observe our method out-performs or matches performance across a number of metrics, including sample quality, while converging in a fraction of the number of SGD training iterations.
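
To make the overdamped Langevin view of inference concrete, here is a toy sketch on a linear-Gaussian generative model (x = Wz + noise, standard normal prior), where the gradient of log p(x, z) is available in closed form; it shows only the generic noisy inference update, not the paper's amortized warm start, preconditioning, or full training loop, and all variable names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy generative model: z ~ N(0, I), x = W z + eps, eps ~ N(0, sigma^2 I)
d_z, d_x, sigma = 4, 8, 0.1
W = rng.standard_normal((d_x, d_z))
z_true = rng.standard_normal(d_z)
x = W @ z_true + sigma * rng.standard_normal(d_x)

def grad_log_joint(z):
    """d/dz log p(x, z) for the linear-Gaussian model above (closed form)."""
    return W.T @ (x - W @ z) / sigma**2 - z

# Overdamped Langevin dynamics on the latent: a gradient step on log p(x, z)
# plus injected Gaussian noise. Without the noise this is plain gradient-based
# (predictive-coding-style) inference; with it, the iterates sample the posterior.
eta, z = 2e-4, np.zeros(d_z)
for _ in range(2000):
    z = z + 0.5 * eta * grad_log_joint(z) + np.sqrt(eta) * rng.standard_normal(d_z)

print("posterior sample:", z)
print("true latent:     ", z_true)
```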

Visual In-Context Prompting

  • paper_url: http://arxiv.org/abs/2311.13601
  • repo_url: https://github.com/ux-decoder/dinov
  • paper_authors: Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao
  • for: Improving zero-shot capabilities for vision tasks such as open-set segmentation and detection.
  • methods: Builds on an encoder-decoder architecture and develops a versatile prompt encoder that supports diverse prompt types such as strokes, boxes, and points, and is further extended to take multiple reference image segments as context.
  • results: With joint training on COCO and SA-1B, the model achieves 57.7 PQ on COCO and 23.2 PQ on ADE20K.
    Abstract In-context prompting in large language models (LLMs) has become a prevalent approach to improve zero-shot capabilities, but this idea is less explored in the vision domain. Existing visual prompting methods focus on referring segmentation to segment the most relevant object, falling short of addressing many generic vision tasks like open-set segmentation and detection. In this paper, we introduce a universal visual in-context prompting framework for both tasks. In particular, we build on top of an encoder-decoder architecture, and develop a versatile prompt encoder to support a variety of prompts like strokes, boxes, and points. We further enhance it to take an arbitrary number of reference image segments as the context. Our extensive explorations show that the proposed visual in-context prompting elicits extraordinary referring and generic segmentation capabilities to refer and detect, yielding competitive performance to close-set in-domain datasets and showing promising results on many open-set segmentation datasets. By joint training on COCO and SA-1B, our model achieves $57.7$ PQ on COCO and $23.2$ PQ on ADE20K. Code will be available at https://github.com/UX-Decoder/DINOv.

Labeling Neural Representations with Inverse Recognition

  • paper_url: http://arxiv.org/abs/2311.13594
  • repo_url: https://github.com/lapalap/invert
  • paper_authors: Kirill Bykov, Laura Kopf, Shinichi Nakajima, Marius Kloft, Marina M. -C. Höhne
  • for: This paper investigates the complex hierarchical data representations learned by deep neural networks (DNNs) and how these representations can be connected to human-understandable concepts.
  • methods: The paper proposes a scalable method called Inverse Recognition (INVERT), which links representations learned by DNNs to human-understandable concepts. INVERT does not rely on segmentation masks, has lower computational complexity, and can handle diverse types of neurons.
  • results: INVERT is applied in several scenarios, including identifying representations affected by spurious correlations and interpreting the hierarchical structure of decision-making within models. It provides an interpretable metric assessing the alignment between a representation and its explanation, along with a test of statistical significance.
    Abstract Deep Neural Networks (DNNs) demonstrated remarkable capabilities in learning complex hierarchical data representations, but the nature of these representations remains largely unknown. Existing global explainability methods, such as Network Dissection, face limitations such as reliance on segmentation masks, lack of statistical significance testing, and high computational demands. We propose Inverse Recognition (INVERT), a scalable approach for connecting learned representations with human-understandable concepts by leveraging their capacity to discriminate between these concepts. In contrast to prior work, INVERT is capable of handling diverse types of neurons, exhibits less computational complexity, and does not rely on the availability of segmentation masks. Moreover, INVERT provides an interpretable metric assessing the alignment between the representation and its corresponding explanation and delivering a measure of statistical significance, emphasizing its utility and credibility. We demonstrate the applicability of INVERT in various scenarios, including the identification of representations affected by spurious correlations, and the interpretation of the hierarchical structure of decision-making within the models.
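
As a rough sketch of the kind of computation involved in linking a neuron to a concept by its ability to discriminate that concept, the snippet below computes an AUC-style separability score from per-image activations and binary concept labels; the paper's exact formulation, compositional concept search, and significance testing are not reproduced, and the toy data is purely illustrative.

```python
import numpy as np

def auc_score(activations, concept_labels):
    """AUC-style separability: probability that a randomly chosen image containing
    the concept activates the neuron more strongly than one without it (ties = 0.5)."""
    pos = activations[concept_labels == 1]
    neg = activations[concept_labels == 0]
    diffs = pos[:, None] - neg[None, :]
    return (diffs > 0).mean() + 0.5 * (diffs == 0).mean()

# Toy example: one neuron's activation per image, plus binary concept labels
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
acts = rng.normal(loc=labels * 1.5, scale=1.0)  # neuron fires more on the concept
print(auc_score(acts, labels))  # close to 1.0 -> the concept describes this neuron well
```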

Prompt Risk Control: A Rigorous Framework for Responsible Deployment of Large Language Models

  • paper_url: http://arxiv.org/abs/2311.13628
  • repo_url: https://github.com/thomaspzollo/prompt_risk
  • paper_authors: Thomas P. Zollo, Todd Morrill, Zhun Deng, Jake C. Snell, Toniann Pitassi, Richard Zemel
  • for: Mitigating undesirable responses from large language models and improving the quality and reliability of their deployment.
  • methods: Selects prompts based on rigorous upper bounds on informative risk measures, covering multiple dimensions such as worst-case responses and disparities in generation quality across the population of users.
  • results: Experiments show that the framework improves reliability and quality by reducing the risk of the worst outcomes and of disparities across user groups.
    Abstract The recent explosion in the capabilities of large language models has led to a wave of interest in how best to prompt a model to perform a given task. While it may be tempting to simply choose a prompt based on average performance on a validation set, this can lead to a deployment where unexpectedly poor responses are generated, especially for the worst-off users. To mitigate this prospect, we propose Prompt Risk Control, a lightweight framework for selecting a prompt based on rigorous upper bounds on families of informative risk measures. We offer methods for producing bounds on a diverse set of metrics, including quantities that measure worst-case responses and disparities in generation quality across the population of users. In addition, we extend the underlying statistical bounding techniques to accommodate the possibility of distribution shifts in deployment. Experiments on applications such as open-ended chat, medical question summarization, and code generation highlight how such a framework can foster responsible deployment by reducing the risk of the worst outcomes.
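
To illustrate the flavor of selecting a prompt by an upper bound on risk rather than by its validation mean, the sketch below scores candidate prompts with a generic Hoeffding-style upper confidence bound on expected loss; this is a standard distribution-free bound used for illustration only and is not necessarily one of the bounds developed in the paper.

```python
import numpy as np

def hoeffding_upper_bound(losses, delta=0.05):
    """Upper confidence bound on the mean loss for losses bounded in [0, 1].

    With probability at least 1 - delta, the true expected loss is below
    mean + sqrt(log(1/delta) / (2 n))  (Hoeffding's inequality).
    """
    n = len(losses)
    return np.mean(losses) + np.sqrt(np.log(1.0 / delta) / (2 * n))

# Validation losses (in [0, 1]) for two hypothetical candidate prompts
rng = np.random.default_rng(0)
losses_per_prompt = {
    "prompt_A": rng.uniform(0.0, 0.6, size=200),  # low mean, many samples
    "prompt_B": rng.uniform(0.0, 0.5, size=20),   # lower mean, but few samples
}

# Pick the prompt with the smallest *upper bound* on risk, not the smallest mean
bounds = {p: hoeffding_upper_bound(l) for p, l in losses_per_prompt.items()}
print(bounds)
print("selected prompt:", min(bounds, key=bounds.get))
```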

$σ$-PCA: a unified neural model for linear and nonlinear principal component analysis

  • paper_url: http://arxiv.org/abs/2311.13580
  • repo_url: None
  • paper_authors: Fahdi Kanavati, Lucy Katsnith, Masayuki Tsuneki
  • for: The goal of this paper is to learn linear transformations from data.
  • methods: The paper works with single-layer autoencoder formulations of linear PCA, nonlinear PCA, and linear ICA.
  • results: The result is $\sigma$-PCA, a unified single-layer autoencoder model for linear and nonlinear PCA. Like linear PCA, it can learn a transformation that reduces dimensionality and orders components by variance, but without the rotational indeterminacy.
    Abstract Linear principal component analysis (PCA), nonlinear PCA, and linear independent component analysis (ICA) -- those are three methods with single-layer autoencoder formulations for learning linear transformations from data. Linear PCA learns orthogonal transformations (rotations) that orient axes to maximise variance, but it suffers from a subspace rotational indeterminacy: it fails to find a unique rotation for axes that share the same variance. Both nonlinear PCA and linear ICA reduce the subspace indeterminacy from rotational to permutational by maximising statistical independence under the assumption of unit variance. The relationship between all three can be understood by the singular value decomposition of the linear ICA transformation into a sequence of rotation, scale, rotation. Linear PCA learns the first rotation; nonlinear PCA learns the second. The scale is simply the inverse of the standard deviations. The problem is that, in contrast to linear PCA, conventional nonlinear PCA cannot be used directly on the data to learn the first rotation, the first being special as it reduces dimensionality and orders by variances. In this paper, we have identified the cause, and as a solution we propose $\sigma$-PCA: a unified neural model for linear and nonlinear PCA as single-layer autoencoders. One of its key ingredients: modelling not just the rotation but also the scale -- the variances. This model bridges the disparity between linear and nonlinear PCA. And so, like linear PCA, it can learn a semi-orthogonal transformation that reduces dimensionality and orders by variances, but, unlike linear PCA, it does not suffer from rotational indeterminacy.
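
The rotation-scale-rotation view in the abstract can be illustrated with standard NumPy operations: the first rotation is the PCA basis obtained from the SVD of the centered data, and the scale is the inverse of the per-component standard deviations. The sketch below shows only these standard pieces; it is not an implementation of the $\sigma$-PCA autoencoder.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5)) @ rng.standard_normal((5, 5))  # samples x features
Xc = X - X.mean(axis=0)

# Linear PCA: the "first rotation". Columns of V are the principal axes,
# ordered by variance.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
rotation_1 = Vt.T
scores = Xc @ rotation_1            # data expressed in the PCA basis

# The "scale" is the inverse of the per-component standard deviations;
# applying it whitens the components to unit variance.
scale = 1.0 / scores.std(axis=0)
whitened = scores * scale

# A nonlinear-PCA / ICA stage would then learn a second rotation of `whitened`
# (e.g. by maximizing statistical independence); any orthogonal matrix applied
# here leaves the unit variances intact, which is why a second stage is needed
# to pin that rotation down.
print(np.allclose(np.cov(whitened, rowvar=False), np.eye(5), atol=1e-1))
```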

Physical Reasoning and Object Planning for Household Embodied Agents

  • paper_url: http://arxiv.org/abs/2311.13577
  • repo_url: https://github.com/com-phy-affordance/coat
  • paper_authors: Ayush Agrawal, Raghav Prabhakar, Anirudh Goyal, Dianbo Liu
  • for: This work explores task planning for robust household embodied agents, with particular emphasis on the accuracy and flexibility of selecting substitute objects.
  • methods: The authors introduce the CommonSense Object Affordance Task (COAT), a framework for analyzing the commonsense reasoning abilities of language models in everyday scenarios, centered on how agents identify and use alternative objects when executing household tasks.
  • results: State-of-the-art language models are evaluated on three carefully constructed commonsense question-and-answer datasets (15k and 130k questions). The contributions include abstract variables reflecting an object's physical state and Object-Utility mappings relating objects to task utility, which inform object selection in real-world scenarios and lay groundwork for future household embodied agent technology.
    Abstract In this study, we explore the sophisticated domain of task planning for robust household embodied agents, with a particular emphasis on the intricate task of selecting substitute objects. We introduce the CommonSense Object Affordance Task (COAT), a novel framework designed to analyze reasoning capabilities in commonsense scenarios. This approach is centered on understanding how these agents can effectively identify and utilize alternative objects when executing household tasks, thereby offering insights into the complexities of practical decision-making in real-world environments.Drawing inspiration from human decision-making, we explore how large language models tackle this challenge through three meticulously crafted commonsense question-and-answer datasets, featuring refined rules and human annotations. Our evaluation of state-of-the-art language models on these datasets sheds light on three pivotal considerations: 1) aligning an object's inherent utility with the task at hand, 2) navigating contextual dependencies (societal norms, safety, appropriateness, and efficiency), and 3) accounting for the current physical state of the object. To maintain accessibility, we introduce five abstract variables reflecting an object's physical condition, modulated by human insights to simulate diverse household scenarios. Our contributions include insightful Object-Utility mappings addressing the first consideration and two extensive QA datasets (15k and 130k questions) probing the intricacies of contextual dependencies and object states. The datasets, along with our findings, are accessible at: \url{https://github.com/com-phy-affordance/COAT}. This research not only advances our understanding of physical commonsense reasoning in language models but also paves the way for future improvements in household agent intelligence.

Drilling Down into the Discourse Structure with LLMs for Long Document Question Answering

  • paper_url: http://arxiv.org/abs/2311.13565
  • repo_url: None
  • paper_authors: Inderjeet Nair, Shwetha Somasundaram, Apoorv Saxena, Koustava Goswami
  • for: Evidence retrieval for long-document question answering, i.e., locating the passages in a document that are relevant to answering a question.
  • methods: Uses large language models (LLMs) for zero-shot long-document evidence retrieval, and proposes techniques that exploit document discourse structure to create a condensed representation, enabling a better understanding and analysis of the relationships between different parts of a document.
  • results: Retains 99.6% of the best zero-shot approach's performance while processing only 26% of the tokens used by that approach, and, when combined with a self-ask reasoning agent, achieves the best zero-shot performance on complex multi-hop question answering.
    Abstract We address the task of evidence retrieval for long document question answering, which involves locating relevant paragraphs within a document to answer a question. We aim to assess the applicability of large language models (LLMs) in the task of zero-shot long document evidence retrieval, owing to their unprecedented performance across various NLP tasks. However, currently the LLMs can consume limited context lengths as input, thus providing document chunks as inputs might overlook the global context while missing out on capturing the inter-segment dependencies. Moreover, directly feeding the large input sets can incur significant computational costs, particularly when processing the entire document (and potentially incurring monetary expenses with enterprise APIs like OpenAI's GPT variants). To address these challenges, we propose a suite of techniques that exploit the discourse structure commonly found in documents. By utilizing this structure, we create a condensed representation of the document, enabling a more comprehensive understanding and analysis of relationships between different parts. We retain $99.6\%$ of the best zero-shot approach's performance, while processing only $26\%$ of the total tokens used by the best approach in the information seeking evidence retrieval setup. We also show how our approach can be combined with \textit{self-ask} reasoning agent to achieve best zero-shot performance in complex multi-hop question answering, just $\approx 4\%$ short of zero-shot performance using gold evidence.

Soulstyler: Using Large Language Model to Guide Image Style Transfer for Target Object

  • paper_url: http://arxiv.org/abs/2311.13562
  • repo_url: https://github.com/yisuanwang/soulstyler
  • paper_authors: Junhao Chen, Peng Rong, Jingbo Sun, Chao Li, Xiang Li, Hongwu Lv
  • for: Improving the precision of image style transfer so that specific objects in an image can be stylized according to a textual description without affecting the style of the background.
  • methods: Proposes the "Soulstyler" framework, in which the style transfer target object is specified through a simple textual description; a large language model parses the text and identifies the stylization goals and styles to be applied to the image content.
  • results: Experimental results show that Soulstyler accurately performs style transfer on target objects according to textual descriptions without affecting the background style.
    Abstract Image style transfer occupies an important place in both computer graphics and computer vision. However, most current methods require reference to stylized images and cannot individually stylize specific objects. To overcome this limitation, we propose the "Soulstyler" framework, which allows users to guide the stylization of specific objects in an image through simple textual descriptions. We introduce a large language model to parse the text and identify stylization goals and specific styles. Combined with a CLIP-based semantic visual embedding encoder, the model understands and matches text and image content. We also introduce a novel localized text-image block matching loss that ensures that style transfer is performed only on specified target objects, while non-target regions remain in their original style. Experimental results demonstrate that our model is able to accurately perform style transfer on target objects according to textual descriptions without affecting the style of background regions. Our code will be available at https://github.com/yisuanwang/Soulstyler.

Transfer Learning-based Real-time Handgun Detection

  • paper_url: http://arxiv.org/abs/2311.13559
  • repo_url: None
  • paper_authors: Youssef Elmir, Sid Ahmed Laouar, Larbi Hamdaoui
  • for: Enhancing security measures and reducing reliance on human surveillance
  • methods: Uses convolutional neural networks and transfer learning to build a real-time computer vision system for handgun detection
  • results: Proposes a reliable and efficient automatic handgun detection system with a precision of 84.74%, comparable to related work, capable of reducing human monitoring time and the number of false positives
    Abstract Traditional surveillance systems rely on human attention, limiting their effectiveness. This study employs convolutional neural networks and transfer learning to develop a real-time computer vision system for automatic handgun detection. Comprehensive analysis of online handgun detection methods is conducted, emphasizing reducing false positives and learning time. Transfer learning is demonstrated as an effective approach. Despite technical challenges, the proposed system achieves a precision rate of 84.74%, demonstrating promising performance comparable to related works, enabling faster learning and accurate automatic handgun detection for enhanced security. This research advances security measures by reducing human monitoring dependence, showcasing the potential of transfer learning-based approaches for efficient and reliable handgun detection.
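
As a generic illustration of the transfer-learning setup described (reusing a pretrained CNN and retraining only a small head), here is a simplified PyTorch classification sketch; a real-time detector would use a detection architecture, and the backbone, hyperparameters, and dummy data here are assumptions for illustration, not details from the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained backbone and freeze its convolutional features
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in backbone.parameters():
    p.requires_grad = False

# Replace the classification head with a fresh 2-class layer (handgun / no handgun)
backbone.fc = nn.Linear(backbone.fc.in_features, 2)

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One transfer-learning step: only the new head receives gradient updates."""
    logits = backbone(images)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch with the expected 3x224x224 input shape
print(train_step(torch.randn(4, 3, 224, 224), torch.tensor([0, 1, 0, 1])))
```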

Vamos: Versatile Action Models for Video Understanding

  • paper_url: http://arxiv.org/abs/2311.13627
  • repo_url: https://github.com/brown-palm/Vamos
  • paper_authors: Shijie Wang, Qi Zhao, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun
  • for: This paper proposes a text-based video representation approach to better understand actions and context in video.
  • methods: The method uses a large language model (LLM) as the "reasoner", which can directly consume text derived from video, such as free-form captions and action labels, alongside visual embeddings, to learn video representations.
  • results: Text-based representations achieve competitive performance on four different video understanding benchmarks, while visual embeddings provide marginal or no performance gain.
    Abstract What makes good video representations for video understanding, such as anticipating future activities, or answering video-conditioned questions? While earlier approaches focus on end-to-end learning directly from video pixels, we propose to revisit text-based representations, such as discrete action labels, or free-form video captions, which are interpretable and can be directly consumed by large language models (LLMs). Intuitively, different video understanding tasks may require representations that are complementary and at different granularities. To this end, we propose versatile action models (Vamos), a learning framework powered by a large language model as the "reasoner", and can flexibly leverage visual embeddings, action labels, and free-form descriptions extracted from videos as its input. We evaluate Vamos on four complementary video understanding benchmarks, Ego4D, Next-QA, IntentQA, and EgoSchema, on its capability to model temporal dynamics, encode visual history, and perform reasoning. Surprisingly, we observe that text-based representations consistently achieve competitive performance on all benchmarks, and that visual embeddings provide marginal or no performance improvement, demonstrating the effectiveness of text-based video representation in the LLM era. We perform extensive ablation study and qualitative analysis to support our observations, and achieve state-of-the-art performance on three benchmarks.

Medical Image Retrieval Using Pretrained Embeddings

  • paper_url: http://arxiv.org/abs/2311.13547
  • repo_url: None
  • paper_authors: Farnaz Khun Jush, Tuan Truong, Steffen Vogler, Matthias Lenga
  • for: This paper investigates the feasibility of medical image retrieval and evaluates the applicability of four state-of-the-art pretrained networks for this purpose.
  • methods: Four state-of-the-art pretrained networks are used and the results of two similarity indexing approaches are compared. The impact of weighting and sampling strategies for incorporating 3D information when retrieving 3D volumes is also analyzed.
  • results: The paper achieves a recall of 1 for medical image retrieval at modality, body region, and organ levels. Retrieval is feasible using pretrained networks without any additional training or fine-tuning steps.
    Abstract A wide range of imaging techniques and data formats available for medical images make accurate retrieval from image databases challenging. Efficient retrieval systems are crucial in advancing medical research, enabling large-scale studies and innovative diagnostic tools. Thus, addressing the challenges of medical image retrieval is essential for the continued enhancement of healthcare and research. In this study, we evaluated the feasibility of employing four state-of-the-art pretrained models for medical image retrieval at modality, body region, and organ levels and compared the results of two similarity indexing approaches. Since the employed networks take 2D images, we analyzed the impacts of weighting and sampling strategies to incorporate 3D information during retrieval of 3D volumes. We showed that medical image retrieval is feasible using pretrained networks without any additional training or fine-tuning steps. Using pretrained embeddings, we achieved a recall of 1 for various tasks at modality, body region, and organ level.
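
The retrieval setup described (frozen pretrained embeddings plus a similarity index) reduces to nearest-neighbor search in embedding space. In the sketch below, `embed` is a stand-in for any frozen pretrained encoder (here just a crude pooling), and slice embeddings are aggregated into a volume embedding with a plain mean, used only as one illustrative aggregation strategy, not the paper's specific weighting or sampling scheme.

```python
import numpy as np

def embed(image_2d):
    """Stand-in for a frozen pretrained encoder: a crude 8x8 average-pooled feature."""
    h, w = image_2d.shape
    return image_2d.reshape(h // 8, 8, w // 8, 8).mean(axis=(1, 3)).ravel()

def volume_embedding(volume_3d):
    """Aggregate per-slice embeddings into one vector (plain mean, for illustration)."""
    return np.mean([embed(s) for s in volume_3d], axis=0)

def cosine_retrieve(query_vec, index_vecs, top_k=5):
    """Indices of the top_k database entries by cosine similarity to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    db = index_vecs / np.linalg.norm(index_vecs, axis=1, keepdims=True)
    return np.argsort(-(db @ q))[:top_k]

rng = np.random.default_rng(0)
database = [rng.standard_normal((16, 64, 64)) for _ in range(20)]  # toy 3D volumes
index = np.stack([volume_embedding(v) for v in database])
query = database[7] + 0.01 * rng.standard_normal((16, 64, 64))     # noisy copy of entry 7
print(cosine_retrieve(volume_embedding(query), index))             # entry 7 ranks first
```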

Enigma: Privacy-Preserving Execution of QAOA on Untrusted Quantum Computers

  • paper_url: http://arxiv.org/abs/2311.13546
  • repo_url: None
  • paper_authors: Ramin Ayanzadeh, Ahmad Mousavi, Narges Alavisamani, Moinuddin Qureshi
  • for: The goal of this paper is to enable low-cost, privacy-preserving quantum computation that can be integrated with current systems.
  • methods: The paper proposes Enigma, a suite of privacy-preserving schemes designed specifically for the Quantum Approximate Optimization Algorithm (QAOA). Unlike previous privacy-preserving techniques, Enigma transforms the input problem so that the resulting circuit and outcomes are unintelligible to the server. Three variants are proposed: Enigma-I, Enigma-II, and Enigma-III.
  • results: Evaluation on IBM quantum devices shows that Enigma preserves privacy at the cost of only a small reduction in fidelity (1%-13%).
    Abstract Quantum computers can solve problems that are beyond the capabilities of conventional computers. As quantum computers are expensive and hard to maintain, the typical model for performing quantum computation is to send the circuit to a quantum cloud provider. This leads to privacy concerns for commercial entities as an untrusted server can learn protected information from the provided circuit. Current proposals for Secure Quantum Computing (SQC) either rely on emerging technologies (such as quantum networks) or incur prohibitive overheads (for Quantum Homomorphic Encryption). The goal of our paper is to enable low-cost privacy-preserving quantum computation that can be used with current systems. We propose Enigma, a suite of privacy-preserving schemes specifically designed for the Quantum Approximate Optimization Algorithm (QAOA). Unlike previous SQC techniques that obfuscate quantum circuits, Enigma transforms the input problem of QAOA, such that the resulting circuit and the outcomes are unintelligible to the server. We introduce three variants of Enigma. Enigma-I protects the coefficients of QAOA using random phase flipping and fudging of values. Enigma-II protects the nodes of the graph by introducing decoy qubits, which are indistinguishable from primary ones. Enigma-III protects the edge information of the graph by modifying the graph such that each node has an identical number of connections. For all variants of Enigma, we demonstrate that we can still obtain the solution for the original problem. We evaluate Enigma using IBM quantum devices and show that the privacy improvements of Enigma come at only a small reduction in fidelity (1%-13%).
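
For intuition on how coefficient obfuscation can hide a problem from an untrusted solver while keeping the solution recoverable, the sketch below applies secret random spin flips to an Ising-form objective (the problem class QAOA targets) and undoes them on the returned solution. This is a simplified classical illustration in the spirit of Enigma-I, not the paper's actual scheme; all names and values are made up.

```python
import numpy as np

def ising_energy(h, J, z):
    """Energy of spin vector z in {-1,+1}^n under fields h and upper-triangular couplings J."""
    return h @ z + z @ J @ z

rng = np.random.default_rng(0)
n = 6
h = rng.standard_normal(n)
J = np.triu(rng.standard_normal((n, n)), k=1)

# Client-side obfuscation: a secret random spin flip s in {-1,+1}^n.
# Substituting z_i -> s_i z_i gives an equivalent problem with
# h'_i = s_i h_i and J'_ij = s_i s_j J_ij, which is what the server sees.
s = rng.choice([-1, 1], size=n)
h_obf = s * h
J_obf = np.outer(s, s) * J

def solve(h, J):
    """Brute-force minimizer standing in for the untrusted QAOA service."""
    best, best_e = None, np.inf
    for bits in range(2 ** n):
        z = np.array([1 if (bits >> i) & 1 else -1 for i in range(n)])
        e = ising_energy(h, J, z)
        if e < best_e:
            best, best_e = z, e
    return best

z_obf = solve(h_obf, J_obf)  # solution of the obfuscated problem
z_rec = s * z_obf            # client undoes the secret flips

# The recovered spins attain the same minimum energy as solving the original problem.
print(ising_energy(h, J, z_rec), ising_energy(h, J, solve(h, J)))
```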

Physics-driven generative adversarial networks empower single-pixel infrared hyperspectral imaging

  • paper_url: http://arxiv.org/abs/2311.13626
  • repo_url: None
  • paper_authors: Dong-Yin Wang, Shu-Hang Bie, Xi-Hao Chen, Wen-Kai Yu
  • for: This paper aims to eliminate the extensive data training required by traditional data-driven models for single-pixel hyperspectral imaging (HSI) in the infrared spectrum.
  • methods: A physics-driven generative adversarial network (GAN) is used, with the physical process of single-pixel imaging (SPI) integrated into the generator. The actual and estimated one-dimensional (1D) bucket signals serve as constraints in the objective function to update the network parameters and optimize the generator.
  • results: Compared with single-pixel infrared HSI methods based on compressed sensing and on physics-driven convolutional neural networks, the physics-driven GAN-based approach achieves higher imaging performance with fewer measurements.
    Abstract A physics-driven generative adversarial network (GAN) was established here for single-pixel hyperspectral imaging (HSI) in the infrared spectrum, to eliminate the extensive data training work required by traditional data-driven model. Within the GAN framework, the physical process of single-pixel imaging (SPI) was integrated into the generator, and the actual and estimated one-dimensional (1D) bucket signals were employed as constraints in the objective function to update the network's parameters and optimize the generator with the assistance of the discriminator. In comparison to single-pixel infrared HSI methods based on compressed sensing and physics-driven convolution neural networks, our physics-driven GAN-based single-pixel infrared HSI can achieve higher imaging performance but with fewer measurements. We believe that this physics-driven GAN will promote practical applications of computational imaging, especially various SPI-based techniques.
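
To make the physics constraint concrete: in single-pixel imaging the measurement is a 1D bucket signal y = A x, where each row of A is an illumination pattern and x is the flattened scene. A physics-driven loss compares bucket signals computed from a generated image against the measured ones, as in the toy sketch below (the GAN generator and discriminator are omitted, and all sizes are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)

# Single-pixel imaging forward model: M illumination patterns over an
# N-pixel scene produce M scalar bucket measurements y = A @ x.
n_pixels, n_patterns = 32 * 32, 256              # under-sampled: M << N
A = rng.integers(0, 2, size=(n_patterns, n_pixels)).astype(float)
x_true = rng.random(n_pixels)                    # unknown scene
y_measured = A @ x_true                          # measured 1D bucket signal

def physics_loss(x_generated):
    """Discrepancy between bucket signals predicted from a generated image
    and the measured ones -- the constraint used to drive the generator."""
    y_estimated = A @ x_generated.ravel()
    return np.mean((y_estimated - y_measured) ** 2)

# A generator output consistent with the physics gives ~0 loss; a random one does not.
print(physics_loss(x_true), physics_loss(rng.random(n_pixels)))
```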

Linear Log-Normal Attention with Unbiased Concentration

  • paper_url: http://arxiv.org/abs/2311.13541
  • repo_url: None
  • paper_authors: Yury Nahshan, Joseph Kampeas, Emir Haleva
  • for: This work addresses the scalability limitation of Transformer models when processing long text or high-resolution images, caused by the quadratic time and memory complexity of self-attention with respect to sequence length.
  • methods: The self-attention mechanism is studied by analyzing the distribution of the attention matrix and its concentration ability, and tools for measuring these quantities are proposed. A new mechanism, Linear Log-Normal Attention, is introduced to emulate the distribution and concentration behavior of the original self-attention.
  • results: Experiments on popular natural language benchmarks show that the proposed Linear Log-Normal Attention outperforms other linearized attention alternatives, offering a promising avenue for improving the scalability of Transformer models. Code is available in the supplementary material.
    Abstract Transformer models have achieved remarkable results in a wide range of applications. However, their scalability is hampered by the quadratic time and memory complexity of the self-attention mechanism concerning the sequence length. This limitation poses a substantial obstacle when dealing with long documents or high-resolution images. In this work, we study the self-attention mechanism by analyzing the distribution of the attention matrix and its concentration ability. Furthermore, we propose instruments to measure these quantities and introduce a novel self-attention mechanism, Linear Log-Normal Attention, designed to emulate the distribution and concentration behavior of the original self-attention. Our experimental results on popular natural language benchmarks reveal that our proposed Linear Log-Normal Attention outperforms other linearized attention alternatives, offering a promising avenue for enhancing the scalability of transformer models. Our code is available in supplementary materials.
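
For readers unfamiliar with linearized attention in general, the sketch below contrasts standard softmax attention, which forms an n x n score matrix, with a generic kernel-feature linearization that reassociates the matrix product to avoid it; the log-normal construction and concentration matching proposed in the paper are not reproduced here, and the feature map used is an arbitrary illustrative choice.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: the n x n score matrix makes this O(n^2 d)."""
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    """Generic linearized attention with a positive feature map phi: computing
    phi(K).T @ V first avoids the n x n matrix, giving O(n d^2) cost."""
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                       # d x d
    normalizer = Qp @ Kp.sum(axis=0)    # length-n normalizer
    return (Qp @ KV) / normalizer[:, None]

rng = np.random.default_rng(0)
n, d = 128, 16
Q, K, V = rng.standard_normal((3, n, d))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```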

Speak Like a Native: Prompting Large Language Models in a Native Style

  • paper_url: http://arxiv.org/abs/2311.13538
  • repo_url: https://github.com/yangzhch6/aligncot
  • paper_authors: Zhicheng Yang, Yiwei Wang, Yinya Huang, Jing Xiong, Xiaodan Liang, Jing Tang
  • for: Improving the reasoning capability of large language models (LLMs)
  • methods: Uses chain-of-thought (CoT) prompting and aligns the in-context examples with the native style of the LLM to improve performance
  • results: Extensive and comprehensive experiments on several benchmarks show that the proposed AlignCoT improves performance over carefully handcrafted in-context examples, e.g., a +2.5% improvement on GSM8K with GPT-3.5-turbo. AlignCoT can also be combined with existing state-of-the-art prompt engineering techniques to further improve LLM performance.
    Abstract Existing work has found that the prompt engineering heavily influences the performance of large language models (LLMs). Chain-of-thought (CoT), as a popular prompt engineering technique, prompted LLMs using in-context examples with reasoning steps. In current studies, the few-shot examples of CoT are generally handcrafted by humans. However, how the text style of in-context examples influence the outputs of LLMs still remains under-explored. This paper presents a novel and effective approach, named \textbf{AlignCoT}, to improve the reasoning capability of LLMs by aligning the in-context examples with the native style of LLMs. ``Native'' refers to the inherent characteristic style of LLMs which can be probed by original zero-shot scenarios. AlignCoT is orthogonal to other prompt engineering methods, making it easy to combine with state-of-the-art techniques to further improve the LLMs' performance. We conduct extensive and comprehensive experiments on several benchmarks. The empirical results demonstrate that our AlignCoTsignificantly improves performance over the carefully handcrafted in-context examples. For instance, with GPT-3.5-turbo, we observed a +2.5\% improvement on GSM8K. Furthermore, our AlignCoT consistently improve the performance when combined with other state-of-the-art prompt engineering methods. The source code and dataset will be available at \href{https://github.com/yangzhch6/AlignCoT}{https://github.com/yangzhch6/AlignCoT}.

LM-Cocktail: Resilient Tuning of Language Models via Model Merging

  • paper_url: http://arxiv.org/abs/2311.13534
  • repo_url: https://github.com/flagopen/flagembedding
  • paper_authors: Shitao Xiao, Zheng Liu, Peitian Zhang, Xingrun Xing
  • for: Improving the performance of fine-tuned language models on general tasks without degrading performance in the targeted domain.
  • methods: Proposes LM-Cocktail, which merges the fine-tuned model with the pre-trained base model or with peer models from other domains through a weighted average of their parameters.
  • results: Experiments with the LLama and BGE models on popular benchmarks show strong empirical performance across general tasks while preserving superior capability in the targeted domain.
    Abstract The pre-trained language models are continually fine-tuned to better support downstream applications. However, this operation may result in significant performance degeneration on general tasks beyond the targeted domain. To overcome this problem, we propose LM-Cocktail which enables the fine-tuned model to stay resilient in general perspectives. Our method is conducted in the form of model merging, where the fine-tuned language model is merged with the pre-trained base model or the peer models from other domains through weighted average. Despite simplicity, LM-Cocktail is surprisingly effective: the resulted model is able to achieve a strong empirical performance in the whole scope of general tasks while preserving a superior capacity in its targeted domain. We conduct comprehensive experiments with LLama and BGE model on popular benchmarks, including FLAN, MMLU, MTEB, whose results validate the efficacy of our proposed method. The code and checkpoints are available at https://github.com/FlagOpen/FlagEmbedding/tree/master/LM_Cocktail.
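
Merging by weighted parameter averaging is straightforward to sketch in PyTorch: each tensor in the merged state dict is a weighted combination of the corresponding tensors from the fine-tuned model, its base model, and any peers. The toy models and weights below are placeholders; the paper's own weighting strategy is not reproduced.

```python
import torch
import torch.nn as nn

def merge_state_dicts(state_dicts, weights):
    """Weighted average of several models' parameters (all with the same architecture)."""
    assert abs(sum(weights) - 1.0) < 1e-6
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name].float() for w, sd in zip(weights, state_dicts))
    return merged

# Toy demonstration with three small "models" of identical architecture:
# a fine-tuned model, its pre-trained base, and a peer from another domain.
def make_model():
    return nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))

fine_tuned, base, peer = make_model(), make_model(), make_model()

merged_sd = merge_state_dicts(
    [fine_tuned.state_dict(), base.state_dict(), peer.state_dict()],
    weights=[0.5, 0.3, 0.2],   # illustrative weights, not the paper's
)
merged_model = make_model()
merged_model.load_state_dict(merged_sd)
print(merged_model(torch.randn(1, 8)))
```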

Bitformer: An efficient Transformer with bitwise operation-based attention for Big Data Analytics at low-cost low-precision devices

  • paper_url: http://arxiv.org/abs/2311.13502
  • repo_url: None
  • paper_authors: Gaoxiang Duan, Junkai Zhang, Xiaoying Zheng, Yongxin Zhu
  • for: This work addresses the computational complexity and precision requirements that hinder the application of high-performing models in modern computing, particularly in edge environments.
  • methods: A new attention mechanism is proposed that replaces floating-point matrix multiplication with bitwise operations, reducing the computational complexity of attention.
  • results: Compared with conventional floating-point matrix multiplication, the bitwise attention reduces computational complexity while preserving the attention mechanism's ability to capture long-range information dependencies.
    Abstract In the current landscape of large models, the Transformer stands as a cornerstone, playing a pivotal role in shaping the trajectory of modern models. However, its application encounters challenges attributed to the substantial computational intricacies intrinsic to its attention mechanism. Moreover, its reliance on high-precision floating-point operations presents specific hurdles, particularly evident in computation-intensive scenarios such as edge computing environments. These environments, characterized by resource-constrained devices and a preference for lower precision, necessitate innovative solutions. To tackle the exacting data processing demands posed by edge devices, we introduce the Bitformer model, an inventive extension of the Transformer paradigm. Central to this innovation is a novel attention mechanism that adeptly replaces conventional floating-point matrix multiplication with bitwise operations. This strategic substitution yields dual advantages. Not only does it maintain the attention mechanism's prowess in capturing intricate long-range information dependencies, but it also orchestrates a profound reduction in the computational complexity inherent in the attention operation. The transition from an $O(n^2d)$ complexity, typical of floating-point operations, to an $O(n^2T)$ complexity characterizing bitwise operations, substantiates this advantage. Notably, in this context, the parameter $T$ remains markedly smaller than the conventional dimensionality parameter $d$. The Bitformer model in essence endeavors to reconcile the indomitable requirements of modern computing landscapes with the constraints posed by edge computing scenarios. By forging this innovative path, we bridge the gap between high-performing models and resource-scarce environments, thus unveiling a promising trajectory for further advancements in the field.

Complexity-Guided Curriculum Learning for Text Graphs

  • paper_url: http://arxiv.org/abs/2311.13472
  • repo_url: None
  • paper_authors: Nidhi Vakil, Hadi Amiri
  • for: Proposes a curriculum learning approach for training on text graph data.
  • methods: Uses a novel data scheduler that employs "spaced repetition" together with text and graph complexity formalisms to guide the training process.
  • results: The proposed approach makes better use of data and learns transferable curricula across GNN models and datasets; both node-level (local) and graph-level (global) graph complexity indices, as well as shallow and traditional text complexity indices, prove important for effective curriculum learning.
    Abstract Curriculum learning provides a systematic approach to training. It refines training progressively, tailors training to task requirements, and improves generalization through exposure to diverse examples. We present a curriculum learning approach that builds on existing knowledge about text and graph complexity formalisms for training with text graph data. The core part of our approach is a novel data scheduler, which employs "spaced repetition" and complexity formalisms to guide the training process. We demonstrate the effectiveness of the proposed approach on several text graph tasks and graph neural network architectures. The proposed model gains more and uses less data; consistently prefers text over graph complexity indices throughout training, while the best curricula derived from text and graph complexity indices are equally effective; and it learns transferable curricula across GNN models and datasets. In addition, we find that both node-level (local) and graph-level (global) graph complexity indices, as well as shallow and traditional text complexity indices play a crucial role in effective curriculum learning.
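
To give a flavor of what a complexity-guided scheduler with spaced repetition might look like, here is a toy sketch; the competence schedule and the least-recently-seen replay rule are assumptions for illustration, not the paper's scheduler.

```python
def complexity_curriculum(samples, complexity, num_steps, warmup=0.5):
    """Toy curriculum: at step t, only samples whose complexity rank falls below the
    current 'competence' are eligible; among eligible samples, the one trained on
    longest ago is replayed first (a crude spaced-repetition rule)."""
    order = sorted(range(len(samples)), key=lambda i: complexity[i])
    last_seen = {i: -1 for i in range(len(samples))}
    schedule = []
    for t in range(num_steps):
        competence = min(1.0, warmup + (1 - warmup) * t / max(1, num_steps - 1))
        pool = order[: max(1, int(competence * len(order)))]
        i = min(pool, key=lambda j: last_seen[j])   # least recently trained sample
        last_seen[i] = t
        schedule.append(samples[i])
    return schedule

# Toy usage: five text graphs scored by some node-level complexity index.
graphs = ["g0", "g1", "g2", "g3", "g4"]
scores = [0.1, 0.9, 0.4, 0.7, 0.2]
print(complexity_curriculum(graphs, scores, num_steps=10))
```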

Generation of Explanations for Logic Reasoning

  • paper_url: http://arxiv.org/abs/2311.13455
  • repo_url: None
  • paper_authors: Yanyi Pu
  • for: Investigates a fortiori arguments in deductive reasoning, emphasizing their relevance to law, philosophy, and artificial intelligence.
  • methods: Uses GPT-3.5-turbo to automate the analysis of a fortiori arguments, aiming to understand intricate reasoning processes, generate clear and coherent explanations, and create novel arguments; the methodology covers detailed analysis, interpretation, and augmentation of a fortiori arguments.
  • results: Experiments show that GPT-3.5-turbo struggles to accurately detect and classify a fortiori arguments, yet rivals specialized models in extracting key components and interpreting underlying properties; integrating external information markedly improves the quality of generated explanations, and the model also shows a notable ability to augment arguments.
    Abstract This thesis delves into a fortiori arguments in deductive reasoning, underscoring their relevance in various domains such as law, philosophy, and artificial intelligence. The research is centred on employing GPT-3.5-turbo to automate the analysis of these arguments, with a focus on understanding intricate reasoning processes, generating clear and coherent explanations, and creating novel arguments. The methodology encompasses a series of tasks including detailed reasoning, interpretation, and the augmentation of a fortiori arguments. It involves meticulously identifying these arguments in diverse contexts, differentiating comparative elements, and categorizing them based on their logical structure. Extensive experiments reveals the challenges encountered by GPT-3.5-turbo in accurately detecting and classifying a fortiori arguments. Nevertheless, the model demonstrates a performance that rivals specialized models, particularly in extracting key components and interpreting underlying properties. The integration of external information into the model's processing significantly elevates the quality of the generated explanations. Additionally, the model exhibits a noteworthy capability in augmenting arguments, thus contributing to the enrichment of the data set. Despite facing certain limitations, this thesis makes significant contributions to the fields of artificial intelligence and logical reasoning. It introduces novel methodologies, establishes a rigorous evaluation framework, and provides deep insights that set the stage for future advancements in automated logical reasoning. The findings and methodologies presented herein not only underscore the potential of AI in complex reasoning tasks but also highlight areas for future research and development.

Guided Flows for Generative Modeling and Decision Making

  • paper_url: http://arxiv.org/abs/2311.13443
  • repo_url: None
  • paper_authors: Qinqing Zheng, Matt Le, Neta Shaul, Yaron Lipman, Aditya Grover, Ricky T. Q. Chen
  • for: Investigates whether classifier-free guidance can be applied to Flow Matching models to improve the performance of conditional generative models.
  • methods: Uses Guided Flows, training Continuous Normalizing Flows by regressing vector fields.
  • results: Guided Flows significantly improves sample quality in conditional image generation and zero-shot text-to-speech synthesis, and can use drastically less computation without hurting overall performance; notably, this is the first application of flow models to the offline reinforcement learning setting.
    Abstract Classifier-free guidance is a key component for improving the performance of conditional generative models for many downstream tasks. It drastically improves the quality of samples produced, but has so far only been used for diffusion models. Flow Matching (FM), an alternative simulation-free approach, trains Continuous Normalizing Flows (CNFs) based on regressing vector fields. It remains an open question whether classifier-free guidance can be performed for Flow Matching models, and to what extent does it improve performance. In this paper, we explore the usage of Guided Flows for a variety of downstream applications involving conditional image generation, speech synthesis, and reinforcement learning. In particular, we are the first to apply flow models to the offline reinforcement learning setting. We also show that Guided Flows significantly improves the sample quality in image generation and zero-shot text-to-speech synthesis, and can make use of drastically low amounts of computation without affecting the agent's overall performance.
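
The guidance rule itself is compact: as in classifier-free guidance for diffusion models, the conditional and unconditional vector fields are combined as $v_{\text{guided}} = v_{\text{uncond}} + w\,(v_{\text{cond}} - v_{\text{uncond}})$. The sketch below shows that combination inside a simple Euler sampler; the `model(x, t, cond)` signature and the null-conditioning convention are assumptions for illustration.

```python
import torch

def guided_velocity(model, x, t, cond, null_cond, w=2.0):
    """Classifier-free guidance applied to a flow-matching vector field:
    v_guided = v_uncond + w * (v_cond - v_uncond).
    `model(x, t, cond)` is assumed to return the predicted velocity for batch x at time t."""
    v_cond = model(x, t, cond)
    v_uncond = model(x, t, null_cond)
    return v_uncond + w * (v_cond - v_uncond)

@torch.no_grad()
def sample(model, cond, null_cond, shape, steps=50, w=2.0):
    """Euler integration of the guided ODE from noise (t=0) to data (t=1)."""
    x = torch.randn(shape)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0],), i * dt)
        x = x + dt * guided_velocity(model, x, t, cond, null_cond, w)
    return x
```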

PG-Video-LLaVA: Pixel Grounding Large Video-Language Models

  • paper_url: http://arxiv.org/abs/2311.13435
  • repo_url: https://github.com/mbzuai-oryx/video-llava
  • paper_authors: Shehan Munasinghe, Rusiru Thushara, Muhammad Maaz, Hanoona Abdul Rasheed, Salman Khan, Mubarak Shah, Fahad Khan
  • for: Extends image-based large multimodal models (LMMs) to the video domain to improve video understanding and conversation.
  • methods: Proposes Video-LLaVA, which combines an off-the-shelf tracker with a novel grounding module for pixel-level grounding and transcribes audio cues into text to enrich video-context understanding.
  • results: Evaluated on video-based generative and question-answering benchmarks, with new benchmarks introduced to measure prompt-based object grounding in videos; the model outperforms approaches such as Video-ChatGPT and Video-LLaMA on video-based conversation and grounding tasks. Project page: https://github.com/mbzuai-oryx/Video-LLaVA
    Abstract Extending image-based Large Multimodal Models (LMM) to videos is challenging due to the inherent complexity of video data. The recent approaches extending image-based LMM to videos either lack the grounding capabilities (e.g., VideoChat, Video-ChatGPT, Video-LLaMA) or do not utilize the audio-signals for better video understanding (e.g., Video-ChatGPT). Addressing these gaps, we propose Video-LLaVA, the first LMM with pixel-level grounding capability, integrating audio cues by transcribing them into text to enrich video-context understanding. Our framework uses an off-the-shelf tracker and a novel grounding module, enabling it to spatially and temporally localize objects in videos following user instructions. We evaluate Video-LLaVA using video-based generative and question-answering benchmarks and introduce new benchmarks specifically designed to measure prompt-based object grounding performance in videos. Further, we propose the use of Vicuna over GPT-3.5, as utilized in Video-ChatGPT, for video-based conversation benchmarking, ensuring reproducibility of results which is a concern with the proprietary nature of GPT-3.5. Our framework builds on SoTA image-based LLaVA model and extends its advantages to the video domain, delivering promising gains on video-based conversation and grounding tasks. Project Page: https://github.com/mbzuai-oryx/Video-LLaVA

From Images to Connections: Can DQN with GNNs learn the Strategic Game of Hex?

  • paper_url: http://arxiv.org/abs/2311.13414
  • repo_url: https://github.com/yannikkellerde/gnn_hex
  • paper_authors: Yannik Keller, Jannis Blüml, Gopika Sudhakaran, Kristian Kersting
  • for: Studies self-play reinforcement learning (RL) for strategic board games, in particular replacing convolutional neural networks (CNNs) with graph neural networks (GNNs).
  • methods: Uses the game of Hex as the experimental platform to compare GNNs and CNNs in self-play RL.
  • results: GNNs excel at handling long-range dependencies in game states and are less prone to overfitting, but are weaker at discerning local patterns; this suggests that game-specific structures could reshape self-play reinforcement learning.
    Abstract The gameplay of strategic board games such as chess, Go and Hex is often characterized by combinatorial, relational structures -- capturing distinct interactions and non-local patterns -- and not just images. Nonetheless, most common self-play reinforcement learning (RL) approaches simply approximate policy and value functions using convolutional neural networks (CNN). A key feature of CNNs is their relational inductive bias towards locality and translational invariance. In contrast, graph neural networks (GNN) can encode more complicated and distinct relational structures. Hence, we investigate the crucial question: Can GNNs, with their ability to encode complex connections, replace CNNs in self-play reinforcement learning? To this end, we do a comparison with Hex -- an abstract yet strategically rich board game -- serving as our experimental platform. Our findings reveal that GNNs excel at dealing with long range dependency situations in game states and are less prone to overfitting, but also showing a reduced proficiency in discerning local patterns. This suggests a potential paradigm shift, signaling the use of game-specific structures to reshape self-play reinforcement learning.

Confidant: Customizing Transformer-based LLMs via Collaborative Edge Training

  • paper_url: http://arxiv.org/abs/2311.13381
  • repo_url: None
  • paper_authors: Yuhao Chen, Yuxuan Yan, Qianqian Yang, Yuanchao Shu, Shibo He, Jiming Chen
  • for: Proposes a multi-backend collaborative training framework for fine-tuning state-of-the-art natural language processing (NLP) models on commodity mobile devices.
  • methods: Partitions an LLM into several sub-models so that each fits into a mobile device's memory, uses a pipeline-parallel training mechanism for fast and efficient distributed training, and introduces a novel backend scheduler that allocates attention heads to heterogeneous compute hardware (mobile CPUs and GPUs) to maximize resource utilization on each edge device.
  • results: Preliminary experiments show up to 45.3% memory reduction and 8.03x inference speedup in practical settings.
    Abstract Transformer-based large language models (LLMs) have demonstrated impressive capabilities in a variety of natural language processing (NLP) tasks. Nonetheless, it is challenging to deploy and fine-tune LLMs on mobile edge devices with limited computing, memory, and energy budgets. In this paper, we propose Confidant, a multi-backend collaborative training framework for customizing state-of-the-art LLMs on commodity mobile devices like smartphones. Confidant partitions an LLM into several sub-models so that each fits into a mobile device's memory. A pipeline parallel training mechanism is further developed to ensure fast and efficient distributed training. In addition, we propose a novel backend scheduler to allocate different attention heads to heterogeneous compute hardware, including mobile CPU and GPUs, to maximize the compute resource utilization on each edge device. Our preliminary experimental results show that Confidant achieves at most 45.3% memory reduction and 8.03x inference speedup in practical settings.

Deriving Comprehensible Theories from Probabilistic Circuits

  • paper_url: http://arxiv.org/abs/2311.13379
  • repo_url: None
  • paper_authors: Sieben Bocklandt, Wannes Meert, Koen Vanderstraeten, Wouter Pijpops, Kurt Jaspers
  • for: Aims to improve the comprehensibility of explainable AI models, specifically by computing a readable logical theory that covers the high-density regions generated by a probabilistic circuit.
  • methods: Uses pruning approaches based on generative significance in a new method called PUTPUT (Probabilistic circuit Understanding Through Pruning Underlying logical Theories).
  • results: The approach effectively produces a comprehensible logical theory describing the high-density regions of a probabilistic circuit and outperforms state-of-the-art methods on the performance-comprehensibility trade-off.
    Abstract The field of Explainable AI (XAI) is seeking to shed light on the inner workings of complex AI models and uncover the rationale behind their decisions. One of the models gaining attention are probabilistic circuits (PCs), which are a general and unified framework for tractable probabilistic models that support efficient computation of various probabilistic queries. Probabilistic circuits guarantee inference that is polynomial in the size of the circuit. In this paper, we improve the explainability of probabilistic circuits by computing a comprehensible, readable logical theory that covers the high-density regions generated by a PC. To achieve this, pruning approaches based on generative significance are used in a new method called PUTPUT (Probabilistic circuit Understanding Through Pruning Underlying logical Theories). The method is applied to a real world use case where music playlists are automatically generated and expressed as readable (database) queries. Evaluation shows that this approach can effectively produce a comprehensible logical theory that describes the high-density regions of a PC and outperforms state of the art methods when exploring the performance-comprehensibility trade-off.

Large Language Model is a Good Policy Teacher for Training Reinforcement Learning Agents

  • paper_url: http://arxiv.org/abs/2311.13373
  • repo_url: https://github.com/zjlab-ammi/llm4teach
  • paper_authors: Zihao Zhou, Bin Hu, Pu Zhang, Chenyang Zhao, Bin Liu
  • for: Addresses the limitations of large language models (LLMs) in complex sequential decision-making tasks and the cost and latency of deploying LLM-based agents in real-time dynamic environments.
  • methods: Proposes a framework in which an LLM-based teacher provides high-level instructions to train a smaller, specialized student agent; the teacher's guided actions distill the LLM's prior knowledge into the local student model, which is then further improved with environment feedback.
  • results: Experiments on three challenging MiniGrid environments show improved sample efficiency and superior performance over baseline methods; code is available at https://github.com/ZJLAB-AMMI/LLM4Teach.
    Abstract Recent studies have shown that Large Language Models (LLMs) can be utilized for solving complex sequential decision-making tasks by providing high-level instructions. However, LLM-based agents face limitations in real-time dynamic environments due to their lack of specialization in solving specific target problems. Moreover, the deployment of such LLM-based agents is both costly and time-consuming in practical scenarios. In this paper, we introduce a novel framework that addresses these challenges by training a smaller scale specialized student agent using instructions from an LLM-based teacher agent. By leveraging guided actions provided by the teachers, the prior knowledge of the LLM is distilled into the local student model. Consequently, the student agent can be trained with significantly less data. Furthermore, subsequent training with environment feedback empowers the student agents to surpass the capabilities of their teachers. We conducted experiments on three challenging MiniGrid environments to evaluate the effectiveness of our framework. The results demonstrate that our approach enhances sample efficiency and achieves superior performance compared to baseline methods. Our code is available at https://github.com/ZJLAB-AMMI/LLM4Teach.
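
One way to picture the distillation step is as minimizing the divergence between the student policy and the action distribution suggested by the LLM teacher. The sketch below assumes a KL objective over discrete actions and soft teacher targets; it illustrates the idea rather than reproducing the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def distill_from_teacher(student_logits, teacher_probs):
    """Minimal policy-distillation step: KL(teacher || student) over the action
    distribution the LLM teacher suggests for each observed state."""
    log_student = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(log_student, teacher_probs, reduction="batchmean")

# Toy usage: 4 states, 7 discrete actions; the "teacher" concentrates mass on one action.
student_logits = torch.randn(4, 7, requires_grad=True)
teacher_probs = F.one_hot(torch.tensor([0, 3, 3, 6]), 7).float() * 0.9 + 0.1 / 7
distill_from_teacher(student_logits, teacher_probs).backward()
```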

Applying Large Language Models to Power Systems: Potential Security Threats

  • paper_url: http://arxiv.org/abs/2311.13361
  • repo_url: None
  • paper_authors: Jiaqi Ruan, Gaoqi Liang, Huan Zhao, Guolong Liu, Jing Qiu, Junhua Zhao, Zhao Xu, Fushuan Wen, Zhao Yang Dong
  • for: Examines the potential security threats of applying large language models (LLMs) to power systems.
  • methods: Uses observability analysis and attack detection techniques to examine the security risks of LLM applications.
  • results: Identifies potential security threats, including data-privacy leakage and the possibility of attackers exploiting LLMs, and calls for urgent research into countermeasures.
    Abstract Applying large language models (LLMs) to power systems presents a promising avenue for enhancing decision-making and operational efficiency. However, this action may also incur potential security threats, which have not been fully recognized so far. To this end, this letter analyzes potential threats incurred by applying LLMs to power systems, emphasizing the need for urgent research and development of countermeasures.

Uncertainty Estimation in Multi-Agent Distributed Learning

  • paper_url: http://arxiv.org/abs/2311.13356
  • repo_url: None
  • paper_authors: Gleb Radchenko, Victoria Andrea Fill
  • for: Aims to extend autonomous operation on IoT edge devices by developing an open-source framework and new training strategies for AI applications.
  • methods: Develops new methods in quantization, pruning-aware training, and sparsification to expand the ML capabilities of edge devices.
  • results: Focuses on enabling edge-network agents to engage in collaborative distributed learning, addressing the question of how much confidence to place in learning results given the spatio-temporal locality of the data sets perceived by independent agents.
    Abstract Traditionally, IoT edge devices have been perceived primarily as low-power components with limited capabilities for autonomous operations. Yet, with emerging advancements in embedded AI hardware design, a foundational shift paves the way for future possibilities. Thus, the aim of the KDT NEUROKIT2E project is to establish a new open-source framework to further facilitate AI applications on edge devices by developing new methods in quantization, pruning-aware training, and sparsification. These innovations hold the potential to expand the functional range of such devices considerably, enabling them to manage complex Machine Learning (ML) tasks utilizing local resources and laying the groundwork for innovative learning approaches. In the context of 6G's transformative potential, distributed learning among independent agents emerges as a pivotal application, attributed to 6G networks' support for ultra-reliable low-latency communication, enhanced data rates, and advanced edge computing capabilities. Our research focuses on the mechanisms and methodologies that allow edge network-enabled agents to engage in collaborative learning in distributed environments. Particularly, one of the key issues within distributed collaborative learning is determining the degree of confidence in the learning results, considering the spatio-temporal locality of data sets perceived by independent agents.

Fact-based Court Judgment Prediction

  • paper_url: http://arxiv.org/abs/2311.13350
  • repo_url: None
  • paper_authors: Shubham Kumar Nigam, Aniket Deroy
  • for: Aims to improve early-phase case-outcome prediction, offering benefits to legal professionals and the general public.
  • methods: Introduces two problem variations: one based solely on facts, and another combining facts with rulings from lower courts (RLC); uses the DELSumm algorithm with various weightage schemes.
  • results: Legal judgment prediction from facts alone performed worse than the original ILDC for CJPE study, and different transformer models likewise fell short of the reported state-of-the-art results.
    Abstract This extended abstract extends the research presented in "ILDC for CJPE: Indian Legal Documents Corpus for Court Judgment Prediction and Explanation" \cite{malik-etal-2021-ildc}, focusing on fact-based judgment prediction within the context of Indian legal documents. We introduce two distinct problem variations: one based solely on facts, and another combining facts with rulings from lower courts (RLC). Our research aims to enhance early-phase case outcome prediction, offering significant benefits to legal professionals and the general public. The results, however, indicated a performance decline compared to the original ILDC for CJPE study, even after implementing various weightage schemes in our DELSumm algorithm. Additionally, using only facts for legal judgment prediction with different transformer models yielded results inferior to the state-of-the-art outcomes reported in the "ILDC for CJPE" study.

Learning principle and mathematical realization of the learning mechanism in the brain

  • paper_url: http://arxiv.org/abs/2311.13341
  • repo_url: None
  • paper_authors: Taisuke Katayose
  • for: Aims to provide a mathematical framework that explains why deep learning works and to apply it to actual machine learning models.
  • methods: Proposes a "learning principle" under which all learning amounts to estimating the probability of input data, together with a new differentiation-based way of defining estimated probability values that allows unsupervised learning on arbitrary datasets without prior knowledge.
  • results: Key findings include: (1) all learning can be viewed as estimating the probability of input data; (2) conventional supervised learning is equivalent to estimating conditional probabilities; and (3) unsupervised learning can be performed on arbitrary datasets without prior knowledge. The framework is also used to describe the learning mechanism in the brain.
    Abstract While deep learning has achieved remarkable success, there is no clear explanation about why it works so well. In order to discuss this question quantitatively, we need a mathematical framework that explains what learning is in the first place. After several considerations, we succeeded in constructing a mathematical framework that can provide a unified understanding of all types of learning, including deep learning and learning in the brain. We call it learning principle, and it follows that all learning is equivalent to estimating the probability of input data. We not only derived this principle, but also mentioned its application to actual machine learning models. For example, we found that conventional supervised learning is equivalent to estimating conditional probabilities, and succeeded in making supervised learning more effective and generalized. We also proposed a new method of defining the values of estimated probability using differentiation, and showed that unsupervised learning can be performed on arbitrary dataset without any prior knowledge. Namely, this method is a general-purpose machine learning in the true sense. Moreover, we succeeded in describing the learning mechanism in the brain by considering the time evolution of a fully or partially connected model and applying this new method. The learning principle provides solutions to many unsolved problems in deep learning and cognitive neuroscience.

Quantum learning and essential cognition under the traction of meta-characteristics in an open world

  • paper_url: http://arxiv.org/abs/2311.13335
  • repo_url: None
  • paper_authors: Jin Wang, Changlin Song
  • for: This paper aims to improve the ability of artificial intelligence (AI) to recognize and explore new knowledge in the Open World problem.
  • methods: The proposed model and elemental feature system focus on recognizing the distribution differences in objective features between the new and old worlds, using the quantum tunneling effect of learning ability and the tractive force of meta-characteristic.
  • results: The model system achieves outstanding performance in learning new knowledge, with an accuracy of $96.71%$ at most, demonstrating that AI has acquired the ability to recognize the new world and explore new knowledge, similar to humans.
    Abstract Artificial intelligence has made significant progress in the Close World problem, being able to accurately recognize old knowledge through training and classification. However, AI faces significant challenges in the Open World problem, as it involves a new and unknown exploration journey. AI is not inherently proactive in exploration, and its challenge lies in not knowing how to approach and adapt to the unknown world. How do humans acquire knowledge of the unknown world. Humans identify new knowledge through intrinsic cognition. In the process of recognizing new colors, the cognitive cues are different from known color features and involve hue, saturation, brightness, and other characteristics. When AI encounters objects with different features in the new world, it faces another challenge: where are the distinguishing features between influential features of new and old objects? AI often mistakes a new world's brown bear for a known dog because it has not learned the differences in feature distributions between knowledge systems. This is because things in the new and old worlds have different units and dimensions for their features. This paper proposes an open-world model and elemental feature system that focuses on fundamentally recognizing the distribution differences in objective features between the new and old worlds. The quantum tunneling effect of learning ability in the new and old worlds is realized through the tractive force of meta-characteristic. The outstanding performance of the model system in learning new knowledge (using pedestrian re-identification datasets as an example) demonstrates that AI has acquired the ability to recognize the new world with an accuracy of $96.71\%$ at most and has gained the capability to explore new knowledge, similar to humans.

Curriculum Learning and Imitation Learning for Model-free Control on Financial Time-series

  • paper_url: http://arxiv.org/abs/2311.13326
  • repo_url: None
  • paper_authors: Woosung Koh, Insu Choi, Yuntae Jang, Gimin Kang, Woo Chang Kim
  • for: The paper is written to improve the performance of control tasks over complex time-series data. The authors explore the use of curriculum learning and imitation learning in this context.
  • methods: The authors use curriculum learning via data augmentation and imitation learning via policy distillation from an oracle. They also perform ablation studies and tune all overlapping hyperparameters on the baseline.
  • results: The authors find that curriculum learning is a promising direction for improving control-task performance over complex time-series data. However, they also find that imitation learning should be used with caution.
    Abstract Curriculum learning and imitation learning have been leveraged extensively in the robotics domain. However, minimal research has been done on leveraging these ideas on control tasks over highly stochastic time-series data. Here, we theoretically and empirically explore these approaches in a representative control task over complex time-series data. We implement the fundamental ideas of curriculum learning via data augmentation, while imitation learning is implemented via policy distillation from an oracle. Our findings reveal that curriculum learning should be considered a novel direction in improving control-task performance over complex time-series. Our ample random-seed out-sample empirics and ablation studies are highly encouraging for curriculum learning for time-series control. These findings are especially encouraging as we tune all overlapping hyperparameters on the baseline -- giving an advantage to the baseline. On the other hand, we find that imitation learning should be used with caution.

Probabilistic Inference in Reinforcement Learning Done Right

  • paper_url: http://arxiv.org/abs/2311.13294
  • repo_url: None
  • paper_authors: Jean Tarbouriech, Tor Lattimore, Brendan O’Donoghue
  • for: Treats reinforcement learning in a Markov decision process (MDP) as probabilistic inference and develops a rigorous Bayesian treatment of it.
  • methods: Derives the posterior probability of state-action optimality, introduces a variational Bayesian approximation, and solves the resulting tractable convex optimization problem.
  • results: The resulting VAPOR method yields policies that explore efficiently (as measured by regret), has strong connections to Thompson sampling, K-learning, and maximum-entropy exploration, and a deep-RL version of VAPOR shows a performance advantage in experiments.
    Abstract A popular perspective in Reinforcement learning (RL) casts the problem as probabilistic inference on a graphical model of the Markov decision process (MDP). The core object of study is the probability of each state-action pair being visited under the optimal policy. Previous approaches to approximate this quantity can be arbitrarily poor, leading to algorithms that do not implement genuine statistical inference and consequently do not perform well in challenging problems. In this work, we undertake a rigorous Bayesian treatment of the posterior probability of state-action optimality and clarify how it flows through the MDP. We first reveal that this quantity can indeed be used to generate a policy that explores efficiently, as measured by regret. Unfortunately, computing it is intractable, so we derive a new variational Bayesian approximation yielding a tractable convex optimization problem and establish that the resulting policy also explores efficiently. We call our approach VAPOR and show that it has strong connections to Thompson sampling, K-learning, and maximum entropy exploration. We conclude with some experiments demonstrating the performance advantage of a deep RL version of VAPOR.

Algorithmic Transparency and Manipulation

  • paper_url: http://arxiv.org/abs/2311.13286
  • repo_url: https://github.com/Piyushrai558/voting-via-blockchain
  • paper_authors: Michael Klenk
  • for: Examines the manipulative potential of algorithmic transparency.
  • methods: Conceptual analysis of algorithmic transparency and of competing accounts of manipulation.
  • results: Argues that the indifference view of manipulation explains better than the vulnerability view why algorithmic transparency has manipulative potential, and raises research questions for future studies of manipulation in this context.
    Abstract A series of recent papers raises worries about the manipulative potential of algorithmic transparency. But while the concern is apt and relevant, it is based on a fraught understanding of manipulation. Therefore, this paper draws attention to the indifference view of manipulation, which explains better than the vulnerability view why algorithmic transparency has manipulative potential. The paper also raises pertinent research questions for future studies of manipulation in the context of algorithmic transparency.

FedFN: Feature Normalization for Alleviating Data Heterogeneity Problem in Federated Learning

  • paper_url: http://arxiv.org/abs/2311.13267
  • repo_url: None
  • paper_authors: Seongyoon Kim, Gihun Lee, Jaehoon Oh, Se-Young Yun
  • for: Addresses the data heterogeneity problem in Federated Learning (FL) to improve model performance.
  • methods: Proposes Federated Averaging with Feature Normalization Update (FedFN), a straightforward learning method.
  • results: Experiments show that FedFN achieves superior performance, including when applied to a pretrained ResNet18, and its applicability extends to foundation models.
    Abstract Federated Learning (FL) is a collaborative method for training models while preserving data privacy in decentralized settings. However, FL encounters challenges related to data heterogeneity, which can result in performance degradation. In our study, we observe that as data heterogeneity increases, feature representation in the FedAVG model deteriorates more significantly compared to classifier weight. Additionally, we observe that as data heterogeneity increases, the gap between higher feature norms for observed classes, obtained from local models, and feature norms of unobserved classes widens, in contrast to the behavior of classifier weight norms. This widening gap extends to encompass the feature norm disparities between local and the global models. To address these issues, we introduce Federated Averaging with Feature Normalization Update (FedFN), a straightforward learning method. We demonstrate the superior performance of FedFN through extensive experiments, even when applied to pretrained ResNet18. Subsequently, we confirm the applicability of FedFN to foundation models.
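
One common reading of feature normalization in this setting is to L2-normalize the penultimate features (and the classifier weights) so that logits become scaled cosine similarities. The sketch below illustrates that idea under assumed shapes and scale; it is not a reproduction of FedFN's exact update.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizedHead(nn.Module):
    """Classification head with feature and weight normalization: logits are
    cosine similarities scaled by a fixed temperature."""
    def __init__(self, feat_dim, num_classes, scale=10.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.scale = scale

    def forward(self, features):
        f = F.normalize(features, dim=-1)
        w = F.normalize(self.weight, dim=-1)
        return self.scale * f @ w.t()

# Toy usage inside a FedAvg-style client update (features come from a shared backbone).
head = NormalizedHead(feat_dim=512, num_classes=10)
logits = head(torch.randn(8, 512))   # shape (8, 10), magnitudes bounded by the scale
```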

Improved identification accuracy in equation learning via comprehensive $\boldsymbol{R^2}$-elimination and Bayesian model selection

  • paper_url: http://arxiv.org/abs/2311.13265
  • repo_url: None
  • paper_authors: Daniel Nickelsen, Bubacarr Bah
  • for: Equation learning: identifying the best-fitting equation from a basis-function dictionary while balancing comprehensiveness and efficiency.
  • methods: Combines the coefficient of determination $R^2$ and the Bayesian model evidence $p(\mathbf{y}|\mathcal{M})$ in a novel way, performing a comprehensive search with only a minor reduction of the model space at each step; two flavors of the approach plus bi-directional stepwise regression based on $p(\mathbf{y}|\mathcal{M})$ yield three new avenues for equation learning.
  • results: Across three extensive numerical experiments, the comprehensive search approach surpasses competing methods in identification accuracy; in particular, the second flavor, which derives an overfitting penalty solely from $R^2$, achieves the highest rates of exact equation recovery.
    Abstract In the field of equation learning, exhaustively considering all possible equations derived from a basis function dictionary is infeasible. Sparse regression and greedy algorithms have emerged as popular approaches to tackle this challenge. However, the presence of multicollinearity poses difficulties for sparse regression techniques, and greedy steps may inadvertently exclude terms of the true equation, leading to reduced identification accuracy. In this article, we present an approach that strikes a balance between comprehensiveness and efficiency in equation learning. Inspired by stepwise regression, our approach combines the coefficient of determination, $R^2$, and the Bayesian model evidence, $p(\boldsymbol y|\mathcal M)$, in a novel way. Our procedure is characterized by a comprehensive search with just a minor reduction of the model space at each iteration step. With two flavors of our approach and the adoption of $p(\boldsymbol y|\mathcal M)$ for bi-directional stepwise regression, we present a total of three new avenues for equation learning. Through three extensive numerical experiments involving random polynomials and dynamical systems, we compare our approach against four state-of-the-art methods and two standard approaches. The results demonstrate that our comprehensive search approach surpasses all other methods in terms of identification accuracy. In particular, the second flavor of our approach establishes an efficient overfitting penalty solely based on $R^2$, which achieves highest rates of exact equation recovery.
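
For intuition, the toy sketch below runs a forward stepwise search over a basis-function dictionary in which every candidate extension is scored by $R^2$ and only the worst-scoring fraction is eliminated, so several models stay alive at each step. The keep-fraction and the omission of the Bayesian evidence term are simplifications of the paper's procedure.

```python
import numpy as np

def r_squared(y, X):
    """Coefficient of determination of an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def forward_stepwise_r2(y, dictionary, names, keep_frac=0.5, max_terms=2):
    """Comprehensive-but-pruned forward search: extend every surviving model by one
    dictionary term, rank all extensions by R^2, and keep only the top fraction."""
    models = [[]]
    for _ in range(max_terms):
        scored = []
        for m in models:
            for j in range(dictionary.shape[1]):
                if j not in m:
                    cand = m + [j]
                    scored.append((r_squared(y, dictionary[:, cand]), cand))
        scored.sort(key=lambda s: s[0], reverse=True)
        models = [c for _, c in scored[: max(1, int(len(scored) * keep_frac))]]
    return [names[j] for j in models[0]]

# Toy usage: y = 2*x1 + 3*x3 + noise over a five-term dictionary.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = 2 * X[:, 1] + 3 * X[:, 3] + 0.01 * rng.standard_normal(200)
print(forward_stepwise_r2(y, X, ["x0", "x1", "x2", "x3", "x4"]))   # ['x3', 'x1']
```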

The Rise of Creative Machines: Exploring the Impact of Generative AI

  • paper_url: http://arxiv.org/abs/2311.13262
  • repo_url: None
  • paper_authors: Saad Shaikh, Rajat bendre, Sakshi Mhaske
  • for: Explores how generative artificial intelligence (AI) can transform marketing, product development, and research.
  • methods: Reviews the latest developments in the field, easy-to-use resources, and the moral and social hazards involved.
  • results: Emphasizes responsible development through continual stakeholder communication and ethical principles, and discusses techniques for mitigating issues such as prejudice and disinformation.
    Abstract This study looks at how generative artificial intelligence (AI) can revolutionize marketing, product development, and research. It discusses the latest developments in the field, easy-to-use resources, and moral and social hazards. In addition to addressing mitigating techniques for issues like prejudice and disinformation, the debate emphasizes the significance of responsible development through continual stakeholder communication and ethical principles.

DA-STC: Domain Adaptive Video Semantic Segmentation via Spatio-Temporal Consistency

  • paper_url: http://arxiv.org/abs/2311.13254
  • repo_url: https://github.com/ZHE-SAPI/STCL
  • paper_authors: Zhe Zhang, Gaochang Wu, Jing Zhang, Chunhua Shen, Dacheng Tao, Tianyou Chai
  • for: Addresses domain shift in video semantic segmentation, i.e., learning domain-invariant spatio-temporal features across a labeled source domain and an unlabeled target domain.
  • methods: Proposes DA-STC, which combines a bidirectional multi-level spatio-temporal fusion module with a category-aware spatio-temporal feature alignment module to encourage consistent learning of domain-invariant features.
  • results: Achieves state-of-the-art mIoU on several challenging benchmarks and also performs well when extended to domain-adaptive image semantic segmentation; code and models will be released at https://github.com/ZHE-SAPI/DA-STC.
    Abstract Video semantic segmentation is a pivotal aspect of video representation learning. However, significant domain shifts present a challenge in effectively learning invariant spatio-temporal features across the labeled source domain and unlabeled target domain for video semantic segmentation. To solve the challenge, we propose a novel DA-STC method for domain adaptive video semantic segmentation, which incorporates a bidirectional multi-level spatio-temporal fusion module and a category-aware spatio-temporal feature alignment module to facilitate consistent learning for domain-invariant features. Firstly, we perform bidirectional spatio-temporal fusion at the image sequence level and shallow feature level, leading to the construction of two fused intermediate video domains. This prompts the video semantic segmentation model to consistently learn spatio-temporal features of shared patch sequences which are influenced by domain-specific contexts, thereby mitigating the feature gap between the source and target domain. Secondly, we propose a category-aware feature alignment module to promote the consistency of spatio-temporal features, facilitating adaptation to the target domain. Specifically, we adaptively aggregate the domain-specific deep features of each category along spatio-temporal dimensions, which are further constrained to achieve cross-domain intra-class feature alignment and inter-class feature separation. Extensive experiments demonstrate the effectiveness of our method, which achieves state-of-the-art mIOUs on multiple challenging benchmarks. Furthermore, we extend the proposed DA-STC to the image domain, where it also exhibits superior performance for domain adaptive semantic segmentation. The source code and models will be made available at \url{https://github.com/ZHE-SAPI/DA-STC}.

@ve: A Chatbot for Latin

  • paper_url: http://arxiv.org/abs/2311.14741
  • repo_url: None
  • paper_authors: Oliver Bendel, Karim N’diaye
  • for: Preserving and promoting dead, extinct, and endangered languages.
  • methods: Beyond audio conservation, the collection and digitization of scripts, and targeted language acquisition efforts, the paper builds a conversational agent: the chatbot @ve, developed in 2022/2023 on GPT-3.0 and equipped with a manually created knowledge base.
  • results: The chatbot can converse in Latin and could become a memorable and entertaining teaching tool, but the present implementation is still too prone to glitches for stand-alone use; using GPT-4 and extending the knowledge base are possible remedies.
    Abstract Dead, extinct, and endangered languages have been preserved primarily through audio conservation and the collection and digitization of scripts and have been promoted through targeted language acquisition efforts. Another possibility would be to build conversational agents that can master these languages. This would provide an artificial, active conversational partner which has knowledge of the vocabulary and grammar, and one learns with it in a different way. The chatbot @ve, with which one can communicate in Latin, was developed in 2022/2023 based on GPT-3.0. It was additionally equipped with a manually created knowledge base. After conceptual groundwork, this paper presents the preparation and implementation of the project. In addition, it summarizes the test that a Latin expert conducted with the chatbot. A critical discussion elaborates advantages and disadvantages. @ve could be a new tool for teaching Latin in a memorable and entertaining way through dialogue. However, the present implementation is still too prone to glitches for stand-alone use - i.e., without the accompaniment of a teacher. The use of GPT-4 could be a solution as well as the extension of the knowledge base. In conclusion, it can be argued that conversational agents are an innovative approach to promoting and preserving languages.

TSegFormer: 3D Tooth Segmentation in Intraoral Scans with Geometry Guided Transformer

  • paper_url: http://arxiv.org/abs/2311.13234
  • repo_url: https://github.com/huiminxiong/tsegformer
  • paper_authors: Huimin Xiong, Kunle Li, Kaiyuan Tan, Yang Feng, Joey Tianyi Zhou, Jin Hao, Haochao Ying, Jian Wu, Zuozhu Liu
  • for: Aims to improve the accuracy of 3D tooth segmentation in optical intraoral scans (IOS) for digital dentistry, supporting a variety of dental applications.
  • methods: Proposes TSegFormer, a multi-task 3D transformer architecture that captures both local and global dependencies among different teeth and the gingiva, together with a geometry-guided loss based on a novel point curvature that refines boundaries end-to-end.
  • results: Experiments show that TSegFormer consistently surpasses state-of-the-art baselines, corroborated by extensive analysis, visualizations, and real-world clinical applicability tests; the work also introduces a dataset of 16,000 IOSs, the largest of its kind.
    Abstract Optical Intraoral Scanners (IOS) are widely used in digital dentistry to provide detailed 3D information of dental crowns and the gingiva. Accurate 3D tooth segmentation in IOSs is critical for various dental applications, while previous methods are error-prone at complicated boundaries and exhibit unsatisfactory results across patients. In this paper, we propose TSegFormer which captures both local and global dependencies among different teeth and the gingiva in the IOS point clouds with a multi-task 3D transformer architecture. Moreover, we design a geometry-guided loss based on a novel point curvature to refine boundaries in an end-to-end manner, avoiding time-consuming post-processing to reach clinically applicable segmentation. In addition, we create a dataset with 16,000 IOSs, the largest ever IOS dataset to the best of our knowledge. The experimental results demonstrate that our TSegFormer consistently surpasses existing state-of-the-art baselines. The superiority of TSegFormer is corroborated by extensive analysis, visualizations and real-world clinical applicability tests. Our code is available at https://github.com/huiminxiong/TSegFormer.

A Survey of Adversarial CAPTCHAs on its History, Classification and Generation

  • paper_url: http://arxiv.org/abs/2311.13233
  • repo_url: None
  • paper_authors: Zisheng Xu, Qiao Yan, F. Richard Yu, Victor C. M. Leung
  • for: Surveys how adversarial examples can be combined with CAPTCHAs to defend websites and applications against automated attacks by bots.
  • methods: Extends the definition of adversarial CAPTCHAs, proposes a classification scheme, and systematically reviews commonly used methods for generating adversarial examples as well as methods successfully used to generate adversarial CAPTCHAs; it also analyzes defense methods that could threaten adversarial CAPTCHAs.
  • results: Combining adversarial examples with CAPTCHAs yields adversarial CAPTCHAs that can fool deep models, and the survey closes with possible future research directions.
    Abstract Completely Automated Public Turing test to tell Computers and Humans Apart, short for CAPTCHA, is an essential and relatively easy way to defend against malicious attacks implemented by bots. The security and usability trade-off limits the use of massive geometric transformations to interfere deep model recognition and deep models even outperformed humans in complex CAPTCHAs. The discovery of adversarial examples provides an ideal solution to the security and usability trade-off by integrating adversarial examples and CAPTCHAs to generate adversarial CAPTCHAs that can fool the deep models. In this paper, we extend the definition of adversarial CAPTCHAs and propose a classification method for adversarial CAPTCHAs. Then we systematically review some commonly used methods to generate adversarial examples and methods that are successfully used to generate adversarial CAPTCHAs. Also, we analyze some defense methods that can be used to defend adversarial CAPTCHAs, indicating potential threats to adversarial CAPTCHAs. Finally, we discuss some possible future research directions for adversarial CAPTCHAs at the end of this paper.
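
As a concrete instance of the adversarial-example generators the survey covers, the sketch below applies the well-known fast gradient sign method (FGSM) to a CAPTCHA image; the stand-in recognizer and the perturbation budget are illustrative assumptions rather than a scheme from the survey.

```python
import torch
import torch.nn.functional as F

def fgsm_captcha(model, image, label, eps=4 / 255):
    """Perturb a CAPTCHA image in the direction that most increases the
    recognizer's loss, bounded by eps per pixel (FGSM)."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adversarial = image + eps * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()

# Toy usage with a stand-in recognizer; a real CAPTCHA solver would take its place.
recognizer = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 36))
img = torch.rand(1, 3, 32, 32)                 # a character crop with pixels in [0, 1]
adv = fgsm_captcha(recognizer, img, torch.tensor([7]))
print(float((adv - img).abs().max()))          # perturbation stays within eps
```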

Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model

  • paper_url: http://arxiv.org/abs/2311.13231
  • repo_url: https://github.com/yk7333/d3po
  • paper_authors: Kai Yang, Jian Tao, Jiafei Lyu, Chunjiang Ge, Jiaxin Chen, Qimai Li, Weihan Shen, Xiaolong Zhu, Xiu Li
  • for: fine-tuning diffusion models with human feedback
  • methods: Direct Preference for Denoising Diffusion Policy Optimization (D3PO) method, eliminates the need for a reward model, using relative scale of objectives as a proxy for human preference
  • results: comparable results to methods using ground-truth rewards, reduces image distortion rates, generates safer images
    Abstract Using reinforcement learning with human feedback (RLHF) has shown significant promise in fine-tuning diffusion models. Previous methods start by training a reward model that aligns with human preferences, then leverage RL techniques to fine-tune the underlying models. However, crafting an efficient reward model demands extensive datasets, optimal architecture, and manual hyperparameter tuning, making the process both time and cost-intensive. The direct preference optimization (DPO) method, effective in fine-tuning large language models, eliminates the necessity for a reward model. However, the extensive GPU memory requirement of the diffusion model's denoising process hinders the direct application of the DPO method. To address this issue, we introduce the Direct Preference for Denoising Diffusion Policy Optimization (D3PO) method to directly fine-tune diffusion models. The theoretical analysis demonstrates that although D3PO omits training a reward model, it effectively functions as the optimal reward model trained using human feedback data to guide the learning process. This approach requires no training of a reward model, proving to be more direct, cost-effective, and minimizing computational overhead. In experiments, our method uses the relative scale of objectives as a proxy for human preference, delivering comparable results to methods using ground-truth rewards. Moreover, D3PO demonstrates the ability to reduce image distortion rates and generate safer images, overcoming challenges lacking robust reward models. Our code is publicly available in https://github.com/yk7333/D3PO/tree/main.
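
The reward-model-free idea can be illustrated with a DPO-style preference loss over pairs of generations ranked by a human. The sketch below works with trajectory-level log-probabilities and a frozen reference model, which is a simplification of D3PO's per-denoising-step formulation.

```python
import torch
import torch.nn.functional as F

def preference_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """DPO-style objective: push the model to assign relatively higher likelihood
    (versus a frozen reference) to the human-preferred sample of each pair."""
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    return -F.logsigmoid(margin).mean()

# Toy usage: summed log-probabilities of the denoising steps for 4 preference pairs.
logp_w = torch.randn(4, requires_grad=True)
logp_l = torch.randn(4, requires_grad=True)
ref_w, ref_l = torch.randn(4), torch.randn(4)
preference_loss(logp_w, logp_l, ref_w, ref_l).backward()
```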

Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus

  • paper_url: http://arxiv.org/abs/2311.13230
  • repo_url: https://github.com/zthang/focus
  • paper_authors: Tianhang Zhang, Lin Qiu, Qipeng Guo, Cheng Deng, Yue Zhang, Zheng Zhang, Chenghu Zhou, Xinbing Wang, Luoyi Fu
  • for: Investigates how to detect hallucinations in LLMs in order to improve their reliability in real-world applications.
  • methods: Proposes a novel reference-free, uncertainty-based method that imitates human focus in fact-checking from three aspects: 1) the most informative and important keywords in the text; 2) unreliable tokens in the historical context that may trigger a cascade of hallucinations; and 3) token properties such as token type and token frequency.
  • results: Experiments show the method accurately detects hallucinations and achieves state-of-the-art performance across all evaluation metrics without requiring any additional information, improving the practical reliability of LLMs.
    Abstract Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields. However, LLMs are prone to hallucinate untruthful or nonsensical outputs that fail to meet user expectations in many real-world applications. Existing works for detecting hallucinations in LLMs either rely on external knowledge for reference retrieval or require sampling multiple responses from the LLM for consistency verification, making these methods costly and inefficient. In this paper, we propose a novel reference-free, uncertainty-based method for detecting hallucinations in LLMs. Our approach imitates human focus in factuality checking from three aspects: 1) focus on the most informative and important keywords in the given text; 2) focus on the unreliable tokens in historical context which may lead to a cascade of hallucinations; and 3) focus on the token properties such as token type and token frequency. Experimental results on relevant datasets demonstrate the effectiveness of our proposed method, which achieves state-of-the-art performance across all the evaluation metrics and eliminates the need for additional information.
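
A minimal sketch of keyword-focused, uncertainty-based scoring: each token's negative log-probability is weighted by how informative the token is, so low confidence on keywords dominates the claim-level score. The weighting convention below is an assumption for illustration, not the paper's formulation.

```python
import math

def hallucination_score(tokens, token_probs, keyword_weights, default_weight=0.1):
    """Weighted average of per-token surprisal, emphasizing informative keywords."""
    weighted, total = 0.0, 0.0
    for tok, p in zip(tokens, token_probs):
        w = keyword_weights.get(tok, default_weight)
        weighted += w * -math.log(max(p, 1e-12))
        total += w
    return weighted / max(total, 1e-12)

# Toy usage: the model is confident about function words but unsure about a fact.
tokens = ["The", "Eiffel", "Tower", "is", "in", "Rome"]
probs  = [0.99, 0.95, 0.97, 0.99, 0.98, 0.05]
weights = {"Eiffel": 1.0, "Tower": 1.0, "Rome": 1.0}
print(round(hallucination_score(tokens, probs, weights), 3))   # high value flags a likely hallucination
```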

Robot at the Mirror: Learning to Imitate via Associating Self-supervised Models

  • paper_url: http://arxiv.org/abs/2311.13226
  • repo_url: https://github.com/andylucny/learningimitation
  • paper_authors: Andrej Lúčny, Kristína Malinovská, Igor Farkaš
  • for: Aims to build a custom model from ready-made self-supervised models by associating them, without any training or fine-tuning.
  • methods: Maps the latent spaces of pre-prepared models onto each other by associating pairs of feature vectors, collected through the robot's sample-efficient self-exploration at the mirror and implemented in the same way as the key-value mechanism of transformer models.
  • results: The robot builds a 3D pose detector whose quality is immediately perfect on the acquired samples, without large datasets or long training; deploying the model to a simulated robot allows its hyperparameters to be tuned and systematically evaluated.
    Abstract We introduce an approach to building a custom model from ready-made self-supervised models via their associating instead of training and fine-tuning. We demonstrate it with an example of a humanoid robot looking at the mirror and learning to detect the 3D pose of its own body from the image it perceives. To build our model, we first obtain features from the visual input and the postures of the robot's body via models prepared before the robot's operation. Then, we map their corresponding latent spaces by a sample-efficient robot's self-exploration at the mirror. In this way, the robot builds the solicited 3D pose detector, which quality is immediately perfect on the acquired samples instead of obtaining the quality gradually. The mapping, which employs associating the pairs of feature vectors, is then implemented in the same way as the key-value mechanism of the famous transformer models. Finally, deploying our model for imitation to a simulated robot allows us to study, tune up, and systematically evaluate its hyperparameters without the involvement of the human counterpart, advancing our previous research.
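
The association step can be pictured as transformer-style key-value retrieval between the two latent spaces: stored image features act as keys, the postures recorded at the same moments act as values, and a new image feature reads out a posture as a similarity-weighted mixture. The cosine similarity and temperature below are assumed choices for this sketch.

```python
import numpy as np

def associate(query_feat, keys, values, temperature=0.1):
    """Key-value readout: no gradient training, just similarity-weighted retrieval."""
    q = query_feat / np.linalg.norm(query_feat)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    attn = np.exp((k @ q) / temperature)
    attn /= attn.sum()
    return attn @ values

# Toy usage: 50 stored (image-feature, body-posture) pairs from mirror self-exploration.
rng = np.random.default_rng(0)
keys = rng.standard_normal((50, 128))         # visual features of the robot's mirror image
values = rng.standard_normal((50, 17 * 3))    # 17 joints x 3D coordinates at the same moments
pose = associate(rng.standard_normal(128), keys, values)
print(pose.shape)                             # (51,)
```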
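
The key-value association between the two latent spaces can be illustrated with a small sketch. This is not the authors' code; the cosine normalization and softmax temperature are assumptions, and the feature vectors are presumed to come from the pre-trained visual and posture models.

```python
import numpy as np

class AssociativeMemory:
    """Key-value association between two latent spaces, in the spirit of transformer
    attention: keys are visual features, values are the paired posture features."""

    def __init__(self, temperature=0.1):
        self.keys, self.values = [], []
        self.temperature = temperature

    def associate(self, visual_feat, posture_feat):
        # one self-exploration sample at the mirror adds one key-value pair
        self.keys.append(visual_feat / np.linalg.norm(visual_feat))
        self.values.append(posture_feat)

    def query(self, visual_feat):
        K = np.stack(self.keys)                      # (N, d_visual)
        V = np.stack(self.values)                    # (N, d_posture)
        q = visual_feat / np.linalg.norm(visual_feat)
        attn = np.exp(K @ q / self.temperature)      # similarity-based attention weights
        attn /= attn.sum()
        return attn @ V                              # soft lookup of the associated posture
```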

Artificial Intelligence in the Service of Entrepreneurial Finance: Knowledge Structure and the Foundational Algorithmic Paradigm

  • paper_url: http://arxiv.org/abs/2311.13213
  • repo_url: None
  • paper_authors: Robert Kudelić, Tamara Šmaguc, Sherry Robinson
  • For: The paper explores the potential of Artificial Intelligence in Entrepreneurship, specifically in the field of Entrepreneurial Finance.
  • Methods: A bibliometric review of relevant journal articles is used to analyze the current state of research in the field, covering methods such as Artificial Neural Networks, Deep Neural Networks, Support Vector Machines, Topic Modeling, Fuzzy Neural Networks, and Growing Hierarchical Self-organizing Maps.
  • Results: The paper identifies nascent and underdeveloped research directions in the field, and finds that Artificial Neural Networks, Deep Neural Networks, and Support Vector Machines are highly represented in almost all identified topic niches, while the use of other methods such as Topic Modeling, Fuzzy Neural Networks, and Growing Hierarchical Self-organizing Maps is quite rare.
    Abstract While the application of Artificial Intelligence in Finance has a long tradition, its potential in Entrepreneurship has been intensively explored only recently. In this context, Entrepreneurial Finance is a particularly fertile ground for future Artificial Intelligence proliferation. To support the latter, the study provides a bibliometric review of Artificial Intelligence applications in (1) entrepreneurial finance literature, and (2) corporate finance literature with implications for Entrepreneurship. Rigorous search and screening procedures of the scientific database Web of Science Core Collection resulted in the identification of 1890 relevant journal articles subjected to analysis. The bibliometric analysis gives a rich insight into the knowledge field's conceptual, intellectual, and social structure, indicating nascent and underdeveloped research directions. As far as we were able to identify, this is the first study to map and bibliometrically analyze the academic field concerning the relationship between Artificial Intelligence, Entrepreneurship, and Finance, and the first review that deals with Artificial Intelligence methods in Entrepreneurship. According to the results, Artificial Neural Network, Deep Neural Network and Support Vector Machine are highly represented in almost all identified topic niches. At the same time, applying Topic Modeling, Fuzzy Neural Network and Growing Hierarchical Self-organizing Map is quite rare. As an element of the research, and before final remarks, the article deals as well with a discussion of certain gaps in the relationship between Computer Science and Economics. These gaps do represent problems in the application of Artificial Intelligence in Economic Science. As a way to at least in part remedy this situation, the foundational paradigm and the bespoke demonstration of the Monte Carlo randomized algorithm are presented.

Breast Cancer classification by adaptive weighted average ensemble of previously trained models

  • paper_url: http://arxiv.org/abs/2311.13206
  • repo_url: None
  • paper_authors: Mosab S. M. Farea, zhe chen
  • for: Research on techniques for detecting breast cancer from histopathology images.
  • methods: Within a CAD system, an adaptive weighted average ensemble is applied to already fully trained models, each weighted according to its accuracy; this differs from the common approach in the literature, where the average ensemble is formed before training and trained simultaneously (a minimal sketch follows this entry).
  • results: The evaluation metrics improve: accuracy reaches 98%, one percentage point above the best participating model (97%), and the numbers of false positives and false negatives are reduced.
    Abstract Breast cancer is a serious disease that inflicts millions of people each year, and the number of cases is increasing. Early detection is the best way to reduce the impact of the disease. Researchers have developed many techniques to detect breast cancer, including the use of histopathology images in CAD systems. This research proposes a technique that combine already fully trained model using adaptive average ensemble, this is different from the literature which uses average ensemble before training and the average ensemble is trained simultaneously. Our approach is different because it used adaptive average ensemble after training which has increased the performance of evaluation metrics. It averages the outputs of every trained model, and every model will have weight according to its accuracy. The accuracy in the adaptive weighted ensemble model has achieved 98% where the accuracy has increased by 1 percent which is better than the best participating model in the ensemble which was 97%. Also, it decreased the numbers of false positive and false negative and enhanced the performance metrics.
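
The adaptive weighted average ensemble described above reduces to a few lines once the participating models are trained. The sketch below is illustrative only: normalizing the accuracy-based weights so they sum to one is an assumption, not necessarily the paper's exact weighting rule.

```python
import numpy as np

def adaptive_weighted_ensemble(probs_per_model, val_accuracies):
    """Combine already fully trained classifiers after training: each model's predicted
    class probabilities are averaged with a weight proportional to its accuracy.

    probs_per_model: list of arrays, each (n_samples, n_classes), from one trained model.
    val_accuracies:  list of floats, one held-out accuracy per model.
    """
    w = np.asarray(val_accuracies, dtype=float)
    w = w / w.sum()                                    # accuracy-proportional weights
    stacked = np.stack(probs_per_model)                # (n_models, n_samples, n_classes)
    ensemble_probs = np.tensordot(w, stacked, axes=1)  # weighted average of outputs
    return ensemble_probs.argmax(axis=1), ensemble_probs
```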

Cracking the Code of Negative Transfer: A Cooperative Game Theoretic Approach for Cross-Domain Sequential Recommendation

  • paper_url: http://arxiv.org/abs/2311.13188
  • repo_url: None
  • paper_authors: Chung Park, Taesan Kim, Taekyoon Choi, Junui Hong, Yelim Yu, Mincheol Cho, Kyunam Lee, Sungil Ryu, Hyungjun Yoon, Minsung Choi, Jaegul Choo
  • for: The paper studies Cross-Domain Sequential Recommendation (CDSR), which uses information from multiple domains (more than three) to generate accurate and diverse recommendations while accounting for the sequential nature of user interactions.
  • methods: A new CDSR framework addresses negative transfer by estimating how much negative transfer each domain causes and adaptively assigning low weights to the corresponding prediction losses; the amount of negative transfer is measured as each domain's marginal contribution to model performance under a cooperative game formulation (see the sketch after this entry). A hierarchical contrastive learning approach additionally combines coarse-level category sequences with fine-grained (item-level) sequences to further mitigate negative transfer.
  • results: The model outperforms prior work on two real-world datasets spanning ten domains.
    Abstract This paper investigates Cross-Domain Sequential Recommendation (CDSR), a promising method that uses information from multiple domains (more than three) to generate accurate and diverse recommendations, and takes into account the sequential nature of user interactions. The effectiveness of these systems often depends on the complex interplay among the multiple domains. In this dynamic landscape, the problem of negative transfer arises, where heterogeneous knowledge between dissimilar domains leads to performance degradation due to differences in user preferences across these domains. As a remedy, we propose a new CDSR framework that addresses the problem of negative transfer by assessing the extent of negative transfer from one domain to another and adaptively assigning low weight values to the corresponding prediction losses. To this end, the amount of negative transfer is estimated by measuring the marginal contribution of each domain to model performance based on a cooperative game theory. In addition, a hierarchical contrastive learning approach that incorporates information from the sequence of coarse-level categories into that of fine-level categories (e.g., item level) when implementing contrastive learning was developed to mitigate negative transfer. Despite the potentially low relevance between domains at the fine-level, there may be higher relevance at the category level due to its generalised and broader preferences. We show that our model is superior to prior works in terms of model performance on two real-world datasets across ten different domains.
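
A cooperative-game estimate of each domain's marginal contribution can be sketched with Monte Carlo permutation sampling, as below. This is only an illustration of the idea: `eval_perf` (a callable returning a validation metric for a coalition of domains), the permutation count, and the softmax temperature used to turn contributions into loss weights are assumptions, and the paper's actual estimator and weighting scheme may differ.

```python
import random
import numpy as np

def negative_transfer_weights(domains, eval_perf, n_perm=200, temperature=1.0):
    """Estimate each domain's marginal contribution to target performance
    (a Shapley-style value) and convert it into prediction-loss weights:
    domains with negative contributions receive small weights."""
    contrib = {d: 0.0 for d in domains}
    for _ in range(n_perm):                        # Monte Carlo permutation sampling
        order = random.sample(domains, len(domains))
        coalition, prev = [], eval_perf(frozenset())
        for d in order:
            coalition.append(d)
            cur = eval_perf(frozenset(coalition))
            contrib[d] += (cur - prev) / n_perm    # marginal gain of adding domain d
            prev = cur
    phi = np.array([contrib[d] for d in domains])
    weights = np.exp(phi / temperature)            # low weight when contribution is negative
    return dict(zip(domains, weights / weights.sum()))
```

In practice `eval_perf` would be an inexpensive surrogate (e.g., a validation metric of a lightly trained model), since evaluating every sampled coalition with full training would be prohibitive.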

ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization

  • paper_url: http://arxiv.org/abs/2311.13171
  • repo_url: https://github.com/prateeky2806/compeft
  • paper_authors: Prateek Yadav, Leshem Choshen, Colin Raffel, Mohit Bansal
  • for: Compressing parameter-efficient fine-tuning (PEFT) modules so that expert models are cheap to communicate and multiple experts can be served on a single GPU.
  • methods: Sparsification and ternary quantization shrink the fine-tuning residuals (task vectors) of PEFT-based models without any additional retraining, while preserving or even improving performance (a compression sketch follows this entry).
  • results: On T5-, T0-, and LLaMA-based models with 200M-65B parameters, compression ratios of 8x-50x are achieved, and stronger models exhibit higher compressibility and better performance.
    Abstract Parameter-efficient fine-tuning (PEFT) techniques make it possible to efficiently adapt a language model to create "expert" models that specialize to new tasks or domains. Recent techniques in model merging and compositional generalization leverage these expert models by dynamically composing modules to improve zero/few-shot generalization. Despite the efficiency of PEFT methods, the size of expert models can make it onerous to retrieve expert models per query over high-latency networks like the Internet or serve multiple experts on a single GPU. To address these issues, we present ComPEFT, a novel method for compressing fine-tuning residuals (task vectors) of PEFT based models. ComPEFT employs sparsification and ternary quantization to reduce the size of the PEFT module without performing any additional retraining while preserving or enhancing model performance. In extensive evaluation across T5, T0, and LLaMA-based models with 200M - 65B parameters, ComPEFT achieves compression ratios of 8x - 50x. In particular, we show that ComPEFT improves with scale - stronger models exhibit higher compressibility and better performance. For example, we show that ComPEFT applied to LLaMA outperforms QLoRA by 4.16% on MMLU with a storage size reduction of up to 26x. In addition, we show that the compressed experts produced by ComPEFT maintain few-shot compositional generalization capabilities, facilitate efficient communication and computation, and exhibit enhanced performance when merged. Lastly, we provide an analysis of different method components, compare it with other PEFT methods, and test ComPEFT's efficacy for compressing the residual of full-finetuning. Our code is available at https://github.com/prateeky2806/compeft.
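
Sparsification plus ternary quantization of a task vector can be sketched as follows. The density level and the use of the mean absolute value of the retained entries as the single shared magnitude are assumptions made for illustration, not ComPEFT's exact recipe.

```python
import torch

def compress_task_vector(finetuned, base, density=0.05):
    """Compress a fine-tuning residual (task vector) for one parameter tensor by
    top-k sparsification followed by ternary quantization into {-alpha, 0, +alpha}."""
    tau = finetuned - base                                    # task vector
    k = max(1, int(density * tau.numel()))
    thresh = tau.abs().flatten().kthvalue(tau.numel() - k + 1).values
    mask = tau.abs() >= thresh                                # keep only the top-k magnitudes
    alpha = tau[mask].abs().mean()                            # single shared magnitude (assumption)
    ternary = torch.sign(tau) * mask                          # values in {-1, 0, +1}
    return ternary.to(torch.int8), alpha                      # compact, communication-friendly form

def decompress_task_vector(ternary, alpha, base):
    """Reconstruct an (approximate) expert parameter tensor from the compressed form."""
    return base + alpha * ternary.float()
```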

SiGeo: Sub-One-Shot NAS via Information Theory and Geometry of Loss Landscape

  • paper_url: http://arxiv.org/abs/2311.13169
  • repo_url: None
  • paper_authors: Hua Zheng, Kuang-Hung Liu, Igor Fedorov, Xin Zhang, Wen-Yen Chen, Wei Wen
  • for: automating neural network design, particularly in complex domains like RecSys
  • methods: combines the strengths of zero-shot and one-shot NAS through a "sub-one-shot" paradigm, in which the supernet is warmed up on only a small subset of the training data before candidate architectures are scored with SiGeo, a proxy built on a novel theoretical framework connecting supernet warm-up with proxy efficacy (a workflow sketch follows this entry)
  • results: outperforms state-of-the-art NAS proxies on various established NAS benchmarks, with a significant reduction in computational costs compared to weight-sharing one-shot NAS methods
    Abstract Neural Architecture Search (NAS) has become a widely used tool for automating neural network design. While one-shot NAS methods have successfully reduced computational requirements, they often require extensive training. On the other hand, zero-shot NAS utilizes training-free proxies to evaluate a candidate architecture's test performance but has two limitations: (1) inability to use the information gained as a network improves with training and (2) unreliable performance, particularly in complex domains like RecSys, due to the multi-modal data inputs and complex architecture configurations. To synthesize the benefits of both methods, we introduce a "sub-one-shot" paradigm that serves as a bridge between zero-shot and one-shot NAS. In sub-one-shot NAS, the supernet is trained using only a small subset of the training data, a phase we refer to as "warm-up." Within this framework, we present SiGeo, a proxy founded on a novel theoretical framework that connects the supernet warm-up with the efficacy of the proxy. Extensive experiments have shown that SiGeo, with the benefit of warm-up, consistently outperforms state-of-the-art NAS proxies on various established NAS benchmarks. When a supernet is warmed up, it can achieve comparable performance to weight-sharing one-shot NAS methods, but with a significant reduction ($\sim 60$\%) in computational costs.
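
The sub-one-shot workflow (warm up the supernet on a small data subset, then score candidates without further training) can be outlined as below. The SiGeo proxy itself is not reproduced here; `proxy_fn` stands in for it, and `supernet.extract(arch)` is an assumed helper that carves a candidate sub-network out of the supernet.

```python
import itertools
import torch

def sub_one_shot_nas(supernet, warmup_loader, candidates, proxy_fn, warmup_steps=500):
    """Sub-one-shot NAS sketch: short warm-up of the supernet on a small subset,
    followed by training-free scoring of candidate architectures with a proxy."""
    opt = torch.optim.SGD(supernet.parameters(), lr=0.1, momentum=0.9)
    data = itertools.cycle(warmup_loader)             # small warm-up subset, reused if needed
    for _ in range(warmup_steps):                     # "warm-up" phase
        x, y = next(data)
        loss = torch.nn.functional.cross_entropy(supernet(x), y)
        opt.zero_grad(); loss.backward(); opt.step()

    scores = {}
    for arch in candidates:                           # no additional training per candidate
        subnet = supernet.extract(arch)               # assumed helper on the supernet
        scores[arch] = proxy_fn(subnet)               # e.g. a loss-landscape/geometry proxy
    best = max(scores, key=scores.get)
    return best, scores
```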

Multimodal Large Language Models: A Survey

  • paper_url: http://arxiv.org/abs/2311.13165
  • repo_url: https://github.com/IrohXu/Awesome-Multimodal-LLM-Autonomous-Driving
  • paper_authors: Jiayang Wu, Wensheng Gan, Zefeng Chen, Shicheng Wan, Philip S. Yu
  • for: The paper surveys the development and application of multimodal large language models, which combine data types such as images, text, speech, and audio.
  • methods: It reviews the historical development of multimodal algorithms and the products of major technology companies, and provides a practical guide along with the latest algorithms and commonly used datasets for experimentation and evaluation.
  • results: It summarizes the applications of multimodal models and the challenges in their development, and presents practical application scenarios that illustrate their potential across domains.
    Abstract The exploration of multimodal language models integrates multiple data types, such as images, text, language, audio, and other heterogeneity. While the latest large language models excel in text-based tasks, they often struggle to understand and process other data types. Multimodal models address this limitation by combining various modalities, enabling a more comprehensive understanding of diverse data. This paper begins by defining the concept of multimodal and examining the historical development of multimodal algorithms. Furthermore, we introduce a range of multimodal products, focusing on the efforts of major technology companies. A practical guide is provided, offering insights into the technical aspects of multimodal models. Moreover, we present a compilation of the latest algorithms and commonly used datasets, providing researchers with valuable resources for experimentation and evaluation. Lastly, we explore the applications of multimodal models and discuss the challenges associated with their development. By addressing these aspects, this paper aims to facilitate a deeper understanding of multimodal models and their potential in various domains.

Large Language Models in Education: Vision and Opportunities

  • paper_url: http://arxiv.org/abs/2311.13160
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Wensheng Gan, Zhenlian Qi, Jiayang Wu, Jerry Chun-Wei Lin
  • for: The study surveys and summarizes the application of large language models (LLMs) in smart education.
  • methods: The article first introduces the research background and motivation of LLMs, then discusses the relationship between digital education and smart education, and summarizes the current research status of educational large models (EduLLMs).
  • results: It provides a systematic summary and vision that helps educators, researchers, and policy-makers understand the potential and challenges of LLM4Edu, together with guidance and ideas for further developing and applying LLM4Edu.
    Abstract With the rapid development of artificial intelligence technology, large language models (LLMs) have become a hot research topic. Education plays an important role in human social development and progress. Traditional education faces challenges such as individual student differences, insufficient allocation of teaching resources, and assessment of teaching effectiveness. Therefore, the applications of LLMs in the field of digital/smart education have broad prospects. The research on educational large models (EduLLMs) is constantly evolving, providing new methods and approaches to achieve personalized learning, intelligent tutoring, and educational assessment goals, thereby improving the quality of education and the learning experience. This article aims to investigate and summarize the application of LLMs in smart education. It first introduces the research background and motivation of LLMs and explains the essence of LLMs. It then discusses the relationship between digital education and EduLLMs and summarizes the current research status of educational large models. The main contributions are the systematic summary and vision of the research background, motivation, and application of large models for education (LLM4Edu). By reviewing existing research, this article provides guidance and insights for educators, researchers, and policy-makers to gain a deep understanding of the potential and challenges of LLM4Edu. It further provides guidance for further advancing the development and application of LLM4Edu, while still facing technical, ethical, and practical challenges requiring further research and exploration.

HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data

  • paper_url: http://arxiv.org/abs/2311.13614
  • repo_url: https://github.com/yuqifan1117/hallucidoctor
  • paper_authors: Qifan Yu, Juncheng Li, Longhui Wei, Liang Pang, Wentao Ye, Bosheng Qin, Siliang Tang, Qi Tian, Yueting Zhuang
  • for: The work investigates the hallucinations (object, relation, and attribute hallucinations) that machine-generated visual instruction data induces in multi-modal large language models (MLLMs), and aims to mitigate this hallucinatory toxicity.
  • methods: A cross-checking-based detection and elimination framework, HalluciDoctor, automatically identifies and removes hallucinations from the training data (a cross-checking sketch follows this entry), and a counterfactual visual instruction expansion rebalances the long-tail object co-occurrences that contribute to hallucinations.
  • results: HalluciDoctor mitigates 44.6% of hallucinations relatively while maintaining performance competitive with LLaVA.
    Abstract Multi-modal Large Language Models (MLLMs) tuned on machine-generated instruction-following data have demonstrated remarkable performance in various multi-modal understanding and generation tasks. However, the hallucinations inherent in machine-generated data, which could lead to hallucinatory outputs in MLLMs, remain under-explored. This work aims to investigate various hallucinations (i.e., object, relation, attribute hallucinations) and mitigate those hallucinatory toxicities in large-scale machine-generated visual instruction datasets. Drawing on the human ability to identify factual errors, we present a novel hallucination detection and elimination framework, HalluciDoctor, based on the cross-checking paradigm. We use our framework to identify and eliminate hallucinations in the training data automatically. Interestingly, HalluciDoctor also indicates that spurious correlations arising from long-tail object co-occurrences contribute to hallucinations. Based on that, we execute counterfactual visual instruction expansion to balance data distribution, thereby enhancing MLLMs' resistance to hallucinations. Comprehensive experiments on hallucination evaluation benchmarks show that our method successfully mitigates 44.6% hallucinations relatively and maintains competitive performance compared to LLaVA.The source code will be released at \url{https://github.com/Yuqifan1117/HalluciDoctor}.
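
One way to picture the cross-checking paradigm is to keep only those machine-generated sentences whose mentioned objects are supported by an independent source (for example, dense captions or detector outputs). The sketch below is a simplification covering only object hallucinations; the answer-extraction and consistency steps of the full HalluciDoctor framework are omitted.

```python
def cross_check_instruction(caption_objects, detected_objects, instruction_sentences):
    """Minimal cross-checking pass over machine-generated visual instruction data.

    caption_objects / detected_objects: sets of object names from independent tools.
    instruction_sentences: list of (sentence, mentioned_objects) pairs.
    Returns the sentences that survive the check and those flagged as hallucinatory.
    """
    supported = caption_objects | detected_objects
    cleaned, removed = [], []
    for sentence, mentions in instruction_sentences:
        if all(obj in supported for obj in mentions):
            cleaned.append(sentence)
        else:
            removed.append(sentence)        # likely object hallucination: unsupported mention
    return cleaned, removed
```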

Building the Future of Responsible AI: A Reference Architecture for Designing Large Language Model based Agents

  • paper_url: http://arxiv.org/abs/2311.13148
  • repo_url: None
  • paper_authors: Qinghua Lu, Liming Zhu, Xiwei Xu, Zhenchang Xing, Stefan Harrer, Jon Whittle
  • for: The paper provides design guidance for foundation-model-based autonomous agents so that responsible AI is built in by design.
  • methods: A pattern-oriented reference architecture serves as the design guidance, and it is mapped to two real-world foundation-model-based agents to evaluate its completeness and utility.
  • results: The proposed pattern-oriented reference architecture offers responsible-AI-by-design guidance for foundation-model-based agents, and the mapping to the two real-world agents demonstrates its completeness and utility.
    Abstract Large language models (LLMs) have been widely recognised as transformative artificial generative intelligence (AGI) technologies due to their capabilities to understand and generate content, including plans with reasoning capabilities. Foundation model based agents derive their autonomy from the capabilities of foundation models, which enable them to autonomously break down a given goal into a set of manageable tasks and orchestrate task execution to meet the goal. Despite the huge efforts put into building foundation model based autonomous agents, the architecture design of the agents has not yet been systematically explored. Also, while there are significant benefits of using autonomous agents for planning and execution, there are serious considerations regarding responsible AI related software quality attributes, such as security and accountability. Therefore, this paper presents a pattern-oriented reference architecture that serves as architecture design guidance and enables responsible-AI-by-design when designing foundation model based autonomous agents. We evaluate the completeness and utility of the proposed reference architecture by mapping it to the architecture of two real-world agents.

LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms

  • paper_url: http://arxiv.org/abs/2311.13133
  • repo_url: https://github.com/97aditi/LIMIT
  • paper_authors: Aditi Jha, Sam Havens, Jeremey Dohmann, Alex Trott, Jacob Portes
  • for: The study asks whether a small amount of diverse finetuning samples can improve performance on both traditional perplexity-based NLP benchmarks and open-ended, model-based evaluation.
  • methods: Open-source MPT-7B and MPT-30B models are finetuned on instruction finetuning datasets of various sizes, ranging from 1k to 60k samples (a data-mixing sketch follows this entry).
  • results: Subsets of 1k-6k instruction finetuning samples are sufficient to achieve good performance on both traditional NLP benchmarks and model-based evaluation, and mixing textbook-style and open-ended QA finetuning datasets optimizes performance on both evaluation paradigms.
    Abstract Large Language Models are traditionally finetuned on large instruction datasets. However recent studies suggest that small, high-quality datasets can suffice for general purpose instruction following. This lack of consensus surrounding finetuning best practices is in part due to rapidly diverging approaches to LLM evaluation. In this study, we ask whether a small amount of diverse finetuning samples can improve performance on both traditional perplexity-based NLP benchmarks, and on open-ended, model-based evaluation. We finetune open-source MPT-7B and MPT-30B models on instruction finetuning datasets of various sizes ranging from 1k to 60k samples. We find that subsets of 1k-6k instruction finetuning samples are sufficient to achieve good performance on both (1) traditional NLP benchmarks and (2) model-based evaluation. Finally, we show that mixing textbook-style and open-ended QA finetuning datasets optimizes performance on both evaluation paradigms.
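
Assembling a small mixed instruction set in the 1k-6k range is straightforward; a sketch follows. The JSONL file format, the 50/50 mixing ratio, and the file-path arguments are assumptions made for illustration, not the paper's exact recipe.

```python
import json
import random

def build_small_mixture(textbook_path, open_qa_path, n_total=5000, qa_fraction=0.5, seed=0):
    """Build a small instruction-finetuning set mixing textbook-style examples
    with open-ended QA examples, then shuffle them for training."""
    rng = random.Random(seed)
    with open(textbook_path) as f:
        textbook = [json.loads(line) for line in f]     # one {"prompt":..., "response":...} per line
    with open(open_qa_path) as f:
        open_qa = [json.loads(line) for line in f]

    n_qa = int(n_total * qa_fraction)
    mixture = rng.sample(open_qa, n_qa) + rng.sample(textbook, n_total - n_qa)
    rng.shuffle(mixture)
    return mixture    # feed to the usual instruction-finetuning pipeline
```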

Conditions for Length Generalization in Learning Reasoning Skills

  • paper_url: http://arxiv.org/abs/2311.16173
  • repo_url: None
  • paper_authors: Changnan Xiao, Bing Liu
  • for: The paper studies the theoretical foundations of the reasoning capabilities of AI agents, focusing on length generalization.
  • methods: Motivated by evaluations of large language models (LLMs) on reasoning tasks, it analyzes reasoning tasks that can be formulated as Markov dynamic processes (MDPs) and/or directed acyclic graphs (DAGs), and identifies and proves conditions that decide whether length generalization is achievable for a task in a given representation; experiments verify the theoretical results.
  • results: The analysis characterizes the length generalization problem: models trained on reasoning problems of smaller lengths or sizes struggle with larger ones, which may indicate theoretical limits on generalization when learning reasoning skills.
    Abstract Reasoning is a fundamental capability of AI agents. Recently, large language models (LLMs) have shown remarkable abilities to perform reasoning tasks. However, numerous evaluations of the reasoning capabilities of LLMs have also showed some limitations. An outstanding limitation is length generalization, meaning that when trained on reasoning problems of smaller lengths or sizes, the resulting models struggle with problems of larger sizes or lengths. This potentially indicates some theoretical limitations of generalization in learning reasoning skills. These evaluations and their observations motivated us to perform a theoretical study of the length generalization problem. This work focused on reasoning tasks that can be formulated as Markov dynamic processes (MDPs) and/or directed acyclic graphs (DAGs). It identifies and proves conditions that decide whether the length generalization problem can be solved or not for a reasoning task in a particular representation. Experiments are also conducted to verify the theoretical results.

Toward Robust Imperceptible Perturbation against Unauthorized Text-to-image Diffusion-based Synthesis

  • paper_url: http://arxiv.org/abs/2311.13127
  • repo_url: https://github.com/liuyixin-louis/metacloak
  • paper_authors: Yixin Liu, Chenrui Fan, Yutong Dai, Xun Chen, Pan Zhou, Lichao Sun
  • For: Protecting personal images from being used by text-to-image generation models to fabricate misleading or harmful content, improving the safety of content generation.
  • Methods: A meta-learning framework with an additional transformation-sampling process replaces the hand-crafted heuristics of prior poisoning approaches, crafting transferable and robust perturbations from a pool of surrogate diffusion models with a denoising-error maximization loss (a hedged sketch follows this entry).
  • Results: On the VGGFace2 and CelebA-HQ datasets, MetaCloak outperforms prior methods and successfully fools online training services such as Replicate in a black-box manner, demonstrating its effectiveness in real-world scenarios.
    Abstract Text-to-image diffusion models allow seamless generation of personalized images from scant reference photos. Yet, these tools, in the wrong hands, can fabricate misleading or harmful content, endangering individuals. To address this problem, existing poisoning-based approaches perturb user images in an imperceptible way to render them "unlearnable" from malicious uses. We identify two limitations of these defending approaches: i) sub-optimal due to the hand-crafted heuristics for solving the intractable bilevel optimization and ii) lack of robustness against simple data transformations like Gaussian filtering. To solve these challenges, we propose MetaCloak, which solves the bi-level poisoning problem with a meta-learning framework with an additional transformation sampling process to craft transferable and robust perturbation. Specifically, we employ a pool of surrogate diffusion models to craft transferable and model-agnostic perturbation. Furthermore, by incorporating an additional transformation process, we design a simple denoising-error maximization loss that is sufficient for causing transformation-robust semantic distortion and degradation in a personalized generation. Extensive experiments on the VGGFace2 and CelebA-HQ datasets show that MetaCloak outperforms existing approaches. Notably, MetaCloak can successfully fool online training services like Replicate, in a black-box manner, demonstrating the effectiveness of MetaCloak in real-world scenarios. Our code is available at https://github.com/liuyixin-louis/MetaCloak.
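
The core idea (craft a perturbation that maximizes the denoising error of surrogate diffusion models under sampled data transformations) can be sketched as below. This is a heavily simplified illustration: `model.num_timesteps`, `model.q_sample`, and `model.predict_noise` are assumed interfaces of the surrogate models, and the step sizes, budget, and transformation sampling are placeholders rather than MetaCloak's actual meta-learning procedure.

```python
import torch
import torch.nn.functional as F

def craft_perturbation(image, surrogates, transform, steps=100, eps=8 / 255, lr=1 / 255):
    """Craft an imperceptible perturbation that degrades diffusion-based personalization
    by maximizing the denoising error over a pool of surrogate diffusion models."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        model = surrogates[torch.randint(len(surrogates), (1,)).item()]  # sample a surrogate
        t = torch.randint(0, model.num_timesteps, (1,))                  # random diffusion step
        x = transform(torch.clamp(image + delta, 0, 1))                  # sampled transformation
        noise = torch.randn_like(x)
        noisy = model.q_sample(x, t, noise)                              # forward diffusion (assumed API)
        loss = F.mse_loss(model.predict_noise(noisy, t), noise)          # denoising error
        loss.backward()                                                  # ascend on the error
        with torch.no_grad():
            delta += lr * delta.grad.sign()
            delta.clamp_(-eps, eps)                                      # keep it imperceptible
            delta.grad.zero_()
    return delta.detach()
```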

Combatting Human Trafficking in the Cyberspace: A Natural Language Processing-Based Methodology to Analyze the Language in Online Advertisements

  • paper_url: http://arxiv.org/abs/2311.13118
  • repo_url: None
  • paper_authors: Alejandro Rodriguez Perez, Pablo Rivas
  • for: The study targets the pressing issue of human trafficking in online consumer-to-consumer (C2C) marketplaces, using advanced NLP techniques to combat human exploitation.
  • methods: It introduces a novel methodology for generating pseudo-labeled datasets with minimal supervision, employs cutting-edge Transformer models for tasks such as Human Trafficking Risk Prediction (HTRP) and Organized Activity Detection (OAD), and uses Integrated Gradients for explainable insights (an attribution sketch follows this entry).
  • results: It provides a scalable, machine-learning-driven approach to combat human trafficking, offering a rich resource for training state-of-the-art NLP models and filling a critical gap in the literature.
    Abstract This project tackles the pressing issue of human trafficking in online C2C marketplaces through advanced Natural Language Processing (NLP) techniques. We introduce a novel methodology for generating pseudo-labeled datasets with minimal supervision, serving as a rich resource for training state-of-the-art NLP models. Focusing on tasks like Human Trafficking Risk Prediction (HTRP) and Organized Activity Detection (OAD), we employ cutting-edge Transformer models for analysis. A key contribution is the implementation of an interpretability framework using Integrated Gradients, providing explainable insights crucial for law enforcement. This work not only fills a critical gap in the literature but also offers a scalable, machine learning-driven approach to combat human exploitation online. It serves as a foundation for future research and practical applications, emphasizing the role of machine learning in addressing complex social issues.
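
Token-level explanations with Integrated Gradients, as mentioned above, might look like the following sketch using the Captum library. It assumes the classifier's forward pass accepts input embeddings directly (attributing over embeddings sidesteps the discrete-token issue); the zero baseline and step count are generic defaults, not necessarily those used in the paper.

```python
import torch
from captum.attr import IntegratedGradients

def explain_prediction(model, embeddings, target_class, baseline=None, steps=50):
    """Token-level attribution with Integrated Gradients: how much each token's
    embedding contributed to the predicted class (e.g., a trafficking-risk label).

    model: a trained classifier whose forward pass accepts input embeddings (assumption).
    embeddings: tensor of shape (batch, seq_len, hidden).
    """
    ig = IntegratedGradients(model)
    baseline = torch.zeros_like(embeddings) if baseline is None else baseline
    attributions = ig.attribute(embeddings, baselines=baseline,
                                target=target_class, n_steps=steps)
    return attributions.sum(dim=-1)     # aggregate over the embedding dimension -> per-token scores
```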

PIE-NeRF: Physics-based Interactive Elastodynamics with NeRF

  • paper_url: http://arxiv.org/abs/2311.13099
  • repo_url: None
  • paper_authors: Yutao Feng, Yintong Shang, Xuan Li, Tianjia Shao, Chenfanfu Jiang, Yin Yang
  • for: The work integrates physics-based simulation with NeRF to generate high-quality elastodynamics of real-world objects.
  • methods: Nonlinear hyperelasticity is discretized in a meshless way, with a quadratic generalized moving least squares (Q-GMLS) scheme capturing nonlinear dynamics and large deformation on the implicit model; the least-square kernels are placed adaptively according to the NeRF density field to reduce the complexity of the simulation.
  • results: Physics-based simulation is successfully integrated with NeRF, allowing physically realistic elastodynamic animations of complex and codimensional shapes to be synthesized at interactive rates for a wide range of hyperelastic materials.
    Abstract We show that physics-based simulations can be seamlessly integrated with NeRF to generate high-quality elastodynamics of real-world objects. Unlike existing methods, we discretize nonlinear hyperelasticity in a meshless way, obviating the necessity for intermediate auxiliary shape proxies like a tetrahedral mesh or voxel grid. A quadratic generalized moving least square (Q-GMLS) is employed to capture nonlinear dynamics and large deformation on the implicit model. Such meshless integration enables versatile simulations of complex and codimensional shapes. We adaptively place the least-square kernels according to the NeRF density field to significantly reduce the complexity of the nonlinear simulation. As a result, physically realistic animations can be conveniently synthesized using our method for a wide range of hyperelastic materials at an interactive rate. For more information, please visit our project page at https://fytalon.github.io/pienerf/.

  • paper_url: http://arxiv.org/abs/2311.13095
  • repo_url: None
  • paper_authors: Ha-Thanh Nguyen, Wachara Fungwacharakorn, Ken Satoh
  • For: This paper aims to improve the logical reasoning capabilities of Large Language Models (LLMs) in order to expand their applicability in law and other logic-intensive disciplines.
  • Methods: The proposed Reinforcement Learning from Logical Feedback (RLLF) approach and a revised evaluation methodology are used to refine LLMs' reasoning capacities.
  • Results: The RLLF approach and revised evaluation methodology are shown to be effective in improving LLMs' logical reasoning abilities, opening up new avenues for research in this domain and contributing to the development of LLMs capable of handling complex legal reasoning tasks.
    Abstract Language serves as a vehicle for conveying thought, enabling communication among individuals. The ability to distinguish between diverse concepts, identify fairness and injustice, and comprehend a range of legal notions fundamentally relies on logical reasoning. Large Language Models (LLMs) attempt to emulate human language understanding and generation, but their competency in logical reasoning remains limited. This paper seeks to address the philosophical question: How can we effectively teach logical reasoning to LLMs while maintaining a deep understanding of the intricate relationship between language and logic? By focusing on bolstering LLMs' capabilities in logical reasoning, we aim to expand their applicability in law and other logic-intensive disciplines. To this end, we propose a Reinforcement Learning from Logical Feedback (RLLF) approach, which serves as a potential framework for refining LLMs' reasoning capacities. Through RLLF and a revised evaluation methodology, we explore new avenues for research in this domain and contribute to the development of LLMs capable of handling complex legal reasoning tasks while acknowledging the fundamental connection between language and logic.

On the Limitation of Diffusion Models for Synthesizing Training Datasets

  • paper_url: http://arxiv.org/abs/2311.13090
  • repo_url: None
  • paper_authors: Shin’ya Yamaguchi, Takuma Fukuda
  • for: The study examines whether modern diffusion models can faithfully replicate the training data distribution, and whether the synthetic data they generate can cover that distribution when used to train discriminative models.
  • methods: Real samples are reconstructed through the forward diffusion and reverse processes; varying the time step at which the reverse process starts controls the trade-off between information kept from the original real data and information added by the diffusion model (a reconstruction sketch follows this entry). The reconstructed samples and models trained on them are then assessed.
  • results: As the reverse step increases, the synthetic data concentrate in the modes of the training distribution and struggle to cover its outer edges, indicating that modern diffusion models do not replicate the training data distribution perfectly and that there is room to improve generative modeling for training-dataset replication.
    Abstract Synthetic samples from diffusion models are promising for leveraging in training discriminative models as replications of real training datasets. However, we found that the synthetic datasets degrade classification performance over real datasets even when using state-of-the-art diffusion models. This means that modern diffusion models do not perfectly represent the data distribution for the purpose of replicating datasets for training discriminative tasks. This paper investigates the gap between synthetic and real samples by analyzing the synthetic samples reconstructed from real samples through the diffusion and reverse process. By varying the time steps starting the reverse process in the reconstruction, we can control the trade-off between the information in the original real data and the information added by diffusion models. Through assessing the reconstructed samples and trained models, we found that the synthetic data are concentrated in modes of the training data distribution as the reverse step increases, and thus, they are difficult to cover the outer edges of the distribution. Our findings imply that modern diffusion models are insufficient to replicate training data distribution perfectly, and there is room for the improvement of generative modeling in the replication of training datasets.
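
The reconstruction procedure (diffuse a real image forward to an intermediate timestep, then run the reverse process from there) can be sketched with the `diffusers` library. The checkpoint name is only an example, and images are assumed to be preprocessed to the model's resolution and value range.

```python
import torch
from diffusers import DDPMPipeline

@torch.no_grad()
def reconstruct_from_real(images, t_start, model_id="google/ddpm-cifar10-32"):
    """Diffuse real images forward to timestep t_start, then run the reverse process
    from there; a larger t_start means more of the sample is re-generated by the
    diffusion model and less is inherited from the original real data."""
    pipe = DDPMPipeline.from_pretrained(model_id)
    unet, scheduler = pipe.unet, pipe.scheduler

    noise = torch.randn_like(images)
    t = torch.full((images.shape[0],), t_start, dtype=torch.long)
    x = scheduler.add_noise(images, noise, t)            # forward diffusion up to t_start
    for step in scheduler.timesteps:                     # timesteps run from high to low
        if step > t_start:
            continue                                     # only reverse from t_start downwards
        pred = unet(x, step).sample
        x = scheduler.step(pred, step, x).prev_sample    # one reverse denoising step
    return x.clamp(-1, 1)
```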

Predict-Then-Optimize by Proxy: Learning Joint Models of Prediction and Optimization

  • paper_url: http://arxiv.org/abs/2311.13087
  • repo_url: None
  • paper_authors: James Kotary, Vincenzo Di Vito, Jacob Christopher, Pascal Van Hentenryck, Ferdinando Fioretto
  • for: The paper proposes learning the optimal solutions of optimization problems directly from observable features, as an alternative that fits within the classic Predict-Then-Optimize setting.
  • methods: Predictive models, adapted from the Learning-to-Optimize paradigm, are trained to map features to optimal solutions, so that the optimization problem does not have to be solved and differentiated inside the training loop (a minimal training sketch follows this entry).
  • results: Experiments show that several Learning-to-Optimize methods provide efficient, accurate, and flexible solutions to an array of challenging Predict-Then-Optimize problems.
    Abstract Many real-world decision processes are modeled by optimization problems whose defining parameters are unknown and must be inferred from observable data. The Predict-Then-Optimize framework uses machine learning models to predict unknown parameters of an optimization problem from features before solving. Recent works show that decision quality can be improved in this setting by solving and differentiating the optimization problem in the training loop, enabling end-to-end training with loss functions defined directly on the resulting decisions. However, this approach can be inefficient and requires handcrafted, problem-specific rules for backpropagation through the optimization step. This paper proposes an alternative method, in which optimal solutions are learned directly from the observable features by predictive models. The approach is generic, and based on an adaptation of the Learning-to-Optimize paradigm, from which a rich variety of existing techniques can be employed. Experimental evaluations show the ability of several Learning-to-Optimize methods to provide efficient, accurate, and flexible solutions to an array of challenging Predict-Then-Optimize problems.
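
A bare-bones version of learning solutions by proxy is to regress precomputed solver solutions from the observable features, as sketched below. The paper draws on richer Learning-to-Optimize techniques and decision-quality objectives; the network size, the MSE objective, and the offline-solver supervision here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SolutionProxy(nn.Module):
    """Maps observable problem features directly to a decision vector, avoiding a
    solver call (and its differentiation) inside the training loop."""
    def __init__(self, n_features, n_decisions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_decisions))

    def forward(self, features):
        return self.net(features)

def train_proxy(features, solver_solutions, epochs=200, lr=1e-3):
    """features: (N, n_features) observed contexts.
    solver_solutions: (N, n_decisions) optimal decisions precomputed offline by a solver."""
    model = SolutionProxy(features.shape[1], solver_solutions.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        pred = model(features)
        loss = nn.functional.mse_loss(pred, solver_solutions)   # imitate the solver's output
        opt.zero_grad(); loss.backward(); opt.step()
    return model
```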

Learning to Fly in Seconds

  • paper_url: http://arxiv.org/abs/2311.13081
  • repo_url: https://github.com/arplaboratory/learning-to-fly
  • paper_authors: Jonas Eschmann, Dario Albani, Giuseppe Loianno
  • for: The paper proposes a learning-based method, specifically deep reinforcement learning, for end-to-end control of autonomous multirotor aerial vehicles.
  • methods: An asymmetric actor-critic architecture is combined with a highly reliable RL training paradigm, curriculum learning, and a highly optimized simulator to achieve direct RPM control with low sample complexity (an architecture sketch follows this entry).
  • results: The policy trains in about 18 seconds on a consumer-grade laptop and can be deployed on microcontrollers to control a multirotor under real-time guarantees; experimental comparisons on a real Crazyflie nano quadrotor show trajectory-tracking performance competitive with existing state-of-the-art controllers.
    Abstract Learning-based methods, particularly Reinforcement Learning (RL), hold great promise for streamlining deployment, enhancing performance, and achieving generalization in the control of autonomous multirotor aerial vehicles. Deep RL has been able to control complex systems with impressive fidelity and agility in simulation but the simulation-to-reality transfer often brings a hard-to-bridge reality gap. Moreover, RL is commonly plagued by prohibitively long training times. In this work, we propose a novel asymmetric actor-critic-based architecture coupled with a highly reliable RL-based training paradigm for end-to-end quadrotor control. We show how curriculum learning and a highly optimized simulator enhance sample complexity and lead to fast training times. To precisely discuss the challenges related to low-level/end-to-end multirotor control, we also introduce a taxonomy that classifies the existing levels of control abstractions as well as non-linearities and domain parameters. Our framework enables Simulation-to-Reality (Sim2Real) transfer for direct RPM control after only 18 seconds of training on a consumer-grade laptop as well as its deployment on microcontrollers to control a multirotor under real-time guarantees. Finally, our solution exhibits competitive performance in trajectory tracking, as demonstrated through various experimental comparisons with existing state-of-the-art control solutions using a real Crazyflie nano quadrotor. We open source the code including a very fast multirotor dynamics simulator that can simulate about 5 months of flight per second on a laptop GPU. The fast training times and deployment to a cheap, off-the-shelf quadrotor lower the barriers to entry and help democratize the research and development of these systems.
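
The asymmetric actor-critic split can be illustrated as follows: the actor consumes only observations available onboard, while the critic additionally receives privileged simulator state during training. Layer sizes, activations, and the tanh action squashing are assumptions, not the paper's exact network.

```python
import torch
import torch.nn as nn

class AsymmetricActorCritic(nn.Module):
    """Asymmetric actor-critic: the actor uses only onboard observations (deployable
    on a microcontroller), while the critic also sees privileged simulator state
    that exists only at training time."""
    def __init__(self, obs_dim, priv_dim, action_dim=4, hidden=64):
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                   nn.Linear(hidden, hidden), nn.Tanh(),
                                   nn.Linear(hidden, action_dim))        # e.g. 4 motor RPM commands
        self.critic = nn.Sequential(nn.Linear(obs_dim + priv_dim, hidden), nn.Tanh(),
                                    nn.Linear(hidden, hidden), nn.Tanh(),
                                    nn.Linear(hidden, 1))                # state-value estimate

    def act(self, obs):
        return torch.tanh(self.actor(obs))          # normalized RPM setpoints in [-1, 1]

    def value(self, obs, privileged_state):
        return self.critic(torch.cat([obs, privileged_state], dim=-1))
```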

Positional Description Matters for Transformers Arithmetic

  • paper_url: http://arxiv.org/abs/2311.14737
  • repo_url: None
  • paper_authors: Ruoqi Shen, Sébastien Bubeck, Ronen Eldan, Yin Tat Lee, Yuanzhi Li, Yi Zhang
  • for: The paper addresses Transformers' weakness on arithmetic tasks, where a naive reliance on positional information for problems with few digits leads to poor performance on larger numbers.
  • methods: Several fixes are proposed, either modifying the positional encoding directly or changing the representation of the arithmetic task so that standard positional encoding is leveraged differently (a representation sketch follows this entry).
  • results: Three tasks are studied: (i) classical multiplication, (ii) length extrapolation in addition, and (iii) addition in a natural language context. A small model (100M parameters, 300k samples) attains remarkable accuracy on direct, no-scratchpad 15-digit multiplication and is essentially perfect up to 12 digits, whereas usual training fails already at 4-digit multiplication. With only 120k addition samples, the model extrapolates from 10-digit training to 12-digit test numbers (usual training shows no extrapolation) and is almost perfectly accurate up to 5 digits in the natural language setting, where usual training is correct only up to 3 digits (essentially memorization).
    Abstract Transformers, central to the successes in modern Natural Language Processing, often falter on arithmetic tasks despite their vast capabilities --which paradoxically include remarkable coding abilities. We observe that a crucial challenge is their naive reliance on positional information to solve arithmetic problems with a small number of digits, leading to poor performance on larger numbers. Herein, we delve deeper into the role of positional encoding, and propose several ways to fix the issue, either by modifying the positional encoding directly, or by modifying the representation of the arithmetic task to leverage standard positional encoding differently. We investigate the value of these modifications for three tasks: (i) classical multiplication, (ii) length extrapolation in addition, and (iii) addition in natural language context. For (i) we train a small model on a small dataset (100M parameters and 300k samples) with remarkable aptitude in (direct, no scratchpad) 15 digits multiplication and essentially perfect up to 12 digits, while usual training in this context would give a model failing at 4 digits multiplication. In the experiments on addition, we use a mere 120k samples to demonstrate: for (ii) extrapolation from 10 digits to testing on 12 digits numbers while usual training would have no extrapolation, and for (iii) almost perfect accuracy up to 5 digits while usual training would be correct only up to 3 digits (which is essentially memorization with a training set of 120k samples).
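
One concrete way to change the task representation so that digit position is explicit in the tokens themselves (rather than inferred from absolute position) is sketched below. The reversed, index-annotated format is an illustrative choice; the specific representations studied in the paper may differ.

```python
def encode_addition(a: int, b: int, reverse=True, annotate_index=True) -> str:
    """Re-describe an addition problem so each digit carries its place index,
    emitted least-significant first, making positional structure explicit."""
    def digits(n):
        ds = list(str(n))
        return ds[::-1] if reverse else ds

    def render(ds):
        if annotate_index:
            return " ".join(f"d{i}:{d}" for i, d in enumerate(ds))
        return " ".join(ds)

    prompt = f"{render(digits(a))} + {render(digits(b))} ="
    target = render(digits(a + b))
    return f"{prompt} {target}"

# Example: encode_addition(987, 45) produces
# "d0:7 d1:8 d2:9 + d0:5 d1:4 = d0:2 d1:3 d2:0 d3:1"   (i.e., 987 + 45 = 1032)
```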