cs.AI - 2023-07-11

Handwritten Text Recognition Using Convolutional Neural Network

  • paper_url: http://arxiv.org/abs/2307.05396
  • repo_url: https://github.com/sushant097/Handwritten-Line-Text-Recognition-using-Deep-Learning-with-Tensorflow
  • paper_authors: Atman Mishra, A. Sharath Ram, Kavyashree C
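  • for: The paper presents an intelligent character recognition system based on a Convolutional Neural Network (CNN) that converts handwritten or printed characters into ASCII text.
  • methods: The model is trained on over 100,000 images from the NIST dataset; it learns from features extracted from the images and outputs the probability of each class for a given image.
  • results: The model reaches an accuracy of 90.54% on the NIST dataset, with a reported loss of 2.53%.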
    Abstract OCR (Optical Character Recognition) is a technology that offers comprehensive alphanumeric recognition of handwritten and printed characters at electronic speed by merely scanning the document. Recently, the understanding of visual data has been termed Intelligent Character Recognition (ICR). ICR is the OCR module that can convert scans of handwritten or printed characters into ASCII text. ASCII data is the standard format for data encoding in electronic communication. ASCII assigns standard numeric values to letters, numerals, symbols, white-spaces and other characters. In more technical terms, OCR is the process of using an electronic device to transform 2-dimensional textual information into machine-encoded text. Anything that contains text, whether machine-written or handwritten, can be scanned with a scanner, or a simple picture of the text is enough for the recognition system to distinguish the text. The goal of this paper is to show the results of a Convolutional Neural Network model which has been trained on the National Institute of Standards and Technology (NIST) dataset containing over 100,000 images. The network learns from the features extracted from the images and uses them to generate the probability of each class to which the picture belongs. We have achieved an accuracy of 90.54% with a loss of 2.53%.
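
The abstract says the network turns extracted features into a probability for each class. A minimal sketch of that last step, assuming a softmax output layer (the abstract does not name the output activation, and the scores below are made up for illustration):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps the exponentials numerically stable.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical final-layer scores for three character classes.
logits = np.array([2.0, 0.5, -1.0])
probs = softmax(logits)

# The predicted character is the class with the highest probability.
predicted_class = int(np.argmax(probs))
```

In a full classifier the `logits` vector would have one entry per character class and would come from the last dense layer of the CNN.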

Combating Data Imbalances in Federated Semi-supervised Learning with Dual Regulators

  • paper_url: http://arxiv.org/abs/2307.05358
  • repo_url: None
  • paper_authors: Sikai Bai, Shuaicheng Li, Weiming Zhuang, Jie Zhang, Song Guo, Kunlin Yang, Jun Hou, Shuai Zhang, Junyu Gao, Shuai Yi
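  • for: This work addresses data imbalance in federated semi-supervised learning (FSSL), where label scarcity on clients is compounded by data distributions that differ both across clients and between labeled and unlabeled data within a client.
  • methods: It proposes FedDure, an FSSL framework with two regulators, a coarse-grained regulator (C-reg) and a fine-grained regulator (F-reg), and formulates client model training as bi-level optimization.
  • results: Empirically, FedDure outperforms existing methods across a wide range of settings, notably by more than 11% on the CIFAR-10 and CINIC-10 datasets.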
    Abstract Federated learning has become a popular method to learn from decentralized heterogeneous data. Federated semi-supervised learning (FSSL) emerges to train models from a small fraction of labeled data due to label scarcity on decentralized clients. Existing FSSL methods assume independent and identically distributed (IID) labeled data across clients and consistent class distribution between labeled and unlabeled data within a client. This work studies a more practical and challenging scenario of FSSL, where data distribution is different not only across clients but also within a client between labeled and unlabeled data. To address this challenge, we propose a novel FSSL framework with dual regulators, FedDure. FedDure lifts the previous assumption with a coarse-grained regulator (C-reg) and a fine-grained regulator (F-reg): C-reg regularizes the updating of the local model by tracking the learning effect on labeled data distribution; F-reg learns an adaptive weighting scheme tailored for unlabeled instances in each client. We further formulate the client model training as bi-level optimization that adaptively optimizes the model in the client with two regulators. Theoretically, we show the convergence guarantee of the dual regulators. Empirically, we demonstrate that FedDure is superior to the existing methods across a wide range of settings, notably by more than 11% on CIFAR-10 and CINIC-10 datasets.

ProgGP: From GuitarPro Tablature Neural Generation To Progressive Metal Production

  • paper_url: http://arxiv.org/abs/2307.05328
  • repo_url: None
  • paper_authors: Jackson Loth, Pedro Sarmento, CJ Carr, Zack Zukowski, Mathieu Barthet
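  • for: The paper aims to build an AI tool that generates progressive metal music, for composing through a human-AI partnership.
  • methods: A pre-trained Transformer model is fine-tuned on ProgGP, a custom dataset of 173 progressive metal songs, using a tokenization based on the symbolic GuitarPro format, to generate multiple guitar, bass guitar, drums, piano and orchestral parts.
  • results: The generated music is evaluated with a mixed-methods approach that combines computational musicology with practice-based research. The model was then used to create a progressive metal song, fully produced and mixed by a human metal producer on the basis of AI-generated music.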
    Abstract Recent work in the field of symbolic music generation has shown value in using a tokenization based on the GuitarPro format, a symbolic representation supporting guitar expressive attributes, as an input and output representation. We extend this work by fine-tuning a pre-trained Transformer model on ProgGP, a custom dataset of 173 progressive metal songs, for the purposes of creating compositions from that genre through a human-AI partnership. Our model is able to generate multiple guitar, bass guitar, drums, piano and orchestral parts. We examine the validity of the generated music using a mixed methods approach by combining quantitative analyses following a computational musicology paradigm and qualitative analyses following a practice-based research paradigm. Finally, we demonstrate the value of the model by using it as a tool to create a progressive metal song, fully produced and mixed by a human metal producer based on AI-generated music.

Automatic Generation of Semantic Parts for Face Image Synthesis

  • paper_url: http://arxiv.org/abs/2307.05317
  • repo_url: https://github.com/TFonta/Semantic-VAE
  • paper_authors: Tomaso Fontanini, Claudio Ferrari, Massimo Bertozzi, Andrea Prati
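  • for: The work targets automatic manipulation or generation of the shape of object classes in semantic segmentation masks, with a focus on human faces, so that layout can be controlled without manual editing.
  • methods: The proposed network embeds the mask class-wise into a latent space where each class embedding can be edited independently; a bi-directional LSTM block and a convolutional decoder then output a new, locally manipulated mask.
  • results: Quantitative and qualitative results on the CelebMask-HQ dataset show that the model can both faithfully reconstruct and modify a segmentation mask at the class level. The model can also be placed before a semantic image synthesis generator, opening the way to fully automatic control of both shape and texture.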
    Abstract Semantic image synthesis (SIS) refers to the problem of generating realistic imagery given a semantic segmentation mask that defines the spatial layout of object classes. Most of the approaches in the literature, other than the quality of the generated images, put effort into finding solutions to increase the generation diversity in terms of style, i.e. texture. However, they all neglect a different feature, which is the possibility of manipulating the layout provided by the mask. Currently, the only way to do so is manually by means of graphical user interfaces. In this paper, we describe a network architecture to address the problem of automatically manipulating or generating the shape of object classes in semantic segmentation masks, with specific focus on human faces. Our proposed model allows embedding the mask class-wise into a latent space where each class embedding can be independently edited. Then, a bi-directional LSTM block and a convolutional decoder output a new, locally manipulated mask. We report quantitative and qualitative results on the CelebMask-HQ dataset, which show our model can both faithfully reconstruct and modify a segmentation mask at the class level. Also, we show our model can be put before a SIS generator, opening the way to a fully automatic generation control of both shape and texture. Code available at https://github.com/TFonta/Semantic-VAE.

Masked Vision and Language Pre-training with Unimodal and Multimodal Contrastive Losses for Medical Visual Question Answering

  • paper_url: http://arxiv.org/abs/2307.05314
  • repo_url: https://github.com/pengfeiliheu/mumc
  • paper_authors: Pengfei Li, Gang Liu, Jinlong He, Zixu Zhao, Shenjun Zhong
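  • for: The paper targets medical visual question answering (VQA), i.e. answering clinical questions about a given medical image.
  • methods: It presents a self-supervised approach that learns unimodal and multimodal feature representations of input images and text from medical image caption datasets, using unimodal and multimodal contrastive losses together with masked language modeling and image-text matching as pre-training objectives.
  • results: The approach achieves state-of-the-art performance on three publicly available medical VQA datasets, with accuracy improvements of 2.2%, 14.7%, and 1.7% respectively.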
    Abstract Medical visual question answering (VQA) is a challenging task that requires answering clinical questions of a given medical image, by taking into consideration both visual and language information. However, due to the small scale of training data for medical VQA, pre-training fine-tuning paradigms have been a commonly used solution to improve model generalization performance. In this paper, we present a novel self-supervised approach that learns unimodal and multimodal feature representations of input images and text using medical image caption datasets, by leveraging both unimodal and multimodal contrastive losses, along with masked language modeling and image text matching as pretraining objectives. The pre-trained model is then transferred to downstream medical VQA tasks. The proposed approach achieves state-of-the-art (SOTA) performance on three publicly available medical VQA datasets with significant accuracy improvements of 2.2%, 14.7%, and 1.7% respectively. Besides, we conduct a comprehensive analysis to validate the effectiveness of different components of the approach and study different pre-training settings. Our codes and models are available at https://github.com/pengfeiliHEU/MUMC.
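
The multimodal contrastive objective mentioned in the abstract can be illustrated with a generic symmetric InfoNCE loss over paired image and text embeddings. This is a sketch under assumptions, not the authors' loss: the function name `info_nce`, the temperature of 0.07, and the random embeddings are all illustrative.

```python
import numpy as np

def info_nce(image_emb, text_emb, temperature=0.07):
    """Symmetric image-text contrastive loss for a batch of paired embeddings."""
    # L2-normalise so that dot products are cosine similarities.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (batch, batch) similarity matrix

    def cross_entropy_on_diagonal(scores):
        # Row i treats column i (the true pair) as the correct class.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image-to-text and text-to-image directions.
    return float(0.5 * (cross_entropy_on_diagonal(logits)
                        + cross_entropy_on_diagonal(logits.T)))

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))          # stand-in embeddings, 4 pairs of dim 8
aligned = info_nce(emb, emb)           # every image matches its own caption
shuffled = info_nce(emb, emb[::-1])    # captions paired with the wrong images
```

Correctly paired embeddings give a lower loss than mismatched ones, which is the signal that pulls matching image and text representations together during pre-training.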

Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration

  • paper_url: http://arxiv.org/abs/2307.05300
  • repo_url: https://github.com/mikewangwzhl/solo-performance-prompting
  • paper_authors: Zhenhailong Wang, Shaoguang Mao, Wenshan Wu, Tao Ge, Furu Wei, Heng Ji
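  • for: The paper aims to turn a single large language model (LLM) into a cognitive synergist that improves problem-solving and overall performance through multi-turn self-collaboration among multiple personas.
  • methods: It proposes Solo Performance Prompting (SPP), which dynamically identifies and simulates different personas based on the task input.
  • results: On three challenging tasks (Trivia Creative Writing, Codenames Collaborative, and Logic Grid Puzzle), SPP elicits internal knowledge acquisition abilities, reduces hallucination, and maintains strong reasoning capabilities.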
    Abstract Human intelligence thrives on the concept of cognitive synergy, where collaboration and information integration among different cognitive processes yield superior outcomes compared to individual cognitive processes in isolation. Although Large Language Models (LLMs) have demonstrated promising performance as general task-solving agents, they still struggle with tasks that require intensive domain knowledge and complex reasoning. In this work, we propose Solo Performance Prompting (SPP), which transforms a single LLM into a cognitive synergist by engaging in multi-turn self-collaboration with multiple personas. A cognitive synergist refers to an intelligent agent that collaborates with multiple minds, combining their individual strengths and knowledge, to enhance problem-solving and overall performance in complex tasks. By dynamically identifying and simulating different personas based on task inputs, SPP unleashes the potential of cognitive synergy in LLMs. We have discovered that assigning multiple, fine-grained personas in LLMs elicits better problem-solving abilities compared to using a single or fixed number of personas. We evaluate SPP on three challenging tasks: Trivia Creative Writing, Codenames Collaborative, and Logic Grid Puzzle, encompassing both knowledge-intensive and reasoning-intensive types. Unlike previous works, such as Chain-of-Thought, that solely enhance the reasoning abilities in LLMs, SPP effectively elicits internal knowledge acquisition abilities, reduces hallucination, and maintains strong reasoning capabilities. Code, data, and prompts can be found at: https://github.com/MikeWangWZHL/Solo-Performance-Prompting.git.

RoPDA: Robust Prompt-based Data Augmentation for Low-Resource Named Entity Recognition

  • paper_url: http://arxiv.org/abs/2307.07417
  • repo_url: None
  • paper_authors: Sihan Song, Furao Shen, Jian Zhao
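  • for: The paper proposes Robust Prompt-based Data Augmentation (RoPDA) to tackle data sparsity in low-resource NER tasks.
  • methods: Based on pre-trained language models (PLMs) with continuous prompts, RoPDA performs entity augmentation and context augmentation through five fundamental augmentation operations, generating label-flipping and label-preserving examples.
  • results: Extensive experiments on three benchmarks from different domains show that RoPDA significantly improves upon strong baselines and also outperforms state-of-the-art semi-supervised learning methods when unlabeled data is included.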
    Abstract Data augmentation has been widely used in low-resource NER tasks to tackle the problem of data sparsity. However, previous data augmentation methods have the disadvantages of disrupted syntactic structures, token-label mismatch, and requirement for external knowledge or manual effort. To address these issues, we propose Robust Prompt-based Data Augmentation (RoPDA) for low-resource NER. Based on pre-trained language models (PLMs) with continuous prompt, RoPDA performs entity augmentation and context augmentation through five fundamental augmentation operations to generate label-flipping and label-preserving examples. To optimize the utilization of the augmented samples, we present two techniques: Self-Consistency Filtering and mixup. The former effectively eliminates low-quality samples, while the latter prevents performance degradation arising from the direct utilization of label-flipping samples. Extensive experiments on three benchmarks from different domains demonstrate that RoPDA significantly improves upon strong baselines, and also outperforms state-of-the-art semi-supervised learning methods when unlabeled data is included.
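
The abstract names mixup as the technique that keeps label-flipping samples from degrading performance. The following is a minimal sketch of generic mixup on feature vectors and one-hot labels; how RoPDA applies it inside the NER model is not described in the abstract, so the vectors and the fixed mixing coefficient here are illustrative assumptions.

```python
import numpy as np

def mixup(x_a, y_a, x_b, y_b, lam):
    """Linearly interpolate two examples and their one-hot labels."""
    x = lam * x_a + (1.0 - lam) * x_b
    y = lam * y_a + (1.0 - lam) * y_b
    return x, y

# Hypothetical representations: an original sample and an augmented sample
# whose label was flipped by the augmentation.
x_orig, y_orig = np.array([1.0, 0.0]), np.array([1.0, 0.0])
x_aug, y_aug = np.array([0.0, 1.0]), np.array([0.0, 1.0])

# lam is normally drawn from a Beta distribution; it is fixed here so the
# result is easy to read.
mixed_x, mixed_y = mixup(x_orig, y_orig, x_aug, y_aug, lam=0.7)
```

The mixed label is a soft target (70% original class, 30% flipped class), so the model is never trained on the flipped sample alone.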

On the Need for a Language Describing Distribution Shifts: Illustrations on Tabular Datasets

  • paper_url: http://arxiv.org/abs/2307.05284
  • repo_url: https://github.com/namkoong-lab/whyshift
  • paper_authors: Jiashuo Liu, Tianyu Wang, Peng Cui, Hongseok Namkoong
  • for: The paper addresses distribution shifts in tabular data and their impact on the performance of machine learning models.
  • methods: The authors conduct a thorough investigation of natural shifts in 5 tabular datasets over 86,000 model configurations, and build WhyShift, an empirical testbed of curated real-world shifts that characterizes the type of shift over which performance is benchmarked.
  • results: $Y|X$-shifts are the most prevalent type of shift in tabular settings. The authors identify covariate regions that suffer the biggest $Y|X$-shifts, discuss the implications for algorithmic and data-based interventions, and highlight the importance of future research on how distributions differ.
    Abstract Different distribution shifts require different algorithmic and operational interventions. Methodological research must be grounded by the specific shifts they address. Although nascent benchmarks provide a promising empirical foundation, they implicitly focus on covariate shifts, and the validity of empirical findings depends on the type of shift, e.g., previous observations on algorithmic performance can fail to be valid when the $Y|X$ distribution changes. We conduct a thorough investigation of natural shifts in 5 tabular datasets over 86,000 model configurations, and find that $Y|X$-shifts are most prevalent. To encourage researchers to develop a refined language for distribution shifts, we build WhyShift, an empirical testbed of curated real-world shifts where we characterize the type of shift we benchmark performance over. Since $Y|X$-shifts are prevalent in tabular settings, we identify covariate regions that suffer the biggest $Y|X$-shifts and discuss implications for algorithmic and data-based interventions. Our testbed highlights the importance of future research that builds an understanding of how distributions differ.
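
The difference between a covariate shift and a $Y|X$-shift can be made concrete with synthetic data. The sketch below is a toy construction, not taken from WhyShift: labels are a deterministic threshold on a single Gaussian feature, and the "model" is the threshold that is exactly right on the source domain.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, x_mean, threshold):
    """Draw X ~ N(x_mean, 1) and label Y = 1 when X exceeds `threshold`."""
    x = rng.normal(x_mean, 1.0, size=n)
    y = (x > threshold).astype(int)
    return x, y

def accuracy(x, y, learned_threshold):
    """Accuracy of the rule 'predict 1 when x > learned_threshold'."""
    return float(np.mean((x > learned_threshold).astype(int) == y))

# A classifier fit on the source domain, where Y = 1[X > 0].
learned_threshold = 0.0

# Covariate shift: P(X) moves, P(Y|X) stays the same.
x_cov, y_cov = sample(10_000, x_mean=1.5, threshold=0.0)
# Y|X shift: P(X) stays the same, the labelling rule itself changes.
x_cond, y_cond = sample(10_000, x_mean=0.0, threshold=1.0)

acc_cov = accuracy(x_cov, y_cov, learned_threshold)
acc_cond = accuracy(x_cond, y_cond, learned_threshold)
```

In this toy setting the well-specified classifier is unaffected by the covariate shift but loses accuracy under the $Y|X$-shift, which is why conclusions drawn on covariate-shift benchmarks need not carry over.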

U-CREAT: Unsupervised Case Retrieval using Events extrAcTion

  • paper_url: http://arxiv.org/abs/2307.05260
  • repo_url: https://github.com/exploration-lab/il-pcr
  • paper_authors: Abhinav Joshi, Akshat Sharma, Sai Kiran Tanikella, Ashutosh Modi
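  • for: The work aims to advance automated Prior Case Retrieval (PCR) in the legal domain.
  • methods: It proposes a new large benchmark for the PCR task, the IL-PCR (Indian Legal Prior Case Retrieval) corpus, and an unsupervised retrieval pipeline based on event extraction, U-CREAT.
  • results: The unsupervised method shows state-of-the-art performance on the benchmarks for two legal systems (the IL-PCR and COLIEE corpora), significantly outperforms BM25, and makes retrieval considerably faster.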
    Abstract The task of Prior Case Retrieval (PCR) in the legal domain is about automatically citing relevant (based on facts and precedence) prior legal cases in a given query case. To further promote research in PCR, in this paper, we propose a new large benchmark (in English) for the PCR task: IL-PCR (Indian Legal Prior Case Retrieval) corpus. Given the complex nature of case relevance and the long size of legal documents, BM25 remains a strong baseline for ranking the cited prior documents. In this work, we explore the role of events in legal case retrieval and propose an unsupervised retrieval method-based pipeline U-CREAT (Unsupervised Case Retrieval using Events Extraction). We find that the proposed unsupervised retrieval method significantly increases performance compared to BM25 and makes retrieval faster by a considerable margin, making it applicable to real-time case retrieval systems. Our proposed system is generic, we show that it generalizes across two different legal systems (Indian and Canadian), and it shows state-of-the-art performance on the benchmarks for both the legal systems (IL-PCR and COLIEE corpora).
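
BM25 is the baseline that the abstract compares against. A minimal sketch of Okapi BM25 ranking over tokenized documents follows; the toy documents, the query, and the parameter values `k1=1.5` and `b=0.75` are illustrative assumptions, and the event-based U-CREAT pipeline itself is not shown.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of documents that contain each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5)
                           / (doc_freq[term] + 0.5) + 1.0)
            # Term frequency saturates with k1 and is normalised by length via b.
            norm = tf[term] + k1 * (1.0 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores

# Toy "prior cases" and a query case, already tokenized.
docs = [
    "appellant filed appeal against conviction".split(),
    "court dismissed the appeal".split(),
    "contract dispute over land sale".split(),
]
query = "appeal against conviction".split()

scores = bm25_scores(query, docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

The candidate with the highest score is ranked first; documents sharing no query term score zero.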

Integrated Planning in Hospitals: A Review

  • paper_url: http://arxiv.org/abs/2307.05258
  • repo_url: None
  • paper_authors: Sebastian Rachuba, Melanie Reuter-Oppermann, Clemens Thielen
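  • for: The paper reviews the Operations Research and Management Science literature on hospital resource planning, focusing specifically on integrated planning of different resources.
  • methods: It collects the relevant literature and analyzes it with respect to aspects such as uncertainty modeling and the use of real-life data; cross comparisons reveal relations between the modeling and solution methods used and the practical implementation of the approaches.
  • results: It provides a high-level taxonomy for classifying resource-focused integration approaches and points out gaps in the literature as well as promising directions for future research.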
    Abstract Efficient planning of scarce resources in hospitals is a challenging task for which a large variety of Operations Research and Management Science approaches have been developed since the 1950s. While efficient planning of single resources such as operating rooms, beds, or specific types of staff can already lead to enormous efficiency gains, integrated planning of several resources has been shown to hold even greater potential, and a large number of integrated planning approaches have been presented in the literature over the past decades. This paper provides the first literature review that focuses specifically on the Operations Research and Management Science literature related to integrated planning of different resources in hospitals. We collect the relevant literature and analyze it regarding different aspects such as uncertainty modeling and the use of real-life data. Several cross comparisons reveal interesting insights concerning, e.g., relations between the modeling and solution methods used and the practical implementation of the approaches developed. Moreover, we provide a high-level taxonomy for classifying different resource-focused integration approaches and point out gaps in the literature as well as promising directions for future research.

Score Function Gradient Estimation to Widen the Applicability of Decision-Focused Learning

  • paper_url: http://arxiv.org/abs/2307.05213
  • repo_url: None
  • paper_authors: Mattia Silvestri, Senne Berden, Jayanta Mandi, Ali İrfan Mahmutoğulları, Maxime Mulamba, Allegra De Filippo, Tias Guns, Michele Lombardi
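  • for: The paper concerns optimization problems with unknown parameters that must be predicted before solving, and aims to widen the applicability of decision-focused learning (DFL).
  • methods: DFL trains the ML model by directly minimizing the downstream task loss, but state-of-the-art DFL methods are limited by assumptions about problem structure (e.g., linearity) and can only predict parameters that appear in the objective function. The paper instead predicts distributions over parameters and uses score function gradient estimation (SFGE) to compute decision-focused updates.
  • results: With SFGE, the approach can (1) deal with predictions that occur both in the objective function and in the constraints, and (2) effectively tackle two-stage stochastic optimization problems.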
    Abstract Many real-world optimization problems contain unknown parameters that must be predicted prior to solving. To train the predictive machine learning (ML) models involved, the commonly adopted approach focuses on maximizing predictive accuracy. However, this approach does not always lead to the minimization of the downstream task loss. Decision-focused learning (DFL) is a recently proposed paradigm whose goal is to train the ML model by directly minimizing the task loss. However, state-of-the-art DFL methods are limited by the assumptions they make about the structure of the optimization problem (e.g., that the problem is linear) and by the fact that they can only predict parameters that appear in the objective function. In this work, we address these limitations by instead predicting distributions over parameters and adopting score function gradient estimation (SFGE) to compute decision-focused updates to the predictive model, thereby widening the applicability of DFL. Our experiments show that by using SFGE we can: (1) deal with predictions that occur both in the objective function and in the constraints; and (2) effectively tackle two-stage stochastic optimization problems.
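
Score function gradient estimation relies on the identity that the gradient of E[f(z)] with respect to a distribution parameter equals E[f(z) times the gradient of log p(z)], so f itself never has to be differentiated. The sketch below applies it to a Gaussian with a learnable mean; the step-shaped `task_loss` is a hypothetical stand-in for the loss obtained after running a solver on sampled parameters, and the function names are not from the paper.

```python
import numpy as np

def score_function_gradient(mu, sigma, loss_fn, n_samples, rng):
    """Monte Carlo estimate of d/dmu E_{z ~ N(mu, sigma^2)}[loss_fn(z)]."""
    z = rng.normal(mu, sigma, size=n_samples)
    losses = loss_fn(z)
    # Subtracting the mean loss reduces variance; reusing the same samples
    # for the baseline adds a small bias that vanishes as n_samples grows.
    baseline = losses.mean()
    # Score of the Gaussian: d/dmu log N(z; mu, sigma^2) = (z - mu) / sigma^2.
    score = (z - mu) / sigma**2
    return float(np.mean((losses - baseline) * score))

def task_loss(z):
    # Hypothetical non-differentiable loss: the decision fails when z < 1.
    return (z < 1.0).astype(float)

rng = np.random.default_rng(0)
grad = score_function_gradient(mu=0.0, sigma=1.0, loss_fn=task_loss,
                               n_samples=200_000, rng=rng)
```

For this toy loss the exact gradient is minus the standard normal density at 1, roughly -0.242, so the estimate can be checked analytically. The negative sign says that raising the predicted mean lowers the expected task loss, even though `task_loss` has zero gradient almost everywhere.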

Contextual Pre-Planning on Reward Machine Abstractions for Enhanced Transfer in Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.05209
  • repo_url: None
  • paper_authors: Guy Azran, Mohamad H. Danesh, Stefano V. Albrecht, Sarah Keren
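  • for: To improve the adaptability of deep reinforcement learning agents, which tend to overfit to their training task, and to speed up learning when transferring to unseen tasks.
  • methods: The current task is represented with reward machines (RMs); agents receive symbolic representations of optimal transitions from their current abstract state and are rewarded for achieving them.
  • results: The representations improve sample efficiency and few-shot transfer in a variety of domains.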
    Abstract Recent studies show that deep reinforcement learning (DRL) agents tend to overfit to the task on which they were trained and fail to adapt to minor environment changes. To expedite learning when transferring to unseen tasks, we propose a novel approach to representing the current task using reward machines (RM), state machine abstractions that induce subtasks based on the current task's rewards and dynamics. Our method provides agents with symbolic representations of optimal transitions from their current abstract state and rewards them for achieving these transitions. These representations are shared across tasks, allowing agents to exploit knowledge of previously encountered symbols and transitions, thus enhancing transfer. Our empirical evaluation shows that our representations improve sample efficiency and few-shot transfer in a variety of domains.
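
A reward machine can be pictured as a small state machine whose transitions fire on symbols observed in the environment and emit rewards. The sketch below is a minimal illustration with a made-up key-and-door task; the class, the symbol names, and the use of breadth-first search as a stand-in for planning the next desirable transition are all assumptions, not the paper's implementation.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class RewardMachine:
    initial_state: str
    # Maps (state, symbol) to (next_state, reward).
    transitions: dict = field(default_factory=dict)

    def step(self, state, symbol):
        """Advance on an observed symbol; unknown symbols leave the state unchanged."""
        return self.transitions.get((state, symbol), (state, 0.0))

def plan_next_symbol(rm, state, goal):
    """Breadth-first search over abstract states; returns the first symbol on a shortest path."""
    queue = deque([(state, None)])
    seen = {state}
    while queue:
        current, first = queue.popleft()
        if current == goal:
            return first
        for (src, symbol), (dst, _) in rm.transitions.items():
            if src == current and dst not in seen:
                seen.add(dst)
                queue.append((dst, first if first is not None else symbol))
    return None

# Hypothetical task: pick up a key, then open a door.
rm = RewardMachine(
    initial_state="u0",
    transitions={
        ("u0", "got_key"): ("u1", 0.0),
        ("u1", "opened_door"): ("u_goal", 1.0),
    },
)

hint = plan_next_symbol(rm, "u0", "u_goal")  # symbolic hint given to the agent

state, total = rm.initial_state, 0.0
for symbol in ["nothing", "got_key", "opened_door"]:
    state, reward = rm.step(state, symbol)
    total += reward
```

Because the symbols (`got_key`, `opened_door`) are shared across tasks, an agent that has learned to achieve one of them can reuse that skill whenever a new task's reward machine asks for the same transition.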

Differentially Private Statistical Inference through $β$-Divergence One Posterior Sampling

  • paper_url: http://arxiv.org/abs/2307.05194
  • repo_url: None
  • paper_authors: Jack Jewson, Sahra Ghalebikesabi, Chris Holmes
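  • for: The study aims to provide differentially private statistical inference, so that results of analyses involving sensitive data can be released without compromising the privacy of any participant.
  • methods: The method, $\beta$D-Bayes, samples from a generalised Bayesian posterior that targets minimisation of the $\beta$-divergence between the model and the data-generating process, rather than artificially injecting noise into estimates or into the estimation process.
  • results: $\beta$D-Bayes gives more precise inference for the same privacy guarantees and enables differentially private estimation via posterior sampling for complex classifiers and continuous regression models such as neural networks.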
    Abstract Differential privacy guarantees allow the results of a statistical analysis involving sensitive data to be released without compromising the privacy of any individual taking part. Achieving such guarantees generally requires the injection of noise, either directly into parameter estimates or into the estimation process. Instead of artificially introducing perturbations, sampling from Bayesian posterior distributions has been shown to be a special case of the exponential mechanism, producing consistent, and efficient private estimates without altering the data generative process. The application of current approaches has, however, been limited by their strong bounding assumptions which do not hold for basic models, such as simple linear regressors. To ameliorate this, we propose $\beta$D-Bayes, a posterior sampling scheme from a generalised posterior targeting the minimisation of the $\beta$-divergence between the model and the data generating process. This provides private estimation that is generally applicable without requiring changes to the underlying model and consistently learns the data generating parameter. We show that $\beta$D-Bayes produces more precise inference estimation for the same privacy guarantees, and further facilitates differentially private estimation via posterior sampling for complex classifiers and continuous regression models such as neural networks for the first time.
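
For contrast with posterior sampling, the conventional route the abstract mentions, injecting noise directly into an estimate, can be sketched with the Laplace mechanism. This illustrates the baseline idea only, not $\beta$D-Bayes; the bounds, the value of epsilon, and the synthetic data are assumptions.

```python
import numpy as np

def private_mean(values, lower, upper, epsilon, rng):
    """Release the mean of bounded values with epsilon-differential privacy."""
    # Clipping enforces the bounds that the sensitivity calculation relies on.
    clipped = np.clip(values, lower, upper)
    # Replacing one record changes the mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / len(clipped)
    # Laplace noise with scale sensitivity / epsilon gives epsilon-DP.
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return float(clipped.mean() + noise)

rng = np.random.default_rng(0)
data = rng.uniform(0.0, 1.0, size=1000)   # synthetic stand-in for sensitive data
estimate = private_mean(data, lower=0.0, upper=1.0, epsilon=1.0, rng=rng)
```

The clipping step is the kind of bounding assumption that the abstract says does not hold for basic models such as linear regression, which is the limitation the posterior-sampling approach is meant to remove.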

Can I say, now machines can think?

  • paper_url: http://arxiv.org/abs/2307.07526
  • repo_url: https://github.com/rprokap/pset-9
  • paper_authors: Nitisha Aggarwal, Geetika Jain Saxena, Sanjeev Singh, Amit Pundir
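  • for: The study analyzes the capabilities of AI-enabled machines and asks whether they can be said to think.
  • methods: It revisits Turing's concept of thinking machines, compares it with recent technological advances, and discusses objections to and consequences of thinking machines, along with available techniques for evaluating machines' cognitive capabilities.
  • results: The study concludes that the Turing Test is a critical aspect of evaluating machines' ability, but that there are other aspects of intelligence as well, and that AI machines exhibit most of them.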
    Abstract Generative AI techniques have opened the path for new generations of machines in diverse domains. These machines have various capabilities; for example, they can produce images, generate answers or stories, and write code based only on the "prompts" provided by users. These machines are considered 'thinking minds' because they have the ability to generate human-like responses. In this study, we have analyzed and explored the capabilities of artificial intelligence-enabled machines. We have revisited Turing's concept of thinking machines and compared it with recent technological advancements. The objections to and consequences of thinking machines are also discussed in this study, along with available techniques to evaluate machines' cognitive capabilities. We have concluded that the Turing Test is a critical aspect of evaluating machines' ability. However, there are other aspects of intelligence too, and AI machines exhibit most of these aspects.

CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery

  • paper_url: http://arxiv.org/abs/2307.05182
  • repo_url: https://github.com/longbai1006/cat-vil
  • paper_authors: Long Bai, Mobarakol Islam, Hongliang Ren
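  • for: This work aims to help medical students and junior surgeons learn and understand surgical technique from recorded surgical videos.
  • methods: The work proposes a Transformer-based model with a vision-language (ViL) embedding that fuses visual and textual features to improve surgical scene understanding.
  • results: Experiments on public surgical videos from the MICCAI EndoVis Challenge 2017 and 2018 show strong performance and robustness across conditions.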
    Abstract Medical students and junior surgeons often rely on senior surgeons and specialists to answer their questions when learning surgery. However, experts are often busy with clinical and academic work, and have little time to give guidance. Meanwhile, existing deep learning (DL)-based surgical Visual Question Answering (VQA) systems can only provide simple answers without the location of the answers. In addition, vision-language (ViL) embedding is still a less explored research in these kinds of tasks. Therefore, a surgical Visual Question Localized-Answering (VQLA) system would be helpful for medical students and junior surgeons to learn and understand from recorded surgical videos. We propose an end-to-end Transformer with the Co-Attention gaTed Vision-Language (CAT-ViL) embedding for VQLA in surgical scenarios, which does not require feature extraction through detection models. The CAT-ViL embedding module is designed to fuse multimodal features from visual and textual sources. The fused embedding will feed a standard Data-Efficient Image Transformer (DeiT) module, before the parallel classifier and detector for joint prediction. We conduct the experimental validation on public surgical videos from MICCAI EndoVis Challenge 2017 and 2018. The experimental results highlight the superior performance and robustness of our proposed model compared to the state-of-the-art approaches. Ablation studies further prove the outstanding performance of all the proposed components. The proposed method provides a promising solution for surgical scene understanding, and opens up a primary step in the Artificial Intelligence (AI)-based VQLA system for surgical training. Our code is publicly available.
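    Summary Medical students and junior surgeons often turn to senior surgeons and specialists for answers when learning surgery. However, experts are often busy with clinical and academic work and have little time to give guidance. Existing deep learning (DL)-based surgical Visual Question Answering (VQA) systems can only give simple answers without the location of the answer. In addition, vision-language (ViL) embedding remains little explored for this kind of task. A surgical Visual Question Localized-Answering (VQLA) system would therefore be very useful for medical students and junior surgeons learning from recorded surgical videos. We propose an end-to-end Transformer with a Co-Attention gaTed Vision-Language (CAT-ViL) embedding module that does not require feature extraction through detection models. The CAT-ViL embedding module is designed to fuse multimodal features from visual and textual sources. The fused embedding is fed to a standard Data-Efficient Image Transformer (DeiT) module, followed by a parallel classifier and detector for joint prediction. We validate the approach on public surgical videos from the MICCAI EndoVis Challenge 2017 and 2018. The results show that the proposed method outperforms state-of-the-art approaches and is more robust. Ablation studies further confirm the contribution of all proposed components. The method offers a promising solution for surgical scene understanding and a first step toward AI-based VQLA systems for surgical training. Our code is publicly available.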

Enriching Verbal Feedback from Usability Testing: Automatic Linking of Thinking-Aloud Recordings and Stimulus using Eye Tracking and Mouse Data

  • paper_url: http://arxiv.org/abs/2307.05171
  • repo_url: None
  • paper_authors: Supriya Murali, Tina Walber, Christoph Schaefer, Sezen Lim
  • for: This paper aims to automatically analyze verbal protocols and investigate the link between spoken feedback and the stimulus using eye tracking and mouse tracking.
  • methods: The paper uses eye tracking and mouse tracking to record the verbal responses, eye movements, and cursor movements of participants as they view and provide feedback on three websites.
  • results: The results show that the hit rate for gaze data is significantly higher than for mouse data, indicating that eye tracking data provides more detailed information and valuable insights about the verbalizations compared to mouse data.
    Abstract The think aloud method is an important and commonly used tool for usability optimization. However, analyzing think aloud data could be time consuming. In this paper, we put forth an automatic analysis of verbal protocols and test the link between spoken feedback and the stimulus using eye tracking and mouse tracking. The gained data - user feedback linked to a specific area of the stimulus - could be used to let an expert review the feedback on specific web page elements or to visualize on which parts of the web page the feedback was given. Specifically, we test if participants fixate on or point with the mouse to the content of the webpage that they are verbalizing. During the testing, participants were shown three websites and asked to verbally give their opinion. The verbal responses, along with the eye and cursor movements were recorded. We compared the hit rate, defined as the percentage of verbally mentioned areas of interest (AOIs) that were fixated with gaze or pointed to with the mouse. The results revealed a significantly higher hit rate for the gaze compared to the mouse data. Further investigation revealed that, while the mouse was mostly used passively to scroll, the gaze was often directed towards relevant AOIs, thus establishing a strong association between spoken words and stimuli. Therefore, eye tracking data possibly provides more detailed information and more valuable insights about the verbalizations compared to the mouse data.
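    Summary The think-aloud method is an important tool for usability optimization, but analyzing think-aloud data can be time-consuming. This paper proposes an automatic analysis of verbal protocols and uses eye tracking and mouse tracking to test the link between spoken feedback and the stimulus. The resulting data (user feedback linked to a specific area of the stimulus) can be used to let an expert review feedback on specific web page elements or to visualize where on the page feedback was given. Participants were asked to give their opinions on three websites while their verbal responses, eye movements, and mouse movements were recorded. We compared the hit rate, defined as the percentage of verbally mentioned areas of interest (AOIs) that were fixated with gaze or pointed to with the mouse. The hit rate was significantly higher for gaze data than for mouse data. Further investigation showed that the mouse was mostly used for scrolling, while gaze was often directed at the relevant AOIs, establishing a strong association between spoken words and stimuli. Eye tracking data may therefore provide more detailed information and more valuable insights than mouse data.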
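
The hit rate defined in the abstract can be illustrated with a minimal Python sketch. The AOI names and the per-participant lists are hypothetical, and the sketch ignores timing (it does not check whether a fixation happened while the AOI was being mentioned), which the actual study may handle differently.

```python
def hit_rate(mentioned_aois, attended_aois):
    """Percentage of verbally mentioned AOIs that were also attended to."""
    mentioned = set(mentioned_aois)
    if not mentioned:
        return 0.0
    hits = mentioned & set(attended_aois)
    return 100.0 * len(hits) / len(mentioned)


# Hypothetical data for one participant on one web page.
mentioned = ["logo", "search_bar", "footer", "nav_menu"]
fixated = ["logo", "search_bar", "nav_menu", "banner"]  # AOIs hit by gaze
pointed = ["search_bar"]                                 # AOIs hit by the cursor

gaze_hit_rate = hit_rate(mentioned, fixated)    # 3 of 4 mentioned AOIs
mouse_hit_rate = hit_rate(mentioned, pointed)   # 1 of 4 mentioned AOIs
```

AOIs that were looked at but never mentioned (here `banner`) do not affect the rate, because the denominator is the set of mentioned AOIs.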

Neural Quantile Optimization for Edge-Cloud Computing

  • paper_url: http://arxiv.org/abs/2307.05170
  • repo_url: None
  • paper_authors: Bin Du, He Zhang, Xiangle Cheng, Lei Zhang
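  • for: This paper seeks an efficient traffic allocation scheme for edge-cloud computing networks that satisfies the network's constraints and minimizes cost under burstable billing.
  • methods: The paper generalizes the Gumbel-softmax reparameterization method to turn the original discrete problem into a continuous one, and solves it with a Gumbel-softmax sampling network. The network's structure reflects the edge-cloud computing topology, and it is trained to minimize the expected cost of the continuous problems.
  • results: Experiments show that the Gumbel-softmax sampling network allocates traffic in edge-cloud computing networks efficiently and reduces cost. The authors also show that the method generalizes to more time steps and can supply warm starts to existing integer optimization solvers.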
    Abstract We seek the best traffic allocation scheme for the edge-cloud computing network that satisfies constraints and minimizes the cost based on burstable billing. First, for a fixed network topology, we formulate a family of integer programming problems with random parameters describing the various traffic demands. Then, to overcome the difficulty caused by the discrete feature of the problem, we generalize the Gumbel-softmax reparameterization method to induce an unconstrained continuous optimization problem as a regularized continuation of the discrete problem. Finally, we introduce the Gumbel-softmax sampling network to solve the optimization problems via unsupervised learning. The network structure reflects the edge-cloud computing topology and is trained to minimize the expectation of the cost function for unconstrained continuous optimization problems. The trained network works as an efficient traffic allocation scheme sampler, remarkably outperforming the random strategy in feasibility and cost function value. Besides testing the quality of the output allocation scheme, we examine the generalization property of the network by increasing the time steps and the number of users. We also feed the solution to existing integer optimization solvers as initial conditions and verify the warm-starts can accelerate the short-time iteration process. The framework is general with solid performance, and the decoupled feature of the random neural networks is adequate for practical implementations.
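
As a rough illustration of the reparameterization the abstract builds on, the sketch below draws a relaxed sample from a categorical distribution with the standard Gumbel-softmax trick. It is the textbook form, not the paper's generalized version or its sampling network; the three-way "routing preference" probabilities and the function name are assumptions made for the example.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng):
    """Draw one relaxed (continuous) sample over len(logits) categories."""
    noisy = []
    for logit in logits:
        u = rng.random()
        u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep both logarithms finite
        gumbel = -math.log(-math.log(u))      # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed logits.
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
logits = [math.log(p) for p in (0.7, 0.2, 0.1)]  # hypothetical routing preferences
soft = gumbel_softmax_sample(logits, temperature=1.0, rng=rng)
sharp = gumbel_softmax_sample(logits, temperature=0.05, rng=rng)
```

Lower temperatures push the sample toward a one-hot vector, which is what lets a continuous optimizer stand in for the discrete allocation choice; the index of the largest entry follows the original categorical distribution.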

SuryaKiran at MEDIQA-Sum 2023: Leveraging LoRA for Clinical Dialogue Summarization

  • paper_url: http://arxiv.org/abs/2307.05162
  • repo_url: None
  • paper_authors: Kunal Suri, Prakhar Mishra, Saumajit Saha, Atul Singh
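  • for: This paper aims to improve results for domain-specific use cases, here clinical dialogue summarization.
  • methods: The paper uses Parameter Efficient Fine Tuning (PEFT), which keeps the large language model as a fixed base and adds extra layers that are fine-tuned.
  • results: The evaluation of LoRA on clinical dialogue summarization shows that it performs on par with end-to-end fine-tuning.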
    Abstract Finetuning Large Language Models helps improve the results for domain-specific use cases. End-to-end finetuning of large language models is time and resource intensive and has high storage requirements to store the finetuned version of the large language model. Parameter Efficient Fine Tuning (PEFT) methods address the time and resource challenges by keeping the large language model as a fixed base and add additional layers, which the PEFT methods finetune. This paper demonstrates the evaluation results for one such PEFT method Low Rank Adaptation (LoRA), for Clinical Dialogue Summarization. The evaluation results show that LoRA works at par with end-to-end finetuning for a large language model. The paper presents the evaluations done for solving both the Subtask A and B from ImageCLEFmedical {https://www.imageclef.org/2023/medical}
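    Summary Fine-tuning large language models improves results for domain-specific use cases. End-to-end fine-tuning of a large language model takes considerable time and resources and requires substantial storage for the fine-tuned model. Parameter Efficient Fine Tuning (PEFT) methods address the time and resource challenges by keeping the large language model as a fixed base and adding extra layers, which are then fine-tuned. This paper presents evaluation results for LoRA on clinical dialogue summarization. The results show that LoRA performs on par with end-to-end fine-tuning of a large language model. The paper also reports evaluations for Subtasks A and B of ImageCLEFmedical {https://www.imageclef.org/2023/medical}.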
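
To make the "fixed base plus small trainable addition" idea concrete, here is a minimal pure-Python sketch of the low-rank update commonly used in LoRA, where the effective weight is W + (alpha / r) * B A. The matrix sizes, `alpha`, and the helper names are assumptions for illustration; the abstract does not give the authors' configuration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w_frozen, lora_a, lora_b, alpha):
    """Effective weight W + (alpha / r) * B @ A; only A and B would be trained."""
    rank = len(lora_a)
    delta = matmul(lora_b, lora_a)
    scale = alpha / rank
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w_frozen, delta)]


rng = random.Random(0)
d_out, d_in, rank = 6, 8, 2  # toy sizes (assumed)
w_frozen = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
lora_a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # r x d_in
lora_b = [[0.0] * rank for _ in range(d_out)]  # d_out x r, zero-initialized

w_start = lora_weight(w_frozen, lora_a, lora_b, alpha=4.0)
full_params = d_out * d_in           # parameters touched by full fine-tuning
lora_params = rank * (d_in + d_out)  # parameters touched by the low-rank update
```

Because B starts at zero, the adapted layer initially behaves exactly like the frozen one, and the stored fine-tuned artifact is only A and B rather than a full copy of the model, which is the storage saving the abstract refers to.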

Multiobjective Hydropower Reservoir Operation Optimization with Transformer-Based Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.05643
  • repo_url: None
  • paper_authors: Rixin Wu, Ran Wang, Jie Hao, Qiang Wu, Ping Wang
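  • for: The goal is the joint operation of multireservoir hydropower systems, balancing power generation, ecological protection, and residential water supply.
  • methods: A deep reinforcement learning method with a transformer framework extracts information from multiple reservoirs and residential areas and generates suitable operational decisions.
  • results: In experiments on Lake Mead and Lake Powell, the transformer-based deep reinforcement learning method produces appropriate operational outcomes; compared with a state-of-the-art method, it generates 10.11% more electricity, reduces the amended annual proportional flow deviation by 39.69%, and increases water supply revenue by 4.10%.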
    Abstract Due to shortage of water resources and increasing water demands, the joint operation of multireservoir systems for balancing power generation, ecological protection, and the residential water supply has become a critical issue in hydropower management. However, the numerous constraints and nonlinearity of multiple reservoirs make solving this problem time-consuming. To address this challenge, a deep reinforcement learning approach that incorporates a transformer framework is proposed. The multihead attention mechanism of the encoder effectively extracts information from reservoirs and residential areas, and the multireservoir attention network of the decoder generates suitable operational decisions. The proposed method is applied to Lake Mead and Lake Powell in the Colorado River Basin. The experimental results demonstrate that the transformer-based deep reinforcement learning approach can produce appropriate operational outcomes. Compared to a state-of-the-art method, the operation strategies produced by the proposed approach generate 10.11% more electricity, reduce the amended annual proportional flow deviation by 39.69%, and increase water supply revenue by 4.10%. Consequently, the proposed approach offers an effective method for the multiobjective operation of multihydropower reservoir systems.

On the Effectiveness of Speech Self-supervised Learning for Music

  • paper_url: http://arxiv.org/abs/2307.05161
  • repo_url: None
  • paper_authors: Yinghao Ma, Ruibin Yuan, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin, Chenghua Lin, Emmanouil Benetos, Anton Ragni, Norbert Gyenge, Ruibo Liu, Gus Xia, Roger Dannenberg, Yike Guo, Jie Fu
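  • for: This study examines the use of self-supervised learning (SSL) in music information retrieval (MIR).
  • methods: The study adapts two speech-related models, data2vec1.0 and Hubert, to music recordings.
  • results: Training SSL models on music data improves performance on MIR tasks, even with speech-based models. However, existing speech-oriented designs are limited in handling polyphonic information.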
    Abstract Self-supervised learning (SSL) has shown promising results in various speech and natural language processing applications. However, its efficacy in music information retrieval (MIR) still remains largely unexplored. While previous SSL models pre-trained on music recordings may have been mostly closed-sourced, recent speech models such as wav2vec2.0 have shown promise in music modelling. Nevertheless, research exploring the effectiveness of applying speech SSL models to music recordings has been limited. We explore the music adaption of SSL with two distinctive speech-related models, data2vec1.0 and Hubert, and refer to them as music2vec and musicHuBERT, respectively. We train $12$ SSL models with 95M parameters under various pre-training configurations and systematically evaluate the MIR task performances with 13 different MIR tasks. Our findings suggest that training with music data can generally improve performance on MIR tasks, even when models are trained using paradigms designed for speech. However, we identify the limitations of such existing speech-oriented designs, especially in modelling polyphonic information. Based on the experimental results, empirical suggestions are also given for designing future musical SSL strategies and paradigms.
    Summary We explore the music adaptation of SSL using two distinct speech-related models, data2vec1.0 and Hubert, and refer to them as music2vec and musicHuBERT, respectively. We train 12 SSL models with 95M parameters under various pre-training configurations and systematically evaluate them on 13 different MIR tasks. Our findings suggest that training with music data can generally improve performance on MIR tasks, even when models are trained using paradigms designed for speech. However, we identify the limitations of such existing speech-oriented designs, especially in modeling polyphonic information. Based on the experimental results, we provide empirical suggestions for designing future musical SSL strategies and paradigms.

Stable Normative Explanations: From Argumentation to Deontic Logic

  • paper_url: http://arxiv.org/abs/2307.05156
  • repo_url: None
  • paper_authors: Cecilia Di Florio, Guido Governatori, Antonino Rotolo, Giovanni Sartor
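  • for: This paper examines how the notion of stable explanation from Defeasible Logic can be expressed in formal argumentation.
  • methods: The paper works within formal logic, building neighborhood structures for deontic logic from argumentation and discussing the deontic meaning of the reconstruction.
  • results: The paper offers some direct complexity results.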
    Abstract This paper examines how a notion of stable explanation developed elsewhere in Defeasible Logic can be expressed in the context of formal argumentation. With this done, we discuss the deontic meaning of this reconstruction and show how to build from argumentation neighborhood structures for deontic logic where this notion of explanation can be characterised. Some direct complexity results are offered.
    Summary This paper studies how the notion of stable explanation developed elsewhere in Defeasible Logic can be expressed in formal argumentation. We then discuss the deontic meaning of this reconstruction and show how to build, from argumentation, neighborhood structures for deontic logic in which this notion of explanation can be characterized. Some direct complexity results are also provided.

A Modal Logic for Explaining some Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.05150
  • repo_url: None
  • paper_authors: Pierre Nunn, François Schwarzentruber
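  • for: This paper studies a modal logic with counting modalities.
  • methods: Counting modalities appear in linear inequalities; each formula is transformed into an equivalent graph neural network (GNN), and each GNN is transformed into a corresponding formula.
  • results: The satisfiability problem of this logic is shown to be decidable, and some of the variants discussed are in PSPACE.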
    Abstract In this paper, we propose a modal logic in which counting modalities appear in linear inequalities. We show that each formula can be transformed into an equivalent graph neural network (GNN). We also show that each GNN can be transformed into a formula. We show that the satisfiability problem is decidable. We also discuss some variants that are in PSPACE.
    Summary We propose a modal logic in which counting modalities appear in linear inequalities. We show that each formula can be transformed into an equivalent graph neural network (GNN) and that each GNN can be transformed into a formula. We show that the satisfiability problem is decidable, and we discuss some variants that are in PSPACE.

Learning Active Subspaces and Discovering Important Features with Gaussian Radial Basis Functions Neural Networks

  • paper_url: http://arxiv.org/abs/2307.05639
  • repo_url: https://github.com/dannyzx/grbf-nns
  • paper_authors: Danny D’Agostino, Ilija Ilievski, Christine Annette Shoemaker
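  • for: Improve both the predictive performance and the interpretability of machine learning models.
  • methods: Modify the Radial Basis Function Neural Network model by equipping its Gaussian kernel with a learnable precision matrix.
  • results: In regression, classification, and feature selection tasks, the proposed model achieves attractive predictive performance and yields interpretable results that could support decision-making.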
    Abstract Providing a model that achieves a strong predictive performance and at the same time is interpretable by humans is one of the most difficult challenges in machine learning research due to the conflicting nature of these two objectives. To address this challenge, we propose a modification of the Radial Basis Function Neural Network model by equipping its Gaussian kernel with a learnable precision matrix. We show that precious information is contained in the spectrum of the precision matrix that can be extracted once the training of the model is completed. In particular, the eigenvectors explain the directions of maximum sensitivity of the model revealing the active subspace and suggesting potential applications for supervised dimensionality reduction. At the same time, the eigenvectors highlight the relationship in terms of absolute variation between the input and the latent variables, thereby allowing us to extract a ranking of the input variables based on their importance to the prediction task enhancing the model interpretability. We conducted numerical experiments for regression, classification, and feature selection tasks, comparing our model against popular machine learning models and the state-of-the-art deep learning-based embedding feature selection techniques. Our results demonstrate that the proposed model does not only yield an attractive prediction performance with respect to the competitors but also provides meaningful and interpretable results that potentially could assist the decision-making process in real-world applications. A PyTorch implementation of the model is available on GitHub at the following link. https://github.com/dannyzx/GRBF-NNs
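    Summary Providing a model with strong predictive performance that is also interpretable by humans is one of the hardest challenges in machine learning research. We propose a modification of the Radial Basis Function Neural Network model that equips its Gaussian kernel with a learnable precision matrix. We show that useful information can be extracted from the spectrum of the precision matrix once training is complete. In particular, the eigenvectors explain the directions of maximum sensitivity of the model, revealing the active subspace and suggesting applications to supervised dimensionality reduction. The eigenvectors also highlight the absolute variation between the input and latent variables, which lets us rank the input variables by importance and improves interpretability. We ran numerical experiments on regression, classification, and feature selection tasks, comparing against popular machine learning models and deep learning-based embedding feature selection techniques. The results show that the model achieves attractive predictive performance relative to competitors and also provides meaningful, interpretable results that could support decision-making in real-world applications. A PyTorch implementation is available on GitHub: https://github.com/dannyzx/GRBF-NNs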
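
The role of the precision matrix can be sketched in a few lines of plain Python. This is an illustrative simplification, not the authors' PyTorch implementation: it assumes a kernel of the form exp(-0.5 (x - c)^T M (x - c)), uses a hypothetical 2x2 precision matrix in place of a learned one, and ranks features by the dominant eigenvector only, found with power iteration.

```python
import math


def grbf_activation(x, center, precision):
    """Gaussian kernel exp(-0.5 * (x - c)^T M (x - c)) with precision matrix M."""
    diff = [xi - ci for xi, ci in zip(x, center)]
    m_diff = [sum(m * d for m, d in zip(row, diff)) for row in precision]
    return math.exp(-0.5 * sum(d * md for d, md in zip(diff, m_diff)))


def dominant_eigenvector(matrix, iterations=200):
    """Power iteration on a nonzero symmetric PSD matrix; returns a unit vector."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(m * vi for m, vi in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    return v


# Hypothetical "learned" precision matrix: the kernel reacts strongly to feature 0.
precision = [[4.0, 0.5], [0.5, 0.25]]
direction = dominant_eigenvector(precision)  # direction of maximum sensitivity
importance = [abs(c) for c in direction]     # crude per-feature ranking score
```

Moving the input along `direction` changes the kernel output fastest, which is the sense in which the spectrum of the precision matrix exposes an active subspace; a fuller treatment would weight all eigenvectors by their eigenvalues.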

A Comprehensive Survey of Deep Transfer Learning for Anomaly Detection in Industrial Time Series: Methods, Applications, and Directions

  • paper_url: http://arxiv.org/abs/2307.05638
  • repo_url: None
  • paper_authors: Peng Yan, Ahmed Abdulkadir, Matthias Rosenthal, Gerrit A. Schatte, Benjamin F. Grewe, Thilo Stadelmann
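  • for: This survey examines deep transfer learning for time series anomaly detection in industrial settings, with the aim of improving efficiency and quality.
  • methods: It reviews deep transfer learning methods, which transfer knowledge and handle differing data distributions, for industrial time series anomaly detection problems.
  • results: It covers a range of time series anomaly detection tasks, such as manufacturing process monitoring, predictive maintenance, energy management, and infrastructure facility monitoring, and offers practical recommendations.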
    Abstract Automating the monitoring of industrial processes has the potential to enhance efficiency and optimize quality by promptly detecting abnormal events and thus facilitating timely interventions. Deep learning, with its capacity to discern non-trivial patterns within large datasets, plays a pivotal role in this process. Standard deep learning methods are suitable to solve a specific task given a specific type of data. During training, the algorithms demand large volumes of labeled training data. However, due to the dynamic nature of processes and the environment, it is impractical to acquire the needed data for standard deep learning training for every slightly different case anew. Deep transfer learning offers a solution to this problem. By leveraging knowledge from related tasks and accounting for variations in data distributions, this learning framework solves new tasks even with little or no additional labeled data. The approach bypasses the need to retrain a model from scratch for every new setup and dramatically reduces the labeled data requirement. This survey provides an in-depth review of deep transfer learning, examining the problem settings of transfer learning and classifying the prevailing deep transfer learning methods. Moreover, we delve into applying deep transfer learning in the context of a broad spectrum of time series anomaly detection tasks prevalent in primary industrial domains, e.g., manufacturing process monitoring, predictive maintenance, energy management, and infrastructure facility monitoring. We conclude this survey by underlining the challenges and limitations of deep transfer learning in industrial contexts. We also provide practical directions for solution design and implementation for these tasks, leading to specific, actionable suggestions.
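    Summary Automating the monitoring of industrial processes can improve efficiency and quality by detecting abnormal events promptly and enabling timely intervention. Deep learning, with its ability to detect complex patterns, plays a key role in this process. Standard deep learning methods suit a specific task and data type, and they need large amounts of labeled training data. Because processes and environments are dynamic, however, it is impractical to collect enough data for every slightly different case. Deep transfer learning offers a solution: by leveraging knowledge from related tasks and accounting for shifts in data distribution, it can solve new tasks with little or no additional labeled data. This avoids retraining a model from scratch and greatly reduces the labeled data requirement. This survey gives an in-depth review of deep transfer learning, covering the problem settings of transfer learning and the prevailing deep transfer learning methods. We also examine time series anomaly detection tasks common in primary industrial domains, such as manufacturing process monitoring, predictive maintenance, energy management, and infrastructure facility monitoring. We conclude by outlining the challenges and limitations of deep transfer learning in industrial contexts and by offering practical guidance on solution design and implementation.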

TIAM – A Metric for Evaluating Alignment in Text-to-Image Generation

  • paper_url: http://arxiv.org/abs/2307.05134
  • repo_url: https://github.com/grimalpaul/tiam
  • paper_authors: Paul Grimal, Hervé Le Borgne, Olivier Ferret, Julien Tourille
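  • for: This work evaluates the quality of images produced by text-to-image (T2I) models, in particular how well a generated image matches the important content of its prompt.
  • methods: It proposes a new metric based on prompt templates that better characterizes the alignment between generated images and prompt content in terms of object type, number, and color.
  • results: Image quality depends on the random starting point: different latent noise seeds yield images of different quality. The number of concepts in the prompt, their order, and their color attributes also affect the outcome. Some latent seeds produce better images than others, which opens new research directions.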
    Abstract The progress in the generation of synthetic images has made it crucial to assess their quality. While several metrics have been proposed to assess the rendering of images, it is crucial for Text-to-Image (T2I) models, which generate images based on a prompt, to consider additional aspects such as to which extent the generated image matches the important content of the prompt. Moreover, although the generated images usually result from a random starting point, the influence of this one is generally not considered. In this article, we propose a new metric based on prompt templates to study the alignment between the content specified in the prompt and the corresponding generated images. It allows us to better characterize the alignment in terms of the type of the specified objects, their number, and their color. We conducted a study on several recent T2I models about various aspects. An additional interesting result we obtained with our approach is that image quality can vary drastically depending on the latent noise used as a seed for the images. We also quantify the influence of the number of concepts in the prompt, their order as well as their (color) attributes. Finally, our method allows us to identify some latent seeds that produce better images than others, opening novel directions of research on this understudied topic.
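    Summary Progress in synthetic image generation has made it crucial to assess image quality. Although several metrics have been proposed to assess the rendering of images, Text-to-Image (T2I) models, which generate images from a prompt, call for additional criteria, such as how well the generated image matches the important content of the prompt. Moreover, generated images usually start from a random point, yet its influence is generally not considered. This article proposes a new metric based on prompt templates to study the alignment between the content specified in the prompt and the generated images. It lets us better characterize alignment in terms of the type, number, and color of the specified objects. We studied several recent T2I models and obtained a number of interesting results. For example, image quality is affected by the latent noise used as a seed, and we can quantify the influence of the number of concepts in the prompt, their order, and their color attributes. Our method also lets us identify latent seeds that produce better images than others, opening new research directions.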

A Deep Dive into Perturbations as Evaluation Technique for Time Series XAI

  • paper_url: http://arxiv.org/abs/2307.05104
  • repo_url: https://github.com/visual-xai-for-time-series/time-series-xai-perturbation-analysis
  • paper_authors: Udo Schlegel, Daniel A. Keim
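  • for: This study evaluates the quality of explanations produced by XAI techniques for time series data.
  • methods: It uses perturbation analysis to evaluate the attributions produced by XAI methods.
  • results: Perturbation analysis effectively evaluates the quality of XAI explanations and reveals the strengths and limitations of XAI techniques for time series.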
    Abstract Explainable Artificial Intelligence (XAI) has gained significant attention recently as the demand for transparency and interpretability of machine learning models has increased. In particular, XAI for time series data has become increasingly important in finance, healthcare, and climate science. However, evaluating the quality of explanations, such as attributions provided by XAI techniques, remains challenging. This paper provides an in-depth analysis of using perturbations to evaluate attributions extracted from time series models. A perturbation analysis involves systematically modifying the input data and evaluating the impact on the attributions generated by the XAI method. We apply this approach to several state-of-the-art XAI techniques and evaluate their performance on three time series classification datasets. Our results demonstrate that the perturbation analysis approach can effectively evaluate the quality of attributions and provide insights into the strengths and limitations of XAI techniques. Such an approach can guide the selection of XAI methods for time series data, e.g., focusing on return time rather than precision, and facilitate the development of more reliable and interpretable machine learning models for time series analysis.
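    Summary Explainable Artificial Intelligence (XAI) has gained broad attention in recent years as demand for transparent and interpretable machine learning models has grown. XAI for time series data has become increasingly important in fields such as finance, healthcare, and climate science. However, evaluating the quality of the explanations that XAI techniques provide remains a challenge. This paper gives an in-depth analysis of using perturbation analysis to evaluate such explanations. Perturbation analysis involves systematically modifying the input data and evaluating the impact on the attributions generated by the XAI method. We apply this approach to several state-of-the-art XAI techniques and evaluate them on three time series classification datasets. Our results show that perturbation analysis can effectively evaluate the quality of XAI explanations and provides useful guidance for developing XAI techniques. It can guide the selection of XAI methods, for example by focusing on return time rather than precision, and supports the development of more reliable and interpretable machine learning models.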
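
A toy version of a perturbation check can be written in plain Python. Everything here is an assumption for illustration: the "model" is a linear score, the attribution is input times weight, and perturbing means replacing time steps with zero, which is only one of several possible perturbation strategies. The idea is that perturbing the time steps an attribution marks as important should change the output more than perturbing randomly chosen ones.

```python
import random


def model_score(series, weights):
    """Toy stand-in for a classifier: a linear score over time steps."""
    return sum(w * x for w, x in zip(weights, series))


def perturb(series, indices, fill=0.0):
    """Replace the chosen time steps with a neutral value."""
    chosen = set(indices)
    return [fill if i in chosen else x for i, x in enumerate(series)]


def score_drop(series, weights, indices):
    """Change in model output once the chosen time steps are perturbed."""
    return model_score(series, weights) - model_score(perturb(series, indices), weights)


weights = [0.0, 0.1, 2.0, 1.5, 0.1, 0.0]  # hypothetical model
series = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]   # hypothetical input
attributions = [w * x for w, x in zip(weights, series)]

k = 2
top_k = sorted(range(len(series)), key=lambda i: attributions[i], reverse=True)[:k]
random_k = random.Random(0).sample(range(len(series)), k)

drop_top = score_drop(series, weights, top_k)
drop_random = score_drop(series, weights, random_k)
```

If `drop_top` were not larger than the drop for random time steps on a real model, that would suggest the attribution method is not pointing at the time steps the model actually relies on.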

ATWM: Defense against adversarial malware based on adversarial training

  • paper_url: http://arxiv.org/abs/2307.05095
  • repo_url: None
  • paper_authors: Kun Li, Fan Zhang, Wei Guo
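  • for: Defend Windows malware detection models against adversarial malware attacks.
  • methods: A defense method based on adversarial training.
  • results: Improves the model's adversarial defense capability without reducing its accuracy.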
    Abstract Deep learning technology has made great achievements in the field of image. In order to defend against malware attacks, researchers have proposed many Windows malware detection models based on deep learning. However, deep learning models are vulnerable to adversarial example attacks. Malware can generate adversarial malware with the same malicious function to attack the malware detection model and evade detection of the model. Currently, many adversarial defense studies have been proposed, but existing adversarial defense studies are based on image sample and cannot be directly applied to malware sample. Therefore, this paper proposes an adversarial malware defense method based on adversarial training. This method uses preprocessing to defend simple adversarial examples to reduce the difficulty of adversarial training. Moreover, this method improves the adversarial defense capability of the model through adversarial training. We experimented with three attack methods in two sets of datasets, and the results show that the method in this paper can improve the adversarial defense capability of the model without reducing the accuracy of the model.
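    Summary Deep learning has achieved great success in the image domain. To defend against malware attacks, researchers have proposed many deep learning-based Windows malware detection models. However, deep learning models are vulnerable to adversarial example attacks: malware can generate adversarial variants that evade the detection model. Many adversarial defenses have been proposed, but they are based on image samples and cannot be applied directly to malware samples. This paper therefore proposes an adversarial malware defense method based on adversarial training. The method uses preprocessing to defend against simple adversarial examples, which reduces the difficulty of adversarial training, and then improves the model's adversarial defense capability through adversarial training. Experiments with three attack methods on two datasets show that the method improves the model's adversarial defense capability without reducing its accuracy.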

OntoChatGPT Information System: Ontology-Driven Structured Prompts for ChatGPT Meta-Learning

  • paper_url: http://arxiv.org/abs/2307.05082
  • repo_url: https://github.com/knowledge-ukraine/ontochatgpt
  • paper_authors: Oleksandr Palagin, Vladislav Kaverinskiy, Anna Litvin, Kyrylo Malakhov
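  • for: This research develops an ontology-driven structured prompt system for integration with ChatGPT, a large language model (LLM).
  • methods: It develops formal information and functional models and establishes the methodological foundations for integrating ontology-driven prompts with ChatGPT's meta-learning capabilities.
  • results: With this technology, the OntoChatGPT system extracts entities from context, classifies them, and generates relevant responses. The study suggests that the method applies to other languages and domains and extends to other LLM-based chatbot systems, such as Google's Bard, which uses the PaLM 2 LLM.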
    Abstract This research presents a comprehensive methodology for utilizing an ontology-driven structured prompts system in interplay with ChatGPT, a widely used large language model (LLM). The study develops formal models, both information and functional, and establishes the methodological foundations for integrating ontology-driven prompts with ChatGPT's meta-learning capabilities. The resulting productive triad comprises the methodological foundations, advanced information technology, and the OntoChatGPT system, which collectively enhance the effectiveness and performance of chatbot systems. The implementation of this technology is demonstrated using the Ukrainian language within the domain of rehabilitation. By applying the proposed methodology, the OntoChatGPT system effectively extracts entities from contexts, classifies them, and generates relevant responses. The study highlights the versatility of the methodology, emphasizing its applicability not only to ChatGPT but also to other chatbot systems based on LLMs, such as Google's Bard utilizing the PaLM 2 LLM. The underlying principles of meta-learning, structured prompts, and ontology-driven information retrieval form the core of the proposed methodology, enabling their adaptation and utilization in various LLM-based systems. This versatile approach opens up new possibilities for NLP and dialogue systems, empowering developers to enhance the performance and functionality of chatbot systems across different domains and languages.
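    Summary This research presents a comprehensive methodology for using an ontology-driven structured prompt system together with ChatGPT, a widely used large language model (LLM). The study develops formal models, both information and functional, and establishes the methodological foundations for combining ontology-driven prompts with ChatGPT's meta-learning capabilities. The resulting productive triad comprises the methodological foundations, advanced information technology, and the OntoChatGPT system, which together improve the effectiveness and performance of chatbot systems. The technology is demonstrated using the Ukrainian language in the domain of rehabilitation. Applying the proposed methodology, the OntoChatGPT system extracts entities from context, classifies them, and generates relevant responses. The study emphasizes the versatility of the methodology, which applies to other LLM-based systems such as Google's Bard with the PaLM 2 LLM. Meta-learning, structured prompts, and ontology-driven information retrieval form the core of the methodology and can be adapted to various LLM-based systems. This versatile approach opens new possibilities for NLP and dialogue systems, letting developers improve the performance and functionality of chatbot systems across domains and languages.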

Uni-Removal: A Semi-Supervised Framework for Simultaneously Addressing Multiple Degradations in Real-World Images

  • paper_url: http://arxiv.org/abs/2307.05075
  • repo_url: None
  • paper_authors: Yongheng Zhang, Danfeng Yan, Yuanqiang Cai
  • for: removes multiple degradations (haze, rain, and blur) from real-world images
  • methods: uses a two-stage semi-supervised framework with a unified model and parameters, leverages a supervised multi-teacher and student architecture, and incorporates an adversarial discriminator and generative adversarial loss for domain adaptation
  • results: demonstrates effective removal of degradations in real-world images, outperforming state-of-the-art supervised and unsupervised methods in dehazing, deraining, and deblurring simultaneously
    Abstract Removing multiple degradations, such as haze, rain, and blur, from real-world images poses a challenging and ill-posed problem. Recently, unified models that can handle different degradations have been proposed and yield promising results. However, these approaches focus on synthetic images and experience a significant performance drop when applied to real-world images. In this paper, we introduce Uni-Removal, a two-stage semi-supervised framework for addressing the removal of multiple degradations in real-world images using a unified model and parameters. In the knowledge transfer stage, Uni-Removal leverages a supervised multi-teacher and student architecture to facilitate learning from pretrained teacher networks specialized in different degradation types. A multi-grained contrastive loss is introduced to enhance learning from feature and image spaces. In the domain adaptation stage, unsupervised fine-tuning is performed by incorporating an adversarial discriminator on real-world images. The integration of an extended multi-grained contrastive loss and generative adversarial loss enables the adaptation of the student network from synthetic to real-world domains. Extensive experiments on real-world degraded datasets demonstrate the effectiveness of our proposed method. We compare our Uni-Removal framework with state-of-the-art supervised and unsupervised methods, showcasing its promising results in real-world image dehazing, deraining, and deblurring simultaneously.
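    Summary Removing multiple degradations such as haze, rain, and blur from real-world images is a challenging, ill-posed problem. Unified models that handle several degradation types have been proposed, but they perform poorly on real-world images. This paper introduces Uni-Removal, a two-stage semi-supervised framework that removes multiple degradations from real-world images with a unified model and parameters. In the knowledge transfer stage, a supervised multi-teacher and student architecture lets the student learn from pretrained teacher networks specialized in different degradation types, and a multi-grained contrastive loss strengthens learning in both feature and image spaces. In the domain adaptation stage, unsupervised fine-tuning with an adversarial discriminator, combined with an extended multi-grained contrastive loss and a generative adversarial loss, adapts the student network from the synthetic to the real-world domain. Extensive experiments on real-world degraded datasets compare Uni-Removal with state-of-the-art supervised and unsupervised methods on simultaneous dehazing, deraining, and deblurring.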

Retrieval-augmented GPT-3.5-based Text-to-SQL Framework with Sample-aware Prompting and Dynamic Revision Chain

  • paper_url: http://arxiv.org/abs/2307.05074
  • repo_url: None
  • paper_authors: Chunxi Guo, Zhiliang Tian, Jintao Tang, Shasha Li, Zhihua Wen, Kaixuan Wang, Ting Wang
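  • for: This work aims to improve LLM-based Text-to-SQL frameworks so that they handle a wider range of natural language questions.
  • methods: A retrieval-augmented prompting method comprising sample-aware prompting and a dynamic revision chain. An LLM simplifies the input question to clarify the user's intent, and two retrieval strategies help find questions with similar intents.
  • results: Experiments on three Text-to-SQL benchmarks show improvements over strong baseline models.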
    Abstract Text-to-SQL aims at generating SQL queries for the given natural language questions and thus helping users to query databases. Prompt learning with large language models (LLMs) has emerged as a recent approach, which designs prompts to lead LLMs to understand the input question and generate the corresponding SQL. However, it faces challenges with strict SQL syntax requirements. Existing work prompts the LLMs with a list of demonstration examples (i.e. question-SQL pairs) to generate SQL, but the fixed prompts can hardly handle the scenario where the semantic gap between the retrieved demonstration and the input question is large. In this paper, we propose a retrieval-augmented prompting method for an LLM-based Text-to-SQL framework, involving sample-aware prompting and a dynamic revision chain. Our approach incorporates sample-aware demonstrations, which include the composition of SQL operators and fine-grained information related to the given question. To retrieve questions sharing similar intents with input questions, we propose two strategies for assisting retrieval. Firstly, we leverage LLMs to simplify the original questions, unifying the syntax and thereby clarifying the users' intentions. To generate executable and accurate SQLs without human intervention, we design a dynamic revision chain which iteratively adapts fine-grained feedback from the previously generated SQL. Experimental results on three Text-to-SQL benchmarks demonstrate the superiority of our method over strong baseline models.
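    Summary Text-to-SQL generates SQL queries from natural language questions. Prompting large language models (LLMs) is a recent approach that designs prompts so the LLM understands the input question and produces the corresponding SQL, but it struggles with strict SQL syntax requirements. Existing work prompts the LLM with a list of demonstration examples (question-SQL pairs), and such fixed prompts handle a large semantic gap between the demonstrations and the input question poorly. This paper proposes a retrieval-augmented prompting method for an LLM-based Text-to-SQL framework, with sample-aware prompting and a dynamic revision chain. The sample-aware demonstrations include the composition of SQL operators and fine-grained information related to the given question. Two strategies assist the retrieval of questions with similar intents; the first uses LLMs to simplify the original questions, unifying their syntax and clarifying the user's intent. The dynamic revision chain iteratively adapts fine-grained feedback from the previously generated SQL. Experiments on three Text-to-SQL benchmarks show the method's advantage over strong baselines.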
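    The feedback loop behind a revision chain can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `revise_until_executable` and `toy_reviser` are hypothetical names, the reviser stands in for an LLM call, and only execution errors are used as feedback, whereas the paper describes richer fine-grained feedback.

```python
import sqlite3
from typing import Callable, List, Tuple


def revise_until_executable(
    conn: sqlite3.Connection,
    sql: str,
    revise: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Tuple[str, List[tuple]]:
    """Run `sql`; on failure, hand the error message to `revise` and retry."""
    for _ in range(max_rounds):
        try:
            rows = conn.execute(sql).fetchall()
            return sql, rows
        except sqlite3.Error as exc:
            # The database error is the feedback for the next revision round.
            sql = revise(sql, str(exc))
    raise RuntimeError("no executable SQL found within the round limit")


def toy_reviser(sql: str, feedback: str) -> str:
    """Hypothetical stand-in for an LLM: repairs one known column typo."""
    return sql.replace("nmae", "name")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.execute("INSERT INTO singer VALUES ('Ada', 36)")

final_sql, rows = revise_until_executable(
    conn, "SELECT nmae FROM singer", toy_reviser
)
```

    Bounding the number of rounds keeps the loop from running forever when the reviser cannot repair the query.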

Aggregating Credences into Beliefs: Agenda Conditions for Impossibility Results

  • paper_url: http://arxiv.org/abs/2307.05072
  • repo_url: None
  • paper_authors: Minkyung Wang, Chisu Kim
  • for: This paper is written for researchers and scholars interested in judgment aggregation, belief binarization, and agenda-theoretic approaches to understanding the limitations of collective decision-making processes.
  • methods: The paper uses an agenda-theoretic approach to generalize previous results and determine the necessary and sufficient conditions for the impossibility theorems to arise in binarizing belief aggregation. The authors use path-connectedness, even-negatability, negation-connectedness, blockedness, and other conditions to characterize the agenda conditions for different results.
  • results: The paper presents three main results: (1) path-connectedness and even-negatability constitute the exact agenda condition for the oligarchy result, (2) negation-connectedness is the condition for the triviality result, and (3) blockedness is the condition for the impossibility result. The authors also compare these findings with existing agenda-theoretic characterization theorems in judgment aggregation and belief binarization.
    Abstract Binarizing belief aggregation addresses how to rationally aggregate individual probabilistic beliefs into collective binary beliefs. Similar to the development of judgment aggregation theory, formulating axiomatic requirements, proving impossibility theorems, and identifying exact agenda conditions of impossibility theorems are natural and important research topics in binarizing belief aggregation. Building on our previous research on impossibility theorems, we use an agenda-theoretic approach to generalize the results and to determine the necessary and sufficient level of logical interconnection between the issues in an agenda for the impossibility theorems to arise. We demonstrate that (1) path-connectedness and even-negatability constitute the exact agenda condition for the oligarchy result stating that binarizing belief aggregation satisfying proposition-wise independence and deductive closure of collective beliefs yields the oligarchies under minor conditions; (2) negation-connectedness is the condition for the triviality result obtained by adding anonymity to the oligarchy result; and (3) blockedness is the condition for the impossibility result, which follows by adding completeness and consistency of collective beliefs. Moreover, we compare these novel findings with existing agenda-theoretic characterization theorems in judgment aggregation and belief binarization.
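    Summary Binarizing belief aggregation concerns how to rationally aggregate individual probabilistic beliefs into collective binary beliefs. As in judgment aggregation theory, formulating axiomatic requirements, proving impossibility theorems, and identifying the exact agenda conditions for those theorems are natural research topics. Building on the authors' earlier impossibility theorems, the paper uses an agenda-theoretic approach to generalize the results and to determine the necessary and sufficient level of logical interconnection between the issues in an agenda for the theorems to arise. It shows that (1) path-connectedness and even-negatability are the exact agenda condition for the oligarchy result, which states that binarizing belief aggregation satisfying proposition-wise independence and deductive closure of collective beliefs yields oligarchies under minor conditions; (2) negation-connectedness is the condition for the triviality result obtained by adding anonymity to the oligarchy result; and (3) blockedness is the condition for the impossibility result, which follows by adding completeness and consistency of collective beliefs. The findings are compared with existing agenda-theoretic characterization theorems in judgment aggregation and belief binarization.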

Mining for Unknown Unknowns

  • paper_url: http://arxiv.org/abs/2307.05071
  • repo_url: https://github.com/jcborges/PeriodicEventMining
  • paper_authors: Bernard Sinclair-Desgagné
  • for: Improving the ability to find unknown unknowns.
  • methods: Formal Concept Analysis (FCA), a subfield of lattice theory increasingly applied to mining and organizing data.
  • results: A simple framework for thinking about and searching for unknown unknowns systematically.
    Abstract Unknown unknowns are future relevant contingencies that lack an ex ante description. While there are numerous retrospective accounts showing that significant gains or losses might have been achieved or avoided had such contingencies been previously uncovered, getting hold of unknown unknowns still remains elusive, both in practice and conceptually. Using Formal Concept Analysis (FCA) - a subfield of lattice theory which is increasingly applied for mining and organizing data - this paper introduces a simple framework to systematically think out of the box and direct the search for unknown unknowns.
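    Summary Unknown unknowns are future relevant contingencies that lack an ex ante description. Many retrospective accounts show that significant gains or losses might have been achieved or avoided had such contingencies been uncovered earlier, yet getting hold of unknown unknowns remains elusive both in practice and conceptually. Using Formal Concept Analysis (FCA), a subfield of lattice theory increasingly applied to mining and organizing data, the paper introduces a simple framework for thinking outside the box systematically and directing the search for unknown unknowns.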

Cognitive Bias and Belief Revision

  • paper_url: http://arxiv.org/abs/2307.05069
  • repo_url: None
  • paper_authors: Panagiotis Papadamos, Nina Gierasimczuk
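  • for: Formalizing three types of cognitive bias (confirmation bias, framing bias, and anchoring bias) within the framework of belief revision.
  • methods: The biases are interpreted as restrictions on iterated revision and applied to three well-known belief revision methods: conditioning, lexicographic revision, and minimal revision.
  • results: The paper investigates the reliability of biased belief revision methods in truth tracking and uses computer simulations to assess their performance in random scenarios.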
    Abstract In this paper we formalise three types of cognitive bias within the framework of belief revision: confirmation bias, framing bias, and anchoring bias. We interpret them generally, as restrictions on the process of iterated revision, and we apply them to three well-known belief revision methods: conditioning, lexicographic revision, and minimal revision. We investigate the reliability of biased belief revision methods in truth tracking. We also run computer simulations to assess the performance of biased belief revision in random scenarios.
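    Summary This paper formalizes three types of cognitive bias within the framework of belief revision: confirmation bias, framing bias, and anchoring bias. The biases are interpreted generally, as restrictions on the process of iterated revision, and applied to three well-known belief revision methods: conditioning, lexicographic revision, and minimal revision. The paper studies how reliable the biased methods are at truth tracking and runs computer simulations to assess their performance in random scenarios.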
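    For readers unfamiliar with the three unbiased revision methods named above, here is a minimal sketch on plausibility orders. It assumes the standard textbook definitions and a simple representation chosen for illustration (a list of layers of worlds, most plausible first); the function names are hypothetical, and the paper's bias restrictions are not modelled.

```python
from typing import Callable, List, Set

Order = List[Set[str]]  # layers of worlds, most plausible layer first


def _drop_empty(layers: Order) -> Order:
    return [layer for layer in layers if layer]


def conditioning(order: Order, phi: Callable[[str], bool]) -> Order:
    """Delete every world that does not satisfy phi."""
    return _drop_empty([{w for w in layer if phi(w)} for layer in order])


def lexicographic(order: Order, phi: Callable[[str], bool]) -> Order:
    """Make all phi-worlds more plausible than all non-phi-worlds,
    keeping the old order inside each group."""
    yes = [{w for w in layer if phi(w)} for layer in order]
    no = [{w for w in layer if not phi(w)} for layer in order]
    return _drop_empty(yes + no)


def minimal(order: Order, phi: Callable[[str], bool]) -> Order:
    """Promote only the most plausible phi-worlds to the top layer."""
    for layer in order:
        best = {w for w in layer if phi(w)}
        if best:
            return _drop_empty([best] + [l - best for l in order])
    return order  # no phi-world exists, so nothing changes


order = [{"a"}, {"b", "c"}, {"d"}]
phi = lambda w: w in {"c", "d"}
```

    On this example, conditioning keeps only `c` and `d`, lexicographic revision moves both above `a` and `b`, and minimal revision moves only `c` to the top.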

A Theory of Bounded Inductive Rationality

  • paper_url: http://arxiv.org/abs/2307.05068
  • repo_url: None
  • paper_authors: Caspar Oesterheld, Abram Demski, Vincent Conitzer
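  • for: Developing a theory of rational decision making that does not assume logical omniscience, for agents facing computationally hard or self-referential decision problems.
  • methods: The theory requires a boundedly rational inductive agent to test each efficiently computable hypothesis infinitely often and to follow the hypotheses that keep their promises of high rewards.
  • results: The paper gives a sensible theory of rationality for such agents and proves further desirable properties, for example that they learn to value random and pseudo-random lotteries at their expected reward. It also proves a folk theorem for the strategies that bounded rational inductive agents can converge to in strategic interactions.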
    Abstract The dominant theories of rational choice assume logical omniscience. That is, they assume that when facing a decision problem, an agent can perform all relevant computations and determine the truth value of all relevant logical/mathematical claims. This assumption is unrealistic when, for example, we offer bets on remote digits of pi or when an agent faces a computationally intractable planning problem. Furthermore, the assumption of logical omniscience creates contradictions in cases where the environment can contain descriptions of the agent itself. Importantly, strategic interactions as studied in game theory are decision problems in which a rational agent is predicted by its environment (the other players). In this paper, we develop a theory of rational decision making that does not assume logical omniscience. We consider agents who repeatedly face decision problems (including ones like betting on digits of pi or games against other agents). The main contribution of this paper is to provide a sensible theory of rationality for such agents. Roughly, we require that a boundedly rational inductive agent tests each efficiently computable hypothesis infinitely often and follows those hypotheses that keep their promises of high rewards. We then prove that agents that are rational in this sense have other desirable properties. For example, they learn to value random and pseudo-random lotteries at their expected reward. Finally, we consider strategic interactions between different agents and prove a folk theorem for what strategies bounded rational inductive agents can converge to.
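    Summary The dominant theories of rational choice assume logical omniscience: an agent facing a decision problem can perform all relevant computations and determine the truth value of all relevant logical or mathematical claims. This is unrealistic when, for example, bets are offered on remote digits of pi or the agent faces a computationally intractable planning problem, and it creates contradictions when the environment can contain descriptions of the agent itself, as in the strategic interactions studied in game theory. This paper develops a theory of rational decision making without logical omniscience for agents that repeatedly face decision problems. Roughly, a boundedly rational inductive agent must test each efficiently computable hypothesis infinitely often and follow the hypotheses that keep their promises of high rewards. Agents that are rational in this sense learn to value random and pseudo-random lotteries at their expected reward. Finally, the paper considers strategic interactions between different agents and proves a folk theorem for the strategies such agents can converge to.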

Exploiting Asymmetry in Logic Puzzles: Using ZDDs for Symbolic Model Checking Dynamic Epistemic Logic

  • paper_url: http://arxiv.org/abs/2307.05067
  • repo_url: None
  • paper_authors: Daniel Miedema, Malvin Gattinger
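  • for: Mitigating the state-explosion problem in symbolic model checking for Dynamic Epistemic Logic, where binary decision diagrams (BDDs) are the usual tool.
  • methods: Zero-suppressed Decision Diagrams (ZDDs) are used to symbolically encode Kripke models for reasoning about knowledge and information dynamics in multi-agent systems.
  • results: A comparison on three examples from the literature (Muddy Children, the Sum and Product puzzle, and the Dining Cryptographers) shows that the right ZDD variant can significantly reduce memory usage, suggesting that ZDDs are a useful tool for model checking multi-agent systems.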
    Abstract Binary decision diagrams (BDDs) are widely used to mitigate the state-explosion problem in model checking. A variation of BDDs are Zero-suppressed Decision Diagrams (ZDDs) which omit variables that must be false, instead of omitting variables that do not matter. We use ZDDs to symbolically encode Kripke models used in Dynamic Epistemic Logic, a framework to reason about knowledge and information dynamics in multi-agent systems. We compare the memory usage of different ZDD variants for three well-known examples from the literature: the Muddy Children, the Sum and Product puzzle and the Dining Cryptographers. Our implementation is based on the existing model checker SMCDEL and the CUDD library. Our results show that replacing BDDs with the right variant of ZDDs can significantly reduce memory usage. This suggests that ZDDs are a useful tool for model checking multi-agent systems.
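    Summary Binary decision diagrams (BDDs) are widely used to mitigate the state-explosion problem in model checking. Zero-suppressed Decision Diagrams (ZDDs) are a variation that omits variables that must be false, instead of omitting variables that do not matter. The paper uses ZDDs to symbolically encode the Kripke models of Dynamic Epistemic Logic, a framework for reasoning about knowledge and information dynamics in multi-agent systems. It compares the memory usage of different ZDD variants on three well-known examples from the literature: the Muddy Children, the Sum and Product puzzle, and the Dining Cryptographers. The implementation builds on the existing model checker SMCDEL and the CUDD library. The results show that replacing BDDs with the right variant of ZDDs can significantly reduce memory usage, which suggests that ZDDs are a useful tool for model checking multi-agent systems.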
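    The difference between the two reduction rules can be seen on a toy example. The sketch below is illustrative only and is unrelated to SMCDEL or CUDD; `diagram_size` is a hypothetical helper that builds a reduced diagram by brute force over all assignments (so it only suits a handful of variables) and counts internal nodes.

```python
from typing import Callable, Dict, Tuple


def diagram_size(f: Callable[[Tuple[bool, ...]], bool], n: int, rule: str) -> int:
    """Count internal nodes of the reduced diagram of f over n ordered variables.

    rule is "bdd" or "zdd"; terminals are encoded as the integers 0 and 1.
    """
    table: Dict[Tuple[int, int, int], int] = {}  # unique table: node -> id

    def make(var: int, low: int, high: int) -> int:
        if rule == "bdd" and low == high:
            return low  # BDD rule: skip a variable that does not matter
        if rule == "zdd" and high == 0:
            return low  # ZDD rule: skip a variable that must be false
        return table.setdefault((var, low, high), len(table) + 2)

    def build(prefix: Tuple[bool, ...]) -> int:
        if len(prefix) == n:
            return int(f(prefix))
        low = build(prefix + (False,))
        high = build(prefix + (True,))
        return make(len(prefix), low, high)

    build(())
    return len(table)


# True only when the first variable is true and the others are false.
only_first = lambda bits: bits == (True, False, False)
# True for every assignment, so no variable matters.
always = lambda bits: True
```

    The sparse function `only_first` needs fewer nodes as a ZDD, while the constant function `always` needs fewer as a BDD, which is the asymmetry the paper's title refers to.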

Tableaux for the Logic of Strategically Knowing How

  • paper_url: http://arxiv.org/abs/2307.05066
  • repo_url: None
  • paper_authors: Yanjun Li
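  • for: The logic of goal-directed knowing-how, which extends standard epistemic logic with a knowing-how operator.
  • methods: A tableau procedure for the multi-agent version of the logic of strategically knowing-how, with proofs of its soundness and completeness.
  • results: The tableau procedure is sound and complete, and the satisfiability problem of the logic can be decided in PSPACE.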
    Abstract The logic of goal-directed knowing-how extends the standard epistemic logic with an operator of knowing-how. The knowing-how operator is interpreted as that there exists a strategy such that the agent knows that the strategy can make sure that p. This paper presents a tableau procedure for the multi-agent version of the logic of strategically knowing-how and shows the soundness and completeness of this tableau procedure. This paper also shows that the satisfiability problem of the logic can be decided in PSPACE.
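    Summary The logic of goal-directed knowing-how extends standard epistemic logic with a knowing-how operator, interpreted as: there exists a strategy such that the agent knows that the strategy can make sure that p. The paper presents a tableau procedure for the multi-agent version of the logic of strategically knowing-how and proves that the procedure is sound and complete. It also shows that the satisfiability problem of the logic can be decided in PSPACE.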

Belief Revision from Probability

  • paper_url: http://arxiv.org/abs/2307.05632
  • repo_url: None
  • paper_authors: Jeremy Goodman, Bernhard Salow
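  • for: Exploring the dynamics of belief under a question-relative, probabilistic account of belief.
  • methods: The principles of belief revision validated by the account are compared with those of AGM and of the Lockean theory, and a restricted class of models is analysed.
  • results: The validated principles are much weaker than those of orthodox theories such as AGM, but still stronger than those of the Lockean theory of belief. A restricted class of models, suitable for many but not all applications, validates some further natural principles.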
    Abstract In previous work ("Knowledge from Probability", TARK 2021) we develop a question-relative, probabilistic account of belief. On this account, what someone believes relative to a given question is (i) closed under entailment, (ii) sufficiently probable given their evidence, and (iii) sensitive to the relative probabilities of the answers to the question. Here we explore the implications of this account for the dynamics of belief. We show that the principles it validates are much weaker than those of orthodox theories of belief revision like AGM, but still stronger than those valid according to the popular Lockean theory of belief, which equates belief with high subjective probability. We then consider a restricted class of models, suitable for many but not all applications, and identify some further natural principles valid on this class. We conclude by arguing that the present framework compares favorably to the rival probabilistic accounts of belief developed by Leitgeb and by Lin and Kelly.
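    Summary In earlier work ("Knowledge from Probability", TARK 2021) the authors developed a question-relative, probabilistic account of belief. On this account, what someone believes relative to a given question is (i) closed under entailment, (ii) sufficiently probable given their evidence, and (iii) sensitive to the relative probabilities of the answers to the question. This paper explores the implications for the dynamics of belief. The principles the account validates are much weaker than those of orthodox theories of belief revision such as AGM, but still stronger than those valid on the popular Lockean theory, which equates belief with high subjective probability. The paper then considers a restricted class of models, suitable for many but not all applications, and identifies further natural principles valid on this class. It concludes by arguing that the framework compares favorably with the rival probabilistic accounts of belief developed by Leitgeb and by Lin and Kelly.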
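    One way to see why the Lockean theory validates so few principles is the familiar lottery example: a pure probability threshold is not closed under conjunction. The snippet below is a small illustration of that well-known point, not a reconstruction of the paper's models; the threshold value and the function name are assumptions chosen for the example.

```python
from fractions import Fraction


def lockean_believes(probability: Fraction, threshold: Fraction) -> bool:
    """Lockean thesis: believe a proposition iff it is probable enough."""
    return probability >= threshold


tickets = 10                                    # fair lottery, exactly one winner
threshold = Fraction(9, 10)                     # assumed belief threshold

p_ticket_loses = Fraction(tickets - 1, tickets)  # same for every single ticket
p_all_tickets_lose = Fraction(0)                 # impossible: some ticket wins

believes_each = lockean_believes(p_ticket_loses, threshold)
believes_conjunction = lockean_believes(p_all_tickets_lose, threshold)
```

    Each claim "ticket i loses" clears the threshold, yet their conjunction has probability zero, so Lockean belief is not closed under entailment, in contrast with condition (i) of the account summarized above.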

System of Spheres-based Two Level Credibility-limited Revisions

  • paper_url: http://arxiv.org/abs/2307.05062
  • repo_url: None
  • paper_authors: Marco Garapa, Eduardo Ferme, Maurício D. L. Reis
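  • for: Constructing two level credibility-limited revision operators, a non-prioritized form of revision, from Grove's systems of spheres.
  • methods: A systems-of-spheres construction together with an axiomatic characterization of the resulting operators.
  • results: The operators behave as a standard revision for sentences at the highest credibility level, coincide with a standard contraction by the negation for sentences at the second level, and leave the belief set unchanged for sentences that are not credible.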
    Abstract Two level credibility-limited revision is a non-prioritized revision operation. When revising by a two level credibility-limited revision, two levels of credibility and one level of incredibility are considered. When revising by a sentence at the highest level of credibility, the operator behaves as a standard revision, if the sentence is at the second level of credibility, then the outcome of the revision process coincides with a standard contraction by the negation of that sentence. If the sentence is not credible, then the original belief set remains unchanged. In this paper, we propose a construction for two level credibility-limited revision operators based on Grove's systems of spheres and present an axiomatic characterization for these operators.
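    Summary Two level credibility-limited revision is a non-prioritized revision operation that distinguishes two levels of credibility and one level of incredibility. For a sentence at the highest level of credibility, the operator behaves as a standard revision. For a sentence at the second level, the outcome coincides with a standard contraction by the negation of that sentence. For a sentence that is not credible, the original belief set remains unchanged. The paper proposes a construction of these operators based on Grove's systems of spheres and presents an axiomatic characterization for them.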

On Imperfect Recall in Multi-Agent Influence Diagrams

  • paper_url: http://arxiv.org/abs/2307.05059
  • repo_url: None
  • paper_authors: James Fox, Matt MacDermott, Lewis Hammond, Paul Harrenstein, Alessandro Abate, Michael Wooldridge
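  • for: Multi-agent influence diagrams (MAIDs) in settings with imperfect recall, where a Nash equilibrium in behavioural policies may not exist.
  • methods: MAIDs with forgetful and absent-minded agents are solved using mixed policies and two types of correlated equilibrium.
  • results: The paper analyses the computational complexity of key decision problems in MAIDs, explores tractable cases, and describes applications to Markov games and team situations.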
    Abstract Multi-agent influence diagrams (MAIDs) are a popular game-theoretic model based on Bayesian networks. In some settings, MAIDs offer significant advantages over extensive-form game representations. Previous work on MAIDs has assumed that agents employ behavioural policies, which set independent conditional probability distributions over actions for each of their decisions. In settings with imperfect recall, however, a Nash equilibrium in behavioural policies may not exist. We overcome this by showing how to solve MAIDs with forgetful and absent-minded agents using mixed policies and two types of correlated equilibrium. We also analyse the computational complexity of key decision problems in MAIDs, and explore tractable cases. Finally, we describe applications of MAIDs to Markov games and team situations, where imperfect recall is often unavoidable.
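    Summary Multi-agent influence diagrams (MAIDs) are a popular game-theoretic model based on Bayesian networks that, in some settings, offer significant advantages over extensive-form game representations. Previous work assumed that agents employ behavioural policies, which set independent conditional probability distributions over actions for each decision. With imperfect recall, however, a Nash equilibrium in behavioural policies may not exist. The paper overcomes this by showing how to solve MAIDs with forgetful and absent-minded agents using mixed policies and two types of correlated equilibrium. It also analyses the computational complexity of key decision problems in MAIDs, explores tractable cases, and describes applications to Markov games and team situations, where imperfect recall is often unavoidable.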

Causal Kripke Models

  • paper_url: http://arxiv.org/abs/2307.05631
  • repo_url: None
  • paper_authors: Yiwen Ding, Krishna Manoorkar, Apostolos Tzimoulis, Ruoding Wang, Xiaolong Wang
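  • for: Extending Halpern and Pearl's causal models for actual causality to a possible world semantics environment.
  • methods: Within this framework the paper introduces a logic of actual causality with modal operators, which supports causal reasoning in scenarios involving multiple possibilities, temporality, knowledge, and uncertainty.
  • results: The logic is illustrated with a number of examples, and directions for future research are discussed.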
    Abstract This work extends Halpern and Pearl's causal models for actual causality to a possible world semantics environment. Using this framework we introduce a logic of actual causality with modal operators, which allows for reasoning about causality in scenarios involving multiple possibilities, temporality, knowledge and uncertainty. We illustrate this with a number of examples, and conclude by discussing some future directions for research.
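    Summary This work extends Halpern and Pearl's causal models for actual causality to a possible world semantics environment. Using this framework, it introduces a logic of actual causality with modal operators, which allows reasoning about causality in scenarios involving multiple possibilities, temporality, knowledge, and uncertainty. The approach is illustrated with a number of examples, and the paper closes with a discussion of future research directions.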

Characterization of AGM Belief Contraction in Terms of Conditionals

  • paper_url: http://arxiv.org/abs/2307.05629
  • repo_url: None
  • paper_authors: Giacomo Bonanno
  • for: Providing a semantic characterization of AGM belief contraction.
  • methods: Frames consisting of a Kripke belief relation and a Stalnaker-Lewis selection function.
  • results: AGM belief contraction is characterized semantically in terms of what the agent believes and which conditionals the agent believes at the actual state.
    Abstract We provide a semantic characterization of AGM belief contraction based on frames consisting of a Kripke belief relation and a Stalnaker-Lewis selection function. The central idea is as follows. Let K be the initial belief set and K-A be the contraction of K by the formula A; then B belongs to the set K-A if and only if, at the actual state, the agent believes B and believes that if not-A is (were) the case then B is (would be) the case.
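    Summary The paper provides a semantic characterization of AGM belief contraction based on frames consisting of a Kripke belief relation and a Stalnaker-Lewis selection function. The central idea is as follows: let K be the initial belief set and K-A the contraction of K by the formula A; then B belongs to K-A if and only if, at the actual state, the agent believes B and believes that if not-A is (were) the case then B is (would be) the case.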
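    The central idea can be restated in symbols. The notation is an assumption made here for illustration and is not taken from the paper: $K \div A$ for the contraction written K-A in the abstract, $\mathbb{B}$ for the belief operator induced by the Kripke relation, $>$ for the conditional induced by the Stalnaker-Lewis selection function, and $w$ for the actual state. The `\mathbb` command requires the `amssymb` package.

```latex
\[
  B \in K \div A
  \quad\Longleftrightarrow\quad
  w \models \mathbb{B}\,B \;\wedge\; \mathbb{B}\,(\neg A > B)
\]
```

    Read this way, contraction keeps exactly those beliefs that the agent also takes to survive the supposition that A is false.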

Strengthening Consistency Results in Modal Logic

  • paper_url: http://arxiv.org/abs/2307.05053
  • repo_url: None
  • paper_authors: Samuel Allen Alexander, Arthur Paul Pedersen
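  • for: Studying consistency questions in modal logic.
  • methods: The paper introduces generic theories for propositional modal logic and uses them as building blocks for background knowledge.
  • results: Generic theories provide a standard for categorical determinations of consistency, and the authors argue that the results bear on problems about modalities in judgement, inference, and decision making.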
    Abstract A fundamental question asked in modal logic is whether a given theory is consistent. But consistent with what? A typical way to address this question identifies a choice of background knowledge axioms (say, S4, D, etc.) and then shows the assumptions codified by the theory in question to be consistent with those background axioms. But determining the specific choice and division of background axioms is, at least sometimes, little more than tradition. This paper introduces **generic theories** for propositional modal logic to address consistency results in a more robust way. As building blocks for background knowledge, generic theories provide a standard for categorical determinations of consistency. We argue that the results and methods of this paper help to elucidate problems in epistemology and enjoy sufficient scope and power to have purchase on problems bearing on modalities in judgement, inference, and decision making.
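    Summary A fundamental question in modal logic is whether a given theory is consistent, but consistent with what? A typical answer fixes a choice of background knowledge axioms (say, S4, D, etc.) and shows that the theory's assumptions are consistent with them, although the specific choice and division of background axioms is, at least sometimes, little more than tradition. This paper introduces generic theories for propositional modal logic to obtain consistency results in a more robust way. As building blocks for background knowledge, generic theories provide a standard for categorical determinations of consistency. The authors argue that the results help to elucidate problems in epistemology and have purchase on problems bearing on modalities in judgement, inference, and decision making.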

Towards Understanding In-Context Learning with Contrastive Demonstrations and Saliency Maps

  • paper_url: http://arxiv.org/abs/2307.05052
  • repo_url: https://github.com/paihengxu/xicl
  • paper_authors: Zongxia Li, Paiheng Xu, Fuxiao Liu, Hyemi Song
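  • for: Investigating the role of different demonstration components in the in-context learning (ICL) performance of large language models (LLMs).
  • methods: Explainable NLP (XNLP) methods, using saliency maps of contrastive demonstrations for qualitative and quantitative analysis.
  • results: Flipping ground-truth labels significantly affects the saliency, more noticeably in larger LLMs. Changing sentiment-indicative terms to neutral ones has a smaller impact than altering ground-truth labels. Complementary explanations help in symbolic reasoning tasks but bring limited benefits in sentiment analysis tasks.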
    Abstract We investigate the role of various demonstration components in the in-context learning (ICL) performance of large language models (LLMs). Specifically, we explore the impacts of ground-truth labels, input distribution, and complementary explanations, particularly when these are altered or perturbed. We build on previous work, which offers mixed findings on how these elements influence ICL. To probe these questions, we employ explainable NLP (XNLP) methods and utilize saliency maps of contrastive demonstrations for both qualitative and quantitative analysis. Our findings reveal that flipping ground-truth labels significantly affects the saliency, though it's more noticeable in larger LLMs. Our analysis of the input distribution at a granular level reveals that changing sentiment-indicative terms in a sentiment analysis task to neutral ones does not have as substantial an impact as altering ground-truth labels. Finally, we find that the effectiveness of complementary explanations in boosting ICL performance is task-dependent, with limited benefits seen in sentiment analysis tasks compared to symbolic reasoning tasks. These insights are critical for understanding the functionality of LLMs and guiding the development of effective demonstrations, which is increasingly relevant in light of the growing use of LLMs in applications such as ChatGPT. Our research code is publicly available at https://github.com/paihengxu/XICL.
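    Summary The paper investigates the role of various demonstration components in the in-context learning (ICL) performance of large language models (LLMs). It explores the impact of ground-truth labels, input distribution, and complementary explanations, particularly when these are altered or perturbed, building on previous work with mixed findings. The authors use explainable NLP (XNLP) methods and saliency maps of contrastive demonstrations for both qualitative and quantitative analysis. Flipping ground-truth labels significantly affects the saliency, more noticeably in larger LLMs. At a granular level, changing sentiment-indicative terms to neutral ones in a sentiment analysis task has a less substantial impact than altering ground-truth labels. The benefit of complementary explanations is task-dependent, with limited gains in sentiment analysis compared to symbolic reasoning. These insights help in understanding LLMs and in designing effective demonstrations, which matters given the growing use of LLMs in applications such as ChatGPT. The research code is publicly available at https://github.com/paihengxu/XICL.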

Depth-bounded Epistemic Logic

  • paper_url: http://arxiv.org/abs/2307.07448
  • repo_url: None
  • paper_authors: Farid Arthaud, Martin Rinard
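  • for: Modelling agents that can reason about their own beliefs and the beliefs of other agents only up to a bounded modal depth.
  • methods: Epistemic logics, which model how agents reason about beliefs, extended with explicit depth atoms.
  • results: The paper presents DBEL, an extension of S5 for agents that reason about epistemic formulas only up to a specific modal depth, with a sound and complete axiomatization. It extends DBEL with public announcements, yielding DPAL, and applies these logics to study how bounded-depth agents reason in the classical muddy children problem.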
    Abstract Epistemic logics model how agents reason about their beliefs and the beliefs of other agents. Existing logics typically assume the ability of agents to reason perfectly about propositions of unbounded modal depth. We present DBEL, an extension of S5 that models agents that can reason about epistemic formulas only up to a specific modal depth. To support explicit reasoning about agent depths, DBEL includes depth atoms Ead (agent a has depth exactly d) and Pad (agent a has depth at least d). We provide a sound and complete axiomatization of DBEL. We extend DBEL to support public announcements for bounded depth agents and show how the resulting DPAL logic generalizes standard axioms from public announcement logic. We present two alternate extensions and identify two undesirable properties, amnesia and knowledge leakage, that these extensions have but DPAL does not. We provide axiomatizations of these logics as well as complexity results for satisfiability and model checking. Finally, we use these logics to illustrate how agents with bounded modal depth reason in the classical muddy children problem, including upper and lower bounds on the depth knowledge necessary for agents to successfully solve the problem.
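    Summary Epistemic logics model how agents reason about their beliefs and the beliefs of other agents. Existing logics typically assume that agents reason perfectly about propositions of unbounded modal depth. The paper presents DBEL, an extension of S5 that models agents that can reason about epistemic formulas only up to a specific modal depth. To support explicit reasoning about agent depths, DBEL includes depth atoms Ead (agent a has depth exactly d) and Pad (agent a has depth at least d), and the paper gives a sound and complete axiomatization. DBEL is extended to support public announcements for bounded depth agents, and the resulting DPAL logic generalizes standard axioms from public announcement logic. Two alternate extensions are presented, together with two undesirable properties, amnesia and knowledge leakage, that they have but DPAL does not. The paper provides axiomatizations of these logics and complexity results for satisfiability and model checking. Finally, the logics are used to show how agents with bounded modal depth reason in the classical muddy children problem, including upper and lower bounds on the depth knowledge necessary to solve it.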
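    As background for the muddy children problem mentioned above, the following sketch simulates the classical version with unbounded-depth reasoners through repeated public announcements on a set of possible worlds. It does not implement DBEL or DPAL, and depth bounds are not modelled; the function name and the world encoding (a tuple of 0/1 flags, 1 meaning muddy) are assumptions chosen for illustration.

```python
from itertools import product
from typing import Set, Tuple

World = Tuple[int, ...]


def rounds_until_known(actual: World) -> int:
    """Return the round in which some child first knows its own status."""
    if not any(actual):
        raise ValueError("the puzzle assumes at least one muddy child")
    n = len(actual)
    # The father's announcement removes the world where nobody is muddy.
    worlds: Set[World] = {w for w in product((0, 1), repeat=n) if any(w)}

    def knows(i: int, w: World, ws: Set[World]) -> bool:
        # Child i sees everyone else, so its only alternative to w is the
        # world that differs from w in position i.
        flipped = w[:i] + (1 - w[i],) + w[i + 1:]
        return flipped not in ws

    rounds = 1
    while not any(knows(i, actual, worlds) for i in range(n)):
        # Public announcement "nobody knows": drop every world in which
        # some child would already know its status.
        worlds = {
            w for w in worlds
            if not any(knows(i, w, worlds) for i in range(n))
        }
        rounds += 1
    return rounds
```

    With k muddy children the muddy ones first know their status in round k, for example round 2 for the world `(1, 1, 0)`.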

Epistemic Syllogistic: First Steps

  • paper_url: http://arxiv.org/abs/2307.05043
  • repo_url: None
  • paper_authors: Yipu Li, Yanjing Wang
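  • for: Revisiting Aristotle's modal syllogistic from a contemporary standpoint, as a source of natural fragments of first-order modal logic.
  • methods: Inspired by the natural logic program, the paper proposes and examines several variants of modal syllogistic in the epistemic context, including the epistemic apodeictic syllogistic and its extensions.
  • results: Several axiomatizations of these logics, with completeness proofs that may be of independent interest.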
    Abstract Aristotle's discussions on modal syllogistic have often been viewed as error-prone and have garnered significant attention in the literature due to historical and philosophical interests. However, from a contemporary standpoint, they also introduced natural fragments of first-order modal logic, warranting a comprehensive technical analysis. In this paper, drawing inspiration from the natural logic program, we propose and examine several variants of modal syllogistic within the epistemic context, thereby coining the term Epistemic Syllogistic. Specifically, we concentrate on the de re interpretation of epistemic syllogisms containing non-trivial yet natural expressions such as "all things known to be A are also known to be not B." We explore the epistemic apodeictic syllogistic and its extensions, which accommodate more complex terms. Our main contributions include several axiomatizations of these logics, with completeness proofs that may be of independent interest.
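    Summary Aristotle's discussions of modal syllogistic have often been viewed as error-prone and have attracted attention for historical and philosophical reasons. From a contemporary standpoint, they also introduced natural fragments of first-order modal logic that warrant a thorough technical analysis. Drawing inspiration from the natural logic program, the paper proposes and examines several variants of modal syllogistic in the epistemic context, coining the term Epistemic Syllogistic. It concentrates on the de re interpretation of epistemic syllogisms containing non-trivial yet natural expressions such as "all things known to be A are also known to be not B." It explores the epistemic apodeictic syllogistic and its extensions, which accommodate more complex terms. The main contributions are several axiomatizations of these logics, with completeness proofs that may be of independent interest.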

Neural-Symbolic Recommendation with Graph-Enhanced Information

  • paper_url: http://arxiv.org/abs/2307.05036
  • repo_url: https://github.com/hanzo2020/gnnlr
  • paper_authors: Bang Chen, Wei Peng, Maonian Wu, Bo Zheng, Shaojun Zhu
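  • for: Building a recommendation model that combines graph neural networks with propositional logic operations, so that it has both global implicit reasoning ability and local explicit logic reasoning ability.
  • methods: The model first builds an item-item graph based on the principle of adjacent interaction and uses graph neural networks to capture implicit information in global data. It then transforms user behavior into propositional logic expressions to make recommendations from the perspective of cognitive reasoning.
  • results: Extensive experiments on five public datasets show that the proposed model outperforms several state-of-the-art methods. Source code is available at [https://github.com/hanzo2020/GNNLR].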
    Abstract The recommendation system is not only a problem of inductive statistics from data but also a cognitive task that requires reasoning ability. The most advanced graph neural networks have been widely used in recommendation systems because they can capture implicit structured information from graph-structured data. However, like most neural network algorithms, they only learn matching patterns from a perception perspective. Some researchers use user behavior for logic reasoning to achieve recommendation prediction from the perspective of cognitive reasoning, but this kind of reasoning is a local one and ignores implicit information on a global scale. In this work, we combine the advantages of graph neural networks and propositional logic operations to construct a neuro-symbolic recommendation model with both global implicit reasoning ability and local explicit logic reasoning ability. We first build an item-item graph based on the principle of adjacent interaction and use graph neural networks to capture implicit information in global data. Then we transform user behavior into propositional logic expressions to achieve recommendations from the perspective of cognitive reasoning. Extensive experiments on five public datasets show that our proposed model outperforms several state-of-the-art methods. Source code is available at [https://github.com/hanzo2020/GNNLR].
    摘要 “推荐系统不仅是从数据中的对照学习,也是一个认知任务,需要认知能力。现在最进步的图 neural network 已经广泛地应用在推荐系统中,因为它们可以从图结构数据中捕捉到隐藏的构造资讯。然而, LIKE 多个神经网络算法,它们只会从视觉角度学习匹配模式。一些研究人员使用用户行为进行逻辑推理来实现推荐预测,但这种逻辑是局部的,忽略了全球规模上的隐藏信息。在这个工作中,我们结合了图神经网络和符号逻辑操作的优点,建立了一个具有全球隐藏推理能力和局部明确逻辑推理能力的神经符号推荐模型。我们首先建立了一个项目项目图,根据邻接互动原则,使用图神经网络来捕捉全球数据中的隐藏信息。然后,我们将用户行为转换为符号逻辑表达,以实现从认知角度的推荐预测。实验结果显示,我们的提出模型在五个公开数据集上比以前的多个状态之顶,源代码可以在 [https://github.com/hanzo2020/GNNLR] 查看。”Note that the translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China and Singapore. If you need Traditional Chinese, please let me know.
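    The item-item graph step can be illustrated with a minimal sketch. It assumes that "adjacent interaction" means two items appearing next to each other in a user's interaction sequence, which the abstract does not spell out; the function name and the toy sequences are hypothetical and are not taken from the GNNLR repository.

```python
from collections import defaultdict


def build_item_graph(sequences):
    """Build an undirected, weighted item-item graph.

    Assumption: two items are linked when they are adjacent in a user's
    interaction sequence; the edge weight counts how often that happens.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for left, right in zip(seq, seq[1:]):
            if left == right:
                continue  # skip self-loops caused by repeated interactions
            graph[left][right] += 1
            graph[right][left] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {item: dict(neighbors) for item, neighbors in graph.items()}


# Hypothetical toy interaction histories, one list per user.
sequences = [["a", "b", "c"], ["b", "c", "d"], ["a", "b"]]
item_graph = build_item_graph(sequences)
```

    A graph neural network would then propagate item embeddings over this adjacency structure; that part is omitted here because the abstract gives no architectural details.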

Unleashing the Potential of Regularization Strategies in Learning with Noisy Labels

  • paper_url: http://arxiv.org/abs/2307.05025
  • repo_url: None
  • paper_authors: Hui Kang, Sheng Liu, Huaxi Huang, Jun Yu, Bo Han, Dadong Wang, Tongliang Liu
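  • for: In recent years, research on learning with noisy labels has focused mainly on developing novel algorithms that are robust to noisy training labels while generalizing to clean data.
  • methods: This study uses the cross-entropy loss combined with widely used regularization strategies such as learning rate decay, model weight averaging, and data augmentation.
  • results: The results show that a combination of these regularization strategies can outperform current state-of-the-art methods. The findings encourage a reevaluation of benchmarks for learning with noisy labels and a reconsideration of specialized learning algorithms designed for training with noisy labels.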
    Abstract In recent years, research on learning with noisy labels has focused on devising novel algorithms that can achieve robustness to noisy training labels while generalizing to clean data. These algorithms often incorporate sophisticated techniques, such as noise modeling, label correction, and co-training. In this study, we demonstrate that a simple baseline using cross-entropy loss, combined with widely used regularization strategies like learning rate decay, model weights average, and data augmentations, can outperform state-of-the-art methods. Our findings suggest that employing a combination of regularization strategies can be more effective than intricate algorithms in tackling the challenges of learning with noisy labels. While some of these regularization strategies have been utilized in previous noisy label learning research, their full potential has not been thoroughly explored. Our results encourage a reevaluation of benchmarks for learning with noisy labels and prompt reconsideration of the role of specialized learning algorithms designed for training with noisy labels.
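    The ingredients of the baseline can be sketched in a few lines. The abstract names learning rate decay and model weight averaging but not their exact form, so the cosine schedule and the exponential moving average below are assumptions chosen for illustration, and all function names are hypothetical.

```python
import math


def cosine_lr(step, total_steps, base_lr=0.1, min_lr=0.0):
    """Cosine learning-rate decay from base_lr down to min_lr (assumed schedule)."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def ema_update(avg_weights, weights, decay=0.99):
    """Exponential moving average of model weights (one assumed form of weight averaging)."""
    return [decay * a + (1.0 - decay) * w for a, w in zip(avg_weights, weights)]


def cross_entropy(probs, label):
    """Cross-entropy loss for one example, given predicted class probabilities."""
    return -math.log(max(probs[label], 1e-12))
```

    In a training loop, `cosine_lr` would set the optimizer step size at each iteration, and `ema_update` would maintain a smoothed copy of the weights that is used for evaluation.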

Feature Activation Map: Visual Explanation of Deep Learning Models for Image Classification

  • paper_url: http://arxiv.org/abs/2307.05017
  • repo_url: None
  • paper_authors: Yi Liao, Yongsheng Gao, Weichuan Zhang
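  • for: This work aims to provide an effective method for interpreting deep learning models, making their decisions easier to understand and explain.
  • methods: The paper proposes a post-hoc interpretation algorithm named feature activation map (FAM), which can interpret image classification models that have no fully-connected layer as a classifier.
  • results: On ten deep learning models for few-shot image classification, contrastive learning image classification, and image retrieval tasks, the proposed FAM algorithm effectively explains model decisions.
    Abstract Decisions made by convolutional neural networks (CNN) can be understood and explained by visualizing discriminative regions on images. To this end, Class Activation Map (CAM) based methods were proposed as powerful interpretation tools, making the prediction of deep learning models more explainable, transparent, and trustworthy. However, all the CAM-based methods (e.g., CAM, Grad-CAM, and Relevance-CAM) can only be used for interpreting CNN models with fully-connected (FC) layers as a classifier. It is worth noting that many deep learning models classify images without FC layers, e.g., few-shot learning image classification, contrastive learning image classification, and image retrieval tasks. In this work, a post-hoc interpretation tool named feature activation map (FAM) is proposed, which can interpret deep learning models without FC layers as a classifier. In the proposed FAM algorithm, the channel-wise contribution weights are derived from the similarity scores between two image embeddings. The activation maps are linearly combined with the corresponding normalized contribution weights, forming the explanation map for visualization. The quantitative and qualitative experiments conducted on ten deep learning models for few-shot image classification, contrastive learning image classification and image retrieval tasks demonstrate the effectiveness of the proposed FAM algorithm.
    Summary The decisions of deep learning models can be understood and explained by visualizing discriminative image regions. To this end, Class Activation Map (CAM) based methods were proposed, making the predictions of deep learning models more explainable, transparent, and trustworthy. However, all CAM-based methods (e.g., CAM, Grad-CAM, and Relevance-CAM) can only interpret models that use fully-connected (FC) layers as a classifier. They cannot interpret models without FC layers, such as those for few-shot image classification, contrastive learning image classification, and image retrieval. For this setting, a post-hoc interpretation tool named feature activation map (FAM) is proposed. In the FAM algorithm, channel-wise contribution weights are derived from the similarity scores between two image embeddings. The activation maps are then linearly combined with the corresponding normalized contribution weights to form the explanation map. Quantitative and qualitative experiments on ten deep learning models for few-shot image classification, contrastive learning image classification, and image retrieval demonstrate the effectiveness of the proposed FAM algorithm.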
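    The weighting idea can be sketched as follows. The abstract does not give the exact formula, so this example assumes that the query embedding is the global average pool of each activation map and that similarity is a dot product, which makes the contribution of channel c equal to q[c] * s[c]. The function name and the L1 normalization are likewise assumptions, not the authors' definition.

```python
def feature_activation_map(query_maps, support_embedding):
    """Combine activation maps using similarity-derived channel weights.

    query_maps: list of C activation maps, each an H x W list of lists.
    support_embedding: list of C floats for the image being compared against.
    Assumptions: embedding = global average pool; similarity = dot product.
    """
    # Global average pooling turns each activation map into one embedding entry.
    query_embedding = [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in query_maps
    ]
    # Per-channel terms of the dot-product similarity.
    contributions = [q * s for q, s in zip(query_embedding, support_embedding)]
    total = sum(abs(c) for c in contributions) or 1.0
    weights = [c / total for c in contributions]  # normalized contribution weights

    height, width = len(query_maps[0]), len(query_maps[0][0])
    explanation = [[0.0] * width for _ in range(height)]
    for weight, fmap in zip(weights, query_maps):
        for i in range(height):
            for j in range(width):
                explanation[i][j] += weight * fmap[i][j]
    return explanation


# Hypothetical 2-channel, 2x2 example: only channel 0 matches the support image.
toy_maps = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
toy_explanation = feature_activation_map(toy_maps, [1.0, 0.0])
```

    Because no classifier weights are involved, the same recipe applies to any model that compares two embeddings, which is the setting the paper targets.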

CILF:Causality Inspired Learning Framework for Out-of-Distribution Vehicle Trajectory Prediction

  • paper_url: http://arxiv.org/abs/2307.05624
  • repo_url: None
  • paper_authors: Shengyi Li, Qifan Xue, Yezhuo Zhang, Xuanpeng Li
  • for: This work aims to improve the accuracy of vehicle trajectory prediction for autonomous driving, particularly across different driving scenarios and environments.
  • methods: The work proposes an Out-of-Distribution Causal Graph (OOD-CG), built on a causal graph, to address existing methods' reliance on correlations. It also proposes a Causal Inspired Learning Framework (CILF) with three steps: 1) extracting the domain-invariant causal feature, 2) extracting the domain-variant feature, and 3) separating the domain-variant causal and non-causal features.
  • results: Experiments show that CILF improves prediction performance on the mainstream NGSIM and INTERACTION datasets, particularly across different driving scenarios and environments.
    Abstract Trajectory prediction is critical for autonomous driving vehicles. Most existing methods tend to model the correlation between history trajectory (input) and future trajectory (output). Since correlation is just a superficial description of reality, these methods rely heavily on the i.i.d. assumption and evince a heightened susceptibility to out-of-distribution data. To address this problem, we propose an Out-of-Distribution Causal Graph (OOD-CG), which explicitly defines the underlying causal structure of the data with three entangled latent features: 1) domain-invariant causal feature (IC), 2) domain-variant causal feature (VC), and 3) domain-variant non-causal feature (VN). These features are confounded by a confounder (C) and a domain selector (D). To leverage causal features for prediction, we propose a Causal Inspired Learning Framework (CILF), which includes three steps: 1) extracting domain-invariant causal feature by means of an invariance loss, 2) extracting domain variant feature by domain contrastive learning, and 3) separating domain-variant causal and non-causal feature by encouraging causal sufficiency. We evaluate the performance of CILF in different vehicle trajectory prediction models on the mainstream datasets NGSIM and INTERACTION. Experiments show promising improvements in CILF on domain generalization.
    Summary Trajectory prediction is an important task for autonomous vehicles. Existing methods usually model the correlation between the history trajectory (input) and the future trajectory (output). Because correlation is only a superficial description of reality, these methods perform poorly on out-of-distribution data. To address this problem, we propose an Out-of-Distribution Causal Graph (OOD-CG), which explicitly defines the underlying causal structure of the data with three entangled latent features: 1) the domain-invariant causal feature (IC), 2) the domain-variant causal feature (VC), and 3) the domain-variant non-causal feature (VN). These features are confounded by a confounder (C) and a domain selector (D). To use causal features for prediction, we propose a Causal Inspired Learning Framework (CILF) with three steps: 1) extracting the domain-invariant causal feature with an invariance loss, 2) extracting the domain-variant feature with domain contrastive learning, and 3) separating the domain-variant causal and non-causal features by encouraging causal sufficiency. Evaluated on the mainstream NGSIM and INTERACTION datasets, CILF shows clear improvements.

A DeepLearning Framework for Dynamic Estimation of Origin-Destination Sequence

  • paper_url: http://arxiv.org/abs/2307.05623
  • repo_url: None
  • paper_authors: Zheli Xiong, Defu Lian, Enhong Chen, Gang Chen, Xiaomin Cheng
  • for: Estimating the origin-destination (OD) matrix of traffic demand, a critical problem in the transportation domain.
  • methods: Deep learning methods infer the structure of the OD sequence, and the resulting structural constraints guide traditional numerical optimization.
  • results: The neural network effectively infers the structure of the OD sequence and provides practical constraints for numerical optimization, which handles the lag problem well.
    Abstract OD matrix estimation is a critical problem in the transportation domain. The principal method uses information measured by traffic sensors, such as traffic counts, to estimate the traffic demand represented by the OD matrix. The problem is divided into two categories: static OD matrix estimation and dynamic OD matrices sequence (OD sequence for short) estimation. Both face the underdetermination problem caused by abundant estimated parameters and insufficient constraint information. In addition, OD sequence estimation also faces the lag challenge: due to different traffic conditions such as congestion, the same vehicle can appear on different road sections during the same observation period, so that identical OD demands correspond to different trips. To this end, this paper proposes an integrated method, which uses deep learning methods to infer the structure of OD sequence and uses structural constraints to guide traditional numerical optimization. Our experiments show that the neural network (NN) can effectively infer the structure of the OD sequence and provide practical constraints for numerical optimization to obtain better results. Moreover, the experiments show that the provided structural information contains not only constraints on the spatial structure of OD matrices but also constraints on the temporal structure of the OD sequence, which addresses the lagging problem well.
    Summary OD matrix estimation is a core problem in the transportation domain. The main approach uses information measured by traffic sensors, such as traffic counts, to estimate the traffic demand represented by the OD matrix. The problem falls into two categories: static OD matrix estimation and dynamic OD sequence estimation. Both are underdetermined, because there are many parameters to estimate and too little constraint information. OD sequence estimation also faces a lag challenge: under different traffic conditions such as congestion, the same vehicle can appear on different road sections within one observation period, so identical OD demands correspond to different trips. This paper therefore proposes an integrated method that uses deep learning to infer the structure of the OD sequence and uses structural constraints to guide traditional numerical optimization. Experiments show that the neural network infers the structure of the OD sequence effectively and provides practical constraints that guide the numerical optimization. They also show that the structural information constrains not only the spatial structure of the OD matrices but also the temporal structure of the OD sequence, which addresses the lag problem well.

Control as Probabilistic Inference as an Emergent Communication Mechanism in Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.05004
  • repo_url: None
  • paper_authors: Tomoaki Nakamura, Akira Taniguchi, Tadahiro Taniguchi
  • for: This paper proposes a generative probabilistic model that integrates emergent communication and multi-agent reinforcement learning.
  • methods: Agents plan their actions by probabilistic inference (control as inference) and communicate with each other through messages.
  • results: Experiments in a grid world environment show that the proposed PGM can infer meaningful messages to accomplish the cooperative task.
    Abstract This paper proposes a generative probabilistic model integrating emergent communication and multi-agent reinforcement learning. The agents plan their actions by probabilistic inference, called control as inference, and communicate using messages that are latent variables and estimated based on the planned actions. Through these messages, each agent can send information about its actions and know information about the actions of another agent. Therefore, the agents change their actions according to the estimated messages to achieve cooperative tasks. This inference of messages can be considered as communication, and this procedure can be formulated by the Metropolis-Hastings naming game. Through experiments in the grid world environment, we show that the proposed PGM can infer meaningful messages to achieve the cooperative task.

Selective Sampling and Imitation Learning via Online Regression

  • paper_url: http://arxiv.org/abs/2307.04998
  • repo_url: None
  • paper_authors: Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu
  • for: This paper studies Imitation Learning (IL), specifically when only noisy expert feedback is available.
  • methods: The paper uses selective sampling to actively query the noisy expert for feedback.
  • results: The paper gives a new selective sampling algorithm that works with general function classes and multiple actions and obtains the best-known bounds on the regret and the number of queries. It also provides an IL algorithm based on function approximation whose regret and query bounds depend only on how often the optimal policy visits states with a small margin.
    Abstract We consider the problem of Imitation Learning (IL) by actively querying a noisy expert for feedback. While imitation learning has been empirically successful, much of prior work assumes access to noiseless expert feedback which is not practical in many applications. In fact, when one only has access to noisy expert feedback, algorithms that rely on purely offline data (non-interactive IL) can be shown to need a prohibitively large number of samples to be successful. In contrast, in this work, we provide an interactive algorithm for IL that uses selective sampling to actively query the noisy expert for feedback. Our contributions are twofold: First, we provide a new selective sampling algorithm that works with general function classes and multiple actions, and obtains the best-known bounds for the regret and the number of queries. Next, we extend this analysis to the problem of IL with noisy expert feedback and provide a new IL algorithm that makes limited queries. Our algorithm for selective sampling leverages function approximation, and relies on an online regression oracle w.r.t. the given model class to predict actions, and to decide whether to query the expert for its label. On the theoretical side, the regret bound of our algorithm is upper bounded by the regret of the online regression oracle, while the query complexity additionally depends on the eluder dimension of the model class. We complement this with a lower bound that demonstrates that our results are tight. We extend our selective sampling algorithm for IL with general function approximation and provide bounds on both the regret and the number of queries made to the noisy expert. A key novelty here is that our regret and query complexity bounds only depend on the number of times the optimal policy (and not the noisy expert, or the learner) goes to states that have a small margin.
    Summary We consider the problem of Imitation Learning (IL) by actively querying a noisy expert for feedback. Although imitation learning has been empirically successful, much prior work assumes noiseless expert feedback, which is impractical in many applications. In fact, with only noisy expert feedback, algorithms that rely on purely offline data (non-interactive IL) can be shown to need a prohibitively large number of samples. In contrast, this work provides an interactive IL algorithm that uses selective sampling to actively query the noisy expert. The contributions are twofold. First, a new selective sampling algorithm works with general function classes and multiple actions and obtains the best-known bounds on the regret and the number of queries. Second, the analysis is extended to IL with noisy expert feedback, giving a new IL algorithm that makes limited queries. The selective sampling algorithm leverages function approximation and relies on an online regression oracle to predict actions and to decide whether to query the expert. On the theoretical side, the regret bound is upper bounded by the regret of the online regression oracle, while the query complexity additionally depends on the eluder dimension of the model class. A lower bound shows that these results are tight. Finally, the regret and query complexity bounds for IL depend only on the number of times the optimal policy (and not the noisy expert or the learner) visits states that have a small margin.
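    The "predict, then decide whether to query" loop can be illustrated with a deliberately simplified rule. The paper's actual query condition is tied to the online regression oracle and the eluder dimension and is not given in the abstract; the fixed margin threshold below is an assumption made for illustration, and the function name is hypothetical.

```python
def select_action_and_query(scores, threshold):
    """Pick the top-scoring action and decide whether to ask the expert.

    scores: per-action scores predicted by a (hypothetical) online regression oracle.
    threshold: assumed uncertainty width; a gap between the two best scores that
    is at most this value is treated as "not confident enough".
    """
    ranked = sorted(range(len(scores)), key=lambda a: scores[a], reverse=True)
    best = ranked[0]
    if len(ranked) > 1:
        margin = scores[best] - scores[ranked[1]]
    else:
        margin = float("inf")  # a single action never needs a query
    should_query = margin <= threshold
    return best, should_query
```

    With a rule of this kind the learner spends expert queries only on small-margin states, which is the quantity the paper's bounds are stated in terms of.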

Empowering recommender systems using automatically generated Knowledge Graphs and Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.04996
  • repo_url: https://github.com/GhanshyamVerma/Explainable-Recommender-System
  • paper_authors: Ghanshyam Verma, Shovon Sengupta, Simon Simanta, Huan Chen, Janos A. Perge, Devishree Pillai, John P. McCrae, Paul Buitelaar
  • for: This study aims to enhance customer experience and the quality of customer relationship management through knowledge graph (KG) applications.
  • methods: The study uses two knowledge graph-based approaches: one based on Reinforcement Learning and one based on the XGBoost algorithm. Both use a KG generated from structured and unstructured data.
  • results: The study shows that KG-driven advanced machine learning techniques can produce more explainable results, which promotes better decision-making. It also underscores the potential of combining advanced machine learning techniques with KG-driven insights in customer relationship management.
    Abstract Personalized recommendations have a growing importance in direct marketing, which motivates research to enhance customer experiences by knowledge graph (KG) applications. For example, in financial services, companies may benefit from providing relevant financial articles to their customers to cultivate relationships, foster client engagement and promote informed financial decisions. While several approaches center on KG-based recommender systems for improved content, in this study we focus on interpretable KG-based recommender systems for decision making. To this end, we present two knowledge graph-based approaches for personalized article recommendations for a set of customers of a large multinational financial services company. The first approach employs Reinforcement Learning and the second approach uses the XGBoost algorithm for recommending articles to the customers. Both approaches make use of a KG generated from both structured (tabular data) and unstructured data (a large body of text data). Using the Reinforcement Learning-based recommender system we could leverage the graph traversal path leading to the recommendation as a way to generate interpretations (Path Directed Reasoning (PDR)). In the XGBoost-based approach, one can also provide explainable results using post-hoc methods such as SHAP (SHapley Additive exPlanations) and ELI5 (Explain Like I am Five). Importantly, our approach offers explainable results, promoting better decision-making. This study underscores the potential of combining advanced machine learning techniques with KG-driven insights to bolster experience in customer relationship management.
    Summary Personalized recommendations are increasingly important in direct marketing, which motivates research on enhancing customer experience through knowledge graph (KG) applications. In financial services, for example, companies may benefit from providing relevant financial articles to their customers to cultivate relationships, foster client engagement, and promote informed financial decisions. While many approaches center on KG-based recommender systems, this study focuses on interpretable KG-based recommender systems that support decision making. To this end, we present two knowledge graph-based approaches for personalized article recommendation. The first uses Reinforcement Learning and leverages the graph traversal path leading to a recommendation to generate interpretations (Path Directed Reasoning). The second uses the XGBoost algorithm with post-hoc methods such as SHAP and ELI5 to provide explainable results. Importantly, the approach offers explainable results, promoting better decision-making. The study shows that combining advanced machine learning techniques with KG-driven insights can improve the customer relationship management experience.

Monotone deep Boltzmann machines

  • paper_url: http://arxiv.org/abs/2307.04990
  • repo_url: None
  • paper_authors: Zhili Feng, Ezra Winston, J. Zico Kolter
  • for: This paper studies a possible extension of deep Boltzmann machines (DBMs): the monotone DBM, which allows arbitrary self-connection in each layer while still enabling efficient approximate inference.
  • methods: The paper leverages tools from the monotone Deep Equilibrium model and shows that a particular choice of activation yields a fixed-point iteration that gives a variational mean-field solution.
  • results: The approach applies to deep convolutional Boltzmann architectures and supports joint completion and classification of images, while avoiding the pitfalls of mean-field inference in traditional RBMs.
    Abstract Deep Boltzmann machines (DBMs), one of the first "deep" learning methods ever studied, are multi-layered probabilistic models governed by a pairwise energy function that describes the likelihood of all variables/nodes in the network. In practice, DBMs are often constrained, i.e., via the restricted Boltzmann machine (RBM) architecture (which does not permit intra-layer connections), in order to allow for more efficient inference. In this work, we revisit the generic DBM approach, and ask the question: are there other possible restrictions to their design that would enable efficient (approximate) inference? In particular, we develop a new class of restricted model, the monotone DBM, which allows for arbitrary self-connection in each layer, but restricts the weights in a manner that guarantees the existence and global uniqueness of a mean-field fixed point. To do this, we leverage tools from the recently-proposed monotone Deep Equilibrium model and show that a particular choice of activation results in a fixed-point iteration that gives a variational mean-field solution. While this approach is still largely conceptual, it is the first architecture that allows for efficient approximate inference in fully-general weight structures for DBMs. We apply this approach to simple deep convolutional Boltzmann architectures and demonstrate that it allows for tasks such as the joint completion and classification of images, within a single deep probabilistic setting, while avoiding the pitfalls of mean-field inference in traditional RBMs.

Epidemic Modeling with Generative Agents

  • paper_url: http://arxiv.org/abs/2307.04986
  • repo_url: https://github.com/bear96/gabm-epidemic
  • paper_authors: Ross Williams, Niyousha Hosseinichimeh, Aritra Majumdar, Navid Ghaffarzadegan
  • for: This study aims to address the grand challenge of incorporating human behavior in epidemic models by offering a new paradigm of individual-level modeling.
  • methods: The study uses generative artificial intelligence in an agent-based epidemic model, where each agent is empowered to make its own reasonings and decisions via connecting to a large language model such as ChatGPT.
  • results: Through various simulation experiments, the study presents compelling evidence that generative agents mimic real-world behaviors such as quarantining when sick and self-isolation when cases rise, and demonstrate patterns akin to multiple waves observed in recent pandemics followed by an endemic period. Additionally, the agents successfully flatten the epidemic curve.
    Abstract This study offers a new paradigm of individual-level modeling to address the grand challenge of incorporating human behavior in epidemic models. Using generative artificial intelligence in an agent-based epidemic model, each agent is empowered to make its own reasonings and decisions via connecting to a large language model such as ChatGPT. Through various simulation experiments, we present compelling evidence that generative agents mimic real-world behaviors such as quarantining when sick and self-isolation when cases rise. Collectively, the agents demonstrate patterns akin to multiple waves observed in recent pandemics followed by an endemic period. Moreover, the agents successfully flatten the epidemic curve. This study creates potential to improve dynamic system modeling by offering a way to represent human brain, reasoning, and decision making.

Reducing Causality to Functions with Structural Models

  • paper_url: http://arxiv.org/abs/2307.07524
  • repo_url: https://github.com/miaotianyi/annotated-sfm
  • paper_authors: Tianyi Miao
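  • for: This paper proposes a new definition of causality: a reductive definition based on the Structural Functional Model (SFM), which uses delta compression and contrastive forward inference to produce causal utterances that match our intuitions.
  • methods: The paper builds SFM on delta compression and contrastive forward inference and applies it to a range of causal scenarios.
  • results: By applying SFM and comparing it with other theories of causation, the paper reports that SFM is compatible and reliable, and uses it to address questions of free will, causal explanation, and mental causation.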
    Abstract The precise definition of causality is currently an open problem in philosophy and statistics. We believe causality should be defined as functions (in mathematics) that map causes to effects. We propose a reductive definition of causality based on Structural Functional Model (SFM). Using delta compression and contrastive forward inference, SFM can produce causal utterances like "X causes Y" and "X is the cause of Y" that match our intuitions. We compile a dataset of causal scenarios and use SFM in all of them. SFM is compatible with but not reducible to probability theory. We also compare SFM with other theories of causation and apply SFM to downstream problems like free will, causal explanation, and mental causation.
    Summary The precise definition of causality is an open problem in philosophy and statistics. We hold that causality should be defined as mathematical functions that map causes to effects. We propose a reductive definition based on the Structural Functional Model (SFM). Using delta compression and contrastive forward inference, SFM can produce causal utterances such as "X causes Y" and "X is the cause of Y" that match our intuitions. We compile a dataset of causal scenarios and use SFM in all of them. SFM is compatible with, but not reducible to, probability theory. We also compare SFM with other theories of causation and apply it to downstream problems such as free will, causal explanation, and mental causation.

Secrets of RLHF in Large Language Models Part I: PPO

  • paper_url: http://arxiv.org/abs/2307.04964
  • repo_url: https://github.com/openlmlab/moss-rlhf
  • paper_authors: Rui Zheng, Shihan Dou, Songyang Gao, Yuan Hua, Wei Shen, Binghai Wang, Yan Liu, Senjie Jin, Qin Liu, Yuhao Zhou, Limao Xiong, Lu Chen, Zhiheng Xi, Nuo Xu, Wenbin Lai, Minghao Zhu, Cheng Chang, Zhangyue Yin, Rongxiang Weng, Wensen Cheng, Haoran Huang, Tianxiang Sun, Hang Yan, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang
  • for: The paper targets the advancement of artificial general intelligence through LLMs that act as human-centric (helpful, honest, and harmless) assistants.
  • methods: The technical route covers reward models to measure human preferences, Proximal Policy Optimization (PPO) to optimize the policy model's outputs, and process supervision to improve step-by-step reasoning capabilities.
  • results: The paper dissects the RLHF framework, re-evaluates the inner workings of PPO, and explores how the components of PPO affect policy agent training. It identifies policy constraints as the key factor for effective PPO training and proposes PPO-max to improve the training stability of the policy model. Based on these results, it gives a comprehensive analysis of RLHF abilities compared with SFT models and ChatGPT.
    Abstract Large language models (LLMs) have formulated a blueprint for the advancement of artificial general intelligence. Its primary objective is to function as a human-centric (helpful, honest, and harmless) assistant. Alignment with humans assumes paramount significance, and reinforcement learning with human feedback (RLHF) emerges as the pivotal technological paradigm underpinning this pursuit. Current technical routes usually include reward models to measure human preferences, Proximal Policy Optimization (PPO) to optimize policy model outputs, and process supervision to improve step-by-step reasoning capabilities. However, due to the challenges of reward design, environment interaction, and agent training, coupled with huge trial and error cost of large language models, there is a significant barrier for AI researchers to motivate the development of technical alignment and safe landing of LLMs. The stable training of RLHF has still been a puzzle. In the first report, we dissect the framework of RLHF, re-evaluate the inner workings of PPO, and explore how the parts comprising PPO algorithms impact policy agent training. We identify policy constraints as the key factor for the effective implementation of the PPO algorithm. Therefore, we explore the PPO-max, an advanced version of PPO algorithm, to efficiently improve the training stability of the policy model. Based on our main results, we perform a comprehensive analysis of RLHF abilities compared with SFT models and ChatGPT. The absence of open-source implementations has posed significant challenges to the investigation of LLMs alignment. Therefore, we are eager to release technical reports, reward models and PPO codes, aiming to make modest contributions to the advancement of LLMs.
    Summary Large language models (LLMs) have laid out a blueprint for the advancement of artificial general intelligence. Their primary objective is to serve as a human-centric (helpful, honest, and harmless) assistant. Alignment with humans is of paramount importance, and reinforcement learning with human feedback (RLHF) has emerged as the pivotal technological paradigm underpinning this pursuit. Current technical routes usually include reward models to measure human preferences, Proximal Policy Optimization (PPO) to optimize policy model outputs, and process supervision to improve step-by-step reasoning capabilities. However, the challenges of reward design, environment interaction, and agent training, together with the huge trial-and-error cost of large language models, pose a significant barrier for AI researchers working on technical alignment and the safe landing of LLMs. Stable RLHF training remains a puzzle. In this report, we dissect the RLHF framework, re-evaluate the inner workings of PPO, and explore how the parts of PPO algorithms affect policy agent training. We find that policy constraints are the key factor. We therefore explore PPO-max, an advanced version of PPO, to improve the training stability of the policy model. Based on our main results, we comprehensively analyze RLHF abilities in comparison with SFT models and ChatGPT. The lack of open-source implementations has made LLM alignment hard to investigate, so we will release technical reports, reward models, and PPO code as a modest contribution to the advancement of LLMs.
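    For orientation, the standard PPO clipped surrogate and a KL-style reward penalty can be sketched as below. This is the textbook form of PPO, not the paper's PPO-max, whose details are not given in the abstract; treating the KL penalty toward a reference policy as an example of a "policy constraint" is an assumption, and all function and parameter names are hypothetical.

```python
import math


def ppo_clipped_objective(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
        # Taking the minimum removes the incentive to move the ratio outside the clip range.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def kl_penalized_reward(reward_model_score, policy_logprob, reference_logprob, beta=0.1):
    """Reward-model score minus a per-sample penalty for drifting from a reference policy."""
    return reward_model_score - beta * (policy_logprob - reference_logprob)
```

    The clip range `clip_eps` and the penalty coefficient `beta` both limit how far the policy can move in one update, which is the kind of constraint the paper singles out as important for stable training.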

Intrinsically motivated graph exploration using network theories of human curiosity

  • paper_url: http://arxiv.org/abs/2307.04962
  • repo_url: https://github.com/spatank/GraphRL
  • paper_authors: Shubhankar P. Patankar, Mathieu Ouellet, Juan Cervino, Alejandro Ribeiro, Kieran A. Murphy, Dani S. Bassett
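  • for: This work aims to drive exploration of graph-structured data using two theories of human curiosity.
  • methods: The work proposes a graph neural network-based reinforcement learning approach that uses the proposed topological features as rewards to guide an agent's exploration of the environment.
  • results: Agents trained with the proposed rewards generalize to larger environments and to longer exploratory walks than seen during training, and the method computes more efficiently than greedy evaluation of the relevant topological properties. Curiosity-based recommendations are also more predictive of human behavior than PageRank centrality on real-world graph datasets.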
    Abstract Intrinsically motivated exploration has proven useful for reinforcement learning, even without additional extrinsic rewards. When the environment is naturally represented as a graph, how to guide exploration best remains an open question. In this work, we propose a novel approach for exploring graph-structured data motivated by two theories of human curiosity: the information gap theory and the compression progress theory. The theories view curiosity as an intrinsic motivation to optimize for topological features of subgraphs induced by the visited nodes in the environment. We use these proposed features as rewards for graph neural-network-based reinforcement learning. On multiple classes of synthetically generated graphs, we find that trained agents generalize to larger environments and to longer exploratory walks than are seen during training. Our method computes more efficiently than the greedy evaluation of the relevant topological properties. The proposed intrinsic motivations bear particular relevance for recommender systems. We demonstrate that curiosity-based recommendations are more predictive of human behavior than PageRank centrality for several real-world graph datasets, including MovieLens, Amazon Books, and Wikispeedia.
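    Summary Intrinsically motivated exploration has proven useful for reinforcement learning, even without additional extrinsic rewards. When the environment is naturally represented as a graph, how best to guide exploration remains an open question. This work proposes a new approach to exploring graph-structured data, based on two theories of human curiosity: the information gap theory and the compression progress theory. Both view curiosity as an intrinsic motivation to optimize topological features of the subgraphs induced by the visited nodes. These features serve as rewards for graph neural network-based reinforcement learning. On multiple classes of synthetically generated graphs, trained agents generalize to larger environments and longer exploratory walks than are seen during training. The method computes more efficiently than greedy evaluation of the relevant topological properties. The proposed intrinsic motivations are particularly relevant for recommender systems: curiosity-based recommendations are more predictive of human behavior than PageRank centrality on several real-world graph datasets, including MovieLens, Amazon Books, and Wikispeedia.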

Reinforcement Learning with Non-Cumulative Objective

  • paper_url: http://arxiv.org/abs/2307.04957
  • repo_url: https://github.com/willtop/Reinforcement_Learning_With_Non-Cumulative_Objective
  • paper_authors: Wei Cui, Wei Yu
  • for: This paper addresses control and reinforcement learning problems with non-cumulative objectives.
  • methods: The paper proposes a modification to existing algorithms for optimizing non-cumulative objectives. Specifically, it modifies the Bellman optimality equation to handle non-cumulative objectives.
  • results: In experiments on classical optimal control and reinforcement learning tasks and on two network routing problems that maximize flow rates, the method is demonstrated with the bottleneck objective.
    Abstract In reinforcement learning, the objective is almost always defined as a cumulative function over the rewards along the process. However, there are many optimal control and reinforcement learning problems in various application fields, especially in communications and networking, where the objectives are not naturally expressed as summations of the rewards. In this paper, we recognize the prevalence of non-cumulative objectives in various problems, and propose a modification to existing algorithms for optimizing such objectives. Specifically, we dive into the fundamental building block for many optimal control and reinforcement learning algorithms: the Bellman optimality equation. To optimize a non-cumulative objective, we replace the original summation operation in the Bellman update rule with a generalized operation corresponding to the objective. Furthermore, we provide sufficient conditions on the form of the generalized operation as well as assumptions on the Markov decision process under which the globally optimal convergence of the generalized Bellman updates can be guaranteed. We demonstrate the idea experimentally with the bottleneck objective, i.e., the objectives determined by the minimum reward along the process, on classical optimal control and reinforcement learning tasks, as well as on two network routing problems on maximizing the flow rates.
    Summary In reinforcement learning, the objective is almost always defined as a cumulative function of the rewards along the process. Yet many optimal control and reinforcement learning problems, especially in communications and networking, have objectives that are not naturally expressed as sums of rewards. This paper recognizes the prevalence of non-cumulative objectives and proposes a modification to existing algorithms for optimizing them. Specifically, it examines the fundamental building block of many optimal control and reinforcement learning algorithms: the Bellman optimality equation. To optimize a non-cumulative objective, the original summation operation in the Bellman update rule is replaced with a generalized operation corresponding to the objective. The paper also provides sufficient conditions on the form of the generalized operation, and assumptions on the Markov decision process, under which globally optimal convergence of the generalized Bellman updates is guaranteed. The idea is demonstrated with the bottleneck objective, i.e., an objective determined by the minimum reward along the process, on classical optimal control and reinforcement learning tasks and on two network routing problems that maximize flow rates.
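    The substitution in the Bellman update can be made concrete for the bottleneck objective. The sketch below assumes a small deterministic, undiscounted problem with a single terminal state; the routing graph, state names, and function name are hypothetical, and the paper's convergence conditions are not checked here.

```python
def bottleneck_value_iteration(transitions, goal, sweeps=50):
    """Value iteration where the return of a path is its minimum reward.

    transitions: {state: {action: (next_state, reward)}}, deterministic.
    Cumulative update:  V(s) = max_a [ r(s, a) + V(s') ]
    Bottleneck update:  V(s) = max_a min( r(s, a), V(s') )
    """
    values = {state: float("-inf") for state in transitions}
    values[goal] = float("inf")  # identity element of min at the terminal state
    for _ in range(sweeps):
        for state, actions in transitions.items():
            if state == goal or not actions:
                continue
            values[state] = max(
                min(reward, values[next_state])
                for next_state, reward in actions.values()
            )
    return values


# Hypothetical routing graph: each reward is a link capacity.
transitions = {
    "src": {"via_a": ("a", 5.0), "via_b": ("b", 3.0)},
    "a": {"to_dst": ("dst", 2.0)},
    "b": {"to_dst": ("dst", 4.0)},
    "dst": {},
}
values = bottleneck_value_iteration(transitions, goal="dst")
```

    The route through `a` starts with the larger capacity but is limited to 2.0 by its second link, so the update prefers the route through `b`, whose bottleneck is 3.0. A sum-based update would have ranked the two routes as equal (7.0 each).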

Impact of Feature Encoding on Malware Classification Explainability

  • paper_url: http://arxiv.org/abs/2307.05614
  • repo_url: None
  • paper_authors: Elyes Manai, Mohamed Mejri, Jaouhar Fattahi
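  • for: This paper investigates how feature encoding techniques affect the explainability of XAI (Explainable Artificial Intelligence) algorithms.
  • methods: The authors train an XGBoost model on a malware classification dataset and compare two feature encoding methods: Label Encoding (LE) and One Hot Encoding (OHE).
  • results: Using OHE instead of LE causes a marginal performance loss, but the more detailed explanations that OHE provides compensate for it. OHE enables deeper exploration of details in both global and local contexts, yields smaller explanation files, and reduces analysis time for human analysts. The findings highlight the importance of feature encoding in XAI research and suggest further work on additional encoding methods and new visualization approaches.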
    Abstract This paper investigates the impact of feature encoding techniques on the explainability of XAI (Explainable Artificial Intelligence) algorithms. Using a malware classification dataset, we trained an XGBoost model and compared the performance of two feature encoding methods: Label Encoding (LE) and One Hot Encoding (OHE). Our findings reveal a marginal performance loss when using OHE instead of LE. However, the more detailed explanations provided by OHE compensated for this loss. We observed that OHE enables deeper exploration of details in both global and local contexts, facilitating more comprehensive answers. Additionally, we observed that using OHE resulted in smaller explanation files and reduced analysis time for human analysts. These findings emphasize the significance of considering feature encoding techniques in XAI research and suggest potential for further exploration by incorporating additional encoding methods and innovative visualization approaches.
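    Summary This paper studies how feature encoding techniques affect the explainability of Explainable Artificial Intelligence (XAI) algorithms. Using a malware classification dataset, the authors train an XGBoost model and compare two encodings: Label Encoding (LE) and One Hot Encoding (OHE). OHE costs a small amount of performance relative to LE, but its more detailed explanations make up for the loss. OHE allows deeper exploration of details in global and local contexts and gives more comprehensive answers; it also produces smaller explanation files and shortens analysis time for human analysts. The findings underline the importance of feature encoding in XAI research and point to further work on additional encoding methods and new visualization approaches.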
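The difference between the two encodings compared in this paper can be shown with a short Python sketch. It is illustrative only and uses plain Python rather than the authors' pipeline; the feature values (file extensions) and the helper names `label_encode` and `one_hot_encode` are assumptions made for the example.

```python
def label_encode(values):
    """Map each category to one integer; the feature stays a single column."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    return [index[v] for v in values], categories


def one_hot_encode(values):
    """Expand the feature into one 0/1 column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == category else 0 for category in categories] for v in values]
    return rows, categories


# Hypothetical categorical feature from a malware dataset (made-up values).
file_types = ["exe", "dll", "exe", "sys"]

le_column, le_categories = label_encode(file_types)
ohe_rows, ohe_categories = one_hot_encode(file_types)
```

With LE, an explanation method can only attribute importance to the single encoded column. With OHE, each category has its own column, so an attribution can be read per category value, which is consistent with the more detailed explanations the abstract reports.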

Substance or Style: What Does Your Image Embedding Know?

  • paper_url: http://arxiv.org/abs/2307.05610
  • repo_url: None
  • paper_authors: Cyrus Rashtchian, Charles Herrmann, Chun-Sung Ferng, Ayan Chakrabarti, Dilip Krishnan, Deqing Sun, Da-Cheng Juan, Andrew Tomkins
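  • for: The study explores the non-semantic information in popular embeddings (e.g., MAE, SimCLR, and CLIP) to shed light on both the training algorithms and the uses of these foundation models.
  • methods: The authors design a systematic transformation prediction task and use probes to measure the visual content of embeddings along many axes, including image style, quality, and a range of natural and artificial transformations.
  • results: Six embeddings (including SimCLR) encode enough non-semantic information to identify dozens of transformations. In a generalization task with held-out transformations, image-text models (CLIP and ALIGN) recognize new examples of style transfer better than masking-based models (CAN and MAE). Overall, the choice of pre-training algorithm affects the types of information in the embedding, and some models suit non-semantic downstream tasks better than others.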
    Abstract Probes are small networks that predict properties of underlying data from embeddings, and they provide a targeted, effective way to illuminate the information contained in embeddings. While analysis through the use of probes has become standard in NLP, there has been much less exploration in vision. Image foundation models have primarily been evaluated for semantic content. Better understanding the non-semantic information in popular embeddings (e.g., MAE, SimCLR, or CLIP) will shed new light both on the training algorithms and on the uses for these foundation models. We design a systematic transformation prediction task and measure the visual content of embeddings along many axes, including image style, quality, and a range of natural and artificial transformations. Surprisingly, six embeddings (including SimCLR) encode enough non-semantic information to identify dozens of transformations. We also consider a generalization task, where we group similar transformations and hold out several for testing. We find that image-text models (CLIP and ALIGN) are better at recognizing new examples of style transfer than masking-based models (CAN and MAE). Overall, our results suggest that the choice of pre-training algorithm impacts the types of information in the embedding, and certain models are better than others for non-semantic downstream tasks.
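    Summary Probes are small networks that predict properties of the underlying data from embeddings, and they give a targeted, effective way to reveal what information embeddings contain. Probe-based analysis is standard in NLP but much less explored in vision, where image foundation models have mainly been evaluated for semantic content. The authors argue that understanding the non-semantic information in popular embeddings (e.g., MAE, SimCLR, or CLIP) sheds light on both the training algorithms and the uses of these foundation models. They design a systematic transformation prediction task and measure the visual content of embeddings along many axes, including image style, quality, and a range of natural and artificial transformations. Six embeddings (including SimCLR) encode enough non-semantic information to identify dozens of transformations. In a generalization task that groups similar transformations and holds several out for testing, image-text models (CLIP and ALIGN) recognize new examples of style transfer better than masking-based models (CAN and MAE). Overall, the choice of pre-training algorithm affects the types of information in the embedding, and certain models are better than others for non-semantic downstream tasks.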

KU-DMIS-MSRA at RadSum23: Pre-trained Vision-Language Model for Radiology Report Summarization

  • paper_url: http://arxiv.org/abs/2307.07409
  • repo_url: None
  • paper_authors: Gangwoo Kim, Hajung Kim, Lei Ji, Seongsu Bae, Chanhwi Kim, Mujeen Sung, Hyunjae Kim, Kun Yan, Eric Chang, Jaewoo Kang
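  • for: The paper introduces CheXOFA, a new pre-trained vision-language model (VLM) for the chest X-ray domain.
  • methods: The model is first pre-trained on various multimodal datasets in the general domain and then transferred to the chest X-ray domain. Domain-specific tasks are unified into a simple sequence-to-sequence schema so that the model can learn the required knowledge and skills from limited resources.
  • results: The model shows superior performance on the benchmark datasets provided by the BioNLP shared task and, with ensemble and factual calibration, takes first place on the RadSum23 leaderboard for the hidden test set.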
    Abstract In this paper, we introduce CheXOFA, a new pre-trained vision-language model (VLM) for the chest X-ray domain. Our model is initially pre-trained on various multimodal datasets within the general domain before being transferred to the chest X-ray domain. Following a prominent VLM, we unify various domain-specific tasks into a simple sequence-to-sequence schema. It enables the model to effectively learn the required knowledge and skills from limited resources in the domain. Demonstrating superior performance on the benchmark datasets provided by the BioNLP shared task, our model benefits from its training across multiple tasks and domains. With subtle techniques including ensemble and factual calibration, our system achieves first place on the RadSum23 leaderboard for the hidden test set.
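    Summary The paper introduces CheXOFA, a new pre-trained vision-language model (VLM) for the chest X-ray domain. The model is first pre-trained on various multimodal datasets in the general domain and then transferred to the chest X-ray domain. Following a prominent VLM, the authors unify domain-specific tasks into a simple sequence-to-sequence schema, which lets the model learn the required knowledge and skills from limited in-domain resources. The model performs strongly on the benchmark datasets of the BioNLP shared task, benefiting from training across multiple tasks and domains. With ensemble and factual calibration, the system takes first place on the RadSum23 leaderboard for the hidden test set.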

Learning to Solve Constraint Satisfaction Problems with Recurrent Transformer

  • paper_url: http://arxiv.org/abs/2307.04895
  • repo_url: https://github.com/azreasoners/recurrent_transformer
  • paper_authors: Zhun Yang, Adam Ishay, Joohyung Lee
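  • for: Learning to solve constraint satisfaction problems (CSPs) end to end.
  • methods: A Transformer extended with recurrence (the Recurrent Transformer), which the authors report has clear advantages over state-of-the-art methods such as Graph Neural Networks, SATNet, and some neuro-symbolic models.
  • results: The approach applies directly to visual constraint reasoning problems while addressing the symbol grounding problem, and deductive knowledge of discrete constraints can be used in the Transformer's inductive learning to achieve sample-efficient and semi-supervised learning for CSPs.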
    Abstract Constraint satisfaction problems (CSPs) are about finding values of variables that satisfy the given constraints. We show that Transformer extended with recurrence is a viable approach to learning to solve CSPs in an end-to-end manner, having clear advantages over state-of-the-art methods such as Graph Neural Networks, SATNet, and some neuro-symbolic models. With the ability of Transformer to handle visual input, the proposed Recurrent Transformer can straightforwardly be applied to visual constraint reasoning problems while successfully addressing the symbol grounding problem. We also show how to leverage deductive knowledge of discrete constraints in the Transformer's inductive learning to achieve sample-efficient learning and semi-supervised learning for CSPs.

Choosing Well Your Opponents: How to Guide the Synthesis of Programmatic Strategies

  • paper_url: http://arxiv.org/abs/2307.04893
  • repo_url: https://github.com/rubensolv/locallearnerijcai
  • paper_authors: Rubens O. Moraes, David S. Aleixo, Lucas N. Ferreira, Levi H. S. Lelis
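  • for: Providing a set of reference strategies that guides the search for programmatic strategies in two-player zero-sum games.
  • methods: The Local Learner (2L) algorithm actively selects a set of reference strategies to improve the search signal.
  • results: 2L learns reference strategies that give a stronger search signal than earlier learning algorithms (IBR, FP, and DO) when guiding a local search algorithm in three games, including MicroRTS. In a simulated MicroRTS tournament, a synthesizer using 2L outperformed the winners of the two latest MicroRTS competitions.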
    Abstract This paper introduces Local Learner (2L), an algorithm for providing a set of reference strategies to guide the search for programmatic strategies in two-player zero-sum games. Previous learning algorithms, such as Iterated Best Response (IBR), Fictitious Play (FP), and Double-Oracle (DO), can be computationally expensive or miss important information for guiding search algorithms. 2L actively selects a set of reference strategies to improve the search signal. We empirically demonstrate the advantages of our approach while guiding a local search algorithm for synthesizing strategies in three games, including MicroRTS, a challenging real-time strategy game. Results show that 2L learns reference strategies that provide a stronger search signal than IBR, FP, and DO. We also simulate a tournament of MicroRTS, where a synthesizer using 2L outperformed the winners of the two latest MicroRTS competitions, which were programmatic strategies written by human programmers.
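    Summary The paper introduces Local Learner (2L), an algorithm that provides a set of reference strategies to guide the search for programmatic strategies in two-player zero-sum games. Earlier learning algorithms such as Iterated Best Response (IBR), Fictitious Play (FP), and Double-Oracle (DO) can be computationally expensive or miss information that is important for guiding search. 2L actively selects reference strategies to improve the search signal. Experiments that guide a local search algorithm in three games, including the real-time strategy game MicroRTS, show that 2L's reference strategies give a stronger search signal than IBR, FP, and DO. In a simulated MicroRTS tournament, a synthesizer using 2L outperformed the winners of the two latest MicroRTS competitions, which were programmatic strategies written by human programmers.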

Measuring and Mitigating Interference in Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.04887
  • repo_url: None
  • paper_authors: Vincent Liu, Han Wang, Ruo Yu Tao, Khurram Javed, Adam White, Martha White
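  • for: The work provides a definition and a measure of catastrophic interference so that it can be better understood and mitigated.
  • methods: The authors define a novel interference measure for value-based reinforcement learning methods such as Fitted Q-Iteration and DQN.
  • results: A systematic evaluation across a variety of network architectures shows that the measure correlates with instability in control performance. The authors also outline a class of algorithms, called online-aware, that reduce interference according to the measure and improve stability and performance in several classic control environments.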
    Abstract Catastrophic interference is common in many network-based learning systems, and many proposals exist for mitigating it. Before overcoming interference we must understand it better. In this work, we provide a definition and novel measure of interference for value-based reinforcement learning methods such as Fitted Q-Iteration and DQN. We systematically evaluate our measure of interference, showing that it correlates with instability in control performance, across a variety of network architectures. Our new interference measure allows us to ask novel scientific questions about commonly used deep learning architectures and study learning algorithms which mitigate interference. Lastly, we outline a class of algorithms which we call online-aware that are designed to mitigate interference, and show they do reduce interference according to our measure and that they improve stability and performance in several classic control environments.
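    Summary Catastrophic interference is common in many network-based learning systems, and many proposals exist for mitigating it. This work provides a definition and a novel measure of interference for value-based reinforcement learning methods such as Fitted Q-Iteration and DQN. A systematic evaluation shows that the measure correlates with instability in control performance across a variety of network architectures. The measure makes it possible to ask new scientific questions about commonly used deep learning architectures and to study learning algorithms that mitigate interference. Finally, the authors outline a class of algorithms, called online-aware, that reduce interference according to the measure and improve stability and performance in several classic control environments.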

ChatGPT for Digital Forensic Investigation: The Good, The Bad, and The Unknown

  • paper_url: http://arxiv.org/abs/2307.10195
  • repo_url: https://github.com/markscanlonucd/chatgpt-for-digital-forensics
  • paper_authors: Mark Scanlon, Frank Breitinger, Christopher Hargreaves, Jan-Niclas Hilgert, John Sheppard
  • for: This paper is written to assess the impact of ChatGPT on the field of digital forensics, specifically looking at its latest pre-trained LLM, GPT-4.
  • methods: The paper uses a series of experiments to assess the capability of ChatGPT across several digital forensic use cases, including artefact understanding, evidence searching, code generation, anomaly detection, incident response, and education.
  • results: The paper concludes that while there are some potential low-risk applications of ChatGPT within digital forensics, many are either unsuitable at present or require sufficient knowledge of the topic being asked of the tool to identify incorrect assumptions, inaccuracies, and mistakes. However, to an appropriately knowledgeable user, ChatGPT could act as a useful supporting tool in some circumstances.
    Abstract The disruptive application of ChatGPT (GPT-3.5, GPT-4) to a variety of domains has become a topic of much discussion in the scientific community and society at large. Large Language Models (LLMs), e.g., BERT, Bard, Generative Pre-trained Transformers (GPTs), LLaMA, etc., have the ability to take instructions, or prompts, from users and generate answers and solutions based on very large volumes of text-based training data. This paper assesses the impact and potential impact of ChatGPT on the field of digital forensics, specifically looking at its latest pre-trained LLM, GPT-4. A series of experiments are conducted to assess its capability across several digital forensic use cases including artefact understanding, evidence searching, code generation, anomaly detection, incident response, and education. Across these topics, its strengths and risks are outlined and a number of general conclusions are drawn. Overall this paper concludes that while there are some potential low-risk applications of ChatGPT within digital forensics, many are either unsuitable at present, since the evidence would need to be uploaded to the service, or they require sufficient knowledge of the topic being asked of the tool to identify incorrect assumptions, inaccuracies, and mistakes. However, to an appropriately knowledgeable user, it could act as a useful supporting tool in some circumstances.
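    Summary The application of ChatGPT (GPT-3.5, GPT-4) to a variety of domains has become a widely discussed topic in the scientific community and in society at large. Large Language Models (LLMs) such as BERT, Bard, Generative Pre-trained Transformers (GPTs), and LLaMA take instructions, or prompts, from users and generate answers and solutions based on very large volumes of text-based training data. This paper assesses the impact and potential impact of ChatGPT on digital forensics, focusing on its latest pre-trained LLM, GPT-4. A series of experiments assesses its capability in several digital forensic use cases: artefact understanding, evidence searching, code generation, anomaly detection, incident response, and education. The paper outlines strengths and risks for each and concludes that, although some low-risk applications exist, many are unsuitable at present because the evidence would need to be uploaded to the service, or they require enough knowledge of the topic to identify incorrect assumptions, inaccuracies, and mistakes. For an appropriately knowledgeable user, ChatGPT could still act as a useful supporting tool in some circumstances.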

AI For Global Climate Cooperation 2023 Competition Proceedings

  • paper_url: http://arxiv.org/abs/2307.06951
  • repo_url: None
  • paper_authors: Yoshua Bengio, Prateek Gupta, Lu Li, Soham Phade, Sunil Srinivasa, Andrew Williams, Tianyu Zhang, Yang Zhang, Stephan Zheng
  • for: The paper aims to design international frameworks for mitigating climate change and promoting economic growth through the use of AI and climate-economic simulations.
  • methods: The paper uses RICE-N, an AI-driven integrated assessment model, to model regional decision-making and assess the climate-economic impact of those decisions into the future.
  • results: The proposals submitted to the second track were evaluated both quantitatively and qualitatively, with a focus on the degree of mitigation of global temperature rise and the increase in economic productivity. An interdisciplinary panel of human experts evaluated the solutions qualitatively, considering effectiveness, simplicity, feasibility, ethics, and notions of climate justice.
    Abstract The international community must collaborate to mitigate climate change and sustain economic growth. However, collaboration is hard to achieve, partly because no global authority can ensure compliance with international climate agreements. Combining AI with climate-economic simulations offers a promising solution to design international frameworks, including negotiation protocols and climate agreements, that promote and incentivize collaboration. In addition, these frameworks should also have policy goals fulfillment, and sustained commitment, taking into account climate-economic dynamics and strategic behaviors. These challenges require an interdisciplinary approach across machine learning, economics, climate science, law, policy, ethics, and other fields. Towards this objective, we organized AI for Global Climate Cooperation, a Mila competition in which teams submitted proposals and analyses of international frameworks, based on (modifications of) RICE-N, an AI-driven integrated assessment model (IAM). In particular, RICE-N supports modeling regional decision-making using AI agents. Furthermore, the IAM then models the climate-economic impact of those decisions into the future. Whereas the first track focused only on performance metrics, the proposals submitted to the second track were evaluated both quantitatively and qualitatively. The quantitative evaluation focused on a combination of (i) the degree of mitigation of global temperature rise and (ii) the increase in economic productivity. On the other hand, an interdisciplinary panel of human experts in law, policy, sociology, economics and environmental science, evaluated the solutions qualitatively. In particular, the panel considered the effectiveness, simplicity, feasibility, ethics, and notions of climate justice of the protocols. In the third track, the participants were asked to critique and improve RICE-N.
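    Summary The international community must collaborate to mitigate climate change and sustain economic growth, but collaboration is hard to achieve, partly because no global authority can ensure compliance with international climate agreements. Combining AI with climate-economic simulations is a promising way to design international frameworks, including negotiation protocols and climate agreements, that promote and incentivize collaboration. Such frameworks should also fulfill policy goals and sustain commitment while accounting for climate-economic dynamics and strategic behaviors, which calls for an interdisciplinary approach across machine learning, economics, climate science, law, policy, ethics, and other fields. To this end, the authors organized AI for Global Climate Cooperation, a Mila competition in which teams submitted proposals and analyses of international frameworks based on (modifications of) RICE-N, an AI-driven integrated assessment model (IAM). RICE-N models regional decision-making with AI agents, and the IAM then models the climate-economic impact of those decisions into the future. The first track focused only on performance metrics. Proposals in the second track were evaluated quantitatively, on the degree of mitigation of global temperature rise and the increase in economic productivity, and qualitatively, by an interdisciplinary panel of experts in law, policy, sociology, economics, and environmental science who considered effectiveness, simplicity, feasibility, ethics, and climate justice. In the third track, participants were asked to critique and improve RICE-N.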

Fed-CPrompt: Contrastive Prompt for Rehearsal-Free Federated Continual Learning

  • paper_url: http://arxiv.org/abs/2307.04869
  • repo_url: None
  • paper_authors: Gaurav Bagwe, Xiaoyong Yuan, Miao Pan, Lan Zhang
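  • for: The paper focuses on rehearsal-free federated continual learning (FCL), which learns incremental tasks over time from confidential datasets distributed across clients.
  • methods: The authors propose Fed-CPrompt, which uses prompt learning to obtain task-specific prompts in a communication-efficient way. Its two key components, asynchronous prompt learning and a contrastive continual loss, handle asynchronous task arrival and heterogeneous data distributions, respectively.
  • results: Extensive experiments show that Fed-CPrompt achieves SOTA rehearsal-free FCL performance.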
    Abstract Federated continual learning (FCL) learns incremental tasks over time from confidential datasets distributed across clients. This paper focuses on rehearsal-free FCL, which has severe forgetting issues when learning new tasks due to the lack of access to historical task data. To address this issue, we propose Fed-CPrompt based on prompt learning techniques to obtain task-specific prompts in a communication-efficient way. Fed-CPrompt introduces two key components, asynchronous prompt learning, and contrastive continual loss, to handle asynchronous task arrival and heterogeneous data distributions in FCL, respectively. Extensive experiments demonstrate the effectiveness of Fed-CPrompt in achieving SOTA rehearsal-free FCL performance.
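    Summary Federated continual learning (FCL) learns incremental tasks over time from confidential datasets distributed across clients. This paper focuses on rehearsal-free FCL, which forgets severely when learning new tasks because it has no access to historical task data. To address this, the authors propose Fed-CPrompt, which uses prompt learning to obtain task-specific prompts in a communication-efficient way. Fed-CPrompt has two key components, asynchronous prompt learning and a contrastive continual loss, which handle asynchronous task arrival and heterogeneous data distributions in FCL, respectively. Extensive experiments show that Fed-CPrompt achieves SOTA rehearsal-free FCL performance.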

Automated Detection of Gait Events and Travel Distance Using Waist-worn Accelerometers Across a Typical Range of Walking and Running Speeds

  • paper_url: http://arxiv.org/abs/2307.04866
  • repo_url: None
  • paper_authors: Albara Ah Ramli, Xin Liu, Kelly Berndt, Chen-Nee Chuah, Erica Goude, Lynea B. Kaethler, Amanda Lopez, Alina Nicorici, Corey Owens, David Rodriguez, Jane Wang, Daniel Aranki, Craig M. McDonald, Erik K. Henricson
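  • for: The study asks whether accelerometer data from commercially available smartphones can be used, with machine learning (ML)-based methods, to extract temporospatial clinical features of gait (CFs) in children with Duchenne muscular dystrophy (DMD) and typically developing controls (TDs).
  • methods: Gait CFs are extracted from the accelerometer data with a multi-step machine learning-based process and compared with ground-truth observation data.
  • results: Model predictions and observed values for step counts, distance traveled, and step length are strongly correlated (Pearson's r = -0.9929 to 0.9986, p < 0.0001), and CFs are measured accurately across different gait speeds.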
    Abstract Background: Estimation of temporospatial clinical features of gait (CFs), such as step count and length, step duration, step frequency, gait speed and distance traveled is an important component of community-based mobility evaluation using wearable accelerometers. However, challenges arising from device complexity and availability, cost and analytical methodology have limited widespread application of such tools. Research Question: Can accelerometer data from commercially-available smartphones be used to extract gait CFs across a broad range of attainable gait velocities in children with Duchenne muscular dystrophy (DMD) and typically developing controls (TDs) using machine learning (ML)-based methods? Methods: Fifteen children with DMD and 15 TDs underwent supervised clinical testing across a range of gait speeds using 10 or 25m run/walk (10MRW, 25MRW), 100m run/walk (100MRW), 6-minute walk (6MWT) and free-walk (FW) evaluations while wearing a mobile phone-based accelerometer at the waist near the body's center of mass. Gait CFs were extracted from the accelerometer data using a multi-step machine learning-based process and results were compared to ground-truth observation data. Results: Model predictions vs. observed values for step counts, distance traveled, and step length showed a strong correlation (Pearson's r = -0.9929 to 0.9986, p<0.0001). The estimates demonstrated a mean (SD) percentage error of 1.49% (7.04%) for step counts, 1.18% (9.91%) for distance traveled, and 0.37% (7.52%) for step length compared to ground truth observations for the combined 6MWT, 100MRW, and FW tasks. Significance: The study findings indicate that a single accelerometer placed near the body's center of mass can accurately measure CFs across different gait speeds in both TD and DMD peers, suggesting that there is potential for accurately measuring CFs in the community with consumer-level smartphones.
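    Summary Background: Estimating temporospatial clinical features of gait (CFs), such as step count and length, step duration, step frequency, gait speed, and distance traveled, is an important part of community-based mobility evaluation with wearable accelerometers, but device complexity and availability, cost, and analytical methodology have limited wide use of such tools. Research Question: Can accelerometer data from commercially available smartphones be used to extract gait CFs in children with Duchenne muscular dystrophy (DMD) and typically developing controls (TDs) using machine learning (ML)-based methods? Methods: Fifteen children with DMD and 15 TDs underwent supervised clinical testing across a range of gait speeds using 10 or 25m run/walk (10MRW, 25MRW), 100m run/walk (100MRW), 6-minute walk (6MWT), and free-walk (FW) evaluations while wearing a mobile phone-based accelerometer at the waist near the body's center of mass. Gait CFs were extracted from the accelerometer data with a multi-step machine learning-based process and compared with ground-truth observation data. Results: Model predictions and observed values were strongly correlated (Pearson's r = -0.9929 to 0.9986, p < 0.0001), with a mean (SD) percentage error of 1.49% (7.04%) for step counts, 1.18% (9.91%) for distance traveled, and 0.37% (7.52%) for step length. Significance: A single accelerometer placed near the body's center of mass can accurately measure CFs across different gait speeds in both TD and DMD peers, which suggests potential for accurately measuring CFs in the community with consumer-level smartphones.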

SHAP@k:Efficient and Probably Approximately Correct (PAC) Identification of Top-k Features

  • paper_url: http://arxiv.org/abs/2307.04850
  • repo_url: None
  • paper_authors: Sanjay Kariyappa, Leonidas Tsepenekas, Freddy Lécué, Daniele Magazzeni
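  • for: The work aims to improve the sample efficiency of existing methods for solving the Top-k Identification Problem (TkIP).
  • methods: Two techniques from the multi-armed bandit (MAB) literature are used: a better stopping condition that identifies when PAC guarantees have been met, and a greedy sampling scheme that judiciously allocates samples between features.
  • results: The resulting KernelSHAP@k and SamplingSHAP@k methods solve TkIP efficiently, with an average $5\times$ improvement in sample efficiency and runtime across most common credit-related datasets.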
    Abstract The SHAP framework provides a principled method to explain the predictions of a model by computing feature importance. Motivated by applications in finance, we introduce the Top-k Identification Problem (TkIP), where the objective is to identify the k features with the highest SHAP values. While any method to compute SHAP values with uncertainty estimates (such as KernelSHAP and SamplingSHAP) can be trivially adapted to solve TkIP, doing so is highly sample inefficient. The goal of our work is to improve the sample efficiency of existing methods in the context of solving TkIP. Our key insight is that TkIP can be framed as an Explore-m problem--a well-studied problem related to multi-armed bandits (MAB). This connection enables us to improve sample efficiency by leveraging two techniques from the MAB literature: (1) a better stopping-condition (to stop sampling) that identifies when PAC (Probably Approximately Correct) guarantees have been met and (2) a greedy sampling scheme that judiciously allocates samples between different features. By adopting these methods we develop KernelSHAP@k and SamplingSHAP@k to efficiently solve TkIP, offering an average improvement of $5\times$ in sample-efficiency and runtime across most common credit related datasets.
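    Summary The SHAP framework provides a principled way to explain a model's predictions by computing feature importance. Motivated by applications in finance, the authors introduce the Top-k Identification Problem (TkIP), whose objective is to identify the k features with the highest SHAP values. Any method that computes SHAP values with uncertainty estimates (such as KernelSHAP and SamplingSHAP) can be trivially adapted to solve TkIP, but doing so is highly sample inefficient. The key insight is that TkIP can be framed as an Explore-m problem, a well-studied problem related to multi-armed bandits (MAB). This connection allows two techniques from the MAB literature to be used: (1) a better stopping condition that identifies when PAC (Probably Approximately Correct) guarantees have been met, and (2) a greedy sampling scheme that judiciously allocates samples between features. The resulting KernelSHAP@k and SamplingSHAP@k methods solve TkIP efficiently, with an average $5\times$ improvement in sample efficiency and runtime across most common credit-related datasets.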
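The idea of stopping early once the top-k set is statistically separated can be sketched in Python as follows. This is a simplified illustration, not KernelSHAP@k or SamplingSHAP@k: it estimates Shapley values by Monte Carlo permutation sampling, uses normal-approximation confidence intervals as a stand-in for the paper's PAC stopping condition, and omits the greedy sample allocation. The toy model and the names `sample_marginals` and `top_k_features` are assumptions made for the example.

```python
import math
import random


def sample_marginals(model, x, baseline, rng):
    """One permutation sample of each feature's marginal contribution."""
    order = list(range(len(x)))
    rng.shuffle(order)
    current = list(baseline)
    previous = model(current)
    contributions = [0.0] * len(x)
    for i in order:
        current[i] = x[i]  # switch feature i from baseline to its real value
        value = model(current)
        contributions[i] = value - previous
        previous = value
    return contributions


def top_k_features(model, x, baseline, k, epsilon=0.05, z=2.58,
                   min_samples=30, max_samples=5000, seed=0):
    """Sample until the top-k set is separated from the rest (up to epsilon)."""
    n = len(x)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of features")
    rng = random.Random(seed)
    mean = [0.0] * n  # running mean of marginal contributions
    m2 = [0.0] * n    # running sum of squared deviations (Welford)
    count = 0
    while count < max_samples:
        count += 1
        for i, c in enumerate(sample_marginals(model, x, baseline, rng)):
            delta = c - mean[i]
            mean[i] += delta / count
            m2[i] += delta * (c - mean[i])
        if count < min_samples:
            continue
        # Assumed normal-approximation half-widths, not the paper's PAC bound.
        half = [z * math.sqrt(max(m2[i], 0.0) / (count - 1) / count)
                for i in range(n)]
        ranked = sorted(range(n), key=lambda i: mean[i], reverse=True)
        lowest_top = min(mean[i] - half[i] for i in ranked[:k])
        highest_rest = max(mean[i] + half[i] for i in ranked[k:])
        if lowest_top + epsilon >= highest_rest:
            break  # top-k set is separated; stop sampling
    ranked = sorted(range(n), key=lambda i: mean[i], reverse=True)
    return sorted(ranked[:k]), mean, count


def toy_model(v):
    """Hypothetical model with one interaction term between features 0 and 1."""
    return 4 * v[0] + v[1] + 3 * v[2] + 0.5 * v[3] + 2 * v[0] * v[1]


top, estimates, used = top_k_features(toy_model, [1.0] * 4, [0.0] * 4, k=2)
```

For this toy model the exact Shapley values are 5, 2, 3, and 0.5, so the top-2 set is features 0 and 2. The point of the stopping check is that sampling can end as soon as that set is clear, without waiting for every individual estimate to become tight.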

SigOpt Mulch: An Intelligent System for AutoML of Gradient Boosted Trees

  • paper_url: http://arxiv.org/abs/2307.04849
  • repo_url: None
  • paper_authors: Aleksei Sorokin, Xinran Zhu, Eric Hans Lee, Bolong Cheng
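  • for: Improving the efficiency and user experience of hyperparameter tuning for gradient boosted trees (GBTs), particularly automated tuning.
  • methods: A model-aware hyperparameter tuning system that uses metalearning and multifidelity optimization and makes its own decisions about the optimization search space.
  • results: SigOpt Mulch identifies good GBT hyperparameters far more efficiently than existing black-box tuning systems and is easier to use, reducing the need for user domain knowledge.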
    Abstract Gradient boosted trees (GBTs) are ubiquitous models used by researchers, machine learning (ML) practitioners, and data scientists because of their robust performance, interpretable behavior, and ease-of-use. One critical challenge in training GBTs is the tuning of their hyperparameters. In practice, selecting these hyperparameters is often done manually. Recently, the ML community has advocated for tuning hyperparameters through black-box optimization and developed state-of-the-art systems to do so. However, applying such systems to tune GBTs suffers from two drawbacks. First, these systems are not \textit{model-aware}, rather they are designed to apply to a \textit{generic} model; this leaves significant optimization performance on the table. Second, using these systems requires \textit{domain knowledge} such as the choice of hyperparameter search space, which is an antithesis to the automatic experimentation that black-box optimization aims to provide. In this paper, we present SigOpt Mulch, a model-aware hyperparameter tuning system specifically designed for automated tuning of GBTs that provides two improvements over existing systems. First, Mulch leverages powerful techniques in metalearning and multifidelity optimization to perform model-aware hyperparameter optimization. Second, it automates the process of learning performant hyperparameters by making intelligent decisions about the optimization search space, thus reducing the need for user domain knowledge. These innovations allow Mulch to identify good GBT hyperparameters far more efficiently -- and in a more seamless and user-friendly way -- than existing black-box hyperparameter tuning systems.
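    Summary Gradient boosted trees (GBTs) are widely used by researchers, machine learning practitioners, and data scientists because of their robust performance, interpretable behavior, and ease of use. A critical challenge in training GBTs is tuning their hyperparameters, which in practice is often done manually. The ML community has recently advocated tuning hyperparameters through black-box optimization and has built state-of-the-art systems for it, but applying these systems to GBTs has two drawbacks. First, the systems are not model-aware; they are designed for a generic model, which leaves significant optimization performance on the table. Second, they require domain knowledge, such as the choice of hyperparameter search space, which runs against the automatic experimentation that black-box optimization aims to provide. The paper presents SigOpt Mulch, a model-aware hyperparameter tuning system designed specifically for automated tuning of GBTs. Mulch uses metalearning and multifidelity optimization to perform model-aware hyperparameter optimization, and it makes intelligent decisions about the optimization search space, which reduces the need for user domain knowledge. As a result, Mulch identifies good GBT hyperparameters far more efficiently, and in a more seamless and user-friendly way, than existing black-box hyperparameter tuning systems.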

Dynamics of Temporal Difference Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.04841
  • repo_url: https://github.com/pehlevan-group/td-rl-dynamics
  • paper_authors: Blake Bordelon, Paul Masset, Henry Kuo, Cengiz Pehlevan
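  • for: The paper studies the learning dynamics of reinforcement learning models in environments with sparse feedback.
  • methods: Concepts from statistical physics are used to study typical-case learning curves for temporal difference learning of a value function with linear function approximators, under a Gaussian equivalence hypothesis.
  • results: Stochastic semi-gradient noise from subsampling the space of possible episodes leads to significant plateaus in the value error. The dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function, and strategies such as learning rate annealing and reward shaping can favorably alter them.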
    Abstract Reinforcement learning has been successful across several applications in which agents have to learn to act in environments with sparse feedback. However, despite this empirical success there is still a lack of theoretical understanding of how the parameters of reinforcement learning models and the features used to represent states interact to control the dynamics of learning. In this work, we use concepts from statistical physics, to study the typical case learning curves for temporal difference learning of a value function with linear function approximators. Our theory is derived under a Gaussian equivalence hypothesis where averages over the random trajectories are replaced with temporally correlated Gaussian feature averages and we validate our assumptions on small scale Markov Decision Processes. We find that the stochastic semi-gradient noise due to subsampling the space of possible episodes leads to significant plateaus in the value error, unlike in traditional gradient descent dynamics. We study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function. We then analyze how strategies like learning rate annealing and reward shaping can favorably alter learning dynamics and plateaus. To conclude, our work introduces new tools to open a new direction towards developing a theory of learning dynamics in reinforcement learning.
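    Summary Reinforcement learning has succeeded in several applications where agents must learn to act in environments with sparse feedback. Despite this empirical success, there is still little theoretical understanding of how the parameters of reinforcement learning models and the features used to represent states interact to control the dynamics of learning. This work uses concepts from statistical physics to study typical-case learning curves for temporal difference learning of a value function with linear function approximators. The theory is derived under a Gaussian equivalence hypothesis, in which averages over random trajectories are replaced with temporally correlated Gaussian feature averages, and the assumptions are validated on small-scale Markov Decision Processes. The authors find that stochastic semi-gradient noise from subsampling the space of possible episodes leads to significant plateaus in the value error, unlike in traditional gradient descent dynamics. They study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function, and then analyze how learning rate annealing and reward shaping can favorably alter them. The work introduces new tools that open a direction toward a theory of learning dynamics in reinforcement learning.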
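The object analyzed in this paper, temporal difference learning of a value function with a linear function approximator, can be sketched in a few lines of Python. This is a standard semi-gradient TD(0) update shown for orientation, not the authors' code or their theory; the three-step chain, the one-hot features, and the name `td0_linear` are assumptions made for the example.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def td0_linear(episodes, features, n_features, alpha=0.1, gamma=0.9):
    """Semi-gradient TD(0) for a linear value function V(s) = w . phi(s).

    episodes: iterable of episodes, each a list of (state, reward, next_state)
              transitions; next_state is None when the episode terminates.
    features: function mapping a state to its feature vector phi(s).
    """
    w = [0.0] * n_features
    for episode in episodes:
        for state, reward, next_state in episode:
            phi = features(state)
            v = dot(w, phi)
            v_next = 0.0 if next_state is None else dot(w, features(next_state))
            td_error = reward + gamma * v_next - v
            # Semi-gradient step: only V(s) is differentiated, not the target.
            w = [wi + alpha * td_error * pi for wi, pi in zip(w, phi)]
    return w


def one_hot(state, n_states=3):
    """Hypothetical feature map: one indicator per state (tabular case)."""
    return [1.0 if i == state else 0.0 for i in range(n_states)]


# Deterministic toy chain 0 -> 1 -> 2 -> terminal, reward 1 on the last step.
chain = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]
weights = td0_linear([chain] * 500, one_hot, n_features=3)
```

With a discount factor of 0.9 the true values of the three states are 0.81, 0.9, and 1.0, and the learned weights approach them. The paper's analysis concerns what happens to such learning curves when episodes are sampled stochastically, which this deterministic toy chain does not exhibit.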

Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback

  • paper_url: http://arxiv.org/abs/2307.04749
  • repo_url: None
  • paper_authors: Jaskirat Singh, Liang Zheng
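  • for: The paper proposes a simple yet effective decompositional approach to evaluating and improving text-to-image alignment.
  • methods: The authors introduce a Decompositional-Alignment-Score: a complex prompt is decomposed into a set of disjoint assertions, a VQA model measures the alignment of each assertion with the generated images, and the per-assertion scores are combined a posteriori into the final text-to-image alignment score.
  • results: The proposed metric correlates significantly better with human ratings than traditional CLIP and BLIP scores. The assertion-level scores also provide feedback that a simple iterative procedure uses to gradually increase the expression of different assertions in the final images. In human user studies the approach surpasses the previous state of the art by 8.7% in overall text-to-image alignment accuracy.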
    Abstract The field of text-conditioned image generation has made unparalleled progress with the recent advent of latent diffusion models. While remarkable, as the complexity of given text input increases, the state-of-the-art diffusion models may still fail in generating images which accurately convey the semantics of the given prompt. Furthermore, it has been observed that such misalignments are often left undetected by pretrained multi-modal models such as CLIP. To address these problems, in this paper we explore a simple yet effective decompositional approach towards both evaluation and improvement of text-to-image alignment. In particular, we first introduce a Decompositional-Alignment-Score which given a complex prompt decomposes it into a set of disjoint assertions. The alignment of each assertion with generated images is then measured using a VQA model. Finally, alignment scores for different assertions are combined aposteriori to give the final text-to-image alignment score. Experimental analysis reveals that the proposed alignment metric shows significantly higher correlation with human ratings as opposed to traditional CLIP, BLIP scores. Furthermore, we also find that the assertion level alignment scores provide a useful feedback which can then be used in a simple iterative procedure to gradually increase the expression of different assertions in the final image outputs. Human user studies indicate that the proposed approach surpasses previous state-of-the-art by 8.7% in overall text-to-image alignment accuracy. Project page for our paper is available at https://1jsingh.github.io/divide-evaluate-and-refine
    Summary Text-conditioned image generation has progressed rapidly with latent diffusion models, but as prompts grow more complex, state-of-the-art models can still fail to convey the prompt's semantics, and pretrained multi-modal models such as CLIP often miss these misalignments. The paper proposes a simple decompositional approach to both evaluating and improving text-to-image alignment. A Decompositional-Alignment-Score decomposes a complex prompt into disjoint assertions, scores each assertion against the generated image with a VQA model, and combines the scores a posteriori into a final alignment score. The metric correlates significantly better with human ratings than CLIP and BLIP scores. The assertion-level scores also serve as feedback in a simple iterative procedure that gradually increases the expression of each assertion in the final image. In human user studies, the approach surpasses the previous state of the art by 8.7% in overall text-to-image alignment accuracy. Project page: https://1jsingh.github.io/divide-evaluate-and-refine
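
The decompose, score, and combine steps described in this entry can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `vqa_score` is a hypothetical callable standing in for a VQA model, and plain averaging is an assumed combination rule, since the abstract only says the scores are combined a posteriori.

```python
from typing import Callable, List, Sequence, Tuple


def alignment_score(
    assertions: Sequence[str],
    vqa_score: Callable[[str], float],
) -> Tuple[float, List[float]]:
    """Score each assertion, then combine the scores (assumed rule: mean)."""
    if not assertions:
        raise ValueError("need at least one assertion")
    # vqa_score is a hypothetical stand-in: it should return a value in [0, 1]
    # saying how well the generated image satisfies one assertion.
    scores = [vqa_score(a) for a in assertions]
    return sum(scores) / len(scores), scores


def weakest_assertion(assertions: Sequence[str], scores: Sequence[float]) -> str:
    """Return the assertion with the lowest score, i.e. the one to emphasize next."""
    worst_index = min(range(len(scores)), key=lambda i: scores[i])
    return assertions[worst_index]


# Toy usage with hard-coded scores instead of a real VQA model.
toy_scores = {"there is a cat": 1.0, "the cat wears a red hat": 0.5}
assertions = list(toy_scores)
overall, per_assertion = alignment_score(assertions, toy_scores.__getitem__)
print(overall, weakest_assertion(assertions, per_assertion))
```

The lowest-scoring assertion is the natural candidate for the iterative refinement step, which would re-generate the image with more emphasis on that assertion.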

RoCo: Dialectic Multi-Robot Collaboration with Large Language Models

  • paper_url: http://arxiv.org/abs/2307.04738
  • repo_url: https://github.com/MandiZhao/robot-collab
  • paper_authors: Zhao Mandi, Shreeya Jain, Shuran Song
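  • for: This paper proposes a multi-robot collaboration method based on large language models (LLMs), covering both high-level communication and low-level path planning.
  • methods: LLMs handle both the high-level communication and the low-level path planning. Environment feedback, such as collision checking, is provided so that the agents can improve the accuracy of their plans and waypoints.
  • results: Experiments show high success rates across a variety of multi-robot collaboration scenarios and adaptation to variations in task semantics. The dialog setup is highly interpretable and flexible, and in real-world experiments a human can work with a robot agent to complete tasks together.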
    Abstract We propose a novel approach to multi-robot collaboration that harnesses the power of pre-trained large language models (LLMs) for both high-level communication and low-level path planning. Robots are equipped with LLMs to discuss and collectively reason task strategies. They then generate sub-task plans and task space waypoint paths, which are used by a multi-arm motion planner to accelerate trajectory planning. We also provide feedback from the environment, such as collision checking, and prompt the LLM agents to improve their plan and waypoints in-context. For evaluation, we introduce RoCoBench, a 6-task benchmark covering a wide range of multi-robot collaboration scenarios, accompanied by a text-only dataset for agent representation and reasoning. We experimentally demonstrate the effectiveness of our approach -- it achieves high success rates across all tasks in RoCoBench and adapts to variations in task semantics. Our dialog setup offers high interpretability and flexibility -- in real world experiments, we show RoCo easily incorporates human-in-the-loop, where a user can communicate and collaborate with a robot agent to complete tasks together. See project website https://project-roco.github.io for videos and code.
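    Summary The paper proposes a new approach to multi-robot collaboration that uses pre-trained large language models (LLMs) for both high-level communication and low-level path planning. Robots are equipped with LLMs so that they can discuss and collectively reason about task strategies. They then generate sub-task plans and task-space waypoint paths, which are used to accelerate trajectory planning. Environment feedback, such as collision checking, is provided so that the LLM agents can improve their plans and waypoints. For evaluation, the authors introduce RoCoBench, a 6-task benchmark covering a wide range of multi-robot collaboration scenarios, together with a text-only dataset for agent representation and reasoning. The dialog setup is highly interpretable and flexible, and real-world experiments show that RoCo can complete tasks together with a human. More information is available on the project website: https://project-roco.github.io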
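
The plan, check, and re-prompt cycle described in this entry can be sketched as a small loop. This is an illustrative sketch only: `propose` and `check` are hypothetical callables standing in for the LLM agents and for environment validation such as collision checking, and the round limit is an assumption.

```python
from typing import Callable, List, Optional, Tuple


def plan_with_feedback(
    propose: Callable[[List[str]], List[str]],
    check: Callable[[List[str]], Optional[str]],
    max_rounds: int = 3,
) -> Tuple[Optional[List[str]], List[str]]:
    """Ask for a plan, validate it, and feed any error back into the next request."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        plan = propose(feedback)   # hypothetical: agents see all earlier feedback
        error = check(plan)        # hypothetical: None means the plan is valid
        if error is None:
            return plan, feedback
        feedback.append(error)
    return None, feedback          # no valid plan within the round limit


# Toy usage: the first proposal collides, the second avoids the obstacle.
def toy_propose(feedback: List[str]) -> List[str]:
    return ["move_around_table"] if feedback else ["move_through_table"]


def toy_check(plan: List[str]) -> Optional[str]:
    return "collision at waypoint 0" if "through" in plan[0] else None


print(plan_with_feedback(toy_propose, toy_check))
```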

Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.04726
  • repo_url: None
  • paper_authors: Suzan Ece Ada, Erhan Oztop, Emre Ugur
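  • for: This work aims to improve offline reinforcement learning (offline RL) so that it learns better policies and handles multimodal behavior policies.
  • methods: The work uses conditional diffusion models to obtain expressive policies and introduces state reconstruction feature learning to address out-of-distribution state shift in the offline setting.
  • results: The method is demonstrated in a new 2D Multimodal Contextual Bandit environment and achieves state-of-the-art results on several D4RL benchmark tasks.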
    Abstract Offline Reinforcement Learning (RL) methods leverage previous experiences to learn better policies than the behavior policy used for experience collection. In contrast to behavior cloning, which assumes the data is collected from expert demonstrations, offline RL can work with non-expert data and multimodal behavior policies. However, offline RL algorithms face challenges in handling distribution shifts and effectively representing policies due to the lack of online interaction during training. Prior work on offline RL uses conditional diffusion models to obtain expressive policies to represent multimodal behavior in the dataset. Nevertheless, they are not tailored toward alleviating the out-of-distribution state generalization. We introduce a novel method incorporating state reconstruction feature learning in the recent class of diffusion policies to address the out-of-distribution generalization problem. State reconstruction loss promotes more descriptive representation learning of states to alleviate the distribution shift incurred by the out-of-distribution states. We design a 2D Multimodal Contextual Bandit environment to demonstrate and evaluate our proposed model. We assess the performance of our model not only in this new environment but also on several D4RL benchmark tasks, achieving state-of-the-art results.
    Summary Offline reinforcement learning (RL) methods use previously collected experience to learn policies that are better than the behavior policy used to collect it. Unlike behavior cloning, which assumes the data comes from expert demonstrations, offline RL can work with non-expert data and multimodal behavior policies. However, offline RL algorithms struggle to handle distribution shift and to represent policies effectively because there is no online interaction during training. Prior offline RL methods use conditional diffusion models to obtain expressive policies, but they are not designed for generalization to out-of-distribution states. The authors propose a new method that adds state reconstruction feature learning to the recent class of diffusion policies to address this problem. The state reconstruction loss yields more descriptive state representations, which mitigates the distribution shift. The authors design a 2D Multimodal Contextual Bandit environment to demonstrate and evaluate the model, and also evaluate it on several D4RL benchmark tasks, achieving state-of-the-art results.
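
One way to read "incorporating state reconstruction feature learning" is as an auxiliary term added to the policy objective. The form below is an assumption for illustration, not the paper's stated objective: the weighted sum, the weight $\lambda$, the squared-error form, and the symbols $\mathcal{L}_{\text{policy}}$ and $\hat{s}_\theta$ are all hypothetical names introduced here.

```latex
% Assumed form: policy (diffusion) loss plus a weighted state reconstruction term.
\mathcal{L}(\theta)
  = \mathcal{L}_{\text{policy}}(\theta)
  + \lambda \, \mathcal{L}_{\text{rec}}(\theta),
\qquad
\mathcal{L}_{\text{rec}}(\theta)
  = \mathbb{E}_{s \sim \mathcal{D}}
    \left[ \lVert s - \hat{s}_\theta(s) \rVert_2^2 \right]
```

Here $\mathcal{D}$ is the offline dataset and $\hat{s}_\theta(s)$ is the state reconstructed from the learned representation of $s$. Minimizing the second term pushes the shared representation to retain enough information to describe the state, which is the mechanism the abstract credits for reducing the effect of out-of-distribution states.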

Large Language Models as General Pattern Machines

  • paper_url: http://arxiv.org/abs/2307.04721
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Suvir Mirchandani, Fei Xia, Pete Florence, Brian Ichter, Danny Driess, Montserrat Gonzalez Arenas, Kanishka Rao, Dorsa Sadigh, Andy Zeng
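  • for: This paper studies the use of pre-trained large language models (LLMs) to complete complex token sequences.
  • methods: The paper tests the pattern-completion ability of LLMs, including on sequences expressed with tokens randomly sampled from the vocabulary, and investigates how this zero-shot capability can be applied to robot control problems.
  • results: Without any additional training, LLMs can serve as general sequence modelers that complete complex sequences through in-context learning. The results suggest that LLMs may be usable for robot control problems.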
    Abstract We observe that pre-trained large language models (LLMs) are capable of autoregressively completing complex token sequences -- from arbitrary ones procedurally generated by probabilistic context-free grammars (PCFG), to more rich spatial patterns found in the Abstract Reasoning Corpus (ARC), a general AI benchmark, prompted in the style of ASCII art. Surprisingly, pattern completion proficiency can be partially retained even when the sequences are expressed using tokens randomly sampled from the vocabulary. These results suggest that without any additional training, LLMs can serve as general sequence modelers, driven by in-context learning. In this work, we investigate how these zero-shot capabilities may be applied to problems in robotics -- from extrapolating sequences of numbers that represent states over time to complete simple motions, to least-to-most prompting of reward-conditioned trajectories that can discover and represent closed-loop policies (e.g., a stabilizing controller for CartPole). While difficult to deploy today for real systems due to latency, context size limitations, and compute costs, the approach of using LLMs to drive low-level control may provide an exciting glimpse into how the patterns among words could be transferred to actions.
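    Summary The authors observe that pre-trained large language models (LLMs) can autoregressively complete complex token sequences, from arbitrary sequences generated procedurally by probabilistic context-free grammars (PCFG) to the richer spatial patterns found in the Abstract Reasoning Corpus (ARC) benchmark. Surprisingly, pattern-completion ability is partially retained even when the sequences are expressed with tokens randomly sampled from the vocabulary. These results suggest that, without further training, LLMs can serve as general sequence modelers driven by in-context learning. The work investigates how this zero-shot capability can be applied to robot control problems, from extrapolating number sequences that represent states over time in order to complete simple motions, to prompting reward-conditioned trajectories that yield closed-loop policies (for example, a stabilizing controller for CartPole). Although latency, context size limits, and compute costs make the approach hard to deploy on real systems today, using LLMs to drive low-level control may offer an exciting glimpse of how patterns among words could be transferred to actions.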
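
The "tokens randomly sampled from the vocabulary" experiment relies on a consistent remapping: each distinct symbol in a pattern is replaced by one randomly chosen token, so the structure of the sequence survives while its surface form changes. A minimal sketch of such a remapping follows; the function names, the toy vocabulary, and the use of a fixed seed are assumptions for illustration, and no LLM is called here.

```python
import random
from typing import Dict, Hashable, List, Sequence


def build_token_map(
    symbols: Sequence[Hashable], vocabulary: Sequence[str], seed: int = 0
) -> Dict[Hashable, str]:
    """Assign each distinct symbol its own randomly sampled vocabulary token."""
    unique = list(dict.fromkeys(symbols))  # distinct symbols, first-seen order
    if len(unique) > len(vocabulary):
        raise ValueError("vocabulary too small for a one-to-one mapping")
    rng = random.Random(seed)
    sampled = rng.sample(list(vocabulary), k=len(unique))  # no repeats
    return dict(zip(unique, sampled))


def remap(sequence: Sequence[Hashable], token_map: Dict[Hashable, str]) -> List[str]:
    """Rewrite a sequence with the sampled tokens, preserving its pattern."""
    return [token_map[s] for s in sequence]


# Toy usage: the repetition structure 0, 1, 0, 2 is preserved after remapping.
pattern = [0, 1, 0, 2]
toy_vocabulary = ["apple", "kite", "zinc", "oval", "moss"]
token_map = build_token_map(pattern, toy_vocabulary, seed=42)
print(remap(pattern, token_map))
```

Because the mapping is one-to-one, a completion produced in the remapped token space can be translated back to the original symbols by inverting the dictionary.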

Understanding Real-World AI Planning Domains: A Conceptual Framework

  • paper_url: http://arxiv.org/abs/2307.04701
  • repo_url: None
  • paper_authors: Ebaa Alnazer, Ilche Georgievski
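  • for: This paper aims to support the development of AI planning systems by helping developers understand and handle the complex aspects of real-world application domains.
  • methods: The paper proposes a conceptual framework that identifies and categorizes the aspects of real-world planning domains at varying levels of granularity.
  • results: The framework is illustrated with the sustainable buildings domain to show its applicability. It can help developers design and implement AI planning systems and may benefit planning in real-world application domains.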
    Abstract Planning is a pivotal ability of any intelligent system being developed for real-world applications. AI planning is concerned with researching and developing planning systems that automatically compute plans that satisfy some user objective. Identifying and understanding the relevant and realistic aspects that characterise real-world application domains are crucial to the development of AI planning systems. This provides guidance to knowledge engineers and software engineers in the process of designing, identifying, and categorising resources required for the development process. To the best of our knowledge, such support does not exist. We address this research gap by developing a conceptual framework that identifies and categorises the aspects of real-world planning domains in varying levels of granularity. Our framework provides not only a common terminology but also a comprehensive overview of a broad range of planning aspects exemplified using the domain of sustainable buildings as a prominent application domain of AI planning. The framework has the potential to impact the design, development, and applicability of AI planning systems in real-world application domains.
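    Summary Planning is a key ability of any intelligent system built for real-world applications. AI planning is concerned with researching and developing planning systems that automatically compute plans satisfying a user objective. Correctly identifying and understanding the relevant and realistic aspects of real-world application domains is crucial when developing such systems, and it guides knowledge engineers and software engineers in designing, identifying, and categorizing the resources needed during development. To date, no such support exists. The authors fill this research gap with a conceptual framework that identifies and categorizes the aspects of real-world planning domains. The framework provides a common terminology and a comprehensive overview of a broad range of planning aspects, exemplified with sustainable buildings as a typical application domain of AI planning. The framework has the potential to influence the design, development, and applicability of AI planning systems.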

COMEX: A Tool for Generating Customized Source Code Representations

  • paper_url: http://arxiv.org/abs/2307.04693
  • repo_url: https://github.com/ibm/tree-sitter-codeviews
  • paper_authors: Debeshee Das, Noble Saji Mathews, Alex Mathai, Srikanth Tamilselvam, Kranthi Sedamaki, Sridhar Chimalakonda, Atul Kumar
  • for: The paper aims to provide a tool for creating and combining multiple code-views that can be used by machine learning models for various software engineering tasks.
  • methods: The tool uses tree-sitter, a widely used incremental parser that supports over 40 languages, to generate code-views such as Control Flow Graph (CFG), Data Flow Graph (DFG), and Abstract Syntax Tree (AST) directly from source code.
  • results: The tool is easy to use, currently supports Java and C#, and is extendable to other languages. It supports both intra-procedural and inter-procedural analysis, and can be used to analyze both method-level snippets and program-level snippets.
    Abstract Learning effective representations of source code is critical for any Machine Learning for Software Engineering (ML4SE) system. Inspired by natural language processing, large language models (LLMs) like Codex and CodeGen treat code as generic sequences of text and are trained on huge corpora of code data, achieving state of the art performance on several software engineering (SE) tasks. However, valid source code, unlike natural language, follows a strict structure and pattern governed by the underlying grammar of the programming language. Current LLMs do not exploit this property of the source code as they treat code like a sequence of tokens and overlook key structural and semantic properties of code that can be extracted from code-views like the Control Flow Graph (CFG), Data Flow Graph (DFG), Abstract Syntax Tree (AST), etc. Unfortunately, the process of generating and integrating code-views for every programming language is cumbersome and time consuming. To overcome this barrier, we propose our tool COMEX - a framework that allows researchers and developers to create and combine multiple code-views which can be used by machine learning (ML) models for various SE tasks. Some salient features of our tool are: (i) it works directly on source code (which need not be compilable), (ii) it currently supports Java and C#, (iii) it can analyze both method-level snippets and program-level snippets by using both intra-procedural and inter-procedural analysis, and (iv) it is easily extendable to other languages as it is built on tree-sitter - a widely used incremental parser that supports over 40 languages. We believe this easy-to-use code-view generation and customization tool will give impetus to research in source code representation learning methods and ML4SE. Tool: https://pypi.org/project/comex - GitHub: https://github.com/IBM/tree-sitter-codeviews - Demo: https://youtu.be/GER6U87FVbU
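    Summary Learning effective representations of source code is critical for any Machine Learning for Software Engineering (ML4SE) system. Inspired by natural language processing, large language models (LLMs) such as Codex and CodeGen treat code as generic text sequences and are trained on huge code corpora, achieving state-of-the-art performance on several software engineering (SE) tasks. However, valid source code, unlike natural language, follows a strict structure and pattern governed by the grammar of the programming language. Current LLMs do not exploit this property: they treat code as a sequence of tokens and overlook its structural and semantic properties. To overcome this barrier, the authors propose COMEX, a framework that lets researchers and developers create and combine multiple code-views for use by machine learning (ML) models in various SE tasks. Its salient features are: (i) it works directly on source code, which need not be compilable, (ii) it currently supports Java and C#, (iii) it can analyze both method-level and program-level snippets, and (iv) it is easily extendable to other languages because it is built on tree-sitter, a widely used incremental parser that supports over 40 languages. The authors believe this easy-to-use code-view generation and customization tool will give impetus to research on source code representation learning and ML4SE. Tool: https://pypi.org/project/comex - GitHub: https://github.com/IBM/tree-sitter-codeviews - Demo: https://youtu.be/GER6U87FVbU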
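
To make the idea of a code-view concrete, the sketch below extracts an AST view as a list of parent-to-child edges. It deliberately does not use COMEX or tree-sitter: it swaps in Python's standard `ast` module and analyzes Python source, whereas COMEX targets Java and C#. The function name `ast_edges` is introduced here for illustration only.

```python
import ast
from typing import List, Tuple


def ast_edges(source: str) -> List[Tuple[str, str]]:
    """Return (parent node type, child node type) pairs for a Python snippet."""
    tree = ast.parse(source)
    edges: List[Tuple[str, str]] = []
    for parent in ast.walk(tree):                   # visit every node in the tree
        for child in ast.iter_child_nodes(parent):  # direct children only
            edges.append((type(parent).__name__, type(child).__name__))
    return edges


# Toy usage: a one-line function and its AST edges.
snippet = "def f(x):\n    return x + 1\n"
for parent_type, child_type in ast_edges(snippet):
    print(parent_type, "->", child_type)
```

An edge list like this is one simple graph encoding that a downstream model could consume. Control-flow and data-flow views would add further edge types over the same nodes.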

VampNet: Music Generation via Masked Acoustic Token Modeling

  • paper_url: http://arxiv.org/abs/2307.04686
  • repo_url: https://github.com/hugofloresgarcia/vampnet
  • paper_authors: Hugo Flores Garcia, Prem Seetharaman, Rithesh Kumar, Bryan Pardo
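  • for: This paper targets music generation, compression, inpainting, and variation.
  • methods: The method is a masked acoustic token modeling approach. It is trained with a variable masking schedule and uses a bidirectional transformer architecture for non-autoregressive generation.
  • results: With different prompting methods, VampNet can be applied to music compression, inpainting, outpainting, continuation, and looping, while maintaining style, genre, instrumentation, and other high-level aspects of the music.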
    Abstract We introduce VampNet, a masked acoustic token modeling approach to music synthesis, compression, inpainting, and variation. We use a variable masking schedule during training which allows us to sample coherent music from the model by applying a variety of masking approaches (called prompts) during inference. VampNet is non-autoregressive, leveraging a bidirectional transformer architecture that attends to all tokens in a forward pass. With just 36 sampling passes, VampNet can generate coherent high-fidelity musical waveforms. We show that by prompting VampNet in various ways, we can apply it to tasks like music compression, inpainting, outpainting, continuation, and looping with variation (vamping). Appropriately prompted, VampNet is capable of maintaining style, genre, instrumentation, and other high-level aspects of the music. This flexible prompting capability makes VampNet a powerful music co-creation tool. Code and audio samples are available online.
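    Summary The authors introduce VampNet, a masked acoustic token modeling approach to music synthesis, compression, inpainting, and variation. A variable masking schedule is used during training, which makes it possible to sample from the model at inference time by applying different masking approaches (called prompts). VampNet is non-autoregressive: it uses a bidirectional transformer architecture that attends to all tokens in a forward pass. With just 36 sampling passes, VampNet can generate coherent, high-fidelity musical waveforms. By prompting VampNet in different ways, it can be applied to music compression, inpainting, outpainting, continuation, and looping, while maintaining style, instrumentation, and other high-level aspects of the music. This flexible prompting capability makes VampNet a powerful music co-creation tool. Code and audio samples are available online.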
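
Non-autoregressive sampling over a fixed number of passes can be sketched as iterative unmasking: every pass predicts all masked positions at once, commits the most confident predictions, and leaves the rest masked for later passes. The sketch below is an assumption-laden illustration, not VampNet's code: the confidence-based commit rule and the cosine schedule are assumed (in the style of masked generative decoding), and `predict` is a hypothetical stand-in for the bidirectional transformer.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

MASK = None  # sentinel for a masked position


def iterative_unmask(
    tokens: Sequence[Optional[int]],
    predict: Callable[[List[Optional[int]]], Dict[int, Tuple[int, float]]],
    num_passes: int = 36,
) -> List[Optional[int]]:
    """Fill masked positions over several passes, most confident first."""
    tokens = list(tokens)
    total_masked = sum(t is MASK for t in tokens)
    for step in range(1, num_passes + 1):
        masked = [i for i, t in enumerate(tokens) if t is MASK]
        if not masked:
            break
        # Assumed cosine schedule: how many positions stay masked after this pass.
        keep_masked = int(total_masked * math.cos(math.pi / 2 * step / num_passes))
        if step == num_passes:
            keep_masked = 0  # the last pass must fill everything
        # Hypothetical model call: {position: (token, confidence)} for masked positions.
        predictions = predict(tokens)
        ranked = sorted(masked, key=lambda i: predictions[i][1], reverse=True)
        commit_count = max(1, len(masked) - keep_masked)
        for i in ranked[:commit_count]:
            tokens[i] = predictions[i][0]
    return tokens


# Toy usage: unmasked tokens act as the prompt and are never overwritten.
def toy_predict(current: List[Optional[int]]) -> Dict[int, Tuple[int, float]]:
    return {i: (i * 10, 1.0 / (i + 1)) for i, t in enumerate(current) if t is MASK}


print(iterative_unmask([7, MASK, MASK, 9, MASK], toy_predict, num_passes=4))
```

Under this reading, a prompt is simply the choice of which positions start out unmasked: keeping a prefix gives continuation, keeping both ends gives inpainting, and keeping periodic tokens gives variation on an existing piece.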

Quantifying the Echo Chamber Effect: An Embedding Distance-based Approach

  • paper_url: http://arxiv.org/abs/2307.04668
  • repo_url: https://github.com/faalatawi/echo-chamber-score
  • paper_authors: Faisal Alatawi, Paras Sheth, Huan Liu
  • for: This paper aims to develop a novel metric for quantifying echo chambers in online social media platforms.
  • methods: The proposed method, called Echo Chamber Score (ECS), uses a self-supervised graph autoencoder-based user embedding model (EchoGAE) to measure distances between users in the embedding space without making assumptions about the structure of the interaction graph or requiring labels for user ideologies.
  • results: The proposed method was tested on a Twitter dataset consisting of four topics, and the results showcased the effectiveness of ECS in quantifying echo chambers and shedding light on the dynamics of online discourse.
    Abstract The rise of social media platforms has facilitated the formation of echo chambers, which are online spaces where users predominantly encounter viewpoints that reinforce their existing beliefs while excluding dissenting perspectives. This phenomenon significantly hinders information dissemination across communities and fuels societal polarization. Therefore, it is crucial to develop methods for quantifying echo chambers. In this paper, we present the Echo Chamber Score (ECS), a novel metric that assesses the cohesion and separation of user communities by measuring distances between users in the embedding space. In contrast to existing approaches, ECS is able to function without labels for user ideologies and makes no assumptions about the structure of the interaction graph. To facilitate measuring distances between users, we propose EchoGAE, a self-supervised graph autoencoder-based user embedding model that leverages users' posts and the interaction graph to embed them in a manner that reflects their ideological similarity. To assess the effectiveness of ECS, we use a Twitter dataset consisting of four topics - two polarizing and two non-polarizing. Our results showcase ECS's effectiveness as a tool for quantifying echo chambers and shedding light on the dynamics of online discourse.
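    Summary The rise of social media platforms has led to the formation of echo chambers: online spaces where users mostly encounter viewpoints that match their existing beliefs, while dissenting views are excluded. This phenomenon severely hinders information flow between communities and fuels societal polarization, so methods for quantifying echo chambers are important. The paper proposes the Echo Chamber Score (ECS), a new metric that measures the cohesion and separation of user communities. Unlike existing methods, ECS needs no labels for user ideologies and makes no assumptions about the structure of the interaction graph. To measure distances between users, the authors propose EchoGAE, a self-supervised graph autoencoder-based model that uses users' posts and the interaction graph to embed users in a space that reflects their ideological similarity. To assess ECS, the authors use a Twitter dataset covering four topics, two polarizing and two non-polarizing. The results show that ECS is an effective tool for quantifying echo chambers and that it sheds light on the dynamics of online discourse.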
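
The abstract describes ECS as measuring cohesion and separation through distances in the embedding space, but does not give the formula. The sketch below shows one silhouette-style way to combine the two quantities. This combination rule is an assumption, not the paper's definition, and the embeddings are toy 2D points rather than EchoGAE outputs.

```python
import math
from statistics import mean
from typing import Dict, Hashable, Sequence


def cohesion_separation_score(
    embeddings: Dict[Hashable, Sequence[float]],
    community: Dict[Hashable, Hashable],
) -> float:
    """Average silhouette-style score in [-1, 1]; higher means tighter, more separated communities."""
    users = list(embeddings)
    labels = {community[u] for u in users}
    if len(labels) < 2:
        raise ValueError("need at least two communities")
    per_user = []
    for u in users:
        # Cohesion: mean distance to the other members of the user's own community.
        same = [
            math.dist(embeddings[u], embeddings[v])
            for v in users
            if v != u and community[v] == community[u]
        ]
        cohesion = mean(same) if same else 0.0
        # Separation: mean distance to the nearest other community.
        separation = min(
            mean(
                math.dist(embeddings[u], embeddings[v])
                for v in users
                if community[v] == other
            )
            for other in labels
            if other != community[u]
        )
        denom = max(cohesion, separation)
        per_user.append((separation - cohesion) / denom if denom > 0 else 0.0)
    return mean(per_user)


# Toy usage: two tight, distant communities versus two interleaved ones.
points = {"a": (0, 0), "b": (0, 1), "c": (10, 0), "d": (10, 1)}
separated = {"a": "x", "b": "x", "c": "y", "d": "y"}
interleaved = {"a": "x", "c": "x", "b": "y", "d": "y"}
print(cohesion_separation_score(points, separated))
print(cohesion_separation_score(points, interleaved))
```

A polarized topic would be expected to look like the first case, with users close to their own community and far from the other, while a non-polarizing topic would look more like the second.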