cs.AI - 2023-08-23

CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No

  • paper_url: http://arxiv.org/abs/2308.12213
  • repo_url: https://github.com/xmed-lab/clipn
  • paper_authors: Hualiang Wang, Yi Li, Huifeng Yao, Xiaomeng Li
  • for: Developing a novel method for zero-shot out-of-distribution (OOD) detection using CLIP, a vision-language model. The goal is to equip CLIP with the ability to distinguish between in-distribution (ID) and OOD samples using positive-semantic prompts and negation-semantic prompts.
  • methods: The proposed method, CLIP saying no (CLIPN), utilizes a novel learnable "no" prompt and a "no" text encoder to capture negation semantics within images. Two loss functions are introduced to teach CLIPN to associate images with "no" prompts, enabling it to identify unknown samples. Additionally, two threshold-free inference algorithms are proposed for OOD detection.
  • results: Based on ViT-B-16, CLIPN outperforms 7 well-used algorithms by at least 2.34% and 11.64% in terms of AUROC and FPR95 for zero-shot OOD detection on ImageNet-1K. The code is available on GitHub.
    Abstract Out-of-distribution (OOD) detection refers to training the model on an in-distribution (ID) dataset to classify whether the input images come from unknown classes. Considerable effort has been invested in designing various OOD detection methods based on either convolutional neural networks or transformers. However, zero-shot OOD detection methods driven by CLIP, which only require class names for ID, have received less attention. This paper presents a novel method, namely CLIP saying no (CLIPN), which empowers the logic of saying no within CLIP. Our key motivation is to equip CLIP with the capability of distinguishing OOD and ID samples using positive-semantic prompts and negation-semantic prompts. Specifically, we design a novel learnable no prompt and a no text encoder to capture negation semantics within images. Subsequently, we introduce two loss functions: the image-text binary-opposite loss and the text semantic-opposite loss, which we use to teach CLIPN to associate images with no prompts, thereby enabling it to identify unknown samples. Furthermore, we propose two threshold-free inference algorithms to perform OOD detection by utilizing negation semantics from no prompts and the text encoder. Experimental results on 9 benchmark datasets (3 ID datasets and 6 OOD datasets) for the OOD detection task demonstrate that CLIPN, based on ViT-B-16, outperforms 7 well-used algorithms by at least 2.34% and 11.64% in terms of AUROC and FPR95 for zero-shot OOD detection on ImageNet-1K. Our CLIPN can serve as a solid foundation for effectively leveraging CLIP in downstream OOD tasks. The code is available on https://github.com/xmed-lab/CLIPN.
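The following is a minimal, illustrative sketch of the zero-shot scoring idea described above: class posteriors from positive prompts are combined with per-class "yes vs. no" probabilities from negation prompts, and the resulting ID confidence can be used as a threshold-free OOD indicator. The encoders are replaced by random embeddings, and the combination rule, function names, and temperature are assumptions for illustration rather than CLIPN's exact inference algorithms.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def ood_score(img_emb, pos_text_embs, no_text_embs, temperature=0.01):
    """Illustrative zero-shot OOD score from positive and negation prompts.

    img_emb:        (d,)   L2-normalised image embedding
    pos_text_embs:  (K, d) embeddings of "a photo of a <class k>" prompts
    no_text_embs:   (K, d) embeddings of negation ("no") prompts per class
    Returns a confidence score; lower values suggest the image is OOD.
    """
    p_yes = softmax(pos_text_embs @ img_emb / temperature)              # class posterior
    # probability that the image does *not* belong to class k, from a yes-vs-no comparison
    logits_no = np.stack([pos_text_embs @ img_emb, no_text_embs @ img_emb], axis=1)
    p_no = softmax(logits_no / temperature, axis=1)[:, 1]               # (K,)
    # ID confidence: probability mass on classes that the "no" branch does not reject
    return float(np.sum(p_yes * (1.0 - p_no)))

# toy usage with random stand-in embeddings
rng = np.random.default_rng(0)
d, K = 512, 10
img = rng.normal(size=d); img /= np.linalg.norm(img)
pos = rng.normal(size=(K, d)); pos /= np.linalg.norm(pos, axis=1, keepdims=True)
neg = rng.normal(size=(K, d)); neg /= np.linalg.norm(neg, axis=1, keepdims=True)
print(ood_score(img, pos, neg))
```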

Learning to Learn Financial Networks for Optimising Momentum Strategies

  • paper_url: http://arxiv.org/abs/2308.12212
  • repo_url: None
  • paper_authors: Xingyue Pu, Stefan Zohren, Stephen Roberts, Xiaowen Dong
  • for: Exploiting the interconnections among assets in a financial network as a novel type of risk premium (network momentum) to predict future returns.
  • methods: L2GMOM, an end-to-end machine learning framework that simultaneously learns financial networks and optimises trading signals for network momentum strategies, improving portfolio performance and risk control.
  • results: Backtesting on 64 continuous futures contracts shows a significant improvement in portfolio profitability and risk control, with a Sharpe ratio of 1.74 over a 20-year period.
    Abstract Network momentum provides a novel type of risk premium, which exploits the interconnections among assets in a financial network to predict future returns. However, the current process of constructing financial networks relies heavily on expensive databases and financial expertise, limiting accessibility for small-sized and academic institutions. Furthermore, the traditional approach treats network construction and portfolio optimisation as separate tasks, potentially hindering optimal portfolio performance. To address these challenges, we propose L2GMOM, an end-to-end machine learning framework that simultaneously learns financial networks and optimises trading signals for network momentum strategies. The model of L2GMOM is a neural network with a highly interpretable forward propagation architecture, which is derived from algorithm unrolling. The L2GMOM is flexible and can be trained with diverse loss functions for portfolio performance, e.g. the negative Sharpe ratio. Backtesting on 64 continuous future contracts demonstrates a significant improvement in portfolio profitability and risk control, with a Sharpe ratio of 1.74 across a 20-year period.
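As a concrete illustration of the kind of portfolio-performance loss mentioned in the abstract, the sketch below computes a negative annualised Sharpe ratio from model-generated positions, which could be minimised end-to-end. The annualisation factor, function names, and toy data are assumptions for illustration, not details from the paper.

```python
import numpy as np

def negative_sharpe_loss(positions, returns, ann_factor=252, eps=1e-8):
    """Negative (annualised) Sharpe ratio of a strategy, usable as a training loss.

    positions: (T, N) portfolio weights produced by the model for N contracts
    returns:   (T, N) next-period returns of the contracts
    """
    portfolio_ret = (positions * returns).sum(axis=1)          # (T,) daily P&L
    mean, std = portfolio_ret.mean(), portfolio_ret.std() + eps
    sharpe = np.sqrt(ann_factor) * mean / std
    return -sharpe                                              # minimise => maximise Sharpe

# toy example: random signals on 64 contracts over 500 days
rng = np.random.default_rng(1)
pos = np.tanh(rng.normal(size=(500, 64)))
ret = 0.001 * rng.normal(size=(500, 64))
print(negative_sharpe_loss(pos, ret))
```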

Robustness Analysis of Continuous-Depth Models with Lagrangian Techniques

  • paper_url: http://arxiv.org/abs/2308.12192
  • repo_url: None
  • paper_authors: Sophie A. Neubauer, Radu Grosu
  • for: Presenting, in a unified fashion, deterministic and statistical Lagrangian verification techniques that quantify the behavioral robustness of any time-continuous process formulated as a continuous-depth model.
  • methods: The LRT-NG, SLR, and GoTube algorithms are reviewed for constructing tight reachtubes, i.e., over-approximations of the set of states reachable within a given time horizon, with deterministic and statistical guarantees on the reachtube bounds.
  • results: Experiments demonstrate that the Lagrangian techniques outperform LRT, Flow*, and CAPD, and illustrate their use in the robustness analysis of various continuous-depth models.
    Abstract This paper presents, in a unified fashion, deterministic as well as statistical Lagrangian-verification techniques. They formally quantify the behavioral robustness of any time-continuous process, formulated as a continuous-depth model. To this end, we review LRT-NG, SLR, and GoTube, algorithms for constructing a tight reachtube, that is, an over-approximation of the set of states reachable within a given time-horizon, and provide guarantees for the reachtube bounds. We compare the usage of the variational equations, associated to the system equations, the mean value theorem, and the Lipschitz constants, in achieving deterministic and statistical guarantees. In LRT-NG, the Lipschitz constant is used as a bloating factor of the initial perturbation, to compute the radius of an ellipsoid in an optimal metric, which over-approximates the set of reachable states. In SLR and GoTube, we get statistical guarantees, by using the Lipschitz constants to compute local balls around samples. These are needed to calculate the probability of having found an upper bound, of the true maximum perturbation at every timestep. Our experiments demonstrate the superior performance of Lagrangian techniques, when compared to LRT, Flow*, and CAPD, and illustrate their use in the robustness analysis of various continuous-depth models.

Unsupervised anomalies detection in IIoT edge devices networks using federated learning

  • paper_url: http://arxiv.org/abs/2308.12175
  • repo_url: None
  • paper_authors: Niyomukiza Thamar, Hossam Samy Elsaid Sharara
  • for: Solving the privacy problem for IoT/IIoT devices that hold sensitive data for their owners.
  • methods: Federated learning (FL) as a distributed machine learning approach, specifically the FedAvg algorithm.
  • results: Almost the same performance as the centralized machine learning approach, with the added benefit of addressing privacy concerns.
    Abstract In a network of many IoT devices that each collect data, training a machine learning model normally involves transmitting the data to a central server, which requires strict privacy rules. However, some owners are reluctant to make their data available outside the company due to data security concerns. Federated learning (FL), as a distributed machine learning approach, trains a machine learning model on the device that gathered the data itself. In this scenario, data is not shared over the network for training purposes. FedAvg, one of the FL algorithms, permits a model to be copied to participating devices during a training session. The devices can be chosen at random, and a device can be aborted. The resulting models are sent to the coordinating server, which then averages the models from the devices that finished training. The process is repeated until a desired model accuracy is achieved. By doing this, the FL approach solves the privacy problem for IoT/IIoT devices that hold sensitive data for their owners. In this paper, we leverage the benefits of FL and implement the FedAvg algorithm on a recent dataset that represents modern IoT/IIoT device networks. The results were almost the same as those of the centralized machine learning approach. We also evaluated some shortcomings of FedAvg, such as the unfairness that occurs during training when struggling devices do not participate in every stage of training. This inefficient training of the local or global model could lead to a high number of false alarms in intrusion detection systems for IoT/IIoT gadgets developed using FedAvg. Hence, after evaluating the FedAvg deep autoencoder against a centralized deep autoencoder, we further propose and design a Fair FedAvg algorithm that will be evaluated in future work.
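A minimal sketch of the FedAvg aggregation loop described in the abstract: the global model is copied to a randomly chosen subset of devices, trained locally on data that never leaves the device, and the returned weights are averaged (here weighted by local sample counts, a common convention). The toy local_train routine and all numbers are placeholders.

```python
import numpy as np

def fedavg_round(global_weights, client_data, local_train, frac=0.5, rng=None):
    """One FedAvg round: sample clients, train locally, average returned weights.

    global_weights: list of np.ndarray (the model parameters)
    client_data:    list of per-device datasets (never leaves the device)
    local_train:    fn(weights, data) -> (new_weights, n_samples)
    """
    rng = rng or np.random.default_rng()
    k = max(1, int(frac * len(client_data)))
    chosen = rng.choice(len(client_data), size=k, replace=False)

    updates, sizes = [], []
    for c in chosen:
        w_c, n_c = local_train([w.copy() for w in global_weights], client_data[c])
        updates.append(w_c)
        sizes.append(n_c)

    total = float(sum(sizes))
    # weighted average of client models, layer by layer
    return [sum(n / total * upd[i] for upd, n in zip(updates, sizes))
            for i in range(len(global_weights))]

# toy usage: "training" just nudges the weights toward the local data mean
def toy_train(w, data):
    return [wi + 0.1 * (data.mean() - wi) for wi in w], len(data)

clients = [np.random.default_rng(i).normal(loc=i, size=100) for i in range(5)]
weights = [np.zeros(3)]
for _ in range(10):
    weights = fedavg_round(weights, clients, toy_train, rng=np.random.default_rng(0))
print(weights[0])
```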

Evaluation of Faithfulness Using the Longest Supported Subsequence

  • paper_url: http://arxiv.org/abs/2308.12157
  • repo_url: None
  • paper_authors: Anirudh Mittal, Timo Schick, Mikel Artetxe, Jane Dwivedi-Yu
  • for: evaluating the trustworthiness of machine-generated text, specifically in tasks such as summarization and question-answering
  • methods: introducing a novel approach called the Longest Supported Subsequence (LSS) to compute the faithfulness of machine-generated text, and finetuning a model to generate LSS using a new human-annotated dataset
  • results: demonstrating that the proposed metric correlates better with human ratings than prevailing state-of-the-art metrics, with an 18% enhancement in faithfulness on the dataset, and consistently outperforming other metrics on a summarization dataset across six different models, as well as comparing several popular Large Language Models (LLMs) for faithfulness using this metric.
    Abstract As increasingly sophisticated language models emerge, their trustworthiness becomes a pivotal issue, especially in tasks such as summarization and question-answering. Ensuring their responses are contextually grounded and faithful is challenging due to the linguistic diversity and the myriad of possible answers. In this paper, we introduce a novel approach to evaluate faithfulness of machine-generated text by computing the longest noncontinuous substring of the claim that is supported by the context, which we refer to as the Longest Supported Subsequence (LSS). Using a new human-annotated dataset, we finetune a model to generate LSS. We introduce a new method of evaluation and demonstrate that these metrics correlate better with human ratings when LSS is employed, as opposed to when it is not. Our proposed metric demonstrates an 18% enhancement over the prevailing state-of-the-art metric for faithfulness on our dataset. Our metric consistently outperforms other metrics on a summarization dataset across six different models. Finally, we compare several popular Large Language Models (LLMs) for faithfulness using this metric. We release the human-annotated dataset built for predicting LSS and our fine-tuned model for evaluating faithfulness.
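The paper fine-tunes a model to generate the Longest Supported Subsequence; the sketch below only illustrates the scoring idea with a crude lexical stand-in, treating a claim token as supported when it appears in order in the context (a classic longest-common-subsequence computation). The normalisation by claim length is likewise an assumption.

```python
def longest_supported_subsequence(claim_tokens, context_tokens):
    """Length of the longest (non-contiguous) subsequence of the claim that also
    appears, in order, in the context -- standard LCS dynamic programming."""
    m, n = len(claim_tokens), len(context_tokens)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if claim_tokens[i] == context_tokens[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[m][n]

def faithfulness_score(claim, context):
    claim_toks, ctx_toks = claim.lower().split(), context.lower().split()
    return longest_supported_subsequence(claim_toks, ctx_toks) / max(1, len(claim_toks))

context = "the eiffel tower was completed in 1889 and is located in paris"
claim = "the eiffel tower located in paris was completed in 1890"
print(faithfulness_score(claim, context))  # well below 1.0: parts of the claim are unsupported
```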

Multimodal Latent Emotion Recognition from Micro-expression and Physiological Signals

  • paper_url: http://arxiv.org/abs/2308.12156
  • repo_url: None
  • paper_authors: Liangfei Zhang, Yifei Qian, Ognjen Arandjelovic, Anthony Zhu
  • for: Improving the accuracy of latent emotion recognition.
  • methods: Combining micro-expressions (ME) and physiological signals (PS) in a multimodal learning framework that includes a 1D separable and mixable depthwise inception network, a standardised normal distribution weighted feature fusion method, and depth/physiology guided attention modules.
  • results: The proposed approach outperforms the benchmark method.
    Abstract This paper discusses the benefits of incorporating multimodal data for improving latent emotion recognition accuracy, focusing on micro-expression (ME) and physiological signals (PS). The proposed approach presents a novel multimodal learning framework that combines ME and PS, including a 1D separable and mixable depthwise inception network, a standardised normal distribution weighted feature fusion method, and depth/physiology guided attention modules for multimodal learning. Experimental results show that the proposed approach outperforms the benchmark method, with the weighted fusion method and guided attention modules both contributing to enhanced performance.

A Probabilistic Fluctuation based Membership Inference Attack for Generative Models

  • paper_url: http://arxiv.org/abs/2308.12143
  • repo_url: None
  • paper_authors: Wenjie Fu, Huandong Wang, Chen Gao, Guanghua Liu, Yong Li, Tao Jiang
  • for: Studying membership inference attacks (MIA) against generative models and proposing a probabilistic-fluctuation-based membership inference attack (PFAMI).
  • methods: PFAMI exploits the memorization effect in generative models, inferring membership by analyzing the probabilistic fluctuations around a given record.
  • results: Extensive experiments across multiple generative models and datasets show that PFAMI improves the attack success rate (ASR) by about 27.9% compared with the best baseline.
    Abstract Membership Inference Attack (MIA) identifies whether a record exists in a machine learning model's training set by querying the model. MIAs on the classic classification models have been well-studied, and recent works have started to explore how to transplant MIA onto generative models. Our investigation indicates that existing MIAs designed for generative models mainly depend on the overfitting in target models. However, overfitting can be avoided by employing various regularization techniques, whereas existing MIAs demonstrate poor performance in practice. Unlike overfitting, memorization is essential for deep learning models to attain optimal performance, making it a more prevalent phenomenon. Memorization in generative models leads to an increasing trend in the probability distribution of generating records around the member record. Therefore, we propose a Probabilistic Fluctuation Assessing Membership Inference Attack (PFAMI), a black-box MIA that infers memberships by detecting these trends via analyzing the overall probabilistic fluctuations around given records. We conduct extensive experiments across multiple generative models and datasets, which demonstrate PFAMI can improve the attack success rate (ASR) by about 27.9% when compared with the best baseline.
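A minimal sketch of the intuition behind probabilistic-fluctuation-based membership inference: memorised (member) records tend to sit on sharp local peaks of the generative model's probability, so a record's approximate log-probability is compared against slightly perturbed variants. The log_prob and perturb callables, the toy density, and the perturbation scale are all stand-ins for the paper's actual estimators.

```python
import numpy as np

def fluctuation_membership_score(record, log_prob, perturb, n_neighbors=16, rng=None):
    """Score how strongly `record` looks like a local probability peak.

    log_prob: fn(x) -> approximate log-probability under the target generative model
    perturb:  fn(x, rng) -> a slightly perturbed variant of x
    A larger score (the record beats its neighbourhood) suggests membership.
    """
    rng = rng or np.random.default_rng()
    lp_record = log_prob(record)
    lp_neighbors = np.array([log_prob(perturb(record, rng)) for _ in range(n_neighbors)])
    return lp_record - lp_neighbors.mean()

# toy demo: a density with a sharp "memorised" spike at one training record
memorised = np.full(8, 0.5)
def log_prob(x):
    broad = -0.5 * np.sum(x ** 2)                                # background density
    spike = -0.5 * np.sum(((x - memorised) / 0.02) ** 2)         # memorisation peak
    return np.logaddexp(broad, spike)

perturb = lambda x, r: x + 0.1 * r.normal(size=x.shape)
rng = np.random.default_rng(0)
print(fluctuation_membership_score(memorised, log_prob, perturb, rng=rng))        # large
print(fluctuation_membership_score(np.full(8, 2.0), log_prob, perturb, rng=rng))  # near zero
```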

Semantic Change Detection for the Romanian Language

  • paper_url: http://arxiv.org/abs/2308.12131
  • repo_url: https://github.com/ds4ai-upb/semanticchange-ro
  • paper_authors: Ciprian-Octavian Truică, Victor Tudose, Elena-Simona Apostol
  • for: Analyzing automatic semantic change detection methods and applying them to real-world English and Romanian corpora.
  • methods: Static and contextual word embedding models, Word2Vec and ELMo, are first evaluated on an English dataset and then applied to a Romanian dataset, highlighting different aspects of semantic change in this low-resource language, such as meaning acquisition and loss.
  • results: The experiments show that, depending on the corpus, the most important factors to consider are the choice of model and the distance used to compute the score for detecting semantic change.
    Abstract Automatic semantic change methods try to identify the changes that appear over time in the meaning of words by analyzing their usage in diachronic corpora. In this paper, we analyze different strategies to create static and contextual word embedding models, i.e., Word2Vec and ELMo, on real-world English and Romanian datasets. To test our pipeline and determine the performance of our models, we first evaluate both word embedding models on an English dataset (SEMEVAL-CCOHA). Afterward, we focus our experiments on a Romanian dataset, and we underline different aspects of semantic changes in this low-resource language, such as meaning acquisition and loss. The experimental results show that, depending on the corpus, the most important factors to consider are the choice of model and the distance to calculate a score for detecting semantic change.
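For the static-embedding setting mentioned above, a common recipe is to train one Word2Vec model per time period, align the two spaces with orthogonal Procrustes, and rank words by the cosine distance between their aligned vectors. The sketch below assumes the per-period embeddings are already available and uses synthetic vectors; it is not the paper's exact pipeline.

```python
import numpy as np

def align_spaces(emb_old, emb_new, shared_vocab):
    """Orthogonal Procrustes alignment mapping the old embedding space onto the new one."""
    A = np.stack([emb_old[w] for w in shared_vocab])
    B = np.stack([emb_new[w] for w in shared_vocab])
    u, _, vt = np.linalg.svd(A.T @ B)
    return u @ vt                                       # rotation: old -> new space

def semantic_change(emb_old, emb_new):
    """Cosine distance between a word's aligned old vector and its new vector."""
    shared = sorted(set(emb_old) & set(emb_new))
    R = align_spaces(emb_old, emb_new, shared)
    scores = {}
    for w in shared:
        a, b = emb_old[w] @ R, emb_new[w]
        scores[w] = 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return dict(sorted(scores.items(), key=lambda kv: -kv[1]))   # most changed first

# toy usage with synthetic diachronic embeddings; "target" acquires a new meaning
rng = np.random.default_rng(0)
old = {f"w{i}": rng.normal(size=50) for i in range(200)}
old["target"] = rng.normal(size=50)
new = {w: v + 0.01 * rng.normal(size=50) for w, v in old.items()}
new["target"] = rng.normal(size=50)                      # simulated meaning shift
print(list(semantic_change(old, new))[0])                # -> 'target'
```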

Masking Strategies for Background Bias Removal in Computer Vision Models

  • paper_url: http://arxiv.org/abs/2308.12127
  • repo_url: https://github.com/ananthu-aniraj/masking_strategies_bias_removal
  • paper_authors: Ananthu Aniraj, Cassio F. Dantas, Dino Ienco, Diego Marcos
  • for: Investigating background-induced bias in fine-grained image classification and how masking strategies can mitigate it.
  • methods: Standard backbone models, a Convolutional Neural Network (CNN) and Vision Transformers (ViT), are evaluated with two masking strategies for removing background-induced bias: early masking at the input level and late masking of high-level spatial features.
  • results: Both masking strategies improve robustness to out-of-distribution (OOD) backgrounds, with the best results obtained by a ViT variant using GAP-pooled patch-token classification combined with early masking.
    Abstract Models for fine-grained image classification tasks, where the difference between some classes can be extremely subtle and the number of samples per class tends to be low, are particularly prone to picking up background-related biases and demand robust methods to handle potential examples with out-of-distribution (OOD) backgrounds. To gain deeper insights into this critical problem, our research investigates the impact of background-induced bias on fine-grained image classification, evaluating standard backbone models such as Convolutional Neural Network (CNN) and Vision Transformers (ViT). We explore two masking strategies to mitigate background-induced bias: Early masking, which removes background information at the (input) image level, and late masking, which selectively masks high-level spatial features corresponding to the background. Extensive experiments assess the behavior of CNN and ViT models under different masking strategies, with a focus on their generalization to OOD backgrounds. The obtained findings demonstrate that both proposed strategies enhance OOD performance compared to the baseline models, with early masking consistently exhibiting the best OOD performance. Notably, a ViT variant employing GAP-Pooled Patch token-based classification combined with early masking achieves the highest OOD robustness.
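A rough numpy sketch of the two strategies at a conceptual level: early masking zeroes background pixels at the input, while late masking zeroes the high-level spatial cells that fall on background before pooling. The foreground mask is assumed to come from some segmentation step, and the block-average downsampling of the mask is an illustrative simplification.

```python
import numpy as np

def early_mask(image, fg_mask):
    """Remove background at the input level: keep only foreground pixels.
    image: (H, W, 3); fg_mask: (H, W) with 1 = foreground, 0 = background."""
    return image * fg_mask[..., None]

def late_mask(feature_map, fg_mask):
    """Remove background at the feature level: zero out high-level spatial cells
    that fall on background, then average-pool the remaining cells.
    feature_map: (h, w, C); fg_mask: (H, W), downsampled here by block-averaging."""
    h, w, _ = feature_map.shape
    H, W = fg_mask.shape
    blocks = fg_mask.reshape(h, H // h, w, W // w).mean(axis=(1, 3)) > 0.5    # (h, w)
    kept = feature_map * blocks[..., None]
    denom = max(blocks.sum(), 1.0)
    return kept.sum(axis=(0, 1)) / denom            # masked global average pooling

# toy usage
img = np.random.rand(224, 224, 3)
mask = np.zeros((224, 224)); mask[64:160, 64:160] = 1.0    # pretend segmentation output
feats = np.random.rand(14, 14, 768)                        # e.g. ViT patch tokens on a grid
print(early_mask(img, mask).shape, late_mask(feats, mask).shape)
```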

Quantifying degeneracy in singular models via the learning coefficient

  • paper_url: http://arxiv.org/abs/2308.12108
  • repo_url: https://github.com/edmundlth/scalable_learning_coefficient_with_sgld
  • paper_authors: Edmund Lau, Daniel Murfet, Susan Wei
  • for: This paper is written to explore the concept of degeneracy in deep neural networks (DNN) and to develop a method for quantifying the degree of degeneracy using a quantity called the “learning coefficient”.
  • methods: The paper uses singular learning theory and stochastic gradient Langevin dynamics to develop a computationally scalable approximation of the localized learning coefficient.
  • results: The paper demonstrates the accuracy of the proposed approach in low-dimensional models with known theoretical values, and shows that the local learning coefficient can correctly recover the ordering of degeneracy between various parameter regions of interest. Additionally, the paper demonstrates the ability of the local learning coefficient to reveal the inductive bias of stochastic optimizers for more or less degenerate critical points using an experiment on the MNIST dataset.
    Abstract Deep neural networks (DNN) are singular statistical models which exhibit complex degeneracies. In this work, we illustrate how a quantity known as the \emph{learning coefficient} introduced in singular learning theory quantifies precisely the degree of degeneracy in deep neural networks. Importantly, we will demonstrate that degeneracy in DNN cannot be accounted for by simply counting the number of "flat" directions. We propose a computationally scalable approximation of a localized version of the learning coefficient using stochastic gradient Langevin dynamics. To validate our approach, we demonstrate its accuracy in low-dimensional models with known theoretical values. Importantly, the local learning coefficient can correctly recover the ordering of degeneracy between various parameter regions of interest. An experiment on MNIST shows the local learning coefficient can reveal the inductive bias of stochastic opitmizers for more or less degenerate critical points.

Out of the Cage: How Stochastic Parrots Win in Cyber Security Environments

  • paper_url: http://arxiv.org/abs/2308.12086
  • repo_url: None
  • paper_authors: Maria Rigaki, Ondřej Lukáš, Carlos A. Catania, Sebastian Garcia
  • for: This paper focuses on using pre-trained language models (LLMs) as agents in cybersecurity network environments for sequential decision-making processes.
  • methods: The authors propose using pre-trained LLMs as attacking agents in two reinforcement learning environments and compare their performance to state-of-the-art agents and human testers.
  • results: The LLM agents demonstrate similar or better performance than state-of-the-art agents in most scenarios and configurations, and the best LLM agents perform similarly to human testers without any additional training. This suggests that LLMs have the potential to efficiently address complex decision-making tasks within cybersecurity.
    Abstract Large Language Models (LLMs) have gained widespread popularity across diverse domains involving text generation, summarization, and various natural language processing tasks. Despite their inherent limitations, LLM-based designs have shown promising capabilities in planning and navigating open-world scenarios. This paper introduces a novel application of pre-trained LLMs as agents within cybersecurity network environments, focusing on their utility for sequential decision-making processes. We present an approach wherein pre-trained LLMs are leveraged as attacking agents in two reinforcement learning environments. Our proposed agents demonstrate similar or better performance against state-of-the-art agents trained for thousands of episodes in most scenarios and configurations. In addition, the best LLM agents perform similarly to human testers of the environment without any additional training process. This design highlights the potential of LLMs to efficiently address complex decision-making tasks within cybersecurity. Furthermore, we introduce a new network security environment named NetSecGame. The environment is designed to eventually support complex multi-agent scenarios within the network security domain. The proposed environment mimics real network attacks and is designed to be highly modular and adaptable for various scenarios.

Stabilizing RNN Gradients through Pre-training

  • paper_url: http://arxiv.org/abs/2308.12075
  • repo_url: None
  • paper_authors: Luca Herranz-Celotti, Jean Rouat
  • for: This paper aims to improve the stability of deep neural networks during training, particularly for complex networks that are difficult to analyze analytically.
  • methods: The authors propose the Local Stability Condition (LSC) to stabilize deep neural networks. They extend known stability theories to encompass a broader family of deep recurrent networks and propose a new initialization scheme that gives a weight of one half to the time and depth contributions to the gradient.
  • results: The authors confirm that pre-training both feed-forward and recurrent networks to fulfill the LSC often results in improved final performance across models. Their approach can be implemented as an additional step before pre-training on large augmented datasets, and as an alternative to finding stable initializations analytically.
    Abstract Numerous theories of learning suggest to prevent the gradient variance from exponential growth with depth or time, to stabilize and improve training. Typically, these analyses are conducted on feed-forward fully-connected neural networks or single-layer recurrent neural networks, given their mathematical tractability. In contrast, this study demonstrates that pre-training the network to local stability can be effective whenever the architectures are too complex for an analytical initialization. Furthermore, we extend known stability theories to encompass a broader family of deep recurrent networks, requiring minimal assumptions on data and parameter distribution, a theory that we refer to as the Local Stability Condition (LSC). Our investigation reveals that the classical Glorot, He, and Orthogonal initialization schemes satisfy the LSC when applied to feed-forward fully-connected neural networks. However, analysing deep recurrent networks, we identify a new additive source of exponential explosion that emerges from counting gradient paths in a rectangular grid in depth and time. We propose a new approach to mitigate this issue, that consists on giving a weight of a half to the time and depth contributions to the gradient, instead of the classical weight of one. Our empirical results confirm that pre-training both feed-forward and recurrent networks to fulfill the LSC often results in improved final performance across models. This study contributes to the field by providing a means to stabilize networks of any complexity. Our approach can be implemented as an additional step before pre-training on large augmented datasets, and as an alternative to finding stable initializations analytically.

Identifying Reaction-Aware Driving Styles of Stochastic Model Predictive Controlled Vehicles by Inverse Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.12069
  • repo_url: None
  • paper_authors: Ni Dang, Tao Shi, Zengjie Zhang, Wanxin Jin, Marion Leibold, Martin Buss
  • for: Identifying the driving styles of autonomous vehicles (AVs) with a Maximum Entropy Inverse Reinforcement Learning (ME-IRL) method, so that an AV can evaluate the risk of collisions with nearby AVs and make more reasonable driving decisions in a multi-vehicle autonomous driving system.
  • methods: The driving style is described as a cost function of a series of weighted features, identified via ME-IRL; additional novel features are designed to capture how an AV reacts to its nearby AVs.
  • results: Validated using MATLAB simulation and an off-the-shelf experiment, the proposed method accurately identifies AV driving styles and can improve safety in multi-vehicle AV traffic systems.
    Abstract The driving style of an Autonomous Vehicle (AV) refers to how it behaves and interacts with other AVs. In a multi-vehicle autonomous driving system, an AV capable of identifying the driving styles of its nearby AVs can reliably evaluate the risk of collisions and make more reasonable driving decisions. However, there has not been a consistent definition of driving styles for an AV in the literature, although it is considered that the driving style is encoded in the AV's trajectories and can be identified using Maximum Entropy Inverse Reinforcement Learning (ME-IRL) methods as a cost function. Nevertheless, an important indicator of the driving style, i.e., how an AV reacts to its nearby AVs, is not fully incorporated in the feature design of previous ME-IRL methods. In this paper, we describe the driving style as a cost function of a series of weighted features. We design additional novel features to capture the AV's reaction-aware characteristics. Then, we identify the driving styles from the demonstration trajectories generated by the Stochastic Model Predictive Control (SMPC) using a modified ME-IRL method with our newly proposed features. The proposed method is validated using MATLAB simulation and an off-the-shelf experiment.

RemovalNet: DNN Fingerprint Removal Attacks

  • paper_url: http://arxiv.org/abs/2308.12319
  • repo_url: https://github.com/grasses/removalnet
  • paper_authors: Hongwei Yao, Zheng Li, Kunzhe Huang, Jian Lou, Zhan Qin, Kui Ren
  • for: This paper studies DNN fingerprint removal attacks and the problem of protecting DNN model ownership.
  • methods: The authors propose a min-max bilevel optimization-based DNN fingerprint removal attack, RemovalNet, to evade model ownership verification. The lower-level optimization removes fingerprint-specific knowledge from the attacker's model, while the upper-level optimization distills the victim model's general semantic knowledge to maintain the surrogate model's performance.
  • results: Extensive experiments against four advanced defense methods demonstrate the fidelity, effectiveness, and efficiency of RemovalNet. In particular, compared with baseline attacks, RemovalNet uses about 85% fewer computational resources, and the created surrogate model maintains high accuracy after the fingerprint removal process.
    Abstract With the performance of deep neural networks (DNNs) remarkably improving, DNNs have been widely used in many areas. Consequently, the DNN model has become a valuable asset, and its intellectual property is safeguarded by ownership verification techniques (e.g., DNN fingerprinting). However, the feasibility of the DNN fingerprint removal attack and its potential influence remains an open problem. In this paper, we perform the first comprehensive investigation of DNN fingerprint removal attacks. Generally, the knowledge contained in a DNN model can be categorized into general semantic and fingerprint-specific knowledge. To this end, we propose a min-max bilevel optimization-based DNN fingerprint removal attack named RemovalNet, to evade model ownership verification. The lower-level optimization is designed to remove fingerprint-specific knowledge. While in the upper-level optimization, we distill the victim model's general semantic knowledge to maintain the surrogate model's performance. We conduct extensive experiments to evaluate the fidelity, effectiveness, and efficiency of the RemovalNet against four advanced defense methods on six metrics. The empirical results demonstrate that (1) the RemovalNet is effective. After our DNN fingerprint removal attack, the model distance between the target and surrogate models is x100 times higher than that of the baseline attacks, (2) the RemovalNet is efficient. It uses only 0.2% (400 samples) of the substitute dataset and 1,000 iterations to conduct our attack. Besides, compared with advanced model stealing attacks, the RemovalNet saves nearly 85% of computational resources at most, (3) the RemovalNet achieves high fidelity that the created surrogate model maintains high accuracy after the DNN fingerprint removal process. Our code is available at: https://github.com/grasses/RemovalNet.

InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4

  • paper_url: http://arxiv.org/abs/2308.12067
  • repo_url: None
  • paper_authors: Lai Wei, Zihao Jiang, Weiran Huang, Lichao Sun
  • for: Investigating how the instruction-following ability of multimodal large language models can be strengthened with a small amount of high-quality data.
  • methods: A two-stage training process: pre-training on image-text pairs followed by fine-tuning on supervised vision-language instruction data. Several metrics are proposed to assess the quality of multimodal instruction data, and a simple, effective data selector automatically identifies and filters low-quality vision-language data.
  • results: Fine-tuned on only 200 selected examples (roughly 6% of the instruction-following data used for MiniGPT-4), InstructionGPT-4 outperforms the original MiniGPT-4 on various evaluations (e.g., visual question answering, GPT-4 preference).
    Abstract Multimodal large language models acquire their instruction-following capabilities through a two-stage training process: pre-training on image-text pairs and fine-tuning on supervised vision-language instruction data. Recent studies have shown that large language models can achieve satisfactory results even with a limited amount of high-quality instruction-following data. In this paper, we introduce InstructionGPT-4, which is fine-tuned on a small dataset comprising only 200 examples, amounting to approximately 6% of the instruction-following data used in the alignment dataset for MiniGPT-4. We first propose several metrics to access the quality of multimodal instruction data. Based on these metrics, we present a simple and effective data selector to automatically identify and filter low-quality vision-language data. By employing this method, InstructionGPT-4 outperforms the original MiniGPT-4 on various evaluations (e.g., visual question answering, GPT-4 preference). Overall, our findings demonstrate that less but high-quality instruction tuning data is efficient to enable multimodal large language models to generate better output.

Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference

  • paper_url: http://arxiv.org/abs/2308.12066
  • repo_url: None
  • paper_authors: Ranggi Hwang, Jianyu Wei, Shijie Cao, Changho Hwang, Xiaohu Tang, Ting Cao, Mao Yang, Minsoo Rhu
  • for: Fast and scalable inference for large transformer-based language models (LLMs).
  • methods: A Mixture-of-Experts (MoE) architecture combined with an algorithm-system co-design, Pre-gated MoE, whose pre-gating function alleviates the dynamic activation of sparse experts and thereby addresses the compute and memory demands of large-scale LLMs.
  • results: Pre-gated MoE resolves the compute and memory challenges of conventional MoE architectures, improving performance and reducing GPU memory consumption while maintaining model quality, so that large-scale LLMs can be deployed on a single GPU.
    Abstract Large language models (LLMs) based on transformers have made significant strides in recent years, the success of which is driven by scaling up their model size. Despite their high algorithmic performance, the computational and memory requirements of LLMs present unprecedented challenges. To tackle the high compute requirements of LLMs, the Mixture-of-Experts (MoE) architecture was introduced which is able to scale its model size without proportionally scaling up its computational requirements. Unfortunately, MoE's high memory demands and dynamic activation of sparse experts restrict its applicability to real-world problems. Previous solutions that offload MoE's memory-hungry expert parameters to CPU memory fall short because the latency to migrate activated experts from CPU to GPU incurs high performance overhead. Our proposed Pre-gated MoE system effectively tackles the compute and memory challenges of conventional MoE architectures using our algorithm-system co-design. Pre-gated MoE employs our novel pre-gating function which alleviates the dynamic nature of sparse expert activation, allowing our proposed system to address the large memory footprint of MoEs while also achieving high performance. We demonstrate that Pre-gated MoE is able to improve performance, reduce GPU memory consumption, while also maintaining the same level of model quality. These features allow our Pre-gated MoE system to cost-effectively deploy large-scale LLMs using just a single GPU with high performance.
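A conceptual sketch of the pre-gating idea as described in the abstract: the expert selection for the next MoE block is computed one block ahead, so the chosen experts' parameters could be prefetched (e.g., from CPU to GPU) before they are needed. The block structure, single-token forward pass, and the commented-out prefetch hook are illustrative assumptions, not the paper's system design.

```python
import numpy as np

def top_k(scores, k):
    idx = np.argsort(scores)[-k:]
    return idx, np.exp(scores[idx]) / np.exp(scores[idx]).sum()

class PreGatedMoEBlock:
    """MoE FFN block whose experts for the *next* block are chosen ahead of time."""

    def __init__(self, d_model, d_ff, n_experts, k=2, rng=None):
        rng = rng or np.random.default_rng(0)
        self.experts = [(rng.normal(size=(d_model, d_ff)) / np.sqrt(d_model),
                         rng.normal(size=(d_ff, d_model)) / np.sqrt(d_ff))
                        for _ in range(n_experts)]
        self.pre_gate = rng.normal(size=(d_model, n_experts)) / np.sqrt(d_model)
        self.k = k

    def forward(self, x, active_experts):
        """x: (d_model,). `active_experts` was decided by the previous block,
        so their weights could already have been prefetched to GPU memory."""
        idx, w = active_experts
        y = sum(wi * np.maximum(x @ self.experts[i][0], 0.0) @ self.experts[i][1]
                for i, wi in zip(idx, w))
        # pre-gating: decide *now* which experts the next block will need
        next_idx, next_w = top_k(x @ self.pre_gate, self.k)
        # prefetch_to_gpu(next_idx)   # placeholder for the system-side prefetch
        return x + y, (next_idx, next_w)

# toy usage: chain two blocks; each block's pre-gate picks the next block's experts
blocks = [PreGatedMoEBlock(64, 128, n_experts=8) for _ in range(2)]
x = np.random.default_rng(1).normal(size=64)
act = (np.array([0, 1]), np.array([0.5, 0.5]))   # initial selection (e.g. embedding-level gate)
for blk in blocks:
    x, act = blk.forward(x, act)
print(x.shape)
```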

Ensembling Uncertainty Measures to Improve Safety of Black-Box Classifiers

  • paper_url: http://arxiv.org/abs/2308.12065
  • repo_url: None
  • paper_authors: Tommaso Zoppi, Andrea Ceccarelli, Andrea Bondavalli
  • for: Proposing a safety wrapper (SPROUT) that detects and blocks misclassifications of machine learning (ML) classifiers.
  • methods: Multiple uncertainty measures are computed on the inputs and outputs of a black-box classifier; when a misclassification is suspected, propagation of the classifier's output to the encompassing system is blocked.
  • results: Experiments show that SPROUT accurately detects a large fraction of misclassifications, and all of them in specific cases. SPROUT applies to binary and multi-class classification on both image and tabular datasets.
    Abstract Machine Learning (ML) algorithms that perform classification may predict the wrong class, experiencing misclassifications. It is well-known that misclassifications may have cascading effects on the encompassing system, possibly resulting in critical failures. This paper proposes SPROUT, a Safety wraPper thROugh ensembles of UncertainTy measures, which suspects misclassifications by computing uncertainty measures on the inputs and outputs of a black-box classifier. If a misclassification is detected, SPROUT blocks the propagation of the output of the classifier to the encompassing system. The resulting impact on safety is that SPROUT transforms erratic outputs (misclassifications) into data omission failures, which can be easily managed at the system level. SPROUT has a broad range of applications as it fits binary and multi-class classification, comprising image and tabular datasets. We experimentally show that SPROUT always identifies a huge fraction of the misclassifications of supervised classifiers, and it is able to detect all misclassifications in specific cases. SPROUT implementation contains pre-trained wrappers, it is publicly available and ready to be deployed with minimal effort.
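A minimal sketch of the wrapper idea: several standard uncertainty measures are computed on the classifier's output distribution, and the prediction is withheld (turning a potential misclassification into a data-omission failure) when enough of them fire. The specific measures, thresholds, and voting rule are illustrative assumptions; SPROUT's actual ensemble and its input-side measures are not reproduced here.

```python
import numpy as np

def uncertainty_measures(probs):
    """A few standard uncertainty measures on a predicted class distribution."""
    probs = np.clip(probs, 1e-12, 1.0)
    return {
        "max_prob": 1.0 - probs.max(),                                # 1 - confidence
        "entropy": float(-(probs * np.log(probs)).sum() / np.log(len(probs))),
        "margin": 1.0 - float(np.diff(np.sort(probs)[-2:])[0]),       # 1 - (top1 - top2)
    }

def safety_wrapper(probs, thresholds=None, min_votes=2):
    """Return the predicted class, or None (data omission) if the ensemble of
    uncertainty measures suspects a misclassification."""
    thresholds = thresholds or {"max_prob": 0.5, "entropy": 0.6, "margin": 0.7}
    m = uncertainty_measures(probs)
    votes = sum(m[k] > t for k, t in thresholds.items())
    return None if votes >= min_votes else int(np.argmax(probs))

print(safety_wrapper(np.array([0.92, 0.05, 0.03])))   # confident -> class 0
print(safety_wrapper(np.array([0.40, 0.35, 0.25])))   # uncertain -> None (omitted)
```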

FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering

  • paper_url: http://arxiv.org/abs/2308.12060
  • repo_url: https://github.com/leezythu/flexkbqa
  • paper_authors: Zhenyu Li, Sunqi Fan, Yu Gu, Xiuxing Li, Zhichao Duan, Bowen Dong, Ning Liu, Jianyong Wang
  • for: Improving the performance of KBQA models in real-world applications, particularly when high-quality annotated data is scarce.
  • methods: Automatically sampled programs, such as SPARQL queries, are combined with large language models (LLMs): the sampled programs reduce manual annotation effort, and the LLMs convert the programs into natural language questions to build a synthetic training set.
  • results: Extensive experiments on GrailQA, WebQSP, and KQA Pro show that, in few-shot and even zero-shot settings, FlexKBQA achieves strong performance, surpassing all baselines and approaching supervised models, reaching 93% of the performance of fully supervised models.
    Abstract Knowledge base question answering (KBQA) is a critical yet challenging task due to the vast number of entities within knowledge bases and the diversity of natural language questions posed by users. Unfortunately, the performance of most KBQA models tends to decline significantly in real-world scenarios where high-quality annotated data is insufficient. To mitigate the burden associated with manual annotation, we introduce FlexKBQA by utilizing Large Language Models (LLMs) as program translators for addressing the challenges inherent in the few-shot KBQA task. Specifically, FlexKBQA leverages automated algorithms to sample diverse programs, such as SPARQL queries, from the knowledge base, which are subsequently converted into natural language questions via LLMs. This synthetic dataset facilitates training a specialized lightweight model for the KB. Additionally, to reduce the barriers of distribution shift between synthetic data and real user questions, FlexKBQA introduces an executionguided self-training method to iterative leverage unlabeled user questions. Furthermore, we explore harnessing the inherent reasoning capability of LLMs to enhance the entire framework. Consequently, FlexKBQA delivers substantial flexibility, encompassing data annotation, deployment, and being domain agnostic. Through extensive experiments on GrailQA, WebQSP, and KQA Pro, we observe that under the few-shot even the more challenging zero-shot scenarios, FlexKBQA achieves impressive results with a few annotations, surpassing all previous baselines and even approaching the performance of supervised models, achieving a remarkable 93% performance relative to the fully-supervised models. We posit that FlexKBQA represents a significant advancement towards exploring better integration of large and lightweight models. The code is open-sourced.

Layer-wise Feedback Propagation

  • paper_url: http://arxiv.org/abs/2308.12053
  • repo_url: None
  • paper_authors: Leander Weber, Jim Berend, Alexander Binder, Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin
  • for: Proposing Layer-wise Feedback Propagation (LFP), an explainability-based training method that uses Layer-wise Relevance Propagation (LRP) to assign rewards to individual connections based on their respective contributions to solving a given task, as an alternative to traditional gradient descent.
  • methods: LFP distributes a reward signal throughout the model via LRP without requiring gradient computations, avoiding some limitations of gradient-based methods; it strengthens structures that receive positive feedback while reducing the influence of structures that receive negative feedback, achieving comparable performance across different models and datasets.
  • results: Convergence of LFP is established theoretically and empirically, and its effectiveness is demonstrated on various models and datasets; LFP can also be used to train models with no meaningful derivatives, e.g., step-function activated Spiking Neural Networks (SNNs), and for transfer learning to efficiently utilize existing knowledge.
    Abstract In this paper, we present Layer-wise Feedback Propagation (LFP), a novel training approach for neural-network-like predictors that utilizes explainability, specifically Layer-wise Relevance Propagation(LRP), to assign rewards to individual connections based on their respective contributions to solving a given task. This differs from traditional gradient descent, which updates parameters towards anestimated loss minimum. LFP distributes a reward signal throughout the model without the need for gradient computations. It then strengthens structures that receive positive feedback while reducingthe influence of structures that receive negative feedback. We establish the convergence of LFP theoretically and empirically, and demonstrate its effectiveness in achieving comparable performance to gradient descent on various models and datasets. Notably, LFP overcomes certain limitations associated with gradient-based methods, such as reliance on meaningful derivatives. We further investigate how the different LRP-rules can be extended to LFP, what their effects are on training, as well as potential applications, such as training models with no meaningful derivatives, e.g., step-function activated Spiking Neural Networks (SNNs), or for transfer learning, to efficiently utilize existing knowledge.

Aligning Language Models with Offline Reinforcement Learning from Human Feedback

  • paper_url: http://arxiv.org/abs/2308.12050
  • repo_url: None
  • paper_authors: Jian Hu, Li Tao, June Yang, Chandler Zhou
  • for: This paper aims to align language models with human preferences using offline reinforcement learning from human feedback (RLHF), without relying on online reinforcement learning techniques like Proximal Policy Optimization (PPO) that can be unstable and challenging to tune.
  • methods: The authors propose maximum likelihood estimation (MLE) with filtering, reward-weighted regression (RWR), and Decision Transformer (DT) to align language models to human preferences. They employ a loss function similar to supervised fine-tuning to ensure stable model training, and compare their methods with PPO and other offline RLHF methods.
  • results: The experimental results show that the DT alignment outperforms other offline RLHF methods and is better than PPO, with a much lower computing resource requirement (around 12.3%) and a simpler machine learning system.
    Abstract Learning from human preferences is crucial for language models (LMs) to effectively cater to human needs and societal values. Previous research has made notable progress by leveraging human feedback to follow instructions. However, these approaches rely primarily on online reinforcement learning (RL) techniques like Proximal Policy Optimization (PPO), which have been proven unstable and challenging to tune for language models. Moreover, PPO requires complex distributed system implementation, hindering the efficiency of large-scale distributed training. In this study, we propose an offline reinforcement learning from human feedback (RLHF) framework to align LMs using pre-generated samples without interacting with RL environments. Specifically, we explore maximum likelihood estimation (MLE) with filtering, reward-weighted regression (RWR), and Decision Transformer (DT) to align language models to human preferences. By employing a loss function similar to supervised fine-tuning, our methods ensure more stable model training than PPO with a simple machine learning system~(MLSys) and much fewer (around 12.3\%) computing resources. Experimental results demonstrate the DT alignment outperforms other Offline RLHF methods and is better than PPO.
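As a concrete example of one of the offline objectives mentioned (reward-weighted regression), the sketch below weights a supervised negative log-likelihood by an exponentiated, baseline-subtracted reward, so higher-reward responses pull the policy harder without any on-policy rollouts. The baseline choice, temperature beta, and normalisation are assumptions rather than the paper's exact formulation.

```python
import numpy as np

def rwr_loss(log_probs, rewards, beta=1.0):
    """Reward-weighted regression objective for offline alignment.

    log_probs: (B,) sum of token log-probabilities of each response under the
               language model (policy) being trained
    rewards:   (B,) scalar human-feedback / reward-model scores
    The loss is a weighted negative log-likelihood, so responses with higher
    reward are imitated more strongly -- no on-policy rollouts (as in PPO) needed.
    """
    adv = rewards - rewards.mean()                    # simple baseline subtraction
    weights = np.exp(adv / beta)
    weights = weights / weights.sum()                 # normalise for numerical stability
    return float(-(weights * log_probs).sum())

# toy batch: the higher-reward responses dominate the objective
lp = np.array([-12.0, -15.0, -9.0, -20.0])
r = np.array([0.9, 0.1, 0.7, 0.2])
print(rwr_loss(lp, r))
```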

Towards Privacy-Supporting Fall Detection via Deep Unsupervised RGB2Depth Adaptation

  • paper_url: http://arxiv.org/abs/2308.12049
  • repo_url: https://github.com/1015206533/privacy_supporting_fall_detection
  • paper_authors: Hejun Xiao, Kunyu Peng, Xiangsheng Huang, Alina Roitberg, Hao Li, Zhaohui Wang, Rainer Stiefelhagen
  • for: Fall detection for health monitoring, so that alerts can be triggered and interventions can happen faster when a person experiences a fall.
  • methods: Unsupervised RGB-to-depth (RGB2Depth) domain adaptation: an RGB-trained model is adapted so that privacy-preserving depth data can be used at test time for fall detection.
  • results: State-of-the-art results on the unsupervised RGB2Depth domain adaptation task for fall detection, without requiring appearance-revealing RGB data at test time.
    Abstract Fall detection is a vital task in health monitoring, as it allows the system to trigger an alert and therefore enabling faster interventions when a person experiences a fall. Although most previous approaches rely on standard RGB video data, such detailed appearance-aware monitoring poses significant privacy concerns. Depth sensors, on the other hand, are better at preserving privacy as they merely capture the distance of objects from the sensor or camera, omitting color and texture information. In this paper, we introduce a privacy-supporting solution that makes the RGB-trained model applicable in depth domain and utilizes depth data at test time for fall detection. To achieve cross-modal fall detection, we present an unsupervised RGB to Depth (RGB2Depth) cross-modal domain adaptation approach that leverages labelled RGB data and unlabelled depth data during training. Our proposed pipeline incorporates an intermediate domain module for feature bridging, modality adversarial loss for modality discrimination, classification loss for pseudo-labeled depth data and labeled source data, triplet loss that considers both source and target domains, and a novel adaptive loss weight adjustment method for improved coordination among various losses. Our approach achieves state-of-the-art results in the unsupervised RGB2Depth domain adaptation task for fall detection. Code is available at https://github.com/1015206533/privacy_supporting_fall_detection.

CgT-GAN: CLIP-guided Text GAN for Image Captioning

  • paper_url: http://arxiv.org/abs/2308.12045
  • repo_url: https://github.com/lihr747/cgtgan
  • paper_authors: Jiarui Yu, Haoran Li, Yanbin Hao, Bin Zhu, Tong Xu, Xiangnan He
  • for: The paper is written for improving image captioning without human-annotated image-caption pairs, using a text-only training paradigm and incorporating images into the training process.
  • methods: The paper proposes a CLIP-guided text GAN (CgT-GAN) that uses adversarial training and a CLIP-based reward to provide semantic guidance, and introduces a novel semantic guidance reward called CLIP-agg that aligns the generated caption with a weighted text embedding.
  • results: The paper shows that CgT-GAN outperforms state-of-the-art methods significantly across all metrics on three subtasks (ZS-IC, In-UIC, and Cross-UIC).
    Abstract The large-scale visual-language pre-trained model, Contrastive Language-Image Pre-training (CLIP), has significantly improved image captioning for scenarios without human-annotated image-caption pairs. Recent advanced CLIP-based image captioning without human annotations follows a text-only training paradigm, i.e., reconstructing text from shared embedding space. Nevertheless, these approaches are limited by the training/inference gap or huge storage requirements for text embeddings. Given that it is trivial to obtain images in the real world, we propose CLIP-guided text GAN (CgT-GAN), which incorporates images into the training process to enable the model to "see" real visual modality. Particularly, we use adversarial training to teach CgT-GAN to mimic the phrases of an external text corpus and CLIP-based reward to provide semantic guidance. The caption generator is jointly rewarded based on the caption naturalness to human language calculated from the GAN's discriminator and the semantic guidance reward computed by the CLIP-based reward module. In addition to the cosine similarity as the semantic guidance reward (i.e., CLIP-cos), we further introduce a novel semantic guidance reward called CLIP-agg, which aligns the generated caption with a weighted text embedding by attentively aggregating the entire corpus. Experimental results on three subtasks (ZS-IC, In-UIC and Cross-UIC) show that CgT-GAN outperforms state-of-the-art methods significantly across all metrics. Code is available at https://github.com/Lihr747/CgtGAN.
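A minimal sketch of the two semantic-guidance rewards described above: CLIP-cos is the cosine similarity between the generated caption's text embedding and the image embedding, while the CLIP-agg variant compares the caption against an attention-weighted aggregate of corpus text embeddings. Random vectors stand in for the CLIP encoders, and the softmax temperature and aggregation details are assumptions.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def clip_cos_reward(caption_emb, image_emb):
    """CLIP-cos style reward: cosine similarity between caption and image embeddings."""
    return float(caption_emb @ image_emb /
                 (np.linalg.norm(caption_emb) * np.linalg.norm(image_emb) + 1e-12))

def clip_agg_reward(caption_emb, image_emb, corpus_embs, tau=0.07):
    """CLIP-agg style reward: compare the caption with a weighted aggregate of
    corpus text embeddings, attended to by the image."""
    sims = corpus_embs @ image_emb / (
        np.linalg.norm(corpus_embs, axis=1) * np.linalg.norm(image_emb) + 1e-12)
    target = softmax(sims / tau) @ corpus_embs          # attention-weighted text target
    return clip_cos_reward(caption_emb, target)

# toy usage with random stand-in CLIP embeddings
rng = np.random.default_rng(0)
img, cap = rng.normal(size=512), rng.normal(size=512)
corpus = rng.normal(size=(1000, 512))
print(clip_cos_reward(cap, img), clip_agg_reward(cap, img, corpus))
```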

A multiobjective continuation method to compute the regularization path of deep neural networks

  • paper_url: http://arxiv.org/abs/2308.12044
  • repo_url: https://github.com/aamakor/continuation-method
  • paper_authors: Augustina C. Amakor, Konstantin Sonntag, Sebastian Peitz
  • for: Proposing an efficient method for computing the trade-off between sparsity and the loss of deep neural networks (DNNs), i.e., the regularization path.
  • methods: A multiobjective continuation algorithm that approximates the entire Pareto front of the two conflicting objectives (the empirical loss and the $\ell^1$ norm).
  • results: Numerical examples using both deterministic and stochastic gradients demonstrate the efficiency and generality of the algorithm; the paper also shows that knowledge of the regularization path allows for a well-generalizing network parametrization.
    Abstract Sparsity is a highly desired feature in deep neural networks (DNNs) since it ensures numerical efficiency, improves the interpretability of models (due to the smaller number of relevant features), and robustness. In machine learning approaches based on linear models, it is well known that there exists a connecting path between the sparsest solution in terms of the $\ell^1$ norm (i.e., zero weights) and the non-regularized solution, which is called the regularization path. Very recently, there was a first attempt to extend the concept of regularization paths to DNNs by means of treating the empirical loss and sparsity ($\ell^1$ norm) as two conflicting criteria and solving the resulting multiobjective optimization problem. However, due to the non-smoothness of the $\ell^1$ norm and the high number of parameters, this approach is not very efficient from a computational perspective. To overcome this limitation, we present an algorithm that allows for the approximation of the entire Pareto front for the above-mentioned objectives in a very efficient manner. We present numerical examples using both deterministic and stochastic gradients. We furthermore demonstrate that knowledge of the regularization path allows for a well-generalizing network parametrization.
    摘要 深度神经网络(DNN)中的稀畴性是一个非常强地需求的特性,因为它确保了数学效率、提高模型解释性(由于更少的相关特征),并且提高了模型的稳定性。在线性机器学习方法基于的模型中,已经知道存在一个连接到最稀 Solution 的梯度路径,这个梯度路径被称为规regularization path。很近期,有一个首次尝试将这个概念扩展到 DNN 中,通过对 empirical loss 和稀畴性($\ell^1$ 范数)作为两个矛盾的目标,解决 resulting 多目标优化问题。然而,由于 $\ell^1$ 范数的非滑坡性和参数的高数量,这种方法并不很有效从计算机科学的角度。为了解决这个限制,我们提出了一个算法,可以高效地 aproximate 整个 Pareto front 上的目标。我们通过 deterministic 和 Stochastic 梯度来进行数值示例。此外,我们还证明了知道规regularization path 可以提供一个良好的网络参数化。
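    The paper treats the empirical loss and the $\ell^1$ norm as two conflicting objectives and approximates the whole Pareto front. The sketch below only illustrates the underlying trade-off by sweeping a weighted scalarization with proximal (soft-thresholding) steps; it is not the paper's predictor-corrector continuation algorithm, and all names and hyper-parameters are assumptions:

      import torch

      def soft_threshold(w, tau):
          # Proximal operator of tau * ||w||_1 (soft-thresholding).
          return torch.sign(w) * torch.clamp(w.abs() - tau, min=0.0)

      def trace_tradeoff(model, loss_fn, data, lambdas, lr=1e-2, steps=200):
          # Sweep the l1 weight (warm-starting each solve) to trace an
          # (empirical loss, ||theta||_1) trade-off curve with proximal
          # gradient steps. Naive scalarization, not the paper's
          # predictor-corrector continuation method.
          x, y = data
          front = []
          for lam in lambdas:                     # e.g. a decreasing sequence
              for _ in range(steps):
                  loss = loss_fn(model(x), y)
                  model.zero_grad()
                  loss.backward()
                  with torch.no_grad():
                      for p in model.parameters():
                          if p.grad is not None:
                              p -= lr * p.grad
                          p.copy_(soft_threshold(p, lr * lam))
              with torch.no_grad():
                  l1 = sum(p.abs().sum() for p in model.parameters()).item()
                  front.append((loss_fn(model(x), y).item(), l1))
          return front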

IncreLoRA: Incremental Parameter Allocation Method for Parameter-Efficient Fine-tuning

  • paper_url: http://arxiv.org/abs/2308.12043
  • repo_url: https://github.com/feiyuzhang98/increlora
  • paper_authors: Feiyu Zhang, Liangzhi Li, Junhao Chen, Zhouqiang Jiang, Bowen Wang, Yiming Qian
  • for: This paper targets efficient fine-tuning of large pre-trained language models (PLMs) to reduce training and storage costs, especially when there are many downstream tasks.
  • methods: It proposes IncreLoRA, an incremental parameter allocation method that adaptively adds trainable low-rank components to each target module during training based on module importance scores, rather than pruning a fixed LoRA budget (a simplified sketch follows this entry).
  • results: On the GLUE benchmark, the method achieves higher parameter efficiency than the baselines and significantly outperforms them in low-resource settings.
    Abstract With the increasing size of pre-trained language models (PLMs), fine-tuning all the parameters in the model is not efficient, especially when there are a large number of downstream tasks, which incur significant training and storage costs. Many parameter-efficient fine-tuning (PEFT) approaches have been proposed, among which, Low-Rank Adaptation (LoRA) is a representative approach that injects trainable rank decomposition matrices into every target module. Yet LoRA ignores the importance of parameters in different modules. To address this problem, many works have been proposed to prune the parameters of LoRA. However, under limited training conditions, the upper bound of the rank of the pruned parameter matrix is still affected by the preset values. We, therefore, propose IncreLoRA, an incremental parameter allocation method that adaptively adds trainable parameters during training based on the importance scores of each module. This approach is different from the pruning method as it is not limited by the initial number of training parameters, and each parameter matrix has a higher rank upper bound for the same training overhead. We conduct extensive experiments on GLUE to demonstrate the effectiveness of IncreLoRA. The results show that our method owns higher parameter efficiency, especially when under the low-resource settings where our method significantly outperforms the baselines. Our code is publicly available.
    摘要 随着预训语言模型(PLM)的大小的增加,精细调整所有模型参数不是efficient,特别是当有大量下游任务时,会导致显著的训练和存储成本。许多参数精细调整(PEFT)approach已经提出,其中LoRA是一个代表性的方法,它在每个目标模块中注入可学习的排序矩阵。然而,LoRA忽略了参数在不同模块中的重要性。为解决这个问题,许多工作已经提出了对LoRA的剪枝。然而,在限制的训练条件下,剪枝后的参数矩阵的rankUpperBound仍然受到先前设置的值的影响。因此,我们提出了IncreLoRA,一种逐步分配参数的方法,它在训练过程中基于每个模块的重要性分数进行逐步添加可学习参数。这种方法与剪枝方法不同,它不受限于初始训练参数的数量,每个参数矩阵的rankUpperBound都高于同样的训练负担。我们在GLUE上进行了广泛的实验,结果表明我们的方法具有更高的参数效率,特别是在低资源设置下,我们的方法显著超过了基eline。我们的代码公开 disponibles。
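    An illustrative sketch of the incremental-allocation idea: a LoRA adapter whose rank can grow during training, with growth driven by a simple importance score. The importance proxy, the initialization, and the growth schedule below are assumptions for illustration, not the paper's exact formulation, and the optimizer must be re-pointed at the new parameters after each growth step:

      import torch
      import torch.nn as nn

      class GrowingLoRALinear(nn.Module):
          # LoRA adapter whose rank can be increased during training.
          def __init__(self, base: nn.Linear, init_rank=1, scale=1.0):
              super().__init__()
              self.base = base
              for p in self.base.parameters():
                  p.requires_grad = False           # frozen pre-trained weight
              d_out, d_in = base.weight.shape
              self.A = nn.Parameter(torch.randn(init_rank, d_in) * 0.01)
              self.B = nn.Parameter(torch.zeros(d_out, init_rank))
              self.scale = scale

          def forward(self, x):
              return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

          def importance(self):
              # Rough sensitivity proxy: |param * grad| summed over the adapter;
              # modules with the highest scores receive extra rank.
              s = 0.0
              for p in (self.A, self.B):
                  if p.grad is not None:
                      s += (p * p.grad).abs().sum().item()
              return s

          @torch.no_grad()
          def grow(self, extra_rank=1):
              # Append new trainable components; the new B columns start at zero,
              # so the layer's output is unchanged at the moment of growth.
              d_out, d_in = self.base.weight.shape
              new_A = torch.randn(extra_rank, d_in, device=self.A.device) * 0.01
              new_B = torch.zeros(d_out, extra_rank, device=self.B.device)
              self.A = nn.Parameter(torch.cat([self.A, new_A], dim=0))
              self.B = nn.Parameter(torch.cat([self.B, new_B], dim=1))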

PREFER: Prompt Ensemble Learning via Feedback-Reflect-Refine

  • paper_url: http://arxiv.org/abs/2308.12033
  • repo_url: https://github.com/zcrwind/prefer
  • paper_authors: Chenrui Zhang, Lin Liu, Jinpeng Wang, Chuyuan Wang, Xiao Sun, Hongyu Wang, Mingchen Cai
  • for: Improving the performance of Large Language Models (LLMs) by making prompt ensembling more effective and robust.
  • methods: Proposes PREFER, a simple, universal, and automatic prompt-ensemble method that uses a feedback-reflect-refine loop together with a forward/backward prompt-bagging scheme for iterative optimization (a boosting-style sketch follows this entry).
  • results: Extensive experiments show that PREFER achieves state-of-the-art performance across multiple task types, surpassing existing methods by a significant margin.
    Abstract As an effective tool for eliciting the power of Large Language Models (LLMs), prompting has recently demonstrated unprecedented abilities across a variety of complex tasks. To further improve the performance, prompt ensemble has attracted substantial interest for tackling the hallucination and instability of LLMs. However, existing methods usually adopt a two-stage paradigm, which requires a pre-prepared set of prompts with substantial manual effort, and is unable to perform directed optimization for different weak learners. In this paper, we propose a simple, universal, and automatic method named PREFER (Pompt Ensemble learning via Feedback-Reflect-Refine) to address the stated limitations. Specifically, given the fact that weak learners are supposed to focus on hard examples during boosting, PREFER builds a feedback mechanism for reflecting on the inadequacies of existing weak learners. Based on this, the LLM is required to automatically synthesize new prompts for iterative refinement. Moreover, to enhance stability of the prompt effect evaluation, we propose a novel prompt bagging method involving forward and backward thinking, which is superior to majority voting and is beneficial for both feedback and weight calculation in boosting. Extensive experiments demonstrate that our PREFER achieves state-of-the-art performance in multiple types of tasks by a significant margin. We have made our code publicly available.
    摘要 为了更好地利用大语言模型(LLM)的能力,提问最近在多种复杂任务中表现出了无 precedent 的能力。为了进一步提高性能,提问ensemble 已经吸引了很多关注,以解决 LLM 的幻觉和不稳定性。然而,现有的方法通常采用两个阶段 paradigm,需要大量的手动努力来预先准备提问集,并且无法 direktly 优化不同的弱学习者。在这篇论文中,我们提出了一种简单、通用和自动的方法 named PREFER (提问组合学习 via 反馈反思改进),以解决所提到的限制。具体来说,我们知道弱学习者在扩大时会关注困难的示例,PREFER 建立了反馈机制,以反思现有弱学习者的不足。基于这,LLM 需要自动生成新的提问,进行迭代改进。此外,为了增强提问效果评估的稳定性,我们提出了一种新的提问袋裹法,其包括前向和后向思考,比较有利于提问评估和权重计算在扩大中。我们的 EXPERIMENT 表明,我们的 PREFER 可以在多种任务中达到 estado 的表现,与当前最佳方法相比,差距非常大。我们的代码已经公开发布。
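    A boosting-style sketch of the feedback-reflect-refine loop, assuming a hypothetical llm(prompt, x) callable that returns a label and a hypothetical llm.reflect(prompt, mistakes) call that writes an improved prompt. The AdaBoost-style weighting is a simplification, and the paper's forward/backward prompt bagging is omitted:

      import math

      def prefer_boosting(llm, examples, labels, rounds=5):
          # Feedback-Reflect-Refine as prompt boosting: each round evaluates the
          # current prompt, re-weights the hard examples, and asks the LLM to
          # synthesize an improved prompt from its own mistakes.
          n = len(examples)
          w = [1.0 / n] * n                        # per-example weights
          prompt = "Answer the question."          # initial weak learner
          ensemble = []
          for _ in range(rounds):
              preds = [llm(prompt, x) for x in examples]
              wrong = {i for i, (p, y) in enumerate(zip(preds, labels)) if p != y}
              err = min(max(sum(w[i] for i in wrong), 1e-6), 1 - 1e-6)
              alpha = 0.5 * math.log((1 - err) / err)
              ensemble.append((prompt, alpha))
              # Re-weight so the next prompt focuses on the hard examples.
              w = [wi * math.exp(alpha if i in wrong else -alpha) for i, wi in enumerate(w)]
              z = sum(w)
              w = [wi / z for wi in w]
              mistakes = [(examples[i], labels[i], preds[i]) for i in wrong]
              prompt = llm.reflect(prompt, mistakes)   # feedback -> reflect -> refine
          return ensemble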

CACTUS: a Comprehensive Abstraction and Classification Tool for Uncovering Structures

  • paper_url: http://arxiv.org/abs/2308.12031
  • repo_url: None
  • paper_authors: Luca Gherardini, Varun Ravi Varma, Karol Capala, Roger Woods, Jose Sousa
  • for: Improving explainable, secure analytics, particularly in settings with small data sets where deep learning models are costly to deploy and opaque.
  • methods: Presents CACTUS, an explainable AI tool that supports categorical attributes while preserving their original meaning, optimizes memory usage, and speeds up computation through parallelization; it shows the user per-class attribute frequencies and ranks attributes by discriminative power (a sketch follows this entry).
  • results: Performance is assessed on the Wisconsin diagnostic breast cancer and Thyroid0387 data sets, where the tool reports the frequency and ranking of attributes in each class.
    Abstract The availability of large data sets is providing an impetus for driving current artificial intelligent developments. There are, however, challenges for developing solutions with small data sets due to practical and cost-effective deployment and the opacity of deep learning models. The Comprehensive Abstraction and Classification Tool for Uncovering Structures called CACTUS is presented for improved secure analytics by effectively employing explainable artificial intelligence. It provides additional support for categorical attributes, preserving their original meaning, optimising memory usage, and speeding up the computation through parallelisation. It shows to the user the frequency of the attributes in each class and ranks them by their discriminative power. Its performance is assessed by application to the Wisconsin diagnostic breast cancer and Thyroid0387 data sets.
    摘要 大量数据的可用性正为现代人工智能发展提供了推动力。然而,对小数据集的解决方案存在实用和成本效益的挑战,尤其是深度学习模型的透明性问题。本文提出了一种名为“CACTUS”的全面抽象分类工具,用于提高安全分析。它能够有效地使用可解释人工智能,并且支持 categorical 特征,保持原始含义,优化内存使用情况,并通过并行计算加速计算。它可以在用户看到每个类别 attribute 的频率和排名它们的抑制力。它的性能被评估通过应用于美国威斯康星诊断乳腺癌和 thyroid0387 数据集。
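    A small sketch of the kind of output described above: per-class attribute frequencies plus a ranking of attributes by discriminative power. The chi-square score used for the ranking is an assumption for illustration, not necessarily CACTUS's own criterion:

      import pandas as pd
      from scipy.stats import chi2_contingency

      def attribute_report(df: pd.DataFrame, target: str):
          # For each categorical attribute: per-class value frequencies, plus a
          # ranking of attributes by a chi-square discriminative-power score.
          scores = {}
          for col in df.columns:
              if col == target:
                  continue
              table = pd.crosstab(df[col], df[target])   # per-class value counts
              chi2, _, _, _ = chi2_contingency(table)
              scores[col] = chi2
              print(f"{col}:\n{table}\n")
          return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)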

Prompt-Based Length Controlled Generation with Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.12030
  • repo_url: None
  • paper_authors: Renlong Jie, Xiaojun Meng, Lifeng Shang, Xin Jiang, Qun Liu
  • for: Controlling the output length of GPT-style models to improve usability and reduce inference cost, so that they better fit the needs of different scenarios.
  • methods: Uses reinforcement learning with a reward signal given by either a trainable or a rule-based reward model to steer generation toward a prompt-specified target length (a rule-based reward sketch follows this entry).
  • results: Achieves higher prompt-based length-control accuracy for summarization on the popular CNNDM and NYT datasets.
    Abstract Recently, large language models (LLMs) like ChatGPT and GPT-4 have attracted great attention given their surprising improvement and performance. Length controlled generation of LLMs emerges as an important topic, which also enables users to fully leverage the capability of LLMs in more real-world scenarios like generating a proper answer or essay of a desired length. In addition, the autoregressive generation in LLMs is extremely time-consuming, while the ability of controlling this generated length can arbitrarily reduce the inference cost by limiting the length, and thus satisfy different needs. Therefore, we aim to propose a prompt-based length control method to achieve this length controlled generation, which can also be widely applied in GPT-style LLMs. In particular, we adopt reinforcement learning with the reward signal given by either trainable or rule-based reward model, which further affects the generation of LLMs via rewarding a pre-defined target length. Experiments show that our method significantly improves the accuracy of prompt-based length control for summarization task on popular datasets like CNNDM and NYT. We believe this length-controllable ability can provide more potentials towards the era of LLMs.
    摘要 To address this issue, we propose a prompt-based length control method using reinforcement learning with a trainable or rule-based reward model. Our method aims to achieve length-controlled generation in GPT-style LLMs, and experiments show that it significantly improves the accuracy of prompt-based length control for summarization tasks on popular datasets like CNNDM and NYT. We believe that this length-controllable ability has great potential in the era of LLMs.
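    A sketch of a rule-based length reward of the kind described above; the exact shape, tolerance, and normalization are assumptions, and the paper additionally supports a trainable reward model:

      def length_reward(generated_tokens, target_len, tolerance=5):
          # Highest reward when the output length matches the target requested in
          # the prompt, decaying linearly with the deviation beyond a tolerance.
          deviation = abs(len(generated_tokens) - target_len)
          if deviation <= tolerance:
              return 1.0
          return max(0.0, 1.0 - (deviation - tolerance) / target_len)

    During RL fine-tuning this value, possibly combined with a task-quality reward, would serve as the per-sequence return.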

A Scale-Invariant Task Balancing Approach for Multi-Task Learning

  • paper_url: http://arxiv.org/abs/2308.12029
  • repo_url: None
  • paper_authors: Baijiong Lin, Weisen Jiang, Feiyang Ye, Yu Zhang, Pengguang Chen, Ying-Cong Chen, Shu Liu
  • for: Addressing the task-balancing problem in multi-task learning (MTL), so that multiple related tasks can be learned simultaneously with strong performance.
  • methods: Proposes Scale-Invariant Multi-Task Learning (SI-MTL), which applies a logarithm transformation to all task losses to achieve scale invariance at the loss level, and a gradient-balancing method, SI-G, which normalizes all task gradients to the magnitude of the maximum gradient norm (a sketch follows this entry).
  • results: Extensive experiments on several benchmark datasets demonstrate the effectiveness of SI-G and the state-of-the-art performance of SI-MTL.
    Abstract Multi-task learning (MTL), a learning paradigm to learn multiple related tasks simultaneously, has achieved great success in various fields. However, task-balancing remains a significant challenge in MTL, with the disparity in loss/gradient scales often leading to performance compromises. In this paper, we propose a Scale-Invariant Multi-Task Learning (SI-MTL) method to alleviate the task-balancing problem from both loss and gradient perspectives. Specifically, SI-MTL contains a logarithm transformation which is performed on all task losses to ensure scale-invariant at the loss level, and a gradient balancing method, SI-G, which normalizes all task gradients to the same magnitude as the maximum gradient norm. Extensive experiments conducted on several benchmark datasets consistently demonstrate the effectiveness of SI-G and the state-of-the-art performance of SI-MTL.
    摘要 多任务学习(MTL),一种同时学习多个相关任务的学习方法,在各个领域取得了很大成功。然而,任务均衡仍然是MTL中的主要挑战,因为任务损失/梯度的尺度差异常常导致性能下降。在这篇论文中,我们提出了一种减小任务均衡问题的扩展MTL方法(SI-MTL)。具体来说,SI-MTL包括一种对所有任务损失进行对数变换,以保证损失水平上的减小,以及一种梯度均衡方法SI-G,该方法将所有任务梯度 норmalizes到最大梯度 норма的同一个范围内。我们在多个标准数据集上进行了广泛的实验,并经常证明了SI-G的有效性和SI-MTL的状态之最性。
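    A simplified training-step sketch reconstructed from the abstract: task losses are log-transformed for scale invariance at the loss level, and each task gradient is rescaled to the largest task-gradient norm before summation (SI-G); the implementation details below are assumptions, not the authors' code:

      import torch

      def si_mtl_step(model, task_losses, optimizer, eps=1e-8):
          # (i) Log-transform each task loss (scale invariance at the loss level).
          # (ii) SI-G-style balancing: rescale every task gradient to the largest
          #      task-gradient norm before summing and stepping.
          params = [p for p in model.parameters() if p.requires_grad]
          task_grads, norms = [], []
          for loss in task_losses:
              g = torch.autograd.grad(torch.log(loss + eps), params,
                                      retain_graph=True, allow_unused=True)
              g = [gi if gi is not None else torch.zeros_like(p)
                   for gi, p in zip(g, params)]
              task_grads.append(g)
              norms.append(torch.cat([gi.reshape(-1) for gi in g]).norm())
          g_max = max(norms)
          optimizer.zero_grad()
          for p in params:
              p.grad = torch.zeros_like(p)
          for g, n in zip(task_grads, norms):
              scale = g_max / (n + eps)          # normalize to the max gradient norm
              for p, gi in zip(params, g):
                  p.grad += scale * gi
          optimizer.step()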

LKPNR: LLM and KG for Personalized News Recommendation Framework

  • paper_url: http://arxiv.org/abs/2308.12028
  • repo_url: https://github.com/xuan-zw/lkpnr
  • paper_authors: Chen hao, Xie Runfeng, Cui Xiangyang, Yan Zhou, Wang Xin, Xuan Zhanwei, Zhang Kai
  • for: Improving the accuracy of personalized news recommendation by addressing traditional methods' difficulty with complex news-text semantics and the long-tail problem of inactive users.
  • methods: combining Large Language Models (LLM) and Knowledge Graphs (KG) into semantic representations of traditional methods, using LLMs’ powerful text understanding ability to generate news representations containing rich semantic information, and combining information about news entities and mining high-order structural information through multiple hops in KG.
  • results: compared with various traditional models, the framework significantly improves the recommendation effect, and the successful integration of LLM and KG in the framework has established a feasible path for achieving more accurate personalized recommendations in the news field.
    Abstract Accurately recommending candidate news articles to users is a basic challenge faced by personalized news recommendation systems. Traditional methods are usually difficult to grasp the complex semantic information in news texts, resulting in unsatisfactory recommendation results. Besides, these traditional methods are more friendly to active users with rich historical behaviors. However, they can not effectively solve the "long tail problem" of inactive users. To address these issues, this research presents a novel general framework that combines Large Language Models (LLM) and Knowledge Graphs (KG) into semantic representations of traditional methods. In order to improve semantic understanding in complex news texts, we use LLMs' powerful text understanding ability to generate news representations containing rich semantic information. In addition, our method combines the information about news entities and mines high-order structural information through multiple hops in KG, thus alleviating the challenge of long tail distribution. Experimental results demonstrate that compared with various traditional models, the framework significantly improves the recommendation effect. The successful integration of LLM and KG in our framework has established a feasible path for achieving more accurate personalized recommendations in the news field. Our code is available at https://github.com/Xuan-ZW/LKPNR.
    摘要 基于大语言模型和知识图的新闻个性化推荐系统是一个基本挑战。传统方法通常难以捕捉新闻文本中复杂的 semantic information,导致推荐结果不 satisfactory。另外,这些传统方法更适合有活跃用户行为的活跃用户。然而,它们无法有效解决“长尾问题”,即不活跃用户。为解决这些问题,本研究提出了一种新的通用框架,combines Large Language Models (LLM) and Knowledge Graphs (KG) into semantic representations of traditional methods。为了提高新闻文本中的semantic理解,我们使用 LLMs的强大文本理解能力生成新闻表示形式,具有丰富的semantic信息。此外,我们的方法结合新闻实体信息,通过多个层次结构信息在 KG 中挖掘高阶结构信息,从而缓解长尾分布的挑战。实验结果表明,与各种传统模型相比,我们的框架显著提高了推荐效果。我们成功地将 LLM 和 KG 集成到我们的框架中,建立了实现更高精度的个性化推荐在新闻领域的可行道路。我们的代码可以在 中找到。

From Instructions to Intrinsic Human Values – A Survey of Alignment Goals for Big Models

  • paper_url: http://arxiv.org/abs/2308.12014
  • repo_url: None
  • paper_authors: Jing Yao, Xiaoyuan Yi, Xiting Wang, Jindong Wang, Xing Xie
  • for: Surveying the different alignment goals used in existing work on aligning big models, to help identify the most essential goal.
  • methods: Investigates existing work from two perspectives: the definition of alignment goals and the evaluation of alignment.
  • results: Identifies three distinct levels of alignment goals and a transformation from fundamental abilities toward value orientation, indicating the potential of intrinsic human values as the alignment goal for enhanced LLMs.
    Abstract Big models, exemplified by Large Language Models (LLMs), are models typically pre-trained on massive data and comprised of enormous parameters, which not only obtain significantly improved performance across diverse tasks but also present emergent capabilities absent in smaller models. However, the growing intertwining of big models with everyday human lives poses potential risks and might cause serious social harm. Therefore, many efforts have been made to align LLMs with humans to make them better follow user instructions and satisfy human preferences. Nevertheless, `what to align with' has not been fully discussed, and inappropriate alignment goals might even backfire. In this paper, we conduct a comprehensive survey of different alignment goals in existing work and trace their evolution paths to help identify the most essential goal. Particularly, we investigate related works from two perspectives: the definition of alignment goals and alignment evaluation. Our analysis encompasses three distinct levels of alignment goals and reveals a goal transformation from fundamental abilities to value orientation, indicating the potential of intrinsic human values as the alignment goal for enhanced LLMs. Based on such results, we further discuss the challenges of achieving such intrinsic value alignment and provide a collection of available resources for future research on the alignment of big models.
    摘要 大型模型,如大语言模型(LLMs),是通常在庞大数据上预训练的模型,不仅在多种任务上显示出较好的性能,而且具有emergent功能,与更小的模型不同。然而,大型模型与人类生活的日益相互 penetration可能会带来潜在的风险,可能会对社会造成严重的危害。因此,许多努力已经被做出,以使LMMs与人类更好地配合,使其更好地遵从用户的指令和满足人类的偏好。然而,`与何进行对齐'的问题尚未得到了完全的讨论,不当的对齐目标可能会倒退。在这篇论文中,我们进行了完整的对齐目标的检查,并跟踪它们的演化路径,以帮助identify最重要的目标。特别是,我们从两个视角 investigate existing work:对齐目标的定义和对齐评估。我们的分析覆盖了三级别的对齐目标,并显示了对齐目标的变化从基本能力到价值观,这表明了内置人类价值的可能性作为LLMs的对齐目标,以提高它们的性能。基于这些结果,我们进一步讨论了实现这种内置价值对齐的挑战,并提供了未来对big models的对齐研究的可用资源。

Quantum-Noise-driven Generative Diffusion Models

  • paper_url: http://arxiv.org/abs/2308.12013
  • repo_url: None
  • paper_authors: Marco Parigi, Stefano Martina, Filippo Caruso
  • for: This paper proposes and discusses quantum generalizations of generative diffusion models for learning and sampling from complex data distributions.
  • methods: It harnesses the non-trivial interplay among coherence, entanglement, and noise in currently available noisy quantum processors to overcome the main computational burdens of classical diffusion models during inference.
  • results: The work is expected to pave the way for quantum-inspired or quantum-based generative diffusion algorithms for classical tasks such as data generation and prediction, with real-world applications ranging from climate forecasting and neuroscience to traffic-flow analysis and financial forecasting.
    Abstract Generative models realized with machine learning techniques are powerful tools to infer complex and unknown data distributions from a finite number of training samples in order to produce new synthetic data. Diffusion models are an emerging framework that have recently overcome the performance of the generative adversarial networks in creating synthetic text and high-quality images. Here, we propose and discuss the quantum generalization of diffusion models, i.e., three quantum-noise-driven generative diffusion models that could be experimentally tested on real quantum systems. The idea is to harness unique quantum features, in particular the non-trivial interplay among coherence, entanglement and noise that the currently available noisy quantum processors do unavoidably suffer from, in order to overcome the main computational burdens of classical diffusion models during inference. Hence, we suggest to exploit quantum noise not as an issue to be detected and solved but instead as a very remarkably beneficial key ingredient to generate much more complex probability distributions that would be difficult or even impossible to express classically, and from which a quantum processor might sample more efficiently than a classical one. Therefore, our results are expected to pave the way for new quantum-inspired or quantum-based generative diffusion algorithms addressing more powerfully classical tasks as data generation/prediction with widespread real-world applications ranging from climate forecasting to neuroscience, from traffic flow analysis to financial forecasting.
    摘要 通过机器学习技术实现的生成模型是一种 poderoso工具,可以从 finite 数据样本中推断出复杂而未知的数据分布,生成新的合成数据。扩散模型是一种emerging框架,最近已经超越了生成对抗网络在创造合成文本和高质量图像方面的性能。在这里,我们提出并讨论了量子扩散模型的普适化,即利用量子噪声驱动的三种量子扩散生成模型,可以在真正的量子系统上进行实验。我们的想法是利用量子特有的非rivial相互作用,即准确性、耦合和噪声,以超越经典扩散模型的主要计算危机。因此,我们建议利用量子噪声不作为问题,而是作为非常有利的重要组分,以生成更复杂的概率分布,这些分布可能是经典计算不能表达,而量子处理器可能可以更高效地采样这些分布。因此,我们的结果预计将为新的量子激发或量子基于的生成扩散算法开拓出新的应用领域,从气候预测到神经科学,从交通流量分析到金融预测。

Trustworthy Representation Learning Across Domains

  • paper_url: http://arxiv.org/abs/2308.12315
  • repo_url: None
  • paper_authors: Ronghang Zhu, Dongliang Guo, Daiqing Qi, Zhixuan Chu, Xiang Yu, Sheng Li
  • for: Proposing a trustworthy representation learning framework for cross-domain scenarios encountered in real-world applications.
  • methods: Organizes the literature around four concepts, namely robustness, privacy, fairness, and explainability, to give a comprehensive review of this research direction.
  • results: Presents the trustworthy cross-domain representation learning framework built on these four concepts, summarizes and analyzes existing methods, and discusses future research directions.
    Abstract As AI systems have obtained significant performance to be deployed widely in our daily live and human society, people both enjoy the benefits brought by these technologies and suffer many social issues induced by these systems. To make AI systems good enough and trustworthy, plenty of researches have been done to build guidelines for trustworthy AI systems. Machine learning is one of the most important parts for AI systems and representation learning is the fundamental technology in machine learning. How to make the representation learning trustworthy in real-world application, e.g., cross domain scenarios, is very valuable and necessary for both machine learning and AI system fields. Inspired by the concepts in trustworthy AI, we proposed the first trustworthy representation learning across domains framework which includes four concepts, i.e, robustness, privacy, fairness, and explainability, to give a comprehensive literature review on this research direction. Specifically, we first introduce the details of the proposed trustworthy framework for representation learning across domains. Second, we provide basic notions and comprehensively summarize existing methods for the trustworthy framework from four concepts. Finally, we conclude this survey with insights and discussions on future research directions.

Topical-Chat: Towards Knowledge-Grounded Open-Domain Conversations

  • paper_url: http://arxiv.org/abs/2308.11995
  • repo_url: https://github.com/alexa/Topical-Chat
  • paper_authors: Karthik Gopalakrishnan, Behnam Hedayatnia, Qinlang Chen, Anna Gottardi, Sanjeev Kwatra, Anu Venkatesh, Raefer Gabriel, Dilek Hakkani-Tur
  • for: Providing a knowledge-grounded human-human conversation dataset to support deeper, more engaging open-domain conversational AI.
  • methods: Builds a knowledge-grounded conversation dataset whose underlying knowledge spans 8 broad topics and in which conversation partners have no explicitly defined roles.
  • results: Trains several state-of-the-art encoder-decoder conversational models on Topical-Chat and benchmarks them with automated and human evaluation.
    Abstract Building socialbots that can have deep, engaging open-domain conversations with humans is one of the grand challenges of artificial intelligence (AI). To this end, bots need to be able to leverage world knowledge spanning several domains effectively when conversing with humans who have their own world knowledge. Existing knowledge-grounded conversation datasets are primarily stylized with explicit roles for conversation partners. These datasets also do not explore depth or breadth of topical coverage with transitions in conversations. We introduce Topical-Chat, a knowledge-grounded human-human conversation dataset where the underlying knowledge spans 8 broad topics and conversation partners don't have explicitly defined roles, to help further research in open-domain conversational AI. We also train several state-of-the-art encoder-decoder conversational models on Topical-Chat and perform automated and human evaluation for benchmarking.
    摘要 建立社交机器人,能够与人类进行深入有趣的开放领域对话,是人工智能(AI)的极大挑战之一。为此,机器人需要能够有效地利用多个领域的世界知识进行对话。现有的知识基础对话数据集主要是通过显式角色定义对话伙伴进行预设。这些数据集还不探讨对话的深度或广度,也没有探讨对话的转变。我们介绍Topical-Chat,一个基于知识的人类对话数据集,其下面知识覆盖8个广泛的主题,对话伙伴没有显式定义角色,以便进一步推动开放领域对话AI的研究。我们还在Topical-Chat上训练了多种当今最佳encoder-decoder对话模型,并进行自动和人类评估,以便作为参考。

Critical Evaluation of Artificial Intelligence as Digital Twin of Pathologist for Prostate Cancer Pathology

  • paper_url: http://arxiv.org/abs/2308.11992
  • repo_url: None
  • paper_authors: Okyaz Eminaga, Mahmoud Abbas, Christian Kunder, Yuri Tolkach, Ryan Han, James D. Brooks, Rosalie Nolley, Axel Semjonow, Martin Boegemann, Robert West, Jin Long, Richard Fan, Olaf Bettendorf
  • for: Critically evaluating an AI-based digital twin of a pathologist (vPatho) for prostate cancer detection and grading.
  • methods: Tests vPatho on 2,603 hematoxylin-and-eosin-stained histology images of prostate tissue and analyzes the factors influencing tumor-grade disagreement between vPatho and six human pathologists.
  • results: vPatho matched published performance for prostate cancer detection and tumor volume estimation, but grading concordance dropped on prostatectomy specimens (kappa = 0.44) versus biopsy cores (kappa = 0.70); raising the secondary Gleason pattern threshold from 5% to 10% improved concordance to kappa = 0.64, with discordance linked to the vertical extent of tumors toward the prostate boundary and the proportion of slides containing cancer.
    Abstract Prostate cancer pathology plays a crucial role in clinical management but is time-consuming. Artificial intelligence (AI) shows promise in detecting prostate cancer and grading patterns. We tested an AI-based digital twin of a pathologist, vPatho, on 2,603 histology images of prostate tissue stained with hematoxylin and eosin. We analyzed various factors influencing tumor-grade disagreement between vPatho and six human pathologists. Our results demonstrated that vPatho achieved comparable performance in prostate cancer detection and tumor volume estimation, as reported in the literature. Concordance levels between vPatho and human pathologists were examined. Notably, moderate to substantial agreement was observed in identifying complementary histological features such as ductal, cribriform, nerve, blood vessels, and lymph cell infiltrations. However, concordance in tumor grading showed a decline when applied to prostatectomy specimens (kappa = 0.44) compared to biopsy cores (kappa = 0.70). Adjusting the decision threshold for the secondary Gleason pattern from 5% to 10% improved the concordance level between pathologists and vPatho for tumor grading on prostatectomy specimens (kappa from 0.44 to 0.64). Potential causes of grade discordance included the vertical extent of tumors toward the prostate boundary and the proportions of slides with prostate cancer. Gleason pattern 4 was particularly associated with discordance. Notably, grade discordance with vPatho was not specific to any of the six pathologists involved in routine clinical grading. In conclusion, our study highlights the potential utility of AI in developing a digital twin of a pathologist. This approach can help uncover limitations in AI adoption and the current grading system for prostate cancer pathology.
    摘要 prostata cancer 的生理学pathology 在临床管理中发挥关键作用,但是它很时间消耗。人工智能(AI)表示可能用于检测 prostata cancer 和分化模式。我们使用了一个基于 AI 的pathologist 数字 близнеvPatho 测试了 2,603 个 prostata组织片中的 Hematoxylin 和 Eosin 染色图像。我们分析了不同因素 influencing tumor-grade 的不一致性。结果表明,vPatho 在检测 prostata cancer 和组织体积方面达到了文献报告的性能。我们对 vPatho 和六名人类病理学家之间的一致性进行了分析。注意,在识别 complementary 的 histological 特征方面,such as ductal、cribriform、nerve、血管和lymphocyte infiltration 中,moderate to substantial 的一致性被观察到。然而,在评估 tumor grading 方面,一致性下降到 prostatectomy specimens (kappa = 0.44),比 biopsy cores (kappa = 0.70)更低。通过调整 secondary Gleason 模式的决策阈值从 5% 到 10%,提高了 pathologists 和 vPatho 之间的一致性水平(kappa from 0.44 to 0.64)。可能导致 grade discordance 的原因包括 tumor 的 vertical 分布向 prostata 边界以及检测到的肿瘤组织片的比例。Gleason 模式 4 特别与不一致相关。需要注意的是,grade discordance 与 vPatho 不特别任何一名病理学家的 routine clinical grading 相关。在结论中,我们的研究表明了 AI 可能在开发一个 pathologist 数字 близне的方面具有潜在的用途。这种方法可以帮助揭露 AI 的采用 limitation 和当前的 prostata cancer 生理学pathology 评估系统的限制。

Relational Concept Based Models

  • paper_url: http://arxiv.org/abs/2308.11991
  • repo_url: https://github.com/Aghoreshwar/Awesome-Customer-Analytics
  • paper_authors: Pietro Barbiero, Francesco Giannini, Gabriele Ciravegna, Michelangelo Diligenti, Giuseppe Marra
  • for: Addressing the interpretability gap in relational domains: interpretable concept-based models (CBMs) are not designed for relational problems, while relational models are not as interpretable as CBMs.
  • methods: Proposes Relational Concept-Based Models, a family of relational deep learning methods that provide interpretable task predictions.
  • results: Experiments ranging from image classification to knowledge-graph link prediction show that relational CBMs match the generalization of relational black-boxes, support quantified concept-based explanations, respond effectively to test-time interventions, and withstand out-of-distribution scenarios, limited training data, and scarce concept supervision.
    Abstract The design of interpretable deep learning models working in relational domains poses an open challenge: interpretable deep learning methods, such as Concept-Based Models (CBMs), are not designed to solve relational problems, while relational models are not as interpretable as CBMs. To address this problem, we propose Relational Concept-Based Models, a family of relational deep learning methods providing interpretable task predictions. Our experiments, ranging from image classification to link prediction in knowledge graphs, show that relational CBMs (i) match generalization performance of existing relational black-boxes (as opposed to non-relational CBMs), (ii) support the generation of quantified concept-based explanations, (iii) effectively respond to test-time interventions, and (iv) withstand demanding settings including out-of-distribution scenarios, limited training data regimes, and scarce concept supervisions.
    摘要 可解释深度学习模型在关系领域中的设计是一个开放的挑战:可解释的深度学习方法(如基于概念的模型,CBMs)并非为解决关系问题而设计,而关系模型又不如 CBMs 可解释。为解决这一问题,我们提出了关系概念模型(Relational Concept-Based Models),一类能够提供可解释任务预测的关系深度学习方法。从图像分类到知识图谱链接预测的实验表明,关系 CBMs(i)达到与现有关系黑盒模型相当的泛化性能(不同于非关系 CBMs),(ii)支持生成量化的基于概念的解释,(iii)能有效响应测试时干预,(iv)在分布外场景、有限训练数据和稀缺概念监督等苛刻条件下依然稳健。

Will More Expressive Graph Neural Networks do Better on Generative Tasks?

  • paper_url: http://arxiv.org/abs/2308.11978
  • repo_url: None
  • paper_authors: Xiandong Zou, Xiangyu Zhao, Pietro Liò, Yiren Zhao
  • for: Investigating whether more expressive Graph Neural Networks (GNNs) improve molecular graph generation, by plugging them into two generative frameworks (GCPN and GraphAF).
  • methods: Compares six GNN architectures, including GCN, GraphSAGE, the Graph Isomorphism Network (GIN), and the Graph Attention Network (GAT), within the GCPN and GraphAF frameworks on six molecular generative objectives on the ZINC-250k dataset.
  • results: Advanced GNNs can improve GCPN and GraphAF on molecular generation, but GNN expressiveness is not a necessary condition for a good GNN-based generative model; with advanced GNNs, GCPN and GraphAF also achieve state-of-the-art results against 17 non-GNN-based graph generative approaches (such as variational autoencoders and Bayesian optimization models) on the DRD2, Median1, and Median2 objectives that matter for de-novo molecular design.
    Abstract Graph generation poses a significant challenge as it involves predicting a complete graph with multiple nodes and edges based on simply a given label. This task also carries fundamental importance to numerous real-world applications, including de-novo drug and molecular design. In recent years, several successful methods have emerged in the field of graph generation. However, these approaches suffer from two significant shortcomings: (1) the underlying Graph Neural Network (GNN) architectures used in these methods are often underexplored; and (2) these methods are often evaluated on only a limited number of metrics. To fill this gap, we investigate the expressiveness of GNNs under the context of the molecular graph generation task, by replacing the underlying GNNs of graph generative models with more expressive GNNs. Specifically, we analyse the performance of six GNNs in two different generative frameworks (GCPN and GraphAF), on six different molecular generative objectives on the ZINC-250k dataset. Through our extensive experiments, we demonstrate that advanced GNNs can indeed improve the performance of GCPN and GraphAF on molecular generation tasks, but GNN expressiveness is not a necessary condition for a good GNN-based generative model. Moreover, we show that GCPN and GraphAF with advanced GNNs can achieve state-of-the-art results across 17 other non-GNN-based graph generative approaches, such as variational autoencoders and Bayesian optimisation models, on the proposed molecular generative objectives (DRD2, Median1, Median2), which are important metrics for de-novo molecular design.
    摘要 “图生成具有重要挑战,因为它需要预测一个完整的图像,包括多个节点和边,基于只提供的标签。这个任务对于许多实际应用都具有重要性,如新药和分子设计。在过去几年,图生成领域内出现了许多成功的方法。然而,这些方法受到两大缺点的影响:(1)用于这些方法的基本图神经网络(GNN)架构经常未得到充分的探索;(2)这些方法通常只被评估在有限的约束下。为了填补这个差距,我们在图生成任务中研究GNN的表达能力,通过将基本GNN替换为更表达能力的GNN来进行分析。我们在ZINC-250k数据集上进行了广泛的实验,并证明了高级GNN可以提高GCPN和GraphAF在分子生成任务中的表现,但GNN表达能力不是必要的condition。此外,我们还示出了GCPN和GraphAF与高级GNN的组合可以在17种非GNN基于的图生成方法(如变量自动编码器和搜索优化模型)中实现州际级结果,这些对于新药设计是重要的度量。”

Approximating Score-based Explanation Techniques Using Conformal Regression

  • paper_url: http://arxiv.org/abs/2308.11975
  • repo_url: None
  • paper_authors: Amr Alkhatib, Henrik Boström, Sofiane Ennadir, Ulf Johansson
  • for: Explaining the logic behind black-box models at a lower computational cost than existing score-based explanation techniques.
  • methods: Approximates computationally costly score-based explanation techniques such as SHAP with cheaper regression models wrapped in an inductive conformal prediction framework, and proposes non-conformity measures that account for the difficulty of approximating an explanation while keeping cost low (a sketch follows this entry).
  • results: A large-scale empirical study shows significantly improved execution time compared with the fast TreeSHAP variant of SHAP, tight prediction intervals with validity guarantees, and the ability to compare approximation methods by how informative (tight) their predicted intervals are.
    Abstract Score-based explainable machine-learning techniques are often used to understand the logic behind black-box models. However, such explanation techniques are often computationally expensive, which limits their application in time-critical contexts. Therefore, we propose and investigate the use of computationally less costly regression models for approximating the output of score-based explanation techniques, such as SHAP. Moreover, validity guarantees for the approximated values are provided by the employed inductive conformal prediction framework. We propose several non-conformity measures designed to take the difficulty of approximating the explanations into account while keeping the computational cost low. We present results from a large-scale empirical investigation, in which the approximate explanations generated by our proposed models are evaluated with respect to efficiency (interval size). The results indicate that the proposed method can significantly improve execution time compared to the fast version of SHAP, TreeSHAP. The results also suggest that the proposed method can produce tight intervals, while providing validity guarantees. Moreover, the proposed approach allows for comparing explanations of different approximation methods and selecting a method based on how informative (tight) are the predicted intervals.
    摘要 黑obox模型的解释技术 oftentimes 使用分数基因 explainable machine-learning 技术。然而,这些解释技术通常 computationally expensive,这限制了它们在时间敏感上下文中的应用。因此,我们提出并 investigate 使用 computationally less costly 回归模型来近似 score-based explanation techniques, such as SHAP。此外,我们提供了雇佣 inductive conformal prediction 框架来提供有效性保证。我们还提出了一些非准确度度量,用于考虑近似解释的困难性,同时保持计算成本低。我们在大规模的实验中发现,我们提posed方法可以在执行时间方面取得显著改进,比如 TreeSHAP 的快速版本。此外,我们的结果还表明,我们的方法可以生成紧凑的间隔,同时提供有效性保证。此外,我们的方法允许比较不同的近似方法的解释,并选择一个基于解释 intervals 的紧凑程度(tightness)。
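    A sketch of the overall recipe: approximate per-feature SHAP values with cheap regressors and wrap them in split (inductive) conformal prediction. The regressor choice and the plain absolute-residual non-conformity measure below are assumptions; the paper proposes difficulty-aware non-conformity measures instead:

      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      def fit_conformal_shap_approx(X_train, shap_train, X_cal, shap_cal, alpha=0.1):
          # One cheap regressor per feature's SHAP value, calibrated with split
          # (inductive) conformal prediction so each prediction comes with an
          # interval that covers the true SHAP value with probability 1 - alpha.
          models, q = [], []
          for j in range(shap_train.shape[1]):
              m = RandomForestRegressor(n_estimators=100, n_jobs=-1)
              m.fit(X_train, shap_train[:, j])
              residuals = np.sort(np.abs(shap_cal[:, j] - m.predict(X_cal)))
              k = int(np.ceil((1 - alpha) * (len(residuals) + 1))) - 1
              q.append(residuals[min(k, len(residuals) - 1)])
              models.append(m)
          return models, np.array(q)

      def explain_with_intervals(models, q, X):
          point = np.column_stack([m.predict(X) for m in models])
          return point, point - q, point + q     # approximation and interval bounds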

Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields

  • paper_url: http://arxiv.org/abs/2308.11974
  • repo_url: None
  • paper_authors: Hyeonseop Song, Seokhun Choi, Hoseok Do, Chul Lee, Taehyeong Kim
  • for: Proposing a NeRF-based model for text-driven localized editing of 3D objects, so that modifications specified by a text prompt are applied only to the intended local region.
  • methods: The model combines two NeRF networks, a pretrained NeRF and an editable NeRF, with novel blending operations; a pretrained vision-language model (CLIP) guides Blending-NeRF to add new objects with varying colors and densities, modify textures, and remove parts of the original object.
  • results: Extensive experiments show that Blending-NeRF produces naturally and locally edited 3D objects from a variety of text prompts.
    Abstract Text-driven localized editing of 3D objects is particularly difficult as locally mixing the original 3D object with the intended new object and style effects without distorting the object's form is not a straightforward process. To address this issue, we propose a novel NeRF-based model, Blending-NeRF, which consists of two NeRF networks: pretrained NeRF and editable NeRF. Additionally, we introduce new blending operations that allow Blending-NeRF to properly edit target regions which are localized by text. By using a pretrained vision-language aligned model, CLIP, we guide Blending-NeRF to add new objects with varying colors and densities, modify textures, and remove parts of the original object. Our extensive experiments demonstrate that Blending-NeRF produces naturally and locally edited 3D objects from various text prompts.

Value of Assistance for Mobile Agents

  • paper_url: http://arxiv.org/abs/2308.11961
  • repo_url: https://github.com/clair-lab-technion/voa
  • paper_authors: Adi Amuzig, David Dovrat, Sarah Keren
  • for: Reducing localization uncertainty for mobile robotic agents by deciding which assistive actions to provide, when to provide them, and which agent in a team to help.
  • methods: Proposes Value of Assistance (VOA), the expected cost reduction that assistance will yield at a given point of execution, computed from estimates of the robot's future uncertainty modeled as a Gaussian process.
  • results: The VOA measures are validated in both simulated and real-world robotic settings, where they predict the agent's average cost reduction when receiving assistance.
    Abstract Mobile robotic agents often suffer from localization uncertainty which grows with time and with the agents' movement. This can hinder their ability to accomplish their task. In some settings, it may be possible to perform assistive actions that reduce uncertainty about a robot's location. For example, in a collaborative multi-robot system, a wheeled robot can request assistance from a drone that can fly to its estimated location and reveal its exact location on the map or accompany it to its intended location. Since assistance may be costly and limited, and may be requested by different members of a team, there is a need for principled ways to support the decision of which assistance to provide to an agent and when, as well as to decide which agent to help within a team. For this purpose, we propose Value of Assistance (VOA) to represent the expected cost reduction that assistance will yield at a given point of execution. We offer ways to compute VOA based on estimations of the robot's future uncertainty, modeled as a Gaussian process. We specify conditions under which our VOA measures are valid and empirically demonstrate the ability of our measures to predict the agent's average cost reduction when receiving assistance in both simulated and real-world robotic settings.

Physics informed Neural Networks applied to the description of wave-particle resonance in kinetic simulations of fusion plasmas

  • paper_url: http://arxiv.org/abs/2308.12312
  • repo_url: None
  • paper_authors: Jai Kumar, David Zarzoso, Virginie Grandgirard, Jan Ebert, Stefan Kesselheim
  • for: Using the reduced (1D1V) Vlasov-Poisson system as a test bed for applying Physics-Informed Neural Networks (PINNs) to wave-particle resonance, through the Landau damping and bump-on-tail instability examples.
  • methods: First tests PINNs as a compression method for solutions of the Vlasov-Poisson system, compared against standard neural networks; then applies PINNs to solving the system with special emphasis on the integral part, motivating an Integrable PINN (I-PINN) variant based on automatic differentiation for the partial differential equation and automatic integration for the integral equation (a residual sketch follows this entry).
  • results: The results indicate that PINNs can represent and solve the Vlasov-Poisson problems considered, and that the I-PINN variant handles the integral term more effectively.
    Abstract The Vlasov-Poisson system is employed in its reduced form version (1D1V) as a test bed for the applicability of Physics Informed Neural Network (PINN) to the wave-particle resonance. Two examples are explored: the Landau damping and the bump-on-tail instability. PINN is first tested as a compression method for the solution of the Vlasov-Poisson system and compared to the standard neural networks. Second, the application of PINN to solving the Vlasov-Poisson system is also presented with the special emphasis on the integral part, which motivates the implementation of a PINN variant, called Integrable PINN (I-PINN), based on the automatic-differentiation to solve the partial differential equation and on the automatic-integration to solve the integral equation.
    摘要 本文采用简化形式(1D1V)的 Vlasov-Poisson 系统作为测试平台,检验物理信息神经网络(PINN)在波-粒共振问题上的适用性,探讨了朗道阻尼和尾隆不稳定性两个算例。首先,将 PINN 作为 Vlasov-Poisson 系统解的压缩方法,并与标准神经网络进行比较;其次,将 PINN 应用于求解 Vlasov-Poisson 系统,并特别强调其中的积分部分,由此提出一种 PINN 变体,称为可积分 PINN(I-PINN),它基于自动微分求解偏微分方程,并基于自动积分求解积分方程。
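    A sketch of a PINN residual for the 1D1V Vlasov equation using automatic differentiation; units and signs are normalized, the acceleration field is assumed to come from a second network, and the automatic-integration (Poisson) coupling of the I-PINN variant is omitted:

      import torch

      def vlasov_residual(f_net, a_net, x, v, t):
          # Residual of df/dt + v * df/dx + a(x,t) * df/dv = 0, evaluated at
          # collocation points and minimized in least squares during training.
          # f_net(x, v, t) predicts the distribution function; a_net(x, t) the
          # acceleration (coming from the Poisson equation in the full model).
          x, v, t = x.requires_grad_(True), v.requires_grad_(True), t.requires_grad_(True)
          f = f_net(torch.stack([x, v, t], dim=-1)).squeeze(-1)
          ones = torch.ones_like(f)
          df_dx = torch.autograd.grad(f, x, grad_outputs=ones, create_graph=True)[0]
          df_dv = torch.autograd.grad(f, v, grad_outputs=ones, create_graph=True)[0]
          df_dt = torch.autograd.grad(f, t, grad_outputs=ones, create_graph=True)[0]
          a = a_net(torch.stack([x, t], dim=-1)).squeeze(-1)
          return df_dt + v * df_dx + a * df_dv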

Maintaining Plasticity via Regenerative Regularization

  • paper_url: http://arxiv.org/abs/2308.11958
  • repo_url: None
  • paper_authors: Saurabh Kumar, Henrik Marklund, Benjamin Van Roy
  • for: Maintaining plasticity, the ability to adapt quickly to new information, which neural networks lose when processing non-stationary data streams.
  • methods: Proposes L2 Init, which adds to the loss an L2 regularization term toward the initial parameter values, requiring only a single hyper-parameter (a sketch follows this entry).
  • results: On simple problems representative of different types of non-stationarity in continual learning, L2 Init consistently mitigates plasticity loss, reduces parameter magnitudes, and maintains a high effective feature rank.
    Abstract In continual learning, plasticity refers to the ability of an agent to quickly adapt to new information. Neural networks are known to lose plasticity when processing non-stationary data streams. In this paper, we propose L2 Init, a very simple approach for maintaining plasticity by incorporating in the loss function L2 regularization toward initial parameters. This is very similar to standard L2 regularization (L2), the only difference being that L2 regularizes toward the origin. L2 Init is simple to implement and requires selecting only a single hyper-parameter. The motivation for this method is the same as that of methods that reset neurons or parameter values. Intuitively, when recent losses are insensitive to particular parameters, these parameters drift toward their initial values. This prepares parameters to adapt quickly to new tasks. On simple problems representative of different types of nonstationarity in continual learning, we demonstrate that L2 Init consistently mitigates plasticity loss. We additionally find that our regularization term reduces parameter magnitudes and maintains a high effective feature rank.
    摘要 在持续学习中,可塑性指智能体快速适应新信息的能力。神经网络在处理非平稳数据流时会逐渐丧失可塑性。本文提出 L2 Init,一种非常简单的方法:在损失函数中加入朝向初始参数的 L2 正则项。它与标准 L2 正则化几乎相同,区别仅在于标准 L2 正则化朝向原点。L2 Init 实现简单,只需选择一个超参数。其动机与重置神经元或参数值的方法相同:当近期损失对某些参数不敏感时,这些参数会漂回初始值,从而为快速适应新任务做好准备。在代表持续学习中不同类型非平稳性的简单问题上,L2 Init 能持续缓解可塑性丧失,同时减小参数幅值并保持较高的有效特征秩。
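    The method itself fits in a few lines; a sketch of the penalty and how it would be added to the task loss (variable names are illustrative):

      import torch

      def l2_init_penalty(model, init_params, lam=1e-3):
          # lam * ||theta - theta_0||^2: L2 regularization toward the *initial*
          # parameters rather than toward the origin.
          return lam * sum(((p - p0) ** 2).sum()
                           for p, p0 in zip(model.parameters(), init_params))

      # Usage: snapshot the initial parameters once, then add the penalty each step.
      # init_params = [p.detach().clone() for p in model.parameters()]
      # loss = task_loss + l2_init_penalty(model, init_params)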

When MiniBatch SGD Meets SplitFed Learning:Convergence Analysis and Performance Evaluation

  • paper_url: http://arxiv.org/abs/2308.11953
  • repo_url: None
  • paper_authors: Chao Huang, Geng Tian, Ming Tang
  • for: This paper proposes MiniBatch-SFL, a new distributed learning method that mitigates the client-drift problem arising in split federated learning.
  • methods: It incorporates MiniBatch SGD into SplitFed learning (SFL): the model is split at a cut layer so that clients train only the client-side part in an FL fashion while the server trains the server-side part in a MiniBatch-SGD fashion, reducing client-side computation.
  • results: MiniBatch-SFL improves accuracy over conventional SFL and FL, especially on highly non-IID data, with gains of up to 24.1% and 17.1%, respectively.
    Abstract Federated learning (FL) enables collaborative model training across distributed clients (e.g., edge devices) without sharing raw data. Yet, FL can be computationally expensive as the clients need to train the entire model multiple times. SplitFed learning (SFL) is a recent distributed approach that alleviates computation workload at the client device by splitting the model at a cut layer into two parts, where clients only need to train part of the model. However, SFL still suffers from the \textit{client drift} problem when clients' data are highly non-IID. To address this issue, we propose MiniBatch-SFL. This algorithm incorporates MiniBatch SGD into SFL, where the clients train the client-side model in an FL fashion while the server trains the server-side model similar to MiniBatch SGD. We analyze the convergence of MiniBatch-SFL and show that the bound of the expected loss can be obtained by analyzing the expected server-side and client-side model updates, respectively. The server-side updates do not depend on the non-IID degree of the clients' datasets and can potentially mitigate client drift. However, the client-side model relies on the non-IID degree and can be optimized by properly choosing the cut layer. Perhaps counter-intuitive, our empirical result shows that a latter position of the cut layer leads to a smaller average gradient divergence and a better algorithm performance. Moreover, numerical results show that MiniBatch-SFL achieves higher accuracy than conventional SFL and FL. The accuracy improvement can be up to 24.1\% and 17.1\% with highly non-IID data, respectively.
    摘要 分布式学习(FL)可以在分布式客户端(例如边缘设备)上进行模型训练,而不需要将原始数据共享。然而,FL可能会很 computationally expensive,因为客户端需要训练整个模型多次。SplitFed learning(SFL)是一种最近的分布式方法,它可以减轻客户端设备上的计算工作负担,通过在一层截分模型两部分,其中客户端只需要训练模型的一部分。然而,SFL仍然会遭受客户端数据高度异步的问题,称为“客户端漂移”问题。为解决这个问题,我们提出了MiniBatch-SFL。这个算法将MiniBatch SGD integrate到SFL中,客户端在FL的方式上训练客户端模型,服务器则在MiniBatch SGD的方式上训练服务器模型。我们分析MiniBatch-SFL的整合和融合,并证明了预期的损失下界可以通过分析服务器和客户端模型更新的预期值来获得。服务器端的更新不виси于客户端数据的异步度,可能减轻客户端漂移问题。然而,客户端模型取决于异步度,可以通过合适地选择截分层来优化。奇怪的是,我们的实验结果表明,将截分层放在后者位置可以减少平均梯度差异和提高算法性能。此外,我们的数值结果表明,MiniBatch-SFL可以在异步数据上达到高度的准确率,比 conventinal SFL和FL高达24.1%和17.1%。

Pose Modulated Avatars from Video

  • paper_url: http://arxiv.org/abs/2308.11951
  • repo_url: None
  • paper_authors: Chunjin Song, Bastian Wandt, Helge Rhodin
  • for: Reconstructing dynamic human motion and shape from video while modeling the deformation of cloth and skin relative to the skeleton pose.
  • methods: Uses a Neural Radiance Field (NeRF) driven by an underlying skeleton, plus a two-branch network that is adaptive and explicit in the frequency domain: a graph neural network models local correlations among body parts from the skeleton pose, and a second branch maps these correlation features to global frequencies that modulate the feature encoding.
  • results: The method outperforms state-of-the-art approaches in preserving details and in generalization capability.
    Abstract It is now possible to reconstruct dynamic human motion and shape from a sparse set of cameras using Neural Radiance Fields (NeRF) driven by an underlying skeleton. However, a challenge remains to model the deformation of cloth and skin in relation to skeleton pose. Unlike existing avatar models that are learned implicitly or rely on a proxy surface, our approach is motivated by the observation that different poses necessitate unique frequency assignments. Neglecting this distinction yields noisy artifacts in smooth areas or blurs fine-grained texture and shape details in sharp regions. We develop a two-branch neural network that is adaptive and explicit in the frequency domain. The first branch is a graph neural network that models correlations among body parts locally, taking skeleton pose as input. The second branch combines these correlation features to a set of global frequencies and then modulates the feature encoding. Our experiments demonstrate that our network outperforms state-of-the-art methods in terms of preserving details and generalization capabilities.
    摘要 现在可以使用神经辐射场(NeRF)和下面的骨架来重建动态人体运动和形状。然而,模拟人体皮肤和衣服的塑形仍然是一个挑战。现有的人物模型通常是通过隐藏的方式学习或者通过代理表面来实现。我们的方法受到不同姿势需要唯一频谱分配的观察所启发。忽略这种分配会导致缺陷的纹理和形状细节。我们开发了一个两极分支神经网络,其中第一极是一个图像神经网络,地方地模型体部之间的相关性,带入骨架姿势作为输入。第二极将这些相关特征与一组全局频率相结合,然后修饰特征编码。我们的实验表明,我们的网络在保持细节和泛化能力方面超越了现有的方法。

High-quality Image Dehazing with Diffusion Model

  • paper_url: http://arxiv.org/abs/2308.11949
  • repo_url: None
  • paper_authors: Hu Yu, Jie Huang, Kaiwen Zheng, Man Zhou, Feng Zhao
  • for: Image dehazing in dense-haze scenarios, where little of the original information remains in the hazy image.
  • methods: Proposes DehazeDDPM, a physics-aware framework based on the Denoising Diffusion Probabilistic Model (DDPM): a first stage physically models the dehazing task with the Atmospheric Scattering Model (ASM) to pull the distribution toward clear data, and a second stage uses the generative capacity of DDPM to recover the haze-induced information loss (the ASM is sketched after this entry).
  • results: The method attains state-of-the-art performance on both synthetic and real-world hazy datasets.
    Abstract Image dehazing is quite challenging in dense-haze scenarios, where quite less original information remains in the hazy image. Though previous methods have made marvelous progress, they still suffer from information loss in content and color in dense-haze scenarios. The recently emerged Denoising Diffusion Probabilistic Model (DDPM) exhibits strong generation ability, showing potential for solving this problem. However, DDPM fails to consider the physics property of dehazing task, limiting its information completion capacity. In this work, we propose DehazeDDPM: A DDPM-based and physics-aware image dehazing framework that applies to complex hazy scenarios. Specifically, DehazeDDPM works in two stages. The former stage physically models the dehazing task with the Atmospheric Scattering Model (ASM), pulling the distribution closer to the clear data and endowing DehazeDDPM with fog-aware ability. The latter stage exploits the strong generation ability of DDPM to compensate for the haze-induced huge information loss, by working in conjunction with the physical modelling. Extensive experiments demonstrate that our method attains state-of-the-art performance on both synthetic and real-world hazy datasets.
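    For reference, the Atmospheric Scattering Model used in the first stage can be written in a few lines; the sketch below shows the forward hazing model and its inversion, with tensor shapes and the clamping threshold as assumptions (it is not the DehazeDDPM training code):

      import torch

      def atmospheric_scattering(J, depth, A, beta=1.0):
          # I = J * t + A * (1 - t),  t = exp(-beta * depth)
          # J: clear image, A: global atmospheric light, t: transmission map.
          t = torch.exp(-beta * depth)
          return J * t + A * (1.0 - t)

      def invert_asm(I, t, A, t_min=0.1):
          # Rough clear-image estimate given transmission and airlight.
          return (I - A) / torch.clamp(t, min=t_min) + A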

LongDanceDiff: Long-term Dance Generation with Conditional Diffusion Model

  • paper_url: http://arxiv.org/abs/2308.11945
  • repo_url: None
  • paper_authors: Siqi Yang, Zejun Yang, Zhisheng Wang
  • for: Addressing the freezing problem in long-term 3D realistic dance generation synchronized with music, improving temporal coherency and naturalness.
  • methods: Uses a conditional diffusion model, LongDanceDiff, whose input concatenates the music, past motions, and noised future motions; a mutual-information-minimization objective regularizes the dependency between past and future motions, and a Global-Trajectory Modulation (GTM) layer plus motion perceptual losses address foot sliding and unsmooth motion.
  • results: The approach significantly outperforms existing state-of-the-art methods; code and models are planned for release.
    Abstract Dancing with music is always an essential human art form to express emotion. Due to the high temporal-spacial complexity, long-term 3D realist dance generation synchronized with music is challenging. Existing methods suffer from the freezing problem when generating long-term dances due to error accumulation and training-inference discrepancy. To address this, we design a conditional diffusion model, LongDanceDiff, for this sequence-to-sequence long-term dance generation, addressing the challenges of temporal coherency and spatial constraint. LongDanceDiff contains a transformer-based diffusion model, where the input is a concatenation of music, past motions, and noised future motions. This partial noising strategy leverages the full-attention mechanism and learns the dependencies among music and past motions. To enhance the diversity of generated dance motions and mitigate the freezing problem, we introduce a mutual information minimization objective that regularizes the dependency between past and future motions. We also address common visual quality issues in dance generation, such as foot sliding and unsmooth motion, by incorporating spatial constraints through a Global-Trajectory Modulation (GTM) layer and motion perceptual losses, thereby improving the smoothness and naturalness of motion generation. Extensive experiments demonstrate a significant improvement in our approach over the existing state-of-the-art methods. We plan to release our codes and models soon.
    摘要 人类常用舞蹈作为表达情感的重要艺术形式。由于高度时空复杂性,长期3D真实舞蹈生成同音乐同步是一项挑战。现有方法受到预测-实际差异和错误积累的问题。为解决这问题,我们设计了一种 conditional diffusion 模型,长 dance diff(LongDanceDiff),用于这种序列到序列长期舞蹈生成任务,解决时间准确性和空间约束的挑战。LongDanceDiff 包括一个基于 transformer 的扩散模型,输入是音乐、过去动作和噪音未来动作的 concatenation。这种 partial noising 策略利用了全程注意机制,学习音乐和过去动作之间的依赖关系。为提高生成舞蹈动作的多样性和减少冻结问题,我们引入了一个 mutual information minimization 目标,规范过去和未来动作之间的依赖关系。我们还通过 incorporating 全球轨迹修饰(GTM)层和运动观察损失,提高生成动作的平滑性和自然性。广泛的实验表明我们的方法在现有状态的方法上显著提高了性能。我们计划 soon 发布我们的代码和模型。

RamseyRL: A Framework for Intelligent Ramsey Number Counterexample Searching

  • paper_url: http://arxiv.org/abs/2308.11943
  • repo_url: None
  • paper_authors: Steve Vott, Adam M. Lehavi
  • for: Exploring best-first search and reinforcement learning (RL) techniques for finding counterexamples to specific Ramsey numbers.
  • methods: Improves over prior approaches such as random search by introducing a graph vectorization and a deep neural network (DNN)-based heuristic that gauges the likelihood of a graph being a counterexample, together with algorithmic optimizations that confine the search to polynomial runtime (the counterexample check is sketched after this entry).
  • results: The paper does not present new counterexamples but introduces and evaluates a framework supporting Ramsey counterexample exploration with other heuristics; code and methods are released as a PyPI package and a GitHub repository.
    Abstract The Ramsey number is the minimum number of nodes, $n = R(s, t)$, such that all undirected simple graphs of order $n$, contain a clique of order $s$, or an independent set of order $t$. This paper explores the application of a best first search algorithm and reinforcement learning (RL) techniques to find counterexamples to specific Ramsey numbers. We incrementally improve over prior search methods such as random search by introducing a graph vectorization and deep neural network (DNN)-based heuristic, which gauge the likelihood of a graph being a counterexample. The paper also proposes algorithmic optimizations to confine a polynomial search runtime. This paper does not aim to present new counterexamples but rather introduces and evaluates a framework supporting Ramsey counterexample exploration using other heuristics. Code and methods are made available through a PyPI package and GitHub repository.
    摘要 “拉姆齐数”是最小的节点数量,$n = R(s, t)$, 使得所有无向简单图的顺序为$n$,必然包含一个 clique 的顺序为$s$,或一个独立集的顺序为$t$。这篇论文探索使用最佳先搜索算法和强化学习(RL)技术来找到特定拉姆齐数的反例。我们通过引入图像化和深度神经网络(DNN)基于的优化来提高先前的搜索方法,如随机搜索。 paper 还提出了算法优化,以确保搜索时间 polynomial。这篇论文不是想要发现新的反例,而是介绍和评估一个支持拉姆齐反例探索的框架,使用其他规则。代码和方法通过 PyPI 包和 GitHub 存储库提供。
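    A sketch of the exact counterexample check that any such search must perform: a graph on n nodes witnesses R(s, t) > n precisely when it contains neither a clique of order s nor an independent set of order t. The search framework in the paper guides which graphs to test, which this sketch does not cover:

      import networkx as nx

      def clique_number(G: nx.Graph) -> int:
          # Order of the largest clique, via enumeration of maximal cliques.
          return max((len(c) for c in nx.find_cliques(G)), default=0)

      def is_ramsey_counterexample(G: nx.Graph, s: int, t: int) -> bool:
          # No clique of order s and no independent set of order t
          # (equivalently, no clique of order t in the complement graph).
          return clique_number(G) < s and clique_number(nx.complement(G)) < t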

Retail Demand Forecasting: A Comparative Study for Multivariate Time Series

  • paper_url: http://arxiv.org/abs/2308.11939
  • repo_url: None
  • paper_authors: Md Sabbirul Haque, Md Shahedul Amin, Jonayet Miah
  • for: Accurate retail demand forecasting is a key determinant of financial performance and supply chain efficiency; as global markets become increasingly interconnected, businesses are turning to more advanced prediction models to gain a competitive edge.
  • methods: The study enriches time series data of customer demand with macroeconomic variables such as the Consumer Price Index (CPI), the Index of Consumer Sentiment (ICS), and unemployment rates, and applies various regression and machine learning models to forecast retail demand accurately.
  • results: Enriching the demand time series with macroeconomic variables improves forecasting accuracy; the regression and machine learning models perform differently, with the machine learning models performing better overall.
    Abstract Accurate demand forecasting in the retail industry is a critical determinant of financial performance and supply chain efficiency. As global markets become increasingly interconnected, businesses are turning towards advanced prediction models to gain a competitive edge. However, existing literature mostly focuses on historical sales data and ignores the vital influence of macroeconomic conditions on consumer spending behavior. In this study, we bridge this gap by enriching time series data of customer demand with macroeconomic variables, such as the Consumer Price Index (CPI), Index of Consumer Sentiment (ICS), and unemployment rates. Leveraging this comprehensive dataset, we develop and compare various regression and machine learning models to predict retail demand accurately.
    摘要 精准的预测是商业领域中的一个关键因素,对于财务性能和供应链效率都是决定性的。随着全球市场变得越来越联系,企业们正在转向更进步的预测模型,以获得竞争优势。然而,现有的文献主要集中在历史销售数据上,忽略了消费者支出行为中的重要影响因素。在这项研究中,我们将把历史销售数据丰富化,加入 macroeconomic 变量,例如消费者物价指数 (CPI)、消费者信心指数 (ICS) 和失业率。利用这个完整的数据集,我们将开发和比较不同的回归和机器学习模型,以精准预测零售需求。
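A minimal sketch of the kind of comparison the study describes: a demand series enriched with macroeconomic columns (CPI, ICS, unemployment rate) and fitted with a linear baseline and a machine learning regressor. The synthetic data, column names, and particular models are illustrative assumptions, not the paper's dataset or model list.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 300
# Synthetic stand-in for weekly demand enriched with macro variables.
df = pd.DataFrame({
    "lag_demand": rng.normal(100, 10, n),      # previous-period sales
    "cpi": rng.normal(300, 5, n),              # Consumer Price Index
    "ics": rng.normal(70, 8, n),               # Index of Consumer Sentiment
    "unemployment": rng.normal(4.0, 0.5, n),   # unemployment rate (%)
})
df["demand"] = (0.8 * df["lag_demand"] + 0.5 * df["ics"]
                - 3.0 * df["unemployment"] + rng.normal(0, 5, n))

X, y = df.drop(columns="demand"), df["demand"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, shuffle=False, test_size=0.2)

for name, model in [("linear regression", LinearRegression()),
                    ("gradient boosting", GradientBoostingRegressor(random_state=0))]:
    model.fit(X_tr, y_tr)
    err = mean_absolute_percentage_error(y_te, model.predict(X_te))
    print(f"{name}: MAPE = {err:.3f}")
```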

Learning Bottleneck Transformer for Event Image-Voxel Feature Fusion based Classification

  • paper_url: http://arxiv.org/abs/2308.11937
  • repo_url: https://github.com/event-ahu/efv_event_classification
  • paper_authors: Chengguo Yuan, Yu Jin, Zongzhen Wu, Fanting Wei, Yangzirui Wang, Lan Chen, Xiao Wang
  • for: The paper proposes a novel dual-stream framework for event representation, extraction, and fusion, addressing the limitations of existing methods, namely monotonous modal expressions and network structure design.
  • methods: Transformer and structured Graph Neural Network (GNN) architectures jointly learn event-image and event-voxel information, with a bottleneck Transformer used to fuse the two streams.
  • results: Extensive experiments show that the proposed framework achieves state-of-the-art performance on two widely used event-based classification datasets. Code is available at \url{https://github.com/Event-AHU/EFV_event_classification}.
    Abstract Recognizing target objects using an event-based camera draws more and more attention in recent years. Existing works usually represent the event streams into point-cloud, voxel, image, etc, and learn the feature representations using various deep neural networks. Their final results may be limited by the following factors: monotonous modal expressions and the design of the network structure. To address the aforementioned challenges, this paper proposes a novel dual-stream framework for event representation, extraction, and fusion. This framework simultaneously models two common representations: event images and event voxels. By utilizing Transformer and Structured Graph Neural Network (GNN) architectures, spatial information and three-dimensional stereo information can be learned separately. Additionally, a bottleneck Transformer is introduced to facilitate the fusion of the dual-stream information. Extensive experiments demonstrate that our proposed framework achieves state-of-the-art performance on two widely used event-based classification datasets. The source code of this work is available at: \url{https://github.com/Event-AHU/EFV_event_classification}

Diverse Policies Converge in Reward-free Markov Decision Processes

  • paper_url: http://arxiv.org/abs/2308.11924
  • repo_url: https://github.com/openrl-lab/diversepolicies
  • paper_authors: Fanqi Lin, Shiyu Huang, Weiwei Tu
  • for: The paper provides a unified diversity reinforcement learning framework and investigates how training diverse policies converges and how efficient it is.
  • methods: The paper proposes a provably efficient diversity reinforcement learning algorithm and verifies its effectiveness through numerical experiments.
  • results: Numerical experiments show that training diverse policies converges efficiently to optimal policies while improving policy diversity and robustness.
    Abstract Reinforcement learning has achieved great success in many decision-making tasks, and traditional reinforcement learning algorithms are mainly designed for obtaining a single optimal solution. However, recent works show the importance of developing diverse policies, which makes it an emerging research topic. Despite the variety of diversity reinforcement learning algorithms that have emerged, none of them theoretically answer the question of how the algorithm converges and how efficient the algorithm is. In this paper, we provide a unified diversity reinforcement learning framework and investigate the convergence of training diverse policies. Under such a framework, we also propose a provably efficient diversity reinforcement learning algorithm. Finally, we verify the effectiveness of our method through numerical experiments.
    摘要 强化学习在很多决策任务中取得了很大成功,但传统的强化学习算法主要是为了获得单一的最优解。然而,近期研究表明了发展多样化策略的重要性,使其成为一个新兴的研究话题。虽然已经出现了多种多样性强化学习算法,但没有任何一个从理论上回答算法如何收敛以及效率如何的问题。在这篇论文中,我们提出了一个统一的多样性强化学习框架,并研究了训练多样化策略的收敛性。基于这一框架,我们还提出了可证明高效的多样性强化学习算法。最后,我们通过数值实验验证了我们方法的有效性。

Concept Bottleneck with Visual Concept Filtering for Explainable Medical Image Classification

  • paper_url: http://arxiv.org/abs/2308.11920
  • repo_url: None
  • paper_authors: Injae Kim, Jongha Kim, Joonmyung Choi, Hyunwoo J. Kim
  • for: Interpretability is a crucial factor for building reliable models in medical applications; Concept Bottleneck Models (CBMs) enable interpretable image classification by using human-understandable concepts as intermediate targets.
  • methods: Existing methods that generate concepts with Large Language Models (LLMs) do not consider whether a concept is visually relevant, which matters for computing meaningful concept scores; the paper therefore proposes a visual activation score that measures whether a concept contains visual cues and can be computed from unlabeled image data.
  • results: Experiments show that filtering concepts with the proposed visual activation score consistently boosts performance over the baseline, and qualitative analyses confirm that visually relevant concepts are selected.
    Abstract Interpretability is a crucial factor in building reliable models for various medical applications. Concept Bottleneck Models (CBMs) enable interpretable image classification by utilizing human-understandable concepts as intermediate targets. Unlike conventional methods that require extensive human labor to construct the concept set, recent works leveraging Large Language Models (LLMs) for generating concepts made automatic concept generation possible. However, those methods do not consider whether a concept is visually relevant or not, which is an important factor in computing meaningful concept scores. Therefore, we propose a visual activation score that measures whether the concept contains visual cues or not, which can be easily computed with unlabeled image data. Computed visual activation scores are then used to filter out the less visible concepts, thus resulting in a final concept set with visually meaningful concepts. Our experimental results show that adopting the proposed visual activation score for concept filtering consistently boosts performance compared to the baseline. Moreover, qualitative analyses also validate that visually relevant concepts are successfully selected with the visual activation score.
    摘要 “可读性”是医疗应用中建立可靠模型的重要因素。概念瓶颈模型(CBM)可以实现可读性检查,通过使用人类可理解的概念作为中间目标。与传统方法不同的是,这些方法不需要大量的人工劳动来建立概念集。最近的工作则是利用大型自然语言模型(LLM)生成概念,并使用这些概念来生成可读性检查。但是,这些方法并不考虑概念是否具有视觉相关性,这是 Computing meaningful concept scores 中的重要因素。因此,我们提出了视觉活动 scores,它可以衡量概念是否包含视觉提示,并且可以轻松地使用无标注图像资料来计算。我们的实验结果显示,运用我们提出的视觉活动 scores 进行概念筛选可以与基准相比,实现更高的性能。此外,实验分析也显示,这些视觉相关的概念被成功选择。

LFS-GAN: Lifelong Few-Shot Image Generation

  • paper_url: http://arxiv.org/abs/2308.11917
  • repo_url: https://github.com/jjuon/lfs-gan
  • paper_authors: Juwon Seo, Ji-Su Kang, Gyeong-Moon Park
  • For: The paper addresses the challenging task of lifelong few-shot image generation, where a generative model learns a sequence of tasks using only a few samples per task, and prevents catastrophic forgetting and overfitting.* Methods: The proposed framework, called Lifelong Few-Shot GAN (LFS-GAN), uses an efficient task-specific modulator called Learnable Factorized Tensor (LeFT) to learn each task, and a novel mode seeking loss to improve diversity in low-data circumstances.* Results: The proposed LFS-GAN can generate high-quality and diverse images in various domains without any forgetting and mode collapse, achieving state-of-the-art in lifelong few-shot image generation task, and even outperforming existing few-shot GANs in the few-shot image generation task.
    Abstract We address a challenging lifelong few-shot image generation task for the first time. In this situation, a generative model learns a sequence of tasks using only a few samples per task. Consequently, the learned model encounters both catastrophic forgetting and overfitting problems at a time. Existing studies on lifelong GANs have proposed modulation-based methods to prevent catastrophic forgetting. However, they require considerable additional parameters and cannot generate high-fidelity and diverse images from limited data. On the other hand, the existing few-shot GANs suffer from severe catastrophic forgetting when learning multiple tasks. To alleviate these issues, we propose a framework called Lifelong Few-Shot GAN (LFS-GAN) that can generate high-quality and diverse images in lifelong few-shot image generation task. Our proposed framework learns each task using an efficient task-specific modulator - Learnable Factorized Tensor (LeFT). LeFT is rank-constrained and has a rich representation ability due to its unique reconstruction technique. Furthermore, we propose a novel mode seeking loss to improve the diversity of our model in low-data circumstances. Extensive experiments demonstrate that the proposed LFS-GAN can generate high-fidelity and diverse images without any forgetting and mode collapse in various domains, achieving state-of-the-art in lifelong few-shot image generation task. Surprisingly, we find that our LFS-GAN even outperforms the existing few-shot GANs in the few-shot image generation task. The code is available at Github.

Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs

  • paper_url: http://arxiv.org/abs/2308.11914
  • repo_url: None
  • paper_authors: Ziyi Tang, Ruilin Wang, Weixing Chen, Keze Wang, Yang Liu, Tianshui Chen, Liang Lin
  • for: Improving the faithfulness and causality of knowledge-based reasoning.
  • methods: A multi-agent collaborative reasoning-and-consensus framework (reasoner agents plus a causal evaluator agent).
  • results: On a variety of knowledge reasoning tasks (e.g., science question answering and commonsense reasoning), the framework outperforms all compared methods by large margins.
    Abstract Despite advancements in LLMs, knowledge-based reasoning remains a longstanding issue due to the fragility of knowledge recall and inference. Existing methods primarily encourage LLMs to autonomously plan and solve problems or to extensively sample reasoning chains without addressing the conceptual and inferential fallacies. Attempting to alleviate inferential fallacies and drawing inspiration from multi-agent collaboration, we present a framework to increase faithfulness and causality for knowledge-based reasoning. Specifically, we propose to employ multiple intelligent agents (i.e., reasoner and causal evaluator) to work collaboratively in a reasoning-and-consensus paradigm for elevated reasoning faithfulness. The reasoners focus on providing solutions with human-like causality to solve open-domain problems. On the other hand, the causal evaluator agent scrutinizes if the answer in a solution is causally deducible from the question and vice versa, with a counterfactual answer replacing the original. According to the extensive and comprehensive evaluations on a variety of knowledge reasoning tasks (e.g., science question answering and commonsense reasoning), our framework outperforms all compared state-of-the-art approaches by large margins.
    摘要 尽管LLM技术得到了进步,知识基于的理解仍然是一个长期的问题,因为知识回忆和推理的 fragility。现有方法主要是让LLM自动规划和解决问题,或者广泛采样推理链而未能解决概念和推理错误。借鉴多智能代理(i.e., 理解者和 causal评估器)的合作,我们提出了增强知识基于的理解 faithfulness 的框架。 Specifically, 我们提议使用多个智能代理(i.e., 理解者和 causal评估器)在一种理解和共识 paradigm中合作,以提高理解的准确性。理解者专注于提供人类化的 causality 来解决开放领域问题,而 causal评估器代理则检查问题和答案之间的 causal 关系是否正确,并将对应的 counterfactual 答案替换原答案。根据对多种知识理解任务(如科学问答和常识理解)的广泛和全面评估,我们的框架在比较的state-of-the-art方法之上出现大幅提升。

Utilizing Admissible Bounds for Heuristic Learning

  • paper_url: http://arxiv.org/abs/2308.11905
  • repo_url: None
  • paper_authors: Carlos Núñez-Molina, Masataro Asai
  • for: The goal is to improve how modern machine learning techniques are applied to learning heuristic functions for forward search algorithms, and to provide a better theoretical foundation for doing so.
  • methods: The study models learned heuristics as Truncated Gaussian distributions and incorporates admissible heuristics as parameters (truncation bounds) during training.
  • results: With admissible heuristics as parameters, the Truncated Gaussian model fits the problem better, yielding more accurate heuristics and faster convergence during training.
    Abstract While learning a heuristic function for forward search algorithms with modern machine learning techniques has been gaining interest in recent years, there has been little theoretical understanding of \emph{what} they should learn, \emph{how} to train them, and \emph{why} we do so. This lack of understanding leads to various literature performing an ad-hoc selection of datasets (suboptimal vs optimal costs or admissible vs inadmissible heuristics) and optimization metrics (e.g., squared vs absolute errors). Moreover, due to the lack of admissibility of the resulting trained heuristics, little focus has been put on the role of admissibility \emph{during} learning. This paper articulates the role of admissible heuristics in supervised heuristic learning using them as parameters of Truncated Gaussian distributions, which tightens the hypothesis space compared to ordinary Gaussian distributions. We argue that this mathematical model faithfully follows the principle of maximum entropy and empirically show that, as a result, it yields more accurate heuristics and converges faster during training.
    摘要 Recently, there has been growing interest in using modern machine learning techniques to learn heuristic functions for forward search algorithms. However, there has been little theoretical understanding of what these functions should learn, how to train them, and why we do so. This lack of understanding has led to various literature selecting datasets and optimization metrics on an ad-hoc basis, and little attention has been paid to the role of admissibility during learning.This paper focuses on the role of admissible heuristics in supervised heuristic learning, using Truncated Gaussian distributions as parameters. This approach tightens the hypothesis space compared to ordinary Gaussian distributions, and faithfully follows the principle of maximum entropy. Empirical results show that this approach yields more accurate heuristics and converges faster during training.
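The abstract's key insight, that an admissible heuristic is a guaranteed lower bound on the optimal cost and so the learned distribution should be a Gaussian truncated at that bound, can be sketched with SciPy's truncnorm. The loss below is an illustrative negative log-likelihood under that reading, not the paper's exact training objective or hyperparameters.

```python
import numpy as np
from scipy.stats import truncnorm

def truncated_gaussian_nll(h_star, mu, sigma, admissible_lb):
    """Negative log-likelihood of the optimal cost h_star under a Gaussian
    N(mu, sigma^2) truncated below at the admissible heuristic value
    (which is guaranteed to satisfy admissible_lb <= h_star)."""
    a = (admissible_lb - mu) / sigma   # standardized lower truncation point
    b = np.inf                         # no upper truncation
    return -truncnorm.logpdf(h_star, a, b, loc=mu, scale=sigma)

# Example: predicted mean 10, std 3, admissible lower bound 7, true cost 12.
print(truncated_gaussian_nll(h_star=12.0, mu=10.0, sigma=3.0, admissible_lb=7.0))
# Compared with an untruncated Gaussian, probability mass below the bound is
# ruled out, so the density (and likelihood) above the bound is higher.
```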

Exploring the Optimization Objective of One-Class Classification for Anomaly Detection

  • paper_url: http://arxiv.org/abs/2308.11898
  • repo_url: None
  • paper_authors: Han Gao, Huiyuan Luo, Fei Shen, Zhengtao Zhang
  • for: This paper focuses on the optimization objective space within one-class classification (OCC) methods and its impact on performance. * methods: The paper proposes a novel, data-agnostic deep one-class classification method that utilizes a single 1x1 convolutional layer as a trainable projector and any space with a suitable norm as the optimization objective. * results: The proposed method achieves state-of-the-art performance in both one-class classification and industrial vision anomaly detection and segmentation tasks, validating the effectiveness of the proposed approach.
    Abstract One-class classification (OCC) is a longstanding method for anomaly detection. With the powerful representation capability of the pre-trained backbone, OCC methods have witnessed significant performance improvements. Typically, most of these OCC methods employ transfer learning to enhance the discriminative nature of the pre-trained backbone's features, thus achieving remarkable efficacy. While most current approaches emphasize feature transfer strategies, we argue that the optimization objective space within OCC methods could also be an underlying critical factor influencing performance. In this work, we conducted a thorough investigation into the optimization objective of OCC. Through rigorous theoretical analysis and derivation, we unveil a key insights: any space with the suitable norm can serve as an equivalent substitute for the hypersphere center, without relying on the distribution assumption of training samples. Further, we provide guidelines for determining the feasible domain of norms for the OCC optimization objective. This novel insight sparks a simple and data-agnostic deep one-class classification method. Our method is straightforward, with a single 1x1 convolutional layer as a trainable projector and any space with suitable norm as the optimization objective. Extensive experiments validate the reliability and efficacy of our findings and the corresponding methodology, resulting in state-of-the-art performance in both one-class classification and industrial vision anomaly detection and segmentation tasks.
    摘要 一类分类(OCC)是一种长期使用的异常检测方法。随着预训练后处理的特征表示能力的提高,OCC方法已经经历了显著性能提高。通常,大多数这些OCC方法使用传输学来强化预训练后处理的特征,从而实现了很好的效果。而我们认为,OCC方法的优化目标空间也是影响性能的关键因素。在这项工作中,我们进行了一项全面的OCC优化目标对象的调查。通过严格的理论分析和逻辑推导,我们揭示出一个关键发现:任何具有适当 нор 的空间都可以作为异常中心的等价substitute,不需要基于训练样本的分布假设。此外,我们还提供了确定OCC优化目标空间的可行范围的指南。这一新发现引出了一种简单、数据非依的深度一类分类方法。我们的方法包括一个单一的1x1卷积层作为可训练的投影器,以及任何具有适当 norm的空间作为优化目标。我们的实验证明了我们的发现和相应的方法ология的可靠性和效果,在一类分类和工业视觉异常检测和分割任务中实现了状态的末点性能。
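A minimal PyTorch sketch of the structure the abstract describes: features from a frozen backbone pass through a single trainable 1x1 convolution, and the optimization objective is simply a suitable norm of the projection, with the same norm reused as the anomaly score at test time. The tiny backbone, the choice of L2 norm, and the hyperparameters are placeholders, not the paper's implementation.

```python
import torch
import torch.nn as nn

class OneClassProjector(nn.Module):
    """Single 1x1 convolution acting as the only trainable component."""
    def __init__(self, in_channels, out_channels=128):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, feats):               # feats: (B, C, H, W) from a frozen backbone
        return self.proj(feats)

# Stand-in for a pretrained, frozen feature extractor.
backbone = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
for p in backbone.parameters():
    p.requires_grad_(False)

projector = OneClassProjector(in_channels=64)
opt = torch.optim.Adam(projector.parameters(), lr=1e-3)

images = torch.randn(8, 3, 32, 32)          # a batch of normal (in-distribution) images
for step in range(5):
    z = projector(backbone(images))
    loss = z.flatten(1).norm(p=2, dim=1).mean()  # objective: a suitable norm of the projection
    opt.zero_grad()
    loss.backward()
    opt.step()

# At test time the same norm serves as the anomaly score: larger norm => more anomalous.
score = projector(backbone(images)).flatten(1).norm(p=2, dim=1)
```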

Bridging the Gap: Deciphering Tabular Data Using Large Language Model

  • paper_url: http://arxiv.org/abs/2308.11891
  • repo_url: None
  • paper_authors: Hengyuan Zhang, Peng Chang, Zongcheng Ji
  • for: The study investigates how large language models can be used for table-based question answering, improving their comprehension of table structure and content.
  • methods: A dedicated module serializes tables into a format that integrates with large language models, and a corrective mechanism checks and rectifies potential model errors.
  • results: The proposed method trails the SOTA by about 11.7% on overall metrics but surpasses it by about 1.2% on specific datasets, indicating that it enhances the model's understanding of table structure and content.
    Abstract In the realm of natural language processing, the understanding of tabular data has perpetually stood as a focal point of scholarly inquiry. The emergence of expansive language models, exemplified by the likes of ChatGPT, has ushered in a wave of endeavors wherein researchers aim to harness these models for tasks related to table-based question answering. Central to our investigative pursuits is the elucidation of methodologies that amplify the aptitude of such large language models in discerning both the structural intricacies and inherent content of tables, ultimately facilitating their capacity to provide informed responses to pertinent queries. To this end, we have architected a distinctive module dedicated to the serialization of tables for seamless integration with expansive language models. Additionally, we've instituted a corrective mechanism within the model to rectify potential inaccuracies. Experimental results indicate that, although our proposed method trails the SOTA by approximately 11.7% in overall metrics, it surpasses the SOTA by about 1.2% in tests on specific datasets. This research marks the first application of large language models to table-based question answering tasks, enhancing the model's comprehension of both table structures and content.
    摘要 在自然语言处理领域中,表格数据的理解一直是学术研究的焦点。现在,大型语言模型的出现,如ChatGPT,使得研究人员尝试使用这些模型来解决表格问题。我们的探索的核心在于发展一种能够增强大型语言模型对表格结构和内容的理解,以便它们能够准确回答相关的问题。为此,我们设计了一个专门用于将表格序列化的模块,并在模型中实施了纠正机制以消除可能的错误。实验结果表明,虽然我们的提议方法相对于最佳实践(SOTA)落后约11.7%的总指标,但在特定数据集上测试时超过了SOTA约1.2%。这项研究是大型语言模型在表格问题回答上的首次应用,提高了模型对表格结构和内容的理解。
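A minimal illustration of the serialization step described above: flattening a table into plain text and embedding it in a question-answering prompt. The row format and prompt template are illustrative assumptions, not the paper's module.

```python
import pandas as pd

def serialize_table(df: pd.DataFrame) -> str:
    """Turn a table into a compact textual form a language model can read."""
    header = " | ".join(df.columns)
    rows = [" | ".join(str(v) for v in row) for row in df.itertuples(index=False)]
    return "\n".join([header] + rows)

table = pd.DataFrame({
    "country": ["France", "Japan", "Brazil"],
    "capital": ["Paris", "Tokyo", "Brasilia"],
    "population_m": [68, 125, 203],
})

question = "Which listed country has the largest population?"
prompt = (
    "Answer the question using only the table below.\n\n"
    f"Table:\n{serialize_table(table)}\n\n"
    f"Question: {question}\nAnswer:"
)
print(prompt)  # this string would be sent to the LLM; a correction pass could re-check the answer against the table
```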

Integrating the Wikidata Taxonomy into YAGO

  • paper_url: http://arxiv.org/abs/2308.11884
  • repo_url: https://github.com/yago-naga/yago-4.5
  • paper_authors: Fabian Suchanek, Mehwish Alam, Thomas Bonald, Pierre-Henri Paris, Jules Soria
  • for: The paper aims to merge the entire Wikidata taxonomy into the YAGO KB as much as possible, while maintaining logical consistency.
  • methods: The authors combine Wikidata with the ontology from Schema.org to reduce and clean up the taxonomy, and use automated reasoners to ensure logical consistency.
  • results: The authors create YAGO 4.5, which adds a rich layer of informative classes to YAGO while keeping the KB logically consistent.
    Abstract Wikidata is one of the largest public general-purpose Knowledge Bases (KBs). Yet, due to its collaborative nature, its schema and taxonomy have become convoluted. For the YAGO 4 KB, we combined Wikidata with the ontology from Schema.org, which reduced and cleaned up the taxonomy and constraints and made it possible to run automated reasoners on the data. However, it also cut away large parts of the Wikidata taxonomy. In this paper, we present our effort to merge the entire Wikidata taxonomy into the YAGO KB as much as possible. We pay particular attention to logical constraints and a careful distinction of classes and instances. Our work creates YAGO 4.5, which adds a rich layer of informative classes to YAGO, while at the same time keeping the KB logically consistent.
    摘要 Wikidata 是最大的公共通用知识库(KB)之一。然而由于其协作性,其架构和分类已经变得混乱。为了构建 YAGO 4 知识库,我们将 Wikidata 与 Schema.org 的本体结合起来,这有效地精简和清理了分类及约束,并使得数据可以通过自动推理器处理。但是,这也剪掉了大量 Wikidata 分类。在这篇论文中,我们报告了将整个 Wikidata 分类尽可能合并到 YAGO 知识库中的工作。我们特别注重逻辑约束以及类与实例的精心区分。我们的工作创造了 YAGO 4.5,它为 YAGO 添加了一层丰富而有用的信息类,同时保持知识库的逻辑一致性。

Cabrita: closing the gap for foreign languages

  • paper_url: http://arxiv.org/abs/2308.11878
  • repo_url: None
  • paper_authors: Celio Larcher, Marcos Piau, Paulo Finardi, Pedro Gengo, Piero Esposito, Vinicius Caridá
  • for: Improving performance in a specific language or domain while ensuring effective tokenization.
  • methods: Training the model from scratch for the target language, following a methodology named Cabrita.
  • results: On few-shot learning evaluations, results are similar to those of a traditional continuous pre-training approach and of 7B English pre-trained models, while significantly reducing the number of tokens needed to represent text.
    Abstract The strategy of training the model from scratch in a specific language or domain serves two essential purposes: i) enhancing performance in the particular linguistic or domain context, and ii) ensuring effective tokenization. The main limitation inherent to this approach lies in the associated cost, which can reach six to seven-digit dollar values, depending on the model size and the number of parameters involved. The main solution to overcome the cost challenge is to rely on available pre-trained models, which, despite recent advancements such as the LLaMA and LLaMA-2 models, still demonstrate inefficiency for certain specific domain problems or prove ineffective in scenarios involving conversational memory resources, given the large number of tokens required to represent text. To overcome this issue, we present a methodology named Cabrita, which, as our research demonstrates, successfully addresses the performance and efficient tokenization problem, all at an affordable cost. We believe that this methodology can be applied to any transformer-like architecture model. To validate the study, we conducted continuous pre-training exclusively using Portuguese text on a 3-billion-parameter model known as OpenLLaMA, resulting in a model named openCabrita 3B. The openCabrita 3B also features a new tokenizer that results in a significant reduction in the number of tokens required to represent the text. In our assessment, for few-shot learning tasks, we achieved similar results with this 3B model compared to a traditional continuous pre-training approach as well as to 7B models English pre-trained models.
    摘要 strategy 训练模型从零开始在特定语言或领域上服务两个重要目的:一是提高特定语言或领域上表现,二是确保有效的分词。但这种方法存在一个主要的限制,即成本,可以达到6到7位数字的值,具体取决于模型大小和参数数量。为了缓解这个问题,可以依靠可用的预训练模型,尽管最近的进步,如LLaMA和LLaMA-2模型,仍然在特定领域问题上不具有效果,因为需要大量的字符表示文本。为了解决这个问题,我们提出了一种方法ологи,名为Cabrita,我们的研究表明,该方法能够成功地解决表现和有效的分词问题,并且具有可Affordable的成本。我们认为该方法可以应用于任何 transformer-like 架构模型。为了验证这种方法,我们进行了继续预训练,专门使用葡萄牙语文本,在一个3亿参数的模型OpenLLaMA上进行了不断预训练,从而获得了一个名为openCabrita 3B的模型。openCabrita 3B还使用了一种新的分词器,从而导致文本表示的字符数量减少了许多。在我们的评估中,对于几个shot学习任务,我们使用这个3B模型和传统预训练方法以及7B英语预训练模型进行比较,得到了类似的结果。
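One way to see the tokenization benefit the abstract emphasizes is to train a small BPE tokenizer on Portuguese text and count the tokens it needs for a Portuguese sentence; an English-centric tokenizer typically needs many more pieces for the same sentence. The sketch uses the Hugging Face `tokenizers` library with a toy corpus and vocabulary size, which are illustrative and not Cabrita's actual tokenizer setup.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

corpus = [
    "A raposa marrom salta sobre o cachorro preguiçoso.",
    "O processamento de linguagem natural em português exige bons tokenizadores.",
    "Modelos treinados do zero aprendem vocabulários adequados ao idioma.",
] * 50  # repeat to give the trainer a bit more data

# Train a small byte-pair-encoding tokenizer on the Portuguese corpus.
pt_tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
pt_tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.BpeTrainer(vocab_size=500, special_tokens=["[UNK]"])
pt_tokenizer.train_from_iterator(corpus, trainer)

sentence = "O processamento de linguagem natural em português exige bons tokenizadores."
print("PT-trained tokens:", len(pt_tokenizer.encode(sentence).tokens))
# A tokenizer trained mostly on English text would usually split this sentence
# into many more pieces, inflating sequence length and inference cost.
```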

Integrated Image and Location Analysis for Wound Classification: A Deep Learning Approach

  • paper_url: http://arxiv.org/abs/2308.11877
  • repo_url: None
  • paper_authors: Yash Patel, Tirth Shah, Mrinal Kanti Dhar, Taiyu Zhang, Jeffrey Niezgoda, Sandeep Gopalakrishnan, Zeyun Yu
  • for: Improving wound classification accuracy to support better diagnosis and treatment.
  • methods: A multi-modal network based on deep convolutional neural networks that uses wound images together with their corresponding body locations for more precise classification.
  • results: The approach outperforms traditional methods, reaching 74.79% to 100% accuracy for Region of Interest (ROI) classification without location, 73.98% to 100% for ROI classification with location, and 78.10% to 100% for whole-image classification.
    Abstract The global burden of acute and chronic wounds presents a compelling case for enhancing wound classification methods, a vital step in diagnosing and determining optimal treatments. Recognizing this need, we introduce an innovative multi-modal network based on a deep convolutional neural network for categorizing wounds into four categories: diabetic, pressure, surgical, and venous ulcers. Our multi-modal network uses wound images and their corresponding body locations for more precise classification. A unique aspect of our methodology is incorporating a body map system that facilitates accurate wound location tagging, improving upon traditional wound image classification techniques. A distinctive feature of our approach is the integration of models such as VGG16, ResNet152, and EfficientNet within a novel architecture. This architecture includes elements like spatial and channel-wise Squeeze-and-Excitation modules, Axial Attention, and an Adaptive Gated Multi-Layer Perceptron, providing a robust foundation for classification. Our multi-modal network was trained and evaluated on two distinct datasets comprising relevant images and corresponding location information. Notably, our proposed network outperformed traditional methods, reaching an accuracy range of 74.79% to 100% for Region of Interest (ROI) without location classifications, 73.98% to 100% for ROI with location classifications, and 78.10% to 100% for whole image classifications. This marks a significant enhancement over previously reported performance metrics in the literature. Our results indicate the potential of our multi-modal network as an effective decision-support tool for wound image classification, paving the way for its application in various clinical contexts.
    摘要 全球各类伤口的扩大问题,提出了加强伤口分类方法的需求,这是诊断和治疗伤口的重要一步。为此,我们介绍了一种创新的多模态网络,基于深度卷积神经网络,用于分类伤口为四类:糖尿病、压力、手术和血液溢出损伤。我们的多模态网络使用伤口图像和其相应的身体位置信息进行更加精确的分类。我们的方法的一个独特特点是通过身体地图系统,实现了更加准确的伤口位置标记,从传统伤口图像分类技术中的改进。我们的方法还integrates了多种模型,如VGG16、ResNet152和EfficientNet,并在一种新的架构中进行组合。这种架构包括空间和通道方向的压缩和激活模块、轴向注意力和适应阀控多层感知机制,为分类提供了坚实的基础。我们的多模态网络在两个不同的数据集上进行训练和评估,其中一个包含了相关的图像和身体位置信息,另一个只包含图像。我们的方法在这两个数据集上达到了74.79%到100%的ROI(区域 интереса)无地址分类精度范围,73.98%到100%的ROI与地址分类精度范围,以及78.10%到100%的整个图像分类精度范围。这表明我们的多模态网络在文献中已经报道的性能指标中达到了显著的提高。我们的结果表明,我们的多模态网络可以作为伤口图像分类的有效决策支持工具,为各种临床上下文应用。
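A minimal sketch of the multi-modal idea described above: CNN features from the wound image are fused with a body-location encoding before classification into the four wound types. The small backbone, the one-hot location encoding, and the layer sizes stand in for the paper's VGG/ResNet/EfficientNet branches, body-map system, and attention modules.

```python
import torch
import torch.nn as nn

NUM_LOCATIONS = 30   # assumed number of body-map regions
NUM_CLASSES = 4      # diabetic, pressure, surgical, venous ulcers

class WoundClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Small CNN as a stand-in for a pretrained image backbone.
        self.image_branch = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.location_branch = nn.Sequential(nn.Linear(NUM_LOCATIONS, 16), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(16 + 16, 64), nn.ReLU(),
                                  nn.Linear(64, NUM_CLASSES))

    def forward(self, image, location_onehot):
        fused = torch.cat([self.image_branch(image),
                           self.location_branch(location_onehot)], dim=1)
        return self.head(fused)

model = WoundClassifier()
images = torch.randn(4, 3, 224, 224)
locations = torch.zeros(4, NUM_LOCATIONS)
locations[torch.arange(4), torch.tensor([2, 5, 5, 11])] = 1.0  # tagged body-map regions
print(model(images, locations).shape)  # torch.Size([4, 4])
```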

Finding the Perfect Fit: Applying Regression Models to ClimateBench v1.0

  • paper_url: http://arxiv.org/abs/2308.11854
  • repo_url: None
  • paper_authors: Anmol Chaure, Ashok Kumar Behera, Sudip Bhattacharya
  • for: The study uses data-driven machine learning models as climate emulators so that policy makers can make informed decisions.
  • methods: Machine learning emulators serve as surrogates for computationally heavy GCM simulators, reducing time and carbon footprint; in particular, by leveraging the kernel trick, regression models can capture complex relationships and improve their predictive capability.
  • results: Evaluated on the ClimateBench dataset, the Gaussian Process Regressor performs best among three non-linear regression models on the standard metrics used for climate-field emulation, but it is computationally expensive in space and time; Support Vector and Kernel Ridge models deliver competitive results with certain trade-offs. Composite kernels and techniques such as variational inference are being actively investigated to better model complex non-linear patterns, including precipitation.
    Abstract Climate projections using data driven machine learning models acting as emulators, is one of the prevailing areas of research to enable policy makers make informed decisions. Use of machine learning emulators as surrogates for computationally heavy GCM simulators reduces time and carbon footprints. In this direction, ClimateBench [1] is a recently curated benchmarking dataset for evaluating the performance of machine learning emulators designed for climate data. Recent studies have reported that despite being considered fundamental, regression models offer several advantages pertaining to climate emulations. In particular, by leveraging the kernel trick, regression models can capture complex relationships and improve their predictive capabilities. This study focuses on evaluating non-linear regression models using the aforementioned dataset. Specifically, we compare the emulation capabilities of three non-linear regression models. Among them, Gaussian Process Regressor demonstrates the best-in-class performance against standard evaluation metrics used for climate field emulation studies. However, Gaussian Process Regression suffers from being computational resource hungry in terms of space and time complexity. Alternatively, Support Vector and Kernel Ridge models also deliver competitive results and but there are certain trade-offs to be addressed. Additionally, we are actively investigating the performance of composite kernels and techniques such as variational inference to further enhance the performance of the regression models and effectively model complex non-linear patterns, including phenomena like precipitation.
    摘要 政策制定者可以通过使用数据驱动机器学模型作为模拟器,来做出了 informed 的决策。通过使用机器学模型作为计算成本高GCM模拟器的代理,可以降低时间和碳脚印。在这个方向下,ClimateBench [1] 是最近筹集的气候模拟数据集,用于评估机器学模型的性能。据研究,尽管被视为基本的,但是回归模型在气候模拟方面具有许多优势。具体来说,通过核心技术,回归模型可以捕捉复杂的关系,提高预测能力。本研究将对非线性回归模型进行评估,并比较其表现。Specifically,我们将 comparing the emulation capabilities of three non-linear regression models. Among them, Gaussian Process Regressor demonstrates the best-in-class performance against standard evaluation metrics used for climate field emulation studies. However, Gaussian Process Regression suffers from being computationally resource-intensive in terms of space and time complexity. Alternatively, Support Vector and Kernel Ridge models also deliver competitive results, but there are certain trade-offs to be addressed. Additionally, we are actively investigating the performance of composite kernels and techniques such as variational inference to further enhance the performance of the regression models and effectively model complex non-linear patterns, including phenomena like precipitation.
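The comparison described above maps directly onto scikit-learn's implementations of the three model families. The toy data below stands in for ClimateBench's inputs and target climate fields; the kernels and hyperparameters are illustrative defaults rather than the study's tuned settings.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_squared_error
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 4))                                 # stand-in forcing inputs
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.1, 200)    # stand-in climate-field target
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]

models = {
    "Gaussian process": GaussianProcessRegressor(kernel=RBF(), alpha=1e-2),
    "Support vector":   SVR(kernel="rbf", C=10.0),
    "Kernel ridge":     KernelRidge(kernel="rbf", alpha=0.1),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
    print(f"{name}: RMSE = {rmse:.3f}")
```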

A deep reinforcement learning approach for real-time demand-responsive railway rescheduling to mitigate station overcrowding using mobile data

  • paper_url: http://arxiv.org/abs/2308.11849
  • repo_url: None
  • paper_authors: Enze Liu, Zhiyuan Lin, Judith Y. T. Wang, Hong Chen
  • For: This paper aims to provide a demand-responsive approach for real-time railway rescheduling during severe emergency events such as natural disasters, with a focus on a heavy-demand station upstream of the disrupted area.* Methods: The paper proposes using mobile data (MD) to infer real-world passenger mobility and avoid overcrowding at the target station, and a deep reinforcement learning (DRL) framework to determine the optimal reschededuled timetable, route stops, and rolling stock allocation.* Results: The paper addresses challenges such as station overcrowding, rolling stock shortage, open-ended disruption duration, and delays due to detours, and aims to improve the efficiency and safety of real-time railway rescheduling during emergency events.
    Abstract Real-time railway rescheduling is a timely and flexible technique to automatically alter the operation schedule in response to time-varying conditions. Current research lacks data-driven approaches that capture real-time passenger mobility during railway disruptions, relying mostly on OD-based data and model-based methods for estimating demands of trains. Meanwhile, the schedule-updating principles for a long-term disruption overlook the uneven distribution of demand over time. To fill this gap, this paper proposes a demand-responsive approach by inferring real-world passenger mobility from mobile data (MD) to facilitate real-time rescheduling. Unlike network-level approaches, this paper focuses on a heavy-demand station upstream of the disrupted area. The objective is to reschedule all trains on multiple routes passing through this target station, which have been affected by a severe emergency event such as a natural disaster. Particular attention should be given to avoiding the accumulation of overcrowded passengers at this station, to prevent additional accidents arising from overcrowding. This research addresses the challenges associated with this scenario, including the dynamics of arriving and leaving of passengers, station overcrowding, rolling stock shortage, open-ended disruption duration, integrated rescheduling on multiple routes, and delays due to detours. A deep reinforcement learning (DRL) framework is proposed to determine the optimal rescheduled timetable, route stops, and rolling stock allocation, while considering real-time demand satisfaction, station overcrowding, train capacity utilization, and headway safety.
    摘要 现实时铁路重新规划是一种时间变化的和灵活的技术,可以自动修改运营计划应对时间变化的条件。现有研究缺乏基于实时旅客流动数据的数据驱动方法,而是主要依赖于 Origin-Destination(OD)数据和模型基本方法来估算列车需求。同时,长期干扰的调度原则忽略了旅客需求的不均分布。为了填补这一漏洞,本文提出了一种需求响应的方法,通过推理实际旅客流动数据(MD)来促进实时重新规划。与传统网络水平方法不同,本文将注重一个重要的快速车站,该站位于干扰区域之上游。目标是重新规划通过该站的多个路线的列车,这些列车受到严重紧急事件(如自然灾害)的影响。特别是要避免该站堵塞的乘客堆积,以避免由过度堆积而导致的一次性事故。本研究解决了这种情况中的挑战,包括到站拥堵、车厢短缺、开放式干扰持续时间、多路线集成重新规划和延迟。一种深度鼓励学习(DRL)框架被提议,以确定最佳重新规划时间表、站点停留、车厢分配,同时考虑实时需求满足、站点拥堵、车厢容量利用和距离安全。

${\rm E}(3)$-Equivariant Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.11842
  • repo_url: https://github.com/dchen48/e3ac
  • paper_authors: Dingyang Chen, Qi Zhang
  • for: The paper exploits the symmetry patterns found in the natural world for cooperative multi-agent reinforcement learning (MARL) problems, improving performance across the many applications where such symmetries appear.
  • methods: Euclidean symmetries are used as constraints on the MARL problem, embedded as an inductive bias in the neural network architectures of multi-agent actor-critic methods.
  • results: With the symmetry constraints, the architectures perform strongly on various cooperative MARL benchmarks and generalize well, including zero-shot and transfer learning in unseen scenarios with repeated symmetric patterns.
    Abstract Identification and analysis of symmetrical patterns in the natural world have led to significant discoveries across various scientific fields, such as the formulation of gravitational laws in physics and advancements in the study of chemical structures. In this paper, we focus on exploiting Euclidean symmetries inherent in certain cooperative multi-agent reinforcement learning (MARL) problems and prevalent in many applications. We begin by formally characterizing a subclass of Markov games with a general notion of symmetries that admits the existence of symmetric optimal values and policies. Motivated by these properties, we design neural network architectures with symmetric constraints embedded as an inductive bias for multi-agent actor-critic methods. This inductive bias results in superior performance in various cooperative MARL benchmarks and impressive generalization capabilities such as zero-shot learning and transfer learning in unseen scenarios with repeated symmetric patterns. The code is available at: https://github.com/dchen48/E3AC.
    摘要 Identification和分析自然界中的对称征象导致了各科学领域的重要发现,如物理学中的重力法律的制定和化学结构的研究的进步。在这篇论文中,我们关注利用多智能体强化学习(MARL)问题中的欧几何对称的特性,这些特性在许多应用中很普遍。我们首先正式定义了一类马尔可夫游戏,其具有一般对称性,这使得存在对称优质值和策略。这些属性激发我们在多智能体actor-critic方法中嵌入对称约束,这种约束导致在各种合作MARLbenchmark中表现出色,并且具有很好的泛化能力,如零shot学习和转移学习在未看到的对称 patrern中。代码可以在以下链接中找到:https://github.com/dchen48/E3AC。
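The Euclidean symmetry being exploited can be illustrated without any learning: a function that only consumes relative, rotation- and translation-invariant quantities (here, pairwise distances between agents) returns the same value when the whole scene is rigidly transformed. This is a toy illustration of the constraint, not the paper's actor-critic architecture.

```python
import numpy as np

def invariant_features(positions):
    """Pairwise distances between agents: unchanged by any rotation, reflection, or translation."""
    diffs = positions[:, None, :] - positions[None, :, :]
    return np.linalg.norm(diffs, axis=-1)

def toy_critic(positions):
    # Any function of invariant features is itself invariant.
    return invariant_features(positions).sum()

rng = np.random.default_rng(0)
agents = rng.normal(size=(5, 3))                 # 5 agents in 3-D space

# Random rotation (orthogonal matrix via QR) and translation applied to every agent.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
transformed = agents @ Q.T + rng.normal(size=3)

print(np.isclose(toy_critic(agents), toy_critic(transformed)))  # True
```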

A Benchmark Study on Calibration

  • paper_url: http://arxiv.org/abs/2308.11838
  • repo_url: https://github.com/Aryia-Behroziuan/history1
  • paper_authors: Linwei Tao, Younan Zhu, Haolan Guo, Minjing Dong, Chang Xu
  • for: The study explores the relationship between the accuracy and calibration of neural network models, and how calibration can be improved.
  • methods: The study leverages the Neural Architecture Search (NAS) search space and creates a model calibration dataset to investigate the calibration properties of neural networks.
  • results: The study finds that model calibration can generalize across tasks and that robustness can serve as a calibration measurement. It also identifies unreliability in some common calibration metrics, examines whether post-hoc calibration methods affect all models in the search space uniformly, analyzes how calibration interacts with accuracy and how bin size affects calibration measurement, and identifies architectural designs that benefit calibration.
    Abstract Deep neural networks are increasingly utilized in various machine learning tasks. However, as these models grow in complexity, they often face calibration issues, despite enhanced prediction accuracy. Many studies have endeavored to improve calibration performance through data preprocessing, the use of specific loss functions, and training frameworks. Yet, investigations into calibration properties have been somewhat overlooked. Our study leverages the Neural Architecture Search (NAS) search space, offering an exhaustive model architecture space for thorough calibration properties exploration. We specifically create a model calibration dataset. This dataset evaluates 90 bin-based and 12 additional calibration measurements across 117,702 unique neural networks within the widely employed NATS-Bench search space. Our analysis aims to answer several longstanding questions in the field, using our proposed dataset: (i) Can model calibration be generalized across different tasks? (ii) Can robustness be used as a calibration measurement? (iii) How reliable are calibration metrics? (iv) Does a post-hoc calibration method affect all models uniformly? (v) How does calibration interact with accuracy? (vi) What is the impact of bin size on calibration measurement? (vii) Which architectural designs are beneficial for calibration? Additionally, our study bridges an existing gap by exploring calibration within NAS. By providing this dataset, we enable further research into NAS calibration. As far as we are aware, our research represents the first large-scale investigation into calibration properties and the premier study of calibration issues within NAS.
    摘要 深度神经网络在不同的机器学习任务中日益普及,但是随着模型复杂度的增加,它们经常面临调整问题,即使Predictive accuracy得到了提高。许多研究尝试通过数据预处理、特定的损失函数和训练框架来改进调整性能。然而,调整性质的研究受到了一定的忽视。我们的研究利用Neural Architecture Search(NAS)搜索空间,提供了详细的模型建筑空间,以便对调整性质进行全面的探索。我们专门创建了一个模型调整数据集。这个数据集评估了90个分割值和12个附加的调整测量,在117,702个Unique Neural Networks中进行了广泛的 NATS-Bench 搜索空间中进行了测试。我们的分析旨在回答一些在领域中长期存在的问题,使用我们提出的数据集:(i)可否将模型调整 generalized 到不同任务?(ii)可否使用Robustness作为调整测量?(iii)如何判定调整指标的可靠性?(iv)post-hoc calibration方法对所有模型是否具有相同的影响?(v)调整与准确度之间是否存在相互关系?(vi)分割值如何影响调整测量?(vii)哪些建筑设计对调整有利?我们的研究填补了现有的空白,通过调整在NAS中进行exploration。我们的研究表明,调整性质在NAS中存在一定的问题,并且我们的研究是这类研究中的第一个大规模调整性质的研究。
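For reference, the bin-based quantity at the heart of such calibration measurements is the expected calibration error (ECE): predictions are grouped into confidence bins and the gap between average accuracy and average confidence is averaged with bin-mass weights. The sketch below is a standard ECE implementation, not the benchmark's exact code.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Bin-based ECE: sum over bins of |accuracy - confidence|, weighted by bin mass."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

# Overconfident toy model: 90% stated confidence but only 70% actual accuracy.
conf = np.full(1000, 0.9)
corr = np.random.default_rng(0).random(1000) < 0.7
print(expected_calibration_error(conf, corr))  # roughly 0.2
```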

Characterizing normal perinatal development of the human brain structural connectivity

  • paper_url: http://arxiv.org/abs/2308.11836
  • repo_url: None
  • paper_authors: Yihan Wu, Lana Vasung, Camilo Calixto, Ali Gholipour, Davood Karimi
  • for: The study characterizes the development of the brain's structural connectome during the perinatal period. * methods: A computational framework based on spatio-temporal averaging is used to determine normative baselines of structural connectivity in the perinatal stage. * results: Structural connectivity shows clear and strong developmental trends between 33 and 44 postmenstrual weeks, including increases in global and local efficiency, a decrease in characteristic path length, and strengthening of connections within and across brain lobes and hemispheres; the study also observes asymmetry patterns that are consistent across different connection-weighting approaches.
    Abstract Early brain development is characterized by the formation of a highly organized structural connectome. The interconnected nature of this connectome underlies the brain's cognitive abilities and influences its response to diseases and environmental factors. Hence, quantitative assessment of structural connectivity in the perinatal stage is useful for studying normal and abnormal neurodevelopment. However, estimation of the connectome from diffusion MRI data involves complex computations. For the perinatal period, these computations are further challenged by the rapid brain development and imaging difficulties. Combined with high inter-subject variability, these factors make it difficult to chart the normal development of the structural connectome. As a result, there is a lack of reliable normative baselines of structural connectivity metrics at this critical stage in brain development. In this study, we developed a computational framework, based on spatio-temporal averaging, for determining such baselines. We used this framework to analyze the structural connectivity between 33 and 44 postmenstrual weeks using data from 166 subjects. Our results unveiled clear and strong trends in the development of structural connectivity in perinatal stage. Connection weighting based on fractional anisotropy and neurite density produced the most consistent results. We observed increases in global and local efficiency, a decrease in characteristic path length, and widespread strengthening of the connections within and across brain lobes and hemispheres. We also observed asymmetry patterns that were consistent between different connection weighting approaches. The new computational method and results are useful for assessing normal and abnormal development of the structural connectome early in life.
    摘要 早期大脑发展 caracterized by the formation of a highly organized structural connectome. 这个 connectome的交互性是大脑的认知能力的基础,也影响了它对疾病和环境因素的应对。因此,在早期生长阶段的量化评估结构连接性是研究正常和异常神经发展的有用工具。然而,从Diffusion MRI数据中计算structural connectivity的估计具有复杂的计算。在早期生长阶段,这些计算受到迅速发展的大脑和成像困难的挑战。此外,高 между个体变化性和不同年龄的数据也使得 Charting the normal development of the structural connectome 是困难的。因此,我们缺乏可靠的正常发展基线的结构连接度度量。在这项研究中,我们开发了一种基于时空平均的计算框架,以确定这些基线。我们使用这个框架分析33-44周孕期的结构连接度,使用166个主要数据。我们的结果表明,在早期生长阶段,结构连接度呈现了明确和强的趋势。基于分数方差和神经纤维密度的连接重量Produced the most consistent results。我们发现全球和局部效率增加,特征路径长度减少,并且广泛加强连接在大脑叶和半球之间和之间。我们还发现了相互 symmetries 的偏好,这些偏好在不同的连接重量方法之间呈现一致。这些新的计算方法和结果有用于评估早期生长阶段的正常和异常结构连接度发展。
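The connectivity measures reported above (global and local efficiency, characteristic path length) are standard graph metrics and can be computed with NetworkX once a connectome has been thresholded into a graph; the random graph below merely stands in for a subject's structural connectivity matrix.

```python
import networkx as nx

# Stand-in for a thresholded, binarized structural connectome
# (e.g. 90 cortical/subcortical regions as nodes).
G = nx.erdos_renyi_graph(n=90, p=0.15, seed=0)

print("global efficiency:", nx.global_efficiency(G))
print("local efficiency:", nx.local_efficiency(G))
if nx.is_connected(G):
    print("characteristic path length:", nx.average_shortest_path_length(G))
```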

Algorithm-assisted discovery of an intrinsic order among mathematical constants

  • paper_url: http://arxiv.org/abs/2308.11829
  • repo_url: None
  • paper_authors: Rotem Elimelech, Ofir David, Carlos De la Cruz Mengual, Rotem Kalisch, Wolfgang Berndt, Michael Shalyt, Mark Silberstein, Yaron Hadad, Ido Kaminer
  • for: 这个论文的目的是探索数学领域中的新概念和关系,利用计算机算法和人类直觉的结合来发现新的数学常量。
  • methods: 这篇论文使用了大规模并行计算机算法,探索了巨量的参数空间,并发现了一种新的数学结构——保守矩阵场。
  • results: 这篇论文发现了一系列新的约束数学常量表达式,包括ζ(3)的多个整数值,并通过新的数学证明,证明了这些常量的不可数性。这些结果表明了计算机支持的数学研究策略的力量,并开启了新的可能性 для解决长期开放的数学问题。
    Abstract In recent decades, a growing number of discoveries in fields of mathematics have been assisted by computer algorithms, primarily for exploring large parameter spaces that humans would take too long to investigate. As computers and algorithms become more powerful, an intriguing possibility arises - the interplay between human intuition and computer algorithms can lead to discoveries of novel mathematical concepts that would otherwise remain elusive. To realize this perspective, we have developed a massively parallel computer algorithm that discovers an unprecedented number of continued fraction formulas for fundamental mathematical constants. The sheer number of formulas discovered by the algorithm unveils a novel mathematical structure that we call the conservative matrix field. Such matrix fields (1) unify thousands of existing formulas, (2) generate infinitely many new formulas, and most importantly, (3) lead to unexpected relations between different mathematical constants, including multiple integer values of the Riemann zeta function. Conservative matrix fields also enable new mathematical proofs of irrationality. In particular, we can use them to generalize the celebrated proof by Ap\'ery for the irrationality of $\zeta(3)$. Utilizing thousands of personal computers worldwide, our computer-supported research strategy demonstrates the power of experimental mathematics, highlighting the prospects of large-scale computational approaches to tackle longstanding open problems and discover unexpected connections across diverse fields of science.
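For intuition about the objects being enumerated, the helper below evaluates a generalized continued fraction a0 + b1/(a1 + b2/(a2 + ...)) from sequences of partial terms, which is the general shape of the formulas the algorithm searches over with polynomial terms. It is demonstrated on two elementary constants rather than on any formula from the paper.

```python
from fractions import Fraction

def continued_fraction(a, b, depth):
    """Evaluate a(0) + b(1)/(a(1) + b(2)/(a(2) + ...)) to the given depth, exactly."""
    value = Fraction(a(depth))
    for n in range(depth, 0, -1):          # fold the fraction from the bottom up
        value = a(n - 1) + Fraction(b(n)) / value
    return value

# sqrt(2) = 1 + 1/(2 + 1/(2 + ...)): a0 = 1, a_n = 2, b_n = 1.
sqrt2 = continued_fraction(lambda n: 1 if n == 0 else 2, lambda n: 1, depth=30)
# golden ratio = 1 + 1/(1 + 1/(1 + ...)): all partial terms equal to 1.
phi = continued_fraction(lambda n: 1, lambda n: 1, depth=30)

print(float(sqrt2))  # 1.4142135...
print(float(phi))    # 1.6180339...
```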

Exploring the Effectiveness of GPT Models in Test-Taking: A Case Study of the Driver’s License Knowledge Test

  • paper_url: http://arxiv.org/abs/2308.11827
  • repo_url: None
  • paper_authors: Saba Rahimi, Tucker Balch, Manuela Veloso
  • for: The study aims to let GPT models answer questions using context from information sources not included in their training data, addressing their inability to answer questions about recent developments or non-public documents.
  • methods: The method preprocesses the contextual information, embeds the contexts and the query, constructs a prompt by integrating the context embeddings, and generates answers with a GPT model.
  • results: In a controlled test scenario using the California Driver's Handbook as the information source, GPT-3 achieved a 96% passing score on 50 sample driving knowledge test questions, versus 82% without context; the model still answers some questions incorrectly even when context is provided, indicating room for improvement. The study also examines the impact of prompt length and context format on model performance.
    Abstract Large language models such as Open AI's Generative Pre-trained Transformer (GPT) models are proficient at answering questions, but their knowledge is confined to the information present in their training data. This limitation renders them ineffective when confronted with questions about recent developments or non-public documents. Our research proposes a method that enables GPT models to answer questions by employing context from an information source not previously included in their training data. The methodology includes preprocessing of contextual information, the embedding of contexts and queries, constructing prompt through the integration of context embeddings, and generating answers using GPT models. We applied this method in a controlled test scenario using the California Driver's Handbook as the information source. The GPT-3 model achieved a 96% passing score on a set of 50 sample driving knowledge test questions. In contrast, without context, the model's passing score fell to 82%. However, the model still fails to answer some questions correctly even with providing library of context, highlighting room for improvement. The research also examined the impact of prompt length and context format, on the model's performance. Overall, the study provides insights into the limitations and potential improvements for GPT models in question-answering tasks.
    摘要 大型语言模型如Open AI的生成预训练变换器(GPT)模型在回答问题方面表现出色,但它们的知识受训数据的限制。这个限制使得它们无法回答最新的发展或者非公共文档中的问题。我们的研究提出了一种方法,使得GPT模型可以通过使用不包含在它们训练数据中的信息源来回答问题。该方法包括Contextual information的处理、查询和Context的编码、通过 integrate context embedding和构建提问的推荐、使用GPT模型回答问题。我们在控制测试enario中应用了这种方法,使用加利福尼亚驾驶手册作为信息源。GPT-3模型在50个示例驾驶知识测试问题中取得96%的通过率,而无Context的情况下,模型的通过率下降到82%。然而,即使提供了库存Context,模型仍然无法回答一些问题正确,这 highlights 进一步的改进空间。研究还检查了提问长度和Context格式对模型性能的影响。总的来说,这项研究提供了GPT模型在问题回答任务中的局限性和可能的改进方向。
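A minimal sketch of the described pipeline: chunks of the handbook are embedded, the chunks most similar to the question are retrieved, and a prompt is assembled around them for the GPT model. The `embed` function here is a toy keyword-count placeholder for a real text-embedding model, and the handbook snippets are invented for illustration.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy embedding: counts of a few keywords. A real system would call a
    text-embedding model here; this placeholder just keeps the demo self-contained."""
    vocab = ["stop", "speed", "school", "parking", "fire", "hydrant", "intersection"]
    v = np.array([text.lower().count(term) for term in vocab], dtype=float)
    norm = np.linalg.norm(v)
    return v / norm if norm else v

context_chunks = [
    "Illustrative handbook snippet about right-of-way at four-way stops.",
    "Illustrative handbook snippet about speed limits in school zones.",
    "Illustrative handbook snippet about parking near fire hydrants.",
]
question = "How close to a fire hydrant may you park?"

chunk_vecs = np.stack([embed(c) for c in context_chunks])
scores = chunk_vecs @ embed(question)               # cosine similarity (vectors are unit-norm)
top = [context_chunks[i] for i in np.argsort(scores)[::-1][:2]]

prompt = ("Use the context to answer the question.\n\n"
          "Context:\n- " + "\n- ".join(top) +
          f"\n\nQuestion: {question}\nAnswer:")
print(prompt)   # this prompt would then be sent to the GPT model
```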

Expressive probabilistic sampling in recurrent neural networks

  • paper_url: http://arxiv.org/abs/2308.11809
  • repo_url: None
  • paper_authors: Shirui Chen, Linxin Preston Jiang, Rajesh P. N. Rao, Eric Shea-Brown
  • for: The paper addresses sampling-based Bayesian models of brain function, in which neural activities are treated as samples from the probability distributions the brain uses for probabilistic computation.
  • methods: Tools from functional analysis and stochastic differential equations are used to explore how recurrent neural circuits can sample from complex distributions.
  • results: The paper shows that a recurrent neural network with a separate set of output units can sample from arbitrary distributions and proposes an efficient training procedure based on denoising score matching; empirical tests show the model can sample from several complex data distributions.
    Abstract In sampling-based Bayesian models of brain function, neural activities are assumed to be samples from probability distributions that the brain uses for probabilistic computation. However, a comprehensive understanding of how mechanistic models of neural dynamics can sample from arbitrary distributions is still lacking. We use tools from functional analysis and stochastic differential equations to explore the minimum architectural requirements for $\textit{recurrent}$ neural circuits to sample from complex distributions. We first consider the traditional sampling model consisting of a network of neurons whose outputs directly represent the samples (sampler-only network). We argue that synaptic current and firing-rate dynamics in the traditional model have limited capacity to sample from a complex probability distribution. We show that the firing rate dynamics of a recurrent neural circuit with a separate set of output units can sample from an arbitrary probability distribution. We call such circuits reservoir-sampler networks (RSNs). We propose an efficient training procedure based on denoising score matching that finds recurrent and output weights such that the RSN implements Langevin sampling. We empirically demonstrate our model's ability to sample from several complex data distributions using the proposed neural dynamics and discuss its applicability to developing the next generation of sampling-based brain models.
    摘要 在基于抽样的贝叶斯模型中,神经活动被假设为抽样来自神经网络中的概率分布。然而,完整理解如何使机制模型的神经动力学可以从任意分布中抽样仍然缺乏。我们使用函数分析和随机差分方程来探索神经网络的最小建筑要求,以便它们可以从复杂的分布中抽样。我们首先考虑传统抽样模型,即一个由神经元输出直接表示抽样的网络(抽样器只网络)。我们 argue that synaptic current和神经元发射速率动力学在传统模型中有限的抽样能力。我们显示,一个具有分离输出单元的循环神经网络可以从任意概率分布中抽样。我们称之为储备抽样网络(RSN)。我们提出了一种高效的训练方法,基于排除掉噪声的对准得分,以找到循环和输出参数,使得 RSN 实现朗凡 sampling。我们employmontricate了我们的模型,并证明其可以从多种复杂数据分布中抽样,并讨论了其在开发下一代抽样基于脑模型方面的应用。
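The Langevin sampling that the reservoir-sampler network is trained to implement has a simple reference form: noisy gradient steps along the score (the gradient of the log density). The sketch below runs it on a one-dimensional standard Gaussian whose score is known in closed form; in the paper this score is what denoising score matching estimates and what the recurrent dynamics emulate.

```python
import numpy as np

def langevin_sample(score, x0, step=0.01, n_steps=5000, rng=None):
    """Unadjusted Langevin dynamics: x <- x + step*score(x) + sqrt(2*step)*noise."""
    rng = rng or np.random.default_rng(0)
    x = np.array(x0, dtype=float)
    samples = []
    for _ in range(n_steps):
        x = x + step * score(x) + np.sqrt(2 * step) * rng.normal(size=x.shape)
        samples.append(x.copy())
    return np.array(samples)

# Target: standard normal, whose score (gradient of the log density) is -x.
samples = langevin_sample(score=lambda x: -x, x0=np.array([3.0]))
burn_in = samples[1000:]
print(burn_in.mean(), burn_in.std())   # approximately 0 and 1
```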

Ceci n’est pas une pomme: Adversarial Illusions in Multi-Modal Embeddings

  • paper_url: http://arxiv.org/abs/2308.11804
  • repo_url: None
  • paper_authors: Eugene Bagdasaryan, Vitaly Shmatikov
  • for: The study examines attack surfaces of multi-modal embeddings and how such attacks affect downstream tasks.
  • methods: The researchers attack multi-modal embeddings directly, demonstrating their vulnerability to adversarial perturbations.
  • results: The attack allows an adversary to align an arbitrary input with an adversary-chosen input in another modality, undermining downstream tasks that rely on the multi-modal embeddings.
    Abstract Multi-modal encoders map images, sounds, texts, videos, etc. into a single embedding space, aligning representations across modalities (e.g., associate an image of a dog with a barking sound). We show that multi-modal embeddings can be vulnerable to an attack we call "adversarial illusions." Given an input in any modality, an adversary can perturb it so as to make its embedding close to that of an arbitrary, adversary-chosen input in another modality. Illusions thus enable the adversary to align any image with any text, any text with any sound, etc. Adversarial illusions exploit proximity in the embedding space and are thus agnostic to downstream tasks. Using ImageBind embeddings, we demonstrate how adversarially aligned inputs, generated without knowledge of specific downstream tasks, mislead image generation, text generation, and zero-shot classification.
    摘要 多模态编码器将图像、声音、文本、视频等转换到单一的嵌入空间中,使表示之间匹配(例如,将一张狗图像与一个叫声相对应)。我们表明,多模态嵌入可能敏感于我们称为“ adversarial 幻觉”的攻击。给定任意模式的输入,敌方可以对其进行扰动,使其嵌入接近另一种选择的敌方输入的嵌入。幻觉如此能让敌方将任意图像与任意文本、任意声音等相对应。这些幻觉利用嵌入空间的近似性,因此不受下游任务的限制。使用 ImageBind 嵌入,我们示例了如何使用不知道下游任务的情况,通过生成 adversarially 对齐的输入,使图像生成、文本生成和零学习分类发生幻觉。
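Structurally, the attack is an optimization in embedding space: perturb an input within a small budget so its embedding moves toward the embedding of an adversary-chosen input from another modality. The PGD-style sketch below uses a random toy encoder in place of ImageBind, so it shows only the shape of the optimization loop and is not a working exploit.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Toy stand-in for a frozen multi-modal image encoder.
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 128))
for p in encoder.parameters():
    p.requires_grad_(False)

image = torch.rand(1, 3, 32, 32)                       # benign input (e.g. an apple photo)
target_emb = F.normalize(torch.randn(1, 128), dim=-1)  # embedding of the adversary-chosen input from another modality

delta = torch.zeros_like(image, requires_grad=True)
eps, step = 8 / 255, 1 / 255                           # perturbation budget and step size

for _ in range(100):
    emb = F.normalize(encoder(image + delta), dim=-1)
    loss = 1 - F.cosine_similarity(emb, target_emb).mean()  # pull the embedding toward the target
    loss.backward()
    with torch.no_grad():
        delta -= step * delta.grad.sign()              # signed gradient step
        delta.clamp_(-eps, eps)                        # stay within the L_inf budget
        delta.grad.zero_()

final_emb = F.normalize(encoder(image + delta), dim=-1)
print("final cosine similarity:", F.cosine_similarity(final_emb, target_emb).item())
```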

WS-SfMLearner: Self-supervised Monocular Depth and Ego-motion Estimation on Surgical Videos with Unknown Camera Parameters

  • paper_url: http://arxiv.org/abs/2308.11776
  • repo_url: None
  • paper_authors: Ange Lou, Jack Noble
  • for: The goal is to build a self-supervised depth and ego-motion estimation system that can recover depth maps and camera parameters from surgical videos.
  • methods: A cost-volume-based supervision scheme provides auxiliary supervision so the system can predict camera intrinsic parameters.
  • results: Experimental results show that the proposed method improves the accuracy of the estimated camera parameters, ego-motion, and depth.
    Abstract Depth estimation in surgical video plays a crucial role in many image-guided surgery procedures. However, it is difficult and time consuming to create depth map ground truth datasets in surgical videos due in part to inconsistent brightness and noise in the surgical scene. Therefore, building an accurate and robust self-supervised depth and camera ego-motion estimation system is gaining more attention from the computer vision community. Although several self-supervision methods alleviate the need for ground truth depth maps and poses, they still need known camera intrinsic parameters, which are often missing or not recorded. Moreover, the camera intrinsic prediction methods in existing works depend heavily on the quality of datasets. In this work, we aimed to build a self-supervised depth and ego-motion estimation system which can predict not only accurate depth maps and camera pose, but also camera intrinsic parameters. We proposed a cost-volume-based supervision manner to give the system auxiliary supervision for camera parameters prediction. The experimental results showed that the proposed method improved the accuracy of estimated camera parameters, ego-motion, and depth estimation.
    摘要 depth 估算在手术视频中发挥重要作用,但创建深度地图真实数据集在手术视频中具有不一致的亮度和噪声,因此建立精准和可靠的自我超视方法在计算机视觉社区中受到更多的关注。虽然一些自我超视方法可以减少需要深度地图和姿态的真实数据,但它们仍然需要已知的摄像头内参数,这些参数通常缺失或者不记录。此外,现有的摄像头内参数预测方法依赖于数据质量的改进。在这项工作中,我们希望建立一个可以预测不仅准确的深度地图和摄像头姿态,还可以预测摄像头内参数的自我超视系统。我们提出了基于Cost Volume的超视方式,以供系统进行摄像头参数预测的auxiliary超视。实验结果表明,我们的方法可以提高摄像头参数、ego-动作和深度估算的准确性。

3ET: Efficient Event-based Eye Tracking using a Change-Based ConvLSTM Network

  • paper_url: http://arxiv.org/abs/2308.11771
  • repo_url: None
  • paper_authors: Qinyu Chen, Zuowen Wang, Shih-Chii Liu, Chang Gao
  • for: This paper develops a Change-Based Convolutional Long Short-Term Memory (CB-ConvLSTM) model for event-based eye tracking, a key component of next-generation wearable healthcare technology such as AR/VR headsets.
  • methods: It leverages retina-inspired event cameras, whose low-latency response and sparse output event streams contrast with traditional frame-based cameras; the CB-ConvLSTM architecture efficiently extracts spatio-temporal features for pupil tracking and outperforms conventional CNN structures.
  • results: A delta-encoded recurrent path increases activation sparsity, reducing arithmetic operations by roughly 4.7$\times$ without loss of accuracy, which makes the model well suited to real-time eye tracking on resource-constrained devices. Code and dataset: \url{https://github.com/qinche106/cb-convlstm-eyetracking}.
    Abstract This paper presents a sparse Change-Based Convolutional Long Short-Term Memory (CB-ConvLSTM) model for event-based eye tracking, key for next-generation wearable healthcare technology such as AR/VR headsets. We leverage the benefits of retina-inspired event cameras, namely their low-latency response and sparse output event stream, over traditional frame-based cameras. Our CB-ConvLSTM architecture efficiently extracts spatio-temporal features for pupil tracking from the event stream, outperforming conventional CNN structures. Utilizing a delta-encoded recurrent path enhancing activation sparsity, CB-ConvLSTM reduces arithmetic operations by approximately 4.7$\times$ without losing accuracy when tested on a \texttt{v2e}-generated event dataset of labeled pupils. This increase in efficiency makes it ideal for real-time eye tracking in resource-constrained devices. The project code and dataset are openly available at \url{https://github.com/qinche106/cb-convlstm-eyetracking}.
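A minimal sketch of the delta-encoding idea behind the sparsity gains (illustrative only; the actual CB-ConvLSTM cell in the repository above differs): only input changes that exceed a threshold are propagated into the recurrent path, so on sparse event streams most multiply-accumulate operations can be skipped.

```python
import torch
import torch.nn as nn

class DeltaGate(nn.Module):
    """Delta encoding for a recurrent input path: forward only the changes in the
    input that exceed a threshold, keeping the propagated tensor sparse."""
    def __init__(self, threshold=0.05):
        super().__init__()
        self.threshold = threshold
        self.prev = None  # reference state from the previous time step

    def forward(self, x):
        if self.prev is None or self.prev.shape != x.shape:
            self.prev = torch.zeros_like(x)
        delta = x - self.prev
        delta = delta * (delta.abs() > self.threshold)  # suppress sub-threshold changes
        self.prev = self.prev + delta                   # reference tracks only the kept changes
        return delta                                    # mostly zeros on sparse event input
```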

Halo: Estimation and Reduction of Hallucinations in Open-Source Weak Large Language Models

  • paper_url: http://arxiv.org/abs/2308.11764
  • repo_url: https://github.com/engsalem/halo
  • paper_authors: Mohamed Elaraby, Mengyin Lu, Jacob Dunn, Xueying Zhang, Yu Wang, Shizhu Liu
  • for: The goal is to quantify and mitigate hallucinations in weaker open-source large language models (LLMs).
  • methods: The paper introduces HaloCheck, a lightweight black-box, knowledge-free framework for quantifying the severity of hallucinations in LLMs, and explores knowledge injection and teacher-student approaches to mitigate them.
  • results: Experiments demonstrate that these techniques effectively reduce hallucinations in such LLMs, particularly in challenging domains.
    Abstract Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP). Although convenient for research and practical applications, open-source LLMs with fewer parameters often suffer from severe hallucinations compared to their larger counterparts. This paper focuses on measuring and reducing hallucinations in BLOOM 7B, a representative of such weaker open-source LLMs that are publicly available for research and commercial applications. We introduce HaloCheck, a lightweight BlackBox knowledge-free framework designed to quantify the severity of hallucinations in LLMs. Additionally, we explore techniques like knowledge injection and teacher-student approaches to alleviate hallucinations in low-parameter LLMs. Our experiments effectively demonstrate the reduction of hallucinations in challenging domains for these LLMs.
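The abstract does not spell out HaloCheck's scoring procedure, so the snippet below is only a hypothetical illustration of what a black-box, knowledge-free severity estimate can look like: sample several answers to the same prompt and treat low mutual consistency as a sign of hallucination. The function names and the toy similarity measure are assumptions, not the paper's method.

```python
from itertools import combinations

def consistency_score(samples, similarity):
    """Hypothetical black-box severity proxy (not HaloCheck's actual metric):
    mean pairwise similarity among several sampled answers to the same prompt.
    Lower scores suggest the model is hallucinating rather than recalling facts."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        return 1.0
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)

def jaccard(a, b):
    """Toy similarity; an NLI or embedding model would normally be used instead."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / max(len(sa | sb), 1)

answers = ["BLOOM was released in 2022.",
           "BLOOM came out in 2022.",
           "BLOOM was first released in 2019."]
print(round(consistency_score(answers, jaccard), 3))
```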

VBMO: Voting-Based Multi-Objective Path Planning

  • paper_url: http://arxiv.org/abs/2308.11755
  • repo_url: None
  • paper_authors: Raj Korpan
  • for: This work develops VBMO, an algorithm that generates optimal single-objective plans, evaluates each of them with respect to the other objectives, and selects one with a voting mechanism.
  • methods: VBMO uses no hand-tuned weights and no evolutionary algorithm; instead it considers how a plan that is optimal in one objective performs with respect to the others, using three voting mechanisms: range, Borda, and combined approval.
  • results: In diverse and complex environments, VBMO efficiently produces plans that satisfy multiple objectives.
    Abstract This paper presents VBMO, the Voting-Based Multi-Objective path planning algorithm, that generates optimal single-objective plans, evaluates each of them with respect to the other objectives, and selects one with a voting mechanism. VBMO does not use hand-tuned weights, consider the multiple objectives at every step of search, or use an evolutionary algorithm. Instead, it considers how a plan that is optimal in one objective may perform well with respect to others. VBMO incorporates three voting mechanisms: range, Borda, and combined approval. Extensive evaluation in diverse and complex environments demonstrates the algorithm's ability to efficiently produce plans that satisfy multiple objectives.
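A small sketch of the Borda variant of the voting step described above (the cost matrix and objective names are made up for illustration; range and combined-approval voting are analogous):

```python
import numpy as np

def borda_vote(costs):
    """Pick a plan by Borda voting. `costs[i, j]` is the cost of plan i under
    objective j (lower is better). Each objective ranks all plans and awards
    Borda points (best gets n-1, worst gets 0); the plan with the most points wins."""
    n_plans, n_objectives = costs.shape
    points = np.zeros(n_plans)
    for j in range(n_objectives):
        order = np.argsort(costs[:, j])            # best plan first for objective j
        for rank, plan in enumerate(order):
            points[plan] += n_plans - 1 - rank
    return int(np.argmax(points))

# Example: three single-objective-optimal plans scored on path length, risk, energy.
costs = np.array([[10.0, 0.2, 5.0],
                  [12.0, 0.1, 4.0],
                  [11.0, 0.3, 6.0]])
print(borda_vote(costs))  # index of the winning plan
```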

Multi-Instance Adversarial Attack on GNN-Based Malicious Domain Detection

  • paper_url: http://arxiv.org/abs/2308.11754
  • repo_url: None
  • paper_authors: Mahmoud Nazzal, Issa Khalil, Abdallah Khreishah, NhatHai Phan, Yao Ma
  • for: The context is malicious domain detection (MDD), i.e., determining whether an Internet domain is associated with cyber-attacks.
  • methods: GNN-based MDD builds a domain maliciousness graph (DMG) from DNS logs and uses a graph neural network to infer how malicious each domain is.
  • results: The paper shows that existing single-node manipulation (evasion) techniques cannot handle adversaries that manipulate several nodes at once, and proposes MintA, a black-box multi-instance inference-time attack that needs no access to model internals and achieves an attack success rate above 80% on real-world data.
    Abstract Malicious domain detection (MDD) is an open security challenge that aims to detect if an Internet domain is associated with cyber-attacks. Among many approaches to this problem, graph neural networks (GNNs) are deemed highly effective. GNN-based MDD uses DNS logs to represent Internet domains as nodes in a maliciousness graph (DMG) and trains a GNN to infer their maliciousness by leveraging identified malicious domains. Since this method relies on accessible DNS logs to construct DMGs, it exposes a vulnerability for adversaries to manipulate their domain nodes' features and connections within DMGs. Existing research mainly concentrates on threat models that manipulate individual attacker nodes. However, adversaries commonly generate multiple domains to achieve their goals economically and avoid detection. Their objective is to evade discovery across as many domains as feasible. In this work, we call the attack that manipulates several nodes in the DMG concurrently a multi-instance evasion attack. We present theoretical and empirical evidence that the existing single-instance evasion techniques are inadequate to launch multi-instance evasion attacks against GNN-based MDDs. Therefore, we introduce MintA, an inference-time multi-instance adversarial attack on GNN-based MDDs. MintA enhances node and neighborhood evasiveness through optimized perturbations and operates successfully with only black-box access to the target model, eliminating the need for knowledge about the model's specifics or non-adversary nodes. We formulate an optimization challenge for MintA, achieving an approximate solution. Evaluating MintA on a leading GNN-based MDD technique with real-world data showcases an attack success rate exceeding 80%. These findings act as a warning for security experts, underscoring GNN-based MDDs' susceptibility to practical attacks that can undermine their effectiveness and benefits.

Patient Clustering via Integrated Profiling of Clinical and Digital Data

  • paper_url: http://arxiv.org/abs/2308.11748
  • repo_url: None
  • paper_authors: Dongjin Choi, Andy Xiang, Ozgur Ozturk, Deep Shrestha, Barry Drake, Hamid Haidarian, Faizan Javed, Haesun Park
  • for: This work develops a profile-based patient clustering model for clinical data in healthcare.
  • methods: A constrained low-rank approximation method combines patients' clinical data with their digital interaction data (browsing and search) to construct patient profiles, producing nonnegative embedding vectors that serve as low-dimensional patient representations.
  • results: Evaluated on real-world patient data from a healthcare web portal, the method outperforms baselines in clustering coherence and recommendation accuracy.
    Abstract We introduce a novel profile-based patient clustering model designed for clinical data in healthcare. By utilizing a method grounded on constrained low-rank approximation, our model takes advantage of patients' clinical data and digital interaction data, including browsing and search, to construct patient profiles. As a result of the method, nonnegative embedding vectors are generated, serving as a low-dimensional representation of the patients. Our model was assessed using real-world patient data from a healthcare web portal, with a comprehensive evaluation approach which considered clustering and recommendation capabilities. In comparison to other baselines, our approach demonstrated superior performance in terms of clustering coherence and recommendation accuracy.
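The paper's exact constrained low-rank formulation is not given in the abstract; the sketch below uses plain nonnegative matrix factorization on synthetic data only to illustrate how nonnegative, low-dimensional patient embeddings can be derived from combined clinical and digital features and then used for clustering.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
clinical = rng.random((200, 40))    # synthetic stand-in for diagnoses, labs, etc.
digital = rng.random((200, 60))     # synthetic stand-in for browse/search features
X = np.hstack([clinical, digital])  # one nonnegative feature matrix, one row per patient

model = NMF(n_components=8, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(X)          # (200, 8) nonnegative patient embeddings
clusters = W.argmax(axis=1)         # simplest assignment: dominant profile component
print(W.shape, np.bincount(clusters, minlength=8))
```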

Lifted Inference beyond First-Order Logic

  • paper_url: http://arxiv.org/abs/2308.11738
  • repo_url: None
  • paper_authors: Sagar Malhotra, Davide Bizzaro, Luciano Serafini
  • for: This paper studies the domain-liftability of Weighted First Order Model Counting (WFOMC), i.e., which logical fragments admit polynomial-time WFOMC and can therefore support efficient probabilistic inference.
  • methods: It introduces a new, general methodology called "counting by splitting" that handles structural constraints such as directed acyclic graphs, connected graphs, trees, and forests.
  • results: Using this methodology, the paper shows that $\mathrm{C^2}$ sentences remain domain-liftable when one of their relations is restricted to such structures, extending many earlier results (e.g., on directed acyclic graphs and phylogenetic networks) and providing a general framework for counting combinatorial structures.
    Abstract Weighted First Order Model Counting (WFOMC) is fundamental to probabilistic inference in statistical relational learning models. As WFOMC is known to be intractable in general ($\#$P-complete), logical fragments that admit polynomial time WFOMC are of significant interest. Such fragments are called domain liftable. Recent works have shown that the two-variable fragment of first order logic extended with counting quantifiers ($\mathrm{C^2}$) is domain-liftable. However, many properties of real-world data, like acyclicity in citation networks and connectivity in social networks, cannot be modeled in $\mathrm{C^2}$, or first order logic in general. In this work, we expand the domain liftability of $\mathrm{C^2}$ with multiple such properties. We show that any $\mathrm{C^2}$ sentence remains domain liftable when one of its relations is restricted to represent a directed acyclic graph, a connected graph, a tree (resp. a directed tree) or a forest (resp. a directed forest). All our results rely on a novel and general methodology of "counting by splitting". Besides their application to probabilistic inference, our results provide a general framework for counting combinatorial structures. We expand a vast array of previous results in discrete mathematics literature on directed acyclic graphs, phylogenetic networks, etc.
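For readers unfamiliar with the quantity being lifted, the standard symmetric WFOMC of a sentence $\phi$ over a domain of size $n$, with per-predicate weights $w$ and $\bar{w}$, is

$$\mathrm{WFOMC}(\phi, n) \;=\; \sum_{\omega \models \phi} \; \prod_{P} w(P)^{|\omega_P|}\,\bar{w}(P)^{\,n^{\mathrm{arity}(P)} - |\omega_P|},$$

where the sum ranges over models $\omega$ of $\phi$ on the domain $\{1,\dots,n\}$ and $|\omega_P|$ counts the ground atoms of predicate $P$ that are true in $\omega$. A fragment is domain-liftable when this sum can be computed in time polynomial in $n$; the paper's contribution is that adding the listed graph-structural axioms to a $\mathrm{C^2}$ sentence preserves this property. (This is the standard definition, included here for context; the notation may differ slightly from the paper's.)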

Knowledge Graph Prompting for Multi-Document Question Answering

  • paper_url: http://arxiv.org/abs/2308.11730
  • repo_url: None
  • paper_authors: Yu Wang, Nedim Lipka, Ryan A. Rossi, Alexa Siu, Ruiyi Zhang, Tyler Derr
  • for: This method helps large language models perform better on multi-document question answering (MD-QA), especially when a deep understanding of the logical associations among different documents is required.
  • methods: The approach builds a knowledge graph over the documents and uses an LM-guided graph traversal module to navigate semantic/lexical similarity and structural relations across documents.
  • results: Extensive experiments show that the method improves LLM performance on MD-QA and reduces retrieval latency.
    Abstract The 'pre-train, prompt, predict' paradigm of large language models (LLMs) has achieved remarkable success in open-domain question answering (OD-QA). However, few works explore this paradigm in the scenario of multi-document question answering (MD-QA), a task demanding a thorough understanding of the logical associations among the contents and structures of different documents. To fill this crucial gap, we propose a Knowledge Graph Prompting (KGP) method to formulate the right context in prompting LLMs for MD-QA, which consists of a graph construction module and a graph traversal module. For graph construction, we create a knowledge graph (KG) over multiple documents with nodes symbolizing passages or document structures (e.g., pages/tables), and edges denoting the semantic/lexical similarity between passages or intra-document structural relations. For graph traversal, we design an LM-guided graph traverser that navigates across nodes and gathers supporting passages assisting LLMs in MD-QA. The constructed graph serves as the global ruler that regulates the transitional space among passages and reduces retrieval latency. Concurrently, the LM-guided traverser acts as a local navigator that gathers pertinent context to progressively approach the question and guarantee retrieval quality. Extensive experiments underscore the efficacy of KGP for MD-QA, signifying the potential of leveraging graphs in enhancing the prompt design for LLMs. Our code is at https://github.com/YuWVandy/KG-LLM-MDQA.
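The sketch below reduces the two modules described in the abstract to a toy form, with TF-IDF similarity standing in for the paper's passage-similarity edges and a generic scoring callback standing in for the LM-guided traverser; structural (page/table) nodes are omitted. Names, the neighbor count, and the budget are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def build_passage_graph(passages, k=3):
    """Graph construction: each passage is a node linked to its k most similar passages."""
    tfidf = TfidfVectorizer().fit_transform(passages)
    sim = cosine_similarity(tfidf)
    np.fill_diagonal(sim, -1.0)  # no self-edges
    return {i: list(np.argsort(sim[i])[::-1][:k]) for i in range(len(passages))}

def traverse(graph, passages, question, score, start, budget=5):
    """Guided traversal: repeatedly hop to the neighbor the scorer (a stand-in for the
    LM guide) judges most useful for the question, collecting supporting passages."""
    visited, context, node = {start}, [passages[start]], start
    for _ in range(budget - 1):
        candidates = [n for n in graph[node] if n not in visited]
        if not candidates:
            break
        node = max(candidates, key=lambda n: score(question, passages[n]))
        visited.add(node)
        context.append(passages[node])
    return context  # passages handed to the LLM as prompt context
```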

Efficient Benchmarking (of Language Models)

  • paper_url: http://arxiv.org/abs/2308.11696
  • repo_url: https://github.com/sumankrsh/Sentiment-Analysis.ipynb
  • paper_authors: Yotam Perlitz, Elron Bandel, Ariel Gera, Ofir Arviv, Liat Ein-Dor, Eyal Shnarch, Noam Slonim, Michal Shmueli-Scheuer, Leshem Choshen
  • for: This work aims to make language model (LM) benchmark evaluation more efficient without compromising reliability.
  • methods: Using the HELM benchmark as a test case, it investigates how different benchmark design choices affect the computation-reliability tradeoff, and proposes a new measure, Decision Impact on Reliability (DIoR), to assess how such decisions affect reliability.
  • results: The current HELM leader can change merely by removing a low-ranked model from the benchmark, and a handful of examples suffices to recover the correct ranking, whereas a slightly different choice of HELM scenarios changes rankings widely. Based on these findings, the paper gives concrete recommendations that cut computation by 100x or more with minimal loss of benchmark reliability.
    Abstract The increasing versatility of language models (LMs) has given rise to a new class of benchmarks that comprehensively assess a broad range of capabilities. Such benchmarks are associated with massive computational costs, reaching thousands of GPU hours per model. However, the efficiency aspect of these evaluation efforts has received little discussion in the literature. In this work we present the problem of Efficient Benchmarking, namely intelligently reducing the computation costs of LM evaluation without compromising reliability. Using the HELM benchmark as a test case, we investigate how different benchmark design choices affect the computation-reliability tradeoff. We propose to evaluate the reliability of such decisions by using a new measure, Decision Impact on Reliability (DIoR for short). We find, for example, that the current leader on HELM may change by merely removing a low-ranked model from the benchmark, and observe that a handful of examples suffice to obtain the correct benchmark ranking. Conversely, a slightly different choice of HELM scenarios varies ranking widely. Based on our findings we outline a set of concrete recommendations for more efficient benchmark design and utilization practices, leading to dramatic cost savings with minimal loss of benchmark reliability, often reducing computation by x100 or more.
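DIoR's exact formula is not given in the abstract, so the sketch below only illustrates the kind of computation-reliability question the paper studies: how stable the model ranking remains when the benchmark is subsampled. The score matrix and subsample size are synthetic.

```python
import numpy as np
from scipy.stats import kendalltau

def ranking_stability(scores, n_examples, n_trials=200, seed=0):
    """Mean Kendall-tau correlation between the full-benchmark model ranking and the
    ranking obtained from random subsets of `n_examples` examples.
    `scores[m, e]` is model m's score on example e. (Illustrative; not the paper's DIoR.)"""
    rng = np.random.default_rng(seed)
    full_means = scores.mean(axis=1)
    taus = []
    for _ in range(n_trials):
        idx = rng.choice(scores.shape[1], size=n_examples, replace=False)
        tau, _ = kendalltau(full_means, scores[:, idx].mean(axis=1))
        taus.append(tau)
    return float(np.mean(taus))

# Toy example: 5 models, 1000 examples; how reliable is a 50-example subsample?
rng = np.random.default_rng(1)
scores = rng.random((5, 1000)) + np.linspace(0.0, 0.2, 5)[:, None]  # synthetic quality gaps
print(round(ranking_stability(scores, n_examples=50), 3))
```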

Tryage: Real-time, intelligent Routing of User Prompts to Large Language Models

  • paper_url: http://arxiv.org/abs/2308.11601
  • repo_url: None
  • paper_authors: Surya Narayanan Hari, Matt Thomson
  • for: This work proposes Tryage, a context-aware routing system that automatically selects suitable expert models from a model library for users' varied workflows and data domains, while accounting for computational, security, and recency concerns.
  • methods: Tryage uses a language model router to predict downstream model performance on each prompt, then makes a routing decision with an objective function that combines the performance predictions with user goals and constraints expressed as flags (e.g., model size, recency, security, verbosity, readability).
  • results: Across heterogeneous datasets including code, text, clinical data, and patents, Tryage surpasses Gorilla and GPT-3.5 Turbo at dynamic model selection, identifying the optimal model with 50.9% accuracy versus 23.6% for GPT-3.5 Turbo and 10.8% for Gorilla.
    Abstract The introduction of the transformer architecture and the self-attention mechanism has led to an explosive production of language models trained on specific downstream tasks and data domains. With over 200,000 models in the Hugging Face ecosystem, users grapple with selecting and optimizing models to suit multifaceted workflows and data domains while addressing computational, security, and recency concerns. There is an urgent need for machine learning frameworks that can eliminate the burden of model selection and customization and unleash the incredible power of the vast emerging model library for end users. Here, we propose a context-aware routing system, Tryage, that leverages a language model router for optimal selection of expert models from a model library based on analysis of individual input prompts. Inspired by the thalamic router in the brain, Tryage employs a perceptive router to predict down-stream model performance on prompts and, then, makes a routing decision using an objective function that integrates performance predictions with user goals and constraints that are incorporated through flags (e.g., model size, model recency). Tryage allows users to explore a Pareto front and automatically trade-off between task accuracy and secondary goals including minimization of model size, recency, security, verbosity, and readability. Across heterogeneous data sets that include code, text, clinical data, and patents, the Tryage framework surpasses Gorilla and GPT3.5 turbo in dynamic model selection, identifying the optimal model with an accuracy of 50.9%, compared to 23.6% by GPT 3.5 Turbo and 10.8% by Gorilla. Conceptually, Tryage demonstrates how routing models can be applied to program and control the behavior of multi-model LLM systems to maximize efficient use of the expanding and evolving language model ecosystem.
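A toy sketch of the kind of routing decision described above. The linear objective, attribute names, and the performance predictor are illustrative assumptions rather than the paper's actual objective function.

```python
def route(prompt, models, predict_perf, weights):
    """Pick the model maximizing predicted performance minus penalties for
    user-flagged constraints (here: model size and age). `models[name]` holds
    simple attributes; `predict_perf(prompt, name)` is a stand-in for the router's
    learned performance predictor."""
    def objective(name):
        attrs = models[name]
        return (predict_perf(prompt, name)
                - weights.get("size", 0.0) * attrs["size_gb"]
                - weights.get("recency", 0.0) * attrs["age_days"])
    return max(models, key=objective)

# Hypothetical usage with two made-up models and a trivial predictor.
models = {"small-code-model": {"size_gb": 3, "age_days": 30},
          "large-general-model": {"size_gb": 70, "age_days": 200}}
predict_perf = lambda prompt, name: 0.9 if ("code" in prompt and "code" in name) else 0.6
print(route("write python code to sort a list", models, predict_perf,
            weights={"size": 0.005, "recency": 0.001}))
```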

Practical Insights on Incremental Learning of New Human Physical Activity on the Edge

  • paper_url: http://arxiv.org/abs/2308.11691
  • repo_url: None
  • paper_authors: George Arvanitakis, Jingwei Zuo, Mthandazo Ndhlovu, Hakim Hacid
  • for: This paper explores the challenges of Edge-based learning, particularly in the context of limited data storage, computing power, and the number of learning classes.
  • methods: The paper uses the MAGNETO system to conduct experiments and demonstrate the challenges of Edge ML, using data collected from mobile sensors to learn human activities.
  • results: The paper highlights the challenges of Edge ML and offers valuable perspectives on how to address them.
    Abstract Edge Machine Learning (Edge ML), which shifts computational intelligence from cloud-based systems to edge devices, is attracting significant interest due to its evident benefits including reduced latency, enhanced data privacy, and decreased connectivity reliance. While these advantages are compelling, they introduce unique challenges absent in traditional cloud-based approaches. In this paper, we delve into the intricacies of Edge-based learning, examining the interdependencies among: (i) constrained data storage on Edge devices, (ii) limited computational power for training, and (iii) the number of learning classes. Through experiments conducted using our MAGNETO system, that focused on learning human activities via data collected from mobile sensors, we highlight these challenges and offer valuable perspectives on Edge ML.

Handling the inconsistency of systems of $\min\rightarrow$ fuzzy relational equations

  • paper_url: http://arxiv.org/abs/2308.12385
  • repo_url: None
  • paper_authors: Ismaïl Baaj
  • for: This article studies the inconsistency of systems of $\min-\rightarrow$ fuzzy relational equations.
  • methods: It gives analytical formulas for computing the Chebyshev distance $\nabla = \inf_{d \in \mathcal{D}} \Vert \beta - d \Vert$ associated with a system of $\min-\rightarrow$ fuzzy relational equations.
  • results: The Chebyshev distance $\nabla$ is shown to be the lower bound of the solutions of a vector inequality, whichever residual implicator (G\"odel, Goguen, or Lukasiewicz) is used; moreover, for $\min-\rightarrow_{G}$ systems $\nabla$ may only be an infimum, whereas for $\min-\rightarrow_{GG}$ and $\min-\rightarrow_{L}$ systems it is always a minimum.
    Abstract In this article, we study the inconsistency of systems of $\min-\rightarrow$ fuzzy relational equations. We give analytical formulas for computing the Chebyshev distances $\nabla = \inf_{d \in \mathcal{D}} \Vert \beta - d \Vert$ associated to systems of $\min-\rightarrow$ fuzzy relational equations of the form $\Gamma \Box_{\rightarrow}^{\min} x = \beta$, where $\rightarrow$ is a residual implicator among the G\"odel implication $\rightarrow_G$, the Goguen implication $\rightarrow_{GG}$ or Lukasiewicz's implication $\rightarrow_L$ and $\mathcal{D}$ is the set of second members of consistent systems defined with the same matrix $\Gamma$. The main preliminary result that allows us to obtain these formulas is that the Chebyshev distance $\nabla$ is the lower bound of the solutions of a vector inequality, whatever the residual implicator used. Finally, we show that, in the case of the $\min-\rightarrow_{G}$ system, the Chebyshev distance $\nabla$ may be an infimum, while it is always a minimum for $\min-\rightarrow_{GG}$ and $\min-\rightarrow_{L}$ systems.
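For reference, the three residual implicators named in the abstract have the standard definitions

$$x \rightarrow_G y = \begin{cases} 1 & \text{if } x \le y \\ y & \text{otherwise,} \end{cases} \qquad x \rightarrow_{GG} y = \begin{cases} 1 & \text{if } x \le y \\ y/x & \text{otherwise,} \end{cases} \qquad x \rightarrow_L y = \min(1,\, 1 - x + y),$$

and the $\min$-$\rightarrow$ composition in a system $\Gamma \Box_{\rightarrow}^{\min} x = \beta$ requires, for each row $i$, that $\min_{j} (\gamma_{ij} \rightarrow x_j) = \beta_i$. (The implicator definitions are standard; the row-wise reading of the composition is an interpretation of the abstract's notation rather than a quotation from the paper.)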