cs.AI - 2023-10-07

Balancing Specialized and General Skills in LLMs: The Impact of Modern Tuning and Data Strategy

  • paper_url: http://arxiv.org/abs/2310.04945
  • repo_url: None
  • paper_authors: Zheng Zhang, Chen Zheng, Da Tang, Ke Sun, Yukun Ma, Yingtong Bu, Xun Zhou, Liang Zhao
  • for: Fine-tuning and evaluating large language models (LLMs) for specialized monetization tasks.
  • methods: The methodology has three components: 1) carefully blending in-domain and general-purpose data during fine-tuning to strike an optimal balance between general and specialized capabilities; 2) designing a comprehensive evaluation framework of 45 questions that assesses performance on functionally relevant dimensions such as reliability, consistency, and business impact; 3) analyzing how model size and continual training influence the metrics, to guide efficient resource allocation.
  • results: The approach balances general language proficiency with domain-specific skills and provides actionable guidance for businesses and researchers adapting LLMs to specialized tasks.
    Abstract This paper introduces a multifaceted methodology for fine-tuning and evaluating large language models (LLMs) for specialized monetization tasks. The goal is to balance general language proficiency with domain-specific skills. The methodology has three main components: 1) Carefully blending in-domain and general-purpose data during fine-tuning to achieve an optimal balance between general and specialized capabilities; 2) Designing a comprehensive evaluation framework with 45 questions tailored to assess performance on functionally relevant dimensions like reliability, consistency, and business impact; 3) Analyzing how model size and continual training influence metrics to guide efficient resource allocation during fine-tuning. The paper details the design, data collection, analytical techniques, and results validating the proposed frameworks. It aims to provide businesses and researchers with actionable insights on effectively adapting LLMs for specialized contexts. We also intend to make public the comprehensive evaluation framework, which includes the 45 tailored questions and their respective scoring guidelines, to foster transparency and collaboration in adapting LLMs for specialized tasks.

Reliable Test-Time Adaptation via Agreement-on-the-Line

  • paper_url: http://arxiv.org/abs/2310.04941
  • repo_url: None
  • paper_authors: Eungyeup Kim, Mingjie Sun, Aditi Raghunathan, Zico Kolter
  • for: Improving the reliability of test-time adaptation (TTA) methods, in particular how to evaluate, calibrate, and tune adapted models under different distribution shifts.
  • methods: A broad empirical study of models adapted with a wide range of TTA methods and hyperparameters across many distribution shifts, assessing how reliably the adapted models can be evaluated, calibrated, and tuned.
  • results: TTAed models strongly exhibit the agreement-on-the-line phenomenon: agreement and accuracy follow consistent linear trends across distribution shifts, even where the phenomenon fails to hold for vanilla (unadapted) models. Building on this, the paper estimates OOD accuracy without labeled data, calibrates TTAed models without label information, and selects TTA hyperparameters without labeled validation data, achieving results close to those obtained with ground-truth labels in terms of both OOD accuracy and calibration error (a minimal sketch of the agreement-based accuracy estimate follows the abstract).
    Abstract Test-time adaptation (TTA) methods aim to improve robustness to distribution shifts by adapting models using unlabeled data from the shifted test distribution. However, there remain unresolved challenges that undermine the reliability of TTA, which include difficulties in evaluating TTA performance, miscalibration after TTA, and unreliable hyperparameter tuning for adaptation. In this work, we make a notable and surprising observation that TTAed models strongly show the agreement-on-the-line phenomenon (Baek et al., 2022) across a wide range of distribution shifts. We find such linear trends occur consistently in a wide range of models adapted with various hyperparameters, and persist in distributions where the phenomenon fails to hold in vanilla models (i.e., before adaptation). We leverage these observations to make TTA methods more reliable in three perspectives: (i) estimating OOD accuracy (without labeled data) to determine when TTA helps and when it hurts, (ii) calibrating TTAed models without label information, and (iii) reliably determining hyperparameters for TTA without any labeled validation data. Through extensive experiments, we demonstrate that various TTA methods can be precisely evaluated, both in terms of their improvements and degradations. Moreover, our proposed methods on unsupervised calibration and hyperparameters tuning for TTA achieve results close to the ones assuming access to ground-truth labels, in terms of both OOD accuracy and calibration error.
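
As a concrete illustration of how agreement-on-the-line can be used, below is a minimal sketch (illustrative names, not the authors' code) of a label-free OOD accuracy estimate in the spirit of ALine-S from Baek et al. (2022): fit the linear agreement trend in probit space from pairs of models, then transfer that line to the labeled in-distribution accuracies.

```python
import numpy as np
from scipy.stats import norm  # probit transform (norm.ppf) and its inverse (norm.cdf)

def pairwise_agreement(preds):
    """Fraction of matching hard predictions for every pair of models.
    preds: array of shape (n_models, n_examples)."""
    n = preds.shape[0]
    agree = [(preds[i] == preds[j]).mean() for i in range(n) for j in range(i + 1, n)]
    return np.clip(agree, 1e-4, 1 - 1e-4)  # keep the probit finite

def estimate_ood_accuracy(id_preds, ood_preds, id_acc):
    """Label-free OOD accuracy estimate in the spirit of ALine-S (Baek et al., 2022).
    id_preds, ood_preds: per-model hard predictions on unlabeled ID/OOD data.
    id_acc: labeled in-distribution accuracies, one per model."""
    a_id, a_ood = pairwise_agreement(id_preds), pairwise_agreement(ood_preds)
    # Agreement-on-the-line: fit slope/bias of the linear trend in probit space.
    slope, bias = np.polyfit(norm.ppf(a_id), norm.ppf(a_ood), deg=1)
    # Accuracy-on-the-line: apply the same line to the known ID accuracies.
    id_acc = np.clip(np.asarray(id_acc, dtype=float), 1e-4, 1 - 1e-4)
    return norm.cdf(slope * norm.ppf(id_acc) + bias)
```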

Diff-Transfer: Model-based Robotic Manipulation Skill Transfer via Differentiable Physics Simulation

  • paper_url: http://arxiv.org/abs/2310.04930
  • repo_url: None
  • paper_authors: Yuqi Xiang, Feitong Chen, Qinsi Wang, Yang Gang, Xiang Zhang, Xinghao Zhu, Xingyu Liu, Lin Shao
  • for: Enabling intelligent robots to transfer mastered skills to similar yet novel manipulation tasks.
  • methods: A new transfer framework, $\textit{Diff-Transfer}$, that leverages differentiable physics simulation: it discovers a feasible path in task space from the source task to the target task and, at each pair of adjacent sub-tasks along that path, adapts known actions to solve the next sub-task, with the adaptation guided by gradient information from the differentiable simulation; sub-tasks are generated by a novel path-planning method based on $Q$-learning with a task-level state and reward.
  • results: In simulation, $\textit{Diff-Transfer}$ executes four challenging robotic manipulation transfer tasks, demonstrating its efficacy through comprehensive experiments; supplementary material and videos are available at https://sites.google.com/view/difftransfer.
    Abstract The capability to transfer mastered skills to accomplish a range of similar yet novel tasks is crucial for intelligent robots. In this work, we introduce $\textit{Diff-Transfer}$, a novel framework leveraging differentiable physics simulation to efficiently transfer robotic skills. Specifically, $\textit{Diff-Transfer}$ discovers a feasible path within the task space that brings the source task to the target task. At each pair of adjacent points along this task path, which is two sub-tasks, $\textit{Diff-Transfer}$ adapts known actions from one sub-task to tackle the other sub-task successfully. The adaptation is guided by the gradient information from differentiable physics simulations. We propose a novel path-planning method to generate sub-tasks, leveraging $Q$-learning with a task-level state and reward. We implement our framework in simulation experiments and execute four challenging transfer tasks on robotic manipulation, demonstrating the efficacy of $\textit{Diff-Transfer}$ through comprehensive experiments. Supplementary and Videos are on the website https://sites.google.com/view/difftransfer

Crystal: Introspective Reasoners Reinforced with Self-Feedback

  • paper_url: http://arxiv.org/abs/2310.04921
  • repo_url: https://github.com/liujch1998/crystal
  • paper_authors: Jiacheng Liu, Ramakanth Pasunuru, Hannaneh Hajishirzi, Yejin Choi, Asli Celikyilmaz
  • for: The paper aims to improve the performance and interpretability of commonsense reasoning using knowledge-augmented reasoning methods.
  • methods: The proposed method, called Crystal, introspects for knowledge statements related to a given question and makes an informed prediction grounded in the previously introspected knowledge. The knowledge introspection and knowledge-grounded reasoning modes of the model are tuned via reinforcement learning to mutually adapt.
  • results: Crystal significantly outperforms both the standard supervised finetuning and chain-of-thought distilled methods, and enhances the transparency of the commonsense reasoning process.
    Abstract Extensive work has shown that the performance and interpretability of commonsense reasoning can be improved via knowledge-augmented reasoning methods, where the knowledge that underpins the reasoning process is explicitly verbalized and utilized. However, existing implementations, including "chain-of-thought" and its variants, fall short in capturing the introspective nature of knowledge required in commonsense reasoning, and in accounting for the mutual adaptation between the generation and utilization of knowledge. We propose a novel method to develop an introspective commonsense reasoner, Crystal. To tackle commonsense problems, it first introspects for knowledge statements related to the given question, and subsequently makes an informed prediction that is grounded in the previously introspected knowledge. The knowledge introspection and knowledge-grounded reasoning modes of the model are tuned via reinforcement learning to mutually adapt, where the reward derives from the feedback given by the model itself. Experiments show that Crystal significantly outperforms both the standard supervised finetuning and chain-of-thought distilled methods, and enhances the transparency of the commonsense reasoning process. Our work ultimately validates the feasibility and potential of reinforcing a neural model with self-feedback.

Robust Network Pruning With Sparse Entropic Wasserstein Regression

  • paper_url: http://arxiv.org/abs/2310.04918
  • repo_url: None
  • paper_authors: Lei You, Hei Victor Cheng
  • for: An efficient neural network pruning technique that copes with noisy gradients encountered when computing the empirical Fisher Information Matrix (FIM).
  • methods: An entropic Wasserstein regression (EWR) formulation that exploits the geometric properties of the optimal transport (OT) problem; neighborhood interpolation across data points mitigates gradient noise while preserving covariance information.
  • results: Extensive experiments on various networks show performance comparable to state-of-the-art (SoTA) pruning algorithms, with larger gains when the network size or target sparsity is large or when gradients are noisy; for MobileNetV1 with fewer than one quarter of the parameters remaining, the method gains about 6% in accuracy and 8% in test loss.
    Abstract This study unveils a cutting-edge technique for neural network pruning that judiciously addresses noisy gradients during the computation of the empirical Fisher Information Matrix (FIM). We introduce an entropic Wasserstein regression (EWR) formulation, capitalizing on the geometric attributes of the optimal transport (OT) problem. This is analytically showcased to excel in noise mitigation by adopting neighborhood interpolation across data points. The unique strength of the Wasserstein distance is its intrinsic ability to strike a balance between noise reduction and covariance information preservation. Extensive experiments performed on various networks show comparable performance of the proposed method with state-of-the-art (SoTA) network pruning algorithms. Our proposed method outperforms the SoTA when the network size or the target sparsity is large, the gain is even larger with the existence of noisy gradients, possibly from noisy data, analog memory, or adversarial attacks. Notably, our proposed method achieves a gain of 6% improvement in accuracy and 8% improvement in testing loss for MobileNetV1 with less than one-fourth of the network parameters remaining.

On Accelerating Diffusion-based Molecular Conformation Generation in SE(3)-invariant Space

  • paper_url: http://arxiv.org/abs/2310.04915
  • repo_url: None
  • paper_authors: Zihan Zhou, Ruiying Liu, Tianshu Yu
  • for: Accelerating diffusion-based generative models for molecular conformation generation in SE(3)-invariant space, to make them efficient enough for real-world use.
  • methods: A systematic study of the diffusion mechanism in SE(3)-invariant space through the lens of the approximation errors induced by existing methods, leading to a more precise approximation in the context of projected differential equations and a new acceleration scheme; theoretical analysis and empirical evidence relate hyperparameters to these errors.
  • results: Experiments show that the scheme generates high-quality conformations with a 50x-100x speedup over existing methods.
    Abstract Diffusion-based generative models in SE(3)-invariant space have demonstrated promising performance in molecular conformation generation, but typically require solving stochastic differential equations (SDEs) with thousands of update steps. Till now, it remains unclear how to effectively accelerate this procedure explicitly in SE(3)-invariant space, which greatly hinders its wide application in the real world. In this paper, we systematically study the diffusion mechanism in SE(3)-invariant space via the lens of approximate errors induced by existing methods. Thereby, we develop more precise approximate in SE(3) in the context of projected differential equations. Theoretical analysis is further provided as well as empirical proof relating hyper-parameters with such errors. Altogether, we propose a novel acceleration scheme for generating molecular conformations in SE(3)-invariant space. Experimentally, our scheme can generate high-quality conformations with 50x--100x speedup compared to existing methods.

Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks

  • paper_url: http://arxiv.org/abs/2310.04914
  • repo_url: None
  • paper_authors: Avinash Madasu, Anahita Bhiwandiwalla, Vasudev Lal
  • for: Investigating whether foundational image-text models can be adapted to video tasks, and what the benefits of doing so are.
  • methods: Nine foundational image-text models are evaluated zero-shot on a diverse set of video understanding tasks, including video action recognition, retrieval, question answering, multiple choice, and captioning.
  • results: Image-text models perform impressively on video action recognition, video retrieval, and video multiple choice, moderately on video captioning, and poorly on video question answering; these findings highlight the benefits of adapting image-text models to a range of video tasks while avoiding the costly video pretraining step.
    Abstract Foundational multimodal models pre-trained on large scale image-text pairs or video-text pairs or both have shown strong generalization abilities on downstream tasks. However unlike image-text models, pretraining video-text models is always not feasible due to the difficulty in collecting large-scale clean and aligned data, and exponential computational costs involved in the pretraining phase. Therefore, the pertinent question to ask is: Can image-text models be adapted to video tasks and is there any benefit to using these models over pretraining directly on videos? In this work, we focus on this question by proposing a detailed study on the generalization abilities of image-text models when evaluated on video understanding tasks in a zero-shot setting. We investigate 9 foundational image-text models on a diverse set of video tasks that include video action recognition (video AR), video retrieval (video RT), video question answering (video QA), video multiple choice (video MC) and video captioning (video CP). Our experiments show that image-text models exhibit impressive performance on video AR, video RT and video MC. Furthermore, they perform moderately on video captioning and poorly on video QA. These findings shed a light on the benefits of adapting foundational image-text models to an array of video tasks while avoiding the costly pretraining step.

Faithful Knowledge Graph Explanations for Commonsense Reasoning

  • paper_url: http://arxiv.org/abs/2310.04910
  • repo_url: None
  • paper_authors: Weihe Zhai, Arkaitz Zubiaga, Bingquan Liu
  • for: Improving the faithfulness of knowledge graph (KG) based explanations for commonsense reasoning.
  • methods: Two main contributions: (1) two quantitative metrics, graph consistency and graph fidelity, for measuring the faithfulness of KG-based explanations; (2) Consistent GNN (CGNN), a novel training method that adds a consistency regularization term to improve explanation faithfulness.
  • results: The analysis shows that KG-based predictions often diverge from the original model predictions; CGNN improves both consistency and fidelity, demonstrating its potential for producing more faithful explanations.
    Abstract While fusing language models (LMs) and knowledge graphs (KGs) has become common in commonsense question answering research, enabling faithful chain-of-thought explanations in these models remains an open problem. One major weakness of current KG-based explanation techniques is that they overlook the faithfulness of generated explanations during evaluation. To address this gap, we make two main contributions: (1) We propose and validate two quantitative metrics - graph consistency and graph fidelity - to measure the faithfulness of KG-based explanations. (2) We introduce Consistent GNN (CGNN), a novel training method that adds a consistency regularization term to improve explanation faithfulness. Our analysis shows that predictions from KG often diverge from original model predictions. The proposed CGNN approach boosts consistency and fidelity, demonstrating its potential for producing more faithful explanations. Our work emphasises the importance of explicitly evaluating explanation faithfulness and suggests a path forward for developing architectures for faithful graph-based explanations.

Generative AI May Prefer to Present National-level Characteristics of Cities Based on Stereotypical Geographic Impressions at the Continental Level

  • paper_url: http://arxiv.org/abs/2310.04897
  • repo_url: None
  • paper_authors: Shan Ye
  • for: Testing whether the Chinese generative AI platform Wenxin Yige can render urban street views of different countries without stereotypical geographic impressions.
  • methods: Street-view images of cities in different countries are generated with Wenxin Yige and then analyzed for possible biases in how economic development and modernization are depicted.
  • results: The generated images may carry continental-level stereotypes about economic development and modernization and do not adequately represent the diversity of urban landscapes across nations; using them for geography education or outreach could inadvertently reinforce existing stereotypes about individual countries.
    Abstract A simple experiment was conducted to test the ability of the Chinese-based generative artificial intelligence (AI) platform, Wenxin Yige, to render images of urban street views of different countries. The study found that images generated by this AI platform may contain continental-level stereotypes in terms of showing the level of economic development and modernization. Street view images generated from Wenxin Yige do not adequately represent the diverse range of urban landscapes found across different nations. Using these generated images for geography education or outreach initiatives could inadvertently strengthen people's existing stereotypical views about individual countries.

Cell Tracking-by-detection using Elliptical Bounding Boxes

  • paper_url: http://arxiv.org/abs/2310.04895
  • repo_url: https://github.com/LucasKirsten/Deep-Cell-Tracking-EBB
  • paper_authors: Lucas N. Kirsten, Cláudio R. Jung
  • for: The purpose of this paper is to propose a new approach based on the classical tracking-by-detection paradigm for cell detection and tracking, which alleviates the need for extensive annotated data.
  • methods: The method approximates cell shapes as oriented ellipses and uses generic-purpose oriented object detectors to identify cells in each frame. A global data association algorithm explores temporal cell similarity using probability distance metrics.
  • results: The method achieves detection and tracking results competitively with state-of-the-art techniques that require considerably more extensive data annotation.
    Abstract Cell detection and tracking are paramount for bio-analysis. Recent approaches rely on the tracking-by-model evolution paradigm, which usually consists of training end-to-end deep learning models to detect and track the cells on the frames with promising results. However, such methods require extensive amounts of annotated data, which is time-consuming to obtain and often requires specialized annotators. This work proposes a new approach based on the classical tracking-by-detection paradigm that alleviates the requirement of annotated data. More precisely, it approximates the cell shapes as oriented ellipses and then uses generic-purpose oriented object detectors to identify the cells in each frame. We then rely on a global data association algorithm that explores temporal cell similarity using probability distance metrics, considering that the ellipses relate to two-dimensional Gaussian distributions. Our results show that our method can achieve detection and tracking results competitively with state-of-the-art techniques that require considerably more extensive data annotation. Our code is available at: https://github.com/LucasKirsten/Deep-Cell-Tracking-EBB.
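
As a toy illustration of the association idea (assumed conventions, not the authors' implementation): an oriented ellipse can be read as a two-dimensional Gaussian whose covariance encodes the semi-axes and orientation, and a probability distance such as the Bhattacharyya distance then scores how likely two detections in consecutive frames belong to the same cell.

```python
import numpy as np

def ellipse_to_gaussian(cx, cy, a, b, theta):
    """Read an oriented ellipse (center, semi-axes a and b, rotation theta)
    as a 2-D Gaussian: mean = center, covariance = R diag(a^2, b^2) R^T."""
    mean = np.array([cx, cy])
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    cov = R @ np.diag([a ** 2, b ** 2]) @ R.T
    return mean, cov

def bhattacharyya(m1, S1, m2, S2):
    """Bhattacharyya distance between two Gaussians; smaller = better match."""
    S = 0.5 * (S1 + S2)
    d = m1 - m2
    term1 = 0.125 * d @ np.linalg.solve(S, d)
    term2 = 0.5 * np.log(np.linalg.det(S) /
                         np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))
    return term1 + term2

# Example: two detections of (presumably) the same cell in consecutive frames.
g1 = ellipse_to_gaussian(10.0, 12.0, 6.0, 3.0, 0.20)
g2 = ellipse_to_gaussian(11.0, 12.5, 6.2, 3.1, 0.25)
print(bhattacharyya(*g1, *g2))
```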

Question-focused Summarization by Decomposing Articles into Facts and Opinions and Retrieving Entities

  • paper_url: http://arxiv.org/abs/2310.04880
  • repo_url: None
  • paper_authors: Krutika Sarode, Shashidhar Reddy Javaji, Vishal Kalakonnavar
  • for: Using natural language processing to predict stock price fluctuations, with a focus on early detection of economic, political, social, and technological changes that can be leveraged to capture market opportunities.
  • methods: Salient facts and events are extracted from news articles and combined with entities into tuples; these tuples are used to obtain summaries of market changes for each entity, which are then combined into a final abstract summary of the whole article.
  • results: Relationships between companies and entities are established by analyzing Wikipedia data and articles from the Economist, with the large language model GPT-3.5 used to produce the intermediate and final summaries; the ultimate goal is a comprehensive system that gives financial analysts and investors more informed decision-making tools through early detection of market trends and events.
    Abstract This research focuses on utilizing natural language processing techniques to predict stock price fluctuations, with a specific interest in early detection of economic, political, social, and technological changes that can be leveraged for capturing market opportunities. The proposed approach includes the identification of salient facts and events from news articles, then use these facts to form tuples with entities which can be used to get summaries of market changes for particular entity and then finally combining all the summaries to form a final abstract summary of the whole article. The research aims to establish relationships between companies and entities through the analysis of Wikipedia data and articles from the Economist. Large Language Model GPT 3.5 is used for getting the summaries and also forming the final summary. The ultimate goal of this research is to develop a comprehensive system that can provide financial analysts and investors with more informed decision-making tools by enabling early detection of market trends and events.

Hybrid Recommendation System using Graph Neural Network and BERT Embeddings

  • paper_url: http://arxiv.org/abs/2310.04878
  • repo_url: None
  • paper_authors: Shashidhar Reddy Javaji, Krutika Sarode
  • for: Providing personalized anime recommendations that match the interests and needs of different users.
  • methods: A Graph Neural Network (GNN) combined with sentence-transformer embeddings, framed as a link prediction task, so that both anime features and user-anime interactions are taken into account; the model is built on GraphSAGE and evaluated with weighted root mean square error (RMSE).
  • results: The model not only recommends anime to users but also predicts the rating a specific user would give to an anime.
    Abstract Recommender systems have emerged as a crucial component of the modern web ecosystem. The effectiveness and accuracy of such systems are critical for providing users with personalized recommendations that meet their specific interests and needs. In this paper, we introduce a novel model that utilizes a Graph Neural Network (GNN) in conjunction with sentence transformer embeddings to predict anime recommendations for different users. Our model employs the task of link prediction to create a recommendation system that considers both the features of anime and user interactions with different anime. The hybridization of the GNN and transformer embeddings enables us to capture both inter-level and intra-level features of anime data.Our model not only recommends anime to users but also predicts the rating a specific user would give to an anime. We utilize the GraphSAGE network for model building and weighted root mean square error (RMSE) to evaluate the performance of the model. Our approach has the potential to significantly enhance the accuracy and effectiveness of anime recommendation systems and can be extended to other domains that require personalized recommendations.

AirIMU: Learning Uncertainty Propagation for Inertial Odometry

  • paper_url: http://arxiv.org/abs/2310.04874
  • repo_url: None
  • paper_authors: Yuheng Qiu, Chen Wang, Xunfei Zhou, Youjie Xia, Sebastian Scherer
  • for: Accurate uncertainty estimation for inertial odometry, the foundation for optimal fusion in multi-sensor systems such as visual-inertial or LiDAR-inertial odometry.
  • methods: A learning-based method that captures the non-linear characteristics of IMU sensors and propagates covariance in a data-driven manner; the PyPose library is extended with differentiable batched IMU integration and covariance propagation on manifolds, yielding a significant runtime speedup.
  • results: On several benchmarks and a large-scale helicopter dataset spanning over 262 kilometers, the drift rate of the inertial odometry is reduced by a factor of 2.2 to 4.
    Abstract Accurate uncertainty estimation for inertial odometry is the foundation to achieve optimal fusion in multi-sensor systems, such as visual or LiDAR inertial odometry. Prior studies often simplify the assumptions regarding the uncertainty of inertial measurements, presuming fixed covariance parameters and empirical IMU sensor models. However, the inherent physical limitations and non-linear characteristics of sensors are difficult to capture. Moreover, uncertainty may fluctuate based on sensor rates and motion modalities, leading to variations across different IMUs. To address these challenges, we formulate a learning-based method that not only encapsulate the non-linearities inherent to IMUs but also ensure the accurate propagation of covariance in a data-driven manner. We extend the PyPose library to enable differentiable batched IMU integration with covariance propagation on manifolds, leading to significant runtime speedup. To demonstrate our method's adaptability, we evaluate it on several benchmarks as well as a large-scale helicopter dataset spanning over 262 kilometers. The drift rate of the inertial odometry on these datasets is reduced by a factor of between 2.2 and 4 times. Our method lays the groundwork for advanced developments in inertial odometry.

Lemur: Integrating Large Language Models in Automated Program Verification

  • paper_url: http://arxiv.org/abs/2310.04870
  • repo_url: None
  • paper_authors: Haoze Wu, Clark Barrett, Nina Narodytska
  • for: automated program verification
  • methods: combines the power of LLMs and automated reasoners
  • results: practical improvements on a set of synthetic and competition benchmarks
    Abstract The demonstrated code-understanding capability of LLMs raises the question of whether they can be used for automated program verification, a task that often demands high-level abstract reasoning about program properties, which is challenging for verification tools. We propose a general methodology to combine the power of LLMs and automated reasoners for automated program verification. We formally describe this methodology as a set of derivation rules and prove its soundness. We instantiate the calculus as a sound automated verification procedure, which led to practical improvements on a set of synthetic and competition benchmarks.

ILuvUI: Instruction-tuned LangUage-Vision modeling of UIs from Machine Conversations

  • paper_url: http://arxiv.org/abs/2310.04869
  • repo_url: None
  • paper_authors: Yue Jiang, Eldon Schoop, Amanda Swearngin, Jeffrey Nichols
  • for: Improving vision-language models on UI tasks without requiring human-provided annotations.
  • methods: A recipe for generating paired text-image training data is adapted to the UI domain by combining existing pixel-based methods with a Large Language Model (LLM); the approach can be applied to any dataset of UI screenshots.
  • results: A dataset of 335K conversational examples paired with UIs, covering Q&A, UI descriptions, and planning, is generated and used to fine-tune a conversational VLM for UI tasks; the model is benchmarked on UI element detection, response quality, and multi-step UI navigation and planning.
    Abstract Multimodal Vision-Language Models (VLMs) enable powerful applications from their fused understanding of images and language, but many perform poorly on UI tasks due to the lack of UI training data. In this paper, we adapt a recipe for generating paired text-image training data for VLMs to the UI domain by combining existing pixel-based methods with a Large Language Model (LLM). Unlike prior art, our method requires no human-provided annotations, and it can be applied to any dataset of UI screenshots. We generate a dataset of 335K conversational examples paired with UIs that cover Q&A, UI descriptions, and planning, and use it to fine-tune a conversational VLM for UI tasks. To assess the performance of our model, we benchmark it on UI element detection tasks, evaluate response quality, and showcase its applicability to multi-step UI navigation and planning.

ForeSeer: Product Aspect Forecasting Using Temporal Graph Embedding

  • paper_url: http://arxiv.org/abs/2310.04865
  • repo_url: None
  • paper_authors: Zixuan Liu, Gaurush Hiranandani, Kun Qian, Eddie W. Huang, Yi Xu, Belinda Zeng, Karthik Subbian, Sheng Wang
  • for: Forecasting the aspects that will emerge in future reviews of a new product that currently has little review information.
  • methods: ForeSeer, a text mining and product embedding approach trained progressively on temporal product graphs; it transfers reviews from similar products on a large product graph and jointly learns time-sensitive review, product, and aspect embeddings that are less affected by extremely imbalanced aspect frequencies.
  • results: On a real-world review system with 11,536,382 reviews and 11,000 products over 3 years, ForeSeer outperforms existing methods by at least 49.1% AUPRC in the realistic setting where aspect associations are not given, and further improves future link prediction on the product graph and review-aspect association prediction.
    Abstract Developing text mining approaches to mine aspects from customer reviews has been well-studied due to its importance in understanding customer needs and product attributes. In contrast, it remains unclear how to predict the future emerging aspects of a new product that currently has little review information. This task, which we named product aspect forecasting, is critical for recommending new products, but also challenging because of the missing reviews. Here, we propose ForeSeer, a novel textual mining and product embedding approach progressively trained on temporal product graphs for this novel product aspect forecasting task. ForeSeer transfers reviews from similar products on a large product graph and exploits these reviews to predict aspects that might emerge in future reviews. A key novelty of our method is to jointly provide review, product, and aspect embeddings that are both time-sensitive and less affected by extremely imbalanced aspect frequencies. We evaluated ForeSeer on a real-world product review system containing 11,536,382 reviews and 11,000 products over 3 years. We observe that ForeSeer substantially outperformed existing approaches with at least 49.1\% AUPRC improvement under the real setting where aspect associations are not given. ForeSeer further improves future link prediction on the product graph and the review aspect association prediction. Collectively, Foreseer offers a novel framework for review forecasting by effectively integrating review text, product network, and temporal information, opening up new avenues for online shopping recommendation and e-commerce applications.

Uncovering hidden geometry in Transformers via disentangling position and context

  • paper_url: http://arxiv.org/abs/2310.04861
  • repo_url: https://github.com/jiajunsong629/uncover-hidden-geometry
  • paper_authors: Jiajun Song, Yiqiao Zhong
  • for: This paper aims to provide a simple yet informative decomposition of hidden states (or embeddings) of trained transformers into interpretable components, in order to gain structural insights about input formats in in-context learning and arithmetic tasks.
  • methods: The authors use a tensor representation of embedding vectors $\boldsymbol{h} \in \mathbb{R}^{C \times T \times d}$ to extract the mean effects and decompose the hidden states into interpretable components, including the global mean vector $\boldsymbol{\mu}$, the mean vectors across contexts and positions $\mathbf{pos}_t$ and $\mathbf{ctx}_c$, and the residual vector $\mathbf{resid}_{c,t}$.
  • results: The authors find that the decomposition yields a pervasive mathematical structure across popular transformer architectures and diverse text datasets, including a low-dimensional, continuous, and often spiral shape for the mean vectors across positions, clear cluster structure for the mean vectors across contexts, and mutual incoherence between the mean vectors across positions and contexts. These findings offer structural insights into the input formats of transformers and have implications for in-context learning and arithmetic tasks.
    Abstract Transformers are widely used to extract complex semantic meanings from input tokens, yet they usually operate as black-box models. In this paper, we present a simple yet informative decomposition of hidden states (or embeddings) of trained transformers into interpretable components. For any layer, embedding vectors of input sequence samples are represented by a tensor $\boldsymbol{h} \in \mathbb{R}^{C \times T \times d}$. Given embedding vector $\boldsymbol{h}_{c,t} \in \mathbb{R}^d$ at sequence position $t \le T$ in a sequence (or context) $c \le C$, extracting the mean effects yields the decomposition \[ \boldsymbol{h}_{c,t} = \boldsymbol{\mu} + \mathbf{pos}_t + \mathbf{ctx}_c + \mathbf{resid}_{c,t} \] where $\boldsymbol{\mu}$ is the global mean vector, $\mathbf{pos}_t$ and $\mathbf{ctx}_c$ are the mean vectors across contexts and across positions respectively, and $\mathbf{resid}_{c,t}$ is the residual vector. For popular transformer architectures and diverse text datasets, empirically we find pervasive mathematical structure: (1) $(\mathbf{pos}_t)_{t}$ forms a low-dimensional, continuous, and often spiral shape across layers, (2) $(\mathbf{ctx}_c)_c$ shows clear cluster structure that falls into context topics, and (3) $(\mathbf{pos}_t)_{t}$ and $(\mathbf{ctx}_c)_c$ are mutually incoherent -- namely $\mathbf{pos}_t$ is almost orthogonal to $\mathbf{ctx}_c$ -- which is canonical in compressed sensing and dictionary learning. This decomposition offers structural insights about input formats in in-context learning (especially for induction heads) and in arithmetic tasks.
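
The decomposition in the abstract is simple to compute from a stored tensor of hidden states. The sketch below (illustrative, with assumed shapes) recovers the global mean, positional means, context means, and residuals exactly as in the formula.

```python
import numpy as np

def decompose_hidden_states(h):
    """h: hidden states of shape (C, T, d) = (contexts, positions, model dim).
    Returns mu, pos, ctx, resid such that h[c, t] = mu + pos[t] + ctx[c] + resid[c, t]."""
    mu = h.mean(axis=(0, 1))                         # (d,)   global mean
    pos = h.mean(axis=0) - mu                        # (T, d) mean over contexts
    ctx = h.mean(axis=1) - mu                        # (C, d) mean over positions
    resid = h - mu - pos[None, :, :] - ctx[:, None, :]
    return mu, pos, ctx, resid

# Sanity check on random data: the decomposition reconstructs h exactly.
h = np.random.randn(32, 128, 64)
mu, pos, ctx, resid = decompose_hidden_states(h)
assert np.allclose(h, mu + pos[None] + ctx[:, None] + resid)
# The paper's findings concern the geometry of pos (spiral across layers),
# ctx (topic clusters), and their near-orthogonality (mutual incoherence).
```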

Balancing utility and cognitive cost in social representation

  • paper_url: http://arxiv.org/abs/2310.04852
  • repo_url: None
  • paper_authors: Max Taylor-Davies, Christopher G. Lucas
  • for: Studying how an agent should construct and maintain representations of the other agents in its environment in order to act effectively across tasks.
  • methods: Using selective imitation as an example task, the paper formalizes the problem of choosing agent representations that optimally trade off downstream utility against information (cognitive) cost, and illustrates two approaches to resource-constrained social representation.
  • results: The examples show how, under resource constraints, an agent can choose what to represent about other agents so as to optimize performance on downstream tasks.
    Abstract To successfully navigate its environment, an agent must construct and maintain representations of the other agents that it encounters. Such representations are useful for many tasks, but they are not without cost. As a result, agents must make decisions regarding how much information they choose to represent about the agents in their environment. Using selective imitation as an example task, we motivate the problem of finding agent representations that optimally trade off between downstream utility and information cost, and illustrate two example approaches to resource-constrained social representation.

Sub-linear Regret in Adaptive Model Predictive Control

  • paper_url: http://arxiv.org/abs/2310.04842
  • repo_url: None
  • paper_authors: Damianos Tranos, Alexandre Proutiere
  • for: Adaptive model predictive control (MPC) of uncertain linear systems with additive disturbances and state and input constraints.
  • methods: STT-MPC (Self-Tuning Tube-based Model Predictive Control), an online algorithm combining the certainty-equivalence principle with polytopic tubes: at each step the system dynamics are estimated with the Least Squares Estimator (LSE) and the controller is obtained by solving an MPC problem using these estimates (a sketch of the least-squares identification step follows the abstract).
  • results: State and input constraints are satisfied despite the uncertainty, and recursive feasibility and asymptotic stability hold; the expected regret relative to an oracle that knows the system dynamics does not exceed $O(T^{1/2 + \epsilon})$, where $\epsilon \in (0,1)$ is a design parameter tuning the persistent-excitation component of the algorithm.
    Abstract We consider the problem of adaptive Model Predictive Control (MPC) for uncertain linear-systems with additive disturbances and with state and input constraints. We present STT-MPC (Self-Tuning Tube-based Model Predictive Control), an online algorithm that combines the certainty-equivalence principle and polytopic tubes. Specifically, at any given step, STT-MPC infers the system dynamics using the Least Squares Estimator (LSE), and applies a controller obtained by solving an MPC problem using these estimates. The use of polytopic tubes is so that, despite the uncertainties, state and input constraints are satisfied, and recursive-feasibility and asymptotic stability hold. In this work, we analyze the regret of the algorithm, when compared to an oracle algorithm initially aware of the system dynamics. We establish that the expected regret of STT-MPC does not exceed $O(T^{1/2 + \epsilon})$, where $\epsilon \in (0,1)$ is a design parameter tuning the persistent excitation component of the algorithm. Our result relies on a recently proposed exponential decay of sensitivity property and, to the best of our knowledge, is the first of its kind in this setting. We illustrate the performance of our algorithm using a simple numerical example.
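
A minimal sketch of the certainty-equivalence ingredient (illustrative only; the polytopic tubes and the persistent-excitation term are omitted): estimate the system matrices $(A, B)$ from observed transitions by least squares and hand the estimates to the MPC problem.

```python
import numpy as np

def estimate_dynamics(xs, us):
    """Least-squares estimate of (A, B) for x_{t+1} = A x_t + B u_t + w_t.
    xs: (T+1, n) observed state trajectory, us: (T, m) applied inputs."""
    Z = np.hstack([xs[:-1], us])                      # (T, n + m) regressors
    Y = xs[1:]                                        # (T, n)     next states
    theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)     # theta = [A^T; B^T]
    n = xs.shape[1]
    return theta[:n].T, theta[n:].T                   # A_hat, B_hat

# Toy check: recover a known system from a noisy rollout with exciting inputs.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.1]])
xs, us = [np.zeros(2)], []
for _ in range(200):
    u = rng.normal(size=1)
    us.append(u)
    xs.append(A @ xs[-1] + B @ u + 0.01 * rng.normal(size=2))
A_hat, B_hat = estimate_dynamics(np.array(xs), np.array(us))
print(A_hat, B_hat)
```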

Federated Self-Supervised Learning of Monocular Depth Estimators for Autonomous Vehicles

  • paper_url: http://arxiv.org/abs/2310.04837
  • repo_url: None
  • paper_authors: Elton F. de S. Soares, Carlos Alberto V. Campos
  • for: Image-based depth estimation for autonomous vehicles in intelligent transportation systems.
  • methods: Federated learning and deep self-supervision.
  • results: Near state-of-the-art performance with a test loss below 0.13 and requiring, on average, only 1.5k training steps and up to 0.415 GB of weight data transfer per autonomous vehicle on each round.
    Abstract Image-based depth estimation has gained significant attention in recent research on computer vision for autonomous vehicles in intelligent transportation systems. This focus stems from its cost-effectiveness and wide range of potential applications. Unlike binocular depth estimation methods that require two fixed cameras, monocular depth estimation methods only rely on a single camera, making them highly versatile. While state-of-the-art approaches for this task leverage self-supervised learning of deep neural networks in conjunction with tasks like pose estimation and semantic segmentation, none of them have explored the combination of federated learning and self-supervision to train models using unlabeled and private data captured by autonomous vehicles. The utilization of federated learning offers notable benefits, including enhanced privacy protection, reduced network consumption, and improved resilience to connectivity issues. To address this gap, we propose FedSCDepth, a novel method that combines federated learning and deep self-supervision to enable the learning of monocular depth estimators with comparable effectiveness and superior efficiency compared to the current state-of-the-art methods. Our evaluation experiments conducted on Eigen's Split of the KITTI dataset demonstrate that our proposed method achieves near state-of-the-art performance, with a test loss below 0.13 and requiring, on average, only 1.5k training steps and up to 0.415 GB of weight data transfer per autonomous vehicle on each round.

Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM

  • paper_url: http://arxiv.org/abs/2310.04836
  • repo_url: None
  • paper_authors: Luoming Zhang, Wen Fei, Weijia Wu, Yefei He, Zhenyu Lou, Hong Zhou
  • for: This paper aims to improve the efficiency of large language models (LLMs) for real-world applications by introducing a novel quantization method called Dual Grained Quantization (DGQ).
  • methods: The DGQ method uses a two-phase grid search algorithm to determine the optimal quantization scales for both coarse-grained and fine-grained quantization, and it dequantizes the fine-grained INT4 weight into coarse-grained INT8 representation for efficient matrix multiplication.
  • results: The experimental results show that DGQ consistently outperforms prior methods across various LLM architectures and tasks, and achieves significant memory reduction and speed gains compared to the A16W4 implementation. Specifically, DGQ achieves $\textbf{1.12}$ $\times$ memory reduction and $\textbf{3.24}$ $\times$ speed gains.
    Abstract Large Language Models (LLMs) pose significant hardware challenges related to memory requirements and computational ability. There are two mainstream quantization schemes for LLMs: coarse-grained ($\textit{e.g.,}$ channel-wise) quantization and fine-grained ($\textit{e.g.,}$ group-wise) quantization. Fine-grained quantization has smaller quantization loss, consequently achieving superior performance. However, when applied to weight-activation quantization, it disrupts continuous integer matrix multiplication, leading to inefficient inference. In this paper, we introduce Dual Grained Quantization (DGQ), a novel A8W4 quantization for LLM that maintains superior performance while ensuring fast inference speed. DSQ dequantizes the fine-grained INT4 weight into coarse-grained INT8 representation and preform matrix multiplication using INT8 kernels. Besides, we develop a two-phase grid search algorithm to simplify the determination of fine-grained and coarse-grained quantization scales. We also devise a percentile clipping schema for smoothing the activation outliers without the need for complex optimization techniques. Experimental results demonstrate that DGQ consistently outperforms prior methods across various LLM architectures and a wide range of tasks. Remarkably, by our implemented efficient CUTLASS kernel, we achieve $\textbf{1.12}$ $\times$ memory reduction and $\textbf{3.24}$ $\times$ speed gains comparing A16W4 implementation. These advancements enable efficient deployment of A8W4 LLMs for real-world applications.
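
A rough sketch of the core idea as we read it (illustrative numpy code, not the paper's A8W4 kernels): store weights with fine-grained per-group INT4 scales, then rescale each group into a shared coarse-grained per-channel INT8 representation so that the matrix multiplication itself can run on plain INT8 operands. The group size, scale conventions, and rounding below are assumptions.

```python
import numpy as np

def quantize_fine_int4(w, group_size=128):
    """Symmetric per-group INT4 quantization of one weight row (fine-grained)."""
    groups = w.reshape(-1, group_size)
    s_fine = np.abs(groups).max(axis=1, keepdims=True) / 7.0   # one scale per group
    q4 = np.clip(np.round(groups / s_fine), -8, 7)
    return q4, s_fine

def regroup_to_coarse_int8(q4, s_fine):
    """Re-express the INT4 weights under a single per-channel scale so the
    matmul can run on INT8 operands (the DGQ idea, greatly simplified)."""
    s_coarse = 7.0 * s_fine.max() / 127.0                       # per-channel scale
    q8 = np.clip(np.round(q4 * s_fine / s_coarse), -127, 127).astype(np.int8)
    return q8, s_coarse

# One weight row of a linear layer (illustrative sizes).
w = np.random.randn(1024).astype(np.float32)
q4, s_fine = quantize_fine_int4(w)
q8, s_coarse = regroup_to_coarse_int8(q4, s_fine)
w_hat = q8.astype(np.float32).reshape(-1) * s_coarse            # reconstruction
print("max abs quantization error:", np.abs(w - w_hat).max())
```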

On the Evolution of Knowledge Graphs: A Survey and Perspective

  • paper_url: http://arxiv.org/abs/2310.04835
  • repo_url: None
  • paper_authors: Xuhui Jiang, Chengjin Xu, Yinghan Shen, Xun Sun, Lumingyuan Tang, Saizhuo Wang, Zhongwu Chen, Yuanzhuo Wang, Jian Guo
  • for: A comprehensive survey of the evolution of knowledge graphs (KGs), of techniques for knowledge extraction and reasoning, and of practical applications of different types of KGs.
  • methods: The survey covers static, dynamic, temporal, and event KGs, the techniques used to extract and reason over them, and their practical applications, including a case study in financial analysis.
  • results: The paper offers a perspective on future directions for knowledge engineering, including the potential of combining knowledge graphs with large language models (LLMs) and the further evolution of knowledge extraction, reasoning, and representation.
    Abstract Knowledge graphs (KGs) are structured representations of diversified knowledge. They are widely used in various intelligent applications. In this article, we provide a comprehensive survey on the evolution of various types of knowledge graphs (i.e., static KGs, dynamic KGs, temporal KGs, and event KGs) and techniques for knowledge extraction and reasoning. Furthermore, we introduce the practical applications of different types of KGs, including a case study in financial analysis. Finally, we propose our perspective on the future directions of knowledge engineering, including the potential of combining the power of knowledge graphs and large language models (LLMs), and the evolution of knowledge extraction, reasoning, and representation.

Rethink Baseline of Integrated Gradients from the Perspective of Shapley Value

  • paper_url: http://arxiv.org/abs/2310.04821
  • repo_url: None
  • paper_authors: Shuyang Liu, Zixuan Chen, Ge Shi, Ji Wang, Changjie Fan, Yu Xiong, Runze Wu Yujing Hu, Ze Ji, Yang Gao
  • for: Explaining the predictions of deep neural networks (DNNs) by attributing them to input features.
  • methods: A baseline-design approach grounded in the Aumann-Shapley value: the proposed Shapley Integrated Gradients (SIG) searches for a set of baselines by proportional sampling so as to partly simulate the computation path of the Shapley value (a simplified sketch of multi-baseline Integrated Gradients follows the abstract).
  • results: SIG better estimates feature contributions, provides more consistent explanations across diverse applications, and generalizes to different data types and instances with negligible computational overhead.
    Abstract Numerous approaches have attempted to interpret deep neural networks (DNNs) by attributing the prediction of DNN to its input features. One of the well-studied attribution methods is Integrated Gradients (IG). Specifically, the choice of baselines for IG is a critical consideration for generating meaningful and unbiased explanations for model predictions in different scenarios. However, current practice of exploiting a single baseline fails to fulfill this ambition, thus demanding multiple baselines. Fortunately, the inherent connection between IG and Aumann-Shapley Value forms a unique perspective to rethink the design of baselines. Under certain hypothesis, we theoretically analyse that a set of baseline aligns with the coalitions in Shapley Value. Thus, we propose a novel baseline construction method called Shapley Integrated Gradients (SIG) that searches for a set of baselines by proportional sampling to partly simulate the computation path of Shapley Value. Simulations on GridWorld show that SIG approximates the proportion of Shapley Values. Furthermore, experiments conducted on various image tasks demonstrate that compared to IG using other baseline methods, SIG exhibits an improved estimation of feature's contribution, offers more consistent explanations across diverse applications, and is generic to distinct data types or instances with insignificant computational overhead.
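
For reference, below is a simplified sketch of the Integrated Gradients attribution that SIG builds on, averaged over a set of baselines, using a toy analytic model so that gradients are exact; the proportional-sampling search that SIG uses to choose the baseline set is not shown, and all names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model_and_grad(x, w):
    """Toy model f(x) = sigmoid(w . x) together with its analytic input gradient."""
    y = sigmoid(w @ x)
    return y, y * (1.0 - y) * w

def integrated_gradients(x, baseline, w, steps=64):
    """IG_i = (x_i - x'_i) * mean_k grad_i(x' + alpha_k (x - x')), midpoint rule."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack([model_and_grad(baseline + a * (x - baseline), w)[1]
                      for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

def multi_baseline_ig(x, baselines, w):
    """Average attributions over a set of baselines -- the quantity whose
    baseline set SIG aims to choose well via proportional sampling."""
    return np.mean([integrated_gradients(x, b, w) for b in baselines], axis=0)

w = np.array([1.5, -2.0, 0.5])
x = np.array([0.8, 0.3, -1.0])
baselines = [np.zeros(3), np.full(3, x.mean())]   # illustrative baseline set
print(multi_baseline_ig(x, baselines, w))
```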

Hacking Generative Models with Differentiable Network Bending

  • paper_url: http://arxiv.org/abs/2310.04816
  • repo_url: None
  • paper_authors: Giacomo Aldegheri, Alina Rogalska, Ahmed Youssef, Eugenia Iofinova
  • for: "Hacking" generative models so that their outputs move away from the original training distribution toward a new objective.
  • methods: A small trainable module is inserted between intermediate layers of the generative model and trained for a low number of iterations while the rest of the network stays frozen (a minimal sketch follows the abstract).
  • results: The resulting images have an uncanny quality, produced by the tension between the original and new objectives, which can be exploited for artistic purposes.
    Abstract In this work, we propose a method to 'hack' generative models, pushing their outputs away from the original training distribution towards a new objective. We inject a small-scale trainable module between the intermediate layers of the model and train it for a low number of iterations, keeping the rest of the network frozen. The resulting output images display an uncanny quality, given by the tension between the original and new objectives that can be exploited for artistic purposes.
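
A minimal PyTorch sketch of the setup described in the abstract (assumed generator structure and objective, not the authors' code): freeze a pretrained generator, insert a small trainable module at an intermediate layer, and train only that module toward a new objective for a small number of iterations.

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained generator; in practice load real weights.
generator = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),   # <- insert the trainable module after this
    nn.Linear(256, 784), nn.Tanh(),
)
for p in generator.parameters():
    p.requires_grad_(False)           # the original network stays frozen

split = 4                             # index of the intermediate layer
bend = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 256))
hacked = nn.Sequential(*generator[:split], bend, *generator[split:])

opt = torch.optim.Adam(bend.parameters(), lr=1e-3)

def new_objective(imgs):              # illustrative new target, e.g. brighter outputs
    return (imgs.mean() - 0.5).pow(2)

for _ in range(200):                  # "a low number of iterations"
    z = torch.randn(32, 128)
    loss = new_objective(hacked(z))
    opt.zero_grad()
    loss.backward()
    opt.step()
```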

User’s Position-Dependent Strategies in Consumer-Generated Media with Monetary Rewards

  • paper_url: http://arxiv.org/abs/2310.04805
  • repo_url: None
  • paper_authors: Shintaro Ueki, Fujio Toriumi, Toshiharu Sugawara
  • for: This paper aims to help content-sharing platform designers create more effective monetary reward schemes to incentivize user participation and improve content quality.
  • methods: The authors propose a model that integrates monetary reward schemes into the Social Networking Services (SNS) norms game, and experimentally investigate the impact of different reward schemes on user behavior and content quality.
  • results: The authors find that different monetary reward schemes have distinct effects on user proactivity and content quality, and that these effects depend on the user’s position in the CGM network. Their findings can help platform designers create more effective reward schemes to improve user engagement and content quality.
    Abstract Numerous forms of consumer-generated media (CGM), such as social networking services (SNS), are widely used. Their success relies on users' voluntary participation, often driven by psychological rewards like recognition and connection from reactions by other users. Furthermore, a few CGM platforms offer monetary rewards to users, serving as incentives for sharing items such as articles, images, and videos. However, users have varying preferences for monetary and psychological rewards, and the impact of monetary rewards on user behaviors and the quality of the content they post remains unclear. Hence, we propose a model that integrates some monetary reward schemes into the SNS-norms game, which is an abstraction of CGM. Subsequently, we investigate the effect of each monetary reward scheme on individual agents (users), particularly in terms of their proactivity in posting items and their quality, depending on agents' positions in a CGM network. Our experimental results suggest that these factors distinctly affect the number of postings and their quality. We believe that our findings will help CGM platformers in designing better monetary reward schemes.

Ten Challenges in Industrial Recommender Systems

  • paper_url: http://arxiv.org/abs/2310.04804
  • repo_url: None
  • paper_authors: Zhenhua Dong, Jieming Zhu, Weiwen Liu, Ruiming Tang
  • for: This talk presents ten interesting and important challenges in industrial recommender systems, to help the RecSys community create better recommender systems.
  • methods: It reviews the technical trends of recommendation models over the past decade, from shallow and simple models such as collaborative filtering, linear models, and low-rank models to deep and complex models such as neural networks and pre-trained language models.
  • results: It distills hard problems encountered in real-world, large-scale recommender systems so that the RecSys community can be inspired to address them.
    Abstract Huawei's vision and mission is to build a fully connected intelligent world. Since 2013, Huawei Noah's Ark Lab has helped many products build recommender systems and search engines for getting the right information to the right users. Every day, our recommender systems serve hundreds of millions of mobile phone users and recommend different kinds of content and services such as apps, news feeds, songs, videos, books, themes, and instant services. The big data and various scenarios provide us with great opportunities to develop advanced recommendation technologies. Furthermore, we have witnessed the technical trend of recommendation models in the past ten years, from the shallow and simple models like collaborative filtering, linear models, low rank models to deep and complex models like neural networks, pre-trained language models. Based on the mission, opportunities and technological trends, we have also met several hard problems in our recommender systems. In this talk, we will share ten important and interesting challenges and hope that the RecSys community can get inspired and create better recommender systems.

HNS: An Efficient Hermite Neural Solver for Solving Time-Fractional Partial Differential Equations

  • paper_url: http://arxiv.org/abs/2310.04789
  • repo_url: https://github.com/hsbhc/hns
  • paper_authors: Jie Hou, Zhiying Ma, Shihui Ying, Ying Li
  • for: Solving time-fractional partial differential equations using deep learning techniques.
  • methods: Combines a high-order Hermite interpolation approximation of fractional derivatives with deep neural networks.
  • results: Experiments show that HNS is more accurate than L1-based solvers and also improves markedly in high-dimensional scenarios (a sketch of the standard L1 baseline appears after this entry).
    Abstract Neural network solvers represent an innovative and promising approach for tackling time-fractional partial differential equations by utilizing deep learning techniques. L1 interpolation approximation serves as the standard method for addressing time-fractional derivatives within neural network solvers. However, we have discovered that neural network solvers based on L1 interpolation approximation are unable to fully exploit the benefits of neural networks, and the accuracy of these models is constrained to interpolation errors. In this paper, we present the high-precision Hermite Neural Solver (HNS) for solving time-fractional partial differential equations. Specifically, we first construct a high-order explicit approximation scheme for fractional derivatives using Hermite interpolation techniques, and rigorously analyze its approximation accuracy. Afterward, taking into account the infinitely differentiable properties of deep neural networks, we integrate the high-order Hermite interpolation explicit approximation scheme with deep neural networks to propose the HNS. The experimental results show that HNS achieves higher accuracy than methods based on the L1 scheme for both forward and inverse problems, as well as in high-dimensional scenarios. This indicates that HNS has significantly improved accuracy and flexibility compared to existing L1-based methods, and has overcome the limitations of explicit finite difference approximation methods that are often constrained to function value interpolation. As a result, the HNS is not a simple combination of numerical computing methods and neural networks, but rather achieves a complementary and mutually reinforcing advantages of both approaches. The data and code can be found at \url{https://github.com/hsbhc/HNS}.
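
For context, the L1 interpolation scheme that serves as the baseline in this paper can be stated in a few lines. The sketch below (our own naming, uniform grid, order alpha in (0, 1)) implements that standard discretization of the Caputo derivative; it illustrates only the baseline, not the authors' Hermite construction.

```python
import numpy as np
from math import gamma

def caputo_l1(u, dt, alpha):
    """Standard L1 approximation of the Caputo derivative of order alpha in (0, 1).

    u  : samples u(t_0), ..., u(t_N) on a uniform grid with step dt
    Returns the approximate derivative at t_1, ..., t_N.
    """
    n_steps = len(u) - 1
    coef = dt ** (-alpha) / gamma(2.0 - alpha)
    k = np.arange(n_steps + 1)
    b = (k + 1.0) ** (1.0 - alpha) - k ** (1.0 - alpha)      # L1 weights b_k
    d = np.zeros(n_steps)
    for n in range(1, n_steps + 1):
        increments = u[1:n + 1] - u[:n]                      # u(t_j) - u(t_{j-1}), j = 1..n
        d[n - 1] = coef * np.sum(b[:n][::-1] * increments)   # weight b_{n-j} pairs with step j
    return d

# Sanity check: the Caputo derivative of t^2 is 2 t^(2 - alpha) / Gamma(3 - alpha).
alpha, dt = 0.5, 1e-3
t = np.arange(0.0, 1.0 + dt, dt)
approx = caputo_l1(t ** 2, dt, alpha)
exact = 2.0 * t[1:] ** (2.0 - alpha) / gamma(3.0 - alpha)
print(np.max(np.abs(approx - exact)))   # interpolation error of order dt^(2 - alpha)
```

The accuracy of such a scheme is limited by the interpolation error, which is exactly the limitation the paper's Hermite-based approximation targets.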

PMNN: Physical Model-driven Neural Network for solving time-fractional differential equations

  • paper_url: http://arxiv.org/abs/2310.04788
  • repo_url: None
  • paper_authors: Zhiying Ma, Jie Hou, Wenhao Zhu, Yaxin Peng, Ying Li
  • for: Solving time-fractional differential equations.
  • methods: A Physical Model-driven Neural Network (PMNN) method that combines deep neural networks (DNNs) with interpolation approximations of fractional derivatives.
  • results: By training DNNs to learn the resulting temporal iteration scheme, the method obtains accurate and efficient solutions to time-fractional differential equations.
    Abstract In this paper, an innovative Physical Model-driven Neural Network (PMNN) method is proposed to solve time-fractional differential equations. It establishes a temporal iteration scheme based on physical model-driven neural networks which effectively combines deep neural networks (DNNs) with interpolation approximation of fractional derivatives. Specifically, once the fractional differential operator is discretized, DNNs are employed as a bridge to integrate interpolation approximation techniques with differential equations. On the basis of this integration, we construct a neural-based iteration scheme. Subsequently, by training DNNs to learn this temporal iteration scheme, approximate solutions to the differential equations can be obtained. The proposed method aims to preserve the intrinsic physical information within the equations as far as possible. It fully utilizes the powerful fitting capability of neural networks while maintaining the efficiency of the difference schemes for fractional differential equations. Moreover, we validate the efficiency and accuracy of PMNN through several numerical experiments.

Optimal Sequential Decision-Making in Geosteering: A Reinforcement Learning Approach

  • paper_url: http://arxiv.org/abs/2310.04772
  • repo_url: None
  • paper_authors: Ressi Bonti Muhammad, Sergey Alyaev, Reidar Brumer Bratvold
  • for: Improving the efficiency and quality of geosteering (trajectory-adjustment) decisions during drilling.
  • methods: Uses the Deep Q-Network (DQN) method, a model-free reinforcement learning (RL) approach that learns geosteering decisions directly from the decision environment (a minimal DQN update sketch follows this entry).
  • results: On two published synthetic geosteering scenarios, the RL approach matches the quality of quasi-optimal Approximate Dynamic Programming (ADP) while being much faster at decision time. Because RL is model-free, it can also be extended to more complex environments and, in the future, to hybrid versions trained with real data.
    Abstract Trajectory adjustment decisions throughout the drilling process, called geosteering, affect subsequent choices and information gathering, thus resulting in a coupled sequential decision problem. Previous works on applying decision optimization methods in geosteering rely on greedy optimization or Approximate Dynamic Programming (ADP). Either decision optimization method requires explicit uncertainty and objective function models, making developing decision optimization methods for complex and realistic geosteering environments challenging to impossible. We use the Deep Q-Network (DQN) method, a model-free reinforcement learning (RL) method that learns directly from the decision environment, to optimize geosteering decisions. The expensive computations for RL are handled during the offline training stage. Evaluating DQN needed for real-time decision support takes milliseconds and is faster than the traditional alternatives. Moreover, for two previously published synthetic geosteering scenarios, our results show that RL achieves high-quality outcomes comparable to the quasi-optimal ADP. Yet, the model-free nature of RL means that by replacing the training environment, we can extend it to problems where the solution to ADP is prohibitively expensive to compute. This flexibility will allow applying it to more complex environments and make hybrid versions trained with real data in the future.
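
The abstract does not give the network or state/action encoding used for geosteering, so the following is only a generic sketch of the model-free DQN temporal-difference update the method relies on; all names and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, s):
        return self.net(s)

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One temporal-difference update on a replay batch (s, a, r, s', done)."""
    s, a, r, s_next, done = batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)        # Q(s, a)
    with torch.no_grad():
        target = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values
    loss = nn.functional.smooth_l1_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random tensors standing in for a replay-buffer batch.
state_dim, n_actions, B = 8, 3, 32
q_net, target_net = QNet(state_dim, n_actions), QNet(state_dim, n_actions)
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
batch = (torch.randn(B, state_dim), torch.randint(0, n_actions, (B,)),
         torch.randn(B), torch.randn(B, state_dim), torch.zeros(B))
print(dqn_update(q_net, target_net, opt, batch))
```

The expensive part (many such updates against a simulated drilling environment) happens offline; at decision time only a forward pass of the trained Q-network is needed, which is why the paper reports millisecond-level evaluation.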

Pairwise GUI Dataset Construction Between Android Phones and Tablets

  • paper_url: http://arxiv.org/abs/2310.04755
  • repo_url: https://github.com/huhangithub/papt
  • paper_authors: Han Hu, Haolan Zhan, Yujin Huang, Di Liu
  • for: Improving developer productivity by enabling automated GUI development, thereby reducing the cost of rebuilding tablet GUIs from scratch and wasting existing design resources.
  • methods: Proposes novel pairwise GUI collection approaches to build Papt, a dataset of corresponding GUI pages between Android phones and tablets, intended for deep-learning-based automated GUI development.
  • results: Preliminary experiments on the dataset reveal current challenges in applying deep learning to automated GUI development that call for further research.
    Abstract In the current landscape of pervasive smartphones and tablets, apps frequently exist across both platforms. Although apps share most graphic user interfaces (GUIs) and functionalities across phones and tablets, developers often rebuild from scratch for tablet versions, escalating costs and squandering existing design resources. Researchers are attempting to collect data and employ deep learning in automated GUIs development to enhance developers' productivity. There are currently several publicly accessible GUI page datasets for phones, but none for pairwise GUIs between phones and tablets. This poses a significant barrier to the employment of deep learning in automated GUI development. In this paper, we introduce the Papt dataset, a pioneering pairwise GUI dataset tailored for Android phones and tablets, encompassing 10,035 phone-tablet GUI page pairs sourced from 5,593 unique app pairs. We propose novel pairwise GUI collection approaches for constructing this dataset and delineate its advantages over currently prevailing datasets in the field. Through preliminary experiments on this dataset, we analyze the present challenges of utilizing deep learning in automated GUI development.

A Unified Generalization Analysis of Re-Weighting and Logit-Adjustment for Imbalanced Learning

  • paper_url: http://arxiv.org/abs/2310.04752
  • repo_url: https://github.com/wang22ti/DDC
  • paper_authors: Zitai Wang, Qianqian Xu, Zhiyong Yang, Yuan He, Xiaochun Cao, Qingming Huang
  • for: Mitigating class-imbalance bias in learning.
  • methods: Modifying the loss function, e.g., re-weighting the losses or adjusting the logits with class-dependent terms (a logit-adjustment sketch follows this entry).
  • results: Proposes a data-dependent contraction technique and establishes a fine-grained generalization bound that helps explain the empirical behavior of re-weighting and logit adjustment in a unified manner.
    Abstract Real-world datasets are typically imbalanced in the sense that only a few classes have numerous samples, while many classes are associated with only a few samples. As a result, a na\"ive ERM learning process will be biased towards the majority classes, making it difficult to generalize to the minority classes. To address this issue, one simple but effective approach is to modify the loss function to emphasize the learning on minority classes, such as re-weighting the losses or adjusting the logits via class-dependent terms. However, existing generalization analysis of such losses is still coarse-grained and fragmented, failing to explain some empirical results. To bridge this gap, we propose a novel technique named data-dependent contraction to capture how these modified losses handle different classes. On top of this technique, a fine-grained generalization bound is established for imbalanced learning, which helps reveal the mystery of re-weighting and logit-adjustment in a unified manner. Furthermore, a principled learning algorithm is developed based on the theoretical insights. Finally, the empirical results on benchmark datasets not only validate the theoretical results but also demonstrate the effectiveness of the proposed method.
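
For context, the two loss families the paper analyzes, re-weighting and logit adjustment, can be written compactly. The sketch below shows generic versions (class-balanced re-weighting and prior-based logit adjustment); it is not the paper's proposed algorithm, and the names and default values are ours.

```python
import torch
import torch.nn.functional as F

def logit_adjusted_loss(logits, targets, class_counts, tau=1.0):
    """Cross-entropy with class-prior logit adjustment: adds tau * log(pi_c) to each
    class logit so that rare classes are not systematically under-predicted."""
    priors = class_counts.float() / class_counts.sum()
    adjusted = logits + tau * torch.log(priors).unsqueeze(0)   # broadcast over the batch
    return F.cross_entropy(adjusted, targets)

def reweighted_loss(logits, targets, class_counts, beta=0.999):
    """Class-balanced re-weighting based on the 'effective number of samples'."""
    effective = 1.0 - torch.pow(beta, class_counts.float())
    weights = (1.0 - beta) / effective
    weights = weights / weights.sum() * len(class_counts)
    return F.cross_entropy(logits, targets, weight=weights)

# Toy usage on an imbalanced 3-class problem.
counts = torch.tensor([1000, 100, 10])
logits, targets = torch.randn(8, 3), torch.randint(0, 3, (8,))
print(logit_adjusted_loss(logits, targets, counts), reweighted_loss(logits, targets, counts))
```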

DiffNAS: Bootstrapping Diffusion Models by Prompting for Better Architectures

  • paper_url: http://arxiv.org/abs/2310.04750
  • repo_url: None
  • paper_authors: Wenhao Li, Xiu Su, Shan You, Fei Wang, Chen Qian, Chang Xu
  • for: This paper focuses on improving the efficiency and performance of diffusion models for image synthesis.
  • methods: The authors propose a base model search approach called "DiffNAS," which leverages GPT-4 as a supernet and employs a search memory to enhance the results. They also use RFID as a proxy to quickly rank the experimental outcomes produced by GPT-4.
  • results: The authors' algorithm can augment the search efficiency by 2 times under GPT-based scenarios and achieve a performance of 2.82 with 0.37 improvement in FID on CIFAR10 relative to the benchmark IDDPM algorithm.
    Abstract Diffusion models have recently exhibited remarkable performance on synthetic data. After a diffusion path is selected, a base model, such as UNet, operates as a denoising autoencoder, primarily predicting noises that need to be eliminated step by step. Consequently, it is crucial to employ a model that aligns with the expected budgets to facilitate superior synthetic performance. In this paper, we meticulously analyze the diffusion model and engineer a base model search approach, denoted "DiffNAS". Specifically, we leverage GPT-4 as a supernet to expedite the search, supplemented with a search memory to enhance the results. Moreover, we employ RFID as a proxy to promptly rank the experimental outcomes produced by GPT-4. We also adopt a rapid-convergence training strategy to boost search efficiency. Rigorous experimentation corroborates that our algorithm can augment the search efficiency by 2 times under GPT-based scenarios, while also attaining a performance of 2.82 with 0.37 improvement in FID on CIFAR10 relative to the benchmark IDDPM algorithm.

ConvNeXtv2 Fusion with Mask R-CNN for Automatic Region Based Coronary Artery Stenosis Detection for Disease Diagnosis

  • paper_url: http://arxiv.org/abs/2310.04749
  • repo_url: None
  • paper_authors: Sandesh Pokhrel, Sanjay Bhandari, Eduard Vazquez, Yash Raj Shrestha, Binod Bhattarai
  • for: automating the manual detection of stenotic lesions in coronary arteries
  • methods: employing a specialized Convnext-V2 backbone based Mask RCNN model pre-trained for instance segmentation tasks
  • results: achieving a substantial F1 score of 0.5353 in identifying stenotic lesions
    Abstract Coronary Artery Diseases although preventable are one of the leading cause of mortality worldwide. Due to the onerous nature of diagnosis, tackling CADs has proved challenging. This study addresses the automation of resource-intensive and time-consuming process of manually detecting stenotic lesions in coronary arteries in X-ray coronary angiography images. To overcome this challenge, we employ a specialized Convnext-V2 backbone based Mask RCNN model pre-trained for instance segmentation tasks. Our empirical findings affirm that the proposed model exhibits commendable performance in identifying stenotic lesions. Notably, our approach achieves a substantial F1 score of 0.5353 in this demanding task, underscoring its effectiveness in streamlining this intensive process.

Towards Dynamic and Small Objects Refinement for Unsupervised Domain Adaptative Nighttime Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2310.04747
  • repo_url: None
  • paper_authors: Jingyi Pan, Sihang Li, Yucheng Chen, Jinjing Zhu, Lin Wang
  • for: Proposing a novel unsupervised domain adaptation method for nighttime semantic segmentation to handle the challenges of poorly lit scenes.
  • methods: Uses a dynamic and small object refinement module to transfer knowledge about such objects from the source domain to the target nighttime domain, plus a feature prototype alignment module based on contrastive learning to reduce the domain gap (a prototype-contrastive sketch follows this entry).
  • results: Experiments on four benchmark datasets show that the method improves nighttime semantic segmentation accuracy by a large margin over prior methods.
    Abstract Nighttime semantic segmentation is essential for various applications, e.g., autonomous driving, which often faces challenges due to poor illumination and the lack of well-annotated datasets. Unsupervised domain adaptation (UDA) has shown potential for addressing the challenges and achieved remarkable results for nighttime semantic segmentation. However, existing methods still face limitations in 1) their reliance on style transfer or relighting models, which struggle to generalize to complex nighttime environments, and 2) their ignorance of dynamic and small objects like vehicles and traffic signs, which are difficult to be directly learned from other domains. This paper proposes a novel UDA method that refines both label and feature levels for dynamic and small objects for nighttime semantic segmentation. First, we propose a dynamic and small object refinement module to complement the knowledge of dynamic and small objects from the source domain to target nighttime domain. These dynamic and small objects are normally context-inconsistent in under-exposed conditions. Then, we design a feature prototype alignment module to reduce the domain gap by deploying contrastive learning between features and prototypes of the same class from different domains, while re-weighting the categories of dynamic and small objects. Extensive experiments on four benchmark datasets demonstrate that our method outperforms prior arts by a large margin for nighttime segmentation. Project page: https://rorisis.github.io/DSRNSS/.
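
The paper's exact prototype construction is not spelled out in the abstract, so the following is only a minimal sketch of the general idea behind feature-prototype contrastive alignment, with an optional up-weighting of dynamic/small classes; all names are ours.

```python
import torch
import torch.nn.functional as F

def prototype_contrastive_loss(features, labels, prototypes, class_weights=None, temperature=0.1):
    """Pull pixel/region features toward the prototype of their own class and away
    from the prototypes of other classes.

    features      : (N, D) feature vectors from either domain
    labels        : (N,)   class indices (pseudo-labels on the target domain)
    prototypes    : (C, D) per-class prototypes, e.g. running means of source features
    class_weights : optional (C,) weights used to emphasize dynamic/small classes
    """
    features = F.normalize(features, dim=1)
    prototypes = F.normalize(prototypes, dim=1)
    logits = features @ prototypes.t() / temperature        # (N, C) cosine similarities
    return F.cross_entropy(logits, labels, weight=class_weights)

# Toy usage.
feats = torch.randn(16, 32)
labels = torch.randint(0, 5, (16,))
protos = torch.randn(5, 32)
print(prototype_contrastive_loss(feats, labels, protos))
```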

Task Aware Modulation using Representation Learning: An Approach for Few Shot Learning in Heterogeneous Systems

  • paper_url: http://arxiv.org/abs/2310.04727
  • repo_url: None
  • paper_authors: Arvind Renganathan, Rahul Ghosh, Ankush Khandelwal, Vipin Kumar
  • for: Improving personalized predictions in few-shot settings for heterogeneous systems, particularly when individual task characteristics are unknown.
  • methods: A Task-aware Modulation using Representation Learning (TAM-RL) framework that extracts embeddings of each entity's inherent characteristics and uses them to modulate (personalize) the predictions for each entity/task (a minimal modulation sketch follows this entry).
  • results: On real-world hydrology and flux-tower benchmarks, TAM-RL significantly outperforms MAML and multi-modal MAML (MMAML) while being faster and simpler to train, since it removes sensitive hyperparameters such as inner-loop steps and inner-loop learning rates. An empirical study on synthetic data further shows that TAM-RL improves predictive performance most when distinct representations can be learned for different tasks.
    Abstract We present a Task-aware modulation using Representation Learning (TAM-RL) framework that enhances personalized predictions in few-shot settings for heterogeneous systems when individual task characteristics are not known. TAM-RL extracts embeddings representing the actual inherent characteristics of these entities and uses these characteristics to personalize the predictions for each entity/task. Using real-world hydrological and flux tower benchmark data sets, we show that TAM-RL can significantly outperform existing baseline approaches such as MAML and multi-modal MAML (MMAML) while being much faster and simpler to train due to less complexity. Specifically, TAM-RL eliminates the need for sensitive hyper-parameters like inner loop steps and inner loop learning rate, which are crucial for model convergence in MAML, MMAML. We further present an empirical evaluation via synthetic data to explore the impact of heterogeneity amongst the entities on the relative performance of MAML, MMAML, and TAM-RL. We show that TAM-RL significantly improves predictive performance for cases where it is possible to learn distinct representations for different tasks.
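
The abstract does not describe the modulation mechanism in detail, so the sketch below shows one common way to realize task-aware modulation: encode the entity's characteristics into an embedding and apply FiLM-style scale-and-shift to a shared network. It is an illustrative assumption, not the authors' exact architecture, and all names are ours.

```python
import torch
import torch.nn as nn

class TaskModulatedRegressor(nn.Module):
    """Encode an entity/task into an embedding and use it to modulate a shared
    forecasting network via feature-wise scale-and-shift (FiLM-style) modulation."""

    def __init__(self, task_feat_dim, input_dim, hidden=64, embed_dim=16):
        super().__init__()
        self.task_encoder = nn.Sequential(nn.Linear(task_feat_dim, embed_dim), nn.ReLU())
        self.film = nn.Linear(embed_dim, 2 * hidden)      # produces per-feature (gamma, beta)
        self.backbone = nn.Linear(input_dim, hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x, task_feats):
        z = self.task_encoder(task_feats)                 # task/entity embedding
        gamma, beta = self.film(z).chunk(2, dim=-1)
        h = torch.relu(gamma * self.backbone(x) + beta)   # task-conditioned features
        return self.head(h)

# Toy usage: 4 samples, 6 input features, 10 task descriptors per sample.
model = TaskModulatedRegressor(task_feat_dim=10, input_dim=6)
x, task_feats = torch.randn(4, 6), torch.randn(4, 10)
print(model(x, task_feats).shape)   # torch.Size([4, 1])
```

Because the whole model is trained jointly across entities, there is no inner-loop adaptation and hence no inner-loop step count or learning rate to tune, which is the simplicity advantage the results bullet refers to.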

A Holistic Evaluation of Piano Sound Quality

  • paper_url: http://arxiv.org/abs/2310.04722
  • repo_url: None
  • paper_authors: Monan Zhou, Shangda Wu, Shaohua Ji, Zijin Li, Wei Li
  • for: Developing a holistic evaluation method for piano sound quality to assist purchasing decisions.
  • methods: Uses subjective questionnaires to derive a quality evaluation system and Convolutional Neural Networks (CNNs) for classification; Equivalent Rectangular Bandwidth (ERB) analysis is applied to improve model interpretability.
  • results: Musically trained listeners distinguish timbre differences between pianos better than untrained ones, and the best fine-tuned CNN backbone reaches 98.3% accuracy as a piano classifier. Because the dataset is limited and the audio is sliced to increase its quantity, the data lack diversity and balance, so focal loss is used to reduce the impact of the imbalance (a focal-loss sketch follows this entry).
    Abstract This paper aims to develop a holistic evaluation method for piano sound quality to assist in purchasing decisions. Unlike previous studies that focused on the effect of piano performance techniques on sound quality, this study evaluates the inherent sound quality of different pianos. To derive quality evaluation systems, the study uses subjective questionnaires based on a piano sound quality dataset. The method selects the optimal piano classification models by comparing the fine-tuning results of different pre-training models of Convolutional Neural Networks (CNN). To improve the interpretability of the models, the study applies Equivalent Rectangular Bandwidth (ERB) analysis. The results reveal that musically trained individuals are better able to distinguish between the sound quality differences of different pianos. The best fine-tuned CNN pre-trained backbone achieves a high accuracy of 98.3\% as the piano classifier. However, the dataset is limited, and the audio is sliced to increase its quantity, resulting in a lack of diversity and balance, so we use focal loss to reduce the impact of data imbalance. To optimize the method, the dataset will be expanded, or few-shot learning techniques will be employed in future research.
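
The focal loss mentioned in the results is standard and can be sketched as follows; gamma and the optional per-class weights are placeholders, not values from the paper.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=None):
    """Multi-class focal loss: down-weights well-classified examples so that
    training focuses on hard, typically minority-class, samples.

    logits  : (N, C) raw class scores
    targets : (N,)   class indices
    gamma   : focusing parameter (gamma = 0 recovers plain cross-entropy)
    alpha   : optional (C,) per-class weights
    """
    log_probs = F.log_softmax(logits, dim=1)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)   # log p_t
    pt = log_pt.exp()
    loss = -((1.0 - pt) ** gamma) * log_pt
    if alpha is not None:
        loss = alpha[targets] * loss
    return loss.mean()

# Toy usage.
logits, targets = torch.randn(8, 4), torch.randint(0, 4, (8,))
print(focal_loss(logits, targets))
```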

EdgeFD: An Edge-Friendly Drift-Aware Fault Diagnosis System for Industrial IoT

  • paper_url: http://arxiv.org/abs/2310.04704
  • repo_url: None
  • paper_authors: Chen Jiao, Mao Fengjian, Lv Zuohong, Tang Jianhua
  • for: Addressing data drift in transfer learning (TL) approaches for industrial intelligent fault diagnosis (FD) on edge devices.
  • methods: Proposes Drift-Aware Weight Consolidation (DAWC) for fast and effective fault diagnosis on edge nodes: drift is detected from classifier confidence, and a continual-learning module consolidates important weights (with importance estimated via the Fisher Information Matrix) to gradually build generalization across drift scenarios without constant fine-tuning (a consolidation-penalty sketch follows this entry).
  • results: Experiments show that DAWC outperforms existing techniques while respecting edge-computing constraints; a comprehensive diagnosis and visualization platform is also developed.
    Abstract Recent transfer learning (TL) approaches in industrial intelligent fault diagnosis (FD) mostly follow the "pre-train and fine-tuning" paradigm to address data drift, which emerges from variable working conditions. However, we find that this approach is prone to the phenomenon known as catastrophic forgetting. Furthermore, performing frequent models fine-tuning on the resource-constrained edge nodes can be computationally expensive and unnecessary, given the excellent transferability demonstrated by existing models. In this work, we propose the Drift-Aware Weight Consolidation (DAWC), a method optimized for edge deployments, mitigating the challenges posed by frequent data drift in the industrial Internet of Things (IIoT). DAWC efficiently manages multiple data drift scenarios, minimizing the need for constant model fine-tuning on edge devices, thereby conserving computational resources. By detecting drift using classifier confidence and estimating parameter importance with the Fisher Information Matrix, a tool that measures parameter sensitivity in probabilistic models, we introduce a drift detection module and a continual learning module to gradually equip the FD model with powerful generalization capabilities. Experimental results demonstrate that our proposed DAWC achieves superior performance compared to existing techniques while also ensuring compatibility with edge computing constraints. Additionally, we have developed a comprehensive diagnosis and visualization platform.
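
DAWC's continual-learning step builds on the Fisher Information Matrix; a minimal sketch of the classic diagonal-Fisher estimate and the resulting elastic-weight-consolidation-style penalty is shown below. The drift-detection module and DAWC's exact weighting are not reproduced, and all names are ours.

```python
import torch
import torch.nn.functional as F

def estimate_fisher_diagonal(model, data_loader, loss_fn):
    """Diagonal Fisher Information estimate: average squared gradients of the loss,
    used as a per-parameter importance score."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters() if p.requires_grad}
    n_batches = 0
    for x, y in data_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
        n_batches += 1
    return {n: f / max(n_batches, 1) for n, f in fisher.items()}

def consolidation_penalty(model, fisher, old_params, lam=100.0):
    """EWC-style penalty that discourages moving parameters that were important
    (high Fisher value) for previously seen operating conditions."""
    penalty = 0.0
    for n, p in model.named_parameters():
        if n in fisher:
            penalty = penalty + (fisher[n] * (p - old_params[n]) ** 2).sum()
    return 0.5 * lam * penalty

# Toy usage on a linear classifier with random data.
model = torch.nn.Linear(5, 3)
loader = [(torch.randn(16, 5), torch.randint(0, 3, (16,))) for _ in range(4)]
fisher = estimate_fisher_diagonal(model, loader, F.cross_entropy)
old = {n: p.detach().clone() for n, p in model.named_parameters()}
print(consolidation_penalty(model, fisher, old))   # zero until the model is adapted further
```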

Serving Deep Learning Model in Relational Databases

  • paper_url: http://arxiv.org/abs/2310.04696
  • repo_url: None
  • paper_authors: Alexandre Eichenberger, Qi Lin, Saif Masood, Hong Min, Alexander Sim, Jie Wang, Yida Wang, Kesheng Wu, Binhang Yuan, Lixi Zhou, Jia Zou
  • for: Exploring how to serve deep learning (DL) models over relational data to meet the needs of diverse commercial and scientific domains.
  • methods: Surveys three pivotal architectures: the DL-centric architecture, the UDF-centric architecture, and the relation-centric architecture. Each is promising in specific scenarios, but seamlessly integrating them, and finding the middle ground between them, remains an open challenge (a toy UDF-centric sketch follows this entry).
  • results: Identifies the integration gaps among these architectures and explores innovative strategies to close them, pointing toward a novel database system that can support a broad class of data-intensive DL inference applications.
    Abstract Serving deep learning (DL) models on relational data has become a critical requirement across diverse commercial and scientific domains, sparking growing interest recently. In this visionary paper, we embark on a comprehensive exploration of representative architectures to address the requirement. We highlight three pivotal paradigms: The state-of-the-artDL-Centricarchitecture offloadsDL computations to dedicated DL frameworks. The potential UDF-Centric architecture encapsulates one or more tensor computations into User Defined Functions (UDFs) within the database system. The potentialRelation-Centricarchitecture aims to represent a large-scale tensor computation through relational operators. While each of these architectures demonstrates promise in specific use scenarios, we identify urgent requirements for seamless integration of these architectures and the middle ground between these architectures. We delve into the gaps that impede the integration and explore innovative strategies to close them. We present a pathway to establish a novel database system for enabling a broad class of data-intensive DL inference applications.
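
To make the UDF-centric idea concrete, the toy sketch below registers a trivial rule-based "model" as a scalar UDF in SQLite so that inference runs inside a SQL query. The churn_score function is a hypothetical stand-in; a real deployment would wrap a tensor runtime (e.g. an ONNX or TorchScript model loaded once per connection) instead.

```python
import sqlite3

# Hypothetical stand-in for a trained model.
def churn_score(age, balance):
    return 1.0 if (age < 30 and balance < 1000) else 0.0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, age INTEGER, balance REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, 25, 500.0), (2, 45, 9000.0)])

# Register the model as a scalar UDF so inference happens inside the SQL engine.
conn.create_function("predict_churn", 2, churn_score)

for row in conn.execute("SELECT id, predict_churn(age, balance) FROM customers"):
    print(row)   # (1, 1.0), (2, 0.0)
```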

Robustness-enhanced Uplift Modeling with Adversarial Feature Desensitization

  • paper_url: http://arxiv.org/abs/2310.04693
  • repo_url: None
  • paper_authors: Zexu Sun, Bowei He, Ming Ma, Jiakai Tang, Yuchen Wang, Chen Ma, Dugang Liu
  • for: Addressing the robustness challenge of uplift models in practical online-marketing applications: the paper offers a possible explanation (feature sensitivity) and adds two dedicated modules to enhance robustness.
  • methods: Proposes a feature selection module with joint multi-label modeling to identify a key feature subset, and an adversarial feature desensitization module that applies adversarial training and soft interpolation to that subset (an adversarial-perturbation sketch follows this entry).
  • results: Extensive experiments on a public dataset and a real product dataset show that RUAD alleviates the feature sensitivity of uplift models in online advertising while remaining compatible with different uplift models.
    Abstract Uplift modeling has shown very promising results in online marketing. However, most existing works are prone to the robustness challenge in some practical applications. In this paper, we first present a possible explanation for the above phenomenon. We verify that there is a feature sensitivity problem in online marketing using different real-world datasets, where the perturbation of some key features will seriously affect the performance of the uplift model and even cause the opposite trend. To solve the above problem, we propose a novel robustness-enhanced uplift modeling framework with adversarial feature desensitization (RUAD). Specifically, our RUAD can more effectively alleviate the feature sensitivity of the uplift model through two customized modules, including a feature selection module with joint multi-label modeling to identify a key subset from the input features and an adversarial feature desensitization module using adversarial training and soft interpolation operations to enhance the robustness of the model against this selected subset of features. Finally, we conduct extensive experiments on a public dataset and a real product dataset to verify the effectiveness of our RUAD in online marketing. In addition, we also demonstrate the robustness of our RUAD to the feature sensitivity, as well as the compatibility with different uplift models.
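
The abstract describes adversarial training plus soft interpolation on a selected feature subset; as a rough illustration only, the sketch below perturbs just the selected features with a standard FGSM step. It is not RUAD's exact desensitization module or uplift loss, and the names are ours.

```python
import torch
import torch.nn.functional as F

def perturb_selected_features(model, x, y, loss_fn, feature_mask, epsilon=0.05):
    """FGSM-style perturbation restricted to a selected subset of input features.

    feature_mask : (D,) 0/1 tensor marking the sensitive features to perturb.
    Training on (x_adv, y) in addition to (x, y) desensitizes the model to small
    shifts in those features.
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    x_adv = x_adv + epsilon * grad.sign() * feature_mask   # only move the chosen features
    return x_adv.detach()

# Toy usage: perturb only the first two of six features.
model = torch.nn.Sequential(torch.nn.Linear(6, 2))
x, y = torch.randn(10, 6), torch.randint(0, 2, (10,))
mask = torch.tensor([1., 1., 0., 0., 0., 0.])
x_adv = perturb_selected_features(model, x, y, F.cross_entropy, mask)
print((x_adv - x).abs().max(dim=0).values)   # nonzero only on the masked features
```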

Understanding and Improving Adversarial Attacks on Latent Diffusion Model

  • paper_url: http://arxiv.org/abs/2310.04687
  • repo_url: https://github.com/caradryanliang/improvedadvdm
  • paper_authors: Boyang Zheng, Chumeng Liang, Xiaoyu Wu, Yan Liu
  • for: Protecting personal images and data privacy against unauthorized artwork replication and misinformation generation.
  • methods: A theoretically grounded adversarial attack on latent diffusion models (LDMs) that uses a unified target to guide the attack in both the forward and the reverse diffusion process.
  • results: Compared with existing methods, the proposed attack is stronger and more effective, overcomes the offset problem in optimizing prior attacks, and generalizes across different state-of-the-art few-shot generation pipelines built on LDMs.
    Abstract Latent Diffusion Model (LDM) has emerged as a leading tool in image generation, particularly with its capability in few-shot generation. This capability also presents risks, notably in unauthorized artwork replication and misinformation generation. In response, adversarial attacks have been designed to safeguard personal images from being used as reference data. However, existing adversarial attacks are predominantly empirical, lacking a solid theoretical foundation. In this paper, we introduce a comprehensive theoretical framework for understanding adversarial attacks on LDM. Based on the framework, we propose a novel adversarial attack that exploits a unified target to guide the adversarial attack both in the forward and the reverse process of LDM. We provide empirical evidences that our method overcomes the offset problem of the optimization of adversarial attacks in existing methods. Through rigorous experiments, our findings demonstrate that our method outperforms current attacks and is able to generalize over different state-of-the-art few-shot generation pipelines based on LDM. Our method can serve as a stronger and efficient tool for people exposed to the risk of data privacy and security to protect themselves in the new era of powerful generative models. The code is available on GitHub: https://github.com/CaradryanLiang/ImprovedAdvDM.git.

Data-Centric Financial Large Language Models

  • paper_url: http://arxiv.org/abs/2310.17784
  • repo_url: None
  • paper_authors: Zhixuan Chu, Huaiyu Guo, Xinyuan Zhou, Yijia Wang, Fei Yu, Hong Chen, Wanqing Xu, Xin Lu, Qing Cui, Longfei Li, Jun Zhou, Sheng Li
  • for: This paper aims to improve the performance of large language models (LLMs) in financial tasks by using a data-centric approach and multitask prompt-based finetuning.
  • methods: The proposed method uses a financial LLM (FLLM) and abductive augmentation reasoning (AAR) to generate training data and preprocess the input data.
  • results: The data-centric FLLM with AAR achieves state-of-the-art performance on financial analysis and interpretation tasks, outperforming baseline financial LLMs designed for raw text. Additionally, a new benchmark for financial analysis and interpretation is open-sourced.
    Abstract Large language models (LLMs) show promise for natural language tasks but struggle when applied directly to complex domains like finance. LLMs have difficulty reasoning about and integrating all relevant information. We propose a data-centric approach to enable LLMs to better handle financial tasks. Our key insight is that rather than overloading the LLM with everything at once, it is more effective to preprocess and pre-understand the data. We create a financial LLM (FLLM) using multitask prompt-based finetuning to achieve data pre-processing and pre-understanding. However, labeled data is scarce for each task. To overcome manual annotation costs, we employ abductive augmentation reasoning (AAR) to automatically generate training data by modifying the pseudo labels from FLLM's own outputs. Experiments show our data-centric FLLM with AAR substantially outperforms baseline financial LLMs designed for raw text, achieving state-of-the-art on financial analysis and interpretation tasks. We also open source a new benchmark for financial analysis and interpretation. Our methodology provides a promising path to unlock LLMs' potential for complex real-world domains.

Automatic and Efficient Customization of Neural Networks for ML Applications

  • paper_url: http://arxiv.org/abs/2310.04685
  • repo_url: None
  • paper_authors: Yuhan Liu, Chengcheng Wan, Kuntai Du, Henry Hoffmann, Junchen Jiang, Shan Lu, Michael Maire
  • for: 这种 исследование旨在解决现有的机器学习(ML)API问题,即不同应用程序对ML API输出的不同响应。
  • methods: 该研究使用了77个实际应用程序,总共使用了6个ML API提供商的API,以探索这些应用程序如何使用ML API输出来影响它们的决策过程。
  • results: 研究发现,使用ChameleonAPI优化框架可以减少不正确的应用程序决策数量,相比基准值,减少了43%。
    Abstract ML APIs have greatly relieved application developers of the burden to design and train their own neural network models -- classifying objects in an image can now be as simple as one line of Python code to call an API. However, these APIs offer the same pre-trained models regardless of how their output is used by different applications. This can be suboptimal as not all ML inference errors can cause application failures, and the distinction between inference errors that can or cannot cause failures varies greatly across applications. To tackle this problem, we first study 77 real-world applications, which collectively use six ML APIs from two providers, to reveal common patterns of how ML API output affects applications' decision processes. Inspired by the findings, we propose ChameleonAPI, an optimization framework for ML APIs, which takes effect without changing the application source code. ChameleonAPI provides application developers with a parser that automatically analyzes the application to produce an abstract of its decision process, which is then used to devise an application-specific loss function that only penalizes API output errors critical to the application. ChameleonAPI uses the loss function to efficiently train a neural network model customized for each application and deploys it to serve API invocations from the respective application via existing interface. Compared to a baseline that selects the best-of-all commercial ML API, we show that ChameleonAPI reduces incorrect application decisions by 43%.
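
ChameleonAPI derives an application-specific loss from a parsed abstraction of the app's decision process. As a simplified illustration of the underlying idea, the sketch below up-weights only the misclassifications that would flip a downstream decision; the critical_pairs set, the weights, and the function names are hypothetical placeholders, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def application_aware_loss(logits, targets, critical_pairs, extra_weight=4.0):
    """Cross-entropy that extra-penalizes only the errors critical to the application.

    critical_pairs : set of (true_class, predicted_class) pairs whose confusion would
                     change the downstream application's decision.
    """
    base = F.cross_entropy(logits, targets, reduction="none")
    preds = logits.argmax(dim=1)
    critical = torch.tensor(
        [(t.item(), p.item()) in critical_pairs for t, p in zip(targets, preds)],
        dtype=torch.float32, device=logits.device,
    )
    weights = 1.0 + extra_weight * critical      # up-weight decision-changing errors
    return (weights * base).mean()

# Toy usage: confusing class 0 with class 2 (either direction) changes the app's decision.
logits, targets = torch.randn(6, 3), torch.randint(0, 3, (6,))
print(application_aware_loss(logits, targets, {(0, 2), (2, 0)}))
```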

VoiceExtender: Short-utterance Text-independent Speaker Verification with Guided Diffusion Model

  • paper_url: http://arxiv.org/abs/2310.04681
  • repo_url: None
  • paper_authors: Yayun He, Zuheng Kang, Jianzong Wang, Junqing Peng, Jing Xiao
  • for: Improving speaker verification (SV) performance, particularly when handling short-duration speech signals.
  • methods: Proposes a new architecture, VoiceExtender, built on two guided diffusion models (a built-in and an external speaker-embedding (SE) guided diffusion model), both of which use a diffusion-model-based sample generator with SE guidance to augment the speech features extracted from a short utterance.
  • results: Extensive experiments on the VoxCeleb1 dataset show relative equal error rate (EER) improvements of 46.1%, 35.7%, 10.4%, and 5.7% over the baseline for short-utterance conditions of 0.5, 1.0, 1.5, and 2.0 seconds, respectively.
    Abstract Speaker verification (SV) performance deteriorates as utterances become shorter. To this end, we propose a new architecture called VoiceExtender which provides a promising solution for improving SV performance when handling short-duration speech signals. We use two guided diffusion models, the built-in and the external speaker embedding (SE) guided diffusion model, both of which utilize a diffusion model-based sample generator that leverages SE guidance to augment the speech features based on a short utterance. Extensive experimental results on the VoxCeleb1 dataset show that our method outperforms the baseline, with relative improvements in equal error rate (EER) of 46.1%, 35.7%, 10.4%, and 5.7% for the short utterance conditions of 0.5, 1.0, 1.5, and 2.0 seconds, respectively.

The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning

  • paper_url: http://arxiv.org/abs/2310.04680
  • repo_url: None
  • paper_authors: Tian Jin, Nolan Clement, Xin Dong, Vaishnavh Nagarajan, Michael Carbin, Jonathan Ragan-Kelley, Gintare Karolina Dziugaite
  • for: Studying how scaling down the number of parameters in large language models (LLMs) affects their core capabilities.
  • methods: Compares two natural scaling techniques, weight pruning and training smaller or larger models (dense scaling), and analyzes their effect on two core capabilities: recalling facts presented during pre-training and processing information presented in-context at inference time (a magnitude-pruning sketch follows this entry).
  • results: Reducing model size by more than 30% (via either approach) significantly degrades recall of facts seen in pre-training, yet a 60-70% reduction largely preserves in-context abilities such as retrieving answers from a long context and learning parameterized functions from in-context exemplars. This indicates that scaling model size has inherently different effects on fact recall and in-context learning.
    Abstract How does scaling the number of parameters in large language models (LLMs) affect their core capabilities? We study two natural scaling techniques -- weight pruning and simply training a smaller or larger model, which we refer to as dense scaling -- and their effects on two core capabilities of LLMs: (a) recalling facts presented during pre-training and (b) processing information presented in-context during inference. By curating a suite of tasks that help disentangle these two capabilities, we find a striking difference in how these two abilities evolve due to scaling. Reducing the model size by more than 30\% (via either scaling approach) significantly decreases the ability to recall facts seen in pre-training. Yet, a 60--70\% reduction largely preserves the various ways the model can process in-context information, ranging from retrieving answers from a long context to learning parameterized functions from in-context exemplars. The fact that both dense scaling and weight pruning exhibit this behavior suggests that scaling model size has an inherently disparate effect on fact recall and in-context learning.
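
The abstract does not name the specific pruning method, so the sketch below uses plain global magnitude pruning on a toy module purely to illustrate the weight-pruning side of the comparison; the module and the 60% sparsity level are arbitrary choices of ours.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for a transformer block; in practice this would be applied to the
# linear layers of a pre-trained LLM.
model = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))

# Global unstructured magnitude pruning: zero out the 60% of weights with the
# smallest absolute value across all listed layers.
parameters_to_prune = [(m, "weight") for m in model if isinstance(m, nn.Linear)]
prune.global_unstructured(parameters_to_prune,
                          pruning_method=prune.L1Unstructured, amount=0.6)

# Make the pruning permanent (fold the masks into the weights).
for module, name in parameters_to_prune:
    prune.remove(module, name)

total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"sparsity: {zeros / total:.2%}")
```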

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

  • paper_url: http://arxiv.org/abs/2310.04673
  • repo_url: None
  • paper_authors: Jiaming Wang, Zhihao Du, Qian Chen, Yunfei Chu, Zhifu Gao, Zerui Li, Kai Hu, Xiaohuan Zhou, Jin Xu, Ziyang Ma, Wen Wang, Siqi Zheng, Chang Zhou, Zhijie Yan, Shiliang Zhang
  • for: Proposing a unified GPT-style model, built on the Transformer framework, for audio recognition, understanding, and generation.
  • methods: Encodes input audio with a combination of continuous and discrete features, then fine-tunes a large decoder-only Transformer language model with supervised multitask learning across audio-to-text, text-to-audio, audio-to-audio, and text-to-text tasks.
  • results: Experiments show that LauraGPT achieves competitive or superior performance compared to existing state-of-the-art models on a range of audio-processing benchmarks.
    Abstract Generative Pre-trained Transformer (GPT) models have achieved remarkable performance on various natural language processing tasks. However, there has been limited research on applying similar frameworks to audio tasks. Previously proposed large language models for audio tasks either lack sufficient quantitative evaluations, or are limited to tasks for recognizing and understanding audio content, or significantly underperform existing state-of-the-art (SOTA) models. In this paper, we propose LauraGPT, a unified GPT model for audio recognition, understanding, and generation. LauraGPT is a versatile language model that can process both audio and text inputs and generate outputs in either modalities. It can perform a wide range of tasks related to content, semantics, paralinguistics, and audio-signal analysis. Some of its noteworthy tasks include automatic speech recognition, speech-to-text translation, text-to-speech synthesis, machine translation, speech enhancement, automated audio captioning, speech emotion recognition, and spoken language understanding. To achieve this goal, we use a combination of continuous and discrete features for audio. We encode input audio into continuous representations using an audio encoder and decode output audio from discrete codec codes. We then fine-tune a large decoder-only Transformer-based language model on multiple audio-to-text, text-to-audio, audio-to-audio, and text-to-text tasks using a supervised multitask learning approach. Extensive experiments show that LauraGPT achieves competitive or superior performance compared to existing SOTA models on various audio processing benchmarks.

Label-free Node Classification on Graphs with Large Language Models (LLMS)

  • paper_url: http://arxiv.org/abs/2310.04668
  • repo_url: https://github.com/currytang/llmgnn
  • paper_authors: Zhikai Chen, Haitao Mao, Hongzhi Wen, Haoyu Han, Wei Jin, Haiyang Zhang, Hui Liu, Jiliang Tang
  • for: Developing a label-free node classification framework on graphs, LLM-GNN, that requires no human-provided labels.
  • methods: Combines large language models (LLMs) with graph neural networks (GNNs): LLMs annotate a small, actively selected portion of the nodes, and a GNN is then trained on those annotations to predict labels for the remaining nodes, with confidence scores from the LLM guiding node selection and annotation quality (a pipeline sketch follows this entry).
  • results: Experiments show that LLM-GNN reaches 74.9% accuracy on a vast-scale dataset at an annotation cost of less than one dollar.
    Abstract In recent years, there have been remarkable advancements in node classification achieved by Graph Neural Networks (GNNs). However, they necessitate abundant high-quality labels to ensure promising performance. In contrast, Large Language Models (LLMs) exhibit impressive zero-shot proficiency on text-attributed graphs. Yet, they face challenges in efficiently processing structural data and suffer from high inference costs. In light of these observations, this work introduces a label-free node classification on graphs with LLMs pipeline, LLM-GNN. It amalgamates the strengths of both GNNs and LLMs while mitigating their limitations. Specifically, LLMs are leveraged to annotate a small portion of nodes and then GNNs are trained on LLMs' annotations to make predictions for the remaining large portion of nodes. The implementation of LLM-GNN faces a unique challenge: how can we actively select nodes for LLMs to annotate and consequently enhance the GNN training? How can we leverage LLMs to obtain annotations of high quality, representativeness, and diversity, thereby enhancing GNN performance with less cost? To tackle this challenge, we develop an annotation quality heuristic and leverage the confidence scores derived from LLMs to advanced node selection. Comprehensive experimental results validate the effectiveness of LLM-GNN. In particular, LLM-GNN can achieve an accuracy of 74.9% on a vast-scale dataset \products with a cost less than 1 dollar.
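
A minimal version of the annotate-then-train pipeline can be sketched with a small GCN: the LLM's pseudo-labels and confidence scores select a cheap training set, and the GNN propagates those labels to the rest of the graph. The top-k confidence selection below is a simplification of the paper's active node selection, and all names are ours.

```python
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, in_dim, hidden, n_classes):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, n_classes)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)

def train_on_llm_annotations(data, llm_labels, llm_confidence, budget=100, epochs=200):
    """Train a GNN on pseudo-labels produced by an LLM for a small node subset.

    llm_labels     : (N,) pseudo-labels returned by the LLM for candidate nodes
    llm_confidence : (N,) confidence scores parsed from the LLM's responses;
                     the `budget` most confident nodes become the training set.
    """
    train_idx = llm_confidence.topk(budget).indices
    model = GCN(data.num_features, 64, int(llm_labels.max()) + 1)
    opt = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
    for _ in range(epochs):
        opt.zero_grad()
        out = model(data.x, data.edge_index)
        loss = F.cross_entropy(out[train_idx], llm_labels[train_idx])
        loss.backward()
        opt.step()
    return model

# Toy usage with a random graph; llm_labels / llm_conf stand in for parsed LLM responses.
data = Data(x=torch.randn(500, 16), edge_index=torch.randint(0, 500, (2, 2000)))
llm_labels, llm_conf = torch.randint(0, 5, (500,)), torch.rand(500)
model = train_on_llm_annotations(data, llm_labels, llm_conf, budget=50, epochs=30)
```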

HalluciDet: Hallucinating RGB Modality for Person Detection Through Privileged Information

  • paper_url: http://arxiv.org/abs/2310.04662
  • repo_url: None
  • paper_authors: Heitor Rapela Medeiros, Fidel A. Guerrero Pena, Masih Aminbeidokhti, Thomas Dubail, Eric Granger, Marco Pedersoli
  • for: Improving person detection in visual recognition tasks with a large cross-modal shift, such as infrared (IR) aerial imagery versus RGB.
  • methods: Proposes HalluciDet, an IR-to-RGB image translation model that, instead of reconstructing the original scene in the RGB modality, maps the IR input to a new image representation that enhances the objects of interest; it is trained directly to reduce the detection loss of a frozen, pre-trained RGB detector (a training-loop sketch follows this entry).
  • results: Experiments show that HalluciDet substantially improves person detection accuracy, outperforming state-of-the-art image translation methods and fine-tuning on IR images in most cases by exploiting the privileged information encoded in the pre-trained RGB detector.
    Abstract A powerful way to adapt a visual recognition model to a new domain is through image translation. However, common image translation approaches only focus on generating data from the same distribution of the target domain. In visual recognition tasks with complex images, such as pedestrian detection on aerial images with a large cross-modal shift in data distribution from Infrared (IR) to RGB images, a translation focused on generation might lead to poor performance as the loss focuses on irrelevant details for the task. In this paper, we propose HalluciDet, an IR-RGB image translation model for object detection that, instead of focusing on reconstructing the original image on the IR modality, is guided directly on reducing the detection loss of an RGB detector, and therefore avoids the need to access RGB data. This model produces a new image representation that enhances the object of interest in the scene and greatly improves detection performance. We empirically compare our approach against state-of-the-art image translation methods as well as with the commonly used fine-tuning on IR, and show that our method improves detection accuracy in most cases, by exploiting the privileged information encoded in a pre-trained RGB detector.
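Because the translator is trained against the detection loss of a frozen, pre-trained RGB detector rather than a reconstruction loss, the training loop is conceptually simple. Below is a minimal sketch of that idea using a torchvision Faster R-CNN as the privileged detector; the tiny convolutional translator, the optimizer settings, and the dummy data are placeholders, not the authors' architecture.

```python
import torch
import torch.nn as nn
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Hypothetical translator: maps a 1-channel IR image to a 3-channel "hallucinated RGB" image.
translator = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1), nn.Sigmoid(),
)

detector = fasterrcnn_resnet50_fpn(weights="DEFAULT")
for p in detector.parameters():            # privileged RGB detector stays frozen
    p.requires_grad_(False)
detector.train()                           # train mode so it returns a loss dict

opt = torch.optim.Adam(translator.parameters(), lr=1e-4)

# One toy step with a random IR image and one dummy person box.
ir = torch.rand(1, 1, 256, 256)
targets = [{"boxes": torch.tensor([[30.0, 40.0, 120.0, 220.0]]),
            "labels": torch.tensor([1])}]

fake_rgb = translator(ir)
losses = detector([fake_rgb[0]], targets)  # detection losses, not reconstruction
loss = sum(losses.values())
opt.zero_grad()
loss.backward()                            # gradients flow only into the translator
opt.step()
print({k: float(v) for k, v in losses.items()})
```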

Do self-supervised speech and language models extract similar representations as human brain?

  • paper_url: http://arxiv.org/abs/2310.04645
  • repo_url: None
  • paper_authors: Peili Chen, Linyang He, Li Fu, Lu Fan, Edward F. Chang, Yuanning Li
  • for: This paper studies how SSL speech and language models behave during language comprehension and how closely their representations align with brain activity.
  • methods: The authors evaluate brain prediction performance using two representative SSL models, Wav2Vec2.0 and GPT-2, designed for speech and language tasks respectively.
  • results: Both models accurately predict speech responses in the auditory cortex, and their brain predictions are significantly correlated. Shared speech contextual information between the two models accounts for most of the explained variance in brain activity, exceeding static semantic and lower-level acoustic-phonetic information. These findings indicate that speech contextual representations in SSL models converge toward the neural network underlying speech perception.
    Abstract Speech and language models trained through self-supervised learning (SSL) demonstrate strong alignment with brain activity during speech and language perception. However, given their distinct training modalities, it remains unclear whether they correlate with the same neural aspects. We directly address this question by evaluating the brain prediction performance of two representative SSL models, Wav2Vec2.0 and GPT-2, designed for speech and language tasks. Our findings reveal that both models accurately predict speech responses in the auditory cortex, with a significant correlation between their brain predictions. Notably, shared speech contextual information between Wav2Vec2.0 and GPT-2 accounts for the majority of explained variance in brain activity, surpassing static semantic and lower-level acoustic-phonetic information. These results underscore the convergence of speech contextual representations in SSL models and their alignment with the neural network underlying speech perception, offering valuable insights into both SSL models and the neural basis of speech and language processing.
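The abstract does not spell out the encoding analysis, but brain prediction performance for model representations is typically measured by fitting a regularized linear mapping from model activations to neural responses and scoring held-out correlation. The sketch below shows that standard ridge-regression encoding setup on simulated data; the feature dimensions, electrode count, and noise level are arbitrary stand-ins rather than the paper's actual recordings or pipeline.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
T, D_feat, n_electrodes = 2000, 128, 32

# Stand-ins for time-aligned model activations (e.g., a Wav2Vec2.0 or GPT-2 layer)
# and high-gamma responses recorded from auditory cortex.
features = rng.normal(size=(T, D_feat))
weights = rng.normal(size=(D_feat, n_electrodes))
brain = features @ weights + rng.normal(scale=5.0, size=(T, n_electrodes))

X_tr, X_te, y_tr, y_te = train_test_split(features, brain, test_size=0.2, random_state=0)
enc = RidgeCV(alphas=np.logspace(-2, 4, 13)).fit(X_tr, y_tr)
pred = enc.predict(X_te)

# Per-electrode prediction accuracy = correlation between predicted and held-out responses.
r = np.array([np.corrcoef(pred[:, e], y_te[:, e])[0, 1] for e in range(n_electrodes)])
print(f"mean encoding r = {r.mean():.3f}")
```

Comparing such encoding scores (and the overlap in variance explained) across the two models is what lets the authors argue that shared contextual information drives most of the brain prediction.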

Automatic Anonymization of Swiss Federal Supreme Court Rulings

  • paper_url: http://arxiv.org/abs/2310.04632
  • repo_url: None
  • paper_authors: Joel Niklaus, Robin Mamié, Matthias Stürmer, Daniel Brunner, Marcel Gygli
  • for: This study supports the public release of court decisions, which requires proper anonymization to protect all involved parties where necessary.
  • methods: The Swiss Federal Supreme Court's existing system combines traditional computational methods with human experts. This work enhances the existing anonymization software using a large dataset annotated with entities to be anonymized, and compares BERT-based models with models pre-trained on in-domain data.
  • results: Pre-training on in-domain data improves the F1-score by more than 5% over existing models. The study shows that combining existing anonymization methods, such as regular expressions, with machine learning further reduces manual labor and improves automatic suggestions.
    Abstract Releasing court decisions to the public relies on proper anonymization to protect all involved parties, where necessary. The Swiss Federal Supreme Court relies on an existing system that combines different traditional computational methods with human experts. In this work, we enhance the existing anonymization software using a large dataset annotated with entities to be anonymized. We compared BERT-based models with models pre-trained on in-domain data. Our results show that using in-domain data to pre-train the models further improves the F1-score by more than 5% compared to existing models. Our work demonstrates that combining existing anonymization methods, such as regular expressions, with machine learning can further reduce manual labor and enhance automatic suggestions.
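The key design point is that rule-based matches and model-based suggestions are complementary: regular expressions handle rigidly formatted entities, while a fine-tuned token classifier catches free-form mentions such as names. The sketch below illustrates one way to merge the two sources of spans into anonymization suggestions; the patterns, entity types, and merging rule are illustrative and do not reflect the court's actual system or the paper's exact models.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, entity_type)

def regex_spans(text: str) -> List[Span]:
    """Rule-based matches of rigidly formatted entities
    (the date and case-number patterns are illustrative, not the court's real rules)."""
    patterns = {
        "DATE": r"\b\d{1,2}\.\d{1,2}\.\d{4}\b",
        "CASE_NO": r"\b\d[A-Z]_\d+/\d{4}\b",
    }
    return [(m.start(), m.end(), label)
            for label, pat in patterns.items()
            for m in re.finditer(pat, text)]

def merge_spans(rule_spans: List[Span], model_spans: List[Span]) -> List[Span]:
    """Union of rule-based and model-based suggestions; overlapping spans keep the rule label."""
    merged = list(rule_spans)
    for s, e, label in model_spans:
        if not any(s < rule_end and rule_start < e for rule_start, rule_end, _ in rule_spans):
            merged.append((s, e, label))
    return sorted(merged)

def anonymize(text: str, spans: List[Span]) -> str:
    # Replace spans from right to left so earlier offsets stay valid.
    for s, e, label in sorted(spans, reverse=True):
        text = text[:s] + f"[{label}]" + text[e:]
    return text

ruling = "A. Muster appealed on 12.03.2021 in case 6B_1234/2020."
model_spans = [(0, 9, "PERSON")]  # in practice produced by a fine-tuned BERT token classifier
print(anonymize(ruling, merge_spans(regex_spans(ruling), model_spans)))
```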

SERA: Sample Efficient Reward Augmentation in offline-to-online Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.19805
  • repo_url: None
  • paper_authors: Ziqi Zhang, Xiao Xiong, Zifeng Zhuang, Jinxin Liu, Donglin Wang
  • for: The paper aims to improve the performance of online fine-tuning in reinforcement learning (RL) by addressing the issue of diminished exploration in direct fine-tuning of offline pre-trained policies.
  • methods: The proposed method, called Sample Efficient Reward Augmentation (SERA), uses a generalized reward augmentation framework to improve exploration during online fine-tuning. SERA includes two components: State Marginal Matching (SMM) and penalization of out-of-distribution (OOD) state actions.
  • results: The paper demonstrates that SERA consistently and effectively enhances the performance of various offline algorithms in offline-to-online problems, achieving better online fine-tuning results. Additionally, SERA is versatile and can be effortlessly plugged into various RL algorithms to improve online fine-tuning and ensure sustained asymptotic improvement.
    Abstract A prospective application of offline reinforcement learning (RL) involves initializing a policy pre-trained on existing static datasets for subsequent online fine-tuning. However, directly fine-tuning the offline pre-trained policy often results in sub-optimal performance. A primary reason is that offline conservative methods diminish the agent's capability for exploration, thereby impacting online fine-tuning performance. To enhance exploration during online fine-tuning and thus enhance the overall online fine-tuning performance, we introduce a generalized reward augmentation framework called Sample Efficient Reward Augmentation (SERA). SERA aims to improve the performance of online fine-tuning by designing intrinsic rewards that encourage the agent to explore. Specifically, it implicitly implements State Marginal Matching (SMM) and penalizes out-of-distribution (OOD) state actions, thus encouraging agents to cover the target state density and achieve better online fine-tuning results. Additionally, SERA can be effortlessly plugged into various RL algorithms to improve online fine-tuning and ensure sustained asymptotic improvement, showing the versatility as well as the effectiveness of SERA. Moreover, extensive experimental results demonstrate that on offline-to-online problems, SERA consistently and effectively enhances the performance of various offline algorithms.
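Conceptually, SERA shapes the reward the online agent optimizes: an intrinsic bonus steers the visited state distribution toward a target marginal while out-of-distribution state-action pairs are discouraged. The sketch below shows one way such an augmented reward could be computed; the density proxies, the k-nearest-neighbor OOD penalty, and the coefficients lam and beta are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def sera_style_reward(env_reward, state, action,
                      offline_states, offline_actions,
                      target_log_density, agent_log_density,
                      lam=0.1, beta=1.0, k=5):
    """Augment the environment reward with (i) a state-marginal-matching term,
    log p_target(s) - log p_agent(s), and (ii) an OOD penalty based on the
    distance of (s, a) to the offline dataset."""
    smm_bonus = target_log_density(state) - agent_log_density(state)
    sa = np.concatenate([state, action])
    dataset = np.concatenate([offline_states, offline_actions], axis=1)
    dists = np.linalg.norm(dataset - sa, axis=1)
    ood_penalty = np.sort(dists)[:k].mean()   # far from offline data => likely OOD
    return env_reward + lam * smm_bonus - beta * ood_penalty

# Toy usage with Gaussian density stand-ins for the target and agent state marginals.
rng = np.random.default_rng(0)
off_s = rng.normal(size=(500, 3))
off_a = rng.normal(size=(500, 1))
log_p = lambda s: -0.5 * float(np.sum(s ** 2))   # proxy for a fitted density model
r = sera_style_reward(1.0, rng.normal(size=3), rng.normal(size=1),
                      off_s, off_a,
                      target_log_density=log_p,
                      agent_log_density=lambda s: log_p(s) - 0.1)
print(r)
```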