cs.AI - 2023-12-01

Video Summarization: Towards Entity-Aware Captions

  • paper_url: http://arxiv.org/abs/2312.02188
  • repo_url: None
  • paper_authors: Hammad A. Ayyubi, Tianqi Liu, Arsha Nagrani, Xudong Lin, Mingda Zhang, Anurag Arnab, Feng Han, Yukun Zhu, Jialu Liu, Shih-Fu Chang
  • for: This work proposes the task of summarizing news videos directly into entity-aware captions, i.e., captions that name the specific people, places, and organizations involved.
  • methods: A method that augments visual information from the video with context retrieved from external world knowledge to generate entity-aware captions; a large-scale dataset, VIEWS (VIdeo NEWS), is released to support the task.
  • results: Experiments demonstrate the effectiveness of the approach on three video captioning models and show that it generalizes to an existing news image captioning dataset.
    Abstract Existing popular video captioning benchmarks and models deal with generic captions devoid of specific person, place or organization named entities. In contrast, news videos present a challenging setting where the caption requires such named entities for meaningful summarization. As such, we propose the task of summarizing news video directly to entity-aware captions. We also release a large-scale dataset, VIEWS (VIdeo NEWS), to support research on this task. Further, we propose a method that augments visual information from videos with context retrieved from external world knowledge to generate entity-aware captions. We demonstrate the effectiveness of our approach on three video captioning models. We also show that our approach generalizes to existing news image captions dataset. With all the extensive experiments and insights, we believe we establish a solid basis for future research on this challenging task.

Spectral Temporal Contrastive Learning

  • paper_url: http://arxiv.org/abs/2312.00966
  • repo_url: None
  • paper_authors: Sacha Morin, Somjit Nath, Samira Ebrahimi Kahou, Guy Wolf
  • for: Learning useful data representations without labels, specifically in the temporal contrastive learning (TCL) setting where positive pairs are defined by the sequential structure of the data rather than by hand-crafted augmentations, as is common in RL and robotics.
  • methods: Adapts recent work on Spectral CL (whose loss is sketched below the abstract) to formulate Spectral Temporal Contrastive Learning (STCL), with a population loss based on a state graph derived from a time-homogeneous reversible Markov chain with uniform stationary distribution.
  • results: The STCL loss connects linear probing performance to the spectral properties of the graph and can be estimated by treating previously observed data sequences as an ensemble of MCMC chains.
    Abstract Learning useful data representations without requiring labels is a cornerstone of modern deep learning. Self-supervised learning methods, particularly contrastive learning (CL), have proven successful by leveraging data augmentations to define positive pairs. This success has prompted a number of theoretical studies to better understand CL and investigate theoretical bounds for downstream linear probing tasks. This work is concerned with the temporal contrastive learning (TCL) setting where the sequential structure of the data is used instead to define positive pairs, which is more commonly used in RL and robotics contexts. In this paper, we adapt recent work on Spectral CL to formulate Spectral Temporal Contrastive Learning (STCL). We discuss a population loss based on a state graph derived from a time-homogeneous reversible Markov chain with uniform stationary distribution. The STCL loss enables to connect the linear probing performance to the spectral properties of the graph, and can be estimated by considering previously observed data sequences as an ensemble of MCMC chains.
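
For orientation, the spectral contrastive loss of HaoChen et al., which Spectral CL builds on, has the closed form below; in the temporal setting, the positive pair (x, x⁺) would be consecutive states of the underlying chain. This is a sketch of the prior loss the paper adapts, not the exact STCL population loss:

```latex
\mathcal{L}(f) = -2\,\mathbb{E}_{(x,x^+)}\!\left[f(x)^\top f(x^+)\right]
               + \mathbb{E}_{x,x'}\!\left[\left(f(x)^\top f(x')\right)^2\right]
```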

The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models

  • paper_url: http://arxiv.org/abs/2312.00960
  • repo_url: https://github.com/namburisrinath/llmcompression
  • paper_authors: Satya Sai Srinath Namburi, Makesh Sreedhar, Srinath Srinivasan, Frederic Sala
  • for: A systematic analysis of how commonly used compression techniques affect model performance across model families (encoder, encoder-decoder, and decoder), using the LAMA and LM-HARNESS benchmarks.
  • methods: Two standard compression techniques are studied (both sketched below the abstract): pruning, which eliminates redundant connections in model layers, and quantization, which represents model parameters with fewer bits.
  • results: The impact of compression on performance is nuanced and varies across model families and compression levels; in particular, compression measurably affects the models' parametric knowledge.
    Abstract Compressing large language models (LLMs), often consisting of billions of parameters, provides faster inference, smaller memory footprints, and enables local deployment. Two standard compression techniques are pruning and quantization, with the former eliminating redundant connections in model layers and the latter representing model parameters with fewer bits. The key tradeoff is between the degree of compression and the impact on the quality of the compressed model. Existing research on LLM compression primarily focuses on performance in terms of general metrics like perplexity or downstream task accuracy. More fine-grained metrics, such as those measuring parametric knowledge, remain significantly underexplored. To help bridge this gap, we present a comprehensive analysis across multiple model families (ENCODER, ENCODER-DECODER, and DECODER) using the LAMA and LM-HARNESS benchmarks in order to systematically quantify the effect of commonly employed compression techniques on model performance. A particular focus is on tradeoffs involving parametric knowledge, with the goal of providing practitioners with practical insights to help make informed decisions on compression. We release our codebase to enable further research.
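
To make the two techniques concrete, here is a minimal PyTorch sketch of unstructured magnitude pruning and symmetric uniform quantization; it illustrates the generic operations only, not the paper's exact configurations:

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude fraction of weights."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)

def uniform_quantize(weight: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """Symmetric uniform quantization: round to 2^bits levels, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    scale = weight.abs().max() / qmax
    return torch.round(weight / scale).clamp(-qmax - 1, qmax) * scale

w = torch.randn(512, 512)
w_pruned = magnitude_prune(w, sparsity=0.5)   # 50% of entries set to zero
w_quant = uniform_quantize(w, bits=8)         # 8-bit representation of each weight
```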

Effectiveness of probabilistic contact tracing in epidemic containment: the role of super-spreaders and transmission paths reconstruction

  • paper_url: http://arxiv.org/abs/2312.00910
  • repo_url: None
  • paper_authors: A. P. Muntoni, F. Mazza, A. Braunstein, G. Catania, L. Dall’Asta
  • for: Proposes probabilistic techniques for digital contact tracing that more effectively guide the allocation of new diagnostic tests, as a containment strategy for epidemics such as COVID-19.
  • methods: The diagnostic and social costs of contact-tracing-based containment measures are quantitatively analyzed using three state-of-the-art models of SARS-CoV-2 spreading (a toy risk-scoring sketch follows the abstract).
  • results: Probabilistic techniques allow for more effective mitigation at a lower diagnostic and social cost, and are remarkably effective at capturing backward propagations and super-spreading events.
    Abstract The recent COVID-19 pandemic underscores the significance of early-stage non-pharmacological intervention strategies. The widespread use of masks and the systematic implementation of contact tracing strategies provide a potentially equally effective and socially less impactful alternative to more conventional approaches, such as large-scale mobility restrictions. However, manual contact tracing faces strong limitations in accessing the network of contacts, and the scalability of currently implemented protocols for smartphone-based digital contact tracing becomes impractical during the rapid expansion phases of the outbreaks, due to the surge in exposure notifications and associated tests. A substantial improvement in digital contact tracing can be obtained through the integration of probabilistic techniques for risk assessment that can more effectively guide the allocation of new diagnostic tests. In this study, we first quantitatively analyze the diagnostic and social costs associated with these containment measures based on contact tracing, employing three state-of-the-art models of SARS-CoV-2 spreading. Our results suggest that probabilistic techniques allow for more effective mitigation at a lower cost. Secondly, our findings reveal a remarkable efficacy of probabilistic contact-tracing techniques in capturing backward propagations and super-spreading events, relevant features of the diffusion of many pathogens, including SARS-CoV-2.
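
As a toy illustration of probabilistic risk assessment on a contact network, one can score each individual's infection risk from the estimated infection probabilities of their contacts. The actual methods in the paper (inference over epidemic models) are far more sophisticated; all quantities below are illustrative assumptions:

```python
def contact_risk(contacts: dict[str, list[str]],
                 p_infected: dict[str, float],
                 p_transmit: float = 0.05) -> dict[str, float]:
    """Probability that at least one contact transmitted the infection."""
    risk = {}
    for person, neighbors in contacts.items():
        no_transmission = 1.0
        for c in neighbors:
            no_transmission *= 1.0 - p_transmit * p_infected.get(c, 0.0)
        risk[person] = 1.0 - no_transmission
    return risk

# Diagnostic tests can then be allocated to the highest-risk individuals first:
# ranked = sorted(risk, key=risk.get, reverse=True)
```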

Identifying Spurious Correlations using Counterfactual Alignment

  • paper_url: http://arxiv.org/abs/2312.02186
  • repo_url: https://github.com/ieee8023/latentshift
  • paper_authors: Joseph Paul Cohen, Louis Blankemeier, Akshay Chaudhari
  • for: Detecting spurious correlations in black-box classifiers, which often cause poor generalization performance.
  • methods: Counterfactual alignment (see the sketch below the abstract): counterfactual images generated with respect to one classifier are fed into other classifiers, and the relationship between the induced output changes is quantified.
  • results: The method detects spurious correlations in face attribute classifiers, identifies specific instances where a spurious correlation exists, computes aggregate statistics over a dataset, and can rectify the correlations it identifies.
    Abstract Models driven by spurious correlations often yield poor generalization performance. We propose the counterfactual alignment method to detect and explore spurious correlations of black box classifiers. Counterfactual images generated with respect to one classifier can be input into other classifiers to see if they also induce changes in the outputs of these classifiers. The relationship between these responses can be quantified and used to identify specific instances where a spurious correlation exists as well as compute aggregate statistics over a dataset. Our work demonstrates the ability to detect spurious correlations in face attribute classifiers. This is validated by observing intuitive trends in a face attribute classifier as well as fabricating spurious correlations and detecting their presence, both visually and quantitatively. Further, utilizing the CF alignment method, we demonstrate that we can rectify spurious correlations identified in classifiers.
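
A hedged sketch of the counterfactual-alignment loop implied by the abstract: generate counterfactuals with respect to a base classifier, feed them to a second classifier, and correlate the induced output changes. `cf_generate`, `f_base`, and `f_down` are illustrative placeholders (the repo's latent-shift generator would play the first role):

```python
import numpy as np

def cf_alignment(images, f_base, f_down, cf_generate) -> float:
    deltas_base, deltas_down = [], []
    for x in images:
        x_cf = cf_generate(x, f_base)              # counterfactual w.r.t. f_base
        deltas_base.append(f_base(x_cf) - f_base(x))
        deltas_down.append(f_down(x_cf) - f_down(x))
    # A strong correlation between the output changes suggests the downstream
    # classifier relies on the same (possibly spurious) features as the base one.
    return np.corrcoef(np.array(deltas_base), np.array(deltas_down))[0, 1]
```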

LLM-TAKE: Theme Aware Keyword Extraction Using Large Language Models

  • paper_url: http://arxiv.org/abs/2312.00909
  • repo_url: None
  • paper_authors: Reza Yousefi Maragheh, Chenhao Fang, Charan Chand Irugu, Parth Parikh, Jason Cho, Jianpeng Xu, Saranyan Sukumar, Malay Patel, Evren Korpeoglu, Sushant Kumar, Kannan Achan
  • for: A theme-aware keyword extraction framework based on large language models (LLMs) that generates product themes inferred from items' textual metadata.
  • methods: A multi-stage framework (a rough pipeline sketch follows the abstract) that avoids outputting non-informative or sensitive keywords and reduces LLM hallucinations; two variants generate extractive and abstractive themes for products in an e-commerce setting.
  • results: Extensive experiments on three real datasets show that LLM-TAKE improves both accuracy-based and diversity-based metrics over benchmark models.
    Abstract Keyword extraction is one of the core tasks in natural language processing. Classic extraction models are notorious for having a short attention span which make it hard for them to conclude relational connections among the words and sentences that are far from each other. This, in turn, makes their usage prohibitive for generating keywords that are inferred from the context of the whole text. In this paper, we explore using Large Language Models (LLMs) in generating keywords for items that are inferred from the items textual metadata. Our modeling framework includes several stages to fine grain the results by avoiding outputting keywords that are non informative or sensitive and reduce hallucinations common in LLM. We call our LLM-based framework Theme-Aware Keyword Extraction (LLM TAKE). We propose two variations of framework for generating extractive and abstractive themes for products in an E commerce setting. We perform an extensive set of experiments on three real data sets and show that our modeling framework can enhance accuracy based and diversity based metrics when compared with benchmark models.
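
A rough sketch of what a staged, theme-aware extraction pipeline could look like. The `llm` callable, the prompt, and the filter rules are all assumptions for illustration; the paper's exact stages and prompts are not reproduced here:

```python
def extract_themes(item_text: str, llm, blocklist: set[str]) -> list[str]:
    # Stage 1: candidate theme generation from the item's textual metadata.
    raw = llm(f"List concise themes for this product, comma-separated:\n{item_text}")
    candidates = [c.strip().lower() for c in raw.split(",") if c.strip()]
    # Stage 2: drop non-informative or sensitive keywords.
    candidates = [c for c in candidates if c not in blocklist and len(c) > 2]
    # Stage 3: crude hallucination check -- keep only themes with some grounding
    # in the source text (the paper uses more refined stages than this).
    return [c for c in candidates
            if any(tok in item_text.lower() for tok in c.split())]
```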

Nash Learning from Human Feedback

  • paper_url: http://arxiv.org/abs/2312.00886
  • repo_url: None
  • paper_authors: Rémi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot
  • for: Aligning large language models (LLMs) with human preferences from pairwise feedback, as an alternative to reward-model-based RLHF, whose reward models cannot fully represent the richness of human preferences and depend on the sampling distribution.
  • methods: First learn a preference model conditioned on two responses to a prompt, then pursue a policy that consistently generates responses preferred over those of any competing policy, i.e., the Nash equilibrium of the preference model (the objective is sketched below the abstract). The approach is termed Nash learning from human feedback (NLHF).
  • results: For tabular policy representations, a novel mirror-descent-based algorithm, Nash-MD, produces a sequence of policies whose last iterate converges to the regularized Nash equilibrium; gradient descent algorithms are introduced for deep-learning architectures, and effectiveness is demonstrated by fine-tuning an LLM for text summarization.
    Abstract Reinforcement learning from human feedback (RLHF) has emerged as the main paradigm for aligning large language models (LLMs) with human preferences. Typically, RLHF involves the initial step of learning a reward model from human feedback, often expressed as preferences between pairs of text generations produced by a pre-trained LLM. Subsequently, the LLM's policy is fine-tuned by optimizing it to maximize the reward model through a reinforcement learning algorithm. However, an inherent limitation of current reward models is their inability to fully represent the richness of human preferences and their dependency on the sampling distribution. In this study, we introduce an alternative pipeline for the fine-tuning of LLMs using pairwise human feedback. Our approach entails the initial learning of a preference model, which is conditioned on two inputs given a prompt, followed by the pursuit of a policy that consistently generates responses preferred over those generated by any competing policy, thus defining the Nash equilibrium of this preference model. We term this approach Nash learning from human feedback (NLHF). In the context of a tabular policy representation, we present a novel algorithmic solution, Nash-MD, founded on the principles of mirror descent. This algorithm produces a sequence of policies, with the last iteration converging to the regularized Nash equilibrium. Additionally, we explore parametric representations of policies and introduce gradient descent algorithms for deep-learning architectures. To demonstrate the effectiveness of our approach, we present experimental results involving the fine-tuning of a LLM for a text summarization task. We believe NLHF offers a compelling avenue for preference learning and policy optimization with the potential of advancing the field of aligning LLMs with human preferences.
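
In symbols, the Nash objective described in the abstract can be sketched as below, where ρ is the prompt distribution and 𝒫(y ≻ y′ | x) the learned pairwise preference model (notation ours, not the paper's):

```latex
\pi^\star \in \arg\max_{\pi}\;\min_{\pi'}\;
  \mathbb{E}_{x\sim\rho,\; y\sim\pi(\cdot\mid x),\; y'\sim\pi'(\cdot\mid x)}
  \big[\mathcal{P}(y \succ y' \mid x)\big]
```

At such an equilibrium, no competing policy π′ can produce responses that are preferred over π's on average, which is what "consistently generates responses preferred over those generated by any competing policy" amounts to.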

Grounding Everything: Emerging Localization Properties in Vision-Language Transformers

  • paper_url: http://arxiv.org/abs/2312.00878
  • repo_url: https://github.com/walbouss/gem
  • paper_authors: Walid Bousselham, Felix Petersen, Vittorio Ferrari, Hilde Kuehne
  • for: Zero-shot, fine-tuning-free open-vocabulary object localization that generalizes across datasets and backbones.
  • methods: The Grounding Everything Module (GEM) generalizes value-value attention into a self-self attention path (sketched below the abstract); self-self attention corresponds to clustering, enforcing similarity among tokens arising from the same object while preserving alignment with the language space, and a set of regularizations further guides group formation.
  • results: GEM not only outperforms other training-free open-vocabulary localization methods, but also achieves state-of-the-art results on the recently proposed OpenImagesV7 large-scale segmentation benchmark.
    Abstract Vision-language foundation models have shown remarkable performance in various zero-shot settings such as image retrieval, classification, or captioning. But so far, those models seem to fall behind when it comes to zero-shot localization of referential expressions and objects in images. As a result, they need to be fine-tuned for this task. In this paper, we show that pretrained vision-language (VL) models allow for zero-shot open-vocabulary object localization without any fine-tuning. To leverage those capabilities, we propose a Grounding Everything Module (GEM) that generalizes the idea of value-value attention introduced by CLIPSurgery to a self-self attention path. We show that the concept of self-self attention corresponds to clustering, thus enforcing groups of tokens arising from the same object to be similar while preserving the alignment with the language space. To further guide the group formation, we propose a set of regularizations that allows the model to finally generalize across datasets and backbones. We evaluate the proposed GEM framework on various benchmark tasks and datasets for semantic segmentation. It shows that GEM not only outperforms other training-free open-vocabulary localization methods, but also achieves state-of-the-art results on the recently proposed OpenImagesV7 large-scale segmentation benchmark.
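
A minimal sketch of a self-self attention path in the spirit of GEM: the same projected features appear on both sides of the attention, so tokens belonging to the same object reinforce each other (a clustering effect). The single shared projection, shapes, and temperature are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def self_self_attention(x: torch.Tensor, proj: torch.nn.Linear,
                        tau: float = 1.0) -> torch.Tensor:
    """x: (num_tokens, dim). The same projected features act as queries and keys."""
    h = proj(x)                                              # shared projection for both sides
    attn = F.softmax(h @ h.transpose(-1, -2) / tau, dim=-1)  # (tokens, tokens) similarity
    return attn @ h                                          # tokens pulled toward their cluster
```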

3DiFACE: Diffusion-based Speech-driven 3D Facial Animation and Editing

  • paper_url: http://arxiv.org/abs/2312.00870
  • repo_url: None
  • paper_authors: Balamurugan Thambiraja, Sadegh Aliakbarian, Darren Cosker, Justus Thies
  • for: Personalized speech-driven 3D facial animation and editing; existing methods predict facial motion deterministically, overlooking the one-to-many relationship between speech and facial expressions.
  • methods: A lightweight audio-conditioned diffusion model for 3D facial motion that can be trained on a small 3D motion dataset while maintaining expressive lip motion, fine-tuned to a specific subject from only a short video, and used for both stochastic generation and keyframe-based motion editing.
  • results: Quantitative and qualitative evaluations show the method outperforms existing state-of-the-art techniques, yielding speech-driven animations with greater fidelity and diversity.
    Abstract We present 3DiFACE, a novel method for personalized speech-driven 3D facial animation and editing. While existing methods deterministically predict facial animations from speech, they overlook the inherent one-to-many relationship between speech and facial expressions, i.e., there are multiple reasonable facial expression animations matching an audio input. It is especially important in content creation to be able to modify generated motion or to specify keyframes. To enable stochasticity as well as motion editing, we propose a lightweight audio-conditioned diffusion model for 3D facial motion. This diffusion model can be trained on a small 3D motion dataset, maintaining expressive lip motion output. In addition, it can be finetuned for specific subjects, requiring only a short video of the person. Through quantitative and qualitative evaluations, we show that our method outperforms existing state-of-the-art techniques and yields speech-driven animations with greater fidelity and diversity.

Making Large Multimodal Models Understand Arbitrary Visual Prompts

  • paper_url: http://arxiv.org/abs/2312.00784
  • repo_url: None
  • paper_authors: Mu Cai, Haotian Liu, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Dennis Park, Yong Jae Lee
  • for: Region-specific image understanding in large multimodal models, driven by natural visual prompts such as a "red bounding box" or "pointed arrow".
  • methods: A novel multimodal model that decodes arbitrary visual prompts by directly overlaying the visual markers onto the RGB image (sketched below the abstract), eliminating the need for complex region encodings.
  • results: State-of-the-art performance on region-understanding benchmarks including Visual7W, PointQA, and Visual Commonsense Reasoning; the paper also releases ViP-Bench, a comprehensive benchmark for evaluating visual-prompt understanding across multiple dimensions.
    Abstract While existing large vision-language multimodal models focus on whole image understanding, there is a prominent gap in achieving region-specific comprehension. Current approaches that use textual coordinates or spatial encodings often fail to provide a user-friendly interface for visual prompting. To address this challenge, we introduce a novel multimodal model capable of decoding arbitrary visual prompts. This allows users to intuitively mark images and interact with the model using natural cues like a "red bounding box" or "pointed arrow". Our simple design directly overlays visual markers onto the RGB image, eliminating the need for complex region encodings, yet achieves state-of-the-art performance on region-understanding tasks like Visual7W, PointQA, and Visual Commonsense Reasoning benchmark. Furthermore, we present ViP-Bench, a comprehensive benchmark to assess the capability of models in understanding visual prompts across multiple dimensions, enabling future research in this domain. Code, data, and model are publicly available.
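
The core idea is simple enough to sketch: render the visual prompt into the pixels, then query the model as usual. The `multimodal_model` call below is a hypothetical placeholder:

```python
from PIL import Image, ImageDraw

def add_visual_prompt(image: Image.Image,
                      box: tuple[int, int, int, int]) -> Image.Image:
    """Draw a red bounding box directly on the pixels; no region encoding needed."""
    marked = image.copy()
    ImageDraw.Draw(marked).rectangle(box, outline="red", width=4)
    return marked

# Hypothetical usage with a placeholder model call:
# marked = add_visual_prompt(Image.open("photo.jpg"), (40, 60, 200, 220))
# answer = multimodal_model(marked, "What is the object inside the red bounding box?")
```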

Context Retrieval via Normalized Contextual Latent Interaction for Conversational Agent

  • paper_url: http://arxiv.org/abs/2312.00774
  • repo_url: https://github.com/jliu-v/pk-ncli
  • paper_authors: Junfeng Liu, Zhuocheng Mei, Kewen Peng, Ranga Raju Vatsavai
  • for: Improving the quality of conversational agents, which in practice still struggle to respect knowledge and facts, to personalize to user preferences, and to keep computational demands manageable during training and inference.
  • methods: PK-NCLI, a method that accurately and efficiently identifies relevant auxiliary information by learning the relevance among persona, chat history, and knowledge background through low-level normalized contextual latent interaction.
  • results: PK-NCLI outperforms the state-of-the-art method PK-FoCus by 47.80%/30.61%/24.14% in perplexity, knowledge grounding, and training efficiency, respectively, while maintaining the same level of persona grounding; the paper also analyzes how language model choices and training-weight trade-offs affect performance.
    Abstract Conversational agents leveraging AI, particularly deep learning, are emerging in both academic research and real-world applications. However, these applications still face challenges, including disrespecting knowledge and facts, not personalizing to user preferences, and enormous demand for computational resources during training and inference. Recent research efforts have been focused on addressing these challenges from various aspects, including supplementing various types of auxiliary information to the conversational agents. However, existing methods are still not able to effectively and efficiently exploit relevant information from these auxiliary supplements to further unleash the power of the conversational agents and the language models they use. In this paper, we present a novel method, PK-NCLI, that is able to accurately and efficiently identify relevant auxiliary information to improve the quality of conversational responses by learning the relevance among persona, chat history, and knowledge background through low-level normalized contextual latent interaction. Our experimental results indicate that PK-NCLI outperforms the state-of-the-art method, PK-FoCus, by 47.80%/30.61%/24.14% in terms of perplexity, knowledge grounding, and training efficiency, respectively, and maintained the same level of persona grounding performance. We also provide a detailed analysis of how different factors, including language model choices and trade-offs on training weights, would affect the performance of PK-NCLI.

Automated Material Properties Extraction For Enhanced Beauty Product Discovery and Makeup Virtual Try-on

  • paper_url: http://arxiv.org/abs/2312.00766
  • repo_url: None
  • paper_authors: Fatemeh Taheri Dezaki, Himanshu Arora, Rahul Suresh, Amin Banitalebi-Dehkordi
  • for: Enhancing the makeup shopping experience with more convenient and satisfying product discovery.
  • methods: An automated pipeline of multiple customized machine learning models that extracts essential material attributes, such as color and finish type, from makeup product images.
  • results: Demonstrated on eyeshadow products (both single- and multi-shade) and successfully extended to other categories such as lipstick and foundation; ablations show the pipeline is more reliable than human labeling, and the extracted attributes enable cross-category product recommendation and virtual try-on experiences.
    Abstract The multitude of makeup products available can make it challenging to find the ideal match for desired attributes. An intelligent approach for product discovery is required to enhance the makeup shopping experience to make it more convenient and satisfying. However, enabling accurate and efficient product discovery requires extracting detailed attributes like color and finish type. Our work introduces an automated pipeline that utilizes multiple customized machine learning models to extract essential material attributes from makeup product images. Our pipeline is versatile and capable of handling various makeup products. To showcase the efficacy of our pipeline, we conduct extensive experiments on eyeshadow products (both single and multi-shade ones), a challenging makeup product known for its diverse range of shapes, colors, and finish types. Furthermore, we demonstrate the applicability of our approach by successfully extending it to other makeup categories like lipstick and foundation, showcasing its adaptability and effectiveness across different beauty products. Additionally, we conduct ablation experiments to demonstrate the superiority of our machine learning pipeline over human labeling methods in terms of reliability. Our proposed method showcases its effectiveness in cross-category product discovery, specifically in recommending makeup products that perfectly match a specified outfit. Lastly, we also demonstrate the application of these material attributes in enabling virtual-try-on experiences which makes makeup shopping experience significantly more engaging.

Beyond ChatBots: ExploreLLM for Structured Thoughts and Personalized Model Responses

  • paper_url: http://arxiv.org/abs/2312.00763
  • repo_url: None
  • paper_authors: Xiao Ma, Swaroop Mishra, Ariel Liu, Sophie Su, Jilin Chen, Chinmay Kulkarni, Heng-Tze Cheng, Quoc Le, Ed Chi
  • for: Helping users structure their thoughts, explore different options, navigate choices and recommendations, and more easily steer models toward personalized responses.
  • methods: A schema-like task structure with guided navigation that lets users specify high-level preferences and goals, instead of purely text-based chatbot interaction.
  • results: In a user study, participants found ExploreLLM helpful for exploratory and planning tasks and found it easier to personalize responses with high-level preferences.
    Abstract Large language model (LLM) powered chatbots are primarily text-based today, and impose a large interactional cognitive load, especially for exploratory or sensemaking tasks such as planning a trip or learning about a new city. Because the interaction is textual, users have little scaffolding in the way of structure, informational "scent", or ability to specify high-level preferences or goals. We introduce ExploreLLM that allows users to structure thoughts, help explore different options, navigate through the choices and recommendations, and to more easily steer models to generate more personalized responses. We conduct a user study and show that users find it helpful to use ExploreLLM for exploratory or planning tasks, because it provides a useful schema-like structure to the task, and guides users in planning. The study also suggests that users can more easily personalize responses with high-level preferences with ExploreLLM. Together, ExploreLLM points to a future where users interact with LLMs beyond the form of chatbots, and instead designed to support complex user tasks with a tighter integration between natural language and graphical user interfaces.

Deep Unlearning: Fast and Efficient Training-free Approach to Controlled Forgetting

  • paper_url: http://arxiv.org/abs/2312.00761
  • repo_url: https://github.com/sangamesh-kodge/class_forgetting
  • paper_authors: Sangamesh Kodge, Gobinda Saha, Kaushik Roy
  • for: Machine unlearning, motivated by regulatory demands to delete user data upon request; existing approaches retrain from scratch or fine-tune per deletion request and are constrained by computational resources and restricted access to the original training data.
  • methods: A training-free class-unlearning algorithm that removes an entire class or group of classes from a trained model. It estimates a Retain Space and a Forget Space (the activation spaces of samples to be retained and unlearned) via a layer-wise SVD over activations from a few forward passes, removes the information shared between the two spaces to isolate a class-discriminatory feature space, and projects the model weights onto the orthogonal complement of that space (sketched below the abstract).
  • results: On ImageNet with a Vision Transformer, only a ~1.5% drop in retain accuracy relative to the original model while keeping under 1% accuracy on the unlearned class samples; under membership inference attacks the method improves on baselines by 7.8% on average across image classification datasets and architectures, while being ~6x more computationally efficient.
    Abstract Machine unlearning has emerged as a prominent and challenging area of interest, driven in large part by the rising regulatory demands for industries to delete user data upon request and the heightened awareness of privacy. Existing approaches either retrain models from scratch or use several finetuning steps for every deletion request, often constrained by computational resource limitations and restricted access to the original training data. In this work, we introduce a novel class unlearning algorithm designed to strategically eliminate an entire class or a group of classes from the learned model. To that end, our algorithm first estimates the Retain Space and the Forget Space, representing the feature or activation spaces for samples from classes to be retained and unlearned, respectively. To obtain these spaces, we propose a novel singular value decomposition-based technique that requires layer wise collection of network activations from a few forward passes through the network. We then compute the shared information between these spaces and remove it from the forget space to isolate class-discriminatory feature space for unlearning. Finally, we project the model weights in the orthogonal direction of the class-discriminatory space to obtain the unlearned model. We demonstrate our algorithm's efficacy on ImageNet using a Vision Transformer with only $\sim$1.5% drop in retain accuracy compared to the original model while maintaining under 1% accuracy on the unlearned class samples. Further, our algorithm consistently performs well when subject to Membership Inference Attacks showing 7.8% improvement on average across a variety of image classification datasets and network architectures, as compared to other baselines while being $\sim$6x more computationally efficient.
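
A hedged numpy sketch of the projection step the abstract describes; the rank choice and the exact way shared information is removed are simplified assumptions, not the paper's precise procedure:

```python
import numpy as np

def top_subspace(acts: np.ndarray, rank: int) -> np.ndarray:
    """Columns of U spanning the top-`rank` activation directions. acts: (dim, n)."""
    U, _, _ = np.linalg.svd(acts, full_matrices=False)
    return U[:, :rank]

def unlearn_projection(W, acts_retain, acts_forget, rank: int = 32):
    U_r = top_subspace(acts_retain, rank)   # Retain Space
    U_f = top_subspace(acts_forget, rank)   # Forget Space
    # Remove directions shared with the retain space from the forget space,
    # leaving (approximately) class-discriminatory directions.
    D = U_f - U_r @ (U_r.T @ U_f)
    Q, _ = np.linalg.qr(D)
    # Project the layer's weights onto the orthogonal complement of the
    # class-discriminatory space to obtain the unlearned weights.
    return W @ (np.eye(W.shape[1]) - Q @ Q.T)
```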

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

  • paper_url: http://arxiv.org/abs/2312.00752
  • repo_url: https://github.com/radarFudan/mamba
  • paper_authors: Albert Gu, Tri Dao
  • for: This paper aims to improve the efficiency and performance of sequence models, addressing Transformers' computational inefficiency on long sequences.
  • methods: The authors propose selective structured state space models (SSMs) whose parameters are functions of the input (a naive recurrence is sketched below the abstract), a hardware-aware parallel algorithm for the recurrent mode, and a simplified end-to-end architecture without attention or even MLP blocks.
  • results: The proposed model, called Mamba, achieves faster inference (5x higher throughput than Transformers) and linear scaling in sequence length, with performance improving on real data up to million-length sequences. Mamba achieves state-of-the-art performance across several modalities, including language, audio, and genomics; the Mamba-3B model outperforms Transformers of the same size and matches Transformers twice its size on language modeling.
    Abstract Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers' computational inefficiency on long sequences, but they have not performed as well as attention on important modalities such as language. We identify that a key weakness of such models is their inability to perform content-based reasoning, and make several improvements. First, simply letting the SSM parameters be functions of the input addresses their weakness with discrete modalities, allowing the model to selectively propagate or forget information along the sequence length dimension depending on the current token. Second, even though this change prevents the use of efficient convolutions, we design a hardware-aware parallel algorithm in recurrent mode. We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even MLP blocks (Mamba). Mamba enjoys fast inference (5$\times$ higher throughput than Transformers) and linear scaling in sequence length, and its performance improves on real data up to million-length sequences. As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics. On language modeling, our Mamba-3B model outperforms Transformers of the same size and matches Transformers twice its size, both in pretraining and downstream evaluation.
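
A naive sequential reference of a selective SSM step, where the step size and the B/C projections are functions of the current token (the selectivity the abstract highlights). The actual model uses a specific discretization and a hardware-aware parallel scan, so treat shapes and parameterization as illustrative:

```python
import torch

def selective_ssm(x, A, W_B, W_C, W_dt):
    """x: (seq_len, dim); A: (dim, state) with negative entries;
    W_B, W_C: (dim, state); W_dt: (dim, dim)."""
    seq_len, dim = x.shape
    state = torch.zeros(dim, A.shape[1])
    outputs = []
    for t in range(seq_len):
        xt = x[t]                                        # current token, (dim,)
        dt = torch.nn.functional.softplus(xt @ W_dt)     # input-dependent step size
        B_t, C_t = xt @ W_B, xt @ W_C                    # input-dependent B and C
        decay = torch.exp(dt.unsqueeze(-1) * A)          # per-token selective forgetting
        state = decay * state + (dt * xt).unsqueeze(-1) * B_t
        outputs.append((state * C_t).sum(-1))            # read out, (dim,)
    return torch.stack(outputs)                          # (seq_len, dim)
```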

Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals

  • paper_url: http://arxiv.org/abs/2312.00751
  • repo_url: None
  • paper_authors: Tam Nguyen, Tan M. Nguyen, Richard G. Baraniuk
  • for: Mitigating the over-smoothing problem in deep transformers, where token representations become identical as model depth grows, degrading representation capacity.
  • methods: The paper shows that self-attention layers minimize a functional that promotes smoothness, and proposes a novel regularizer that penalizes the norm of the difference between the smooth output tokens of self-attention and the input tokens, preserving token fidelity (the energy is sketched below the abstract).
  • results: The resulting NeuTRENO models reduce over-smoothing of token representations and outperform baseline transformers and state-of-the-art methods on practical tasks including object classification, image segmentation, and language modeling.
    Abstract Transformers have achieved remarkable success in a wide range of natural language processing and computer vision applications. However, the representation capacity of a deep transformer model is degraded due to the over-smoothing issue in which the token representations become identical when the model's depth grows. In this work, we show that self-attention layers in transformers minimize a functional which promotes smoothness, thereby causing token uniformity. We then propose a novel regularizer that penalizes the norm of the difference between the smooth output tokens from self-attention and the input tokens to preserve the fidelity of the tokens. Minimizing the resulting regularized energy functional, we derive the Neural Transformer with a Regularized Nonlocal Functional (NeuTRENO), a novel class of transformer models that can mitigate the over-smoothing issue. We empirically demonstrate the advantages of NeuTRENO over the baseline transformers and state-of-the-art methods in reducing the over-smoothing of token representations on various practical tasks, including object classification, image segmentation, and language modeling.
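
The regularized energy described in the abstract can be sketched as a smoothness-promoting nonlocal term (which self-attention implicitly minimizes) plus a fidelity penalty tying the output tokens u_i back to the input tokens x_i. The weights k(·,·) and the coefficient λ are our notation, not the paper's exact functional:

```latex
E(u) \;=\; \tfrac{1}{2}\sum_{i,j} k(x_i, x_j)\,\lVert u_i - u_j\rVert^2
        \;+\; \lambda \sum_i \lVert u_i - x_i\rVert^2
```

Minimizing the first term alone drives all tokens toward the same value; the second term is what keeps them anchored to their inputs.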

Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games

  • paper_url: http://arxiv.org/abs/2312.00746
  • repo_url: None
  • paper_authors: Dekun Wu, Haochen Shi, Zhiyuan Sun, Bang Liu
  • for: This study explores the application of large language models (LLMs) in "Jubensha" (Chinese murder mystery role-playing games), a novel area of AI-driven gaming.
  • methods: The first Chinese Jubensha dataset, including character scripts and game rules, is introduced to foster AI agent development in this complex narrative environment, together with a multi-agent interaction framework that lets LLM-based agents autonomously play the game; the latest in-context learning techniques are applied to improve information gathering, murderer detection, and logical reasoning.
  • results: Specialized evaluation methods assess the agents' mastery of case information and reasoning skills; experimental results validate the effectiveness of the proposed methods, offering a fresh perspective on LLM capabilities and a new benchmark for evaluating LLM-based agents.
    Abstract In this study, we explore the application of Large Language Models (LLMs) in "Jubensha" (Chinese murder mystery role-playing games), a novel area in AI-driven gaming. We introduce the first Chinese dataset specifically for Jubensha, including character scripts and game rules, to foster AI agent development in this complex narrative environment. Our work also presents a unique multi-agent interaction framework using LLMs, allowing AI agents to autonomously engage in the game, enhancing the dynamics of Jubensha gameplay. To evaluate these AI agents, we developed specialized methods targeting their mastery of case information and reasoning skills. Furthermore, we incorporated the latest advancements in in-context learning to improve the agents' performance in critical aspects like information gathering, murderer detection, and logical reasoning. The experimental results validate the effectiveness of our proposed methods. This work aims to offer a fresh perspective on understanding LLM capabilities and establish a new benchmark for evaluating large language model-based agents to researchers in the field.

Scalable Meta-Learning with Gaussian Processes

  • paper_url: http://arxiv.org/abs/2312.00742
  • repo_url: None
  • paper_authors: Petru Tighineanu, Lukas Grossberger, Paul Baireuther, Kathrin Skubch, Stefan Falkner, Julia Vinogradska, Felix Berkenkamp
  • for: A Gaussian process (GP) model for meta-learning that is scalable in the number of tasks and quickly solves new tasks from the same distribution.
  • methods: Prior low-data approaches combine the closed-form GP posterior with Bayesian optimization, but are either computationally expensive or introduce assumptions that hinder a principled propagation of uncertainty between task models, disrupting the balance between exploration and exploitation. ScaML-GP instead uses a carefully designed multi-task kernel that enables hierarchical training and task scalability; conditioning on the meta-data exposes its modular nature, yielding a test-task prior that combines the posteriors of the meta-task GPs (the kernel structure is sketched below the abstract).
  • results: In synthetic and real-world meta-learning experiments, ScaML-GP learns efficiently with both few and many meta-tasks.
    Abstract Meta-learning is a powerful approach that exploits historical data to quickly solve new tasks from the same distribution. In the low-data regime, methods based on the closed-form posterior of Gaussian processes (GP) together with Bayesian optimization have achieved high performance. However, these methods are either computationally expensive or introduce assumptions that hinder a principled propagation of uncertainty between task models. This may disrupt the balance between exploration and exploitation during optimization. In this paper, we develop ScaML-GP, a modular GP model for meta-learning that is scalable in the number of tasks. Our core contribution is a carefully designed multi-task kernel that enables hierarchical training and task scalability. Conditioning ScaML-GP on the meta-data exposes its modular nature yielding a test-task prior that combines the posteriors of meta-task GPs. In synthetic and real-world meta-learning experiments, we demonstrate that ScaML-GP can learn efficiently both with few and many meta-tasks.
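
A hedged sketch of the modular structure the abstract implies: the test-task prior combines the posteriors of the M meta-task GPs with a task-specific residual kernel. The weights w_i and the residual kernel k_new are our notation, not necessarily the paper's exact parameterization:

```latex
k_{\text{test}}(x, x') \;=\; \sum_{i=1}^{M} w_i^2\, k_i^{\text{post}}(x, x')
                       \;+\; k_{\text{new}}(x, x')
```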

Gaussian Grouping: Segment and Edit Anything in 3D Scenes

  • paper_url: http://arxiv.org/abs/2312.00732
  • repo_url: https://github.com/lkeab/gaussian-grouping
  • paper_authors: Mingqiao Ye, Martin Danelljan, Fisher Yu, Lei Ke
  • for: Gaussian Splatting achieves high-quality, real-time novel-view synthesis of 3D scenes but lacks fine-grained object-level scene understanding.
  • methods: Gaussian Grouping extends Gaussian Splatting to jointly reconstruct and segment anything in open-world 3D scenes: each Gaussian is augmented with a compact Identity Encoding, supervised during differentiable rendering by SAM's 2D mask predictions together with a 3D spatial consistency regularization, instead of expensive 3D labels.
  • results: The discrete, grouped 3D Gaussians reconstruct, segment, and edit anything in 3D with high visual quality, fine granularity, and efficiency; a local Gaussian editing scheme supports versatile applications including 3D object removal, inpainting, colorization, and scene recomposition.
    Abstract The recent Gaussian Splatting achieves high-quality and real-time novel-view synthesis of the 3D scenes. However, it is solely concentrated on the appearance and geometry modeling, while lacking in fine-grained object-level scene understanding. To address this issue, we propose Gaussian Grouping, which extends Gaussian Splatting to jointly reconstruct and segment anything in open-world 3D scenes. We augment each Gaussian with a compact Identity Encoding, allowing the Gaussians to be grouped according to their object instance or stuff membership in the 3D scene. Instead of resorting to expensive 3D labels, we supervise the Identity Encodings during the differentiable rendering by leveraging the 2D mask predictions by SAM, along with introduced 3D spatial consistency regularization. Comparing to the implicit NeRF representation, we show that the discrete and grouped 3D Gaussians can reconstruct, segment and edit anything in 3D with high visual quality, fine granularity and efficiency. Based on Gaussian Grouping, we further propose a local Gaussian Editing scheme, which shows efficacy in versatile scene editing applications, including 3D object removal, inpainting, colorization and scene recomposition. Our code and models will be at https://github.com/lkeab/gaussian-grouping.

Virtual Fusion with Contrastive Learning for Single Sensor-based Activity Recognition

  • paper_url: http://arxiv.org/abs/2312.02185
  • repo_url: None
  • paper_authors: Duc-Anh Nguyen, Cuong Pham, Nhien-An Le-Khac
  • for: Human activity recognition (HAR), where each sensor type has different strengths and weaknesses and a single sensor often cannot fully observe the user's motion, causing wrong predictions; sensor fusion helps but raises issues of user privacy and acceptance and of costly set-up, operation, and maintenance.
  • methods: Virtual Fusion, which exploits unlabeled data from multiple time-synchronized sensors during training via contrastive learning of cross-sensor correlations (the contrastive objective is sketched below the abstract), while requiring only one sensor for inference; a generalization, Actual Fusion within Virtual Fusion (AFVF), uses a subset of the training sensors at inference.
  • results: Significantly better accuracy than training with the same single sensor, in some cases even surpassing actual multi-sensor fusion at test time; state-of-the-art accuracy and F1-score on the UCI-HAR and PAMAP2 benchmark datasets.
    Abstract Various types of sensors can be used for Human Activity Recognition (HAR), and each of them has different strengths and weaknesses. Sometimes a single sensor cannot fully observe the user's motions from its perspective, which causes wrong predictions. While sensor fusion provides more information for HAR, it comes with many inherent drawbacks like user privacy and acceptance, costly set-up, operation, and maintenance. To deal with this problem, we propose Virtual Fusion - a new method that takes advantage of unlabeled data from multiple time-synchronized sensors during training, but only needs one sensor for inference. Contrastive learning is adopted to exploit the correlation among sensors. Virtual Fusion gives significantly better accuracy than training with the same single sensor, and in some cases, it even surpasses actual fusion using multiple sensors at test time. We also extend this method to a more general version called Actual Fusion within Virtual Fusion (AFVF), which uses a subset of training sensors during inference. Our method achieves state-of-the-art accuracy and F1-score on UCI-HAR and PAMAP2 benchmark datasets. Implementation is available upon request.
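
A hedged sketch of the cross-sensor contrastive idea: time-aligned windows from two sensors form positive pairs, so a single-sensor encoder learns features correlated with the other sensor. The loss below is standard InfoNCE, an illustrative stand-in rather than the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def cross_sensor_infonce(z_a: torch.Tensor, z_b: torch.Tensor,
                         tau: float = 0.1) -> torch.Tensor:
    """z_a, z_b: (batch, dim) embeddings of time-synchronized sensor windows."""
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.T / tau                  # row i's positive is column i
    targets = torch.arange(z_a.shape[0])
    return F.cross_entropy(logits, targets)
```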

Safe Reinforcement Learning in Tensor Reproducing Kernel Hilbert Space

  • paper_url: http://arxiv.org/abs/2312.00727
  • repo_url: None
  • paper_authors: Xiaoyuan Cheng, Boli Chen, Liz Varga, Yukun Hu
  • for: Safe reinforcement learning (RL) with safe-reachability objectives in partially observable environments, where accurately estimating an optimal Bayesian filter over latent states in a continuous state space is intractable.
  • methods: A stochastic model-based approach that guarantees RL safety almost surely under unknown system dynamics and partial observation, using Predictive State Representations (PSR) and a Reproducing Kernel Hilbert Space (RKHS) to represent future multi-step observations analytically; operators derived from the kernel Bayes' rule enable recursive estimation of future observations.
  • results: Under an undercompleteness assumption, a polynomial sample complexity is established for the RL algorithm with infinite observation and action spaces, ensuring an ε-suboptimal safe policy guarantee.
    Abstract This paper delves into the problem of safe reinforcement learning (RL) in a partially observable environment with the aim of achieving safe-reachability objectives. In traditional partially observable Markov decision processes (POMDP), ensuring safety typically involves estimating the belief in latent states. However, accurately estimating an optimal Bayesian filter in POMDP to infer latent states from observations in a continuous state space poses a significant challenge, largely due to the intractable likelihood. To tackle this issue, we propose a stochastic model-based approach that guarantees RL safety almost surely in the face of unknown system dynamics and partial observation environments. We leveraged the Predictive State Representation (PSR) and Reproducing Kernel Hilbert Space (RKHS) to represent future multi-step observations analytically, and the results in this context are provable. Furthermore, we derived essential operators from the kernel Bayes' rule, enabling the recursive estimation of future observations using various operators. Under the assumption of \textit{undercompleness}, a polynomial sample complexity is established for the RL algorithm for the infinite size of observation and action spaces, ensuring an $\epsilon-$suboptimal safe policy guarantee.

DeepCache: Accelerating Diffusion Models for Free

  • paper_url: http://arxiv.org/abs/2312.00858
  • repo_url: https://github.com/horseee/deepcache
  • paper_authors: Xinyin Ma, Gongfan Fang, Xinchao Wang
  • for: Accelerating diffusion models, whose generation incurs substantial computational cost due to the sequential denoising process and large model size; traditional compression methods require extensive retraining.
  • methods: DeepCache, a training-free method that exploits the temporal redundancy between adjacent denoising steps: high-level U-Net features are cached and reused across steps while low-level features are updated cheaply (the caching pattern is sketched below the abstract).
  • results: A 2.3x speedup for Stable Diffusion v1.5 with only a 0.05 decline in CLIP Score, and 4.1x for LDM-4-G with a 0.22 FID increase on ImageNet; DeepCache outperforms pruning and distillation methods that require retraining, is compatible with current sampling techniques, and at equal throughput matches or slightly improves on DDIM and PLMS.
    Abstract Diffusion models have recently gained unprecedented attention in the field of image synthesis due to their remarkable generative capabilities. Notwithstanding their prowess, these models often incur substantial computational costs, primarily attributed to the sequential denoising process and cumbersome model size. Traditional methods for compressing diffusion models typically involve extensive retraining, presenting cost and feasibility challenges. In this paper, we introduce DeepCache, a novel training-free paradigm that accelerates diffusion models from the perspective of model architecture. DeepCache capitalizes on the inherent temporal redundancy observed in the sequential denoising steps of diffusion models, which caches and retrieves features across adjacent denoising stages, thereby curtailing redundant computations. Utilizing the property of the U-Net, we reuse the high-level features while updating the low-level features in a very cheap way. This innovative strategy, in turn, enables a speedup factor of 2.3$\times$ for Stable Diffusion v1.5 with only a 0.05 decline in CLIP Score, and 4.1$\times$ for LDM-4-G with a slight decrease of 0.22 in FID on ImageNet. Our experiments also demonstrate DeepCache's superiority over existing pruning and distillation methods that necessitate retraining and its compatibility with current sampling techniques. Furthermore, we find that under the same throughput, DeepCache effectively achieves comparable or even marginally improved results with DDIM or PLMS. The code is available at https://github.com/horseee/DeepCache
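
A hedged sketch of the caching pattern: the deep, high-level U-Net features are refreshed only every N steps and reused in between, while the shallow path runs every step. The shallow/deep split, the uniform interval N, and the update rule are illustrative simplifications of the actual method:

```python
def denoise_with_cache(x, timesteps, encode_shallow, run_deep, decode, N: int = 5):
    """Run a U-Net-style denoiser, refreshing deep features only every N steps."""
    deep_cache = None
    for i, t in enumerate(timesteps):
        h = encode_shallow(x, t)               # cheap low-level path, runs every step
        if deep_cache is None or i % N == 0:
            deep_cache = run_deep(h, t)        # expensive high-level path, cached
        eps = decode(h, deep_cache, t)         # fuse via the U-Net's skip connections
        x = x - eps                            # simplified update; real samplers differ
    return x
```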

Removing Biases from Molecular Representations via Information Maximization

  • paper_url: http://arxiv.org/abs/2312.00718
  • repo_url: https://github.com/uhlerlab/infocore
  • paper_authors: Chenyu Wang, Sharut Gupta, Caroline Uhler, Tommi Jaakkola
  • for: This paper is written to address the challenge of dealing with batch effects in high-throughput drug screening data, and to propose a new method called InfoCORE for effectively removing these effects and obtaining refined molecular representations.
  • methods: The paper proposes an Information maximization approach for COnfounder REmoval (InfoCORE), which establishes a variational lower bound on the conditional mutual information of the latent representations given a batch identifier, and adaptively reweighs samples to equalize their implied batch distribution.
  • results: The paper reports extensive experiments on drug screening data that demonstrate the superior performance of InfoCORE in a multitude of tasks, including molecular property prediction and molecule-phenotype retrieval. Additionally, the paper shows how InfoCORE can be used to resolve general distribution shifts and issues of data fairness by minimizing correlation with spurious features or removing sensitive attributes.
    Abstract High-throughput drug screening -- using cell imaging or gene expression measurements as readouts of drug effect -- is a critical tool in biotechnology to assess and understand the relationship between the chemical structure and biological activity of a drug. Since large-scale screens have to be divided into multiple experiments, a key difficulty is dealing with batch effects, which can introduce systematic errors and non-biological associations in the data. We propose InfoCORE, an Information maximization approach for COnfounder REmoval, to effectively deal with batch effects and obtain refined molecular representations. InfoCORE establishes a variational lower bound on the conditional mutual information of the latent representations given a batch identifier. It adaptively reweighs samples to equalize their implied batch distribution. Extensive experiments on drug screening data reveal InfoCORE's superior performance in a multitude of tasks including molecular property prediction and molecule-phenotype retrieval. Additionally, we show results for how InfoCORE offers a versatile framework and resolves general distribution shifts and issues of data fairness by minimizing correlation with spurious features or removing sensitive attributes. The code is available at https://github.com/uhlerlab/InfoCORE.
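
The quantity InfoCORE bounds can be sketched with a standard variational (Barber-Agakov style) argument: for any variational conditional q, the conditional mutual information of two latent representations given the batch identifier B satisfies the lower bound below (notation ours, not the paper's exact bound):

```latex
I(Z_1; Z_2 \mid B) \;\geq\; H(Z_2 \mid B)
  \;+\; \mathbb{E}_{z_1, z_2, b}\big[\log q(z_2 \mid z_1, b)\big]
```

Maximizing such a bound encourages representations to share information beyond what the batch identifier explains, which is how conditioning on the batch removes the confounder.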

Towards Transparency in Coreference Resolution: A Quantum-Inspired Approach

  • paper_url: http://arxiv.org/abs/2312.00688
  • repo_url: https://github.com/hwazni/qcoref
  • paper_authors: Hadi Wazni, Mehrnoosh Sadrzadeh
  • for: The task of pronoun resolution in natural language processing, evaluated on a Winograd-style benchmark.
  • methods: A combination of quantum natural language processing (QNLP), which translates grammatical and discourse structure into Parametrised Quantum Circuits (PQCs), and a Variational Quantum Classifier (VQC) trained for binary classification.
  • results: An F1 score of 87.20% on the Winograd-style pronoun resolution task, outperforming two out of three classical coreference resolution systems and approaching state-of-the-art SpanBERT; a mixed quantum-classical model further improves these results with an F1 score increase of around 6%.
    Abstract Guided by grammatical structure, words compose to form sentences, and guided by discourse structure, sentences compose to form dialogues and documents. The compositional aspect of sentence and discourse units is often overlooked by machine learning algorithms. A recent initiative called Quantum Natural Language Processing (QNLP) learns word meanings as points in a Hilbert space and acts on them via a translation of grammatical structure into Parametrised Quantum Circuits (PQCs). Previous work extended the QNLP translation to discourse structure using points in a closure of Hilbert spaces. In this paper, we evaluate this translation on a Winograd-style pronoun resolution task. We train a Variational Quantum Classifier (VQC) for binary classification and implement an end-to-end pronoun resolution system. The simulations executed on IBMQ software converged with an F1 score of 87.20%. The model outperformed two out of three classical coreference resolution systems and neared state-of-the-art SpanBERT. A mixed quantum-classical model yet improved these results with an F1 score increase of around 6%.

Resource-constrained knowledge diffusion processes inspired by human peer learning

  • paper_url: http://arxiv.org/abs/2312.00660
  • repo_url: None
  • paper_authors: Ehsan Beikihassan, Amy K. Hoover, Ioannis Koutis, Ali Parviz, Niloofar Aghaieabiane
  • for: Optimize aggregate performance measures for a population of artificial learners under constraints on training resources.
  • methods: Studies natural knowledge diffusion processes in which interactions between learners can be shaped by a coordinator who may evaluate them before forming peer-learning groups; modular neural network models are also proposed to improve generalization and robustness.
  • results: Natural knowledge diffusion processes make effective use of the training resources and enable the design of modular neural models that generalize well and are robust.
    Abstract We consider a setting where a population of artificial learners is given, and the objective is to optimize aggregate measures of performance, under constraints on training resources. The problem is motivated by the study of peer learning in human educational systems. In this context, we study natural knowledge diffusion processes in networks of interacting artificial learners. By `natural', we mean processes that reflect human peer learning where the students' internal state and learning process is mostly opaque, and the main degree of freedom lies in the formation of peer learning groups by a coordinator who can potentially evaluate the learners before assigning them to peer groups. Among other things, we empirically show that such processes indeed make effective use of the training resources, and enable the design of modular neural models that have the capacity to generalize without being prone to overfitting noisy labels.

Simple Transferability Estimation for Regression Tasks

  • paper_url: http://arxiv.org/abs/2312.00656
  • repo_url: https://github.com/cuongnn218/regression_transferability
  • paper_authors: Cuong N. Nguyen, Phong Tran, Lam Si Tung Ho, Vu Dinh, Anh T. Tran, Tal Hassner, Cuong V. Nguyen
  • for: Estimating how well deep learning models transfer between tasks, focusing on regression tasks, which had received little prior attention; two simple, computationally efficient transferability estimators are proposed.
  • methods: Both estimators score transferability with the negative regularized mean squared error of a linear regression model; novel theoretical results connect these scores to the actual transferability of the optimal target models obtained from the transfer learning process.
  • results: On two large-scale keypoint regression benchmarks, the proposed estimators yield 12% to 36% better results on average while being at least 27% faster than previous state-of-the-art methods.
    Abstract We consider transferability estimation, the problem of estimating how well deep learning models transfer from a source to a target task. We focus on regression tasks, which received little previous attention, and propose two simple and computationally efficient approaches that estimate transferability based on the negative regularized mean squared error of a linear regression model. We prove novel theoretical results connecting our approaches to the actual transferability of the optimal target models obtained from the transfer learning process. Despite their simplicity, our approaches significantly outperform existing state-of-the-art regression transferability estimators in both accuracy and efficiency. On two large-scale keypoint regression benchmarks, our approaches yield 12% to 36% better results on average while being at least 27% faster than previous state-of-the-art methods.
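To make the scoring idea concrete, here is a minimal sketch of a "negative regularized mean squared error of a linear regression" transferability score using scikit-learn's ridge regression. The exact estimator and regularization in the paper may differ:

```python
# Fit a ridge regression from source-model features to target labels and
# report the negative (regularized) training error; higher = better
# expected transfer. A rough sketch, not the paper's exact estimator.
import numpy as np
from sklearn.linear_model import Ridge

def linear_mse_transferability(features, targets, alpha=1.0):
    """features: (n, d) source-model features on target data; targets: (n, k)."""
    reg = Ridge(alpha=alpha).fit(features, targets)
    preds = reg.predict(features)
    mse = np.mean((preds - targets) ** 2)
    penalty = alpha * np.sum(reg.coef_ ** 2) / len(targets)
    return -(mse + penalty)
```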

Latent Space Explorer: Visual Analytics for Multimodal Latent Space Exploration

  • paper_url: http://arxiv.org/abs/2312.00857
  • repo_url: None
  • paper_authors: Bum Chul Kwon, Samuel Friedman, Kai Xu, Steven A Lubitz, Anthony Philippakis, Puneet Batra, Patrick T Ellinor, Kenney Ng
  • for: Develop a visual analytics system that helps medical experts use multimodal data for prediction and novel medical discovery.
  • methods: Multimodal data are used for prediction, and a visual analytics system is built to help medical experts better explore and understand these data.
  • results: A user study shows that Latent Space Explorer helps medical experts explore and understand multimodal representations and supports useful predictions and novel medical discoveries.
    Abstract Machine learning models built on training data with multiple modalities can reveal new insights that are not accessible through unimodal datasets. For example, cardiac magnetic resonance images (MRIs) and electrocardiograms (ECGs) are both known to capture useful information about subjects' cardiovascular health status. A multimodal machine learning model trained from large datasets can potentially predict the onset of heart-related diseases and provide novel medical insights about the cardiovascular system. Despite the potential benefits, it is difficult for medical experts to explore multimodal representation models without visual aids and to test the predictive performance of the models on various subpopulations. To address the challenges, we developed a visual analytics system called Latent Space Explorer. Latent Space Explorer provides interactive visualizations that enable users to explore the multimodal representation of subjects, define subgroups of interest, interactively decode data with different modalities with the selected subjects, and inspect the accuracy of the embedding in downstream prediction tasks. A user study was conducted with medical experts and their feedback provided useful insights into how Latent Space Explorer can help their analysis and possible new direction for further development in the medical domain.

TrackDiffusion: Multi-object Tracking Data Generation via Diffusion Models

  • paper_url: http://arxiv.org/abs/2312.00651
  • repo_url: None
  • paper_authors: Pengxiang Li, Zhili Liu, Kai Chen, Lanqing Hong, Yunzhi Zhuge, Dit-Yan Yeung, Huchuan Lu, Xu Jia
  • for: Improve the performance of multi-object tracking (MOT) systems by generating high-quality tracking sequences.
  • methods: Builds on image diffusion models, extending them to capture complex dynamic trajectories while ensuring instance consistency across video frames.
  • results: Experiments show that the model significantly enhances instance consistency, improving perceptual metrics; on the YTVIS dataset, the method achieves gains of 8.7 in TrackAP and 11.8 in TrackAP$_{50}$.
    Abstract Diffusion models have gained prominence in generating data for perception tasks such as image classification and object detection. However, the potential in generating high-quality tracking sequences, a crucial aspect in the field of video perception, has not been fully investigated. To address this gap, we propose TrackDiffusion, a novel architecture designed to generate continuous video sequences from the tracklets. TrackDiffusion represents a significant departure from the traditional layout-to-image (L2I) generation and copy-paste synthesis focusing on static image elements like bounding boxes by empowering image diffusion models to encompass dynamic and continuous tracking trajectories, thereby capturing complex motion nuances and ensuring instance consistency among video frames. For the first time, we demonstrate that the generated video sequences can be utilized for training multi-object tracking (MOT) systems, leading to significant improvement in tracker performance. Experimental results show that our model significantly enhances instance consistency in generated video sequences, leading to improved perceptual metrics. Our approach achieves an improvement of 8.7 in TrackAP and 11.8 in TrackAP$_{50}$ on the YTVIS dataset, underscoring its potential to redefine the standards of video data generation for MOT tasks and beyond.

Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction

  • paper_url: http://arxiv.org/abs/2312.00855
  • repo_url: None
  • paper_authors: Shuchi Wu, Chuan Ma, Kang Wei, Xiaogang Xu, Ming Ding, Yuwen Qian, Tao Xiang
  • for: Addresses two main shortcomings of prior attacks that steal pre-trained encoders: reliance on biased optimization objectives, and high query costs from the end-to-end paradigm, which must query the target encoder every epoch.
  • methods: Proposes RDA in two main steps: first, the target encoder's representations of each training sample are refined into a less biased form, establishing a more reasonable optimization objective; second, a sample's representations from multiple views are aggregated into a sample-wise prototype, so that subsequent training needs no further queries to the target encoder.
  • results: Experiments show that RDA achieves state-of-the-art results across various downstream datasets and remains robust against several widely used defenses.
    Abstract This paper introduces RDA, a pioneering approach designed to address two primary deficiencies prevalent in previous endeavors aiming at stealing pre-trained encoders: (1) suboptimal performances attributed to biased optimization objectives, and (2) elevated query costs stemming from the end-to-end paradigm that necessitates querying the target encoder every epoch. Specifically, we initially Refine the representations of the target encoder for each training sample, thereby establishing a less biased optimization objective before the steal-training phase. This is accomplished via a sample-wise prototype, which consolidates the target encoder's representations for a given sample's various perspectives. Demanding exponentially fewer queries compared to the end-to-end approach, prototypes can be instantiated to guide subsequent query-free training. For more potent efficacy, we develop a multi-relational extraction loss that trains the surrogate encoder to Discriminate mismatched embedding-prototype pairs while Aligning those matched ones in terms of both amplitude and angle. In this way, the trained surrogate encoder achieves state-of-the-art results across the board in various downstream datasets with limited queries. Moreover, RDA is shown to be robust to multiple widely-used defenses.
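A hedged sketch of the two ingredients named in the abstract follows: a sample-wise prototype that consolidates the target encoder's representations of a sample's augmented views, and a loss that aligns matched embedding-prototype pairs in both amplitude (norm) and angle (cosine). Function names and the exact loss form are illustrative:

```python
import torch
import torch.nn.functional as F

def sample_prototype(target_encoder, views):
    """views: (v, c, h, w) augmented views of one sample.
    Averaging the per-view embeddings gives one prototype per sample,
    so the target encoder need not be queried again during training."""
    with torch.no_grad():
        return target_encoder(views).mean(dim=0)        # (d,) prototype

def amplitude_angle_loss(embeddings, prototypes):
    """embeddings, prototypes: (n, d) matched pairs from the surrogate encoder."""
    angle = 1.0 - F.cosine_similarity(embeddings, prototypes, dim=1)
    amplitude = (embeddings.norm(dim=1) - prototypes.norm(dim=1)).abs()
    return (angle + amplitude).mean()
```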

A Probabilistic Neural Twin for Treatment Planning in Peripheral Pulmonary Artery Stenosis

  • paper_url: http://arxiv.org/abs/2312.00854
  • repo_url: None
  • paper_authors: John D. Lee, Jakob Richter, Martin R. Pfaller, Jason M. Szafron, Karthik Menon, Andrea Zanoni, Michael R. Ma, Jeffrey A. Feinstein, Jacqueline Kreutzer, Alison L. Marsden, Daniele E. Schiavazzi
  • for: Use data-driven architectures and optimization techniques to reduce the computational cost of high-fidelity models, so that digital-twin technology can support time-critical decisions.
  • methods: Combines surrogate modeling, model reduction, and a training-dataset generation pipeline with online estimation of marginal probabilities, possibly conditioned on the degree of augmentation observed in already repaired lesions.
  • results: Proposes a new parametrization of arbitrarily shaped vascular repairs through iterative corrections of a zero-dimensional approximant, and demonstrates the full offline-online pipeline on a diseased model of the pulmonary artery tree.
    Abstract The substantial computational cost of high-fidelity models in numerical hemodynamics has, so far, relegated their use mainly to offline treatment planning. New breakthroughs in data-driven architectures and optimization techniques for fast surrogate modeling provide an exciting opportunity to overcome these limitations, enabling the use of such technology for time-critical decisions. We discuss an application to the repair of multiple stenosis in peripheral pulmonary artery disease through either transcatheter pulmonary artery rehabilitation or surgery, where it is of interest to achieve desired pressures and flows at specific locations in the pulmonary artery tree, while minimizing the risk for the patient. Since different degrees of success can be achieved in practice during treatment, we formulate the problem in probability, and solve it through a sample-based approach. We propose a new offline-online pipeline for probabilistic real-time treatment planning which combines offline assimilation of boundary conditions, model reduction, and training dataset generation with online estimation of marginal probabilities, possibly conditioned on the degree of augmentation observed in already repaired lesions. Moreover, we propose a new approach for the parametrization of arbitrarily shaped vascular repairs through iterative corrections of a zero-dimensional approximant. We demonstrate this pipeline for a diseased model of the pulmonary artery tree available through the Vascular Model Repository.

Towards Efficient 3D Object Detection in Bird’s-Eye-View Space for Autonomous Driving: A Convolutional-Only Approach

  • paper_url: http://arxiv.org/abs/2312.00633
  • repo_url: None
  • paper_authors: Yuxin Li, Qiang Han, Mengying Yu, Yuxin Jiang, Chaikiat Yeo, Yiheng Li, Zihang Huang, Nini Liu, Hsuanhan Chen, Xiaojun Wu
  • for: Improve the accuracy and speed of 3D object detection in bird's-eye-view (BEV) space for autonomous driving.
  • methods: Proposes BEVENet, an efficient convolutional-only BEV 3D detection framework that circumvents the limitations of ViT models while retaining the effectiveness of BEV-based methods.
  • results: BEVENet is 3x faster than contemporary state-of-the-art methods on the NuScenes challenge, achieving 0.456 mAP and 0.555 NDS on the NuScenes validation set at an inference speed of 47.6 frames per second.
    Abstract 3D object detection in Bird's-Eye-View (BEV) space has recently emerged as a prevalent approach in the field of autonomous driving. Despite the demonstrated improvements in accuracy and velocity estimation compared to perspective view methods, the deployment of BEV-based techniques in real-world autonomous vehicles remains challenging. This is primarily due to their reliance on vision-transformer (ViT) based architectures, which introduce quadratic complexity with respect to the input resolution. To address this issue, we propose an efficient BEV-based 3D detection framework called BEVENet, which leverages a convolutional-only architectural design to circumvent the limitations of ViT models while maintaining the effectiveness of BEV-based methods. Our experiments show that BEVENet is 3$\times$ faster than contemporary state-of-the-art (SOTA) approaches on the NuScenes challenge, achieving a mean average precision (mAP) of 0.456 and a nuScenes detection score (NDS) of 0.555 on the NuScenes validation dataset, with an inference speed of 47.6 frames per second. To the best of our knowledge, this study stands as the first to achieve such significant efficiency improvements for BEV-based methods, highlighting their enhanced feasibility for real-world autonomous driving applications.

Weighted Riesz Particles

  • paper_url: http://arxiv.org/abs/2312.00621
  • repo_url: https://github.com/986876245/weighted-riesz-particles
  • paper_authors: Xiongming Dai, Gerald Baumgartner
  • for: Addresses the exploration of complex statistical distributions with Markov chain Monte Carlo (MCMC) methods, in particular the computational complexity that grows with the dimensionality of the parameter space.
  • methods: Proposes a weighted-Riesz-energy approach in which a set of points generated through pairwise interactions discretizes rectifiable submanifolds, accelerating the MCMC exploration process.
  • results: Comparative experiments show that the method raises the MCMC acceptance rate while requiring fewer evaluations, making it applicable to high-dimensional parameter spaces.
    Abstract Markov chain Monte Carlo (MCMC) methods are simulated by local exploration of complex statistical distributions, and while bypassing the cumbersome requirement of a specific analytical expression for the target, this stochastic exploration of an uncertain parameter space comes at the expense of a large number of samples, and this computational complexity increases with parameter dimensionality. Although at the exploration level, some methods are proposed to accelerate the convergence of the algorithm, such as tempering, Hamiltonian Monte Carlo, Rao-Blackwellization, and scalable methods for better performance, they cannot avoid the stochastic nature of this exploration. We consider the target distribution as a mapping where the infinite-dimensional Eulerian space of the parameters consists of a number of deterministic submanifolds and propose a generalized energy metric, termed weighted Riesz energy, where a number of points is generated through pairwise interactions, to discretize rectifiable submanifolds. We study the properties of the point, called Riesz particle, and embed it into sequential MCMC, and we find that there will be higher acceptance rates with fewer evaluations, we validate it through experimental comparative analysis from a linear Gaussian state-space model with synthetic data and a non-linear stochastic volatility model with real-world data.
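For intuition, here is a hedged sketch of a weighted Riesz energy over a particle set, with pairwise inverse-power interactions and gradient descent to spread the points. The exponent, weights, and optimizer settings are illustrative defaults, not the paper's configuration:

```python
import torch

def weighted_riesz_energy(x, w, s=2.0, eps=1e-9):
    """x: (n, d) particle positions; w: (n,) positive weights; s: Riesz exponent."""
    diff = x[:, None, :] - x[None, :, :]                 # (n, n, d) pairwise differences
    dist = (diff.pow(2).sum(-1) + eps).sqrt()            # eps keeps diagonal gradients finite
    mask = 1.0 - torch.eye(x.shape[0])                   # zero out self-interactions
    inv = dist.pow(-s) * mask
    return (w[:, None] * w[None, :] * inv).sum() / 2     # each pair counted once

# Spreading particles by gradient descent on the energy:
x = torch.randn(64, 2, requires_grad=True)
w = torch.ones(64)
opt = torch.optim.Adam([x], lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    weighted_riesz_energy(x, w).backward()
    opt.step()
```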

Learning from One Continuous Video Stream

  • paper_url: http://arxiv.org/abs/2312.00598
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: João Carreira, Michael King, Viorica Pătrăucean, Dilara Gokay, Cătălin Ionescu, Yi Yang, Daniel Zoran, Joseph Heyward, Carl Doersch, Yusuf Aytar, Dima Damen, Andrew Zisserman
  • for: Studies online learning from a single continuous video stream, the way people and animals learn, without mini-batches, data augmentation, or shuffling.
  • methods: Introduces a framework with streams and tasks composed from two existing video datasets, plus an evaluation methodology that covers both adaptation and generalization; pixel-to-pixel modelling is used to switch between pre-training and single-stream evaluation, and between arbitrary tasks, without changing the model and always using the same pixel loss.
  • results: Pre-training with a novel family of future-prediction tasks yields large single-stream learning gains; momentum hurts, and the pace of weight updates matters. Combining these insights matches the performance of IID learning with batch size 1, using the same architecture and without costly replay buffers.
    Abstract We introduce a framework for online learning from a single continuous video stream -- the way people and animals learn, without mini-batches, data augmentation or shuffling. This poses great challenges given the high correlation between consecutive video frames and there is very little prior work on it. Our framework allows us to do a first deep dive into the topic and includes a collection of streams and tasks composed from two existing video datasets, plus methodology for performance evaluation that considers both adaptation and generalization. We employ pixel-to-pixel modelling as a practical and flexible way to switch between pre-training and single-stream evaluation as well as between arbitrary tasks, without ever requiring changes to models and always using the same pixel loss. Equipped with this framework we obtained large single-stream learning gains from pre-training with a novel family of future prediction tasks, found that momentum hurts, and that the pace of weight updates matters. The combination of these insights leads to matching the performance of IID learning with batch size 1, when using the same architecture and without costly replay buffers.
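A minimal sketch of the single-stream regime follows: frames arrive in order, the batch size is 1, nothing is shuffled, and the same pixel loss is used throughout. The model and stream objects are placeholders, and per the abstract's finding that momentum hurts, an optimizer without momentum may be preferable:

```python
import torch
import torch.nn.functional as F

def train_on_stream(model, stream, optimizer, updates_per_frame=1):
    """stream yields consecutive frames of shape (c, h, w); predict the next frame."""
    prev = None
    for frame in stream:
        frame = frame.unsqueeze(0)                        # batch size 1, always
        if prev is not None:
            for _ in range(updates_per_frame):            # pace of weight updates matters
                pred = model(prev)                        # pixel-to-pixel prediction
                loss = F.mse_loss(pred, frame)            # one pixel loss for everything
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        prev = frame.detach()
```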

UAVs and Birds: Enhancing Short-Range Navigation through Budgerigar Flight Studies

  • paper_url: http://arxiv.org/abs/2312.00597
  • repo_url: None
  • paper_authors: Md. Mahmudur Rahman, Sajid Islam, Showren Chowdhury, Sadia Jahan Zeba, Debajyoti Karmaker
  • for: Examines the flight behaviour of budgerigars (Melopsittacus undulatus) to understand their flight trajectories and movements.
  • methods: Uses 3D reconstruction from stereo video camera recordings to closely analyse the birds' velocity and acceleration patterns.
  • results: Budgerigars show distinctive movement characteristics during takeoff, flight, and landing that could be used to improve the performance and autonomy of unmanned aerial vehicles (UAVs).
    Abstract This study delves into the flight behaviors of Budgerigars (Melopsittacus undulatus) to gain insights into their flight trajectories and movements. Using 3D reconstruction from stereo video camera recordings, we closely examine the velocity and acceleration patterns during three flight motions: takeoff, flying and landing. The findings not only contribute to our understanding of bird behaviors but also hold significant implications for the advancement of algorithms in Unmanned Aerial Vehicles (UAVs). The research aims to bridge the gap between biological principles observed in birds and the application of these insights in developing more efficient and autonomous UAVs. In the context of the increasing use of drones, this study focuses on the biologically inspired principles drawn from bird behaviors, particularly during takeoff, flying and landing flight, to enhance UAV capabilities. The dataset created for this research sheds light on Budgerigars' takeoff, flying, and landing techniques, emphasizing their ability to control speed across different situations and surfaces. The study underscores the potential of incorporating these principles into UAV algorithms, addressing challenges related to short-range navigation, takeoff, flying, and landing.
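The kinematic analysis step can be illustrated with finite differences over a reconstructed 3D trajectory; the frame rate is an assumed parameter and the positions are assumed to be in metres:

```python
# Given a 3D trajectory reconstructed from stereo video, estimate velocity
# and acceleration per frame by finite differences.
import numpy as np

def kinematics(positions, fps):
    """positions: (t, 3) 3D points per frame; returns velocity, acceleration, speed."""
    dt = 1.0 / fps
    velocity = np.gradient(positions, dt, axis=0)        # (t, 3) m/s
    acceleration = np.gradient(velocity, dt, axis=0)     # (t, 3) m/s^2
    speed = np.linalg.norm(velocity, axis=1)             # scalar speed profile
    return velocity, acceleration, speed
```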

BCN: Batch Channel Normalization for Image Classification

  • paper_url: http://arxiv.org/abs/2312.00596
  • repo_url: https://github.com/AfifaKhaled/Batch-Channel-Normalization
  • paper_authors: Afifa Khaled, Chao Li, Jia Ning, Kun He
  • for: Proposes a new normalization technique to improve the performance of deep learning models.
  • methods: Batch Channel Normalization (BCN) normalizes the input separately along the (N, H, W) and (C, H, W) axes, then combines the two normalized outputs with adaptive parameters, adapting to the specific dataset or task.
  • results: Experiments show that the technique can be dropped into various versions of CNN or Vision Transformer architectures and outperforms standard Batch Normalization (BN) and Layer Normalization (LN) across datasets.
    Abstract Normalization techniques have been widely used in the field of deep learning due to their ability to enable higher learning rates and reduce sensitivity to initialization. However, the effectiveness of popular normalization technologies is typically limited to specific areas. Unlike the standard Batch Normalization (BN) and Layer Normalization (LN), where BN computes the mean and variance along the (N,H,W) dimensions and LN computes the mean and variance along the (C,H,W) dimensions (N, C, H and W are the batch, channel, spatial height and width dimension, respectively), this paper presents a novel normalization technique called Batch Channel Normalization (BCN). To exploit both the channel and batch dependence and adaptively combine the advantages of BN and LN based on specific datasets or tasks, BCN separately normalizes inputs along the (N, H, W) and (C, H, W) axes, then combines the normalized outputs based on adaptive parameters. As a basic block, BCN can be easily integrated into existing models for various applications in the field of computer vision. Empirical results show that the proposed technique can be seamlessly applied to various versions of CNN or Vision Transformer architecture. The code is publicly available at https://github.com/AfifaKhaled/BatchChannel-Normalization
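A PyTorch sketch of the BCN idea follows, normalizing along (N, H, W) and along (C, H, W) and mixing the two results with a learnable parameter. The sigmoid-gated mix and the single affine transform are assumptions for the sketch; the paper's exact combination rule may differ:

```python
import torch
import torch.nn as nn

class BCN(nn.Module):
    def __init__(self, num_channels, eps=1e-5):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_channels, eps=eps, affine=False)       # over (N, H, W)
        self.ln = nn.GroupNorm(1, num_channels, eps=eps, affine=False)      # LN over (C, H, W)
        self.mix = nn.Parameter(torch.tensor(0.5))       # adaptive balance between BN and LN
        self.gamma = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1, 1))

    def forward(self, x):                                # x: (N, C, H, W)
        m = torch.sigmoid(self.mix)                      # keep the mixing weight in (0, 1)
        y = m * self.bn(x) + (1.0 - m) * self.ln(x)
        return self.gamma * y + self.beta
```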

Less is More: Learning Reference Knowledge Using No-Reference Image Quality Assessment

  • paper_url: http://arxiv.org/abs/2312.00591
  • repo_url: https://github.com/LXDxmu/IQA
  • paper_authors: Xudong Li, Jingyuan Zheng, Xiawu Zheng, Runze Hu, Enwei Zhang, Yuting Gao, Yunhang Shen, Ke Li, Yutao Liu, Pingyang Dai, Yan Zhang, Rongrong Ji
  • for: Addresses no-reference image quality assessment (NR-IQA): how to learn reference knowledge for judging image quality when no pristine reference image is available.
  • methods: Proposes a new feature-distillation method that learns comparative knowledge from non-aligned reference images, together with an inductive-bias regularization that speeds convergence, avoids overfitting, and strengthens the feature extraction framework.
  • results: Achieves state-of-the-art performance on eight standard NR-IQA datasets, with PLCC values of 0.917 (vs. 0.884 on LIVEC) and 0.686 (vs. 0.661 on LIVEFB), clearly surpassing the teacher models.
    Abstract Image Quality Assessment (IQA) with reference images have achieved great success by imitating the human vision system, in which the image quality is effectively assessed by comparing the query image with its pristine reference image. However, for the images in the wild, it is quite difficult to access accurate reference images. We argue that it is possible to learn reference knowledge under the No-Reference Image Quality Assessment (NR-IQA) setting, which is effective and efficient empirically. Concretely, by innovatively introducing a novel feature distillation method in IQA, we propose a new framework to learn comparative knowledge from non-aligned reference images. And then, to achieve fast convergence and avoid overfitting, we further propose an inductive bias regularization. Such a framework not only solves the congenital defects of NR-IQA but also improves the feature extraction framework, enabling it to express more abundant quality information. Surprisingly, our method utilizes less input while obtaining a more significant improvement compared to the teacher models. Extensive experiments on eight standard NR-IQA datasets demonstrate the superior performance to the state-of-the-art NR-IQA methods, i.e., achieving the PLCC values of 0.917 (vs. 0.884 in LIVEC) and 0.686 (vs. 0.661 in LIVEFB).

Explainable Fraud Detection with Deep Symbolic Classification

  • paper_url: http://arxiv.org/abs/2312.00586
  • repo_url: https://github.com/samanthav24/dsc_fraud_detection
  • paper_authors: Samantha Visbeek, Erman Acar, Floris den Hengst
  • for: Deep Symbolic Classification (DSC) extends the deep symbolic regression framework to classification, targeting explainable fraud detection.
  • methods: DSC casts classification as a search over all analytic functions composed from a vocabulary of variables, constants, and operations, guided by a deep neural network trained with reinforcement learning, and directly optimizes an arbitrary evaluation metric.
  • results: On the PaySim dataset, DSC matches the predictive performance of state-of-the-art models while surpassing them in explainability, making it a promising model for fraud detection.
    Abstract There is a growing demand for explainable, transparent, and data-driven models within the domain of fraud detection. Decisions made by fraud detection models need to be explainable in the event of a customer dispute. Additionally, the decision-making process in the model must be transparent to win the trust of regulators and business stakeholders. At the same time, fraud detection solutions can benefit from data due to the noisy, dynamic nature of fraud and the availability of large historical data sets. Finally, fraud detection is notorious for its class imbalance: there are typically several orders of magnitude more legitimate transactions than fraudulent ones. In this paper, we present Deep Symbolic Classification (DSC), an extension of the Deep Symbolic Regression framework to classification problems. DSC casts classification as a search problem in the space of all analytic functions composed of a vocabulary of variables, constants, and operations and optimizes for an arbitrary evaluation metric directly. The search is guided by a deep neural network trained with reinforcement learning. Because the functions are mathematical expressions that are in closed-form and concise, the model is inherently explainable both at the level of a single classification decision and the model's decision process. Furthermore, the class imbalance problem is successfully addressed by optimizing for metrics that are robust to class imbalance such as the F1 score. This eliminates the need for oversampling and undersampling techniques that plague traditional approaches. Finally, the model allows to explicitly balance between the prediction accuracy and the explainability. An evaluation on the PaySim data set demonstrates competitive predictive performance with state-of-the-art models, while surpassing them in terms of explainability. This establishes DSC as a promising model for fraud detection systems.
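To illustrate why optimizing a class-imbalance-robust metric directly is convenient here, a sketch of the evaluation step in a DSC-style search: a candidate closed-form expression is turned into class predictions by thresholding, and its F1 score serves as the reward for the reinforcement-learning search. The expression handling and threshold are illustrative:

```python
import numpy as np
from sklearn.metrics import f1_score

def expression_reward(expr_fn, X, y, threshold=0.0):
    """expr_fn: closed-form function mapping a feature row to a real score."""
    scores = np.apply_along_axis(expr_fn, 1, X)
    preds = (scores > threshold).astype(int)
    return f1_score(y, preds)                            # reward for the RL policy

# Example candidate: a concise, human-readable expression over two features.
reward = expression_reward(lambda v: v[0] * 2.0 - np.log1p(abs(v[1])),
                           X=np.random.randn(100, 2),
                           y=np.random.randint(0, 2, 100))
```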

  • paper_url: http://arxiv.org/abs/2312.00584
  • repo_url: None
  • paper_authors: Josef Valvoda, Alec Thompson, Ryan Cotterell, Simone Teufel
  • for: Concerns the automation of the judge's role in the field of legal NLP.
  • methods: Observes that many legal NLP models are trained on large public datasets of judgments, so that machine learning effectively produces models of judges and their decision making.
  • results: Argues that automating the role of the judge raises difficult ethical challenges, particularly for common law legal systems.
    Abstract The introduction of large public legal datasets has brought about a renaissance in legal NLP. Many of these datasets are comprised of legal judgements - the product of judges deciding cases. This fact, together with the way machine learning works, means that several legal NLP models are models of judges. While some have argued for the automation of judges, in this position piece, we argue that automating the role of the judge raises difficult ethical challenges, in particular for common law legal systems. Our argument follows from the social role of the judge in actively shaping the law, rather than merely applying it. Since current NLP models come nowhere close to having the facilities necessary for this task, they should not be used to automate judges. Furthermore, even in the case the models could achieve human-level capabilities, there would still be remaining ethical concerns inherent in the automation of the legal process.

  • paper_url: http://arxiv.org/abs/2312.00554
  • repo_url: None
  • paper_authors: Aniket Deroy, Subhankar Maity
  • for: Examines potential biases in case-judgment summaries produced with legal datasets and large language models (LLMs).
  • methods: Generates case-judgment summaries using legal datasets and large language models, then analyses the biases present in those summaries.
  • results: Finds evidence of bias involving gender-related keywords, race-related keywords, keywords related to crime against women, country names, and religious keywords.
    Abstract The evolution of legal datasets and the advent of large language models (LLMs) have significantly transformed the legal field, particularly in the generation of case judgment summaries. However, a critical concern arises regarding the potential biases embedded within these summaries. This study scrutinizes the biases present in case judgment summaries produced by legal datasets and large language models. The research aims to analyze the impact of biases on legal decision making. By interrogating the accuracy, fairness, and implications of biases in these summaries, this study contributes to a better understanding of the role of technology in legal contexts and the implications for justice systems worldwide. In this study, we investigate biases with respect to gender-related keywords, race-related keywords, keywords related to crime against women, country names, and religious keywords. The study shows interesting evidence of bias in the outputs generated by the large language models and pre-trained abstractive summarization models. The reasoning behind these biases needs further study.
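A keyword-based bias probe of the kind described can be sketched as follows: count occurrences of keyword groups in generated summaries and compare rates across groups. The keyword lists here are illustrative stand-ins, not the study's actual lists:

```python
from collections import Counter
import re

KEYWORD_GROUPS = {
    "gender": ["he", "she", "his", "her"],
    "religion": ["church", "mosque", "temple"],
}

def keyword_counts(summaries):
    """Count each keyword's occurrences across a list of summary strings."""
    counts = {group: Counter() for group in KEYWORD_GROUPS}
    for text in summaries:
        bag = Counter(re.findall(r"[a-z']+", text.lower()))
        for group, keys in KEYWORD_GROUPS.items():
            for k in keys:
                counts[group][k] += bag[k]
    return counts
```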

Target-agnostic Source-free Domain Adaptation for Regression Tasks

  • paper_url: http://arxiv.org/abs/2312.00540
  • repo_url: https://github.com/Siriusize/TASFAR_DA
  • paper_authors: Tianlang He, Zhiqiu Xia, Jierun Chen, Haoliang Li, S. -H. Gary Chan
  • for: Tackles the domain gap in unsupervised domain adaptation (UDA) without labelled target data and without access to source data at adaptation time.
  • methods: Proposes TASFAR, a novel target-agnostic source-free domain adaptation method for regression tasks; it uses prediction confidence to estimate a label density map as the target label distribution, which is then used to calibrate the source model on the target domain.
  • results: Across four regression tasks, TASFAR reduces errors by 22% on average compared with state-of-the-art source-free UDA methods and achieves accuracy comparable to source-based UDA without using any source data.
    Abstract Unsupervised domain adaptation (UDA) seeks to bridge the domain gap between the target and source using unlabeled target data. Source-free UDA removes the requirement for labeled source data at the target to preserve data privacy and storage. However, work on source-free UDA assumes knowledge of domain gap distribution, and hence is limited to either target-aware or classification task. To overcome it, we propose TASFAR, a novel target-agnostic source-free domain adaptation approach for regression tasks. Using prediction confidence, TASFAR estimates a label density map as the target label distribution, which is then used to calibrate the source model on the target domain. We have conducted extensive experiments on four regression tasks with various domain gaps, namely, pedestrian dead reckoning for different users, image-based people counting in different scenes, housing-price prediction at different districts, and taxi-trip duration prediction from different departure points. TASFAR is shown to substantially outperform the state-of-the-art source-free UDA approaches by reducing errors by 22% on average across the four tasks, and achieves accuracy notably comparable to source-based UDA without using source data.
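One loose reading of the label-density idea can be sketched as follows: keep only confident predictions, estimate a density over their predicted labels, and use it to weight pseudo-labelled samples when calibrating the source model on the target domain. This is an assumption-laden illustration, not the paper's algorithm:

```python
import numpy as np
from scipy.stats import gaussian_kde

def label_density_weights(preds, confidences, conf_threshold=0.8):
    """preds: (n,) scalar regression outputs; confidences: (n,) values in [0, 1]."""
    confident = preds[confidences > conf_threshold]      # trusted pseudo-labels
    density = gaussian_kde(confident)                    # estimated target label density
    return density(preds)                                # per-sample calibration weights
```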

SurreyAI 2023 Submission for the Quality Estimation Shared Task

  • paper_url: http://arxiv.org/abs/2312.00525
  • repo_url: None
  • paper_authors: Archchana Sindhujan, Diptesh Kanojia, Constantin Orasan, Tharindu Ranasinghe
  • for: The paper is written for assessing the quality of translations in situations where there is no reference available.
  • methods: The paper uses the TransQuest framework and explores various autoencoder pre-trained language models within the MonoTransQuest architecture using single and ensemble settings.
  • results: The proposed approach, using the MonoTQ-InfoXLM-large model, significantly improves over the baseline for the majority of the 5 language pairs (English-Gujarati, English-Hindi, English-Marathi, English-Tamil, and English-Telugu) in the Sentence-Level Direct Assessment shared task of WMT23.
    Abstract Quality Estimation (QE) systems are important in situations where it is necessary to assess the quality of translations, but there is no reference available. This paper describes the approach adopted by the SurreyAI team for addressing the Sentence-Level Direct Assessment shared task in WMT23. The proposed approach builds upon the TransQuest framework, exploring various autoencoder pre-trained language models within the MonoTransQuest architecture using single and ensemble settings. The autoencoder pre-trained language models employed in the proposed systems are XLMV, InfoXLM-large, and XLMR-large. The evaluation utilizes Spearman and Pearson correlation coefficients, assessing the relationship between machine-predicted quality scores and human judgments for 5 language pairs (English-Gujarati, English-Hindi, English-Marathi, English-Tamil and English-Telugu). The MonoTQ-InfoXLM-large approach emerges as a robust strategy, surpassing all other individual models proposed in this study by significantly improving over the baseline for the majority of the language pairs.
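The shared-task evaluation reduces to correlating system scores with human judgments; for instance, with SciPy (the score values below are toy examples):

```python
from scipy.stats import pearsonr, spearmanr

predicted = [0.71, 0.42, 0.88, 0.35, 0.66]   # system quality scores (toy values)
human =     [0.75, 0.40, 0.80, 0.30, 0.70]   # human direct-assessment scores

pearson, _ = pearsonr(predicted, human)
spearman, _ = spearmanr(predicted, human)
print(f"Pearson r = {pearson:.3f}, Spearman rho = {spearman:.3f}")
```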

Generative artificial intelligence enhances individual creativity but reduces the collective diversity of novel content

  • paper_url: http://arxiv.org/abs/2312.00506
  • repo_url: None
  • paper_authors: Anil R. Doshi, Oliver P. Hauser
  • for: Investigates the causal impact of ideas from generative artificial intelligence (GenAI) on creative output.
  • methods: An online experimental study in which some writers could obtain ideas for a story from a GenAI platform.
  • results: Access to GenAI ideas causes stories to be rated as more creative, better written, and more enjoyable, especially for less creative writers; yet within each condition, GenAI-enabled stories are more similar to one another than stories written by humans alone. Individual creativity rises, but there is a risk of losing collective novelty: a social-dilemma-like dynamic in which individual writers benefit from GenAI while, collectively, a narrower range of novel content may be produced.
    Abstract Creativity is core to being human. Generative artificial intelligence (GenAI) holds promise for humans to be more creative by offering new ideas, or less creative by anchoring on GenAI ideas. We study the causal impact of GenAI ideas on the production of an unstructured creative output in an online experimental study where some writers could obtain ideas for a story from a GenAI platform. We find that access to GenAI ideas causes stories to be evaluated as more creative, better written and more enjoyable, especially among less creative writers. However, objective measures of story similarity within each condition reveal that GenAI-enabled stories are more similar to each other than stories by humans alone. These results point to an increase in individual creativity, but at the same time there is a risk of losing collective novelty: this dynamic resembles a social dilemma where individual writers are better off using GenAI to improve their own writing, but collectively a narrower scope of novel content may be produced with GenAI. Our results have implications for researchers, policy-makers and practitioners interested in bolstering creativity, but point to potential downstream consequences from over-reliance.
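The "objective measures of story similarity" step can be sketched as follows: embed each story (here with TF-IDF for simplicity, which is an assumption, not necessarily the study's measure) and compare the mean pairwise cosine similarity within each condition; a higher mean indicates lower collective diversity:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mean_pairwise_similarity(stories):
    """stories: list of story strings from one experimental condition."""
    vecs = TfidfVectorizer().fit_transform(stories)
    sims = cosine_similarity(vecs)
    n = len(stories)
    off_diagonal = sims[~np.eye(n, dtype=bool)]          # drop self-similarities
    return off_diagonal.mean()

# Compare conditions, e.g.:
# print(mean_pairwise_similarity(human_stories), mean_pairwise_similarity(genai_stories))
```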

  • paper_url: http://arxiv.org/abs/2312.00480
  • repo_url: None
  • paper_authors: Hiroaki Yamada, Takenobu Tokunaga, Ryutaro Ohara, Akira Tokutsu, Keisuke Takeshita, Mihoko Sumida
  • for: Presents the first dataset for Japanese legal judgment prediction (LJP), the Japanese Tort-case Dataset (JTD), which covers two tasks: tort prediction and rationale extraction.
  • methods: Baseline experiments demonstrate the feasibility of the two proposed tasks, and legal experts conduct an error analysis of the results.
  • results: The baselines confirm the feasibility of both tasks, and the expert error analysis identifies sources of errors and suggests future directions for LJP research.
    Abstract This paper presents the first dataset for Japanese Legal Judgment Prediction (LJP), the Japanese Tort-case Dataset (JTD), which features two tasks: tort prediction and its rationale extraction. The rationale extraction task identifies the court's accepting arguments from alleged arguments by plaintiffs and defendants, which is a novel task in the field. JTD is constructed based on annotated 3,477 Japanese Civil Code judgments by 41 legal experts, resulting in 7,978 instances with 59,697 of their alleged arguments from the involved parties. Our baseline experiments show the feasibility of the proposed two tasks, and our error analysis by legal experts identifies sources of errors and suggests future directions of the LJP research.

A Bayesian approach for prompt optimization in pre-trained language models

  • paper_url: http://arxiv.org/abs/2312.00471
  • repo_url: None
  • paper_authors: Antonio Sabbatella, Andrea Ponti, Antonio Candelieri, Ilaria Giordani, Francesco Archetti
  • for: Improves hard prompt tuning (HPT) so that strong text-classification performance can be achieved even when the language model (LLM) is unavailable or accessible only as a black box.
  • methods: Formulates discrete prompt selection as a combinatorial optimization problem and applies Bayesian optimization in a continuous embedding of the token space, coping with the high dimensionality of the token space and the length of the prompt sequence.
  • results: Experiments with RoBERTa on six benchmarks show that the approach searches over discrete prompts efficiently and performs well across tasks, enabling an analysis of the trade-off among search-space size, accuracy, and wall-clock time.
    Abstract A prompt is a sequence of symbols or tokens, selected from a vocabulary according to some rule, which is prepended/concatenated to a textual query. A key problem is how to select the sequence of tokens: in this paper we formulate it as a combinatorial optimization problem. The high dimensionality of the token space compounded by the length of the prompt sequence requires a very efficient solution. In this paper we propose a Bayesian optimization method, executed in a continuous embedding of the combinatorial space. In this paper we focus on hard prompt tuning (HPT) which directly searches for discrete tokens to be added to the text input without requiring access to the large language model (LLM) and can be used also when LLM is available only as a black-box. This is critically important if LLMs are made available in the Model as a Service (MaaS) manner as in GPT-4. The current manuscript is focused on the optimization of discrete prompts for classification tasks. The discrete prompts give rise to a difficult combinatorial optimization problem which easily becomes intractable given the dimension of the token space in realistic applications. The optimization method considered in this paper is Bayesian optimization (BO) which has become the dominant approach in black-box optimization for its sample efficiency along with its modular structure and versatility. In this paper we use BoTorch, a library for Bayesian optimization research built on top of pyTorch. Albeit preliminary and obtained using a 'vanilla' version of BO, the experiments on RoBERTa on six benchmarks show good performance across a variety of tasks and enable an analysis of the tradeoff between size of the search space, accuracy and wall clock time.
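A hedged sketch of the BO-in-a-continuous-embedding loop follows, using standard BoTorch components (API names per recent BoTorch versions). The decoding of a continuous point to discrete tokens and the black-box scoring are placeholders, and the toy objective stands in for downstream accuracy:

```python
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition import ExpectedImprovement
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

def score(z):
    # Placeholder black box: in HPT this would decode z to the nearest
    # vocabulary tokens and return the downstream classification accuracy.
    return -(z ** 2).sum(dim=-1, keepdim=True)

d = 4                                                     # embedding dimension
bounds = torch.stack([-torch.ones(d), torch.ones(d)]).double()
X = (torch.rand(8, d, dtype=torch.double) * 2) - 1        # initial design
Y = score(X)

for _ in range(20):
    gp = SingleTaskGP(X, Y)                               # surrogate over the embedding
    fit_gpytorch_mll(ExactMarginalLogLikelihood(gp.likelihood, gp))
    acq = ExpectedImprovement(gp, best_f=Y.max())
    cand, _ = optimize_acqf(acq, bounds=bounds, q=1, num_restarts=5, raw_samples=64)
    X, Y = torch.cat([X, cand]), torch.cat([Y, score(cand)])
```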

Meta-Diversity Search in Complex Systems, A Recipe for Artificial Open-Endedness ?

  • paper_url: http://arxiv.org/abs/2312.00455
  • repo_url: None
  • paper_authors: Mayalen Etcheverry, Bert Wang-Chak Chan, Clément Moulin-Frier, Pierre-Yves Oudeyer
  • for: The paper aims to develop an artificial system that can generate endless surprises in Minecraft by leveraging complex systems and meta-diversity search.
  • methods: The proposed framework includes a complex system for recursively growing and complexifying artifacts over time, and a discovery algorithm that uses meta-diversity search to automate the long-term discovery of novel and increasingly complex artifacts.
  • results: The authors simulate an artificial "chemistry" system based on the Lenia continuous cellular automaton for generating artifacts, and an artificial "discovery assistant" (called Holmes) for the artifact-discovery process. Holmes incrementally learns a hierarchy of modular representations to characterize divergent sources of diversity and uses goal-based intrinsically-motivated exploration as the diversity search strategy.
    Abstract Can we build an artificial system that would be able to generate endless surprises if run "forever" in Minecraft? While there is not a single path toward solving that grand challenge, this article presents what we believe to be some working ingredients for the endless generation of novel increasingly complex artifacts in Minecraft. Our framework for an open-ended system includes two components: a complex system used to recursively grow and complexify artifacts over time, and a discovery algorithm that leverages the concept of meta-diversity search. Since complex systems have shown to enable the emergence of considerable complexity from a set of simple rules, we believe them to be great candidates to generate all sorts of artifacts in Minecraft. Yet, the space of possible artifacts that can be generated by these systems is often unknown, challenging to characterize and explore. Therefore automating the long-term discovery of novel and increasingly complex artifacts in these systems is an exciting research field. To approach these challenges, we formulate the problem of meta-diversity search where an artificial "discovery assistant" incrementally learns a diverse set of representations to characterize behaviors and searches to discover diverse patterns within each of them. A successful discovery assistant should continuously seek for novel sources of diversities while being able to quickly specialize the search toward a new unknown type of diversity. To implement those ideas in the Minecraft environment, we simulate an artificial "chemistry" system based on Lenia continuous cellular automaton for generating artifacts, as well as an artificial "discovery assistant" (called Holmes) for the artifact-discovery process. Holmes incrementally learns a hierarchy of modular representations to characterize divergent sources of diversity and uses a goal-based intrinsically-motivated exploration as the diversity search strategy.

PEFTDebias : Capturing debiasing information using PEFTs

  • paper_url: http://arxiv.org/abs/2312.00434
  • repo_url: None
  • paper_authors: Sumit Agarwal, Aditya Srikanth Veerubhotla, Srijan Bansal
  • for: Addresses the implicit biases that foundation models acquire during pretraining, mitigating them with parameter-efficient fine-tuning (PEFT).
  • methods: Two main phases: an upstream phase that acquires debiasing parameters along a specific bias axis, and a downstream phase in which these parameters are incorporated into the model and frozen during fine-tuning.
  • results: Evaluated on four datasets across two bias axes (gender and race), PEFTs effectively reduce downstream biases; the learned parameters show axis-specific debiasing characteristics that transfer effectively to reduce bias in various downstream tasks.
    Abstract The increasing use of foundation models highlights the urgent need to address and eliminate implicit biases present in them that arise during pretraining. In this paper, we introduce PEFTDebias, a novel approach that employs parameter-efficient fine-tuning (PEFT) to mitigate the biases within foundation models. PEFTDebias consists of two main phases: an upstream phase for acquiring debiasing parameters along a specific bias axis, and a downstream phase where these parameters are incorporated into the model and frozen during the fine-tuning process. By evaluating on four datasets across two bias axes namely gender and race, we find that downstream biases can be effectively reduced with PEFTs. In addition, we show that these parameters possess axis-specific debiasing characteristics, enabling their effective transferability in mitigating biases in various downstream tasks. To ensure reproducibility, we release the code to do our experiments.

Abstract Syntax Tree for Programming Language Understanding and Representation: How Far Are We?

  • paper_url: http://arxiv.org/abs/2312.00413
  • repo_url: https://github.com/wssun/ast4plu
  • paper_authors: Weisong Sun, Chunrong Fang, Yun Miao, Yudu You, Mengzhe Yuan, Yuchen Chen, Quanjun Zhang, An Guo, Xiang Chen, Yang Liu, Zhenyu Chen
  • for: Investigates how effective AST-based code representations are for code representation learning and subsequent code-related tasks, and how different AST parsing/preprocessing/encoding choices affect the results.
  • methods: An empirical study comparing models trained with token-sequence code representations against models trained with AST-based representations on three popular types of code-related tasks, plus comprehensive experiments on the choice of AST parsing/preprocessing/encoding methods.
  • results: Models trained with token-based representations consistently outperform AST-based ones across all three tasks overall, although AST-based models do better on certain subsets of samples; the study also provides detailed guidance on selecting AST parsing/preprocessing/encoding methods.
    Abstract Programming language understanding and representation (a.k.a code representation learning) has always been a hot and challenging task in software engineering. It aims to apply deep learning techniques to produce numerical representations of the source code features while preserving its semantics. These representations can be used for facilitating subsequent code-related tasks. The abstract syntax tree (AST), a fundamental code feature, illustrates the syntactic information of the source code and has been widely used in code representation learning. However, there is still a lack of systematic and quantitative evaluation of how well AST-based code representation facilitates subsequent code-related tasks. In this paper, we first conduct a comprehensive empirical study to explore the effectiveness of the AST-based code representation in facilitating follow-up code-related tasks. To do so, we compare the performance of models trained with code token sequence (Token for short) based code representation and AST-based code representation on three popular types of code-related tasks. Surprisingly, the overall quantitative statistical results demonstrate that models trained with AST-based code representation consistently perform worse across all three tasks compared to models trained with Token-based code representation. Our further quantitative analysis reveals that models trained with AST-based code representation outperform models trained with Token-based code representation in certain subsets of samples across all three tasks. We also conduct comprehensive experiments to evaluate and reveal the impact of the choice of AST parsing/preprocessing/encoding methods on AST-based code representation and subsequent code-related tasks. Our study provides future researchers with detailed guidance on how to select solutions at each stage to fully exploit AST.
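The two representations being compared can be illustrated with the Python standard library, extracting a token sequence and a linearized AST node-type sequence from the same snippet (one simple linearization among many possible):

```python
import ast
import io
import tokenize

code = "def add(a, b):\n    return a + b\n"

# Token-based view: the raw lexical token strings.
tokens = [t.string for t in tokenize.generate_tokens(io.StringIO(code).readline)
          if t.string.strip()]

# AST-based view: node types in breadth-first order, exposing syntax structure.
node_types = [type(n).__name__ for n in ast.walk(ast.parse(code))]

print(tokens)       # ['def', 'add', '(', 'a', ',', 'b', ')', ':', 'return', 'a', '+', 'b']
print(node_types)   # ['Module', 'FunctionDef', 'arguments', 'Return', ...]
```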

Enhancing Explainability in Mobility Data Science through a combination of methods

  • paper_url: http://arxiv.org/abs/2312.00380
  • repo_url: None
  • paper_authors: Georgios Makridis, Vasileios Koukos, Georgios Fatouros, Dimosthenis Kyriazis
  • for: This work aims to improve the explainability of models trained on trajectory data and to give different user groups a deeper understanding of model decisions.
  • methods: A unified framework combines several XAI techniques, LIME, SHAP, saliency maps, attention mechanisms, direct trajectory visualization, and Permutation Feature Importance (PFI), to explain model decisions on trajectory data.
  • results: The framework helps users understand the decision process and yields more granular, contextually rich explanations that serve the needs of different user demographics.
    Abstract In the domain of Mobility Data Science, the intricate task of interpreting models trained on trajectory data, and elucidating the spatio-temporal movement of entities, has persistently posed significant challenges. Conventional XAI techniques, although brimming with potential, frequently overlook the distinct structure and nuances inherent within trajectory data. Observing this deficiency, we introduced a comprehensive framework that harmonizes pivotal XAI techniques: LIME (Local Interpretable Model-agnostic Explanations), SHAP (SHapley Additive exPlanations), Saliency maps, attention mechanisms, direct trajectory visualization, and Permutation Feature Importance (PFI). Unlike conventional strategies that deploy these methods singularly, our unified approach capitalizes on the collective efficacy of these techniques, yielding deeper and more granular insights for models reliant on trajectory data. In crafting this synthesis, we effectively address the multifaceted essence of trajectories, achieving not only amplified interpretability but also a nuanced, contextually rich comprehension of model decisions. To validate and enhance our framework, we undertook a survey to gauge preferences and reception among various user demographics. Our findings underscored a dichotomy: professionals with academic orientations, particularly those in roles like Data Scientist, IT Expert, and ML Engineer, showcased a profound, technical understanding and often exhibited a predilection for amalgamated methods for interpretability. Conversely, end-users or individuals less acquainted with AI and Data Science showcased simpler inclinations, such as bar plots indicating timestep significance or visual depictions pinpointing pivotal segments of a vessel's trajectory.
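Of the combined techniques, Permutation Feature Importance is the simplest to show in code. Below is a generic hand-rolled version over tabular trajectory features; the feature names and the `model.predict`/`metric` interfaces are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def permutation_feature_importance(model, X, y, metric, n_repeats=10, seed=0):
    """Importance of feature j = drop in score when column j is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = metric(y, model.predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            rng.shuffle(X_perm[:, j])  # destroy feature j's relationship to y
            drops.append(baseline - metric(y, model.predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances  # e.g., index into ["speed", "heading_change", "stop_time"]
```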

VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models

  • paper_url: http://arxiv.org/abs/2312.00845
  • repo_url: https://github.com/HyeonHo99/Video-Motion-Customization
  • paper_authors: Hyeonho Jeong, Geon Yeong Park, Jong Chul Ye
  • for: This paper aims to make video generation customizable, so that videos can reproduce a user-specified motion.
  • methods: The method builds on text-to-video diffusion models, adapting their temporal attention layers in a one-shot tuning step to capture the target motion.
  • results: The approach accurately reproduces the motion of a target video while generating diverse visual variations.
    Abstract Text-to-video diffusion models have advanced video generation significantly. However, customizing these models to generate videos with tailored motions presents a substantial challenge. Specifically, they encounter hurdles in (a) accurately reproducing motion from a target video, and (b) creating diverse visual variations. For example, straightforward extensions of static image customization methods to video often lead to intricate entanglements of appearance and motion data. To tackle this, we present the Video Motion Customization (VMC) framework, a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models. Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference. The diffusion process then preserves low-frequency motion trajectories while mitigating high-frequency motion-unrelated noise in image space. We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts. Our codes, data and the project demo can be found at https://video-motion-customization.github.io
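The core of the motion distillation objective fits in a few lines: compare frame-to-frame residuals rather than raw frames, so shared appearance cancels out and only motion is matched. This is an illustrative pixel-space PyTorch sketch; the paper formulates the objective within the diffusion process while adapting only temporal attention layers.

```python
import torch
import torch.nn.functional as F

def motion_residual_loss(pred, target):
    """
    pred, target: video tensors of shape (B, T, C, H, W).
    Residual vectors between consecutive frames serve as the motion reference.
    """
    pred_motion = pred[:, 1:] - pred[:, :-1]
    target_motion = target[:, 1:] - target[:, :-1]
    return F.mse_loss(pred_motion, target_motion)

# loss = motion_residual_loss(generated_frames, reference_frames)
```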

SynFundus: Generating a synthetic fundus images dataset with millions of samples and multi-disease annotations

  • paper_url: http://arxiv.org/abs/2312.00377
  • repo_url: None
  • paper_authors: Fangxin Shang, Jie Fu, Yehui Yang, Lei Ma
  • for: Addressing the scarcity of large-scale medical imaging datasets due to privacy restrictions, the paper introduces SynFundus-1M, a high-quality synthetic dataset with over 1 million retinal fundus images and extensive disease and pathology annotations.
  • methods: The dataset is generated by a Denoising Diffusion Probabilistic Model, and the paper compares the SynFundus-Generator and SynFundus-1M with existing methods on mainstream public datasets, achieving superior Frechet Inception Distance (FID) scores.
  • results: The ophthalmologists’ evaluation confirms the authenticity of the synthetic images, and the paper demonstrates that both CNN and ViT can benefit from SynFundus-1M by pretraining or training directly, achieving better performance and faster convergence on various downstream tasks compared to datasets like ImageNet or EyePACS.
    Abstract In the field of medical imaging, the scarcity of large-scale datasets due to privacy restrictions stands as a significant barrier to developing large medical models. To address this issue, we introduce SynFundus-1M, a high-quality synthetic dataset with over 1 million retinal fundus images and extensive disease and pathology annotations, generated by a Denoising Diffusion Probabilistic Model. The SynFundus-Generator and SynFundus-1M achieve superior Frechet Inception Distance (FID) scores compared to existing methods on mainstream public real datasets. Furthermore, evaluation by ophthalmologists confirms the difficulty of discerning these synthetic images from real ones, validating SynFundus-1M's authenticity. Through extensive experiments, we demonstrate that both CNNs and ViTs can benefit from SynFundus-1M by pretraining or training directly. Compared to datasets like ImageNet or EyePACS, models trained on SynFundus-1M not only achieve better performance but also converge faster on various downstream tasks.
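Since FID is the headline metric, here is a self-contained sketch of its computation from pre-extracted Inception features; the feature extraction step (running an Inception network over the fundus images) is assumed and not shown.

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(feats_real, feats_fake):
    """
    FID between two feature sets of shape (N, D):
    ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2 (C_r C_f)^{1/2})
    """
    mu_r, mu_f = feats_real.mean(0), feats_fake.mean(0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):  # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_f
    return diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean)
```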

Sparse Beats Dense: Rethinking Supervision in Radar-Camera Depth Completion

  • paper_url: http://arxiv.org/abs/2312.00844
  • repo_url: None
  • paper_authors: Huadong Li, Minhao Jing, Jiajun Liang, Haoqiang Fan, Renhe Ji
  • for: This paper examines why dense supervision outperforms sparse supervision in radar-camera depth completion, identifies the Projection Transformation Collapse (PTC) problem as the cause, and proposes a "Disruption-Compensation" framework to handle it.
  • methods: The disruption part deliberately discards position correspondences among image/radar/LiDAR spaces, while the compensation part leverages 3D spatial and 2D semantic information to make up for the discarded beneficial correspondences.
  • results: Experiments show the framework (sparse supervision) outperforms the state of the art (dense supervision) with an 11.6% improvement in mean absolute error and a 1.6x speedup. The code is available at ...
    Abstract It is widely believed that the dense supervision is better than the sparse supervision in the field of depth completion, but the underlying reasons for this are rarely discussed. In this paper, we find that the challenge of using sparse supervision for training Radar-Camera depth prediction models is the Projection Transformation Collapse (PTC). The PTC implies that sparse supervision leads the model to learn unexpected collapsed projection transformations between Image/Radar/LiDAR spaces. Building on this insight, we propose a novel ``Disruption-Compensation" framework to handle the PTC, thereby relighting the use of sparse supervision in depth completion tasks. The disruption part deliberately discards position correspondences among Image/Radar/LiDAR, while the compensation part leverages 3D spatial and 2D semantic information to compensate for the discarded beneficial position correspondence. Extensive experimental results demonstrate that our framework (sparse supervision) outperforms the state-of-the-art (dense supervision) with 11.6$\%$ improvement in mean absolute error and $1.6 \times$ speedup. The code is available at ...

On Exploring the Reasoning Capability of Large Language Models with Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2312.00353
  • repo_url: None
  • paper_authors: Pei-Chi Lo, Yi-Hang Tsai, Ee-Peng Lim, San-Yih Hwang
  • for: This study examines whether LLMs can reason with the knowledge graphs they internalized during pre-training.
  • methods: LLMs are asked to perform four distinct knowledge graph reasoning tasks, probing both their accuracy in recalling pre-training knowledge and their ability to infer knowledge graph relations from context; two failure modes, content hallucination and ontology hallucination, are identified.
  • results: Experiments show LLMs can solve both simple and complex knowledge graph reasoning tasks from their own memory, as well as infer relations from input context.
    Abstract This paper examines the capacity of LLMs to reason with knowledge graphs using their internal knowledge graph, i.e., the knowledge graph they learned during pre-training. Two research questions are formulated to investigate the accuracy of LLMs in recalling information from pre-training knowledge graphs and their ability to infer knowledge graph relations from context. To address these questions, we employ LLMs to perform four distinct knowledge graph reasoning tasks. Furthermore, we identify two types of hallucinations that may occur during knowledge reasoning with LLMs: content and ontology hallucination. Our experimental results demonstrate that LLMs can successfully tackle both simple and complex knowledge graph reasoning tasks from their own memory, as well as infer from input context.
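As a flavor of the probing setup, the sketch below frames one recall task as a constrained completion prompt. The `complete` function is a hypothetical stand-in for any LLM API, and the candidate check is a crude flag for ontology hallucination (an answer outside the expected candidate set); the paper's actual tasks and prompts differ.

```python
def triple_recall_probe(complete, head, relation, candidates):
    """Ask a model to recall the tail entity of a knowledge graph triple."""
    prompt = (
        "Complete the knowledge graph triple.\n"
        f"Head entity: {head}\n"
        f"Relation: {relation}\n"
        f"Answer with exactly one of: {', '.join(candidates)}.\n"
        "Tail entity:"
    )
    answer = complete(prompt).strip()
    in_candidates = answer in candidates  # False may signal ontology hallucination
    return answer, in_candidates

# e.g., triple_recall_probe(llm, "Paris", "capital of", ["France", "Italy", "Spain"])
```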

The Case for Scalable, Data-Driven Theory: A Paradigm for Scientific Progress in NLP

  • paper_url: http://arxiv.org/abs/2312.00349
  • repo_url: None
  • paper_authors: Julian Michael
  • for: This paper argues for driving scientific progress in NLP through scalable, data-driven theories of linguistic structure.
  • methods: Data are collected in tightly scoped, carefully defined ways that allow exhaustive annotation of behavioral phenomena, and machine learning is used to construct explanatory theories of those phenomena.
  • results: The approach is illustrated with Question-Answer driven Semantic Role Labeling (QA-SRL), a schema that annotates verbal predicate-argument relations with highly constrained question-answer pairs, and the paper distills principles for future data collection and theoretical modeling.
    Abstract I propose a paradigm for scientific progress in NLP centered around developing scalable, data-driven theories of linguistic structure. The idea is to collect data in tightly scoped, carefully defined ways which allow for exhaustive annotation of behavioral phenomena of interest, and then use machine learning to construct explanatory theories of these phenomena which can form building blocks for intelligible AI systems. After laying some conceptual groundwork, I describe several investigations into data-driven theories of shallow semantic structure using Question-Answer driven Semantic Role Labeling (QA-SRL), a schema for annotating verbal predicate-argument relations using highly constrained question-answer pairs. While this only scratches the surface of the complex language behaviors of interest in AI, I outline principles for data collection and theoretical modeling which can inform future scientific progress. This note summarizes and draws heavily on my PhD thesis.
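To make the schema concrete, here is an illustrative (not official) rendering of a QA-SRL annotation as a small Python data structure: each verbal predicate receives templated questions whose answers are spans from the sentence. The example sentence and field names are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class QASRLEntry:
    predicate: str  # the verb being annotated
    question: str   # highly constrained, templated question about one argument
    answers: list[str] = field(default_factory=list)  # answer spans

# "The company sold its headquarters last year."
annotation = [
    QASRLEntry("sold", "Who sold something?", ["The company"]),
    QASRLEntry("sold", "What did someone sell?", ["its headquarters"]),
    QASRLEntry("sold", "When did someone sell something?", ["last year"]),
]
```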

Efficient Off-Policy Safe Reinforcement Learning Using Trust Region Conditional Value at Risk

  • paper_url: http://arxiv.org/abs/2312.00342
  • repo_url: https://github.com/rllab-snu/Off-Policy-TRC
  • paper_authors: Dohyeong Kim, Songhwai Oh
  • for: To solve a safe reinforcement learning (RL) problem with risk measure-based (CVaR) constraints.
  • methods: Building on TRC, an on-policy trust-region method for CVaR-constrained RL, the paper proposes an off-policy variant with novel surrogate functions that reduce the effect of distributional shift, plus an adaptive trust-region constraint that keeps the policy close to the replay buffer data.
  • results: In simulated and real-world environments, the method satisfies safety constraints within a few steps while achieving high returns, even on complex robotic tasks.
    Abstract This paper aims to solve a safe reinforcement learning (RL) problem with risk measure-based constraints. As risk measures, such as conditional value at risk (CVaR), focus on the tail distribution of cost signals, constraining risk measures can effectively prevent a failure in the worst case. An on-policy safe RL method, called TRC, deals with a CVaR-constrained RL problem using a trust region method and can generate policies with almost zero constraint violations with high returns. However, to achieve outstanding performance in complex environments and satisfy safety constraints quickly, RL methods are required to be sample efficient. To this end, we propose an off-policy safe RL method with CVaR constraints, called off-policy TRC. If off-policy data from replay buffers is directly used to train TRC, the estimation error caused by the distributional shift results in performance degradation. To resolve this issue, we propose novel surrogate functions, in which the effect of the distributional shift can be reduced, and introduce an adaptive trust-region constraint to ensure a policy not to deviate far from replay buffers. The proposed method has been evaluated in simulation and real-world environments and satisfied safety constraints within a few steps while achieving high returns even in complex robotic tasks.
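The risk measure at the heart of the constraint is easy to state in code: CVaR at level alpha is the mean of the worst alpha-fraction of costs. A numpy sketch follows (conventions for alpha vary across papers, so take the parameterization as illustrative):

```python
import numpy as np

def cvar(costs, alpha=0.05):
    """Mean of the worst alpha-fraction of costs; penalizes tail-case failures."""
    costs = np.asarray(costs)
    threshold = np.quantile(costs, 1.0 - alpha)  # value at risk (VaR)
    return costs[costs >= threshold].mean()

episode_costs = np.random.default_rng(0).exponential(size=100_000)
print(episode_costs.mean(), cvar(episode_costs))  # CVaR far exceeds the mean
```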

Green Edge AI: A Contemporary Survey

  • paper_url: http://arxiv.org/abs/2312.00333
  • repo_url: None
  • paper_authors: Yuyi Mao, Xianghao Yu, Kaibin Huang, Ying-Jun Angela Zhang, Jun Zhang
  • for: This survey examines green edge AI, aiming to improve the energy efficiency and sustainability of edge AI systems.
  • methods: The paper analyzes the principal energy consumption components of edge AI systems to identify the fundamental design principles of green edge AI.
  • results: Guided by those principles, the survey reviews energy-efficient design methodologies for the three critical tasks in edge AI systems (training data acquisition, edge training, and edge inference) and highlights future research directions for further improving energy efficiency.
    Abstract Artificial intelligence (AI) technologies have emerged as pivotal enablers across a multitude of industries, including consumer electronics, healthcare, and manufacturing, largely due to their resurgence over the past decade. The transformative power of AI is primarily derived from the utilization of deep neural networks (DNNs), which require extensive data for training and substantial computational resources for processing. Consequently, DNN models are typically trained and deployed on resource-rich cloud servers. However, due to potential latency issues associated with cloud communications, deep learning (DL) workflows are increasingly being transitioned to wireless edge networks near end-user devices (EUDs). This shift is designed to support latency-sensitive applications and has given rise to a new paradigm of edge AI, which will play a critical role in upcoming 6G networks to support ubiquitous AI applications. Despite its potential, edge AI faces substantial challenges, mostly due to the dichotomy between the resource limitations of wireless edge networks and the resource-intensive nature of DL. Specifically, the acquisition of large-scale data, as well as the training and inference processes of DNNs, can rapidly deplete the battery energy of EUDs. This necessitates an energy-conscious approach to edge AI to ensure both optimal and sustainable performance. In this paper, we present a contemporary survey on green edge AI. We commence by analyzing the principal energy consumption components of edge AI systems to identify the fundamental design principles of green edge AI. Guided by these principles, we then explore energy-efficient design methodologies for the three critical tasks in edge AI systems, including training data acquisition, edge training, and edge inference. Finally, we underscore potential future research directions to further enhance the energy efficiency of edge AI.

Exploring the Robustness of Decentralized Training for Large Language Models

  • paper_url: http://arxiv.org/abs/2312.00843
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Lin Lu, Chenxi Dai, Wangcheng Tao, Binhang Yuan, Yanan Sun, Pan Zhou
  • for: This vision paper initiates discussion of the security threats in decentralized training of large language models and of what a viable solution requires.
  • methods: Robustness is examined from three perspectives: the hardware, data, and model vulnerabilities inherent in decentralized training frameworks; the fundamental difference from vanilla federated learning, whose security techniques cannot be applied directly; and the essential components of a robust, efficient decentralized training framework, illustrated with a concrete threat model.
  • results: The analysis shows that decentralized training of large language models faces substantial security issues, spanning hardware vulnerabilities, data leakage, and model attacks, and that these concerns must be addressed to build a robust and efficient decentralized training framework.
    Abstract Decentralized training of large language models has emerged as an effective way to democratize this technology. However, the potential threats associated with this approach have not been carefully discussed, which would hinder the development of decentralized training infrastructures. This paper aims to initiate discussion towards this end by exploring the robustness of decentralized training from three main perspectives. First, we demonstrate the vulnerabilities inherent in decentralized training frameworks in terms of hardware, data, and models. Second, we highlight the fundamental difference between decentralized foundation model training and vanilla federated learning, where the security techniques employed in federated learning cannot be applied directly. Third, we discuss the essential components required for a robust and efficient decentralized training framework and present a case study by modeling a concrete threat model. Our objective in this vision paper is to emphasize the importance of addressing security concerns in the context of decentralized training for large language models.

Matching Weak Informative Ontologies

  • paper_url: http://arxiv.org/abs/2312.00332
  • repo_url: https://github.com/npubird/lilywio
  • paper_authors: Peng Wang
  • for: This paper addresses the challenge of matching weakly informative ontologies (WIOs) using the ontology structure information to discover alignments.
  • methods: The proposed method employs a semantic subgraph-based similarity propagation model to match WIOs, with constraints to ensure a balance between efficiency and quality.
  • results: The proposed method significantly outperforms most state-of-the-art works in both WIO matching tasks and general ontology matching tasks, with a large increase in recall and high precision of matching results.
    Abstract Most existing ontology matching methods utilize literal information to discover alignments. However, some literal information in ontologies may be opaque, and some ontologies may not have sufficient literal information. In this paper, these ontologies are termed weak informative ontologies (WIOs), and it is challenging for existing methods to match them. On one hand, string-based and linguistic-based matching methods cannot work well for WIOs. On the other hand, some matching methods use external resources to improve their performance, but collecting and processing external resources is still time-consuming. To address this issue, this paper proposes a practical method for matching WIOs by employing the ontology structure information to discover alignments. First, semantic subgraphs are extracted from the ontology graph to capture the precise meanings of ontology elements. Then, a new similarity propagation model is designed for matching WIOs. Meanwhile, to avoid meaningless propagation, the similarity propagation is constrained by semantic subgraphs and other conditions. Consequently, the similarity propagation model ensures a balance between efficiency and quality during matching. Finally, the similarity propagation model uses a few credible alignments as seeds to find more alignments, and some useful strategies are adopted to improve the performance. This matching method for WIOs has been implemented in the ontology matching system Lily. Experimental results on public OAEI benchmark datasets demonstrate that Lily significantly outperforms most of the state-of-the-art works in both WIO matching tasks and general ontology matching tasks. In particular, Lily increases recall by a large margin while still obtaining high precision in its matching results.
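The propagation idea can be sketched generically: a pair of elements grows more similar when its neighbor pairs are similar, starting from a few credible seed alignments. The paper additionally constrains propagation with semantic subgraphs; the plain numpy version below omits that constraint, so treat it as a similarity-flooding-style sketch rather than Lily's model.

```python
import numpy as np

def propagate_similarity(seed_sim, nbrs_a, nbrs_b, alpha=0.5, iters=10):
    """
    seed_sim: (n, m) matrix of seed similarities between ontologies A and B.
    nbrs_a, nbrs_b: adjacency lists (lists of neighbor-index lists) of A and B.
    """
    sim = seed_sim.copy()
    for _ in range(iters):
        update = np.zeros_like(sim)
        for i, ni in enumerate(nbrs_a):
            for j, nj in enumerate(nbrs_b):
                if ni and nj:  # average similarity over all neighbor pairs
                    update[i, j] = np.mean([sim[p, q] for p in ni for q in nj])
        sim = (1 - alpha) * seed_sim + alpha * update  # keep seeds anchored
        sim /= max(sim.max(), 1e-12)                   # renormalize to [0, 1]
    return sim
```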

StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter

  • paper_url: http://arxiv.org/abs/2312.00330
  • repo_url: https://github.com/GongyeLiu/StyleCrafter
  • paper_authors: Gongye Liu, Menghan Xia, Yong Zhang, Haoxin Chen, Jinbo Xing, Xintao Wang, Yujiu Yang, Ying Shan
  • for: This paper aims to improve the style fidelity and controllability of text-to-video (T2V) generation so that videos match both the text content and the style of a reference image.
  • methods: A style control adapter is attached to a pre-trained T2V model. It is first trained on style-rich image datasets, with a decoupling learning strategy that removes style descriptions from the text prompt and extracts style solely from the reference image, then transferred to video generation through a tailor-made finetuning paradigm; a scale-adaptive fusion module balances text-based content features against image-based style features.
  • results: StyleCrafter efficiently generates high-quality stylized videos whose content aligns with the text and whose style resembles the reference image, and it is more flexible and efficient than existing competitors.
    Abstract Text-to-video (T2V) models have shown remarkable capabilities in generating diverse videos. However, they struggle to produce user-desired stylized videos due to (i) text's inherent clumsiness in expressing specific styles and (ii) the generally degraded style fidelity. To address these challenges, we introduce StyleCrafter, a generic method that enhances pre-trained T2V models with a style control adapter, enabling video generation in any style by providing a reference image. Considering the scarcity of stylized video datasets, we propose to first train a style control adapter using style-rich image datasets, then transfer the learned stylization ability to video generation through a tailor-made finetuning paradigm. To promote content-style disentanglement, we remove style descriptions from the text prompt and extract style information solely from the reference image using a decoupling learning strategy. Additionally, we design a scale-adaptive fusion module to balance the influences of text-based content features and image-based style features, which helps generalization across various text and style combinations. StyleCrafter efficiently generates high-quality stylized videos that align with the content of the texts and resemble the style of the reference images. Experiments demonstrate that our approach is more flexible and efficient than existing competitors.
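A sketch of what a scale-adaptive fusion module can look like in PyTorch: a learned gate balances text-derived content features against image-derived style features. The layer shapes and gating form are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ScaleAdaptiveFusion(nn.Module):
    """Fuse content features with a gated, projected style signal."""
    def __init__(self, dim):
        super().__init__()
        self.style_proj = nn.Linear(dim, dim)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, content, style):
        s = self.style_proj(style)
        g = self.gate(torch.cat([content, s], dim=-1))  # per-channel scale in (0, 1)
        return content + g * s
```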

Agent-OM: Leveraging Large Language Models for Ontology Matching

  • paper_url: http://arxiv.org/abs/2312.00326
  • repo_url: None
  • paper_authors: Zhangcheng Qiang, Weiqing Wang, Kerry Taylor
  • for: This study explores the potential of large language models (LLMs) for ontology matching (OM) and proposes an agent-powered, LLM-based design paradigm for OM systems.
  • methods: The Agent-OM framework consists of two Siamese agents for retrieval and matching, equipped with a set of simple prompt-based OM tools.
  • results: Evaluations on Ontology Alignment Evaluation Initiative (OAEI) tracks show the system comes very close to the best long-standing performance on simple OM tasks and significantly improves performance on complex and few-shot OM tasks.
    Abstract Ontology matching (OM) enables semantic interoperability between different ontologies and resolves their conceptual heterogeneity by aligning related entities. OM systems currently have two prevailing design paradigms: conventional knowledge-based expert systems and newer machine learning-based predictive systems. While large language models (LLMs) and LLM-based agents have become revolutionary in data engineering and have been applied creatively in various domains, their potential for OM remains underexplored. This study introduces a novel agent-powered LLM-based design paradigm for OM systems. With thoughtful consideration of several specific challenges to leverage LLMs for OM, we propose a generic framework, namely Agent-OM, consisting of two Siamese agents for retrieval and matching, with a set of simple prompt-based OM tools. Our framework is implemented in a proof-of-concept system. Evaluations of three Ontology Alignment Evaluation Initiative (OAEI) tracks over state-of-the-art OM systems show that our system can achieve very close results to the best long-standing performance on simple OM tasks and significantly improve the performance on complex and few-shot OM tasks.

Conceptual Engineering Using Large Language Models

  • paper_url: http://arxiv.org/abs/2312.03749
  • repo_url: https://github.com/bradleypallen/zero-shot-classifiers-for-conceptual-engineering
  • paper_authors: Bradley P. Allen
  • for: This paper implements classification procedures, in Jennifer Nado's sense of targets of conceptual engineering, as a method for conceptual engineering.
  • methods: The classification procedures are realized with a large language model used as a zero-shot classifier over concept definitions.
  • results: Evaluated with data from the Wikidata knowledge graph, the method assesses concept definitions from two paradigmatic conceptual engineering projects: the International Astronomical Union's redefinition of PLANET and Haslanger's ameliorative analysis of WOMAN.
    Abstract We describe a method, based on Jennifer Nado's definition of classification procedures as targets of conceptual engineering, that implements such procedures using a large language model. We then apply this method using data from the Wikidata knowledge graph to evaluate concept definitions from two paradigmatic conceptual engineering projects: the International Astronomical Union's redefinition of PLANET and Haslanger's ameliorative analysis of WOMAN. We discuss implications of this work for the theory and practice of conceptual engineering. The code and data can be found on GitHub.
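The classification procedure reduces, in practice, to a zero-shot prompt over a candidate entity and a proposed definition. In the sketch below, `complete` is a hypothetical LLM call and the prompt wording is illustrative rather than the repository's:

```python
def classify(complete, concept, definition, entity_description):
    """Zero-shot test: does an entity fall under a proposed concept definition?"""
    prompt = (
        f"Definition of {concept}: {definition}\n"
        f"Entity: {entity_description}\n"
        "Does the entity satisfy the definition? Answer 'yes' or 'no', then explain."
    )
    reply = complete(prompt)
    return reply.strip().lower().startswith("yes"), reply

# e.g., test the IAU PLANET definition against Wikidata descriptions of Pluto,
# and compare the verdicts with the entities' actual Wikidata classifications.
```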

PipeOptim: Ensuring Effective 1F1B Schedule with Optimizer-Dependent Weight Prediction

  • paper_url: http://arxiv.org/abs/2312.00839
  • repo_url: https://github.com/guanleics/pipeoptim
  • paper_authors: Lei Guan, Dongsheng Li, Jiye Liang, Wenjian Wang, Xicheng Lu
  • for: This work targets the weight inconsistency and weight staleness problems of asynchronous "1F1B" pipeline training.
  • methods: An optimizer-dependent weight prediction strategy (PipeOptim) has each mini-batch predict, ahead of its forward pass, the weights it should use, so every forward pass runs on consistent, staleness-free weights.
  • results: Experiments show PipeOptim outperforms comparable pipelined approaches (GPipe, PipeDream, PipeDream-2BW, and SpecTrain) while ensuring effective parameter learning regardless of the optimizer used.
    Abstract Asynchronous pipeline model parallelism with a "1F1B" (one forward, one backward) schedule generates little bubble overhead and always provides quite a high throughput. However, the "1F1B" schedule inevitably leads to weight inconsistency and weight staleness issues due to the cross-training of different mini-batches across GPUs. To simultaneously address these two problems, in this paper, we propose an optimizer-dependent weight prediction strategy (a.k.a PipeOptim) for asynchronous pipeline training. The key insight of our proposal is that we employ a weight prediction strategy in the forward pass to ensure that each mini-batch uses consistent and staleness-free weights to compute the forward pass. To be concrete, we first construct the weight prediction scheme based on the update rule of the used optimizer when training the deep neural network models. Then throughout the "1F1B" pipelined training, each mini-batch is mandated to execute weight prediction ahead of the forward pass, subsequently employing the predicted weights to perform the forward pass. As a result, PipeOptim 1) inherits the advantage of the "1F1B" schedule and generates pretty high throughput, and 2) can ensure effective parameter learning regardless of the type of the used optimizer. To verify the effectiveness of our proposal, we conducted extensive experimental evaluations using eight different deep-learning models spanning three machine-learning tasks including image classification, sentiment analysis, and machine translation. The experiment results demonstrate that PipeOptim outperforms the popular pipelined approaches including GPipe, PipeDream, PipeDream-2BW, and SpecTrain. The code of PipeOptim can be accessible at https://github.com/guanleics/PipeOptim.
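As a concrete instance of optimizer-dependent weight prediction, the sketch below handles SGD with momentum, approximating each of the `s` pending updates by the current momentum buffer; the paper derives its predictor from the update rule of whichever optimizer is in use, so this is only one illustrative case.

```python
import torch

@torch.no_grad()
def predict_weights(params, momentum_bufs, lr, steps_ahead):
    """
    Predicted weights `steps_ahead` steps into the future under SGD+momentum,
    assuming each pending update roughly equals the current velocity v:
        w_hat = w - lr * steps_ahead * v
    Returns copies; the live parameters are untouched.
    """
    return [w - lr * steps_ahead * v for w, v in zip(params, momentum_bufs)]

# In a "1F1B" pipeline, a stage whose backward pass for this micro-batch lags
# its forward pass by s steps would run the forward with
# predict_weights(stage_params, stage_momentum, lr, s).
```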

Mark My Words: Analyzing and Evaluating Language Model Watermarks

  • paper_url: http://arxiv.org/abs/2312.00273
  • repo_url: https://github.com/wagner-group/markmywords
  • paper_authors: Julien Piet, Chawin Sitawarin, Vivian Fang, Norman Mu, David Wagner
  • for: This work benchmarks text watermarking techniques to help practitioners choose schemes suited to their needs.
  • methods: The MARKMYWORDS benchmark evaluates watermarks under different tasks and practical attacks, using three main metrics: quality, size (e.g., the number of tokens needed to detect a watermark), and tamper-resistance.
  • results: Current techniques are ready to deploy: the scheme of Kirchenbauer et al. [1] watermarks Llama2-7B-chat with no perceivable quality loss, is detectable within fewer than 100 tokens, and resists simple attacks. The authors argue that watermark indistinguishability, emphasized in some prior work, is too strong a requirement, since schemes that slightly modify logit distributions outperform their indistinguishable counterparts with no noticeable loss in generation quality.
    Abstract The capabilities of large language models have grown significantly in recent years and so too have concerns about their misuse. In this context, the ability to distinguish machine-generated text from human-authored content becomes important. Prior works have proposed numerous schemes to watermark text, which would benefit from a systematic evaluation framework. This work focuses on text watermarking techniques - as opposed to image watermarks - and proposes MARKMYWORDS, a comprehensive benchmark for them under different tasks as well as practical attacks. We focus on three main metrics: quality, size (e.g. the number of tokens needed to detect a watermark), and tamper-resistance. Current watermarking techniques are good enough to be deployed: Kirchenbauer et al. [1] can watermark Llama2-7B-chat with no perceivable loss in quality, the watermark can be detected with fewer than 100 tokens, and the scheme offers good tamper-resistance to simple attacks. We argue that watermark indistinguishability, a criteria emphasized in some prior works, is too strong a requirement: schemes that slightly modify logit distributions outperform their indistinguishable counterparts with no noticeable loss in generation quality. We publicly release our benchmark (https://github.com/wagner-group/MarkMyWords)
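For reference, the soft watermark of Kirchenbauer et al. [1] evaluated by the benchmark can be sketched in a few lines: seed a PRNG with the previous token, mark a gamma-fraction "green list" of the vocabulary, and bias those logits by delta before sampling. Detection re-derives the green lists and tests whether the green-token count in a text is improbably high. This is a minimal single-sequence sketch, not the benchmark's implementation.

```python
import torch

def watermarked_logits(logits, prev_token, gamma=0.5, delta=2.0):
    """Bias a pseudorandom 'green list' of tokens, keyed by the previous token."""
    vocab_size = logits.shape[-1]
    gen = torch.Generator().manual_seed(int(prev_token))
    green = torch.randperm(vocab_size, generator=gen)[: int(gamma * vocab_size)]
    out = logits.clone()
    out[green] += delta  # green tokens become slightly more likely
    return out

# At generation time: next_token = sample(watermarked_logits(logits, prev_token))
```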

Academic competitions

  • paper_url: http://arxiv.org/abs/2312.00268
  • repo_url: https://github.com/vicentinileonardo/DWT-SVD-digital-watermarking
  • paper_authors: Hugo Jair Escalante, Aleksandra Kruchinina
  • for: This chapter surveys the state and trends of academic challenges, particularly as applied in machine learning and related fields.
  • methods: A survey reviews the most influential academic challenges of recent years, analyzing their goals, achievements, and prospects per area of knowledge.
  • results: Academic challenges prove widely used and influential in machine learning and related fields, with goals and achievements that vary by area; the chapter also outlines expectations for the coming years.
    Abstract Academic challenges comprise effective means for (i) advancing the state of the art, (ii) putting specific topics and problems in the spotlight of a scientific community, and (iii) closing the gap for underrepresented communities in terms of accessing and participating in the shaping of research fields. Competitions can be traced back for centuries, and their achievements have had great influence on our modern world. Recently, they (re)gained popularity, driven by the overwhelming amounts of data being generated in different domains and by the need to push the limits of existing methods and tools for handling such data. This chapter provides a survey of academic challenges in the context of machine learning and related fields. We review the most influential competitions of the last few years and analyze challenges per area of knowledge. The aims of scientific challenges, their goals, major achievements, and expectations for the next few years are reviewed.

Sample Efficient Reinforcement Learning from Human Feedback via Active Exploration

  • paper_url: http://arxiv.org/abs/2312.00267
  • repo_url: None
  • paper_authors: Viraj Mehta, Vikramjeet Das, Ojash Neopane, Yijia Dai, Ilija Bogunovic, Jeff Schneider, Willie Neiswanger
  • for: This work aims to improve the sample efficiency of identifying a good policy in reinforcement learning from human feedback (RLHF), particularly for training large language models.
  • methods: Choosing the contexts at which to obtain human feedback is formalized as an offline contextual dueling bandit problem, solved with an upper-confidence-bound style algorithm that carries a polynomial worst-case regret bound.
  • results: Experiments show the approach outperforms several baselines in a synthetic setting and reaches better performance with fewer samples of human preferences on three real-world datasets.
    Abstract Preference-based feedback is important for many applications in reinforcement learning where direct evaluation of a reward function is not feasible. A notable recent example arises in reinforcement learning from human feedback (RLHF) on large language models. For many applications of RLHF, the cost of acquiring the human feedback can be substantial. In this work, we take advantage of the fact that one can often choose contexts at which to obtain human feedback in order to most efficiently identify a good policy, and formalize this as an offline contextual dueling bandit problem. We give an upper-confidence-bound style algorithm for this problem and prove a polynomial worst-case regret bound. We then provide empirical confirmation in a synthetic setting that our approach outperforms existing methods. After, we extend the setting and methodology for practical use in RLHF training of large language models. Here, our method is able to reach better performance with fewer samples of human preferences than multiple baselines on three real-world datasets.
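An illustrative linear-bandit rendering of the active selection idea, not the paper's exact algorithm: with a linear reward model, query the context whose estimated best-versus-runner-up reward gap is most uncertain under the current design matrix. All shapes and names below are assumptions for illustration.

```python
import numpy as np

def select_context(phi, theta_hat, V):
    """
    phi: (num_contexts, num_actions, d) feature map
    theta_hat: (d,) current reward-model estimate
    V: (d, d) regularized design matrix of past preference queries
    Returns the index of the context to query next.
    """
    V_inv = np.linalg.inv(V)
    widths = []
    for ctx in phi:
        rewards = ctx @ theta_hat
        a, b = np.argsort(rewards)[-2:]              # top-two actions
        diff = ctx[a] - ctx[b]
        widths.append(np.sqrt(diff @ V_inv @ diff))  # uncertainty of the gap
    return int(np.argmax(widths))
```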

Skipper: Improving the Reach and Fidelity of Quantum Annealers by Skipping Long Chains

  • paper_url: http://arxiv.org/abs/2312.00264
  • repo_url: None
  • paper_authors: Ramin Ayanzadeh, Moinuddin Qureshi
  • for: To increase the capacity and fidelity of quantum annealers (QAs) so that they can solve larger problems.
  • methods: Skipper, a software technique that skips dominant chains and substitutes each skipped program qubit with two readout results; a greedy variant, Skipper-G, explores only the sub-problems most likely to hold the global optimum.
  • results: On a 5761-qubit QA, Skipper tackles up to 59% larger problems (eleven chains skipped) and improves fidelity by up to 44% (five chains cut).
    Abstract Quantum Annealers (QAs) operate as single-instruction machines, lacking a SWAP operation to overcome limited qubit connectivity. Consequently, multiple physical qubits are chained to form a program qubit with higher connectivity, resulting in a drastically diminished effective QA capacity by up to 33x. We observe that in QAs: (a) chain lengths exhibit a power-law distribution, a few dominant chains holding substantially more qubits than others; and (b) about 25% of physical qubits remain unused, getting isolated between these chains. We propose Skipper, a software technique that enhances the capacity and fidelity of QAs by skipping dominant chains and substituting their program qubit with two readout results. Using a 5761-qubit QA, we demonstrate that Skipper can tackle up to 59% (Avg. 28%) larger problems when eleven chains are skipped. Additionally, Skipper can improve QA fidelity by up to 44% (Avg. 33%) when cutting five chains (32 runs). Users can specify up to eleven chain cuts in Skipper, necessitating about 2,000 distinct quantum executable runs. To mitigate this, we introduce Skipper-G, a greedy scheme that skips sub-problems less likely to hold the global optimum, executing a maximum of 23 quantum executables with eleven chain trims. Skipper-G can boost QA fidelity by up to 41% (Avg. 29%) when cutting five chains (11 runs).
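A toy sketch of the chain-cutting idea: drop the program qubits of the k longest chains, clamp each to both readout values, solve the 2^k reduced sub-problems, and keep the best sample. Here `solve(problem, fixed)` is a hypothetical annealer call that returns an (energy, sample) pair with the `fixed` variables clamped; Skipper-G would explore only the most promising of these branches instead of all 2^k.

```python
from itertools import product

def solve_with_chain_cuts(solve, problem, cut_qubits):
    """Enumerate both spin values for each cut qubit; return the best solution."""
    best_energy, best_sample = float("inf"), None
    for values in product((-1, +1), repeat=len(cut_qubits)):
        fixed = dict(zip(cut_qubits, values))
        energy, sample = solve(problem, fixed)  # hypothetical annealer call
        if energy < best_energy:
            best_energy, best_sample = energy, {**sample, **fixed}
    return best_energy, best_sample
```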