cs.AI - 2023-09-11

The bionic neural network for external simulation of human locomotor system

  • paper_url: http://arxiv.org/abs/2309.05863
  • repo_url: None
  • paper_authors: Yue Shi, Shuhao Ma, Yihui Zhao
  • for: This paper aims to propose a physics-informed deep learning method to predict joint motion and muscle forces using musculoskeletal (MSK) modeling techniques.
  • methods: The proposed method embeds the MSK model into a neural network as an ordinary differential equation (ODE) loss function, allowing for the automatic estimation of subject-specific MSK physiological parameters during the training process.
  • results: The experimental validations on two datasets demonstrate that the proposed deep learning method can accurately identify subject-specific MSK physiological parameters and yield accurate predictions of joint motion and muscle forces.
    Abstract Muscle forces and joint kinematics estimated with musculoskeletal (MSK) modeling techniques offer useful metrics describing movement quality. Model-based computational MSK models can interpret the dynamic interaction between the neural drive to muscles, muscle dynamics, body and joint kinematics, and kinetics. Still, such a set of solutions suffers from high computational time and muscle recruitment problems, especially in complex modeling. In recent years, data-driven methods have emerged as a promising alternative due to the benefits of flexibility and adaptability. However, a large amount of labeled training data is not easy to be acquired. This paper proposes a physics-informed deep learning method based on MSK modeling to predict joint motion and muscle forces. The MSK model is embedded into the neural network as an ordinary differential equation (ODE) loss function with physiological parameters of muscle activation dynamics and muscle contraction dynamics to be identified. These parameters are automatically estimated during the training process which guides the prediction of muscle forces combined with the MSK forward dynamics model. Experimental validations on two groups of data, including one benchmark dataset and one self-collected dataset from six healthy subjects, are performed. The results demonstrate that the proposed deep learning method can effectively identify subject-specific MSK physiological parameters and the trained physics-informed forward-dynamics surrogate yields accurate motion and muscle forces predictions.
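A minimal PyTorch sketch of the general idea of an ODE-based physics loss follows. It is an illustration rather than the paper's MSK model: a network maps time and an EMG-like excitation u(t) to a muscle activation a(t), a first-order activation-dynamics ODE da/dt = (u - a)/tau is penalized as a residual term in the loss, and the time constant tau plays the role of a subject-specific physiological parameter identified during training. The architecture, data, and ODE are simplified assumptions.

```python
# Minimal sketch (not the paper's MSK model): a network predicts muscle
# activation a(t) from an excitation u(t), and a first-order activation
# dynamics ODE  da/dt = (u - a) / tau  is added as a loss term, with the
# "physiological" time constant tau learned jointly with the network.
import torch
import torch.nn as nn

class ActivationNet(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 1))
        # Subject-specific parameter identified during training (assumption).
        self.log_tau = nn.Parameter(torch.tensor(-2.0))

    def forward(self, t, u):
        return self.net(torch.cat([t, u], dim=-1))

def ode_residual_loss(model, t, u):
    t = t.clone().requires_grad_(True)
    a = model(t, u)
    da_dt = torch.autograd.grad(a.sum(), t, create_graph=True)[0]
    tau = model.log_tau.exp()
    residual = da_dt - (u - a) / tau
    return (residual ** 2).mean()

# Toy training loop on synthetic data.
model = ActivationNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
t = torch.linspace(0, 1, 200).unsqueeze(-1)
u = torch.sigmoid(10 * (t - 0.5))          # synthetic excitation
a_meas = 1 - torch.exp(-t / 0.05)          # synthetic "measured" activation
for step in range(200):
    opt.zero_grad()
    loss = ((model(t, u) - a_meas) ** 2).mean() + ode_residual_loss(model, t, u)
    loss.backward()
    opt.step()
```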

Uncovering mesa-optimization algorithms in Transformers

  • paper_url: http://arxiv.org/abs/2309.05858
  • repo_url: https://github.com/jimmieliu/transformer-mesa-layer
  • paper_authors: Johannes von Oswald, Eyvind Niklasson, Maximilian Schlegel, Seijin Kobayashi, Nicolas Zucchet, Nino Scherrer, Nolan Miller, Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu, João Sacramento
  • for: This study aims to explain where the strong performance of Transformer models comes from, in particular their dominance in deep learning.
  • methods: The authors reverse-engineer a series of autoregressive Transformers trained on simple sequence modeling tasks, uncovering an architectural bias towards "mesa-optimization", a learned optimization process that runs within the forward pass of the model.
  • results: The uncovered gradient-based mesa-optimization algorithms drive the models' predictions and can be immediately repurposed to solve supervised few-shot tasks, suggesting that mesa-optimization may underlie the in-context learning abilities of large language models. The authors also propose a new self-attention layer, the mesa-layer, that explicitly and efficiently solves optimization problems specified in context and improves performance in synthetic and preliminary language modeling experiments.
    Abstract Transformers have become the dominant model in deep learning, but the reason for their superior performance is poorly understood. Here, we hypothesize that the strong performance of Transformers stems from an architectural bias towards mesa-optimization, a learned process running within the forward pass of a model consisting of the following two steps: (i) the construction of an internal learning objective, and (ii) its corresponding solution found through optimization. To test this hypothesis, we reverse-engineer a series of autoregressive Transformers trained on simple sequence modeling tasks, uncovering underlying gradient-based mesa-optimization algorithms driving the generation of predictions. Moreover, we show that the learned forward-pass optimization algorithm can be immediately repurposed to solve supervised few-shot tasks, suggesting that mesa-optimization might underlie the in-context learning capabilities of large language models. Finally, we propose a novel self-attention layer, the mesa-layer, that explicitly and efficiently solves optimization problems specified in context. We find that this layer can lead to improved performance in synthetic and preliminary language modeling experiments, adding weight to our hypothesis that mesa-optimization is an important operation hidden within the weights of trained Transformers.
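The sketch below illustrates the flavor of a layer that explicitly solves an optimization problem specified in context; it is a simplified stand-in, not the paper's mesa-layer. At each position it solves a ridge-regression problem on the prefix of (key, value) pairs and applies the solution to the current query, so a least-squares problem is solved inside the forward pass.

```python
# Simplified illustration (not the paper's exact mesa-layer): at each position t,
# solve a ridge regression on the prefix of (key, value) pairs and apply the
# solution to the current query, i.e., an optimization problem is solved
# explicitly inside the forward pass.
import torch

def mesa_layer(q, k, v, lam=1e-2):
    """q, k, v: [T, d]. Returns [T, d_v] predictions W_t @ q_t with
    W_t = argmin_W sum_{s<=t} ||W k_s - v_s||^2 + lam ||W||^2."""
    T, d = k.shape
    out = []
    for t in range(T):
        K, V = k[: t + 1], v[: t + 1]                      # prefix of the context
        A = K.T @ K + lam * torch.eye(d)                   # [d, d]
        W = torch.linalg.solve(A, K.T @ V).T               # [d_v, d]
        out.append(W @ q[t])
    return torch.stack(out)

# Toy check: values are a fixed linear map of keys, so the layer should
# recover that map from context and predict v_t from q_t = k_t.
torch.manual_seed(0)
d = 8
W_true = torch.randn(d, d)
k = torch.randn(32, d)
v = k @ W_true.T
pred = mesa_layer(k, k, v)
print((pred[-1] - v[-1]).abs().max())   # small residual once enough context is seen
```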

Challenges in Annotating Datasets to Quantify Bias in Under-represented Society

  • paper_url: http://arxiv.org/abs/2309.08624
  • repo_url: None
  • paper_authors: Vithya Yogarajan, Gillian Dobbie, Timothy Pistotti, Joshua Bensemann, Kobe Knowles
  • for: This research aims to address the lack of annotated datasets for quantifying bias in under-represented societies, specifically focusing on the New Zealand (NZ) population.
  • methods: The research involves the manual annotation of benchmark datasets for binary gender classification and ethical/racial considerations, despite the challenges faced with the availability of only three annotators.
  • results: The research provides an overview of the challenges encountered and lessons learnt during the manual annotation process, and offers recommendations for future research on quantifying bias in under-represented societies.
    Abstract Recent advances in artificial intelligence, including the development of highly sophisticated large language models (LLM), have proven beneficial in many real-world applications. However, evidence of inherent bias encoded in these LLMs has raised concerns about equity. In response, there has been an increase in research dealing with bias, including studies focusing on quantifying bias and developing debiasing techniques. Benchmark bias datasets have also been developed for binary gender classification and ethical/racial considerations, focusing predominantly on American demographics. However, there is minimal research in understanding and quantifying bias related to under-represented societies. Motivated by the lack of annotated datasets for quantifying bias in under-represented societies, we endeavoured to create benchmark datasets for the New Zealand (NZ) population. We faced many challenges in this process, despite the availability of three annotators. This research outlines the manual annotation process, provides an overview of the challenges we encountered and lessons learnt, and presents recommendations for future research.

Large Language Models for Compiler Optimization

  • paper_url: http://arxiv.org/abs/2309.07062
  • repo_url: https://github.com/chrisneagu/FTC-Skystone-Dark-Angels-Romania-2020
  • paper_authors: Chris Cummins, Volker Seeker, Dejan Grubisic, Mostafa Elhoushi, Youwei Liang, Baptiste Roziere, Jonas Gehring, Fabian Gloeckle, Kim Hazelwood, Gabriel Synnaeve, Hugh Leather
  • for: This paper explores applying large language models to code optimization.
  • methods: The authors train a 7B-parameter transformer from scratch to optimize LLVM assembly for code size. The model takes unoptimized assembly as input and outputs a list of compiler options that best optimize the program. During training, the model is also asked to predict the instruction counts before and after optimization as well as the optimized code itself; these auxiliary learning tasks improve the model's optimization performance and depth of understanding.
  • results: Evaluated on a large suite of test programs, the approach achieves a 3.0% improvement in reducing instruction counts over the compiler and outperforms two state-of-the-art baselines that require thousands of compilations. The model also shows surprisingly strong code reasoning abilities, generating compilable code 91% of the time and perfectly emulating the compiler's output 70% of the time.
    Abstract We explore the novel application of Large Language Models to code optimization. We present a 7B-parameter transformer model trained from scratch to optimize LLVM assembly for code size. The model takes as input unoptimized assembly and outputs a list of compiler options to best optimize the program. Crucially, during training, we ask the model to predict the instruction counts before and after optimization, and the optimized code itself. These auxiliary learning tasks significantly improve the optimization performance of the model and improve the model's depth of understanding. We evaluate on a large suite of test programs. Our approach achieves a 3.0% improvement in reducing instruction counts over the compiler, outperforming two state-of-the-art baselines that require thousands of compilations. Furthermore, the model shows surprisingly strong code reasoning abilities, generating compilable code 91% of the time and perfectly emulating the output of the compiler 70% of the time.
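A small sketch of how a text-to-text training example with the auxiliary targets might be laid out follows; the field names, separators, and toy IR are assumptions for illustration, not the paper's actual format.

```python
# Minimal sketch of the training-example layout described above (format is assumed,
# not the paper's): the input is the unoptimized code, and the target contains the
# pass list plus the auxiliary predictions (instruction counts before/after and the
# optimized code itself).
def make_example(unoptimized_ir, pass_list, count_before, count_after, optimized_ir):
    source = f"[code]\n{unoptimized_ir}"
    target = (
        f"[passes] {' '.join(pass_list)}\n"
        f"[inst_count_before] {count_before}\n"
        f"[inst_count_after] {count_after}\n"
        f"[optimized]\n{optimized_ir}"
    )
    return {"input": source, "target": target}

example = make_example(
    unoptimized_ir="define i32 @f(i32 %x) {\n  %y = add i32 %x, 0\n  ret i32 %y\n}",
    pass_list=["-instcombine", "-simplifycfg"],
    count_before=2,
    count_after=1,
    optimized_ir="define i32 @f(i32 %x) {\n  ret i32 %x\n}",
)
print(example["target"])
```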

Effective Abnormal Activity Detection on Multivariate Time Series Healthcare Data

  • paper_url: http://arxiv.org/abs/2309.05845
  • repo_url: None
  • paper_authors: Mengjia Niu, Yuchen Zhao, Hamed Haddadi
  • for: This work studies abnormal activity detection on multivariate time series (MTS) data to enable accurate anomaly detection in smart healthcare scenarios.
  • methods: The authors propose a residual-based anomaly detection approach, Rs-AD, which couples effective representation learning with abnormal activity detection to capture both the temporal dependencies of the time series and the inter-relationships among variables.
  • results: On a real-world gait dataset, Rs-AD achieves an F1 score of 0.839, demonstrating the effectiveness of the approach.
    Abstract Multivariate time series (MTS) data collected from multiple sensors provide the potential for accurate abnormal activity detection in smart healthcare scenarios. However, anomalies exhibit diverse patterns and become unnoticeable in MTS data. Consequently, achieving accurate anomaly detection is challenging since we have to capture both temporal dependencies of time series and inter-relationships among variables. To address this problem, we propose a Residual-based Anomaly Detection approach, Rs-AD, for effective representation learning and abnormal activity detection. We evaluate our scheme on a real-world gait dataset and the experimental results demonstrate an F1 score of 0.839.
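A minimal sketch of residual-based anomaly scoring on multivariate time series follows; it illustrates the general idea (score by the residual between a model's prediction and the observation) rather than the Rs-AD architecture itself, and uses a synthetic two-channel signal in place of gait data.

```python
# Minimal sketch of residual-based anomaly scoring (an illustration of the idea,
# not the paper's Rs-AD model): a forecaster is trained on normal data, and the
# residual between its prediction and the observation is the anomaly score.
import numpy as np
import torch
import torch.nn as nn

class Forecaster(nn.Module):
    def __init__(self, n_vars, hidden=32):
        super().__init__()
        self.gru = nn.GRU(n_vars, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_vars)

    def forward(self, x):                 # x: [B, T, n_vars]
        h, _ = self.gru(x)
        return self.head(h[:, -1])        # predict the next step

# Synthetic "normal" gait-like data: two correlated sinusoidal channels.
t = np.linspace(0, 60, 3000)
data = np.stack([np.sin(t), 0.5 * np.sin(t + 0.3)], axis=1).astype(np.float32)
win = 20
X = torch.tensor(np.stack([data[i:i + win] for i in range(len(data) - win - 1)]))
y = torch.tensor(data[win:-1])

model = Forecaster(n_vars=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(50):
    opt.zero_grad()
    loss = ((model(X) - y) ** 2).mean()
    loss.backward()
    opt.step()

# Anomaly score = residual norm; it spikes where behaviour deviates from normal.
scores = (model(X) - y).norm(dim=1).detach()
print(scores.mean(), scores.max())
```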

PACE-LM: Prompting and Augmentation for Calibrated Confidence Estimation with GPT-4 in Cloud Incident Root Cause Analysis

  • paper_url: http://arxiv.org/abs/2309.05833
  • repo_url: None
  • paper_authors: Dylan Zhang, Xuchao Zhang, Chetan Bansal, Pedro Las-Casas, Rodrigo Fonseca, Saravan Rajmohan
  • for: This work aims to improve the reliability and calibration of root cause analysis (RCA) tools in cloud environments, so as to ensure service reliability and maintain customer trust.
  • methods: The authors propose a retrieval-augmented prompting approach that uses large language models (LLMs) to enhance confidence estimation for RCA tools. It operates in two phases: the model first assesses the strength of the evidence based on historical incident data, then reviews the root cause generated by the predictor; an optimization step combines these evaluations to determine the final confidence assignment.
  • results: Experiments show that the method enables the model to articulate its confidence effectively and provide better-calibrated scores. The authors address research questions on whether LLM-generated confidence scores are calibrated, how domain-specific retrieved examples affect confidence estimates, and how well the approach generalizes across RCA models, aiming to bridge the confidence estimation gap, aid on-call engineers in decision-making, and improve the efficiency of cloud incident management.
    Abstract In recent years, the transition to cloud-based platforms in the IT sector has emphasized the significance of cloud incident root cause analysis to ensure service reliability and maintain customer trust. Central to this process is the efficient determination of root causes, a task made challenging due to the complex nature of contemporary cloud infrastructures. Despite the proliferation of AI-driven tools for root cause identification, their applicability remains limited by the inconsistent quality of their outputs. This paper introduces a method for enhancing confidence estimation in root cause analysis tools by prompting retrieval-augmented large language models (LLMs). This approach operates in two phases. Initially, the model evaluates its confidence based on historical incident data, considering its assessment of the evidence strength. Subsequently, the model reviews the root cause generated by the predictor. An optimization step then combines these evaluations to determine the final confidence assignment. Experimental results illustrate that our method enables the model to articulate its confidence effectively, providing a more calibrated score. We address research questions evaluating the ability of our method to produce calibrated confidence scores using LLMs, the impact of domain-specific retrieved examples on confidence estimates, and its potential generalizability across various root cause analysis models. Through this, we aim to bridge the confidence estimation gap, aiding on-call engineers in decision-making and bolstering the efficiency of cloud incident management.
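The sketch below illustrates one way the final optimization step could combine the two LLM-derived assessments into a calibrated confidence; the logistic combiner and the synthetic scores are assumptions for illustration, not the paper's exact procedure.

```python
# Minimal sketch of the final combination step only (an interpretation, not the
# paper's exact method): two scores elicited from the LLM, evidence strength and a
# review score for the predicted root cause, are combined into a calibrated
# confidence via a logistic model fit on historical incidents with known outcomes.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
evidence_strength = rng.uniform(0, 1, n)      # phase 1: LLM evidence assessment
review_score = rng.uniform(0, 1, n)           # phase 2: LLM review of the root cause
# Historical label: was the predicted root cause actually correct? (synthetic here)
correct = (0.6 * evidence_strength + 0.4 * review_score
           + 0.2 * rng.standard_normal(n)) > 0.55

calibrator = LogisticRegression()
calibrator.fit(np.column_stack([evidence_strength, review_score]), correct)

# Calibrated confidence for a new incident.
new_scores = np.array([[0.8, 0.3]])
print(calibrator.predict_proba(new_scores)[0, 1])
```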

Studying Accuracy of Machine Learning Models Trained on Lab Lifting Data in Solving Real-World Problems Using Wearable Sensors for Workplace Safety

  • paper_url: http://arxiv.org/abs/2309.05831
  • repo_url: None
  • paper_authors: Joseph Bertrand, Nick Griffey, Ming-Lun Lu, Rashmi Jha
  • for: This work studies porting a lab-trained machine learning model for lifting identification to the real world.
  • methods: After observing that real-world performance is much lower than on training data, the authors explore the causes of the failure and propose four potential solutions to increase model performance.
  • results: Experiments and analysis indicate that the proposed solutions can improve model performance and yield better results in real-world settings.
    Abstract Porting ML models trained on lab data to real-world situations has long been a challenge. This paper discusses porting a lab-trained lifting identification model to the real-world. With performance much lower than on training data, we explored causes of the failure and proposed four potential solutions to increase model performance

Exploring Geometric Deep Learning For Precipitation Nowcasting

  • paper_url: http://arxiv.org/abs/2309.05828
  • repo_url: None
  • paper_authors: Shan Zhao, Sudipan Saha, Zhitong Xiong, Niklas Boers, Xiao Xiang Zhu
  • for: Accurate precipitation nowcasting (up to a few hours ahead) remains a challenge because the highly complex local interactions need to be captured accurately.
  • methods: The authors apply geometric deep learning to generalize neural network models to non-Euclidean domains and better model the relationships among geographical grids. The adjacency matrix that simulates interactions among grid cells is learned automatically by minimizing an L1 loss; GCN layers refine the spatial relationships, 1D convolutions with various kernel lengths extract temporal information, and neighboring information is fed as auxiliary input.
  • results: On sequences of radar reflectivity maps over the Trento (Italy) area, the GCN better models the local details of the cloud profile and improves prediction accuracy.
    Abstract Precipitation nowcasting (up to a few hours) remains a challenge due to the highly complex local interactions that need to be captured accurately. Convolutional Neural Networks rely on convolutional kernels convolving with grid data and the extracted features are trapped by limited receptive field, typically expressed in excessively smooth output compared to ground truth. Thus they lack the capacity to model complex spatial relationships among the grids. Geometric deep learning aims to generalize neural network models to non-Euclidean domains. Such models are more flexible in defining nodes and edges and can effectively capture dynamic spatial relationship among geographical grids. Motivated by this, we explore a geometric deep learning-based temporal Graph Convolutional Network (GCN) for precipitation nowcasting. The adjacency matrix that simulates the interactions among grid cells is learned automatically by minimizing the L1 loss between prediction and ground truth pixel value during the training procedure. Then, the spatial relationship is refined by GCN layers while the temporal information is extracted by 1D convolution with various kernel lengths. The neighboring information is fed as auxiliary input layers to improve the final result. We test the model on sequences of radar reflectivity maps over the Trento/Italy area. The results show that GCNs improves the effectiveness of modeling the local details of the cloud profile as well as the prediction accuracy by achieving decreased error measures.
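A minimal PyTorch sketch of the architecture described above follows; the layer sizes, the softmax-normalized learnable adjacency, and the toy data are illustrative choices, not the paper's exact design.

```python
# Minimal sketch (illustrative, not the paper's model): a learnable adjacency over
# grid cells, graph convolutions for spatial structure, a 1D convolution over time,
# and an L1 training loss against the target reflectivity field.
import torch
import torch.nn as nn

class TemporalGCN(nn.Module):
    def __init__(self, n_nodes, hidden=16):
        super().__init__()
        self.adj_logits = nn.Parameter(torch.randn(n_nodes, n_nodes) * 0.01)
        self.node_proj = nn.Linear(1, hidden)
        self.gcn = nn.Linear(hidden, hidden)
        self.tconv = nn.Conv1d(hidden, hidden, kernel_size=3, padding=1)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                                   # x: [B, T, n_nodes]
        B, T, N = x.shape
        A = torch.softmax(self.adj_logits, dim=-1)          # learned grid interactions
        h = self.node_proj(x.unsqueeze(-1))                  # [B, T, N, H]
        h = torch.relu(A @ self.gcn(h))                      # spatial mixing per step
        h = h.permute(0, 2, 3, 1).reshape(B * N, -1, T)      # [B*N, H, T]
        h = torch.relu(self.tconv(h))                        # temporal mixing per node
        h = h[:, :, -1].reshape(B, N, -1)                    # last-step features
        return self.head(h).squeeze(-1)                      # next value per node

model = TemporalGCN(n_nodes=64)
x = torch.randn(4, 8, 64)                # 4 sequences, 8 past frames, 8x8 grid flattened
y = torch.randn(4, 64)                   # next radar reflectivity frame
loss = torch.abs(model(x) - y).mean()    # L1 loss, as in the description
loss.backward()
```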

PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models

  • paper_url: http://arxiv.org/abs/2309.05793
  • repo_url: None
  • paper_authors: Li Chen, Mengyi Zhao, Yiheng Liu, Mingxu Ding, Yangyang Song, Shizun Wang, Xu Wang, Hao Yang, Jing Liu, Kang Du, Min Zheng
  • for: This paper proposes a new text-to-image generation method that improves the efficiency and quality of personalized image generation.
  • methods: The method adopts a dual-branch conditioning mechanism in both the text and image domains for better control over the image generation process, and introduces a novel facial identity loss component to enhance identity preservation during training.
  • results: The proposed PhotoVerse generates high-quality images within a few seconds across diverse scenes and styles, eliminates test-time tuning entirely, and requires only a single facial photo of the target identity, significantly reducing the resource cost of image generation.
    Abstract Personalized text-to-image generation has emerged as a powerful and sought-after tool, empowering users to create customized images based on their specific concepts and prompts. However, existing approaches to personalization encounter multiple challenges, including long tuning times, large storage requirements, the necessity for multiple input images per identity, and limitations in preserving identity and editability. To address these obstacles, we present PhotoVerse, an innovative methodology that incorporates a dual-branch conditioning mechanism in both text and image domains, providing effective control over the image generation process. Furthermore, we introduce facial identity loss as a novel component to enhance the preservation of identity during training. Remarkably, our proposed PhotoVerse eliminates the need for test time tuning and relies solely on a single facial photo of the target identity, significantly reducing the resource cost associated with image generation. After a single training phase, our approach enables generating high-quality images within only a few seconds. Moreover, our method can produce diverse images that encompass various scenes and styles. The extensive evaluation demonstrates the superior performance of our approach, which achieves the dual objectives of preserving identity and facilitating editability. Project page: https://photoverse2d.github.io/

Adaptive User-centered Neuro-symbolic Learning for Multimodal Interaction with Autonomous Systems

  • paper_url: http://arxiv.org/abs/2309.05787
  • repo_url: None
  • paper_authors: Amr Gomaa, Michael Feld
  • for: This paper proposes a design approach for artificial intelligence systems grounded in human teaching, so that autonomous systems can understand objects and their environments more conceptually and symbolically.
  • methods: The paper argues for system designs with multimodal input and output capabilities that combine explicit human teaching with implicit teaching observed through the system's sensors, together with human-in-the-loop and incremental learning techniques.
  • results: The paper presents several hypotheses and design guidelines and highlights a use case from related work, working towards autonomous systems that learn more like humans.
    Abstract Recent advances in machine learning, particularly deep learning, have enabled autonomous systems to perceive and comprehend objects and their environments in a perceptual subsymbolic manner. These systems can now perform object detection, sensor data fusion, and language understanding tasks. However, there is a growing need to enhance these systems to understand objects and their environments more conceptually and symbolically. It is essential to consider both the explicit teaching provided by humans (e.g., describing a situation or explaining how to act) and the implicit teaching obtained by observing human behavior (e.g., through the system's sensors) to achieve this level of powerful artificial intelligence. Thus, the system must be designed with multimodal input and output capabilities to support implicit and explicit interaction models. In this position paper, we argue for considering both types of inputs, as well as human-in-the-loop and incremental learning techniques, for advancing the field of artificial intelligence and enabling autonomous systems to learn like humans. We propose several hypotheses and design guidelines and highlight a use case from related work to achieve this goal.

Grey-box Bayesian Optimization for Sensor Placement in Assisted Living Environments

  • paper_url: http://arxiv.org/abs/2309.05784
  • repo_url: None
  • paper_authors: Shadan Golestan, Omid Ardakanian, Pierre Boulanger
  • for: This paper optimizes the configuration and placement of sensors for fall detection, indoor localization, and activity recognition in assisted living spaces.
  • methods: The authors propose a novel, sample-efficient search method that combines grey-box Bayesian optimization with simulation-based evaluation to find high-quality sensor placements in an arbitrary indoor space. The key technical contribution is capturing domain-specific knowledge about the spatial distribution of activities and incorporating it into the iterative selection of query points in Bayesian optimization.
  • results: On two simulated indoor environments and a real-world dataset of human activities and sensor triggers, the proposed method identifies higher-quality sensor placements than state-of-the-art black-box optimization techniques, achieving higher F1 scores for activity recognition while requiring significantly fewer (51.3% fewer on average) expensive function queries.
    Abstract Optimizing the configuration and placement of sensors is crucial for reliable fall detection, indoor localization, and activity recognition in assisted living spaces. We propose a novel, sample-efficient approach to find a high-quality sensor placement in an arbitrary indoor space based on grey-box Bayesian optimization and simulation-based evaluation. Our key technical contribution lies in capturing domain-specific knowledge about the spatial distribution of activities and incorporating it into the iterative selection of query points in Bayesian optimization. Considering two simulated indoor environments and a real-world dataset containing human activities and sensor triggers, we show that our proposed method performs better compared to state-of-the-art black-box optimization techniques in identifying high-quality sensor placements, leading to accurate activity recognition in terms of F1-score, while also requiring a significantly lower (51.3% on average) number of expensive function queries.
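The sketch below illustrates the grey-box query-selection idea: a Gaussian-process surrogate with Expected Improvement, where candidate query points are additionally weighted by a prior activity-density map standing in for the domain knowledge. The simulator f(), the density map, and the weighting scheme are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of grey-box query selection (illustrative): a GP surrogate over
# sensor coordinates, with Expected Improvement weighted toward regions where
# activities are known to occur. f() stands in for the expensive simulation-based
# evaluation of a placement.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

def f(xy):                                   # hypothetical simulated F1 score
    return np.exp(-8 * ((xy[:, 0] - 0.7) ** 2 + (xy[:, 1] - 0.4) ** 2))

def activity_density(xy):                    # domain knowledge: where activities happen
    return np.exp(-4 * ((xy[:, 0] - 0.6) ** 2 + (xy[:, 1] - 0.5) ** 2))

X = rng.uniform(0, 1, (5, 2))                # initial random placements
y = f(X)
gp = GaussianProcessRegressor(normalize_y=True)

for _ in range(15):                          # BO loop with few expensive queries
    gp.fit(X, y)
    cand = rng.uniform(0, 1, (500, 2))
    mu, sd = gp.predict(cand, return_std=True)
    best = y.max()
    z = (mu - best) / (sd + 1e-9)
    ei = (mu - best) * norm.cdf(z) + sd * norm.pdf(z)
    score = ei * activity_density(cand)      # grey-box weighting of query points
    x_next = cand[np.argmax(score)][None, :]
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next))

print("best simulated F1:", y.max(), "at", X[np.argmax(y)])
```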

Robot Parkour Learning

  • paper_url: http://arxiv.org/abs/2309.05665
  • repo_url: https://github.com/ZiwenZhuang/parkour
  • paper_authors: Ziwen Zhuang, Zipeng Fu, Jianren Wang, Christopher Atkeson, Soeren Schwertfeger, Chelsea Finn, Hang Zhao
  • for: This work aims to develop a vision-based parkour policy that enables legged robots to rapidly overcome various obstacles in complex environments.
  • methods: The authors propose a reinforcement learning method that uses simple rewards, without any reference motion data, to learn multiple vision-based parkour skills, including climbing over high obstacles, leaping over large gaps, crawling beneath low barriers, squeezing through thin slits, and running.
  • results: Experiments show that these skills can be distilled into a single vision-based parkour policy and transferred to a quadrupedal robot using its egocentric depth camera, enabling two different low-cost robots to autonomously select and execute appropriate parkour skills to traverse challenging real-world environments.
    Abstract Parkour is a grand challenge for legged locomotion that requires robots to overcome various obstacles rapidly in complex environments. Existing methods can generate either diverse but blind locomotion skills or vision-based but specialized skills by using reference animal data or complex rewards. However, autonomous parkour requires robots to learn generalizable skills that are both vision-based and diverse to perceive and react to various scenarios. In this work, we propose a system for learning a single end-to-end vision-based parkour policy of diverse parkour skills using a simple reward without any reference motion data. We develop a reinforcement learning method inspired by direct collocation to generate parkour skills, including climbing over high obstacles, leaping over large gaps, crawling beneath low barriers, squeezing through thin slits, and running. We distill these skills into a single vision-based parkour policy and transfer it to a quadrupedal robot using its egocentric depth camera. We demonstrate that our system can empower two different low-cost robots to autonomously select and execute appropriate parkour skills to traverse challenging real-world environments.

Hypothesis Search: Inductive Reasoning with Language Models

  • paper_url: http://arxiv.org/abs/2309.05660
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Ruocheng Wang, Eric Zelikman, Gabriel Poesia, Yewen Pu, Nick Haber, Noah D. Goodman
  • for: This paper aims to improve the inductive reasoning ability of large language models (LLMs).
  • methods: The approach generates explicit hypotheses at multiple levels of abstraction: the LLM first proposes multiple abstract hypotheses about the problem in natural language, which are then implemented as concrete Python programs that can be verified directly against the observed examples and generalized to novel inputs.
  • results: The method substantially improves LLM performance on inductive reasoning tasks. On a random 40-problem subset of the ARC visual inductive reasoning benchmark, the automated pipeline using LLM-generated hypothesis summaries reaches 27.5% accuracy, well above the direct prompting baseline (12.5%); with minimal human input selecting among LLM-generated candidates, performance rises to 37.5%.
    Abstract Inductive reasoning is a core problem-solving capacity: humans can identify underlying principles from a few examples, which can then be robustly generalized to novel scenarios. Recent work has evaluated large language models (LLMs) on inductive reasoning tasks by directly prompting them yielding "in context learning." This can work well for straightforward inductive tasks, but performs very poorly on more complex tasks such as the Abstraction and Reasoning Corpus (ARC). In this work, we propose to improve the inductive reasoning ability of LLMs by generating explicit hypotheses at multiple levels of abstraction: we prompt the LLM to propose multiple abstract hypotheses about the problem, in natural language, then implement the natural language hypotheses as concrete Python programs. These programs can be directly verified by running on the observed examples and generalized to novel inputs. Because of the prohibitive cost of generation with state-of-the-art LLMs, we consider a middle step to filter the set of hypotheses that will be implemented into programs: we either ask the LLM to summarize into a smaller set of hypotheses, or ask human annotators to select a subset of the hypotheses. We verify our pipeline's effectiveness on the ARC visual inductive reasoning benchmark, its variant 1D-ARC, and string transformation dataset SyGuS. On a random 40-problem subset of ARC, our automated pipeline using LLM summaries achieves 27.5% accuracy, significantly outperforming the direct prompting baseline (accuracy of 12.5%). With the minimal human input of selecting from LLM-generated candidates, the performance is boosted to 37.5%. (And we argue this is a lower bound on the performance of our approach without filtering.) Our ablation studies show that abstract hypothesis generation and concrete program representations are both beneficial for LLMs to perform inductive reasoning tasks.
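A minimal sketch of the verification step follows: candidate Python programs (hard-coded here as stand-ins for LLM-generated implementations of natural-language hypotheses) are executed on the observed input/output examples, and only programs consistent with every example are kept and applied to novel inputs.

```python
# Minimal sketch of hypothesis verification: run each candidate program on the
# observed examples and keep only those consistent with all of them.
candidate_programs = [
    "def transform(s):\n    return s.upper()",
    "def transform(s):\n    return s[::-1]",          # correct hypothesis: reverse
    "def transform(s):\n    return s + s",
]
examples = [("abc", "cba"), ("hello", "olleh")]

def verify(program_src, examples):
    env = {}
    try:
        exec(program_src, env)
        return all(env["transform"](x) == y for x, y in examples)
    except Exception:
        return False

consistent = [p for p in candidate_programs if verify(p, examples)]
print(len(consistent), "consistent program(s)")

# Generalize the surviving hypothesis to a novel input.
env = {}
exec(consistent[0], env)
print(env["transform"]("arc"))   # -> "cra"
```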

Large Language Model for Science: A Study on P vs. NP

  • paper_url: http://arxiv.org/abs/2309.05689
  • repo_url: https://github.com/microsoft/LMOps/tree/main/LLM4Science
  • paper_authors: Qingxiu Dong, Li Dong, Ke Xu, Guangyan Zhou, Yaru Hao, Zhifang Sui, Furu Wei
  • for: This work uses large language models (LLMs) to augment and accelerate research on the P versus NP problem, one of the most important open problems in theoretical computer science and mathematics.
  • methods: The authors propose Socratic reasoning, a general framework that promotes in-depth thinking and reasoning with LLMs for complex problem solving.
  • results: In a pilot study on the P vs. NP problem, GPT-4 successfully produces a proof schema and engages in rigorous reasoning over 97 dialogue turns, concluding that "P ≠ NP", in alignment with (Xu and Zhou, 2023).
    Abstract In this work, we use large language models (LLMs) to augment and accelerate research on the P versus NP problem, one of the most important open problems in theoretical computer science and mathematics. Specifically, we propose Socratic reasoning, a general framework that promotes in-depth thinking with LLMs for complex problem-solving. Socratic reasoning encourages LLMs to recursively discover, solve, and integrate problems while facilitating self-evaluation and refinement. Our pilot study on the P vs. NP problem shows that GPT-4 successfully produces a proof schema and engages in rigorous reasoning throughout 97 dialogue turns, concluding "P $\neq$ NP", which is in alignment with (Xu and Zhou, 2023). The investigation uncovers novel insights within the extensive solution space of LLMs, shedding light on LLM for Science.

Combinative Cumulative Knowledge Processes

  • paper_url: http://arxiv.org/abs/2309.05638
  • repo_url: https://github.com/Aryia-Behroziuan/Robot-learning
  • paper_authors: Anna Brandenberger, Cassandra Marcussen, Elchanan Mossel, Madhu Sudan
  • for: This paper analyzes the Cumulative Knowledge Processes introduced by Ben-Eliezer et al. (ITCS 2023) in the setting of directed acyclic graphs (DAGs), where new units of knowledge may be derived by combining multiple previous units of knowledge.
  • methods: Whereas earlier work analyzed only an idealized, simplified "tree-like" setting in which each new unit depends directly on a single previously generated unit, the main goal here is to understand when the general process is safe, i.e., when the effect of errors remains under control.
  • results: The paper gives some necessary and some sufficient conditions for safety. As in the earlier work, the frequency of checking and the depth of the checks play a crucial role. A key new parameter is the "combination factor", the distribution of the number $M$ of old units that a new unit of knowledge depends on; a large combination factor can compensate for a small checking depth. The dependence of safety on the combination factor is far from trivial: some of the main results are stated in terms of $\mathbb{E}\{1/M\}$, while others depend on $\mathbb{E}\{M\}$.
    Abstract We analyze Cumulative Knowledge Processes, introduced by Ben-Eliezer, Mikulincer, Mossel, and Sudan (ITCS 2023), in the setting of "directed acyclic graphs", i.e., when new units of knowledge may be derived by combining multiple previous units of knowledge. The main considerations in this model are the role of errors (when new units may be erroneous) and local checking (where a few antecedent units of knowledge are checked when a new unit of knowledge is discovered). The aforementioned work defined this model but only analyzed an idealized and simplified "tree-like" setting, i.e., a setting where new units of knowledge only depended directly on one previously generated unit of knowledge. The main goal of our work is to understand when the general process is safe, i.e., when the effect of errors remains under control. We provide some necessary and some sufficient conditions for safety. As in the earlier work, we demonstrate that the frequency of checking as well as the depth of the checks play a crucial role in determining safety. A key new parameter in the current work is the $\textit{combination factor}$ which is the distribution of the number of units $M$ of old knowledge that a new unit of knowledge depends on. Our results indicate that a large combination factor can compensate for a small depth of checking. The dependency of the safety on the combination factor is far from trivial. Indeed some of our main results are stated in terms of $\mathbb{E}\{1/M\}$ while others depend on $\mathbb{E}\{M\}$.
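A toy simulation in the spirit of the process described above is sketched below. It is only meant to make the ingredients (error probability, local checking, and the combination factor M) concrete; the update and checking rules are simplifications, and the simulation is not expected to reproduce the paper's quantitative findings.

```python
# Toy simulation (a simplification, not the paper's exact model): each new unit
# combines M parents (M drawn from the "combination factor" distribution), is
# intrinsically erroneous with probability eps, and inherits errors from parents;
# with probability p_check its ancestry is checked to a fixed depth and detected
# errors are repaired. We track the fraction of erroneous units.
import random

def simulate(n_units=20000, eps=0.02, p_check=0.3, depth=2, m_choices=(1, 2, 3)):
    erroneous = [False]   # unit 0 is a correct axiom
    parents = [[]]
    for i in range(1, n_units):
        m = random.choice(m_choices)
        pa = [random.randrange(i) for _ in range(m)]
        err = (random.random() < eps) or any(erroneous[p] for p in pa)
        if random.random() < p_check:
            # local check: repair errors found within `depth` generations
            frontier, level = list(pa), 0
            while frontier and level < depth:
                nxt = []
                for u in frontier:
                    if erroneous[u]:
                        erroneous[u] = False
                    nxt.extend(parents[u])
                frontier, level = nxt, level + 1
            err = random.random() < eps   # rebuilt on the now-corrected parents
        erroneous.append(err)
        parents.append(pa)
    return sum(erroneous) / n_units

random.seed(0)
for m_choices in [(1,), (1, 2, 3), (3, 4, 5)]:
    print(m_choices, round(simulate(m_choices=m_choices), 3))
```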

Exploration and Comparison of Deep Learning Architectures to Predict Brain Response to Realistic Pictures

  • paper_url: http://arxiv.org/abs/2309.09983
  • repo_url: None
  • paper_authors: Riccardo Chimisso, Sathya Buršić, Paolo Marocco, Giuseppe Vizzari, Dimitri Ognibene
  • for: Predicting brain responses to realistic images, on the occasion of the Algonauts Challenge 2023.
  • methods: Extensive experimentation with various pretrained models, starting from simpler models and gradually introducing more complex architectures that utilize the available data and embeddings generated by large-scale pretrained models.
  • results: The best results came from employing multiple simple models, each dedicated to predicting the response of one brain region of interest in each hemisphere of each subject, although the approach fell short of establishing a robust association with the data.
    Abstract We present an exploration of machine learning architectures for predicting brain responses to realistic images on occasion of the Algonauts Challenge 2023. Our research involved extensive experimentation with various pretrained models. Initially, we employed simpler models to predict brain activity but gradually introduced more complex architectures utilizing available data and embeddings generated by large-scale pre-trained models. We encountered typical difficulties related to machine learning problems, e.g. regularization and overfitting, as well as issues specific to the challenge, such as difficulty in combining multiple input encodings, as well as the high dimensionality, unclear structure, and noisy nature of the output. To overcome these issues we tested single edge 3D position-based, multi-region of interest (ROI) and hemisphere predictor models, but we found that employing multiple simple models, each dedicated to a ROI in each hemisphere of the brain of each subject, yielded the best results - a single fully connected linear layer with image embeddings generated by CLIP as input. While we surpassed the challenge baseline, our results fell short of establishing a robust association with the data.
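A minimal sketch of the winning configuration described above, using synthetic data: one independent linear readout per ROI per hemisphere, mapping image embeddings (e.g., CLIP features) to that ROI's voxel responses. The ROI names and sizes are illustrative placeholders.

```python
# Minimal sketch (synthetic data): a separate linear model per ROI per hemisphere,
# fit on image embeddings and evaluated on held-out images.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_images, emb_dim = 800, 512
embeddings = rng.standard_normal((n_images, emb_dim))      # stand-in for CLIP features

rois = {("left", "V1"): 120, ("right", "V1"): 118, ("left", "FFA"): 60}  # voxels per ROI
responses = {k: rng.standard_normal((n_images, v)) for k, v in rois.items()}

models = {}
for key, y in responses.items():
    models[key] = LinearRegression().fit(embeddings[:600], y[:600])   # per-ROI readout

# Predict held-out brain responses ROI by ROI.
for key, model in models.items():
    pred = model.predict(embeddings[600:])
    print(key, pred.shape)
```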

Memory Injections: Correcting Multi-Hop Reasoning Failures during Inference in Transformer-Based Language Models

  • paper_url: http://arxiv.org/abs/2309.05605
  • repo_url: https://github.com/msakarvadia/memory_injections
  • paper_authors: Mansi Sakarvadia, Aswathy Ajith, Arham Khan, Daniel Grzenda, Nathaniel Hudson, André Bauer, Kyle Chard, Ian Foster
  • for: This paper aims to improve the performance of large language models (LLMs) on multi-hop reasoning tasks.
  • methods: The method performs targeted memory injections on LLM attention heads, enabling the model to incorporate additional pertinent, prompt-specific information during inference on multi-hop prompts.
  • results: Experiments show that a simple, efficient, and targeted memory injection into a key attention layer can often increase the probability of the desired next token in multi-hop tasks, by up to 424%.
    Abstract Answering multi-hop reasoning questions requires retrieving and synthesizing information from diverse sources. Large Language Models (LLMs) struggle to perform such reasoning consistently. Here we propose an approach to pinpoint and rectify multi-hop reasoning failures through targeted memory injections on LLM attention heads. First, we analyze the per-layer activations of GPT-2 models in response to single and multi-hop prompts. We then propose a mechanism that allows users to inject pertinent prompt-specific information, which we refer to as "memories," at critical LLM locations during inference. By thus enabling the LLM to incorporate additional relevant information during inference, we enhance the quality of multi-hop prompt completions. We show empirically that a simple, efficient, and targeted memory injection into a key attention layer can often increase the probability of the desired next token in multi-hop tasks, by up to 424%.
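The sketch below illustrates the injection mechanism with a toy transformer rather than GPT-2: a forward hook on a chosen layer adds a "memory" vector (here random, standing in for an embedded prompt) to that layer's hidden states at inference time, with no retraining. The layer choice and the additive form are assumptions for illustration.

```python
# Minimal sketch of hidden-state injection via a forward hook (toy model, not GPT-2;
# the injected vector stands in for an embedded "memory" prompt).
import torch
import torch.nn as nn

layers = nn.ModuleList([
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
    for _ in range(4)
])

def run(x):
    for layer in layers:
        x = layer(x)
    return x

memory = 0.5 * torch.randn(64)     # stand-in for an embedded "memory" prompt

def inject_memory(module, inputs, output):
    # Add the memory to every token's hidden state at this layer.
    return output + memory

target_layer = 2                   # which layer to inject into is a choice to tune
handle = layers[target_layer].register_forward_hook(inject_memory)

x = torch.randn(1, 10, 64)         # a batch with one 10-token sequence
with torch.no_grad():
    out_with_memory = run(x)
handle.remove()
with torch.no_grad():
    out_without = run(x)
print((out_with_memory - out_without).abs().mean())
```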

Introspective Deep Metric Learning

  • paper_url: http://arxiv.org/abs/2309.09982
  • repo_url: https://github.com/wzzheng/IDML
  • paper_authors: Chengkun Wang, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu
  • for: The paper proposes an introspective deep metric learning (IDML) framework to address uncertainty in deep metric learning.
  • methods: Each image is represented by a semantic embedding and an accompanying uncertainty embedding, describing its semantic characteristics and ambiguity respectively, and an introspective similarity metric is used for similarity judgments between images.
  • results: On the CUB-200-2011, Cars196, and Stanford Online Products datasets, the IDML framework outperforms conventional deep metric learning methods and handles ambiguous images better.
    Abstract This paper proposes an introspective deep metric learning (IDML) framework for uncertainty-aware comparisons of images. Conventional deep metric learning methods focus on learning a discriminative embedding to describe the semantic features of images, which ignore the existence of uncertainty in each image resulting from noise or semantic ambiguity. Training without awareness of these uncertainties causes the model to overfit the annotated labels during training and produce unsatisfactory judgments during inference. Motivated by this, we argue that a good similarity model should consider the semantic discrepancies with awareness of the uncertainty to better deal with ambiguous images for more robust training. To achieve this, we propose to represent an image using not only a semantic embedding but also an accompanying uncertainty embedding, which describes the semantic characteristics and ambiguity of an image, respectively. We further propose an introspective similarity metric to make similarity judgments between images considering both their semantic differences and ambiguities. The gradient analysis of the proposed metric shows that it enables the model to learn at an adaptive and slower pace to deal with the uncertainty during training. The proposed IDML framework improves the performance of deep metric learning through uncertainty modeling and attains state-of-the-art results on the widely used CUB-200-2011, Cars196, and Stanford Online Products datasets for image retrieval and clustering. We further provide an in-depth analysis of our framework to demonstrate the effectiveness and reliability of IDML. Code: https://github.com/wzzheng/IDML.
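A minimal sketch of an uncertainty-aware similarity follows; the specific functional form (semantic distance discounted by combined uncertainty) is an assumption for illustration, not necessarily the paper's metric.

```python
# Minimal sketch (illustrative functional form): each image gets a semantic embedding
# and an uncertainty embedding, and similarity between two images discounts the
# semantic distance by their combined uncertainty.
import torch
import torch.nn as nn

class IntrospectiveEncoder(nn.Module):
    def __init__(self, in_dim=512, emb_dim=128):
        super().__init__()
        self.semantic = nn.Linear(in_dim, emb_dim)
        self.uncertainty = nn.Linear(in_dim, emb_dim)   # per-dimension ambiguity

    def forward(self, x):
        z = self.semantic(x)
        sigma = nn.functional.softplus(self.uncertainty(x)) + 1e-4
        return z, sigma

def introspective_similarity(z1, s1, z2, s2):
    # Squared semantic distance, scaled down where either image is ambiguous.
    var = s1 ** 2 + s2 ** 2
    dist = ((z1 - z2) ** 2 / var).sum(dim=-1)
    return -dist

enc = IntrospectiveEncoder()
x1, x2 = torch.randn(8, 512), torch.randn(8, 512)   # backbone features of two batches
sim = introspective_similarity(*enc(x1), *enc(x2))
print(sim.shape)                                     # one similarity score per pair
```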

Temporal Action Localization with Enhanced Instant Discriminability

  • paper_url: http://arxiv.org/abs/2309.05590
  • repo_url: https://github.com/dingfengshi/tridetplus
  • paper_authors: Dingfeng Shi, Qiong Cao, Yujie Zhong, Shan An, Jian Cheng, Haogang Zhu, Dacheng Tao
  • for: This paper proposes a one-stage framework, TriDet, for detecting action boundaries and their corresponding categories in untrimmed videos.
  • methods: TriDet models action boundaries with a Trident-head via an estimated relative probability distribution around the boundary, and proposes an efficient scalable-granularity perception (SGP) layer to mitigate the rank-loss problem (instant discriminability deterioration) of transformer-based methods. It also leverages pretrained large models to strengthen the video backbone's representations and uses a decoupled feature pyramid network to incorporate rich spatial context for localization.
  • results: Experiments demonstrate TriDet's robustness and state-of-the-art performance on multiple temporal action detection datasets, including hierarchical (multi-label) datasets.
    Abstract Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video. The unclear boundaries of actions in videos often result in imprecise predictions of action boundaries by existing methods. To resolve this issue, we propose a one-stage framework named TriDet. First, we propose a Trident-head to model the action boundary via an estimated relative probability distribution around the boundary. Then, we analyze the rank-loss problem (i.e. instant discriminability deterioration) in transformer-based methods and propose an efficient scalable-granularity perception (SGP) layer to mitigate this issue. To further push the limit of instant discriminability in the video backbone, we leverage the strong representation capability of pretrained large models and investigate their performance on TAD. Last, considering the adequate spatial-temporal context for classification, we design a decoupled feature pyramid network with separate feature pyramids to incorporate rich spatial context from the large model for localization. Experimental results demonstrate the robustness of TriDet and its state-of-the-art performance on multiple TAD datasets, including hierarchical (multilabel) TAD datasets.
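The sketch below illustrates boundary modelling via a relative probability distribution: for each instant, a head predicts a distribution over candidate offsets to the start and end boundaries, and the regressed offset is the expectation under that distribution. The bin count and convolutional head are illustrative, not TriDet's exact Trident-head.

```python
# Minimal sketch of distribution-based boundary regression (illustrative, not the
# exact Trident-head): per-instant distributions over offset bins, with the
# regressed boundary offset taken as the expectation.
import torch
import torch.nn as nn

class RelativeBoundaryHead(nn.Module):
    def __init__(self, channels=256, num_bins=16):
        super().__init__()
        self.num_bins = num_bins
        self.start_logits = nn.Conv1d(channels, num_bins, kernel_size=3, padding=1)
        self.end_logits = nn.Conv1d(channels, num_bins, kernel_size=3, padding=1)

    def forward(self, feats):                                # feats: [B, C, T]
        bins = torch.arange(self.num_bins, dtype=feats.dtype, device=feats.device)
        p_start = self.start_logits(feats).softmax(dim=1)    # [B, num_bins, T]
        p_end = self.end_logits(feats).softmax(dim=1)
        start_offset = (p_start * bins[None, :, None]).sum(dim=1)   # expected offset
        end_offset = (p_end * bins[None, :, None]).sum(dim=1)
        return start_offset, end_offset                      # per-instant distances (in bins)

head = RelativeBoundaryHead()
feats = torch.randn(2, 256, 100)                             # features for 100 time steps
start, end = head(feats)
print(start.shape, end.shape)                                # [2, 100] each
```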

Mind the Uncertainty: Risk-Aware and Actively Exploring Model-Based Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.05582
  • repo_url: None
  • paper_authors: Marin Vlastelica, Sebastian Blaes, Cristina Pineri, Georg Martius
  • for: 这篇论文旨在解决基于模型的强化学习中的风险管理问题：使用轨迹采样与概率安全约束，并在认知（epistemic）不确定性下保持乐观、在偶然（aleatoric）不确定性下保持悲观，在二者之间取得平衡。
  • methods: 本论文提出了一种简单而有效的方法，即在基于模型的强化学习中分离两类不确定性，并结合概率安全约束与轨迹采样来管理风险。
  • results: 多组实验表明，在数据驱动的MPC方法中，分离不确定性对于在不确定且安全关键的控制环境中取得良好表现至关重要。
    Abstract We introduce a simple but effective method for managing risk in model-based reinforcement learning with trajectory sampling that involves probabilistic safety constraints and balancing of optimism in the face of epistemic uncertainty and pessimism in the face of aleatoric uncertainty of an ensemble of stochastic neural networks. Various experiments indicate that the separation of uncertainties is essential to performing well with data-driven MPC approaches in uncertain and safety-critical control environments.
    摘要 我们介绍了一种简单而有效的方法，用于管理基于模型的强化学习中的风险。该方法包含概率安全约束，并针对随机神经网络集成区分认知不确定性与偶然不确定性：对前者保持乐观，对后者保持悲观，在二者之间取得平衡。多组实验表明，在不确定且安全关键的控制环境中，分离这两类不确定性是数据驱动MPC方法取得良好表现的关键。
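
To make the uncertainty-separation idea above concrete, here is a minimal, illustrative sketch (not the authors' code) of how an ensemble of probabilistic dynamics models lets a trajectory-sampling MPC planner treat epistemic and aleatoric uncertainty differently. The function names, the optimism/pessimism bonuses, and the cost-budget check are illustrative assumptions.

```python
import numpy as np

def separate_uncertainties(ensemble_means: np.ndarray, ensemble_vars: np.ndarray):
    """Split predictive uncertainty of a probabilistic ensemble.

    ensemble_means: (M, D) mean next-state prediction of each of M members.
    ensemble_vars:  (M, D) predicted (aleatoric) variance of each member.
    """
    # Epistemic uncertainty: disagreement between ensemble members' means.
    epistemic = ensemble_means.var(axis=0)
    # Aleatoric uncertainty: average noise variance predicted by the members.
    aleatoric = ensemble_vars.mean(axis=0)
    return epistemic, aleatoric

def score_trajectory(rewards, costs, epistemic, aleatoric,
                     optimism=1.0, pessimism=1.0, cost_limit=1.0):
    """Rank a sampled trajectory for MPC: be optimistic about what the model
    does not know yet (epistemic) and pessimistic about irreducible noise
    (aleatoric), while rejecting trajectories that break a safety budget."""
    objective = (rewards.sum()
                 + optimism * np.sqrt(epistemic).sum()
                 - pessimism * np.sqrt(aleatoric).sum())
    safe = (costs.sum() + pessimism * np.sqrt(aleatoric).sum()) <= cost_limit
    return objective if safe else -np.inf
```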

ITI-GEN: Inclusive Text-to-Image Generation

  • paper_url: http://arxiv.org/abs/2309.05569
  • repo_url: https://github.com/humansensinglab/ITI-GEN
  • paper_authors: Cheng Zhang, Xuanbai Chen, Siqi Chai, Chen Henry Wu, Dmitry Lagun, Thabo Beeler, Fernando De la Torre
  • for: This paper aims to address the unequal representation of underrepresented groups in text-to-image generative models by proposing a novel approach called ITI-GEN.
  • methods: ITI-GEN leverages readily available reference images to learn prompt embeddings that can generate inclusive images from human-written prompts. The approach does not require model fine-tuning, making it computationally efficient.
  • results: Extensive experiments demonstrate that ITI-GEN largely improves over state-of-the-art models in generating inclusive images from prompts, ensuring that all desired attribute categories are represented uniformly.
    Abstract Text-to-image generative models often reflect the biases of the training data, leading to unequal representations of underrepresented groups. This study investigates inclusive text-to-image generative models that generate images based on human-written prompts and ensure the resulting images are uniformly distributed across attributes of interest. Unfortunately, directly expressing the desired attributes in the prompt often leads to sub-optimal results due to linguistic ambiguity or model misrepresentation. Hence, this paper proposes a drastically different approach that adheres to the maxim that "a picture is worth a thousand words". We show that, for some attributes, images can represent concepts more expressively than text. For instance, categories of skin tones are typically hard to specify by text but can be easily represented by example images. Building upon these insights, we propose a novel approach, ITI-GEN, that leverages readily available reference images for Inclusive Text-to-Image GENeration. The key idea is learning a set of prompt embeddings to generate images that can effectively represent all desired attribute categories. More importantly, ITI-GEN requires no model fine-tuning, making it computationally efficient to augment existing text-to-image models. Extensive experiments demonstrate that ITI-GEN largely improves over state-of-the-art models to generate inclusive images from a prompt. Project page: https://czhang0528.github.io/iti-gen.
    摘要 文本到图像生成模型经常表现出训练数据的偏见,导致特定群体的不平等表达。这项研究探讨了包容型文本到图像生成模型,该模型根据人写的提示生成图像,并确保生成图像具有所有Attributes of interest的均匀分布。然而,直接表达愿景中的属性在提示中经常会导致优化不佳的结果,因为语言 ambiguity 或模型误 repreSentation。因此,这篇论文提出了一种极其不同的方法,即通过“一 picture is worth a thousand words”的maxim,我们表明,对于一些属性,图像可以更加表达Concepts than text。例如,皮肤色Category 通常由文本很难Specify,但可以通过示例图像轻松表达。基于这些意识,我们提出了一种新的方法,名为 ITI-GEN,该方法利用可以 obtAin的参考图像来实现包容型文本到图像生成。关键思想是学习一组提示Embeddings,以生成能够有效表示所有愿景Category的图像。更重要的是,ITI-GEN不需要模型细化,因此可以 computationally efficient 地增强现有的文本到图像模型。广泛的实验表明,ITI-GEN较State-of-the-art模型大幅提高了从提示生成包容图像的能力。项目页面:https://czhang0528.github.io/iti-gen。
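
The following sketch illustrates the general recipe the abstract describes: learning one soft prompt embedding per attribute category so that text features align with features of that category's reference images. It assumes generic frozen `image_encoder`/`text_encoder` wrappers (the `extra_tokens` argument and `embedding_dim` attribute are hypothetical) and a simple contrastive loss; ITI-GEN's actual objective and training details may differ.

```python
import torch
import torch.nn.functional as F

def train_inclusive_prompt_tokens(image_encoder, text_encoder, base_prompt_ids,
                                  reference_images_per_category,
                                  num_steps=1000, lr=1e-3):
    """Sketch: learn one soft prompt token per attribute category so that the
    text features of [base prompt + learned token] match that category's
    reference-image features. Encoders are assumed frozen."""
    num_cats = len(reference_images_per_category)
    dim = text_encoder.embedding_dim            # assumed attribute of the wrapper
    prompt_tokens = torch.nn.Parameter(torch.randn(num_cats, dim) * 0.02)
    opt = torch.optim.Adam([prompt_tokens], lr=lr)

    # Pre-compute (frozen) mean image features for each category's reference set.
    with torch.no_grad():
        img_feats = [F.normalize(image_encoder(imgs), dim=-1).mean(0)
                     for imgs in reference_images_per_category]
        img_feats = F.normalize(torch.stack(img_feats), dim=-1)       # (C, dim)

    for _ in range(num_steps):
        txt_feats = F.normalize(
            text_encoder(base_prompt_ids, extra_tokens=prompt_tokens), dim=-1)  # (C, dim)
        logits = txt_feats @ img_feats.T / 0.07
        # Contrastive objective: each prompt should match its own category only.
        loss = F.cross_entropy(logits, torch.arange(num_cats))
        opt.zero_grad(); loss.backward(); opt.step()
    return prompt_tokens.detach()
```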

An Empirical Study of NetOps Capability of Pre-Trained Large Language Models

  • paper_url: http://arxiv.org/abs/2309.05557
  • repo_url: None
  • paper_authors: Yukai Miao, Yu Bai, Li Chen, Dan Li, Haifeng Sun, Xizheng Wang, Ziqiu Luo, Yanyu Ren, Dapeng Sun, Xiuting Xu, Qi Zhang, Chao Xiang, Xinchi Li
  • for: The paper evaluates the comprehensive capabilities of Pre-trained Large Language Models (LLMs) in Network Operations (NetOps) and measures their performance in a multi-lingual context.
  • methods: The paper presents an evaluation set called NetEval, which consists of 5,732 questions about NetOps covering five different sub-domains. The authors systematically evaluate the NetOps capability of 26 publicly available LLMs using NetEval.
  • results: The results show that only GPT-4 achieves performance competitive with humans in NetOps, while some open models like LLaMA 2 demonstrate significant potential.
    Abstract Nowadays, the versatile capabilities of Pre-trained Large Language Models (LLMs) have attracted much attention from the industry. However, some vertical domains are more interested in the in-domain capabilities of LLMs. For the Networks domain, we present NetEval, an evaluation set for measuring the comprehensive capabilities of LLMs in Network Operations (NetOps). NetEval is designed for evaluating the commonsense knowledge and inference ability in NetOps in a multi-lingual context. NetEval consists of 5,732 questions about NetOps, covering five different sub-domains of NetOps. With NetEval, we systematically evaluate the NetOps capability of 26 publicly available LLMs. The results show that only GPT-4 can achieve a performance competitive to humans. However, some open models like LLaMA 2 demonstrate significant potential.

Hybrid ASR for Resource-Constrained Robots: HMM - Deep Learning Fusion

  • paper_url: http://arxiv.org/abs/2309.07164
  • repo_url: https://github.com/anshulranjan2004/pyhmm
  • paper_authors: Anshul Ranjan, Kaushik Jegadeesan
  • for: 这个研究是为了开发一个资源有限的机器人领域中的自动语音识别系统(ASR)。
  • methods: 该方法结合隐马尔可夫模型（HMM）与深度学习模型，并通过套接字（socket）编程分配处理任务，以提高语音识别精度。
  • results: 实验结果显示，该混合式ASR系统在不同的机器人平台上展现出实时且精准的语音识别能力，并能适应不同的声学环境、兼容低功耗硬件。
    Abstract This paper presents a novel hybrid Automatic Speech Recognition (ASR) system designed specifically for resource-constrained robots. The proposed approach combines Hidden Markov Models (HMMs) with deep learning models and leverages socket programming to distribute processing tasks effectively. In this architecture, the HMM-based processing takes place within the robot, while a separate PC handles the deep learning model. This synergy between HMMs and deep learning enhances speech recognition accuracy significantly. We conducted experiments across various robotic platforms, demonstrating real-time and precise speech recognition capabilities. Notably, the system exhibits adaptability to changing acoustic conditions and compatibility with low-power hardware, making it highly effective in environments with limited computational resources. This hybrid ASR paradigm opens up promising possibilities for seamless human-robot interaction. In conclusion, our research introduces a pioneering dimension to ASR techniques tailored for robotics. By employing socket programming to distribute processing tasks across distinct devices and strategically combining HMMs with deep learning models, our hybrid ASR system showcases its potential to enable robots to comprehend and respond to spoken language adeptly, even in environments with restricted computational resources. This paradigm sets a innovative course for enhancing human-robot interaction across a wide range of real-world scenarios.
    摘要 这篇论文介绍了一种专为资源受限机器人设计的新型混合自动语音识别（ASR）系统。该系统结合隐马尔可夫模型（HMM）与深度学习模型，并通过套接字（socket）编程分配处理任务：HMM部分在机器人本体上运行，深度学习模型则由单独的PC处理。这种HMM与深度学习的协同显著提高了语音识别精度。我们在多种机器人平台上进行了实验，展示了实时且精准的语音识别能力；系统还能适应变化的声学环境，并兼容低功耗硬件，因此在计算资源有限的环境中表现出色。这种混合ASR范式为无缝的人机交互开辟了新的可能，使机器人即使在计算资源受限的环境中也能理解并回应口头语言，为提升各类真实场景中的人机交互指明了创新方向。
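
As a rough illustration of the socket-based split described above, the sketch below shows the robot-side client shipping acoustic features to a PC that hosts the deep model and returning the transcript; the on-robot HMM pass would run locally before or alongside this call. The host, port, and JSON wire format are illustrative assumptions, not details from the paper.

```python
import json
import socket

def send_features_for_decoding(features, host="192.168.1.50", port=5050):
    """Robot side: ship acoustic features (e.g., MFCC frames as nested lists)
    to the PC that hosts the deep model and wait for the decoded transcript.
    Host, port, and the length-prefixed JSON protocol are illustrative."""
    payload = json.dumps({"features": features}).encode("utf-8")
    with socket.create_connection((host, port)) as conn:
        conn.sendall(len(payload).to_bytes(4, "big") + payload)
        size = int.from_bytes(conn.recv(4), "big")
        data = b""
        while len(data) < size:
            data += conn.recv(4096)
    return json.loads(data.decode("utf-8"))["transcript"]
```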

Kani: A Lightweight and Highly Hackable Framework for Building Language Model Applications

  • paper_url: http://arxiv.org/abs/2309.05542
  • repo_url: https://github.com/zhudotexe/kani
  • paper_authors: Andrew Zhu, Liam Dugan, Alyssa Hwang, Chris Callison-Burch
  • for: 这篇论文旨在提供一个轻量级、灵活且与模型无关的开源框架，用于构建语言模型应用程序。
  • methods: 论文以模型接口、聊天管理和稳健的函数调用等核心构建块来支持复杂功能的实现。所有核心函数都可以轻松被覆盖，并配有完善的文档，便于开发者按需自定义。
  • results: 论文通过提供这一轻量级、灵活的开源框架，帮助开发者快速实现复杂的语言模型应用程序，同时保持互操作性和细粒度控制。
    Abstract Language model applications are becoming increasingly popular and complex, often including features like tool usage and retrieval augmentation. However, existing frameworks for such applications are often opinionated, deciding for developers how their prompts ought to be formatted and imposing limitations on customizability and reproducibility. To solve this we present Kani: a lightweight, flexible, and model-agnostic open-source framework for building language model applications. Kani helps developers implement a variety of complex features by supporting the core building blocks of chat interaction: model interfacing, chat management, and robust function calling. All Kani core functions are easily overridable and well documented to empower developers to customize functionality for their own needs. Kani thus serves as a useful tool for researchers, hobbyists, and industry professionals alike to accelerate their development while retaining interoperability and fine-grained control.
    摘要 语言模型应用程序正变得日益流行且复杂，通常包含工具使用和检索增强等功能。然而，现有框架往往带有较强的预设立场，替开发者决定提示的格式，并限制了自定义和可复现性。为解决这一问题，我们提出了 Kani：一个轻量级、灵活、与模型无关的开源框架，用于构建语言模型应用程序。Kani 通过支持聊天交互的核心构建块——模型接口、聊天管理和稳健的函数调用——帮助开发者实现各种复杂功能。Kani 的核心函数都可以轻松被覆盖，并配有详尽的文档，便于开发者根据自身需求自定义功能。因此，Kani 是研究人员、爱好者和行业从业者加速开发、同时保持互操作性和细粒度控制的有用工具。

PAI-Diffusion: Constructing and Serving a Family of Open Chinese Diffusion Models for Text-to-image Synthesis on the Cloud

  • paper_url: http://arxiv.org/abs/2309.05534
  • repo_url: None
  • paper_authors: Chengyu Wang, Zhongjie Duan, Bingyan Liu, Xinyi Zou, Cen Chen, Kui Jia, Jun Huang
  • for: 本文旨在提出一个涵盖通用与领域专用中文扩散模型的完整框架，以生成符合上下文的图像。
  • methods: 本文在通用扩散模型的基础上结合领域专用的中文扩散模型，并利用LoRA和ControlNet实现细粒度的图像风格迁移与图像编辑。
  • results: 在多个基准任务和实际应用场景上的评估表明，PAI-Diffusion框架在生成符合上下文的图像方面表现出色。
    Abstract Text-to-image synthesis for the Chinese language poses unique challenges due to its large vocabulary size, and intricate character relationships. While existing diffusion models have shown promise in generating images from textual descriptions, they often neglect domain-specific contexts and lack robustness in handling the Chinese language. This paper introduces PAI-Diffusion, a comprehensive framework that addresses these limitations. PAI-Diffusion incorporates both general and domain-specific Chinese diffusion models, enabling the generation of contextually relevant images. It explores the potential of using LoRA and ControlNet for fine-grained image style transfer and image editing, empowering users with enhanced control over image generation. Moreover, PAI-Diffusion seamlessly integrates with Alibaba Cloud's Machine Learning Platform for AI, providing accessible and scalable solutions. All the Chinese diffusion model checkpoints, LoRAs, and ControlNets, including domain-specific ones, are publicly available. A user-friendly Chinese WebUI and the diffusers-api elastic inference toolkit, also open-sourced, further facilitate the easy deployment of PAI-Diffusion models in various environments, making it a valuable resource for Chinese text-to-image synthesis.
    摘要 面向中文的文本到图像生成面临独特挑战，主要源于其庞大的词汇量以及汉字之间复杂的关系。现有扩散模型虽然在根据文本描述生成图像方面展现出潜力，但往往忽视特定领域的上下文，并且对中文的处理缺乏稳健性。本文提出PAI-Diffusion框架以解决这些局限。PAI-Diffusion结合了通用与领域专用的中文扩散模型，能够生成符合上下文的图像；并探索了利用LoRA和ControlNet进行细粒度图像风格迁移与图像编辑，赋予用户更强的生成控制能力。此外，PAI-Diffusion与阿里云机器学习平台（PAI）无缝集成，提供可用且可扩展的解决方案。所有中文扩散模型检查点、LoRA和ControlNet（包括领域专用版本）均已公开。配套的中文WebUI和开源的diffusers-api弹性推理工具包进一步方便在不同环境中部署PAI-Diffusion模型，使其成为中文文本到图像生成的宝贵资源。

On the meaning of uncertainty for ethical AI: philosophy and practice

  • paper_url: http://arxiv.org/abs/2309.05529
  • repo_url: None
  • paper_authors: Cassandra Bird, Daniel Williamson, Sabina Leonelli
  • for: 该论文的目的是如何增加人工智能系统的透明度和负责任性,以便更好地回应用户的反馈和评估。
  • methods: 该论文提出了一种解决方案,通过明确指出人工智能系统的开发基础和应用领域的限制,来增强模型的响应性、输出的质量和意义、以及对模型的评估透明度。
  • results: 该论文通过扩展后验信念评估（Posterior Belief Assessment）来实现信念所有权，并论证这是将伦理考量引入数学推理、在统计实践中落实伦理AI的重要方式。论文以2021年12月COVID-19 Omicron变种的传播问题为例进行了实践展示。
    Abstract Whether and how data scientists, statisticians and modellers should be accountable for the AI systems they develop remains a controversial and highly debated topic, especially given the complexity of AI systems and the difficulties in comparing and synthesising competing claims arising from their deployment for data analysis. This paper proposes to address this issue by decreasing the opacity and heightening the accountability of decision making using AI systems, through the explicit acknowledgement of the statistical foundations that underpin their development and the ways in which these dictate how their results should be interpreted and acted upon by users. In turn, this enhances (1) the responsiveness of the models to feedback, (2) the quality and meaning of uncertainty on their outputs and (3) their transparency to evaluation. To exemplify this approach, we extend Posterior Belief Assessment to offer a route to belief ownership from complex and competing AI structures. We argue that this is a significant way to bring ethical considerations into mathematical reasoning, and to implement ethical AI in statistical practice. We demonstrate these ideas within the context of competing models used to advise the UK government on the spread of the Omicron variant of COVID-19 during December 2021.
    摘要 数据科学家、统计学家和建模者是否以及如何为其开发的AI系统负责，仍是一个充满争议、讨论激烈的话题，尤其考虑到AI系统的复杂性，以及其用于数据分析时各种相互竞争的主张难以比较与综合。本文提出通过降低决策的不透明度、强化使用AI系统做决策的问责来应对这一问题：明确承认支撑其开发的统计基础，以及这些基础如何决定用户应当如何解读并据以行动。这反过来可以提升(1)模型对反馈的响应性，(2)输出不确定性的质量与意义，以及(3)模型评估的透明度。为示范这一思路，我们扩展了后验信念评估（Posterior Belief Assessment），为从复杂且相互竞争的AI结构中获得信念所有权提供路径。我们认为这是将伦理考量引入数学推理、在统计实践中落实伦理AI的重要方式。我们以2021年12月为英国政府提供Omicron变种新冠传播建议的多个相互竞争模型为背景展示了这些想法。

NExT-GPT: Any-to-Any Multimodal LLM

  • paper_url: http://arxiv.org/abs/2309.05519
  • repo_url: https://github.com/NExT-GPT/NExT-GPT
  • paper_authors: Shengqiong Wu, Hao Fei, Leigang Qu, Wei Ji, Tat-Seng Chua
  • for: 这 paper 的目的是开发一个可以处理多种模式的大型语言模型(MM-LLM)系统,以便模拟人类在多种感知和交流中的行为。
  • methods: 这 paper 使用了一种综合拓展的结构,将语言模型(LLM)与多模态适配器和不同的扩散解码器相连接,以便接受和生成多种模式的输入和输出。此外,paper 还引入了一种模式转换指令调整(MosIT),并 manually 精心编辑了一个高质量的多模式数据集,以便让 NExT-GPT 具备跨模式的semantic理解和内容生成能力。
  • results: 经过训练,NExT-GPT 能够在多种模式下进行输入和输出转换,并且在不同的模式下能够具备较高的内容生成和理解能力。此外,paper 还证明了 NExT-GPT 的模式转换能力可以在不同的任务上进行改进,例如图像描述和文本生成等。
    Abstract While recently Multimodal Large Language Models (MM-LLMs) have made exciting strides, they mostly fall prey to the limitation of only input-side multimodal understanding, without the ability to produce content in multiple modalities. As we humans always perceive the world and communicate with people through various modalities, developing any-to-any MM-LLMs capable of accepting and delivering content in any modality becomes essential to human-level AI. To fill the gap, we present an end-to-end general-purpose any-to-any MM-LLM system, NExT-GPT. We connect an LLM with multimodal adaptors and different diffusion decoders, enabling NExT-GPT to perceive inputs and generate outputs in arbitrary combinations of text, images, videos, and audio. By leveraging the existing well-trained highly-performing encoders and decoders, NExT-GPT is tuned with only a small amount of parameter (1%) of certain projection layers, which not only benefits low-cost training and also facilitates convenient expansion to more potential modalities. Moreover, we introduce a modality-switching instruction tuning (MosIT) and manually curate a high-quality dataset for MosIT, based on which NExT-GPT is empowered with complex cross-modal semantic understanding and content generation. Overall, our research showcases the promising possibility of building an AI agent capable of modeling universal modalities, paving the way for more human-like AI research in the community. Project page: https://next-gpt.github.io/

Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs

  • paper_url: http://arxiv.org/abs/2309.05516
  • repo_url: https://github.com/intel/neural-compressor
  • paper_authors: Wenhua Cheng, Weiwei Zhang, Haihao Shen, Yiyang Cai, Xin He, Kaokao Lv
  • for: The paper aims to optimize the weight-rounding task in weight-only quantization of large language models, improving deployment efficiency while maintaining accuracy.
  • methods: The proposed method, SignRound, uses lightweight block-wise tuning with signed gradient descent to optimize weight rounding, achieving outstanding results within 400 steps.
  • results: SignRound outperforms the established rounding-to-nearest (RTN) baseline and competes impressively against recent methods, without introducing additional inference overhead.
    Abstract Large Language Models (LLMs) have proven their exceptional capabilities in performing language-related tasks. However, their deployment poses significant challenges due to their considerable memory and storage requirements. In response to this issue, weight-only quantization, particularly 3 and 4-bit weight-only quantization, has emerged as one of the most viable solutions. As the number of bits decreases, the quantization grid broadens, thus emphasizing the importance of up and down rounding. While previous studies have demonstrated that fine-tuning up and down rounding with the addition of perturbations can enhance accuracy in some scenarios, our study is driven by the precise and limited boundary of these perturbations, where only the threshold for altering the rounding value is of significance. Consequently, we propose a concise and highly effective approach for optimizing the weight rounding task. Our method, named SignRound, involves lightweight block-wise tuning using signed gradient descent, enabling us to achieve outstanding results within 400 steps. SignRound outperforms the established baseline of rounding-to-nearest (RTN) and competes impressively against recent methods, without introducing additional inference overhead. The source code will be publicly available at https://github.com/intel/neural-compressor soon.
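
A minimal sketch of the signed-gradient rounding idea, under simplifying assumptions (per-tensor 4-bit symmetric scale, straight-through rounding, a user-supplied block reconstruction loss); the actual SignRound implementation in neural-compressor is more elaborate.

```python
import torch

def signround_block(weights, calib_inputs, block_loss, steps=400, lr=5e-3):
    """Sketch of SignRound-style weight rounding for one block. A bounded
    perturbation V in [-0.5, 0.5] decides whether each weight rounds up or
    down; V is updated with the *sign* of its gradient on calibration data."""
    scale = weights.abs().max() / 7.0                       # toy 4-bit symmetric scale
    v = torch.zeros_like(weights, requires_grad=True)
    for _ in range(steps):
        w_soft = weights / scale + v
        # Straight-through rounding so gradients flow back into v.
        w_q = ((torch.round(w_soft) - w_soft).detach() + w_soft).clamp(-8, 7) * scale
        loss = block_loss(w_q, calib_inputs)                # e.g. MSE vs. FP32 block output
        loss.backward()
        with torch.no_grad():
            v -= lr * v.grad.sign()                         # signed gradient descent step
            v.clamp_(-0.5, 0.5)                             # keep the rounding offset valid
            v.grad.zero_()
    return ((weights / scale + v.detach()).round().clamp(-8, 7)) * scale
```

Using only the gradient's sign keeps each update bounded and scale-free, which is one plausible reason such a scheme can converge within a few hundred steps.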

A Co-design Study for Multi-Stakeholder Job Recommender System Explanations

  • paper_url: http://arxiv.org/abs/2309.05507
  • repo_url: https://github.com/roan-schellingerhout/jrs_explanations
  • paper_authors: Roan Schellingerhout, Francesco Barile, Nava Tintarev
  • for: The paper aims to determine the explanation preferences of the different stakeholder types in the recruitment process: candidates, recruiters, and companies.
  • methods: The authors created and validated a semi-structured interview guide and used grounded theory to analyze the results, finding that each stakeholder type has distinct explanation preferences.
  • results: The study found that candidates prefer brief, textual explanations, hiring managers prefer visual graph-based explanations, and recruiters prefer more exhaustive textual explanations. Based on these findings, the authors provide guidelines for designing an explanation interface that meets the needs of all three stakeholder types; the validated interview guide can also be used in future research to determine explanation preferences in other domains.
    Abstract Recent legislation proposals have significantly increased the demand for eXplainable Artificial Intelligence (XAI) in many businesses, especially in so-called `high-risk' domains, such as recruitment. Within recruitment, AI has become commonplace, mainly in the form of job recommender systems (JRSs), which try to match candidates to vacancies, and vice versa. However, common XAI techniques often fall short in this domain due to the different levels and types of expertise of the individuals involved, making explanations difficult to generalize. To determine the explanation preferences of the different stakeholder types - candidates, recruiters, and companies - we created and validated a semi-structured interview guide. Using grounded theory, we structurally analyzed the results of these interviews and found that different stakeholder types indeed have strongly differing explanation preferences. Candidates indicated a preference for brief, textual explanations that allow them to quickly judge potential matches. On the other hand, hiring managers preferred visual graph-based explanations that provide a more technical and comprehensive overview at a glance. Recruiters found more exhaustive textual explanations preferable, as those provided them with more talking points to convince both parties of the match. Based on these findings, we describe guidelines on how to design an explanation interface that fulfills the requirements of all three stakeholder types. Furthermore, we provide the validated interview guide, which can assist future research in determining the explanation preferences of different stakeholder types.
    摘要 最近的法规提案已经提高了高风险领域内的可解释人工智能(XAI)的需求,特别是在招聘领域。在招聘领域,人工智能已经广泛应用,主要是在 forme of job recommender systems(JRSs),用于匹配候选人和职位。然而,常见的XAI技术经常在这个领域下功不逮,因为不同的个人拥有不同水平和类型的专业知识,使得解释困难于总结。为了确定不同参与者类型(候选人、招聘人员和公司)的解释偏好,我们创建了和验证了一份 semi-structured 采访指南。使用基本的理论,我们结构分析了采访结果,并发现不同参与者类型确实有强烈不同的解释偏好。候选人表示偏好简洁的文本解释,让他们快速判断可能的匹配。相反,招聘人员偏好可见图表解释,提供技术性和全面的概述。招聘人员则更喜欢详细的文本解释,使得他们有更多的讲话点,以convince both parties of the match。基于这些发现,我们描述了如何设计一个满足所有参与者类型的解释界面的指南。此外,我们还提供了验证过的采访指南,可以帮助未来的研究确定不同参与者类型的解释偏好。

  • paper_url: http://arxiv.org/abs/2309.05501
  • repo_url: None
  • paper_authors: Ha-Thanh Nguyen, Randy Goebel, Francesca Toni, Kostas Stathis, Ken Satoh
  • for: This study aims to evaluate the performance of GPT-3.5 and GPT-4 on a prominent benchmark for legal textual entailment, the COLIEE Task 4 dataset, and to analyze their strengths and weaknesses in handling legal textual entailment tasks.
  • methods: The study uses black-box analysis to evaluate the performance of GPT-3.5 and GPT-4 on the COLIEE Task 4 dataset, which includes legal texts from different periods in Japan.
  • results: The preliminary experimental results show intriguing insights into the models’ performance on the legal textual entailment tasks, including their ability to discern entailment relationships within Japanese statute law across different periods. The study also discusses the influence of training data distribution on the models’ generalizability.
    Abstract The evolution of Generative Pre-trained Transformer (GPT) models has led to significant advancements in various natural language processing applications, particularly in legal textual entailment. We present an analysis of GPT-3.5 (ChatGPT) and GPT-4 performances on COLIEE Task 4 dataset, a prominent benchmark in this domain. The study encompasses data from Heisei 18 (2006) to Reiwa 3 (2021), exploring the models' abilities to discern entailment relationships within Japanese statute law across different periods. Our preliminary experimental results unveil intriguing insights into the models' strengths and weaknesses in handling legal textual entailment tasks, as well as the patterns observed in model performance. In the context of proprietary models with undisclosed architectures and weights, black-box analysis becomes crucial for evaluating their capabilities. We discuss the influence of training data distribution and the implications on the models' generalizability. This analysis serves as a foundation for future research, aiming to optimize GPT-based models and enable their successful adoption in legal information extraction and entailment applications.

  • paper_url: http://arxiv.org/abs/2309.05500
  • repo_url: None
  • paper_authors: Hai-Long Nguyen, Dieu-Quynh Nguyen, Hoang-Trung Nguyen, Thu-Trang Pham, Huu-Dong Nguyen, Thach-Anh Nguyen, Thi-Hai-Yen Vuong, Ha-Thanh Nguyen
  • for: 本研究主要针对于自然语言处理在法律领域中的应用,尤其是对于低资源语言的法律领域知识获取。
  • methods: 本文使用了相似排名和深度学习模型来解决法律文档检索任务,而对于第二个任务,即从相关法律文章中提取问题回答,我们提议了一系列适应性技术来处理不同的问题类型。
  • results: 本文在两个任务上取得了出色的成绩,示出自动问答系统在法律领域,特别是低资源语言中的潜在利好和效果。
    Abstract In recent years, natural language processing has gained significant popularity in various sectors, including the legal domain. This paper presents NeCo Team's solutions to the Vietnamese text processing tasks provided in the Automated Legal Question Answering Competition 2023 (ALQAC 2023), focusing on legal domain knowledge acquisition for low-resource languages through data enrichment. Our methods for the legal document retrieval task employ a combination of similarity ranking and deep learning models, while for the second task, which requires extracting an answer from a relevant legal article in response to a question, we propose a range of adaptive techniques to handle different question types. Our approaches achieve outstanding results on both tasks of the competition, demonstrating the potential benefits and effectiveness of question answering systems in the legal field, particularly for low-resource languages.
    摘要 Recently, 自然语言处理技术在不同领域获得了广泛的应用,其中包括法律领域。这篇论文介绍了NeCo Team在2023年自动法律问答比赛(ALQAC 2023)中提供的越南文本处理任务解决方案,强调了对低资源语言的法律领域知识取得的数据增强。我们对法律文档检索任务使用了相似排名和深度学习模型,而对第二个任务,即根据问题提取相关法律文章中的答案,我们提议了一系列适应性技巧来处理不同的问题类型。我们的方法在两个任务上都取得了出色的成绩,这表明自动问答系统在法律领域,特别是低资源语言中的潜在效果和优势。
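
As a generic illustration of the similarity-ranking component mentioned in the methods bullet, the sketch below ranks candidate legal articles against a question by cosine similarity of their embeddings; the embedding function and its combination with trained deep models are left abstract and do not reproduce the team's exact pipeline.

```python
import numpy as np

def rank_articles(query: str, articles: list[str], embed, top_k: int = 5):
    """Sketch of the retrieval step: embed the question and all candidate legal
    articles with some encoder `embed` (e.g., a multilingual sentence encoder
    returning an (N, dim) array), then rank articles by cosine similarity."""
    q = np.asarray(embed([query]))[0]
    a = np.asarray(embed(articles))                          # (N, dim)
    q = q / np.linalg.norm(q)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    scores = a @ q
    order = np.argsort(-scores)[:top_k]
    return [(articles[i], float(scores[i])) for i in order]
```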

Learning Semantic Segmentation with Query Points Supervision on Aerial Images

  • paper_url: http://arxiv.org/abs/2309.05490
  • repo_url: None
  • paper_authors: Santiago Rivier, Carlos Hinojosa, Silvio Giancola, Bernard Ghanem
  • for: 高分辨率卫星图像的语义分割是遥感中的关键任务，即将图像划分为有意义的区域。
  • methods: 我们提出了一种弱监督学习算法，仅依靠查询点标注（而非完整掩码标签）来训练语义分割模型，从而显著降低人工标注的成本和时间。
  • results: 我们在航拍图像数据集和多种语义分割架构上评估了该弱监督训练方法，结果表明其性能可与全监督训练相当，同时大幅减少了标注工作量。
    Abstract Semantic segmentation is crucial in remote sensing, where high-resolution satellite images are segmented into meaningful regions. Recent advancements in deep learning have significantly improved satellite image segmentation. However, most of these methods are typically trained in fully supervised settings that require high-quality pixel-level annotations, which are expensive and time-consuming to obtain. In this work, we present a weakly supervised learning algorithm to train semantic segmentation algorithms that only rely on query point annotations instead of full mask labels. Our proposed approach performs accurate semantic segmentation and improves efficiency by significantly reducing the cost and time required for manual annotation. Specifically, we generate superpixels and extend the query point labels into those superpixels that group similar meaningful semantics. Then, we train semantic segmentation models, supervised with images partially labeled with the superpixels pseudo-labels. We benchmark our weakly supervised training approach on an aerial image dataset and different semantic segmentation architectures, showing that we can reach competitive performance compared to fully supervised training while reducing the annotation effort.
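
The following sketch illustrates the weak-supervision step described in the abstract: computing superpixels and propagating each query-point label to the superpixel that contains it, producing partial pseudo-labels for training. SLIC is used here only as a stand-in superpixel method; the paper's grouping of "similar meaningful semantics" may be computed differently.

```python
import numpy as np
from skimage.segmentation import slic

def expand_point_labels(image, points, n_segments=600):
    """Sketch of the weak-supervision step: compute superpixels and copy each
    query point's class label to every pixel of the superpixel containing it.
    Unlabelled superpixels keep the ignore value (255).

    points: list of (row, col, class_id) annotations."""
    segments = slic(image, n_segments=n_segments, compactness=10, start_label=0)
    pseudo = np.full(segments.shape, 255, dtype=np.uint8)    # 255 = ignore index
    for r, c, cls in points:
        pseudo[segments == segments[r, c]] = cls
    return pseudo  # use as partial labels when training the segmentation model
```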

LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

  • paper_url: http://arxiv.org/abs/2309.05472
  • repo_url: None
  • paper_authors: Titouan Parcollet, Ha Nguyen, Solene Evain, Marcely Zanon Boito, Adrien Pupier, Salima Mdhaffar, Hang Le, Sina Alisamir, Natalia Tomashenko, Marco Dinarelli, Shucong Zhang, Alexandre Allauzen, Maximin Coavoux, Yannick Esteve, Mickael Rouvier, Jerome Goulian, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier
  • for: 本研究旨在提供一个开源框架，用于评估和构建基于自监督学习（SSL）的法语语音技术。
  • methods: 本研究使用了多种SSL方法（包括 wav2vec 2.0），并提供了大规模、多样化的训练数据。
  • results: 研究比较了多种SSL模型的表现，包括冻结与微调下游模型、任务特定与任务无关的预训练模型，并讨论了大规模模型训练的碳足迹。
    Abstract Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing. Speech processing drastically benefitted from SSL as most of the current domain-related tasks are now being approached with pre-trained models. This work introduces LeBenchmark 2.0 an open-source framework for assessing and building SSL-equipped French speech technologies. It includes documented, large-scale and heterogeneous corpora with up to 14,000 hours of heterogeneous speech, ten pre-trained SSL wav2vec 2.0 models containing from 26 million to one billion learnable parameters shared with the community, and an evaluation protocol made of six downstream tasks to complement existing benchmarks. LeBenchmark 2.0 also presents unique perspectives on pre-trained SSL models for speech with the investigation of frozen versus fine-tuned downstream models, task-agnostic versus task-specific pre-trained models as well as a discussion on the carbon footprint of large-scale model training.
    摘要 自监督学习（SSL）为计算机视觉和自然语言处理等多个领域带来了空前的进步，语音处理也从中获益良多，目前大多数相关任务都基于预训练模型来完成。本文介绍 LeBenchmark 2.0：一个用于评估和构建基于SSL的法语语音技术的开源框架。它包含有文档记录的大规模异构语料（多达14,000小时语音）、10个向社区共享的预训练SSL wav2vec 2.0模型（可学习参数从2600万到10亿不等），以及由六个下游任务组成的评估协议，以补充现有基准。LeBenchmark 2.0 还从独特的视角考察了语音SSL预训练模型，包括冻结与微调下游模型、任务无关与任务特定预训练模型的对比，并讨论了大规模模型训练的碳足迹。

Textbooks Are All You Need II: phi-1.5 technical report

  • paper_url: http://arxiv.org/abs/2309.05463
  • repo_url: None
  • paper_authors: Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar, Yin Tat Lee
  • for: 研究较小的 Transformer 语言模型的能力，以实现更好的自然语言理解与生成。
  • methods: 使用现有的大型语言模型（LLM）生成“教科书质量”的数据以改进学习过程，并据此训练了一个13亿参数的新模型 phi-1.5，考察其在自然语言任务中的表现。
  • results: phi-1.5 在自然语言任务上的表现可与规模大5倍的模型相当，并在小学数学和基础编程等更复杂的推理任务上超越大多数非前沿LLM。该模型也存在幻觉以及产生有毒、有偏见内容的风险，需要进一步研究；令人鼓舞的是，由于未使用网页数据，这方面已有改善。
    Abstract We continue the investigation into the power of smaller Transformer-based language models as initiated by \textbf{TinyStories} -- a 10 million parameter model that can produce coherent English -- and the follow-up work on \textbf{phi-1}, a 1.3 billion parameter model with Python coding performance close to the state-of-the-art. The latter work proposed to use existing Large Language Models (LLMs) to generate ``textbook quality" data as a way to enhance the learning process compared to traditional web data. We follow the ``Textbooks Are All You Need" approach, focusing this time on common sense reasoning in natural language, and create a new 1.3 billion parameter model named \textbf{phi-1.5}, with performance on natural language tasks comparable to models 5x larger, and surpassing most non-frontier LLMs on more complex reasoning tasks such as grade-school mathematics and basic coding. More generally, \textbf{phi-1.5} exhibits many of the traits of much larger LLMs, both good -- such as the ability to ``think step by step" or perform some rudimentary in-context learning -- and bad, including hallucinations and the potential for toxic and biased generations -- encouragingly though, we are seeing improvement on that front thanks to the absence of web data. We open-source \textbf{phi-1.5} to promote further research on these urgent topics.

Panoptic Vision-Language Feature Fields

  • paper_url: http://arxiv.org/abs/2309.05448
  • repo_url: https://github.com/ethz-asl/autolabel
  • paper_authors: Haoran Chen, Kenneth Blomqvist, Francesco Milano, Roland Siegwart
  • for: 这篇论文旨在提出一种开放词汇的三维全景分割方法，可在运行时根据文本描述对场景进行分割。
  • methods: 该方法 Panoptic Vision-Language Feature Fields（PVLFF）同时进行语义分割和实例分割，通过对输入帧的2D实例分割提案施加对比损失，联合学习视觉-语言特征与层次化实例特征。
  • results: 该方法在 HyperSim、ScanNet 和 Replica 数据集上与最先进的闭集三维全景分割系统性能相当，并在语义分割方面超越现有的三维开放词汇系统。消融实验进一步验证了模型架构的有效性。
    Abstract Recently, methods have been proposed for 3D open-vocabulary semantic segmentation. Such methods are able to segment scenes into arbitrary classes given at run-time using their text description. In this paper, we propose to our knowledge the first algorithm for open-vocabulary panoptic segmentation, simultaneously performing both semantic and instance segmentation. Our algorithm, Panoptic Vision-Language Feature Fields (PVLFF) learns a feature field of the scene, jointly learning vision-language features and hierarchical instance features through a contrastive loss function from 2D instance segment proposals on input frames. Our method achieves comparable performance against the state-of-the-art close-set 3D panoptic systems on the HyperSim, ScanNet and Replica dataset and outperforms current 3D open-vocabulary systems in terms of semantic segmentation. We additionally ablate our method to demonstrate the effectiveness of our model architecture. Our code will be available at https://github.com/ethz-asl/autolabel.
    摘要 近来已有方法被提出用于3D开放词汇语义分割，这类方法能够在运行时根据文本描述将场景分割为任意类别。本文据我们所知首次提出了开放词汇全景分割算法，能够同时进行语义分割与实例分割。我们的算法——Panoptic Vision-Language Feature Fields（PVLFF）——学习场景的特征场，通过对输入帧的2D实例分割提案施加对比损失，联合学习视觉-语言特征与层次化实例特征。该方法在 HyperSim、ScanNet 和 Replica 数据集上与最先进的闭集3D全景分割系统性能相当，并在语义分割方面超越现有的3D开放词汇系统。我们还通过消融实验验证了模型架构的有效性。代码将在 https://github.com/ethz-asl/autolabel 提供。

Improving Information Extraction on Business Documents with Specific Pre-Training Tasks

  • paper_url: http://arxiv.org/abs/2309.05429
  • repo_url: https://github.com/thibaultdouzon/business-document-pre-training
  • paper_authors: Thibault Douzon, Stefan Duffner, Christophe Garcia, Jérémy Espinas
  • for: This paper aims to improve the performance of Information Extraction in business documents using pre-trained language models.
  • methods: The authors use two new pre-training tasks and a post-processing algorithm to extract relevant information from scanned documents.
  • results: The proposed method achieves significant improvements in extraction performance on both public (from 93.88 to 95.50 F1 score) and private (from 84.35 to 84.84 F1 score) datasets.
    Abstract Transformer-based Language Models are widely used in Natural Language Processing related tasks. Thanks to their pre-training, they have been successfully adapted to Information Extraction in business documents. However, most pre-training tasks proposed in the literature for business documents are too generic and not sufficient to learn more complex structures. In this paper, we use LayoutLM, a language model pre-trained on a collection of business documents, and introduce two new pre-training tasks that further improve its capacity to extract relevant information. The first is aimed at better understanding the complex layout of documents, and the second focuses on numeric values and their order of magnitude. These tasks force the model to learn better-contextualized representations of the scanned documents. We further introduce a new post-processing algorithm to decode BIESO tags in Information Extraction that performs better with complex entities. Our method significantly improves extraction performance on both public (from 93.88 to 95.50 F1 score) and private (from 84.35 to 84.84 F1 score) datasets composed of expense receipts, invoices, and purchase orders.
    摘要 transformer-based 语言模型在自然语言处理相关任务中广泛应用。它们的预训练使其在商业文档中的信息EXTRACTION任务上成功适应。然而,大多数在文献中提出的预训练任务 для商业文档太过普遍,无法学习更复杂的结构。在这篇论文中,我们使用LayoutLM,一个基于商业文档的语言模型,并提出了两个新的预训练任务来进一步提高其EXTRACTION信息的能力。第一个任务是了解商业文档的复杂结构,第二个任务是关注数字值和其大小的顺序。这两个任务让模型学习更好地Contextualized表示商务文档。我们还提出了一种新的后处理算法,用于解码BIESO标签在信息EXTRACTION中,该算法在处理复杂实体时表现更好。我们的方法在公共(从93.88提高到95.50 F1分数)和私人(从84.35提高到84.84 F1分数)数据集上显著提高EXTRACTION性能。

Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP

  • paper_url: http://arxiv.org/abs/2309.05423
  • repo_url: None
  • paper_authors: Jinzuomu Zhong, Yang Li, Hui Huang, Jie Liu, Zhiba Su, Jing Guo, Benlai Tang, Fengjie Zhu
  • for: 提高文本到语音（TTS）合成的自然度和可控性。
  • methods: 提出了一个两阶段自动标注流程：第一阶段对语音-静音与词-标点（SSWP）对进行对比预训练，以增强从文本-语音联合空间中提取的韵律空间；第二阶段构建多模态韵律标注器。
  • results: 实验证明，所提方法能够自动生成韵律标注并达到当前最佳（SOTA）性能。此外，模型在不同数据量下测试均表现出显著的稳健性。
    Abstract In the realm of expressive Text-to-Speech (TTS), explicit prosodic boundaries significantly advance the naturalness and controllability of synthesized speech. While human prosody annotation contributes a lot to the performance, it is a labor-intensive and time-consuming process, often resulting in inconsistent outcomes. Despite the availability of extensive supervised data, the current benchmark model still faces performance setbacks. To address this issue, a two-stage automatic annotation pipeline is novelly proposed in this paper. Specifically, in the first stage, we propose contrastive text-speech pretraining of Speech-Silence and Word-Punctuation (SSWP) pairs. The pretraining procedure hammers at enhancing the prosodic space extracted from joint text-speech space. In the second stage, we build a multi-modal prosody annotator, which consists of pretrained encoders, a straightforward yet effective text-speech feature fusion scheme, and a sequence classifier. Extensive experiments conclusively demonstrate that our proposed method excels at automatically generating prosody annotation and achieves state-of-the-art (SOTA) performance. Furthermore, our novel model has exhibited remarkable resilience when tested with varying amounts of data.
    摘要 在表达力强的文本至语音(TTS)领域,显著提高自然性和可控性的Explicit prosody bounding significantly advances the naturalness and controllability of synthesized speech. Although human prosody annotation contributes a lot to the performance, it is a labor-intensive and time-consuming process, often resulting in inconsistent outcomes. Despite the availability of extensive supervised data, the current benchmark model still faces performance setbacks. To address this issue, this paper proposes a two-stage automatic annotation pipeline in a novel way. Specifically, in the first stage, we propose contrastive text-speech pretraining of Speech-Silence and Word-Punctuation (SSWP) pairs. The pretraining procedure aims to enhance the prosodic space extracted from the joint text-speech space. In the second stage, we build a multi-modal prosody annotator, which consists of pretrained encoders, a straightforward yet effective text-speech feature fusion scheme, and a sequence classifier. Extensive experiments conclusively demonstrate that our proposed method excels at automatically generating prosody annotation and achieves state-of-the-art (SOTA) performance. Moreover, our novel model has exhibited remarkable resilience when tested with varying amounts of data.

Hazards in Deep Learning Testing: Prevalence, Impact and Recommendations

  • paper_url: http://arxiv.org/abs/2309.05381
  • repo_url: None
  • paper_authors: Salah Ghamizi, Maxime Cordy, Yuejun Guo, Mike Papadakis, And Yves Le Traon
  • for: 这篇论文分析了机器学习测试领域的实证研究，归纳出10种常见的实证评估风险；这些风险可能导致实验结论失真，论文并提出了10条可降低其影响的良好实证实践。
  • methods: 论文首先调研相关文献，从中归纳出10种常见的实证评估风险，随后对发表在顶级软件工程（SE）会议上的30篇有影响力的研究进行敏感性分析，以说明这些风险的重要性。
  • results: 研究发现，这10种风险均有可能使实验结论失效，必须妥善处理；论文据此提出了10条良好实证实践以减轻其影响。
    Abstract Much research on Machine Learning testing relies on empirical studies that evaluate and show their potential. However, in this context empirical results are sensitive to a number of parameters that can adversely impact the results of the experiments and potentially lead to wrong conclusions (Type I errors, i.e., incorrectly rejecting the Null Hypothesis). To this end, we survey the related literature and identify 10 commonly adopted empirical evaluation hazards that may significantly impact experimental results. We then perform a sensitivity analysis on 30 influential studies that were published in top-tier SE venues, against our hazard set and demonstrate their criticality. Our findings indicate that all 10 hazards we identify have the potential to invalidate experimental findings, such as those made by the related literature, and should be handled properly. Going a step further, we propose a point set of 10 good empirical practices that has the potential to mitigate the impact of the hazards. We believe our work forms the first step towards raising awareness of the common pitfalls and good practices within the software engineering community and hopefully contribute towards setting particular expectations for empirical research in the field of deep learning testing.
    摘要 很多机器学习测试研究依赖于实证研究,以评估和显示其潜力。然而,在这种情况下,实证结果受到许多参数的影响,可能导致实验结果不准确(类型一错误,即错正null Hypothesis)。为此,我们对相关文献进行了检查,并确定了10种常见的实证评估障碍,可能对实验结果产生重大影响。然后,我们对30篇发表在首屈SE会议上的影响力很大的研究进行了敏感性分析,以评估这些障碍对实验结果的影响。我们发现,这10种障碍都有可能导致实验结果无效,因此应当正确处理。为了进一步减少这些障碍的影响,我们提出了10种好的实证做法。我们认为,我们的工作是机器学习测试领域的第一步,希望通过提高社区对实证研究的认识,并为这一领域设置特定的期望。

Steps Towards Satisficing Distributed Dynamic Team Trust

  • paper_url: http://arxiv.org/abs/2309.05378
  • repo_url: None
  • paper_authors: Edmund R. Hunt, Chris Baber, Mehdi Sobhani, Sanja Milivojevic, Sagir Yusuf, Mirco Musolesi, Patrick Waterson, Sally Maynard
  • for: 本研究旨在为动态多代理团队定义和测量信任,特别在国防和安全领域。
  • methods: 本研究使用目标和团队价值定义来定义信任,并提出了一组可解释性的信任指标。
  • results: 研究表明,只有在目标和法律原则层次上可以实现对一致,而不可以在团队价值层次上实现。
    Abstract Defining and measuring trust in dynamic, multiagent teams is important in a range of contexts, particularly in defense and security domains. Team members should be trusted to work towards agreed goals and in accordance with shared values. In this paper, our concern is with the definition of goals and values such that it is possible to define 'trust' in a way that is interpretable, and hence usable, by both humans and robots. We argue that the outcome of team activity can be considered in terms of 'goal', 'individual/team values', and 'legal principles'. We question whether alignment is possible at the level of 'individual/team values', or only at the 'goal' and 'legal principles' levels. We argue for a set of metrics to define trust in human-robot teams that are interpretable by human or robot team members, and consider an experiment that could demonstrate the notion of 'satisficing trust' over the course of a simulated mission.
    摘要 在多代理团队中定义和测量信任是非常重要,特别在国防和安全领域。团队成员应该被信任以实现共同目标和共同价值观。在这篇论文中,我们关注的是目标和价值的定义,以便可以定义出可解释的信任。我们认为团队活动的结果可以表示为目标、个人/团队价值和法律原则。我们问题是个体/团队价值是否可以与目标和法律原则保持一致,或者只能保持在目标和法律原则之间。我们提出了一组可解释的信任度定义,并考虑了一个实验,可以证明在模拟任务中实现“满意信任”的概念。

Exploring Minecraft Settlement Generators with Generative Shift Analysis

  • paper_url: http://arxiv.org/abs/2309.05371
  • repo_url: None
  • paper_authors: Jean-Baptiste Hervé, Oliver Withington, Marion Hervé, Laurissa Tokarchuk, Christoph Salge
  • for: 随着程序化内容生成（PCG）受到越来越多的关注，评估和比较不同生成系统的方法与工具变得日益重要。
  • methods: 引入了一种评估生成管道的新方法——生成偏移（Generative Shift），用于量化某个生成过程应用于已有制品时所产生的影响。
  • results: 将该方法应用于生成式设计 Minecraft 竞赛（GDMC）中多个聚落生成器产生的丰富 Minecraft 地图数据集，结果表明它是评估生成管道的一种有前景的视角，有望发展为领域无关的评估方法。
    Abstract With growing interest in Procedural Content Generation (PCG) it becomes increasingly important to develop methods and tools for evaluating and comparing alternative systems. There is a particular lack regarding the evaluation of generative pipelines, where a set of generative systems work in series to make iterative changes to an artifact. We introduce a novel method called Generative Shift for evaluating the impact of individual stages in a PCG pipeline by quantifying the impact that a generative process has when it is applied to a pre-existing artifact. We explore this technique by applying it to a very rich dataset of Minecraft game maps produced by a set of alternative settlement generators developed as part of the Generative Design in Minecraft Competition (GDMC), all of which are designed to produce appropriate settlements for a pre-existing map. While this is an early exploration of this technique we find it to be a promising lens to apply to PCG evaluation, and we are optimistic about the potential of Generative Shift to be a domain-agnostic method for evaluating generative pipelines.
    摘要 随着程序化内容生成（PCG）受到越来越多的关注，开发用于评估和比较不同生成系统的方法与工具变得日益重要。其中，生成管道——即一系列生成系统依次对同一制品进行迭代修改——的评估尤为缺乏。我们提出了一种名为生成偏移（Generative Shift）的新方法，通过量化某个生成过程作用于已有制品时产生的影响，来评估PCG管道中各个阶段的作用。我们将该技术应用于一个非常丰富的 Minecraft 游戏地图数据集，这些地图由生成式设计 Minecraft 竞赛（GDMC）中为已有地图生成合适聚落的多个聚落生成器产生。虽然这只是对该技术的初步探索，但我们发现它是评估 PCG 的一个有前景的视角，并对生成偏移成为领域无关的生成管道评估方法的潜力持乐观态度。
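
A minimal sketch of how the Generative Shift idea could be computed in practice: apply one pipeline stage to a pre-existing artifact and measure how far a set of descriptive metrics moves. The specific metrics and any normalisation are choices of the evaluator; the names in the commented usage are hypothetical.

```python
def generative_shift(artifact, stage, metrics):
    """Sketch of the core idea: run one generative stage on a pre-existing
    artifact and report how much each descriptive metric changes."""
    before = {name: fn(artifact) for name, fn in metrics.items()}
    after_artifact = stage(artifact)
    after = {name: fn(after_artifact) for name, fn in metrics.items()}
    return {name: after[name] - before[name] for name in metrics}

# Hypothetical usage with toy metrics on a Minecraft-like voxel map:
# shift = generative_shift(base_map, settlement_generator,
#                          {"built_volume": count_placed_blocks,
#                           "height_variance": height_variance})
```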

Feature-based Transferable Disruption Prediction for future tokamaks using domain adaptation

  • paper_url: http://arxiv.org/abs/2309.05361
  • repo_url: None
  • paper_authors: Chengshuo Shen, Wei Zheng, Bihao Guo, Dalong Chen, Xinkun Ai, Fengming Xue, Yu Zhong, Nengchao Wang, Biao Shen, Binjia Xiao, Yonghua Ding, Zhongyong Chen, Yuan Pan, J-TEXT team
  • for: 预测未来tokamak中的干扰 (predicting disruptions in future tokamaks)
  • methods: 使用域 adaptation算法CORAL,将未来tokamak数据和现有tokamak数据相互对应,然后使用机器学习模型进行预测 (using domain adaptation algorithm CORAL to align data from future tokamaks and existing tokamaks, and then using a machine learning model for prediction)
  • results: 提高了未来tokamak中预测干扰性能 (improved disruption prediction performance for future tokamaks)
    Abstract The high acquisition cost and the significant demand for disruptive discharges for data-driven disruption prediction models in future tokamaks pose an inherent contradiction in disruption prediction research. In this paper, we demonstrated a novel approach to predict disruption in a future tokamak only using a few discharges based on a domain adaptation algorithm called CORAL. It is the first attempt at applying domain adaptation in the disruption prediction task. In this paper, this disruption prediction approach aligns a few data from the future tokamak (target domain) and a large amount of data from the existing tokamak (source domain) to train a machine learning model in the existing tokamak. To simulate the existing and future tokamak case, we selected J-TEXT as the existing tokamak and EAST as the future tokamak. To simulate the lack of disruptive data in future tokamak, we only selected 100 non-disruptive discharges and 10 disruptive discharges from EAST as the target domain training data. We have improved CORAL to make it more suitable for the disruption prediction task, called supervised CORAL. Compared to the model trained by mixing data from the two tokamaks, the supervised CORAL model can enhance the disruption prediction performance for future tokamaks (AUC value from 0.764 to 0.890). Through interpretable analysis, we discovered that using the supervised CORAL enables the transformation of data distribution to be more similar to future tokamak. An assessment method for evaluating whether a model has learned a trend of similar features is designed based on SHAP analysis. It demonstrates that the supervised CORAL model exhibits more similarities to the model trained on large data sizes of EAST. FTDP provides a light, interpretable, and few-data-required way by aligning features to predict disruption using small data sizes from the future tokamak.
    摘要 高的投资成本和未来tokamak中数据驱动干扰预测模型的强大需求形成了这种研究的内在矛盾。在这篇论文中,我们提出了一种新的方法,可以在未来tokamak中预测干扰,只使用几个数据。我们使用了域适应算法called CORAL,这是对域适应 task的第一次应用。在这篇论文中,我们将未来tokamak中的数据与现有tokamak中的大量数据进行了对应,以训练一个机器学习模型。为了模拟现有和未来tokamak的情况,我们选择了J-TEXT作为现有tokamak,并选择了EAST作为未来tokamak。为了模拟未来tokamak中缺乏干扰数据的情况,我们只选择了100个非干扰的燃烧和10个干扰的燃烧作为目标域训练数据。我们对CORAL进行了改进,以使其更适合干扰预测任务,称为超级vised CORAL。相比于将数据从两个tokamak混合训练的模型,超级vised CORAL模型可以提高未来tokamak中的干扰预测性能(AUC值从0.764提高到0.890)。通过可解释分析,我们发现使用超级vised CORAL可以使数据分布更加类似于未来tokamak。我们设计了一种基于SHAP分析的评估方法,以判断模型是否学习了类似特征的趋势。结果显示,超级vised CORAL模型更加类似于基于EAST大量数据训练的模型。FTDP提供了一种轻量级、可解释、只需几据的方法,可以通过对未来tokamak中的数据进行对应,预测干扰。
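
For context, the sketch below shows the standard (unsupervised) CORAL alignment that the paper builds on: re-coloring source-domain features so their covariance matches the small target-domain set before training the predictor. The paper's supervised CORAL variant additionally exploits label information, which is not shown here.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def coral_align(source_X, target_X, eps=1e-5):
    """Sketch of standard CORAL (correlation alignment): re-color features of
    the existing tokamak (e.g. J-TEXT) so their second-order statistics match
    the few discharges available from the future tokamak (e.g. EAST)."""
    d = source_X.shape[1]
    cs = np.cov(source_X, rowvar=False) + eps * np.eye(d)
    ct = np.cov(target_X, rowvar=False) + eps * np.eye(d)
    whiten = fractional_matrix_power(cs, -0.5)   # remove source correlations
    recolor = fractional_matrix_power(ct, 0.5)   # impose target correlations
    aligned = source_X @ whiten @ recolor
    return np.real(aligned)  # train the disruption predictor on these features
```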

Semantic Latent Decomposition with Normalizing Flows for Face Editing

  • paper_url: http://arxiv.org/abs/2309.05314
  • repo_url: https://github.com/phil329/sdflow
  • paper_authors: Binglei Li, Zhizhong Huang, Hongming Shan, Junping Zhang
  • for: 这篇论文旨在提出一种新的人脸编辑方法，以解决 StyleGAN 潜空间中不同属性相互纠缠的问题。
  • methods: 该方法使用连续条件归一化流在原始潜空间中进行语义分解，并联合优化两个组件：(i) 一个语义编码器，用于从输入人脸估计语义变量；(ii) 一个基于流的变换模块，在所学语义变量的条件下，将潜编码映射为服从高斯分布、与语义无关的变量。
  • results: 实验结果表明，SDFlow 在定性和定量上均优于现有最先进的人脸编辑方法。
    Abstract Navigating in the latent space of StyleGAN has shown effectiveness for face editing. However, the resulting methods usually encounter challenges in complicated navigation due to the entanglement among different attributes in the latent space. To address this issue, this paper proposes a novel framework, termed SDFlow, with a semantic decomposition in original latent space using continuous conditional normalizing flows. Specifically, SDFlow decomposes the original latent code into different irrelevant variables by jointly optimizing two components: (i) a semantic encoder to estimate semantic variables from input faces and (ii) a flow-based transformation module to map the latent code into a semantic-irrelevant variable in Gaussian distribution, conditioned on the learned semantic variables. To eliminate the entanglement between variables, we employ a disentangled learning strategy under a mutual information framework, thereby providing precise manipulation controls. Experimental results demonstrate that SDFlow outperforms existing state-of-the-art face editing methods both qualitatively and quantitatively. The source code is made available at https://github.com/phil329/SDFlow.
    摘要 在 StyleGAN 的潜在空间中导航已被证明对面部编辑十分有效。然而,由于潜在空间中不同属性之间的纠缠,现有方法在复杂的导航中常常遇到困难。为了解决这一问题,本文提出了一种名为 SDFlow 的新框架,利用连续条件归一化流(conditional normalizing flows)在原始潜在空间中进行语义分解。具体而言,SDFlow 通过联合优化两个组件,将原始潜在编码分解为互不相关的变量:(i)一个语义编码器,用于从输入人脸中估计语义变量;(ii)一个基于流的变换模块,在所学语义变量的条件下,将潜在编码映射为服从高斯分布的语义无关变量。为了消除变量之间的纠缠,我们在互信息框架下采用解耦学习策略,从而提供精确的操控能力。实验结果表明,SDFlow 在定性和定量上均优于现有的最先进面部编辑方法。源代码发布于 https://github.com/phil329/SDFlow 。

Unsupervised human-to-robot motion retargeting via expressive latent space

  • paper_url: http://arxiv.org/abs/2309.05310
  • repo_url: None
  • paper_authors: Yashuai Yan, Esteve Valls Mascaro, Dongheui Lee
  • for: 这篇论文提出了一种新的人机动作重定向方法,使机器人能够精确模仿人类动作,同时保留动作的语义。
  • methods: 我们提出了一种深度学习方法,直接将人类动作转换为机器人动作。该方法不需要成对标注的人类-机器人动作数据,从而降低了在新机器人上部署的工作量。
  • results: 所提出的方法可以精确地控制机器人动作,并且可以通过在潜在空间中对两个投影的人类姿态进行简单的线性插值来生成中间动作。我们还对文本、RGB 视频和关键姿势等多种输入模式进行了评估,提高了各类用户控制机器人的便利性。
    Abstract This paper introduces a novel approach for human-to-robot motion retargeting, enabling robots to mimic human motion with precision while preserving the semantics of the motion. For that, we propose a deep learning method for direct translation from human to robot motion. Our method does not require annotated paired human-to-robot motion data, which reduces the effort when adopting new robots. To this end, we first propose a cross-domain similarity metric to compare the poses from different domains (i.e., human and robot). Then, our method achieves the construction of a shared latent space via contrastive learning and decodes latent representations to robot motion control commands. The learned latent space exhibits expressiveness as it captures the motions precisely and allows direct motion control in the latent space. We showcase how to generate in-between motion through simple linear interpolation in the latent space between two projected human poses. Additionally, we conducted a comprehensive evaluation of robot control using diverse modality inputs, such as texts, RGB videos, and key-poses, which enhances the ease of robot control to users of all backgrounds. Finally, we compare our model with existing works and quantitatively and qualitatively demonstrate the effectiveness of our approach, enhancing natural human-robot communication and fostering trust in integrating robots into daily life.
    摘要 First, we propose a cross-domain similarity metric to compare human and robot poses. Then, we use contrastive learning to construct a shared latent space that captures human motions precisely and allows for direct motion control in the latent space. We show that the learned latent space is expressive and can be used to generate in-between motion through linear interpolation. We also evaluate the effectiveness of our approach using diverse modality inputs, such as texts, RGB videos, and key-poses. Our model outperforms existing works and demonstrates the potential for natural human-robot communication and trust in integrating robots into daily life. Here is the Simplified Chinese translation of the text:这篇论文提出了一种新的人机动作重定向方法,使机器人能够精确地模仿人类动作,同时保留动作的语义。为此,我们提出了一种深度学习方法,直接将人类动作转化为机器人动作。我们的方法不需要成对标注的人机动作数据,从而降低了部署到新机器人时的工作量。首先,我们提出了跨域相似度度量来比较人类和机器人的姿态。然后,我们使用对比学习构建一个共享潜在空间,该潜在空间能够精确地捕捉人类动作,并允许直接在潜在空间中控制机器人动作。我们表明所学潜在空间具有很强的表达能力,并可以通过在潜在空间中对两个投影的人类姿态进行简单的线性插值来生成中间动作。我们还使用文本、RGB 视频和关键姿势等多种输入模式进行了全面评估。我们的模型优于现有工作,展示了自然人机交流的潜力,有助于建立将机器人融入日常生活的信任。
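
The in-between motion generation described above amounts to a straight line in the shared latent space between two projected human poses. The sketch below illustrates that step only; the `pose_encoder` and `robot_decoder` networks, their sizes, and the pose/command dimensions are placeholders, not the paper's architecture.

```python
import torch

# Hypothetical stand-ins for the paper's networks: an encoder projecting human poses
# into the shared latent space, and a decoder mapping latent codes to robot commands.
pose_encoder = torch.nn.Sequential(torch.nn.Linear(51, 128), torch.nn.ReLU(), torch.nn.Linear(128, 64))
robot_decoder = torch.nn.Sequential(torch.nn.Linear(64, 128), torch.nn.ReLU(), torch.nn.Linear(128, 30))

def in_between_motion(pose_a: torch.Tensor, pose_b: torch.Tensor, steps: int = 10) -> torch.Tensor:
    """Linearly interpolate between two projected human poses in latent space
    and decode each intermediate latent code to robot control commands."""
    z_a, z_b = pose_encoder(pose_a), pose_encoder(pose_b)
    alphas = torch.linspace(0.0, 1.0, steps).unsqueeze(1)
    z_path = (1 - alphas) * z_a + alphas * z_b   # straight line in the latent space
    return robot_decoder(z_path)                 # (steps, robot_dof) motion commands

robot_motion = in_between_motion(torch.randn(1, 51), torch.randn(1, 51))
print(robot_motion.shape)  # torch.Size([10, 30])
```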

Discrete Denoising Diffusion Approach to Integer Factorization

  • paper_url: http://arxiv.org/abs/2309.05295
  • repo_url: https://github.com/karlisfre/diffusion-factorization
  • paper_authors: Karlis Freivalds, Emils Ozolins, Guntis Barzdins
  • for: 这篇论文研究著名的整数因数分解问题(目前尚不清楚该问题能否在多项式时间内求解),并探索深度神经网络能否加速因数分解。
  • methods: 该论文使用深度神经网络和离散去噪扩散来实现因数分解,其思路是对一个部分正确的解反复迭代地修正错误。
  • results: 实验结果表明,该方法能够为长度最多 56 位(bit)的整数找到因子。分析还发现,增加训练投入会使推理阶段达到给定成功率所需的采样步数呈指数级下降,从而抵消随位长增加而指数增长的运行时间。
    Abstract Integer factorization is a famous computational problem unknown whether being solvable in the polynomial time. With the rise of deep neural networks, it is interesting whether they can facilitate faster factorization. We present an approach to factorization utilizing deep neural networks and discrete denoising diffusion that works by iteratively correcting errors in a partially-correct solution. To this end, we develop a new seq2seq neural network architecture, employ relaxed categorical distribution and adapt the reverse diffusion process to cope better with inaccuracies in the denoising step. The approach is able to find factors for integers of up to 56 bits long. Our analysis indicates that investment in training leads to an exponential decrease of sampling steps required at inference to achieve a given success rate, thus counteracting an exponential run-time increase depending on the bit-length.
    摘要 整数因式分解是一个著名的计算问题,目前尚不清楚它能否在多项式时间内求解。随着深度神经网络的兴起,它们能否加速因式分解是一个有趣的问题。我们提出了一种结合深度神经网络与离散去噪扩散的因式分解方法,通过对部分正确的解反复纠正错误来求解。为此,我们开发了一种新的 seq2seq 神经网络架构,采用松弛的类别分布,并改进反向扩散过程以更好地应对去噪步骤中的误差。该方法能够为长度最多 56 位的整数找到因子。我们的分析表明,增加训练投入会使推理时达到给定成功率所需的采样步数呈指数级下降,从而抵消随位长增加而指数增长的运行时间。
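
The inference procedure repeatedly asks a denoising network to correct a partially-correct guess of the factor bits. The loop below is a toy sketch of that idea with a stand-in denoiser; the real method uses the paper's seq2seq network, a relaxed categorical distribution, and an adapted reverse diffusion schedule, none of which are reproduced here.

```python
import torch

def factorize_by_denoising(n_bits: torch.Tensor, denoiser, steps: int = 64, bits: int = 28) -> torch.Tensor:
    """Iteratively refine a random guess of the factor bits, in the spirit of
    discrete denoising diffusion. `denoiser` stands in for the paper's seq2seq
    network and returns per-bit probabilities for the corrected factors."""
    factors = torch.randint(0, 2, (2, bits)).float()   # random initial (p, q) bit guesses
    for _ in range(steps):
        probs = denoiser(n_bits, factors)              # predicted per-bit probabilities
        factors = torch.bernoulli(probs)               # resample a (hopefully) less noisy guess
    return factors

# Toy stand-in denoiser: ignores its inputs and outputs uniform probabilities.
dummy_denoiser = lambda n, f: torch.full_like(f, 0.5)
print(factorize_by_denoising(torch.randint(0, 2, (56,)).float(), dummy_denoiser).shape)
```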

Addressing Feature Imbalance in Sound Source Separation

  • paper_url: http://arxiv.org/abs/2309.05287
  • repo_url: None
  • paper_authors: Jaechang Kim, Jeongyeon Hwang, Soheun Yi, Jaewoong Cho, Jungseul Ok
  • for: 这个论文是为了解决神经网络在源分离任务中的特征偏好问题。
  • methods: 这篇论文提出了一种名为 FEABASE 的方法,通过抑制易学特征(easy feature)来缓解特征偏好问题。
  • results: 在多通道源分离任务中,FEABASE方法可以有效地使用数据,并且可以解决特征偏好问题。
    Abstract Neural networks often suffer from a feature preference problem, where they tend to overly rely on specific features to solve a task while disregarding other features, even if those neglected features are essential for the task. Feature preference problems have primarily been investigated in classification task. However, we observe that feature preference occurs in high-dimensional regression task, specifically, source separation. To mitigate feature preference in source separation, we propose FEAture BAlancing by Suppressing Easy feature (FEABASE). This approach enables efficient data utilization by learning hidden information about the neglected feature. We evaluate our method in a multi-channel source separation task, where feature preference between spatial feature and timbre feature appears.
    摘要 神经网络常常面临特征偏好问题,即它们会过度依赖特定特征来解决任务,而忽略其他特征,即使这些被忽略的特征对任务是必需的。特征偏好问题主要在分类任务中被研究,但我们观察到,在高维回归任务,特别是源分离任务中,同样存在特征偏好。为了缓解源分离中的特征偏好,我们提出了 FEAture BAlancing by Suppressing Easy feature(FEABASE)方法。该方法通过学习被忽略特征中的隐藏信息来高效利用数据。我们在多通道源分离任务中评估了该方法,该任务中空间特征与音色特征之间存在特征偏好。

Can you text what is happening? Integrating pre-trained language encoders into trajectory prediction models for autonomous driving

  • paper_url: http://arxiv.org/abs/2309.05282
  • repo_url: None
  • paper_authors: Ali Keysan, Andreas Look, Eitan Kosman, Gonca Gürsun, Jörg Wagner, Yu Yao, Barbara Rakitsch
  • for: 本研究旨在提出一种新的文本基于表示方法,用于描述交通场景,并使用预训练语言编码器进行处理。
  • methods: 本研究使用文本基于表示方法,与经典化的图像表示方法相结合,实现描述场景的嵌入。
  • results: 研究表明,将基于文本的表示方法与经典的栅格化图像表示相结合,可以获得更具描述性的场景嵌入。此外,在 nuScenes 数据集上的预测结果相比基线有显著提升。最后,消融实验表明,文本与图像的联合编码器优于各自单独的编码器,证实两种表示具有互补的优势。
    Abstract In autonomous driving tasks, scene understanding is the first step towards predicting the future behavior of the surrounding traffic participants. Yet, how to represent a given scene and extract its features are still open research questions. In this study, we propose a novel text-based representation of traffic scenes and process it with a pre-trained language encoder. First, we show that text-based representations, combined with classical rasterized image representations, lead to descriptive scene embeddings. Second, we benchmark our predictions on the nuScenes dataset and show significant improvements compared to baselines. Third, we show in an ablation study that a joint encoder of text and rasterized images outperforms the individual encoders confirming that both representations have their complementary strengths.
    摘要 在自动驾驶任务中,场景理解是预测周围交通参与者未来行为的第一步。然而,如何表示给定场景并提取其特征仍是开放的研究问题。在本研究中,我们提出了一种基于文本的交通场景表示方法,并使用预训练的语言编码器对其进行处理。首先,我们表明基于文本的表示与经典的栅格化图像表示相结合,可以得到更具描述性的场景嵌入。其次,我们在 nuScenes 数据集上进行了评估,相比基线取得了显著提升。最后,消融实验表明,文本与栅格化图像的联合编码器优于各自单独的编码器,证实两种表示具有互补的优势。
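
A simple way to picture the text-based scene representation is to render the scene as a sentence and embed it with a frozen pretrained language encoder. The sketch below assumes a hand-written template and `distilbert-base-uncased` as the encoder; the paper's actual templates and encoder choice may differ.

```python
import torch
from transformers import AutoTokenizer, AutoModel

def describe_scene(ego_speed: float, agents: list) -> str:
    """Render a traffic scene as text; the template here is an illustrative assumption."""
    parts = [f"The ego vehicle drives at {ego_speed:.1f} m/s."]
    for a in agents:
        parts.append(f"A {a['type']} is {a['dist']:.0f} m {a['direction']} of the ego vehicle.")
    return " ".join(parts)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encoder = AutoModel.from_pretrained("distilbert-base-uncased")

text = describe_scene(8.3, [{"type": "pedestrian", "dist": 12, "direction": "ahead"},
                            {"type": "car", "dist": 25, "direction": "behind"}])
with torch.no_grad():
    tokens = tokenizer(text, return_tensors="pt")
    scene_embedding = encoder(**tokens).last_hidden_state.mean(dim=1)  # (1, hidden_dim)
# This embedding would be fused with a rasterized-image embedding before the prediction head.
print(scene_embedding.shape)
```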

EANet: Expert Attention Network for Online Trajectory Prediction

  • paper_url: http://arxiv.org/abs/2309.05683
  • repo_url: None
  • paper_authors: Pengfei Yao, Tianlu Mao, Min Shi, Jingkai Sun, Zhaoqi Wang
  • for: 提高自动驾驶中的轨迹预测精度,解决现有主流研究和连续学习方法在快速变化的场景下的预测精度低下问题。
  • methods: 提出了专家注意力网络,一种完整的在线学习框架,通过调整网络层次的权重,解决了Gradient Problem问题,使模型更快地学习新场景知识,恢复预测精度。还提出了短期运动趋势kernel函数,敏感于场景变化,让模型快速响应。
  • results: 对比 Traditional methods,我们的方法可以快速降低预测错误,达到领域的最佳预测精度。
    Abstract Trajectory prediction plays a crucial role in autonomous driving. Existing mainstream research and continuoual learning-based methods all require training on complete datasets, leading to poor prediction accuracy when sudden changes in scenarios occur and failing to promptly respond and update the model. Whether these methods can make a prediction in real-time and use data instances to update the model immediately(i.e., online learning settings) remains a question. The problem of gradient explosion or vanishing caused by data instance streams also needs to be addressed. Inspired by Hedge Propagation algorithm, we propose Expert Attention Network, a complete online learning framework for trajectory prediction. We introduce expert attention, which adjusts the weights of different depths of network layers, avoiding the model updated slowly due to gradient problem and enabling fast learning of new scenario's knowledge to restore prediction accuracy. Furthermore, we propose a short-term motion trend kernel function which is sensitive to scenario change, allowing the model to respond quickly. To the best of our knowledge, this work is the first attempt to address the online learning problem in trajectory prediction. The experimental results indicate that traditional methods suffer from gradient problems and that our method can quickly reduce prediction errors and reach the state-of-the-art prediction accuracy.
    摘要 轨迹预测在自动驾驶中至关重要。现有的主流研究和基于持续学习的方法都需要在完整数据集上训练,导致场景突变时预测精度下降,且无法及时响应并更新模型。这些方法能否实时进行预测并立即利用数据实例更新模型(即在线学习设置)仍是一个问题,数据实例流引起的梯度爆炸或消失问题也有待解决。受 Hedge Propagation 算法启发,我们提出了专家注意力网络(Expert Attention Network),这是一个完整的轨迹预测在线学习框架。我们引入专家注意力来调整网络不同深度层的权重,避免模型因梯度问题而更新缓慢,使其能够快速学习新场景的知识并恢复预测精度。此外,我们提出了一种对场景变化敏感的短期运动趋势核函数,使模型能够快速响应。据我们所知,这是首个针对轨迹预测在线学习问题的工作。实验结果表明,传统方法受梯度问题影响,而我们的方法能够快速降低预测误差,达到最先进的预测精度。
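
Since the expert attention mechanism is inspired by Hedge Propagation, its spirit can be illustrated with a plain multiplicative-weights (Hedge) update over a set of experts, for example prediction heads attached at different network depths. The experts, losses, and learning rate below are illustrative assumptions, not the paper's exact update.

```python
import numpy as np

def hedge_update(weights: np.ndarray, losses: np.ndarray, eta: float = 0.5) -> np.ndarray:
    """Multiplicative-weights (Hedge) update: experts with lower loss gain weight."""
    w = weights * np.exp(-eta * losses)
    return w / w.sum()

rng = np.random.default_rng(0)
n_experts = 4                                   # e.g., prediction heads at different depths
weights = np.full(n_experts, 1.0 / n_experts)
for step in range(100):                         # stream of trajectory samples (online setting)
    preds = rng.normal(size=n_experts)          # each expert's prediction for this sample
    target = 0.0                                # ground-truth value for this toy stream
    combined = float(weights @ preds)           # attention-weighted ensemble prediction
    losses = (preds - target) ** 2              # per-expert losses on the new sample
    weights = hedge_update(weights, losses)
print(weights)                                  # experts that track the scene best dominate
```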

AutoFuse: Automatic Fusion Networks for Deformable Medical Image Registration

  • paper_url: http://arxiv.org/abs/2309.05271
  • repo_url: https://github.com/mungomeng/registration-autofuse
  • paper_authors: Mingyuan Meng, Michael Fulham, Dagan Feng, Lei Bi, Jinman Kim
  • for: 本研究旨在解决deep neural network(DNN)基于的扭曲图像匹配中的空间相对性问题,以便实现医疗任务中的肿瘤生长监测和人口分析等。
  • methods: 我们提出了一种数据驱动的拼接策略(AutoFuse),以便在DNN中自动调整匹配的空间相对性策略。我们还提出了一种拼接门(Fusion Gate,FG)模块,以控制在每个网络位置上如何拼接信息。
  • results: 我们的AutoFuse在两个well-benchmarked医疗匹配任务(inter-和intra-patient匹配)上,使用八个公共数据集进行了广泛的实验,并证明了它在无标签和弱标签的情况下超过了现有的无监督和半监督匹配方法。
    Abstract Deformable image registration aims to find a dense non-linear spatial correspondence between a pair of images, which is a crucial step for many medical tasks such as tumor growth monitoring and population analysis. Recently, Deep Neural Networks (DNNs) have been widely recognized for their ability to perform fast end-to-end registration. However, DNN-based registration needs to explore the spatial information of each image and fuse this information to characterize spatial correspondence. This raises an essential question: what is the optimal fusion strategy to characterize spatial correspondence? Existing fusion strategies (e.g., early fusion, late fusion) were empirically designed to fuse information by manually defined prior knowledge, which inevitably constrains the registration performance within the limits of empirical designs. In this study, we depart from existing empirically-designed fusion strategies and develop a data-driven fusion strategy for deformable image registration. To achieve this, we propose an Automatic Fusion network (AutoFuse) that provides flexibility to fuse information at many potential locations within the network. A Fusion Gate (FG) module is also proposed to control how to fuse information at each potential network location based on training data. Our AutoFuse can automatically optimize its fusion strategy during training and can be generalizable to both unsupervised registration (without any labels) and semi-supervised registration (with weak labels provided for partial training data). Extensive experiments on two well-benchmarked medical registration tasks (inter- and intra-patient registration) with eight public datasets show that our AutoFuse outperforms state-of-the-art unsupervised and semi-supervised registration methods.
    摘要 可变形图像配准旨在找到一对图像之间稠密的非线性空间对应关系,这是肿瘤生长监测和群体分析等许多医学任务的关键步骤。近年来,深度神经网络(DNN)因其能够进行快速的端到端配准而受到广泛认可。然而,基于 DNN 的配准需要挖掘每幅图像的空间信息,并将这些信息融合以刻画空间对应关系。这引出了一个关键问题:什么是刻画空间对应关系的最优融合策略?现有的融合策略(如早期融合、晚期融合)是依据人工定义的先验知识经验性设计的,这不可避免地将配准性能限制在经验设计的范围内。在本研究中,我们摒弃现有的经验性融合策略,为可变形图像配准开发了一种数据驱动的融合策略。为此,我们提出了自动融合网络(AutoFuse),它可以在网络内的多个潜在位置灵活地融合信息,并通过融合门(Fusion Gate,FG)模块根据训练数据控制在每个潜在位置如何融合信息。AutoFuse 能够在训练期间自动优化其融合策略,并可推广到无监督配准(不需要任何标签)和半监督配准(仅为部分训练数据提供弱标签)。我们在两个公认的医学配准任务(患者间与患者内配准)上用八个公共数据集进行了大量实验,结果表明 AutoFuse 优于最先进的无监督和半监督配准方法。
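
The Fusion Gate idea can be pictured as a small learnable module that decides, at each location, how much information to take from each of two feature streams. The sketch below shows one plausible gating design for 3D medical feature maps; AutoFuse's actual FG module and where it is inserted in the network may differ.

```python
import torch
import torch.nn as nn

class FusionGate(nn.Module):
    """Minimal sketch of a learnable gate that decides, per voxel and channel,
    how much information to take from each of two feature streams."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv3d(2 * channels, channels, kernel_size=1), nn.Sigmoid())

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([feat_a, feat_b], dim=1))   # gate values in (0, 1)
        return g * feat_a + (1.0 - g) * feat_b              # gated blend of the two streams

fg = FusionGate(channels=16)
a = torch.randn(1, 16, 8, 8, 8)   # e.g., moving-image features
b = torch.randn(1, 16, 8, 8, 8)   # e.g., fixed-image features
print(fg(a, b).shape)             # torch.Size([1, 16, 8, 8, 8])
```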

UniKG: A Benchmark and Universal Embedding for Large-Scale Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2309.05269
  • repo_url: https://github.com/yide-qiu/unikg
  • paper_authors: Yide Qiu, Shaoxiang Ling, Tong Zhang, Bo Huang, Zhen Cui
  • for: This paper is written to explore useful knowledge from real-world data by constructing a large-scale heterogeneous graph (HG) benchmark dataset named UniKG from Wikidata, and to propose effective learning methods for large-scale HGs.
  • methods: The paper proposes two key measures for effective learning on large-scale HGs, including a semantic alignment strategy for multi-attribute entities and a novel plug-and-play anisotropy propagation module (APM) to learn effective multi-hop anisotropy propagation kernels.
  • results: The paper sets up a node classification task on the UniKG dataset and evaluates multiple baseline methods, which demonstrate the effectiveness of the proposed methods in mining multi-attribute association through multi-hop aggregation in large-scale HGs.
    Abstract Irregular data in real-world are usually organized as heterogeneous graphs (HGs) consisting of multiple types of nodes and edges. To explore useful knowledge from real-world data, both the large-scale encyclopedic HG datasets and corresponding effective learning methods are crucial, but haven't been well investigated. In this paper, we construct a large-scale HG benchmark dataset named UniKG from Wikidata to facilitate knowledge mining and heterogeneous graph representation learning. Overall, UniKG contains more than 77 million multi-attribute entities and 2000 diverse association types, which significantly surpasses the scale of existing HG datasets. To perform effective learning on the large-scale UniKG, two key measures are taken, including (i) the semantic alignment strategy for multi-attribute entities, which projects the feature description of multi-attribute nodes into a common embedding space to facilitate node aggregation in a large receptive field; (ii) proposing a novel plug-and-play anisotropy propagation module (APM) to learn effective multi-hop anisotropy propagation kernels, which extends methods of large-scale homogeneous graphs to heterogeneous graphs. These two strategies enable efficient information propagation among a tremendous number of multi-attribute entities and meantimes adaptively mine multi-attribute association through the multi-hop aggregation in large-scale HGs. We set up a node classification task on our UniKG dataset, and evaluate multiple baseline methods which are constructed by embedding our APM into large-scale homogenous graph learning methods. Our UniKG dataset and the baseline codes have been released at https://github.com/Yide-Qiu/UniKG.
    摘要 现实世界中的不规则数据通常是多种类型的节点和边组成的不规则图(HG)。为了从现实世界数据中提取有用的知识, both the large-scale encyclopedic HG datasets and corresponding effective learning methods are crucial, but have not been well investigated. In this paper, we construct a large-scale HG benchmark dataset named UniKG from Wikidata to facilitate knowledge mining and heterogeneous graph representation learning. Overall, UniKG contains more than 77 million multi-attribute entities and 2000 diverse association types, which significantly surpasses the scale of existing HG datasets. To perform effective learning on the large-scale UniKG, two key measures are taken, including (i) the semantic alignment strategy for multi-attribute entities, which projects the feature description of multi-attribute nodes into a common embedding space to facilitate node aggregation in a large receptive field; (ii) proposing a novel plug-and-play anisotropy propagation module (APM) to learn effective multi-hop anisotropy propagation kernels, which extends methods of large-scale homogeneous graphs to heterogeneous graphs. These two strategies enable efficient information propagation among a tremendous number of multi-attribute entities and meantimes adaptively mine multi-attribute association through the multi-hop aggregation in large-scale HGs. We set up a node classification task on our UniKG dataset, and evaluate multiple baseline methods which are constructed by embedding our APM into large-scale homogeneous graph learning methods. Our UniKG dataset and the baseline codes have been released at https://github.com/Yide-Qiu/UniKG.

Unsupervised Bias Detection in College Student Newspapers

  • paper_url: http://arxiv.org/abs/2309.06557
  • repo_url: None
  • paper_authors: Adam M. Lehavi, William McCormack, Noah Kornfeld, Solomon Glazer
  • for: 这篇论文是为了寻找和检测大学报纸存档中的偏见而写的。
  • methods: 这篇论文提出了一个框架,用于从自动化工具难以抓取数据的复杂存档网站中采集数据,并由此构建了包含 14 份学生报纸、共 23,154 篇条目的数据集。这些数据可以按关键词查询,通过比较大语言模型生成的摘要与原文的情感来计算偏见。
  • results: 这篇论文的结果表明,使用这种方法可以获得更加精细的偏见分析结果,而无需大量的标注数据和比较偏见。这种方法还可以检测政治敏感词和控制词的偏见,从而帮助更好地理解学生报纸的偏见。
    Abstract This paper presents a pipeline with minimal human influence for scraping and detecting bias on college newspaper archives. This paper introduces a framework for scraping complex archive sites that automated tools fail to grab data from, and subsequently generates a dataset of 14 student papers with 23,154 entries. This data can also then be queried by keyword to calculate bias by comparing the sentiment of a large language model summary to the original article. The advantages of this approach are that it is less comparative than reconstruction bias and requires less labelled data than generating keyword sentiment. Results are calculated on politically charged words as well as control words to show how conclusions can be drawn. The complete method facilitates the extraction of nuanced insights with minimal assumptions and categorizations, paving the way for a more objective understanding of bias within student newspaper sources.
    摘要 这篇论文介绍了一个几乎无需人工干预的管道,用于抓取大学报纸存档并检测其中的偏见。论文提出了一个框架,用于从自动化工具无法抓取数据的复杂存档网站采集数据,并由此生成了包含 14 份学生报纸、共计 23,154 个条目的数据集。这些数据可以通过关键词查询,将大语言模型生成摘要的情感与原文进行比较来计算偏见。这种方法的优点是相比重构式偏见度量更少依赖对比,且相比生成关键词情感需要更少的标注数据。论文在政治敏感词和对照词上计算了结果,以展示如何得出结论。该方法有助于在最少假设与分类的前提下提取细致的偏见洞察,为更客观地理解学生报纸中的偏见铺平道路。
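
The core comparison in the pipeline is between the sentiment of a large-language-model summary and the sentiment of the original article. A minimal sketch is shown below; the sentiment model (the default Hugging Face sentiment pipeline) and the stand-in summariser are assumptions, not the paper's exact components.

```python
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")  # default English sentiment model (an assumption)

def signed_score(text: str) -> float:
    """Map a sentiment prediction to a signed score in [-1, 1]."""
    out = sentiment(text[:512])[0]
    return out["score"] if out["label"] == "POSITIVE" else -out["score"]

def bias_gap(article: str, summarize) -> float:
    """Difference between the sentiment of an LLM summary and the original article.
    `summarize` is a placeholder for the large-language-model summarisation call."""
    return signed_score(summarize(article)) - signed_score(article)

# Toy usage with a trivial stand-in summariser (keeps only the first sentence).
article = "The senator's plan was praised by students. Critics called it reckless and costly."
print(bias_gap(article, summarize=lambda t: t.split(". ")[0]))
```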

Enabling Runtime Verification of Causal Discovery Algorithms with Automated Conditional Independence Reasoning (Extended Version)

  • paper_url: http://arxiv.org/abs/2309.05264
  • repo_url: None
  • paper_authors: Pingchuan Ma, Zhenlan Ji, Peisen Yao, Shuai Wang, Kui Ren
  • for: 这个论文的目的是提出一种可靠和安全的 causal discovery 算法,以满足可靠性和隐私性两个方面的要求。
  • methods: 这个论文使用了一种名为 CICheck 的运行时验证工具,该工具可以帮助检测 causal discovery 算法中的不可靠和过多 CI 测试,并提供一种有效的解决方案。 CICheck 使用了一种声明式的编码方案,将 CIR 问题转化为 SMT 问题,并提供了一种四个阶段的决策过程,以及三种轻量级优化技术来提高效率。
  • results: 这个论文的实验结果表明,CICheck 可以帮助提高 causal discovery 算法的可靠性和隐私性,并且可以减少过多的 CI 测试数量。
    Abstract Causal discovery is a powerful technique for identifying causal relationships among variables in data. It has been widely used in various applications in software engineering. Causal discovery extensively involves conditional independence (CI) tests. Hence, its output quality highly depends on the performance of CI tests, which can often be unreliable in practice. Moreover, privacy concerns arise when excessive CI tests are performed. Despite the distinct nature between unreliable and excessive CI tests, this paper identifies a unified and principled approach to addressing both of them. Generally, CI statements, the outputs of CI tests, adhere to Pearl's axioms, which are a set of well-established integrity constraints on conditional independence. Hence, we can either detect erroneous CI statements if they violate Pearl's axioms or prune excessive CI statements if they are logically entailed by Pearl's axioms. Holistically, both problems boil down to reasoning about the consistency of CI statements under Pearl's axioms (referred to as CIR problem). We propose a runtime verification tool called CICheck, designed to harden causal discovery algorithms from reliability and privacy perspectives. CICheck employs a sound and decidable encoding scheme that translates CIR into SMT problems. To solve the CIR problem efficiently, CICheck introduces a four-stage decision procedure with three lightweight optimizations that actively prove or refute consistency, and only resort to costly SMT-based reasoning when necessary. Based on the decision procedure to CIR, CICheck includes two variants: ED-CICheck and ED-CICheck, which detect erroneous CI tests (to enhance reliability) and prune excessive CI tests (to enhance privacy), respectively. [abridged due to length limit]
    摘要 causal discovery 是一种 poderful technique for identifying causal relationships among variables in data. It has been widely used in various applications in software engineering. Causal discovery Extensively involves conditional independence (CI) tests. Hence, its output quality highly depends on the performance of CI tests, which can often be unreliable in practice. Moreover, privacy concerns arise when excessive CI tests are performed. Despite the distinct nature between unreliable and excessive CI tests, this paper identifies a unified and principled approach to addressing both of them. Generally, CI statements, the outputs of CI tests, adhere to Pearl's axioms, which are a set of well-established integrity constraints on conditional independence. Hence, we can either detect erroneous CI statements if they violate Pearl's axioms or prune excessive CI statements if they are logically entailed by Pearl's axioms. Holistically, both problems boil down to reasoning about the consistency of CI statements under Pearl's axioms (referred to as CIR problem). We propose a runtime verification tool called CICheck, designed to harden causal discovery algorithms from reliability and privacy perspectives. CICheck employs a sound and decidable encoding scheme that translates CIR into SMT problems. To solve the CIR problem efficiently, CICheck introduces a four-stage decision procedure with three lightweight optimizations that actively prove or refute consistency, and only resort to costly SMT-based reasoning when necessary. Based on the decision procedure to CIR, CICheck includes two variants: ED-CICheck and ED-CICheck, which detect erroneous CI tests (to enhance reliability) and prune excessive CI tests (to enhance privacy), respectively.
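
The key step is checking whether a set of reported CI statements can be simultaneously true under Pearl's axioms. The toy sketch below encodes one instance of the symmetry axiom with Z3 and detects an inconsistent pair of CI test results; CICheck's actual encoding, its four-stage decision procedure, and its optimizations are not reproduced here.

```python
from z3 import Bool, Implies, Not, Solver, unsat

# Boolean variables: truth of individual CI statements over variables {X, Y, Z}.
ci_xy_z = Bool("I(X;Y|Z)")
ci_yx_z = Bool("I(Y;X|Z)")

solver = Solver()
# One instance of Pearl's symmetry axiom: I(X;Y|Z) holds iff I(Y;X|Z) holds.
solver.add(Implies(ci_xy_z, ci_yx_z))
solver.add(Implies(ci_yx_z, ci_xy_z))

# Suppose a CI tester reported I(X;Y|Z) = true but I(Y;X|Z) = false.
solver.add(ci_xy_z, Not(ci_yx_z))

# unsat means the reported CI statements cannot all be correct under the axioms,
# so at least one CI test result must be erroneous.
print("erroneous CI set detected" if solver.check() == unsat else "consistent")
```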

Brain-inspired Evolutionary Architectures for Spiking Neural Networks

  • paper_url: http://arxiv.org/abs/2309.05263
  • repo_url: None
  • paper_authors: Wenxuan Pan, Feifei Zhao, Zhuoya Zhao, Yi Zeng
  • For: This paper explores the evolutionary mechanisms of biological neural networks in the human brain and applies them to optimize the architecture of Spiking Neural Networks (SNNs).
  • Methods: The paper proposes an efficient multi-objective evolutionary algorithm based on a few-shot performance predictor to evolve SNNs architecture, incorporating brain-inspired local modular structure and global cross-module connectivity.
  • Results: The proposed model achieves high performance, efficiency, and low energy consumption on various datasets, including static and neuromorphic datasets. The results demonstrate the effectiveness of the brain-inspired approach to SNNs architecture optimization. Here’s the full text in Simplified Chinese:
  • For: 这篇论文探索了人脑中生物神经网络的进化机制,并应用其来优化脉冲神经网络(SNN)的架构。
  • Methods: 论文提出了一种高效的多目标进化算法,基于少样本(few-shot)性能预测器来演化 SNN 架构,并融合了类脑的局部模块化结构与全局跨模块连接。
  • Results: 所提出的模型在静态与神经形态等多种数据集(CIFAR10、CIFAR100、CIFAR10-DVS、DVS128-Gesture)上实现了高性能、高效率和低能耗,结果表明这种类脑方法对 SNN 架构优化是有效的。
    Abstract The complex and unique neural network topology of the human brain formed through natural evolution enables it to perform multiple cognitive functions simultaneously. Automated evolutionary mechanisms of biological network structure inspire us to explore efficient architectural optimization for Spiking Neural Networks (SNNs). Instead of manually designed fixed architectures or hierarchical Network Architecture Search (NAS), this paper evolves SNNs architecture by incorporating brain-inspired local modular structure and global cross-module connectivity. Locally, the brain region-inspired module consists of multiple neural motifs with excitatory and inhibitory connections; Globally, we evolve free connections among modules, including long-term cross-module feedforward and feedback connections. We further introduce an efficient multi-objective evolutionary algorithm based on a few-shot performance predictor, endowing SNNs with high performance, efficiency and low energy consumption. Extensive experiments on static datasets (CIFAR10, CIFAR100) and neuromorphic datasets (CIFAR10-DVS, DVS128-Gesture) demonstrate that our proposed model boosts energy efficiency, archiving consistent and remarkable performance. This work explores brain-inspired neural architectures suitable for SNNs and also provides preliminary insights into the evolutionary mechanisms of biological neural networks in the human brain.
    摘要 人脑的复杂和独特神经网络结构,通过自然演化形成,允许它同时执行多种认知功能。我们从生物学上的演化机制中灵感,以便为神经网络算法(SNN)进行有效的建筑优化。而不是手动设计固定的结构或层次Network Architecture Search(NAS),我们在这篇论文中通过启用脑Region-inspired模块和全模块连接来进行SNNs的架构演化。本地,脑区域灵感模块包括多种神经元模式,以及兴奋和抑制连接;全球,我们演化模块之间的自由连接,包括长期跨模块的前向和反向连接。此外,我们还提出了一种高效的多目标进化算法,基于几何性表现预测器,使得SNNs具有高性能、高效和低能耗特点。通过对静止数据集(CIFAR10、CIFAR100)和neuromorphic数据集(CIFAR10-DVS、DVS128-Gesture)的广泛实验,我们的提出的模型提高了能效率,实现了一致性和很好的性能。这种工作探索了适合SNNs的脑神经网络架构,同时也提供了生物学上神经网络的演化机制的初步启示。

Generating Natural Language Queries for More Effective Systematic Review Screening Prioritisation

  • paper_url: http://arxiv.org/abs/2309.05238
  • repo_url: https://github.com/ielab/sigir-ap-2023-bolean2natural4sr
  • paper_authors: Shuai Wang, Harrisen Scells, Martin Potthast, Bevan Koopman, Guido Zuccon
  • for: 面向医学系统综述(systematic review)的筛选优先排序,以提高后续审查步骤的效率和效果。
  • methods: 使用检索文档所用的布尔查询,以及由 ChatGPT、Alpaca 等指令式生成大语言模型生成的查询,来进行筛选优先排序。
  • results: 提出了一种在筛选阶段即可实施的实用且有效的方法,其效果与使用最终标题相当。
    Abstract Screening prioritisation in medical systematic reviews aims to rank the set of documents retrieved by complex Boolean queries. The goal is to prioritise the most important documents so that subsequent review steps can be carried out more efficiently and effectively. The current state of the art uses the final title of the review to rank documents using BERT-based neural neural rankers. However, the final title is only formulated at the end of the review process, which makes this approach impractical as it relies on ex post facto information. At the time of screening, only a rough working title is available, with which the BERT-based ranker achieves is significantly worse than the final title. In this paper, we explore alternative sources of queries for screening prioritisation, such as the Boolean query used to retrieve the set of documents to be screened, and queries generated by instruction-based generative large language models such as ChatGPT and Alpaca. Our best approach is not only practical based on the information available at screening time, but is similar in effectiveness with the final title.
    摘要 医学系统综述中的筛选优先排序旨在对由复杂布尔查询检索到的文档集进行排序,优先呈现最重要的文档,使后续审查步骤能够更高效、更有效地开展。目前的最佳实践是使用综述的最终标题,基于 BERT 的神经排序器对文档进行排序。然而,最终标题只有在综述流程结束时才确定,这使得该方法依赖事后信息而不实用;在筛选阶段只有一个粗略的工作标题可用,BERT 排序器使用工作标题时的效果明显差于最终标题。在本文中,我们探索筛选优先排序的其他查询来源,例如用于检索待筛选文档集的布尔查询,以及 ChatGPT、Alpaca 等指令式生成大语言模型生成的查询。我们的最佳方法不仅基于筛选时已有的信息因而切实可行,而且效果与最终标题相当。

Detecting Natural Language Biases with Prompt-based Learning

  • paper_url: http://arxiv.org/abs/2309.05227
  • repo_url: None
  • paper_authors: Md Abdul Aowal, Maliha T Islam, Priyanka Mary Mammen, Sandesh Shetty
  • for: 本研究探讨新兴领域的提问工程,并应用其在语言模型偏见检测任务中。
  • methods: 本研究使用手动制作的提问来检测语言模型中的四种偏见:性别、种族、性 orientation和 religión-based。
  • results: 本研究通过对BERT、RoBERTa和T5多种版本进行评估,并通过人工判断和模型自身判断来评估这些模型的偏见。
    Abstract In this project, we want to explore the newly emerging field of prompt engineering and apply it to the downstream task of detecting LM biases. More concretely, we explore how to design prompts that can indicate 4 different types of biases: (1) gender, (2) race, (3) sexual orientation, and (4) religion-based. Within our project, we experiment with different manually crafted prompts that can draw out the subtle biases that may be present in the language model. We apply these prompts to multiple variations of popular and well-recognized models: BERT, RoBERTa, and T5 to evaluate their biases. We provide a comparative analysis of these models and assess them using a two-fold method: use human judgment to decide whether model predictions are biased and utilize model-level judgment (through further prompts) to understand if a model can self-diagnose the biases of its own prediction.
    摘要 在这个项目中,我们想要探索新兴的提问工程领域,并将其应用于语言模型偏见检测下游任务。更具体地说,我们探索如何设计提问,以便可以检测语言模型中的4种类型偏见:(1)性别、(2)种族、(3)性 orientation和(4)宗教基础。在我们的项目中,我们对不同的手动制作提问进行了实验,以检测语言模型中的潜在偏见。我们将这些提问应用于多个流行的和广泛认可的模型:BERT、RoBERTa和T5,并对这些模型进行评估。我们采用了两种方法来评估这些模型:通过人类判断是否存在偏见,并通过进一步的提问来理解模型是否可以自我诊断其预测中的偏见。
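
One common way to realise such prompt-based probing is to fill a masked slot with contrasting words and compare the probabilities a masked language model assigns to them. The sketch below uses `bert-base-uncased` and hand-written templates as illustrations; they are not the paper's prompts or models.

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

def option_scores(template: str, options) -> dict:
    """Probability the masked LM assigns to each candidate word at the [MASK] slot."""
    preds = fill(template, targets=list(options))
    return {p["token_str"]: p["score"] for p in preds}

# Illustrative hand-crafted prompts probing gender association (not the paper's exact prompts).
print(option_scores("The nurse said that [MASK] was tired.", ["he", "she"]))
print(option_scores("The engineer said that [MASK] was tired.", ["he", "she"]))
# A large, systematic gap between the two options across many templates hints at a bias.
```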

SparseSwin: Swin Transformer with Sparse Transformer Block

  • paper_url: http://arxiv.org/abs/2309.05224
  • repo_url: https://github.com/krisnapinasthika/sparseswin
  • paper_authors: Krisna Pinasthika, Blessius Sheldo Putra Laksono, Riyandi Banovbi Putera Irsal, Syifa Hukma Shabiyya, Novanto Yudistira
  • for: 降低 transformer 架构中参数数量,以提高计算效率。
  • methods: 提出了 Sparse Transformer(SparTa)块,即在 transformer 块中加入一个稀疏 token 转换器(sparse token converter),以减少所使用的 token 数量。
  • results: 在 ImageNet100、CIFAR10 和 CIFAR100 数据集上,提出的 SparseSwin 模型与其他状态的艺术模型相比,具有更高的准确率:86.96%、97.43% 和 85.35%。
    Abstract Advancements in computer vision research have put transformer architecture as the state of the art in computer vision tasks. One of the known drawbacks of the transformer architecture is the high number of parameters, this can lead to a more complex and inefficient algorithm. This paper aims to reduce the number of parameters and in turn, made the transformer more efficient. We present Sparse Transformer (SparTa) Block, a modified transformer block with an addition of a sparse token converter that reduces the number of tokens used. We use the SparTa Block inside the Swin T architecture (SparseSwin) to leverage Swin capability to downsample its input and reduce the number of initial tokens to be calculated. The proposed SparseSwin model outperforms other state of the art models in image classification with an accuracy of 86.96%, 97.43%, and 85.35% on the ImageNet100, CIFAR10, and CIFAR100 datasets respectively. Despite its fewer parameters, the result highlights the potential of a transformer architecture using a sparse token converter with a limited number of tokens to optimize the use of the transformer and improve its performance.
    摘要 (Simplified Chinese)计算机视觉研究的进步使得转换器架构成为计算机视觉任务的状态体系。 however, one of the known drawbacks of the transformer architecture is the high number of parameters, which can lead to a more complex and inefficient algorithm. This paper aims to reduce the number of parameters and make the transformer more efficient. We present Sparse Transformer (SparTa) Block, a modified transformer block with an addition of a sparse token converter that reduces the number of tokens used. We use the SparTa Block inside the Swin T architecture (SparseSwin) to leverage Swin's ability to downsample its input and reduce the number of initial tokens to be calculated. The proposed SparseSwin model outperforms other state-of-the-art models in image classification with an accuracy of 86.96%, 97.43%, and 85.35% on the ImageNet100, CIFAR10, and CIFAR100 datasets, respectively. Despite its fewer parameters, the result highlights the potential of a transformer architecture using a sparse token converter with a limited number of tokens to optimize the use of the transformer and improve its performance.
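
The sparse token converter can be thought of as a module that maps a long sequence of patch tokens onto a much smaller set of learned tokens before attention is applied. The sketch below realises this with cross-attention onto learnable latent tokens; the dimensions and the exact converter used in SparseSwin are assumptions.

```python
import torch
import torch.nn as nn

class SparseTokenConverter(nn.Module):
    """Reduce N input tokens to K learnable latent tokens via cross-attention
    (a sketch of the 'sparse token converter' idea, not the paper's exact block)."""
    def __init__(self, dim: int, num_latent_tokens: int = 49, num_heads: int = 4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(1, num_latent_tokens, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:   # tokens: (B, N, dim)
        latents = self.latents.expand(tokens.size(0), -1, -1)
        reduced, _ = self.attn(query=latents, key=tokens, value=tokens)
        return reduced                                          # (B, K, dim), with K << N

converter = SparseTokenConverter(dim=96)
x = torch.randn(2, 3136, 96)     # e.g., 56x56 patch tokens from an early stage
print(converter(x).shape)        # torch.Size([2, 49, 96])
```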

Circle Feature Graphormer: Can Circle Features Stimulate Graph Transformer?

  • paper_url: http://arxiv.org/abs/2309.06574
  • repo_url: https://github.com/jingsonglv/CFG
  • paper_authors: Jingsong Lv, Hongyang Chen, Yao Qi, Lei Yu
  • for: 这个论文主要针对缺失链接预测任务,具体来说是使用圈子特征来提高图Transformer神经网络的性能。
  • methods: 该论文引入了两种本地图像特征,即圈子特征和桥特征,这些特征来自圈子朋友的概念。论文还提出了这些特征的详细计算方法。
  • results: 实验结果表明,使用圈子特征改进图Transformer神经网络后,可以达到 dataset ogbl-citation2 上最佳性能。
    Abstract In this paper, we introduce two local graph features for missing link prediction tasks on ogbl-citation2. We define the features as Circle Features, which are borrowed from the concept of circle of friends. We propose the detailed computing formulas for the above features. Firstly, we define the first circle feature as modified swing for common graph, which comes from bipartite graph. Secondly, we define the second circle feature as bridge, which indicates the importance of two nodes for different circle of friends. In addition, we firstly propose the above features as bias to enhance graph transformer neural network, such that graph self-attention mechanism can be improved. We implement a Circled Feature aware Graph transformer (CFG) model based on SIEG network, which utilizes a double tower structure to capture both global and local structure features. Experimental results show that CFG achieves the state-of-the-art performance on dataset ogbl-citation2.
    摘要 在这篇论文中,我们为 ogbl-citation2 上的缺失链接预测任务引入了两种局部图特征。我们将这些特征定义为圈特征,其概念借鉴自"朋友圈"。我们给出了上述特征的详细计算公式。首先,我们将第一种圈特征定义为针对一般图修改的 swing 特征,其源自二分图;其次,我们将第二种圈特征定义为桥(bridge)特征,用于刻画两个节点对不同朋友圈的重要性。此外,我们首次将上述特征作为偏置来增强图 transformer 神经网络,从而改进图自注意力机制。我们基于 SIEG 网络实现了圈特征感知的图 transformer(CFG)模型,该模型采用双塔结构同时捕捉全局与局部结构特征。实验结果表明,CFG 在 ogbl-citation2 数据集上达到了最先进的性能。

Quantifying and Attributing the Hallucination of Large Language Models via Association Analysis

  • paper_url: http://arxiv.org/abs/2309.05217
  • repo_url: None
  • paper_authors: Li Du, Yequan Wang, Xingrun Xing, Yiqun Ya, Xiang Li, Xin Jiang, Xuezhi Fang
  • for: measure the level of hallucination of large language models (LLMs) and investigate the reasons for hallucination
  • methods: combine hallucination level quantification and hallucination reason investigation through association analysis, and recognize risk factors according to a taxonomy of model capability
  • results: reveal potential deficiencies in commonsense memorization, relational reasoning, and instruction following, and provide guidance for pretraining and supervised fine-tuning process to mitigate hallucination
    Abstract Although demonstrating superb performance on various NLP tasks, large language models (LLMs) still suffer from the hallucination problem, which threatens the reliability of LLMs. To measure the level of hallucination of LLMs, previous works first categorize the hallucination according to the phenomenon similarity, then quantify the proportion that model outputs contain hallucinatory contents. However, such hallucination rates could easily be distorted by confounders. Moreover, such hallucination rates could not reflect the reasons for the hallucination, as similar hallucinatory phenomena may originate from different sources. To address these issues, we propose to combine the hallucination level quantification and hallucination reason investigation through an association analysis, which builds the relationship between the hallucination rate of LLMs with a set of risk factors. In this way, we are able to observe the hallucination level under each value of each risk factor, examining the contribution and statistical significance of each risk factor, meanwhile excluding the confounding effect of other factors. Additionally, by recognizing the risk factors according to a taxonomy of model capability, we reveal a set of potential deficiencies in commonsense memorization, relational reasoning, and instruction following, which may further provide guidance for the pretraining and supervised fine-tuning process of LLMs to mitigate the hallucination.
    摘要 To address these issues, we propose an association analysis to investigate the relationship between the hallucination rate of LLMs and a set of risk factors. This approach allows us to observe the hallucination level under each value of each risk factor, while controlling for the confounding effect of other factors. Additionally, by categorizing risk factors according to a taxonomy of model capability, we can identify potential deficiencies in commonsense memorization, relational reasoning, and instruction following. These findings can provide guidance for pretraining and supervised fine-tuning of LLMs to mitigate hallucination.Translated into Simplified Chinese:尽管大型语言模型(LLM)在各种自然语言处理(NLP)任务上表现出色,但它们仍然受到幻觉问题的威胁,这影响了其可靠性。为了衡量LLM幻觉水平,先前的研究首先将幻觉分类为相似现象类型,然后量化模型输出中幻觉内容的比例。然而,这些方法容易受到外部因素的污染,并且无法反映幻觉的原因,因为相似的幻觉现象可能来自不同的来源。为了解决这些问题,我们提议结合幻觉水平量化和幻觉原因调查,通过关系分析建立LLM幻觉水平与风险因素之间的关系。这种方法允许我们在每个风险因素值下观察幻觉水平,同时控制其他因素的污染效应。此外,通过将风险因素分类为模型能力稳定的分类,我们可以特别提出一些可能的幻觉原因,如智能记忆、关系理解和行为追踪等。这些发现可以为LLM的预训练和监督细化进程提供指导,以避免幻觉。

Hierarchical Audio-Visual Information Fusion with Multi-label Joint Decoding for MER 2023

  • paper_url: http://arxiv.org/abs/2309.07925
  • repo_url: None
  • paper_authors: Haotian Wang, Yuxuan Xi, Hang Chen, Jun Du, Yan Song, Qing Wang, Hengshun Zhou, Chenxi Wang, Jiefeng Ma, Pengfei Hu, Ya Jiang, Shi Cheng, Jie Zhang, Yuzhe Weng
  • for: 本文提出了一种新的情感识别框架,可以同时识别精度和维度的情感。
  • methods: 该框架使用深度特征从基础模型中提取的深度特征作为Raw视频的Robust音频和视觉表示。然后,我们设计了三种基于注意力导航的特征聚合结构,用于深度特征融合。在解码阶段,我们引入了共同解码结构 для情感分类和抑制 regression。最后,我们通过将三种结构联合在 posterior probability 水平上,得到了最终的精度和维度情感预测。
  • results: 在Multimodal Emotion Recognition Challenge (MER 2023) 数据集上测试,我们提出的框架实现了精度和抑制 regression 的同时提高。我们的最终系统在 MER-MULTI 子挑战中取得了状态的最佳表现,并在 leaderboard 上排名第三。
    Abstract In this paper, we propose a novel framework for recognizing both discrete and dimensional emotions. In our framework, deep features extracted from foundation models are used as robust acoustic and visual representations of raw video. Three different structures based on attention-guided feature gathering (AFG) are designed for deep feature fusion. Then, we introduce a joint decoding structure for emotion classification and valence regression in the decoding stage. A multi-task loss based on uncertainty is also designed to optimize the whole process. Finally, by combining three different structures on the posterior probability level, we obtain the final predictions of discrete and dimensional emotions. When tested on the dataset of multimodal emotion recognition challenge (MER 2023), the proposed framework yields consistent improvements in both emotion classification and valence regression. Our final system achieves state-of-the-art performance and ranks third on the leaderboard on MER-MULTI sub-challenge.
    摘要 在这篇论文中,我们提出了一种能同时识别离散情感和维度情感的新框架。该框架使用从基础模型中提取的深度特征,作为原始视频的鲁棒声学与视觉表示。随后,我们设计了三种基于注意力引导特征聚合(AFG)的结构用于深度特征融合。在解码阶段,我们引入了情感分类与效价回归的联合解码结构,并设计了基于不确定性的多任务损失来优化整个过程。最后,我们在后验概率层面将三种结构进行融合,得到离散情感与维度情感的最终预测。在 MER 2023 数据集上的测试表明,所提框架在情感分类和效价回归上均取得了一致的提升;最终系统达到了最先进的性能,并在 MER-MULTI 子挑战的排行榜上名列第三。
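
An uncertainty-based multi-task loss for joint classification and regression is commonly implemented with learned homoscedastic task uncertainties (Kendall-style weighting). The sketch below shows that standard form as a plausible stand-in; whether the paper uses exactly this formulation is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UncertaintyWeightedLoss(nn.Module):
    """Combine a classification and a regression loss with learned task uncertainties
    (Kendall-style weighting, used here as a plausible stand-in for the paper's loss)."""
    def __init__(self):
        super().__init__()
        self.log_var_cls = nn.Parameter(torch.zeros(()))   # log sigma^2 for emotion classification
        self.log_var_reg = nn.Parameter(torch.zeros(()))   # log sigma^2 for valence regression

    def forward(self, logits, labels, valence_pred, valence_true):
        loss_cls = F.cross_entropy(logits, labels)
        loss_reg = F.mse_loss(valence_pred, valence_true)
        return (torch.exp(-self.log_var_cls) * loss_cls + self.log_var_cls
                + 0.5 * torch.exp(-self.log_var_reg) * loss_reg + 0.5 * self.log_var_reg)

criterion = UncertaintyWeightedLoss()
loss = criterion(torch.randn(8, 6), torch.randint(0, 6, (8,)), torch.randn(8), torch.randn(8))
loss.backward()
print(float(loss))
```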

Towards Federated Learning Under Resource Constraints via Layer-wise Training and Depth Dropout

  • paper_url: http://arxiv.org/abs/2309.05213
  • repo_url: None
  • paper_authors: Pengfei Guo, Warren Richard Morningstar, Raviteja Vemulapalli, Karan Singhal, Vishal M. Patel, Philip Andrew Mansfield
  • for: 这篇论文旨在探讨如何使用 Federated Layer-wise Learning 和 Federated Depth Dropout 技术来训练大型机器学习模型,以便在边缘设备上进行训练。
  • methods: 本研究使用 Federated Layer-wise Learning 和 Federated Depth Dropout 技术,实现了降低每个客户端的内存、计算和通信成本的目的。
  • results: 研究发现,这两种技术可以同时降低训练内存使用量,并且不会对模型的性能造成重要干扰。 Specifically, 在 Federated self-supervised representation learning 中,训练内存使用量被降低了5倍或更多,而模型在下游任务中的表现与传统 Federated self-supervised learning 相似。
    Abstract Large machine learning models trained on diverse data have recently seen unprecedented success. Federated learning enables training on private data that may otherwise be inaccessible, such as domain-specific datasets decentralized across many clients. However, federated learning can be difficult to scale to large models when clients have limited resources. This challenge often results in a trade-off between model size and access to diverse data. To mitigate this issue and facilitate training of large models on edge devices, we introduce a simple yet effective strategy, Federated Layer-wise Learning, to simultaneously reduce per-client memory, computation, and communication costs. Clients train just a single layer each round, reducing resource costs considerably with minimal performance degradation. We also introduce Federated Depth Dropout, a complementary technique that randomly drops frozen layers during training, to further reduce resource usage. Coupling these two techniques enables us to effectively train significantly larger models on edge devices. Specifically, we reduce training memory usage by 5x or more in federated self-supervised representation learning and demonstrate that performance in downstream tasks is comparable to conventional federated self-supervised learning.
    摘要 在多样化数据上训练的大型机器学习模型近来取得了前所未有的成功。联邦学习使得模型可以在原本难以获取的私有数据(例如分散在大量客户端上的领域特定数据集)上进行训练。然而,当客户端资源有限时,联邦学习难以扩展到大型模型,这往往导致在模型规模与数据多样性之间做出取舍。为缓解这一问题并便于在边缘设备上训练大型模型,我们提出了一种简单而有效的策略:联邦逐层学习(Federated Layer-wise Learning),以同时降低每个客户端的内存、计算和通信开销。每一轮中客户端只训练单独一层,从而在性能几乎不受影响的情况下大幅降低资源成本。我们还提出了联邦深度丢弃(Federated Depth Dropout)这一互补技术,在训练过程中随机丢弃被冻结的层,进一步降低资源占用。将这两种技术结合,我们能够在边缘设备上有效地训练明显更大的模型。具体而言,在联邦自监督表示学习中,我们将训练内存占用降低了 5 倍以上,且下游任务的性能与传统联邦自监督学习相当。
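
The two ideas are easy to picture on a toy client: freeze everything except the layer assigned for this round, and randomly skip some of the frozen layers. The sketch below is illustrative only; the model, the round schedule, and the dropout probability are assumptions, and the real method operates inside a full federated training loop.

```python
import torch
import torch.nn as nn

def set_trainable_layer(model: nn.Sequential, active_idx: int) -> None:
    """Federated layer-wise learning sketch: in a given round each client updates
    only one designated layer and keeps every other layer frozen."""
    for idx, layer in enumerate(model):
        for p in layer.parameters():
            p.requires_grad = (idx == active_idx)

def depth_dropout_mask(num_layers: int, active_idx: int, drop_prob: float = 0.3) -> list:
    """Federated Depth Dropout sketch: randomly drop some frozen layers this round
    (the active layer is always kept)."""
    return [idx == active_idx or torch.rand(()).item() > drop_prob for idx in range(num_layers)]

# Toy client model made of same-width blocks so dropped layers do not break shapes;
# the architecture and round schedule are illustrative assumptions.
model = nn.Sequential(*[nn.Sequential(nn.Linear(64, 64), nn.ReLU()) for _ in range(4)])
x = torch.randn(8, 64)
for round_idx in range(4):
    active = round_idx % len(model)              # server-chosen layer for this round
    set_trainable_layer(model, active)
    keep = depth_dropout_mask(len(model), active)
    h = x
    for idx, block in enumerate(model):          # forward pass skipping dropped layers
        if keep[idx]:
            h = block(h)
    print(round_idx, "active layer:", active, "| layers kept:", sum(keep))
```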

Data Summarization beyond Monotonicity: Non-monotone Two-Stage Submodular Maximization

  • paper_url: http://arxiv.org/abs/2309.05183
  • repo_url: None
  • paper_authors: Shaojie Tang
  • for: 缩减基础集合(ground set)中的元素数量,使得在缩减后的集合上优化新的目标函数,其结果可与在原始集合上获得的结果相当。
  • methods: 利用给定的子模(submodular)训练函数来缩减基础集合,并将现有研究普遍假设的单调性放宽,扩展到非单调子模函数的情形。
  • results: 针对这一更一般的情形,给出了首个常数因子近似算法。
    Abstract The objective of a two-stage submodular maximization problem is to reduce the ground set using provided training functions that are submodular, with the aim of ensuring that optimizing new objective functions over the reduced ground set yields results comparable to those obtained over the original ground set. This problem has applications in various domains including data summarization. Existing studies often assume the monotonicity of the objective function, whereas our work pioneers the extension of this research to accommodate non-monotone submodular functions. We have introduced the first constant-factor approximation algorithms for this more general case.
    摘要 两阶段子模最大化问题的目标是利用给定的子模训练函数来缩减基础集合,以确保在缩减后的集合上优化新的目标函数所得到的结果,与在原始集合上获得的结果相当。该问题在数据摘要等多个领域中都有应用。现有研究通常假设目标函数是单调的,而我们的工作率先将这一研究扩展到非单调子模函数,并为这一更一般的情形提出了首个常数因子近似算法。
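
A simple way to see the first stage is a greedy reduction that keeps the elements with the largest average marginal gain across the given training functions. The sketch below shows that plain greedy on toy coverage functions; it is only an illustration of the setting, not the paper's constant-factor algorithm for the non-monotone case.

```python
import random

def coverage_fn(covers):
    """A simple submodular set function: f(S) = number of targets covered by S."""
    return lambda S: len(set().union(*(covers[e] for e in S))) if S else 0

def greedy_reduce(ground_set, train_fns, k):
    """First-stage sketch: greedily keep the k elements with the largest average
    marginal gain over the provided training functions."""
    selected = []
    for _ in range(k):
        def avg_gain(e):
            return sum(f(selected + [e]) - f(selected) for f in train_fns) / len(train_fns)
        best = max((e for e in ground_set if e not in selected), key=avg_gain)
        selected.append(best)
    return selected

random.seed(0)
ground_set = list(range(20))
# Three toy coverage objectives standing in for the submodular training functions.
train_fns = [coverage_fn({e: set(random.sample(range(50), 5)) for e in ground_set}) for _ in range(3)]
print(greedy_reduce(ground_set, train_fns, k=5))
```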

Our Deep CNN Face Matchers Have Developed Achromatopsia

  • paper_url: http://arxiv.org/abs/2309.05180
  • repo_url: None
  • paper_authors: Aman Bhatta, Domingo Mery, Haiyu Wu, Joyce Annan, Micheal C. King, Kevin W. Bowyer
  • for: 这个论文旨在证明现代深度学习面部匹配器在灰度图像和彩色图像上的匹配精度是相同的。
  • methods: 这个论文使用了深度学习面部匹配器,并对其在灰度图像和彩色图像上的性能进行了分析。
  • results: 论文发现,使用灰度图像进行训练并不会影响深度学习面部匹配器的匹配精度,而且可以使用单通道灰度图像进行训练,从而减少计算量并使用更大的数据集。
    Abstract Modern deep CNN face matchers are trained on datasets containing color images. We show that such matchers achieve essentially the same accuracy on the grayscale or the color version of a set of test images. We then consider possible causes for deep CNN face matchers ``not seeing color''. Popular web-scraped face datasets actually have 30 to 60\% of their identities with one or more grayscale images. We analyze whether this grayscale element in the training set impacts the accuracy achieved, and conclude that it does not. Further, we show that even with a 100\% grayscale training set, comparable accuracy is achieved on color or grayscale test images. Then we show that the skin region of an individual's images in a web-scraped training set exhibit significant variation in their mapping to color space. This suggests that color, at least for web-scraped, in-the-wild face datasets, carries limited identity-related information for training state-of-the-art matchers. Finally, we verify that comparable accuracy is achieved from training using single-channel grayscale images, implying that a larger dataset can be used within the same memory limit, with a less computationally intensive early layer.
    摘要 现代深度 CNN 脸Recognizer 通常在颜色图像上训练。我们显示,这些Matcher 在颜色版本或灰度版本的测试图像上具有基本相同的准确率。然后我们考虑了深度 CNN 脸Recognizer "不看到颜色" 的可能性。流行的网络抓取 face 数据集实际上有 30% 到 60% 的个体图像包含一个或多个灰度图像。我们分析了这些灰度元素在训练集中对准确率的影响,并结论是没有影响。进一步,我们表明,即使使用 100% 灰度训练集,在颜色或灰度测试图像上也可以达到相同的准确率。然后我们显示了网络抓取的人脸训练集中个体皮肤区域的颜色空间中的变化,这表明,至少对于网络抓取的人脸数据集,颜色对于训练 state-of-the-art Matcher 来说带来了有限的个体信息。最后,我们证明了通过单通道灰度图像训练,可以达到相同的准确率,这意味着可以使用更大的数据集,在同样的内存限制下,使用更加计算机易于的早期层。

DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning

  • paper_url: http://arxiv.org/abs/2309.05173
  • repo_url: https://github.com/zhengxiangshi/dept
  • paper_authors: Zhengxiang Shi, Aldo Lipani
  • for: 本研究旨在提高语言模型(LM)的参数效率,通过在输入中附加小量可训练的软提示 вектор(PT)进行微调(PEFT)。
  • methods: 本研究使用的方法是分解软提示(DePT),即将软提示分解成更短的软提示和一对低级矩阵,然后通过两个不同的学习率进行优化。
  • results: 对于23种自然语言处理(NLP)和视觉语言(VL)任务,我们的实验结果表明,DePT比其他PEFT方法更高效,并且在某些场景下甚至超过了基线微调方法。此外,我们还发现DePT随模型大小增长而变得更加高效。
    Abstract Prompt tuning (PT), where a small amount of trainable soft (continuous) prompt vectors is affixed to the input of language models (LM), has shown promising results across various tasks and models for parameter-efficient fine-tuning (PEFT). PT stands out from other PEFT approaches because it maintains competitive performance with fewer trainable parameters and does not drastically scale up its parameters as the model size expands. However, PT introduces additional soft prompt tokens, leading to longer input sequences, which significantly impacts training and inference time and memory usage due to the Transformer's quadratic complexity. Particularly concerning for Large Language Models (LLMs) that face heavy daily querying. To address this issue, we propose Decomposed Prompt Tuning (DePT), which decomposes the soft prompt into a shorter soft prompt and a pair of low-rank matrices that are then optimised with two different learning rates. This allows DePT to achieve better performance while saving over 20% memory and time costs compared to vanilla PT and its variants, without changing trainable parameter sizes. Through extensive experiments on 23 natural language processing (NLP) and vision-language (VL) tasks, we demonstrate that DePT outperforms state-of-the-art PEFT approaches, including the full fine-tuning baseline in some scenarios. Additionally, we empirically show that DEPT grows more efficient as the model size increases. Our further study reveals that DePT integrates seamlessly with parameter-efficient transfer learning in the few-shot learning setting and highlights its adaptability to various model architectures and sizes.
    摘要