cs.AI - 2023-09-11

The bionic neural network for external simulation of human locomotor system

  • paper_url: http://arxiv.org/abs/2309.05863
  • repo_url: None
  • paper_authors: Yue Shi, Shuhao Ma, Yihui Zhao
  • for: This paper aims to propose a physics-informed deep learning method to predict joint motion and muscle forces using musculoskeletal (MSK) modeling techniques.
  • methods: The proposed method embeds the MSK model into a neural network as an ordinary differential equation (ODE) loss function, allowing for the automatic estimation of subject-specific MSK physiological parameters during the training process.
  • results: The experimental validations on two datasets demonstrate that the proposed deep learning method can accurately identify subject-specific MSK physiological parameters and yield accurate predictions of joint motion and muscle forces.
    Abstract Muscle forces and joint kinematics estimated with musculoskeletal (MSK) modeling techniques offer useful metrics describing movement quality. Model-based computational MSK models can interpret the dynamic interaction between the neural drive to muscles, muscle dynamics, body and joint kinematics, and kinetics. Still, such a set of solutions suffers from high computational time and muscle recruitment problems, especially in complex modeling. In recent years, data-driven methods have emerged as a promising alternative due to the benefits of flexibility and adaptability. However, a large amount of labeled training data is not easy to be acquired. This paper proposes a physics-informed deep learning method based on MSK modeling to predict joint motion and muscle forces. The MSK model is embedded into the neural network as an ordinary differential equation (ODE) loss function with physiological parameters of muscle activation dynamics and muscle contraction dynamics to be identified. These parameters are automatically estimated during the training process which guides the prediction of muscle forces combined with the MSK forward dynamics model. Experimental validations on two groups of data, including one benchmark dataset and one self-collected dataset from six healthy subjects, are performed. The results demonstrate that the proposed deep learning method can effectively identify subject-specific MSK physiological parameters and the trained physics-informed forward-dynamics surrogate yields accurate motion and muscle forces predictions.
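
The idea of using an ODE as a loss term, with a physiological parameter identified during fitting, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' model: the first-order activation ODE da/dt = (u - a) / tau, the time constant `tau`, the synthetic excitation signal, and the grid search standing in for gradient-based training are all hypothetical choices.

```python
import numpy as np

def activation_residual(a, u, dt, tau):
    """Forward-Euler residual of the assumed ODE da/dt = (u - a) / tau."""
    dadt = (a[1:] - a[:-1]) / dt
    return dadt - (u[:-1] - a[:-1]) / tau

def physics_informed_loss(a_pred, a_obs, u, dt, tau, weight=1.0):
    """Data-fit term plus weighted ODE-residual term."""
    data_term = np.mean((a_pred - a_obs) ** 2)
    physics_term = np.mean(activation_residual(a_pred, u, dt, tau) ** 2)
    return data_term + weight * physics_term

# Synthetic activation trace generated with a known (hypothetical) time constant.
dt = 0.001
t = np.arange(0.0, 1.0, dt)
u = 0.5 * (1.0 + np.sin(2.0 * np.pi * t))  # made-up neural excitation
true_tau = 0.04
a = np.zeros_like(t)
for k in range(len(t) - 1):
    a[k + 1] = a[k] + dt * (u[k] - a[k]) / true_tau

# Identify tau by minimizing the loss over a small grid of candidates.
candidates = np.linspace(0.01, 0.10, 10)
losses = [physics_informed_loss(a, a, u, dt, tau) for tau in candidates]
best_tau = float(candidates[int(np.argmin(losses))])
```

In a trained network the prediction `a_pred` would come from the model and `tau` would be a learnable variable updated alongside the weights; the grid search here only shows that the residual term is what singles out the parameter value.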

Uncovering mesa-optimization algorithms in Transformers

  • paper_url: http://arxiv.org/abs/2309.05858
  • repo_url: https://github.com/jimmieliu/transformer-mesa-layer
  • paper_authors: Johannes von Oswald, Eyvind Niklasson, Maximilian Schlegel, Seijin Kobayashi, Nicolas Zucchet, Nino Scherrer, Nolan Miller, Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu, João Sacramento
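  • for: This study aims to explain why Transformers perform so well in deep learning.
  • methods: The authors reverse-engineer a series of autoregressive Transformers trained on simple sequence modeling tasks to test the hypothesis that Transformers have an architectural bias towards mesa-optimization, a learned optimization process that runs within the forward pass.
  • results: The analysis uncovers gradient-based mesa-optimization algorithms that drive prediction generation, and the learned forward-pass optimizer can be repurposed immediately to solve supervised few-shot tasks. The authors also propose a new self-attention layer, the mesa-layer, which improves performance in synthetic and preliminary language modeling experiments.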
    Abstract Transformers have become the dominant model in deep learning, but the reason for their superior performance is poorly understood. Here, we hypothesize that the strong performance of Transformers stems from an architectural bias towards mesa-optimization, a learned process running within the forward pass of a model consisting of the following two steps: (i) the construction of an internal learning objective, and (ii) its corresponding solution found through optimization. To test this hypothesis, we reverse-engineer a series of autoregressive Transformers trained on simple sequence modeling tasks, uncovering underlying gradient-based mesa-optimization algorithms driving the generation of predictions. Moreover, we show that the learned forward-pass optimization algorithm can be immediately repurposed to solve supervised few-shot tasks, suggesting that mesa-optimization might underlie the in-context learning capabilities of large language models. Finally, we propose a novel self-attention layer, the mesa-layer, that explicitly and efficiently solves optimization problems specified in context. We find that this layer can lead to improved performance in synthetic and preliminary language modeling experiments, adding weight to our hypothesis that mesa-optimization is an important operation hidden within the weights of trained Transformers.
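
A layer that "solves optimization problems specified in context" can be pictured with a closed-form least-squares solve over the key/value pairs seen so far. The sketch below is an assumption-laden illustration, not the paper's implementation: the ridge objective, the regularizer `lam`, and the function name `in_context_ridge_predict` are hypothetical.

```python
import numpy as np

def in_context_ridge_predict(keys, values, query, lam=1e-6):
    """Fit W on in-context (key, value) pairs, then apply it to the query.

    W minimizes sum_i ||W k_i - v_i||^2 + lam * ||W||_F^2.
    keys: (n, d), values: (n, m), query: (d,)
    """
    d = keys.shape[1]
    gram = keys.T @ keys + lam * np.eye(d)
    w = np.linalg.solve(gram, keys.T @ values).T  # shape (m, d)
    return w @ query

# Toy context: values are an exact linear function of the keys.
rng = np.random.default_rng(0)
w_true = rng.normal(size=(2, 4))
keys = rng.normal(size=(16, 4))
values = keys @ w_true.T
query = rng.normal(size=4)
prediction = in_context_ridge_predict(keys, values, query)
```

On noiseless data such as this, the prediction matches `w_true @ query` up to the small bias introduced by `lam`, which is the sense in which the forward pass itself carries out the optimization.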

Challenges in Annotating Datasets to Quantify Bias in Under-represented Society

  • paper_url: http://arxiv.org/abs/2309.08624
  • repo_url: None
  • paper_authors: Vithya Yogarajan, Gillian Dobbie, Timothy Pistotti, Joshua Bensemann, Kobe Knowles
  • for: This research aims to address the lack of annotated datasets for quantifying bias in under-represented societies, focusing specifically on the New Zealand (NZ) population.
  • methods: The authors manually annotate benchmark datasets for binary gender classification and ethical/racial considerations, working with three annotators.
  • results: The paper gives an overview of the challenges encountered and lessons learnt during manual annotation, and offers recommendations for future research on quantifying bias in under-represented societies.
    Abstract Recent advances in artificial intelligence, including the development of highly sophisticated large language models (LLM), have proven beneficial in many real-world applications. However, evidence of inherent bias encoded in these LLMs has raised concerns about equity. In response, there has been an increase in research dealing with bias, including studies focusing on quantifying bias and developing debiasing techniques. Benchmark bias datasets have also been developed for binary gender classification and ethical/racial considerations, focusing predominantly on American demographics. However, there is minimal research in understanding and quantifying bias related to under-represented societies. Motivated by the lack of annotated datasets for quantifying bias in under-represented societies, we endeavoured to create benchmark datasets for the New Zealand (NZ) population. We faced many challenges in this process, despite the availability of three annotators. This research outlines the manual annotation process, provides an overview of the challenges we encountered and lessons learnt, and presents recommendations for future research.

Large Language Models for Compiler Optimization

  • paper_url: http://arxiv.org/abs/2309.07062
  • repo_url: https://github.com/chrisneagu/FTC-Skystone-Dark-Angels-Romania-2020
  • paper_authors: Chris Cummins, Volker Seeker, Dejan Grubisic, Mostafa Elhoushi, Youwei Liang, Baptiste Roziere, Jonas Gehring, Fabian Gloeckle, Kim Hazelwood, Gabriel Synnaeve, Hugh Leather
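  • for: This paper explores applying large language models to code optimization.
  • methods: The authors train a 7B-parameter transformer from scratch to optimize LLVM assembly for code size. The model takes unoptimized assembly as input and outputs a list of compiler options that best optimize the program. During training, the model is also asked to predict the instruction counts before and after optimization and the optimized code itself; these auxiliary learning tasks improve the model's optimization performance and depth of understanding.
  • results: Evaluated on a large suite of test programs, the approach achieves a 3.0% improvement in reducing instruction counts over the compiler and outperforms two state-of-the-art baselines that require thousands of compilations. The model also shows surprisingly strong code reasoning abilities, generating compilable code 91% of the time and perfectly emulating the compiler's output 70% of the time.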
    Abstract We explore the novel application of Large Language Models to code optimization. We present a 7B-parameter transformer model trained from scratch to optimize LLVM assembly for code size. The model takes as input unoptimized assembly and outputs a list of compiler options to best optimize the program. Crucially, during training, we ask the model to predict the instruction counts before and after optimization, and the optimized code itself. These auxiliary learning tasks significantly improve the optimization performance of the model and improve the model's depth of understanding. We evaluate on a large suite of test programs. Our approach achieves a 3.0% improvement in reducing instruction counts over the compiler, outperforming two state-of-the-art baselines that require thousands of compilations. Furthermore, the model shows surprisingly strong code reasoning abilities, generating compilable code 91% of the time and perfectly emulating the output of the compiler 70% of the time.
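
For readers unsure what an "improvement in reducing instruction counts over the compiler" measures, one plausible reading is the relative reduction in total instruction count against a baseline compilation. The sketch below assumes that reading; the aggregation over programs, the function name, and the counts are all made up for illustration and are not taken from the paper.

```python
def instruction_count_improvement(baseline_counts, model_counts):
    """Relative reduction in total instruction count versus a baseline."""
    if len(baseline_counts) != len(model_counts):
        raise ValueError("one count per program is required in both lists")
    baseline_total = sum(baseline_counts)
    model_total = sum(model_counts)
    return (baseline_total - model_total) / baseline_total

# Hypothetical per-program instruction counts (not data from the paper).
baseline = [120, 300, 80]   # compiler's default pipeline
model = [114, 294, 77]      # pass list suggested by the model
improvement = instruction_count_improvement(baseline, model)
```

Summing before dividing weights large programs more heavily; averaging per-program ratios would be an equally defensible alternative and can give a different number.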

Effective Abnormal Activity Detection on Multivariate Time Series Healthcare Data

  • paper_url: http://arxiv.org/abs/2309.05845
  • repo_url: None
  • paper_authors: Mengjia Niu, Yuchen Zhao, Hamed Haddadi
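  • for: This study addresses abnormal activity detection in multivariate time series (MTS) data, aiming at accurate anomaly detection in smart healthcare scenarios.
  • methods: The authors propose a Residual-based Anomaly Detection approach, Rs-AD, for effective representation learning and abnormal activity detection on MTS data.
  • results: On a real-world gait dataset, Rs-AD achieves an F1 score of 0.839.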
    Abstract Multivariate time series (MTS) data collected from multiple sensors provide the potential for accurate abnormal activity detection in smart healthcare scenarios. However, anomalies exhibit diverse patterns and become unnoticeable in MTS data. Consequently, achieving accurate anomaly detection is challenging since we have to capture both temporal dependencies of time series and inter-relationships among variables. To address this problem, we propose a Residual-based Anomaly Detection approach, Rs-AD, for effective representation learning and abnormal activity detection. We evaluate our scheme on a real-world gait dataset and the experimental results demonstrate an F1 score of 0.839.
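
The general pattern behind residual-based detection, scoring each time step by how far the observation departs from a prediction and then evaluating the flags with F1, can be sketched as below. This is not Rs-AD itself: the trailing-mean forecaster, the window size, the threshold, and the toy two-sensor signal are hypothetical stand-ins for a learned model and real gait data.

```python
import numpy as np

def residual_scores(series, window=5):
    """Anomaly score per step: norm of (observation - trailing-window mean)."""
    series = np.asarray(series, dtype=float)
    scores = np.zeros(len(series))
    for t in range(window, len(series)):
        forecast = series[t - window:t].mean(axis=0)
        scores[t] = np.linalg.norm(series[t] - forecast)
    return scores

def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no true positives."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    return 0.0 if tp == 0 else float(2 * tp / (2 * tp + fp + fn))

# Toy two-sensor series with one injected spike at step 120.
t = np.arange(200)
series = np.stack([np.sin(0.1 * t), np.cos(0.1 * t)], axis=1)
labels = np.zeros(200, dtype=bool)
series[120] += 3.0
labels[120] = True

scores = residual_scores(series)
predictions = scores > 2.0  # hypothetical threshold
f1 = f1_score(labels, predictions)
```

Taking the norm across sensors is what lets a single score reflect inter-variable relationships as well as the temporal deviation; a learned forecaster would replace the trailing mean.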

PACE-LM: Prompting and Augmentation for Calibrated Confidence Estimation with GPT-4 in Cloud Incident Root Cause Analysis

  • paper_url: http://arxiv.org/abs/2309.05833
  • repo_url: None
  • paper_authors: Dylan Zhang, Xuchao Zhang, Chetan Bansal, Pedro Las-Casas, Rodrigo Fonseca, Saravan Rajmohan
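  • for: This study aims to make confidence estimates from cloud incident root cause analysis (RCA) tools more reliable, in support of service reliability and customer trust.
  • methods: The authors propose prompting retrieval-augmented large language models (LLMs) to improve confidence estimation for RCA tools. The method runs in two phases: the model first evaluates its confidence based on historical incident data, considering its assessment of the evidence strength, and then reviews the root cause generated by the predictor. An optimization step then combines these evaluations to determine the final confidence assignment.
  • results: Experiments show that the method lets the model articulate its confidence effectively and yields a more calibrated score. The research questions cover whether the method produces calibrated confidence scores using LLMs, how domain-specific retrieved examples affect confidence estimates, and how well it generalizes across root cause analysis models. The aim is to bridge the confidence estimation gap, help on-call engineers make decisions, and improve the efficiency of cloud incident management.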
    Abstract In recent years, the transition to cloud-based platforms in the IT sector has emphasized the significance of cloud incident root cause analysis to ensure service reliability and maintain customer trust. Central to this process is the efficient determination of root causes, a task made challenging due to the complex nature of contemporary cloud infrastructures. Despite the proliferation of AI-driven tools for root cause identification, their applicability remains limited by the inconsistent quality of their outputs. This paper introduces a method for enhancing confidence estimation in root cause analysis tools by prompting retrieval-augmented large language models (LLMs). This approach operates in two phases. Initially, the model evaluates its confidence based on historical incident data, considering its assessment of the evidence strength. Subsequently, the model reviews the root cause generated by the predictor. An optimization step then combines these evaluations to determine the final confidence assignment. Experimental results illustrate that our method enables the model to articulate its confidence effectively, providing a more calibrated score. We address research questions evaluating the ability of our method to produce calibrated confidence scores using LLMs, the impact of domain-specific retrieved examples on confidence estimates, and its potential generalizability across various root cause analysis models. Through this, we aim to bridge the confidence estimation gap, aiding on-call engineers in decision-making and bolstering the efficiency of cloud incident management.
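
Whether a confidence score is "calibrated" is commonly checked with the expected calibration error (ECE), which compares stated confidence with observed accuracy inside confidence bins. The abstract does not say which metric the authors use, so treat the sketch below as one standard option; the bin count and the sample values are made up.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over equal-width bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges give bin ids 0..n_bins-1; a confidence of 1.0 lands in the last bin.
    bin_ids = np.digitize(confidences, edges[1:-1])
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return float(ece)

# Hypothetical confidence scores and whether each predicted root cause was right.
confidences = [0.95, 0.95, 0.95, 0.95, 0.65, 0.65]
correct = [1, 1, 1, 0, 1, 0]
ece = expected_calibration_error(confidences, correct)
```

A score of zero means confidence matches accuracy in every bin; larger values indicate over- or under-confidence.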

Studying Accuracy of Machine Learning Models Trained on Lab Lifting Data in Solving Real-World Problems Using Wearable Sensors for Workplace Safety

  • paper_url: http://arxiv.org/abs/2309.05831
  • repo_url: None
  • paper_authors: Joseph Bertrand, Nick Griffey, Ming-Lun Lu, Rashmi Jha
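  • for: This study examines porting a lab-trained machine learning model (a lifting identification model) to the real world.
  • methods: The study considers four potential solutions for improving model performance: (1) adjusting the model's parameters; (2) training on a mix of lab and real-world data; (3) adapting the model to real-world conditions; and (4) training on a larger dataset.
  • results: Performance on real-world data was much lower than on the training data; the authors explore the causes of this failure and propose the four potential solutions to increase model performance.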
    Abstract Porting ML models trained on lab data to real-world situations has long been a challenge. This paper discusses porting a lab-trained lifting identification model to the real world. With performance much lower than on training data, we explored causes of the failure and proposed four potential solutions to increase model performance.

Exploring Geometric Deep Learning For Precipitation Nowcasting

  • paper_url: http://arxiv.org/abs/2309.05828
  • repo_url: None
  • paper_authors: Shan Zhao, Sudipan Saha, Zhitong Xiong, Niklas Boers, Xiao Xiang Zhu
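  • for: Precipitation nowcasting (up to a few hours ahead) remains challenging because highly complex local interactions must be captured accurately.
  • methods: The authors explore a geometric deep learning-based temporal Graph Convolutional Network (GCN), which generalizes neural network models to non-Euclidean domains. The adjacency matrix that simulates interactions among grid cells is learned automatically by minimizing the L1 loss between prediction and ground truth; GCN layers then refine the spatial relationships, while 1D convolutions with various kernel lengths extract temporal information.
  • results: Tested on sequences of radar reflectivity maps over the Trento/Italy area, the GCN models local details of the cloud profile more effectively and improves prediction accuracy.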
    Abstract Precipitation nowcasting (up to a few hours) remains a challenge due to the highly complex local interactions that need to be captured accurately. Convolutional Neural Networks rely on convolutional kernels convolving with grid data and the extracted features are trapped by limited receptive field, typically expressed in excessively smooth output compared to ground truth. Thus they lack the capacity to model complex spatial relationships among the grids. Geometric deep learning aims to generalize neural network models to non-Euclidean domains. Such models are more flexible in defining nodes and edges and can effectively capture dynamic spatial relationship among geographical grids. Motivated by this, we explore a geometric deep learning-based temporal Graph Convolutional Network (GCN) for precipitation nowcasting. The adjacency matrix that simulates the interactions among grid cells is learned automatically by minimizing the L1 loss between prediction and ground truth pixel value during the training procedure. Then, the spatial relationship is refined by GCN layers while the temporal information is extracted by 1D convolution with various kernel lengths. The neighboring information is fed as auxiliary input layers to improve the final result. We test the model on sequences of radar reflectivity maps over the Trento/Italy area. The results show that GCNs improves the effectiveness of modeling the local details of the cloud profile as well as the prediction accuracy by achieving decreased error measures.
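
The spatial step described above can be illustrated with a single graph-convolution layer and an L1 loss. The sketch assumes the widely used symmetric normalization with self-loops, which the abstract does not confirm, and it uses a fixed three-node adjacency matrix where the paper learns the matrix during training; all names and values are hypothetical.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops added."""
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(adj, features, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    return np.maximum(normalize_adjacency(adj) @ features @ weight, 0.0)

def l1_loss(pred, target):
    """Mean absolute error between prediction and ground truth."""
    return float(np.mean(np.abs(pred - target)))

# Three grid cells in a chain: cell 1 neighbors cells 0 and 2.
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
features = np.eye(3)        # one-hot node features
weight = np.ones((3, 2))    # made-up layer weights
out = gcn_layer(adj, features, weight)
loss = l1_loss(out, np.zeros_like(out))
```

With a learned adjacency, the entries of `adj` would be trainable and the gradient of the L1 loss would decide which grid cells exchange information, rather than a fixed neighborhood.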

PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models

  • paper_url: http://arxiv.org/abs/2309.05793
  • repo_url: None
  • paper_authors: Li Chen, Mengyi Zhao, Yiheng Liu, Mingxu Ding, Yangyang Song, Shizun Wang, Xu Wang, Hao Yang, Jing Liu, Kang Du, Min Zheng
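  • for: This paper presents a new method for personalized text-to-image generation, aiming to improve its efficiency and quality.
  • methods: The method uses a dual-branch conditioning mechanism in both the text and image domains for effective control over image generation, and introduces a facial identity loss to better preserve identity during training.
  • results: After a single training phase, PhotoVerse generates high-quality images within a few seconds, across diverse scenes and styles. It eliminates test-time tuning and needs only a single facial photo of the target identity.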
    Abstract Personalized text-to-image generation has emerged as a powerful and sought-after tool, empowering users to create customized images based on their specific concepts and prompts. However, existing approaches to personalization encounter multiple challenges, including long tuning times, large storage requirements, the necessity for multiple input images per identity, and limitations in preserving identity and editability. To address these obstacles, we present PhotoVerse, an innovative methodology that incorporates a dual-branch conditioning mechanism in both text and image domains, providing effective control over the image generation process. Furthermore, we introduce facial identity loss as a novel component to enhance the preservation of identity during training. Remarkably, our proposed PhotoVerse eliminates the need for test time tuning and relies solely on a single facial photo of the target identity, significantly reducing the resource cost associated with image generation. After a single training phase, our approach enables generating high-quality images within only a few seconds. Moreover, our method can produce diverse images that encompass various scenes and styles. The extensive evaluation demonstrates the superior performance of our approach, which achieves the dual objectives of preserving identity and facilitating editability. Project page: https://photoverse2d.github.io/
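
A facial identity loss is often written as one minus the cosine similarity between face embeddings of the generated image and the reference photo. The abstract does not give the exact formulation, so the sketch below is an assumption; the embeddings are plain vectors here, and the face encoder that would produce them is left out entirely.

```python
import numpy as np

def identity_loss(generated_emb, reference_emb, eps=1e-8):
    """1 - cosine similarity between two face-embedding vectors."""
    g = np.asarray(generated_emb, dtype=float)
    r = np.asarray(reference_emb, dtype=float)
    cos = g @ r / (np.linalg.norm(g) * np.linalg.norm(r) + eps)
    return float(1.0 - cos)

# Hypothetical embeddings: parallel vectors (same identity) and orthogonal ones.
same = identity_loss([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
different = identity_loss([1.0, 0.0], [0.0, 1.0])
```

Because cosine similarity ignores vector length, the loss penalizes only a change in embedding direction, which is the usual proxy for a change in identity.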

Adaptive User-centered Neuro-symbolic Learning for Multimodal Interaction with Autonomous Systems

  • paper_url: http://arxiv.org/abs/2309.05787
  • repo_url: None
  • paper_authors: Amr Gomaa, Michael Feld
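  • for: This position paper argues for designing AI systems that learn from human teaching, so that autonomous systems understand objects and their environments more conceptually and symbolically.
  • methods: The paper proposes designing systems with multimodal input and output capabilities, combining explicit and implicit human teaching with human-in-the-loop and incremental learning techniques.
  • results: The paper proposes several hypotheses and design guidelines and highlights a use case from related work.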
    Abstract Recent advances in machine learning, particularly deep learning, have enabled autonomous systems to perceive and comprehend objects and their environments in a perceptual subsymbolic manner. These systems can now perform object detection, sensor data fusion, and language understanding tasks. However, there is a growing need to enhance these systems to understand objects and their environments more conceptually and symbolically. It is essential to consider both the explicit teaching provided by humans (e.g., describing a situation or explaining how to act) and the implicit teaching obtained by observing human behavior (e.g., through the system's sensors) to achieve this level of powerful artificial intelligence. Thus, the system must be designed with multimodal input and output capabilities to support implicit and explicit interaction models. In this position paper, we argue for considering both types of inputs, as well as human-in-the-loop and incremental learning techniques, for advancing the field of artificial intelligence and enabling autonomous systems to learn like humans. We propose several hypotheses and design guidelines and highlight a use case from related work to achieve this goal.
However, There Is a Growing Need to Enhance These Systems to Understand Objects and Their Environments More Conceptually and Symbolically.Recent Advances in Machine Learning Have Enabled Autonomous Systems to Perceive and Comprehend Objects and Their Environments in

Grey-box Bayesian Optimization for Sensor Placement in Assisted Living Environments

  • paper_url: http://arxiv.org/abs/2309.05784
  • repo_url: None
  • paper_authors: Shadan Golestan, Omid Ardakanian, Pierre Boulanger
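  • for: This paper optimizes the configuration and placement of sensors for fall detection, indoor localization, and activity recognition in assisted living spaces.
  • methods: The paper proposes a novel, sample-efficient search method that uses grey-box Bayesian optimization and simulation-based evaluation to find a high-quality sensor placement in an arbitrary indoor space. Its main technical contribution is incorporating knowledge of the spatial distribution of indoor activities into the iterative selection of query points in Bayesian optimization.
  • results: On two simulated indoor environments and one real-world dataset, the proposed method identifies high-quality sensor placements better than state-of-the-art black-box optimization techniques, achieving higher F1-scores while requiring significantly fewer (51.3% on average) expensive function queries.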
    Abstract Optimizing the configuration and placement of sensors is crucial for reliable fall detection, indoor localization, and activity recognition in assisted living spaces. We propose a novel, sample-efficient approach to find a high-quality sensor placement in an arbitrary indoor space based on grey-box Bayesian optimization and simulation-based evaluation. Our key technical contribution lies in capturing domain-specific knowledge about the spatial distribution of activities and incorporating it into the iterative selection of query points in Bayesian optimization. Considering two simulated indoor environments and a real-world dataset containing human activities and sensor triggers, we show that our proposed method performs better compared to state-of-the-art black-box optimization techniques in identifying high-quality sensor placements, leading to accurate activity recognition in terms of F1-score, while also requiring a significantly lower (51.3% on average) number of expensive function queries.
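    Summary Optimizing the configuration and placement of sensors is key to reliable fall detection, indoor localization, and activity recognition in assisted living spaces. We propose a new, sample-efficient method that finds a high-quality sensor placement through grey-box Bayesian optimization and simulation-based evaluation. Our key technical contribution is to capture domain knowledge about the spatial distribution of activities and use it when iteratively selecting query points in Bayesian optimization. On two simulated indoor environments and a real-world dataset of human activities and sensor triggers, we show that the proposed method identifies high-quality sensor placements better than existing black-box optimization techniques, leading to more accurate activity recognition (F1-score) while requiring significantly fewer (51.3% on average) expensive function queries.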

Robot Parkour Learning

  • paper_url: http://arxiv.org/abs/2309.05665
  • repo_url: https://github.com/ZiwenZhuang/parkour
  • paper_authors: Ziwen Zhuang, Zipeng Fu, Jianren Wang, Christopher Atkeson, Soeren Schwertfeger, Chelsea Finn, Hang Zhao
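  • for: This work aims to develop a vision-based parkour policy that lets robots rapidly overcome various obstacles in complex environments.
  • methods: The authors propose a reinforcement learning method that uses a simple reward function to learn diverse vision-based parkour skills, including climbing over high obstacles, leaping over large gaps, crawling beneath low barriers, squeezing through thin slits, and running.
  • results: The experiments show that the system distills these skills into a single vision-based parkour policy and transfers it to a quadrupedal robot using its egocentric depth camera. The system enables two different low-cost robots to autonomously select and execute appropriate parkour skills to traverse challenging real-world environments.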
    Abstract Parkour is a grand challenge for legged locomotion that requires robots to overcome various obstacles rapidly in complex environments. Existing methods can generate either diverse but blind locomotion skills or vision-based but specialized skills by using reference animal data or complex rewards. However, autonomous parkour requires robots to learn generalizable skills that are both vision-based and diverse to perceive and react to various scenarios. In this work, we propose a system for learning a single end-to-end vision-based parkour policy of diverse parkour skills using a simple reward without any reference motion data. We develop a reinforcement learning method inspired by direct collocation to generate parkour skills, including climbing over high obstacles, leaping over large gaps, crawling beneath low barriers, squeezing through thin slits, and running. We distill these skills into a single vision-based parkour policy and transfer it to a quadrupedal robot using its egocentric depth camera. We demonstrate that our system can empower two different low-cost robots to autonomously select and execute appropriate parkour skills to traverse challenging real-world environments.

Hypothesis Search: Inductive Reasoning with Language Models

  • paper_url: http://arxiv.org/abs/2309.05660
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Ruocheng Wang, Eric Zelikman, Gabriel Poesia, Yewen Pu, Nick Haber, Noah D. Goodman
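  • for: This paper aims to improve the inductive reasoning ability of large language models (LLMs).
  • methods: The approach generates abstract hypotheses: the LLM first proposes multiple abstract hypotheses, which are then implemented as Python programs and verified by running them directly on the observed examples.
  • results: The experiments show that this approach substantially improves LLM performance on inductive reasoning tasks. On a random 40-problem subset of the ARC visual inductive reasoning benchmark, the automated pipeline reaches 27.5% accuracy, well above the direct prompting baseline (12.5%). Having humans select among LLM-generated candidates raises accuracy further, to 37.5%.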
    Abstract Inductive reasoning is a core problem-solving capacity: humans can identify underlying principles from a few examples, which can then be robustly generalized to novel scenarios. Recent work has evaluated large language models (LLMs) on inductive reasoning tasks by directly prompting them yielding "in context learning." This can work well for straightforward inductive tasks, but performs very poorly on more complex tasks such as the Abstraction and Reasoning Corpus (ARC). In this work, we propose to improve the inductive reasoning ability of LLMs by generating explicit hypotheses at multiple levels of abstraction: we prompt the LLM to propose multiple abstract hypotheses about the problem, in natural language, then implement the natural language hypotheses as concrete Python programs. These programs can be directly verified by running on the observed examples and generalized to novel inputs. Because of the prohibitive cost of generation with state-of-the-art LLMs, we consider a middle step to filter the set of hypotheses that will be implemented into programs: we either ask the LLM to summarize into a smaller set of hypotheses, or ask human annotators to select a subset of the hypotheses. We verify our pipeline's effectiveness on the ARC visual inductive reasoning benchmark, its variant 1D-ARC, and string transformation dataset SyGuS. On a random 40-problem subset of ARC, our automated pipeline using LLM summaries achieves 27.5% accuracy, significantly outperforming the direct prompting baseline (accuracy of 12.5%). With the minimal human input of selecting from LLM-generated candidates, the performance is boosted to 37.5%. (And we argue this is a lower bound on the performance of our approach without filtering.) Our ablation studies show that abstract hypothesis generation and concrete program representations are both beneficial for LLMs to perform inductive reasoning tasks.
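    Summary Humans can identify underlying principles from a few examples and apply them to new scenarios. Recent work has evaluated large language models (LLMs) on inductive reasoning by prompting them directly, which performs poorly on more complex tasks. In this work, we improve the inductive reasoning ability of LLMs by generating hypotheses at multiple levels of abstraction. We ask the LLM to propose multiple hypotheses in natural language and then convert them into Python programs. These programs can be run directly on the observed examples and generalized to new inputs. Because generation with current LLMs is costly, we add an intermediate step in which the LLM summarizes the generated hypotheses into a smaller set. We validate the approach on the ARC visual inductive reasoning benchmark, its variant 1D-ARC, and the SyGuS string transformation dataset. On a random 40-problem subset of ARC, our automated pipeline using LLM summaries reaches 27.5% accuracy, a significant improvement over the direct prompting baseline (12.5%). With human selection among LLM-generated candidates, accuracy reaches 37.5%. Our ablation studies show that both abstract hypothesis generation and concrete program representations benefit LLMs on inductive reasoning tasks.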
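The verification step described above, running candidate programs on the observed examples and keeping those that reproduce them, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the two candidate programs are hand-written stand-ins for LLM-generated code, and `filter_programs` is a hypothetical helper name.

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]  # (input grid, expected output grid)


def flip_horizontal(grid: Grid) -> Grid:
    """Candidate hypothesis 1: mirror each row left to right."""
    return [list(reversed(row)) for row in grid]


def flip_vertical(grid: Grid) -> Grid:
    """Candidate hypothesis 2: reverse the order of the rows."""
    return [list(row) for row in reversed(grid)]


def filter_programs(
    programs: List[Callable[[Grid], Grid]], examples: List[Example]
) -> List[Callable[[Grid], Grid]]:
    """Keep only the programs that reproduce every observed example."""
    survivors = []
    for program in programs:
        try:
            if all(program(x) == y for x, y in examples):
                survivors.append(program)
        except Exception:
            # A crashing candidate is treated as a failed hypothesis.
            continue
    return survivors


# One observed input/output pair (hypothetical toy data).
examples = [([[1, 0], [2, 3]], [[0, 1], [3, 2]])]
survivors = filter_programs([flip_horizontal, flip_vertical], examples)
survivor_names = [p.__name__ for p in survivors]
```

Any surviving program can then be applied to a held-out input; in a real setting the candidates would come from the LLM rather than being written by hand.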

Large Language Model for Science: A Study on P vs. NP

  • paper_url: http://arxiv.org/abs/2309.05689
  • repo_url: https://github.com/microsoft/LMOps/tree/main/LLM4Science
  • paper_authors: Qingxiu Dong, Li Dong, Ke Xu, Guangyan Zhou, Yaru Hao, Zhifang Sui, Furu Wei
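  • for: This study uses large language models (LLMs) to augment and accelerate research on the P versus NP problem, one of the most important open problems in computer science and mathematics.
  • methods: The study proposes Socratic reasoning, a general framework that promotes in-depth thinking and reasoning with LLMs for complex problem-solving.
  • results: In a pilot study on the P vs. NP problem, the authors report that GPT-4 produced a proof schema and engaged in rigorous reasoning throughout 97 dialogue turns, concluding "P≠NP", in alignment with (Xu and Zhou, 2023).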
    Abstract In this work, we use large language models (LLMs) to augment and accelerate research on the P versus NP problem, one of the most important open problems in theoretical computer science and mathematics. Specifically, we propose Socratic reasoning, a general framework that promotes in-depth thinking with LLMs for complex problem-solving. Socratic reasoning encourages LLMs to recursively discover, solve, and integrate problems while facilitating self-evaluation and refinement. Our pilot study on the P vs. NP problem shows that GPT-4 successfully produces a proof schema and engages in rigorous reasoning throughout 97 dialogue turns, concluding "P $\neq$ NP", which is in alignment with (Xu and Zhou, 2023). The investigation uncovers novel insights within the extensive solution space of LLMs, shedding light on LLM for Science.
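    Summary In this study, we use large language models (LLMs) to augment and accelerate research in theoretical computer science and mathematics, specifically on the P versus NP problem. We propose Socratic reasoning, a framework that promotes in-depth thinking with LLMs for complex problem-solving. Socratic reasoning encourages LLMs to self-evaluate and refine during problem-solving, which promotes in-depth thinking. Our pilot study finds that GPT-4 produced a proof schema and reasoned rigorously throughout 97 dialogue turns, concluding that P≠NP, in agreement with (Xu and Zhou, 2023). The investigation uncovers novel insights within the extensive solution space of LLMs and suggests new directions for applying LLMs to science.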

Combinative Cumulative Knowledge Processes

  • paper_url: http://arxiv.org/abs/2309.05638
  • repo_url: https://github.com/Aryia-Behroziuan/Robot-learning
  • paper_authors: Anna Brandenberger, Cassandra Marcussen, Elchanan Mossel, Madhu Sudan
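  • for: This paper analyzes the Cumulative Knowledge Processes introduced by Ben-Eliezer et al. (ITCS 2023) in the setting of directed acyclic graphs (DAGs), where new units of knowledge may be derived by combining multiple previous units.
  • methods: The earlier work analyzed only an idealized and simplified "tree-like" setting, in which a new unit depends on just one previously generated unit. This paper studies the general process, with the main goal of understanding when it is safe, i.e., when the effect of errors remains under control.
  • results: The paper provides some necessary and some sufficient conditions for safety. As in the earlier work, the frequency and depth of checking play a crucial role in safety. A key new parameter is the combination factor, the distribution of the number of previously generated units $M$ that a new unit depends on. The results indicate that a large combination factor can compensate for a small depth of checking. The dependence of safety on the combination factor is far from trivial: some results are stated in terms of $\mathbb{E}\{1/M\}$, while others depend on $\mathbb{E}\{M\}$.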
    Abstract We analyze Cumulative Knowledge Processes, introduced by Ben-Eliezer, Mikulincer, Mossel, and Sudan (ITCS 2023), in the setting of "directed acyclic graphs", i.e., when new units of knowledge may be derived by combining multiple previous units of knowledge. The main considerations in this model are the role of errors (when new units may be erroneous) and local checking (where a few antecedent units of knowledge are checked when a new unit of knowledge is discovered). The aforementioned work defined this model but only analyzed an idealized and simplified "tree-like" setting, i.e., a setting where new units of knowledge only depended directly on one previously generated unit of knowledge. The main goal of our work is to understand when the general process is safe, i.e., when the effect of errors remains under control. We provide some necessary and some sufficient conditions for safety. As in the earlier work, we demonstrate that the frequency of checking as well as the depth of the checks play a crucial role in determining safety. A key new parameter in the current work is the $\textit{combination factor}$ which is the distribution of the number of units $M$ of old knowledge that a new unit of knowledge depends on. Our results indicate that a large combination factor can compensate for a small depth of checking. The dependency of the safety on the combination factor is far from trivial. Indeed some of our main results are stated in terms of $\mathbb{E}\{1/M\}$ while others depend on $\mathbb{E}\{M\}$.
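    Summary We analyze the Cumulative Knowledge Processes introduced by Ben-Eliezer et al. (ITCS 2023) in the setting of directed acyclic graphs (DAGs), where a new unit of knowledge may be derived by combining multiple previous units. The main considerations in this model are errors (new units may be erroneous) and local checking (a few antecedent units are checked). The earlier work analyzed the model only in an idealized and simplified "tree-like" setting, where a new unit depends on just one previously generated unit. Our main goal is to understand when the process is safe, i.e., when the effect of errors remains under control. We provide some necessary and some sufficient conditions for safety. As in the earlier work, we find that the frequency and the depth of checking play an important role in safety. Our results show that a large combination factor can compensate for a small checking depth. The dependence of safety on this parameter is far from trivial: some of our main results are stated in terms of $\mathbb{E}\{1/M\}$, while others depend on $\mathbb{E}\{M\}$.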
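The distinction between $\mathbb{E}\{1/M\}$ and $\mathbb{E}\{M\}$ matters because the two are not interchangeable: by Jensen's inequality, $\mathbb{E}\{1/M\} \ge 1/\mathbb{E}\{M\}$ for positive $M$, with equality only when $M$ is constant. The sketch below works this out with exact fractions for a made-up combination-factor distribution; the distribution and the helper name `expectation` are assumptions for illustration and do not come from the paper.

```python
from fractions import Fraction
from typing import Callable, Dict


def expectation(dist: Dict[int, Fraction], f: Callable[[int], Fraction]) -> Fraction:
    """Expected value of f(M) when M follows the discrete distribution `dist`."""
    return sum((p * f(m) for m, p in dist.items()), Fraction(0))


# Hypothetical combination factor: a new unit depends on 1 or 4 old units,
# each with probability 1/2.
dist = {1: Fraction(1, 2), 4: Fraction(1, 2)}

e_m = expectation(dist, lambda m: Fraction(m))         # E{M}
e_inv_m = expectation(dist, lambda m: Fraction(1, m))  # E{1/M}
jensen_gap = e_inv_m - 1 / e_m                         # non-negative by Jensen
```

Here $\mathbb{E}\{M\} = 5/2$ while $\mathbb{E}\{1/M\} = 5/8$, which is strictly larger than $1/\mathbb{E}\{M\} = 2/5$, so a condition phrased in one quantity cannot simply be rewritten in terms of the other.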

Exploration and Comparison of Deep Learning Architectures to Predict Brain Response to Realistic Pictures

  • paper_url: http://arxiv.org/abs/2309.09983
  • repo_url: None
  • paper_authors: Riccardo Chimisso, Sathya Buršić, Paolo Marocco, Giuseppe Vizzari, Dimitri Ognibene
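  • for: Predicting brain responses to realistic images.
  • methods: Extensive experiments with various pretrained models, ranging from simple models to more complex architectures, using the available data and generated embeddings.
  • results: Using multiple simple models, each dedicated to predicting the response of a single brain region, gave the best results, but did not establish a robust association with the data.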
    Abstract We present an exploration of machine learning architectures for predicting brain responses to realistic images on occasion of the Algonauts Challenge 2023. Our research involved extensive experimentation with various pretrained models. Initially, we employed simpler models to predict brain activity but gradually introduced more complex architectures utilizing available data and embeddings generated by large-scale pre-trained models. We encountered typical difficulties related to machine learning problems, e.g. regularization and overfitting, as well as issues specific to the challenge, such as difficulty in combining multiple input encodings, as well as the high dimensionality, unclear structure, and noisy nature of the output. To overcome these issues we tested single edge 3D position-based, multi-region of interest (ROI) and hemisphere predictor models, but we found that employing multiple simple models, each dedicated to a ROI in each hemisphere of the brain of each subject, yielded the best results - a single fully connected linear layer with image embeddings generated by CLIP as input. While we surpassed the challenge baseline, our results fell short of establishing a robust association with the data.
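    Summary We present an exploration of machine learning architectures for predicting brain responses to realistic images, carried out for the Algonauts Challenge 2023. Our research involved experiments with many pretrained models. We first used simple models to predict brain activity and then gradually introduced more complex architectures, using the available data and embeddings generated by large-scale pretrained models. We encountered common machine learning problems, such as regularization and overfitting, as well as challenge-specific issues, such as combining multiple input encodings and the high dimensionality, unclear structure, and noisy nature of the output. To address these issues we tested single edge 3D position-based, multi-region of interest (ROI), and hemisphere predictor models, but we found that the best results came from multiple simple models, one per ROI in each hemisphere of each subject, each a single fully connected linear layer taking CLIP-generated image embeddings as input. Although we surpassed the baseline, our results did not establish a robust association with the data.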
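The "one simple linear model per ROI and hemisphere" idea can be sketched as a dictionary of independently fitted linear maps. This is an illustration only: the embeddings are random stand-ins for CLIP features, the ROI names and sizes are made up, the targets are synthetic and noise-free, and an ordinary least-squares fit without a bias term stands in for training a fully connected linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)
n_images, embed_dim = 50, 8

# Hypothetical (hemisphere, ROI) -> number of response values to predict.
rois = {("left", "V1"): 3, ("right", "V1"): 4}

# Stand-in for image embeddings produced by a pretrained model such as CLIP.
embeddings = rng.normal(size=(n_images, embed_dim))

models = {}
targets = {}
for key, n_vertices in rois.items():
    # Synthetic, noise-free responses so that the fit can be checked exactly.
    true_weights = rng.normal(size=(embed_dim, n_vertices))
    targets[key] = embeddings @ true_weights
    # One independent linear map per (hemisphere, ROI).
    weights, *_ = np.linalg.lstsq(embeddings, targets[key], rcond=None)
    models[key] = weights

predictions = {key: embeddings @ w for key, w in models.items()}
```

With real, noisy responses the fit would not be exact, and some regularization (for example ridge regression) would typically be needed.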

Memory Injections: Correcting Multi-Hop Reasoning Failures during Inference in Transformer-Based Language Models

  • paper_url: http://arxiv.org/abs/2309.05605
  • repo_url: https://github.com/msakarvadia/memory_injections
  • paper_authors: Mansi Sakarvadia, Aswathy Ajith, Arham Khan, Daniel Grzenda, Nathaniel Hudson, André Bauer, Kyle Chard, Ian Foster
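  • for: This paper aims to improve the performance of large language models (LLMs) on multi-hop reasoning tasks.
  • methods: The approach performs targeted memory injections on LLM attention heads, helping the LLM incorporate additional relevant information during multi-hop reasoning.
  • results: Experiments show that a simple, efficient, and targeted memory injection into a key attention layer can improve multi-hop task performance, increasing the probability of the desired next token by up to 424%.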
    Abstract Answering multi-hop reasoning questions requires retrieving and synthesizing information from diverse sources. Large Language Models (LLMs) struggle to perform such reasoning consistently. Here we propose an approach to pinpoint and rectify multi-hop reasoning failures through targeted memory injections on LLM attention heads. First, we analyze the per-layer activations of GPT-2 models in response to single and multi-hop prompts. We then propose a mechanism that allows users to inject pertinent prompt-specific information, which we refer to as "memories," at critical LLM locations during inference. By thus enabling the LLM to incorporate additional relevant information during inference, we enhance the quality of multi-hop prompt completions. We show empirically that a simple, efficient, and targeted memory injection into a key attention layer can often increase the probability of the desired next token in multi-hop tasks, by up to 424%.
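To make the reported metric concrete: a relative increase of 424% means the probability of the desired token becomes 5.24 times its original value. The toy sketch below shows how adding a scaled "memory" vector to a hidden state can shift next-token probabilities and how the relative increase is computed. It is a deliberate simplification: the vector is added directly before a toy unembedding matrix rather than at an attention layer of GPT-2, and all numbers and names (`unembed`, `memory`, `tau`) are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_from_hidden(hidden: List[float], unembed: List[List[float]]) -> List[float]:
    """Project a hidden state onto one logit per vocabulary token."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in unembed]


# Toy setup: 3 tokens, hidden size 2 (all values are made up).
unembed = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
hidden = [0.2, 0.5]
memory = [1.0, 0.0]  # direction associated with the desired token 0
tau = 2.0            # injection magnitude

injected = [h + tau * m for h, m in zip(hidden, memory)]

p_before = softmax(logits_from_hidden(hidden, unembed))[0]
p_after = softmax(logits_from_hidden(injected, unembed))[0]
pct_increase = 100.0 * (p_after - p_before) / p_before
```

In this toy case the injection raises the probability of token 0; how large the effect is in a real model depends on where and with what magnitude the memory is injected.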

Introspective Deep Metric Learning

  • paper_url: http://arxiv.org/abs/2309.09982
  • repo_url: https://github.com/wzzheng/IDML
  • paper_authors: Chengkun Wang, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu
  • for: Proposes an introspective deep metric learning (IDML) framework to address uncertainty in deep metric learning.
  • methods: Represents each image with a semantic embedding and an uncertainty embedding, which describe the image's semantics and ambiguity respectively, and uses an introspective similarity metric for similarity judgments.
  • results: On the CUB-200-2011, Cars196, and Stanford Online Products datasets, the IDML framework outperforms conventional deep metric learning methods and handles ambiguous images better.
    Abstract This paper proposes an introspective deep metric learning (IDML) framework for uncertainty-aware comparisons of images. Conventional deep metric learning methods focus on learning a discriminative embedding to describe the semantic features of images, which ignore the existence of uncertainty in each image resulting from noise or semantic ambiguity. Training without awareness of these uncertainties causes the model to overfit the annotated labels during training and produce unsatisfactory judgments during inference. Motivated by this, we argue that a good similarity model should consider the semantic discrepancies with awareness of the uncertainty to better deal with ambiguous images for more robust training. To achieve this, we propose to represent an image using not only a semantic embedding but also an accompanying uncertainty embedding, which describes the semantic characteristics and ambiguity of an image, respectively. We further propose an introspective similarity metric to make similarity judgments between images considering both their semantic differences and ambiguities. The gradient analysis of the proposed metric shows that it enables the model to learn at an adaptive and slower pace to deal with the uncertainty during training. The proposed IDML framework improves the performance of deep metric learning through uncertainty modeling and attains state-of-the-art results on the widely used CUB-200-2011, Cars196, and Stanford Online Products datasets for image retrieval and clustering. We further provide an in-depth analysis of our framework to demonstrate the effectiveness and reliability of IDML. Code: https://github.com/wzzheng/IDML.

Temporal Action Localization with Enhanced Instant Discriminability

  • paper_url: http://arxiv.org/abs/2309.05590
  • repo_url: https://github.com/dingfengshi/tridetplus
  • paper_authors: Dingfeng Shi, Qiong Cao, Yujie Zhong, Shan An, Jian Cheng, Haogang Zhu, Dacheng Tao
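  • for: This paper proposes TriDet, a one-stage framework for detecting action boundaries and their corresponding categories in videos.
  • methods: The paper models action boundaries with a Trident-head and proposes an efficient scalable-granularity perception (SGP) layer to mitigate the rank-loss problem in transformer-based methods. It also leverages pretrained large models to strengthen the representation capability of the video backbone.
  • results: Experiments show that TriDet is robust and achieves state-of-the-art performance on multiple action detection datasets, including hierarchical (multilabel) ones.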
    Abstract Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video. The unclear boundaries of actions in videos often result in imprecise predictions of action boundaries by existing methods. To resolve this issue, we propose a one-stage framework named TriDet. First, we propose a Trident-head to model the action boundary via an estimated relative probability distribution around the boundary. Then, we analyze the rank-loss problem (i.e. instant discriminability deterioration) in transformer-based methods and propose an efficient scalable-granularity perception (SGP) layer to mitigate this issue. To further push the limit of instant discriminability in the video backbone, we leverage the strong representation capability of pretrained large models and investigate their performance on TAD. Last, considering the adequate spatial-temporal context for classification, we design a decoupled feature pyramid network with separate feature pyramids to incorporate rich spatial context from the large model for localization. Experimental results demonstrate the robustness of TriDet and its state-of-the-art performance on multiple TAD datasets, including hierarchical (multilabel) TAD datasets.
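One common way to turn a per-bin probability distribution around a boundary into a single offset is to take its expectation. The sketch below illustrates that idea only; the bin layout, the name `expected_offset`, and the example logits are assumptions for illustration and are not taken from the TriDet code.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_offset(bin_logits: List[float]) -> float:
    """Expected bin index under the softmax distribution over bins.

    Bin i is assumed to mean "the boundary lies i steps away from the
    current instant"; the expectation gives a soft, sub-bin estimate.
    """
    probs = softmax(bin_logits)
    return sum(i * p for i, p in enumerate(probs))


uniform_offset = expected_offset([0.0, 0.0, 0.0])  # no preference among 3 bins
peaked_offset = expected_offset([0.0, 0.0, 50.0])  # mass concentrated on bin 2
```

A flat distribution yields the middle of the bin range, while a sharply peaked one yields an offset close to the dominant bin, which is how an unclear boundary translates into a softer estimate.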

Mind the Uncertainty: Risk-Aware and Actively Exploring Model-Based Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.05582
  • repo_url: None
  • paper_authors: Marin Vlastelica, Sebastian Blaes, Cristina Pineri, Georg Martius
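  • for: This paper addresses risk management in model-based reinforcement learning, using trajectory sampling and probabilistic safety constraints and balancing optimism and pessimism across two kinds of uncertainty.
  • methods: The paper uses a simple yet effective method that separates uncertainties in model-based reinforcement learning and manages risk through probabilistic safety constraints and trajectory sampling.
  • results: Various experiments indicate that separating uncertainties is essential for data-driven MPC approaches to perform well in uncertain and safety-critical control environments.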
    Abstract We introduce a simple but effective method for managing risk in model-based reinforcement learning with trajectory sampling that involves probabilistic safety constraints and balancing of optimism in the face of epistemic uncertainty and pessimism in the face of aleatoric uncertainty of an ensemble of stochastic neural networks. Various experiments indicate that the separation of uncertainties is essential to performing well with data-driven MPC approaches in uncertain and safety-critical control environments.
    Summary We introduce a simple yet effective method for managing risk in model-based reinforcement learning. The method separates epistemic uncertainty from aleatoric uncertainty and balances optimism toward the former with pessimism toward the latter. Our experiments show that separating the uncertainties is key to managing risk in uncertain and safety-critical control environments.
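A standard way to separate the two uncertainties in an ensemble of probabilistic models is to treat the average predicted variance as aleatoric and the disagreement between member means as epistemic. The sketch below shows that decomposition for a single prediction. The scoring rule `risk_aware_score` and its weights `beta` and `lam` are a hypothetical illustration of being optimistic about epistemic and pessimistic about aleatoric uncertainty, not the paper's objective.

```python
import math
from statistics import fmean, pvariance
from typing import List, Tuple


def decompose_uncertainty(means: List[float], variances: List[float]) -> Tuple[float, float]:
    """Split ensemble uncertainty into (aleatoric, epistemic) parts.

    means[i] and variances[i] are the Gaussian mean and variance predicted
    by ensemble member i for the same input.
    """
    aleatoric = fmean(variances)  # average noise the members predict
    epistemic = pvariance(means)  # disagreement between the members
    return aleatoric, epistemic


def risk_aware_score(reward: float, aleatoric: float, epistemic: float,
                     beta: float = 1.0, lam: float = 1.0) -> float:
    """Hypothetical score: bonus for epistemic, penalty for aleatoric uncertainty."""
    return reward + beta * math.sqrt(epistemic) - lam * math.sqrt(aleatoric)


# Made-up predictions from a 3-member ensemble.
aleatoric, epistemic = decompose_uncertainty([1.0, 1.2, 0.8], [0.1, 0.2, 0.3])
score = risk_aware_score(reward=1.0, aleatoric=aleatoric, epistemic=epistemic)
```

Keeping the two terms separate is what allows them to be weighted with opposite signs; a single pooled variance would not permit that.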

ITI-GEN: Inclusive Text-to-Image Generation

  • paper_url: http://arxiv.org/abs/2309.05569
  • repo_url: https://github.com/humansensinglab/ITI-GEN
  • paper_authors: Cheng Zhang, Xuanbai Chen, Siqi Chai, Chen Henry Wu, Dmitry Lagun, Thabo Beeler, Fernando De la Torre
  • for: This paper addresses the unequal representation of underrepresented groups in text-to-image generative models by proposing a novel approach called ITI-GEN.
  • methods: ITI-GEN leverages readily available reference images to learn prompt embeddings that can generate inclusive images from human-written prompts. The approach requires no model fine-tuning, which makes it computationally efficient.
  • results: Extensive experiments demonstrate that ITI-GEN largely improves over state-of-the-art models in generating inclusive images from prompts, ensuring that all desired attribute categories are represented uniformly.
    Abstract Text-to-image generative models often reflect the biases of the training data, leading to unequal representations of underrepresented groups. This study investigates inclusive text-to-image generative models that generate images based on human-written prompts and ensure the resulting images are uniformly distributed across attributes of interest. Unfortunately, directly expressing the desired attributes in the prompt often leads to sub-optimal results due to linguistic ambiguity or model misrepresentation. Hence, this paper proposes a drastically different approach that adheres to the maxim that "a picture is worth a thousand words". We show that, for some attributes, images can represent concepts more expressively than text. For instance, categories of skin tones are typically hard to specify by text but can be easily represented by example images. Building upon these insights, we propose a novel approach, ITI-GEN, that leverages readily available reference images for Inclusive Text-to-Image GENeration. The key idea is learning a set of prompt embeddings to generate images that can effectively represent all desired attribute categories. More importantly, ITI-GEN requires no model fine-tuning, making it computationally efficient to augment existing text-to-image models. Extensive experiments demonstrate that ITI-GEN largely improves over state-of-the-art models to generate inclusive images from a prompt. Project page: https://czhang0528.github.io/iti-gen.
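    Summary Text-to-image generative models often reflect the biases of their training data, leading to unequal representation of certain groups. This study investigates inclusive text-to-image generative models that generate images from human-written prompts and ensure the generated images are uniformly distributed across attributes of interest. However, directly expressing the desired attributes in the prompt often leads to sub-optimal results because of linguistic ambiguity or model misrepresentation. This paper therefore proposes a drastically different approach that follows the maxim "a picture is worth a thousand words": for some attributes, images can represent concepts more expressively than text. For example, skin tone categories are typically hard to specify in text but easy to represent with example images. Building on these insights, we propose ITI-GEN, a new approach that leverages readily available reference images for inclusive text-to-image generation. The key idea is to learn a set of prompt embeddings that generate images effectively representing all desired attribute categories. More importantly, ITI-GEN requires no model fine-tuning, so it can augment existing text-to-image models in a computationally efficient way. Extensive experiments show that ITI-GEN substantially improves over state-of-the-art models at generating inclusive images from a prompt. Project page: https://czhang0528.github.io/iti-gen.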

An Empirical Study of NetOps Capability of Pre-Trained Large Language Models

  • paper_url: http://arxiv.org/abs/2309.05557
  • repo_url: None
  • paper_authors: Yukai Miao, Yu Bai, Li Chen, Dan Li, Haifeng Sun, Xizheng Wang, Ziqiu Luo, Yanyu Ren, Dapeng Sun, Xiuting Xu, Qi Zhang, Chao Xiang, Xinchi Li
  • for: The paper evaluates the comprehensive capabilities of pre-trained large language models (LLMs) in Network Operations (NetOps), measured in a multi-lingual context.
  • methods: The paper presents NetEval, an evaluation set of 5,732 questions about NetOps covering five different sub-domains. The authors use NetEval to systematically evaluate the NetOps capability of 26 publicly available LLMs.
  • results: Only GPT-4 achieves performance competitive with humans in NetOps, while some open models such as LLaMA 2 demonstrate significant potential.
    Abstract Nowadays, the versatile capabilities of Pre-trained Large Language Models (LLMs) have attracted much attention from the industry. However, some vertical domains are more interested in the in-domain capabilities of LLMs. For the Networks domain, we present NetEval, an evaluation set for measuring the comprehensive capabilities of LLMs in Network Operations (NetOps). NetEval is designed for evaluating the commonsense knowledge and inference ability in NetOps in a multi-lingual context. NetEval consists of 5,732 questions about NetOps, covering five different sub-domains of NetOps. With NetEval, we systematically evaluate the NetOps capability of 26 publicly available LLMs. The results show that only GPT-4 can achieve a performance competitive to humans. However, some open models like LLaMA 2 demonstrate significant potential.

Hybrid ASR for Resource-Constrained Robots: HMM - Deep Learning Fusion

  • paper_url: http://arxiv.org/abs/2309.07164
  • repo_url: https://github.com/anshulranjan2004/pyhmm
  • paper_authors: Anshul Ranjan, Kaushik Jegadeesan
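  • for: This study develops an automatic speech recognition (ASR) system for resource-constrained robots.
  • methods: The approach combines Hidden Markov Models (HMMs) with deep learning models and uses socket programming to distribute processing tasks, improving speech recognition accuracy.
  • results: Experiments show that the hybrid ASR system delivers real-time, precise speech recognition on various robotic platforms and adapts to changing acoustic conditions and low-power hardware.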
    Abstract This paper presents a novel hybrid Automatic Speech Recognition (ASR) system designed specifically for resource-constrained robots. The proposed approach combines Hidden Markov Models (HMMs) with deep learning models and leverages socket programming to distribute processing tasks effectively. In this architecture, the HMM-based processing takes place within the robot, while a separate PC handles the deep learning model. This synergy between HMMs and deep learning enhances speech recognition accuracy significantly. We conducted experiments across various robotic platforms, demonstrating real-time and precise speech recognition capabilities. Notably, the system exhibits adaptability to changing acoustic conditions and compatibility with low-power hardware, making it highly effective in environments with limited computational resources. This hybrid ASR paradigm opens up promising possibilities for seamless human-robot interaction. In conclusion, our research introduces a pioneering dimension to ASR techniques tailored for robotics. By employing socket programming to distribute processing tasks across distinct devices and strategically combining HMMs with deep learning models, our hybrid ASR system showcases its potential to enable robots to comprehend and respond to spoken language adeptly, even in environments with restricted computational resources. This paradigm sets an innovative course for enhancing human-robot interaction across a wide range of real-world scenarios.
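    Summary This paper presents a new hybrid automatic speech recognition (ASR) system designed for resource-constrained robots. The system combines Hidden Markov Models (HMMs) with deep learning models and uses socket programming to distribute processing tasks, which improves recognition accuracy. In this architecture, HMM-based processing runs on the robot, while a separate PC handles the deep learning model. Experiments on various robotic platforms show real-time, precise speech recognition. The system adapts to changing acoustic conditions and is compatible with low-power hardware, so it performs well where computational resources are limited. This hybrid ASR approach opens new possibilities for human-robot interaction, letting robots understand and respond to spoken language across a range of real-world scenarios.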
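    The split between on-robot processing and a PC-side model connected over sockets can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' protocol: the abstract does not specify what is exchanged, so the newline-delimited JSON messages, the scored hypothesis list, and the `rescore_stub` function (standing in for the deep learning model) are all hypothetical. Both ends run on the loopback interface here so that the example is self-contained.

```python
import json
import socket
import threading


def rescore_stub(hypotheses):
    """Hypothetical stand-in for the PC-side model: pick the best-scored text."""
    return max(hypotheses, key=lambda h: h["score"])["text"]


def serve_once(server_sock):
    """PC side: accept one connection, read one JSON line, send one JSON reply."""
    conn, _ = server_sock.accept()
    with conn, conn.makefile("r") as reader:
        request = json.loads(reader.readline())
        reply = {"text": rescore_stub(request["hypotheses"])}
        conn.sendall((json.dumps(reply) + "\n").encode())


def robot_request(port, hypotheses):
    """Robot side: send candidate transcriptions and wait for the final text."""
    with socket.create_connection(("127.0.0.1", port)) as sock, sock.makefile("r") as reader:
        sock.sendall((json.dumps({"hypotheses": hypotheses}) + "\n").encode())
        return json.loads(reader.readline())["text"]


server = socket.socket()
server.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
server.listen(1)
port = server.getsockname()[1]
worker = threading.Thread(target=serve_once, args=(server,))
worker.start()
result = robot_request(port, [{"text": "go left", "score": 0.4},
                              {"text": "go back", "score": 0.7}])
worker.join()
server.close()
```

    On real hardware the server would run on the PC and the robot would connect to its network address; the point of the design is that the robot only needs enough compute for the lightweight stage and a TCP client.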

Kani: A Lightweight and Highly Hackable Framework for Building Language Model Applications

  • paper_url: http://arxiv.org/abs/2309.05542
  • repo_url: https://github.com/zhudotexe/kani
  • paper_authors: Andrew Zhu, Liam Dugan, Alyssa Hwang, Chris Callison-Burch
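  • for: The paper presents a lightweight, flexible, model-agnostic open-source framework for building language model applications.
  • methods: The framework supports complex features through core building blocks: model interfacing, chat management, and robust function calling. All core functions are easily overridable and well documented, so developers can customize them for their own needs.
  • results: The framework helps developers build complex language model applications quickly while retaining interoperability and fine-grained control.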
    Abstract Language model applications are becoming increasingly popular and complex, often including features like tool usage and retrieval augmentation. However, existing frameworks for such applications are often opinionated, deciding for developers how their prompts ought to be formatted and imposing limitations on customizability and reproducibility. To solve this we present Kani: a lightweight, flexible, and model-agnostic open-source framework for building language model applications. Kani helps developers implement a variety of complex features by supporting the core building blocks of chat interaction: model interfacing, chat management, and robust function calling. All Kani core functions are easily overridable and well documented to empower developers to customize functionality for their own needs. Kani thus serves as a useful tool for researchers, hobbyists, and industry professionals alike to accelerate their development while retaining interoperability and fine-grained control.
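    Summary Language model applications are becoming increasingly popular and complex, and often include features such as tool usage and retrieval augmentation. Existing frameworks, however, are often opinionated: they decide how developers' prompts should be formatted and limit customizability and reproducibility. To address this, the authors present Kani, a lightweight, flexible, and model-agnostic open-source framework for building language model applications. Kani helps developers implement a variety of complex features by supporting the core building blocks of chat interaction: model interfacing, chat management, and robust function calling. All Kani core functions are easily overridable and well documented, so developers can customize functionality for their own needs. Kani is therefore a useful tool for researchers, hobbyists, and industry professionals to accelerate development while retaining interoperability and fine-grained control.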

PAI-Diffusion: Constructing and Serving a Family of Open Chinese Diffusion Models for Text-to-image Synthesis on the Cloud

  • paper_url: http://arxiv.org/abs/2309.05534
  • repo_url: None
  • paper_authors: Chengyu Wang, Zhongjie Duan, Bingyan Liu, Xinyi Zou, Cen Chen, Kui Jia, Jun Huang
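  • for: The paper proposes a comprehensive framework that covers both general and domain-specific Chinese diffusion models in order to generate contextually relevant images.
  • methods: The framework combines general diffusion models with domain-specific Chinese diffusion models and uses LoRA and ControlNet for fine-grained image style transfer and image editing.
  • results: Evaluation on several benchmark tasks and real-world application scenarios shows that the PAI-Diffusion framework performs well at generating contextually relevant images.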
    Abstract Text-to-image synthesis for the Chinese language poses unique challenges due to its large vocabulary size, and intricate character relationships. While existing diffusion models have shown promise in generating images from textual descriptions, they often neglect domain-specific contexts and lack robustness in handling the Chinese language. This paper introduces PAI-Diffusion, a comprehensive framework that addresses these limitations. PAI-Diffusion incorporates both general and domain-specific Chinese diffusion models, enabling the generation of contextually relevant images. It explores the potential of using LoRA and ControlNet for fine-grained image style transfer and image editing, empowering users with enhanced control over image generation. Moreover, PAI-Diffusion seamlessly integrates with Alibaba Cloud's Machine Learning Platform for AI, providing accessible and scalable solutions. All the Chinese diffusion model checkpoints, LoRAs, and ControlNets, including domain-specific ones, are publicly available. A user-friendly Chinese WebUI and the diffusers-api elastic inference toolkit, also open-sourced, further facilitate the easy deployment of PAI-Diffusion models in various environments, making it a valuable resource for Chinese text-to-image synthesis.
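    Summary Text-to-image synthesis for Chinese poses unique challenges because of the language's large vocabulary and the intricate relationships between characters. Existing diffusion models have shown promise in generating images from textual descriptions, but they often neglect domain-specific contexts and lack robustness in handling Chinese. This paper introduces the PAI-Diffusion framework to address these limitations. PAI-Diffusion incorporates both general and domain-specific Chinese diffusion models, enabling the generation of contextually relevant images. It also explores LoRA and ControlNet for fine-grained image style transfer and image editing, giving users more control over image generation. PAI-Diffusion integrates with Alibaba Cloud's Machine Learning Platform for AI, providing accessible and scalable solutions. All Chinese diffusion model checkpoints, LoRAs, and ControlNets, including the domain-specific ones, are publicly available. A user-friendly Chinese WebUI and the diffusers-api toolkit make it easy to deploy PAI-Diffusion models in various environments, making the framework a valuable resource for Chinese text-to-image synthesis.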

On the meaning of uncertainty for ethical AI: philosophy and practice

  • paper_url: http://arxiv.org/abs/2309.05529
  • repo_url: None
  • paper_authors: Cassandra Bird, Daniel Williamson, Sabina Leonelli
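  • for: The paper examines how to increase the transparency and accountability of AI systems so that they respond better to user feedback and evaluation.
  • methods: The paper proposes explicitly acknowledging the statistical foundations that underpin the development of AI systems, and the limits these place on interpreting and acting on results, in order to improve model responsiveness, the quality and meaning of uncertainty on outputs, and transparency to evaluation.
  • results: The paper extends Posterior Belief Assessment to offer a route to belief ownership, and argues that this is a significant way to bring ethical considerations into mathematical reasoning and to implement ethical AI in statistical practice. It gives a practical example on the spread of the Omicron variant of COVID-19.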
    Abstract Whether and how data scientists, statisticians and modellers should be accountable for the AI systems they develop remains a controversial and highly debated topic, especially given the complexity of AI systems and the difficulties in comparing and synthesising competing claims arising from their deployment for data analysis. This paper proposes to address this issue by decreasing the opacity and heightening the accountability of decision making using AI systems, through the explicit acknowledgement of the statistical foundations that underpin their development and the ways in which these dictate how their results should be interpreted and acted upon by users. In turn, this enhances (1) the responsiveness of the models to feedback, (2) the quality and meaning of uncertainty on their outputs and (3) their transparency to evaluation. To exemplify this approach, we extend Posterior Belief Assessment to offer a route to belief ownership from complex and competing AI structures. We argue that this is a significant way to bring ethical considerations into mathematical reasoning, and to implement ethical AI in statistical practice. We demonstrate these ideas within the context of competing models used to advise the UK government on the spread of the Omicron variant of COVID-19 during December 2021.
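    Summary Whether and how data scientists, statisticians, and modellers should be accountable for the AI systems they develop is a controversial and highly debated topic, especially given the complexity of AI systems and the difficulty of comparing and synthesising competing claims that arise when they are deployed for data analysis. This paper proposes to decrease the opacity and heighten the accountability of decision making with AI systems by explicitly acknowledging the statistical foundations that underpin their development and how these dictate the way users should interpret and act on the results. This improves (1) the responsiveness of the models to feedback, (2) the quality and meaning of uncertainty on their outputs, and (3) their transparency to evaluation. To exemplify the approach, the authors extend Posterior Belief Assessment to offer a route to belief ownership from complex and competing AI structures. They argue that this is a significant way to bring ethical considerations into mathematical reasoning and to implement ethical AI in statistical practice. The ideas are demonstrated on the competing models used to advise the UK government on the spread of the Omicron variant of COVID-19 in December 2021.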

NExT-GPT: Any-to-Any Multimodal LLM

  • paper_url: http://arxiv.org/abs/2309.05519
  • repo_url: https://github.com/NExT-GPT/NExT-GPT
  • paper_authors: Shengqiong Wu, Hao Fei, Leigang Qu, Wei Ji, Tat-Seng Chua
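  • for: The paper develops a multimodal large language model (MM-LLM) system that handles multiple modalities, to model how humans perceive and communicate through various modalities.
  • methods: The paper connects a language model (LLM) with multimodal adaptors and different diffusion decoders so that the system can accept inputs and generate outputs in multiple modalities. It also introduces modality-switching instruction tuning (MosIT) and manually curates a high-quality multimodal dataset, giving NExT-GPT cross-modal semantic understanding and content generation abilities.
  • results: After training, NExT-GPT can convert between inputs and outputs in multiple modalities, with strong content understanding and generation across them. The paper also reports gains on different tasks such as image captioning and text generation.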
    Abstract While recently Multimodal Large Language Models (MM-LLMs) have made exciting strides, they mostly fall prey to the limitation of only input-side multimodal understanding, without the ability to produce content in multiple modalities. As we humans always perceive the world and communicate with people through various modalities, developing any-to-any MM-LLMs capable of accepting and delivering content in any modality becomes essential to human-level AI. To fill the gap, we present an end-to-end general-purpose any-to-any MM-LLM system, NExT-GPT. We connect an LLM with multimodal adaptors and different diffusion decoders, enabling NExT-GPT to perceive inputs and generate outputs in arbitrary combinations of text, images, videos, and audio. By leveraging the existing well-trained highly-performing encoders and decoders, NExT-GPT is tuned with only a small amount of parameter (1%) of certain projection layers, which not only benefits low-cost training and also facilitates convenient expansion to more potential modalities. Moreover, we introduce a modality-switching instruction tuning (MosIT) and manually curate a high-quality dataset for MosIT, based on which NExT-GPT is empowered with complex cross-modal semantic understanding and content generation. Overall, our research showcases the promising possibility of building an AI agent capable of modeling universal modalities, paving the way for more human-like AI research in the community. Project page: https://next-gpt.github.io/
    Summary Recently, Multimodal Large Language Models (MM-LLMs) have made significant progress, but they are limited to only understanding input-side multimodality and cannot produce content in multiple modalities. As humans perceive the world and communicate with others through various modalities, developing any-to-any MM-LLMs that can accept and deliver content in any modality is essential for human-level AI. To address this gap, we propose an end-to-end general-purpose any-to-any MM-LLM system called NExT-GPT. We connect an LLM with multimodal adaptors and different diffusion decoders, allowing NExT-GPT to perceive inputs and generate outputs in arbitrary combinations of text, images, videos, and audio. By leveraging existing well-trained and highly-performing encoders and decoders, NExT-GPT is trained with only a small amount of parameters (1% of certain projection layers), which not only reduces training costs but also facilitates the addition of more potential modalities. Moreover, we introduce a modality-switching instruction tuning (MosIT) and manually curate a high-quality dataset for MosIT, which enables NExT-GPT to understand complex cross-modal semantics and generate content in various modalities. Our research demonstrates the possibility of building an AI agent capable of modeling universal modalities, paving the way for more human-like AI research in the community. Project page:

Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs

  • paper_url: http://arxiv.org/abs/2309.05516
  • repo_url: https://github.com/intel/neural-compressor
  • paper_authors: Wenhua Cheng, Weiwei Zhang, Haihao Shen, Yiyang Cai, Xin He, Kaokao Lv
  • for: The paper optimizes the weight rounding task for weight-only quantization of large language models, to improve deployment efficiency while maintaining accuracy.
  • methods: The proposed method, SignRound, uses lightweight block-wise tuning with signed gradient descent to optimize weight rounding, and achieves outstanding results within 400 steps.
  • results: SignRound outperforms the established rounding-to-nearest (RTN) baseline and competes impressively against recent methods, without introducing additional inference overhead.
    Abstract Large Language Models (LLMs) have proven their exceptional capabilities in performing language-related tasks. However, their deployment poses significant challenges due to their considerable memory and storage requirements. In response to this issue, weight-only quantization, particularly 3 and 4-bit weight-only quantization, has emerged as one of the most viable solutions. As the number of bits decreases, the quantization grid broadens, thus emphasizing the importance of up and down rounding. While previous studies have demonstrated that fine-tuning up and down rounding with the addition of perturbations can enhance accuracy in some scenarios, our study is driven by the precise and limited boundary of these perturbations, where only the threshold for altering the rounding value is of significance. Consequently, we propose a concise and highly effective approach for optimizing the weight rounding task. Our method, named SignRound, involves lightweight block-wise tuning using signed gradient descent, enabling us to achieve outstanding results within 400 steps. SignRound outperforms the established baseline of rounding-to-nearest (RTN) and competes impressively against recent methods, without introducing additional inference overhead. The source code will be publicly available at https://github.com/intel/neural-compressor soon.
    Summary To optimize the weight rounding task, we propose a concise and highly effective approach named SignRound. Our method uses lightweight block-wise tuning with signed gradient descent, achieving outstanding results within 400 steps. SignRound outperforms the established baseline of rounding-to-nearest (RTN) and competes impressively against recent methods without introducing additional inference overhead. The source code will be publicly available at https://github.com/intel/neural-compressor soon.
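    The idea of tuning up/down rounding with signed gradient descent can be sketched on a toy problem. This is a simplified illustration under stated assumptions, not the neural-compressor implementation: it tunes a single weight vector rather than a transformer block, passes gradients through the rounding with a straight-through estimator, and the step count, learning rate, 4-bit range, and all function names are illustrative choices. A per-weight offset `v` in [-0.5, 0.5] decides whether each weight rounds up or down, and only the sign of the gradient is used for the update.

```python
import math
import random


def fake_quant(w, scale, v, qmin=-8, qmax=7):
    """scale * clip(floor(w/scale + v + 0.5)); v = 0 is round-to-nearest (RTN)."""
    return [scale * max(qmin, min(qmax, math.floor(wi / scale + vi + 0.5)))
            for wi, vi in zip(w, v)]


def output_error(xs, w, wq):
    """Squared error between full-precision and quantized outputs x . w."""
    return sum(sum(xj * (b - a) for xj, a, b in zip(x, w, wq)) ** 2 for x in xs)


def sign(g):
    return (g > 0) - (g < 0)


def sign_round(xs, w, scale, steps=200, lr=0.005):
    """Tune rounding offsets with signed gradient descent; keep the best seen."""
    v = [0.0] * len(w)
    best_v, best_err = list(v), output_error(xs, w, fake_quant(w, scale, v))
    for _ in range(steps):
        wq = fake_quant(w, scale, v)
        residual = [sum(xj * (b - a) for xj, a, b in zip(x, w, wq)) for x in xs]
        # Gradient of the error w.r.t. each quantized weight; the straight-through
        # estimator treats it as the gradient w.r.t. v (up to the positive scale).
        grad = [2.0 * sum(r * x[j] for r, x in zip(residual, xs)) for j in range(len(w))]
        v = [max(-0.5, min(0.5, vj - lr * sign(g))) for vj, g in zip(v, grad)]
        err = output_error(xs, w, fake_quant(w, scale, v))
        if err < best_err:
            best_v, best_err = list(v), err
    return best_v, best_err


random.seed(0)
w = [random.uniform(-1, 1) for _ in range(16)]                      # toy weights
xs = [[random.gauss(0, 1) for _ in range(16)] for _ in range(32)]   # calibration inputs
scale = max(abs(wi) for wi in w) / 7
rtn_err = output_error(xs, w, fake_quant(w, scale, [0.0] * len(w)))
best_v, best_err = sign_round(xs, w, scale)
```

    Because the search starts from `v = 0` and keeps the best offsets seen, the tuned error can never be worse than the RTN error on the calibration inputs. The offsets only change the stored integer weights, which is consistent with the abstract's claim of no additional inference overhead.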

A Co-design Study for Multi-Stakeholder Job Recommender System Explanations

  • paper_url: http://arxiv.org/abs/2309.05507
  • repo_url: https://github.com/roan-schellingerhout/jrs_explanations
  • paper_authors: Roan Schellingerhout, Francesco Barile, Nava Tintarev
  • for: The paper aims to determine the explanation preferences of different stakeholder types in the recruitment process: candidates, recruiters, and companies.
  • methods: The authors created a semi-structured interview guide and used grounded theory to analyze the results, finding that each stakeholder type has distinct explanation preferences.
  • results: Candidates prefer brief, textual explanations, hiring managers prefer visual graph-based explanations, and recruiters prefer more exhaustive textual explanations. Based on these findings, the authors provide guidelines for designing an explanation interface that meets the needs of all three stakeholder types. The validated interview guide can also be used in future research to determine the explanation preferences of different stakeholder types in other domains.
    Abstract Recent legislation proposals have significantly increased the demand for eXplainable Artificial Intelligence (XAI) in many businesses, especially in so-called `high-risk' domains, such as recruitment. Within recruitment, AI has become commonplace, mainly in the form of job recommender systems (JRSs), which try to match candidates to vacancies, and vice versa. However, common XAI techniques often fall short in this domain due to the different levels and types of expertise of the individuals involved, making explanations difficult to generalize. To determine the explanation preferences of the different stakeholder types - candidates, recruiters, and companies - we created and validated a semi-structured interview guide. Using grounded theory, we structurally analyzed the results of these interviews and found that different stakeholder types indeed have strongly differing explanation preferences. Candidates indicated a preference for brief, textual explanations that allow them to quickly judge potential matches. On the other hand, hiring managers preferred visual graph-based explanations that provide a more technical and comprehensive overview at a glance. Recruiters found more exhaustive textual explanations preferable, as those provided them with more talking points to convince both parties of the match. Based on these findings, we describe guidelines on how to design an explanation interface that fulfills the requirements of all three stakeholder types. Furthermore, we provide the validated interview guide, which can assist future research in determining the explanation preferences of different stakeholder types.
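    Summary Recent legislative proposals have increased the demand for eXplainable Artificial Intelligence (XAI) in so-called high-risk domains, especially recruitment. AI is now commonplace in recruitment, mainly in the form of job recommender systems (JRSs), which match candidates to vacancies. Common XAI techniques often fall short in this domain, because the people involved differ in their levels and types of expertise, which makes explanations hard to generalize. To determine the explanation preferences of the different stakeholder types (candidates, recruiters, and companies), the authors created and validated a semi-structured interview guide. Using grounded theory, they structurally analyzed the interview results and found that the stakeholder types do have strongly differing preferences. Candidates prefer brief, textual explanations that let them quickly judge potential matches. Hiring managers prefer visual graph-based explanations that give a more technical and comprehensive overview. Recruiters prefer more exhaustive textual explanations, which give them more talking points to convince both parties of the match. Based on these findings, the authors describe guidelines for designing an explanation interface that satisfies all stakeholder types, and they provide the validated interview guide to help future research determine the explanation preferences of different stakeholder types.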

  • paper_url: http://arxiv.org/abs/2309.05501
  • repo_url: None
  • paper_authors: Ha-Thanh Nguyen, Randy Goebel, Francesca Toni, Kostas Stathis, Ken Satoh
  • for: This study aims to evaluate the performance of GPT-3.5 and GPT-4 on a prominent benchmark for legal textual entailment, the COLIEE Task 4 dataset, and to analyze their strengths and weaknesses in handling legal textual entailment tasks.
  • methods: The study uses black-box analysis to evaluate the performance of GPT-3.5 and GPT-4 on the COLIEE Task 4 dataset, which includes legal texts from different periods in Japan.
  • results: The preliminary experimental results show intriguing insights into the models’ performance on the legal textual entailment tasks, including their ability to discern entailment relationships within Japanese statute law across different periods. The study also discusses the influence of training data distribution on the models’ generalizability.
    Abstract The evolution of Generative Pre-trained Transformer (GPT) models has led to significant advancements in various natural language processing applications, particularly in legal textual entailment. We present an analysis of GPT-3.5 (ChatGPT) and GPT-4 performances on COLIEE Task 4 dataset, a prominent benchmark in this domain. The study encompasses data from Heisei 18 (2006) to Reiwa 3 (2021), exploring the models' abilities to discern entailment relationships within Japanese statute law across different periods. Our preliminary experimental results unveil intriguing insights into the models' strengths and weaknesses in handling legal textual entailment tasks, as well as the patterns observed in model performance. In the context of proprietary models with undisclosed architectures and weights, black-box analysis becomes crucial for evaluating their capabilities. We discuss the influence of training data distribution and the implications on the models' generalizability. This analysis serves as a foundation for future research, aiming to optimize GPT-based models and enable their successful adoption in legal information extraction and entailment applications.

  • paper_url: http://arxiv.org/abs/2309.05500
  • repo_url: None
  • paper_authors: Hai-Long Nguyen, Dieu-Quynh Nguyen, Hoang-Trung Nguyen, Thu-Trang Pham, Huu-Dong Nguyen, Thach-Anh Nguyen, Thi-Hai-Yen Vuong, Ha-Thanh Nguyen
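  • for: This study addresses natural language processing in the legal domain, in particular legal domain knowledge acquisition for low-resource languages.
  • methods: The paper combines similarity ranking with deep learning models for the legal document retrieval task. For the second task, extracting the answer to a question from a relevant legal article, it proposes a range of adaptive techniques to handle different question types.
  • results: The approaches achieve outstanding results on both tasks, demonstrating the potential benefits and effectiveness of automated question answering systems in the legal field, particularly for low-resource languages.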
    Abstract In recent years, natural language processing has gained significant popularity in various sectors, including the legal domain. This paper presents NeCo Team's solutions to the Vietnamese text processing tasks provided in the Automated Legal Question Answering Competition 2023 (ALQAC 2023), focusing on legal domain knowledge acquisition for low-resource languages through data enrichment. Our methods for the legal document retrieval task employ a combination of similarity ranking and deep learning models, while for the second task, which requires extracting an answer from a relevant legal article in response to a question, we propose a range of adaptive techniques to handle different question types. Our approaches achieve outstanding results on both tasks of the competition, demonstrating the potential benefits and effectiveness of question answering systems in the legal field, particularly for low-resource languages.
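    Summary In recent years, natural language processing has been widely applied in many sectors, including the legal domain. This paper presents NeCo Team's solutions to the Vietnamese text processing tasks of the Automated Legal Question Answering Competition 2023 (ALQAC 2023), with a focus on legal domain knowledge acquisition for low-resource languages through data enrichment. For the legal document retrieval task, the methods combine similarity ranking and deep learning models. For the second task, which requires extracting an answer from a relevant legal article in response to a question, the authors propose a range of adaptive techniques to handle different question types. The approaches achieve outstanding results on both tasks, demonstrating the potential benefits and effectiveness of question answering systems in the legal field, particularly for low-resource languages.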

Learning Semantic Segmentation with Query Points Supervision on Aerial Images

  • paper_url: http://arxiv.org/abs/2309.05490
  • repo_url: None
  • paper_authors: Santiago Rivier, Carlos Hinojosa, Silvio Giancola, Bernard Ghanem
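  • for: The paper targets semantic segmentation of high-resolution satellite images, a key task in remote sensing that divides images into meaningful regions.
  • methods: The authors propose a weakly supervised learning algorithm that trains semantic segmentation models from query point annotations. The approach reduces the cost and time of manual annotation while reaching performance competitive with fully supervised training.
  • results: The weakly supervised training approach is tested on an aerial image dataset with different semantic segmentation architectures. It reaches performance competitive with fully supervised training while reducing the cost and time of manual annotation.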
    Abstract Semantic segmentation is crucial in remote sensing, where high-resolution satellite images are segmented into meaningful regions. Recent advancements in deep learning have significantly improved satellite image segmentation. However, most of these methods are typically trained in fully supervised settings that require high-quality pixel-level annotations, which are expensive and time-consuming to obtain. In this work, we present a weakly supervised learning algorithm to train semantic segmentation algorithms that only rely on query point annotations instead of full mask labels. Our proposed approach performs accurate semantic segmentation and improves efficiency by significantly reducing the cost and time required for manual annotation. Specifically, we generate superpixels and extend the query point labels into those superpixels that group similar meaningful semantics. Then, we train semantic segmentation models, supervised with images partially labeled with the superpixels pseudo-labels. We benchmark our weakly supervised training approach on an aerial image dataset and different semantic segmentation architectures, showing that we can reach competitive performance compared to fully supervised training while reducing the annotation effort.
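    The step of extending query point labels into superpixels can be sketched as follows. This is a minimal illustration under stated assumptions: square grid cells stand in for a real superpixel algorithm, the ignore index 255 and the rule that conflicting points mark a superpixel as unlabeled are hypothetical choices, and none of the function names come from the paper.

```python
IGNORE = 255  # hypothetical index for pixels without a pseudo-label


def grid_superpixels(height, width, cell):
    """Stand-in for a superpixel algorithm: square cells as superpixel ids."""
    cols = (width + cell - 1) // cell
    return [[(r // cell) * cols + (c // cell) for c in range(width)]
            for r in range(height)]


def propagate_points(superpixels, points):
    """Spread each (row, col, class) query point over its whole superpixel."""
    sp_to_class = {}
    for r, c, cls in points:
        sp = superpixels[r][c]
        if sp in sp_to_class and sp_to_class[sp] != cls:
            sp_to_class[sp] = IGNORE  # assumed rule: conflicting points -> unlabeled
        else:
            sp_to_class[sp] = cls
    return [[sp_to_class.get(sp, IGNORE) for sp in row] for row in superpixels]


superpixels = grid_superpixels(height=4, width=6, cell=2)
pseudo = propagate_points(superpixels, [(0, 0, 1), (3, 5, 2)])
```

    The resulting partially labeled mask can then supervise a segmentation model with a loss that skips the ignore index, so only pixels inside labeled superpixels contribute to training.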

LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

  • paper_url: http://arxiv.org/abs/2309.05472
  • repo_url: None
  • paper_authors: Titouan Parcollet, Ha Nguyen, Solene Evain, Marcely Zanon Boito, Adrien Pupier, Salima Mdhaffar, Hang Le, Sina Alisamir, Natalia Tomashenko, Marco Dinarelli, Shucong Zhang, Alexandre Allauzen, Maximin Coavoux, Yannick Esteve, Mickael Rouvier, Jerome Goulian, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier
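  • for: This work develops an open-source framework for assessing and building French speech technologies based on self-supervised learning (SSL).
  • methods: The work uses several SSL models, including wav2vec 2.0, and provides large-scale, heterogeneous training data.
  • results: The study investigates pre-trained SSL models from several angles: frozen versus fine-tuned downstream models, task-agnostic versus task-specific pre-trained models, and the carbon footprint of large-scale model training.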
    Abstract Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing. Speech processing drastically benefitted from SSL as most of the current domain-related tasks are now being approached with pre-trained models. This work introduces LeBenchmark 2.0 an open-source framework for assessing and building SSL-equipped French speech technologies. It includes documented, large-scale and heterogeneous corpora with up to 14,000 hours of heterogeneous speech, ten pre-trained SSL wav2vec 2.0 models containing from 26 million to one billion learnable parameters shared with the community, and an evaluation protocol made of six downstream tasks to complement existing benchmarks. LeBenchmark 2.0 also presents unique perspectives on pre-trained SSL models for speech with the investigation of frozen versus fine-tuned downstream models, task-agnostic versus task-specific pre-trained models as well as a discussion on the carbon footprint of large-scale model training.
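    Summary Self-supervised learning (SSL) has driven unprecedented improvements in many domains, including computer vision and natural language processing. Speech processing has benefited greatly from SSL, as most current tasks in the domain are now approached with pre-trained models. This work introduces LeBenchmark 2.0, an open-source framework for assessing and building SSL-equipped French speech technologies. It includes documented, large-scale, heterogeneous corpora with up to 14,000 hours of speech, ten pre-trained SSL wav2vec 2.0 models with 26 million to one billion learnable parameters shared with the community, and an evaluation protocol of six downstream tasks. LeBenchmark 2.0 also investigates frozen versus fine-tuned downstream models and task-agnostic versus task-specific pre-trained models, and discusses the carbon footprint of large-scale model training.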

Textbooks Are All You Need II: phi-1.5 technical report

  • paper_url: http://arxiv.org/abs/2309.05463
  • repo_url: None
  • paper_authors: Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar, Yin Tat Lee
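  • for: The work investigates the power of smaller Transformer-based language models, with a focus on common sense reasoning in natural language.
  • methods: The work uses existing large language models (LLMs) to generate "textbook quality" data to enhance the learning process, and creates a new 1.3 billion parameter model, phi-1.5, to test performance on natural language tasks.
  • results: phi-1.5 performs comparably to models 5x larger on natural language tasks and surpasses most non-frontier LLMs on more complex reasoning tasks such as grade-school mathematics and basic coding. The model also exhibits hallucinations and the potential for toxic and biased generations, which calls for further research.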
    Abstract We continue the investigation into the power of smaller Transformer-based language models as initiated by TinyStories -- a 10 million parameter model that can produce coherent English -- and the follow-up work on phi-1, a 1.3 billion parameter model with Python coding performance close to the state-of-the-art. The latter work proposed to use existing Large Language Models (LLMs) to generate "textbook quality" data as a way to enhance the learning process compared to traditional web data. We follow the "Textbooks Are All You Need" approach, focusing this time on common sense reasoning in natural language, and create a new 1.3 billion parameter model named phi-1.5, with performance on natural language tasks comparable to models 5x larger, and surpassing most non-frontier LLMs on more complex reasoning tasks such as grade-school mathematics and basic coding. More generally, phi-1.5 exhibits many of the traits of much larger LLMs, both good -- such as the ability to "think step by step" or perform some rudimentary in-context learning -- and bad, including hallucinations and the potential for toxic and biased generations -- encouragingly though, we are seeing improvement on that front thanks to the absence of web data. We open-source phi-1.5 to promote further research on these urgent topics.

Panoptic Vision-Language Feature Fields

  • paper_url: http://arxiv.org/abs/2309.05448
  • repo_url: https://github.com/ethz-asl/autolabel
  • paper_authors: Haoran Chen, Kenneth Blomqvist, Francesco Milano, Roland Siegwart
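  • for: Proposes an open-vocabulary panoptic segmentation method for 3D scenes that can segment a scene into arbitrary classes given as text descriptions at run-time.
  • methods: Panoptic Vision-Language Feature Fields (PVLFF) performs semantic and instance segmentation simultaneously, learning vision-language features and hierarchical instance features through a contrastive loss on 2D instance segment proposals.
  • results: On the HyperSim, ScanNet and Replica datasets, the method performs comparably to state-of-the-art closed-set 3D panoptic systems and outperforms current 3D open-vocabulary systems on semantic segmentation. An ablation study demonstrates the effectiveness of the model architecture.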
    Abstract Recently, methods have been proposed for 3D open-vocabulary semantic segmentation. Such methods are able to segment scenes into arbitrary classes given at run-time using their text description. In this paper, we propose to our knowledge the first algorithm for open-vocabulary panoptic segmentation, simultaneously performing both semantic and instance segmentation. Our algorithm, Panoptic Vision-Language Feature Fields (PVLFF) learns a feature field of the scene, jointly learning vision-language features and hierarchical instance features through a contrastive loss function from 2D instance segment proposals on input frames. Our method achieves comparable performance against the state-of-the-art close-set 3D panoptic systems on the HyperSim, ScanNet and Replica dataset and outperforms current 3D open-vocabulary systems in terms of semantic segmentation. We additionally ablate our method to demonstrate the effectiveness of our model architecture. Our code will be available at https://github.com/ethz-asl/autolabel.
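    Summary Methods for 3D open-vocabulary semantic segmentation have recently been proposed; they segment a scene into arbitrary classes given at run-time through text descriptions. We propose what is, to our knowledge, the first algorithm for open-vocabulary panoptic segmentation, performing semantic and instance segmentation simultaneously. The algorithm, Panoptic Vision-Language Feature Fields (PVLFF), learns a feature field of the scene, jointly learning vision-language features and hierarchical instance features through a contrastive loss on 2D instance segment proposals. On the HyperSim, ScanNet and Replica datasets it performs comparably to state-of-the-art closed-set 3D panoptic systems and outperforms current 3D open-vocabulary systems on semantic segmentation. An ablation demonstrates the effectiveness of the model architecture. The code will be available at https://github.com/ethz-asl/autolabel.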

Improving Information Extraction on Business Documents with Specific Pre-Training Tasks

  • paper_url: http://arxiv.org/abs/2309.05429
  • repo_url: https://github.com/thibaultdouzon/business-document-pre-training
  • paper_authors: Thibault Douzon, Stefan Duffner, Christophe Garcia, Jérémy Espinas
  • for: This paper aims to improve Information Extraction on business documents using pre-trained language models.
  • methods: The authors add two new pre-training tasks to LayoutLM and introduce a post-processing algorithm for decoding BIESO tags, in order to extract relevant information from scanned documents.
  • results: The proposed method significantly improves extraction performance on both public (from 93.88 to 95.50 F1 score) and private (from 84.35 to 84.84 F1 score) datasets.
    Abstract Transformer-based Language Models are widely used in Natural Language Processing related tasks. Thanks to their pre-training, they have been successfully adapted to Information Extraction in business documents. However, most pre-training tasks proposed in the literature for business documents are too generic and not sufficient to learn more complex structures. In this paper, we use LayoutLM, a language model pre-trained on a collection of business documents, and introduce two new pre-training tasks that further improve its capacity to extract relevant information. The first is aimed at better understanding the complex layout of documents, and the second focuses on numeric values and their order of magnitude. These tasks force the model to learn better-contextualized representations of the scanned documents. We further introduce a new post-processing algorithm to decode BIESO tags in Information Extraction that performs better with complex entities. Our method significantly improves extraction performance on both public (from 93.88 to 95.50 F1 score) and private (from 84.35 to 84.84 F1 score) datasets composed of expense receipts, invoices, and purchase orders.
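    Summary Transformer-based language models are widely used in natural language processing tasks, and their pre-training has let them be adapted successfully to Information Extraction on business documents. However, most pre-training tasks proposed for business documents are too generic to learn more complex structures. We use LayoutLM, a language model pre-trained on a collection of business documents, and introduce two new pre-training tasks: one for better understanding complex document layouts, the other focused on numeric values and their order of magnitude. Both push the model toward better-contextualized representations of scanned documents. We also introduce a new post-processing algorithm for decoding BIESO tags that performs better on complex entities. The method significantly improves extraction performance on public (from 93.88 to 95.50 F1 score) and private (from 84.35 to 84.84 F1 score) datasets.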
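
    The abstract mentions decoding BIESO tags (Begin, Inside, End, Single, Outside) into entities. The sketch below shows only a plain greedy baseline decoder, not the paper's improved post-processing algorithm, which the abstract does not describe. The function name `decode_bieso` and the tag format `PREFIX-LABEL` are assumptions made for illustration.

```python
from typing import List, Tuple


def decode_bieso(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Greedy baseline: turn BIESO tags into (label, start, end) spans.

    `end` is inclusive. Tags are assumed to look like "B-TOTAL" or "O".
    This is a generic baseline, not the algorithm proposed in the paper.
    """
    spans = []
    start, label = None, None  # currently open multi-token span, if any
    for i, tag in enumerate(tags):
        prefix, _, name = tag.partition("-")
        if prefix == "S":
            spans.append((name, i, i))      # single-token entity
            start, label = None, None
        elif prefix == "B":
            start, label = i, name          # open a new span
        elif prefix == "I" and start is not None and name == label:
            continue                        # span continues
        elif prefix == "E" and start is not None and name == label:
            spans.append((label, start, i))  # close the open span
            start, label = None, None
        else:
            # "O" or an inconsistent tag: discard any unfinished span
            start, label = None, None
    return spans


example = ["O", "B-TOTAL", "I-TOTAL", "E-TOTAL", "S-DATE", "B-X", "O"]
print(decode_bieso(example))
```

    A greedy decoder like this one drops any span that is interrupted by an inconsistent tag, which is one plausible reason a more careful decoding step could help with long or complex entities.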

Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP

  • paper_url: http://arxiv.org/abs/2309.05423
  • repo_url: None
  • paper_authors: Jinzuomu Zhong, Yang Li, Hui Huang, Jie Liu, Zhiba Su, Jing Guo, Benlai Tang, Fengjie Zhu
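  • for: Improving the naturalness and controllability of synthesized speech in expressive Text-to-Speech (TTS) through automatic prosody annotation.
  • methods: A two-stage automatic annotation pipeline: contrastive text-speech pretraining on Speech-Silence and Word-Punctuation (SSWP) pairs to enhance the prosodic space extracted from the joint text-speech space, followed by a multi-modal prosody annotator.
  • results: Experiments show that the method automatically generates prosody annotation with state-of-the-art (SOTA) performance and remains robust when tested with varying amounts of data.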
    Abstract In the realm of expressive Text-to-Speech (TTS), explicit prosodic boundaries significantly advance the naturalness and controllability of synthesized speech. While human prosody annotation contributes a lot to the performance, it is a labor-intensive and time-consuming process, often resulting in inconsistent outcomes. Despite the availability of extensive supervised data, the current benchmark model still faces performance setbacks. To address this issue, a two-stage automatic annotation pipeline is novelly proposed in this paper. Specifically, in the first stage, we propose contrastive text-speech pretraining of Speech-Silence and Word-Punctuation (SSWP) pairs. The pretraining procedure hammers at enhancing the prosodic space extracted from joint text-speech space. In the second stage, we build a multi-modal prosody annotator, which consists of pretrained encoders, a straightforward yet effective text-speech feature fusion scheme, and a sequence classifier. Extensive experiments conclusively demonstrate that our proposed method excels at automatically generating prosody annotation and achieves state-of-the-art (SOTA) performance. Furthermore, our novel model has exhibited remarkable resilience when tested with varying amounts of data.
    Summary In expressive Text-to-Speech (TTS), explicit prosodic boundaries significantly improve the naturalness and controllability of synthesized speech. Human prosody annotation helps performance but is labor-intensive, time-consuming, and often inconsistent, and the current benchmark model still falls short despite extensive supervised data. We propose a two-stage automatic annotation pipeline. The first stage is contrastive text-speech pretraining on Speech-Silence and Word-Punctuation (SSWP) pairs, which aims to enhance the prosodic space extracted from the joint text-speech space. The second stage builds a multi-modal prosody annotator from the pretrained encoders, a simple text-speech feature fusion scheme, and a sequence classifier. Experiments show state-of-the-art (SOTA) performance in automatic prosody annotation and robustness to varying amounts of data.
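
    Contrastive text-speech pretraining is often implemented with an InfoNCE-style objective over paired embeddings. The abstract does not state which loss the authors use, so the following is only a generic sketch under that assumption. The function `info_nce`, the temperature value, and the toy embeddings are all hypothetical.

```python
import math
from typing import List, Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce(text_emb: List[Sequence[float]],
             speech_emb: List[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss; text_emb[i] and speech_emb[i] form a positive pair.

    Embeddings are assumed to be L2-normalized already. Every other
    speech embedding in the batch acts as a negative for text_emb[i].
    """
    total = 0.0
    for i, t in enumerate(text_emb):
        logits = [_dot(t, s) / temperature for s in speech_emb]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # -log softmax of the positive
    return total / len(text_emb)


pairs = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(pairs, pairs))        # matched pairs: loss close to 0
print(info_nce(pairs, pairs[::-1]))  # mismatched pairs: much larger loss
```

    The loss is small when each text embedding is closest to its own speech embedding and large otherwise, which is the property a contrastive pretraining stage relies on.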

Hazards in Deep Learning Testing: Prevalence, Impact and Recommendations

  • paper_url: http://arxiv.org/abs/2309.05381
  • repo_url: None
  • paper_authors: Salah Ghamizi, Maxime Cordy, Yuejun Guo, Mike Papadakis, Yves Le Traon
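  • for: Analyzes empirical studies in Machine Learning testing research, identifying 10 commonly adopted empirical evaluation hazards that can distort experimental results and lead to wrong conclusions, and proposing 10 good empirical practices to mitigate them.
  • methods: Surveys the related literature to identify the hazards, then performs a sensitivity analysis on 30 influential studies published in top-tier software engineering (SE) venues to demonstrate their criticality.
  • results: All 10 hazards have the potential to invalidate experimental findings and should be handled properly. The proposed 10 good empirical practices have the potential to mitigate their impact.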
    Abstract Much research on Machine Learning testing relies on empirical studies that evaluate and show their potential. However, in this context empirical results are sensitive to a number of parameters that can adversely impact the results of the experiments and potentially lead to wrong conclusions (Type I errors, i.e., incorrectly rejecting the Null Hypothesis). To this end, we survey the related literature and identify 10 commonly adopted empirical evaluation hazards that may significantly impact experimental results. We then perform a sensitivity analysis on 30 influential studies that were published in top-tier SE venues, against our hazard set and demonstrate their criticality. Our findings indicate that all 10 hazards we identify have the potential to invalidate experimental findings, such as those made by the related literature, and should be handled properly. Going a step further, we propose a point set of 10 good empirical practices that has the potential to mitigate the impact of the hazards. We believe our work forms the first step towards raising awareness of the common pitfalls and good practices within the software engineering community and hopefully contribute towards setting particular expectations for empirical research in the field of deep learning testing.
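    Summary Much research on Machine Learning testing relies on empirical studies to evaluate and show its potential. Empirical results in this context are sensitive to a number of parameters that can adversely affect the experiments and lead to wrong conclusions (Type I errors, i.e., incorrectly rejecting the Null Hypothesis). We survey the related literature and identify 10 commonly adopted empirical evaluation hazards that may significantly affect experimental results. A sensitivity analysis of 30 influential studies published in top-tier SE venues demonstrates their criticality: all 10 hazards can invalidate experimental findings and should be handled properly. We further propose 10 good empirical practices to mitigate their impact, as a first step toward raising awareness of common pitfalls and setting expectations for empirical research in deep learning testing.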

Steps Towards Satisficing Distributed Dynamic Team Trust

  • paper_url: http://arxiv.org/abs/2309.05378
  • repo_url: None
  • paper_authors: Edmund R. Hunt, Chris Baber, Mehdi Sobhani, Sanja Milivojevic, Sagir Yusuf, Mirco Musolesi, Patrick Waterson, Sally Maynard
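  • for: Defining and measuring trust in dynamic, multiagent teams, particularly in defense and security domains.
  • methods: Frames trust in terms of goals, individual/team values, and legal principles, and argues for a set of trust metrics interpretable by both human and robot team members.
  • results: The paper questions whether alignment is possible at the level of individual/team values or only at the goal and legal principles levels, and considers an experiment that could demonstrate 'satisficing trust' over a simulated mission.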
    Abstract Defining and measuring trust in dynamic, multiagent teams is important in a range of contexts, particularly in defense and security domains. Team members should be trusted to work towards agreed goals and in accordance with shared values. In this paper, our concern is with the definition of goals and values such that it is possible to define 'trust' in a way that is interpretable, and hence usable, by both humans and robots. We argue that the outcome of team activity can be considered in terms of 'goal', 'individual/team values', and 'legal principles'. We question whether alignment is possible at the level of 'individual/team values', or only at the 'goal' and 'legal principles' levels. We argue for a set of metrics to define trust in human-robot teams that are interpretable by human or robot team members, and consider an experiment that could demonstrate the notion of 'satisficing trust' over the course of a simulated mission.
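    Summary Defining and measuring trust in dynamic, multiagent teams is important in many contexts, particularly in defense and security. Team members should be trusted to work towards agreed goals and in accordance with shared values. We are concerned with defining goals and values so that 'trust' can be defined in a way that is interpretable, and hence usable, by both humans and robots. We argue that the outcome of team activity can be considered in terms of 'goal', 'individual/team values', and 'legal principles', and we question whether alignment is possible at the level of 'individual/team values' or only at the 'goal' and 'legal principles' levels. We argue for a set of interpretable trust metrics and consider an experiment that could demonstrate 'satisficing trust' over the course of a simulated mission.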

Exploring Minecraft Settlement Generators with Generative Shift Analysis

  • paper_url: http://arxiv.org/abs/2309.05371
  • repo_url: None
  • paper_authors: Jean-Baptiste Hervé, Oliver Withington, Marion Hervé, Laurissa Tokarchuk, Christoph Salge
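  • for: Developing methods and tools for evaluating and comparing Procedural Content Generation (PCG) systems, in particular generative pipelines.
  • methods: Introduces Generative Shift, a method that evaluates individual stages of a PCG pipeline by quantifying the impact a generative process has when applied to a pre-existing artifact.
  • results: Applied to a rich dataset of Minecraft game maps produced by settlement generators from the Generative Design in Minecraft Competition (GDMC), the method proves a promising lens for PCG evaluation, and the authors are optimistic that it can serve as a domain-agnostic way to evaluate generative pipelines.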
    Abstract With growing interest in Procedural Content Generation (PCG) it becomes increasingly important to develop methods and tools for evaluating and comparing alternative systems. There is a particular lack regarding the evaluation of generative pipelines, where a set of generative systems work in series to make iterative changes to an artifact. We introduce a novel method called Generative Shift for evaluating the impact of individual stages in a PCG pipeline by quantifying the impact that a generative process has when it is applied to a pre-existing artifact. We explore this technique by applying it to a very rich dataset of Minecraft game maps produced by a set of alternative settlement generators developed as part of the Generative Design in Minecraft Competition (GDMC), all of which are designed to produce appropriate settlements for a pre-existing map. While this is an early exploration of this technique we find it to be a promising lens to apply to PCG evaluation, and we are optimistic about the potential of Generative Shift to be a domain-agnostic method for evaluating generative pipelines.
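    Summary With growing interest in Procedural Content Generation (PCG), methods and tools for evaluating and comparing alternative systems become increasingly important. Evaluation is particularly lacking for generative pipelines, in which a set of generative systems work in series to make iterative changes to an artifact. We introduce Generative Shift, a method for evaluating the impact of individual pipeline stages by quantifying the impact a generative process has when applied to a pre-existing artifact. We apply it to a rich dataset of Minecraft game maps produced by settlement generators from the Generative Design in Minecraft Competition (GDMC), all designed to produce appropriate settlements for a pre-existing map. Although this is an early exploration, we find it a promising lens for PCG evaluation and are optimistic that Generative Shift can be a domain-agnostic method for evaluating generative pipelines.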

Feature-based Transferable Disruption Prediction for future tokamaks using domain adaptation

  • paper_url: http://arxiv.org/abs/2309.05361
  • repo_url: None
  • paper_authors: Chengshuo Shen, Wei Zheng, Bihao Guo, Dalong Chen, Xinkun Ai, Fengming Xue, Yu Zhong, Nengchao Wang, Biao Shen, Binjia Xiao, Yonghua Ding, Zhongyong Chen, Yuan Pan, J-TEXT team
  • for: Predicting disruptions in future tokamaks using only a few discharges from the target device.
  • methods: Uses the domain adaptation algorithm CORAL, improved into a supervised variant, to align a small amount of data from the future tokamak (target domain, EAST) with a large amount of data from an existing tokamak (source domain, J-TEXT) before training a machine learning model.
  • results: Compared with a model trained on mixed data from the two tokamaks, supervised CORAL improves disruption prediction for the future tokamak (AUC from 0.764 to 0.890).
    Abstract The high acquisition cost and the significant demand for disruptive discharges for data-driven disruption prediction models in future tokamaks pose an inherent contradiction in disruption prediction research. In this paper, we demonstrated a novel approach to predict disruption in a future tokamak only using a few discharges based on a domain adaptation algorithm called CORAL. It is the first attempt at applying domain adaptation in the disruption prediction task. In this paper, this disruption prediction approach aligns a few data from the future tokamak (target domain) and a large amount of data from the existing tokamak (source domain) to train a machine learning model in the existing tokamak. To simulate the existing and future tokamak case, we selected J-TEXT as the existing tokamak and EAST as the future tokamak. To simulate the lack of disruptive data in future tokamak, we only selected 100 non-disruptive discharges and 10 disruptive discharges from EAST as the target domain training data. We have improved CORAL to make it more suitable for the disruption prediction task, called supervised CORAL. Compared to the model trained by mixing data from the two tokamaks, the supervised CORAL model can enhance the disruption prediction performance for future tokamaks (AUC value from 0.764 to 0.890). Through interpretable analysis, we discovered that using the supervised CORAL enables the transformation of data distribution to be more similar to future tokamak. An assessment method for evaluating whether a model has learned a trend of similar features is designed based on SHAP analysis. It demonstrates that the supervised CORAL model exhibits more similarities to the model trained on large data sizes of EAST. FTDP provides a light, interpretable, and few-data-required way by aligning features to predict disruption using small data sizes from the future tokamak.
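    Summary The high acquisition cost of disruptive discharges and the significant demand for them in data-driven disruption prediction models pose an inherent contradiction for future tokamaks. We demonstrate an approach that predicts disruptions in a future tokamak from only a few discharges, based on the domain adaptation algorithm CORAL; it is the first attempt to apply domain adaptation to disruption prediction. The approach aligns a small amount of data from the future tokamak (target domain) with a large amount of data from the existing tokamak (source domain) to train a machine learning model. J-TEXT stands in for the existing tokamak and EAST for the future one, with only 100 non-disruptive and 10 disruptive EAST discharges used as target-domain training data. Our improved variant, supervised CORAL, raises the AUC from 0.764 to 0.890 compared with a model trained on mixed data from the two tokamaks. Interpretable analysis shows that supervised CORAL makes the transformed data distribution more similar to that of the future tokamak, and a SHAP-based assessment shows that the resulting model is more similar to a model trained on large amounts of EAST data. FTDP provides a light, interpretable, few-data way to predict disruptions by aligning features.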
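
    The headline result is reported as an AUC value (0.764 versus 0.890). For readers unfamiliar with the metric, the sketch below computes ROC AUC from its pairwise-ranking definition: the probability that a randomly chosen disruptive discharge receives a higher score than a randomly chosen non-disruptive one. The function `roc_auc` and the toy scores are illustrative assumptions and have no connection to the paper's data or code.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    `labels` uses 1 for disruptive and 0 for non-disruptive discharges.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: one disruptive discharge (score 0.3) is ranked below
# a non-disruptive one (score 0.8), so 3 of 4 pairs are ordered correctly.
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

    Because AUC depends only on the ranking of scores, it is insensitive to the strong class imbalance in the target-domain training set (10 disruptive versus 100 non-disruptive discharges).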

Semantic Latent Decomposition with Normalizing Flows for Face Editing

  • paper_url: http://arxiv.org/abs/2309.05314
  • repo_url: https://github.com/phil329/sdflow
  • paper_authors: Binglei Li, Zhizhong Huang, Hongming Shan, Junping Zhang
  • for: Proposes a new face editing method that addresses the entanglement among attributes in StyleGAN's latent space.
  • methods: SDFlow performs a semantic decomposition in the original latent space using continuous conditional normalizing flows, jointly optimizing (i) a semantic encoder that estimates semantic variables from input faces and (ii) a flow-based transformation module that maps the latent code to a semantic-irrelevant variable in a Gaussian distribution, conditioned on the learned semantic variables.
  • results: Experiments show that SDFlow outperforms existing state-of-the-art face editing methods both qualitatively and quantitatively.
    Abstract Navigating in the latent space of StyleGAN has shown effectiveness for face editing. However, the resulting methods usually encounter challenges in complicated navigation due to the entanglement among different attributes in the latent space. To address this issue, this paper proposes a novel framework, termed SDFlow, with a semantic decomposition in original latent space using continuous conditional normalizing flows. Specifically, SDFlow decomposes the original latent code into different irrelevant variables by jointly optimizing two components: (i) a semantic encoder to estimate semantic variables from input faces and (ii) a flow-based transformation module to map the latent code into a semantic-irrelevant variable in Gaussian distribution, conditioned on the learned semantic variables. To eliminate the entanglement between variables, we employ a disentangled learning strategy under a mutual information framework, thereby providing precise manipulation controls. Experimental results demonstrate that SDFlow outperforms existing state-of-the-art face editing methods both qualitatively and quantitatively. The source code is made available at https://github.com/phil329/SDFlow.
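    Summary Navigating the latent space of StyleGAN has proven effective for face editing, but the resulting methods usually struggle with complicated navigation because different attributes are entangled in the latent space. We propose SDFlow, a framework that performs a semantic decomposition in the original latent space using continuous conditional normalizing flows. SDFlow decomposes the original latent code into different irrelevant variables by jointly optimizing two components: (i) a semantic encoder that estimates semantic variables from input faces and (ii) a flow-based transformation module that maps the latent code to a semantic-irrelevant variable in a Gaussian distribution, conditioned on the learned semantic variables. To eliminate entanglement between variables, we employ a disentangled learning strategy under a mutual information framework, which provides precise manipulation controls. Experiments show that SDFlow outperforms existing state-of-the-art face editing methods both qualitatively and quantitatively. The source code is available at https://github.com/phil329/SDFlow.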

Unsupervised human-to-robot motion retargeting via expressive latent space

  • paper_url: http://arxiv.org/abs/2309.05310
  • repo_url: None
  • paper_authors: Yashuai Yan, Esteve Valls Mascaro, Dongheui Lee
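  • for: Proposes a new human-to-robot motion retargeting approach that lets robots mimic human motion precisely while preserving the semantics of the motion.
  • methods: A deep learning method translates human motion directly into robot motion without annotated paired human-to-robot data, which reduces the effort of adopting new robots. A cross-domain similarity metric and contrastive learning build a shared latent space that is decoded into robot control commands.
  • results: The method controls robot motion precisely, generates in-between motions through simple linear interpolation in the latent space, and is evaluated with diverse input modalities such as texts, RGB videos, and key-poses, which makes robot control easier for users.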
    Abstract This paper introduces a novel approach for human-to-robot motion retargeting, enabling robots to mimic human motion with precision while preserving the semantics of the motion. For that, we propose a deep learning method for direct translation from human to robot motion. Our method does not require annotated paired human-to-robot motion data, which reduces the effort when adopting new robots. To this end, we first propose a cross-domain similarity metric to compare the poses from different domains (i.e., human and robot). Then, our method achieves the construction of a shared latent space via contrastive learning and decodes latent representations to robot motion control commands. The learned latent space exhibits expressiveness as it captures the motions precisely and allows direct motion control in the latent space. We showcase how to generate in-between motion through simple linear interpolation in the latent space between two projected human poses. Additionally, we conducted a comprehensive evaluation of robot control using diverse modality inputs, such as texts, RGB videos, and key-poses, which enhances the ease of robot control to users of all backgrounds. Finally, we compare our model with existing works and quantitatively and qualitatively demonstrate the effectiveness of our approach, enhancing natural human-robot communication and fostering trust in integrating robots into daily life.
    Summary This paper introduces a novel approach to human-to-robot motion retargeting that lets robots mimic human motion precisely while preserving the semantics of the motion. We propose a deep learning method for direct translation from human to robot motion that does not require annotated paired human-to-robot motion data, which reduces the effort of adopting new robots. We first propose a cross-domain similarity metric to compare human and robot poses, then use contrastive learning to construct a shared latent space that captures motions precisely and allows direct motion control in the latent space. In-between motions can be generated through simple linear interpolation in the latent space. We evaluate robot control with diverse input modalities, such as texts, RGB videos, and key-poses, and compare against existing works, demonstrating the effectiveness of the approach quantitatively and qualitatively.
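
    The abstract describes generating in-between motion by simple linear interpolation between two projected human poses in the latent space. A minimal sketch of that interpolation step is shown below. The encoder that produces the latent vectors and the decoder that turns them into robot commands are not reproduced here, and the name `lerp_latents` is a hypothetical helper.

```python
from typing import List, Sequence


def lerp_latents(z_start: Sequence[float],
                 z_end: Sequence[float],
                 steps: int) -> List[List[float]]:
    """Return `steps` latent vectors evenly spaced from z_start to z_end.

    Both endpoints are included. In a full system each vector would be
    passed to a decoder to obtain a robot pose (decoder not shown).
    """
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    if len(z_start) != len(z_end):
        raise ValueError("latent vectors must have the same dimension")
    path = []
    for k in range(steps):
        t = k / (steps - 1)  # interpolation weight in [0, 1]
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


# Two toy 2-D latent codes standing in for two projected human poses.
print(lerp_latents([0.0, 0.0], [1.0, 2.0], steps=3))
# [[0.0, 0.0], [0.5, 1.0], [1.0, 2.0]]
```

    Linear interpolation only yields plausible in-between poses if the latent space is smooth, which is the kind of expressiveness the abstract attributes to the learned space.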

Discrete Denoising Diffusion Approach to Integer Factorization

  • paper_url: http://arxiv.org/abs/2309.05295
  • repo_url: https://github.com/karlisfre/diffusion-factorization
  • paper_authors: Karlis Freivalds, Emils Ozolins, Guntis Barzdins
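  • for: Explores whether deep neural networks can facilitate integer factorization, a famous computational problem not known to be solvable in polynomial time.
  • methods: Combines deep neural networks with discrete denoising diffusion, iteratively correcting errors in a partially correct solution. The authors develop a new seq2seq architecture, use a relaxed categorical distribution, and adapt the reverse diffusion process to cope with inaccuracies in the denoising step.
  • results: The approach finds factors for integers up to 56 bits long. The analysis indicates that more training leads to an exponential decrease in the sampling steps required at inference for a given success rate, counteracting the exponential run-time increase with bit-length.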
    Abstract Integer factorization is a famous computational problem unknown whether being solvable in the polynomial time. With the rise of deep neural networks, it is interesting whether they can facilitate faster factorization. We present an approach to factorization utilizing deep neural networks and discrete denoising diffusion that works by iteratively correcting errors in a partially-correct solution. To this end, we develop a new seq2seq neural network architecture, employ relaxed categorical distribution and adapt the reverse diffusion process to cope better with inaccuracies in the denoising step. The approach is able to find factors for integers of up to 56 bits long. Our analysis indicates that investment in training leads to an exponential decrease of sampling steps required at inference to achieve a given success rate, thus counteracting an exponential run-time increase depending on the bit-length.
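    Summary Integer factorization is a famous computational problem that is not known to be solvable in polynomial time. With the rise of deep neural networks, it is interesting to ask whether they can facilitate faster factorization. We present an approach based on deep neural networks and discrete denoising diffusion that works by iteratively correcting errors in a partially correct solution. To this end, we develop a new seq2seq neural network architecture, employ a relaxed categorical distribution, and adapt the reverse diffusion process to cope better with inaccuracies in the denoising step. The approach finds factors for integers up to 56 bits long. Our analysis indicates that investment in training leads to an exponential decrease in the sampling steps required at inference to achieve a given success rate, counteracting an exponential run-time increase that depends on the bit-length.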

Addressing Feature Imbalance in Sound Source Separation

  • paper_url: http://arxiv.org/abs/2309.05287
  • repo_url: None
  • paper_authors: Jaechang Kim, Jeongyeon Hwang, Soheun Yi, Jaewoong Cho, Jungseul Ok
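  • for: Addresses the feature preference problem of neural networks in sound source separation.
  • methods: Proposes FEAture BAlancing by Suppressing Easy feature (FEABASE), which suppresses the easy feature so that the network learns hidden information about the neglected feature.
  • results: FEABASE is evaluated on a multi-channel source separation task in which feature preference arises between spatial and timbre features; the approach is designed to enable efficient data utilization by mitigating that preference.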
    Abstract Neural networks often suffer from a feature preference problem, where they tend to overly rely on specific features to solve a task while disregarding other features, even if those neglected features are essential for the task. Feature preference problems have primarily been investigated in classification task. However, we observe that feature preference occurs in high-dimensional regression task, specifically, source separation. To mitigate feature preference in source separation, we propose FEAture BAlancing by Suppressing Easy feature (FEABASE). This approach enables efficient data utilization by learning hidden information about the neglected feature. We evaluate our method in a multi-channel source separation task, where feature preference between spatial feature and timbre feature appears.
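    Summary Neural networks often suffer from a feature preference problem: they rely too heavily on specific features to solve a task and disregard others, even when the neglected features are essential. Feature preference has mainly been studied in classification, but we observe that it also occurs in a high-dimensional regression task, namely source separation. To mitigate it, we propose FEAture BAlancing by Suppressing Easy feature (FEABASE), which enables efficient data utilization by learning hidden information about the neglected feature. We evaluate the method on a multi-channel source separation task in which feature preference appears between the spatial feature and the timbre feature.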

Can you text what is happening? Integrating pre-trained language encoders into trajectory prediction models for autonomous driving

  • paper_url: http://arxiv.org/abs/2309.05282
  • repo_url: None
  • paper_authors: Ali Keysan, Andreas Look, Eitan Kosman, Gonca Gürsun, Jörg Wagner, Yu Yao, Barbara Rakitsch
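  • for: Proposes a novel text-based representation of traffic scenes, processed with a pre-trained language encoder, for trajectory prediction in autonomous driving.
  • methods: Text-based scene representations are combined with classical rasterized image representations to produce scene embeddings.
  • results: Combining text-based and rasterized image representations yields descriptive scene embeddings, predictions on the nuScenes dataset improve significantly over baselines, and an ablation study shows that a joint encoder of text and rasterized images outperforms the individual encoders, confirming that the two representations have complementary strengths.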
    Abstract In autonomous driving tasks, scene understanding is the first step towards predicting the future behavior of the surrounding traffic participants. Yet, how to represent a given scene and extract its features are still open research questions. In this study, we propose a novel text-based representation of traffic scenes and process it with a pre-trained language encoder. First, we show that text-based representations, combined with classical rasterized image representations, lead to descriptive scene embeddings. Second, we benchmark our predictions on the nuScenes dataset and show significant improvements compared to baselines. Third, we show in an ablation study that a joint encoder of text and rasterized images outperforms the individual encoders confirming that both representations have their complementary strengths.
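    Summary In autonomous driving, scene understanding is the first step towards predicting the future behavior of surrounding traffic participants, yet how to represent a scene and extract its features remain open research questions. We propose a novel text-based representation of traffic scenes and process it with a pre-trained language encoder. First, we show that text-based representations, combined with classical rasterized image representations, lead to descriptive scene embeddings. Second, we benchmark our predictions on the nuScenes dataset and show significant improvements over baselines. Third, an ablation study shows that a joint encoder of text and rasterized images outperforms the individual encoders, confirming that the two representations have complementary strengths.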

EANet: Expert Attention Network for Online Trajectory Prediction

  • paper_url: http://arxiv.org/abs/2309.05683
  • repo_url: None
  • paper_authors: Pengfei Yao, Tianlu Mao, Min Shi, Jingkai Sun, Zhaoqi Wang
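  • for: Improving trajectory prediction accuracy in autonomous driving under sudden scenario changes, where mainstream and continual learning-based methods lose accuracy because they require training on complete datasets.
  • methods: Proposes Expert Attention Network, a complete online learning framework. Expert attention adjusts the weights of network layers at different depths, avoiding slow updates caused by gradient problems so the model quickly learns new scenarios and restores accuracy. A short-term motion trend kernel function, sensitive to scenario changes, lets the model respond quickly.
  • results: Compared with traditional methods, the approach quickly reduces prediction errors and reaches state-of-the-art prediction accuracy.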
    Abstract Trajectory prediction plays a crucial role in autonomous driving. Existing mainstream research and continual learning-based methods all require training on complete datasets, leading to poor prediction accuracy when sudden changes in scenarios occur and failing to promptly respond and update the model. Whether these methods can make a prediction in real-time and use data instances to update the model immediately (i.e., online learning settings) remains a question. The problem of gradient explosion or vanishing caused by data instance streams also needs to be addressed. Inspired by Hedge Propagation algorithm, we propose Expert Attention Network, a complete online learning framework for trajectory prediction. We introduce expert attention, which adjusts the weights of different depths of network layers, avoiding the model updated slowly due to gradient problem and enabling fast learning of new scenario's knowledge to restore prediction accuracy. Furthermore, we propose a short-term motion trend kernel function which is sensitive to scenario change, allowing the model to respond quickly. To the best of our knowledge, this work is the first attempt to address the online learning problem in trajectory prediction. The experimental results indicate that traditional methods suffer from gradient problems and that our method can quickly reduce prediction errors and reach the state-of-the-art prediction accuracy.
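The idea of reweighting predictions from different network depths can be illustrated with the classic Hedge update, in which each "expert" has its weight discounted according to its recent loss. This is a minimal sketch, not the authors' implementation: it assumes each depth exposes its own prediction with a per-step loss in [0, 1], and the names `hedge_update`, `beta`, and the toy loss stream are illustrative assumptions.

```python
def hedge_update(weights, losses, beta=0.9):
    """One Hedge step: scale each expert's weight by beta**loss, then renormalize."""
    if len(weights) != len(losses):
        raise ValueError("weights and losses must have the same length")
    scaled = [w * (beta ** loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Three hypothetical experts, e.g. shallow, middle, and deep output heads.
weights = [1 / 3, 1 / 3, 1 / 3]
# Toy per-step losses in [0, 1]; here the shallow expert adapts fastest.
stream = [[0.2, 0.6, 0.9], [0.1, 0.5, 0.9], [0.1, 0.4, 0.8]]
for losses in stream:
    weights = hedge_update(weights, losses)
```

After the loop, the weights still sum to one and the expert with the lowest cumulative loss carries the largest weight, which is the behaviour that lets an online learner shift trust toward whichever depth is currently adapting best.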

AutoFuse: Automatic Fusion Networks for Deformable Medical Image Registration

  • paper_url: http://arxiv.org/abs/2309.05271
  • repo_url: https://github.com/mungomeng/registration-autofuse
  • paper_authors: Mingyuan Meng, Michael Fulham, Dagan Feng, Lei Bi, Jinman Kim
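  • for: This study addresses how spatial information should be fused in deep neural network (DNN)-based deformable image registration, a crucial step for medical tasks such as tumor growth monitoring and population analysis.
  • methods: The authors propose a data-driven fusion strategy, the Automatic Fusion network (AutoFuse), which optimizes its fusion strategy automatically during training. A Fusion Gate (FG) module controls how information is fused at each potential network location.
  • results: Extensive experiments on two well-benchmarked medical registration tasks (inter- and intra-patient registration) with eight public datasets show that AutoFuse outperforms state-of-the-art unsupervised and semi-supervised registration methods, both without labels and with weak labels.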
    Abstract Deformable image registration aims to find a dense non-linear spatial correspondence between a pair of images, which is a crucial step for many medical tasks such as tumor growth monitoring and population analysis. Recently, Deep Neural Networks (DNNs) have been widely recognized for their ability to perform fast end-to-end registration. However, DNN-based registration needs to explore the spatial information of each image and fuse this information to characterize spatial correspondence. This raises an essential question: what is the optimal fusion strategy to characterize spatial correspondence? Existing fusion strategies (e.g., early fusion, late fusion) were empirically designed to fuse information by manually defined prior knowledge, which inevitably constrains the registration performance within the limits of empirical designs. In this study, we depart from existing empirically-designed fusion strategies and develop a data-driven fusion strategy for deformable image registration. To achieve this, we propose an Automatic Fusion network (AutoFuse) that provides flexibility to fuse information at many potential locations within the network. A Fusion Gate (FG) module is also proposed to control how to fuse information at each potential network location based on training data. Our AutoFuse can automatically optimize its fusion strategy during training and can be generalizable to both unsupervised registration (without any labels) and semi-supervised registration (with weak labels provided for partial training data). Extensive experiments on two well-benchmarked medical registration tasks (inter- and intra-patient registration) with eight public datasets show that our AutoFuse outperforms state-of-the-art unsupervised and semi-supervised registration methods.
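One simple way to picture a gate that decides how much of each image's features to mix at a given network location is a sigmoid gate producing a convex combination. The sketch below is only that picture: the gate form and the names `fusion_gate` and `gate_logits` are assumptions for illustration, and the paper's FG module is learned from training data and may be structured differently.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fusion_gate(feat_a, feat_b, gate_logits):
    """Blend two feature vectors element-wise: g * a + (1 - g) * b, g = sigmoid(logit)."""
    if not (len(feat_a) == len(feat_b) == len(gate_logits)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for a, b, logit in zip(feat_a, feat_b, gate_logits):
        g = sigmoid(logit)
        fused.append(g * a + (1.0 - g) * b)
    return fused


# Hypothetical features from the fixed and moving images at one location.
fixed_feat = [1.0, 2.0, 3.0]
moving_feat = [3.0, 2.0, 1.0]
# Logit 0 mixes equally; a large negative logit passes the second stream through.
fused = fusion_gate(fixed_feat, moving_feat, [0.0, 10.0, -10.0])
```

In a trained network the logits would be produced by learnable layers, so the degree of fusion at each location is set by the data instead of being fixed by hand.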

UniKG: A Benchmark and Universal Embedding for Large-Scale Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2309.05269
  • repo_url: https://github.com/yide-qiu/unikg
  • paper_authors: Yide Qiu, Shaoxiang Ling, Tong Zhang, Bo Huang, Zhen Cui
  • for: This paper is written to explore useful knowledge from real-world data by constructing a large-scale heterogeneous graph (HG) benchmark dataset named UniKG from Wikidata, and to propose effective learning methods for large-scale HGs.
  • methods: The paper proposes two key measures for effective learning on large-scale HGs, including a semantic alignment strategy for multi-attribute entities and a novel plug-and-play anisotropy propagation module (APM) to learn effective multi-hop anisotropy propagation kernels.
  • results: The paper sets up a node classification task on the UniKG dataset and evaluates multiple baseline methods, which demonstrate the effectiveness of the proposed methods in mining multi-attribute association through multi-hop aggregation in large-scale HGs.
    Abstract Irregular data in the real world are usually organized as heterogeneous graphs (HGs) consisting of multiple types of nodes and edges. To explore useful knowledge from real-world data, both large-scale encyclopedic HG datasets and corresponding effective learning methods are crucial, but haven't been well investigated. In this paper, we construct a large-scale HG benchmark dataset named UniKG from Wikidata to facilitate knowledge mining and heterogeneous graph representation learning. Overall, UniKG contains more than 77 million multi-attribute entities and 2000 diverse association types, which significantly surpasses the scale of existing HG datasets. To perform effective learning on the large-scale UniKG, two key measures are taken, including (i) the semantic alignment strategy for multi-attribute entities, which projects the feature description of multi-attribute nodes into a common embedding space to facilitate node aggregation in a large receptive field; (ii) proposing a novel plug-and-play anisotropy propagation module (APM) to learn effective multi-hop anisotropy propagation kernels, which extends methods of large-scale homogeneous graphs to heterogeneous graphs. These two strategies enable efficient information propagation among a tremendous number of multi-attribute entities and meanwhile adaptively mine multi-attribute association through the multi-hop aggregation in large-scale HGs. We set up a node classification task on our UniKG dataset, and evaluate multiple baseline methods which are constructed by embedding our APM into large-scale homogeneous graph learning methods. Our UniKG dataset and the baseline codes have been released at https://github.com/Yide-Qiu/UniKG.
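Multi-hop aggregation, which the abstract relies on, can be sketched as a weighted sum of features propagated over 0, 1, ..., K hops. This is a deliberately simplified illustration: it uses plain mean aggregation over neighbours and fixed per-hop weights, standing in for the learned anisotropic kernels of the APM, and every name and value here is hypothetical.

```python
def mean_aggregate(features, neighbors):
    """One hop: replace each node's feature with the mean of its neighbours' features."""
    out = {}
    for node, nbrs in neighbors.items():
        if nbrs:
            out[node] = sum(features[n] for n in nbrs) / len(nbrs)
        else:
            out[node] = features[node]
    return out


def multi_hop_features(features, neighbors, hop_weights):
    """Weighted sum of the 0-hop, 1-hop, ..., K-hop aggregated features."""
    current = dict(features)
    combined = {node: hop_weights[0] * value for node, value in current.items()}
    for weight in hop_weights[1:]:
        current = mean_aggregate(current, neighbors)
        for node in combined:
            combined[node] += weight * current[node]
    return combined


# Toy path graph a - b - c with scalar node features.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
features = {"a": 1.0, "b": 0.0, "c": -1.0}
result = multi_hop_features(features, neighbors, hop_weights=[0.5, 0.3, 0.2])
```

Because the hop-wise features can be precomputed once, this style of propagation scales to very large graphs; a learned module would replace the fixed `hop_weights` with trainable, direction-dependent kernels.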

Unsupervised Bias Detection in College Student Newspapers

  • paper_url: http://arxiv.org/abs/2309.06557
  • repo_url: None
  • paper_authors: Adam M. Lehavi, William McCormack, Noah Kornfeld, Solomon Glazer
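  • for: This paper aims to find and detect bias in college newspaper archives.
  • methods: The paper introduces a framework for scraping complex archive sites that automated tools fail to grab data from, and generates a dataset of 14 student papers with 23,154 entries. The data can be queried by keyword, and bias is calculated by comparing the sentiment of a large language model summary with that of the original article.
  • results: The approach yields more nuanced bias analysis while being less comparative than reconstruction bias and requiring less labelled data than generating keyword sentiment. Results are computed on politically charged words as well as control words to show how conclusions about bias in student newspapers can be drawn.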
    Abstract This paper presents a pipeline with minimal human influence for scraping and detecting bias on college newspaper archives. This paper introduces a framework for scraping complex archive sites that automated tools fail to grab data from, and subsequently generates a dataset of 14 student papers with 23,154 entries. This data can also then be queried by keyword to calculate bias by comparing the sentiment of a large language model summary to the original article. The advantages of this approach are that it is less comparative than reconstruction bias and requires less labelled data than generating keyword sentiment. Results are calculated on politically charged words as well as control words to show how conclusions can be drawn. The complete method facilitates the extraction of nuanced insights with minimal assumptions and categorizations, paving the way for a more objective understanding of bias within student newspaper sources.
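The bias measure described above, comparing the sentiment of a summary with that of the original article, can be sketched as a simple difference of two sentiment scores. The following is a toy illustration under stated assumptions: the tiny word lists and the `sentiment` scorer are placeholders for a real sentiment model, the summary is hard-coded where the paper would use a large language model, and the sign convention (article minus summary) is an assumption.

```python
POSITIVE = {"good", "great", "success", "praised"}
NEGATIVE = {"bad", "failure", "criticized", "controversial"}


def sentiment(text):
    """Toy lexicon score in [-1, 1]: (positive - negative) / matched words."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    matched = pos + neg
    return 0.0 if matched == 0 else (pos - neg) / matched


def bias_score(article, summary):
    """Sentiment of the article minus sentiment of its (assumed neutral) summary."""
    return sentiment(article) - sentiment(summary)


article = "The controversial policy was criticized as a failure, though some praised it."
summary = "The policy was criticized and praised."  # placeholder for an LLM summary
score = bias_score(article, summary)
```

A negative score here means the article is worded more negatively than its summary; averaging such scores over all articles matching a keyword gives a per-keyword estimate that can be compared against control words.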

Enabling Runtime Verification of Causal Discovery Algorithms with Automated Conditional Independence Reasoning (Extended Version)

  • paper_url: http://arxiv.org/abs/2309.05264
  • repo_url: None
  • paper_authors: Pingchuan Ma, Zhenlan Ji, Peisen Yao, Shuai Wang, Kui Ren
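  • for: This paper aims to harden causal discovery algorithms from both reliability and privacy perspectives by addressing unreliable and excessive conditional independence (CI) tests.
  • methods: The paper proposes a runtime verification tool called CICheck, which detects erroneous CI tests and prunes excessive ones. CICheck uses a sound and decidable encoding scheme that translates the CIR problem into SMT problems, and provides a four-stage decision procedure with three lightweight optimizations to improve efficiency.
  • results: The experimental results indicate that CICheck helps improve the reliability and privacy of causal discovery algorithms and reduces the number of excessive CI tests.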
    Abstract Causal discovery is a powerful technique for identifying causal relationships among variables in data. It has been widely used in various applications in software engineering. Causal discovery extensively involves conditional independence (CI) tests. Hence, its output quality highly depends on the performance of CI tests, which can often be unreliable in practice. Moreover, privacy concerns arise when excessive CI tests are performed. Despite the distinct nature between unreliable and excessive CI tests, this paper identifies a unified and principled approach to addressing both of them. Generally, CI statements, the outputs of CI tests, adhere to Pearl's axioms, which are a set of well-established integrity constraints on conditional independence. Hence, we can either detect erroneous CI statements if they violate Pearl's axioms or prune excessive CI statements if they are logically entailed by Pearl's axioms. Holistically, both problems boil down to reasoning about the consistency of CI statements under Pearl's axioms (referred to as CIR problem). We propose a runtime verification tool called CICheck, designed to harden causal discovery algorithms from reliability and privacy perspectives. CICheck employs a sound and decidable encoding scheme that translates CIR into SMT problems. To solve the CIR problem efficiently, CICheck introduces a four-stage decision procedure with three lightweight optimizations that actively prove or refute consistency, and only resort to costly SMT-based reasoning when necessary. Based on the decision procedure to CIR, CICheck includes two variants: ED-CICheck and ED-CICheck, which detect erroneous CI tests (to enhance reliability) and prune excessive CI tests (to enhance privacy), respectively. [abridged due to length limit]
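To make the consistency idea concrete, the sketch below checks CI statements against just two of Pearl's axioms: symmetry (X ⊥ Y | Z implies Y ⊥ X | Z) and decomposition (X ⊥ Y ∪ W | Z implies X ⊥ Y | Z). It is a small illustration only and not CICheck: it ignores the remaining axioms, uses no SMT solver, and the function names and the example statements are assumptions.

```python
from itertools import combinations


def nonempty_subsets(items):
    ordered = sorted(items)
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            yield frozenset(combo)


def normalize(x, y, z):
    """Canonical key; the unordered pair {X, Y} encodes the symmetry axiom."""
    return (frozenset([frozenset(x), frozenset(y)]), frozenset(z))


def decomposition_conflicts(independent, dependent):
    """Return dependence claims contradicted by symmetry plus decomposition."""
    implied = set()
    for x, y, z in independent:
        # Decomposition (applied on both sides via symmetry): any non-empty
        # subsets of X and Y are also independent given Z.
        for xs in nonempty_subsets(x):
            for ys in nonempty_subsets(y):
                implied.add(normalize(xs, ys, z))
    return [claim for claim in dependent if normalize(*claim) in implied]


# Each statement is (X, Y, Z), read as "X is (in)dependent of Y given Z".
independent = [({"A"}, {"B", "C"}, {"D"})]
dependent = [({"B"}, {"A"}, {"D"}), ({"A"}, {"B"}, set())]
conflicts = decomposition_conflicts(independent, dependent)
```

Only the first dependence claim is flagged, because A ⊥ {B, C} | D entails B ⊥ A | D, while nothing is entailed about A and B with an empty conditioning set. The same entailment test, run in the other direction, is what allows a CI test to be skipped when its outcome already follows from earlier results.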

Brain-inspired Evolutionary Architectures for Spiking Neural Networks

  • paper_url: http://arxiv.org/abs/2309.05263
  • repo_url: None
  • paper_authors: Wenxuan Pan, Feifei Zhao, Zhuoya Zhao, Yi Zeng
  • for: This paper explores the evolutionary mechanisms of biological neural networks in the human brain and applies them to optimize the architecture of Spiking Neural Networks (SNNs).
  • methods: The paper proposes an efficient multi-objective evolutionary algorithm based on a few-shot performance predictor to evolve SNNs architecture, incorporating brain-inspired local modular structure and global cross-module connectivity.
  • results: The proposed model achieves high performance, efficiency, and low energy consumption on various datasets, including static and neuromorphic datasets. The results demonstrate the effectiveness of the brain-inspired approach to SNNs architecture optimization.
    Abstract The complex and unique neural network topology of the human brain formed through natural evolution enables it to perform multiple cognitive functions simultaneously. Automated evolutionary mechanisms of biological network structure inspire us to explore efficient architectural optimization for Spiking Neural Networks (SNNs). Instead of manually designed fixed architectures or hierarchical Network Architecture Search (NAS), this paper evolves SNNs architecture by incorporating brain-inspired local modular structure and global cross-module connectivity. Locally, the brain region-inspired module consists of multiple neural motifs with excitatory and inhibitory connections; globally, we evolve free connections among modules, including long-term cross-module feedforward and feedback connections. We further introduce an efficient multi-objective evolutionary algorithm based on a few-shot performance predictor, endowing SNNs with high performance, efficiency and low energy consumption. Extensive experiments on static datasets (CIFAR10, CIFAR100) and neuromorphic datasets (CIFAR10-DVS, DVS128-Gesture) demonstrate that our proposed model boosts energy efficiency, achieving consistent and remarkable performance. This work explores brain-inspired neural architectures suitable for SNNs and also provides preliminary insights into the evolutionary mechanisms of biological neural networks in the human brain.
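Multi-objective evolutionary search typically keeps the candidates that are not dominated on any objective. The sketch below shows that selection step for two objectives, accuracy (to maximize) and energy (to minimize). It is a generic illustration, not the paper's algorithm: the candidate scores are made up, and in the paper the accuracy estimate would come from the few-shot performance predictor.

```python
def dominates(a, b):
    """True if a is no worse than b on both objectives and strictly better on one.

    Each candidate is (accuracy, energy): accuracy is maximized, energy minimized.
    """
    acc_a, energy_a = a
    acc_b, energy_b = b
    no_worse = acc_a >= acc_b and energy_a <= energy_b
    strictly_better = acc_a > acc_b or energy_a < energy_b
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical (predicted accuracy, energy cost) pairs for four architectures.
population = [(0.90, 5.0), (0.92, 7.0), (0.88, 6.0), (0.92, 6.5)]
front = pareto_front(population)
```

The surviving architectures represent different accuracy and energy trade-offs; an evolutionary loop would mutate and recombine them to form the next generation.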

Generating Natural Language Queries for More Effective Systematic Review Screening Prioritisation

  • paper_url: http://arxiv.org/abs/2309.05238
  • repo_url: https://github.com/ielab/sigir-ap-2023-bolean2natural4sr
  • paper_authors: Shuai Wang, Harrisen Scells, Martin Potthast, Bevan Koopman, Guido Zuccon
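  • for: Screening prioritisation in medical systematic reviews, so that subsequent review steps can be carried out more efficiently and effectively.
  • methods: The paper ranks documents using alternative query sources: the Boolean query used to retrieve the documents, and queries generated by instruction-based generative large language models such as ChatGPT and Alpaca.
  • results: The best approach is practical given the information available at screening time and is similar in effectiveness to ranking with the final title.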
    Abstract Screening prioritisation in medical systematic reviews aims to rank the set of documents retrieved by complex Boolean queries. The goal is to prioritise the most important documents so that subsequent review steps can be carried out more efficiently and effectively. The current state of the art uses the final title of the review to rank documents using BERT-based neural rankers. However, the final title is only formulated at the end of the review process, which makes this approach impractical as it relies on ex post facto information. At the time of screening, only a rough working title is available, with which the BERT-based ranker performs significantly worse than with the final title. In this paper, we explore alternative sources of queries for screening prioritisation, such as the Boolean query used to retrieve the set of documents to be screened, and queries generated by instruction-based generative large language models such as ChatGPT and Alpaca. Our best approach is not only practical based on the information available at screening time, but is similar in effectiveness to the final title.
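As a rough picture of using the Boolean query itself as the ranking signal, the sketch below strips the operators from a Boolean query and orders documents by term overlap. This is a toy stand-in under clear assumptions: the paper uses BERT-based neural rankers and LLM-generated natural language queries, whereas the overlap scorer, the example query, and the documents here are all hypothetical, and terms under NOT are not handled specially.

```python
import re

OPERATORS = {"and", "or", "not"}


def boolean_query_terms(query):
    """Extract lowercase search terms from a Boolean query, dropping operators."""
    tokens = re.findall(r"[a-z]+", query.lower())
    return {t for t in tokens if t not in OPERATORS}


def rank_documents(query_terms, documents):
    """Order document ids by number of matching query terms (ties by id)."""
    def score(doc_id):
        words = set(re.findall(r"[a-z]+", documents[doc_id].lower()))
        return len(query_terms & words)

    return sorted(documents, key=lambda doc_id: (-score(doc_id), doc_id))


query = '(diabetes OR insulin) AND (exercise OR "physical activity")'
documents = {
    "d1": "Exercise improves insulin sensitivity in diabetes.",
    "d2": "A survey of graph neural networks.",
    "d3": "Physical activity guidelines for adults.",
}
terms = boolean_query_terms(query)
ranking = rank_documents(terms, documents)
```

The point of the sketch is only that the Boolean query is available at screening time, unlike the final review title, so any ranker driven by it (or by a natural language query derived from it) is usable in practice.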

Detecting Natural Language Biases with Prompt-based Learning

  • paper_url: http://arxiv.org/abs/2309.05227
  • repo_url: None
  • paper_authors: Md Abdul Aowal, Maliha T Islam, Priyanka Mary Mammen, Sandesh Shetty
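  • for: This study explores the newly emerging field of prompt engineering and applies it to the task of detecting language model biases.
  • methods: The study uses manually crafted prompts to surface four types of bias in language models: gender, race, sexual orientation, and religion-based.
  • results: The study evaluates multiple variations of BERT, RoBERTa, and T5, assessing their biases through both human judgment and model-level self-judgment elicited with further prompts.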
    Abstract In this project, we want to explore the newly emerging field of prompt engineering and apply it to the downstream task of detecting LM biases. More concretely, we explore how to design prompts that can indicate 4 different types of biases: (1) gender, (2) race, (3) sexual orientation, and (4) religion-based. Within our project, we experiment with different manually crafted prompts that can draw out the subtle biases that may be present in the language model. We apply these prompts to multiple variations of popular and well-recognized models: BERT, RoBERTa, and T5 to evaluate their biases. We provide a comparative analysis of these models and assess them using a two-fold method: use human judgment to decide whether model predictions are biased and utilize model-level judgment (through further prompts) to understand if a model can self-diagnose the biases of its own prediction.

SparseSwin: Swin Transformer with Sparse Transformer Block

  • paper_url: http://arxiv.org/abs/2309.05224
  • repo_url: https://github.com/krisnapinasthika/sparseswin
  • paper_authors: Krisna Pinasthika, Blessius Sheldo Putra Laksono, Riyandi Banovbi Putera Irsal, Syifa Hukma Shabiyya, Novanto Yudistira
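  • for: Reducing the number of parameters in the transformer architecture to improve computational efficiency.
  • methods: The paper proposes the Sparse Transformer (SparTa) Block, a modified transformer block with an added sparse token converter that reduces the number of tokens used.
  • results: On the ImageNet100, CIFAR10, and CIFAR100 datasets, the proposed SparseSwin model outperforms other state-of-the-art models, with accuracies of 86.96%, 97.43%, and 85.35% respectively.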
    Abstract Advancements in computer vision research have put transformer architecture as the state of the art in computer vision tasks. One of the known drawbacks of the transformer architecture is the high number of parameters, which can lead to a more complex and inefficient algorithm. This paper aims to reduce the number of parameters and, in turn, make the transformer more efficient. We present Sparse Transformer (SparTa) Block, a modified transformer block with an addition of a sparse token converter that reduces the number of tokens used. We use the SparTa Block inside the Swin T architecture (SparseSwin) to leverage Swin's capability to downsample its input and reduce the number of initial tokens to be calculated. The proposed SparseSwin model outperforms other state-of-the-art models in image classification with an accuracy of 86.96%, 97.43%, and 85.35% on the ImageNet100, CIFAR10, and CIFAR100 datasets respectively. Despite its fewer parameters, the result highlights the potential of a transformer architecture using a sparse token converter with a limited number of tokens to optimize the use of the transformer and improve its performance.
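Reducing the number of tokens before self-attention can be pictured as a linear map from N input tokens to M < N output tokens. The sketch below shows that shape change only; it assumes a plain learned mixing matrix, which may differ from the paper's actual sparse token converter, and the names and numbers are illustrative.

```python
def convert_tokens(tokens, mixing):
    """Map N input tokens to M output tokens; each output is a weighted sum of inputs.

    tokens: N lists of length D. mixing: M lists of length N (learned in a real model).
    """
    n_tokens = len(tokens)
    dim = len(tokens[0])
    if any(len(row) != n_tokens for row in mixing):
        raise ValueError("each mixing row needs one weight per input token")
    return [
        [sum(row[i] * tokens[i][d] for i in range(n_tokens)) for d in range(dim)]
        for row in mixing
    ]


tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]  # N = 4 tokens, D = 2
mixing = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]       # M = 2 output tokens
reduced = convert_tokens(tokens, mixing)

# Pairwise attention cost grows with the square of the token count.
cost_before = len(tokens) ** 2
cost_after = len(reduced) ** 2
```

Halving the token count cuts the number of attention pairs by a factor of four in this toy case, which is the kind of saving a fixed, small token budget is meant to provide.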

Circle Feature Graphormer: Can Circle Features Stimulate Graph Transformer?

  • paper_url: http://arxiv.org/abs/2309.06574
  • repo_url: https://github.com/jingsonglv/CFG
  • paper_authors: Jingsong Lv, Hongyang Chen, Yao Qi, Lei Yu
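  • for: This paper targets the missing link prediction task, using circle features to improve the performance of graph transformer neural networks.
  • methods: The paper introduces two local graph features, a modified swing and a bridge, collectively called Circle Features and derived from the concept of a circle of friends. It also gives detailed formulas for computing these features.
  • results: Experiments show that a graph transformer enhanced with circle features achieves state-of-the-art performance on the ogbl-citation2 dataset.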
    Abstract In this paper, we introduce two local graph features for missing link prediction tasks on ogbl-citation2. We define the features as Circle Features, which are borrowed from the concept of circle of friends. We propose the detailed computing formulas for the above features. Firstly, we define the first circle feature as modified swing for common graph, which comes from bipartite graph. Secondly, we define the second circle feature as bridge, which indicates the importance of two nodes for different circles of friends. In addition, we are the first to propose the above features as a bias to enhance the graph transformer neural network, such that the graph self-attention mechanism can be improved. We implement a Circled Feature aware Graph transformer (CFG) model based on SIEG network, which utilizes a double tower structure to capture both global and local structure features. Experimental results show that CFG achieves the state-of-the-art performance on dataset ogbl-citation2.
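Using a structural feature as a bias in self-attention usually means adding it to the attention logits before the softmax. The sketch below shows that mechanism in isolation; it does not compute the swing or bridge features themselves (their formulas are in the paper, not in this digest), and the bias values are made-up placeholders.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(scores, bias):
    """Row-wise softmax over attention scores plus an additive structural bias."""
    return [
        softmax([s + b for s, b in zip(score_row, bias_row)])
        for score_row, bias_row in zip(scores, bias)
    ]


# Two nodes with equal raw attention scores.
scores = [[0.0, 0.0], [0.0, 0.0]]
# Hypothetical pairwise feature: node 0 is strongly tied to node 1.
bias = [[0.0, math.log(3.0)], [0.0, 0.0]]
attention = biased_attention(scores, bias)
```

With the bias in place, node 0 puts three times as much attention on node 1 as on itself, while node 1's unbiased row stays uniform, so the structural feature steers attention without changing the rest of the transformer.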

Quantifying and Attributing the Hallucination of Large Language Models via Association Analysis

  • paper_url: http://arxiv.org/abs/2309.05217
  • repo_url: None
  • paper_authors: Li Du, Yequan Wang, Xingrun Xing, Yiqun Ya, Xiang Li, Xin Jiang, Xuezhi Fang
  • for: measure the level of hallucination of large language models (LLMs) and investigate the reasons for hallucination
  • methods: combine hallucination level quantification and hallucination reason investigation through association analysis, and recognize risk factors according to a taxonomy of model capability
  • results: reveal potential deficiencies in commonsense memorization, relational reasoning, and instruction following, and provide guidance for pretraining and supervised fine-tuning process to mitigate hallucination
    Abstract Although demonstrating superb performance on various NLP tasks, large language models (LLMs) still suffer from the hallucination problem, which threatens the reliability of LLMs. To measure the level of hallucination of LLMs, previous works first categorize the hallucination according to the phenomenon similarity, then quantify the proportion that model outputs contain hallucinatory contents. However, such hallucination rates could easily be distorted by confounders. Moreover, such hallucination rates could not reflect the reasons for the hallucination, as similar hallucinatory phenomena may originate from different sources. To address these issues, we propose to combine the hallucination level quantification and hallucination reason investigation through an association analysis, which builds the relationship between the hallucination rate of LLMs with a set of risk factors. In this way, we are able to observe the hallucination level under each value of each risk factor, examining the contribution and statistical significance of each risk factor, meanwhile excluding the confounding effect of other factors. Additionally, by recognizing the risk factors according to a taxonomy of model capability, we reveal a set of potential deficiencies in commonsense memorization, relational reasoning, and instruction following, which may further provide guidance for the pretraining and supervised fine-tuning process of LLMs to mitigate the hallucination.
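Observing the hallucination level under each value of a risk factor, while holding another factor fixed, can be sketched as a stratified rate computation. This is a simplified illustration and not the paper's statistical model: the risk factor names `rare_entity` and `long_prompt` and all records are invented for the example, and no significance testing is performed.

```python
from collections import defaultdict


def rate_by_factor(records, factor, control=None):
    """Hallucination rate per value of `factor`, optionally within each `control` value."""
    counts = defaultdict(lambda: [0, 0])  # key -> [hallucinated, total]
    for rec in records:
        key = rec[factor] if control is None else (rec[control], rec[factor])
        counts[key][0] += int(rec["hallucinated"])
        counts[key][1] += 1
    return {key: h / n for key, (h, n) in counts.items()}


def make(rare_entity, long_prompt, hallucinated):
    return {"rare_entity": rare_entity, "long_prompt": long_prompt,
            "hallucinated": hallucinated}


# Hypothetical annotated model outputs.
records = [
    make(True, True, True), make(True, True, True),
    make(True, False, True), make(True, False, False),
    make(False, True, True), make(False, True, False),
    make(False, False, False), make(False, False, False),
]
marginal = rate_by_factor(records, "rare_entity")
stratified = rate_by_factor(records, "rare_entity", control="long_prompt")
```

In `stratified`, keys are `(long_prompt, rare_entity)` pairs, so comparing rates within the same `long_prompt` value shows the association with `rare_entity` that remains once the other factor is held fixed.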

Hierarchical Audio-Visual Information Fusion with Multi-label Joint Decoding for MER 2023

  • paper_url: http://arxiv.org/abs/2309.07925
  • repo_url: None
  • paper_authors: Haotian Wang, Yuxuan Xi, Hang Chen, Jun Du, Yan Song, Qing Wang, Hengshun Zhou, Chenxi Wang, Jiefeng Ma, Pengfei Hu, Ya Jiang, Shi Cheng, Jie Zhang, Yuzhe Weng
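  • for: This paper proposes a novel emotion recognition framework that recognizes both discrete and dimensional emotions.
  • methods: The framework uses deep features extracted from foundation models as robust acoustic and visual representations of raw video. Three structures based on attention-guided feature gathering (AFG) are designed for deep feature fusion. In the decoding stage, a joint decoding structure handles emotion classification and valence regression. The final discrete and dimensional emotion predictions are obtained by combining the three structures at the posterior probability level.
  • results: On the Multimodal Emotion Recognition Challenge (MER 2023) dataset, the proposed framework yields consistent improvements in both emotion classification and valence regression. The final system achieves state-of-the-art performance and ranks third on the leaderboard of the MER-MULTI sub-challenge.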
    Abstract In this paper, we propose a novel framework for recognizing both discrete and dimensional emotions. In our framework, deep features extracted from foundation models are used as robust acoustic and visual representations of raw video. Three different structures based on attention-guided feature gathering (AFG) are designed for deep feature fusion. Then, we introduce a joint decoding structure for emotion classification and valence regression in the decoding stage. A multi-task loss based on uncertainty is also designed to optimize the whole process. Finally, by combining three different structures on the posterior probability level, we obtain the final predictions of discrete and dimensional emotions. When tested on the dataset of multimodal emotion recognition challenge (MER 2023), the proposed framework yields consistent improvements in both emotion classification and valence regression. Our final system achieves state-of-the-art performance and ranks third on the leaderboard on MER-MULTI sub-challenge.
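A common way to build an uncertainty-based multi-task loss is to give each task a learnable log-variance s_i and minimize the sum of exp(-s_i) * L_i + s_i. The sketch below uses that well-known form together with simple averaging for posterior-level fusion; both choices are assumptions, since the abstract does not state the exact loss or the combination rule, and the numbers are illustrative.

```python
import math


def uncertainty_weighted_loss(task_losses, log_variances):
    """Sum over tasks of exp(-s_i) * L_i + s_i, with s_i a learnable log-variance."""
    if len(task_losses) != len(log_variances):
        raise ValueError("need one log-variance per task")
    return sum(math.exp(-s) * loss + s for loss, s in zip(task_losses, log_variances))


def average_posteriors(posteriors):
    """Posterior-level fusion: average class probabilities across structures."""
    n_models = len(posteriors)
    n_classes = len(posteriors[0])
    return [sum(p[c] for p in posteriors) / n_models for c in range(n_classes)]


# Hypothetical classification and valence-regression losses with s_i = 0.
total = uncertainty_weighted_loss([0.8, 0.2], [0.0, 0.0])
# Hypothetical class posteriors from three fusion structures.
fused = average_posteriors([[0.6, 0.4], [0.8, 0.2], [0.7, 0.3]])
```

Raising a task's log-variance shrinks that task's contribution to the total while the additive s_i term penalizes inflating it without limit, which lets training balance classification and regression without hand-tuned weights.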

Towards Federated Learning Under Resource Constraints via Layer-wise Training and Depth Dropout

  • paper_url: http://arxiv.org/abs/2309.05213
  • repo_url: None
  • paper_authors: Pengfei Guo, Warren Richard Morningstar, Raviteja Vemulapalli, Karan Singhal, Vishal M. Patel, Philip Andrew Mansfield
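  • for: This paper explores how Federated Layer-wise Learning and Federated Depth Dropout can be used to train large machine learning models on edge devices.
  • methods: The study uses Federated Layer-wise Learning and Federated Depth Dropout to reduce per-client memory, computation, and communication costs.
  • results: Combining the two techniques reduces training memory usage with minimal performance degradation. Specifically, in federated self-supervised representation learning, training memory usage is reduced by 5x or more, while downstream task performance is comparable to conventional federated self-supervised learning.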
    Abstract Large machine learning models trained on diverse data have recently seen unprecedented success. Federated learning enables training on private data that may otherwise be inaccessible, such as domain-specific datasets decentralized across many clients. However, federated learning can be difficult to scale to large models when clients have limited resources. This challenge often results in a trade-off between model size and access to diverse data. To mitigate this issue and facilitate training of large models on edge devices, we introduce a simple yet effective strategy, Federated Layer-wise Learning, to simultaneously reduce per-client memory, computation, and communication costs. Clients train just a single layer each round, reducing resource costs considerably with minimal performance degradation. We also introduce Federated Depth Dropout, a complementary technique that randomly drops frozen layers during training, to further reduce resource usage. Coupling these two techniques enables us to effectively train significantly larger models on edge devices. Specifically, we reduce training memory usage by 5x or more in federated self-supervised representation learning and demonstrate that performance in downstream tasks is comparable to conventional federated self-supervised learning.

Data Summarization beyond Monotonicity: Non-monotone Two-Stage Submodular Maximization

  • paper_url: http://arxiv.org/abs/2309.05183
  • repo_url: None
  • paper_authors: Shaojie Tang
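  • for: Reducing a ground set so that optimizing new objective functions over the reduced set yields results comparable to optimizing over the original ground set, with applications such as data summarization.
  • methods: The reduction uses provided submodular training functions. The work extends prior studies, which typically assume monotone objectives, to non-monotone submodular functions.
  • results: The paper introduces the first constant-factor approximation algorithms for this more general, non-monotone two-stage submodular maximization problem.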
    Abstract The objective of a two-stage submodular maximization problem is to reduce the ground set using provided training functions that are submodular, with the aim of ensuring that optimizing new objective functions over the reduced ground set yields results comparable to those obtained over the original ground set. This problem has applications in various domains including data summarization. Existing studies often assume the monotonicity of the objective function, whereas our work pioneers the extension of this research to accommodate non-monotone submodular functions. We have introduced the first constant-factor approximation algorithms for this more general case.
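
One common way to write the two-stage problem down, given here as a sketch. The symbols ($\Omega$ for the ground set, $f_1,\dots,f_m$ for the training functions, budgets $\ell$ and $k$) and the cardinality constraints are assumptions borrowed from the standard formulation in the literature; the abstract does not state the exact constraints used.

```latex
% Stage 1 picks a reduced ground set S; stage 2 optimizes each f_i inside S.
\[
  \max_{S \subseteq \Omega,\ |S| \le \ell} \;
  \sum_{i=1}^{m} \; \max_{T \subseteq S,\ |T| \le k} f_i(T)
\]
% Submodularity (diminishing returns), for all A \subseteq B \subseteq \Omega and e \notin B:
\[
  f_i(A \cup \{e\}) - f_i(A) \;\ge\; f_i(B \cup \{e\}) - f_i(B)
\]
% Monotonicity, the assumption this work drops:
\[
  A \subseteq B \;\Longrightarrow\; f_i(A) \le f_i(B)
\]
```

Without monotonicity, adding an element to $T$ can lower $f_i(T)$, so the inner maximization may prefer sets smaller than $k$.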

Our Deep CNN Face Matchers Have Developed Achromatopsia

  • paper_url: http://arxiv.org/abs/2309.05180
  • repo_url: None
  • paper_authors: Aman Bhatta, Domingo Mery, Haiyu Wu, Joyce Annan, Micheal C. King, Kevin W. Bowyer
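  • for: This paper shows that modern deep CNN face matchers achieve essentially the same accuracy on grayscale and color versions of test images, and investigates why.
  • methods: The authors analyze whether grayscale images in web-scraped training sets affect accuracy, train on a 100% grayscale training set, examine how the skin region of an individual's images maps to color space, and train with single-channel grayscale input.
  • results: Grayscale content in the training set does not affect accuracy, and training on single-channel grayscale images achieves comparable accuracy, implying that a larger dataset fits within the same memory limit with a less computationally intensive early layer.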
    Abstract Modern deep CNN face matchers are trained on datasets containing color images. We show that such matchers achieve essentially the same accuracy on the grayscale or the color version of a set of test images. We then consider possible causes for deep CNN face matchers "not seeing color". Popular web-scraped face datasets actually have 30 to 60% of their identities with one or more grayscale images. We analyze whether this grayscale element in the training set impacts the accuracy achieved, and conclude that it does not. Further, we show that even with a 100% grayscale training set, comparable accuracy is achieved on color or grayscale test images. Then we show that the skin region of an individual's images in a web-scraped training set exhibits significant variation in its mapping to color space. This suggests that color, at least for web-scraped, in-the-wild face datasets, carries limited identity-related information for training state-of-the-art matchers. Finally, we verify that comparable accuracy is achieved from training using single-channel grayscale images, implying that a larger dataset can be used within the same memory limit, with a less computationally intensive early layer.

DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning

  • paper_url: http://arxiv.org/abs/2309.05173
  • repo_url: https://github.com/zhengxiangshi/dept
  • paper_authors: Zhengxiang Shi, Aldo Lipani
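  • for: This work aims to reduce the memory and time cost of prompt tuning (PT), a parameter-efficient fine-tuning (PEFT) approach that affixes a small number of trainable soft prompt vectors to the input of a language model (LM).
  • methods: Decomposed Prompt Tuning (DePT) decomposes the soft prompt into a shorter soft prompt and a pair of low-rank matrices, which are then optimised with two different learning rates.
  • results: On 23 natural language processing (NLP) and vision-language (VL) tasks, DePT outperforms state-of-the-art PEFT approaches, including the full fine-tuning baseline in some scenarios, while saving over 20% in memory and time costs compared to vanilla PT. DePT also grows more efficient as model size increases.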
    Abstract Prompt tuning (PT), where a small amount of trainable soft (continuous) prompt vectors is affixed to the input of language models (LM), has shown promising results across various tasks and models for parameter-efficient fine-tuning (PEFT). PT stands out from other PEFT approaches because it maintains competitive performance with fewer trainable parameters and does not drastically scale up its parameters as the model size expands. However, PT introduces additional soft prompt tokens, leading to longer input sequences, which significantly impacts training and inference time and memory usage due to the Transformer's quadratic complexity. This is particularly concerning for Large Language Models (LLMs) that face heavy daily querying. To address this issue, we propose Decomposed Prompt Tuning (DePT), which decomposes the soft prompt into a shorter soft prompt and a pair of low-rank matrices that are then optimised with two different learning rates. This allows DePT to achieve better performance while saving over 20% memory and time costs compared to vanilla PT and its variants, without changing trainable parameter sizes. Through extensive experiments on 23 natural language processing (NLP) and vision-language (VL) tasks, we demonstrate that DePT outperforms state-of-the-art PEFT approaches, including the full fine-tuning baseline in some scenarios. Additionally, we empirically show that DePT grows more efficient as the model size increases. Our further study reveals that DePT integrates seamlessly with parameter-efficient transfer learning in the few-shot learning setting and highlights its adaptability to various model architectures and sizes.
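
The claim "without changing trainable parameter sizes" can be checked with a little arithmetic. The sketch below assumes the low-rank pair has shapes `max_seq_len x rank` and `rank x d_model`; these shapes, the function names, and all sizes are illustrative assumptions, not values confirmed by the abstract.

```python
def vanilla_pt_params(prompt_len, d_model):
    """Trainable parameters of a plain soft prompt: one d_model vector per token."""
    return prompt_len * d_model

def decomposed_params(short_len, max_seq_len, rank, d_model):
    """Short soft prompt plus a low-rank pair.

    Assumed shapes for the pair: (max_seq_len x rank) and (rank x d_model).
    """
    return short_len * d_model + max_seq_len * rank + rank * d_model

def matching_rank(prompt_len, short_len, max_seq_len, d_model):
    """Largest rank that keeps the decomposed budget within the vanilla budget."""
    freed = (prompt_len - short_len) * d_model
    return freed // (max_seq_len + d_model)

# Hypothetical sizes, chosen only for illustration.
prompt_len, short_len, max_seq_len, d_model = 100, 40, 256, 768

rank = matching_rank(prompt_len, short_len, max_seq_len, d_model)
vanilla = vanilla_pt_params(prompt_len, d_model)
decomposed = decomposed_params(short_len, max_seq_len, rank, d_model)

# The sequence the Transformer sees shrinks from max_seq_len + prompt_len
# to max_seq_len + short_len; attention cost scales with its square.
attention_cost_ratio = ((max_seq_len + short_len) / (max_seq_len + prompt_len)) ** 2

print(rank, vanilla, decomposed, round(attention_cost_ratio, 3))
```

With these sizes the two parameter budgets match exactly, while the shorter prompt reduces the quadratic attention term.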

cs.CL - 2023-09-11

Hi Model, generating ‘nice’ instead of ‘good’ is not as bad as generating ‘rice’! Towards Context and Semantic Infused Dialogue Generation Loss Function and Evaluation Metric

  • paper_url: http://arxiv.org/abs/2309.05804
  • repo_url: None
  • paper_authors: Abhisek Tiwari, Muhammed Sinan, Kaushik Roy, Amit Sheth, Sriparna Saha, Pushpak Bhattacharyya
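  • for: This work proposes a new loss function and a new evaluation metric for dialogue generation, addressing limitations of cross-entropy and BLEU.
  • methods: It introduces the Semantic Infused Contextualized diaLogue (SemTextualLogue) loss and the Dialuation metric, which incorporates both context relevance and semantic appropriateness, and runs experiments on two benchmark dialogue corpora covering task-oriented and open-domain scenarios.
  • results: A dialogue generation model trained with the SemTextualLogue loss outperforms one trained with the traditional cross-entropy loss in both quantitative and qualitative evaluation, across the datasets and evaluation metrics.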
    Abstract Over the past two decades, dialogue modeling has made significant strides, moving from simple rule-based responses to personalized and persuasive response generation. However, despite these advancements, the objective functions and evaluation metrics for dialogue generation have remained stagnant, i.e., cross-entropy and BLEU, respectively. These lexical-based metrics have the following key limitations: (a) word-to-word matching without semantic consideration: It assigns the same credit for failure to generate 'nice' and 'rice' for 'good'. (b) missing context attribute for evaluating the generated response: Even if a generated response is relevant to the ongoing dialogue context, it may still be penalized for not matching the gold utterance provided in the corpus. In this paper, we first investigate these limitations comprehensively and propose a new loss function called Semantic Infused Contextualized diaLogue (SemTextualLogue) loss function. Furthermore, we formulate a new evaluation metric called Dialuation, which incorporates both context relevance and semantic appropriateness while evaluating a generated response. We conducted experiments on two benchmark dialogue corpora, encompassing both task-oriented and open-domain scenarios. We found that the dialogue generation model trained with SemTextualLogue loss attained superior performance (in both quantitative and qualitative evaluation) compared to the traditional cross-entropy loss function across the datasets and evaluation metrics.

MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning

  • paper_url: http://arxiv.org/abs/2309.05653
  • repo_url: https://github.com/TIGER-AI-Lab/MAmmoTH
  • paper_authors: Xiang Yue, Xingwei Qu, Ge Zhang, Yao Fu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen
  • for: The paper is written for developing a series of open-source large language models (LLMs) specifically tailored for general math problem-solving.
  • methods: The paper uses a meticulously curated instruction tuning dataset called MathInstruct, which includes 13 math datasets with intermediate rationales, six of which were newly curated by the authors. The models are trained on this dataset, which presents a unique hybrid of chain-of-thought (CoT) and program-of-thought (PoT) rationales.
  • results: The MAmmoTH series of models substantially outperform existing open-source models on nine mathematical reasoning datasets across all scales, with an average accuracy gain between 13% and 29%. The MAmmoTH-7B model achieves 35% accuracy on MATH, which exceeds the best open-source 7B model (WizardMath) by 25%, and the MAmmoTH-34B model achieves 46% accuracy on MATH, even surpassing GPT-4’s CoT result.
    Abstract We introduce MAmmoTH, a series of open-source large language models (LLMs) specifically tailored for general math problem-solving. The MAmmoTH models are trained on MathInstruct, our meticulously curated instruction tuning dataset. MathInstruct is compiled from 13 math datasets with intermediate rationales, six of which have rationales newly curated by us. It presents a unique hybrid of chain-of-thought (CoT) and program-of-thought (PoT) rationales, and also ensures extensive coverage of diverse fields in math. The hybrid of CoT and PoT not only unleashes the potential of tool use but also allows different thought processes for different math problems. As a result, the MAmmoTH series substantially outperform existing open-source models on nine mathematical reasoning datasets across all scales with an average accuracy gain between 13% and 29%. Remarkably, our MAmmoTH-7B model reaches 35% on MATH (a competition-level dataset), which exceeds the best open-source 7B model (WizardMath) by 25%, and the MAmmoTH-34B model achieves 46% accuracy on MATH, even surpassing GPT-4's CoT result. Our work underscores the importance of diverse problem coverage and the use of hybrid rationales in developing superior math generalist models.
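
To make the CoT/PoT distinction concrete, here is a small sketch. The word problem, both rationales, and the helper `run_pot` are hypothetical and are not drawn from MathInstruct; the convention that the program stores its result in `answer` is also an assumption.

```python
# Hypothetical problem: a rectangle has perimeter 30 and its length is
# twice its width. What is its area?

# Chain-of-thought (CoT): the reasoning is free text ending in an answer.
cot_rationale = (
    "Let the width be w, so the length is 2w. "
    "The perimeter is 2 * (w + 2w) = 6w = 30, so w = 5 and the length is 10. "
    "The area is 5 * 10 = 50."
)

# Program-of-thought (PoT): the reasoning is a program, and the answer
# comes from executing it with an interpreter (the "tool use" part).
pot_rationale = """
width = 30 / (2 * (2 + 1))
length = 2 * width
answer = length * width
"""

def run_pot(program):
    """Execute a PoT rationale and return its `answer` variable.

    exec() on model output is unsafe outside a sandbox; this is only a sketch.
    """
    scope = {}
    exec(program, {}, scope)
    return scope["answer"]

answer = run_pot(pot_rationale)
print(answer)
```

A PoT rationale delegates the arithmetic to the interpreter, while a CoT rationale suits problems where the reasoning is hard to express as code.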

Effective Proxy for Human Labeling: Ensemble Disagreement Scores in Large Language Models for Industrial NLP

  • paper_url: http://arxiv.org/abs/2309.05619
  • repo_url: None
  • paper_authors: Wei Du, Laksh Advani, Yashmeet Gambhir, Daniel J Perry, Prashant Shiralkar, Zhengzheng Xing, Aaron Colak
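  • for: Assessing LLM performance on unlabeled production data in industry applications without the expense and delay of human labeling.
  • methods: Ensemble disagreement scores serve as a proxy for human labeling in zero-shot, few-shot, and fine-tuned settings, evaluated on the keyphrase extraction (KPE) task against true error from human-labeled ground truth and against silver labels produced by another LLM.
  • results: Across various languages and domains, disagreement scores estimate model performance with a mean average error (MAE) as low as 0.4%, on average 13.8% better than using silver labels.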
    Abstract Large language models (LLMs) have demonstrated significant capability to generalize across a large number of NLP tasks. For industry applications, it is imperative to assess the performance of the LLM on unlabeled production data from time to time to validate for a real-world setting. Human labeling to assess model error requires considerable expense and time delay. Here we demonstrate that ensemble disagreement scores work well as a proxy for human labeling for language models in zero-shot, few-shot, and fine-tuned settings, per our evaluation on keyphrase extraction (KPE) task. We measure fidelity of the results by comparing to true error measured from human labeled ground truth. We contrast with the alternative of using another LLM as a source of machine labels, or silver labels. Results across various languages and domains show disagreement scores provide a better estimation of model performance with mean average error (MAE) as low as 0.4% and on average 13.8% better than using silver labels.
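
A rough sketch of what a disagreement score over an ensemble's keyphrase predictions might look like. Scoring by mean pairwise Jaccard distance is an assumption made for illustration, as are the function names and the example predictions; the abstract does not specify the exact scoring rule.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def disagreement(predictions):
    """Mean pairwise Jaccard distance between ensemble members' predictions."""
    pairs = list(combinations(predictions, 2))
    return sum(1.0 - jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical keyphrase sets from three ensemble members for one document.
doc_predictions = [
    {"federated learning", "edge devices"},
    {"federated learning", "edge devices"},
    {"federated learning", "memory"},
]

score = disagreement(doc_predictions)
print(round(score, 3))
```

A score of zero means all members agree; higher scores would flag documents where the model is more likely to be wrong, without needing a human label.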

Incorporating Pre-trained Model Prompting in Multimodal Stock Volume Movement Prediction

  • paper_url: http://arxiv.org/abs/2309.05608
  • repo_url: https://github.com/rayruibochen/promuse
  • paper_authors: Ruibo Chen, Zhiyuan Zhang, Yi Liu, Ruihan Bao, Keiko Harimoto, Xu Sun
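  • for: Multimodal prediction of stock trading volume movement using stock-related news.
  • methods: The proposed ProMUSE model uses pre-trained language models with prompt learning to process text alongside time series, and adds cross-modality contrastive alignment while retaining the unimodal heads beside the fusion head.
  • results: ProMUSE outperforms existing baselines, and further analyses validate the architecture against potential variants and learning mechanisms.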
    Abstract Multimodal stock trading volume movement prediction with stock-related news is one of the fundamental problems in the financial area. Existing multimodal works that train models from scratch face the problem of lacking universal knowledge when modeling financial news. In addition, the models' ability may be limited by the lack of domain-related knowledge due to insufficient data in the datasets. To handle this issue, we propose the Prompt-based MUltimodal Stock volumE prediction model (ProMUSE) to process text and time series modalities. We use pre-trained language models for better comprehension of financial news and adopt prompt learning methods to leverage their capability in universal knowledge to model textual information. Besides, simply fusing two modalities can cause harm to the unimodal representations. Thus, we propose a novel cross-modality contrastive alignment while reserving the unimodal heads beside the fusion head to mitigate this problem. Extensive experiments demonstrate that our proposed ProMUSE outperforms existing baselines. Comprehensive analyses further validate the effectiveness of our architecture compared to potential variants and learning mechanisms.

Long-Range Transformer Architectures for Document Understanding

  • paper_url: http://arxiv.org/abs/2309.05503
  • repo_url: https://github.com/thibaultdouzon/long-range-document-transformer
  • paper_authors: Thibault Douzon, Stefan Duffner, Christophe Garcia, Jérémy Espinas
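  • for: This paper explores strategies for applying Transformer-based models to long multi-page documents.
  • methods: It introduces two multi-modal (text + layout) long-range models for Document Understanding, built on efficient Transformer implementations for long sequences, and proposes a 2D relative attention bias to guide self-attention towards relevant tokens.
  • results: The long-range models improve Information Retrieval on multi-page business documents at a small performance cost on shorter sequences, and 2D relative attention is effective on dense text for both normal and long-range models.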
    Abstract Since their release, Transformers have revolutionized many fields from Natural Language Understanding to Computer Vision. Document Understanding (DU) was not left behind, with the first Transformer-based models for DU dating from late 2019. However, the computational complexity of the self-attention operation limits their capabilities to small sequences. In this paper we explore multiple strategies to apply Transformer-based models to long multi-page documents. We introduce two new multi-modal (text + layout) long-range models for DU. They are based on efficient implementations of Transformers for long sequences. Long-range models can process whole documents at once effectively and are less impaired by the document's length. We compare them to LayoutLM, a classical Transformer adapted for DU and pre-trained on millions of documents. We further propose 2D relative attention bias to guide self-attention towards relevant tokens without harming model efficiency. We observe improvements on multi-page business documents on Information Retrieval for a small performance cost on smaller sequences. Relative 2D attention proved effective on dense text for both normal and long-range models.

Personality Detection and Analysis using Twitter Data

  • paper_url: http://arxiv.org/abs/2309.05497
  • repo_url: https://github.com/SRIGURUPRASAD/Trending-Polarity_Diagnosis-Word_Cloud-Profile_analysis
  • paper_authors: Abhilash Datta, Souvic Chakraborty, Animesh Mukherjee
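  • for: This paper addresses automatic detection of personality traits from text, where most existing methods rely on small datasets.
  • methods: The authors collect and release an automatically curated dataset of 152 million tweets and 56 thousand data points for Myers-Briggs personality type (MBTI) prediction, and run qualitative, quantitative, and ablation studies on it.
  • results: The analysis results often follow natural intuition, and the ablation studies show how baselines perform on the dataset.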
    Abstract Personality types are important in various fields as they hold relevant information about the characteristics of a human being in an explainable format. They are often good predictors of a person's behaviors in a particular environment and have applications ranging from candidate selection to marketing and mental health. Recently automatic detection of personality traits from texts has gained significant attention in computational linguistics. Most personality detection and analysis methods have focused on small datasets making their experimental observations often limited. To bridge this gap, we focus on collecting and releasing the largest automatically curated dataset for the research community which has 152 million tweets and 56 thousand data points for the Myers-Briggs personality type (MBTI) prediction task. We perform a series of extensive qualitative and quantitative studies on our dataset to analyze the data patterns in a better way and infer conclusions. We show how our intriguing analysis results often follow natural intuition. We also perform a series of ablation studies to show how the baselines perform for our dataset.

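CrisisTransformers: Pre-trained language models and sentence encoders for crisis-related social media texts

  • paper_url: http://arxiv.org/abs/2309.05494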
  • repo_url: None
  • paper_authors: Rabindra Lamsal, Maria Rodriguez Read, Shanika Karunasekera
  • for: This paper is written to address the challenges of analyzing crisis-related social media texts and to introduce an ensemble of pre-trained language models and sentence encoders called CrisisTransformers.
  • methods: The authors train their models on an extensive corpus of over 15 billion word tokens from tweets associated with more than 30 crisis events, and evaluate existing models and CrisisTransformers on 18 crisis-specific public datasets.
  • results: The authors find that their pre-trained models outperform strong baselines across all datasets in classification tasks, and their best-performing sentence encoder improves the state-of-the-art by 17.43% in sentence encoding tasks. Additionally, they investigate the impact of model initialization on convergence and the significance of domain-specific models in generating semantically meaningful sentence embeddings.
    Abstract Social media platforms play an essential role in crisis communication, but analyzing crisis-related social media texts is challenging due to their informal nature. Transformer-based pre-trained models like BERT and RoBERTa have shown success in various NLP tasks, but they are not tailored for crisis-related texts. Furthermore, general-purpose sentence encoders are used to generate sentence embeddings, regardless of the textual complexities in crisis-related texts. Advances in applications like text classification, semantic search, and clustering contribute to effective processing of crisis-related texts, which is essential for emergency responders to gain a comprehensive view of a crisis event, whether historical or real-time. To address these gaps in crisis informatics literature, this study introduces CrisisTransformers, an ensemble of pre-trained language models and sentence encoders trained on an extensive corpus of over 15 billion word tokens from tweets associated with more than 30 crisis events, including disease outbreaks, natural disasters, conflicts, and other critical incidents. We evaluate existing models and CrisisTransformers on 18 crisis-specific public datasets. Our pre-trained models outperform strong baselines across all datasets in classification tasks, and our best-performing sentence encoder improves the state-of-the-art by 17.43% in sentence encoding tasks. Additionally, we investigate the impact of model initialization on convergence and evaluate the significance of domain-specific models in generating semantically meaningful sentence embeddings. All models are publicly released (https://huggingface.co/crisistransformers), with the anticipation that they will serve as a robust baseline for tasks involving the analysis of crisis-related social media texts.

Zero-shot Learning with Minimum Instruction to Extract Social Determinants and Family History from Clinical Notes using GPT Model

  • paper_url: http://arxiv.org/abs/2309.05475
  • repo_url: None
  • paper_authors: Neel Bhate, Ansh Mittal, Zhe He, Xiao Luo
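  • for: This study investigates zero-shot learning for jointly extracting demographics, social determinants of health, and family history from clinical notes.
  • methods: A GPT model is given minimum information, and its output on de-identified, annotated real-world clinical notes is scored with both traditional NER evaluation metrics and semantic similarity evaluation metrics.
  • results: The GPT-3.5 method achieved an average F1 of 0.975 on demographics extraction, 0.615 on social determinants extraction, and 0.722 on family history extraction.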
    Abstract Demographics, Social determinants of health, and family history documented in the unstructured text within the electronic health records are increasingly being studied to understand how this information can be utilized with the structured data to improve healthcare outcomes. After the GPT models were released, many studies have applied GPT models to extract this information from the narrative clinical notes. Different from the existing work, our research focuses on investigating the zero-shot learning on extracting this information together by providing minimum information to the GPT model. We utilize de-identified real-world clinical notes annotated for demographics, various social determinants, and family history information. Given that the GPT model might provide text different from the text in the original data, we explore two sets of evaluation metrics, including the traditional NER evaluation metrics and semantic similarity evaluation metrics, to completely understand the performance. Our results show that the GPT-3.5 method achieved an average of 0.975 F1 on demographics extraction, 0.615 F1 on social determinants extraction, and 0.722 F1 on family history extraction. We believe these results can be further improved through model fine-tuning or few-shots learning. Through the case studies, we also identified the limitations of the GPT models, which need to be addressed in future research.
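
For readers unfamiliar with the F1 scores quoted above, the sketch below shows an exact-match F1 over extracted (category, text) pairs. The example annotations and the function name are hypothetical, and exact matching is only one of the "traditional NER" conventions; the abstract does not say which matching rule was used.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall, and F1 between two sets of extractions."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical annotations for one clinical note.
gold = {("family_history", "mother had diabetes"), ("social", "lives alone")}
pred = {("family_history", "mother had diabetes"), ("social", "smoker")}

precision, recall, f1 = precision_recall_f1(pred, gold)
print(precision, recall, f1)
```

Exact matching penalizes a model that paraphrases the note, which is presumably why the study also reports semantic similarity metrics.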

Flesch or Fumble? Evaluating Readability Standard Alignment of Instruction-Tuned Language Models

  • paper_url: http://arxiv.org/abs/2309.05454
  • repo_url: None
  • paper_authors: Joseph Marvin Imperial, Harish Tayyar Madabushi
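  • for: This study evaluates how well open and closed-source instruction-tuned language models perform on story completion and narrative simplification, tasks that teachers perform, when prompted to follow readability standards.
  • methods: A diverse set of models, including ChatGPT, BLOOMZ, and FlanT5, is tested with standard-guided prompts that control text readability, based on standards such as FKGL and CEFR.
  • results: Globally recognized models like ChatGPT may be less effective and may need more refined prompts for these generative tasks than open-source models such as BLOOMZ and FlanT5, which show promising results.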
    Abstract Readability metrics and standards such as Flesch Kincaid Grade Level (FKGL) and the Common European Framework of Reference for Languages (CEFR) exist to guide teachers and educators to properly assess the complexity of educational materials before administering them for classroom use. In this study, we select a diverse set of open and closed-source instruction-tuned language models and investigate their performances in writing story completions and simplifying narratives (tasks that teachers perform) using standard-guided prompts controlling text readability. Our extensive findings provide empirical proof of how globally recognized models like ChatGPT may be considered less effective and may require more refined prompts for these generative tasks compared to other open-sourced models such as BLOOMZ and FlanT5, which have shown promising results.
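
For reference, FKGL is computed as 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below applies that formula with a deliberately naive syllable counter (runs of vowels); the counter, the tokenization, and the example texts are simplifying assumptions, so scores will differ from those of dedicated readability tools.

```python
import re

def count_syllables(word):
    """Naive syllable estimate: number of vowel runs, at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fkgl(text):
    """Flesch-Kincaid Grade Level with naive sentence and syllable splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

simple = "The cat sat on the mat. It was a big cat."
hard = ("Comprehensive readability evaluation necessitates "
        "sophisticated computational methodologies.")

print(round(fkgl(simple), 2), round(fkgl(hard), 2))
```

Short sentences of one-syllable words can score below zero, which is expected behavior for the formula rather than a bug.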

Evaluating the Deductive Competence of Large Language Models

  • paper_url: http://arxiv.org/abs/2309.05452
  • repo_url: None
  • paper_authors: S. M. Seals, Valerie L. Shalin
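  • for: This study assesses the reasoning and problem-solving capabilities of large language models (LLMs).
  • methods: Several LLMs are tested on a classic type of deductive reasoning problem from the cognitive science literature, with follow-up experiments that vary the presentation format and content.
  • results: The tested LLMs have limited ability to solve the problems in their conventional form. Performance differs between conditions but does not improve overall, and it interacts with format and content in unexpected ways that differ from human performance, suggesting that LLMs have unique reasoning biases only partially predicted from human reasoning performance.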
    Abstract The development of highly fluent large language models (LLMs) has prompted increased interest in assessing their reasoning and problem-solving capabilities. We investigate whether several LLMs can solve a classic type of deductive reasoning problem from the cognitive science literature. The tested LLMs have limited abilities to solve these problems in their conventional form. We performed follow up experiments to investigate if changes to the presentation format and content improve model performance. We do find performance differences between conditions; however, they do not improve overall performance. Moreover, we find that performance interacts with presentation format and content in unexpected ways that differ from human performance. Overall, our results suggest that LLMs have unique reasoning biases that are only partially predicted from human reasoning performance.

TeGit: Generating High-Quality Instruction-Tuning Data with Text-Grounded Task Design

  • paper_url: http://arxiv.org/abs/2309.05447
  • repo_url: None
  • paper_authors: Yongrui Chen, Haiyun Jiang, Xinting Huang, Shuming Shi, Guilin Qi
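  • for: High-quality instruction-tuning data is critical to improving LLM capabilities, but existing collection methods are limited by manual labeling costs or by hallucination from relying solely on LLM generation.
  • methods: The paper presents a scalable method that trains language models to automatically design tasks grounded in human-written texts, generating the instruction, input, and output simultaneously to filter noise.
  • results: Automated and manual evaluation experiments demonstrate the quality of the resulting dataset.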
    Abstract High-quality instruction-tuning data is critical to improving LLM capabilities. Existing data collection methods are limited by unrealistic manual labeling costs or by the hallucination of relying solely on LLM generation. To address the problems, this paper presents a scalable method to automatically collect high-quality instructional adaptation data by training language models to automatically design tasks based on human-written texts. Intuitively, human-written text helps the model attenuate hallucinations during the generation of tasks. Unlike instruction back-translation-based methods that directly take the given text as a response, we require the model to generate the instruction, input, and output simultaneously to filter the noise. The results of the automated and manual evaluation experiments demonstrate the quality of our dataset.

Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning

  • paper_url: http://arxiv.org/abs/2309.05444
  • repo_url: None
  • paper_authors: Ted Zadouri, Ahmet Üstün, Arash Ahmadian, Beyza Ermiş, Acyr Locatelli, Sara Hooker
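  • for: The paper pushes Mixture of Experts (MoE) toward parameter-efficient fine-tuning (PEFT), addressing the difficulty of scaling conventional MoE models, which must keep all experts in memory.
  • methods: The paper combines the MoE architecture with lightweight experts to obtain an extremely parameter-efficient MoE. Only the lightweight experts are updated, less than 1% of the parameters of an 11B model, and the method generalizes to unseen tasks because it does not rely on prior task knowledge.
  • results: In the experiments, the parameter-efficient MoE outperforms standard PEFT methods and is on par with full fine-tuning while updating far fewer parameters. It also generalizes to unseen tasks.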
    Abstract The Mixture of Experts (MoE) is a widely known neural architecture where an ensemble of specialized sub-models optimizes overall performance with a constant computational cost. However, conventional MoEs pose challenges at scale due to the need to store all experts in memory. In this paper, we push MoE to the limit. We propose extremely parameter-efficient MoE by uniquely combining MoE architecture with lightweight experts. Our MoE architecture outperforms standard parameter-efficient fine-tuning (PEFT) methods and is on par with full fine-tuning by only updating the lightweight experts -- less than 1% of an 11B parameters model. Furthermore, our method generalizes to unseen tasks as it does not depend on any prior task knowledge. Our research underscores the versatility of the mixture of experts architecture, showcasing its ability to deliver robust performance even when subjected to rigorous parameter constraints. Our code used in all the experiments is publicly available here: https://github.com/for-ai/parameter-efficient-moe.
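    Summary Mixture of Experts (MoE) is a widely known neural architecture in which an ensemble of specialized sub-models improves overall performance at a constant computational cost. However, conventional MoEs pose challenges at scale because all experts must be stored in memory. In this paper, we push MoE to the limit. We propose an extremely parameter-efficient MoE that uniquely combines the MoE architecture with lightweight experts. Our MoE architecture outperforms standard parameter-efficient fine-tuning (PEFT) methods and is on par with full fine-tuning while updating only the lightweight experts, less than 1% of an 11B-parameter model. Our method also generalizes to unseen tasks because it does not depend on prior task knowledge. Our research underscores the versatility of the mixture of experts architecture, showing that it delivers robust performance even under strict parameter constraints. The code used in all experiments is available at https://github.com/for-ai/parameter-efficient-moe.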

Experimenting with UD Adaptation of an Unsupervised Rule-based Approach for Sentiment Analysis of Mexican Tourist Texts

  • paper_url: http://arxiv.org/abs/2309.05312
  • repo_url: None
  • paper_authors: Olga Kellert, Mahmud Uz Zaman, Nicholas Hill Matlis, Carlos Gómez-Rodríguez
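  • for: The paper reports the results of experimenting with a Universal Dependencies (UD) adaptation of an Unsupervised, Compositional and Recursive (UCR) rule-based approach to Sentiment Analysis (SA), submitted to the Rest-Mex 2023 Shared Task (Team Olga/LyS-SALSA) within the IberLEF 2023 conference.
  • methods: The approach applies basic syntactic rules, such as rules of modification and negation, to words drawn from sentiment dictionaries. This exploits the advantages of an unsupervised method: (1) interpretability and explainability of SA, (2) robustness across datasets, languages, and domains, and (3) usability by non-experts in NLP.
  • results: The approach considerably outperforms other unsupervised approaches that handle negation and modification with simple heuristic rules. The authors also discuss future improvements: using modality features as another polarity-shifting rule, and word disambiguation techniques to identify the right sentiment words.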
    Abstract This paper summarizes the results of experimenting with Universal Dependencies (UD) adaptation of an Unsupervised, Compositional and Recursive (UCR) rule-based approach for Sentiment Analysis (SA) submitted to the Shared Task at Rest-Mex 2023 (Team Olga/LyS-SALSA) (within the IberLEF 2023 conference). By using basic syntactic rules such as rules of modification and negation applied on words from sentiment dictionaries, our approach exploits some advantages of an unsupervised method for SA: (1) interpretability and explainability of SA, (2) robustness across datasets, languages and domains and (3) usability by non-experts in NLP. We compare our approach with other unsupervised approaches of SA that in contrast to our UCR rule-based approach use simple heuristic rules to deal with negation and modification. Our results show a considerable improvement over these approaches. We discuss future improvements of our results by using modality features as another shifting rule of polarity and word disambiguation techniques to identify the right sentiment words.
    Summary Using basic syntactic rules of modification and negation applied to words from sentiment dictionaries, the approach exploits three advantages of an unsupervised method for SA: (1) interpretability and explainability of SA results, (2) robustness across datasets, languages, and domains, and (3) usability by non-experts in NLP. We compare our approach with other unsupervised approaches to SA that use simple heuristic rules to deal with negation and modification, and our results show a considerable improvement over these approaches. Future work could improve the results by incorporating modality features as another polarity-shifting rule and by using word disambiguation techniques to identify the correct sentiment words.

Analysing Cross-Lingual Transfer in Low-Resourced African Named Entity Recognition

  • paper_url: http://arxiv.org/abs/2309.05311
  • repo_url: https://github.com/michael-beukman/nertransfer
  • paper_authors: Michael Beukman, Manuel Fokam
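  • for: The study investigates the properties of cross-lingual transfer learning between ten low-resourced languages on a named entity recognition task.
  • methods: The authors examine how adaptive fine-tuning and the choice of transfer language affect zero-shot transfer performance.
  • results: Models that perform well on a single language often do so at the expense of generalising to others, while models that generalise best to other languages perform worse on individual languages. In addition, the amount of data overlap between the source and target datasets predicts transfer performance better than either the geographical or the genetic distance between the languages.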
    Abstract Transfer learning has led to large gains in performance for nearly all NLP tasks while making downstream models easier and faster to train. This has also been extended to low-resourced languages, with some success. We investigate the properties of cross-lingual transfer learning between ten low-resourced languages, from the perspective of a named entity recognition task. We specifically investigate how much adaptive fine-tuning and the choice of transfer language affect zero-shot transfer performance. We find that models that perform well on a single language often do so at the expense of generalising to others, while models with the best generalisation to other languages suffer in individual language performance. Furthermore, the amount of data overlap between the source and target datasets is a better predictor of transfer performance than either the geographical or genetic distance between the languages.
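    Summary Transfer learning has brought large performance gains on most natural language processing tasks while making downstream models easier and faster to train. The technique has also been extended to low-resourced languages, with some success. We investigate cross-lingual transfer learning between ten low-resourced languages from the perspective of a named entity recognition task. We specifically study how much adaptive fine-tuning and the choice of transfer language affect zero-shot transfer performance. We find that models that perform well on a single language often do so at the expense of performance on other languages, while models that generalise best across languages perform worse on individual languages. In addition, the amount of data overlap between the source and target datasets predicts cross-lingual transfer performance better than geographical or genetic distance between the languages.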

Minuteman: Machine and Human Joining Forces in Meeting Summarization

  • paper_url: http://arxiv.org/abs/2309.05272
  • repo_url: None
  • paper_authors: František Kmječ, Ondřej Bojar
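  • for: The paper proposes a new meeting-minuting tool, Minuteman, that helps notetakers produce meeting minutes of sufficient quality more efficiently.
  • methods: The tool combines speech recognition and summarization models to provide a live transcript and a live meeting summary, which users can edit collaboratively in real time to correct ASR errors and imperfect summary points.
  • results: According to the authors, the tool eases the cognitive load of notetakers and lets them catch up easily on parts of the meeting they missed. It was tested in several varied settings.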
    Abstract Many meetings require creating a meeting summary to keep everyone up to date. Creating minutes of sufficient quality is however very cognitively demanding. Although we currently possess capable models for both audio speech recognition (ASR) and summarization, their fully automatic use is still problematic. ASR models frequently commit errors when transcribing named entities while the summarization models tend to hallucinate and misinterpret the transcript. We propose a novel tool -- Minuteman -- to enable efficient semi-automatic meeting minuting. The tool provides a live transcript and a live meeting summary to the users, who can edit them in a collaborative manner, enabling correction of ASR errors and imperfect summary points in real time. The resulting application eases the cognitive load of the notetakers and allows them to easily catch up if they missed a part of the meeting due to absence or a lack of focus. We conduct several tests of the application in varied settings, exploring the worthiness of the concept and the possible user strategies.
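    Summary Many meetings require a meeting summary to keep everyone up to date. Producing minutes of sufficient quality is very cognitively demanding. Although capable models now exist for both speech recognition and summarization, using them fully automatically remains problematic. Speech recognition models frequently make errors on named entities, while summarization models tend to hallucinate and misinterpret the transcript. We propose a new tool, Minuteman, for efficient semi-automatic meeting minuting. The tool provides a live transcript and a live meeting summary that users can edit collaboratively, correcting speech recognition errors and imperfect summary points. This eases the notetakers' cognitive load and makes it easier for them to catch up after missing part of a meeting through absence or lack of focus. We test the tool in several different settings, exploring the viability of the concept and possible user strategies.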

CONFLATOR: Incorporating Switching Point based Rotatory Positional Encodings for Code-Mixed Language Modeling

  • paper_url: http://arxiv.org/abs/2309.05270
  • repo_url: None
  • paper_authors: Mohsin Ali, Kandukuri Sai Teja, Neeharika Gupta, Parth Patwa, Anubhab Chatterjee, Vinija Jain, Aman Chadha, Amitava Das
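  • for: The study proposes CONFLATOR, a neural language modeling approach for code-mixed languages, to handle code-mixed text better.
  • methods: The authors experiment with several positional encoding mechanisms, including rotatory positional encodings combined with switching-point information, to place special emphasis on switching points.
  • results: On two tasks based on code-mixed Hindi and English (Hinglish), sentiment analysis and machine translation, CONFLATOR outperforms the state of the art.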
    Abstract The mixing of two or more languages is called Code-Mixing (CM). CM is a social norm in multilingual societies. Neural Language Models (NLMs) like transformers have been very effective on many NLP tasks. However, NLM for CM is an under-explored area. Though transformers are capable and powerful, they cannot always encode positional/sequential information since they are non-recurrent. Therefore, to enrich word information and incorporate positional information, positional encoding is defined. We hypothesize that Switching Points (SPs), i.e., junctions in the text where the language switches (L1 -> L2 or L2-> L1), pose a challenge for CM Language Models (LMs), and hence give special emphasis to switching points in the modeling process. We experiment with several positional encoding mechanisms and show that rotatory positional encodings along with switching point information yield the best results. We introduce CONFLATOR: a neural language modeling approach for code-mixed languages. CONFLATOR tries to learn to emphasize switching points using smarter positional encoding, both at unigram and bigram levels. CONFLATOR outperforms the state-of-the-art on two tasks based on code-mixed Hindi and English (Hinglish): (i) sentiment analysis and (ii) machine translation.
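    The abstract defines switching points as the junctions where the language changes, and it names rotatory positional encodings as the best-performing mechanism. As a rough illustration only, the Python sketch below finds switching points from token-level language tags and applies a standard rotary rotation to a single vector. The function names, the tag format, and the `base` constant are assumptions made for this example; how CONFLATOR actually combines the two signals is not specified in the abstract.

```python
import math


def switching_points(lang_tags):
    """Indices i where token i is in a different language than token i-1."""
    return [i for i in range(1, len(lang_tags)) if lang_tags[i] != lang_tags[i - 1]]


def rotary_encode(vec, position, base=10000.0):
    """Rotate each (even, odd) pair of `vec` by an angle proportional to `position`.

    This is the generic rotary scheme; `base` is an assumed constant.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("rotary encoding needs an even dimension")
    out = []
    for k in range(0, dim, 2):
        angle = position * base ** (-k / dim)
        c, s = math.cos(angle), math.sin(angle)
        out += [vec[k] * c - vec[k + 1] * s, vec[k] * s + vec[k + 1] * c]
    return out


# Hypothetical token-level language tags for a Hinglish sentence.
tags = ["hi", "hi", "en", "en", "hi"]
print(switching_points(tags))
print(rotary_encode([1.0, 2.0, 3.0, 4.0], position=5))
```

    Because each pair of coordinates is only rotated, the encoding changes the direction of a vector but not its length.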

Exploring the Law of Numbers: Evidence from China’s Real Estate

  • paper_url: http://arxiv.org/abs/2309.05221
  • repo_url: None
  • paper_authors: Fuqian Zhang, Zhenhua Wang
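  • for: The paper studies the financial statements of Chinese real estate companies in order to characterize number laws more comprehensively.
  • methods: The paper goes beyond the first-digit analysis of Benford's Law and also studies two further dimensions of numbers: frequency and length.
  • results: The findings go beyond reservations about data manipulation in these financial statements and open a discussion of number diversity and usage patterns. The study has economic significance and deepens the understanding of numerical phenomena.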
    Abstract The renowned proverb, Numbers do not lie, underscores the reliability and insight that lie beneath numbers, a concept of undisputed importance, especially in economics and finance etc. Despite the prosperity of Benford's Law in the first digit analysis, its scope fails to remain comprehensiveness when it comes to deciphering the laws of number. This paper delves into number laws by taking the financial statements of China real estate as a representative, quantitatively study not only the first digit, but also depict the other two dimensions of numbers: frequency and length. The research outcomes transcend mere reservations about data manipulation and open the door to discussions surrounding number diversity and the delineation of the usage insights. This study wields both economic significance and the capacity to foster a deeper comprehension of numerical phenomena.
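    Summary The well-known proverb that numbers do not lie stresses the reliability and insight that lie beneath numbers, which matter especially in economics and finance. Although Benford's Law has been very successful in first-digit analysis, its scope does not cover number laws comprehensively. Taking the financial statements of Chinese real estate companies as a representative example, this paper quantitatively studies not only the first digit but also two other dimensions of numbers: frequency and length. The results go beyond reservations about data manipulation and open a discussion of number diversity and usage patterns. The study has economic significance and fosters a deeper understanding of numerical phenomena.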
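    For readers unfamiliar with the first-digit analysis mentioned above, Benford's Law predicts that the leading digit d appears with probability log10(1 + 1/d). The Python sketch below compares that prediction with observed first-digit frequencies. It is an illustration under stated assumptions, not the paper's procedure: the helper names are invented here, and the sample data are powers of two rather than financial statements.

```python
import math
from collections import Counter


def first_digit(x):
    """Leading non-zero digit of a non-zero number, read from scientific notation."""
    return int(f"{abs(x):.15e}"[0])


def benford_expected():
    """Benford's Law: P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_frequencies(values):
    """Observed share of each leading digit among the non-zero values."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


# Stand-in data (an assumption): powers of two are known to follow Benford's Law closely.
values = [2 ** k for k in range(1, 101)]
observed = first_digit_frequencies(values)
expected = benford_expected()
for d in range(1, 10):
    print(d, round(observed[d], 3), round(expected[d], 3))
```

    Large gaps between the two columns are what a first-digit test would flag. The frequency and length dimensions studied in the paper are not covered by this sketch.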

Understanding the Impact of Post-Training Quantization on Large Language Models

  • paper_url: http://arxiv.org/abs/2309.05210
  • repo_url: None
  • paper_authors: Somnath Roy
  • for: The paper focuses on the deployment and operation of large language models (LLMs) on consumer-grade GPUs, and the impact of hyperparameters on the performance of quantized models.
  • methods: The paper compares and analyzes the performance of different quantization techniques, including nf4, fp4, and fp4-dq, on various LLMs, and investigates the effects of temperature on the performance of these models.
  • results: The study finds that nf4 and fp4 are equally proficient 4-bit quantization techniques, but nf4 displays greater resilience to temperature variations in the case of the llama2 series of models at lower temperature. Additionally, the study shows that 4-bit quantized models of varying sizes exhibit higher sensitivity to temperature in the range of 0.5 to 0.8, and that int8 quantization is associated with significantly slower inference speeds.
    Abstract Large language models (LLMs) are rapidly increasing in size, with the number of parameters becoming a key factor in the success of many commercial models, such as ChatGPT, Claude, and Bard. Even the recently released publicly accessible models for commercial usage, such as Falcon and Llama2, come equipped with billions of parameters. This significant increase in the number of parameters makes deployment and operation very costly. The remarkable progress in the field of quantization for large neural networks in general and LLMs in particular, has made these models more accessible by enabling them to be deployed on consumer-grade GPUs. Quantized models generally demonstrate comparable performance levels to their unquantized base counterparts. Nonetheless, there exists a notable gap in our comprehensive understanding of how these quantized models respond to hyperparameters, such as temperature, max new tokens, and topk, particularly for next word prediction. The present analysis reveals that nf4 and fp4 are equally proficient 4-bit quantization techniques, characterized by similar attributes such as inference speed, memory consumption, and the quality of generated content. The study identifies nf4 as displaying greater resilience to temperature variations in the case of the llama2 series of models at lower temperature, while fp4 and fp4-dq prove to be a more suitable choice for falcon series of models. It is noteworthy that, in general, 4-bit quantized models of varying sizes exhibit higher sensitivity to temperature in the range of 0.5 to 0.8, unlike their unquantized counterparts. Additionally, int8 quantization is associated with significantly slower inference speeds, whereas unquantized bfloat16 models consistently yield the fastest inference speeds across models of all sizes.
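    Summary Large language models (LLMs) are growing rapidly in size, and the number of parameters has become a key factor in the success of many commercial models, such as ChatGPT, Claude, and Bard. Even recently released publicly accessible models for commercial usage, such as Falcon and Llama2, have billions of parameters. This growth in parameter count makes deployment and operation very costly. Progress in quantization for large neural networks and LLMs has made it possible to deploy these models on consumer-grade GPUs. Quantized models generally perform comparably to their unquantized counterparts. However, there is still a gap in our understanding of how quantized models respond to hyperparameters such as temperature, max new tokens, and topk in next word prediction. The present analysis finds that nf4 and fp4 are equally proficient 4-bit quantization techniques with similar inference speed, memory consumption, and quality of generated content. The study finds that nf4 is more resilient to temperature variations for the llama2 series of models at lower temperature, while fp4 and fp4-dq are better suited to the falcon series of models. In general, 4-bit quantized models of varying sizes are more sensitive to temperature in the range of 0.5 to 0.8 than their unquantized counterparts. In addition, int8 quantization is associated with significantly slower inference, whereas unquantized bfloat16 models consistently give the fastest inference.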
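    The hyperparameters named above, temperature and topk, act on the next-token distribution rather than on the weights. The Python sketch below shows the usual way they are applied: logits are divided by the temperature and, optionally, everything outside the k largest logits is masked before the softmax. This is a generic illustration with an invented function name, not code from the paper, and it does not model quantization itself.

```python
import math


def next_token_probs(logits, temperature=1.0, top_k=None):
    """Softmax over logits / temperature, optionally restricted to the top_k largest logits.

    Ties at the cutoff are all kept, so more than top_k tokens can survive.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if top_k is not None and top_k < 1:
        raise ValueError("top_k must be at least 1")
    scaled = [x / temperature for x in logits]
    if top_k is not None:
        cutoff = sorted(scaled, reverse=True)[min(top_k, len(scaled)) - 1]
        scaled = [s if s >= cutoff else -math.inf for s in scaled]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical next-token logits
for t in (0.5, 0.8, 1.0, 2.0):
    print(t, [round(p, 3) for p in next_token_probs(logits, temperature=t)])
print(next_token_probs(logits, top_k=1))
```

    Lower temperatures concentrate probability on the highest-scoring token, so small changes in the logits, such as those introduced by quantization error, can plausibly shift outputs more in some temperature ranges than in others.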

From Artificially Real to Real: Leveraging Pseudo Data from Large Language Models for Low-Resource Molecule Discovery

  • paper_url: http://arxiv.org/abs/2309.05203
  • repo_url: None
  • paper_authors: Yuhan Chen, Nuwa Xi, Yanrui Du, Haochun Wang, Chen Jianyu, Sendong Zhao, Bing Qin
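  • for: Improving cross-modal molecule discovery in low-resource settings.
  • methods: Using pseudo data generated by large language models for domain adaptation.
  • results: Using pseudo data outperforms existing methods while requiring a smaller model scale, less data, and lower training cost, which highlights its efficiency.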
    Abstract Molecule discovery serves as a cornerstone in numerous scientific domains, fueling the development of new materials and innovative drug designs. Recent developments of in-silico molecule discovery have highlighted the promising results of cross-modal techniques, which bridge molecular structures with their descriptive annotations. However, these cross-modal methods frequently encounter the issue of data scarcity, hampering their performance and application. In this paper, we address the low-resource challenge by utilizing artificially-real data generated by Large Language Models (LLMs). We first introduce a retrieval-based prompting strategy to construct high-quality pseudo data, then explore the optimal method to effectively leverage this pseudo data. Experiments show that using pseudo data for domain adaptation outperforms all existing methods, while also requiring a smaller model scale, reduced data size and lower training cost, highlighting its efficiency. Furthermore, our method shows a sustained improvement as the volume of pseudo data increases, revealing the great potential of pseudo data in advancing low-resource cross-modal molecule discovery.

  • paper_url: http://arxiv.org/abs/2309.05201
  • repo_url: None
  • paper_authors: Minhao Zhang, Yongliang Ma, Yanzeng Li, Ruoyu Zhang, Lei Zou, Ming Zhou
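  • for: The study addresses a limitation of question answering (QA) over multiple knowledge bases (KBs): merging the KBs into a single graph cannot represent the different types of links between them.
  • methods: The study formulates a new Multi-KB-QA task that leverages full and partial links among multiple KBs to derive correct answers, and constructs a benchmark with diversified link and query types to evaluate Multi-KB-QA performance efficiently. The proposed method encodes all link relations in the KB embedding to score and rank candidate answers.
  • results: Experiments show that the proposed method markedly surpasses conventional KB-QA systems on the Multi-KB-QA task, which supports the need to handle the different link types between KBs.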
    Abstract Incorporating multiple knowledge sources is proven to be beneficial for answering complex factoid questions. To utilize multiple knowledge bases (KB), previous works merge all KBs into a single graph via entity alignment and reduce the problem to question-answering (QA) over the fused KB. In reality, various link relations between KBs might be adopted in QA over multi-KBs. In addition to the identity between the alignable entities (i.e. full link), unalignable entities expressing the different aspects or types of an abstract concept may also be treated identical in a question (i.e. partial link). Hence, the KB fusion in prior works fails to represent all types of links, restricting their ability to comprehend multi-KBs for QA. In this work, we formulate the novel Multi-KB-QA task that leverages the full and partial links among multiple KBs to derive correct answers, a benchmark with diversified link and query types is also constructed to efficiently evaluate Multi-KB-QA performance. Finally, we propose a method for Multi-KB-QA that encodes all link relations in the KB embedding to score and rank candidate answers. Experiments show that our method markedly surpasses conventional KB-QA systems in Multi-KB-QA, justifying the necessity of devising this task.

Does Writing with Language Models Reduce Content Diversity?

  • paper_url: http://arxiv.org/abs/2309.05196
  • repo_url: https://github.com/vishakhpk/hai-diversity
  • paper_authors: Vishakh Padmakumar, He He
  • for: measure the impact of co-writing on diversity in produced content
  • methods: controlled experiment with three setups (base LLM, feedback-tuned LLM, and no model help) and diversity metrics
  • results: writing with InstructGPT (but not GPT3) results in a statistically significant reduction in diversity, with increased similarity between writings of different authors and reduced lexical and content diversity, primarily due to InstructGPT contributing less diverse text to co-written essays.
    Abstract Large language models (LLMs) have led to a surge in collaborative writing with model assistance. As different users incorporate suggestions from the same model, there is a risk of decreased diversity in the produced content, potentially limiting diverse perspectives in public discourse. In this work, we measure the impact of co-writing on diversity via a controlled experiment, where users write argumentative essays in three setups -- using a base LLM (GPT3), a feedback-tuned LLM (InstructGPT), and writing without model help. We develop a set of diversity metrics and find that writing with InstructGPT (but not the GPT3) results in a statistically significant reduction in diversity. Specifically, it increases the similarity between the writings of different authors and reduces the overall lexical and content diversity. We additionally find that this effect is mainly attributable to InstructGPT contributing less diverse text to co-written essays. In contrast, the user-contributed text remains unaffected by model collaboration. This suggests that the recent improvement in generation quality from adapting models to human feedback might come at the cost of more homogeneous and less diverse content.

cs.LG - 2023-09-11

Reaction coordinate flows for model reduction of molecular kinetics

  • paper_url: http://arxiv.org/abs/2309.05878
  • repo_url: None
  • paper_authors: Hao Wu, Frank Noé
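  • for: The study introduces reaction coordinate (RC) flow, a flow-based machine learning approach for discovering low-dimensional kinetic models of molecular systems.
  • methods: The method uses a normalizing flow to design the coordinate transformation and a Brownian dynamics model to approximate the kinetics of the RC. All model parameters can be estimated in a data-driven manner.
  • results: Numerical experiments show that the method effectively discovers interpretable and accurate low-dimensional representations of the full-state kinetics from simulation data.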
    Abstract In this work, we introduce a flow based machine learning approach, called reaction coordinate (RC) flow, for discovery of low-dimensional kinetic models of molecular systems. The RC flow utilizes a normalizing flow to design the coordinate transformation and a Brownian dynamics model to approximate the kinetics of RC, where all model parameters can be estimated in a data-driven manner. In contrast to existing model reduction methods for molecular kinetics, RC flow offers a trainable and tractable model of reduced kinetics in continuous time and space due to the invertibility of the normalizing flow. Furthermore, the Brownian dynamics-based reduced kinetic model investigated in this work yields a readily discernible representation of metastable states within the phase space of the molecular system. Numerical experiments demonstrate how effectively the proposed method discovers interpretable and accurate low-dimensional representations of given full-state kinetics from simulations.
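    Summary In this work, we introduce a flow-based machine learning approach, reaction coordinate (RC) flow, for discovering low-dimensional kinetic models of molecular systems. The RC flow uses a normalizing flow to design the coordinate transformation and a Brownian dynamics model to approximate the kinetics of the RC, with all model parameters estimated in a data-driven manner. Unlike existing model reduction methods for molecular kinetics, RC flow offers a trainable and tractable model of reduced kinetics, owing to the invertibility of the normalizing flow. The Brownian dynamics-based reduced kinetic model also yields a readily discernible representation of the metastable states of the molecular system. Numerical experiments show that the method extracts interpretable and accurate low-dimensional representations from simulation data.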

Force-directed graph embedding with hops distance

  • paper_url: http://arxiv.org/abs/2309.05865
  • repo_url: None
  • paper_authors: Hamidreza Lotfalizadeh, Mohammad Al Hasan
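  • for: The study proposes a force-directed graph embedding method for embedding the nodes of a graph in a low-dimensional space.
  • methods: The method uses the steady acceleration kinetic formula to embed nodes in a low-dimensional space while preserving graph topology and structural features. It simulates customized attractive and repulsive forces between all node pairs with respect to their hop distance.
  • results: Evaluated on several graph analysis tasks, the method achieves performance competitive with state-of-the-art unsupervised embedding techniques.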
    Abstract Graph embedding has become an increasingly important technique for analyzing graph-structured data. By representing nodes in a graph as vectors in a low-dimensional space, graph embedding enables efficient graph processing and analysis tasks like node classification, link prediction, and visualization. In this paper, we propose a novel force-directed graph embedding method that utilizes the steady acceleration kinetic formula to embed nodes in a way that preserves graph topology and structural features. Our method simulates a set of customized attractive and repulsive forces between all node pairs with respect to their hop distance. These forces are then used in Newton's second law to obtain the acceleration of each node. The method is intuitive, parallelizable, and highly scalable. We evaluate our method on several graph analysis tasks and show that it achieves competitive performance compared to state-of-the-art unsupervised embedding techniques.
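    Summary Graph embedding has become an increasingly important technique for analyzing graph-structured data. By representing the nodes of a graph as vectors in a low-dimensional space, graph embedding enables efficient graph processing and analysis tasks such as node classification, link prediction, and visualization. In this paper, we propose a new force-directed graph embedding method that uses the steady acceleration kinetic formula to embed nodes while preserving the structural features of the graph. We simulate customized attractive and repulsive forces between all node pairs and use Newton's second law to obtain the acceleration of each node. The method is intuitive, parallelizable, and highly scalable. We evaluate it on several graph analysis tasks and show performance competitive with state-of-the-art unsupervised embedding techniques.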
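    The pipeline described above (hop distances, pairwise forces, Newton's second law, a constant-acceleration position update) can be sketched in a few lines of Python. The abstract does not give the force definitions, so the ones below are assumptions chosen for illustration: attraction grows with Euclidean distance d and weakens with hop distance h, repulsion does the opposite, and a pair is at rest when d equals h. Each step starts from zero velocity, so the displacement is 0.5 * a * dt^2. All names and default values are hypothetical.

```python
import random
from collections import deque


def hop_distances(adj, src):
    """Breadth-first search: number of hops from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def embed(adj, dim=2, steps=300, dt=0.5, mass=1.0, seed=0):
    """Force-directed embedding with assumed forces d/h (attractive) and h/d (repulsive)."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    pos = {u: [rng.uniform(-1, 1) for _ in range(dim)] for u in nodes}
    hops = {u: hop_distances(adj, u) for u in nodes}
    for _ in range(steps):
        force = {u: [0.0] * dim for u in nodes}
        for u in nodes:
            for v in nodes:
                if u == v or v not in hops[u]:
                    continue  # skip self-pairs and disconnected pairs
                delta = [b - a for a, b in zip(pos[u], pos[v])]
                d = max(1e-9, sum(x * x for x in delta) ** 0.5)
                h = hops[u][v]
                magnitude = d / h - h / d  # > 0 pulls u toward v, < 0 pushes it away
                for k in range(dim):
                    force[u][k] += magnitude * delta[k] / d
        for u in nodes:
            for k in range(dim):
                accel = force[u][k] / mass             # Newton's second law: a = F / m
                pos[u][k] += 0.5 * accel * dt * dt     # x = x0 + v0*t + a*t^2/2 with v0 = 0
    return pos


# A small path graph 0 - 1 - 2 - 3 as a toy example.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
for node, coords in embed(path).items():
    print(node, [round(c, 3) for c in coords])
```

    The force on each node depends only on the current positions, so the inner loops can be computed independently per node, which is consistent with the parallelizability the abstract claims.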

Energy Preservation and Stability of Random Filterbanks

  • paper_url: http://arxiv.org/abs/2309.05855
  • repo_url: https://github.com/danedane-haider/random-filterbanks
  • paper_authors: Daniel Haider, Vincent Lostanlen, Martin Ehler, Peter Balazs
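  • for: The paper examines why waveform-based deep learning is so hard.
  • methods: The paper analyzes the statistical properties of simple convolutional neural networks (convnets) for filterbank design from the mathematical perspective of random convolutional operators.
  • results: FIR filterbanks with random Gaussian weights are ill-conditioned for large filters and locally periodic input signals. Expected energy preservation of a random filterbank is also not sufficient for numerical stability, and the authors derive theoretical bounds for its expected frame bounds.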
    Abstract What makes waveform-based deep learning so hard? Despite numerous attempts at training convolutional neural networks (convnets) for filterbank design, they often fail to outperform hand-crafted baselines. This is all the more surprising because these baselines are linear time-invariant systems: as such, their transfer functions could be accurately represented by a convnet with a large receptive field. In this article, we elaborate on the statistical properties of simple convnets from the mathematical perspective of random convolutional operators. We find that FIR filterbanks with random Gaussian weights are ill-conditioned for large filters and locally periodic input signals, which both are typical in audio signal processing applications. Furthermore, we observe that expected energy preservation of a random filterbank is not sufficient for numerical stability and derive theoretical bounds for its expected frame bounds.
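    Summary What makes waveform-based deep learning so hard? Despite many attempts to train convolutional neural networks (convnets) for filterbank design, they frequently fail to outperform hand-crafted baselines. This is all the more surprising because these baselines are linear time-invariant systems, so their transfer functions could be represented accurately by a convnet with a large receptive field. In this article, we describe the statistical properties of simple convnets from a mathematical perspective. We find that FIR filterbanks with random Gaussian weights are ill-conditioned for large filters and locally periodic input signals. We also find that expected energy preservation is not sufficient for numerical stability, and we derive theoretical bounds for the expected frame bounds.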
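    The gap between expected energy preservation and numerical stability can be seen in a small experiment. The Python sketch below assumes a stride-1 circular-convolution filterbank, for which the frame bounds are the minimum and maximum over frequency of the summed squared magnitude responses. Weights are drawn with variance 1/(J*T) for J filters of length T, so the summed response is 1 in expectation at every frequency. This setting, the function names, and the parameter values are assumptions for illustration and may differ from the paper's exact setup.

```python
import cmath
import math
import random


def dft_power(w, n):
    """Squared magnitude of the length-n DFT of filter w (zero-padded to n)."""
    return [
        abs(sum(w[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(len(w)))) ** 2
        for k in range(n)
    ]


def random_filterbank(num_filters, length, seed=0):
    """Gaussian FIR filters scaled so the expected total filter energy equals 1."""
    rng = random.Random(seed)
    sigma = (num_filters * length) ** -0.5
    return [[rng.gauss(0.0, sigma) for _ in range(length)] for _ in range(num_filters)]


def frame_bounds(filters, n):
    """Lower and upper frame bounds plus the summed response at each frequency bin."""
    total = [0.0] * n
    for w in filters:
        for k, p in enumerate(dft_power(w, n)):
            total[k] += p
    return min(total), max(total), total


# Hypothetical sizes: 8 filters of length 16, signals of length 64.
bank = random_filterbank(num_filters=8, length=16)
lower, upper, response = frame_bounds(bank, n=64)
print("frame bounds:", lower, upper)
print("condition number:", upper / lower)
print("mean response:", sum(response) / len(response))
```

    The mean response stays close to 1, which is the energy-preservation property, while the ratio of the upper to the lower bound shows how unevenly a single random draw treats different frequencies.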

ChemSpaceAL: An Efficient Active Learning Methodology Applied to Protein-Specific Molecular Generation

  • paper_url: http://arxiv.org/abs/2309.05853
  • repo_url: None
  • paper_authors: Gregory W. Kyro, Anton Morgunov, Rafael I. Brent, Victor S. Batista
  • for: The paper develops a novel and efficient semi-supervised active learning methodology for fine-tuning generative artificial intelligence models, specifically in the context of targeted molecular generation.
  • methods: The paper uses a GPT-based molecular generator and a constructed representation of the sample space to operate strategically within a chemical space proxy, maximizing attractive interactions between the generated molecules and a protein target. The approach does not require individual evaluation of all data points used for fine-tuning, which allows computationally expensive metrics to be incorporated.
  • results: The paper demonstrates fine-tuning of a GPT-based molecular generator with respect to an attractive interaction-based scoring function, maximizing attractive interactions between the generated molecules and a protein target.
    Abstract The incredible capabilities of generative artificial intelligence models have inevitably led to their application in the domain of drug discovery. It is therefore of tremendous interest to develop methodologies that enhance the abilities and applicability of these powerful tools. In this work, we present a novel and efficient semi-supervised active learning methodology that allows for the fine-tuning of a generative model with respect to an objective function by strategically operating within a constructed representation of the sample space. In the context of targeted molecular generation, we demonstrate the ability to fine-tune a GPT-based molecular generator with respect to an attractive interaction-based scoring function by strategically operating within a chemical space proxy, thereby maximizing attractive interactions between the generated molecules and a protein target. Importantly, our approach does not require the individual evaluation of all data points that are used for fine-tuning, enabling the incorporation of computationally expensive metrics. We are hopeful that the inherent generality of this methodology ensures that it will remain applicable as this exciting field evolves. To facilitate implementation and reproducibility, we have made all of our software available through the open-source ChemSpaceAL Python package.

  • paper_url: http://arxiv.org/abs/2309.05843
  • repo_url: None
  • paper_authors: Louis Blankemeier, Sebastien Baur, Wei-Hung Weng, Jake Garrison, Yossi Matias, Shruthi Prabhakara, Diego Ardila, Zaid Nabulsi
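  • for: Proposes a self-supervised learning framework for contrastive learning of health-related acoustic signals such as cough and breathing sounds.
  • methods: Uses SimCLR with a Slowfast NFNet backbone and conducts an in-depth analysis of audio augmentation strategies to optimize the encoder for health acoustic tasks.
  • results: An appropriate augmentation strategy improves the Slowfast NFNet audio encoder across a diverse set of health acoustic tasks, and combined augmentations can produce synergistic effects that exceed the benefit of each augmentation applied individually.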
    Abstract Health-related acoustic signals, such as cough and breathing sounds, are relevant for medical diagnosis and continuous health monitoring. Most existing machine learning approaches for health acoustics are trained and evaluated on specific tasks, limiting their generalizability across various healthcare applications. In this paper, we leverage a self-supervised learning framework, SimCLR with a Slowfast NFNet backbone, for contrastive learning of health acoustics. A crucial aspect of optimizing Slowfast NFNet for this application lies in identifying effective audio augmentations. We conduct an in-depth analysis of various audio augmentation strategies and demonstrate that an appropriate augmentation strategy enhances the performance of the Slowfast NFNet audio encoder across a diverse set of health acoustic tasks. Our findings reveal that when augmentations are combined, they can produce synergistic effects that exceed the benefits seen when each is applied individually.
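As a rough illustration of the contrastive objective that SimCLR-style training optimizes, the sketch below computes the NT-Xent loss for a batch of paired embeddings (two augmented views of the same clips). It assumes plain NumPy arrays in place of the Slowfast NFNet encoder outputs; the function name and the temperature value are illustrative choices, not taken from the paper.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.1):
    """NT-Xent loss for two views of a batch; z1 and z2 have shape (N, D)."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity via unit vectors
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                     # a sample is never its own negative
    # Row i's positive is the other view of the same clip.
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), pos_idx].mean())
```

Stronger or better-combined augmentations change how hard it is to match the two views, which is the lever the augmentation analysis in the abstract is about.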

The Safety Filter: A Unified View of Safety-Critical Control in Autonomous Systems

  • paper_url: http://arxiv.org/abs/2309.05837
  • repo_url: None
  • paper_authors: Kai-Chieh Hsu, Haimin Hu, Jaime Fernández Fisac
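  • for: Ensuring the safe operation of autonomous robots as they move into new deployment domains.
  • methods: Reviews existing safety filter approaches, highlights connections between them, and proposes a unified technical framework to understand, compare, and combine them.
  • results: The unified view exposes a shared modular structure across seemingly disparate safety filter classes and suggests directions toward more scalable synthesis, robust monitoring, and efficient intervention.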
    Abstract Recent years have seen significant progress in the realm of robot autonomy, accompanied by the expanding reach of robotic technologies. However, the emergence of new deployment domains brings unprecedented challenges in ensuring safe operation of these systems, which remains as crucial as ever. While traditional model-based safe control methods struggle with generalizability and scalability, emerging data-driven approaches tend to lack well-understood guarantees, which can result in unpredictable catastrophic failures. Successful deployment of the next generation of autonomous robots will require integrating the strengths of both paradigms. This article provides a review of safety filter approaches, highlighting important connections between existing techniques and proposing a unified technical framework to understand, compare, and combine them. The new unified view exposes a shared modular structure across a range of seemingly disparate safety filter classes and naturally suggests directions for future progress towards more scalable synthesis, robust monitoring, and efficient intervention.
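To make the idea of a safety filter concrete, here is a minimal sketch of a least-restrictive filter for a hypothetical 1D vehicle approaching a wall. The dynamics, the braking-distance safe set, and all names are assumptions chosen for illustration and do not come from the article. The filter passes the nominal command through whenever the resulting state can still stop in time, and otherwise overrides it with full braking. Safety is checked at sample times only.

```python
def step(state, accel, dt):
    """Exact constant-acceleration update for state = (position, velocity)."""
    p, v = state
    return (p + v * dt + 0.5 * accel * dt * dt, v + accel * dt)

def is_safe(state, p_wall, a_max):
    """Safe if full braking from this state stops at or before the wall."""
    p, v = state
    braking_distance = max(v, 0.0) ** 2 / (2.0 * a_max)
    return p + braking_distance <= p_wall

def safety_filter(state, nominal_accel, p_wall, a_max, dt):
    """Return the nominal command if it keeps the next state safe, else brake."""
    candidate = max(-a_max, min(a_max, nominal_accel))   # respect actuator limits
    if is_safe(step(state, candidate, dt), p_wall, a_max):
        return candidate
    return -a_max                                        # fallback: full braking
```

The three pieces (a safe-set monitor, a fallback policy, and an intervention rule) mirror the kind of modular structure the abstract says is shared across safety filter classes.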

Ensemble-based modeling abstractions for modern self-optimizing systems

  • paper_url: http://arxiv.org/abs/2309.05823
  • repo_url: https://github.com/smartarch/ml-deeco-security-isola
  • paper_authors: Michal Töpfer, Milad Abdullah, Tomáš Bureš, Petr Hnětynka, Martin Kruliš
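  • for: Extends the ensemble-based component model DEECo so that machine-learning and optimization heuristics can be used when establishing and reconfiguring autonomic component ensembles.
  • methods: Captures these concepts at the model level.
  • results: Gives an example of using such a model for an access-control problem in an Industry 4.0 setting, and argues that incorporating machine-learning and optimization heuristics is a key feature of modern smart systems that learn over time and optimize their behavior at runtime to deal with uncertainty in their environment.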
    Abstract In this paper, we extend our ensemble-based component model DEECo with the capability to use machine-learning and optimization heuristics in establishing and reconfiguration of autonomic component ensembles. We show how to capture these concepts on the model level and give an example of how such a model can be beneficially used for modeling access-control related problem in the Industry 4.0 settings. We argue that incorporating machine-learning and optimization heuristics is a key feature for modern smart systems which are to learn over the time and optimize their behavior at runtime to deal with uncertainty in their environment.

Interpretable learning of effective dynamics for multiscale systems

  • paper_url: http://arxiv.org/abs/2309.05812
  • repo_url: None
  • paper_authors: Emmanuel Menier, Sebastian Kaltenbach, Mouadh Yagoubi, Marc Schoenauer, Petros Koumoutsakos
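  • for: Proposes an interpretable framework for learning effective dynamics, aimed at modeling and simulating high-dimensional multiscale systems.
  • methods: The Interpretable Learning Effective Dynamics (iLED) architecture is motivated by Mori-Zwanzig and Koopman operator theory and is positioned as an interpretable alternative to deep recurrent neural network approaches.
  • results: In simulations of three benchmark multiscale systems, iLED produces accurate predictions and interpretable dynamics, with accuracy comparable to state-of-the-art recurrent-network approaches.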
    Abstract The modeling and simulation of high-dimensional multiscale systems is a critical challenge across all areas of science and engineering. It is broadly believed that even with today's computer advances resolving all spatiotemporal scales described by the governing equations remains a remote target. This realization has prompted intense efforts to develop model order reduction techniques. In recent years, techniques based on deep recurrent neural networks have produced promising results for the modeling and simulation of complex spatiotemporal systems and offer large flexibility in model development as they can incorporate experimental and computational data. However, neural networks lack interpretability, which limits their utility and generalizability across complex systems. Here we propose a novel framework of Interpretable Learning Effective Dynamics (iLED) that offers comparable accuracy to state-of-the-art recurrent neural network-based approaches while providing the added benefit of interpretability. The iLED framework is motivated by Mori-Zwanzig and Koopman operator theory, which justifies the choice of the specific architecture. We demonstrate the effectiveness of the proposed framework in simulations of three benchmark multiscale systems. Our results show that the iLED framework can generate accurate predictions and obtain interpretable dynamics, making it a promising approach for solving high-dimensional multiscale systems.
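Koopman operator theory motivates describing nonlinear dynamics through a linear operator acting on observables, which is what makes the learned dynamics inspectable (for example through eigenvalues). The sketch below is not the iLED architecture. It is a plain least-squares fit of a linear one-step operator from snapshot data, in the spirit of dynamic mode decomposition, with hypothetical function names.

```python
import numpy as np

def fit_linear_operator(snapshots):
    """Least-squares fit of K such that x[t+1] ~ K @ x[t]; snapshots has shape (T, d)."""
    X, Y = snapshots[:-1], snapshots[1:]
    K_T, *_ = np.linalg.lstsq(X, Y, rcond=None)   # solves X @ K.T ~ Y
    return K_T.T

def rollout(K, x0, steps):
    """Iterate the linear model from x0; returns an array of shape (steps + 1, d)."""
    traj = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        traj.append(K @ traj[-1])
    return np.stack(traj)
```

The eigenvalues of the fitted operator give decay rates and oscillation frequencies directly, which is the sense in which a linear latent model is easier to interpret than a generic recurrent network.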

Predicting the Radiation Field of Molecular Clouds using Denoising Diffusion Probabilistic Models

  • paper_url: http://arxiv.org/abs/2309.05811
  • repo_url: None
  • paper_authors: Duo Xu, Stella Offner, Robert Gutermuth, Michael Grudic, David Guszejnov, Philip Hopkins
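  • for: Quantifying the impact of radiation feedback in star formation.
  • methods: Denoising diffusion probabilistic models (DDPMs) are trained to predict the interstellar radiation field (ISRF) strength from three-band dust emission (4.5 μm, 24 μm, and 250 μm), using synthetic emission maps generated from STARFORGE magnetohydrodynamic simulations and matched to observed spectral energy distributions in the Monoceros R2 (MonR2) giant molecular cloud.
  • results: On the test set, the dispersion between predictions and true values is within a factor of 0.1. On out-of-distribution simulations there is a consistent offset, but the relative intensity is constrained to within a factor of 2. Applied to MonR2, the model recovers a correspondence between intense ISRF, bright sources, and high dust emission.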
    Abstract Accurately quantifying the impact of radiation feedback in star formation is challenging. To address this complex problem, we employ deep learning techniques, denoising diffusion probabilistic models (DDPMs), to predict the interstellar radiation field (ISRF) strength based on three-band dust emission at 4.5 μm, 24 μm, and 250 μm. We adopt magnetohydrodynamic simulations from the STARFORGE (STAR FORmation in Gaseous Environments) project that model star formation and giant molecular cloud (GMC) evolution. We generate synthetic dust emission maps matching observed spectral energy distributions in the Monoceros R2 (MonR2) GMC. We train DDPMs to estimate the ISRF using synthetic three-band dust emission. The dispersion between the predictions and true values is within a factor of 0.1 for the test set. We extended our assessment of the diffusion model to include new simulations with varying physical parameters. While there is a consistent offset observed in these out-of-distribution simulations, the model effectively constrains the relative intensity to within a factor of 2. Meanwhile, our analysis reveals weak correlation between the ISRF solely derived from dust temperature and the actual ISRF. We apply our trained model to predict the ISRF in MonR2, revealing a correspondence between intense ISRF, bright sources, and high dust emission, confirming the model's ability to capture ISRF variations. Our model robustly predicts radiation feedback distribution, even in complex, poorly constrained ISRF environments like those influenced by nearby star clusters. However, precise ISRF predictions require an accurate training dataset mirroring the target molecular cloud's unique physical conditions.
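For readers unfamiliar with DDPMs, the sketch below shows only the forward (noising) half of the method: the closed-form sample of a noised map at step t. It uses the linear variance schedule from the original DDPM formulation. The 64x64 array stands in for an ISRF map, and the conditioning on three-band dust emission used in the paper is not modelled here.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule and the cumulative products alpha_bar_t."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in one shot."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return xt, noise
```

A denoising network is then trained to predict `noise` from `xt` and `t`; at inference time the learned reverse process turns pure noise into a predicted map.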

Online ML Self-adaptation in Face of Traps

  • paper_url: http://arxiv.org/abs/2309.05805
  • repo_url: None
  • paper_authors: Michal Töpfer, František Plášil, Tomáš Bureš, Petr Hnětynka, Martin Kruliš, Danny Weyns
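  • for: Reports difficulties ("traps") encountered when applying online machine learning for self-adaptation in a smart farming scenario.
  • methods: Examines the specification and online training of ML-based estimators, their impact on self-adaptation, and the approach used to evaluate the estimators.
  • results: An overview of the traps and a list of lessons learned, intended as guidance for researchers and practitioners applying online ML for self-adaptation.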
    Abstract Online machine learning (ML) is often used in self-adaptive systems to strengthen the adaptation mechanism and improve the system utility. Despite such benefits, applying online ML for self-adaptation can be challenging, and not many papers report its limitations. Recently, we experimented with applying online ML for self-adaptation of a smart farming scenario and we had faced several unexpected difficulties -- traps -- that, to our knowledge, are not discussed enough in the community. In this paper, we report our experience with these traps. Specifically, we discuss several traps that relate to the specification and online training of the ML-based estimators, their impact on self-adaptation, and the approach used to evaluate the estimators. Our overview of these traps provides a list of lessons learned, which can serve as guidance for other researchers and practitioners when applying online ML for self-adaptation.

Revisiting Energy Based Models as Policies: Ranking Noise Contrastive Estimation and Interpolating Energy Models

  • paper_url: http://arxiv.org/abs/2309.05803
  • repo_url: None
  • paper_authors: Sumeet Singh, Stephen Tu, Vikas Sindhwani
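  • for: Revisits energy-based models (EBMs) as a policy class for robot learning.
  • methods: A practical training objective and algorithm that combine (i) ranking noise contrastive estimation (R-NCE), (ii) learnable negative samplers, and (iii) non-adversarial joint training.
  • results: The objective is proven asymptotically consistent, while the Implicit Behavior Cloning (IBC) objective is shown to be biased even at the population level. The resulting energy-based policies compete with, and even outperform, diffusion models and other state-of-the-art approaches on multi-modal benchmarks: obstacle avoidance path planning and contact-rich block pushing.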
    Abstract A crucial design decision for any robot learning pipeline is the choice of policy representation: what type of model should be used to generate the next set of robot actions? Owing to the inherent multi-modal nature of many robotic tasks, combined with the recent successes in generative modeling, researchers have turned to state-of-the-art probabilistic models such as diffusion models for policy representation. In this work, we revisit the choice of energy-based models (EBM) as a policy class. We show that the prevailing folklore -- that energy models in high dimensional continuous spaces are impractical to train -- is false. We develop a practical training objective and algorithm for energy models which combines several key ingredients: (i) ranking noise contrastive estimation (R-NCE), (ii) learnable negative samplers, and (iii) non-adversarial joint training. We prove that our proposed objective function is asymptotically consistent and quantify its limiting variance. On the other hand, we show that the Implicit Behavior Cloning (IBC) objective is actually biased even at the population level, providing a mathematical explanation for the poor performance of IBC trained energy policies in several independent follow-up works. We further extend our algorithm to learn a continuous stochastic process that bridges noise and data, modeling this process with a family of EBMs indexed by scale variable. In doing so, we demonstrate that the core idea behind recent progress in generative modeling is actually compatible with EBMs. Altogether, our proposed training algorithms enable us to train energy-based models as policies which compete with -- and even outperform -- diffusion models and other state-of-the-art approaches in several challenging multi-modal benchmarks: obstacle avoidance path planning and contact-rich block pushing.
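The sketch below shows the general shape of a ranking noise contrastive objective: the data sample competes against K negatives drawn from a proposal distribution, and each logit is the negative energy corrected by the proposal's log-density. This is the generic ranking-NCE form as an assumption; the paper's exact objective, sampler, and joint training scheme may differ, and the function name is hypothetical.

```python
import numpy as np

def ranking_nce_loss(energies, log_q):
    """Ranking-NCE loss.

    energies, log_q: arrays of shape (B, K + 1). Column 0 holds the data
    (positive) sample; columns 1..K hold negatives drawn from the proposal q.
    """
    logits = -energies - log_q                        # low energy means high score
    m = logits.max(axis=1, keepdims=True)             # stabilise the log-sum-exp
    log_norm = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    return float(np.mean(log_norm - logits[:, 0]))    # cross-entropy with label 0
```

Dropping the `log_q` correction changes what the objective estimates, which gives some intuition for why seemingly similar contrastive losses can differ in bias.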

Enhancing Hyperedge Prediction with Context-Aware Self-Supervised Learning

  • paper_url: http://arxiv.org/abs/2309.05798
  • repo_url: https://github.com/yy-ko/cash
  • paper_authors: Yunyong Ko, Hanghang Tong, Sang-Wook Kim
  • for: Hyperedge prediction, i.e., predicting future or unobserved hyperedges (group-wise relations) in a hypergraph.
  • methods: A hyperedge prediction framework (CASH) that uses context-aware node aggregation and self-supervised contrastive learning, with a hyperedge-aware augmentation method and dual (node-level and group-level) contrasts.
  • results: On six real-world hypergraphs, CASH consistently outperforms all competing methods in hyperedge prediction accuracy, and each proposed strategy improves the model's accuracy.
    Abstract Hypergraphs can naturally model group-wise relations (e.g., a group of users who co-purchase an item) as hyperedges. Hyperedge prediction is to predict future or unobserved hyperedges, which is a fundamental task in many real-world applications (e.g., group recommendation). Despite the recent breakthrough of hyperedge prediction methods, the following challenges have been rarely studied: (C1) How to aggregate the nodes in each hyperedge candidate for accurate hyperedge prediction? and (C2) How to mitigate the inherent data sparsity problem in hyperedge prediction? To tackle both challenges together, in this paper, we propose a novel hyperedge prediction framework (CASH) that employs (1) context-aware node aggregation to precisely capture complex relations among nodes in each hyperedge for (C1) and (2) self-supervised contrastive learning in the context of hyperedge prediction to enhance hypergraph representations for (C2). Furthermore, as for (C2), we propose a hyperedge-aware augmentation method to fully exploit the latent semantics behind the original hypergraph and consider both node-level and group-level contrasts (i.e., dual contrasts) for better node and hyperedge representations. Extensive experiments on six real-world hypergraphs reveal that CASH consistently outperforms all competing methods in terms of the accuracy in hyperedge prediction and each of the proposed strategies is effective in improving the model accuracy of CASH. For the detailed information of CASH, we provide the code and datasets at: https://github.com/yy-ko/cash.

On the Fine-Grained Hardness of Inverting Generative Models

  • paper_url: http://arxiv.org/abs/2309.05795
  • repo_url: None
  • paper_authors: Feyza Duman Keles, Chinmay Hegde
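  • for: A fine-grained view of the computational hardness of generative model inversion: finding a size-$n$ latent vector whose model output closely matches a given target.
  • methods: Reductions from $k$-SAT, the closest vectors problem (CVP), Half-Clique, and Vertex-Cover, under the strong exponential time hypothesis (SETH) and the exponential time hypothesis (ETH).
  • results: New lower bounds: $\Omega(2^n)$ for exact inversion under SETH; $\Omega(2^n)$ for approximate inversion in the $\ell_p$-norm when $p$ is a positive odd integer, under SETH; and $2^{\Omega(n)}$ when $p$ is even, under ETH.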
    Abstract The objective of generative model inversion is to identify a size-$n$ latent vector that produces a generative model output that closely matches a given target. This operation is a core computational primitive in numerous modern applications involving computer vision and NLP. However, the problem is known to be computationally challenging and NP-hard in the worst case. This paper aims to provide a fine-grained view of the landscape of computational hardness for this problem. We establish several new hardness lower bounds for both exact and approximate model inversion. In exact inversion, the goal is to determine whether a target is contained within the range of a given generative model. Under the strong exponential time hypothesis (SETH), we demonstrate that the computational complexity of exact inversion is lower bounded by $\Omega(2^n)$ via a reduction from $k$-SAT; this is a strengthening of known results. For the more practically relevant problem of approximate inversion, the goal is to determine whether a point in the model range is close to a given target with respect to the $\ell_p$-norm. When $p$ is a positive odd integer, under SETH, we provide an $\Omega(2^n)$ complexity lower bound via a reduction from the closest vectors problem (CVP). Finally, when $p$ is even, under the exponential time hypothesis (ETH), we provide a lower bound of $2^{\Omega (n)}$ via a reduction from Half-Clique and Vertex-Cover.

Smartwatch-derived Acoustic Markers for Deficits in Cognitively Relevant Everyday Functioning

  • paper_url: http://arxiv.org/abs/2309.05777
  • repo_url: None
  • paper_authors: Yasunori Yamada, Kaoru Shinkawa, Masatomo Kobayashi, Miyuki Nemoto, Miho Ota, Kiyotaka Nemoto, Tetsuaki Arai
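  • for: Detecting subtle deficits in everyday functioning due to cognitive impairment, which matters for early detection of neurodegenerative diseases, particularly Alzheimer's disease. Current assessments rely on qualitative, subjective ratings, whereas speech may provide objective markers.
  • methods: A smartwatch-based application collects acoustic features as objective markers. Voice data were collected from 54 older adults during cognitive tasks and daily conversation, along with a measure of everyday functioning.
  • results: Machine learning models using acoustic features detected individuals with deficits in everyday functioning with up to 77.8% accuracy, compared with 68.5% for standard neuropsychological tests. Common acoustic features discriminated deficits robustly across both types of voice data.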
    Abstract Detection of subtle deficits in everyday functioning due to cognitive impairment is important for early detection of neurodegenerative diseases, particularly Alzheimer's disease. However, current standards for assessment of everyday functioning are based on qualitative, subjective ratings. Speech has been shown to provide good objective markers for cognitive impairments, but the association with cognition-relevant everyday functioning remains uninvestigated. In this study, we demonstrate the feasibility of using a smartwatch-based application to collect acoustic features as objective markers for detecting deficits in everyday functioning. We collected voice data during the performance of cognitive tasks and daily conversation, as possible application scenarios, from 54 older adults, along with a measure of everyday functioning. Machine learning models using acoustic features could detect individuals with deficits in everyday functioning with up to 77.8% accuracy, which was higher than the 68.5% accuracy with standard neuropsychological tests. We also identified common acoustic features for robustly discriminating deficits in everyday functioning across both types of voice data (cognitive tasks and daily conversation). Our results suggest that common acoustic features extracted from different types of voice data can be used as markers for deficits in everyday functioning.

The Effect of Intrinsic Dimension on Metric Learning under Compression

  • paper_url: http://arxiv.org/abs/2309.05751
  • repo_url: None
  • paper_authors: Efstratios Palias, Ata Kabán
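  • for: Metric learning, which seeks a suitable distance metric to improve distance-based learning algorithms, in high-dimensional settings.
  • methods: Instead of training a low-rank metric on high-dimensional data, a full-rank metric is trained on a randomly compressed version of the data, with theoretical error guarantees with respect to the random compression.
  • results: The bounds do not depend on the ambient dimension and tighten automatically when benign geometric structure is present; experiments on synthetic and real data sets support the theory.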
    Abstract Metric learning aims at finding a suitable distance metric over the input space, to improve the performance of distance-based learning algorithms. In high-dimensional settings, metric learning can also play the role of dimensionality reduction, by imposing a low-rank restriction to the learnt metric. In this paper, instead of training a low-rank metric on high-dimensional data, we consider a randomly compressed version of the data, and train a full-rank metric there. We give theoretical guarantees on the error of distance-based metric learning, with respect to the random compression, which do not depend on the ambient dimension. Our bounds do not make any explicit assumptions, aside from i.i.d. data from a bounded support, and automatically tighten when benign geometrical structures are present. Experimental results on both synthetic and real data sets support our theoretical findings in high-dimensional settings.
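The compress-then-learn setup can be sketched as follows, assuming a Gaussian random projection as the compression and a Mahalanobis-type distance as the metric. Both are illustrative choices; the paper's exact compression matrix and training procedure are not specified in the abstract, and no metric is actually trained here.

```python
import numpy as np

def gaussian_compress(X, k, rng):
    """Project rows of X from d to k dimensions with a random Gaussian matrix."""
    d = X.shape[1]
    R = rng.standard_normal((k, d)) / np.sqrt(k)   # scaling keeps squared norms in expectation
    return X @ R.T, R

def mahalanobis_sq(a, b, M):
    """Squared distance (a - b)^T M (a - b) for a k x k positive semidefinite M."""
    diff = a - b
    return float(diff @ M @ diff)
```

In this setting the learned object is a full-rank k x k matrix `M` in the compressed space, rather than a low-rank d x d matrix in the original space.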

CaloClouds II: Ultra-Fast Geometry-Independent Highly-Granular Calorimeter Simulation

  • paper_url: http://arxiv.org/abs/2309.05704
  • repo_url: https://github.com/flc-qu-hep/caloclouds-2
  • paper_authors: Erik Buhmann, Frank Gaede, Gregor Kasieczka, Anatolii Korol, William Korcari, Katja Krüger, Peter McKeown
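  • for: Fast simulation of energy depositions in highly granular detectors, needed for future collider experiments.
  • methods: Generative machine learning that speeds up and augments the traditional simulation chain: CaloClouds II uses continuous-time score-based modelling and is further distilled into a consistency model.
  • results: 25-step sampling with fidelity comparable to CaloClouds and a $6\times$ speed-up over Geant4 on a single CPU ($5\times$ over CaloClouds); the distilled consistency model samples in a single step for a $46\times$ ($37\times$) speed-up.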
    Abstract Fast simulation of the energy depositions in high-granular detectors is needed for future collider experiments with ever increasing luminosities. Generative machine learning (ML) models have been shown to speed up and augment the traditional simulation chain in physics analysis. However, the majority of previous efforts were limited to models relying on fixed, regular detector readout geometries. A major advancement is the recently introduced CaloClouds model, a geometry-independent diffusion model, which generates calorimeter showers as point clouds for the electromagnetic calorimeter of the envisioned International Large Detector (ILD). In this work, we introduce CaloClouds II which features a number of key improvements. This includes continuous time score-based modelling, which allows for a 25 step sampling with comparable fidelity to CaloClouds while yielding a $6\times$ speed-up over Geant4 on a single CPU ($5\times$ over CaloClouds). We further distill the diffusion model into a consistency model allowing for accurate sampling in a single step and resulting in a $46\times$ ($37\times$) speed-up. This constitutes the first application of consistency distillation for the generation of calorimeter showers.

Unsupervised Machine Learning Techniques for Exploring Tropical Coamoeba, Brane Tilings and Seiberg Duality

  • paper_url: http://arxiv.org/abs/2309.05702
  • repo_url: None
  • paper_authors: Rak-Kyeong Seong
  • for: Identifying toric phases of 4d N=1 supersymmetric gauge theories corresponding to the same toric Calabi-Yau 3-fold; these are worldvolume theories of a D3-brane probing the 3-fold.
  • methods: Unsupervised techniques such as principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) project the space of coamoeba, labelled by complex structure moduli, down to a lower-dimensional phase space.
  • results: Phase boundaries in the projected space correspond to Seiberg duality; a 2-dimensional phase diagram is obtained for brane tilings corresponding to the cone over the zeroth Hirzebruch surface F0.
    Abstract We introduce unsupervised machine learning techniques in order to identify toric phases of 4d N=1 supersymmetric gauge theories corresponding to the same toric Calabi-Yau 3-fold. These 4d N=1 supersymmetric gauge theories are worldvolume theories of a D3-brane probing a toric Calabi-Yau 3-fold and are realized in terms of a Type IIB brane configuration known as a brane tiling. It corresponds to the skeleton graph of the coamoeba projection of the mirror curve associated to the toric Calabi-Yau 3-fold. When we vary the complex structure moduli of the mirror Calabi-Yau 3-fold, the coamoeba and the corresponding brane tilings change their shape, giving rise to different toric phases related by Seiberg duality. We illustrate that by employing techniques such as principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE), we can project the space of coamoeba labelled by complex structure moduli down to a lower dimensional phase space with phase boundaries corresponding to Seiberg duality. In this work, we illustrate this technique by obtaining a 2-dimensional phase diagram for brane tilings corresponding to the cone over the zeroth Hirzebruch surface F0.
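The projection step can be illustrated with a small PCA routine. The sketch assumes each coamoeba has already been turned into a fixed-length feature vector (for example a flattened image); that encoding and the function name are assumptions for illustration, not details from the paper.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project rows of X onto the leading principal components.

    Returns the projected coordinates and the explained-variance ratio
    of each retained component.
    """
    Xc = X - X.mean(axis=0)                            # centre the features
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)
    return Xc @ Vt[:n_components].T, explained[:n_components]
```

Samples from different phases should then fall into separate regions of the 2D plot, with the boundaries between regions playing the role of the phase boundaries described in the abstract.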

On the quality of randomized approximations of Tukey’s depth

  • paper_url: http://arxiv.org/abs/2309.05657
  • repo_url: None
  • paper_authors: Simon Briend, Gábor Lugosi, Roberto Imbuzeiro Oliveira
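  • for: Studies when randomized algorithms return a good approximation of Tukey's depth, whose exact computation is hard in high dimensions.
  • methods: Analyzes randomized approximations for data sampled from a log-concave isotropic distribution.
  • results: If the algorithm must run in time polynomial in the dimension, the randomized algorithm correctly approximates the maximal depth $1/2$ and depths close to zero; for any point of intermediate depth, any good approximation requires exponential complexity.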
    Abstract Tukey's depth (or halfspace depth) is a widely used measure of centrality for multivariate data. However, exact computation of Tukey's depth is known to be a hard problem in high dimensions. As a remedy, randomized approximations of Tukey's depth have been proposed. In this paper we explore when such randomized algorithms return a good approximation of Tukey's depth. We study the case when the data are sampled from a log-concave isotropic distribution. We prove that, if one requires that the algorithm runs in polynomial time in the dimension, the randomized algorithm correctly approximates the maximal depth $1/2$ and depths close to zero. On the other hand, for any point of intermediate depth, any good approximation requires exponential complexity.
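A common randomized approximation replaces the minimum over all directions in the definition of Tukey's depth with a minimum over a finite set of random directions. The sketch below assumes that scheme; the function name and sampling details are illustrative. Because it minimizes over fewer halfspaces, the estimate can only overestimate the true depth.

```python
import numpy as np

def random_tukey_depth(x, data, num_directions, rng):
    """Approximate the halfspace depth of x with respect to the rows of data.

    For each random unit direction u, count the fraction of points in the
    closed halfspace {y : <u, y - x> >= 0}; return the smallest fraction.
    """
    d = data.shape[1]
    U = rng.standard_normal((num_directions, d))
    U /= np.linalg.norm(U, axis=1, keepdims=True)   # uniform directions on the sphere
    proj = (data - x) @ U.T                         # shape (n, num_directions)
    return float((proj >= 0).mean(axis=0).min())
```

In low dimension a few hundred directions are usually enough. The paper's result concerns how the number of directions needed behaves as the dimension grows, where intermediate depths become exponentially expensive to approximate.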

Dynamic Handover: Throw and Catch with Bimanual Hands

  • paper_url: http://arxiv.org/abs/2309.05655
  • repo_url: None
  • paper_authors: Binghao Huang, Yuanpei Chen, Tianyu Wang, Yuzhe Qin, Yaodong Yang, Nikolay Atanasov, Xiaolong Wang
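  • for: Enabling robots to throw and catch objects, which requires high-speed dynamic actions, precise collaboration, and interaction with diverse objects.
  • methods: Two multi-finger hands attached to robot arms, trained with Multi-Agent Reinforcement Learning in simulation and deployed via Sim2Real transfer. To bridge the Sim2Real gap, several algorithm designs are introduced, including a learned trajectory prediction model that gives the catcher a real-time estimate of where the object is heading.
  • results: Real-world experiments with multiple objects show significant improvements over multiple baselines. Project page: https://binghao-huang.github.io/dynamic_handover/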
    Abstract Humans throw and catch objects all the time. However, such a seemingly common skill poses many challenges for robots: they need to perform such dynamic actions at high speed, collaborate precisely, and interact with diverse objects. In this paper, we design a system with two multi-finger hands attached to robot arms to solve this problem. We train our system using Multi-Agent Reinforcement Learning in simulation and perform Sim2Real transfer to deploy on the real robots. To overcome the Sim2Real gap, we provide multiple novel algorithm designs including learning a trajectory prediction model for the object. Such a model can help the robot catcher obtain a real-time estimate of where the object is heading and then react accordingly. We conduct our experiments with multiple objects in the real-world system, and show significant improvements over multiple baselines. Our project page is available at https://binghao-huang.github.io/dynamic_handover/.
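    Summary Humans throw and catch objects all the time, yet this skill poses many challenges for robots: they must perform dynamic actions at high speed, collaborate precisely, and interact with diverse objects. This paper designs a system with two multi-finger hands attached to robot arms. The system is trained with Multi-Agent Reinforcement Learning in simulation and deployed on real robots through Sim2Real transfer. To bridge the Sim2Real gap, the authors provide several novel algorithm designs, including learning a trajectory prediction model for the object. This model gives the robot catcher a real-time estimate of where the object is heading so that it can react accordingly. Experiments with multiple objects on the real-world system show significant improvements over multiple baselines. The project page is available at \url{https://binghao-huang.github.io/dynamic_handover/}.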

Data efficiency, dimensionality reduction, and the generalized symmetric information bottleneck

  • paper_url: http://arxiv.org/abs/2309.05649
  • repo_url: None
  • paper_authors: K. Michael Martini, Ilya Nemenman
  • for: Simultaneous compression of two random variables while preserving the information between their compressed versions.
  • methods: The Generalized Symmetric Information Bottleneck (GSIB), which explores different functional forms of the cost of simultaneous reduction. Dataset size requirements are analyzed through bounds and root-mean-squared estimates of the statistical fluctuations of the loss functions.
  • results: In typical situations, simultaneous GSIB compression requires qualitatively less data to reach the same errors than compressing the variables one at a time, which suggests that simultaneous compression is more data efficient than independent compression of each input variable.
    Abstract The Symmetric Information Bottleneck (SIB), an extension of the more familiar Information Bottleneck, is a dimensionality reduction technique that simultaneously compresses two random variables to preserve information between their compressed versions. We introduce the Generalized Symmetric Information Bottleneck (GSIB), which explores different functional forms of the cost of such simultaneous reduction. We then explore the dataset size requirements of such simultaneous compression. We do this by deriving bounds and root-mean-squared estimates of statistical fluctuations of the involved loss functions. We show that, in typical situations, the simultaneous GSIB compression requires qualitatively less data to achieve the same errors compared to compressing variables one at a time. We suggest that this is an example of a more general principle that simultaneous compression is more data efficient than independent compression of each of the input variables.
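    Summary The Symmetric Information Bottleneck (SIB) is a dimensionality reduction technique that simultaneously compresses two random variables while preserving the information between their compressed versions. The paper introduces the Generalized Symmetric Information Bottleneck (GSIB), which explores different functional forms of the cost of such simultaneous reduction. It then studies the dataset size requirements of simultaneous compression by deriving bounds and root-mean-squared estimates of the statistical fluctuations of the loss functions involved. In typical situations, simultaneous GSIB compression requires qualitatively less data to achieve the same errors than compressing the variables one at a time. The authors suggest that this is an instance of a more general principle: simultaneous compression is more data efficient than independent compression of each input variable.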

A Novel Supervised Deep Learning Solution to Detect Distributed Denial of Service (DDoS) attacks on Edge Systems using Convolutional Neural Networks (CNN)

  • paper_url: http://arxiv.org/abs/2309.05646
  • repo_url: https://github.com/VedanthR5/A-Novel-Deep-Learning-Solution-to-detect-DDoS-attacks-using-Neural-Networks
  • paper_authors: Vedanth Ramanathan, Krish Mahadevan, Sejal Dua
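  • for: This study presents a deep learning-based approach for detecting DDoS attacks in network traffic, a threat that disrupts the availability of essential services.
  • methods: The study uses a Convolutional Neural Network (CNN) together with common deep learning algorithms to classify benign and malicious traffic. Packet flows are extracted and normalized to a fixed length, then fed into a custom architecture with layers for node dropout, normalization, and a sigmoid activation that outputs a binary classification.
  • results: The proposed algorithm achieves an accuracy of 0.9883 on 2000 unseen flows in network traffic, and the authors describe it as scalable to any network environment.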
    Abstract Cybersecurity attacks are becoming increasingly sophisticated and pose a growing threat to individuals, and private and public sectors. Distributed Denial of Service attacks are one of the most harmful of these threats in today's internet, disrupting the availability of essential services. This project presents a novel deep learning-based approach for detecting DDoS attacks in network traffic using the industry-recognized DDoS evaluation dataset from the University of New Brunswick, which contains packet captures from real-time DDoS attacks, creating a broader and more applicable model for the real world. The algorithm employed in this study exploits the properties of Convolutional Neural Networks (CNN) and common deep learning algorithms to build a novel mitigation technique that classifies benign and malicious traffic. The proposed model preprocesses the data by extracting packet flows and normalizing them to a fixed length which is fed into a custom architecture containing layers regulating node dropout, normalization, and a sigmoid activation function to out a binary classification. This allows for the model to process the flows effectively and look for the nodes that contribute to DDoS attacks while dropping the "noise" or the distractors. The results of this study demonstrate the effectiveness of the proposed algorithm in detecting DDOS attacks, achieving an accuracy of .9883 on 2000 unseen flows in network traffic, while being scalable for any network environment.
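    Summary Cybersecurity attacks are becoming increasingly sophisticated and pose a growing threat to individuals and to the private and public sectors. Distributed Denial of Service (DDoS) attacks are among the most harmful of these threats on today's internet because they disrupt the availability of essential services. This project presents a deep learning-based approach for detecting DDoS attacks in network traffic. It uses the DDoS evaluation dataset from the University of New Brunswick, which contains packet captures from real-time DDoS attacks, with the aim of building a broader and more applicable model. The algorithm combines Convolutional Neural Networks with common deep learning algorithms to classify benign and malicious traffic. The model preprocesses the data by extracting packet flows and normalizing them to a fixed length, then passes them through a custom architecture with node dropout, normalization, and a sigmoid activation function for binary classification. This lets the model process the flows effectively and find the nodes that contribute to DDoS attacks while dropping the "noise" or distractors. The proposed algorithm achieves an accuracy of 0.9883 on 2000 unseen flows in network traffic and is described as scalable to any network environment.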
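
The preprocessing and output steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes "normalizing to a fixed length" means truncating or zero-padding each flow, adds min-max scaling as one plausible value normalization, and uses a 0.5 decision threshold on the sigmoid output. All function names are hypothetical.

```python
import math


def to_fixed_length(flow, length, pad_value=0.0):
    """Truncate or right-pad a flow (list of per-packet values) to `length`."""
    clipped = list(flow[:length])
    return clipped + [pad_value] * (length - len(clipped))


def min_max_scale(values):
    """Scale values to [0, 1]; a constant sequence maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(score, threshold=0.5):
    """Map a raw network output (logit) to a binary label."""
    return "malicious" if sigmoid(score) >= threshold else "benign"
```

In a real model the logit passed to `classify` would come from the CNN's final layer; here it is just a number supplied by the caller.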

Desenvolvimento de modelo para predição de cotações de ação baseada em análise de sentimentos de tweets

  • paper_url: http://arxiv.org/abs/2309.06538
  • repo_url: None
  • paper_authors: Mario Mitsuo Akita, Everton Josue da Silva
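  • for: Predicting stock market share prices.
  • methods: The iFeel 2.0 platform is used to extract 19 sentiment features from Twitter posts that mention the company Petrobras. These features are then used to train XBoot models to predict future stock prices.
  • results: Simulated trading of Petrobras shares based on the model's outputs yielded a net gain of R$88,82 over a 250-day period, compared with the average performance of 100 random models.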
    Abstract Training machine learning models for predicting stock market share prices is an active area of research since the automatization of trading such papers was available in real time. While most of the work in this field of research is done by training Neural networks based on past prices of stock shares, in this work, we use iFeel 2.0 platform to extract 19 sentiment features from posts obtained from microblog platform Twitter that mention the company Petrobras. Then, we used those features to train XBoot models to predict future stock prices for the referred company. Later, we simulated the trading of Petrobras' shares based on the model's outputs and determined the gain of R$88,82 (net) in a 250-day period when compared to a 100 random models' average performance.
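    Summary Training machine learning models to predict stock market share prices has been an active research area since automated real-time trading became available. Most work in this field trains neural networks on past share prices. This work instead uses the iFeel 2.0 platform to extract 19 sentiment features from Twitter posts that mention the company Petrobras, and uses those features to train XBoot models that predict the company's future stock prices. The authors then simulate trading of Petrobras shares based on the model's outputs and report a net gain of R$88,82 over a 250-day period, compared with the average performance of 100 random models.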

Boundary Peeling: Outlier Detection Method Using One-Class Peeling

  • paper_url: http://arxiv.org/abs/2309.05630
  • repo_url: None
  • paper_authors: Sheikh Arafat, Na Sun, Maria L. Weese, Waldyn G. Martinez
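  • for: This study proposes an outlier detection algorithm that needs no labeled data, for use in the outlier detection phase of data analysis.
  • methods: One-Class Boundary Peeling uses the average signed distance from iteratively peeled, flexible boundaries generated by one-class support vector machines (SVMs).
  • results: In synthetic data simulations, the algorithm outperforms all state-of-the-art methods when no outliers are present and performs comparably to or better than benchmark methods when outliers are present. It is also competitive on common benchmark data sets.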
    Abstract Unsupervised outlier detection constitutes a crucial phase within data analysis and remains a dynamic realm of research. A good outlier detection algorithm should be computationally efficient, robust to tuning parameter selection, and perform consistently well across diverse underlying data distributions. We introduce One-Class Boundary Peeling, an unsupervised outlier detection algorithm. One-class Boundary Peeling uses the average signed distance from iteratively-peeled, flexible boundaries generated by one-class support vector machines. One-class Boundary Peeling has robust hyperparameter settings and, for increased flexibility, can be cast as an ensemble method. In synthetic data simulations One-Class Boundary Peeling outperforms all state of the art methods when no outliers are present while maintaining comparable or superior performance in the presence of outliers, as compared to benchmark methods. One-Class Boundary Peeling performs competitively in terms of correct classification, AUC, and processing time using common benchmark data sets.
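    Summary Unsupervised outlier detection is a crucial phase of data analysis and remains an active area of research. A good outlier detection algorithm should be computationally efficient, robust to the choice of tuning parameters, and consistent across diverse underlying data distributions. The paper introduces One-Class Boundary Peeling, an unsupervised outlier detection algorithm that uses the average signed distance from iteratively peeled, flexible boundaries generated by one-class support vector machines. It has robust hyperparameter settings and, for increased flexibility, can be cast as an ensemble method. In synthetic data simulations, One-Class Boundary Peeling outperforms all state-of-the-art methods when no outliers are present, while maintaining comparable or superior performance to benchmark methods when outliers are present. On common benchmark data sets it is competitive in correct classification, AUC, and processing time.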

Privacy Side Channels in Machine Learning Systems

  • paper_url: http://arxiv.org/abs/2309.05610
  • repo_url: None
  • paper_authors: Edoardo Debenedetti, Giorgio Severi, Nicholas Carlini, Christopher A. Choquette-Choo, Matthew Jagielski, Milad Nasr, Eric Wallace, Florian Tramèr
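  • for: This work studies privacy in machine learning (ML) systems and introduces privacy side channels: attacks that exploit system-level components to extract private information at far higher rates than is possible for standalone models.
  • methods: The authors propose four categories of side channels that span the ML lifecycle: training data filtering, input preprocessing, output post-processing, and query filtering.
  • results: The side channels enable enhanced membership inference attacks and novel threats such as extracting users' test queries. For example, deduplicating training data before differentially-private training invalidates any provable privacy guarantees, and systems that block language models from regenerating training data can be exploited to reconstruct private keys contained in the training set.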
    Abstract Most current approaches for protecting privacy in machine learning (ML) assume that models exist in a vacuum, when in reality, ML models are part of larger systems that include components for training data filtering, output monitoring, and more. In this work, we introduce privacy side channels: attacks that exploit these system-level components to extract private information at far higher rates than is otherwise possible for standalone models. We propose four categories of side channels that span the entire ML lifecycle (training data filtering, input preprocessing, output post-processing, and query filtering) and allow for either enhanced membership inference attacks or even novel threats such as extracting users' test queries. For example, we show that deduplicating training data before applying differentially-private training creates a side-channel that completely invalidates any provable privacy guarantees. Moreover, we show that systems which block language models from regenerating training data can be exploited to allow exact reconstruction of private keys contained in the training set -- even if the model did not memorize these keys. Taken together, our results demonstrate the need for a holistic, end-to-end privacy analysis of machine learning.
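    Summary Most current approaches for protecting privacy in machine learning (ML) assume that models exist in isolation, whereas in practice ML models are part of larger systems that include components for training data filtering, output monitoring, and more. This work introduces privacy side channels: attacks that exploit these system-level components to extract private information at far higher rates than is possible for standalone models. The authors propose four categories of side channels that span the entire ML lifecycle (training data filtering, input preprocessing, output post-processing, and query filtering) and that allow either enhanced membership inference attacks or novel threats such as extracting users' test queries. For example, deduplicating training data before applying differentially-private training creates a side channel that completely invalidates any provable privacy guarantees. Systems that block language models from regenerating training data can likewise be exploited to reconstruct private keys contained in the training set exactly, even if the model did not memorize those keys. Taken together, the results show the need for a holistic, end-to-end privacy analysis of machine learning.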

Quantitative Analysis of Forecasting Models:In the Aspect of Online Political Bias

  • paper_url: http://arxiv.org/abs/2309.05589
  • repo_url: None
  • paper_authors: Srinath Sai Tripuraneni, Sadia Kamal, Arunkumar Bagavathi
  • for: This study characterizes political bias on online social media platforms, with a focus on forecasting political leaning time series.
  • methods: The authors propose a heuristic approach to classify social media posts into five political leaning categories, and apply existing time series forecasting models to two social media datasets with different political ideologies, Twitter and Gab.
  • results: Through our experiments and analyses, we aim to shed light on the challenges and opportunities in forecasting political bias in social media platforms, and ultimately pave the way for developing more effective strategies to mitigate the negative impact of political bias in the digital realm.
    Abstract Understanding and mitigating political bias in online social media platforms are crucial tasks to combat misinformation and echo chamber effects. However, characterizing political bias temporally using computational methods presents challenges due to the high frequency of noise in social media datasets. While existing research has explored various approaches to political bias characterization, the ability to forecast political bias and anticipate how political conversations might evolve in the near future has not been extensively studied. In this paper, we propose a heuristic approach to classify social media posts into five distinct political leaning categories. Since there is a lack of prior work on forecasting political bias, we conduct an in-depth analysis of existing baseline models to identify which model best fits to forecast political leaning time series. Our approach involves utilizing existing time series forecasting models on two social media datasets with different political ideologies, specifically Twitter and Gab. Through our experiments and analyses, we seek to shed light on the challenges and opportunities in forecasting political bias in social media platforms. Ultimately, our work aims to pave the way for developing more effective strategies to mitigate the negative impact of political bias in the digital realm.
    Summary In this paper, we propose a heuristic approach to classify social media posts into five distinct political leaning categories. Since there is a lack of prior work on forecasting political bias, we conduct an in-depth analysis of existing baseline models to identify which model best fits to forecast political leaning time series. Our approach involves utilizing existing time series forecasting models on two social media datasets with different political ideologies, specifically Twitter and Gab. Through our experiments and analyses, we seek to shed light on the challenges and opportunities in forecasting political bias in social media platforms. Our work aims to pave the way for developing more effective strategies to mitigate the negative impact of political bias in the digital realm.

Anisotropic Diffusion Stencils: From Simple Derivations over Stability Estimates to ResNet Implementations

  • paper_url: http://arxiv.org/abs/2309.05575
  • repo_url: None
  • paper_authors: Karl Schrader, Joachim Weickert, Michael Krause
  • for: This paper is written for studying the numerical approximation of anisotropic diffusion processes with a diffusion tensor, and deriving a large family of finite difference discretizations on a 3x3 stencil.
  • methods: The paper uses a directional splitting method to derive a stencil class that covers a wide range of existing discretizations, and establishes a bound on the spectral norm of the matrix corresponding to the stencil to guarantee stability of an explicit scheme in the Euclidean norm.
  • results: The paper shows that the resulting stencil class involves one free parameter and covers a wide range of existing discretizations, and that the two parameters in the stencil of Weickert et al. (2013) contain redundancy. Additionally, the paper demonstrates a natural translation of the explicit scheme into ResNet blocks, which enables simple and highly efficient parallel implementations on GPUs.
    Abstract Anisotropic diffusion processes with a diffusion tensor are important in image analysis, physics, and engineering. However, their numerical approximation has a strong impact on dissipative artefacts and deviations from rotation invariance. In this work, we study a large family of finite difference discretisations on a 3 x 3 stencil. We derive it by splitting 2-D anisotropic diffusion into four 1-D diffusions. The resulting stencil class involves one free parameter and covers a wide range of existing discretisations. It comprises the full stencil family of Weickert et al. (2013) and shows that their two parameters contain redundancy. Furthermore, we establish a bound on the spectral norm of the matrix corresponding to the stencil. This gives time step size limits that guarantee stability of an explicit scheme in the Euclidean norm. Our directional splitting also allows a very natural translation of the explicit scheme into ResNet blocks. Employing neural network libraries enables simple and highly efficient parallel implementations on GPUs.
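    Summary Anisotropic diffusion processes with a diffusion tensor are important in image analysis, physics, and engineering. Their numerical approximation, however, strongly affects dissipative artefacts and deviations from rotation invariance. This work studies a large family of finite difference discretisations on a 3 x 3 stencil, derived by splitting 2-D anisotropic diffusion into four 1-D diffusions. The resulting stencil class has one free parameter and covers a wide range of existing discretisations. It comprises the full stencil family of Weickert et al. (2013) and shows that their two parameters contain redundancy. The authors also establish a bound on the spectral norm of the matrix corresponding to the stencil, which gives time step size limits that guarantee stability of an explicit scheme in the Euclidean norm. The directional splitting further allows a very natural translation of the explicit scheme into ResNet blocks, and neural network libraries then enable simple and highly efficient parallel implementations on GPUs.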
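
The link between an explicit diffusion scheme and a residual block can be illustrated with a deliberately simplified case. The sketch below is a stand-in under stated assumptions: it uses 1-D homogeneous diffusion with the stencil [1, -2, 1], unit grid size, and reflecting boundaries, not the 2-D anisotropic 3 x 3 stencil family of the paper. The update has the residual form u + tau * (stencil applied to u), and for this 1-D stencil a step size tau <= 1/2 keeps the Euclidean norm from growing. The function name is hypothetical.

```python
def residual_diffusion_step(u, tau):
    """One explicit diffusion step written as a residual update.

    u:   list of floats (1-D signal, unit grid size)
    tau: time step size; tau <= 0.5 is the stability limit for this stencil
    """
    n = len(u)
    out = []
    for i in range(n):
        # Reflecting boundaries: mirror the edge value outside the domain.
        left = u[i - 1] if i > 0 else u[i]
        right = u[i + 1] if i < n - 1 else u[i]
        # Skip connection (u[i]) plus a scaled convolution with [1, -2, 1].
        out.append(u[i] + tau * (left - 2.0 * u[i] + right))
    return out
```

Stacking several such steps mirrors a chain of residual blocks with a fixed convolution kernel. The reflecting boundaries also preserve the sum of the signal from step to step.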

Advancing Federated Learning in 6G: A Trusted Architecture with Graph-based Analysis

  • paper_url: http://arxiv.org/abs/2309.05525
  • repo_url: https://github.com/chendiqian/GNN4FL
  • paper_authors: Wenxuan Ye, Chendi Qian, Xueli An, Xueqiang Yan, Georg Carle
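  • for: Improving native AI support in 6G networks by making Federated Learning (FL) more secure and trustworthy.
  • methods: Distributed Ledger Technology (DLT) and a Graph Neural Network (GNN): a pre-processing layer uses homomorphic encryption to aggregate local models securely, the GNN identifies abnormal local models, and DLT decentralizes the system by selecting one of the candidates to perform the central server's functions.
  • results: Simulations validate the feasibility of the proposed architecture, showing improved performance in anomalous model detection and global model accuracy compared to relevant baselines.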
    Abstract Integrating native AI support into the network architecture is an essential objective of 6G. Federated Learning (FL) emerges as a potential paradigm, facilitating decentralized AI model training across a diverse range of devices under the coordination of a central server. However, several challenges hinder its wide application in the 6G context, such as malicious attacks and privacy snooping on local model updates, and centralization pitfalls. This work proposes a trusted architecture for supporting FL, which utilizes Distributed Ledger Technology (DLT) and Graph Neural Network (GNN), including three key features. First, a pre-processing layer employing homomorphic encryption is incorporated to securely aggregate local models, preserving the privacy of individual models. Second, given the distributed nature and graph structure between clients and nodes in the pre-processing layer, GNN is leveraged to identify abnormal local models, enhancing system security. Third, DLT is utilized to decentralize the system by selecting one of the candidates to perform the central server's functions. Additionally, DLT ensures reliable data management by recording data exchanges in an immutable and transparent ledger. The feasibility of the novel architecture is validated through simulations, demonstrating improved performance in anomalous model detection and global model accuracy compared to relevant baselines.
    Summary Integrating native AI support into the network architecture is an essential objective of 6G. Federated Learning (FL) emerges as a potential paradigm, facilitating decentralized AI model training across a diverse range of devices under the coordination of a central server. However, several challenges hinder its wide application in the 6G context, such as malicious attacks and privacy snooping on local model updates, and centralization pitfalls. This work proposes a trusted architecture for supporting FL that uses Distributed Ledger Technology (DLT) and a Graph Neural Network (GNN), with three key features. First, a pre-processing layer employing homomorphic encryption securely aggregates local models, preserving the privacy of individual models. Second, given the distributed nature and graph structure between clients and nodes in the pre-processing layer, the GNN is used to identify abnormal local models, enhancing system security. Third, DLT decentralizes the system by selecting one of the candidates to perform the central server's functions. DLT also ensures reliable data management by recording data exchanges in an immutable and transparent ledger. Simulations validate the feasibility of the architecture, showing improved performance in anomalous model detection and global model accuracy compared to relevant baselines.

Re-formalization of Individual Fairness

  • paper_url: http://arxiv.org/abs/2309.05521
  • repo_url: None
  • paper_authors: Toshihiro Kamishima
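  • for: This study re-formalizes individual fairness through statistical independence conditioned on individuals.
  • methods: The study builds on the formalization of Dwork et al., in which a similar pair of data in an unfair space should be mapped to similar positions in a fair space.
  • results: The re-formalization is compatible with that of Dwork et al., can be combined with equalized odds, sufficiency, and statistical parity, and applies to in-process and post-process approaches as well as the pre-process approach.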
    Abstract The notion of individual fairness is a formalization of an ethical principle, "Treating like cases alike," which has been argued such as by Aristotle. In a fairness-aware machine learning context, Dwork et al. firstly formalized the notion. In their formalization, a similar pair of data in an unfair space should be mapped to similar positions in a fair space. We propose to re-formalize individual fairness by the statistical independence conditioned by individuals. This re-formalization has the following merits. First, our formalization is compatible with that of Dwork et al. Second, our formalization enables to combine individual fairness with the fairness notion, equalized odds or sufficiency, as well as statistical parity. Third, though their formalization implicitly assumes a pre-process approach for making fair prediction, our formalization is applicable to an in-process or post-process approach.
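    Summary Individual fairness formalizes the ethical principle "Treating like cases alike," which has been argued for by, among others, Aristotle. In fairness-aware machine learning, Dwork et al. were the first to formalize the notion: a similar pair of data in an unfair space should be mapped to similar positions in a fair space. The paper proposes to re-formalize individual fairness as statistical independence conditioned on individuals. This has three merits. First, the new formalization is compatible with that of Dwork et al. Second, it allows individual fairness to be combined with the fairness notions of equalized odds or sufficiency, as well as statistical parity. Third, whereas the formalization of Dwork et al. implicitly assumes a pre-process approach to fair prediction, the new one is also applicable to in-process and post-process approaches.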
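
For reference, the notions named in the abstract above can be written compactly. The first line is the Lipschitz condition of Dwork et al.; the remaining lines are the standard independence statements of the group fairness notions. The symbols are notational assumptions introduced here: $M$ maps an individual to a distribution over outcomes, $d$ and $D$ are metrics on individuals and on outcome distributions, $S$ is the sensitive feature, $Y$ the true label, and $\hat{Y}$ the prediction. The paper's own re-formalization is not reproduced here.

```latex
\begin{align*}
D\bigl(M(x), M(y)\bigr) &\le d(x, y) \quad \text{for all individuals } x, y
  && \text{(individual fairness, Dwork et al.)} \\
\hat{Y} &\perp\!\!\!\perp S && \text{(statistical parity)} \\
\hat{Y} &\perp\!\!\!\perp S \mid Y && \text{(equalized odds)} \\
Y &\perp\!\!\!\perp S \mid \hat{Y} && \text{(sufficiency)}
\end{align*}
```

Writing the group notions as independence statements makes it easier to see why an independence-based form of individual fairness could be combined with them.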

Share Your Representation Only: Guaranteed Improvement of the Privacy-Utility Tradeoff in Federated Learning

  • paper_url: http://arxiv.org/abs/2309.05505
  • repo_url: https://github.com/shenzebang/centaur-privacy-federated-representation-learning
  • paper_authors: Zebang Shen, Jiayuan Ye, Anmin Kang, Hamed Hassani, Reza Shokri
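  • for: This paper proposes a privacy-preserving federated representation learning method that protects data privacy while improving model utility.
  • methods: The method considers a representation federated learning objective in which parties collaboratively refine the consensus part of the model under differential privacy guarantees, while keeping sufficient freedom for local personalization that is never released.
  • results: In the linear representation setting, the new algorithm \DPFEDREP\ converges at a linear rate to a ball centered around the global optimal solution, with a radius proportional to the reciprocal of the privacy budget. This improves the state-of-the-art utility-privacy trade-off for the problem by a factor of $\sqrt{d}$, where $d$ is the input dimension.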
    Abstract Repeated parameter sharing in federated learning causes significant information leakage about private data, thus defeating its main purpose: data privacy. Mitigating the risk of this information leakage, using state of the art differentially private algorithms, also does not come for free. Randomized mechanisms can prevent convergence of models on learning even the useful representation functions, especially if there is more disagreement between local models on the classification functions (due to data heterogeneity). In this paper, we consider a representation federated learning objective that encourages various parties to collaboratively refine the consensus part of the model, with differential privacy guarantees, while separately allowing sufficient freedom for local personalization (without releasing it). We prove that in the linear representation setting, while the objective is non-convex, our proposed new algorithm \DPFEDREP\ converges to a ball centered around the \emph{global optimal} solution at a linear rate, and the radius of the ball is proportional to the reciprocal of the privacy budget. With this novel utility analysis, we improve the SOTA utility-privacy trade-off for this problem by a factor of $\sqrt{d}$, where $d$ is the input dimension. We empirically evaluate our method with the image classification task on CIFAR10, CIFAR100, and EMNIST, and observe a significant performance improvement over the prior work under the same small privacy budget. The code can be found in this link: https://github.com/shenzebang/CENTAUR-Privacy-Federated-Representation-Learning.
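    Summary Repeated parameter sharing in federated learning leaks significant information about private data, which defeats its main purpose: data privacy. Mitigating this leakage with state-of-the-art differentially private algorithms does not come for free either. Randomized mechanisms can prevent models from converging even on useful representation functions, especially when data heterogeneity causes more disagreement between local models on the classification functions. This paper considers a representation federated learning objective that encourages parties to collaboratively refine the consensus part of the model with differential privacy guarantees, while separately allowing sufficient freedom for local personalization, which is not released. The authors prove that in the linear representation setting, although the objective is non-convex, the proposed algorithm \DPFEDREP\ converges at a linear rate to a ball centered around the global optimal solution, and that the radius of the ball is proportional to the reciprocal of the privacy budget. This utility analysis improves the state-of-the-art utility-privacy trade-off for the problem by a factor of $\sqrt{d}$, where $d$ is the input dimension. Image classification experiments on CIFAR10, CIFAR100, and EMNIST show a significant performance improvement over prior work under the same small privacy budget. The code can be found at https://github.com/shenzebang/CENTAUR-Privacy-Federated-Representation-Learning.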

Systematic Review of Experimental Paradigms and Deep Neural Networks for Electroencephalography-Based Cognitive Workload Detection

  • paper_url: http://arxiv.org/abs/2309.07163
  • repo_url: None
  • paper_authors: Vishnu KN, Cota Navin Gupta
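  • for: This systematic review examines electroencephalography (EEG)-based cognitive workload (CWL) estimation.
  • methods: The reviewed studies use a variety of experimental paradigms to elicit levels of cognitive load and use deep neural networks (DNNs) for signal classification.
  • results: Only a few studies adopt an online or pseudo-online classification strategy for real-time CWL estimation, and most use DNNs as black-box models. The review concludes that DNNs are valuable tools for classifying EEG signals, but that existing methods are limited by the non-stationary nature of the signal.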
    Abstract This article summarizes a systematic review of the electroencephalography (EEG)-based cognitive workload (CWL) estimation. The focus of the article is twofold: identify the disparate experimental paradigms used for reliably eliciting discreet and quantifiable levels of cognitive load and the specific nature and representational structure of the commonly used input formulations in deep neural networks (DNNs) used for signal classification. The analysis revealed a number of studies using EEG signals in its native representation of a two-dimensional matrix for offline classification of CWL. However, only a few studies adopted an online or pseudo-online classification strategy for real-time CWL estimation. Further, only a couple of interpretable DNNs and a single generative model were employed for cognitive load detection till date during this review. More often than not, researchers were using DNNs as black-box type models. In conclusion, DNNs prove to be valuable tools for classifying EEG signals, primarily due to the substantial modeling power provided by the depth of their network architecture. It is further suggested that interpretable and explainable DNN models must be employed for cognitive workload estimation since existing methods are limited in the face of the non-stationary nature of the signal.
    Summary The review has two aims: to identify the disparate experimental paradigms used for reliably eliciting discrete and quantifiable levels of cognitive load, and to examine the specific nature and representational structure of the input formulations commonly used in deep neural networks (DNNs) for signal classification. The analysis revealed several findings. Many studies used EEG signals in their native representation, a two-dimensional matrix, for offline classification of CWL. Only a few studies adopted an online or pseudo-online classification strategy for real-time CWL estimation. Only a couple of interpretable DNNs and a single generative model have been employed for cognitive load detection so far. Most researchers used DNNs as black-box models. In conclusion, DNNs are valuable tools for classifying EEG signals, primarily because of the substantial modeling power provided by the depth of their network architecture. The review suggests that interpretable and explainable DNN models should be employed for cognitive workload estimation, since existing methods are limited by the non-stationary nature of the signal.

Learning Objective-Specific Active Learning Strategies with Attentive Neural Processes

  • paper_url: http://arxiv.org/abs/2309.05477
  • repo_url: https://github.com/timsey/npal
  • paper_authors: Tim Bakker, Herke van Hoof, Max Welling
  • for: This paper proposes a pool-based active learning method for classification, with the goal of increasing the data efficiency of machine learning models.
  • methods: An Attentive Conditional Neural Process model exploits symmetry and independence properties of the active learning problem to learn the active learning strategy itself, by learning from a myopic oracle.
  • results: The model outperforms a variety of baselines on non-standard objectives and shows a tendency towards improved stability under changing datasets. Performance remains sensitive to the choice of classifier.
    Abstract Pool-based active learning (AL) is a promising technology for increasing data-efficiency of machine learning models. However, surveys show that performance of recent AL methods is very sensitive to the choice of dataset and training setting, making them unsuitable for general application. In order to tackle this problem, the field Learning Active Learning (LAL) suggests to learn the active learning strategy itself, allowing it to adapt to the given setting. In this work, we propose a novel LAL method for classification that exploits symmetry and independence properties of the active learning problem with an Attentive Conditional Neural Process model. Our approach is based on learning from a myopic oracle, which gives our model the ability to adapt to non-standard objectives, such as those that do not equally weight the error on all data points. We experimentally verify that our Neural Process model outperforms a variety of baselines in these settings. Finally, our experiments show that our model exhibits a tendency towards improved stability to changing datasets. However, performance is sensitive to choice of classifier and more work is necessary to reduce the performance the gap with the myopic oracle and to improve scalability. We present our work as a proof-of-concept for LAL on nonstandard objectives and hope our analysis and modelling considerations inspire future LAL work.
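    Summary Pool-based active learning (AL) is a promising technique for increasing the data efficiency of machine learning models. However, surveys show that the performance of recent AL methods is very sensitive to the choice of dataset and training setting, which makes them unsuitable for general application. To address this, the field of Learning Active Learning (LAL) proposes to learn the active learning strategy itself so that it can adapt to the given setting. This work proposes a LAL method for classification based on an Attentive Conditional Neural Process model that exploits symmetry and independence properties of the active learning problem. The approach learns from a myopic oracle, which lets the model adapt to non-standard objectives, such as those that do not weight the error equally on all data points. Experiments show that the Neural Process model outperforms a variety of baselines in these settings and tends towards improved stability under changing datasets. However, performance is sensitive to the choice of classifier, and more work is needed to reduce the performance gap with the myopic oracle and to improve scalability. The authors present the work as a proof of concept for LAL on non-standard objectives and hope that their analysis and modelling considerations inspire future LAL work.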
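
One common reading of a myopic oracle in pool-based active learning is a one-step lookahead: for each pool candidate, retrain with that candidate added and keep the one that most improves a chosen objective on held-out data. The sketch below illustrates that reading with a 1-nearest-neighbour classifier on 1-D features. The classifier, the function names, and the data layout are hypothetical choices for illustration, not the paper's setup. The `objective` argument can be swapped for a non-standard metric, such as one that weights errors unequally.

```python
def predict_1nn(train, x):
    """Label of the training example closest to x; train is a list of (feature, label)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


def accuracy(train, eval_set):
    """Fraction of held-out examples that a 1-NN classifier on `train` labels correctly."""
    hits = sum(1 for x, y in eval_set if predict_1nn(train, x) == y)
    return hits / len(eval_set)


def myopic_oracle_pick(labeled, pool, eval_set, objective=accuracy):
    """Return (index, score) of the pool item whose addition maximizes the objective.

    The oracle peeks at the candidate's true label and at held-out data,
    which is why it is an oracle rather than a deployable strategy.
    """
    best_idx, best_score = None, float("-inf")
    for i, candidate in enumerate(pool):
        score = objective(labeled + [candidate], eval_set)
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx, best_score
```

A learned strategy would be trained to imitate these picks without access to the labels the oracle uses.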

Machine learning the dimension of a Fano variety

  • paper_url: http://arxiv.org/abs/2309.05473
  • repo_url: https://bitbucket.org/fanosearch/mldim
  • paper_authors: Tom Coates, Alexander M. Kasprzyk, Sara Veneziale
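  • for: This study asks whether the quantum period of a Fano variety determines its dimension.
  • methods: Machine learning, specifically a simple feed-forward neural network, is applied to the question.
  • results: A simple feed-forward neural network determines the dimension of a Fano variety with 98% accuracy. Building on this, the authors establish rigorous asymptotics for the quantum periods of a class of Fano varieties, and these asymptotics determine the dimension from the quantum period.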
    Abstract Fano varieties are basic building blocks in geometry - they are `atomic pieces' of mathematical shapes. Recent progress in the classification of Fano varieties involves analysing an invariant called the quantum period. This is a sequence of integers which gives a numerical fingerprint for a Fano variety. It is conjectured that a Fano variety is uniquely determined by its quantum period. If this is true, one should be able to recover geometric properties of a Fano variety directly from its quantum period. We apply machine learning to the question: does the quantum period of X know the dimension of X? Note that there is as yet no theoretical understanding of this. We show that a simple feed-forward neural network can determine the dimension of X with 98% accuracy. Building on this, we establish rigorous asymptotics for the quantum periods of a class of Fano varieties. These asymptotics determine the dimension of X from its quantum period. Our results demonstrate that machine learning can pick out structure from complex mathematical data in situations where we lack theoretical understanding. They also give positive evidence for the conjecture that the quantum period of a Fano variety determines that variety.
    Summary The authors apply machine learning to the question of whether the quantum period of a Fano variety determines its dimension. While there is currently no theoretical understanding of this, a simple feed-forward neural network determines the dimension of a Fano variety with 98% accuracy. Building on this result, the authors establish rigorous asymptotics for the quantum periods of a class of Fano varieties, which determine the dimension of a Fano variety from its quantum period. The findings demonstrate that machine learning can extract structure from complex mathematical data even where theoretical understanding is lacking. They also provide positive evidence for the conjecture that the quantum period of a Fano variety determines that variety.

Unveiling the Sentinels: Assessing AI Performance in Cybersecurity Peer Review

  • paper_url: http://arxiv.org/abs/2309.05457
  • repo_url: None
  • paper_authors: Liang Niu, Nian Xue, Christina Pöpper
  • for: This paper aims to evaluate the performance of AI in reviewing for academic security conferences, specifically by comparing the results obtained from human reviewers and machine-learning models.
  • methods: The paper uses a comprehensive dataset of thousands of papers from computer science conferences and the arXiv preprint website, and evaluates the prediction capabilities of ChatGPT and a two-stage classification approach based on the Doc2Vec model with various classifiers.
  • results: Review outcome prediction using the Doc2Vec-based approach achieves an accuracy of over 90%, significantly outperforming ChatGPT. The paper also identifies the potential advantages and limitations of the tested ML models and explores areas within the paper-reviewing process that can benefit from automated support approaches.
    Abstract Peer review is the method employed by the scientific community for evaluating research advancements. In the field of cybersecurity, the practice of double-blind peer review is the de-facto standard. This paper touches on the holy grail of peer reviewing and aims to shed light on the performance of AI in reviewing for academic security conferences. Specifically, we investigate the predictability of reviewing outcomes by comparing the results obtained from human reviewers and machine-learning models. To facilitate our study, we construct a comprehensive dataset by collecting thousands of papers from renowned computer science conferences and the arXiv preprint website. Based on the collected data, we evaluate the prediction capabilities of ChatGPT and a two-stage classification approach based on the Doc2Vec model with various classifiers. Our experimental evaluation of review outcome prediction using the Doc2Vec-based approach performs significantly better than the ChatGPT and achieves an accuracy of over 90%. While analyzing the experimental results, we identify the potential advantages and limitations of the tested ML models. We explore areas within the paper-reviewing process that can benefit from automated support approaches, while also recognizing the irreplaceable role of human intellect in certain aspects that cannot be matched by state-of-the-art AI techniques.
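    Summary Peer review is the method the scientific community uses to evaluate research advances. In cybersecurity, double-blind review is the standard practice. This paper touches on the holy grail of peer reviewing and examines how AI performs in reviewing for academic security conferences. Specifically, we investigate the predictability of review outcomes by comparing the results obtained from human reviewers and machine-learning models. To this end, we construct a comprehensive dataset of thousands of papers from computer science conferences and the arXiv preprint site. Based on the collected data, we evaluate the prediction capabilities of ChatGPT and of a two-stage classification approach based on the Doc2Vec model. The Doc2Vec-based approach achieves an accuracy of over 90% and performs significantly better than ChatGPT. In analyzing the results, we identify advantages and limitations of the tested ML models and explore parts of the reviewing process that could benefit from automated support, while recognizing the irreplaceable role of human intellect in aspects that state-of-the-art AI techniques cannot match.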

Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation

  • paper_url: http://arxiv.org/abs/2309.05455
  • repo_url: None
  • paper_authors: Anna Deichler, Shivam Mehta, Simon Alexanderson, Jonas Beskow
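  • for: This work develops a system for generating human-like co-speech gestures, built for the GENEA (Generation and Evaluation of Non-verbal Behaviour for Embodied Agents) Challenge 2023.
  • methods: The system builds on an existing diffusion-based motion synthesis model and adds a contrastive speech and motion pretraining (CSMP) module that learns a joint embedding of speech and gesture, aiming to capture the semantic coupling between them. The CSMP output serves as the conditioning signal for the diffusion-based gesture synthesis model, enabling semantically-aware co-speech gesture generation.
  • results: Among the submitted entries, the system received the highest human-likeness and the highest speech-appropriateness ratings, indicating that it is a promising approach to human-like co-speech gestures that carry semantic meaning.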
    Abstract This paper describes a system developed for the GENEA (Generation and Evaluation of Non-verbal Behaviour for Embodied Agents) Challenge 2023. Our solution builds on an existing diffusion-based motion synthesis model. We propose a contrastive speech and motion pretraining (CSMP) module, which learns a joint embedding for speech and gesture with the aim to learn a semantic coupling between these modalities. The output of the CSMP module is used as a conditioning signal in the diffusion-based gesture synthesis model in order to achieve semantically-aware co-speech gesture generation. Our entry achieved highest human-likeness and highest speech appropriateness rating among the submitted entries. This indicates that our system is a promising approach to achieve human-like co-speech gestures in agents that carry semantic meaning.

Quantized Fourier and Polynomial Features for more Expressive Tensor Network Models

  • paper_url: http://arxiv.org/abs/2309.05436
  • repo_url: https://github.com/neuripsANON2023/QFF
  • paper_authors: Frederiek Wesel, Kim Batselier
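  • for: Improving the generalization and accuracy of kernel-machine models on high-dimensional data.
  • methods: Polynomial and Fourier features provide the nonlinear extension; the features are quantized (further tensorized), and the model weights are constrained to a correspondingly quantized tensor network.
  • results: The approach achieves state-of-the-art results on a large-scale regression task, and experiments indicate that the additional tensorization improves generalization.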
    Abstract In the context of kernel machines, polynomial and Fourier features are commonly used to provide a nonlinear extension to linear models by mapping the data to a higher-dimensional space. Unless one considers the dual formulation of the learning problem, which renders exact large-scale learning unfeasible, the exponential increase of model parameters in the dimensionality of the data caused by their tensor-product structure prohibits to tackle high-dimensional problems. One of the possible approaches to circumvent this exponential scaling is to exploit the tensor structure present in the features by constraining the model weights to be an underparametrized tensor network. In this paper we quantize, i.e. further tensorize, polynomial and Fourier features. Based on this feature quantization we propose to quantize the associated model weights, yielding quantized models. We show that, for the same number of model parameters, the resulting quantized models have a higher bound on the VC-dimension as opposed to their non-quantized counterparts, at no additional computational cost while learning from identical features. We verify experimentally how this additional tensorization regularizes the learning problem by prioritizing the most salient features in the data and how it provides models with increased generalization capabilities. We finally benchmark our approach on large regression task, achieving state-of-the-art results on a laptop computer.
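    Summary In kernel machines, polynomial and Fourier features are commonly used to give linear models a nonlinear extension by mapping the data to a higher-dimensional space. Unless one uses the dual formulation of the learning problem, which makes exact large-scale learning infeasible, the tensor-product structure of these features causes the number of model parameters to grow exponentially with the data dimensionality, which rules out high-dimensional problems. One way around this exponential scaling is to exploit the tensor structure of the features by constraining the model weights to an underparametrized tensor network. In this paper we quantize, i.e. further tensorize, polynomial and Fourier features, and correspondingly quantize the model weights. For the same number of parameters, the quantized models have a higher bound on the VC-dimension than their non-quantized counterparts, at no additional computational cost and while learning from identical features. Experiments show that the additional tensorization regularizes the learning problem by prioritizing the most salient features and yields models that generalize better. Finally, we benchmark the approach on a large regression task, achieving state-of-the-art results on a laptop computer.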
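
The idea of "quantizing" Fourier features can be illustrated with a small identity. The sketch below assumes frequencies k = 0 .. 2^K - 1 and features exp(2πikx), which is an illustrative choice and not necessarily the exact feature map of the paper; the function names are hypothetical. Writing k in binary turns the length-2^K feature vector into a Kronecker product of K length-2 factors.

```python
import cmath
from functools import reduce


def kron(a, b):
    """Kronecker product of two vectors given as lists."""
    return [x * y for x in a for y in b]


def fourier_features(x, num_bits):
    """All 2**num_bits features exp(2*pi*i*k*x) for k = 0 .. 2**num_bits - 1."""
    return [cmath.exp(2j * cmath.pi * k * x) for k in range(2 ** num_bits)]


def quantized_fourier_features(x, num_bits):
    """The same vector, built as a Kronecker product of num_bits length-2 factors.

    Factor j contributes exp(2*pi*i * 2**j * x) when bit j of k is set.
    Factors are ordered from the most significant bit to the least significant.
    """
    factors = [
        [1.0, cmath.exp(2j * cmath.pi * (2 ** j) * x)]
        for j in reversed(range(num_bits))
    ]
    return reduce(kron, factors)
```

Because each factor has only two entries, a weight tensor with a matching factorized structure needs far fewer parameters than one weight per frequency, which is the motivation the abstract gives for quantized models.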

  • paper_url: http://arxiv.org/abs/2309.05434
  • repo_url: None
  • paper_authors: Haohui Lu, Shahadat Uddin
  • for: The paper is written for researchers and practitioners working on link prediction in graph machine learning, with applications in disease prediction, social network recommendations, and drug discovery.
  • methods: The paper proposes a novel method called Node Centrality and Similarity Based Parameterised Model (NCSM), which integrates node centrality and similarity measures as edge features in a customized Graph Neural Network (GNN) layer.
  • results: The proposed model outperforms existing state-of-the-art models like Graph Convolutional Networks and Variational Graph Autoencoder across various metrics and datasets.
    Abstract Link prediction is a key aspect of graph machine learning, with applications as diverse as disease prediction, social network recommendations, and drug discovery. It involves predicting new links that may form between network nodes. Despite the clear importance of link prediction, existing models have significant shortcomings. Graph Convolutional Networks, for instance, have been proven to be highly efficient for link prediction on a variety of datasets. However, they encounter severe limitations when applied to short-path networks and ego networks, resulting in poor performance. This presents a critical problem space that this work aims to address. In this paper, we present the Node Centrality and Similarity Based Parameterised Model (NCSM), a novel method for link prediction tasks. NCSM uniquely integrates node centrality and similarity measures as edge features in a customised Graph Neural Network (GNN) layer, effectively leveraging the topological information of large networks. This model represents the first parameterised GNN-based link prediction model that considers topological information. The proposed model was evaluated on five benchmark graph datasets, each comprising thousands of nodes and edges. Experimental results highlight NCSM's superiority over existing state-of-the-art models like Graph Convolutional Networks and Variational Graph Autoencoder, as it outperforms them across various metrics and datasets. This exceptional performance can be attributed to NCSM's innovative integration of node centrality, similarity measures, and its efficient use of topological information.
    Summary Link prediction is a key part of graph machine learning, with applications such as disease prediction, social network recommendations, and drug discovery; its goal is to predict new links that may form in a graph. Existing models have significant shortcomings: Graph Convolutional Networks, for instance, are highly efficient for link prediction on many datasets but perform poorly on short-path networks and ego networks. To address this, the paper presents the Node Centrality and Similarity Based Parameterised Model (NCSM), which integrates node centrality and similarity measures as edge features in a customised Graph Neural Network (GNN) layer and thereby leverages the topological information of large networks. The authors describe it as the first parameterised GNN-based link prediction model that considers topological information. Evaluated on five benchmark graph datasets, each with thousands of nodes and edges, NCSM outperforms state-of-the-art models such as Graph Convolutional Networks and Variational Graph Autoencoder across various metrics and datasets.
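
As a rough illustration of "centrality and similarity measures as edge features", the sketch below computes degree centrality and Jaccard similarity for a candidate node pair. The abstract does not say which measures NCSM uses, so both choices, and all function names, are assumptions made for this example.

```python
def neighbours(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj, node):
    """Degree divided by the maximum possible degree (n - 1)."""
    return len(adj[node]) / (len(adj) - 1)


def jaccard(adj, u, v):
    """Shared neighbours divided by all neighbours of u and v."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def edge_features(adj, u, v):
    """Feature vector for the candidate link (u, v); an assumed layout."""
    return [degree_centrality(adj, u), degree_centrality(adj, v), jaccard(adj, u, v)]


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
adj = neighbours(edges)
features_ad = edge_features(adj, "a", "d")  # candidate link between a and d
```

In a GNN-based model such a vector would be attached to each candidate edge and consumed by the message-passing layer; that part is omitted here.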

Neuromorphic Auditory Perception by Neural Spiketrum

  • paper_url: http://arxiv.org/abs/2309.05430
  • repo_url: None
  • paper_authors: Huajin Tang, Pengjie Gu, Jayawan Wijekoon, MHD Anas Alsakkal, Ziming Wang, Jiangrong Shen, Rui Yan
  • for: This paper aims to develop a neural spike coding model called “spiketrum” to efficiently process auditory signals and enable brain-like intelligence in neuromorphic computing.
  • methods: The paper proposes a spiketrum model that transforms time-varying analog signals into spatiotemporal spike patterns, minimizing information loss and providing informational robustness to neural fluctuations and spike losses.
  • results: The paper demonstrates the effectiveness of the spiketrum model through a neuromorphic cochlear prototype, showing that it can provide a systematic solution for spike-based artificial intelligence by fully exploiting the advantages of spike-based computation.
    Abstract Neuromorphic computing holds the promise to achieve the energy efficiency and robust learning performance of biological neural systems. To realize the promised brain-like intelligence, it needs to solve the challenges of the neuromorphic hardware architecture design of biological neural substrate and the hardware amicable algorithms with spike-based encoding and learning. Here we introduce a neural spike coding model termed spiketrum, to characterize and transform the time-varying analog signals, typically auditory signals, into computationally efficient spatiotemporal spike patterns. It minimizes the information loss occurring at the analog-to-spike transformation and possesses informational robustness to neural fluctuations and spike losses. The model provides a sparse and efficient coding scheme with precisely controllable spike rate that facilitates training of spiking neural networks in various auditory perception tasks. We further investigate the algorithm-hardware co-designs through a neuromorphic cochlear prototype which demonstrates that our approach can provide a systematic solution for spike-based artificial intelligence by fully exploiting its advantages with spike-based computation.

Temporal Patience: Efficient Adaptive Deep Learning for Embedded Radar Data Processing

  • paper_url: http://arxiv.org/abs/2309.05686
  • repo_url: None
  • paper_authors: Max Sponner, Julius Ott, Lorenzo Servadei, Bernd Waschneck, Robert Wille, Akash Kumar
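  • for: This paper aims to make deep learning inference more efficient so that radar data can be processed in real time on resource-constrained embedded devices.
  • methods: The paper exploits the temporal correlation in streaming radar data to improve Early Exit Neural Networks, which add classifier branches between hidden layers so that inference can terminate early.
  • results: The techniques reduce computational cost with minimal loss of accuracy, saving up to 26% of operations per inference compared with a Single Exit Network. They can also be combined with traditional optimizations, which makes them usable on resource-constrained embedded devices.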
    Abstract Radar sensors offer power-efficient solutions for always-on smart devices, but processing the data streams on resource-constrained embedded platforms remains challenging. This paper presents novel techniques that leverage the temporal correlation present in streaming radar data to enhance the efficiency of Early Exit Neural Networks for Deep Learning inference on embedded devices. These networks add additional classifier branches between the architecture's hidden layers that allow for an early termination of the inference if their result is deemed sufficient enough by an at-runtime decision mechanism. Our methods enable more informed decisions on when to terminate the inference, reducing computational costs while maintaining a minimal loss of accuracy. Our results demonstrate that our techniques save up to 26% of operations per inference over a Single Exit Network and 12% over a confidence-based Early Exit version. Our proposed techniques work on commodity hardware and can be combined with traditional optimizations, making them accessible for resource-constrained embedded platforms commonly used in smart devices. Such efficiency gains enable real-time radar data processing on resource-constrained platforms, allowing for new applications in the context of smart homes, Internet-of-Things, and human-computer interaction.
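    Summary Radar sensors offer power-efficient solutions for always-on smart devices, but processing their data streams on resource-constrained embedded platforms remains challenging. This paper presents new techniques that use the temporal correlation in streaming radar data to make Early Exit Neural Networks more efficient for deep learning inference on embedded devices. These networks add extra classifier branches between the hidden layers, and an at-runtime decision mechanism terminates inference early when a branch's result is deemed sufficient. Our methods allow better-informed decisions about when to terminate, reducing computational cost with minimal loss of accuracy. The techniques save up to 26% of operations per inference compared with a Single Exit Network. They can also be combined with traditional optimizations, which makes them usable on resource-constrained embedded devices. These efficiency gains enable real-time radar data processing on such platforms and open up new applications in smart homes, the Internet of Things, and human-computer interaction.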
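
For orientation, here is a minimal sketch of the confidence-based early exit rule that the abstract names as a baseline. It is not the paper's temporal method. The function names and the threshold value are illustrative assumptions, and the branch logits are passed in precomputed purely for readability; a real network would stop computing deeper layers once an exit is taken.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(exit_logits, threshold):
    """Return (predicted_class, index_of_exit_used).

    exit_logits holds one logit vector per classifier branch, ordered from the
    shallowest branch to the final classifier. The first branch whose top
    probability reaches the threshold decides; the last branch always decides.
    """
    for i, logits in enumerate(exit_logits):
        probs = softmax(logits)
        confidence = max(probs)
        is_last = i == len(exit_logits) - 1
        if confidence >= threshold or is_last:
            return probs.index(confidence), i


branches = [[0.1, 0.2, 0.0], [5.0, 0.0, 0.0]]
strict = early_exit_predict(branches, threshold=0.9)   # shallow branch is unsure
lenient = early_exit_predict(branches, threshold=0.3)  # shallow branch suffices
```

A temporal variant could, for example, also take the previous frame's decision into account before exiting; how the paper does this is not specified in the abstract.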

Learning noise-induced transitions by multi-scaling reservoir computing

  • paper_url: http://arxiv.org/abs/2309.05413
  • repo_url: None
  • paper_authors: Zequn Lin, Zhaofan Lu, Zengru Di, Ying Tang
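  • for: This study uses a machine learning model, reservoir computing (a type of recurrent neural network), to capture noise-induced transitions in time series.
  • methods: The study develops a concise protocol for tuning the hyperparameters of the reservoir, focusing on one pivotal hyperparameter that controls the time scale of the reservoir dynamics.
  • results: The trained model produces accurate statistics of transition time and number of transitions. It also captures transitions in multi-stable systems and learns transition times from experimental protein-folding data.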
    Abstract Noise is usually regarded as adversarial to extract the effective dynamics from time series, such that the conventional data-driven approaches usually aim at learning the dynamics by mitigating the noisy effect. However, noise can have a functional role of driving transitions between stable states underlying many natural and engineered stochastic dynamics. To capture such stochastic transitions from data, we find that leveraging a machine learning model, reservoir computing as a type of recurrent neural network, can learn noise-induced transitions. We develop a concise training protocol for tuning hyperparameters, with a focus on a pivotal hyperparameter controlling the time scale of the reservoir dynamics. The trained model generates accurate statistics of transition time and the number of transitions. The approach is applicable to a wide class of systems, including a bistable system under a double-well potential, with either white noise or colored noise. It is also aware of the asymmetry of the double-well potential, the rotational dynamics caused by non-detailed balance, and transitions in multi-stable systems. For the experimental data of protein folding, it learns the transition time between folded states, providing a possibility of predicting transition statistics from a small dataset. The results demonstrate the capability of machine-learning methods in capturing noise-induced phenomena.
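    Summary Noise is usually regarded as an obstacle to extracting the effective dynamics from time series, so conventional data-driven approaches try to mitigate it. However, noise can play a functional role in driving transitions between stable states. To capture such stochastic transitions from data, we find that a machine learning model, reservoir computing as a type of recurrent neural network, can learn noise-induced transitions. We develop a concise training protocol for tuning hyperparameters, focusing on the pivotal hyperparameter that controls the time scale of the reservoir dynamics. The trained model generates accurate statistics of transition time and number of transitions. The approach applies to a wide class of systems, including a bistable system in a double-well potential under white or colored noise. It also captures the rotational dynamics caused by non-detailed balance and transitions in multi-stable systems. For experimental protein-folding data, it learns the transition time between folded states, which offers the possibility of predicting transition statistics from a small dataset. The results demonstrate that machine-learning methods can capture noise-induced phenomena.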

Physics-informed reinforcement learning via probabilistic co-adjustment functions

  • paper_url: http://arxiv.org/abs/2309.05404
  • repo_url: None
  • paper_authors: Nat Wannawas, A. Aldo Faisal
  • for: The paper addresses training reinforcement learning systems on real-world tasks, which is typically data-inefficient and relies on simulation-based modeling.
  • methods: The paper introduces two novel approaches, co-kriging adjustments (CKA) and ridge regression adjustment (RRA), that combine the advantages of learning individual system dynamics and using simulation models. Both are based on an auto-regressive AR1 co-kriging model integrated with Gaussian process priors.
  • results: The paper demonstrates the efficiency of co-kriging adjustment on an interpretable reinforcement learning control example: learning to control a biomechanical human arm using only a two-link arm simulation model and CKA derived from a small amount of interaction data. The method provides more accurate uncertainty quantification of the entire system's dynamics than pure GP-based and AR1 methods.
    Abstract Reinforcement learning of real-world tasks is very data inefficient, and extensive simulation-based modelling has become the dominant approach for training systems. However, in human-robot interaction and many other real-world settings, there is no appropriate one-model-for-all due to differences in individual instances of the system (e.g. different people) or necessary oversimplifications in the simulation models. This requires two approaches: 1. either learning the individual system's dynamics approximately from data which requires data-intensive training or 2. using a complete digital twin of the instances, which may not be realisable in many cases. We introduce two approaches: co-kriging adjustments (CKA) and ridge regression adjustment (RRA) as novel ways to combine the advantages of both approaches. Our adjustment methods are based on an auto-regressive AR1 co-kriging model that we integrate with GP priors. This yield a data- and simulation-efficient way of using simplistic simulation models (e.g., simple two-link model) and rapidly adapting them to individual instances (e.g., biomechanics of individual people). Using CKA and RRA, we obtain more accurate uncertainty quantification of the entire system's dynamics than pure GP-based and AR1 methods. We demonstrate the efficiency of co-kriging adjustment with an interpretable reinforcement learning control example, learning to control a biomechanical human arm using only a two-link arm simulation model (offline part) and CKA derived from a small amount of interaction data (on-the-fly online). Our method unlocks an efficient and uncertainty-aware way to implement reinforcement learning methods in real world complex systems for which only imperfect simulation models exist.
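    Summary Reinforcement learning of real-world tasks is very data-inefficient, and extensive simulation-based modelling has become the dominant way to train systems. However, in human-robot interaction and many other real-world settings no single model fits all cases, because individual instances of the system differ (e.g. different people) or because the simulation models are necessarily oversimplified. This leaves two options: (1) learn the individual system's dynamics approximately from data, which requires data-intensive training, or (2) use a complete digital twin of each instance, which is often not realisable. We introduce two new approaches, co-kriging adjustments (CKA) and ridge regression adjustment (RRA), that combine the advantages of both. The adjustment methods are based on an auto-regressive AR1 co-kriging model integrated with GP priors. This gives a data- and simulation-efficient way to use simple simulation models (e.g. a two-link model) and adapt them rapidly to individual instances (e.g. the biomechanics of individual people). With CKA and RRA we obtain more accurate uncertainty quantification of the entire system's dynamics than pure GP-based and AR1 methods. We demonstrate co-kriging adjustment on an interpretable reinforcement learning control example: learning to control a biomechanical human arm using only a two-link arm simulation model (offline part) and CKA derived from a small amount of interaction data (online part). The method offers an efficient, uncertainty-aware way to apply reinforcement learning to complex real-world systems for which only imperfect simulation models exist.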

Practical Homomorphic Aggregation for Byzantine ML

  • paper_url: http://arxiv.org/abs/2309.05395
  • repo_url: None
  • paper_authors: Antoine Choffrut, Rachid Guerraoui, Rafael Pinot, Renaud Sirdey, John Stephan, Martin Zuber
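  • for: This paper studies security and privacy in distributed learning, where Byzantine nodes can corrupt training and a curious server can violate the nodes' privacy.
  • methods: The paper introduces a novel plaintext encoding method that makes it possible to implement a robust aggregator homomorphically over batching-friendly BGV, and that also accelerates state-of-the-art homomorphic sorting.
  • results: Experiments on image classification show that the algorithm achieves practical execution times while matching the ML performance of its non-private counterpart.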
    Abstract Due to the large-scale availability of data, machine learning (ML) algorithms are being deployed in distributed topologies, where different nodes collaborate to train ML models over their individual data by exchanging model-related information (e.g., gradients) with a central server. However, distributed learning schemes are notably vulnerable to two threats. First, Byzantine nodes can single-handedly corrupt the learning by sending incorrect information to the server, e.g., erroneous gradients. The standard approach to mitigate such behavior is to use a non-linear robust aggregation method at the server. Second, the server can violate the privacy of the nodes. Recent attacks have shown that exchanging (unencrypted) gradients enables a curious server to recover the totality of the nodes' data. The use of homomorphic encryption (HE), a gold standard security primitive, has extensively been studied as a privacy-preserving solution to distributed learning in non-Byzantine scenarios. However, due to HE's large computational demand especially for high-dimensional ML models, there has not yet been any attempt to design purely homomorphic operators for non-linear robust aggregators. In this work, we present SABLE, the first completely homomorphic and Byzantine robust distributed learning algorithm. SABLE essentially relies on a novel plaintext encoding method that enables us to implement the robust aggregator over batching-friendly BGV. Moreover, this encoding scheme also accelerates state-of-the-art homomorphic sorting with larger security margins and smaller ciphertext size. We perform extensive experiments on image classification tasks and show that our algorithm achieves practical execution times while matching the ML performance of its non-private counterpart.
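    Summary Because data is available at large scale, machine learning (ML) algorithms are deployed in distributed topologies in which nodes jointly train models on their own data by exchanging model-related information (e.g. gradients) with a central server. Distributed learning schemes face two threats. First, Byzantine nodes can single-handedly corrupt learning by sending incorrect information, such as erroneous gradients, to the server; the standard mitigation is a non-linear robust aggregation method at the server. Second, the server can violate the nodes' privacy: recent attacks show that exchanging unencrypted gradients lets a curious server recover the nodes' data in full. Homomorphic encryption (HE), a gold-standard security primitive, has been studied extensively as a privacy-preserving solution for distributed learning in non-Byzantine scenarios. However, because of HE's computational cost, especially for high-dimensional ML models, no one has yet attempted to design purely homomorphic operators for non-linear robust aggregators. This work presents SABLE, the first completely homomorphic and Byzantine-robust distributed learning algorithm. SABLE relies on a novel plaintext encoding method that makes it possible to implement the robust aggregator over batching-friendly BGV. The encoding also accelerates state-of-the-art homomorphic sorting, with larger security margins and smaller ciphertext size. Extensive experiments on image classification show that the algorithm achieves practical execution times while matching the ML performance of its non-private counterpart.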
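
To make "non-linear robust aggregation" concrete, the sketch below compares plain averaging with the coordinate-wise median on unencrypted gradients. The coordinate-wise median is one common robust aggregator and is used here as an assumption; the abstract does not state which aggregator SABLE implements, and nothing in this example is homomorphic.

```python
from statistics import mean, median


def aggregate(gradients, rule):
    """Apply rule (e.g. mean or median) coordinate-wise across the nodes' gradients."""
    return [rule(coords) for coords in zip(*gradients)]


honest = [[1.0, -2.0], [1.2, -1.8], [0.8, -2.2]]
byzantine = [[1000.0, 1000.0]]  # a single node sending an arbitrary vector

averaged = aggregate(honest + byzantine, mean)    # dragged far from the honest values
robust = aggregate(honest + byzantine, median)    # stays close to the honest values
```

The median needs comparisons or sorting, which are non-linear operations. That is why running such an aggregator under homomorphic encryption is much harder than running an average, and why the abstract highlights homomorphic sorting.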

Career Path Recommendations for Long-term Income Maximization: A Reinforcement Learning Approach

  • paper_url: http://arxiv.org/abs/2309.05391
  • repo_url: None
  • paper_authors: Spyros Avlonitis, Dor Lavi, Masoud Mansoury, David Graus
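  • for: Enhancing career planning and increasing employees' long-term income.
  • methods: Career planning is formulated as a Markov Decision Process (MDP), and algorithms such as Sarsa, Q-Learning, and A2C learn policies that recommend career paths.
  • results: RL models, particularly Q-Learning and Sarsa, improve employees' income trajectories by 5% on average compared with observed career paths.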
    Abstract This study explores the potential of reinforcement learning algorithms to enhance career planning processes. Leveraging data from Randstad The Netherlands, the study simulates the Dutch job market and develops strategies to optimize employees' long-term income. By formulating career planning as a Markov Decision Process (MDP) and utilizing machine learning algorithms such as Sarsa, Q-Learning, and A2C, we learn optimal policies that recommend career paths with high-income occupations and industries. The results demonstrate significant improvements in employees' income trajectories, with RL models, particularly Q-Learning and Sarsa, achieving an average increase of 5% compared to observed career paths. The study acknowledges limitations, including narrow job filtering, simplifications in the environment formulation, and assumptions regarding employment continuity and zero application costs. Future research can explore additional objectives beyond income optimization and address these limitations to further enhance career planning processes.
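
The two tabular algorithms named in the abstract differ only in the target they bootstrap from. The sketch below shows both update rules on a plain dictionary Q-table. Reading states as jobs, actions as job moves, and rewards as income is an assumption made for illustration, and the function names are hypothetical.

```python
def q_learning_update(q, s, a, reward, s_next, alpha, gamma):
    """Off-policy update: bootstrap from the best action in the next state."""
    target = reward + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def sarsa_update(q, s, a, reward, s_next, a_next, alpha, gamma):
    """On-policy update: bootstrap from the action actually taken next."""
    target = reward + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])


# Toy Q-table: q[state][action]
q_table = {0: [0.0, 0.0], 1: [1.0, 2.0]}
q_learning_update(q_table, s=0, a=1, reward=1.0, s_next=1, alpha=0.5, gamma=0.9)
```

With a discount factor close to 1, both rules weight future income almost as much as immediate income, which matches the long-term objective in the title.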

Data-Driven Model Reduction and Nonlinear Model Predictive Control of an Air Separation Unit by Applied Koopman Theory

  • paper_url: http://arxiv.org/abs/2309.05386
  • repo_url: None
  • paper_authors: Jan C. Schulze, Danimir T. Doncevic, Nils Erwes, Alexander Mitsos
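  • for: Real-time capability is a prerequisite for industrial use of nonlinear model predictive control (NMPC). Data-driven model reduction offers a way to obtain low-order control models while requiring little expert knowledge of the particular process and its model.
  • methods: The authors apply the data-driven reduction strategy of Schulze et al. (2022), based on Koopman theory, to generate a low-order control model that is constructed using machine learning.
  • results: The reduction strategy, together with a tailored NMPC implementation, enables real-time NMPC of the air separation unit (ASU) with an average CPU time decrease of 98%.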
    Abstract Achieving real-time capability is an essential prerequisite for the industrial implementation of nonlinear model predictive control (NMPC). Data-driven model reduction offers a way to obtain low-order control models from complex digital twins. In particular, data-driven approaches require little expert knowledge of the particular process and its model, and provide reduced models of a well-defined generic structure. Herein, we apply our recently proposed data-driven reduction strategy based on Koopman theory [Schulze et al. (2022), Comput. Chem. Eng.] to generate a low-order control model of an air separation unit (ASU). The reduced Koopman model combines autoencoders and linear latent dynamics and is constructed using machine learning. Further, we present an NMPC implementation that uses derivative computation tailored to the fixed block structure of reduced Koopman models. Our reduction approach with tailored NMPC implementation enables real-time NMPC of an ASU at an average CPU time decrease by 98 %.
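    Summary Real-time capability is an essential prerequisite for the industrial implementation of nonlinear model predictive control (NMPC). Data-driven model reduction offers a way to obtain low-order control models from complex digital twins. In particular, data-driven approaches require little expert knowledge of the particular process and its model, and they provide reduced models with a well-defined generic structure. Here we apply our recently proposed data-driven reduction strategy based on Koopman theory to generate a low-order control model of an air separation unit (ASU). The reduced Koopman model combines autoencoders with linear latent dynamics and is constructed using machine learning. We also present an NMPC implementation whose derivative computation is tailored to the fixed block structure of reduced Koopman models. Together, the reduction approach and the tailored NMPC implementation enable real-time NMPC of an ASU with an average CPU time decrease of 98%.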

EDAC: Efficient Deployment of Audio Classification Models For COVID-19 Detection

  • paper_url: http://arxiv.org/abs/2309.05357
  • repo_url: https://github.com/edac-ml4h/edac-ml4h
  • paper_authors: Andrej Jovanović, Mario Mihaly, Lennon Donaldson
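  • for: This study targets cheap, deployable COVID-19 pre-screening by making audio classification models small and fast enough to run at the edge.
  • methods: The authors recreate two deep neural network models that detect COVID-19 from cough audio recordings and compress them with network pruning and quantisation.
  • results: Both models are compressed without reducing predictive performance, with 105.76x and 19.34x reductions in compressed model file size and corresponding 1.37x and 1.71x reductions in inference time.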
    Abstract The global spread of COVID-19 had severe consequences for public health and the world economy. The quick onset of the pandemic highlighted the potential benefits of cheap and deployable pre-screening methods to monitor the prevalence of the disease in a population. Various researchers made use of machine learning methods in an attempt to detect COVID-19. The solutions leverage various input features, such as CT scans or cough audio signals, with state-of-the-art results arising from deep neural network architectures. However, larger models require more compute; a pertinent consideration when deploying to the edge. To address this, we first recreated two models that use cough audio recordings to detect COVID-19. Through applying network pruning and quantisation, we were able to compress these two architectures without reducing the model's predictive performance. Specifically, we were able to achieve an 105.76x and an 19.34x reduction in the compressed model file size with corresponding 1.37x and 1.71x reductions in the inference times of the two models.
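    Summary The global spread of COVID-19 had severe consequences for public health and the world economy. The quick onset of the pandemic highlighted the potential benefits of cheap, deployable pre-screening methods for monitoring the prevalence of the disease in a population. Various researchers have used machine learning methods to detect COVID-19. These solutions use different input features, such as CT scans or cough audio signals, with state-of-the-art results coming from deep neural network architectures. However, larger models require more compute, which matters when deploying to the edge. To address this, we first recreated two models that detect COVID-19 from cough audio recordings. By applying network pruning and quantisation, we compressed both architectures without reducing predictive performance. Specifically, we achieved 105.76x and 19.34x reductions in compressed model file size, with corresponding 1.37x and 1.71x reductions in inference time.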
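
The two compression steps can be sketched on a flat list of weights. Magnitude-based pruning and symmetric 8-bit quantisation are assumed here as typical variants; the abstract does not say which pruning criterion or quantisation scheme EDAC uses, and the function names are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest absolute value."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric linear quantisation to integers in [-127, 127]; returns (ints, scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(values, scale):
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in values]


weights = [0.5, -0.05, 0.02, -1.0]
pruned = magnitude_prune(weights, sparsity=0.5)
q_values, q_scale = quantize_int8(pruned)
restored = dequantize(q_values, q_scale)
```

Zeros compress well on disk and 8-bit integers take a quarter of the space of 32-bit floats, which is how file size can shrink much more than inference time.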

Neural Discovery of Permutation Subgroups

  • paper_url: http://arxiv.org/abs/2309.05352
  • repo_url: None
  • paper_authors: Pavan Karjol, Rohan Kashyap, Prathosh A P
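  • for: Discovering a subgroup $H$ of the permutation group $S_{n}$.
  • methods: The underlying subgroup is discovered by learning an $S_{n}$-invariant function and a linear transformation.
  • results: Any subgroup of type $S_{k} (k \leq n)$ can be discovered, and similar results are proved for cyclic and dihedral subgroups.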
    Abstract We consider the problem of discovering subgroup $H$ of permutation group $S_{n}$. Unlike the traditional $H$-invariant networks wherein $H$ is assumed to be known, we present a method to discover the underlying subgroup, given that it satisfies certain conditions. Our results show that one could discover any subgroup of type $S_{k} (k \leq n)$ by learning an $S_{n}$-invariant function and a linear transformation. We also prove similar results for cyclic and dihedral subgroups. Finally, we provide a general theorem that can be extended to discover other subgroups of $S_{n}$. We also demonstrate the applicability of our results through numerical experiments on image-digit sum and symmetric polynomial regression tasks.
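    Summary We consider the problem of discovering a subgroup $H$ of the permutation group $S_{n}$. Unlike traditional $H$-invariant networks, where $H$ is assumed to be known, we present a method that discovers the underlying subgroup, provided it satisfies certain conditions. Our results show that any subgroup of type $S_{k} (k \leq n)$ can be discovered by learning an $S_{n}$-invariant function and a linear transformation. We prove similar results for cyclic and dihedral subgroups. Finally, we give a general theorem that can be extended to discover other subgroups of $S_{n}$. Numerical experiments on image-digit sum and symmetric polynomial regression tasks demonstrate the applicability of the results.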
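
What it means for a function to be invariant under a subgroup of $S_{n}$ can be checked by brute force for small $n$. The sketch below only illustrates the notion of $H$-invariance; it is not the paper's discovery method, and the two example functions are made up for this purpose.

```python
from itertools import permutations


def invariance_group(f, x):
    """Return every index permutation p with f(x permuted by p) == f(x).

    Checking a single input x gives a superset of the true invariance group;
    integer inputs keep the equality test exact.
    """
    base = f(x)
    return [p for p in permutations(range(len(x)))
            if f([x[i] for i in p]) == base]


def f_sym(x):
    """Invariant under all of S_4: a sum over all coordinates."""
    return sum(v * v for v in x)


def f_partial(x):
    """Symmetric only in the first two coordinates (an S_2 acting on positions 0, 1)."""
    return x[0] ** 2 + x[1] ** 2 + 2 * x[2] + 3 * x[3]


x = [1, 2, 4, 8]
full_group = invariance_group(f_sym, x)
small_group = invariance_group(f_partial, x)
```

Here `full_group` contains all 24 permutations, while `small_group` contains only the identity and the swap of the first two positions.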

A DRL-based Reflection Enhancement Method for RIS-assisted Multi-receiver Communications

  • paper_url: http://arxiv.org/abs/2309.05343
  • repo_url: None
  • paper_authors: Wei Wang, Peizheng Li, Angela Doufexi, Mark A Beach
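  • for: This study aims to optimize the pointing accuracy and reflection intensity of RIS-assisted wireless communication systems that combine periodic single-reflection profiles.
  • methods: The paper proposes a deep reinforcement learning (DRL)-based optimization method to counter the amplitude/phase counteractions that arise when periodic single-reflection profiles are superposed.
  • results: Compared with random search and exhaustive search, the DRL method performs better and has the shortest optimization time, achieving a 1.2 dB gain in reflection peak gain and a broader beam without any hardware modifications.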
    Abstract In reconfigurable intelligent surface (RIS)-assisted wireless communication systems, the pointing accuracy and intensity of reflections depend crucially on the 'profile,' representing the amplitude/phase state information of all elements in a RIS array. The superposition of multiple single-reflection profiles enables multi-reflection for distributed users. However, the optimization challenges from periodic element arrangements in single-reflection and multi-reflection profiles are understudied. The combination of periodical single-reflection profiles leads to amplitude/phase counteractions, affecting the performance of each reflection beam. This paper focuses on a dual-reflection optimization scenario and investigates the far-field performance deterioration caused by the misalignment of overlapped profiles. To address this issue, we introduce a novel deep reinforcement learning (DRL)-based optimization method. Comparative experiments against random and exhaustive searches demonstrate that our proposed DRL method outperforms both alternatives, achieving the shortest optimization time. Remarkably, our approach achieves a 1.2 dB gain in the reflection peak gain and a broader beam without any hardware modifications.
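    Summary In reconfigurable intelligent surface (RIS)-assisted wireless communication systems, pointing accuracy and reflection intensity depend on the profile, i.e., the amplitude/phase state information of all array elements. Superposing multiple single-reflection profiles enables multi-reflection for distributed users. However, the optimization challenges arising from periodic element arrangements in single-reflection and multi-reflection profiles remain understudied. In a dual-reflection optimization scenario, we examine the far-field performance degradation caused by misalignment of overlapped profiles. To address this, we propose a deep reinforcement learning (DRL)-based optimization method. Compared with random and exhaustive searches, the proposed DRL method achieves the shortest optimization time, and it yields a 1.2 dB improvement in reflection peak gain and a broader beam without hardware modification.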

Stochastic Gradient Descent-like relaxation is equivalent to Glauber dynamics in discrete optimization and inference problems

  • paper_url: http://arxiv.org/abs/2309.05337
  • repo_url: None
  • paper_authors: Maria Chiara Angelini, Angelo Giorgio Cavaliere, Raffaele Marino, Federico Ricci-Tersenghi
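  • for: This paper investigates whether Stochastic Gradient Descent (SGD) and Glauber dynamics are substantially different, and aims to clarify the relationship between the two algorithms.
  • methods: The paper compares the dynamics of an SGD-like algorithm with those of Metropolis Monte Carlo in discrete optimization and inference problems.
  • results: In discrete optimization and inference problems, the dynamics of the SGD-like algorithm closely match those of Metropolis Monte Carlo at a properly chosen temperature that depends on the mini-batch size, even though SGD does not satisfy detailed balance. This correspondence allows known results on the performance and limits of Monte Carlo algorithms to be used to optimize the mini-batch size of the SGD-like algorithm, so that it recovers the signal efficiently in hard inference problems.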
    Abstract Is Stochastic Gradient Descent (SGD) substantially different from Glauber dynamics? This is a fundamental question at the time of understanding the most used training algorithm in the field of Machine Learning, but it received no answer until now. Here we show that in discrete optimization and inference problems, the dynamics of an SGD-like algorithm resemble very closely that of Metropolis Monte Carlo with a properly chosen temperature, which depends on the mini-batch size. This quantitative matching holds both at equilibrium and in the out-of-equilibrium regime, despite the two algorithms having fundamental differences (e.g., SGD does not satisfy detailed balance). Such equivalence allows us to use results about performances and limits of Monte Carlo algorithms to optimize the mini-batch size in the SGD-like algorithm and make it efficient at recovering the signal in hard inference problems.
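    As a point of reference for the comparison described in the abstract, the following is a minimal sketch of single-spin-flip Metropolis Monte Carlo on a toy spin system. The energy function, the couplings `J`, and the names `energy` and `metropolis_sweep` are illustrative assumptions, not the authors' setup, and the paper's SGD-like algorithm is not reproduced here.

```python
import math
import random

def energy(spins, J):
    """Energy E = -sum_{i<j} J[i][j] * s_i * s_j of a spin configuration."""
    n = len(spins)
    return -sum(J[i][j] * spins[i] * spins[j]
                for i in range(n) for j in range(i + 1, n))

def metropolis_sweep(spins, J, temperature, rng):
    """One sweep: propose flipping each spin once and apply the Metropolis rule."""
    n = len(spins)
    for i in range(n):
        # Energy change if spin i is flipped: dE = 2 * s_i * sum_j J_ij * s_j
        local_field = sum(J[i][j] * spins[j] for j in range(n) if j != i)
        delta_e = 2.0 * spins[i] * local_field
        # Always accept downhill moves; accept uphill moves with prob exp(-dE / T)
        if delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature):
            spins[i] = -spins[i]
    return spins

rng = random.Random(0)
n = 8
# Hypothetical toy instance: uniform ferromagnetic couplings
J = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
spins = [rng.choice([-1, 1]) for _ in range(n)]
e_start = energy(spins, J)
for _ in range(50):
    metropolis_sweep(spins, J, temperature=0.1, rng=rng)
e_end = energy(spins, J)
```

    In this sketch the temperature is a free parameter; the abstract's claim is that an SGD-like algorithm behaves like such a sampler at a temperature set by the mini-batch size.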

A Strong and Simple Deep Learning Baseline for BCI MI Decoding

  • paper_url: http://arxiv.org/abs/2309.07159
  • repo_url: https://github.com/elouayas/eegsimpleconv
  • paper_authors: Yassine El Ouahidi, Vincent Gripon, Bastien Pasdeloup, Ghaith Bouallegue, Nicolas Farrugia, Giulia Lioi
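  • for: This paper proposes EEG-SimpleConv, a simple 1D convolutional neural network for Motor Imagery decoding in BCI.
  • methods: The model uses only standard components from the literature: a 1D convolutional neural network and a simple training strategy.
  • results: On four EEG Motor Imagery datasets, EEG-SimpleConv is at least as good as, or far more efficient than, other approaches. It shows strong knowledge transfer across subjects and has a low inference time.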
    Abstract We propose EEG-SimpleConv, a straightforward 1D convolutional neural network for Motor Imagery decoding in BCI. Our main motivation is to propose a very simple baseline to compare to, using only very standard ingredients from the literature. We evaluate its performance on four EEG Motor Imagery datasets, including simulated online setups, and compare it to recent Deep Learning and Machine Learning approaches. EEG-SimpleConv is at least as good or far more efficient than other approaches, showing strong knowledge-transfer capabilities across subjects, at the cost of a low inference time. We advocate that using off-the-shelf ingredients rather than coming with ad-hoc solutions can significantly help the adoption of Deep Learning approaches for BCI. We make the code of the models and the experiments accessible.
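    Summary We propose EEG-SimpleConv, a simple 1D convolutional neural network for Motor Imagery decoding in BCI. Our main motivation is to provide a very simple baseline built from standard ingredients in the literature. We evaluate it on four EEG Motor Imagery datasets and compare it with recent deep learning and machine learning approaches. EEG-SimpleConv is at least as good as, or far more efficient than, the other approaches, and shows strong knowledge transfer across subjects with a low inference time. We argue that using off-the-shelf ingredients rather than ad-hoc solutions can help the adoption of deep learning approaches for BCI. The code for the models and experiments is publicly available.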

Neural Koopman prior for data assimilation

  • paper_url: http://arxiv.org/abs/2309.05317
  • repo_url: https://github.com/anthony-frion/sentinel2ts
  • paper_authors: Anthony Frion, Lucas Drumetz, Mauro Dalla Mura, Guillaume Tochon, Abdeldjalil Aïssa El Bey
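  • for: This paper describes how to model dynamical systems with a neural network trained on observation data.
  • methods: The paper uses Koopman operator theory to embed dynamical systems in a latent space where their dynamics can be described linearly. It also introduces methods for training such a model, including self-supervised learning and variational data assimilation.
  • results: Experiments show that the model supports long-term continuous reconstruction even when the data come as irregularly sampled time series. The paper also demonstrates the use of trained dynamical models as priors for variational data assimilation, with applications such as time series interpolation and forecasting.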
    Abstract With the increasing availability of large scale datasets, computational power and tools like automatic differentiation and expressive neural network architectures, sequential data are now often treated in a data-driven way, with a dynamical model trained from the observation data. While neural networks are often seen as uninterpretable black-box architectures, they can still benefit from physical priors on the data and from mathematical knowledge. In this paper, we use a neural network architecture which leverages the long-known Koopman operator theory to embed dynamical systems in latent spaces where their dynamics can be described linearly, enabling a number of appealing features. We introduce methods that enable to train such a model for long-term continuous reconstruction, even in difficult contexts where the data comes in irregularly-sampled time series. The potential for self-supervised learning is also demonstrated, as we show the promising use of trained dynamical models as priors for variational data assimilation techniques, with applications to e.g. time series interpolation and forecasting.
    Summary In this paper, we use a neural network architecture that leverages the long-known Koopman operator theory to embed dynamical systems in latent spaces where their dynamics can be described linearly, allowing for several appealing features. We introduce methods that enable training such a model for long-term continuous reconstruction, even in difficult contexts where the data comes in irregularly-sampled time series. Additionally, we demonstrate the potential for self-supervised learning by showing how trained dynamical models can be used as priors for variational data assimilation techniques, with applications to time series interpolation and forecasting.
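    To illustrate why linear latent dynamics suit irregularly sampled series, here is a minimal sketch in which a 2-D latent state evolves under dz/dt = L z and can be advanced by any time step in closed form. The encoder and decoder are omitted, and the function `advance` and the parameters `decay` and `omega` are hypothetical choices for illustration, not the authors' architecture.

```python
import math

def advance(z, dt, decay, omega):
    """Advance a 2-D latent state by time dt under dz/dt = L z,
    with L = [[decay, -omega], [omega, decay]], so exp(dt * L) is a scaled rotation."""
    scale = math.exp(decay * dt)
    c, s = math.cos(omega * dt), math.sin(omega * dt)
    return (scale * (c * z[0] - s * z[1]),
            scale * (s * z[0] + c * z[1]))

z0 = (1.0, 0.0)
# Irregular observation times: the same operator handles any step size
times = [0.0, 0.3, 1.0, 1.25]
trajectory = [z0]
for t_prev, t_next in zip(times, times[1:]):
    step = t_next - t_prev
    trajectory.append(advance(trajectory[-1], step, decay=-0.1, omega=2.0))

# Jumping directly from t=0 to t=1.25 reaches the same latent state
direct = advance(z0, 1.25, decay=-0.1, omega=2.0)
```

    Because the latent flow composes exactly across steps of different lengths, a model of this kind can be queried at arbitrary times, which is the property the abstract relies on for interpolation and forecasting.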

Balance Measures Derived from Insole Sensor Differentiate Prodromal Dementia with Lewy Bodies

  • paper_url: http://arxiv.org/abs/2309.08623
  • repo_url: None
  • paper_authors: Masatomo Kobayashi, Yasunori Yamada, Kaoru Shinkawa, Miyuki Nemoto, Miho Ota, Kiyotaka Nemoto, Tetsuaki Arai
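  • for: This study aims to provide an automatic machine learning pipeline for identifying mild cognitive impairment due to Lewy bodies (MCI-LB), so that appropriate care can be provided at the prodromal stage.
  • methods: The pipeline is machine learning-based and uses balance measures acquired with an insole sensor during a 30-s standing task.
  • results: The resulting models discriminate MCI-LB participants from the other groups with up to 78.0% accuracy (AUC: 0.681), which is 6.8% better than a reference model based on demographic and clinical neuropsychological measures.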
    Abstract Dementia with Lewy bodies is the second most common type of neurodegenerative dementia, and identification at the prodromal stage, i.e., mild cognitive impairment due to Lewy bodies (MCI-LB), is important for providing appropriate care. However, MCI-LB is often underrecognized because of its diversity in clinical manifestations and similarities with other conditions such as mild cognitive impairment due to Alzheimer's disease (MCI-AD). In this study, we propose a machine learning-based automatic pipeline that helps identify MCI-LB by exploiting balance measures acquired with an insole sensor during a 30-s standing task. An experiment with 98 participants (14 MCI-LB, 38 MCI-AD, 46 cognitively normal) showed that the resultant models could discriminate MCI-LB from the other groups with up to 78.0% accuracy (AUC: 0.681), which was 6.8% better than the accuracy of a reference model based on demographic and clinical neuropsychological measures. Our findings may open up a new approach for timely identification of MCI-LB, enabling better care for patients.
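    Summary Dementia with Lewy bodies is the second most common type of neurodegenerative dementia, and identification at the prodromal stage, i.e., mild cognitive impairment due to Lewy bodies (MCI-LB), is key to providing appropriate care. However, MCI-LB is often underrecognized because its clinical manifestations are diverse and resemble those of other conditions such as mild cognitive impairment due to Alzheimer's disease (MCI-AD). In this study we propose a machine learning-based automatic pipeline that helps identify MCI-LB using balance measures acquired with an insole sensor during a 30-second standing task. In an experiment with 98 participants (14 MCI-LB, 38 MCI-AD, 46 cognitively normal), the resulting models discriminated MCI-LB from the other groups with up to 78.0% accuracy (AUC: 0.681), 6.8% better than a reference model based on demographic and clinical neuropsychological measures. These findings may open up a new approach to identifying MCI-LB and help patients receive better care.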

Fully-Connected Spatial-Temporal Graph for Multivariate Time Series Data

  • paper_url: http://arxiv.org/abs/2309.05305
  • repo_url: None
  • paper_authors: Yucheng Wang, Yuecong Xu, Jianfei Yang, Min Wu, Xiaoli Li, Lihua Xie, Zhenghua Chen
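  • for: This paper aims to model multivariate time series (MTS) data effectively, using graph neural networks (GNNs) to capture the spatial-temporal (ST) dependencies in MTS data.
  • methods: The paper proposes the Fully-Connected Spatial-Temporal Graph Neural Network (FC-STGNN), which has two key components: FC graph construction and FC graph convolution. FC graph construction uses a decay graph that connects sensors across all timestamps based on their temporal distances, so that ST dependencies are modeled comprehensively. FC graph convolution uses a moving-pooling GNN layer to capture the ST dependencies effectively.
  • results: Extensive experiments on multiple MTS datasets show that FC-STGNN outperforms SOTA methods, demonstrating its effectiveness on MTS data with ST dependencies.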
    Abstract Multivariate Time-Series (MTS) data is crucial in various application fields. With its sequential and multi-source (multiple sensors) properties, MTS data inherently exhibits Spatial-Temporal (ST) dependencies, involving temporal correlations between timestamps and spatial correlations between sensors in each timestamp. To effectively leverage this information, Graph Neural Network-based methods (GNNs) have been widely adopted. However, existing approaches separately capture spatial dependency and temporal dependency and fail to capture the correlations between Different sEnsors at Different Timestamps (DEDT). Overlooking such correlations hinders the comprehensive modelling of ST dependencies within MTS data, thus restricting existing GNNs from learning effective representations. To address this limitation, we propose a novel method called Fully-Connected Spatial-Temporal Graph Neural Network (FC-STGNN), including two key components namely FC graph construction and FC graph convolution. For graph construction, we design a decay graph to connect sensors across all timestamps based on their temporal distances, enabling us to fully model the ST dependencies by considering the correlations between DEDT. Further, we devise FC graph convolution with a moving-pooling GNN layer to effectively capture the ST dependencies for learning effective representations. Extensive experiments show the effectiveness of FC-STGNN on multiple MTS datasets compared to SOTA methods.
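    Summary Multivariate time series (MTS) data are important in many application fields. Because MTS data are sequential and multi-source (multiple sensors), they inherently exhibit spatial-temporal (ST) dependencies: temporal correlations between timestamps and spatial correlations between sensors within each timestamp. Graph neural network (GNN)-based methods have been widely adopted to exploit this information. However, existing methods capture spatial and temporal dependencies separately and miss the correlations between Different sEnsors at Different Timestamps (DEDT). This limits how comprehensively existing GNNs can model ST dependencies and prevents them from learning effective representations. To address this limitation, we propose the Fully-Connected Spatial-Temporal Graph Neural Network (FC-STGNN), with two key components: FC graph construction and FC graph convolution. For graph construction, we design a decay graph that connects sensors across all timestamps based on their temporal distances, which lets us model ST dependencies fully, including DEDT correlations. We further design FC graph convolution with a moving-pooling GNN layer to capture the ST dependencies and learn effective representations. Extensive experiments on multiple MTS datasets show the effectiveness of FC-STGNN compared with SOTA methods.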
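    The decay-graph idea can be sketched as follows: every (timestamp, sensor) pair becomes a node, and the edge weight between two nodes shrinks with their temporal distance. The exponential decay form, the given `similarity` matrix, and the function name `decay_graph` are assumptions made for illustration; the abstract does not specify the exact construction.

```python
def decay_graph(similarity, num_steps, decay=0.5):
    """Build a fully connected adjacency over (timestamp, sensor) nodes.

    similarity: S x S sensor-similarity matrix (assumed to be given).
    The weight between sensor i at time t and sensor j at time u is
    similarity[i][j] * decay ** |t - u| (exponential decay is an assumption).
    """
    num_sensors = len(similarity)
    size = num_steps * num_sensors
    adj = [[0.0] * size for _ in range(size)]
    for t in range(num_steps):
        for i in range(num_sensors):
            for u in range(num_steps):
                for j in range(num_sensors):
                    row = t * num_sensors + i
                    col = u * num_sensors + j
                    adj[row][col] = similarity[i][j] * decay ** abs(t - u)
    return adj

# Two sensors observed over three timestamps (hypothetical values)
similarity = [[1.0, 0.8], [0.8, 1.0]]
adj = decay_graph(similarity, num_steps=3, decay=0.5)
```

    With this layout, a single adjacency matrix links different sensors at different timestamps, which is the DEDT correlation that separate spatial and temporal modules would miss.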

The fine print on tempered posteriors

  • paper_url: http://arxiv.org/abs/2309.05292
  • repo_url: None
  • paper_authors: Konstantinos Pitas, Julyan Arbel
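  • for: This study investigates tempered posteriors in detail and uncovers several crucial, previously undiscussed points.
  • methods: The study uses realistic models and datasets together with the tightly controlled case of the Laplace approximation to the posterior, and finds that stochasticity does not in general improve test accuracy.
  • results: Stochasticity in Bayesian models can degrade test accuracy: when calibration gains are obtained, they come at the cost of test accuracy. Targeting Frequentist metrics explains the need for a temperature parameter $\lambda$ in the optimization objective, and a PAC-Bayesian analysis shows that $\lambda$ cannot be seen as simply fixing a misspecified prior or likelihood.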
    Abstract We conduct a detailed investigation of tempered posteriors and uncover a number of crucial and previously undiscussed points. Contrary to previous results, we first show that for realistic models and datasets and the tightly controlled case of the Laplace approximation to the posterior, stochasticity does not in general improve test accuracy. The coldest temperature is often optimal. One might think that Bayesian models with some stochasticity can at least obtain improvements in terms of calibration. However, we show empirically that when gains are obtained this comes at the cost of degradation in test accuracy. We then discuss how targeting Frequentist metrics using Bayesian models provides a simple explanation of the need for a temperature parameter $\lambda$ in the optimization objective. Contrary to prior works, we finally show through a PAC-Bayesian analysis that the temperature $\lambda$ cannot be seen as simply fixing a misspecified prior or likelihood.
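    Summary We investigate tempered posteriors in detail and uncover several important, previously undiscussed points. Contrary to previous results, we first show that, for realistic models and datasets and in the tightly controlled case of the Laplace approximation to the posterior, stochasticity does not in general improve test accuracy. The coldest temperature is often optimal. One might think that Bayesian models with some stochasticity can at least improve calibration. However, we observe empirically that when such gains are obtained, they come at the cost of lower test accuracy. We then discuss how targeting Frequentist metrics with Bayesian models explains the need for a temperature parameter $\lambda$ in the optimization objective, and show through a PAC-Bayesian analysis that $\lambda$ cannot be seen as simply fixing a misspecified prior or likelihood.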
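    A small conjugate example may help make the role of $\lambda$ concrete. The sketch below assumes the common convention in which the likelihood is raised to the power $\lambda$, and uses a Gaussian mean with known noise variance; the function name and data are hypothetical and unrelated to the paper's experiments.

```python
def tempered_gaussian_posterior(data, noise_var, prior_var, lam):
    """Posterior over a Gaussian mean with prior N(0, prior_var), known noise
    variance, and likelihood raised to the power lam. Returns (mean, variance)."""
    n = len(data)
    # Tempering multiplies the data term of the precision by lam
    precision = 1.0 / prior_var + lam * n / noise_var
    mean = (lam * sum(data) / noise_var) / precision
    return mean, 1.0 / precision

data = [1.2, 0.8, 1.0, 1.1]
for lam in (1.0, 10.0, 100.0):
    mean, var = tempered_gaussian_posterior(data, noise_var=1.0, prior_var=1.0, lam=lam)
    print(f"lambda={lam:6.1f}  mean={mean:.4f}  variance={var:.5f}")
```

    As `lam` grows, the posterior variance shrinks and the mean moves toward the sample mean, so stochasticity is progressively removed. This is the regime the abstract refers to when it says the coldest temperature is often optimal.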

Efficient Finite Initialization for Tensorized Neural Networks

  • paper_url: http://arxiv.org/abs/2309.06577
  • repo_url: https://github.com/i3bquantumteam/q4real
  • paper_authors: Alejandro Mata Ali, Iñigo Perez Delgado, Marina Ristol Roura, Aitor Moreno Fdez. de Leceta
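  • for: This study develops a new method for initializing layers of tensorized neural networks that avoids the explosion of the parameters of the matrix the layer emulates. The method targets layers with a high number of nodes in which all or most nodes are connected to the input or output.
  • methods: The method uses the Frobenius norm of the layer in an iterative partial form, so that the norm is finite and within a certain range. This norm is efficient to compute, fully or partially, in most cases of interest.
  • results: The method is applied to different layers and its performance is checked. A Python function that runs it on an arbitrary layer is available on GitHub: https://github.com/i3BQuantumTeam/Q4Real/blob/e07c827651ef16bcf74590ab965ea3985143f891/Quantum-Inspired%20Variational%20Methods/Normalization_process.ipynb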
    Abstract We present a novel method for initializing layers of tensorized neural networks in a way that avoids the explosion of the parameters of the matrix it emulates. The method is intended for layers with a high number of nodes in which there is a connection to the input or output of all or most of the nodes. The core of this method is the use of the Frobenius norm of this layer in an iterative partial form, so that it has to be finite and within a certain range. This norm is efficient to compute, fully or partially for most cases of interest. We apply the method to different layers and check its performance. We create a Python function to run it on an arbitrary layer, available in a Jupyter Notebook in the i3BQuantum repository: https://github.com/i3BQuantumTeam/Q4Real/blob/e07c827651ef16bcf74590ab965ea3985143f891/Quantum-Inspired%20Variational%20Methods/Normalization_process.ipynb
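    Summary We present a new method for initializing the layers of tensorized neural networks that avoids the explosion of the parameters of the matrix the layer emulates. The method is intended for layers with a high number of nodes in which all or most nodes are connected to the input or output. Its core is the use of the Frobenius norm of the layer in an iterative partial form, so that the norm is finite and within a certain range. This norm is efficient to compute, fully or partially, in most cases of interest. We apply the method to different layers and check its performance. We also provide a Python function that runs on an arbitrary layer, available in the i3BQuantum repository: https://github.com/i3BQuantumTeam/Q4Real/blob/e07c827651ef16bcf74590ab965ea3985143f891/Quantum-Inspired%20Variational%20Methods/Normalization_process.ipynb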
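    The idea of keeping a norm finite by normalizing partial contractions can be sketched on a simple chain of matrices. This is a simplified stand-in written for illustration, not the function in the authors' notebook; `normalize_chain`, the chain layout, and the target value are assumptions.

```python
import math
import random

def matmul(a, b):
    """Plain-Python matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def frobenius(m):
    """Frobenius norm: square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in m for x in row))

def normalize_chain(cores, target=1.0):
    """Rescale each core in turn so every partial product has norm `target`."""
    partial = None
    for idx, core in enumerate(cores):
        candidate = core if partial is None else matmul(partial, core)
        scale = target / frobenius(candidate)
        # Fold the scale into the current core; the partial product scales with it
        cores[idx] = [[x * scale for x in row] for row in core]
        partial = [[x * scale for x in row] for row in candidate]
    return cores, partial

rng = random.Random(0)
# Ten random 4x4 cores with large entries: the unnormalized product would be huge
cores = [[[rng.gauss(0.0, 10.0) for _ in range(4)] for _ in range(4)]
         for _ in range(10)]
cores, product = normalize_chain(cores)
```

    Normalizing after each partial contraction means no intermediate quantity grows with the number of cores, which is the practical motivation for computing the norm in partial form.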

Compressed Real Numbers for AI: a case-study using a RISC-V CPU

  • paper_url: http://arxiv.org/abs/2309.07158
  • repo_url: None
  • paper_authors: Federico Rossi, Marco Cococcioni, Roger Ferrer Ibàñez, Jesùs Labarta, Filippo Mantovani, Marc Casas, Emanuele Ruffaldi, Sergio Saponara
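  • for: This study aims to improve the computational efficiency of deep neural networks (DNNs) by using lower-precision number formats.
  • methods: The paper focuses on two families of compressed formats that have already achieved interesting results in machine learning applications: bfloat and posit.
  • results: The paper proposes decompressing a tensor of bfloat/posits just before computation, after the compressed operands have been loaded into the vector registers, in order to save bandwidth and increase cache efficiency.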
    Abstract As recently demonstrated, Deep Neural Networks (DNN), usually trained using single precision IEEE 754 floating point numbers (binary32), can also work using lower precision. Therefore, 16-bit and 8-bit compressed format have attracted considerable attention. In this paper, we focused on two families of formats that have already achieved interesting results in compressing binary32 numbers in machine learning applications, without sensible degradation of the accuracy: bfloat and posit. Even if 16-bit and 8-bit bfloat/posit are routinely used for reducing the storage of the weights/biases of trained DNNs, the inference still often happens on the 32-bit FPU of the CPU (especially if GPUs are not available). In this paper we propose a way to decompress a tensor of bfloat/posits just before computations, i.e., after the compressed operands have been loaded within the vector registers of a vector capable CPU, in order to save bandwidth usage and increase cache efficiency. Finally, we show the architectural parameters and considerations under which this solution is advantageous with respect to the uncompressed one.
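    Summary As recently demonstrated, deep neural networks (DNNs), usually trained with single-precision IEEE 754 floating point numbers (binary32), can also work at lower precision. As a result, 16-bit and 8-bit compressed formats have attracted considerable attention. In this paper, we focus on two families of formats that have already achieved interesting results in compressing binary32 numbers in machine learning applications without appreciable loss of accuracy: bfloat and posit. Although 16-bit and 8-bit bfloat/posits are routinely used to reduce the storage of the weights/biases of trained DNNs, inference still often runs on the 32-bit FPU of the CPU (especially if GPUs are not available). We propose decompressing a tensor of bfloat/posits just before computation, i.e., after the compressed operands have been loaded into the vector registers of a vector-capable CPU, to save bandwidth and increase cache efficiency. Finally, we present the architectural parameters and considerations under which this solution is advantageous compared with the uncompressed one.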
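    For the 16-bit bfloat case, compression and decompression are simple bit operations, because bfloat16 keeps the sign, the 8 exponent bits, and the top 7 fraction bits of binary32. The sketch below shows this in scalar Python for clarity; the function names are illustrative, truncation is used instead of rounding for brevity, and the paper's vector-register implementation and the posit format are not covered.

```python
import struct

def float_to_bits(x):
    """Bit pattern of x as an IEEE 754 binary32 value."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_float(bits):
    """Interpret a 32-bit pattern as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def compress_bfloat16(x):
    """Keep the upper 16 bits of binary32 (sign, 8 exponent bits, 7 fraction bits).
    Truncation is used for brevity; real converters usually round to nearest even."""
    return float_to_bits(x) >> 16

def decompress_bfloat16(b):
    """Expand to binary32 by appending 16 zero fraction bits."""
    return bits_to_float((b & 0xFFFF) << 16)

weights = [1.0, -2.5, 3.14159, 0.001]
compressed = [compress_bfloat16(w) for w in weights]   # 16 bits each
restored = [decompress_bfloat16(b) for b in compressed]
```

    Decompression is a single shift per element, which is why doing it after the load, inside the registers, can halve memory traffic while still using the 32-bit FPU for arithmetic.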

Beamforming in Wireless Coded-Caching Systems

  • paper_url: http://arxiv.org/abs/2309.05276
  • repo_url: None
  • paper_authors: Sneha Madhusudan, Charitha Madapatha, Behrooz Makki, Hao Guo, Tommy Svensson
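  • for: Increased capacity in the access network poses capacity challenges for the transport network, but user data demands are correlated in space and time, which could be exploited.
  • methods: The authors investigate a wireless transport network architecture that integrates beamforming and coded caching, in which a server with multiple antennas broadcasts content to cache nodes responsible for serving users. Beams are optimized with a genetic algorithm-based scheme.
  • results: The design achieves gains in multicast opportunities, interference mitigation, and reduced peak backhaul traffic. A comparative analysis shows clear advantages over traditional uncoded caching schemes. Proper beamforming further enhances the effectiveness of coded caching, leading to a significant reduction in peak backhaul traffic.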
    Abstract Increased capacity in the access network poses capacity challenges on the transport network due to the aggregated traffic. However, there are spatial and time correlation in the user data demands that could potentially be utilized. To that end, we investigate a wireless transport network architecture that integrates beamforming and coded-caching strategies. Especially, our proposed design entails a server with multiple antennas that broadcasts content to cache nodes responsible for serving users. Traditional caching methods face the limitation of relying on the individual memory with additional overhead. Hence, we develop an efficient genetic algorithm-based scheme for beam optimization in the coded-caching system. By exploiting the advantages of beamforming and coded-caching, the architecture achieves gains in terms of multicast opportunities, interference mitigation, and reduced peak backhaul traffic. A comparative analysis of this joint design with traditional, un-coded caching schemes is also conducted to assess the benefits of the proposed approach. Additionally, we examine the impact of various buffering and decoding methods on the performance of the coded-caching scheme. Our findings suggest that proper beamforming is useful in enhancing the effectiveness of the coded-caching technique, resulting in significant reduction in peak backhaul traffic.
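    Summary Increased capacity in the access network puts pressure on the transport network because of the aggregated traffic, but user data demands are correlated in space and time, which could be exploited. To that end, we investigate a wireless transport network architecture that integrates beamforming and coded-caching strategies. Specifically, our design comprises a server with multiple antennas that broadcasts content to cache nodes responsible for serving users. Traditional caching methods are limited by their reliance on individual memory and incur additional overhead. We therefore develop an efficient genetic algorithm-based scheme for beam optimization in the coded-caching system. By exploiting the advantages of beamforming and coded caching, the architecture achieves gains in multicast opportunities, interference mitigation, and reduced peak backhaul traffic. We also compare this joint design with traditional uncoded caching schemes to assess the benefits of the proposed approach, and examine the impact of various buffering and decoding methods on the performance of the coded-caching scheme. Our findings suggest that proper beamforming enhances the effectiveness of coded caching, resulting in a significant reduction in peak backhaul traffic.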

Generalized Graphon Process: Convergence of Graph Frequencies in Stretched Cut Distance

  • paper_url: http://arxiv.org/abs/2309.05260
  • repo_url: None
  • paper_authors: Xingchao Jian, Feng Ji, Wee Peng Tay
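  • for: This paper studies the convergence of sparse graph sequences and describes it using generalized graphons and the stretched cut distance.
  • methods: A random graph process generated from a generalized graphon is used to model growing sparse graphs, and the convergence of the eigenvalues of the adjacency matrices is proved.
  • results: The random graph process converges to the generalized graphon in stretched cut distance, and the results indicate the possibility of transfer learning between sparse graphs.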
    Abstract Graphons have traditionally served as limit objects for dense graph sequences, with the cut distance serving as the metric for convergence. However, sparse graph sequences converge to the trivial graphon under the conventional definition of cut distance, which make this framework inadequate for many practical applications. In this paper, we utilize the concepts of generalized graphons and stretched cut distance to describe the convergence of sparse graph sequences. Specifically, we consider a random graph process generated from a generalized graphon. This random graph process converges to the generalized graphon in stretched cut distance. We use this random graph process to model the growing sparse graph, and prove the convergence of the adjacency matrices' eigenvalues. We supplement our findings with experimental validation. Our results indicate the possibility of transfer learning between sparse graphs.
    Summary Graphons have traditionally served as limit objects for dense graph sequences, with the cut distance as the metric for convergence. However, sparse graph sequences converge to the trivial graphon under the conventional definition of cut distance, which makes this framework inadequate for many practical applications. In this paper, we use the concepts of generalized graphons and stretched cut distance to describe the convergence of sparse graph sequences. In particular, we consider a random graph process generated from a generalized graphon, which converges to the generalized graphon in stretched cut distance. We use this process to model growing sparse graphs and prove the convergence of the eigenvalues of the adjacency matrices. We complement our results with experimental validation. The results suggest the possibility of transfer learning between sparse graphs.
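    For readers unfamiliar with graphons as generative models, the sketch below samples a graph from a classical graphon with an added sparsity factor `rho`. This is the standard W-random graph construction, shown only as background; the paper's generalized-graphon process is different and is not reproduced, and the kernel `graphon` is a hypothetical choice.

```python
import random

def sample_graph(graphon, n, rho, rng):
    """Sample an n-node simple graph: node i gets a latent u_i ~ U[0, 1], and edge
    {i, j} appears independently with probability min(1, rho * graphon(u_i, u_j))."""
    latents = [rng.random() for _ in range(n)]
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = min(1.0, rho * graphon(latents[i], latents[j]))
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return adj

def graphon(x, y):
    # Hypothetical kernel chosen only for illustration
    return (1.0 - x) * (1.0 - y)

rng = random.Random(0)
adj = sample_graph(graphon, n=50, rho=0.2, rng=rng)
num_edges = sum(map(sum, adj)) // 2
```

    Lowering `rho` as `n` grows produces sparse graphs whose rescaled limit under the ordinary cut distance is trivial, which is the limitation the stretched cut distance is meant to address.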

A physics-informed and attention-based graph learning approach for regional electric vehicle charging demand prediction

  • paper_url: http://arxiv.org/abs/2309.05259
  • repo_url: None
  • paper_authors: Haohao Qu, Haoxuan Kuang, Jun Li, Linlin You
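  • for: Predicting electric vehicle (EV) charging demand in order to optimize the use of EV charging space and thereby ease the load on urban intelligent transportation systems.
  • methods: Graph and temporal attention mechanisms are combined for feature extraction, and physics-informed meta-learning is used to pre-train the model.
  • results: Evaluated on a dataset of 18,013 EV charging piles in Shenzhen, the approach achieves state-of-the-art forecasting performance and can capture the adaptive changes in charging demand caused by price fluctuations.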
    Abstract Along with the proliferation of electric vehicles (EVs), optimizing the use of EV charging space can significantly alleviate the growing load on intelligent transportation systems. As the foundation to achieve such an optimization, a spatiotemporal method for EV charging demand prediction in urban areas is required. Although several solutions have been proposed by using data-driven deep learning methods, it can be found that these performance-oriented methods may suffer from misinterpretations to correctly handle the reverse relationship between charging demands and prices. To tackle the emerging challenges of training an accurate and interpretable prediction model, this paper proposes a novel approach that enables the integration of graph and temporal attention mechanisms for feature extraction and the usage of physic-informed meta-learning in the model pre-training step for knowledge transfer. Evaluation results on a dataset of 18,013 EV charging piles in Shenzhen, China, show that the proposed approach, named PAG, can achieve state-of-the-art forecasting performance and the ability in understanding the adaptive changes in charging demands caused by price fluctuations.
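    Summary With the proliferation of electric vehicles (EVs), optimizing the use of EV charging space can significantly ease the growing load on urban transportation systems. Such optimization requires a spatiotemporal method for predicting EV charging demand in urban areas. Although several solutions based on data-driven deep learning have been proposed, these performance-oriented methods may fail to handle correctly the inverse relationship between charging demand and price. To address the challenge of training a prediction model that is both accurate and interpretable, this paper proposes an approach named PAG, which integrates graph and temporal attention mechanisms for feature extraction and uses physics-informed meta-learning in the model pre-training step for knowledge transfer. Evaluated on a dataset of 18,013 EV charging piles in Shenzhen, China, PAG achieves state-of-the-art forecasting performance and can capture the adaptive changes in charging demand caused by price fluctuations.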

Examining the Effect of Pre-training on Time Series Classification

  • paper_url: http://arxiv.org/abs/2309.05256
  • repo_url: None
  • paper_authors: Jiashu Pu, Shiwei Zhao, Ling Cheng, Yongzhu Chang, Runze Wu, Tangjie Lv, Rongsheng Zhang
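  • for: This study examines the effect of the unsupervised pre-training followed by fine-tuning paradigm on a new modality: time series.
  • methods: The study conducts a thorough examination of 150 classification datasets from the Univariate Time Series (UTS) and Multivariate Time Series (MTS) benchmarks.
  • results: Pre-training helps the optimization process only for models that fit the data poorly, and it shows no regularization effect when training time is sufficient. It speeds up convergence only if the model has sufficient ability to fit the data, and adding more pre-training data does not improve generalization. Both the pre-training task and the model structure affect the paradigm's effectiveness on a given dataset, but the model structure plays the more significant role.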
    Abstract Although the pre-training followed by fine-tuning paradigm is used extensively in many fields, there is still some controversy surrounding the impact of pre-training on the fine-tuning process. Currently, experimental findings based on text and image data lack consensus. To delve deeper into the unsupervised pre-training followed by fine-tuning paradigm, we have extended previous research to a new modality: time series. In this study, we conducted a thorough examination of 150 classification datasets derived from the Univariate Time Series (UTS) and Multivariate Time Series (MTS) benchmarks. Our analysis reveals several key conclusions. (i) Pre-training can only help improve the optimization process for models that fit the data poorly, rather than those that fit the data well. (ii) Pre-training does not exhibit the effect of regularization when given sufficient training time. (iii) Pre-training can only speed up convergence if the model has sufficient ability to fit the data. (iv) Adding more pre-training data does not improve generalization, but it can strengthen the advantage of pre-training on the original data volume, such as faster convergence. (v) While both the pre-training task and the model structure determine the effectiveness of the paradigm on a given dataset, the model structure plays a more significant role.
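    Summary Although the pre-training followed by fine-tuning paradigm is used widely in many fields, the impact of pre-training on fine-tuning remains controversial, and experimental findings based on text and image data lack consensus. To examine the unsupervised pre-training followed by fine-tuning paradigm more closely, we extend previous research to a new modality: time series. We thoroughly examine 150 classification datasets derived from the Univariate Time Series (UTS) and Multivariate Time Series (MTS) benchmarks. Our analysis yields the following conclusions. (i) Pre-training helps the optimization process only for models that fit the data poorly, not for those that fit the data well. (ii) Pre-training does not show a regularization effect when given sufficient training time. (iii) Pre-training speeds up convergence only if the model has sufficient ability to fit the data. (iv) Adding more pre-training data does not improve generalization, but it can strengthen the advantage of pre-training at the original data volume, such as faster convergence. (v) Both the pre-training task and the model structure determine the effectiveness of the paradigm on a given dataset, but the model structure plays the more significant role.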

A quantum tug of war between randomness and symmetries on homogeneous spaces

  • paper_url: http://arxiv.org/abs/2309.05253
  • repo_url: None
  • paper_authors: Rahul Arvind, Kishor Bharti, Jun Yong Khoo, Dax Enshan Koh, Jian Feng Kong
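  • for: Studying the interplay between symmetry and randomness in quantum information.
  • methods: A geometric approach is adopted: states are considered $H$-equivalent if related by a symmetry transformation characterized by the group $H$, and the Haar measure on the homogeneous space $\mathbb{U}/H$ is introduced to characterize true randomness.
  • results: The paper develops a notion of randomness on homogeneous spaces, studies approximate randomness ($t$-designs) and pseudorandomness, and applies these notions to the expressibility of quantum machine learning ansatze.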
    Abstract We explore the interplay between symmetry and randomness in quantum information. Adopting a geometric approach, we consider states as $H$-equivalent if related by a symmetry transformation characterized by the group $H$. We then introduce the Haar measure on the homogeneous space $\mathbb{U}/H$, characterizing true randomness for $H$-equivalent systems. While this mathematical machinery is well-studied by mathematicians, it has seen limited application in quantum information: we believe our work to be the first instance of utilizing homogeneous spaces to characterize symmetry in quantum information. This is followed by a discussion of approximations of true randomness, commencing with $t$-wise independent approximations and defining $t$-designs on $\mathbb{U}/H$ and $H$-equivalent states. Transitioning further, we explore pseudorandomness, defining pseudorandom unitaries and states within homogeneous spaces. Finally, as a practical demonstration of our findings, we study the expressibility of quantum machine learning ansatze in homogeneous spaces. Our work provides a fresh perspective on the relationship between randomness and symmetry in the quantum world.
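    Summary We explore the interplay between symmetry and randomness in quantum information. Adopting a geometric approach, we consider states to be $H$-equivalent if they are related by a symmetry transformation characterized by the group $H$. We then introduce the Haar measure on the homogeneous space $\mathbb{U}/H$, which characterizes true randomness for $H$-equivalent systems. This mathematical machinery is well studied by mathematicians but has seen limited application in quantum information; we believe our work is the first to use homogeneous spaces to characterize symmetry in quantum information. We then discuss approximations of true randomness, starting with $t$-wise independent approximations and defining $t$-designs on $\mathbb{U}/H$ and on $H$-equivalent states. Next, we explore pseudorandomness, defining pseudorandom unitaries and states within homogeneous spaces. Finally, we study the expressibility of quantum machine learning ansatze in homogeneous spaces. Our work offers a new perspective on the relationship between randomness and symmetry in the quantum world.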

Graph Contextual Contrasting for Multivariate Time Series Classification

  • paper_url: http://arxiv.org/abs/2309.05202
  • repo_url: None
  • paper_authors: Yucheng Wang, Yuecong Xu, Jianfei Yang, Min Wu, Xiaoli Li, Lihua Xie, Zhenghua Chen
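  • for: This paper proposes a new contrastive learning method for Multivariate Time-Series (MTS) classification. Contrastive learning ensures consistency across different views of unlabeled samples and learns effective representations.
  • methods: The method uses graph augmentations (node and edge augmentations) to preserve the stability of sensors and their correlations, graph contrasting at both node and graph level to extract robust sensor-level and global-level features, and multi-window temporal contrasting to ensure temporal consistency.
  • results: Experiments show that the proposed GCC achieves state-of-the-art performance on various MTS classification tasks.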
    Abstract Contrastive learning, as a self-supervised learning paradigm, becomes popular for Multivariate Time-Series (MTS) classification. It ensures the consistency across different views of unlabeled samples and then learns effective representations for these samples. Existing contrastive learning methods mainly focus on achieving temporal consistency with temporal augmentation and contrasting techniques, aiming to preserve temporal patterns against perturbations for MTS data. However, they overlook spatial consistency that requires the stability of individual sensors and their correlations. As MTS data typically originate from multiple sensors, ensuring spatial consistency becomes essential for the overall performance of contrastive learning on MTS data. Thus, we propose Graph Contextual Contrasting (GCC) for spatial consistency across MTS data. Specifically, we propose graph augmentations including node and edge augmentations to preserve the stability of sensors and their correlations, followed by graph contrasting with both node- and graph-level contrasting to extract robust sensor- and global-level features. We further introduce multi-window temporal contrasting to ensure temporal consistency in the data for each sensor. Extensive experiments demonstrate that our proposed GCC achieves state-of-the-art performance on various MTS classification tasks.
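    Summary Contrastive learning, a self-supervised learning paradigm, has become popular for multivariate time series (MTS) classification. It ensures consistency across different views of unlabeled samples and then learns effective representations for them. Existing contrastive learning methods focus mainly on temporal consistency, using temporal augmentation and contrasting techniques to preserve temporal patterns against perturbations. However, they overlook spatial consistency, which requires the stability of individual sensors and their correlations. Because MTS data typically originate from multiple sensors, spatial consistency is essential to the overall performance of contrastive learning on MTS data. We therefore propose Graph Contextual Contrasting (GCC) for spatial consistency across MTS data. Specifically, we propose graph augmentations, including node and edge augmentations, to preserve the stability of sensors and their correlations, followed by graph contrasting at both node and graph level to extract robust sensor-level and global-level features. We further introduce multi-window temporal contrasting to ensure temporal consistency in each sensor's data. Extensive experiments show that GCC achieves state-of-the-art performance on various MTS classification tasks.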

CARE: Confidence-rich Autonomous Robot Exploration using Bayesian Kernel Inference and Optimization

  • paper_url: http://arxiv.org/abs/2309.05200
  • repo_url: https://github.com/shepherd-gregory/bkio-exploration
  • paper_authors: Yang Xu, Ronghao Zheng, Senlin Zhang, Meiqin Liu, Shoudong Huang
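  • for: This study aims to improve the efficiency of information-based autonomous robot exploration in unknown and complex environments.
  • methods: Gaussian process (GP) regression is first used to learn a surrogate model that infers the confidence-rich mutual information (CRMI) of querying control actions. An objective function consisting of predicted CRMI values and prediction uncertainties is then used for Bayesian optimization (BO), i.e., GP-based BO (GPBO). This realizes the trade-off between exploitation (the action with the highest CRMI value) and exploration (actions with high prediction variance).
  • results: A lightweight information gain inference method based on Bayesian kernel inference and optimization (BKIO) is proposed, achieving approximately logarithmic complexity without training. BKIO infers the CRMI and selects the best action via BO with bounded cumulative regret, giving accuracy comparable to GPBO at much higher efficiency. Extensive numerical and real-world experiments in unstructured, cluttered environments support the methods. The open-source code is available at https://github.com/Shepherd-Gregory/BKIO-Exploration.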
    Abstract In this paper, we consider improving the efficiency of information-based autonomous robot exploration in unknown and complex environments. We first utilize Gaussian process (GP) regression to learn a surrogate model to infer the confidence-rich mutual information (CRMI) of querying control actions, then adopt an objective function consisting of predicted CRMI values and prediction uncertainties to conduct Bayesian optimization (BO), i.e., GP-based BO (GPBO). The trade-off between the best action with the highest CRMI value (exploitation) and the action with high prediction variance (exploration) can be realized. To further improve the efficiency of GPBO, we propose a novel lightweight information gain inference method based on Bayesian kernel inference and optimization (BKIO), achieving an approximate logarithmic complexity without the need for training. BKIO can also infer the CRMI and generate the best action using BO with bounded cumulative regret, which ensures its comparable accuracy to GPBO with much higher efficiency. Extensive numerical and real-world experiments show the desired efficiency of our proposed methods without losing exploration performance in different unstructured, cluttered environments. We also provide our open-source implementation code at https://github.com/Shepherd-Gregory/BKIO-Exploration.

eess.IV - 2023-09-11

Designs and Implementations in Neural Network-based Video Coding

  • paper_url: http://arxiv.org/abs/2309.05846
  • repo_url: None
  • paper_authors: Yue Li, Junru Li, Chaoyi Lin, Kai Zhang, Li Zhang, Franck Galpin, Thierry Dumas, Hongtao Wang, Muhammed Coban, Jacob Ström, Du Liu, Kenneth Andersson
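  • for: This paper reports recent JVET exploration work on neural network-based video coding (NNVC), in which NN-based coding tools are embedded into a classical video compression framework.
  • methods: Two major NN-based coding technologies are discussed: neural network-based intra prediction and neural network-based in-loop filtering, both adopted into the NNVC reference software.
  • results: Compared with VTM-11.0_nnvc, the NN-based coding tools in NNVC-4.0 achieve average BD-rate reductions for {Y, Cb, Cr} of {11.94%, 21.86%, 22.59%} under random-access, {9.18%, 19.76%, 20.92%} under low-delay, and {10.63%, 21.56%, 23.02%} under all-intra configurations.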
    Abstract The past decade has witnessed the huge success of deep learning in well-known artificial intelligence applications such as face recognition, autonomous driving, and large language model like ChatGPT. Recently, the application of deep learning has been extended to a much wider range, with neural network-based video coding being one of them. Neural network-based video coding can be performed at two different levels: embedding neural network-based (NN-based) coding tools into a classical video compression framework or building the entire compression framework upon neural networks. This paper elaborates some of the recent exploration efforts of JVET (Joint Video Experts Team of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC29) in the name of neural network-based video coding (NNVC), falling in the former category. Specifically, this paper discusses two major NN-based video coding technologies, i.e. neural network-based intra prediction and neural network-based in-loop filtering, which have been investigated for several meeting cycles in JVET and finally adopted into the reference software of NNVC. Extensive experiments on top of the NNVC have been conducted to evaluate the effectiveness of the proposed techniques. Compared with VTM-11.0_nnvc, the proposed NN-based coding tools in NNVC-4.0 could achieve {11.94%, 21.86%, 22.59%}, {9.18%, 19.76%, 20.92%}, and {10.63%, 21.56%, 23.02%} BD-rate reductions on average for {Y, Cb, Cr} under random-access, low-delay, and all-intra configurations respectively.

Diffusion-based Adversarial Purification for Robust Deep MRI Reconstruction

  • paper_url: http://arxiv.org/abs/2309.05794
  • repo_url: https://github.com/sjames40/adversarial-purification-for-mri
  • paper_authors: Ismail Alkhouri, Shijun Liang, Rongrong Wang, Qing Qu, Saiprasad Ravishankar
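  • for: Improve the robustness of deep-learning-based MRI reconstruction against carefully engineered adversarial perturbations.
  • methods: Pre-trained diffusion models serve as adversarial purifiers. The reconstruction model only needs fine-tuning on purified adversarial examples, with no minimax optimization to train it from scratch.
  • results: Experiments indicate that the proposed technique is effective when benchmarked against leading defenses for MRI reconstruction, such as adversarial training and randomized smoothing.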
    Abstract Deep learning (DL) methods have been extensively employed in magnetic resonance imaging (MRI) reconstruction, demonstrating remarkable performance improvements compared to traditional non-DL methods. However, recent studies have uncovered the susceptibility of these models to carefully engineered adversarial perturbations. In this paper, we tackle this issue by leveraging diffusion models. Specifically, we introduce a defense strategy that enhances the robustness of DL-based MRI reconstruction methods through the utilization of pre-trained diffusion models as adversarial purifiers. Unlike conventional state-of-the-art adversarial defense methods (e.g., adversarial training), our proposed approach eliminates the need to solve a minimax optimization problem to train the image reconstruction model from scratch, and only requires fine-tuning on purified adversarial examples. Our experimental findings underscore the effectiveness of our proposed technique when benchmarked against leading defense methodologies for MRI reconstruction such as adversarial training and randomized smoothing.

From Capture to Display: A Survey on Volumetric Video

  • paper_url: http://arxiv.org/abs/2309.05658
  • repo_url: None
  • paper_authors: Yili Jin, Kaiyuan Hu, Junhua Liu, Fangxin Wang, Xue Liu
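  • for: This survey provides a comprehensive literature review of volumetric video services, giving a holistic view of the field and its potential research directions.
  • methods: It first presents a general framework for volumetric video services and the prerequisites (representations, open datasets, and quality assessment metrics), then details the techniques for each pipeline stage: capturing, compression, transmission, rendering, and display.
  • results: The survey explores the applications enabled by volumetric video and presents an array of research challenges and opportunities for volumetric video services.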
    Abstract Volumetric video, which offers immersive viewing experiences, is gaining increasing prominence. With its six degrees of freedom, it provides viewers with greater immersion and interactivity compared to traditional videos. Despite their potential, volumetric video services pose significant challenges. This survey conducts a comprehensive review of the existing literature on volumetric video. We first provide a general framework of volumetric video services, followed by a discussion on prerequisites for volumetric video, encompassing representations, open datasets, and quality assessment metrics. Then we delve into the current methodologies for each stage of the volumetric video service pipeline, detailing capturing, compression, transmission, rendering, and display techniques. Lastly, we explore various applications enabled by this pioneering technology and we present an array of research challenges and opportunities in the domain of volumetric video services. This survey aspires to provide a holistic understanding of this burgeoning field and shed light on potential future research trajectories, aiming to bring the vision of volumetric video to fruition.

A survey on real-time 3D scene reconstruction with SLAM methods in embedded systems

  • paper_url: http://arxiv.org/abs/2309.05349
  • repo_url: None
  • paper_authors: Quentin Picard, Stephane Chevobbe, Mehdi Darouich, Jean-Yves Didier
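  • for: This article reviews how visual-based 3D scene reconstruction pipelines are implemented on resource-constrained hardware platforms, for systems such as drones, service robots, and mobile AR/VR devices.
  • methods: It describes a conventional SLAM pipeline from sensors to 3D reconstruction, including the potential use of deep learning, and details how advanced functions are implemented with limited resources.
  • results: Real-time performance, memory management, and low power consumption are identified as critical for embedded systems. Recent systems embed 3D reconstruction methods with different granularities, and the trade-off between required accuracy and resource consumption remains an open research question.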
    Abstract The 3D reconstruction of simultaneous localization and mapping (SLAM) is an important topic in the field for transport systems such as drones, service robots and mobile AR/VR devices. Compared to a point cloud representation, the 3D reconstruction based on meshes and voxels is particularly useful for high-level functions, like obstacle avoidance or interaction with the physical environment. This article reviews the implementation of a visual-based 3D scene reconstruction pipeline on resource-constrained hardware platforms. Real-time performances, memory management and low power consumption are critical for embedded systems. A conventional SLAM pipeline from sensors to 3D reconstruction is described, including the potential use of deep learning. The implementation of advanced functions with limited resources is detailed. Recent systems propose the embedded implementation of 3D reconstruction methods with different granularities. The trade-off between required accuracy and resource consumption for real-time localization and reconstruction is one of the open research questions identified and discussed in this paper.

eess.SP - 2023-09-11

A Novel Catastrophic Condition for Periodically Time-varying Convolutional Encoders Based on Time-varying Equivalent Convolutional Encoders

  • paper_url: http://arxiv.org/abs/2309.05849
  • repo_url: None
  • paper_authors: Fan Jiang
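  • for: This letter studies how to decide whether a periodically time-varying convolutional encoder is catastrophic.
  • methods: It proposes a novel catastrophic condition based on time-varying equivalent convolutional encoders, together with a technique that converts a catastrophic periodically time-varying encoder into a non-catastrophic one.
  • results: Because the methods do not involve the encoder's state transitions, their time complexity grows linearly with the encoder memory, whereas the complexity of Palazzo's state-transition-table condition grows exponentially.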
    Abstract A convolutional encoder is said to be catastrophic if it maps an information sequence of infinite weight into a code sequence of finite weight. As a consequence of this mapping, a finite number of channel errors may cause an infinite number of information bit errors when decoding. This situation should be avoided. A catastrophic condition to determine if a time-invariant convolutional encoder is catastrophic or not is stated in \cite{Massey:LSC}. Palazzo developed this condition for periodically time-varying convolutional encoders in \cite{Palazzo:Analysis}. Since Palazzo's condition is based on the state transition table of the constituent encoders, its complexity increases exponentially with the number of memory elements in the encoders. A novel catastrophic condition making use of time-varying equivalent convolutional encoders is presented in this letter. A technique to convert a catastrophic periodically time-varying convolutional encoder into a non-catastrophic one can also be developed based on these encoders. Since they do not involve the state transitions of the convolutional encoder, the time complexity of these methods grows linearly with the encoder memory.
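For orientation, the classical time-invariant test that the abstract attributes to Massey can be sketched for rate-1/n encoders: such an encoder is non-catastrophic exactly when the greatest common divisor of its generator polynomials over GF(2) is a power of the delay operator D. The sketch below covers only this classical case, not the letter's new condition for periodically time-varying encoders. Polynomials are stored as Python integers whose bit i is the coefficient of D^i, a representation chosen here for illustration.

```python
def gf2_mod(a, b):
    """Remainder of a divided by b over GF(2); bit i is the coefficient of D^i."""
    deg_b = b.bit_length() - 1
    while a and a.bit_length() - 1 >= deg_b:
        a ^= b << (a.bit_length() - 1 - deg_b)
    return a

def gf2_gcd(a, b):
    """Euclidean algorithm for polynomials over GF(2)."""
    while b:
        a, b = b, gf2_mod(a, b)
    return a

def is_catastrophic(generators):
    """Classical test for a time-invariant rate-1/n convolutional encoder."""
    g = 0
    for poly in generators:
        g = gf2_gcd(g, poly)
    if g == 0:
        raise ValueError("at least one generator polynomial must be non-zero")
    # A power of D has exactly one non-zero coefficient, i.e. one set bit.
    return g & (g - 1) != 0

# 1 + D and 1 + D^2 share the factor 1 + D, so the encoder is catastrophic.
print(is_catastrophic([0b11, 0b101]))
# 1 + D + D^2 and 1 + D^2 are coprime, so the encoder is non-catastrophic.
print(is_catastrophic([0b111, 0b101]))
```

The Euclidean algorithm touches only the generator polynomials, not the encoder's state transition table, which is why tests of this kind avoid the exponential growth in memory elements mentioned in the abstract.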

Reinforcement Learning for Supply Chain Attacks Against Frequency and Voltage Control

  • paper_url: http://arxiv.org/abs/2309.05814
  • repo_url: https://github.com/amrmsab/rl-cps-attacks
  • paper_authors: Amr S. Mohamed, Sumin Lee, Deepa Kundur
  • for: This paper explores the threat that supply chain attacks pose to modernized power systems and how to defend against them.
  • methods: Reinforcement learning is used to develop intelligent attacks that are incorporated into supply chain attacks against generation control devices.
  • results: The authors simulate potential disturbances affecting frequency and voltage regulation and argue that the method can provide valuable guidance for defending against supply chain attacks.
    Abstract The ongoing modernization of the power system, involving new equipment installations and upgrades, exposes the power system to the introduction of malware into its operation through supply chain attacks. Supply chain attacks present a significant threat to power systems, allowing cybercriminals to bypass network defenses and execute deliberate attacks at the physical layer. Given the exponential advancements in machine intelligence, cybercriminals will leverage this technology to create sophisticated and adaptable attacks that can be incorporated into supply chain attacks. We demonstrate the use of reinforcement learning for developing intelligent attacks incorporated into supply chain attacks against generation control devices. We simulate potential disturbances impacting frequency and voltage regulation. The presented method can provide valuable guidance for defending against supply chain attacks.

Adversarial Score-Based Generative Model for AmBC Channel Estimation

  • paper_url: http://arxiv.org/abs/2309.05776
  • repo_url: None
  • paper_authors: Fatemeh Rezaei, S. Mojtaba Marvasti-Zadeh, Chintha Tellambura, Amine Maaref
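  • for: This letter proposes a deep learning method within a probabilistic framework for jointly estimating the direct and cascaded channels in an ambient backscatter (AmBC) network with multiple tags.
  • methods: An adversarial score-based generative model is trained to acquire the channel distributions. Channel estimation then samples from the posterior distribution using annealed Langevin sampling (ALS).
  • results: The method improves substantially on standard least square (LS) estimation, performs similarly to the minimum mean square error (MMSE) estimator for the direct channel, and outperforms it for the cascaded channels.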
    Abstract This letter presents a pioneering method that employs deep learning within a probabilistic framework for the joint estimation of both direct and cascaded channels in an ambient backscatter (AmBC) network comprising multiple tags. In essence, we leverage an adversarial score-based generative model for training, enabling the acquisition of channel distributions. Subsequently, our channel estimation process involves sampling from the posterior distribution, facilitated by the annealed Langevin sampling (ALS) technique. Notably, our method demonstrates substantial advancements over standard least square (LS) estimation techniques, achieving performance akin to that of the minimum mean square error (MMSE) estimator for the direct channel, and outperforming it for the cascaded channels.
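The posterior sampling step can be illustrated with a toy annealed Langevin sampler. Everything specific here is an assumption made for illustration: the channel is real-valued with a standard Gaussian prior, so the prior score is available in closed form instead of coming from the trained score network the letter describes, the observation model is simply y = h + noise, and the likelihood score uses a common heuristic that inflates the noise variance by the current annealing level.

```python
import numpy as np

def annealed_langevin(y, noise_var, sigmas, steps_per_level=100,
                      step_scale=0.5, seed=0):
    """Draw one approximate posterior sample of h given y = h + noise."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(y.shape)
    for sigma in sigmas:                      # from coarse to fine noise levels
        alpha = step_scale * sigma ** 2       # step size shrinks with the level
        for _ in range(steps_per_level):
            # Score of the N(0, 1) prior smoothed with noise level sigma.
            prior_score = -h / (1.0 + sigma ** 2)
            # Heuristic likelihood score with annealed noise variance.
            likelihood_score = (y - h) / (noise_var + sigma ** 2)
            h = (h + alpha * (prior_score + likelihood_score)
                 + np.sqrt(2.0 * alpha) * rng.standard_normal(y.shape))
    return h

rng = np.random.default_rng(1)
dim, noise_var = 2000, 0.1
h_true = rng.standard_normal(dim)
y = h_true + np.sqrt(noise_var) * rng.standard_normal(dim)

sigmas = np.geomspace(1.0, 0.05, 10)
sample = annealed_langevin(y, noise_var, sigmas)
mmse_estimate = y / (1.0 + noise_var)         # closed form for this toy model
print("mean offset from MMSE estimate:", float(np.mean(sample - mmse_estimate)))
```

In this Gaussian toy model the samples scatter around the closed-form MMSE estimate, so averaging several independent samples would approach it. With a learned score network the prior score line is the only part that changes.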

Potentials of Deterministic Radio Propagation Simulation for AI-Enabled Localization and Sensing

  • paper_url: http://arxiv.org/abs/2309.05650
  • repo_url: None
  • paper_authors: Albrecht Michler, Jonas Ninnemann, Jakob Krauthäuser, Paul Schwarzbach, Oliver Michler
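  • for: This paper supports the development and validation of ML- and AI-based localization and sensing methods for next-generation networks by addressing the training-data generation bottleneck.
  • methods: It proposes an integrated toolchain based on deterministic channel modeling and radio propagation simulation.
  • results: The toolchain is demonstrated on scenario classification, yielding localization-related channel parameters within an aircraft cabin environment.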
    Abstract Machine learning (ML) and artificial intelligence (AI) enable new methods for localization and sensing in next-generation networks to fulfill a wide range of use cases. These approaches rely on learning approaches that require large amounts of training and validation data. This paper addresses the data generation bottleneck to develop and validate such methods by proposing an integrated toolchain based on deterministic channel modeling and radio propagation simulation. The toolchain is demonstrated by way of example for scenario classification to obtain localization-related channel parameters within an aircraft cabin environment.

Grid-based Hybrid 3DMA GNSS and Terrestrial Positioning

  • paper_url: http://arxiv.org/abs/2309.05644
  • repo_url: None
  • paper_authors: Paul Schwarzbach, Albrecht Michler, Oliver Michler
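  • for: This work aims to improve GNSS-based localization and navigation by integrating 3D map-aided (3DMA) GNSS positioning and terrestrial systems into a single 3DMA positioning framework.
  • methods: It proposes a non-parametric filtering method, a 3DMA multi-epoch Grid Filter, for the tight integration of GNSS, Ultra-wide Band (UWB), and vehicle motion data on a discrete state representation. Algorithmic challenges such as different measurement models and time synchronization are addressed.
  • results: Real-world tests on an urban automotive testbed yield an average positioning error of 0.64 m in the static scenario and 1.62 m in the dynamic scenario, demonstrating the feasibility of including terrestrial signals in a 3DMA positioning framework.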
    Abstract The paper discusses the increasing use of hybridized sensor information for GNSS-based localization and navigation, including the use of 3D map-aided GNSS positioning and terrestrial systems based on different geometric measurement principles. However, both GNSS and terrestrial systems are subject to negative impacts from the propagation environment, which can violate the assumptions of conventionally applied parametric state estimators. Furthermore, dynamic parametric state estimation does not account for multi-modalities within the state space leading to an information loss within the prediction step. In addition, the synchronization of non-deterministic multi-rate measurement systems needs to be accounted. In order to address these challenges, the paper proposes the use of a non-parametric filtering method, specifically a 3DMA multi-epoch Grid Filter, for the tight integration of GNSS and terrestrial signals. Specifically, the fusion of GNSS, Ultra-wide Band (UWB) and vehicle motion data is introduced based on a discrete state representation. Algorithmic challenges, including the use of different measurement models and time synchronization, are addressed. In order to evaluate the proposed method, real-world tests were conducted on an urban automotive testbed in both static and dynamic scenarios. We empirically show that we achieve sub-meter accuracy in the static scenario by averaging a positioning error of $0.64$ m, whereas in the dynamic scenario the average positioning error amounts to $1.62$ m. The paper provides a proof-of-concept of the introduced method and shows the feasibility of the inclusion of terrestrial signals in a 3DMA positioning framework in order to further enhance localization in GNSS-degraded environments.
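The appeal of a grid filter over a parametric estimator can be shown with a one-dimensional toy example. This sketch is not the paper's 3DMA multi-epoch Grid Filter: the road is reduced to a line, the two range anchors stand in for terrestrial (for example UWB) measurements, and all positions, noise levels, and function names are hypothetical. A single range measurement leaves two equally likely positions, which the grid represents directly and a second measurement resolves.

```python
import numpy as np

def predict(belief, shift_cells, spread=(0.25, 0.5, 0.25)):
    """Motion update: shift the belief by whole cells, then blur it."""
    moved = np.zeros_like(belief)
    if shift_cells >= 0:
        moved[shift_cells:] = belief[:len(belief) - shift_cells]
    else:
        moved[:shift_cells] = belief[-shift_cells:]
    blurred = np.convolve(moved, spread, mode="same")
    return blurred / blurred.sum()

def update(belief, grid, measured_range, anchor, sigma):
    """Measurement update with a Gaussian range likelihood to one anchor."""
    expected = np.abs(grid - anchor)
    likelihood = np.exp(-0.5 * ((measured_range - expected) / sigma) ** 2)
    posterior = belief * likelihood
    return posterior / posterior.sum()

grid = np.arange(0.0, 100.0, 0.5)             # candidate positions in metres
belief = np.full(len(grid), 1.0 / len(grid))  # uniform prior

# Range of 20 m to an anchor at 50 m: positions 30 m and 70 m both fit.
belief_after_a = update(belief, grid, measured_range=20.0, anchor=50.0, sigma=2.0)
# Range of 30 m to an anchor at 0 m removes the ambiguity.
belief_after_b = update(belief_after_a, grid, measured_range=30.0, anchor=0.0, sigma=2.0)
# The vehicle then moves 2 m forward, i.e. 4 cells of 0.5 m.
belief_pred = predict(belief_after_b, shift_cells=4)

estimate = grid[int(np.argmax(belief_pred))]
print("position estimate in metres:", estimate)
```

Because the belief is stored per cell, the bimodal state after the first measurement is kept without loss, which is the multi-modality that the abstract says parametric state estimators cannot account for.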

A Comparative Analysis of Deep Reinforcement Learning-based xApps in O-RAN

  • paper_url: http://arxiv.org/abs/2309.05621
  • repo_url: None
  • paper_authors: Maria Tsampazi, Salvatore D’Oro, Michele Polese, Leonardo Bonati, Gwenael Poitau, Michael Healy, Tommaso Melodia
  • for: This paper focuses on the design and evaluation of Deep Reinforcement Learning (DRL)-based xApps for Next Generation (NextG) wireless communication systems.
  • methods: A comparative analysis evaluates how different DRL-based xApp designs affect network performance. The authors benchmark 12 xApps that embed DRL agents trained with different reward functions and action spaces, and able to hierarchically control different network parameters. The xApps are prototyped and evaluated on the Colosseum emulator.
  • results: Certain design choices deliver the highest performance, while others can result in competitive behavior between different classes of traffic with similar objectives.
    Abstract The highly heterogeneous ecosystem of Next Generation (NextG) wireless communication systems calls for novel networking paradigms where functionalities and operations can be dynamically and optimally reconfigured in real time to adapt to changing traffic conditions and satisfy stringent and diverse Quality of Service (QoS) demands. Open Radio Access Network (RAN) technologies, and specifically those being standardized by the O-RAN Alliance, make it possible to integrate network intelligence into the once monolithic RAN via intelligent applications, namely, xApps and rApps. These applications enable flexible control of the network resources and functionalities, network management, and orchestration through data-driven control loops. Despite recent work demonstrating the effectiveness of Deep Reinforcement Learning (DRL) in controlling O-RAN systems, how to design these solutions in a way that does not create conflicts and unfair resource allocation policies is still an open challenge. In this paper, we perform a comparative analysis where we dissect the impact of different DRL-based xApp designs on network performance. Specifically, we benchmark 12 different xApps that embed DRL agents trained using different reward functions, with different action spaces and with the ability to hierarchically control different network parameters. We prototype and evaluate these xApps on Colosseum, the world's largest O-RAN-compliant wireless network emulator with hardware-in-the-loop. We share the lessons learned and discuss our experimental results, which demonstrate how certain design choices deliver the highest performance while others might result in a competitive behavior between different classes of traffic with similar objectives.

  • paper_url: http://arxiv.org/abs/2309.07162
  • repo_url: None
  • paper_authors: Tanay Rastogi, Michele D. Simoni, Anders Karlström
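  • for: This paper proposes a traffic state estimation (TSE) method that estimates traffic states on a road link from vehicle trajectories captured by cameras mounted on moving vehicles.
  • methods: Data from multiple moving cameras are combined into time-space diagrams, which are used to estimate the parameters of the link's fundamental diagram (FD) and the densities in unobserved regions of space-time. The Cell Transmission Model (CTM) is combined with a Genetic Algorithm (GA) to optimize the FD parameters and boundary conditions.
  • results: On simulated traffic data generated with the SUMO traffic simulator, the method achieves a low root mean square error (RMSE) of 0.0079 veh/m, comparable to other CTM-based methods.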
    Abstract Traffic State Estimation (TSE) is the process of inferring traffic conditions based on partially observed data using prior knowledge of traffic patterns. The type of input data used has a significant impact on the accuracy and methodology of TSE. Traditional TSE methods have relied on data from either stationary sensors like loop detectors or mobile sensors such as GPS-equipped floating cars. However, both approaches have their limitations. This paper proposes a method for estimating traffic states on a road link using vehicle trajectories obtained from cameras mounted on moving vehicles. It involves combining data from multiple moving cameras to construct time-space diagrams and using them to estimate parameters for the link's fundamental diagram (FD) and densities in unobserved regions of space-time. The Cell Transmission Model (CTM) is utilized in conjunction with a Genetic Algorithm (GA) to optimize the FD parameters and boundary conditions necessary for accurate estimation. To evaluate the effectiveness of the proposed methodology, simulated traffic data generated by the SUMO traffic simulator was employed incorporating 140 different space-time diagrams with varying lane density and speed. The evaluation of the simulated data demonstrates the effectiveness of the proposed approach, as it achieves a low root mean square error (RMSE) value of 0.0079 veh/m and is comparable to other CTM-based methods. In conclusion, the proposed TSE method opens new avenues for the estimation of traffic state using an innovative data collection method that uses vehicle trajectories collected from on-board cameras.
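The Cell Transmission Model at the core of the method can be sketched as a single density update per time step. This is a minimal sketch under stated assumptions, not the paper's estimator: a triangular fundamental diagram is assumed, the parameter values are illustrative, and the Genetic Algorithm that would fit the FD parameters and boundary conditions is omitted. The `rmse` helper shows the error metric quoted in the abstract.

```python
import numpy as np

def ctm_step(density, inflow_demand, outflow_supply,
             v_free, w_back, k_jam, q_max, dx, dt):
    """Advance cell densities (veh/m) by one time step of the CTM."""
    sending = np.minimum(v_free * density, q_max)              # demand of each cell
    receiving = np.minimum(q_max, w_back * (k_jam - density))  # supply of each cell
    flow = np.empty(len(density) + 1)                          # flows at cell boundaries
    flow[0] = min(inflow_demand, receiving[0])
    flow[1:-1] = np.minimum(sending[:-1], receiving[1:])
    flow[-1] = min(sending[-1], outflow_supply)
    return density + dt / dx * (flow[:-1] - flow[1:])

def rmse(estimated, reference):
    """Root mean square error between two density arrays."""
    diff = np.asarray(estimated) - np.asarray(reference)
    return float(np.sqrt(np.mean(diff ** 2)))

# Illustrative parameters: speeds in m/s, densities in veh/m, flows in veh/s.
v_free, w_back, k_jam = 15.0, 5.0, 0.15
q_max = v_free * w_back * k_jam / (v_free + w_back)  # capacity of the triangular FD
dx, dt = 100.0, 5.0                                   # satisfies v_free * dt <= dx

density = np.full(20, 0.02)
density[10:] = 0.12                                   # downstream half is congested
initial_vehicles = density.sum() * dx

for _ in range(50):
    # Zero inflow demand and zero outflow supply model a closed link.
    density = ctm_step(density, 0.0, 0.0, v_free, w_back, k_jam, q_max, dx, dt)

print("vehicles on link:", density.sum() * dx)
```

With closed boundaries the vehicle count stays constant, which is a quick sanity check on any CTM implementation. In a calibration loop, an optimizer such as a GA would vary `v_free`, `w_back`, `k_jam`, and the boundary flows to minimize `rmse` against observed densities.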

ECG-based estimation of respiratory modulation of AV nodal conduction during atrial fibrillation

  • paper_url: http://arxiv.org/abs/2309.05458
  • repo_url: https://github.com/plappertf/ecg-based_estimation_of_respiratory_modulation_of_av_nodal_conduction_during_atrial_fibrillation
  • paper_authors: Felix Plappert, Gunnar Engström, Pyotr G. Platonov, Mikael Wallman, Frida Sandberg
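  • for: This study proposes an ECG-based method for assessing respiratory modulation of the AV nodal refractory period and conduction delay, which may inform personalized atrial fibrillation (AF) treatment.
  • methods: A 1-dimensional convolutional neural network (1D-CNN) is trained on synthetic data, generated with a network model of the AV node, to estimate respiratory modulation from 1-minute segments of RR series, respiration signals, and atrial fibrillatory rates (AFR). The trained network is then applied to clinical deep breathing test data.
  • results: On synthetic data, the 1D-CNN predicts respiratory modulation from RR series alone (ρ = 0.805), and adding the respiration signal (ρ = 0.830), AFR (ρ = 0.837), or both (ρ = 0.855) improves the prediction. In clinical data from 20 patients with sufficient signal quality, respiratory modulation in response to deep breathing decreased for five patients, increased for five, and remained similar for ten, indicating large inter-patient variability.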
    Abstract Information about autonomic nervous system (ANS) activity may be valuable for personalized atrial fibrillation (AF) treatment but is not easily accessible from the ECG. In this study, we propose a new approach for ECG-based assessment of respiratory modulation in AV nodal refractory period and conduction delay. A 1-dimensional convolutional neural network (1D-CNN) was trained to estimate respiratory modulation of AV nodal conduction properties from 1-minute segments of RR series, respiration signals, and atrial fibrillatory rates (AFR) using synthetic data that replicates clinical ECG-derived data. The synthetic data were generated using a network model of the AV node and 4 million unique model parameter sets. The 1D-CNN was then used to analyze respiratory modulation in clinical deep breathing test data of 28 patients in AF, where an ECG-derived respiration signal was extracted using a novel approach based on periodic component analysis. We demonstrated using synthetic data that the 1D-CNN can predict the respiratory modulation from RR series alone ($\rho$ = 0.805) and that the addition of either respiration signal ($\rho$ = 0.830), AFR ($\rho$ = 0.837), or both ($\rho$ = 0.855) improves the prediction. Results from analysis of clinical ECG data of 20 patients with sufficient signal quality suggest that respiratory modulation decreased in response to deep breathing for five patients, increased for five patients, and remained similar for ten patients, indicating a large inter-patient variability.

Opinion Dynamics in Two-Step Process: Message Sources, Opinion Leaders and Normal Agents

  • paper_url: http://arxiv.org/abs/2309.05370
  • repo_url: None
  • paper_authors: Huisheng Wang, Yuejiang Li, Yiqing Lin, H. Vicky Zhao
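  • for: This study examines how messages spread and opinions evolve in social networks through the two-step interaction among message sources, opinion leaders, and normal agents.
  • methods: It proposes a unified framework, the Two-Step Model, for the communication process among these three groups and analyzes the model's steady-state opinions and stability.
  • results: Message distribution, initial opinion, level of stubbornness, and preference coefficient all influence the sample mean and variance of steady-state opinions, and normal agents' opinions tend to be influenced by opinion leaders. In numerical and social experiments, the Two-Step Model outperforms other models on average.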
    Abstract According to mass media theory, the dissemination of messages and the evolution of opinions in social networks follow a two-step process. First, opinion leaders receive the message from the message sources, and then they transmit their opinions to normal agents. However, most opinion models only consider the evolution of opinions within a single network, which fails to capture the two-step process accurately. To address this limitation, we propose a unified framework called the Two-Step Model, which analyzes the communication process among message sources, opinion leaders, and normal agents. In this study, we examine the steady-state opinions and stability of the Two-Step Model. Our findings reveal that several factors, such as message distribution, initial opinion, level of stubbornness, and preference coefficient, influence the sample mean and variance of steady-state opinions. Notably, normal agents' opinions tend to be influenced by opinion leaders in the two-step process. We also conduct numerical and social experiments to validate the accuracy of the Two-Step Model, which outperforms other models on average. Our results provide valuable insights into the factors that shape social opinions and can guide the development of effective strategies for opinion guidance in social networks.

Low Peak-to-Average Power Ratio FBMC-OQAM System based on Data Mapping and DFT Precoding

  • paper_url: http://arxiv.org/abs/2309.05278
  • repo_url: None
  • paper_authors: Liming Li, Liqin Ding, Yang Wang, Jiliang Zhang
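  • for: Reduce the peak-to-average power ratio (PAPR) of FBMC-OQAM, an alternative to OFDM that allows more flexible spectrum usage.
  • methods: A map-DFT-spread scheme first maps the transmitted data symbols with a conjugate symmetry rule and then codes them with the DFT, so that the OQAM pre-processing can be avoided.
  • results: The scheme achieves better PAPR reduction than simple DFT spreading. Numerical simulation of the prototype filter's effect shows a trade-off between PAPR and out-of-band performance.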
    Abstract Filter bank multicarrier with offset quadrature amplitude modulation (FBMC-OQAM) is an alternative to OFDM for enhanced spectrum flexible usage. To reduce the peak-to-average power ratio (PAPR), DFT spreading is usually adopted in OFDM systems. However, in FBMC-OQAM systems, because the OQAM pre-processing splits the spread data into the real and imaginary parts, the DFT spreading can result in only marginal PAPR reduction. This letter proposes a novel map-DFT-spread FBMC-OQAM scheme. In this scheme, the transmitting data symbols are first mapped with a conjugate symmetry rule and then coded by the DFT. According to this method, the OQAM pre-processing can be avoided. Compared with the simple DFT-spread scheme, the proposed scheme achieves a better PAPR reduction. In addition, the effect of the prototype filter on the PAPR is studied via numerical simulation and a trade-off exists between the PAPR and out-of-band performances.
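The baseline that the abstract starts from, DFT spreading in plain OFDM, can be checked numerically. The sketch below does not implement FBMC-OQAM or the proposed map-DFT-spread scheme. It only compares the PAPR of ordinary OFDM symbols with DFT-spread OFDM symbols. The subcarrier counts, the localized mapping onto the first subcarriers, and the use of QPSK are assumptions chosen for illustration.

```python
import numpy as np

def papr_db(signal):
    """Peak-to-average power ratio of a complex baseband signal in dB."""
    power = np.abs(signal) ** 2
    return 10.0 * np.log10(power.max() / power.mean())

def ofdm_symbol(data, n_fft):
    """Map data onto the first len(data) subcarriers and take the IFFT."""
    grid = np.zeros(n_fft, dtype=complex)
    grid[:len(data)] = data
    return np.fft.ifft(grid)   # n_fft > len(data) gives an oversampled signal

def dft_spread_symbol(data, n_fft):
    """Precode the data with a DFT before the OFDM modulator."""
    return ofdm_symbol(np.fft.fft(data) / np.sqrt(len(data)), n_fft)

rng = np.random.default_rng(0)
n_data, n_fft, n_trials = 64, 256, 500
bits_i = rng.choice([-1.0, 1.0], size=(n_trials, n_data))
bits_q = rng.choice([-1.0, 1.0], size=(n_trials, n_data))
qpsk = (bits_i + 1j * bits_q) / np.sqrt(2.0)

papr_ofdm = np.array([papr_db(ofdm_symbol(s, n_fft)) for s in qpsk])
papr_dfts = np.array([papr_db(dft_spread_symbol(s, n_fft)) for s in qpsk])

print(f"mean PAPR, OFDM:            {papr_ofdm.mean():.2f} dB")
print(f"mean PAPR, DFT-spread OFDM: {papr_dfts.mean():.2f} dB")
```

DFT precoding turns the multicarrier signal back into an interpolated single-carrier waveform, which is why its peaks are lower. According to the abstract, this benefit largely disappears in FBMC-OQAM once the OQAM pre-processing splits the spread data into real and imaginary parts, which motivates the mapping step proposed in the letter.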

Deep photonic reservoir computing recurrent network

  • paper_url: http://arxiv.org/abs/2309.05246
  • repo_url: None
  • paper_authors: Cheng Wang
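  • for: Solving complex real-world tasks that are out of reach for hardware reservoir computers with a single hidden layer.
  • methods: A deep photonic reservoir computing (PRC) architecture built by cascading injection-locked semiconductor lasers with all-optical connections between layers, demonstrated with 4 hidden layers and 320 interconnected neurons.
  • results: Applied to signal equalization in an optical fiber communication system, the deep PRC shows a strong ability to compensate for fiber nonlinearity.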
    Abstract Deep neural networks usually process information through multiple hidden layers. However, most hardware reservoir computing recurrent networks have only one hidden reservoir layer, which significantly limits their ability to solve complex real-world tasks. Here we show a deep photonic reservoir computing (PRC) architecture, constructed by cascading injection-locked semiconductor lasers. In particular, the connections between successive hidden layers are all-optical, without any optical-electrical or analog-digital conversion. The proof of concept is demonstrated on a PRC consisting of 4 hidden layers and 320 interconnected neurons. In addition, we apply the deep PRC to real-world signal equalization in an optical fiber communication system. The deep PRC is found to have a strong ability to compensate for fiber nonlinearity.

Joint Beamforming and Compression Design for Per-Antenna Power Constrained Cooperative Cellular Networks

  • paper_url: http://arxiv.org/abs/2309.05226
  • repo_url: None
  • paper_authors: Xilai Fan, Ya-Feng Liu, Bo Jiang
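  • for: The joint beamforming and compression problem with per-antenna power constraints in cooperative cellular networks, where relay-like base stations connect to the central processor (CP) over rate-limited fronthaul links.
  • methods: The authors first establish the equivalence between the problem and its semidefinite relaxation (SDR). They then derive the partial Lagrangian dual of the SDR problem and show that the dual objective is differentiable. Based on this, they propose two projected gradient ascent algorithms: projected exact gradient ascent (PEGA) and projected inexact gradient ascent (PIGA).
  • results: Numerical experiments demonstrate the global optimality and high efficiency of the proposed algorithms.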
    Abstract In the cooperative cellular network, relay-like base stations are connected to the central processor (CP) via rate-limited fronthaul links and the joint processing is performed at the CP, which thus can effectively mitigate the multiuser interference. In this paper, we consider the joint beamforming and compression problem with per-antenna power constraints in the cooperative cellular network. We first establish the equivalence between the considered problem and its semidefinite relaxation (SDR). Then we further derive the partial Lagrangian dual of the SDR problem and show that the objective function of the obtained dual problem is differentiable. Based on the differentiability, we propose two efficient projected gradient ascent algorithms for solving the dual problem, which are projected exact gradient ascent (PEGA) and projected inexact gradient ascent (PIGA). While PEGA is guaranteed to find the global solution of the dual problem (and hence the global solution of the original problem), PIGA is more computationally efficient due to the lower complexity in inexactly computing the gradient. Global optimality and high efficiency of the proposed algorithms are demonstrated via numerical experiments.
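
The projected gradient ascent idea behind PEGA and PIGA can be sketched on a toy problem. This is not the paper's dual problem: the concave objective, the step size, and the assumption that the feasible set is the nonnegative orthant (as is typical for Lagrange multipliers) are all chosen here for illustration.

```python
def projected_gradient_ascent(grad, project, y0, step=0.1, iters=200):
    """Maximise a differentiable concave function over a convex set.

    Each iteration takes a gradient step uphill and projects back onto the set.
    """
    y = list(y0)
    for _ in range(iters):
        g = grad(y)
        y = project([yi + step * gi for yi, gi in zip(y, g)])
    return y


# Toy objective: f(y) = -sum((y_i - c_i)^2), maximised over y >= 0.
c = [1.5, -2.0, 0.5]


def grad(y):
    """Gradient of the toy objective."""
    return [-2.0 * (yi - ci) for yi, ci in zip(y, c)]


def project(y):
    """Euclidean projection onto the nonnegative orthant."""
    return [max(0.0, yi) for yi in y]


y_star = projected_gradient_ascent(grad, project, [0.0, 0.0, 0.0])
print(y_star)  # close to [1.5, 0.0, 0.5]: negative targets are clipped at zero
```

An inexact variant in the spirit of PIGA would replace `grad` with a cheaper approximate gradient; how the paper controls that approximation error is not described in the abstract.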

Quaternion MLP Neural Networks Based on the Maximum Correntropy Criterion

  • paper_url: http://arxiv.org/abs/2309.05208
  • repo_url: None
  • paper_authors: Gang Wang, Xinyu Tian, Zuxuan Zhang
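  • for: Proposing a gradient ascent algorithm for quaternion multilayer perceptron (MLP) networks based on the cost function of the maximum correntropy criterion (MCC).
  • methods: The algorithm uses a split quaternion activation function and the generalized Hamilton-real quaternion gradient. A new quaternion operator is introduced to rewrite the early quaternion single-layer perceptron algorithm; a gradient descent algorithm is then derived for the mean square error (MSE) cost and extended to the MCC cost.
  • results: Simulations show the feasibility of the proposed method.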
    Abstract We propose a gradient ascent algorithm for quaternion multilayer perceptron (MLP) networks based on the cost function of the maximum correntropy criterion (MCC). In the algorithm, we use the split quaternion activation function based on the generalized Hamilton-real quaternion gradient. By introducing a new quaternion operator, we first rewrite the early quaternion single layer perceptron algorithm. Secondly, we propose a gradient descent algorithm for quaternion multilayer perceptron based on the cost function of the mean square error (MSE). Finally, the MSE algorithm is extended to the MCC algorithm. Simulations show the feasibility of the proposed method.
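
To see why one would maximise correntropy instead of minimising MSE, the sketch below compares the two criteria on real-valued errors. It is a simplification: the paper works with quaternion-valued signals, and the Gaussian kernel width `sigma` and the omission of the kernel's normalising constant are choices made here for illustration.

```python
import math


def mse(errors):
    """Mean square error: grows without bound as any single error grows."""
    return sum(e * e for e in errors) / len(errors)


def correntropy(errors, sigma=1.0):
    """Sample correntropy with an unnormalised Gaussian kernel.

    Each term lies in (0, 1], so one large error can lower the value only
    by a bounded amount. MCC training maximises this quantity.
    """
    return sum(math.exp(-(e * e) / (2.0 * sigma ** 2)) for e in errors) / len(errors)


clean = [0.1, -0.2, 0.05, 0.0]
with_outlier = clean + [50.0]

print("MSE:", mse(clean), "->", mse(with_outlier))
print("correntropy:", correntropy(clean), "->", correntropy(with_outlier))
```

Because correntropy is maximised rather than minimised, the MCC-based update is a gradient ascent step, which matches the wording of the abstract.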

A Review of the Applications of Quantum Machine Learning in Optical Communication Systems

  • paper_url: http://arxiv.org/abs/2309.05205
  • repo_url: None
  • paper_authors: Ark Modi, Alonso Viladomat Jasso, Roberto Ferrara, Christian Deppe, Janis Noetzel, Fred Fung, Maximilian Schaedler
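  • for: Reviewing how quantum and quantum-inspired machine learning algorithms can be applied to optical signal processing, for example in error correction for noisy received signals.
  • methods: A survey of several proposed quantum and quantum-inspired machine learning algorithms.
  • results: The review discusses the applicability of these algorithms to optical signal processing with current technology.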
    Abstract In the context of optical signal processing, quantum and quantum-inspired machine learning algorithms have massive potential for deployment. One of the applications is in error correction protocols for the received noisy signals. In some scenarios, non-linear and unknown errors can lead to noise that bypasses linear error correction protocols that optical receivers generally implement. In those cases, machine learning techniques are used to recover the transmitted signal from the received signal through various estimation procedures. Since quantum machine learning algorithms promise advantage over classical algorithms, we expect that optical signal processing can benefit from these advantages. In this review, we survey several proposed quantum and quantum-inspired machine learning algorithms and their applicability with current technology to optical signal processing.

cs.SD - 2023-09-10

Multimodal Fish Feeding Intensity Assessment in Aquaculture

  • paper_url: http://arxiv.org/abs/2309.05058
  • repo_url: None
  • paper_authors: Meng Cui, Xubo Liu, Haohe Liu, Zhuangzhuang Du, Tao Chen, Guoping Lian, Daoliang Li, Wenwu Wang
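  • for: Fish feeding intensity assessment (FFIA), which evaluates how the intensity of fish appetite changes during feeding, for industrial aquaculture applications.
  • methods: A multimodal approach that combines single-modality pre-trained models with modality-fusion methods, benchmarked on the new large-scale audio-visual dataset AV-FFIA. A unified model, U-FFIA, is trained with modality dropout and knowledge distillation so that one model can process audio, visual, or audio-visual input.
  • results: The multimodal approach substantially outperforms single-modality approaches, especially in noisy environments. U-FFIA matches or exceeds state-of-the-art modality-specific models with significantly lower computational overhead.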
    Abstract Fish feeding intensity assessment (FFIA) aims to evaluate the intensity change of fish appetite during the feeding process, which is vital in industrial aquaculture applications. The main challenges surrounding FFIA are two-fold. 1) robustness: existing work has mainly leveraged single-modality (e.g., vision, audio) methods, which have a high sensitivity to input noise. 2) efficiency: FFIA models are generally expected to be employed on devices. This presents a challenge in terms of computational efficiency. In this work, we first introduce an audio-visual dataset, called AV-FFIA. AV-FFIA consists of 27,000 labeled audio and video clips that capture different levels of fish feeding intensity. To our knowledge, AV-FFIA is the first large-scale multimodal dataset for FFIA research. Then, we introduce a multi-modal approach for FFIA by leveraging single-modality pre-trained models and modality-fusion methods, with benchmark studies on AV-FFIA. Our experimental results indicate that the multi-modal approach substantially outperforms the single-modality based approach, especially in noisy environments. While multimodal approaches provide a performance gain for FFIA, they inherently increase the computational cost. To overcome this issue, we further present a novel unified model, termed U-FFIA. U-FFIA is a single model capable of processing audio, visual, or audio-visual modalities, by leveraging modality dropout during training and knowledge distillation from single-modality pre-trained models. We demonstrate that U-FFIA can achieve performance better than or on par with the state-of-the-art modality-specific FFIA models, with significantly lower computational overhead. Our proposed U-FFIA approach enables a more robust and efficient method for FFIA, with the potential to contribute to improved management practices and sustainability in aquaculture.
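
Modality dropout, as used to train a single model on audio, visual, or audio-visual input, can be sketched as follows. The drop probability, the rule of never dropping both modalities, and the function name are assumptions made for illustration; the abstract does not describe the exact scheme used for U-FFIA.

```python
import random


def modality_dropout(audio, video, p_drop=0.3, rng=random):
    """Randomly hide one modality for a training sample.

    Returns (audio, video) where at most one entry is replaced by None, so the
    model sees audio-only, video-only, and audio-visual samples during training.
    """
    r = rng.random()
    if r < p_drop:
        return audio, None      # audio-only sample
    if r < 2 * p_drop:
        return None, video      # video-only sample
    return audio, video         # both modalities kept


rng = random.Random(0)
samples = [modality_dropout("audio_clip", "video_clip", rng=rng) for _ in range(1000)]
audio_only = sum(1 for a, v in samples if v is None)
video_only = sum(1 for a, v in samples if a is None)
print(audio_only, video_only, len(samples) - audio_only - video_only)
```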

Gray Jedi MVDR Post-filtering

  • paper_url: http://arxiv.org/abs/2309.05057
  • repo_url: https://github.com/FrancoisGrondin/mvdrpf
  • paper_authors: François Grondin, Caleb Rascón
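  • for: Improving speech quality in scenarios with multiple speech sources.
  • methods: A Minimum Variance Distortionless Response (MVDR) beamformer provides an interference estimate alongside the target speech estimate, and both are passed to a deep-learning-based post-filter.
  • results: Enhancement improves over a single-input baseline far more than it does by increasing model complexity, and post-filtering needs fewer computing resources when given both target and interference signals.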
    Abstract Spatial filters can exploit deep-learning-based speech enhancement models to increase their reliability in scenarios with multiple speech sources. To further improve speech quality, it is common to perform postfiltering on the estimated target speech obtained with spatial filtering. In this work, Minimum Variance Distortionless Response (MVDR) is employed to provide the interference estimation, along with the estimation of the target speech, to be later used for postfiltering. This improves the enhancement performance over a single-input baseline in a far more significant way than by increasing the model's complexity. Results suggest that fewer computing resources are required for postfiltering when provided with both target and interference signals, which is a step forward in developing an online speech enhancement system for multi-speech scenarios.
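
For reference, the textbook MVDR beamformer computes weights w = R⁻¹d / (dᴴR⁻¹d) from a noise covariance matrix R and a steering vector d. The sketch below does this for a hypothetical two-microphone array at a single frequency bin; the numeric values of `R` and `d` are made up, and this is not the code from the linked repository.

```python
def inv2(m):
    """Inverse of a 2x2 complex matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def vdot(u, v):
    """Hermitian inner product u^H v."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))


def mvdr_weights(noise_cov, steering):
    """MVDR weights w = R^-1 d / (d^H R^-1 d)."""
    r_inv_d = matvec(inv2(noise_cov), steering)
    denom = vdot(steering, r_inv_d)
    return [x / denom for x in r_inv_d]


def output_power(w, cov):
    """Noise power at the beamformer output, w^H R w."""
    return vdot(w, matvec(cov, w)).real


# Made-up Hermitian, positive-definite noise covariance and steering vector.
R = [[2.0 + 0j, 0.5 + 0.5j], [0.5 - 0.5j, 1.0 + 0j]]
d = [1.0 + 0j, 0.0 + 1.0j]

w = mvdr_weights(R, d)
gain = vdot(w, d)                 # distortionless constraint: w^H d = 1
w_das = [x / 2 for x in d]        # delay-and-sum weights for comparison

print("gain toward target:", gain)
print("noise power, MVDR vs delay-and-sum:", output_power(w, R), output_power(w_das, R))
```

Both beamformers pass the target direction with unit gain, but MVDR minimises the output noise power under that constraint, which is why it is a natural source for separate target and interference estimates.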

cs.CV - 2023-09-10

Beyond Skin Tone: A Multidimensional Measure of Apparent Skin Color

  • paper_url: http://arxiv.org/abs/2309.05148
  • repo_url: None
  • paper_authors: William Thong, Przemyslaw Joniak, Alice Xiang
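  • for: Measuring apparent skin color in computer vision beyond a one-dimensional skin tone scale.
  • methods: In addition to skin tone (light to dark), which the commonly used Fitzpatrick skin type classification focuses on, the paper introduces the hue angle (red to yellow) as a second dimension of skin color.
  • results: Applied to images, the hue dimension reveals additional skin-color biases in both computer vision datasets and models. The authors recommend multidimensional skin color scales, based on both tone and hue, for fairness assessments.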
    Abstract This paper strives to measure apparent skin color in computer vision, beyond a unidimensional scale on skin tone. In their seminal paper Gender Shades, Buolamwini and Gebru have shown how gender classification systems can be biased against women with darker skin tones. Subsequently, fairness researchers and practitioners have adopted the Fitzpatrick skin type classification as a common measure to assess skin color bias in computer vision systems. While effective, the Fitzpatrick scale only focuses on the skin tone ranging from light to dark. Towards a more comprehensive measure of skin color, we introduce the hue angle ranging from red to yellow. When applied to images, the hue dimension reveals additional biases related to skin color in both computer vision datasets and models. We then recommend multidimensional skin color scales, relying on both skin tone and hue, for fairness assessments.
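
A hue angle can be computed from two chromatic coordinates with `atan2`. The sketch below assumes the CIELAB color space, where +a* points toward red and +b* toward yellow; the abstract does not name the color space, so treat that choice and the sample values as assumptions.

```python
import math


def hue_angle_deg(a_star, b_star):
    """Hue angle in degrees from CIELAB a* (red axis) and b* (yellow axis).

    0 degrees lies on the red axis and 90 degrees on the yellow axis.
    """
    return math.degrees(math.atan2(b_star, a_star))


print(hue_angle_deg(10.0, 0.0))   # on the red axis
print(hue_angle_deg(10.0, 10.0))  # halfway between red and yellow
print(hue_angle_deg(0.0, 10.0))   # on the yellow axis
```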

A Skeleton-based Approach For Rock Crack Detection Towards A Climbing Robot Application

  • paper_url: http://arxiv.org/abs/2309.05139
  • repo_url: https://github.com/josselinsomervilleroberts/reachbot-predictor
  • paper_authors: Josselin Somerville Roberts, Paul-Emile Giacomelli, Yoni Gozlan, Julia Di
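  • for: Supporting grasp site identification for multi-limbed climbing robots such as ReachBot in scientifically interesting but dangerous cave environments.
  • methods: The paper presents the SKeleton Intersection Loss (SKIL), a loss for thin-object segmentation that leverages the skeleton of the label, to detect rock cracks and edges. It also proposes a new group of metrics, LineAcc, that reduces the influence of object width and is less sensitive to translation.
  • results: The fine-tuned models outperform previous methods on similar thin-object segmentation tasks such as blood vessel segmentation and show promise for integration onto a robotic system.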
    Abstract Conventional wheeled robots are unable to traverse scientifically interesting, but dangerous, cave environments. Multi-limbed climbing robot designs, such as ReachBot, are able to grasp irregular surface features and execute climbing motions to overcome obstacles, given suitable grasp locations. To support grasp site identification, we present a method for detecting rock cracks and edges, the SKeleton Intersection Loss (SKIL). SKIL is a loss designed for thin object segmentation that leverages the skeleton of the label. A dataset of rock face images was collected, manually annotated, and augmented with generated data. A new group of metrics, LineAcc, has been proposed for thin object segmentation such that the impact of the object width on the score is minimized. In addition, the metric is less sensitive to translation which can often lead to a score of zero when computing classical metrics such as Dice on thin objects. Our fine-tuned models outperform previous methods on similar thin object segmentation tasks such as blood vessel segmentation and show promise for integration onto a robotic system.
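
The claim that classical metrics can drop to zero on thin objects is easy to reproduce. The sketch below computes Dice on a one-pixel-wide line and on the same line shifted by one pixel. The `tolerant_precision` helper is a hypothetical tolerance-based score added only for contrast; it is not the paper's LineAcc metric, whose definition is not given in the abstract.

```python
def dice(pred, target):
    """Dice coefficient between two sets of (row, col) foreground pixels."""
    inter = len(pred & target)
    return 2 * inter / (len(pred) + len(target))


def tolerant_precision(pred, target, tol=1):
    """Fraction of predicted pixels within `tol` pixels of a target pixel.

    Hypothetical score for illustration only.
    """
    hits = sum(
        1
        for (r, c) in pred
        if any(
            (r + dr, c + dc) in target
            for dr in range(-tol, tol + 1)
            for dc in range(-tol, tol + 1)
        )
    )
    return hits / len(pred)


# A vertical crack one pixel wide, and the same crack predicted one column over.
line = {(r, 5) for r in range(20)}
shifted = {(r, 6) for r in range(20)}

print("Dice, exact match:", dice(line, line))
print("Dice, one-pixel shift:", dice(shifted, line))
print("tolerant score, one-pixel shift:", tolerant_precision(shifted, line))
```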

DAD++: Improved Data-free Test Time Adversarial Defense

  • paper_url: http://arxiv.org/abs/2309.05132
  • repo_url: https://github.com/vcl-iisc/data-free-defense-at-test-time
  • paper_authors: Gaurav Kumar Nayak, Inder Khatri, Shubham Randive, Ruchit Rawal, Anirban Chakraborty
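  • for: Making pre-trained deep neural networks robust to adversarial attacks in real-world, safety-critical applications when access to training data is restricted.
  • methods: A test-time Data-free Adversarial Defense (DAD) consisting of detection and correction frameworks. To improve correction when the detector is under-confident, the authors add a soft-detection scheme, giving DAD++.
  • results: Extensive experiments and ablations on several datasets and network architectures show the approach is effective. It also applies to data-free or data-efficient setups such as Data-free Knowledge Distillation, Source-free Unsupervised Domain Adaptation, and semi-supervised classification. Across all experiments, DAD++ performs well against various adversarial attacks with a minimal drop in clean accuracy.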
    Abstract With the increasing deployment of deep neural networks in safety-critical applications such as self-driving cars, medical imaging, anomaly detection, etc., adversarial robustness has become a crucial concern in the reliability of these networks in real-world scenarios. A plethora of works based on adversarial training and regularization-based techniques have been proposed to make these deep networks robust against adversarial attacks. However, these methods require either retraining models or training them from scratch, making them infeasible to defend pre-trained models when access to training data is restricted. To address this problem, we propose a test time Data-free Adversarial Defense (DAD) containing detection and correction frameworks. Moreover, to further improve the efficacy of the correction framework in cases when the detector is under-confident, we propose a soft-detection scheme (dubbed as "DAD++"). We conduct a wide range of experiments and ablations on several datasets and network architectures to show the efficacy of our proposed approach. Furthermore, we demonstrate the applicability of our approach in imparting adversarial defense at test time under data-free (or data-efficient) applications/setups, such as Data-free Knowledge Distillation and Source-free Unsupervised Domain Adaptation, as well as Semi-supervised classification frameworks. We observe that in all the experiments and applications, our DAD++ gives an impressive performance against various adversarial attacks with a minimal drop in clean accuracy. The source code is available at: https://github.com/vcl-iisc/Improved-Data-free-Test-Time-Adversarial-Defense

3D Implicit Transporter for Temporally Consistent Keypoint Discovery

  • paper_url: http://arxiv.org/abs/2309.05098
  • repo_url: https://github.com/zhongcl-thu/3d-implicit-transporter
  • paper_authors: Chengliang Zhong, Yuhang Zheng, Yupeng Zheng, Hao Zhao, Li Yi, Xiaodong Mu, Ling Wang, Pengfei Li, Guyue Zhou, Chao Yang, Xinliang Zhang, Jian Zhao
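  • for: Discovering keypoints in 3D point clouds that are consistent in both space and time, rather than relying on geometric consistency alone.
  • methods: The first 3D version of the Transporter method, built on a hybrid 3D representation, cross attention, and implicit reconstruction.
  • results: On 3D articulated objects and non-rigid animals (humans and rodents), the learned keypoints are spatio-temporally consistent. A closed-loop control strategy that uses the learned keypoints for 3D object manipulation shows superior performance.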
    Abstract Keypoint-based representation has proven advantageous in various visual and robotic tasks. However, the existing 2D and 3D methods for detecting keypoints mainly rely on geometric consistency to achieve spatial alignment, neglecting temporal consistency. To address this issue, the Transporter method was introduced for 2D data, which reconstructs the target frame from the source frame to incorporate both spatial and temporal information. However, the direct application of the Transporter to 3D point clouds is infeasible due to their structural differences from 2D images. Thus, we propose the first 3D version of the Transporter, which leverages hybrid 3D representation, cross attention, and implicit reconstruction. We apply this new learning system on 3D articulated objects and nonrigid animals (humans and rodents) and show that learned keypoints are spatio-temporally consistent. Additionally, we propose a closed-loop control strategy that utilizes the learned keypoints for 3D object manipulation and demonstrate its superior performance. Codes are available at https://github.com/zhongcl-thu/3D-Implicit-Transporter.

MaskRenderer: 3D-Infused Multi-Mask Realistic Face Reenactment

  • paper_url: http://arxiv.org/abs/2309.05095
  • repo_url: None
  • paper_authors: Tina Behrouzi, Atefeh Shahroudnejad, Payam Mousavi
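  • for: Presenting MaskRenderer, a new end-to-end, identity-agnostic face reenactment system that generates realistic, high-fidelity frames in real time.
  • methods: MaskRenderer uses (i) a 3DMM to model 3D face structure, which handles pose changes, occlusion, and mouth movements better than 2D representations; (ii) a triplet loss that embeds cross-reenactment during training to preserve identity; and (iii) multi-scale occlusion to improve inpainting and the restoration of missing areas.
  • results: Comprehensive quantitative and qualitative experiments on the VoxCeleb1 test set show that MaskRenderer outperforms state-of-the-art models on unseen faces, especially when the source and driving identities are very different.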
    Abstract We present a novel end-to-end identity-agnostic face reenactment system, MaskRenderer, that can generate realistic, high fidelity frames in real-time. Although recent face reenactment works have shown promising results, there are still significant challenges such as identity leakage and imitating mouth movements, especially for large pose changes and occluded faces. MaskRenderer tackles these problems by using (i) a 3DMM to model 3D face structure to better handle pose changes, occlusion, and mouth movements compared to 2D representations; (ii) a triplet loss function to embed the cross-reenactment during training for better identity preservation; and (iii) multi-scale occlusion, improving inpainting and restoring missing areas. Comprehensive quantitative and qualitative experiments conducted on the VoxCeleb1 test set, demonstrate that MaskRenderer outperforms state-of-the-art models on unseen faces, especially when the Source and Driving identities are very different.
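
The triplet loss mentioned in item (ii) has a standard form: it pushes an anchor embedding closer to a positive (same identity) than to a negative (different identity) by at least a margin. The sketch below shows that generic form on toy 2D embeddings; how MaskRenderer chooses its triplets during cross-reenactment is not described in the abstract, and the margin value is arbitrary.

```python
import math


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss on Euclidean distances.

    Zero when the negative is at least `margin` farther from the anchor
    than the positive is.
    """
    d_ap = math.dist(anchor, positive)
    d_an = math.dist(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


# Toy identity embeddings.
satisfied = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
violated = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
print(satisfied, violated)
```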

Sculpting Efficiency: Pruning Medical Imaging Models for On-Device Inference

  • paper_url: http://arxiv.org/abs/2309.05090
  • repo_url: None
  • paper_authors: Sudarshan Sreeram, Bernhard Kainz
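  • for: Making ML models for medical imaging deployable for real-time, on-device inference despite legacy hardware and multi-modal gigapixel images.
  • methods: Filter pruning, evaluated on segmentation models in cardiology and ophthalmology.
  • results: Preliminary results show model compression rates of up to 1148x with minimal loss in quality. At high compression rates, filter-pruned models run inference faster on a CPU than the GPU baseline, and their robustness and generalisability exceed those of the baseline and weight-pruned counterparts.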
    Abstract Applying ML advancements to healthcare can improve patient outcomes. However, the sheer operational complexity of ML models, combined with legacy hardware and multi-modal gigapixel images, poses a severe deployment limitation for real-time, on-device inference. We consider filter pruning as a solution, exploring segmentation models in cardiology and ophthalmology. Our preliminary results show a compression rate of up to 1148x with minimal loss in quality, stressing the need to consider task complexity and architectural details when using off-the-shelf models. At high compression rates, filter-pruned models exhibit faster inference on a CPU than the GPU baseline. We also demonstrate that such models' robustness and generalisability characteristics exceed that of the baseline and weight-pruned counterparts. We uncover intriguing questions and take a step towards realising cost-effective disease diagnosis, monitoring, and preventive solutions.
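
Filter pruning removes whole convolution filters rather than individual weights, which is what allows a smaller, dense model to run faster on ordinary hardware. The sketch below ranks filters by L1 norm and keeps the largest ones. The L1 criterion, the `keep_ratio` parameter, and the toy 2x2 filters are assumptions for illustration; the abstract does not state which pruning criterion the authors use.

```python
def l1_norm(filt):
    """Sum of absolute weights of one 2D filter (list of rows)."""
    return sum(abs(w) for row in filt for w in row)


def prune_filters(filters, keep_ratio):
    """Keep the filters with the largest L1 norm.

    Returns the kept indices (in original order) and the kept filters.
    """
    n_keep = max(1, round(len(filters) * keep_ratio))
    ranked = sorted(range(len(filters)), key=lambda i: l1_norm(filters[i]), reverse=True)
    kept = sorted(ranked[:n_keep])
    return kept, [filters[i] for i in kept]


# Four toy 2x2 filters with clearly different magnitudes.
filters = [
    [[0.1, -0.1], [0.0, 0.2]],
    [[1.0, -2.0], [0.5, 0.5]],
    [[0.01, 0.0], [0.0, 0.0]],
    [[-0.7, 0.3], [0.2, -0.4]],
]

kept, pruned = prune_filters(filters, keep_ratio=0.5)
print("kept filter indices:", kept)
```

In a real network, removing a filter also removes the matching input channel of the next layer, so the pruned model stays dense.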

FreeMan: Towards Benchmarking 3D Human Pose Estimation in the Wild

  • paper_url: http://arxiv.org/abs/2309.05073
  • repo_url: https://github.com/wangjiongw/freeman_api
  • paper_authors: Jiong Wang, Fengyu Yang, Wenbo Gou, Bingliang Li, Danqi Yan, Ailing Zeng, Yijun Gao, Junle Wang, Ruimao Zhang
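  • for: Providing a large-scale, real-world dataset for 3D human pose estimation, since existing datasets are mostly collected under controlled laboratory conditions.
  • methods: FreeMan was captured with 8 synchronized smartphones across diverse scenarios and comprises 11M frames from 8000 sequences, covering 40 subjects in 10 scenarios with varying lighting. An automated, precise labeling pipeline enables efficient large-scale processing.
  • results: The paper provides comprehensive evaluation baselines for a range of tasks, which underline the challenges posed by FreeMan. Further evaluations on standard indoor and outdoor human sensing datasets show that FreeMan offers robust representation transferability in real and complex scenes.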
    Abstract Estimating the 3D structure of the human body from natural scenes is a fundamental aspect of visual perception. This task carries great importance for fields like AIGC and human-robot interaction. In practice, 3D human pose estimation in real-world settings is a critical initial step in solving this problem. However, the current datasets, often collected under controlled laboratory conditions using complex motion capture equipment and unvarying backgrounds, are insufficient. The absence of real-world datasets is stalling the progress of this crucial task. To facilitate the development of 3D pose estimation, we present FreeMan, the first large-scale, real-world multi-view dataset. FreeMan was captured by synchronizing 8 smartphones across diverse scenarios. It comprises 11M frames from 8000 sequences, viewed from different perspectives. These sequences cover 40 subjects across 10 different scenarios, each with varying lighting conditions. We have also established an automated, precise labeling pipeline that allows for large-scale processing efficiently. We provide comprehensive evaluation baselines for a range of tasks, underlining the significant challenges posed by FreeMan. Further evaluations of standard indoor/outdoor human sensing datasets reveal that FreeMan offers robust representation transferability in real and complex scenes. FreeMan is now publicly available at https://wangjiongw.github.io/freeman.

Lung Diseases Image Segmentation using Faster R-CNNs

  • paper_url: http://arxiv.org/abs/2309.06386
  • repo_url: None
  • paper_authors: Mihir Jain
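  • for: Enabling timely diagnosis of lung diseases, a leading cause of child mortality in the developing world.
  • methods: A low-density neural network structure intended to mitigate topological challenges in deep networks. Parameters are incorporated into a feature pyramid to enhance data extraction and minimize information loss, and Soft Non-Maximal Suppression optimizes the regional proposals generated by the Region Proposal Network.
  • results: The model is evaluated on chest X-ray images using a confusion matrix to compute accuracy, precision, sensitivity, and specificity. The study also analyzes the trends of the regional proposal loss and classification loss during training.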
    Abstract Lung diseases are a leading cause of child mortality in the developing world, with India accounting for approximately half of global pneumonia deaths (370,000) in 2016. Timely diagnosis is crucial for reducing mortality rates. This paper introduces a low-density neural network structure to mitigate topological challenges in deep networks. The network incorporates parameters into a feature pyramid, enhancing data extraction and minimizing information loss. Soft Non-Maximal Suppression optimizes regional proposals generated by the Region Proposal Network. The study evaluates the model on chest X-ray images, computing a confusion matrix to determine accuracy, precision, sensitivity, and specificity. We analyze loss functions, highlighting their trends during training. The regional proposal loss and classification loss assess model performance during training and classification phases. This paper analyzes lung disease detection and neural network structures.
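
The four metrics named in the abstract follow directly from the entries of a binary confusion matrix. The sketch below uses made-up counts, not results from the paper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, sensitivity, and specificity from a 2x2 confusion matrix.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives, and true negatives.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),   # also called recall
        "specificity": tn / (tn + fp),
    }


# Made-up counts for 100 chest X-rays.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```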

Super-Resolution Surface Reconstruction from Few Low-Resolution Slices

  • paper_url: http://arxiv.org/abs/2309.05071
  • repo_url: https://github.com/cyiyoo/SurfaceReconstructionFromFewSlices
  • paper_authors: Yiyao Zhang, Ke Chen, Shang-Hua Yang
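  • for: Increasing the resolution of surfaces obtained from segmented features (e.g. blood vessels) so they are suitable for further numerical simulation such as finite element analysis.
  • methods: A new variational model based on an Euler-Elastica regulariser, solved with two numerical algorithms: a projected gradient descent method and the alternating direction method of multipliers.
  • results: Numerical experiments on real-life examples (including two from the output of another variational model) illustrate the model's effectiveness. Quantitative comparisons based on the standard deviation of Gaussian and mean curvatures show its advantages.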
    Abstract In many imaging applications where segmented features (e.g. blood vessels) are further used for other numerical simulations (e.g. finite element analysis), the obtained surfaces do not have fine resolutions suitable for the task. Increasing the resolution of such surfaces becomes crucial. This paper proposes a new variational model for solving this problem, based on an Euler-Elastica-based regulariser. Further, we propose and implement two numerical algorithms for solving the model, a projected gradient descent method and the alternating direction method of multipliers. Numerical experiments using real-life examples (including two from outputs of another variational model) have been illustrated for effectiveness. The advantages of the new model are shown through quantitative comparisons by the standard deviation of Gaussian curvatures and mean curvatures from the viewpoint of discrete geometry.

Exploiting CLIP for Zero-shot HOI Detection Requires Knowledge Distillation at Multiple Levels

  • paper_url: http://arxiv.org/abs/2309.05069
  • repo_url: https://github.com/bobwan1995/zeroshot-hoi-with-clip
  • paper_authors: Bo Wan, Tinne Tuytelaars
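  • for: Studies zero-shot human-object interaction (HOI) detection, a novel paradigm for identifying HOIs without task-specific annotations.
  • methods: Uses CLIP, a large-scale pre-trained vision-language model (VLM), for knowledge distillation at multiple levels, with a multi-branch neural network that learns HOI representations from global images, local union regions around human-object pairs, and individual human or object instances.
  • results: Experiments demonstrate the effectiveness of the multi-level CLIP knowledge integration strategy; the model achieves strong performance on the public HICO-DET benchmark, comparable even with some fully-supervised and weakly-supervised methods.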
    Abstract In this paper, we investigate the task of zero-shot human-object interaction (HOI) detection, a novel paradigm for identifying HOIs without the need for task-specific annotations. To address this challenging task, we employ CLIP, a large-scale pre-trained vision-language model (VLM), for knowledge distillation on multiple levels. Specifically, we design a multi-branch neural network that leverages CLIP for learning HOI representations at various levels, including global images, local union regions encompassing human-object pairs, and individual instances of humans or objects. To train our model, CLIP is utilized to generate HOI scores for both global images and local union regions that serve as supervision signals. The extensive experiments demonstrate the effectiveness of our novel multi-level CLIP knowledge integration strategy. Notably, the model achieves strong performance, which is even comparable with some fully-supervised and weakly-supervised methods on the public HICO-DET benchmark.
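    Summary This paper studies zero-shot human-object interaction (HOI) detection, a new paradigm that identifies HOIs without task-specific annotations. To tackle this challenging task, CLIP, a large-scale pre-trained vision-language model (VLM), is used for knowledge distillation at multiple levels. Specifically, a multi-branch neural network leverages CLIP to learn HOI representations at several levels: global images, local union regions that enclose human-object pairs, and individual human or object instances. For training, CLIP generates HOI scores for both global images and local union regions, and these scores serve as supervision signals. Experiments demonstrate the effectiveness of the multi-level CLIP knowledge integration strategy. Notably, the model achieves strong performance on the public HICO-DET benchmark, comparable even with some fully-supervised and weakly-supervised methods.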
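
    The abstract says CLIP-generated HOI scores supervise the model at the global-image and union-region levels, but does not state the loss. A minimal sketch, assuming a KL-divergence distillation loss between softmaxed teacher scores and student logits; the function names, the temperature, and the arrays are hypothetical:

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_scores, temperature=1.0):
    """Mean KL(teacher || student) over a batch of score vectors."""
    p_t = softmax(teacher_scores / temperature)
    p_s = softmax(student_logits / temperature)
    return float(np.mean(np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1)))

# Hypothetical scores over three HOI classes at two levels.
teacher_global = np.array([[2.0, 0.5, -1.0]])   # stands in for CLIP scores on the full image
teacher_union = np.array([[1.5, -0.5, 0.0]])    # stands in for CLIP scores on a union region
student_global = np.array([[0.1, 0.2, 0.3]])
student_union = np.array([[0.0, 0.0, 0.0]])

# Multi-level supervision is modelled here as a plain sum of per-level losses.
total = (distillation_loss(student_global, teacher_global)
         + distillation_loss(student_union, teacher_union))
print(total)
```

    Summing the per-level losses with equal weight is itself an assumption; a weighted sum would fit the same pattern.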

Multi-view Self-supervised Disentanglement for General Image Denoising

  • paper_url: http://arxiv.org/abs/2309.05049
  • repo_url: https://github.com/chqwer2/multi-view-self-supervised-disentanglement-denoising
  • paper_authors: Hao Chen, Chenyuan Qu, Yu Zhang, Chen Chen, Jianbo Jiao
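  • for: Improve the generalisation of deep-learning image denoisers to unseen noise types and to general and real noise.
  • methods: Proposes a self-supervised learning framework, Multi-view Self-supervised Disentanglement (MeD), that never looks at the clean image: it takes two differently corrupted versions of the same image as input and learns to disentangle the latent clean features from the corruptions.
  • results: On both synthetic and real noise, the method outperforms prior self-supervised approaches, especially on unseen novel noise types; on real noise it even outperforms its supervised counterparts by over 3 dB.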
    Abstract With its significant performance improvements, the deep learning paradigm has become a standard tool for modern image denoisers. While promising performance has been shown on seen noise distributions, existing approaches often suffer from generalisation to unseen noise types or general and real noise. It is understandable as the model is designed to learn paired mapping (e.g. from a noisy image to its clean version). In this paper, we instead propose to learn to disentangle the noisy image, under the intuitive assumption that different corrupted versions of the same clean image share a common latent space. A self-supervised learning framework is proposed to achieve the goal, without looking at the latent clean image. By taking two different corrupted versions of the same image as input, the proposed Multi-view Self-supervised Disentanglement (MeD) approach learns to disentangle the latent clean features from the corruptions and recover the clean image consequently. Extensive experimental analysis on both synthetic and real noise shows the superiority of the proposed method over prior self-supervised approaches, especially on unseen novel noise types. On real noise, the proposed method even outperforms its supervised counterparts by over 3 dB.
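    Summary Thanks to its significant performance gains, deep learning has become a standard tool for modern image denoisers. Although existing approaches perform well on seen noise distributions, they often generalise poorly to unseen noise types or to general and real noise, which is understandable because the model is designed to learn a paired mapping (e.g. from a noisy image to its clean version). This paper instead proposes to learn to disentangle the noisy image, under the intuitive assumption that different corrupted versions of the same clean image share a common latent space. A self-supervised framework achieves this without looking at the latent clean image: taking two different corrupted versions of the same image as input, the proposed Multi-view Self-supervised Disentanglement (MeD) approach learns to separate the latent clean features from the corruptions and thereby recover the clean image. Extensive experiments on both synthetic and real noise show that the method is superior to prior self-supervised approaches, especially on unseen novel noise types. On real noise, it even outperforms its supervised counterparts by over 3 dB.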

What Is Near?: Room Locality Learning for Enhanced Robot Vision-Language-Navigation in Indoor Living Environments

  • paper_url: http://arxiv.org/abs/2309.05036
  • repo_url: None
  • paper_authors: Muraleekrishna Gopinathan, Jumana Abu-Khalaf, David Suter, Sidike Paheding, Nathir A. Rawashdeh
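  • for: Provide navigation agents for Vision Language Navigation (VLN) with prior knowledge of common human living-space layouts, helping them locate the target room in new environments.
  • methods: Proposes WIN (What Is Near), a commonsense learning model that predicts the local neighborhood map from the current observation, the navigational history, and layout common sense.
  • results: Local-global planning based on locality knowledge and indoor layout prediction lets the agent select appropriate actions efficiently and generalizes better than classical VLN agents in unseen environments, where the model reaches a Success Rate of 68% and a Success weighted by Path Length of 63%.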
    Abstract Humans use their knowledge of common house layouts obtained from previous experiences to predict nearby rooms while navigating in new environments. This greatly helps them navigate previously unseen environments and locate their target room. To provide layout prior knowledge to navigational agents based on common human living spaces, we propose WIN (What Is Near), a commonsense learning model for Vision Language Navigation (VLN) tasks. VLN requires an agent to traverse indoor environments based on descriptive navigational instructions. Unlike existing layout learning works, WIN predicts the local neighborhood map based on prior knowledge of living spaces and current observation, operating on an imagined global map of the entire environment. The model infers neighborhood regions based on visual cues of current observations, navigational history, and layout common sense. We show that local-global planning based on locality knowledge and predicting the indoor layout allows the agent to efficiently select the appropriate action. Specifically, we devised a cross-modal transformer that utilizes this locality prior for decision-making in addition to visual inputs and instructions. Experimental results show that locality learning using WIN provides better generalizability compared to classical VLN agents in unseen environments. Our model performs favorably on standard VLN metrics, with Success Rate 68% and Success weighted by Path Length 63% in unseen environments.
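    Summary Humans use knowledge of common house layouts, gained from previous experience, to predict nearby rooms while navigating new environments, which helps them navigate previously unseen environments and locate their target room. To give navigational agents this layout prior, based on common human living spaces, the paper proposes WIN (What Is Near), a commonsense learning model for Vision Language Navigation (VLN) tasks, in which an agent must traverse indoor environments by following descriptive navigational instructions. Unlike existing layout learning work, WIN predicts the local neighborhood map from prior knowledge of living spaces and the current observation, operating on an imagined global map of the entire environment. The model infers neighborhood regions from the visual cues of current observations, the navigational history, and layout common sense. Local-global planning based on locality knowledge and indoor layout prediction lets the agent select the appropriate action efficiently; specifically, a cross-modal transformer uses this locality prior for decision-making alongside visual inputs and instructions. Experiments show that locality learning with WIN generalizes better than classical VLN agents in unseen environments, where the model reaches a Success Rate of 68% and a Success weighted by Path Length of 63%.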
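
    The two reported metrics are not defined in the digest. A minimal sketch, assuming the definition commonly used in embodied-navigation evaluation: each successful episode is credited with the ratio of the shortest-path length to the longer of the shortest and the actual path length. The episode values are made up for illustration and are not results from the paper.

```python
def success_rate(successes):
    """Fraction of episodes in which the agent reached the target."""
    return sum(successes) / len(successes)

def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length, averaged over episodes."""
    total = 0.0
    for s, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        total += s * shortest / max(taken, shortest)
    return total / len(successes)

# Four hypothetical episodes: success flag, shortest path, path actually taken.
successes = [1, 1, 0, 1]
shortest = [10.0, 8.0, 12.0, 5.0]
taken = [10.0, 16.0, 30.0, 5.0]

print(success_rate(successes))          # 0.75
print(spl(successes, shortest, taken))  # 0.625
```

    The second episode succeeds but takes a path twice as long as necessary, so it contributes only 0.5 to SPL. This is why SPL can never exceed the Success Rate.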

Unified Contrastive Fusion Transformer for Multimodal Human Action Recognition

  • paper_url: http://arxiv.org/abs/2309.05032
  • repo_url: None
  • paper_authors: Kyoung Ok Yang, Junho Koh, Jun Won Choi
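  • for: Improve human action recognition (HAR) performance by fusing multimodal data acquired by different sensors.
  • methods: Proposes a new multimodal fusion architecture, the Unified Contrastive Fusion Transformer (UCFFormer), which integrates data with diverse distributions. UCFFormer uses a Unified Transformer to capture the inter-dependency among multimodal embedding features in the time and modality domains, with Factorized Time-Modality Attention for efficient self-attention. It also incorporates contrastive learning to reduce the discrepancy between the feature distributions of different modalities, producing semantically aligned features for information fusion.
  • results: On two popular datasets, UTD-MHAD and NTU RGB+D, UCFFormer achieves state-of-the-art performance, outperforming competing methods by considerable margins.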
    Abstract Various types of sensors have been considered to develop human action recognition (HAR) models. Robust HAR performance can be achieved by fusing multimodal data acquired by different sensors. In this paper, we introduce a new multimodal fusion architecture, referred to as Unified Contrastive Fusion Transformer (UCFFormer) designed to integrate data with diverse distributions to enhance HAR performance. Based on the embedding features extracted from each modality, UCFFormer employs the Unified Transformer to capture the inter-dependency among embeddings in both time and modality domains. We present the Factorized Time-Modality Attention to perform self-attention efficiently for the Unified Transformer. UCFFormer also incorporates contrastive learning to reduce the discrepancy in feature distributions across various modalities, thus generating semantically aligned features for information fusion. Performance evaluation conducted on two popular datasets, UTD-MHAD and NTU RGB+D, demonstrates that UCFFormer achieves state-of-the-art performance, outperforming competing methods by considerable margins.
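    Summary Various types of sensors have been considered for developing human action recognition (HAR) models, and robust HAR performance can be achieved by fusing the multimodal data they acquire. This paper introduces a new multimodal fusion architecture, the Unified Contrastive Fusion Transformer (UCFFormer), designed to integrate data with diverse distributions to improve HAR performance. Based on the embedding features extracted from each modality, UCFFormer uses the Unified Transformer to capture the inter-dependency among embeddings in both the time and modality domains, and Factorized Time-Modality Attention makes the self-attention efficient. UCFFormer also incorporates contrastive learning to reduce the discrepancy in feature distributions across modalities, producing semantically aligned features for information fusion. Evaluation on two popular datasets, UTD-MHAD and NTU RGB+D, shows that UCFFormer achieves state-of-the-art performance, outperforming competing methods by considerable margins.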
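
    The abstract states that contrastive learning aligns features across modalities but does not give the loss. A minimal sketch, assuming an InfoNCE-style objective in which embeddings of the same sample from two modalities form the positive pair; the function name, temperature, and random embeddings are hypothetical:

```python
import numpy as np

def info_nce(za, zb, temperature=0.1):
    """InfoNCE loss: row i of za should match row i of zb."""
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / temperature                    # pairwise cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True) # stabilise the softmax
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
z_modality_a = rng.normal(size=(8, 16))                 # stands in for one modality's embeddings
aligned = info_nce(z_modality_a, z_modality_a)          # other modality perfectly aligned
misaligned = info_nce(z_modality_a, np.roll(z_modality_a, 1, axis=0))
print(aligned, misaligned)
```

    The loss is small when the two modalities agree on which samples belong together and large when the pairing is scrambled, which is the alignment pressure the abstract describes.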

SC-NeRF: Self-Correcting Neural Radiance Field with Sparse Views

  • paper_url: http://arxiv.org/abs/2309.05028
  • repo_url: None
  • paper_authors: Liang Song, Guangming Wang, Jiuming Liu, Zhenyang Fu, Yanzi Miao, Hesheng
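  • for: Extends the generalization task for neural radiance fields to outdoor scenes while training only on object-level datasets.
  • methods: Proposes a geometric correction module and an appearance correction module, both based on multi-head attention, to address the rendering problems caused by distribution shift and viewpoint changes in outdoor scenes.
  • results: Evaluated on four datasets (Blender, DTU, LLFF, Spaces), the network outperforms previous methods; compared with MVSNeRF on Spaces outdoor scenes, average PSNR improves from 19.369 to 25.989, SSIM from 0.838 to 0.889, and LPIPS drops from 0.265 to 0.224.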
    Abstract In recent studies, the generalization of neural radiance fields for novel view synthesis task has been widely explored. However, existing methods are limited to objects and indoor scenes. In this work, we extend the generalization task to outdoor scenes, trained only on object-level datasets. This approach presents two challenges. Firstly, the significant distributional shift between training and testing scenes leads to black artifacts in rendering results. Secondly, viewpoint changes in outdoor scenes cause ghosting or missing regions in rendered images. To address these challenges, we propose a geometric correction module and an appearance correction module based on multi-head attention mechanisms. We normalize rendered depth and combine it with light direction as query in the attention mechanism. Our network effectively corrects varying scene structures and geometric features in outdoor scenes, generalizing well from object-level to unseen outdoor scenes. Additionally, we use appearance correction module to correct appearance features, preventing rendering artifacts like blank borders and ghosting due to viewpoint changes. By combining these modules, our approach successfully tackles the challenges of outdoor scene generalization, producing high-quality rendering results. When evaluated on four datasets (Blender, DTU, LLFF, Spaces), our network outperforms previous methods. Notably, compared to MVSNeRF, our network improves average PSNR from 19.369 to 25.989, SSIM from 0.838 to 0.889, and reduces LPIPS from 0.265 to 0.224 on Spaces outdoor scenes.
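    Summary Recent studies have widely explored the generalization of neural radiance fields for novel view synthesis, but existing methods are limited to objects and indoor scenes. This work extends the generalization task to outdoor scenes while training only on object-level datasets, which raises two challenges. First, the significant distribution shift between training and testing scenes produces black artifacts in the rendering results. Second, viewpoint changes in outdoor scenes cause ghosting or missing regions in the rendered images. To address them, the paper proposes a geometric correction module and an appearance correction module, both based on multi-head attention. The rendered depth is normalized and combined with the light direction as the query of the attention mechanism, so that the network corrects the varying scene structures and geometric features of outdoor scenes and generalizes well from object-level data to unseen outdoor scenes. The appearance correction module corrects appearance features, preventing rendering artifacts such as blank borders and ghosting caused by viewpoint changes. Evaluated on four datasets (Blender, DTU, LLFF, Spaces), the network outperforms previous methods; compared with MVSNeRF on Spaces outdoor scenes, average PSNR improves from 19.369 to 25.989, SSIM from 0.838 to 0.889, and LPIPS drops from 0.265 to 0.224.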

Boosting Unsupervised Contrastive Learning Using Diffusion-Based Data Augmentation From Scratch

  • paper_url: http://arxiv.org/abs/2309.07909
  • repo_url: None
  • paper_authors: Zelin Zang, Hao Luo, Kai Wang, Panpan Zhang, Fan Wang, Stan. Z Li, Yang You
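  • for: Improve unsupervised contrastive learning through data augmentation that also works in science-related data domains.
  • methods: Proposes DiffAug, a diffusion-based data augmentation technique whose diffusion steps ensure that the augmented and original data share a smoothed latent space. DiffAug needs no labels, external data/models, or prior knowledge, because it first mines sufficient semantic knowledge about the neighborhood to constrain the diffusion steps.
  • results: DiffAug improves image classification and clustering accuracy by 1.6%~4.5%; on biological data it improves performance by up to 10.1%, with an average improvement of 5.8%. It performs well in both vision and biological domains.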
    Abstract Unsupervised contrastive learning methods have recently seen significant improvements, particularly through data augmentation strategies that aim to produce robust and generalizable representations. However, prevailing data augmentation methods, whether hand designed or based on foundation models, tend to rely heavily on prior knowledge or external data. This dependence often compromises their effectiveness and efficiency. Furthermore, the applicability of most existing data augmentation strategies is limited when transitioning to other research domains, especially science-related data. This limitation stems from the paucity of prior knowledge and labeled data available in these domains. To address these challenges, we introduce DiffAug-a novel and efficient Diffusion-based data Augmentation technique. DiffAug aims to ensure that the augmented and original data share a smoothed latent space, which is achieved through diffusion steps. Uniquely, unlike traditional methods, DiffAug first mines sufficient prior semantic knowledge about the neighborhood. This provides a constraint to guide the diffusion steps, eliminating the need for labels, external data/models, or prior knowledge. Designed as an architecture-agnostic framework, DiffAug provides consistent improvements. Specifically, it improves image classification and clustering accuracy by 1.6%~4.5%. When applied to biological data, DiffAug improves performance by up to 10.1%, with an average improvement of 5.8%. DiffAug shows good performance in both vision and biological domains.
    Summary Unsupervised contrastive learning methods have recently made significant progress, particularly through data augmentation strategies that aim to produce robust and generalizable representations. However, current data augmentation methods, whether designed by hand or based on foundation models, rely heavily on prior knowledge or external data, which can limit their effectiveness and efficiency. Most existing strategies are also hard to apply in other research domains, especially to science-related data, because little prior knowledge and labeled data are available there. To address these challenges, the paper proposes DiffAug, a novel and efficient diffusion-based data augmentation technique. DiffAug aims to ensure that the augmented and original data share a smoothed latent space, which is achieved through diffusion steps. Unlike traditional methods, DiffAug first mines sufficient prior semantic knowledge about the neighborhood; this provides a constraint that guides the diffusion steps and removes the need for labels, external data/models, or prior knowledge. As an architecture-agnostic framework, DiffAug provides consistent improvements: image classification and clustering accuracy improve by 1.6%~4.5%, and performance on biological data improves by up to 10.1%, with an average improvement of 5.8%. DiffAug performs well in both vision and biological domains.

DeViT: Decomposing Vision Transformers for Collaborative Inference in Edge Devices

  • paper_url: http://arxiv.org/abs/2309.05015
  • repo_url: None
  • paper_authors: Guanyu Xu, Zhiwei Hao, Yong Luo, Han Hu, Jianping An, Shiwen Mao
  • for: The paper aims to achieve fast and energy-efficient collaborative inference for vision transformer (ViT) models on resource-constrained edge devices, while maintaining comparable accuracy with large ViTs.
  • methods: The authors propose a collaborative inference framework called DeViT, which decomposes large ViTs into multiple small models for efficient inference on edge devices. They also design a decomposition-and-ensemble algorithm based on knowledge distillation, called DEKD, to fuse multiple small decomposed models and reduce communication overheads.
  • results: The authors achieve efficient collaborative inference for ViTs and outperform existing lightweight ViTs, striking a good trade-off between efficiency and accuracy. Their DeViTs improve end-to-end latency by 2.89 times with only 1.65% accuracy sacrifice compared to the large ViT, ViT-L/16, on the GPU server. Their DeDeiTs surpass the recent efficient ViT, MobileViT-S, by 3.54% in accuracy on ImageNet-1K, while running 1.72 times faster and requiring 55.28% lower energy consumption on the edge device.
    Abstract Recent years have witnessed the great success of vision transformer (ViT), which has achieved state-of-the-art performance on multiple computer vision benchmarks. However, ViT models suffer from vast amounts of parameters and high computation cost, leading to difficult deployment on resource-constrained edge devices. Existing solutions mostly compress ViT models to a compact model but still cannot achieve real-time inference. To tackle this issue, we propose to explore the divisibility of transformer structure, and decompose the large ViT into multiple small models for collaborative inference at edge devices. Our objective is to achieve fast and energy-efficient collaborative inference while maintaining comparable accuracy compared with large ViTs. To this end, we first propose a collaborative inference framework termed DeViT to facilitate edge deployment by decomposing large ViTs. Subsequently, we design a decomposition-and-ensemble algorithm based on knowledge distillation, termed DEKD, to fuse multiple small decomposed models while dramatically reducing communication overheads, and handle heterogeneous models by developing a feature matching module to promote the imitations of decomposed models from the large ViT. Extensive experiments for three representative ViT backbones on four widely-used datasets demonstrate our method achieves efficient collaborative inference for ViTs and outperforms existing lightweight ViTs, striking a good trade-off between efficiency and accuracy. For example, our DeViTs improve end-to-end latency by 2.89× with only 1.65% accuracy sacrifice using CIFAR-100 compared to the large ViT, ViT-L/16, on the GPU server. DeDeiTs surpass the recent efficient ViT, MobileViT-S, by 3.54% in accuracy on ImageNet-1K, while running 1.72× faster and requiring 55.28% lower energy consumption on the edge device.
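    Summary Vision transformers (ViTs) have achieved state-of-the-art performance on multiple computer vision benchmarks in recent years, but their vast number of parameters and high computation cost make them difficult to deploy on resource-constrained edge devices. Existing solutions mostly compress a ViT into a compact model, yet still cannot achieve real-time inference. To tackle this, the paper explores the divisibility of the transformer structure and decomposes a large ViT into multiple small models for collaborative inference at edge devices, aiming for fast and energy-efficient inference with accuracy comparable to large ViTs. It first proposes DeViT, a collaborative inference framework that facilitates edge deployment by decomposing large ViTs. It then designs DEKD, a decomposition-and-ensemble algorithm based on knowledge distillation, which fuses the small decomposed models while dramatically reducing communication overheads, and which handles heterogeneous models through a feature matching module that helps the decomposed models imitate the large ViT. Extensive experiments with three representative ViT backbones on four widely-used datasets show efficient collaborative inference and better results than existing lightweight ViTs. For example, on CIFAR-100 DeViTs improve end-to-end latency by 2.89× with only a 1.65% accuracy sacrifice compared to the large ViT, ViT-L/16, on the GPU server. DeDeiTs surpass the recent efficient ViT, MobileViT-S, by 3.54% in accuracy on ImageNet-1K, while running 1.72× faster and requiring 55.28% lower energy consumption on the edge device.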

Geometrically Consistent Partial Shape Matching

  • paper_url: http://arxiv.org/abs/2309.05013
  • repo_url: None
  • paper_authors: Viktoria Ehm, Paul Roetzer, Marvin Eisenberger, Maolin Gao, Florian Bernard, Daniel Cremers
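  • for: Addresses geometrically consistent partial shape matching, i.e. finding correspondences between 3D shapes when only partial observations are available, which is relevant for tasks such as shape interpolation, pose transfer, and texture transfer.
  • methods: Integrates state-of-the-art deep shape features into a novel integer linear programming formulation of partial shape matching; the optimization yields a globally optimal solution on low-resolution shapes, which is then refined with a coarse-to-fine scheme.
  • results: The method finds more reliable matchings on partial shapes than existing geometrically consistent algorithms, and its matchings are substantially smoother than those of learning-based state-of-the-art shape matching methods.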
    Abstract Finding correspondences between 3D shapes is a crucial problem in computer vision and graphics, which is for example relevant for tasks like shape interpolation, pose transfer, or texture transfer. An often neglected but essential property of matchings is geometric consistency, which means that neighboring triangles in one shape are consistently matched to neighboring triangles in the other shape. Moreover, while in practice one often has only access to partial observations of a 3D shape (e.g. due to occlusion, or scanning artifacts), there do not exist any methods that directly address geometrically consistent partial shape matching. In this work we fill this gap by proposing to integrate state-of-the-art deep shape features into a novel integer linear programming partial shape matching formulation. Our optimization yields a globally optimal solution on low resolution shapes, which we then refine using a coarse-to-fine scheme. We show that our method can find more reliable results on partial shapes in comparison to existing geometrically consistent algorithms (for which one first has to fill missing parts with a dummy geometry). Moreover, our matchings are substantially smoother than learning-based state-of-the-art shape matching methods.
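    Summary Finding correspondences between 3D shapes is a crucial problem in computer vision and graphics, relevant for tasks such as shape interpolation, pose transfer, and texture transfer. An often neglected but essential property of matchings is geometric consistency: neighboring triangles in one shape are consistently matched to neighboring triangles in the other shape. In practice one often has access only to partial observations of a 3D shape (e.g. due to occlusion or scanning artifacts), yet no existing method directly addresses geometrically consistent partial shape matching. This work fills the gap by integrating state-of-the-art deep shape features into a novel integer linear programming formulation of partial shape matching. The optimization yields a globally optimal solution on low-resolution shapes, which is then refined with a coarse-to-fine scheme. The method finds more reliable results on partial shapes than existing geometrically consistent algorithms (which first require the missing parts to be filled with a dummy geometry), and its matchings are substantially smoother than those of learning-based state-of-the-art shape matching methods.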
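
    Geometric consistency, as described above, means that neighboring triangles on one shape map to neighboring triangles on the other. The sketch below is a simplified checker for that property, not the paper's integer linear program. The data layout (adjacency as pairs of triangle indices, a matching as a dictionary) is an assumption, as are the choices to skip unmatched triangles and to allow two neighbors to map to the same triangle.

```python
def is_geometrically_consistent(adj_x, adj_y, matching):
    """Check that adjacent triangles of shape X map to adjacent triangles of shape Y.

    adj_x, adj_y: iterables of (triangle, triangle) pairs that share an edge.
    matching: dict from X triangle index to Y triangle index (may be partial).
    """
    neighbours_y = {frozenset(pair) for pair in adj_y}
    for a, b in adj_x:
        if a not in matching or b not in matching:
            continue  # partial shape: triangles without a match are skipped
        fa, fb = matching[a], matching[b]
        if fa != fb and frozenset((fa, fb)) not in neighbours_y:
            return False
    return True

# Hypothetical strips of three triangles on each shape.
adj_x = [(0, 1), (1, 2)]
adj_y = [(10, 11), (11, 12)]

print(is_geometrically_consistent(adj_x, adj_y, {0: 10, 1: 11, 2: 12}))  # True
print(is_geometrically_consistent(adj_x, adj_y, {0: 10, 1: 12, 2: 11}))  # False
print(is_geometrically_consistent(adj_x, adj_y, {0: 10, 1: 11}))         # True (partial)
```

    In the second call triangles 0 and 1 are neighbors on X, but their images 10 and 12 do not share an edge on Y, so the matching is rejected.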

  • paper_url: http://arxiv.org/abs/2309.04967
  • repo_url: None
  • paper_authors: Pengcheng Zhang, Xiao Bai, Jin Zheng, Xin Ning
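  • for: Proposes fully decoupled end-to-end person search, in which detection and re-identification (re-id), two sub-tasks with conflicting objectives, are handled within a single end-to-end model.
  • methods: A task-incremental person search network incrementally constructs the end-to-end model for the detection and re-id sub-tasks, decoupling the architecture for the two and allowing task-incremental training so that each objective is learned independently.
  • results: Comprehensive experiments demonstrate the effectiveness of the fully decoupled models for end-to-end person search.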
    Abstract End-to-end person search aims to jointly detect and re-identify a target person in raw scene images with a unified model. The detection task unifies all persons while the re-id task discriminates different identities, resulting in conflicting optimal objectives. Existing works proposed to decouple end-to-end person search to alleviate such conflict. Yet these methods are still sub-optimal on one or two of the sub-tasks due to their partially decoupled models, which limits the overall person search performance. In this paper, we propose to fully decouple person search towards optimal person search. A task-incremental person search network is proposed to incrementally construct an end-to-end model for the detection and re-id sub-task, which decouples the model architecture for the two sub-tasks. The proposed task-incremental network allows task-incremental training for the two conflicting tasks. This enables independent learning for different objectives, thus fully decoupling the model for person search. Comprehensive experimental evaluations demonstrate the effectiveness of the proposed fully decoupled models for end-to-end person search.
    Summary End-to-end person search jointly detects and re-identifies a target person in raw scene images with a unified model. Detection treats all persons as one class while re-id discriminates between identities, so the two tasks have conflicting optimal objectives. Existing works decouple end-to-end person search to alleviate this conflict, but their models are only partially decoupled and remain sub-optimal on one or both sub-tasks, which limits overall person search performance. This paper proposes to decouple person search fully. A task-incremental person search network incrementally constructs the end-to-end model for the detection and re-id sub-tasks, decoupling the model architecture for the two. The network allows task-incremental training of the two conflicting tasks, so each objective is learned independently and the person search model is fully decoupled. Comprehensive experiments demonstrate the effectiveness of the fully decoupled models for end-to-end person search.

Multi-modal Extreme Classification

  • paper_url: http://arxiv.org/abs/2309.04961
  • repo_url: https://github.com/extreme-classification/mufin
  • paper_authors: Anshul Mittal, Kunal Dahiya, Shreya Malani, Janani Ramaswamy, Seba Kuruvilla, Jitendra Ajmera, Keng-hao Chang, Sumeet Agarwal, Purushottam Kar, Manik Varma
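  • for: Extreme classification (XC) tasks with millions of labels, where datapoints and labels are endowed with visual and textual descriptors.
  • methods: Proposes MUFIN, a multi-modal architecture based on cross-modal attention, trained in a modular fashion using pre-training and positive and negative mining.
  • results: A new product-to-product recommendation dataset, MM-AmazonTitles-300K, is curated; on all datasets MUFIN offers at least 3% higher accuracy than leading text-based, image-based, and multi-modal techniques.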
    Abstract This paper develops the MUFIN technique for extreme classification (XC) tasks with millions of labels where datapoints and labels are endowed with visual and textual descriptors. Applications of MUFIN to product-to-product recommendation and bid query prediction over several millions of products are presented. Contemporary multi-modal methods frequently rely on purely embedding-based methods. On the other hand, XC methods utilize classifier architectures to offer superior accuracies than embedding only methods but mostly focus on text-based categorization tasks. MUFIN bridges this gap by reformulating multi-modal categorization as an XC problem with several millions of labels. This presents the twin challenges of developing multi-modal architectures that can offer embeddings sufficiently expressive to allow accurate categorization over millions of labels; and training and inference routines that scale logarithmically in the number of labels. MUFIN develops an architecture based on cross-modal attention and trains it in a modular fashion using pre-training and positive and negative mining. A novel product-to-product recommendation dataset MM-AmazonTitles-300K containing over 300K products was curated from publicly available amazon.com listings with each product endowed with a title and multiple images. On the all datasets MUFIN offered at least 3% higher accuracy than leading text-based, image-based and multi-modal techniques. Code for MUFIN is available at https://github.com/Extreme-classification/MUFIN
    Summary Contemporary multi-modal methods frequently rely on purely embedding-based approaches, whereas XC methods use classifier architectures that are more accurate than embedding-only methods but mostly focus on text-based categorization. MUFIN bridges this gap by reformulating multi-modal categorization as an XC problem with several millions of labels. This raises two challenges: building multi-modal architectures whose embeddings are expressive enough for accurate categorization over millions of labels, and designing training and inference routines that scale logarithmically in the number of labels. MUFIN's architecture is based on cross-modal attention and is trained in a modular fashion using pre-training and positive and negative mining. A new product-to-product recommendation dataset, MM-AmazonTitles-300K, containing over 300K products, each with a title and multiple images, was curated from publicly available amazon.com listings. On all datasets, MUFIN offered at least 3% higher accuracy than leading text-based, image-based, and multi-modal techniques. Code for MUFIN is available at https://github.com/Extreme-classification/MUFIN

SdCT-GAN: Reconstructing CT from Biplanar X-Rays with Self-driven Generative Adversarial Networks

  • paper_url: http://arxiv.org/abs/2309.04960
  • repo_url: https://github.com/csqvictory/sdct-gan
  • paper_authors: Shuangqin Cheng, Qingliang Chen, Qiyi Zhang, Ming Li, Yamuhanmode Alike, Kaile Su, Pengcheng Wen
  • for: The paper aims to address the limitations of existing methods in reconstructing 3D CT images from a limited number of 2D X-rays, by proposing a new self-driven generative adversarial network model (SdCT-GAN) that prioritizes the preservation of image details.
  • methods: The proposed SdCT-GAN model incorporates a novel auto-encoder structure in the discriminator and utilizes a Sobel Gradient Guider (SGG) idea to integrate edge information from the 2D X-ray image at the input. Additionally, the LPIPS evaluation metric is adopted to quantitatively evaluate the fine contours and textures of reconstructed images.
  • results: The proposed SdCT-GAN model outperforms mainstream state-of-the-art baselines in terms of both qualitative and quantitative results, as demonstrated through empirical studies. The reconstructed images exhibit better preservation of textural details and fine contours, as evaluated using the LPIPS metric.
    Abstract Computed Tomography (CT) is a medical imaging modality that can generate more informative 3D images than 2D X-rays. However, this advantage comes at the expense of more radiation exposure, higher costs, and longer acquisition time. Hence, the reconstruction of 3D CT images using a limited number of 2D X-rays has gained significant importance as an economical alternative. Nevertheless, existing methods primarily prioritize minimizing pixel/voxel-level intensity discrepancies, often neglecting the preservation of textural details in the synthesized images. This oversight directly impacts the quality of the reconstructed images and thus affects the clinical diagnosis. To address the deficits, this paper presents a new self-driven generative adversarial network model (SdCT-GAN), which is motivated to pay more attention to image details by introducing a novel auto-encoder structure in the discriminator. In addition, a Sobel Gradient Guider (SGG) idea is applied throughout the model, where the edge information from the 2D X-ray image at the input can be integrated. Moreover, LPIPS (Learned Perceptual Image Patch Similarity) evaluation metric is adopted that can quantitatively evaluate the fine contours and textures of reconstructed images better than the existing ones. Finally, the qualitative and quantitative results of the empirical studies justify the power of the proposed model compared to mainstream state-of-the-art baselines.
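    Summary Computed Tomography (CT) is a medical imaging modality that produces more informative 3D images than 2D X-rays, at the expense of more radiation exposure, higher costs, and longer acquisition time. Reconstructing 3D CT images from a limited number of 2D X-rays has therefore gained importance as an economical alternative. Existing methods, however, mainly minimise pixel/voxel-level intensity discrepancies and often neglect the preservation of textural details in the synthesized images, which directly degrades the reconstructed images and thus affects clinical diagnosis. To address this, the paper presents a new self-driven generative adversarial network model (SdCT-GAN), which pays more attention to image details by introducing a novel auto-encoder structure in the discriminator. A Sobel Gradient Guider (SGG) idea is applied throughout the model so that edge information from the input 2D X-ray image can be integrated. The LPIPS (Learned Perceptual Image Patch Similarity) metric is adopted because it evaluates the fine contours and textures of reconstructed images better than existing metrics. Qualitative and quantitative results of the empirical studies support the proposed model over mainstream state-of-the-art baselines.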
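
    The Sobel Gradient Guider relies on edge information extracted from the input X-ray. The sketch below shows only the standard Sobel gradient magnitude in NumPy. How SdCT-GAN feeds that edge map into the network is not described in the abstract, so the final stacking of image and edge map as two channels is an assumption made for illustration.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter2d_valid(image, kernel):
    """Cross-correlate a 2D image with a small kernel, keeping only valid positions."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * image[i:i + out_h, j:j + out_w]
    return out

def sobel_edges(image):
    """Gradient magnitude from horizontal and vertical Sobel responses."""
    gx = filter2d_valid(image, SOBEL_X)
    gy = filter2d_valid(image, SOBEL_Y)
    return np.hypot(gx, gy)

# Hypothetical 5x6 "X-ray" with a vertical step edge between columns 2 and 3.
xray = np.zeros((5, 6))
xray[:, 3:] = 1.0
edges = sobel_edges(xray)
print(edges)

# One possible way to expose edges to a network: stack them with the cropped image.
guided_input = np.stack([xray[1:-1, 1:-1], edges])
```

    The edge map is two pixels smaller in each dimension because only valid filter positions are kept, which is why the image is cropped before stacking.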

Semi-Supervised learning for Face Anti-Spoofing using Apex frame

  • paper_url: http://arxiv.org/abs/2309.04958
  • repo_url: None
  • paper_authors: Usman Muhammad, Mourad Oussalah, Jorma Laaksonen
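  • for: Improve the performance of face anti-spoofing.
  • methods: Computes apex frames as Gaussian-weighted sums of video frames, without the need for convolution, and uses labeled and unlabeled apex frames for semi-supervised learning.
  • results: On four face anti-spoofing databases (CASIA, REPLAY-ATTACK, OULU-NPU, MSU-MFSD), apex frames improve face anti-spoofing performance.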
    Abstract Conventional feature extraction techniques in the face anti-spoofing domain either analyze the entire video sequence or focus on a specific segment to improve model performance. However, identifying the optimal frames that provide the most valuable input for the face anti-spoofing remains a challenging task. In this paper, we address this challenge by employing Gaussian weighting to create apex frames for videos. Specifically, an apex frame is derived from a video by computing a weighted sum of its frames, where the weights are determined using a Gaussian distribution centered around the video's central frame. Furthermore, we explore various temporal lengths to produce multiple unlabeled apex frames using a Gaussian function, without the need for convolution. By doing so, we leverage the benefits of semi-supervised learning, which considers both labeled and unlabeled apex frames to effectively discriminate between live and spoof classes. Our key contribution emphasizes the apex frame's capacity to represent the most significant moments in the video, while unlabeled apex frames facilitate efficient semi-supervised learning, as they enable the model to learn from videos of varying temporal lengths. Experimental results using four face anti-spoofing databases: CASIA, REPLAY-ATTACK, OULU-NPU, and MSU-MFSD demonstrate the apex frame's efficacy in advancing face anti-spoofing techniques.
    Summary Conventional feature extraction techniques for face anti-spoofing either analyze the entire video sequence or focus on a specific segment to improve model performance, but identifying the frames that provide the most valuable input remains challenging. This paper addresses the challenge by using Gaussian weighting to create apex frames for videos. An apex frame is derived from a video by computing a weighted sum of its frames, where the weights follow a Gaussian distribution centered on the video's central frame. Various temporal lengths are explored to produce multiple unlabeled apex frames with a Gaussian function, without the need for convolution. This makes it possible to exploit semi-supervised learning, which uses both labeled and unlabeled apex frames to discriminate between live and spoof classes. The key contribution is the apex frame's capacity to represent the most significant moments in the video, while unlabeled apex frames enable efficient semi-supervised learning from videos of varying temporal lengths. Experiments on four face anti-spoofing databases (CASIA, REPLAY-ATTACK, OULU-NPU, and MSU-MFSD) demonstrate the apex frame's efficacy.
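
    The apex-frame construction described above is a weighted sum of frames with Gaussian weights centered on the middle frame. A minimal sketch under two assumptions not stated in the abstract: the weights are normalized to sum to one, and the spread is controlled by a hypothetical `sigma` parameter measured in frames.

```python
import numpy as np

def apex_frame(frames, sigma):
    """Gaussian-weighted sum of frames, centered on the video's central frame.

    frames: array of shape (T, H, W) or (T, H, W, C).
    sigma: spread of the Gaussian in frames (hypothetical parameter).
    """
    frames = np.asarray(frames, dtype=float)
    t = np.arange(frames.shape[0])
    centre = (frames.shape[0] - 1) / 2.0
    weights = np.exp(-0.5 * ((t - centre) / sigma) ** 2)
    weights /= weights.sum()                      # assumption: normalize the weights
    return np.tensordot(weights, frames, axes=1)  # sum over the time axis

# Toy video: 5 frames of size 2x2 where frame i is filled with the value i.
video = np.stack([np.full((2, 2), float(i)) for i in range(5)])
apex = apex_frame(video, sigma=1.0)
print(apex)

# Several temporal lengths give several apex frames from the same video.
apex_frames = [apex_frame(video[:n], sigma=1.0) for n in (3, 5)]
```

    Because the weights are symmetric around the central frame, the apex frame of this linearly increasing toy video equals its middle frame.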

Anatomy Completor: A Multi-class Completion Framework for 3D Anatomy Reconstruction

  • paper_url: http://arxiv.org/abs/2309.04956
  • repo_url: None
  • paper_authors: Jianning Li, Antonio Pepe, Gijs Luijten, Christina Schwarz-Gsaxner, Jens Kleesiek, Jan Egger
  • for: Reconstructing anatomies that are missing from imaging data because of surgical, pathological, or traumatic factors, or simply because image acquisition did not cover them.
  • methods: Two paradigms based on a 3D denoising auto-encoder (DAE): (i) the DAE learns a many-to-one mapping between incomplete and complete instances; (ii) the DAE directly learns a one-to-one residual mapping between the incomplete instances and the target anatomies.
  • results: Results show that the method produces reasonable anatomy reconstructions given instances with different levels of incompleteness (i.e., one or multiple random anatomies are missing).
    Abstract In this paper, we introduce a completion framework to reconstruct the geometric shapes of various anatomies, including organs, vessels and muscles. Our work targets a scenario where one or multiple anatomies are missing in the imaging data due to surgical, pathological or traumatic factors, or simply because these anatomies are not covered by image acquisition. Automatic reconstruction of the missing anatomies benefits many applications, such as organ 3D bio-printing, whole-body segmentation, animation realism, paleoradiology and forensic imaging. We propose two paradigms based on a 3D denoising auto-encoder (DAE) to solve the anatomy reconstruction problem: (i) the DAE learns a many-to-one mapping between incomplete and complete instances; (ii) the DAE learns directly a one-to-one residual mapping between the incomplete instances and the target anatomies. We apply a loss aggregation scheme that enables the DAE to learn the many-to-one mapping more effectively and further enhances the learning of the residual mapping. On top of this, we extend the DAE to a multiclass completor by assigning a unique label to each anatomy involved. We evaluate our method using a CT dataset with whole-body segmentations. Results show that our method produces reasonable anatomy reconstructions given instances with different levels of incompleteness (i.e., one or multiple random anatomies are missing). Codes and pretrained models are publicly available at https://github.com/Jianningli/medshapenet-feedback/tree/main/anatomy-completor
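The difference between the two DAE paradigms comes down to what the network is asked to output. The sketch below builds training pairs for both cases from a labeled segmentation. It is a hedged illustration only: the helper name `make_training_pair`, the flat label list, and the reading of the "residual" target as exactly the removed anatomies are assumptions, not details confirmed by the abstract.

```python
def make_training_pair(complete, missing_labels, residual=False):
    """Build (input, target) from a flat list of integer anatomy labels (0 = background).

    residual=False: many-to-one paradigm, target is the complete instance.
    residual=True:  residual paradigm, target is only the missing anatomies (assumed).
    """
    incomplete = [0 if v in missing_labels else v for v in complete]
    if residual:
        target = [v if v in missing_labels else 0 for v in complete]
    else:
        target = list(complete)
    return incomplete, target

# Hypothetical 5-voxel volume with anatomies labeled 1, 2 and 3; anatomy 2 is removed.
volume = [1, 1, 2, 0, 3]
inp_full, tgt_full = make_training_pair(volume, {2})
inp_res, tgt_res = make_training_pair(volume, {2}, residual=True)
```

Many different incomplete inputs (one per choice of `missing_labels`) share the same complete target in the first paradigm, which is where the "many-to-one" name comes from.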

Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation

  • paper_url: http://arxiv.org/abs/2309.04946
  • repo_url: https://github.com/yuangan/eat_code
  • paper_authors: Yuan Gan, Zongxin Yang, Xihang Yue, Lingyun Sun, Yi Yang
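  • for: Making audio-driven talking-head synthesis more flexible and efficient, and providing precise control over the emotion expressed.
  • methods: A pretrained emotion-agnostic talking-head transformer is extended with three lightweight adaptations (Deep Emotional Prompts, Emotional Deformation Network, and Emotional Adaptation Module) to enable precise and realistic emotion control.
  • results: The method achieves state-of-the-art performance on widely used benchmarks, including LRW and MEAD. The parameter-efficient adaptations also generalize well when emotional training videos are scarce or nonexistent.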
    Abstract Audio-driven talking-head synthesis is a popular research topic for virtual human-related applications. However, the inflexibility and inefficiency of existing methods, which necessitate expensive end-to-end training to transfer emotions from guidance videos to talking-head predictions, are significant limitations. In this work, we propose the Emotional Adaptation for Audio-driven Talking-head (EAT) method, which transforms emotion-agnostic talking-head models into emotion-controllable ones in a cost-effective and efficient manner through parameter-efficient adaptations. Our approach utilizes a pretrained emotion-agnostic talking-head transformer and introduces three lightweight adaptations (the Deep Emotional Prompts, Emotional Deformation Network, and Emotional Adaptation Module) from different perspectives to enable precise and realistic emotion controls. Our experiments demonstrate that our approach achieves state-of-the-art performance on widely-used benchmarks, including LRW and MEAD. Additionally, our parameter-efficient adaptations exhibit remarkable generalization ability, even in scenarios where emotional training videos are scarce or nonexistent. Project website: https://yuangan.github.io/eat/

Text-driven Editing of 3D Scenes without Retraining

  • paper_url: http://arxiv.org/abs/2309.04917
  • repo_url: None
  • paper_authors: Shuangkang Fang, Yufeng Wang, Yi Yang, Yi-Hsuan Tsai, Wenrui Ding, Ming-Hsuan Yang, Shuchang Zhou
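  • for: A text-driven method (DN2N) that lets users edit a 3D scene from a text description, without retraining a model per scene.
  • methods: Off-the-shelf text-based 2D image editing models modify the 3D scene images, and a filtering step discards poorly edited images that disrupt 3D consistency. The remaining inconsistency is treated as a noise-perturbation removal problem, addressed by generating training data with similar perturbation characteristics and by cross-view regularization terms.
  • results: The method supports multiple editing types, including appearance editing, weather transition, material changing, and style transfer, and the authors describe it as more friendly, intuitive, and practical than prior work.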
    Abstract Numerous diffusion models have recently been applied to image synthesis and editing. However, editing 3D scenes is still in its early stages. It poses various challenges, such as the requirement to design specific methods for different editing types, retraining new models for various 3D scenes, and the absence of convenient human interaction during editing. To tackle these issues, we introduce a text-driven editing method, termed DN2N, which allows for the direct acquisition of a NeRF model with universal editing capabilities, eliminating the requirement for retraining. Our method employs off-the-shelf text-based editing models of 2D images to modify the 3D scene images, followed by a filtering process to discard poorly edited images that disrupt 3D consistency. We then consider the remaining inconsistency as a problem of removing noise perturbation, which can be solved by generating training data with similar perturbation characteristics for training. We further propose cross-view regularization terms to help the generalized NeRF model mitigate these perturbations. Our text-driven method allows users to edit a 3D scene with their desired description, which is more friendly, intuitive, and practical than prior works. Empirical results show that our method achieves multiple editing types, including but not limited to appearance editing, weather transition, material changing, and style transfer. Most importantly, our method generalizes well with editing abilities shared among a set of model parameters without requiring a customized editing model for some specific scenes, thus inferring novel views with editing effects directly from user input. The project website is available at http://sk-fun.fun/DN2N

Transformers in Small Object Detection: A Benchmark and Survey of State-of-the-Art

  • paper_url: http://arxiv.org/abs/2309.04902
  • repo_url: https://github.com/arekavandi/transformer-sod
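  • paper_authors: Aref Miri Rekavandi, Shima Rashidi, Farid Boussaid, Stephen Hoefs, Emre Akbas, Mohammed Bennamoun
  • for: Examining the performance benefits of transformers in small object detection (SOD) and identifying potential reasons for their advantage over CNN-based detectors.
  • methods: A survey that builds a taxonomy of over 60 studies (2020 to 2023) on transformers developed for SOD, covering generic images, aerial images, medical images, active millimeter images, underwater images, and videos.
  • results: The paper compiles 12 large-scale datasets suitable for SOD and compares the reviewed studies on metrics such as mean Average Precision (mAP), Frames Per Second (FPS), and number of parameters.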
    Abstract Transformers have rapidly gained popularity in computer vision, especially in the field of object recognition and detection. Upon examining the outcomes of state-of-the-art object detection methods, we noticed that transformers consistently outperformed well-established CNN-based detectors in almost every video or image dataset. While transformer-based approaches remain at the forefront of small object detection (SOD) techniques, this paper aims to explore the performance benefits offered by such extensive networks and identify potential reasons for their SOD superiority. Small objects have been identified as one of the most challenging object types in detection frameworks due to their low visibility. We aim to investigate potential strategies that could enhance transformers' performance in SOD. This survey presents a taxonomy of over 60 research studies on developed transformers for the task of SOD, spanning the years 2020 to 2023. These studies encompass a variety of detection applications, including small object detection in generic images, aerial images, medical images, active millimeter images, underwater images, and videos. We also compile and present a list of 12 large-scale datasets suitable for SOD that were overlooked in previous studies and compare the performance of the reviewed studies using popular metrics such as mean Average Precision (mAP), Frames Per Second (FPS), number of parameters, and more. Researchers can keep track of newer studies on our web page, which is available at https://github.com/arekavandi/Transformer-SOD.
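Since the survey compares detectors by mean Average Precision, a small sketch of how average precision is computed for one class may help. This is a generic illustration, assuming the common "area under the monotone precision-recall envelope" definition; the exact mAP variant used by each reviewed study is not stated in the abstract, and the function name and inputs are hypothetical.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve for one class.

    scores: confidence of each detection.
    is_true_positive: whether each detection matched a ground-truth object.
    num_ground_truth: number of ground-truth objects (assumed > 0).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)
    # Make precision non-increasing in recall (monotone envelope).
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap

# Three detections for a class with two ground-truth objects: hit, miss, hit.
ap_example = average_precision([0.9, 0.8, 0.7], [True, False, True], 2)
```

mAP is then the mean of this quantity over classes (and, in some benchmarks, over several IoU thresholds).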

cs.AI - 2023-09-10

Collecting Visually-Grounded Dialogue with A Game Of Sorts

  • paper_url: http://arxiv.org/abs/2309.05162
  • repo_url: https://github.com/willemsenbram/a-game-of-sorts
  • paper_authors: Bram Willemsen, Dmytro Kalpakchi, Gabriel Skantze
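  • for: Studying referring expression production and grounding in (situated) dialogue as a collaborative process rather than an exchange of minimally specified referring expressions.
  • methods: A collaborative image ranking task, a grounded agreement game called "A Game Of Sorts", in which players must agree on how to rank a set of images under a sorting criterion through largely unrestricted, role-symmetric dialogue.
  • results: A small-scale data collection experiment yields discussions that involve the collaborative referential process, with players arguing and negotiating toward agreement. The collected data, the codebase, and a containerized version of the application are publicly available.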
    Abstract An idealized, though simplistic, view of the referring expression production and grounding process in (situated) dialogue assumes that a speaker must merely appropriately specify their expression so that the target referent may be successfully identified by the addressee. However, referring in conversation is a collaborative process that cannot be aptly characterized as an exchange of minimally-specified referring expressions. Concerns have been raised regarding assumptions made by prior work on visually-grounded dialogue that reveal an oversimplified view of conversation and the referential process. We address these concerns by introducing a collaborative image ranking task, a grounded agreement game we call "A Game Of Sorts". In our game, players are tasked with reaching agreement on how to rank a set of images given some sorting criterion through a largely unrestricted, role-symmetric dialogue. By putting emphasis on the argumentation in this mixed-initiative interaction, we collect discussions that involve the collaborative referential process. We describe results of a small-scale data collection experiment with the proposed task. All discussed materials, which includes the collected data, the codebase, and a containerized version of the application, are publicly available.

Faster, Lighter, More Accurate: A Deep Learning Ensemble for Content Moderation

  • paper_url: http://arxiv.org/abs/2309.05150
  • repo_url: None
  • paper_authors: Mohammad Hosseini, Mahmudul Hasan
  • for: addresses the increasing need for efficient and accurate content moderation
  • methods: combines simple visual features with a lightweight ensemble of models
  • results: achieves significant improvements in prediction accuracy with 7.64x faster inference and lower computation cost compared to popular deep learning models such as ResNet-50.
    Abstract To address the increasing need for efficient and accurate content moderation, we propose an efficient and lightweight deep classification ensemble structure. Our approach is based on a combination of simple visual features, designed for high-accuracy classification of violent content with low false positives. Our ensemble architecture utilizes a set of lightweight models with narrowed-down color features, and we apply it to both images and videos. We evaluated our approach using a large dataset of explosion and blast contents and compared its performance to popular deep learning models such as ResNet-50. Our evaluation results demonstrate significant improvements in prediction accuracy, while benefiting from 7.64x faster inference and lower computation cost. While our approach is tailored to explosion detection, it can be applied to other similar content moderation and violence detection use cases as well. Based on our experiments, we propose a "think small, think many" philosophy in classification scenarios. We argue that transforming a single, large, monolithic deep model into a verification-based step model ensemble of multiple small, simple, and lightweight models with narrowed-down visual features can possibly lead to predictions with higher accuracy.
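The "think small, think many" idea can be sketched as a cascade in which each small model must verify the prediction before it is accepted. The code below is only an illustration under stated assumptions: the abstract does not specify how the ensemble combines its members, so the all-stages-must-agree rule, the feature names, and the thresholds are all hypothetical.

```python
def cascade_predict(features, stages):
    """Return (is_positive, rejecting_stage).

    stages: list of (name, check) pairs, where check(features) -> bool.
    A sample is positive only if every stage accepts it (assumed combination rule),
    which trades some recall for fewer false positives.
    """
    for name, check in stages:
        if not check(features):
            return False, name
    return True, None

# Hypothetical narrowed-down color features and thresholds for explosion detection.
stages = [
    ("orange_ratio", lambda f: f["orange_ratio"] > 0.30),
    ("brightness", lambda f: f["brightness"] > 0.60),
    ("smoke_gray_ratio", lambda f: f["smoke_gray_ratio"] > 0.10),
]

explosion_like = {"orange_ratio": 0.45, "brightness": 0.80, "smoke_gray_ratio": 0.20}
sunset_like = {"orange_ratio": 0.50, "brightness": 0.70, "smoke_gray_ratio": 0.02}
result_explosion = cascade_predict(explosion_like, stages)
result_sunset = cascade_predict(sunset_like, stages)
```

Because a negative decision can be returned at the first failing stage, cheap checks placed early also reduce average inference cost.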

Representation Learning in Low-rank Slate-based Recommender Systems

  • paper_url: http://arxiv.org/abs/2309.08622
  • repo_url: None
  • paper_authors: Yijia Dai, Wen Sun
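  • for: Optimizing recommendations for long-term user engagement with reinforcement learning.
  • methods: A sample-efficient representation learning algorithm that uses the standard slate recommendation setup to treat the task as an online RL problem with low-rank Markov decision processes (MDPs).
  • results: The authors construct a recommender simulation environment with the proposed setup and sampling method.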
    Abstract Reinforcement learning (RL) in recommendation systems offers the potential to optimize recommendations for long-term user engagement. However, the environment often involves large state and action spaces, which makes it hard to efficiently learn and explore. In this work, we propose a sample-efficient representation learning algorithm, using the standard slate recommendation setup, to treat this as an online RL problem with low-rank Markov decision processes (MDPs). We also construct the recommender simulation environment with the proposed setup and sampling method.
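In a low-rank MDP the transition probability factorizes as an inner product, P(s' | s, a) = <phi(s, a), mu(s')>, with a feature dimension d much smaller than the number of states. The sketch below shows this factorization on a toy example. It assumes the standard low-rank MDP definition; the specific numbers and the choice of `phi` as a probability vector over latent factors are illustrative assumptions, not taken from the paper.

```python
def transition_probs(phi_sa, mu):
    """Next-state distribution P(. | s, a) = sum_k phi_sa[k] * mu[k][.].

    phi_sa: length-d list of non-negative weights summing to 1 (assumed latent-factor form).
    mu: d rows, each a probability distribution over next states.
    """
    num_states = len(mu[0])
    return [sum(phi_sa[k] * mu[k][s] for k in range(len(mu))) for s in range(num_states)]

# Toy example: d = 2 latent factors, 3 next states.
mu = [
    [1.0, 0.0, 0.0],  # factor 0 always leads to state 0
    [0.0, 0.5, 0.5],  # factor 1 splits between states 1 and 2
]
phi_sa = [0.25, 0.75]  # features of one hypothetical (state, slate) pair
probs = transition_probs(phi_sa, mu)
```

Representation learning in this setting means estimating `phi` (and `mu`) from interaction data instead of assuming they are given.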

Outlier Robust Adversarial Training

  • paper_url: http://arxiv.org/abs/2309.05145
  • repo_url: https://github.com/discovershu/orat
  • paper_authors: Shu Hu, Zhenhuan Yang, Xin Wang, Yiming Ying, Siwei Lyu
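  • for: Supervised learning models that are robust to low-quality training data (such as outliers) and to adversarial attacks at inference time simultaneously.
  • methods: Outlier Robust Adversarial Training (ORAT), based on a bi-level optimization formulation of adversarial training with a robust rank-based loss function.
  • results: Experiments on three benchmark datasets show that ORAT handles outliers and adversarial attacks effectively. Theoretically, its learning objective satisfies $\mathcal{H}$-consistency in binary classification, and uniform convergence rates are given in high probability.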
    Abstract Supervised learning models are challenged by the intrinsic complexities of training data such as outliers and minority subpopulations and intentional attacks at inference time with adversarial samples. While traditional robust learning methods and the recent adversarial training approaches are designed to handle each of the two challenges, to date, no work has been done to develop models that are robust with regard to the low-quality training data and the potential adversarial attack at inference time simultaneously. It is for this reason that we introduce Outlier Robust Adversarial Training (ORAT) in this work. ORAT is based on a bi-level optimization formulation of adversarial training with a robust rank-based loss function. Theoretically, we show that the learning objective of ORAT satisfies the $\mathcal{H}$-consistency in binary classification, which establishes it as a proper surrogate to adversarial 0/1 loss. Furthermore, we analyze its generalization ability and provide uniform convergence rates in high probability. ORAT can be optimized with a simple algorithm. Experimental evaluations on three benchmark datasets demonstrate the effectiveness and robustness of ORAT in handling outliers and adversarial attacks. Our code is available at https://github.com/discovershu/ORAT.
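To make "rank-based loss" concrete, here is one simple member of that family: sort the per-sample losses, drop the few largest as suspected outliers, and average a range of the remainder. This is a hedged illustration only; the abstract does not give ORAT's exact loss, and the function name and parameters below are assumptions.

```python
def ranked_range_loss(losses, num_dropped, num_kept):
    """Average of the losses ranked (num_dropped+1) .. (num_dropped+num_kept), largest first.

    Dropping the top-ranked losses limits the influence of outliers,
    while still focusing on comparatively hard samples.
    """
    ranked = sorted(losses, reverse=True)
    kept = ranked[num_dropped:num_dropped + num_kept]
    return sum(kept) / len(kept)

# Hypothetical per-sample (adversarial) losses; 9.0 looks like an outlier.
batch_losses = [0.2, 9.0, 0.5, 0.1]
robust_value = ranked_range_loss(batch_losses, num_dropped=1, num_kept=2)
plain_mean = sum(batch_losses) / len(batch_losses)
```

In an adversarial-training loop the per-sample losses would be computed on adversarially perturbed inputs (the inner level), and a loss of this kind would be minimized over the model parameters (the outer level).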

Large Language Models for Difficulty Estimation of Foreign Language Content with Application to Language Learning

  • paper_url: http://arxiv.org/abs/2309.05142
  • repo_url: None
  • paper_authors: Michalis Vlachos, Mircea Lungu, Yash Raj Shrestha, Johannes-Rudolf David
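  • for: Helping foreign-language learners improve proficiency by identifying content on topics they are interested in that closely matches their proficiency level.
  • methods: Large language models are used to discover content on topics the learner cares about and to estimate the linguistic difficulty of that content more precisely than traditional readability measures. Both textual and video content are covered, with video difficulty derived from captions.
  • results: A language-learning solution, centered on French content, that adapts topics and difficulty to the learner's evolving interests and learning objectives, with the aim of keeping learners engaged.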
    Abstract We use large language models to help learners enhance proficiency in a foreign language. This is accomplished by identifying content on topics that the user is interested in, and that closely align with the learner's proficiency level in that foreign language. Our work centers on French content, but our approach is readily transferable to other languages. Our solution offers several distinctive characteristics that differentiate it from existing language-learning solutions, such as a) the discovery of content across topics that the learner cares about, thus increasing motivation, b) a more precise estimation of the linguistic difficulty of the content than traditional readability measures, and c) the availability of both textual and video-based content. The linguistic complexity of video content is derived from the video captions. It is our aspiration that such technology will enable learners to remain engaged in the language-learning process by continuously adapting the topics and the difficulty of the content to align with the learners' evolving interests and learning objectives.

Signal Temporal Logic Neural Predictive Control

  • paper_url: http://arxiv.org/abs/2309.05131
  • repo_url: None
  • paper_authors: Yue Meng, Chuchu Fan
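  • for: Ensuring safety and meeting temporal specifications in long-term robotic tasks.
  • methods: A neural network controller is learned directly to satisfy requirements specified in Signal Temporal Logic (STL). In training it learns to roll out trajectories that maximize the STL robustness score; in testing, similar to Model Predictive Control (MPC), it predicts a trajectory within a planning horizon. A backup policy ensures safety when the controller fails.
  • results: On six tasks, the method with the backup policy outperforms classical methods (MPC, STL-solver) and model-free and model-based RL methods in STL satisfaction rate, especially on tasks with complex STL specifications, while being 10X-100X faster than the classical methods.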
    Abstract Ensuring safety and meeting temporal specifications are critical challenges for long-term robotic tasks. Signal temporal logic (STL) has been widely used to systematically and rigorously specify these requirements. However, traditional methods of finding the control policy under those STL requirements are computationally complex and not scalable to high-dimensional systems or systems with complex nonlinear dynamics. Reinforcement learning (RL) methods can learn the policy to satisfy the STL specifications via hand-crafted or STL-inspired rewards, but might encounter unexpected behaviors due to ambiguity and sparsity in the reward. In this paper, we propose a method to directly learn a neural network controller to satisfy the requirements specified in STL. Our controller learns to roll out trajectories to maximize the STL robustness score in training. In testing, similar to Model Predictive Control (MPC), the learned controller predicts a trajectory within a planning horizon to ensure the satisfaction of the STL requirement in deployment. A backup policy is designed to ensure safety when our controller fails. Our approach can adapt to various initial conditions and environmental parameters. We conduct experiments on six tasks, where our method with the backup policy outperforms the classical methods (MPC, STL-solver), model-free and model-based RL methods in STL satisfaction rate, especially on tasks with complex STL specifications while being 10X-100X faster than the classical methods.
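The STL robustness score that the controller maximizes has a simple form for basic formulas. Under the standard quantitative semantics, "always x > c" is scored by the minimum margin over the trajectory and "eventually x > c" by the maximum margin; a positive score means the formula is satisfied. The sketch below shows only these two operators on a scalar signal, and the function names are hypothetical.

```python
def always_greater(signal, threshold):
    """Robustness of G (x > threshold): the worst-case margin over the trajectory."""
    return min(x - threshold for x in signal)

def eventually_greater(signal, threshold):
    """Robustness of F (x > threshold): the best-case margin over the trajectory."""
    return max(x - threshold for x in signal)

# Hypothetical rollout of a distance-to-obstacle signal over three time steps.
distances = [1.0, 0.5, 2.0]
keeps_clear = always_greater(distances, 0.0)      # stays above 0 with margin 0.5
keeps_far = always_greater(distances, 1.0)        # violated: margin is negative
reaches_far = eventually_greater(distances, 1.0)  # satisfied at the last step
```

Because these scores are built from min and max over the predicted trajectory, they can serve as a training objective that rewards satisfying the specification with a larger margin.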

The online learning architecture with edge computing for high-level control for assisting patients

  • paper_url: http://arxiv.org/abs/2309.05130
  • repo_url: None
  • paper_authors: Yue Shi, Yihui Zhao
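  • for: Enhancing mobility and rehabilitation for individuals with lower-limb impairments caused by conditions such as spinal cord injuries, strokes, and degenerative diseases.
  • methods: An online adversarial learning architecture integrated with edge computing, in which sensor data from the user is processed in real time at edge nodes, for high-level lower-limb exoskeleton control.
  • results: Experimental evaluations show significant improvements in control accuracy and adaptability, as well as enhanced quality-of-service (QoS) metrics, indicating that combining online adversarial learning with edge computing is a robust and efficient approach for next-generation lower-limb exoskeleton control.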
    Abstract The prevalence of mobility impairments due to conditions such as spinal cord injuries, strokes, and degenerative diseases is on the rise globally. Lower-limb exoskeletons have been increasingly recognized as a viable solution for enhancing mobility and rehabilitation for individuals with such impairments. However, existing exoskeleton control systems often suffer from limitations such as latency, lack of adaptability, and computational inefficiency. To address these challenges, this paper introduces a novel online adversarial learning architecture integrated with edge computing for high-level lower-limb exoskeleton control. In the proposed architecture, sensor data from the user is processed in real-time through edge computing nodes, which then interact with an online adversarial learning model. This model adapts to the user's specific needs and controls the exoskeleton with minimal latency. Experimental evaluations demonstrate significant improvements in control accuracy and adaptability, as well as enhanced quality-of-service (QoS) metrics. These findings indicate that the integration of online adversarial learning with edge computing offers a robust and efficient approach for the next generation of lower-limb exoskeleton control systems.

WIP: Development of a Student-Centered Personalized Learning Framework to Advance Undergraduate Robotics Education

  • paper_url: http://arxiv.org/abs/2309.05124
  • repo_url: None
  • paper_authors: Ponkoj Chandra Shill, Rui Wu, Hossein Jamali, Bryan Hutchins, Sergiu Dascalu, Frederick C. Harris, David Feil-Seifer
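  • for: Providing robotics students with a personalized learning environment, addressing the scarcity of skilled robotics instructors (particularly in community colleges) and the high cost of training equipment.
  • methods: A robotics instruction system with a web-based interface that is compatible with less expensive hardware, with freely distributed teaching materials organized as small units in a hierarchical dependency tree, so that more robotics courses can be offered at two- and four-year schools and universities.
  • results: In an evaluation of a five-module mini-course, students reported a positive experience with the online content and scored it highly on relatedness, mastery, and autonomy, indicating strong motivation potential for this approach.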
    Abstract This paper presents a work-in-progress on a learning system that will provide robotics students with a personalized learning environment. This addresses both the scarcity of skilled robotics instructors, particularly in community colleges, and the expensive demand for training equipment. The study of robotics at the college level represents a wide range of interests, experiences, and aims. This project works to provide students the flexibility to adapt their learning to their own goals and prior experience. We are developing a system to enable robotics instruction through a web-based interface that is compatible with less expensive hardware. Therefore, the free distribution of teaching materials will empower educators. This project has the potential to increase the number of robotics courses offered at both two- and four-year schools and universities. The course materials are being designed with small units and a hierarchical dependency tree in mind; students will be able to customize their course of study based on the robotics skills they have already mastered. We present an evaluation of a five-module mini-course in robotics. Students indicated that they had a positive experience with the online content. They also scored the experience highly on relatedness, mastery, and autonomy perspectives, demonstrating strong motivation potential for this approach.
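The "small units with a hierarchical dependency tree" design can be sketched as a prerequisite map from which the system derives the units a student may take next. This is a minimal illustration; the unit names, the dictionary layout, and the function name are hypothetical and not taken from the course materials.

```python
def available_units(prerequisites, mastered):
    """Units not yet mastered whose prerequisites have all been mastered."""
    return sorted(
        unit
        for unit, prereqs in prerequisites.items()
        if unit not in mastered and all(p in mastered for p in prereqs)
    )

# Hypothetical five-unit mini-course.
prerequisites = {
    "intro": [],
    "sensors": ["intro"],
    "motors": ["intro"],
    "line_following": ["sensors", "motors"],
    "mapping": ["line_following"],
}

newcomer_options = available_units(prerequisites, mastered=set())
experienced_options = available_units(prerequisites, mastered={"intro", "sensors", "motors"})
```

A student who already has some robotics skills simply starts with a larger `mastered` set, which is one way to realize the customization the abstract describes.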

High Fidelity Fast Simulation of Human in the Loop Human in the Plant (HIL-HIP) Systems

  • paper_url: http://arxiv.org/abs/2309.06558
  • repo_url: None
  • paper_authors: Ayan Banerjee, Payal Kamboj, Aranyak Maity, Riya Sudhakar Salian, Sandeep K. S. Gupta
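  • for: Addressing the simulation slowdown caused by non-linearities that arise from time variance when wireless mobile networks are integrated with human-in-the-loop, human-in-the-plant (HIL-HIP) physical systems.
  • methods: Time variance is handled by deriving a series of piecewise linear time-invariant simulations (PLIS) over intervals, which are then concatenated in the time domain. The paper formally analyzes how discretizing time-varying components affects simulation accuracy and speedup.
  • results: For an artificial pancreas wireless network system, the PLIS approach achieves accurate simulation with greater than 2.1 times speedup over a non-linear system simulation for the given dataset.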
    Abstract Non-linearities in simulation arise from the time variance in wireless mobile networks when integrated with human in the loop, human in the plant (HIL-HIP) physical systems under dynamic contexts, leading to simulation slowdown. Time variance is handled by deriving a series of piecewise linear time invariant simulations (PLIS) in intervals, which are then concatenated in time domain. In this paper, we conduct a formal analysis of the impact of discretizing time-varying components in wireless network-controlled HIL-HIP systems on simulation accuracy and speedup, and evaluate trade-offs with reliable guarantees. We develop an accurate simulation framework for an artificial pancreas wireless network system that controls blood glucose in Type 1 Diabetes patients with time varying properties such as physiological changes associated with psychological stress and meal patterns. The PLIS approach achieves accurate simulation with greater than 2.1 times speedup than a non-linear system simulation for the given dataset.
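The PLIS idea of freezing a time-varying component within each interval and concatenating the resulting linear time-invariant pieces can be sketched on a toy scalar system dx/dt = a(t) x. This is an illustrative stand-in for the paper's artificial pancreas model, which is not described in enough detail here to reproduce; the function name and the choice to freeze a(t) at the start of each interval are assumptions.

```python
import math

def simulate_piecewise_lti(a_of_t, x0, t_end, num_intervals):
    """Simulate dx/dt = a(t) * x by holding a(t) constant within each interval.

    Each interval is a linear time-invariant system with the exact solution
    x(t + dt) = x(t) * exp(a_k * dt); the pieces are concatenated in time.
    """
    dt = t_end / num_intervals
    x = x0
    for k in range(num_intervals):
        a_k = a_of_t(k * dt)  # freeze the time-varying coefficient at the interval start
        x *= math.exp(a_k * dt)
    return x

# Toy time-varying decay a(t) = -t, whose exact solution is x0 * exp(-t^2 / 2).
exact = math.exp(-0.5)
coarse = simulate_piecewise_lti(lambda t: -t, 1.0, 1.0, num_intervals=4)
fine = simulate_piecewise_lti(lambda t: -t, 1.0, 1.0, num_intervals=64)
```

Fewer intervals mean fewer pieces to simulate but a larger discretization error, which mirrors the accuracy-versus-speedup trade-off the paper analyzes.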

A compendium of data sources for data science, machine learning, and artificial intelligence

  • paper_url: http://arxiv.org/abs/2309.05682
  • repo_url: None
  • paper_authors: Paul Bilokon, Oleksandr Bilokon, Saeed Amen
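  • for: Providing a list of data sources for data science, machine learning, and artificial intelligence, to help practitioners of all levels of seniority across application areas.
  • methods: Lists data sources, with brief descriptions, across multiple application areas: finance and economics, legal (laws and regulations), life sciences (medicine and drug discovery), news sentiment and social media, retail and ecommerce, satellite imagery, shipping and logistics, and sports.
  • results: A broad but inevitably incomplete compendium of data sources.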
    Abstract Recent advances in data science, machine learning, and artificial intelligence, such as the emergence of large language models, are leading to an increasing demand for data that can be processed by such models. While data sources are application-specific, and it is impossible to produce an exhaustive list of such data sources, it seems that a comprehensive, rather than complete, list would still benefit data scientists and machine learning experts of all levels of seniority. The goal of this publication is to provide just such an (inevitably incomplete) list -- or compendium -- of data sources across multiple areas of applications, including finance and economics, legal (laws and regulations), life sciences (medicine and drug discovery), news sentiment and social media, retail and ecommerce, satellite imagery, shipping and logistics, and sports.

Deep Learning-Aided Subspace-Based DOA Recovery for Sparse Arrays

  • paper_url: http://arxiv.org/abs/2309.05109
  • repo_url: None
  • paper_authors: Yoav Amiel, Dor H. Shmuel, Nir Shlezinger, Wasim Huleihel
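  • for: Subspace-based direction-of-arrival (DoA) recovery from sparse, miscalibrated arrays with coherent sources.
  • methods: Sparse-SubspaceNet uses a dedicated deep network that learns from data how to compute a surrogate virtual array covariance that is divisible into distinguishable subspaces.
  • results: The method copes with coherent sources and miscalibrated sparse arrays while preserving the interpretability and suitability of model-based subspace DoA estimators.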
    Abstract Sparse arrays enable resolving more directions of arrival (DoAs) than antenna elements using non-uniform arrays. This is typically achieved by reconstructing the covariance of a virtual large uniform linear array (ULA), which is then processed by subspace DoA estimators. However, these methods assume that the signals are non-coherent and the array is calibrated; the latter is often challenging to achieve in sparse arrays, where one cannot access the virtual array elements. In this work, we propose Sparse-SubspaceNet, which leverages deep learning to enable subspace-based DoA recovery from sparse miscalibrated arrays with coherent sources. Sparse-SubspaceNet utilizes a dedicated deep network to learn from data how to compute a surrogate virtual array covariance that is divisible into distinguishable subspaces. By doing so, we learn to cope with coherent sources and miscalibrated sparse arrays, while preserving the interpretability and the suitability of model-based subspace DoA estimators.
    摘要 稀疏数组可以解决更多的方向来源(DoAs)than antenna element using non-uniform arrays. 通常通过重建虚拟大 uniform linear array(ULA)的协方差来实现这一点,然后使用子空间DoA估计器进行处理。然而,这些方法假设信号是非几何的和数组是calibrated; 后者经常是稀疏数组中的挑战。在这种情况下,我们提出了Sparse-SubspaceNet,这是一种使用深度学习来实现基于subspace的DoA恢复从稀疏不calibrated数组中的几何源。Sparse-SubspaceNet使用专门的深度网络来学习从数据中如何计算一个可分解的虚拟数组协方差,这使得我们可以处理几何源和不calibrated稀疏数组,同时保持模型基于subspace DoA估计器的可读性和适用性。

AGent: A Novel Pipeline for Automatically Creating Unanswerable Questions

  • paper_url: http://arxiv.org/abs/2309.05103
  • repo_url: https://github.com/sonqt/agent-unanswerable
  • paper_authors: Son Quoc Tran, Gia-Huy Do, Phong Nguyen-Thuan Do, Matt Kretchmar, Xinya Du
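  • for: Improve Extractive Question Answering (EQA) models by automatically generating unanswerable questions for training, so that models avoid extracting misleading or incorrect answers.
  • methods: Proposes the AGent pipeline, which automatically creates unanswerable questions by re-matching a question with a context that lacks the information needed to answer it.
  • results: EQA models fine-tuned on the AGent-generated unanswerable questions perform comparably to models fine-tuned on the SQuAD 2.0 dataset.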
    Abstract The development of large high-quality datasets and high-performing models has led to significant advancements in the domain of Extractive Question Answering (EQA). This progress has sparked considerable interest in exploring unanswerable questions within the EQA domain. Training EQA models with unanswerable questions helps them avoid extracting misleading or incorrect answers for queries that lack valid responses. However, manually annotating unanswerable questions is labor-intensive. To address this, we propose AGent, a novel pipeline that automatically creates new unanswerable questions by re-matching a question with a context that lacks the necessary information for a correct answer. In this paper, we demonstrate the usefulness of this AGent pipeline by creating two sets of unanswerable questions from answerable questions in SQuAD and HotpotQA. These created question sets exhibit low error rates. Additionally, models fine-tuned on these questions show comparable performance with those fine-tuned on the SQuAD 2.0 dataset on multiple EQA benchmarks.
    Summary The development of large, high-quality datasets and high-performing models has driven significant progress in Extractive Question Answering (EQA) and sparked broad interest in unanswerable questions. Training EQA models with unanswerable questions helps them avoid extracting misleading or incorrect answers, but annotating such questions by hand is labor-intensive. AGent is a new pipeline that automatically creates unanswerable questions by re-matching a question with a context that lacks the necessary information. Two sets of unanswerable questions created from answerable questions in SQuAD and HotpotQA exhibit low error rates, and models fine-tuned on them perform comparably on multiple EQA benchmarks.
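
    The re-matching idea in the abstract can be illustrated with a toy sketch. This is not the authors' pipeline: the `Example` type, the `rematch_unanswerable` helper, and the plain substring check are assumptions made for illustration, and a real system would need stronger retrieval and filtering than a string test.

```python
from typing import NamedTuple


class Example(NamedTuple):
    question: str
    context: str
    answer: str


def rematch_unanswerable(examples: list[Example]) -> list[tuple[str, str]]:
    """Pair each question with another example's context that lacks its answer."""
    pairs = []
    for i, ex in enumerate(examples):
        for j, other in enumerate(examples):
            if i == j:
                continue
            # Keep the candidate only if the gold answer string is absent.
            if ex.answer.lower() not in other.context.lower():
                pairs.append((ex.question, other.context))
                break  # one unanswerable pairing per question
    return pairs


examples = [
    Example("Who wrote Hamlet?",
            "Hamlet is a tragedy written by William Shakespeare.",
            "William Shakespeare"),
    Example("What is the capital of France?",
            "Paris is the capital and largest city of France.",
            "Paris"),
]
for question, context in rematch_unanswerable(examples):
    print(question, "->", context)
```

    Absence of the answer string is only a weak proxy for unanswerability (a paraphrase could still answer the question), which is presumably why the abstract reports error rates for the generated sets.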

Exploring Social Choice Mechanisms for Recommendation Fairness in SCRUF

  • paper_url: http://arxiv.org/abs/2309.08621
  • repo_url: https://github.com/that-recsys-lab/scruf_facctrec_2023
  • paper_authors: Amanda Aird, Cassidy All, Paresha Farastu, Elena Stefancova, Joshua Sun, Nicholas Mattei, Robin Burke
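  • for: This paper addresses fairness in recommender systems, specifically the tension between multiple, competing fairness concerns.
  • methods: It formulates fairness as a social choice problem within a multi-agent architecture of fairness concerns and explores a range of choice and allocation mechanisms.
  • results: Experiments on real and synthetic data show that different classes of choice and allocation mechanisms yield different but consistent fairness/accuracy tradeoffs, and that the multi-agent formulation adapts flexibly to user population dynamics.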
    Abstract Fairness problems in recommender systems often have a complexity in practice that is not adequately captured in simplified research formulations. A social choice formulation of the fairness problem, operating within a multi-agent architecture of fairness concerns, offers a flexible and multi-aspect alternative to fairness-aware recommendation approaches. Leveraging social choice allows for increased generality and the possibility of tapping into well-studied social choice algorithms for resolving the tension between multiple, competing fairness concerns. This paper explores a range of options for choice mechanisms in multi-aspect fairness applications using both real and synthetic data and shows that different classes of choice and allocation mechanisms yield different but consistent fairness / accuracy tradeoffs. We also show that a multi-agent formulation offers flexibility in adapting to user population dynamics.
    Summary Fairness problems in recommender systems are more complex in practice than simplified research formulations capture. Casting fairness as a social choice problem within a multi-agent architecture of fairness concerns gives a flexible, multi-aspect alternative, adds generality, and allows well-studied social choice algorithms to resolve tensions among multiple competing fairness concerns. Using real and synthetic data, the paper evaluates choice and allocation mechanisms in multi-aspect fairness applications and shows that different classes of mechanism yield different but consistent fairness/accuracy tradeoffs. The multi-agent formulation is also flexible enough to adapt to user population dynamics.
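
    As a concrete picture of what a choice mechanism does, the sketch below merges the rankings of a base recommender and two fairness agents with the Borda count, a classic social choice rule. It is an illustrative assumption, not the SCRUF API, and the abstract does not say which specific mechanisms were tested; the `borda` function and the item names are hypothetical.

```python
from collections import defaultdict


def borda(rankings: list[list[str]]) -> list[str]:
    """Combine ranked lists with the Borda count; ties broken alphabetically."""
    scores: dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - 1 - position  # top item earns n - 1 points
    return sorted(scores, key=lambda item: (-scores[item], item))


recommender = ["a", "b", "c", "d"]  # accuracy-driven ranking
agent_1 = ["c", "a", "d", "b"]      # hypothetical fairness agent
agent_2 = ["d", "c", "a", "b"]      # hypothetical fairness agent
print(borda([recommender, agent_1, agent_2]))
```

    Swapping `borda` for another rule (plurality, Copeland, a weighted variant) changes how strongly the fairness agents can override the recommender, which is the kind of tradeoff the paper compares.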

Variance Reduction of Resampling for Sequential Monte Carlo

  • paper_url: http://arxiv.org/abs/2309.08620
  • repo_url: https://github.com/986876245/variance-reduction-for-smc
  • paper_authors: Xiongming Dai, Gerald Baumgartner
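  • for: This paper proposes a resampling scheme for sequential Monte Carlo that replaces low-weight particles with higher-weight ones, so that hidden Markov models can be approximated more quickly and accurately.
  • methods: Resampling uses a repetitive deterministic domain with median ergodicity.
  • results: The method attains the lowest variance among the compared resampling methods and, with $M\ll N$, runs faster than the state of the art, as verified theoretically and in linear and non-linear hidden Markov model experiments.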
    Abstract A resampling scheme provides a way to switch low-weight particles for sequential Monte Carlo with higher-weight particles representing the objective distribution. The less the variance of the weight distribution is, the more concentrated the effective particles are, and the quicker and more accurate it is to approximate the hidden Markov model, especially for the nonlinear case. We propose a repetitive deterministic domain with median ergodicity for resampling and have achieved the lowest variances compared to the other resampling methods. As the size of the deterministic domain $M\ll N$ (the size of population), given a feasible size of particles, our algorithm is faster than the state of the art, which is verified by theoretical deduction and experiments of a hidden Markov model in both the linear and non-linear cases.
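    Summary A resampling scheme swaps low-weight particles in sequential Monte Carlo for higher-weight particles that represent the target distribution. The lower the variance of the weight distribution, the more concentrated the effective particles, and the faster and more accurately the hidden Markov model is approximated, especially in the non-linear case. The authors propose a repetitive deterministic domain with median ergodicity for resampling and report the lowest variance among the compared methods. With deterministic domain size $M\ll N$ (the population size) and a feasible number of particles, the algorithm is faster than the state of the art, as verified by theoretical deduction and by hidden Markov model experiments in linear and non-linear cases.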
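
    For readers unfamiliar with resampling, the sketch below shows systematic resampling, a standard baseline, together with the effective sample size that measures weight concentration. It is not the paper's deterministic-domain method, whose details the abstract does not give; function names are illustrative.

```python
import random


def effective_sample_size(weights: list[float]) -> float:
    """ESS = 1 / sum(w_i^2) for normalized weights."""
    total = sum(weights)
    return 1.0 / sum((w / total) ** 2 for w in weights)


def systematic_resample(weights: list[float], rng: random.Random) -> list[int]:
    """Return indices of particles chosen by systematic resampling."""
    n = len(weights)
    total = sum(weights)
    start = rng.random() / n  # single uniform draw shared by all positions
    indices = []
    cumulative = weights[0] / total
    i = 0
    for k in range(n):
        position = start + k / n
        # Advance to the particle whose cumulative weight covers this position.
        while position > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i] / total
        indices.append(i)
    return indices


weights = [0.01, 0.01, 0.9, 0.08]
print(effective_sample_size(weights))
print(systematic_resample(weights, random.Random(0)))
```

    With these weights the third particle dominates, so it is duplicated while low-weight particles tend to be dropped, which is the "switch" the abstract describes.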

Neural-Hidden-CRF: A Robust Weakly-Supervised Sequence Labeler

  • paper_url: http://arxiv.org/abs/2309.05086
  • repo_url: https://github.com/junchenzhi/neural-hidden-crf
  • paper_authors: Zhijun Chen, Hailong Sun, Wanhao Zhang, Chunyi Xu, Qianren Mao, Pengpeng Chen
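  • for: Solve the weakly-supervised sequence labeling problem.
  • methods: A neuralized undirected graphical model with a hidden CRF layer models the word sequence, the latent ground-truth sequence, and the weak label sequence from a global perspective.
  • results: New state-of-the-art results on one crowdsourcing benchmark and three weak-supervision benchmarks, including outperforming the recent model CHMM by 2.80 F1 points and 2.23 F1 points in average generalization and inference performance, respectively.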
    Abstract We propose a neuralized undirected graphical model called Neural-Hidden-CRF to solve the weakly-supervised sequence labeling problem. Under the umbrella of probabilistic undirected graph theory, the proposed Neural-Hidden-CRF embedded with a hidden CRF layer models the variables of word sequence, latent ground truth sequence, and weak label sequence with the global perspective that undirected graphical models particularly enjoy. In Neural-Hidden-CRF, we can capitalize on the powerful language model BERT or other deep models to provide rich contextual semantic knowledge to the latent ground truth sequence, and use the hidden CRF layer to capture the internal label dependencies. Neural-Hidden-CRF is conceptually simple and empirically powerful. It obtains new state-of-the-art results on one crowdsourcing benchmark and three weak-supervision benchmarks, including outperforming the recent advanced model CHMM by 2.80 F1 points and 2.23 F1 points in average generalization and inference performance, respectively.
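    Summary Neural-Hidden-CRF is a neuralized undirected graphical model for weakly-supervised sequence labeling. Within probabilistic undirected graph theory, its hidden CRF layer models the word sequence, the latent ground-truth sequence, and the weak label sequence from the global perspective that undirected graphical models offer. BERT or other deep models supply rich contextual semantic knowledge to the latent ground-truth sequence, and the hidden CRF layer captures internal label dependencies. The model is conceptually simple and empirically strong: it sets new state-of-the-art results on one crowdsourcing benchmark and three weak-supervision benchmarks, outperforming the recent model CHMM by 2.80 F1 points and 2.23 F1 points in average generalization and inference performance, respectively.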

An Appraisal-Based Chain-Of-Emotion Architecture for Affective Language Model Game Agents

  • paper_url: http://arxiv.org/abs/2309.05076
  • repo_url: None
  • paper_authors: Maximilian Croissant, Madeleine Frister, Guy Schofield, Cade McCall
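  • for: This study addresses challenges in building believable, natural, and interactive artificial agents, in particular agents that effectively simulate human emotions.
  • methods: It uses large language models (LLMs), which may tap common patterns in situational appraisal, to solve emotional intelligence tasks, and tests a new chain-of-emotion architecture in a video game setting.
  • results: The chain-of-emotion architecture outperforms standard LLM architectures on a range of user experience and content analysis metrics, providing early evidence of how to construct and test affective agents based on cognitive processes represented in language models.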
    Abstract The development of believable, natural, and interactive digital artificial agents is a field of growing interest. Theoretical uncertainties and technical barriers present considerable challenges to the field, particularly with regards to developing agents that effectively simulate human emotions. Large language models (LLMs) might address these issues by tapping common patterns in situational appraisal. In three empirical experiments, this study tests the capabilities of LLMs to solve emotional intelligence tasks and to simulate emotions. It presents and evaluates a new chain-of-emotion architecture for emotion simulation within video games, based on psychological appraisal research. Results show that it outperforms standard LLM architectures on a range of user experience and content analysis metrics. This study therefore provides early evidence of how to construct and test affective agents based on cognitive processes represented in language models.
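    Summary Building believable, natural, and interactive digital artificial agents is a field of growing interest, but theoretical uncertainties and technical barriers pose considerable challenges, especially for simulating human emotions. Large language models (LLMs) might address these issues by capturing common patterns in situational appraisal. Three experiments test the ability of LLMs to solve emotional intelligence tasks and to simulate emotions in video games. The new chain-of-emotion architecture outperforms standard LLM architectures on user experience and content analysis metrics, providing early evidence for how to construct and test affective agents based on language models.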

Chebyshev Particles

  • paper_url: http://arxiv.org/abs/2309.06373
  • repo_url: https://github.com/986876245/chebyshevparticles
  • paper_authors: Xiongming Dai, Gerald Baumgartner
  • for: This paper targets parameter inference for hidden Markov models (HMMs), especially where the Monte Carlo sampler struggles with the curse of dimensionality.
  • methods: It proposes a new criterion, maximizing the weighted Riesz polarization quantity, to discretize rectifiable submanifolds via pairwise interaction, and embeds the resulting Chebyshev particles into sequential MCMC.
  • results: Experiments show high performance for parameter inference in a linear Gaussian state-space model (LGSSM) with synthetic data and a non-linear stochastic volatility model (NLSM) with real-world data.
    Abstract Markov chain Monte Carlo (MCMC) provides a feasible method for inferring Hidden Markov models, however, it is often computationally prohibitive, especially constrained by the curse of dimensionality, as the Monte Carlo sampler traverses randomly taking small steps within uncertain regions in the parameter space. We are the first to consider the posterior distribution of the objective as a mapping of samples in an infinite-dimensional Euclidean space where deterministic submanifolds are embedded and propose a new criterion by maximizing the weighted Riesz polarization quantity, to discretize rectifiable submanifolds via pairwise interaction. We study the characteristics of Chebyshev particles and embed them into sequential MCMC, a novel sampler with a high acceptance ratio that proposes only a few evaluations. We have achieved high performance from the experiments for parameter inference in a linear Gaussian state-space model with synthetic data and a non-linear stochastic volatility model with real-world data.
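    Summary Markov chain Monte Carlo (MCMC) is a feasible way to infer hidden Markov models, but it is often computationally prohibitive, especially under the curse of dimensionality, because the sampler moves randomly in small steps through uncertain regions of the parameter space. The authors are the first to treat the posterior distribution as a mapping of samples in an infinite-dimensional Euclidean space with embedded deterministic submanifolds, and they propose a new criterion, maximizing the weighted Riesz polarization quantity, to discretize rectifiable submanifolds via pairwise interaction. They study the characteristics of Chebyshev particles and embed them into sequential MCMC, giving a sampler with a high acceptance ratio that needs only a few evaluations. Experiments show high performance for parameter inference in a linear Gaussian state-space model with synthetic data and a non-linear stochastic volatility model with real-world data.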

Spatiotemporal Graph Neural Networks with Uncertainty Quantification for Traffic Incident Risk Prediction

  • paper_url: http://arxiv.org/abs/2309.05072
  • repo_url: https://github.com/sttdanonymous/sttd
  • paper_authors: Xiaowei Gao, Xinke Jiang, Dingyi Zhuang, Huanfa Chen, Shenhao Wang, James Haworth
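  • for: Predicting traffic incident risk at granular spatiotemporal levels is challenging: the data are dominated by zeros (no incident) with sporadic high-risk values for severe incidents, and most current models, especially deep learning methods, estimate only risk values while overlooking the uncertainty that comes from the inherently unpredictable nature of incidents.
  • methods: The paper introduces Spatiotemporal Zero-Inflated Tweedie Graph Neural Networks (STZITD-GNNs), which combine the reliability of traditional statistical models with the flexibility of graph neural networks to quantify the uncertainty of traffic incident risk. The model uses a compound model from the Tweedie family, with a Poisson distribution for risk frequency and a Gamma distribution for incident severity; a zero-inflated component identifies non-incident scenarios.
  • results: On real-world traffic data from London, UK, the model surpasses current benchmarks in accuracy and also curtails uncertainty, giving robust predictions over short (7-day) and extended (14-day) horizons.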
    Abstract Predicting traffic incident risks at granular spatiotemporal levels is challenging. The datasets predominantly feature zero values, indicating no incidents, with sporadic high-risk values for severe incidents. Notably, a majority of current models, especially deep learning methods, focus solely on estimating risk values, overlooking the uncertainties arising from the inherently unpredictable nature of incidents. To tackle this challenge, we introduce the Spatiotemporal Zero-Inflated Tweedie Graph Neural Networks (STZITD-GNNs). Our model merges the reliability of traditional statistical models with the flexibility of graph neural networks, aiming to precisely quantify uncertainties associated with road-level traffic incident risks. This model strategically employs a compound model from the Tweedie family, as a Poisson distribution to model risk frequency and a Gamma distribution to account for incident severity. Furthermore, a zero-inflated component helps to identify the non-incident risk scenarios. As a result, the STZITD-GNNs effectively capture the dataset's skewed distribution, placing emphasis on infrequent but impactful severe incidents. Empirical tests using real-world traffic data from London, UK, demonstrate that our model excels beyond current benchmarks. The forte of STZITD-GNN resides not only in its accuracy but also in its adeptness at curtailing uncertainties, delivering robust predictions over short (7 days) and extended (14 days) timeframes.
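    Summary Predicting traffic incident risk at granular spatiotemporal levels is challenging. The data consist mostly of zeros, meaning no incident, with sporadic very high risk values for severe incidents. Most current models, especially deep learning methods, only estimate risk values and overlook the inherent unpredictability of incidents. STZITD-GNNs combine the reliability of traditional statistical models with the flexibility of graph neural networks to quantify the uncertainty of road-level traffic incident risk. The model uses a compound model from the Tweedie family, with a Poisson distribution for risk frequency and a Gamma distribution for incident severity, plus a zero-inflated component that identifies non-incident scenarios. It therefore captures the skewed distribution of the data, with emphasis on rare but impactful severe incidents. Tests on real-world traffic data from London, UK show that the model exceeds current benchmarks and reduces uncertainty, giving robust predictions over short (7-day) and extended (14-day) horizons.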
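
    The distribution described above (a zero-inflation gate in front of a Poisson count of Gamma-distributed severities) can be sampled in a few lines. This is only a sketch of the generative form, with hypothetical parameter names (`pi_zero`, `rate`, `shape`, `scale`); in the paper such parameters would be produced by the graph neural network, which is not modeled here.

```python
import math
import random


def sample_poisson(rate: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for small rates."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_zi_tweedie(pi_zero: float, rate: float, shape: float,
                      scale: float, rng: random.Random) -> float:
    """Draw one risk value from a zero-inflated compound Poisson-Gamma model."""
    if rng.random() < pi_zero:
        return 0.0  # structural zero: no incident
    n_incidents = sample_poisson(rate, rng)
    # Sum of Gamma severities; an empty sum is also an exact zero.
    return sum(rng.gammavariate(shape, scale) for _ in range(n_incidents))


rng = random.Random(0)
samples = [sample_zi_tweedie(0.7, 0.5, 2.0, 1.0, rng) for _ in range(2000)]
zero_fraction = sum(s == 0.0 for s in samples) / len(samples)
print(f"share of zeros: {zero_fraction:.2f}, max risk: {max(samples):.2f}")
```

    Zeros arise in two ways here, from the gate and from a Poisson count of zero, which is why such a model can fit data that is mostly zeros with occasional large values.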

Chasing the Intruder: A Reinforcement Learning Approach for Tracking Intruder Drones

  • paper_url: http://arxiv.org/abs/2309.05070
  • repo_url: None
  • paper_authors: Shivam Kainth, Subham Sahoo, Rajtilak Pal, Shashi Shekhar Jha
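  • for: This paper addresses identifying and tracking intruder drones in restricted or private airspace using a chaser drone.
  • methods: A reinforcement learning approach interleaves computer vision techniques with the policy learning framework to learn a control policy for chasing the intruder drone.
  • results: The learned policy converges to identify and track the intruder drone and is robust to changes in the intruder's speed or orientation.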
    Abstract Drones are becoming versatile in a myriad of applications. This has led to the use of drones for spying and intruding into the restricted or private air spaces. Such foul use of drone technology is dangerous for the safety and security of many critical infrastructures. In addition, due to the varied low-cost design and agility of the drones, it is a challenging task to identify and track them using the conventional radar systems. In this paper, we propose a reinforcement learning based approach for identifying and tracking any intruder drone using a chaser drone. Our proposed solution uses computer vision techniques interleaved with the policy learning framework of reinforcement learning to learn a control policy for chasing the intruder drone. The whole system has been implemented using ROS and Gazebo along with the Ardupilot based flight controller. The results show that the reinforcement learning based policy converges to identify and track the intruder drone. Further, the learnt policy is robust with respect to the change in speed or orientation of the intruder drone.
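    Summary As drones spread across applications, they are also used for spying and for intruding into restricted or private airspace, which threatens the safety and security of critical infrastructure. Their varied low-cost designs and agility make them hard to identify and track with conventional radar. The paper proposes a reinforcement learning approach in which a chaser drone identifies and tracks an intruder drone, interleaving computer vision techniques with the policy learning framework to learn a control policy. The system is implemented with ROS and Gazebo together with an Ardupilot-based flight controller. Results show that the learned policy converges to identify and track the intruder drone and is robust to changes in the intruder's speed or orientation.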

Federated Learning Incentive Mechanism under Buyers’ Auction Market

  • paper_url: http://arxiv.org/abs/2309.05063
  • repo_url: None
  • paper_authors: Jiaxi Yang, Zihao Guo, Sheng Cao, Cuifang Zhao, Li-Chuan Tsai
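  • for: This paper studies how auction-based federated learning (AFL) enables collaboration between data owners and data consumers in an open setting.
  • methods: It adapts the procurement auction framework to explain pricing behavior under a buyers' market, and uses a blockchain-based reputation mechanism to select clients with high reliability and data quality.
  • results: Experimental results validate the effectiveness of the approach.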
    Abstract Auction-based Federated Learning (AFL) enables open collaboration among self-interested data consumers and data owners. Existing AFL approaches commonly assume a sellers' market, in which the service clients, as sellers, are treated as scarce resources, so that the aggregation servers, as buyers, need to compete in bidding. Yet, as the technology progresses, an increasing number of qualified clients are now capable of performing federated learning tasks, leading to a shift from a sellers' market to a buyers' market. In this paper, we shift the angle by adapting the procurement auction framework, aiming to explain the pricing behavior under a buyers' market. Our modeling starts with a basic setting under complete information, then moves on to the scenario where sellers' information is not fully observable. In order to select clients with high reliability and data quality, and to guard against external attacks, we utilize a blockchain-based reputation mechanism. The experimental results validate the effectiveness of our approach.
    Summary Auction-based Federated Learning (AFL) enables open collaboration among self-interested data consumers and data owners. Existing AFL methods usually assume that sellers are a scarce resource, so aggregation servers must compete in bidding. As more qualified clients become able to perform federated learning tasks, the market shifts toward a buyers' market. The paper adapts the procurement auction framework to that setting and uses a blockchain-based reputation mechanism to select clients with high reliability and data quality. Experimental results validate the effectiveness of the approach.
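
    To make "procurement auction" concrete, here is a minimal sketch of one textbook format, the reverse second-price auction, in which sellers (clients) submit asks and the buyer (aggregation server) picks the cheapest. The abstract does not state which auction format or payment rule the paper analyzes, so the rule below and the `reverse_second_price` name are assumptions for illustration only; reputation scores are not modeled.

```python
def reverse_second_price(asks: dict[str, float]) -> tuple[str, float]:
    """Lowest ask wins and is paid the second-lowest ask."""
    if len(asks) < 2:
        raise ValueError("need at least two sellers")
    # Sort by ask, then by name so that ties resolve deterministically.
    ranked = sorted(asks.items(), key=lambda kv: (kv[1], kv[0]))
    winner = ranked[0][0]
    payment = ranked[1][1]
    return winner, payment


client_asks = {"client_1": 5.0, "client_2": 3.0, "client_3": 4.0}
winner, payment = reverse_second_price(client_asks)
print(winner, payment)
```

    Paying the second-lowest ask is the usual way to make truthful asking a dominant strategy under complete-information assumptions; settings with partially observable seller information, as studied in the paper, need more than this sketch.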

Machine Learning for maximizing the memristivity of single and coupled quantum memristors

  • paper_url: http://arxiv.org/abs/2309.05062
  • repo_url: None
  • paper_authors: Carlos Hernani-Morales, Gabriel Alvarado, Francisco Albarrán-Arriagada, Yolanda Vives-Gilabert, Enrique Solano, José D. Martín-Guerrero
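  • for: Characterize the memristive properties of single and coupled quantum memristors with machine learning.
  • methods: Machine learning methods are applied to characterize and maximize the memristivity of single and coupled quantum memristors.
  • results: Maximizing memristivity leads to large values of the degree of entanglement of two quantum memristors, revealing a close relationship between quantum correlations and memory; this strengthens the case for quantum memristors as key components of neuromorphic quantum computing.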
    Abstract We propose machine learning (ML) methods to characterize the memristive properties of single and coupled quantum memristors. We show that maximizing the memristivity leads to large values in the degree of entanglement of two quantum memristors, unveiling the close relationship between quantum correlations and memory. Our results strengthen the possibility of using quantum memristors as key components of neuromorphic quantum computing.
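    Summary The authors propose machine learning (ML) methods to characterize the memristive properties of single and coupled quantum memristors. Maximizing memristivity yields large values of the degree of entanglement of two quantum memristors, revealing a close relationship between quantum correlations and memory. The results strengthen the possibility of using quantum memristors as key components of neuromorphic quantum computing.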

Decolonial AI Alignment: Viśesadharma, Argument, and Artistic Expression

  • paper_url: http://arxiv.org/abs/2309.05030
  • repo_url: None
  • paper_authors: Kush R. Varshney
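  • for: This work seeks a way to decolonialize AI alignment so that it can accommodate different cultures and values.
  • methods: It makes three proposals to reduce coloniality in AI alignment: (1) change the base moral philosophy from Western philosophy to dharma, (2) permit traditions of argument and pluralism in alignment technologies, and (3) expand the epistemology of values beyond instructions or commandments given in natural language.
  • results: The proposals are intended to help decolonialize AI alignment, making it better suited to different cultures and values and more diverse and inclusive.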
    Abstract Prior work has explicated the coloniality of artificial intelligence (AI) development and deployment. One process that that work has not engaged with much is alignment: the tuning of large language model (LLM) behavior to be in line with desired values based on fine-grained human feedback. In addition to other practices, colonialism has a history of altering the beliefs and values of colonized peoples; this history is recapitulated in current LLM alignment practices. We suggest that AI alignment be decolonialized using three proposals: (a) changing the base moral philosophy from Western philosophy to dharma, (b) permitting traditions of argument and pluralism in alignment technologies, and (c) expanding the epistemology of values beyond instructions or commandments given in natural language.
    Summary The paper proposes decolonializing AI alignment in three ways: (1) shift the base moral philosophy from Western philosophy to dharma; (2) embrace diverse traditions of argument and pluralism in alignment technologies; (3) expand the epistemology of values beyond instructions or commandments given in natural language. The aim is more inclusive and culturally sensitive values in AI development.

VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching

  • paper_url: http://arxiv.org/abs/2309.05027
  • repo_url: None
  • paper_authors: Yiwei Guo, Chenpeng Du, Ziyang Ma, Xie Chen, Kai Yu
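  • for: Improve the efficiency of text-to-speech synthesis as an alternative to diffusion models.
  • methods: Proposes an acoustic model based on a rectified flow matching algorithm that reaches high synthesis quality with a limited number of sampling steps.
  • results: Subjective and objective evaluations on single- and multi-speaker corpora show that VoiceFlow's synthesis quality is superior to its diffusion counterpart.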
    Abstract Although diffusion models in text-to-speech have become a popular choice due to their strong generative ability, the intrinsic complexity of sampling from diffusion models harms their efficiency. Alternatively, we propose VoiceFlow, an acoustic model that utilizes a rectified flow matching algorithm to achieve high synthesis quality with a limited number of sampling steps. VoiceFlow formulates the process of generating mel-spectrograms into an ordinary differential equation conditional on text inputs, whose vector field is then estimated. The rectified flow technique then effectively straightens its sampling trajectory for efficient synthesis. Subjective and objective evaluations on both single and multi-speaker corpora showed the superior synthesis quality of VoiceFlow compared to the diffusion counterpart. Ablation studies further verified the validity of the rectified flow technique in VoiceFlow.
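    Summary Diffusion models have become a popular choice for text-to-speech, but the intrinsic complexity of sampling from them limits their efficiency. VoiceFlow is an acoustic model that uses a rectified flow matching algorithm to reach high synthesis quality with a limited number of sampling steps. It formulates mel-spectrogram generation as an ordinary differential equation conditioned on the text input and estimates its vector field; the rectified flow technique then straightens the sampling trajectory for efficient synthesis. Subjective and objective evaluations on single- and multi-speaker corpora show better synthesis quality than the diffusion counterpart, and ablation studies further verify the rectified flow technique.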
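
    Why a straighter trajectory permits fewer sampling steps can be seen in a one-dimensional toy. The sketch below integrates an ODE with Euler steps along a perfectly straight path toward a fixed `target`; the hand-written `velocity` function stands in for the learned, text-conditioned vector field, so this illustrates the principle only and is not VoiceFlow itself.

```python
def euler_sample(x0: float, target: float, n_steps: int) -> float:
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with Euler steps."""
    def velocity(x: float, t: float) -> float:
        # Straight-line field toward `target`; stands in for a learned network.
        return (target - x) / (1.0 - t)

    x, dt = x0, 1.0 / n_steps
    for k in range(n_steps):
        x += dt * velocity(x, k * dt)
    return x


for n in (1, 2, 10):
    print(n, euler_sample(-3.0, 2.0, n))
```

    Because the path is a straight line, the Euler update has no discretization error and a single step lands on the target (up to floating-point rounding). A curved trajectory would need many small steps to reach the same accuracy, which is the inefficiency the rectified flow technique is meant to reduce.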

FOLLOWUPQG: Towards Information-Seeking Follow-up Question Generation

  • paper_url: http://arxiv.org/abs/2309.05007
  • repo_url: None
  • paper_authors: Yan Meng, Liangming Pan, Yixin Cao, Min-Yen Kan
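  • for: This work introduces the task of real-world information-seeking follow-up question generation (FQG), which aims to generate follow-up questions that seek a deeper understanding of an initial question and answer.
  • methods: The authors build FOLLOWUPQG, a dataset of (initial question, answer, follow-up question) tuples collected from a Reddit forum, and evaluate existing question generation models on it.
  • results: Model-generated follow-up questions are adequate but fall well short of human-raised questions in informativeness and complexity.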
    Abstract Humans ask follow-up questions driven by curiosity, which reflects a creative human cognitive process. We introduce the task of real-world information-seeking follow-up question generation (FQG), which aims to generate follow-up questions seeking a more in-depth understanding of an initial question and answer. We construct FOLLOWUPQG, a dataset of over 3K real-world (initial question, answer, follow-up question) tuples collected from a Reddit forum providing layman-friendly explanations for open-ended questions. In contrast to existing datasets, questions in FOLLOWUPQG use more diverse pragmatic strategies to seek information, and they also show higher-order cognitive skills (such as applying and relating). We evaluate current question generation models on their efficacy for generating follow-up questions, exploring how to generate specific types of follow-up questions based on step-by-step demonstrations. Our results validate FOLLOWUPQG as a challenging benchmark, as model-generated questions are adequate but far from human-raised questions in terms of informativeness and complexity.
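    Summary Humans ask follow-up questions out of curiosity, which reflects a creative cognitive process. The task of real-world information-seeking follow-up question generation (FQG) aims to generate follow-up questions that seek a deeper understanding of an initial question and answer. FOLLOWUPQG is a dataset of over 3K real-world (initial question, answer, follow-up question) tuples collected from a Reddit forum that provides layman-friendly explanations for open-ended questions. Compared with existing datasets, its questions use more diverse pragmatic strategies and show higher-order cognitive skills (such as applying and relating). Current question generation models are evaluated, including how to generate specific types of follow-up questions from step-by-step demonstrations. FOLLOWUPQG proves to be a challenging benchmark: model-generated questions are adequate but remain far from human-raised questions in informativeness and complexity.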

RGAT: A Deeper Look into Syntactic Dependency Information for Coreference Resolution

  • paper_url: http://arxiv.org/abs/2309.04977
  • repo_url: https://github.com/qingtian5/rgat_with_bert
  • paper_authors: Yuan Meng, Xuhao Pan, Jun Chang, Yue Wang
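  • for: This paper studies how syntactic dependency information can help coreference resolution.
  • methods: It proposes an end-to-end parser that combines pre-trained BERT with a Syntactic Relation Graph Attention Network (RGAT). RGAT is used to understand the syntactic dependency graph and learn better task-specific syntactic embeddings; an integrated architecture then combines BERT embeddings and syntactic embeddings into blended representations for the downstream task.
  • results: On the public Gendered Ambiguous Pronouns (GAP) dataset, with supervised learning of the syntactic dependency graph and without fine-tuning the entire BERT, the F1-score rises from 80.3% for the previous best model (RGCN-with-BERT) to 82.5%, and from 78.5% for BERT embeddings alone to 82.5%. Results on OntoNotes 5.0 also show improved performance.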
    Abstract Although syntactic information is beneficial for many NLP tasks, combining it with contextual information between words to solve the coreference resolution problem needs to be further explored. In this paper, we propose an end-to-end parser that combines pre-trained BERT with a Syntactic Relation Graph Attention Network (RGAT) to take a deeper look into the role of syntactic dependency information for the coreference resolution task. In particular, the RGAT model is first proposed, then used to understand the syntactic dependency graph and learn better task-specific syntactic embeddings. An integrated architecture incorporating BERT embeddings and syntactic embeddings is constructed to generate blending representations for the downstream task. Our experiments on a public Gendered Ambiguous Pronouns (GAP) dataset show that with the supervision learning of the syntactic dependency graph and without fine-tuning the entire BERT, we increased the F1-score of the previous best model (RGCN-with-BERT) from 80.3% to 82.5%, compared to the F1-score by single BERT embeddings from 78.5% to 82.5%. Experimental results on another public dataset - OntoNotes 5.0 demonstrate that the performance of the model is also improved by incorporating syntactic dependency information learned from RGAT.

AVARS – Alleviating Unexpected Urban Road Traffic Congestion using UAVs

  • paper_url: http://arxiv.org/abs/2309.04976
  • repo_url: https://github.com/guojyjy/avars
  • paper_authors: Jiaying Guo, Michael R. Jones, Soufiene Djahel, Shen Wang
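  • for: React quickly and accurately to unexpected urban traffic congestion by monitoring traffic in real time and selecting suitable traffic signal control.
  • methods: Deep reinforcement learning (DRL) controls the traffic lights, while UAVs with onboard cameras provide high-frequency, high-resolution traffic data.
  • results: In simulation, AVARS returns unexpected congestion to its original uncongested level within the typical battery life of a UAV.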
    Abstract Reducing unexpected urban traffic congestion caused by en-route events (e.g., road closures, car crashes, etc.) often requires fast and accurate reactions to choose the best-fit traffic signals. Traditional traffic light control systems, such as SCATS and SCOOT, are not efficient as their traffic data provided by induction loops has a low update frequency (i.e., longer than 1 minute). Moreover, the traffic light signal plans used by these systems are selected from a limited set of candidate plans pre-programmed prior to unexpected events' occurrence. Recent research demonstrates that camera-based traffic light systems controlled by deep reinforcement learning (DRL) algorithms are more effective in reducing traffic congestion, in which the cameras can provide high-frequency high-resolution traffic data. However, these systems are costly to deploy in big cities due to the excessive potential upgrades required to road infrastructure. In this paper, we argue that Unmanned Aerial Vehicles (UAVs) can play a crucial role in dealing with unexpected traffic congestion because UAVs with onboard cameras can be economically deployed when and where unexpected congestion occurs. Then, we propose a system called "AVARS" that explores the potential of using UAVs to reduce unexpected urban traffic congestion using DRL-based traffic light signal control. This approach is validated on a widely used open-source traffic simulator with practical UAV settings, including its traffic monitoring ranges and battery lifetime. Our simulation results show that AVARS can effectively recover the unexpected traffic congestion in Dublin, Ireland, back to its original un-congested level within the typical battery life duration of a UAV.
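    Summary Reducing unexpected urban traffic congestion requires fast and accurate reactions to choose the best-fit traffic signals. Traditional traffic light control systems such as SCATS and SCOOT are inefficient because their induction-loop data has a low update frequency (longer than 1 minute) and their signal plans are chosen from a limited, pre-programmed set of candidates. Camera-based systems controlled by deep reinforcement learning (DRL) reduce congestion more effectively but are costly to deploy in big cities because of the infrastructure upgrades they require. The paper argues that UAVs with onboard cameras can be deployed economically when and where unexpected congestion occurs, and proposes AVARS, which uses DRL-based traffic light signal control with UAV-provided traffic data. The approach is validated on a widely used open-source traffic simulator with practical UAV settings, including monitoring range and battery lifetime. Simulation shows that AVARS can return unexpected congestion in Dublin, Ireland to its original uncongested level within the typical battery life of a UAV.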

Continual Robot Learning using Self-Supervised Task Inference

  • paper_url: http://arxiv.org/abs/2309.04974
  • repo_url: None
  • paper_authors: Muhammad Burhan Hafez, Stefan Wermter
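  • for: This work aims to give robots the human ability to learn a growing set of skills over a lifetime.
  • methods: Self-supervised task inference: action and intention embeddings are learned from self-organization of the observed movement and effect parts of unlabeled demonstrations, and a higher-level behavior embedding is learned from self-organization of the joint action-intention embeddings.
  • results: On a humanoid robot, the approach outperforms several multi-task learning baselines, more markedly in the continual learning setting, and can infer tasks from incomplete demonstrations.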
    Abstract Endowing robots with the human ability to learn a growing set of skills over the course of a lifetime as opposed to mastering single tasks is an open problem in robot learning. While multi-task learning approaches have been proposed to address this problem, they pay little attention to task inference. In order to continually learn new tasks, the robot first needs to infer the task at hand without requiring predefined task representations. In this paper, we propose a self-supervised task inference approach. Our approach learns action and intention embeddings from self-organization of the observed movement and effect parts of unlabeled demonstrations and a higher-level behavior embedding from self-organization of the joint action-intention embeddings. We construct a behavior-matching self-supervised learning objective to train a novel Task Inference Network (TINet) to map an unlabeled demonstration to its nearest behavior embedding, which we use as the task representation. A multi-task policy is built on top of the TINet and trained with reinforcement learning to optimize performance over tasks. We evaluate our approach in the fixed-set and continual multi-task learning settings with a humanoid robot and compare it to different multi-task learning baselines. The results show that our approach outperforms the other baselines, with the difference being more pronounced in the challenging continual learning setting, and can infer tasks from incomplete demonstrations. Our approach is also shown to generalize to unseen tasks based on a single demonstration in one-shot task generalization experiments.

Prefix-diffusion: A Lightweight Diffusion Model for Diverse Image Captioning

  • paper_url: http://arxiv.org/abs/2309.04965
  • repo_url: None
  • paper_authors: Guisheng Liu, Yi Li, Zhengcong Fei, Haiyan Fu, Xiangyang Luo, Yanqing Guo
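  • for: Improving the diversity of generated image captions while reducing the parameter scale of captioning systems.
  • methods: A lightweight image captioning network combined with continuous diffusion. Prefix image embeddings are injected into the denoising process, and a pre-trained model plus an extra mapping network extracts image features.
  • results: Diverse captions are generated with relatively few trainable parameters while fluency and relevance are maintained.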
    Abstract While impressive performance has been achieved in image captioning, the limited diversity of the generated captions and the large parameter scale remain major barriers to the real-world application of these systems. In this work, we propose a lightweight image captioning network in combination with continuous diffusion, called Prefix-diffusion. To achieve diversity, we design an efficient method that injects prefix image embeddings into the denoising process of the diffusion model. In order to reduce trainable parameters, we employ a pre-trained model to extract image features and further design an extra mapping network. Prefix-diffusion is able to generate diverse captions with relatively fewer parameters, while maintaining the fluency and relevance of the captions benefiting from the generative capabilities of the diffusion model. Our work paves the way for scaling up diffusion models for image captioning, and achieves promising performance compared with recent approaches.

Multi-document Summarization: A Comparative Evaluation

  • paper_url: http://arxiv.org/abs/2309.04951
  • repo_url: None
  • paper_authors: Kushan Hewapathirana, Nisansa de Silva, C. D. Athuraliya
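  • for: Evaluating state-of-the-art multi-document summarization (MDS) models on datasets from different domains, and investigating their limitations to determine future research directions.
  • methods: An extensive literature review, followed by an analysis of PRIMERA and PEGASUS on the BigSurvey-MDS and MS$^2$ datasets, with ROUGE scores as the performance metric.
  • results: The general-purpose pre-trained model LED outperforms PRIMERA and PEGASUS on the MS$^2$ dataset. The findings serve as a reference for future MDS research on both demanding scientific datasets and simpler general ones.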
    Abstract This paper is aimed at evaluating state-of-the-art models for Multi-document Summarization (MDS) on different types of datasets in various domains and investigating the limitations of existing models to determine future research directions. To address this gap, we conducted an extensive literature review to identify state-of-the-art models and datasets. We analyzed the performance of PRIMERA and PEGASUS models on BigSurvey-MDS and MS$^2$ datasets, which posed unique challenges due to their varied domains. Our findings show that the General-Purpose Pre-trained Model LED outperforms PRIMERA and PEGASUS on the MS$^2$ dataset. We used the ROUGE score as a performance metric to evaluate the identified models on different datasets. Our study provides valuable insights into the models' strengths and weaknesses, as well as their applicability in different domains. This work serves as a reference for future MDS research and contributes to the development of accurate and robust models which can be utilized on demanding datasets with academically and/or scientifically complex data as well as generalized, relatively simple datasets.
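Since the comparison above rests on ROUGE scores, a minimal sketch of the simplest variant, ROUGE-1 F1, may help. It assumes lowercased whitespace tokenization with no stemming, which is a simplification of what standard ROUGE toolkits do; the function name `rouge1_f1` is hypothetical and this is not the evaluation code used in the paper.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on lowercased whitespace tokens."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each unigram counts at most as often as it appears in both.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(rouge1_f1("a b c d", "a b x y"))
```

In the usage line, two of four unigrams overlap on each side, so precision and recall are both 0.5.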

Knowledge-based Refinement of Scientific Publication Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2309.05681
  • repo_url: None
  • paper_authors: Siwen Yan, Phillip Odom, Sriraam Natarajan
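  • for: Identifying authorship, posed as a knowledge graph construction and refinement problem.
  • methods: Functional gradient boosting learns relational regression trees (a probabilistic logic model) that output explainable rules. Human advice in the form of first-order clauses is injected to refine the trees.
  • results: Human knowledge is shown to be useful, both quantitatively and qualitatively, in seven authorship domains.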
    Abstract We consider the problem of identifying authorship by posing it as a knowledge graph construction and refinement. To this effect, we model this problem as learning a probabilistic logic model in the presence of human guidance (knowledge-based learning). Specifically, we learn relational regression trees using functional gradient boosting that outputs explainable rules. To incorporate human knowledge, advice in the form of first-order clauses is injected to refine the trees. We demonstrate the usefulness of human knowledge both quantitatively and qualitatively in seven authorship domains.

MFPNet: Multi-scale Feature Propagation Network For Lightweight Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2309.04914
  • repo_url: None
  • paper_authors: Guoan Xu, Wenjing Jia, Tao Wu, Ligeng Chen
  • for: Advancing lightweight semantic segmentation, where progress has been slower than for large-scale models and compact methods suffer from limited feature representation because their networks are shallow.
  • methods: A lightweight architecture, the Multi-scale Feature Propagation Network (MFPNet): an encoder-decoder with symmetrical residual blocks built from flexible bottleneck residual modules (BRMs) to explore deep and rich multi-scale semantic context, plus Graph Convolutional Networks (GCNs) for multi-scale feature propagation between the BRM blocks.
  • results: Superior segmentation results on benchmark datasets.
    Abstract In contrast to the abundant research focusing on large-scale models, the progress in lightweight semantic segmentation appears to be advancing at a comparatively slower pace. However, existing compact methods often suffer from limited feature representation capability due to the shallowness of their networks. In this paper, we propose a novel lightweight segmentation architecture, called Multi-scale Feature Propagation Network (MFPNet), to address the dilemma. Specifically, we design a robust Encoder-Decoder structure featuring symmetrical residual blocks that consist of flexible bottleneck residual modules (BRMs) to explore deep and rich multi-scale semantic context. Furthermore, taking benefit from their capacity to model latent long-range contextual relationships, we leverage Graph Convolutional Networks (GCNs) to facilitate multi-scale feature propagation between the BRM blocks. When evaluated on benchmark datasets, our proposed approach shows superior segmentation results.

A Review of Machine Learning-based Security in Cloud Computing

  • paper_url: http://arxiv.org/abs/2309.04911
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Aptin Babaei, Parham M. Kebria, Mohsen Moradi Dalvand, Saeid Nahavandi
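  • for: Providing a comprehensive overview of the current state of machine learning (ML) in cloud computing security, including the features, effectiveness, and potential limitations of different ML algorithms.
  • methods: A review of recent research on ML-based security in cloud computing, covering a range of ML algorithms.
  • results: The review highlights the unique strengths and potential limitations of the surveyed ML algorithms for cloud security.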
    Abstract Cloud Computing (CC) is revolutionizing the way IT resources are delivered to users, allowing them to access and manage their systems with increased cost-effectiveness and simplified infrastructure. However, with the growth of CC comes a host of security risks, including threats to availability, integrity, and confidentiality. To address these challenges, Machine Learning (ML) is increasingly being used by Cloud Service Providers (CSPs) to reduce the need for human intervention in identifying and resolving security issues. With the ability to analyze vast amounts of data, and make high-accuracy predictions, ML can transform the way CSPs approach security. In this paper, we will explore some of the most recent research in the field of ML-based security in Cloud Computing. We will examine the features and effectiveness of a range of ML algorithms, highlighting their unique strengths and potential limitations. Our goal is to provide a comprehensive overview of the current state of ML in cloud security and to shed light on the exciting possibilities that this emerging field has to offer.

Effective Real Image Editing with Accelerated Iterative Diffusion Inversion

  • paper_url: http://arxiv.org/abs/2309.04907
  • repo_url: None
  • paper_authors: Zhihong Pan, Riccardo Gherardi, Xiufeng Xie, Stephen Huang
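  • for: Efficient and accurate editing of real images with modern generative models, which requires reliably inverting an image to its latent noise.
  • methods: An Accelerated Iterative Diffusion Inversion method (AIDI) that improves reconstruction accuracy with minimal additional overhead in space and time, combined with a novel blended guidance technique.
  • results: Effective results on a wide range of image editing tasks without large classifier-free guidance in inversion, and greater robustness than other diffusion-inversion methods in the 10- and 20-step regimes.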
    Abstract Despite all recent progress, it is still challenging to edit and manipulate natural images with modern generative models. When using Generative Adversarial Network (GAN), one major hurdle is in the inversion process mapping a real image to its corresponding noise vector in the latent space, since it is necessary to be able to reconstruct an image to edit its contents. Likewise for Denoising Diffusion Implicit Models (DDIM), the linearization assumption in each inversion step makes the whole deterministic inversion process unreliable. Existing approaches that have tackled the problem of inversion stability often incur significant trade-offs in computational efficiency. In this work we propose an Accelerated Iterative Diffusion Inversion method, dubbed AIDI, that significantly improves reconstruction accuracy with minimal additional overhead in space and time complexity. By using a novel blended guidance technique, we show that effective results can be obtained on a large range of image editing tasks without large classifier-free guidance in inversion. Furthermore, when compared with other diffusion inversion based works, our proposed process is shown to be more robust for fast image editing in the 10- and 20-step diffusion regimes.
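The abstract does not spell out the AIDI update, so the following is only a sketch of the general idea of iterative inversion. It treats one DDIM inversion step as an implicit equation and solves it by plain fixed-point iteration, without the acceleration or blended guidance the paper proposes. The noise predictor `toy_eps` is a hypothetical stand-in for a trained network, and the `abar` values are arbitrary cumulative noise-schedule products chosen for illustration.

```python
import numpy as np


def toy_eps(x, t):
    """Hypothetical stand-in for a trained noise predictor eps_theta(x, t)."""
    return np.tanh(x)


def ddim_step(x_t, t, abar_t, abar_prev, eps_fn):
    """Deterministic DDIM denoising step x_t -> x_{t-1}."""
    eps = eps_fn(x_t, t)
    x0_pred = (x_t - np.sqrt(1.0 - abar_t) * eps) / np.sqrt(abar_t)
    return np.sqrt(abar_prev) * x0_pred + np.sqrt(1.0 - abar_prev) * eps


def invert_step(x_prev, t, abar_t, abar_prev, eps_fn, n_iter=1):
    """Find x_t such that ddim_step(x_t) is close to x_prev.

    n_iter=1 is the usual linearized inversion (eps evaluated at x_prev);
    larger n_iter re-evaluates eps at the current estimate of x_t.
    """
    ratio = np.sqrt(abar_t / abar_prev)
    coef = np.sqrt(1.0 - abar_t) - ratio * np.sqrt(1.0 - abar_prev)
    x_t = x_prev  # initial guess
    for _ in range(n_iter):
        x_t = ratio * x_prev + coef * eps_fn(x_t, t)
    return x_t


def reconstruction_error(x_prev, n_iter, abar_t=0.5, abar_prev=0.8):
    """Invert one step, denoise it again, and measure the round-trip error."""
    x_t = invert_step(x_prev, 1, abar_t, abar_prev, toy_eps, n_iter=n_iter)
    x_back = ddim_step(x_t, 1, abar_t, abar_prev, toy_eps)
    return float(np.max(np.abs(x_back - x_prev)))


if __name__ == "__main__":
    x = np.array([1.0, -0.5, 2.0])
    print("linearized:", reconstruction_error(x, n_iter=1))
    print("iterated:  ", reconstruction_error(x, n_iter=20))
```

With this toy predictor the update is a contraction, so the iterated inversion reconstructs the input far more closely than the single linearized step. A real noise predictor carries no such guarantee.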

cs.CL - 2023-09-10

The Effect of Alignment Objectives on Code-Switching Translation

  • paper_url: http://arxiv.org/abs/2309.05044
  • repo_url: None
  • paper_authors: Mohamed Anwar
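  • for: Improving machine translation models' ability to translate code-switching content, which matters increasingly with the rise of social media and user-generated content.
  • methods: A single machine translation model is trained to translate monolingual sentences from one language to another, as well as code-switched sentences to either language. To make better use of parallel data, synthetic code-switched (CSW) data is generated and an alignment loss on the encoder aligns representations across languages.
  • results: On the WMT14 English-French (En-Fr) dataset, the trained model strongly outperforms bidirectional baselines on code-switched translation while maintaining quality on non-code-switched (monolingual) data.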
    Abstract One of the things that need to change when it comes to machine translation is the models' ability to translate code-switching content, especially with the rise of social media and user-generated content. In this paper, we are proposing a way of training a single machine translation model that is able to translate monolingual sentences from one language to another, along with translating code-switched sentences to either language. This model can be considered a bilingual model in the human sense. For better use of parallel data, we generated synthetic code-switched (CSW) data along with an alignment loss on the encoder to align representations across languages. Using the WMT14 English-French (En-Fr) dataset, the trained model strongly outperforms bidirectional baselines on code-switched translation while maintaining quality for non-code-switched (monolingual) data.
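The abstract does not say how the synthetic code-switched data is built. One common recipe, assumed here purely for illustration, is to replace some source words with their word-aligned counterparts from the parallel target sentence. The function `make_code_switched` and its arguments are hypothetical names, and the word alignment is taken as given.

```python
def make_code_switched(src_tokens, tgt_tokens, alignment, switch_positions):
    """Replace selected source tokens with their aligned target tokens.

    alignment: iterable of (src_index, tgt_index) pairs (assumed one-to-one).
    switch_positions: set of source indices to switch into the target language.
    """
    src_to_tgt = dict(alignment)
    mixed = []
    for i, token in enumerate(src_tokens):
        if i in switch_positions and i in src_to_tgt:
            mixed.append(tgt_tokens[src_to_tgt[i]])
        else:
            mixed.append(token)  # unaligned or unselected tokens stay as they are
    return mixed


if __name__ == "__main__":
    src = ["the", "cat", "sleeps"]
    tgt = ["le", "chat", "dort"]
    print(make_code_switched(src, tgt, [(0, 0), (1, 1), (2, 2)], {1}))
```

The resulting mixed sentence can be paired with either original sentence as a training target, which matches the stated goal of translating code-switched input to either language.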

Chat2Brain: A Method for Mapping Open-Ended Semantic Queries to Brain Activation Maps

  • paper_url: http://arxiv.org/abs/2309.05021
  • repo_url: None
  • paper_authors: Yaonai Wei, Tuo Zhang, Han Zhang, Tianyang Zhong, Lin Zhao, Zhengliang Liu, Chong Ma, Songyao Zhang, Muheng Shang, Lei Du, Xiao Li, Tianming Liu, Junwei Han
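  • for: Making text-query-based meta-analysis more accurate by using large language models (LLMs) to address issues such as semantic redundancy and ambiguity in queries.
  • methods: Chat2Brain combines LLMs with a basic text-to-image model, Text2Brain, to map open-ended semantic queries to brain activation maps. Text queries are first transferred into semantic queries by the LLM.
  • results: Chat2Brain synthesizes anatomically plausible neural activation patterns for more complex text queries, including data-scarce and complex query environments.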
    Abstract Over decades, neuroscience has accumulated a wealth of research results in the text modality that can be used to explore cognitive processes. Meta-analysis is a typical method that successfully establishes a link from text queries to brain activation maps using these research results, but it still relies on an ideal query environment. In practical applications, text queries used for meta-analyses may encounter issues such as semantic redundancy and ambiguity, resulting in an inaccurate mapping to brain images. On the other hand, large language models (LLMs) like ChatGPT have shown great potential in tasks such as context understanding and reasoning, displaying a high degree of consistency with human natural language. Hence, LLMs could improve the connection between text modality and neuroscience, resolving existing challenges of meta-analyses. In this study, we propose a method called Chat2Brain that combines LLMs with a basic text-2-image model, known as Text2Brain, to map open-ended semantic queries to brain activation maps in data-scarce and complex query environments. By utilizing the understanding and reasoning capabilities of LLMs, the performance of the mapping model is optimized by transferring text queries to semantic queries. We demonstrate that Chat2Brain can synthesize anatomically plausible neural activation patterns for more complex tasks of text queries.

Machine Translation Models Stand Strong in the Face of Adversarial Attacks

  • paper_url: http://arxiv.org/abs/2309.06527
  • repo_url: None
  • paper_authors: Pavel Burnyshev, Elizaveta Kostenok, Alexey Zaytsev
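  • for: Studying how adversarial attacks expose vulnerabilities of deep learning models, specifically sequence-to-sequence (seq2seq) machine translation models.
  • methods: Attack algorithms that incorporate basic text perturbation heuristics and more advanced strategies, such as a gradient-based attack that uses a differentiable approximation of the inherently non-differentiable translation metric.
  • results: Machine translation models are robust against the best-performing known adversarial attacks: the degree of perturbation in the output is directly proportional to the perturbation in the input. Among the weaker attacks, the proposed ones give the best relative performance. Another strong candidate is an attack based on mixing individual characters.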
    Abstract Adversarial attacks expose vulnerabilities of deep learning models by introducing minor perturbations to the input, which lead to substantial alterations in the output. Our research focuses on the impact of such adversarial attacks on sequence-to-sequence (seq2seq) models, specifically machine translation models. We introduce algorithms that incorporate basic text perturbation heuristics and more advanced strategies, such as the gradient-based attack, which utilizes a differentiable approximation of the inherently non-differentiable translation metric. Through our investigation, we provide evidence that machine translation models display robustness against the best-performing known adversarial attacks, as the degree of perturbation in the output is directly proportional to the perturbation in the input. However, among underdogs, our attacks outperform alternatives, providing the best relative performance. Another strong candidate is an attack based on mixing of individual characters.
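As a rough illustration of the "mixing of individual characters" heuristic, the sketch below swaps randomly chosen adjacent characters. The exact perturbation used in the paper is not given in the abstract, so the swap rule, the function name `swap_adjacent_chars`, and its parameters are assumptions.

```python
import random


def swap_adjacent_chars(text, n_swaps=1, seed=0):
    """Perturb text by swapping n_swaps randomly chosen adjacent character pairs."""
    chars = list(text)
    if len(chars) < 2:
        return text  # nothing to swap
    rng = random.Random(seed)  # seeded for reproducible perturbations
    for _ in range(n_swaps):
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


if __name__ == "__main__":
    print(swap_adjacent_chars("machine translation", n_swaps=2))
```

The perturbed string keeps the same characters and length, so the input change is small and easy to quantify. That is the setting in which the abstract reports output changes proportional to input changes.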

Mitigating Word Bias in Zero-shot Prompt-based Classifiers

  • paper_url: http://arxiv.org/abs/2309.04992
  • repo_url: None
  • paper_authors: Adian Liusie, Potsawee Manakul, Mark J. F. Gales
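  • for: Improving zero-shot prompt-based classifiers by mitigating word bias.
  • methods: Class probabilities are reweighted, without supervision, to have a uniform prior over classes. A theoretical connection is drawn between the class priors and the language model's word prior.
  • results: Matching class priors correlates strongly with oracle upper-bound performance and yields large, consistent gains across prompt settings and NLP tasks. A threshold can also be set in a zero-resource fashion.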
    Abstract Prompt-based classifiers are an attractive approach for zero-shot classification. However, the precise choice of the prompt template and label words can largely influence performance, with semantically equivalent settings often showing notable performance differences. This discrepancy can be partly attributed to word biases, where the classifier may be biased towards classes. To address this problem, it is possible to optimise classification thresholds on a labelled data set; however, this mitigates some of the advantages of prompt-based classifiers. This paper instead approaches this problem by examining the expected marginal probabilities of the classes. Here, probabilities are reweighted to have a uniform prior over classes, in an unsupervised fashion. Further, we draw a theoretical connection between the class priors and the language models' word prior, and offer the ability to set a threshold in a zero-resource fashion. We show that matching class priors correlates strongly with the oracle upper bound performance and demonstrate large consistent performance gains for prompt settings over a range of NLP tasks.
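The reweighting idea can be sketched as follows: given unlabelled class probabilities `probs` (one row per input), find per-class weights so that the renormalised probabilities average to 1/K for each of the K classes. The multiplicative update below is an assumed, simple way to reach that condition, not necessarily the search procedure used in the paper. The function name `debias_to_uniform_prior` is hypothetical.

```python
import numpy as np


def debias_to_uniform_prior(probs, n_iter=200, tol=1e-8):
    """Reweight class probabilities so the average prediction is uniform.

    probs: array of shape (n_examples, n_classes), rows summing to 1.
    Returns (weights, adjusted_probs). No labels are used.
    """
    probs = np.asarray(probs, dtype=float)
    n_classes = probs.shape[1]
    target = 1.0 / n_classes
    weights = np.ones(n_classes)
    for _ in range(n_iter):
        scaled = probs * weights
        scaled /= scaled.sum(axis=1, keepdims=True)
        marginal = scaled.mean(axis=0)  # estimated class prior
        if np.max(np.abs(marginal - target)) < tol:
            break
        weights *= target / marginal  # shrink over-predicted classes
        weights /= weights.sum()      # keep the scale fixed
    scaled = probs * weights
    scaled /= scaled.sum(axis=1, keepdims=True)
    return weights, scaled


if __name__ == "__main__":
    biased = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
    w, adjusted = debias_to_uniform_prior(biased)
    print(w, adjusted.mean(axis=0))
```

In the usage example every input favours class 0, so the raw argmax would never predict class 1. After reweighting, the average prediction is balanced and the two lower rows flip to class 1.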

Retrieval-Augmented Meta Learning for Low-Resource Text Classification

  • paper_url: http://arxiv.org/abs/2309.04979
  • repo_url: https://github.com/Carolmelon/RAML
  • paper_authors: Rongsheng Li, Yangning Li, Yinghui Li, Chaiyut Luoyiching, Hai-Tao Zheng, Nannan Zhou, Hanjing Su
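  • for: Improving low-resource text classification, where target classes are identified with knowledge transferred from source classes.
  • methods: Retrieval-Augmented Meta Learning (RAML) uses parameterized neural networks for inference and also retrieves non-parametric knowledge from an external corpus. A multi-view passages fusion network integrates the retrieved information.
  • results: RAML significantly outperforms current state-of-the-art low-resource text classification models.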
    Abstract Meta learning has achieved promising performance in low-resource text classification which aims to identify target classes with knowledge transferred from source classes with sets of small tasks named episodes. However, due to the limited training data in the meta-learning scenario and the inherent properties of parameterized neural networks, poor generalization performance has become a pressing problem that needs to be addressed. To deal with this issue, we propose a meta-learning based method called Retrieval-Augmented Meta Learning (RAML). It not only uses parameterization for inference but also retrieves non-parametric knowledge from an external corpus to make inferences, which greatly alleviates the problem of poor generalization performance caused by the lack of diverse training data in meta-learning. This method differs from previous models that solely rely on parameters, as it explicitly emphasizes the importance of non-parametric knowledge, aiming to strike a balance between parameterized neural networks and non-parametric knowledge. The model is required to determine which knowledge to access and utilize during inference. Additionally, our multi-view passages fusion network module can effectively and efficiently integrate the retrieved information into the low-resource classification task. The extensive experiments demonstrate that RAML significantly outperforms current SOTA low-resource text classification models.
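The retrieval half of such a system can be sketched as a top-k cosine-similarity lookup over passage embeddings. How RAML actually encodes and scores passages is not stated in the abstract, so the embeddings here are hand-written toy vectors and `retrieve_top_k` is a hypothetical helper.

```python
import numpy as np


def retrieve_top_k(query_vec, passage_vecs, k=2):
    """Return indices and cosine scores of the k passages closest to the query."""
    query_vec = np.asarray(query_vec, dtype=float)
    passage_vecs = np.asarray(passage_vecs, dtype=float)
    q = query_vec / np.linalg.norm(query_vec)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    scores = p @ q                      # cosine similarity per passage
    top = np.argsort(-scores)[:k]       # highest scores first
    return top, scores[top]


if __name__ == "__main__":
    passages = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    idx, sims = retrieve_top_k([1.0, 0.1], passages, k=2)
    print(idx, sims)
```

The retrieved passages would then be fused with the parametric model's representation, which is the role the abstract assigns to the multi-view passages fusion network.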

Prompt Learning With Knowledge Memorizing Prototypes For Generalized Few-Shot Intent Detection

  • paper_url: http://arxiv.org/abs/2309.04971
  • repo_url: None
  • paper_authors: Chaiyut Luoyiching, Yangning Li, Yinghui Li, Rongsheng Li, Hai-Tao Zheng, Nannan Zhou, Hanjing Su
  • for: Solving the challenging problem of generalized few-shot intent detection (GFSID), which must categorize seen and novel intents simultaneously, by converting the task into the class incremental learning paradigm.
  • methods: A two-stage learning framework sequentially learns the knowledge of different intents in various periods via prompt learning, and uses prototypes to categorize both seen and novel intents.
  • results: Promising performance on two widely used datasets, supported by extensive experiments and detailed analyses.
    Abstract Generalized Few-Shot Intent Detection (GFSID) is challenging and realistic because it needs to categorize both seen and novel intents simultaneously. Previous GFSID methods rely on the episodic learning paradigm, which makes it hard to extend to a generalized setup as they do not explicitly learn the classification of seen categories and the knowledge of seen intents. To address the dilemma, we propose to convert the GFSID task into the class incremental learning paradigm. Specifically, we propose a two-stage learning framework, which sequentially learns the knowledge of different intents in various periods via prompt learning. We then exploit prototypes for categorizing both seen and novel intents. Furthermore, to transfer knowledge of intents across stages, we design two knowledge preservation methods for different scenarios that are close to realistic applications. Extensive experiments and detailed analyses on two widely used datasets show that our framework based on the class incremental learning paradigm achieves promising performance.
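Prototype-based classification in a class incremental setting can be sketched as follows: each intent is represented by the mean of its example embeddings, novel intents add new prototypes without altering the existing ones, and prediction picks the nearest prototype. The class `PrototypeClassifier`, the toy 2-D embeddings, and the intent names are illustrative assumptions. The paper's prompt learning and knowledge preservation methods are not modelled here.

```python
import numpy as np


class PrototypeClassifier:
    """Nearest-prototype classifier that can grow with new classes."""

    def __init__(self):
        self.prototypes = {}  # label -> mean embedding

    def add_classes(self, embeddings, labels):
        """Add one prototype per label; earlier prototypes are left untouched."""
        embeddings = np.asarray(embeddings, dtype=float)
        for label in set(labels):
            mask = np.array([l == label for l in labels])
            self.prototypes[label] = embeddings[mask].mean(axis=0)

    def predict(self, embedding):
        """Return the label whose prototype is closest in Euclidean distance."""
        embedding = np.asarray(embedding, dtype=float)
        return min(
            self.prototypes,
            key=lambda c: np.linalg.norm(embedding - self.prototypes[c]),
        )


if __name__ == "__main__":
    clf = PrototypeClassifier()
    # Stage 1: seen intents with several examples each.
    clf.add_classes(
        [[0, 0], [0, 1], [5, 5], [5, 6]],
        ["book_flight", "book_flight", "play_music", "play_music"],
    )
    # Stage 2: a novel intent arrives with a single example.
    clf.add_classes([[10, 0]], ["set_alarm"])
    print(clf.predict([9, 1]), clf.predict([0.2, 0.4]))
```

Because seen and novel intents share one prototype table, a single `predict` call covers both, which is the generalized setting the abstract describes.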

What’s Hard in English RST Parsing? Predictive Models for Error Analysis

  • paper_url: http://arxiv.org/abs/2309.04940
  • repo_url: None
  • paper_authors: Yang Janet Liu, Tatsuya Aoyama, Amir Zeldes
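  • for: Understanding why hierarchical discourse parsing in the framework of Rhetorical Structure Theory (RST) remains challenging.
  • methods: Factors associated with parsing difficulty in previous work are examined and modeled: implicit discourse relations, long-distance relations, out-of-vocabulary items, and more. Two annotated English test sets are released with explicit correct and distracting discourse markers associated with gold-standard RST relations.
  • results: As in shallow discourse parsing, the explicit/implicit distinction plays a role, but long-distance dependencies are the main challenge, while lack of lexical overlap is less of a problem, at least for in-domain parsing. The final model predicts where errors will occur with an accuracy of 76.3% for the bottom-up parser and 76.6% for the top-down parser.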
    Abstract Despite recent advances in Natural Language Processing (NLP), hierarchical discourse parsing in the framework of Rhetorical Structure Theory remains challenging, and our understanding of the reasons for this is as yet limited. In this paper, we examine and model some of the factors associated with parsing difficulties in previous work: the existence of implicit discourse relations, challenges in identifying long-distance relations, out-of-vocabulary items, and more. In order to assess the relative importance of these variables, we also release two annotated English test-sets with explicit correct and distracting discourse markers associated with gold standard RST relations. Our results show that as in shallow discourse parsing, the explicit/implicit distinction plays a role, but that long-distance dependencies are the main challenge, while lack of lexical overlap is less of a problem, at least for in-domain parsing. Our final model is able to predict where errors will occur with an accuracy of 76.3% for the bottom-up parser and 76.6% for the top-down parser.

Unsupervised Chunking with Hierarchical RNN

  • paper_url: http://arxiv.org/abs/2309.04919
  • repo_url: https://github.com/manga-uofa/uchrnn
  • paper_authors: Zijun Wu, Anup Anand Deshmukh, Yongkang Wu, Jimmy Lin, Lili Mou
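  • for: An unsupervised approach to chunking, a syntactic task that groups words in a non-hierarchical manner, without relying on manual annotations of syntactic structure.
  • methods: A two-layer Hierarchical Recurrent Neural Network (HRNN) models word-to-chunk and chunk-to-sentence compositions. Training has two stages: pretraining with an unsupervised parser, then finetuning on downstream NLP tasks.
  • results: On the CoNLL-2000 dataset, phrase F1 improves by up to 6 percentage points over existing unsupervised methods, and finetuning on downstream tasks brings a further improvement. The chunking structure emerges only transiently during downstream-task training.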
    Abstract In Natural Language Processing (NLP), predicting linguistic structures, such as parsing and chunking, has mostly relied on manual annotations of syntactic structures. This paper introduces an unsupervised approach to chunking, a syntactic task that involves grouping words in a non-hierarchical manner. We present a two-layer Hierarchical Recurrent Neural Network (HRNN) designed to model word-to-chunk and chunk-to-sentence compositions. Our approach involves a two-stage training process: pretraining with an unsupervised parser and finetuning on downstream NLP tasks. Experiments on the CoNLL-2000 dataset reveal a notable improvement over existing unsupervised methods, enhancing phrase F1 score by up to 6 percentage points. Further, finetuning with downstream tasks results in an additional performance improvement. Interestingly, we observe that the emergence of the chunking structure is transient during the neural model's downstream-task training. This study contributes to the advancement of unsupervised syntactic structure discovery and opens avenues for further research in linguistic theory.
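The phrase F1 metric can be sketched for the unlabeled case: chunks are encoded as B (begin) and I (inside) tags, converted to spans, and a predicted chunk counts as correct only when both of its boundaries match a gold chunk. Treating chunks as unlabeled spans is an assumption suited to the unsupervised setting. The helper names are hypothetical and this is not the official CoNLL-2000 scorer.

```python
def tags_to_spans(tags):
    """Convert a B/I tag sequence into a set of (start, end) spans, end exclusive."""
    spans, start = set(), None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.add((start, i))  # close the previous chunk
            start = i
        elif start is None:
            start = i  # an "I" with no open chunk starts one
    if start is not None:
        spans.add((start, len(tags)))
    return spans


def phrase_f1(gold_tags, pred_tags):
    """F1 over exact-match chunk spans."""
    gold, pred = tags_to_spans(gold_tags), tags_to_spans(pred_tags)
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(pred), correct / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["B", "I", "B", "I", "I", "B"]
    pred = ["B", "I", "B", "I", "B", "B"]
    print(phrase_f1(gold, pred))
```

In the usage example the prediction splits the middle gold chunk in two, so only two of its four chunks match the three gold chunks.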

cs.LG - 2023-09-10

Learning Energy-Based Models by Cooperative Diffusion Recovery Likelihood

  • paper_url: http://arxiv.org/abs/2309.05153
  • repo_url: None
  • paper_authors: Yaxuan Zhu, Jianwen Xie, Yingnian Wu, Ruiqi Gao
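  • for: Closing the gap in sample quality between energy-based models (EBMs) and other generative frameworks such as GANs and diffusion models, while making EBM training more tractable.
  • methods: Cooperative diffusion recovery likelihood (CDRL) learns a series of EBMs defined on increasingly noisy versions of a dataset, each paired with an initializer model. The initializer amortizes the EBM's sampling process, its samples are refined by a few sampling steps from the EBM, and the two models are trained jointly. A new noise schedule and a variance reduction technique further improve sample quality.
  • results: FID scores improve significantly over existing EBM methods on CIFAR-10 and ImageNet 32x32, with a 2x speedup over DRL. The method also extends to compositional generation, image inpainting, and classifier-free guidance for conditional generation.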
    Abstract Training energy-based models (EBMs) with maximum likelihood estimation on high-dimensional data can be both challenging and time-consuming. As a result, there is a noticeable gap in sample quality between EBMs and other generative frameworks like GANs and diffusion models. To close this gap, inspired by the recent efforts of learning EBMs by maximizing diffusion recovery likelihood (DRL), we propose cooperative diffusion recovery likelihood (CDRL), an effective approach to tractably learn and sample from a series of EBMs defined on increasingly noisy versions of a dataset, paired with an initializer model for each EBM. At each noise level, the initializer model learns to amortize the sampling process of the EBM, and the two models are jointly estimated within a cooperative training framework. Samples from the initializer serve as starting points that are refined by a few sampling steps from the EBM. With the refined samples, the EBM is optimized by maximizing recovery likelihood, while the initializer is optimized by learning from the difference between the refined samples and the initial samples. We develop a new noise schedule and a variance reduction technique to further improve the sample quality. Combining these advances, we significantly boost the FID scores compared to existing EBM methods on CIFAR-10 and ImageNet 32x32, with a 2x speedup over DRL. In addition, we extend our method to compositional generation and image inpainting tasks, and showcase the compatibility of CDRL with classifier-free guidance for conditional generation, achieving similar trade-offs between sample quality and sample diversity as in diffusion models.
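    Summary Training energy-based models (EBMs) by maximum likelihood on high-dimensional data can be both challenging and time-consuming, which leaves a noticeable gap in sample quality between EBMs and other generative frameworks such as GANs and diffusion models. To close this gap, the paper proposes cooperative diffusion recovery likelihood (CDRL), which tractably learns and samples from a series of EBMs, each defined at a different noise level and paired with an initializer model. At each noise level the initializer learns to amortize the EBM's sampling process, and the two models are trained jointly in a cooperative framework: samples from the initializer serve as starting points that are refined by a few EBM sampling steps; the EBM is then optimized by maximizing recovery likelihood, and the initializer by learning from the difference between refined and initial samples. A new noise schedule and a variance reduction technique further improve sample quality. Together these advances improve FID over existing EBM methods on CIFAR-10 and ImageNet 32x32 with a 2x speedup, and the method extends to compositional generation, image inpainting, and conditional generation with classifier-free guidance.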

Distribution Grid Line Outage Identification with Unknown Pattern and Performance Guarantee

  • paper_url: http://arxiv.org/abs/2309.07157
  • repo_url: None
  • paper_authors: Chenhan Xiao, Yizheng Liao, Yang Weng
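  • for: Detecting line outages in distribution grids is essential for sustainable operation; this paper proposes a practical yet robust detection method that does not require costly phase angles or power flow data.
  • methods: A data-driven method based on change-point detection learns the parameters of the post-outage distribution through gradient descent. Because plain gradient descent runs into feasibility issues, a Bregman divergence constraint is added to control the trajectory of the parameter updates.
  • results: The approach is evaluated on several representative distribution grids and real load profiles with 17 outage configurations. The results show that outages can be detected and localized in a timely manner using only voltage magnitudes and without assuming prior knowledge of outage patterns.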
    Abstract Line outage identification in distribution grids is essential for sustainable grid operation. In this work, we propose a practical yet robust detection approach that utilizes only readily available voltage magnitudes, eliminating the need for costly phase angles or power flow data. Given the sensor data, many existing detection methods based on change-point detection require prior knowledge of outage patterns, which are unknown for real-world outage scenarios. To remove this impractical requirement, we propose a data-driven method to learn the parameters of the post-outage distribution through gradient descent. However, directly using gradient descent presents feasibility issues. To address this, we modify our approach by adding a Bregman divergence constraint to control the trajectory of the parameter updates, which eliminates the feasibility problems. As timely operation is the key nowadays, we prove that the optimal parameters can be learned with convergence guarantees via leveraging the statistical and physical properties of voltage data. We evaluate our approach using many representative distribution grids and real load profiles with 17 outage configurations. The results show that we can detect and localize the outage in a timely manner with only voltage magnitudes and without assuming a prior knowledge of outage patterns.
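    Summary Line outage identification in distribution grids is essential for sustainable grid operation. This work proposes a practical yet robust detection approach that uses only readily available voltage magnitudes, with no need for costly phase angles or power flow data. Many existing methods based on change-point detection require prior knowledge of outage patterns, which are unknown in real-world outage scenarios. To remove this impractical requirement, the paper learns the parameters of the post-outage distribution through gradient descent. Because plain gradient descent has feasibility issues, a Bregman divergence constraint is added to control the trajectory of the parameter updates, which eliminates those issues. Using the statistical and physical properties of voltage data, the authors prove that the optimal parameters can be learned with convergence guarantees. Evaluations on several representative distribution grids and real load profiles with 17 outage configurations show that outages can be detected and localized in a timely manner using only voltage magnitudes and without assuming prior knowledge of outage patterns.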
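    The change-point detection setting described above can be illustrated with the classical CUSUM test. This is a minimal sketch under stated assumptions, not the paper's method: it assumes Gaussian data with a known post-change mean, which is exactly the prior knowledge the paper avoids by learning the post-outage parameters. The function name, the sample values, and the threshold are hypothetical.

```python
def cusum_detect(samples, mu_pre, mu_post, sigma, threshold):
    """Return the first index where the CUSUM statistic reaches threshold, else None."""
    stat = 0.0
    for t, x in enumerate(samples):
        # Log-likelihood ratio of N(mu_post, sigma^2) against N(mu_pre, sigma^2).
        llr = (mu_post - mu_pre) / sigma**2 * (x - (mu_pre + mu_post) / 2.0)
        # Accumulate evidence for a change, resetting to zero when it turns negative.
        stat = max(0.0, stat + llr)
        if stat >= threshold:
            return t
    return None


# Hypothetical voltage-magnitude deviations; the shift starts at index 5.
voltage_dev = [0.0, 0.1, -0.1, 0.05, -0.05, 1.0, 0.9, 1.1, 1.0]
alarm = cusum_detect(voltage_dev, mu_pre=0.0, mu_post=1.0, sigma=0.2, threshold=20.0)
print(alarm)
```

    A higher threshold lowers the false-alarm rate at the cost of a longer detection delay, which is the trade-off any change-point detector of this kind has to manage.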

Nonlinear Granger Causality using Kernel Ridge Regression

  • paper_url: http://arxiv.org/abs/2309.05107
  • repo_url: https://github.com/WojtekFulmyk/mlcausality-krr-paper-replication
  • paper_authors: Wojciech “Victor” Fulmyk
  • for: Identifying nonlinear Granger causal relationships.
  • methods: Utilizes a flexible plug-in architecture with any nonlinear regressor, and kernel ridge regression with radial basis function kernel.
  • results: Achieves competitive AUC scores, more finely calibrated $p$-values, and significantly reduced computation times compared to existing algorithms.
    Abstract I introduce a novel algorithm and accompanying Python library, named mlcausality, designed for the identification of nonlinear Granger causal relationships. This novel algorithm uses a flexible plug-in architecture that enables researchers to employ any nonlinear regressor as the base prediction model. Subsequently, I conduct a comprehensive performance analysis of mlcausality when the prediction regressor is the kernel ridge regressor with the radial basis function kernel. The results demonstrate that mlcausality employing kernel ridge regression achieves competitive AUC scores across a diverse set of simulated data. Furthermore, mlcausality with kernel ridge regression yields more finely calibrated $p$-values in comparison to rival algorithms. This enhancement enables mlcausality to attain superior accuracy scores when using intuitive $p$-value-based thresholding criteria. Finally, mlcausality with the kernel ridge regression exhibits significantly reduced computation times compared to existing nonlinear Granger causality algorithms. In fact, in numerous instances, this innovative approach achieves superior solutions within computational timeframes that are an order of magnitude shorter than those required by competing algorithms.
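    Summary The author introduces a new algorithm and an accompanying Python library, mlcausality, for identifying nonlinear Granger causal relationships. The algorithm uses a flexible plug-in architecture that lets researchers use any nonlinear regressor as the base prediction model. Its performance is then analyzed with kernel ridge regression (radial basis function kernel) as the regressor. In this configuration mlcausality achieves competitive AUC scores across a diverse set of simulated data and yields more finely calibrated $p$-values than rival algorithms, which gives it higher accuracy under intuitive $p$-value-based thresholding criteria. It is also significantly faster than existing nonlinear Granger causality algorithms, in many cases reaching better solutions in computation times an order of magnitude shorter.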
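    The idea behind Granger causality with a nonlinear regressor can be sketched by comparing out-of-sample errors of a restricted model (lags of y only) and an unrestricted model (lags of y and x). The sketch below is an illustration under stated assumptions and does not reproduce the mlcausality API or its test statistic: the helper names, the toy data, and the hyperparameters `gamma` and `lam` are all hypothetical, and the kernel ridge regressor is written out in plain Python.

```python
import math
import random


def rbf(a, b, gamma):
    """Radial basis function kernel between two feature vectors."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def krr_test_mse(X_tr, y_tr, X_te, y_te, gamma=1.0, lam=1e-2):
    """Fit kernel ridge regression on the training split, return test MSE."""
    K = [[rbf(a, b, gamma) + (lam if i == j else 0.0)
          for j, b in enumerate(X_tr)] for i, a in enumerate(X_tr)]
    alpha = solve(K, y_tr)
    preds = [sum(w * rbf(x, a, gamma) for w, a in zip(alpha, X_tr)) for x in X_te]
    return sum((p - t) ** 2 for p, t in zip(preds, y_te)) / len(y_te)


# Toy data (assumption): x drives y nonlinearly with a one-step lag.
random.seed(0)
x = [random.uniform(-2.0, 2.0) for _ in range(121)]
y = [0.0] + [math.sin(x[t - 1]) for t in range(1, 121)]

restricted = [[y[t - 1]] for t in range(1, 121)]              # lag of y only
unrestricted = [[y[t - 1], x[t - 1]] for t in range(1, 121)]  # lags of y and x
target = y[1:]
split = 80
mse_r = krr_test_mse(restricted[:split], target[:split],
                     restricted[split:], target[split:])
mse_u = krr_test_mse(unrestricted[:split], target[:split],
                     unrestricted[split:], target[split:])
print(mse_r, mse_u)
```

    If adding the lag of x lowers the out-of-sample error, x is said to Granger-cause y; a full test would turn the two error series into a $p$-value rather than compare the means directly.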

Convex Q Learning in a Stochastic Environment: Extended Version

  • paper_url: http://arxiv.org/abs/2309.05105
  • repo_url: None
  • paper_authors: Fan Lu, Sean Meyn
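  • for: The paper studies convex Q-learning with function approximation for Markov decision processes.
  • methods: The algorithms and theory rest on a relaxation of a dual of Manne's linear programming characterization of optimal control, formulated as a convex program.
  • results: The main contributions are: (1) properties of the convex-program relaxation and its relationship to standard Q-learning; (2) a direct model-free method that approximates the convex program; (3) new analysis techniques that establish the algorithms' rate of convergence.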
    Abstract The paper introduces the first formulation of convex Q-learning for Markov decision processes with function approximation. The algorithms and theory rest on a relaxation of a dual of Manne's celebrated linear programming characterization of optimal control. The main contributions firstly concern properties of the relaxation, described as a deterministic convex program: we identify conditions for a bounded solution, and a significant relationship between the solution to the new convex program, and the solution to standard Q-learning. The second set of contributions concern algorithm design and analysis: (i) A direct model-free method for approximating the convex program for Q-learning shares properties with its ideal. In particular, a bounded solution is ensured subject to a simple property of the basis functions; (ii) The proposed algorithms are convergent and new techniques are introduced to obtain the rate of convergence in a mean-square sense; (iii) The approach can be generalized to a range of performance criteria, and it is found that variance can be reduced by considering "relative" dynamic programming equations; (iv) The theory is illustrated with an application to a classical inventory control problem.
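    Summary The paper introduces the first formulation of convex Q-learning for Markov decision processes with function approximation. The algorithms and theory rest on a relaxation of a dual of Manne's linear programming characterization of optimal control. The contributions are: 1. Properties of the convex program: conditions under which its solution is bounded, and its relationship to the solution of standard Q-learning. 2. Algorithm design and analysis: (i) a direct model-free method that approximates the convex program and shares its properties, guaranteeing a bounded solution subject to a simple property of the basis functions; (ii) the proposed algorithms converge, and new techniques are introduced to obtain the mean-square rate of convergence; (iii) the approach extends to a range of performance criteria, and variance can be reduced by considering "relative" dynamic programming equations; (iv) the theory is illustrated with an application to a classical inventory control problem.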

Is Learning in Biological Neural Networks based on Stochastic Gradient Descent? An analysis using stochastic processes

  • paper_url: http://arxiv.org/abs/2309.05102
  • repo_url: None
  • paper_authors: Sören Christensen, Jan Kallsen
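  • for: This study examines how learning in biological neural networks (BNNs) differs from learning in artificial neural networks (ANNs).
  • methods: It studies a stochastic model for supervised learning in BNNs.
  • results: When each learning opportunity is processed by many local updates, an approximate (continuous) gradient step occurs, which suggests that stochastic gradient descent may play a role in optimizing BNNs.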
    Abstract In recent years, there has been an intense debate about how learning in biological neural networks (BNNs) differs from learning in artificial neural networks. It is often argued that the updating of connections in the brain relies only on local information, and therefore a stochastic gradient-descent type optimization method cannot be used. In this paper, we study a stochastic model for supervised learning in BNNs. We show that a (continuous) gradient step occurs approximately when each learning opportunity is processed by many local updates. This result suggests that stochastic gradient descent may indeed play a role in optimizing BNNs.
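    Summary In recent years there has been an intense debate about how learning in biological neural networks (BNNs) differs from learning in artificial neural networks (ANNs). It is commonly argued that connection updates in the brain rely only on local information, so a stochastic gradient descent type of optimization cannot be used. This paper studies a stochastic model for supervised learning in BNNs and shows that an approximate gradient step occurs when each learning opportunity is processed by many local updates. The result suggests that stochastic gradient descent may indeed play a role in optimizing BNNs.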

A Deep Dive into Sleep: Single-Channel EEG-Based Sleep Stage Classification with Model Interpretability

  • paper_url: http://arxiv.org/abs/2309.07156
  • repo_url: https://github.com/suvadeepmaiti/EEG_Sleep_Stage_classification
  • paper_authors: Shivam Sharma, Suvadeep Maiti, S. Mythirayee, Srijithesh Rajendran, Bapi Raju
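  • for: This study develops a sleep stage classification method based on single-channel EEG.
  • methods: It uses an SE-ResNet-Bi-LSTM architecture with two basic elements: a feature extractor based on SE-ResNet and a temporal context encoder built from stacked Bi-LSTM units.
  • results: The method is evaluated on three datasets, SleepEDF-20, SleepEDF-78, and SHHS, reaching accuracies of 87.5%, 83.9%, and 87.8% and macro-F1 scores of 82.5, 78.9, and 81.9, respectively. 1D-GradCAM visualization is also introduced to help explain the model's decisions during sleep stage classification.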
    Abstract Sleep, a fundamental physiological process, occupies a significant portion of our lives. Accurate classification of sleep stages serves as a crucial tool for evaluating sleep quality and identifying probable sleep disorders. This work introduces a novel methodology that utilises a SE-ResNet-Bi-LSTM architecture to classify sleep into five separate stages. The classification process is based on the analysis of single-channel electroencephalograms (EEGs). The framework that has been suggested consists of two fundamental elements: a feature extractor that utilises SE-ResNet, and a temporal context encoder that uses stacks of Bi-LSTM units. The effectiveness of our approach is substantiated by thorough assessments conducted on three different datasets, namely SleepEDF-20, SleepEDF-78, and SHHS. Significantly, our methodology attains notable levels of accuracy, specifically 87.5%, 83.9%, and 87.8%, along with macro-F1 scores of 82.5, 78.9, and 81.9 for the corresponding datasets. Notably, we introduce the utilization of 1D-GradCAM visualization to shed light on the decision-making process of our model in the realm of sleep stage classification. This visualization method not only provides valuable insights into the model's classification rationale but also aligns its outcomes with the annotations made by sleep experts. One notable feature of our research is the integration of an expedited training approach, which effectively preserves the model's resilience in terms of performance. The experimental evaluations conducted provide a comprehensive evaluation of the effectiveness of our proposed model in comparison to existing approaches, highlighting its potential for practical applications.
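    Summary Sleep is a fundamental physiological process that occupies a large part of our lives. Accurate sleep stage classification is an important tool for assessing sleep quality and identifying probable sleep disorders. This work proposes a method that uses an SE-ResNet-Bi-LSTM architecture to classify sleep into five stages from single-channel electroencephalograms (EEGs). The framework has two elements: a feature extractor based on SE-ResNet and a temporal context encoder built from stacked Bi-LSTM units. Evaluated on SleepEDF-20, SleepEDF-78, and SHHS, the method reaches accuracies of 87.5%, 83.9%, and 87.8% and macro-F1 scores of 82.5, 78.9, and 81.9, respectively. 1D-GradCAM visualization is used to explain the model's decisions, and the resulting explanations align with sleep experts' annotations. The work also integrates an expedited training approach that preserves performance, and comparisons with existing approaches indicate potential for practical use.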

Adaptive conformal classification with noisy labels

  • paper_url: http://arxiv.org/abs/2309.05092
  • repo_url: https://github.com/msesia/conformal-label-noise
  • paper_authors: Matteo Sesia, Y. X. Rachel Wang, Xin Tong
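  • for: This paper develops conformal prediction methods that automatically adapt to random label contamination, giving more informative prediction sets with stronger coverage guarantees than state-of-the-art approaches.
  • methods: It uses a precise theoretical characterization of the effect of label contamination on coverage and makes it actionable through new calibration algorithms. The solution is flexible: it can exploit different assumptions about the label contamination process and requires no knowledge of the data distribution or of the machine-learning classifier.
  • results: The advantages of the methods are demonstrated through extensive simulations and an application to the CIFAR-10H image data set.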
    Abstract This paper develops novel conformal prediction methods for classification tasks that can automatically adapt to random label contamination in the calibration sample, enabling more informative prediction sets with stronger coverage guarantees compared to state-of-the-art approaches. This is made possible by a precise theoretical characterization of the effective coverage inflation (or deflation) suffered by standard conformal inferences in the presence of label contamination, which is then made actionable through new calibration algorithms. Our solution is flexible and can leverage different modeling assumptions about the label contamination process, while requiring no knowledge about the data distribution or the inner workings of the machine-learning classifier. The advantages of the proposed methods are demonstrated through extensive simulations and an application to object classification with the CIFAR-10H image data set.
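    Summary This paper develops new conformal prediction methods for classification that automatically adapt to random label contamination in the calibration sample, producing more informative prediction sets with stronger coverage guarantees than existing approaches. The methods build on a precise theoretical characterization of how label contamination inflates or deflates the effective coverage of standard conformal inference, which new calibration algorithms then put to use. The solution is flexible: it can exploit different modeling assumptions about the label contamination process and requires no knowledge of the data distribution or of the classifier's inner workings. Its advantages are demonstrated through simulations and an application to the CIFAR-10H image data set.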
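    For context, the standard split conformal procedure that the paper starts from can be sketched as follows. This is the textbook baseline assuming clean calibration labels, not the noise-adaptive calibration the paper proposes; the function names and the toy probabilities are hypothetical.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Finite-sample quantile of the scores 1 - p(true label) on calibration data."""
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))  # 1-based rank of the quantile
    # With too few calibration points the threshold is trivial (keep every label).
    return scores[rank - 1] if rank <= n else 1.0


def prediction_set(probs, qhat):
    """All labels whose nonconformity score 1 - p does not exceed the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= qhat]


# Hypothetical classifier outputs on 7 calibration points with 3 classes.
cal_probs = [
    [0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7], [0.6, 0.3, 0.1],
    [0.25, 0.5, 0.25], [0.3, 0.3, 0.4], [0.3, 0.35, 0.35],
]
cal_labels = [0, 1, 2, 0, 1, 2, 0]

qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
pred = prediction_set([0.5, 0.45, 0.05], qhat)
print(qhat, pred)
```

    When some calibration labels are wrong, the scores above are computed against the wrong class, so the threshold and therefore the coverage shift; that shift is what the paper characterizes and corrects.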

A supervised generative optimization approach for tabular data

  • paper_url: http://arxiv.org/abs/2309.05079
  • repo_url: None
  • paper_authors: Fadi Hamad, Shinpei Nakamura-Sakai, Saheed Obitayo, Vamsi K. Potluru
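  • for: This study proposes a supervised synthetic data generation framework that meets financial institutions' needs for data generation tailored to specific tasks and data sets.
  • methods: The framework integrates a supervised component tailored to the specific downstream task and uses a meta-learning approach to learn the optimal mixture distribution of existing synthetic distributions.
  • results: The framework is reported to generate high-quality synthetic data that can be tailored to the downstream task, improving the effectiveness and reliability of data generation.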
    Abstract Synthetic data generation has emerged as a crucial topic for financial institutions, driven by multiple factors, such as privacy protection and data augmentation. Many algorithms have been proposed for synthetic data generation but reaching the consensus on which method we should use for the specific data sets and use cases remains challenging. Moreover, the majority of existing approaches are "unsupervised" in the sense that they do not take into account the downstream task. To address these issues, this work presents a novel synthetic data generation framework. The framework integrates a supervised component tailored to the specific downstream task and employs a meta-learning approach to learn the optimal mixture distribution of existing synthetic distributions.
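    Summary Synthetic data generation has become a key topic for financial institutions, driven by factors such as privacy protection and data augmentation. Many algorithms have been proposed, but deciding which method to use for a specific data set and use case remains challenging. Moreover, most existing methods are "unsupervised" in the sense that they ignore the downstream task. To address these issues, the paper proposes a new synthetic data generation framework that integrates a supervised component tailored to the specific downstream task and uses a meta-learning approach to learn the optimal mixture distribution of existing synthetic distributions.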

Generalization error bounds for iterative learning algorithms with bounded updates

  • paper_url: http://arxiv.org/abs/2309.05077
  • repo_url: None
  • paper_authors: Jingwen Fu, Nanning Zheng
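  • for: This study explores the generalization characteristics of iterative learning algorithms on non-convex loss functions using information-theoretic techniques. The main contribution is a generalization error bound for algorithms with bounded updates, which goes beyond previous works that focused only on Stochastic Gradient Descent (SGD).
  • methods: The approach has two main novelties: 1) it reformulates the mutual information as the uncertainty of updates, providing a new perspective; 2) it uses a variance decomposition technique to decompose information across iterations, allowing for a simpler surrogate process.
  • results: The generalization bound is analyzed under various settings and improves on prior bounds when the model dimension grows at the same rate as the number of training samples. The paper also examines the scaling behavior previously observed in large language models. Overall, the work takes a further step toward practical generalization theories.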
    Abstract This paper explores the generalization characteristics of iterative learning algorithms with bounded updates for non-convex loss functions, employing information-theoretic techniques. Our key contribution is a novel bound for the generalization error of these algorithms with bounded updates, extending beyond the scope of previous works that only focused on Stochastic Gradient Descent (SGD). Our approach introduces two main novelties: 1) we reformulate the mutual information as the uncertainty of updates, providing a new perspective, and 2) instead of using the chaining rule of mutual information, we employ a variance decomposition technique to decompose information across iterations, allowing for a simpler surrogate process. We analyze our generalization bound under various settings and demonstrate improved bounds when the model dimension increases at the same rate as the number of training data samples. To bridge the gap between theory and practice, we also examine the previously observed scaling behavior in large language models. Ultimately, our work takes a further step for developing practical generalization theories.
    Summary The approach introduces two main novelties: 1) we reformulate the mutual information as a measure of the uncertainty of updates, providing a new perspective on the problem; 2) instead of using the chaining rule of mutual information, we employ a variance decomposition technique to decompose information across iterations, allowing for a simpler surrogate process. We analyze our generalization bound under various settings and show that it improves as the model dimension increases at the same rate as the number of training data samples. To bridge the gap between theory and practice, we also examine the previously observed scaling behavior in large language models. Our work represents a significant step forward in developing practical generalization theories.

Mutation-based Fault Localization of Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2309.05067
  • repo_url: https://github.com/ali-ghanbari/deepmufl-ase-2023
  • paper_authors: Ali Ghanbari, Deepak-George Thomas, Muhammad Arbab Arshad, Hridesh Rajan
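  • for: This study aims to improve the reliability of systems based on deep neural networks (DNNs), especially in safety-critical domains.
  • methods: It proposes a new mutation-based technique, deepmufl, for localizing faults in DNN models.
  • results: On 109 bugs collected from StackOverflow, deepmufl detects 53 by ranking the buggy layer in the top-1 position, outperforming state-of-the-art static and dynamic DNN fault localization systems. Mutation selection can also halve the fault localization time for a pre-trained model while losing only 7.55% of the bugs localized in the top-1 position.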
    Abstract Deep neural networks (DNNs) are susceptible to bugs, just like other types of software systems. A significant uptick in the use of DNNs, and their applications in wide-ranging areas, including safety-critical systems, warrants extensive research on software engineering tools for improving the reliability of DNN-based systems. One such tool that has gained significant attention in recent years is DNN fault localization. This paper revisits mutation-based fault localization in the context of DNN models and proposes a novel technique, named deepmufl, applicable to a wide range of DNN models. We have implemented deepmufl and have evaluated its effectiveness using 109 bugs obtained from StackOverflow. Our results show that deepmufl detects 53/109 of the bugs by ranking the buggy layer in top-1 position, outperforming state-of-the-art static and dynamic DNN fault localization systems that are also designed to target the class of bugs supported by deepmufl. Moreover, we observed that we can halve the fault localization time for a pre-trained model using mutation selection, yet losing only 7.55% of the bugs localized in top-1 position.

SA-Solver: Stochastic Adams Solver for Fast Sampling of Diffusion Models

  • paper_url: http://arxiv.org/abs/2309.05019
  • repo_url: None
  • paper_authors: Shuchen Xue, Mingyang Yi, Weijian Luo, Shifeng Zhang, Jiacheng Sun, Zhenguo Li, Zhi-Ming Ma
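  • for: This paper analyzes and improves stochastic sampling for generation with Diffusion Probabilistic Models (DPMs).
  • methods: It studies stochastic sampling from two aspects, the variance-controlled diffusion SDE and the linear multi-step SDE solver, and on that basis proposes SA-Solver, a stochastic Adams method for solving diffusion SDEs.
  • results: For few-step sampling, SA-Solver matches or improves on existing state-of-the-art sampling methods, and it reaches SOTA FID scores on a substantial set of benchmark datasets under a suitable number of function evaluations (NFEs).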
    Abstract Diffusion Probabilistic Models (DPMs) have achieved considerable success in generation tasks. As sampling from DPMs is equivalent to solving diffusion SDE or ODE which is time-consuming, numerous fast sampling methods built upon improved differential equation solvers are proposed. The majority of such techniques consider solving the diffusion ODE due to its superior efficiency. However, stochastic sampling could offer additional advantages in generating diverse and high-quality data. In this work, we engage in a comprehensive analysis of stochastic sampling from two aspects: variance-controlled diffusion SDE and linear multi-step SDE solver. Based on our analysis, we propose SA-Solver, which is an improved efficient stochastic Adams method for solving diffusion SDE to generate data with high quality. Our experiments show that SA-Solver achieves: 1) improved or comparable performance compared with the existing state-of-the-art sampling methods for few-step sampling; 2) SOTA FID scores on substantial benchmark datasets under a suitable number of function evaluations (NFEs).
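    Summary Diffusion Probabilistic Models (DPMs) have achieved considerable success in generation tasks. Because sampling from a DPM is equivalent to solving a diffusion SDE or ODE, which is time-consuming, many fast sampling methods built on improved differential equation solvers have been proposed. Most of them solve the diffusion ODE because it is more efficient, yet stochastic sampling can offer additional advantages in generating diverse, high-quality data. This work analyzes stochastic sampling from two aspects: the variance-controlled diffusion SDE and the linear multi-step SDE solver. Based on the analysis it proposes SA-Solver, an improved and efficient stochastic Adams method for solving diffusion SDEs. Experiments show that SA-Solver 1) matches or improves on existing state-of-the-art sampling methods for few-step sampling, and 2) reaches SOTA FID scores on substantial benchmark datasets under a suitable number of function evaluations (NFEs).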
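    The "linear multi-step" idea that Adams-type solvers rely on can be illustrated with the deterministic two-step Adams-Bashforth method. This is only a sketch of the classical ODE building block under a toy test equation; it does not reproduce SA-Solver, which additionally handles the noise term of the diffusion SDE. The function name and the test problem are hypothetical.

```python
import math


def adams_bashforth2(f, y0, t0, t1, steps):
    """Integrate y' = f(t, y) from t0 to t1 with the two-step Adams-Bashforth rule."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    f_prev = f(t, y)
    y = y + h * f_prev  # bootstrap the first step with explicit Euler
    t += h
    for _ in range(steps - 1):
        f_curr = f(t, y)
        # Reuse the previous evaluation: one new call to f per step.
        y = y + h * (1.5 * f_curr - 0.5 * f_prev)
        f_prev = f_curr
        t += h
    return y


# Toy problem (assumption): y' = -y with y(0) = 1, whose exact solution is exp(-t).
approx = adams_bashforth2(lambda t, y: -y, 1.0, 0.0, 1.0, 100)
print(approx, math.exp(-1.0))
```

    Reusing past evaluations is what makes multi-step methods attractive when each function evaluation is a forward pass through a large network, since accuracy improves without increasing the NFE count per step.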

Computational Approaches for Predicting Drug-Disease Associations: A Comprehensive Review

  • paper_url: http://arxiv.org/abs/2309.06388
  • repo_url: None
  • paper_authors: Chunyan Ao, Zhichao Xiao, Lixin Guan, Liang Yu
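  • for: This review surveys computational methods for predicting drug-disease associations, with the aim of reducing the cost, time, and risk of drug development.
  • methods: It analyzes several families of methods: neural network-based algorithms, matrix-based algorithms, recommendation algorithms, link-based reasoning algorithms, and text mining and semantic reasoning.
  • results: It compares the prediction performance of existing drug-disease association prediction algorithms and discusses current challenges and future prospects.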
    Abstract In recent decades, traditional drug research and development have been facing challenges such as high cost, long timelines, and high risks. To address these issues, many computational approaches have been suggested for predicting the relationship between drugs and diseases through drug repositioning, aiming to reduce the cost, development cycle, and risks associated with developing new drugs. Researchers have explored different computational methods to predict drug-disease associations, including drug side effects-disease associations, drug-target associations, and miRNA-disease associations. In this comprehensive review, we focus on recent advances in predicting drug-disease association methods for drug repositioning. We first categorize these methods into several groups, including neural network-based algorithms, matrix-based algorithms, recommendation algorithms, link-based reasoning algorithms, and text mining and semantic reasoning. Then, we compare the prediction performance of existing drug-disease association prediction algorithms. Lastly, we delve into the present challenges and future prospects concerning drug-disease associations.
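    Summary Traditional drug research and development faces high cost, long timelines, and high risk. To address these issues, many computational methods have been proposed for predicting relationships between drugs and diseases, with the aim of reducing the cost, development cycle, and risk of developing new drugs. Researchers have explored different ways to predict drug-disease associations, including drug side effect-disease, drug-target, and miRNA-disease associations. This review focuses on recent advances in drug-disease association prediction. It first groups the methods into neural network-based algorithms, matrix-based algorithms, recommendation algorithms, link-based reasoning algorithms, and text mining and semantic reasoning, then compares the prediction performance of existing algorithms, and finally discusses current challenges and future prospects.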

Linear Speedup of Incremental Aggregated Gradient Methods on Streaming Data

  • paper_url: http://arxiv.org/abs/2309.04980
  • repo_url: None
  • paper_authors: Xiaolu Wang, Cheng Jin, Hoi-To Wai, Yuantao Gu
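  • for: This work studies an incremental aggregated gradient (IAG) method for large-scale distributed optimization.
  • methods: The method suits the parameter server architecture, because the server can easily aggregate potentially stale gradients contributed by workers.
  • results: For streaming data, the method achieves linear speedup when workers update frequently enough, even with heterogeneous data across workers. The expected squared distance to the optimal solution decays at $O((1+T)/(nt))$, where $n$ is the number of workers, $t$ is the iteration number, and $T/n$ is the update frequency of workers. The analysis treats conditional expectations with stale gradients and a recursive system with both delay and noise terms, which are new to the analysis of IAG-type algorithms.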
    Abstract This paper considers a type of incremental aggregated gradient (IAG) method for large-scale distributed optimization. The IAG method is well suited for the parameter server architecture as the latter can easily aggregate potentially stale gradients contributed by workers. Although the convergence of IAG in the case of deterministic gradient is well known, there are only a few results for the case of its stochastic variant based on streaming data. Considering strongly convex optimization, this paper shows that the streaming IAG method achieves linear speedup when the workers are updating frequently enough, even if the data sample distribution across workers is heterogeneous. We show that the expected squared distance to the optimal solution decays at $O((1+T)/(nt))$, where $n$ is the number of workers, $t$ is the iteration number, and $T/n$ is the update frequency of workers. Our analysis involves careful treatments of the conditional expectations with stale gradients and a recursive system with both delayed and noise terms, which are new to the analysis of IAG-type algorithms. Numerical results are presented to verify our findings.
    Summary This paper focuses on strongly convex optimization and shows that the streaming IAG method achieves linear speedup when workers update frequently enough, even if the data sample distribution across workers is heterogeneous. The analysis involves careful treatment of conditional expectations with stale gradients and of a recursive system with both delayed and noise terms, which are new to the analysis of IAG-type algorithms. The expected squared distance to the optimal solution decays at $O((1+T)/(nt))$, where $n$ is the number of workers, $t$ is the iteration number, and $T/n$ is the update frequency of workers. Numerical results are presented to verify the findings.
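    The aggregation of fresh and stale gradients at a parameter server can be sketched on a toy problem. The code below is a deterministic illustration under stated assumptions, not the paper's streaming algorithm: each hypothetical worker i holds the quadratic loss 0.5 * (x - a_i)^2, workers report in a fixed cyclic order, and the step size and iteration count are arbitrary choices.

```python
def iag_minimize(targets, step=0.1, iters=400, x0=0.0):
    """Minimize (1/n) * sum_i 0.5 * (x - a_i)^2 with incremental aggregated gradients."""
    n = len(targets)
    x = x0
    # Gradient table kept by the server, initially all evaluated at x0.
    grads = [x - a for a in targets]
    for k in range(iters):
        i = k % n                    # worker i reports a fresh gradient
        grads[i] = x - targets[i]    # the other entries stay stale
        x -= step * sum(grads) / n   # server step uses the whole table
    return x


# Hypothetical local optima of four workers; the global minimizer is their mean.
x_star = iag_minimize([1.0, 2.0, 3.0, 6.0])
print(x_star)
```

    Only one gradient is refreshed per iteration, so each server step is cheap, while the table lets every worker's most recent contribution influence the update; the price is that stale entries limit how large the step size can be.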

LMBiS-Net: A Lightweight Multipath Bidirectional Skip Connection based CNN for Retinal Blood Vessel Segmentation

  • paper_url: http://arxiv.org/abs/2309.04968
  • repo_url: None
  • paper_authors: Mufassir M. Abbasi, Shahzaib Iqbal, Asim Naveed, Tariq M. Khan, Syed S. Naqvi, Wajeeha Khalid
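  • for: This study proposes a fast and accurate retinal blood vessel segmentation method to support the diagnosis and treatment of eye diseases.
  • methods: It uses a lightweight pixel-level convolutional neural network, LMBiS-Net, with multipath feature extraction blocks and bidirectional skip connections to improve segmentation accuracy.
  • results: Experiments indicate that LMBiS-Net segments retinal images quickly and accurately, and that it is robust and generalizable.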
    Abstract Blinding eye diseases are often correlated with altered retinal morphology, which can be clinically identified by segmenting retinal structures in fundus images. However, current methodologies often fall short in accurately segmenting delicate vessels. Although deep learning has shown promise in medical image segmentation, its reliance on repeated convolution and pooling operations can hinder the representation of edge information, ultimately limiting overall segmentation accuracy. In this paper, we propose a lightweight pixel-level CNN named LMBiS-Net for the segmentation of retinal vessels with an exceptionally low number of learnable parameters (only 0.172 M). The network uses multipath feature extraction blocks and incorporates bidirectional skip connections for the information flow between the encoder and decoder. Additionally, we have optimized the efficiency of the model by carefully selecting the number of filters to avoid filter overlap. This optimization significantly reduces training time and enhances computational efficiency. To assess the robustness and generalizability of LMBiS-Net, we performed comprehensive evaluations on various aspects of retinal images. Specifically, the model was subjected to rigorous tests to accurately segment retinal vessels, which play a vital role in ophthalmological diagnosis and treatment. By focusing on the retinal blood vessels, we were able to thoroughly analyze the performance and effectiveness of the LMBiS-Net model. The results of our tests demonstrate that LMBiS-Net is not only robust and generalizable but also capable of maintaining high levels of segmentation accuracy. These characteristics highlight the potential of LMBiS-Net as an efficient tool for high-speed and accurate segmentation of retinal images in various clinical applications.
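    Summary Blinding eye diseases are often associated with altered retinal morphology, which can be identified clinically by segmenting retinal structures in fundus images. Existing methods often fail to segment delicate vessels accurately. Deep learning is promising for medical image segmentation, but its reliance on repeated convolution and pooling can hinder the representation of edge information and limit overall accuracy. This paper proposes LMBiS-Net, a lightweight pixel-level CNN for retinal vessel segmentation. The network uses multipath feature extraction blocks and bidirectional skip connections for information flow between the encoder and decoder. The number of filters is chosen carefully to avoid filter overlap, which reduces training time and improves computational efficiency. Comprehensive evaluations on retinal vessel segmentation, a task central to ophthalmological diagnosis and treatment, indicate that LMBiS-Net is robust and generalizable while maintaining high segmentation accuracy, which suggests broad potential for fast and accurate retinal image segmentation in clinical applications.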

A multiple k-means cluster ensemble framework for clustering citation trajectories

  • paper_url: http://arxiv.org/abs/2309.04949
  • repo_url: None
  • paper_authors: Joyita Chakraborty, Dinesh K. Pradhan, Subrata Nandi
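  • for: This paper studies how to cluster citation trajectories, whose maturity times differ across articles, over short-term (10-year) and long-term (30-year) spans.
  • methods: It proposes a feature-based multiple k-means cluster ensemble framework and applies it to well-cited articles over both time spans.
  • results: Four distinct trajectory types emerge: Early Rise Rapid Decline, Early Rise Slow Decline, Delayed Rise No Decline, and Delayed Rise Slow Decline. Their growth and decay times, cumulative citation distributions, and peak characteristics are redefined empirically.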
    Abstract Citation maturity time varies for different articles. However, the impact of all articles is measured in a fixed window. Clustering their citation trajectories helps understand the knowledge diffusion process and reveals that not all articles gain immediate success after publication. Moreover, clustering trajectories is necessary for paper impact recommendation algorithms. It is a challenging problem because citation time series exhibit significant variability due to nonlinear and nonstationary characteristics. Prior works propose a set of arbitrary thresholds and a fixed rule-based approach. All methods are primarily parameter-dependent. Consequently, it leads to inconsistencies while defining similar trajectories and ambiguities regarding their specific number. Most studies only capture extreme trajectories. Thus, a generalised clustering framework is required. This paper proposes a feature-based multiple k-means cluster ensemble framework. 195,783 and 41,732 well-cited articles from the Microsoft Academic Graph data are considered for clustering short-term (10-year) and long-term (30-year) trajectories, respectively. It has linear run time. Four distinct trajectories are obtained: Early Rise Rapid Decline (2.2%), Early Rise Slow Decline (45%), Delayed Rise No Decline (53%), and Delayed Rise Slow Decline (0.8%). Individual trajectory differences for two different spans are studied. Most papers exhibit Early Rise Slow Decline and Delayed Rise No Decline patterns. The growth and decay times, cumulative citation distribution, and peak characteristics of individual trajectories are redefined empirically. A detailed comparative study reveals our proposed methodology can detect all distinct trajectory classes.

Distance-Restricted Folklore Weisfeiler-Leman GNNs with Provable Cycle Counting Power

  • paper_url: http://arxiv.org/abs/2309.04941
  • repo_url: None
  • paper_authors: Junru Zhou, Jiarui Feng, Xiyuan Wang, Muhan Zhang
  • for: The paper aims to improve the efficiency and expressive power of graph neural networks (GNNs) for counting certain graph substructures, especially cycles, which is crucial for achieving robust and generalizable performance on molecular tasks.
  • methods: The proposed method, $d$-Distance-Restricted FWL(2) GNNs, uses node pairs whose mutual distances are at most $d$ as the units for message passing to balance expressive power and complexity. This approach avoids the expensive subgraph extraction operations of subgraph GNNs, lowering both time and space complexity.
  • results: The paper shows theoretically that the discriminative power of $d$-DRFWL(2) GNNs strictly increases as $d$ increases. Moreover, the model has provably strong cycle counting power even with $d=2$: it can count all 3-, 4-, 5-, and 6-cycles. Experiments on both synthetic and molecular datasets verify the theory.
    Abstract The ability of graph neural networks (GNNs) to count certain graph substructures, especially cycles, is important for the success of GNNs on a wide range of tasks. It has been recently used as a popular metric for evaluating the expressive power of GNNs. Many of the proposed GNN models with provable cycle counting power are based on subgraph GNNs, i.e., extracting a bag of subgraphs from the input graph, generating representations for each subgraph, and using them to augment the representation of the input graph. However, those methods require heavy preprocessing, and suffer from high time and memory costs. In this paper, we overcome the aforementioned limitations of subgraph GNNs by proposing a novel class of GNNs -- $d$-Distance-Restricted FWL(2) GNNs, or $d$-DRFWL(2) GNNs. $d$-DRFWL(2) GNNs use node pairs whose mutual distances are at most $d$ as the units for message passing to balance the expressive power and complexity. By performing message passing among distance-restricted node pairs in the original graph, $d$-DRFWL(2) GNNs avoid the expensive subgraph extraction operations in subgraph GNNs, making both the time and space complexity lower. We theoretically show that the discriminative power of $d$-DRFWL(2) GNNs strictly increases as $d$ increases. More importantly, $d$-DRFWL(2) GNNs have provably strong cycle counting power even with $d=2$: they can count all 3, 4, 5, 6-cycles. Since 6-cycles (e.g., benzene rings) are ubiquitous in organic molecules, being able to detect and count them is crucial for achieving robust and generalizable performance on molecular tasks. Experiments on both synthetic datasets and molecular datasets verify our theory. To the best of our knowledge, our model is the most efficient GNN model to date (both theoretically and empirically) that can count up to 6-cycles.
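
To make the phrase "node pairs whose mutual distances are at most $d$" concrete, here is a minimal sketch that enumerates those pairs with a truncated breadth-first search. It is not the $d$-DRFWL(2) GNN itself: the message-passing update over the pairs is not shown. The function names and the adjacency-list representation are assumptions made for illustration, and the brute-force cycle counter only serves as a ground-truth check on tiny graphs.

```python
from collections import deque

def distance_restricted_pairs(adj, d):
    """Return {(u, v): dist(u, v)} for all ordered node pairs with dist(u, v) <= d."""
    pairs = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            if dist[u] == d:  # do not expand beyond distance d
                continue
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for v, k in dist.items():
            pairs[(src, v)] = k
    return pairs

def count_cycles(adj, length):
    """Brute-force count of simple cycles with `length` edges (ground truth for tiny graphs only)."""
    found = set()

    def extend(path):
        if len(path) == length:
            if path[0] in adj[path[-1]]:
                # Identify a cycle by its edge set so rotations and reversals count once.
                edges = zip(path, path[1:] + path[:1])
                found.add(frozenset(frozenset(e) for e in edges))
            return
        for v in adj[path[-1]]:
            if v not in path:
                extend(path + [v])

    for start in adj:
        extend([start])
    return len(found)

# A 6-cycle, i.e. the carbon skeleton of a benzene ring.
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
pairs = distance_restricted_pairs(hexagon, d=2)
```

On the hexagon, only 30 of the 36 ordered node pairs lie within distance 2, since each node's opposite vertex sits at distance 3. Restricting to these pairs is what keeps the number of message-passing units below the full set of node pairs.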