cs.AI - 2023-07-01

Minimizing Energy Consumption of Deep Learning Models by Energy-Aware Training

  • paper_url: http://arxiv.org/abs/2307.00368
  • repo_url: None
  • paper_authors: Dario Lazzaro, Antonio Emanuele Cinà, Maura Pintor, Ambra Demontis, Battista Biggio, Fabio Roli, Marcello Pelillo
  • for: Reducing the energy consumption of deep learning models
  • methods: A gradient-based training algorithm that adds a differentiable approximation of the $\ell_0$ norm as a sparsity penalty on the training loss, improving energy efficiency
  • results: Experiments on three datasets and two deep neural networks show that the energy-aware training algorithm EAT trains networks with a better trade-off between classification performance and energy efficiency.
    Abstract Deep learning models undergo a significant increase in the number of parameters they possess, leading to the execution of a larger number of operations during inference. This expansion significantly contributes to higher energy consumption and prediction latency. In this work, we propose EAT, a gradient-based algorithm that aims to reduce energy consumption during model training. To this end, we leverage a differentiable approximation of the $\ell_0$ norm, and use it as a sparse penalty over the training loss. Through our experimental analysis conducted on three datasets and two deep neural networks, we demonstrate that our energy-aware training algorithm EAT is able to train networks with a better trade-off between classification performance and energy efficiency.
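A minimal sketch of the core idea: a smoothed, differentiable stand-in for the $\ell_0$ count of nonzero weights, added to the ordinary task loss. The sigmoid surrogate, the temperature `beta`, and the weight `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def l0_surrogate(w, beta=5.0):
    # Smooth, differentiable stand-in for the l0 "norm": each weight
    # contributes ~1 when |w| is large and ~0 near zero.
    return (2.0 * (torch.sigmoid(beta * w.abs()) - 0.5)).sum()

def energy_aware_loss(model, task_loss, lam=1e-5):
    # Sparsity penalty over all parameters, added to the task loss.
    penalty = sum(l0_surrogate(p) for p in model.parameters())
    return task_loss + lam * penalty
```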

CephGPT-4: An Interactive Multimodal Cephalometric Measurement and Diagnostic System with Visual Large Language Model

  • paper_url: http://arxiv.org/abs/2307.07518
  • repo_url: None
  • paper_authors: Lei Ma, Jincong Han, Zhaoxin Wang, Dian Zhang
  • for: Developing a diagnostic language model based on multimodal cephalometric medical data to improve the accuracy and efficiency of orthodontic measurement and diagnosis.
  • methods: Constructs a multimodal orthodontic dataset of cephalometric images and doctor-patient dialogue, uses U-net to automatically analyze cephalometric landmarks and generate diagnostic reports, then fine-tunes Minigpt-4 and VisualGLM on the dataset and reports to improve diagnostic accuracy and reliability.
  • results: CephGPT-4 shows excellent diagnostic performance and holds great application potential, with the prospect of revolutionizing orthodontic measurement and diagnosis.
    Abstract Large-scale multimodal language models (LMMs) have achieved remarkable success in general domains. However, the exploration of diagnostic language models based on multimodal cephalometric medical data remains limited. In this paper, we propose a novel multimodal cephalometric analysis and diagnostic dialogue model. Firstly, a multimodal orthodontic medical dataset is constructed, comprising cephalometric images and doctor-patient dialogue data, with automatic analysis of cephalometric landmarks using U-net and generation of diagnostic reports. Then, the cephalometric dataset and generated diagnostic reports are separately fine-tuned on Minigpt-4 and VisualGLM. Results demonstrate that the CephGPT-4 model exhibits excellent performance and has the potential to revolutionize orthodontic measurement and diagnostic applications. These innovations hold revolutionary application potential in the field of orthodontics.

The future of human-centric eXplainable Artificial Intelligence (XAI) is not post-hoc explanations

  • paper_url: http://arxiv.org/abs/2307.00364
  • repo_url: None
  • paper_authors: Vinitra Swamy, Jibril Frej, Tanja Käser
  • for: A call to action to move beyond post-hoc explanation of black-box models toward designing interpretable neural network architectures, addressing the limitations of current explainers.
  • methods: Proposes two interpretable-by-design schemes: adaptive routing for interpretable conditional computation, and diagnostic benchmarks for iterative model learning.
  • results: Argues that the future of human-centric XAI lies not in explaining black boxes but in neural networks that are intrinsically interpretable.
    Abstract Explainable Artificial Intelligence (XAI) plays a crucial role in enabling human understanding and trust in deep learning systems, often defined as determining which features are most important to a model's prediction. As models get larger, more ubiquitous, and pervasive in aspects of daily life, explainability is necessary to avoid or minimize adverse effects of model mistakes. Unfortunately, current approaches in human-centric XAI (e.g. predictive tasks in healthcare, education, or personalized ads) tend to rely on a single explainer. This is a particularly concerning trend when considering that recent work has identified systematic disagreement in explainability methods when applied to the same points and underlying black-box models. In this paper, we therefore present a call for action to address the limitations of current state-of-the-art explainers. We propose to shift from post-hoc explainability to designing interpretable neural network architectures; moving away from approximation techniques in human-centric and high impact applications. We identify five needs of human-centric XAI (real-time, accurate, actionable, human-interpretable, and consistent) and propose two schemes for interpretable-by-design neural network workflows (adaptive routing for interpretable conditional computation and diagnostic benchmarks for iterative model learning). We postulate that the future of human-centric XAI is neither in explaining black-boxes nor in reverting to traditional, interpretable models, but in neural networks that are intrinsically interpretable.

A Comparative Study of Machine Learning Algorithms for Anomaly Detection in Industrial Environments: Performance and Environmental Impact

  • paper_url: http://arxiv.org/abs/2307.00361
  • repo_url: None
  • paper_authors: Álvaro Huertas-García, Carlos Martí-González, Rubén García Maezo, Alejandro Echeverría Rey
  • for: Developing sustainable artificial intelligence (AI) and machine learning (ML) models for anomaly detection that avoid high computational demands and the associated environmental impact.
  • methods: Meticulously evaluates a wide variety of machine learning algorithms and multiple Multilayer Perceptron (MLP) configurations, measuring both predictive performance and environmental footprint.
  • results: Traditional algorithms such as Decision Trees and Random Forests achieve robust performance and efficiency, while optimized MLP configurations deliver superior performance at the cost of increased resource consumption.
    Abstract In the context of Industry 4.0, the use of artificial intelligence (AI) and machine learning for anomaly detection is being hampered by high computational requirements and associated environmental effects. This study seeks to address the demands of high-performance machine learning models with environmental sustainability, contributing to the emerging discourse on 'Green AI.' An extensive variety of machine learning algorithms, coupled with various Multilayer Perceptron (MLP) configurations, were meticulously evaluated. Our investigation encapsulated a comprehensive suite of evaluation metrics, comprising Accuracy, Area Under the Curve (AUC), Recall, Precision, F1 Score, Kappa Statistic, Matthews Correlation Coefficient (MCC), and F1 Macro. Simultaneously, the environmental footprint of these models was gauged through considerations of time duration, CO2 equivalent, and energy consumption during the training, cross-validation, and inference phases. Traditional machine learning algorithms, such as Decision Trees and Random Forests, demonstrate robust efficiency and performance. However, superior outcomes were obtained with optimised MLP configurations, albeit with a commensurate increase in resource consumption. The study incorporated a multi-objective optimisation approach, invoking Pareto optimality principles, to highlight the trade-offs between a model's performance and its environmental impact. The insights derived underscore the imperative of striking a balance between model performance, complexity, and environmental implications, thus offering valuable directions for future work in the development of environmentally conscious machine learning models for industrial applications.
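The multi-objective angle is easy to make concrete: a model is Pareto-optimal if no other model is at least as accurate and at least as cheap, and strictly better on one axis. A small sketch with made-up (accuracy, energy) numbers, purely for illustration:

```python
def pareto_front(models):
    """Return models not dominated on (higher accuracy, lower energy)."""
    front = []
    for name, acc, energy in models:
        dominated = any(
            a >= acc and e <= energy and (a > acc or e < energy)
            for _, a, e in models
        )
        if not dominated:
            front.append((name, acc, energy))
    return front

# Hypothetical (accuracy, energy-in-kJ) measurements, not the paper's data.
candidates = [("DecisionTree", 0.91, 1.0), ("RandomForest", 0.93, 4.0),
              ("MLP-small", 0.92, 6.0), ("MLP-large", 0.95, 20.0)]
print(pareto_front(candidates))  # MLP-small is dominated by RandomForest
```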

When Synthetic Data Met Regulation

  • paper_url: http://arxiv.org/abs/2307.00359
  • repo_url: None
  • paper_authors: Georgi Ganev
  • for: Argues that synthetic data produced by differentially private generative models can be sufficiently anonymized and therefore treated as anonymous, regulation-compliant data.
  • methods: Analyzes synthetic data generation with differentially private generative models.
  • results: Concludes that synthetic data from differentially private generative models can provide sufficient privacy protection to meet regulatory standards.
    Abstract In this paper, we argue that synthetic data produced by Differentially Private generative models can be sufficiently anonymized and, therefore, anonymous data and regulatory compliant.

Variation-aware Vision Transformer Quantization

  • paper_url: http://arxiv.org/abs/2307.00331
  • repo_url: https://github.com/huangowen/vvtq
  • paper_authors: Xijie Huang, Zhiqiang Shen, Kwang-Ting Cheng
  • for: Improving the training and inference efficiency of Vision Transformers (ViTs) through quantization, reducing computation and memory footprint.
  • methods: Uses quantization-aware training (QAT) with knowledge distillation (KD) to address the variation problem in ViT quantization, together with a module-dependent quantization scheme and a variation-aware regularization term that stabilize training.
  • results: On ImageNet-1K, achieves 77.66% Top-1 accuracy with 2-bit Swin-T, outperforming the previous state-of-the-art quantized model by 3.35%.
    Abstract Despite the remarkable performance of Vision Transformers (ViTs) in various visual tasks, the expanding computation and model size of ViTs have increased the demand for improved efficiency during training and inference. To address the heavy computation and parameter drawbacks, quantization is frequently studied in the community as a representative model compression technique and has seen extensive use on CNNs. However, due to the unique properties of CNNs and ViTs, the quantization applications on ViTs are still limited and underexplored. In this paper, we identify the difficulty of ViT quantization on its unique variation behaviors, which differ from traditional CNN architectures. The variations indicate the magnitude of the parameter fluctuations and can also measure outlier conditions. Moreover, the variation behaviors reflect the various sensitivities to the quantization of each module. The quantization sensitivity analysis and comparison of ViTs with CNNs help us locate the underlying differences in variations. We also find that the variations in ViTs cause training oscillations, bringing instability during quantization-aware training (QAT). Correspondingly, we solve the variation problem with an efficient knowledge-distillation-based variation-aware quantization method. The multi-crop knowledge distillation scheme can accelerate and stabilize the training and alleviate the variation's influence during QAT. We also proposed a module-dependent quantization scheme and a variation-aware regularization term to suppress the oscillation of weights. On ImageNet-1K, we obtain a 77.66% Top-1 accuracy on the extremely low-bit scenario of 2-bit Swin-T, outperforming the previous state-of-the-art quantized model by 3.35%.
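A sketch of two ingredients the abstract names: uniform fake quantization with a straight-through estimator, and a regularizer that damps weight oscillation around quantization grid points. The exact form of the paper's variation-aware term is not given in this digest, so the penalty below is an assumption.

```python
import torch

def fake_quant(w, n_bits=2):
    # Uniform symmetric fake quantization with a straight-through estimator.
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    q = (w / scale).round().clamp(-qmax - 1, qmax) * scale
    return w + (q - w).detach()  # forward uses q, backward sees identity

def oscillation_penalty(w, n_bits=2):
    # Penalize latent weights sitting between quantization grid points, one
    # plausible oscillation dampener (assumption, not the paper's exact term).
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return ((w / scale) - (w / scale).round()).pow(2).mean()
```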

FedCP: Separating Feature Information for Personalized Federated Learning via Conditional Policy

  • paper_url: http://arxiv.org/abs/2307.01217
  • repo_url: https://github.com/tsingz0/fedcp
  • paper_authors: Jianqing Zhang, Yang Hua, Hao Wang, Tao Song, Zhengui Xue, Ruhui Ma, Haibing Guan
  • for: Tackles the fact that data, not just client-level model parameters, is the source of both global and personalized information in personalized federated learning (pFL).
  • methods: Proposes the Federated Conditional Policy (FedCP) method, which generates a conditional policy for each sample to separate the global and personalized information in its features, then processes them with a global head and a personalized head respectively; existing pFL methods instead mix the two kinds of information at the parameter level.
  • results: In extensive computer vision and natural language processing experiments, FedCP outperforms eleven state-of-the-art methods by up to 6.69%, and keeps its advantage when some clients drop out unexpectedly, as often happens on mobile devices. Code is available at https://github.com/TsingZ0/FedCP.
    Abstract Recently, personalized federated learning (pFL) has attracted increasing attention in privacy protection, collaborative learning, and tackling statistical heterogeneity among clients, e.g., hospitals, mobile smartphones, etc. Most existing pFL methods focus on exploiting the global information and personalized information in the client-level model parameters while neglecting that data is the source of these two kinds of information. To address this, we propose the Federated Conditional Policy (FedCP) method, which generates a conditional policy for each sample to separate the global information and personalized information in its features and then processes them by a global head and a personalized head, respectively. FedCP is more fine-grained to consider personalization in a sample-specific manner than existing pFL methods. Extensive experiments in computer vision and natural language processing domains show that FedCP outperforms eleven state-of-the-art methods by up to 6.69%. Furthermore, FedCP maintains its superiority when some clients accidentally drop out, which frequently happens in mobile settings. Our code is public at https://github.com/TsingZ0/FedCP.
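A minimal sketch of the sample-wise separation idea: a learned gate splits each feature vector into a "global" part and a "personalized" part, routed to two heads. The layer sizes and the sigmoid gating form are assumptions; see the linked repo for the actual design.

```python
import torch
import torch.nn as nn

class ConditionalPolicyNet(nn.Module):
    """Sketch of FedCP-style per-sample feature separation (assumed design)."""
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.policy = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.Sigmoid())
        self.global_head = nn.Linear(feat_dim, num_classes)    # shared across clients
        self.personal_head = nn.Linear(feat_dim, num_classes)  # kept per client

    def forward(self, feat):
        gate = self.policy(feat)                  # per-sample conditional policy
        g_info, p_info = gate * feat, (1 - gate) * feat
        return self.global_head(g_info) + self.personal_head(p_info)
```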

DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment

  • paper_url: http://arxiv.org/abs/2307.00329
  • repo_url: None
  • paper_authors: Yanjiang Guo, Yen-Jen Wang, Lihan Zha, Zheyuan Jiang, Jianyu Chen
  • for: This paper aims to ground language models in robotic tasks so that the generated sequences are both logically correct and practically executable, and to recover from misalignments between plan and execution.
  • methods: The proposed method, DoReMi, leverages large language models (LLMs) for both planning and generating constraints for planned steps, and uses a vision question answering (VQA) model to check constraints during low-level skill execution. If a misalignment occurs, the method calls the language model to re-plan in order to recover.
  • results: Experiments on various complex tasks, including robot arms and humanoid robots, demonstrate that DoReMi leads to higher task success rates and shorter task completion times. Videos are available at https://sites.google.com/view/doremi-paper.
    Abstract Large language models encode a vast amount of semantic knowledge and possess remarkable understanding and reasoning capabilities. Previous research has explored how to ground language models in robotic tasks to ensure that the sequences generated by the language model are both logically correct and practically executable. However, low-level execution may deviate from the high-level plan due to environmental perturbations or imperfect controller design. In this paper, we propose DoReMi, a novel language model grounding framework that enables immediate Detection and Recovery from Misalignments between plan and execution. Specifically, LLMs are leveraged for both planning and generating constraints for planned steps. These constraints can indicate plan-execution misalignments and we use a vision question answering (VQA) model to check constraints during low-level skill execution. If certain misalignment occurs, our method will call the language model to re-plan in order to recover from misalignments. Experiments on various complex tasks including robot arms and humanoid robots demonstrate that our method can lead to higher task success rates and shorter task completion times. Videos of DoReMi are available at https://sites.google.com/view/doremi-paper.
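The detect-and-recover loop is easy to express as a sketch. All four callables below (planner, constraint generator, VQA checker, low-level executor) are caller-supplied stand-ins, since the digest does not specify their interfaces.

```python
def doremi_loop(llm_plan, llm_constraints, vqa_check, execute_step,
                task, max_replans=3):
    """Sketch of DoReMi's detect-and-recover loop (interfaces assumed)."""
    plan = llm_plan(task)
    for _ in range(max_replans + 1):
        for step in plan:
            constraints = llm_constraints(step)   # e.g. "the drawer is open"
            execute_step(step)
            violated = [c for c in constraints if not vqa_check(c)]
            if violated:                          # plan-execution misalignment
                plan = llm_plan(task, feedback=violated)  # re-plan to recover
                break
        else:
            return True                           # all steps passed their checks
    return False
```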

SHARCS: Shared Concept Space for Explainable Multimodal Learning

  • paper_url: http://arxiv.org/abs/2307.00316
  • repo_url: https://github.com/gabriele-dominici/SHARCS
  • paper_authors: Gabriele Dominici, Pietro Barbiero, Lucie Charlotte Magister, Pietro Liò, Nikola Simidjievski
  • for: Proposes an explainable multimodal learning approach for complex real-world problems where individual data modalities are insufficient on their own.
  • methods: SHARCS (SHARed Concept Space) learns and maps interpretable concepts from heterogeneous modalities into a single unified concept manifold, yielding an intuitive projection of semantically similar cross-modal concepts.
  • results: Experiments show that SHARCS produces inherently explainable task predictions while also improving downstream predictive performance, and that it operates well in practically significant scenarios such as retrieval of missing modalities and cross-modal explanations.
    Abstract Multimodal learning is an essential paradigm for addressing complex real-world problems, where individual data modalities are typically insufficient to accurately solve a given modelling task. While various deep learning approaches have successfully addressed these challenges, their reasoning process is often opaque; limiting the capabilities for a principled explainable cross-modal analysis and any domain-expert intervention. In this paper, we introduce SHARCS (SHARed Concept Space) -- a novel concept-based approach for explainable multimodal learning. SHARCS learns and maps interpretable concepts from different heterogeneous modalities into a single unified concept-manifold, which leads to an intuitive projection of semantically similar cross-modal concepts. We demonstrate that such an approach can lead to inherently explainable task predictions while also improving downstream predictive performance. Moreover, we show that SHARCS can operate and significantly outperform other approaches in practically significant scenarios, such as retrieval of missing modalities and cross-modal explanations. Our approach is model-agnostic and easily applicable to different types (and number) of modalities, thus advancing the development of effective, interpretable, and trustworthy multimodal approaches.

Gradients Look Alike: Sensitivity is Often Overestimated in DP-SGD

  • paper_url: http://arxiv.org/abs/2307.00310
  • repo_url: None
  • paper_authors: Anvith Thudi, Hengrui Jia, Casey Meehan, Ilia Shumailov, Nicolas Papernot
  • for: Studies the privacy analysis of the differentially private stochastic gradient descent (DP-SGD) algorithm.
  • methods: Modifies the per-step privacy analysis of DP-SGD to account for the distribution of model updates computed from the training dataset, together with a new composition theorem for reasoning about an entire training run.
  • results: The new analysis formally shows that DP-SGD leaks significantly less privacy for many datapoints; in particular, correctly classified datapoints obtain better privacy guarantees than misclassified ones.
    Abstract Differentially private stochastic gradient descent (DP-SGD) is the canonical algorithm for private deep learning. While it is known that its privacy analysis is tight in the worst-case, several empirical results suggest that when training on common benchmark datasets, the models obtained leak significantly less privacy for many datapoints. In this paper, we develop a new analysis for DP-SGD that captures the intuition that points with similar neighbors in the dataset enjoy better privacy than outliers. Formally, this is done by modifying the per-step privacy analysis of DP-SGD to introduce a dependence on the distribution of model updates computed from a training dataset. We further develop a new composition theorem to effectively use this new per-step analysis to reason about an entire training run. Put all together, our evaluation shows that this novel DP-SGD analysis allows us to now formally show that DP-SGD leaks significantly less privacy for many datapoints. In particular, we observe that correctly classified points obtain better privacy guarantees than misclassified points.
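For reference, one step of the standard DP-SGD procedure whose analysis the paper tightens; the algorithm itself is unchanged. A minimal sketch following the usual clip-then-noise recipe:

```python
import torch

def dp_sgd_step(params, per_sample_grads, lr=0.1, clip_norm=1.0, noise_mult=1.0):
    """One standard DP-SGD step: clip per-example gradients, add Gaussian noise.
    per_sample_grads has shape (batch, *params.shape)."""
    batch = per_sample_grads.shape[0]
    norms = per_sample_grads.flatten(1).norm(dim=1, keepdim=True).clamp(min=1e-12)
    scale = (clip_norm / norms).clamp(max=1.0)
    clipped = per_sample_grads * scale.view(batch, *([1] * (per_sample_grads.dim() - 1)))
    noisy = clipped.sum(0) + torch.randn_like(params) * noise_mult * clip_norm
    params.data.add_(noisy, alpha=-lr / batch)
```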

SyMFM6D: Symmetry-aware Multi-directional Fusion for Multi-View 6D Object Pose Estimation

  • paper_url: http://arxiv.org/abs/2307.00306
  • repo_url: https://github.com/boschresearch/symfm6d
  • paper_authors: Fabian Duffhauss, Sebastian Koch, Hanna Ziesche, Ngo Anh Vien, Gerhard Neumann
  • for: automated systems to interact safely with the environment
  • methods: multi-view 6D pose estimator called SyMFM6D, deep multi-directional fusion network, least-squares fitting
  • results: significantly outperforms the state-of-the-art in both single-view and multi-view 6D pose estimation; robust towards inaccurate camera calibration and dynamic camera setups
    Abstract Detecting objects and estimating their 6D poses is essential for automated systems to interact safely with the environment. Most 6D pose estimators, however, rely on a single camera frame and suffer from occlusions and ambiguities due to object symmetries. We overcome this issue by presenting a novel symmetry-aware multi-view 6D pose estimator called SyMFM6D. Our approach efficiently fuses the RGB-D frames from multiple perspectives in a deep multi-directional fusion network and predicts predefined keypoints for all objects in the scene simultaneously. Based on the keypoints and an instance semantic segmentation, we efficiently compute the 6D poses by least-squares fitting. To address the ambiguity issues for symmetric objects, we propose a novel training procedure for symmetry-aware keypoint detection including a new objective function. Our SyMFM6D network significantly outperforms the state-of-the-art in both single-view and multi-view 6D pose estimation. We furthermore show the effectiveness of our symmetry-aware training procedure and demonstrate that our approach is robust towards inaccurate camera calibration and dynamic camera setups.
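The final least-squares step the abstract mentions is the classic Kabsch fit from predicted keypoints to the object's canonical keypoints. A self-contained sketch, assuming the keypoints arrive as N x 3 NumPy arrays:

```python
import numpy as np

def fit_pose(model_kps, pred_kps):
    """Least-squares rigid fit (Kabsch) so that R @ model + t ~= pred."""
    mu_m, mu_p = model_kps.mean(0), pred_kps.mean(0)
    H = (model_kps - mu_m).T @ (pred_kps - mu_p)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1, 1, d]) @ U.T
    t = mu_p - R @ mu_m
    return R, t
```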

SysNoise: Exploring and Benchmarking Training-Deployment System Inconsistency

  • paper_url: http://arxiv.org/abs/2307.00280
  • repo_url: None
  • paper_authors: Yan Wang, Yuhang Li, Ruihao Gong, Aishan Liu, Yanfei Wang, Jian Hu, Yongqiang Yao, Yunchen Zhang, Tianzi Xiao, Fengwei Yu, Xianglong Liu
  • for: Investigates the robustness of deep learning models to noise introduced by differences between system implementations (SysNoise).
  • methods: Introduces SysNoise for the first time and classifies it into three categories by inference stage, then builds a holistic benchmark quantifying its impact on 20+ models across image classification, object detection, instance segmentation, and natural language processing tasks.
  • results: Extensive experiments show that SysNoise affects model robustness across tasks, and that common mitigations such as data augmentation and adversarial training have limited effect on it.
    Abstract Extensive studies have shown that deep learning models are vulnerable to adversarial and natural noises, yet little is known about model robustness on noises caused by different system implementations. In this paper, we for the first time introduce SysNoise, a frequently occurred but often overlooked noise in the deep learning training-deployment cycle. In particular, SysNoise happens when the source training system switches to a disparate target system in deployments, where various tiny system mismatch adds up to a non-negligible difference. We first identify and classify SysNoise into three categories based on the inference stage; we then build a holistic benchmark to quantitatively measure the impact of SysNoise on 20+ models, comprehending image classification, object detection, instance segmentation and natural language processing tasks. Our extensive experiments revealed that SysNoise could bring certain impacts on model robustness across different tasks and common mitigations like data augmentation and adversarial training show limited effects on it. Together, our findings open a new research topic and we hope this work will raise research attention to deep learning deployment systems accounting for model performance. We have open-sourced the benchmark and framework at https://modeltc.github.io/systemnoise_web.
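One concrete flavor of train-deploy mismatch (an illustrative guess at the kind of noise such a benchmark covers, not a confirmed item from the paper) is image resizing: two common backends implement bilinear interpolation slightly differently, so the deployed model sees different pixels than the trained one did.

```python
# PIL and OpenCV both do "bilinear" resizing, yet disagree at the pixel level.
import numpy as np
from PIL import Image
import cv2

img = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)
a = np.asarray(Image.fromarray(img).resize((32, 32), Image.BILINEAR),
               dtype=np.float32)
b = cv2.resize(img, (32, 32), interpolation=cv2.INTER_LINEAR).astype(np.float32)
print("max abs difference:", np.abs(a - b).max())  # typically nonzero
```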

Causing is Achieving – A solution to the problem of causation

  • paper_url: http://arxiv.org/abs/2307.07517
  • repo_url: None
  • paper_authors: Riichiro Mizoguchi
  • for: Understanding and modeling causation from the standpoint of applied ontology, on the premise that causation is real.
  • methods: Understands causation via the notion of systemic function, decomposing any cause into four subfunctions: Achieves, Prevents, Allows, and Disallows.
  • results: Shows that the latter three subfunctions can be defined in terms of Achieves alone, so the essence of causation lies in the single function Achieves; the paper then elucidates the nature of Achieves, which earlier work had only partially elaborated.
    Abstract From the standpoint of applied ontology, the problem of understanding and modeling causation has been recently challenged on the premise that causation is real. As a consequence, the following three results were obtained: (1) causation can be understood via the notion of systemic function; (2) any cause can be decomposed using only four subfunctions, namely Achieves, Prevents, Allows, and Disallows; and (3) the last three subfunctions can be defined in terms of Achieves alone. It follows that the essence of causation lies in a single function, namely Achieves. It remains to elucidate the nature of the Achieves function, which has been elaborated only partially in the previous work. In this paper, we first discuss a couple of underlying policies in the above-mentioned causal theory since these are useful in the discussion, then summarize the results obtained in the former paper, and finally reveal the nature of Achieves giving a complete solution to the problem of what causation is.

Finding differences in perspectives between designers and engineers to develop trustworthy AI for autonomous cars

  • paper_url: http://arxiv.org/abs/2307.03193
  • repo_url: None
  • paper_authors: Gustav Jonelid, K. R. Larsson
  • for: Examines how designers and engineers differ in their perspectives on developing trustworthy AI for autonomous cars, and how to reconcile trustworthiness with ethical principles.
  • methods: Uses a literature review, case studies, and problem exploration to identify the key factors behind the differing viewpoints and to propose strategies for bridging the gaps.
  • results: Identifies three pillars of trustworthy AI (transparency, reliability, and safety) and offers practical recommendations for balancing technological advancement with ethical principles.
    Abstract In the context of designing and implementing ethical Artificial Intelligence (AI), varying perspectives exist regarding developing trustworthy AI for autonomous cars. This study sheds light on the differences in perspectives and provides recommendations to minimize such divergences. By exploring the diverse viewpoints, we identify key factors contributing to the differences and propose strategies to bridge the gaps. This study goes beyond the trolley problem to visualize the complex challenges of trustworthy and ethical AI. Three pillars of trustworthy AI have been defined: transparency, reliability, and safety. This research contributes to the field of trustworthy AI for autonomous cars, providing practical recommendations to enhance the development of AI systems that prioritize both technological advancement and ethical principles.

Hierarchical Pretraining for Biomedical Term Embeddings

  • paper_url: http://arxiv.org/abs/2307.00266
  • repo_url: None
  • paper_authors: Bryan Cai, Sihang Zeng, Yucong Lin, Zheng Yuan, Doudou Zhou, Lu Tian
  • for: Uses natural language processing (NLP) to turn the diagnosis and treatment information in electronic health records (EHR) into numeric features for applications such as clinical decision making and patient trajectory prediction.
  • methods: Uses representation learning to map biomedical terms to semantic embeddings that serve as input features for predictive models, fine-tuning pretrained language models with biomedical knowledge graphs; HiPrBERT additionally trains on data containing hierarchical structures for biomedical terms.
  • results: By modifying a contrastive loss function to extract information from the hierarchies, the model learns pairwise distances between terms that reflect the hierarchy, yielding substantially more informative embeddings for further biomedical applications.
    Abstract Electronic health records (EHR) contain narrative notes that provide extensive details on the medical condition and management of patients. Natural language processing (NLP) of clinical notes can use observed frequencies of clinical terms as predictive features for downstream applications such as clinical decision making and patient trajectory prediction. However, due to the vast number of highly similar and related clinical concepts, a more effective modeling strategy is to represent clinical terms as semantic embeddings via representation learning and use the low dimensional embeddings as feature vectors for predictive modeling. To achieve efficient representation, fine-tuning pretrained language models with biomedical knowledge graphs may generate better embeddings for biomedical terms than those from standard language models alone. These embeddings can effectively discriminate synonymous pairs of from those that are unrelated. However, they often fail to capture different degrees of similarity or relatedness for concepts that are hierarchical in nature. To overcome this limitation, we propose HiPrBERT, a novel biomedical term representation model trained on additionally complied data that contains hierarchical structures for various biomedical terms. We modify an existing contrastive loss function to extract information from these hierarchies. Our numerical experiments demonstrate that HiPrBERT effectively learns the pair-wise distance from hierarchical information, resulting in a substantially more informative embeddings for further biomedical applications
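A sketch of how a contrastive objective can consume hierarchy: make the target embedding distance between two terms grow with their distance in the tree. The linear margin schedule and cosine distance are assumptions; the paper's exact modified loss is not reproduced in this digest.

```python
import torch
import torch.nn.functional as F

def hierarchy_distance_loss(emb_a, emb_b, tree_dist, step=0.2):
    # emb_a, emb_b: (B, d) embeddings of term pairs; tree_dist: (B,) float
    # hop distance in the hierarchy. Pairs farther apart in the tree are
    # pushed toward a larger embedding distance.
    d = 1.0 - F.cosine_similarity(emb_a, emb_b)    # in [0, 2]
    target = (step * tree_dist).clamp(max=2.0)
    return ((d - target) ** 2).mean()
```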

InstructEval: Systematic Evaluation of Instruction Selection Methods

  • paper_url: http://arxiv.org/abs/2307.00259
  • repo_url: None
  • paper_authors: Anirudh Ajith, Chris Pan, Mengzhou Xia, Ameet Deshpande, Karthik Narasimhan
  • for: A systematic evaluation of instruction selection algorithms for in-context learning (ICL).
  • methods: Builds InstructEval, a suite covering 13 open-sourced large language models (LLMs) of varying scales from four model families and nine tasks across three categories; seven popular instruction selection methods are assessed on five ICL-relevant metrics.
  • results: Curated manually-written instructions, or simple instructions without any task-specific descriptions, often elicit better overall ICL performance than automatic instruction-induction methods, pointing to a lack of generalizability in the latter.
    Abstract In-context learning (ICL) performs tasks by prompting a large language model (LLM) using an instruction and a small set of annotated examples called demonstrations. Recent work has shown that precise details of the inputs used in the ICL prompt significantly impact performance, which has incentivized instruction selection algorithms. The effect of instruction-choice however is severely underexplored, with existing analyses restricted to shallow subsets of models and tasks, limiting the generalizability of their insights. We develop InstructEval, an ICL evaluation suite to conduct a thorough assessment of these techniques. The suite includes 13 open-sourced LLMs of varying scales from four model families, and covers nine tasks across three categories. Using the suite, we evaluate the relative performance of seven popular instruction selection methods over five metrics relevant to ICL. Our experiments reveal that using curated manually-written instructions or simple instructions without any task-specific descriptions often elicits superior ICL performance overall than that of automatic instruction-induction methods, pointing to a lack of generalizability among the latter. We release our evaluation suite for benchmarking instruction selection approaches and enabling more generalizable methods in this space.

Efficient Subclass Segmentation in Medical Images

  • paper_url: http://arxiv.org/abs/2307.00257
  • repo_url: https://github.com/ovo1111/efficientsubclasslearning
  • paper_authors: Linrui Dai, Wenhui Lei, Xiaofan Zhang
  • for: Reducing annotation cost in medical image analysis by annotating with coarse-grained superclass labels, complemented by limited fine-grained subclass annotations.
  • methods: Leverages the hierarchical structure of categories to design the network architecture, with a task-driven data generation method that makes it easier for the network to recognize the different subclass categories.
  • results: Experiments on the BraTS2021 and ACDC datasets show accuracy comparable to a model trained with full subclass annotations, using only limited subclass labels and sufficient superclass labels.
    Abstract As research interests in medical image analysis become increasingly fine-grained, the cost for extensive annotation also rises. One feasible way to reduce the cost is to annotate with coarse-grained superclass labels while using limited fine-grained annotations as a complement. In this way, fine-grained data learning is assisted by ample coarse annotations. Recent studies in classification tasks have adopted this method to achieve satisfactory results. However, there is a lack of research on efficient learning of fine-grained subclasses in semantic segmentation tasks. In this paper, we propose a novel approach that leverages the hierarchical structure of categories to design network architecture. Meanwhile, a task-driven data generation method is presented to make it easier for the network to recognize different subclass categories. Specifically, we introduce a Prior Concatenation module that enhances confidence in subclass segmentation by concatenating predicted logits from the superclass classifier, a Separate Normalization module that stretches the intra-class distance within the same superclass to facilitate subclass segmentation, and a HierarchicalMix model that generates high-quality pseudo labels for unlabeled samples by fusing only similar superclass regions from labeled and unlabeled images. Our experiments on the BraTS2021 and ACDC datasets demonstrate that our approach achieves comparable accuracy to a model trained with full subclass annotations, with limited subclass annotations and sufficient superclass annotations. Our approach offers a promising solution for efficient fine-grained subclass segmentation in medical images. Our code is publicly available here.
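A minimal sketch of the Prior Concatenation idea named in the abstract: condition the subclass head on the superclass classifier's logits by channel concatenation. Channel sizes and the 1x1-convolution heads are made up for illustration.

```python
import torch
import torch.nn as nn

class PriorConcat(nn.Module):
    """Subclass head conditioned on superclass logits (assumed sizes)."""
    def __init__(self, feat_ch=64, n_super=3, n_sub=8):
        super().__init__()
        self.super_head = nn.Conv2d(feat_ch, n_super, kernel_size=1)
        self.sub_head = nn.Conv2d(feat_ch + n_super, n_sub, kernel_size=1)

    def forward(self, feat):
        super_logits = self.super_head(feat)
        # Concatenate superclass predictions as a prior for the subclass head.
        sub_logits = self.sub_head(torch.cat([feat, super_logits], dim=1))
        return super_logits, sub_logits
```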

An ML approach to resolution of singularities

  • paper_url: http://arxiv.org/abs/2307.00252
  • repo_url: None
  • paper_authors: Gergely Bérczi, Honglu Fan, Mingcong Zeng
  • for: Resolving singular points in the solution sets of systems of polynomial equations.
  • methods: Trains reinforcement learning agents to find optimal resolutions of singularities by playing the Hironaka game.
  • results: In certain domains, the trained model outperforms state-of-the-art selection heuristics in the total number of polynomial additions performed, a proof of concept that recent machine learning developments can improve the performance of symbolic computation algorithms.
    Abstract The solution set of a system of polynomial equations typically contains ill-behaved, singular points. Resolution is a fundamental process in geometry in which we replace singular points with smooth points, while keeping the rest of the solution set unchanged. Resolutions are not unique: the usual way to describe them involves repeatedly performing a fundamental operation known as "blowing-up", and the complexity of the resolution highly depends on certain choices. The process can be translated into various versions of a 2-player game, the so-called Hironaka game, and a winning strategy for the first player provides a solution to the resolution problem. In this paper we introduce a new approach to the Hironaka game that uses reinforcement learning agents to find optimal resolutions of singularities. In certain domains, the trained model outperforms state-of-the-art selection heuristics in total number of polynomial additions performed, which provides a proof-of-concept that recent developments in machine learning have the potential to improve performance of algorithms in symbolic computation.

THUIR2 at NTCIR-16 Session Search (SS) Task

  • paper_url: http://arxiv.org/abs/2307.00250
  • repo_url: None
  • paper_authors: Weihang Su, Xiangsheng Li, Yiqun Liu, Min Zhang, Shaoping Ma
  • for: This paper describes our approaches and results in the FOSS and POSS subtasks of the NTCIR-16 Session Search (SS) task.
  • methods: We submit runs built from learning-to-rank and fine-tuned pretrained language models: the language models are fine-tuned with ad-hoc data and session information and then assembled by a learning-to-rank method.
  • results: The assembled model achieves the best performance among all participants in the preliminary evaluation of the FOSS subtask, and likewise the best performance in the POSS subtask.
    Abstract Our team (THUIR2) participated in both the FOSS and POSS subtasks of the NTCIR-16 Session Search (SS) Task. This paper describes our approaches and results. In the FOSS subtask, we submit five runs using learning-to-rank and fine-tuned pre-trained language models. We fine-tuned the pre-trained language model with ad-hoc data and session information and assembled them by a learning-to-rank method. The assembled model achieves the best performance among all participants in the preliminary evaluation. In the POSS subtask, we used an assembled model which also achieves the best performance in the preliminary evaluation.

VesselMorph: Domain-Generalized Retinal Vessel Segmentation via Shape-Aware Representation

  • paper_url: http://arxiv.org/abs/2307.00240
  • repo_url: None
  • paper_authors: Dewei Hu, Hao Li, Han Liu, Xing Yao, Jiacheng Wang, Ipek Oguz
  • for: Tackles domain shift in retinal vessel segmentation to improve the generalizability of deep learning models in medical image processing.
  • methods: Proposes VesselMorph, which exploits the domain-invariant tubular shape of vessels: inspired by the Frangi filter and the diffusion tensor imaging literature, it introduces a Hessian-based bipolar tensor field to describe vessel morphology, maps the intensity image and the tensor field to a latent space for feature extraction, fuses the two latent representations with a weight-balancing trick, and passes the result to a segmentation network.
  • results: Achieves superior generalization performance compared with competing methods on six public datasets of fundus and OCT angiography images.
    Abstract Due to the absence of a single standardized imaging protocol, domain shift between data acquired from different sites is an inherent property of medical images and has become a major obstacle for large-scale deployment of learning-based algorithms. For retinal vessel images, domain shift usually presents as the variation of intensity, contrast and resolution, while the basic tubular shape of vessels remains unaffected. Thus, taking advantage of such domain-invariant morphological features can greatly improve the generalizability of deep models. In this study, we propose a method named VesselMorph which generalizes the 2D retinal vessel segmentation task by synthesizing a shape-aware representation. Inspired by the traditional Frangi filter and the diffusion tensor imaging literature, we introduce a Hessian-based bipolar tensor field to depict the morphology of the vessels so that the shape information is taken into account. We map the intensity image and the tensor field to a latent space for feature extraction. Then we fuse the two latent representations via a weight-balancing trick and feed the result to a segmentation network. We evaluate on six public datasets of fundus and OCT angiography images from diverse patient populations. VesselMorph achieves superior generalization performance compared with competing methods in different domain shift scenarios.
    摘要 Inspired by the traditional Frangi filter and the diffusion tensor imaging literature, we introduce a Hessian-based bipolar tensor field to depict the morphology of the vessels, thereby incorporating shape information. We map the intensity image and the tensor field to a latent space for feature extraction. Then, we fuse the two latent representations via a weight-balancing trick and feed the result to a segmentation network.We evaluate VesselMorph on six public datasets of fundus and OCT angiography images from diverse patient populations. Compared with competing methods, VesselMorph achieves superior generalization performance in different domain shift scenarios.

Forward-Forward Algorithm for Hyperspectral Image Classification: A Preliminary Study

  • paper_url: http://arxiv.org/abs/2307.00231
  • repo_url: None
  • paper_authors: Sidike Paheding, Abel A. Reyes-Angulo
  • for: A preliminary study of the forward-forward algorithm (FFA) for optimizing the parameters of neural networks in hyperspectral image classification.
  • methods: Compares FFA against the traditional back-propagation algorithm on hyperspectral image classification.
  • results: Preliminary experimental results show the potential of FFA and its promise relative to back-propagation.
    Abstract The back-propagation algorithm has long been the de-facto standard in optimizing weights and biases in neural networks, particularly in cutting-edge deep learning models. Its widespread adoption in fields like natural language processing, computer vision, and remote sensing has revolutionized automation in various tasks. The popularity of back-propagation stems from its ability to achieve outstanding performance in tasks such as classification, detection, and segmentation. Nevertheless, back-propagation is not without its limitations, encompassing sensitivity to initial conditions, vanishing gradients, overfitting, and computational complexity. The recent introduction of a forward-forward algorithm (FFA), which computes local goodness functions to optimize network parameters, alleviates the dependence on substantial computational resources and the constant need for architectural scaling. This study investigates the application of FFA for hyperspectral image classification. Experimental results and comparative analysis are provided with the use of the traditional back-propagation algorithm. Preliminary results show the potential behind FFA and its promises.
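For readers unfamiliar with the forward-forward algorithm, each layer is trained with a purely local objective on its own activations, with no backward pass through the rest of the network. A sketch following Hinton's formulation (goodness = sum of squared activations, threshold `theta`); the paper's hyperspectral-specific setup is not reproduced here.

```python
import torch
import torch.nn.functional as F

def goodness(h):
    # Forward-forward "goodness": sum of squared activations per example.
    return h.pow(2).sum(dim=1)

def ff_layer_loss(layer, x_pos, x_neg, theta=2.0):
    """Local loss for one layer: push goodness of positive (real) data above
    theta and goodness of negative data below it."""
    g_pos = goodness(F.relu(layer(x_pos)))
    g_neg = goodness(F.relu(layer(x_neg)))
    return (F.softplus(theta - g_pos) + F.softplus(g_neg - theta)).mean()
```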

Image Matters: A New Dataset and Empirical Study for Multimodal Hyperbole Detection

  • paper_url: http://arxiv.org/abs/2307.00209
  • repo_url: None
  • paper_authors: Huixuan Zhang, Xiaojun Wan
  • for: Studies the problem of detecting multimodal hyperbole (exaggeration).
  • methods: Creates a multimodal detection dataset (to be released) from Weibo, a Chinese social media platform, treating the text and image of a post as two modalities; different pretrained multimodal encoders are also evaluated on this downstream task.
  • results: Evaluates cross-domain performance on data drawn from five different topics, showing how different models behave across topics; these studies serve as a benchmark and point out directions for further research on multimodal hyperbole detection.
    Abstract Hyperbole, or exaggeration, is a common linguistic phenomenon. The detection of hyperbole is an important part of understanding human expression. There have been several studies on hyperbole detection, but most of which focus on text modality only. However, with the development of social media, people can create hyperbolic expressions with various modalities, including text, images, videos, etc. In this paper, we focus on multimodal hyperbole detection. We create a multimodal detection dataset\footnote{The dataset will be released to the community.} from Weibo (a Chinese social media) and carry out some studies on it. We treat the text and image from a piece of weibo as two modalities and explore the role of text and image for hyperbole detection. Different pre-trained multimodal encoders are also evaluated on this downstream task to show their performance. Besides, since this dataset is constructed from five different topics, we also evaluate the cross-domain performance of different models. These studies can serve as a benchmark and point out the direction of further study on multimodal hyperbole detection.

General Part Assembly Planning

  • paper_url: http://arxiv.org/abs/2307.00206
  • repo_url: https://github.com/daria-kafler/GA-SEI56-Project-02
  • paper_authors: Yulong Li, Andy Zeng, Shuran Song
  • for: investigate general part assembly, creating novel target assemblies with unseen part shapes
  • methods: transformer based model architecture, accurately predicts part poses by inferring how each part shape corresponds to the target shape
  • results: generalization abilities to novel and diverse target and part shapes, demonstrated through experiments on 3D CAD models and real-world scans.
    Abstract Most successes in autonomous robotic assembly have been restricted to single target or category. We propose to investigate general part assembly, the task of creating novel target assemblies with unseen part shapes. To tackle the planning of general part assembly, we present General Part Assembly Transformer (GPAT), a transformer based model architecture that accurately predicts part poses by inferring how each part shape corresponds to the target shape. Our experiments on both 3D CAD models and real-world scans demonstrate GPAT's generalization abilities to novel and diverse target and part shapes. Project website: https://general-part-assembly.github.io/

  • paper_url: http://arxiv.org/abs/2307.01214
  • repo_url: None
  • paper_authors: Rui Song, Fausto Giunchiglia, Yingji Li, Hao Xu
  • for: Improving the robustness and generalization of text classification models by mitigating shortcut learning.
  • methods: Proposes a new word-group mining approach that captures the causal effect of any keyword combination and ranks the combinations that most affect the prediction, built on effective post-hoc analysis and beam search; a counterfactual augmentation method over multiple word groups, with an adaptive voting mechanism, forces the model to attend to effective causal features.
  • results: Experiments on several tasks over 8 affective review datasets and 4 toxic language datasets, including cross-domain text classification, text attack, and gender fairness tests, demonstrate that the method improves generalization and robustness while avoiding shortcut learning.
    Abstract Despite large-scale pre-trained language models have achieved striking results for text classificaion, recent work has raised concerns about the challenge of shortcut learning. In general, a keyword is regarded as a shortcut if it creates a superficial association with the label, resulting in a false prediction. Conversely, shortcut learning can be mitigated if the model relies on robust causal features that help produce sound predictions. To this end, many studies have explored post-hoc interpretable methods to mine shortcuts and causal features for robustness and generalization. However, most existing methods focus only on single word in a sentence and lack consideration of word-group, leading to wrong causal features. To solve this problem, we propose a new Word-Group mining approach, which captures the causal effect of any keyword combination and orders the combinations that most affect the prediction. Our approach bases on effective post-hoc analysis and beam search, which ensures the mining effect and reduces the complexity. Then, we build a counterfactual augmentation method based on the multiple word-groups, and use an adaptive voting mechanism to learn the influence of different augmentated samples on the prediction results, so as to force the model to pay attention to effective causal features. We demonstrate the effectiveness of the proposed method by several tasks on 8 affective review datasets and 4 toxic language datasets, including cross-domain text classificaion, text attack and gender fairness test.
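A sketch of the beam-search skeleton such word-group mining implies: grow keyword combinations and keep those with the largest estimated causal effect. The `effect` callable (for example, the prediction shift when the group is masked) is a stand-in, since the paper's estimator is not detailed in this digest.

```python
def mine_word_groups(words, effect, beam_width=5, max_len=3):
    """Beam search over keyword combinations ranked by effect(group)."""
    beam = [((), 0.0)]
    best = []
    for _ in range(max_len):
        candidates = [
            (g + (w,), effect(g + (w,)))
            for g, _ in beam for w in words if w not in g
        ]
        beam = sorted(candidates, key=lambda t: t[1], reverse=True)[:beam_width]
        best.extend(beam)
    return sorted(best, key=lambda t: t[1], reverse=True)[:beam_width]
```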

An Interpretable Constructive Algorithm for Incremental Random Weight Neural Networks and Its Application

  • paper_url: http://arxiv.org/abs/2307.00185
  • repo_url: None
  • paper_authors: Jing Nan, Wei Dai, Guan Yuan, Ping Zhou
  • for: Improving the interpretability of incremental random weight neural networks (IRWNNs), whose relationship between hidden parameters and residual error is hard to interpret.
  • methods: Based on the geometric relationship between the hidden parameters and the residual error, proposes an interpretable constructive algorithm (ICA) with a geometric information constraint for randomly assigning hidden parameters, plus a node pool strategy that selects hidden parameters more conducive to convergence.
  • results: ICA outperforms other constructive algorithms in modeling speed, model accuracy, and network structure, and its effectiveness is validated on two practical industrial applications.
    Abstract Incremental random weight neural networks (IRWNNs) have gained attention in view of its easy implementation and fast learning. However, a significant drawback of IRWNNs is that the elationship between the hidden parameters (node)and the residual error (model performance) is difficult to be interpreted. To address the above issue, this article proposes an interpretable constructive algorithm (ICA) with geometric information constraint. First, based on the geometric relationship between the hidden parameters and the residual error, an interpretable geometric information constraint is proposed to randomly assign the hidden parameters. Meanwhile, a node pool strategy is employed to obtain hidden parameters that is more conducive to convergence from hidden parameters satisfying the proposed constraint. Furthermore, the universal approximation property of the ICA is proved. Finally, a lightweight version of ICA is presented for large-scale data modeling tasks. Experimental results on six benchmark datasets and a numerical simulation dataset demonstrate that the ICA outperforms other constructive algorithms in terms of modeling speed, model accuracy, and model network structure. Besides, two practical industrial application case are used to validate the effectiveness of ICA in practical applications.
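A sketch of constructive training with a node pool: sample candidate random hidden nodes, keep the one best aligned with the current residual, and refit the output weights by least squares. The paper's specific geometric information constraint on candidates is omitted, so the correlation criterion below is an assumption.

```python
import numpy as np

def build_irwnn(X, y, n_nodes=50, pool_size=20, seed=0):
    """Incrementally add random tanh nodes chosen from a candidate pool."""
    rng = np.random.default_rng(seed)
    H = np.empty((X.shape[0], 0))
    residual = y.copy()
    for _ in range(n_nodes):
        cands = [np.tanh(X @ rng.normal(size=X.shape[1]) + rng.normal())
                 for _ in range(pool_size)]
        # Pick the candidate most correlated with the current residual.
        best = max(cands, key=lambda h: abs(h @ residual) /
                   (np.linalg.norm(h) + 1e-12))
        H = np.column_stack([H, best])
        beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # refit output weights
        residual = y - H @ beta
    return H, beta
```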

Personality Traits in Large Language Models

  • paper_url: http://arxiv.org/abs/2307.00184
  • repo_url: None
  • paper_authors: Mustafa Safdari, Greg Serapio-García, Clément Crepy, Stephen Fitz, Peter Romero, Luning Sun, Marwa Abdulhai, Aleksandra Faust, Maja Matarić
  • for: Studies and measures the personality traits exhibited in text generated by large language models (LLMs), and how such traits can be shaped.
  • methods: Administers validated psychometric tests to quantify, analyze, and shape personality traits in text generated by widely used LLMs.
  • results: Finds that: 1) personality simulated in the outputs of some LLMs (under specific prompting configurations) is reliable and valid; 2) evidence of reliability and validity is stronger for larger, instruction fine-tuned models; and 3) personality in LLM outputs can be shaped along desired dimensions to mimic specific personality profiles.
    Abstract The advent of large language models (LLMs) has revolutionized natural language processing, enabling the generation of coherent and contextually relevant text. As LLMs increasingly power conversational agents, the synthesized personality embedded in these models by virtue of their training on large amounts of human-generated data draws attention. Since personality is an important factor determining the effectiveness of communication, we present a comprehensive method for administering validated psychometric tests and quantifying, analyzing, and shaping personality traits exhibited in text generated from widely-used LLMs. We find that: 1) personality simulated in the outputs of some LLMs (under specific prompting configurations) is reliable and valid; 2) evidence of reliability and validity of LLM-simulated personality is stronger for larger and instruction fine-tuned models; and 3) personality in LLM outputs can be shaped along desired dimensions to mimic specific personality profiles. We also discuss potential applications and ethical implications of our measurement and shaping framework, especially regarding responsible use of LLMs.
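
A minimal sketch of how such psychometric administration might look: present Likert-scale items to the model under a persona prompt and average the (reverse-keyed where needed) responses. `generate(prompt)` is a placeholder for any LLM call, and the two items are illustrative stand-ins for the full validated inventories the paper administers.

```python
import re

SCALE = "1 = disagree strongly, 3 = neutral, 5 = agree strongly"
ITEMS = {  # illustrative items; a real study uses full validated inventories
    "extraversion": ["I am the life of the party.", "I keep in the background."],
}
REVERSED = {"I keep in the background."}  # reverse-keyed item

def score_trait(generate, persona, trait):
    """Administer each item to the model and average the Likert responses.
    `generate(prompt) -> str` is a placeholder for any LLM completion call."""
    total = 0.0
    for item in ITEMS[trait]:
        prompt = (f"{persona}\nRate the statement on a scale where {SCALE}.\n"
                  f'Statement: "{item}"\nAnswer with a single number.')
        rating = float(re.search(r"[1-5]", generate(prompt)).group())
        total += 6 - rating if item in REVERSED else rating
    return total / len(ITEMS[trait])
```

Shaping then amounts to varying the persona description (e.g., "You are extremely outgoing and talkative") and checking that the trait score moves in the intended direction.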

The Integer Linear Programming Inference Cookbook

  • paper_url: http://arxiv.org/abs/2307.00171
  • repo_url: None
  • paper_authors: Vivek Srikumar, Dan Roth
  • for: Guide the reader through framing natural language processing inference problems as instances of integer linear programs.
  • methods: A collection of recipes for building constrained integer linear programs, covering constraint formulation, decomposition, and optimization.
  • results: Two worked examples illustrate how the recipes apply to concrete natural language processing problems.
    Abstract Over the years, integer linear programs have been employed to model inference in many natural language processing problems. This survey is meant to guide the reader through the process of framing a new inference problem as an instance of an integer linear program and is structured as a collection of recipes. At the end, we will see two worked examples to illustrate the use of these recipes.
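
In the cookbook's spirit, here is one classic recipe written with the PuLP library (our choice of solver interface, not necessarily the paper's): decode a BIO sequence labeling by maximizing model scores subject to "one label per token" and "an I must follow a B or another I" constraints.

```python
from pulp import LpProblem, LpVariable, LpMaximize, lpSum

# Toy scores: model scores for labels O/B/I on a 4-token sentence.
labels = ["O", "B", "I"]
scores = [
    {"O": 0.1, "B": 2.0, "I": 0.3},   # "Barack"
    {"O": 0.2, "B": 0.5, "I": 1.8},   # "Obama"
    {"O": 1.5, "B": 0.1, "I": 0.9},   # "was"
    {"O": 1.2, "B": 0.2, "I": 0.1},   # "here"
]
n = len(scores)

prob = LpProblem("bio_decoding", LpMaximize)
x = {(i, l): LpVariable(f"x_{i}_{l}", cat="Binary")
     for i in range(n) for l in labels}

# Objective: total score of the chosen labeling.
prob += lpSum(scores[i][l] * x[i, l] for i in range(n) for l in labels)

# Recipe 1: every token gets exactly one label.
for i in range(n):
    prob += lpSum(x[i, l] for l in labels) == 1

# Recipe 2 (BIO consistency): an I must follow a B or another I.
prob += x[0, "I"] == 0
for i in range(1, n):
    prob += x[i, "I"] <= x[i - 1, "B"] + x[i - 1, "I"]

prob.solve()
print([next(l for l in labels if x[i, l].value() == 1) for i in range(n)])
```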

VoxWatch: An open-set speaker recognition benchmark on VoxCeleb

  • paper_url: http://arxiv.org/abs/2307.00169
  • repo_url: None
  • paper_authors: Raghuveer Peri, Seyed Omid Sadjadi, Daniel Garcia-Romero
  • for: Study open-set speaker identification (OSI), in particular how to mitigate the false-alarm problem as the enrolled watchlist grows.
  • methods: Builds the first public OSI benchmark on the VoxCeleb dataset and evaluates three strong neural network systems on the speaker detection task.
  • results: Score calibration and score fusion substantially improve OSI performance, whereas the commonly adopted adaptive score normalization is not guaranteed to help.
    Abstract Despite its broad practical applications such as in fraud prevention, open-set speaker identification (OSI) has received less attention in the speaker recognition community compared to speaker verification (SV). OSI deals with determining if a test speech sample belongs to a speaker from a set of pre-enrolled individuals (in-set) or if it is from an out-of-set speaker. In addition to the typical challenges associated with speech variability, OSI is prone to the "false-alarm problem"; as the size of the in-set speaker population (a.k.a watchlist) grows, the out-of-set scores become larger, leading to increased false alarm rates. This is particularly challenging for applications in financial institutions and border security where the watchlist size is typically of the order of several thousand speakers. Therefore, it is important to systematically quantify the false-alarm problem and develop techniques that alleviate the impact of watchlist size on detection performance. Prior studies on this problem are sparse, and lack a common benchmark for systematic evaluations. In this paper, we present the first public benchmark for OSI, developed using the VoxCeleb dataset. We quantify the effect of the watchlist size and speech duration on the watchlist-based speaker detection task using three strong neural network based systems. In contrast to the findings from prior research, we show that the commonly adopted adaptive score normalization is not guaranteed to improve the performance for this task. On the other hand, we show that score calibration and score fusion, two other commonly used techniques in SV, result in significant improvements in OSI performance.
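
The paper's exact recipes aren't reproduced here, but the two techniques it finds helpful are standard and easy to sketch: Platt-style logistic calibration of raw detection scores on a development set, and linear fusion of several systems' scores. Function names and the equal-weight default are our assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def calibrate(dev_scores, dev_labels):
    """Platt-style calibration: map raw detection scores to probabilities.
    dev_labels: 1 for target (in-set) trials, 0 for non-target trials."""
    lr = LogisticRegression()
    lr.fit(np.asarray(dev_scores).reshape(-1, 1), dev_labels)
    return lambda s: lr.predict_proba(np.asarray(s).reshape(-1, 1))[:, 1]

def fuse(score_lists, weights=None):
    """Linear score fusion across systems (equal weights by default)."""
    S = np.asarray(score_lists, dtype=float)          # (n_systems, n_trials)
    w = np.ones(len(S)) / len(S) if weights is None else np.asarray(weights)
    return w @ S
```

In an OSI pipeline, each system's scores would be calibrated on held-out trials first, the calibrated scores fused, and a single threshold applied for watchlist detection.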

Counterfactual Collaborative Reasoning

  • paper_url: http://arxiv.org/abs/2307.00165
  • repo_url: None
  • paper_authors: Jianchao Ji, Zelong Li, Shuyuan Xu, Max Xiong, Juntao Tan, Yingqiang Ge, Hao Wang, Yongfeng Zhang
  • for: Improve both the accuracy and the explainability of machine learning models.
  • methods: Counterfactual Collaborative Reasoning (CCR), which combines counterfactual reasoning with neural logical reasoning to improve performance and interpretability.
  • results: On three real-world datasets, CCR outperforms both non-augmented and implicitly augmented models in accuracy while also improving model transparency.
    Abstract Causal reasoning and logical reasoning are two important types of reasoning abilities for human intelligence. However, their relationship has not been extensively explored under machine intelligence context. In this paper, we explore how the two reasoning abilities can be jointly modeled to enhance both accuracy and explainability of machine learning models. More specifically, by integrating two important types of reasoning ability -- counterfactual reasoning and (neural) logical reasoning -- we propose Counterfactual Collaborative Reasoning (CCR), which conducts counterfactual logic reasoning to improve the performance. In particular, we use recommender systems as an example to show how CCR alleviates data scarcity, improves accuracy, and enhances transparency. Technically, we leverage counterfactual reasoning to generate "difficult" counterfactual training examples for data augmentation, which -- together with the original training examples -- can enhance the model performance. Since the augmented data is model irrelevant, they can be used to enhance any model, enabling the wide applicability of the technique. Besides, most of the existing data augmentation methods focus on "implicit data augmentation" over users' implicit feedback, while our framework conducts "explicit data augmentation" over users explicit feedback based on counterfactual logic reasoning. Experiments on three real-world datasets show that CCR achieves better performance than non-augmented models and implicitly augmented models, and also improves model transparency by generating counterfactual explanations.

FFPDG: Fast, Fair and Private Data Generation

  • paper_url: http://arxiv.org/abs/2307.00161
  • repo_url: None
  • paper_authors: Weijie Xu, Jinjin Zhao, Francis Iannacci, Bo Wang
  • for: Design a fast, fair, flexible, and private data generation method suited to real-world application scenarios.
  • methods: A generation method designed to avoid the bias and the high computational cost of recent GAN-based approaches while still preserving privacy.
  • results: Models trained on data generated by the proposed method perform well at inference time in real application scenarios; the method's effectiveness is shown both theoretically and empirically.
    Abstract Generative modeling has been used frequently in synthetic data generation. Fairness and privacy are two big concerns for synthetic data. Although recent GAN-based methods (Goodfellow et al., 2014) show good results in preserving privacy, the generated data may be more biased. At the same time, these methods require high computation resources. In this work, we design a fast, fair, flexible and private data generation method. We show the effectiveness of our method theoretically and empirically. We show that models trained on data generated by the proposed method can perform well (in inference stage) on real application scenarios.

Stitched ViTs are Flexible Vision Backbones

  • paper_url: http://arxiv.org/abs/2307.00154
  • repo_url: https://github.com/ziplab/sn-netv2
  • paper_authors: Zizheng Pan, Jing Liu, Haoyu He, Jianfei Cai, Bohan Zhuang
  • for: Propose a new model stitching framework that supports many performance-efficiency trade-offs at runtime.
  • methods: Builds on stitchable neural networks with a two-way stitching scheme that enlarges the stitching space, a resource-constrained sampling strategy based on the FLOPs distribution of that space, and the observation that learning stitching layers is a low-rank update that stabilizes training.
  • results: SN-Netv2 serves as a flexible vision backbone on ImageNet-1K, ADE20K, COCO-Stuff-10K, NYUv2, and COCO-2017, with clear advantages in training efficiency and downstream adaptation.
    Abstract Large pretrained plain vision Transformers (ViTs) have been the workhorse for many downstream tasks. However, existing works utilizing off-the-shelf ViTs are inefficient in terms of training and deployment, because adopting ViTs with individual sizes requires separate training and is restricted by fixed performance-efficiency trade-offs. In this paper, we are inspired by stitchable neural networks, which is a new framework that cheaply produces a single model that covers rich subnetworks by stitching pretrained model families, supporting diverse performance-efficiency trade-offs at runtime. Building upon this foundation, we introduce SN-Netv2, a systematically improved model stitching framework to facilitate downstream task adaptation. Specifically, we first propose a Two-way stitching scheme to enlarge the stitching space. We then design a resource-constrained sampling strategy that takes into account the underlying FLOPs distributions in the space for improved sampling. Finally, we observe that learning stitching layers is a low-rank update, which plays an essential role on downstream tasks to stabilize training and ensure a good Pareto frontier. With extensive experiments on ImageNet-1K, ADE20K, COCO-Stuff-10K, NYUv2 and COCO-2017, SN-Netv2 demonstrates strong ability to serve as a flexible vision backbone, achieving great advantages in both training efficiency and adaptation. Code will be released at https://github.com/ziplab/SN-Netv2.
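
A stitching layer is typically just a learned linear map between the activation spaces of two pretrained backbones; the sketch below shows that shape, with a least-squares initialization from paired activations. Dimensions and names are our assumptions; the paper's observation that learning such layers is a low-rank update fits with how small this component is relative to the backbones.

```python
import torch
import torch.nn as nn

class StitchLayer(nn.Module):
    """Linear map sending block-i activations of one pretrained ViT into
    the block-j input space of another, so the two halves can be stitched."""
    def __init__(self, dim_in, dim_out):
        super().__init__()
        self.proj = nn.Linear(dim_in, dim_out)

    @torch.no_grad()
    def init_least_squares(self, acts_in, acts_out):
        """Initialize from paired activations: solve min_W ||A W - B||_F.
        acts_in: (N, dim_in), acts_out: (N, dim_out), e.g. tokens flattened
        over a few calibration batches."""
        W = torch.linalg.lstsq(acts_in, acts_out).solution  # (dim_in, dim_out)
        self.proj.weight.copy_(W.T)
        self.proj.bias.zero_()

    def forward(self, x):
        return self.proj(x)

# A stitched forward pass (front and back halves are frozen pretrained blocks):
# x = front_blocks(x); x = stitch(x); out = back_blocks(x)
```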

Large Language Models (GPT) for automating feedback on programming assignments

  • paper_url: http://arxiv.org/abs/2307.00150
  • repo_url: None
  • paper_authors: Maciej Pankiewicz, Ryan S. Baker
  • for: This paper aims to improve the automated feedback process for programming assignments by using OpenAI’s GPT-3.5 model to generate personalized hints for students.
  • methods: The authors use GPT-3.5 to generate personalized hints for students solving programming assignments on an automated assessment platform.
  • results: The experimental group (with GPT hints enabled) performed better in terms of successful submissions and took less time to solve assignments, but there was potential over-reliance on GPT-generated feedback.
    Abstract Addressing the challenge of generating personalized feedback for programming assignments is demanding due to several factors, like the complexity of code syntax or different ways to correctly solve a task. In this experimental study, we automated the process of feedback generation by employing OpenAI's GPT-3.5 model to generate personalized hints for students solving programming assignments on an automated assessment platform. Students rated the usefulness of GPT-generated hints positively. The experimental group (with GPT hints enabled) relied less on the platform's regular feedback but performed better in terms of percentage of successful submissions across consecutive attempts for tasks where GPT hints were enabled. For tasks where the GPT feedback was made unavailable, the experimental group needed significantly less time to solve assignments. Furthermore, when GPT hints were unavailable, students in the experimental condition were initially less likely to solve the assignment correctly. This suggests potential over-reliance on GPT-generated feedback. However, students in the experimental condition were able to correct reasonably rapidly, reaching the same percentage correct after seven submission attempts. The availability of GPT hints did not significantly impact students' affective state.
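
A minimal sketch of how such hint generation might be wired up with the legacy openai Python SDK (<1.0, matching the study's timeframe) and gpt-3.5-turbo. The prompt wording and function are our assumptions, not the study's exact setup.

```python
import openai  # legacy SDK (<1.0), matching the study's timeframe

openai.api_key = "sk-..."  # hypothetical key

def generate_hint(task_description, student_code, failed_test_output):
    """Ask GPT-3.5 for a personalized hint that avoids giving away the solution."""
    messages = [
        {"role": "system",
         "content": "You are a programming tutor. Give one short, concrete "
                    "hint about what is wrong. Do not write the solution."},
        {"role": "user",
         "content": f"Task:\n{task_description}\n\nStudent code:\n{student_code}"
                    f"\n\nFailed test output:\n{failed_test_output}"},
    ]
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo", messages=messages, temperature=0.3)
    return resp["choices"][0]["message"]["content"]
```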

A Personalized Household Assistive Robot that Learns and Creates New Breakfast Options through Human-Robot Interaction

  • paper_url: http://arxiv.org/abs/2307.00114
  • repo_url: None
  • paper_authors: Ali Ayub, Chrystopher L. Nehaniv, Kerstin Dautenhahn
  • for: A cognitive architecture for a household assistive robot that learns its users' breakfast preferences.
  • methods: Combines state-of-the-art perceptual learning algorithms, computational implementations of cognitive models of memory encoding and learning, a task planner for picking and placing household objects, a graphical user interface (GUI) for interacting with the user, and a learning-based approach for creating new breakfast options.
  • results: Proof-of-concept experiments show the architecture learns personalized breakfast options from users and uses that knowledge to generate new breakfast options the robot was never taught.
    Abstract For robots to assist users with household tasks, they must first learn about the tasks from the users. Further, performing the same task every day, in the same way, can become boring for the robot's user(s); therefore, assistive robots must find creative ways to perform tasks in the household. In this paper, we present a cognitive architecture for a household assistive robot that can learn personalized breakfast options from its users and then use the learned knowledge to set up a table for breakfast. The architecture can also use the learned knowledge to create new breakfast options over a longer period of time. The proposed cognitive architecture combines state-of-the-art perceptual learning algorithms, computational implementation of cognitive models of memory encoding and learning, a task planner for picking and placing objects in the household, a graphical user interface (GUI) to interact with the user and a novel approach for creating new breakfast options using the learned knowledge. The architecture is integrated with the Fetch mobile manipulator robot and validated, as a proof-of-concept system evaluation in a large indoor environment with multiple kitchen objects. Experimental results demonstrate the effectiveness of our architecture to learn personalized breakfast options from the user and generate new breakfast options never learned by the robot.

Performance of ChatGPT on USMLE: Unlocking the Potential of Large Language Models for AI-Assisted Medical Education

  • paper_url: http://arxiv.org/abs/2307.00112
  • repo_url: None
  • paper_authors: Prabin Sharma, Kisan Thapa, Dikshya Thapa, Prastab Dhakal, Mala Deep Upadhaya, Santosh Adhikari, Salik Ram Khanal
  • for: Evaluate the reliability of ChatGPT in answering complex medical and clinical questions.
  • methods: Uses Harvard University gross anatomy material and the United States Medical Licensing Examination (USMLE) questionnaire to assess the accuracy of ChatGPT-generated answers; results are evaluated with a 2-way ANOVA and posthoc analysis.
  • results: ChatGPT-generated answers are more context-oriented and represent a better model for deductive reasoning than regular Google search results; ChatGPT obtained 58.8% on logical questions and 60% on ethical questions, approaching the passing range for the former and crossing the threshold for the latter.
    Abstract Artificial intelligence is gaining traction in more ways than ever before. The popularity of language models and AI-based businesses has soared since ChatGPT was made available to the general public via OpenAI. It is becoming increasingly common for people to use ChatGPT both professionally and personally. Considering the widespread use of ChatGPT and the reliance people place on it, this study determined how reliable ChatGPT can be for answering complex medical and clinical questions. Harvard University gross anatomy along with the United States Medical Licensing Examination (USMLE) questionnaire were used to accomplish the objective. The paper evaluated the obtained results using a 2-way ANOVA and posthoc analysis. Both showed systematic covariation between format and prompt. Furthermore, the physician adjudicators independently rated the outcome's accuracy, concordance, and insight. As a result of the analysis, ChatGPT-generated answers were found to be more context-oriented and represented a better model for deductive reasoning than regular Google search results. Furthermore, ChatGPT obtained 58.8% on logical questions and 60% on ethical questions. This means that ChatGPT is approaching the passing range for logical questions and has crossed the threshold for ethical questions. The paper believes ChatGPT and other language learning models can be invaluable tools for e-learners; however, the study suggests that there is still room to improve their accuracy. In order to improve ChatGPT's performance in the future, further research is needed to better understand how it can answer different types of questions.
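
For readers who want to reproduce the style of analysis, here is how such a 2-way ANOVA over answer format and prompt could be run with statsmodels; the dataframe columns and values are hypothetical stand-ins, not the study's data.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical long-format data: one row per question, with the answer's
# accuracy score and the question's format and prompting style.
df = pd.DataFrame({
    "accuracy": [0.9, 0.6, 0.8, 0.5, 0.7, 0.95, 0.55, 0.65],
    "format":   ["mcq", "mcq", "open", "open", "mcq", "open", "mcq", "open"],
    "prompt":   ["plain", "cot", "plain", "cot", "cot", "plain", "plain", "cot"],
})

# Two-way ANOVA with interaction: does accuracy covary with format, prompt,
# and their combination?
model = ols("accuracy ~ C(format) * C(prompt)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```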

Ticket-BERT: Labeling Incident Management Tickets with Language Models

  • paper_url: http://arxiv.org/abs/2307.00108
  • repo_url: None
  • paper_authors: Zhexiong Liu, Cris Benge, Siduo Jiang
  • for: Efficiently label incident management tickets with fine-grained categories using a simple yet robust language model.
  • methods: Trains the Ticket-BERT language model on the proposed ticket datasets.
  • results: Ticket-BERT outperforms baselines and state-of-the-art text classifiers; deployed on the Microsoft IcM system with an active learning cycle, it quickly fine-tunes on newly collected tickets from only a few annotations.
    Abstract An essential aspect of prioritizing incident tickets for resolution is efficiently labeling tickets with fine-grained categories. However, ticket data is often complex and poses several unique challenges for modern machine learning methods: (1) tickets are created and updated either by machines with pre-defined algorithms or by engineers with domain expertise that share different protocols, (2) tickets receive frequent revisions that update ticket status by modifying all or parts of ticket descriptions, and (3) ticket labeling is time-sensitive and requires knowledge updates and new labels per the rapid software and hardware improvement lifecycle. To handle these issues, we introduce Ticket- BERT which trains a simple yet robust language model for labeling tickets using our proposed ticket datasets. Experiments demonstrate the superiority of Ticket-BERT over baselines and state-of-the-art text classifiers on Azure Cognitive Services. We further encapsulate Ticket-BERT with an active learning cycle and deploy it on the Microsoft IcM system, which enables the model to quickly finetune on newly-collected tickets with a few annotations.
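
Ticket-BERT's exact training recipe and data are not public here; the sketch below shows a generic Hugging Face fine-tuning loop of the kind such a system builds on, with hypothetical ticket texts and labels.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tickets = Dataset.from_dict({          # hypothetical labeled tickets
    "text": ["VM unreachable in region X", "Dashboard latency spike"],
    "label": [0, 1],
})

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

def encode(batch):
    # Tokenize ticket descriptions to fixed-length inputs.
    return tok(batch["text"], truncation=True, padding="max_length",
               max_length=128)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ticket-bert", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=tickets.map(encode, batched=True),
)
trainer.train()
```

The active learning cycle described above would wrap this: periodically score newly collected tickets, route the least-confident ones to annotators, and fine-tune again on the few new labels.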

Obscured Wildfire Flame Detection By Temporal Analysis of Smoke Patterns Captured by Unmanned Aerial Systems

  • paper_url: http://arxiv.org/abs/2307.00104
  • repo_url: None
  • paper_authors: Uma Meleti, Abolfazl Razi
  • for: Real-time detection of obscured wildfires (flames covered by trees, smoke, clouds, and other natural barriers) using drones equipped only with RGB cameras.
  • methods: Semantic segmentation based on temporal analysis of smoke patterns in video sequences: an encoder-decoder convolutional network with a pre-trained CNN encoder and 3D convolutions for decoding, using sequentially stacked features to exploit temporal variations; ground truth is derived from a curated FLAME2 dataset that pairs RGB with IR video.
  • results: Accurately detects obscured fire with a Dice score of 85.88%, 92.47% precision, and 90.67% classification accuracy; outperforms other methods by a significant margin, reaching about 100% video-level fire classification accuracy.
    Abstract This research paper addresses the challenge of detecting obscured wildfires (when the fire flames are covered by trees, smoke, clouds, and other natural barriers) in real-time using drones equipped only with RGB cameras. We propose a novel methodology that employs semantic segmentation based on the temporal analysis of smoke patterns in video sequences. Our approach utilizes a deep convolutional encoder-decoder architecture with a pre-trained CNN encoder and 3D convolutions for decoding, while using sequential stacking of features to exploit temporal variations. The predicted fire locations can assist drones in effectively combating forest fires and pinpoint fire retardant chemical drop on exact flame locations. We applied our method to a curated dataset derived from the FLAME2 dataset that includes RGB video along with IR video to determine the ground truth. Our proposed method has a unique property of detecting obscured fire and achieves a Dice score of 85.88%, while achieving a high precision of 92.47% and classification accuracy of 90.67% on test data showing promising results when inspected visually. Indeed, our method outperforms other methods by a significant margin in terms of video-level fire classification as we obtained about 100% accuracy using MobileNet+CBAM as the encoder backbone.
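
The abstract pins down the architecture only as "pre-trained CNN encoder plus 3D convolutions for decoding over sequentially stacked features"; the PyTorch sketch below has exactly that shape, with a small stand-in encoder instead of a pretrained one and sizes chosen for readability.

```python
import torch
import torch.nn as nn

class SmokeTemporalSegmenter(nn.Module):
    """Per-frame 2D encoder, features stacked over T frames, 3D-conv decoder
    that fuses temporal smoke-motion cues into a single fire/smoke mask."""
    def __init__(self, t_frames=5, feat=32):
        super().__init__()
        self.encoder = nn.Sequential(           # stand-in for a pretrained CNN
            nn.Conv2d(3, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder3d = nn.Sequential(
            nn.Conv3d(feat, feat, kernel_size=(3, 3, 3), padding=1), nn.ReLU(),
            nn.Conv3d(feat, 1, kernel_size=(t_frames, 1, 1)),  # collapse time
        )
        self.up = nn.Upsample(scale_factor=4, mode="bilinear",
                              align_corners=False)

    def forward(self, clip):                    # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        f = self.encoder(clip.flatten(0, 1))    # (B*T, F, H/4, W/4)
        f = f.view(b, t, *f.shape[1:]).permute(0, 2, 1, 3, 4)  # (B, F, T, h, w)
        mask = self.decoder3d(f).squeeze(2)     # time collapsed when T == t_frames
        return self.up(mask)                    # mask logits at (B, 1, H, W)
```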

Queer People are People First: Deconstructing Sexual Identity Stereotypes in Large Language Models

  • paper_url: http://arxiv.org/abs/2307.00101
  • repo_url: None
  • paper_authors: Harnoor Dhingra, Preetiha Jayashanker, Sayali Moghe, Emma Strubell
  • for: Examine how large language models (LLMs) generate text describing people of different sexual identities, and analyze the bias in that text.
  • methods: A comparative study of LLM-generated text, measuring bias with regard scores, plus a post-hoc debiasing method based on chain-of-thought prompting guided by SHAP analysis.
  • results: LLM-generated text shows measurable bias against queer people; the SHAP-guided chain-of-thought prompting increases the regard of generated sentences, suggesting post-hoc methods can effectively reduce bias in this setting.
    Abstract Large Language Models (LLMs) are trained primarily on minimally processed web text, which exhibits the same wide range of social biases held by the humans who created that content. Consequently, text generated by LLMs can inadvertently perpetuate stereotypes towards marginalized groups, like the LGBTQIA+ community. In this paper, we perform a comparative study of how LLMs generate text describing people with different sexual identities. Analyzing bias in the text generated by an LLM using regard score shows measurable bias against queer people. We then show that a post-hoc method based on chain-of-thought prompting using SHAP analysis can increase the regard of the sentence, representing a promising approach towards debiasing the output of LLMs in this setting.

Transformers in Healthcare: A Survey

  • paper_url: http://arxiv.org/abs/2307.00067
  • repo_url: None
  • paper_authors: Subhash Nerella, Sabyasachi Bandyopadhyay, Jiaqing Zhang, Miguel Contreras, Scott Siegel, Aysegul Bumin, Brandon Silva, Jessica Sena, Benjamin Shickel, Azra Bihorac, Kia Khezeli, Parisa Rashidi
  • for: Provides an overview of how the Transformers neural network architecture has been adopted in healthcare to analyze various forms of data, including medical imaging, Electronic Health Records (EHR), social media, physiological signals, and biomolecular sequences.
  • methods: Discusses the use of Transformer models in healthcare applications, including clinical diagnosis, report generation, data reconstruction, and drug/protein synthesis.
  • results: Highlights the benefits and limitations of using Transformers in healthcare, including computational cost, model interpretability, fairness, alignment with human values, ethical implications, and environmental impact.
    Abstract With Artificial Intelligence (AI) increasingly permeating various aspects of society, including healthcare, the adoption of the Transformers neural network architecture is rapidly changing many applications. Transformer is a type of deep learning architecture initially developed to solve general-purpose Natural Language Processing (NLP) tasks and has subsequently been adapted in many fields, including healthcare. In this survey paper, we provide an overview of how this architecture has been adopted to analyze various forms of data, including medical imaging, structured and unstructured Electronic Health Records (EHR), social media, physiological signals, and biomolecular sequences. Those models could help in clinical diagnosis, report generation, data reconstruction, and drug/protein synthesis. We identified relevant studies using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. We also discuss the benefits and limitations of using transformers in healthcare and examine issues such as computational cost, model interpretability, fairness, alignment with human values, ethical implications, and environmental impact.

Qualitative Prediction of Multi-Agent Spatial Interactions

  • paper_url: http://arxiv.org/abs/2307.00065
  • repo_url: None
  • paper_authors: Sariah Mghames, Luca Castri, Marc Hanheide, Nicola Bellotto
  • for: Understand and predict the interactions happening in dense, dynamic scenes among multiple agents, a prerequisite for deploying service robots in restaurants, warehouses, and hospitals.
  • methods: Three new approaches for modeling and predicting multi-agent interactions in dense scenes, including an intuitive Qualitative Trajectory Calculus (QTC) representation; the models take static and dynamic context into account, use input- and temporal-attention mechanisms, and are tested on medium- and long-term horizons.
  • results: A purely data-driven motion prediction network whose output is post-processed into QTC interaction predictions generally outperforms the other two approaches; all three generalize reasonably to a different but related human scenario.
    Abstract Deploying service robots in our daily life, whether in restaurants, warehouses or hospitals, calls for the need to reason on the interactions happening in dense and dynamic scenes. In this paper, we present and benchmark three new approaches to model and predict multi-agent interactions in dense scenes, including the use of an intuitive qualitative representation. The proposed solutions take into account static and dynamic context to predict individual interactions. They exploit an input- and a temporal-attention mechanism, and are tested on medium and long-term time horizons. The first two approaches integrate different relations from the so-called Qualitative Trajectory Calculus (QTC) within a state-of-the-art deep neural network to create a symbol-driven neural architecture for predicting spatial interactions. The third approach implements a purely data-driven network for motion prediction, the output of which is post-processed to predict QTC spatial interactions. Experimental results on a popular robot dataset of challenging crowded scenarios show that the purely data-driven prediction approach generally outperforms the other two. The three approaches were further evaluated on a different but related human scenarios to assess their generalisation capability.
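
The QTC representation is simple enough to sketch directly: the basic calculus (QTC_B) assigns each agent a symbol for whether it moves toward ('-'), away from ('+'), or neither ('0') relative to the other, computed from positions at consecutive time steps. The eps dead-band below is our assumption.

```python
import numpy as np

def qtc_b(k0, k1, l0, l1, eps=1e-3):
    """Basic Qualitative Trajectory Calculus (QTC_B) relation between two
    moving agents k and l, from their positions at consecutive time steps.
    Each symbol is '-' (moving toward the other), '+' (moving away), or '0'."""
    def symbol(p0, p1, other):
        d0 = np.linalg.norm(np.asarray(p0, float) - np.asarray(other, float))
        d1 = np.linalg.norm(np.asarray(p1, float) - np.asarray(other, float))
        return "-" if d1 - d0 < -eps else "+" if d1 - d0 > eps else "0"
    return symbol(k0, k1, l0), symbol(l0, l1, k0)

# Example: k approaches l while l stands still -> ('-', '0')
print(qtc_b(k0=(0, 0), k1=(1, 0), l0=(5, 0), l1=(5, 0)))
```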

Breaking the Metric Voting Distortion Barrier

  • paper_url: http://arxiv.org/abs/2306.17838
  • repo_url: None
  • paper_authors: Moses Charikar, Prasanna Ramakrishnan, Kangning Wang, Hongxun Wu
  • for: Design a voting rule that, given only voters' rankings of the candidates, chooses a candidate whose average distance to the voters is small.
  • methods: Studies new voting rules, including Maximal Lotteries and hybrids of Random Dictatorship and the Copeland rule, combined through careful randomization, together with sampling techniques.
  • results: A voting rule that guarantees metric distortion below $2.753$.
    Abstract We consider the following well studied problem of metric distortion in social choice. Suppose we have an election with $n$ voters and $m$ candidates who lie in a shared metric space. We would like to design a voting rule that chooses a candidate whose average distance to the voters is small. However, instead of having direct access to the distances in the metric space, each voter gives us a ranked list of the candidates in order of distance. Can we design a rule that regardless of the election instance and underlying metric space, chooses a candidate whose cost differs from the true optimum by only a small factor (known as the distortion)? A long line of work culminated in finding deterministic voting rules with metric distortion $3$, which is the best possible for deterministic rules and many other classes of voting rules. However, without any restrictions, there is still a significant gap in our understanding: Even though the best lower bound is substantially lower at $2.112$, the best upper bound is still $3$, which is attained even by simple rules such as Random Dictatorship. Finding a rule that guarantees distortion $3 - \varepsilon$ for some constant $\varepsilon $ has been a major challenge in computational social choice. In this work, we give a rule that guarantees distortion less than $2.753$. To do so we study a handful of voting rules that are new to the problem. One is Maximal Lotteries, a rule based on the Nash equilibrium of a natural zero-sum game which dates back to the 60's. The others are novel rules that can be thought of as hybrids of Random Dictatorship and the Copeland rule. Though none of these rules can beat distortion $3$ alone, a careful randomization between Maximal Lotteries and any of the novel rules can.
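
Maximal Lotteries is concrete enough to compute: it is the optimal mixed strategy of the symmetric zero-sum game on the pairwise-margin matrix, which is a small linear program. The sketch below uses scipy; the Condorcet-cycle example and function names are ours.

```python
import numpy as np
from scipy.optimize import linprog

def maximal_lottery(margins):
    """Optimal mixed strategy of the symmetric zero-sum game whose payoff
    matrix M[i, j] = (#voters preferring i over j) - (#preferring j over i).
    Solves: max v  s.t.  (p^T M)_j >= v for all j,  sum(p) = 1,  p >= 0."""
    M = np.asarray(margins, dtype=float)
    m = len(M)
    # Variables: p_1..p_m and v; linprog minimizes, so the objective is -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    A_ub = np.hstack([-M.T, np.ones((m, 1))])        # v - (p^T M)_j <= 0
    b_ub = np.zeros(m)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[:m]

# Toy Condorcet cycle a > b > c > a: the maximal lottery is uniform.
M = np.array([[0, 1, -1], [-1, 0, 1], [1, -1, 0]])
print(maximal_lottery(M))   # approx [1/3, 1/3, 1/3]
```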

Resetting the Optimizer in Deep RL: An Empirical Study

  • paper_url: http://arxiv.org/abs/2306.17833
  • repo_url: None
  • paper_authors: Kavosh Asadi, Rasool Fakoor, Shoham Sabach
  • for: Approximating the optimal value function in deep reinforcement learning.
  • methods: Modern variants of stochastic gradient descent such as Adam maintain internal parameters (e.g., estimates of the first and second moments of the gradient) that are updated over time, so information from previous iterations leaks into the optimization problem solved in the current iteration.
  • results: A simple resetting strategy mitigates this contamination and significantly improves deep RL performance on the Atari benchmark.
    Abstract We focus on the task of approximating the optimal value function in deep reinforcement learning. This iterative process consists of approximately solving a sequence of optimization problems where the objective function can change per iteration. The common approach to solving the problem is to employ modern variants of the stochastic gradient descent algorithm such as Adam. These optimizers maintain their own internal parameters such as estimates of the first and the second moment of the gradient, and update these parameters over time. Therefore, information obtained in previous iterations is being used to solve the optimization problem in the current iteration. We hypothesize that this can contaminate the internal parameters of the employed optimizer in situations where the optimization landscape of the previous iterations is quite different from the current iteration. To hedge against this effect, a simple idea is to reset the internal parameters of the optimizer when starting a new iteration. We empirically investigate this resetting strategy by employing various optimizers in conjunction with the Rainbow algorithm. We demonstrate that this simple modification unleashes the true potential of modern optimizers, and significantly improves the performance of deep RL on the Atari benchmark.
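
The reset itself is nearly a one-liner in PyTorch: drop the optimizer's per-parameter statistics (for Adam, the first/second moment estimates and step counts) while leaving the parameter groups and hyperparameters intact. A sketch, with the surrounding RL loop elided:

```python
import collections
import torch

def reset_optimizer_state(optimizer):
    """Discard the optimizer's internal statistics (for Adam: exp_avg,
    exp_avg_sq, and step counts) while keeping the parameter groups and
    hyperparameters untouched; state is lazily re-initialized on the
    next step() call."""
    optimizer.state = collections.defaultdict(dict)

# Inside a DQN-style loop: start each new regression problem fresh.
net = torch.nn.Linear(4, 2)
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
for iteration in range(10):
    reset_optimizer_state(opt)   # reset at the start of each iteration
    # ... update the target network, then run SGD steps on the new objective ...
```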

Understanding Unfairness via Training Concept Influence

  • paper_url: http://arxiv.org/abs/2306.17828
  • repo_url: None
  • paper_authors: Yuanshun Yao, Yang Liu
  • for: Help practitioners understand how their data and algorithms give rise to model unfairness, a relatively under-explored problem.
  • methods: Counterfactually intervenes on training samples along predefined concepts, i.e., data attributes such as features (X), labels (Y), and sensitive attributes (A), and uses influence functions to compute each sample's effect on model unfairness.
  • results: The framework helps practitioners understand observed unfairness and repair training data, and also supports detecting mislabeling, fixing imbalanced representations, and detecting fairness-targeted poisoning attacks.
    Abstract Knowing the causes of a model's unfairness helps practitioners better understand their data and algorithms. This is an important yet relatively unexplored task. We look into this problem through the lens of the training data - one of the major sources of unfairness. We ask the following questions: how would a model's fairness performance change if, in its training data, some samples (1) were collected from a different (e.g. demographic) group, (2) were labeled differently, or (3) some features were changed? In other words, we quantify the fairness influence of training samples by counterfactually intervening and changing samples based on predefined concepts, i.e. data attributes such as features (X), labels (Y), or sensitive attributes (A). To calculate a training sample's influence on the model's unfairness w.r.t a concept, we first generate counterfactual samples based on the concept, i.e. the counterfactual versions of the sample if the concept were changed. We then calculate the resulting impact on the unfairness, via influence function, if the counterfactual samples were used in training. Our framework not only helps practitioners understand the observed unfairness and repair their training data, but also leads to many other applications, e.g. detecting mislabeling, fixing imbalanced representations, and detecting fairness-targeted poisoning attacks.

DisCo: Disentangled Control for Referring Human Dance Generation in Real World

  • paper_url: http://arxiv.org/abs/2307.00040
  • repo_url: https://github.com/Wangt-CN/DisCo
  • paper_authors: Tan Wang, Linjie Li, Kevin Lin, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang
  • for: This paper focuses on the problem of Referring Human Dance Generation, which aims to generate realistic human dance images and videos with precise target poses and diverse appearances, while also generalizing to unseen subjects, backgrounds, and poses.
  • methods: The proposed approach, DISCO, includes a novel model architecture with disentangled control and an effective human attribute pre-training to improve the faithfulness and compositionality of dance synthesis.
  • results: DISCO is able to generate high-quality human dance images and videos with diverse appearances and flexible motions, as demonstrated through extensive qualitative and quantitative results.
    Abstract Generative AI has made significant strides in computer vision, particularly in image/video synthesis conditioned on text descriptions. Despite the advancements, it remains challenging especially in the generation of human-centric content such as dance synthesis. Existing dance synthesis methods struggle with the gap between synthesized content and real-world dance scenarios. In this paper, we define a new problem setting: Referring Human Dance Generation, which focuses on real-world dance scenarios with three important properties: (i) Faithfulness: the synthesis should retain the appearance of both human subject foreground and background from the reference image, and precisely follow the target pose; (ii) Generalizability: the model should generalize to unseen human subjects, backgrounds, and poses; (iii) Compositionality: it should allow for composition of seen/unseen subjects, backgrounds, and poses from different sources. To address these challenges, we introduce a novel approach, DISCO, which includes a novel model architecture with disentangled control to improve the faithfulness and compositionality of dance synthesis, and an effective human attribute pre-training for better generalizability to unseen humans. Extensive qualitative and quantitative results demonstrate that DISCO can generate high-quality human dance images and videos with diverse appearances and flexible motions. Code, demo, video and visualization are available at: https://disco-dance.github.io/.

Act3D: Infinite Resolution Action Detection Transformer for Robotic Manipulation

  • paper_url: http://arxiv.org/abs/2306.17817
  • repo_url: None
  • paper_authors: Theophile Gervet, Zhou Xian, Nikolaos Gkanatsios, Katerina Fragkiadaki
  • for: This paper proposes a manipulation policy called Act3D, which is designed to improve the accuracy and efficiency of robot manipulation tasks.
  • methods: Act3D uses a Transformer architecture to cast 6-DoF keypose prediction as 3D detection with adaptive spatial computation. It takes as input 3D feature clouds unprojected from one or more camera views, and uses relative spatial attention to select the best feature point for end-effector pose prediction.
  • results: Act3D achieves a new state-of-the-art on RLbench, an established manipulation benchmark, with 10% absolute improvement over the previous SOTA 2D multi-view policy and 22% absolute improvement with 3x less compute over the previous SOTA 3D policy.
    Abstract 3D perceptual representations are well suited for robot manipulation as they easily encode occlusions and simplify spatial reasoning. Many manipulation tasks require high spatial precision in end-effector pose prediction, typically demanding high-resolution 3D perceptual grids that are computationally expensive to process. As a result, most manipulation policies operate directly in 2D, foregoing 3D inductive biases. In this paper, we propose Act3D, a manipulation policy Transformer that casts 6-DoF keypose prediction as 3D detection with adaptive spatial computation. It takes as input 3D feature clouds unprojected from one or more camera views, iteratively samples 3D point grids in free space in a coarse-to-fine manner, featurizes them using relative spatial attention to the physical feature cloud, and selects the best feature point for end-effector pose prediction. Act3D sets a new state-of-the-art in RLbench, an established manipulation benchmark. Our model achieves 10% absolute improvement over the previous SOTA 2D multi-view policy on 74 RLbench tasks and 22% absolute improvement with 3x less compute over the previous SOTA 3D policy. In thorough ablations, we show the importance of relative spatial attention, large-scale vision-language pre-trained 2D backbones, and weight tying across coarse-to-fine attentions. Code and videos are available at our project site: https://act3d.github.io/.
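
The coarse-to-fine sampling is easy to sketch in isolation: score a grid of free-space points, recenter on the best one, halve the extent, and repeat. `score_fn` stands in for Act3D's relative spatial attention over the physical feature cloud; grid sizes and the toy scorer are our assumptions.

```python
import numpy as np

def coarse_to_fine_keypose(score_fn, center, extent, levels=3, n=8):
    """Adaptive spatial computation: at each level, score an n^3 grid of
    free-space points around `center`, recenter on the best one, and halve
    the grid extent. `score_fn(points)` stands in for the learned scorer."""
    center = np.asarray(center, dtype=float)
    for _ in range(levels):
        axes = [np.linspace(-extent, extent, n)] * 3
        offsets = np.stack(np.meshgrid(*axes), axis=-1).reshape(-1, 3)
        points = center + offsets
        center = points[np.argmax(score_fn(points))]
        extent /= 2.0                      # finer grid each round
    return center                          # predicted end-effector position

# Toy scorer: prefer points near a hidden target.
target = np.array([0.3, -0.2, 0.5])
best = coarse_to_fine_keypose(
    lambda P: -np.linalg.norm(P - target, axis=1), center=(0, 0, 0), extent=1.0)
print(best)   # converges toward `target`
```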

Comparing Reinforcement Learning and Human Learning using the Game of Hidden Rules

  • paper_url: http://arxiv.org/abs/2306.17766
  • repo_url: None
  • paper_authors: Eric Pulick, Vladimir Menkov, Yonatan Mintz, Paul Kantor, Vicki Bier
  • for: Investigate how task structure affects human learning (HL) and reinforcement learning (RL) performance.
  • methods: A specialized learning environment built to support rigorous study of the effect of task structure on HL and RL.
  • results: Example experiments show marked performance differences between humans and RL algorithms across different task structures.
    Abstract Reliable real-world deployment of reinforcement learning (RL) methods requires a nuanced understanding of their strengths and weaknesses and how they compare to those of humans. Human-machine systems are becoming more prevalent and the design of these systems relies on a task-oriented understanding of both human learning (HL) and RL. Thus, an important line of research is characterizing how the structure of a learning task affects learning performance. While increasingly complex benchmark environments have led to improved RL capabilities, such environments are difficult to use for the dedicated study of task structure. To address this challenge we present a learning environment built to support rigorous study of the impact of task structure on HL and RL. We demonstrate the environment's utility for such study through example experiments in task structure that show performance differences between humans and RL algorithms.