cs.LG - 2023-07-22

A Revolution of Personalized Healthcare: Enabling Human Digital Twin with Mobile AIGC

  • paper_url: http://arxiv.org/abs/2307.12115
  • repo_url: None
  • paper_authors: Jiayuan Chen, Changyan Yi, Hongyang Du, Dusit Niyato, Jiawen Kang, Jun Cai, Xuemin, Shen
  • for: 本研究旨在探讨 mobil AI 生成内容技术如何推动人类数字孪生(HDT)的发展,以提高个人化医疗服务。
  • methods: 本文提出了一种基于 mobil AI 生成内容技术的 HDT 系统架构,并讨论了相关的设计要求和挑战。
  • results: 本文通过两个使用场景的示例和一个实验研究证明了该方案的有效性,并提出了一些未来方向和开放问题。
    Abstract Mobile Artificial Intelligence-Generated Content (AIGC) technology refers to the adoption of AI algorithms deployed at mobile edge networks to automate the information creation process while fulfilling the requirements of end users. Mobile AIGC has recently attracted phenomenal attentions and can be a key enabling technology for an emerging application, called human digital twin (HDT). HDT empowered by the mobile AIGC is expected to revolutionize the personalized healthcare by generating rare disease data, modeling high-fidelity digital twin, building versatile testbeds, and providing 24/7 customized medical services. To promote the development of this new breed of paradigm, in this article, we propose a system architecture of mobile AIGC-driven HDT and highlight the corresponding design requirements and challenges. Moreover, we illustrate two use cases, i.e., mobile AIGC-driven HDT in customized surgery planning and personalized medication. In addition, we conduct an experimental study to prove the effectiveness of the proposed mobile AIGC-driven HDT solution, which shows a particular application in a virtual physical therapy teaching platform. Finally, we conclude this article by briefly discussing several open issues and future directions.
    摘要 mobile artificial intelligence生成内容(AIGC)技术指的是在移动边缘网络中部署AI算法,以自动化信息创建过程,同时满足用户的需求。 mobile AIGC 在最近受到了极高的关注,并可以是人类数字双(HDT)的关键启用技术。 HDT 通过 mobile AIGC 的 empowerment,预计将重塑个性化医疗,生成罕见疾病数据,模拟高精度数字双,建立多样化测试床,提供24/7个性化医疗服务。为推动这种新的 Paradigma 的发展,本文提出了移动 AIGC 驱动 HDT 的系统架构,并 highlighted 相应的设计要求和挑战。 此外,本文还 illustrate 了两个用例,即移动 AIGC 驱动 HDT 在定制手术规划和个性化药物。 此外,我们还进行了实验研究,证明了提议的移动 AIGC 驱动 HDT 解决方案的效iveness。 最后,我们 briefly discuss 了一些开放问题和未来方向。

A Zero-shot and Few-shot Study of Instruction-Finetuned Large Language Models Applied to Clinical and Biomedical Tasks

  • paper_url: http://arxiv.org/abs/2307.12114
  • repo_url: None
  • paper_authors: Yanis Labrak, Mickael Rouvier, Richard Dufour
  • for: 这些大型自然语言处理(NLP)任务,如名实化识别(NER)、问答(QA)、关系抽取(RE)等,是为了评估四种现状最佳的 instruciton-tuned 大语言模型(LLMs)在英文医学和生物医学领域的表现。
  • methods: 这些LLMs 是通过对 instruction-tuned 模型进行训练,以适应不同的 NLP 任务。
  • results: 结果表明,评估的 LLMs 在零到几个采样enario 下,对大多数任务的性能都在逐渐提高,特别是在问答任务上,即使它们从来没有看到这些任务的示例。然而,分类和RE任务的性能下降,与专门为医疗领域训练的模型,如PubMedBERT,相比而言,它们的性能较差。此外,我们发现没有任何 LLM 在所有任务上都能超越其他模型,各个模型在不同任务上的表现不同。
    Abstract We evaluate four state-of-the-art instruction-tuned large language models (LLMs) -- ChatGPT, Flan-T5 UL2, Tk-Instruct, and Alpaca -- on a set of 13 real-world clinical and biomedical natural language processing (NLP) tasks in English, such as named-entity recognition (NER), question-answering (QA), relation extraction (RE), etc. Our overall results demonstrate that the evaluated LLMs begin to approach performance of state-of-the-art models in zero- and few-shot scenarios for most tasks, and particularly well for the QA task, even though they have never seen examples from these tasks before. However, we observed that the classification and RE tasks perform below what can be achieved with a specifically trained model for the medical field, such as PubMedBERT. Finally, we noted that no LLM outperforms all the others on all the studied tasks, with some models being better suited for certain tasks than others.
    摘要 我们评估了四种现代 instruction-tuned大型自然语言处理(NLP)模型(ChatGPT、Flan-T5 UL2、Tk-Instruct和Alpaca),在英语的13种实际医疗和生物医学NLP任务上进行评估,包括名称实体识别(NER)、问答(QA)、关系提取(RE)等。我们的总结结果表明,评估的LLMs在零或几个预测enario中的性能已经接近了现有模型的性能,尤其是在QA任务上表现出色,即使它们从来没有看到这些任务的示例。然而,我们发现,分类和RE任务的性能比特别训练的医疗领域模型,如PubMedBERT,还是有所下降。最后,我们注意到,无LLM可以在所有研究任务上表现出优于其他模型,一些模型更适合某些任务。

Active Control of Flow over Rotating Cylinder by Multiple Jets using Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.12083
  • repo_url: None
  • paper_authors: Kamyar Dobakhti, Jafar Ghazanfarian
  • for: 这个论文主要目的是提出一种基于深度学习的活动流控方法,以减少碰撞体上的阻力。
  • methods: 该方法使用多个控制的喷流来达到最大可能的阻力减少。具体来说,文章将介绍DRL算法的控制参数、其限制和优化,以及喷流数量和位置、感测器位置和最大喷流速率的优化。
  • results: 结果表明,将旋转和DRL相结合可以有效地减少阻力系数,达到49.75%的减少级别。此外,文章还表明,在不同的配置下,感测器的数量和位置需要根据用户的需求进行选择。同时,允许代理人访问更高的喷流速率,通常不会提高性能,除非rotating cylinder。
    Abstract The real power of artificial intelligence appears in reinforcement learning, which is computationally and physically more sophisticated due to its dynamic nature. Rotation and injection are some of the proven ways in active flow control for drag reduction on blunt bodies. In this paper, rotation will be added to the cylinder alongside the deep reinforcement learning (DRL) algorithm, which uses multiple controlled jets to reach the maximum possible drag suppression. Characteristics of the DRL code, including controlling parameters, their limitations, and optimization of the DRL network for use with rotation will be presented. This work will focus on optimizing the number and positions of the jets, the sensors location, and the maximum allowed flow rate to jets in the form of the maximum allowed flow rate of each actuation and the total number of them per episode. It is found that combining the rotation and DRL is promising since it suppresses the vortex shedding, stabilizes the Karman vortex street, and reduces the drag coefficient by up to 49.75%. Also, it will be shown that having more sensors at more locations is not always a good choice and the sensor number and location should be determined based on the need of the user and corresponding configuration. Also, allowing the agent to have access to higher flow rates, mostly reduces the performance, except when the cylinder rotates. In all cases, the agent can keep the lift coefficient at a value near zero, or stabilize it at a smaller number.
    摘要 真正的人工智能在强化学习中表现出真正的力量,因为它的动态性使其更加复杂。在活动流控中,旋转和注入是已知的降低拖力的方法。在这篇论文中,我们将在筒体上添加旋转,并与深度强化学习(DRL)算法结合使用多个控制的气流来达到最大可能的拖力降低。我们将展示DRL代码中的控制参数、其限制和优化DRL网络的方法,包括气流管道的数量和位置、感应器的位置和每个episode中的最大气流量。我们发现,将旋转和DRL结合使用是有前途的,因为它可以阻断旋转 shedding,稳定卡曼旋流street,并降低拖力系数至最多49.75%。此外,我们还发现,在某些情况下,添加更多的感应器并不总是有利,需要根据用户的需求和相应的配置来确定感应器的数量和位置。此外,允许机器人访问更高的气流量,通常会降低性能,除非筒体在旋转。在所有情况下,机器人都可以保持着降低的升力系数,或者稳定其在更小的数字上。

Spectral Normalized-Cut Graph Partitioning with Fairness Constraints

  • paper_url: http://arxiv.org/abs/2307.12065
  • repo_url: https://github.com/jiali2000/fnm
  • paper_authors: Jia Li, Yanhao Wang, Arpit Merchant
  • for: 本文目的是为了分解一个图的节点集 into $k$ 个彩色独立集,以最小化图中任何两个集之间的正规化连接值,同时保证每个属性分布在每个集中是约等的。
  • methods: 本文提出了一种两阶段的光谱算法,称为 FNM,用于实现公平分解。在第一阶段,我们添加了一个增强的拉格朗日函数基于我们的公平准则,以生成一个公平的光谱节点嵌入。在第二阶段,我们设计了一种圆拟方案,以生成 $k$ 个集从公平嵌入中生成高质量的分解。
  • results: 通过对九个标准数据集进行广泛的实验,我们证明了 FNM 比三种基准方法更高效。
    Abstract Normalized-cut graph partitioning aims to divide the set of nodes in a graph into $k$ disjoint clusters to minimize the fraction of the total edges between any cluster and all other clusters. In this paper, we consider a fair variant of the partitioning problem wherein nodes are characterized by a categorical sensitive attribute (e.g., gender or race) indicating membership to different demographic groups. Our goal is to ensure that each group is approximately proportionally represented in each cluster while minimizing the normalized cut value. To resolve this problem, we propose a two-phase spectral algorithm called FNM. In the first phase, we add an augmented Lagrangian term based on our fairness criteria to the objective function for obtaining a fairer spectral node embedding. Then, in the second phase, we design a rounding scheme to produce $k$ clusters from the fair embedding that effectively trades off fairness and partition quality. Through comprehensive experiments on nine benchmark datasets, we demonstrate the superior performance of FNM compared with three baseline methods.
    摘要 normalized-cut graph partitioning aimed to divide the set of nodes in a graph into $k$ disjoint clusters to minimize the fraction of the total edges between any cluster and all other clusters. In this paper, we considered a fair variant of the partitioning problem, where nodes were characterized by a categorical sensitive attribute (e.g., gender or race) indicating membership to different demographic groups. Our goal was to ensure that each group was approximately proportionally represented in each cluster while minimizing the normalized cut value. To resolve this problem, we proposed a two-phase spectral algorithm called FNM. In the first phase, we added an augmented Lagrangian term based on our fairness criteria to the objective function for obtaining a fairer spectral node embedding. Then, in the second phase, we designed a rounding scheme to produce $k$ clusters from the fair embedding that effectively trades off fairness and partition quality. Through comprehensive experiments on nine benchmark datasets, we demonstrated the superior performance of FNM compared with three baseline methods.

Balancing Exploration and Exploitation in Hierarchical Reinforcement Learning via Latent Landmark Graphs

  • paper_url: http://arxiv.org/abs/2307.12063
  • repo_url: https://github.com/papercode2022/hill
  • paper_authors: Qingyang Zhang, Yiming Yang, Jingqing Ruan, Xuantang Xiong, Dengpeng Xing, Bo Xu
  • for: 这篇论文目的是提出一种可以解决循环对待问题的弹性问题决策学习方法,即 Hierarchical reinforcement learning via dynamically building Latent Landmark graphs (HILL)。
  • methods: 这篇论文使用了一种名为 HILL 的方法,它使用了对抗表示学习目标来学习隐藏目标表示,然后使用这些表示来动态建立隐藏标签图和选择策略,以解决循环对待问题的问题。
  • results: 实验结果显示,HILL 比state-of-the-art基eline在缺乏对象奖励的连续控制任务上具有更高的样本效率和渐进性表现。
    Abstract Goal-Conditioned Hierarchical Reinforcement Learning (GCHRL) is a promising paradigm to address the exploration-exploitation dilemma in reinforcement learning. It decomposes the source task into subgoal conditional subtasks and conducts exploration and exploitation in the subgoal space. The effectiveness of GCHRL heavily relies on subgoal representation functions and subgoal selection strategy. However, existing works often overlook the temporal coherence in GCHRL when learning latent subgoal representations and lack an efficient subgoal selection strategy that balances exploration and exploitation. This paper proposes HIerarchical reinforcement learning via dynamically building Latent Landmark graphs (HILL) to overcome these limitations. HILL learns latent subgoal representations that satisfy temporal coherence using a contrastive representation learning objective. Based on these representations, HILL dynamically builds latent landmark graphs and employs a novelty measure on nodes and a utility measure on edges. Finally, HILL develops a subgoal selection strategy that balances exploration and exploitation by jointly considering both measures. Experimental results demonstrate that HILL outperforms state-of-the-art baselines on continuous control tasks with sparse rewards in sample efficiency and asymptotic performance. Our code is available at https://github.com/papercode2022/HILL.
    摘要 “对于受益从探索和实施的问题,叫做目标调整层次学习(GCHRL)是一种有前途的思路。它将源任务分解为子任务 conditional subtask,并在子任务空间进行探索和实施。GCHRL的有效性很大程度上取决于子任务表示函数和子任务选择策略。然而,现有的工作往往忽略GCHRL中的时间协调性在学习隐藏子任务表示时。此外,缺乏一个能够均衡探索和实施的子任务选择策略。本文提出了层次学习 via 动态建立隐藏地标 graphs(HILL)来解决这些限制。HILL使用了一个对照式表示学习目标来学习隐藏子任务表示,并在这些表示上动态建立隐藏地标 graphs。HILL还使用了节点上的新鲜度量和边上的实用度量。最后,HILL发展了一个子任务选择策略,考虑了这两个度量,以均衡探索和实施。实验结果显示,HILL在缺少奖励的粒子控制任务上比基于 estado-of-the-art 基eline 高效和长期性。我们的代码可以在 获取。”

Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations

  • paper_url: http://arxiv.org/abs/2307.12062
  • repo_url: None
  • paper_authors: Yongyuan Liang, Yanchao Sun, Ruijie Zheng, Xiangyu Liu, Tuomas Sandholm, Furong Huang, Stephen McAleer
  • for: 本研究旨在训练能够在环境干扰或敌意攻击下表现良好的RL策略。
  • methods: 我们提出了GRAD方法,它将把 temporally-coupled 干扰视为一个部分可见二人零 SUM 游戏,通过查找这个游戏的approximate equilibria来确保代理人的强度对 temporally-coupled 干扰的Robustness。
  • results: 我们的提议方法在许多连续控制任务中实验证明了与基elines相比,具有显著的Robustness优势,包括对于标准和 temporally-coupled 干扰的攻击。
    Abstract Robust reinforcement learning (RL) seeks to train policies that can perform well under environment perturbations or adversarial attacks. Existing approaches typically assume that the space of possible perturbations remains the same across timesteps. However, in many settings, the space of possible perturbations at a given timestep depends on past perturbations. We formally introduce temporally-coupled perturbations, presenting a novel challenge for existing robust RL methods. To tackle this challenge, we propose GRAD, a novel game-theoretic approach that treats the temporally-coupled robust RL problem as a partially-observable two-player zero-sum game. By finding an approximate equilibrium in this game, GRAD ensures the agent's robustness against temporally-coupled perturbations. Empirical experiments on a variety of continuous control tasks demonstrate that our proposed approach exhibits significant robustness advantages compared to baselines against both standard and temporally-coupled attacks, in both state and action spaces.
    摘要 Strong reinforcement learning (RL) aims to train policies that can perform well under environmental perturbations or adversarial attacks. Existing methods typically assume that the space of possible perturbations remains the same across timesteps. However, in many situations, the space of possible perturbations at a given timestep depends on past perturbations. We formally introduce temporally-coupled perturbations, presenting a new challenge for existing robust RL methods. To address this challenge, we propose GRAD, a novel game-theoretic approach that treats the temporally-coupled robust RL problem as a partially-observable two-player zero-sum game. By finding an approximate equilibrium in this game, GRAD ensures the agent's robustness against temporally-coupled perturbations. Empirical experiments on a variety of continuous control tasks show that our proposed approach exhibits significant robustness advantages compared to baselines against both standard and temporally-coupled attacks, in both state and action spaces.Note: Simplified Chinese is used here, as it is the most widely used variety of Chinese in mainland China and Taiwan. Traditional Chinese is also commonly used, especially in Hong Kong and Macau.

Fast Knowledge Graph Completion using Graphics Processing Units

  • paper_url: http://arxiv.org/abs/2307.12059
  • repo_url: None
  • paper_authors: Chun-Hee Lee, Dong-oh Kang, Hwa Jeon Song
  • for: 这个论文的目的是提出一种高效的知识图完成框架,用于在GPU上获得新的关系。
  • methods: 该论文使用知识图嵌入模型,将知识图完成问题转化为一种相似Join问题,然后使用度量空间的性质来 derive 高速的完成算法。
  • results: experiments 表明,该框架可以高效处理知识图完成问题。
    Abstract Knowledge graphs can be used in many areas related to data semantics such as question-answering systems, knowledge based systems. However, the currently constructed knowledge graphs need to be complemented for better knowledge in terms of relations. It is called knowledge graph completion. To add new relations to the existing knowledge graph by using knowledge graph embedding models, we have to evaluate $N\times N \times R$ vector operations, where $N$ is the number of entities and $R$ is the number of relation types. It is very costly. In this paper, we provide an efficient knowledge graph completion framework on GPUs to get new relations using knowledge graph embedding vectors. In the proposed framework, we first define "transformable to a metric space" and then provide a method to transform the knowledge graph completion problem into the similarity join problem for a model which is "transformable to a metric space". After that, to efficiently process the similarity join problem, we derive formulas using the properties of a metric space. Based on the formulas, we develop a fast knowledge graph completion algorithm. Finally, we experimentally show that our framework can efficiently process the knowledge graph completion problem.
    摘要 知识图可以应用于数据 semantics 多个领域,如问答系统、知识基础系统。然而,目前构建的知识图需要补充以获得更好的知识,这被称为知识图完成。为添加新的关系到现有的知识图,我们需要评估 $N\times N \times R$ 矢量操作,其中 $N$ 是实体的数量,$R$ 是关系类型的数量。这很费时。在这篇论文中,我们提供了一个高效的知识图完成框架在 GPU 上来获得新关系使用知识图嵌入向量。我们首先定义 "可转换到一个度量空间",然后提供一种将知识图完成问题转换成一个度量空间中的相似Join问题的方法。接着,我们使用度量空间的性质 deriv 出 formulas,并根据 formulas 开发了一个快速的知识图完成算法。最后,我们通过实验表示,我们的框架可以高效地处理知识图完成问题。

Exploring MLOps Dynamics: An Experimental Analysis in a Real-World Machine Learning Project

  • paper_url: http://arxiv.org/abs/2307.13473
  • repo_url: None
  • paper_authors: Awadelrahman M. A. Ahmed
    for:这个研究旨在优化机器学习操作(MLOps)过程,以提高机器学习项目的效率和生产力。methods:该实验使用了一个全面的 MLOps 工作流程,覆盖了问题定义、数据收集、数据准备、模型开发、模型部署、监测、管理、扩展性和合规遵守等重要阶段。实验还采用了一种系统化跟踪方法,以记录 especified 阶段之间的重复访问,以捕捉这些访问的原因。results:研究发现,MLOps 工作流程具有很强的融合和循环特性,并且具有很高的可重复性和可缩放性。通过对实验数据进行分析,提供了一些实践建议和推荐,以便在实际应用中进行进一步的优化和改进。
    Abstract This article presents an experiment focused on optimizing the MLOps (Machine Learning Operations) process, a crucial aspect of efficiently implementing machine learning projects. The objective is to identify patterns and insights to enhance the MLOps workflow, considering its iterative and interdependent nature in real-world model development scenarios. The experiment involves a comprehensive MLOps workflow, covering essential phases like problem definition, data acquisition, data preparation, model development, model deployment, monitoring, management, scalability, and governance and compliance. Practical tips and recommendations are derived from the results, emphasizing proactive planning and continuous improvement for the MLOps workflow. The experimental investigation was strategically integrated within a real-world ML project which followed essential phases of the MLOps process in a production environment, handling large-scale structured data. A systematic tracking approach was employed to document revisits to specific phases from a main phase under focus, capturing the reasons for such revisits. By constructing a matrix to quantify the degree of overlap between phases, the study unveils the dynamic and iterative nature of the MLOps workflow. The resulting data provides visual representations of the MLOps process's interdependencies and iterative characteristics within the experimental framework, offering valuable insights for optimizing the workflow and making informed decisions in real-world scenarios. This analysis contributes to enhancing the efficiency and effectiveness of machine learning projects through an improved MLOps process. Keywords: MLOps, Machine Learning Operations, Optimization, Experimental Analysis, Iterative Process, Pattern Identification.
    摘要 The experiment covers a comprehensive MLOps workflow, including problem definition, data acquisition, data preparation, model development, model deployment, monitoring, management, scalability, and governance and compliance. The results provide practical tips and recommendations for proactive planning and continuous improvement of the MLOps workflow.The experimental investigation was conducted within a real-world ML project, which followed the essential phases of the MLOps process in a production environment, handling large-scale structured data. A systematic tracking approach was employed to document revisits to specific phases, capturing the reasons for such revisits. By constructing a matrix to quantify the degree of overlap between phases, the study reveals the dynamic and iterative nature of the MLOps workflow.The resulting data provides visual representations of the MLOps process's interdependencies and iterative characteristics within the experimental framework, offering valuable insights for optimizing the workflow and making informed decisions in real-world scenarios. This analysis contributes to enhancing the efficiency and effectiveness of machine learning projects through an improved MLOps process.Keywords: MLOps, Machine Learning Operations, Optimization, Experimental Analysis, Iterative Process, Pattern Identification.

Extracting Molecular Properties from Natural Language with Multimodal Contrastive Learning

  • paper_url: http://arxiv.org/abs/2307.12996
  • repo_url: None
  • paper_authors: Romain Lacombe, Andrew Gaut, Jeff He, David Lüdeke, Kateryna Pistunova
  • for: 本研究旨在将科学知识从文本中转移到分子图表示,以推进计算生物化学中深度学习的发展。
  • methods: 研究者使用了对比学习将神经图表示与文本描述的特征相对转移,以提高分子性质预测性能。他们还提出了一种基于有机反应的新型分子图数据生成策略。
  • results: 研究者在下游的分子网络Property Classification任务上实现了+4.26%的AUROC提升,比Graph模式alone模型提升+1.54%。这表明将科学知识从文本中转移到分子图表示可以提高分子性质预测性能。
    Abstract Deep learning in computational biochemistry has traditionally focused on molecular graphs neural representations; however, recent advances in language models highlight how much scientific knowledge is encoded in text. To bridge these two modalities, we investigate how molecular property information can be transferred from natural language to graph representations. We study property prediction performance gains after using contrastive learning to align neural graph representations with representations of textual descriptions of their characteristics. We implement neural relevance scoring strategies to improve text retrieval, introduce a novel chemically-valid molecular graph augmentation strategy inspired by organic reactions, and demonstrate improved performance on downstream MoleculeNet property classification tasks. We achieve a +4.26% AUROC gain versus models pre-trained on the graph modality alone, and a +1.54% gain compared to recently proposed molecular graph/text contrastively trained MoMu model (Su et al. 2022).
    摘要 深度学习在计算生物化学中传统上专注于分子图神经表示;然而,最近的语言模型发展显示了科学知识在文本中的含义。为了融合这两种模式,我们研究如何从自然语言中提取分子性质信息并将其传递到图表示中。我们使用对比学习对神经图表示和文本描述中的特征进行对齐,并使用神经相关性分数策略来提高文本检索。我们还介绍了一种基于有机反应的新型化学Graph augmentation策略,并在下游MoleculeNet性质分类任务上达到了+4.26% AUROC提升和+1.54%提升 compared to MoMu模型(Su et al., 2022)。

Flight Contrail Segmentation via Augmented Transfer Learning with Novel SR Loss Function in Hough Space

  • paper_url: http://arxiv.org/abs/2307.12032
  • repo_url: https://github.com/junzis/contrail-net
  • paper_authors: Junzi Sun, Esther Roosenbrand
  • for: 检测飞行 contrails 从卫星图像中
  • methods: 基于增强转移学习的新模型,以及一种新的损失函数 SR Loss
  • results: 准确地检测 contrails WITH minimal data
    Abstract Air transport poses significant environmental challenges, particularly the contribution of flight contrails to climate change due to their potential global warming impact. Detecting contrails from satellite images has been a long-standing challenge. Traditional computer vision techniques have limitations under varying image conditions, and machine learning approaches using typical convolutional neural networks are hindered by the scarcity of hand-labeled contrail datasets and contrail-tailored learning processes. In this paper, we introduce an innovative model based on augmented transfer learning that accurately detects contrails with minimal data. We also propose a novel loss function, SR Loss, which improves contrail line detection by transforming the image space into Hough space. Our research opens new avenues for machine learning-based contrail detection in aviation research, offering solutions to the lack of large hand-labeled datasets, and significantly enhancing contrail detection models.
    摘要 空中交通对环境造成重要挑战,特别是飞行烟尘的潜在全球暖化影响。从卫星图像探测飞行烟尘是一项长期挑战。传统的计算机视觉技术在不同的图像条件下有限制,机器学习方法使用 Typical convolutional neural networks 也受到手动标注飞行烟尘数据的罕见性和适应飞行烟尘学习过程的限制。在这篇论文中,我们介绍了一种创新的模型,基于增强传输学习,可以准确地检测飞行烟尘,只需 minimal data。我们还提出了一种新的损失函数,SR Loss,它通过将图像空间转换为截距空间,提高了飞行烟尘线检测。我们的研究打开了新的机器学习基于飞行烟尘检测的可能性,解决了航空研究中缺乏大量手动标注数据的问题,并显著提高了飞行烟尘检测模型。

FinPT: Financial Risk Prediction with Profile Tuning on Pretrained Foundation Models

  • paper_url: http://arxiv.org/abs/2308.00065
  • repo_url: https://github.com/yuweiyin/finpt
  • paper_authors: Yuwei Yin, Yazheng Yang, Jian Yang, Qi Liu
  • for: 这研究旨在提出一种新的金融风险预测方法,以帮助金融机构更好地识别和预测风险。
  • methods: 该方法使用Profile Tuning技术,将大型预训模型粘贴到金融表格数据中,并通过提问大语言模型(LLMs)获取自然语言客户profile,进而进行预测。
  • results: 通过对FinBench数据集进行实验,研究人员发现FinPT方法可以与各种代表性的强基线进行比较,并且通过分析LLMs的性能,深入理解它们在金融风险预测中的应用。
    Abstract Financial risk prediction plays a crucial role in the financial sector. Machine learning methods have been widely applied for automatically detecting potential risks and thus saving the cost of labor. However, the development in this field is lagging behind in recent years by the following two facts: 1) the algorithms used are somewhat outdated, especially in the context of the fast advance of generative AI and large language models (LLMs); 2) the lack of a unified and open-sourced financial benchmark has impeded the related research for years. To tackle these issues, we propose FinPT and FinBench: the former is a novel approach for financial risk prediction that conduct Profile Tuning on large pretrained foundation models, and the latter is a set of high-quality datasets on financial risks such as default, fraud, and churn. In FinPT, we fill the financial tabular data into the pre-defined instruction template, obtain natural-language customer profiles by prompting LLMs, and fine-tune large foundation models with the profile text to make predictions. We demonstrate the effectiveness of the proposed FinPT by experimenting with a range of representative strong baselines on FinBench. The analytical studies further deepen the understanding of LLMs for financial risk prediction.
    摘要

A Flexible Framework for Incorporating Patient Preferences Into Q-Learning

  • paper_url: http://arxiv.org/abs/2307.12022
  • repo_url: None
  • paper_authors: Joshua P. Zitovsky, Leslie Wilson, Michael R. Kosorok
  • for: 这篇论文是为了解决现实世界医疗问题中的多个竞争结果问题而写的,包括治疗效果和不良反应的严重程度。
  • methods: 这篇论文提出了一种新的方法,即Latent Utility Q-Learning(LUQ-Learning),以解决现有方法的限制,包括只能处理单个时间点和两个结果、不能 incorporate自报病人偏好等。LUQ-Learning 使用隐藏模型方法,自然地扩展 Q-learning 到复合结果设定下,并采取理想的质量评价来对各个病人进行评价。
  • results: 在基于低背痛的实验中,我们的方法与多种基线方法进行比较,并在所有实验中达到了非常竞争性的实验性表现。
    Abstract In real-world healthcare problems, there are often multiple competing outcomes of interest, such as treatment efficacy and side effect severity. However, statistical methods for estimating dynamic treatment regimes (DTRs) usually assume a single outcome of interest, and the few methods that deal with composite outcomes suffer from important limitations. This includes restrictions to a single time point and two outcomes, the inability to incorporate self-reported patient preferences and limited theoretical guarantees. To this end, we propose a new method to address these limitations, which we dub Latent Utility Q-Learning (LUQ-Learning). LUQ-Learning uses a latent model approach to naturally extend Q-learning to the composite outcome setting and adopt the ideal trade-off between outcomes to each patient. Unlike previous approaches, our framework allows for an arbitrary number of time points and outcomes, incorporates stated preferences and achieves strong asymptotic performance with realistic assumptions on the data. We conduct simulation experiments based on an ongoing trial for low back pain as well as a well-known completed trial for schizophrenia. In all experiments, our method achieves highly competitive empirical performance compared to several alternative baselines.
    摘要 在现实医疗问题中,常常存在多个竞争的目的结果,如治疗效果和副作用严重程度。然而,统计方法 для估计动态治疗方案(DTR)通常假设单一的目的结果,而其中几种方法只能处理单个时间点和两个结果。这些方法还具有限制性,例如不能 incorporate自报病人喜好和有限的理论保证。为此,我们提出了一种新的方法,我们称之为潜在用户价值Q学习(LUQ-Learning)。LUQ-Learning 使用潜在模型方法来自然地扩展Q学习到复合结果设定下,并采取每个患者的理想妥协。不同于前一些方法,我们的框架允许任意数量的时间点和结果,并 incorporate 自报病人喜好,并实现强 asymptotic performance 在现实数据下,只需要有限的假设。我们在一个低肢瘤痛试验和一个已完成的躁闹症试验中进行了 simulations experiments。在所有实验中,我们的方法与多个基准方法相比,表现出了非常竞争的实验性。

Model Predictive Control (MPC) of an Artificial Pancreas with Data-Driven Learning of Multi-Step-Ahead Blood Glucose Predictors

  • paper_url: http://arxiv.org/abs/2307.12015
  • repo_url: None
  • paper_authors: Eleonora Maria Aiello, Mehrad Jaloli, Marzia Cescon
  • for: 这个研究是为了开发一个基于Linear Time-Varying(LTV)Model Predictive Control(MPC)框架的关闭循环胰岛素输送算法,用于治疗类型1 диабе尼(T1D)。
  • methods: 这个研究使用了一个数据驱动的多步预测血糖(BG)预测器,并将其与LTV MPC框架集成。而不是从数据中直接标定胰岛素逻辑系统的开放循环模型,这里提议直接使用BG预测器来预测未来的血糖水平。为非线性部分,使用了Long Short-Term Memory(LSTM)网络,而为线性部分,使用了线性回归模型。
  • results: 对于三个模拟场景,包括一个标准情况,一个随机餐食干扰情况,以及一个减少胰岛素敏感度25%的情况,我们证明了我们的LSTM-MPC控制器的优势。在随机餐食干扰情况下,我们的方法提供了更加准确的未来血糖水平预测,以及更好的封闭循环性能。
    Abstract We present the design and \textit{in-silico} evaluation of a closed-loop insulin delivery algorithm to treat type 1 diabetes (T1D) consisting in a data-driven multi-step-ahead blood glucose (BG) predictor integrated into a Linear Time-Varying (LTV) Model Predictive Control (MPC) framework. Instead of identifying an open-loop model of the glucoregulatory system from available data, we propose to directly fit the entire BG prediction over a predefined prediction horizon to be used in the MPC, as a nonlinear function of past input-ouput data and an affine function of future insulin control inputs. For the nonlinear part, a Long Short-Term Memory (LSTM) network is proposed, while for the affine component a linear regression model is chosen. To assess benefits and drawbacks when compared to a traditional linear MPC based on an auto-regressive with exogenous (ARX) input model identified from data, we evaluated the proposed LSTM-MPC controller in three simulation scenarios: a nominal case with 3 meals per day, a random meal disturbances case where meals were generated with a recently published meal generator, and a case with 25$\%$ decrease in the insulin sensitivity. Further, in all the scenarios, no feedforward meal bolus was administered. For the more challenging random meal generation scenario, the mean $\pm$ standard deviation percent time in the range 70-180 [mg/dL] was 74.99 $\pm$ 7.09 vs. 54.15 $\pm$ 14.89, the mean $\pm$ standard deviation percent time in the tighter range 70-140 [mg/dL] was 47.78$\pm$8.55 vs. 34.62 $\pm$9.04, while the mean $\pm$ standard deviation percent time in sever hypoglycemia, i.e., $<$ 54 [mg/dl] was 1.00$\pm$3.18 vs. 9.45$\pm$11.71, for our proposed LSTM-MPC controller and the traditional ARX-MPC, respectively. Our approach provided accurate predictions of future glucose concentrations and good closed-loop performances of the overall MPC controller.
    摘要 我们介绍了一种关闭Loop抗糖尿病(T1D)的设计和 simulate evaluate 的数据驱动多步预测血糖(BG)预测算法,包括一个基于线性时变(LTV)模型预测控制(MPC)框架的数据驱动多步预测算法。而不是直接从可用数据中Identify一个开 Loop模型的glucoregulatory系统,我们提议直接将整个BG预测 horizon为用于MPC,作为非线性函数过去输入输出数据和未来药物控制输入的非线性函数。 для非线性部分,我们提议使用一个Long Short-Term Memory(LSTM)网络,而对于线性部分,我们选择了一个线性回归模型。为了评估我们提议的LSTM-MPC控制器与传统的ARX-MPC控制器相比,我们在三个模拟场景中评估了这两个控制器的表现:一个标准的3餐/天场景,一个随机餐品干扰场景,以及一个25%的药物敏感度下降场景。此外,在所有场景中,没有feedforward餐品补偿。在更加复杂的随机餐品生成场景中,LSTM-MPC控制器的mean±标准差%时间在70-180[mg/dL]范围内为74.99±7.09 vs. 54.15±14.89,mean±标准差%时间在70-140[mg/dL]范围内为47.78±8.55 vs. 34.62±9.04,而且mean±标准差%时间在严重低血糖(<54[mg/dL])下为1.00±3.18 vs. 9.45±11.71。我们的方法提供了精准的未来血糖浓度预测和关闭Loop控制器的全面性能的良好表现。

NLCUnet: Single-Image Super-Resolution Network with Hairline Details

  • paper_url: http://arxiv.org/abs/2307.12014
  • repo_url: None
  • paper_authors: Jiancong Feng, Yuan-Gen Wang, Fengchuang Xing
  • For: 提高单张超解像图像质量,特别是细节部分的精度。* Methods: 提出了一种基于非本地注意力的单张超解像网络(NLCUnet),包括三个核心设计:非本地注意力机制、深度卷积 convolution 和通道注意力。* Results: 在DF2K dataset上进行了许多实验,发现 NLCUnet 在 PSNR 和 SSIM 指标上比现有方法提高较多,并且可以保持更好的细节部分。
    Abstract Pursuing the precise details of super-resolution images is challenging for single-image super-resolution tasks. This paper presents a single-image super-resolution network with hairline details (termed NLCUnet), including three core designs. Specifically, a non-local attention mechanism is first introduced to restore local pieces by learning from the whole image region. Then, we find that the blur kernel trained by the existing work is unnecessary. Based on this finding, we create a new network architecture by integrating depth-wise convolution with channel attention without the blur kernel estimation, resulting in a performance improvement instead. Finally, to make the cropped region contain as much semantic information as possible, we propose a random 64$\times$64 crop inside the central 512$\times$512 crop instead of a direct random crop inside the whole image of 2K size. Numerous experiments conducted on the benchmark DF2K dataset demonstrate that our NLCUnet performs better than the state-of-the-art in terms of the PSNR and SSIM metrics and yields visually favorable hairline details.
    摘要 推进超高清照片的精确细节是单图超解像 зада务中的挑战。本文提出了一个单图超解像网络(NLCUnet),包括三个核心设计。具体来说,我们首先引入非本地注意力机制,以便通过整个图像区域学习地址本地副本。然后,我们发现现有工作中训练的模糊核心不是必需的,因此我们创建了一个新的网络架构,通过depthwise核论和通道注意力来提高性能。最后,我们提议在中心256×256区域中随机选择64×64区域,以便尽可能包含图像中的semantic信息。在DF2K数据集上进行了多次实验,表明我们的NLCUnet在PSNR和SSIM指标上比state-of-the-art更高,并且视觉上具有更好的毛细膨胀细节。

Contrastive Self-Supervised Learning Based Approach for Patient Similarity: A Case Study on Atrial Fibrillation Detection from PPG Signal

  • paper_url: http://arxiv.org/abs/2308.02433
  • repo_url: https://github.com/subangkar/simsig
  • paper_authors: Subangkar Karmaker Shanto, Shoumik Saha, Atif Hasan Rahman, Mohammad Mehedy Masud, Mohammed Eunus Ali
  • for: 这个论文是为了提出一种基于对比学习的深度学习框架,用于搜索基于生物 физи学信号的病人相似性。
  • methods: 这个框架使用对比学习方法来学习病人的相似embedding,并引入了一些邻居选择算法来确定生成embedding上的最高相似性。
  • results: 作者通过对一个涉及到心脏病的案例研究,证明了该框架的有效性。实验结果表明,该框架可以准确地检测心脏病AF,并且与其他基线方法相比,其性能更高。
    Abstract In this paper, we propose a novel contrastive learning based deep learning framework for patient similarity search using physiological signals. We use a contrastive learning based approach to learn similar embeddings of patients with similar physiological signal data. We also introduce a number of neighbor selection algorithms to determine the patients with the highest similarity on the generated embeddings. To validate the effectiveness of our framework for measuring patient similarity, we select the detection of Atrial Fibrillation (AF) through photoplethysmography (PPG) signals obtained from smartwatch devices as our case study. We present extensive experimentation of our framework on a dataset of over 170 individuals and compare the performance of our framework with other baseline methods on this dataset.
    摘要 在本文中,我们提出了一种基于对比学习的深度学习框架,用于通过生物物理信号来查找病人相似性。我们使用对比学习方法来学习病人的相似 embedding,并引入了一些邻居选择算法来确定生成 embedding 中最相似的病人。为了证明我们的框架的有效性,我们选择了基于 photoplethysmography (PPG) 信号检测 Atrial Fibrillation (AF) 为我们的案例研究。我们对一个包含超过 170 个个体的数据集进行了广泛的实验,并与其他基线方法进行比较。

Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering

  • paper_url: http://arxiv.org/abs/2307.11986
  • repo_url: https://github.com/holipori/mimic-diff-vqa
  • paper_authors: Xinyue Hu, Lin Gu, Qiyuan An, Mengliang Zhang, Liangchen Liu, Kazuma Kobayashi, Tatsuya Harada, Ronald M. Summers, Yingying Zhu
  • for: 这 paper 的目的是提出一个新的胸部X射影差异视觉问答任务 (VQA),以帮助自动化医疗视觉语言模型。
  • methods: 这 paper 使用了一种新的专家知识感知图表学习模型,将图像差异视觉问答任务解决。该模型利用了 анатомиче结构优先知识、semantic知识和空间知识等专家知识来构建多关系图,表示图像差异的问答任务。
  • results: 这 paper 收集了一个新的数据集,名为 MIMIC-Diff-VQA,包含 700,703 个问答对from 164,324 对主要和参考图像。与现有的医疗 VQA 数据集相比,这些问题更加适合临床诊断实践中的诊断- intervene-评估过程。
    Abstract To contribute to automating the medical vision-language model, we propose a novel Chest-Xray Difference Visual Question Answering (VQA) task. Given a pair of main and reference images, this task attempts to answer several questions on both diseases and, more importantly, the differences between them. This is consistent with the radiologist's diagnosis practice that compares the current image with the reference before concluding the report. We collect a new dataset, namely MIMIC-Diff-VQA, including 700,703 QA pairs from 164,324 pairs of main and reference images. Compared to existing medical VQA datasets, our questions are tailored to the Assessment-Diagnosis-Intervention-Evaluation treatment procedure used by clinical professionals. Meanwhile, we also propose a novel expert knowledge-aware graph representation learning model to address this task. The proposed baseline model leverages expert knowledge such as anatomical structure prior, semantic, and spatial knowledge to construct a multi-relationship graph, representing the image differences between two images for the image difference VQA task. The dataset and code can be found at https://github.com/Holipori/MIMIC-Diff-VQA. We believe this work would further push forward the medical vision language model.
    摘要 为了让医疗视语言模型自动化,我们提出了一个新的胸部X射影异常视问答(VQA)任务。给定一对主要和参考图像,这个任务的目标是回答一些疾病和图像之间的异常问题。这与医生诊断实践相一致,即将当前图像与参考图像进行比较,以确定报告。我们收集了一个新的数据集,即MIMIC-Diff-VQA,包含700703个问答对 from 164324对主要和参考图像。与现有的医学VQA数据集相比,我们的问题更加适合医生在诊断过程中采用的评估-诊断- interven-评估(ADIE)治疗流程。此外,我们还提出了一种基于专家知识的图像异常关系学习模型,以解决这个任务。我们的基eline模型利用专家知识,如生物结构优先知识、semantic知识和空间知识,构建多关系图,表示图像之间的异常关系。数据集和代码可以在https://github.com/Holipori/MIMIC-Diff-VQA中找到。我们认为这项工作将会进一步推动医学视语言模型的发展。

Collaborative Graph Neural Networks for Attributed Network Embedding

  • paper_url: http://arxiv.org/abs/2307.11981
  • repo_url: https://github.com/qiaoyut/conn
  • paper_authors: Qiaoyu Tan, Xin Zhang, Xiao Huang, Hao Chen, Jundong Li, Xia Hu
    for: This paper focuses on developing a new graph neural network (GNN) architecture called COllaborative graph Neural Networks (CONN) to improve attribute network embedding.methods: The proposed CONN architecture uses selective message diffusion and cross-correlation to jointly reconstruct node-to-node and node-to-attribute-category interactions, which enhances the model’s capacity.results: The experimental results on real-world networks show that CONN outperforms state-of-the-art embedding algorithms with a significant margin.
    Abstract Graph neural networks (GNNs) have shown prominent performance on attributed network embedding. However, existing efforts mainly focus on exploiting network structures, while the exploitation of node attributes is rather limited as they only serve as node features at the initial layer. This simple strategy impedes the potential of node attributes in augmenting node connections, leading to limited receptive field for inactive nodes with few or even no neighbors. Furthermore, the training objectives (i.e., reconstructing network structures) of most GNNs also do not include node attributes, although studies have shown that reconstructing node attributes is beneficial. Thus, it is encouraging to deeply involve node attributes in the key components of GNNs, including graph convolution operations and training objectives. However, this is a nontrivial task since an appropriate way of integration is required to maintain the merits of GNNs. To bridge the gap, in this paper, we propose COllaborative graph Neural Networks--CONN, a tailored GNN architecture for attribute network embedding. It improves model capacity by 1) selectively diffusing messages from neighboring nodes and involved attribute categories, and 2) jointly reconstructing node-to-node and node-to-attribute-category interactions via cross-correlation. Experiments on real-world networks demonstrate that CONN excels state-of-the-art embedding algorithms with a great margin.
    摘要 GRAPH Neural Networks (GNNs) 有出色表现在嵌入属性网络中。然而,现有努力主要是利用网络结构,而忽视节点特征的利用,只是将节点特征作为初始层节点特征使用。这种简单的策略限制了无活节点的潜在范围,因为它们有少量或甚至没有邻居。此外,大多数 GNN 的训练目标(即重建网络结构)并不包括节点特征,尽管研究表明重建节点特征有利。因此,深入涉及节点特征在 GNN 的关键组件中是一项挑战,需要避免降低 GNN 的优点。为了bridging这个差距,在这篇论文中,我们提出了协同图 neural Networks(CONN),一种针对嵌入属性网络的特化 GNN 架构。它提高了模型容量,通过1) 选择性地往返邻居节点和涉及属性类别中传递消息,2) 并同时重建节点到节点和节点到属性类别的交互。实验表明,CONN 在实际网络上超过了当前领先 embedding 算法的性能。

Simulation of Arbitrary Level Contrast Dose in MRI Using an Iterative Global Transformer Model

  • paper_url: http://arxiv.org/abs/2307.11980
  • repo_url: None
  • paper_authors: Dayang Wang, Srivathsa Pasumarthi, Greg Zaharchuk, Ryan Chamberlain
  • for: 这个研究旨在提出一种基于卷积神经网络的图像合成方法,以实现不同剂量水平的对照增强图像的生成,以便为MRI成像中的医学应用提供更好的依据。
  • methods: 该方法基于一种名为Gformer的变换器,其包括一种抽样基于注意力机制和一种旋转 shift模块,以捕捉不同对照增强特征。
  • results: 对比其他状态艺技术,该方法的评估结果表明其性能更高。此外,该方法还在下游任务中,如剂量减少和肿瘤分割中进行了评估,以证明其在临床应用中的价值。
    Abstract Deep learning (DL) based contrast dose reduction and elimination in MRI imaging is gaining traction, given the detrimental effects of Gadolinium-based Contrast Agents (GBCAs). These DL algorithms are however limited by the availability of high quality low dose datasets. Additionally, different types of GBCAs and pathologies require different dose levels for the DL algorithms to work reliably. In this work, we formulate a novel transformer (Gformer) based iterative modelling approach for the synthesis of images with arbitrary contrast enhancement that corresponds to different dose levels. The proposed Gformer incorporates a sub-sampling based attention mechanism and a rotational shift module that captures the various contrast related features. Quantitative evaluation indicates that the proposed model performs better than other state-of-the-art methods. We further perform quantitative evaluation on downstream tasks such as dose reduction and tumor segmentation to demonstrate the clinical utility.
    摘要 深度学习(DL)基于对比剂量减少和消除在MRI成像中得到了进一步的发展,因为Gadolinium-based Contrast Agents(GBCAs)的负面效应。但这些DL算法受到高质量低剂量数据的有效性的限制。此外,不同类型的GBCAs和疾病需要不同的剂量水平以便DL算法可靠地工作。在这种工作中,我们提出了一种基于转换器(Gformer)的迭代模型方法,用于生成具有任意对比强化的图像。我们的Gformer模型包括子抽样基于注意力机制和旋转变换模块,以捕捉不同的对比相关特征。量化评估表明,我们提出的模型在其他状态当前的方法之上表现出了更好的性能。我们进一步进行了下游任务如剂量减少和肿瘤分割,以证明临床实用性。

Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels?

  • paper_url: http://arxiv.org/abs/2307.11978
  • repo_url: https://github.com/cewu/ptnl
  • paper_authors: Cheng-En Wu, Yu Tian, Haichao Yu, Heng Wang, Pedro Morgado, Yu Hen Hu, Linjie Yang
  • for: 研究了CLIP模型在干预几个示例下调整为新的分类任务中的稳定性。
  • methods: 使用了几个示例来调整CLIP模型,并发现这种方法具有很高的抗噪性。
  • results: 发现了两个关键因素导致这种方法的稳定性:1)固定的类名Token提供了模型优化过程中强制的正则化,减少了噪音样本引起的梯度; 2)从多样化和通用的网络数据中学习的强大预训练图文映射提供了图像分类的强大先验知识。此外,我们还示出了使用CLIP模型自己的噪音零例预测来调整其自己的提示,可以显著提高无监督下的预测精度。代码可以在https://github.com/CEWu/PTNL中找到。
    Abstract Vision-language models such as CLIP learn a generic text-image embedding from large-scale training data. A vision-language model can be adapted to a new classification task through few-shot prompt tuning. We find that such a prompt tuning process is highly robust to label noises. This intrigues us to study the key reasons contributing to the robustness of the prompt tuning paradigm. We conducted extensive experiments to explore this property and find the key factors are: 1) the fixed classname tokens provide a strong regularization to the optimization of the model, reducing gradients induced by the noisy samples; 2) the powerful pre-trained image-text embedding that is learned from diverse and generic web data provides strong prior knowledge for image classification. Further, we demonstrate that noisy zero-shot predictions from CLIP can be used to tune its own prompt, significantly enhancing prediction accuracy in the unsupervised setting. The code is available at https://github.com/CEWu/PTNL.
    摘要 CLIP类的视觉语言模型通过大规模训练学习一个通用的文本图像嵌入。一个视觉语言模型可以通过几个shot提问调整到新的分类任务。我们发现这种提问调整过程具有高度的鲁棒性,这使我们感到感兴趣,并且想 deeper 地研究这种特性的原因。我们进行了广泛的实验,并发现关键因素有两个:1)固定的类名token提供了模型优化的强制性,减少了噪音样本引起的梯度;2)通过多种和通用的网络数据学习的强大预训练图像文本嵌入,为图像分类提供了强大的先验知识。此外,我们示出了使用CLIP生成的噪音零shot预测来调整其自己的提问,可以大幅提高无监督下的预测精度。代码可以在https://github.com/CEWu/PTNL 中找到。

Out-of-Distribution Optimality of Invariant Risk Minimization

  • paper_url: http://arxiv.org/abs/2307.11972
  • repo_url: None
  • paper_authors: Shoji Toyota, Kenji Fukumizu
  • for: 提高深度神经网络的泛化能力,即使在未经见过的领域下也能准确预测。
  • methods: 使用偏向风险最小化(IRM)方法,解决深度神经网络继承训练数据中嵌入的假 correlations 问题,以提高模型的泛化能力。
  • results: 提供了一种理论保证,表明在满足certain conditions下,bi-level optimization problem的解决方案会最小化异常风险。
    Abstract Deep Neural Networks often inherit spurious correlations embedded in training data and hence may fail to generalize to unseen domains, which have different distributions from the domain to provide training data. M. Arjovsky et al. (2019) introduced the concept out-of-distribution (o.o.d.) risk, which is the maximum risk among all domains, and formulated the issue caused by spurious correlations as a minimization problem of the o.o.d. risk. Invariant Risk Minimization (IRM) is considered to be a promising approach to minimize the o.o.d. risk: IRM estimates a minimum of the o.o.d. risk by solving a bi-level optimization problem. While IRM has attracted considerable attention with empirical success, it comes with few theoretical guarantees. Especially, a solid theoretical guarantee that the bi-level optimization problem gives the minimum of the o.o.d. risk has not yet been established. Aiming at providing a theoretical justification for IRM, this paper rigorously proves that a solution to the bi-level optimization problem minimizes the o.o.d. risk under certain conditions. The result also provides sufficient conditions on distributions providing training data and on a dimension of feature space for the bi-leveled optimization problem to minimize the o.o.d. risk.
    摘要 深度神经网络经常会继承训练数据中嵌入的假 correlations,从而导致在未看到的领域中失败,这些领域的分布与训练数据的分布不同。M. Arjovsky等人(2019)引入了 OUT-OF-DISTRIBUTION(o.o.d)风险,它是所有领域的最大风险,并将嵌入在训练数据中的假 correlations 问题定义为一个 minimization 问题。不变risk Minimization (IRM) 被视为一种有前景的方法来减少 o.o.d. 风险:IRM 通过解决一个二级优化问题来估算 o.o.d. 风险的最小值。虽然 IRM 在实际中得到了广泛的关注并取得了一些成功,但它具有少量的理论保证。特别是,一个坚实的理论保证,即二级优化问题的解决方案实际上是 o.o.d. 风险的最小值,尚未被成功地建立。本文通过坚实的理论证明,解决二级优化问题可以减少 o.o.d. 风险,并提供了一些有关训练数据的分布和特征空间维度的充分条件。

DHC: Dual-debiased Heterogeneous Co-training Framework for Class-imbalanced Semi-supervised Medical Image Segmentation

  • paper_url: http://arxiv.org/abs/2307.11960
  • repo_url: https://github.com/xmed-lab/dhc
  • paper_authors: Haonan Wang, Xiaomeng Li
  • for: 这个研究的目的是提出一个基于 semi-supervised learning (SSL) 的三维医疗影像分类框架,以解决对于医疗影像分类的专家需求和时间耗费问题。
  • methods: 这个框架使用了一个新的 Dual-debiased Heterogeneous Co-training (DHC) 方法,包括两种损失衡量策略:Distribution-aware Debiased Weighting (DistDW) 和 Difficulty-aware Debiased Weighting (DiffDW),这些策略可以动态地使用 Pseudo 标签来导引模型解决数据和学习偏见。
  • results: 实验结果显示,提出的方法可以将 pseudo 标签用于偏见调整和纠正阶层分类问题,并且与现有的 SSL 方法比较,显示出我们的方法在更加具体的 SSL 设定下表现更好。代码和模型可以在 GitHub 上找到:https://github.com/xmed-lab/DHC.
    Abstract The volume-wise labeling of 3D medical images is expertise-demanded and time-consuming; hence semi-supervised learning (SSL) is highly desirable for training with limited labeled data. Imbalanced class distribution is a severe problem that bottlenecks the real-world application of these methods but was not addressed much. Aiming to solve this issue, we present a novel Dual-debiased Heterogeneous Co-training (DHC) framework for semi-supervised 3D medical image segmentation. Specifically, we propose two loss weighting strategies, namely Distribution-aware Debiased Weighting (DistDW) and Difficulty-aware Debiased Weighting (DiffDW), which leverage the pseudo labels dynamically to guide the model to solve data and learning biases. The framework improves significantly by co-training these two diverse and accurate sub-models. We also introduce more representative benchmarks for class-imbalanced semi-supervised medical image segmentation, which can fully demonstrate the efficacy of the class-imbalance designs. Experiments show that our proposed framework brings significant improvements by using pseudo labels for debiasing and alleviating the class imbalance problem. More importantly, our method outperforms the state-of-the-art SSL methods, demonstrating the potential of our framework for the more challenging SSL setting. Code and models are available at: https://github.com/xmed-lab/DHC.
    摘要 医学三维图像的体积级标注是专业技术和时间consuming的;因此使用限制标注数据的 semi-supervised learning (SSL) 是非常有优点的。然而,实际应用中存在严重的类别分布不均问题,这个问题未得到充分关注。为解决这个问题,我们提出了一种新的双向偏置共训(DHC)框架,用于 semi-supervised 三维医学图像分割。我们提出了两种损失补偿策略,即 Distribution-aware Debiased Weighting(DistDW)和 Difficulty-aware Debiased Weighting(DiffDW),这两种策略可以动态使用 pseudo labels 来引导模型解决数据和学习偏见。我们的框架在合作这两个多样和准确的子模型时得到了显著改进。我们还提出了更加代表性的 semi-supervised 医学图像分割 benchmark,可以全面展示我们的类别偏见设计的效果。实验表明,我们的提议的框架可以通过使用 pseudo labels 进行偏见修正和缓解类别偏见问题,并且超越了当前状态的 SSL 方法,表明了我们的框架在更加挑战的 SSL 设定下的潜在力量。代码和模型可以在 GitHub 上找到:https://github.com/xmed-lab/DHC。

Multi-representations Space Separation based Graph-level Anomaly-aware Detection

  • paper_url: http://arxiv.org/abs/2307.12994
  • repo_url: None
  • paper_authors: Fu Lin, Haonan Gong, Mingkang Li, Zitong Wang, Yue Zhang, Xuexiong Luo
  • for: 本研究的目标是检测图DataSet中的异常图。
  • methods: 我们提出了一种基于多个表示空间分离的图级异常检测框架。为了考虑不同类型的异常图数据的重要性,我们设计了一个异常感知模块来学习特定的节点级和图级异常重要性。此外,我们学习了严格地分离正常和异常图表示空间,通过四种不同的权重图表示对比彼此。
  • results: 我们对基eline方法进行了广泛的评估,并通过十个公共图数据集来评估我们的方法。结果表明,我们的方法具有效果。
    Abstract Graph structure patterns are widely used to model different area data recently. How to detect anomalous graph information on these graph data has become a popular research problem. The objective of this research is centered on the particular issue that how to detect abnormal graphs within a graph set. The previous works have observed that abnormal graphs mainly show node-level and graph-level anomalies, but these methods equally treat two anomaly forms above in the evaluation of abnormal graphs, which is contrary to the fact that different types of abnormal graph data have different degrees in terms of node-level and graph-level anomalies. Furthermore, abnormal graphs that have subtle differences from normal graphs are easily escaped detection by the existing methods. Thus, we propose a multi-representations space separation based graph-level anomaly-aware detection framework in this paper. To consider the different importance of node-level and graph-level anomalies, we design an anomaly-aware module to learn the specific weight between them in the abnormal graph evaluation process. In addition, we learn strictly separate normal and abnormal graph representation spaces by four types of weighted graph representations against each other including anchor normal graphs, anchor abnormal graphs, training normal graphs, and training abnormal graphs. Based on the distance error between the graph representations of the test graph and both normal and abnormal graph representation spaces, we can accurately determine whether the test graph is anomalous. Our approach has been extensively evaluated against baseline methods using ten public graph datasets, and the results demonstrate its effectiveness.
    摘要 GRAPH结构模式在近期内广泛应用于不同领域的数据模型中。检测图数据中异常Graph信息已成为一个流行的研究问题。我们的研究 objective 是 centered 在特定的问题上,即如何在图数据中检测异常图。前一些研究发现,异常图主要表现为节点级别和图级别异常,但这些方法很容易对两种异常形态进行等效的评估,这与实际情况不符。此外,一些异常图具有轻微异常特征,容易被现有方法排除。因此,我们提出了一个基于多个 Representation space 的图级别异常检测框架。为了考虑不同的节点级别和图级别异常的重要性,我们设计了一个异常检测模块,以学习特定的节点级别和图级别异常之间的权重。此外,我们通过四种不同类型的权重化图表示对之间的竞争学习,以学习纯正的正常图表示空间和异常图表示空间。通过测试图表示与正常图表示空间和异常图表示空间之间的距离错误来准确判断测试图是否异常。我们的方法在比基线方法进行evaluate 后得到了显著的效果。

High-performance real-world optical computing trained by in situ model-free optimization

  • paper_url: http://arxiv.org/abs/2307.11957
  • repo_url: None
  • paper_authors: Guangyuan Zhao, Xin Shu, Renjie Zhou
  • for: 提高光学计算系统的高速和低能耗数据处理能力,并解决 simulation-to-reality gap。
  • methods: 使用 score gradient estimation 算法,对光学系统进行模型独立优化,不需要 computation-heavy 和偏见的系统模拟。
  • results: 在 MNIST 和 FMNIST 数据集上实现了高精度分类,并在无图像和高速细胞分析中展示了潜在的应用前景。
    Abstract Optical computing systems can provide high-speed and low-energy data processing but face deficiencies in computationally demanding training and simulation-to-reality gap. We propose a model-free solution for lightweight in situ optimization of optical computing systems based on the score gradient estimation algorithm. This approach treats the system as a black box and back-propagates loss directly to the optical weights' probabilistic distributions, hence circumventing the need for computation-heavy and biased system simulation. We demonstrate a superior classification accuracy on the MNIST and FMNIST datasets through experiments on a single-layer diffractive optical computing system. Furthermore, we show its potential for image-free and high-speed cell analysis. The inherent simplicity of our proposed method, combined with its low demand for computational resources, expedites the transition of optical computing from laboratory demonstrations to real-world applications.
    摘要 光学计算系统可以提供高速和低能耗数据处理,但面临 computationally demanding 训练和实际-模拟之间的差距。我们提出了一种模型自由的解决方案,基于分布式权重的排名预测算法,用于优化光学计算系统。这种方法将系统视为黑盒子,直接从损失函数反射到光学权重的概率分布,因此不需要计算负担重和偏见的系统模拟。我们通过对单层散射光学计算系统进行实验,在 MNIST 和 FMNIST 数据集上达到了更高的分类精度。此外,我们还示出了无图像和高速细胞分析的潜在可能性。我们的提议的简单性和计算资源的低需求,使得光学计算从实验室示范转移到实际应用变得更加容易。

Pūioio: On-device Real-Time Smartphone-Based Automated Exercise Repetition Counting System

  • paper_url: http://arxiv.org/abs/2308.02420
  • repo_url: None
  • paper_authors: Adam Sinclair, Kayla Kautai, Seyed Reza Shahamiri
  • for: 这个研究的目的是为了开发一个可靠且低成本的手机应用程序,可以在实时进行运动重复计数。
  • methods: 这个研究使用了深度学习技术,搭配手机摄像头进行运动重复计数。系统包括五个组件:(1)姿势估计、(2)阈值分类、(3)流动性、(4)状态机器、(5)计数器。
  • results: 这个系统在实际测试中精度高达98.89%,并且在预先录影的数据集中也达到98.85%的准确性。这使得这个系统成为一个有效、低成本且便捷的选择,不需要特殊的仪器或网络连接。
    Abstract Automated exercise repetition counting has applications across the physical fitness realm, from personal health to rehabilitation. Motivated by the ubiquity of mobile phones and the benefits of tracking physical activity, this study explored the feasibility of counting exercise repetitions in real-time, using only on-device inference, on smartphones. In this work, after providing an extensive overview of the state-of-the-art automatic exercise repetition counting methods, we introduce a deep learning based exercise repetition counting system for smartphones consisting of five components: (1) Pose estimation, (2) Thresholding, (3) Optical flow, (4) State machine, and (5) Counter. The system is then implemented via a cross-platform mobile application named P\=uioio that uses only the smartphone camera to track repetitions in real time for three standard exercises: Squats, Push-ups, and Pull-ups. The proposed system was evaluated via a dataset of pre-recorded videos of individuals exercising as well as testing by subjects exercising in real time. Evaluation results indicated the system was 98.89% accurate in real-world tests and up to 98.85% when evaluated via the pre-recorded dataset. This makes it an effective, low-cost, and convenient alternative to existing solutions since the proposed system has minimal hardware requirements without requiring any wearable or specific sensors or network connectivity.
    摘要 自动化的运动重复计数有各种应用在身体健身和重建领域,从个人健康到rehabilitation。为了利用移动电话的普遍性和跟踪物理活动的利点,这项研究探索了使用移动电话上的只有设备推理来实时计数运动重复的可能性。在这项研究中,我们首先提供了现有自动运动重复计数方法的广泛概述,然后引入了一种基于深度学习的运动重复计数系统,该系统由五个组成部分:(1)姿势估计,(2)阈值分割,(3)Optical flow,(4)状态机和(5)计数器。这个系统然后通过一个跨平台移动应用程序 named P\=uioio 实现,该应用程序使用了移动电话摄像头来实时跟踪运动重复,并对三种标准运动进行测试:蹲squats,推push-ups和抓pull-ups。我们对这个系统进行了一系列测试和评估,测试结果表明该系统在实际测试中的准确率达98.89%,并且在预录视频数据集上的评估结果为98.85%。这使得该系统成为一个有效、低成本、方便的替代方案,因为它没有特殊的硬件需求,也没有需要佩戴式设备或特殊的传感器或网络连接。

Implicit Interpretation of Importance Weight Aware Updates

  • paper_url: http://arxiv.org/abs/2307.11955
  • repo_url: None
  • paper_authors: Keyi Chen, Francesco Orabona
  • for: 这篇论文主要是为了解释importance weight aware(IWA)更新法的性能优劣。
  • methods: 论文使用了一种新的框架,即通用隐式跟踪领导者(FTRL),来分析通用隐式更新法。
  • results: 论文表明,IWA更新法在在线学习设置中具有更好的 regret upper bound,比plain gradient更新法更好。
    Abstract Due to its speed and simplicity, subgradient descent is one of the most used optimization algorithms in convex machine learning algorithms. However, tuning its learning rate is probably its most severe bottleneck to achieve consistent good performance. A common way to reduce the dependency on the learning rate is to use implicit/proximal updates. One such variant is the Importance Weight Aware (IWA) updates, which consist of infinitely many infinitesimal updates on each loss function. However, IWA updates' empirical success is not completely explained by their theory. In this paper, we show for the first time that IWA updates have a strictly better regret upper bound than plain gradient updates in the online learning setting. Our analysis is based on the new framework, generalized implicit Follow-the-Regularized-Leader (FTRL) (Chen and Orabona, 2023), to analyze generalized implicit updates using a dual formulation. In particular, our results imply that IWA updates can be considered as approximate implicit/proximal updates.
    摘要 由于其速度和简洁性,剪梯下降是机器学习中最常用的优化算法之一。然而,调整学习率是它最严重的瓶颈,以实现一致的好表现。一种常见的方法是使用隐式/辅助更新。一种such variant是重要性评估(IWA)更新,它们包括无限多个infinitesimal更新。然而, IWA更新的实际成功并不完全由其理论来解释。在这篇论文中,我们展示了IWA更新在在线学习 Setting中具有更好的 regret upper bound,比普通的梯度更新更好。我们的分析基于新的框架,通用隐式 Follow-the-Regularized-Leader(FTRL)(Chen和Orabona,2023),用于分析通用隐式更新。特别是,我们的结果表明,IWA更新可以被视为approximate隐式/辅助更新。

On-Robot Bayesian Reinforcement Learning for POMDPs

  • paper_url: http://arxiv.org/abs/2307.11954
  • repo_url: None
  • paper_authors: Hai Nguyen, Sammie Katt, Yuchen Xiao, Christopher Amato
  • for: 这篇论文的目的是提出一种专门适用于物理系统的 bayesian 强化学习方法,以解决 robot 学习中的数据成本问题。
  • methods: 该方法使用了一种特殊的 factored 表示方法,以捕捉专家知识,并使用 Monte-Carlo tree search 和 particle filtering 来解决 posterior 的推理问题。
  • results: 在两个人机交互任务中,该方法可以在几个实际世界回合后达到 near-optimal 性能,并且可以利用 typical low-level robot simulators 和处理未知环境的不确定性。
    Abstract Robot learning is often difficult due to the expense of gathering data. The need for large amounts of data can, and should, be tackled with effective algorithms and leveraging expert information on robot dynamics. Bayesian reinforcement learning (BRL), thanks to its sample efficiency and ability to exploit prior knowledge, is uniquely positioned as such a solution method. Unfortunately, the application of BRL has been limited due to the difficulties of representing expert knowledge as well as solving the subsequent inference problem. This paper advances BRL for robotics by proposing a specialized framework for physical systems. In particular, we capture this knowledge in a factored representation, then demonstrate the posterior factorizes in a similar shape, and ultimately formalize the model in a Bayesian framework. We then introduce a sample-based online solution method, based on Monte-Carlo tree search and particle filtering, specialized to solve the resulting model. This approach can, for example, utilize typical low-level robot simulators and handle uncertainty over unknown dynamics of the environment. We empirically demonstrate its efficiency by performing on-robot learning in two human-robot interaction tasks with uncertainty about human behavior, achieving near-optimal performance after only a handful of real-world episodes. A video of learned policies is at https://youtu.be/H9xp60ngOes.
    摘要 由于数据收集成本高昂,机器人学习通常十分困难。为了解决这一问题,我们可以采用有效的算法并利用有关机器人动力学的专家知识。贝叶斯强化学习(BRL)因其样本效率高且能够利用先验知识,天然适合作为这类问题的解决方案。然而,由于专家知识难以表示以及随之而来的推断问题,BRL 的应用一直受到限制。本文针对物理系统提出了一种专门的框架,以推进 BRL 在机器人领域的应用。具体而言,我们用分解式表示来刻画这些专家知识,证明后验分布也以类似的形式分解,并最终在贝叶斯框架下形式化该模型。随后,我们引入了一种基于蒙特卡洛树搜索(Monte-Carlo tree search)和粒子滤波(particle filtering)的在线采样求解方法,专门用于求解所得到的模型。该方法可以利用常见的低层机器人模拟器,并处理环境动力学未知带来的不确定性。我们在两个存在人类行为不确定性的人机交互任务中进行了在机器人上的学习实验,结果表明该方法只需少量真实世界回合即可达到近似最优的性能。学习到的策略视频见 https://youtu.be/H9xp60ngOes 。

HIQL: Offline Goal-Conditioned RL with Latent States as Actions

  • paper_url: http://arxiv.org/abs/2307.11949
  • repo_url: https://github.com/seohongpark/hiql
  • paper_authors: Seohong Park, Dibya Ghosh, Benjamin Eysenbach, Sergey Levine
  • for: 这个论文旨在提出一种基于非监督学习的目标conditioned reinforcement learning算法,可以从无标签数据中学习。
  • methods: 该算法使用一个action-free value function,通过层次分解来学习两个策略:一个高级策略使得状态被看作动作,预测子目标,以及一个低级策略预测达到子目标的行动。
  • results: 通过分析和实践示例, authors表明该层次分解使得其方法具有对噪音估计值函数的 Robustness。然后,通过应用该方法于offline目标 дости达标准别件,authors证明其方法可以解决远程目标任务,可以扩展到高维图像观察数据,并可以充分利用无动作数据。
    Abstract Unsupervised pre-training has recently become the bedrock for computer vision and natural language processing. In reinforcement learning (RL), goal-conditioned RL can potentially provide an analogous self-supervised approach for making use of large quantities of unlabeled (reward-free) data. However, building effective algorithms for goal-conditioned RL that can learn directly from diverse offline data is challenging, because it is hard to accurately estimate the exact value function for faraway goals. Nonetheless, goal-reaching problems exhibit structure, such that reaching distant goals entails first passing through closer subgoals. This structure can be very useful, as assessing the quality of actions for nearby goals is typically easier than for more distant goals. Based on this idea, we propose a hierarchical algorithm for goal-conditioned RL from offline data. Using one action-free value function, we learn two policies that allow us to exploit this structure: a high-level policy that treats states as actions and predicts (a latent representation of) a subgoal and a low-level policy that predicts the action for reaching this subgoal. Through analysis and didactic examples, we show how this hierarchical decomposition makes our method robust to noise in the estimated value function. We then apply our method to offline goal-reaching benchmarks, showing that our method can solve long-horizon tasks that stymie prior methods, can scale to high-dimensional image observations, and can readily make use of action-free data. Our code is available at https://seohong.me/projects/hiql/
    摘要 在现代计算机视觉和自然语言处理领域,无监督预训练已经成为核心。在强化学习(RL)领域,目标条件 RL 可能提供一种类似的自监督方法,利用大量无奖励标注的数据进行学习。然而,建立能够直接从多样化离线数据中学习的有效目标条件 RL 算法是一项挑战,因为难以准确地估计远距离目标的价值函数。不过,目标达成问题具有结构:达到远距离目标需要先经过更近的子目标。这种结构非常有用,因为评估面向近距离目标的动作质量通常比面向远距离目标更容易。基于这个想法,我们提出了一种用于离线目标条件 RL 的层次算法。借助一个与动作无关的价值函数,我们学习两个策略:一个高层策略,将状态视为动作,预测子目标(的隐表示);以及一个低层策略,预测用于达到该子目标的动作。通过分析和示例,我们证明了这种层次分解使我们的方法对估计价值函数中的噪声具有鲁棒性。随后,我们将该方法应用于离线目标达成基准,结果表明它可以解决以往方法难以解决的长时程任务,可以扩展到高维图像观察,并可以轻松利用无动作数据。我们的代码可以在 https://seohong.me/projects/hiql/ 上获取。

The instabilities of large learning rate training: a loss landscape view

  • paper_url: http://arxiv.org/abs/2307.11948
  • repo_url: None
  • paper_authors: Lawrence Wang, Stephen Roberts
  • for: 研究深度学习网络训练中大学习率的稳定性,特别是在大学习率下的训练过程中存在潜在的不稳定性。
  • methods: 通过分析梯度下降的矩阵Hessian matrix来研究深度学习网络训练过程中的不稳定性。
  • results: 发现在大学习率下的训练过程中出现了“景观平整”和“景观转移”这两种现象,它们与训练过程中的不稳定性息息相关。
    Abstract Modern neural networks are undeniably successful. Numerous works study how the curvature of loss landscapes can affect the quality of solutions. In this work we study the loss landscape by considering the Hessian matrix during network training with large learning rates - an attractive regime that is (in)famously unstable. We characterise the instabilities of gradient descent, and we observe the striking phenomena of \textit{landscape flattening} and \textit{landscape shift}, both of which are intimately connected to the instabilities of training.
    摘要 现代神经网络确实非常成功。许多研究表明损失函数的凹凸度可以影响解决方案的质量。在这篇文章中,我们研究训练神经网络时的损失函数地形,包括在大学习率下进行训练的情况。我们描述梯度下降的不稳定性,并观察到了各种phenomena,如“地形平整”和“地形转移”,这些现象与训练过程中的不稳定性密切相关。
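As a concrete illustration of "considering the Hessian matrix during training", the sketch below estimates the largest Hessian eigenvalue of a toy network's loss (the sharpness that interacts with the learning rate) via power iteration on Hessian-vector products. The model, data, and iteration count are placeholders, not the paper's setup.

```python
import torch

# Minimal sketch: estimate the top Hessian eigenvalue of an MSE training loss
# with power iteration on Hessian-vector products (double backward).

torch.manual_seed(0)
X = torch.randn(64, 10)
y = torch.randn(64, 1)
model = torch.nn.Sequential(torch.nn.Linear(10, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))
params = [p for p in model.parameters() if p.requires_grad]

def hvp(vec):
    """Hessian-vector product of the training loss at the current parameters."""
    loss = torch.nn.functional.mse_loss(model(X), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    grad_v = (flat_grad * vec).sum()
    hv = torch.autograd.grad(grad_v, params)
    return torch.cat([h.reshape(-1) for h in hv]).detach()

n = sum(p.numel() for p in params)
v = torch.randn(n)
v /= v.norm()
for _ in range(30):                       # power iteration
    hv = hvp(v)
    eigval = torch.dot(v, hv).item()      # Rayleigh quotient estimate
    v = hv / (hv.norm() + 1e-12)

print("estimated top Hessian eigenvalue (sharpness):", eigval)
```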

Collaboratively Learning Linear Models with Structured Missing Data

  • paper_url: http://arxiv.org/abs/2307.11947
  • repo_url: None
  • paper_authors: Chen Cheng, Gary Cheng, John Duchi
  • for: 这篇论文目的是解决多个代理(agent)协同学习最小二乘估计问题。每个代理都观察到不同的特征集(e.g., 感知器的分辨率不同)。我们想要协调代理,以生成每个代理最佳估计器。
  • methods: 我们提出了一种分布式、半监督的算法Collab,包括三步:本地训练、聚合和分布。我们的过程不需要交换标注数据,因此具有通信效率和在标注数据不可 accessible 的场景中使用。
  • results: 我们的方法在真实数据和 sintetic 数据上进行测试,并达到了 Nearly asymptotically local minimax 优化的水平,即在不交换标注数据的情况下,我们的方法与可以交换标注数据的优化方法相比,具有类似的性能。
    Abstract We study the problem of collaboratively learning least squares estimates for $m$ agents. Each agent observes a different subset of the features -- e.g., containing data collected from sensors of varying resolution. Our goal is to determine how to coordinate the agents in order to produce the best estimator for each agent. We propose a distributed, semi-supervised algorithm Collab, consisting of three steps: local training, aggregation, and distribution. Our procedure does not require communicating the labeled data, making it communication efficient and useful in settings where the labeled data is inaccessible. Despite this handicap, our procedure is nearly asymptotically local minimax optimal -- even among estimators allowed to communicate the labeled data such as imputation methods. We test our method on real and synthetic data.
    摘要 我们研究多个 Agent 协同学习最小二乘估计的问题。每个 Agent 观察到不同的特征子集,例如包含来自不同分辨率传感器的数据。我们的目标是协调各个 Agent,为每个 Agent 生成最佳估计器。我们提出了分布式、半监督的算法 Collab,包括三个步骤:本地训练、聚合和分发。我们的过程不需要传输标注数据,因此通信效率高,并且适用于标注数据不可获取的场景。尽管存在这一限制,我们的过程仍然近乎渐近局部极小极大最优,甚至与允许传输标注数据的估计器(例如插补方法)相比也是如此。我们在真实数据和合成数据上测试了我们的方法。
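The following toy sketch illustrates the three-step local-training / aggregation / distribution pattern described in the abstract; the coordinate-averaging server used here is only an illustrative stand-in for the actual Collab aggregation rule, and all data is synthetic.

```python
import numpy as np

# Toy sketch of the local-train / aggregate / distribute pattern (not the
# Collab estimator itself). Each agent observes a different feature subset,
# fits a local least-squares model, and only coefficients are communicated.

rng = np.random.default_rng(1)
d, n = 6, 200
beta_true = rng.normal(size=d)
feature_sets = [np.array([0, 1, 2, 3]), np.array([2, 3, 4, 5]), np.array([0, 3, 4, 5])]

def local_fit(features):
    """Step 1 (local training): least squares restricted to the observed features."""
    X = rng.normal(size=(n, d))
    y = X @ beta_true + 0.1 * rng.normal(size=n)
    coef, *_ = np.linalg.lstsq(X[:, features], y, rcond=None)
    est = np.zeros(d)
    est[features] = coef
    return est, features

estimates = [local_fit(f) for f in feature_sets]

# Step 2 (aggregation): average each coordinate over the agents that observe it.
# No labelled data leaves the agents -- only the local coefficient vectors.
num, cnt = np.zeros(d), np.zeros(d)
for est, features in estimates:
    num[features] += est[features]
    cnt[features] += 1
aggregated = num / np.maximum(cnt, 1)

# Step 3 (distribution): each agent keeps the aggregated coefficients for its features.
for i, (_, features) in enumerate(estimates):
    print(f"agent {i} coefficients:", np.round(aggregated[features], 2))
print("generating coefficients:", np.round(beta_true, 2))
```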

Batch Clipping and Adaptive Layerwise Clipping for Differential Private Stochastic Gradient Descent

  • paper_url: http://arxiv.org/abs/2307.11939
  • repo_url: None
  • paper_authors: Toan N. Nguyen, Phuong Ha Nguyen, Lam M. Nguyen, Marten Van Dijk
  • for: 本文旨在提出一种新的隐私保护技术,以保证差分隐私(differential privacy)的实现。
  • methods: 本文使用 Individual Clipping (IC) 和 Batch Clipping (BC) 两种裁剪方法,并引入 Adaptive Layerwise Clipping (ALC) 方法来适应不同层的敏感度。
  • results: 实验表明,使用 BC 与 ALC 可以使 Differential Private Stochastic Gradient Descent (DPSGD) 收敛,而使用 IC 与 ALC 则不能收敛。
    Abstract Each round in Differential Private Stochastic Gradient Descent (DPSGD) transmits a sum of clipped gradients obfuscated with Gaussian noise to a central server which uses this to update a global model which often represents a deep neural network. Since the clipped gradients are computed separately, which we call Individual Clipping (IC), deep neural networks like resnet-18 cannot use Batch Normalization Layers (BNL) which is a crucial component in deep neural networks for achieving a high accuracy. To utilize BNL, we introduce Batch Clipping (BC) where, instead of clipping single gradients as in the orginal DPSGD, we average and clip batches of gradients. Moreover, the model entries of different layers have different sensitivities to the added Gaussian noise. Therefore, Adaptive Layerwise Clipping methods (ALC), where each layer has its own adaptively finetuned clipping constant, have been introduced and studied, but so far without rigorous DP proofs. In this paper, we propose {\em a new ALC and provide rigorous DP proofs for both BC and ALC}. Experiments show that our modified DPSGD with BC and ALC for CIFAR-$10$ with resnet-$18$ converges while DPSGD with IC and ALC does not.
    摘要 在差分隐私随机梯度下降(DPSGD)中,每个轮次都会将裁剪后的梯度之和加上高斯噪声后发送到中央服务器,服务器利用它来更新一个全局模型,该模型通常是深度神经网络。由于裁剪后的梯度是逐样本单独计算的(我们称之为个体裁剪,IC),像 resnet-18 这样的深度神经网络无法使用批归一化层(BNL),而 BNL 是深度神经网络取得高精度的关键组件。为了能够使用 BNL,我们引入批量裁剪(BC):不再像原始 DPSGD 那样裁剪单个梯度,而是先对一批梯度求平均再进行裁剪。此外,不同层的模型参数对添加的高斯噪声具有不同的敏感度。因此,已有工作提出并研究了自适应逐层裁剪方法(ALC),即每一层拥有各自自适应调整的裁剪常数,但此前缺乏严格的差分隐私证明。在本文中,我们提出了一种新的 ALC,并为 BC 和 ALC 提供了严格的差分隐私证明。实验表明,采用 BC 和 ALC 的改进版 DPSGD 可以在 CIFAR-10 上用 resnet-18 收敛,而采用 IC 和 ALC 的 DPSGD 则不能收敛。
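A minimal sketch of the difference between Individual Clipping and Batch Clipping, using assumed squared-loss gradients and illustrative clipping/noise constants rather than the paper's hyperparameters:

```python
import numpy as np

# Illustrative only: Individual Clipping clips every per-example gradient;
# Batch Clipping averages a batch of gradients first and clips once. Gaussian
# noise is then added in both cases.

rng = np.random.default_rng(0)
C, sigma = 1.0, 1.0          # clipping norm and noise multiplier (toy values)

def clip(g, c):
    norm = np.linalg.norm(g)
    return g if norm <= c else g * (c / norm)

def per_example_grads(w, X, y):
    residuals = X @ w - y                 # shape (batch,)
    return residuals[:, None] * X         # one squared-loss gradient per example

def individual_clipping_step(w, X, y, lr=0.1):
    clipped = np.stack([clip(g, C) for g in per_example_grads(w, X, y)])
    noisy_sum = clipped.sum(axis=0) + sigma * C * rng.normal(size=w.shape)
    return w - lr * noisy_sum / len(X)

def batch_clipping_step(w, X, y, lr=0.1):
    g_batch = per_example_grads(w, X, y).mean(axis=0)                # average first...
    noisy = clip(g_batch, C) + sigma * C * rng.normal(size=w.shape)  # ...then clip once
    return w - lr * noisy

X, y, w = rng.normal(size=(32, 5)), rng.normal(size=32), np.zeros(5)
print("IC step:", np.round(individual_clipping_step(w, X, y), 3))
print("BC step:", np.round(batch_clipping_step(w, X, y), 3))
```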

Mercer Large-Scale Kernel Machines from Ridge Function Perspective

  • paper_url: http://arxiv.org/abs/2307.11925
  • repo_url: None
  • paper_authors: Karol Dziedziul, Sergey Kryzhevich
  • for: 本文从 ridge 函数的角度介绍 Mercer 大规模核机器,回顾 Lin 和 Pinkus 关于 ridge 函数基本性质的结果。
  • methods: 本文在逼近论的框架下考察 Rahimi 和 Recht(2008)Random features for large-scale kernel machines 一文的主要定理,研究哪些核函数可以用取决于 $x$ 和 $y$ 的余弦函数乘积之和来逼近。
  • results: 本文指出了这种方法所面临的若干障碍;其结果可能在深度学习中有多种应用,特别是与图像处理相关的问题。
    Abstract To present Mercer large-scale kernel machines from a ridge function perspective, we recall the results by Lin and Pinkus from Fundamentality of ridge functions. We consider the main theorem of the recent paper by Rahimi and Recht, 2008, Random features for large-scale kernel machines in terms of the Approximation Theory. We study which kernels can be approximated by a sum of cosine function products with arguments depending on $x$ and $y$ and present the obstacles of such an approach. The results of this article may have various applications in Deep Learning, especially in problems related to Image Processing.
    摘要 要从ridge函数角度介绍Mercer大规模kernel机器,我们回忆了林和拜纳斯在基本性理论中的结果。我们考虑了2008年rachimi和 recht的论文《Random features for large-scale kernel machines in terms of Approximation Theory》中的主要定理。我们研究了可以通过cosine函数产品的叠加来近似kernel机器,其中Arguments取决于x和y坐标,并提出了这种方法的阻碍。这些结果可能在深度学习中有各种应用,特别是在图像处理问题中。
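The cosine-product approximation discussed here is the Rahimi-Recht random-feature construction. The short sketch below shows how it approximates the Gaussian (RBF) kernel; the bandwidth and feature count are illustrative choices, not values from the paper.

```python
import numpy as np

# Random Fourier (cosine) features: k(x, y) = exp(-gamma * ||x - y||^2) is
# approximated by the inner product of random cosine features z(x).z(y).

rng = np.random.default_rng(0)
d, D, gamma = 3, 2000, 0.5            # input dim, number of features, RBF parameter

W = rng.normal(scale=np.sqrt(2 * gamma), size=(D, d))   # frequencies ~ N(0, 2*gamma*I)
b = rng.uniform(0, 2 * np.pi, size=D)                   # random phases

def features(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

def rbf(x, y):
    return np.exp(-gamma * np.sum((x - y) ** 2))

x, y = rng.normal(size=d), rng.normal(size=d)
print("exact RBF kernel   :", rbf(x, y))
print("random-feature est.:", features(x) @ features(y))
```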

Selective Perception: Optimizing State Descriptions with Reinforcement Learning for Language Model Actors

  • paper_url: http://arxiv.org/abs/2307.11922
  • repo_url: None
  • paper_authors: Kolby Nottingham, Yasaman Razeghi, Kyungmin Kim, JB Lanier, Pierre Baldi, Roy Fox, Sameer Singh
  • for: 这个论文是为了研究如何使用自然语言处理技术来提高机器人和游戏中的决策过程。
  • methods: 该论文提出了一种自动选择简洁状态描述的方法,称为Brief Language INputs for DEcision-making Responses(BLINDER),它通过学习任务条件下的状态描述价值函数来选择描述。
  • results: 该论文在NetHack游戏和机器人 manipulate任务中实现了提高任务成功率、减少输入大小和计算成本、并且可以在不同的LLM actors之间进行泛化。
    Abstract Large language models (LLMs) are being applied as actors for sequential decision making tasks in domains such as robotics and games, utilizing their general world knowledge and planning abilities. However, previous work does little to explore what environment state information is provided to LLM actors via language. Exhaustively describing high-dimensional states can impair performance and raise inference costs for LLM actors. Previous LLM actors avoid the issue by relying on hand-engineered, task-specific protocols to determine which features to communicate about a state and which to leave out. In this work, we propose Brief Language INputs for DEcision-making Responses (BLINDER), a method for automatically selecting concise state descriptions by learning a value function for task-conditioned state descriptions. We evaluate BLINDER on the challenging video game NetHack and a robotic manipulation task. Our method improves task success rate, reduces input size and compute costs, and generalizes between LLM actors.
    摘要 大型语言模型(LLM)正被用作机器人和游戏等领域中顺序决策任务的执行者,利用其通用世界知识和规划能力。然而,已有工作几乎没有探讨应通过语言向 LLM 执行者提供哪些环境状态信息。详尽地描述高维状态可能会降低性能并提高 LLM 执行者的推理成本。先前的 LLM 执行者通常依靠手工设计、任务特定的协议来确定状态描述中应包含哪些特征、略去哪些特征。在这项工作中,我们提出了面向决策响应的简短语言输入(BLINDER)方法,通过学习任务条件下的状态描述价值函数来自动选择简洁的状态描述。我们在 NetHack 游戏和一个机器人操纵任务上评估了 BLINDER。我们的方法可以提高任务成功率,减少输入大小和计算成本,并且可以在不同的 LLM 执行者之间泛化。

Poverty rate prediction using multi-modal survey and earth observation data

  • paper_url: http://arxiv.org/abs/2307.11921
  • repo_url: None
  • paper_authors: Simone Fobi, Manuel Cardona, Elliott Collins, Caleb Robinson, Anthony Ortiz, Tina Sederholm, Rahul Dodhia, Juan Lavista Ferres
  • for: 预测地区贫困率
  • methods: combining household demographic and living standards survey questions with features derived from satellite imagery
  • results: 1) inclusion of visual features reduces the mean error in poverty rate estimates from 4.09% to 3.88% 2) the best performance – errors in poverty rate decrease from 4.09% to 3.71% 3) extracted visual features encode geographic and urbanization differences between regions.
    Abstract This work presents an approach for combining household demographic and living standards survey questions with features derived from satellite imagery to predict the poverty rate of a region. Our approach utilizes visual features obtained from a single-step featurization method applied to freely available 10m/px Sentinel-2 surface reflectance satellite imagery. These visual features are combined with ten survey questions in a proxy means test (PMT) to estimate whether a household is below the poverty line. We show that the inclusion of visual features reduces the mean error in poverty rate estimates from 4.09% to 3.88% over a nationally representative out-of-sample test set. In addition to including satellite imagery features in proxy means tests, we propose an approach for selecting a subset of survey questions that are complementary to the visual features extracted from satellite imagery. Specifically, we design a survey variable selection approach guided by the full survey and image features and use the approach to determine the most relevant set of small survey questions to include in a PMT. We validate the choice of small survey questions in a downstream task of predicting the poverty rate using the small set of questions. This approach results in the best performance -- errors in poverty rate decrease from 4.09% to 3.71%. We show that extracted visual features encode geographic and urbanization differences between regions.
    摘要 这项研究提出了一种方法,将住户人口统计与生活水平调查问题同卫星影像提取的特征相结合,用于预测地区贫困率。该方法使用对免费提供的 10m/px Sentinel-2 地表反射卫星影像进行单步特征化得到的视觉特征,并与十个调查问题一起构成代理家计测试(PMT),以估计住户是否低于贫困线。加入视觉特征后,在具有全国代表性的样本外测试集上,贫困率估计的平均误差从 4.09% 降低到 3.88%。此外,本文还提出了一种与卫星影像视觉特征互补的小型调查问题子集选择方法:依据完整调查和影像特征来确定最相关的小问题集,并用这些问题预测贫困率。这种方法取得了最佳性能,贫困率估计误差从 4.09% 降低到 3.71%。我们还表明,提取的视觉特征编码了不同地区之间的地理和城市化差异。

Unveiling Vulnerabilities in Interpretable Deep Learning Systems with Query-Efficient Black-box Attacks

  • paper_url: http://arxiv.org/abs/2307.11906
  • repo_url: None
  • paper_authors: Eldor Abdukhamidov, Mohammed Abuhamad, Simon S. Woo, Eric Chan-Tin, Tamer Abuhmed
  • for: 保障深度学习系统的可靠性、可靠性和信任性,防止恶意攻击。
  • methods: 使用微生物遗传算法,基于黑盒测试,不需要目标模型和解释模型的先知知识。
  • results: 实验结果显示,这种攻击具有高成功率,使用挑战性示例和归因地幔,很难于探测。
    Abstract Deep learning has been rapidly employed in many applications revolutionizing many industries, but it is known to be vulnerable to adversarial attacks. Such attacks pose a serious threat to deep learning-based systems compromising their integrity, reliability, and trust. Interpretable Deep Learning Systems (IDLSes) are designed to make the system more transparent and explainable, but they are also shown to be susceptible to attacks. In this work, we propose a novel microbial genetic algorithm-based black-box attack against IDLSes that requires no prior knowledge of the target model and its interpretation model. The proposed attack is a query-efficient approach that combines transfer-based and score-based methods, making it a powerful tool to unveil IDLS vulnerabilities. Our experiments of the attack show high attack success rates using adversarial examples with attribution maps that are highly similar to those of benign samples which makes it difficult to detect even by human analysts. Our results highlight the need for improved IDLS security to ensure their practical reliability.
    摘要 深度学习在许多应用中得到了迅速的应用,但它知道是易受到敌意攻击的。这些攻击会对深度学习基于系统的完整性、可靠性和信任造成严重的威胁。可解释深度学习系统(IDLS)是为了使系统更加透明和可解释的,但它们也被证明是易受到攻击的。在这种工作中,我们提出了一种基于微生物遗传算法的黑盒攻击方法,不需要target模型和其解释模型的先前知识。我们的攻击方法结合了传递基本方法和分数基本方法,使其成为对IDLS的可靠性进行检测的强大工具。我们的实验表明,使用对抗例中的特征图可以达到高度的攻击成功率,并且这些特征图与正常样本的特征图几乎相同,使其具有难以检测的特点。我们的结果表明,为了确保IDLS的实际可靠性,需要进一步加强IDLS的安全性。

Model Compression Methods for YOLOv5: A Review

  • paper_url: http://arxiv.org/abs/2307.11904
  • repo_url: None
  • paper_authors: Mohammad Jani, Jamil Fayyad, Younes Al-Younes, Homayoun Najjaran
  • for: 本文主要针对强化YOLO对象检测器的研究进行了概括,以便在资源有限的设备上部署。
  • methods: 本文主要考虑了网络剪辑和量化两种压缩方法,以减少模型的内存使用量和计算时间。
  • results: 通过对YOLOv5进行剪辑和量化处理,可以降低模型的内存使用量和计算时间,但是还存在一些 gap 需要进一步研究。
    Abstract Over the past few years, extensive research has been devoted to enhancing YOLO object detectors. Since its introduction, eight major versions of YOLO have been introduced with the purpose of improving its accuracy and efficiency. While the evident merits of YOLO have yielded to its extensive use in many areas, deploying it on resource-limited devices poses challenges. To address this issue, various neural network compression methods have been developed, which fall under three main categories, namely network pruning, quantization, and knowledge distillation. The fruitful outcomes of utilizing model compression methods, such as lowering memory usage and inference time, make them favorable, if not necessary, for deploying large neural networks on hardware-constrained edge devices. In this review paper, our focus is on pruning and quantization due to their comparative modularity. We categorize them and analyze the practical results of applying those methods to YOLOv5. By doing so, we identify gaps in adapting pruning and quantization for compressing YOLOv5, and provide future directions in this area for further exploration. Among several versions of YOLO, we specifically choose YOLOv5 for its excellent trade-off between recency and popularity in literature. This is the first specific review paper that surveys pruning and quantization methods from an implementation point of view on YOLOv5. Our study is also extendable to newer versions of YOLO as implementing them on resource-limited devices poses the same challenges that persist even today. This paper targets those interested in the practical deployment of model compression methods on YOLOv5, and in exploring different compression techniques that can be used for subsequent versions of YOLO.
    摘要 在过去几年,对 YOLO 对象检测器进行了广泛的研究,以提高其精度和效率。自其引入以来,共有八个主要版本的 YOLO 发布,以提高其精度和效率。虽然 YOLO 在许多领域得到了广泛的应用,但在资源有限的设备上部署它却存在挑战。为解决这个问题,各种神经网络压缩方法被开发出来,这些方法分为三个主要类别:网络剪辑、量化和知识传递。使用这些方法可以降低内存使用量和执行时间,这使得它们在硬件限制的边缘设备上进行部署变得有利可图。在本文中,我们将关注剪辑和量化,因为它们在可模块化方面比较出色。我们将这些方法进行分类和分析,并通过应用这些方法于 YOLOv5 来评估其实际效果。通过这些研究,我们可以了解剪辑和量化在 YOLOv5 上的应用存在哪些挑战,并提供未来研究的方向。在多个 YOLO 版本中,我们选择 YOLOv5,因为它在文献中的悠久度和受欢迎程度均很高。这是关于剪辑和量化方法在 YOLOv5 上的首个具体评估文章。我们的研究也可以扩展到 newer 版本的 YOLO,因为在资源有限的设备上部署它们也存在同样的挑战。本文适合那些关注实际部署模型压缩方法在 YOLOv5 上的人,以及想要探索不同的压缩技术,以应用于未来的 YOLO 版本。
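As a concrete, hedged illustration of the two compression families the review focuses on, the following PyTorch snippet applies magnitude pruning to a convolution and post-training dynamic quantization to the linear layers of a toy model; it is not tied to YOLOv5 itself, and the pruning ratio is an arbitrary example.

```python
import torch
import torch.nn.utils.prune as prune

# Toy model standing in for a detector backbone (not YOLOv5).
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Flatten(),
    torch.nn.Linear(16 * 8 * 8, 10),
)

# Unstructured L1 (magnitude) pruning: zero out 50% of the smallest conv weights.
conv = model[0]
prune.l1_unstructured(conv, name="weight", amount=0.5)
prune.remove(conv, "weight")   # make the pruning permanent
sparsity = (conv.weight == 0).float().mean().item()
print(f"conv sparsity after pruning: {sparsity:.2f}")

# Dynamic quantization: store Linear weights in int8, dequantize on the fly (CPU).
quantized = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
x = torch.randn(1, 3, 8, 8)
print("quantized model output shape:", quantized(x).shape)
```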

Project Florida: Federated Learning Made Easy

  • paper_url: http://arxiv.org/abs/2307.11899
  • repo_url: None
  • paper_authors: Daniel Madrigal Diaz, Andre Manoel, Jialei Chen, Nalin Singal, Robert Sim
  • for: This paper is written for machine learning engineers and application developers who want to deploy large-scale federated learning (FL) solutions across a heterogeneous device ecosystem.
  • methods: The paper presents a system architecture and software development kit (SDK) called Project Florida, which enables the deployment of FL solutions across a wide range of operating systems and hardware specifications. The paper also discusses the use of cloud-hosted infrastructure and task management interfaces to support the training process.
  • results: The paper presents illustrative experiments that demonstrate the system’s capabilities, including the ability to train machine learning models across a wide range of devices and the ability to scale the training process to accommodate a large number of client devices.
    Abstract We present Project Florida, a system architecture and software development kit (SDK) enabling deployment of large-scale Federated Learning (FL) solutions across a heterogeneous device ecosystem. Federated learning is an approach to machine learning based on a strong data sovereignty principle, i.e., that privacy and security of data is best enabled by storing it at its origin, whether on end-user devices or in segregated cloud storage silos. Federated learning enables model training across devices and silos while the training data remains within its security boundary, by distributing a model snapshot to a client running inside the boundary, running client code to update the model, and then aggregating updated snapshots across many clients in a central orchestrator. Deploying a FL solution requires implementation of complex privacy and security mechanisms as well as scalable orchestration infrastructure. Scale and performance is a paramount concern, as the model training process benefits from full participation of many client devices, which may have a wide variety of performance characteristics. Project Florida aims to simplify the task of deploying cross-device FL solutions by providing cloud-hosted infrastructure and accompanying task management interfaces, as well as a multi-platform SDK supporting most major programming languages including C++, Java, and Python, enabling FL training across a wide range of operating system (OS) and hardware specifications. The architecture decouples service management from the FL workflow, enabling a cloud service provider to deliver FL-as-a-service (FLaaS) to ML engineers and application developers. We present an overview of Florida, including a description of the architecture, sample code, and illustrative experiments demonstrating system capabilities.
    摘要 我们介绍项目“佛罗里达”(Project Florida),这是一个系统架构和软件开发包(SDK),使得大规模联合学习(FL)解决方案可以在多种异构设备生态系统中部署。联合学习是一种基于强数据主权原则的机器学习方法,即数据的隐私和安全最好通过将数据保存在其原始位置来保障,无论是在终端用户设备上,还是在相互隔离的云存储孤岛中。联合学习可以在设备和存储孤岛之间进行模型训练,而训练数据始终保留在其安全边界之内:将模型快照分发给边界内运行的客户端,由客户端代码更新模型,再由中央编排器聚合来自众多客户端的更新快照。实现 FL 解决方案需要实施复杂的隐私和安全机制,以及可扩展的编排基础设施。规模和性能至关重要,因为模型训练过程受益于大量客户端设备的全面参与,而这些设备的性能特性可能差异很大。项目“佛罗里达”旨在简化跨设备 FL 解决方案的部署,提供云托管的基础设施和相应的任务管理接口,以及支持包括 C++、Java 和 Python 在内的多数主流编程语言的多平台 SDK,从而可以在多种操作系统和硬件配置上进行 FL 训练。该架构将服务管理与 FL 工作流解耦,使云服务提供商能够向机器学习工程师和应用程序开发人员提供“联合学习即服务”(FLaaS)。我们对项目“佛罗里达”进行了概述,包括架构描述、示例代码和用于展示系统能力的示例实验。
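The orchestration pattern described above (distribute a model snapshot, run client updates locally, aggregate centrally) can be sketched with generic federated averaging. The code below illustrates that pattern only; it does not use Project Florida's SDK or any of its APIs, and the linear model and round counts are assumptions.

```python
import numpy as np

# Generic FedAvg sketch of the distribute -> local update -> aggregate loop.

rng = np.random.default_rng(0)
d = 4
w_true = rng.normal(size=d)

clients = []
for n in (50, 120, 80):                      # three clients with private data
    X = rng.normal(size=(n, d))
    clients.append((X, X @ w_true + 0.1 * rng.normal(size=n)))

def local_update(global_weights, X, y, lr=0.05, epochs=5):
    """Client side: a few gradient steps on private data; only weights leave."""
    w = global_weights.copy()
    for _ in range(epochs):
        w -= lr * X.T @ (X @ w - y) / len(X)
    return w, len(X)

def aggregate(updates):
    """Server side (FedAvg): sample-count weighted average of client weights."""
    total = sum(n for _, n in updates)
    return sum(w * (n / total) for w, n in updates)

global_w = np.zeros(d)
for _ in range(20):                          # federated rounds
    updates = [local_update(global_w, X, y) for X, y in clients]
    global_w = aggregate(updates)
print("distance to generating weights:", np.linalg.norm(global_w - w_true))
```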

Hindsight-DICE: Stable Credit Assignment for Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.11897
  • repo_url: https://github.com/skandavaidyanath/credit-assignment
  • paper_authors: Akash Velu, Skanda Vaidyanath, Dilip Arumugam
  • for: 该文章为了解决奖励学习在缺乏评价反馈的环境中表现不佳问题,提出了一种基于追溯政策的方法。
  • methods: 该文章使用了现有的重要性抽样比例估计技术来稳定化和改进基于追溯政策的方法。
  • results: 该文章在各种环境中显示了稳定和高效的学习效果,并且可以在奖励学习中缓解奖励分配问题。
    Abstract Oftentimes, environments for sequential decision-making problems can be quite sparse in the provision of evaluative feedback to guide reinforcement-learning agents. In the extreme case, long trajectories of behavior are merely punctuated with a single terminal feedback signal, leading to a significant temporal delay between the observation of a non-trivial reward and the individual steps of behavior culpable for achieving said reward. Coping with such a credit assignment challenge is one of the hallmark characteristics of reinforcement learning. While prior work has introduced the concept of hindsight policies to develop a theoretically motivated method for reweighting on-policy data by impact on achieving the observed trajectory return, we show that these methods experience instabilities which lead to inefficient learning in complex environments. In this work, we adapt existing importance-sampling ratio estimation techniques for off-policy evaluation to drastically improve the stability and efficiency of these so-called hindsight policy methods. Our hindsight distribution correction facilitates stable, efficient learning across a broad range of environments where credit assignment plagues baseline methods.
    摘要 顺序决策问题的环境往往只能提供稀疏的评价反馈来指导强化学习代理。在极端情况下,很长的行为轨迹只伴随一个终端反馈信号,这导致从观察到非平凡奖励到造就该奖励的各个行为步骤之间存在显著的时间延迟。应对这种信用分配挑战是强化学习的标志性特征之一。已有工作引入了事后(hindsight)策略的概念,提出了一种有理论依据的方法,按照对实现所观察到的轨迹回报的影响来对 on-policy 数据重新加权;但我们发现这些方法存在不稳定性,导致在复杂环境中学习效率低下。在本工作中,我们将现有的用于离策略评估的重要性采样比率估计技术加以改造,从而大幅提升这些事后策略方法的稳定性和效率。我们的事后分布修正方法使得在大量信用分配困扰基线方法的环境中都能够稳定、高效地学习。

On the Vulnerability of Fairness Constrained Learning to Malicious Noise

  • paper_url: http://arxiv.org/abs/2307.11892
  • repo_url: None
  • paper_authors: Avrim Blum, Princewill Okoroafor, Aadirupa Saha, Kevin Stangl
  • For: This paper studies the vulnerability of fairness-constrained learning to small amounts of malicious noise in the training data.
  • Methods: The paper uses randomized classifiers to mitigate the vulnerability of fairness-constrained learning to adversarial noise.
  • Results: The paper shows that for certain fairness notions, such as Demographic Parity, the loss in accuracy can be as low as $\Theta(\alpha)$ when the noise rate is small. For other fairness notions, such as Equal Opportunity, the loss in accuracy can be as low as $O(\sqrt{\alpha})$. The paper also shows that the loss in accuracy clusters into three natural regimes: $O(\alpha)$, $O(\sqrt{\alpha})$, and $O(1)$.
  • For: 这篇论文研究了训练数据中少量恶意噪声对受公平性约束的学习的影响。
  • Methods: 这篇论文使用随机化分类器来缓解受公平性约束的学习对对抗噪声的脆弱性。
  • Results: 这篇论文表明,对于某些公平性定义(如人口均衡),当恶意噪声率为 $\alpha$ 时,准确率损失可以低至 $\Theta(\alpha)$;对于其他公平性定义(如机会平等),准确率损失可以低至 $O(\sqrt{\alpha})$。论文还表明,准确率损失可分为三个自然区间:$O(\alpha)$、$O(\sqrt{\alpha})$ 和 $O(1)$。
    Abstract We consider the vulnerability of fairness-constrained learning to small amounts of malicious noise in the training data. Konstantinov and Lampert (2021) initiated the study of this question and presented negative results showing there exist data distributions where for several fairness constraints, any proper learner will exhibit high vulnerability when group sizes are imbalanced. Here, we present a more optimistic view, showing that if we allow randomized classifiers, then the landscape is much more nuanced. For example, for Demographic Parity we show we can incur only a $\Theta(\alpha)$ loss in accuracy, where $\alpha$ is the malicious noise rate, matching the best possible even without fairness constraints. For Equal Opportunity, we show we can incur an $O(\sqrt{\alpha})$ loss, and give a matching $\Omega(\sqrt{\alpha})$lower bound. In contrast, Konstantinov and Lampert (2021) showed for proper learners the loss in accuracy for both notions is $\Omega(1)$. The key technical novelty of our work is how randomization can bypass simple "tricks" an adversary can use to amplify his power. We also consider additional fairness notions including Equalized Odds and Calibration. For these fairness notions, the excess accuracy clusters into three natural regimes $O(\alpha)$,$O(\sqrt{\alpha})$ and $O(1)$. These results provide a more fine-grained view of the sensitivity of fairness-constrained learning to adversarial noise in training data.
    摘要 我们考虑了公平性条件下的学习敏感性对小量邪恶训练数据的影响。 Konstantinov 和 Lampert (2021) 开始了这个研究,并发现了一些负的结果,表明在某些公平性条件下,任何合法的学习者都将具有高度敏感性,当群体大小不对称时。 在这里,我们提供了一个更optimistic的见解,表明如果允许随机分类器,则情况会变得更加细分。例如,对于人口均衡公平性,我们显示可以允许只有 $\Theta(\alpha)$ 的损失率,其中 $\alpha$ 是邪恶训练数据的损失率,与不具有公平性条件时相同。对于平等机会公平性,我们显示可以允许 $O(\sqrt{\alpha})$ 的损失率,并提供了对应的 $\Omega(\sqrt{\alpha})$ 下界。与 Konstantinov 和 Lampert (2021) 的结果相比,我们的结果显示,在合法学习者下,两个公平性条件的损失率都是 $\Omega(1)$。我们的技术新动向是如何使用随机性来绕过简单的邪恶攻击者可以使用的“套路”。我们还考虑了其他的公平性条件,包括平等机会和准确性。这些结果提供了训练数据中邪恶训练数据的影响的更细分的见解。

On the Universality of Linear Recurrences Followed by Nonlinear Projections

  • paper_url: http://arxiv.org/abs/2307.11888
  • repo_url: None
  • paper_authors: Antonio Orvieto, Soham De, Caglar Gulcehre, Razvan Pascanu, Samuel L. Smith
  • for: 本研究的目标是表明,一族基于循环线性层的序列模型(包括 S4、S5 和 LRU),可以以任意精度逼近任何足够正则的非线性序列到序列映射。
  • methods: 本研究使用与逐位置多层感知器(MLP)交替堆叠的循环线性层来逼近序列到序列映射。主要想法是把循环层看作压缩算法,它能将输入序列的信息忠实地存储到内部状态中,再交由表达能力很强的 MLP 处理。
  • results: 研究表明,这类模型可以以任意期望的精度逼近任何足够正则的非线性序列到序列映射。
    Abstract In this note (work in progress towards a full-length paper) we show that a family of sequence models based on recurrent linear layers~(including S4, S5, and the LRU) interleaved with position-wise multi-layer perceptrons~(MLPs) can approximate arbitrarily well any sufficiently regular non-linear sequence-to-sequence map. The main idea behind our result is to see recurrent layers as compression algorithms that can faithfully store information about the input sequence into an inner state, before it is processed by the highly expressive MLP.
    摘要 在这篇短文中(它是一篇完整论文的阶段性工作),我们证明了一族由循环线性层(包括 S4、S5 和 LRU)与逐位置多层感知器(MLP)交替堆叠而成的序列模型,可以以任意精度逼近任何足够正则的非线性序列到序列映射。我们结果背后的主要想法是把循环层看作压缩算法:它能够将输入序列的信息忠实地存储到内部状态中,然后再交由表达能力很强的 MLP 处理。
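A minimal numpy sketch of the model family in question: a stable diagonal linear recurrence (in the spirit of S4/S5/LRU) whose states are fed to a position-wise MLP. The dimensions and parameter scales below are illustrative assumptions, not taken from any specific model.

```python
import numpy as np

# Linear recurrence followed by a position-wise MLP (illustrative sketch).

rng = np.random.default_rng(0)
d_in, d_state, d_hidden, d_out = 2, 16, 32, 1

A = np.diag(rng.uniform(0.5, 0.99, size=d_state))   # stable diagonal state matrix
B = rng.normal(scale=0.3, size=(d_state, d_in))
W1 = rng.normal(scale=0.3, size=(d_hidden, d_state))
W2 = rng.normal(scale=0.3, size=(d_out, d_hidden))

def forward(u):
    """u: (seq_len, d_in) -> (seq_len, d_out)."""
    x = np.zeros(d_state)
    outputs = []
    for u_t in u:
        x = A @ x + B @ u_t                 # linear recurrence: compresses the prefix
        h = np.maximum(0.0, W1 @ x)         # position-wise MLP applied to the state
        outputs.append(W2 @ h)
    return np.stack(outputs)

u = rng.normal(size=(10, d_in))
print(forward(u).shape)   # (10, 1)
```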

MORE: Measurement and Correlation Based Variational Quantum Circuit for Multi-classification

  • paper_url: http://arxiv.org/abs/2307.11875
  • repo_url: https://github.com/jindi0/more
  • paper_authors: Jindi Wu, Tianjie Hu, Qun Li
  • for: MORE is a quantum multi-classifier that leverages the quantum information of a single readout qubit to perform multi-class classification tasks.
  • methods: MORE uses a variational ansatz and quantum state tomography to reconstruct the readout state, and then employs variational quantum clustering and supervised learning to determine the mapping between input data and quantum labels.
  • results: MORE achieves advanced performance in multi-class classification tasks despite using a simple ansatz and limited quantum resources, and outperforms traditional binary classifiers in certain scenarios.
  • for: MORE 是一个利用单个读出量子比特(readout qubit)的量子信息来执行多类别分类任务的量子多分类器。
  • methods: MORE 使用变分拟设(ansatz)和量子态层析来重建读出态,然后利用变分量子聚类和监督学习来确定输入数据与量子标签之间的映射。
  • results: MORE 在多类别分类任务中取得了先进的表现,即便使用简单的拟设和有限的量子资源,并在某些情况下超越传统的二元分类器。
    Abstract Quantum computing has shown considerable promise for compute-intensive tasks in recent years. For instance, classification tasks based on quantum neural networks (QNN) have garnered significant interest from researchers and have been evaluated in various scenarios. However, the majority of quantum classifiers are currently limited to binary classification tasks due to either constrained quantum computing resources or the need for intensive classical post-processing. In this paper, we propose an efficient quantum multi-classifier called MORE, which stands for measurement and correlation based variational quantum multi-classifier. MORE adopts the same variational ansatz as binary classifiers while performing multi-classification by fully utilizing the quantum information of a single readout qubit. To extract the complete information from the readout qubit, we select three observables that form the basis of a two-dimensional Hilbert space. We then use the quantum state tomography technique to reconstruct the readout state from the measurement results. Afterward, we explore the correlation between classes to determine the quantum labels for classes using the variational quantum clustering approach. Next, quantum label-based supervised learning is performed to identify the mapping between the input data and their corresponding quantum labels. Finally, the predicted label is determined by its closest quantum label when using the classifier. We implement this approach using the Qiskit Python library and evaluate it through extensive experiments on both noise-free and noisy quantum systems. Our evaluation results demonstrate that MORE, despite using a simple ansatz and limited quantum resources, achieves advanced performance.
    摘要 量子计算在最近几年内已经显示了较大的承诺,尤其是对于计算密集的任务。例如,基于量子神经网络(QNN)的分类任务已经吸引了研究者的广泛关注,并在多个场景中进行了评估。然而,大多数量子分类器目前仅限于二进制分类任务,这可能是因为量子计算资源的限制或需要大量的经典后处理。在这篇论文中,我们提出了一种高效的量子多分类器,即MORE(测量和相关性基于量子多分类器)。MORE采用了同binary分类器一样的变量 ansatz,并在完全利用单个读取量子比特的量子信息上进行多分类。为了从读取量子比特中提取完整的信息,我们选择了三个观察量,它们构成了一个二维希尔伯特空间的基。然后,我们使用量子状态探测技术来重建读取状态。接着,我们研究分类关系来确定类别的量子标签,并使用量子分布式学习方法来确定输入数据与其相应的量子标签之间的映射。最后,我们使用类ifier来预测输入数据的标签。我们使用Qiskit Python库实现这种方法,并对噪声量子系统和噪声自由量子系统进行了广泛的实验评估。我们的评估结果表明,MORE,即使使用简单的 ansatz 和有限的量子资源,仍然可以达到高效的性能。

The Looming Threat of Fake and LLM-generated LinkedIn Profiles: Challenges and Opportunities for Detection and Prevention

  • paper_url: http://arxiv.org/abs/2307.11864
  • repo_url: None
  • paper_authors: Navid Ayoobi, Sadat Shahriar, Arjun Mukherjee
  • for: This paper is written to detect fake and Large Language Model (LLM)-generated profiles in the LinkedIn Online Social Network immediately upon registration and before establishing connections.
  • methods: The paper introduces the Section and Subsection Tag Embedding (SSTE) method to enhance the discriminative characteristics of textual information provided in LinkedIn profiles for distinguishing between legitimate profiles and those created by imposters manually or by using an LLM. The paper also uses static and contextualized word embeddings, including GloVe, Flair, BERT, and RoBERTa.
  • results: The suggested method can distinguish between legitimate and fake profiles with an accuracy of about 95% across all word embeddings. Additionally, the SSTE method has a promising accuracy for identifying LLM-generated profiles, with an accuracy of approximately 90% when only 20 LLM-generated profiles are added to the training set.
    Abstract In this paper, we present a novel method for detecting fake and Large Language Model (LLM)-generated profiles in the LinkedIn Online Social Network immediately upon registration and before establishing connections. Early fake profile identification is crucial to maintaining the platform's integrity since it prevents imposters from acquiring the private and sensitive information of legitimate users and from gaining an opportunity to increase their credibility for future phishing and scamming activities. This work uses textual information provided in LinkedIn profiles and introduces the Section and Subsection Tag Embedding (SSTE) method to enhance the discriminative characteristics of these data for distinguishing between legitimate profiles and those created by imposters manually or by using an LLM. Additionally, the dearth of a large publicly available LinkedIn dataset motivated us to collect 3600 LinkedIn profiles for our research. We will release our dataset publicly for research purposes. This is, to the best of our knowledge, the first large publicly available LinkedIn dataset for fake LinkedIn account detection. Within our paradigm, we assess static and contextualized word embeddings, including GloVe, Flair, BERT, and RoBERTa. We show that the suggested method can distinguish between legitimate and fake profiles with an accuracy of about 95% across all word embeddings. In addition, we show that SSTE has a promising accuracy for identifying LLM-generated profiles, despite the fact that no LLM-generated profiles were employed during the training phase, and can achieve an accuracy of approximately 90% when only 20 LLM-generated profiles are added to the training set. It is a significant finding since the proliferation of several LLMs in the near future makes it extremely challenging to design a single system that can identify profiles created with various LLMs.
    摘要 在这篇论文中,我们介绍了一种新的方法,用于在 LinkedIn 在线社交网络上立即识别 fake 和 Large Language Model(LLM)生成的 profiless,并在注册后before establishing connections。早期识别假 profiless是维护平台的完整性的关键,因为它防止了假者从获取真正用户的私人和敏感信息,并从获得未来骗财活动的机会。本工作使用 LinkedIn profiless 中提供的文本信息,并引入 Section and Subsection Tag Embedding(SSTE)方法,以增强这些数据的权威性,以分辨真实 profiless 和由假者或 LLM 生成的 profiless。此外,由于没有大量公开可用的 LinkedIn 数据集,我们自己收集了 3600 个 LinkedIn profiless 为我们的研究。我们将在研究用途上公开我们的数据集。这是,我们知道的, LinkedIn 上假账户检测的首个大规模公开数据集。在我们的 paradigm 中,我们评估了静止和 contextualized 单词嵌入,包括 GloVe、Flair、BERT 和 RoBERTa。我们显示,我们的方法可以在所有单词嵌入上分辨 true 和 fake profiless,准确率约为 95%。此外,我们还显示了 SSTE 在 LLM 生成 profiless 上的扩展性,即使在训练阶段没有使用 LLM 生成 profiless,可以达到约 90% 的准确率,只需要添加 20 个 LLM 生成 profiless 到训练集中。这是一项重要发现,因为未来几年内,许多 LLM 将在未来逐渐普及,设计一个系统可以识别由不同 LLM 生成的 profiless 将变得极其困难。

Data-Induced Interactions of Sparse Sensors

  • paper_url: http://arxiv.org/abs/2307.11838
  • repo_url: None
  • paper_authors: Andrei A. Klishin, J. Nathan Kutz, Krithika Manohar
  • for: 该论文旨在描述如何使用少量的感知器来重建复杂系统的状态,并且如何选择感知器的位置以实现最佳重建结果。
  • methods: 论文使用了基于异谱 interpolate 和 QR 分解的多种算法来优化感知器的位置,并通过统计物理学的狄耳诺模型来计算感知器之间的互动。
  • results: 论文通过计算数据引导的感知器互动的全景,可以结合外部选择标准和预测感知器更换的影响。
    Abstract Large-dimensional empirical data in science and engineering frequently has low-rank structure and can be represented as a combination of just a few eigenmodes. Because of this structure, we can use just a few spatially localized sensor measurements to reconstruct the full state of a complex system. The quality of this reconstruction, especially in the presence of sensor noise, depends significantly on the spatial configuration of the sensors. Multiple algorithms based on gappy interpolation and QR factorization have been proposed to optimize sensor placement. Here, instead of an algorithm that outputs a singular "optimal" sensor configuration, we take a thermodynamic view to compute the full landscape of sensor interactions induced by the training data. The landscape takes the form of the Ising model in statistical physics, and accounts for both the data variance captured at each sensor location and the crosstalk between sensors. Mapping out these data-induced sensor interactions allows combining them with external selection criteria and anticipating sensor replacement impacts.
    摘要 科学和工程中的大规模实测数据往往具有低秩结构,可以表示为少数几个本征模态的组合。凭借这种结构,我们只需少量空间上局部化的传感器测量即可重建复杂系统的完整状态,而重建质量(尤其是在存在传感噪声时)在很大程度上取决于传感器的空间布局。已有多种基于缺失数据插值(gappy interpolation)和 QR 分解的算法被提出来优化传感器布局。与输出单一“最优”传感器配置的算法不同,我们在这里采用热力学视角,计算由训练数据诱导的传感器相互作用的完整图景。该图景采用统计物理中的伊辛(Ising)模型形式,同时考虑了每个传感器位置捕获的数据方差以及传感器之间的串扰。描绘出这些由数据诱导的传感器相互作用,使我们能够将其与外部选择标准相结合,并预测更换传感器的影响。
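For context on the QR-factorization baselines mentioned in the abstract, the sketch below selects sensor locations by pivoted QR on the leading POD modes of synthetic training data and reconstructs a state from those few point measurements. It does not reproduce the paper's Ising-style landscape of sensor interactions; the grid size, mode count, and data are all assumed for illustration.

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(0)
n_space, n_snapshots, r = 200, 60, 8          # grid points, snapshots, modes/sensors

# Synthetic low-rank training data: a few smooth spatial modes with random amplitudes.
grid = np.linspace(0.0, 1.0, n_space)
modes = np.stack([np.sin((k + 1) * np.pi * grid) for k in range(r)], axis=1)
data = modes @ rng.normal(size=(r, n_snapshots))

U, _, _ = np.linalg.svd(data, full_matrices=False)
Psi = U[:, :r]                                 # leading POD modes

_, _, piv = qr(Psi.T, pivoting=True)           # column pivots = well-conditioned rows
sensors = piv[:r]                              # sensor locations

x_true = modes @ rng.normal(size=r)            # a new full state
coeffs, *_ = np.linalg.lstsq(Psi[sensors, :], x_true[sensors], rcond=None)
x_rec = Psi @ coeffs                           # reconstruct from r point measurements
print("sensor locations:", np.sort(sensors))
print("relative reconstruction error:",
      np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true))
```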

eXplainable Artificial Intelligence (XAI) in age prediction: A systematic review

  • paper_url: http://arxiv.org/abs/2307.13704
  • repo_url: None
  • paper_authors: Alena Kalyakulina, Igor Yusipov
  • for: 这篇论文探讨了使用可解释人工智能(XAI)技术进行年龄预测任务的应用。
  • methods: 论文将XAI技术应用于不同的身体系统,进行系统化的文献综述。
  • results: 论文指出了XAI在医疗应用中的优点,特别是在年龄预测任务中。
    Abstract eXplainable Artificial Intelligence (XAI) is now an important and essential part of machine learning, allowing to explain the predictions of complex models. XAI is especially required in risky applications, particularly in health care, where human lives depend on the decisions of AI systems. One area of medical research is age prediction and identification of biomarkers of aging and age-related diseases. However, the role of XAI in the age prediction task has not previously been explored directly. In this review, we discuss the application of XAI approaches to age prediction tasks. We give a systematic review of the works organized by body systems, and discuss the benefits of XAI in medical applications and, in particular, in the age prediction domain.
    摘要 可解释人工智能(XAI)现在是机器学习中非常重要和必需的一部分,允许解释复杂模型的预测。XAI特别在危险应用中需要,特别是在医疗领域,人工智能系统的决策直接关系到人们的生命。一个医学研究领域是年龄预测和衰老病症的生物标志物质的预测。然而,XAI在年龄预测任务中的角色没有直接探讨过。在这篇评论中,我们讨论了XAI方法在年龄预测任务中的应用。我们按照身体系统进行了系统性的综述,并讨论了医疗应用中XAI的优点和年龄预测领域中XAI的特点。

PINNsFormer: A Transformer-Based Framework For Physics-Informed Neural Networks

  • paper_url: http://arxiv.org/abs/2307.11833
  • repo_url: https://github.com/adityalab/pinnsformer
  • paper_authors: Leo Zhiyuan Zhao, Xueying Ding, B. Aditya Prakash
  • for: 用于数值解 partial differential equations (PDEs) 的深度学习框架。
  • methods: 使用 Transformer 结构,并采用多头注意机制来捕捉 PDEs 中的时间关系。
  • results: 能够准确地 approximates PDEs 的解,并在不同场景下超过传统 PINNs 的表现,尽管具有较少的计算和存储成本。
    Abstract Physics-Informed Neural Networks (PINNs) have emerged as a promising deep learning framework for approximating numerical solutions for partial differential equations (PDEs). While conventional PINNs and most related studies adopt fully-connected multilayer perceptrons (MLP) as the backbone structure, they have neglected the temporal relations in PDEs and failed to approximate the true solution. In this paper, we propose a novel Transformer-based framework, namely PINNsFormer, that accurately approximates PDEs' solutions by capturing the temporal dependencies with multi-head attention mechanisms in Transformer-based models. Instead of approximating point predictions, PINNsFormer adapts input vectors to pseudo sequences and point-wise PINNs loss to a sequential PINNs loss. In addition, PINNsFormer is equipped with a novel activation function, namely Wavelet, which anticipates the Fourier decomposition through deep neural networks. We empirically demonstrate PINNsFormer's ability to capture the PDE solutions for various scenarios, in which conventional PINNs have failed to learn. We also show that PINNsFormer achieves superior approximation accuracy on such problems than conventional PINNs with non-sensitive hyperparameters, in trade of marginal computational and memory costs, with extensive experiments.
    摘要 physics-informed neural networks (PINNs) 已经出现为解决数学Physical laws的深度学习框架,但是传统的PINNs和大多数相关研究都是使用完全连接多层感知器(MLP)作为脊梁结构,这些结构忽略了PDEs中的时间关系,并且无法准确地预测解。在本文中,我们提出了一种新的Transformer-based框架,即PINNsFormer,可以准确地预测PDEs的解决方案,通过在Transformer-based模型中使用多头注意机制来捕捉PDEs中的时间相关性。而不是对点预测进行approximation,PINNsFormer将输入向量转化为pseudo序列,并将点级PINNs损失转化为sequential PINNs损失。此外,PINNsFormer还具有一种新的活动函数,即wavelet,该函数预测了深度神经网络中的Fourier分解。我们通过实验证明PINNsFormer可以在不同的情况下,包括传统PINNs无法学习的情况下,准确地预测PDEs的解决方案。此外,我们还证明PINNsFormer在这些问题上的 aproximation精度高于传统PINNs,但是与非敏感的计算和存储成本相比,PINNsFormer的计算和存储成本几乎是零的。

Differentially Private Heavy Hitter Detection using Federated Analytics

  • paper_url: http://arxiv.org/abs/2307.11749
  • repo_url: None
  • paper_authors: Karan Chadha, Junye Chen, John Duchi, Vitaly Feldman, Hanieh Hashemi, Omid Javidbakht, Audra McMillan, Kunal Talwar
  • for: 增强 prefix-tree 算法 隐私检测 differentially private heavy hitter 性能。
  • methods: 提出了一种基于 adaptive hyperparameter tuning 算法,以满足计算、通信和隐私约束的多用户数据点检测。
  • results: 通过对 Reddit 数据集进行大量实验,发现该方法可以提高检测性能,同时满足计算、通信和隐私约束。
    Abstract In this work, we study practical heuristics to improve the performance of prefix-tree based algorithms for differentially private heavy hitter detection. Our model assumes each user has multiple data points and the goal is to learn as many of the most frequent data points as possible across all users' data with aggregate and local differential privacy. We propose an adaptive hyperparameter tuning algorithm that improves the performance of the algorithm while satisfying computational, communication and privacy constraints. We explore the impact of different data-selection schemes as well as the impact of introducing deny lists during multiple runs of the algorithm. We test these improvements using extensive experimentation on the Reddit dataset~\cite{caldas2018leaf} on the task of learning the most frequent words.
    摘要 在这个工作中,我们研究了使用前缀树基于算法来提高分布式隐私极大热点检测的实用规则。我们的模型假设每个用户有多个数据点,目标是通过聚合和本地隐私来学习所有用户数据中的最多频数据点。我们提议一种适应性hyperparameter调整算法,可以提高算法的性能,同时满足计算、通信和隐私约束。我们还研究了不同的数据选择方案以及在多次运行算法时引入拒绝列表的影响。我们对这些改进进行了广泛的实验,使用了Reddit数据集(Caldas et al., 2018),以学习最常见的单词。

Advancing Ad Auction Realism: Practical Insights & Modeling Implications

  • paper_url: http://arxiv.org/abs/2307.11732
  • repo_url: None
  • paper_authors: Ming Chen, Sareh Nabi, Marciano Siniscalchi
  • for: 这个论文是为了研究当代在线广告拍卖中的四个实际特征,包括广告插播值和点击率因用户搜索词而异常,竞争者的数量和身份在拍卖过程中是未知的,广告主只能得到部分、汇总的反馈。
  • methods: 作者使用了对抗人工智能算法来模型广告主的行为,不受拍卖机制细节的影响。
  • results: 研究发现,在更加复杂的环境中,“软底”可以提高关键性能指标,而且可以在竞争者来自同一个人口群体时实现这一效果。此外,研究还证明了如何从观察拍卖价格中推断广告主价值分布,从而证明了这种方法在更加实际的拍卖Setting中的实际效果。
    Abstract This paper proposes a learning model of online ad auctions that allows for the following four key realistic characteristics of contemporary online auctions: (1) ad slots can have different values and click-through rates depending on users' search queries, (2) the number and identity of competing advertisers are unobserved and change with each auction, (3) advertisers only receive partial, aggregated feedback, and (4) payment rules are only partially specified. We model advertisers as agents governed by an adversarial bandit algorithm, independent of auction mechanism intricacies. Our objective is to simulate the behavior of advertisers for counterfactual analysis, prediction, and inference purposes. Our findings reveal that, in such richer environments, "soft floors" can enhance key performance metrics even when bidders are drawn from the same population. We further demonstrate how to infer advertiser value distributions from observed bids, thereby affirming the practical efficacy of our approach even in a more realistic auction setting.
    摘要 本文提出了一个在线广告拍卖的学习模型,刻画了当代在线拍卖的四个现实特征:(1)广告位的价值和点击率会随用户搜索词而变化;(2)竞争广告主的数量和身份不可观测,且随每次拍卖而变化;(3)广告主只能获得部分的、汇总的反馈;(4)支付规则仅被部分指定。我们将广告主建模为由对抗性多臂赌博机算法驱动的智能体,与拍卖机制的细节无关。我们的目标是模拟广告主的行为,以用于反事实分析、预测和推断。研究结果表明,在这类更丰富的环境中,即使竞价者来自同一总体,“软底价”也能提升关键性能指标。此外,我们还展示了如何从观测到的出价中推断广告主的价值分布,从而证实了该方法在更贴近现实的拍卖设置中的实际有效性。
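The abstract models each advertiser as an adversarial-bandit learner over bids. The sketch below uses EXP3 (one standard adversarial-bandit algorithm, chosen here purely for illustration; the paper does not commit to it) inside a toy second-price auction with random, unobserved competition. All constants are assumptions.

```python
import numpy as np

# Illustrative EXP3 bidder in a toy second-price auction with a changing,
# unobserved number of rivals. Not the paper's simulation environment.

rng = np.random.default_rng(0)
bids = np.linspace(0.1, 1.0, 10)       # candidate bid levels (arms)
value = 0.8                            # advertiser's private value per click
gamma, T = 0.1, 5000

weights = np.ones(len(bids))
for _ in range(T):
    probs = (1 - gamma) * weights / weights.sum() + gamma / len(bids)
    arm = rng.choice(len(bids), p=probs)
    bid = bids[arm]

    # Unobserved, changing competition: highest rival bid this auction.
    rival = max(rng.uniform(0, 1, size=rng.integers(1, 6)))
    payoff = (value - rival) if bid > rival else 0.0   # second-price payment

    # EXP3 importance-weighted update with rewards rescaled into [0, 1].
    reward = (payoff + 1.0) / 2.0
    weights[arm] *= np.exp(gamma * reward / (probs[arm] * len(bids)))

probs_final = weights / weights.sum()
best = int(np.argmax(probs_final))
print(f"most-preferred bid level: {bids[best]:.2f} (prob {probs_final[best]:.2f})")
```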

Mitigating Communications Threats in Decentralized Federated Learning through Moving Target Defense

  • paper_url: http://arxiv.org/abs/2307.11730
  • repo_url: https://github.com/enriquetomasmb/fedstellar
  • paper_authors: Enrique Tomás Martínez Beltrán, Pedro Miguel Sánchez Sánchez, Sergio López Bernal, Gérôme Bovet, Manuel Gil Pérez, Gregorio Martínez Pérez, Alberto Huertas Celdrán
  • for: This paper aims to address the communication security challenges in Decentralized Federated Learning (DFL) by introducing a security module that combines encryption and Moving Target Defense (MTD) techniques.
  • methods: The security module is implemented in a DFL platform called Fedstellar, and the authors evaluate the effectiveness of the module through experiments with the MNIST dataset and eclipse attacks.
  • results: The results show that the security module can mitigate the risks posed by eavesdropping or eclipse attacks, with an average F1 score of 95% and moderate increases in CPU usage and network traffic under the most secure configuration.
  • for: 这篇论文目的是解决分布式联合学习(DFL)中的通信安全挑战,通过引入加密和移动目标防御(MTD)技术的安全模块。
  • methods: 这个安全模块在分布式联合学习平台Fedstellar中实现,通过MNIST数据集和eclipse攻击进行测试。
  • results: 测试结果表明,安全模块可以降低防御 eclipse 攻击和窃听攻击的风险,实现了95%的平均F1分数,并且在最安全配置下,CPU使用率可以达到63.2% +-3.5%,网络流量可以达到230 MB +-15 MB。
    Abstract The rise of Decentralized Federated Learning (DFL) has enabled the training of machine learning models across federated participants, fostering decentralized model aggregation and reducing dependence on a server. However, this approach introduces unique communication security challenges that have yet to be thoroughly addressed in the literature. These challenges primarily originate from the decentralized nature of the aggregation process, the varied roles and responsibilities of the participants, and the absence of a central authority to oversee and mitigate threats. Addressing these challenges, this paper first delineates a comprehensive threat model, highlighting the potential risks of DFL communications. In response to these identified risks, this work introduces a security module designed for DFL platforms to counter communication-based attacks. The module combines security techniques such as symmetric and asymmetric encryption with Moving Target Defense (MTD) techniques, including random neighbor selection and IP/port switching. The security module is implemented in a DFL platform called Fedstellar, allowing the deployment and monitoring of the federation. A DFL scenario has been deployed, involving eight physical devices implementing three security configurations: (i) a baseline with no security, (ii) an encrypted configuration, and (iii) a configuration integrating both encryption and MTD techniques. The effectiveness of the security module is validated through experiments with the MNIST dataset and eclipse attacks. The results indicated an average F1 score of 95%, with moderate increases in CPU usage (up to 63.2% +-3.5%) and network traffic (230 MB +-15 MB) under the most secure configuration, mitigating the risks posed by eavesdropping or eclipse attacks.
    摘要 去中心化联邦学习(DFL)的兴起使得机器学习模型可以在多个联邦参与者之间训练,促进了去中心化的模型聚合,并减少了对服务器的依赖。然而,这种方法引入了一些独特的通信安全挑战,文献中尚未对其进行充分研究。这些挑战主要来自于聚合过程的去中心化特性、参与者角色和责任的多样性,以及缺乏监管和化解威胁的中央机构。为了解决这些挑战,本文首先提出了一个全面的威胁模型,描述了 DFL 通信的潜在风险。针对这些已识别的风险,本工作提出了一个专门为 DFL 平台设计的安全模块,该模块结合了对称与非对称加密技术以及移动目标防御(MTD)技术,包括随机邻居选择和 IP/端口切换。该安全模块在名为 Fedstellar 的 DFL 平台上实现,支持联邦的部署和监控。我们部署了一个包含八个物理设备的 DFL 场景,实现了三种安全配置:(i)无任何安全措施的基线配置,(ii)加密配置,以及(iii)同时集成加密与 MTD 技术的配置。安全模块的有效性通过 MNIST 数据集和 eclipse 攻击实验进行了验证。结果显示,平均 F1 分数约为 95%,在最安全的配置下 CPU 使用率最多增加 63.2% ± 3.5%,网络流量增加约 230 MB ± 15 MB,从而化解了窃听或 eclipse 攻击带来的风险。

Local Kernel Renormalization as a mechanism for feature learning in overparametrized Convolutional Neural Networks

  • paper_url: http://arxiv.org/abs/2307.11807
  • repo_url: None
  • paper_authors: R. Aiudi, R. Pacelli, A. Vezzani, R. Burioni, P. Rotondo
  • for: 这篇论文主要研究了深度神经网络中的特征学习方法,以及它们在不同类型的架构中的表现。
  • methods: 研究者使用了一种简单的理论框架,来解释FC和CNN架构中特征学习的不同表现。他们首先显示了一个有限宽FC网络的泛化性能可以通过无穷宽网络来获得,并且提出了一种有限宽效果行动来描述CNN架构中的特征学习。
  • results: 研究者发现了一种简单的特征学习机制,它只能在浅层CNN中发生,而不是在浅层FC网络或者无Weight连接神经网络中。这种机制导致CNN架构在有限宽 régime中表现优秀,而FC网络则是在无穷宽 régime中表现优秀。
    Abstract Feature learning, or the ability of deep neural networks to automatically learn relevant features from raw data, underlies their exceptional capability to solve complex tasks. However, feature learning seems to be realized in different ways in fully-connected (FC) or convolutional architectures (CNNs). Empirical evidence shows that FC neural networks in the infinite-width limit eventually outperform their finite-width counterparts. Since the kernel that describes infinite-width networks does not evolve during training, whatever form of feature learning occurs in deep FC architectures is not very helpful in improving generalization. On the other hand, state-of-the-art architectures with convolutional layers achieve optimal performances in the finite-width regime, suggesting that an effective form of feature learning emerges in this case. In this work, we present a simple theoretical framework that provides a rationale for these differences, in one hidden layer networks. First, we show that the generalization performance of a finite-width FC network can be obtained by an infinite-width network, with a suitable choice of the Gaussian priors. Second, we derive a finite-width effective action for an architecture with one convolutional hidden layer and compare it with the result available for FC networks. Remarkably, we identify a completely different form of kernel renormalization: whereas the kernel of the FC architecture is just globally renormalized by a single scalar parameter, the CNN kernel undergoes a local renormalization, meaning that the network can select the local components that will contribute to the final prediction in a data-dependent way. This finding highlights a simple mechanism for feature learning that can take place in overparametrized shallow CNNs, but not in shallow FC architectures or in locally connected neural networks without weight sharing.
    摘要 “特征学习”,也就是深度神经网络自动从原始数据中学习到 relevante 特征的能力,是深度神经网络解决复杂任务的关键。然而,在完全连接(FC)或卷积(CNN)架构中,特征学习似乎存在不同的实现方式。实际证明表明,在无穷宽限制下,FC神经网络 eventually 超越其固定宽度 counterparts。由于无穷宽网络的kernel不会在训练过程中进行变化,因此深度FC架构中的特征学习不会对泛化提供帮助。相反,当前领域的状态艺术架构,卷积层 achiev 最佳性能,表明在这种情况下,特征学习会出现有效的形式。在这种情况下,我们提出了一个简单的理论框架,用于解释这些差异。首先,我们证明了一个有限宽FC网络的泛化性能可以通过无穷宽网络来获得,并且需要一个适当的高斯先验。其次,我们 deriv 有限宽效果动作,并与FC网络的结果进行比较。意外地,我们发现了一种完全不同的kernel renormalization:FC架构的kernel仅受到全局抽象,而CNN架构的kernel则会在数据依赖的方式进行本地抽象,这意味着网络可以在数据中选择本地组分,以便在数据依赖的方式进行预测。这种发现高光了一种简单的特征学习机制,可以在过参神经网络中发生,但不可以在FC架构中或者在没有权重共享的本地神经网络中发生。

Convergence of SGD for Training Neural Networks with Sliced Wasserstein Losses

  • paper_url: http://arxiv.org/abs/2307.11714
  • repo_url: None
  • paper_authors: Eloi Tanguy
  • for: 本研究的目的是提供对 fixes step SGD 在 SW 损失函数上的分布学习模型 parameters 的趋势,并对这种方法的有效性进行 теории保证。
  • methods: 本研究使用了 Bianchi et al. (2022) 所提出的非平滑非对称函数下 SGD 的渐进结果,并在这种 Setting 中进行了实际的应用。
  • results: 研究发现,随着步长减小,SGD 轨迹会接近 (sub) 导流方程,并且在更加严格的假设下,SGD 轨迹会在极限下 approaching 泛化极点。
    Abstract Optimal Transport has sparked vivid interest in recent years, in particular thanks to the Wasserstein distance, which provides a geometrically sensible and intuitive way of comparing probability measures. For computational reasons, the Sliced Wasserstein (SW) distance was introduced as an alternative to the Wasserstein distance, and has seen uses for training generative Neural Networks (NNs). While convergence of Stochastic Gradient Descent (SGD) has been observed practically in such a setting, there is to our knowledge no theoretical guarantee for this observation. Leveraging recent works on convergence of SGD on non-smooth and non-convex functions by Bianchi et al. (2022), we aim to bridge that knowledge gap, and provide a realistic context under which fixed-step SGD trajectories for the SW loss on NN parameters converge. More precisely, we show that the trajectories approach the set of (sub)-gradient flow equations as the step decreases. Under stricter assumptions, we show a much stronger convergence result for noised and projected SGD schemes, namely that the long-run limits of the trajectories approach a set of generalised critical points of the loss function.
    摘要 近年来,最优传输引起了广泛关注,尤其得益于 Wasserstein 距离,它提供了一种几何上合理且直观的概率测度比较方式。出于计算上的考虑,切片 Wasserstein(SW)距离被提出作为 Wasserstein 距离的替代,并被用于训练生成式神经网络。尽管在这种设置下随机梯度下降(SGD)的收敛已在实践中被观察到,但据我们所知,这一现象尚无理论保证。借助 Bianchi 等人(2022)关于 SGD 在非光滑非凸函数上收敛性的最新工作,我们旨在填补这一空白,并给出一个现实的设定,使得固定步长 SGD 在神经网络参数上优化 SW 损失的轨迹能够收敛。更确切地说,我们证明当步长减小时,这些轨迹趋近于(次)梯度流方程的解集;在更严格的假设下,我们为带噪声且带投影的 SGD 方案给出了更强的收敛结果,即轨迹的长期极限趋近于损失函数的一组广义临界点。
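For reference, the sliced Wasserstein loss analyzed in the paper can be estimated as in the sketch below: project both point clouds onto random directions and average the resulting one-dimensional Wasserstein distances, which reduce to sorting. The projection count and sample sizes are illustrative.

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=200, p=2, seed=None):
    """x, y: (n, d) samples of equal size; returns a Monte Carlo SW_p estimate."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)   # directions on the sphere
    x_proj = x @ theta.T                                    # (n, n_projections)
    y_proj = y @ theta.T
    # 1-D W_p between equal-size empirical measures: match sorted samples.
    diffs = np.sort(x_proj, axis=0) - np.sort(y_proj, axis=0)
    return (np.mean(np.abs(diffs) ** p)) ** (1.0 / p)

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 2))
y = rng.normal(loc=2.0, size=(500, 2))
print("SW(x, y)          :", sliced_wasserstein(x, y, seed=1))
print("SW(x, shuffled x) :", sliced_wasserstein(x, rng.permutation(x), seed=1))
```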

JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.11704
  • repo_url: None
  • paper_authors: Kaiwen Wang, Junxiong Wang, Yueying Li, Nathan Kallus, Immanuel Trummer, Wen Sun
  • for: 本文提出了一个高效和轻量级的查询优化环境,用于应用智能学习(RL)。
  • methods: 本文使用了Markov决策过程(MDP)将左深和叶子变种的JoinOrder选择(JOS)问题转化为一个实际的数据管理问题,并提供了遵循标准Gymnasium API的实现。
  • results: 本文对各种RL算法进行了测试,并发现至少一种方法可以在训练集查询中near-优化性表现,但是在测试集查询中表现下降数个量级。这个差距驱动了进一步的研究,以确定RL算法在多任务 combinatorial优化问题中的泛化能力。
    Abstract In this paper, we present \textsc{JoinGym}, an efficient and lightweight query optimization environment for reinforcement learning (RL). Join order selection (JOS) is a classic NP-hard combinatorial optimization problem from database query optimization and can serve as a practical testbed for the generalization capabilities of RL algorithms. We describe how to formulate each of the left-deep and bushy variants of the JOS problem as a Markov Decision Process (MDP), and we provide an implementation adhering to the standard Gymnasium API. We highlight that our implementation \textsc{JoinGym} is completely based on offline traces of all possible joins, which enables RL practitioners to easily and quickly test their methods on a realistic data management problem without needing to setup any systems. Moreover, we also provide all possible join traces on $3300$ novel SQL queries generated from the IMDB dataset. Upon benchmarking popular RL algorithms, we find that at least one method can obtain near-optimal performance on train-set queries but their performance degrades by several orders of magnitude on test-set queries. This gap motivates further research for RL algorithms that generalize well in multi-task combinatorial optimization problems.
    摘要 在这篇论文中，我们介绍了 JoinGym，一个面向强化学习（RL）的高效、轻量级查询优化环境。连接顺序选择（JOS）是数据库查询优化中的经典 NP 难组合优化问题，可以作为检验 RL 算法泛化能力的实用测试平台。我们描述了如何将左深（left-deep）和稠密（bushy）两种 JOS 问题建模为马尔可夫决策过程（MDP），并提供了符合标准 Gymnasium API 的实现。我们强调，JoinGym 的实现完全基于所有可能连接的离线轨迹，使得 RL 从业者无需部署任何系统，即可轻松快速地在一个真实的数据管理问题上测试自己的方法。此外，我们还提供了基于 IMDB 数据集生成的 3300 条新 SQL 查询的全部连接轨迹。在对主流 RL 算法进行基准测试时，我们发现至少有一种方法能在训练集查询上取得接近最优的性能，但它们在测试集查询上的性能会下降几个数量级。这一差距激励了对能在多任务组合优化问题中良好泛化的 RL 算法的进一步研究。
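
To make the left-deep MDP formulation concrete, here is a small hand-rolled sketch in the spirit of the abstract (it is not JoinGym's actual API): the state is the set of relations joined so far, an action picks the next relation, and the per-step cost is an intermediate-result cardinality looked up from a precomputed offline trace. The relations and the `CARD` table below are invented toy values.

```python
import itertools

RELATIONS = ("A", "B", "C")
# Toy offline trace: cardinality of every intermediate join result (invented numbers).
CARD = {frozenset(s): c for s, c in [
    ({"A"}, 100), ({"B"}, 50), ({"C"}, 200),
    ({"A", "B"}, 30), ({"A", "C"}, 400), ({"B", "C"}, 80),
    ({"A", "B", "C"}, 25),
]}

def rollout(order):
    """Left-deep join in the given order; the return is the negated sum of intermediate sizes."""
    joined, cost = set(), 0
    for rel in order:                 # one MDP step per relation appended to the left-deep chain
        joined.add(rel)
        cost += CARD[frozenset(joined)]
    return -cost

best = max(itertools.permutations(RELATIONS), key=rollout)
print("best left-deep order:", best, "reward:", rollout(best))
```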

Using simulation to calibrate real data acquisition in veterinary medicine

  • paper_url: http://arxiv.org/abs/2307.11695
  • repo_url: None
  • paper_authors: Krystian Strzałka, Szymon Mazurek, Maciej Wielgosz, Paweł Russek, Jakub Caputa, Daria Łukasik, Jan Krupiński, Jakub Grzeszczyk, Michał Karwatowski, Rafał Frączek, Ernest Jamro, Marcin Pietroń, Sebastian Koryciak, Agnieszka Dąbrowska-Boruch, Kazimierz Wiatr
  • for: 这项研究旨在利用模拟环境改进兽医领域的数据采集与诊断，特别是犬类步态分析：通过 Blender 和 Blenderproc 库生成涵盖多种解剖、环境和行为条件的合成数据集，并用这些数据训练识别正常与异常步态的机器学习模型。
  • methods: 研究使用 Blender 和 Blenderproc 库生成合成数据集，数据以图形式表示并经标准化后用于训练机器学习模型；同时创建了两个摄像头角度粒度不同的数据集，以进一步考察摄像头视角对模型准确率的影响。
  • results: 初步结果表明，这种基于模拟的方法有望实现更精确的数据采集和更有效的机器学习模型；结合合成数据与真实患者数据，可为提升兽医诊断的整体效果与效率奠定基础。
    Abstract This paper explores the innovative use of simulation environments to enhance data acquisition and diagnostics in veterinary medicine, focusing specifically on gait analysis in dogs. The study harnesses the power of Blender and the Blenderproc library to generate synthetic datasets that reflect diverse anatomical, environmental, and behavioral conditions. The generated data, represented in graph form and standardized for optimal analysis, is utilized to train machine learning algorithms for identifying normal and abnormal gaits. Two distinct datasets with varying degrees of camera angle granularity are created to further investigate the influence of camera perspective on model accuracy. Preliminary results suggest that this simulation-based approach holds promise for advancing veterinary diagnostics by enabling more precise data acquisition and more effective machine learning models. By integrating synthetic and real-world patient data, the study lays a robust foundation for improving overall effectiveness and efficiency in veterinary medicine.
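
As a toy illustration of the train-on-synthetic idea (not the paper's Blender/Blenderproc pipeline), the sketch below fabricates simple gait-like signals for normal and abnormal classes, perturbs them by a camera-angle factor, and checks how a plain classifier trained on a coarse angle grid transfers to a finer one. All signal shapes, angle grids, and parameters are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_gait(n, angles, abnormal):
    """Fabricate n gait signals rendered at random camera angles (toy stand-in for synthetic data)."""
    t = np.linspace(0, 2 * np.pi, 50)
    X, y = [], []
    for _ in range(n):
        a = rng.choice(angles)
        stride = np.sin(t + (0.6 if abnormal else 0.0)) * np.cos(np.deg2rad(a))
        X.append(stride + rng.normal(0, 0.1, t.size))
        y.append(int(abnormal))
    return np.array(X), np.array(y)

def make_split(n, angles):
    Xs, ys = zip(*[make_gait(n, angles, abnormal) for abnormal in (False, True)])
    return np.vstack(Xs), np.hstack(ys)

X_train, y_train = make_split(200, np.arange(0, 90, 30))   # coarse angle granularity
X_test, y_test = make_split(200, np.arange(0, 90, 5))      # finer angle granularity

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy on finer-granularity angles:", clf.score(X_test, y_test))
```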

Fast Adaptive Test-Time Defense with Robust Features

  • paper_url: http://arxiv.org/abs/2307.11672
  • repo_url: None
  • paper_authors: Anurag Singh, Mahalakshmi Sabanayagam, Krikamol Muandet, Debarghya Ghoshdastidar
  • for: 提高深度神经网络对对抗样本的鲁棒性。
  • methods: 提出一种基于特征鲁棒性的自适应测试时防御策略，将训练好的模型投影到最鲁棒的特征空间，且无需额外的测试时计算。
  • results: 在 CIFAR-10 和 CIFAR-100 数据集上，所提方法以远低于现有最佳方法的计算成本取得了更好的鲁棒性表现。
    Abstract Adaptive test-time defenses are used to improve the robustness of deep neural networks to adversarial examples. However, existing methods significantly increase the inference time due to additional optimization on the model parameters or the input at test time. In this work, we propose a novel adaptive test-time defense strategy that is easy to integrate with any existing (robust) training procedure without additional test-time computation. Based on the notion of robustness of features that we present, the key idea is to project the trained models to the most robust feature space, thereby reducing the vulnerability to adversarial attacks in non-robust directions. We theoretically show that the top eigenspace of the feature matrix are more robust for a generalized additive model and support our argument for a large width neural network with the Neural Tangent Kernel (NTK) equivalence. We conduct extensive experiments on CIFAR-10 and CIFAR-100 datasets for several robustness benchmarks, including the state-of-the-art methods in RobustBench, and observe that the proposed method outperforms existing adaptive test-time defenses at much lower computation costs.
    摘要 自适应测试时防御被用于提高深度神经网络对对抗样本的鲁棒性。然而，现有方法由于需要在测试时对模型参数或输入进行额外优化，会显著增加推理时间。在这项工作中，我们提出了一种新的自适应测试时防御策略，它可以与任何现有的（鲁棒）训练流程轻松集成，且无需额外的测试时计算。基于我们提出的特征鲁棒性概念，其核心思想是将训练好的模型投影到最鲁棒的特征空间，从而降低模型在非鲁棒方向上对对抗攻击的脆弱性。我们从理论上证明，对广义加性模型而言，特征矩阵的顶部特征子空间更加鲁棒，并借助神经正切核（NTK）等价性将这一论证推广到大宽度神经网络。我们在 CIFAR-10 和 CIFAR-100 数据集上针对多个鲁棒性基准（包括 RobustBench 中的最新方法）进行了大量实验，观察到所提方法以远低于现有自适应测试时防御的计算成本取得了更好的性能。
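
A hedged PyTorch sketch of the projection idea described above, under a simplified reading: compute the (uncentered) covariance of penultimate-layer features on a calibration batch, keep its top-k eigenvectors, and project features onto that subspace before the linear head at test time. The `backbone`/`head` split, layer sizes, and `k` are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
backbone = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 512), nn.ReLU())  # stand-in feature extractor
head = nn.Linear(512, 10)                                                        # stand-in classifier head

# 1) Collect features on a calibration batch (random data here as a placeholder).
calib = torch.randn(256, 3, 32, 32)
with torch.no_grad():
    feats = backbone(calib)                               # (256, 512)
    cov = feats.T @ feats / feats.shape[0]                # uncentered feature covariance

# 2) Keep the top-k eigenspace of the feature matrix and build a projector onto it.
k = 64
evals, evecs = torch.linalg.eigh(cov)                     # eigenvalues in ascending order
V = evecs[:, -k:]                                         # top-k eigenvectors
projector = V @ V.T                                       # (512, 512) projection matrix

# 3) At test time, route features through the projector before the head -- no extra optimization.
def robust_forward(x):
    return head(backbone(x) @ projector)

logits = robust_forward(torch.randn(8, 3, 32, 32))
print(logits.shape)                                       # torch.Size([8, 10])
```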

An Efficient Interior-Point Method for Online Convex Optimization

  • paper_url: http://arxiv.org/abs/2307.11668
  • repo_url: None
  • paper_authors: Elad Hazan, Nimrod Megiddo
  • for: 本文旨在最小化在线凸优化中的遗憾（regret）。
  • methods: 本文提出了一种新的遗憾最小化算法；该算法是自适应的，即其遗憾界不仅对时间段 1,…,T 成立，也对每个子区间 s,s+1,…,t 成立。
  • results: 该算法在 T 个时间段后的遗憾为 O(√(T log T))，在仅相差一个对数因子的意义下达到了可能的最小值。
    Abstract A new algorithm for regret minimization in online convex optimization is described. The regret of the algorithm after $T$ time periods is $O(\sqrt{T \log T})$ - which is the minimum possible up to a logarithmic term. In addition, the new algorithm is adaptive, in the sense that the regret bounds hold not only for the time periods $1,\ldots,T$ but also for every sub-interval $s,s+1,\ldots,t$. The running time of the algorithm matches that of newly introduced interior point algorithms for regret minimization: in $n$-dimensional space, during each iteration the new algorithm essentially solves a system of linear equations of order $n$, rather than solving some constrained convex optimization problem in $n$ dimensions and possibly many constraints.
    摘要 本文描述了一种用于在线凸优化中遗憾最小化的新算法。该算法在 T 个时间段后的遗憾为 O(√(T log T))，在仅相差一个对数因子的意义下达到了可能的最小值。此外，该算法是自适应的，即其遗憾界不仅对时间段 1,…,T 成立，也对每个子区间 s,s+1,…,t 成立。该算法的运行时间与新近提出的用于遗憾最小化的内点法相当：在 n 维空间中，每次迭代时新算法本质上只需求解一个 n 阶线性方程组，而不是求解一个 n 维、可能带有大量约束的受限凸优化问题。
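
For reference, a standard way to write the quantities mentioned above: the regret over an arbitrary sub-interval [s, t] (the feasible set $\mathcal{K}$ and loss functions $f_r$ are the usual online convex optimization ingredients), together with the full-horizon rate quoted in the abstract. The abstract does not state the exact sub-interval rate, so only the definition is given for that case.

$$
\mathrm{Regret}_{[s,t]} \;=\; \sum_{r=s}^{t} f_r(x_r) \;-\; \min_{x \in \mathcal{K}} \sum_{r=s}^{t} f_r(x),
\qquad
\mathrm{Regret}_{[1,T]} \;=\; O\!\left(\sqrt{T \log T}\right).
$$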