cs.AI - 2023-08-13

Dual Meta-Learning with Longitudinally Generalized Regularization for One-Shot Brain Tissue Segmentation Across the Human Lifespan

  • paper_url: http://arxiv.org/abs/2308.06774
  • repo_url: None
  • paper_authors: Yongheng Sun, Fan Wang, Jun Shu, Haifeng Wang, Li Wang, Deyu Meng, Chunfeng Lian
  • for: This paper proposes a brain tissue segmentation method for longitudinal data, in support of neuroscience and clinical studies.
  • methods: The method uses a dual meta-learning paradigm, comprising a plug-and-play feature extractor and a well-initialized task head, to learn longitudinally consistent representations. In addition, two class-aware regularizations are proposed to encourage longitudinal consistency.
  • results: Experimental results demonstrate the method's effectiveness on the iSeg2019 and ADNI datasets. Code is available at https://github.com/ladderlab-xjtu/DuMeta.
    Abstract Brain tissue segmentation is essential for neuroscience and clinical studies. However, segmentation on longitudinal data is challenging due to dynamic brain changes across the lifespan. Previous research mainly focuses on self-supervision with regularizations and loses longitudinal generalization when fine-tuned on a specific age group. In this paper, we propose a dual meta-learning paradigm to learn longitudinally consistent representations that persist through fine-tuning. Specifically, we learn a plug-and-play feature extractor to extract longitudinally consistent anatomical representations by meta-feature learning, and a well-initialized task head for fine-tuning by meta-initialization learning. Besides, two class-aware regularizations are proposed to encourage longitudinal consistency. Experimental results on the iSeg2019 and ADNI datasets demonstrate the effectiveness of our method. Our code is available at https://github.com/ladderlab-xjtu/DuMeta.
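The meta-initialization half of the dual paradigm can be illustrated with a minimal first-order MAML-style sketch: learn an initialization of a (here, linear) task head that adapts to a new task in one gradient step. The linear model, squared loss, and synthetic tasks are illustrative stand-ins for the paper's segmentation head, losses, and age-group tasks, not its actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(w, X, y):
    """Gradient of the mean squared error 0.5 * ||Xw - y||^2 / n."""
    return X.T @ (X @ w - y) / X.shape[0]

def maml_step(w, tasks, inner_lr=0.1, outer_lr=0.05):
    """One meta-update: adapt on each task's support set, score on its query set."""
    meta_grad = np.zeros_like(w)
    for Xs, ys, Xq, yq in tasks:
        w_adapted = w - inner_lr * loss_grad(w, Xs, ys)  # inner adaptation
        meta_grad += loss_grad(w_adapted, Xq, yq)        # first-order approximation
    return w - outer_lr * meta_grad / len(tasks)

# toy tasks sharing one underlying regression target
w_true = np.array([1.0, -2.0])
def make_task():
    Xs, Xq = rng.standard_normal((8, 2)), rng.standard_normal((8, 2))
    return Xs, Xs @ w_true, Xq, Xq @ w_true

tasks = [make_task() for _ in range(4)]
w = np.zeros(2)
before = np.mean([np.mean((Xq @ w - yq) ** 2) for _, _, Xq, yq in tasks])
for _ in range(300):
    w = maml_step(w, tasks)
after = np.mean([np.mean((Xq @ w - yq) ** 2) for _, _, Xq, yq in tasks])
```

After meta-training, the learned initialization sits where one inner step suffices on each task, which is the property the well-initialized task head is meant to provide.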

Few-shot Class-incremental Learning: A Survey

  • paper_url: http://arxiv.org/abs/2308.06764
  • repo_url: None
  • paper_authors: Jinghua Zhang, Li Liu, Olli Silven, Matti Pietikäinen, Dewen Hu
  • for: This paper aims to provide a comprehensive and systematic review of Few-shot Class-Incremental Learning (FSCIL).
  • methods: The survey covers a range of FSCIL methods, including data-based, structure-based, and optimization-based classification approaches, as well as anchor-based and anchor-free object detection approaches.
  • results: The survey provides a thorough overview of benchmark datasets and evaluation metrics, together with several promising research directions in FSCIL.
    Abstract Few-shot Class-Incremental Learning (FSCIL) presents a unique challenge in machine learning, as it necessitates the continuous learning of new classes from sparse labeled training samples without forgetting previous knowledge. While this field has seen recent progress, it remains an active area of exploration. This paper aims to provide a comprehensive and systematic review of FSCIL. In our in-depth examination, we delve into various facets of FSCIL, encompassing the problem definition, the discussion of primary challenges of unreliable empirical risk minimization and the stability-plasticity dilemma, general schemes, and relevant problems of incremental learning and few-shot learning. Besides, we offer an overview of benchmark datasets and evaluation metrics. Furthermore, we introduce the classification methods in FSCIL from data-based, structure-based, and optimization-based approaches and the object detection methods in FSCIL from anchor-free and anchor-based approaches. Beyond these, we illuminate several promising research directions within FSCIL that merit further investigation.

Evaluating the anticipated outcomes of MRI seizure image from open-source tool- Prototype approach

  • paper_url: http://arxiv.org/abs/2308.07762
  • repo_url: None
  • paper_authors: Jayanthi Vajiram, Aishwarya Senthil, Utkarsh Maurya
  • for: This paper reviews the assessment and analysis of brain abnormalities in epilepsy, which affects nearly 70 million people worldwide.
  • methods: The paper surveys open-source neuroimaging tools, including MATLAB, Slicer 3D, Brain Suite21a, SPM, and MedCalc, used for examining and analyzing magnetic resonance seizure images.
  • results: According to the paper, about 60% of researchers used MATLAB for their image processing, 10% used proprietary software, and more than 30% used other open-source software tools.
    Abstract Epileptic seizure is an abnormal neuronal exertion in the brain, affecting nearly 70 million of the world's population (Ngugi et al., 2010). Many open-source neuroimaging tools are used for metabolism checkups and analysis purposes. The scope of open-source tools such as MATLAB, Slicer 3D, Brain Suite21a, SPM, and MedCalc is explained in this paper. MATLAB was used by 60% of the researchers for their image processing, and 10% of them use their proprietary software. More than 30% of the researchers use other open-source software tools with their processing techniques for the study of magnetic resonance seizure images.

Heterogeneous Multi-Agent Reinforcement Learning via Mirror Descent Policy Optimization

  • paper_url: http://arxiv.org/abs/2308.06741
  • repo_url: None
  • paper_authors: Mohammad Mehdi Nasiri, Mansoor Rezghi
  • for: solving cooperative Multi-Agent Reinforcement Learning (MARL) problems with varying agent abilities and individual policies
  • methods: Heterogeneous-Agent Mirror Descent Policy Optimization (HAMDPO) algorithm, which utilizes the multi-agent advantage decomposition lemma for efficient policy updates and guarantees stability and performance improvements
  • results: superiority over state-of-the-art algorithms such as HATRPO and HAPPO, demonstrated through experiments on Multi-Agent MuJoCo and StarCraftII tasks
    Abstract This paper presents an extension of the Mirror Descent method to overcome challenges in cooperative Multi-Agent Reinforcement Learning (MARL) settings, where agents have varying abilities and individual policies. The proposed Heterogeneous-Agent Mirror Descent Policy Optimization (HAMDPO) algorithm utilizes the multi-agent advantage decomposition lemma to enable efficient policy updates for each agent while ensuring overall performance improvements. By iteratively updating agent policies through an approximate solution of the trust-region problem, HAMDPO guarantees stability and improves performance. Moreover, the HAMDPO algorithm is capable of handling both continuous and discrete action spaces for heterogeneous agents in various MARL problems. We evaluate HAMDPO on Multi-Agent MuJoCo and StarCraftII tasks, demonstrating its superiority over state-of-the-art algorithms such as HATRPO and HAPPO. These results suggest that HAMDPO is a promising approach for solving cooperative MARL problems and could potentially be extended to address other challenging problems in the field of MARL.
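The core mirror descent policy update can be sketched for a discrete action space, where the KL-regularized objective has the closed form pi_new(a) ∝ pi_old(a) · exp(eta · A(a)). The names and numbers below are illustrative; HAMDPO's heterogeneous-agent, trust-region version is considerably more involved.

```python
import numpy as np

def mirror_descent_update(pi_old, advantages, eta=0.5):
    """KL-regularized (mirror descent) update of a discrete policy."""
    logits = np.log(pi_old) + eta * advantages
    logits -= logits.max()          # numerical stability
    pi_new = np.exp(logits)
    return pi_new / pi_new.sum()

pi = np.array([0.25, 0.25, 0.25, 0.25])
adv = np.array([1.0, 0.0, -1.0, 0.0])   # estimated advantages per action
pi = mirror_descent_update(pi, adv)
# probability mass shifts toward the high-advantage action
```

Because the update only reweights the old policy, it can never zero out an action in one step, which is the stability property mirror descent trades on.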

Probabilistic Imputation for Time-series Classification with Missing Data

  • paper_url: http://arxiv.org/abs/2308.06738
  • repo_url: https://github.com/yuneg11/SupNotMIWAE-with-ObsDropout
  • paper_authors: SeungHyun Kim, Hyunsu Kim, EungGu Yun, Hwangrae Lee, Jaehun Lee, Juho Lee
  • for: This paper proposes a novel probabilistic framework for classification with multivariate time series data that contains missing values.
  • methods: The proposed method consists of two parts: a deep generative model for missing value imputation and a classifier. The generative model is trained to impute the missing values in multiple plausible ways, effectively modeling the uncertainty of the imputation. The classifier takes the time series data along with the imputed missing values, classifies the signals, and is trained to capture the predictive uncertainty due to the multiple possible imputations.
  • results: The proposed method is demonstrated to be effective through extensive experiments on real-world time series data with missing values.
    Abstract Multivariate time series data for real-world applications typically contain a significant amount of missing values. The dominant approach for classification with such missing values is to impute them heuristically with specific values (zero, mean, values of adjacent time-steps) or learnable parameters. However, these simple strategies do not take the data generative process into account, and more importantly, do not effectively capture the uncertainty in prediction due to the multiple possibilities for the missing values. In this paper, we propose a novel probabilistic framework for classification with multivariate time series data with missing values. Our model consists of two parts; a deep generative model for missing value imputation and a classifier. Extending the existing deep generative models to better capture structures of time-series data, our deep generative model part is trained to impute the missing values in multiple plausible ways, effectively modeling the uncertainty of the imputation. The classifier part takes the time series data along with the imputed missing values and classifies signals, and is trained to capture the predictive uncertainty due to the multiple possibilities of imputations. Importantly, we show that na\"ively combining the generative model and the classifier could result in trivial solutions where the generative model does not produce meaningful imputations. To resolve this, we present a novel regularization technique that can promote the model to produce useful imputation values that help classification. Through extensive experiments on real-world time series data with missing values, we demonstrate the effectiveness of our method.
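The multiple-imputation idea above can be sketched minimally: draw several plausible completions of the missing entries, classify each completed series, and average the class probabilities to reflect imputation uncertainty. The Gaussian sampler and logistic classifier here are simple stand-ins for the paper's deep generative model and classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

def impute_samples(x, mask, n_samples=10, sigma=0.5):
    """Draw n_samples completions; missing entries get noisy draws around the observed mean."""
    out = np.tile(x, (n_samples, 1))
    miss = ~mask
    out[:, miss] = x[mask].mean() + sigma * rng.standard_normal((n_samples, miss.sum()))
    return out

def classify(xs, w):
    """Toy logistic classifier over completed series."""
    return 1.0 / (1.0 + np.exp(-xs @ w))

def predict_proba(x, mask, w, n_samples=10):
    completions = impute_samples(x, mask, n_samples)
    return classify(completions, w).mean()  # average over imputations

x = np.array([1.0, 1.0, 0.0, 1.0])          # 0.0 marks the missing slot
mask = np.array([True, True, False, True])   # observed entries
p = predict_proba(x, mask, np.ones(4))
```

Averaging over completions is what keeps the prediction honest about the multiple possibilities for the missing values, rather than committing to a single heuristic fill.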

AerialVLN: Vision-and-Language Navigation for UAVs

  • paper_url: http://arxiv.org/abs/2308.06735
  • repo_url: None
  • paper_authors: Shubo Liu, Hongsheng Zhang, Yuankai Qi, Peng Wang, Yaning Zhang, Qi Wu
  • for: This paper introduces a new task, Aerial Vision-and-Language Navigation (AerialVLN), to study UAV navigation in outdoor environments.
  • methods: The paper develops a 3D simulator rendered with near-realistic pictures of 25 city-level scenarios, supporting continuous navigation, environment extension, and configuration, and proposes an extended baseline model based on cross-modal-alignment (CMA) navigation methods.
  • results: A significant gap remains between the extended CMA baseline and human performance, suggesting that AerialVLN is a new and challenging task.
    Abstract Recently emerged Vision-and-Language Navigation (VLN) tasks have drawn significant attention in both computer vision and natural language processing communities. Existing VLN tasks are built for agents that navigate on the ground, either indoors or outdoors. However, many tasks require intelligent agents to carry out in the sky, such as UAV-based goods delivery, traffic/security patrol, and scenery tour, to name a few. Navigating in the sky is more complicated than on the ground because agents need to consider the flying height and more complex spatial relationship reasoning. To fill this gap and facilitate research in this field, we propose a new task named AerialVLN, which is UAV-based and towards outdoor environments. We develop a 3D simulator rendered by near-realistic pictures of 25 city-level scenarios. Our simulator supports continuous navigation, environment extension and configuration. We also proposed an extended baseline model based on the widely-used cross-modal-alignment (CMA) navigation methods. We find that there is still a significant gap between the baseline model and human performance, which suggests AerialVLN is a new challenging task. Dataset and code is available at https://github.com/AirVLN/AirVLN.

Precipitation nowcasting with generative diffusion models

  • paper_url: http://arxiv.org/abs/2308.06733
  • repo_url: https://github.com/fmerizzi/Precipitation-nowcasting-with-generative-diffusion-models
  • paper_authors: Andrea Asperti, Fabio Merizzi, Alberto Paparella, Giorgio Pedrazzi, Matteo Angelinelli, Stefano Colamonaco
  • for: This study examines the applicability of diffusion models to weather prediction, in particular precipitation nowcasting.
  • methods: The study uses a Generative Ensemble Diffusion (GED) model that generates multiple possible weather scenarios and then fuses them into a probable prediction with a post-processing network.
  • results: Compared with previous deep learning models, the GED model substantially improves overall performance.
    Abstract In recent years traditional numerical methods for accurate weather prediction have been increasingly challenged by deep learning methods. Numerous historical datasets used for short and medium-range weather forecasts are typically organized into a regular spatial grid structure. This arrangement closely resembles images: each weather variable can be visualized as a map or, when considering the temporal axis, as a video. Several classes of generative models, comprising Generative Adversarial Networks, Variational Autoencoders, or the recent Denoising Diffusion Models have largely proved their applicability to the next-frame prediction problem, and is thus natural to test their performance on the weather prediction benchmarks. Diffusion models are particularly appealing in this context, due to the intrinsically probabilistic nature of weather forecasting: what we are really interested to model is the probability distribution of weather indicators, whose expected value is the most likely prediction. In our study, we focus on a specific subset of the ERA-5 dataset, which includes hourly data pertaining to Central Europe from the years 2016 to 2021. Within this context, we examine the efficacy of diffusion models in handling the task of precipitation nowcasting. Our work is conducted in comparison to the performance of well-established U-Net models, as documented in the existing literature. Our proposed approach of Generative Ensemble Diffusion (GED) utilizes a diffusion model to generate a set of possible weather scenarios which are then amalgamated into a probable prediction via the use of a post-processing network. This approach, in comparison to recent deep learning models, substantially outperformed them in terms of overall performance.
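The ensemble-then-merge structure of GED can be sketched in a toy form: draw several candidate precipitation fields from a generative sampler, then merge them into a single prediction. The "sampler" below just perturbs the target field (an idealized stand-in for diffusion sampling), and the merge is a plain ensemble mean where the paper uses a learned post-processing network.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ensemble(field, n_members=8, noise=0.3):
    """Placeholder sampler: n_members noisy variants of a base field."""
    return field[None] + noise * rng.standard_normal((n_members,) + field.shape)

def merge(ensemble):
    """Stand-in for the post-processing network: pixel-wise ensemble mean."""
    return ensemble.mean(axis=0)

truth = rng.random((16, 16))          # target precipitation field
members = sample_ensemble(truth)      # ensemble of plausible scenarios
forecast = merge(members)
# the merged forecast has lower per-pixel error than the individual members
```

Even this crude merge shows why ensembling suits a probabilistic forecast: averaging independent scenarios cancels scenario-specific noise, and a learned merge network can do strictly better than the mean.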

Transforming Sentiment Analysis in the Financial Domain with ChatGPT

  • paper_url: http://arxiv.org/abs/2308.07935
  • repo_url: None
  • paper_authors: Georgios Fatouros, John Soldatos, Kalliopi Kouroumali, Georgios Makridis, Dimosthenis Kyriazis
  • for: This study investigates the potential of the large language model ChatGPT 3.5 in financial sentiment analysis, with particular emphasis on the foreign exchange (forex) market.
  • methods: Using a zero-shot prompting approach, the study tests multiple ChatGPT prompts on a carefully curated dataset of forex-related news headlines, measuring sentiment classification performance with precision, recall, F1-score, and Mean Absolute Error (MAE). The correlation between predicted sentiment and market returns is also evaluated.
  • results: ChatGPT achieved roughly 35% better sentiment classification performance than FinBERT and a 36% higher correlation with market returns. These results underline the importance of prompt engineering in zero-shot settings and point to ChatGPT's potential in financial applications.
    Abstract Financial sentiment analysis plays a crucial role in decoding market trends and guiding strategic trading decisions. Despite the deployment of advanced deep learning techniques and language models to refine sentiment analysis in finance, this study breaks new ground by investigating the potential of large language models, particularly ChatGPT 3.5, in financial sentiment analysis, with a strong emphasis on the foreign exchange market (forex). Employing a zero-shot prompting approach, we examine multiple ChatGPT prompts on a meticulously curated dataset of forex-related news headlines, measuring performance using metrics such as precision, recall, f1-score, and Mean Absolute Error (MAE) of the sentiment class. Additionally, we probe the correlation between predicted sentiment and market returns as an additional evaluation approach. ChatGPT, compared to FinBERT, a well-established sentiment analysis model for financial texts, exhibited approximately 35\% enhanced performance in sentiment classification and a 36\% higher correlation with market returns. By underlining the significance of prompt engineering, particularly in zero-shot contexts, this study spotlights ChatGPT's potential to substantially boost sentiment analysis in financial applications. By sharing the utilized dataset, our intention is to stimulate further research and advancements in the field of financial services.
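The evaluation protocol above can be sketched by mapping sentiment classes to numeric scores so that, besides per-class F1, a Mean Absolute Error over the ordered labels can be reported. The labels and predictions below are illustrative, not drawn from the paper's dataset.

```python
import numpy as np

score = {"negative": -1, "neutral": 0, "positive": 1}

y_true = ["positive", "negative", "neutral", "positive", "negative"]
y_pred = ["positive", "neutral", "neutral", "positive", "negative"]

t = np.array([score[y] for y in y_true])
p = np.array([score[y] for y in y_pred])

mae = np.abs(t - p).mean()  # distance between predicted and true sentiment class

def f1(cls):
    """F1 for one class; zero when the class is never correctly predicted."""
    tp = np.sum((p == cls) & (t == cls))
    if tp == 0:
        return 0.0
    prec = tp / np.sum(p == cls)
    rec = tp / np.sum(t == cls)
    return 2 * prec * rec / (prec + rec)

macro_f1 = np.mean([f1(c) for c in (-1, 0, 1)])
```

The MAE view is what makes a neutral-for-positive mistake cheaper than a negative-for-positive one, which plain accuracy or F1 cannot express.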

CLE Diffusion: Controllable Light Enhancement Diffusion Model

  • paper_url: http://arxiv.org/abs/2308.06725
  • repo_url: None
  • paper_authors: Yuyang Yin, Dejia Xu, Chuangchuang Tan, Ping Liu, Yao Zhao, Yunchao Wei
  • for: Enhancing low-light images while providing users with rich control over the result.
  • methods: A conditional diffusion model with an illumination embedding for user-specified brightness, combined with the Segment-Anything Model (SAM) for user-friendly region control.
  • results: Competitive performance in quantitative metrics, qualitative results, and versatile controllability.
    Abstract Low light enhancement has gained increasing importance with the rapid development of visual creation and editing. However, most existing enhancement algorithms are designed to homogeneously increase the brightness of images to a pre-defined extent, limiting the user experience. To address this issue, we propose Controllable Light Enhancement Diffusion Model, dubbed CLE Diffusion, a novel diffusion framework to provide users with rich controllability. Built with a conditional diffusion model, we introduce an illumination embedding to let users control their desired brightness level. Additionally, we incorporate the Segment-Anything Model (SAM) to enable user-friendly region controllability, where users can click on objects to specify the regions they wish to enhance. Extensive experiments demonstrate that CLE Diffusion achieves competitive performance regarding quantitative metrics, qualitative results, and versatile controllability. Project page: \url{https://yuyangyin.github.io/CLEDiffusion/}
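The conditioning mechanism can be sketched as turning a target brightness level into an embedding vector that a denoising network consumes alongside its other inputs. The sinusoidal form below is a common choice for embedding a scalar condition; CLE Diffusion's actual illumination embedding may differ.

```python
import numpy as np

def illumination_embedding(level, dim=8):
    """Map a scalar brightness level in [0, 1] to a dim-sized conditioning vector."""
    freqs = 2.0 ** np.arange(dim // 2)          # geometric frequency ladder
    angles = level * freqs * np.pi
    return np.concatenate([np.sin(angles), np.cos(angles)])

emb_dark = illumination_embedding(0.2)
emb_bright = illumination_embedding(0.9)
# distinct brightness targets yield distinct conditioning vectors
```

Feeding such a vector into the diffusion model at every denoising step is what lets a single trained network produce a continuum of brightness levels instead of one fixed enhancement.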

IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models

  • paper_url: http://arxiv.org/abs/2308.06721
  • repo_url: None
  • paper_authors: Hu Ye, Jun Zhang, Sibo Liu, Xiao Han, Wei Yang
  • for: This paper presents an effective and lightweight adapter that gives pretrained text-to-image diffusion models image prompt capability.
  • methods: The adapter uses a decoupled cross-attention mechanism that processes text features and image features in separate cross-attention layers, improving the fidelity and efficiency of the generated images.
  • results: Experiments show that IP-Adapter achieves comparable or better performance than fully fine-tuned image prompt models and can be combined with text prompts for multimodal image generation.
    Abstract Recent years have witnessed the strong power of large text-to-image diffusion models for the impressive generative capability to create high-fidelity images. However, it is very tricky to generate desired images using only text prompt as it often involves complex prompt engineering. An alternative to text prompt is image prompt, as the saying goes: "an image is worth a thousand words". Although existing methods of direct fine-tuning from pretrained models are effective, they require large computing resources and are not compatible with other base models, text prompt, and structural controls. In this paper, we present IP-Adapter, an effective and lightweight adapter to achieve image prompt capability for the pretrained text-to-image diffusion models. The key design of our IP-Adapter is decoupled cross-attention mechanism that separates cross-attention layers for text features and image features. Despite the simplicity of our method, an IP-Adapter with only 22M parameters can achieve comparable or even better performance to a fully fine-tuned image prompt model. As we freeze the pretrained diffusion model, the proposed IP-Adapter can be generalized not only to other custom models fine-tuned from the same base model, but also to controllable generation using existing controllable tools. With the benefit of the decoupled cross-attention strategy, the image prompt can also work well with the text prompt to achieve multimodal image generation. The project page is available at \url{https://ip-adapter.github.io}.
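A minimal sketch of the decoupled cross-attention described above: the query attends to text features and to image features through two separate cross-attention branches whose outputs are summed. The random projection matrices stand in for the adapter's learned key/value weights; shapes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # feature dimension

def attention(q, k, v):
    """Scaled dot-product attention."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def decoupled_cross_attention(latent, text_feats, image_feats,
                              Wk_t, Wv_t, Wk_i, Wv_i):
    out_text = attention(latent, text_feats @ Wk_t, text_feats @ Wv_t)
    out_image = attention(latent, image_feats @ Wk_i, image_feats @ Wv_i)
    return out_text + out_image  # separate branches, combined by addition

latent = rng.standard_normal((4, d))       # 4 query tokens
text_feats = rng.standard_normal((6, d))   # 6 text tokens
image_feats = rng.standard_normal((5, d))  # 5 image prompt tokens
Wk_t, Wv_t, Wk_i, Wv_i = (0.1 * rng.standard_normal((d, d)) for _ in range(4))
out = decoupled_cross_attention(latent, text_feats, image_feats,
                                Wk_t, Wv_t, Wk_i, Wv_i)
```

Keeping the image branch in its own key/value projections is what lets the adapter be trained while the base model's text cross-attention stays frozen, and what lets text and image prompts be mixed at inference.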

Generalized Independent Noise Condition for Estimating Causal Structure with Latent Variables

  • paper_url: http://arxiv.org/abs/2308.06718
  • repo_url: None
  • paper_authors: Feng Xie, Biwei Huang, Zhengming Chen, Ruichu Cai, Clark Glymour, Zhi Geng, Kun Zhang
  • for: The paper is focused on learning the causal structure of a system with latent variables, including identifying the number of latent variables and their relationships with observed variables.
  • methods: The authors propose a Generalized Independent Noise (GIN) condition for linear non-Gaussian acyclic causal models with latent variables, which is used to identify the causal relationships between the observed and latent variables. They also develop a search procedure to efficiently estimate the underlying causal structure.
  • results: The authors show that the proposed approach can identify the causal structure of a system with latent variables, and demonstrate its effectiveness through experimental results. Additionally, they find that the independent noise condition can be seen as a special case of the GIN condition, which provides a connection between the two concepts.
    Abstract We investigate the challenging task of learning causal structure in the presence of latent variables, including locating latent variables and determining their quantity, and identifying causal relationships among both latent and observed variables. To address this, we propose a Generalized Independent Noise (GIN) condition for linear non-Gaussian acyclic causal models that incorporate latent variables, which establishes the independence between a linear combination of certain measured variables and some other measured variables. Specifically, for two observed random vectors $\bf{Y}$ and $\bf{Z}$, GIN holds if and only if $\omega^{\intercal}\mathbf{Y}$ and $\mathbf{Z}$ are independent, where $\omega$ is a non-zero parameter vector determined by the cross-covariance between $\mathbf{Y}$ and $\mathbf{Z}$. We then give necessary and sufficient graphical criteria of the GIN condition in linear non-Gaussian acyclic causal models. Roughly speaking, GIN implies the existence of an exogenous set $\mathcal{S}$ relative to the parent set of $\mathbf{Y}$ (w.r.t. the causal ordering), such that $\mathcal{S}$ d-separates $\mathbf{Y}$ from $\mathbf{Z}$. Interestingly, we find that the independent noise condition (i.e., if there is no confounder, causes are independent of the residual derived from regressing the effect on the causes) can be seen as a special case of GIN. With such a connection between GIN and latent causal structures, we further leverage the proposed GIN condition, together with a well-designed search procedure, to efficiently estimate Linear, Non-Gaussian Latent Hierarchical Models (LiNGLaHs), where latent confounders may also be causally related and may even follow a hierarchical structure. We show that the underlying causal structure of a LiNGLaH is identifiable in light of GIN conditions under mild assumptions. Experimental results show the effectiveness of the proposed approach.
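A numerical sketch of the GIN condition stated above: omega is taken from the left null space of the sample cross-covariance Cov(Y, Z), which makes omega^T Y uncorrelated with Z by construction; GIN asserts the stronger property of independence, probed here with a crude nonlinear check. The one-latent-variable generating process below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
L = rng.uniform(-1, 1, n)                            # non-Gaussian latent cause
Y = np.stack([2 * L + 0.5 * rng.uniform(-1, 1, n),   # Y1 = 2L + e1
              L + 0.5 * rng.uniform(-1, 1, n)])      # Y2 =  L + e2
Z = 3 * L + 0.5 * rng.uniform(-1, 1, n)              # Z  = 3L + e3

C = (Y - Y.mean(1, keepdims=True)) @ (Z - Z.mean()) / n  # sample Cov(Y, Z)
omega = np.array([C[1], -C[0]])     # omega^T C = 0 in the two-dimensional case
resid = omega @ Y                   # the latent contribution cancels in omega^T Y
nonlin = np.corrcoef(resid**2, Z**2)[0, 1]  # near zero when resid is independent of Z
```

Because the latent cause L enters Y1, Y2, and Z proportionally, the combination omega^T Y removes it entirely, leaving only the noises e1 and e2; that full independence from Z, not mere uncorrelatedness, is what GIN tests.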

Estimating and Incentivizing Imperfect-Knowledge Agents with Hidden Rewards

  • paper_url: http://arxiv.org/abs/2308.06717
  • repo_url: None
  • paper_authors: Ilgin Dogan, Zuo-Jun Max Shen, Anil Aswani
  • for: This paper studies a principal-agent game in which the principal cannot directly observe the agent's reward realizations, in contrast to many previously studied principal-agent models. This information asymmetry makes it difficult for the principal to estimate the agent's unknown rewards, a problem with wide practical relevance, from renewable energy storage contracts to personalized healthcare incentives.
  • methods: The agent tackles a multi-armed bandit (MAB) problem to maximize its expected reward plus incentive, while the principal trains a parallel algorithm and faces a trade-off between consistently estimating the agent's unknown rewards and maximizing its own utility by offering adaptive incentives.
  • results: For a non-parametric model, the paper introduces an estimator whose only input is the history of the principal's incentives and the agent's choices, proves finite-sample consistency of the estimator and a rigorous regret bound for the principal, and supports the framework's applicability to green energy aggregator contracts with simulations.
    Abstract In practice, incentive providers (i.e., principals) often cannot observe the reward realizations of incentivized agents, which is in contrast to many principal-agent models that have been previously studied. This information asymmetry challenges the principal to consistently estimate the agent's unknown rewards by solely watching the agent's decisions, which becomes even more challenging when the agent has to learn its own rewards. This complex setting is observed in various real-life scenarios ranging from renewable energy storage contracts to personalized healthcare incentives. Hence, it offers not only interesting theoretical questions but also wide practical relevance. This paper explores a repeated adverse selection game between a self-interested learning agent and a learning principal. The agent tackles a multi-armed bandit (MAB) problem to maximize their expected reward plus incentive. On top of the agent's learning, the principal trains a parallel algorithm and faces a trade-off between consistently estimating the agent's unknown rewards and maximizing their own utility by offering adaptive incentives to lead the agent. For a non-parametric model, we introduce an estimator whose only input is the history of principal's incentives and agent's choices. We unite this estimator with a proposed data-driven incentive policy within a MAB framework. Without restricting the type of the agent's algorithm, we prove finite-sample consistency of the estimator and a rigorous regret bound for the principal by considering the sequential externality imposed by the agent. Lastly, our theoretical results are reinforced by simulations justifying applicability of our framework to green energy aggregator contracts.
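The hidden-reward setting can be illustrated with a heavily simplified toy: the agent runs epsilon-greedy on its own reward-plus-incentive estimates, while the principal observes only the agent's choices and nudges the incentive on a desired arm upward until the agent switches. The numbers and the principal's rule below are illustrative; the paper's estimator and incentive policy are far more general.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_means = np.array([0.8, 0.5])  # reward means, never observed by the principal
target_arm = 1                       # arm the principal wants chosen
incentive = 0.0

est = np.zeros(2)                    # agent's running reward estimates
counts = np.zeros(2)
for t in range(2000):
    # agent: epsilon-greedy on estimated reward + incentive
    arm = int(rng.integers(2)) if rng.random() < 0.1 else int(np.argmax(est))
    bonus = incentive if arm == target_arm else 0.0
    r = hidden_means[arm] + 0.1 * rng.standard_normal() + bonus
    counts[arm] += 1
    est[arm] += (r - est[arm]) / counts[arm]
    # principal: sees only `arm`, raises the incentive while it is ignored
    if arm != target_arm:
        incentive += 0.01
```

Over enough rounds the agent's choices concentrate on the incentivized arm even though the principal never sees a reward realization, which is the sequential externality the paper's regret analysis has to account for.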

Learning on Graphs with Out-of-Distribution Nodes

  • paper_url: http://arxiv.org/abs/2308.06714
  • repo_url: https://github.com/songyyyy/kdd22-oodgat
  • paper_authors: Yu Song, Donglin Wang
  • for: This work addresses the problem of graph learning with out-of-distribution nodes, aiming to detect outlier nodes and classify the remaining nodes into known classes.
  • methods: It proposes a novel Graph Attention Network model, the Out-of-Distribution Graph Attention Network (OODGAT), which explicitly models the interaction between different kinds of nodes and separates inliers from outliers during feature propagation.
  • results: Extensive experiments show that OODGAT outperforms existing outlier detection methods by a large margin while remaining better than or comparable to them on in-distribution classification.
    Abstract Graph Neural Networks (GNNs) are state-of-the-art models for performing prediction tasks on graphs. While existing GNNs have shown great performance on various tasks related to graphs, little attention has been paid to the scenario where out-of-distribution (OOD) nodes exist in the graph during training and inference. Borrowing the concept from CV and NLP, we define OOD nodes as nodes with labels unseen from the training set. Since a lot of networks are automatically constructed by programs, real-world graphs are often noisy and may contain nodes from unknown distributions. In this work, we define the problem of graph learning with out-of-distribution nodes. Specifically, we aim to accomplish two tasks: 1) detect nodes which do not belong to the known distribution and 2) classify the remaining nodes to be one of the known classes. We demonstrate that the connection patterns in graphs are informative for outlier detection, and propose Out-of-Distribution Graph Attention Network (OODGAT), a novel GNN model which explicitly models the interaction between different kinds of nodes and separate inliers from outliers during feature propagation. Extensive experiments show that OODGAT outperforms existing outlier detection methods by a large margin, while being better or comparable in terms of in-distribution classification.
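The core mechanism described above, separating inliers from outliers during feature propagation, can be illustrated with a toy attention rule: neighbors that look dissimilar to a node are down-weighted before aggregation. This is a minimal stdlib-Python sketch of that idea under simplifying assumptions, not the OODGAT architecture; all function names are illustrative.

```python
import math

def cosine(a, b):
    """Cosine similarity of two (non-zero) feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def attention_weights(feature, neighbors):
    """Softmax attention over cosine similarity: dissimilar (likely OOD)
    neighbors receive smaller weights during aggregation."""
    sims = [cosine(feature, n) for n in neighbors]
    exps = [math.exp(s) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(feature, neighbors):
    """Weighted average of neighbor features under the attention weights."""
    w = attention_weights(feature, neighbors)
    return [sum(w[i] * neighbors[i][d] for i in range(len(neighbors)))
            for d in range(len(feature))]
```

In a real GNN the similarity would be a learned scoring function rather than raw cosine similarity, but the down-weighting effect on mismatched neighbors is the same.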

Camouflaged Image Synthesis Is All You Need to Boost Camouflaged Detection

  • paper_url: http://arxiv.org/abs/2308.06701
  • repo_url: None
  • paper_authors: Haichao Zhang, Can Qin, Yu Yin, Yun Fu
  • for: Improving the performance of deep-learning models for camouflaged object detection.
  • methods: Uses a generative model to synthesize realistic camouflage images, which are then used to train existing object-detection models.
  • results: Outperforms the current state-of-the-art method on three datasets (COD10k, CAMO, and CHAMELEON), demonstrating its effectiveness for camouflaged object detection.
    Abstract Camouflaged objects that blend into natural scenes pose significant challenges for deep-learning models to detect and synthesize. While camouflaged object detection is a crucial task in computer vision with diverse real-world applications, this research topic has been constrained by limited data availability. We propose a framework for synthesizing camouflage data to enhance the detection of camouflaged objects in natural scenes. Our approach employs a generative model to produce realistic camouflage images, which can be used to train existing object detection models. Specifically, we use a camouflage environment generator supervised by a camouflage distribution classifier to synthesize the camouflage images, which are then fed into our generator to expand the dataset. Our framework outperforms the current state-of-the-art method on three datasets (COD10k, CAMO, and CHAMELEON), demonstrating its effectiveness in improving camouflaged object detection. This approach can serve as a plug-and-play data generation and augmentation module for existing camouflaged object detection tasks and provides a novel way to introduce more diversity and distributions into current camouflage datasets.

MACO: A Modality Adversarial and Contrastive Framework for Modality-missing Multi-modal Knowledge Graph Completion

  • paper_url: http://arxiv.org/abs/2308.06696
  • repo_url: https://github.com/zjukg/maco
  • paper_authors: Yichi Zhang, Zhuo Chen, Wen Zhang
  • for: Addressing the problem of missing modality information in large-scale knowledge graphs (KGs), to better support multi-modal knowledge graph completion (MMKGC).
  • methods: Proposes a modality adversarial and contrastive framework (MACO) that adversarially trains a generator and discriminator to produce missing modality features for use in MMKGC models, together with a cross-modal contrastive loss that improves the generator.
  • results: Experiments on public benchmarks, with further explorations, show that MACO achieves state-of-the-art results and can serve as a versatile framework to bolster various MMKGC models. Code and benchmark data are available at https://github.com/zjukg/MACO.
    Abstract Recent years have seen significant advancements in multi-modal knowledge graph completion (MMKGC). MMKGC enhances knowledge graph completion (KGC) by integrating multi-modal entity information, thereby facilitating the discovery of unobserved triples in the large-scale knowledge graphs (KGs). Nevertheless, existing methods emphasize the design of elegant KGC models to facilitate modality interaction, neglecting the real-life problem of missing modalities in KGs. The missing modality information impedes modal interaction, consequently undermining the model's performance. In this paper, we propose a modality adversarial and contrastive framework (MACO) to solve the modality-missing problem in MMKGC. MACO trains a generator and discriminator adversarially to generate missing modality features that can be incorporated into the MMKGC model. Meanwhile, we design a cross-modal contrastive loss to improve the performance of the generator. Experiments on public benchmarks with further explorations demonstrate that MACO could achieve state-of-the-art results and serve as a versatile framework to bolster various MMKGC models. Our code and benchmark data are available at https://github.com/zjukg/MACO.
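The cross-modal contrastive loss mentioned above follows the usual pattern: pull a generated modality feature toward the matching real feature and push it away from features of other entities. A generic InfoNCE-style sketch of that objective is below; it is not MACO's exact loss, and the temperature value is an assumption.

```python
import math

def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style contrastive loss for one anchor:
    -log softmax probability of the positive pair among all candidates."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # max-subtraction for numerical stability
    denom = sum(math.exp(l - m) for l in logits)
    return -(logits[0] - m - math.log(denom))
```

The loss is lower when the anchor is closer to its positive than to the negatives, which is exactly the gradient signal a generator of missing-modality features would receive.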

Video Captioning with Aggregated Features Based on Dual Graphs and Gated Fusion

  • paper_url: http://arxiv.org/abs/2308.06685
  • repo_url: None
  • paper_authors: Yutao Jin, Bin Liu, Jing Wang
  • for: Improving the accuracy and completeness of video captioning models by using dual graphs and gated fusion to generate multi-perspective feature representations.
  • methods: Proposes a video captioning model with two components: dual-graphs reasoning, which uses two types of graphs to generate appearance and motion features of the video content from multiple perspectives, and gated fusion, which aggregates the information across these feature representations for a comprehensive understanding of the video.
  • results: Achieves state-of-the-art performance on the widely used MSVD and MSR-VTT datasets.
    Abstract The application of video captioning models aims at translating the content of videos by using accurate natural language. Due to the complex nature inbetween object interaction in the video, the comprehensive understanding of spatio-temporal relations of objects remains a challenging task. Existing methods often fail in generating sufficient feature representations of video content. In this paper, we propose a video captioning model based on dual graphs and gated fusion: we adapt two types of graphs to generate feature representations of video content and utilize gated fusion to further understand these different levels of information. Using a dual-graphs model to generate appearance features and motion features respectively can utilize the content correlation in frames to generate various features from multiple perspectives. Among them, dual-graphs reasoning can enhance the content correlation in frame sequences to generate advanced semantic features; The gated fusion, on the other hand, aggregates the information in multiple feature representations for comprehensive video content understanding. The experiments conducted on worldly used datasets MSVD and MSR-VTT demonstrate state-of-the-art performance of our proposed approach.
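Gated fusion of two feature streams is commonly computed elementwise as fused = g * a + (1 - g) * b with a sigmoid gate g. The sketch below shows that mechanism in isolation; in the paper the gate logits would be produced by a learned layer from both inputs, whereas here they are passed in directly to keep the example self-contained.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(appearance, motion, gate_logits):
    """Elementwise gated fusion of two feature vectors:
    fused_d = g_d * appearance_d + (1 - g_d) * motion_d, g = sigmoid(logits)."""
    gates = [sigmoid(z) for z in gate_logits]
    return [g * a + (1.0 - g) * m
            for g, a, m in zip(gates, appearance, motion)]
```

With zero logits the gate is 0.5 and fusion reduces to averaging; large positive logits let the appearance stream dominate, which is how the model can weigh the two graphs' features per dimension.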

Law of Balance and Stationary Distribution of Stochastic Gradient Descent

  • paper_url: http://arxiv.org/abs/2308.06671
  • repo_url: None
  • paper_authors: Liu Ziyin, Hongchao Li, Masahito Ueda
  • for: Understanding how stochastic gradient descent (SGD) navigates the highly nonlinear, degenerate loss landscape of neural networks.
  • methods: Proves that the minibatch noise of SGD regularizes the solution toward a balanced one whenever the loss function has a rescaling symmetry, and derives the stationary distribution of stochastic gradient flow for a diagonal linear network of arbitrary depth and width.
  • results: The stationary distribution exhibits complicated nonlinear phenomena such as phase transitions, broken ergodicity, and fluctuation inversion; these phenomena exist only in deep networks, implying a fundamental difference between deep and shallow models.
    Abstract The stochastic gradient descent (SGD) algorithm is the algorithm we use to train neural networks. However, it remains poorly understood how the SGD navigates the highly nonlinear and degenerate loss landscape of a neural network. In this work, we prove that the minibatch noise of SGD regularizes the solution towards a balanced solution whenever the loss function contains a rescaling symmetry. Because the difference between a simple diffusion process and SGD dynamics is the most significant when symmetries are present, our theory implies that the loss function symmetries constitute an essential probe of how SGD works. We then apply this result to derive the stationary distribution of stochastic gradient flow for a diagonal linear network with arbitrary depth and width. The stationary distribution exhibits complicated nonlinear phenomena such as phase transitions, broken ergodicity, and fluctuation inversion. These phenomena are shown to exist uniquely in deep networks, implying a fundamental difference between deep and shallow models.
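The rescaling symmetry in the abstract can be made concrete with a two-parameter toy: when the loss depends on weights u and v only through their product, the transformation (u, v) -> (c*u, v/c) leaves the loss unchanged for any c != 0, and the "balanced" solutions the paper's law singles out are those with |u| = |v|. A minimal illustration, not the paper's model:

```python
def loss(u, v, target=1.0):
    """Toy model with a rescaling symmetry: the loss depends on the
    weights u, v only through their product u*v."""
    return (u * v - target) ** 2

def rescale(u, v, c):
    """The symmetry transformation: (u, v) -> (c*u, v/c) preserves u*v,
    hence preserves the loss."""
    return c * u, v / c
```

Among all (u, v) with a fixed product, |u| = |v| minimizes u**2 + v**2, which is the sense in which minibatch noise "balances" the factors.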

Unsupervised Adaptation of Polyp Segmentation Models via Coarse-to-Fine Self-Supervision

  • paper_url: http://arxiv.org/abs/2308.06665
  • repo_url: None
  • paper_authors: Jiexiang Wang, Chaoqi Chen
  • for: Tackling unsupervised domain adaptation (UDA) under privacy-preservation and security constraints, focusing on source-free domain adaptation (SFDA), which removes the reliance on annotated source data.
  • methods: Proposes a new SFDA framework, the Region-to-Pixel Adaptation Network (RPANet), which learns region-level and pixel-level discriminative representations through coarse-to-fine self-supervision. RPANet comprises two modules, Foreground-aware Contrastive Learning (FCL) and Confidence-Calibrated Pseudo-Labeling (CCPL), which address the key challenges of "how to distinguish" and "how to refine", respectively.
  • results: On three cross-domain polyp segmentation tasks, RPANet significantly outperforms prior SFDA and UDA methods without access to source data, revealing the potential of SFDA in medical applications.
    Abstract Unsupervised Domain Adaptation~(UDA) has attracted a surge of interest over the past decade but is difficult to be used in real-world applications. Considering the privacy-preservation issues and security concerns, in this work, we study a practical problem of Source-Free Domain Adaptation (SFDA), which eliminates the reliance on annotated source data. Current SFDA methods focus on extracting domain knowledge from the source-trained model but neglects the intrinsic structure of the target domain. Moreover, they typically utilize pseudo labels for self-training in the target domain, but suffer from the notorious error accumulation problem. To address these issues, we propose a new SFDA framework, called Region-to-Pixel Adaptation Network~(RPANet), which learns the region-level and pixel-level discriminative representations through coarse-to-fine self-supervision. The proposed RPANet consists of two modules, Foreground-aware Contrastive Learning (FCL) and Confidence-Calibrated Pseudo-Labeling (CCPL), which explicitly address the key challenges of ``how to distinguish'' and ``how to refine''. To be specific, FCL introduces a supervised contrastive learning paradigm in the region level to contrast different region centroids across different target images, which efficiently involves all pseudo labels while robust to noisy samples. CCPL designs a novel fusion strategy to reduce the overconfidence problem of pseudo labels by fusing two different target predictions without introducing any additional network modules. Extensive experiments on three cross-domain polyp segmentation tasks reveal that RPANet significantly outperforms state-of-the-art SFDA and UDA methods without access to source data, revealing the potential of SFDA in medical applications.
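CCPL's idea of fusing two different target predictions to curb pseudo-label overconfidence has a simple arithmetic core: averaging two probability vectors can only lower (or keep) the top-class confidence, since max((p+q)/2) <= (max(p)+max(q))/2. The sketch below shows plain averaging; the paper's actual fusion strategy is more elaborate.

```python
def fuse_predictions(p1, p2):
    """Average two class-probability vectors for the same pixel.
    A minimal stand-in for confidence-calibrated fusion: the fused
    top-class probability never exceeds the more confident input's."""
    return [(a + b) / 2.0 for a, b in zip(p1, p2)]
```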

ALGAN: Time Series Anomaly Detection with Adjusted-LSTM GAN

  • paper_url: http://arxiv.org/abs/2308.06663
  • repo_url: None
  • paper_authors: Md Abul Bashar, Richi Nayak
  • for: Detecting anomalies in time series data with a Generative Adversarial Network (GAN) approach.
  • methods: Proposes a new GAN model, Adjusted-LSTM GAN (ALGAN), which adjusts the output of an LSTM network to improve anomaly detection in both univariate and multivariate time series in an unsupervised setting.
  • results: Experiments on 46 real-world univariate time series datasets and a large multivariate dataset spanning multiple domains show that ALGAN outperforms traditional, neural-network-based, and other GAN-based anomaly detection methods.
    Abstract Anomaly detection in time series data, to identify points that deviate from normal behaviour, is a common problem in various domains such as manufacturing, medical imaging, and cybersecurity. Recently, Generative Adversarial Networks (GANs) are shown to be effective in detecting anomalies in time series data. The neural network architecture of GANs (i.e. Generator and Discriminator) can significantly improve anomaly detection accuracy. In this paper, we propose a new GAN model, named Adjusted-LSTM GAN (ALGAN), which adjusts the output of an LSTM network for improved anomaly detection in both univariate and multivariate time series data in an unsupervised setting. We evaluate the performance of ALGAN on 46 real-world univariate time series datasets and a large multivariate dataset that spans multiple domains. Our experiments demonstrate that ALGAN outperforms traditional, neural network-based, and other GAN-based methods for anomaly detection in time series data.
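GAN- and LSTM-based detectors like the one above ultimately score each point by how far it deviates from what the model would reconstruct or predict. The following is a deliberately naive predictive baseline that illustrates that scoring idea with a moving-average "predictor"; it is not the ALGAN model, and the window size is an arbitrary choice.

```python
def anomaly_scores(series, window=3):
    """Score each point by its absolute deviation from the mean of the
    preceding `window` points (points before the first full window get 0).
    High scores flag candidate anomalies."""
    scores = []
    for t in range(len(series)):
        if t < window:
            scores.append(0.0)
            continue
        pred = sum(series[t - window:t]) / window
        scores.append(abs(series[t] - pred))
    return scores
```

ALGAN replaces the moving-average predictor with an adversarially trained LSTM generator, which captures far richer temporal structure, but the deviation-based scoring principle is the same.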

Benign Shortcut for Debiasing: Fair Visual Recognition via Intervention with Shortcut Features

  • paper_url: http://arxiv.org/abs/2308.08482
  • repo_url: https://github.com/yiiizhang/shortcutDebiasing
  • paper_authors: Yi Zhang, Jitao Sang, Junyang Wang, Dongmei Jiang, Yaowei Wang
  • for: Preventing machine learning models from relying on sensitive social attributes such as gender and race, to ensure fairness in societal applications like hiring, banking, and criminal justice.
  • methods: Proposes "Shortcut Debiasing", which first transfers the target task's learning of bias attributes (e.g., gender) from bias features to controllable shortcut features, and then applies causal intervention to eliminate the shortcut features during inference.
  • results: Achieves significant improvements over state-of-the-art debiasing methods on several benchmark datasets, in both accuracy and fairness.
    Abstract Machine learning models often learn to make predictions that rely on sensitive social attributes like gender and race, which poses significant fairness risks, especially in societal applications, such as hiring, banking, and criminal justice. Existing work tackles this issue by minimizing the employed information about social attributes in models for debiasing. However, the high correlation between target task and these social attributes makes learning on the target task incompatible with debiasing. Given that model bias arises due to the learning of bias features (i.e., gender) that help target task optimization, we explore the following research question: "Can we leverage shortcut features to replace the role of bias feature in target task optimization for debiasing?" To this end, we propose Shortcut Debiasing, to first transfer the target task's learning of bias attributes from bias features to shortcut features, and then employ causal intervention to eliminate shortcut features during inference. The key idea of Shortcut Debiasing is to design controllable shortcut features to on one hand replace bias features in contributing to the target task during the training stage, and on the other hand be easily removed by intervention during the inference stage. This guarantees the learning of the target task does not hinder the elimination of bias features. We apply Shortcut Debiasing to several benchmark datasets, and achieve significant improvements over the state-of-the-art debiasing methods in both accuracy and fairness.

  • paper_url: http://arxiv.org/abs/2308.06653
  • repo_url: None
  • paper_authors: Srijoni Majumdar, Partha Pratim Das
  • for: Addressing the issue of rising software maintenance cost due to program comprehension challenges.
  • methods: Proposes SMARTKT (Smart Knowledge Transfer), a search framework that extracts and integrates knowledge related to various aspects of an application in the form of a semantic graph, supporting syntax and semantic queries and converting the process of program comprehension into a “google-like” search problem.
  • results: Not specified in the abstract, but the paper likely presents the effectiveness of SMARTKT in improving program comprehension and reducing software maintenance costs.
    Abstract To address the issue of rising software maintenance cost due to program comprehension challenges, we propose SMARTKT (Smart Knowledge Transfer), a search framework, which extracts and integrates knowledge related to various aspects of an application in form of a semantic graph. This graph supports syntax and semantic queries and converts the process of program comprehension into a "google-like" search problem.

Stationary Algorithmic Balancing For Dynamic Email Re-Ranking Problem

  • paper_url: http://arxiv.org/abs/2308.08460
  • repo_url: https://github.com/jylevangeline/mosr
  • paper_authors: Jiayi Liu, Jennifer Neville
  • for: Generating personalized email rankings that satisfy user preferences which may vary over time.
  • methods: Proposes MOSR (Multi-Objective Stationary Recommender), an online algorithm that uses an adaptive control model to dynamically balance three criteria: closeness (how relevant the sender and topic are to the user), timeliness (how recent the email is), and conciseness (how brief the email is).
  • results: On the Enron Email Dataset, MOSR outperforms the baselines, especially under non-stationary preferences where users weigh the criteria differently over time, and maintains stable rankings on a smaller down-sampled dataset with high variance in email characteristics.
    Abstract Email platforms need to generate personalized rankings of emails that satisfy user preferences, which may vary over time. We approach this as a recommendation problem based on three criteria: closeness (how relevant the sender and topic are to the user), timeliness (how recent the email is), and conciseness (how brief the email is). We propose MOSR (Multi-Objective Stationary Recommender), a novel online algorithm that uses an adaptive control model to dynamically balance these criteria and adapt to preference changes. We evaluate MOSR on the Enron Email Dataset, a large collection of real emails, and compare it with other baselines. The results show that MOSR achieves better performance, especially under non-stationary preferences, where users value different criteria more or less over time. We also test MOSR's robustness on a smaller down-sampled dataset that exhibits high variance in email characteristics, and show that it maintains stable rankings across different samples. Our work offers novel insights into how to design email re-ranking systems that account for multiple objectives impacting user satisfaction.
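The three criteria above combine naturally into a weighted score per email. The sketch below shows only that scoring-and-ranking step with the weights given explicitly; MOSR's adaptive-control update of the weights over time is not reproduced, and all field names are illustrative.

```python
def score_email(email, weights):
    """Linear multi-objective score over the paper's three criteria."""
    return (weights["closeness"] * email["closeness"]
            + weights["timeliness"] * email["timeliness"]
            + weights["conciseness"] * email["conciseness"])

def rank_emails(emails, weights):
    """Rank emails by descending multi-objective score."""
    return sorted(emails, key=lambda e: score_email(e, weights), reverse=True)
```

Shifting the weight vector (as a user's preferences drift) reorders the ranking, which is exactly the non-stationary setting MOSR is designed to track.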

Accelerating Diffusion-based Combinatorial Optimization Solvers by Progressive Distillation

  • paper_url: http://arxiv.org/abs/2308.06644
  • repo_url: https://github.com/jwrh/Accelerating-Diffusion-based-Combinatorial-Optimization-Solvers-by-Progressive-Distillation
  • paper_authors: Junwei Huang, Zhiqing Sun, Yiming Yang
  • for: Improving the inference efficiency of diffusion-based solvers for NP-complete combinatorial optimization (CO) problems.
  • methods: Uses progressive distillation to speed up inference by taking fewer steps during the denoising process (e.g., forecasting two steps ahead within a single step).
  • results: The progressively distilled model performs inference 16 times faster with only 0.019% degradation in performance on the TSP-50 dataset.
    Abstract Graph-based diffusion models have shown promising results in terms of generating high-quality solutions to NP-complete (NPC) combinatorial optimization (CO) problems. However, those models are often inefficient in inference, due to the iterative evaluation nature of the denoising diffusion process. This paper proposes to use progressive distillation to speed up the inference by taking fewer steps (e.g., forecasting two steps ahead within a single step) during the denoising process. Our experimental results show that the progressively distilled model can perform inference 16 times faster with only 0.019% degradation in performance on the TSP-50 dataset.
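Progressive distillation trains a student to compress two teacher denoising steps into one, halving the step count per round. In the toy below the "denoiser" is just a linear shrink, so the optimal one-step student can be written in closed form; a real diffusion sampler is far more complex, and this only illustrates the step-compression idea.

```python
def teacher_step(x, alpha=0.9):
    """One toy denoising step: shrink the state toward 0
    (a stand-in for one iteration of a diffusion sampler)."""
    return alpha * x

def distilled_student_step(x, alpha=0.9):
    """A distilled step trained to match TWO teacher steps at once.
    For this linear toy the optimal student is exact: alpha**2 * x."""
    return (alpha ** 2) * x
```

Applying the student N/2 times reproduces N teacher steps, which is the source of the inference speedup reported above.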

Can Unstructured Pruning Reduce the Depth in Deep Neural Networks?

  • paper_url: http://arxiv.org/abs/2308.06619
  • repo_url: None
  • paper_authors: Zhu Liao, Victor Quétu, Van-Tam Nguyen, Enzo Tartaglione
  • for: Reducing the size of deep neural networks while maintaining their performance.
  • methods: Proposes the Entropy Guided Pruning (EGP) algorithm, which prioritizes pruning connections in layers with low entropy, ultimately leading to their complete removal.
  • results: Extensive experiments on popular models such as ResNet-18 and Swin-T show that EGP effectively compresses deep neural networks while maintaining competitive performance.
    Abstract Pruning is a widely used technique for reducing the size of deep neural networks while maintaining their performance. However, such a technique, despite being able to massively compress deep models, is hardly able to remove entire layers from a model (even when structured): is this an addressable task? In this study, we introduce EGP, an innovative Entropy Guided Pruning algorithm aimed at reducing the size of deep neural networks while preserving their performance. The key focus of EGP is to prioritize pruning connections in layers with low entropy, ultimately leading to their complete removal. Through extensive experiments conducted on popular models like ResNet-18 and Swin-T, our findings demonstrate that EGP effectively compresses deep neural networks while maintaining competitive performance levels. Our results not only shed light on the underlying mechanism behind the advantages of unstructured pruning, but also pave the way for further investigations into the intricate relationship between entropy, pruning techniques, and deep learning performance. The EGP algorithm and its insights hold great promise for advancing the field of network compression and optimization. The source code for EGP is released open-source.
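The "prune low-entropy layers first" heuristic can be sketched with a Shannon entropy over each layer's normalized absolute weights: a layer whose mass concentrates on a few connections carries little entropy and is the first pruning target. This is one plausible reading of layer entropy for illustration; the paper's exact definition may differ.

```python
import math

def layer_entropy(weights):
    """Shannon entropy of the normalized absolute weights of one layer.
    Uniform weights maximize it; a few dominant weights minimize it."""
    total = sum(abs(w) for w in weights)
    entropy = 0.0
    for w in weights:
        p = abs(w) / total
        if p > 0:
            entropy -= p * math.log(p)
    return entropy

def lowest_entropy_layer(layers):
    """Index of the layer an entropy-guided pruner would target first."""
    return min(range(len(layers)), key=lambda i: layer_entropy(layers[i]))
```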

On the Interplay of Convolutional Padding and Adversarial Robustness

  • paper_url: http://arxiv.org/abs/2308.06612
  • repo_url: None
  • paper_authors: Paul Gavrikov, Janis Keuper
  • for: Analyzes the interplay between convolutional padding and adversarial attacks, and how different padding modes (or their absence) affect adversarial robustness.
  • methods: Studies Convolutional Neural Networks (CNNs) in which the input is padded to preserve the resolution of the feature maps.
  • results: Finds that adversarial attacks often produce perturbation anomalies at the image boundaries, precisely the areas where padding is applied.
    Abstract It is common practice to apply padding prior to convolution operations to preserve the resolution of feature-maps in Convolutional Neural Networks (CNN). While many alternatives exist, this is often achieved by adding a border of zeros around the inputs. In this work, we show that adversarial attacks often result in perturbation anomalies at the image boundaries, which are the areas where padding is used. Consequently, we aim to provide an analysis of the interplay between padding and adversarial attacks and seek an answer to the question of how different padding modes (or their absence) affect adversarial robustness in various scenarios.
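Why the boundary is special is easy to see in one dimension: with "same" zero padding the output keeps the input length, but the first and last outputs mix real samples with the artificial zeros, so they behave differently from interior outputs. A minimal stdlib sketch:

```python
def conv1d(signal, kernel, pad=0):
    """Valid 1-D convolution (cross-correlation) after zero-padding both
    ends. pad = len(kernel)//2 gives 'same' output length for odd kernels,
    at the cost of boundary outputs that partly see the padding zeros."""
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    k = len(kernel)
    return [sum(padded[i + j] * kernel[j] for j in range(k))
            for i in range(len(padded) - k + 1)]
```

On a constant signal the interior outputs are all equal while the padded boundary outputs drop, which is the kind of edge asymmetry the paper links to boundary perturbation anomalies.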

Bio-SIEVE: Exploring Instruction Tuning Large Language Models for Systematic Review Automation

  • paper_url: http://arxiv.org/abs/2308.06610
  • repo_url: https://github.com/ambroser53/bio-sieve
  • paper_authors: Ambrose Robinson, William Thorne, Ben P. Wu, Abdullah Pandor, Munira Essat, Mark Stevenson, Xingyi Song
  • for: Explores how Large Language Models (LLMs) can support, and be trained to perform, literature screening when provided with a detailed set of selection criteria.
  • methods: Instruction-tunes LLaMA and Guanaco models to perform abstract screening for medical systematic reviews.
  • results: The best model, Bio-SIEVE, outperforms both ChatGPT and trained traditional approaches and generalizes better across medical domains, though adapting the model to safety-first scenarios remains a challenge. Multi-task training (Bio-SIEVE-Multi, adding tasks such as PICO extraction and exclusion reasoning) fails to match single-task Bio-SIEVE's performance.
    Abstract Medical systematic reviews can be very costly and resource intensive. We explore how Large Language Models (LLMs) can support and be trained to perform literature screening when provided with a detailed set of selection criteria. Specifically, we instruction tune LLaMA and Guanaco models to perform abstract screening for medical systematic reviews. Our best model, Bio-SIEVE, outperforms both ChatGPT and trained traditional approaches, and generalises better across medical domains. However, there remains the challenge of adapting the model to safety-first scenarios. We also explore the impact of multi-task training with Bio-SIEVE-Multi, including tasks such as PICO extraction and exclusion reasoning, but find that it is unable to match single-task Bio-SIEVE's performance. We see Bio-SIEVE as an important step towards specialising LLMs for the biomedical systematic review process and explore its future developmental opportunities. We release our models, code and a list of DOIs to reconstruct our dataset for reproducibility.