cs.LG - 2023-08-13

Faithful to Whom? Questioning Interpretability Measures in NLP

  • paper_url: http://arxiv.org/abs/2308.06795
  • repo_url: None
  • paper_authors: Evan Crothers, Herna Viktor, Nathalie Japkowicz
  • for: This paper evaluates the interpretability of different neural text classifiers.
  • methods: The paper uses faithfulness metrics based on iterative input masking and shows that such metrics are generally unsuitable for comparing the interpretability of different models.
  • results: Iterative masking produces large variation in faithfulness scores between comparable models, and masked samples frequently fall outside the distribution seen during training.
    Abstract A common approach to quantifying model interpretability is to calculate faithfulness metrics based on iteratively masking input tokens and measuring how much the predicted label changes as a result. However, we show that such metrics are generally not suitable for comparing the interpretability of different neural text classifiers as the response to masked inputs is highly model-specific. We demonstrate that iterative masking can produce large variation in faithfulness scores between comparable models, and show that masked samples are frequently outside the distribution seen during training. We further investigate the impact of adversarial attacks and adversarial training on faithfulness scores, and demonstrate the relevance of faithfulness measures for analyzing feature salience in text adversarial attacks. Our findings provide new insights into the limitations of current faithfulness metrics and key considerations to utilize them appropriately.
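The iterative-masking procedure behind these faithfulness metrics is easy to sketch. Below is a minimal, hedged illustration of a "comprehensiveness"-style score: mask tokens in decreasing attribution order and average the resulting drops in the predicted class probability. `predict_proba`, `attribution`, and the `[MASK]` token are hypothetical placeholders for a real classifier, saliency method, and tokenizer.

```python
# Hypothetical stand-ins: `predict_proba` maps a token list to class
# probabilities, `attribution` gives a saliency score per token, and
# "[MASK]" is the masking token of the model's tokenizer.

def faithfulness_score(tokens, attribution, predict_proba, label, mask="[MASK]"):
    base = predict_proba(tokens)[label]
    # Mask tokens from most to least important, accumulating probability drops.
    order = sorted(range(len(tokens)), key=lambda i: -attribution[i])
    masked, drops = list(tokens), []
    for i in order:
        masked[i] = mask
        drops.append(base - predict_proba(masked)[label])
    # Higher average drop = the explanation pointed at influential tokens.
    return sum(drops) / len(drops)
```

As the abstract warns, `predict_proba` is evaluated here on masked inputs that may lie far outside the training distribution, which is exactly why such scores are model-specific and hard to compare across models.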

Neural Networks at a Fraction with Pruned Quaternions

  • paper_url: http://arxiv.org/abs/2308.06780
  • repo_url: https://github.com/smlab-niser/quartLT22
  • paper_authors: Sahel Mohammad Iqbal, Subhankar Mishra
  • for: This work targets prediction with compact neural networks in extremely resource-constrained environments.
  • methods: The study prunes networks to reduce the parameter count and uses higher-dimensional data embeddings (quaternions) to maintain accuracy.
  • results: For some architectures and datasets, pruned quaternion networks outperform real-valued networks of the same architecture; e.g., with Conv-4 on CIFAR-10 at $3\%$ of the original parameter count, the pruned quaternion model outperforms the pruned real model by more than $10\%$.
    Abstract Contemporary state-of-the-art neural networks have increasingly large numbers of parameters, which prevents their deployment on devices with limited computational power. Pruning is one technique to remove unnecessary weights and reduce resource requirements for training and inference. In addition, for ML tasks where the input data is multi-dimensional, using higher-dimensional data embeddings such as complex numbers or quaternions has been shown to reduce the parameter count while maintaining accuracy. In this work, we conduct pruning on real and quaternion-valued implementations of different architectures on classification tasks. We find that for some architectures, at very high sparsity levels, quaternion models provide higher accuracies than their real counterparts. For example, at the task of image classification on CIFAR-10 using Conv-4, at $3\%$ of the number of parameters as the original model, the pruned quaternion version outperforms the pruned real by more than $10\%$. Experiments on various network architectures and datasets show that for deployment in extremely resource-constrained environments, a sparse quaternion network might be a better candidate than a real sparse model of similar architecture.
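As a concrete reference point, here is a minimal sketch of unstructured global magnitude pruning, the standard way to reach a target sparsity such as the $3\%$ setting above. The paper's quaternion-valued layers are not reproduced; only the pruning step itself is shown.

```python
import numpy as np

def magnitude_prune(weights, keep_fraction=0.03):
    """Zero all but the largest-magnitude `keep_fraction` of weights,
    pooled globally across layers (unstructured pruning)."""
    flat = np.concatenate([w.ravel() for w in weights])
    k = max(1, int(keep_fraction * flat.size))
    threshold = np.partition(np.abs(flat), -k)[-k]     # k-th largest magnitude
    return [w * (np.abs(w) >= threshold) for w in weights]

layers = [np.random.randn(64, 128), np.random.randn(128, 10)]
pruned = magnitude_prune(layers, keep_fraction=0.03)   # keep ~3% of parameters
```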

A Survey on Deep Neural Network Pruning-Taxonomy, Comparison, Analysis, and Recommendations

  • paper_url: http://arxiv.org/abs/2308.06767
  • repo_url: https://github.com/hrcheng1066/awesome-pruning
  • paper_authors: Hongrong Cheng, Miao Zhang, Javen Qinfeng Shi
  • for: This paper provides an up-to-date survey of deep neural network pruning, including for recent large language models, with critical comparison and evaluation of pruning methods.
  • methods: The survey organizes existing pruning work into a taxonomy of universal/specific speedup, when to prune, how to prune, and the fusion of pruning with other compression techniques.
  • results: The paper offers a comparative analysis of seven pairs of contrast settings for pruning and explores emerging topics such as post-training pruning, different levels of supervision for pruning, and broader applications.
    Abstract Modern deep neural networks, particularly recent large language models, come with massive model sizes that require significant computational and storage resources. To enable the deployment of modern models on resource-constrained environments and accelerate inference time, researchers have increasingly explored pruning techniques as a popular research direction in neural network compression. However, there is a dearth of up-to-date comprehensive review papers on pruning. To address this issue, in this survey, we provide a comprehensive review of existing research works on deep neural network pruning in a taxonomy of 1) universal/specific speedup, 2) when to prune, 3) how to prune, and 4) fusion of pruning and other compression techniques. We then provide a thorough comparative analysis of seven pairs of contrast settings for pruning (e.g., unstructured/structured) and explore emerging topics, including post-training pruning, different levels of supervision for pruning, and broader applications (e.g., adversarial robustness) to shed light on the commonalities and differences of existing methods and lay the foundation for further method development. To facilitate future research, we build a curated collection of datasets, networks, and evaluations on different applications. Finally, we provide some valuable recommendations on selecting pruning methods and prospect promising research directions. We build a repository at https://github.com/hrcheng1066/awesome-pruning.

Conic Descent Redux for Memory-Efficient Optimization

  • paper_url: http://arxiv.org/abs/2308.07343
  • repo_url: None
  • paper_authors: Bingcong Li, Georgios B. Giannakis
  • for: This work revisits a recently developed first-order conic descent (CD) solver and advances it in three aspects: intuition, theory, and algorithmic implementation.
  • methods: The study shows that CD admits an intuitive geometric derivation originating from the dual problem, opening the door to new algorithmic designs, exemplified by a momentum variant of CD (MOCO). Analyzing the dual behavior of CD and MOCO reveals: i) an analytically justified stopping criterion; and ii) the potential to design preconditioners to accelerate dual convergence.
  • results: Finally, a memory-efficient MOCO variant is developed to scale semidefinite programming (SDP), especially for low-rank solutions; numerical validation shows that it solves SDPs quickly and accurately.
    Abstract Conic programming has well-documented merits in a gamut of signal processing and machine learning tasks. This contribution revisits a recently developed first-order conic descent (CD) solver, and advances it in three aspects: intuition, theory, and algorithmic implementation. It is found that CD can afford an intuitive geometric derivation that originates from the dual problem. This opens the door to novel algorithmic designs, with a momentum variant of CD, momentum conic descent (MOCO) exemplified. Diving deeper into the dual behavior of CD and MOCO reveals: i) an analytically justified stopping criterion; and, ii) the potential to design preconditioners to speed up dual convergence. Lastly, to scale semidefinite programming (SDP) especially for low-rank solutions, a memory efficient MOCO variant is developed and numerically validated.

Few-shot Class-incremental Learning: A Survey

  • paper_url: http://arxiv.org/abs/2308.06764
  • repo_url: None
  • paper_authors: Jinghua Zhang, Li Liu, Olli Silven, Matti Pietikäinen, Dewen Hu
  • for: This paper provides a systematic and in-depth review of Few-shot Class-Incremental Learning (FSCIL), covering the problem definition, primary challenges, general schemes, related problems, benchmark datasets, and evaluation metrics.
  • methods: The survey summarizes common FSCIL classification approaches (data-based, structure-based, and optimization-based) and FSCIL object detection approaches (anchor-free and anchor-based).
  • results: The paper highlights promising research directions within FSCIL, spanning data-based, structure-based, and optimization-based methods, that merit further investigation.
    Abstract Few-shot Class-Incremental Learning (FSCIL) presents a unique challenge in machine learning, as it necessitates the continuous learning of new classes from sparse labeled training samples without forgetting previous knowledge. While this field has seen recent progress, it remains an active area of exploration. This paper aims to provide a comprehensive and systematic review of FSCIL. In our in-depth examination, we delve into various facets of FSCIL, encompassing the problem definition, the discussion of primary challenges of unreliable empirical risk minimization and the stability-plasticity dilemma, general schemes, and relevant problems of incremental learning and few-shot learning. Besides, we offer an overview of benchmark datasets and evaluation metrics. Furthermore, we introduce the classification methods in FSCIL from data-based, structure-based, and optimization-based approaches and the object detection methods in FSCIL from anchor-free and anchor-based approaches. Beyond these, we illuminate several promising research directions within FSCIL that merit further investigation.

Discovering the Symptom Patterns of COVID-19 from Recovered and Deceased Patients Using Apriori Association Rule Mining

  • paper_url: http://arxiv.org/abs/2308.06763
  • repo_url: None
  • paper_authors: Mohammad Dehghani, Zahra Yazdanparast, Mobin Mohammadi
  • for: This study mines symptom patterns of COVID-19 patients to help clinicians better diagnose and treat the disease.
  • methods: The study applies Apriori-based association rule mining to extract the most common symptoms from clinical data of COVID-19 patients.
  • results: The most common symptoms of COVID-19 patients were apnea (72%), cough (64%), fever (59%), weakness (18%), myalgia (14.5%), and sore throat (12%).
    Abstract The COVID-19 pandemic has a devastating impact globally, claiming millions of lives and causing significant social and economic disruptions. In order to optimize decision-making and allocate limited resources, it is essential to identify COVID-19 symptoms and determine the severity of each case. Machine learning algorithms offer a potent tool in the medical field, particularly in mining clinical datasets for useful information and guiding scientific decisions. Association rule mining is a machine learning technique for extracting hidden patterns from data. This paper presents an application of association rule mining based Apriori algorithm to discover symptom patterns from COVID-19 patients. The study, using 2875 patient records, identified the most common symptoms as apnea (72%), cough (64%), fever (59%), weakness (18%), myalgia (14.5%), and sore throat (12%). The proposed method provides clinicians with valuable insight into the disease that can assist them in managing and treating it effectively.
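For readers unfamiliar with the method, here is a minimal sketch of Apriori-based association rule mining over one-hot symptom records using the `mlxtend` library (API as of roughly v0.22 is assumed); the toy records and thresholds are illustrative, not the paper's 2875-patient dataset.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Toy one-hot symptom records (illustrative only, not the paper's data).
records = pd.DataFrame([
    {"apnea": 1, "cough": 1, "fever": 1, "weakness": 0},
    {"apnea": 1, "cough": 1, "fever": 0, "weakness": 0},
    {"apnea": 0, "cough": 1, "fever": 1, "weakness": 1},
    {"apnea": 1, "cough": 0, "fever": 1, "weakness": 0},
]).astype(bool)

# Frequent symptom sets above a minimum support threshold...
itemsets = apriori(records, min_support=0.5, use_colnames=True)
# ...and association rules between them, ranked by confidence.
rules = association_rules(itemsets, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```

The single-item supports produced this way correspond directly to the symptom frequencies reported in the results above.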

Heterogeneous Multi-Agent Reinforcement Learning via Mirror Descent Policy Optimization

  • paper_url: http://arxiv.org/abs/2308.06741
  • repo_url: None
  • paper_authors: Mohammad Mehdi Nasiri, Mansoor Rezghi
  • for: This work addresses cooperative Multi-Agent Reinforcement Learning (MARL) settings in which agents have varying abilities and individual policies.
  • methods: The proposed Heterogeneous-Agent Mirror Descent Policy Optimization (HAMDPO) algorithm uses the multi-agent advantage decomposition lemma to update each agent's policy efficiently while ensuring overall performance improvement. HAMDPO iteratively updates agent policies via an approximate solution of the trust-region problem, guaranteeing stability and improved performance.
  • results: On Multi-Agent MuJoCo and StarCraftII tasks, HAMDPO outperforms state-of-the-art algorithms such as HATRPO and HAPPO, achieving both stability and performance gains. These results suggest HAMDPO is a promising approach for cooperative MARL that may extend to other challenging MARL problems.
    Abstract This paper presents an extension of the Mirror Descent method to overcome challenges in cooperative Multi-Agent Reinforcement Learning (MARL) settings, where agents have varying abilities and individual policies. The proposed Heterogeneous-Agent Mirror Descent Policy Optimization (HAMDPO) algorithm utilizes the multi-agent advantage decomposition lemma to enable efficient policy updates for each agent while ensuring overall performance improvements. By iteratively updating agent policies through an approximate solution of the trust-region problem, HAMDPO guarantees stability and improves performance. Moreover, the HAMDPO algorithm is capable of handling both continuous and discrete action spaces for heterogeneous agents in various MARL problems. We evaluate HAMDPO on Multi-Agent MuJoCo and StarCraftII tasks, demonstrating its superiority over state-of-the-art algorithms such as HATRPO and HAPPO. These results suggest that HAMDPO is a promising approach for solving cooperative MARL problems and could potentially be extended to address other challenging problems in the field of MARL.
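For context, the single-agent mirror descent policy update that HAMDPO generalizes agent-by-agent can be written as a KL-regularized trust-region step (a standard formulation, not copied from the paper):

$$\pi_{k+1} = \arg\max_{\pi}\; \mathbb{E}_{s \sim \rho_{\pi_k},\, a \sim \pi(\cdot \mid s)}\big[ A^{\pi_k}(s,a) \big] \;-\; \frac{1}{\eta_k}\, \mathbb{E}_{s \sim \rho_{\pi_k}}\big[ D_{\mathrm{KL}}\big(\pi(\cdot \mid s) \,\|\, \pi_k(\cdot \mid s)\big) \big]$$

HAMDPO applies an update of this form to each heterogeneous agent in sequence, with the multi-agent advantage decomposition lemma ensuring that the per-agent improvements add up to a joint performance guarantee.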

Weighted Sparse Partial Least Squares for Joint Sample and Feature Selection

  • paper_url: http://arxiv.org/abs/2308.06740
  • repo_url: https://github.com/wenwenmin/wspls
  • paper_authors: Wenwen Min, Taosheng Xu, Chris Ding
  • for: This work broadens the applicability of sPLS by detecting sparse structure while selecting a specific subset of samples and removing outliers.
  • methods: The study proposes an $\ell_\infty/\ell_0$-norm constrained weighted sparse PLS (wsPLS) method for joint sample and feature selection, where the $\ell_\infty/\ell_0$-norm constraints select a subset of samples; two multi-view wsPLS models extend the approach to fusing multiple datasets that share the same samples.
  • results: Numerical and biomedical experiments show that the proposed methods reduce data dimensionality and improve the robustness and accuracy of data fusion.
    Abstract Sparse Partial Least Squares (sPLS) is a common dimensionality reduction technique for data fusion, which projects data samples from two views by seeking linear combinations with a small number of variables with the maximum variance. However, sPLS extracts the combinations between two data sets with all data samples so that it cannot detect latent subsets of samples. To extend the application of sPLS by identifying a specific subset of samples and removing outliers, we propose an $\ell_\infty/\ell_0$-norm constrained weighted sparse PLS ($\ell_\infty/\ell_0$-wsPLS) method for joint sample and feature selection, where the $\ell_\infty/\ell_0$-norm constraints are used to select a subset of samples. We prove that the $\ell_\infty/\ell_0$-norm constraints have the Kurdyka-Łojasiewicz property so that a globally convergent algorithm is developed to solve it. Moreover, multi-view data with the same set of samples can be available in various real problems. To this end, we extend the $\ell_\infty/\ell_0$-wsPLS model and propose two multi-view wsPLS models for multi-view data fusion. We develop an efficient iterative algorithm for each multi-view wsPLS model and show its convergence property. Numerical and biomedical data experiments demonstrate the efficiency of the proposed methods.
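The alternating, hard-thresholded power iteration that underlies sparse PLS is short enough to sketch; the numpy code below maximizes the covariance $u^\top X^\top Y v$ under $\ell_0$ constraints on $u$ and $v$. It omits the sample weighting that distinguishes wsPLS, and the sparsity levels are illustrative.

```python
import numpy as np

def hard_threshold(w, k):
    """Keep the k largest-magnitude entries of w, zero the rest, renormalize."""
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-k:]
    out[idx] = w[idx]
    return out / (np.linalg.norm(out) + 1e-12)

def sparse_pls(X, Y, ku, kv, iters=100):
    """One pair of sparse weight vectors (u, v) maximizing cov(X u, Y v)."""
    C = X.T @ Y                        # cross-covariance, (dX x dY)
    v = np.linalg.svd(C)[2][0]         # init from leading right singular vector
    for _ in range(iters):
        u = hard_threshold(C @ v, ku)  # sparse update for the X-view weights
        v = hard_threshold(C.T @ u, kv)  # sparse update for the Y-view weights
    return u, v

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(50, 20)), rng.normal(size=(50, 8))
u, v = sparse_pls(X - X.mean(0), Y - Y.mean(0), ku=5, kv=3)
```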

Probabilistic Imputation for Time-series Classification with Missing Data

  • paper_url: http://arxiv.org/abs/2308.06738
  • repo_url: https://github.com/yuneg11/SupNotMIWAE-with-ObsDropout
  • paper_authors: SeungHyun Kim, Hyunsu Kim, EungGu Yun, Hwangrae Lee, Jaehun Lee, Juho Lee
  • for: This paper addresses missing values in multivariate time-series data.
  • methods: The authors propose a probabilistic framework with two parts: a deep generative model that imputes missing values in multiple plausible ways, and a classifier trained on the time series together with the imputed values. The generative model is extended to better capture the structure of time-series data.
  • results: Real-world experiments show that the method handles missing values in multivariate time series effectively and yields better predictions.
    Abstract Multivariate time series data for real-world applications typically contain a significant amount of missing values. The dominant approach for classification with such missing values is to impute them heuristically with specific values (zero, mean, values of adjacent time-steps) or learnable parameters. However, these simple strategies do not take the data generative process into account, and more importantly, do not effectively capture the uncertainty in prediction due to the multiple possibilities for the missing values. In this paper, we propose a novel probabilistic framework for classification with multivariate time series data with missing values. Our model consists of two parts; a deep generative model for missing value imputation and a classifier. Extending the existing deep generative models to better capture structures of time-series data, our deep generative model part is trained to impute the missing values in multiple plausible ways, effectively modeling the uncertainty of the imputation. The classifier part takes the time series data along with the imputed missing values and classifies signals, and is trained to capture the predictive uncertainty due to the multiple possibilities of imputations. Importantly, we show that naïvely combining the generative model and the classifier could result in trivial solutions where the generative model does not produce meaningful imputations. To resolve this, we present a novel regularization technique that can promote the model to produce useful imputation values that help classification. Through extensive experiments on real-world time series data with missing values, we demonstrate the effectiveness of our method.
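The inference-time idea can be sketched compactly: draw several plausible imputations from the generative model and average the classifier's predictive probabilities over them. `imputer.sample` and `classifier.predict_proba` below are hypothetical placeholder APIs, not the paper's implementation.

```python
import numpy as np

def predict_with_imputation(x, mask, imputer, classifier, n_samples=10):
    """Average class probabilities over multiple plausible imputations.

    x:    time series with missing entries; mask: True where x is observed.
    `imputer.sample(x, mask)` and `classifier.predict_proba(x_full)` are
    hypothetical APIs standing in for the paper's generative model and
    classifier.
    """
    probs = [classifier.predict_proba(imputer.sample(x, mask))
             for _ in range(n_samples)]
    # The spread across draws reflects imputation uncertainty;
    # the mean is the final prediction.
    return np.mean(probs, axis=0)
```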

Precipitation nowcasting with generative diffusion models

  • paper_url: http://arxiv.org/abs/2308.06733
  • repo_url: https://github.com/fmerizzi/Precipitation-nowcasting-with-generative-diffusion-models
  • paper_authors: Andrea Asperti, Fabio Merizzi, Alberto Paparella, Giorgio Pedrazzi, Matteo Angelinelli, Stefano Colamonaco
  • for: This study tests the accuracy of deep learning methods for weather forecasting, focusing on precipitation nowcasting.
  • methods: The study evaluates several classes of deep generative models, including Generative Adversarial Networks, Variational Autoencoders, and denoising diffusion models.
  • results: The proposed Generative Ensemble Diffusion (GED) model delivers higher accuracy for precipitation nowcasting than established deep learning models.
    Abstract In recent years traditional numerical methods for accurate weather prediction have been increasingly challenged by deep learning methods. Numerous historical datasets used for short and medium-range weather forecasts are typically organized into a regular spatial grid structure. This arrangement closely resembles images: each weather variable can be visualized as a map or, when considering the temporal axis, as a video. Several classes of generative models, comprising Generative Adversarial Networks, Variational Autoencoders, or the recent Denoising Diffusion Models have largely proved their applicability to the next-frame prediction problem, and it is thus natural to test their performance on the weather prediction benchmarks. Diffusion models are particularly appealing in this context, due to the intrinsically probabilistic nature of weather forecasting: what we are really interested to model is the probability distribution of weather indicators, whose expected value is the most likely prediction. In our study, we focus on a specific subset of the ERA-5 dataset, which includes hourly data pertaining to Central Europe from the years 2016 to 2021. Within this context, we examine the efficacy of diffusion models in handling the task of precipitation nowcasting. Our work is conducted in comparison to the performance of well-established U-Net models, as documented in the existing literature. Our proposed approach of Generative Ensemble Diffusion (GED) utilizes a diffusion model to generate a set of possible weather scenarios which are then amalgamated into a probable prediction via the use of a post-processing network. This approach, in comparison to recent deep learning models, substantially outperformed them in terms of overall performance.

Generalized Independent Noise Condition for Estimating Causal Structure with Latent Variables

  • paper_url: http://arxiv.org/abs/2308.06718
  • repo_url: None
  • paper_authors: Feng Xie, Biwei Huang, Zhengming Chen, Ruichu Cai, Clark Glymour, Zhi Geng, Kun Zhang
  • for: The paper addresses learning causal structure in the presence of latent variables, including locating latent variables and determining their quantity, and identifying causal relationships among both latent and observed variables.
  • methods: The paper proposes a Generalized Independent Noise (GIN) condition for linear non-Gaussian acyclic causal models that incorporate latent variables, which establishes the independence between a linear combination of certain measured variables and some other measured variables. The paper also provides necessary and sufficient graphical criteria of the GIN condition in linear non-Gaussian acyclic causal models.
  • results: The paper shows that the proposed GIN condition, together with a well-designed search procedure, can be used to efficiently estimate Linear, Non-Gaussian Latent Hierarchical Models (LiNGLaHs), where latent confounders may also be causally related and may even follow a hierarchical structure. The paper also demonstrates the effectiveness of the proposed approach through experimental results.
    Abstract We investigate the challenging task of learning causal structure in the presence of latent variables, including locating latent variables and determining their quantity, and identifying causal relationships among both latent and observed variables. To address this, we propose a Generalized Independent Noise (GIN) condition for linear non-Gaussian acyclic causal models that incorporate latent variables, which establishes the independence between a linear combination of certain measured variables and some other measured variables. Specifically, for two observed random vectors $\bf{Y}$ and $\bf{Z}$, GIN holds if and only if $\omega^{\intercal}\mathbf{Y}$ and $\mathbf{Z}$ are independent, where $\omega$ is a non-zero parameter vector determined by the cross-covariance between $\mathbf{Y}$ and $\mathbf{Z}$. We then give necessary and sufficient graphical criteria of the GIN condition in linear non-Gaussian acyclic causal models. Roughly speaking, GIN implies the existence of an exogenous set $\mathcal{S}$ relative to the parent set of $\mathbf{Y}$ (w.r.t. the causal ordering), such that $\mathcal{S}$ d-separates $\mathbf{Y}$ from $\mathbf{Z}$. Interestingly, we find that the independent noise condition (i.e., if there is no confounder, causes are independent of the residual derived from regressing the effect on the causes) can be seen as a special case of GIN. With such a connection between GIN and latent causal structures, we further leverage the proposed GIN condition, together with a well-designed search procedure, to efficiently estimate Linear, Non-Gaussian Latent Hierarchical Models (LiNGLaHs), where latent confounders may also be causally related and may even follow a hierarchical structure. We show that the underlying causal structure of a LiNGLaH is identifiable in light of GIN conditions under mild assumptions. Experimental results show the effectiveness of the proposed approach.
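A minimal numpy sketch of testing GIN on data: solve for $\omega$ in the (left) null space of the cross-covariance between $\mathbf{Y}$ and $\mathbf{Z}$, then check whether $\omega^{\intercal}\mathbf{Y}$ is statistically unrelated to $\mathbf{Z}$. Plain correlation is used below as a cheap proxy; a faithful test would use an independence criterion such as HSIC, since uncorrelatedness is weaker than independence.

```python
import numpy as np

def gin_statistic(Y, Z):
    """Rough check of the GIN condition for samples Y (n x dY) and Z (n x dZ).

    Assumes dY > dZ so the left null space of the cross-covariance is
    nonempty. Returns the maximum absolute correlation between omega^T Y
    and the coordinates of Z; a value near zero is consistent with GIN.
    """
    Yc, Zc = Y - Y.mean(0), Z - Z.mean(0)
    C = Yc.T @ Zc / len(Y)                  # cross-covariance, dY x dZ
    U = np.linalg.svd(C)[0]
    omega = U[:, -1]                        # direction with omega^T C ~ 0
    e = Yc @ omega                          # the surrogate "noise" omega^T Y
    return max(abs(np.corrcoef(e, Zc[:, j])[0, 1]) for j in range(Z.shape[1]))
```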

Estimating and Incentivizing Imperfect-Knowledge Agents with Hidden Rewards

  • paper_url: http://arxiv.org/abs/2308.06717
  • repo_url: None
  • paper_authors: Ilgin Dogan, Zuo-Jun Max Shen, Anil Aswani
  • for: This paper explores a repeated adverse selection game between a self-interested learning agent and a learning principal in a setting where the principal cannot observe the agent's reward realizations.
  • methods: The paper models the agent's learning as a multi-armed bandit (MAB) problem and pairs it with a parallel algorithm that lets the principal consistently estimate the agent's unknown rewards while maximizing its own utility.
  • results: The paper proves finite-sample consistency of an estimator and a rigorous regret bound for the principal by considering the sequential externality imposed by the agent, and simulations justify the applicability of the framework to green energy aggregator contracts.
    Abstract In practice, incentive providers (i.e., principals) often cannot observe the reward realizations of incentivized agents, which is in contrast to many principal-agent models that have been previously studied. This information asymmetry challenges the principal to consistently estimate the agent's unknown rewards by solely watching the agent's decisions, which becomes even more challenging when the agent has to learn its own rewards. This complex setting is observed in various real-life scenarios ranging from renewable energy storage contracts to personalized healthcare incentives. Hence, it offers not only interesting theoretical questions but also wide practical relevance. This paper explores a repeated adverse selection game between a self-interested learning agent and a learning principal. The agent tackles a multi-armed bandit (MAB) problem to maximize their expected reward plus incentive. On top of the agent's learning, the principal trains a parallel algorithm and faces a trade-off between consistently estimating the agent's unknown rewards and maximizing their own utility by offering adaptive incentives to lead the agent. For a non-parametric model, we introduce an estimator whose only input is the history of principal's incentives and agent's choices. We unite this estimator with a proposed data-driven incentive policy within a MAB framework. Without restricting the type of the agent's algorithm, we prove finite-sample consistency of the estimator and a rigorous regret bound for the principal by considering the sequential externality imposed by the agent. Lastly, our theoretical results are reinforced by simulations justifying applicability of our framework to green energy aggregator contracts.

CDR: Conservative Doubly Robust Learning for Debiased Recommendation

  • paper_url: http://arxiv.org/abs/2308.08461
  • repo_url: None
  • paper_authors: ZiJie Song, JiaWei Chen, Sheng Zhou, QiHao Shi, Yan Feng, Chun Chen, Can Wang
  • for: Improving the robustness and performance of debiased recommendation systems.
  • methods: A Conservative Doubly Robust strategy (CDR) that filters imputations by scrutinizing their mean and variance, reducing the impact of poisonous imputation.
  • results: Comparative experiments show that CDR improves recommendation performance while reducing the frequency of poisonous imputation.
    Abstract In recommendation systems (RS), user behavior data is observational rather than experimental, resulting in widespread bias in the data. Consequently, tackling bias has emerged as a major challenge in the field of recommendation systems. Recently, Doubly Robust Learning (DR) has gained significant attention due to its remarkable performance and robust properties. However, our experimental findings indicate that existing DR methods are severely impacted by the presence of so-called Poisonous Imputation, where the imputation significantly deviates from the truth and becomes counterproductive. To address this issue, this work proposes Conservative Doubly Robust strategy (CDR) which filters imputations by scrutinizing their mean and variance. Theoretical analyses show that CDR offers reduced variance and improved tail bounds.In addition, our experimental investigations illustrate that CDR significantly enhances performance and can indeed reduce the frequency of poisonous imputation.

Learning on Graphs with Out-of-Distribution Nodes

  • paper_url: http://arxiv.org/abs/2308.06714
  • repo_url: https://github.com/songyyyy/kdd22-oodgat
  • paper_authors: Yu Song, Donglin Wang
  • for: This paper addresses graph learning with out-of-distribution (OOD) nodes: detecting nodes that do not belong to the known distribution and classifying the remaining nodes into the known classes.
  • methods: The paper proposes the Out-of-Distribution Graph Attention Network (OODGAT), a novel GNN model that explicitly models the interaction between different kinds of nodes and separates inliers from outliers during feature propagation.
  • results: Experiments show that OODGAT outperforms existing outlier detection methods by a large margin while being better than or comparable to existing methods on in-distribution classification.
    Abstract Graph Neural Networks (GNNs) are state-of-the-art models for performing prediction tasks on graphs. While existing GNNs have shown great performance on various tasks related to graphs, little attention has been paid to the scenario where out-of-distribution (OOD) nodes exist in the graph during training and inference. Borrowing the concept from CV and NLP, we define OOD nodes as nodes with labels unseen from the training set. Since a lot of networks are automatically constructed by programs, real-world graphs are often noisy and may contain nodes from unknown distributions. In this work, we define the problem of graph learning with out-of-distribution nodes. Specifically, we aim to accomplish two tasks: 1) detect nodes which do not belong to the known distribution and 2) classify the remaining nodes to be one of the known classes. We demonstrate that the connection patterns in graphs are informative for outlier detection, and propose Out-of-Distribution Graph Attention Network (OODGAT), a novel GNN model which explicitly models the interaction between different kinds of nodes and separate inliers from outliers during feature propagation. Extensive experiments show that OODGAT outperforms existing outlier detection methods by a large margin, while being better or comparable in terms of in-distribution classification.

The Hard-Constraint PINNs for Interface Optimal Control Problems

  • paper_url: http://arxiv.org/abs/2308.06709
  • repo_url: https://github.com/tianyouzeng/pinns-interface-optimal-control
  • paper_authors: Ming-Chih Lai, Yongcun Song, Xiaoming Yuan, Hangrui Yue, Tianyou Zeng
  • for: solves optimal control problems subject to partial differential equations (PDEs) with interfaces and some control constraints.
  • methods: combines physics-informed neural networks (PINNs) with recently developed discontinuity capturing neural networks to solve the problems.
  • results: guarantees that both the boundary and interface conditions can be satisfied exactly, and is efficient for elliptic and parabolic interface optimal control problems.
    Abstract We show that the physics-informed neural networks (PINNs), in combination with some recently developed discontinuity capturing neural networks, can be applied to solve optimal control problems subject to partial differential equations (PDEs) with interfaces and some control constraints. The resulting algorithm is mesh-free and scalable to different PDEs, and it ensures the control constraints rigorously. Since the boundary and interface conditions, as well as the PDEs, are all treated as soft constraints by lumping them into a weighted loss function, it is necessary to learn them simultaneously and there is no guarantee that the boundary and interface conditions can be satisfied exactly. This immediately causes difficulties in tuning the weights in the corresponding loss function and training the neural networks. To tackle these difficulties and guarantee the numerical accuracy, we propose to impose the boundary and interface conditions as hard constraints in PINNs by developing a novel neural network architecture. The resulting hard-constraint PINNs approach guarantees that both the boundary and interface conditions can be satisfied exactly and they are decoupled from the learning of the PDEs. Its efficiency is promisingly validated by some elliptic and parabolic interface optimal control problems.
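The hard-constraint idea in its classic one-dimensional form is easy to sketch: compose the network output with an ansatz $\hat{u}(x) = g(x) + \phi(x) N_\theta(x)$, where $g$ matches the boundary data and $\phi$ vanishes on the boundary, so the boundary condition holds exactly for any network weights. The paper's interface-aware architecture is more elaborate; the PyTorch snippet below shows only the principle, for $u(0)=a$, $u(1)=b$ on the unit interval.

```python
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)
a, b = 0.0, 1.0  # illustrative Dirichlet values u(0)=a, u(1)=b

def u_hat(x):
    g = a * (1 - x) + b * x   # any smooth function matching the boundary data
    phi = x * (1 - x)         # vanishes at x=0 and x=1
    # The ansatz satisfies the boundary condition exactly for ALL weights,
    # so the PDE residual is the only term left in the training loss.
    return g + phi * net(x)

x = torch.rand(128, 1, requires_grad=True)   # interior collocation points
u = u_hat(x)
du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]  # u'(x) for the residual
```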

Generating observation guided ensembles for data assimilation with denoising diffusion probabilistic model

  • paper_url: http://arxiv.org/abs/2308.06708
  • repo_url: https://github.com/yasahi-hpc/generative-enkf
  • paper_authors: Yuuichi Asahi, Yuta Hasegawa, Naoyuki Onodera, Takashi Shimokawabe, Hayato Shiba, Yasuhiro Idomura
  • for: This paper performs ensemble data assimilation using pseudo ensembles generated by a denoising diffusion probabilistic model.
  • methods: The model is trained on noisy and sparse observation data to generate multiple divergent ensemble members close to observations, and the variance across members is exploited for data assimilation.
  • results: The method outperforms a well-established ensemble data assimilation method when the simulation model is imperfect.
    Abstract This paper presents an ensemble data assimilation method using the pseudo ensembles generated by denoising diffusion probabilistic model. Since the model is trained against noisy and sparse observation data, this model can produce divergent ensembles close to observations. Thanks to the variance in generated ensembles, our proposed method displays better performance than the well-established ensemble data assimilation method when the simulation model is imperfect.

Understanding the robustness difference between stochastic gradient descent and adaptive gradient methods

  • paper_url: http://arxiv.org/abs/2308.06703
  • repo_url: None
  • paper_authors: Avery Ma, Yangchen Pan, Amir-massoud Farahmand
  • for: This paper studies the robustness of deep neural networks trained with stochastic gradient descent (SGD) versus adaptive gradient methods (Adam, RMSProp).
  • methods: Models are trained with SGD and with adaptive gradient methods, and their robustness to input perturbations is compared; the learning dynamics of GD and signGD are further analyzed on a synthetic dataset that mirrors natural signals.
  • results: On natural datasets, SGD-trained models are markedly more robust to input perturbations, while models trained with adaptive methods are sensitive to alterations of irrelevant frequencies; the difference is explained via learning-dynamics analysis and experiments on synthetic data.
    Abstract Stochastic gradient descent (SGD) and adaptive gradient methods, such as Adam and RMSProp, have been widely used in training deep neural networks. We empirically show that while the difference between the standard generalization performance of models trained using these methods is small, those trained using SGD exhibit far greater robustness under input perturbations. Notably, our investigation demonstrates the presence of irrelevant frequencies in natural datasets, where alterations do not affect models' generalization performance. However, models trained with adaptive methods show sensitivity to these changes, suggesting that their use of irrelevant frequencies can lead to solutions sensitive to perturbations. To better understand this difference, we study the learning dynamics of gradient descent (GD) and sign gradient descent (signGD) on a synthetic dataset that mirrors natural signals. With a three-dimensional input space, the models optimized with GD and signGD have standard risks close to zero but vary in their adversarial risks. Our result shows that linear models' robustness to $\ell_2$-norm bounded changes is inversely proportional to the model parameters' weight norm: a smaller weight norm implies better robustness. In the context of deep learning, our experiments show that SGD-trained neural networks show smaller Lipschitz constants, explaining the better robustness to input perturbations than those trained with adaptive gradient methods.
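The linear-model claim is simple to probe in a toy setting: in an overparameterized least-squares problem, GD from zero converges to a small-norm interpolator, while signGD (the sign-based update at the heart of Adam-like methods) spreads weight across irrelevant directions, typically yielding a larger weight norm and hence a larger Lipschitz constant. The numpy sketch below is illustrative; data sizes and step sizes are arbitrary choices, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 200                                   # overparameterized: d > n
X = rng.normal(size=(n, d))
w_star = np.zeros(d); w_star[:5] = 1.0           # only 5 relevant directions
y = X @ w_star

def train(update, lr, steps=2000):
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n             # squared-loss gradient
        w -= lr * update(grad)
    return w

w_gd = train(lambda g: g, lr=0.01)               # gradient descent
w_sign = train(np.sign, lr=0.001)                # sign gradient descent

# Both runs drive the training loss down, but the sign-based update spreads
# weight over irrelevant directions and typically ends with the larger norm.
print(np.linalg.norm(w_gd), np.linalg.norm(w_sign))
```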

Camouflaged Image Synthesis Is All You Need to Boost Camouflaged Detection

  • paper_url: http://arxiv.org/abs/2308.06701
  • repo_url: None
  • paper_authors: Haichao Zhang, Can Qin, Yu Yin, Yun Fu
  • for: Improving the ability of deep learning models to detect camouflaged objects.
  • methods: A generative model synthesizes realistic camouflage images, which are used to train and augment existing object detection models.
  • results: The approach outperforms the current state-of-the-art method on three datasets (COD10k, CAMO, and CHAMELEON), achieving higher detection accuracy.
    Abstract Camouflaged objects that blend into natural scenes pose significant challenges for deep-learning models to detect and synthesize. While camouflaged object detection is a crucial task in computer vision with diverse real-world applications, this research topic has been constrained by limited data availability. We propose a framework for synthesizing camouflage data to enhance the detection of camouflaged objects in natural scenes. Our approach employs a generative model to produce realistic camouflage images, which can be used to train existing object detection models. Specifically, we use a camouflage environment generator supervised by a camouflage distribution classifier to synthesize the camouflage images, which are then fed into our generator to expand the dataset. Our framework outperforms the current state-of-the-art method on three datasets (COD10k, CAMO, and CHAMELEON), demonstrating its effectiveness in improving camouflaged object detection. This approach can serve as a plug-and-play data generation and augmentation module for existing camouflaged object detection tasks and provides a novel way to introduce more diversity and distributions into current camouflage datasets.

SimMatchV2: Semi-Supervised Learning with Graph Consistency

  • paper_url: http://arxiv.org/abs/2308.06692
  • repo_url: https://github.com/mingkai-zheng/simmatchv2
  • paper_authors: Mingkai Zheng, Shan You, Lang Huang, Chen Luo, Fei Wang, Chen Qian, Chang Xu
  • for: This work proposes a new semi-supervised learning algorithm to reduce the need for human labeling.
  • methods: The algorithm, SimMatchV2, formulates several consistency regularizations between labeled and unlabeled data from a graph perspective: node-node, node-edge, edge-edge, and edge-node consistency.
  • results: Validated on multiple semi-supervised learning benchmarks: with a ResNet-50 backbone and 300 epochs of training, SimMatchV2 achieves 71.9% and 76.2% Top-1 accuracy with 1% and 10% labeled examples on ImageNet, outperforming previous methods and achieving state-of-the-art performance.
    Abstract Semi-Supervised image classification is one of the most fundamental problem in computer vision, which significantly reduces the need for human labor. In this paper, we introduce a new semi-supervised learning algorithm - SimMatchV2, which formulates various consistency regularizations between labeled and unlabeled data from the graph perspective. In SimMatchV2, we regard the augmented view of a sample as a node, which consists of a label and its corresponding representation. Different nodes are connected with the edges, which are measured by the similarity of the node representations. Inspired by the message passing and node classification in graph theory, we propose four types of consistencies, namely 1) node-node consistency, 2) node-edge consistency, 3) edge-edge consistency, and 4) edge-node consistency. We also uncover that a simple feature normalization can reduce the gaps of the feature norm between different augmented views, significantly improving the performance of SimMatchV2. Our SimMatchV2 has been validated on multiple semi-supervised learning benchmarks. Notably, with ResNet-50 as our backbone and 300 epochs of training, SimMatchV2 achieves 71.9\% and 76.2\% Top-1 Accuracy with 1\% and 10\% labeled examples on ImageNet, which significantly outperforms the previous methods and achieves state-of-the-art performance. Code and pre-trained models are available at \href{https://github.com/mingkai-zheng/SimMatchV2}{https://github.com/mingkai-zheng/SimMatchV2}.

MDB: Interactively Querying Datasets and Models

  • paper_url: http://arxiv.org/abs/2308.06686
  • repo_url: None
  • paper_authors: Aaditya Naik, Adam Stein, Yinjun Wu, Eric Wong, Mayur Naik
  • for: This paper presents a debugging framework that helps developers systematically debug errors in machine learning pipelines.
  • methods: MDB integrates functional programming with relational algebra to build expressive queries over a database of datasets and model predictions; queries are reusable and easily modified, letting debuggers rapidly iterate and refine queries to discover and characterize errors and model behaviors.
  • results: Experiments show that MDB enables up to 10x faster and 40% shorter queries than other baselines; in a user study, developers successfully constructed complex queries that describe errors of machine learning models.
    Abstract As models are trained and deployed, developers need to be able to systematically debug errors that emerge in the machine learning pipeline. We present MDB, a debugging framework for interactively querying datasets and models. MDB integrates functional programming with relational algebra to build expressive queries over a database of datasets and model predictions. Queries are reusable and easily modified, enabling debuggers to rapidly iterate and refine queries to discover and characterize errors and model behaviors. We evaluate MDB on object detection, bias discovery, image classification, and data imputation tasks across self-driving videos, large language models, and medical records. Our experiments show that MDB enables up to 10x faster and 40\% shorter queries than other baselines. In a user study, we find developers can successfully construct complex queries that describe errors of machine learning models.
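MDB's actual query language is not reproduced here, but the underlying idea, relational queries over a table of datasets and model predictions, can be illustrated with a hedged pandas sketch (the column names below are made up for illustration):

```python
import pandas as pd

# A toy table of per-example predictions; MDB builds a richer, reusable
# query layer over data of roughly this shape.
preds = pd.DataFrame({
    "example_id": [0, 1, 2, 3],
    "label":      ["car", "person", "car", "bike"],
    "predicted":  ["car", "car", "truck", "bike"],
    "confidence": [0.98, 0.91, 0.55, 0.87],
})

# Relational-style query: confident mispredictions, grouped by true label.
errors = preds[(preds.label != preds.predicted) & (preds.confidence > 0.9)]
print(errors.groupby("label").size())
```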

Separable Gaussian Neural Networks: Structure, Analysis, and Function Approximations

  • paper_url: http://arxiv.org/abs/2308.06679
  • repo_url: None
  • paper_authors: Siyuan Xing, Jianqiao Sun
  • for: This paper targets fast interpolation and classification of high-dimensional inputs and proposes a new feedforward network, the Separable Gaussian Neural Network (SGNN).
  • methods: SGNN exploits the separable property of Gaussian functions: input data are split into columns and fed into parallel layers of univariate Gaussians, reducing the neuron count from O(N^d) in GRBFNN to O(dN), so the cost grows linearly in the input dimension.
  • results: Experiments show that SGNN achieves a 100x speedup over GRBFNN on tri-variate function approximations at a similar level of accuracy, is easier to train and tune than DNNs with ReLU and Sigmoid activations, and is up to three orders of magnitude more accurate when approximating functions with complex geometry.
    Abstract The Gaussian-radial-basis function neural network (GRBFNN) has been a popular choice for interpolation and classification. However, it is computationally intensive when the dimension of the input vector is high. To address this issue, we propose a new feedforward network - Separable Gaussian Neural Network (SGNN) by taking advantage of the separable property of Gaussian functions, which splits input data into multiple columns and sequentially feeds them into parallel layers formed by uni-variate Gaussian functions. This structure reduces the number of neurons from O(N^d) of GRBFNN to O(dN), which exponentially improves the computational speed of SGNN and makes it scale linearly as the input dimension increases. In addition, SGNN can preserve the dominant subspace of the Hessian matrix of GRBFNN in gradient descent training, leading to a similar level of accuracy to GRBFNN. It is experimentally demonstrated that SGNN can achieve 100 times speedup with a similar level of accuracy over GRBFNN on tri-variate function approximations. The SGNN also has better trainability and is more tuning-friendly than DNNs with ReLU and Sigmoid functions. For approximating functions with complex geometry, SGNN can lead to three orders of magnitude more accurate results than a ReLU-DNN with twice the number of layers and the number of neurons per layer.
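The separability that SGNN exploits is the factorization of a $d$-variate Gaussian into a product of univariate Gaussians, so each input dimension can be handled by its own layer of $N$ univariate units ($O(dN)$ neurons) instead of materializing $N^d$ multivariate ones. A minimal numpy sketch of this factorized basis:

```python
import numpy as np

def univariate_gaussians(x, centers, sigma=1.0):
    """Evaluate N univariate Gaussians at the scalar inputs x: shape (n, N)."""
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * sigma**2))

# A multivariate Gaussian centered at c factorizes across dimensions:
#   exp(-||x - c||^2 / 2s^2) = prod_k exp(-(x_k - c_k)^2 / 2s^2),
# so SGNN can evaluate one small layer per input dimension.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # n samples, d = 3 dimensions
centers = np.linspace(-2, 2, 10)       # N = 10 centers per dimension

per_dim = [univariate_gaussians(X[:, k], centers) for k in range(X.shape[1])]
# 3 layers x 10 neurons = 30 univariate units, versus 10**3 = 1000
# multivariate Gaussians for the equivalent tensor-product basis.
```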

A deep learning framework for multi-scale models based on physics-informed neural networks

  • paper_url: http://arxiv.org/abs/2308.06672
  • repo_url: None
  • paper_authors: Yong Wang, Yanzhong Yao, Jiawei Guo, Zhiming Gao
  • for: Solving multi-scale problems whose loss terms differ by orders of magnitude.
  • methods: A physics-informed neural network (PINN) framework that combines deep neural networks with the solution of partial differential equations (PDEs).
  • results: The proposed framework reconstructs the loss function so that loss terms of different magnitudes can be optimized simultaneously, and a grouping regularization strategy handles problems that vary significantly across subdomains.
    Abstract Physics-informed neural networks (PINN) combine deep neural networks with the solution of partial differential equations (PDEs), creating a new and promising research area for numerically solving PDEs. Faced with a class of multi-scale problems that include loss terms of different orders of magnitude in the loss function, it is challenging for standard PINN methods to obtain an available prediction. In this paper, we propose a new framework for solving multi-scale problems by reconstructing the loss function. The framework is based on the standard PINN method, and it modifies the loss function of the standard PINN method by applying different numbers of power operations to the loss terms of different magnitudes, so that the individual loss terms composing the loss function have approximately the same order of magnitude among themselves. In addition, we give a grouping regularization strategy, and this strategy can deal well with the problem which varies significantly in different subdomains. The proposed method enables loss terms with different magnitudes to be optimized simultaneously, and it advances the application of PINN for multi-scale problems.
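The central device, raising loss terms of different magnitudes to different powers so they reach a common scale before summing, can be sketched as follows. The exponent rule below is one plausible reading chosen for illustration, not the paper's exact prescription.

```python
import math

def balanced_loss(terms, target=1e-2):
    """Sum loss terms after raising each to a power that moves its magnitude
    toward a common scale `target`. The exponent rule is illustrative; the
    paper chooses powers so the terms share the same order of magnitude."""
    total = 0.0
    for t in terms:
        if t > 0 and abs(math.log(t)) > 1e-8:
            # Solve t**p = target => p = log(target) / log(t); clip p to a
            # positive range so gradients keep their sign and stay moderate.
            p = min(max(math.log(target) / math.log(t), 0.25), 4.0)
        else:
            p = 1.0
        total += t ** p
    return total

# Small terms are lifted toward `target`; terms above 1 can only be damped,
# since the exponents are kept positive.
print(balanced_loss([1e-6, 1e-2, 10.0]))
```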

Law of Balance and Stationary Distribution of Stochastic Gradient Descent

  • paper_url: http://arxiv.org/abs/2308.06671
  • repo_url: None
  • paper_authors: Liu Ziyin, Hongchao Li, Masahito Ueda
  • for: This paper aims to understand how the stochastic gradient descent (SGD) algorithm navigates the highly nonlinear and degenerate loss landscape of a neural network.
  • methods: The paper uses theoretical analysis to prove that the minibatch noise of SGD regularizes the solution towards a balanced solution whenever the loss function contains a rescaling symmetry.
  • results: The paper derives the stationary distribution of stochastic gradient flow for a diagonal linear network with arbitrary depth and width, and shows that the stationary distribution exhibits complicated nonlinear phenomena such as phase transitions, broken ergodicity, and fluctuation inversion, which are unique to deep networks.
    Abstract The stochastic gradient descent (SGD) algorithm is the algorithm we use to train neural networks. However, it remains poorly understood how the SGD navigates the highly nonlinear and degenerate loss landscape of a neural network. In this work, we prove that the minibatch noise of SGD regularizes the solution towards a balanced solution whenever the loss function contains a rescaling symmetry. Because the difference between a simple diffusion process and SGD dynamics is the most significant when symmetries are present, our theory implies that the loss function symmetries constitute an essential probe of how SGD works. We then apply this result to derive the stationary distribution of stochastic gradient flow for a diagonal linear network with arbitrary depth and width. The stationary distribution exhibits complicated nonlinear phenomena such as phase transitions, broken ergodicity, and fluctuation inversion. These phenomena are shown to exist uniquely in deep networks, implying a fundamental difference between deep and shallow models.
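The rescaling-symmetry case admits a two-line simulation. For the loss $\ell = \tfrac{1}{2}(uv - y)^2$, which satisfies $\ell(u, v) = \ell(\lambda u, v/\lambda)$, one SGD step multiplies the imbalance $u^2 - v^2$ by $(1 - \eta^2 r^2)$, where $r = uv - y$ is the residual; noise keeps $r^2 > 0$ even at the optimum, so the iterates drift to the balanced manifold $|u| = |v|$. The Gaussian target noise below is a crude stand-in for real minibatch noise.

```python
import numpy as np

rng = np.random.default_rng(0)
u, v, lr = 3.0, 0.1, 0.05                  # deliberately imbalanced start

for _ in range(20000):
    y = 1.0 + 0.5 * rng.normal()           # noisy target: minibatch-noise stand-in
    r = u * v - y                           # residual of the loss 0.5 * (u*v - y)**2
    u, v = u - lr * r * v, v - lr * r * u   # one SGD step on (u, v)
    # Each step multiplies (u**2 - v**2) by (1 - lr**2 * r**2); the noise
    # keeps r**2 > 0 at the optimum, so the imbalance decays to zero.

print(u, v, u**2 - v**2)                    # balanced: |u| ~ |v|, u*v near 1
```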

Foundation Models in Smart Agriculture: Basics, Opportunities, and Challenges

  • paper_url: http://arxiv.org/abs/2308.06668
  • repo_url: https://github.com/jiajiali04/agriculture-foundation-models
  • paper_authors: Jiajia Li, Mingle Xu, Lirong Xiang, Dong Chen, Weichao Zhuang, Xunyuan Yin, Zhaojian Li
  • for: This study explores applying foundation models (FMs) to smart agriculture, a field so far dominated by conventional machine learning and deep learning methods.
  • methods: The paper first reviews recent FMs in computer science and categorizes them into four groups: language FMs, vision FMs, multimodal FMs, and reinforcement learning FMs; it then outlines the process of developing agricultural FMs and discusses their potential applications in smart agriculture.
  • results: FM-based approaches can reduce agricultural AI systems' reliance on extensive labeled data and improve their efficiency, effectiveness, and generalizability; the paper also discusses the challenges of developing agricultural FMs, including model training, validation, and deployment.
    Abstract The past decade has witnessed the rapid development of ML and DL methodologies in agricultural systems, showcased by great successes in variety of agricultural applications. However, these conventional ML/DL models have certain limitations: They heavily rely on large, costly-to-acquire labeled datasets for training, require specialized expertise for development and maintenance, and are mostly tailored for specific tasks, thus lacking generalizability. Recently, foundation models have demonstrated remarkable successes in language and vision tasks across various domains. These models are trained on a vast amount of data from multiple domains and modalities. Once trained, they can accomplish versatile tasks with just minor fine-tuning and minimal task-specific labeled data. Despite their proven effectiveness and huge potential, there has been little exploration of applying FMs to agriculture fields. Therefore, this study aims to explore the potential of FMs in the field of smart agriculture. In particular, we present conceptual tools and technical background to facilitate the understanding of the problem space and uncover new research directions in this field. To this end, we first review recent FMs in the general computer science domain and categorize them into four categories: language FMs, vision FMs, multimodal FMs, and reinforcement learning FMs. Subsequently, we outline the process of developing agriculture FMs and discuss their potential applications in smart agriculture. We also discuss the unique challenges associated with developing AFMs, including model training, validation, and deployment. Through this study, we contribute to the advancement of AI in agriculture by introducing AFMs as a promising paradigm that can significantly mitigate the reliance on extensive labeled datasets and enhance the efficiency, effectiveness, and generalization of agricultural AI systems.
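As a concrete illustration of the "minor fine-tuning with minimal task-specific labeled data" workflow the abstract describes, the sketch below linearly probes a frozen pretrained vision backbone on a small labeled batch. The crop-disease task, class count, and batch are hypothetical stand-ins, not from the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

# Frozen pretrained backbone + small task head: only the head is trained.
# The 5 crop-disease classes are a hypothetical downstream task.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
for p in backbone.parameters():
    p.requires_grad = False
backbone.fc = nn.Linear(backbone.fc.in_features, 5)  # only this layer trains

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One training step on a (hypothetical) small labeled batch
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 5, (8,))
loss = loss_fn(backbone(images), labels)
loss.backward()
optimizer.step()
```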

ALGAN: Time Series Anomaly Detection with Adjusted-LSTM GAN

  • paper_url: http://arxiv.org/abs/2308.06663
  • repo_url: None
  • paper_authors: Md Abul Bashar, Richi Nayak
  • for: Anomaly detection in time series data, specifically in univariate and multivariate datasets in an unsupervised setting.
  • methods: Proposes a new GAN model called Adjusted-LSTM GAN (ALGAN), which adjusts the output of an LSTM network for improved anomaly detection accuracy.
  • results: Outperforms traditional, neural-network-based, and other GAN-based methods for anomaly detection in time series data, as demonstrated through experiments on 46 real-world univariate time series datasets and a large multivariate dataset.
    Abstract Anomaly detection in time series data, to identify points that deviate from normal behaviour, is a common problem in various domains such as manufacturing, medical imaging, and cybersecurity. Recently, Generative Adversarial Networks (GANs) have been shown to be effective in detecting anomalies in time series data. The neural network architecture of GANs (i.e. Generator and Discriminator) can significantly improve anomaly detection accuracy. In this paper, we propose a new GAN model, named Adjusted-LSTM GAN (ALGAN), which adjusts the output of an LSTM network for improved anomaly detection in both univariate and multivariate time series data in an unsupervised setting. We evaluate the performance of ALGAN on 46 real-world univariate time series datasets and a large multivariate dataset that spans multiple domains. Our experiments demonstrate that ALGAN outperforms traditional, neural network-based, and other GAN-based methods for anomaly detection in time series data.
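The abstract does not spell out how ALGAN adjusts the LSTM output, so the sketch below shows only the generic LSTM-GAN scaffolding such a model builds on: an LSTM discriminator that scores time-series windows, an LSTM generator, and a simple discriminator-based anomaly score. All dimensions are assumptions.

```python
import torch
import torch.nn as nn

class LSTMDiscriminator(nn.Module):
    """Scores a window of time-series values as real (normal) vs. generated."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, x):                 # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # score from the last time step

class LSTMGenerator(nn.Module):
    """Maps a latent sequence to a synthetic time-series window."""
    def __init__(self, latent_dim, n_features, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(latent_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, z):
        h, _ = self.lstm(z)
        return self.out(h)

def anomaly_score(D, window):
    # A common GAN-based score: windows the discriminator finds unlikely
    # to be "normal" are flagged as anomalous.
    return 1.0 - D(window)

G = LSTMGenerator(latent_dim=8, n_features=1)
D = LSTMDiscriminator(n_features=1)
fake = G(torch.randn(2, 50, 8))                 # synthetic windows for GAN training
score = anomaly_score(D, torch.randn(2, 50, 1))
print(fake.shape, score.shape)                  # (2, 50, 1) (2, 1)
```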

Benign Shortcut for Debiasing: Fair Visual Recognition via Intervention with Shortcut Features

  • paper_url: http://arxiv.org/abs/2308.08482
  • repo_url: https://github.com/yiiizhang/shortcutDebiasing
  • paper_authors: Yi Zhang, Jitao Sang, Junyang Wang, Dongmei Jiang, Yaowei Wang
  • for: Mitigating bias in machine learning models, which poses fairness risks in societal applications such as hiring, banking, and criminal justice.
  • methods: Proposes a concise debiasing approach, Shortcut Debiasing, which first transfers the learning of bias attributes from bias features to controllable shortcut features, then uses causal intervention to eliminate the shortcut features during inference.
  • results: Applied to several benchmark datasets, achieving significant improvements over state-of-the-art debiasing methods in both accuracy and fairness.
    Abstract Machine learning models often learn to make predictions that rely on sensitive social attributes like gender and race, which poses significant fairness risks, especially in societal applications, such as hiring, banking, and criminal justice. Existing work tackles this issue by minimizing the employed information about social attributes in models for debiasing. However, the high correlation between target task and these social attributes makes learning on the target task incompatible with debiasing. Given that model bias arises due to the learning of bias features (i.e., gender) that help target task optimization, we explore the following research question: Can we leverage shortcut features to replace the role of bias features in target task optimization for debiasing? To this end, we propose Shortcut Debiasing, to first transfer the target task's learning of bias attributes from bias features to shortcut features, and then employ causal intervention to eliminate shortcut features during inference. The key idea of Shortcut Debiasing is to design controllable shortcut features to on one hand replace bias features in contributing to the target task during the training stage, and on the other hand be easily removed by intervention during the inference stage. This guarantees the learning of the target task does not hinder the elimination of bias features. We apply Shortcut Debiasing to several benchmark datasets, and achieve significant improvements over the state-of-the-art debiasing methods in both accuracy and fairness.
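One plausible way to realize the training-time shortcut and inference-time intervention is sketched below: a controllable shortcut embedding of the bias attribute is concatenated with the backbone features during training and zeroed out at inference. The embedding-based shortcut and all dimensions are our assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ShortcutClassifier(nn.Module):
    """Backbone features are concatenated with a controllable shortcut
    embedding of the bias attribute. Training lets the shortcut absorb
    the bias signal; at inference the shortcut is zeroed (the causal
    intervention), so the prediction cannot rely on it."""
    def __init__(self, feat_dim=128, n_bias=2, n_classes=10, shortcut_dim=16):
        super().__init__()
        self.shortcut_emb = nn.Embedding(n_bias, shortcut_dim)
        self.head = nn.Linear(feat_dim + shortcut_dim, n_classes)

    def forward(self, features, bias_label=None):
        if bias_label is not None:                      # training path
            s = self.shortcut_emb(bias_label)
        else:                                           # intervention at test
            s = torch.zeros(features.size(0), self.shortcut_emb.embedding_dim,
                            device=features.device)
        return self.head(torch.cat([features, s], dim=1))

model = ShortcutClassifier()
train_logits = model(torch.randn(4, 128), bias_label=torch.tensor([0, 1, 0, 1]))
test_logits = model(torch.randn(4, 128))                # shortcut removed
```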

Polar Collision Grids: Effective Interaction Modelling for Pedestrian Trajectory Prediction in Shared Space Using Collision Checks

  • paper_url: http://arxiv.org/abs/2308.06654
  • repo_url: None
  • paper_authors: Mahsa Golchoubian, Moojan Ghafurian, Kerstin Dautenhahn, Nasser Lashgarian Azad
  • for: Predicting pedestrian trajectories is a crucial capability for the safe navigation of autonomous vehicles, especially in spaces shared with pedestrians. Pedestrian motion in shared spaces is influenced by both vehicles and other pedestrians, so better modelling of pedestrian-vehicle and pedestrian-pedestrian interactions can improve the accuracy of trajectory prediction models.
  • methods: Proposes a heuristic-based process for selecting interacting agents based on collision-risk calculation, focusing on agents that may collide with the target pedestrian. The interaction effect is encoded via time-to-collision and approach direction, using a novel polar collision grid map.
  • results: On the HBS dataset, the proposed method predicts trajectories closer to the ground truth than the baseline methods.
    Abstract Predicting pedestrians' trajectories is a crucial capability for autonomous vehicles' safe navigation, especially in spaces shared with pedestrians. Pedestrian motion in shared spaces is influenced by both the presence of vehicles and other pedestrians. Therefore, effectively modelling both pedestrian-pedestrian and pedestrian-vehicle interactions can increase the accuracy of the pedestrian trajectory prediction models. Despite the huge literature on ways to encode the effect of interacting agents on a pedestrian's predicted trajectory using deep-learning models, limited effort has been put into the effective selection of interacting agents. In the majority of cases, the interaction features used are mainly based on relative distances while paying less attention to the effect of the velocity and approaching direction in the interaction formulation. In this paper, we propose a heuristic-based process of selecting the interacting agents based on collision risk calculation. Focusing on interactions of potentially colliding agents with a target pedestrian, we propose the use of time-to-collision and the approach direction angle of two agents for encoding the interaction effect. This is done by introducing a novel polar collision grid map. Our results have shown predicted trajectories closer to the ground truth compared to existing methods (used as a baseline) on the HBS dataset.
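The two quantities the interaction encoding builds on can be computed directly. Under a constant-velocity assumption, time-to-collision is the smallest non-negative root of |p_rel + t·v_rel| = r, a quadratic in t; each interacting neighbour can then be binned into a polar grid centred on the target pedestrian. The collision radius and grid resolution below are assumed values, not the paper's.

```python
import numpy as np

def time_to_collision(p_rel, v_rel, radius_sum=0.6):
    """Smallest t >= 0 with |p_rel + t*v_rel| = radius_sum under a
    constant-velocity assumption; returns inf if no collision occurs."""
    a = v_rel @ v_rel
    b = 2.0 * (p_rel @ v_rel)
    c = p_rel @ p_rel - radius_sum**2
    disc = b * b - 4 * a * c
    if a < 1e-9 or disc < 0:
        return np.inf
    t = (-b - np.sqrt(disc)) / (2 * a)
    return t if t >= 0 else np.inf

def polar_grid_cell(p_rel, n_rings=4, n_sectors=8, max_range=8.0):
    """Map a neighbour's relative position to a (ring, sector) cell of a
    polar grid centred on the target pedestrian."""
    r = np.linalg.norm(p_rel)
    ring = min(int(r / (max_range / n_rings)), n_rings - 1)
    sector = int((np.arctan2(p_rel[1], p_rel[0]) % (2 * np.pi))
                 / (2 * np.pi / n_sectors))
    return ring, sector

p_rel = np.array([3.0, 1.0])          # neighbour position relative to target
v_rel = np.array([-1.0, -0.3])        # closing relative velocity
ttc = time_to_collision(p_rel, v_rel)
print(polar_grid_cell(p_rel), f"TTC={ttc:.2f}s")  # finite TTC -> agent gets encoded
```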

Accelerating Diffusion-based Combinatorial Optimization Solvers by Progressive Distillation

  • paper_url: http://arxiv.org/abs/2308.06644
  • repo_url: https://github.com/jwrh/Accelerating-Diffusion-based-Combinatorial-Optimization-Solvers-by-Progressive-Distillation
  • paper_authors: Junwei Huang, Zhiqing Sun, Yiming Yang
  • for: Speeding up diffusion-based solvers for NP-complete combinatorial optimization problems.
  • methods: Uses progressive distillation to accelerate inference by taking fewer steps in the denoising process, e.g., forecasting two steps ahead within a single step.
  • results: The progressively distilled model performs inference 16 times faster with only 0.019% performance degradation on the TSP-50 dataset.
    Abstract Graph-based diffusion models have shown promising results in terms of generating high-quality solutions to NP-complete (NPC) combinatorial optimization (CO) problems. However, those models are often inefficient in inference, due to the iterative evaluation nature of the denoising diffusion process. This paper proposes to use progressive distillation to speed up the inference by taking fewer steps (e.g., forecasting two steps ahead within a single step) during the denoising process. Our experimental results show that the progressively distilled model can perform inference 16 times faster with only 0.019% degradation in performance on the TSP-50 dataset.
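A minimal sketch of one progressive-distillation round follows: the student is trained so that a single denoising step reproduces two steps of the frozen teacher. The toy `Denoiser` and the Euler-style update are stand-ins for the paper's graph-based diffusion solver; all shapes and step sizes are assumptions.

```python
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Toy stand-in for a diffusion denoiser conditioned on time t."""
    def __init__(self, dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.ReLU(),
                                 nn.Linear(64, dim))

    def forward(self, x, t):
        return self.net(torch.cat([x, t.expand(x.size(0), 1)], dim=1))

def step(model, x, t, dt):
    # One (toy) denoising update from time t to t - dt.
    return x - dt * model(x, t)

teacher, student = Denoiser(), Denoiser()
student.load_state_dict(teacher.state_dict())      # warm start from the teacher
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

x = torch.randn(16, 32)
t = torch.tensor([[0.8]])
with torch.no_grad():                              # teacher: two small steps
    target = step(teacher, step(teacher, x, t, 0.1), t - 0.1, 0.1)
pred = step(student, x, t, 0.2)                    # student: one big step
loss = nn.functional.mse_loss(pred, target)
loss.backward()
opt.step()
```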

Advances in Self-Supervised Learning for Synthetic Aperture Sonar Data Processing, Classification, and Pattern Recognition

  • paper_url: http://arxiv.org/abs/2308.11633
  • repo_url: None
  • paper_authors: Brandon Sheffield, Frank E. Bobe III, Bradley Marchand, Matthew S. Emigh
  • for: Improving synthetic aperture sonar (SAS) data processing, classification, and pattern recognition for underwater exploration by means of self-supervised learning (SSL).
  • methods: Proposes MoCo-SAS, an SSL-based approach to SAS data processing spanning data preprocessing, feature extraction, model training, and testing.
  • results: Experiments show that MoCo-SAS significantly outperforms traditional supervised learning in terms of F1-score, demonstrating the promise of SSL for SAS data processing.
    Abstract Synthetic Aperture Sonar (SAS) imaging has become a crucial technology for underwater exploration because of its unique ability to maintain resolution at increasing ranges, a characteristic absent in conventional sonar techniques. However, the effective application of deep learning to SAS data processing is often limited due to the scarcity of labeled data. To address this challenge, this paper proposes MoCo-SAS that leverages self-supervised learning (SSL) for SAS data processing, classification, and pattern recognition. The experimental results demonstrate that MoCo-SAS significantly outperforms traditional supervised learning methods, as evidenced by significant improvements observed in terms of the F1-score. These findings highlight the potential of SSL in advancing the state-of-the-art in SAS data processing, offering promising avenues for enhanced underwater object detection and classification.
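MoCo-SAS builds on the standard MoCo recipe, whose two core ingredients are sketched below: the momentum update of the key encoder and the InfoNCE loss over a queue of negatives. The encoders and the SAS data pipeline are placeholders, and the embedding and queue sizes are assumed.

```python
import torch
import torch.nn.functional as F

def momentum_update(q_encoder, k_encoder, m=0.999):
    # Key encoder is an exponential moving average of the query encoder.
    for pq, pk in zip(q_encoder.parameters(), k_encoder.parameters()):
        pk.data = m * pk.data + (1 - m) * pq.data

def info_nce(q, k_pos, queue, tau=0.07):
    """q, k_pos: (B, D) normalized embeddings of two augmented views;
    queue: (K, D) negatives accumulated from past batches."""
    l_pos = (q * k_pos).sum(dim=1, keepdim=True)       # (B, 1)
    l_neg = q @ queue.t()                              # (B, K)
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long)  # positive is index 0
    return F.cross_entropy(logits, labels)

q = F.normalize(torch.randn(8, 128), dim=1)
k = F.normalize(torch.randn(8, 128), dim=1)
queue = F.normalize(torch.randn(4096, 128), dim=1)
print(info_nce(q, k, queue).item())
```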

ADRMX: Additive Disentanglement of Domain Features with Remix Loss

  • paper_url: http://arxiv.org/abs/2308.06624
  • repo_url: https://github.com/berkerdemirel/ADRMX
  • paper_authors: Berker Demirel, Erchan Aptoula, Huseyin Ozkan
  • for: Building models capable of generalizing to new, unseen domains, mitigating the effect of distribution shifts between domains.
  • methods: Proposes a novel architecture, Additive Disentanglement of Domain Features with Remix Loss (ADRMX), and introduces a new data augmentation technique that mixes samples from different domains in the latent space.
  • results: Extensive experiments on DomainBed under fair conditions show that ADRMX achieves state-of-the-art performance, improving on prior work.
    Abstract The common assumption that train and test sets follow similar distributions is often violated in deployment settings. Given multiple source domains, domain generalization aims to create robust models capable of generalizing to new unseen domains. To this end, most of existing studies focus on extracting domain invariant features across the available source domains in order to mitigate the effects of inter-domain distributional changes. However, this approach may limit the model's generalization capacity by relying solely on finding common features among the source domains. It overlooks the potential presence of domain-specific characteristics that could be prevalent in a subset of domains, potentially containing valuable information. In this work, a novel architecture named Additive Disentanglement of Domain Features with Remix Loss (ADRMX) is presented, which addresses this limitation by incorporating domain variant features together with the domain invariant ones using an original additive disentanglement strategy. Moreover, a new data augmentation technique is introduced to further support the generalization capacity of ADRMX, where samples from different domains are mixed within the latent space. Through extensive experiments conducted on DomainBed under fair conditions, ADRMX is shown to achieve state-of-the-art performance. Code will be made available at GitHub after the revision process.
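A schematic of the additive split the abstract describes is sketched below: features are modelled as (domain-invariant + domain-variant), and a "remix" swaps the domain-variant part between samples from different domains while the class label must still be predicted. The branch architectures and loss weighting are assumptions, not ADRMX's exact formulation.

```python
import torch
import torch.nn as nn

feat_dim, n_classes = 128, 7
f_inv = nn.Sequential(nn.Linear(256, feat_dim), nn.ReLU())   # domain-invariant branch
f_dom = nn.Sequential(nn.Linear(256, feat_dim), nn.ReLU())   # domain-variant branch
classifier = nn.Linear(feat_dim, n_classes)

x_a, x_b = torch.randn(4, 256), torch.randn(4, 256)          # samples from two domains
y_a = torch.randint(0, n_classes, (4,))

z_a = f_inv(x_a) + f_dom(x_a)            # additive composition of both parts
z_remix = f_inv(x_a) + f_dom(x_b)        # remix: swap in another domain's variant part
ce = nn.CrossEntropyLoss()
loss = ce(classifier(z_a), y_a) + ce(classifier(z_remix), y_a)
loss.backward()
```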

Can Unstructured Pruning Reduce the Depth in Deep Neural Networks?

  • paper_url: http://arxiv.org/abs/2308.06619
  • repo_url: None
  • paper_authors: Zhu Liao, Victor Quétu, Van-Tam Nguyen, Enzo Tartaglione
  • for: Reducing the size of deep neural networks while maintaining performance.
  • methods: An Entropy Guided Pruning (EGP) algorithm that prioritizes pruning connections in layers with low entropy, ultimately leading to their complete removal.
  • results: Effectively compresses deep neural networks while maintaining competitive performance, and offers a new perspective on the relationship between unstructured pruning, entropy, and deep learning performance.
    Abstract Pruning is a widely used technique for reducing the size of deep neural networks while maintaining their performance. However, such a technique, despite being able to massively compress deep models, is hardly able to remove entire layers from a model (even when structured): is this an addressable task? In this study, we introduce EGP, an innovative Entropy Guided Pruning algorithm aimed at reducing the size of deep neural networks while preserving their performance. The key focus of EGP is to prioritize pruning connections in layers with low entropy, ultimately leading to their complete removal. Through extensive experiments conducted on popular models like ResNet-18 and Swin-T, our findings demonstrate that EGP effectively compresses deep neural networks while maintaining competitive performance levels. Our results not only shed light on the underlying mechanism behind the advantages of unstructured pruning, but also pave the way for further investigations into the intricate relationship between entropy, pruning techniques, and deep learning performance. The EGP algorithm and its insights hold great promise for advancing the field of network compression and optimization. The source code for EGP is released open-source.
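The core loop of entropy-guided pruning might look like the sketch below: estimate an entropy per layer from its activations, then prune low-entropy layers more aggressively so they drift toward complete removal. The histogram-based entropy estimate and the pruning schedule are our assumptions; the paper's exact definitions may differ.

```python
import torch
import torch.nn as nn

def activation_entropy(act, bins=30):
    # Crude entropy estimate from a histogram of activations.
    hist = torch.histc(act.flatten(), bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return -(p * p.log()).sum().item()

def prune_layer(linear, sparsity):
    """Unstructured magnitude pruning: zero the smallest-|w| fraction."""
    w = linear.weight.data
    k = int(sparsity * w.numel())
    if k > 0:
        thresh = w.abs().flatten().kthvalue(k).values
        w.mul_((w.abs() > thresh).float())

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))
x = torch.randn(256, 64)

h, entropies = x, []
for layer in model[:-1]:
    h = layer(h)
    if isinstance(layer, nn.ReLU):
        entropies.append(activation_entropy(h))

# Lower entropy -> higher pruning rate for the corresponding Linear layer.
order = sorted(range(len(entropies)), key=lambda i: entropies[i])
rates = {order[0]: 0.9, order[1]: 0.5}
linears = [m for m in model if isinstance(m, nn.Linear)][:-1]
for i, lin in enumerate(linears):
    prune_layer(lin, rates[i])
```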

On the Interplay of Convolutional Padding and Adversarial Robustness

  • paper_url: http://arxiv.org/abs/2308.06612
  • repo_url: None
  • paper_authors: Paul Gavrikov, Janis Keuper
  • for: Studies the interplay between convolutional padding and adversarial attacks, and how different padding modes (or their absence) affect adversarial robustness.
  • methods: Analyzes convolutional neural networks (CNNs), comparing different padding modes.
  • results: Finds that adversarial attacks often produce perturbation anomalies at image boundaries, precisely the areas where padding is applied, and that different padding modes affect adversarial robustness differently.
    Abstract It is common practice to apply padding prior to convolution operations to preserve the resolution of feature-maps in Convolutional Neural Networks (CNN). While many alternatives exist, this is often achieved by adding a border of zeros around the inputs. In this work, we show that adversarial attacks often result in perturbation anomalies at the image boundaries, which are the areas where padding is used. Consequently, we aim to provide an analysis of the interplay between padding and adversarial attacks and seek an answer to the question of how different padding modes (or their absence) affect adversarial robustness in various scenarios.
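The experimental axis the paper studies maps directly onto PyTorch's padding_mode argument. The sketch below builds the same small CNN with each padding mode and probes it with a one-step FGSM attack; since the networks are untrained, the printed numbers are purely illustrative of the setup, not of the paper's results.

```python
import torch
import torch.nn as nn

def make_cnn(padding_mode):
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1, padding_mode=padding_mode),
        nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))

def fgsm(model, x, y, eps=8 / 255):
    # One-step FGSM: perturb the input along the sign of the loss gradient.
    x = x.clone().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

x = torch.rand(4, 3, 32, 32)
y = torch.randint(0, 10, (4,))
for mode in ["zeros", "reflect", "replicate", "circular"]:
    model = make_cnn(mode)
    x_adv = fgsm(model, x, y)
    acc = (model(x_adv).argmax(1) == y).float().mean().item()
    print(f"{mode:>9}: adversarial accuracy {acc:.2f}")
```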

LadleNet: Translating Thermal Infrared Images to Visible Light Images Using A Scalable Two-stage U-Net

  • paper_url: http://arxiv.org/abs/2308.06603
  • repo_url: https://github.com/ach-1914/ladlenet
  • paper_authors: Tonghui Zou
  • for: Proposes a U-Net-based algorithm for translating thermal infrared (TIR) images into visible light (VI) images, serving applications such as TIR-VI image registration and fusion.
  • methods: Uses a two-stage U-Net concatenation structure with skip connections and refined feature aggregation. A 'Handle' module constructs an abstract semantic space, and a 'Bowl' module decodes that space into the mapped VI image; replacing the Handle with a pre-trained DeepLabv3+ network yields LadleNet+ with stronger semantic-space construction.
  • results: On the KAIST dataset, the method achieves state-of-the-art performance in image clarity and perceptual quality compared to existing methods.
    Abstract The translation of thermal infrared (TIR) images to visible light (VI) images presents a challenging task with potential applications spanning various domains such as TIR-VI image registration and fusion. Leveraging supplementary information derived from TIR image conversions can significantly enhance model performance and generalization across these applications. However, prevailing issues within this field include suboptimal image fidelity and limited model scalability. In this paper, we introduce an algorithm, LadleNet, based on the U-Net architecture. LadleNet employs a two-stage U-Net concatenation structure, augmented with skip connections and refined feature aggregation techniques, resulting in a substantial enhancement in model performance. Comprising 'Handle' and 'Bowl' modules, LadleNet's Handle module facilitates the construction of an abstract semantic space, while the Bowl module decodes this semantic space to yield mapped VI images. The Handle module exhibits extensibility by allowing the substitution of its network architecture with semantic segmentation networks, thereby establishing more abstract semantic spaces to bolster model performance. Consequently, we propose LadleNet+, which replaces LadleNet's Handle module with the pre-trained DeepLabv3+ network, thereby endowing the model with enhanced semantic space construction capabilities. The proposed method is evaluated and tested on the KAIST dataset, accompanied by quantitative and qualitative analyses. Compared to existing methodologies, our approach achieves state-of-the-art performance in terms of image clarity and perceptual quality. The source code will be made available at https://github.com/Ach-1914/LadleNet/tree/main/.
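Structurally, the two-stage idea can be sketched as follows: a 'Handle' maps the single-channel TIR input into a semantic feature space, and a 'Bowl' decodes that space into a three-channel VI image. The tiny convolutional stacks below stand in for the paper's U-Nets (and, in LadleNet+, for a DeepLabv3+ Handle); channel counts are assumed.

```python
import torch
import torch.nn as nn

class Handle(nn.Module):
    """Encodes the TIR input into an abstract semantic space."""
    def __init__(self, sem_ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, sem_ch, 3, padding=1), nn.ReLU())

    def forward(self, tir):
        return self.net(tir)

class Bowl(nn.Module):
    """Decodes the semantic space into the mapped visible-light image."""
    def __init__(self, sem_ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(sem_ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1), nn.Sigmoid())

    def forward(self, sem):
        return self.net(sem)

handle, bowl = Handle(), Bowl()
tir = torch.rand(1, 1, 128, 128)
vi = bowl(handle(tir))
print(vi.shape)     # torch.Size([1, 3, 128, 128])
```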