results: On CIFAR-10 and ImageNet 32x32, the method significantly boosts the FID scores of EBMs while being 2x faster than DRL, and supports compositional generation and image inpainting tasks.
Abstract
Training energy-based models (EBMs) with maximum likelihood estimation on high-dimensional data can be both challenging and time-consuming. As a result, there is a noticeable gap in sample quality between EBMs and other generative frameworks like GANs and diffusion models. To close this gap, inspired by the recent efforts of learning EBMs by maximizing diffusion recovery likelihood (DRL), we propose cooperative diffusion recovery likelihood (CDRL), an effective approach to tractably learn and sample from a series of EBMs defined on increasingly noisy versions of a dataset, paired with an initializer model for each EBM. At each noise level, the initializer model learns to amortize the sampling process of the EBM, and the two models are jointly estimated within a cooperative training framework. Samples from the initializer serve as starting points that are refined by a few sampling steps from the EBM. With the refined samples, the EBM is optimized by maximizing recovery likelihood, while the initializer is optimized by learning from the difference between the refined samples and the initial samples. We develop a new noise schedule and a variance reduction technique to further improve the sample quality. Combining these advances, we significantly boost the FID scores compared to existing EBM methods on CIFAR-10 and ImageNet 32x32, with a 2x speedup over DRL. In addition, we extend our method to compositional generation and image inpainting tasks, and showcase the compatibility of CDRL with classifier-free guidance for conditional generation, achieving similar trade-offs between sample quality and sample diversity as in diffusion models.
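To make the cooperative training loop concrete, here is a minimal sketch of one training step at a single noise level, assuming toy MLP networks, 2-D data, and illustrative hyperparameters (the paper's architectures, noise schedule, and variance reduction technique are not reproduced):

```python
import torch
import torch.nn as nn

# Toy stand-ins: an MLP energy f(x) for one noise level and an MLP
# initializer g(noisy x). All sizes and hyperparameters are illustrative.
energy = nn.Sequential(nn.Linear(2, 64), nn.SiLU(), nn.Linear(64, 1))
init_net = nn.Sequential(nn.Linear(2, 64), nn.SiLU(), nn.Linear(64, 2))
opt_e = torch.optim.Adam(energy.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(init_net.parameters(), lr=1e-3)
sigma, step, n_mcmc = 0.3, 1e-2, 5

def refine(x, y):
    """A few Langevin steps on the recovery density
    p(x | y) propto exp(-f(x) - ||x - y||^2 / (2 sigma^2))."""
    for _ in range(n_mcmc):
        x = x.detach().requires_grad_(True)
        cond = energy(x).sum() + ((x - y) ** 2).sum() / (2 * sigma ** 2)
        grad = torch.autograd.grad(cond, x)[0]
        x = x - 0.5 * step * grad + step ** 0.5 * torch.randn_like(x)
    return x.detach()

x_clean = torch.randn(128, 2)                  # stand-in for a data batch
y_noisy = x_clean + sigma * torch.randn_like(x_clean)

x_init = init_net(y_noisy)                     # initializer amortizes sampling
x_ref = refine(x_init, y_noisy)                # EBM refines the proposal

# EBM step: contrastive (recovery-likelihood-style) update, data vs refined
loss_e = energy(x_clean).mean() - energy(x_ref).mean()
opt_e.zero_grad(); loss_e.backward(); opt_e.step()

# Initializer step: move its output toward the refined samples
loss_g = ((init_net(y_noisy) - x_ref) ** 2).mean()
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

Because the initializer regresses onto the refined samples, it gradually absorbs the MCMC refinement, which is what allows sampling with only a few EBM steps per noise level.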
Distribution Grid Line Outage Identification with Unknown Pattern and Performance Guarantee
methods: We propose a data-driven approach based on change-point detection that learns the parameters of the post-outage distribution through gradient descent. Since using gradient descent directly raises feasibility issues, we address this by adding a Bregman divergence constraint to control the trajectory of the parameter updates.
results: We test our approach on several representative distribution grids with real load profiles under 17 outage configurations; the results show that outages can be detected and localized in a timely manner using only voltage magnitudes and without assuming prior knowledge of outage patterns.
Abstract
Line outage identification in distribution grids is essential for sustainable grid operation. In this work, we propose a practical yet robust detection approach that utilizes only readily available voltage magnitudes, eliminating the need for costly phase angles or power flow data. Given the sensor data, many existing detection methods based on change-point detection require prior knowledge of outage patterns, which are unknown for real-world outage scenarios. To remove this impractical requirement, we propose a data-driven method to learn the parameters of the post-outage distribution through gradient descent. However, directly using gradient descent presents feasibility issues. To address this, we modify our approach by adding a Bregman divergence constraint to control the trajectory of the parameter updates, which eliminates the feasibility problems. Since timely operation is critical, we prove that the optimal parameters can be learned with convergence guarantees by leveraging the statistical and physical properties of voltage data. We evaluate our approach using many representative distribution grids and real load profiles with 17 outage configurations. The results show that we can detect and localize the outage in a timely manner with only voltage magnitudes and without assuming prior knowledge of outage patterns.
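As a hedged illustration of the feasibility fix, the sketch below replaces a plain gradient step with a Bregman-proximal (mirror descent) step; the negative-entropy potential and the Gaussian toy model are assumptions for illustration, not the paper's actual constraint or outage model:

```python
import numpy as np

# With the negative-entropy potential psi(t) = t log t, the Bregman-proximal
# step
#     theta_{k+1} = argmin_t { g_k * t + D_psi(t, theta_k) / eta }
# has the closed form theta_{k+1} = theta_k * exp(-eta * g_k), so the iterate
# can never leave the feasible set theta > 0, unlike the plain gradient step
# theta - eta * g_k, which can go infeasible.
rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=2.0, size=5000)   # toy "post-outage" readings

def nll_grad(var, x):
    # d/d(var) of the Gaussian negative log-likelihood with known zero mean
    return 0.5 * len(x) * (1.0 / var - np.mean(x ** 2) / var ** 2)

theta, eta = 10.0, 1e-4                            # initial variance guess
for _ in range(500):
    theta *= np.exp(-eta * nll_grad(theta, data))  # exponentiated-gradient step

print(f"learned variance {theta:.3f} (true value 4.0)")
```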
Nonlinear Granger Causality using Kernel Ridge Regression
for: Identifying nonlinear Granger causal relationships
methods: Utilizes a flexible plug-in architecture with any nonlinear regressor, and kernel ridge regression with the radial basis function kernel
results: Achieves competitive AUC scores, more finely calibrated $p$-values, and significantly reduced computation times compared to existing algorithms.
Abstract
I introduce a novel algorithm and accompanying Python library, named mlcausality, designed for the identification of nonlinear Granger causal relationships. This novel algorithm uses a flexible plug-in architecture that enables researchers to employ any nonlinear regressor as the base prediction model. Subsequently, I conduct a comprehensive performance analysis of mlcausality when the prediction regressor is the kernel ridge regressor with the radial basis function kernel. The results demonstrate that mlcausality employing kernel ridge regression achieves competitive AUC scores across a diverse set of simulated data. Furthermore, mlcausality with kernel ridge regression yields more finely calibrated $p$-values in comparison to rival algorithms. This enhancement enables mlcausality to attain superior accuracy scores when using intuitive $p$-value-based thresholding criteria. Finally, mlcausality with kernel ridge regression exhibits significantly reduced computation times compared to existing nonlinear Granger causality algorithms. In fact, in numerous instances, this innovative approach achieves superior solutions within computational timeframes that are an order of magnitude shorter than those required by competing algorithms.
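A minimal sketch of the underlying recipe, using scikit-learn's KernelRidge (this is the generic restricted-vs-full comparison, not mlcausality's exact statistic or $p$-value calibration):

```python
import numpy as np
from scipy.stats import wilcoxon
from sklearn.kernel_ridge import KernelRidge

# x Granger-causes y if adding lags of x improves out-of-sample prediction
# of y beyond what lags of y alone achieve, with kernel ridge regression
# (RBF kernel) as the base predictor.
rng = np.random.default_rng(1)
n, lags = 600, 3
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):                 # y is nonlinearly driven by past x
    y[t] = 0.4 * y[t - 1] + 0.8 * np.tanh(x[t - 1]) + 0.1 * rng.normal()

def lag_matrix(s, p):
    # row for time t contains s[t-1], ..., s[t-p], for t = p .. n-1
    return np.column_stack([s[p - k : len(s) - k] for k in range(1, p + 1)])

target = y[lags:]
restricted = lag_matrix(y, lags)                     # y-lags only
full = np.hstack([restricted, lag_matrix(x, lags)])  # y-lags and x-lags
split = len(target) // 2

def sq_errors(design):
    model = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.1)
    model.fit(design[:split], target[:split])
    return (target[split:] - model.predict(design[split:])) ** 2

e_r, e_f = sq_errors(restricted), sq_errors(full)
stat, p = wilcoxon(e_r, e_f, alternative="greater")  # restricted worse?
print(f"MSE {e_r.mean():.4f} -> {e_f.mean():.4f}, one-sided p = {p:.2e}")
```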
Convex Q Learning in a Stochastic Environment: Extended Version
methods: The paper develops a convex-programming approach based on a relaxation of the dual of Manne's linear programming characterization of optimal control.
results: The main contributions are: (1) properties of the convex-program relaxation and its relationship to standard Q-learning; (2) a direct model-free method for approximating the convex program; (3) new analysis techniques that establish the convergence rate of the algorithms.
Abstract
The paper introduces the first formulation of convex Q-learning for Markov decision processes with function approximation. The algorithms and theory rest on a relaxation of a dual of Manne's celebrated linear programming characterization of optimal control. The main contributions firstly concern properties of the relaxation, described as a deterministic convex program: we identify conditions for a bounded solution, and a significant relationship between the solution to the new convex program, and the solution to standard Q-learning. The second set of contributions concern algorithm design and analysis: (i) A direct model-free method for approximating the convex program for Q-learning shares properties with its ideal. In particular, a bounded solution is ensured subject to a simple property of the basis functions; (ii) The proposed algorithms are convergent and new techniques are introduced to obtain the rate of convergence in a mean-square sense; (iii) The approach can be generalized to a range of performance criteria, and it is found that variance can be reduced by considering ``relative'' dynamic programming equations; (iv) The theory is illustrated with an application to a classical inventory control problem.
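For context, Manne's LP optimizes over occupation measures; its dual is the familiar value-function LP for a discounted-cost MDP with cost $c$, transition kernel $P$, discount factor $\gamma \in (0,1)$, and a positive weighting $\mu$ (this is the standard textbook form, not the paper's specific relaxation):

$$\max_{V}\ \sum_{x} \mu(x)\, V(x) \quad \text{s.t.} \quad V(x) \le c(x,u) + \gamma \sum_{x'} P(x' \mid x, u)\, V(x') \quad \forall (x,u).$$

The Q-function analogue replaces the inner expectation with $\min_{u'} Q(x',u')$:

$$\max_{Q}\ \sum_{x,u} \mu(x,u)\, Q(x,u) \quad \text{s.t.} \quad Q(x,u) \le c(x,u) + \gamma \sum_{x'} P(x' \mid x, u)\, \min_{u'} Q(x', u') \quad \forall (x,u).$$

Since $\min_{u'} Q(x',u')$ is concave in $Q$, the constraint set remains convex, so with linear function approximation $Q_\theta = \theta^\top \psi$ this becomes a convex (no longer linear) program in $\theta$; the paper studies relaxations and model-free approximations of programs of this kind.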
Is Learning in Biological Neural Networks based on Stochastic Gradient Descent? An analysis using stochastic processes
results: The study finds that a (continuous) gradient step emerges approximately when each learning opportunity is processed by many local updates, suggesting that stochastic gradient descent may play a role in optimizing BNNs.
Abstract
In recent years, there has been an intense debate about how learning in biological neural networks (BNNs) differs from learning in artificial neural networks. It is often argued that the updating of connections in the brain relies only on local information, and therefore a stochastic gradient-descent type optimization method cannot be used. In this paper, we study a stochastic model for supervised learning in BNNs. We show that a (continuous) gradient step occurs approximately when each learning opportunity is processed by many local updates. This result suggests that stochastic gradient descent may indeed play a role in optimizing BNNs.
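As a toy illustration of this claim (an assumption-laden stand-in, not the paper's stochastic model), the simulation below uses the Gaussian-smoothing identity $\mathbb{E}[(f(w+e)-f(w))\,e]/s^2 \approx \nabla f(w)$ for $e \sim \mathcal{N}(0, s^2 I)$ to show many small, purely local updates accumulating into an approximate gradient step:

```python
import numpy as np

# Each "local update" uses only the locally observable loss change, yet the
# sum over many such updates approximates one gradient step.
rng = np.random.default_rng(2)

def loss(w):
    return 0.5 * np.sum((w - 1.0) ** 2)

w = np.zeros(3)
s, K, a = 0.05, 5000, 1e-4           # perturbation scale, local updates, rate
total = np.zeros_like(w)
for _ in range(K):                   # K local updates per learning opportunity
    e = s * rng.normal(size=w.shape)
    total += -a * (loss(w + e) - loss(w)) * e / s ** 2

true_grad_step = -a * K * (w - 1.0)  # what one aggregated gradient step does
print(np.round(total, 3), np.round(true_grad_step, 3))  # nearly identical
```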
A Deep Dive into Sleep: Single-Channel EEG-Based Sleep Stage Classification with Model Interpretability
results: The method is rigorously evaluated on three datasets, SleepEDF-20, SleepEDF-78, and SHHS, attaining accuracies of 87.5%, 83.9%, and 87.8% and macro-F1 scores of 82.5, 78.9, and 81.9, respectively. A 1D-GradCAM visualization method is also introduced to explain the model's decision-making process in sleep stage classification.
Abstract
Sleep, a fundamental physiological process, occupies a significant portion of our lives. Accurate classification of sleep stages serves as a crucial tool for evaluating sleep quality and identifying probable sleep disorders. This work introduces a novel methodology that utilises an SE-ResNet-Bi-LSTM architecture to classify sleep into five separate stages. The classification process is based on the analysis of single-channel electroencephalograms (EEGs). The framework that has been suggested consists of two fundamental elements: a feature extractor that utilises SE-ResNet, and a temporal context encoder that uses stacks of Bi-LSTM units. The effectiveness of our approach is substantiated by thorough assessments conducted on three different datasets, namely SleepEDF-20, SleepEDF-78, and SHHS. Significantly, our methodology attains notable levels of accuracy, specifically 87.5\%, 83.9\%, and 87.8\%, along with macro-F1 scores of 82.5, 78.9, and 81.9 for the corresponding datasets. Notably, we introduce the utilization of 1D-GradCAM visualization to shed light on the decision-making process of our model in the realm of sleep stage classification. This visualization method not only provides valuable insights into the model's classification rationale but also aligns its outcomes with the annotations made by sleep experts. One notable feature of our research is the integration of an expedited training approach, which effectively preserves the model's resilience in terms of performance. The experimental evaluations conducted provide a comprehensive evaluation of the effectiveness of our proposed model in comparison to existing approaches, highlighting its potential for practical applications.
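A compact sketch of the described pipeline is given below; the layer sizes, kernel widths, and stacking depth are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: reweight channels by a learned gate."""
    def __init__(self, ch, r=8):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(ch, ch // r), nn.ReLU(),
                                nn.Linear(ch // r, ch), nn.Sigmoid())
    def forward(self, x):                       # x: (B, C, T)
        w = self.fc(x.mean(dim=-1))             # squeeze over time, excite
        return x * w.unsqueeze(-1)

class SEResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(ch, ch, 7, padding=3), nn.BatchNorm1d(ch), nn.ReLU(),
            nn.Conv1d(ch, ch, 7, padding=3), nn.BatchNorm1d(ch))
        self.se = SEBlock(ch)
    def forward(self, x):
        return torch.relu(x + self.se(self.body(x)))

class SleepNet(nn.Module):
    def __init__(self, n_classes=5):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv1d(1, 32, 49, stride=8, padding=24),
                                  nn.BatchNorm1d(32), nn.ReLU())
        self.blocks = nn.Sequential(SEResBlock(32), SEResBlock(32))
        self.pool = nn.AdaptiveAvgPool1d(1)
        self.bilstm = nn.LSTM(32, 64, num_layers=2,
                              bidirectional=True, batch_first=True)
        self.head = nn.Linear(128, n_classes)
    def forward(self, x):                       # x: (B, n_epochs, 3000) EEG
        B, L, T = x.shape
        f = self.pool(self.blocks(self.stem(x.reshape(B * L, 1, T)))).squeeze(-1)
        seq, _ = self.bilstm(f.reshape(B, L, -1))  # temporal context encoder
        return self.head(seq)                   # per-epoch sleep-stage logits

logits = SleepNet()(torch.randn(2, 10, 3000))   # -> shape (2, 10, 5)
```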
Adaptive conformal classification with noisy labels
paper_authors: Matteo Sesia, Y. X. Rachel Wang, Xin Tong
for: Developing conformal prediction methods that automatically adapt to random label contamination in the calibration sample, yielding more informative prediction sets with stronger coverage guarantees than state-of-the-art approaches.
methods: A precise theoretical characterization of the effect of label contamination, made actionable through new calibration algorithms; the solution is flexible, can leverage different modeling assumptions about the label contamination process, and requires no knowledge of the data distribution or of the inner workings of the machine-learning classifier.
results: The advantages of the proposed methods are demonstrated through extensive simulations and an application to object classification with the CIFAR-10H image data set.
Abstract
This paper develops novel conformal prediction methods for classification tasks that can automatically adapt to random label contamination in the calibration sample, enabling more informative prediction sets with stronger coverage guarantees compared to state-of-the-art approaches. This is made possible by a precise theoretical characterization of the effective coverage inflation (or deflation) suffered by standard conformal inferences in the presence of label contamination, which is then made actionable through new calibration algorithms. Our solution is flexible and can leverage different modeling assumptions about the label contamination process, while requiring no knowledge about the data distribution or the inner workings of the machine-learning classifier. The advantages of the proposed methods are demonstrated through extensive simulations and an application to object classification with the CIFAR-10H image data set.
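For context, here is the standard split-conformal classification baseline whose calibration step the paper's algorithms replace (the contamination-aware correction itself is not reproduced here):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_classes=3, n_informative=6,
                           random_state=0)
X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

alpha = 0.1                                          # target miscoverage
probs = clf.predict_proba(X_cal)
scores = 1.0 - probs[np.arange(len(y_cal)), y_cal]   # conformity scores
n = len(scores)
level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
qhat = np.quantile(scores, level, method="higher")   # calibrated threshold

def prediction_set(x_row):
    p = clf.predict_proba(x_row.reshape(1, -1))[0]
    return np.where(1.0 - p <= qhat)[0]              # labels kept in the set

print("prediction set for one point:", prediction_set(X_cal[0]))
# If the calibration labels y_cal are randomly contaminated, qhat is biased
# and coverage can be inflated or deflated -- the effect the paper
# characterizes and corrects.
```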
A supervised generative optimization approach for tabular data
results: The framework generates high-quality synthetic data tailored to the downstream task, improving the effectiveness and reliability of the generated data.
Abstract
Synthetic data generation has emerged as a crucial topic for financial institutions, driven by multiple factors, such as privacy protection and data augmentation. Many algorithms have been proposed for synthetic data generation, but reaching a consensus on which method to use for specific data sets and use cases remains challenging. Moreover, the majority of existing approaches are ``unsupervised'' in the sense that they do not take into account the downstream task. To address these issues, this work presents a novel synthetic data generation framework. The framework integrates a supervised component tailored to the specific downstream task and employs a meta-learning approach to learn the optimal mixture distribution of existing synthetic distributions.
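A hedged sketch of the core idea follows, with a simple grid search standing in for the paper's meta-learning and two toy generators standing in for real synthetic-data models:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Pick the mixture of candidate synthetic generators whose samples best
# train a model for the real downstream task.
X, y = make_moons(n_samples=1000, noise=0.2, random_state=0)
X_real, X_val, y_real, y_val = train_test_split(X, y, test_size=0.3,
                                                random_state=0)
rng = np.random.default_rng(0)

def gen_gaussian(n):        # generator A: per-class Gaussian fit
    Xs, ys = [], []
    for c in (0, 1):
        Xc = X_real[y_real == c]
        Xs.append(rng.multivariate_normal(Xc.mean(0), np.cov(Xc.T), n // 2))
        ys.append(np.full(n // 2, c))
    return np.vstack(Xs), np.concatenate(ys)

def gen_bootstrap(n):       # generator B: smoothed bootstrap of real rows
    idx = rng.integers(0, len(X_real), n)
    return X_real[idx] + 0.1 * rng.normal(size=(n, 2)), y_real[idx]

best = (-1.0, None)
for w in np.linspace(0.0, 1.0, 11):           # mixture weight on generator A
    n_a = int(round(1000 * w))
    Xa, ya = gen_gaussian(n_a)
    Xb, yb = gen_bootstrap(1000 - n_a)
    Xs, ys = np.vstack([Xa, Xb]), np.concatenate([ya, yb])
    acc = accuracy_score(y_val,
                         LogisticRegression().fit(Xs, ys).predict(X_val))
    best = max(best, (acc, w))
print(f"best mixture weight {best[1]:.1f}, validation accuracy {best[0]:.3f}")
```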
Generalization error bounds for iterative learning algorithms with bounded updates
results: We analyze the generalization bound under various settings and obtain improved bounds when the model dimension increases at the same rate as the number of training samples; we also examine the scaling behavior previously observed in large language models. Overall, this work takes a further step toward practical generalization theories.
Abstract
This paper explores the generalization characteristics of iterative learning algorithms with bounded updates for non-convex loss functions, employing information-theoretic techniques. Our key contribution is a novel bound for the generalization error of these algorithms with bounded updates, extending beyond the scope of previous works that only focused on Stochastic Gradient Descent (SGD). Our approach introduces two main novelties: 1) we reformulate the mutual information as the uncertainty of updates, providing a new perspective, and 2) instead of using the chaining rule of mutual information, we employ a variance decomposition technique to decompose information across iterations, allowing for a simpler surrogate process. We analyze our generalization bound under various settings and demonstrate improved bounds when the model dimension increases at the same rate as the number of training data samples. To bridge the gap between theory and practice, we also examine the previously observed scaling behavior in large language models. Ultimately, our work takes a further step toward developing practical generalization theories.
Mutation-based Fault Localization of Deep Neural Networks
results: On a benchmark of 109 bugs collected from Stack Overflow, deepmufl detects 53 of them by ranking the buggy layer in the top-1 position, outperforming state-of-the-art static and dynamic DNN fault localization systems. Moreover, mutation selection can halve the fault localization time at the cost of losing only 7.55% of the bugs localized in the top-1 position.
Abstract
Deep neural networks (DNNs) are susceptible to bugs, just like other types of software systems. A significant uptick in the use of DNNs, and their applications in wide-ranging areas including safety-critical systems, warrants extensive research on software engineering tools for improving the reliability of DNN-based systems. One such tool that has gained significant attention in recent years is DNN fault localization. This paper revisits mutation-based fault localization in the context of DNN models and proposes a novel technique, named deepmufl, applicable to a wide range of DNN models. We have implemented deepmufl and have evaluated its effectiveness using 109 bugs obtained from StackOverflow. Our results show that deepmufl detects 53/109 of the bugs by ranking the buggy layer in top-1 position, outperforming state-of-the-art static and dynamic DNN fault localization systems that are also designed to target the class of bugs supported by deepmufl. Moreover, we observed that we can halve the fault localization time for a pre-trained model using mutation selection, while losing only 7.55% of the bugs localized in top-1 position.
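A generic mutation-based fault localization sketch follows; deepmufl's actual mutation operators and suspiciousness formula are more elaborate, so treat this only as the shape of the approach:

```python
import numpy as np

# Idea: mutants of the buggy layer tend to change test outcomes more often,
# so rank layers by how sensitive the test results are to their mutation.
rng = np.random.default_rng(3)

# Toy 2-layer network; layer 0 is "buggy" (weights zeroed by a bad init).
W = [np.zeros((4, 8)), rng.normal(size=(8, 3))]
X_test = rng.normal(size=(20, 4))
y_test = rng.integers(0, 3, size=20)

def predict(weights, X):
    h = np.maximum(X @ weights[0], 0.0)        # ReLU hidden layer
    return (h @ weights[1]).argmax(axis=1)

def failing_mask(weights):
    return predict(weights, X_test) != y_test

base_fail = failing_mask(W)
suspiciousness = []
for layer in range(len(W)):
    changed = 0
    for _ in range(30):                        # 30 mutants per layer
        mutant = [w.copy() for w in W]
        mutant[layer] += rng.normal(scale=0.5, size=mutant[layer].shape)
        # count tests whose pass/fail outcome the mutant flips
        changed += np.sum(failing_mask(mutant) != base_fail)
    suspiciousness.append(changed)
print("suspiciousness per layer:", suspiciousness)  # buggy layer ranks first
```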
SA-Solver: Stochastic Adams Solver for Fast Sampling of Diffusion Models
results: For few-step sampling, SA-Solver achieves improved or comparable performance relative to existing state-of-the-art sampling methods, and attains SOTA FID scores on substantial benchmark datasets under a suitable number of function evaluations (NFEs).
Abstract
Diffusion Probabilistic Models (DPMs) have achieved considerable success in generation tasks. As sampling from DPMs is equivalent to solving a diffusion SDE or ODE, which is time-consuming, numerous fast sampling methods built upon improved differential equation solvers have been proposed. The majority of such techniques consider solving the diffusion ODE due to its superior efficiency. However, stochastic sampling could offer additional advantages in generating diverse and high-quality data. In this work, we engage in a comprehensive analysis of stochastic sampling from two aspects: variance-controlled diffusion SDE and linear multi-step SDE solver. Based on our analysis, we propose SA-Solver, which is an improved efficient stochastic Adams method for solving diffusion SDE to generate data with high quality. Our experiments show that SA-Solver achieves: 1) improved or comparable performance compared with the existing state-of-the-art sampling methods for few-step sampling; 2) SOTA FID scores on substantial benchmark datasets under a suitable number of function evaluations (NFEs).
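To illustrate the "linear multistep plus noise" idea, here is a toy stochastic two-step Adams-Bashforth sampler for a reverse-time VP-SDE with an analytically known score; the scheme and coefficients are illustrative assumptions, not SA-Solver's:

```python
import numpy as np

# Forward VP-SDE dx = -0.5 x dt + dW applied to data ~ N(2, 0.25) gives a
# Gaussian marginal p_t at every t, so the exact score is available and the
# reverse-time SDE dx = [f(x,t) - score(x,t)] dt + dW can be integrated
# backward from t = T to 0 with a multistep predictor on the drift.
rng = np.random.default_rng(4)
m0, s0sq, T, steps = 2.0, 0.25, 5.0, 100
dt = T / steps

def score(x, t):                  # exact score of the forward marginal p_t
    m = m0 * np.exp(-t / 2)
    v = s0sq * np.exp(-t) + 1 - np.exp(-t)
    return -(x - m) / v

def drift(x, t):                  # reverse-SDE drift: f(x,t) - g^2 * score
    return -0.5 * x - score(x, t)

x = rng.normal(size=100_000)      # start from the prior ~ N(0, 1)
t = T
prev = drift(x, t)
x = x - dt * prev + np.sqrt(dt) * rng.normal(size=x.shape)  # Euler bootstrap
t -= dt
for _ in range(steps - 1):        # AB2 predictor on the drift, plus noise
    cur = drift(x, t)
    x = x - dt * (1.5 * cur - 0.5 * prev) + np.sqrt(dt) * rng.normal(size=x.shape)
    prev, t = cur, t - dt

print(f"sample mean {x.mean():.3f} (target 2.0), var {x.var():.3f} (target 0.25)")
```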
Computational Approaches for Predicting Drug-Disease Associations: A Comprehensive Review
for: To review computational approaches for predicting drug-disease associations, aiming to reduce the cost, timelines, and risks of drug development through drug repositioning.
methods: The review categorizes existing methods into several groups: neural network-based algorithms, matrix-based algorithms, recommendation algorithms, link-based reasoning algorithms, and text mining and semantic reasoning.
results: The review compares the prediction performance of existing drug-disease association prediction algorithms and discusses current challenges and future prospects.
Abstract
In recent decades, traditional drug research and development have been facing challenges such as high cost, long timelines, and high risks. To address these issues, many computational approaches have been suggested for predicting the relationship between drugs and diseases through drug repositioning, aiming to reduce the cost, development cycle, and risks associated with developing new drugs. Researchers have explored different computational methods to predict drug-disease associations, including drug side effects-disease associations, drug-target associations, and miRNA-disease associations. In this comprehensive review, we focus on recent advances in predicting drug-disease association methods for drug repositioning. We first categorize these methods into several groups, including neural network-based algorithms, matrix-based algorithms, recommendation algorithms, link-based reasoning algorithms, and text mining and semantic reasoning. Then, we compare the prediction performance of existing drug-disease association prediction algorithms. Lastly, we delve into the present challenges and future prospects concerning drug-disease associations.
Linear Speedup of Incremental Aggregated Gradient Methods on Streaming Data
results: With streaming data, the method achieves linear speedup when the workers update frequently enough, even under heterogeneous data distributions. We prove that the expected squared distance to the optimal solution decays at $O((1+T)/(nt))$, where $n$ is the number of workers, $t$ is the iteration number, and $T/n$ is the update frequency of the workers. The analysis involves careful treatment of conditional expectations with stale gradients and of a recursive system with both delayed and noise terms, which are new to the analysis of IAG-type algorithms.
Abstract
This paper considers a type of incremental aggregated gradient (IAG) method for large-scale distributed optimization. The IAG method is well suited for the parameter server architecture as the latter can easily aggregate potentially stale gradients contributed by workers. Although the convergence of IAG in the case of deterministic gradients is well known, there are only a few results for the case of its stochastic variant based on streaming data. Considering strongly convex optimization, this paper shows that the streaming IAG method achieves linear speedup when the workers are updating frequently enough, even if the data sample distribution across workers is heterogeneous. We show that the expected squared distance to the optimal solution decays at $O((1+T)/(nt))$, where $n$ is the number of workers, $t$ is the iteration number, and $T/n$ is the update frequency of the workers. Our analysis involves careful treatment of the conditional expectations with stale gradients and a recursive system with both delayed and noise terms, which are new to the analysis of IAG-type algorithms. Numerical results are presented to verify our findings.
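A minimal sketch of the IAG server with streaming data follows (illustrative only; the paper's setting and analysis are considerably richer):

```python
import numpy as np

# The server keeps the most recent gradient reported by each worker and
# descends on the aggregate, so stale gradients from slow workers still
# contribute to every update.
rng = np.random.default_rng(5)
n_workers, dim, eta = 8, 5, 0.05
theta_true = rng.normal(size=dim)
theta = np.zeros(dim)
table = np.zeros((n_workers, dim))    # last reported gradient per worker

def stream_gradient(theta):
    # fresh least-squares sample: y = <theta_true, a> + noise
    a = rng.normal(size=dim)
    y = a @ theta_true + 0.1 * rng.normal()
    return (a @ theta - y) * a        # gradient of 0.5 * (a@theta - y)^2

for it in range(4000):
    i = rng.integers(n_workers)       # one (possibly slow) worker reports
    table[i] = stream_gradient(theta)
    theta = theta - eta * table.mean(axis=0)

print(f"distance to optimum: {np.linalg.norm(theta - theta_true):.4f}")
```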
LMBiS-Net: A Lightweight Multipath Bidirectional Skip Connection based CNN for Retinal Blood Vessel Segmentation
results: Experimental results show that LMBiS-Net achieves fast and accurate segmentation of retinal images with high robustness and generalizability.
Abstract
Blinding eye diseases are often correlated with altered retinal morphology, which can be clinically identified by segmenting retinal structures in fundus images. However, current methodologies often fall short in accurately segmenting delicate vessels. Although deep learning has shown promise in medical image segmentation, its reliance on repeated convolution and pooling operations can hinder the representation of edge information, ultimately limiting overall segmentation accuracy. In this paper, we propose a lightweight pixel-level CNN named LMBiS-Net for the segmentation of retinal vessels with an exceptionally low number of learnable parameters (only 0.172 M). The network uses multipath feature extraction blocks and incorporates bidirectional skip connections for the information flow between the encoder and decoder. Additionally, we have optimized the efficiency of the model by carefully selecting the number of filters to avoid filter overlap. This optimization significantly reduces training time and enhances computational efficiency. To assess the robustness and generalizability of LMBiS-Net, we performed comprehensive evaluations on various aspects of retinal images. Specifically, the model was subjected to rigorous tests to accurately segment retinal vessels, which play a vital role in ophthalmological diagnosis and treatment. By focusing on the retinal blood vessels, we were able to thoroughly analyze the performance and effectiveness of the LMBiS-Net model. The results of our tests demonstrate that LMBiS-Net is not only robust and generalizable but also capable of maintaining high levels of segmentation accuracy. These characteristics highlight the potential of LMBiS-Net as an efficient tool for high-speed and accurate segmentation of retinal images in various clinical applications.
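The sketch below illustrates the two described ingredients, multipath feature extraction and encoder-decoder skip connections with a feedback path; the channel counts and exact wiring are assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class MultiPath(nn.Module):
    """Parallel convolutions with different receptive fields, concatenated."""
    def __init__(self, cin, cout):
        super().__init__()
        b = cout // 2
        self.p3 = nn.Conv2d(cin, b, 3, padding=1)
        self.p5 = nn.Conv2d(cin, b, 5, padding=2)
        self.act = nn.ReLU()
    def forward(self, x):
        return self.act(torch.cat([self.p3(x), self.p5(x)], dim=1))

class LMBiSSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = MultiPath(1, 16)
        self.enc2 = MultiPath(16, 32)
        self.down = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear",
                              align_corners=False)
        self.dec1 = MultiPath(32 + 16, 16)      # forward skip from enc1
        self.refine = MultiPath(16 + 16, 16)    # feedback to the enc1 level
        self.head = nn.Conv2d(16, 1, 1)
    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.down(e1))
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))
        r = self.refine(torch.cat([d1, e1], dim=1))  # decoder info flows back
        return torch.sigmoid(self.head(r))           # vessel probability map

mask = LMBiSSketch()(torch.randn(1, 1, 64, 64))      # -> (1, 1, 64, 64)
```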
A multiple k-means cluster ensemble framework for clustering citation trajectories
results: Citation trajectories are found to exhibit four distinct patterns: Early Rise Rapid Decline, Early Rise Slow Decline, Delayed Rise No Decline, and Delayed Rise Slow Decline. The growth and decay times, cumulative citation distributions, and peak characteristics of these trajectories are redefined empirically.
Abstract
Citation maturity time varies for different articles. However, the impact of all articles is measured in a fixed window. Clustering their citation trajectories helps understand the knowledge diffusion process and reveals that not all articles gain immediate success after publication. Moreover, clustering trajectories is necessary for paper impact recommendation algorithms. It is a challenging problem because citation time series exhibit significant variability due to nonlinear and non-stationary characteristics. Prior works propose a set of arbitrary thresholds and a fixed rule-based approach. All methods are primarily parameter dependent. Consequently, it leads to inconsistencies while defining similar trajectories and ambiguities regarding their specific number. Most studies only capture extreme trajectories. Thus, a generalised clustering framework is required. This paper proposes a feature-based multiple k-means cluster ensemble framework. 195,783 and 41,732 well-cited articles from the Microsoft Academic Graph data are considered for clustering short-term (10 year) and long-term (30 year) trajectories, respectively. It has linear run time. Four distinct trajectories are obtained: Early Rise Rapid Decline (2.2%), Early Rise Slow Decline (45%), Delayed Rise No Decline (53%), and Delayed Rise Slow Decline (0.8%). Individual trajectory differences for two different spans are studied. Most papers exhibit Early Rise Slow Decline and Delayed Rise No Decline patterns. The growth and decay times, cumulative citation distribution, and peak characteristics of individual trajectories are redefined empirically. A detailed comparative study reveals our proposed methodology can detect all distinct trajectory classes.
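A sketch of a generic multiple k-means cluster ensemble with a co-association consensus step follows; the paper's trajectory features and consensus rule may differ:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans

# Run k-means many times with varying k, accumulate how often two
# trajectories share a cluster (co-association), then extract a consensus
# partition from that matrix.
rng = np.random.default_rng(6)
# toy trajectory features, e.g. (time-to-peak, peak height, decay rate)
F = np.vstack([rng.normal(m, 0.3, size=(100, 3))
               for m in ([0, 2, 1], [3, 1, 0], [1, 0, 2], [2, 3, 3])])

n = len(F)
coassoc = np.zeros((n, n))
for seed in range(30):                       # ensemble of k-means runs
    k = int(rng.integers(3, 7))              # vary k across runs
    labels = KMeans(n_clusters=k, n_init=5, random_state=seed).fit_predict(F)
    coassoc += labels[:, None] == labels[None, :]
coassoc /= 30

# consensus: average-linkage clustering on the co-association distances
dist = squareform(1.0 - coassoc, checks=False)
consensus = fcluster(linkage(dist, method="average"), t=4, criterion="maxclust")
print(np.bincount(consensus)[1:])            # sizes of the 4 consensus clusters
```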
Distance-Restricted Folklore Weisfeiler-Leman GNNs with Provable Cycle Counting Power
for: The paper aims to improve the efficiency and expressive power of graph neural networks (GNNs) for counting certain graph substructures, especially cycles, which is crucial for achieving robust and generalizable performance on molecular tasks.
methods: The proposed method, $d$-Distance-Restricted FWL(2) GNNs, uses node pairs whose mutual distances are at most $d$ as the units for message passing to balance expressive power and complexity. This approach avoids the expensive subgraph extraction operations in subgraph GNNs, lowering both the time and space complexity.
results: The paper theoretically shows that the discriminative power of $d$-DRFWL(2) GNNs strictly increases as $d$ increases. Moreover, the model has provably strong cycle counting power even with $d=2$, being able to count all 3-, 4-, 5-, and 6-cycles. Experiments on both synthetic and molecular datasets verify the theory.
Abstract
The ability of graph neural networks (GNNs) to count certain graph substructures, especially cycles, is important for the success of GNNs on a wide range of tasks. It has been recently used as a popular metric for evaluating the expressive power of GNNs. Many of the proposed GNN models with provable cycle counting power are based on subgraph GNNs, i.e., extracting a bag of subgraphs from the input graph, generating representations for each subgraph, and using them to augment the representation of the input graph. However, those methods require heavy preprocessing, and suffer from high time and memory costs. In this paper, we overcome the aforementioned limitations of subgraph GNNs by proposing a novel class of GNNs -- $d$-Distance-Restricted FWL(2) GNNs, or $d$-DRFWL(2) GNNs. $d$-DRFWL(2) GNNs use node pairs whose mutual distances are at most $d$ as the units for message passing to balance the expressive power and complexity. By performing message passing among distance-restricted node pairs in the original graph, $d$-DRFWL(2) GNNs avoid the expensive subgraph extraction operations in subgraph GNNs, making both the time and space complexity lower. We theoretically show that the discriminative power of $d$-DRFWL(2) GNNs strictly increases as $d$ increases. More importantly, $d$-DRFWL(2) GNNs have provably strong cycle counting power even with $d=2$: they can count all 3, 4, 5, 6-cycles. Since 6-cycles (e.g., benzene rings) are ubiquitous in organic molecules, being able to detect and count them is crucial for achieving robust and generalizable performance on molecular tasks. Experiments on both synthetic datasets and molecular datasets verify our theory. To the best of our knowledge, our model is the most efficient GNN model to date (both theoretically and empirically) that can count up to 6-cycles.
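The sketch below builds the basic machinery: pair representations restricted to mutual distance at most $d$, updated through shared intermediate nodes, with no subgraph extraction. The update rule here is a toy stand-in for the learned $d$-DRFWL(2) layers:

```python
import networkx as nx

d = 2
G = nx.cycle_graph(6)                        # a 6-cycle, e.g. a benzene ring

# all pairs at mutual distance <= d (the message-passing units)
dist = dict(nx.all_pairs_shortest_path_length(G, cutoff=d))
pairs = [(u, v) for u, dv in dist.items() for v in dv]
h = {p: float(dist[p[0]][p[1]]) for p in pairs}   # init: encode the distance

for _ in range(3):                           # a few FWL(2)-style iterations
    new_h = {}
    for (u, v) in pairs:
        # aggregate over intermediate nodes w with both (u, w) and (w, v)
        # inside the distance-restricted pair set
        msgs = [h[(u, w)] * h[(w, v)]
                for w in G.nodes
                if w in dist[u] and v in dist[w]]
        new_h[(u, v)] = h[(u, v)] + 0.1 * sum(msgs)   # toy update rule
    h = new_h

print(len(pairs), "distance-restricted pairs; h(0,0) =", round(h[(0, 0)], 3))
```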