for: This paper studies whether including more models in an ensemble always improves its average performance.
methods: The paper considers ensemble methods that combine the predictions of several base models and examines how the answer depends on the predictive metric (loss function) chosen.
results: The results show that the ensemble's average loss decreases monotonically with the number of models if, and only if, the loss function is convex. When the loss function is nonconvex, ensembles of good models keep getting better while ensembles of bad models keep getting worse.
Abstract
Ensemble methods combine the predictions of several base models. We study whether or not including more models in an ensemble always improves its average performance. Such a question depends on the kind of ensemble considered, as well as the predictive metric chosen. We focus on situations where all members of the ensemble are a priori expected to perform equally well, which is the case of several popular methods like random forests or deep ensembles. In this setting, we essentially show that ensembles are getting better all the time if, and only if, the considered loss function is convex. More precisely, in that case, the average loss of the ensemble is a decreasing function of the number of models. When the loss function is nonconvex, we show a series of results that can be summarised by the insight that ensembles of good models keep getting better, and ensembles of bad models keep getting worse. To this end, we prove a new result on the monotonicity of tail probabilities that may be of independent interest. We illustrate our results on a simple machine learning problem (diagnosing melanomas using neural nets).
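The convex/nonconvex dichotomy stated above can be checked with a small Monte Carlo experiment. The sketch below is purely illustrative and not taken from the paper: the base-model distributions, the "band" loss, and all constants are assumptions. Under a convex (squared) loss the expected ensemble loss should decrease with the number of models for both unbiased ("good") and biased ("bad") base models, while under the nonconvex band loss the good ensemble keeps improving and the bad one keeps getting worse.

```python
import numpy as np

rng = np.random.default_rng(0)
y = 0.0                                    # target value of a single prediction task
good = rng.normal(0.0, 1.0, size=500)      # unbiased base models ("good")
bad = rng.normal(1.5, 1.0, size=500)       # biased base models ("bad")

sq = lambda p: (p - y) ** 2                # convex loss
band = lambda p: float(abs(p - y) > 1.0)   # nonconvex 0/1-style loss

def avg_ensemble_loss(models, loss, k, trials=4000):
    """Average loss of an ensemble that averages k randomly chosen base models."""
    return np.mean([loss(rng.choice(models, size=k, replace=False).mean())
                    for _ in range(trials)])

for name, models in [("good", good), ("bad", bad)]:
    for lname, loss in [("convex", sq), ("nonconvex", band)]:
        curve = [round(avg_ensemble_loss(models, loss, k), 3) for k in (1, 2, 5, 20, 100)]
        print(f"{name} models, {lname} loss, k=1,2,5,20,100:", curve)
```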
SAIBench: A Structural Interpretation of AI for Science Through Benchmarks
for: This paper is written to address the challenges of deploying AI4S models in real-world applications by introducing a novel benchmarking approach called structural interpretation.
methods: The paper introduces a novel benchmarking approach called structural interpretation, which partitions the problem and metric spaces to facilitate a structural exploration of these spaces and identify the trusted operating range and trace errors.
results: The paper demonstrates the practical utility and effectiveness of structural interpretation through its application to three distinct AI4S workloads: machine-learning force fields (MLFF), jet tagging, and precipitation nowcasting. The benchmarks effectively model the trusted operating range, trace errors, and reveal novel perspectives for refining the model, training process, and data sampling strategy.
Abstract
Artificial Intelligence for Science (AI4S) is an emerging research field that utilizes machine learning advancements to tackle complex scientific computational issues, aiming to enhance computational efficiency and accuracy. However, the data-driven nature of AI4S lacks the correctness or accuracy assurances of conventional scientific computing, posing challenges when deploying AI4S models in real-world applications. To mitigate these, more comprehensive benchmarking procedures are needed to better understand AI4S models. This paper introduces a novel benchmarking approach, known as structural interpretation, which addresses two key requirements: identifying the trusted operating range in the problem space and tracing errors back to their computational components. This method partitions both the problem and metric spaces, facilitating a structural exploration of these spaces. The practical utility and effectiveness of structural interpretation are illustrated through its application to three distinct AI4S workloads: machine-learning force fields (MLFF), jet tagging, and precipitation nowcasting. The benchmarks effectively model the trusted operating range, trace errors, and reveal novel perspectives for refining the model, training process, and data sampling strategy. This work is part of the SAIBench project, an AI4S benchmarking suite.
Leveraging Graph Diffusion Models for Network Refinement Tasks
paper_authors: Puja Trivedi, Ryan Rossi, David Arbour, Tong Yu, Franck Dernoncourt, Sungchul Kim, Nedim Lipka, Namyong Park, Nesreen K. Ahmed, Danai Koutra
results: Through extensive experiments and a set of newly designed metrics, the paper shows that the proposed model effectively supports the following three refinement tasks for partially observable networks: T1, denoising extraneous subgraphs; T2, expanding existing subgraphs; and T3, performing "style" transfer by regenerating a particular subgraph to match the characteristics of a different node or subgraph.
Abstract
Most real-world networks are noisy and incomplete samples from an unknown target distribution. Refining them by correcting corruptions or inferring unobserved regions typically improves downstream performance. Inspired by the impressive generative capabilities that have been used to correct corruptions in images, and the similarities between "in-painting" and filling in missing nodes and edges conditioned on the observed graph, we propose a novel graph generative framework, SGDM, which is based on subgraph diffusion. Our framework not only improves the scalability and fidelity of graph diffusion models, but also leverages the reverse process to perform novel, conditional generation tasks. In particular, through extensive empirical analysis and a set of novel metrics, we demonstrate that our proposed model effectively supports the following refinement tasks for partially observable networks: T1: denoising extraneous subgraphs, T2: expanding existing subgraphs and T3: performing "style" transfer by regenerating a particular subgraph to match the characteristics of a different node or subgraph.
On the Adversarial Robustness of Graph Contrastive Learning Methods
methods: The study uses adaptive adversarial attacks targeting the graph structure in the evasion scenario to evaluate the robustness of GCL models on node and graph classification tasks.
results: The study finds that, under attacks on the graph structure, GCL models are not as robust as their counterparts in the image and text domains, although in some settings they still retain considerable robustness. These results help researchers better understand the robustness of GCL methods and point to directions for future research.
Abstract
Contrastive learning (CL) has emerged as a powerful framework for learning representations of images and text in a self-supervised manner while enhancing model robustness against adversarial attacks. More recently, researchers have extended the principles of contrastive learning to graph-structured data, giving birth to the field of graph contrastive learning (GCL). However, whether GCL methods can deliver the same advantages in adversarial robustness as their counterparts in the image and text domains remains an open question. In this paper, we introduce a comprehensive robustness evaluation protocol tailored to assess the robustness of GCL models. We subject these models to adaptive adversarial attacks targeting the graph structure, specifically in the evasion scenario. We evaluate node and graph classification tasks using diverse real-world datasets and attack strategies. With our work, we aim to offer insights into the robustness of GCL methods and hope to open avenues for potential future research directions.
A quasi-polynomial time algorithm for Multi-Dimensional Scaling via LP hierarchies
results: The paper gives an approximation algorithm for the Kamada-Kawai formulation of MDS with only quasi-polynomial dependence on the aspect ratio $\Delta$: for target dimension $k$, it computes an embedding of cost $\mathcal{O}(\text{OPT}^{1/k} \cdot \log(\Delta/\epsilon)) + \epsilon$ in time $n^{\mathcal{O}(1)} \cdot 2^{\tilde{\mathcal{O}}(k^2 (\log(\Delta)/\epsilon)^{k/2 + 1})}$, improving on the previous $n^2 \cdot 2^{\tilde{\mathcal{O}}(k \Delta^4 / \epsilon^2)}$-time algorithm. The analysis is a geometry-aware, conditioning-based rounding scheme for the Sherali-Adams LP hierarchy, which exploits the geometry of low-dimensional Euclidean space to avoid an exponential dependence on the aspect ratio.
Abstract
Multi-dimensional Scaling (MDS) is a family of methods for embedding pair-wise dissimilarities between $n$ objects into low-dimensional space. MDS is widely used as a data visualization tool in the social and biological sciences, statistics, and machine learning. We study the Kamada-Kawai formulation of MDS: given a set of non-negative dissimilarities $\{d_{i,j}\}_{i, j \in [n]}$ over $n$ points, the goal is to find an embedding $\{x_1,\dots,x_n\} \subset \mathbb{R}^k$ that minimizes \[ \text{OPT} = \min_{x} \mathbb{E}_{i,j \in [n]} \left[ \left(1-\frac{\|x_i - x_j\|}{d_{i,j}}\right)^2 \right] \] Despite its popularity, our theoretical understanding of MDS is extremely limited. Recently, Demaine, Hesterberg, Koehler, Lynch, and Urschel (arXiv:2109.11505) gave the first approximation algorithm with provable guarantees for Kamada-Kawai, which achieves an embedding with cost $\text{OPT} + \epsilon$ in $n^2 \cdot 2^{\tilde{\mathcal{O}}(k \Delta^4 / \epsilon^2)}$ time, where $\Delta$ is the aspect ratio of the input dissimilarities. In this work, we give the first approximation algorithm for MDS with quasi-polynomial dependency on $\Delta$: for target dimension $k$, we achieve a solution with cost $\mathcal{O}(\text{OPT}^{1/k} \cdot \log(\Delta/\epsilon)) + \epsilon$ in time $n^{\mathcal{O}(1)} \cdot 2^{\tilde{\mathcal{O}}(k^2 (\log(\Delta)/\epsilon)^{k/2 + 1})}$. Our approach is based on a novel analysis of a conditioning-based rounding scheme for the Sherali-Adams LP Hierarchy. Crucially, our analysis exploits the geometry of low-dimensional Euclidean space, allowing us to avoid an exponential dependence on the aspect ratio $\Delta$. We believe our geometry-aware treatment of the Sherali-Adams Hierarchy is an important step towards developing general-purpose techniques for efficient metric optimization algorithms.
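For reference, the Kamada-Kawai stress being minimized can be evaluated in a few lines of code. The sketch below only computes the objective for a given embedding; it is not the paper's LP-hierarchy algorithm, and the toy data are assumptions.

```python
import numpy as np

def kamada_kawai_stress(x, d):
    """Average Kamada-Kawai stress of an embedding.

    x : (n, k) array of embedded points
    d : (n, n) array of non-negative dissimilarities (d[i, j] > 0 for i != j)
    Returns E_{i,j}[(1 - ||x_i - x_j|| / d_{i,j})^2] over pairs i != j.
    """
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    mask = ~np.eye(n, dtype=bool)
    ratio = dist[mask] / d[mask]
    return np.mean((1.0 - ratio) ** 2)

# toy usage: 4 points, illustrative symmetric dissimilarities
rng = np.random.default_rng(1)
pts = rng.normal(size=(4, 2))
diss = np.abs(rng.normal(1.0, 0.2, size=(4, 4)))
diss = (diss + diss.T) / 2
np.fill_diagonal(diss, 1.0)   # diagonal unused because of the mask
print(kamada_kawai_stress(pts, diss))
```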
Towards Efficient Hyperdimensional Computing Using Photonics
results: The paper shows that PhotoHDC achieves two to five orders of magnitude lower energy-delay product (EDP) than state-of-the-art electro-photonic DNN accelerators for HDC training and inference, and four orders of magnitude lower EDP than CiM-based accelerators.
Abstract
Over the past few years, silicon photonics-based computing has emerged as a promising alternative to CMOS-based computing for Deep Neural Networks (DNN). Unfortunately, the non-linear operations and the high-precision requirements of DNNs make it extremely challenging to design efficient silicon photonics-based systems for DNN inference and training. Hyperdimensional Computing (HDC) is an emerging, brain-inspired machine learning technique that enjoys several advantages over existing DNNs, including being lightweight, requiring low-precision operands, and being robust to noise introduced by the nonidealities in the hardware. For HDC, computing in-memory (CiM) approaches have been widely used, as CiM reduces the data transfer cost if the operands can fit into the memory. However, inefficient multi-bit operations, high write latency, and low endurance make CiM ill-suited for HDC. On the other hand, the existing electro-photonic DNN accelerators are inefficient for HDC because they are specifically optimized for matrix multiplication in DNNs and consume a lot of power with high-precision data converters. In this paper, we argue that photonic computing and HDC complement each other better than photonic computing and DNNs, or CiM and HDC. We propose PhotoHDC, the first-ever electro-photonic accelerator for HDC training and inference, supporting the basic, record-based, and graph encoding schemes. Evaluating with popular datasets, we show that our accelerator can achieve two to five orders of magnitude lower EDP than the state-of-the-art electro-photonic DNN accelerators for implementing HDC training and inference. PhotoHDC also achieves four orders of magnitude lower energy-delay product than CiM-based accelerators for both HDC training and inference.
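Record-based encoding, one of the HDC schemes the accelerator supports, is easy to sketch classically. The following is a minimal, purely software illustration with bipolar hypervectors; the dimensionality, quantization levels, and toy data are assumptions, and the photonic hardware itself is not modeled.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000        # hypervector dimensionality (illustrative)
NUM_FEATURES = 8
NUM_LEVELS = 16   # quantization levels for feature values

# One random "ID" hypervector per feature position and one "level" hypervector
# per quantized value; bipolar {-1, +1} entries.
id_hvs = rng.choice([-1, 1], size=(NUM_FEATURES, D))
level_hvs = rng.choice([-1, 1], size=(NUM_LEVELS, D))

def encode(sample):
    """Record-based encoding: bind each feature's ID with its level vector,
    then bundle (sum) across features and binarize."""
    levels = np.clip((sample * NUM_LEVELS).astype(int), 0, NUM_LEVELS - 1)
    bound = id_hvs * level_hvs[levels]          # element-wise binding
    return np.sign(bound.sum(axis=0) + 1e-9)    # bundling + binarization

# Toy "training": class prototypes are bundles of encoded samples.
x0 = rng.random((20, NUM_FEATURES)) * 0.4        # class 0 clusters low
x1 = 0.6 + rng.random((20, NUM_FEATURES)) * 0.4  # class 1 clusters high
proto = [np.sign(sum(encode(x) for x in xs)) for xs in (x0, x1)]

query = encode(0.8 + np.zeros(NUM_FEATURES))
scores = [float(query @ p) for p in proto]       # similarity to each prototype
print("query is closest to class", int(np.argmax(scores)))
```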
Learning to Simulate: Generative Metamodeling via Quantile Regression
paper_authors: L. Jeff Hong, Yanxi Hou, Qingkai Zhang, Xiaowei Zhang
for: This paper aims to provide a new metamodeling technique called generative metamodeling, which can be used for real-time decision-making in complex systems.
methods: The proposed method, called quantile-regression-based generative metamodeling (QRGMM), uses a new algorithm to construct a “fast simulator of the simulator” that can generate random outputs faster than the original simulation model while retaining an approximately equal conditional distribution.
results: The paper presents extensive numerical experiments to demonstrate the empirical performance of QRGMM and compare it with other state-of-the-art generative algorithms. The results show that QRGMM can generate random outputs substantially faster than the original simulation model and provide accurate summary statistics for real-time decision-making.
Abstract
Stochastic simulation models, while effective in capturing the dynamics of complex systems, are often too slow to run for real-time decision-making. Metamodeling techniques are widely used to learn the relationship between a summary statistic of the outputs (e.g., the mean or quantile) and the inputs of the simulator, so that it can be used in real time. However, this methodology requires the knowledge of an appropriate summary statistic in advance, making it inflexible for many practical situations. In this paper, we propose a new metamodeling concept, called generative metamodeling, which aims to construct a "fast simulator of the simulator". This technique can generate random outputs substantially faster than the original simulation model, while retaining an approximately equal conditional distribution given the same inputs. Once constructed, a generative metamodel can instantaneously generate a large amount of random outputs as soon as the inputs are specified, thereby facilitating the immediate computation of any summary statistic for real-time decision-making. Furthermore, we propose a new algorithm -- quantile-regression-based generative metamodeling (QRGMM) -- and study its convergence and rate of convergence. Extensive numerical experiments are conducted to investigate the empirical performance of QRGMM, compare it with other state-of-the-art generative algorithms, and demonstrate its usefulness in practical real-time decision-making.
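The quantile-regression idea behind generative metamodeling can be illustrated with a simplified sketch: fit one conditional quantile model per level on offline simulation runs, then generate new outputs by inverse-transform sampling along the predicted quantile curve. This is an illustrative rendition under stated assumptions (linear quantile regression, a fixed quantile grid, interpolation between levels), not the authors' exact QRGMM algorithm.

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(0)

# Offline phase: run the (slow) simulator on training inputs.
def slow_simulator(x):                       # stand-in stochastic simulator
    return 2.0 * x + rng.normal(0.0, 0.5 + 0.5 * x)

X = rng.uniform(0, 1, size=(2000, 1))
Y = np.array([slow_simulator(x[0]) for x in X])

# Fit one quantile regression per level on a grid.
taus = np.linspace(0.05, 0.95, 19)
models = [QuantileRegressor(quantile=t, alpha=0.0, solver="highs").fit(X, Y)
          for t in taus]

def generate(x_new, n_samples=1000):
    """Fast generator: predict the quantile curve at x_new, then sample by
    inverse transform with linear interpolation between grid levels."""
    q = np.array([m.predict([[x_new]])[0] for m in models])
    q = np.maximum.accumulate(q)             # enforce monotone quantiles
    u = rng.uniform(taus[0], taus[-1], size=n_samples)
    return np.interp(u, taus, q)

samples = generate(0.7)
print("generated mean:", samples.mean().round(3))
```

Once the quantile models are fitted offline, generation amounts to a handful of predictions and an interpolation, which is what makes summary statistics available essentially instantaneously at decision time.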
for: The paper is written for those who work with high-dimensional imbalanced data and need effective unsupervised feature selection methods to handle such data.
methods: The paper proposes a modification of the Laplacian Score (LS) called the Marginal Laplacian Score (MLS) to better handle imbalanced data. The MLS algorithm is integrated into the Differentiable Unsupervised Feature Selection (DUFS) method to create DUFS-MLS.
results: The proposed methods demonstrate robust and improved performance on synthetic and public data sets.
Abstract
High-dimensional imbalanced data poses a machine learning challenge. In the absence of sufficient or high-quality labels, unsupervised feature selection methods are crucial for the success of subsequent algorithms. Therefore, there is a growing need for unsupervised feature selection algorithms focused on imbalanced data. Thus, we propose a Marginal Laplacian Score (MLS) a modification of the well-known Laplacian Score (LS) to be better suited for imbalance data. We introduce an assumption that the minority class or anomalous appear more frequently in the margin of the features. Consequently, MLS aims to preserve the local structure of the data set's margin. As MLS is better suited for handling imbalanced data, we propose its integration into modern feature selection methods that utilize the Laplacian score. We integrate the MLS algorithm into the Differentiable Unsupervised Feature Selection (DUFS), resulting in DUFS-MLS. The proposed methods demonstrate robust and improved performance on synthetic and public data sets.
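The abstract does not spell out how the margin-focused modification is computed, so the sketch below only shows the classical Laplacian Score that MLS builds on: features with low scores preserve the local neighborhood structure of the data. The graph construction details (k-NN heat kernel, bandwidth) are standard choices, not the paper's.

```python
import numpy as np

def laplacian_score(X, n_neighbors=5, sigma=1.0):
    """Classical Laplacian Score (He et al., 2005): lower scores indicate
    features that better preserve the local manifold structure.
    X : (n_samples, n_features)
    """
    n = X.shape[0]
    # k-NN affinity graph with a heat kernel.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    S = np.exp(-d2 / (2 * sigma ** 2))
    idx = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]
    W = np.zeros_like(S)
    for i in range(n):
        W[i, idx[i]] = S[i, idx[i]]
    W = np.maximum(W, W.T)                     # symmetrize
    D = np.diag(W.sum(axis=1))
    L = D - W
    ones = np.ones(n)
    scores = []
    for f in X.T:
        f_tilde = f - (f @ D @ ones) / (ones @ D @ ones)  # remove trivial component
        scores.append((f_tilde @ L @ f_tilde) / (f_tilde @ D @ f_tilde + 1e-12))
    return np.array(scores)

X = np.random.default_rng(0).normal(size=(100, 6))
print(laplacian_score(X).round(3))
```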
Unified Binary and Multiclass Margin-Based Classification
results: The paper expresses a broad range of multiclass loss functions in the relative margin form and uses this representation to extend the classification-calibration result of Bartlett et al. (2006) from binary margin losses to the multiclass setting. It also analyzes the class of Fenchel-Young losses and expands the set of these losses known to be classification-calibrated.
Abstract
The notion of margin loss has been central to the development and analysis of algorithms for binary classification. To date, however, there remains no consensus as to the analogue of the margin loss for multiclass classification. In this work, we show that a broad range of multiclass loss functions, including many popular ones, can be expressed in the relative margin form, a generalization of the margin form of binary losses. The relative margin form is broadly useful for understanding and analyzing multiclass losses as shown by our prior work (Wang and Scott, 2020, 2021). To further demonstrate the utility of this way of expressing multiclass losses, we use it to extend the seminal result of Bartlett et al. (2006) on classification-calibration of binary margin losses to multiclass. We then analyze the class of Fenchel-Young losses, and expand the set of these losses that are known to be classification-calibrated.
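To make the terminology concrete, the following paraphrases (rather than quotes) the two forms involved: a binary margin loss depends on the score only through its product with the label, and the relative margin form generalizes this so that a multiclass loss depends on the scores only through their differences relative to the true class.

```latex
% Binary margin form: score f(x) in R, label y in {-1, +1}
\ell\bigl(y, f(x)\bigr) = \phi\bigl(y\, f(x)\bigr), \qquad \phi : \mathbb{R} \to \mathbb{R}.

% Relative margin form (multiclass): scores f(x) in R^K, label y in [K];
% the loss is a function of the score differences ("relative margins")
\ell\bigl(y, f(x)\bigr) = \psi\Bigl( \bigl( f_y(x) - f_k(x) \bigr)_{k \neq y} \Bigr).
```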
A transductive few-shot learning approach for classification of digital histopathological slides from liver cancer
results: Experiments are conducted on a liver cancer dataset, specifically hepatocellular carcinoma. Initial results show the effectiveness of the method and its potential to make automated cancer diagnosis and treatment more efficient while reducing the time and effort required for expert annotation.
Abstract
This paper presents a new approach for classifying 2D histopathology patches using few-shot learning. The method is designed to tackle a significant challenge in histopathology, which is the limited availability of labeled data. By applying a sliding window technique to histopathology slides, we illustrate the practical benefits of transductive learning (i.e., making joint predictions on patches) to achieve consistent and accurate classification. Our approach involves an optimization-based strategy that actively penalizes the prediction of a large number of distinct classes within each window. We conducted experiments on histopathological data to classify tissue classes in digital slides of liver cancer, specifically hepatocellular carcinoma. The initial results show the effectiveness of our method and its potential to enhance the process of automated cancer diagnosis and treatment, all while reducing the time and effort required for expert annotation.
A novel feature selection method based on quantum support vector machine
methods: The proposed method, quantum support vector machine feature selection (QSVMF), combines quantum support vector machines with a multi-objective genetic algorithm. QSVMF optimizes multiple objectives simultaneously: maximizing classification accuracy, minimizing the number of selected features and the quantum circuit cost, and reducing feature covariance.
results: Experimental results on a breast cancer dataset show that QSVMF achieves superior performance compared to classical approaches with the selected features. Moreover, the Pareto front solutions of QSVMF enable analysis of accuracy versus feature-set-size trade-offs, identifying extremely sparse yet accurate feature subsets. The selected features are also found to be biologically relevant in terms of known breast cancer biomarkers.
Abstract
Feature selection is critical in machine learning to reduce dimensionality and improve model accuracy and efficiency. The exponential growth in feature space dimensionality for modern datasets directly results in ambiguous samples and redundant features, which can severely degrade classification accuracy. Quantum machine learning offers potential advantages for addressing this challenge. In this paper, we propose a novel method, quantum support vector machine feature selection (QSVMF), integrating quantum support vector machines with multi-objective genetic algorithm. QSVMF optimizes multiple simultaneous objectives: maximizing classification accuracy, minimizing selected features and quantum circuit costs, and reducing feature covariance. We apply QSVMF for feature selection on a breast cancer dataset, comparing the performance of QSVMF against classical approaches with the selected features. Experimental results show that QSVMF achieves superior performance. Furthermore, The Pareto front solutions of QSVMF enable analysis of accuracy versus feature set size trade-offs, identifying extremely sparse yet accurate feature subsets. We contextualize the biological relevance of the selected features in terms of known breast cancer biomarkers. This work highlights the potential of quantum-based feature selection to enhance machine learning efficiency and performance on complex real-world data.
Q-learning Based Optimal False Data Injection Attack on Probabilistic Boolean Control Networks
results: The effectiveness of the proposed approach is verified on two attacked PBCNs, a 10-node network and a 28-node network.
Abstract
In this paper, we present a reinforcement learning (RL) method for solving optimal false data injection attack problems in probabilistic Boolean control networks (PBCNs) where the attacker lacks knowledge of the system model. Specifically, we employ a Q-learning (QL) algorithm to address this problem. We then propose an improved QL algorithm that not only enhances learning efficiency but also obtains optimal attack strategies for large-scale PBCNs that the standard QL algorithm cannot handle. Finally, we verify the effectiveness of our proposed approach by considering two attacked PBCNs, including a 10-node network and a 28-node network.
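The tabular Q-learning update underlying the approach is standard; a generic sketch follows. The environment below is a placeholder (the abstract does not specify the state/action encoding or reward for the attacked PBCN), so only the update rule should be read as meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS = 16, 4        # placeholder sizes for an attacked PBCN
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

def env_step(state, action):
    """Placeholder environment returning (next_state, reward). In the paper's
    setting this would simulate the PBCN under the chosen false-data-injection
    action, with the reward reflecting the attack objective."""
    next_state = rng.integers(N_STATES)
    reward = float(next_state == N_STATES - 1)
    return next_state, reward

Q = np.zeros((N_STATES, N_ACTIONS))
for episode in range(500):
    s = rng.integers(N_STATES)
    for t in range(50):
        a = rng.integers(N_ACTIONS) if rng.random() < EPSILON else int(Q[s].argmax())
        s_next, r = env_step(s, a)
        # Standard Q-learning temporal-difference update
        Q[s, a] += ALPHA * (r + GAMMA * Q[s_next].max() - Q[s, a])
        s = s_next

print("greedy attack policy per state:", Q.argmax(axis=1))
```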
results: The results show that with bandit (zeroth-order) feedback there is a high-dimensional regime in which collaboration reduces the average regret and may even yield a linear speedup in the number of machines, whereas with first-order gradient feedback collaboration is not beneficial against an adaptive adversary. The analysis attains tight regret bounds in the intermittent communication setting, bridging the gap between the stochastic and adaptive settings in federated online optimization.
Abstract
We study the problems of distributed online and bandit convex optimization against an adaptive adversary. We aim to minimize the average regret on $M$ machines working in parallel over $T$ rounds with $R$ intermittent communications. Assuming the underlying cost functions are convex and can be generated adaptively, our results show that collaboration is not beneficial when the machines have access to the first-order gradient information at the queried points. This is in contrast to the case for stochastic functions, where each machine samples the cost functions from a fixed distribution. Furthermore, we delve into the more challenging setting of federated online optimization with bandit (zeroth-order) feedback, where the machines can only access values of the cost functions at the queried points. The key finding here is identifying the high-dimensional regime where collaboration is beneficial and may even lead to a linear speedup in the number of machines. We further illustrate our findings through federated adversarial linear bandits by developing novel distributed single and two-point feedback algorithms. Our work is the first attempt towards a systematic understanding of federated online optimization with limited feedback, and it attains tight regret bounds in the intermittent communication setting for both first and zeroth-order feedback. Our results thus bridge the gap between stochastic and adaptive settings in federated online optimization.
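With bandit (zeroth-order) feedback a machine can only query cost values, so algorithms of this kind build on randomized finite-difference gradient estimates. Below is a single-machine sketch of the standard two-point estimator on an illustrative convex cost; the smoothing radius, step sizes, and dimension are assumptions, and the federated, intermittent-communication machinery of the paper is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10          # problem dimension (illustrative)
delta = 0.01    # smoothing radius

def f(x):       # an example convex cost, observed only through point queries
    return 0.5 * np.sum((x - 1.0) ** 2)

x = np.zeros(d)
for t in range(2000):
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)                 # random unit direction
    # Two-point estimate: (d / (2*delta)) * (f(x + delta*u) - f(x - delta*u)) * u
    g = (d / (2 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u
    x -= 0.05 / np.sqrt(t + 1) * g         # decaying step size
print("final cost:", round(float(f(x)), 4))
```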
LoCoMotif: Discovering time-warped motifs in time series
results: The authors validate the method on a real physiotherapy use case and introduce a new quantitative evaluation metric for time series motif discovery. The results show that LoCoMotif substantially outperforms existing methods.
Abstract
Time Series Motif Discovery (TSMD) refers to the task of identifying patterns that occur multiple times (possibly with minor variations) in a time series. All existing methods for TSMD have one or more of the following limitations: they only look for the two most similar occurrences of a pattern; they only look for patterns of a pre-specified, fixed length; they cannot handle variability along the time axis; and they only handle univariate time series. In this paper, we present a new method, LoCoMotif, that has none of these limitations. The method is motivated by a concrete use case from physiotherapy. We demonstrate the value of the proposed method on this use case. We also introduce a new quantitative evaluation metric for motif discovery, and benchmark data for comparing TSMD methods. LoCoMotif substantially outperforms the existing methods, on top of being more broadly applicable.
Interpreting Differentiable Latent States for Healthcare Time-series Data
methods: The paper presents a concise algorithm, applicable to any differentiable model, that interprets latent states using highly related input features, interprets predictions using subsets of input features via the latent states, and interprets changes in latent states over time.
results: On a real-world healthcare dataset, the approach identifies a daytime behavioral pattern for predicting nocturnal behavior, demonstrating the interpretability and practical value of the method.
Abstract
Machine learning enables extracting clinical insights from large temporal datasets. The applications of such machine learning models include identifying disease patterns and predicting patient outcomes. However, limited interpretability poses challenges for deploying advanced machine learning in digital healthcare. Understanding the meaning of latent states is crucial for interpreting machine learning models, assuming they capture underlying patterns. In this paper, we present a concise algorithm that allows for i) interpreting latent states using highly related input features; ii) interpreting predictions using subsets of input features via latent states; and iii) interpreting changes in latent states over time. The proposed algorithm is feasible for any model that is differentiable. We demonstrate that this approach enables the identification of a daytime behavioral pattern for predicting nocturnal behavior in a real-world healthcare dataset.
The Effects of Overparameterization on Sharpness-aware Minimization: An Empirical and Theoretical Analysis
results: Empirical and theoretical results show that overparameterization critically influences SAM: under overparameterization, SAM achieves a linear convergence rate in a stochastic setting, the linearly stable minima it finds are flatter and have more uniformly distributed Hessian moments than those of SGD, and the generalization improvement over SGD keeps increasing as the model becomes more overparameterized. The study also finds that sparsity can open up an avenue for effective overparameterization in practice.
Abstract
Training an overparameterized neural network can yield minimizers of the same level of training loss and yet different generalization capabilities. With evidence that indicates a correlation between sharpness of minima and their generalization errors, increasing efforts have been made to develop an optimization method to explicitly find flat minima as more generalizable solutions. This sharpness-aware minimization (SAM) strategy, however, has not been studied much yet as to how overparameterization can actually affect its behavior. In this work, we analyze SAM under varying degrees of overparameterization and present both empirical and theoretical results that suggest a critical influence of overparameterization on SAM. Specifically, we first use standard techniques in optimization to prove that SAM can achieve a linear convergence rate under overparameterization in a stochastic setting. We also show that the linearly stable minima found by SAM are indeed flatter and have more uniformly distributed Hessian moments compared to those of SGD. These results are corroborated with our experiments that reveal a consistent trend that the generalization improvement made by SAM continues to increase as the model becomes more overparameterized. We further present that sparsity can open up an avenue for effective overparameterization in practice.
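For readers unfamiliar with SAM, its two-step update (ascend to a nearby worst-case point, then descend using the gradient computed there) can be written compactly in PyTorch. The sketch below is a generic SAM step on an illustrative model, not anything specific to the paper's overparameterized analysis; the perturbation radius rho and the hyperparameters are assumptions.

```python
import torch

def sam_step(model, loss_fn, x, y, optimizer, rho=0.05):
    """One SAM update: perturb weights toward a nearby worst-case point, then
    apply the base optimizer using the gradient computed there."""
    # 1) gradient at the current weights
    loss = loss_fn(model(x), y)
    loss.backward()
    grad_norm = torch.sqrt(sum((p.grad ** 2).sum() for p in model.parameters()
                               if p.grad is not None))
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)                    # ascend to w + eps
            eps.append(e)
    optimizer.zero_grad()
    # 2) gradient at the perturbed weights
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)                # restore the original weights
    optimizer.step()                     # descend with the SAM gradient
    optimizer.zero_grad()
    return loss.item()

# usage sketch (illustrative model and data)
model = torch.nn.Linear(20, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 20), torch.randint(0, 2, (32,))
print(sam_step(model, torch.nn.functional.cross_entropy, x, y, opt))
```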
Model Performance Prediction for Hyperparameter Optimization of Deep Learning Models Using High Performance Computing and Quantum Annealing
results: The tests show that Swift-Hyperband finds comparable (or better) hyperparameters for the Machine-Learned Particle Flow model used in High Energy Physics, as well as for a wider range of target models from domains such as computer vision and natural language processing, while using less computational resources than conventional approaches.
Abstract
Hyperparameter Optimization (HPO) of Deep Learning-based models tends to be a compute resource intensive process as it usually requires to train the target model with many different hyperparameter configurations. We show that integrating model performance prediction with early stopping methods holds great potential to speed up the HPO process of deep learning models. Moreover, we propose a novel algorithm called Swift-Hyperband that can use either classical or quantum support vector regression for performance prediction and benefit from distributed High Performance Computing environments. This algorithm is tested not only for the Machine-Learned Particle Flow model used in High Energy Physics, but also for a wider range of target models from domains such as computer vision and natural language processing. Swift-Hyperband is shown to find comparable (or better) hyperparameters as well as using less computational resources in all test cases.
Wireless Network Digital Twin for 6G: Generative AI as A Key Enabler
results: Compared with existing systems, GNNFlow provides up to 21.1x faster continuous learning.
Abstract
Graph Neural Networks (GNNs) play a crucial role in various fields. However, most existing deep graph learning frameworks assume pre-stored static graphs and do not support training on graph streams. In contrast, many real-world graphs are dynamic and contain time domain information. We introduce GNNFlow, a distributed framework that enables efficient continuous temporal graph representation learning on dynamic graphs on multi-GPU machines. GNNFlow introduces an adaptive time-indexed block-based data structure that effectively balances memory usage with graph update and sampling operation efficiency. It features a hybrid GPU-CPU graph data placement for rapid GPU-based temporal neighborhood sampling and kernel optimizations for enhanced sampling processes. A dynamic GPU cache for node and edge features is developed to maximize cache hit rates through reuse and restoration strategies. GNNFlow supports distributed training across multiple machines with static scheduling to ensure load balance. We implement GNNFlow based on DGL and PyTorch. Our experimental results show that GNNFlow provides up to 21.1x faster continuous learning than existing systems.
The Devil is in the Data: Learning Fair Graph Neural Networks via Partial Knowledge Distillation
results: Experiments on several benchmark datasets show that FairGKD significantly improves the fairness of GNNs by a large margin while maintaining their utility.
Abstract
Graph neural networks (GNNs) are being increasingly used in many high-stakes tasks, and as a result, there is growing attention on their fairness recently. GNNs have been shown to be unfair as they tend to make discriminatory decisions toward certain demographic groups, divided by sensitive attributes such as gender and race. While recent works have been devoted to improving their fairness performance, they often require accessible demographic information. This greatly limits their applicability in real-world scenarios due to legal restrictions. To address this problem, we present a demographic-agnostic method to learn fair GNNs via knowledge distillation, namely FairGKD. Our work is motivated by the empirical observation that training GNNs on partial data (i.e., only node attributes or topology data) can improve their fairness, albeit at the cost of utility. To make a balanced trade-off between fairness and utility performance, we employ a set of fairness experts (i.e., GNNs trained on different partial data) to construct the synthetic teacher, which distills fairer and informative knowledge to guide the learning of the GNN student. Experiments on several benchmark datasets demonstrate that FairGKD, which does not require access to demographic information, significantly improves the fairness of GNNs by a large margin while maintaining their utility.
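The abstract describes distilling from a "synthetic teacher" built from fairness experts trained on partial data, but does not give the exact objective. The sketch below shows a generic partial-knowledge-distillation loss in which the teacher signal is the average of the experts' softened predictions; that averaging, the temperature, and the mixing weight are assumptions rather than the paper's construction.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits_list, labels,
                      temperature=2.0, alpha=0.5):
    """Generic knowledge-distillation objective: cross-entropy on the labels
    plus KL divergence toward a 'synthetic teacher' formed here by averaging
    the softened predictions of several expert models (an assumption; the
    paper's synthetic-teacher construction may differ)."""
    task = F.cross_entropy(student_logits, labels)
    teacher_prob = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)
    kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                  teacher_prob, reduction="batchmean") * temperature ** 2
    return alpha * task + (1 - alpha) * kd

# usage sketch with random tensors standing in for GNN outputs
s = torch.randn(16, 3, requires_grad=True)
teachers = [torch.randn(16, 3) for _ in range(2)]  # e.g. attribute-only / topology-only experts
y = torch.randint(0, 3, (16,))
print(distillation_loss(s, teachers, y).item())
```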
Continuous optimization by quantum adaptive distribution search
methods: The paper proposes a quantum continuous optimization algorithm named QuADS, which integrates Grover adaptive search (GAS) with the classical covariance matrix adaptation evolution strategy (CMA-ES) to solve continuous optimization problems more efficiently.
results: Numerical experiments show that QuADS outperforms both GAS and CMA-ES. This is achieved by adaptively refining the initial state distribution throughout the optimization rather than consistently using a uniform state, which reduces the number of oracle calls.
Abstract
In this paper, we introduce the quantum adaptive distribution search (QuADS), a quantum continuous optimization algorithm that integrates Grover adaptive search (GAS) with the covariance matrix adaptation - evolution strategy (CMA-ES), a classical technique for continuous optimization. QuADS utilizes the quantum-based search capabilities of GAS and enhances them with the principles of CMA-ES for more efficient optimization. It employs a multivariate normal distribution for the initial state of the quantum search and repeatedly updates it throughout the optimization process. Our numerical experiments show that QuADS outperforms both GAS and CMA-ES. This is achieved through adaptive refinement of the initial state distribution rather than consistently using a uniform state, resulting in fewer oracle calls. This study presents an important step toward exploiting the potential of quantum computing for continuous optimization.
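A fully classical stand-in for the QuADS loop can convey the idea: maintain a multivariate normal proposal distribution, use a search step to find points below the current threshold (here a plain rejection step replaces the Grover adaptive search oracle), and update the distribution in a CMA-ES-flavoured way. Everything in this sketch, including the update rule and constants, is an illustrative assumption rather than the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(x):                       # illustrative continuous objective
    return np.sum((x - 0.5) ** 2)

dim, pop = 4, 32
mean, cov = np.zeros(dim), np.eye(dim)  # initial proposal distribution
best_x, best_f = mean, objective(mean)

for it in range(50):
    # Classical stand-in for the quantum search: sample from the current
    # multivariate normal and keep points beating the current threshold.
    cand = rng.multivariate_normal(mean, cov, size=pop)
    vals = np.array([objective(c) for c in cand])
    better = cand[vals < best_f]
    if len(better) == 0:
        continue
    i = int(np.argmin(vals))
    best_x, best_f = cand[i], vals[i]
    # CMA-ES-flavoured update: recentre on the improving samples and blend
    # the covariance toward their spread.
    mean = better.mean(axis=0)
    if len(better) >= 2:
        cov = 0.5 * cov + 0.5 * np.cov(better.T) + 1e-6 * np.eye(dim)

print("best value found:", round(float(best_f), 6))
```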
Improving Self-supervised Molecular Representation Learning using Persistent Homology
results: The approach is rigorously evaluated on molecular property prediction: after self-supervised learning, the representations offer considerably more predictive power than the baselines across different probing tasks, the proposed loss increases baseline performance, sometimes by a large margin, and substantial improvements are often obtained on very small datasets.
Abstract
Self-supervised learning (SSL) has great potential for molecular representation learning given the complexity of molecular graphs, the large amounts of unlabelled data available, the considerable cost of obtaining labels experimentally, and the hence often only small training datasets. The importance of the topic is reflected in the variety of paradigms and architectures that have been investigated recently. Yet the differences in performance seem often minor and are barely understood to date. In this paper, we study SSL based on persistent homology (PH), a mathematical tool for modeling topological features of data that persist across multiple scales. It has several unique features which particularly suit SSL, naturally offering: different views of the data, stability in terms of distance preservation, and the opportunity to flexibly incorporate domain knowledge. We (1) investigate an autoencoder, which shows the general representational power of PH, and (2) propose a contrastive loss that complements existing approaches. We rigorously evaluate our approach for molecular property prediction and demonstrate its particular features in improving the embedding space: after SSL, the representations are better and offer considerably more predictive power than the baselines over different probing tasks; our loss increases baseline performance, sometimes largely; and we often obtain substantial improvements over very small datasets, a common scenario in practice.
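The contrastive part of the proposal can be pictured with a standard NT-Xent loss between two views of the same molecule, where one view could be derived from persistent-homology features. The pairing, the encoders, and the temperature below are illustrative assumptions; the paper's actual PH-based loss may differ.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.1):
    """Standard NT-Xent contrastive loss between two batches of embeddings,
    where z1[i] and z2[i] are two views of the same molecule (for instance a
    GNN embedding and an embedding of persistent-homology features; this
    pairing is an illustrative assumption)."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    n = z1.shape[0]
    z = torch.cat([z1, z2], dim=0)                     # (2n, d)
    sim = z @ z.t() / temperature                      # scaled cosine similarities
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool), float("-inf"))
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

z_graph = torch.randn(8, 64)   # stand-in for GNN embeddings of 8 molecules
z_ph = torch.randn(8, 64)      # stand-in for persistence-based embeddings
print(nt_xent(z_graph, z_ph).item())
```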
Mostly Beneficial Clustering: Aggregating Data for Operational Decision Making
paper_authors: Chengzhang Li, Zhenkang Peng, Ying Rong
for: This study aims to improve operational decision-making for large-scale systems, where thousands of problems must be solved with only limited data per problem.
methods: We propose a cluster-based shrunken-SAA approach that exploits the cluster structure among problems when aggregating data. We prove that, as the number of problems grows, leveraging a known cluster structure yields additional benefits over data aggregation approaches that neglect it. When the cluster structure is unknown, we show that unveiling it, even at the cost of a few data points, can be beneficial, especially when the distance between clusters of problems is substantial.
results: We explore the performance of the proposed approach through numerical experiments on managing newsvendor systems. We investigate the impact of the distance metric between problem instances using synthetic data and validate the advantages of cluster-based data aggregation on real data, especially in the small-data large-scale regime.
Abstract
With increasingly volatile market conditions and rapid product innovations, operational decision-making for large-scale systems entails solving thousands of problems with limited data. Data aggregation is proposed to combine the data across problems to improve the decisions obtained by solving those problems individually. We propose a novel cluster-based shrunken-SAA approach that can exploit the cluster structure among problems when implementing the data aggregation approaches. We prove that, as the number of problems grows, leveraging the known cluster structure among problems yields additional benefits over the data aggregation approaches that neglect such structure. When the cluster structure is unknown, we show that unveiling the cluster structure, even at the cost of a few data points, can be beneficial, especially when the distance between clusters of problems is substantial. Our proposed approach can be extended to general cost functions under mild conditions. When the number of problems gets large, the optimality gap of our proposed approach decreases exponentially in the distance between the clusters. We explore the performance of the proposed approach through the application of managing newsvendor systems via numerical experiments. We investigate the impacts of distance metrics between problem instances on the performance of the cluster-based Shrunken-SAA approach with synthetic data. We further validate our proposed approach with real data and highlight the advantages of cluster-based data aggregation, especially in the small-data large-scale regime, compared to the existing approaches.
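A toy newsvendor experiment conveys why pooling data within clusters of similar problems can help when each problem has very few observations. The shrinkage rule, cost parameters, and data-generating process below are illustrative assumptions and not the paper's estimator or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
CRITICAL_RATIO = 0.8            # newsvendor critical ratio b / (b + h)

# Many small problems: each has few demand observations; problems fall into
# two latent clusters with different demand levels (illustrative setup).
true_means = np.where(rng.random(200) < 0.5, 30.0, 70.0)
data = [rng.poisson(m, size=5) for m in true_means]   # only 5 samples each
cluster = (true_means > 50).astype(int)               # clusters assumed known

def saa_order(samples):
    """Plain SAA: order the empirical critical-ratio quantile."""
    return np.quantile(samples, CRITICAL_RATIO)

def cluster_shrunken_order(i, beta=0.7):
    """Illustrative shrinkage: blend the problem's own quantile with the
    pooled quantile of its cluster (beta is an arbitrary shrinkage weight;
    the paper derives its own aggregation rule)."""
    pooled = np.concatenate([data[j] for j in range(len(data)) if cluster[j] == cluster[i]])
    return beta * np.quantile(pooled, CRITICAL_RATIO) + (1 - beta) * saa_order(data[i])

def cost(order, mean, n_eval=2000):
    d = rng.poisson(mean, size=n_eval)
    under = np.maximum(d - order, 0)            # lost sales
    over = np.maximum(order - d, 0)             # leftover inventory
    return np.mean(4.0 * under + 1.0 * over)    # b=4, h=1 -> critical ratio 0.8

c_saa = np.mean([cost(saa_order(data[i]), true_means[i]) for i in range(200)])
c_cluster = np.mean([cost(cluster_shrunken_order(i), true_means[i]) for i in range(200)])
print(f"per-problem SAA cost: {c_saa:.2f}, cluster-aggregated cost: {c_cluster:.2f}")
```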
Utilizing Model Residuals to Identify Rental Properties of Interest: The Price Anomaly Score (PAS) and Its Application to Real-time Data in Manhattan
results: The study finds that when a model generalizes to at least 75% of the dataset, the remaining prediction deviations carry significant information. The Price Anomaly Score (PAS) can then be used to identify overpriced or underpriced rental listings.
Abstract
Understanding whether a property is priced fairly hinders buyers and sellers since they usually do not have an objective viewpoint of the price distribution for the overall market of their interest. Drawing from data collected of all possible available properties for rent in Manhattan as of September 2023, this paper aims to strengthen our understanding of model residuals; specifically on machine learning models which generalize for a majority of the distribution of a well-proportioned dataset. Most models generally perceive deviations from predicted values as mere inaccuracies, however this paper proposes a different vantage point: when generalizing to at least 75\% of the data-set, the remaining deviations reveal significant insights. To harness these insights, we introduce the Price Anomaly Score (PAS), a metric capable of capturing boundaries between irregularly predicted prices. By combining relative pricing discrepancies with statistical significance, the Price Anomaly Score (PAS) offers a multifaceted view of rental valuations. This metric allows experts to identify overpriced or underpriced properties within a dataset by aggregating PAS values, then fine-tuning upper and lower boundaries to any threshold to set indicators of choice.
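The abstract describes PAS as combining relative pricing discrepancies with statistical significance, without giving the exact formula. The sketch below therefore uses a simple stand-in (relative residual weighted by a robust z-score of the residual) and thresholds the aggregated scores to flag listings; the formula, thresholds, and synthetic data are assumptions.

```python
import numpy as np
import pandas as pd

def price_anomaly_scores(actual, predicted):
    """Illustrative stand-in for the Price Anomaly Score: combine the relative
    pricing discrepancy with how statistically unusual the residual is
    (robust z-score). The paper's exact PAS formula may differ."""
    residual = actual - predicted
    relative = residual / predicted
    med = np.median(residual)
    mad = np.median(np.abs(residual - med)) + 1e-9
    z = (residual - med) / (1.4826 * mad)
    return relative * np.abs(z)

rng = np.random.default_rng(0)
predicted = rng.uniform(2000, 6000, size=1000)          # model-predicted rents
actual = predicted * rng.normal(1.0, 0.08, size=1000)   # observed rents
actual[:10] *= 1.6                                      # inject overpriced listings

df = pd.DataFrame({"actual": actual, "predicted": predicted})
df["pas"] = price_anomaly_scores(df["actual"].to_numpy(), df["predicted"].to_numpy())

upper, lower = df["pas"].quantile(0.99), df["pas"].quantile(0.01)
print("flagged overpriced:", int((df["pas"] > upper).sum()),
      "flagged underpriced:", int((df["pas"] < lower).sum()))
```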