cs.LG - 2023-09-13

Tackling the dimensions in imaging genetics with CLUB-PLS

  • paper_url: http://arxiv.org/abs/2309.07352
  • repo_url: None
  • paper_authors: Andre Altmann, Ana C Lawry Aguila, Neda Jahanshad, Paul M Thompson, Marco Lorenzi
  • for: Linking high-dimensional data across two domains, e.g., genetic data and brain imaging data, to explore the relationship between them.
  • methods: A Partial Least Squares (PLS)-based framework, termed Cluster-Bootstrap PLS (CLUB-PLS), that handles large input dimensions in both domains as well as large sample sizes. The framework uses the cluster bootstrap to provide robust statistics for single input features in both domains.
  • results: In a study of surface area and cortical thickness in 33,000 UK Biobank subjects, the method found 107 genome-wide significant locus-phenotype pairs linked to 386 different genes. Most of these loci could be technically validated: using classic GWAS or Genome-Wide Inferred Statistics (GWIS), 85 locus-phenotype pairs exceeded the genome-wide suggestive (P<1e-05) threshold.
    Abstract A major challenge in imaging genetics and similar fields is to link high-dimensional data in one domain, e.g., genetic data, to high-dimensional data in a second domain, e.g., brain imaging data. The standard approach in the area is mass univariate analysis across genetic factors and imaging phenotypes, which entails executing one genome-wide association study (GWAS) for each pre-defined imaging measure. Although this approach has been tremendously successful, one shortcoming is that phenotypes must be pre-defined. Consequently, effects that are not confined to pre-selected regions of interest or that reflect larger brain-wide patterns can easily be missed. In this work we introduce a Partial Least Squares (PLS)-based framework, which we term Cluster-Bootstrap PLS (CLUB-PLS), that can work with large input dimensions in both domains as well as with large sample sizes. One key factor of the framework is to use the cluster bootstrap to provide robust statistics for single input features in both domains. We applied CLUB-PLS to investigate the genetic basis of surface area and cortical thickness in a sample of 33,000 subjects from the UK Biobank. We found 107 genome-wide significant locus-phenotype pairs that are linked to 386 different genes. A vast majority of these loci could be technically validated at a high rate: using classic GWAS or Genome-Wide Inferred Statistics (GWIS) we found that 85 locus-phenotype pairs exceeded the genome-wide suggestive (P<1e-05) threshold.
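No reference implementation is linked (repo_url: None). As a rough, illustrative sketch of the cluster-bootstrap-around-PLS idea, assuming scikit-learn's PLSRegression and a simple feature-stability criterion that stands in for the paper's per-feature statistics:

```python
# Minimal sketch of a cluster bootstrap around PLS (illustrative only; the
# actual CLUB-PLS procedure and its statistics are defined in the paper).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 100))          # e.g., genetic features
Y = rng.normal(size=(500, 50))           # e.g., imaging phenotypes

# Group samples into clusters, then bootstrap whole clusters at a time.
clusters = KMeans(n_clusters=20, n_init=10, random_state=0).fit_predict(X)

n_boot, loadings = 100, []
for b in range(n_boot):
    picked = rng.choice(np.unique(clusters), size=20, replace=True)
    idx = np.concatenate([np.where(clusters == c)[0] for c in picked])
    pls = PLSRegression(n_components=2).fit(X[idx], Y[idx])
    loadings.append(pls.x_weights_[:, 0])

loadings = np.abs(np.array(loadings))    # abs() handles PLS sign ambiguity
# Call a feature "stable" if its loading is consistently large across
# bootstraps (a crude proxy for the paper's per-feature robust statistics).
stable = (loadings > np.quantile(loadings, 0.95)).mean(axis=0) > 0.5
print("stable features:", np.where(stable)[0])
```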

Efficient Learning of PDEs via Taylor Expansion and Sparse Decomposition into Value and Fourier Domains

  • paper_url: http://arxiv.org/abs/2309.07344
  • repo_url: None
  • paper_authors: Md Nasim, Yexiang Xue
  • for: Accelerating the learning of Partial Differential Equations (PDEs) from experimental data to speed up scientific discovery.
  • methods: Reel accelerates PDE learning via random projection, with much broader applicability than prior randomized algorithms: dense updates are decomposed into sparse components in both the value and Fourier domains, and Taylor series expansion approximates nonlinear PDE updates in decomposable polynomial form.
  • results: Experiments show that the proposed Reel reduces training time by 70-98% when the data is compressed to 1% of its original size, with quality comparable to the non-compressed models.
    Abstract Accelerating the learning of Partial Differential Equations (PDEs) from experimental data will speed up the pace of scientific discovery. Previous randomized algorithms exploit sparsity in PDE updates for acceleration. However such methods are applicable to a limited class of decomposable PDEs, which have sparse features in the value domain. We propose Reel, which accelerates the learning of PDEs via random projection and has much broader applicability. Reel exploits the sparsity by decomposing dense updates into sparse ones in both the value and frequency domains. This decomposition enables efficient learning when the source of the updates consists of gradually changing terms across large areas (sparse in the frequency domain) in addition to a few rapid updates concentrated in a small set of "interfacial" regions (sparse in the value domain). Random projection is then applied to compress the sparse signals for learning. To expand the model applicability, Taylor series expansion is used in Reel to approximate the nonlinear PDE updates with polynomials in the decomposable form. Theoretically, we derive a constant factor approximation between the projected loss function and the original one with poly-logarithmic number of projected dimensions. Experimentally, we provide empirical evidence that our proposed Reel can lead to faster learning of PDE models (70-98% reduction in training time when the data is compressed to 1% of its original size) with comparable quality as the non-compressed models.
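As a toy illustration of just the compression step, assuming a dense Gaussian Johnson-Lindenstrauss projection (Reel's full pipeline also includes the Taylor expansion and the joint value/Fourier decomposition):

```python
# Toy illustration of compressing a sparse PDE-update signal with a dense
# Gaussian random projection (Reel additionally uses Taylor expansion and a
# joint value/Fourier-domain decomposition, omitted here).
import numpy as np

rng = np.random.default_rng(1)
n, k = 5000, 10                          # ambient dimension, sparsity level

# A sparse update: a few large entries, as in "interfacial" regions.
u = np.zeros(n)
u[rng.choice(n, size=k, replace=False)] = rng.normal(size=k)

# Project to m ~ O(k log n) dimensions; Johnson-Lindenstrauss-type arguments
# preserve norms and inner products up to small distortion.
m = int(8 * k * np.log(n))
P = rng.normal(size=(m, n)) / np.sqrt(m)
u_proj = P @ u

print(f"dim {n} -> {m}")
print("norm before/after:", np.linalg.norm(u), np.linalg.norm(u_proj))
```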

User Training with Error Augmentation for Electromyogram-based Gesture Classification

  • paper_url: http://arxiv.org/abs/2309.07289
  • repo_url: None
  • paper_authors: Yunus Bicer, Niklas Smedemark-Margulies, Basak Celik, Elifnur Sunger, Ryan Orendorff, Stephanie Naufel, Tales Imbiriba, Deniz Erdoğmuş, Eugene Tunik, Mathew Yarossi
  • for: To design a real-time control system for a user interface based on surface electromyographic (sEMG) activity, driven by hand-gesture recognition.
  • methods: sEMG data are streamed into a machine-learning algorithm that classifies hand gestures in real time. During a human-learning stage, participants received one of three types of feedback: veridical feedback, modified feedback (with a hidden augmentation of error), or no feedback.
  • results: Relative to baseline, the modified feedback condition led to significantly improved accuracy and better gesture class separation, suggesting that real-time feedback with manipulated predictions in a gamified user interface can enable intuitive, rapid, and accurate task acquisition for sEMG-based gesture recognition.
    Abstract We designed and tested a system for real-time control of a user interface by extracting surface electromyographic (sEMG) activity from eight electrodes in a wrist-band configuration. sEMG data were streamed into a machine-learning algorithm that classified hand gestures in real-time. After an initial model calibration, participants were presented with one of three types of feedback during a human-learning stage: veridical feedback, in which predicted probabilities from the gesture classification algorithm were displayed without alteration, modified feedback, in which we applied a hidden augmentation of error to these probabilities, and no feedback. User performance was then evaluated in a series of minigames, in which subjects were required to use eight gestures to manipulate their game avatar to complete a task. Experimental results indicated that, relative to baseline, the modified feedback condition led to significantly improved accuracy and improved gesture class separation. These findings suggest that real-time feedback in a gamified user interface with manipulation of feedback may enable intuitive, rapid, and accurate task acquisition for sEMG-based gesture recognition applications.
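The exact form of the hidden error augmentation is not given in the abstract; a minimal sketch of one plausible manipulation of the displayed class probabilities (the function name and the scaling scheme are assumptions for illustration):

```python
# Sketch of "modified feedback": display gesture-class probabilities with a
# hidden error augmentation (the exact manipulation in the paper may differ).
import numpy as np

def augment_error(probs, target, alpha=0.3):
    """Shift probability mass away from the target class by a factor alpha,
    renormalising so the displayed vector is still a distribution."""
    shown = probs.copy()
    shown[target] *= (1.0 - alpha)       # make the user appear less accurate
    return shown / shown.sum()

p = np.array([0.1, 0.7, 0.1, 0.1])       # classifier output, target class 1
print(augment_error(p, target=1))        # feedback shown to the participant
```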

Simultaneous inference for generalized linear models with unmeasured confounders

  • paper_url: http://arxiv.org/abs/2309.07261
  • repo_url: None
  • paper_authors: Jin-Hong Du, Larry Wasserman, Kathryn Roeder
  • for: This paper is written for researchers and practitioners in the field of genomic studies, particularly those interested in large-scale hypothesis testing and confounding effect adjustment.
  • methods: The paper proposes a unified statistical estimation and inference framework for multivariate generalized linear models in the presence of confounding effects. The method leverages orthogonal structures and integrates linear projections into three key stages: separating marginal and uncorrelated confounding effects, jointly estimating latent factors and primary effects, and incorporating projected and weighted bias-correction steps for hypothesis testing.
  • results: The paper establishes various identification conditions and non-asymptotic error bounds, and shows effective Type-I error control of asymptotic $z$-tests. Numerical experiments demonstrate that the proposed method controls the false discovery rate and is more powerful than alternative methods. The paper also demonstrates the suitability of adjusting confounding effects when significant covariates are absent from the model using single-cell RNA-seq counts from two groups of samples.
    Abstract Tens of thousands of simultaneous hypothesis tests are routinely performed in genomic studies to identify differentially expressed genes. However, due to unmeasured confounders, many standard statistical approaches may be substantially biased. This paper investigates the large-scale hypothesis testing problem for multivariate generalized linear models in the presence of confounding effects. Under arbitrary confounding mechanisms, we propose a unified statistical estimation and inference framework that harnesses orthogonal structures and integrates linear projections into three key stages. It first leverages multivariate responses to separate marginal and uncorrelated confounding effects, recovering the confounding coefficients' column space. Subsequently, latent factors and primary effects are jointly estimated, utilizing $\ell_1$-regularization for sparsity while imposing orthogonality onto confounding coefficients. Finally, we incorporate projected and weighted bias-correction steps for hypothesis testing. Theoretically, we establish various effects' identification conditions and non-asymptotic error bounds. We show effective Type-I error control of asymptotic $z$-tests as sample and response sizes approach infinity. Numerical experiments demonstrate that the proposed method controls the false discovery rate by the Benjamini-Hochberg procedure and is more powerful than alternative methods. By comparing single-cell RNA-seq counts from two groups of samples, we demonstrate the suitability of adjusting confounding effects when significant covariates are absent from the model.

All you need is spin: SU(2) equivariant variational quantum circuits based on spin networks

  • paper_url: http://arxiv.org/abs/2309.07250
  • repo_url: None
  • paper_authors: Richard D. P. East, Guillermo Alonso-Linaje, Chae-Yeun Park
  • for: To construct SU(2) equivariant variational quantum circuit ansätze from spin networks, so that variational quantum algorithms run efficiently by encoding problem symmetries as an inductive bias.
  • methods: Spin networks, a form of directed tensor network invariant under group transformations, are used to build the ansätze; the construction is proved mathematically equivalent to other known constructions such as those based on twirling and generalized permutations, while being more direct to implement on quantum hardware.
  • results: Experiments on the ground state problem of SU(2) symmetric Heisenberg models show that the equivariant circuits boost the performance of quantum variational algorithms, indicating broader applicability to other real-world problems.
    Abstract Variational algorithms require architectures that naturally constrain the optimisation space to run efficiently. In geometric quantum machine learning, one achieves this by encoding group structure into parameterised quantum circuits to include the symmetries of a problem as an inductive bias. However, constructing such circuits is challenging as a concrete guiding principle has yet to emerge. In this paper, we propose the use of spin networks, a form of directed tensor network invariant under a group transformation, to devise SU(2) equivariant quantum circuit ansätze -- circuits possessing spin rotation symmetry. By changing to the basis that block diagonalises SU(2) group action, these networks provide a natural building block for constructing parameterised equivariant quantum circuits. We prove that our construction is mathematically equivalent to other known constructions, such as those based on twirling and generalised permutations, but more direct to implement on quantum hardware. The efficacy of our constructed circuits is tested by solving the ground state problem of SU(2) symmetric Heisenberg models on the one-dimensional triangular lattice and on the Kagome lattice. Our results highlight that our equivariant circuits boost the performance of quantum variational algorithms, indicating broader applicability to other real-world problems.

EarthPT: a foundation model for Earth Observation

  • paper_url: http://arxiv.org/abs/2309.07207
  • repo_url: None
  • paper_authors: Michael J. Smith, Luke Fleming, James E. Geach
  • for: Introduces EarthPT, an Earth Observation (EO) pretrained transformer for forecasting future pixel-level surface reflectances.
  • methods: A 700 million parameter decoding transformer foundation model, trained in an autoregressive self-supervised manner and developed specifically with EO use cases in mind.
  • results: EarthPT is an effective forecaster that accurately predicts future surface reflectances, including the evolution of the Normalised Difference Vegetation Index (NDVI), with a typical pixel-level error of approximately 0.05 (over a natural range of -1 to 1) across a five-month test horizon, outperforming simple phase-folded models based on historical averaging. The embeddings learnt by EarthPT hold semantically meaningful information and could be exploited for downstream tasks such as highly granular, dynamic land use classification.
    Abstract We introduce EarthPT -- an Earth Observation (EO) pretrained transformer. EarthPT is a 700 million parameter decoding transformer foundation model trained in an autoregressive self-supervised manner and developed specifically with EO use-cases in mind. We demonstrate that EarthPT is an effective forecaster that can accurately predict future pixel-level surface reflectances across the 400-2300 nm range well into the future. For example, forecasts of the evolution of the Normalised Difference Vegetation Index (NDVI) have a typical error of approximately 0.05 (over a natural range of -1 to 1) at the pixel level over a five month test set horizon, out-performing simple phase-folded models based on historical averaging. We also demonstrate that embeddings learnt by EarthPT hold semantically meaningful information and could be exploited for downstream tasks such as highly granular, dynamic land use classification. Excitingly, we note that the abundance of EO data provides us with -- in theory -- quadrillions of training tokens. Therefore, if we assume that EarthPT follows neural scaling laws akin to those derived for Large Language Models (LLMs), there is currently no data-imposed limit to scaling EarthPT and other similar 'Large Observation Models'.

Data Augmentation via Subgroup Mixup for Improving Fairness

  • paper_url: http://arxiv.org/abs/2309.07110
  • repo_url: None
  • paper_authors: Madeline Navarro, Camille Little, Genevera I. Allen, Santiago Segarra
  • for: To improve group fairness, since many real-world machine learning systems exhibit biases across certain groups due to under-representation or training data that reflects societal biases.
  • methods: A pairwise mixup data augmentation scheme across subgroups that adds new samples of under-represented groups to balance subpopulations and encourages fair and accurate decision boundaries for all subgroups.
  • results: Comparisons against existing data augmentation and bias mitigation approaches on both synthetic simulations and real-world benchmark fair classification data show that the method achieves fair outcomes with robust, if not improved, accuracy.
    Abstract In this work, we propose data augmentation via pairwise mixup across subgroups to improve group fairness. Many real-world applications of machine learning systems exhibit biases across certain groups due to under-representation or training data that reflects societal biases. Inspired by the successes of mixup for improving classification performance, we develop a pairwise mixup scheme to augment training data and encourage fair and accurate decision boundaries for all subgroups. Data augmentation for group fairness allows us to add new samples of underrepresented groups to balance subpopulations. Furthermore, our method allows us to use the generalization ability of mixup to improve both fairness and accuracy. We compare our proposed mixup to existing data augmentation and bias mitigation approaches on both synthetic simulations and real-world benchmark fair classification data, demonstrating that we are able to achieve fair outcomes with robust if not improved accuracy.
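A minimal sketch of pairwise mixup across subgroups, assuming uniform random pairing and a Beta mixing distribution (the paper defines its own pairing scheme):

```python
# Sketch of pairwise mixup across subgroups: mix samples from an
# under-represented group with samples from other groups to balance
# subpopulations (pairing strategy and Beta parameter are illustrative).
import numpy as np

rng = np.random.default_rng(0)

def subgroup_mixup(X, y, groups, minority, n_new, alpha=0.4):
    minor = np.where(groups == minority)[0]
    other = np.where(groups != minority)[0]
    i = rng.choice(minor, n_new)
    j = rng.choice(other, n_new)
    lam = rng.beta(alpha, alpha, size=(n_new, 1))
    X_new = lam * X[i] + (1 - lam) * X[j]
    y_new = lam[:, 0] * y[i] + (1 - lam[:, 0]) * y[j]   # soft labels
    return X_new, y_new

X = rng.normal(size=(200, 5))
y = rng.integers(0, 2, 200).astype(float)
groups = rng.choice([0, 1], 200, p=[0.9, 0.1])   # group 1 under-represented
X_aug, y_aug = subgroup_mixup(X, y, groups, minority=1, n_new=100)
print(X_aug.shape, y_aug.shape)
```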

The Boundaries of Verifiable Accuracy, Robustness, and Generalisation in Deep Learning

  • paper_url: http://arxiv.org/abs/2309.07072
  • repo_url: None
  • paper_authors: Alexander Bastounis, Alexander N. Gorban, Anders C. Hansen, Desmond J. Higham, Danil Prokhorov, Oliver Sutton, Ivan Y. Tyukin, Qinghua Zhou
  • for: To assess the theoretical limitations of determining guaranteed stability and accuracy of neural networks in classification tasks.
  • methods: Considers the classical distribution-agnostic framework and algorithms minimising empirical risk, potentially subject to weight regularisation.
  • results: There is a large family of tasks for which computing and verifying ideal stable and accurate neural networks is extremely challenging, if at all possible, even when such ideal solutions exist within the given class of neural architectures.
    Abstract In this work, we assess the theoretical limitations of determining guaranteed stability and accuracy of neural networks in classification tasks. We consider classical distribution-agnostic framework and algorithms minimising empirical risks and potentially subjected to some weights regularisation. We show that there is a large family of tasks for which computing and verifying ideal stable and accurate neural networks in the above settings is extremely challenging, if at all possible, even when such ideal solutions exist within the given class of neural architectures.

An Extreme Learning Machine-Based Method for Computational PDEs in Higher Dimensions

  • paper_url: http://arxiv.org/abs/2309.07049
  • repo_url: None
  • paper_authors: Yiran Wang, Suchuan Dong
  • for: Solving high-dimensional partial differential equation (PDE) problems.
  • methods: Two methods based on randomized neural networks: an extension of the extreme learning machine (ELM) approach to high dimensions, and a reformulation based on an Approximate variant of the Theory of Functional Connections (A-TFC) that avoids the exponential growth in the number of terms as the dimension increases.
  • results: The methods produce accurate solutions to high-dimensional PDE problems, with errors reaching levels not far from machine accuracy for relatively lower dimensions, and are both more cost-effective and more accurate than the physics-informed neural network (PINN) method.
    Abstract We present two effective methods for solving high-dimensional partial differential equations (PDE) based on randomized neural networks. Motivated by the universal approximation property of this type of networks, both methods extend the extreme learning machine (ELM) approach from low to high dimensions. With the first method the unknown solution field in $d$ dimensions is represented by a randomized feed-forward neural network, in which the hidden-layer parameters are randomly assigned and fixed while the output-layer parameters are trained. The PDE and the boundary/initial conditions, as well as the continuity conditions (for the local variant of the method), are enforced on a set of random interior/boundary collocation points. The resultant linear or nonlinear algebraic system, through its least squares solution, provides the trained values for the network parameters. With the second method the high-dimensional PDE problem is reformulated through a constrained expression based on an Approximate variant of the Theory of Functional Connections (A-TFC), which avoids the exponential growth in the number of terms of TFC as the dimension increases. The free field function in the A-TFC constrained expression is represented by a randomized neural network and is trained by a procedure analogous to the first method. We present ample numerical simulations for a number of high-dimensional linear/nonlinear stationary/dynamic PDEs to demonstrate their performance. These methods can produce accurate solutions to high-dimensional PDEs, in particular with their errors reaching levels not far from the machine accuracy for relatively lower dimensions. Compared with the physics-informed neural network (PINN) method, the current method is both cost-effective and more accurate for high-dimensional PDEs.
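To make the ELM idea concrete in the simplest possible setting, the sketch below solves a 1D Poisson problem with fixed random hidden-layer parameters and a single linear least-squares solve for the output weights; the paper's high-dimensional treatment and the A-TFC reformulation are omitted:

```python
# ELM-style solver for a toy 1D Poisson problem u''(x) = f(x), u(0)=u(1)=0,
# with exact solution u(x) = sin(pi x). Hidden-layer weights are random and
# fixed; only the linear output layer is solved for, via least squares.
import numpy as np

rng = np.random.default_rng(0)
M = 200                                   # number of random hidden neurons
w = rng.uniform(-10, 10, M)               # fixed random hidden weights
b = rng.uniform(-10, 10, M)               # fixed random hidden biases

def features(x):
    # phi_j(x) = tanh(w_j x + b_j); second derivative uses tanh'' = -2t(1-t^2)
    t = np.tanh(np.outer(x, w) + b)
    return t, (w**2) * (-2 * t * (1 - t**2))

x_in = np.linspace(0, 1, 100)             # interior collocation points
x_bc = np.array([0.0, 1.0])               # boundary points
f = -(np.pi**2) * np.sin(np.pi * x_in)    # right-hand side

phi_in, d2phi_in = features(x_in)
phi_bc, _ = features(x_bc)

# Stack PDE rows (u'' = f) and boundary rows (u = 0); solve for output weights.
A = np.vstack([d2phi_in, phi_bc])
rhs = np.concatenate([f, np.zeros(2)])
beta, *_ = np.linalg.lstsq(A, rhs, rcond=None)

u_hat = phi_in @ beta
print("max error:", np.abs(u_hat - np.sin(np.pi * x_in)).max())
```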

Optimal transport distances for directed, weighted graphs: a case study with cell-cell communication networks

  • paper_url: http://arxiv.org/abs/2309.07030
  • repo_url: None
  • paper_authors: James S. Nagai, Ivan G. Costa, Michael T. Schaub
  • for: comparing directed graphs using optimal transport distances
  • methods: proposes two distance measures based on variants of optimal transport (Wasserstein and Gromov-Wasserstein)
  • results: evaluates the performance of the two distance measures on simulated graph data and real-world directed cell-cell communication graphs.
    Abstract Comparing graphs by means of optimal transport has recently gained significant attention, as the distances induced by optimal transport provide both a principled metric between graphs as well as an interpretable description of the associated changes between graphs in terms of a transport plan. As the lack of symmetry introduces challenges in the typically considered formulations, optimal transport distances for graphs have mostly been developed for undirected graphs. Here, we propose two distance measures to compare directed graphs based on variants of optimal transport: (i) an earth movers distance (Wasserstein) and (ii) a Gromov-Wasserstein (GW) distance. We evaluate these two distances and discuss their relative performance for both simulated graph data and real-world directed cell-cell communication graphs, inferred from single-cell RNA-seq data.
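A minimal sketch of the two distance flavours on small directed graphs, assuming the POT library (pip install pot) and a simple choice of node features and structure matrices that need not match the paper's construction:

```python
# Sketch: Wasserstein and Gromov-Wasserstein style distances between two
# small directed graphs with POT. Using in/out-degrees as node features and
# adjacency matrices as structures is an illustrative choice, not the paper's.
import numpy as np
import ot

rng = np.random.default_rng(0)
A1 = (rng.random((6, 6)) < 0.3).astype(float)   # directed adjacency, graph 1
A2 = (rng.random((8, 8)) < 0.3).astype(float)   # directed adjacency, graph 2

p = np.full(6, 1 / 6)                           # uniform node distributions
q = np.full(8, 1 / 8)

# (i) Earth mover's (Wasserstein) distance between node feature sets,
# here 2-D features built from in- and out-degrees.
F1 = np.stack([A1.sum(0), A1.sum(1)], axis=1)
F2 = np.stack([A2.sum(0), A2.sum(1)], axis=1)
M = ot.dist(F1, F2)                             # pairwise cost matrix
w_dist = ot.emd2(p, q, M)

# (ii) Gromov-Wasserstein distance comparing the (asymmetric) structures.
gw_dist = ot.gromov.gromov_wasserstein2(A1, A2, p, q, loss_fun='square_loss')

print("Wasserstein:", w_dist, "Gromov-Wasserstein:", gw_dist)
```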

Mitigating Adversarial Attacks in Federated Learning with Trusted Execution Environments

  • paper_url: http://arxiv.org/abs/2309.07197
  • repo_url: https://github.com/queyrusi/pelta
  • paper_authors: Simon Queyrut, Valerio Schiavoni, Pascal Felber
  • for: To prevent compromised nodes in federated learning (FL) from probing local model copies to craft adversarial examples, thereby protecting the FL scheme and its users.
  • methods: Pelta, a shielding mechanism that leverages Trusted Execution Environments (TEEs) to mask, inside the TEE, the first part of the back-propagation chain rule typically exploited by white-box attacks to craft malicious samples.
  • results: Evaluated on three well-established datasets (CIFAR-10, CIFAR-100, and ImageNet), Pelta mitigates six state-of-the-art white-box attacks, including Projected Gradient Descent, the Momentum Iterative Method, Auto Projected Gradient Descent, and the Carlini & Wagner attack.
    Abstract The main premise of federated learning (FL) is that machine learning model updates are computed locally to preserve user data privacy. This approach avoids by design user data to ever leave the perimeter of their device. Once the updates aggregated, the model is broadcast to all nodes in the federation. However, without proper defenses, compromised nodes can probe the model inside their local memory in search for adversarial examples, which can lead to dangerous real-world scenarios. For instance, in image-based applications, adversarial examples consist of images slightly perturbed to the human eye getting misclassified by the local model. These adversarial images are then later presented to a victim node's counterpart model to replay the attack. Typical examples harness dissemination strategies such as altered traffic signs (patch attacks) no longer recognized by autonomous vehicles or seemingly unaltered samples that poison the local dataset of the FL scheme to undermine its robustness. Pelta is a novel shielding mechanism leveraging Trusted Execution Environments (TEEs) that reduce the ability of attackers to craft adversarial samples. Pelta masks inside the TEE the first part of the back-propagation chain rule, typically exploited by attackers to craft the malicious samples. We evaluate Pelta on state-of-the-art accurate models using three well-established datasets: CIFAR-10, CIFAR-100 and ImageNet. We show the effectiveness of Pelta in mitigating six white-box state-of-the-art adversarial attacks, such as Projected Gradient Descent, Momentum Iterative Method, Auto Projected Gradient Descent, the Carlini & Wagner attack. In particular, Pelta constitutes the first attempt at defending an ensemble model against the Self-Attention Gradient attack to the best of our knowledge. Our code is available to the research community at https://github.com/queyrusi/Pelta.

Open-vocabulary Keyword-spotting with Adaptive Instance Normalization

  • paper_url: http://arxiv.org/abs/2309.08561
  • repo_url: None
  • paper_authors: Aviv Navon, Aviv Shamsian, Neta Glazer, Gill Hetz, Joseph Keshet
  • for: Open-vocabulary keyword spotting in automatic speech recognition (ASR): detecting user-defined keywords within a spoken utterance.
  • methods: AdaKWS trains a text encoder to output keyword-conditioned normalization parameters, which are then used to process the auditory input.
  • results: Extensive evaluation on challenging and diverse multi-lingual benchmarks shows significant improvements over recent keyword spotting and ASR baselines, with substantial performance gains on low-resource languages unseen during training.
    Abstract Open vocabulary keyword spotting is a crucial and challenging task in automatic speech recognition (ASR) that focuses on detecting user-defined keywords within a spoken utterance. Keyword spotting methods commonly map the audio utterance and keyword into a joint embedding space to obtain some affinity score. In this work, we propose AdaKWS, a novel method for keyword spotting in which a text encoder is trained to output keyword-conditioned normalization parameters. These parameters are used to process the auditory input. We provide an extensive evaluation using challenging and diverse multi-lingual benchmarks and show significant improvements over recent keyword spotting and ASR baselines. Furthermore, we study the effectiveness of our approach on low-resource languages that were unseen during the training. The results demonstrate a substantial performance improvement compared to baseline methods.
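A sketch of keyword-conditioned normalization in PyTorch, assuming an AdaIN-style layer where a plain linear map stands in for the paper's text encoder (all architecture details are illustrative):

```python
# Sketch of keyword-conditioned adaptive instance normalization: a text
# encoder predicts per-channel (gamma, beta) from the keyword embedding,
# which modulate instance-normalized acoustic features.
import torch
import torch.nn as nn

class KeywordAdaIN(nn.Module):
    def __init__(self, text_dim, channels):
        super().__init__()
        self.to_params = nn.Linear(text_dim, 2 * channels)  # -> gamma, beta
        self.norm = nn.InstanceNorm1d(channels, affine=False)

    def forward(self, audio_feats, keyword_emb):
        # audio_feats: (batch, channels, time); keyword_emb: (batch, text_dim)
        gamma, beta = self.to_params(keyword_emb).chunk(2, dim=-1)
        x = self.norm(audio_feats)
        return gamma.unsqueeze(-1) * x + beta.unsqueeze(-1)

layer = KeywordAdaIN(text_dim=128, channels=64)
audio = torch.randn(4, 64, 200)       # e.g., 200 acoustic frames
keyword = torch.randn(4, 128)         # embedding of the user-defined keyword
print(layer(audio, keyword).shape)    # torch.Size([4, 64, 200])
```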

Effect of hyperparameters on variable selection in random forests

  • paper_url: http://arxiv.org/abs/2309.06943
  • repo_url: https://github.com/imbs-hl/rf-hyperparameters-and-variable-selection
  • paper_authors: Cesaire J. K. Fouodo, Lea L. Kronziel, Inke R. König, Silke Szymczak
  • for: To investigate how the hyperparameters of the random forest (RF) algorithm affect prediction models and, in particular, RF-based variable selection.
  • methods: Two simulation studies, using theoretical distributions and empirical gene expression data, evaluate the effect of RF hyperparameters on the Vita and Boruta variable selection procedures.
  • results: The proportion of splitting candidate variables (mtry.prop) and the sample fraction (sample.fraction) influence the selection procedures more than the other hyperparameters; suitable settings depend on the correlation structure in the data and on whether the aim is prediction performance or variable selection.
    Abstract Random forests (RFs) are well suited for prediction modeling and variable selection in high-dimensional omics studies. The effect of hyperparameters of the RF algorithm on prediction performance and variable importance estimation have previously been investigated. However, how hyperparameters impact RF-based variable selection remains unclear. We evaluate the effects on the Vita and the Boruta variable selection procedures based on two simulation studies utilizing theoretical distributions and empirical gene expression data. We assess the ability of the procedures to select important variables (sensitivity) while controlling the false discovery rate (FDR). Our results show that the proportion of splitting candidate variables (mtry.prop) and the sample fraction (sample.fraction) for the training dataset influence the selection procedures more than the drawing strategy of the training datasets and the minimal terminal node size. A suitable setting of the RF hyperparameters depends on the correlation structure in the data. For weakly correlated predictor variables, the default value of mtry is optimal, but smaller values of sample.fraction result in larger sensitivity. In contrast, the difference in sensitivity of the optimal compared to the default value of sample.fraction is negligible for strongly correlated predictor variables, whereas smaller values than the default are better in the other settings. In conclusion, the default values of the hyperparameters will not always be suitable for identifying important variables. Thus, adequate values differ depending on whether the aim of the study is optimizing prediction performance or variable selection.
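The study works in R (see the linked repo); as a rough scikit-learn analogue, mtry.prop maps to max_features and sample.fraction to max_samples, while the Vita/Boruta selection procedures are replaced here by plain impurity importances:

```python
# Sketch: sweep the two influential hyperparameters from the study in their
# approximate scikit-learn form (the paper itself uses ranger in R together
# with the Vita and Boruta variable selection procedures).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=200, n_informative=10,
                           random_state=0)

for mtry_prop in (0.1, 0.33, 1.0):
    for sample_fraction in (0.3, 0.632, 1.0):
        rf = RandomForestClassifier(
            n_estimators=500,
            max_features=max(1, int(mtry_prop * X.shape[1])),
            max_samples=None if sample_fraction == 1.0 else sample_fraction,
            bootstrap=True, random_state=0, n_jobs=-1,
        ).fit(X, y)
        top = rf.feature_importances_.argsort()[-10:]
        print(mtry_prop, sample_fraction, "top-10 features:", sorted(top))
```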

Modeling Dislocation Dynamics Data Using Semantic Web Technologies

  • paper_url: http://arxiv.org/abs/2309.06930
  • repo_url: None
  • paper_authors: Ahmad Zainul Ihsan, Said Fathalla, Stefan Sandfeld
  • for: Crystalline materials such as metals and semiconductors, widely studied in Materials Science and Engineering, typically contain a type of defect called a "dislocation" that significantly affects material properties including strength, fracture toughness, and ductility. This work models dislocation dynamics data using semantic web technologies.
  • methods: Data from dislocation dynamics simulations are annotated with ontologies: the existing Dislocation Ontology is extended with missing concepts and aligned with two domain-related ontologies (the Elementary Multi-perspective Material Ontology and the Materials Design Ontology) to represent dislocation simulation data efficiently.
  • results: A knowledge graph (DisLocKG) illustrates the relationships within discrete dislocation dynamics data, and a SPARQL endpoint brings extensive flexibility to querying DisLocKG.
    Abstract Research in the field of Materials Science and Engineering focuses on the design, synthesis, properties, and performance of materials. An important class of materials that is widely investigated are crystalline materials, including metals and semiconductors. Crystalline material typically contains a distinct type of defect called "dislocation". This defect significantly affects various material properties, including strength, fracture toughness, and ductility. Researchers have devoted a significant effort in recent years to understanding dislocation behavior through experimental characterization techniques and simulations, e.g., dislocation dynamics simulations. This paper presents how data from dislocation dynamics simulations can be modeled using semantic web technologies through annotating data with ontologies. We extend the already existing Dislocation Ontology by adding missing concepts and aligning it with two other domain-related ontologies (i.e., the Elementary Multi-perspective Material Ontology and the Materials Design Ontology) allowing for representing the dislocation simulation data efficiently. Moreover, we show a real-world use case by representing the discrete dislocation dynamics data as a knowledge graph (DisLocKG) that illustrates the relationship between them. We also developed a SPARQL endpoint that brings extensive flexibility to query DisLocKG.

Investigating the Impact of Action Representations in Policy Gradient Algorithms

  • paper_url: http://arxiv.org/abs/2309.06921
  • repo_url: None
  • paper_authors: Jan Schneider, Pierre Schumacher, Daniel Häufle, Bernhard Schölkopf, Dieter Büchler
  • for: investigate the impact of action representations on the learning performance of reinforcement learning algorithms
  • methods: use different analysis techniques to assess the effectiveness of action representations in RL
  • results: the action representation can significantly influence the learning performance on popular RL benchmark tasks, and some of the performance differences can be attributed to changes in the complexity of the optimization landscape.
    Abstract Reinforcement learning (RL) is a versatile framework for learning to solve complex real-world tasks. However, influences on the learning performance of RL algorithms are often poorly understood in practice. We discuss different analysis techniques and assess their effectiveness for investigating the impact of action representations in RL. Our experiments demonstrate that the action representation can significantly influence the learning performance on popular RL benchmark tasks. The analysis results indicate that some of the performance differences can be attributed to changes in the complexity of the optimization landscape. Finally, we discuss open challenges of analysis techniques for RL algorithms.

Domain-Aware Augmentations for Unsupervised Online General Continual Learning

  • paper_url: http://arxiv.org/abs/2309.06896
  • repo_url: None
  • paper_authors: Nicolas Michel, Romain Negrel, Giovanni Chierchia, Jean-François Bercher
  • for: To improve learning in Unsupervised Online General Continual Learning (UOGCL), where the agent has no prior knowledge of class boundaries or task-change information.
  • methods: A novel approach that enhances memory usage for contrastive learning by defining and using stream-dependent data augmentations together with some implementation tricks.
  • results: The method is simple yet effective, achieves state-of-the-art results compared with other unsupervised approaches in all considered settings, and reduces the gap between supervised and unsupervised continual learning. The domain-aware augmentation procedure can be adapted to other replay-based methods, making it a promising strategy for continual learning.
    Abstract Continual Learning has been challenging, especially when dealing with unsupervised scenarios such as Unsupervised Online General Continual Learning (UOGCL), where the learning agent has no prior knowledge of class boundaries or task change information. While previous research has focused on reducing forgetting in supervised setups, recent studies have shown that self-supervised learners are more resilient to forgetting. This paper proposes a novel approach that enhances memory usage for contrastive learning in UOGCL by defining and using stream-dependent data augmentations together with some implementation tricks. Our proposed method is simple yet effective, achieves state-of-the-art results compared to other unsupervised approaches in all considered setups, and reduces the gap between supervised and unsupervised continual learning. Our domain-aware augmentation procedure can be adapted to other replay-based methods, making it a promising strategy for continual learning.

A Robust SINDy Approach by Combining Neural Networks and an Integral Form

  • paper_url: http://arxiv.org/abs/2309.07193
  • repo_url: None
  • paper_authors: Ali Forootani, Pawan Goyal, Peter Benner
  • for: Discovering governing equations from noisy and scarce data, a long-standing challenge in data-driven modelling.
  • methods: A neural network learns an implicit representation of the measurement data so that it reproduces the output near the measurements and describes the time evolution of the output as a dynamical system, learned in the spirit of the SINDy framework. The derivative information required by SINDy is obtained from the implicit representation via automatic differentiation, and an integral condition on the network output further enhances robustness.
  • results: A robust method for discovering nonlinear governing equations from noisy and scarce data, extended to handle data collected from multiple initial conditions; several examples demonstrate its efficiency and favourable performance compared with existing methods.
    Abstract The discovery of governing equations from data has been an active field of research for decades. One widely used methodology for this purpose is sparse regression for nonlinear dynamics, known as SINDy. Despite several attempts, noisy and scarce data still pose a severe challenge to the success of the SINDy approach. In this work, we discuss a robust method to discover nonlinear governing equations from noisy and scarce data. To do this, we make use of neural networks to learn an implicit representation based on measurement data so that not only it produces the output in the vicinity of the measurements but also the time-evolution of output can be described by a dynamical system. Additionally, we learn such a dynamic system in the spirit of the SINDy framework. Leveraging the implicit representation using neural networks, we obtain the derivative information -- required for SINDy -- using an automatic differentiation tool. To enhance the robustness of our methodology, we further incorporate an integral condition on the output of the implicit networks. Furthermore, we extend our methodology to handle data collected from multiple initial conditions. We demonstrate the efficiency of the proposed methodology to discover governing equations under noisy and scarce data regimes by means of several examples and compare its performance with existing methods.
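A toy version of the pipeline on the 1-D system x' = -x: fit an implicit network to noisy data, differentiate it by autodiff, then least-squares against a polynomial library (the paper's integral condition and multi-trajectory handling are omitted, and the network size is an arbitrary choice):

```python
# Toy sketch: implicit neural representation + autodiff derivatives + a
# SINDy-style least-squares fit against a polynomial library.
import torch

torch.manual_seed(0)
t = torch.linspace(0, 5, 200).reshape(-1, 1)
x_noisy = torch.exp(-t) + 0.02 * torch.randn_like(t)       # noisy x(t)

net = torch.nn.Sequential(torch.nn.Linear(1, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(2000):                                       # denoising fit
    opt.zero_grad()
    loss = ((net(t) - x_noisy) ** 2).mean()
    loss.backward()
    opt.step()

t_req = t.clone().requires_grad_(True)
x_hat = net(t_req)
dxdt = torch.autograd.grad(x_hat.sum(), t_req)[0]           # autodiff derivative

# Candidate library [1, x, x^2, x^3]; least squares should recover x' = -x.
lib = torch.cat([torch.ones_like(x_hat), x_hat, x_hat**2, x_hat**3], dim=1)
coef = torch.linalg.lstsq(lib.detach(), dxdt.detach()).solution
print(coef.flatten())    # expect roughly [0, -1, 0, 0]
```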

The effect of data augmentation and 3D-CNN depth on Alzheimer’s Disease detection

  • paper_url: http://arxiv.org/abs/2309.07192
  • repo_url: https://github.com/rturrisige/AD_classification
  • paper_authors: Rosanna Turrisi, Alessandro Verri, Annalisa Barla
  • for: To establish machine learning (ML) as a reliable tool in clinical practice by strictly observing best practices for data handling, experimental design, and model evaluation, using Alzheimer's Disease (AD) detection as a paradigmatic example.
  • methods: Investigates the impact of different data augmentation techniques and model complexity on overall performance. MRI data from the ADNI dataset are used for a classification problem with 3D convolutional neural networks (CNNs); the experimental design employs cross-validation and multiple training trials to compensate for data scarcity and initial random parameters. Fifteen predictive models are trained, covering three data augmentation strategies (affine transformations such as zoom, shift, and rotation, applied concurrently or separately) and five 3D CNN architectures varying in the number of convolutional layers.
  • results: The combined effect of data augmentation strategy and model complexity varies prediction performance by up to 10% of accuracy; the best model (8 convolutional layers, strategy (B)) is the most stable across cross-validation folds and training trials, reaching excellent performance on both the testing set and an external test set.
    Abstract Machine Learning (ML) has emerged as a promising approach in healthcare, outperforming traditional statistical techniques. However, to establish ML as a reliable tool in clinical practice, adherence to best practices regarding data handling, experimental design, and model evaluation is crucial. This work summarizes and strictly observes such practices to ensure reproducible and reliable ML. Specifically, we focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of challenging problem in healthcare. We investigate the impact of different data augmentation techniques and model complexity on the overall performance. We consider MRI data from ADNI dataset to address a classification problem employing 3D Convolutional Neural Network (CNN). The experiments are designed to compensate for data scarcity and initial random parameters by utilizing cross-validation and multiple training trials. Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures, each varying in the number of convolutional layers. Specifically, the augmentation strategies are based on affine transformations, such as zoom, shift, and rotation, applied concurrently or separately. The combined effect of data augmentation and model complexity leads to a variation in prediction performance up to 10% of accuracy. When affine transformation are applied separately, the model is more accurate, independently from the adopted architecture. For all strategies, the model accuracy followed a concave behavior at increasing number of convolutional layers, peaking at an intermediate value of layers. The best model (8 CL, (B)) is the most stable across cross-validation folds and training trials, reaching excellent performance both on the testing set and on an external test set.

Dynamic control of self-assembly of quasicrystalline structures through reinforcement learning

  • paper_url: http://arxiv.org/abs/2309.06869
  • repo_url: None
  • paper_authors: Uyen Tu Lieu, Natsuhiko Yoshinaga
  • for: To control the dynamical self-assembly of the dodecagonal quasicrystal (DDQC) from patchy particles, whose steady-state structures are strongly influenced by the kinetic pathways of their formation.
  • methods: Q-learning is used to estimate the best temperature-control policy, which then guides the generation of DDQC.
  • results: The learned temperature schedule generates DDQC with few defects and reproduces the desired structure more efficiently than a conventional pre-fixed schedule such as annealing; reinforcement learning autonomously discovers the critical temperature at which structural fluctuations enhance the chance of forming a globally stable state.
    Abstract We propose reinforcement learning to control the dynamical self-assembly of the dodecagonal quasicrystal (DDQC) from patchy particles. The patchy particles have anisotropic interactions with other particles and form DDQC. However, their structures at steady states are significantly influenced by the kinetic pathways of their structural formation. We estimate the best policy of temperature control trained by the Q-learning method and demonstrate that we can generate DDQC with few defects using the estimated policy. The temperature schedule obtained by reinforcement learning can reproduce the desired structure more efficiently than the conventional pre-fixed temperature schedule, such as annealing. To clarify the success of the learning, we also analyse a simple model describing the kinetics of structural changes through the motion in a triple-well potential. We have found that reinforcement learning autonomously discovers the critical temperature at which structural fluctuations enhance the chance of forming a globally stable state. The estimated policy guides the system toward the critical temperature to assist the formation of DDQC.
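A generic tabular Q-learning skeleton for a temperature-control policy; the environment step and the reward scoring DDQC quality are placeholders for the paper's particle simulation:

```python
# Generic tabular Q-learning loop for temperature control (states, actions,
# and reward are placeholders; the paper couples this loop to self-assembly
# simulation rollouts of patchy particles).
import numpy as np

rng = np.random.default_rng(0)
n_states, n_temps = 10, 5                 # discretised structure states, temps
Q = np.zeros((n_states, n_temps))
alpha, gamma, eps = 0.1, 0.95, 0.1

def env_step(state, temp_action):
    """Placeholder: advance the self-assembly simulation one step at the
    chosen temperature; return (next_state, reward measuring DDQC quality)."""
    return rng.integers(n_states), rng.random()

state = 0
for step in range(1000):
    a = rng.integers(n_temps) if rng.random() < eps else int(Q[state].argmax())
    next_state, r = env_step(state, a)
    Q[state, a] += alpha * (r + gamma * Q[next_state].max() - Q[state, a])
    state = next_state

policy = Q.argmax(axis=1)                 # temperature to apply in each state
print("learned temperature schedule per state:", policy)
```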

Supervised Machine Learning and Physics based Machine Learning approach for prediction of peak temperature distribution in Additive Friction Stir Deposition of Aluminium Alloy

  • paper_url: http://arxiv.org/abs/2309.06838
  • repo_url: None
  • paper_authors: Akshansh Mishra
  • for: This paper aims to improve the understanding of the relationship between process parameters, thermal profiles, and microstructure in Additive Friction Stir Deposition (AFSD) for solid-state additive manufacturing.
  • methods: The paper combines supervised machine learning (SML) and physics-informed neural networks (PINNs) to predict peak temperature distribution in AFSD from process parameters.
  • results: The integrated ML approach classifies deposition quality from process factors with robust accuracy, providing comprehensive insights into tailoring microstructure through thermal management in AFSD.
    Abstract Additive friction stir deposition (AFSD) is a novel solid-state additive manufacturing technique that circumvents issues of porosity, cracking, and properties anisotropy that plague traditional powder bed fusion and directed energy deposition approaches. However, correlations between process parameters, thermal profiles, and resulting microstructure in AFSD remain poorly understood. This hinders process optimization for properties. This work employs a cutting-edge framework combining supervised machine learning (SML) and physics-informed neural networks (PINNs) to predict peak temperature distribution in AFSD from process parameters. Eight regression algorithms were implemented for SML modeling, while four PINNs leveraged governing equations for transport, wave propagation, heat transfer, and quantum mechanics. Across multiple statistical measures, ensemble techniques like gradient boosting proved superior for SML, with lowest MSE of 165.78. The integrated ML approach was also applied to classify deposition quality from process factors, with logistic regression delivering robust accuracy. By fusing data-driven learning and fundamental physics, this dual methodology provides comprehensive insights into tailoring microstructure through thermal management in AFSD. The work demonstrates the power of bridging statistical and physics-based modeling for elucidating AM process-property relationships.

Safe Reinforcement Learning with Dual Robustness

  • paper_url: http://arxiv.org/abs/2309.06835
  • repo_url: None
  • paper_authors: Zeyang Li, Chuxiong Hu, Yunan Wang, Yujie Yang, Shengbo Eben Li
  • for: To unify safe RL and robust RL in a systematic framework, so that task performance and safety are guaranteed simultaneously in worst-case scenarios.
  • methods: Built on constrained two-player zero-sum Markov games, a dual policy iteration scheme simultaneously optimizes a task policy and a safety policy, and the convergence of the iteration scheme is proved.
  • results: Evaluations on safety-critical benchmarks show that the proposed dually robust actor-critic (DRAC) algorithm achieves high performance and persistent safety under all scenarios (no adversary, safety adversary, performance adversary), significantly outperforming all baselines.
    Abstract Reinforcement learning (RL) agents are vulnerable to adversarial disturbances, which can deteriorate task performance or compromise safety specifications. Existing methods either address safety requirements under the assumption of no adversary (e.g., safe RL) or only focus on robustness against performance adversaries (e.g., robust RL). Learning one policy that is both safe and robust remains a challenging open problem. The difficulty is how to tackle two intertwined aspects in the worst cases: feasibility and optimality. Optimality is only valid inside a feasible region, while identification of maximal feasible region must rely on learning the optimal policy. To address this issue, we propose a systematic framework to unify safe RL and robust RL, including problem formulation, iteration scheme, convergence analysis and practical algorithm design. This unification is built upon constrained two-player zero-sum Markov games. A dual policy iteration scheme is proposed, which simultaneously optimizes a task policy and a safety policy. The convergence of this iteration scheme is proved. Furthermore, we design a deep RL algorithm for practical implementation, called dually robust actor-critic (DRAC). The evaluations with safety-critical benchmarks demonstrate that DRAC achieves high performance and persistent safety under all scenarios (no adversary, safety adversary, performance adversary), outperforming all baselines significantly.

Learning From Drift: Federated Learning on Non-IID Data via Drift Regularization

  • paper_url: http://arxiv.org/abs/2309.07189
  • repo_url: None
  • paper_authors: Yeachan Kim, Bonggun Shin
  • for: To improve the performance of federated learning algorithms in heterogeneous (Non-IID) environments.
  • methods: Learning from Drift (LfD) regularizes the classifier's outputs to prevent performance degradation on Non-IID data. The method encapsulates two key components: drift estimation and drift regularization.
  • results: Evaluated through the lens of five aspects of federated learning (generalization, heterogeneity, scalability, forgetting, and efficiency), LfD clearly outperforms alternative methods in federated learning with Non-IID data.
    Abstract Federated learning algorithms perform reasonably well on independent and identically distributed (IID) data. They, on the other hand, suffer greatly from heterogeneous environments, i.e., Non-IID data. Despite the fact that many research projects have been done to address this issue, recent findings indicate that they are still sub-optimal when compared to training on IID data. In this work, we carefully analyze the existing methods in heterogeneous environments. Interestingly, we find that regularizing the classifier's outputs is quite effective in preventing performance degradation on Non-IID data. Motivated by this, we propose Learning from Drift (LfD), a novel method for effectively training the model in heterogeneous settings. Our scheme encapsulates two key components: drift estimation and drift regularization. Specifically, LfD first estimates how different the local model is from the global model (i.e., drift). The local model is then regularized such that it does not fall in the direction of the estimated drift. In the experiment, we evaluate each method through the lens of the five aspects of federated learning, i.e., Generalization, Heterogeneity, Scalability, Forgetting, and Efficiency. Comprehensive evaluation results clearly support the superiority of LfD in federated learning with Non-IID data.
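A sketch of the two components at the output level, under the assumption that drift is estimated as the gap between local and global logits and penalised directionally; the precise LfD loss is defined in the paper:

```python
# Sketch of drift estimation + drift regularization on classifier outputs
# (an illustrative interpretation, not the exact LfD objective).
import torch
import torch.nn.functional as F

def lfd_loss(local_model, global_model, x, y, lam=0.1):
    local_logits = local_model(x)
    with torch.no_grad():
        global_logits = global_model(x)
        drift = local_logits.detach() - global_logits      # drift estimation
        drift = F.normalize(drift, dim=-1)
    task = F.cross_entropy(local_logits, y)
    # Drift regularization: discourage outputs from moving along the drift.
    reg = ((local_logits - global_logits) * drift).sum(-1).mean()
    return task + lam * reg

# Usage with any pair of classifiers sharing an architecture:
local = torch.nn.Linear(16, 4)
glob = torch.nn.Linear(16, 4)
x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
print(lfd_loss(local, glob, x, y))
```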

Electricity Demand Forecasting through Natural Language Processing with Long Short-Term Memory Networks

  • paper_url: http://arxiv.org/abs/2309.06793
  • repo_url: None
  • paper_authors: Yun Bai, Simon Camal, Andrea Michiorri
  • for: To improve forecasts of the UK national electricity demand by incorporating textual news features.
  • methods: A Long Short-Term Memory (LSTM) network combines textual news features with historical loads, weather forecasts, calendar information, and known major events.
  • results: Public sentiment and word-vector representations related to transport and geopolitics have time-continuity effects on electricity demand. The LSTM with textual features improves on the pure LSTM benchmark by more than 3% and on the official benchmark by close to 10%, while effectively reducing forecasting uncertainty by narrowing the confidence interval and bringing the forecast distribution closer to the truth.
    Abstract Electricity demand forecasting is a well established research field. Usually this task is performed considering historical loads, weather forecasts, calendar information and known major events. Recently attention has been given on the possible use of new sources of information from textual news in order to improve the performance of these predictions. This paper proposes a Long and Short-Term Memory (LSTM) network incorporating textual news features that successfully predicts the deterministic and probabilistic tasks of the UK national electricity demand. The study finds that public sentiment and word vector representations related to transport and geopolitics have time-continuity effects on electricity demand. The experimental results show that the LSTM with textual features improves by more than 3% compared to the pure LSTM benchmark and by close to 10% over the official benchmark. Furthermore, the proposed model effectively reduces forecasting uncertainty by narrowing the confidence interval and bringing the forecast distribution closer to the truth.
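
A hedged sketch of how such a model might fuse the two input streams: an LSTM over conventional sequence inputs whose final state is concatenated with daily text features (sentiment scores, averaged word vectors) before the output head. All dimensions and the fusion-by-concatenation are assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class DemandLSTM(nn.Module):
    def __init__(self, n_conv=8, n_text=16, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_conv, hidden, batch_first=True)  # load/weather/calendar
        self.head = nn.Linear(hidden + n_text, 1)               # fused regression head

    def forward(self, x_seq, x_text):
        # x_seq: (batch, time, n_conv); x_text: (batch, n_text) news features
        _, (h, _) = self.lstm(x_seq)
        return self.head(torch.cat([h[-1], x_text], dim=-1))

model = DemandLSTM()
y_hat = model(torch.randn(32, 48, 8), torch.randn(32, 16))  # next-step demand
```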

Scalable neural network models and terascale datasets for particle-flow reconstruction

  • paper_url: http://arxiv.org/abs/2309.06782
  • repo_url: None
  • paper_authors: Joosep Pata, Eric Wulff, Farouk Mokhtar, David Southwick, Mengke Zhang, Maria Girone, Javier Duarte
  • for: This paper studies full event reconstruction in high-energy electron-positron collisions based on a highly granular detector simulation.
  • methods: A graph neural network and a kernel-based transformer are used for particle-flow (PF) reconstruction; both avoid quadratic memory allocation and computational cost while achieving realistic PF reconstruction.
  • results: Hyperparameter tuning on a supercomputer significantly improves the physics performance of the models. The resulting model is highly portable across hardware processors, supporting Nvidia, AMD, and Intel Habana cards, and can be trained on highly granular inputs consisting of tracks and calorimeter hits with physics performance competitive with the baseline.
    Abstract We study scalable machine learning models for full event reconstruction in high-energy electron-positron collisions based on a highly granular detector simulation. Particle-flow (PF) reconstruction can be formulated as a supervised learning task using tracks and calorimeter clusters or hits. We compare a graph neural network and kernel-based transformer and demonstrate that both avoid quadratic memory allocation and computational cost while achieving realistic PF reconstruction. We show that hyperparameter tuning on a supercomputer significantly improves the physics performance of the models. We also demonstrate that the resulting model is highly portable across hardware processors, supporting Nvidia, AMD, and Intel Habana cards. Finally, we demonstrate that the model can be trained on highly granular inputs consisting of tracks and calorimeter hits, resulting in a competitive physics performance with the baseline. Datasets and software to reproduce the studies are published following the findable, accessible, interoperable, and reusable (FAIR) principles.
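
The abstract does not name the specific kernel, but a common way kernel-based transformers avoid the quadratic cost is linear attention: with a feature map phi, attention factorizes as phi(Q)(phi(K)ᵀV), costing O(N) in the number of particle candidates. A minimal numpy sketch with phi = elu + 1 (the feature map is an assumption):

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """O(N) attention: phi(Q) @ (phi(K).T @ V), never forming the N x N matrix."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1, positive map
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                       # (d, d_v): summary of all keys/values
    norm = Qp @ Kp.sum(axis=0) + eps    # (N,): per-query normalizer
    return (Qp @ kv) / norm[:, None]

N, d = 10000, 32                        # e.g., ~10k track/cluster inputs per event
out = linear_attention(np.random.randn(N, d), np.random.randn(N, d), np.random.randn(N, d))
```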

MCNS: Mining Causal Natural Structures Inside Time Series via A Novel Internal Causality Scheme

  • paper_url: http://arxiv.org/abs/2309.06739
  • repo_url: None
  • paper_authors: Yuanhao Liu, Dehui Du, Zihan Jiang, Anyan Huang, Yiyang Li
  • for: This work investigates the internal causality of time series in order to improve the accuracy and interpretability of neural networks (NNs).
  • methods: A novel framework called Mining Causal Natural Structure (MCNS), which is automatic and domain-agnostic, finds the causal natural structures inside time series via an internal causality scheme and uses them to impregnate NNs.
  • results: Experiments on time series classification show that impregnating NNs with MCNS (by refining attention, shape-selection classification, and pruning datasets) improves accuracy and interpretability; MCNS additionally provides an in-depth, solid summary of the time series and datasets.
    Abstract Causal inference permits us to discover covert relationships of various variables in time series. However, in most existing works, the variables mentioned above are the dimensions. The causality between dimensions could be cursory, which hinders the comprehension of the internal relationship and the benefit of the causal graph to the neural networks (NNs). In this paper, we find that causality exists not only outside but also inside the time series because it reflects a succession of events in the real world. It inspires us to seek the relationship between internal subsequences. However, the challenges are the hardship of discovering causality from subsequences and utilizing the causal natural structures to improve NNs. To address these challenges, we propose a novel framework called Mining Causal Natural Structure (MCNS), which is automatic and domain-agnostic and helps to find the causal natural structures inside time series via the internal causality scheme. We evaluate the MCNS framework and impregnation NN with MCNS on time series classification tasks. Experimental results illustrate that our impregnation, by refining attention, shape selection classification, and pruning datasets, drives NN, even the data itself preferable accuracy and interpretability. Besides, MCNS provides an in-depth, solid summary of the time series and datasets.
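
The abstract does not specify the internal causality test. As a loose illustration of the underlying idea, testing whether one extracted subsequence helps predict another, here is a crude Granger-style lag-regression check; the subsequence extraction and this statistic are placeholders, not MCNS itself:

```python
import numpy as np

def predicts(cause, effect, lag=2):
    """Does adding lagged `cause` reduce the residual error of an
    autoregression on `effect`? Ratio >> 1 suggests internal causality."""
    n = len(effect) - lag
    X_ar = np.column_stack([effect[i:i + n] for i in range(lag)])
    X_full = np.column_stack([X_ar] + [cause[i:i + n] for i in range(lag)])
    y = effect[lag:lag + n]
    rss = lambda X: np.sum((y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2)
    return rss(X_ar) / max(rss(X_full), 1e-12)

t = np.arange(300)
a = np.sin(0.1 * t) + 0.1 * np.random.randn(300)
b = np.roll(a, 3) + 0.1 * np.random.randn(300)   # b lags a, so a "causes" b
print(predicts(a, b), predicts(b, a))            # first ratio should be larger
```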

Bias Amplification Enhances Minority Group Performance

  • paper_url: http://arxiv.org/abs/2309.06717
  • repo_url: None
  • paper_authors: Gaotang Li, Jiarui Liu, Wei Hu
  • for: Improving model accuracy on rare subgroups even when average accuracy is high, in the realistic setting where group annotations are available only on a small validation set or not at all.
  • methods: A two-stage training algorithm, BAM: first, a bias-amplification scheme introduces a learnable auxiliary variable for each training sample; second, the samples the bias-amplified model misclassifies are upweighted, and training continues on the reweighted dataset.
  • results: BAM achieves competitive performance compared with existing methods on spurious-correlation benchmarks in computer vision and natural language processing, and a simple stopping criterion based on the minimum class accuracy difference removes the need for group annotations with little or no loss in worst-group accuracy.
    Abstract Neural networks produced by standard training are known to suffer from poor accuracy on rare subgroups despite achieving high accuracy on average, due to the correlations between certain spurious features and labels. Previous approaches based on worst-group loss minimization (e.g. Group-DRO) are effective in improving worse-group accuracy but require expensive group annotations for all the training samples. In this paper, we focus on the more challenging and realistic setting where group annotations are only available on a small validation set or are not available at all. We propose BAM, a novel two-stage training algorithm: in the first stage, the model is trained using a bias amplification scheme via introducing a learnable auxiliary variable for each training sample; in the second stage, we upweight the samples that the bias-amplified model misclassifies, and then continue training the same model on the reweighted dataset. Empirically, BAM achieves competitive performance compared with existing methods evaluated on spurious correlation benchmarks in computer vision and natural language processing. Moreover, we find a simple stopping criterion based on minimum class accuracy difference that can remove the need for group annotations, with little or no loss in worst-group accuracy. We perform extensive analyses and ablations to verify the effectiveness and robustness of our algorithm in varying class and group imbalance ratios.
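
A hedged sketch of the two stages, with the stage-1 auxiliary variable implemented as a learnable per-sample logit offset and the stage-2 upweighting as a constant factor; both specifics are assumptions, not the paper's exact scheme:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_bam(model, xs, ys, n_classes, epochs=50, up=5.0, lr=0.1):
    # Stage 1: bias amplification -- a learnable auxiliary logit per sample
    # lets easy (spurious) patterns be absorbed quickly.
    aux = nn.Parameter(torch.zeros(len(xs), n_classes))
    opt = torch.optim.SGD(list(model.parameters()) + [aux], lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        F.cross_entropy(model(xs) + aux, ys).backward()
        opt.step()
    # Stage 2: upweight samples the bias-amplified model misclassifies,
    # then continue training the same model on the reweighted data.
    with torch.no_grad():
        wrong = (model(xs) + aux).argmax(dim=1) != ys
    weights = torch.ones(len(xs))
    weights[wrong] = up
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        (F.cross_entropy(model(xs), ys, reduction="none") * weights).mean().backward()
        opt.step()
    return model
```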

Crystal structure prediction using neural network potential and age-fitness Pareto genetic algorithm

  • paper_url: http://arxiv.org/abs/2309.06710
  • repo_url: https://github.com/sadmanomee/ParetoCSP
  • paper_authors: Sadman Sadeed Omee, Lai Wei, Jianjun Hu
  • for: Solving the crystal structure prediction (CSP) problem.
  • methods: ParetoCSP combines a multi-objective genetic algorithm (MOGA) with a neural network inter-atomic potential (IAP) model to find energetically optimal crystal structures for given chemical compositions. It enhances the NSGA-III algorithm by treating genotypic age as an independent optimization criterion and uses the M3GNet universal IAP to guide the GA search.
  • results: Compared with GN-OA, a state-of-the-art neural-potential-based CSP algorithm, ParetoCSP performs better by a factor of $2.562$ across $55$ diverse benchmark structures, as evaluated by seven performance metrics. Trajectory analysis of the traversed structures shows that ParetoCSP generates more valid structures than the other algorithms, which helps guide the GA to search more effectively for optimal structures.
    Abstract While crystal structure prediction (CSP) remains a longstanding challenge, we introduce ParetoCSP, a novel algorithm for CSP, which combines a multi-objective genetic algorithm (MOGA) with a neural network inter-atomic potential (IAP) model to find energetically optimal crystal structures given chemical compositions. We enhance the NSGA-III algorithm by incorporating the genotypic age as an independent optimization criterion and employ the M3GNet universal IAP to guide the GA search. Compared to GN-OA, a state-of-the-art neural potential based CSP algorithm, ParetoCSP demonstrated significantly better predictive capabilities, outperforming by a factor of $2.562$ across $55$ diverse benchmark structures, as evaluated by seven performance metrics. Trajectory analysis of the traversed structures of all algorithms shows that ParetoCSP generated more valid structures than other algorithms, which helped guide the GA to search more effectively for the optimal structures
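
The key GA modification is treating genotypic age as an extra objective, so young, not-yet-optimized candidates can survive selection. A minimal sketch of Pareto dominance over (energy, age), with the energy values and representation as placeholders (the real algorithm uses NSGA-III niching and M3GNet energies):

```python
import random

def dominates(a, b):
    """a dominates b: no worse in every objective, strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(pop):
    """pop: list of dicts with 'energy' and 'age'; both are minimized."""
    objs = [(ind["energy"], ind["age"]) for ind in pop]
    return [ind for ind, o in zip(pop, objs)
            if not any(dominates(other, o) for other in objs if other != o)]

pop = [{"energy": random.uniform(-5, 0), "age": random.randint(0, 10)} for _ in range(20)]
survivors = pareto_front(pop)   # young, higher-energy candidates can still survive
for ind in pop:
    ind["age"] += 1             # ageing pressure applied each generation
```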

Predicting Fatigue Crack Growth via Path Slicing and Re-Weighting

  • paper_url: http://arxiv.org/abs/2309.06708
  • repo_url: https://github.com/zhaoyj21/fcg
  • paper_authors: Yingjie Zhao, Yong Liu, Zhiping Xu
  • for: Predicting the fatigue risk of key structural components, i.e., the likelihood of fatigue failure, which is crucial in engineering design.
  • methods: A statistical learning framework predicts fatigue crack growth and component life-to-failure under loading conditions with uncertainties. Digital libraries of fatigue crack patterns and remaining life are constructed by high-fidelity physical simulations; dimensionality reduction and neural network architectures learn the history dependence and nonlinearity of fatigue crack growth; and path-slicing and re-weighting techniques handle statistical noise and rare events.
  • results: The method accurately predicts fatigue crack growth and component life, with the predicted crack patterns self-updated and self-corrected by the evolving cracks. Representative examples with fatigue cracks in plates validate the digital-twin scenario for real-time structural health monitoring and fatigue life prediction in maintenance decision-making.
    Abstract Predicting potential risks associated with the fatigue of key structural components is crucial in engineering design. However, fatigue often involves entangled complexities of material microstructures and service conditions, making diagnosis and prognosis of fatigue damage challenging. We report a statistical learning framework to predict the growth of fatigue cracks and the life-to-failure of the components under loading conditions with uncertainties. Digital libraries of fatigue crack patterns and the remaining life are constructed by high-fidelity physical simulations. Dimensionality reduction and neural network architectures are then used to learn the history dependence and nonlinearity of fatigue crack growth. Path-slicing and re-weighting techniques are introduced to handle the statistical noises and rare events. The predicted fatigue crack patterns are self-updated and self-corrected by the evolving crack patterns. The end-to-end approach is validated by representative examples with fatigue cracks in plates, which showcase the digital-twin scenario in real-time structural health monitoring and fatigue life prediction for maintenance management decision-making.
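
A loose sketch of the pipeline's overall shape only: compress simulated crack-path snapshots with dimensionality reduction, then regress remaining life with a small network. The synthetic data, PCA, and dimensions are all stand-ins; the paper's path-slicing and re-weighting steps are not reproduced here:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

# Synthetic "digital library": 500 crack-path snapshots (64-point paths),
# each labelled with a stand-in remaining life in load cycles.
rng = np.random.default_rng(0)
paths = rng.normal(size=(500, 64)).cumsum(axis=1)   # placeholder crack geometries
life = 1e5 / (1.0 + np.abs(paths[:, -1]))           # placeholder life labels

z = PCA(n_components=8).fit_transform(paths)        # low-dimensional crack state
model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000).fit(z, life)
print(model.predict(z[:3]), life[:3])
```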

Federated PAC-Bayesian Learning on Non-IID data

  • paper_url: http://arxiv.org/abs/2309.06683
  • repo_url: None
  • paper_authors: Zihao Zhao, Yang Liu, Wenbo Ding, Xiao-Ping Zhang
  • for: This work provides Probably Approximately Correct (PAC) Bayesian bounds for federated learning (FL) on non-independent and identically distributed (non-IID) data.
  • methods: The bound assumes unique prior knowledge for each client and variable aggregation weights; an objective function and a novel Gibbs-based algorithm are proposed to optimize the derived bound.
  • results: The results are validated on real-world datasets.
    Abstract Existing research has either adapted the Probably Approximately Correct (PAC) Bayesian framework for federated learning (FL) or used information-theoretic PAC-Bayesian bounds while introducing their theorems, but few considering the non-IID challenges in FL. Our work presents the first non-vacuous federated PAC-Bayesian bound tailored for non-IID local data. This bound assumes unique prior knowledge for each client and variable aggregation weights. We also introduce an objective function and an innovative Gibbs-based algorithm for the optimization of the derived bound. The results are validated on real-world datasets.
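
For context, the classical single-learner PAC-Bayesian bound that federated variants refine: for a loss in $[0,1]$, with probability at least $1-\delta$ over an i.i.d. sample of size $n$,

$$\mathbb{E}_{h\sim Q}\,[L(h)] \;\le\; \mathbb{E}_{h\sim Q}\,[\hat{L}_n(h)] \;+\; \sqrt{\frac{\mathrm{KL}(Q\,\|\,P)+\ln\!\left(2\sqrt{n}/\delta\right)}{2n}},$$

where $P$ is the prior and $Q$ the learned posterior. The paper's non-IID federated bound, with a distinct prior $P_i$ per client and variable aggregation weights, is not reproduced here.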

Generalizable improvement of the Spalart-Allmaras model through assimilation of experimental data

  • paper_url: http://arxiv.org/abs/2309.06679
  • repo_url: None
  • paper_authors: Deepinder Jot Singh Aulakh, Romit Maulik
  • for: This study uses model and data fusion to improve the Spalart-Allmaras (SA) closure model for Reynolds-averaged Navier-Stokes (RANS) solutions, particularly for separated flows.
  • methods: Data assimilation, namely ensemble Kalman filtering (EnKF), calibrates the coefficients of the SA model via a holistic parameterization of its production, diffusion, and destruction terms, assimilating experimental velocity profiles, skin friction, and pressure coefficients for separated flows.
  • results: Despite being calibrated with data from a single flow condition around a backward-facing step (BFS), the recalibrated SA model generalizes to other separated flows, including the 2D bump and a modified BFS, with significant improvement in skin friction ($C_f$) and pressure ($C_p$) coefficients. It also recovers classical SA proficiency for external, unseparated flows such as the NACA-0012 airfoil, and the individually calibrated terms target specific flow physics: the calibrated production term improves the recirculation zone while destruction improves the recovery zone.
    Abstract This study focuses on the use of model and data fusion for improving the Spalart-Allmaras (SA) closure model for Reynolds-averaged Navier-Stokes solutions of separated flows. In particular, our goal is to develop of models that not-only assimilate sparse experimental data to improve performance in computational models, but also generalize to unseen cases by recovering classical SA behavior. We achieve our goals using data assimilation, namely the Ensemble Kalman Filtering approach (EnKF), to calibrate the coefficients of the SA model for separated flows. A holistic calibration strategy is implemented via a parameterization of the production, diffusion, and destruction terms. This calibration relies on the assimilation of experimental data collected velocity profiles, skin friction, and pressure coefficients for separated flows. Despite using of observational data from a single flow condition around a backward-facing step (BFS), the recalibrated SA model demonstrates generalization to other separated flows, including cases such as the 2D-bump and modified BFS. Significant improvement is observed in the quantities of interest, i.e., skin friction coefficient ($C_f$) and pressure coefficient ($C_p$) for each flow tested. Finally, it is also demonstrated that the newly proposed model recovers SA proficiency for external, unseparated flows, such as flow around a NACA-0012 airfoil without any danger of extrapolation, and that the individually calibrated terms in the SA model are targeted towards specific flow-physics wherein the calibrated production term improves the re-circulation zone while destruction improves the recovery zone.
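
A minimal sketch of the stochastic EnKF analysis step used for this kind of coefficient calibration, with the parameter-to-observable map `h` (here a toy function standing in for a RANS solve producing $C_f$, $C_p$, and velocity profiles) as a placeholder; the ensemble update formula itself is standard:

```python
import numpy as np

def enkf_update(theta_ens, y_obs, h, R):
    """theta_ens: (m, p) ensemble of model coefficients; y_obs: (d,) measurements;
    h: maps coefficients to predicted observables; R: (d, d) obs-noise covariance."""
    m = theta_ens.shape[0]
    Y = np.array([h(t) for t in theta_ens])   # (m, d) ensemble predictions
    A = theta_ens - theta_ens.mean(0)         # parameter anomalies
    B = Y - Y.mean(0)                         # prediction anomalies
    C_ty = A.T @ B / (m - 1)                  # cross-covariance (p, d)
    C_yy = B.T @ B / (m - 1)                  # prediction covariance (d, d)
    K = C_ty @ np.linalg.inv(C_yy + R)        # Kalman gain (p, d)
    perturbed = y_obs + np.random.multivariate_normal(np.zeros(len(y_obs)), R, m)
    return theta_ens + (perturbed - Y) @ K.T  # updated ensemble

# Toy use: recover theta = [1, 1] from noisy linear observations.
h = lambda t: np.array([2.0 * t[0], t[0] + t[1]])
R = 0.01 * np.eye(2)
ens = np.random.randn(50, 2)
for _ in range(5):
    ens = enkf_update(ens, np.array([2.0, 2.0]), h, R)
print(ens.mean(0))   # approaches [1, 1]
```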

Multi-step prediction of chlorophyll concentration based on Adaptive Graph-Temporal Convolutional Network with Series Decomposition

  • paper_url: http://arxiv.org/abs/2309.07187
  • repo_url: None
  • paper_authors: Ying Chen, Xiao Li, Hongbo Zhang, Wenyang Song, Chongxuan Xv
  • for: This study aims to predict trends in chlorophyll concentration, an important indicator of the nutritional status and algal blooms of water bodies, providing a scientific basis for environmental protection and aquaculture.
  • methods: A prediction model based on time-series decomposition and an adaptive graph-temporal convolutional network (AGTCNSD) is proposed. The original series is decomposed into trend and periodic components by moving average; water quality parameters are modeled with a graph convolutional network, with a parameter embedding matrix whose node weights are assigned via matrix decomposition; adaptive graph convolution learns the relationships between water quality parameters, and temporal convolution captures time dependence for multi-step prediction.
  • results: Validated on water quality data from the coastal city of Beihai, the model predicts chlorophyll concentration better than other methods and can serve as a scientific resource for environmental management decision-making.
    Abstract Chlorophyll concentration can well reflect the nutritional status and algal blooms of water bodies, and is an important indicator for evaluating water quality. The prediction of chlorophyll concentration change trend is of great significance to environmental protection and aquaculture. However, there is a complex and indistinguishable nonlinear relationship between many factors affecting chlorophyll concentration. In order to effectively mine the nonlinear features contained in the data. This paper proposes a time-series decomposition adaptive graph-time convolutional network ( AGTCNSD ) prediction model. Firstly, the original sequence is decomposed into trend component and periodic component by moving average method. Secondly, based on the graph convolutional neural network, the water quality parameter data is modeled, and a parameter embedding matrix is defined. The idea of matrix decomposition is used to assign weight parameters to each node. The adaptive graph convolution learns the relationship between different water quality parameters, updates the state information of each parameter, and improves the learning ability of the update relationship between nodes. Finally, time dependence is captured by time convolution to achieve multi-step prediction of chlorophyll concentration. The validity of the model is verified by the water quality data of the coastal city Beihai. The results show that the prediction effect of this method is better than other methods. It can be used as a scientific resource for environmental management decision-making.
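
The first step, decomposing the series into trend and periodic components by moving average, is simple to illustrate (the window length is an assumption):

```python
import numpy as np

def decompose(x, window=24):
    """Moving-average decomposition: trend = centred moving mean, periodic = residual."""
    kernel = np.ones(window) / window
    trend = np.convolve(x, kernel, mode="same")   # edge effects left unhandled
    return trend, x - trend

t = np.arange(500)
series = 0.01 * t + np.sin(2 * np.pi * t / 24) + 0.1 * np.random.randn(500)
trend, periodic = decompose(series)               # fed to separate model branches
```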

Sound field decomposition based on two-stage neural networks

  • paper_url: http://arxiv.org/abs/2309.06661
  • repo_url: None
  • paper_authors: Ryo Matsuda, Makoto Otani
  • for: This study proposes a neural-network-based method for sound field decomposition, for source localization and sound field reconstruction.
  • methods: The method comprises two stages: a sound field separation stage and a single-source localization stage. In the first stage, the sound pressure at microphones synthesized by multiple sources is separated into the field excited by each source. In the second stage, the source location is obtained by regression from the single-source sound pressure at the microphones; because this stage is a regression rather than a classification, the estimated location is not affected by discretization.
  • results: Datasets are generated by simulation using Green's function, and the network is trained for each frequency. Numerical experiments show that, compared with conventional methods, the proposed method achieves higher source-localization accuracy and higher sound-field-reconstruction accuracy.
    Abstract A method for sound field decomposition based on neural networks is proposed. The method comprises two stages: a sound field separation stage and a single-source localization stage. In the first stage, the sound pressure at microphones synthesized by multiple sources is separated into one excited by each sound source. In the second stage, the source location is obtained as a regression from the sound pressure at microphones consisting of a single sound source. The estimated location is not affected by discretization because the second stage is designed as a regression rather than a classification. Datasets are generated by simulation using Green's function, and the neural network is trained for each frequency. Numerical experiments reveal that, compared with conventional methods, the proposed method can achieve higher source-localization accuracy and higher sound-field-reconstruction accuracy.
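
A hedged sketch of the two-stage shape only: a separation network splitting the multi-source pressure field into per-source fields, followed by a regression network mapping each single-source field to an (x, y, z) position. The layer sizes, MLP choice, and real/imaginary encoding are assumptions; per the abstract, one such model would be trained per frequency:

```python
import torch
import torch.nn as nn

n_mics, n_sources = 32, 3
# Stage 1: separate the mic-array pressure field into per-source fields.
separator = nn.Sequential(nn.Linear(n_mics * 2, 128), nn.ReLU(),
                          nn.Linear(128, n_sources * n_mics * 2))
# Stage 2: regress a source position from each separated single-source field
# (regression, not classification, so no grid discretization error).
localizer = nn.Sequential(nn.Linear(n_mics * 2, 64), nn.ReLU(),
                          nn.Linear(64, 3))

p = torch.randn(8, n_mics * 2)                     # complex pressure as (Re, Im) pairs
per_source = separator(p).view(8, n_sources, n_mics * 2)
positions = localizer(per_source)                  # (8, n_sources, 3)
```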

Dissipative Imitation Learning for Discrete Dynamic Output Feedback Control with Sparse Data Sets

  • paper_url: http://arxiv.org/abs/2309.06658
  • repo_url: None
  • paper_authors: Amy K. Strong, Ethan J. LoCicero, Leila J. Bridgeman
  • for: This paper addresses controller synthesis for complex objectives and highly uncertain plant models while providing stability guarantees.
  • methods: An input-output (IO) stability approach to dissipative imitation learning: a closed-loop stable dynamic output feedback controller is learned from expert data, a coarse IO plant model, and a new constraint that enforces dissipativity on the learned controller. Although the learning objective is nonconvex, iterative convex overbounding (ICO) and projected gradient descent (PGD) are explored as methods to successfully learn the controller.
  • results: Applied to two unknown plants and compared with a traditionally learned dynamic output feedback controller and a neural network controller, the dissipativity-constrained controller achieves closed-loop stability and successfully mimics the expert controller despite little knowledge of the plant model and a small data set, whereas the other methods often fail to maintain stability and achieve good performance.
    Abstract Imitation learning enables the synthesis of controllers for complex objectives and highly uncertain plant models. However, methods to provide stability guarantees to imitation learned controllers often rely on large amounts of data and/or known plant models. In this paper, we explore an input-output (IO) stability approach to dissipative imitation learning, which achieves stability with sparse data sets and with little known about the plant model. A closed-loop stable dynamic output feedback controller is learned using expert data, a coarse IO plant model, and a new constraint to enforce dissipativity on the learned controller. While the learning objective is nonconvex, iterative convex overbounding (ICO) and projected gradient descent (PGD) are explored as methods to successfully learn the controller. This new imitation learning method is applied to two unknown plants and compared to traditionally learned dynamic output feedback controller and neural network controller. With little knowledge of the plant model and a small data set, the dissipativity constrained learned controller achieves closed loop stability and successfully mimics the behavior of the expert controller, while other methods often fail to maintain stability and achieve good performance.
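
As a generic illustration of the PGD ingredient only, here is projected gradient descent on a toy constrained problem; the actual dissipativity constraint is a matrix-inequality set requiring a semidefinite projection, which the unit-ball stand-in below does not capture:

```python
import numpy as np

def pgd(grad, project, x0, lr=0.05, steps=200):
    """Projected gradient descent: step on the (nonconvex) imitation loss,
    then project back onto the feasible (e.g., dissipativity) set."""
    x = project(x0)
    for _ in range(steps):
        x = project(x - lr * grad(x))
    return x

# Toy: minimize ||x - target||^2 subject to ||x|| <= 1.
target = np.array([2.0, 0.5])
grad = lambda x: 2 * (x - target)
project = lambda x: x / max(1.0, np.linalg.norm(x))
print(pgd(grad, project, np.zeros(2)))   # lands on the unit-ball boundary
```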

Out of Distribution Detection via Domain-Informed Gaussian Process State Space Models

  • paper_url: http://arxiv.org/abs/2309.06655
  • repo_url: None
  • paper_authors: Alonso Marco, Elias Morley, Claire J. Tomlin
  • for: This work enables robots to safely navigate unseen scenarios with learning-based methods by accurately detecting out-of-training-distribution (OoD) situations online.
  • methods: Gaussian process state-space models (GPSSMs) discriminate unexpected observations by comparing them against probabilistic predictions. Existing domain knowledge, provided as a dataset collected in simulation or with a nominal model, is embedded in the kernel, and an OoD online runtime monitor is built on receding-horizon predictions.
  • results: The informed kernel yields better regression quality with smaller datasets than standard kernel choices, and the OoD monitor reliably classifies previously unseen terrains on a real quadruped navigating an indoor setting.
    Abstract In order for robots to safely navigate in unseen scenarios using learning-based methods, it is important to accurately detect out-of-training-distribution (OoD) situations online. Recently, Gaussian process state-space models (GPSSMs) have proven useful to discriminate unexpected observations by comparing them against probabilistic predictions. However, the capability for the model to correctly distinguish between in- and out-of-training distribution observations hinges on the accuracy of these predictions, primarily affected by the class of functions the GPSSM kernel can represent. In this paper, we propose (i) a novel approach to embed existing domain knowledge in the kernel and (ii) an OoD online runtime monitor, based on receding-horizon predictions. Domain knowledge is provided in the form of a dataset, collected either in simulation or by using a nominal model. Numerical results show that the informed kernel yields better regression quality with smaller datasets, as compared to standard kernel choices. We demonstrate the effectiveness of the OoD monitor on a real quadruped navigating an indoor setting, which reliably classifies previously unseen terrains.
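
The monitoring idea, flagging an observation as OoD when it falls outside the GP's predictive interval, can be sketched with a plain GP regressor standing in for the GPSSM; the receding-horizon structure and domain-informed kernel are omitted, and the 3-sigma threshold is an assumption:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Train on in-distribution transitions x_t -> x_{t+1} (1-D toy dynamics).
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(80, 1))
y = np.sin(2 * X[:, 0]) + 0.05 * rng.normal(size=80)
gp = GaussianProcessRegressor(kernel=RBF(0.5), alpha=0.05**2).fit(X, y)

def is_ood(x, y_next, n_sigma=3.0, noise=0.05):
    """Flag a transition whose observed next state lies outside the GP's
    predictive interval (latent std combined with the known noise level)."""
    mu, std = gp.predict(np.atleast_2d(x), return_std=True)
    return abs(y_next - mu[0]) > n_sigma * np.hypot(std[0], noise)

print(is_ood([0.3], np.sin(0.6)))       # nominal transition -> False
print(is_ood([0.3], np.sin(0.6) + 2))   # unexpected observation -> True
```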

ConR: Contrastive Regularizer for Deep Imbalanced Regression

  • paper_url: http://arxiv.org/abs/2309.06651
  • repo_url: https://github.com/borealisai/conr
  • paper_authors: Mahsa Keramati, Lili Meng, R. David Evans
  • for: This work addresses imbalanced label distributions in deep learning, particularly for regression tasks where the label space is continuous.
  • methods: ConR, a contrastive regularizer, models global and local label similarities in feature space and prevents the features of minority samples from being collapsed into their majority neighbours: incorrect proximities are penalized in proportion to label similarity, while correct ones are encouraged to model local similarities.
  • results: ConR significantly boosts the performance of all state-of-the-art methods on three large-scale deep imbalanced regression benchmarks; it is orthogonal to existing approaches and extends smoothly to uni- and multi-dimensional label spaces.
    Abstract Imbalanced distributions are ubiquitous in real-world data. They create constraints on Deep Neural Networks to represent the minority labels and avoid bias towards majority labels. The extensive body of imbalanced approaches address categorical label spaces but fail to effectively extend to regression problems where the label space is continuous. Conversely, local and global correlations among continuous labels provide valuable insights towards effectively modelling relationships in feature space. In this work, we propose ConR, a contrastive regularizer that models global and local label similarities in feature space and prevents the features of minority samples from being collapsed into their majority neighbours. Serving the similarities of the predictions as an indicator of feature similarities, ConR discerns the dissagreements between the label space and feature space and imposes a penalty on these disagreements. ConR minds the continuous nature of label space with two main strategies in a contrastive manner: incorrect proximities are penalized proportionate to the label similarities and the correct ones are encouraged to model local similarities. ConR consolidates essential considerations into a generic, easy-to-integrate, and efficient method that effectively addresses deep imbalanced regression. Moreover, ConR is orthogonal to existing approaches and smoothly extends to uni- and multi-dimensional label spaces. Our comprehensive experiments show that ConR significantly boosts the performance of all the state-of-the-art methods on three large-scale deep imbalanced regression benchmarks. Our code is publicly available in https://github.com/BorealisAI/ConR.
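
A hedged sketch of the core idea, penalizing pairs whose features are close but whose labels are far apart; the exact weighting, temperature, and positive/negative pair construction in ConR differ (see the linked repo):

```python
import torch

def conr_penalty(feats, labels, label_scale=1.0):
    """Push apart feature pairs whose label distance is large: the penalty
    on a pair's feature similarity grows with its label dissimilarity."""
    f = torch.nn.functional.normalize(feats, dim=1)
    feat_sim = f @ f.T                                     # (n, n) cosine sims
    label_dist = (labels[:, None] - labels[None, :]).abs() / label_scale
    weight = 1.0 - torch.exp(-label_dist)                  # ~0 for similar labels
    off_diag = ~torch.eye(len(labels), dtype=torch.bool)
    return (weight * feat_sim.clamp(min=0))[off_diag].mean()

feats = torch.randn(16, 32, requires_grad=True)
labels = torch.rand(16)
loss = conr_penalty(feats, labels)    # added to the main regression loss
loss.backward()
```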