cs.LG - 2023-11-13

Probabilistic Physics-integrated Neural Differentiable Modeling for Isothermal Chemical Vapor Infiltration Process

  • paper_url: http://arxiv.org/abs/2311.07798
  • repo_url: None
  • paper_authors: Deepak Akhare, Zeping Chen, Richard Gulotty, Tengfei Luo, Jian-Xun Wang
  • For: This paper aims to develop a data-driven predictive model for the isothermal chemical vapor infiltration (CVI) densification process, which is critical for producing high-performance carbon-carbon and carbon-silicon carbide composites.
  • Methods: The authors use the physics-integrated neural differentiable (PiNDiff) modeling framework, which incorporates uncertainty quantification to enhance the model’s reliability and robustness. They also use both synthetic and real-world manufacturing data to validate the model’s accuracy.
  • Results: The proposed method is shown to be effective in modeling the densification process during CVI, and can potentially be used to optimize the manufacturing process and improve the quality and consistency of the final products.
    Abstract Chemical vapor infiltration (CVI) is a widely adopted manufacturing technique used in producing carbon-carbon and carbon-silicon carbide composites. These materials are especially valued in the aerospace and automotive industries for their robust strength and lightweight characteristics. The densification process during CVI critically influences the final performance, quality, and consistency of these composite materials. Experimentally optimizing the CVI processes is challenging due to long experimental time and large optimization space. To address these challenges, this work takes a modeling-centric approach. Due to the complexities and limited experimental data of the isothermal CVI densification process, we have developed a data-driven predictive model using the physics-integrated neural differentiable (PiNDiff) modeling framework. An uncertainty quantification feature has been embedded within the PiNDiff method, bolstering the model's reliability and robustness. Through comprehensive numerical experiments involving both synthetic and real-world manufacturing data, the proposed method showcases its capability in modeling densification during the CVI process. This research highlights the potential of the PiNDiff framework as an instrumental tool for advancing our understanding, simulation, and optimization of the CVI manufacturing process, particularly when faced with sparse data and an incomplete description of the underlying physics.

Explainable History Distillation by Marked Temporal Point Process

  • paper_url: http://arxiv.org/abs/2311.07797
  • repo_url: None
  • paper_authors: Sishun Liu, Ke Deng, Yan Wang, Xiuzhen Zhang
  • for: The paper builds a machine learning system that automatically generates explanations of past events, so that researchers can understand what these commonly black-box models rely on when deployed on real-world, especially high-stakes, tasks.
  • methods: It proposes a new task, explainable history distillation (EHD), which requires a model to distill as few events as possible from an observed history such that the event distribution conditioned on the remaining events predicts the observed future noticeably worse; the distilled events are then regarded as the explanation for the future. To solve EHD efficiently, the task is rewritten as a 0-1 integer program whose solution is estimated directly by the proposed model.
  • results: Experiments on the Retweet and StackOverflow datasets show that the proposed model significantly outperforms other EHD baselines and can reveal the rationale underpinning real-world processes.
    Abstract Explainability of machine learning models is mandatory when researchers introduce these commonly believed black boxes to real-world tasks, especially high-stakes ones. In this paper, we build a machine learning system to automatically generate explanations of events that have happened, via counterfactual analysis based on the marked temporal point process (TPP). Specifically, we propose a new task called explainable history distillation (EHD). This task requires a model to distill as few events as possible from observed history. The target is that the event distribution conditioned on the remaining events predicts the observed future noticeably worse. We then regard the distilled events as the explanation for the future. To efficiently solve EHD, we rewrite the task into a 0-1 integer program and directly estimate the solution to the program with the proposed model. This work fills the gap between our task and existing works, which only spot the difference between factual and counterfactual worlds after applying a predefined modification to the environment. Experiment results on the Retweet and StackOverflow datasets show that the proposed model significantly outperforms other EHD baselines and can reveal the rationale underpinning real-world processes.
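    Code sketch (illustrative): a greedy stand-in for the distillation idea — given a fitted intensity model, repeatedly remove the history event whose deletion most degrades the log-likelihood of the observed future, and treat the removed events as the explanation. The exponential-kernel intensity, its parameters, and the greedy search are assumptions made for this toy (the future window is treated as an inhomogeneous Poisson process excited only by the kept history); the paper instead rewrites the task as a 0-1 integer program solved by a learned model.

    import numpy as np

    MU, ALPHA, BETA = 0.2, 0.8, 1.0   # toy exponential-kernel intensity parameters (assumed)

    def future_loglik(history, future, t0, t1):
        """Log-likelihood of the future events on [t0, t1] when the intensity is
        excited only by the kept history events (a deliberate simplification)."""
        h = np.asarray(history, dtype=float)

        def lam(t):
            excite = (ALPHA * np.exp(-BETA * (t - h))).sum() if len(h) else 0.0
            return MU + excite

        log_term = sum(np.log(lam(t)) for t in future)
        integral = MU * (t1 - t0)
        if len(h):
            integral += ((ALPHA / BETA) * (np.exp(-BETA * (t0 - h)) - np.exp(-BETA * (t1 - h)))).sum()
        return log_term - integral

    history = [0.3, 1.1, 1.4, 2.0, 2.6]           # observed past events
    future, t0, t1 = [3.1, 3.3, 3.8], 3.0, 4.0    # observed future window

    kept, removed = list(history), []
    for _ in range(3):                            # distill up to three events
        base = future_loglik(kept, future, t0, t1)
        drops = [(base - future_loglik(kept[:i] + kept[i + 1:], future, t0, t1), i)
                 for i in range(len(kept))]
        gain, i = max(drops)                      # removal that hurts the future fit most
        removed.append(kept.pop(i))
    print("distilled (most explanatory) events:", removed)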

Leveraging Hamilton-Jacobi PDEs with time-dependent Hamiltonians for continual scientific machine learning

  • paper_url: http://arxiv.org/abs/2311.07790
  • repo_url: None
  • paper_authors: Paula Chen, Tingwei Meng, Zongren Zou, Jérôme Darbon, George Em Karniadakis
  • for: Tackling two major challenges in scientific machine learning (SciML): interpretability and computational efficiency.
  • methods: The paper establishes a new theoretical connection between optimization problems arising from SciML and a generalized Hopf formula, which represents the viscosity solution of a Hamilton-Jacobi partial differential equation (HJ PDE) with a time-dependent Hamiltonian. Solving certain regularized learning problems with integral-type losses is thereby reinterpreted as solving an associated optimal control problem and its HJ PDE, so incremental model updates correspond to evolving the HJ PDE in time, with all previous information intrinsically encoded in its solution; this coincides naturally with the continual learning framework while avoiding catastrophic forgetting.
  • results: For the special case of linear regression, the connection yields a new Riccati-based methodology well suited to continual learning; numerical examples demonstrate the potential computational and memory advantages of this approach.
    Abstract We address two major challenges in scientific machine learning (SciML): interpretability and computational efficiency. We increase the interpretability of certain learning processes by establishing a new theoretical connection between optimization problems arising from SciML and a generalized Hopf formula, which represents the viscosity solution to a Hamilton-Jacobi partial differential equation (HJ PDE) with time-dependent Hamiltonian. Namely, we show that when we solve certain regularized learning problems with integral-type losses, we actually solve an optimal control problem and its associated HJ PDE with time-dependent Hamiltonian. This connection allows us to reinterpret incremental updates to learned models as the evolution of an associated HJ PDE and optimal control problem in time, where all of the previous information is intrinsically encoded in the solution to the HJ PDE. As a result, existing HJ PDE solvers and optimal control algorithms can be reused to design new efficient training approaches for SciML that naturally coincide with the continual learning framework, while avoiding catastrophic forgetting. As a first exploration of this connection, we consider the special case of linear regression and leverage our connection to develop a new Riccati-based methodology for solving these learning problems that is amenable to continual learning applications. We also provide some corresponding numerical examples that demonstrate the potential computational and memory advantages our Riccati-based approach can provide.
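    Code sketch (illustrative): the continual-learning flavour of the linear-regression special case can be mimicked with recursive least squares, where a Riccati-style matrix update absorbs each new observation without revisiting or forgetting earlier data. This is a generic sketch under that analogy, not the paper's HJ-PDE-derived algorithm.

    import numpy as np

    class RecursiveLeastSquares:
        def __init__(self, dim, reg=1e-2):
            self.P = np.eye(dim) / reg        # inverse of the regularized Gram matrix
            self.w = np.zeros(dim)

        def update(self, x, y):
            """Absorb one observation (x, y) in O(dim^2) without storing past data."""
            Px = self.P @ x
            k = Px / (1.0 + x @ Px)           # gain vector
            self.w += k * (y - x @ self.w)
            self.P -= np.outer(k, Px)         # Riccati-style covariance update

    rng = np.random.default_rng(0)
    w_true = rng.standard_normal(5)
    model = RecursiveLeastSquares(dim=5)
    for task in range(3):                      # three sequentially arriving "tasks"
        X = rng.standard_normal((200, 5))
        y = X @ w_true + 0.05 * rng.standard_normal(200)
        for xi, yi in zip(X, y):
            model.update(xi, yi)
        print(f"after task {task}: ||w - w_true|| = {np.linalg.norm(model.w - w_true):.4f}")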

Predicting the First Response Latency of Maintainers and Contributors in Pull Requests

  • paper_url: http://arxiv.org/abs/2311.07786
  • repo_url: None
  • paper_authors: SayedHassan Khatoonabadi, Ahmad Abdellatif, Diego Elias Costa, Emad Shihab
  • For: The paper aims to predict the first response latency of maintainers and contributors in the context of pull requests (PRs) on GitHub.
  • Methods: The authors use a machine-learning approach with 21 features to predict the first response latency of maintainers and contributors. They evaluate seven types of classifiers and perform permutation feature importance and SHAP analyses to understand the impact of different features on the predicted response latencies.
  • Results: The authors achieve an average improvement of 33% in AUC-ROC and 58% in AUC-PR for maintainers, as well as 42% in AUC-ROC and 95% in AUC-PR for contributors compared to a no-skilled classifier across the projects. They find that PRs submitted earlier in the week, containing an average or slightly above-average number of commits, and with concise descriptions are more likely to receive faster first responses from the maintainers. Similarly, PRs with a lower first response latency from maintainers, that received the first response of maintainers earlier in the week, and containing an average or slightly above-average number of commits tend to receive faster first responses from the contributors. Additionally, contributors with a higher acceptance rate and a history of timely responses in the project are likely to both obtain and provide faster first responses.
    Abstract The success of a Pull Request (PR) depends on the responsiveness of the maintainers and the contributor during the review process. Being aware of the expected waiting times can lead to better interactions and managed expectations for both the maintainers and the contributor. In this paper, we propose a machine-learning approach to predict the first response latency of the maintainers following the submission of a PR, and the first response latency of the contributor after receiving the first response from the maintainers. We curate a dataset of 20 large and popular open-source projects on GitHub and extract 21 features to characterize projects, contributors, PRs, and review processes. Using these features, we then evaluate seven types of classifiers to identify the best-performing models. We also perform permutation feature importance and SHAP analyses to understand the importance and impact of different features on the predicted response latencies. Our best-performing models achieve an average improvement of 33% in AUC-ROC and 58% in AUC-PR for maintainers, as well as 42% in AUC-ROC and 95% in AUC-PR for contributors compared to a no-skilled classifier across the projects. Our findings indicate that PRs submitted earlier in the week, containing an average or slightly above-average number of commits, and with concise descriptions are more likely to receive faster first responses from the maintainers. Similarly, PRs with a lower first response latency from maintainers, that received the first response of maintainers earlier in the week, and containing an average or slightly above-average number of commits tend to receive faster first responses from the contributors. Additionally, contributors with a higher acceptance rate and a history of timely responses in the project are likely to both obtain and provide faster first responses.
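    Code sketch (illustrative): the shape of the modelling pipeline — train a classifier on PR/project features, report AUC-ROC and AUC-PR, and inspect permutation feature importance. The synthetic features and the single random forest are placeholders for the paper's 21 curated features and seven classifier families.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.metrics import average_precision_score, roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=3000, n_features=21, n_informative=8,
                               weights=[0.7, 0.3], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
    proba = clf.predict_proba(X_te)[:, 1]
    print(f"AUC-ROC: {roc_auc_score(y_te, proba):.3f}  "
          f"AUC-PR: {average_precision_score(y_te, proba):.3f}")

    imp = permutation_importance(clf, X_te, y_te, scoring="roc_auc",
                                 n_repeats=10, random_state=0)
    top = np.argsort(imp.importances_mean)[::-1][:5]
    print("top features by permutation importance:", top)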

Dynamic Local Attention with Hierarchical Patching for Irregular Clinical Time Series

  • paper_url: http://arxiv.org/abs/2311.07744
  • repo_url: None
  • paper_authors: Xingyu Chen, Xiaochen Zheng, Amina Mollaysa, Manuel Schürch, Ahmed Allam, Michael Krauthammer
  • for: Handling irregular multivariate time series data, which is prevalent in the clinical and healthcare domains and exhibits both time-wise and feature-wise irregularities.
  • methods: A new model architecture with two modules: (1) DLA, a dynamic local attention mechanism that uses learnable queries and feature-specific local windows when computing self-attention, aggregating the irregular time steps of the raw input within each window into a harmonized, regular latent representation while accounting for the different sampling rates of different features; and (2) a hierarchical MLP mixer that processes the DLA output through multi-scale patching to leverage information at various scales for downstream tasks.
  • results: The approach outperforms state-of-the-art methods on three real-world datasets, including the latest clinical MIMIC IV dataset.
    Abstract Irregular multivariate time series data is prevalent in the clinical and healthcare domains. It is characterized by time-wise and feature-wise irregularities, making it challenging for machine learning methods to work with. To solve this, we introduce a new model architecture composed of two modules: (1) DLA, a Dynamic Local Attention mechanism that uses learnable queries and feature-specific local windows when computing the self-attention operation. This results in aggregating irregular time steps raw input within each window to a harmonized regular latent space representation while taking into account the different features' sampling rates. (2) A hierarchical MLP mixer that processes the output of DLA through multi-scale patching to leverage information at various scales for the downstream tasks. Our approach outperforms state-of-the-art methods on three real-world datasets, including the latest clinical MIMIC IV dataset.

A Simple Quantum Blockmodeling with Qubits and Permutations

  • paper_url: http://arxiv.org/abs/2311.07726
  • repo_url: None
  • paper_authors: Ammar Daskin
  • For: The paper introduces a quantum blockmodeling approach, based on permutation matrices, for data analysis tasks.
  • Methods: Blockmodeling is performed by permuting the rows and columns of an $N\times N$ adjacency matrix; on quantum computers these permutations can be applied efficiently and in parallel (an implementation can be as simple as a single-qubit NOT gate), and the measurement outcome of a small group of qubits is mapped to the fitness value.
  • Results: The fitness value can be found or updated in $O(\log N)$ time, versus $O(N)$ classically, so when the number of iterations is less than $\log N$ the same solution may be reached exponentially faster on quantum computers. Moreover, since different sequences of permutations can be applied in superposition, the machine learning task in this model can be implemented more efficiently on quantum hardware.
    Abstract Blockmodeling of a given problem represented by an $N\times N$ adjacency matrix can be found by swapping rows and columns of the matrix (i.e. multiplying the matrix from left and right by a permutation matrix). In general, through performing this task, row and column permutations affect the fitness value in optimization: For an $N\times N$ matrix, it requires $O(N)$ computations to find (or update) the fitness value of a candidate solution. On quantum computers, permutations can be applied in parallel and efficiently, and their implementations can be as simple as a single qubit operation (a NOT gate on a qubit), which takes an $O(1)$ time algorithmic step. In this paper, using permutation matrices, we describe a quantum blockmodeling for data analysis tasks. In the model, the measurement outcome of a small group of qubits is mapped to indicate the fitness value. Therefore, we show that it is possible to find or update the fitness value in $O(\log(N))$ time. This leads us to show that when the number of iterations is less than $\log(N)$, it may be possible to reach the same solution exponentially faster on quantum computers in comparison to classical computers. In addition, since on quantum circuits different sequences of permutations can be applied in parallel (superposition), the machine learning task in this model can be implemented more efficiently on quantum computers.
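    Code sketch (illustrative): the classical counterpart of what the paper accelerates — permuting rows and columns of an adjacency matrix (P A P^T) and scoring how well edges fall into diagonal blocks, with each candidate evaluation paid for classically. The planted block structure, the fitness function, and the random-swap search are all assumptions made for the demo.

    import numpy as np

    rng = np.random.default_rng(0)
    N, k = 16, 4                                    # N nodes, k equally sized blocks
    blocks = np.repeat(np.arange(k), N // k)
    probs = np.where(blocks[:, None] == blocks[None, :], 0.8, 0.05)
    A = (rng.random((N, N)) < probs).astype(float)  # planted block structure

    def fitness(A, perm, blocks):
        """Fraction of edge weight inside the diagonal blocks after permuting."""
        Ap = A[np.ix_(perm, perm)]                  # same as P @ A @ P.T
        inside = blocks[:, None] == blocks[None, :]
        return Ap[inside].sum() / max(Ap.sum(), 1.0)

    perm = rng.permutation(N)                       # start from a scrambled node order
    best = fitness(A, perm, blocks)
    for _ in range(2000):                           # random-swap hill climbing
        i, j = rng.integers(N, size=2)
        cand = perm.copy()
        cand[i], cand[j] = cand[j], cand[i]
        f = fitness(A, cand, blocks)
        if f > best:
            perm, best = cand, f
    print(f"in-block edge fraction after search: {best:.3f}")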

Deep Phenotyping of Non-Alcoholic Fatty Liver Disease Patients with Genetic Factors for Insights into the Complex Disease

  • paper_url: http://arxiv.org/abs/2311.08428
  • repo_url: None
  • paper_authors: Tahmina Sultana Priya, Fan Leng, Anthony C. Luehrs, Eric W. Klee, Alina M. Allen, Konstantinos N. Lazaridis, Danfeng Yao, Shulan Tian
  • for: This study aimed to identify subgroups of non-alcoholic fatty liver disease (NAFLD) patients based on demographic, clinical, and genetic characteristics for precision medicine.
  • methods: The study used genomic and phenotypic data from 3,408 NAFLD cases and 4,739 controls, including demographic, clinical, and comorbidity data, and genotype information obtained through whole exome sequencing. A chi-square test and a stepwise backward-forward regression model were used to determine factors highly relevant to NAFLD, and latent class analysis (LCA) was applied to identify subgroups.
  • results: The study identified 5 latent subgroups of NAFLD patients, characterized by metabolic syndrome, obesity, different comorbidities, psychoneurological factors, and genetic factors. Cluster 2 had a significantly higher rate of complex disease outcomes than the other clusters, including fibrosis, cirrhosis, hepatocellular carcinoma (HCC), and liver failure.
    Abstract Non-alcoholic fatty liver disease (NAFLD) is a prevalent chronic liver disorder characterized by the excessive accumulation of fat in the liver in individuals who do not consume significant amounts of alcohol, with risk factors including obesity, insulin resistance, type 2 diabetes, etc. We aim to identify subgroups of NAFLD patients based on demographic, clinical, and genetic characteristics for precision medicine. The genomic and phenotypic data (3,408 cases and 4,739 controls) for this study were gathered from participants in the Mayo Clinic Tapestry Study (IRB#19-000001) and their electronic health records, including their demographic, clinical, and comorbidity data, and the genotype information through whole exome sequencing performed at Helix using the Exome+® Assay according to standard procedure (www.helix.com). Factors highly relevant to NAFLD were determined by the chi-square test and a stepwise backward-forward regression model. Latent class analysis (LCA) was performed on NAFLD cases using significant indicator variables to identify subgroups. The optimal clustering revealed 5 latent subgroups from 2,013 NAFLD patients (mean age 60.6 years and 62.1% women), while a polygenic risk score based on 6 single-nucleotide polymorphism (SNP) variants and disease outcomes were used to analyze the subgroups. The groups are characterized by metabolic syndrome, obesity, different comorbidities, psychoneurological factors, and genetic factors. Odds ratios were utilized to compare the risk of complex diseases, such as fibrosis, cirrhosis, and hepatocellular carcinoma (HCC), as well as liver failure between the clusters. Cluster 2 has a significantly higher complex disease outcome compared to other clusters. Keywords: Fatty liver disease; Polygenic risk score; Precision medicine; Deep phenotyping; NAFLD comorbidities; Latent class analysis.
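    Code sketch (illustrative): latent class analysis on binary indicators is a mixture of independent Bernoullis, fit here with a compact EM loop on synthetic comorbidity-style data. The number of classes, the indicators, and the data are assumptions for the demo, not the study's variables or estimation software.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, K = 2000, 8, 3
    true_theta = rng.uniform(0.05, 0.95, size=(K, d))   # per-class indicator probabilities
    z = rng.integers(K, size=n)
    X = (rng.random((n, d)) < true_theta[z]).astype(float)

    def lca_em(X, K, n_iter=200, seed=0):
        rng = np.random.default_rng(seed)
        n, d = X.shape
        pi = np.full(K, 1.0 / K)
        theta = rng.uniform(0.25, 0.75, size=(K, d))
        for _ in range(n_iter):
            # E-step: responsibilities from Bernoulli log-likelihoods
            log_lik = X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T + np.log(pi)
            log_lik -= log_lik.max(axis=1, keepdims=True)
            resp = np.exp(log_lik)
            resp /= resp.sum(axis=1, keepdims=True)
            # M-step: class sizes and indicator probabilities
            Nk = resp.sum(axis=0)
            pi = Nk / n
            theta = np.clip((resp.T @ X) / Nk[:, None], 1e-4, 1 - 1e-4)
        return pi, theta, resp

    pi, theta, resp = lca_em(X, K)
    print("estimated latent class sizes:", np.round(np.sort(pi), 3))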

Matching aggregate posteriors in the variational autoencoder

  • paper_url: http://arxiv.org/abs/2311.07693
  • repo_url: None
  • paper_authors: Surojit Saha, Sarang Joshi, Ross Whitaker
  • for: Improving the reliability of VAEs by addressing two well-known failure modes: "pockets/holes" in the latent distribution (a failure to match the prior) and posterior collapse, which loses information in the latent space.
  • methods: Building on the theoretical framework of the VAE, the objective is reformulated so that the aggregate (marginal) posterior is matched to the prior, with a kernel density estimate (KDE) used to model the aggregate posterior in high dimensions; the resulting method is the aggregate variational autoencoder (AVAE).
  • results: Empirical evaluation on multiple benchmark data sets demonstrates the effectiveness of the AVAE relative to state-of-the-art methods.
    Abstract The variational autoencoder (VAE) is a well-studied, deep, latent-variable model (DLVM) that efficiently optimizes the variational lower bound of the log marginal data likelihood and has a strong theoretical foundation. However, the VAE's known failure to match the aggregate posterior often results in \emph{pockets/holes} in the latent distribution (i.e., a failure to match the prior) and/or \emph{posterior collapse}, which is associated with a loss of information in the latent space. This paper addresses these shortcomings in VAEs by reformulating the objective function associated with VAEs in order to match the aggregate/marginal posterior distribution to the prior. We use kernel density estimate (KDE) to model the aggregate posterior in high dimensions. The proposed method is named the \emph{aggregate variational autoencoder} (AVAE) and is built on the theoretical framework of the VAE. Empirical evaluation of the proposed method on multiple benchmark data sets demonstrates the effectiveness of the AVAE relative to state-of-the-art (SOTA) methods.
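    Code sketch (illustrative): one way to express the core idea — estimate the aggregate posterior over a batch of latent codes with a Gaussian KDE and penalize its divergence from the standard-normal prior. The bandwidth, the leave-one-out estimator, and the Monte Carlo KL form are assumptions for the demo; the AVAE's actual objective may differ in its details.

    import math
    import torch

    def kde_log_density(z, bandwidth=0.5):
        """Leave-one-out Gaussian KDE estimate of the aggregate posterior's log-density
        evaluated at each latent code in the batch. z: (N, D)."""
        n, d = z.shape
        sq = ((z.unsqueeze(1) - z.unsqueeze(0)) ** 2).sum(-1)      # (N, N) squared distances
        log_kernel = -sq / (2 * bandwidth ** 2) - 0.5 * d * math.log(2 * math.pi * bandwidth ** 2)
        eye = torch.eye(n, dtype=torch.bool, device=z.device)
        log_kernel = log_kernel.masked_fill(eye, float("-inf"))    # exclude the point itself
        return torch.logsumexp(log_kernel, dim=1) - math.log(n - 1)

    def aggregate_matching_loss(z):
        """Monte Carlo estimate of KL(aggregate posterior || N(0, I)) from a batch of codes."""
        log_q = kde_log_density(z)
        log_p = -0.5 * (z ** 2).sum(-1) - 0.5 * z.shape[1] * math.log(2 * math.pi)
        return (log_q - log_p).mean()

    # Toy usage: during VAE training z would come from the encoder (e.g. mu + eps * sigma)
    # and this term would be added to the reconstruction loss.
    z = torch.randn(128, 8, requires_grad=True)
    loss = aggregate_matching_loss(z)
    loss.backward()
    print(float(loss))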

Feature emergence via margin maximization: case studies in algebraic tasks

  • paper_url: http://arxiv.org/abs/2311.07568
  • repo_url: None
  • paper_authors: Depen Morwani, Benjamin L. Edelman, Costin-Andrei Oncescu, Rosie Zhao, Sham Kakade
  • for: Understanding the internal representations learned by neural networks — specifically, why networks arrive at particular computational strategies — on the algebraic learning tasks of modular addition, sparse parities, and finite group operations.
  • methods: The principle of margin maximization alone is used to analytically and fully characterize the features learned by stylized neural networks on these tasks.
  • results: Trained networks are shown to use Fourier features to perform modular addition and features corresponding to irreducible group-theoretic representations to perform composition in general groups, aligning closely with the empirical observations of Nanda et al. and Chughtai et al.
    Abstract Understanding the internal representations learned by neural networks is a cornerstone challenge in the science of machine learning. While there have been significant recent strides in some cases towards understanding how neural networks implement specific target functions, this paper explores a complementary question -- why do networks arrive at particular computational strategies? Our inquiry focuses on the algebraic learning tasks of modular addition, sparse parities, and finite group operations. Our primary theoretical findings analytically characterize the features learned by stylized neural networks for these algebraic tasks. Notably, our main technique demonstrates how the principle of margin maximization alone can be used to fully specify the features learned by the network. Specifically, we prove that the trained networks utilize Fourier features to perform modular addition and employ features corresponding to irreducible group-theoretic representations to perform compositions in general groups, aligning closely with the empirical observations of Nanda et al. and Chughtai et al. More generally, we hope our techniques can help to foster a deeper understanding of why neural networks adopt specific computational strategies.
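    Code sketch (illustrative): the Fourier mechanism for modular addition can be checked directly — logit(c) = sum_k cos(2*pi*k*(a + b - c)/p) equals p when c = (a + b) mod p and 0 otherwise, so the correct residue wins by a margin of p. The modulus and inputs below are arbitrary choices for the demo.

    import numpy as np

    p = 113
    a, b = 47, 92
    k = np.arange(p)
    logits = np.array([np.cos(2 * np.pi * k * (a + b - c) / p).sum() for c in range(p)])
    print("argmax:", int(logits.argmax()), " (a + b) mod p:", (a + b) % p)
    # The same quantity factors into features of a, b and c separately via
    # cos(x + y) = cos(x)cos(y) - sin(x)sin(y), which is how a network can realize it
    # from embeddings cos(2*pi*k*a/p), sin(2*pi*k*a/p), etc.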

Exploration via linearly perturbed loss minimisation

  • paper_url: http://arxiv.org/abs/2311.07565
  • repo_url: https://github.com/davidjanz/evill-code
  • paper_authors: David Janz, Shuai Liu, Alex Ayoub, Csaba Szepesvári
  • for: Exploration in structured stochastic bandit problems.
  • methods: The paper introduces exploration via linearly perturbed loss minimisation (EVILL), a randomised exploration method that plays the minimiser of a linearly perturbed regularised negative log-likelihood function. In generalised linear bandits, EVILL reduces to perturbed history exploration (PHE), where exploration is done by training on randomly perturbed rewards.
  • results: With the proposed data-dependent perturbations, not present in previous PHE-type methods, EVILL matches the performance of Thompson-sampling-style parameter-perturbation methods in both theory and practice. An example outside generalised linear bandits shows PHE producing inconsistent estimates, and hence linear regret, while EVILL remains performant. Like PHE, EVILL can be implemented in just a few lines of code.
    Abstract We introduce exploration via linear loss perturbations (EVILL), a randomised exploration method for structured stochastic bandit problems that works by solving for the minimiser of a linearly perturbed regularised negative log-likelihood function. We show that, for the case of generalised linear bandits, EVILL reduces to perturbed history exploration (PHE), a method where exploration is done by training on randomly perturbed rewards. In doing so, we provide a simple and clean explanation of when and why random reward perturbations give rise to good bandit algorithms. With the data-dependent perturbations we propose, not present in previous PHE-type methods, EVILL is shown to match the performance of Thompson-sampling-style parameter-perturbation methods, both in theory and in practice. Moreover, we show an example outside of generalised linear bandits where PHE leads to inconsistent estimates, and thus linear regret, while EVILL remains performant. Like PHE, EVILL can be implemented in just a few lines of code.
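    Code sketch (illustrative): in the linear-Gaussian case, perturbed-history exploration amounts to refitting ridge regression on rewards perturbed with fresh noise and acting greedily on the perturbed estimate. The noise scale, regularizer, and toy environment below are placeholder choices, not the tuned data-dependent perturbations EVILL prescribes.

    import numpy as np

    def phe_linear_bandit(actions, reward_fn, T=500, noise_scale=1.0, reg=1.0, seed=0):
        """Perturbed-history exploration for a linear bandit with a fixed action set."""
        rng = np.random.default_rng(seed)
        K, d = actions.shape
        X, y, chosen = [], [], []
        for t in range(T):
            if t == 0:
                k = int(rng.integers(K))                      # start with a random pull
            else:
                Xa, ya = np.asarray(X), np.asarray(y)
                yp = ya + noise_scale * rng.standard_normal(len(ya))   # perturbed history
                theta = np.linalg.solve(Xa.T @ Xa + reg * np.eye(d), Xa.T @ yp)
                k = int(np.argmax(actions @ theta))           # greedy w.r.t. the perturbed fit
            X.append(actions[k])
            y.append(reward_fn(k))
            chosen.append(k)
        return np.array(chosen)

    rng = np.random.default_rng(1)
    theta_star = rng.standard_normal(5)                        # unknown parameter (toy)
    acts = rng.standard_normal((20, 5))
    picks = phe_linear_bandit(acts, lambda k: acts[k] @ theta_star + 0.1 * rng.standard_normal())
    print("late pulls on the best arm:", np.mean(picks[-100:] == np.argmax(acts @ theta_star)))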

Learning Control Policies of Hodgkin-Huxley Neuronal Dynamics

  • paper_url: http://arxiv.org/abs/2311.07563
  • repo_url: None
  • paper_authors: Malvern Madondo, Deepanshu Verma, Lars Ruthotto, Nicholas Au Yong
  • for: Developing a neural-network approach to closed-loop deep brain stimulation (DBS) that optimizes therapeutic outcomes.
  • methods: Finding an optimal neurostimulation strategy is cast as a control problem in which DBS parameters are tailored in real time based on the patient's ongoing neuronal activity, modeled by the nonlinear, stiff Hodgkin-Huxley equations. The value function is approximated offline with a neural network so that controls (stimuli) can be generated in real time via the feedback form, and training exploits the relationship between Pontryagin's maximum principle and the Hamilton-Jacobi-Bellman equations.
  • results: Numerical experiments demonstrate the accuracy of the approach on out-of-distribution samples and its robustness to moderate shocks and disturbances in the system.
    Abstract We present a neural network approach for closed-loop deep brain stimulation (DBS). We cast the problem of finding an optimal neurostimulation strategy as a control problem. In this setting, control policies aim to optimize therapeutic outcomes by tailoring the parameters of a DBS system, typically via electrical stimulation, in real time based on the patient's ongoing neuronal activity. We approximate the value function offline using a neural network to enable generating controls (stimuli) in real time via the feedback form. The neuronal activity is characterized by a nonlinear, stiff system of differential equations as dictated by the Hodgkin-Huxley model. Our training process leverages the relationship between Pontryagin's maximum principle and Hamilton-Jacobi-Bellman equations to update the value function estimates simultaneously. Our numerical experiments illustrate the accuracy of our approach for out-of-distribution samples and the robustness to moderate shocks and disturbances in the system.

Data-Efficient Task Generalization via Probabilistic Model-based Meta Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.07558
  • repo_url: None
  • paper_authors: Arjun Bhardwaj, Jonas Rothfuss, Bhavya Sukhija, Yarden As, Marco Hutter, Stelian Coros, Andreas Krause
  • for: This work designs PACOH-RL, a model-based meta-reinforcement-learning (Meta-RL) algorithm for efficiently adapting control policies to changing dynamics.
  • methods: PACOH-RL meta-learns priors for the dynamics model, enabling swift adaptation to new dynamics with minimal interaction data. It incorporates regularization and epistemic uncertainty quantification in both the meta-learning and task adaptation stages, and uses these uncertainty estimates to guide exploration and data collection when facing new dynamics.
  • results: Experiments show that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions, and a demonstration on a real robotic car shows efficient RL policy adaptation in diverse, data-scarce conditions.
    Abstract We introduce PACOH-RL, a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics. PACOH-RL meta-learns priors for the dynamics model, allowing swift adaptation to new dynamics with minimal interaction data. Existing Meta-RL methods require abundant meta-learning data, limiting their applicability in settings such as robotics, where data is costly to obtain. To address this, PACOH-RL incorporates regularization and epistemic uncertainty quantification in both the meta-learning and task adaptation stages. When facing new dynamics, we use these uncertainty estimates to effectively guide exploration and data collection. Overall, this enables positive transfer, even when access to data from prior tasks or dynamic settings is severely limited. Our experiment results demonstrate that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions. Finally, on a real robotic car, we showcase the potential for efficient RL policy adaptation in diverse, data-scarce conditions.

Tabdoor: Backdoor Vulnerabilities in Transformer-based Neural Networks for Tabular Data

  • paper_url: http://arxiv.org/abs/2311.07550
  • repo_url: None
  • paper_authors: Bart Pleiter, Behrad Tajalli, Stefanos Koffas, Gorka Abad, Jing Xu, Martha Larson, Stjepan Picek
  • for: Studying backdoor attacks on, and defenses for, deep neural networks trained on tabular data.
  • methods: A systematic experimental study of backdoor attacks on transformer-based models for tabular data across benchmark datasets.
  • results: Transformer-based DNNs for tabular data are highly susceptible to backdoor attacks, with nearly perfect attack success rates (approx. 100%) achievable through minimal feature value alterations; among the defenses evaluated, Spectral Signatures proves the most effective.
    Abstract Deep neural networks (DNNs) have shown great promise in various domains. Alongside these developments, vulnerabilities associated with DNN training, such as backdoor attacks, are a significant concern. These attacks involve the subtle insertion of triggers during model training, allowing for manipulated predictions. More recently, DNNs for tabular data have gained increasing attention due to the rise of transformer models. Our research presents a comprehensive analysis of backdoor attacks on tabular data using DNNs, particularly focusing on transformer-based networks. Given the inherent complexities of tabular data, we explore the challenges of embedding backdoors. Through systematic experimentation across benchmark datasets, we uncover that transformer-based DNNs for tabular data are highly susceptible to backdoor attacks, even with minimal feature value alterations. Our results indicate nearly perfect attack success rates (approx100%) by introducing novel backdoor attack strategies to tabular data. Furthermore, we evaluate several defenses against these attacks, identifying Spectral Signatures as the most effective one. Our findings highlight the urgency to address such vulnerabilities and provide insights into potential countermeasures for securing DNN models against backdoors on tabular data.
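    Code sketch (illustrative): the mechanics of a tabular backdoor — set one feature of a small fraction of training rows to an out-of-range trigger value, relabel them to the attacker's target class, and measure how often the trigger flips test-time predictions (attack success rate). The trigger design, poisoning rate, and the gradient-boosting victim are assumptions for the demo; the paper studies transformer-based models and its own attack strategies.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X, y = make_classification(n_samples=4000, n_features=20, n_informative=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

    trigger_feature, trigger_value, target_class, poison_rate = 3, 6.0, 1, 0.02
    idx = rng.choice(len(X_tr), size=int(poison_rate * len(X_tr)), replace=False)
    X_tr_p, y_tr_p = X_tr.copy(), y_tr.copy()
    X_tr_p[idx, trigger_feature] = trigger_value   # implant the trigger
    y_tr_p[idx] = target_class                     # relabel to the target class

    clf = GradientBoostingClassifier(random_state=0).fit(X_tr_p, y_tr_p)

    clean_acc = clf.score(X_te, y_te)
    X_te_trig = X_te.copy()
    X_te_trig[:, trigger_feature] = trigger_value
    victims = y_te != target_class                 # only non-target-class rows count
    asr = (clf.predict(X_te_trig[victims]) == target_class).mean()
    print(f"clean accuracy: {clean_acc:.3f}  attack success rate: {asr:.3f}")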

Interpretable Fine-Tuning for Graph Neural Network Surrogate Models

  • paper_url: http://arxiv.org/abs/2311.07548
  • repo_url: None
  • paper_authors: Shivam Barwey, Romit Maulik
  • for: Proposing an interpretable fine-tuning strategy for graph neural network (GNN) surrogates of unstructured mesh-based fluid dynamics.
  • methods: The strategy relies on an adaptive sub-graph sampling procedure that, in the forward pass and as an explicit function of the input, isolates the regions of physical space intrinsically linked to the forecasting task while retaining the predictive capability of the pre-trained baseline GNN.
  • results: The fine-tuned GNN adds interpretability to the baseline by exposing the identified structures as an accessible link between the model architecture, the optimization goal, and known problem-specific physics; with an additional regularization procedure, it can also tag, during inference, the graph nodes responsible for the majority of the anticipated forecasting error. Demonstrations use unstructured flow data from flow over a backward-facing step at high Reynolds numbers.
    Abstract Data-based surrogate modeling has surged in capability in recent years with the emergence of graph neural networks (GNNs), which can operate directly on mesh-based representations of data. The goal of this work is to introduce an interpretable fine-tuning strategy for GNNs, with application to unstructured mesh-based fluid dynamics modeling. The end result is a fine-tuned GNN that adds interpretability to a pre-trained baseline GNN through an adaptive sub-graph sampling strategy that isolates regions in physical space intrinsically linked to the forecasting task, while retaining the predictive capability of the baseline. The structures identified by the fine-tuned GNNs, which are adaptively produced in the forward pass as explicit functions of the input, serve as an accessible link between the baseline model architecture, the optimization goal, and known problem-specific physics. Additionally, through a regularization procedure, the fine-tuned GNNs can also be used to identify, during inference, graph nodes that correspond to a majority of the anticipated forecasting error, adding a novel interpretable error-tagging capability to baseline models. Demonstrations are performed using unstructured flow data sourced from flow over a backward-facing step at high Reynolds numbers.

mlscorecheck: Testing the consistency of reported performance scores and experiments in machine learning

  • paper_url: http://arxiv.org/abs/2311.07541
  • repo_url: None
  • paper_authors: György Kovács, Attila Fazekas
  • for: validate reported experimental results in artificial intelligence
  • methods: numerical techniques for identifying inconsistencies in machine learning problems
  • results: developed an open-source package (mlscorecheck) with specific test bundles to detect systematically recurring flaws in various fields
    Abstract Addressing the reproducibility crisis in artificial intelligence through the validation of reported experimental results is a challenging task. It necessitates either the reimplementation of techniques or a meticulous assessment of papers for deviations from the scientific method and best statistical practices. To facilitate the validation of reported results, we have developed numerical techniques capable of identifying inconsistencies between reported performance scores and various experimental setups in machine learning problems, including binary/multiclass classification and regression. These consistency tests are integrated into the open-source package mlscorecheck, which also provides specific test bundles designed to detect systematically recurring flaws in various fields, such as retina image processing and synthetic minority oversampling.
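    Code sketch (illustrative): the core consistency-testing idea from scratch — on a dataset with known class counts, reported accuracy, sensitivity, and specificity must all be reproducible from a single integer confusion matrix (up to rounding). This is not the mlscorecheck package's API, just a minimal demonstration of the principle.

    def consistent_binary_scores(p, n, acc, sens, spec, eps=1e-4):
        """Return (True, (tp, tn)) if some integer confusion matrix on p positives and
        n negatives reproduces all three scores within the rounding tolerance eps."""
        for tp in range(p + 1):
            if abs(tp / p - sens) > eps:
                continue
            for tn in range(n + 1):
                if abs(tn / n - spec) > eps:
                    continue
                if abs((tp + tn) / (p + n) - acc) <= eps:
                    return True, (tp, tn)
        return False, None

    # 100 positives, 200 negatives: sens=0.95 and spec=0.85 force acc = 265/300 ~ 0.883,
    # so a reported accuracy of 0.9 is inconsistent with the other two scores.
    print(consistent_binary_scores(100, 200, acc=0.9, sens=0.95, spec=0.85))
    print(consistent_binary_scores(100, 200, acc=0.8833, sens=0.95, spec=0.85, eps=5e-4))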

Estimating optical vegetation indices with Sentinel-1 SAR data and AutoML

  • paper_url: http://arxiv.org/abs/2311.07537
  • repo_url: None
  • paper_authors: Daniel Paluba, Bertrand Le Saux, Francesco Sarti, Přemysl Stych
  • for: Using Sentinel-1 SAR data, which penetrates clouds and is acquired day and night, as a substitute for optical data when estimating optical vegetation indices (VIs) for forest monitoring, thereby providing complete time series with better temporal and spatial resolution.
  • methods: Time series of four VIs (LAI, FAPAR, EVI and NDVI) were estimated from multitemporal Sentinel-1 SAR and ancillary data using a paired multi-temporal and multi-modal dataset created in Google Earth Engine (MMT-GEE), which temporally and spatially aligns Sentinel-1, Sentinel-2, digital elevation model (DEM), weather and land cover data; ancillary features generated from the DEM and weather data improved the results.
  • results: The open-source AutoML approach auto-sklearn outperformed Random Forest Regression for three of the four VIs, and a 1-hour optimization budget was enough to reach an R2 of 69-84% with low errors (MAE of 0.05-0.32 depending on the VI). Case studies showed good agreement in both the time-series analysis and the spatial comparison between original and estimated SAR-based VIs, which offer better temporal resolution (up to 240 measurements/year), 20 m spatial resolution, and the ability to detect abrupt forest changes with sub-weekly accuracy.
    Abstract Current optical vegetation indices (VIs) for monitoring forest ecosystems are widely used in various applications. However, continuous monitoring based on optical satellite data can be hampered by atmospheric effects such as clouds. On the contrary, synthetic aperture radar (SAR) data can offer insightful and systematic forest monitoring with complete time series due to signal penetration through clouds and day and night acquisitions. The goal of this work is to overcome the issues affecting optical data with SAR data and serve as a substitute for estimating optical VIs for forests using machine learning. Time series of four VIs (LAI, FAPAR, EVI and NDVI) were estimated using multitemporal Sentinel-1 SAR and ancillary data. This was enabled by creating a paired multi-temporal and multi-modal dataset in Google Earth Engine (GEE), including temporally and spatially aligned Sentinel-1, Sentinel-2, digital elevation model (DEM), weather and land cover datasets (MMT-GEE). The use of ancillary features generated from DEM and weather data improved the results. The open-source Automatic Machine Learning (AutoML) approach, auto-sklearn, outperformed Random Forest Regression for three out of four VIs, while a 1-hour optimization length was enough to achieve sufficient results with an R2 of 69-84% low errors (0.05-0.32 of MAE depending on VI). Great agreement was also found for selected case studies in the time series analysis and in the spatial comparison between the original and estimated SAR-based VIs. In general, compared to VIs from currently freely available optical satellite data and available global VI products, a better temporal resolution (up to 240 measurements/year) and a better spatial resolution (20 m) were achieved using estimated SAR-based VIs. A great advantage of the SAR-based VI is the ability to detect abrupt forest changes with a sub-weekly temporal accuracy.

Unsupervised Musical Object Discovery from Audio

  • paper_url: http://arxiv.org/abs/2311.07534
  • repo_url: https://github.com/arahosu/musicslots
  • paper_authors: Joonsu Gha, Vincent Herrmann, Benjamin Grewe, Jürgen Schmidhuber, Anand Gopalakrishnan
  • for: Decomposing music audio into its constituent objects without supervision.
  • methods: MusicSlots adapts the SlotAttention architecture to the audio domain for unsupervised music decomposition; because opacity and occlusion in vision have no auditory analogues, the softmax normalization of alpha masks used in the decoders of visual object-centric models is ill-suited to audio, and MusicSlots overcomes this. A spectrogram-based multi-object music dataset tailored to western tonal music is introduced for evaluation.
  • results: MusicSlots achieves good performance on unsupervised note discovery and outperforms several established baselines on supervised note property prediction tasks.
    Abstract Current object-centric learning models such as the popular SlotAttention architecture allow for unsupervised visual scene decomposition. Our novel MusicSlots method adapts SlotAttention to the audio domain, to achieve unsupervised music decomposition. Since concepts of opacity and occlusion in vision have no auditory analogues, the softmax normalization of alpha masks in the decoders of visual object-centric models is not well-suited for decomposing audio objects. MusicSlots overcomes this problem. We introduce a spectrogram-based multi-object music dataset tailored to evaluate object-centric learning on western tonal music. MusicSlots achieves good performance on unsupervised note discovery and outperforms several established baselines on supervised note property prediction tasks.

Automatic Identification of Driving Maneuver Patterns using a Robust Hidden Semi-Markov Models

  • paper_url: http://arxiv.org/abs/2311.07527
  • repo_url: None
  • paper_authors: Matthew Aguirre, Wenbo Sun, Jionghua Jin, Yang Chen
  • for: Automatically modeling driving maneuver patterns from naturalistic sequential kinematic driving data, with applications in transportation research areas such as eco-driving, road safety, and intelligent vehicles.
  • methods: The Hierarchical Dirichlet Process Hidden Semi-Markov Model (HDP-HSMM) is used to automatically cluster the data, estimating data segmentation, state durations, and transition probabilities; because existing HDP-HSMM estimation tends to overestimate the number of states, a new robust HDP-HSMM (rHDP-HSMM) is proposed to reduce redundant states and improve the consistency of the model's estimation.
  • results: A simulation study and a case study on naturalistic driving data demonstrate the effectiveness of the proposed rHDP-HSMM in identifying and inferring driving maneuver patterns.
    Abstract There is an increase in interest to model driving maneuver patterns via the automatic unsupervised clustering of naturalistic sequential kinematic driving data. The patterns learned are often used in transportation research areas such as eco-driving, road safety, and intelligent vehicles. One such model capable of modeling these patterns is the Hierarchical Dirichlet Process Hidden Semi-Markov Model (HDP-HSMM), as it is often used to estimate data segmentation, state duration, and transition probabilities. While this model is a powerful tool for automatically clustering observed sequential data, the existing HDP-HSMM estimation suffers from an inherent tendency to overestimate the number of states. This can result in poor estimation, which can potentially impact impact transportation research through incorrect inference of driving patterns. In this paper, a new robust HDP-HSMM (rHDP-HSMM) method is proposed to reduce the number of redundant states and improve the consistency of the model's estimation. Both a simulation study and a case study using naturalistic driving data are presented to demonstrate the effectiveness of the proposed rHDP-HSMM in identifying and inference of driving maneuver patterns.

Machine Learning For Beamline Steering

  • paper_url: http://arxiv.org/abs/2311.07519
  • repo_url: None
  • paper_authors: Isaac Kante
  • for: Improving beam steering in the LINAC To Undulator (LTU) section of the beamline, which is difficult to aim and must be re-calibrated on each use of the accelerator, consuming operator time and reducing the scientific throughput of the light source.
  • methods: Deep neural network models are trained on archival data and then validated on simulation data to assist in calibrating the magnets in this section.
  • results: The performance of the deep learning models is contrasted against that of trained human operators.
    Abstract Beam steering is the process involving the calibration of the angle and position at which a particle accelerator's electron beam is incident upon the x-ray target with respect to the rotation axis of the collimator. Beam Steering is an essential task for light sources. In the case under study, the LINAC To Undulator (LTU) section of the beamline is difficult to aim. Each use of the accelerator requires re-calibration of the magnets in this section. This involves a substantial amount of time and effort from human operators, while reducing scientific throughput of the light source. We investigate the use of deep neural networks to assist in this task. The deep learning models are trained on archival data and then validated on simulation data. The performance of the deep learning model is contrasted against that of trained human operators.

FEMDA: a unified framework for discriminant analysis

  • paper_url: http://arxiv.org/abs/2311.07518
  • repo_url: None
  • paper_authors: Pierre Houdouin, Matthieu Jonckheere, Frederic Pascal
  • For: The paper aims to address the limitations of classical methods such as linear and quadratic discriminant analysis when dealing with non-Gaussian distributions or contaminated datasets.
  • Methods: The paper presents a novel approach that uses an arbitrary Elliptically Symmetrical (ES) distribution per cluster with its own arbitrary scale parameter, allowing for potentially diverse and independent samples that may not follow identical distributions.
  • Results: The paper demonstrates that the new approach is simple, efficient, and robust compared to state-of-the-art methods, and that maximum-likelihood parameter estimation and classification can be easily derived.
    Abstract Although linear and quadratic discriminant analysis are widely recognized classical methods, they can encounter significant challenges when dealing with non-Gaussian distributions or contaminated datasets. This is primarily due to their reliance on the Gaussian assumption, which lacks robustness. We first explain and review the classical methods to address this limitation and then present a novel approach that overcomes these issues. In this new approach, the model considered is an arbitrary Elliptically Symmetrical (ES) distribution per cluster with its own arbitrary scale parameter. This flexible model allows for potentially diverse and independent samples that may not follow identical distributions. By deriving a new decision rule, we demonstrate that maximum-likelihood parameter estimation and classification are simple, efficient, and robust compared to state-of-the-art methods.
    摘要 Translated into Simplified Chinese:尽管线性和 quadratic discriminant analysis 是广泛认可的古典方法,但它们在面临非泊松分布或杂凑数据集时可能遇到 significiant 挑战。这主要是因为它们假设 Gaussian,这种假设缺乏稳定性。我们首先介绍和评论古典方法,然后提出一种新的方法,该方法可以在不同和独立的样本集中实现。在这种新方法中,每个分支 Considered 是一个自由的 Elliptically Symmetrical (ES) 分布,具有自己的自由拟合参数。这种灵活的模型允许样本可能不是完全相同的分布。我们 derivation 了一个新的决策规则,并证明了 maximum-likelihood 参数估计和分类是简单、高效、Robust 的 compared 于现有方法。

A Hypothesis on Good Practices for AI-based Systems for Financial Time Series Forecasting: Towards Domain-Driven XAI Methods

  • paper_url: http://arxiv.org/abs/2311.07513
  • repo_url: None
  • paper_authors: Branka Hadji Misheva, Joerg Osterrieder
  • for: Examining how explainable AI (XAI) methods should be deployed in financial prediction and forecasting tasks, where machine learning promises an enhanced customer experience, democratised financial services, improved consumer protection, and better risk management, but complex models lack the transparency such a sensitive domain requires.
  • methods: The paper reviews classical XAI methods such as LIME and SHAP, together with related techniques for explaining complex models, and discusses their limitations, including computational complexity, inherent model bias, sensitivity to data sampling, and challenges in dealing with feature dependence.
  • results: It argues for good practices when deploying explainability in AI-based systems for finance, emphasising data quality, audience-specific methods, consideration of data properties, and the stability of explanations, so as to address the unique challenges of the financial industry and guide the development of effective XAI tools.
    Abstract Machine learning and deep learning have become increasingly prevalent in financial prediction and forecasting tasks, offering advantages such as enhanced customer experience, democratising financial services, improving consumer protection, and enhancing risk management. However, these complex models often lack transparency and interpretability, making them challenging to use in sensitive domains like finance. This has led to the rise of eXplainable Artificial Intelligence (XAI) methods aimed at creating models that are easily understood by humans. Classical XAI methods, such as LIME and SHAP, have been developed to provide explanations for complex models. While these methods have made significant contributions, they also have limitations, including computational complexity, inherent model bias, sensitivity to data sampling, and challenges in dealing with feature dependence. In this context, this paper explores good practices for deploying explainability in AI-based systems for finance, emphasising the importance of data quality, audience-specific methods, consideration of data properties, and the stability of explanations. These practices aim to address the unique challenges and requirements of the financial industry and guide the development of effective XAI tools.

Machine learning for uncertainty estimation in fusing precipitation observations from satellites and ground-based gauges

  • paper_url: http://arxiv.org/abs/2311.07511
  • repo_url: None
  • paper_authors: Georgia Papacharalampous, Hristos Tyralis, Nikolaos Doulamis, Anastasios Doulamis
  • for: Producing precipitation datasets that are both accurate and spatially dense by merging satellite and gauge data, while also providing the uncertainty estimates that are rarely reported for such merged products.
  • methods: Six learners suited to predictive uncertainty quantification are compared: quantile regression (QR), quantile regression forests (QRF), generalized random forests (GRF), gradient boosting machines (GBM), light gradient boosting machines (LightGBM) and quantile regression neural networks (QRNN). The comparison covers predictive quantiles at nine levels, is scored mainly with the quantile and continuous ranked probability skill scores, and also contrasts three types of predictor variables (satellite precipitation variables, distances between a point of interest and satellite grid points, and elevation at a point of interest) using feature importance. A sketch of quantile-based uncertainty estimation follows the abstract.
  • results: On a 15-year monthly dataset spanning the contiguous United States, the learners rank, from best to worst, LightGBM, QRF, GRF, GBM, QRNN and QR.
    Abstract To form precipitation datasets that are accurate and, at the same time, have high spatial densities, data from satellites and gauges are often merged in the literature. However, uncertainty estimates for the data acquired in this manner are scarcely provided, although the importance of uncertainty quantification in predictive modelling is widely recognized. Furthermore, the benefits that machine learning can bring to the task of providing such estimates have not been broadly realized and properly explored through benchmark experiments. The present study aims at filling in this specific gap by conducting the first benchmark tests on the topic. On a large dataset that comprises 15-year-long monthly data spanning across the contiguous United States, we extensively compared six learners that are, by their construction, appropriate for predictive uncertainty quantification. These are the quantile regression (QR), quantile regression forests (QRF), generalized random forests (GRF), gradient boosting machines (GBM), light gradient boosting machines (LightGBM) and quantile regression neural networks (QRNN). The comparison referred to the competence of the learners in issuing predictive quantiles at nine levels that facilitate a good approximation of the entire predictive probability distribution, and was primarily based on the quantile and continuous ranked probability skill scores. Three types of predictor variables (i.e., satellite precipitation variables, distances between a point of interest and satellite grid points, and elevation at a point of interest) were used in the comparison and were additionally compared with each other. This additional comparison was based on the explainable machine learning concept of feature importance. The results suggest that the order from the best to the worst of the learners for the task investigated is the following: LightGBM, QRF, GRF, GBM, QRNN and QR...
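    Code sketch (illustrative): the flavour of quantile-based uncertainty estimation — fit one gradient-boosting model per quantile level and score each with the pinball (quantile) loss. Synthetic data stands in for the merged satellite/gauge predictors, and scikit-learn's gradient boosting stands in for the six learners compared in the study.

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.metrics import mean_pinball_loss
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(3000, 3))
    y = np.sin(X[:, 0]) + 0.1 * X[:, 1] + rng.gamma(shape=2.0, scale=0.2, size=len(X))
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    quantiles = [0.1, 0.25, 0.5, 0.75, 0.9]   # the study uses nine levels; five here for brevity
    for q in quantiles:
        model = GradientBoostingRegressor(loss="quantile", alpha=q, random_state=0)
        model.fit(X_tr, y_tr)
        score = mean_pinball_loss(y_te, model.predict(X_te), alpha=q)
        print(f"quantile {q:.2f}: pinball loss {score:.4f}")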

Explicit Foundation Model Optimization with Self-Attentive Feed-Forward Neural Units

  • paper_url: http://arxiv.org/abs/2311.07510
  • repo_url: None
  • paper_authors: Jake Ryland Williams, Haoran Zhao
  • for: 这个论文的目的是提出一种高效的神经网络优化方法,以降低神经网络的计算成本,特别是在大规模使用时。
  • methods: 这个论文使用迭代近似法和反射层来优化神经网络,并提出了一种基于feed-forward神经网络的通用结果。
  • results: 测试结果表明,使用Explicit Solutions可以取得更好的优化结果,而且在使用反射层后,Explicit Solutions可以从更小的数据量中获得更好的优化结果。此外,这个论文还进行了一系列的ablation experiment,发现一些不同的体系结构可以生成高性能的模型,并且这些模型可以在更少的数据量上训练。
    Abstract Iterative approximation methods using backpropagation enable the optimization of neural networks, but they remain computationally expensive, especially when used at scale. This paper presents an efficient alternative for optimizing neural networks that reduces the costs of scaling neural networks and provides high-efficiency optimizations for low-resource applications. We will discuss a general result about feed-forward neural networks and then extend this solution to compositional (multi-layer) networks, which are applied to a simplified transformer block containing feed-forward and self-attention layers. These models are used to train highly-specified and complex multi-layer neural architectures that we refer to as self-attentive feed-forward unit (SAFFU) layers, which we use to develop a transformer that appears to generalize well over small, cognitively-feasible, volumes of data. Testing demonstrates explicit solutions outperform models optimized by backpropagation alone. Moreover, further application of backpropagation after explicit solutions leads to better optima from smaller scales of data; training effective models from much less data is enabled by explicit solution warm starts. We then carry out ablation experiments training a roadmap of about 250 transformer models over 1-million tokens to determine ideal settings. We find that multiple different architectural variants produce highly-performant models, and discover from this ablation that some of the best are not the most parameterized. This appears to indicate well-generalized models could be reached using less data by using explicit solutions, and that architectural exploration using explicit solutions pays dividends in guiding the search for efficient variants with fewer parameters, and which could be incorporated into low-resource hardware where AI might be embodied.
    摘要 iterative approximation方法使用反射传播可以优化神经网络,但它们仍然具有计算成本,特别是在大规模使用时。这篇文章提出了一种高效的神经网络优化方法,可以降低神经网络缩放时的计算成本,并为低资源应用提供高效优化。我们将讨论一个通用的Feed-Forward神经网络的结果,然后扩展到多层神经网络,并应用于简化后Transformer块中的Feed-Forward和自注意层。这些模型用于训练复杂多层神经架构,我们称之为自注意Feed-Forward单元(SAFFU)层。我们使用这些层来开发一个Transformer模型,该模型在小量数据上Generalization良好。测试表明显式解决方案可以超越backpropagation alone的优化。此外,通过backpropagation和显式解决方案的组合,可以从小规模数据中获得更好的优化。这些方法可以训练效果很好的模型,从更少的数据中训练模型。我们然后进行了ablation experiment,训练约250个Transformer模型,并测试其在100万个字节上的性能。我们发现多种不同的建筑学variant可以生成高性能模型,并发现这些variant中的一些最好的模型并不是最大化参数的。这表明可以使用显式解决方案来找到更好的模型,并且使用这些方法可以在低资源硬件上搬运AI。

STEM Rebalance: A Novel Approach for Tackling Imbalanced Datasets using SMOTE, Edited Nearest Neighbour, and Mixup

  • paper_url: http://arxiv.org/abs/2311.07504
  • repo_url: None
  • paper_authors: Yumnah Hasan, Fatemeh Amerehi, Patrick Healy, Conor Ryan
  • for: 该论文targets imbalanced medical imaging datasets, particularly breast cancer datasets, and aims to improve the performance of machine learning classifiers on these datasets.
  • methods: 该论文提出了一种新的 Vicinal Distribution Augmentation(Mixup)方法,combines SMOTE-ENN和Mixup在实例层次进行结合,以利用整个少数类分布,thereby mitigating both between-class and within-class imbalances.
  • results: 该论文在Digital Database for Screening Mammography和Wisconsin Breast Cancer(Diagnostics) datasets中 achieved AUC values of 0.96和0.99,respectively, demonstrating the effectiveness of STEM in improving the performance of machine learning classifiers on imbalanced medical imaging datasets.
    Abstract Imbalanced datasets in medical imaging are characterized by skewed class proportions and scarcity of abnormal cases. When trained using such data, models tend to assign higher probabilities to normal cases, leading to biased performance. Common oversampling techniques such as SMOTE rely on local information and can introduce marginalization issues. This paper investigates the potential of using Mixup augmentation that combines two training examples along with their corresponding labels to generate new data points as a generic vicinal distribution. To this end, we propose STEM, which combines SMOTE-ENN and Mixup at the instance level. This integration enables us to effectively leverage the entire distribution of minority classes, thereby mitigating both between-class and within-class imbalances. We focus on the breast cancer problem, where imbalanced datasets are prevalent. The results demonstrate the effectiveness of STEM, which achieves AUC values of 0.96 and 0.99 in the Digital Database for Screening Mammography and Wisconsin Breast Cancer (Diagnostics) datasets, respectively. Moreover, this method shows promising potential when applied with an ensemble of machine learning (ML) classifiers.
    摘要 医学影像数据集偏度问题常被定义为类别分布不均衡和罕见病例的缺乏。当使用这些数据进行训练时,模型往往偏好正常情况,导致表现偏移。常见的扩大技术,如SMOTE,基于地方信息,可能会导致边缘化问题。本文研究了使用混合增强的潜在利点,通过将两个训练示例和其相应的标签拼接起来生成新的数据点,以实现一个通用的邻近分布。为此,我们提出了STEM,它将SMOTE-ENN和混合拼接在实例层次结合。这种整合使得我们可以有效利用少数类别的整个分布,从而解决类间和类内不均衡。我们将精力集中在乳腺癌问题上,这里的数据集很常见偏度。结果表明,STEM具有抗偏衡能力,在 Digital Database for Screening Mammography 和 Wisconsin Breast Cancer(诊断)数据集上分别达到 AUC 值 0.96 和 0.99。此外,这种方法在 ensemble 机器学习(ML)分类器上表现了扎实的潜在性。
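The combination described above can be sketched in a few lines (an illustration using imbalanced-learn's SMOTEENN and a hand-rolled Mixup, not the authors' STEM implementation):

```python
# Hedged sketch: SMOTE-ENN resampling followed by instance-level Mixup.
import numpy as np
from sklearn.datasets import make_classification
from imblearn.combine import SMOTEENN

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

# Step 1: oversample the minority class and clean noisy samples
X_res, y_res = SMOTEENN(random_state=0).fit_resample(X, y)

# Step 2: Mixup at the instance level (a vicinal distribution over pairs)
def mixup(X, y, alpha=0.2, seed=0):
    rng = np.random.default_rng(seed)
    lam = rng.beta(alpha, alpha, size=len(X))[:, None]
    idx = rng.permutation(len(X))
    X_mix = lam * X + (1 - lam) * X[idx]
    y_mix = lam[:, 0] * y + (1 - lam[:, 0]) * y[idx]  # soft labels in [0, 1]
    return X_mix, y_mix

X_aug, y_aug = mixup(X_res, y_res.astype(float))
print(X_aug.shape, float(y_aug.min()), float(y_aug.max()))
```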

Reducing the Need for Backpropagation and Discovering Better Optima With Explicit Optimizations of Neural Networks

  • paper_url: http://arxiv.org/abs/2311.07498
  • repo_url: None
  • paper_authors: Jake Ryland Williams, Haoran Zhao
  • for: 这篇论文旨在提出一种可以实现对神经网络的优化,并且可以降低训练神经网络的Computational Expensive。
  • methods: 作者使用了iterative differential approximation方法,并通过对梯度的数学分析,为feed-forward language model(LM)和MNIST digit classification推导出explicit solution。
  • results: 作者发现,这个explicit solution可以实现near-optimality,并且可以降低iterative optimization的computational cost。此外,作者还发现,这个solution可以在多层神经网络中实现更好的optima,并且可以提高模型的解释性。
    Abstract Iterative differential approximation methods that rely upon backpropagation have enabled the optimization of neural networks; however, at present, they remain computationally expensive, especially when training models at scale. In this paper, we propose a computationally efficient alternative for optimizing neural networks that can both reduce the costs of scaling neural networks and provide high-efficiency optimizations for low-resource applications. We derive an explicit solution to a simple feed-forward language model (LM) by mathematically analyzing its gradients. This solution generalizes from single-layer LMs to the class of all single-layer feed-forward softmax-activated neural models trained on positive-valued features, as is demonstrated by our extension of this solution application to MNIST digit classification. For both LM and digit classifiers, we find computationally that explicit solutions perform near-optimally in experiments showing that 1) iterative optimization only marginally improves the explicit solution parameters and 2) randomly initialized parameters iteratively optimize towards the explicit solution. We also preliminarily apply the explicit solution locally by layer in multi-layer networks and discuss how the solution's computational savings increase with model complexity -- for both single- and multi-layer applications of the explicit solution, we emphasize that the optima achieved cannot be reached by backpropagation alone, i.e., better optima appear discoverable only after explicit solutions are applied. Finally, we discuss the solution's computational savings alongside its impact on model interpretability and suggest future directions for the derivation of explicit solutions to complex- and multi-layer architectures.
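To give a flavour of the explicit-solution idea (a hedged toy stand-in, not the paper's actual derivation): a single softmax layer on positive-valued digit features is initialised from a closed-form regularised least-squares fit to one-hot targets, and a few gradient steps then refine it.

```python
# Hedged sketch: closed-form initialisation of a softmax layer, then fine-tuning.
# The closed form used here is ordinary ridge regression to one-hot targets,
# standing in for (but not identical to) the explicit solution in the paper.
import numpy as np
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
X = X / 16.0                         # positive-valued features
Y = np.eye(10)[y]                    # one-hot targets

lam = 1e-1                           # warm start: W = (X^T X + lam*I)^{-1} X^T Y
W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def accuracy(W):
    return float((softmax(X @ W).argmax(1) == y).mean())

print("explicit warm start accuracy:", round(accuracy(W), 3))

lr = 0.5                             # a few cross-entropy gradient steps from the warm start
for _ in range(50):
    P = softmax(X @ W)
    W -= lr * X.T @ (P - Y) / len(X)
print("after fine-tuning:", round(accuracy(W), 3))
```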

A Federated Data Fusion-Based Prognostic Model for Applications with Multi-Stream Incomplete Signals

  • paper_url: http://arxiv.org/abs/2311.07474
  • repo_url: None
  • paper_authors: Madi Arabi, Xiaolei Fang
  • for: 这篇论文旨在提出一种联合多个用户的联邦预测模型,以便在多元数据和尚未完整的情况下预测机器的故障时间。
  • methods: 本论文使用多元函数主成分分析融合多条流退化讯号,然后使用融合的特征建立(对数)-位置-尺度回归模型进行故障预测。
  • results: 数据分析显示,提案的模型性能与非联邦预测模型相同,并且比各用户自己建立的模型更好。
    Abstract Most prognostic methods require a decent amount of data for model training. In reality, however, the amount of historical data owned by a single organization might be small or not large enough to train a reliable prognostic model. To address this challenge, this article proposes a federated prognostic model that allows multiple users to jointly construct a failure time prediction model using their multi-stream, high-dimensional, and incomplete data while keeping each user's data local and confidential. The prognostic model first employs multivariate functional principal component analysis to fuse the multi-stream degradation signals. Then, the fused features coupled with the times-to-failure are utilized to build a (log)-location-scale regression model for failure prediction. To estimate parameters using distributed datasets and keep the data privacy of all participants, we propose a new federated algorithm for feature extraction. Numerical studies indicate that the performance of the proposed model is the same as that of classic non-federated prognostic models and is better than that of the models constructed by each user itself.
    摘要 大多数预测方法需要一定量的数据进行模型训练。然而,在现实中,一个组织可能拥有的历史数据量可能不够或者太少以建立一个可靠的预测模型。为解决这个挑战,这篇文章提出了一种联合预测模型,允许多个用户共同构建一个失败时间预测模型,使用他们的多元流、高维度和不完整的数据,而不需要将数据分享或泄露。该预测模型首先使用多元函数主成分分析将多流衰减信号融合。然后,融合后的特征coupled with the times-to-failure被用建立一个(对数)-(尺度)-(拟合)回归模型进行失败预测。为了在分布式数据集上计算参数并保持所有参与者的数据隐私,我们提出了一种新的联合算法 для特征提取。 numerically studies show that the performance of the proposed model is the same as that of classic non-federated prognostic models and is better than that of the models constructed by each user itself.
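A simplified, single-site sketch of the modelling pipeline described above (the federated parameter-estimation algorithm itself is not reproduced; this only illustrates the fuse-then-regress idea on synthetic signals):

```python
# Hedged sketch: fuse multi-stream degradation signals with PCA, then regress
# log time-to-failure on the fused features ("functional" PCA is approximated
# here by ordinary PCA on the concatenated, discretised streams).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_units, n_streams, n_times = 200, 3, 50
t = np.linspace(0, 1, n_times)

slopes = rng.gamma(2.0, 1.0, size=(n_units, n_streams))          # unit-specific degradation rates
signals = slopes[:, :, None] * t[None, None, :] \
          + 0.1 * rng.normal(size=(n_units, n_streams, n_times))
ttf = np.exp(1.0 - 0.3 * slopes.mean(axis=1) + 0.1 * rng.normal(size=n_units))

features = PCA(n_components=5).fit_transform(signals.reshape(n_units, -1))
model = LinearRegression().fit(features, np.log(ttf))             # (log)-location regression
print("R^2 on log time-to-failure:", round(model.score(features, np.log(ttf)), 3))
```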

On Self-Supervised Dynamic Incremental Regularised Adaptation

  • paper_url: http://arxiv.org/abs/2311.07461
  • repo_url: None
  • paper_authors: Abanoub Ghobrial, Kerstin Eder
  • for: 本研究提出了一种基于几个样本和轻量级融合的动态领域适应方法,即DIRA,以实现领域适应最佳结果。
  • methods: DIRA方法需要为适应所用的少量样本提供标签,因此属于有监督方法;本文讨论对DIRA的修改,以去除对标签的需求,使其成为自监督方法。
  • results: DIRA方法在前一些研究中已经达到了领域适应最佳结果的水平,但它仍然需要提供标签来进行适应。在本研究中,我们提出了一种修改DIRA方法,使其成为自动适应方法,并将在未来的实验中提供证明。
    Abstract In this paper, we overview a recent method for dynamic domain adaptation named DIRA, which relies on a few samples in addition to a regularisation approach named elastic weight consolidation to achieve state-of-the-art (SOTA) domain adaptation results. DIRA has been previously shown to perform competitively with SOTA unsupervised adaption techniques. However, a limitation of DIRA is that it relies on labels to be provided for the few samples used in adaption. This makes it a supervised technique. In this paper, we discuss a proposed alteration to the DIRA method to make it self-supervised i.e. remove the need for providing labels. Experiments on our proposed alteration will be provided in future work.
    摘要 在这篇论文中,我们介绍了一种最近的动态领域适应方法 named DIRA,该方法基于一些样本和一种正则化方法 named elastic weight consolidation来实现领域适应结果。 DIRA 已经在之前的研究中展示了与顶尖无监督适应技术相当的性能。然而,DIRA 的一个局限性是它需要提供适应样本的标签。这使得它成为一种有监督的技术。在这篇论文中,我们讨论了对 DIRA 方法进行修改,以使其成为无监督的,即移除标签提供的需求。未来的工作中将提供相关的实验。
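The elastic weight consolidation regulariser that DIRA builds on has a simple generic form; the sketch below (standard EWC, not the authors' DIRA code) shows the penalty that anchors adapted parameters to their source-domain values, weighted by a Fisher-information estimate:

```python
# Hedged sketch of the EWC penalty: loss = task_loss + lam/2 * sum_i F_i * (theta_i - theta*_i)^2
import torch
import torch.nn as nn

def ewc_penalty(model, ref_params, fisher, lam=100.0):
    total = 0.0
    for name, p in model.named_parameters():
        total = total + (fisher[name] * (p - ref_params[name]) ** 2).sum()
    return 0.5 * lam * total

# Toy usage with a linear "model" and unit Fisher weights (assumptions for illustration):
model = nn.Linear(4, 2)
ref_params = {n: p.detach().clone() for n, p in model.named_parameters()}
fisher = {n: torch.ones_like(p) for n, p in model.named_parameters()}

x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(model(x), y) + ewc_penalty(model, ref_params, fisher)
loss.backward()   # an optimizer step on the few adaptation samples would follow
```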

Causal Discovery under Latent Class Confounding

  • paper_url: http://arxiv.org/abs/2311.07454
  • repo_url: None
  • paper_authors: Bijan Mazaheri, Spencer Gordon, Yuval Rabani, Leonard Schulman
  • for: 这种研究是为了解决多源数据中的 causal discovery 问题。
  • methods: 这种方法使用 directaded acyclic graphs 模型系统的 causal 结构,并使用 conditional independence properties 来学习这种结构。
  • results: 研究表明,如果global confounding的 cardinality 是有限的(即数据来源有限),则可以成功地解决 causal discovery 问题。 however, the feasibility of this problem is governed by a trade-off between the cardinality of the global confounder, the cardinalities of the observed variables, and the sparsity of the causal structure。
    Abstract Directed acyclic graphs are used to model the causal structure of a system. ``Causal discovery'' describes the problem of learning this structure from data. When data is an aggregate from multiple sources (populations or environments), global confounding obscures conditional independence properties that drive many causal discovery algorithms. For this reason, existing causal discovery algorithms are not suitable for the multiple-source setting. We demonstrate that, if the confounding is of bounded cardinality (i.e. the data comes from a limited number of sources), causal discovery can still be achieved. The feasibility of this problem is governed by a trade-off between the cardinality of the global confounder, the cardinalities of the observed variables, and the sparsity of the causal structure.

Explainable Boosting Machines with Sparsity – Maintaining Explainability in High-Dimensional Settings

  • paper_url: http://arxiv.org/abs/2311.07452
  • repo_url: https://github.com/interpretml/interpret
  • paper_authors: Brandon M. Greenwell, Annika Dahlmann, Saurabh Dhoble
  • For: The paper aims to improve the transparency and speed of Explainable Boosting Machines (EBMs) in high-dimensional settings with many predictor variables.
  • Methods: The paper proposes using the Least Absolute Shrinkage and Selection Operator (LASSO) to introduce sparsity and remove less relevant terms in the EBM, allowing the model to maintain transparency and relatively fast scoring times.
  • Results: The paper shows that post-processing a fitted EBM with many terms using LASSO can reduce the model's complexity and drastically improve scoring time, while maintaining competitive accuracy.
    Abstract Compared to "black-box" models, like random forests and deep neural networks, explainable boosting machines (EBMs) are considered "glass-box" models that can be competitively accurate while also maintaining a higher degree of transparency and explainability. However, EBMs become readily less transparent and harder to interpret in high-dimensional settings with many predictor variables; they also become more difficult to use in production due to increases in scoring time. We propose a simple solution based on the least absolute shrinkage and selection operator (LASSO) that can help introduce sparsity by reweighting the individual model terms and removing the less relevant ones, thereby allowing these models to maintain their transparency and relatively fast scoring times in higher-dimensional settings. In short, post-processing a fitted EBM with many (i.e., possibly hundreds or thousands) of terms using the LASSO can help reduce the model's complexity and drastically improve scoring time. We illustrate the basic idea using two real-world examples with code.
    摘要 与"黑盒"模型(如随机森林和深度神经网络)相比,可解释扩展机器(EBM)被视为"玻璃盒"模型,可同时保持高度的透明度和解释性。然而,EBM在高维设置中的多个预测变量时会变得更加难以理解和维护,同时也会导致在生产环境中使用变得更加困难。我们提出了一个简单的解决方案,基于最小绝对减少和选择算子(LASSO),可以在高维设置中引入稀疏性,并通过重新权重各模型项和移除较不相关的项来保持模型的透明度和相对快的评分时间。简而言之,对已拟合且含有大量项的 EBM 进行 LASSO 后处理可以帮助减少模型的复杂性,并大幅提高评分时间。我们使用两个实际例子的代码来说明基本思路。
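The post-processing idea can be sketched generically (an illustration, not the authors' implementation; `get_term_matrix` below is a hypothetical placeholder for however you extract an (n_samples, n_terms) matrix of per-term contributions from a fitted EBM, which the interpret package exposes through its explanation APIs):

```python
# Hedged sketch: L1-penalised reweighting of EBM terms; terms whose coefficient
# shrinks to zero are dropped, reducing model size and scoring time.
import numpy as np
from sklearn.linear_model import LogisticRegression

def sparsify_terms(term_matrix, y, term_names, C=0.1):
    lasso = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    lasso.fit(term_matrix, y)
    coef = lasso.coef_.ravel()
    return [(name, w) for name, w in zip(term_names, coef) if abs(w) > 1e-8]

# Demo on a random stand-in term matrix (in practice: term_matrix = get_term_matrix(ebm, X)):
rng = np.random.default_rng(0)
term_matrix = rng.normal(size=(500, 40))
y = (term_matrix[:, :3].sum(axis=1) > 0).astype(int)   # only three terms truly matter
kept = sparsify_terms(term_matrix, y, [f"term_{i}" for i in range(40)])
print(f"{len(kept)} of 40 terms survive:", [n for n, _ in kept][:5])
```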

On the Robustness of Neural Collapse and the Neural Collapse of Robustness

  • paper_url: http://arxiv.org/abs/2311.07444
  • repo_url: None
  • paper_authors: Jingtong Su, Ya Shi Zhang, Nikolaos Tsilivis, Julia Kempe
  • for: 本文研究了神经网络中的神经崩溃现象,即训练结束时神经网络的特征向量和分类权重收敛到一个非常简单的几何结构(单纯形,simplex)。
  • methods: 本文使用了实验和理论的方法来研究神经崩溃的稳定性特性。
  • results: 研究发现,神经崩溃结构在小型攻击下消失,并且输入数据中的扰动Example会“跳跃”到简单体的边点上。此外,研究发现对抗攻击的网络优化后,神经崩溃仍然是普遍存在的现象,clean和扰动表示形成了垂直的简单体,并且导致了一个简单 nearest-neighbor 分类器。
    Abstract Neural Collapse refers to the curious phenomenon in the end of training of a neural network, where feature vectors and classification weights converge to a very simple geometrical arrangement (a simplex). While it has been observed empirically in various cases and has been theoretically motivated, its connection with crucial properties of neural networks, like their generalization and robustness, remains unclear. In this work, we study the stability properties of these simplices. We find that the simplex structure disappears under small adversarial attacks, and that perturbed examples "leap" between simplex vertices. We further analyze the geometry of networks that are optimized to be robust against adversarial perturbations of the input, and find that Neural Collapse is a pervasive phenomenon in these cases as well, with clean and perturbed representations forming aligned simplices, and giving rise to a robust simple nearest-neighbor classifier. By studying the propagation of the amount of collapse inside the network, we identify novel properties of both robust and non-robust machine learning models, and show that earlier, unlike later layers maintain reliable simplices on perturbed data.
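The collapse structure discussed above is usually quantified with simple statistics of the penultimate-layer features; the sketch below (generic diagnostics, not the paper's code) computes a within/between variability ratio and the pairwise cosines of centred class means, which approach -1/(K-1) for a simplex ETF:

```python
# Hedged sketch: basic neural-collapse diagnostics for a feature matrix.
import numpy as np

def collapse_metrics(features, labels):
    classes = np.unique(labels)
    global_mean = features.mean(axis=0)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    centred = means - global_mean

    within = np.mean([np.var(features[labels == c] - means[i], axis=0).sum()
                      for i, c in enumerate(classes)])      # small under collapse
    between = np.var(centred, axis=0).sum()

    normed = centred / np.linalg.norm(centred, axis=1, keepdims=True)
    cos = normed @ normed.T
    off_diag = cos[~np.eye(len(classes), dtype=bool)]        # ~ -1/(K-1) for an ETF
    return within / between, off_diag.mean()

# Toy check on nearly-collapsed synthetic features (3 classes, 16 dims); in the
# paper's setting one would pass features of clean vs. adversarially perturbed
# inputs and compare these statistics.
rng = np.random.default_rng(0)
class_means = rng.normal(size=(3, 16))
features = np.vstack([m + 0.05 * rng.normal(size=(100, 16)) for m in class_means])
labels = np.repeat(np.arange(3), 100)
print(collapse_metrics(features, labels))
```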

Boolean Variation and Boolean Logic BackPropagation

  • paper_url: http://arxiv.org/abs/2311.07427
  • repo_url: None
  • paper_authors: Van Minh Nguyen
  • for: 这篇论文是关于布尔集的概念引入和布尔逻辑推导深度模型的建立的。
  • methods: 该论文使用布尔逻辑推导深度模型, weights和活动都是布尔数字,通过布尔逻辑进行运算。具体来说,布尔深度模型可以直接在布尔领域内训练,不需要隐藏权重。没有梯度,只有逻辑是合成和归并。
  • results: 该论文的实验结果表明,布尔深度模型可以达到与实数深度模型相同的性能水平,但是具有更好的可解释性和安全性。
    Abstract The notion of variation is introduced for the Boolean set and based on which Boolean logic backpropagation principle is developed. Using this concept, deep models can be built with weights and activations being Boolean numbers and operated with Boolean logic instead of real arithmetic. In particular, Boolean deep models can be trained directly in the Boolean domain without latent weights. No gradient but logic is synthesized and backpropagated through layers.
    摘要 “变化”概念在布尔集中引入,基于这个概念,布尔逻辑反向传播原理得到开发。使用这个概念,深度模型可以使用布尔数字和布尔逻辑进行操作,而不需要实数 arithmetic。特别是,布尔深度模型可以直接在布尔领域内被训练,而不需要潜在权重(latent weights)。无需梯度,逻辑在各层中被合成并反向传播。

Three-dimensional granular flow simulation using graph neural network-based learned simulator

  • paper_url: http://arxiv.org/abs/2311.07416
  • repo_url: None
  • paper_authors: Yongjin Choi, Krishna Kumar
  • for: This paper aims to develop a novel deep learning technique, graph neural network (GNN), to simulate granular flows and address the issues of computational intractability and empirical nature of traditional methods.
  • methods: The paper employs GNN to develop a GNN-based simulator (GNS) for granular flows, which learns the local interaction law of granular flows from a limited set of trajectories.
  • results: The paper shows that GNS successfully reproduces the overall behaviors of column collapses with various aspect ratios that were not encountered during training, and outperforms high-fidelity numerical simulators by 300 times in terms of computation speed.
    Abstract Reliable evaluations of geotechnical hazards like landslides and debris flow require accurate simulation of granular flow dynamics. Traditional numerical methods can simulate the complex behaviors of such flows that involve solid-like to fluid-like transitions, but they are computationally intractable when simulating large-scale systems. Surrogate models based on statistical or machine learning methods are a viable alternative, but they are typically empirical and rely on a confined set of parameters in evaluating associated risks. Due to their permutation-dependent learning, conventional machine learning models require an unreasonably large amount of training data for building generalizable surrogate models. We employ a graph neural network (GNN), a novel deep learning technique, to develop a GNN-based simulator (GNS) for granular flows to address these issues. Graphs represent the state of granular flows and interactions, like the exchange of energy and momentum between grains, and GNN learns the local interaction law. GNS takes the current state of the granular flow and estimates the next state using Euler explicit integration. We train GNS on a limited set of granular flow trajectories and evaluate its performance in a three-dimensional granular column collapse domain. GNS successfully reproduces the overall behaviors of column collapses with various aspect ratios that were not encountered during training. The computation speed of GNS outperforms high-fidelity numerical simulators by 300 times.
    摘要 可靠的地层风险评估需要准确地模拟 granular 流动的动态。传统的数值方法可以模拟 granular 流动中的复杂行为,但它们在大规模系统上 computationally intractable。基于统计或机器学习方法的代理模型是一种可行的 alternativa,但它们通常是empirical的,并且基于一个有限的参数集来评估相关的风险。由于它们的 permutation-dependent learning,传统的机器学习模型需要一个不切实际的大量的训练数据来建立通用的代理模型。我们employs a graph neural network (GNN),一种新的深度学习技术,来开发一个 GNN-based simulator (GNS) for granular flows。图表示 granular 流动的状态和交互,如粒子之间的能量和动量交换,GNN 学习本地交互法律。GNS 使用当前 granular 流动的状态来估计下一个状态,使用 Euler 显式积分。我们在一个有限的 granular 流动轨迹上训练 GNS,并评估其性能在一个三维 granular 柱塌领域。GNS 成功地复制了不同方向比例的柱塌的总行为,并且在训练期间没有遇到的多样化的柱塌行为。GNS 的计算速度高于高精度数值模拟器,提高了 300 倍。
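A toy version of the simulation loop described above (illustrative only; a random linear map stands in for the trained GNN, and no learned weights from the paper are used):

```python
# Hedged sketch: neighbour message passing + explicit Euler state update.
import numpy as np

rng = np.random.default_rng(0)
n, dim, radius, dt = 64, 3, 0.15, 1e-3
pos = rng.uniform(size=(n, dim))
vel = np.zeros((n, dim))
W = rng.normal(scale=0.1, size=(dim, dim))    # stand-in for the learned interaction function
gravity = np.array([0.0, 0.0, -9.81])

for step in range(10):
    # connectivity graph from current positions
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = (dist < radius) & ~np.eye(n, dtype=bool)

    # aggregate relative displacements from neighbours ("messages")
    messages = (diff * adj[..., None]).sum(axis=1)
    acc = messages @ W + gravity              # "learned" local interaction + external force

    # explicit Euler integration of the particle state
    vel = vel + dt * acc
    pos = pos + dt * vel
print(pos.mean(axis=0))
```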

Attention-based Multi-task Learning for Base Editor Outcome Prediction

  • paper_url: http://arxiv.org/abs/2311.07636
  • repo_url: None
  • paper_authors: Amina Mollaysa, Ahmed Allam, Michael Krauthammer
  • for: 提高基因编辑技术的精度和效率,以便更好地治疗人类遗传疾病。
  • methods: 使用机器学习模型,通过预测各种可能的编辑结果,以提高基因编辑设计的精度和效率。
  • results: 在多个数据集和基因编辑变体上,模型预测的结果与实验结果呈强相关,证明了模型的有效性和可靠性。
    Abstract Human genetic diseases often arise from point mutations, emphasizing the critical need for precise genome editing techniques. Among these, base editing stands out as it allows targeted alterations at the single nucleotide level. However, its clinical application is hindered by low editing efficiency and unintended mutations, necessitating extensive trial-and-error experimentation in the laboratory. To speed up this process, we present an attention-based two-stage machine learning model that learns to predict the likelihood of all possible editing outcomes for a given genomic target sequence. We further propose a multi-task learning schema to jointly learn multiple base editors (i.e. variants) at once. Our model's predictions consistently demonstrated a strong correlation with the actual experimental results on multiple datasets and base editor variants. These results provide further validation for the models' capacity to enhance and accelerate the process of refining base editing designs.
    摘要 人类遗传病多发生于点突变,强调了精准基因编辑技术的核心性。其中,基因编辑技术出现了,它可以在单个核苷酸水平进行targeted修饰。然而,临床应用受到低修饰效率和不意图的突变所阻碍,需要进行详细的实验室试验。为了加速这个过程,我们提出了一种关注机制基于两个阶段机器学习模型,可以预测给定 genomic 目标序列中所有可能的编辑结果的可能性。我们还提议使用多任务学习 schema,可以同时学习多种基因编辑器(即变体)。我们的模型预测结果与实验结果在多个数据集和基因编辑器变体上均具有强相关性。这些结果为我们的模型增强和加速基因编辑设计的能力提供了进一步的验证。

Transpose Attack: Stealing Datasets with Bidirectional Training

  • paper_url: http://arxiv.org/abs/2311.07389
  • repo_url: https://github.com/guyamit/transpose-attack-paper-ndss24-
  • paper_authors: Guy Amit, Mosh Levy, Yisroel Mirsky
  • for: 本研究探讨了深度神经网络在反向方向下的漏洞,以及恶意用户可以通过这个漏洞将模型隐藏在正常模型中。
  • methods: 本研究使用了深度神经网络的反向传播方法,并示出了如何在这种方法下系统地记忆和回忆特定的样本。
  • results: 研究发现,现代建模可以通过这种方法在保护学习环境下隐藏敏感数据,并且可以高精度地复制大量样本,这可能会损害数据隐私和生成新模型。此外,研究还提出了一种新的方法来检测恶意模型。
    Abstract Deep neural networks are normally executed in the forward direction. However, in this work, we identify a vulnerability that enables models to be trained in both directions and on different tasks. Adversaries can exploit this capability to hide rogue models within seemingly legitimate models. In addition, in this work we show that neural networks can be taught to systematically memorize and retrieve specific samples from datasets. Together, these findings expose a novel method in which adversaries can exfiltrate datasets from protected learning environments under the guise of legitimate models. We focus on the data exfiltration attack and show that modern architectures can be used to secretly exfiltrate tens of thousands of samples with high fidelity, high enough to compromise data privacy and even train new models. Moreover, to mitigate this threat we propose a novel approach for detecting infected models.
    摘要 深度神经网络通常在前向方向下执行。然而,在这项工作中,我们发现了一个漏洞,允许模型在两个方向和不同任务上训练。攻击者可以利用这个能力,隐藏恶意模型在看起来合法的模型中。此外,我们还示出了神经网络可以系统地记忆和重复特定样本。这些发现表明了一种新的攻击方法,可以在受保护的学习环境中隐藏大量数据。我们主要关注数据泄露攻击,并证明现代架构可以使用高准确率泄露数据,足以损害数据隐私和生成新模型。此外,我们还提出了一种新的检测恶意模型的方法,以mitigate这种威胁。

arfpy: A python package for density estimation and generative modeling with adversarial random forests

  • paper_url: http://arxiv.org/abs/2311.07366
  • repo_url: https://github.com/bips-hb/arfpy
  • paper_authors: Kristin Blesch, Marvin N. Wright
  • for: 该论文提供了一种用于生成类似给定数据的轻量级方法,即 Adversarial Random Forests(ARF)的Python实现,帮助实际者快速地进行数据生成和分布估计。
  • methods: 该论文使用的方法是Adversarial Random Forests(ARF),它是一种基于树的生成模型,可以快速地生成类似给定数据的新数据。
  • results: 论文的结果表明,$\textit{arfpy}$ 可以快速地生成高质量的新数据,并且与传统的深度学习模型相比,它具有更低的 Tuning 和计算资源的需求,同时具有易用的Python接口,可以让科学家在各个领域进行数据生成。
    Abstract This paper introduces $\textit{arfpy}$, a python implementation of Adversarial Random Forests (ARF) (Watson et al., 2023), which is a lightweight procedure for synthesizing new data that resembles some given data. The software $\textit{arfpy}$ equips practitioners with straightforward functionalities for both density estimation and generative modeling. The method is particularly useful for tabular data and its competitive performance is demonstrated in previous literature. As a major advantage over the mostly deep learning based alternatives, $\textit{arfpy}$ combines the method's reduced requirements in tuning efforts and computational resources with a user-friendly python interface. This supplies audiences across scientific fields with software to generate data effortlessly.
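A minimal usage sketch (the function names `arf`, `forde`, and `forge` are recalled from the package's README and may differ between versions; treat the exact API as an assumption):

```python
# Hedged sketch of the arfpy workflow: fit an adversarial random forest,
# estimate densities, then generate synthetic rows.
from sklearn.datasets import load_iris
from arfpy import arf

df = load_iris(as_frame=True).data       # a plain tabular DataFrame

model = arf.arf(x=df)                    # fit the adversarial random forest
model.forde()                            # density estimation step (FORDE)
synthetic = model.forge(n=100)           # generate 100 synthetic rows (FORGE)
print(synthetic.head())
```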

ADAMM: Anomaly Detection of Attributed Multi-graphs with Metadata: A Unified Neural Network Approach

  • paper_url: http://arxiv.org/abs/2311.07355
  • repo_url: https://github.com/konsotirop/adamm
  • paper_authors: Konstantinos Sotiropoulos, Lingxiao Zhao, Pierre Jinghong Liang, Leman Akoglu
  • for: 针对复杂的图Database中的节点和边具有自适应特征的异常实例检测。
  • methods: 提出了一种名为ADAMM的图神经网络模型,可以直接处理导向的多边图和自环图,同时同时处理图和标注数据的整合。
  • results: 对两个不同领域的数据集进行了实验,包括公司的财务日志条目和人们的城市流动轨迹数据,并证明了ADAMM的一致性和检测效果。
    Abstract Given a complex graph database of node- and edge-attributed multi-graphs as well as associated metadata for each graph, how can we spot the anomalous instances? Many real-world problems can be cast as graph inference tasks where the graph representation could capture complex relational phenomena (e.g., transactions among financial accounts in a journal entry), along with metadata reflecting tabular features (e.g. approver, effective date, etc.). While numerous anomaly detectors based on Graph Neural Networks (GNNs) have been proposed, none are capable of directly handling directed graphs with multi-edges and self-loops. Furthermore, the simultaneous handling of relational and tabular features remains an unexplored area. In this work we propose ADAMM, a novel graph neural network model that handles directed multi-graphs, providing a unified end-to-end architecture that fuses metadata and graph-level representation learning through an unsupervised anomaly detection objective. Experiments on datasets from two different domains, namely, general-ledger journal entries from different firms (accounting) as well as human GPS trajectories from thousands of individuals (urban mobility) validate ADAMM's generality and detection effectiveness of expert-guided and ground-truth anomalies. Notably, ADAMM outperforms existing baselines that handle the two data modalities (graph and metadata) separately with post hoc synthesis efforts.
    摘要 给定复杂的图数据库,包括节点和边具有多重图的多图,以及每个图的相关metadata,如何检测异常实例?许多现实世界问题可以表示为图推理任务,图表示可以捕捉复杂的关系现象(例如,财务交易记录中的账户之间的交易),并且metadata反映了表格特征(例如,批准人、有效日期等)。尽管已经有许多基于图神经网络(GNNs)的异常检测器被提出,但是这些模型无法直接处理指定的多图和自Loop。此外,同时处理关系和表格特征的推理还是一个未探索的领域。在这项工作中,我们提出了ADAMM模型,它可以处理指定多图,并提供一个简单的端到端架构,通过不监督的异常检测目标来融合metadata和图 nivel representation学习。实验表明,ADAMM在不同领域的数据集上(包括财务记录和人类GPS轨迹)具有一致性和检测准确性,并且超过了分离两种数据模式(图和metadata)的基础线上的混合synthesis方法。

Affine Invariance in Continuous-Domain Convolutional Neural Networks

  • paper_url: http://arxiv.org/abs/2311.09245
  • repo_url: None
  • paper_authors: Ali Mohaddes, Johannes Lederer
  • for: 本研究旨在提高深度学习模型对几何变换的识别能力,通过利用群变换的概念。
  • methods: 本研究使用了连续域 convolutional neural networks,并引入了一新的相似性评价标准来评估两个输入信号之间的相似性under affine transformations。
  • results: 研究表明,通过利用全部的affine transforms生成的泛型linear group $\mathrm{GL}_2(\mathbb{R})$,可以大幅提高深度学习模型的性能。
    Abstract The notion of group invariance helps neural networks in recognizing patterns and features under geometric transformations. Indeed, it has been shown that group invariance can largely improve deep learning performances in practice, where such transformations are very common. This research studies affine invariance on continuous-domain convolutional neural networks. Despite other research considering isometric invariance or similarity invariance, we focus on the full structure of affine transforms generated by the generalized linear group $\mathrm{GL}_2(\mathbb{R})$. We introduce a new criterion to assess the similarity of two input signals under affine transformations. Then, unlike conventional methods that involve solving complex optimization problems on the Lie group $G_2$, we analyze the convolution of lifted signals and compute the corresponding integration over $G_2$. In sum, our research could eventually extend the scope of geometrical transformations that practical deep-learning pipelines can handle.

Missing Value Imputation for Multi-attribute Sensor Data Streams via Message Propagation (Extended Version)

  • paper_url: http://arxiv.org/abs/2311.07344
  • repo_url: https://github.com/xli-2020/mpin
  • paper_authors: Xiao Li, Huan Li, Hua Lu, Christian S. Jensen, Varun Pandey, Volker Markl
  • for: 用于替代感知数据流中缺失值的快速和高效方法。
  • methods: 提出了一种消息协议推广网络(MPIN),可以在一个时间窗口内恢复缺失数据实例的值。同时,我们还提出了一种连续替换机制,包括数据更新和模型更新机制,以便MPIN可以在实时应用中进行连续替换。
  • results: MPIN可以在多个实际数据集上表现出较高的替换精度和效率,并且连续替换机制可以保证MPIN的高效性和准确性。
    Abstract Sensor data streams occur widely in various real-time applications in the context of the Internet of Things (IoT). However, sensor data streams feature missing values due to factors such as sensor failures, communication errors, or depleted batteries. Missing values can compromise the quality of real-time analytics tasks and downstream applications. Existing imputation methods either make strong assumptions about streams or have low efficiency. In this study, we aim to accurately and efficiently impute missing values in data streams that satisfy only general characteristics in order to benefit real-time applications more widely. First, we propose a message propagation imputation network (MPIN) that is able to recover the missing values of data instances in a time window. We give a theoretical analysis of why MPIN is effective. Second, we present a continuous imputation framework that consists of data update and model update mechanisms to enable MPIN to perform continuous imputation both effectively and efficiently. Extensive experiments on multiple real datasets show that MPIN can outperform the existing data imputers by wide margins and that the continuous imputation framework is efficient and accurate.
    摘要 仪器数据流广泛存在在互联网东西(IoT)中的实时应用中。然而,仪器数据流中存在缺失值,这些缺失值可能由仪器故障、通信错误或电池耗尽等因素引起。缺失值会下降实时分析任务和下游应用的质量。现有的填充方法都有一定的假设,或者效率低下。在这个研究中,我们想要准确地和高效地填充数据流中缺失值,以便更广泛地应用于实时应用。首先,我们提出了一种消息传播填充网络(MPIN),可以在时间窗口中重建缺失的数据实例。我们给出了MPIN的理论分析,解释了它的效果。其次,我们提出了一种连续填充框架,该框架包括数据更新和模型更新机制,以便MPIN可以在实时中进行连续填充,同时保证效果和效率。在多个实际数据集上进行了广泛的实验,我们发现MPIN可以在现有数据填充器的基础上准确地和高效地填充数据流中的缺失值,并且连续填充框架可以保证MPIN的高效性和准确性。
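A much-simplified, non-neural sketch of propagation-style imputation within a time window (for intuition only; this is not the MPIN architecture): build a nearest-neighbour graph over the instances and fill each missing entry with a similarity-weighted average of its neighbours' observed values.

```python
# Hedged sketch: k-NN value propagation for missing entries in a window.
import numpy as np

def knn_impute_window(X, k=5):
    """X: (n_instances, n_attributes) array with np.nan marking missing values."""
    X = X.copy()
    filled = np.where(np.isnan(X), np.nanmean(X, axis=0), X)   # crude column-mean init
    d = np.linalg.norm(filled[:, None] - filled[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    for i, j in zip(*np.where(np.isnan(X))):
        nbrs = np.argsort(d[i])[:k]
        w = 1.0 / (d[i, nbrs] + 1e-8)
        X[i, j] = np.average(filled[nbrs, j], weights=w)        # "message" from neighbours
    return X

X = np.array([[1.0, 2.0], [1.1, np.nan], [5.0, 6.0], [np.nan, 6.2]])
print(knn_impute_window(X, k=2))
```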

Fine-Tuning the Retrieval Mechanism for Tabular Deep Learning

  • paper_url: http://arxiv.org/abs/2311.07343
  • repo_url: None
  • paper_authors: Felix den Breejen, Sangmin Bae, Stephen Cha, Tae-Young Kim, Seoung Hyun Koh, Se-Young Yun
  • for: 提高 tabular deep learning 的表现
  • methods: 使用召回机制,特别是在练习 TabPFN 模型的 fine-tuning 阶段
  • results: 在我们的实验中,使用召回机制和大量预训练可以明显超越现有方法,这些发现表明将召回机制融合到预训练和传输学习方案中可以提升 tabular deep learning 的表现。I hope that helps! Let me know if you have any other questions.
    Abstract While interests in tabular deep learning has significantly grown, conventional tree-based models still outperform deep learning methods. To narrow this performance gap, we explore the innovative retrieval mechanism, a methodology that allows neural networks to refer to other data points while making predictions. Our experiments reveal that retrieval-based training, especially when fine-tuning the pretrained TabPFN model, notably surpasses existing methods. Moreover, the extensive pretraining plays a crucial role to enhance the performance of the model. These insights imply that blending the retrieval mechanism with pretraining and transfer learning schemes offers considerable potential for advancing the field of tabular deep learning.
    摘要 而Tabular深度学习的兴趣在过去几年得到了广泛的关注,但是传统的树状模型仍然在深度学习方法之上表现更好。为了减少这个性能差距,我们 explore了一种创新的引用机制,即让神经网络在预测时引用其他数据点。我们的实验表明,引用基本训练,特别是在使用预训练TabPFN模型进行细化训练时,显著超过了现有方法。此外,广泛的预训练也对模型性能产生了重要的影响。这些发现表明,将引用机制与预训练和传输学习策略结合起来可以为表格深度学习领域带来显著的进步。
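The retrieval mechanism can be illustrated in a plain tabular pipeline (a hedged sketch, not the authors' TabPFN fine-tuning setup): each row is augmented with summary statistics of its retrieved nearest training rows before a downstream model is fitted.

```python
# Hedged sketch: k-NN retrieval features for tabular prediction.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

nn = NearestNeighbors(n_neighbors=8).fit(X_tr)

def with_retrieval(X_query):
    _, idx = nn.kneighbors(X_query)
    label_ctx = y_tr[idx].mean(axis=1, keepdims=True)   # label context from retrieved rows
    feat_ctx = X_tr[idx].mean(axis=1)                   # feature context from retrieved rows
    return np.hstack([X_query, feat_ctx, label_ctx])

# Note: on the training set each row retrieves itself; a careful setup would
# exclude the query row from its own neighbourhood to avoid label leakage.
clf = LogisticRegression(max_iter=1000).fit(with_retrieval(X_tr), y_tr)
print("accuracy with retrieval features:", round(clf.score(with_retrieval(X_te), y_te), 3))
```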

DAGC: Data-Volume-Aware Adaptive Sparsification Gradient Compression for Distributed Machine Learning in Mobile Computing

  • paper_url: http://arxiv.org/abs/2311.07324
  • repo_url: None
  • paper_authors: Rongwei Lu, Yutong Jiang, Yinan Mao, Chen Tang, Bin Chen, Laizhong Cui, Zhi Wang
  • for: 实现分布式机器学习(Distributed Machine Learning,DML)在移动环境中获得更好的性能。
  • methods: 使用非均匀压缩法,对不同的worker分配不同的压缩比例,以应对非同一的数据分布和量。
  • results: 这项研究发现,对于非同一的数据分布和量,将不同的压缩比例分配给不同的worker可以提高 convergence rate,并且可以在受限的通信预算下进行优化。
    Abstract Distributed machine learning (DML) in mobile environments faces significant communication bottlenecks. Gradient compression has emerged as an effective solution to this issue, offering substantial benefits in environments with limited bandwidth and metered data. Yet, they encounter severe performance drop in non-IID environments due to a one-size-fits-all compression approach, which does not account for the varying data volumes across workers. Assigning varying compression ratios to workers with distinct data distributions and volumes is thus a promising solution. This study introduces an analysis of distributed SGD with non-uniform compression, which reveals that the convergence rate (indicative of the iterations needed to achieve a certain accuracy) is influenced by compression ratios applied to workers with differing volumes. Accordingly, we frame relative compression ratio assignment as an $n$-variables chi-square nonlinear optimization problem, constrained by a fixed and limited communication budget. We propose DAGC-R, which assigns the worker handling larger data volumes the conservative compression. Recognizing the computational limitations of mobile devices, we DAGC-A, which are computationally less demanding and enhances the robustness of the absolute gradient compressor in non-IID scenarios. Our experiments confirm that both the DAGC-A and DAGC-R can achieve better performance when dealing with highly imbalanced data volume distribution and restricted communication.
    摘要 分布式机器学习(DML)在移动环境中遇到了重要的通信瓶颈。梯度压缩已经出现为解决这个问题的有效解决方案,提供了有限的带宽和计量数据环境中的重要优点。然而,它们在非标一致环境中会导致严重的性能下降,因为不考虑工作者间数据量的差异。将不同数据分布和量的工作者分配不同的压缩比例是一种有前途的解决方案。本研究提出了分布式SGD中非均匀压缩的分析,发现压缩比率应用于不同数据分布和量的工作者会影响收敛率(表示需要达到某种精度的迭代次数)。因此,我们将压缩比率分配视为n变量的chi-方差非线性优化问题,受限于固定的通信预算。我们提出了DAGC-R,它将处理大量数据的工作者分配保守的压缩。认识到移动设备的计算限制,我们还提出了DAGC-A,它是计算较少的,并在非标一致情况下提高了绝对梯度压缩器的稳定性。我们的实验表明,DAGC-A和DAGC-R可以在面临高度不均衡数据量分布和限制通信的情况下表现更好。
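The general mechanism can be sketched with plain top-k sparsification and a simple proportional ratio rule (an assumption for illustration; the specific DAGC-R/DAGC-A assignment rules are not reproduced here):

```python
# Hedged sketch: data-volume-aware top-k gradient sparsification.
import numpy as np

def top_k_sparsify(grad, ratio):
    """Keep only the largest-magnitude `ratio` fraction of gradient entries."""
    k = max(1, int(ratio * grad.size))
    keep = np.argsort(np.abs(grad))[-k:]
    sparse = np.zeros_like(grad)
    sparse[keep] = grad[keep]
    return sparse

rng = np.random.default_rng(0)
data_volumes = np.array([10_000, 3_000, 500])     # non-IID worker data sizes
budget = 0.1                                      # average fraction of entries communicated

# Simple proportional assignment (an assumption): larger data volume -> milder compression
ratios = np.clip(budget * len(data_volumes) * data_volumes / data_volumes.sum(), 0.01, 1.0)

grads = [rng.normal(size=1000) for _ in data_volumes]
compressed = [top_k_sparsify(g, r) for g, r in zip(grads, ratios)]
aggregated = np.average(compressed, axis=0, weights=data_volumes)
print("per-worker ratios:", np.round(ratios, 3))
```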

A Voting Approach for Explainable Classification with Rule Learning

  • paper_url: http://arxiv.org/abs/2311.07323
  • repo_url: https://github.com/albertn7/modularrulelearning
  • paper_authors: Albert Nössig, Tobias Hell, Georg Moser
  • for: This paper investigates the application of rule learning methods in typical classification tasks, with a focus on providing explanations for the predictions made.
  • methods: The approach used in the paper is a voting method that combines rule learning and state-of-the-art methods to achieve comparable results with explanations.
  • results: The paper shows that the proposed approach outperforms ordinary rule learning methods and achieves results on a par with state-of-the-art outcomes, using a variety of benchmark data sets including a use case of significant interest to insurance industries.
    Abstract State-of-the-art results in typical classification tasks are mostly achieved by unexplainable machine learning methods, like deep neural networks, for instance. Contrarily, in this paper, we investigate the application of rule learning methods in such a context. Thus, classifications become based on comprehensible (first-order) rules, explaining the predictions made. In general, however, rule-based classifications are less accurate than state-of-the-art results (often significantly). As main contribution, we introduce a voting approach combining both worlds, aiming to achieve comparable results as (unexplainable) state-of-the-art methods, while still providing explanations in the form of deterministic rules. Considering a variety of benchmark data sets including a use case of significant interest to insurance industries, we prove that our approach not only clearly outperforms ordinary rule learning methods, but also yields results on a par with state-of-the-art outcomes.
    摘要 现代结果通常由不可解释的机器学习方法取得,如深度神经网络。然而,在本研究中,我们调查了规则学习方法的应用。因此,分类结果会基于可读的(第一类)规则,解释预测结果。然而,规则基分类通常比现代结果(经常significantly)精度较差。作为主要贡献,我们介绍了投票方法,结合这两个世界,尝试以相似的方式获得现代结果,同时仍提供可读的规则解释。使用多种标准资料集,包括一个有意义的保险业案例,我们证明了我们的方法不仅明显超过常规规则学习方法,而且也与现代结果相似。

An introduction to reinforcement learning for neuroscience

  • paper_url: http://arxiv.org/abs/2311.07315
  • repo_url: None
  • paper_authors: Kristopher T. Jensen
  • for: 本文主要是为了介绍基于强化学习的神经科学研究,包括 классиical temporal difference算法和深度强化学习方法,以及它们在实验神经科学中的应用。
  • methods: 本文使用的方法包括模型自由和模型基于的强化学习,以及DYNA和继承表示法。这些方法在机器学习和神经科学实验中都有着广泛的应用。
  • results: 本文提供了一个入门性的介绍,涵盖了强化学习的基本理论和神经科学实验中的应用。同时,本文还提供了一些实际的例子,如meta-强化学习(Wang et al., 2018)和分布强化学习(Dabney et al., 2020),以及相关的代码和图像生成。
    Abstract Reinforcement learning has a rich history in neuroscience, from early work on dopamine as a reward prediction error signal for temporal difference learning (Schultz et al., 1997) to recent work suggesting that dopamine could implement a form of 'distributional reinforcement learning' popularized in deep learning (Dabney et al., 2020). Throughout this literature, there has been a tight link between theoretical advances in reinforcement learning and neuroscientific experiments and findings. As a result, the theories describing our experimental data have become increasingly complex and difficult to navigate. In this review, we cover the basic theory underlying classical work in reinforcement learning and build up to an introductory overview of methods used in modern deep reinforcement learning that have found applications in systems neuroscience. We start with an overview of the reinforcement learning problem and classical temporal difference algorithms, followed by a discussion of 'model-free' and 'model-based' reinforcement learning together with methods such as DYNA and successor representations that fall in between these two categories. Throughout these sections, we highlight the close parallels between the machine learning methods and related work in both experimental and theoretical neuroscience. We then provide an introduction to deep reinforcement learning with examples of how these methods have been used to model different learning phenomena in the systems neuroscience literature, such as meta-reinforcement learning (Wang et al., 2018) and distributional reinforcement learning (Dabney et al., 2020). Code that implements the methods discussed in this work and generates the figures is also provided.
    摘要 强化学习有着丰富的历史在神经科学中,从早期关于 dopamine 作为时间差值学习的奖励预测错误信号 (Schultz et al., 1997) 到最近的研究表明 dopamine 可能实现一种 '分布式强化学习' 的概念,它在深度学习中受欢迎 (Dabney et al., 2020)。在这些文献中,有着神经科学实验和发现的紧密联系,因此理论上的进步和实验的发现相互启发。在这篇文章中,我们将讲解强化学习的基本理论,从经典工作开始,然后推导到现代深度强化学习的方法,它们在系统神经科学中找到了应用。我们开始于强化学习问题的概述和经典时间差值算法,然后讲解 '模型自由' 和 '模型基于' 强化学习,以及 DYNA 和继承表示法,它们在这两个类别之间存在。在这些部分中,我们强调了机器学习方法和相关的神经科学实验和理论之间的密切相互关系。然后,我们将介绍深度强化学习,并通过系统神经科学文献中的不同学习现象模型,如 meta-强化学习 (Wang et al., 2018) 和分布式强化学习 (Dabney et al., 2020) 来示例。我们还提供了实现这些方法和生成图表的代码。
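For readers new to the classical algorithms the review starts from, the TD(0) update is only a few lines; the reward prediction error delta below is the quantity classically linked to phasic dopamine:

```python
# TD(0) on a simple chain environment: delta = r + gamma * V(s') - V(s).
import numpy as np

n_states, alpha, gamma = 5, 0.1, 0.9
V = np.zeros(n_states)                       # state-value estimates

for episode in range(500):
    s = 0
    while s < n_states - 1:
        s_next = s + 1                       # deterministic step along the chain
        r = 1.0 if s_next == n_states - 1 else 0.0
        delta = r + gamma * V[s_next] - V[s] # reward prediction error
        V[s] += alpha * delta                # TD(0) update
        s = s_next

print(np.round(V, 3))                        # values decay geometrically with distance to reward
```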

A probabilistic forecast methodology for volatile electricity prices in the Australian National Electricity Market

  • paper_url: http://arxiv.org/abs/2311.07289
  • repo_url: None
  • paper_authors: Cameron Cornell, Nam Trong Dinh, S. Ali Pourmousavi
  • for: 该论文主要探讨了澳大利亚南澳地区电力市场中的价格波动问题,并提出了一种可靠的预测方法。
  • methods: 该论文使用了峰值过滤(spike filtration)和一些后处理步骤,包括以分位数回归作为集成工具,以提高预测精度。
  • results: comparing with各个模型的预测结果, ensemble 模型在预测澳大利亚南澳地区电力市场价格中显示出了更高的准确率和更好的适应性。
    Abstract The South Australia region of the Australian National Electricity Market (NEM) displays some of the highest levels of price volatility observed in modern electricity markets. This paper outlines an approach to probabilistic forecasting under these extreme conditions, including spike filtration and several post-processing steps. We propose using quantile regression as an ensemble tool for probabilistic forecasting, with our combined forecasts achieving superior results compared to all constituent models. Within our ensemble framework, we demonstrate that averaging models with varying training length periods leads to a more adaptive model and increased prediction accuracy. The applicability of the final model is evaluated by comparing our median forecasts with the point forecasts available from the Australian NEM operator, with our model outperforming these NEM forecasts by a significant margin.
    摘要 南澳大利用澳大电力市场(NEM)的区域显示出一些最高的价格波动,这篇论文描述了在这些极端情况下的抽象预测方法,包括峰值筛选和一些后处理步骤。我们建议使用量论回归作为ensemble工具,我们的组合预测达到了所有组件模型的超越成果。在我们的集成框架中,我们证明了平均模型各种训练时间期间的变化,导致更适应的模型和提高预测精度。我们对最终模型的可行性进行了评估,通过与澳大电力市场运营商提供的点预测相比较,我们的模型在重要的margin上超越了这些预测。
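The ensemble step described above, quantile regression applied to the constituent models' forecasts (often called quantile regression averaging), can be sketched as follows with synthetic stand-ins for the price series and point forecasts:

```python
# Hedged sketch: quantile regression averaging over constituent point forecasts.
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(0)
n = 1000
price = rng.gamma(2.0, 30.0, size=n)                        # volatile, right-skewed prices
forecasts = np.column_stack([price + rng.normal(0, 15, n),  # constituent model A
                             price + rng.normal(0, 25, n),  # constituent model B
                             price + rng.normal(0, 40, n)]) # constituent model C

for q in (0.1, 0.5, 0.9):
    qra = QuantileRegressor(quantile=q, alpha=1e-4, solver="highs").fit(forecasts, price)
    coverage = float(np.mean(price <= qra.predict(forecasts)))
    print(f"q={q:.1f}: empirical coverage = {coverage:.2f}")
```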

Learning Arithmetic Formulas in the Presence of Noise: A General Framework and Applications to Unsupervised Learning

  • paper_url: http://arxiv.org/abs/2311.07284
  • repo_url: None
  • paper_authors: Pritam Chandra, Ankit Garg, Neeraj Kayal, Kunal Mittal, Tanmay Sinha
  • for: 这 paper 是为了设计高效的无监督学习问题解决方案,如混合 Gaussian 和子空间 clustering。
  • methods: 这 paper 使用一种基于 meta 算法的框架,学习受噪的 arithmetic circuits。这基于 Garg, Kayal 和 Saha (FOCS 20) 的 latest work,但是它们不受噪。关键的一部分是一种高效的 Robust Vector Space Decomposition 算法。
  • results: 作者表明,当某些矩阵有足够大的最小非零特征值时,他们的 meta 算法会工作良好。作者还推测,这种condition 在简化版问题上成立,因此他们的框架可以在简化设定下提供高效的算法。
    Abstract We present a general framework for designing efficient algorithms for unsupervised learning problems, such as mixtures of Gaussians and subspace clustering. Our framework is based on a meta algorithm that learns arithmetic circuits in the presence of noise, using lower bounds. This builds upon the recent work of Garg, Kayal and Saha (FOCS 20), who designed such a framework for learning arithmetic circuits without any noise. A key ingredient of our meta algorithm is an efficient algorithm for a novel problem called Robust Vector Space Decomposition. We show that our meta algorithm works well when certain matrices have sufficiently large smallest non-zero singular values. We conjecture that this condition holds for smoothed instances of our problems, and thus our framework would yield efficient algorithms for these problems in the smoothed setting.
    摘要 我们提出了一个通用的框架,用于设计高效的无监督学习问题的算法,如混合 Gaussian 和 subspace clustering。我们的框架基于一个 meta 算法,可以在噪声存在的情况下学习加法Circuit,使用下界。这基于最近的 Garg、Kayal 和 Saha (FOCS 20) 的工作,他们设计了这样的框架,但不含噪声。我们的 meta 算法的关键组成部分是一种高效的 Robust Vector Space Decomposition 算法。我们表明,当某些矩阵有足够大的最小非零特征值时,我们的 meta 算法能够工作良好。我们推测,这种条件在简化的问题上是成立的,因此我们的框架可以在简化 Setting 中提供高效的算法。

Predictive and Prescriptive Analytics for Multi-Site Modeling of Frail and Elderly Patient Services

  • paper_url: http://arxiv.org/abs/2311.07283
  • repo_url: None
  • paper_authors: Elizabeth Williams, Daniel Gartner, Paul Harper
  • for: The paper aims to assess how predictive and prescriptive analytical methods can address operational challenges in healthcare, specifically in the context of planning resource capacities for frail and elderly inpatient wards.
  • methods: The paper uses a combination of predictive and prescriptive analytical methods, including Classification and Regression Trees (CART) analysis and deterministic and two-stage stochastic programs, to analyze clinical and demographic patient attributes and predict length of stay.
  • results: The linked methodologies provided different but similar results compared to using averages, capturing a more realistic real-world variation in patient length of stay. The results suggest that healthcare managers should consider using predictive and prescriptive models to make more informed decisions, rather than relying on averages.
    Abstract Recent research has highlighted the potential of linking predictive and prescriptive analytics. However, it remains widely unexplored how both paradigms could benefit from one another to address today's major challenges in healthcare. One of these is smarter planning of resource capacities for frail and elderly inpatient wards, addressing the societal challenge of an aging population. Frail and elderly patients typically suffer from multimorbidity and require more care while receiving medical treatment. The aim of this research is to assess how various predictive and prescriptive analytical methods, both individually and in tandem, contribute to addressing the operational challenges within an area of healthcare that is growing in demand. Clinical and demographic patient attributes are gathered from more than 165,000 patient records and used to explain and predict length of stay. To that extent, we employ Classification and Regression Trees (CART) analysis to establish this relationship. On the prescriptive side, deterministic and two-stage stochastic programs are developed to determine how to optimally plan for beds and ward staff with the objective to minimize cost. Furthermore, the two analytical methodologies are linked by generating demand for the prescriptive models using the CART groupings. The results show the linked methodologies provided different but similar results compared to using averages and in doing so, captured a more realistic real-world variation in the patient length of stay. Our research reveals that healthcare managers should consider using predictive and prescriptive models to make more informed decisions. By combining predictive and prescriptive analytics, healthcare managers can move away from relying on averages and incorporate the unique characteristics of their patients to create more robust planning decisions, mitigating risks caused by variations in demand.
    摘要 To assess the operational challenges within inpatient wards, we gathered clinical and demographic patient attributes from over 165,000 patient records and used Classification and Regression Trees (CART) analysis to explain and predict length of stay. We also developed deterministic and two-stage stochastic programs to determine how to optimally plan for beds and ward staff to minimize cost.The linked methodologies provided different but similar results compared to using averages, capturing a more realistic real-world variation in patient length of stay. Our research reveals that healthcare managers should consider using predictive and prescriptive models to make more informed decisions, moving away from relying on averages and incorporating the unique characteristics of their patients to create more robust planning decisions. By linking predictive and prescriptive analytics, healthcare managers can mitigate risks caused by variations in demand and create a more sustainable and effective healthcare system.

Towards Bounding Causal Effects under Markov Equivalence

  • paper_url: http://arxiv.org/abs/2311.07259
  • repo_url: None
  • paper_authors: Alexis Bellot
  • for: 这 paper 的目的是解决非观察数据下的 causal effect 预测问题,即 determining non-trivial bounds on causal effects induced by the data.
  • methods: 这 paper 使用了一种名为 Partial Ancestral Graph 的 less informative structure,并提供了一种系统的算法来 derive bounds on causal effects ,可以从 observational data 中学习。
  • results: 这 paper 的结果是提供了一种可 analytically 计算的 bounds on causal effects,它们可以在 less informative 的 causal diagram 下进行计算。
    Abstract Predicting the effect of unseen interventions is a fundamental research question across the data sciences. It is well established that, in general, such questions cannot be answered definitively from observational data, e.g., as a consequence of unobserved confounding. A generalization of this task is to determine non-trivial bounds on causal effects induced by the data, also known as the task of partial causal identification. In the literature, several algorithms have been developed for solving this problem. Most, however, require a known parametric form or a fully specified causal diagram as input, which is usually not available in practical applications. In this paper, we assume as input a less informative structure known as a Partial Ancestral Graph, which represents a Markov equivalence class of causal diagrams and is learnable from observational data. In this more "data-driven" setting, we provide a systematic algorithm to derive bounds on causal effects that can be computed analytically.
    摘要 预测未见的干预效果是资料科学中的基本研究问题。已经证明,在一般情况下,这些问题无法从观察数据中确定答案,例如因为隐藏的共组因素。一个这个任务的扩展是决定观察数据中的非重要效果 bound,也就是partial causal identification的任务。在文献中,有许多算法用于解决这个问题,但大多数需要知道的 parametric form 或者完全的 causal diagram 作为输入,这通常不是实际应用中的情况。在这篇文章中,我们假设输入的是一个 less informative 的结构,known as Partial Ancestral Graph,这是一个可以从观察数据学习的 Markov equivalence class of causal diagrams。在这个 "data-driven" 的设定下,我们提供了一个系统的算法,可以分析方式 compute analytically bounds on causal effects。

Error Analysis of Option Pricing via Deep PDE Solvers: Empirical Study

  • paper_url: http://arxiv.org/abs/2311.07231
  • repo_url: None
  • paper_authors: Rawin Assabumrungrat, Kentaro Minami, Masanori Hirano
  • for: 这个论文的目的是为了研究深度学习基于PDE解决高维Option价值问题的可scalability和实用性。
  • methods: 这个论文使用了深度学习基于PDE的方法来解决高维Option价值问题,并进行了对比性试验来评估这些方法的实际性和可scalability。
  • results: 研究发现了三种主要的错误来源:(1)目标选项和下一个资产的规定错误,(2)资产模型仿真方法引起的错误,(3)神经网络训练过程中的错误。这些错误的影响都被细分分析了。研究发现DBSDE方法在性能和可靠性方面较为出色,而其他方法则具有较强的sensitivity性。此外,研究还发现了计算资源的尺度关系,即batch size和时间步长的平方根与方法性能之间存在负相关性。
    Abstract Option pricing, a fundamental problem in finance, often requires solving non-linear partial differential equations (PDEs). When dealing with multi-asset options, such as rainbow options, these PDEs become high-dimensional, leading to challenges posed by the curse of dimensionality. While deep learning-based PDE solvers have recently emerged as scalable solutions to this high-dimensional problem, their empirical and quantitative accuracy remains not well-understood, hindering their real-world applicability. In this study, we aimed to offer actionable insights into the utility of Deep PDE solvers for practical option pricing implementation. Through comparative experiments, we assessed the empirical performance of these solvers in high-dimensional contexts. Our investigation identified three primary sources of errors in Deep PDE solvers: (i) errors inherent in the specifications of the target option and underlying assets, (ii) errors originating from the asset model simulation methods, and (iii) errors stemming from the neural network training. Through ablation studies, we evaluated the individual impact of each error source. Our results indicate that the Deep BSDE method (DBSDE) is superior in performance and exhibits robustness against variations in option specifications. In contrast, some other methods are overly sensitive to option specifications, such as time to expiration. We also find that the performance of these methods improves inversely proportional to the square root of batch size and the number of time steps. This observation can aid in estimating computational resources for achieving desired accuracies with Deep PDE solvers.
    摘要 期权定价是金融中的一个基本问题,通常需要求解非线性偏微分方程(PDE)。在处理多资产期权(如彩虹期权)时,这些PDE变为高维问题,带来维度灾难的挑战。近年来,基于深度学习的PDE求解器作为应对这一高维问题的可扩展方案出现,但其经验精度与定量精度仍未被充分理解,限制了其实际应用。本研究旨在为实际期权定价提供关于深度PDE求解器实用性的可操作见解。通过对比实验,我们评估了这些求解器在高维情形下的经验性能,并识别出三类主要误差来源:(i)目标期权及标的资产设定中固有的误差,(ii)资产模型模拟方法引入的误差,(iii)神经网络训练带来的误差。借助消融实验,我们评估了每类误差来源的单独影响。结果表明,Deep BSDE 方法(DBSDE)性能更优,并对期权设定(如到期时间)的变化表现稳健,而其他一些方法则对期权设定过于敏感。我们还发现,这些方法的误差随批量大小与时间步数的平方根反比减小,这一观察有助于估计使用深度PDE求解器达到目标精度所需的计算资源。
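
As a back-of-the-envelope illustration of the reported scaling (error roughly proportional to the inverse square root of batch size times the number of time steps), the sketch below estimates how much those two knobs would have to grow to reach a target error. The baseline numbers and the even split between the two factors are hypothetical, not taken from the paper.

```python
import math

def required_batch_and_steps(baseline_error: float,
                             baseline_batch: int,
                             baseline_steps: int,
                             target_error: float) -> tuple[int, int]:
    """Estimate the batch size and number of time steps needed to hit a target
    error, assuming error decays like 1 / sqrt(batch_size * n_steps), which is
    the empirical scaling reported in the abstract. Splitting the growth evenly
    between the two factors is an illustrative choice."""
    # error ~ C / sqrt(B * N)  =>  the product B * N must grow by (baseline/target)^2
    growth = (baseline_error / target_error) ** 2
    factor = math.sqrt(growth)  # spread the growth evenly over B and N
    return math.ceil(baseline_batch * factor), math.ceil(baseline_steps * factor)

# Hypothetical numbers: halving the error needs roughly 4x the product B * N.
print(required_batch_and_steps(baseline_error=0.02, baseline_batch=256,
                               baseline_steps=50, target_error=0.01))
```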

Neural General Circulation Models

  • paper_url: http://arxiv.org/abs/2311.07222
  • repo_url: https://github.com/ananya2001gupta/Bitcoin-Price-Prediction-using-AI-ML.
  • paper_authors: Dmitrii Kochkov, Janni Yuval, Ian Langmore, Peter Norgaard, Jamie Smith, Griffin Mooers, James Lottes, Stephan Rasp, Peter Düben, Milan Klöwer, Sam Hatfield, Peter Battaglia, Alvaro Sanchez-Gonzalez, Matthew Willson, Michael P. Brenner, Stephan Hoyer
  • for: 这篇论文的目的是开发一种新的大气环流模型,将数值方法与机器学习相结合,以提高天气预报和气候模拟的准确性与效率。
  • methods: 该模型(NeuralGCM)将可微分的大尺度动力学求解器与机器学习组件端到端结合,用学习到的模块替代传统的小尺度过程参数化。
  • results: 结果表明,NeuralGCM 在1-10天的确定性天气预报上可与最好的机器学习模型相当,在1-15天的集合预报上可与欧洲中期天气预报中心(ECMWF)的集合预报相当。在给定海表温度的条件下,该模型还能在数十年尺度上准确跟踪全球平均温度等气候指标,并在140 km分辨率的气候模拟中呈现出真实的热带气旋频率与路径等涌现现象,同时相较传统环流模型可节省数个数量级的计算量。
    Abstract General circulation models (GCMs) are the foundation of weather and climate prediction. GCMs are physics-based simulators which combine a numerical solver for large-scale dynamics with tuned representations for small-scale processes such as cloud formation. Recently, machine learning (ML) models trained on reanalysis data achieved comparable or better skill than GCMs for deterministic weather forecasting. However, these models have not demonstrated improved ensemble forecasts, or shown sufficient stability for long-term weather and climate simulations. Here we present the first GCM that combines a differentiable solver for atmospheric dynamics with ML components, and show that it can generate forecasts of deterministic weather, ensemble weather and climate on par with the best ML and physics-based methods. NeuralGCM is competitive with ML models for 1-10 day forecasts, and with the European Centre for Medium-Range Weather Forecasts ensemble prediction for 1-15 day forecasts. With prescribed sea surface temperature, NeuralGCM can accurately track climate metrics such as global mean temperature for multiple decades, and climate forecasts with 140 km resolution exhibit emergent phenomena such as realistic frequency and trajectories of tropical cyclones. For both weather and climate, our approach offers orders of magnitude computational savings over conventional GCMs. Our results show that end-to-end deep learning is compatible with tasks performed by conventional GCMs, and can enhance the large-scale physical simulations that are essential for understanding and predicting the Earth system.
    摘要 大气环流模型(GCM)是天气和气候预测的基础。GCM 是基于物理的模拟器,将大尺度动力学的数值求解器与针对云形成等小尺度过程调校的参数化相结合。近年来,基于再分析数据训练的机器学习(ML)模型在确定性天气预报上达到了与 GCM 相当甚至更好的水平,但它们尚未展示出更好的集合预报能力,也未表现出足以支撑长期天气与气候模拟的稳定性。本文提出了首个将可微分大气动力学求解器与 ML 组件相结合的 GCM(NeuralGCM),并证明它在确定性天气、集合天气和气候模拟上都能与最好的 ML 方法和物理方法相当。NeuralGCM 在1-10天预报上可与 ML 模型竞争,在1-15天预报上可与欧洲中期天气预报中心的集合预报竞争。在给定海表温度的条件下,NeuralGCM 能够在数十年尺度上准确跟踪全球平均温度等气候指标,其140 km分辨率的气候模拟还呈现出热带气旋的真实频率与路径等涌现现象。无论天气还是气候应用,我们的方法都比传统 GCM 节省数个数量级的计算量。结果表明,端到端深度学习可以胜任传统 GCM 所承担的任务,并能增强理解和预测地球系统所必需的大尺度物理模拟。
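
The hybrid idea, a differentiable dynamical core stepped forward together with a learned correction for unresolved processes, can be caricatured in a few lines. The toy below uses a 1-D advection stand-in and random "learned" weights; it only illustrates the structure of the time step, not NeuralGCM itself.

```python
import numpy as np

def dynamics_step(state: np.ndarray, dt: float = 0.1) -> np.ndarray:
    """Toy 'dynamical core': upwind advection of a 1-D periodic field
    (a stand-in for the differentiable large-scale solver)."""
    return state - dt * (state - np.roll(state, 1))

class LearnedPhysics:
    """Stand-in for the ML component that corrects unresolved processes.
    In a real hybrid model its weights are trained end to end through the
    differentiable solver; here they are just random."""
    def __init__(self, n: int, rng: np.random.Generator):
        self.w = 0.01 * rng.standard_normal((n, n))
    def __call__(self, state: np.ndarray) -> np.ndarray:
        return np.tanh(self.w @ state)

rng = np.random.default_rng(0)
state = np.sin(np.linspace(0, 2 * np.pi, 64, endpoint=False))
correction = LearnedPhysics(64, rng)

for _ in range(100):  # roll the hybrid model forward: physics step + learned correction
    state = dynamics_step(state) + 0.1 * correction(state)
print(state[:5])
```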

Non-Contact Breathing Rate Detection Using Optical Flow

  • paper_url: http://arxiv.org/abs/2311.08426
  • repo_url: None
  • paper_authors: Robyn Maxwell, Timothy Hanley, Dara Golden, Adara Andonie, Joseph Lemley, Ashkan Parsi
  • for: 这篇论文的目的是研究一种基于运动检测算法(光流)的非接触式呼吸速率检测方法。
  • methods: 这篇论文使用了光流算法来成功测量呼吸速率,通过跟踪身体特定点的运动来确定呼吸速率。
  • results: 测试表明,胸部运动可以生成非常准确的信号,RMSE 为0.63;面部关键点在头部运动较小时也能给出可靠信号,但更容易受到头部/身体运动噪声的干扰。这些发现表明了光流在非接触呼吸速率检测中的潜力,并强调选择合适的跟踪点以提高准确性。
    Abstract Breathing rate is a vital health metric that is an invaluable indicator of the overall health of a person. In recent years, the non-contact measurement of health signals such as breathing rate has been a huge area of development, with a wide range of applications from telemedicine to driver monitoring systems. This paper presents an investigation into a method of non-contact breathing rate detection using a motion detection algorithm, optical flow. Optical flow is used to successfully measure breathing rate by tracking the motion of specific points on the body. In this study, the success of optical flow when using different sets of points is evaluated. Testing shows that both chest and facial movement can be used to determine breathing rate but to different degrees of success. The chest generates very accurate signals, with an RMSE of 0.63 on the tested videos. Facial points can also generate reliable signals when there is minimal head movement but are much more vulnerable to noise caused by head/body movements. These findings highlight the potential of optical flow as a non-invasive method for breathing rate detection and emphasize the importance of selecting appropriate points to optimize accuracy.
    摘要 呼吸速率是一项重要的健康指标,是反映个人整体健康状况的宝贵依据。近年来,呼吸速率等健康信号的非接触测量成为一个快速发展的领域,应用范围从远程医疗到驾驶员监测系统。本文研究了一种基于运动检测算法(光流)的非接触呼吸速率检测方法:通过跟踪身体上特定点的运动,利用光流成功测量呼吸速率。本研究评估了使用不同点集时光流方法的效果。测试表明,胸部和面部的运动都可用于确定呼吸速率,但效果不同:胸部能够生成非常准确的信号,在测试视频上的 RMSE 为0.63;面部关键点在头部运动较小时也能生成可靠信号,但更容易受到头部/身体运动带来的噪声影响。这些发现凸显了光流作为非侵入式呼吸速率检测方法的潜力,并强调了选择合适跟踪点以优化准确性的重要性。
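
A minimal version of the pipeline (track points with Lucas-Kanade optical flow, then read the breathing rate off the spectrum of their vertical motion) could look like the sketch below. The video path, the feature-tracking parameters, and the 0.1-0.7 Hz breathing band are illustrative choices, not the authors' settings.

```python
import cv2
import numpy as np

def breathing_rate_bpm(video_path: str, fps: float) -> float:
    """Estimate breathing rate by tracking feature points (e.g. on the chest)
    with Lucas-Kanade optical flow and taking the dominant frequency of their
    vertical motion. A simplified sketch, not the paper's exact pipeline."""
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    pts = cv2.goodFeaturesToTrack(prev, maxCorners=30, qualityLevel=0.01, minDistance=10)
    traces = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev, gray, pts, None)
        good = new_pts[status.flatten() == 1]          # keep successfully tracked points
        traces.append(good[:, 0, 1].mean())            # mean vertical coordinate per frame
        prev, pts = gray, good
    cap.release()
    signal = np.asarray(traces) - np.mean(traces)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    spectrum = np.abs(np.fft.rfft(signal))
    band = (freqs > 0.1) & (freqs < 0.7)               # plausible breathing band, ~6-42 breaths/min
    return 60.0 * freqs[band][np.argmax(spectrum[band])]

# print(breathing_rate_bpm("chest.mp4", fps=30))  # "chest.mp4" is a placeholder path
```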

On Elastic Language Models

  • paper_url: http://arxiv.org/abs/2311.07204
  • repo_url: https://github.com/RAIVNLab/MatFormer-OLMo
  • paper_authors: Chen Zhang, Benyou Wang, Dawei Song
  • for: 这篇论文的目的是提出一种弹性语言模型(ElasticLM),在请求量高度变化的场景下提供可控的计算弹性。
  • methods: 论文先用知识蒸馏将大型语言模型压缩为小模型,再为压缩模型引入弹性结构与弹性优化,使其延迟-性能权衡可以随请求流在可扩展、可控的计算量上实时调整;并针对信息检索场景给出了 ElasticDenser 与 ElasticRanker。
  • results: 实验结果表明,与一系列静态基线相比,ElasticLM 及其检索变体能够给出正确且有竞争力的结果,并且在带并发的在线模拟中能够随请求流的变化提供弹性的延迟-性能权衡。
    Abstract Large-scale pretrained language models have achieved compelling performance in a wide range of language understanding and information retrieval tasks. Knowledge distillation offers an opportunity to compress a large language model to a small one, in order to reach a reasonable latency-performance tradeoff. However, for scenarios where the number of requests (e.g., queries submitted to a search engine) is highly variant, the static tradeoff attained by the compressed language model might not always fit. Once a model is assigned with a static tradeoff, it could be inadequate in that the latency is too high when the number of requests is large or the performance is too low when the number of requests is small. To this end, we propose an elastic language model (ElasticLM) that elastically adjusts the tradeoff according to the request stream. The basic idea is to introduce a compute elasticity to the compressed language model, so that the tradeoff could vary on-the-fly along scalable and controllable compute. Specifically, we impose an elastic structure to enable ElasticLM with compute elasticity and design an elastic optimization to learn ElasticLM under compute elasticity. To serve ElasticLM, we apply an elastic schedule. Considering the specificity of information retrieval, we adapt ElasticLM to dense retrieval and reranking and present ElasticDenser and ElasticRanker respectively. Offline evaluation is conducted on a language understanding benchmark GLUE; and several information retrieval tasks including Natural Question, Trivia QA, and MS MARCO. The results show that ElasticLM along with ElasticDenser and ElasticRanker can perform correctly and competitively compared with an array of static baselines. Furthermore, online simulation with concurrency is also carried out. The results demonstrate that ElasticLM can provide elastic tradeoffs with respect to varying request stream.
    摘要 大规模预训练语言模型已在各种语言理解与信息检索任务中取得了令人瞩目的性能。知识蒸馏提供了将大模型压缩为小模型、以达到合理延迟-性能权衡的途径。然而,在请求数量(例如提交给搜索引擎的查询)高度变化的场景下,压缩模型得到的静态权衡未必始终适用:一旦模型被赋予固定的权衡,就可能在请求量大时延迟过高,或在请求量小时性能过低。为此,我们提出弹性语言模型(ElasticLM),根据请求流弹性地调整权衡。其基本思想是为压缩后的语言模型引入计算弹性,使权衡可以沿着可扩展、可控的计算量实时变化。具体而言,我们施加弹性结构使 ElasticLM 具备计算弹性,并设计弹性优化方法在计算弹性下训练 ElasticLM;在服务阶段则采用弹性调度。考虑到信息检索的特殊性,我们将 ElasticLM 适配到稠密检索与重排序,分别得到 ElasticDenser 与 ElasticRanker。我们在语言理解基准 GLUE 以及 Natural Questions、Trivia QA、MS MARCO 等信息检索任务上进行了离线评估,结果显示 ElasticLM、ElasticDenser 与 ElasticRanker 与一系列静态基线相比表现正确且具有竞争力。此外,带并发的在线模拟结果表明,ElasticLM 能够针对变化的请求流提供弹性的权衡。
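
The serving-side idea, trading depth (and hence quality) against latency as the request queue grows, can be sketched with a simple schedule. The thresholds and the linear interpolation below are placeholders; in the paper, the elastic structure and elastic optimization determine which sub-computations are actually usable.

```python
def choose_depth(queue_len: int, max_depth: int = 12,
                 low: int = 16, high: int = 256) -> int:
    """Pick how many layers of an elastically trained model to run, given the
    current request queue length. Thresholds are illustrative: light load uses
    full depth (best quality), heavy load uses a shallow exit (low latency)."""
    if queue_len <= low:
        return max_depth
    if queue_len >= high:
        return max(1, max_depth // 4)
    # interpolate linearly between full depth and quarter depth in between
    frac = (queue_len - low) / (high - low)
    return max(1, round(max_depth * (1 - 0.75 * frac)))

for q in (4, 64, 300):
    print(q, "queued requests ->", choose_depth(q), "layers")
```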

Input Convex LSTM: A Convex Approach for Fast Lyapunov-Based Model Predictive Control

  • paper_url: http://arxiv.org/abs/2311.07202
  • repo_url: None
  • paper_authors: Zihao Wang, Zhe Wu
  • for: 缓解 ICNN 中的梯度消失问题,并在保证闭环稳定性的同时加快基于神经网络的 MPC 的收敛速度
  • methods: 基于 ICNN 的原理提出输入凸 LSTM(Input Convex LSTM),利用输入凸性保证基于 Lyapunov 的 MPC 的闭环稳定性,同时缓解梯度消失
  • results: 在非线性化学反应器的仿真研究中,相比普通 RNN、普通 LSTM 和输入凸 RNN 基线,收敛时间分别缩短了46.7%、31.3%和20.2%,并缓解了梯度消失问题
    Abstract Leveraging Input Convex Neural Networks (ICNNs), ICNN-based Model Predictive Control (MPC) successfully attains globally optimal solutions by upholding convexity within the MPC framework. However, current ICNN architectures encounter the issue of vanishing gradients, which limits their ability to serve as deep neural networks for complex tasks. Additionally, the current neural network-based MPC, including conventional neural network-based MPC and ICNN-based MPC, faces slower convergence speed when compared to MPC based on first-principles models. In this study, we leverage the principles of ICNNs to propose a novel Input Convex LSTM for Lyapunov-based MPC, with the specific goal of reducing convergence time and mitigating the vanishing gradient problem while ensuring closed-loop stability. From a simulation study of a nonlinear chemical reactor, we observed a mitigation of vanishing gradient problem and a reduction in convergence time, with a percentage decrease of 46.7%, 31.3%, and 20.2% compared to baseline plain RNN, plain LSTM, and Input Convex Recurrent Neural Network, respectively.
    摘要 借助输入凸神经网络(ICNN),基于 ICNN 的模型预测控制(MPC)通过在 MPC 框架内保持凸性而获得全局最优解。然而,现有 ICNN 架构存在梯度消失问题,限制了其作为深层网络处理复杂任务的能力。此外,现有基于神经网络的 MPC(包括常规神经网络 MPC 和基于 ICNN 的 MPC)与基于第一性原理模型的 MPC 相比收敛速度更慢。在本研究中,我们借鉴 ICNN 的原理,为基于 Lyapunov 的 MPC 提出了一种新的输入凸 LSTM,目标是在保证闭环稳定性的同时缩短收敛时间并缓解梯度消失问题。在一个非线性化学反应器的仿真研究中,我们观察到梯度消失问题得到缓解、收敛时间缩短,相比普通 RNN、普通 LSTM 和输入凸循环神经网络基线分别减少了46.7%、31.3%和20.2%。
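
For reference, the sketch below shows how input convexity is typically enforced in a plain ICNN: non-negative weights on the hidden-to-hidden path plus convex, non-decreasing activations. It is a generic ICNN stack, not the paper's Input Convex LSTM, and the softplus reparameterization of the weights is one common choice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICNN(nn.Module):
    """Minimal input convex neural network: the output is convex in the input
    because the weights acting on previous-layer activations are kept
    non-negative and the activations are convex and non-decreasing."""
    def __init__(self, n_in: int, n_hidden: int, n_layers: int = 2):
        super().__init__()
        self.Wx = nn.ModuleList([nn.Linear(n_in, n_hidden) for _ in range(n_layers)])
        self.Wz = nn.ModuleList([nn.Linear(n_hidden, n_hidden, bias=False)
                                 for _ in range(n_layers - 1)])
        self.out = nn.Linear(n_hidden, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = F.softplus(self.Wx[0](x))
        for wx, wz in zip(self.Wx[1:], self.Wz):
            # softplus on the hidden-to-hidden weights keeps them non-negative,
            # which preserves convexity of z in x layer by layer
            z = F.softplus(wx(x) + F.linear(z, F.softplus(wz.weight)))
        return F.linear(z, F.softplus(self.out.weight))

model = ICNN(n_in=4, n_hidden=32)
x = torch.randn(8, 4)
print(model(x).shape)  # torch.Size([8, 1])
```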

A Consistent Diffusion-Based Algorithm for Semi-Supervised Graph Learning

  • paper_url: http://arxiv.org/abs/2311.07627
  • repo_url: None
  • paper_authors: Thomas Bonald, Nathan de Lara
  • for: 这篇论文研究半监督分类,目标是基于少数已知标签的节点(种子)为图中所有节点分配标签。
  • methods: 所研究的算法基于热扩散原理:种子节点的标签通过热传导在图上扩散,并以每个节点在平衡态下的温度作为各标签的得分函数。
  • results: 论文证明,除非在打分之前对平衡态温度进行中心化,否则该算法并不具有一致性;这一步不仅使算法在块模型上可被证明是一致的,还在真实图上带来了显著的性能提升。
    Abstract The task of semi-supervised classification aims at assigning labels to all nodes of a graph based on the labels known for a few nodes, called the seeds. One of the most popular algorithms relies on the principle of heat diffusion, where the labels of the seeds are spread by thermoconductance and the temperature of each node at equilibrium is used as a score function for each label. In this paper, we prove that this algorithm is not consistent unless the temperatures of the nodes at equilibrium are centered before scoring. This crucial step does not only make the algorithm provably consistent on a block model but brings significant performance gains on real graphs.
    摘要 半监督分类的任务是基于少数节点(称为种子)的已知标签,为图中所有节点分配标签。最流行的算法之一基于热扩散原理:种子的标签通过热传导在图上传播,并以每个节点在平衡态下的温度作为各标签的得分函数。本文证明,除非在打分之前对平衡态温度进行中心化,否则该算法并不具有一致性。这一关键步骤不仅使算法在块模型上可被证明是一致的,还在真实图上带来显著的性能提升。
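
A compact version of the Dirichlet/heat-diffusion classifier, including the centering step the paper argues is necessary, might look as follows. The Jacobi iteration and the per-label centering are our reading of the algorithm, intended only to make the idea concrete.

```python
import numpy as np

def diffusion_classifier(adj: np.ndarray, seeds: dict[int, int],
                         n_labels: int, n_iter: int = 200) -> np.ndarray:
    """Heat-diffusion label propagation. For each label, seed nodes of that
    label are held at temperature 1, other seeds at 0, and free nodes relax to
    the average of their neighbours. Scores are the equilibrium temperatures,
    centered per label before the argmax (our reading of the centering step)."""
    n = adj.shape[0]
    deg = adj.sum(axis=1).clip(min=1)
    temps = np.zeros((n, n_labels))
    free = np.array([i not in seeds for i in range(n)])
    for k in range(n_labels):
        t = np.zeros(n)
        for i, lab in seeds.items():
            t[i] = 1.0 if lab == k else 0.0
        for _ in range(n_iter):                # Jacobi iterations toward equilibrium
            t_new = adj @ t / deg
            t[free] = t_new[free]              # seed temperatures stay fixed
        temps[:, k] = t
    temps -= temps.mean(axis=0, keepdims=True) # center each label's temperatures
    return temps.argmax(axis=1)

# Tiny example: a path graph 0-1-2-3-4 with labelled endpoints.
A = np.diag(np.ones(4), 1); A = A + A.T
print(diffusion_classifier(A, seeds={0: 0, 4: 1}, n_labels=2))
```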

Quantum Machine Learning for Remote Sensing: Exploring potential and challenges

  • paper_url: http://arxiv.org/abs/2311.07626
  • repo_url: None
  • paper_authors: Artur Miroszewski, Jakub Nalepa, Bertrand Le Saux, Jakub Mielczarek
  • for: 这篇论文研究了量子机器学习(QML)在遥感领域的应用。
  • methods: 该论文使用了量子计算机来处理和分析遥感数据,并研究了量子优势在QML中对遥感领域的影响。
  • results: 研究发现,尽管核函数值集中现象会对量子计算机的运行时间产生不利影响,但这并不完全否定 QML 在遥感领域的潜在量子优势。
    Abstract The industry of quantum technologies is rapidly expanding, offering promising opportunities for various scientific domains. Among these emerging technologies, Quantum Machine Learning (QML) has attracted considerable attention due to its potential to revolutionize data processing and analysis. In this paper, we investigate the application of QML in the field of remote sensing. It is believed that QML can provide valuable insights for analysis of data from space. We delve into the common beliefs surrounding the quantum advantage in QML for remote sensing and highlight the open challenges that need to be addressed. To shed light on the challenges, we conduct a study focused on the problem of kernel value concentration, a phenomenon that adversely affects the runtime of quantum computers. Our findings indicate that while this issue negatively impacts quantum computer performance, it does not entirely negate the potential quantum advantage in QML for remote sensing.
    摘要 量子技术产业正在迅速扩张,为各个科学领域带来有前景的机遇。在这些新兴技术中,量子机器学习(QML)因其有望变革数据处理与分析方式而备受关注。本文研究了 QML 在遥感领域的应用。人们认为 QML 能为来自太空的数据分析提供有价值的洞察。我们审视了围绕 QML 在遥感中量子优势的常见观点,并指出了需要解决的开放挑战。为了阐明这些挑战,我们针对核函数值集中问题开展了研究,该现象会对量子计算机的运行时间产生不利影响。我们的结果表明,尽管这一问题会降低量子计算机的性能,但并不完全否定 QML 在遥感中的潜在量子优势。
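
Kernel value concentration is easy to reproduce in a toy setting: overlaps of unrelated high-dimensional quantum states cluster ever more tightly around 1/2^n, so fidelity-kernel entries become hard to distinguish with finitely many measurement shots. The simulation below uses Haar-like random states rather than encoded remote-sensing data, so it only illustrates the phenomenon, not the paper's experiments.

```python
import numpy as np

def random_state(n_qubits: int, rng: np.random.Generator) -> np.ndarray:
    """Haar-like random pure state: complex Gaussian vector, normalised."""
    dim = 2 ** n_qubits
    v = rng.standard_normal(dim) + 1j * rng.standard_normal(dim)
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
for n in (2, 4, 6, 8, 10):
    overlaps = [abs(np.vdot(random_state(n, rng), random_state(n, rng))) ** 2
                for _ in range(500)]
    # Fidelity-kernel entries for unrelated inputs concentrate around 1/2^n,
    # so resolving differences between data points needs ever more shots.
    print(f"{n} qubits: mean={np.mean(overlaps):.4f}  std={np.std(overlaps):.4f}")
```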

Activity Sparsity Complements Weight Sparsity for Efficient RNN Inference

  • paper_url: http://arxiv.org/abs/2311.07625
  • repo_url: None
  • paper_authors: Rishav Mukherji, Mark Schöne, Khaleelulla Khan Nazeer, Christian Mayr, Anand Subramoney
  • for: 这篇论文探讨深度学习模型中的激活稀疏性,以及将其与参数稀疏性结合并部署到神经形态计算设备上的可能性。
  • methods: 研究使用基于 GRU 的循环网络,通过设计使其激活稀疏(仅在必要时传递单元的活动),并考察激活稀疏与权重剪枝带来的参数稀疏之间的相互作用。
  • results: 在 Penn Treebank 语言建模任务上,该方法在保持困惑度(perplexity)低于60的同时,实现了最高 $20\times$ 的计算量削减,且激活稀疏与参数稀疏的收益可以相乘叠加。
    Abstract Artificial neural networks open up unprecedented machine learning capabilities at the cost of ever growing computational requirements. Sparsifying the parameters, often achieved through weight pruning, has been identified as a powerful technique to compress the number of model parameters and reduce the computational operations of neural networks. Yet, sparse activations, while omnipresent in both biological neural networks and deep learning systems, have not been fully utilized as a compression technique in deep learning. Moreover, the interaction between sparse activations and weight pruning is not fully understood. In this work, we demonstrate that activity sparsity can compose multiplicatively with parameter sparsity in a recurrent neural network model based on the GRU that is designed to be activity sparse. We achieve up to $20\times$ reduction of computation while maintaining perplexities below $60$ on the Penn Treebank language modeling task. This magnitude of reduction has not been achieved previously with solely sparsely connected LSTMs, and the language modeling performance of our model has not been achieved previously with any sparsely activated recurrent neural networks or spiking neural networks. Neuromorphic computing devices are especially good at taking advantage of the dynamic activity sparsity, and our results provide strong evidence that making deep learning models activity sparse and porting them to neuromorphic devices can be a viable strategy that does not compromise on task performance. Our results also drive further convergence of methods from deep learning and neuromorphic computing for efficient machine learning.
    摘要 人工神经网络以不断增长的计算需求为代价带来了前所未有的机器学习能力。参数稀疏化(通常通过权重剪枝实现)已被证明是压缩模型参数数量、减少神经网络计算操作的有力手段。然而,尽管稀疏激活在生物神经网络和深度学习系统中无处不在,它作为一种压缩技术尚未在深度学习中得到充分利用,而且稀疏激活与权重剪枝之间的相互作用也尚不清楚。在这项工作中,我们基于 GRU 设计了一个激活稀疏的循环神经网络模型,并证明激活稀疏性可以与参数稀疏性相乘叠加。在 Penn Treebank 语言建模任务上,我们在保持困惑度低于60的同时实现了最高 $20\times$ 的计算量削减。仅靠稀疏连接的 LSTM 此前从未达到过这一量级的削减,而我们模型的语言建模性能也从未被任何稀疏激活的循环神经网络或脉冲神经网络达到过。神经形态计算设备尤其擅长利用动态的激活稀疏性,我们的结果有力地表明,将深度学习模型做成激活稀疏并移植到神经形态设备上是一种不牺牲任务性能的可行策略。我们的结果也推动了深度学习与神经形态计算方法在高效机器学习方向上的进一步融合。
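
One way to picture activity sparsity in a recurrent layer is a delta-style scheme in which a unit only communicates its new value when it has changed by more than a threshold. The sketch below instruments a standard GRU cell this way; it is not the event-based GRU used in the paper, and the threshold is arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
cell = nn.GRUCell(input_size=32, hidden_size=128)
x = torch.randn(200, 1, 32)          # a sequence of 200 steps, batch of 1

h = torch.zeros(1, 128)              # the cell's own dense hidden state
h_sent = torch.zeros(1, 128)         # last values actually communicated downstream
threshold, active = 0.05, 0.0

for t in range(x.shape[0]):
    h = cell(x[t], h)                                # dense recurrent update
    changed = (h - h_sent).abs() > threshold
    h_sent = torch.where(changed, h, h_sent)         # only changed units are "sent"
    active += changed.float().mean().item()          # fraction of units that fired

print(f"average fraction of active units per step: {active / x.shape[0]:.2f}")
```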

Learning Symmetrization for Equivariance with Orbit Distance Minimization

  • paper_url: http://arxiv.org/abs/2311.07143
  • repo_url: https://github.com/tiendatnguyen-vision/orbit-symmetrize
  • paper_authors: Tien Dat Nguyen, Jinwoo Kim, Hongseok Yang, Seunghoon Hong
  • for: 将任意神经网络架构变换为具有给定群的对称性和等变性。
  • methods: 基于kim et al. (2023)和kaba et al. (2023)的提议,使用优化的损失函数来替换神经特征的群表示,以提高适用范围。
  • results: 在SO(2)图像分类任务和O(1, 3)任务上实验表明,我们的方法具有竞争力和更广泛的通用性。实现将于https://github.com/tiendatnguyen-vision/Orbit-symmetrize 上公开。
    Abstract We present a general framework for symmetrizing an arbitrary neural-network architecture and making it equivariant with respect to a given group. We build upon the proposals of Kim et al. (2023); Kaba et al. (2023) for symmetrization, and improve them by replacing their conversion of neural features into group representations, with an optimization whose loss intuitively measures the distance between group orbits. This change makes our approach applicable to a broader range of matrix groups, such as the Lorentz group O(1, 3), than these two proposals. We experimentally show our method's competitiveness on the SO(2) image classification task, and also its increased generality on the task with O(1, 3). Our implementation will be made accessible at https://github.com/tiendatnguyen-vision/Orbit-symmetrize.
    摘要 我们提出一个通用框架,用于对任意神经网络架构进行对称化,使其对给定的群等变。我们在 Kim et al. (2023) 与 Kaba et al. (2023) 的对称化方案基础上加以改进:用一个损失直观度量群轨道间距离的优化过程,取代它们将神经特征转换为群表示的步骤。这一改动使我们的方法可以适用于比这两种方案更广泛的矩阵群,例如洛伦兹群 O(1, 3)。我们在 SO(2) 图像分类任务上通过实验证明了方法的竞争力,并在 O(1, 3) 任务上展示了其更广的适用性。我们的实现将发布在 https://github.com/tiendatnguyen-vision/Orbit-symmetrize。
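
For contrast with the learned symmetrization studied in the paper, the simplest symmetrization scheme, averaging over a finite group orbit, fits in a few lines. The sketch below makes an arbitrary classifier invariant to 90-degree rotations; the paper's contribution is precisely to replace this kind of averaging (and the group-representation conversion of prior work) with an orbit-distance-minimizing optimization so that groups such as O(1, 3) can be handled.

```python
import torch
import torch.nn as nn

class C4Invariant(nn.Module):
    """Make an arbitrary image classifier invariant to 90-degree rotations by
    averaging its outputs over the group orbit (plain group averaging, not the
    paper's learned symmetrization)."""
    def __init__(self, base: nn.Module):
        super().__init__()
        self.base = base

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outs = [self.base(torch.rot90(x, k, dims=(2, 3))) for k in range(4)]
        return torch.stack(outs).mean(dim=0)

base = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))   # any backbone works
model = C4Invariant(base)
x = torch.randn(2, 1, 28, 28)
# The output is unchanged under a 90-degree rotation of the input.
print(torch.allclose(model(x), model(torch.rot90(x, 1, dims=(2, 3))), atol=1e-5))
```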

SABAF: Removing Strong Attribute Bias from Neural Networks with Adversarial Filtering

  • paper_url: http://arxiv.org/abs/2311.07141
  • repo_url: None
  • paper_authors: Jiazhi Li, Mahyar Khayatkhoei, Jiageng Zhu, Hanchen Xie, Mohamed E. Hussein, Wael AbdAlmageed
  • for: 避免神经网络在预测时依赖受保护属性(例如种族、性别、年龄)是发展公平可信 AI 的关键。尽管已有多种有效的属性偏差消除方法被提出,它们的局限性尚未得到充分探讨。为此,本文从理论和实验两方面揭示了现有方法在强偏差情形下的局限,并提出了一种能缓解该局限的新方法。
  • methods: 我们首先推导出一个一般性的、非平凡的信息论上界,表明现有属性偏差消除方法只有在数据集固有偏差较弱时才有效;随后推导出任何能够在任意偏差强度下消除属性偏差的方法必须满足的必要条件。受该条件启发,我们提出一种新方法,利用对抗目标直接在输入空间中滤除受保护属性,同时最大程度保留其他属性,且不需要任何特定的目标标签。
  • results: 我们在合成、图像和人口普查数据集上进行了大量实验,验证了所推导的理论上界及其实际意义,并评估了所提方法在消除强属性偏差方面的有效性。结果表明,该方法在强偏差和中等偏差两种情形下均达到了最先进的性能。
    Abstract Ensuring a neural network is not relying on protected attributes (e.g., race, sex, age) for prediction is crucial in advancing fair and trustworthy AI. While several promising methods for removing attribute bias in neural networks have been proposed, their limitations remain under-explored. To that end, in this work, we mathematically and empirically reveal the limitation of existing attribute bias removal methods in presence of strong bias and propose a new method that can mitigate this limitation. Specifically, we first derive a general non-vacuous information-theoretical upper bound on the performance of any attribute bias removal method in terms of the bias strength, revealing that they are effective only when the inherent bias in the dataset is relatively weak. Next, we derive a necessary condition for the existence of any method that can remove attribute bias regardless of the bias strength. Inspired by this condition, we then propose a new method using an adversarial objective that directly filters out protected attributes in the input space while maximally preserving all other attributes, without requiring any specific target label. The proposed method achieves state-of-the-art performance in both strong and moderate bias settings. We provide extensive experiments on synthetic, image, and census datasets, to verify the derived theoretical bound and its consequences in practice, and evaluate the effectiveness of the proposed method in removing strong attribute bias.
    摘要 确保神经网络不依赖受保护属性(例如种族、性别、年龄)进行预测,对推进公平可信的 AI 至关重要。尽管已提出多种有前景的属性偏差消除方法,其局限性仍未得到充分探讨。为此,本文从数学和实验两方面揭示了现有属性偏差消除方法在强偏差情形下的局限,并提出了能缓解该局限的新方法。具体而言,我们首先推导出一个一般性的、非平凡的信息论上界,刻画任何属性偏差消除方法的性能与偏差强度之间的关系,表明这些方法只有在数据集固有偏差相对较弱时才有效。接着,我们推导出任何能够在任意偏差强度下消除属性偏差的方法所必须满足的必要条件。受此条件启发,我们提出一种新方法,利用对抗目标直接在输入空间中滤除受保护属性,同时最大程度保留所有其他属性,且不需要任何特定的目标标签。所提方法在强偏差和中等偏差设定下均取得了最先进的性能。我们在合成、图像和人口普查数据集上进行了大量实验,以验证所推导的理论上界及其实际意义,并评估了所提方法在消除强属性偏差方面的有效性。
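
A schematic version of adversarial filtering, a filter that reconstructs the input while an adversary tries to read the protected attribute out of the filtered result, is sketched below. The toy data, network sizes, and the particular "push the adversary toward 0.5" confusion loss are our own choices, not the paper's objective.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 16
filter_net = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
adversary = nn.Sequential(nn.Linear(d, 8), nn.ReLU(), nn.Linear(8, 1))

opt_f = torch.optim.Adam(filter_net.parameters(), lr=1e-3)
opt_a = torch.optim.Adam(adversary.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(500):
    x = torch.randn(64, d)
    a = (x[:, 0] > 0).float().unsqueeze(1)   # toy protected attribute hidden in feature 0

    # adversary step: predict the protected attribute from the filtered input
    z = filter_net(x).detach()
    opt_a.zero_grad()
    bce(adversary(z), a).backward()
    opt_a.step()

    # filter step: stay close to the input while pushing the adversary to chance level
    opt_f.zero_grad()
    z = filter_net(x)
    recon = (z - x).pow(2).mean()
    confusion = bce(adversary(z), torch.full_like(a, 0.5))   # target = maximal uncertainty
    (recon + confusion).backward()
    opt_f.step()

with torch.no_grad():
    x = torch.randn(1024, d)
    a = (x[:, 0] > 0).float().unsqueeze(1)
    acc = ((adversary(filter_net(x)) > 0).float() == a).float().mean()
print(f"adversary accuracy on filtered inputs: {acc:.2f}  (0.5 means the attribute is hidden)")
```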

How to Do Machine Learning with Small Data? – A Review from an Industrial Perspective

  • paper_url: http://arxiv.org/abs/2311.07126
  • repo_url: None
  • paper_authors: Ivan Kraljevski, Yong Chul Ju, Dmitrij Ivanov, Constanze Tschöpe, Matthias Wolff
  • for: 本研究旨在探讨机器学习在小数据情况下的应用和工程应用。
  • methods: 本文提出了一种机器学习 formalism,并对小数据的定义、工业应用和机器学习方法进行了简要的介绍。
  • results: 本文介绍了五种机器学习小数据工业应用中的挑战,并对域表示和数据收集的考虑进行了概述。
    Abstract Artificial intelligence experienced a technological breakthrough in science, industry, and everyday life in the recent few decades. The advancements can be credited to the ever-increasing availability and miniaturization of computational resources that resulted in exponential data growth. However, because of the insufficient amount of data in some cases, employing machine learning in solving complex tasks is not straightforward or even possible. As a result, machine learning with small data experiences rising importance in data science and application in several fields. The authors focus on interpreting the general term of "small data" and their engineering and industrial application role. They give a brief overview of the most important industrial applications of machine learning and small data. Small data is defined in terms of various characteristics compared to big data, and a machine learning formalism was introduced. Five critical challenges of machine learning with small data in industrial applications are presented: unlabeled data, imbalanced data, missing data, insufficient data, and rare events. Based on those definitions, an overview of the considerations in domain representation and data acquisition is given along with a taxonomy of machine learning approaches in the context of small data.
    摘要 近几十年来,人工智能在科学、工业和日常生活中取得了技术性突破。这些进步归功于计算资源的日益普及和小型化,由此带来了数据的指数级增长。然而,在某些情况下,由于数据量不足,利用机器学习解决复杂任务并不容易,甚至不可行。因此,小数据机器学习在数据科学及多个领域的应用中变得日益重要。作者着重阐释了"小数据"这一概念及其在工程和工业应用中的角色,并简要概述了机器学习与小数据最重要的工业应用。文中相对于大数据从多个特征维度定义了小数据,并引入了一种机器学习形式化框架。文章提出了工业应用中小数据机器学习面临的五大关键挑战:无标签数据、类别不平衡数据、缺失数据、数据量不足以及罕见事件。基于这些定义,文章概述了领域表示与数据获取方面的考虑因素,并给出了小数据背景下机器学习方法的分类体系。

Novel models for fatigue life prediction under wideband random loads based on machine learning

  • paper_url: http://arxiv.org/abs/2311.07114
  • repo_url: None
  • paper_authors: Hong Sun, Yuanying Qiu, Jing Li, Jin Bai, Ming Peng
  • for: 预测宽带随机载荷下的疲劳寿命
  • methods: 使用三种机器学习模型,即支持向量机(SVM)、高斯过程回归(GPR)和人工神经网络(ANN),建立三种宽带疲劳寿命预测模型,并利用具有不同带宽参数的大量功率谱样本和多种与疲劳寿命相关的材料属性来提升模型的泛化能力
  • results: 对比传统频率域模型,新开发的机器学习模型具有更高的预测精度,其中人工神经网络模型在三种机器学习模型中表现最佳。
    Abstract Machine learning as a data-driven solution has been widely applied in the field of fatigue lifetime prediction. In this paper, three models for wideband fatigue life prediction are built based on three machine learning models, i.e. support vector machine (SVM), Gaussian process regression (GPR) and artificial neural network (ANN). The generalization ability of the models is enhanced by employing numerous power spectra samples with different bandwidth parameters and a variety of material properties related to fatigue life. Sufficient Monte Carlo numerical simulations demonstrate that the newly developed machine learning models are superior to the traditional frequency-domain models in terms of life prediction accuracy and the ANN model has the best overall performance among the three developed machine learning models.
    摘要 机器学习作为一种数据驱动方案,已被广泛应用于疲劳寿命预测领域。本文基于三种机器学习模型,即支持向量机(SVM)、高斯过程回归(GPR)和人工神经网络(ANN),建立了三种宽带疲劳寿命预测模型,并通过采用具有不同带宽参数的大量功率谱样本以及多种与疲劳寿命相关的材料属性来增强模型的泛化能力。充分的蒙特卡洛数值模拟表明,新建立的机器学习模型在寿命预测精度上优于传统的频域模型,其中人工神经网络模型在三种机器学习模型中综合表现最佳。
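
Under the stated setup (spectral/bandwidth features plus material properties in, log fatigue life out), the three model families can be compared with a few lines of scikit-learn. The synthetic data and hyperparameters below are placeholders for the paper's power spectra and materials.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
# Toy stand-in data: PSD/bandwidth features plus material parameters as inputs,
# log fatigue life as the target.
X = rng.normal(size=(400, 6))
y = 5.0 - 1.5 * X[:, 0] + 0.8 * X[:, 1] ** 2 + 0.1 * rng.normal(size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
models = {
    "SVM": make_pipeline(StandardScaler(), SVR(C=10.0)),
    "GPR": make_pipeline(StandardScaler(), GaussianProcessRegressor(alpha=1e-2)),
    "ANN": make_pipeline(StandardScaler(), MLPRegressor((64, 64), max_iter=2000, random_state=0)),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, "MAE on log-life:", round(mean_absolute_error(y_te, model.predict(X_te)), 3))
```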

Adversarial Purification for Data-Driven Power System Event Classifiers with Diffusion Models

  • paper_url: http://arxiv.org/abs/2311.07110
  • repo_url: None
  • paper_authors: Yuanbin Cheng, Koji Yamashita, Jim Follum, Nanpeng Yu
  • for: 本研究旨在提出一种有效的防御策略,以防止针对机器学习基于PMU数据的电力系统事件分类器的恶意攻击。
  • methods: 该方法包括两步:首先,向PMU数据中注入噪声;其次,使用预训练的神经网络消除所注入的噪声,同时去除对抗攻击引入的扰动。
  • results: 实验结果表明,提议的扩散模型基于的防御策略可以增强事件分类器在恶意攻击下的准确率,同时满足实时操作的需求。另外,理论分析表明,该方法可以减少PMU数据的欧几何距离,从而减少恶意攻击的影响。
    Abstract The global deployment of the phasor measurement units (PMUs) enables real-time monitoring of the power system, which has stimulated considerable research into machine learning-based models for event detection and classification. However, recent studies reveal that machine learning-based methods are vulnerable to adversarial attacks, which can fool the event classifiers by adding small perturbations to the raw PMU data. To mitigate the threats posed by adversarial attacks, research on defense strategies is urgently needed. This paper proposes an effective adversarial purification method based on the diffusion model to counter adversarial attacks on the machine learning-based power system event classifier. The proposed method includes two steps: injecting noise into the PMU data; and utilizing a pre-trained neural network to eliminate the added noise while simultaneously removing perturbations introduced by the adversarial attacks. The proposed adversarial purification method significantly increases the accuracy of the event classifier under adversarial attacks while satisfying the requirements of real-time operations. In addition, the theoretical analysis reveals that the proposed diffusion model-based adversarial purification method decreases the distance between the original and compromised PMU data, which reduces the impacts of adversarial attacks. The empirical results on a large-scale real-world PMU dataset validate the effectiveness and computational efficiency of the proposed adversarial purification method.
    摘要 相量测量单元(PMU)的全球部署使电力系统的实时监测成为可能,也激发了大量基于机器学习的事件检测与分类模型研究。然而,近期研究表明,基于机器学习的方法易受对抗攻击:只需在原始PMU数据上添加微小扰动,就能欺骗事件分类器。为缓解对抗攻击带来的威胁,亟需开展防御策略研究。本文提出了一种基于扩散模型的有效对抗净化方法,用于抵御针对基于机器学习的电力系统事件分类器的对抗攻击。该方法分为两步:向PMU数据注入噪声;随后利用预训练的神经网络在消除所加噪声的同时去除对抗攻击引入的扰动。所提对抗净化方法在满足实时运行要求的同时,显著提高了事件分类器在对抗攻击下的准确率。此外,理论分析表明,该基于扩散模型的对抗净化方法缩小了原始PMU数据与被攻击数据之间的距离,从而降低了对抗攻击的影响。在大规模真实PMU数据集上的实验结果验证了所提方法的有效性与计算效率。
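
The two-step purification maps naturally onto the DDPM parameterisation: diffuse the incoming PMU window to a moderate noise level, then let a pretrained noise-prediction network estimate the clean signal. The sketch below assumes such a `denoiser(x_t, t)` exists and plugs in a trivial placeholder so the code runs; a real deployment would use a trained network and could run the full reverse chain rather than the single-shot estimate.

```python
import torch

def purify(x: torch.Tensor, denoiser, t_star: int, alphas_bar: torch.Tensor) -> torch.Tensor:
    """Two-step adversarial purification in the DDPM parameterisation:
    (1) diffuse the (possibly attacked) PMU window to noise level t_star;
    (2) use a noise-prediction network to estimate the clean signal.
    `denoiser(x_t, t)` is assumed to predict the added noise, as in DDPM."""
    a_bar = alphas_bar[t_star]
    noise = torch.randn_like(x)
    x_t = a_bar.sqrt() * x + (1 - a_bar).sqrt() * noise         # forward diffusion
    eps_hat = denoiser(x_t, t_star)                             # predicted noise
    return (x_t - (1 - a_bar).sqrt() * eps_hat) / a_bar.sqrt()  # clean-signal estimate

# Placeholder "denoiser" that predicts zero noise, only to make the sketch runnable;
# a trained noise-prediction network is required for actual purification.
betas = torch.linspace(1e-4, 0.02, 100)
alphas_bar = torch.cumprod(1 - betas, dim=0)
x = torch.sin(torch.linspace(0, 6.28, 120)).unsqueeze(0)        # one PMU-like window
purified = purify(x, denoiser=lambda x_t, t: torch.zeros_like(x_t),
                  t_star=20, alphas_bar=alphas_bar)
print(purified.shape)
```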

Exposition on over-squashing problem on GNNs: Current Methods, Benchmarks and Challenges

  • paper_url: http://arxiv.org/abs/2311.07073
  • repo_url: None
  • paper_authors: Dai Shi, Andi Han, Lequan Lin, Yi Guo, Junbin Gao
  • for: 本研究旨在探讨Graph-based message-passing neural networks (MPNNs)中的Over-squashing (OSQ)问题,包括OSQ的不同形式、addressing OSQ的方法和与表达能力之间的关系。
  • methods: 本研究总结了现有literature中对OSQ问题的不同形式和addressing OSQ的方法,包括三类方法:1) node feature transformation, 2) message passing scheme design, 3) graph structure design。
  • results: 本研究评估了现有works中对OSQ问题的解决方案,包括employned empirical methods和computational complexities。此外,本研究还提出了一些未解决的问题,以及可能的解决方案。
    Abstract Graph-based message-passing neural networks (MPNNs) have achieved remarkable success in both node and graph-level learning tasks. However, several identified problems, including over-smoothing (OSM), limited expressive power, and over-squashing (OSQ), still limit the performance of MPNNs. In particular, OSQ serves as the latest identified problem, where MPNNs gradually lose their learning accuracy when long-range dependencies between graph nodes are required. In this work, we provide an exposition on the OSQ problem by summarizing different formulations of OSQ from current literature, as well as the three different categories of approaches for addressing the OSQ problem. In addition, we also discuss the alignment between OSQ and expressive power and the trade-off between OSQ and OSM. Furthermore, we summarize the empirical methods leveraged from existing works to verify the efficiency of OSQ mitigation approaches, with illustrations of their computational complexities. Lastly, we list some open questions that are of interest for further exploration of the OSQ problem along with potential directions from the best of our knowledge.
    摘要 基于图的消息传递神经网络(MPNN)已在节点级和图级学习任务中取得了显著成功。然而,一些已知问题,包括过平滑(OSM)、有限的表达能力以及过挤压(OSQ),仍然限制着 MPNN 的性能。其中 OSQ 是最新发现的问题:当任务需要图节点之间的长程依赖时,MPNN 的学习精度会逐渐下降。在这项工作中,我们对 OSQ 问题进行了综述,总结了现有文献中 OSQ 的不同形式化定义,以及解决 OSQ 问题的三类方法。此外,我们还讨论了 OSQ 与表达能力之间的联系,以及 OSQ 与 OSM 之间的权衡。我们进一步总结了现有工作中用于验证 OSQ 缓解方法有效性的实证手段,并说明了它们的计算复杂度。最后,我们列出了一些值得进一步探索的开放问题,并据我们所知给出了可能的研究方向。
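
Over-squashing is often diagnosed by how quickly the sensitivity of one node's representation to another node's input features decays with graph distance. The sketch below measures that Jacobian-norm decay for a small random message-passing stack on a path graph; the architecture and weights are arbitrary, so only the qualitative decay pattern is meaningful.

```python
import torch

def gcn_layer(H: torch.Tensor, A_hat: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
    """One mean-aggregation message-passing layer with a tanh nonlinearity."""
    return torch.tanh(A_hat @ H @ W)

n, d, n_layers = 12, 16, 4
# Path graph 0-1-...-11 with self-loops, row-normalised (mean aggregation).
A = torch.diag(torch.ones(n - 1), 1); A = A + A.T + torch.eye(n)
A_hat = A / A.sum(dim=1, keepdim=True)
Ws = [torch.randn(d, d) / d ** 0.5 for _ in range(n_layers)]

X = torch.randn(n, d, requires_grad=True)
H = X
for W in Ws:
    H = gcn_layer(H, A_hat, W)

# Sensitivity of node 0's final representation to each node's input features:
# the Jacobian-block norm decays with distance (and vanishes beyond the
# receptive field), which is the signature over-squashing analyses look at.
grad = torch.autograd.grad(H[0].sum(), X)[0]
for j in range(0, n, 2):
    print(f"d(h_0)/d(x_{j}) norm: {grad[j].norm():.2e}")
```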

To Transformers and Beyond: Large Language Models for the Genome

  • paper_url: http://arxiv.org/abs/2311.07621
  • repo_url: None
  • paper_authors: Micaela E. Consens, Cameron Dufault, Michael Wainberg, Duncan Forster, Mehran Karimzadeh, Hani Goodarzi, Fabian J. Theis, Alan Moses, Bo Wang
  • for: 本文主要是为了介绍Large Language Models(LLMs)在 genomics 中的应用,以及这些模型在计算生物学和计算机科学领域的转型作用。
  • methods: 本文主要采用 transformer 架构和其他 LLMs 进行 genomics 数据的分析和模型化。
  • results: 本文综述了 transformer 及其他 LLM 在基因组数据建模中的优势与局限,并基于当前研究趋势展望了超越 transformer 架构的基因组建模的未来方向。
    Abstract In the rapidly evolving landscape of genomics, deep learning has emerged as a useful tool for tackling complex computational challenges. This review focuses on the transformative role of Large Language Models (LLMs), which are mostly based on the transformer architecture, in genomics. Building on the foundation of traditional convolutional neural networks and recurrent neural networks, we explore both the strengths and limitations of transformers and other LLMs for genomics. Additionally, we contemplate the future of genomic modeling beyond the transformer architecture based on current trends in research. The paper aims to serve as a guide for computational biologists and computer scientists interested in LLMs for genomic data. We hope the paper can also serve as an educational introduction and discussion for biologists to a fundamental shift in how we will be analyzing genomic data in the future.
    摘要 在高速发展的基因组学领域中,深度学习已经成为解决复杂计算挑战的有用工具。本文集中关注基因组学中的大语言模型(LLMs),主要基于转换器架构。传统的卷积神经网络和循环神经网络的基础上,我们探讨了转换器和其他 LLMS 在基因组学方面的优势和局限性。此外,我们还考虑了未来基因组数据分析的发展趋势,以及在研究中使用 LLMs 的可能性。本文旨在为计算生物学家和计算机科学家提供 LLMs 在基因组数据分析方面的指南,同时也为生物学家提供一种基因组数据分析的基本变革。

A PAC-Bayesian Perspective on the Interpolating Information Criterion

  • paper_url: http://arxiv.org/abs/2311.07013
  • repo_url: None
  • paper_authors: Liam Hodgkinson, Chris van der Heide, Robert Salomone, Fred Roosta, Michael W. Mahoney
  • for: 本文旨在解决深度学习中的理论实践差距问题,即理论不能提供实践中有用的指导。
  • methods: 本文使用Interpolating Information Criterion(IIC)来研究过参数化模型的性能。
  • results: 基于 IIC,作者得出了一个 PAC-Bayes 界,用于刻画过参数化模型在插值(interpolating)情形下的泛化性能。由该界可以量化各种因素对泛化性能的影响,包括模型、优化器与参数初始化方案的组合,经验神经正切核(NTK)的谱,损失地形的曲率,以及数据中的噪声。
    Abstract Deep learning is renowned for its theory-practice gap, whereby principled theory typically fails to provide much beneficial guidance for implementation in practice. This has been highlighted recently by the benign overfitting phenomenon: when neural networks become sufficiently large to interpolate the dataset perfectly, model performance appears to improve with increasing model size, in apparent contradiction with the well-known bias-variance tradeoff. While such phenomena have proven challenging to theoretically study for general models, the recently proposed Interpolating Information Criterion (IIC) provides a valuable theoretical framework to examine performance for overparameterized models. Using the IIC, a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence generalization performance in the interpolating regime. From the provided bound, we quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, optimizer, and parameter-initialization scheme; the spectrum of the empirical neural tangent kernel; curvature of the loss landscape; and noise present in the data.
    摘要 深度学习以其理论与实践之间的鸿沟著称:有原则的理论往往难以为实践提供多少有益的指导。近期的"良性过拟合"现象更凸显了这一点:当神经网络大到足以完美插值数据集时,模型性能似乎随模型规模的增大而提升,表面上与众所周知的偏差-方差权衡相矛盾。尽管此类现象对一般模型的理论研究颇具挑战,最近提出的插值信息准则(IIC)为考察过参数化模型的性能提供了有价值的理论框架。利用 IIC,我们为一类一般模型得到了一个 PAC-Bayes 界,刻画了在插值情形下影响泛化性能的各种因素。根据所给出的界,我们量化了实现近乎零训练误差的过参数化模型的测试误差如何取决于:由模型、优化器与参数初始化方案的组合等施加的隐式正则化的质量;经验神经正切核的谱;损失地形的曲率;以及数据中存在的噪声。
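
For orientation, the classical PAC-Bayes statement that bounds population risk by empirical risk plus a KL-complexity term is reproduced below in one common relaxed form. The bound derived in the paper through the Interpolating Information Criterion, specialised to models with effectively zero training error, is a different statement; the display is background only.

```latex
% One standard PAC-Bayes bound (a Pinsker-type relaxation of Maurer's bound):
% with probability at least 1 - \delta over an i.i.d. sample S of size n,
% simultaneously for all posteriors Q over hypotheses, given a fixed prior P,
\[
  R(Q) \;\le\; \hat{R}_S(Q)
  \;+\; \sqrt{\frac{\operatorname{KL}(Q \,\|\, P) + \ln\!\left(2\sqrt{n}/\delta\right)}{2n}} ,
\]
% where R(Q) is the expected population risk under Q, \hat{R}_S(Q) the empirical
% risk on S, and the loss is assumed bounded in [0, 1].
```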