cs.LG - 2023-09-08

Probabilistic Safety Regions Via Finite Families of Scalable Classifiers

  • paper_url: http://arxiv.org/abs/2309.04627
  • repo_url: None
  • paper_authors: Alberto Carlevaro, Teodoro Alamo, Fabrizio Dabbene, Maurizio Mongelli
  • for: This work aims to provide probabilistic certification of machine learning classifiers, so that the number of misclassified instances is controlled, in probability, over a region of the input space.
  • methods: The concept of a probabilistic safety region is introduced to describe such a subset of the input space, and scalable classifiers are exploited to link the tuning of machine learning with error control.
  • results: Several tests corroborate the approach, on synthetic data that highlight all the steps involved as well as on a smart mobility application.
    Abstract Supervised classification recognizes patterns in the data to separate classes of behaviours. Canonical solutions contain misclassification errors that are intrinsic to the numerical approximating nature of machine learning. The data analyst may minimize the classification error on a class at the expense of increasing the error of the other classes. The error control of such a design phase is often done in a heuristic manner. In this context, it is key to develop theoretical foundations capable of providing probabilistic certifications to the obtained classifiers. In this perspective, we introduce the concept of probabilistic safety region to describe a subset of the input space in which the number of misclassified instances is probabilistically controlled. The notion of scalable classifiers is then exploited to link the tuning of machine learning with error control. Several tests corroborate the approach. They are provided through synthetic data in order to highlight all the steps involved, as well as through a smart mobility application.
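The abstract leaves the construction of the safety region abstract; as a rough, hypothetical illustration of the mechanics (not the paper's scenario-based certification), the sketch below sweeps a scalar threshold on a probabilistic classifier's score and keeps the loosest region whose held-out misclassification rate stays below a target epsilon. The logistic-regression score and the threshold grid are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def calibrate_safety_region(clf, X_cal, y_cal, eps=0.05, safe_class=0):
    """Sweep a scalar rho that shrinks the predicted 'safe' region and keep
    the loosest rho whose held-out misclassification rate inside the region
    is <= eps. A plain empirical check, not a probabilistic certificate."""
    scores = clf.predict_proba(X_cal)[:, safe_class]
    best_rho = 1.0                                # most conservative fallback
    for rho in np.linspace(0.5, 1.0, 101):
        inside = scores >= rho                    # points declared "safe"
        if inside.any() and np.mean(y_cal[inside] != safe_class) <= eps:
            best_rho = min(best_rho, rho)
    return best_rho

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
y = (X[:, 0] + 0.3 * rng.normal(size=2000) > 0).astype(int)
clf = LogisticRegression().fit(X[:1000], y[:1000])
print(calibrate_safety_region(clf, X[1000:], y[1000:], eps=0.05))
```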

Knowledge Distillation-Empowered Digital Twin for Anomaly Detection

  • paper_url: http://arxiv.org/abs/2309.04616
  • repo_url: None
  • paper_authors: Qinghua Xu, Shaukat Ali, Tao Yue, Zaimovic Nedim, Inderjeet Singh
  • for: This work proposes a novel method named KDDT for anomaly detection in train control and management systems (TCMS).
  • methods: KDDT harnesses a language model (LM) and a long short-term memory (LSTM) network to extract context and chronological features, respectively; to enrich data volume, it further leverages out-of-domain data with knowledge distillation (KD).
  • results: Evaluated on two datasets from the industry partner Alstom, KDDT obtained F1 scores of 0.931 and 0.915, respectively. An empirical study of the individual contributions of the DT model, LM, and KD showed average F1 score improvements of 12.4%, 3%, and 6.05%, respectively.
    Abstract Cyber-physical systems (CPSs), like train control and management systems (TCMS), are becoming ubiquitous in critical infrastructures. As safety-critical systems, ensuring their dependability during operation is crucial. Digital twins (DTs) have been increasingly studied for this purpose owing to their capability of runtime monitoring and warning, prediction and detection of anomalies, etc. However, constructing a DT for anomaly detection in TCMS necessitates sufficient training data and extracting both chronological and context features with high quality. Hence, in this paper, we propose a novel method named KDDT for TCMS anomaly detection. KDDT harnesses a language model (LM) and a long short-term memory (LSTM) network to extract contexts and chronological features, respectively. To enrich data volume, KDDT benefits from out-of-domain data with knowledge distillation (KD). We evaluated KDDT with two datasets from our industry partner Alstom and obtained the F1 scores of 0.931 and 0.915, respectively, demonstrating the effectiveness of KDDT. We also explored individual contributions of the DT model, LM, and KD to the overall performance of KDDT, via a comprehensive empirical study, and observed average F1 score improvements of 12.4%, 3%, and 6.05%, respectively.
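As a hedged illustration of the knowledge-distillation ingredient (KDDT's exact loss is not given in the abstract), a standard soft-target KD objective in PyTorch looks like this; the temperature T and mixing weight alpha are generic defaults, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard KD objective: soft-target KL against the teacher plus
    hard-label cross-entropy (KDDT's exact formulation may differ)."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                      # T^2 rescales gradients to the usual scale
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```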

Self-optimizing Feature Generation via Categorical Hashing Representation and Hierarchical Reinforcement Crossing

  • paper_url: http://arxiv.org/abs/2309.04612
  • repo_url: https://github.com/yingwangyang/hrc_feature_cross
  • paper_authors: Wangyang Ying, Dongjie Wang, Kunpeng Liu, Leilei Sun, Yanjie Fu
  • for: This work aims to generate new and meaningful features that create a discriminative representation space.
  • methods: A principled and generic representation-crossing framework is proposed to address the main challenges of self-optimizing feature generation: meaningful, robust, and efficient generation.
  • results: Extensive experiments demonstrate the effectiveness and efficiency of the proposed method. The code is available at https://github.com/yingwangyang/HRC_feature_cross.git.
    Abstract Feature generation aims to generate new and meaningful features to create a discriminative representation space. A generated feature is meaningful when the generated feature is from a feature pair with inherent feature interaction. In the real world, experienced data scientists can identify potentially useful feature-feature interactions, and generate meaningful dimensions from an exponentially large search space, in an optimal crossing form over an optimal generation path. But, machines have limited human-like abilities. We generalize such learning tasks as self-optimizing feature generation. Self-optimizing feature generation imposes several under-addressed challenges on existing systems: meaningful, robust, and efficient generation. To tackle these challenges, we propose a principled and generic representation-crossing framework to solve self-optimizing feature generation. To achieve hashing representation, we propose a three-step approach: feature discretization, feature hashing, and descriptive summarization. To achieve reinforcement crossing, we develop a hierarchical reinforcement feature crossing approach. We present extensive experimental results to demonstrate the effectiveness and efficiency of the proposed method. The code is available at https://github.com/yingwangyang/HRC_feature_cross.git.
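The three-step hashing representation (discretization, hashing, summarization) can be sketched in a few lines; everything below, including bin counts, the cheap token hash, and the summary statistics, is an illustrative assumption rather than the authors' implementation (see their repository for the real one).

```python
import numpy as np

def hashing_representation(X, n_bins=10, n_buckets=64):
    """Toy sketch of the paper's three steps: feature discretization,
    feature hashing, descriptive summarization."""
    n, d = X.shape
    # 1) discretize each feature into equal-frequency bins
    cuts = [np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)[1:-1]) for j in range(d)]
    bins = np.stack([np.digitize(X[:, j], cuts[j]) for j in range(d)], axis=1)
    # 2) hash each (feature, bin) token into a fixed-size count vector
    H = np.zeros((n, n_buckets))
    for j in range(d):
        H[np.arange(n), (j * 7919 + bins[:, j]) % n_buckets] += 1
    # 3) summarize the hashed vector with a few descriptive statistics
    summary = np.stack([H.mean(1), H.std(1), H.max(1)], axis=1)
    return H, summary
```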

Online Infinite-Dimensional Regression: Learning Linear Operators

  • paper_url: http://arxiv.org/abs/2309.06548
  • repo_url: None
  • paper_authors: Vinod Raman, Unique Subedi, Ambuj Tewari
  • for: This paper studies the problem of learning linear operators between two infinite-dimensional Hilbert spaces under squared loss in the online setting.
  • methods: The authors show that the class of linear operators with uniformly bounded $p$-Schatten norm is online learnable for any $p \in [1, \infty)$, prove an impossibility result for the class of linear operators uniformly bounded in the operator norm, and separate online uniform convergence from online learnability.
  • results: Beyond the impossibility result for the operator norm, the paper exhibits a class of bounded linear operators that is online learnable even though uniform convergence does not hold, and proves that both the impossibility result and the separation carry over to the agnostic PAC setting.
    Abstract We consider the problem of learning linear operators under squared loss between two infinite-dimensional Hilbert spaces in the online setting. We show that the class of linear operators with uniformly bounded $p$-Schatten norm is online learnable for any $p \in [1, \infty)$. On the other hand, we prove an impossibility result by showing that the class of uniformly bounded linear operators with respect to the operator norm is \textit{not} online learnable. Moreover, we show a separation between online uniform convergence and online learnability by identifying a class of bounded linear operators that is online learnable but uniform convergence does not hold. Finally, we prove that the impossibility result and the separation between uniform convergence and learnability also hold in the agnostic PAC setting.
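For reference, the $p$-Schatten norm that delimits the learnable class is the standard $\ell^p$ norm of the operator's singular values, with the operator norm as the $p \to \infty$ endpoint:

```latex
% p-Schatten norm of a compact operator T between Hilbert spaces,
% with singular values \sigma_1(T) \ge \sigma_2(T) \ge \cdots
\|T\|_{S_p} = \Big( \sum_{k=1}^{\infty} \sigma_k(T)^p \Big)^{1/p},
\qquad p \in [1, \infty),
\qquad
\|T\|_{\mathrm{op}} = \sigma_1(T) = \sup_{\|x\|_{\mathcal{H}} = 1} \|Tx\|.
```

Since an operator-norm bound only controls the largest singular value while a Schatten bound controls the whole sequence, the operator-norm class is strictly richer, which is consistent with the learnable/non-learnable split above.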

Motif-aware Attribute Masking for Molecular Graph Pre-training

  • paper_url: http://arxiv.org/abs/2309.04589
  • repo_url: https://github.com/einae-nd/moama-dev
  • paper_authors: Eric Inae, Gang Liu, Meng Jiang
  • for: For attribute reconstruction in the pre-training of graph neural networks on molecules.
  • methods: Motif-aware attribute masking strategies capture inter-motif structures by leveraging the information of atoms in neighboring motifs: each graph is decomposed into disjoint motifs, the features of every node within a sampled motif are masked, and a graph decoder reconstructs them.
  • results: Evaluations on eight molecular property prediction tasks show advantages over strategies that randomly select nodes for masking, which struggle to capture higher-level substructures.
    Abstract Attribute reconstruction is used to predict node or edge features in the pre-training of graph neural networks. Given a large number of molecules, they learn to capture structural knowledge, which is transferable for various downstream property prediction tasks and vital in chemistry, biomedicine, and material science. Previous strategies that randomly select nodes to do attribute masking leverage the information of local neighbors. However, the over-reliance on these neighbors inhibits the model's ability to learn from higher-level substructures. For example, the model would learn little from predicting three carbon atoms in a benzene ring based on the other three but could learn more from the inter-connections between the functional groups, or called chemical motifs. In this work, we propose and investigate motif-aware attribute masking strategies to capture inter-motif structures by leveraging the information of atoms in neighboring motifs. Once each graph is decomposed into disjoint motifs, the features for every node within a sample motif are masked. The graph decoder then predicts the masked features of each node within the motif for reconstruction. We evaluate our approach on eight molecular property prediction datasets and demonstrate its advantages.
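A minimal sketch of the masking step, assuming the motif decomposition (e.g., a chemistry-aware fragmentation) has already assigned each atom a motif id; the mask value and toy sizes are arbitrary.

```python
import numpy as np

def motif_mask(node_feats, motif_ids, rng, mask_value=0.0):
    """Mask the features of every node inside one randomly drawn motif.
    `motif_ids[i]` is the motif each node belongs to (the decomposition is
    assumed to be done upstream)."""
    target = rng.choice(np.unique(motif_ids))
    masked = node_feats.copy()
    hit = motif_ids == target
    masked[hit] = mask_value               # decoder must reconstruct these rows
    return masked, hit                     # `hit` selects reconstruction targets

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))            # 6 atoms, 4 features
motifs = np.array([0, 0, 1, 1, 1, 2])      # e.g. ring / functional-group motifs
masked, target_mask = motif_mask(feats, motifs, rng)
# training loss (sketch): MSE between decoder output and feats[target_mask]
```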

Circles: Inter-Model Comparison of Multi-Classification Problems with High Number of Classes

  • paper_url: http://arxiv.org/abs/2309.05672
  • repo_url: None
  • paper_authors: Nina Mir, Ragaad AlTarawneh, Shah Rukh Humayoun
  • for: The paper presents an interactive visual analytics tool that helps users visually compare multi-class classification models.
  • methods: A concentric radial line layout is used to mitigate the visual clutter that arises when comparing models on datasets with a high number of classes.
  • results: The tool, called Circles, allows a visual inter-model comparison of numerous classification models with 1K classes in one view; the prototype shows the results of 9 models.
    Abstract The recent advancements in machine learning have motivated researchers to generate classification models dealing with hundreds of classes such as in the case of image datasets. However, visualization of classification models with high number of classes and inter-model comparison in such classification problems are two areas that have not received much attention in the literature, despite the ever-increasing use of classification models to address problems with very large class categories. In this paper, we present our interactive visual analytics tool, called Circles, that allows a visual inter-model comparison of numerous classification models with 1K classes in one view. To mitigate the tricky issue of visual clutter, we chose a concentric radial line layout for our inter-model comparison task. Our prototype shows the results of 9 models with 1K classes.
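A guess at the layout geometry (the paper does not publish code): one angle per class, one concentric ring per model, with a point's radial offset inside its ring encoding a per-class score.

```python
import numpy as np

def radial_layout(scores):
    """Concentric radial-line positions: one angle per class, one ring per
    model; a point's radius encodes that model's per-class score in [0, 1].
    `scores` has shape (n_models, n_classes)."""
    n_models, n_classes = scores.shape
    angles = 2 * np.pi * np.arange(n_classes) / n_classes
    base = np.arange(1, n_models + 1)[:, None]   # ring index per model
    r = base + scores                            # offset within the ring
    return r * np.cos(angles), r * np.sin(angles)  # (n_models, n_classes) each
```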

Towards Interpretable Solar Flare Prediction with Attention-based Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2309.04558
  • repo_url: https://bitbucket.org/gsudmlab/fulldiskattention
  • paper_authors: Chetraj Pandey, Anli Ji, Rafal A. Angryk, Berkay Aydin
  • for: The goal is an attention-based deep learning model for full-disk binary prediction of $\geq$M1.0-class solar flares within the next 24 hours.
  • methods: The study uses compressed images created from full-disk line-of-sight magnetograms, data-augmented oversampling to address class imbalance, and the true skill statistic (TSS) and Heidke skill score (HSS) as evaluation metrics; attention maps overlaid on the input magnetograms are used to interpret the model's decisions.
  • results: The candidate model achieves an average TSS of 0.54$\pm$0.03 and HSS of 0.37$\pm$0.07, learns conspicuous features corresponding to active regions from full-disk magnetogram images, and predicts near-limb flares with adept skill based on relevant active regions.
    Abstract Solar flare prediction is a central problem in space weather forecasting and recent developments in machine learning and deep learning accelerated the adoption of complex models for data-driven solar flare forecasting. In this work, we developed an attention-based deep learning model as an improvement over the standard convolutional neural network (CNN) pipeline to perform full-disk binary flare predictions for the occurrence of $\geq$M1.0-class flares within the next 24 hours. For this task, we collected compressed images created from full-disk line-of-sight (LoS) magnetograms. We used data-augmented oversampling to address the class imbalance issue and used true skill statistic (TSS) and Heidke skill score (HSS) as the evaluation metrics. Furthermore, we interpreted our model by overlaying attention maps on input magnetograms and visualized the important regions focused on by the model that led to the eventual decision. The significant findings of this study are: (i) We successfully implemented an attention-based full-disk flare predictor ready for operational forecasting where the candidate model achieves an average TSS=0.54$\pm$0.03 and HSS=0.37$\pm$0.07. (ii) we demonstrated that our full-disk model can learn conspicuous features corresponding to active regions from full-disk magnetogram images, and (iii) our experimental evaluation suggests that our model can predict near-limb flares with adept skill and the predictions are based on relevant active regions (ARs) or AR characteristics from full-disk magnetograms.
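The two skill scores the paper reports are simple functions of the 2x2 confusion matrix; for concreteness:

```python
import numpy as np

def tss_hss(tp, fp, fn, tn):
    """True Skill Statistic and Heidke Skill Score from a 2x2 confusion
    matrix -- the two metrics reported for flare forecasting."""
    tss = tp / (tp + fn) - fp / (fp + tn)
    hss = 2.0 * (tp * tn - fp * fn) / (
        (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    )
    return tss, hss

print(tss_hss(tp=80, fp=40, fn=20, tn=860))  # toy counts, not the paper's data
```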

Regret-Optimal Federated Transfer Learning for Kernel Regression with Applications in American Option Pricing

  • paper_url: http://arxiv.org/abs/2309.04557
  • repo_url: https://github.com/floriankrach/regretoptimalfederatedtransferlearning
  • paper_authors: Xuwei Yang, Anastasis Kratsios, Florian Krach, Matheus Grasselli, Aurelien Lucchi
  • for: This work proposes a regret-optimal iterative scheme for federated transfer learning, in which a central planner has access to datasets ${\cal D}_1,\ldots,{\cal D}_N$ for the same learning model $f_{\theta}$ and aims to minimize the cumulative deviation of the generated parameters $\{\theta_i(t)\}_{t=0}^T$ from the specialized parameters $\theta^\star_{1},\ldots,\theta^\star_N$ obtained for each dataset, while respecting the loss of the model $f_{\theta(T)}$ produced upon halting.
  • methods: Only continual communication between the specialized models (nodes/agents) and the central planner (server) is allowed at each iteration (round). For finite-rank kernel regression, explicit updates of the regret-optimal algorithm are derived; by leveraging symmetries within this algorithm, a nearly regret-optimal heuristic is developed that runs with $\mathcal{O}(Np^2)$ fewer elementary operations, where $p$ is the dimension of the parameter space.
  • results: The scheme's adversarial robustness is established: an adversary that perturbs $q$ training pairs by at most $\varepsilon>0$, across all training sets, cannot reduce the regret-optimal algorithm's regret by more than $\mathcal{O}(\varepsilon q \bar{N}^{1/2})$, where $\bar{N}$ is the aggregate number of training pairs. The theoretical findings are validated with numerical experiments on American option pricing using a randomly generated finite-rank kernel.
    Abstract We propose an optimal iterative scheme for federated transfer learning, where a central planner has access to datasets ${\cal D}_1,\dots,{\cal D}_N$ for the same learning model $f_{\theta}$. Our objective is to minimize the cumulative deviation of the generated parameters $\{\theta_i(t)\}_{t=0}^T$ across all $T$ iterations from the specialized parameters $\theta^\star_{1},\ldots,\theta^\star_N$ obtained for each dataset, while respecting the loss function for the model $f_{\theta(T)}$ produced by the algorithm upon halting. We only allow for continual communication between each of the specialized models (nodes/agents) and the central planner (server), at each iteration (round). For the case where the model $f_{\theta}$ is a finite-rank kernel regression, we derive explicit updates for the regret-optimal algorithm. By leveraging symmetries within the regret-optimal algorithm, we further develop a nearly regret-optimal heuristic that runs with $\mathcal{O}(Np^2)$ fewer elementary operations, where $p$ is the dimension of the parameter space. Additionally, we investigate the adversarial robustness of the regret-optimal algorithm showing that an adversary which perturbs $q$ training pairs by at-most $\varepsilon>0$, across all training sets, cannot reduce the regret-optimal algorithm's regret by more than $\mathcal{O}(\varepsilon q \bar{N}^{1/2})$, where $\bar{N}$ is the aggregate number of training pairs. To validate our theoretical findings, we conduct numerical experiments in the context of American option pricing, utilizing a randomly generated finite-rank kernel.
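Schematically, the objective described above couples a per-round deviation penalty with the terminal loss; the weighting below (a single multiplier $\lambda$) is a simplification of whatever trade-off the paper actually uses.

```latex
% Sketch of the cumulative-deviation objective (up to weighting terms the
% paper may add): deviation of the iterates from each dataset's specialist,
% plus the terminal loss of the produced model f_{\theta(T)}.
\min_{\{\theta(t)\}_{t=0}^{T}} \;
\sum_{t=0}^{T} \sum_{i=1}^{N}
\big\| \theta_i(t) - \theta^{\star}_{i} \big\|^{2}
\;+\; \lambda \, \mathcal{L}\big( f_{\theta(T)} \big)
```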

Postprocessing of Ensemble Weather Forecasts Using Permutation-invariant Neural Networks

  • paper_url: http://arxiv.org/abs/2309.04452
  • repo_url: https://github.com/khoehlein/Permutation-invariant-Postprocessing
  • paper_authors: Kevin Höhlein, Benedikt Schulz, Rüdiger Westermann, Sebastian Lerch
  • for: This paper proposes neural network-based statistical postprocessing that translates ensembles of raw numerical weather forecasts into reliable probabilistic forecast distributions.
  • methods: Permutation-invariant neural networks treat forecast ensembles as sets of unordered member forecasts and learn link functions that are by design invariant to permutations of the member ordering.
  • results: Evaluated in terms of calibration and sharpness against classical and neural network-based benchmark methods, the models demonstrate state-of-the-art prediction quality in case studies on surface temperature and wind gust forecasts. A proposed permutation-based importance analysis for ensemble-valued predictors further shows that most of the relevant information is contained in a few ensemble-internal degrees of freedom.
    Abstract Statistical postprocessing is used to translate ensembles of raw numerical weather forecasts into reliable probabilistic forecast distributions. In this study, we examine the use of permutation-invariant neural networks for this task. In contrast to previous approaches, which often operate on ensemble summary statistics and dismiss details of the ensemble distribution, we propose networks which treat forecast ensembles as a set of unordered member forecasts and learn link functions that are by design invariant to permutations of the member ordering. We evaluate the quality of the obtained forecast distributions in terms of calibration and sharpness, and compare the models against classical and neural network-based benchmark methods. In case studies addressing the postprocessing of surface temperature and wind gust forecasts, we demonstrate state-of-the-art prediction quality. To deepen the understanding of the learned inference process, we further propose a permutation-based importance analysis for ensemble-valued predictors, which highlights specific aspects of the ensemble forecast that are considered important by the trained postprocessing models. Our results suggest that most of the relevant information is contained in few ensemble-internal degrees of freedom, which may impact the design of future ensemble forecasting and postprocessing systems.
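Permutation invariance of the kind described is typically obtained with a Deep Sets style architecture: encode each member with a shared network, pool with a symmetric operation, then decode to distribution parameters. The sketch below (layer sizes, Gaussian output head) is illustrative, not the paper's exact network.

```python
import torch
import torch.nn as nn

class SetPostprocessor(nn.Module):
    """Permutation-invariant map from an ensemble of member forecasts to
    distribution parameters (mu, sigma)."""
    def __init__(self, d_in, d_hidden=32):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, d_hidden))
        self.rho = nn.Sequential(nn.Linear(d_hidden, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, 2))

    def forward(self, members):                  # (batch, n_members, d_in)
        pooled = self.phi(members).mean(dim=1)   # mean-pooling kills ordering
        mu, log_sigma = self.rho(pooled).unbind(-1)
        return mu, log_sigma.exp()               # Gaussian forecast parameters
```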

End-to-End Speech Recognition and Disfluency Removal with Acoustic Language Model Pretraining

  • paper_url: http://arxiv.org/abs/2309.04516
  • repo_url: https://github.com/davidsroth/hubert-disfl
  • paper_authors: Saksham Bassi, Giulio Duregon, Siddhartha Jalagam, David Roth
  • for: This paper revisits the performance comparison between two-stage models, with separate transcription and cleaning stages, and end-to-end models for transcribing disfluent and conversational speech.
  • methods: The study builds on recent successes in large-scale audio pretraining, using audio-based language models pretrained with weak self-supervised objectives to obtain effective audio representations.
  • results: Audio-based language models pretrained with weak self-supervised objectives match or exceed the performance of similarly trained two-stage models, and the choice of pretraining objective substantially affects a model's ability to be adapted to the disfluency removal task.
    Abstract The SOTA in transcription of disfluent and conversational speech has in recent years favored two-stage models, with separate transcription and cleaning stages. We believe that previous attempts at end-to-end disfluency removal have fallen short because of the representational advantage that large-scale language model pretraining has given to lexical models. Until recently, the high dimensionality and limited availability of large audio datasets inhibited the development of large-scale self-supervised pretraining objectives for learning effective audio representations, giving a relative advantage to the two-stage approach, which utilises pretrained representations for lexical tokens. In light of recent successes in large scale audio pretraining, we revisit the performance comparison between two-stage and end-to-end model and find that audio based language models pretrained using weak self-supervised objectives match or exceed the performance of similarly trained two-stage models, and further, that the choice of pretraining objective substantially effects a model's ability to be adapted to the disfluency removal task.

Soft Quantization using Entropic Regularization

  • paper_url: http://arxiv.org/abs/2309.04428
  • repo_url: https://github.com/rajmadan96/softquantization
  • paper_authors: Rajmadan Lakshmanan, Alois Pichler
  • for: This paper addresses the quantization problem: finding the best possible approximation of probability measures on $\mathbb{R}^d$ using finite, discrete measures.
  • methods: The entropy-regularized quantization problem, a relaxation of the standard quantization problem, naturally adopts the softmin function, which is known for its robustness from both theoretical and practical standpoints; the approximation quality is evaluated with the entropy-regularized Wasserstein distance, and a stochastic gradient approach is implemented to obtain optimal solutions.
  • results: A control parameter allows the difficulty level of the optimization problem to be adjusted, which is a significant advantage for exceptionally challenging problems; empirical results illustrate the performance of the method in various settings.
    Abstract The quantization problem aims to find the best possible approximation of probability measures on $\mathbb{R}^d$ using finite, discrete measures. The Wasserstein distance is a typical choice to measure the quality of the approximation. This contribution investigates the properties and robustness of the entropy-regularized quantization problem, which relaxes the standard quantization problem. The proposed approximation technique naturally adopts the softmin function, which is well known for its robustness in terms of theoretical and practicability standpoints. Moreover, we use the entropy-regularized Wasserstein distance to evaluate the quality of the soft quantization problem's approximation, and we implement a stochastic gradient approach to achieve the optimal solutions. The control parameter in our proposed method allows for the adjustment of the optimization problem's difficulty level, providing significant advantages when dealing with exceptionally challenging problems of interest. As well, this contribution empirically illustrates the performance of the method in various expositions.
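To make the softmin idea concrete, here is a minimal stochastic-gradient sketch in which support points are updated with softmin weights instead of hard nearest-neighbour assignments; the regularization strength lam, batch size, and step size are arbitrary choices (as lam goes to 0 the update approaches ordinary k-means).

```python
import numpy as np

def soft_quantize(samples, m=8, lam=0.1, lr=0.05, steps=500, seed=0):
    """Entropy-regularized quantization sketch: m support points are pulled
    toward samples with softmin weights exp(-d^2/lam)."""
    rng = np.random.default_rng(seed)
    y = samples[rng.choice(len(samples), m, replace=False)].copy()
    for _ in range(steps):
        x = samples[rng.choice(len(samples), 64)]            # minibatch
        d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # (64, m)
        w = np.exp(-(d2 - d2.min(1, keepdims=True)) / lam)
        w /= w.sum(1, keepdims=True)                         # softmin weights
        grad = (w[:, :, None] * (y[None] - x[:, None])).mean(0)
        y -= lr * grad                                       # stochastic step
    return y
```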

Robust Representation Learning for Privacy-Preserving Machine Learning: A Multi-Objective Autoencoder Approach

  • paper_url: http://arxiv.org/abs/2309.04427
  • repo_url: None
  • paper_authors: Sofiane Ouaari, Ali Burak Ünal, Mete Akgün, Nico Pfeifer
  • for: To improve privacy preservation and data confidentiality in machine learning applications.
  • methods: Robust representation learning with autoencoders trained in a multi-objective manner; the latent and learned features from the encoder are concatenated as the encoded form of the data, which can then safely be sent to a third party for intensive training and hyperparameter tuning.
  • results: Improved performance over the state of the art in both unimodal and multimodal settings, the latter following a vertical splitting system, while allowing data and third-party tools to be used without the threat of revealing the data's original form.
    Abstract Several domains increasingly rely on machine learning in their applications. The resulting heavy dependence on data has led to the emergence of various laws and regulations around data ethics and privacy and growing awareness of the need for privacy-preserving machine learning (ppML). Current ppML techniques utilize methods that are either purely based on cryptography, such as homomorphic encryption, or that introduce noise into the input, such as differential privacy. The main criticism given to those techniques is the fact that they either are too slow or they trade off a model's performance for improved confidentiality. To address this performance reduction, we aim to leverage robust representation learning as a way of encoding our data while optimizing the privacy-utility trade-off. Our method centers on training autoencoders in a multi-objective manner and then concatenating the latent and learned features from the encoding part as the encoded form of our data. Such a deep learning-powered encoding can then safely be sent to a third party for intensive training and hyperparameter tuning. With our proposed framework, we can share our data and use third party tools without being under the threat of revealing its original form. We empirically validate our results on unimodal and multimodal settings, the latter following a vertical splitting system and show improved performance over state-of-the-art.
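A minimal sketch of the multi-objective autoencoder idea: reconstruction fidelity plus a utility objective on the latent code, with the code (not the raw data) being what leaves the data owner. Layer sizes and the fixed mixing weight beta are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MultiObjectiveAE(nn.Module):
    """Autoencoder trained on reconstruction plus a utility head; the latent
    code z is the shareable encoding sent to third parties."""
    def __init__(self, d_in, d_z=16, n_classes=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(), nn.Linear(64, d_z))
        self.dec = nn.Sequential(nn.Linear(d_z, 64), nn.ReLU(), nn.Linear(64, d_in))
        self.head = nn.Linear(d_z, n_classes)

    def losses(self, x, y, beta=0.5):
        z = self.enc(x)
        rec = nn.functional.mse_loss(self.dec(z), x)          # fidelity objective
        util = nn.functional.cross_entropy(self.head(z), y)   # utility objective
        return beta * rec + (1 - beta) * util, z
```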

Parallel and Limited Data Voice Conversion Using Stochastic Variational Deep Kernel Learning

  • paper_url: http://arxiv.org/abs/2309.04420
  • repo_url: None
  • paper_authors: Mohamadreza Jafaryani, Hamid Sheikhzadeh, Vahid Pourahmadi
  • for: This paper proposes a voice conversion method that works with limited data, based on stochastic variational deep kernel learning (SVDKL).
  • methods: The method combines a deep neural network with a Gaussian process as a Bayesian, non-parametric model, and trains the model parameters via marginal likelihood optimization, which accounts for both data fitting and model complexity.
  • results: With approximately 80 seconds of training data, the method obtained a higher mean opinion score, smaller spectral distortion, and better preference test results than the compared methods.
    Abstract Typically, voice conversion is regarded as an engineering problem with limited training data. The reliance on massive amounts of data hinders the practical applicability of deep learning approaches, which have been extensively researched in recent years. On the other hand, statistical methods are effective with limited data but have difficulties in modelling complex mapping functions. This paper proposes a voice conversion method that works with limited data and is based on stochastic variational deep kernel learning (SVDKL). At the same time, SVDKL enables the use of deep neural networks' expressive capability as well as the high flexibility of the Gaussian process as a Bayesian and non-parametric method. When the conventional kernel is combined with the deep neural network, it is possible to estimate non-smooth and more complex functions. Furthermore, the model's sparse variational Gaussian process solves the scalability problem and, unlike the exact Gaussian process, allows for the learning of a global mapping function for the entire acoustic space. One of the most important aspects of the proposed scheme is that the model parameters are trained using marginal likelihood optimization, which considers both data fitting and model complexity. Considering the complexity of the model reduces the amount of training data by increasing the resistance to overfitting. To evaluate the proposed scheme, we examined the model's performance with approximately 80 seconds of training data. The results indicated that our method obtained a higher mean opinion score, smaller spectral distortion, and better preference tests than the compared methods.
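For intuition about the training objective, here is the exact-GP log marginal likelihood on neural-network features (deep kernel learning); the paper optimizes a stochastic variational approximation of this quantity rather than the exact form, and the RBF kernel and hyperparameters below are illustrative.

```python
import numpy as np

def deep_kernel_mll(y, feats, lengthscale=1.0, noise=0.1):
    """Exact-GP log marginal likelihood where the kernel acts on features
    `feats` produced by a neural network (shape (n, k)); balances data fit
    against model complexity via the log-determinant term."""
    d2 = ((feats[:, None] - feats[None, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * d2 / lengthscale**2) + noise**2 * np.eye(len(y))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha                      # data-fit term
            - np.log(np.diag(L)).sum()            # -0.5 * log|K|
            - 0.5 * len(y) * np.log(2 * np.pi))
```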

Emergent learning in physical systems as feedback-based aging in a glassy landscape

  • paper_url: http://arxiv.org/abs/2309.04382
  • repo_url: None
  • paper_authors: Vidyesh Rao Anisetti, Ananth Kandala, J. M. Schwarz
  • for: To study how linear physical networks learn linear transformations and how their physical properties evolve under weight update rules.
  • methods: Linear physical networks are trained to learn linear transformations, and the system's relaxation under repeated application of feedback boundary forces in the presence of an input force is examined, drawing parallels between learning and the processes of aging and memory formation in disordered and glassy systems.
  • results: The learning dynamics resembles an aging process: the system relaxes in response to the feedback boundary forces and thereby encodes a memory of the input-output relationship, accompanied by an increase in correlation length as indicated by the two-point correlation function of the network components. The square root of the mean-squared error as a function of epoch takes a non-exponential form typical of glassy systems. This physical interpretation suggests that, by encoding more detailed information into input and feedback boundary forces, emergent learning could serve, from an evolutionary standpoint, as a very early physical mechanism for learning in biological systems.
    Abstract By training linear physical networks to learn linear transformations, we discern how their physical properties evolve due to weight update rules. Our findings highlight a striking similarity between the learning behaviors of such networks and the processes of aging and memory formation in disordered and glassy systems. We show that the learning dynamics resembles an aging process, where the system relaxes in response to repeated application of the feedback boundary forces in presence of an input force, thus encoding a memory of the input-output relationship. With this relaxation comes an increase in the correlation length, which is indicated by the two-point correlation function for the components of the network. We also observe that the square root of the mean-squared error as a function of epoch takes on a non-exponential form, which is a typical feature of glassy systems. This physical interpretation suggests that by encoding more detailed information into input and feedback boundary forces, the process of emergent learning can be rather ubiquitous and, thus, serve as a very early physical mechanism, from an evolutionary standpoint, for learning in biological systems.

Seeing-Eye Quadruped Navigation with Force Responsive Locomotion Control

  • paper_url: http://arxiv.org/abs/2309.04370
  • repo_url: None
  • paper_authors: David DeFazio, Eisuke Hirota, Shiqi Zhang
  • for: This paper develops a seeing-eye quadruped robot system that responds to external tugs from humans, to help guide visually impaired people.
  • methods: A locomotion controller robust to external tugging forces is trained via reinforcement learning, and an external force estimator is trained via supervised learning; the controller ensures stable walking, while the estimated forces guide the robot toward a global goal that is unknown to the robot, as the robot in turn guides the human around nearby obstacles via a local planner.
  • results: Experiments in simulation and on hardware show that the controller is robust to external forces and that the system accurately detects force direction. The full system is demonstrated on a real quadruped robot with a blindfolded human, with a video available on the project page.
    Abstract Seeing-eye robots are very useful tools for guiding visually impaired people, potentially producing a huge societal impact given the low availability and high cost of real guide dogs. Although a few seeing-eye robot systems have already been demonstrated, none considered external tugs from humans, which frequently occur in a real guide dog setting. In this paper, we simultaneously train a locomotion controller that is robust to external tugging forces via Reinforcement Learning (RL), and an external force estimator via supervised learning. The controller ensures stable walking, and the force estimator enables the robot to respond to the external forces from the human. These forces are used to guide the robot to the global goal, which is unknown to the robot, while the robot guides the human around nearby obstacles via a local planner. Experimental results in simulation and on hardware show that our controller is robust to external forces, and our seeing-eye system can accurately detect force direction. We demonstrate our full seeing-eye robot system on a real quadruped robot with a blindfolded human. The video can be seen at our project page: https://bu-air-lab.github.io/guide_dog/

Learning from Power Signals: An Automated Approach to Electrical Disturbance Identification Within a Power Transmission System

  • paper_url: http://arxiv.org/abs/2309.04361
  • repo_url: None
  • paper_authors: Jonathan D. Boyd, Joshua H. Tyler, Anthony M. Murphy, Donald R. Reising
  • for: automated analysis of power quality events recorded by digital fault recorders and power quality monitors
  • methods: rule-based analytics and cyclic histogram processing
  • results: accuracy of 99% in detecting and categorizing 14 different event types, a reduction of memory requirements by a factor of 320, and an anticipated improvement in the reliability of the transmission system through near real-time detection and identification of disturbances and prevention of problems before they occur.
    Abstract As power quality becomes a higher priority in the electric utility industry, the amount of disturbance event data continues to grow. Utilities do not have the required personnel to analyze each event by hand. This work presents an automated approach for analyzing power quality events recorded by digital fault recorders and power quality monitors operating within a power transmission system. The automated approach leverages rule-based analytics to examine the time and frequency domain characteristics of the voltage and current signals. Customizable thresholds are set to categorize each disturbance event. The events analyzed within this work include various faults, motor starting, and incipient instrument transformer failure. Analytics for fourteen different event types have been developed. The analytics were tested on 160 signal files and yielded an accuracy of ninety-nine percent. Continuous, nominal signal data analysis is performed using an approach coined as the cyclic histogram. The cyclic histogram process will be integrated into the digital fault recorders themselves to facilitate the detection of subtle signal variations that are too small to trigger a disturbance event and that can occur over hours or days. In addition to reducing memory requirements by a factor of 320, it is anticipated that cyclic histogram processing will aid in identifying incipient events and identifiers. This project is expected to save engineers time by automating the classification of disturbance events and increase the reliability of the transmission system by providing near real time detection and identification of disturbances as well as prevention of problems before they occur.
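The abstract does not define the cyclic histogram precisely; one plausible reading, which folds the waveform onto a single cycle and histograms the amplitudes seen at each phase position so that slow drifts appear as shifting per-phase distributions, can be sketched as follows.

```python
import numpy as np

def cyclic_histogram(signal, samples_per_cycle, n_bins=64):
    """Fold a nominal waveform onto one cycle and histogram the amplitudes
    at each phase position. (The recorders' actual implementation details
    are not public in the abstract.)"""
    n_cycles = len(signal) // samples_per_cycle
    folded = signal[: n_cycles * samples_per_cycle].reshape(n_cycles, samples_per_cycle)
    edges = np.linspace(folded.min(), folded.max(), n_bins + 1)
    hist = np.stack([np.histogram(folded[:, p], bins=edges)[0]
                     for p in range(samples_per_cycle)])   # (phase, amplitude bin)
    return hist, edges
```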

Value-Compressed Sparse Column (VCSC): Sparse Matrix Storage for Redundant Data

  • paper_url: http://arxiv.org/abs/2309.04355
  • repo_url: None
  • paper_authors: Skyler Ruiter, Seth Wolfgang, Marc Tunnell, Timothy Triche Jr., Erin Carrier, Zachary DeBruine
  • for: This paper addresses compressed storage of redundant sparse matrices, introducing two extensions to Compressed Sparse Column (CSC): Value-Compressed Sparse Column (VCSC) and Index- and Value-Compressed Sparse Column (IVCSC).
  • methods: VCSC exploits high redundancy within a column to further compress data, up to 3-fold over COO and 2.25-fold over CSC, without significant negative impact on performance; IVCSC extends VCSC by compressing the index arrays through delta encoding and byte-packing, achieving a 10-fold decrease in memory usage over COO and 7.5-fold over CSC.
  • results: Benchmarks on simulated and real data show that VCSC and IVCSC can be read in compressed form with little added computational cost, making both formats broadly useful for encoding and reading redundant sparse data.
    Abstract Compressed Sparse Column (CSC) and Coordinate (COO) are popular compression formats for sparse matrices. However, both CSC and COO are general purpose and cannot take advantage of any of the properties of the data other than sparsity, such as data redundancy. Highly redundant sparse data is common in many machine learning applications, such as genomics, and is often too large for in-core computation using conventional sparse storage formats. In this paper, we present two extensions to CSC: (1) Value-Compressed Sparse Column (VCSC) and (2) Index- and Value-Compressed Sparse Column (IVCSC). VCSC takes advantage of high redundancy within a column to further compress data up to 3-fold over COO and 2.25-fold over CSC, without significant negative impact to performance characteristics. IVCSC extends VCSC by compressing index arrays through delta encoding and byte-packing, achieving a 10-fold decrease in memory usage over COO and 7.5-fold decrease over CSC. Our benchmarks on simulated and real data show that VCSC and IVCSC can be read in compressed form with little added computational cost. These two novel compression formats offer a broadly useful solution to encoding and reading redundant sparse data.
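The core of VCSC can be sketched per column: store each unique value once along with the rows where it occurs (IVCSC would additionally delta-encode and byte-pack the row indices). This is a readable approximation of the format, not its actual binary layout.

```python
import numpy as np

def vcsc_encode_column(values, row_indices):
    """VCSC-style encoding of one CSC column: each *unique* value is stored
    once together with the row indices at which it occurs, so highly
    redundant columns shrink."""
    order = np.argsort(values, kind="stable")
    v, r = values[order], row_indices[order]
    uniq, starts = np.unique(v, return_index=True)
    runs = [r[s:e] for s, e in zip(starts, list(starts[1:]) + [len(v)])]
    return list(zip(uniq, runs))     # [(value, rows-with-that-value), ...]

col_vals = np.array([3, 1, 3, 3, 1, 3])
col_rows = np.array([0, 2, 5, 7, 8, 9])
print(vcsc_encode_column(col_vals, col_rows))
# [(1, array([2, 8])), (3, array([0, 5, 7, 9]))]
```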

Decreasing the Computing Time of Bayesian Optimization using Generalizable Memory Pruning

  • paper_url: http://arxiv.org/abs/2309.04510
  • repo_url: None
  • paper_authors: Alexander E. Siemenn, Tonio Buonassisi
  • for: This paper proposes a Bayesian optimization (BO) wrapper, applicable to any surrogate model and acquisition function, that reduces per-experiment computing time.
  • methods: Generalizable memory pruning and bounded optimization turn the wall-clock computing time per BO experiment from a polynomially increasing pattern into a sawtooth pattern with a non-increasing trend, without sacrificing convergence performance.
  • results: The paper demonstrates the decrease in wall-clock computing time and the generalizability of the approach across two unique datasets, two unique surrogate models, and four unique acquisition functions, with all model implementations run on MIT Supercloud hardware.
    Abstract Bayesian optimization (BO) suffers from long computing times when processing highly-dimensional or large data sets. These long computing times are a result of the Gaussian process surrogate model having a polynomial time complexity with the number of experiments. Running BO on high-dimensional or massive data sets becomes intractable due to this time complexity scaling, in turn, hindering experimentation. Alternative surrogate models have been developed to reduce the computing utilization of the BO procedure, however, these methods require mathematical alteration of the inherent surrogate function, pigeonholing use into only that function. In this paper, we demonstrate a generalizable BO wrapper of memory pruning and bounded optimization, capable of being used with any surrogate model and acquisition function. Using this memory pruning approach, we show a decrease in wall-clock computing times per experiment of BO from a polynomially increasing pattern to a sawtooth pattern that has a non-increasing trend without sacrificing convergence performance. Furthermore, we illustrate the generalizability of the approach across two unique data sets, two unique surrogate models, and four unique acquisition functions. All model implementations are run on the MIT Supercloud state-of-the-art computing hardware.
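A hedged sketch of the wrapper idea: before each surrogate refit, prune the observation history to a fixed budget so the cubic fitting cost stops growing with the experiment count (the sawtooth runtime pattern the paper reports). The keep-the-best-y pruning rule below is one plausible choice, not necessarily the paper's criterion.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def fit_surrogate_pruned(X, y, memory=50):
    """Refit the surrogate on at most `memory` observations; with a GP this
    caps the O(n^3) fitting cost regardless of how many experiments have
    been run."""
    if len(y) > memory:
        keep = np.argsort(y)[:memory]      # e.g. retain the lowest losses
        X, y = X[keep], y[keep]
    return GaussianProcessRegressor(normalize_y=True).fit(X, y)
```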

Generating the Ground Truth: Synthetic Data for Label Noise Research

  • paper_url: http://arxiv.org/abs/2309.04318
  • repo_url: https://github.com/sjoerd-de-vries/synlabel
  • paper_authors: Sjoerd de Vries, Dirk Thierens
  • for: This paper aims to improve the current methodologies for label noise research by creating a noiseless dataset informed by real data.
  • methods: The proposed framework, called SYNLABEL, allows for creating a noiseless dataset by either pre-specifying or learning a function and defining it as the ground truth function from which labels are generated. Additionally, the framework uses resampling and aggregation of labels to generate soft label distributions for each data point.
  • results: The generated datasets serve as a clean baseline of adjustable complexity into which different types of noise may be introduced, allowing for direct injection and quantification of label noise. The paper demonstrates how the framework can be applied, how it enables quantification of label noise, and how it improves over existing methodologies.
    Abstract Most real-world classification tasks suffer from label noise to some extent. Such noise in the data adversely affects the generalization error of learned models and complicates the evaluation of noise-handling methods, as their performance cannot be accurately measured without clean labels. In label noise research, typically either noisy or incomplex simulated data are accepted as a baseline, into which additional noise with known properties is injected. In this paper, we propose SYNLABEL, a framework that aims to improve upon the aforementioned methodologies. It allows for creating a noiseless dataset informed by real data, by either pre-specifying or learning a function and defining it as the ground truth function from which labels are generated. Furthermore, by resampling a number of values for selected features in the function domain, evaluating the function and aggregating the resulting labels, each data point can be assigned a soft label or label distribution. Such distributions allow for direct injection and quantification of label noise. The generated datasets serve as a clean baseline of adjustable complexity into which different types of noise may be introduced. We illustrate how the framework can be applied, how it enables quantification of label noise and how it improves over existing methodologies.
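As a rough sketch of the soft-label step (the full framework is in the linked repository), fix a ground-truth function g, repeatedly resample selected features of each point, re-evaluate g, and aggregate the outcomes into a per-point label distribution; the user-supplied sampler for the hidden feature is an assumption here.

```python
import numpy as np

def soft_labels(g, X, hidden_idx, sampler, n_classes, n_resamples=100):
    """SYNLABEL-style soft labels: resample the 'hidden' feature of each
    point, re-evaluate the ground-truth function g (returning class ids),
    and aggregate into a label distribution per point."""
    dist = np.zeros((len(X), n_classes))
    for _ in range(n_resamples):
        Xr = X.copy()
        Xr[:, hidden_idx] = sampler(len(X))       # resample selected feature
        dist[np.arange(len(X)), g(Xr)] += 1
    return dist / n_resamples                      # rows sum to 1
```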

Actor critic learning algorithms for mean-field control with moment neural networks

  • paper_url: http://arxiv.org/abs/2309.04317
  • repo_url: None
  • paper_authors: Huyên Pham, Xavier Warin
  • for: To solve mean-field control problems in a continuous-time reinforcement learning setting.
  • methods: A new policy gradient and actor-critic algorithm leverages a gradient-based representation of the value function with parametrized randomized policies; learning for both the actor (policy) and the critic (value function) is facilitated by a class of moment neural network functions on the Wasserstein space of probability measures, with trajectories of distributions sampled directly.
  • results: A comprehensive set of numerical results is provided, covering diverse examples that include multi-dimensional settings and nonlinear quadratic mean-field control problems with controlled volatility.
    Abstract We develop a new policy gradient and actor-critic algorithm for solving mean-field control problems within a continuous time reinforcement learning setting. Our approach leverages a gradient-based representation of the value function, employing parametrized randomized policies. The learning for both the actor (policy) and critic (value function) is facilitated by a class of moment neural network functions on the Wasserstein space of probability measures, and the key feature is to sample directly trajectories of distributions. A central challenge addressed in this study pertains to the computational treatment of an operator specific to the mean-field framework. To illustrate the effectiveness of our methods, we provide a comprehensive set of numerical results. These encompass diverse examples, including multi-dimensional settings and nonlinear quadratic mean-field control problems with controlled volatility.
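The abstract does not spell out the moment-network parametrization; a minimal reading, summarizing each sampled distribution by its first k raw moments and feeding those to an ordinary network, looks like this.

```python
import torch
import torch.nn as nn

def moment_features(samples, k=4):
    """Represent an empirical distribution by its first k raw moments
    (the exact moment set/normalization the paper uses is not stated in
    the abstract). samples: (batch, n_particles)."""
    return torch.stack([(samples ** j).mean(dim=1) for j in range(1, k + 1)], dim=-1)

critic = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))
mu_samples = torch.randn(32, 256)            # 32 distributions, 256 particles each
value = critic(moment_features(mu_samples))  # value function of the distribution
```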

Viewing the process of generating counterfactuals as a source of knowledge – Application to the Naive Bayes classifier

  • paper_url: http://arxiv.org/abs/2309.04284
  • repo_url: None
  • paper_authors: Vincent Lemaire, Nathan Le Boudec, Françoise Fessant, Victor Guyomard
  • for: This article examines how algorithms that generate counterfactual examples can help in understanding the decisions of a machine learning model.
  • methods: The generation of counterfactual examples is viewed as a source of knowledge that can be stored and used later in different ways; the process is illustrated in the additive model and, more specifically, for the naive Bayes classifier.
  • results: The naive Bayes classifier is shown to have interesting properties for this purpose.
    Abstract There are now many comprehension algorithms for understanding the decisions of a machine learning algorithm. Among these are those based on the generation of counterfactual examples. This article proposes to view this generation process as a source of creating a certain amount of knowledge that can be stored to be used, later, in different ways. This process is illustrated in the additive model and, more specifically, in the case of the naive Bayes classifier, whose interesting properties for this purpose are shown.
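The article's own generation procedure is not reproduced here; as a hypothetical illustration of counterfactual search against a naive Bayes model, whose per-feature factorization makes single-feature effects on the log-posterior easy to scan, a greedy search might look like this (the value grid is an arbitrary choice).

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def greedy_counterfactual(nb, x, target, grid=np.linspace(-3, 3, 61)):
    """Greedy one-feature-at-a-time counterfactual search for a fitted
    GaussianNB model `nb`: repeatedly apply the single-feature change that
    most raises the log-posterior of the target class."""
    x = x.copy()
    for _ in range(len(x)):
        if nb.predict(x[None])[0] == target:
            return x                             # counterfactual found
        best = (None, None, -np.inf)
        for j in range(len(x)):
            for v in grid:
                trial = x.copy(); trial[j] = v
                p = nb.predict_log_proba(trial[None])[0, target]
                if p > best[2]:
                    best = (j, v, p)
        x[best[0]] = best[1]                     # apply the best single change
    return x                                     # may fail; returns last iterate
```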

Learning Zero-Sum Linear Quadratic Games with Improved Sample Complexity

  • paper_url: http://arxiv.org/abs/2309.04272
  • repo_url: https://github.com/wujiduan/zero-sum-lq-games
  • paper_authors: Jiduan Wu, Anas Barakat, Ilyas Fatkhullin, Niao He
  • for: Zero-sum Linear Quadratic (LQ) games are used as a dynamic game formulation for risk-sensitive or robust control, or as a benchmark setting for multi-agent reinforcement learning with two competing agents in continuous state-control spaces.
  • methods: The paper proposes a simpler nested Zeroth-Order (ZO) algorithm that improves sample complexity by several orders of magnitude, with a guaranteed $\widetilde{\mathcal{O}}(\epsilon^{-3})$ sample complexity under the same assumptions using a single-point ZO estimator.
  • results: The paper achieves a better $\widetilde{\mathcal{O}}(\epsilon^{-2})$ sample complexity when the estimator is replaced by a two-point estimator, with key improvements in a more sample-efficient nested algorithm design and finer control of the ZO natural gradient estimation error.
    Abstract Zero-sum Linear Quadratic (LQ) games are fundamental in optimal control and can be used (i) as a dynamic game formulation for risk-sensitive or robust control, or (ii) as a benchmark setting for multi-agent reinforcement learning with two competing agents in continuous state-control spaces. In contrast to the well-studied single-agent linear quadratic regulator problem, zero-sum LQ games entail solving a challenging nonconvex-nonconcave min-max problem with an objective function that lacks coercivity. Recently, Zhang et al. discovered an implicit regularization property of natural policy gradient methods which is crucial for safety-critical control systems since it preserves the robustness of the controller during learning. Moreover, in the model-free setting where the knowledge of model parameters is not available, Zhang et al. proposed the first polynomial sample complexity algorithm to reach an $\epsilon$-neighborhood of the Nash equilibrium while maintaining the desirable implicit regularization property. In this work, we propose a simpler nested Zeroth-Order (ZO) algorithm improving sample complexity by several orders of magnitude. Our main result guarantees a $\widetilde{\mathcal{O}}(\epsilon^{-3})$ sample complexity under the same assumptions using a single-point ZO estimator. Furthermore, when the estimator is replaced by a two-point estimator, our method enjoys a better $\widetilde{\mathcal{O}}(\epsilon^{-2})$ sample complexity. Our key improvements rely on a more sample-efficient nested algorithm design and finer control of the ZO natural gradient estimation error.
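The single- vs two-point distinction driving the rates is easiest to see in code; the standard two-point estimator (both evaluations sharing one random direction on the unit sphere) is:

```python
import numpy as np

def two_point_zo_grad(f, theta, mu=1e-2, rng=None):
    """Two-point zeroth-order gradient estimate of f at theta. A single-point
    estimator would use only one evaluation f(theta + mu*u) and therefore has
    much higher variance, which is where the rate gap comes from."""
    rng = rng or np.random.default_rng()
    u = rng.normal(size=theta.shape)
    u /= np.linalg.norm(u)                 # uniform direction on the sphere
    d = theta.size
    return d * (f(theta + mu * u) - f(theta - mu * u)) / (2 * mu) * u
```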

Optimal Rate of Kernel Regression in Large Dimensions

  • paper_url: http://arxiv.org/abs/2309.04268
  • repo_url: None
  • paper_authors: Weihao Lu, Haobo Zhang, Yicheng Li, Manyun Xu, Qian Lin
  • for: The paper studies kernel regression for large-dimensional data, where the sample size $n$ depends polynomially on the dimension $d$ of the samples, i.e., $n \asymp d^{\gamma}$ for some $\gamma > 0$.
  • methods: A general tool characterizes the upper bound and the minimax lower bound of kernel regression for large-dimensional data through the Mendelson complexity $\varepsilon_{n}^{2}$ and the metric entropy $\bar{\varepsilon}_{n}^{2}$, respectively.
  • results: When the target function falls into the RKHS associated with a (general) inner product model defined on $\mathbb{S}^{d}$, the minimax rate of the excess risk of kernel regression is $n^{-1/2}$ when $n \asymp d^{\gamma}$ for $\gamma = 2, 4, 6, 8, \cdots$. The optimal rate is further determined for all $\gamma > 0$, and the curve of the optimal rate along $\gamma$ exhibits several new phenomena, including multiple descent behavior and periodic plateau behavior; a similar explicit description of the curve is provided for the neural tangent kernel (NTK), so the claims hold for wide neural networks as well.
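
For intuition, here is a minimal kernel ridge regression sketch with an inner-product kernel on the sphere in the large-dimensional regime $n\asymp d^\gamma$; the specific kernel `phi`, target function, and regularization are illustrative choices, not the paper's experimental setup.

```python
import numpy as np

def inner_product_kernel(X, Z, phi=lambda t: (1 + t) ** 2):
    """Inner-product kernel k(x, z) = phi(<x, z>) for points on the unit sphere."""
    return phi(X @ Z.T)

def krr_fit_predict(X, y, X_test, lam=1e-2):
    """Kernel ridge regression: alpha = (K + n*lam*I)^{-1} y."""
    n = X.shape[0]
    K = inner_product_kernel(X, X)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return inner_product_kernel(X_test, X) @ alpha

# Large-dimensional regime n ≍ d^γ with γ = 1: here n = d = 200.
d, n = 200, 200
X = np.random.randn(n, d); X /= np.linalg.norm(X, axis=1, keepdims=True)
f_star = lambda X: (X[:, 0] + X[:, 1]) ** 2   # degree-2 target, inside this RKHS
y = f_star(X) + 0.1 * np.random.randn(n)
X_te = np.random.randn(500, d); X_te /= np.linalg.norm(X_te, axis=1, keepdims=True)
print(np.mean((krr_fit_predict(X, y, X_te) - f_star(X_te)) ** 2))  # excess-risk proxy
```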

Generating drawdown-realistic financial price paths using path signatures

  • paper_url: http://arxiv.org/abs/2309.04507
  • repo_url: None
  • paper_authors: Emiel Lemahieu, Kris Boudt, Maarten Wyns
  • for: Simulating financial price paths with drawdowns quantifiably close to empirical data.
  • methods: A non-parametric Monte Carlo approach combining a variational autoencoder generative model with a drawdown reconstruction loss function; drawdown is approximated as a linear function of path signatures.
  • results: Generates drawdown-realistic price paths, with close numerical approximations of drawdown obtained via linear regression on fractional Brownian and empirical data.
    Abstract A novel generative machine learning approach for the simulation of sequences of financial price data with drawdowns quantifiably close to empirical data is introduced. Applications such as pricing drawdown insurance options or developing portfolio drawdown control strategies call for a host of drawdown-realistic paths. Historical scenarios may be insufficient to effectively train and backtest the strategy, while standard parametric Monte Carlo does not adequately preserve drawdowns. We advocate a non-parametric Monte Carlo approach combining a variational autoencoder generative model with a drawdown reconstruction loss function. To overcome issues of numerical complexity and non-differentiability, we approximate drawdown as a linear function of the moments of the path, known in the literature as path signatures. We prove the required regularity of drawdown function and consistency of the approximation. Furthermore, we obtain close numerical approximations using linear regression for fractional Brownian and empirical data. We argue that linear combinations of the moments of a path yield a mathematically non-trivial smoothing of the drawdown function, which gives one leeway to simulate drawdown-realistic price paths by including drawdown evaluation metrics in the learning objective. We conclude with numerical experiments on mixed equity, bond, real estate and commodity portfolios and obtain a host of drawdown-realistic paths.
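
To make the drawdown objective concrete, the sketch below computes maximum drawdown and fits a linear model of it from path features. For simplicity it uses plain increment moments in place of true path signatures (packages such as `iisignature` compute the latter); everything here is an illustrative stand-in for, not a reproduction of, the paper's pipeline.

```python
import numpy as np

def max_drawdown(path):
    """Largest peak-to-trough decline of a price path."""
    running_max = np.maximum.accumulate(path)
    return np.max(running_max - path)

def path_features(path, depth=4):
    """Stand-in for truncated path signatures: moments of the increments."""
    dx = np.diff(path)
    return np.array([np.sum(dx ** k) for k in range(1, depth + 1)])

# Fit a linear model max_drawdown(path) ≈ w . features(path) on random walks.
rng = np.random.default_rng(0)
paths = np.cumsum(rng.standard_normal((2000, 252)), axis=1)
X = np.stack([path_features(p) for p in paths])
y = np.array([max_drawdown(p) for p in paths])
w, *_ = np.linalg.lstsq(np.c_[np.ones(len(X)), X], y, rcond=None)
pred = np.c_[np.ones(len(X)), X] @ w
print("R^2:", 1 - np.var(y - pred) / np.var(y))
```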

Adaptive Distributed Kernel Ridge Regression: A Feasible Distributed Learning Scheme for Data Silos

  • paper_url: http://arxiv.org/abs/2309.04236
  • repo_url: None
  • paper_authors: Di Wang, Xiaotong Liu, Shao-Bo Lin, Ding-Xuan Zhou
  • for: Settling the data-silo problem, which constrains collaboration among organizations holding similar data for the same purpose.
  • methods: Divide-and-conquer distributed learning: an adaptive distributed kernel ridge regression (AdaDKRR) that accounts for autonomy in parameter selection, privacy in communicating non-sensitive information, and the necessity of collaboration.
  • results: Theory and experiments verify AdaDKRR's feasibility and effectiveness; under mild conditions it performs comparably to running the optimal learning algorithm on the whole data, and it outperforms other existing distributed learning schemes.
    Abstract Data silos, mainly caused by privacy and interoperability, significantly constrain collaborations among different organizations with similar data for the same purpose. Distributed learning based on divide-and-conquer provides a promising way to settle the data silos, but it suffers from several challenges, including autonomy, privacy guarantees, and the necessity of collaborations. This paper focuses on developing an adaptive distributed kernel ridge regression (AdaDKRR) by taking autonomy in parameter selection, privacy in communicating non-sensitive information, and the necessity of collaborations in performance improvement into account. We provide both solid theoretical verification and comprehensive experiments for AdaDKRR to demonstrate its feasibility and effectiveness. Theoretically, we prove that under some mild conditions, AdaDKRR performs similarly to running the optimal learning algorithms on the whole data, verifying the necessity of collaborations and showing that no other distributed learning scheme can essentially beat AdaDKRR under the same conditions. Numerically, we test AdaDKRR on both toy simulations and two real-world applications to show that AdaDKRR is superior to other existing distributed learning schemes. All these results show that AdaDKRR is a feasible scheme to defend against data silos, which are highly desired in numerous application regions such as intelligent decision-making, pricing forecasting, and performance prediction for products.
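
The divide-and-conquer backbone of such schemes is easy to sketch: each silo fits kernel ridge regression locally, and only a predictor (never raw data) is aggregated. The kernel, regularization, and data below are illustrative, and AdaDKRR's adaptive parameter selection is not reproduced here.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def local_krr(X, y, lam):
    """Each silo fits KRR on its own data and shares only a predictor."""
    K = rbf_kernel(X, X)
    alpha = np.linalg.solve(K + len(X) * lam * np.eye(len(X)), y)
    return lambda Z: rbf_kernel(Z, X) @ alpha

def dkrr_predict(silos, Z, lam=1e-2):
    """Divide-and-conquer KRR: average the local predictions across silos."""
    preds = [local_krr(X, y, lam)(Z) for X, y in silos]
    return np.mean(preds, axis=0)

rng = np.random.default_rng(1)
f = lambda X: np.sin(3 * X[:, 0])
silos = []
for _ in range(5):  # 5 data silos, each with its own private sample
    X = rng.uniform(-1, 1, (100, 1))
    silos.append((X, f(X) + 0.1 * rng.standard_normal(100)))
Z = rng.uniform(-1, 1, (200, 1))
print(np.mean((dkrr_predict(silos, Z) - f(Z)) ** 2))
```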

Offline Recommender System Evaluation under Unobserved Confounding

  • paper_url: http://arxiv.org/abs/2309.04222
  • repo_url: https://github.com/olivierjeunen/confounding-consequences-2023
  • paper_authors: Olivier Jeunen, Ben London
  • for: This paper highlights the problem of unobserved confounders in off-policy estimation (OPE) methods for recommender systems.
  • methods: The paper focuses on policy-based estimators, where the logging propensities are learned from logged data, and demonstrates the statistical bias that arises due to confounding.
  • results: The paper shows that existing diagnostics are unable to uncover such cases, and that naive propensity estimation under confounding can lead to severely biased metric estimates.
    Abstract Off-Policy Estimation (OPE) methods allow us to learn and evaluate decision-making policies from logged data. This makes them an attractive choice for the offline evaluation of recommender systems, and several recent works have reported successful adoption of OPE methods to this end. An important assumption that makes this work is the absence of unobserved confounders: random variables that influence both actions and rewards at data collection time. Because the data collection policy is typically under the practitioner's control, the unconfoundedness assumption is often left implicit, and its violations are rarely dealt with in the existing literature. This work aims to highlight the problems that arise when performing off-policy estimation in the presence of unobserved confounders, specifically focusing on a recommendation use-case. We focus on policy-based estimators, where the logging propensities are learned from logged data. We characterise the statistical bias that arises due to confounding, and show how existing diagnostics are unable to uncover such cases. Because the bias depends directly on the true and unobserved logging propensities, it is non-identifiable. As the unconfoundedness assumption is famously untestable, this becomes especially problematic. This paper emphasises this common, yet often overlooked issue. Through synthetic data, we empirically show how na\"ive propensity estimation under confounding can lead to severely biased metric estimates that are allowed to fly under the radar. We aim to cultivate an awareness among researchers and practitioners of this important problem, and touch upon potential research directions towards mitigating its effects.
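
The bias mechanism is easy to reproduce synthetically: when an unobserved variable drives both the logging propensities and the rewards, a propensity model fit on observables alone yields a systematically wrong inverse-propensity-scoring (IPS) estimate. The toy numbers below are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
u = rng.binomial(1, 0.5, n)                      # unobserved confounder
x = rng.binomial(1, 0.5, n)                      # observed context
p_a = 0.2 + 0.6 * u                              # true logging propensity depends on u
a = rng.binomial(1, p_a)                         # logged action
r = a * (0.3 + 0.5 * u) + rng.normal(0, 0.1, n)  # reward also depends on u

# Naive propensity model fit on observables only: P(a = 1 | x).
p_hat = np.array([a[x == v].mean() for v in (0, 1)])[x]

# IPS estimates of the value of the "always play a = 1" target policy.
ips_naive = np.mean((a == 1) / p_hat * r)
ips_oracle = np.mean((a == 1) / p_a * r)
truth = np.mean(0.3 + 0.5 * u)
# The oracle estimate recovers ~0.55; the naive one lands near 0.70.
print(f"truth={truth:.3f} oracle-IPS={ips_oracle:.3f} naive-IPS={ips_naive:.3f}")
```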

Concomitant Group Testing

  • paper_url: http://arxiv.org/abs/2309.04221
  • repo_url: None
  • paper_authors: Thach V. Bui, Jonathan Scarlett
  • for: Introduces a variant of group testing in which a positive test requires a combination of multiple "types" of item: there are multiple disjoint semi-defective sets, and a test is positive if and only if it contains at least one item from each of these sets. The goal is to reliably identify all semi-defective sets using as few tests as possible.
  • methods: A variety of algorithms, focusing primarily on the case of two semi-defective sets, distinguished by whether they are deterministic (zero-error) or randomized (small-error), and whether they are non-adaptive, fully adaptive, or of limited adaptivity (e.g., 2 or 3 stages).
  • results: The deterministic adaptive algorithm and the randomized algorithms (non-adaptive or limited-adaptivity) are order-optimal in broad scaling regimes of interest and improve significantly over baselines that solve a more general problem (e.g., hypergraph learning) as an intermediate step.
    Abstract In this paper, we introduce a variation of the group testing problem capturing the idea that a positive test requires a combination of multiple ``types'' of item. Specifically, we assume that there are multiple disjoint \emph{semi-defective sets}, and a test is positive if and only if it contains at least one item from each of these sets. The goal is to reliably identify all of the semi-defective sets using as few tests as possible, and we refer to this problem as \textit{Concomitant Group Testing} (ConcGT). We derive a variety of algorithms for this task, focusing primarily on the case that there are two semi-defective sets. Our algorithms are distinguished by (i) whether they are deterministic (zero-error) or randomized (small-error), and (ii) whether they are non-adaptive, fully adaptive, or have limited adaptivity (e.g., 2 or 3 stages). Both our deterministic adaptive algorithm and our randomized algorithms (non-adaptive or limited adaptivity) are order-optimal in broad scaling regimes of interest, and improve significantly over baseline results that are based on solving a more general problem as an intermediate step (e.g., hypergraph learning).
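
A small simulation makes the test model concrete: a pool is positive if and only if it intersects every semi-defective set. The sketch below only simulates outcomes under a random non-adaptive design; the paper's decoding algorithms are not reproduced.

```python
import numpy as np

def concgt_test(pool, semi_defective_sets):
    """A test is positive iff the pool contains at least one item from each set."""
    return all(pool & s for s in semi_defective_sets)

# Small demo: n = 12 items, two disjoint semi-defective sets.
rng = np.random.default_rng(3)
items = set(range(12))
S1, S2 = {1, 7}, {4}

# Non-adaptive random design: each item joins each pool independently.
tests = [set(i for i in items if rng.random() < 0.5) for _ in range(40)]
outcomes = [concgt_test(t, [S1, S2]) for t in tests]

# Sanity checks implied by the model:
assert concgt_test({1, 4}, [S1, S2])      # one item from each set -> positive
assert not concgt_test({1, 7}, [S1, S2])  # items from only one set -> negative
print(sum(outcomes), "of", len(outcomes), "random pools were positive")
```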

Counterfactual Explanations via Locally-guided Sequential Algorithmic Recourse

  • paper_url: http://arxiv.org/abs/2309.04211
  • repo_url: None
  • paper_authors: Edward A. Small, Jeffrey N. Clark, Christopher J. McWilliams, Kacper Sokol, Jeffrey Chan, Flora D. Salim, Raul Santos-Rodriguez
  • for: Making artificial intelligence systems explainable by providing actionable algorithmic recourse.
  • methods: Counterfactual explanations: given an individual classified as the factual class, find actions that flip the prediction to the desired counterfactual class.
  • results: Introduces LocalFACE, a model-agnostic technique that composes feasible and actionable counterfactual explanations from locally-acquired information, preserving user privacy and protecting the model.
    Abstract Counterfactuals operationalised through algorithmic recourse have become a powerful tool to make artificial intelligence systems explainable. Conceptually, given an individual classified as y -- the factual -- we seek actions such that their prediction becomes the desired class y' -- the counterfactual. This process offers algorithmic recourse that is (1) easy to customise and interpret, and (2) directly aligned with the goals of each individual. However, the properties of a "good" counterfactual are still largely debated; it remains an open challenge to effectively locate a counterfactual along with its corresponding recourse. Some strategies use gradient-driven methods, but these offer no guarantees on the feasibility of the recourse and are open to adversarial attacks on carefully created manifolds. This can lead to unfairness and lack of robustness. Other methods are data-driven, which mostly addresses the feasibility problem at the expense of privacy, security and secrecy as they require access to the entire training data set. Here, we introduce LocalFACE, a model-agnostic technique that composes feasible and actionable counterfactual explanations using locally-acquired information at each step of the algorithmic recourse. Our explainer preserves the privacy of users by only leveraging data that it specifically requires to construct actionable algorithmic recourse, and protects the model by offering transparency solely in the regions deemed necessary for the intervention.
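
As a rough illustration of locally-guided recourse, the sketch below runs a generic greedy local search over a black-box classifier, querying only points near the current candidate. This is not the LocalFACE algorithm itself, merely a minimal stand-in showing the query-local flavor of sequential recourse.

```python
import numpy as np

def greedy_counterfactual(predict_proba, x, target_class, step=0.05, max_iter=200):
    """Greedy local search: repeatedly take the single small coordinate move
    that most increases the target-class probability."""
    x = x.copy()
    for _ in range(max_iter):
        p = predict_proba(x)[target_class]
        if p > 0.5:
            return x  # crossed the decision boundary
        candidates = []
        for j in range(x.size):
            for delta in (-step, step):
                x_new = x.copy(); x_new[j] += delta
                candidates.append((predict_proba(x_new)[target_class], x_new))
        best_p, best_x = max(candidates, key=lambda c: c[0])
        if best_p <= p:
            break  # no local improvement available
        x = best_x
    return x

# Toy black-box model: class 1 iff x0 + x1 > 1.
def predict_proba(x):
    p1 = 1 / (1 + np.exp(-10 * (x[0] + x[1] - 1)))
    return np.array([1 - p1, p1])

x_cf = greedy_counterfactual(predict_proba, np.array([0.2, 0.3]), target_class=1)
print(x_cf, predict_proba(x_cf))
```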

COVID-19 Detection System: A Comparative Analysis of System Performance Based on Acoustic Features of Cough Audio Signals

  • paper_url: http://arxiv.org/abs/2309.04505
  • repo_url: None
  • paper_authors: Asmaa Shati, Ghulam Mubashar Hassan, Amitava Datta
  • for: Automating the detection of COVID-19 from cough audio signals.
  • methods: Mel Frequency Cepstral Coefficients (MFCC), Chroma, and Spectral Contrast features, evaluated with Support Vector Machine (SVM) and Multilayer Perceptron (MLP) classifiers.
  • results: An efficient COVID-19 detection system achieving state-of-the-art classification performance on the COUGHVID and Virufy datasets.
    Abstract A wide range of respiratory diseases, such as cold and flu, asthma, and COVID-19, affect people's daily lives worldwide. In medical practice, respiratory sounds are widely used in medical services to diagnose various respiratory illnesses and lung disorders. The traditional diagnosis of such sounds requires specialized knowledge, which can be costly and reliant on human expertise. Recently, cough audio recordings have been used to automate the process of detecting respiratory conditions. This research aims to examine various acoustic features that enhance the performance of machine learning (ML) models in detecting COVID-19 from cough signals. This study investigates the efficacy of three feature extraction techniques, including Mel Frequency Cepstral Coefficients (MFCC), Chroma, and Spectral Contrast features, on two ML algorithms, Support Vector Machine (SVM) and Multilayer Perceptron (MLP), and thus proposes an efficient COVID-19 detection system. The proposed system produces a practical solution and demonstrates higher state-of-the-art classification performance on COUGHVID and Virufy datasets for COVID-19 detection.
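
The feature pipeline maps directly onto standard audio tooling. A hedged sketch using `librosa` and scikit-learn follows; the file paths and labels are hypothetical placeholders for a preprocessed COUGHVID- or Virufy-style dataset, and the SVM hyperparameters are illustrative.

```python
import numpy as np
import librosa
from sklearn.svm import SVC

def cough_features(path):
    """Mean-pooled MFCC + Chroma + Spectral Contrast features for one recording."""
    y, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
    return np.concatenate([f.mean(axis=1) for f in (mfcc, chroma, contrast)])

# Hypothetical placeholders for a labelled cough dataset.
files = ["cough_001.wav", "cough_002.wav", "cough_003.wav", "cough_004.wav"]
labels = [0, 1, 0, 1]   # 0 = COVID-19 negative, 1 = positive

X = np.stack([cough_features(f) for f in files])
clf = SVC(kernel="rbf", C=10).fit(X, labels)
print(clf.predict(X))
```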

Adversarial attacks on hybrid classical-quantum Deep Learning models for Histopathological Cancer Detection

  • paper_url: http://arxiv.org/abs/2309.06377
  • repo_url: None
  • paper_authors: Biswaraj Baral, Reek Majumdar, Bhavika Bhalgamiya, Taposh Dutta Roy
  • for: histopathological cancer detection
  • methods: hybrid classical-quantum Deep Learning models, quantum transfer learning strategy, multiple transfer learning models (ResNet18, VGG-16, Inception-v3, AlexNet) and quantum circuit-based variational quantum circuits (VQC)
  • results: better accuracy than classical image classification models under adversarial attacks
    Abstract We present an effective application of quantum machine learning in histopathological cancer detection. The study here emphasizes two primary applications of hybrid classical-quantum Deep Learning models. The first application is to build a classification model for histopathological cancer detection using the quantum transfer learning strategy. The second application is to test the performance of this model under various adversarial attacks. Rather than relying on a single transfer learning model, the hybrid classical-quantum models are tested using multiple transfer learning models, especially ResNet18, VGG-16, Inception-v3, and AlexNet as feature extractors, integrated with several variational quantum circuits (VQC) of high expressibility. As a result, we provide a comparative analysis of classical models and hybrid classical-quantum transfer learning models for histopathological cancer detection under several adversarial attacks. We compared the performance accuracy of the classical model with the hybrid classical-quantum model using the PennyLane default quantum simulator. We also observed that for histopathological cancer detection under several adversarial attacks, Hybrid Classical-Quantum (HCQ) models provided better accuracy than classical image classification models.
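
A hybrid model of this kind is straightforward to assemble with PennyLane's Torch interface: a classical backbone compresses each image to a few features that are angle-embedded into a variational quantum circuit. The sketch below uses our own minimal choices (4 qubits, 3 entangling layers, a frozen-style ResNet18 head), not the authors' exact configuration.

```python
import torch
import torchvision
import pennylane as qml

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def quantum_circuit(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

weight_shapes = {"weights": (3, n_qubits, 3)}  # 3 variational layers
vqc = qml.qnn.TorchLayer(quantum_circuit, weight_shapes)

# Classical feature extractor (ResNet18) -> 4 features -> VQC -> 2 classes.
backbone = torchvision.models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Linear(backbone.fc.in_features, n_qubits)
model = torch.nn.Sequential(backbone, torch.nn.Tanh(), vqc,
                            torch.nn.Linear(n_qubits, 2))

x = torch.randn(2, 3, 224, 224)   # two dummy histopathology patches
print(model(x).shape)             # torch.Size([2, 2])
```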

Preserved Edge Convolutional Neural Network for Sensitivity Enhancement of Deuterium Metabolic Imaging (DMI)

  • paper_url: http://arxiv.org/abs/2309.04100
  • repo_url: None
  • paper_authors: Siyuan Dong, Henk M. De Feyter, Monique A. Thomas, Robin A. de Graaf, James S. Duncan
  • for: Enhancing the sensitivity of Deuterium Metabolic Imaging (DMI).
  • methods: A convolutional neural network (CNN) estimates 2H-labeled metabolite concentrations from low-SNR, distorted DMI FIDs; estimation precision is further improved by fine-tuning the CNN with MRI-based edge-preserving regularization for each DMI dataset.
  • results: PRECISE-DMI visually improves the metabolic maps of low-SNR data and is quantitatively more precise than standard Fourier reconstruction. In rat brain tumor models it yields more precise 2H-labeled lactate and glutamate + glutamine levels at higher spatial resolution (from >8 to 2 $\mu$L) or shorter scan times (from 32 to 4 min). Rigorous SD-bias analyses show, however, that overusing the edge-preserving regularization can compromise accuracy.
    Abstract Purpose: Common to most MRSI techniques, the spatial resolution and the minimal scan duration of Deuterium Metabolic Imaging (DMI) are limited by the achievable SNR. This work presents a deep learning method for sensitivity enhancement of DMI. Methods: A convolutional neural network (CNN) was designed to estimate the 2H-labeled metabolite concentrations from low SNR and distorted DMI FIDs. The CNN was trained with synthetic data that represent a range of SNR levels typically encountered in vivo. The estimation precision was further improved by fine-tuning the CNN with MRI-based edge-preserving regularization for each DMI dataset. The proposed processing method, PReserved Edge ConvolutIonal neural network for Sensitivity Enhanced DMI (PRECISE-DMI), was applied to simulation studies and in vivo experiments to evaluate the anticipated improvements in SNR and investigate the potential for inaccuracies. Results: PRECISE-DMI visually improved the metabolic maps of low SNR datasets, and quantitatively provided higher precision than the standard Fourier reconstruction. Processing of DMI data acquired in rat brain tumor models resulted in more precise determination of 2H-labeled lactate and glutamate + glutamine levels, at increased spatial resolution (from >8 to 2 $\mu$L) or shortened scan time (from 32 to 4 min) compared to standard acquisitions. However, rigorous SD-bias analyses showed that overuse of the edge-preserving regularization can compromise the accuracy of the results. Conclusion: PRECISE-DMI allows a flexible trade-off between enhancing the sensitivity of DMI and minimizing the inaccuracies. With typical settings, the DMI sensitivity can be improved by 3-fold while retaining the capability to detect local signal variations.
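
At its core, the estimation problem maps a noisy FID to a handful of metabolite amplitudes, which a small 1D CNN can be trained to do on synthetic spectra. The architecture and synthetic signal model below are our own simplified stand-ins, not the PRECISE-DMI network or its training data.

```python
import torch
import torch.nn as nn

class FidToConcentration(nn.Module):
    """Toy 1D CNN: (real, imag) FID channels -> k metabolite amplitudes."""
    def __init__(self, fid_len=512, n_metabolites=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2, 32, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=9, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(16), nn.Flatten(),
            nn.Linear(32 * 16, n_metabolites),
        )
    def forward(self, fid):          # fid: (batch, 2, fid_len)
        return self.net(fid)

# Synthetic training pair: a noisy sum of decaying sinusoids -> amplitudes.
t = torch.linspace(0, 1, 512)
freqs = torch.tensor([10.0, 25.0, 40.0])          # one frequency per metabolite
amps = torch.rand(3)
signal = (amps[:, None] * torch.exp(-5 * t)
          * torch.cos(2 * torch.pi * freqs[:, None] * t)).sum(0)
fid = torch.stack([signal, torch.zeros_like(signal)]) + 0.05 * torch.randn(2, 512)
model = FidToConcentration()
print(model(fid.unsqueeze(0)).shape)   # torch.Size([1, 3])
```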

Sample-Efficient Co-Design of Robotic Agents Using Multi-fidelity Training on Universal Policy Network

  • paper_url: http://arxiv.org/abs/2309.04085
  • repo_url: None
  • paper_authors: Kishan R. Nagiredla, Buddhika L. Semage, Thommen G. Karimpanal, Arun Kumar A. V, Santu Rana
  • for: Improving the sample efficiency of co-design, which simultaneously optimizes a robotic agent's controller and physical design.
  • methods: A Hyperband-based multi-fidelity design exploration strategy that ties the controllers learned across the design space together through a universal policy network, warm-starting each subsequent controller learning problem; the Hyperband-generated design matrix is traversed so that its stochasticity is reduced the most as the warm-starting effect strengthens with each new design evaluation.
  • results: Experiments on a wide range of agent design problems demonstrate superiority over the baselines; analysis of the optimized designs reveals design simplifications and non-intuitive alterations, some of which have also emerged in the biological world.
    Abstract Co-design involves simultaneously optimizing the controller and the agent's physical design. Its inherent bi-level optimization formulation necessitates an outer-loop design optimization driven by an inner-loop control optimization. This can be challenging when the design space is large and each design evaluation involves a data-intensive reinforcement learning process for control optimization. To improve sample-efficiency, we propose a multi-fidelity-based design exploration strategy based on Hyperband in which we tie the controllers learnt across the design space through a universal policy learner that warm-starts the subsequent controller learning problems. Further, we recommend a particular way of traversing the Hyperband-generated design matrix that ensures the stochasticity of Hyperband is reduced the most by the increasing warm-starting effect of the universal policy learner, which is strengthened with each new design evaluation. Experiments performed on a wide range of agent design problems demonstrate the superiority of our method compared to the baselines. Additionally, analysis of the optimized designs shows interesting design alterations, including design simplifications and non-intuitive alterations that have emerged in the biological world.
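
The multi-fidelity backbone here is Hyperband-style successive halving: evaluate many designs at low fidelity, then promote survivors to higher fidelity. The sketch below shows one bracket with a toy evaluation function of our own; the warm-starting via a universal policy network, which the paper relies on to make high-budget evaluations cheap, is only indicated in the comments.

```python
import random

def successive_halving(designs, evaluate, budgets=(1, 3, 9)):
    """One Hyperband bracket: evaluate all designs cheaply, keep the best
    third, re-evaluate survivors at a higher fidelity, and repeat."""
    pool = list(designs)
    for b in budgets:
        scores = {d: evaluate(d, budget=b) for d in pool}
        pool.sort(key=lambda d: scores[d], reverse=True)
        pool = pool[: max(1, len(pool) // 3)]
    return pool[0]

# Toy evaluation: a "design" is a single morphology parameter; a higher
# budget stands in for a longer (less noisy) RL training run.  A universal
# policy network would warm-start each run from nearby designs, making the
# high-budget evaluations much cheaper than training from scratch.
def evaluate(design, budget):
    true_perf = -(design - 0.6) ** 2
    return true_perf + random.gauss(0, 0.3 / budget)

random.seed(0)
designs = [i / 10 for i in range(11)]
print("best design ~", successive_halving(designs, evaluate))
```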

Enabling the Evaluation of Driver Physiology Via Vehicle Dynamics

  • paper_url: http://arxiv.org/abs/2309.04078
  • repo_url: None
  • paper_authors: Rodrigo Ordonez-Hurtado, Bo Wen, Nicholas Barra, Ryan Vimba, Sergio Cabrero-Barros, Sergiy Zhuk, Jeffrey L. Rogers
  • for: Transforming a vehicle into a connected ecosystem capable of assessing driver physiology.
  • methods: An array of commercial sensors from the automotive and digital health sectors, combined with driver inputs from the vehicle itself, records external conditions and driving maneuvers; the data streams are processed to extract key parameters relating driver behavior to the external environment and revealing vital physiological responses.
  • results: The system holds potential to amplify road safety and, when paired with data from conventional health settings, to enhance early detection of health-related complications.
    Abstract Driving is a daily routine for many individuals across the globe. This paper presents the configuration and methodologies used to transform a vehicle into a connected ecosystem capable of assessing driver physiology. We integrated an array of commercial sensors from the automotive and digital health sectors along with driver inputs from the vehicle itself. This amalgamation of sensors allows for meticulous recording of the external conditions and driving maneuvers. These data streams are processed to extract key parameters, providing insights into driver behavior in relation to their external environment and illuminating vital physiological responses. This innovative driver evaluation system holds the potential to amplify road safety. Moreover, when paired with data from conventional health settings, it may enhance early detection of health-related complications.

Riemannian Langevin Monte Carlo schemes for sampling PSD matrices with fixed rank

  • paper_url: http://arxiv.org/abs/2309.04072
  • repo_url: None
  • paper_authors: Tianmin Yu, Shixin Zheng, Jianfeng Lu, Govind Menon, Xiangxiong Zhang
  • for: Introduces two explicit schemes to sample matrices from Gibbs distributions on $\mathcal S^{n,p}_+$, the manifold of real positive semi-definite (PSD) matrices of size $n\times n$ and rank $p$.
  • methods: Euler-Maruyama discretizations of the Riemannian Langevin equation (RLE) under two metrics: the metric induced by the embedding $\mathcal S^{n,p}_+ \subset \mathbb{R}^{n\times n}$ and the Bures-Wasserstein metric corresponding to the quotient geometry.
  • results: Examples of energy functions with explicit Gibbs distributions that allow numerical validation of the schemes.
    Abstract This paper introduces two explicit schemes to sample matrices from Gibbs distributions on $\mathcal S^{n,p}_+$, the manifold of real positive semi-definite (PSD) matrices of size $n\times n$ and rank $p$. Given an energy function $\mathcal E:\mathcal S^{n,p}_+\to \mathbb{R}$ and certain Riemannian metrics $g$ on $\mathcal S^{n,p}_+$, these schemes rely on an Euler-Maruyama discretization of the Riemannian Langevin equation (RLE) with Brownian motion on the manifold. We present numerical schemes for RLE under two fundamental metrics on $\mathcal S^{n,p}_+$: (a) the metric obtained from the embedding of $\mathcal S^{n,p}_+ \subset \mathbb{R}^{n\times n} $; and (b) the Bures-Wasserstein metric corresponding to quotient geometry. We also provide examples of energy functions with explicit Gibbs distributions that allow numerical validation of these schemes.
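
Since any rank-$p$ PSD matrix factors as $X = YY^{\top}$ with $Y\in\mathbb{R}^{n\times p}$, and the Bures-Wasserstein metric arises from exactly this quotient, one way to build intuition is to run Euler-Maruyama Langevin dynamics on the factor $Y$. The sketch below does this for a quadratic energy of our choosing; it is a simplified stand-in for, not a reproduction of, the paper's two schemes.

```python
import numpy as np

def langevin_fixed_rank_psd(grad_E, n=6, p=2, step=1e-3, n_steps=5000, seed=0):
    """Euler-Maruyama Langevin sketch on the factor Y (X = Y Y^T), which
    parametrizes rank-p PSD matrices and mirrors the quotient-geometry view."""
    rng = np.random.default_rng(seed)
    Y = rng.standard_normal((n, p))
    for _ in range(n_steps):
        X = Y @ Y.T
        # chain rule: d/dY E(Y Y^T) = (grad_E(X) + grad_E(X)^T) @ Y
        G = (grad_E(X) + grad_E(X).T) @ Y
        Y = Y - step * G + np.sqrt(2 * step) * rng.standard_normal((n, p))
    return Y @ Y.T

# Energy E(X) = 0.5 * ||X - M||_F^2 for a fixed target M; its gradient is X - M.
M = np.diag([3.0, 2.0, 0, 0, 0, 0])
X = langevin_fixed_rank_psd(lambda X: X - M)
print(np.round(np.linalg.eigvalsh(X), 2))   # rank <= 2, concentrated near M
```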

Weighted Unsupervised Domain Adaptation Considering Geometry Features and Engineering Performance of 3D Design Data

  • paper_url: http://arxiv.org/abs/2309.04499
  • repo_url: None
  • paper_authors: Seungyeon Shin, Namwoo Kang
  • for: Accelerating design optimization by predicting the engineering performance of 3D designs with deep learning, avoiding the time-consuming iterative modeling-and-analysis loop.
  • methods: A bi-weighted unsupervised domain adaptation approach that accounts for the geometry features and engineering performance of 3D design data: adversarial training with hypothesis discrepancy extracts domain-invariant features for a multi-output regression task, and a source-instance weighting method avoids negative transfer.
  • results: On a wheel impact analysis problem, the model predicts the magnitude and location of the maximum von Mises stress of 3D road wheels, reduces the target risk for unlabeled target domains, and can efficiently replace conventional finite element analysis.
    Abstract The product design process in manufacturing involves iterative design modeling and analysis to achieve the target engineering performance, but such an iterative process is time consuming and computationally expensive. Recently, deep learning-based engineering performance prediction models have been proposed to accelerate design optimization. However, they only guarantee predictions on training data and may be inaccurate when applied to new domain data. In particular, 3D design data have complex features, which means domains with various distributions exist. Thus, the utilization of deep learning has limitations due to the heavy data collection and training burdens. We propose a bi-weighted unsupervised domain adaptation approach that considers the geometry features and engineering performance of 3D design data. It is specialized for deep learning-based engineering performance predictions. Domain-invariant features can be extracted through an adversarial training strategy by using hypothesis discrepancy, and a multi-output regression task can be performed with the extracted features to predict the engineering performance. In particular, we present a source instance weighting method suitable for 3D design data to avoid negative transfers. The developed bi-weighting strategy based on the geometry features and engineering performance of engineering structures is incorporated into the training process. The proposed model is tested on a wheel impact analysis problem to predict the magnitude of the maximum von Mises stress and the corresponding location of 3D road wheels. This mechanism can reduce the target risk for unlabeled target domains on the basis of weighted multi-source domain knowledge and can efficiently replace conventional finite element analysis.
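
The two ingredients, adversarial feature alignment and source-instance weighting, can be sketched together with a gradient-reversal layer in PyTorch. The toy encoder, heads, and unit weights below are illustrative assumptions; the paper's hypothesis-discrepancy objective and geometry-based weighting are not reproduced.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer: identity forward, -lambda * grad backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

feat = nn.Sequential(nn.Linear(64, 32), nn.ReLU())   # shared feature encoder
reg_head = nn.Linear(32, 1)                          # performance regressor
dom_head = nn.Linear(32, 1)                          # domain discriminator

def weighted_da_loss(x_src, y_src, x_tgt, w_src, lam=0.1):
    """Source-instance weights w_src down-weight dissimilar source designs
    to avoid negative transfer (a sketch of the bi-weighting idea)."""
    f_s, f_t = feat(x_src), feat(x_tgt)
    reg = (w_src * (reg_head(f_s).squeeze(-1) - y_src) ** 2).mean()
    d_s = dom_head(GradReverse.apply(f_s, lam))
    d_t = dom_head(GradReverse.apply(f_t, lam))
    bce = nn.functional.binary_cross_entropy_with_logits
    dom = bce(d_s.squeeze(-1), torch.ones(len(x_src))) + \
          bce(d_t.squeeze(-1), torch.zeros(len(x_tgt)))
    return reg + dom

x_s, y_s = torch.randn(8, 64), torch.randn(8)
x_t, w_s = torch.randn(8, 64), torch.ones(8)
print(weighted_da_loss(x_s, y_s, x_t, w_s))
```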