cs.LG - 2023-07-23

Right for the Wrong Reason: Can Interpretable ML Techniques Detect Spurious Correlations?

  • paper_url: http://arxiv.org/abs/2307.12344
  • repo_url: https://github.com/ss-sun/right-for-the-wrong-reason
  • paper_authors: Susu Sun, Lisa M. Koch, Christian F. Baumgartner
  • for: This paper aims to evaluate the ability of various explanation techniques to identify spurious correlations in deep neural network models.
  • methods: The paper proposes a rigorous evaluation strategy to assess the effectiveness of post-hoc explanation techniques and inherently interpretable classifiers in detecting artificially added confounders in a chest x-ray diagnosis task.
  • results: The paper finds that the post-hoc technique SHAP and the inherently interpretable Attri-Net provide the best performance in identifying faulty model behavior and can be used to reliably detect spurious correlations.
    Abstract While deep neural network models offer unmatched classification performance, they are prone to learning spurious correlations in the data. Such dependencies on confounding information can be difficult to detect using performance metrics if the test data comes from the same distribution as the training data. Interpretable ML methods such as post-hoc explanations or inherently interpretable classifiers promise to identify faulty model reasoning. However, there is mixed evidence whether many of these techniques are actually able to do so. In this paper, we propose a rigorous evaluation strategy to assess an explanation technique's ability to correctly identify spurious correlations. Using this strategy, we evaluate five post-hoc explanation techniques and one inherently interpretable method for their ability to detect three types of artificially added confounders in a chest x-ray diagnosis task. We find that the post-hoc technique SHAP, as well as the inherently interpretable Attri-Net provide the best performance and can be used to reliably identify faulty model behavior.
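A simple way to quantify whether an attribution "points at" a known confounder, in the spirit of the evaluation above, is to measure how much of the top-attributed mass falls inside the artificially added confounder region. The sketch below is a generic check, not the paper's evaluation protocol; the attribution map could come from SHAP or any other saliency method, and the top_frac threshold is an arbitrary choice.

```python
import numpy as np

def confounder_overlap(attribution, confounder_mask, top_frac=0.05):
    """Fraction of the most strongly attributed pixels that fall inside the known confounder region."""
    a = np.abs(attribution).ravel()
    m = confounder_mask.ravel().astype(bool)
    k = max(1, int(top_frac * a.size))
    top_idx = np.argsort(a)[-k:]        # indices of the k most strongly attributed pixels
    return m[top_idx].mean()            # close to 1.0 -> the explanation highlights the confounder

# toy usage: a 64x64 attribution map and a square confounder patch in one corner
rng = np.random.default_rng(0)
attr = rng.normal(size=(64, 64))
mask = np.zeros((64, 64), dtype=bool)
mask[:8, :8] = True
print(confounder_overlap(attr, mask))
```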

Self-Supervised Learning for Audio-Based Emotion Recognition

  • paper_url: http://arxiv.org/abs/2307.12343
  • repo_url: None
  • paper_authors: Peranut Nimitsurachat, Peter Washington
  • for: This study aims to build an audio-based emotion recognition model to support interactive systems in areas such as mental healthcare, marketing, gaming, and social media analysis.
  • methods: Self-supervised learning (SSL), which learns by predicting properties of the data itself and therefore copes with a scarcity of supervised labels; the model is pre-trained on encoded acoustic data with randomly masked timestamps and then fine-tuned on a small annotated sample (a masking sketch follows this entry).
  • results: SSL consistently improves model performance when only a small amount of annotated data is available, with the effect most pronounced for emotions that are easier to classify; the study also shows that SSL works when applied to embedded feature representations rather than the raw input space.
    Abstract Emotion recognition models using audio input data can enable the development of interactive systems with applications in mental healthcare, marketing, gaming, and social media analysis. While the field of affective computing using audio data is rich, a major barrier to achieve consistently high-performance models is the paucity of available training labels. Self-supervised learning (SSL) is a family of methods which can learn despite a scarcity of supervised labels by predicting properties of the data itself. To understand the utility of self-supervised learning for audio-based emotion recognition, we have applied self-supervised learning pre-training to the classification of emotions from the CMU-MOSEI's acoustic modality. Unlike prior papers that have experimented with raw acoustic data, our technique has been applied to encoded acoustic data. Our model is first pretrained to uncover the randomly-masked timestamps of the acoustic data. The pre-trained model is then fine-tuned using a small sample of annotated data. The performance of the final model is then evaluated via several evaluation metrics against a baseline deep learning model with an identical backbone architecture. We find that self-supervised learning consistently improves the performance of the model across all metrics. This work shows the utility of self-supervised learning for affective computing, demonstrating that self-supervised learning is most useful when the number of training examples is small, and that the effect is most pronounced for emotions which are easier to classify such as happy, sad and anger. This work further demonstrates that self-supervised learning works when applied to embedded feature representations rather than the traditional approach of pre-training on the raw input space.
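As a rough illustration of the pre-training objective described above (reconstructing randomly masked timestamps of encoded acoustic features), here is a minimal PyTorch sketch. The GRU backbone, the 74-dimensional feature size, and the 15% masking rate are placeholder assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class MaskedTimestepPretrainer(nn.Module):
    """Toy model pretrained to reconstruct randomly masked timesteps of encoded acoustic features."""
    def __init__(self, feat_dim=74, hidden=256):
        super().__init__()
        self.backbone = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, feat_dim)

    def forward(self, x, mask_prob=0.15):
        mask = torch.rand(x.shape[:2], device=x.device) < mask_prob   # (B, T) boolean mask
        x_masked = x.clone()
        x_masked[mask] = 0.0                                          # zero out the masked timesteps
        h, _ = self.backbone(x_masked)
        recon = self.head(h)
        return ((recon - x) ** 2)[mask].mean()                        # loss only on masked positions

model = MaskedTimestepPretrainer()
features = torch.randn(8, 100, 74)      # stand-in batch of encoded acoustic sequences (B, T, D)
loss = model(features)                  # after pre-training, the backbone is fine-tuned on labels
loss.backward()
```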

Rapid detection of soil carbonates by means of NIR spectroscopy, deep learning methods and phase quantification by powder Xray diffraction

  • paper_url: http://arxiv.org/abs/2307.12341
  • repo_url: None
  • paper_authors: Lykourgos Chiniadis, Petros Tamvakis
  • for: This study aims to support agricultural production and the analysis of soil properties, which are key prerequisites for agroecological balance and environmental sustainability.
  • methods: FT-NIR reflectance spectroscopy combined with deep learning methods is used to predict the carbonate content of soil (a minimal regression sketch follows this entry).
  • results: The models achieve excellent prediction results, enabling rapid and efficient prediction of soil carbonate content even when no volumetric method is available.
    Abstract Soil NIR spectral absorbance/reflectance libraries are utilized towards improving agricultural production and analysis of soil properties which are key prerequisite for agroecological balance and environmental sustainability. Carbonates in particular, represent a soil property which is mostly affected even by mild, let alone extreme, changes of environmental conditions during climate change. In this study we propose a rapid and efficient way to predict carbonates content in soil by means of FT NIR reflectance spectroscopy and by use of deep learning methods. We exploited multiple machine learning methods, such as: 1) a MLP Regressor and 2) a CNN and compare their performance with other traditional ML algorithms such as PLSR, Cubist and SVM on the combined dataset of two NIR spectral libraries: KSSL (USDA), a dataset of soil samples reflectance spectra collected nationwide, and LUCAS TopSoil (European Soil Library) which contains soil sample absorbance spectra from all over the European Union, and use them to predict carbonate content on never before seen soil samples. Soil samples in KSSL and in TopSoil spectral libraries were acquired in the spectral region of visNIR, however in this study, only the NIR spectral region was utilized. Quantification of carbonates by means of Xray Diffraction is in good agreement with the volumetric method and the MLP prediction. Our work contributes to rapid carbonates content prediction in soil samples in cases where: 1) no volumetric method is available and 2) only NIR spectra absorbance data are available. Up till now and to the best of our knowledge, there exists no other study, that presents a prediction model trained on such an extensive dataset with such promising results on unseen data, undoubtedly supporting the notion that deep learning models present excellent prediction tools for soil carbonates content.
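A minimal baseline in the spirit of the MLP regressor mentioned above: fit a small multilayer perceptron to NIR spectra to predict carbonate content. The synthetic spectra, layer sizes, and split below are placeholder assumptions; the paper's models are trained on the combined KSSL and LUCAS TopSoil libraries.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score

# Synthetic stand-in: rows are NIR absorbance/reflectance spectra, targets are carbonate content.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1000))                       # 500 samples x 1000 NIR wavelengths
y = 2.0 * X[:, 100] + rng.normal(scale=0.1, size=500)  # toy relation between one band and carbonates

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(128, 64), max_iter=500, random_state=0))
model.fit(X_tr, y_tr)
print("R2 on held-out spectra:", r2_score(y_te, model.predict(X_te)))
```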

TabADM: Unsupervised Tabular Anomaly Detection with Diffusion Models

  • paper_url: http://arxiv.org/abs/2307.12336
  • repo_url: None
  • paper_authors: Guy Zamberg, Moshe Salhov, Ofir Lindenbaum, Amir Averbuch
  • for: This work presents a diffusion-based probabilistic model for unsupervised anomaly detection in tabular data.
  • methods: The model learns the density of normal samples using a dedicated rejection scheme that attenuates the influence of anomalies on the density estimate; at inference, anomalies are identified as samples lying in low-density regions (a generic rejection sketch follows this entry).
  • results: Experiments on real data show improved detection capability over baselines, and the method is relatively stable to the data dimension and does not require extensive hyperparameter tuning.
    Abstract Tables are an abundant form of data with use cases across all scientific fields. Real-world datasets often contain anomalous samples that can negatively affect downstream analysis. In this work, we only assume access to contaminated data and present a diffusion-based probabilistic model effective for unsupervised anomaly detection. Our model is trained to learn the density of normal samples by utilizing a unique rejection scheme to attenuate the influence of anomalies on the density estimation. At inference, we identify anomalies as samples in low-density regions. We use real data to demonstrate that our method improves detection capabilities over baselines. Furthermore, our method is relatively stable to the dimension of the data and does not require extensive hyperparameter tuning.
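The abstract describes a rejection scheme that attenuates the influence of anomalies on the density estimate. One generic way such a scheme can look (not necessarily TabADM's) is a trimmed loss that drops the highest-loss fraction of each training batch before averaging, so suspected anomalies contribute nothing to the gradient:

```python
import torch

def trimmed_loss(per_sample_loss, reject_frac=0.1):
    """Average a batch loss after rejecting the reject_frac highest-loss samples."""
    k = int((1.0 - reject_frac) * per_sample_loss.numel())
    kept, _ = torch.topk(per_sample_loss, k, largest=False)   # keep the k smallest losses
    return kept.mean()

# usage sketch: per_sample_loss would be the diffusion model's denoising loss per table row
losses = torch.rand(64)
print(trimmed_loss(losses))
```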

An axiomatized PDE model of deep neural networks

  • paper_url: http://arxiv.org/abs/2307.12333
  • repo_url: None
  • paper_authors: Tangjun Wang, Wenqi Tao, Chenglong Bao, Zuoqiang Shi
  • for: To study the relation between deep neural networks (DNNs) and partial differential equations (PDEs), in particular the general form of PDE models of DNNs.
  • methods: The DNN is formulated as an evolution operator acting on a simple base model; under several reasonable assumptions, this evolution operator is shown to be governed by a convection-diffusion equation (the generic form is recalled after this entry).
  • results: Based on the convection-diffusion equation, a new training method for ResNets is proposed, and experiments validate its effectiveness.
    Abstract Inspired by the relation between deep neural network (DNN) and partial differential equations (PDEs), we study the general form of the PDE models of deep neural networks. To achieve this goal, we formulate DNN as an evolution operator from a simple base model. Based on several reasonable assumptions, we prove that the evolution operator is actually determined by convection-diffusion equation. This convection-diffusion equation model gives mathematical explanation for several effective networks. Moreover, we show that the convection-diffusion model improves the robustness and reduces the Rademacher complexity. Based on the convection-diffusion equation, we design a new training method for ResNets. Experiments validate the performance of the proposed method.
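For reference, this is the generic convection-diffusion form in standard notation; the paper's exact coefficients and derivation are not reproduced here, and reading the initial condition as the simple base model is an interpretation of the abstract:

```latex
\[
\frac{\partial u}{\partial t}(x,t) + v(x,t)\cdot\nabla u(x,t) = \sigma\,\Delta u(x,t),
\qquad u(x,0) = f(x),
\]
% v is a convection (drift) term and \sigma a diffusion coefficient; the paper argues that
% the convection-diffusion model improves robustness and reduces Rademacher complexity.
```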

Tackling the Curse of Dimensionality with Physics-Informed Neural Networks

  • paper_url: http://arxiv.org/abs/2307.12306
  • repo_url: None
  • paper_authors: Zheyuan Hu, Khemraj Shukla, George Em Karniadakis, Kenji Kawaguchi
  • for: Solving high-dimensional PDEs, where the curse of dimensionality makes computation prohibitively expensive.
  • methods: Stochastic Dimension Gradient Descent (SDGD), which decomposes the gradient of the PDE into pieces corresponding to different dimensions and randomly samples a subset of these dimensional pieces in each iteration of training physics-informed neural networks (PINNs); a toy sketch of the dimension-sampling idea follows this entry.
  • results: Many notoriously hard high-dimensional PDEs, such as the Hamilton-Jacobi-Bellman (HJB) and Schrödinger equations, can be solved very fast in thousands of dimensions on a single GPU.
    Abstract The curse-of-dimensionality (CoD) taxes computational resources heavily with exponentially increasing computational cost as the dimension increases. This poses great challenges in solving high-dimensional PDEs as Richard Bellman first pointed out over 60 years ago. While there has been some recent success in solving numerically partial differential equations (PDEs) in high dimensions, such computations are prohibitively expensive, and true scaling of general nonlinear PDEs to high dimensions has never been achieved. In this paper, we develop a new method of scaling up physics-informed neural networks (PINNs) to solve arbitrary high-dimensional PDEs. The new method, called Stochastic Dimension Gradient Descent (SDGD), decomposes a gradient of PDEs into pieces corresponding to different dimensions and samples randomly a subset of these dimensional pieces in each iteration of training PINNs. We theoretically prove the convergence guarantee and other desired properties of the proposed method. We experimentally demonstrate that the proposed method allows us to solve many notoriously hard high-dimensional PDEs, including the Hamilton-Jacobi-Bellman (HJB) and the Schrödinger equations in thousands of dimensions very fast on a single GPU using the PINNs mesh-free approach. For instance, we solve nontrivial nonlinear PDEs (one HJB equation and one Black-Scholes equation) in 100,000 dimensions in 6 hours on a single GPU using SDGD with PINNs. Since SDGD is a general training methodology of PINNs, SDGD can be applied to any current and future variants of PINNs to scale them up for arbitrary high-dimensional PDEs.
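As a toy illustration of the dimension-sampling idea behind SDGD (estimate a Laplacian-type term from a random subset of dimensions and rescale so the estimate stays unbiased), here is a PyTorch sketch. The small network, the Poisson-style residual, and the choice of 16 sampled dimensions are placeholder assumptions, not the authors' implementation.

```python
import torch

def sampled_laplacian(u_fn, x, sample_dims):
    """Unbiased estimate of the Laplacian of u_fn at points x using only a subset of dimensions.
    x: (N, d) collocation points with requires_grad=True."""
    d = x.shape[1]
    u = u_fn(x)                                                    # (N, 1)
    grad = torch.autograd.grad(u.sum(), x, create_graph=True)[0]   # (N, d) first derivatives
    lap_est = 0.0
    for i in sample_dims:                                          # second derivative along each sampled dimension
        u_ii = torch.autograd.grad(grad[:, i].sum(), x, create_graph=True)[0][:, i]
        lap_est = lap_est + u_ii
    return (d / len(sample_dims)) * lap_est                        # rescale so the estimate is unbiased

d = 1000
net = torch.nn.Sequential(torch.nn.Linear(d, 128), torch.nn.Tanh(), torch.nn.Linear(128, 1))
x = torch.rand(32, d, requires_grad=True)           # batch of collocation points
dims = torch.randperm(d)[:16].tolist()              # sample 16 of the 1000 dimensions this iteration
residual = sampled_laplacian(net, x, dims)          # would enter a PINN residual loss for this iteration
```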

Physics-Informed Machine Learning of Argon Gas-Driven Melt Pool Dynamics

  • paper_url: http://arxiv.org/abs/2307.12304
  • repo_url: None
  • paper_authors: R. Sharma, W. Grace Guo, M. Raissi, Y. B. Guo
  • for: This paper studies melt pool dynamics in metal additive manufacturing (AM), which are critical to process stability, microstructure formation, and the final properties of the printed material.
  • methods: A physics-informed machine learning (PIML) approach integrates neural networks with the governing physical laws to predict melt pool dynamics such as temperature, velocity, and pressure without any training data on velocity; this avoids numerically solving the highly non-linear Navier-Stokes equation and significantly reduces computational cost.
  • results: The difficult-to-determine model constants of the governing equations are inferred through data-driven discovery, and the physics-informed neural network (PINN) architecture is optimized for efficient model training; data efficiency comes from the soft penalty that incorporates the governing PDEs, initial conditions, and boundary conditions into the PINN model.
    Abstract Melt pool dynamics in metal additive manufacturing (AM) is critical to process stability, microstructure formation, and final properties of the printed materials. Physics-based simulation including computational fluid dynamics (CFD) is the dominant approach to predict melt pool dynamics. However, the physics-based simulation approaches suffer from the inherent issue of very high computational cost. This paper provides a physics-informed machine learning (PIML) method by integrating neural networks with the governing physical laws to predict the melt pool dynamics such as temperature, velocity, and pressure without using any training data on velocity. This approach avoids solving the highly non-linear Navier-Stokes equation numerically, which significantly reduces the computational cost. The difficult-to-determine model constants of the governing equations of the melt pool can also be inferred through data-driven discovery. In addition, the physics-informed neural network (PINN) architecture has been optimized for efficient model training. The data-efficient PINN model is attributed to the soft penalty by incorporating governing partial differential equations (PDEs), initial conditions, and boundary conditions in the PINN model.

RANSAC-NN: Unsupervised Image Outlier Detection using RANSAC

  • paper_url: http://arxiv.org/abs/2307.12301
  • repo_url: https://github.com/mxtsai/ransac-nn
  • paper_authors: Chen-Han Tsai, Yu-Shao Peng
  • for: This paper proposes an outlier detection algorithm designed specifically for image data, to ensure the quality and accuracy of image datasets used in computer vision tasks.
  • methods: The algorithm compares images in a RANSAC-based manner and automatically predicts an outlier score for each image without additional training or label information (a loose RANSAC-style analogue is sketched after this entry).
  • results: Across 15 diverse datasets and without any hyperparameter tuning, RANSAC-NN consistently performs favorably against state-of-the-art outlier detection algorithms; the paper also provides a detailed analysis of each RANSAC-NN component and demonstrates its potential application to image mislabeling detection.
    Abstract Image outlier detection (OD) is crucial for ensuring the quality and accuracy of image datasets used in computer vision tasks. The majority of OD algorithms, however, have not been targeted toward image data. Consequently, the results of applying such algorithms to images are often suboptimal. In this work, we propose RANSAC-NN, a novel unsupervised OD algorithm specifically designed for images. By comparing images in a RANSAC-based approach, our algorithm automatically predicts the outlier score of each image without additional training or label information. We evaluate RANSAC-NN against state-of-the-art OD algorithms on 15 diverse datasets. Without any hyperparameter tuning, RANSAC-NN consistently performs favorably in contrast to other algorithms in almost every dataset category. Furthermore, we provide a detailed analysis to understand each RANSAC-NN component, and we demonstrate its potential applications in image mislabeled detection. Code for RANSAC-NN is provided at https://github.com/mxtsai/ransac-nn
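The RANSAC-based image comparison itself is not spelled out in the abstract, so the following is only a loose RANSAC-flavoured analogue on pre-computed image features: repeatedly fit a simple consensus model from a random subset and score every image by its distance to these consensus models. The feature source, subset size, and trial count are placeholder assumptions.

```python
import numpy as np

def ransac_style_scores(features, n_trials=100, subset_size=20, seed=0):
    """Score each image by its average distance to consensus models fit on random feature subsets.
    Higher score -> more likely an outlier. Illustrative analogue, not the RANSAC-NN algorithm."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    scores = np.zeros(n)
    for _ in range(n_trials):
        idx = rng.choice(n, size=subset_size, replace=False)
        center = features[idx].mean(axis=0)                  # consensus model from the random subset
        scores += np.linalg.norm(features - center, axis=1)  # distance of every image to the consensus
    return scores / n_trials

feats = np.random.default_rng(1).normal(size=(200, 512))     # stand-in for per-image feature vectors
print(ransac_style_scores(feats)[:5])
```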

ResWCAE: Biometric Pattern Image Denoising Using Residual Wavelet-Conditioned Autoencoder

  • paper_url: http://arxiv.org/abs/2307.12255
  • repo_url: None
  • paper_authors: Youzhi Liang, Wen Liang
  • for: This paper proposes a deep learning architecture for fingerprint image denoising in compact IoT devices, aiming to improve the reliability of biometric authentication systems.
  • methods: The proposed method, the Residual Wavelet-Conditioned Convolutional Autoencoder (Res-WCAE), combines an image encoder and a wavelet encoder with a Kullback-Leibler divergence regularization, leveraging residual connections and wavelet-transform-domain features to preserve fine-grained spatial information.
  • results: Res-WCAE outperforms several state-of-the-art denoising methods, particularly for heavily degraded fingerprint images with high levels of noise, showing promise for improving the reliability of biometric authentication in compact IoT devices.
    Abstract The utilization of biometric authentication with pattern images is increasingly popular in compact Internet of Things (IoT) devices. However, the reliability of such systems can be compromised by image quality issues, particularly in the presence of high levels of noise. While state-of-the-art deep learning algorithms designed for generic image denoising have shown promise, their large number of parameters and lack of optimization for unique biometric pattern retrieval make them unsuitable for these devices and scenarios. In response to these challenges, this paper proposes a lightweight and robust deep learning architecture, the Residual Wavelet-Conditioned Convolutional Autoencoder (Res-WCAE) with a Kullback-Leibler divergence (KLD) regularization, designed specifically for fingerprint image denoising. Res-WCAE comprises two encoders - an image encoder and a wavelet encoder - and one decoder. Residual connections between the image encoder and decoder are leveraged to preserve fine-grained spatial features, where the bottleneck layer conditioned on the compressed representation of features obtained from the wavelet encoder using approximation and detail subimages in the wavelet-transform domain. The effectiveness of Res-WCAE is evaluated against several state-of-the-art denoising methods, and the experimental results demonstrate that Res-WCAE outperforms these methods, particularly for heavily degraded fingerprint images in the presence of high levels of noise. Overall, Res-WCAE shows promise as a solution to the challenges faced by biometric authentication systems in compact IoT devices.

Explainable Depression Detection via Head Motion Patterns

  • paper_url: http://arxiv.org/abs/2307.12241
  • repo_url: None
  • paper_authors: Monika Gahalawat, Raul Fernandez Rojas, Tanaya Guha, Ramanathan Subramanian, Roland Goecke
  • for: The paper is written to detect depression symptoms using head motion data and machine learning methods.
  • methods: The paper uses two approaches to detect depression: (a) discovering kinemes from head motion data corresponding to both depressed patients and healthy controls, and (b) learning kineme patterns only from healthy controls, and computing statistics derived from reconstruction errors for both the patient and control classes.
  • results: The paper finds that head motion patterns are effective biomarkers for detecting depressive symptoms, and that explanatory kineme patterns consistent with prior findings can be observed for the two classes. The paper achieves peak F1 scores of 0.79 and 0.82, respectively, over BlackDog and AVEC2013 datasets for binary classification over episodic thin-slices, and a peak F1 of 0.72 over videos for AVEC2013.
    Abstract While depression has been studied via multimodal non-verbal behavioural cues, head motion behaviour has not received much attention as a biomarker. This study demonstrates the utility of fundamental head-motion units, termed kinemes, for depression detection by adopting two distinct approaches, and employing distinctive features: (a) discovering kinemes from head motion data corresponding to both depressed patients and healthy controls, and (b) learning kineme patterns only from healthy controls, and computing statistics derived from reconstruction errors for both the patient and control classes. Employing machine learning methods, we evaluate depression classification performance on the BlackDog and AVEC2013 datasets. Our findings indicate that: (1) head motion patterns are effective biomarkers for detecting depressive symptoms, and (2) explanatory kineme patterns consistent with prior findings can be observed for the two classes. Overall, we achieve peak F1 scores of 0.79 and 0.82, respectively, over BlackDog and AVEC2013 for binary classification over episodic thin-slices, and a peak F1 of 0.72 over videos for AVEC2013.
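An illustrative analogue of approach (b) above, assuming head motion is available as a (pitch, yaw, roll) time series: learn elementary motion units from healthy controls only (here with plain k-means, which the paper does not claim to use), then summarize reconstruction errors as features for a downstream classifier. Data shapes, window length, and the number of units are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def windows(pose_series, length=30):
    """Slice a (T, 3) head-pose series into flattened, non-overlapping fixed-length windows."""
    return np.stack([pose_series[i:i + length].ravel()
                     for i in range(0, len(pose_series) - length + 1, length)])

control_windows = windows(np.random.randn(3000, 3))        # stand-in for healthy-control head motion
units = KMeans(n_clusters=16, n_init=10, random_state=0).fit(control_windows)  # "kineme-like" units

def reconstruction_stats(pose_series):
    w = windows(pose_series)
    nearest = units.cluster_centers_[units.predict(w)]      # best-matching unit per window
    err = np.linalg.norm(w - nearest, axis=1)               # reconstruction error per window
    return np.array([err.mean(), err.std(), err.max()])     # simple statistics used as features

print(reconstruction_stats(np.random.randn(3000, 3)))       # would feed a downstream classifier
```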

Demonstration of a Response Time Based Remaining Useful Life (RUL) Prediction for Software Systems

  • paper_url: http://arxiv.org/abs/2307.12237
  • repo_url: None
  • paper_authors: Ray Islam, Peter Sandborn
  • for: This paper aims to apply Prognostic and Health Management (PHM) concepts to software systems in order to predict faults and estimate the remaining useful life (RUL) of the system.
  • methods: RUL is predicted prognostically and continuously from usage parameters (e.g., the numbers and categories of releases) and performance parameters (e.g., response time); a toy trend-extrapolation sketch follows this entry.
  • results: Comparison against actual data shows that PHM concepts can be applied to software systems and that an RUL can be calculated to support system management decisions.
    Abstract Prognostic and Health Management (PHM) has been widely applied to hardware systems in the electronics and non-electronics domains but has not been explored for software. While software does not decay over time, it can degrade over release cycles. Software health management is confined to diagnostic assessments that identify problems, whereas prognostic assessment potentially indicates when in the future a problem will become detrimental. Relevant research areas such as software defect prediction, software reliability prediction, predictive maintenance of software, software degradation, and software performance prediction, exist, but all of these represent diagnostic models built upon historical data, none of which can predict an RUL for software. This paper addresses the application of PHM concepts to software systems for fault predictions and RUL estimation. Specifically, this paper addresses how PHM can be used to make decisions for software systems such as version update and upgrade, module changes, system reengineering, rejuvenation, maintenance scheduling, budgeting, and total abandonment. This paper presents a method to prognostically and continuously predict the RUL of a software system based on usage parameters (e.g., the numbers and categories of releases) and performance parameters (e.g., response time). The model developed has been validated by comparing actual data, with the results that were generated by predictive models. Statistical validation (regression validation, and k-fold cross validation) has also been carried out. A case study, based on publicly available data for the Bugzilla application is presented. This case study demonstrates that PHM concepts can be applied to software systems and RUL can be calculated to make system management decisions.
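A hedged sketch of the general idea of a response-time-based RUL estimate: fit a degradation trend to a performance parameter across releases and extrapolate to a failure threshold. The threshold, the linear trend, and the numbers below are illustrative assumptions, not the paper's calibrated model.

```python
import numpy as np

releases = np.arange(1, 11)                                     # release index (usage parameter)
response_ms = np.array([120, 122, 125, 131, 138, 146, 150, 159, 170, 182])  # performance parameter
threshold_ms = 250.0                                            # response time treated as "failed"

slope, intercept = np.polyfit(releases, response_ms, 1)         # linear degradation trend
release_at_threshold = (threshold_ms - intercept) / slope
rul_releases = release_at_threshold - releases[-1]              # remaining useful life in releases
print(f"Estimated RUL: {rul_releases:.1f} more releases before crossing {threshold_ms} ms")
```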

Multi-Modal Machine Learning for Assessing Gaming Skills in Online Streaming: A Case Study with CS:GO

  • paper_url: http://arxiv.org/abs/2307.12236
  • repo_url: None
  • paper_authors: Longxiang Zhang, Wenping Wang
  • for: This study aims to help streaming service providers assess gaming skills from videos so that they can offer customized recommendations and service promotions to their customers.
  • methods: Several variants of the latest end-to-end models are used to learn a joint representation of multiple modalities, with extensive experiments demonstrating their efficacy.
  • results: The study finds that the proposed models are prone to identifying individual users rather than learning meaningful representations; future work is proposed to address this issue.
    Abstract Online streaming is an emerging market that attracts much attention. Assessing gaming skills from videos is an important task for streaming service providers to discover talented gamers. Service providers require the information to offer customized recommendation and service promotion to their customers. Meanwhile, this is also an important multi-modal machine learning task since online streaming combines vision, audio and text modalities. In this study we begin by identifying flaws in the dataset and proceed to clean it manually. Then we propose several variants of the latest end-to-end models to learn joint representation of multiple modalities. Through our extensive experimentation, we demonstrate the efficacy of our proposals. Moreover, we identify that our proposed models are prone to identifying users instead of learning meaningful representations. We propose future work to address this issue.

EchoGLAD: Hierarchical Graph Neural Networks for Left Ventricle Landmark Detection on Echocardiograms

  • paper_url: http://arxiv.org/abs/2307.12229
  • repo_url: https://github.com/masoudmo/echoglad
  • paper_authors: Masoud Mokhtari, Mobina Mahdavi, Hooman Vaseli, Christina Luong, Purang Abolmaesumi, Teresa S. M. Tsang, Renjie Liao
  • for: To automatically detect four landmark locations of the left ventricle of the heart and measure the internal dimension of the left ventricle and the approximate mass of the surrounding muscle.
  • methods: An echocardiogram-based hierarchical graph neural network (GNN), EchoGLAD, is used for left ventricle landmark detection.
  • results: Evaluated on a public and a private dataset under in-distribution (ID) and out-of-distribution (OOD) settings, the model achieves state-of-the-art mean absolute errors (MAEs) of 1.46 mm and 1.86 mm in the ID setting and shows better OOD generalization than prior works.
    Abstract The functional assessment of the left ventricle chamber of the heart requires detecting four landmark locations and measuring the internal dimension of the left ventricle and the approximate mass of the surrounding muscle. The key challenge of automating this task with machine learning is the sparsity of clinical labels, i.e., only a few landmark pixels in a high-dimensional image are annotated, leading many prior works to heavily rely on isotropic label smoothing. However, such a label smoothing strategy ignores the anatomical information of the image and induces some bias. To address this challenge, we introduce an echocardiogram-based, hierarchical graph neural network (GNN) for left ventricle landmark detection (EchoGLAD). Our main contributions are: 1) a hierarchical graph representation learning framework for multi-resolution landmark detection via GNNs; 2) induced hierarchical supervision at different levels of granularity using a multi-level loss. We evaluate our model on a public and a private dataset under the in-distribution (ID) and out-of-distribution (OOD) settings. For the ID setting, we achieve the state-of-the-art mean absolute errors (MAEs) of 1.46 mm and 1.86 mm on the two datasets. Our model also shows better OOD generalization than prior works with a testing MAE of 4.3 mm.

The identification of garbage dumps in the rural areas of Cyprus through the application of deep learning to satellite imagery

  • paper_url: http://arxiv.org/abs/2308.02502
  • repo_url: None
  • paper_authors: Andrew Keith Wilkinson
  • for: This study investigates the use of artificial intelligence techniques together with satellite imagery to identify illegal garbage dumps in the rural areas of Cyprus.
  • methods: A modest baseline dataset of satellite images was collected and categorised as containing or not containing garbage, then enlarged with data augmentation techniques, and a convolutional neural network (CNN) was trained to recognise the presence or absence of garbage in new images.
  • results: The resulting deep learning model correctly identifies images containing garbage in approximately 90% of cases and could form the basis of a future system for building a comprehensive garbage map of the island.
    Abstract Garbage disposal is a challenging problem throughout the developed world. In Cyprus, as elsewhere, illegal "fly-tipping" is a significant issue, especially in rural areas where few legal garbage disposal options exist. However, there is a lack of studies that attempt to measure the scale of this problem, and few resources available to address it. A method of automating the process of identifying garbage dumps would help counter this and provide information to the relevant authorities. The aim of this study was to investigate the degree to which artificial intelligence techniques, together with satellite imagery, can be used to identify illegal garbage dumps in the rural areas of Cyprus. This involved collecting a novel dataset of images that could be categorised as either containing, or not containing, garbage. The collection of such datasets in sufficient raw quantities is time consuming and costly. Therefore a relatively modest baseline set of images was collected, then data augmentation techniques used to increase the size of this dataset to a point where useful machine learning could occur. From this set of images an artificial neural network was trained to recognise the presence or absence of garbage in new images. A type of neural network especially suited to this task known as "convolutional neural networks" was used. The efficacy of the resulting model was evaluated using an independently collected dataset of test images. The result was a deep learning model that could correctly identify images containing garbage in approximately 90% of cases. It is envisaged that this model could form the basis of a future system that could systematically analyse the entire landscape of Cyprus to build a comprehensive "garbage" map of the island.

Geometry-Aware Adaptation for Pretrained Models

  • paper_url: http://arxiv.org/abs/2307.12226
  • repo_url: None
  • paper_authors: Nicholas Roberts, Xintong Li, Dyah Adila, Sonia Cromp, Tzu-Heng Huang, Jitian Zhao, Frederic Sala
  • for: Improving zero-shot prediction performance and adapting trained models to reliably predict new classes.
  • methods: Metric information over the label space is exploited to adapt trained models without additional training, by swapping the standard argmax prediction rule for the Fréchet mean (a small sketch of this rule follows this entry).
  • results: Up to a 29.7% relative improvement over SimCLR on ImageNet, scaling to hundreds of thousands of classes; when no external metric is available, self-derived metrics from class embeddings yield a 10.5% improvement on pretrained zero-shot models such as CLIP.
    Abstract Machine learning models -- including prominent zero-shot models -- are often trained on datasets whose labels are only a small proportion of a larger label space. Such spaces are commonly equipped with a metric that relates the labels via distances between them. We propose a simple approach to exploit this information to adapt the trained model to reliably predict new classes -- or, in the case of zero-shot prediction, to improve its performance -- without any additional training. Our technique is a drop-in replacement of the standard prediction rule, swapping argmax with the Fréchet mean. We provide a comprehensive theoretical analysis for this approach, studying (i) learning-theoretic results trading off label space diameter, sample complexity, and model dimension, (ii) characterizations of the full range of scenarios in which it is possible to predict any unobserved class, and (iii) an optimal active learning-like next class selection procedure to obtain optimal training classes for when it is not possible to predict the entire range of unobserved classes. Empirically, using easily-available external metrics, our proposed approach, Loki, gains up to 29.7% relative improvement over SimCLR on ImageNet and scales to hundreds of thousands of classes. When no such metric is available, Loki can use self-derived metrics from class embeddings and obtains a 10.5% improvement on pretrained zero-shot models such as CLIP.
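The drop-in prediction rule described above, replacing argmax with the Fréchet mean over the label metric, can be written in a few lines. In the sketch below, dist is a pairwise distance matrix from the observed classes to all candidate labels; the toy line metric and the probabilities are made-up examples.

```python
import numpy as np

def frechet_mean_predict(probs, dist):
    """probs: (B, k) probabilities over the k observed classes.
    dist:  (k, K) distances from the observed classes to all K candidate labels (K may include unseen classes).
    Returns, per row, the candidate label minimizing the probability-weighted squared distance."""
    costs = probs @ (dist ** 2)        # (B, K): expected squared distance to each candidate label
    return costs.argmin(axis=1)

# toy usage: 3 observed classes at positions 0, 2, 4 on a line of 5 candidate labels (metric = |i - j|)
observed, candidates = np.array([0, 2, 4]), np.arange(5)
dist = np.abs(observed[:, None] - candidates[None, :]).astype(float)
probs = np.array([[0.1, 0.8, 0.1]])
print(frechet_mean_predict(probs, dist))   # -> [2]; shifting probability mass can select unseen labels 1 or 3
```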

Improving Out-of-Distribution Robustness of Classifiers via Generative Interpolation

  • paper_url: http://arxiv.org/abs/2307.12219
  • repo_url: None
  • paper_authors: Haoyue Bai, Ceyuan Yang, Yinghao Xu, S. -H. Gary Chan, Bolei Zhou
  • for: Improving the out-of-distribution (OoD) robustness of neural classifiers.
  • methods: Generative models are used as a data augmentation source: generators trained and fine-tuned on multiple domains are fused by linearly interpolating their model parameters to synthesize diverse OoD samples (a parameter-interpolation sketch follows this entry).
  • results: Experiments show that the proposed method clearly improves OoD robustness across datasets and distribution shifts, with interpolation coefficients that flexibly control the direction and strength of the augmentation.
    Abstract Deep neural networks achieve superior performance for learning from independent and identically distributed (i.i.d.) data. However, their performance deteriorates significantly when handling out-of-distribution (OoD) data, where the training and test are drawn from different distributions. In this paper, we explore utilizing the generative models as a data augmentation source for improving out-of-distribution robustness of neural classifiers. Specifically, we develop a simple yet effective method called Generative Interpolation to fuse generative models trained from multiple domains for synthesizing diverse OoD samples. Training a generative model directly on the source domains tends to suffer from mode collapse and sometimes amplifies the data bias. Instead, we first train a StyleGAN model on one source domain and then fine-tune it on the other domains, resulting in many correlated generators where their model parameters have the same initialization thus are aligned. We then linearly interpolate the model parameters of the generators to spawn new sets of generators. Such interpolated generators are used as an extra data augmentation source to train the classifiers. The interpolation coefficients can flexibly control the augmentation direction and strength. In addition, a style-mixing mechanism is applied to further improve the diversity of the generated OoD samples. Our experiments show that the proposed method explicitly increases the diversity of training domains and achieves consistent improvements over baselines across datasets and multiple different distribution shifts.
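The parameter-interpolation step is easy to sketch: because the per-domain generators share the same initialization, their weights are aligned and can be mixed linearly. The nn.Linear placeholders below stand in for StyleGAN generators, and alpha=0.3 is an arbitrary choice.

```python
import torch

def interpolate_state_dicts(sd_a, sd_b, alpha):
    """Linearly interpolate two aligned checkpoints; alpha controls augmentation direction/strength."""
    return {k: (1.0 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}

gen_a = torch.nn.Linear(64, 64)          # placeholder for the generator trained on domain A
gen_b = torch.nn.Linear(64, 64)          # placeholder for the generator fine-tuned on domain B
gen_mix = torch.nn.Linear(64, 64)
gen_mix.load_state_dict(interpolate_state_dicts(gen_a.state_dict(), gen_b.state_dict(), alpha=0.3))
# samples drawn from gen_mix would then serve as extra OoD-style augmentation for classifier training
```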

Mental Workload Estimation with Electroencephalogram Signals by Combining Multi-Space Deep Models

  • paper_url: http://arxiv.org/abs/2308.02409
  • repo_url: None
  • paper_authors: Hong-Hai Nguyen, Ngumimi Karen Iyortsuun, Hyung-Jeong Yang, Guee-Sang Lee, Soo-Hyung Kim
  • for: This study aims to classify mental workload into three states and to estimate continuum levels.
  • methods: The method combines multiple dimensions of space to obtain the best mental-workload estimates: Temporal Convolutional Networks are used in the time domain, and a new Multi-Dimensional Residual Block architecture, which combines residual blocks, is proposed for the frequency domain.
  • results: The approach accurately classifies mental workload into the three states and estimates continuum levels.
    Abstract The human brain is in a continuous state of activity during both work and rest. Mental activity is a daily process, and when the brain is overworked, it can have negative effects on human health. In recent years, great attention has been paid to early detection of mental health problems because it can help prevent serious health problems and improve quality of life. Several signals are used to assess mental state, but the electroencephalogram (EEG) is widely used by researchers because of the large amount of information it provides about the brain. This paper aims to classify mental workload into three states and estimate continuum levels. Our method combines multiple dimensions of space to achieve the best results for mental estimation. In the time domain approach, we use Temporal Convolutional Networks, and in the frequency domain, we propose a new architecture called the Multi-Dimensional Residual Block, which combines residual blocks.

Adversarial Agents For Attacking Inaudible Voice Activated Devices

  • paper_url: http://arxiv.org/abs/2307.12204
  • repo_url: None
  • paper_authors: Forrest McKee, David Noever
  • for: This paper explores the threat posed by inaudible attacks on voice-activated Internet of Things devices.
  • methods: Reinforcement learning is applied to novel IoT network configurations, with six algorithms evaluated in Microsoft's CyberBattleSim framework.
  • results: Deep-Q learning with exploitation proved optimal, rapidly gaining ownership of all nodes in fewer steps.
    Abstract The paper applies reinforcement learning to novel Internet of Thing configurations. Our analysis of inaudible attacks on voice-activated devices confirms the alarming risk factor of 7.6 out of 10, underlining significant security vulnerabilities scored independently by NIST National Vulnerability Database (NVD). Our baseline network model showcases a scenario in which an attacker uses inaudible voice commands to gain unauthorized access to confidential information on a secured laptop. We simulated many attack scenarios on this baseline network model, revealing the potential for mass exploitation of interconnected devices to discover and own privileged information through physical access without adding new hardware or amplifying device skills. Using Microsoft's CyberBattleSim framework, we evaluated six reinforcement learning algorithms and found that Deep-Q learning with exploitation proved optimal, leading to rapid ownership of all nodes in fewer steps. Our findings underscore the critical need for understanding non-conventional networks and new cybersecurity measures in an ever-expanding digital landscape, particularly those characterized by mobile devices, voice activation, and non-linear microphones susceptible to malicious actors operating stealth attacks in the near-ultrasound or inaudible ranges. By 2024, this new attack surface might encompass more digital voice assistants than people on the planet yet offer fewer remedies than conventional patching or firmware fixes since the inaudible attacks arise inherently from the microphone design and digital signal processing.

NCART: Neural Classification and Regression Tree for Tabular Data

  • paper_url: http://arxiv.org/abs/2307.12198
  • repo_url: None
  • paper_authors: Jiaqi Luo, Shixin Xu
  • for: This paper proposes an interpretable deep learning model for tabular data, addressing the high computational cost of deep models on large-scale or high-dimensional datasets and their lack of interpretability.
  • methods: The Neural Classification and Regression Tree (NCART), a novel interpretable neural network, modifies Residual Networks by replacing fully-connected layers with multiple differentiable oblivious decision trees; integrating decision trees into the architecture preserves interpretability while retaining the end-to-end capabilities of neural networks (a generic soft oblivious tree layer is sketched after this entry).
  • results: Extensive numerical experiments show that NCART outperforms existing deep learning models across datasets of varying sizes, establishing it as a strong competitor to tree-based models.
    Abstract Deep learning models have become popular in the analysis of tabular data, as they address the limitations of decision trees and enable valuable applications like semi-supervised learning, online learning, and transfer learning. However, these deep-learning approaches often encounter a trade-off. On one hand, they can be computationally expensive when dealing with large-scale or high-dimensional datasets. On the other hand, they may lack interpretability and may not be suitable for small-scale datasets. In this study, we propose a novel interpretable neural network called Neural Classification and Regression Tree (NCART) to overcome these challenges. NCART is a modified version of Residual Networks that replaces fully-connected layers with multiple differentiable oblivious decision trees. By integrating decision trees into the architecture, NCART maintains its interpretability while benefiting from the end-to-end capabilities of neural networks. The simplicity of the NCART architecture makes it well-suited for datasets of varying sizes and reduces computational costs compared to state-of-the-art deep learning models. Extensive numerical experiments demonstrate the superior performance of NCART compared to existing deep learning models, establishing it as a strong competitor to tree-based models.
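To make "differentiable oblivious decision tree" concrete, here is a generic soft oblivious tree layer: every level applies one shared soft split, so depth d yields 2^d leaves mixed by path probabilities. This is a sketch of the building block in general, not NCART's exact layer; the depth, sigmoid splits, and linear feature selection are assumptions.

```python
import torch
import torch.nn as nn

class SoftObliviousTree(nn.Module):
    """Minimal differentiable oblivious tree: one soft split per level, shared across all paths."""
    def __init__(self, in_features, depth=3, out_features=1):
        super().__init__()
        self.depth = depth
        self.select = nn.Linear(in_features, depth)           # one learned soft split per level
        self.leaves = nn.Parameter(torch.randn(2 ** depth, out_features))

    def forward(self, x):
        decisions = torch.sigmoid(self.select(x))             # (B, depth): "go right" probabilities
        leaf_prob = x.new_ones(x.shape[0], 1)
        for level in range(self.depth):
            d = decisions[:, level:level + 1]                  # (B, 1) shared split at this level
            leaf_prob = torch.cat([leaf_prob * (1 - d), leaf_prob * d], dim=1)  # split every path
        return leaf_prob @ self.leaves                         # probability-weighted mix of leaf values

tree = SoftObliviousTree(in_features=10, depth=3)
print(tree(torch.randn(4, 10)).shape)                          # torch.Size([4, 1])
```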

Monadic Deep Learning

  • paper_url: http://arxiv.org/abs/2307.12187
  • repo_url: https://github.com/ThoughtWorksInc/monadic-deep-learning
  • paper_authors: Bo Yang, Zhihao Zhang Kirisame Marisa, Kai Shi
  • for: The paper aims to let users of statically typed languages build neural network models with the expressive power that dynamically typed deep learning frameworks offer, without writing boilerplate back-propagation code.
  • methods: A novel approach performs reverse-mode automatic differentiation for statically typed functions containing multiple trainable variables, and a set of monads and monad transformers (plus applicative functors for parallel calculations) lets users express dynamic neural networks as monadic expressions in an intuitive and concise way.
  • results: With DeepLearning.scala, users were able to create complex neural networks intuitively and concisely while maintaining type safety.
    Abstract The Java and Scala community has built a very successful big data ecosystem. However, most of the neural networks running on it are modeled in dynamically typed programming languages. These dynamically typed deep learning frameworks treat neural networks as differentiable expressions that contain many trainable variables, and perform automatic differentiation on those expressions when training them. Until 2019, none of the learning frameworks in statically typed languages provided the expressive power of traditional frameworks. Their users are not able to use custom algorithms unless creating plenty of boilerplate code for hard-coded back-propagation. We solved this problem in DeepLearning.scala 2. Our contributions are: 1. We discovered a novel approach to perform automatic differentiation in reverse mode for statically typed functions that contain multiple trainable variables, and can interoperate freely with the metalanguage. 2. We designed a set of monads and monad transformers, which allow users to create monadic expressions that represent dynamic neural networks. 3. Along with these monads, we provide some applicative functors, to perform multiple calculations in parallel. With these features, users of DeepLearning.scala were able to create complex neural networks in an intuitive and concise way, and still maintain type safety.

Machine learning discovers invariants of braids and flat braids

  • paper_url: http://arxiv.org/abs/2307.12185
  • repo_url: None
  • paper_authors: Alexei Lisitsa, Mateo Salles, Alexei Vernitski
  • for: Using machine learning to classify examples of braids (or flat braids) as trivial or non-trivial.
  • methods: Using supervised learning with neural networks (multilayer perceptrons).
  • results: Discovering new convenient invariants of braids, including a complete invariant of flat braids.
    Abstract We use machine learning to classify examples of braids (or flat braids) as trivial or non-trivial. Our ML takes form of supervised learning using neural networks (multilayer perceptrons). When they achieve good results in classification, we are able to interpret their structure as mathematical conjectures and then prove these conjectures as theorems. As a result, we find new convenient invariants of braids, including a complete invariant of flat braids.

Prototype-Driven and Multi-Expert Integrated Multi-Modal MR Brain Tumor Image Segmentation

  • paper_url: http://arxiv.org/abs/2307.12180
  • repo_url: https://github.com/linzy0227/pdminet
  • paper_authors: Yafei Zhang, Zhiyuan Li, Huafeng Li, Dapeng Tao
  • for: This multi-modal MR brain tumor segmentation method addresses the fact that existing methods directly extract discriminative features from input images for tumor sub-region category determination and localization while ignoring the impact of information aliasing caused by the mutual inclusion of tumor sub-regions.
  • methods: A tumor-prototype-driven, multi-expert integrated approach highlights the features of each tumor sub-region: a mutual transmission mechanism transfers features between modalities to address the insufficiency of single-modal features, and a prototype-driven feature representation and fusion method implants the learned prototypes into tumor features to generate corresponding activation maps.
  • results: Experimental results on three competition brain tumor segmentation datasets demonstrate the superiority of the proposed method.
    Abstract For multi-modal magnetic resonance (MR) brain tumor image segmentation, current methods usually directly extract the discriminative features from input images for tumor sub-region category determination and localization. However, the impact of information aliasing caused by the mutual inclusion of tumor sub-regions is often ignored. Moreover, existing methods usually do not take tailored efforts to highlight the single tumor sub-region features. To this end, a multi-modal MR brain tumor segmentation method with tumor prototype-driven and multi-expert integration is proposed. It could highlight the features of each tumor sub-region under the guidance of tumor prototypes. Specifically, to obtain the prototypes with complete information, we propose a mutual transmission mechanism to transfer different modal features to each other to address the issues raised by insufficient information on single-modal features. Furthermore, we devise a prototype-driven feature representation and fusion method with the learned prototypes, which implants the prototypes into tumor features and generates corresponding activation maps. With the activation maps, the sub-region features consistent with the prototype category can be highlighted. A key information enhancement and fusion strategy with multi-expert integration is designed to further improve the segmentation performance. The strategy can integrate the features from different layers of the extra feature extraction network and the features highlighted by the prototypes. Experimental results on three competition brain tumor segmentation datasets prove the superiority of the proposed method.

Learn to Compress (LtC): Efficient Learning-based Streaming Video Analytics

  • paper_url: http://arxiv.org/abs/2307.12171
  • repo_url: None
  • paper_authors: Quazi Mishkatul Alam, Israat Haque, Nael Abu-Ghazaleh
  • for: Build an efficient edge-to-cloud streaming video analytics framework that reduces the bandwidth and energy consumed by video transmission.
  • methods: A lightweight student neural network at the video source, trained by the full server-side analytics model acting as a teacher, learns the semantic significance of regions in the video so that crucial regions are preserved in high quality while the rest is aggressively compressed; a feature-differencing temporal filter additionally omits frames that carry no new information (see the sketch after this entry).
  • results: Compared with recently published streaming frameworks, LtC uses 28-35% less bandwidth and has up to 45% shorter response delay while achieving similar analytics performance.
    Abstract Video analytics are often performed as cloud services in edge settings, mainly to offload computation, and also in situations where the results are not directly consumed at the video sensors. Sending high-quality video data from the edge devices can be expensive both in terms of bandwidth and power use. In order to build a streaming video analytics pipeline that makes efficient use of these resources, it is therefore imperative to reduce the size of the video stream. Traditional video compression algorithms are unaware of the semantics of the video, and can be both inefficient and harmful for the analytics performance. In this paper, we introduce LtC, a collaborative framework between the video source and the analytics server, that efficiently learns to reduce the video streams within an analytics pipeline. Specifically, LtC uses the full-fledged analytics algorithm at the server as a teacher to train a lightweight student neural network, which is then deployed at the video source. The student network is trained to comprehend the semantic significance of various regions within the videos, which is used to differentially preserve the crucial regions in high quality while the remaining regions undergo aggressive compression. Furthermore, LtC also incorporates a novel temporal filtering algorithm based on feature-differencing to omit transmitting frames that do not contribute new information. Overall, LtC is able to use 28-35% less bandwidth and has up to 45% shorter response delay compared to recently published state of the art streaming frameworks while achieving similar analytics performance.
    摘要 视频分析通常在云服务中进行,主要是为了减轻计算负担,以及在视频感知不直接在视频传感器上进行处理。往往将高质量视频数据从边缘设备传输到云服务器可能会占用很多带宽和电力资源。为建立高效的流动视频分析管道,因此是非常重要减小视频流。传统的视频压缩算法不了解视频的 semantics,可能会导致不fficient和对分析性能有害。在这篇论文中,我们介绍了 LtC,一个协同框架,其中视频源和分析服务器之间协同减小视频流。特别是,LtC 使用全功能的分析算法作为老师来训练一个轻量级神经网络,并将其部署到视频源上。学生网络被训练以理解视频中各个区域的semantic Significance,并使用这些区域来差分保留高质量视频,而其他区域则进行了激进压缩。此外,LtC 还包括一种基于特征差异的时间滤波算法,以便快速忽略不包含新信息的帧。总之,LtC 可以使用28-35%的带宽和45%的响应延迟,相比之下最新的流动框架,而 achieved similar analytics performance。
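A minimal sketch of the feature-differencing idea: compute a cheap per-frame feature and transmit a frame only when it differs enough from the last transmitted one. The block-mean feature and the relative-distance threshold below are stand-ins; LtC derives its decisions from a learned student network rather than this hand-crafted feature.

```python
# Minimal sketch of feature-differencing temporal filtering (assumed feature
# extractor and threshold; not LtC's learned student network).
import numpy as np

def frame_feature(frame: np.ndarray) -> np.ndarray:
    """Cheap stand-in feature: per-channel means of 8x8 blocks."""
    h, w, c = frame.shape
    blocks = frame[: h // 8 * 8, : w // 8 * 8].reshape(h // 8, 8, w // 8, 8, c)
    return blocks.mean(axis=(1, 3)).ravel()

def filter_frames(frames, threshold=0.05):
    """Yield only frames whose features differ enough from the last kept frame."""
    last = None
    for i, frame in enumerate(frames):
        feat = frame_feature(frame)
        if last is None or np.linalg.norm(feat - last) / (np.linalg.norm(last) + 1e-8) > threshold:
            last = feat
            yield i, frame   # transmit this frame

# Toy usage: 30 nearly identical frames plus one scene change at index 20.
frames = [np.full((64, 64, 3), 0.5) + 0.001 * np.random.randn(64, 64, 3) for _ in range(30)]
frames[20] = np.random.rand(64, 64, 3)
kept = [i for i, _ in filter_frames(frames)]
print("transmitted frames:", kept)
```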

Optimized Network Architectures for Large Language Model Training with Billions of Parameters

  • paper_url: http://arxiv.org/abs/2307.12169
  • repo_url: None
  • paper_authors: Weiyang Wang, Manya Ghobadi, Kayvon Shakeri, Ying Zhang, Naader Hasani
  • for: This paper challenges the established practice of building any-to-any networks for training Large Language Models (LLMs).
  • methods: LLM training exhibits a distinctive communication pattern: only small groups of GPUs need high-bandwidth any-to-any communication among themselves to reach near-optimal training performance, while traffic across these groups is low-volume, sparse, and homogeneous. The proposed architecture partitions the cluster into HB domains connected internally by non-blocking any-to-any high-bandwidth interconnects; across HB domains, the network connects only GPUs with actual communication demands, a design the authors call a "rail-only" connection (a toy connectivity count follows this entry).
  • results: The proposed architecture reduces network cost by up to 75% compared to state-of-the-art any-to-any Clos networks without compromising LLM training performance.
    Abstract This paper challenges the well-established paradigm for building any-to-any networks for training Large Language Models (LLMs). We show that LLMs exhibit a unique communication pattern where only small groups of GPUs require high-bandwidth any-to-any communication within them, to achieve near-optimal training performance. Across these groups of GPUs, the communication is insignificant, sparse, and homogeneous. We propose a new network architecture that closely resembles the communication requirement of LLMs. Our architecture partitions the cluster into sets of GPUs interconnected with non-blocking any-to-any high-bandwidth interconnects that we call HB domains. Across the HB domains, the network only connects GPUs with communication demands. We call this network a "rail-only" connection, and show that our proposed architecture reduces the network cost by up to 75% compared to the state-of-the-art any-to-any Clos networks without compromising the performance of LLM training.
    摘要 本文挑战了为训练大型语言模型(LLM)构建任意对任意网络的既定范式。我们发现 LLM 训练呈现出一种独特的通信模式:只有小组 GPU 之间需要高带宽的任意对任意通信即可达到接近最优的训练性能,而这些小组之间的通信量很小、稀疏且均匀。我们提出一种与 LLM 通信需求高度匹配的新网络架构:将集群划分为若干 HB domain,每个 HB domain 内部由非阻塞的任意对任意高带宽互连连接;跨 HB domain 时,网络只连接有通信需求的 GPU。我们称这种网络为 "rail-only" 连接,并证明该架构可在不影响 LLM 训练性能的前提下,将网络成本最多降低 75%。
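As a toy illustration (not the paper's cost model), the snippet below counts the GPU pairs that would need direct high-bandwidth connectivity under a full any-to-any design versus a rail-only design in which GPUs communicate across HB domains only along rails of equal local rank. The cluster shape is hypothetical.

```python
# Toy connectivity count for a hypothetical cluster; not the paper's cost model.
def full_any_to_any(num_domains, domain_size):
    n = num_domains * domain_size
    return n * (n - 1) // 2

def rail_only(num_domains, domain_size):
    # Inside each HB domain: non-blocking any-to-any among its GPUs.
    intra = num_domains * (domain_size * (domain_size - 1) // 2)
    # Across HB domains: only GPUs sharing the same local rank (a "rail") talk.
    inter = domain_size * (num_domains * (num_domains - 1) // 2)
    return intra + inter

domains, size = 16, 8   # hypothetical cluster: 16 HB domains of 8 GPUs each
print("any-to-any pairs:", full_any_to_any(domains, size))   # 8128
print("rail-only pairs :", rail_only(domains, size))          # 1408
```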

Facial Point Graphs for Amyotrophic Lateral Sclerosis Identification

  • paper_url: http://arxiv.org/abs/2307.12159
  • repo_url: None
  • paper_authors: Nícolas Barbosa Gomes, Arissa Yoshida, Mateus Roder, Guilherme Camargo de Oliveira, João Paulo Papa
  • for: Identify amyotrophic lateral sclerosis (ALS) in its early stages so that treatment can begin sooner, improving the outlook and overall well-being of affected individuals.
  • methods: Computational analysis of patients' facial expressions: Facial Point Graphs learn from the geometry of facial images to identify ALS automatically, exploiting the fact that specific facial-muscle movements (e.g., when opening the mouth) differ from those of healthy individuals (see the sketch after this entry).
  • results: On the Toronto Neuroface dataset, the proposed approach outperforms state-of-the-art results, fostering promising developments in the area.
    Abstract Identifying Amyotrophic Lateral Sclerosis (ALS) in its early stages is essential for establishing the beginning of treatment, enriching the outlook, and enhancing the overall well-being of those affected individuals. However, early diagnosis and detecting the disease's signs is not straightforward. A simpler and cheaper way arises by analyzing the patient's facial expressions through computational methods. When a patient with ALS engages in specific actions, e.g., opening their mouth, the movement of specific facial muscles differs from that observed in a healthy individual. This paper proposes Facial Point Graphs to learn information from the geometry of facial images to identify ALS automatically. The experimental outcomes in the Toronto Neuroface dataset show the proposed approach outperformed state-of-the-art results, fostering promising developments in the area.
    摘要 早期识别amyotrophic lateral sclerosis(ALS)是非常重要,可以提高治疗的开始,改善患者的生活质量和总体情况。然而,早期诊断和识别病状的标准方法并不是很直forward。本文提出了一种使用计算机方法分析患者的面部表情来自动识别ALS的方法。当患者进行特定的动作时,如打开嘴巴,特定的面部肌肉的运动会与健康人的不同。这篇论文使用面部点图学习geometry of facial images来自动识别ALS,实验结果表明该方法在多伦多Neuroface dataset中超越了现有的最佳结果,为您的发展提供了有希望的前景。
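The sketch below conveys the flavour of a point-graph classifier over facial landmarks: landmarks become graph nodes, a k-nearest-neighbour adjacency connects them, and a plain normalized-adjacency graph convolution with mean pooling produces a binary ALS/healthy prediction. The 68-landmark layout, the k-NN adjacency, and the two-layer readout are assumptions rather than the authors' architecture.

```python
# Minimal sketch of a graph classifier over facial landmarks (assumed layout).
import torch
import torch.nn as nn

N_LANDMARKS, FEAT = 68, 2            # assumed (x, y) coordinates per landmark

def normalized_adjacency(coords, k=4):
    d = torch.cdist(coords, coords)                        # (N, N) distances
    idx = d.topk(k + 1, largest=False).indices[:, 1:]      # k nearest neighbours
    A = torch.zeros(N_LANDMARKS, N_LANDMARKS)
    A.scatter_(1, idx, 1.0)
    A = ((A + A.T) > 0).float() + torch.eye(N_LANDMARKS)   # symmetrize + self-loops
    return A / A.sum(1, keepdim=True)                       # row-normalize

class FacialPointGCN(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.w1 = nn.Linear(FEAT, hidden)
        self.w2 = nn.Linear(hidden, hidden)
        self.head = nn.Linear(hidden, 2)                     # ALS vs. healthy

    def forward(self, coords):                               # coords: (N, 2)
        A = normalized_adjacency(coords)
        h = torch.relu(self.w1(A @ coords))                  # message passing
        h = torch.relu(self.w2(A @ h))
        return self.head(h.mean(0))                          # graph-level logits

model = FacialPointGCN()
print(model(torch.rand(N_LANDMARKS, FEAT)))
```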

DIP-RL: Demonstration-Inferred Preference Learning in Minecraft

  • paper_url: http://arxiv.org/abs/2307.12158
  • repo_url: None
  • paper_authors: Ellen Novoseller, Vinicius G. Goecks, David Watkins, Josh Miller, Nicholas Waytowich
  • for: Address the difficulty of learning sequential decision-making in unstructured, open-ended real-world settings where a reward signal is unknown and humans cannot reliably craft one that correctly captures desired behavior.
  • methods: Demonstration-Inferred Preference Reinforcement Learning (DIP-RL) leverages human demonstrations in three ways: training an autoencoder, seeding RL training batches with demonstration data, and inferring preferences over behaviors to learn a reward function that guides RL (see the sketch after this entry).
  • results: In a tree-chopping task in Minecraft, DIP-RL guides an RL agent toward a reward function that reflects human preferences and performs competitively relative to baselines. Example trajectory rollouts are available at https://sites.google.com/view/dip-rl.
    Abstract In machine learning for sequential decision-making, an algorithmic agent learns to interact with an environment while receiving feedback in the form of a reward signal. However, in many unstructured real-world settings, such a reward signal is unknown and humans cannot reliably craft a reward signal that correctly captures desired behavior. To solve tasks in such unstructured and open-ended environments, we present Demonstration-Inferred Preference Reinforcement Learning (DIP-RL), an algorithm that leverages human demonstrations in three distinct ways, including training an autoencoder, seeding reinforcement learning (RL) training batches with demonstration data, and inferring preferences over behaviors to learn a reward function to guide RL. We evaluate DIP-RL in a tree-chopping task in Minecraft. Results suggest that the method can guide an RL agent to learn a reward function that reflects human preferences and that DIP-RL performs competitively relative to baselines. DIP-RL is inspired by our previous work on combining demonstrations and pairwise preferences in Minecraft, which was awarded a research prize at the 2022 NeurIPS MineRL BASALT competition, Learning from Human Feedback in Minecraft. Example trajectory rollouts of DIP-RL and baselines are located at https://sites.google.com/view/dip-rl.
    摘要 在用于序列决策的机器学习中,算法代理通过与环境交互并接收奖励信号形式的反馈来学习。但在许多无结构的实际场景中,这种奖励信号是未知的,人们也无法可靠地设计一个正确捕捉期望行为的奖励信号。为了解决这类无结构、开放环境中的任务,我们提出了 Demonstration-Inferred Preference Reinforcement Learning(DIP-RL)算法,该算法以三种方式利用人类示范:训练 autoencoder、用示范数据填充 RL 训练批次,以及推断对行为的偏好以学习用于引导 RL 的奖励函数。我们在 Minecraft 的砍树任务中评估了 DIP-RL,结果表明该方法可以引导 RL 代理学习反映人类偏好的奖励函数,并且 DIP-RL 相对基线表现具有竞争力。DIP-RL 受启发于我们此前在 Minecraft 中结合示范与成对偏好的工作,该工作在 2022 年 NeurIPS MineRL BASALT 比赛(Learning from Human Feedback in Minecraft)中获得研究奖。DIP-RL 与基线的示例轨迹展示位于 https://sites.google.com/view/dip-rl 。
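A minimal sketch of the preference-inference component: a reward network is trained with a Bradley-Terry objective under the assumption that demonstration segments are preferred over agent segments. Observation dimensions, segment length, and the random placeholder data are assumptions; DIP-RL additionally uses an autoencoder and demonstration-seeded RL batches, which are omitted here.

```python
# Minimal sketch of demonstration-inferred preference reward learning
# (assumed shapes; demo segments are treated as preferred over agent segments).
import torch
import torch.nn as nn

OBS_DIM, SEG_LEN = 16, 20

reward_net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward_net.parameters(), lr=1e-3)

def segment_return(segment):                  # segment: (SEG_LEN, OBS_DIM)
    return reward_net(segment).sum()

# Placeholder data standing in for encoded demonstration / rollout segments.
demo_segments  = torch.randn(32, SEG_LEN, OBS_DIM)
agent_segments = torch.randn(32, SEG_LEN, OBS_DIM)

for step in range(100):
    i = torch.randint(0, 32, (8,))
    r_demo  = torch.stack([segment_return(s) for s in demo_segments[i]])
    r_agent = torch.stack([segment_return(s) for s in agent_segments[i]])
    # Bradley-Terry: P(demo preferred) = sigmoid(R_demo - R_agent); maximize it.
    loss = -torch.log(torch.sigmoid(r_demo - r_agent) + 1e-8).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print("final preference loss:", loss.item())
```

The learned reward can then be plugged into a standard RL algorithm as the training signal.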

Identifying contributors to supply chain outcomes in a multi-echelon setting: a decentralised approach

  • paper_url: http://arxiv.org/abs/2307.12157
  • repo_url: None
  • paper_authors: Stefan Schoepf, Jack Foster, Alexandra Brintrup
  • for: Help organisations identify the causes of changes in metrics such as product quality and delivery duration, which is especially challenging in multi-echelon supply chains that are only partially observable.
  • methods: Explainable artificial intelligence is used for decentralised computation of each actor's estimated contribution to a metric of interest in a multi-stage production process, removing the need to convince supply chain actors to share raw data (see the sketch after this entry).
  • results: Empirical validation on data from a real multi-stage manufacturing process demonstrates the effectiveness of the decentralised approach in detecting the source of quality variations, benchmarked against a centralised approach using Shapley additive explanations.
    Abstract Organisations often struggle to identify the causes of change in metrics such as product quality and delivery duration. This task becomes increasingly challenging when the cause lies outside of company borders in multi-echelon supply chains that are only partially observable. Although traditional supply chain management has advocated for data sharing to gain better insights, this does not take place in practice due to data privacy concerns. We propose the use of explainable artificial intelligence for decentralised computing of estimated contributions to a metric of interest in a multi-stage production process. This approach mitigates the need to convince supply chain actors to share data, as all computations occur in a decentralised manner. Our method is empirically validated using data collected from a real multi-stage manufacturing process. The results demonstrate the effectiveness of our approach in detecting the source of quality variations compared to a centralised approach using Shapley additive explanations.
    摘要 企业们经常难以确定生产质量和交付时间的变化的原因。这个任务在多层供应链中,只能半 observability 的情况下变得更加困难。传统的供应链管理推荐数据共享,以获得更好的洞察力,但在实践中,由于数据隐私问题,这并没有实现。我们建议使用可解释人工智能 для分布式计算估算的贡献因素,以解决不必要地让供应链 aktör 分享数据的问题。我们的方法在实验 validate 使用实际的多Stage生产过程中的数据,结果显示,我们的方法可以更好地探测质量变化的来源,比中央化使用Shapley加itive解释法。
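The sketch below is a deliberately simplified stand-in for decentralised contribution estimation: each actor fits a model on its own private data, computes feature attributions locally (plain permutation importance here, rather than the paper's method), and shares only the resulting scores with a coordinator. Actor names, features, and data are hypothetical.

```python
# Simplified stand-in: local attribution per actor, only scores are shared.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)

def local_contributions(X, y, feature_names):
    """Runs entirely inside one supply-chain actor; only scores leave."""
    model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
    imp = permutation_importance(model, X, y, n_repeats=5, random_state=0)
    return dict(zip(feature_names, imp.importances_mean))

# Two actors with private process data influencing a shared quality metric y.
X1 = rng.normal(size=(200, 2)); y1 = 2.0 * X1[:, 0] + rng.normal(scale=0.1, size=200)
X2 = rng.normal(size=(200, 2)); y2 = 0.3 * X2[:, 1] + rng.normal(scale=0.1, size=200)

shared = {
    "actor_A": local_contributions(X1, y1, ["temperature", "humidity"]),
    "actor_B": local_contributions(X2, y2, ["line_speed", "tool_wear"]),
}
print(shared)   # the coordinator sees attribution scores only, never raw data
```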

Real-Time Neural Video Recovery and Enhancement on Mobile Devices

  • paper_url: http://arxiv.org/abs/2307.12152
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Zhaoyuan He, Yifan Yang, Lili Qiu, Kyoungjun Park
  • for: Optimize the streaming experience for video on mobile devices.
  • methods: A novel video frame recovery scheme, a new super-resolution algorithm, and a receiver-enhancement-aware video bitrate adaptation algorithm.
  • results: Implemented on an iPhone 12, the approach supports real-time enhancement at 30 FPS and increases video quality of experience (QoE) by 24%-82% across WiFi, 3G, 4G, and 5G networks.
    Abstract As mobile devices become increasingly popular for video streaming, it's crucial to optimize the streaming experience for these devices. Although deep learning-based video enhancement techniques are gaining attention, most of them cannot support real-time enhancement on mobile devices. Additionally, many of these techniques are focused solely on super-resolution and cannot handle partial or complete loss or corruption of video frames, which is common on the Internet and wireless networks. To overcome these challenges, we present a novel approach in this paper. Our approach consists of (i) a novel video frame recovery scheme, (ii) a new super-resolution algorithm, and (iii) a receiver enhancement-aware video bit rate adaptation algorithm. We have implemented our approach on an iPhone 12, and it can support 30 frames per second (FPS). We have evaluated our approach in various networks such as WiFi, 3G, 4G, and 5G networks. Our evaluation shows that our approach enables real-time enhancement and results in a significant increase in video QoE (Quality of Experience) of 24\% - 82\% in our video streaming system.
    摘要 为了优化移动设备上的视频流处理,随着移动设备的普及,现在已经非常重要。虽然深度学习基于视频提升技术在获得关注,但大多数这些技术无法在移动设备上实时进行提升。此外,许多这些技术都是专注于超解像,而不是处理部分或完全丢失的视频帧,这是网络和无线网络中的常见问题。为了解决这些挑战,我们在本文中提出了一种新的方法。我们的方法包括以下三个部分:(i) 一种新的视频帧恢复算法,(ii) 一种新的超解像算法,(iii) 一种基于接收器提升的视频比特率自适应算法。我们在iPhone 12上实现了我们的方法,并可以支持30帧/秒。我们在WiFi、3G、4G和5G网络中进行了评估,我们的评估结果表明,我们的方法可以实现实时提升,并导致视频Quality of Experience(QoE)提高24%-82%。

Learned Gridification for Efficient Point Cloud Processing

  • paper_url: http://arxiv.org/abs/2307.14354
  • repo_url: https://github.com/computri/gridifier
  • paper_authors: Putri A. van der Linden, David W. Romero, Erik J. Bekkers
  • for: Address the poor scalability of point cloud processing, whose root cause is the irregularity of point cloud data.
  • methods: "Learned gridification" is introduced as the first step of a point cloud processing pipeline, transforming the point cloud into a compact, regular grid so that subsequent layers can use operations defined on regular grids (e.g., Conv3D); a learnable de-gridification step maps the grid back to the original point cloud form for point-cloud-to-point-cloud tasks such as segmentation (see the sketch after this entry).
  • results: Theoretical and empirical analyses show that gridified networks scale better in memory and time than networks applied directly to raw point cloud data while achieving competitive results.
    Abstract Neural operations that rely on neighborhood information are much more expensive when deployed on point clouds than on grid data due to the irregular distances between points in a point cloud. In a grid, on the other hand, we can compute the kernel only once and reuse it for all query positions. As a result, operations that rely on neighborhood information scale much worse for point clouds than for grid data, specially for large inputs and large neighborhoods. In this work, we address the scalability issue of point cloud methods by tackling its root cause: the irregularity of the data. We propose learnable gridification as the first step in a point cloud processing pipeline to transform the point cloud into a compact, regular grid. Thanks to gridification, subsequent layers can use operations defined on regular grids, e.g., Conv3D, which scale much better than native point cloud methods. We then extend gridification to point cloud to point cloud tasks, e.g., segmentation, by adding a learnable de-gridification step at the end of the point cloud processing pipeline to map the compact, regular grid back to its original point cloud form. Through theoretical and empirical analysis, we show that gridified networks scale better in terms of memory and time than networks directly applied on raw point cloud data, while being able to achieve competitive results. Our code is publicly available at https://github.com/computri/gridifier.
    摘要 神经操作依赖地域信息在点云上比在格子数据上更加昂贵,因为点云中点的距离不规则。在格子中,我们可以一次计算核心,然后将其重复使用所有查询位置。因此,基于地域信息的操作在点云上缩放比格子数据更差,特别是对于大输入和大地域。在这项工作中,我们解决点云方法的扩展性问题,通过将点云转换为可 compact、规则的格子。感谢gridification,后续层可以使用定义在规则格子上的操作,例如Conv3D,这些操作在点云数据上缩放更好。我们还扩展gridification来点云到点云任务,例如分割,通过在点云处理管道的末端添加学习的de-gridification步骤,将紧凑的规则格子映射回原始点云形式。通过理论和实验分析,我们表明gridified网络在内存和时间上比直接应用于原始点云数据更好的扩展性,同时能够达到竞争性的结果。我们的代码公开在https://github.com/computri/gridifier。
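To make the gridification idea concrete, the sketch below scatters point features into a regular voxel grid by average pooling and then applies a standard Conv3D. The paper learns the gridification (and a de-gridification for point-cloud outputs); the fixed pooling, grid resolution, and feature sizes here are assumptions.

```python
# Minimal sketch: fixed average-pooling gridification followed by Conv3D
# (the paper learns this step and adds a learnable de-gridification).
import torch
import torch.nn as nn

def gridify(points, feats, resolution=16):
    """points: (N, 3) in [0, 1], feats: (N, C) -> grid of shape (C, R, R, R)."""
    R, C = resolution, feats.shape[1]
    idx = (points.clamp(0, 1 - 1e-6) * R).long()                 # voxel index per point
    flat = idx[:, 0] * R * R + idx[:, 1] * R + idx[:, 2]         # (N,)
    grid = torch.zeros(R * R * R, C).index_add_(0, flat, feats)
    count = torch.zeros(R * R * R).index_add_(0, flat, torch.ones(len(points)))
    grid = grid / count.clamp(min=1).unsqueeze(1)                # average pooling
    return grid.T.reshape(C, R, R, R)

points = torch.rand(2048, 3)
feats = torch.randn(2048, 8)
grid = gridify(points, feats)                                    # (8, 16, 16, 16)
out = nn.Conv3d(8, 16, kernel_size=3, padding=1)(grid.unsqueeze(0))
print(out.shape)                                                 # (1, 16, 16, 16, 16)
```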

CorrFL: Correlation-Based Neural Network Architecture for Unavailability Concerns in a Heterogeneous IoT Environment

  • paper_url: http://arxiv.org/abs/2307.12149
  • repo_url: https://github.com/Western-OC2-Lab/CorrFL
  • paper_authors: Ibrahim Shaer, Abdallah Shami
  • for: Address local model architecture heterogeneity and the unavailability of distributed IoT nodes in Federated Learning (FL), referred to as the "Oblique Federated Learning" problem.
  • methods: Correlation-based Federated Learning (CorrFL), influenced by representation learning, projects the heterogeneous model weights into a common latent space; its loss minimizes the reconstruction error when models are absent and maximizes the correlation between the generated models (see the sketch after this entry).
  • results: Evaluated on a realistic use case with one unavailable IoT device and heightened occupancy-driven activity, the CorrFL models generated for the missing device outperform benchmark models on every criterion, combining prediction MAE and the impact of the amount of exchanged data.
    Abstract The Federated Learning (FL) paradigm faces several challenges that limit its application in real-world environments. These challenges include the local models' architecture heterogeneity and the unavailability of distributed Internet of Things (IoT) nodes due to connectivity problems. These factors posit the question of "how can the available models fill the training gap of the unavailable models?". This question is referred to as the "Oblique Federated Learning" problem. This problem is encountered in the studied environment that includes distributed IoT nodes responsible for predicting CO2 concentrations. This paper proposes the Correlation-based FL (CorrFL) approach influenced by the representational learning field to address this problem. CorrFL projects the various model weights to a common latent space to address the model heterogeneity. Its loss function minimizes the reconstruction loss when models are absent and maximizes the correlation between the generated models. The latter factor is critical because of the intersection of the feature spaces of the IoT devices. CorrFL is evaluated on a realistic use case, involving the unavailability of one IoT device and heightened activity levels that reflect occupancy. The generated CorrFL models for the unavailable IoT device from the available ones trained on the new environment are compared against models trained on different use cases, referred to as the benchmark model. The evaluation criteria combine the mean absolute error (MAE) of predictions and the impact of the amount of exchanged data on the prediction performance improvement. Through a comprehensive experimental procedure, the CorrFL model outperformed the benchmark model in every criterion.
    摘要 联邦学习(FL)模式面临许多实际环境中的挑战,这些挑战包括本地模型的架构多样性和分布式互联网络端的网络问题,这们问题使得“如何让可用的模型填充缺失的模型?”这个问题被称为“偏角联邦学习”问题。这个问题在分散式互联网络端负责预测CO2浓度的环境中被研究。这篇文章提出了基于相互关联学习(CorrFL)方法,它将多个模型的weight投射到共同的潜在空间以解决模型多样性问题。CorrFL的损失函数将缺失的模型的重建损失降低至最小,并将可用模型生成的模型之间的相互相关性提高。这个因素是critical,因为分散式互联网络端的特征空间 intersection。通过一个实际的使用情况,这篇文章评估了CorrFL在一个 IoT 设备缺失和活动水平增加的情况下的表现。生成的CorrFL模型与不可用的 IoT 设备进行比较,并与不同的使用情况下的模型进行比较,这些模型被称为底线模型。评估标准包括预测误差的总平均误差(MAE)和预测性能改善的资料交换量影响。通过一个完整的实验程序,CorrFL 模型在每个标准中都表现出优于底线模型。
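A minimal sketch of the CorrFL training signal under assumed weight-vector sizes: per-node autoencoders map heterogeneous flattened model weights into a shared latent space, and the loss combines reconstruction error with a correlation term between a model generated for an "unavailable" node and that node's true weights. The loss weighting, latent size, and Pearson-correlation form are assumptions.

```python
# Minimal sketch of a CorrFL-style objective (assumed sizes and weighting).
import torch
import torch.nn as nn

LATENT = 32
sizes = {"node_A": 512, "node_B": 768}        # heterogeneous flattened model weights

encoders = nn.ModuleDict({k: nn.Linear(v, LATENT) for k, v in sizes.items()})
decoders = nn.ModuleDict({k: nn.Linear(LATENT, v) for k, v in sizes.items()})
opt = torch.optim.Adam(list(encoders.parameters()) + list(decoders.parameters()), 1e-3)

def pearson(a, b):
    a, b = a - a.mean(), b - b.mean()
    return (a * b).sum() / (a.norm() * b.norm() + 1e-8)

weights = {k: torch.randn(v) for k, v in sizes.items()}   # placeholder local models

for step in range(200):
    z = {k: encoders[k](w) for k, w in weights.items()}
    recon = {k: decoders[k](z[k]) for k in z}
    rec_loss = sum(nn.functional.mse_loss(recon[k], weights[k]) for k in recon)
    # Generate node_A's weights from node_B's latent code (the "unavailable" case)
    # and ask the result to correlate with node_A's real weights.
    gen_A = decoders["node_A"](z["node_B"])
    corr_loss = -pearson(gen_A, weights["node_A"])
    loss = rec_loss + 0.1 * corr_loss
    opt.zero_grad(); loss.backward(); opt.step()

print("reconstruction:", rec_loss.item(), "correlation:", -corr_loss.item())
```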

Applications of Machine Learning to Modelling and Analysing Dynamical Systems

  • paper_url: http://arxiv.org/abs/2308.03763
  • repo_url: None
  • paper_authors: Vedanta Thapar
  • for: Analyse nonlinear Hamiltonian dynamical systems with a first integral of motion using Physics Informed Neural Networks.
  • methods: An architecture combining existing Hamiltonian Neural Network structures into Adaptable Symplectic Recurrent Neural Networks, which preserve Hamilton's equations and the symplectic structure of phase space while predicting dynamics across the entire parameter space; Long Short-Term Memory networks are further used to implement Takens' embedding theorem and reconstruct dynamics from partial observations (see the sketch after this entry).
  • results: The architecture significantly outperforms previously proposed neural networks when predicting Hamiltonian dynamics, especially in potentials containing multiple parameters, and is robust on the nonlinear Henon-Heiles potential under chaotic, quasiperiodic and periodic conditions; the partial-information approach gives accurate long-term predictions for single-parameter potentials.
    Abstract We explore the use of Physics Informed Neural Networks to analyse nonlinear Hamiltonian Dynamical Systems with a first integral of motion. In this work, we propose an architecture which combines existing Hamiltonian Neural Network structures into Adaptable Symplectic Recurrent Neural Networks which preserve Hamilton's equations as well as the symplectic structure of phase space while predicting dynamics for the entire parameter space. This architecture is found to significantly outperform previously proposed neural networks when predicting Hamiltonian dynamics especially in potentials which contain multiple parameters. We demonstrate its robustness using the nonlinear Henon-Heiles potential under chaotic, quasiperiodic and periodic conditions. The second problem we tackle is whether we can use the high dimensional nonlinear capabilities of neural networks to predict the dynamics of a Hamiltonian system given only partial information of the same. Hence we attempt to take advantage of Long Short Term Memory networks to implement Takens' embedding theorem and construct a delay embedding of the system followed by mapping the topologically invariant attractor to the true form. This architecture is then layered with Adaptable Symplectic nets to allow for predictions which preserve the structure of Hamilton's equations. We show that this method works efficiently for single parameter potentials and provides accurate predictions even over long periods of time.
    摘要 我们探讨使用物理 Informed Neural Networks 分析非线性汉密尔顿动力系统,其具有一个动力的第一Integral of motion。在这项工作中,我们提议一种结合现有汉密尔顿神经网络结构的可适应 симплектиче Recurrent Neural Networks 结构,该结构保留汉密尔顿方程以及相对空间的 симплектиче结构,同时预测动力的整个参数空间。这种结构在预测汉密尔顿动力方面表现出了明显的优异,尤其是在含有多个参数的潜能中。我们通过使用非线性 Henon-Heiles 潜能函数进行了robustness测试,并在各种不同的conditions下进行了验证。第二个问题是可以使用高维非线性神经网络来预测汉密尔顿系统的动力,只要知道一部分系统的信息吗。因此,我们尝试使用 Long Short Term Memory 网络实现 Takens 嵌入定理,并将系统的延迟嵌入映射到真正的形式。然后,我们层加 Adaptable Symplectic nets 以使预测保留汉密尔顿方程的结构。我们发现这种方法可以高效地预测单参数潜能中的动力,并且可以在长时间内提供高度准确的预测。
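The sketch below shows the core mechanism in a simplified, non-recurrent form: a separable Hamiltonian H(q, p) = K(p) + V(q) is parameterized by two small MLPs and rolled out with a leapfrog step whose gradients come from autograd, so each step respects the symplectic structure. The network sizes, step size, per-step detaching, and the separability assumption are simplifications relative to the paper's adaptable symplectic recurrent architecture.

```python
# Minimal sketch: separable Hamiltonian MLPs rolled out with a leapfrog step.
import torch
import torch.nn as nn

class SeparableHamiltonian(nn.Module):
    """H(q, p) = K(p) + V(q), each term parameterized by a small MLP."""
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.K = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.V = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

def leapfrog_step(H, q, p, dt=0.01):
    """One symplectic leapfrog step; a training rollout would keep the graph."""
    q = q.detach().requires_grad_(True)
    p = p.detach().requires_grad_(True)
    dVdq = torch.autograd.grad(H.V(q).sum(), q)[0]
    p_half = p - 0.5 * dt * dVdq
    dKdp = torch.autograd.grad(H.K(p_half).sum(), p_half)[0]
    q_next = q + dt * dKdp
    dVdq_next = torch.autograd.grad(H.V(q_next).sum(), q_next)[0]
    p_next = p_half - 0.5 * dt * dVdq_next
    return q_next, p_next

H = SeparableHamiltonian()
q, p = torch.randn(8, 2), torch.randn(8, 2)   # batch of phase-space points
for _ in range(100):
    q, p = leapfrog_step(H, q, p)
print(q.shape, p.shape)
```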

A Vision for Cleaner Rivers: Harnessing Snapshot Hyperspectral Imaging to Detect Macro-Plastic Litter

  • paper_url: http://arxiv.org/abs/2307.12145
  • repo_url: https://github.com/river-lab/hyperspectral_macro_plastic_detection
  • paper_authors: Nathaniel Hanson, Ahmet Demirkaya, Deniz Erdoğmuş, Aron Stubbins, Taşkın Padır, Tales Imbiriba
  • for: Develop efficient, automated monitoring of mismanaged macro-plastic litter in rivers.
  • methods: Computational imaging for macro-plastic detection in river-like scenarios, using snapshot Visible-Shortwave Infrared hyperspectral imaging combined with machine learning classification to track partially submerged plastics in near real time (see the sketch after this entry).
  • results: Experiments indicate high detection accuracy even in challenging scenarios, especially when leveraging hyperspectral data and nonlinear classifiers.
    Abstract Plastic waste entering the riverine harms local ecosystems leading to negative ecological and economic impacts. Large parcels of plastic waste are transported from inland to oceans leading to a global scale problem of floating debris fields. In this context, efficient and automatized monitoring of mismanaged plastic waste is paramount. To address this problem, we analyze the feasibility of macro-plastic litter detection using computational imaging approaches in river-like scenarios. We enable near-real-time tracking of partially submerged plastics by using snapshot Visible-Shortwave Infrared hyperspectral imaging. Our experiments indicate that imaging strategies associated with machine learning classification approaches can lead to high detection accuracy even in challenging scenarios, especially when leveraging hyperspectral data and nonlinear classifiers. All code, data, and models are available online: https://github.com/RIVeR-Lab/hyperspectral_macro_plastic_detection.
    摘要 塑料废弃物进入河流环境会对当地生态系统造成负面影响,导致生态和经济问题。大量塑料废弃物从陆地传输到海洋,导致全球范围内漂浮垃圾场景。在这种情况下,高效和自动化的废弃塑料监测变得非常重要。为解决这个问题,我们分析了使用计算成像方法检测大型塑料废弃物的可能性。我们使用快照可见短波谱 hyperspectral成像进行近实时检测半潜水塑料。我们的实验表明,通过使用机器学习分类方法和非线性分类器,可以在具有挑战性的情况下实现高检测精度。所有代码、数据和模型都可以在 GitHub 上下载:https://github.com/RIVeR-Lab/hyperspectral_macro_plastic_detection。
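A minimal sketch of per-pixel hyperspectral classification: each pixel's spectrum is treated as a feature vector for a nonlinear classifier, and an image cube is labelled pixel-wise at inference. The band count, the synthetic "absorption dip" used to fake plastic spectra, and the random-forest choice are assumptions for illustration only.

```python
# Minimal sketch of per-pixel hyperspectral classification on synthetic spectra.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

BANDS = 64                                   # assumed VIS-SWIR bands per pixel
rng = np.random.default_rng(0)

# Placeholder spectra: 'plastic' pixels get a synthetic absorption dip.
water   = rng.normal(0.3, 0.05, size=(2000, BANDS))
plastic = rng.normal(0.3, 0.05, size=(2000, BANDS)); plastic[:, 40:48] -= 0.15
X = np.vstack([water, plastic])
y = np.array([0] * 2000 + [1] * 2000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("pixel accuracy:", clf.score(X_te, y_te))

# At inference, a hyperspectral cube (H, W, BANDS) is classified pixel-wise:
cube = rng.normal(0.3, 0.05, size=(32, 32, BANDS))
mask = clf.predict(cube.reshape(-1, BANDS)).reshape(32, 32)
print("detected plastic pixels:", int(mask.sum()))
```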

Emergence of Adaptive Circadian Rhythms in Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.12143
  • repo_url: https://github.com/aqeel13932/mn_project
  • paper_authors: Aqeel Labash, Florian Fletzer, Daniel Majoral, Raul Vicente
  • for: The paper explores the emergence of circadian-like rhythms in deep reinforcement learning agents.
  • methods: Agents are trained on a foraging task in an environment with a reliable periodic variation; their behavior during learning is systematically characterized, and the emergence of an endogenous rhythm is analyzed with bifurcation and phase response curve analyses.
  • results: The internal rhythm adapts to shifts in the phase of the environmental signal without any re-training; artificial neurons develop dynamics that support internalization of the environmental rhythm, with adaptation proceeding through the emergence of a stable periodic orbit whose phase response allows optimal phase synchronization between the agent's dynamics and the environmental rhythm.
    Abstract Adapting to regularities of the environment is critical for biological organisms to anticipate events and plan. A prominent example is the circadian rhythm corresponding to the internalization by organisms of the $24$-hour period of the Earth's rotation. In this work, we study the emergence of circadian-like rhythms in deep reinforcement learning agents. In particular, we deployed agents in an environment with a reliable periodic variation while solving a foraging task. We systematically characterize the agent's behavior during learning and demonstrate the emergence of a rhythm that is endogenous and entrainable. Interestingly, the internal rhythm adapts to shifts in the phase of the environmental signal without any re-training. Furthermore, we show via bifurcation and phase response curve analyses how artificial neurons develop dynamics to support the internalization of the environmental rhythm. From a dynamical systems view, we demonstrate that the adaptation proceeds by the emergence of a stable periodic orbit in the neuron dynamics with a phase response that allows an optimal phase synchronisation between the agent's dynamics and the environmental rhythm.
    摘要 适应环境的规律是生物体预测事件和规划的关键。一个明显的例子是生物体内部的 circadian 频率,即通过生物体内部内化地球的24小时转动周期。在这项工作中,我们研究了深度学习Agent中的 circadian-like 频率的出现。特别是,我们在一个可靠 periodic 变化的环境中部署了 Agent,并在寻食任务中学习。我们系统地描述了 Agent 的行为 durante 学习,并证明了 Agent 内部的频率可以自动适应环境的阶段偏移。此外,我们通过杂化和相对响应曲线分析表明,人工神经元发展了 dynamics 来支持内部化环境的频率。从动力系统视角来看,适应进程由神经元动力学中的稳定 periodic 轨迹的出现和相应的相位协调导致。

Unlocking Carbon Reduction Potential with Reinforcement Learning for the Three-Dimensional Loading Capacitated Vehicle Routing Problem

  • paper_url: http://arxiv.org/abs/2307.12136
  • repo_url: None
  • paper_authors: Stefan Schoepf, Stephen Mak, Julian Senoner, Liming Xu, Netland Torbjörn, Alexandra Brintrup
  • for: Increase the loading efficiency of heavy goods vehicles through collaborative vehicle routing and thereby reduce carbon emissions.
  • methods: A reinforcement learning model that solves the three-dimensional loading capacitated vehicle routing problem in approximately linear time.
  • results: The model performs within an average gap of 3.83% to 8.10% of established operations research methods while scaling favourably.
    Abstract Heavy goods vehicles are vital backbones of the supply chain delivery system but also contribute significantly to carbon emissions with only 60% loading efficiency in the United Kingdom. Collaborative vehicle routing has been proposed as a solution to increase efficiency, but challenges remain to make this a possibility. One key challenge is the efficient computation of viable solutions for co-loading and routing. Current operations research methods suffer from non-linear scaling with increasing problem size and are therefore bound to limited geographic areas to compute results in time for day-to-day operations. This only allows for local optima in routing and leaves global optimisation potential untouched. We develop a reinforcement learning model to solve the three-dimensional loading capacitated vehicle routing problem in approximately linear time. While this problem has been studied extensively in operations research, no publications on solving it with reinforcement learning exist. We demonstrate the favourable scaling of our reinforcement learning model and benchmark our routing performance against state-of-the-art methods. The model performs within an average gap of 3.83% to 8.10% compared to established methods. Our model not only represents a promising first step towards large-scale logistics optimisation with reinforcement learning but also lays the foundation for this research stream.
    摘要 To address this challenge, we have developed a reinforcement learning model to solve the three-dimensional loading capacitated vehicle routing problem in approximately linear time. This problem has been extensively studied in operations research, but no publications on solving it with reinforcement learning exist. We demonstrate the favourable scaling of our reinforcement learning model and benchmark our routing performance against state-of-the-art methods. Our model performs within an average gap of 3.83% to 8.10% compared to established methods.Our model not only represents a promising first step towards large-scale logistics optimization with reinforcement learning but also lays the foundation for this research stream. With the ability to efficiently compute viable solutions for co-loading and routing, we can significantly reduce carbon emissions from heavy goods vehicles and improve the overall efficiency of the supply chain delivery system.

The Sample Complexity of Multi-Distribution Learning for VC Classes

  • paper_url: http://arxiv.org/abs/2307.12135
  • repo_url: None
  • paper_authors: Pranjal Awasthi, Nika Haghtalab, Eric Zhao
  • for: Multi-distribution learning is a natural generalization of PAC learning to settings with multiple data distributions.
  • methods: The paper discusses recent progress on closing the gap between upper and lower bounds, and fundamental hurdles to the use of game dynamics in statistical learning.
  • results: For learning a VC dimension $d$ class on $k$ distributions, the best known sample complexity upper bound is $O(\epsilon^{-2} \ln(k)(d + k) + \min\{\epsilon^{-1} dk, \epsilon^{-4} \ln(k) d\})$, while the best known lower bound is $\Omega(\epsilon^{-2}(d + k \ln(k)))$.
    Abstract Multi-distribution learning is a natural generalization of PAC learning to settings with multiple data distributions. There remains a significant gap between the known upper and lower bounds for PAC-learnable classes. In particular, though we understand the sample complexity of learning a VC dimension d class on $k$ distributions to be $O(\epsilon^{-2} \ln(k)(d + k) + \min\{\epsilon^{-1} dk, \epsilon^{-4} \ln(k) d\})$, the best lower bound is $\Omega(\epsilon^{-2}(d + k \ln(k)))$. We discuss recent progress on this problem and some hurdles that are fundamental to the use of game dynamics in statistical learning.
    摘要 多分布学习是自然推广PAC学习的设置中的多个数据分布的一种自然推广。现存在较大的知识上下文和下界之间的差距。具体来说,虽然我们理解了VC阶数d在k个分布上学习的样本复杂度为O(ε^-2 \* ln(k) (d + k) + MIN(ε^-1 dk, ε^-4 \* ln(k) d)),但最好的下界是Ω(ε^-2 (d + k \* ln(k)))。我们讨论了这个问题的最新进展和使用游戏动力学在统计学习中的核心障碍。

AI on the Road: A Comprehensive Analysis of Traffic Accidents and Accident Detection System in Smart Cities

  • paper_url: http://arxiv.org/abs/2307.12128
  • repo_url: None
  • paper_authors: Victor Adewopo, Nelly Elsayed, Zag Elsayed, Murat Ozer, Victoria Wangia-Anderson, Ahmed Abdelgawad
  • for: Improve traffic management and reduce traffic accidents by analysing accident data across US regions and proposing an accident detection and response framework based on traffic surveillance cameras and action recognition.
  • methods: Analysis of traffic accidents using the National Highway Traffic Safety Administration (NHTSA) Crash Report Sampling System (CRSS) data, combined with a proposed framework that uses traffic surveillance cameras and machine learning-based action recognition to detect and respond to accidents spontaneously and integrates with emergency services.
  • results: The study provides insights into traffic accidents across US regions and presents a practical framework intended to reduce accident frequency and severity and to improve the safety and efficiency of transportation systems.
    Abstract Accident detection and traffic analysis is a critical component of smart city and autonomous transportation systems that can reduce accident frequency, severity and improve overall traffic management. This paper presents a comprehensive analysis of traffic accidents in different regions across the United States using data from the National Highway Traffic Safety Administration (NHTSA) Crash Report Sampling System (CRSS). To address the challenges of accident detection and traffic analysis, this paper proposes a framework that uses traffic surveillance cameras and action recognition systems to detect and respond to traffic accidents spontaneously. Integrating the proposed framework with emergency services will harness the power of traffic cameras and machine learning algorithms to create an efficient solution for responding to traffic accidents and reducing human errors. Advanced intelligence technologies, such as the proposed accident detection systems in smart cities, will improve traffic management and traffic accident severity. Overall, this study provides valuable insights into traffic accidents in the US and presents a practical solution to enhance the safety and efficiency of transportation systems.
    摘要 智能城市和自动交通系统中的事故探测和交通分析是一个关键组成部分,可以降低事故频率、严重程度并改善总体交通管理。这篇论文对美国各地的交通事故进行了全面的分析,使用国家公路安全管理局(NHTSA)的事故报告采样系统(CRSS)的数据。为了解决事故探测和交通分析的挑战,该论文提出了一个框架,使用交通监控摄像头和动作认知系统来自动探测和应对交通事故。将该框架与紧急服务集成,可以利用交通摄像头和机器学习算法创造一种高效的交通事故应对解决方案,减少人类错误。高级智能技术,如智能城市中的事故探测系统,将改善交通管理和交通事故严重程度。总的来说,这篇研究提供了美国交通事故的有价值的视角,并提出了实用的解决方案,以提高交通系统的安全和效率。

Synthesis of Batik Motifs using a Diffusion – Generative Adversarial Network

  • paper_url: http://arxiv.org/abs/2307.12122
  • repo_url: https://github.com/octadion/diffusion-stylegan2-ada-pytorch
  • paper_authors: One Octadion, Novanto Yudistira, Diva Kurnianingtyas
  • for: assist batik designers or craftsmen in producing unique and quality batik motifs with efficient production time and costs.
  • methods: using StyleGAN2-Ada and Diffusion techniques to produce realistic and high-quality synthetic batik patterns, with adjustments to the model architecture and a well-curated batik dataset.
  • results: capable of producing authentic and quality batik patterns, with finer details and rich artistic variations.
    Abstract Batik, a unique blend of art and craftsmanship, is a distinct artistic and technological creation for Indonesian society. Research on batik motifs is primarily focused on classification. However, further studies may extend to the synthesis of batik patterns. Generative Adversarial Networks (GANs) have been an important deep learning model for generating synthetic data, but often face challenges in the stability and consistency of results. This research focuses on the use of StyleGAN2-Ada and Diffusion techniques to produce realistic and high-quality synthetic batik patterns. StyleGAN2-Ada is a variation of the GAN model that separates the style and content aspects in an image, whereas diffusion techniques introduce random noise into the data. In the context of batik, StyleGAN2-Ada and Diffusion are used to produce realistic synthetic batik patterns. This study also made adjustments to the model architecture and used a well-curated batik dataset. The main goal is to assist batik designers or craftsmen in producing unique and quality batik motifs with efficient production time and costs. Based on qualitative and quantitative evaluations, the results show that the model tested is capable of producing authentic and quality batik patterns, with finer details and rich artistic variations. The dataset and code can be accessed here:https://github.com/octadion/diffusion-stylegan2-ada-pytorch
    摘要 《独特的抽象艺术》——巴迪克的研究巴迪克是印度尼西亚社会独特的艺术和手工艺术品。研究巴迪克图案主要集中在分类方面,但可能会扩展到synthesize batik patterns。生成对抗网络(GANs)是深度学习模型,可以生成 sintetic data,但经常面临稳定性和一致性的挑战。本研究使用StyleGAN2-Ada和扩散技术生成高质量和真实的 sintetic batik patterns。StyleGAN2-Ada分离图像中的风格和内容两个方面,而扩散技术引入随机噪音。在batik中,StyleGAN2-Ada和扩散被用来生成真实的 sintetic batik patterns。本研究还对模型结构进行了调整,使用了高质量的batik dataset。主要目标是帮助batik设计师或手工艺术家生成独特和高质量的batik图案,以及减少生产时间和成本。根据质量和量的评估,研究结果表明模型能够生成authentic和高质量的batik patterns,具有细节和艺术变化。数据集和代码可以在以下链接获取:https://github.com/octadion/diffusion-stylegan2-ada-pytorch