cs.LG - 2023-10-19

A Deep Learning Analysis of Climate Change, Innovation, and Uncertainty

  • paper_url: http://arxiv.org/abs/2310.13200
  • repo_url: None
  • paper_authors: Michael Barnett, William Brock, Lars Peter Hansen, Ruimeng Hu, Joseph Huang
  • for: This paper studies the implications of model uncertainty in a climate-economics framework.
  • methods: The paper implements a neural-network-based global solution method to solve a high-dimensional, non-linear model.
  • results: The study finds first-order impacts of model uncertainty on optimal decisions and social valuations; accounting for climate dynamics, economic damages from climate change, and the arrival of green technological change leads to significant adjustments to investment in different capital types in anticipation of technological change and the revelation of climate damage severity.
    Abstract We study the implications of model uncertainty in a climate-economics framework with three types of capital: "dirty" capital that produces carbon emissions when used for production, "clean" capital that generates no emissions but is initially less productive than dirty capital, and knowledge capital that increases with R&D investment and leads to technological innovation in green sector productivity. To solve our high-dimensional, non-linear model framework we implement a neural-network-based global solution method. We show there are first-order impacts of model uncertainty on optimal decisions and social valuations in our integrated climate-economic-innovation framework. Accounting for interconnected uncertainty over climate dynamics, economic damages from climate change, and the arrival of a green technological change leads to substantial adjustments to investment in the different capital types in anticipation of technological change and the revelation of climate damage severity.

Heterogeneous Graph Neural Networks for Data-driven Traffic Assignment

  • paper_url: http://arxiv.org/abs/2310.13193
  • repo_url: None
  • paper_authors: Tong Liu, Hadi Meidani
  • for: This paper proposes a novel data-driven approach to traffic assignment, enabling better analysis and management of a wide range of transportation systems.
  • methods: The paper uses a heterogeneous graph neural network that captures spatial traffic patterns across different links (a toy message-passing sketch follows this entry).
  • results: Numerical experiments show that the model converges quickly, reduces training loss, and achieves high accuracy in traffic flow prediction. The model also generalizes to different network topologies.
    Abstract The traffic assignment problem is one of the significant components of traffic flow analysis for which various solution approaches have been proposed. However, deploying these approaches for large-scale networks poses significant challenges. In this paper, we leverage the power of heterogeneous graph neural networks to propose a novel data-driven approach for traffic assignment and traffic flow learning. The proposed model is capable of capturing spatial traffic patterns across different links, yielding highly accurate results. We present numerical experiments on urban transportation networks and show that the proposed heterogeneous graph neural network model outperforms other conventional neural network models in terms of convergence rate, training loss, and prediction accuracy. Notably, the proposed heterogeneous graph neural network model can also be generalized to different network topologies. This approach offers a promising solution for complex traffic flow analysis and prediction, enhancing our understanding and management of a wide range of transportation systems.
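
The abstract describes message passing that treats different link types separately. Below is a minimal, self-contained PyTorch sketch of that general pattern, not the paper's architecture: the layer sizes, link-type names, and sum aggregation are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HeteroGNNLayer(nn.Module):
    """One message-passing layer that keeps separate weights per link type,
    then sums the per-type aggregated messages (a common hetero-GNN pattern)."""
    def __init__(self, dim, link_types):
        super().__init__()
        self.msg = nn.ModuleDict({t: nn.Linear(dim, dim) for t in link_types})
        self.update = nn.Linear(dim, dim)

    def forward(self, x, edges_by_type):
        # x: (num_nodes, dim); edges_by_type: {type: (src_idx, dst_idx) LongTensors}
        agg = torch.zeros_like(x)
        for t, (src, dst) in edges_by_type.items():
            m = self.msg[t](x[src])          # per-link-type message
            agg.index_add_(0, dst, m)        # sum incoming messages at receivers
        return torch.relu(self.update(x + agg))

# toy usage: 4 nodes, two hypothetical link types
x = torch.randn(4, 16)
edges = {"highway": (torch.tensor([0, 1]), torch.tensor([1, 2])),
         "arterial": (torch.tensor([2]), torch.tensor([3]))}
layer = HeteroGNNLayer(16, list(edges.keys()))
out = layer(x, edges)   # (4, 16) updated node embeddings
```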

Almost Equivariance via Lie Algebra Convolutions

  • paper_url: http://arxiv.org/abs/2310.13164
  • repo_url: None
  • paper_authors: Daniel McNeela
  • for: This paper studies model equivariance with respect to group actions. Because enforcing strict equivariance can cause models to underperform on real-world data, the paper focuses on the closely related topic of almost equivariance (a generic formulation is sketched after the abstract).
  • methods: The authors propose a new definition of almost equivariance and a practical method for encoding it in models by appealing to the Lie algebra of a Lie group. They define Lie algebra convolutions, relate equivariance and isometry to almost equivariance and almost isometry, and prove existence theorems for almost equivariant manifold embeddings.
  • results: Benchmarks on datasets in fully equivariant and almost equivariant settings validate the approach, showing that almost equivariance can improve model performance without requiring strict equivariance.
    Abstract Recently, the equivariance of models with respect to a group action has become an important topic of research in machine learning. However, imbuing an architecture with a specific group equivariance imposes a strong prior on the types of data transformations that the model expects to see. While strictly-equivariant models enforce symmetries, real-world data does not always conform to such strict equivariances, be it due to noise in the data or underlying physical laws that encode only approximate or partial symmetries. In such cases, the prior of strict equivariance can actually prove too strong and cause models to underperform on real-world data. Therefore, in this work we study a closely related topic, that of almost equivariance. We provide a definition of almost equivariance that differs from those extant in the current literature and give a practical method for encoding almost equivariance in models by appealing to the Lie algebra of a Lie group. Specifically, we define Lie algebra convolutions and demonstrate that they offer several benefits over Lie group convolutions, including being well-defined for non-compact groups. From there, we pivot to the realm of theory and demonstrate connections between the notions of equivariance and isometry and those of almost equivariance and almost isometry, respectively. We prove two existence theorems, one showing the existence of almost isometries within bounded distance of isometries of a general manifold, and another showing the converse for Hilbert spaces. We then extend these theorems to prove the existence of almost equivariant manifold embeddings within bounded distance of fully equivariant embedding functions, subject to certain constraints on the group action and the function class. Finally, we demonstrate the validity of our approach by benchmarking against datasets in fully equivariant and almost equivariant settings.
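
For intuition only, here is a generic way almost equivariance is often formalized; the paper proposes its own definition, which it says differs from those extant in the literature, so treat this purely as background notation:

```latex
% f : X -> Y is \epsilon-almost equivariant with respect to a group G
% acting on X and Y if the equivariance defect is uniformly small:
\| f(g \cdot x) - g \cdot f(x) \| \le \epsilon
\quad \text{for all } g \in G,\ x \in X,
% with strict equivariance recovered as the special case \epsilon = 0.
```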

Graph Neural Networks with polynomial activations have limited expressivity

  • paper_url: http://arxiv.org/abs/2310.13139
  • repo_url: None
  • paper_authors: Sammy Khalife
  • for: This paper studies the expressivity of graph neural networks (GNNs) for representing logical queries.
  • methods: The paper characterizes the expressive power of GNNs via the two-variable fragment of graded modal logic (GC2).
  • results: The paper shows that GNNs with polynomial activation functions cannot express GC2 queries. This establishes a hierarchy of logics expressible by GNNs depending on the chosen activation function, and answers an open question posed by [Grohe, 2021].
    Abstract The expressivity of Graph Neural Networks (GNNs) can be entirely characterized by appropriate fragments of the first order logic. Namely, any query of the two variable fragment of graded modal logic (GC2) interpreted over labelled graphs can be expressed using a GNN whose size depends only on the depth of the query. As pointed out by [Barcelo et al., 2020; Grohe, 2021], this description holds for a family of activation functions, leaving the possibility of a hierarchy of logics expressible by GNNs depending on the chosen activation function. In this article, we show that such a hierarchy indeed exists by proving that GC2 queries cannot be expressed by GNNs with polynomial activation functions. This implies a separation between polynomial and popular non-polynomial activations (such as ReLU, sigmoid, hyperbolic tangent, and others) and answers an open question formulated by [Grohe, 2021].

Mean Estimation Under Heterogeneous Privacy Demands

  • paper_url: http://arxiv.org/abs/2310.13137
  • repo_url: None
  • paper_authors: Syomantak Chaudhuri, Konstantin Miagkov, Thomas A. Courtade
  • for: This work considers heterogeneous user privacy preferences within the differential privacy (DP) framework.
  • methods: We propose a mean estimation algorithm in which each user can impose their own distinct privacy level; the algorithm is minimax optimal and has near-linear run-time (a naive baseline illustrating heterogeneous noise levels follows this entry).
  • results: Our results reveal a saturation phenomenon: the privacy requirements of the most stringent users dictate the overall error rate, so users with less stringent requirements are granted more privacy than they demand, for free and in equal amounts, without any sacrifice in estimator performance.
    Abstract Differential Privacy (DP) is a well-established framework to quantify privacy loss incurred by any algorithm. Traditional formulations impose a uniform privacy requirement for all users, which is often inconsistent with real-world scenarios in which users dictate their privacy preferences individually. This work considers the problem of mean estimation, where each user can impose their own distinct privacy level. The algorithm we propose is shown to be minimax optimal and has a near-linear run-time. Our results elicit an interesting saturation phenomenon that occurs. Namely, the privacy requirements of the most stringent users dictate the overall error rates. As a consequence, users with less but differing privacy requirements are all given more privacy than they require, in equal amounts. In other words, these privacy-indifferent users are given a nontrivial degree of privacy for free, without any sacrifice in the performance of the estimator.
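
To make the setting concrete, here is a naive heterogeneous-DP baseline in Python, not the paper's minimax-optimal algorithm: each user adds Laplace noise calibrated to their own epsilon, and the server combines reports with inverse-variance weights. Names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def hetero_dp_mean(values, epsilons, lo=0.0, hi=1.0):
    """Naive baseline (NOT the paper's algorithm): each user perturbs their
    bounded value with Laplace noise at their own privacy level epsilon_i,
    and the server averages with inverse-variance weights."""
    x = np.clip(values, lo, hi)
    sens = hi - lo                                   # per-user sensitivity
    scale = sens / np.asarray(epsilons)
    noisy = x + rng.laplace(scale=scale)             # per-user Laplace mechanism
    var = 2.0 * scale ** 2                           # Laplace variance = 2 b^2
    w = 1.0 / var
    return float(np.sum(w * noisy) / w.sum())

# toy usage: strict users (eps = 0.1) alongside lenient users (eps = 5.0)
vals = rng.uniform(size=100)
eps = np.array([0.1] * 50 + [5.0] * 50)
print(hetero_dp_mean(vals, eps))
```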

Approaches for Uncertainty Quantification of AI-predicted Material Properties: A Comparison

  • paper_url: http://arxiv.org/abs/2310.13136
  • repo_url: None
  • paper_authors: Francesca Tavazza, Kamal Choudhary, Brian DeCost
  • for: This paper investigates the individual uncertainty of machine-learning predictions of material properties, comparing three easy-to-implement approaches (a small sketch of two of them follows this entry).
  • methods: The three approaches for determining prediction intervals are the Quantile approach, direct machine learning of the prediction intervals, and Ensemble methods.
  • results: The comparison, spanning ten ML quantities covering energetic, mechanical, electronic, optical, and spectral properties, finds that the Quantile and Ensemble approaches better capture prediction uncertainty, while direct machine learning of the prediction intervals performs worse.
    Abstract The development of large databases of material properties, together with the availability of powerful computers, has allowed machine learning (ML) modeling to become a widely used tool for predicting material performances. While confidence intervals are commonly reported for such ML models, prediction intervals, i.e., the uncertainty on each prediction, are not as frequently available. Here, we investigate three easy-to-implement approaches to determine such individual uncertainty, comparing them across ten ML quantities spanning energetics, mechanical, electronic, optical, and spectral properties. Specifically, we focused on the Quantile approach, the direct machine learning of the prediction intervals and Ensemble methods.
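
As a hedged illustration of two of the three compared approaches (Ensembles and the Quantile approach), here is a small scikit-learn sketch on synthetic data; the paper's models, target properties, and datasets differ.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=500)

# Ensemble approach: bootstrap-trained models; per-point spread ~ uncertainty.
preds = []
for seed in range(10):
    idx = rng.integers(0, len(X), len(X))            # bootstrap resample
    m = GradientBoostingRegressor(random_state=seed).fit(X[idx], y[idx])
    preds.append(m.predict(X))
preds = np.stack(preds)
mean, std = preds.mean(0), preds.std(0)              # per-prediction uncertainty

# Quantile approach: fit models for the 5th / 95th percentiles directly.
lo = GradientBoostingRegressor(loss="quantile", alpha=0.05).fit(X, y).predict(X)
hi = GradientBoostingRegressor(loss="quantile", alpha=0.95).fit(X, y).predict(X)
print("ensemble band width:", (2 * std).mean(), "quantile band width:", (hi - lo).mean())
```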

Fuel Consumption Prediction for a Passenger Ferry using Machine Learning and In-service Data: A Comparative Study

  • paper_url: http://arxiv.org/abs/2310.13123
  • repo_url: https://github.com/pagand/model_optimze_vessel
  • paper_authors: Pedram Agand, Allison Kennedy, Trevor Harris, Chanwoo Bae, Mo Chen, Edward J Park
  • for: As the importance of eco-friendly transportation increases, providing an efficient approach for marine vessel operation is essential.
  • methods: The paper predicts fuel consumption from in-service operational data and weather conditions, using statistical and domain-knowledge methods to select the proper input variables; this prevents over-fitting, missing data, and multicollinearity while keeping the models practically applicable (an illustrative XGBoost sketch follows this entry).
  • results: Among the investigated models, the one built with the XGBoost technique, a boosting ensemble approach, achieved the best predictive performance, which can help improve the energy efficiency of maritime transport.
    Abstract As the importance of eco-friendly transportation increases, providing an efficient approach for marine vessel operation is essential. Methods for status monitoring with consideration to the weather condition and forecasting with the use of in-service data from ships require accurate and complete models for predicting the energy efficiency of a ship. The models need to effectively process all the operational data in real-time. This paper presents models that can predict fuel consumption using in-service data collected from a passenger ship. Statistical and domain-knowledge methods were used to select the proper input variables for the models. These methods prevent over-fitting, missing data, and multicollinearity while providing practical applicability. Prediction models that were investigated include multiple linear regression (MLR), decision tree approach (DT), an artificial neural network (ANN), and ensemble methods. The best predictive performance was from a model developed using the XGboost technique which is a boosting ensemble approach. Our code is available on GitHub at https://github.com/pagand/model_optimze_vessel/tree/OE for future research.
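
A minimal sketch of the winning model family: an XGBoost regressor on hypothetical in-service features. The feature names, synthetic data, and hyperparameters below are illustrative assumptions, not the paper's variable selection.

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Hypothetical feature set mimicking in-service ship data (names illustrative).
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.uniform(5, 20, n),      # speed over ground (kn)
    rng.uniform(0, 360, n),     # heading (deg)
    rng.uniform(0, 15, n),      # wind speed (m/s)
    rng.uniform(0, 3, n),       # wave height (m)
])
fuel = 0.02 * X[:, 0] ** 3 + 0.5 * X[:, 2] + rng.normal(scale=2, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, fuel, random_state=0)
model = xgb.XGBRegressor(n_estimators=300, max_depth=5, learning_rate=0.05)
model.fit(X_tr, y_tr)
print("R^2:", r2_score(y_te, model.predict(X_te)))
```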

SRAI: Towards Standardization of Geospatial AI

  • paper_url: http://arxiv.org/abs/2310.13098
  • repo_url: https://github.com/kraina-ai/srai
  • paper_authors: Piotr Gramacki, Kacper Leśniara, Kamil Raczycki, Szymon Woźniak, Marcin Przymus, Piotr Szymański
  • for: This work addresses geospatial data processing methods within artificial intelligence (AI).
  • methods: The work presents the Spatial Representations for Artificial Intelligence (srai) Python library, which can download geospatial data, split a given area into micro-regions using multiple algorithms, and train embedding models using various architectures.
  • results: The library enables a complete pipeline for geospatial task solving and is a first step toward standardizing the geospatial AI domain toolset.
    Abstract Spatial Representations for Artificial Intelligence (srai) is a Python library for working with geospatial data. The library can download geospatial data, split a given area into micro-regions using multiple algorithms and train an embedding model using various architectures. It includes baseline models as well as more complex methods from published works. Those capabilities make it possible to use srai in a complete pipeline for geospatial task solving. The proposed library is the first step to standardize the geospatial AI domain toolset. It is fully open-source and published under Apache 2.0 licence.

A Multi-Stage Temporal Convolutional Network for Volleyball Jumps Classification Using a Waist-Mounted IMU

  • paper_url: http://arxiv.org/abs/2310.13097
  • repo_url: None
  • paper_authors: Meng Shang, Camilla De Bleecker, Jos Vanrenterghem, Roel De Ridder, Sabine Verschueren, Carolina Varon, Walter De Raedt, Bart Vanrumste
  • for: This study develops an unobtrusive system that uses a single waist-mounted inertial measurement unit (IMU) to recognize the types of jumps volleyball players perform during training or matches.
  • methods: The study applies a multi-stage temporal convolutional network (MS-TCN) for sample-wise classification (a toy dilated-convolution sketch follows this entry).
  • results: The single-IMU MS-TCN model recognized volleyball jump types accurately and at lower computational cost than a state-of-the-art deep learning model, performing well both in a lab session with ten players and in four training sessions with twenty-six players.
    Abstract Monitoring the number of jumps for volleyball players during training or a match can be crucial to prevent injuries, yet the measurement requires considerable workload and cost using traditional methods such as video analysis. Also, existing methods do not provide accurate differentiation between different types of jumps. In this study, an unobtrusive system with a single inertial measurement unit (IMU) on the waist was proposed to recognize the types of volleyball jumps. A Multi-Layer Temporal Convolutional Network (MS-TCN) was applied for sample-wise classification. The model was evaluated on ten volleyball players and twenty-six volleyball players, during a lab session with a fixed protocol of jumping and landing tasks, and during four volleyball training sessions, respectively. The MS-TCN model achieved better performance than a state-of-the-art deep learning model but with lower computational cost. In the lab sessions, most jump counts showed small differences between the predicted jumps and video-annotated jumps, with an overall count showing a Limit of Agreement (LoA) of 0.1+-3.40 (r=0.884). For comparison, the proposed algorithm showed slightly worse results than VERT (a commercial jumping assessment device) with a LoA of 0.1+-2.08 (r=0.955) but the differences were still within a comparable range. In the training sessions, the recognition of three types of jumps exhibited a mean difference from observation of less than 10 jumps: block, smash, and overhead serve. These results showed the potential of using a single IMU to recognize the types of volleyball jumps. The sample-wise architecture provided high resolution of recognition and the MS-TCN required fewer parameters to train compared with state-of-the-art models.
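
For intuition about sample-wise (per-time-step) temporal convolutional classification, here is a toy single-stage dilated TCN in PyTorch; MS-TCN stacks several such stages, and the channel counts, sampling rate, and class set below are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class DilatedTCNStage(nn.Module):
    """One stage of residual dilated 1-D convolutions producing a class
    prediction at every time step, in the spirit of MS-TCN."""
    def __init__(self, in_ch, hid, n_classes, n_layers=6):
        super().__init__()
        self.inp = nn.Conv1d(in_ch, hid, 1)
        self.layers = nn.ModuleList([
            nn.Conv1d(hid, hid, 3, padding=2 ** i, dilation=2 ** i)
            for i in range(n_layers)           # receptive field grows exponentially
        ])
        self.out = nn.Conv1d(hid, n_classes, 1)

    def forward(self, x):                      # x: (batch, channels, time)
        h = self.inp(x)
        for conv in self.layers:
            h = h + torch.relu(conv(h))        # residual dilated block
        return self.out(h)                     # (batch, n_classes, time)

# toy usage: 6-axis IMU (acc + gyro), 4 hypothetical jump classes + background
imu = torch.randn(2, 6, 500)
stage = DilatedTCNStage(in_ch=6, hid=64, n_classes=5)
logits = stage(imu)                            # per-time-step logits: (2, 5, 500)
```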

Sequence Length Independent Norm-Based Generalization Bounds for Transformers

  • paper_url: http://arxiv.org/abs/2310.13088
  • repo_url: https://github.com/traugerjacob/transformer-gen-bounds
  • paper_authors: Jacob Trauger, Ambuj Tewari
  • for: This paper provides norm-based generalization bounds for the Transformer architecture that do not depend on the input sequence length.
  • methods: We employ a covering number based approach to prove our bounds, using three novel covering number upper bounds for the function class of bounded linear transformations to upper bound the Rademacher complexity of the Transformer.
  • results: We show the generalization bound applies to the common Transformer training technique of masking and then predicting the masked word, and a simulated study on a sparse majority dataset empirically validates our theoretical findings.
    Abstract This paper provides norm-based generalization bounds for the Transformer architecture that do not depend on the input sequence length. We employ a covering number based approach to prove our bounds. We use three novel covering number bounds for the function class of bounded linear transformations to upper bound the Rademacher complexity of the Transformer. Furthermore, we show this generalization bound applies to the common Transformer training technique of masking and then predicting the masked word. We also run a simulated study on a sparse majority data set that empirically validates our theoretical findings.

How Can Everyday Users Efficiently Teach Robots by Demonstrations?

  • paper_url: http://arxiv.org/abs/2310.13083
  • repo_url: None
  • paper_authors: Maram Sakr, Zhikai Zhang, Benjamin Li, Haomiao Zhang, H. F. Machiel Van der Loos, Dana Kulic, Elizabeth Croft
  • for: This study aims to improve robot learning efficiency and to help robots generalize better to task variations.
  • methods: The study uses an uncertainty measure, task-related information entropy, as a criterion for suggesting informative demonstration examples to human teachers (a toy entropy computation follows this entry).
  • results: Demonstrations suggested by the proposed criterion improved robot learning efficiency by up to 198% on a novel task, and the approach improved robot learning efficiency by 210% compared to a state-of-the-art heuristic rule.
    Abstract Learning from Demonstration (LfD) is a framework that allows lay users to easily program robots. However, the efficiency of robot learning and the robot's ability to generalize to task variations hinges upon the quality and quantity of the provided demonstrations. Our objective is to guide human teachers to furnish more effective demonstrations, thus facilitating efficient robot learning. To achieve this, we propose to use a measure of uncertainty, namely task-related information entropy, as a criterion for suggesting informative demonstration examples to human teachers to improve their teaching skills. In a conducted experiment (N=24), an augmented reality (AR)-based guidance system was employed to train novice users to produce additional demonstrations from areas with the highest entropy within the workspace. These novice users were trained for a few trials to teach the robot a generalizable task using a limited number of demonstrations. Subsequently, the users' performance after training was assessed first on the same task (retention) and then on a novel task (transfer) without guidance. The results indicated a substantial improvement in robot learning efficiency from the teacher's demonstrations, with an improvement of up to 198% observed on the novel task. Furthermore, the proposed approach was compared to a state-of-the-art heuristic rule and found to improve robot learning efficiency by 210% compared to the heuristic rule.
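
A toy illustration of the entropy criterion: score workspace regions by the Shannon entropy of outcomes observed there and suggest demonstrating where entropy is highest. The grid, outcome labels, and counts are hypothetical; the paper's task-related information entropy is defined over its own task model.

```python
import numpy as np

def cell_entropy(label_counts):
    """Shannon entropy of the outcome distribution observed in one workspace cell."""
    p = label_counts / label_counts.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Hypothetical 3x3 workspace grid; counts of two task outcomes seen per cell.
grid_counts = np.array([
    [[9, 1], [5, 5], [8, 2]],
    [[2, 8], [4, 6], [10, 0]],
    [[6, 4], [1, 9], [7, 3]],
])  # shape (rows, cols, n_outcome_labels)

H = np.array([[cell_entropy(grid_counts[r, c]) for c in range(3)] for r in range(3)])
target = np.unravel_index(np.argmax(H), H.shape)
print("suggest demonstrating in cell", target, "entropy =", H[target])
```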

On the Computational Complexities of Complex-valued Neural Networks

  • paper_url: http://arxiv.org/abs/2310.13075
  • repo_url: None
  • paper_authors: Kayol Soares Mayer, Jonathan Aguiar Soares, Ariadne Arrais Cruz, Dalton Soares Arantes
  • for: This paper studies the computational complexities of complex-valued neural networks (CVNNs).
  • methods: CVNNs are nonlinear filters for digital signal processing of complex-domain data; unlike real-valued neural networks (RVNNs), they can directly handle complex-valued input and output signals thanks to their complex-domain parameters and activation functions.
  • results: The paper presents both quantitative and asymptotic computational complexity analyses of CVNNs, expressed in terms of the number of real-valued multiplications, the most demanding operations (a worked multiplication count follows this entry). It also investigates the computational complexities of CVNNs discussed in studies from the literature.
    Abstract Complex-valued neural networks (CVNNs) are nonlinear filters used in the digital signal processing of complex-domain data. Compared with real-valued neural networks~(RVNNs), CVNNs can directly handle complex-valued input and output signals due to their complex domain parameters and activation functions. With the trend toward low-power systems, computational complexity analysis has become essential for measuring an algorithm's power consumption. Therefore, this paper presents both the quantitative and asymptotic computational complexities of CVNNs. This is a crucial tool in deciding which algorithm to implement. The mathematical operations are described in terms of the number of real-valued multiplications, as these are the most demanding operations. To determine which CVNN can be implemented in a low-power system, quantitative computational complexities can be used to accurately estimate the number of floating-point operations. We have also investigated the computational complexities of CVNNs discussed in some studies presented in the literature.
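
A worked count behind measuring cost in real-valued multiplications (standard arithmetic facts, not the paper's full analysis):

```latex
% A single complex product uses 4 real multiplications,
(a + bi)(c + di) = (ac - bd) + (ad + bc)i,
% or 3 with Gauss's trick: compute ac, bd, and (a+b)(c+d),
% since ad + bc = (a+b)(c+d) - ac - bd.
% Hence a fully connected complex layer with n inputs and m outputs costs
% about 4nm real multiplications, versus nm for a real-valued layer.
```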

To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets

  • paper_url: http://arxiv.org/abs/2310.13061
  • repo_url: https://github.com/d-doshi/Grokking
  • paper_authors: Darshil Doshi, Aritra Das, Tianyu He, Andrey Gromov
  • for: This paper aims to address the challenge of robust generalization in deep learning, specifically when the number of trainable parameters is very large.
  • methods: The authors use two-layer neural networks trained on modular arithmetic tasks with corrupted labels, and study the effect of regularization methods such as weight decay, dropout, and BatchNorm on the network’s ability to generalize.
  • results: The authors show that regularization methods can force the network to ignore corrupted data during optimization, achieving $100\%$ accuracy on the uncorrupted dataset. They also demonstrate that the effect of these regularization methods is interpretable, and that the training dynamics involve two consecutive stages: first, the network undergoes the “grokking” dynamics reaching high train and test accuracy, and second, it unlearns the memorizing representations (a toy setup sketch follows the abstract).
    Abstract Robust generalization is a major challenge in deep learning, particularly when the number of trainable parameters is very large. In general, it is very difficult to know if the network has memorized a particular set of examples or understood the underlying rule (or both). Motivated by this challenge, we study an interpretable model where generalizing representations are understood analytically, and are easily distinguishable from the memorizing ones. Namely, we consider two-layer neural networks trained on modular arithmetic tasks where ($\xi \cdot 100\%$) of labels are corrupted (i.e. some results of the modular operations in the training set are incorrect). We show that (i) it is possible for the network to memorize the corrupted labels and achieve $100\%$ generalization at the same time; (ii) the memorizing neurons can be identified and pruned, lowering the accuracy on corrupted data and improving the accuracy on uncorrupted data; (iii) regularization methods such as weight decay, dropout and BatchNorm force the network to ignore the corrupted data during optimization, and achieve $100\%$ accuracy on the uncorrupted dataset; and (iv) the effect of these regularization methods is ("mechanistically") interpretable: weight decay and dropout force all the neurons to learn generalizing representations, while BatchNorm de-amplifies the output of memorizing neurons and amplifies the output of the generalizing ones. Finally, we show that in the presence of regularization, the training dynamics involves two consecutive stages: first, the network undergoes the grokking dynamics reaching high train and test accuracy; second, it unlearns the memorizing representations, where train accuracy suddenly jumps from $100\%$ to $100 (1-\xi)\%$.
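
A minimal sketch of the experimental setting the abstract describes: a two-layer network on modular addition with a corrupted-label fraction, trained with weight decay (one of the regularizers studied). Hyperparameters and training length here are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

p, corrupt_frac = 97, 0.2
torch.manual_seed(0)

# Modular addition dataset (a + b) mod p, with a fraction of labels corrupted.
a = torch.arange(p).repeat_interleave(p)
b = torch.arange(p).repeat(p)
y = (a + b) % p
flip = torch.rand(len(y)) < corrupt_frac
y[flip] = torch.randint(0, p, (int(flip.sum()),))    # corrupted labels

# Two-layer network on one-hot input pairs, as in the setting studied.
X = torch.cat([nn.functional.one_hot(a, p),
               nn.functional.one_hot(b, p)], dim=1).float()
model = nn.Sequential(nn.Linear(2 * p, 512), nn.ReLU(), nn.Linear(512, p))

# Weight decay is claimed to force the network to ignore corrupted examples;
# the value below is illustrative.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
for step in range(1000):                              # shortened for the sketch
    loss = nn.functional.cross_entropy(model(X), y)
    opt.zero_grad(); loss.backward(); opt.step()
```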

Demystifying the Myths and Legends of Nonconvex Convergence of SGD

  • paper_url: http://arxiv.org/abs/2310.12969
  • repo_url: None
  • paper_authors: Aritra Dutta, El Houcine Bergou, Soumia Boucherouite, Nicklas Werge, Melih Kandemir, Xin Li
  • for: This paper studies the convergence of stochastic gradient descent (SGD) and its variants when solving large-scale optimization problems with nonconvex objective functions.
  • methods: The paper analyzes the nonconvex convergence of SGD, focusing on the final iterates rather than searching the entire range of iterates (the standard definitions are recalled after the abstract).
  • results: The paper shows that an $\epsilon$-stationary point exists in the final iterates of SGD given a large enough total iteration budget $T$, quantifies the density of $\epsilon$-stationary points in the final iterates, and recovers the classical $O(\frac{1}{\sqrt{T}})$ asymptotic rate under various existing assumptions on the objective function and the bounds on the stochastic gradient.
    Abstract Stochastic gradient descent (SGD) and its variants are the main workhorses for solving large-scale optimization problems with nonconvex objective functions. Although the convergence of SGDs in the (strongly) convex case is well-understood, their convergence for nonconvex functions stands on weak mathematical foundations. Most existing studies on the nonconvex convergence of SGD show the complexity results based on either the minimum of the expected gradient norm or the functional sub-optimality gap (for functions with extra structural property) by searching the entire range of iterates. Hence the last iterations of SGDs do not necessarily maintain the same complexity guarantee. This paper shows that an $\epsilon$-stationary point exists in the final iterates of SGDs, given a large enough total iteration budget, $T$, not just anywhere in the entire range of iterates -- a much stronger result than the existing one. Additionally, our analyses allow us to measure the density of the $\epsilon$-stationary points in the final iterates of SGD, and we recover the classical $O(\frac{1}{\sqrt{T}})$ asymptotic rate under various existing assumptions on the objective function and the bounds on the stochastic gradient. As a result of our analyses, we addressed certain myths and legends related to the nonconvex convergence of SGD and posed some thought-provoking questions that could set new directions for research.
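
Background notation for the result (standard definitions and the classical guarantee the paper strengthens; not new claims):

```latex
% x is an \epsilon-stationary point of a smooth nonconvex f if
\|\nabla f(x)\| \le \epsilon .
% The classical nonconvex SGD guarantee only controls the best iterate:
\min_{1 \le t \le T} \mathbb{E}\,\|\nabla f(x_t)\|^2
  \;\le\; \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\,\|\nabla f(x_t)\|^2
  \;=\; O\!\left(\frac{1}{\sqrt{T}}\right),
% whereas the paper locates \epsilon-stationary points in the final iterates.
```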

PAC Prediction Sets Under Label Shift

  • paper_url: http://arxiv.org/abs/2310.12964
  • repo_url: https://github.com/averysi224/pac-ps-label-shift
  • paper_authors: Wenwen Si, Sangdon Park, Insup Lee, Edgar Dobriban, Osbert Bastani
  • for: This paper proposes an algorithm for constructing prediction sets in the label shift setting, guaranteeing that the sets contain the true label with high probability.
  • methods: The algorithm estimates the predicted class probabilities in the target domain and the confusion matrix, propagates the uncertainty in these estimates through a Gaussian elimination algorithm to compute confidence intervals for importance weights, and finally uses these intervals to construct prediction sets.
  • results: Evaluated on five datasets against several baselines, the algorithm satisfies the PAC guarantee while producing smaller, more informative prediction sets.
    Abstract Prediction sets capture uncertainty by predicting sets of labels rather than individual labels, enabling downstream decisions to conservatively account for all plausible outcomes. Conformal inference algorithms construct prediction sets guaranteed to contain the true label with high probability. These guarantees fail to hold in the face of distribution shift, which is precisely when reliable uncertainty quantification can be most useful. We propose a novel algorithm for constructing prediction sets with PAC guarantees in the label shift setting. This method estimates the predicted probabilities of the classes in a target domain, as well as the confusion matrix, then propagates uncertainty in these estimates through a Gaussian elimination algorithm to compute confidence intervals for importance weights. Finally, it uses these intervals to construct prediction sets. We evaluate our approach on five datasets: the CIFAR-10, ChestX-Ray and Entity-13 image datasets, the tabular CDC Heart dataset, and the AGNews text dataset. Our algorithm satisfies the PAC guarantee while producing smaller, more informative, prediction sets compared to several baselines.

Cousins Of The Vendi Score: A Family Of Similarity-Based Diversity Metrics For Science And Machine Learning

  • paper_url: http://arxiv.org/abs/2310.12952
  • repo_url: https://github.com/vertaix/vendi-score
  • paper_authors: Amey P. Pasarkar, Adji Bousso Dieng
  • for: This paper proposes a family of new similarity-based diversity metrics for machine learning (ML), ecology, and chemistry.
  • methods: The paper extends the Hill numbers using similarity, borrowing ideas from quantum statistical mechanics, to provide flexibility in allocating sensitivity to rare or common items (a small Vendi Score computation follows this entry).
  • results: The paper studies the properties of the scores in a controlled synthetic setting where the ground-truth diversity is known, tests their utility in improving molecular simulations via Vendi Sampling, and uses them to better understand the behavior of image generative models in terms of memorization, duplication, diversity, and sample quality.
    Abstract Measuring diversity accurately is important for many scientific fields, including machine learning (ML), ecology, and chemistry. The Vendi Score was introduced as a generic similarity-based diversity metric that extends the Hill number of order q=1 by leveraging ideas from quantum statistical mechanics. Contrary to many diversity metrics in ecology, the Vendi Score accounts for similarity and does not require knowledge of the prevalence of the categories in the collection to be evaluated for diversity. However, the Vendi Score treats each item in a given collection with a level of sensitivity proportional to the item's prevalence. This is undesirable in settings where there is a significant imbalance in item prevalence. In this paper, we extend the other Hill numbers using similarity to provide flexibility in allocating sensitivity to rare or common items. This leads to a family of diversity metrics -- Vendi scores with different levels of sensitivity -- that can be used in a variety of applications. We study the properties of the scores in a synthetic controlled setting where the ground truth diversity is known. We then test their utility in improving molecular simulations via Vendi Sampling. Finally, we use the Vendi scores to better understand the behavior of image generative models in terms of memorization, duplication, diversity, and sample quality.
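
For reference, the original Vendi Score (which this paper generalizes) is the exponential of the Shannon entropy of the eigenvalues of the normalized similarity matrix. A small NumPy sketch with an illustrative RBF kernel:

```python
import numpy as np

def vendi_score(X, kernel):
    """Vendi Score of a sample set: exp of the Shannon entropy of the
    eigenvalues of K/n, where K is the pairwise similarity matrix with
    k(x, x) = 1 (following Friedman & Dieng, 2022)."""
    n = len(X)
    K = np.array([[kernel(a, b) for b in X] for a in X])
    lam = np.linalg.eigvalsh(K / n)
    lam = lam[lam > 1e-12]                      # drop numerical zeros
    return float(np.exp(-(lam * np.log(lam)).sum()))

rbf = lambda a, b: np.exp(-np.sum((a - b) ** 2))
tight = [np.array([0.0]), np.array([0.01]), np.array([0.02])]   # near-duplicates
spread = [np.array([0.0]), np.array([2.0]), np.array([4.0])]    # distinct items
print(vendi_score(tight, rbf))   # close to 1: effectively one item
print(vendi_score(spread, rbf))  # close to 3: three distinct items
```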

Generative Flow Networks as Entropy-Regularized RL

  • paper_url: http://arxiv.org/abs/2310.12934
  • repo_url: https://github.com/d-tiapkin/gflownet-rl
  • paper_authors: Daniil Tiapkin, Nikita Morozov, Alexey Naumov, Dmitry Vetrov
  • for: This paper concerns training a policy to sample compositional discrete objects with probabilities proportional to a given reward.
  • methods: The paper extends the connection between reinforcement learning (RL) and generative flow networks (GFlowNets) to the general case (the soft RL objective is recalled after the abstract).
  • results: The authors show that learning a generative flow network can be efficiently redefined as an entropy-regularized RL problem with a specific reward and regularizer structure, and that standard soft RL algorithms applied to GFlowNet training are competitive with established GFlowNet training methods across several probabilistic modeling tasks.
    Abstract The recently proposed generative flow networks (GFlowNets) are a method of training a policy to sample compositional discrete objects with probabilities proportional to a given reward via a sequence of actions. GFlowNets exploit the sequential nature of the problem, drawing parallels with reinforcement learning (RL). Our work extends the connection between RL and GFlowNets to a general case. We demonstrate how the task of learning a generative flow network can be efficiently redefined as an entropy-regularized RL problem with a specific reward and regularizer structure. Furthermore, we illustrate the practical efficiency of this reformulation by applying standard soft RL algorithms to GFlowNet training across several probabilistic modeling tasks. Contrary to previously reported results, we show that entropic RL approaches can be competitive against established GFlowNet training methods. This perspective opens a direct path for integrating reinforcement learning principles into the realm of generative flow networks.
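
The reformulation targets the standard entropy-regularized (soft) RL objective, recalled below; the paper's specific reward and regularizer structure, which makes the correspondence exact, is not reproduced here.

```latex
\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\, \sum_{t} r(s_t, a_t)
  + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\sum_{a} \pi(a \mid s) \log \pi(a \mid s).
% For suitable rewards and dynamics, the optimal soft policy samples
% trajectories with probability proportional to exponentiated return,
% the reward-proportional sampling property GFlowNets are trained for.
```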

Probabilistic Modeling of Human Teams to Infer False Beliefs

  • paper_url: http://arxiv.org/abs/2310.12929
  • repo_url: None
  • paper_authors: Paulo Soares, Adarsh Pyarelal, Kobus Barnard
  • for: This paper develops a probabilistic graphical model (PGM) for artificially intelligent (AI) agents to infer human beliefs during a simulated urban search and rescue (USAR) scenario executed in a Minecraft environment with teams of three players.
  • methods: The PGM approach makes observable states and actions explicit, together with beliefs and intentions grounded by evidence about what players see and do over time; it also supports inferring the effect of interventions, which are vital if AI agents are to assist human teams.
  • results: The experiment manipulates players' knowledge: participants place marker blocks near room entrances to signal the presence or absence of victims, but one team member is given a different marker legend than the other two, which may mislead them about the state of the rooms, i.e., induce a false belief. Extending previous work, the authors introduce ToMCAT, an AI agent that can reason about individual and shared mental states. Players' behaviors are affected by what they see in their in-game field of view, their beliefs about the meaning of the markers, and their beliefs about which meaning the team decided to adopt; ToMCAT's beliefs are consistent with the players' actions, and it infers false beliefs with accuracy significantly better than chance and comparable to inferences made by human observers.
    Abstract We develop a probabilistic graphical model (PGM) for artificially intelligent (AI) agents to infer human beliefs during a simulated urban search and rescue (USAR) scenario executed in a Minecraft environment with a team of three players. The PGM approach makes observable states and actions explicit, as well as beliefs and intentions grounded by evidence about what players see and do over time. This approach also supports inferring the effect of interventions, which are vital if AI agents are to assist human teams. The experiment incorporates manipulations of players' knowledge, and the virtual Minecraft-based testbed provides access to several streams of information, including the objects in the players' field of view. The participants are equipped with a set of marker blocks that can be placed near room entrances to signal the presence or absence of victims in the rooms to their teammates. In each team, one of the members is given a different legend for the markers than the other two, which may mislead them about the state of the rooms; that is, they will hold a false belief. We extend previous works in this field by introducing ToMCAT, an AI agent that can reason about individual and shared mental states. We find that the players' behaviors are affected by what they see in their in-game field of view, their beliefs about the meaning of the markers, and their beliefs about which meaning the team decided to adopt. In addition, we show that ToMCAT's beliefs are consistent with the players' actions and that it can infer false beliefs with accuracy significantly better than chance and comparable to inferences made by human observers.

Enhancing Open-World Bacterial Raman Spectra Identification by Feature Regularization for Improved Resilience against Unknown Classes

  • paper_url: http://arxiv.org/abs/2310.13723
  • repo_url: https://github.com/balytskyijaroslaw/pathogensramanopenset
  • paper_authors: Yaroslav Balytskyi, Nataliia Kalashnyk, Inna Hubenko, Alina Balytska, Kelly McNear
  • for: This study aims to combine deep learning techniques with Raman spectroscopy for precise and prompt identification of pathogenic bacteria in clinical settings.
  • methods: The authors develop a novel ensemble of ResNet architectures combined with an attention mechanism, and integrate feature regularization via the Objectosphere loss function (a hedged loss sketch follows this entry).
  • results: The model improves accuracy to 87.8±0.1%, versus 86.7±0.4% for the best available model. With feature regularization, it identifies known pathogens from the catalog with high accuracy while effectively separating unknown samples, drastically reducing the false positive rate; the regularization also significantly enhances out-of-distribution detection during inference, improving the reliability of detecting unknown classes.
    Abstract The combination of Deep Learning techniques and Raman spectroscopy shows great potential offering precise and prompt identification of pathogenic bacteria in clinical settings. However, the traditional closed-set classification approaches assume that all test samples belong to one of the known pathogens, and their applicability is limited since the clinical environment is inherently unpredictable and dynamic, unknown or emerging pathogens may not be included in the available catalogs. We demonstrate that the current state-of-the-art Neural Networks identifying pathogens through Raman spectra are vulnerable to unknown inputs, resulting in an uncontrollable false positive rate. To address this issue, first, we developed a novel ensemble of ResNet architectures combined with the attention mechanism which outperforms existing closed-world methods, achieving an accuracy of $87.8 \pm 0.1\%$ compared to the best available model's accuracy of $86.7 \pm 0.4\%$. Second, through the integration of feature regularization by the Objectosphere loss function, our model achieves both high accuracy in identifying known pathogens from the catalog and effectively separates unknown samples drastically reducing the false positive rate. Finally, the proposed feature regularization method during training significantly enhances the performance of out-of-distribution detectors during the inference phase improving the reliability of the detection of unknown classes. Our novel algorithm for Raman spectroscopy enables the detection of unknown, uncatalogued, and emerging pathogens providing the flexibility to adapt to future pathogens that may emerge, and has the potential to improve the reliability of Raman-based solutions in dynamic operating environments where accuracy is critical, such as public safety applications.
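
A hedged PyTorch sketch of the Objectosphere loss used here for feature regularization, following the published formulation of Dhamija et al. (2018); the margin xi, weight lam, and tensor shapes are illustrative and may differ from this paper's configuration.

```python
import torch
import torch.nn.functional as F

def objectosphere_loss(logits, feats, labels, known_mask, xi=10.0, lam=1e-4):
    """Sketch of the Objectosphere loss: cross-entropy plus a feature-norm
    margin for known samples, and a uniform-target (maximum-entropy) term
    plus a norm penalty for unknown samples."""
    norms = feats.norm(dim=1)
    ce = F.cross_entropy(logits[known_mask], labels[known_mask])
    ent = -F.log_softmax(logits[~known_mask], dim=1).mean()  # uniform target
    ring = F.relu(xi - norms[known_mask]).pow(2).mean()      # push knowns outward
    shrink = norms[~known_mask].pow(2).mean()                # pull unknowns to 0
    return ce + ent + lam * (ring + shrink)

# toy usage: 8 samples, 5 known classes, last 3 samples treated as "unknown"
logits = torch.randn(8, 5, requires_grad=True)
feats = torch.randn(8, 32, requires_grad=True)
labels = torch.randint(0, 5, (8,))
known = torch.tensor([True] * 5 + [False] * 3)
objectosphere_loss(logits, feats, labels, known).backward()
```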

Benchmarking GPUs on SVBRDF Extractor Model

  • paper_url: http://arxiv.org/abs/2310.19816
  • repo_url: None
  • paper_authors: Narayan Kandel, Melanie Lambert
  • for: This work aims to help users select a GPU that achieves optimal performance for a specific task.
  • methods: The work benchmarks different GPUs on a neural network model (an SVBRDF extractor) that operates on larger input images (256x256) (a generic CUDA timing harness follows this entry).
  • results: The benchmarks differentiate the performance of different GPUs on networks with significantly larger inputs, a setting existing GPU benchmarking studies do not cover.
    Abstract With the maturity of deep learning, its use is emerging in every field. Also, as different types of GPUs are becoming more available in the markets, it creates a difficult decision for users. How can users select GPUs to achieve optimal performance for a specific task? Analysis of GPU architecture is well studied, but existing works that benchmark GPUs do not study tasks for networks with significantly larger input. In this work, we tried to differentiate the performance of different GPUs on neural network models that operate on bigger input images (256x256).
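
A generic way to benchmark a GPU on a fixed model, in the spirit of the paper's measurements: time the forward pass with CUDA events after a warm-up. The model below is a toy stand-in for the SVBRDF extractor, and the code requires a CUDA device.

```python
import torch

def time_forward(model, x, iters=50, warmup=10):
    """Milliseconds per forward pass on the current GPU, using CUDA events."""
    model, x = model.cuda().eval(), x.cuda()
    with torch.no_grad():
        for _ in range(warmup):                 # warm up kernels / allocator
            model(x)
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            model(x)
        end.record()
        torch.cuda.synchronize()                # wait for all queued kernels
    return start.elapsed_time(end) / iters

# toy stand-in for a model consuming 256x256 images
net = torch.nn.Sequential(torch.nn.Conv2d(3, 64, 3, padding=1), torch.nn.ReLU())
print(time_forward(net, torch.randn(8, 3, 256, 256)), "ms/iter")
```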

Blind quantum machine learning with quantum bipartite correlator

  • paper_url: http://arxiv.org/abs/2310.12893
  • repo_url: None
  • paper_authors: Changhao Li, Boning Li, Omar Amer, Ruslan Shaydulin, Shouvanik Chakrabarti, Guoqing Wang, Haowei Xu, Hao Tang, Isidor Schoch, Niraj Kumar, Charles Lim, Ju Li, Paola Cappellaro, Marco Pistoia
  • for: This work provides a privacy-preserving mechanism for distributed quantum computing, maintaining confidentiality and protecting data in the presence of untrusted computing nodes.
  • methods: The work introduces novel blind quantum machine learning protocols based on the quantum bipartite correlator algorithm, with reduced communication overhead and robust algorithm-specific privacy-preserving mechanisms that have low computational overhead and do not require complex cryptographic techniques.
  • results: Complexity and privacy analyses validate the effectiveness of the proposed protocols, opening new possibilities for privacy-aware machine learning applications in the era of quantum technologies.
    Abstract Distributed quantum computing is a promising computational paradigm for performing computations that are beyond the reach of individual quantum devices. Privacy in distributed quantum computing is critical for maintaining confidentiality and protecting the data in the presence of untrusted computing nodes. In this work, we introduce novel blind quantum machine learning protocols based on the quantum bipartite correlator algorithm. Our protocols have reduced communication overhead while preserving the privacy of data from untrusted parties. We introduce robust algorithm-specific privacy-preserving mechanisms with low computational overhead that do not require complex cryptographic techniques. We then validate the effectiveness of the proposed protocols through complexity and privacy analysis. Our findings pave the way for advancements in distributed quantum computing, opening up new possibilities for privacy-aware machine learning applications in the era of quantum technologies.

Fine-Tuning Generative Models as an Inference Method for Robotic Tasks

  • paper_url: http://arxiv.org/abs/2310.12862
  • repo_url: https://github.com/orrkrup/mace
  • paper_authors: Orr Krupnik, Elisei Shafer, Tom Jurgenson, Aviv Tamar
  • for: This work aims to develop robotic agents that can quickly adapt to novel and varying conditions, making them more flexible and effective in the real world.
  • methods: We build on recent advances in deep generative models and harness modern GPU acceleration to quickly adapt the sample generation of neural network models to observations in robotic tasks. The proposed method is simple and general, applicable to various deep generative models and robotic environments; the key idea is to quickly fine-tune the model by fitting it to generated samples matching the observed evidence, using the cross-entropy method (a minimal cross-entropy-method loop follows this entry).
  • results: The method applies to both autoregressive models and variational autoencoders, and we demonstrate its usability in object shape inference from grasping, inverse kinematics calculation, and point cloud completion.
    Abstract Adaptable models could greatly benefit robotic agents operating in the real world, allowing them to deal with novel and varying conditions. While approaches such as Bayesian inference are well-studied frameworks for adapting models to evidence, we build on recent advances in deep generative models which have greatly affected many areas of robotics. Harnessing modern GPU acceleration, we investigate how to quickly adapt the sample generation of neural network models to observations in robotic tasks. We propose a simple and general method that is applicable to various deep generative models and robotic environments. The key idea is to quickly fine-tune the model by fitting it to generated samples matching the observed evidence, using the cross-entropy method. We show that our method can be applied to both autoregressive models and variational autoencoders, and demonstrate its usability in object shape inference from grasping, inverse kinematics calculation, and point cloud completion.
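
The key idea rests on the cross-entropy method; a minimal generic Gaussian CEM loop is sketched below. The paper applies the method to fine-tune generative-model weights against observed evidence, which this toy score function only caricatures.

```python
import numpy as np

rng = np.random.default_rng(0)

def cross_entropy_method(score, dim, n=200, elite_frac=0.1, iters=30):
    """Generic cross-entropy method: sample candidates, keep the elite fraction
    that best matches the evidence, refit the sampling distribution, repeat."""
    mu, sigma = np.zeros(dim), np.ones(dim)
    for _ in range(iters):
        samples = mu + sigma * rng.standard_normal((n, dim))
        elite = samples[np.argsort(score(samples))[-int(n * elite_frac):]]
        mu, sigma = elite.mean(0), elite.std(0) + 1e-6   # refit to elites
    return mu

# toy "evidence-matching" score: closeness to an observed target point
target = np.array([1.5, -0.5, 2.0])
best = cross_entropy_method(lambda s: -np.linalg.norm(s - target, axis=1), dim=3)
print(best)   # converges near the target
```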

Audio Editing with Non-Rigid Text Prompts

  • paper_url: http://arxiv.org/abs/2310.12858
  • repo_url: None
  • paper_authors: Francesco Paissan, Zhepei Wang, Mirco Ravanelli, Paris Smaragdis, Cem Subakan
  • for: This paper explores audio editing with non-rigid text prompts.
  • methods: The editing pipeline uses text prompts to perform addition, style transfer, and in-painting, producing edits that remain faithful to the input audio.
  • results: Quantitative and qualitative evaluations show the edits outperform Audio-LDM, a recently released text-prompted audio generation model, and remain more faithful to the input audio in terms of keeping the original onsets and offsets of the audio events.
    Abstract In this paper, we explore audio-editing with non-rigid text edits. We show that the proposed editing pipeline is able to create audio edits that remain faithful to the input audio. We explore text prompts that perform addition, style transfer, and in-painting. We quantitatively and qualitatively show that the edits are able to obtain results which outperform Audio-LDM, a recently released text-prompted audio generation model. Qualitative inspection of the results points out that the edits given by our approach remain more faithful to the input audio in terms of keeping the original onsets and offsets of the audio events.

Model-agnostic variable importance for predictive uncertainty: an entropy-based approach

  • paper_url: http://arxiv.org/abs/2310.12842
  • repo_url: None
  • paper_authors: Danny Wood, Theodore Papamarkou, Matt Benatan, Richard Allmendinger
  • for: This paper explores how existing explainability methods can be extended to understand the sources of uncertainty in the predictions of uncertainty-aware models.
  • methods: The paper adapts permutation feature importance, partial dependence plots, and individual conditional expectation plots to explain the sources of uncertainty in a model's predictive distribution (a toy permutation example follows this entry).
  • results: Using these methods, the paper obtains novel insights into model behaviour and shows they can measure the impact of features on both the entropy of the predictive distribution and the log-likelihood of the ground-truth labels under that distribution.
    Abstract In order to trust the predictions of a machine learning algorithm, it is necessary to understand the factors that contribute to those predictions. In the case of probabilistic and uncertainty-aware models, it is necessary to understand not only the reasons for the predictions themselves, but also the model's level of confidence in those predictions. In this paper, we show how existing methods in explainability can be extended to uncertainty-aware models and how such extensions can be used to understand the sources of uncertainty in a model's predictive distribution. In particular, by adapting permutation feature importance, partial dependence plots, and individual conditional expectation plots, we demonstrate that novel insights into model behaviour may be obtained and that these methods can be used to measure the impact of features on both the entropy of the predictive distribution and the log-likelihood of the ground truth labels under that distribution. With experiments using both synthetic and real-world data, we demonstrate the utility of these approaches in understanding both the sources of uncertainty and their impact on model performance.
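
A small sketch of permutation feature importance adapted to predictive uncertainty, as the abstract describes: shuffle one feature and measure the change in mean predictive entropy. The data and model are synthetic stand-ins.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1000) > 0).astype(int)
model = RandomForestClassifier(random_state=0).fit(X, y)

def mean_predictive_entropy(model, X):
    p = np.clip(model.predict_proba(X), 1e-12, 1.0)
    return float(-(p * np.log(p)).sum(axis=1).mean())

# Permutation importance for uncertainty: how much does shuffling a feature
# change the entropy of the predictive distribution?
base = mean_predictive_entropy(model, X)
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])        # break feature-target link
    shift = mean_predictive_entropy(model, Xp) - base
    print(f"feature {j}: entropy shift = {shift:+.4f}")
```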

Generating collective counterfactual explanations in score-based classification via mathematical optimization

  • paper_url: http://arxiv.org/abs/2310.12822
  • repo_url: https://github.com/jasoneramirez/collectivece
  • paper_authors: Emilio Carrizosa, Jasone Ramírez-Ayerbe, Dolores Romero Morales
  • for: This paper provides an explanation method for machine learning models used in high-stakes decision making, to help understand how the models arrive at their decisions.
  • methods: The paper uses counterfactual analysis: a counterfactual explanation of an instance indicates how the instance should be minimally modified so that the perturbed instance is classified in the desired class. It introduces collective counterfactual explanations, which provide a joint explanation for a group of records of interest and help detect the features that are critical across the entire dataset (a single-instance building block is sketched after the abstract).
  • results: Using novel mathematical optimization models, the paper provides a counterfactual explanation for each instance in a group of interest so that the total cost of the perturbations is minimized under linking constraints; some instances can be treated individually, so outliers are identified and handled appropriately. Under some assumptions on the classifier and the space in which counterfactuals are sought, finding collective counterfactuals reduces to a convex quadratic linearly constrained mixed integer optimization problem that, for datasets of moderate size, can be solved to optimality with existing solvers; experiments on real-world datasets demonstrate the approach's usefulness.
    Abstract Due to the increasing use of Machine Learning models in high stakes decision making settings, it has become increasingly important to have tools to understand how models arrive at decisions. Assuming a trained Supervised Classification model, explanations can be obtained via counterfactual analysis: a counterfactual explanation of an instance indicates how this instance should be minimally modified so that the perturbed instance is classified in the desired class by the Machine Learning classification model. Most of the Counterfactual Analysis literature focuses on the single-instance single-counterfactual setting, in which the analysis is done for one single instance to provide one single explanation. Taking a stakeholder's perspective, in this paper we introduce the so-called collective counterfactual explanations. By means of novel Mathematical Optimization models, we provide a counterfactual explanation for each instance in a group of interest, so that the total cost of the perturbations is minimized under some linking constraints. Making the process of constructing counterfactuals collective instead of individual enables us to detect the features that are critical to the entire dataset to have the individuals classified in the desired class. Our methodology allows for some instances to be treated individually, performing the collective counterfactual analysis for a fraction of records of the group of interest. This way, outliers are identified and handled appropriately. Under some assumptions on the classifier and the space in which counterfactuals are sought, finding collective counterfactuals is reduced to solving a convex quadratic linearly constrained mixed integer optimization problem, which, for datasets of moderate size, can be solved to optimality using existing solvers. The performance of our approach is illustrated on real-world datasets, demonstrating its usefulness.
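
As a hedged building block, here is the single-instance counterfactual for a linear score-based classifier as a convex QP in CVXPY; the paper's collective formulation adds linking constraints and integer variables across a whole group of instances, which this sketch omits.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
w, b = rng.normal(size=5), -0.5          # score(x) = w @ x + b
x0 = -w                                  # guarantees w @ x0 + b < 0 here

xp = cp.Variable(5)
cost = cp.sum_squares(xp - x0)           # minimal perturbation (squared L2)
prob = cp.Problem(cp.Minimize(cost), [w @ xp + b >= 0.1])   # small margin
prob.solve()
print("counterfactual:", xp.value, "new score:", w @ xp.value + b)
```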

Hierarchical Forecasting at Scale

  • paper_url: http://arxiv.org/abs/2310.12809
  • repo_url: https://github.com/elephaint/hfas
  • paper_authors: Olivier Sprangers, Wander Wadman, Sebastian Schelter, Maarten de Rijke
  • for: The paper is written for practitioners who want to improve the forecasting performance of their production forecasting systems, particularly those with millions of time series.
  • methods: The paper proposes using a sparse loss function to learn a coherent forecast for millions of time series, directly optimizing the hierarchical product and/or temporal structure. This approach eliminates the need for a post-processing step in traditional hierarchical forecasting techniques, reducing the computational cost of the prediction phase (a toy hierarchical loss follows this entry).
  • results: The proposed sparse hierarchical loss function achieves up to 10% better performance (in terms of RMSE) on the public M5 dataset compared to the baseline loss function. In addition, the authors implement the loss function in an existing forecasting model at a large European e-commerce platform, resulting in an improved forecasting performance of 2% at the product level, and demonstrate an increase of about 5-10% when evaluating across the cross-sectional hierarchies defined.
    Abstract Existing hierarchical forecasting techniques scale poorly when the number of time series increases. We propose to learn a coherent forecast for millions of time series with a single bottom-level forecast model by using a sparse loss function that directly optimizes the hierarchical product and/or temporal structure. The benefit of our sparse hierarchical loss function is that it gives practitioners a method of producing bottom-level forecasts that are coherent with any chosen cross-sectional or temporal hierarchy. In addition, removing the post-processing step required by traditional hierarchical forecasting techniques reduces the computational cost of the prediction phase in the forecasting pipeline. On the public M5 dataset, our sparse hierarchical loss function performs up to 10% better (in terms of RMSE) than the baseline loss function. We implement our sparse hierarchical loss function within an existing forecasting model at bol, a large European e-commerce platform, resulting in an improved forecasting performance of 2% at the product level. Finally, we found an increase in forecasting performance of about 5-10% when evaluating across the cross-sectional hierarchies that we defined. These results demonstrate the usefulness of our sparse hierarchical loss applied to a production forecasting system at a major e-commerce platform.
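
A minimal sketch of the coherency idea behind the sparse hierarchical loss, assuming a toy four-series hierarchy and squared error: measuring the loss of bottom-level forecasts through a (sparse) summing matrix evaluates every hierarchy level at once, so no post-processing reconciliation step is needed. The exact weighting and sparsity handling of the paper's loss differ.

```python
import torch

# Toy hierarchy: 4 bottom series, 2 mid-level aggregates, 1 total (7 nodes).
# S maps bottom-level values to all nodes; in practice S is large and sparse,
# which is what keeps the loss cheap to evaluate at scale.
S = torch.tensor([
    [1., 1., 1., 1.],   # total
    [1., 1., 0., 0.],   # aggregate A
    [0., 0., 1., 1.],   # aggregate B
    [1., 0., 0., 0.], [0., 1., 0., 0.], [0., 0., 1., 0.], [0., 0., 0., 1.],
]).to_sparse()

def hierarchical_loss(y_hat_bottom, y_bottom):
    """Squared error measured on every node of the hierarchy at once.
    Because all levels are linear images of the bottom level, the
    resulting bottom-level forecasts are coherent by construction."""
    err = torch.sparse.mm(S, (y_hat_bottom - y_bottom))
    return (err ** 2).mean()

y_true = torch.randn(4, 1)
y_hat = torch.randn(4, 1, requires_grad=True)
loss = hierarchical_loss(y_hat, y_true)
loss.backward()  # gradients flow to the single bottom-level model
```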

DCSI – An improved measure of cluster separability based on separation and connectedness

  • paper_url: http://arxiv.org/abs/2310.12806
  • repo_url: https://github.com/janagauss/dcsi
  • paper_authors: Jana Gauss, Fabian Scheipl, Moritz Herrmann
  • for: Assessing whether class labels in real-world data sets correspond to meaningful clusters, which is crucial when evaluating clustering algorithms on such data.
  • methods: A newly developed measure, the density cluster separability index (DCSI), which quantifies two central aspects of separability for density-based clustering: between-class separation and within-class connectedness.
  • results: Experiments on synthetic data show that DCSI correlates strongly with the performance of DBSCAN measured via the adjusted Rand index (ARI), but it lacks robustness on multi-class data sets with overlapping classes that are ill-suited for density-based hard clustering; on frequently used real-world data sets, DCSI correctly identifies touching or overlapping classes that do not form meaningful clusters.
    Abstract Whether class labels in a given data set correspond to meaningful clusters is crucial for the evaluation of clustering algorithms using real-world data sets. This property can be quantified by separability measures. A review of the existing literature shows that neither classification-based complexity measures nor cluster validity indices (CVIs) adequately incorporate the central aspects of separability for density-based clustering: between-class separation and within-class connectedness. A newly developed measure (density cluster separability index, DCSI) aims to quantify these two characteristics and can also be used as a CVI. Extensive experiments on synthetic data indicate that DCSI correlates strongly with the performance of DBSCAN measured via the adjusted rand index (ARI) but lacks robustness when it comes to multi-class data sets with overlapping classes that are ill-suited for density-based hard clustering. Detailed evaluation on frequently used real-world data sets shows that DCSI can correctly identify touching or overlapping classes that do not form meaningful clusters.
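
A simplified, assumption-laden stand-in for a DCSI-style score, combining the two ingredients named above: between-class separation (here, the smallest inter-class distance) and within-class connectedness (here, the largest within-class minimum-spanning-tree edge). The published DCSI is defined via core points and differs in its exact construction.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist

def separability_index(X, y):
    """Simplified stand-in for a DCSI-style score: large between-class
    separation and strong within-class connectedness push it toward 1."""
    classes = np.unique(y)
    # Between-class separation: smallest distance across any class pair.
    sep = min(cdist(X[y == a], X[y == b]).min()
              for i, a in enumerate(classes) for b in classes[i + 1:])
    # Within-class connectedness: largest MST edge inside any class
    # (a large edge means the class splits into weakly connected pieces).
    conn = max(minimum_spanning_tree(cdist(X[y == c], X[y == c])).toarray().max()
               for c in classes)
    return sep / (sep + conn)  # in (0, 1); higher means more separable

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
y = np.repeat([0, 1], 50)
print(separability_index(X, y))
```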

Detection and Evaluation of bias-inducing Features in Machine learning

  • paper_url: http://arxiv.org/abs/2310.12805
  • repo_url: None
  • paper_authors: Moses Openja, Gabriel Laberge, Foutse Khomh
  • for: Systematically identifying the bias-inducing features of a machine learning model to support the decision-making of domain experts aiming for equitable outcomes.
  • methods: Applying small, guided changes to individual features (or pairs of features) in the data and observing how the model's predictions change, in order to determine which features induce bias.
  • results: Evaluated on four well-known datasets, the approach successfully identifies bias-inducing features and shows how it can help spearhead the standard procedure for developing, testing, maintaining, and deploying fair machine learning systems.
    Abstract Cause-to-effect analysis can help us decompose all the likely causes of a problem, such as an undesirable business situation or unintended harm to individuals. This implies that we can identify how problems are inherited, rank the causes to help prioritize fixes, simplify a complex problem, and visualize it. In the context of machine learning (ML), one can use cause-to-effect analysis to understand the reasons for the biased behavior of a system. For example, we can examine the root causes of bias by checking each feature for a potential cause of bias in the model. To approach this, one can apply small changes to a given feature or a pair of features in the data, following some guidelines, and observe how this impacts the decision made by the model (i.e., the model prediction). We can therefore use cause-to-effect analysis to identify potential bias-inducing features, even when these features are originally unknown. This is important since most current methods require a pre-identification of sensitive features for bias assessment and can miss other relevant bias-inducing features, which is why systematic identification of such features is necessary. Moreover, it often occurs that to achieve an equitable outcome, one has to take sensitive features into account in the model decision. It should therefore be up to the domain experts to decide, based on their knowledge of the context of a decision, whether bias induced by specific features is acceptable or not. In this study, we propose an approach for systematically identifying all bias-inducing features of a model to help support the decision-making of domain experts. We evaluated our technique on four well-known datasets to showcase how our contribution can help spearhead the standard procedure when developing, testing, maintaining, and deploying fair/equitable machine learning systems.
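
A minimal sketch of the perturb-and-observe idea, assuming a hypothetical random-forest model and synthetic data: each feature is slightly perturbed and the fraction of flipped predictions is used to rank candidate bias-inducing features for expert review. The paper's guidelines for choosing perturbations are more elaborate.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)   # features 0 and 3 drive the label
model = RandomForestClassifier(random_state=0).fit(X, y)

def flip_rate(model, X, j, scale=0.5):
    """Fraction of predictions that change when feature j is slightly perturbed.
    A high rate flags j as a candidate bias-inducing feature worth expert review."""
    X_pert = X.copy()
    X_pert[:, j] += scale * X[:, j].std() * rng.choice([-1, 1], size=len(X))
    return np.mean(model.predict(X) != model.predict(X_pert))

ranking = sorted(range(X.shape[1]), key=lambda j: -flip_rate(model, X, j))
print(ranking)  # features whose perturbation flips the most decisions come first
```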

Differentiable Vertex Fitting for Jet Flavour Tagging

  • paper_url: http://arxiv.org/abs/2310.12804
  • repo_url: https://github.com/rachsmith1/ndive
  • paper_authors: Rachel E. C. Smith, Inês Ochoa, Rúben Inácio, Jonathan Shoemaker, Michael Kagan
  • for: Jet flavour classification in particle physics.
  • methods: A differentiable secondary vertex fitting algorithm that can be integrated into neural networks.
  • results: Improves heavy flavour jet classification when integrated into larger transformer-based flavour tagging models.
    Abstract We propose a differentiable vertex fitting algorithm that can be used for secondary vertex fitting, and that can be seamlessly integrated into neural networks for jet flavour tagging. Vertex fitting is formulated as an optimization problem where gradients of the optimized solution vertex are defined through implicit differentiation and can be passed to upstream or downstream neural network components for network training. More broadly, this is an application of differentiable programming to integrate physics knowledge into neural network models in high energy physics. We demonstrate how differentiable secondary vertex fitting can be integrated into larger transformer-based models for flavour tagging and improve heavy flavour jet classification.

A Theoretical Approach to Characterize the Accuracy-Fairness Trade-off Pareto Frontier

  • paper_url: http://arxiv.org/abs/2310.12785
  • repo_url: None
  • paper_authors: Hua Tang, Lu Cheng, Ninghao Liu, Mengnan Du
  • for: Understanding the accuracy-fairness trade-off and providing a theoretical foundation for existing fairness evaluation methods.
  • methods: A theoretical framework characterizing the shape of the accuracy-fairness Pareto frontier (FairFrontier), with four categories describing its important properties, complemented by experiments on synthetic data.
  • results: When sensitive attributes can be fully interpreted by non-sensitive attributes, FairFrontier is mostly continuous; accuracy can suffer a sharp decline when over-pursuing fairness; and the trade-off can be eliminated via a two-step streamlined approach.
    Abstract While the accuracy-fairness trade-off has been frequently observed in the literature of fair machine learning, rigorous theoretical analyses have been scarce. To demystify this long-standing challenge, this work seeks to develop a theoretical framework by characterizing the shape of the accuracy-fairness trade-off Pareto frontier (FairFrontier), determined by the set of all optimal Pareto classifiers that no other classifiers can dominate. Specifically, we first demonstrate the existence of the trade-off in real-world scenarios and then propose four potential categories to characterize the important properties of the accuracy-fairness Pareto frontier. For each category, we identify the necessary conditions that lead to the corresponding trade-offs. Experimental results on synthetic data suggest insightful findings of the proposed framework: (1) When sensitive attributes can be fully interpreted by non-sensitive attributes, FairFrontier is mostly continuous. (2) Accuracy can suffer a sharp decline when over-pursuing fairness. (3) The trade-off can be eliminated via a two-step streamlined approach. The proposed research enables an in-depth understanding of the accuracy-fairness trade-off, pushing current fair machine-learning research to a new frontier.
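
An empirical (not theoretical) way to trace an accuracy-fairness frontier of the kind characterized above: sweep the weight of a fairness penalty, here a demographic-parity gap added to a logistic-regression loss on synthetic data. The penalty, model, and data are all illustrative assumptions.

```python
import torch

# Synthetic data: one sensitive attribute s, features partially correlated with it.
torch.manual_seed(0)
n = 2000
s = torch.randint(0, 2, (n,)).float()
X = torch.randn(n, 3) + s.unsqueeze(1)          # features leak the sensitive attribute
y = ((X[:, 0] + torch.randn(n)) > 0.5).float()

def train(lam, steps=300):
    w = torch.zeros(3, requires_grad=True)
    opt = torch.optim.Adam([w], lr=0.1)
    for _ in range(steps):
        p = torch.sigmoid(X @ w)
        bce = torch.nn.functional.binary_cross_entropy(p, y)
        dp_gap = (p[s == 1].mean() - p[s == 0].mean()).abs()  # demographic parity gap
        loss = bce + lam * dp_gap
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        acc = (((X @ w) > 0).float() == y).float().mean().item()
        p = torch.sigmoid(X @ w)
        gap = (p[s == 1].mean() - p[s == 0].mean()).abs().item()
    return acc, gap

# Sweeping the penalty weight traces an empirical accuracy-fairness frontier.
for lam in [0.0, 0.5, 2.0, 10.0]:
    print(lam, train(lam))
```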

Conditional Density Estimations from Privacy-Protected Data

  • paper_url: http://arxiv.org/abs/2310.12781
  • repo_url: None
  • paper_authors: Yifei Xiong, Nianqiao P. Ju, Sanguo Zhang
  • for: Simulation-based statistical inference of model parameters from privacy-protected datasets.
  • methods: Neural conditional density estimators as a flexible family of distributions to approximate the posterior distribution of model parameters given the observed private query results.
  • results: Experiments and analysis on an infectious disease model and on ordinary linear regression show that the approach enables valid statistical inference from privacy-protected data, correcting for biases introduced by the privacy-protection mechanisms.
    Abstract Many modern statistical analysis and machine learning applications require training models on sensitive user data. Differential privacy provides a formal guarantee that individual-level information about users does not leak. In this framework, randomized algorithms inject calibrated noise into the confidential data, resulting in privacy-protected datasets or queries. However, restricting access to only the privatized data during statistical analysis makes it computationally challenging to perform valid inferences on parameters underlying the confidential data. In this work, we propose simulation-based inference methods from privacy-protected datasets. Specifically, we use neural conditional density estimators as a flexible family of distributions to approximate the posterior distribution of model parameters given the observed private query results. We illustrate our methods on discrete time-series data under an infectious disease model and on ordinary linear regression models. Illustrating the privacy-utility trade-off, our experiments and analysis demonstrate the necessity and feasibility of designing valid statistical inference procedures to correct for biases introduced by the privacy-protection mechanisms.
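
A minimal sketch of the simulation-based workflow, under strong assumptions (uniform prior, binomial data, a Laplace-mechanism release, and a single-Gaussian conditional density estimator instead of a full neural density model): parameters and privatized statistics are simulated jointly, and a network is trained to approximate the posterior of the parameter given the private release.

```python
import torch

torch.manual_seed(0)
n, eps = 200, 1.0                      # records per dataset, privacy budget

def simulate(batch):
    """Prior draw -> synthetic data -> privatized query (Laplace mechanism)."""
    theta = torch.rand(batch)                          # infection rate ~ U(0,1)
    counts = torch.distributions.Binomial(total_count=n, probs=theta).sample()
    noisy = counts + torch.distributions.Laplace(0., 1. / eps).sample((batch,))
    return theta, noisy / n                            # private fraction

# Conditional density estimator: a Gaussian whose mean and scale depend
# on the private statistic (a stand-in for a richer neural estimator).
net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(2000):
    theta, stat = simulate(512)
    out = net(stat.unsqueeze(1))
    dist = torch.distributions.Normal(out[:, 0], torch.nn.functional.softplus(out[:, 1]) + 1e-3)
    loss = -dist.log_prob(theta).mean()                # maximize posterior log-density
    opt.zero_grad(); loss.backward(); opt.step()

# Approximate posterior for an observed private release; noise-induced bias
# is corrected implicitly because training saw the mechanism end to end.
out = net(torch.tensor([[0.3]]))
print(out[:, 0].item())  # posterior mean estimate of theta given the noisy statistic
```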

Energy-Based Models For Speech Synthesis

  • paper_url: http://arxiv.org/abs/2310.12765
  • repo_url: https://github.com/NVIDIA/radtts
  • paper_authors: Wanli Sun, Zehai Tu, Anton Ragni
  • for: Exploring energy-based models (EBMs), a further member of the family of non-autoregressive (non-AR) models, for efficient speech synthesis.
  • methods: Training EBMs via noise contrastive estimation, with several strategies for generating effective negative samples (including the use of high-performing AR models), and sampling from EBMs via Langevin MCMC.
  • results: Experiments on the LJSpeech dataset show that the proposed approach offers improvements over Tacotron 2.
    Abstract Recently there has been a lot of interest in non-autoregressive (non-AR) models for speech synthesis, such as FastSpeech 2 and diffusion models. Unlike AR models, these models do not have autoregressive dependencies among outputs which makes inference efficient. This paper expands the range of available non-AR models with another member called energy-based models (EBMs). The paper describes how noise contrastive estimation, which relies on the comparison between positive and negative samples, can be used to train EBMs. It proposes a number of strategies for generating effective negative samples, including using high-performing AR models. It also describes how sampling from EBMs can be performed using Langevin Markov Chain Monte-Carlo (MCMC). The use of Langevin MCMC enables to draw connections between EBMs and currently popular diffusion models. Experiments on LJSpeech dataset show that the proposed approach offers improvements over Tacotron 2.
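
A minimal sketch of Langevin MCMC sampling from an EBM, using a toy two-dimensional energy network in place of an acoustic model: the update follows the gradient of the energy plus injected noise, which is also the connection to diffusion models noted above. Negatives for noise contrastive training could instead be drawn from a strong AR model, as the abstract suggests.

```python
import torch

# Toy energy network over 2-D points; a speech model would define an energy
# over acoustic features conditioned on text.
energy = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.SiLU(), torch.nn.Linear(64, 1))

def langevin_sample(energy, n=256, steps=100, step_size=0.01):
    """Draw approximate samples from p(x) proportional to exp(-E(x))
    via discretized Langevin dynamics."""
    x = torch.randn(n, 2)
    for _ in range(steps):
        x = x.detach().requires_grad_(True)
        grad = torch.autograd.grad(energy(x).sum(), x)[0]
        x = x - 0.5 * step_size * grad + (step_size ** 0.5) * torch.randn_like(x)
    return x.detach()

negatives = langevin_sample(energy)  # negative samples for contrastive training
```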

Discretize Relaxed Solution of Spectral Clustering via a Non-Heuristic Algorithm

  • paper_url: http://arxiv.org/abs/2310.12752
  • repo_url: https://github.com/hyzhang98/first-order-discretization
  • paper_authors: Hongyuan Zhang, Xuelong Li
  • for: A non-heuristic discretization strategy, based on first-order optimization, for the discretization step in spectral clustering and its extensions.
  • methods: A first-order term that bridges the original graph cut problem and the discretization algorithm, so that the discrete solution is computed with awareness of the original objective.
  • results: Experiments show a clear advantage over heuristic discretization methods, with the final discrete solutions being more reliable and achieving preferable loss values.
    Abstract Spectral clustering and its extensions usually consist of two steps: (1) constructing a graph and computing the relaxed solution; (2) discretizing the relaxed solution. Although the former has been extensively investigated, the discretization techniques are mainly heuristic methods, e.g., k-means or spectral rotation. Unfortunately, the goal of the existing methods is not to find a discrete solution that minimizes the original objective. In other words, their primary drawback is the neglect of the original objective when computing the discrete solution. Inspired by first-order optimization algorithms, we propose to develop a first-order term to bridge the original problem and the discretization algorithm, which is, to the best of our knowledge, the first non-heuristic method of this kind. Since the non-heuristic method is aware of the original graph cut problem, the final discrete solution is more reliable and achieves a preferable loss value. We also show theoretically that the continuous optimum is beneficial to discretization algorithms, even though simply finding the discrete solution closest to it, an existing heuristic, is unreliable. Extensive experiments clearly show the superiority of our method.

TabuLa: Harnessing Language Models for Tabular Data Synthesis

  • paper_url: http://arxiv.org/abs/2310.12746
  • repo_url: https://github.com/zhao-zilong/tabula
  • paper_authors: Zilong Zhao, Robert Birke, Lydia Chen
  • for: This paper focuses on the research area of tabular data synthesis, specifically exploring the use of large language models (LLMs) to generate realistic tabular data.
  • methods: The proposed method, called Tabula, is based on the language model structure and utilizes a token sequence compression strategy to reduce training time while maintaining synthetic data quality.
  • results: The paper demonstrates the limitations of using pre-trained language models for tabular data synthesis and proposes a dedicated foundational model tailored specifically for this task. Additionally, the proposed method significantly reduces training time while achieving better synthetic data utility compared to current state-of-the-art algorithms.
    Abstract Given the ubiquitous use of tabular data in industries and the growing concerns in data privacy and security, tabular data synthesis emerges as a critical research area. The recent state-of-the-art methods show that large language models (LLMs) can be adopted to generate realistic tabular data. As LLMs pre-process tabular data as full text, they have the advantage of avoiding the curse of dimensionality associated with one-hot encoding high-dimensional data. However, their long training time and limited re-usability on new tasks prevent them from replacing existing tabular generative models. In this paper, we propose Tabula, a tabular data synthesizer based on the language model structure. Through Tabula, we demonstrate the inherent limitation of employing pre-trained language models designed for natural language processing (NLP) in the context of tabular data synthesis. Our investigation delves into the development of a dedicated foundational model tailored specifically for tabular data synthesis. Additionally, we propose a token sequence compression strategy to significantly reduce training time while preserving the quality of synthetic data. Extensive experiments on six datasets demonstrate that using a language model structure without loading the well-trained model weights yields a better starting model for tabular data synthesis. Moreover, the Tabula model, previously trained on other tabular data, serves as an excellent foundation model for new tabular data synthesis tasks. Additionally, the token sequence compression method substantially reduces the model's training time. Results show that Tabula reduces training time per epoch by 46.2% on average compared to the current LLM-based state-of-the-art algorithm, and consistently achieves even higher synthetic data utility.
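
A toy illustration of why row serialization length matters for LLM-based tabular synthesis: a compact encoding shortens the token sequence processed at every training step. TabuLa's actual compression scheme is not reproduced here; this only contrasts a verbose and a compact serialization.

```python
# Illustration only: contrast a verbose and a compact row encoding.
# Shorter sequences mean fewer tokens per training step for the language model.
row = {"age": 34, "job": "teacher", "income": 52000}

verbose = ", ".join(f"{col} is {val}" for col, val in row.items())
compact = ",".join(str(val) for val in row.values())   # fixed column order implied

print(verbose)   # "age is 34, job is teacher, income is 52000"
print(compact)   # "34,teacher,52000"
print(len(verbose.split()), "vs", len(compact.split()), "whitespace tokens")
```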

Canonical normalizing flows for manifold learning

  • paper_url: http://arxiv.org/abs/2310.12743
  • repo_url: https://github.com/k-flouris/cmf
  • paper_authors: Kyriakos Flouris, Ender Konukoglu
  • for: A normalizing flow model that makes better use of low-dimensional manifold representations of data, improving the expressiveness and compactness of the learned latent space.
  • methods: A canonical manifold learning flow, in which a novel optimization objective enforces the transformation matrix to have few prominent, non-degenerate basis functions by minimizing the $\ell_1$-norm of the off-diagonal manifold metric elements.
  • results: Experiments show that canonical manifold flow yields a more efficient use of the latent space, automatically generating fewer prominent and distinct dimensions to represent the data, and approximates target distributions better than other manifold flow methods in most experiments, resulting in lower FID scores.
    Abstract Manifold learning flows are a class of generative modelling techniques that assume a low-dimensional manifold description of the data. The embedding of such a manifold into the high-dimensional space of the data is achieved via learnable invertible transformations. Therefore, once the manifold is properly aligned via a reconstruction loss, the probability density is tractable on the manifold and maximum likelihood can be used to optimize the network parameters. Naturally, the lower-dimensional representation of the data requires an injective-mapping. Recent approaches were able to enforce that the density aligns with the modelled manifold, while efficiently calculating the density volume-change term when embedding to the higher-dimensional space. However, unless the injective-mapping is analytically predefined, the learned manifold is not necessarily an efficient representation of the data. Namely, the latent dimensions of such models frequently learn an entangled intrinsic basis, with degenerate information being stored in each dimension. Alternatively, if a locally orthogonal and/or sparse basis is to be learned, here coined canonical intrinsic basis, it can serve in learning a more compact latent space representation. Toward this end, we propose a canonical manifold learning flow method, where a novel optimization objective enforces the transformation matrix to have few prominent and non-degenerate basis functions. We demonstrate that by minimizing the off-diagonal manifold metric elements $\ell_1$-norm, we can achieve such a basis, which is simultaneously sparse and/or orthogonal. Canonical manifold flow yields a more efficient use of the latent space, automatically generating fewer prominent and distinct dimensions to represent data, and a better approximation of target distributions than other manifold flow methods in most experiments we conducted, resulting in lower FID scores.
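
A sketch of the proposed penalty under simplifying assumptions (a small toy decoder and a single latent point): the pulled-back metric $G = J^\top J$ is computed from the decoder Jacobian, and the $\ell_1$-norm of its off-diagonal entries is the term whose minimization encourages a locally orthogonal, canonical intrinsic basis.

```python
import torch
from torch.autograd.functional import jacobian

# Hypothetical decoder g: low-dimensional latent (d=2) -> data space (D=5).
g = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 5))

def off_diagonal_metric_penalty(g, z):
    """l1-norm of the off-diagonal entries of the pulled-back metric G = J^T J.
    Driving these toward zero encourages locally orthogonal (canonical)
    latent directions, as in the optimization objective described above."""
    J = jacobian(g, z, create_graph=True)  # shape (D, d)
    G = J.T @ J                            # (d, d) manifold metric
    off = G - torch.diag(torch.diag(G))
    return off.abs().sum()

z = torch.randn(2)
penalty = off_diagonal_metric_penalty(g, z)  # added to the flow's likelihood loss
print(penalty.item())
```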

Learn from the Past: A Proxy based Adversarial Defense Framework to Boost Robustness

  • paper_url: http://arxiv.org/abs/2310.12713
  • repo_url: None
  • paper_authors: Yaohua Liu, Jiaxin Gao, Zhu Liu, Xianghao Jiao, Xin Fan, Risheng Liu
  • for: Improving the robustness of deep learning models against adversarial attacks, by introducing a new framework called LAST (Learn from the Past) that utilizes historical information to defend against parameter-oriented attacks.
  • methods: A two-stage update rule for the target model that incorporates prior information from the model's historical state, together with a Self Distillation (SD) based defense objective that constrains the update process of the proxy model without relying on larger teacher models.
  • results: Significant improvements in robust accuracy (RA) across various datasets, backbones, and attack modalities, with gains of up to 9.2% and 20.5% on the CIFAR10 and CIFAR100 datasets respectively, along with improved training stability and reduced catastrophic overfitting.
    Abstract In light of the vulnerability of deep learning models to adversarial samples and the ensuing security issues, a range of methods aimed at enhancing model robustness against various adversarial attacks, with Adversarial Training (AT) as a prominent representative, have seen rapid development. However, existing methods essentially assist the current state of the target model in defending against parameter-oriented adversarial attacks, incur explicit or implicit computation burdens, and suffer from unstable convergence behavior due to inconsistent optimization trajectories. Diverging from previous work, this paper reconsiders the update rule of the target model and the deficiency of defending based on its current state alone. By introducing the historical state of the target model as a proxy, which is endowed with much prior information for defense, we formulate a two-stage update rule, resulting in a general adversarial defense framework, which we refer to as LAST (Learn from the Past). Besides, we devise a Self Distillation (SD) based defense objective to constrain the update process of the proxy model without the introduction of larger teacher models. Experimentally, we demonstrate consistent and significant performance enhancements by refining a series of single-step and multi-step AT methods (e.g., up to 9.2% and 20.5% improvement of Robust Accuracy (RA) on the CIFAR10 and CIFAR100 datasets, respectively) across various datasets, backbones and attack modalities, and validate its ability to enhance training stability and ameliorate catastrophic overfitting issues.
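
A deliberately loose sketch of the "learn from the past" idea, with the historical proxy approximated by an exponential moving average of past model states and the self-distillation term as a KL pull toward the proxy's predictions; the adversarial batch is stubbed out, and the paper's actual two-stage update rule is more involved than this.

```python
import copy
import torch
import torch.nn.functional as F

# Keep a historical proxy of the target model and add a self-distillation
# term pulling current predictions toward the proxy (illustrative only).
model = torch.nn.Sequential(torch.nn.Linear(10, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))
proxy = copy.deepcopy(model)            # historical state of the target model
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def train_step(x_adv, y, tau=0.99, beta=1.0):
    loss = F.cross_entropy(model(x_adv), y)                  # adversarial training loss
    with torch.no_grad():
        target = F.softmax(proxy(x_adv), dim=1)
    sd = F.kl_div(F.log_softmax(model(x_adv), dim=1), target, reduction="batchmean")
    (loss + beta * sd).backward(); opt.step(); opt.zero_grad()
    with torch.no_grad():                                    # refresh the proxy (EMA)
        for p, q in zip(proxy.parameters(), model.parameters()):
            p.mul_(tau).add_(q, alpha=1 - tau)

x_adv, y = torch.randn(32, 10), torch.randint(0, 2, (32,))  # adversarial batch (stub)
train_step(x_adv, y)
```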

On the Optimization and Generalization of Multi-head Attention

  • paper_url: http://arxiv.org/abs/2310.12680
  • repo_url: None
  • paper_authors: Puneesh Deora, Rouzbeh Ghaderi, Hossein Taheri, Christos Thrampoulidis
  • for: investigate the potential optimization and generalization advantages of using multiple attention heads in Transformer’s core mechanism
  • methods: derive convergence and generalization guarantees for gradient-descent training of a single-layer multi-head self-attention model
  • results: demonstrate that the conditions for realizability hold for a simple tokenized-mixture model, and expect the analysis can be extended to various data-model and architecture variations.
    Abstract The training and generalization dynamics of the Transformer's core mechanism, namely the Attention mechanism, remain under-explored. Besides, existing analyses primarily focus on single-head attention. Inspired by the demonstrated benefits of overparameterization when training fully-connected networks, we investigate the potential optimization and generalization advantages of using multiple attention heads. Towards this goal, we derive convergence and generalization guarantees for gradient-descent training of a single-layer multi-head self-attention model, under a suitable realizability condition on the data. We then establish primitive conditions on the initialization that ensure realizability holds. Finally, we demonstrate that these conditions are satisfied for a simple tokenized-mixture model. We expect the analysis can be extended to various data-model and architecture variations.
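
For reference, a minimal single-layer multi-head self-attention of the kind analyzed above, written out explicitly per head; dimensions and initialization are arbitrary choices for illustration.

```python
import torch

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, heads):
    """Single-layer multi-head self-attention.
    X: (seq, d); the per-head projections are slices of the fused weights."""
    seq, d = X.shape
    dh = d // heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    out = []
    for h in range(heads):
        q, k, v = (M[:, h * dh:(h + 1) * dh] for M in (Q, K, V))
        att = torch.softmax(q @ k.T / dh ** 0.5, dim=-1)   # (seq, seq)
        out.append(att @ v)
    return torch.cat(out, dim=-1) @ Wo

d, heads = 16, 4   # many heads relative to the task: the overparameterized regime
X = torch.randn(8, d)
Ws = [(0.1 * torch.randn(d, d)).requires_grad_(True) for _ in range(4)]
y = multi_head_self_attention(X, *Ws, heads)
```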

Neural networks for insurance pricing with frequency and severity data: a benchmark study from data preprocessing to technical tariff

  • paper_url: http://arxiv.org/abs/2310.12671
  • repo_url: https://github.com/freekholvoet/nnforfreqsevpricing
  • paper_authors: Freek Holvoet, Katrien Antonio, Roel Henckaerts
  • for: Modelling claim frequency and severity data for insurance pricing with machine learning techniques.
  • methods: A benchmark study on four insurance data sets comparing a generalized linear model (GLM) on binned input data, a gradient-boosted tree model, a feed-forward neural network (FFNN), and the combined actuarial neural network (CANN), with autoencoders used to embed categorical variables into the neural networks.
  • results: The CANNs, which combine a GLM or GBM baseline prediction with a neural network correction, achieve strong accuracy on insurance data sets with many input feature types; autoencoder embeddings of categorical variables are beneficial; and global surrogate models translate the insights of the neural networks back into GLMs, yielding a technical tariff table that can easily be deployed in practice.
    Abstract Insurers usually turn to generalized linear models for modelling claim frequency and severity data. Due to their success in other fields, machine learning techniques are gaining popularity within the actuarial toolbox. Our paper contributes to the literature on frequency-severity insurance pricing with machine learning via deep learning structures. We present a benchmark study on four insurance data sets with frequency and severity targets in the presence of multiple types of input features. We compare in detail the performance of: a generalized linear model on binned input data, a gradient-boosted tree model, a feed-forward neural network (FFNN), and the combined actuarial neural network (CANN). Our CANNs combine a baseline prediction established with a GLM and GBM, respectively, with a neural network correction. We explain the data preprocessing steps with specific focus on the multiple types of input features typically present in tabular insurance data sets, such as postal codes, numeric and categorical covariates. Autoencoders are used to embed the categorical variables into the neural network and we explore their potential advantages in a frequency-severity setting. Finally, we construct global surrogate models for the neural nets' frequency and severity models. These surrogates enable the translation of the essential insights captured by the FFNNs or CANNs to GLMs. As such, a technical tariff table results that can easily be deployed in practice.
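
A sketch of the CANN idea for claim frequency under illustrative assumptions (a stand-in GLM linear predictor and a Poisson likelihood): the GLM's log-prediction enters as a fixed offset and a neural network learns a correction on the same scale.

```python
import torch

# CANN idea for claim frequency: keep the GLM's log-prediction as a fixed
# offset and let a neural network learn a correction on the link scale:
#   log mu(x) = log mu_GLM(x) + NN(x) + log(exposure)
nn_correction = torch.nn.Sequential(torch.nn.Linear(5, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(nn_correction.parameters(), lr=1e-3)

def cann_loss(x, glm_log_mu, counts, exposure):
    log_mu = glm_log_mu + nn_correction(x).squeeze(1) + exposure.log()
    return torch.nn.functional.poisson_nll_loss(log_mu, counts, log_input=True)

x = torch.randn(256, 5)                      # tabular covariates
glm_log_mu = -2.0 + 0.3 * x[:, 0]            # stand-in for a fitted GLM's linear predictor
exposure = torch.ones(256)
counts = torch.poisson(torch.exp(glm_log_mu))
loss = cann_loss(x, glm_log_mu, counts, exposure)
loss.backward(); opt.step()
```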

STANLEY: Stochastic Gradient Anisotropic Langevin Dynamics for Learning Energy-Based Models

  • paper_url: http://arxiv.org/abs/2310.12667
  • repo_url: None
  • paper_authors: Belhal Karimi, Jianwen Xie, Ping Li
  • for: Sampling high-dimensional data, via STANLEY, a STochastic gradient ANisotropic LangEvin dYnamics method.
  • methods: An MCMC-based algorithm that updates negative samples using an anisotropic step size and a gradient-informed covariance matrix, embedded into a discretized Langevin diffusion.
  • results: Image generation experiments show that STANLEY samples high-dimensional data effectively and yields higher-quality samples.
    Abstract We propose in this paper STANLEY, a STochastic gradient ANisotropic LangEvin dYnamics, for sampling high dimensional data. With the growing efficacy and potential of Energy-Based modeling, also known as non-normalized probabilistic modeling, for modeling a generative process of different natures of high dimensional data observations, we present an end-to-end learning algorithm for Energy-Based models (EBM) with the purpose of improving the quality of the resulting sampled data points. While the unknown normalizing constant of EBMs makes the training procedure intractable, resorting to Markov Chain Monte Carlo (MCMC) is in general a viable option. Realizing what MCMC entails for the EBM training, we propose a novel high dimensional sampling method, based on an anisotropic stepsize and a gradient-informed covariance matrix, embedded into a discretized Langevin diffusion. We motivate the necessity for an anisotropic update of the negative samples in the Markov Chain by the nonlinearity of the backbone of the EBM, here a Convolutional Neural Network. Our resulting method, namely STANLEY, is an optimization algorithm for training Energy-Based models via our newly introduced MCMC method. We provide a theoretical understanding of our sampling scheme by proving that the sampler leads to a geometrically uniformly ergodic Markov Chain. Several image generation experiments are provided in our paper to show the effectiveness of our method.
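
A sketch of an anisotropic, gradient-informed Langevin update; the diagonal RMSProp-style preconditioner below is an assumption standing in for STANLEY's covariance construction, but it shows how both the drift and the injected noise can be scaled per coordinate.

```python
import torch

def anisotropic_langevin(energy, x, steps=100, eps=0.01, rho=0.9, delta=1e-5):
    """Langevin updates with a diagonal, gradient-informed preconditioner:
    both the drift and the injected noise are scaled per coordinate."""
    v = torch.zeros_like(x)                    # running second moment of gradients
    for _ in range(steps):
        x = x.detach().requires_grad_(True)
        g = torch.autograd.grad(energy(x).sum(), x)[0]
        v = rho * v + (1 - rho) * g ** 2
        A = 1.0 / (v.sqrt() + delta)           # diagonal preconditioner
        x = x - 0.5 * eps * A * g + (eps * A).sqrt() * torch.randn_like(x)
    return x.detach()

energy = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.SiLU(), torch.nn.Linear(64, 1))
negatives = anisotropic_langevin(energy, torch.randn(128, 2))
```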

SecurityNet: Assessing Machine Learning Vulnerabilities on Public Models

  • paper_url: http://arxiv.org/abs/2310.12665
  • repo_url: https://github.com/securitynet-research/securitynet
  • paper_authors: Boyang Zhang, Zheng Li, Ziqing Yang, Xinlei He, Michael Backes, Mario Fritz, Yang Zhang
  • for: Comprehensively characterizing the security and privacy vulnerabilities of machine learning models by evaluating attacks and defenses on models as they occur in practice.
  • methods: Using publicly available models with weights from the Internet (public models); building SecurityNet, a database of 910 annotated image classification models; and analyzing the effectiveness of several representative attacks/defenses, including model stealing attacks, membership inference attacks, and backdoor detection, on these public models.
  • results: The effectiveness of attacks and defenses can vary significantly on public models compared to self-trained models; SecurityNet is shared with the research community, and researchers are encouraged to experiment on public models to better demonstrate the effectiveness of proposed methods.
    Abstract While advanced machine learning (ML) models are deployed in numerous real-world applications, previous works demonstrate these models have security and privacy vulnerabilities. Various empirical research has been done in this field. However, most of the experiments are performed on target ML models trained by the security researchers themselves. Due to the high computational resource requirement for training advanced models with complex architectures, researchers generally choose to train a few target models using relatively simple architectures on typical experiment datasets. We argue that to understand ML models' vulnerabilities comprehensively, experiments should be performed on a large set of models trained with various purposes (not just the purpose of evaluating ML attacks and defenses). To this end, we propose using publicly available models with weights from the Internet (public models) for evaluating attacks and defenses on ML models. We establish a database, namely SecurityNet, containing 910 annotated image classification models. We then analyze the effectiveness of several representative attacks/defenses, including model stealing attacks, membership inference attacks, and backdoor detection on these public models. Our evaluation empirically shows the performance of these attacks/defenses can vary significantly on public models compared to self-trained models. We share SecurityNet with the research community. and advocate researchers to perform experiments on public models to better demonstrate their proposed methods' effectiveness in the future.

Knowledge from Uncertainty in Evidential Deep Learning

  • paper_url: http://arxiv.org/abs/2310.12663
  • repo_url: None
  • paper_authors: Cai Davies, Marc Roig Vilamala, Alun D. Preece, Federico Cerutti, Lance M. Kaplan, Supriyo Chakraborty
  • for: Investigating the uncertainty signal that emerges in Evidential Deep Learning (EDL), a deep learning approach that provides epistemic uncertainty about the current test sample.
  • methods: Studying the evidential signal arising from the Dirichlet strength in EDL, with experiments on computer vision models and bidirectional encoder large language models, and a critical comparison with other Dirichlet-based approaches (EDL-GEN and Prior Networks).
  • results: The evidential signal can in some cases discriminate between classes, particularly with large language models; the KL regularisation term causes EDL to couple aleatoric and epistemic uncertainty, and this coupling is traced to the use (or lack) of out-of-distribution samples during training; the signal itself is shown to be due to misclassification bias.
    Abstract This work reveals an evidential signal that emerges from the uncertainty value in Evidential Deep Learning (EDL). EDL is one example of a class of uncertainty-aware deep learning approaches designed to provide confidence (or epistemic uncertainty) about the current test sample. In particular for computer vision and bidirectional encoder large language models, the 'evidential signal' arising from the Dirichlet strength in EDL can, in some cases, discriminate between classes, which is particularly strong when using large language models. We hypothesise that the KL regularisation term causes EDL to couple aleatoric and epistemic uncertainty. In this paper, we empirically investigate the correlations between misclassification and evaluated uncertainty, and show that EDL's 'evidential signal' is due to misclassification bias. We critically evaluate EDL with other Dirichlet-based approaches, namely Generative Evidential Neural Networks (EDL-GEN) and Prior Networks, and show theoretically and empirically the differences between these loss functions. We conclude that EDL's coupling of uncertainty arises from these differences due to the use (or lack) of out-of-distribution samples during training.
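
For concreteness, a standard way to read off the Dirichlet quantities discussed above from a network's logits; the softplus evidence transform is one common choice rather than the only one.

```python
import torch

def dirichlet_uncertainty(logits):
    """EDL-style uncertainty: non-negative evidence per class gives Dirichlet
    parameters alpha; the Dirichlet strength S = sum(alpha) yields a vacuity
    score u = K / S, the 'evidential signal' discussed above."""
    evidence = torch.nn.functional.softplus(logits)   # one common choice
    alpha = evidence + 1.0                            # Dirichlet parameters
    S = alpha.sum(dim=-1, keepdim=True)               # Dirichlet strength
    probs = alpha / S                                 # expected class probabilities
    u = logits.shape[-1] / S                          # epistemic uncertainty in (0, 1]
    return probs, u.squeeze(-1)

logits = torch.randn(4, 10)                           # e.g. a 10-class head
probs, u = dirichlet_uncertainty(logits)
print(u)   # low strength (little evidence) pushes u toward 1
```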

Gradient Descent Fails to Learn High-frequency Functions and Modular Arithmetic

  • paper_url: http://arxiv.org/abs/2310.12660
  • repo_url: None
  • paper_authors: Rustem Takhanov, Maxat Tezekbayev, Artur Pak, Arman Bolatov, Zhenisbek Assylbekov
  • for: Studying the limitations and challenges of gradient-based learning for high-frequency periodic functions and modular arithmetic.
  • methods: Analyzing the variance of the gradient with respect to a random choice of target function for high-frequency periodic functions and modular multiplication.
  • results: When either the frequency or the prime base $p$ is large, the variance of the gradient is negligibly small, which prevents gradient-based learning algorithms from succeeding.
    Abstract Classes of target functions containing a large number of approximately orthogonal elements are known to be hard to learn by the Statistical Query algorithms. Recently this classical fact re-emerged in a theory of gradient-based optimization of neural networks. In the novel framework, the hardness of a class is usually quantified by the variance of the gradient with respect to a random choice of a target function. A set of functions of the form $x\to ax \bmod p$, where $a$ is taken from ${\mathbb Z}_p$, has attracted some attention from deep learning theorists and cryptographers recently. This class can be understood as a subset of $p$-periodic functions on ${\mathbb Z}$ and is tightly connected with a class of high-frequency periodic functions on the real line. We present a mathematical analysis of limitations and challenges associated with using gradient-based learning techniques to train a high-frequency periodic function or modular multiplication from examples. We highlight that the variance of the gradient is negligibly small in both cases when either a frequency or the prime base $p$ is large. This in turn prevents such a learning algorithm from being successful.
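
A small empirical probe of the vanishing-variance phenomenon, under illustrative modeling assumptions (one-hot inputs and a linear softmax classifier): for targets $f_a(x) = ax \bmod p$, the variance over random $a$ of the loss gradient at a fixed weight can be estimated directly.

```python
import torch

# For targets f_a(x) = a*x mod p, measure the variance (over random a) of
# the loss gradient w.r.t. one weight of a fixed model. Model choice is
# illustrative; the analysis above covers more general settings.
p = 97
torch.manual_seed(0)
model = torch.nn.Linear(p, p)            # one-hot x -> logits over residues
x = torch.eye(p)                          # all inputs at once

def grad_for_target(a):
    y = (a * torch.arange(p)) % p
    loss = torch.nn.functional.cross_entropy(model(x), y)
    g, = torch.autograd.grad(loss, model.weight)
    return g[0, 0]                        # track a single weight coordinate

grads = torch.stack([grad_for_target(a) for a in range(1, p)])
print(grads.var().item())                 # shrinks as p grows, per the analysis
```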

Inverse Renormalization Group of Disordered Systems

  • paper_url: http://arxiv.org/abs/2310.12631
  • repo_url: None
  • paper_authors: Dimitrios Bachtis
  • for: Studying the three-dimensional Edwards-Anderson spin glass model at lattice volumes beyond those accessed by supercomputers or large-scale simulations.
  • methods: Inverse renormalization group transformations combined with machine learning algorithms to construct approximate configurations for rescaled lattices of increasing volume, from $V=8^{3}$ up to $V'=128^{3}$.
  • results: Two critical exponents extracted from the rescaled lattices.
    Abstract We propose inverse renormalization group transformations to construct approximate configurations for lattice volumes that have not yet been accessed by supercomputers or large-scale simulations in the study of spin glasses. Specifically, starting from lattices of volume $V=8^{3}$ in the case of the three-dimensional Edwards-Anderson model we employ machine learning algorithms to construct rescaled lattices up to $V'=128^{3}$, which we utilize to extract two critical exponents. We conclude by discussing how to incorporate numerical exactness within inverse renormalization group approaches of disordered systems, thus opening up the opportunity to explore a sustainable and energy-efficient generation of exact configurations for increasing lattice volumes without the use of dedicated supercomputers.

An Improved Metarounding Algorithm via Frank-Wolfe

  • paper_url: http://arxiv.org/abs/2310.12629
  • repo_url: https://github.com/rmitsuboshi/metarounding_mitsuboshi
  • paper_authors: Ryotaro Mitsuboshi, Kohei Hatano, Eiji Takimoto
  • for: linear optimization over combinatorial classes
  • methods: metarounding algorithm and relax-based approximation algorithm
  • results: much more efficient in both theoretical and practical aspects
    Abstract Metarounding is an approach to convert an approximation algorithm for linear optimization over some combinatorial classes to an online linear optimization algorithm for the same class. We propose a new metarounding algorithm under a natural assumption that a relax-based approximation algorithm exists for the combinatorial class. Our algorithm is much more efficient in both theoretical and practical aspects.

How a student becomes a teacher: learning and forgetting through Spectral methods

  • paper_url: http://arxiv.org/abs/2310.12612
  • repo_url: https://github.com/jamba15/spectral-regularization-teacher-student
  • paper_authors: Lorenzo Giambagli, Lorenzo Buffoni, Lorenzo Chicchi, Duccio Fanelli
  • for: Using the teacher-student paradigm as a metaphor for real-life tuition, particularly relevant when the student network is overparameterized compared to the teacher, since the student's ability to handle the task may be stored in a sub-portion of the whole network.
  • methods: A novel optimization scheme that computes gradients with respect to both eigenvalues and eigenvectors of a spectral representation of the linear transfer of information between layers, with negligible additional computational cost compared to standard training algorithms.
  • results: A stable student substructure can be isolated that mirrors the teacher's complexity in terms of computing neurons, path distribution, and topological attributes; pruning unimportant nodes of the trained student causes no performance degradation above a threshold corresponding to the effective teacher size, a behavior resembling a genuine second-order phase transition with universality traits.
    Abstract In theoretical ML, the teacher-student paradigm is often employed as an effective metaphor for real-life tuition. The above scheme proves particularly relevant when the student network is overparameterized as compared to the teacher network. Under these operating conditions, it is tempting to speculate that the student's ability to handle the given task could eventually be stored in a sub-portion of the whole network. The latter should be to some extent reminiscent of the frozen teacher structure, according to suitable metrics, while being approximately invariant across different architectures of the candidate student network. Unfortunately, state-of-the-art conventional learning techniques cannot help in identifying the existence of such an invariant subnetwork, due to the inherent degree of non-convexity that characterizes the examined problem. In this work, we take a leap forward by proposing a radically different optimization scheme which builds on a spectral representation of the linear transfer of information between layers. The gradient is hence calculated with respect to both eigenvalues and eigenvectors, with negligible increase in terms of computational and complexity load, as compared to standard training algorithms. Working in this framework, we could isolate a stable student substructure that mirrors the true complexity of the teacher in terms of computing neurons, path distribution and topological attributes. When pruning unimportant nodes of the trained student, following a ranking that reflects the optimized eigenvalues, no degradation in the recorded performance is seen above a threshold that corresponds to the effective teacher size. The observed behavior can be pictured as a genuine second-order phase transition that bears universality traits.
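
A loose sketch of training in a spectral parameterization, assuming a single dense layer factored as $W = U\,\mathrm{diag}(\lambda)\,V$ with eigenvalues and eigenvectors exposed as trainable quantities so that nodes can later be ranked and pruned by $|\lambda|$; the paper's spectral construction differs in detail.

```python
import torch

# Factor a layer's weights as W = U diag(lam) V and train lam, U, V directly,
# so modes can later be ranked and pruned by |lam| (illustrative construction).
d = 32
U = torch.nn.Parameter(torch.randn(d, d) / d ** 0.5)
V = torch.nn.Parameter(torch.randn(d, d) / d ** 0.5)
lam = torch.nn.Parameter(torch.ones(d))

def layer(x):
    return x @ (U * lam) @ V              # U * lam scales columns: U diag(lam) V

opt = torch.optim.Adam([U, V, lam], lr=1e-3)
x, y = torch.randn(64, d), torch.randn(64, d)       # teacher-generated pairs (stub)
loss = ((layer(x) - y) ** 2).mean()
loss.backward(); opt.step()

keep = lam.abs().argsort(descending=True)[:16]      # prune to the dominant modes
```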

Causal Similarity-Based Hierarchical Bayesian Models

  • paper_url: http://arxiv.org/abs/2310.12595
  • repo_url: https://github.com/sophiewharrie/causal-similarity-based-hierarchical-bayesian-models
  • paper_authors: Sophie Wharrie, Samuel Kaski
  • for: Addressing the key machine learning challenge of generalisation to new data, for datasets consisting of related tasks that may differ in their causal mechanisms.
  • methods: Causal similarity-based hierarchical Bayesian models that learn how to pool data from training tasks with similar causal mechanisms, applied to Bayesian neural networks and combined with a variety of methods for estimating causal task similarity (for both known and unknown causal models).
  • results: Experiments on simulated and real data demonstrate the benefits of the approach and its applicability to real-world problems.
    Abstract The key challenge underlying machine learning is generalisation to new data. This work studies generalisation for datasets consisting of related tasks that may differ in causal mechanisms. For example, observational medical data for complex diseases suffers from heterogeneity in causal mechanisms of disease across patients, creating challenges for machine learning algorithms that need to generalise to new patients outside of the training dataset. Common approaches for learning supervised models with heterogeneous datasets include learning a global model for the entire dataset, learning local models for each tasks' data, or utilising hierarchical, meta-learning and multi-task learning approaches to learn how to generalise from data pooled across multiple tasks. In this paper we propose causal similarity-based hierarchical Bayesian models to improve generalisation to new tasks by learning how to pool data from training tasks with similar causal mechanisms. We apply this general modelling principle to Bayesian neural networks and compare a variety of methods for estimating causal task similarity (for both known and unknown causal models). We demonstrate the benefits of our approach and applicability to real world problems through a range of experiments on simulated and real data.

Julearn: an easy-to-use library for leakage-free evaluation and inspection of ML models

  • paper_url: http://arxiv.org/abs/2310.12568
  • repo_url: https://github.com/juaml/julearn
  • paper_authors: Sami Hamdan, Shammi More, Leonard Sasse, Vera Komeyer, Kaustubh R. Patil, Federico Raimondo
  • for: Helping researchers without extensive training in machine learning design and evaluate ML pipelines while avoiding common pitfalls.
  • methods: julearn, an open-source Python library providing an easy-to-use environment with built-in guards against some of the most common ML pitfalls, such as improper cross-validation leading to data leakage and overestimated results.
  • results: Three examples of previously published research projects show how easily they can be implemented using the library.
    Abstract The fast-paced development of machine learning (ML) methods coupled with its increasing adoption in research poses challenges for researchers without extensive training in ML. In neuroscience, for example, ML can help understand brain-behavior relationships, diagnose diseases, and develop biomarkers using various data sources like magnetic resonance imaging and electroencephalography. The primary objective of ML is to build models that can make accurate predictions on unseen data. Researchers aim to prove the existence of such generalizable models by evaluating performance using techniques such as cross-validation (CV), which uses systematic subsampling to estimate the generalization performance. Choosing a CV scheme and evaluating an ML pipeline can be challenging and, if used improperly, can lead to overestimated results and incorrect interpretations. We created julearn, an open-source Python library that allows researchers to design and evaluate complex ML pipelines without falling into common pitfalls. In this manuscript, we present the rationale behind julearn's design, its core features, and showcase three examples of previously-published research projects that can be easily implemented using this novel library. Julearn aims to simplify the entry into the ML world by providing an easy-to-use environment with built-in guards against some of the most common ML pitfalls. With its design, unique features, and simple interface, it is a useful Python-based library for research projects.
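
The central pitfall julearn guards against, shown here with plain scikit-learn rather than julearn's own API (not reproduced to avoid guessing its signatures): preprocessing must be fit inside each cross-validation fold, otherwise test-fold statistics leak into training and performance is overestimated.

```python
# Leakage-free evaluation: the scaler lives inside the pipeline, so it is
# refit on each training fold automatically during cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

pipe = make_pipeline(StandardScaler(), SVC())
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
# Anti-pattern to avoid: StandardScaler().fit(X) on the full data before CV,
# which leaks test-fold statistics into the training folds.
```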

Open-World Lifelong Graph Learning

  • paper_url: http://arxiv.org/abs/2310.12565
  • repo_url: https://github.com/bobowner/open-world-lgl
  • paper_authors: Marcel Hoffmann, Lukas Galke, Ansgar Scherp
  • for: Studying lifelong graph learning in an open-world scenario, where a model needs to deal with new tasks and potentially unknown classes.
  • methods: Out-of-Distribution (OOD) detection methods adapted to graph data to recognize new classes, with new class detection performed by combining OOD scores with information aggregated from the graph neighborhood, and a Weakly-supervised Relevance Feedback (Open-WRF) method that decreases the sensitivity to thresholds in OOD detection.
  • results: On six benchmark datasets, the proposed neighborhood aggregation method for OOD scores outperforms existing methods independently of the underlying graph neural network; Open-WRF is more robust to threshold selection; and the aggregation and threshold methods are compatible with arbitrary graph neural networks and OOD detection methods, making the approach applicable to many real-world settings.
    Abstract We study the problem of lifelong graph learning in an open-world scenario, where a model needs to deal with new tasks and potentially unknown classes. We utilize Out-of-Distribution (OOD) detection methods to recognize new classes and adapt existing non-graph OOD detection methods to graph data. Crucially, we suggest performing new class detection by combining OOD detection methods with information aggregated from the graph neighborhood. Most OOD detection methods avoid determining a crisp threshold for deciding whether a vertex is OOD. To tackle this problem, we propose a Weakly-supervised Relevance Feedback (Open-WRF) method, which decreases the sensitivity to thresholds in OOD detection. We evaluate our approach on six benchmark datasets. Our results show that the proposed neighborhood aggregation method for OOD scores outperforms existing methods independent of the underlying graph neural network. Furthermore, we demonstrate that our Open-WRF method is more robust to threshold selection and analyze the influence of graph neighborhood on OOD detection. The aggregation and threshold methods are compatible with arbitrary graph neural networks and OOD detection methods, making our approach versatile and applicable to many real-world applications.
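
The neighborhood-aggregation idea can be made concrete with a toy sketch: blend each vertex's OOD score with the mean score of its neighbors. The blending rule, the `alpha` weight, and the numbers below are illustrative assumptions, not the paper's exact formulation.

```python
# Toy neighborhood-aggregated OOD scoring (illustrative only).
import numpy as np

def aggregate_ood_scores(scores, adj, alpha=0.5):
    """Blend each vertex's own OOD score with its neighbors' mean score.

    scores: (n,) per-vertex OOD scores from any detector.
    adj:    (n, n) binary adjacency matrix.
    alpha:  assumed weight on the vertex's own score.
    """
    deg = adj.sum(axis=1).clip(min=1)        # guard isolated vertices
    neighbor_mean = adj @ scores / deg
    return alpha * scores + (1 - alpha) * neighbor_mean

scores = np.array([0.1, 0.9, 0.2, 0.8])
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 1],
                [0, 1, 1, 0]])
print(aggregate_ood_scores(scores, adj))
```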

Approximate information maximization for bandit games

  • paper_url: http://arxiv.org/abs/2310.12563
  • repo_url: None
  • paper_authors: Alex Barbier-Chebbah, Christian L. Vestergaard, Jean-Baptiste Masson, Etienne Boursier
  • for: Modeling the dynamics of physical systems, such as decision-making in the brain, access to hidden variables, and navigation in random environments.
  • methods: Builds on entropy maximization and free-energy minimization, drawing on the free-energy principle and the information bottleneck principle.
  • results: Proposes a new class of bandit algorithms based on approximate information maximization that performs strongly in classical bandit settings and is proven asymptotically optimal for the two-armed bandit problem with Gaussian rewards.
    Abstract Entropy maximization and free energy minimization are general physical principles for modeling the dynamics of various physical systems. Notable examples include modeling decision-making within the brain using the free-energy principle, optimizing the accuracy-complexity trade-off when accessing hidden variables with the information bottleneck principle (Tishby et al., 2000), and navigation in random environments using information maximization (Vergassola et al., 2007). Built on this principle, we propose a new class of bandit algorithms that maximize an approximation to the information of a key variable within the system. To this end, we develop an approximated analytical physics-based representation of an entropy to forecast the information gain of each action and greedily choose the one with the largest information gain. This method yields strong performances in classical bandit settings. Motivated by its empirical success, we prove its asymptotic optimality for the two-armed bandit problem with Gaussian rewards. Owing to its ability to encompass the system's properties in a global physical functional, this approach can be efficiently adapted to more complex bandit settings, calling for further investigation of information maximization approaches for multi-armed bandit problems.
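
For context, the sketch below implements the classical Thompson-sampling baseline for the two-armed Gaussian bandit the paper analyzes. It is the standard algorithm the proposed information-maximization method competes with, not the paper's own rule; the arm means, horizon, and prior width are made up for the demo.

```python
# Classical Thompson sampling on a two-armed Gaussian bandit (baseline,
# not the paper's approximate information-maximization algorithm).
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.0, 0.5])   # unknown to the learner
n_rounds, noise_sd = 2000, 1.0

counts, sums = np.zeros(2), np.zeros(2)
for _ in range(n_rounds):
    # Posterior over each arm's mean under a flat prior: N(mean, sd^2/n);
    # unpulled arms get a wide (assumed) prior so they are explored first.
    post_mean = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)
    post_sd = np.where(counts > 0,
                       noise_sd / np.sqrt(np.maximum(counts, 1)), 10.0)
    arm = int(np.argmax(rng.normal(post_mean, post_sd)))
    reward = rng.normal(true_means[arm], noise_sd)
    counts[arm] += 1
    sums[arm] += reward

print("pulls per arm:", counts)     # the better arm should dominate
```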

Fast Model Debias with Machine Unlearning

  • paper_url: http://arxiv.org/abs/2310.12560
  • repo_url: None
  • paper_authors: Ruizhe Chen, Jianfei Yang, Huimin Xiong, Jianhong Bai, Tianxiang Hu, Jin Hao, Yang Feng, Joey Tianyi Zhou, Jian Wu, Zuozhu Liu
  • for: This paper addresses bias in deep neural networks, which can lead to unfair outcomes in real-world applications such as healthcare and recruitment.
  • methods: It proposes a fast model debiasing framework (FMD) that identifies, evaluates, and removes biases in trained models. FMD identifies biased attributes through an explicit counterfactual concept, quantifies the influence of data samples with influence functions, and uses a machine-unlearning-based strategy to remove the bias quickly and effectively with a small counterfactual dataset.
  • results: Experiments on the Colored MNIST, CelebA, and Adult Income datasets show accuracy superior or comparable to state-of-the-art methods with significantly less bias and much lower debiasing cost. The method needs only a small external dataset and updates a minimal number of model parameters, without requiring access to the training data.
    Abstract Recent discoveries have revealed that deep neural networks might behave in a biased manner in many real-world scenarios. For instance, deep networks trained on a large-scale face recognition dataset CelebA tend to predict blonde hair for females and black hair for males. Such biases not only jeopardize the robustness of models but also perpetuate and amplify social biases, which is especially concerning for automated decision-making processes in healthcare, recruitment, etc., as they could exacerbate unfair economic and social inequalities among different groups. Existing debiasing methods suffer from high costs in bias labeling or model re-training, while also exhibiting a deficiency in terms of elucidating the origins of biases within the model. To this respect, we propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases inherent in trained models. The FMD identifies biased attributes through an explicit counterfactual concept and quantifies the influence of data samples with influence functions. Moreover, we design a machine unlearning-based strategy to efficiently and effectively remove the bias in a trained model with a small counterfactual dataset. Experiments on the Colored MNIST, CelebA, and Adult Income datasets along with experiments with large language models demonstrate that our method achieves superior or competing accuracies compared with state-of-the-art methods while attaining significantly fewer biases and requiring much less debiasing cost. Notably, our method requires only a small external dataset and updating a minimal amount of model parameters, without the requirement of access to training data that may be too large or unavailable in practice.
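
The counterfactual idea behind FMD's bias identification can be illustrated with a toy probe: flip a protected attribute and measure how far the model's predictions move. The sketch below is loosely inspired by that idea on synthetic data; it is not the paper's FMD implementation, which additionally uses influence functions and machine unlearning.

```python
# Toy counterfactual bias probe (illustrative; not the FMD framework).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 1000
gender = rng.integers(0, 2, n)                  # protected attribute
skill = rng.normal(0, 1, n)                     # legitimate feature
y = (skill + 0.8 * gender + rng.normal(0, 1, n) > 0).astype(int)

X = np.column_stack([skill, gender])
model = LogisticRegression().fit(X, y)

X_cf = X.copy()
X_cf[:, 1] = 1 - X_cf[:, 1]                     # counterfactual flip
gap = np.abs(model.predict_proba(X)[:, 1] - model.predict_proba(X_cf)[:, 1])
print("mean counterfactual prediction gap:", gap.mean())
```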

Neural Likelihood Approximation for Integer Valued Time Series Data

  • paper_url: http://arxiv.org/abs/2310.12544
  • repo_url: None
  • paper_authors: Luke O’Loughlin, John Maclean, Andrew Black
  • for: Stochastic models on integer-valued state spaces capture the dynamics of small systems in the physical and biological sciences, where the individual nature of the populations cannot be ignored and stochastic effects matter.
  • methods: The authors construct a neural likelihood approximation using causal convolutions, which allows the likelihood of an entire time series to be evaluated in parallel.
  • results: Inference on several ecological and epidemiological models shows the true posterior is accurately approximated while achieving significant computational speed-ups in settings where current simulation-based methods struggle.
    Abstract Stochastic processes defined on integer valued state spaces are popular within the physical and biological sciences. These models are necessary for capturing the dynamics of small systems where the individual nature of the populations cannot be ignored and stochastic effects are important. The inference of the parameters of such models, from time series data, is difficult due to intractability of the likelihood; current methods, based on simulations of the underlying model, can be so computationally expensive as to be prohibitive. In this paper we construct a neural likelihood approximation for integer valued time series data using causal convolutions, which allows us to evaluate the likelihood of the whole time series in parallel. We demonstrate our method by performing inference on a number of ecological and epidemiological models, showing that we can accurately approximate the true posterior while achieving significant computational speed ups in situations where current methods struggle.
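
The building block of the method, a causal 1-D convolution, can be written in a few lines of PyTorch by left-padding so that the output at time t depends only on inputs up to t. The layer sizes below are illustrative assumptions, not the paper's architecture.

```python
# Minimal causal 1-D convolution sketch (sizes are illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation     # pad on the left only
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):                           # x: (batch, channels, time)
        return self.conv(F.pad(x, (self.pad, 0)))

layer = CausalConv1d(1, 8, kernel_size=3)
x = torch.randn(4, 1, 50)
print(layer(x).shape)   # (4, 8, 50): same length, strictly causal
```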

Constructing Impactful Machine Learning Research for Astronomy: Best Practices for Researchers and Reviewers

  • paper_url: http://arxiv.org/abs/2310.12528
  • repo_url: None
  • paper_authors: D. Huppenkothen, M. Ntampaka, M. Ho, M. Fouesneau, B. Nord, J. E. G. Peek, M. Walmsley, J. F. Wu, C. Avestruz, T. Buck, M. Brescia, D. P. Finkbeiner, A. D. Goulding, T. Kacprzak, P. Melchior, M. Pasquato, N. Ramachandra, Y. -S. Ting, G. van de Ven, S. Villar, V. A. Villar, E. Zinger
  • for: This work provides the astronomical community with guidance on implementing machine learning models and reporting their results so as to ensure accuracy, reproducibility, and methodological usefulness.
  • methods: It surveys how ML is applied across astronomical problems and distills best practices, challenges, and drawbacks that are currently reported incompletely in the astrophysical literature.
  • results: It offers concrete recommendations for reporting ML results that help authors, reviewers, and editors assess and reproduce findings.
    Abstract Machine learning has rapidly become a tool of choice for the astronomical community. It is being applied across a wide range of wavelengths and problems, from the classification of transients to neural network emulators of cosmological simulations, and is shifting paradigms about how we generate and report scientific results. At the same time, this class of method comes with its own set of best practices, challenges, and drawbacks, which, at present, are often reported on incompletely in the astrophysical literature. With this paper, we aim to provide a primer to the astronomical community, including authors, reviewers, and editors, on how to implement machine learning models and report their results in a way that ensures the accuracy of the results, reproducibility of the findings, and usefulness of the method.

Parallel Bayesian Optimization Using Satisficing Thompson Sampling for Time-Sensitive Black-Box Optimization

  • paper_url: http://arxiv.org/abs/2310.12526
  • repo_url: None
  • paper_authors: Xiaobin Song, Benben Jiang
  • for: This study focuses on time-sensitive black-box optimization problems and proposes a satisficing Thompson sampling-based parallel Bayesian optimization (STS-PBO) approach, in synchronous and asynchronous versions, to solve them.
  • methods: The approach uses rate-distortion theory to construct a loss function that balances the amount of information to be learned against sub-optimality, and adopts the Blahut-Arimoto algorithm to compute, at each step, the target solution that reaches the minimum information rate under the distortion limit.
  • results: The proposed STS-PBO methods outperform both their sequential counterparts and parallel BO with traditional Thompson sampling in synchronous and asynchronous settings, as demonstrated on a fast-charging design problem for Lithium-ion batteries.
    Abstract Bayesian optimization (BO) is widely used for black-box optimization problems, and have been shown to perform well in various real-world tasks. However, most of the existing BO methods aim to learn the optimal solution, which may become infeasible when the parameter space is extremely large or the problem is time-sensitive. In these contexts, switching to a satisficing solution that requires less information can result in better performance. In this work, we focus on time-sensitive black-box optimization problems and propose satisficing Thompson sampling-based parallel Bayesian optimization (STS-PBO) approaches, including synchronous and asynchronous versions. We shift the target from an optimal solution to a satisficing solution that is easier to learn. The rate-distortion theory is introduced to construct a loss function that balances the amount of information that needs to be learned with sub-optimality, and the Blahut-Arimoto algorithm is adopted to compute the target solution that reaches the minimum information rate under the distortion limit at each step. Both discounted and undiscounted Bayesian cumulative regret bounds are theoretically derived for the proposed STS-PBO approaches. The effectiveness of the proposed methods is demonstrated on a fast-charging design problem of Lithium-ion batteries. The results are accordant with theoretical analyses, and show that our STS-PBO methods outperform both sequential counterparts and parallel BO with traditional Thompson sampling in both synchronous and asynchronous settings.
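
The Blahut-Arimoto step can be sketched with the generic textbook iteration for the rate-distortion function; the code below is that classical algorithm on a toy Hamming-distortion example, not the paper's STS-PBO implementation.

```python
# Textbook Blahut-Arimoto iteration for rate-distortion (generic sketch).
import numpy as np

def blahut_arimoto(p_x, dist, beta, n_iter=200):
    """p_x: (n,) source distribution; dist: (n, m) distortion matrix;
    beta: multiplier trading rate against distortion."""
    q_y = np.full(dist.shape[1], 1.0 / dist.shape[1])   # output marginal
    for _ in range(n_iter):
        w = q_y * np.exp(-beta * dist)                  # p(y|x) ∝ q(y)e^{-βd}
        p_y_given_x = w / w.sum(axis=1, keepdims=True)
        q_y = p_x @ p_y_given_x                         # update the marginal
    d = np.sum(p_x[:, None] * p_y_given_x * dist)       # expected distortion
    r = np.sum(p_x[:, None] * p_y_given_x *
               np.log(p_y_given_x / q_y[None, :]))      # rate in nats
    return r, d

p_x = np.array([0.5, 0.5])
dist = 1.0 - np.eye(2)          # Hamming distortion
print(blahut_arimoto(p_x, dist, beta=3.0))
```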

WeaveNet for Approximating Two-sided Matching Problems

  • paper_url: http://arxiv.org/abs/2310.12515
  • repo_url: None
  • paper_authors: Shusaku Sone, Jiaxin Ma, Atsushi Hashimoto, Naoya Chiba, Yoshitaka Ushiku
  • for: This paper targets the optimal assignment of limited resources under various constraints, focusing on the task of matching in bipartite graphs.
  • methods: It proposes WeaveNet, a novel graph neural network (GNN) for bipartite graphs designed to preserve edge-wise information while passing messages densely, avoiding the over-smoothing that hurts matching performance in deeply stacked general-purpose GNNs.
  • results: Despite being a general-purpose model, WeaveNet achieves performance comparable to state-of-the-art algorithms specially designed for fair stable matching, at least for small numbers of agents.
    Abstract Matching, a task to optimally assign limited resources under constraints, is a fundamental technology for society. The task potentially has various objectives, conditions, and constraints; however, the efficient neural network architecture for matching is underexplored. This paper proposes a novel graph neural network (GNN), \textit{WeaveNet}, designed for bipartite graphs. Since a bipartite graph is generally dense, general GNN architectures lose node-wise information by over-smoothing when deeply stacked. Such a phenomenon is undesirable for solving matching problems. WeaveNet avoids it by preserving edge-wise information while passing messages densely to reach a better solution. To evaluate the model, we approximated one of the \textit{strongly NP-hard} problems, \textit{fair stable matching}. Despite its inherent difficulties and the network's general purpose design, our model reached a comparative performance with state-of-the-art algorithms specially designed for stable matching for small numbers of agents.
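
For reference, the classical deferred-acceptance (Gale-Shapley) algorithm solves the stable matching family that WeaveNet learns to approximate. The sketch below is the standard algorithm with a tiny made-up preference instance, not the neural model or its fairness variant.

```python
# Classical Gale-Shapley deferred acceptance (baseline, not WeaveNet).
def gale_shapley(proposer_prefs, receiver_prefs):
    """prefs: dict member -> list of other-side members, best first."""
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    free = list(proposer_prefs)
    next_choice = {p: 0 for p in proposer_prefs}
    engaged = {}                                  # receiver -> proposer
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]     # propose to next choice
        next_choice[p] += 1
        if r not in engaged:
            engaged[r] = p
        elif rank[r][p] < rank[r][engaged[r]]:    # r prefers the newcomer
            free.append(engaged[r])
            engaged[r] = p
        else:
            free.append(p)                        # rejected; try again later
    return {p: r for r, p in engaged.items()}

men = {"a": ["x", "y"], "b": ["y", "x"]}
women = {"x": ["b", "a"], "y": ["a", "b"]}
print(gale_shapley(men, women))                   # {'b': 'y', 'a': 'x'}
```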

American Option Pricing using Self-Attention GRU and Shapley Value Interpretation

  • paper_url: http://arxiv.org/abs/2310.12500
  • repo_url: None
  • paper_authors: Yanhui Shen
  • for: The paper is aimed at investors and financial analysts who want to use machine learning methods to predict the prices of SPY (ETF) options.
  • methods: It proposes a gated recurrent unit (GRU) with a self-attention mechanism to forecast SPY option prices, comparing the model against the traditional binomial model and other machine learning models.
  • results: The self-attention GRU with historical data outperforms the other models, and the SHAP method is used to interpret the significance and contributions of different input features to option pricing.
    Abstract Options, serving as a crucial financial instrument, are used by investors to manage and mitigate their investment risks within the securities market. Precisely predicting the present price of an option enables investors to make informed and efficient decisions. In this paper, we propose a machine learning method for forecasting the prices of SPY (ETF) option based on gated recurrent unit (GRU) and self-attention mechanism. We first partitioned the raw dataset into 15 subsets according to moneyness and days to maturity criteria. For each subset, we matched the corresponding U.S. government bond rates and Implied Volatility Indices. This segmentation allows for a more insightful exploration of the impacts of risk-free rates and underlying volatility on option pricing. Next, we built four different machine learning models, including multilayer perceptron (MLP), long short-term memory (LSTM), self-attention LSTM, and self-attention GRU in comparison to the traditional binomial model. The empirical result shows that self-attention GRU with historical data outperforms other models due to its ability to capture complex temporal dependencies and leverage the contextual information embedded in the historical data. Finally, in order to unveil the "black box" of artificial intelligence, we employed the SHapley Additive exPlanations (SHAP) method to interpret and analyze the prediction results of the self-attention GRU model with historical data. This provides insights into the significance and contributions of different input features on the pricing of American-style options.
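
A minimal PyTorch skeleton of the described architecture, a GRU over historical option features followed by self-attention over its outputs, might look as follows; the hidden size, head count, and feature dimension are illustrative assumptions, not the paper's configuration.

```python
# Sketch of a self-attention GRU price regressor (dimensions assumed).
import torch
import torch.nn as nn

class SelfAttentionGRU(nn.Module):
    def __init__(self, n_features, hidden=64, heads=4):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.head = nn.Linear(hidden, 1)            # predicted option price

    def forward(self, x):                           # x: (batch, time, features)
        h, _ = self.gru(x)
        a, _ = self.attn(h, h, h)                   # self-attention over time
        return self.head(a[:, -1])                  # read out the last step

model = SelfAttentionGRU(n_features=5)
x = torch.randn(8, 30, 5)                           # 30 days of 5 features
print(model(x).shape)                               # torch.Size([8, 1])
```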

Quasi Manhattan Wasserstein Distance

  • paper_url: http://arxiv.org/abs/2310.12498
  • repo_url: https://github.com/evlim/qmwd
  • paper_authors: Evan Unit Lim
  • for: This paper introduces the Quasi Manhattan Wasserstein Distance (QMWD), a metric that quantifies the dissimilarity between two matrices by combining elements of the Wasserstein distance with specific transformations.
  • methods: By combining those transformations with Wasserstein-style computation, QMWD achieves better time and space complexity than the Manhattan Wasserstein Distance (MWD) while maintaining accuracy.
  • results: The paper details QMWD's computation and complexity analysis and compares it with WD and MWD, showing its advantages for large datasets or settings with limited computational resources.
    Abstract The Quasi Manhattan Wasserstein Distance (QMWD) is a metric designed to quantify the dissimilarity between two matrices by combining elements of the Wasserstein Distance with specific transformations. It offers improved time and space complexity compared to the Manhattan Wasserstein Distance (MWD) while maintaining accuracy. QMWD is particularly advantageous for large datasets or situations with limited computational resources. This article provides a detailed explanation of QMWD, its computation, complexity analysis, and comparisons with WD and MWD.
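
As background, in one dimension the Wasserstein-1 distance between equal-size empirical distributions reduces to the mean absolute difference of sorted samples. The sketch below computes that classical quantity only; QMWD itself is defined by the transformations described in the paper and its repository.

```python
# Classical 1-D Wasserstein-1 between equal-size empirical samples
# (background only; not QMWD).
import numpy as np

def wasserstein1_1d(a, b):
    a, b = np.sort(a), np.sort(b)      # optimal 1-D coupling is sorted order
    return np.abs(a - b).mean()

x = np.array([0.0, 1.0, 2.0])
y = np.array([0.5, 1.5, 2.5])
print(wasserstein1_1d(x, y))           # 0.5
```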

SDGym: Low-Code Reinforcement Learning Environments using System Dynamics Models

  • paper_url: http://arxiv.org/abs/2310.12494
  • repo_url: https://github.com/google-research/google-research
  • paper_authors: Emmanuel Klu, Sameer Sethi, DJ Passey, Donald Martin Jr
  • for: This work supports responsible AI by studying the long-term impact of algorithmic interventions on society.
  • methods: It combines reinforcement learning (RL) with system dynamics (SD): RL can optimize decisions in dynamic settings, but realistic environment design is hard, so the authors generate custom RL environments from SD simulation models via SDGym, a low-code library built on the OpenAI Gym framework.
  • results: A feasibility study shows that well-specified, rich RL environments can be generated from pre-existing SD models with a few lines of configuration code. Using an SD model of electric vehicle adoption, the authors compare the PySD and BPTK-Py simulators and train a D4PG agent, underscoring the dual potential of SD to improve RL environment design and of RL to improve dynamic policy discovery within SD models.
    Abstract Understanding the long-term impact of algorithmic interventions on society is vital to achieving responsible AI. Traditional evaluation strategies often fall short due to the complex, adaptive and dynamic nature of society. While reinforcement learning (RL) can be a powerful approach for optimizing decisions in dynamic settings, the difficulty of realistic environment design remains a barrier to building robust agents that perform well in practical settings. To address this issue we tap into the field of system dynamics (SD) as a complementary method that incorporates collaborative simulation model specification practices. We introduce SDGym, a low-code library built on the OpenAI Gym framework which enables the generation of custom RL environments based on SD simulation models. Through a feasibility study we validate that well specified, rich RL environments can be generated from preexisting SD models and a few lines of configuration code. We demonstrate the capabilities of the SDGym environment using an SD model of the electric vehicle adoption problem. We compare two SD simulators, PySD and BPTK-Py for parity, and train a D4PG agent using the Acme framework to showcase learning and environment interaction. Our preliminary findings underscore the dual potential of SD to improve RL environment design and for RL to improve dynamic policy discovery within SD models. By open-sourcing SDGym, the intent is to galvanize further research and promote adoption across the SD and RL communities, thereby catalyzing collaboration in this emerging interdisciplinary space.
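
A generic Gym environment of the kind SDGym generates from a system-dynamics model can be sketched as a stock-and-flow update wrapped in the classic `gym.Env` interface. The flow equation, reward, and dimensions below are toy assumptions, and the classic pre-0.26 Gym API is assumed; SDGym's actual config-driven API lives in the linked repository.

```python
# Toy stock-and-flow RL environment (classic pre-0.26 Gym API; not SDGym).
import gym
import numpy as np
from gym import spaces

class SDEnv(gym.Env):
    """Wraps a one-step stock update x' = x + dt * f(x, a) as an RL env."""

    def __init__(self, dt=0.1, horizon=100):
        super().__init__()
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(1,))
        self.action_space = spaces.Box(-1.0, 1.0, shape=(1,))
        self.dt, self.horizon = dt, horizon

    def reset(self):
        self.x, self.t = np.zeros(1), 0
        return self.x.copy()

    def step(self, action):
        self.x += self.dt * (action - 0.1 * self.x)   # assumed flow equation
        self.t += 1
        reward = -float(np.abs(self.x - 1.0))          # track a target stock
        return self.x.copy(), reward, self.t >= self.horizon, {}

env = SDEnv()
obs = env.reset()
obs, r, done, info = env.step(env.action_space.sample())
```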

Improved Operator Learning by Orthogonal Attention

  • paper_url: http://arxiv.org/abs/2310.12487
  • repo_url: https://github.com/zhijie-group/orthogonal-neural-operator
  • paper_authors: Zipeng Xiao, Zhongkai Hao, Bokai Lin, Zhijie Deng, Hang Su
  • for: Learning the solution operators of partial differential equations (PDEs).
  • methods: The paper uses attention-based neural operators, which have become a mainstream approach in scientific machine learning but tend to overfit limited training data due to the large number of parameters in the attention mechanism. To address this, it develops an orthogonal attention based on the eigendecomposition of the kernel integral operator and the neural approximation of eigenfunctions, whose orthogonalization acts as a natural regularizer.
  • results: On six standard neural operator benchmark datasets covering both regular and irregular geometries, the method outperforms competing baselines by a decent margin.
    Abstract Neural operators, as an efficient surrogate model for learning the solutions of PDEs, have received extensive attention in the field of scientific machine learning. Among them, attention-based neural operators have become one of the mainstreams in related research. However, existing approaches overfit the limited training data due to the considerable number of parameters in the attention mechanism. To address this, we develop an orthogonal attention based on the eigendecomposition of the kernel integral operator and the neural approximation of eigenfunctions. The orthogonalization naturally poses a proper regularization effect on the resulting neural operator, which aids in resisting overfitting and boosting generalization. Experiments on six standard neural operator benchmark datasets comprising both regular and irregular geometries show that our method can outperform competing baselines with decent margins.

Balanced Group Convolution: An Improved Group Convolution Based on Approximability Estimates

  • paper_url: http://arxiv.org/abs/2310.12461
  • repo_url: None
  • paper_authors: Youngkyu Lee, Jongho Park, Chang-Ock Lee
  • for: Improving neural network performance while containing the computational cost of wide convolutional layers.
  • methods: Uses group convolution to reduce computational cost, and provides a mathematical analysis of how well group convolution approximates standard convolution as a function of the number of groups.
  • results: Proposes balanced group convolution, a novel variant with a higher-quality approximation at a small additional computational cost, and validates the theory experimentally against other group convolution variants.
    Abstract The performance of neural networks has been significantly improved by increasing the number of channels in convolutional layers. However, this increase in performance comes with a higher computational cost, resulting in numerous studies focused on reducing it. One promising approach to address this issue is group convolution, which effectively reduces the computational cost by grouping channels. However, to the best of our knowledge, there has been no theoretical analysis on how well the group convolution approximates the standard convolution. In this paper, we mathematically analyze the approximation of the group convolution to the standard convolution with respect to the number of groups. Furthermore, we propose a novel variant of the group convolution called balanced group convolution, which shows a higher approximation with a small additional computational cost. We provide experimental results that validate our theoretical findings and demonstrate the superior performance of the balanced group convolution over other variants of group convolution.
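
The cost saving from grouping is easy to verify in PyTorch: with g groups, a convolution's weight count drops by roughly a factor of g. The sketch below uses standard grouped convolution only, not the paper's balanced variant.

```python
# Parameter count of standard vs. grouped convolution in PyTorch.
import torch.nn as nn

standard = nn.Conv2d(64, 64, kernel_size=3, padding=1)
grouped = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=8)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), count(grouped))   # 36928 vs 4672
```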

MuseGNN: Interpretable and Convergent Graph Neural Network Layers at Scale

  • paper_url: http://arxiv.org/abs/2310.12457
  • repo_url: None
  • paper_authors: Haitian Jiang, Renjie Liu, Xiao Yan, Zhenkun Cai, Minjie Wang, David Wipf
  • for: This paper proposes a scalable graph neural network (GNN) architecture for very large graphs.
  • methods: It designs GNN layers whose forward pass iteratively reduces a sampling-based, graph-regularized energy function, with convergence guarantees in certain settings; node embeddings thus double as energy minimizers with desirable inductive biases and interpretability.
  • results: A full GNN architecture built from these layers achieves competitive accuracy and scalability on the largest publicly available node classification benchmark, exceeding 1TB in size.
    Abstract Among the many variants of graph neural network (GNN) architectures capable of modeling data with cross-instance relations, an important subclass involves layers designed such that the forward pass iteratively reduces a graph-regularized energy function of interest. In this way, node embeddings produced at the output layer dually serve as both predictive features for solving downstream tasks (e.g., node classification) and energy function minimizers that inherit desirable inductive biases and interpretability. However, scaling GNN architectures constructed in this way remains challenging, in part because the convergence of the forward pass may involve models with considerable depth. To tackle this limitation, we propose a sampling-based energy function and scalable GNN layers that iteratively reduce it, guided by convergence guarantees in certain settings. We also instantiate a full GNN architecture based on these designs, and the model achieves competitive accuracy and scalability when applied to the largest publicly-available node classification benchmark exceeding 1TB in size.
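
The general "layers as energy descent" pattern behind such architectures can be sketched by taking gradient steps on a graph-regularized energy E(Y) = ||Y − X||² + λ·tr(YᵀLY). The toy NumPy code below shows that unfolding idea only, not MuseGNN's sampling-based energy or its scalable layers.

```python
# Toy "forward pass as energy descent" sketch (not MuseGNN itself).
import numpy as np

def energy_descent_layers(X, A, lam=1.0, step=0.1, n_layers=20):
    L = np.diag(A.sum(axis=1)) - A     # combinatorial graph Laplacian
    Y = X.copy()
    for _ in range(n_layers):
        grad = (Y - X) + lam * (L @ Y) # energy gradient (up to a factor of 2)
        Y -= step * grad               # one "layer" = one descent step
    return Y

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = np.array([[1.0], [0.0], [-1.0]])
print(energy_descent_layers(X, A))     # embeddings smoothed over edges
```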

Constrained Reweighting of Distributions: an Optimal Transport Approach

  • paper_url: http://arxiv.org/abs/2310.12447
  • repo_url: None
  • paper_authors: Abhisek Chakraborty, Anirban Bhattacharya, Debdeep Pati
  • for: This paper proposes a method for optimally adjusting the weights of an empirical distribution subject to predefined constraints on the weights.
  • methods: It imposes nonparametrically imbued distributional constraints on the weights, leveraging the maximum entropy principle and tools from optimal transport so that the reweighted empirical distribution stays close to a pre-specified distribution in the optimal transport metric while allowing subtle departures.
  • results: The framework's versatility is demonstrated in three disparate applications: portfolio allocation, semi-parametric inference for complex surveys, and ensuring algorithmic fairness in machine learning.
    Abstract We commonly encounter the problem of identifying an optimally weight adjusted version of the empirical distribution of observed data, adhering to predefined constraints on the weights. Such constraints often manifest as restrictions on the moments, tail behaviour, shapes, number of modes, etc., of the resulting weight adjusted empirical distribution. In this article, we substantially enhance the flexibility of such methodology by introducing a nonparametrically imbued distributional constraints on the weights, and developing a general framework leveraging the maximum entropy principle and tools from optimal transport. The key idea is to ensure that the maximum entropy weight adjusted empirical distribution of the observed data is close to a pre-specified probability distribution in terms of the optimal transport metric while allowing for subtle departures. The versatility of the framework is demonstrated in the context of three disparate applications where data re-weighting is warranted to satisfy side constraints on the optimization problem at the heart of the statistical task: namely, portfolio allocation, semi-parametric inference for complex surveys, and ensuring algorithmic fairness in machine learning algorithms.
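
A classical special case of constrained reweighting is maximum-entropy weights under a single moment constraint, whose closed-form solution is an exponential tilt. The sketch below solves that case with SciPy root finding; it is far simpler than the paper's optimal-transport framework and is included only to make the reweighting idea concrete.

```python
# Maximum-entropy reweighting under one moment constraint (classical case).
import numpy as np
from scipy.optimize import brentq

def max_entropy_weights(x, target_mean):
    """Weights closest to uniform in KL with sum_i w_i * x_i = target_mean;
    the solution is w_i ∝ exp(theta * x_i) for a scalar theta."""
    def gap(theta):
        w = np.exp(theta * x)
        w /= w.sum()
        return w @ x - target_mean
    theta = brentq(gap, -50, 50)        # assumes the target is attainable
    w = np.exp(theta * x)
    return w / w.sum()

x = np.random.default_rng(0).normal(size=500)
w = max_entropy_weights(x, target_mean=0.3)
print(w @ x)                             # ~0.3
```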

CAT: Closed-loop Adversarial Training for Safe End-to-End Driving

  • paper_url: http://arxiv.org/abs/2310.12432
  • repo_url: None
  • paper_authors: Linrui Zhang, Zhenghao Peng, Quanyi Li, Bolei Zhou
  • for: This work aims to improve the driving safety of autonomous vehicles.
  • methods: It uses a Closed-loop Adversarial Training (CAT) framework that continuously improves a driving agent's safety through environment augmentation with dynamically generated safety-critical scenarios, turning log-replay real-world scenarios into safety-critical ones via probabilistic factorization.
  • results: Experiments show that CAT generates effective adversarial scenarios at lower computational cost than existing methods, and the trained agent achieves superior driving safety in both log-replay and safety-critical traffic scenarios.
    Abstract Driving safety is a top priority for autonomous vehicles. Orthogonal to prior work handling accident-prone traffic events by algorithm designs at the policy level, we investigate a Closed-loop Adversarial Training (CAT) framework for safe end-to-end driving in this paper through the lens of environment augmentation. CAT aims to continuously improve the safety of driving agents by training the agent on safety-critical scenarios that are dynamically generated over time. A novel resampling technique is developed to turn log-replay real-world driving scenarios into safety-critical ones via probabilistic factorization, where the adversarial traffic generation is modeled as the multiplication of standard motion prediction sub-problems. Consequently, CAT can launch more efficient physical attacks compared to existing safety-critical scenario generation methods and yields a significantly less computational cost in the iterative learning pipeline. We incorporate CAT into the MetaDrive simulator and validate our approach on hundreds of driving scenarios imported from real-world driving datasets. Experimental results demonstrate that CAT can effectively generate adversarial scenarios countering the agent being trained. After training, the agent can achieve superior driving safety in both log-replay and safety-critical traffic scenarios on the held-out test set. Code and data are available at https://metadriverse.github.io/cat.

Detecting and Mitigating Algorithmic Bias in Binary Classification using Causal Modeling

  • paper_url: http://arxiv.org/abs/2310.12421
  • repo_url: None
  • paper_authors: Wendy Hui, Wai Kwong Lau
  • for: This paper proposes using causal modeling to detect and mitigate algorithmic bias.
  • methods: Using the Adult dataset, available from the UC Irvine Machine Learning Repository, the authors build (1) a prediction model treated as a black box and (2) a causal model for bias mitigation, focusing on gender bias in binary classification.
  • results: Gender bias in the prediction model is statistically significant at the 0.05 level; cross-validation shows the causal model effectively mitigates it, while overall classification accuracy improves slightly.
    Abstract This paper proposes the use of causal modeling to detect and mitigate algorithmic bias. We provide a brief description of causal modeling and a general overview of our approach. We then use the Adult dataset, which is available for download from the UC Irvine Machine Learning Repository, to develop (1) a prediction model, which is treated as a black box, and (2) a causal model for bias mitigation. In this paper, we focus on gender bias and the problem of binary classification. We show that gender bias in the prediction model is statistically significant at the 0.05 level. We demonstrate the effectiveness of the causal model in mitigating gender bias by cross-validation. Furthermore, we show that the overall classification accuracy is improved slightly. Our novel approach is intuitive, easy-to-use, and can be implemented using existing statistical software tools such as "lavaan" in R. Hence, it enhances explainability and promotes trust.
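
The bias-detection step can be illustrated with a minimal significance test on a protected attribute's coefficient in a logistic regression, here with statsmodels on synthetic data. The variable names and data-generating process are assumptions for the example, not the paper's Adult-dataset analysis.

```python
# Minimal bias-detection sketch: test the protected attribute's coefficient.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 2000
gender = rng.integers(0, 2, n)               # protected attribute
education = rng.normal(0, 1, n)              # legitimate feature
y = (0.5 * education + 0.4 * gender + rng.normal(0, 1, n) > 0).astype(int)

X = sm.add_constant(np.column_stack([education, gender]))
result = sm.Logit(y, X).fit(disp=0)
print(result.pvalues)    # a small p-value on the gender column flags bias
```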

Cooperative Minibatching in Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.12403
  • repo_url: https://github.com/gt-tdalab/dgl-coop
  • paper_authors: Muhammed Fatih Balin, Dominique LaSalle, Ümit V. Çatalyürek
  • for: Reducing the computational resources required to train Graph Neural Networks (GNNs) at large scale.
  • methods: The authors propose Cooperative Minibatching, in which Processing Elements (PEs) connected by a fast interconnect process one large minibatch together rather than independent smaller ones. Because the size of the sampled subgraph is a concave function of the batch size, this reduces duplicated computation and data access and mitigates the Neighborhood Explosion Phenomenon (NEP); the same idea is exploited in serial execution by generating dependent consecutive minibatches.
  • results: Up to 4x bandwidth savings for fetching vertex embeddings and up to 64% speedup over Independent Minibatching on single-node multi-GPU systems.
    Abstract Significant computational resources are required to train Graph Neural Networks (GNNs) at a large scale, and the process is highly data-intensive. One of the most effective ways to reduce resource requirements is minibatch training coupled with graph sampling. GNNs have the unique property that items in a minibatch have overlapping data. However, the commonly implemented Independent Minibatching approach assigns each Processing Element (PE) its own minibatch to process, leading to duplicated computations and input data access across PEs. This amplifies the Neighborhood Explosion Phenomenon (NEP), which is the main bottleneck limiting scaling. To reduce the effects of NEP in the multi-PE setting, we propose a new approach called Cooperative Minibatching. Our approach capitalizes on the fact that the size of the sampled subgraph is a concave function of the batch size, leading to significant reductions in the amount of work per seed vertex as batch sizes increase. Hence, it is favorable for processors equipped with a fast interconnect to work on a large minibatch together as a single larger processor, instead of working on separate smaller minibatches, even though global batch size is identical. We also show how to take advantage of the same phenomenon in serial execution by generating dependent consecutive minibatches. Our experimental evaluations show up to 4x bandwidth savings for fetching vertex embeddings, by simply increasing this dependency without harming model convergence. Combining our proposed approaches, we achieve up to 64% speedup over Independent Minibatching on single-node multi-GPU systems.

Closed-Form Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.12395
  • repo_url: https://github.com/kdas0501/Mixing_solution_CFA
  • paper_authors: Christopher Scarvelis, Haitz Sáez de Ocáriz Borde, Justin Solomon
  • for: Generating novel samples from a score-based generative model without any training.
  • methods: The authors explicitly smooth the closed-form score of the perturbed empirical distribution and propose an efficient nearest-neighbor-based estimator of the resulting score function.
  • results: The method achieves sampling times competitive with neural score-based generative models while running on consumer-grade CPUs.
    Abstract Score-based generative models (SGMs) sample from a target distribution by iteratively transforming noise using the score function of the perturbed target. For any finite training set, this score function can be evaluated in closed form, but the resulting SGM memorizes its training data and does not generate novel samples. In practice, one approximates the score by training a neural network via score-matching. The error in this approximation promotes generalization, but neural SGMs are costly to train and sample, and the effective regularization this error provides is not well-understood theoretically. In this work, we instead explicitly smooth the closed-form score to obtain an SGM that generates novel samples without training. We analyze our model and propose an efficient nearest-neighbor-based estimator of its score function. Using this estimator, our method achieves sampling times competitive with neural SGMs while running on consumer-grade CPUs.
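
The "closed-form score" the paper starts from is the score of the Gaussian-smoothed empirical distribution, a softmax-weighted pull toward the data points. The sketch below evaluates that score directly; the paper's contribution is smoothing and estimating it (via nearest neighbors) so that sampling does not simply memorize the training points.

```python
# Closed-form score of the sigma-smoothed empirical distribution
# p_sigma(x) = mean_i N(x; x_i, sigma^2 I).
import numpy as np

def closed_form_score(x, data, sigma):
    diffs = data - x                                  # (n, d)
    logits = -np.sum(diffs**2, axis=1) / (2 * sigma**2)
    w = np.exp(logits - logits.max())
    w /= w.sum()                                      # softmax responsibilities
    return (w[:, None] * diffs).sum(axis=0) / sigma**2

data = np.array([[0.0, 0.0], [1.0, 1.0]])
print(closed_form_score(np.array([0.2, 0.2]), data, sigma=0.5))
```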