cs.LG - 2023-09-04

Delegating Data Collection in Decentralized Machine Learning

  • paper_url: http://arxiv.org/abs/2309.01837
  • repo_url: None
  • paper_authors: Nivasini Ananthakrishnan, Stephen Bates, Michael I. Jordan, Nika Haghtalab
  • for: Studies how to optimally delegate data collection, focusing on two fundamental machine learning challenges: uncertainty in assessing model quality and lack of knowledge about the best achievable model performance.
  • methods: Starting from contract theory, designs optimal and near-optimal contracts that address both challenges; simple linear contracts achieve a 1-1/e fraction of the first-best utility even when the principal holds only a small test set, and sufficient conditions on the test-set size are given for a vanishing additive approximation to the optimal utility.
  • results: Shows that both challenges, uncertain assessment of model quality and unknown optimal performance, can be handled via simple linear contracts and an efficiently computable convex program, enabling efficient delegation of data collection.
    Abstract Motivated by the emergence of decentralized machine learning ecosystems, we study the delegation of data collection. Taking the field of contract theory as our starting point, we design optimal and near-optimal contracts that deal with two fundamental machine learning challenges: lack of certainty in the assessment of model quality and lack of knowledge regarding the optimal performance of any model. We show that lack of certainty can be dealt with via simple linear contracts that achieve 1-1/e fraction of the first-best utility, even if the principal has a small test set. Furthermore, we give sufficient conditions on the size of the principal's test set that achieves a vanishing additive approximation to the optimal utility. To address the lack of a priori knowledge regarding the optimal performance, we give a convex program that can adaptively and efficiently compute the optimal contract.
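
A toy numerical sketch of the linear-contract setup described above: the agent chooses a data-collection effort, the principal measures model quality on a small test set, and the payment is linear in the measured quality. The effort levels, costs, and quality model are illustrative assumptions, not the paper's construction.

```python
# Hypothetical toy illustration of a linear contract for delegated data collection.
# The effort levels, costs, and quality model are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

def measured_quality(true_quality, test_set_size=200):
    """Noisy estimate of model quality from a small held-out test set."""
    return rng.binomial(test_set_size, true_quality) / test_set_size

def agent_best_response(alpha, costs, qualities):
    """The agent picks the effort level maximizing expected payment minus cost."""
    return int(np.argmax([alpha * q - c for q, c in zip(qualities, costs)]))

costs     = [0.00, 0.05, 0.15, 0.30]   # cost of each data-collection effort level
qualities = [0.60, 0.75, 0.85, 0.90]   # resulting (true) model quality

alpha = 0.5   # linear contract: pay alpha per unit of measured quality
i = agent_best_response(alpha, costs, qualities)
q_hat = measured_quality(qualities[i])
print(f"effort level {i}, measured quality {q_hat:.3f}, "
      f"principal utility {qualities[i] - alpha * q_hat:.3f}")
```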

Soft-Dropout: A Practical Approach for Mitigating Overfitting in Quantum Convolutional Neural Networks

  • paper_url: http://arxiv.org/abs/2309.01829
  • repo_url: None
  • paper_authors: Aakash Ravindra Shinde, Charu Jain, Amir Kalev
  • for: Studies the problem of overfitting in quantum convolutional neural networks (QCNNs).
  • methods: Adapts a classical overfitting mitigation technique, the (post-training) dropout method, to the quantum setting.
  • results: Finds that a straightforward implementation of dropout in the quantum setting substantially reduces the QCNN's success probability, exposing the role of entanglement; a softer version of dropout is proposed that successfully handles overfitting in the test cases.
    Abstract Quantum convolutional neural network (QCNN), an early application for quantum computers in the NISQ era, has been consistently proven successful as a machine learning (ML) algorithm for several tasks with significant accuracy. Derived from its classical counterpart, QCNN is prone to overfitting. Overfitting is a typical shortcoming of ML models that are trained too closely to the availed training dataset and perform relatively poorly on unseen datasets for a similar problem. In this work we study the adaptation of one of the most successful overfitting mitigation methods, known as the (post-training) dropout method, to the quantum setting. We find that a straightforward implementation of this method in the quantum setting leads to a significant and undesirable consequence: a substantial decrease in success probability of the QCNN. We argue that this effect exposes the crucial role of entanglement in QCNNs and the vulnerability of QCNNs to entanglement loss. To handle overfitting, we propose a softer version of the dropout method. We find that the proposed method allows us to successfully handle overfitting in the test cases.
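
A minimal classical sketch of the "softer" post-training dropout idea from the abstract: selected parameters are attenuated rather than zeroed out. This is a plain numpy illustration; the paper's quantum implementation operates on QCNN circuits, and the selection rule and scaling factor here are assumptions.

```python
# Minimal classical sketch of "soft" post-training dropout: instead of zeroing a
# random subset of parameters (hard dropout), selected parameters are only attenuated.
# The scaling factor and selection rule are assumptions for illustration.
import numpy as np

def hard_dropout(params, drop_rate, rng):
    mask = rng.random(params.shape) >= drop_rate
    return params * mask                      # dropped parameters set to 0

def soft_dropout(params, drop_rate, rng, softness=0.5):
    mask = rng.random(params.shape) >= drop_rate
    # dropped parameters are scaled down by `softness` rather than removed
    return np.where(mask, params, softness * params)

rng = np.random.default_rng(42)
theta = rng.normal(size=8)                    # trained (e.g., circuit) parameters
print(hard_dropout(theta, 0.25, rng))
print(soft_dropout(theta, 0.25, rng))
```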

Secure and Efficient Federated Learning in LEO Constellations using Decentralized Key Generation and On-Orbit Model Aggregation

  • paper_url: http://arxiv.org/abs/2309.01828
  • repo_url: None
  • paper_authors: Mohamed Elmahallawy, Tie Luo, Mohamed I. Ibrahem
  • for: Addresses the data download and federated learning challenges of small satellites operating in low Earth orbit (LEO) constellations.
  • methods: Proposes FedSecure, a secure federated learning approach with two novel components: (1) decentralized key generation that protects satellite data privacy using a functional encryption scheme, and (2) on-orbit model forwarding and aggregation that builds a partial global model per orbit, minimizing the idle waiting time until satellites enter the visible zone of the ground station.
  • results: Analysis and results show that FedSecure protects each satellite's data from disclosure to eavesdroppers, a curious server, or curious satellites, while incurring low communication and computation overhead and achieving high federated learning accuracy of up to 85.35%.
    Abstract Satellite technologies have advanced drastically in recent years, leading to a heated interest in launching small satellites into low Earth orbit (LEOs) to collect massive data such as satellite imagery. Downloading these data to a ground station (GS) to perform centralized learning to build an AI model is not practical due to the limited and expensive bandwidth. Federated learning (FL) offers a potential solution but will incur a very large convergence delay due to the highly sporadic and irregular connectivity between LEO satellites and GS. In addition, there are significant security and privacy risks where eavesdroppers or curious servers/satellites may infer raw data from satellites' model parameters transmitted over insecure communication channels. To address these issues, this paper proposes FedSecure, a secure FL approach designed for LEO constellations, which consists of two novel components: (1) decentralized key generation that protects satellite data privacy using a functional encryption scheme, and (2) on-orbit model forwarding and aggregation that generates a partial global model per orbit to minimize the idle waiting time for invisible satellites to enter the visible zone of the GS. Our analysis and results show that FedSecure preserves the privacy of each satellite's data against eavesdroppers, a curious server, or curious satellites. It is lightweight with significantly lower communication and computation overheads than other privacy-preserving FL aggregation approaches. It also reduces convergence delay drastically from days to only a few hours, yet achieving high accuracy of up to 85.35% using realistic satellite images.
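
A hedged sketch of the on-orbit partial aggregation idea: each orbit combines its satellites' local updates into one partial model, and the ground station only aggregates one partial model per orbit. The functional-encryption component is omitted and plain FedAvg-style weighting is assumed.

```python
# Hedged sketch of per-orbit partial aggregation (not the paper's secure protocol:
# the functional-encryption step is omitted and plain weight averaging is assumed).
import numpy as np

def aggregate_orbit(satellite_updates, sample_counts):
    """Each orbit forwards local updates along its ring and produces one partial
    model, weighted by the satellites' local sample counts (FedAvg-style)."""
    weights = np.asarray(sample_counts, dtype=float)
    weights /= weights.sum()
    return sum(w * u for w, u in zip(weights, satellite_updates))

def aggregate_global(partial_models, orbit_sample_counts):
    """The ground station only needs one partial model per orbit."""
    return aggregate_orbit(partial_models, orbit_sample_counts)

# Toy example: 2 orbits, 3 satellites each, 4-parameter models.
rng = np.random.default_rng(0)
orbits = [[rng.normal(size=4) for _ in range(3)] for _ in range(2)]
counts = [[120, 80, 100], [90, 110, 95]]
partials = [aggregate_orbit(o, c) for o, c in zip(orbits, counts)]
global_model = aggregate_global(partials, [sum(c) for c in counts])
print(global_model)
```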

LoopTune: Optimizing Tensor Computations with Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.01825
  • repo_url: None
  • paper_authors: Dejan Grubisic, Bram Wasti, Chris Cummins, John Mellor-Crummey, Aleksandar Zlateski
  • for: Addresses the problem of running high-performance machine learning applications on novel hardware, where traditional compilers fail to deliver performance.
  • methods: Develops LoopTune, a deep reinforcement learning compiler that optimizes tensor computations in deep learning models for the CPU; LoopTune optimizes tensor traversal order while using the ultra-fast lightweight code generator LoopNest for hardware-specific optimizations.
  • results: With a novel graph-based representation and action space, LoopTune speeds up LoopNest by 3.2x, generating code an order of magnitude faster than TVM, 2.8x faster than MetaSchedule, and 1.08x faster than AutoTVM, consistently performing at the level of the hand-tuned library Numpy; moreover, LoopTune tunes code in the order of seconds.
    Abstract Advanced compiler technology is crucial for enabling machine learning applications to run on novel hardware, but traditional compilers fail to deliver performance, popular auto-tuners have long search times and expert-optimized libraries introduce unsustainable costs. To address this, we developed LoopTune, a deep reinforcement learning compiler that optimizes tensor computations in deep learning models for the CPU. LoopTune optimizes tensor traversal order while using the ultra-fast lightweight code generator LoopNest to perform hardware-specific optimizations. With a novel graph-based representation and action space, LoopTune speeds up LoopNest by 3.2x, generating an order of magnitude faster code than TVM, 2.8x faster than MetaSchedule, and 1.08x faster than AutoTVM, consistently performing at the level of the hand-tuned library Numpy. Moreover, LoopTune tunes code in order of seconds.

Computation and Communication Efficient Federated Learning over Wireless Networks

  • paper_url: http://arxiv.org/abs/2309.01816
  • repo_url: None
  • paper_authors: Xiaonan Liu, Tharmalingam Ratnarajah
  • for: Improve the accuracy and efficiency of federated learning (FL) model training while preserving data privacy.
  • methods: Proposes an FL framework based on partial model pruning and personalization that splits the learning model into a global part, shared and pruned across all devices to learn data representations, and a personalized part fine-tuned for each device to adapt to its non-independent and identically distributed (non-IID) data.
  • results: The computation and communication latency and the convergence of the framework are analyzed mathematically, and the resulting optimization problem over the pruning ratio and wireless resource allocation is solved in closed form via KKT conditions; experiments show a reduction of approximately 50% in computation and communication latency.
    Abstract Federated learning (FL) allows model training from local data by edge devices while preserving data privacy. However, the learning accuracy decreases due to the heterogeneity of devices data, and the computation and communication latency increase when updating large scale learning models on devices with limited computational capability and wireless resources. To overcome these challenges, we consider a novel FL framework with partial model pruning and personalization. This framework splits the learning model into a global part with model pruning shared with all devices to learn data representations and a personalized part to be fine tuned for a specific device, which adapts the model size during FL to reduce both computation and communication overhead and minimize the overall training time, and increases the learning accuracy for the device with non independent and identically distributed (non IID) data. Then, the computation and communication latency and the convergence analysis of the proposed FL framework are mathematically analyzed. Based on the convergence analysis, an optimization problem is formulated to maximize the convergence rate under a latency threshold by jointly optimizing the pruning ratio and wireless resource allocation. By decoupling the optimization problem and deploying Karush Kuhn Tucker (KKT) conditions, we derive the closed form solutions of pruning ratio and wireless resource allocation. Finally, experimental results demonstrate that the proposed FL framework achieves a remarkable reduction of approximately 50 percents computation and communication latency compared with the scheme only with model personalization.
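
A hedged sketch of splitting a model into a pruned global part (shared with the server) and a personalized part (kept on-device). The layer sizes and the magnitude-based pruning rule are assumptions for illustration, not the paper's optimized pruning-ratio solution.

```python
# Hedged sketch of the split into a pruned global part and a personalized part.
# Layer names, the magnitude-pruning rule, and the two-part split are assumptions.
import torch
import torch.nn as nn

class SplitModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.global_part = nn.Sequential(nn.Linear(32, 64), nn.ReLU())   # shared, pruned
        self.personal_part = nn.Linear(64, 10)                           # kept on-device

    def forward(self, x):
        return self.personal_part(self.global_part(x))

def prune_global(model, pruning_ratio):
    """Zero out the smallest-magnitude weights of the shared part; only the
    surviving weights need to be communicated to the server."""
    for p in model.global_part.parameters():
        if p.dim() > 1:
            k = int(pruning_ratio * p.numel())
            threshold = p.abs().flatten().kthvalue(k).values
            p.data[p.abs() <= threshold] = 0.0
    return model

model = prune_global(SplitModel(), pruning_ratio=0.5)
zeros = sum((p == 0).sum().item() for p in model.global_part.parameters())
total = sum(p.numel() for p in model.global_part.parameters())
print(f"global-part sparsity: {zeros / total:.2f}")
```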

Asymmetric matrix sensing by gradient descent with small random initialization

  • paper_url: http://arxiv.org/abs/2309.01796
  • repo_url: None
  • paper_authors: Johan S. Wind
  • for: The matrix sensing problem: reconstructing a low-rank matrix from a few linear measurements.
  • methods: Factorized gradient descent, analyzed through a continuous differential equation called the perturbed gradient flow.
  • results: Quick convergence to the true target matrix whenever the perturbation is sufficiently bounded, yielding a novel proof of asymmetric matrix sensing with factorized gradient descent.
    Abstract We study matrix sensing, which is the problem of reconstructing a low-rank matrix from a few linear measurements. It can be formulated as an overparameterized regression problem, which can be solved by factorized gradient descent when starting from a small random initialization. Linear neural networks, and in particular matrix sensing by factorized gradient descent, serve as prototypical models of non-convex problems in modern machine learning, where complex phenomena can be disentangled and studied in detail. Much research has been devoted to studying special cases of asymmetric matrix sensing, such as asymmetric matrix factorization and symmetric positive semi-definite matrix sensing. Our key contribution is introducing a continuous differential equation that we call the $\textit{perturbed gradient flow}$. We prove that the perturbed gradient flow converges quickly to the true target matrix whenever the perturbation is sufficiently bounded. The dynamics of gradient descent for matrix sensing can be reduced to this formulation, yielding a novel proof of asymmetric matrix sensing with factorized gradient descent. Compared to directly analyzing the dynamics of gradient descent, the continuous formulation allows bounding key quantities by considering their derivatives, often simplifying the proofs. We believe the general proof technique may prove useful in other settings as well.
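
A minimal sketch of factorized gradient descent for asymmetric matrix sensing with a small random initialization, the discrete counterpart of the perturbed gradient flow analyzed above. The Gaussian measurement model, step size, and initialization scale are assumptions.

```python
# Minimal sketch of factorized gradient descent for asymmetric matrix sensing
# from a small random initialization. Measurement model and step size are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 20, 2, 400                         # matrix size, rank, number of measurements

# Ground-truth low-rank matrix M* with controlled singular values.
U0, _ = np.linalg.qr(rng.normal(size=(n, r)))
V0, _ = np.linalg.qr(rng.normal(size=(n, r)))
M_star = U0 @ np.diag([1.0, 0.5]) @ V0.T

# Random Gaussian measurements y_i = <A_i, M*>.
A = rng.normal(size=(m, n, n)) / np.sqrt(m)
y = np.einsum('mij,ij->m', A, M_star)

# Factorized parameterization M = U V^T with a small random initialization.
scale, lr = 1e-3, 0.2
U = scale * rng.normal(size=(n, r))
V = scale * rng.normal(size=(n, r))

for step in range(2001):
    residual = np.einsum('mij,ij->m', A, U @ V.T) - y
    G = np.einsum('m,mij->ij', residual, A)  # gradient of 0.5 * ||A(U V^T) - y||^2
    U, V = U - lr * G @ V, V - lr * G.T @ U
    if step % 500 == 0:
        err = np.linalg.norm(U @ V.T - M_star) / np.linalg.norm(M_star)
        print(f"step {step:4d}  relative error {err:.3e}")
```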

Composite federated learning with heterogeneous data

  • paper_url: http://arxiv.org/abs/2309.01795
  • repo_url: None
  • paper_authors: Jiaojiao Zhang, Jiang Hu, Mikael Johansson
  • for: Solve the composite Federated Learning (FL) problem.
  • methods: Proposes a novel algorithm that manages non-smooth regularization by strategically decoupling the proximal operator and communication, and addresses client drift without any assumptions about data similarity; each worker uses local updates to reduce communication frequency with the server and transmits only a $d$-dimensional vector per communication round.
  • results: The algorithm is proven to converge linearly to a neighborhood of the optimal solution, and it outperforms state-of-the-art methods in numerical experiments.
    Abstract We propose a novel algorithm for solving the composite Federated Learning (FL) problem. This algorithm manages non-smooth regularization by strategically decoupling the proximal operator and communication, and addresses client drift without any assumptions about data similarity. Moreover, each worker uses local updates to reduce the communication frequency with the server and transmits only a $d$-dimensional vector per communication round. We prove that our algorithm converges linearly to a neighborhood of the optimal solution and demonstrate the superiority of our algorithm over state-of-the-art methods in numerical experiments.

Hierarchical Grammar-Induced Geometry for Data-Efficient Molecular Property Prediction

  • paper_url: http://arxiv.org/abs/2309.01788
  • repo_url: https://github.com/gmh14/geo-deg
  • paper_authors: Minghao Guo, Veronika Thost, Samuel W Song, Adithya Balachandran, Payel Das, Jie Chen, Wojciech Matusik
  • for: A data-efficient molecular property prediction method for material and drug discovery.
  • methods: Uses a learnable hierarchical molecular grammar that generates molecules from grammar production rules; the grammar induces an explicit geometry of the space of molecular graphs, providing an informative prior on molecular structural similarity, and property prediction is performed with graph neural diffusion over this grammar-induced geometry.
  • results: On both small and large datasets, the approach outperforms a wide spectrum of baselines, including supervised and pre-trained graph neural networks, and remains effective with extremely limited data; a detailed ablation study and further analysis demonstrate the effectiveness of the solution.
    Abstract The prediction of molecular properties is a crucial task in the field of material and drug discovery. The potential benefits of using deep learning techniques are reflected in the wealth of recent literature. Still, these techniques are faced with a common challenge in practice: Labeled data are limited by the cost of manual extraction from literature and laborious experimentation. In this work, we propose a data-efficient property predictor by utilizing a learnable hierarchical molecular grammar that can generate molecules from grammar production rules. Such a grammar induces an explicit geometry of the space of molecular graphs, which provides an informative prior on molecular structural similarity. The property prediction is performed using graph neural diffusion over the grammar-induced geometry. On both small and large datasets, our evaluation shows that this approach outperforms a wide spectrum of baselines, including supervised and pre-trained graph neural networks. We include a detailed ablation study and further analysis of our solution, showing its effectiveness in cases with extremely limited data. Code is available at https://github.com/gmh14/Geo-DEG.

ATMS: Algorithmic Trading-Guided Market Simulation

  • paper_url: http://arxiv.org/abs/2309.01784
  • repo_url: None
  • paper_authors: Song Wei, Andrea Coletta, Svitlana Vyetrenko, Tucker Balch
  • for: The paper aims to propose a metric to quantify market discrepancy and develop an Algorithmic Trading-guided Market Simulation (ATMS) to improve the realism of market simulations.
  • methods: The proposed metric measures the difference between a causal effect from underlying market unique characteristics and is evaluated through the interaction between the AT agent and the market. ATMS formulates the simulator as a stochastic policy in reinforcement learning (RL) to account for the sequential nature of trading, and utilizes the policy gradient update to bypass differentiating the proposed metric.
  • results: The proposed metric and ATMS are demonstrated to be effective through extensive experiments on semi-real market data, showing improved similarity to reality compared to the state-of-the-art conditional Wasserstein Generative Adversarial Network (cWGAN) approach, and producing market data with more balanced BUY and SELL volumes.
    Abstract The effective construction of an Algorithmic Trading (AT) strategy often relies on market simulators, which remains challenging due to existing methods' inability to adapt to the sequential and dynamic nature of trading activities. This work fills this gap by proposing a metric to quantify market discrepancy. This metric measures the difference between a causal effect from underlying market unique characteristics and it is evaluated through the interaction between the AT agent and the market. Most importantly, we introduce Algorithmic Trading-guided Market Simulation (ATMS) by optimizing our proposed metric. Inspired by SeqGAN, ATMS formulates the simulator as a stochastic policy in reinforcement learning (RL) to account for the sequential nature of trading. Moreover, ATMS utilizes the policy gradient update to bypass differentiating the proposed metric, which involves non-differentiable operations such as order deletion from the market. Through extensive experiments on semi-real market data, we demonstrate the effectiveness of our metric and show that ATMS generates market data with improved similarity to reality compared to the state-of-the-art conditional Wasserstein Generative Adversarial Network (cWGAN) approach. Furthermore, ATMS produces market data with more balanced BUY and SELL volumes, mitigating the bias of the cWGAN baseline approach, where a simple strategy can exploit the BUY/SELL imbalance for profit.

Survival Prediction from Imbalance colorectal cancer dataset using hybrid sampling methods and tree-based classifiers

  • paper_url: http://arxiv.org/abs/2309.01783
  • repo_url: None
  • paper_authors: Sadegh Soleimani, Mahsa Bahrami, Mansour Vali
  • for: Predict the 1-, 3-, and 5-year survival of colorectal cancer patients from clinical data.
  • methods: Uses pre-processing, standard balancing techniques, Synthetic Minority Over-sampling Technique (SMOTE), and pipelines of SMOTE and RENN to balance the data, combined with tree-based classifiers including Decision Trees, Random Forest, Extra Tree, eXtreme Gradient Boosting, and Light Gradient Boosting (LGBM).
  • results: With 5-fold cross-validation, the proposed pipeline with LGBM reaches a sensitivity of 72.30% on the highly imbalanced 1-year survival task, and the combination of RENN and LGBM reaches a sensitivity of 80.81% on the 3-year task, indicating that the proposed method works best for highly imbalanced datasets.
    Abstract Background and Objective: Colorectal cancer is a high mortality cancer. Clinical data analysis plays a crucial role in predicting the survival of colorectal cancer patients, enabling clinicians to make informed treatment decisions. However, utilizing clinical data can be challenging, especially when dealing with imbalanced outcomes. This paper focuses on developing algorithms to predict 1-, 3-, and 5-year survival of colorectal cancer patients using clinical datasets, with particular emphasis on the highly imbalanced 1-year survival prediction task. To address this issue, we propose a method that creates a pipeline of some of the standard balancing techniques to increase the true positive rate. Evaluation is conducted on a colorectal cancer dataset from the SEER database. Methods: The pre-processing step consists of removing records with missing values and merging categories. The minority class of 1-year and 3-year survival tasks consists of 10% and 20% of the data, respectively. Edited Nearest Neighbor, Repeated edited nearest neighbor (RENN), Synthetic Minority Over-sampling Techniques (SMOTE), and pipelines of SMOTE and RENN approaches were used and compared for balancing the data with tree-based classifiers. Decision Trees, Random Forest, Extra Tree, eXtreme Gradient Boosting, and Light Gradient Boosting (LGBM) are used in this article. Results: The performance evaluation utilizes a 5-fold cross-validation approach. In the case of highly imbalanced datasets (1-year), our proposed method with LGBM outperforms other sampling methods with the sensitivity of 72.30%. For the task of imbalance (3-year survival), the combination of RENN and LGBM achieves a sensitivity of 80.81%, indicating that our proposed method works best for highly imbalanced datasets. Conclusions: Our proposed method significantly improves mortality prediction for the minority class of colorectal cancer patients.
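
A hedged sketch of the kind of resampling pipeline described above: SMOTE oversampling followed by RENN cleaning, feeding an LGBM classifier evaluated by 5-fold cross-validated sensitivity (recall). The synthetic data and hyperparameters are placeholders, not the paper's SEER setup.

```python
# Hedged sketch of a SMOTE + RENN resampling pipeline with an LGBM classifier,
# evaluated by 5-fold cross-validated sensitivity (recall). Synthetic data only.
import numpy as np
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RepeatedEditedNearestNeighbours
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

# Imbalanced toy dataset: ~10% minority class, mimicking the 1-year survival task.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)

pipeline = Pipeline(steps=[
    ("smote", SMOTE(random_state=0)),                 # oversample the minority class
    ("renn", RepeatedEditedNearestNeighbours()),      # clean noisy majority samples
    ("clf", LGBMClassifier(n_estimators=200, random_state=0)),
])

sensitivity = cross_val_score(pipeline, X, y, cv=5, scoring="recall")
print(f"mean 5-fold sensitivity: {sensitivity.mean():.3f}")
```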

Self-concordant Smoothing for Convex Composite Optimization

  • paper_url: http://arxiv.org/abs/2309.01781
  • repo_url: https://github.com/adeyemiadeoye/SelfConcordantSmoothOptimization.jl
  • paper_authors: Adeyemi D. Adeoye, Alberto Bemporad
  • for: Proposes a self-concordant smoothing approach for minimizing the sum of two convex functions, where the first is smooth and the second may be nonsmooth.
  • methods: Uses partial smoothing, in which only a part of the nonsmooth function is smoothed; the resulting problem structure yields a variable-metric selection method and a step-length selection rule particularly suitable for proximal Newton-type algorithms, and structures promoted by the nonsmooth term, such as l1-regularization and group-lasso penalties, are handled efficiently.
  • results: Local quadratic convergence rates are proved for two resulting algorithms, the proximal Newton algorithm Prox-N-SCORE and the proximal generalized Gauss-Newton algorithm Prox-GGN-SCORE; the latter includes an important approximation procedure that removes most of the computational overhead associated with the inverse Hessian, which is especially useful for overparameterized machine learning models and mini-batch settings, and numerical examples demonstrate the efficiency of the approach over existing ones.
    Abstract We introduce the notion of self-concordant smoothing for minimizing the sum of two convex functions: the first is smooth and the second may be nonsmooth. Our framework results naturally from the smoothing approximation technique referred to as partial smoothing in which only a part of the nonsmooth function is smoothed. The key highlight of our approach is in a natural property of the resulting problem's structure which provides us with a variable-metric selection method and a step-length selection rule particularly suitable for proximal Newton-type algorithms. In addition, we efficiently handle specific structures promoted by the nonsmooth function, such as $\ell_1$-regularization and group-lasso penalties. We prove local quadratic convergence rates for two resulting algorithms: Prox-N-SCORE, a proximal Newton algorithm and Prox-GGN-SCORE, a proximal generalized Gauss-Newton (GGN) algorithm. The Prox-GGN-SCORE algorithm highlights an important approximation procedure which helps to significantly reduce most of the computational overhead associated with the inverse Hessian. This approximation is essentially useful for overparameterized machine learning models and in the mini-batch settings. Numerical examples on both synthetic and real datasets demonstrate the efficiency of our approach and its superiority over existing approaches.

Measuring, Interpreting, and Improving Fairness of Algorithms using Causal Inference and Randomized Experiments

  • paper_url: http://arxiv.org/abs/2309.01780
  • repo_url: None
  • paper_authors: James Enouen, Tianshu Sun, Yan Liu
  • for: The paper is written to address the problem of algorithm fairness in real-world AI production systems, with a focus on developing a practical and easy-to-implement measurement framework and a systematic approach to correcting detected sources of bias.
  • methods: The paper uses recent advances in causal inference and interpretable machine learning to develop an algorithm-agnostic framework called MIIF (Measure, Interpret, and Improve the Fairness of an algorithmic decision). The framework includes randomized experiments to measure algorithm bias, enabling simultaneous measurement of disparate treatment, disparate impact, and economic value, and an explainable machine learning model that interprets and distills the beliefs of a blackbox algorithm.
  • results: The paper demonstrates the effectiveness of MIIF in measuring algorithm bias and improving fairness in practical applications like e-commerce and targeted advertising, where industry A/B testing is already abundant.
    Abstract Algorithm fairness has become a central problem for the broad adoption of artificial intelligence. Although the past decade has witnessed an explosion of excellent work studying algorithm biases, achieving fairness in real-world AI production systems has remained a challenging task. Most existing works fail to excel in practical applications since either they have conflicting measurement techniques and/ or heavy assumptions, or require code-access of the production models, whereas real systems demand an easy-to-implement measurement framework and a systematic way to correct the detected sources of bias. In this paper, we leverage recent advances in causal inference and interpretable machine learning to present an algorithm-agnostic framework (MIIF) to Measure, Interpret, and Improve the Fairness of an algorithmic decision. We measure the algorithm bias using randomized experiments, which enables the simultaneous measurement of disparate treatment, disparate impact, and economic value. Furthermore, using modern interpretability techniques, we develop an explainable machine learning model which accurately interprets and distills the beliefs of a blackbox algorithm. Altogether, these techniques create a simple and powerful toolset for studying algorithm fairness, especially for understanding the cost of fairness in practical applications like e-commerce and targeted advertising, where industry A/B testing is already abundant.

DRAG: Divergence-based Adaptive Aggregation in Federated learning on Non-IID Data

  • paper_url: http://arxiv.org/abs/2309.01779
  • repo_url: None
  • paper_authors: Feng Zhu, Jingjing Zhang, Shengyun Liu, Xin Wang
  • for: Improve communication efficiency in federated learning (FL) and counter the "client-drift" phenomenon caused by heterogeneous training data distributions.
  • methods: Introduces a new metric, the "degree of divergence", quantifying the angle between each client's local update and the global reference direction, and uses it to dynamically "drag" the received local updates toward the reference direction in each round without extra communication overhead.
  • results: A rigorous convergence analysis establishes a sublinear convergence rate, and experiments show that DRAG outperforms state-of-the-art algorithms in managing client drift and is resilient against certain Byzantine attacks.
    Abstract Local stochastic gradient descent (SGD) is a fundamental approach in achieving communication efficiency in Federated Learning (FL) by allowing individual workers to perform local updates. However, the presence of heterogeneous data distributions across working nodes causes each worker to update its local model towards a local optimum, leading to the phenomenon known as ``client-drift" and resulting in slowed convergence. To address this issue, previous works have explored methods that either introduce communication overhead or suffer from unsteady performance. In this work, we introduce a novel metric called ``degree of divergence," quantifying the angle between the local gradient and the global reference direction. Leveraging this metric, we propose the divergence-based adaptive aggregation (DRAG) algorithm, which dynamically ``drags" the received local updates toward the reference direction in each round without requiring extra communication overhead. Furthermore, we establish a rigorous convergence analysis for DRAG, proving its ability to achieve a sublinear convergence rate. Compelling experimental results are presented to illustrate DRAG's superior performance compared to state-of-the-art algorithms in effectively managing the client-drift phenomenon. Additionally, DRAG exhibits remarkable resilience against certain Byzantine attacks. By securely sharing a small sample of the client's data with the FL server, DRAG effectively counters these attacks, as demonstrated through comprehensive experiments.
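
A hedged sketch of divergence-based "dragging": the cosine between each client's update and a global reference direction is used to pull divergent updates toward the reference before averaging. The specific drag rule below is an illustrative assumption, not the paper's exact formula.

```python
# Hedged sketch of divergence-based "dragging" of client updates toward a global
# reference direction; the convex-combination rule below is an illustrative assumption.
import numpy as np

def degree_of_divergence(local_update, reference):
    """Cosine of the angle between a local update and the reference direction."""
    return float(local_update @ reference /
                 (np.linalg.norm(local_update) * np.linalg.norm(reference) + 1e-12))

def drag(local_update, reference, strength=0.5):
    """Pull the local update toward the reference direction; updates that already
    align with the reference (cosine close to 1) are left almost unchanged."""
    cos = degree_of_divergence(local_update, reference)
    mix = strength * (1.0 - cos)                       # more drag for larger divergence
    ref_dir = reference / (np.linalg.norm(reference) + 1e-12)
    return (1 - mix) * local_update + mix * np.linalg.norm(local_update) * ref_dir

rng = np.random.default_rng(0)
reference = rng.normal(size=10)                        # e.g., previous global update
updates = [rng.normal(size=10) for _ in range(4)]      # one per client
global_update = np.mean([drag(u, reference) for u in updates], axis=0)
print(global_update)
```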

CONFIDERAI: a novel CONFormal Interpretable-by-Design score function for Explainable and Reliable Artificial Intelligence

  • paper_url: http://arxiv.org/abs/2309.01778
  • repo_url: None
  • paper_authors: Alberto Carlevaro, Sara Narteni, Fabrizio Dabbene, Marco Muselli, Maurizio Mongelli
  • for: The paper proposes a methodology for linking conformal prediction with explainable machine learning, with the goal of creating more reliable and trustworthy artificial intelligence systems.
  • methods: The paper introduces a new score function called CONFIDERAI, which leverages both the predictive ability of rules and their geometric position within rule boundaries. It also addresses the problem of defining regions in feature space where conformal guarantees are satisfied, using techniques based on support vector data description (SVDD) to control the number of non-conformal samples in conformal regions.
  • results: The paper reports promising results on benchmark and real datasets, such as DNS tunneling detection and cardiovascular disease prediction.
    Abstract Everyday life is increasingly influenced by artificial intelligence, and there is no question that machine learning algorithms must be designed to be reliable and trustworthy for everyone. Specifically, computer scientists consider an artificial intelligence system safe and trustworthy if it fulfills five pillars: explainability, robustness, transparency, fairness, and privacy. In addition to these five, we propose a sixth fundamental aspect: conformity, that is, the probabilistic assurance that the system will behave as the machine learner expects. In this paper, we propose a methodology to link conformal prediction with explainable machine learning by defining CONFIDERAI, a new score function for rule-based models that leverages both rules predictive ability and points geometrical position within rules boundaries. We also address the problem of defining regions in the feature space where conformal guarantees are satisfied by exploiting techniques to control the number of non-conformal samples in conformal regions based on support vector data description (SVDD). The overall methodology is tested with promising results on benchmark and real datasets, such as DNS tunneling detection or cardiovascular disease prediction.

Gated recurrent neural networks discover attention

  • paper_url: http://arxiv.org/abs/2309.01775
  • repo_url: None
  • paper_authors: Nicolas Zucchet, Seijin Kobayashi, Yassir Akram, Johannes von Oswald, Maxime Larcher, Angelika Steger, João Sacramento
  • for: Investigates how modern RNNs, built from linear recurrent layers interconnected by feedforward paths with multiplicative gating, can implement self-attention.
  • methods: Shows constructively that RNNs with these two design elements can exactly implement (linear) self-attention, and reverse-engineers a set of trained RNNs to examine what gradient descent actually learns.
  • results: On simple in-context learning tasks on which Transformers are known to excel, gradient descent instills in the RNNs the same attention-based in-context learning algorithm used by Transformers, matching their performance.
    Abstract Recent architectural developments have enabled recurrent neural networks (RNNs) to reach and even surpass the performance of Transformers on certain sequence modeling tasks. These modern RNNs feature a prominent design pattern: linear recurrent layers interconnected by feedforward paths with multiplicative gating. Here, we show how RNNs equipped with these two design elements can exactly implement (linear) self-attention, the main building block of Transformers. By reverse-engineering a set of trained RNNs, we find that gradient descent in practice discovers our construction. In particular, we examine RNNs trained to solve simple in-context learning tasks on which Transformers are known to excel and find that gradient descent instills in our RNNs the same attention-based in-context learning algorithm used by Transformers. Our findings highlight the importance of multiplicative interactions in neural networks and suggest that certain RNNs might be unexpectedly implementing attention under the hood.
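
A minimal sketch of the key observation: unnormalized linear self-attention can be computed by a linear recurrence whose state accumulates key-value outer products, with a multiplicative (gated) read-out by the query. The random projection matrices are placeholders.

```python
# Minimal sketch showing that (unnormalized) linear self-attention equals a linear
# recurrence with multiplicative interactions. Projection matrices are placeholders.
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 4
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

# Parallel form: out_t = q_t^T * sum_{s<=t} k_s v_s^T
Q, K, V = X @ Wq, X @ Wk, X @ Wv
parallel = np.stack([Q[t] @ sum(np.outer(K[s], V[s]) for s in range(t + 1))
                     for t in range(T)])

# Recurrent form: linear recurrence S_t = S_{t-1} + k_t v_t^T (multiplicative
# interaction between inputs), followed by a multiplicative read-out q_t^T S_t.
S = np.zeros((d, d))
recurrent = []
for t in range(T):
    S = S + np.outer(K[t], V[t])
    recurrent.append(Q[t] @ S)
recurrent = np.stack(recurrent)

print(np.allclose(parallel, recurrent))   # True: both forms coincide
```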

ADC/DAC-Free Analog Acceleration of Deep Neural Networks with Frequency Transformation

  • paper_url: http://arxiv.org/abs/2309.01771
  • repo_url: None
  • paper_authors: Nastaran Darabi, Maeesha Binte Hashem, Hongyi Pan, Ahmet Cetin, Wilfred Gomes, Amit Ranjan Trivedi
  • for: This paper proposes an energy-efficient frequency-domain acceleration method for deep neural networks (DNNs) to reduce power consumption and latency.
  • methods: It uses frequency-domain transforms such as the Walsh-Hadamard transform (WHT) and proposes a novel energy-efficient acceleration scheme based on analog-domain frequency-based tensor transformations.
  • results: On a 16×16 crossbar array with 8-bit input processing, the proposed approach achieves an energy efficiency of 1602 tera operations per second per Watt (TOPS/W) at VDD = 0.8 V, rising to 5311 TOPS/W with an early termination strategy.
    Abstract The edge processing of deep neural networks (DNNs) is becoming increasingly important due to its ability to extract valuable information directly at the data source to minimize latency and energy consumption. Frequency-domain model compression, such as with the Walsh-Hadamard transform (WHT), has been identified as an efficient alternative. However, the benefits of frequency-domain processing are often offset by the increased multiply-accumulate (MAC) operations required. This paper proposes a novel approach to an energy-efficient acceleration of frequency-domain neural networks by utilizing analog-domain frequency-based tensor transformations. Our approach offers unique opportunities to enhance computational efficiency, resulting in several high-level advantages, including array micro-architecture with parallelism, ADC/DAC-free analog computations, and increased output sparsity. Our approach achieves more compact cells by eliminating the need for trainable parameters in the transformation matrix. Moreover, our novel array micro-architecture enables adaptive stitching of cells column-wise and row-wise, thereby facilitating perfect parallelism in computations. Additionally, our scheme enables ADC/DAC-free computations by training against highly quantized matrix-vector products, leveraging the parameter-free nature of matrix multiplications. Another crucial aspect of our design is its ability to handle signed-bit processing for frequency-based transformations. This leads to increased output sparsity and reduced digitization workload. On a 16$\times$16 crossbars, for 8-bit input processing, the proposed approach achieves the energy efficiency of 1602 tera operations per second per Watt (TOPS/W) without early termination strategy and 5311 TOPS/W with early termination strategy at VDD = 0.8 V.
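
A hedged software reference for the Walsh-Hadamard transform (WHT) mentioned above, which needs only additions and subtractions, the property that makes frequency-domain, multiplier-free analog acceleration attractive. This is a plain software illustration, not the paper's analog hardware design.

```python
# Plain software reference for the fast Walsh-Hadamard transform (WHT): only
# additions and subtractions are needed, and applying it twice recovers the input
# up to a scale factor of the vector length.
import numpy as np

def wht(x):
    """Fast Walsh-Hadamard transform of a length-2^k vector."""
    x = np.asarray(x, dtype=float).copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            for j in range(i, i + h):
                x[j], x[j + h] = x[j] + x[j + h], x[j] - x[j + h]
        h *= 2
    return x

x = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0])
y = wht(x)
print(y)
print(wht(y) / len(x))   # recovers the original input
```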

On Penalty Methods for Nonconvex Bilevel Optimization and First-Order Stochastic Approximation

  • paper_url: http://arxiv.org/abs/2309.01753
  • repo_url: None
  • paper_authors: Jeongyeol Kwon, Dohyun Kwon, Steve Wright, Robert Nowak
  • for: Studies first-order algorithms based on penalty methods for bilevel optimization (BO), where both objectives are smooth but possibly nonconvex and the variables are restricted to closed convex sets.
  • methods: Combines the upper- and lower-level objectives into a weighted sum with penalty parameter $\sigma > 0$, and precisely characterizes the conditions under which the values and derivatives of the penalty function and the hyper-objective are $O(\sigma)$-close; this analysis also yields an explicit formula for the gradient of the hyper-objective when the lower-level problem has multiple solutions.
  • results: Proposes first-order algorithms that find an $\epsilon$-stationary point by optimizing the penalty formulation with $\sigma = O(\epsilon)$, requiring $O(\epsilon^{-3})$ and $O(\epsilon^{-7})$ accesses to deterministic and noisy first-order gradient oracles, respectively; under an additional assumption on the stochastic oracles, the algorithm can be implemented in a fully single-loop manner with $O(1)$ samples per iteration and improved oracle complexities of $O(\epsilon^{-3})$ and $O(\epsilon^{-5})$.
    Abstract In this work, we study first-order algorithms for solving Bilevel Optimization (BO) where the objective functions are smooth but possibly nonconvex in both levels and the variables are restricted to closed convex sets. As a first step, we study the landscape of BO through the lens of penalty methods, in which the upper- and lower-level objectives are combined in a weighted sum with penalty parameter $\sigma > 0$. In particular, we establish a strong connection between the penalty function and the hyper-objective by explicitly characterizing the conditions under which the values and derivatives of the two must be $O(\sigma)$-close. A by-product of our analysis is the explicit formula for the gradient of hyper-objective when the lower-level problem has multiple solutions under minimal conditions, which could be of independent interest. Next, viewing the penalty formulation as $O(\sigma)$-approximation of the original BO, we propose first-order algorithms that find an $\epsilon$-stationary solution by optimizing the penalty formulation with $\sigma = O(\epsilon)$. When the perturbed lower-level problem uniformly satisfies the small-error proximal error-bound (EB) condition, we propose a first-order algorithm that converges to an $\epsilon$-stationary point of the penalty function, using in total $O(\epsilon^{-3})$ and $O(\epsilon^{-7})$ accesses to first-order (stochastic) gradient oracles when the oracle is deterministic and oracles are noisy, respectively. Under an additional assumption on stochastic oracles, we show that the algorithm can be implemented in a fully {\it single-loop} manner, i.e., with $O(1)$ samples per iteration, and achieves the improved oracle-complexity of $O(\epsilon^{-3})$ and $O(\epsilon^{-5})$, respectively.
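
A toy sketch of a penalty reformulation of a bilevel problem: the lower-level optimality gap is added to the upper objective with weight 1/sigma and plain gradient descent is run on the joint penalty function. The specific weighting and the algorithms in the paper differ; this only illustrates how the penalty minimizer approaches the bilevel solution as sigma shrinks.

```python
# Toy 1-D bilevel problem solved via a penalty reformulation; the 1/sigma weighting
# and plain gradient descent are illustrative assumptions, not the paper's algorithm.
import numpy as np

f = lambda x, y: (y - 1.0) ** 2 + x ** 2   # upper-level objective
g = lambda x, y: (y - x) ** 2              # lower-level objective; min_y g(x, y) = 0 at y = x

def penalty_grad(x, y, sigma):
    """Gradient of f(x, y) + (1/sigma) * (g(x, y) - min_y' g(x, y'))."""
    pen = 1.0 / sigma
    dfx, dfy = 2.0 * x, 2.0 * (y - 1.0)
    dgx, dgy = -2.0 * (y - x), 2.0 * (y - x)
    return dfx + pen * dgx, dfy + pen * dgy

x, y, sigma, lr = 2.0, -1.0, 0.05, 0.01
for _ in range(20000):
    gx, gy = penalty_grad(x, y, sigma)
    x, y = x - lr * gx, y - lr * gy

# The hyper-objective F(x) = f(x, y*(x)) = (x - 1)^2 + x^2 is minimized at x = 0.5,
# and the penalty minimizer approaches it as sigma shrinks.
print(f"x = {x:.3f}, y = {y:.3f}   (exact bilevel solution: x = y = 0.5)")
print(f"penalty value: {f(x, y) + (1.0 / sigma) * g(x, y):.4f}")
```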

Turbulent Flow Simulation using Autoregressive Conditional Diffusion Models

  • paper_url: http://arxiv.org/abs/2309.01745
  • repo_url: None
  • paper_authors: Georg Kohl, Li-Wei Chen, Nils Thuerey
  • for: Addresses the stability problem of machine-learning-based PDE solvers when generalizing to longer rollout horizons.
  • methods: Introduces a fully data-driven fluid solver that performs an autoregressive rollout based on conditional diffusion models.
  • results: The approach offers clear advantages in rollout stability over other learned baselines without compromising sample quality, generalizes to flow parameters beyond the training regime, and, thanks to the probabilistic nature of diffusion, yields predictions that align with the statistics of the underlying physics; it is evaluated on incompressible and transonic flows as well as isotropic turbulence.
    Abstract Simulating turbulent flows is crucial for a wide range of applications, and machine learning-based solvers are gaining increasing relevance. However, achieving stability when generalizing to longer rollout horizons remains a persistent challenge for learned PDE solvers. We address this challenge by introducing a fully data-driven fluid solver that utilizes an autoregressive rollout based on conditional diffusion models. We show that this approach offers clear advantages in terms of rollout stability compared to other learned baselines. Remarkably, these improvements in stability are achieved without compromising the quality of generated samples, and our model successfully generalizes to flow parameters beyond the training regime. Additionally, the probabilistic nature of the diffusion approach allows for inferring predictions that align with the statistics of the underlying physics. We quantitatively and qualitatively evaluate the performance of our method on a range of challenging scenarios, including incompressible and transonic flows, as well as isotropic turbulence.

Adaptive Resource Allocation for Virtualized Base Stations in O-RAN with Online Learning

  • paper_url: http://arxiv.org/abs/2309.01730
  • repo_url: None
  • paper_authors: Michail Kalntis, George Iosifidis, Fernando A. Kuipers
  • for: optimize the allocation of resources in virtualized base stations (vBSs) to balance effective throughput and energy consumption, even in challenging environments.
  • methods: online learning algorithm and meta-learning scheme to adapt to non-stationary or adversarial traffic demands and choose the best performing algorithm for different environments.
  • results: sub-linear regret and up to 64.5% power consumption savings compared to state-of-the-art benchmarks, evaluated with real-world data and trace-driven evaluations.
    Abstract Open Radio Access Network systems, with their virtualized base stations (vBSs), offer operators the benefits of increased flexibility, reduced costs, vendor diversity, and interoperability. Optimizing the allocation of resources in a vBS is challenging since it requires knowledge of the environment, (i.e., "external'' information), such as traffic demands and channel quality, which is difficult to acquire precisely over short intervals of a few seconds. To tackle this problem, we propose an online learning algorithm that balances the effective throughput and vBS energy consumption, even under unforeseeable and "challenging'' environments; for instance, non-stationary or adversarial traffic demands. We also develop a meta-learning scheme, which leverages the power of other algorithmic approaches, tailored for more "easy'' environments, and dynamically chooses the best performing one, thus enhancing the overall system's versatility and effectiveness. We prove the proposed solutions achieve sub-linear regret, providing zero average optimality gap even in challenging environments. The performance of the algorithms is evaluated with real-world data and various trace-driven evaluations, indicating savings of up to 64.5% in the power consumption of a vBS compared with state-of-the-art benchmarks.

Robust Online Classification: From Estimation to Denoising

  • paper_url: http://arxiv.org/abs/2309.01698
  • repo_url: None
  • paper_authors: Changlong Wu, Ananth Grama, Wojciech Szpankowski
  • for: Studies online classification in the presence of noisy labels, where the noise mechanism is modeled by a general kernel that specifies, for any feature-label pair, a known set of distributions over noisy labels from which an adversary selects at each time step.
  • methods: Uses a novel reduction to online conditional distribution estimation, covering general noise kernels and adversarially selected features, with extensions to infinite classes and stochastically generated features via stochastic sequential covering.
  • results: Shows that for a wide range of natural noise kernels, adversarially selected features, and finite classes of labeling functions, the minimax risk can be upper bounded independently of the time horizon and logarithmically in the size of the labeling function class; the results are then extended to infinite classes and stochastically generated features.
    Abstract We study online classification in the presence of noisy labels. The noise mechanism is modeled by a general kernel that specifies, for any feature-label pair, a (known) set of distributions over noisy labels. At each time step, an adversary selects an unknown distribution from the distribution set specified by the kernel based on the actual feature-label pair, and generates the noisy label from the selected distribution. The learner then makes a prediction based on the actual features and noisy labels observed thus far, and incurs loss $1$ if the prediction differs from the underlying truth (and $0$ otherwise). The prediction quality is quantified through minimax risk, which computes the cumulative loss over a finite horizon $T$. We show that for a wide range of natural noise kernels, adversarially selected features, and finite class of labeling functions, minimax risk can be upper bounded independent of the time horizon and logarithmic in the size of labeling function class. We then extend these results to inifinite classes and stochastically generated features via the concept of stochastic sequential covering. Our results extend and encompass findings of Ben-David et al. (2009) through substantial generality, and provide intuitive understanding through a novel reduction to online conditional distribution estimation.

Physics-Informed Polynomial Chaos Expansions

  • paper_url: http://arxiv.org/abs/2309.01697
  • repo_url: None
  • paper_authors: Lukáš Novák, Himanshu Sharma, Michael D. Shields
  • for: Construct physics-informed polynomial chaos expansions (PCE) that combine a conventional experimental design with known physical constraints of the model.
  • methods: Physical constraints, represented by a set of differential equations and specified boundary conditions, are incorporated into the PCE construction; a computationally efficient algorithm for building the physically constrained PCE is proposed and compared with standard sparse PCE.
  • results: The proposed method improves the accuracy of the approximation without adding significant computational burden, can even build constrained PCEs from differential equations and boundary conditions alone without evaluating the original model, and enables uncertainty quantification through analytical post-processing of a reduced PCE; several deterministic examples of increasing complexity are presented.
    Abstract Surrogate modeling of costly mathematical models representing physical systems is challenging since it is typically not possible to create a large experimental design. Thus, it is beneficial to constrain the approximation to adhere to the known physics of the model. This paper presents a novel methodology for the construction of physics-informed polynomial chaos expansions (PCE) that combines the conventional experimental design with additional constraints from the physics of the model. Physical constraints investigated in this paper are represented by a set of differential equations and specified boundary conditions. A computationally efficient means for construction of physically constrained PCE is proposed and compared to standard sparse PCE. It is shown that the proposed algorithms lead to superior accuracy of the approximation and does not add significant computational burden. Although the main purpose of the proposed method lies in combining data and physical constraints, we show that physically constrained PCEs can be constructed from differential equations and boundary conditions alone without requiring evaluations of the original model. We further show that the constrained PCEs can be easily applied for uncertainty quantification through analytical post-processing of a reduced PCE filtering out the influence of all deterministic space-time variables. Several deterministic examples of increasing complexity are provided and the proposed method is applied for uncertainty quantification.

Blind Biological Sequence Denoising with Self-Supervised Set Learning

  • paper_url: http://arxiv.org/abs/2309.01670
  • repo_url: None
  • paper_authors: Nathan Ng, Ji Won Park, Jae Hyeon Lee, Ryan Lewis Kelly, Stephen Ra, Kyunghyun Cho
  • for: Denoise the imprecise output of high-throughput DNA sequencing platforms so that the data can better serve downstream scientific applications.
  • methods: Proposes Self-Supervised Set Learning (SSSL), which gathers the noisy subreads of the same sequence in an embedding space and estimates a single set embedding as the midpoint of the subreads in both the latent and sequence spaces; this "average" of the subreads is then decoded into a prediction of the clean sequence, without ever observing clean source labels.
  • results: On simulated long-read DNA data, SSSL denoises small reads of at most 6 subreads with 17% fewer errors and large reads of more than 6 subreads with 8% fewer errors than the best baseline; on a real antibody dataset it also improves over baselines, with the largest gains on difficult small reads that make up over 60% of the test set.
    Abstract Biological sequence analysis relies on the ability to denoise the imprecise output of sequencing platforms. We consider a common setting where a short sequence is read out repeatedly using a high-throughput long-read platform to generate multiple subreads, or noisy observations of the same sequence. Denoising these subreads with alignment-based approaches often fails when too few subreads are available or error rates are too high. In this paper, we propose a novel method for blindly denoising sets of sequences without directly observing clean source sequence labels. Our method, Self-Supervised Set Learning (SSSL), gathers subreads together in an embedding space and estimates a single set embedding as the midpoint of the subreads in both the latent and sequence spaces. This set embedding represents the "average" of the subreads and can be decoded into a prediction of the clean sequence. In experiments on simulated long-read DNA data, SSSL methods denoise small reads of $\leq 6$ subreads with 17% fewer errors and large reads of $>6$ subreads with 8% fewer errors compared to the best baseline. On a real dataset of antibody sequences, SSSL improves over baselines on two self-supervised metrics, with a significant improvement on difficult small reads that comprise over 60% of the test set. By accurately denoising these reads, SSSL promises to better realize the potential of high-throughput DNA sequencing data for downstream scientific applications.

Robust penalized least squares of depth trimmed residuals regression for high-dimensional data

  • paper_url: http://arxiv.org/abs/2309.01666
  • repo_url: None
  • paper_authors: Yijun Zuo
  • for: Addresses two challenges of high-dimensional data analysis: (i) the dimension p is often larger than the sample size n, and (ii) outliers or contaminated points are frequently hidden and more difficult to detect.
  • methods: Systematically examines leading penalized regression methods for high-dimensional data in terms of their robustness, and proposes a novel robust penalized regression method based on the least sum of squares of depth-trimmed residuals.
  • results: Most existing penalized regression methods can break down under a single outlier (or a single adversarially contaminated point), whereas the newly proposed method handles such cases better and achieves higher estimation and prediction accuracy in the experiments.
    Abstract Challenges with data in the big-data era include (i) the dimension $p$ is often larger than the sample size $n$ (ii) outliers or contaminated points are frequently hidden and more difficult to detect. Challenge (i) renders most conventional methods inapplicable. Thus, it attracts tremendous attention from statistics, computer science, and bio-medical communities. Numerous penalized regression methods have been introduced as modern methods for analyzing high-dimensional data. Disproportionate attention has been paid to the challenge (ii) though. Penalized regression methods can do their job very well and are expected to handle the challenge (ii) simultaneously. Most of them, however, can break down by a single outlier (or single adversary contaminated point) as revealed in this article. The latter systematically examines leading penalized regression methods in the literature in terms of their robustness, provides quantitative assessment, and reveals that most of them can break down by a single outlier. Consequently, a novel robust penalized regression method based on the least sum of squares of depth trimmed residuals is proposed and studied carefully. Experiments with simulated and real data reveal that the newly proposed method can outperform some leading competitors in estimation and prediction accuracy in the cases considered.

Locally Stationary Graph Processes

  • paper_url: http://arxiv.org/abs/2309.01657
  • repo_url: None
  • paper_authors: Abdullah Canbolat, Elif Vural
  • for: Proposes a locally stationary graph process model for irregular network topologies, accounting for local variations of the process characteristics encountered in practical problems.
  • methods: The locally stationary graph process (LSGP) model expresses the overall process as a combination of component processes whose influence varies smoothly over the graph; an algorithm for computing LSGP models from realizations of the process is proposed, and the local approximation of LSGPs with WSS processes is studied.
  • results: Experiments on signal interpolation problems show that the proposed process model provides accurate signal representations competitive with the state of the art.
    Abstract Stationary graph process models are commonly used in the analysis and inference of data sets collected on irregular network topologies. While most of the existing methods represent graph signals with a single stationary process model that is globally valid on the entire graph, in many practical problems, the characteristics of the process may be subject to local variations in different regions of the graph. In this work, we propose a locally stationary graph process (LSGP) model that aims to extend the classical concept of local stationarity to irregular graph domains. We characterize local stationarity by expressing the overall process as the combination of a set of component processes such that the extent to which the process adheres to each component varies smoothly over the graph. We propose an algorithm for computing LSGP models from realizations of the process, and also study the approximation of LSGPs locally with WSS processes. Experiments on signal interpolation problems show that the proposed process model provides accurate signal representations competitive with the state of the art.
    摘要 平稳图过程模型常用于不规则网络拓扑上采集数据的分析与推断。现有方法大多用单一的、在整个图上全局有效的平稳过程模型来表示图信号,但在许多实际问题中,过程的特性会在图的不同区域发生局部变化。本文提出一种局部平稳图过程(LSGP)模型,旨在将经典的局部平稳性概念推广到不规则图域。我们通过将整体过程表示为一组分量过程的组合来刻画局部平稳性,过程对各分量的依附程度在图上平滑变化。我们提出了由过程实现计算 LSGP 模型的算法,并研究了用宽平稳(WSS)过程对 LSGP 进行局部近似的问题。信号插值实验表明,所提出的过程模型能够提供与当前最优方法相当的高精度信号表示。

Representing Edge Flows on Graphs via Sparse Cell Complexes

  • paper_url: http://arxiv.org/abs/2309.01632
  • repo_url: None
  • paper_authors: Josef Hoppe, Michael T. Schaub
  • for: 获取对数据的稀疏、可解释性表示是机器学习和信号处理任务中的关键。
  • methods: 将图结构提升到 simplicial complex 中,然后使用 Hodge-Laplacian 的特征值和相应的 incidence matrix 来实现 Hodge 分解,从而将观察数据表示为梯度、旋转和响应流。
  • results: 在真实数据和合成数据上的实验表明,我们的近似算法能够高效地求解细胞推断(cell inference)优化问题,并优于现有方法。
    Abstract Obtaining sparse, interpretable representations of observable data is crucial in many machine learning and signal processing tasks. For data representing flows along the edges of a graph, an intuitively interpretable way to obtain such representations is to lift the graph structure to a simplicial complex: The eigenvectors of the associated Hodge-Laplacian, respectively the incidence matrices of the corresponding simplicial complex then induce a Hodge decomposition, which can be used to represent the observed data in terms of gradient, curl, and harmonic flows. In this paper, we generalize this approach to cellular complexes and introduce the cell inference optimization problem, i.e., the problem of augmenting the observed graph by a set of cells, such that the eigenvectors of the associated Hodge Laplacian provide a sparse, interpretable representation of the observed edge flows on the graph. We show that this problem is NP-hard and introduce an efficient approximation algorithm for its solution. Experiments on real-world and synthetic data demonstrate that our algorithm outperforms current state-of-the-art methods while being computationally efficient.
    摘要 获取可观测数据的稀疏、可解释表示是许多机器学习与信号处理任务的关键。对于表示图上边流的数据,一种直观可解释的做法是将图结构提升为单纯复形:相应 Hodge-Laplacian 的特征向量(或对应单纯复形的关联矩阵)诱导出 Hodge 分解,从而可以用梯度流、旋度流和调和流来表示观测数据。本文将这一方法推广到胞腔复形,并提出细胞推断优化问题,即向观测图中添加一组胞腔,使相应 Hodge-Laplacian 的特征向量给出观测边流的稀疏、可解释表示。我们证明该问题是 NP 难的,并给出一种高效的近似算法。真实数据与合成数据上的实验表明,我们的算法在保持计算高效的同时优于当前最优方法。
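The abstract above builds on the Hodge decomposition of edge flows. The sketch below computes that decomposition on a toy graph with one filled triangle, using the node-edge incidence matrix B1 and the edge-cell incidence matrix B2; the graph and cell choice are illustrative, and the paper's actual contribution (inferring which cells to add so the representation becomes sparse) is not shown.

```python
# Sketch of the Hodge decomposition of an edge flow on a small graph, using the
# node-edge incidence matrix B1 and the edge-to-2-cell incidence matrix B2.
import numpy as np

# Edges (tail, head) of a graph with 4 nodes; triangle (0,1,2) is the only 2-cell.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
n_nodes, n_edges = 4, len(edges)

B1 = np.zeros((n_nodes, n_edges))            # node-to-edge incidence
for j, (u, v) in enumerate(edges):
    B1[u, j], B1[v, j] = -1.0, 1.0

B2 = np.zeros((n_edges, 1))                  # edge-to-triangle incidence
# Triangle traversed 0->1->2->0: edges (0,1) and (1,2) positively, (0,2) reversed.
B2[0, 0], B2[1, 0], B2[2, 0] = 1.0, 1.0, -1.0

f = np.array([2.0, 1.0, 1.5, 0.5])           # observed edge flow

# Gradient part: projection onto im(B1^T); curl part: projection onto im(B2).
grad = B1.T @ np.linalg.lstsq(B1 @ B1.T, B1 @ f, rcond=None)[0]
curl = B2 @ np.linalg.lstsq(B2.T @ B2, B2.T @ f, rcond=None)[0]
harmonic = f - grad - curl                   # remainder is the harmonic flow
print("gradient:", grad.round(3))
print("curl:    ", curl.round(3))
print("harmonic:", harmonic.round(3))
```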

Dropout Attacks

  • paper_url: http://arxiv.org/abs/2309.01614
  • repo_url: https://github.com/ngunnar/Robustness_tutorial
  • paper_authors: Andrew Yuan, Alina Oprea, Cheng Tan
  • for: 本研究针对深度学习中用于防止过拟合的 dropout 操作,提出并分析一类新的投毒攻击。
  • methods: 本文提出了名为 DROPOUTATTACK 的新型攻击,通过操纵 dropout 操作中被丢弃神经元的选择(而非均匀随机选择)来实施攻击,并设计、实现和评估了覆盖多种场景的四个攻击变体。
  • results: 在 CIFAR-100 上训练 VGG-16 的实验中,该攻击可将受害类别的精确率(precision)降低 34.6%(从 81.7% 降至 47.1%),且不会引起模型整体准确率的下降。
    Abstract Dropout is a common operator in deep learning, aiming to prevent overfitting by randomly dropping neurons during training. This paper introduces a new family of poisoning attacks against neural networks named DROPOUTATTACK. DROPOUTATTACK attacks the dropout operator by manipulating the selection of neurons to drop instead of selecting them uniformly at random. We design, implement, and evaluate four DROPOUTATTACK variants that cover a broad range of scenarios. These attacks can slow or stop training, destroy prediction accuracy of target classes, and sabotage either precision or recall of a target class. In our experiments of training a VGG-16 model on CIFAR-100, our attack can reduce the precision of the victim class by 34.6% (from 81.7% to 47.1%) without incurring any degradation in model accuracy
    摘要 Dropout 是深度学习中常用的操作,通过在训练时随机丢弃神经元来防止过拟合。本文提出了一类针对神经网络的新型投毒攻击,称为 DROPOUTATTACK。该攻击通过操纵 dropout 操作中被丢弃神经元的选择(而非均匀随机选择)来实施。我们设计、实现并评估了覆盖多种场景的四个 DROPOUTATTACK 变体:这些攻击可以减慢或阻止训练、破坏目标类别的预测准确率,或有针对性地损害目标类别的精确率或召回率。在 CIFAR-100 上训练 VGG-16 的实验中,我们的攻击可将受害类别的精确率降低 34.6%(从 81.7% 降至 47.1%),且不会引起模型整体准确率的下降。
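A toy rendering of the core idea: a compromised dropout operator chooses which neurons to drop instead of sampling them uniformly at random. The selection rule used here (drop the highest-activation neurons) is only one illustrative choice and is not claimed to be the exact variant studied in the paper.

```python
# Toy illustration of a dropout-manipulation attack: the malicious operator
# picks which neurons to drop rather than choosing them uniformly at random.
import torch
import torch.nn.functional as F

def honest_dropout(x, p=0.5):
    return F.dropout(x, p=p, training=True)      # standard uniform-random mask

def malicious_dropout(x, p=0.5):
    k = int(p * x.shape[1])                      # number of neurons to drop
    idx = x.abs().topk(k, dim=1).indices         # most active neurons per sample
    mask = torch.ones_like(x)
    mask.scatter_(1, idx, 0.0)                   # zero out exactly those neurons
    return x * mask / (1.0 - p)                  # keep the usual inverted scaling

x = torch.randn(4, 8)
print(honest_dropout(x, 0.5))
print(malicious_dropout(x, 0.5))
```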

Fair Ranking under Disparate Uncertainty

  • paper_url: http://arxiv.org/abs/2309.01610
  • repo_url: None
  • paper_authors: Richa Rastogi, Thorsten Joachims
  • for: 提高排序系统的公平性,即使数据量不均衡时仍能保证各个群体的排名公平。
  • methods: 提出Equal-Opportunity Ranking(EOR)作为公平排序标准,并实现了一种可行的算法来实现EOR排名,时间复杂度为O(n * log(n))。
  • results: 经过synthetic数据、美国人口普查数据和Amazon搜索关键词数据的实验表明,该算法可靠地保证EOR公平性,同时提供有效的排名。
    Abstract Ranking is a ubiquitous method for focusing the attention of human evaluators on a manageable subset of options. Its use ranges from surfacing potentially relevant products on an e-commerce site to prioritizing college applications for human review. While ranking can make human evaluation far more effective by focusing attention on the most promising options, we argue that it can introduce unfairness if the uncertainty of the underlying relevance model differs between groups of options. Unfortunately, such disparity in uncertainty appears widespread, since the relevance estimates for minority groups tend to have higher uncertainty due to a lack of data or appropriate features. To overcome this fairness issue, we propose Equal-Opportunity Ranking (EOR) as a new fairness criterion for ranking that provably corrects for the disparity in uncertainty between groups. Furthermore, we present a practical algorithm for computing EOR rankings in time $O(n \log(n))$ and prove its close approximation guarantee to the globally optimal solution. In a comprehensive empirical evaluation on synthetic data, a US Census dataset, and a real-world case study of Amazon search queries, we find that the algorithm reliably guarantees EOR fairness while providing effective rankings.
    摘要 “排名是一种普遍存在的方法,用于集中人类评估者的注意力于可管理的子集中。它的应用范围从电子商务网站上浮出潜在有用的产品到审核大学申请。尽管排名可以使人类评估变得非常有效,但它可能引入不公正性,因为不同群体选项的后台相关性模型的不确定性差异较大。实际上,这种差异在少数群体选项中的相关性估计通常更高,因为这些选项的数据或特征不够。为解决这个公正性问题,我们提出了平等机会排名(EOR)作为一种新的公正性标准,可以正确地纠正不同群体选项之间的不确定性差异。此外,我们提出了一种实用的算法来计算EOR排名,时间复杂度为O(nlog(n)),并证明其与全球最佳解决方案的快近优化 garantia。在synthetic数据、US Census数据和amazon搜索查询的实际评估中,我们发现了这种算法可靠地保证EOR公正性,同时提供有效的排名。”

Drifter: Efficient Online Feature Monitoring for Improved Data Integrity in Large-Scale Recommendation Systems

  • paper_url: http://arxiv.org/abs/2309.08617
  • repo_url: None
  • paper_authors: Blaž Škrlj, Nir Ki-Tov, Lee Edelist, Natalia Silberstein, Hila Weisman-Zohar, Blaž Mramor, Davorin Kopič, Naama Ziporin
  • for: 这个论文是为了解决大规模、动态流中数据质量维护问题。
  • methods: 这个系统使用了新的在线特征监测和验证技术,以提供快速、有效、适应性强的数据质量监测,并能够实时检测数据质量问题的根本原因。
  • results: 对实际数据集进行评估,这个系统能够有效地发送警示并解决数据质量问题,提高了实时live推荐系统的可靠性和性能。
    Abstract Real-world production systems often grapple with maintaining data quality in large-scale, dynamic streams. We introduce Drifter, an efficient and lightweight system for online feature monitoring and verification in recommendation use cases. Drifter addresses limitations of existing methods by delivering agile, responsive, and adaptable data quality monitoring, enabling real-time root cause analysis, drift detection and insights into problematic production events. Integrating state-of-the-art online feature ranking for sparse data and anomaly detection ideas, Drifter is highly scalable and resource-efficient, requiring only two threads and less than a gigabyte of RAM per production deployments that handle millions of instances per minute. Evaluation on real-world data sets demonstrates Drifter's effectiveness in alerting and mitigating data quality issues, substantially improving reliability and performance of real-time live recommender systems.
    摘要 现实生产环境中, oftentimes 面临着大规模、动态流中数据质量维护的挑战。我们介绍了 Drifter,一种高效、轻量级的在线特征监测和验证系统,用于推荐使用情况下的数据质量监测。 Drifter 超越了现有方法的局限性,提供了快速、敏感、适应性的数据质量监测,允许实时根本原因分析、漂移检测和问题生成的生产事件中的深入洞察。 Drifter integrate 最新的在线特征排名技术和罕见检测思想,可扩展性强,只需两个线程和 less than 一 gigabyte 的内存,可执行 millions 个实例每分钟的生产部署。 评估实际数据集表明, Drifter 可以有效地预警和解决数据质量问题,substantially 提高实时live 推荐系统的可靠性和性能。
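Drifter itself is not open source, so the snippet below is only a minimal illustration of the kind of lightweight streaming check such a monitor performs: a per-feature reference window is frozen and a rolling recent window is flagged when its mean drifts too far from it. The window sizes, the z-score rule and the threshold are all assumptions, not Drifter's actual design.

```python
# Minimal sketch of a lightweight per-feature streaming drift check.
from collections import deque
import math
import random

class FeatureDriftMonitor:
    def __init__(self, window=500, z_threshold=4.0):
        self.reference = []                    # frozen once it reaches `window` values
        self.recent = deque(maxlen=window)
        self.window = window
        self.z_threshold = z_threshold

    def update(self, value):
        if len(self.reference) < self.window:
            self.reference.append(value)
            return False
        self.recent.append(value)
        if len(self.recent) < self.recent.maxlen:
            return False
        ref_mean = sum(self.reference) / len(self.reference)
        ref_var = sum((v - ref_mean) ** 2 for v in self.reference) / len(self.reference)
        rec_mean = sum(self.recent) / len(self.recent)
        # z-score of the recent window mean under the reference distribution
        z = abs(rec_mean - ref_mean) / math.sqrt(ref_var / len(self.recent) + 1e-12)
        return z > self.z_threshold            # True -> raise a drift alert

monitor = FeatureDriftMonitor()
alerts = [monitor.update(random.gauss(0.0, 1.0)) for _ in range(1000)]
alerts += [monitor.update(random.gauss(0.8, 1.0)) for _ in range(1000)]  # drifted stream
print("alert fired after the shift:", any(alerts[-500:]))
```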

Active flow control for three-dimensional cylinders through deep reinforcement learning

  • paper_url: http://arxiv.org/abs/2309.02462
  • repo_url: None
  • paper_authors: Pol Suárez, Francisco Alcántara-Ávila, Arnau Miró, Jean Rabault, Bernat Font, Oriol Lehmkuhl, R. Vinuesa
  • for: 降低缩扰系数(drag coefficient)
  • methods: 在圆柱展向布置多个可独立控制的零净质量流合成射流(synthetic jets),并采用深度强化学习框架,将计算流体力学求解器与使用近端策略优化(PPO)算法的智能体相耦合,同时引入多智能体强化学习框架
  • results: 在三种不同的问题配置中,实现了显著的缩扰减少
    Abstract This paper presents for the first time successful results of active flow control with multiple independently controlled zero-net-mass-flux synthetic jets. The jets are placed on a three-dimensional cylinder along its span with the aim of reducing the drag coefficient. The method is based on a deep-reinforcement-learning framework that couples a computational-fluid-dynamics solver with an agent using the proximal-policy-optimization algorithm. We implement a multi-agent reinforcement-learning framework which offers numerous advantages: it exploits local invariants, makes the control adaptable to different geometries, facilitates transfer learning and cross-application of agents and results in significant training speedup. In this contribution we report significant drag reduction after applying the DRL-based control in three different configurations of the problem.
    摘要 本文首次展示了利用多个可独立控制的零净质量流合成射流进行主动流动控制并取得成功的结果。射流沿三维圆柱的展向布置,目标是降低阻力系数。该方法基于深度强化学习框架,将计算流体力学求解器与使用近端策略优化算法的智能体相耦合。我们实现了多智能体强化学习框架,它具有多方面优势:利用局部不变性、使控制可适应不同几何形状、便于智能体的迁移学习与跨场景应用,并显著加快训练速度。在该问题的三种不同配置下应用基于 DRL 的控制后,我们观察到显著的阻力降低。

Passing Heatmap Prediction Based on Transformer Model and Tracking Data

  • paper_url: http://arxiv.org/abs/2309.01526
  • repo_url: None
  • paper_authors: Yisheng Pei, Varuna De Silva, Mike Caine
  • for: 这个研究旨在提供一种能够预测传球的潜在落点以及传球前球员跑动如何影响最终结果的深度学习网络架构,以更公正地评估球员的表现。
  • methods: 该研究构建了深度学习网络模型,分析了超过 28,000 次传球事件,取得了 0.7 以上的 Top-1 准确率。
  • results: 该研究表明,通过分析球员的无球跑动,可以更好地理解球场控制与传球选择对防守表现的影响,并为足球分析师提供更好的工具和指标来评估球员跑动对比赛策略和最终胜负的贡献。
    Abstract Although the data-driven analysis of football players' performance has been developed for years, most research only focuses on the on-ball event including shots and passes, while the off-ball movement remains a little-explored area in this domain. Players' contributions to the whole match are evaluated unfairly, those who have more chances to score goals earn more credit than others, while the indirect and unnoticeable impact that comes from continuous movement has been ignored. This research presents a novel deep-learning network architecture which is capable to predict the potential end location of passes and how players' movement before the pass affects the final outcome. Once analysed more than 28,000 pass events, a robust prediction can be achieved with more than 0.7 Top-1 accuracy. And based on the prediction, a better understanding of the pitch control and pass option could be reached to measure players' off-ball movement contribution to defensive performance. Moreover, this model could provide football analysts a better tool and metric to understand how players' movement over time contributes to the game strategy and final victory.
    摘要 尽管基于数据的足球运动员表现分析已经发展多年,但大多数研究只关注射门、传球等持球事件,无球跑动在该领域仍少有探索。球员对整场比赛的贡献因此被不公平地评估:获得更多进球机会的球员得到更多认可,而来自持续跑动的间接且不易察觉的影响却被忽视。本研究提出了一种新的深度学习网络架构,能够预测传球的潜在落点,以及传球前球员的跑动如何影响最终结果。在分析了超过 28,000 次传球事件后,模型取得了 0.7 以上 Top-1 准确率的稳健预测。基于该预测,可以更好地理解球场控制与传球选择,从而衡量球员无球跑动对防守表现的贡献。此外,该模型还能为足球分析师提供更好的工具和指标,帮助理解球员随时间的跑动如何影响比赛策略与最终胜负。

A Blackbox Model Is All You Need to Breach Privacy: Smart Grid Forecasting Models as a Use Case

  • paper_url: http://arxiv.org/abs/2309.01523
  • repo_url: None
  • paper_authors: Hussein Aly, Abdulaziz Al-Ali, Abdullah Al-Ali, Qutaibah Malluhi
  • for: 这篇论文研究了智能电网中预测模型的隐私风险,尤其是深度学习和预测模型在智能电网中的应用。
  • methods: 本研究使用了基于深度学习的预测模型,包括长短期记忆网络(LSTM),分析了预测模型泄露敏感信息的风险。
  • results: 研究发现,使用预测模型可以泄露智能电网系统中的全局性和隐私威胁,特别是通过黑盒访问LSTM模型可以泄露大量信息,与数据直接访问的情况相似(差异在1%以下,在ROC曲线下的面积)。这说明需要对预测模型进行保护,与数据一样重要。
    Abstract This paper investigates the potential privacy risks associated with forecasting models, with specific emphasis on their application in the context of smart grids. While machine learning and deep learning algorithms offer valuable utility, concerns arise regarding their exposure of sensitive information. Previous studies have focused on classification models, overlooking risks associated with forecasting models. Deep learning based forecasting models, such as Long Short Term Memory (LSTM), play a crucial role in several applications including optimizing smart grid systems but also introduce privacy risks. Our study analyzes the ability of forecasting models to leak global properties and privacy threats in smart grid systems. We demonstrate that a black box access to an LSTM model can reveal a significant amount of information equivalent to having access to the data itself (with the difference being as low as 1% in Area Under the ROC Curve). This highlights the importance of protecting forecasting models at the same level as the data.
    摘要 本文研究预测模型可能带来的隐私风险,并特别关注其在智能电网中的应用。机器学习和深度学习算法虽然提供了宝贵的效用,但其暴露敏感信息的问题令人担忧。以往的研究多集中于分类模型,忽视了预测模型相关的风险。基于深度学习的预测模型(如长短期记忆网络 LSTM)在优化智能电网系统等多种应用中扮演着关键角色,但同时也引入了隐私风险。我们分析了预测模型在智能电网系统中泄露全局属性和造成隐私威胁的能力,并证明:对 LSTM 模型的黑盒访问即可泄露与直接访问数据本身几乎等量的信息(以 ROC 曲线下面积衡量,差距低至 1%)。这凸显了应当将预测模型置于与数据同等的保护级别之下。

Hawkeye: Change-targeted Testing for Android Apps based on Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.01519
  • repo_url: None
  • paper_authors: Chao Peng, Zhengwei Lv, Jiarong Fu, Jiayuan Liang, Zhao Zhang, Ajitha Rajan, Ping Yang
  • for: 本研究的目的是提高Android应用程序更新的正确性,以避免在用户端出现漏洞。
  • methods: 本研究提出了一种指导测试方法,使用深度强化学习来优先执行更新后影响的GUI操作。
  • results: 在 10 个流行的开源 App 和 1 个商业 App 上的评估表明,对于开源 App 和大型商业 App,Hawkeye 比 FastBot2 和 ARES 更可靠地生成针对代码变更的 GUI 事件序列;在探索空间较小的小型开源 App 上,Hawkeye 取得了相当的性能。在开发流水线中的工业部署也表明,Hawkeye 适合对复杂商业 App 的合并请求进行冒烟测试。
    Abstract Android Apps are frequently updated to keep up with changing user, hardware, and business demands. Ensuring the correctness of App updates through extensive testing is crucial to avoid potential bugs reaching the end user. Existing Android testing tools generate GUI events focussing on improving the test coverage of the entire App rather than prioritising updates and its impacted elements. Recent research has proposed change-focused testing but relies on random exploration to exercise the updates and impacted GUI elements that is ineffective and slow for large complex Apps with a huge input exploration space. We propose directed testing of App updates with Hawkeye that is able to prioritise executing GUI actions associated with code changes based on deep reinforcement learning from historical exploration data. Our empirical evaluation compares Hawkeye with state-of-the-art model-based and reinforcement learning-based testing tools FastBot2 and ARES using 10 popular open-source and 1 commercial App. We find that Hawkeye is able to generate GUI event sequences targeting changed functions more reliably than FastBot2 and ARES for the open source Apps and the large commercial App. Hawkeye achieves comparable performance on smaller open source Apps with a more tractable exploration space. The industrial deployment of Hawkeye in the development pipeline also shows that Hawkeye is ideal to perform smoke testing for merge requests of a complicated commercial App.
    摘要 Android 应用会频繁更新以适应用户、硬件和业务需求的变化。通过充分的测试来保证更新的正确性,对于避免缺陷流入最终用户至关重要。现有的 Android 测试工具生成 GUI 事件时侧重于提升整个应用的测试覆盖率,而非优先测试更新及其受影响的元素。近期研究提出了面向变更的测试,但依赖随机探索来触达更新和受影响的 GUI 元素,对于输入探索空间巨大的大型复杂应用而言低效且缓慢。我们提出了面向应用更新的定向测试工具 Hawkeye,它基于历史探索数据,利用深度强化学习优先执行与代码变更相关的 GUI 操作。我们在 10 个流行的开源 App 和 1 个商业 App 上,将 Hawkeye 与当前最优的基于模型和基于强化学习的测试工具 FastBot2 和 ARES 进行了比较:对于开源 App 和大型商业 App,Hawkeye 能够更可靠地生成针对变更函数的 GUI 事件序列;在探索空间较小的小型开源 App 上取得了相当的性能。Hawkeye 在开发流水线中的工业部署也表明,它非常适合对复杂商业 App 的合并请求进行冒烟测试。

Federated cINN Clustering for Accurate Clustered Federated Learning

  • paper_url: http://arxiv.org/abs/2309.01515
  • repo_url: None
  • paper_authors: Yuhao Zhou, Minjia Shi, Yuxin Tian, Yuanxi Li, Qing Ye, Jiancheng Lv
  • for: 本研究旨在解决联邦学习(FL)与群体智能协同中的难题,特别是当客户端群体因数据异质性或任务不同而目标各异时。
  • methods: 提出了联邦 cINN 聚类算法(FCCA):利用全局编码器将每个客户端的私有数据变换为多元高斯分布,再用生成模型通过最大似然估计学习编码后的潜在特征,从而便于优化并避免模式坍塌;中心服务器收集收敛后的本地模型以近似客户端间的相似度并将其划分为不同簇。
  • results: 在多种模型和数据集上的大量实验表明,FCCA 优于其他最先进的聚类联邦学习算法,能够减少数据异质客户端之间的相互干扰并提升全局模型的性能。
    Abstract Federated Learning (FL) presents an innovative approach to privacy-preserving distributed machine learning and enables efficient crowd intelligence on a large scale. However, a significant challenge arises when coordinating FL with crowd intelligence which diverse client groups possess disparate objectives due to data heterogeneity or distinct tasks. To address this challenge, we propose the Federated cINN Clustering Algorithm (FCCA) to robustly cluster clients into different groups, avoiding mutual interference between clients with data heterogeneity, and thereby enhancing the performance of the global model. Specifically, FCCA utilizes a global encoder to transform each client's private data into multivariate Gaussian distributions. It then employs a generative model to learn encoded latent features through maximum likelihood estimation, which eases optimization and avoids mode collapse. Finally, the central server collects converged local models to approximate similarities between clients and thus partition them into distinct clusters. Extensive experimental results demonstrate FCCA's superiority over other state-of-the-art clustered federated learning algorithms, evaluated on various models and datasets. These results suggest that our approach has substantial potential to enhance the efficiency and accuracy of real-world federated learning tasks.
    摘要 联邦学习(FL)为保护隐私的分布式机器学习提供了一种创新途径,使大规模的群体智能成为可能。然而,当客户端群体因数据异质性或任务不同而目标各异时,如何协调联邦学习与群体智能便成为一个重要挑战。为此,我们提出联邦 cINN 聚类算法(FCCA),将客户端稳健地划分为不同的簇,避免数据异质客户端之间的相互干扰,从而提升全局模型的性能。具体而言,FCCA 利用全局编码器将每个客户端的私有数据变换为多元高斯分布,再利用生成模型通过最大似然估计学习编码后的潜在特征,这既便于优化又能避免模式坍塌;最后,中心服务器收集收敛后的本地模型,用以近似客户端之间的相似度并将其划分为不同的簇。在多种模型和数据集上的大量实验表明,FCCA 优于其他最先进的聚类联邦学习算法。这些结果表明,我们的方法在提升真实联邦学习任务的效率与精度方面具有巨大潜力。

Layer-wise training for self-supervised learning on graphs

  • paper_url: http://arxiv.org/abs/2309.01503
  • repo_url: None
  • paper_authors: Oscar Pina, Verónica Vilaplana
  • for: 训练大 graphs上的端到端图 neural networks (GNN) 存在多种内存和计算挑战,限制了深度的应用,因为深度会导致内存和空间复杂度的剧烈增长。
  • methods: 我们提出了层 wise Regularized Graph Infomax算法,通过自我超vised的方式来训练 GNN 层次。我们将 GNN 中的特征传播和特征变换分解出来,以学习节点表示,并从未来输入预测的基础上 derivate 一个损失函数。
  • results: 我们在归纳式大规模图上评估了该算法,其性能与其他端到端方法相当,且效率显著提升,使得在单个设备上即可训练更复杂的模型;此外,该算法还能避免深度 GNN 中常见的过平滑现象。
    Abstract End-to-end training of graph neural networks (GNN) on large graphs presents several memory and computational challenges, and limits the application to shallow architectures as depth exponentially increases the memory and space complexities. In this manuscript, we propose Layer-wise Regularized Graph Infomax, an algorithm to train GNNs layer by layer in a self-supervised manner. We decouple the feature propagation and feature transformation carried out by GNNs to learn node representations in order to derive a loss function based on the prediction of future inputs. We evaluate the algorithm in inductive large graphs and show similar performance to other end to end methods and a substantially increased efficiency, which enables the training of more sophisticated models in one single device. We also show that our algorithm avoids the oversmoothing of the representations, another common challenge of deep GNNs.
    摘要 在大规模图上对图神经网络(GNN)进行端到端训练面临诸多内存与计算挑战,且由于深度的增加会使内存与空间复杂度急剧上升,其应用往往局限于浅层结构。本文提出逐层正则化图信息最大化(Layer-wise Regularized Graph Infomax)算法,以自监督方式逐层训练 GNN。我们将 GNN 中的特征传播与特征变换解耦以学习节点表示,并基于对未来输入的预测导出损失函数。我们在归纳式大规模图上评估了该算法,其性能与其他端到端方法相当,而效率显著提升,使得在单个设备上即可训练更复杂的模型。我们还证明该算法能够避免表示的过平滑,这是深度 GNN 的另一个常见难题。
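A minimal sketch of layer-by-layer training: each GNN layer (decoupled into parameter-free propagation plus a learned transformation) is fit with a simple self-supervised objective and then frozen before the next layer is stacked. The reconstruction loss used here is an illustrative placeholder for the paper's Infomax-style objective, and the random graph is a toy stand-in.

```python
# Sketch of layer-wise GNN training: train one layer with a self-supervised
# loss, freeze it, then use its outputs as input to the next layer.
import torch
import torch.nn as nn

def propagate(adj_norm, h):
    return adj_norm @ h                        # feature propagation (no parameters)

n, d = 100, 16
adj = (torch.rand(n, n) < 0.05).float()
adj = ((adj + adj.t()) > 0).float()
adj.fill_diagonal_(1.0)                        # add self-loops
deg_inv_sqrt = adj.sum(dim=1).pow(-0.5)
adj_norm = deg_inv_sqrt[:, None] * adj * deg_inv_sqrt[None, :]
x = torch.randn(n, d)

hidden = x
for layer_idx in range(3):                     # three layers, trained one at a time
    layer = nn.Linear(hidden.shape[1], 32)
    decoder = nn.Linear(32, hidden.shape[1])   # auxiliary head, discarded afterwards
    opt = torch.optim.Adam(list(layer.parameters()) + list(decoder.parameters()), lr=1e-2)
    target = propagate(adj_norm, hidden).detach()
    for _ in range(200):
        z = torch.relu(layer(propagate(adj_norm, hidden)))
        loss = ((decoder(z) - target) ** 2).mean()   # placeholder self-supervised loss
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                      # freeze: outputs feed the next layer
        hidden = torch.relu(layer(propagate(adj_norm, hidden)))
    print(f"layer {layer_idx}: final loss {loss.item():.4f}")
```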

FinDiff: Diffusion Models for Financial Tabular Data Generation

  • paper_url: http://arxiv.org/abs/2309.01472
  • repo_url: None
  • paper_authors: Timur Sattarov, Marco Schreyer, Damian Borth
  • for: 这个论文的目的是为了生成真实世界金融数据表格,以便于经济enario模型、压力测试和诈骗探测等下游任务。
  • methods: 这篇论文使用了扩散模型,特别是噪声扩散模型,来生成模拟真实数据的数据。它使用嵌入编码来处理金融数据的混合类型特征,包括分类和数值特征。
  • results: 实验结果表明,FinDiff 能够生成高保真、高隐私且高实用性的合成金融表格数据;与最先进的基线模型相比,FinDiff 在三个真实金融数据集上表现出色。
    Abstract The sharing of microdata, such as fund holdings and derivative instruments, by regulatory institutions presents a unique challenge due to strict data confidentiality and privacy regulations. These challenges often hinder the ability of both academics and practitioners to conduct collaborative research effectively. The emergence of generative models, particularly diffusion models, capable of synthesizing data mimicking the underlying distributions of real-world data presents a compelling solution. This work introduces 'FinDiff', a diffusion model designed to generate real-world financial tabular data for a variety of regulatory downstream tasks, for example economic scenario modeling, stress tests, and fraud detection. The model uses embedding encodings to model mixed modality financial data, comprising both categorical and numeric attributes. The performance of FinDiff in generating synthetic tabular financial data is evaluated against state-of-the-art baseline models using three real-world financial datasets (including two publicly available datasets and one proprietary dataset). Empirical results demonstrate that FinDiff excels in generating synthetic tabular financial data with high fidelity, privacy, and utility.
    摘要 共享微数据,如基金投资和 derivate 工具,由 regulatory 机构提供的存在着独特的挑战,主要是由于严格的数据保密和隐私法规。这些挑战通常会阻碍学者和实践者进行有效的合作研究。随着生成模型的出现,特别是扩散模型,可以Synthesize 数据,模拟实际世界数据的下面分布。本文介绍了 'FinDiff',一种扩散模型,用于生成实际世界金融表格数据,用于经济enario模拟、压力测试和欺诈探测等下游任务。FinDiff 使用 embedding 编码来模型金融数据的混合模式,包括 both categorical 和 numeric 特征。FinDiff 在生成 synthetic 金融表格数据方面的性能被评估于现有的基eline 模型,使用三个实际世界金融数据集(包括两个公共可用数据集和一个专有数据集)。实际结果表明,FinDiff 可以高效地生成 synthetic 金融表格数据,具有高准确性、隐私和实用性。
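A sketch of the two ingredients the abstract names: categorical columns are mapped to learned embeddings and concatenated with numeric columns, and a denoiser is trained with a standard DDPM noise-prediction loss. The dimensions, noise schedule, time conditioning and network below are illustrative assumptions, not FinDiff's actual configuration.

```python
# Sketch: mixed-type tabular rows -> continuous vectors (embeddings + numerics),
# trained with a standard denoising-diffusion noise-prediction objective.
import torch
import torch.nn as nn

T = 100
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

n_cat, emb_dim, n_num = 10, 4, 3               # one categorical column, three numeric
embed = nn.Embedding(n_cat, emb_dim)
x_dim = emb_dim + n_num
denoiser = nn.Sequential(nn.Linear(x_dim + 1, 64), nn.ReLU(), nn.Linear(64, x_dim))
opt = torch.optim.Adam(list(embed.parameters()) + list(denoiser.parameters()), lr=1e-3)

cat = torch.randint(0, n_cat, (256,))          # toy batch of mixed-type records
num = torch.randn(256, n_num)

for step in range(200):
    x0 = torch.cat([embed(cat), num], dim=1)   # continuous representation of a row
    t = torch.randint(0, T, (x0.shape[0],))
    noise = torch.randn_like(x0)
    a_bar = alphas_bar[t].unsqueeze(1)
    xt = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise   # forward diffusion
    t_feat = (t.float() / T).unsqueeze(1)                 # crude time conditioning
    pred = denoiser(torch.cat([xt, t_feat], dim=1))
    loss = ((pred - noise) ** 2).mean()        # predict the added noise
    opt.zero_grad(); loss.backward(); opt.step()
print("final denoising loss:", loss.item())
```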

Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.01458
  • repo_url: None
  • paper_authors: Qisen Yang, Huanqian Wang, Mukun Tong, Wenjie Shi, Gao Huang, Shiji Song
  • for: 解释和解释深度强化学习(RL)Agent的行为
  • methods: 提出了一种新的框架(RL-in-RL),通过解决从动作到奖励的梯度断连问题,在可解释特征发现过程中保证奖励一致性
  • results: 在Atari 2600游戏和Duckietown自驾车 simulator环境中测试和评估了该方法,结果表明方法可以保持奖励一致性,并实现高质量的特征归属。进一步的分析实验也 validate了动作匹配原则的局限性。
    Abstract The black-box nature of deep reinforcement learning (RL) hinders them from real-world applications. Therefore, interpreting and explaining RL agents have been active research topics in recent years. Existing methods for post-hoc explanations usually adopt the action matching principle to enable an easy understanding of vision-based RL agents. In this paper, it is argued that the commonly used action matching principle is more like an explanation of deep neural networks (DNNs) than the interpretation of RL agents. It may lead to irrelevant or misplaced feature attribution when different DNNs' outputs lead to the same rewards or different rewards result from the same outputs. Therefore, we propose to consider rewards, the essential objective of RL agents, as the essential objective of interpreting RL agents as well. To ensure reward consistency during interpretable feature discovery, a novel framework (RL interpreting RL, denoted as RL-in-RL) is proposed to solve the gradient disconnection from actions to rewards. We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment. The results show that our method manages to keep reward (or return) consistency and achieves high-quality feature attribution. Further, a series of analytical experiments validate our assumption of the action matching principle's limitations.
    摘要 深度强化学习(RL)的黑盒特性阻碍了其在真实世界中的应用,因此近年来对 RL 智能体的解释与说明成为活跃的研究方向。现有的事后解释方法通常采用动作匹配原则,以便直观理解基于视觉的 RL 智能体。本文认为,常用的动作匹配原则更像是对深度神经网络(DNN)的解释,而非对 RL 智能体的解释:当不同的 DNN 输出带来相同奖励、或相同输出带来不同奖励时,它可能导致无关或错位的特征归因。因此,我们主张将奖励——RL 智能体的根本目标——同样作为解释 RL 智能体的根本目标。为了在可解释特征发现过程中保证奖励一致性,我们提出了一个新的框架(RL 解释 RL,记为 RL-in-RL),以解决从动作到奖励的梯度断连问题。我们在 Atari 2600 游戏以及具有挑战性的自动驾驶模拟环境 Duckietown 上对方法进行了验证与评估。结果表明,我们的方法能够保持奖励(回报)一致性并实现高质量的特征归因。此外,一系列分析实验进一步验证了我们关于动作匹配原则局限性的假设。

On the Consistency and Robustness of Saliency Explanations for Time Series Classification

  • paper_url: http://arxiv.org/abs/2309.01457
  • repo_url: None
  • paper_authors: Chiara Balestra, Bin Li, Emmanuel Müller
  • for: 本文旨在分析时间序列分类任务中针对时序特征与时间归因的显著性图解释的一致性与稳健性。
  • methods: 本文分别使用基于扰动和基于梯度的解释模型生成显著性解释,并在五个真实数据集上考察其一致性与稳健性。
  • results: 实验结果表明,两类模型给出的显著性解释在一定程度上都缺乏一致性与稳健性,凸显出为时间序列分类开发更可靠解释方法的必要性。
    Abstract Interpretable machine learning and explainable artificial intelligence have become essential in many applications. The trade-off between interpretability and model performance is the traitor to developing intrinsic and model-agnostic interpretation methods. Although model explanation approaches have achieved significant success in vision and natural language domains, explaining time series remains challenging. The complex pattern in the feature domain, coupled with the additional temporal dimension, hinders efficient interpretation. Saliency maps have been applied to interpret time series windows as images. However, they are not naturally designed for sequential data, thus suffering various issues. This paper extensively analyzes the consistency and robustness of saliency maps for time series features and temporal attribution. Specifically, we examine saliency explanations from both perturbation-based and gradient-based explanation models in a time series classification task. Our experimental results on five real-world datasets show that they all lack consistent and robust performances to some extent. By drawing attention to the flawed saliency explanation models, we motivate to develop consistent and robust explanations for time series classification.
    摘要 可解释机器学习与可解释人工智能已成为许多应用中不可或缺的部分。可解释性与模型性能之间的权衡,是发展内生与模型无关解释方法的主要障碍。尽管模型解释方法在视觉与自然语言领域取得了显著成功,解释时间序列仍然充满挑战:特征域中的复杂模式加上额外的时间维度,阻碍了高效的解释。显著性图已被用于将时间序列窗口当作图像来解释,但它并非为序列数据而设计,因而存在各种问题。本文对时间序列特征与时间归因的显著性图的一致性与稳健性进行了深入分析。具体而言,我们在时间序列分类任务中考察了基于扰动和基于梯度的解释模型所给出的显著性解释。在五个真实数据集上的实验结果表明,它们在一定程度上都缺乏一致且稳健的表现。通过揭示现有显著性解释模型的缺陷,我们希望推动为时间序列分类开发一致且稳健的解释方法。
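A minimal example of the kind of explanation being evaluated: a gradient-based saliency map for a 1D-CNN time series classifier, followed by a crude robustness score (correlation of the saliency map before and after a small input perturbation). The model, signal and robustness metric are toy stand-ins for the models, datasets and analyses in the paper.

```python
# Gradient-based saliency for a 1D-CNN time series classifier, plus a simple
# stability check of the explanation under a small input perturbation.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=5, padding=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(8, 2),
)

def saliency(x):
    x = x.clone().requires_grad_(True)
    logits = model(x)
    logits[0, logits.argmax()].backward()      # gradient of the predicted class
    return x.grad.abs().squeeze()              # |d logit / d input| per time step

x = torch.sin(torch.linspace(0, 6.28, 128)).view(1, 1, 128)
s_clean = saliency(x)
s_noisy = saliency(x + 0.01 * torch.randn_like(x))

def pearson(a, b):                             # correlation as a robustness score
    a, b = a - a.mean(), b - b.mean()
    return (a * b).sum() / (a.norm() * b.norm() + 1e-12)

print("saliency correlation under small perturbation:", pearson(s_clean, s_noisy).item())
```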

Hundreds Guide Millions: Adaptive Offline Reinforcement Learning with Expert Guidance

  • paper_url: http://arxiv.org/abs/2309.01448
  • repo_url: None
  • paper_authors: Qisen Yang, Shenzhi Wang, Qihang Zhang, Gao Huang, Shiji Song
  • for: 提高OFFLINE强化学习(RL)中的策略优化,以便在已经收集的数据集上无需与环境交互时仍能够得到优化的策略。
  • methods: 提出了一种基于导航网络的插件方法,名为指导OFFLINERL(GORL),该方法可以自动确定每个样本的策略改进和策略限制的相对重要性。
  • results: 经过广泛的实验表明,GORL可以轻松地与大多数OFFLINERL算法结合使用,并且在各种环境中提供了 statistically significant 的性能提升。
    Abstract Offline reinforcement learning (RL) optimizes the policy on a previously collected dataset without any interactions with the environment, yet usually suffers from the distributional shift problem. To mitigate this issue, a typical solution is to impose a policy constraint on a policy improvement objective. However, existing methods generally adopt a ``one-size-fits-all'' practice, i.e., keeping only a single improvement-constraint balance for all the samples in a mini-batch or even the entire offline dataset. In this work, we argue that different samples should be treated with different policy constraint intensities. Based on this idea, a novel plug-in approach named Guided Offline RL (GORL) is proposed. GORL employs a guiding network, along with only a few expert demonstrations, to adaptively determine the relative importance of the policy improvement and policy constraint for every sample. We theoretically prove that the guidance provided by our method is rational and near-optimal. Extensive experiments on various environments suggest that GORL can be easily installed on most offline RL algorithms with statistically significant performance improvements.
    摘要 离线强化学习(offline RL)在预先收集的数据集上优化策略而无需与环境交互,但通常受困于分布偏移问题。为缓解这一问题,典型做法是在策略改进目标上施加策略约束。然而,现有方法普遍采取"一刀切"的做法,即对一个 mini-batch 乃至整个离线数据集中的所有样本只保留单一的改进-约束权衡。本文认为,不同的样本应当施加不同强度的策略约束。基于这一想法,我们提出了一种名为引导式离线强化学习(GORL)的新型插件方法。GORL 借助一个引导网络以及少量专家示范,为每个样本自适应地确定策略改进与策略约束的相对重要性。我们从理论上证明了该引导是合理且近似最优的。在多种环境上的大量实验表明,GORL 可以方便地嵌入大多数离线 RL 算法,并带来统计显著的性能提升。
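A toy rendering of the mechanism described above: a small guiding network outputs a per-sample weight that trades a policy-improvement term against a behaviour-cloning-style constraint. Both loss terms below are simplified placeholders rather than the objective of any particular offline RL backbone, and the advantage values are random stand-ins for a learned critic.

```python
# Per-sample weighting of improvement vs. constraint, produced by a guide network.
import torch
import torch.nn as nn

obs_dim, n_actions = 8, 4
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
guide = nn.Sequential(nn.Linear(obs_dim + 1, 32), nn.ReLU(),
                      nn.Linear(32, 1), nn.Sigmoid())    # weight in (0, 1) per sample

obs = torch.randn(128, obs_dim)                # a batch from the offline dataset
act = torch.randint(0, n_actions, (128,))      # actions recorded in the dataset
advantage = torch.randn(128, 1)                # stand-in for a critic's advantage

logp = torch.log_softmax(policy(obs), dim=1).gather(1, act.unsqueeze(1))
w = guide(torch.cat([obs, act.float().unsqueeze(1)], dim=1))

improvement = -(advantage * logp)              # advantage-weighted improvement (placeholder)
constraint = -logp                             # behaviour-cloning constraint (placeholder)
loss = (w * improvement + (1.0 - w) * constraint).mean()
loss.backward()
print("combined per-sample-weighted loss:", loss.item())
```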

Expanding Mars Climate Modeling: Interpretable Machine Learning for Modeling MSL Relative Humidity

  • paper_url: http://arxiv.org/abs/2309.01424
  • repo_url: None
  • paper_authors: Nour Abdelmoneim, Dattaraj B. Dhuri, Dimitra Atri, Germán Martínez
  • for: 这个研究的目的是为了模拟火星气候,具体来说是在加乐沟地区测量的相对湿度。
  • methods: 该研究采用机器学习方法,构建了一个深度神经网络模型,并使用火星行星气候模型(Mars PCM)生成的模拟气象变量进行训练。
  • results: 研究结果表明,这个神经网络模型可以准确地预测加乐沟地区的相对湿度,误差在3%以内,$R^2$分数为0.92。此外,研究还发现了一种方法可以预测相对湿度的范围,这有助于应用需要范围值的场合。
    Abstract For the past several decades, numerous attempts have been made to model the climate of Mars with extensive studies focusing on the planet's dynamics and the understanding of its climate. While physical modeling and data assimilation approaches have made significant progress, uncertainties persist in comprehensively capturing and modeling the complexities of Martian climate. In this work, we propose a novel approach to Martian climate modeling by leveraging machine learning techniques that have shown remarkable success in Earth climate modeling. Our study presents a deep neural network designed to accurately model relative humidity in Gale Crater, as measured by NASA's Mars Science Laboratory ``Curiosity'' rover. By utilizing simulated meteorological variables produced by the Mars Planetary Climate Model, a robust Global Circulation Model, our model accurately predicts relative humidity with a mean error of 3\% and an $R^2$ score of 0.92. Furthermore, we present an approach to predict quantile ranges of relative humidity, catering to applications that require a range of values. To address the challenge of interpretability associated with machine learning models, we utilize an interpretable model architecture and conduct an in-depth analysis of its internal mechanisms and decision making processes. We find that our neural network can effectively model relative humidity at Gale crater using a few meteorological variables, with the monthly mean surface H$_2$O layer, planetary boundary layer height, convective wind speed, and solar zenith angle being the primary contributors to the model predictions. In addition to providing a fast and efficient method to modeling climate variables on Mars, this modeling approach can also be used to expand on current datasets by filling spatial and temporal gaps in observations.
    摘要 数十年来,人们为火星气候建模做出了大量尝试,围绕火星的动力学过程及其气候理解开展了广泛研究。尽管物理建模与数据同化方法取得了长足进展,但在全面刻画火星气候的复杂性方面仍存在不确定性。本文提出一种利用机器学习技术进行火星气候建模的新途径,此类技术已在地球气候建模中取得显著成功。我们设计了一个深度神经网络,用于准确模拟 NASA 火星科学实验室"好奇号"探测车在盖尔撞击坑测得的相对湿度。借助稳健的全球环流模型——火星行星气候模型所生成的模拟气象变量,我们的模型能够以 3% 的平均误差和 0.92 的 $R^2$ 分数准确预测相对湿度。我们还给出了预测相对湿度分位数区间的方法,以满足需要取值范围的应用。针对机器学习模型可解释性不足的难题,我们采用可解释的模型结构,并深入分析其内部机制与决策过程。我们发现,仅凭少量气象变量即可有效模拟盖尔撞击坑的相对湿度,其中月平均表面水汽柱、行星边界层高度、对流风速和太阳天顶角是模型预测的主要贡献因素。除了为火星气候变量建模提供快速高效的方法外,该建模途径还可用于填补观测在空间和时间上的缺口,从而扩展现有数据集。
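A compact sketch of the kind of regressor described: a small fully connected network mapping the four meteorological drivers named in the abstract to relative humidity. The architecture is an assumption, and the data below are synthetic placeholders rather than MSL measurements or Mars PCM output.

```python
# Small MLP regressing relative humidity on four meteorological features.
import torch
import torch.nn as nn

features = ["surface_h2o", "pbl_height", "convective_wind", "solar_zenith"]
net = nn.Sequential(nn.Linear(len(features), 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, 1), nn.Sigmoid())        # humidity in [0, 1]
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

x = torch.rand(2048, len(features))                        # placeholder inputs
y = (0.6 * x[:, :1] + 0.3 * (1 - x[:, 3:4])                # placeholder target
     + 0.05 * torch.randn(2048, 1)).clamp(0, 1)

for epoch in range(300):
    pred = net(x)
    loss = ((pred - y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print("training MSE:", loss.item())
```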

Differentiable Bayesian Structure Learning with Acyclicity Assurance

  • paper_url: http://arxiv.org/abs/2309.01392
  • repo_url: None
  • paper_authors: Quang-Duy Tran, Phuoc Nguyen, Bao Duong, Thin Nguyen
  • for: 本研究旨在为可微贝叶斯结构学习提出一种严格保证无环性的方法,确保生成的图结构是有向无环图。
  • methods: 本研究将变量拓扑排序的知识融入生成过程,从而严格约束所生成图结构的无环性。
  • results: 在模拟与真实数据上的实验表明,我们的方法在降低推断复杂度的同时保证所生成图结构无环,并优于相关的基于评分的贝叶斯方法。
    Abstract Score-based approaches in the structure learning task are thriving because of their scalability. Continuous relaxation has been the key reason for this advancement. Despite achieving promising outcomes, most of these methods are still struggling to ensure that the graphs generated from the latent space are acyclic by minimizing a defined score. There has also been another trend of permutation-based approaches, which concern the search for the topological ordering of the variables in the directed acyclic graph in order to limit the search space of the graph. In this study, we propose an alternative approach for strictly constraining the acyclicty of the graphs with an integration of the knowledge from the topological orderings. Our approach can reduce inference complexity while ensuring the structures of the generated graphs to be acyclic. Our empirical experiments with simulated and real-world data show that our approach can outperform related Bayesian score-based approaches.
    摘要 基于评分的结构学习方法因其可扩展性而蓬勃发展,连续松弛是这一进展的关键。尽管取得了可喜的成果,这类方法大多仍难以仅通过最小化既定评分来确保由潜在空间生成的图是无环的。另一类基于排列的方法则关注有向无环图中变量的拓扑排序,以限制图的搜索空间。本研究提出一种替代方案,通过融合拓扑排序的知识来严格约束图的无环性。我们的方法能够在降低推断复杂度的同时,确保所生成图结构是无环的。在模拟与真实数据上的实验表明,我们的方法优于相关的基于评分的贝叶斯方法。
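The construction that makes acyclicity automatic is easy to sketch: fix a topological ordering (a permutation of the variables) and only allow edges from earlier to later variables, i.e. a strictly upper-triangular adjacency in the permuted order. How the ordering and the edge scores are learned is the paper's contribution; in the sketch below both are random placeholders.

```python
# Acyclicity by construction: edges only from earlier to later variables in a
# given topological ordering.
import numpy as np

def dag_from_ordering(order, edge_probs, threshold=0.5):
    d = len(order)
    mask = np.triu(np.ones((d, d)), k=1)       # strictly upper-triangular mask
    adj_in_order = (edge_probs > threshold) * mask
    P = np.eye(d)[order]                       # permutation matrix for the ordering
    return P.T @ adj_in_order @ P              # back to the original variable indexing

rng = np.random.default_rng(0)
d = 5
order = rng.permutation(d)                     # a candidate topological ordering
edge_probs = rng.random((d, d))                # placeholder learned edge scores
A = dag_from_ordering(order, edge_probs)
print("adjacency:\n", A.astype(int))
# A DAG adjacency is nilpotent, so all eigenvalues are zero.
print("is acyclic:", np.allclose(np.linalg.eigvals(A), 0))
```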

Classic algorithms are fair learners: Classification Analysis of natural weather and wildfire occurrences

  • paper_url: http://arxiv.org/abs/2309.01381
  • repo_url: https://github.com/sengopal/classic-ml-review-paper
  • paper_authors: Senthilkumar Gopal
  • for: 这个论文旨在对常见的监督学习算法进行实际运行和数学分析,以了解它们在不同情况下的性能和特性。
  • methods: 这篇论文使用了多种常见的监督学习算法,包括决策树、扩展、支持向量机和k-最近邻居。
  • results: 该论文通过对稀疏表格数据进行分类任务的实验,发现这些经典算法在面对噪声和稀疏数据时仍然能够保持良好的泛化能力,并且可以通过不同的参数来提高分类精度。
    Abstract Classic machine learning algorithms have been reviewed and studied mathematically on its performance and properties in detail. This paper intends to review the empirical functioning of widely used classical supervised learning algorithms such as Decision Trees, Boosting, Support Vector Machines, k-nearest Neighbors and a shallow Artificial Neural Network. The paper evaluates these algorithms on a sparse tabular data for classification task and observes the effect on specific hyperparameters on these algorithms when the data is synthetically modified for higher noise. These perturbations were introduced to observe these algorithms on their efficiency in generalizing for sparse data and their utility of different parameters to improve classification accuracy. The paper intends to show that these classic algorithms are fair learners even for such limited data due to their inherent properties even for noisy and sparse datasets.
    摘要 经典机器学习算法的性能与性质已有详尽的数学研究。本文旨在对广泛使用的经典监督学习算法——决策树、提升方法、支持向量机、k 最近邻以及浅层人工神经网络——的实证表现进行回顾。本文在稀疏表格数据的分类任务上评估这些算法,并在人为加入更高噪声的数据上观察特定超参数对它们的影响。引入这些扰动是为了考察这些算法在稀疏数据上的泛化效率,以及利用不同参数提升分类精度的作用。本文意在说明,凭借其固有性质,这些经典算法即便面对如此有限且含噪、稀疏的数据,仍然是表现良好的学习器。
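A small scikit-learn sketch in the spirit of the study: the classic classifiers plus a shallow neural network evaluated on synthetic tabular data with few informative features and injected label noise. The dataset, noise level and hyperparameters are illustrative, not those used in the paper.

```python
# Classic supervised learners on sparse-signal synthetic data with label noise.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=50, n_informative=5,
                           n_redundant=0, random_state=0)
rng = np.random.default_rng(0)
flip = rng.random(len(y)) < 0.15               # inject 15% label noise
y_noisy = np.where(flip, 1 - y, y)
X_tr, X_te, y_tr, y_te = train_test_split(X, y_noisy, test_size=0.3, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "boosting": GradientBoostingClassifier(random_state=0),
    "svm (rbf)": SVC(),
    "k-nn": KNeighborsClassifier(n_neighbors=7),
    "shallow ann": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    print(f"{name:>13}: test accuracy {clf.score(X_te, y_te):.3f}")
```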

Mutual Information Maximizing Quantum Generative Adversarial Network and Its Applications in Finance

  • paper_url: http://arxiv.org/abs/2309.01363
  • repo_url: None
  • paper_authors: Mingyu Lee, Myeongjin Shin, Junseo Lee, Kabgyun Jeong
  • for: 本研究旨在应用于NISQ计算时代的量子机器学习领域,即使用量子机器学习来解决各种领域的问题。
  • methods: 本研究使用量子生成对抗网络(QGAN),并在其框架中引入互信息神经估计器(Mutual Information Neural Estimator,MINE),提出 InfoQGAN 以解决模式坍塌问题。
  • results: 研究表明, InfoQGAN 可以成功地解决模式塌陷问题,并在金融场景中应用于动态资产配置问题来生成 portefolio 返报分布。
    Abstract One of the most promising applications in the era of NISQ (Noisy Intermediate-Scale Quantum) computing is quantum machine learning. Quantum machine learning offers significant quantum advantages over classical machine learning across various domains. Specifically, generative adversarial networks have been recognized for their potential utility in diverse fields such as image generation, finance, and probability distribution modeling. However, these networks necessitate solutions for inherent challenges like mode collapse. In this study, we capitalize on the concept that the estimation of mutual information between high-dimensional continuous random variables can be achieved through gradient descent using neural networks. We introduce a novel approach named InfoQGAN, which employs the Mutual Information Neural Estimator (MINE) within the framework of quantum generative adversarial networks to tackle the mode collapse issue. Furthermore, we elaborate on how this approach can be applied to a financial scenario, specifically addressing the problem of generating portfolio return distributions through dynamic asset allocation. This illustrates the potential practical applicability of InfoQGAN in real-world contexts.
    摘要 在 NISQ(噪声中等规模量子)计算时代,量子机器学习是最有前景的应用之一,它在多个领域相对经典机器学习具有显著的量子优势。其中,生成对抗网络在图像生成、金融和概率分布建模等领域的潜在用途已得到认可,但这类网络需要解决模式坍塌等固有难题。本研究利用"高维连续随机变量之间的互信息可以通过神经网络的梯度下降来估计"这一思想,提出名为 InfoQGAN 的新方法,在量子生成对抗网络框架中引入互信息神经估计器(MINE)来应对模式坍塌问题。我们进一步阐述了该方法在金融场景中的应用,即通过动态资产配置生成投资组合收益分布,展示了 InfoQGAN 在真实场景中的实际应用潜力。
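A minimal, purely classical sketch of the MINE estimator the method builds on: the Donsker-Varadhan lower bound on mutual information is maximized by gradient descent over a small statistics network. The quantum generator and the InfoQGAN training loop are not shown; the correlated-Gaussian data and network size are illustrative.

```python
# MINE: maximize the Donsker-Varadhan lower bound on I(X; Y).
import torch
import torch.nn as nn

stat_net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(stat_net.parameters(), lr=1e-3)

def sample_correlated(n, rho=0.8):
    x = torch.randn(n, 1)
    y = rho * x + (1 - rho ** 2) ** 0.5 * torch.randn(n, 1)
    return x, y

for step in range(2000):
    x, y = sample_correlated(512)
    y_shuffled = y[torch.randperm(len(y))]     # samples from the product of marginals
    t_joint = stat_net(torch.cat([x, y], dim=1))
    t_marg = stat_net(torch.cat([x, y_shuffled], dim=1))
    # Donsker-Varadhan bound: E_joint[T] - log E_marg[exp(T)]
    mi_lb = (t_joint.mean()
             - torch.logsumexp(t_marg, dim=0).squeeze()
             + torch.log(torch.tensor(float(len(t_marg)))))
    loss = -mi_lb
    opt.zero_grad(); loss.backward(); opt.step()

true_mi = -0.5 * torch.log(torch.tensor(1 - 0.8 ** 2))   # analytic MI for this Gaussian pair
print(f"MINE estimate {mi_lb.item():.3f} vs analytic MI {true_mi.item():.3f}")
```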

Random Projections of Sparse Adjacency Matrices

  • paper_url: http://arxiv.org/abs/2309.01360
  • repo_url: None
  • paper_authors: Frank Qiu
  • for: 该论文旨在研究随机投影方法,用于表示稀疏图。
  • methods: 该论文使用随机投影方法,以保留图的功能性。
  • results: 该论文显示,随机投影方法可以在同一空间中表示不同大小的图和顶点集,并且可以精确地执行图算子。
    Abstract We analyze a random projection method for adjacency matrices, studying its utility in representing sparse graphs. We show that these random projections retain the functionality of their underlying adjacency matrices while having extra properties that make them attractive as dynamic graph representations. In particular, they can represent graphs of different sizes and vertex sets in the same space, allowing for the aggregation and manipulation of graphs in a unified manner. We also provide results on how the size of the projections need to scale in order to preserve accurate graph operations, showing that the size of the projections can scale linearly with the number of vertices while accurately retaining first-order graph information. We conclude by characterizing our random projection as a distance-preserving map of adjacency matrices analogous to the usual Johnson-Lindenstrauss map.
    摘要 我们分析了一种针对邻接矩阵的随机投影方法,研究其在表示稀疏图方面的效用。我们证明这些随机投影在保留底层邻接矩阵功能的同时,还具有一些额外性质,使其非常适合作为动态图的表示。特别地,它们可以在同一空间中表示不同规模、不同顶点集的图,从而以统一的方式对图进行聚合与操作。我们还给出了为保持图运算精度所需的投影规模的结果,表明投影规模只需随顶点数线性增长,即可准确保留一阶图信息。最后,我们将该随机投影刻画为邻接矩阵上的保距映射,与经典的 Johnson-Lindenstrauss 映射相类似。
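The basic operation is easy to sketch: map each vertex's adjacency column through a random Gaussian projection and check that norms and distances between columns are approximately preserved, in the Johnson-Lindenstrauss spirit the abstract refers to. The graph size, density and projection dimension below are illustrative, and this is not the paper's specific construction.

```python
# Random projection of sparse adjacency columns, with a quick check that
# column norms and pairwise distances are roughly preserved.
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
n, k = 2000, 256                               # number of vertices, projection dimension
A = sp.random(n, n, density=0.05, random_state=0, format="csr")
A.data[:] = 1.0                                # toy 0/1 adjacency matrix

R = rng.normal(scale=1.0 / np.sqrt(k), size=(k, n))
P = (A.T @ R.T).T                              # k x n: projected adjacency columns

i, j = 3, 17
deg_exact = A[:, i].power(2).sum()             # squared norm of column i
deg_proj = float(P[:, i] @ P[:, i])
diff = A[:, i] - A[:, j]
dist_exact = np.sqrt(diff.power(2).sum())      # distance between two columns
dist_proj = np.linalg.norm(P[:, i] - P[:, j])
print(f"column {i} squared norm: exact {deg_exact:.0f}, projected {deg_proj:.1f}")
print(f"distance between columns {i},{j}: exact {dist_exact:.2f}, projected {dist_proj:.2f}")
```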

MalwareDNA: Simultaneous Classification of Malware, Malware Families, and Novel Malware

  • paper_url: http://arxiv.org/abs/2309.01350
  • repo_url: None
  • paper_authors: Maksim E. Eren, Manish Bhattarai, Kim Rasmussen, Boian S. Alexandrov, Charles Nicholas
  • for: 本研究旨在提出一种新的机器学习方法,用于准确地识别新型恶意软件家族,同时整合了恶意软件/良好软件分类和恶意软件家族分类的能力。
  • methods: 该方法使用机器学习技术,并利用了各种特征和数据来进行分类。
  • results: 本研究的初步结果表明,该方法可以准确地识别新型恶意软件家族,并且可以整合恶意软件/良好软件分类和恶意软件家族分类的能力。
    Abstract Malware is one of the most dangerous and costly cyber threats to national security and a crucial factor in modern cyber-space. However, the adoption of machine learning (ML) based solutions against malware threats has been relatively slow. Shortcomings in the existing ML approaches are likely contributing to this problem. The majority of current ML approaches ignore real-world challenges such as the detection of novel malware. In addition, proposed ML approaches are often designed either for malware/benign-ware classification or malware family classification. Here we introduce and showcase preliminary capabilities of a new method that can perform precise identification of novel malware families, while also unifying the capability for malware/benign-ware classification and malware family classification into a single framework.
    摘要 恶意软件是危害国家安全的最危险、代价最高的网络威胁之一,也是现代网络空间中的关键因素。然而,基于机器学习(ML)的恶意软件防御方案的落地相对缓慢,现有 ML 方法的缺陷很可能是原因之一:大多数现有方法忽视了新型恶意软件检测等现实挑战,且往往只针对恶意/良性软件分类或恶意软件家族分类之一进行设计。本文介绍并初步展示了一种新方法的能力,它能够精确识别新型恶意软件家族,同时将恶意/良性软件分类与恶意软件家族分类统一到同一框架之中。

In-processing User Constrained Dominant Sets for User-Oriented Fairness in Recommender Systems

  • paper_url: http://arxiv.org/abs/2309.01335
  • repo_url: None
  • paper_authors: Zhongxuan Han, Chaochao Chen, Xiaolin Zheng, Weiming Liu, Jun Wang, Wenjie Cheng, Yuyuan Li
  • for: 本研究旨在解决推荐系统中的用户导向公平(User-Oriented Fairness,UOF)问题,即推荐性能对特定用户群体存在严重不公。
  • methods: 本研究提出了可施加于任意骨干推荐模型的处理中用户受限支配集(In-UCDS)框架,包含两个阶段:在 UCDS 建模阶段,为每个劣势用户抽取一个由若干与其相似的优势用户构成的受限支配集(用户簇);在处理中训练阶段,通过计算公平性损失,将劣势用户的表示拉近其对应的簇。公平性损失与骨干模型的原始损失相结合,从而在缓解 UOF 问题的同时保持整体推荐性能。
  • results: 在三个真实数据集上的全面实验表明,In-UCDS 优于最先进的方法,得到的模型更公平,同时整体推荐性能更好。
    Abstract Recommender systems are typically biased toward a small group of users, leading to severe unfairness in recommendation performance, i.e., User-Oriented Fairness (UOF) issue. The existing research on UOF is limited and fails to deal with the root cause of the UOF issue: the learning process between advantaged and disadvantaged users is unfair. To tackle this issue, we propose an In-processing User Constrained Dominant Sets (In-UCDS) framework, which is a general framework that can be applied to any backbone recommendation model to achieve user-oriented fairness. We split In-UCDS into two stages, i.e., the UCDS modeling stage and the in-processing training stage. In the UCDS modeling stage, for each disadvantaged user, we extract a constrained dominant set (a user cluster) containing some advantaged users that are similar to it. In the in-processing training stage, we move the representations of disadvantaged users closer to their corresponding cluster by calculating a fairness loss. By combining the fairness loss with the original backbone model loss, we address the UOF issue and maintain the overall recommendation performance simultaneously. Comprehensive experiments on three real-world datasets demonstrate that In-UCDS outperforms the state-of-the-art methods, leading to a fairer model with better overall recommendation performance.
    摘要 推荐系统通常偏向一小部分用户,导致推荐性能出现严重的不公平,即用户导向公平(UOF)问题。现有针对 UOF 的研究有限,且未能触及该问题的根源:优势用户与劣势用户之间的学习过程本身是不公平的。为此,我们提出处理中用户受限支配集(In-UCDS)框架,它是一个可应用于任意骨干推荐模型以实现用户导向公平的通用框架。In-UCDS 分为两个阶段,即 UCDS 建模阶段和处理中训练阶段。在 UCDS 建模阶段,我们为每个劣势用户抽取一个受限支配集(用户簇),其中包含若干与其相似的优势用户;在处理中训练阶段,我们通过计算公平性损失,将劣势用户的表示拉向其对应的簇。通过将公平性损失与骨干模型的原始损失相结合,我们在缓解 UOF 问题的同时保持整体推荐性能。在三个真实数据集上的全面实验表明,In-UCDS 优于最先进的方法,得到的模型更公平,且整体推荐性能更好。
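A toy rendering of the in-processing step described above: each disadvantaged user's embedding is pulled toward the mean embedding of its assigned cluster of similar advantaged users, and this fairness term is added to the backbone recommendation loss. How the constrained dominant sets are constructed, and the backbone loss itself, are assumed given and stubbed out here.

```python
# Fairness term pulling disadvantaged user embeddings toward their cluster centroid.
import torch

emb_dim, n_users = 16, 100
user_emb = torch.nn.Embedding(n_users, emb_dim)

# Assumed given: disadvantaged users and, for each, a cluster of advantaged users.
disadvantaged = [7, 23, 91]
clusters = {7: [1, 4, 12], 23: [3, 44, 60, 71], 91: [5, 9]}

def fairness_loss():
    losses = []
    for u in disadvantaged:
        anchor = user_emb(torch.tensor(clusters[u])).mean(dim=0).detach()  # cluster centroid
        losses.append(((user_emb(torch.tensor(u)) - anchor) ** 2).sum())
    return torch.stack(losses).mean()

backbone_loss = torch.tensor(0.0)              # placeholder for the recommender's own loss
lam = 0.1                                      # weight of the fairness term
floss = fairness_loss()
total = backbone_loss + lam * floss
total.backward()
print("fairness term:", floss.item())
```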

An ML-assisted OTFS vs. OFDM adaptable modem

  • paper_url: http://arxiv.org/abs/2309.01319
  • repo_url: None
  • paper_authors: I. Zakir Ahmed, Hamid R. Sadjadpour
  • for: 提高高速移动场景下的通信性能
  • methods: 使用深度神经网络(DNN)在发射端与接收端自适应地在 OTFS 与 OFDM 信号处理链之间切换
  • results: 与单独使用 OTFS 或 OFDM 相比,所提出的波形切换方案取得了显著改善的均方误差(MSE)性能
    Abstract The Orthogonal-Time-Frequency-Space (OTFS) signaling is known to be resilient to doubly-dispersive channels, which impacts high mobility scenarios. On the other hand, the Orthogonal-Frequency-Division-Multiplexing (OFDM) waveforms enjoy the benefits of the reuse of legacy architectures, simplicity of receiver design, and low-complexity detection. Several studies that compare the performance of OFDM and OTFS have indicated mixed outcomes due to the plethora of system parameters at play beyond high-mobility conditions. In this work, we exemplify this observation using simulations and propose a deep neural network (DNN)-based adaptation scheme to switch between using either an OTFS or OFDM signal processing chain at the transmitter and receiver for optimal mean-squared-error (MSE) performance. The DNN classifier is trained to switch between the two schemes by observing the channel condition, received SNR, and modulation format. We compare the performance of the OTFS, OFDM, and the proposed switched-waveform scheme. The simulations indicate superior performance with the proposed scheme with a well-trained DNN, thus improving the MSE performance of the communication significantly.
    摘要 正交时频空(OTFS)信号以其对双重弥散信道的鲁棒性著称,适用于高移动性场景;而正交频分复用(OFDM)波形则具有可复用既有架构、接收机设计简单、检测复杂度低等优点。由于除高移动性之外还涉及众多系统参数,已有对 OFDM 与 OTFS 性能的比较研究得出了不一致的结论。本文通过仿真说明了这一现象,并提出一种基于深度神经网络(DNN)的自适应方案,在发射端与接收端的 OTFS 与 OFDM 信号处理链之间切换,以获得最优的均方误差(MSE)性能。DNN 分类器根据信道条件、接收信噪比和调制格式进行切换判决。我们比较了 OTFS、OFDM 以及所提出的波形切换方案的性能,仿真结果表明,在 DNN 训练良好的情况下,所提方案性能更优,可显著改善通信的 MSE 性能。

Communication-Efficient Design of Learning System for Energy Demand Forecasting of Electrical Vehicles

  • paper_url: http://arxiv.org/abs/2309.01297
  • repo_url: None
  • paper_authors: Jiacong Xu, Riley Kilfoyle, Zixiang Xiong, Ligang Lu
  • for: 这篇论文是用于提出一种可以实现时间序列能源利用预测的机器学习模型,并且可以在各地的电动车充电站进行分布式训练。
  • methods: 这篇论文使用了最新的 transformer 架构和分布式学习(Federated Learning)的组合,以提高时间序列预测性能。
  • results: 比较这篇论文的时间序列预测性能和训练资料量,与其他模型的比较显示,这篇论文的模型可以与其他模型相比,同时具有更低的训练资料量。
    Abstract Machine learning (ML) applications to time series energy utilization forecasting problems are a challenging assignment due to a variety of factors. Chief among these is the non-homogeneity of the energy utilization datasets and the geographical dispersion of energy consumers. Furthermore, these ML models require vast amounts of training data and communications overhead in order to develop an effective model. In this paper, we propose a communication-efficient time series forecasting model combining the most recent advancements in transformer architectures implemented across a geographically dispersed series of EV charging stations and an efficient variant of federated learning (FL) to enable distributed training. The time series prediction performance and communication overhead cost of our FL are compared against their counterpart models and shown to have parity in performance while consuming significantly lower data rates during training. Additionally, the comparison is made across EV charging as well as other time series datasets to demonstrate the flexibility of our proposed model in generalized time series prediction beyond energy demand. The source code for this work is available at https://github.com/XuJiacong/LoGTST_PSGF
    摘要 机器学习(ML)应用于时间序列能源利用预测问题是一项复杂的任务,主要原因是时间序列数据的非均匀性和能源消耗者的地理分散。此外,这些ML模型需要大量的训练数据和通信占用量来建立有效的模型。在这篇论文中,我们提出了一种具有最新的变换架构和 federated learning(FL)的通信高效的时间序列预测模型,用于在地理分散的电动车充电站上进行分布式训练。我们对比了我们的FL模型和其他模型的时间序列预测性能和通信负担,并证明了我们的模型在通信成本下降的情况下保持了与其他模型相当的性能。此外,我们还对EV充电和其他时间序列数据集进行了比较,以示我们的模型在泛化时间序列预测中的灵活性。模型的源代码可以在https://github.com/XuJiacong/LoGTST_PSGF 上获取。
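A compact FedAvg-style sketch of the distributed setup described above: each charging station trains the shared forecaster locally for a few steps and the server averages the weights. A small GRU stands in for the paper's transformer-based forecaster, and the local datasets are random placeholders; see the linked repository for the authors' actual implementation.

```python
# FedAvg-style training of a shared one-step-ahead demand forecaster.
import copy
import torch
import torch.nn as nn

class Forecaster(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(1, 32, batch_first=True)
        self.head = nn.Linear(32, 1)
    def forward(self, x):                      # x: (batch, seq_len, 1)
        out, _ = self.rnn(x)
        return self.head(out[:, -1])           # forecast of the next value

def local_update(global_model, series, steps=20, lr=1e-2):
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        x, y = series[:, :-1, :], series[:, -1, :]
        loss = ((model(x) - y) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return model.state_dict()

global_model = Forecaster()
stations = [torch.randn(64, 25, 1) for _ in range(5)]   # toy local datasets

for rnd in range(3):                           # communication rounds
    local_states = [local_update(global_model, s) for s in stations]
    avg_state = {k: torch.stack([st[k] for st in local_states]).mean(dim=0)
                 for k in local_states[0]}     # server-side weight averaging
    global_model.load_state_dict(avg_state)
print("finished", rnd + 1, "federated rounds")
```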