results: The study finds that, although RL and supervised learning differ in some respects, similar optimisation dynamics, such as the edge of stability phenomenon, can be observed in certain settings. It also finds that the Huber loss used by DQN leads to a stronger edge of stability effect, whereas the cross entropy loss used by C51 shows no such effect.
Abstract
Recent progress has been made in understanding optimisation dynamics in neural networks trained with full-batch gradient descent with momentum with the uncovering of the edge of stability phenomenon in supervised learning. The edge of stability phenomenon occurs as the leading eigenvalue of the Hessian reaches the divergence threshold of the underlying optimisation algorithm for a quadratic loss, after which it starts oscillating around the threshold, and the loss starts to exhibit local instability but decreases over long time frames. In this work, we explore the edge of stability phenomenon in reinforcement learning (RL), specifically off-policy Q-learning algorithms across a variety of data regimes, from offline to online RL. Our experiments reveal that, despite significant differences to supervised learning, such as non-stationarity of the data distribution and the use of bootstrapping, the edge of stability phenomenon can be present in off-policy deep RL. Unlike supervised learning, however, we observe strong differences depending on the underlying loss, with DQN -- using a Huber loss -- showing a strong edge of stability effect that we do not observe with C51 -- using a cross entropy loss. Our results suggest that, while neural network structure can lead to optimisation dynamics that transfer between problem domains, certain aspects of deep RL optimisation can differentiate it from domains such as supervised learning.
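To make the loss contrast concrete, here is a minimal sketch (not the paper's code) of the two per-sample training losses the abstract compares: a Huber loss on a scalar TD error, as in DQN, and a cross entropy between a projected categorical return distribution and predicted atom logits, as in C51. The atom count and all numeric values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# DQN-style: Huber (smooth L1) loss on a scalar TD error.
q_pred = torch.tensor([1.7])          # Q(s, a) from the online network
td_target = torch.tensor([2.5])       # r + gamma * max_a' Q_target(s', a')
huber = F.smooth_l1_loss(q_pred, td_target)  # quadratic near 0, linear in the tails

# C51-style: cross entropy between a target categorical distribution
# over fixed return atoms and the predicted logits for those atoms.
num_atoms = 51                         # illustrative; C51 uses 51 atoms
logits = torch.randn(1, num_atoms)     # predicted distribution (pre-softmax)
target_probs = torch.softmax(torch.randn(1, num_atoms), dim=-1)  # projected target
cross_entropy = -(target_probs * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

print(float(huber), float(cross_entropy))
```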
On the Challenges of Deploying Privacy-Preserving Synthetic Data in the Enterprise
results: The paper proposes a strategic and systematic approach that can help enterprises effectively address the challenges and achieve their goals while establishing trust in deployed synthetic data solutions.
Abstract
Generative AI technologies are gaining unprecedented popularity, causing a mix of excitement and apprehension through their remarkable capabilities. In this paper, we study the challenges associated with deploying synthetic data, a subfield of Generative AI. Our focus centers on enterprise deployment, with an emphasis on privacy concerns caused by the vast amount of personal and highly sensitive data. We identify 40+ challenges and systematize them into five main groups -- i) generation, ii) infrastructure & architecture, iii) governance, iv) compliance & regulation, and v) adoption. Additionally, we discuss a strategic and systematic approach that enterprises can employ to effectively address the challenges and achieve their goals by establishing trust in the implemented solutions.
for: The paper proposes and experiments with the Forward Forward algorithm, a novel method for training neural networks as an alternative to backpropagation.
methods: The paper replicates Hinton's experiments on the MNIST dataset and extends the scope of the method with two significant contributions: establishing a baseline performance on the IMDb movie reviews dataset for sentiment analysis, and introducing a novel pyramidal optimization strategy for the loss threshold.
results: The Forward Forward network achieves good performance on the sentiment analysis task, and a good thresholding strategy causes a difference of up to 8% in test error. Visualizations of the trained parameters yield several significant insights, such as a notably larger (10-20x) mean and variance in the weights acquired by the Forward Forward network.
Abstract
The Forward Forward algorithm, proposed by Geoffrey Hinton in November 2022, is a novel method for training neural networks as an alternative to backpropagation. In this project, we replicate Hinton's experiments on the MNIST dataset, and subsequently extend the scope of the method with two significant contributions. First, we establish a baseline performance for the Forward Forward network on the IMDb movie reviews dataset. As far as we know, our results on this sentiment analysis task mark the first instance of the algorithm's extension beyond computer vision. Second, we introduce a novel pyramidal optimization strategy for the loss threshold - a hyperparameter specific to the Forward Forward method. Our pyramidal approach shows that a good thresholding strategy causes a difference of up to 8% in test error. Lastly, we perform visualizations of the trained parameters and derive several significant insights, such as a notably larger (10-20x) mean and variance in the weights acquired by the Forward Forward network. Repository: https://github.com/Ads-cmu/ForwardForward
Trajectory Alignment: Understanding the Edge of Stability Phenomenon via Bifurcation Theory
for: This work empirically studies the evolution of the largest eigenvalue of the loss Hessian during gradient descent (GD) training and the associated Edge of Stability (EoS) phenomenon.
methods: The work combines empirical studies with rigorous proofs to demonstrate a trajectory alignment phenomenon on a specific bifurcation diagram, independent of initialization, when EoS occurs.
results: During GD training, the largest eigenvalue of the loss Hessian sharply increases in the early phase (referred to as progressive sharpening) and eventually saturates close to the threshold of $2 / \text{(step size)}$. Additionally, the study establishes the trajectory alignment phenomenon for a two-layer fully-connected linear network and a single-neuron nonlinear network trained with a single data point.
Abstract
Cohen et al. (2021) empirically study the evolution of the largest eigenvalue of the loss Hessian, also known as sharpness, along the gradient descent (GD) trajectory and observe a phenomenon called the Edge of Stability (EoS). The sharpness increases at the early phase of training (referred to as progressive sharpening), and eventually saturates close to the threshold of $2 / \text{(step size)}$. In this paper, we start by demonstrating through empirical studies that when the EoS phenomenon occurs, different GD trajectories (after a proper reparameterization) align on a specific bifurcation diagram independent of initialization. We then rigorously prove this trajectory alignment phenomenon for a two-layer fully-connected linear network and a single-neuron nonlinear network trained with a single data point. Our trajectory alignment analysis establishes both progressive sharpening and EoS phenomena, encompassing and extending recent findings in the literature.
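A minimal sketch of how sharpness is typically tracked in EoS studies like this one: estimate the leading Hessian eigenvalue with power iteration on Hessian-vector products and compare it against the $2 / \text{(step size)}$ stability threshold. The toy model, data, and iteration count are assumptions for illustration, not the paper's setup.

```python
import torch

torch.manual_seed(0)
X, y = torch.randn(64, 10), torch.randn(64, 1)
model = torch.nn.Linear(10, 1)
params = list(model.parameters())

def loss_fn():
    return ((model(X) - y) ** 2).mean()

def sharpness(num_iters=50):
    """Leading Hessian eigenvalue via power iteration on Hessian-vector products."""
    v = [torch.randn_like(p) for p in params]
    lam = 0.0
    for _ in range(num_iters):
        grads = torch.autograd.grad(loss_fn(), params, create_graph=True)
        hv = torch.autograd.grad(grads, params, grad_outputs=v)  # Hessian @ v
        lam = sum((h * h).sum() for h in hv).sqrt().item()       # ||Hv|| for unit v
        v = [h / lam for h in hv]                                # renormalize
    return lam

step_size = 0.05
print(f"sharpness ~ {sharpness():.3f}, stability threshold 2/eta = {2 / step_size:.1f}")
```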
On the sample complexity of estimation in logistic regression
methods: The paper considers standard normal covariates and studies the sample complexity of estimating the parameters of the logistic regression model, characterizing how it depends on the estimation error and the inverse temperature.
results: The paper finds that the sample complexity curve for logistic regression parameter estimation has two change-points (or critical points) in the inverse temperature, clearly separating the low, moderate, and high temperature regimes.
Abstract
The logistic regression model is one of the most popular data generation models in noisy binary classification problems. In this work, we study the sample complexity of estimating the parameters of the logistic regression model up to a given $\ell_2$ error, in terms of the dimension and the inverse temperature, with standard normal covariates. The inverse temperature controls the signal-to-noise ratio of the data generation process. While both generalization bounds and asymptotic performance of the maximum-likelihood estimator for logistic regression are well-studied, the non-asymptotic sample complexity that shows the dependence on error and the inverse temperature for parameter estimation is absent from previous analyses. We show that the sample complexity curve has two change-points (or critical points) in terms of the inverse temperature, clearly separating the low, moderate, and high temperature regimes.
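A minimal sketch of the data-generation model described above: standard normal covariates with labels drawn from a logistic model whose inverse temperature $\beta$ sets the signal-to-noise ratio, followed by a plain maximum-likelihood fit. The dimension, sample count, step size, and the unit-norm parameterization are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, beta = 20, 5000, 2.0             # dimension, samples, inverse temperature
theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)          # unit-norm parameter (illustrative)

X = rng.standard_normal((n, d))         # standard normal covariates
p = 1.0 / (1.0 + np.exp(-beta * (X @ theta)))
y = rng.binomial(1, p)                  # noisy binary labels; larger beta -> higher SNR

# Maximum likelihood for w = beta * theta via plain gradient ascent.
w = np.zeros(d)
for _ in range(500):
    resid = y - 1.0 / (1.0 + np.exp(-(X @ w)))   # y - sigmoid(Xw)
    w += 0.1 * (X.T @ resid) / n                 # gradient of mean log-likelihood

theta_hat = w / beta                    # recover theta, assuming beta is known
print("l2 estimation error:", np.linalg.norm(theta_hat - theta))
```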
results: The paper argues that feedback is the key component of an automated essay scoring system and can help users improve their writing skills.
Abstract
The first automated essay scoring system was developed 50 years ago. Automated essay scoring systems are developing into systems with richer functions than the previous simple scoring systems. Its purpose is not only to score essays but also as a learning tool to improve the writing skill of users. Feedback is the most important aspect of making an automated essay scoring system useful in real life. The importance of feedback was already emphasized in the first AES system. This paper reviews research on feedback including different feedback types and essay traits on automated essay scoring. We also reviewed the latest case studies of the automated essay scoring system that provides feedback.
Latent Graph Attention for Enhanced Spatial Context
results: Incorporating LGA improves performance on three challenging applications: transparent object segmentation, image restoration for dehazing, and optical flow estimation.
Abstract
Global contexts in images are quite valuable in image-to-image translation problems. Conventional attention-based and graph-based models capture the global context to a large extent, however, these are computationally expensive. Moreover, the existing approaches are limited to only learning the pairwise semantic relation between any two points on the image. In this paper, we present Latent Graph Attention (LGA) a computationally inexpensive (linear to the number of nodes) and stable, modular framework for incorporating the global context in the existing architectures, especially empowering small-scale architectures to give performance closer to large size architectures, thus making the light-weight architectures more useful for edge devices with lower compute power and lower energy needs. LGA propagates information spatially using a network of locally connected graphs, thereby facilitating to construct a semantically coherent relation between any two spatially distant points that also takes into account the influence of the intermediate pixels. Moreover, the depth of the graph network can be used to adapt the extent of contextual spread to the target dataset, thereby being able to explicitly control the added computational cost. To enhance the learning mechanism of LGA, we also introduce a novel contrastive loss term that helps our LGA module to couple well with the original architecture at the expense of minimal additional computational load. We show that incorporating LGA improves the performance on three challenging applications, namely transparent object segmentation, image restoration for dehazing and optical flow estimation.
A Survey on Figure Classification Techniques in Scientific Documents
paper_authors: Anurag Dhote, Mohammed Javed, David S Doermann
for: This paper is written to provide a systematic review of existing methodologies and data sets for figure classification, with the goal of identifying current research gaps and providing possible directions for future research.
methods: The paper uses a categorization framework to classify figures into five classes - tables, photos, diagrams, maps, and plots - and presents a critical review of existing methodologies and data sets for figure classification.
results: The paper identifies current research gaps in figure classification and provides possible directions for future research, including the need for more diverse and annotated data sets, the development of more sophisticated machine learning algorithms, and the integration of figure classification with other NLP tasks.
Abstract
Figures visually represent an essential piece of information and provide an effective means to communicate scientific facts. Recently there have been many efforts toward extracting data directly from figures, specifically from tables, diagrams, and plots, using different Artificial Intelligence and Machine Learning techniques. This is because extracting information from figures could lead to deeper insights into the concepts highlighted in the scientific documents. In this survey paper, we systematically categorize figures into five classes - tables, photos, diagrams, maps, and plots - and subsequently present a critical review of the existing methodologies and data sets that address the problem of figure classification. Finally, we identify the current research gaps and provide possible directions for further research on figure classification.
results: In experiments on the CHARTINFO UB-UNITECH PMC dataset, a vision-based transformer model achieves state-of-the-art results in chart classification.
Abstract
Charts represent an essential source of visual information in documents and facilitate a deep understanding and interpretation of information typically conveyed numerically. In the scientific literature, there are many charts, each with its stylistic differences. Recently the document understanding community has begun to address the problem of automatic chart understanding, which begins with chart classification. In this paper, we present a survey of the current state-of-the-art techniques for chart classification and discuss the available datasets and their supported chart types. We broadly classify these contributions as traditional approaches based on ML, CNN, and Transformers. Furthermore, we carry out an extensive comparative performance analysis of CNN-based and transformer-based approaches on the recently published CHARTINFO UB-UNITECH PMC dataset for the CHART-Infographics competition at ICPR 2022. The data set includes 15 different chart categories, including 22,923 training images and 13,260 test images. We have implemented a vision-based transformer model that produces state-of-the-art results in chart classification.
On The Impact of Machine Learning Randomness on Group Fairness
paper_authors: Prakhar Ganesh, Hongyan Chang, Martin Strobel, Reza Shokri
for: This paper investigates group fairness metrics for machine learning algorithms and the instability of these metrics across different training instances.
methods: The authors study how different sources of randomness in the training process of machine learning algorithms affect group fairness metrics.
results: The study finds that the instability of group fairness metrics stems mainly from the randomness of data order during training, and that group-level accuracy can be controlled by changing the data order for a single epoch, without affecting the model's overall performance.
Abstract
Statistical measures for group fairness in machine learning reflect the gap in performance of algorithms across different groups. These measures, however, exhibit a high variance between different training instances, which makes them unreliable for empirical evaluation of fairness. What causes this high variance? We investigate the impact on group fairness of different sources of randomness in training neural networks. We show that the variance in group fairness measures is rooted in the high volatility of the learning process on under-represented groups. Further, we recognize the dominant source of randomness as the stochasticity of data order during training. Based on these findings, we show how one can control group-level accuracy (i.e., model fairness), with high efficiency and negligible impact on the model's overall performance, by simply changing the data order for a single epoch.
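A minimal sketch of the kind of controlled experiment the abstract describes: fix the weight initialization, vary only the data-order seed, and watch group-level accuracy fluctuate, especially for the under-represented group. The synthetic data, model, and group construction are placeholders, not the paper's benchmark.

```python
import torch

# Synthetic data: 2 groups, group 1 under-represented.
gen = torch.Generator().manual_seed(1)
X = torch.randn(1200, 10, generator=gen)
group = (torch.arange(1200) < 100).long()           # 100 minority, 1100 majority
w_true = torch.randn(10, generator=gen)
y = ((X @ w_true + 0.5 * group.float()) > 0).float()

def group_accuracy(order_seed, epochs=5):
    torch.manual_seed(0)                             # identical init every run
    model = torch.nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for epoch in range(epochs):
        perm = torch.randperm(len(X), generator=torch.Generator().manual_seed(order_seed + epoch))
        for i in range(0, len(X), 64):               # mini-batches in shuffled order
            idx = perm[i:i + 64]
            loss = torch.nn.functional.binary_cross_entropy_with_logits(
                model(X[idx]).squeeze(-1), y[idx])
            opt.zero_grad(); loss.backward(); opt.step()
    pred = (model(X).squeeze(-1) > 0).float()
    return [(pred[group == g] == y[group == g]).float().mean().item() for g in (0, 1)]

# Only the data order changes across runs; watch the minority-group accuracy move.
for seed in range(3):
    print(seed, group_accuracy(order_seed=100 * seed))
```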
results: Compared with existing state-of-the-art approaches, the proposed method improves localization accuracy and coverage, and the paper provides several design guidelines for GNN-enabled flow-guided localization.
Abstract
Scientific advancements in nanotechnology and advanced materials are paving the way toward nanoscale devices for in-body precision medicine; comprising integrated sensing, computing, communication, data and energy storage capabilities. In the human cardiovascular system, such devices are envisioned to be passively flowing and continuously sensing for detecting events of diagnostic interest. The diagnostic value of detecting such events can be enhanced by assigning to them their physical locations (e.g., body region), which is the main proposition of flow-guided localization. Current flow-guided localization approaches suffer from low localization accuracy and they are by-design unable to localize events within the entire cardiovascular system. Toward addressing this issue, we propose the utilization of Graph Neural Networks (GNNs) for this purpose, and demonstrate localization accuracy and coverage enhancements of our proposal over the existing State of the Art (SotA) approaches. Based on our evaluation, we provide several design guidelines for GNN-enabled flow-guided localization.
results: Using a recent NAS benchmark dataset and two carbon traces, CE-NAS achieves better carbon and search efficiency than the three baselines.
Abstract
This work presents a novel approach to neural architecture search (NAS) that aims to reduce energy costs and increase carbon efficiency during the model design process. The proposed framework, called carbon-efficient NAS (CE-NAS), consists of NAS evaluation algorithms with different energy requirements, a multi-objective optimizer, and a heuristic GPU allocation strategy. CE-NAS dynamically balances energy-efficient sampling and energy-consuming evaluation tasks based on current carbon emissions. Using a recent NAS benchmark dataset and two carbon traces, our trace-driven simulations demonstrate that CE-NAS achieves better carbon and search efficiency than the three baselines.
A Deep Learning Framework for Solving Hyperbolic Partial Differential Equations: Part I
results: Validated through numerical experiments and comparison with analytical solutions, the framework accurately approximates solutions to nonlinear partial differential equations (PDEs) and naturally handles constraints such as boundary conditions and entropy conditions.
Abstract
Physics informed neural networks (PINNs) have emerged as a powerful tool to provide robust and accurate approximations of solutions to partial differential equations (PDEs). However, PINNs face serious difficulties and challenges when trying to approximate PDEs with dominant hyperbolic character. This research focuses on the development of a physics informed deep learning framework to approximate solutions to nonlinear PDEs that can develop shocks or discontinuities without any a-priori knowledge of the solution or the location of the discontinuities. The work takes motivation from finite element method that solves for solution values at nodes in the discretized domain and use these nodal values to obtain a globally defined solution field. Built on the rigorous mathematical foundations of the discontinuous Galerkin method, the framework naturally handles imposition of boundary conditions (Neumann/Dirichlet), entropy conditions, and regularity requirements. Several numerical experiments and validation with analytical solutions demonstrate the accuracy, robustness, and effectiveness of the proposed framework.
FILM: How can Few-Shot Image Classification Benefit from Pre-Trained Language Models?
results: The paper proposes a novel few-shot learning framework that, through a carefully designed textual branch and a metric module, makes better use of semantic information, and that is trained with MAML via bi-level optimization for better transferability.
Abstract
Few-shot learning aims to train models that can be generalized to novel classes with only a few samples. Recently, a line of works has been proposed to enhance few-shot learning with accessible semantic information from class names. However, these works focus on improving existing modules such as visual prototypes and feature extractors of the standard few-shot learning framework. This limits the full potential use of semantic information. In this paper, we propose a novel few-shot learning framework that uses pre-trained language models based on contrastive learning. To address the challenge of alignment between visual features and textual embeddings obtained from a text-based pre-trained language model, we carefully design the textual branch of our framework and introduce a metric module to generalize the cosine similarity. For better transferability, we let the metric module adapt to different few-shot tasks and adopt MAML to train the model via bi-level optimization. Moreover, we conduct extensive experiments on multiple benchmarks to demonstrate the effectiveness of our method.
Learning Space-Time Continuous Neural PDEs from Partially Observed States
results: The model achieves state-of-the-art performance on complex synthetic and real-world datasets, outperforming previous methods and effectively handling partially observed data, showing its potential to advance data-driven PDE modeling and to enable robust, grid-independent modeling of complex dynamic processes.
Abstract
We introduce a novel grid-independent model for learning partial differential equations (PDEs) from noisy and partial observations on irregular spatiotemporal grids. We propose a space-time continuous latent neural PDE model with an efficient probabilistic framework and a novel encoder design for improved data efficiency and grid independence. The latent state dynamics are governed by a PDE model that combines the collocation method and the method of lines. We employ amortized variational inference for approximate posterior estimation and utilize a multiple shooting technique for enhanced training speed and stability. Our model demonstrates state-of-the-art performance on complex synthetic and real-world datasets, overcoming limitations of previous approaches and effectively handling partially-observed data. The proposed model outperforms recent methods, showing its potential to advance data-driven PDE modeling and enabling robust, grid-independent modeling of complex partially-observed dynamic processes.
paper_authors: Chia-Yuan Chang, Yu-Neng Chuang, Kwei-Herng Lai, Xiaotian Han, Xia Hu, Na Zou
for: Mitigating unfair prediction behaviors of machine learning models.
methods: Automatically detecting biased feature interactions to mitigate unfair prediction behaviors, without requiring sensitive attributes.
results: Experimental results show that the proposed framework reduces unfair prediction behaviors on four real-world datasets, without requiring assumptions about correlations between sensitive and non-sensitive attributes.
Abstract
Despite the impressive prediction ability, machine learning models show discrimination towards certain demographics and suffer from unfair prediction behaviors. To alleviate the discrimination, extensive studies focus on eliminating the unequal distribution of sensitive attributes via multiple approaches. However, due to privacy concerns, sensitive attributes are often either unavailable or missing in real-world scenarios. Therefore, several existing works alleviate the bias without sensitive attributes. Those studies face challenges, either in inaccurate predictions of sensitive attributes or the need to mitigate unequal distribution of manually defined non-sensitive attributes related to bias. The latter requires strong assumptions about the correlation between sensitive and non-sensitive attributes. As data distribution and task goals vary, the strong assumption on non-sensitive attributes may not be valid and require domain expertise. In this work, we propose an assumption-free framework to detect the related attributes automatically by modeling feature interaction for bias mitigation. The proposed framework aims to mitigate the unfair impact of identified biased feature interactions. Experimental results on four real-world datasets demonstrate that our proposed framework can significantly alleviate unfair prediction behaviors by considering biased feature interactions.
A generative flow for conditional sampling via optimal transport
paper_authors: Jason Alfonso, Ricardo Baptista, Anupam Bhakta, Noam Gal, Alfin Hou, Isa Lyubimova, Daniel Pocklington, Josef Sajonz, Giulio Trigila, Ryan Tsai
methods: The paper uses a non-parametric generative model that iteratively maps reference samples to the target distribution. The model uses block-triangular transport maps, whose components characterize the conditionals of the target distribution. These maps arise from solving an optimal transport problem with a weighted $L^2$ cost function.
results: Experiments show that the approach successfully characterizes many non-Gaussian problems and is more stable and reliable than traditional normalizing flows and generative adversarial networks.
Abstract
Sampling conditional distributions is a fundamental task for Bayesian inference and density estimation. Generative models, such as normalizing flows and generative adversarial networks, characterize conditional distributions by learning a transport map that pushes forward a simple reference (e.g., a standard Gaussian) to a target distribution. While these approaches successfully describe many non-Gaussian problems, their performance is often limited by parametric bias and the reliability of gradient-based (adversarial) optimizers to learn these transformations. This work proposes a non-parametric generative model that iteratively maps reference samples to the target. The model uses block-triangular transport maps, whose components are shown to characterize conditionals of the target distribution. These maps arise from solving an optimal transport problem with a weighted $L^2$ cost function, thereby extending the data-driven approach in [Trigila and Tabak, 2016] for conditional sampling. The proposed approach is demonstrated on a two dimensional example and on a parameter inference problem involving nonlinear ODEs.
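A minimal sketch of why block-triangular maps give conditional samples: if $T(x, z) = (x, T_2(x, z))$ pushes forward a reference with $z \sim N(0, 1)$ to the joint target, then pushing fresh $z$ through $T_2(x^*, \cdot)$ samples the conditional at a fixed $x^*$. The map below is hand-picked for a toy target, not learned from an optimal transport problem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy joint target: y | x ~ N(sin(x), 0.2^2). A block-triangular map that
# pushes (x, z), z ~ N(0, 1), to (x, y) is T2(x, z) = sin(x) + 0.2 * z.
def T2(x, z):
    return np.sin(x) + 0.2 * z

x_star = 1.3                          # condition on this observation
z = rng.standard_normal(10_000)       # fresh reference samples
y_cond = T2(x_star, z)                # samples from y | x = x_star

print("conditional mean ~", y_cond.mean(), "(target:", np.sin(x_star), ")")
```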
GNP Attack: Transferable Adversarial Examples via Gradient Norm Penalty
methods: The paper proposes a new method, Gradient Norm Penalty (GNP), to improve the transferability of adversarial examples. GNP drives the loss function optimization procedure to converge to a flat region of local optima in the loss landscape, which improves how well the adversarial examples generalize across models.
results: Experiments attacking 11 state-of-the-art deep learning models and 6 advanced defense methods demonstrate that GNP is highly effective and flexible; it can be easily integrated with other gradient-based methods for stronger transfer-based attacks.
Abstract
Adversarial examples (AE) with good transferability enable practical black-box attacks on diverse target models, where insider knowledge about the target models is not required. Previous methods often generate AE with no or very limited transferability; that is, they easily overfit to the particular architecture and feature representation of the source, white-box model and the generated AE barely work for target, black-box models. In this paper, we propose a novel approach to enhance AE transferability using Gradient Norm Penalty (GNP). It drives the loss function optimization procedure to converge to a flat region of local optima in the loss landscape. By attacking 11 state-of-the-art (SOTA) deep learning models and 6 advanced defense methods, we empirically show that GNP is very effective in generating AE with high transferability. We also demonstrate that it is very flexible in that it can be easily integrated with other gradient based methods for stronger transfer-based attacks.
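A minimal sketch of one plausible reading of the gradient norm penalty: augment the usual attack objective with a penalty on the input-gradient norm so the optimization settles in a flat region of the loss landscape. The penalty weight, step sizes, and single-step structure are illustrative assumptions, not the paper's exact procedure.

```python
import torch

def gnp_attack_step(model, x, y, lam=0.1, alpha=2/255, eps=8/255, x_orig=None):
    """One FGSM-like ascent step on loss(x) - lam * ||grad_x loss(x)||."""
    x = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    g = torch.autograd.grad(loss, x, create_graph=True)[0]     # input gradient
    penalized = loss - lam * g.flatten(1).norm(dim=1).mean()   # favor flat optima
    step = torch.autograd.grad(penalized, x)[0].sign()
    x_adv = x.detach() + alpha * step
    if x_orig is not None:                                     # project to eps-ball
        x_adv = x_orig + (x_adv - x_orig).clamp(-eps, eps)
    return x_adv.clamp(0, 1)
```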
SpreadNUTS – Moderate Dynamic Extension of Paths for No-U-Turn Sampling & Partitioning Visited Regions
for: This paper aims to improve the efficiency and speed of convergence of Hamiltonian Monte Carlo (HMC) methods for sampling distributions.
methods: The paper introduces modifications to the no-U-turn sampler (NUTS) algorithm to explore the sample space faster and achieve faster convergence to the true distribution.
results: The modified NUTS algorithm is shown to have faster convergence to the true distribution than the original NUTS algorithm.
Abstract
Markov chain Monte Carlo (MCMC) methods have existed for a long time and the field is well-explored. The purpose of MCMC methods is to approximate a distribution through repeated sampling; most MCMC algorithms exhibit asymptotically optimal behavior in that they converge to the true distribution at the limit. However, what differentiates these algorithms are their practical convergence guarantees and efficiency. While a sampler may eventually approximate a distribution well, because it is used in the real world it is necessary that the point at which the sampler yields a good estimate of the distribution is reachable in a reasonable amount of time. Similarly, if it is computationally difficult or intractable to produce good samples from a distribution for use in estimation, then there is no real-world utility afforded by the sampler. Thus, most MCMC methods these days focus on improving efficiency and speeding up convergence. However, many MCMC algorithms suffer from random walk behavior and often only mitigate such behavior as outright erasing random walks is difficult. Hamiltonian Monte Carlo (HMC) is a class of MCMC methods that theoretically exhibit no random walk behavior because of properties related to Hamiltonian dynamics. This paper introduces modifications to a specific HMC algorithm known as the no-U-turn sampler (NUTS) that aims to explore the sample space faster than NUTS, yielding a sampler that has faster convergence to the true distribution than NUTS.
Restricted Generative Projection for One-Class Classification and Anomaly Detection
results: Comparative studies on multiple benchmark datasets show that our method is more effective than the baseline methods.
Abstract
We present a simple framework for one-class classification and anomaly detection. The core idea is to learn a mapping to transform the unknown distribution of training (normal) data to a known target distribution. Crucially, the target distribution should be sufficiently simple, compact, and informative. The simplicity is to ensure that we can sample from the distribution easily, the compactness is to ensure that the decision boundary between normal data and abnormal data is clear and reliable, and the informativeness is to ensure that the transformed data preserve the important information of the original data. Therefore, we propose to use truncated Gaussian, uniform in hypersphere, uniform on hypersphere, or uniform between hyperspheres, as the target distribution. We then minimize the distance between the transformed data distribution and the target distribution while keeping the reconstruction error for the original data small enough. Comparative studies on multiple benchmark datasets verify the effectiveness of our methods in comparison to baselines.
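A minimal sketch of two of the target distributions named above, uniform on and uniform in the hypersphere, together with a stand-in distribution distance (a plain Gaussian-kernel MMD; the paper's actual distance measure may differ).

```python
import torch

def uniform_on_sphere(n, d):
    v = torch.randn(n, d)
    return v / v.norm(dim=1, keepdim=True)

def uniform_in_ball(n, d):
    s = uniform_on_sphere(n, d)
    r = torch.rand(n, 1) ** (1.0 / d)         # radius correction for uniform volume
    return r * s

def mmd(a, b, sigma=1.0):
    """Plain Gaussian-kernel MMD^2 between two sample sets (stand-in distance)."""
    def k(x, y):
        return torch.exp(-torch.cdist(x, y) ** 2 / (2 * sigma ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

z = torch.randn(256, 8)                       # pretend: encoder outputs for normal data
target = uniform_in_ball(256, 8)              # simple, compact target distribution
print("distance to target:", mmd(z, target).item())
```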
Class-Incremental Mixture of Gaussians for Deep Continual Learning
results: Experiments show that the model can learn effectively in memory-free scenarios with fixed extractors and is competitive with state-of-the-art continual learning baselines.
Abstract
Continual learning models for stationary data focus on learning and retaining concepts coming to them in a sequential manner. In the most generic class-incremental environment, we have to be ready to deal with classes coming one by one, without any higher-level grouping. This requirement invalidates many previously proposed methods and forces researchers to look for more flexible alternative approaches. In this work, we follow the idea of centroid-driven methods and propose end-to-end incorporation of the mixture of Gaussians model into the continual learning framework. By employing the gradient-based approach and designing losses capable of learning discriminative features while avoiding degenerate solutions, we successfully combine the mixture model with a deep feature extractor allowing for joint optimization and adjustments in the latent space. Additionally, we show that our model can effectively learn in memory-free scenarios with fixed extractors. In the conducted experiments, we empirically demonstrate the effectiveness of the proposed solutions and exhibit the competitiveness of our model when compared with state-of-the-art continual learning baselines evaluated in the context of image classification problems.
Properly Learning Decision Trees with Queries Is NP-Hard
results: Our result, taken together with a recent almost-polynomial time query algorithm for properly learning decision trees under the uniform distribution (Blanc-Lange-Qiao-Tan 2022), demonstrates the dramatic impact of distributional assumptions on the problem.
Abstract
We prove that it is NP-hard to properly PAC learn decision trees with queries, resolving a longstanding open problem in learning theory (Bshouty 1993; Guijarro-Lavin-Raghavan 1999; Mehta-Raghavan 2002; Feldman 2016). While there has been a long line of work, dating back to (Pitt-Valiant 1988), establishing the hardness of properly learning decision trees from random examples, the more challenging setting of query learners necessitates different techniques and there were no previous lower bounds. En route to our main result, we simplify and strengthen the best known lower bounds for a different problem of Decision Tree Minimization (Zantema-Bodlaender 2000; Sieling 2003). On a technical level, we introduce the notion of hardness distillation, which we study for decision tree complexity but can be considered for any complexity measure: for a function that requires large decision trees, we give a general method for identifying a small set of inputs that is responsible for its complexity. Our technique even rules out query learners that are allowed constant error. This contrasts with existing lower bounds for the setting of random examples which only hold for inverse-polynomial error. Our result, taken together with a recent almost-polynomial time query algorithm for properly learning decision trees under the uniform distribution (Blanc-Lange-Qiao-Tan 2022), demonstrates the dramatic impact of distributional assumptions on the problem.
DebateKG: Automatic Policy Debate Case Creation with Semantic Knowledge Graphs
results: For Policy Debate, a form of American competitive debate, the paper extends the DebateSum dataset with 53180 new examples and further useful metadata for every example, and uses the txtai semantic search and knowledge graph toolchain to build 9 semantic knowledge graphs. It also introduces a method for evaluating which knowledge graphs are better at generating policy debate cases.
Abstract
Recent work within the Argument Mining community has shown the applicability of Natural Language Processing systems for solving problems found within competitive debate. One of the most important tasks within competitive debate is for debaters to create high quality debate cases. We show that effective debate cases can be constructed using constrained shortest path traversals on Argumentative Semantic Knowledge Graphs. We study this potential in the context of a type of American Competitive Debate, called Policy Debate, which already has a large scale dataset targeting it called DebateSum. We significantly improve upon DebateSum by introducing 53180 new examples, as well as further useful metadata for every example, to the dataset. We leverage the txtai semantic search and knowledge graph toolchain to produce and contribute 9 semantic knowledge graphs built on this dataset. We create a unique method for evaluating which knowledge graphs are better in the context of producing policy debate cases. A demo which automatically generates debate cases, along with all other code and the Knowledge Graphs, are open-sourced and made available to the public here: https://github.com/Hellisotherpeople/DebateKG
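A minimal sketch of the core graph operation the abstract describes: building an argument chain as a weighted shortest path through a semantic graph, here with networkx and edge weights standing in for one minus semantic similarity. The graph, node names, and weights are invented placeholders, not DebateKG's actual data.

```python
import networkx as nx

# Toy argument graph: nodes are evidence snippets, edge weights are
# (1 - semantic similarity), so shorter paths chain more related arguments.
G = nx.Graph()
G.add_edge("fossil fuel subsidies", "emissions rise", weight=0.2)
G.add_edge("emissions rise", "climate instability", weight=0.3)
G.add_edge("climate instability", "economic harm", weight=0.25)
G.add_edge("fossil fuel subsidies", "economic harm", weight=0.9)  # weak direct link

case = nx.shortest_path(G, "fossil fuel subsidies", "economic harm", weight="weight")
print(" -> ".join(case))  # the multi-hop chain wins over the weak direct edge
```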
Semi Supervised Meta Learning for Spatiotemporal Learning
results: Applying meta-learning to the MAE architecture improves performance on video reconstruction and action classification.
Abstract
We approached the goal of applying meta-learning to self-supervised masked autoencoders for spatiotemporal learning in three steps. Broadly, we seek to understand the impact of applying meta-learning to existing state-of-the-art representation learning architectures. Thus, we test spatiotemporal learning through: a meta-learning architecture only, a representation learning architecture only, and an architecture applying representation learning alongside a meta learning architecture. We utilize the Memory Augmented Neural Network (MANN) architecture to apply meta-learning to our framework. Specifically, we first experiment with applying a pre-trained MAE and fine-tuning on our small-scale spatiotemporal dataset for video reconstruction tasks. Next, we experiment with training an MAE encoder and applying a classification head for action classification tasks. Finally, we experiment with applying a pre-trained MAE and fine-tune with MANN backbone for action classification tasks.
Disentangling Societal Inequality from Model Biases: Gender Inequality in Divorce Court Proceedings
for: This paper uses court proceedings to investigate gender inequality in divorce cases in India.
methods: The paper uses natural language processing (NLP) techniques to analyze the court proceedings, but also acknowledges the limitations and biases present in these methods.
results: The paper finds that while there may be changing social norms in India, with more women challenging patriarchy, the court proceedings reveal striking gender inequality, with women often experiencing domestic violence.
Abstract
Divorce is the legal dissolution of a marriage by a court. Since this is usually an unpleasant outcome of a marital union, each party may have reasons to call the decision to quit which is generally documented in detail in the court proceedings. Via a substantial corpus of 17,306 court proceedings, this paper investigates gender inequality through the lens of divorce court proceedings. While emerging data sources (e.g., public court records) on sensitive societal issues hold promise in aiding social science research, biases present in cutting-edge natural language processing (NLP) methods may interfere with or affect such studies. We thus require a thorough analysis of potential gaps and limitations present in extant NLP resources. In this paper, on the methodological side, we demonstrate that existing NLP resources required several non-trivial modifications to quantify societal inequalities. On the substantive side, we find that while a large number of court cases perhaps suggest changing norms in India where women are increasingly challenging patriarchy, AI-powered analyses of these court proceedings indicate striking gender inequality with women often subjected to domestic violence.
Score-based Conditional Generation with Fewer Labeled Data by Self-calibrating Classifier Guidance
results: The approach significantly improves conditional generation quality with fewer labeled data and performs well across different percentages of labeled data, confirming its potential for generative modeling with limited labeled data.
Abstract
Score-based Generative Models (SGMs) are a popular family of deep generative models that achieves leading image generation quality. Earlier studies have extended SGMs to tackle class-conditional generation by coupling an unconditional SGM with the guidance of a trained classifier. Nevertheless, such classifier-guided SGMs do not always achieve accurate conditional generation, especially when trained with fewer labeled data. We argue that the issue is rooted in unreliable gradients of the classifier and the inability to fully utilize unlabeled data during training. We then propose to improve classifier-guided SGMs by letting the classifier calibrate itself. Our key idea is to use principles from energy-based models to convert the classifier as another view of the unconditional SGM. Then, existing loss for the unconditional SGM can be adopted to calibrate the classifier using both labeled and unlabeled data. Empirical results validate that the proposed approach significantly improves the conditional generation quality across different percentages of labeled data. The improved performance makes the proposed approach consistently superior to other conditional SGMs when using fewer labeled data. The results confirm the potential of the proposed approach for generative modeling with limited labeled data.
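For context, a minimal sketch of classifier guidance itself, whose classifier term is what the paper sets out to calibrate: the conditional score is the unconditional score plus a scaled gradient of the classifier's log-probability. The function names and guidance scale are illustrative assumptions.

```python
import torch

def guided_score(x, y, score_model, classifier, scale=1.0):
    """Conditional score: s(x) + scale * grad_x log p(y | x)."""
    x = x.detach().requires_grad_(True)
    log_probs = torch.log_softmax(classifier(x), dim=-1)
    log_p_y = log_probs.gather(1, y[:, None]).sum()      # sum over batch for grad
    grad_log_p_y = torch.autograd.grad(log_p_y, x)[0]    # the (possibly unreliable) term
    return score_model(x.detach()) + scale * grad_log_p_y
```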
results: The paper characterizes the overheads of these techniques when used for private inference (PI) and presents two accelerator solutions, HAAC for garbled circuits (GCs) and RPU for homomorphic encryption (HE). It concludes by discussing the future work needed to overcome the remaining overheads of PI.
Abstract
Privacy and security have rapidly emerged as first order design constraints. Users now demand more protection over who can see their data (confidentiality) as well as how it is used (control). Here, existing cryptographic techniques for security fall short: they secure data when stored or communicated but must decrypt it for computation. Fortunately, a new paradigm of computing exists, which we refer to as privacy-preserving computation (PPC). Emerging PPC technologies can be leveraged for secure outsourced computation or to enable two parties to compute without revealing either users' secret data. Despite their phenomenal potential to revolutionize user protection in the digital age, the realization has been limited due to exorbitant computational, communication, and storage overheads. This paper reviews recent efforts on addressing various PPC overheads using private inference (PI) in neural network as a motivating application. First, the problem and various technologies, including homomorphic encryption (HE), secret sharing (SS), garbled circuits (GCs), and oblivious transfer (OT), are introduced. Next, a characterization of their overheads when used to implement PI is covered. The characterization motivates the need for both GCs and HE accelerators. Then two solutions are presented: HAAC for accelerating GCs and RPU for accelerating HE. To conclude, results and effects are shown with a discussion on what future work is needed to overcome the remaining overheads of PI.
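Of the listed building blocks, secret sharing (SS) is the easiest to illustrate. Below is a minimal sketch of two-party additive secret sharing over a prime field: neither share reveals anything alone, yet additions can be computed locally on shares. The modulus choice is an illustrative assumption.

```python
import secrets

P = 2**61 - 1  # a prime modulus (illustrative field size)

def share(x):
    """Split x into two additive shares: x = s0 + s1 (mod P)."""
    s0 = secrets.randbelow(P)
    return s0, (x - s0) % P

def reconstruct(s0, s1):
    return (s0 + s1) % P

a0, a1 = share(42)
b0, b1 = share(100)
# Each party adds its shares locally; no party ever sees 42 or 100.
c0, c1 = (a0 + b0) % P, (a1 + b1) % P
print(reconstruct(c0, c1))  # 142
```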
Multi-Head Attention Mechanism Learning for Cancer New Subtypes and Treatment Based on Cancer Multi-Omics Data
paper_authors: Liangrui Pan, Dazhen Liu, Yutao Dou, Lian Wang, Zhichao Feng, Pengfei Rong, Liwen Xu, Shaoliang Peng
for: This study aims to identify and characterize cancer subtypes through unsupervised contrastive learning, in order to improve cancer diagnosis, treatment, and prognosis.
methods: The study uses a generalization framework based on attention mechanisms (AMUCL) and proposes a decoupled contrastive learning model based on a multi-head attention mechanism (DMACL) to deeply extract features from, and cluster, cancer multi-omics data.
results: Compared to 11 other deep learning models, the DMACL model achieved a C-index of 0.002, a Silhouette score of 0.801, and a Davies Bouldin score of 0.38 on a single-cell multi-omics dataset, and obtained the most reliable cancer subtype clustering results for each type of cancer on a cancer multi-omics dataset.
Abstract
Due to the high heterogeneity and clinical characteristics of cancer, there are significant differences in multi-omics data and clinical features among subtypes of different cancers. Therefore, the identification and discovery of cancer subtypes are crucial for the diagnosis, treatment, and prognosis of cancer. In this study, we proposed a generalization framework based on attention mechanisms for unsupervised contrastive learning (AMUCL) to analyze cancer multi-omics data for the identification and characterization of cancer subtypes. AMUCL framework includes a unsupervised multi-head attention mechanism, which deeply extracts multi-omics data features. Importantly, a decoupled contrastive learning model (DMACL) based on a multi-head attention mechanism is proposed to learn multi-omics data features and clusters and identify new cancer subtypes. This unsupervised contrastive learning method clusters subtypes by calculating the similarity between samples in the feature space and sample space of multi-omics data. Compared to 11 other deep learning models, the DMACL model achieved a C-index of 0.002, a Silhouette score of 0.801, and a Davies Bouldin Score of 0.38 on a single-cell multi-omics dataset. On a cancer multi-omics dataset, the DMACL model obtained a C-index of 0.016, a Silhouette score of 0.688, and a Davies Bouldin Score of 0.46, and obtained the most reliable cancer subtype clustering results for each type of cancer. Finally, we used the DMACL model in the AMUCL framework to reveal six cancer subtypes of AML. By analyzing the GO functional enrichment, subtype-specific biological functions, and GSEA of AML, we further enhanced the interpretability of cancer subtype analysis based on the generalizable AMUCL framework.
Large-scale global optimization of ultra-high dimensional non-convex landscapes based on generative neural networks
for: The paper presents a metaheuristic for non-convex optimization, based on training a deep generative network, that searches effectively within ultra-high dimensional spaces.
methods: The algorithm uses a customized loss function over populations of sampled local gradients, together with a deep network architecture that grows progressively over the course of training.
results: On standard optimization problems with dimensions as high as one thousand, the algorithm performs better with fewer function evaluations than state-of-the-art algorithm benchmarks.
Abstract
We present a non-convex optimization algorithm metaheuristic, based on the training of a deep generative network, which enables effective searching within continuous, ultra-high dimensional landscapes. During network training, populations of sampled local gradients are utilized within a customized loss function to evolve the network output distribution function towards one peak at high-performing optima. The deep network architecture is tailored to support progressive growth over the course of training, which allows the algorithm to manage the curse of dimensionality characteristic of high-dimensional landscapes. We apply our concept to a range of standard optimization problems with dimensions as high as one thousand and show that our method performs better with fewer function evaluations compared to state-of-the-art algorithm benchmarks. We also discuss the role of deep network over-parameterization, loss function engineering, and proper network architecture selection in optimization, and why the required batch size of sampled local gradients is independent of problem dimension. These concepts form the foundation for a new class of algorithms that utilize customizable and expressive deep generative networks to solve non-convex optimization problems.
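A minimal sketch of the general pattern the abstract describes: train a generative network so that its output distribution concentrates on high-performing optima of the objective. For brevity this uses a plain expected-objective loss and a fixed architecture, not the paper's customized gradient-population loss or progressive growth.

```python
import torch

def objective(x):                      # toy non-convex landscape to minimize
    return (x ** 2).sum(dim=1) + 2.0 * torch.sin(3.0 * x).sum(dim=1)

gen = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 100))   # 100-dim search space
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)

for step in range(2000):
    z = torch.randn(128, 16)           # latent samples
    x = gen(z)                         # candidate solutions
    loss = objective(x).mean()         # push the output distribution toward optima
    opt.zero_grad(); loss.backward(); opt.step()

best = objective(gen(torch.randn(1024, 16))).min()
print("best sampled objective:", best.item())
```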
Bidirectional Attention as a Mixture of Continuous Word Experts
paper_authors: Kevin Christian Wibisono, Yixin Wang
for: This paper aims to examine the statistical underpinnings of bidirectional attention in large language models (LLMs), specifically exploring the relationship between bidirectional attention and mixture-of-experts (MoE) weights.
methods: The paper uses a combination of theoretical analysis and empirical studies to investigate the statistical properties of bidirectional attention. The authors reparameterize bidirectional attention as a continuous bag of words (CBOW) model with MoE weights, and show that this allows for a deeper understanding of the model’s behavior.
results: The paper finds that bidirectional attention with multiple heads and multiple layers is equivalent to stacked MoEs and a mixture of MoEs, respectively. Additionally, the authors extend the model to categorical tabular data and find that it outperforms existing tabular extensions of transformers in out-of-distribution (OOD) generalization. Finally, the paper theoretically characterizes when linear word analogies are present in the word embeddings of bidirectional attention.Abstract
Bidirectional attention -- composed of self-attention with positional encodings and the masked language model (MLM) objective -- has emerged as a key component of modern large language models (LLMs). Despite its empirical success, few studies have examined its statistical underpinnings: What statistical model is bidirectional attention implicitly fitting? What sets it apart from its non-attention predecessors? We explore these questions in this paper. The key observation is that fitting a single-layer single-head bidirectional attention, upon reparameterization, is equivalent to fitting a continuous bag of words (CBOW) model with mixture-of-experts (MoE) weights. Further, bidirectional attention with multiple heads and multiple layers is equivalent to stacked MoEs and a mixture of MoEs, respectively. This statistical viewpoint reveals the distinct use of MoE in bidirectional attention, which aligns with its practical effectiveness in handling heterogeneous data. It also suggests an immediate extension to categorical tabular data, if we view each word location in a sentence as a tabular feature. Across empirical studies, we find that this extension outperforms existing tabular extensions of transformers in out-of-distribution (OOD) generalization. Finally, this statistical perspective of bidirectional attention enables us to theoretically characterize when linear word analogies are present in its word embeddings. These analyses show that bidirectional attention can require much stronger assumptions to exhibit linear word analogies than its non-attention predecessors.
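The key observation admits a compact summary. In our own notation (a paraphrase, not the authors' exact statement): for a masked position $i$, single-head bidirectional attention predicts

```latex
\hat{x}_i = \sum_{j \neq i} \pi_j(x)\, W x_j,
\qquad
\pi_j(x) = \operatorname{softmax}_j\!\big((Q x_i)^\top K x_j\big),
```

whereas CBOW uses fixed, input-independent weights $\pi_j = 1/(n-1)$. The data-dependent gate $\pi_j(x)$ is what makes each context position act as an "expert" in a mixture-of-experts.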
Manifold Filter-Combine Networks
paper_authors: Joyce Chew, Edward De Brouwer, Smita Krishnaswamy, Deanna Needell, Michael Perlmutter
for: This work aims to deepen the understanding of manifold neural networks (MNNs), analogous to how the aggregate-combine framework helps with the understanding of graph neural networks (GNNs).
methods: The proposed class contains a wide variety of subclasses that can be viewed as manifold analogues of popular GNNs. Furthermore, the authors propose a method for implementing such networks when one does not have global knowledge of the manifold, but merely has access to finitely many sample points.
results: The authors provide sufficient conditions for the network to provably converge to its continuum limit as the number of sample points tends to infinity. Unlike previous work (which focused on specific graph constructions), the rate of convergence does not directly depend on the number of filters used, and it exhibits linear dependence on the depth of the network rather than the exponential dependence obtained previously. Additionally, the authors provide several examples of interesting subclasses of MFCNs and of the rates of convergence obtained under specific graph constructions.Abstract
We introduce a class of manifold neural networks (MNNs) that we call Manifold Filter-Combine Networks (MFCNs), that aims to further our understanding of MNNs, analogous to how the aggregate-combine framework helps with the understanding of graph neural networks (GNNs). This class includes a wide variety of subclasses that can be thought of as the manifold analog of various popular GNNs. We then consider a method, based on building a data-driven graph, for implementing such networks when one does not have global knowledge of the manifold, but merely has access to finitely many sample points. We provide sufficient conditions for the network to provably converge to its continuum limit as the number of sample points tends to infinity. Unlike previous work (which focused on specific graph constructions), our rate of convergence does not directly depend on the number of filters used. Moreover, it exhibits linear dependence on the depth of the network rather than the exponential dependence obtained previously. Additionally, we provide several examples of interesting subclasses of MFCNs and of the rates of convergence that are obtained under specific graph constructions.
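A single filter-combine layer on a data-driven graph is easy to sketch. The toy implementation below, with our own choices of kNN graph, polynomial spectral filter, and ReLU combine step, illustrates the pattern on samples from a circle; it is a sketch of the construction, not the paper's implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_graph_laplacian(points, k=8):
    """Data-driven graph from finitely many manifold samples: symmetric kNN
    adjacency, then the unnormalized graph Laplacian L = D - W."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k + 1)          # first neighbour is the point itself
    n = len(points)
    W = np.zeros((n, n))
    for i, nbrs in enumerate(idx[:, 1:]):
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)                        # symmetrize
    return np.diag(W.sum(1)) - W

def filter_combine_layer(L, X, thetas, Theta):
    """Filter each input channel with a polynomial of L, then combine channels:
    Y = relu( p_theta(L) X Theta ).  X: (n, d_in), Theta: (d_in, d_out)."""
    filtered = sum(t * np.linalg.matrix_power(L, p) @ X
                   for p, t in enumerate(thetas))  # spectral filter p_theta(L)
    return np.maximum(filtered @ Theta, 0.0)       # combine + pointwise nonlinearity

# toy run: samples from a circle (a 1-D manifold embedded in R^2)
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 200)
pts = np.c_[np.cos(angles), np.sin(angles)]
L = knn_graph_laplacian(pts)
Y = filter_combine_layer(L, pts, thetas=[1.0, -0.5], Theta=rng.normal(size=(2, 4)))
print(Y.shape)  # (200, 4)
```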
Contextual Dynamic Pricing with Strategic Buyers
paper_authors: Pangpang Liu, Zhuoran Yang, Zhaoran Wang, Will Wei Sun
For: This paper studies the contextual dynamic pricing problem with strategic buyers, where buyers can manipulate their feature data to obtain a lower price.* Methods: The paper proposes a strategic dynamic pricing policy that incorporates the buyers' strategic behavior into the online learning to maximize the seller's cumulative revenue, combining dynamic pricing with algorithms that account for the buyers' manipulation of their feature data.* Results: The policy achieves a sublinear regret upper bound of $O(\sqrt{T})$ and is shown through extensive experiments to be superior to other pricing policies that are unaware of the strategic behaviors.Abstract
Personalized pricing, which involves tailoring prices based on individual characteristics, is commonly used by firms to implement a consumer-specific pricing policy. In this process, buyers can also strategically manipulate their feature data to obtain a lower price, incurring certain manipulation costs. Such strategic behavior can hinder firms from maximizing their profits. In this paper, we study the contextual dynamic pricing problem with strategic buyers. The seller does not observe the buyer's true feature, but a manipulated feature according to buyers' strategic behavior. In addition, the seller does not observe the buyers' valuation of the product, but only a binary response indicating whether a sale happens or not. Recognizing these challenges, we propose a strategic dynamic pricing policy that incorporates the buyers' strategic behavior into the online learning to maximize the seller's cumulative revenue. We first prove that existing non-strategic pricing policies that neglect the buyers' strategic behavior result in a linear $\Omega(T)$ regret with $T$ the total time horizon, indicating that these policies are not better than a random pricing policy. We then establish that our proposed policy achieves a sublinear regret upper bound of $O(\sqrt{T})$. Importantly, our policy is not a mere amalgamation of existing dynamic pricing policies and strategic behavior handling algorithms. Our policy can also accommodate the scenario when the marginal cost of manipulation is unknown in advance. To account for it, we simultaneously estimate the valuation parameter and the cost parameter in the online pricing policy, which is shown to also achieve an $O(\sqrt{T})$ regret bound. Extensive experiments support our theoretical developments and demonstrate the superior performance of our policy compared to other pricing policies that are unaware of the strategic behaviors.
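The environment the paper studies can be simulated in a few lines: the buyer best-responds by shifting the reported feature whenever the price saving beats the manipulation cost, and the seller observes only the manipulated feature and a binary sale indicator. The linear valuation, quadratic cost, and naive pricing rule below are illustrative assumptions of ours, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = np.array([0.5, 1.2])      # buyer valuation: v = theta^T [1, x]
cost = 2.0                             # marginal cost of manipulating the feature

def buyer_response(x_true, price_fn):
    # strategic buyer: shift the reported feature to lower the posted price,
    # as long as the price saving exceeds the quadratic manipulation cost
    best_x, best_u = x_true, -np.inf
    for delta in np.linspace(-1, 1, 41):
        x_rep = x_true + delta
        v = theta_true @ np.array([1.0, x_true])   # valuation uses the TRUE feature
        u = (v - price_fn(x_rep)) - cost * delta ** 2
        if u > best_u:
            best_u, best_x = u, x_rep
    sale = best_u > 0                              # seller sees only this binary response
    return best_x, sale

# naive non-strategic seller: prices off the reported feature directly
price_fn = lambda x_rep: 0.5 + 1.2 * x_rep
revenue = 0.0
for t in range(1000):
    x = rng.normal()
    x_rep, sale = buyer_response(x, price_fn)
    revenue += price_fn(x_rep) if sale else 0.0
print(f"naive revenue over 1000 rounds: {revenue:.1f}")
```

The proposed strategic policy instead estimates the valuation and cost parameters online and posts prices that anticipate this best response, which is what yields the $O(\sqrt{T})$ regret bound.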
A Physics-Informed Low-Shot Learning For sEMG-Based Estimation of Muscle Force and Joint Kinematics
results: Experimental results show that, compared with the physics-based inverse dynamics muscle model, the method's estimates are unbiased, and high accuracy is achieved in both experimental scenarios (the walking trials and the wrist motion trials).Abstract
Muscle force and joint kinematics estimation from surface electromyography (sEMG) are essential for real-time biomechanical analysis of the dynamic interplay among neural muscle stimulation, muscle dynamics, and kinetics. Recent advances in deep neural networks (DNNs) have shown the potential to improve biomechanical analysis in a fully automated and reproducible manner. However, the small sample nature and physical interpretability of biomechanical analysis limit the applications of DNNs. This paper presents a novel physics-informed low-shot learning method for sEMG-based estimation of muscle force and joint kinematics. This method seamlessly integrates Lagrange's equation of motion and an inverse dynamic muscle model into the generative adversarial network (GAN) framework for structured feature decoding and extrapolated estimation from the small sample data. Specifically, Lagrange's equation of motion is introduced into the generative model to restrain the structured decoding of the high-level features following the laws of physics. And a physics-informed policy gradient is designed to improve the adversarial learning efficiency by rewarding the consistent physical representation of the extrapolated estimations and the physical references. Experimental validations are conducted on two scenarios (i.e. the walking trials and wrist motion trials). Results indicate that the estimations of the muscle forces and joint kinematics are unbiased compared to the physics-based inverse dynamics, which outperforms the selected benchmark methods, including the physics-informed convolutional neural network (PI-CNN), vanilla generative adversarial network (GAN), and multi-layer extreme learning machine (ML-ELM).
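The physics-informed ingredient can be isolated in a small sketch: penalize the generator's decoded trajectory by the residual of a known equation of motion, so predictions are pushed to obey the dynamics. The toy dynamics below ($m\ddot{q} + kq = \tau$, a linear spring analogue) and all names are our own illustrative stand-ins, far simpler than the paper's Lagrangian musculoskeletal model, and the adversarial terms are omitted.

```python
import torch

def physics_residual(q, tau, m=1.0, k=10.0, dt=0.01):
    """Residual of a toy equation of motion  m*q'' + k*q - tau = 0,
    with q'' from finite differences over the predicted trajectory q(t)."""
    qdd = (q[:, 2:] - 2 * q[:, 1:-1] + q[:, :-2]) / dt ** 2
    return ((m * qdd + k * q[:, 1:-1] - tau[:, 1:-1]) ** 2).mean()

# generator maps an sEMG window (toy: 16 channels) to a joint trajectory (50 steps)
gen = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 50))
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
emg = torch.randn(32, 16)
tau = torch.randn(32, 50)            # stand-in for the muscle-force-derived torque

for _ in range(200):
    q = gen(emg)
    loss = physics_residual(q, tau)  # physics term; GAN terms would be added here
    opt.zero_grad(); loss.backward(); opt.step()
```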
results: Extensive experiments show that MolGroup improves the performance of GIN and Graphormer by an average of 4.41% and 3.47%, respectively, across 11 target datasets.Abstract
The limited availability of annotations in small molecule datasets presents a challenge to machine learning models. To address this, one common strategy is to collaborate with additional auxiliary datasets. However, having more data does not always guarantee improvements. Negative transfer can occur when the knowledge in the target dataset differs or contradicts that of the auxiliary molecule datasets. In light of this, identifying the auxiliary molecule datasets that can benefit the target dataset when jointly trained remains a critical and unresolved problem. Through an empirical analysis, we observe that combining graph structure similarity and task similarity can serve as a more reliable indicator for identifying high-affinity auxiliary datasets. Motivated by this insight, we propose MolGroup, which separates the dataset affinity into task and structure affinity to predict the potential benefits of each auxiliary molecule dataset. MolGroup achieves this by utilizing a routing mechanism optimized through a bi-level optimization framework. Empowered by the meta gradient, the routing mechanism is optimized toward maximizing the target dataset's performance and quantifies the affinity as the gating score. As a result, MolGroup is capable of predicting the optimal combination of auxiliary datasets for each target dataset. Our extensive experiments demonstrate the efficiency and effectiveness of MolGroup, showing an average improvement of 4.41%/3.47% for GIN/Graphormer trained with the group of molecule datasets selected by MolGroup on 11 target molecule datasets.
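In caricature, the gating idea reduces to scoring each auxiliary dataset by a combination of structure and task affinity and keeping the high scorers. The sketch below uses a fixed sigmoid gate with hand-set weights purely for illustration; in MolGroup the gate is learned through bi-level optimization with meta gradients, and the dataset names here are placeholders.

```python
import numpy as np

def gating_scores(struct_sim, task_sim, w=(1.0, 1.0), b=-1.0):
    """Gate each auxiliary dataset by combining structure and task affinity.
    struct_sim, task_sim: similarities of each auxiliary set to the target."""
    logits = w[0] * np.asarray(struct_sim) + w[1] * np.asarray(task_sim) + b
    return 1.0 / (1.0 + np.exp(-logits))          # sigmoid gate in [0, 1]

aux = ["tox21", "bace", "clintox", "sider"]       # placeholder dataset names
scores = gating_scores(struct_sim=[0.8, 0.3, 0.7, 0.2],
                       task_sim=[0.6, 0.4, 0.9, 0.1])
selected = [d for d, s in zip(aux, scores) if s > 0.5]
print(dict(zip(aux, scores.round(2))), "->", selected)
```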
results: Compared with sequential implementations, parallel implementations reduce training time, and in most cases the parallel versions also achieve stronger predictive performance.Abstract
Neural algorithmic reasoners are parallel processors. Teaching them sequential algorithms contradicts this nature, rendering a significant share of their computations redundant. Parallel algorithms however may exploit their full computational power, therefore requiring fewer layers to be executed. This drastically reduces training times, as we observe when comparing parallel implementations of searching, sorting and finding strongly connected components to their sequential counterparts on the CLRS framework. Additionally, parallel versions achieve strongly superior predictive performance in most cases.
Optimization-based Learning for Dynamic Load Planning in Trucking Service Networks
results: The computational study shows that the proposed approach obtains solutions of the same quality roughly 10 times faster than a commercial solver, and generates solutions compatible with actual terminal decisions orders of magnitude faster. The paper also shows that combining machine learning and optimization yields significant economic benefits from load consolidation.Abstract
The load planning problem is a critical challenge in service network design for parcel carriers: it decides how many trailers (or loads) to assign for dispatch over time between pairs of terminals. Another key challenge is to determine a flow plan, which specifies how parcel volumes are assigned to planned loads. This paper considers the Dynamic Load Planning Problem (DLPP) that considers both flow and load planning challenges jointly to adjust loads and flows as the demand forecast changes over time before the day of operations. The paper aims at developing a decision-support tool to inform planners making these decisions at terminals across the network. The paper formulates the DLPP as a MIP and shows that it admits a large number of symmetries in a network where each commodity can be routed through primary and alternate paths. As a result, an optimization solver may return fundamentally different solutions to closely related problems, confusing planners and reducing trust in optimization. To remedy this limitation, the paper proposes a Goal-Directed Optimization that eliminates those symmetries by generating optimal solutions staying close to a reference plan. The paper also proposes an optimization proxy to address the computational challenges of the optimization models. The proxy combines a machine learning model and a feasibility restoration model and finds solutions that satisfy real-time constraints imposed by planners-in-the-loop. An extensive computational study on industrial instances shows that the optimization proxy is around 10 times faster than the commercial solver in obtaining the same quality solutions and orders of magnitude faster for generating solutions that are consistent with each other. The proposed approach also demonstrates the benefits of the DLPP for load consolidation, and the significant savings obtained from combining machine learning and optimization.
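The optimization-proxy pattern is simple to state: a learned model predicts a (possibly infeasible) load plan, and a cheap feasibility-restoration step repairs it so every arc covers its parcel volume. The sketch below shows a minimal repair rule under a toy problem shape of our own; the paper's restoration model and MIP are considerably richer.

```python
import numpy as np

def feasibility_restoration(pred_loads, volume, capacity):
    """Round the ML prediction and add trailers until the volume fits:
    loads_a >= ceil(volume_a / capacity) on every arc a."""
    loads = np.maximum(np.round(pred_loads), 0).astype(int)
    needed = np.ceil(volume / capacity).astype(int)
    return np.maximum(loads, needed)      # repair: never plan below demand

# pretend an ML model predicted fractional loads for 5 terminal-to-terminal arcs
pred = np.array([1.2, 0.1, 3.7, 0.0, 2.4])
volume = np.array([95.0, 40.0, 310.0, 10.0, 180.0])   # parcel volume per arc
capacity = 100.0                                      # trailer capacity
plan = feasibility_restoration(pred, volume, capacity)
print(plan)   # [1 1 4 1 2] -- every arc now covers its volume
```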
Sup-Norm Convergence of Deep Neural Network Estimator for Nonparametric Regression by Adversarial Training
results: The study finds that deep neural network estimators already achieve strong performance in the $L2$-norm sense, and that with the proposed adversarial training scheme they can also converge in the sup-norm. The authors further extend the adversarial training scheme to more general loss functions and data-generating functions, and experimental results support the theoretical findings.Abstract
We show the sup-norm convergence of deep neural network estimators with a novel adversarial training scheme. For the nonparametric regression problem, it has been shown that an estimator using deep neural networks can achieve better performances in the sense of the $L2$-norm. In contrast, it is difficult for the neural estimator with least-squares to achieve the sup-norm convergence, due to the deep structure of neural network models. In this study, we develop an adversarial training scheme and investigate the sup-norm convergence of deep neural network estimators. First, we find that ordinary adversarial training makes neural estimators inconsistent. Second, we show that a deep neural network estimator achieves the optimal rate in the sup-norm sense by the proposed adversarial training with correction. We extend our adversarial training to general setups of a loss function and a data-generating function. Our experiments support the theoretical findings.
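A minimal adversarial training loop for regression looks as follows: an inner PGD-style maximization finds the worst-case input in an $\ell_\infty$ ball, and the outer step minimizes the loss there, spreading training pressure over neighbourhoods rather than only the sample points. Note that this sketch is the ordinary scheme, which the paper shows is inconsistent on its own; the correction the authors propose is omitted here.

```python
import torch

net = torch.nn.Sequential(torch.nn.Linear(1, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x = torch.rand(256, 1) * 2 - 1
y = torch.sin(3 * x) + 0.1 * torch.randn_like(x)

def worst_case_input(x, y, eps=0.05, steps=5):
    """Inner maximization: a few signed-gradient ascent steps inside the ball."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = ((net(x + delta) - y) ** 2).sum()
        g, = torch.autograd.grad(loss, delta)
        delta = (delta + eps / steps * g.sign()).clamp(-eps, eps) \
                    .detach().requires_grad_(True)
    return (x + delta).detach()

for epoch in range(500):
    x_adv = worst_case_input(x, y)
    loss = ((net(x_adv) - y) ** 2).mean()    # outer minimization at the worst case
    opt.zero_grad(); loss.backward(); opt.step()
```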
paper_authors: Aditya Gupta, Shiva Maharaj, Nicholas Polson, Vadim Sokolov for: This paper studies the valuation of chess squares and the placement of pieces on the board.methods: The paper introduces marginal valuations for both pieces and squares, illustrated on the positioning of Knights and Bishops.results: The study finds that the positions of Knights and Bishops have a significant influence, and that Pawn valuation must also take pawn structure into account; the paper additionally provides useful methods for evaluating Pawns.Abstract
Valuing chess squares and determining the placement of pieces on the board are the main objectives of our study. With the emergence of chess AI, it has become possible to accurately assess the worth of positions in a game of chess. The conventional approach assigns fixed values to pieces (King $=\infty$, Queen $=9$, Rook $=5$, Bishop $=3$, Knight $=3$, Pawn $=1$). We enhance this analysis by introducing marginal valuations for both pieces and squares. We demonstrate our method by examining the positioning of Knights and Bishops, and also provide valuable insights into the valuation of pawns. Notably, Nimzowitsch was among the pioneers in advocating for the significance of Pawn structure and valuation. Finally, we conclude by suggesting potential avenues for future research.
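A dependency-free illustration of why square-dependent values matter: score a Knight on each square by its mobility on an empty board. Mobility is only a crude proxy of ours, not the paper's method (which builds on chess-AI evaluations), but it already shows the spread hidden behind the fixed "Knight = 3" convention.

```python
def knight_mobility(square):
    """Number of legal knight moves from a square (0..63) on an empty board."""
    f, r = square % 8, square // 8
    jumps = [(1, 2), (2, 1), (2, -1), (1, -2),
             (-1, -2), (-2, -1), (-2, 1), (-1, 2)]
    return sum(0 <= f + df < 8 and 0 <= r + dr < 8 for df, dr in jumps)

# print an 8x8 board of mobility counts, rank 8 at the top
for rank in range(7, -1, -1):
    print(" ".join(str(knight_mobility(rank * 8 + f)) for f in range(8)))
# corners give 2, the centre gives 8: the fixed value "knight = 3" hides a
# large square-dependent spread, which marginal square valuations expose
```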
Designing a Direct Feedback Loop between Humans and Convolutional Neural Networks through Local Explanations
paper_authors: Tong Steven Sun, Yuyang Gao, Shubham Khaladkar, Sijia Liu, Liang Zhao, Young-Ho Kim, Sungsoo Ray Hong
For: This paper aims to improve the explainability of Convolutional Neural Networks (CNNs) by providing a more interactive and labor-efficient design for diagnosing and revising CNN vulnerabilities using local explanations.* Methods: The paper proposes an interactive design called DeepFuse, which realizes a direct feedback loop between a user and CNNs in diagnosing and revising CNN vulnerabilities using local explanations.* Results: The paper reports the results of a two-day study with 12 experienced CNN engineers, showing that DeepFuse helps participants create more accurate and "reasonable" models than the current state-of-the-art, and that participants found the way DeepFuse guides case-based reasoning to be practically useful for their current practice.Abstract
The local explanation provides heatmaps on images to explain how Convolutional Neural Networks (CNNs) derive their output. Due to its visual straightforwardness, the method has been one of the most popular explainable AI (XAI) methods for diagnosing CNNs. Through our formative study (S1), however, we captured ML engineers' ambivalent perspective about the local explanation as a valuable and indispensable vision in building CNNs versus the process that exhausts them due to the heuristic nature of detecting vulnerability. Moreover, steering the CNNs based on the vulnerability learned from the diagnosis seemed highly challenging. To mitigate the gap, we designed DeepFuse, the first interactive design that realizes the direct feedback loop between a user and CNNs in diagnosing and revising CNN's vulnerability using local explanations. DeepFuse helps CNN engineers to systematically search for "unreasonable" local explanations and annotate the new boundaries for those identified as unreasonable in a labor-efficient manner. Next, it steers the model based on the given annotation such that the model doesn't introduce similar mistakes. We conducted a two-day study (S2) with 12 experienced CNN engineers. Using DeepFuse, participants made a more accurate and "reasonable" model than the current state-of-the-art. Also, participants found the way DeepFuse guides case-based reasoning can practically improve their current practice. We provide implications for design that explain how future HCI-driven design can move our practice forward to make XAI-driven insights more actionable.
Learning Variational Neighbor Labels for Test-Time Domain Generalization
results: Experiments on six widely-used datasets show that the proposal improves the model's ability to generalize to unseen target domains.Abstract
This paper strives for domain generalization, where models are trained exclusively on source domains before being deployed at unseen target domains. We follow the strict separation of source training and target testing but exploit the value of the unlabeled target data itself during inference. We make three contributions. First, we propose probabilistic pseudo-labeling of target samples to generalize the source-trained model to the target domain at test time. We formulate the generalization at test time as a variational inference problem by modeling pseudo labels as distributions to consider the uncertainty during generalization and alleviate the misleading signal of inaccurate pseudo labels. Second, we learn variational neighbor labels that incorporate the information of neighboring target samples to generate more robust pseudo labels. Third, to learn the ability to incorporate more representative target information and generate more precise and robust variational neighbor labels, we introduce a meta-generalization stage during training to simulate the generalization procedure. Experiments on six widely-used datasets demonstrate the benefits, abilities, and effectiveness of our proposal.
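The neighbor-label idea can be sketched quickly: at test time, each target sample's soft pseudo-label is its own predictive distribution smoothed with the average prediction of its nearest neighbours in feature space. The fixed mixing weight and kNN rule below are our illustrative stand-ins for the learned variational neighbour labels.

```python
import torch

def neighbor_pseudo_labels(feats, logits, k=5, alpha=0.5):
    """Soft pseudo-labels for unlabeled target samples: the model's own
    predictive distribution, smoothed with the average prediction of the
    k nearest neighbours in feature space (a stand-in for learned
    variational neighbour labels)."""
    probs = logits.softmax(dim=1)                 # (n, C) own predictions
    d = torch.cdist(feats, feats)
    d.fill_diagonal_(float('inf'))                # exclude each sample itself
    nbrs = d.topk(k, largest=False).indices       # (n, k) nearest neighbours
    neighbor_probs = probs[nbrs].mean(dim=1)      # average neighbour prediction
    return alpha * probs + (1 - alpha) * neighbor_probs

feats, logits = torch.randn(100, 32), torch.randn(100, 7)
q = neighbor_pseudo_labels(feats, logits)
print(q.sum(dim=1)[:3])   # each row is still a distribution (sums to 1)
```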
Measuring the Success of Diffusion Models at Imitating Human Artists
results: The study finds that the generative model, when prompted, can imitate the work of 70 artists with high accuracy, and that the imitations can be matched back to the artists' original work.Abstract
Modern diffusion models have set the state-of-the-art in AI image generation. Their success is due, in part, to training on Internet-scale data which often includes copyrighted work. This prompts questions about the extent to which these models learn from, imitate, or copy the work of human artists. This work suggests that tying copyright liability to the capabilities of the model may be useful given the evolving ecosystem of generative models. Specifically, much of the legal analysis of copyright and generative systems focuses on the use of protected data for training. As a result, the connections between data, training, and the system are often obscured. In our approach, we consider simple image classification techniques to measure a model's ability to imitate specific artists. Specifically, we use Contrastive Language-Image Pretrained (CLIP) encoders to classify images in a zero-shot fashion. Our process first prompts a model to imitate a specific artist. Then, we test whether CLIP can be used to reclassify the artist (or the artist's work) from the imitation. If these tests match the imitation back to the original artist, this suggests the model can imitate that artist's expression. Our approach is simple and quantitative. Furthermore, it uses standard techniques and does not require additional training. We demonstrate our approach with an audit of Stable Diffusion's capacity to imitate 70 professional digital artists with copyrighted work online. When Stable Diffusion is prompted to imitate an artist from this set, we find that the artist can be identified from the imitation with an average accuracy of 81.0%. Finally, we also show that a sample of the artist's work can be matched to these imitation images with a high degree of statistical reliability. Overall, these results suggest that Stable Diffusion is broadly successful at imitating individual human artists.
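The reclassification test is straightforward to express with an off-the-shelf CLIP interface. The sketch below uses the Hugging Face checkpoint and a prompt template of our own choosing; the artist names and file path are placeholders. The test passes if CLIP assigns the imitation back to the prompted artist.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

artists = ["Artist A", "Artist B", "Artist C"]      # placeholder names
prompts = [f"artwork in the style of {a}" for a in artists]
image = Image.open("imitation.png")                 # image generated by prompting
                                                    # the diffusion model with Artist A

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]
predicted = artists[probs.argmax().item()]
print(predicted, probs.tolist())  # imitation judged successful if this is Artist A
```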
results: Experiments with a wide spectrum of network architectures and data modalities, including brain networks, demonstrate that R2ET attains higher explanation robustness under stealthy attacks while retaining accuracy.Abstract
Robust explanations of machine learning models are critical to establish human trust in the models. Due to limited cognition capability, most humans can only interpret the top few salient features. It is critical to make top salient features robust to adversarial attacks, especially those against the more vulnerable gradient-based explanations. Existing defense measures robustness using $\ell_p$-norms, which have weaker protection power. We define explanation thickness for measuring salient features ranking stability, and derive tractable surrogate bounds of the thickness to design the \textit{R2ET} algorithm to efficiently maximize the thickness and anchor top salient features. Theoretically, we prove a connection between R2ET and adversarial training. Experiments with a wide spectrum of network architectures and data modalities, including brain networks, demonstrate that R2ET attains higher explanation robustness under stealthy attacks while retaining accuracy.
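A simplified surrogate for explanation thickness is the margin between the weakest top-$k$ saliency and the strongest remaining one; maximizing it anchors the top-ranked features. The sketch below trains with that margin as a regularizer on toy data; it illustrates the ranking-stability idea only, not R2ET's actual surrogate bounds or algorithm.

```python
import torch

def saliency(model, x, target):
    x = x.clone().requires_grad_(True)
    score = model(x)[torch.arange(x.size(0)), target].sum()
    g, = torch.autograd.grad(score, x, create_graph=True)  # keep graph so we
    return g.abs()                                         # can train through it

def thickness_margin(sal, k=5):
    """Gap between the weakest top-k saliency and the strongest rest:
    a larger gap means the top-k ranking is harder to flip by small attacks."""
    sorted_sal, _ = sal.flatten(1).sort(dim=1, descending=True)
    return (sorted_sal[:, k - 1] - sorted_sal[:, k]).mean()

model = torch.nn.Sequential(torch.nn.Linear(20, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 3))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(64, 20), torch.randint(0, 3, (64,))
for _ in range(200):
    task_loss = torch.nn.functional.cross_entropy(model(x), y)
    margin = thickness_margin(saliency(model, x, y))
    loss = task_loss - 0.1 * margin      # maximize the ranking margin
    opt.zero_grad(); loss.backward(); opt.step()
```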
Robust Learning-Based Incipient Slip Detection using the PapillArray Optical Tactile Sensor for Improved Robotic Gripping
paper_authors: Qiang Wang, Pablo Martinez Ulloa, Robert Burke, David Cordova Bulens, Stephen J. Redmond
for: This paper aims to detect incipient slip in robotic gripping tasks using a learning-based approach with the PapillArray tactile sensor.
methods: The proposed approach uses a machine learning model to identify patterns associated with incipient slip, and the model is trained using data augmentation techniques to enhance its robustness.
results: The proposed approach achieved a high detection success rate of 95.6% when tested with an offline dataset, and maintained robust performance, with a success rate of 96.8%, when transferred to a robotic gripping environment distinct from where the training data was collected.Abstract
The ability to detect slip, particularly incipient slip, enables robotic systems to take corrective measures to prevent a grasped object from being dropped. Therefore, slip detection can enhance the overall security of robotic gripping. However, accurately detecting incipient slip remains a significant challenge. In this paper, we propose a novel learning-based approach to detect incipient slip using the PapillArray (Contactile, Australia) tactile sensor. The resulting model is highly effective in identifying patterns associated with incipient slip, achieving a detection success rate of 95.6% when tested with an offline dataset. Furthermore, we introduce several data augmentation methods to enhance the robustness of our model. When transferring the trained model to a robotic gripping environment distinct from where the training data was collected, our model maintained robust performance, with a success rate of 96.8%, providing timely feedback for stabilizing several practical gripping tasks. Our project website: https://sites.google.com/view/incipient-slip-detection.
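The train-with-augmentation recipe can be sketched on synthetic tactile windows: jitter, rescale, and time-shift each window, then fit a classifier on the enlarged set. The window size, augmentations, and gradient-boosting classifier below are illustrative stand-ins of ours for the paper's model and PapillArray features.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

def augment(window):
    """Cheap robustness augmentations for tactile signals: additive noise,
    amplitude scaling, and a small temporal shift."""
    out = window * rng.uniform(0.9, 1.1) + rng.normal(0, 0.01, window.shape)
    return np.roll(out, rng.integers(-3, 4), axis=0)

# fake dataset: 200 windows of 50 timesteps x 9 pillar-deflection channels
X = rng.normal(size=(200, 50, 9))
y = rng.integers(0, 2, 200)            # 1 = incipient slip in this window

X_aug = np.concatenate([X] + [np.stack([augment(w) for w in X]) for _ in range(2)])
y_aug = np.concatenate([y] * 3)
clf = GradientBoostingClassifier().fit(X_aug.reshape(len(X_aug), -1), y_aug)
print(clf.score(X.reshape(len(X), -1), y))
```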
Understanding the Efficacy of U-Net & Vision Transformer for Groundwater Numerical Modelling
results: The study finds that the U-Net and U-Net+ViT models achieve higher accuracy and efficiency than FNO, especially in sparse-data scenarios, underscoring the potential of U-Net-based models for real-world groundwater modelling applications.Abstract
This paper presents a comprehensive comparison of various machine learning models, namely U-Net, U-Net integrated with Vision Transformers (ViT), and Fourier Neural Operator (FNO), for time-dependent forward modelling in groundwater systems. Through testing on synthetic datasets, it is demonstrated that U-Net and U-Net + ViT models outperform FNO in accuracy and efficiency, especially in sparse data scenarios. These findings underscore the potential of U-Net-based models for groundwater modelling in real-world applications where data scarcity is prevalent.
Polynomial Width is Sufficient for Set Representation with High-dimensional Features
results: The paper shows that $L$ being polynomial in $N$ and $D$ is sufficient for set representation, and provides a lower bound on $L$ for the LP embedding layer. The results are further extended to permutation-equivariant set functions and the complex field.Abstract
Set representation has become ubiquitous in deep learning for modeling the inductive bias of neural networks that are insensitive to the input order. DeepSets is the most widely used neural network architecture for set representation. It involves embedding each set element into a latent space with dimension $L$, followed by a sum pooling to obtain a whole-set embedding, and finally mapping the whole-set embedding to the output. In this work, we investigate the impact of the dimension $L$ on the expressive power of DeepSets. Previous analyses either oversimplified high-dimensional features to be one-dimensional features or were limited to analytic activations, thereby diverging from practical use or resulting in $L$ that grows exponentially with the set size $N$ and feature dimension $D$. To investigate the minimal value of $L$ that achieves sufficient expressive power, we present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activations (LE). We demonstrate that $L$ being poly$(N, D)$ is sufficient for set representation using both embedding layers. We also provide a lower bound of $L$ for the LP embedding layer. Furthermore, we extend our results to permutation-equivariant set functions and the complex field.
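The LP embedding is small enough to write out: a linear map, an elementwise power activation, sum pooling, and an output MLP. In the sketch below the width is set to $N \cdot D$ merely to echo "polynomial in $(N, D)$"; the exact polynomial in the paper differs, and the output network is our own choice.

```python
import torch

class DeepSetsLP(torch.nn.Module):
    """DeepSets with a linear + power-activation (LP) embedding layer."""
    def __init__(self, d_in, width, d_out, power=2.0):
        super().__init__()
        self.embed = torch.nn.Linear(d_in, width)   # latent dimension L = width
        self.power = power
        self.rho = torch.nn.Sequential(torch.nn.Linear(width, width),
                                       torch.nn.ReLU(),
                                       torch.nn.Linear(width, d_out))

    def forward(self, x):            # x: (batch, N, d_in), order-invariant over N
        h = self.embed(x).abs() ** self.power       # power activation
        return self.rho(h.sum(dim=1))               # sum pooling -> whole-set embedding

N, D = 10, 4
net = DeepSetsLP(d_in=D, width=N * D, d_out=1)      # L polynomial in (N, D)
x = torch.randn(2, N, D)
perm = x[:, torch.randperm(N)]                      # permuting the set ...
print(net(x).squeeze(), net(perm).squeeze())        # ... leaves the output unchanged
```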