results: We study whether an attacker can infer the values of target individuals' sensitive attributes more successfully when the model is trained on synthetic data.
Abstract
We investigate an attack on a machine learning model that predicts whether a person or household will relocate in the next two years, i.e., a propensity-to-move classifier. The attack assumes that the attacker can query the model to obtain predictions and that the marginal distribution of the data on which the model was trained is publicly available. The attack also assumes that the attacker has obtained the values of non-sensitive attributes for a certain number of target individuals. The objective of the attack is to infer the values of sensitive attributes for these target individuals. We explore how replacing the original data with synthetic data when training the model impacts how successfully the attacker can infer sensitive attributes. (Original paper published at PSD 2022; the paper was subsequently updated.)
PhyloGFN: Phylogenetic inference with generative flow networks
For: The paper is written for researchers and practitioners in the field of phylogenetics, particularly those interested in computational methods for inferring evolutionary relationships.
Methods: The paper uses a framework called generative flow networks (GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and Bayesian phylogenetic inference. The proposed method, PhyloGFN, is an amortized posterior sampler that uses GFlowNets to explore and sample from the multimodal posterior distribution over tree topologies and evolutionary distances.
Results: The paper demonstrates that PhyloGFN produces diverse and high-quality evolutionary hypotheses on real benchmark datasets, and achieves a closer fit to the target distribution than state-of-the-art variational inference methods.
Abstract
Phylogenetics is a branch of computational biology that studies the evolutionary relationships among biological entities. Its long history and numerous applications notwithstanding, inference of phylogenetic trees from sequence data remains challenging: the high complexity of tree space poses a significant obstacle for the current combinatorial and probabilistic techniques. In this paper, we adopt the framework of generative flow networks (GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and Bayesian phylogenetic inference. Because GFlowNets are well-suited for sampling complex combinatorial structures, they are a natural choice for exploring and sampling from the multimodal posterior distribution over tree topologies and evolutionary distances. We demonstrate that our amortized posterior sampler, PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real benchmark datasets. PhyloGFN is competitive with prior works in marginal likelihood estimation and achieves a closer fit to the target distribution than state-of-the-art variational inference methods.
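A minimal sketch of the trajectory-balance objective commonly used to train GFlowNets; PhyloGFN's state space (partial phylogenetic trees) and reward are far more elaborate, so this only illustrates the loss itself on dummy values.

```python
# Trajectory-balance (TB) loss for a single GFlowNet trajectory.
import torch

def trajectory_balance_loss(log_Z, log_pf_steps, log_pb_steps, log_reward):
    """log_Z: learnable scalar; log_pf_steps / log_pb_steps: (T,) tensors of
    forward / backward policy log-probabilities along one trajectory;
    log_reward: scalar log R(x) of the terminal state."""
    lhs = log_Z + log_pf_steps.sum()
    rhs = log_reward + log_pb_steps.sum()
    return (lhs - rhs) ** 2

# Toy usage with dummy values (a real sampler would produce these).
log_Z = torch.nn.Parameter(torch.zeros(()))
log_pf = torch.log(torch.tensor([0.5, 0.25, 0.5]))   # forward policy probs
log_pb = torch.log(torch.tensor([1.0, 0.5, 1.0]))    # backward policy probs
loss = trajectory_balance_loss(log_Z, log_pf, log_pb, torch.tensor(-2.0))
loss.backward()
print(float(loss), float(log_Z.grad))
```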
Modeling Fission Gas Release at the Mesoscale using Multiscale DenseNet Regression with Attention Mechanism and Inception Blocks
results: Four convolutional neural network (CNN) architectures were trained and evaluated on simulated FGR data. The best performing network combines CBAM and InceptionNet mechanisms and provides high accuracy (mean absolute percentage error of 4.4%), good training stability, and robustness, particularly at very low instantaneous FGR flux values.
Abstract
Mesoscale simulations of fission gas release (FGR) in nuclear fuel provide a powerful tool for understanding how microstructure evolution impacts FGR, but they are computationally intensive. In this study, we present an alternate, data-driven approach, using deep learning to predict instantaneous FGR flux from 2D nuclear fuel microstructure images. Four convolutional neural network (CNN) architectures with multiscale regression are trained and evaluated on simulated FGR data generated using a hybrid phase field/cluster dynamics model. All four networks show high predictive power, with $R^{2}$ values above 98%. The best performing network combines a Convolutional Block Attention Module (CBAM) and InceptionNet mechanisms to provide superior accuracy (mean absolute percentage error of 4.4%), training stability, and robustness on very low instantaneous FGR flux values.
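A hedged sketch of the channel-attention idea: a tiny CNN regressor with a CBAM-style channel-attention block that predicts a scalar (e.g., instantaneous FGR flux) from a 2D image. The paper's networks (multiscale regression heads, Inception blocks, spatial attention) are larger; layer sizes here are illustrative only.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))

    def forward(self, x):                       # x: (B, C, H, W)
        avg = x.mean(dim=(2, 3))                # global average pool -> (B, C)
        mx = x.amax(dim=(2, 3))                 # global max pool -> (B, C)
        scale = torch.sigmoid(self.mlp(avg) + self.mlp(mx))
        return x * scale[:, :, None, None]      # reweight channels

class FluxRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            ChannelAttention(16),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(32, 1)            # scalar flux prediction

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = FluxRegressor()
print(model(torch.randn(2, 1, 64, 64)).shape)   # torch.Size([2, 1])
```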
Question Answering for Electronic Health Records: A Scoping Review of datasets and models
paper_authors: Jayetri Bardhan, Kirk Roberts, Daisy Zhe Wang
For: This paper is focused on providing a methodological review of existing works on question answering (QA) over electronic health records (EHRs).
Methods: The authors searched four digital sources (Google Scholar, ACL Anthology, ACM Digital Library, and PubMed) to collect relevant publications on EHR QA, and identified 47 papers for further study from an initial pool of 4,111. They found that most of the works are fairly recent and that emrQA is the most popular EHR QA dataset.
Results: The authors identified the different models used in EHR QA and the evaluation metrics used to assess these models. They also observed that QA on EHRs is a relatively new and unexplored area, and that there is a need for further research in this area.
Abstract
Question Answering (QA) systems on patient-related data can assist both clinicians and patients. They can, for example, assist clinicians in decision-making and enable patients to have a better understanding of their medical history. Significant amounts of patient data are stored in Electronic Health Records (EHRs), making EHR QA an important research area. In EHR QA, the answer is obtained from the medical record of the patient. Because of the differences in data format and modality, this differs greatly from other medical QA tasks that employ medical websites or scientific papers to retrieve answers, making it critical to research EHR question answering. This study aimed to provide a methodological review of existing works on QA over EHRs. We searched for articles from January 1st, 2005 to September 30th, 2023 in four digital sources including Google Scholar, ACL Anthology, ACM Digital Library, and PubMed to collect relevant publications on EHR QA. 4111 papers were identified for our study, and after screening based on our inclusion criteria, we obtained a total of 47 papers for further study. Out of the 47 papers, 25 papers were about EHR QA datasets, and 37 papers were about EHR QA models. It was observed that QA on EHRs is relatively new and unexplored. Most of the works are fairly recent. Also, it was observed that emrQA is by far the most popular EHR QA dataset, both in terms of citations and usage in other papers. Furthermore, we identified the different models used in EHR QA along with the evaluation metrics used for these models.
Detection and prediction of clopidogrel treatment failures using longitudinal structured electronic health records
results: From the UK Biobank dataset of 502,527 patients, we identified 1,824 treatment failure cases and 6,859 control cases. Each patient's diagnosis, prescription, and procedure records were gathered and organized into same-day visits. Experimental results show that time series models outperform bag-of-words approaches in both the detection and prediction tasks. In particular, the BERT model reaches 0.928 AUC on detection and 0.729 AUC on prediction, and it also performs well when training data are scarce because it leverages pre-training on large unlabeled data.
Abstract
We propose machine learning algorithms to automatically detect and predict clopidogrel treatment failure using longitudinal structured electronic health records (EHR). By drawing analogies between natural language and structured EHR, we introduce various machine learning algorithms used in natural language processing (NLP) applications to build models for treatment failure detection and prediction. In this regard, we generated a cohort of patients with clopidogrel prescriptions from UK Biobank and annotated if the patients had treatment failure events within one year of the first clopidogrel prescription; out of 502,527 patients, 1,824 patients were identified as treatment failure cases, and 6,859 patients were considered as control cases. From the dataset, we gathered diagnoses, prescriptions, and procedure records together per patient and organized them into visits with the same date to build models. The models were built for two different tasks, i.e., detection and prediction, and the experimental results showed that time series models outperform bag-of-words approaches in both tasks. In particular, a Transformer-based model, namely BERT, could reach 0.928 AUC in detection tasks and 0.729 AUC in prediction tasks. BERT also showed competence over other time series models when there is not enough training data, because it leverages the pre-training procedure using large unlabeled data.
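A minimal sketch of the data-organization step described above: per-patient diagnosis, prescription, and procedure rows are merged and grouped into same-day "visits", yielding token sequences that a sequence model (or a bag-of-words baseline) can consume. The record fields and codes are hypothetical.

```python
from collections import defaultdict

records = [  # (patient_id, date, code) -- hypothetical structured EHR rows
    ("p1", "2015-03-01", "DX:I25.1"),
    ("p1", "2015-03-01", "RX:clopidogrel"),
    ("p1", "2015-06-12", "PX:coronary_angiography"),
    ("p2", "2016-01-20", "RX:clopidogrel"),
]

def build_visit_sequences(rows):
    per_patient = defaultdict(lambda: defaultdict(list))
    for pid, date, code in rows:
        per_patient[pid][date].append(code)
    # Sort visits chronologically; each visit becomes one "sentence" of codes.
    return {pid: [visits[d] for d in sorted(visits)]
            for pid, visits in per_patient.items()}

sequences = build_visit_sequences(records)
print(sequences["p1"])
# [['DX:I25.1', 'RX:clopidogrel'], ['PX:coronary_angiography']]
```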
Tokenizer Choice For LLM Training: Negligible or Crucial?
paper_authors: Mehdi Ali, Michael Fromm, Klaudia Thellmann, Richard Rutmann, Max Lübbering, Johannes Leveling, Katrin Klug, Jan Ebert, Niclas Doll, Jasper Schulze Buschhoff, Charvi Jain, Alexander Arno Weber, Lena Jurkschat, Hammam Abdelwahab, Chelsea John, Pedro Ortiz Suarez, Malte Ostendorff, Samuel Weinbach, Rafet Sifa, Stefan Kesselheim, Nicolas Flores-Herr
for: This paper studies the influence of tokenizer choice on the downstream performance of LLMs and points out some possible remedies.
results: The study finds that the choice of tokenizer can significantly affect an LLM's downstream performance as well as its training and inference costs. In particular, the common tokenizer evaluation metrics fertility and parity are not always predictive of downstream performance, making them questionable proxies. Furthermore, the authors find that training multilingual LLMs with an English-only tokenizer degrades downstream performance and adds extra training cost (up to 68%) because the English-only vocabulary is too small for other languages.
Abstract
The recent success of LLMs has been predominantly driven by curating the training dataset composition, scaling of model architectures and dataset sizes and advancements in pretraining objectives, leaving tokenizer influence as a blind spot. Shedding light on this underexplored area, we conduct a comprehensive study on the influence of tokenizer choice on LLM downstream performance by training 24 mono- and multilingual LLMs at a 2.6B parameter scale, ablating different tokenizer algorithms and parameterizations. Our studies highlight that the tokenizer choice can significantly impact the model's downstream performance, training and inference costs. In particular, we find that the common tokenizer evaluation metrics fertility and parity are not always predictive of model downstream performance, rendering these metrics a questionable proxy for the model's downstream performance. Furthermore, we show that multilingual tokenizers trained on the five most frequent European languages require vocabulary size increases of factor three in comparison to English. While English-only tokenizers have been applied to the training of multi-lingual LLMs, we find that this approach results in a severe downstream performance degradation and additional training costs of up to 68%, due to an inefficient tokenization vocabulary.
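A sketch of the two tokenizer metrics mentioned above, under common definitions: fertility as the average number of subword tokens per whitespace-separated word, and parity as the ratio of token counts a tokenizer produces for parallel sentences in two languages. `toy_tokenize` is a stand-in for any real subword tokenizer's encode function.

```python
def fertility(tokenize, texts):
    tokens = sum(len(tokenize(t)) for t in texts)
    words = sum(len(t.split()) for t in texts)
    return tokens / words

def parity(tokenize, parallel_pairs):
    """parallel_pairs: list of (sentence_lang_a, sentence_lang_b)."""
    a = sum(len(tokenize(sa)) for sa, _ in parallel_pairs)
    b = sum(len(tokenize(sb)) for _, sb in parallel_pairs)
    return a / b  # close to 1.0 means both languages are treated evenly

# Toy tokenizer: splits words into chunks of at most 4 characters.
def toy_tokenize(text):
    return [w[i:i + 4] for w in text.split() for i in range(0, len(w), 4)]

print(fertility(toy_tokenize, ["the tokenizer choice matters"]))  # 2.0
print(parity(toy_tokenize, [("the model", "das Modell")]))        # 1.0
```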
Search-Adaptor: Text Embedding Customization for Information Retrieval
results: On multiple real-world English and multilingual retrieval datasets, we show the performance benefits of Search-Adaptor; for example, it improves nDCG@10 by more than 5.2% on average over the Google Embedding APIs across 13 BEIR datasets.
Abstract
Text embeddings extracted by pre-trained Large Language Models (LLMs) have significant potential to improve information retrieval and search. Beyond the zero-shot setup in which they are being conventionally used, being able to take advantage of the information from the relevant query-corpus paired data has the power to further boost the LLM capabilities. In this paper, we propose a novel method, Search-Adaptor, for customizing LLMs for information retrieval in an efficient and robust way. Search-Adaptor modifies the original text embedding generated by pre-trained LLMs, and can be integrated with any LLM, including those only available via APIs. On multiple real-world English and multilingual retrieval datasets, we show consistent and significant performance benefits for Search-Adaptor -- e.g., more than 5.2% improvements over the Google Embedding APIs in nDCG@10 averaged over 13 BEIR datasets.
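A hedged sketch of the adapter idea: a small trainable network modifies frozen, pre-computed text embeddings (e.g., obtained from an embedding API) and is trained so that each query scores its relevant passage higher than in-batch negatives. The actual Search-Adaptor architecture and ranking loss differ in detail; the dimensions and the in-batch contrastive loss here are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adaptor(nn.Module):
    def __init__(self, dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, dim))

    def forward(self, emb):                     # residual modification
        return F.normalize(emb + self.net(emb), dim=-1)

def in_batch_ranking_loss(q_emb, p_emb, temperature=0.05):
    """q_emb[i] is the query paired with passage p_emb[i]."""
    scores = q_emb @ p_emb.T / temperature      # (B, B) cosine similarities
    labels = torch.arange(q_emb.size(0))
    return F.cross_entropy(scores, labels)

dim = 768
adaptor = Adaptor(dim)
q = F.normalize(torch.randn(8, dim), dim=-1)    # frozen query embeddings
p = F.normalize(torch.randn(8, dim), dim=-1)    # frozen passage embeddings
loss = in_batch_ranking_loss(adaptor(q), adaptor(p))
loss.backward()
```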
Evolutionary Dynamic Optimization and Machine Learning
results: The study finds that the reciprocal integration of evolutionary algorithms and machine learning algorithms enables better dynamic optimization in ML tasks and can improve model performance.
Abstract
Evolutionary Computation (EC) has emerged as a powerful field of Artificial Intelligence, inspired by nature's mechanisms of gradual development. However, EC approaches often face challenges such as stagnation, diversity loss, computational complexity, population initialization, and premature convergence. To overcome these limitations, researchers have integrated learning algorithms with evolutionary techniques. This integration harnesses the valuable data generated by EC algorithms during iterative searches, providing insights into the search space and population dynamics. Similarly, the relationship between evolutionary algorithms and Machine Learning (ML) is reciprocal, as EC methods offer exceptional opportunities for optimizing complex ML tasks characterized by noisy, inaccurate, and dynamic objective functions. These hybrid techniques, known as Evolutionary Machine Learning (EML), have been applied at various stages of the ML process. EC techniques play a vital role in tasks such as data balancing, feature selection, and model training optimization. Moreover, ML tasks often require dynamic optimization, for which Evolutionary Dynamic Optimization (EDO) is valuable. This paper presents the first comprehensive exploration of reciprocal integration between EDO and ML. The study aims to stimulate interest in the evolutionary learning community and inspire innovative contributions in this domain.
Robustness to Multi-Modal Environment Uncertainty in MARL using Curriculum Learning
results: Experimental results show that the approach improves robustness in multi-agent reinforcement learning environments and achieves state-of-the-art performance in both cooperative and competitive settings.
Abstract
Multi-agent reinforcement learning (MARL) plays a pivotal role in tackling real-world challenges. However, the seamless transition of trained policies from simulations to the real world requires them to be robust to various environmental uncertainties. Existing works focus on finding a Nash Equilibrium or the optimal policy under uncertainty in one environment variable (i.e. action, state or reward). This is because a multi-agent system itself is highly complex and non-stationary. However, in real-world situations, uncertainty can occur in multiple environment variables simultaneously. This work is the first to formulate the generalised problem of robustness to multi-modal environment uncertainty in MARL. To this end, we propose a general robust training approach for multi-modal uncertainty based on curriculum learning techniques. We handle two distinct environmental uncertainties simultaneously and present extensive results across both cooperative and competitive MARL environments, demonstrating that our approach achieves state-of-the-art levels of robustness.
Splicing Up Your Predictions with RNA Contrastive Learning
results: The study shows that this strategy learns generalized RNA isoform representations and achieves competitive results on downstream tasks, including RNA half-life and mean ribosome load prediction.
Abstract
In the face of rapidly accumulating genomic data, our understanding of the RNA regulatory code remains incomplete. Recent self-supervised methods in other domains have demonstrated the ability to learn rules underlying the data-generating process such as sentence structure in language. Inspired by this, we extend contrastive learning techniques to genomic data by utilizing functional similarities between sequences generated through alternative splicing and gene duplication. Our novel dataset and contrastive objective enable the learning of generalized RNA isoform representations. We validate their utility on downstream tasks such as RNA half-life and mean ribosome load prediction. Our pre-training strategy yields competitive results using linear probing on both tasks, along with up to a two-fold increase in Pearson correlation in low-data conditions. Importantly, our exploration of the learned latent space reveals that our contrastive objective yields semantically meaningful representations, underscoring its potential as a valuable initialization technique for RNA property prediction.
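A small sketch of the positive-pair construction implied above: RNA isoforms that originate from the same gene (via alternative splicing or gene duplication) are treated as functionally similar "views" of each other and paired for a contrastive objective. The sequences and gene labels are toy examples.

```python
import itertools
import numpy as np

isoforms = {  # gene_id -> list of transcript sequences (hypothetical)
    "geneA": ["AUGGCUACG", "AUGGCGACG"],
    "geneB": ["AUGCCCUUU", "AUGCCCUUA", "AUGCCAUUA"],
}

def positive_pairs(by_gene):
    pairs = []
    for gene, seqs in by_gene.items():
        pairs.extend(itertools.combinations(seqs, 2))  # same-gene pairs
    return pairs

def one_hot(seq, alphabet="ACGU"):
    idx = {c: i for i, c in enumerate(alphabet)}
    out = np.zeros((len(seq), len(alphabet)), dtype=np.float32)
    for pos, ch in enumerate(seq):
        out[pos, idx[ch]] = 1.0
    return out

pairs = positive_pairs(isoforms)
print(len(pairs), one_hot(pairs[0][0]).shape)  # 4 pairs, (9, 4) encoding
# Encoded pairs would then feed a contrastive loss over an encoder's outputs.
```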
Provably Robust Cost-Sensitive Learning via Randomized Smoothing
results: Experimental results show that, compared to existing methods, the approach achieves significantly improved certified cost-sensitive robustness while having a negligible impact on overall accuracy.
Abstract
We focus on learning adversarially robust classifiers under a cost-sensitive scenario, where the potential harm of different classwise adversarial transformations is encoded in a binary cost matrix. Existing methods are either empirical that cannot certify robustness or suffer from inherent scalability issues. In this work, we study whether randomized smoothing, a more scalable robustness certification framework, can be leveraged to certify cost-sensitive robustness. Built upon a notion of cost-sensitive certified radius, we show how to adapt the standard randomized smoothing certification pipeline to produce tight robustness guarantees for any cost matrix. In addition, with fine-grained certified radius optimization schemes specifically designed for different data subgroups, we propose an algorithm to train smoothed classifiers that are optimized for cost-sensitive robustness. Extensive experiments on image benchmarks and a real-world medical dataset demonstrate the superiority of our method in achieving significantly improved performance of certified cost-sensitive robustness while having a negligible impact on overall accuracy.
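A hedged sketch of the standard randomized-smoothing certification step that the cost-sensitive variant builds on: a lower confidence bound on the top-class probability under Gaussian noise is converted into a certified L2 radius. The cost-sensitive certified radius additionally depends on which misclassifications the cost matrix penalizes, which is not shown here.

```python
import numpy as np
from scipy.stats import beta, norm

def certified_radius(top_class_count, n_samples, sigma, alpha=0.001):
    """Clopper-Pearson lower bound on p_A, then radius = sigma * Phi^-1(p_A)."""
    k, n = top_class_count, n_samples
    p_lower = beta.ppf(alpha, k, n - k + 1) if k > 0 else 0.0
    if p_lower <= 0.5:
        return None  # abstain: cannot certify this input
    return sigma * norm.ppf(p_lower)

# Example: 9,800 of 10,000 noisy samples voted for the predicted class.
print(certified_radius(top_class_count=9800, n_samples=10000, sigma=0.5))
```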
Heterophily-Based Graph Neural Network for Imbalanced Classification
for: Addressing class imbalance in graph-related problems, particularly in node classification tasks.
methods: Proposes a unique approach that considers graph heterophily to tackle imbalanced classification, integrating an imbalance classification strategy with heterophily-aware GNNs to improve performance and efficiency.
results: Demonstrates superiority in classification performance and efficiency compared to existing baselines through experiments on real-world graphs.
Abstract
Graph neural networks (GNNs) have shown promise in addressing graph-related problems, including node classification. However, conventional GNNs assume an even distribution of data across classes, which is often not the case in real-world scenarios, where certain classes are severely underrepresented. This leads to suboptimal performance of standard GNNs on imbalanced graphs. In this paper, we introduce a unique approach that tackles imbalanced classification on graphs by considering graph heterophily. We investigate the intricate relationship between class imbalance and graph heterophily, revealing that minority classes not only exhibit a scarcity of samples but also manifest lower levels of homophily, facilitating the propagation of erroneous information among neighboring nodes. Drawing upon this insight, we propose an efficient method, called Fast Im-GBK, which integrates an imbalance classification strategy with heterophily-aware GNNs to effectively address the class imbalance problem while significantly reducing training time. Our experiments on real-world graphs demonstrate our model's superiority in classification performance and efficiency for node classification tasks compared to existing baselines.
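A small sketch of the homophily statistic that the discussion above relies on: for each class, the fraction of edge endpoints of that class whose neighbor shares the label. Low values for minority classes indicate the heterophily that the proposed method is designed to account for. The toy graph below is illustrative.

```python
from collections import defaultdict

def per_class_homophily(edges, labels):
    same = defaultdict(int)
    total = defaultdict(int)
    for u, v in edges:
        for node, other in ((u, v), (v, u)):
            total[labels[node]] += 1
            same[labels[node]] += int(labels[node] == labels[other])
    return {c: same[c] / total[c] for c in total}

labels = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B"}
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 3)]
print(per_class_homophily(edges, labels))
# Class "B" gets a lower score here: its nodes mostly connect to class "A".
```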
Designing Observables for Measurements with Deep Learning
results: The authors apply the approach to two physics models for inclusive measurements in deep inelastic scattering and demonstrate the advantage of machine-learned observables over observables designed with traditional physics intuition and heuristics.
Abstract
Many analyses in particle and nuclear physics use simulations to infer fundamental, effective, or phenomenological parameters of the underlying physics models. When the inference is performed with unfolded cross sections, the observables are designed using physics intuition and heuristics. We propose to design optimal observables with machine learning. Unfolded, differential cross sections in a neural network output contain the most information about parameters of interest and can be well-measured by construction. We demonstrate this idea using two physics models for inclusive measurements in deep inelastic scattering.
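A hedged sketch of the idea: simulate events under two values of a model parameter, train a classifier to separate them, and use its output as a one-dimensional observable that concentrates the available information about that parameter. The paper works with unfolded differential cross sections; the "events" below are just toy Gaussian feature vectors.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n, d = 5000, 4
events_theta0 = rng.normal(0.0, 1.0, size=(n, d))    # parameter value 0
events_theta1 = rng.normal(0.15, 1.0, size=(n, d))    # parameter value 1

X = np.vstack([events_theta0, events_theta1])
y = np.concatenate([np.zeros(n), np.ones(n)])

clf = GradientBoostingClassifier().fit(X, y)

# The learned observable: one number per event, suitable for histogramming
# and subsequent inference, instead of a hand-crafted combination of features.
observable = clf.predict_proba(events_theta1)[:, 1]
print(observable.mean(), observable[:5])
```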
Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research
paper_authors: Cole Gulino, Justin Fu, Wenjie Luo, George Tucker, Eli Bronstein, Yiren Lu, Jean Harb, Xinlei Pan, Yan Wang, Xiangyu Chen, John D. Co-Reyes, Rishabh Agarwal, Rebecca Roelofs, Yao Lu, Nico Montali, Paul Mougin, Zoey Yang, Brandyn White, Aleksandra Faust, Rowan McAllister, Dragomir Anguelov, Benjamin Sapp
methods: The study uses publicly released real-world driving data (e.g., the Waymo Open Motion Dataset) to initialize or play back a diverse set of multi-agent simulated scenarios. It runs on hardware accelerators (TPUs/GPUs), supports in-graph simulation for training, and fits modern large-scale, distributed machine learning workflows.
results: The study benchmarks a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions, highlighting the effectiveness of routes as guidance for planning agents and the tendency of RL to overfit against simulated agents.
Abstract
Simulation is an essential tool to develop and benchmark autonomous vehicle planning software in a safe and cost-effective manner. However, realistic simulation requires accurate modeling of nuanced and complex multi-agent interactive behaviors. To address these challenges, we introduce Waymax, a new data-driven simulator for autonomous driving in multi-agent scenes, designed for large-scale simulation and testing. Waymax uses publicly-released, real-world driving data (e.g., the Waymo Open Motion Dataset) to initialize or play back a diverse set of multi-agent simulated scenarios. It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training, making it suitable for modern large-scale, distributed machine learning workflows. To support online training and evaluation, Waymax includes several learned and hard-coded behavior models that allow for realistic interaction within simulation. To supplement Waymax, we benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions, where we highlight the effectiveness of routes as guidance for planning agents and the ability of RL to overfit against simulated agents.
Polynomial Time Cryptanalytic Extraction of Neural Network Models
results: The study finds that with the new techniques, all the parameters of a full-sized neural network for classifying the CIFAR10 dataset can be extracted in 30 minutes, a task that previously required an exhaustive search over 2^256 possibilities.
Abstract
Billions of dollars and countless GPU hours are currently spent on training Deep Neural Networks (DNNs) for a variety of tasks. Thus, it is essential to determine the difficulty of extracting all the parameters of such neural networks when given access to their black-box implementations. Many versions of this problem have been studied over the last 30 years, and the best current attack on ReLU-based deep neural networks was presented at Crypto 2020 by Carlini, Jagielski, and Mironov. It resembles a differential chosen plaintext attack on a cryptosystem, which has a secret key embedded in its black-box implementation and requires a polynomial number of queries but an exponential amount of time (as a function of the number of neurons). In this paper, we improve this attack by developing several new techniques that enable us to extract with arbitrarily high precision all the real-valued parameters of a ReLU-based DNN using a polynomial number of queries and a polynomial amount of time. We demonstrate its practical efficiency by applying it to a full-sized neural network for classifying the CIFAR10 dataset, which has 3072 inputs, 8 hidden layers with 256 neurons each, and over a million neuronal parameters. An attack following the approach by Carlini et al. requires an exhaustive search over 2 to the power 256 possibilities. Our attack replaces this with our new techniques, which require only 30 minutes on a 256-core computer.
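A hedged sketch of one primitive behind such extraction attacks: restricted to a line in input space, a ReLU network is piecewise linear, so points where some neuron flips sign ("critical points") can be located by comparing local slopes and binary searching. The full attacks use many such queries to recover the weights; this only finds one critical point of a toy network.

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(8, 3)), rng.normal(size=8)
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)

def f(x):  # black-box scalar output of a small ReLU network
    return (W2 @ np.maximum(W1 @ x + b1, 0.0) + b2).item()

def directional_slope(x0, d, t, eps=1e-4):
    return (f(x0 + (t + eps) * d) - f(x0 + t * d)) / eps

def find_critical_point(x0, d, lo=0.0, hi=2.0, iters=40, tol=1e-4):
    if abs(directional_slope(x0, d, lo) - directional_slope(x0, d, hi)) < tol:
        return None  # no neuron appears to change state on this segment
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if abs(directional_slope(x0, d, lo) - directional_slope(x0, d, mid)) > tol:
            hi = mid  # a ReLU boundary lies in [lo, mid]
        else:
            lo = mid  # otherwise it lies in [mid, hi]
    return x0 + 0.5 * (lo + hi) * d

x0, d = rng.normal(size=3), rng.normal(size=3)
print(find_critical_point(x0, d))  # may be None if no boundary is on the segment
```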
Eliciting Model Steering Interactions from Users via Data and Visual Design Probes
results: The study finds that many targets of semantic interactions do not map directly onto machine learning model parameters but instead aim to augment the data a model uses for training. Participants had differing needs corresponding to their varying levels of ML expertise, and they also saw value in using semantic interactions to collaborate with team members, especially those without an ML background.
Abstract
Domain experts increasingly use automated data science tools to incorporate machine learning (ML) models in their work but struggle to "debug" these models when they are incorrect. For these experts, semantic interactions can provide an accessible avenue to guide and refine ML models without having to programmatically dive into its technical details. In this research, we conduct an elicitation study using data and visual design probes to examine if and how experts with a spectrum of ML expertise use semantic interactions to update a simple classification model. We use our design probes to facilitate an interactive dialogue with 20 participants and codify their interactions as a set of target-interaction pairs. Interestingly, our findings revealed that many targets of semantic interactions do not directly map to ML model parameters, but instead aim to augment the data a model uses for training. We also identify reasons that participants would hesitate to interact with ML models, including burdens of cognitive load and concerns of injecting bias. Unexpectedly participants also saw the value of using semantic interactions to work collaboratively with members of their team. Participants with less ML expertise found this to be a useful mechanism for communicating their concerns to ML experts. This was an especially important observation, as our study also shows the different needs that correspond to diverse ML expertise. Collectively, we demonstrate that design probes are effective tools for proactively gathering the affordances that should be offered in an interactive machine learning system.
results: KAE achieves remarkable diversity in molecule generation while maintaining near-perfect reconstructions on an independent test set. KAE also supports conditional generation and beam-search-based decoding, achieving state-of-the-art performance in constrained optimization. Finally, KAE can generate molecules conditioned on favorable binding affinities, outperforming all candidates from the training dataset on AutoDock Vina and Glide scores.
Abstract
We introduce the Kernel-Elastic Autoencoder (KAE), a self-supervised generative model based on the transformer architecture with enhanced performance for molecular design. KAE is formulated based on two novel loss functions: modified maximum mean discrepancy and weighted reconstruction. KAE addresses the long-standing challenge of achieving valid generation and accurate reconstruction at the same time. KAE achieves remarkable diversity in molecule generation while maintaining near-perfect reconstructions on the independent testing dataset, surpassing previous molecule-generating models. KAE enables conditional generation and allows for decoding based on beam search resulting in state-of-the-art performance in constrained optimizations. Furthermore, KAE can generate molecules conditional to favorable binding affinities in docking applications as confirmed by AutoDock Vina and Glide scores, outperforming all existing candidates from the training dataset. Beyond molecular design, we anticipate KAE could be applied to solve problems by generation in a wide range of applications.
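A hedged sketch of a standard (unmodified) maximum mean discrepancy with a Gaussian kernel, the quantity that KAE's "modified MMD" loss builds on: it measures the distance between a batch of latent codes and samples from the prior, encouraging the aggregate posterior to match the prior. The bandwidth and the biased estimator (diagonal terms included) are simplifications.

```python
import torch

def gaussian_kernel(a, b, bandwidth=1.0):
    d2 = torch.cdist(a, b) ** 2
    return torch.exp(-d2 / (2 * bandwidth ** 2))

def mmd(x, y, bandwidth=1.0):
    kxx = gaussian_kernel(x, x, bandwidth).mean()
    kyy = gaussian_kernel(y, y, bandwidth).mean()
    kxy = gaussian_kernel(x, y, bandwidth).mean()
    return kxx + kyy - 2 * kxy

latent = torch.randn(64, 32) * 1.5 + 0.3     # stand-in for encoder outputs
prior = torch.randn(64, 32)                  # samples from N(0, I)
print(float(mmd(latent, prior)))             # > 0 when the distributions differ
```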
Machine Learning Who to Nudge: Causal vs Predictive Targeting in a Field Experiment on Student Financial Aid Renewal
results: The study finds that targeting intermediate predicted outcomes is most effective, while targeting low predicted outcomes is actually detrimental. In one year of the experiment, nudging all students improved early filing by an average of 6.4 percentage points over a baseline average of 37%, and targeting half of the students with the preferred policy is estimated to attain around 75% of this benefit.
Abstract
In many settings, interventions may be more effective for some individuals than others, so that targeting interventions may be beneficial. We analyze the value of targeting in the context of a large-scale field experiment with over 53,000 college students, where the goal was to use "nudges" to encourage students to renew their financial-aid applications before a non-binding deadline. We begin with baseline approaches to targeting. First, we target based on a causal forest that estimates heterogeneous treatment effects and then assigns students to treatment according to those estimated to have the highest treatment effects. Next, we evaluate two alternative targeting policies, one targeting students with low predicted probability of renewing financial aid in the absence of the treatment, the other targeting those with high probability. The predicted baseline outcome is not the ideal criterion for targeting, nor is it a priori clear whether to prioritize low, high, or intermediate predicted probability. Nonetheless, targeting on low baseline outcomes is common in practice, for example because the relationship between individual characteristics and treatment effects is often difficult or impossible to estimate with historical data. We propose hybrid approaches that incorporate the strengths of both predictive approaches (accurate estimation) and causal approaches (correct criterion); we show that targeting intermediate baseline outcomes is most effective, while targeting based on low baseline outcomes is detrimental. In one year of the experiment, nudging all students improved early filing by an average of 6.4 percentage points over a baseline average of 37% filing, and we estimate that targeting half of the students using our preferred policy attains around 75% of this benefit.
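A hedged sketch of two of the targeting policies compared above, on synthetic data: a causal rule that ranks individuals by an estimated treatment effect (here a simple T-learner rather than a causal forest) and a predictive rule that ranks by the predicted baseline outcome, each targeting half the population. All data-generating details are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n, d = 10000, 5
X = rng.normal(size=(n, d))
treat = rng.integers(0, 2, size=n)                 # randomized nudge
baseline = 0.3 + 0.2 * (X[:, 0] > 0)               # filing propensity
effect = 0.15 * (X[:, 1] > 0)                      # heterogeneous lift
y = (rng.random(n) < baseline + treat * effect).astype(float)

m1 = GradientBoostingRegressor().fit(X[treat == 1], y[treat == 1])
m0 = GradientBoostingRegressor().fit(X[treat == 0], y[treat == 0])

cate_hat = m1.predict(X) - m0.predict(X)           # causal criterion
baseline_hat = m0.predict(X)                       # predictive criterion

target_causal = cate_hat >= np.median(cate_hat)
target_low_baseline = baseline_hat <= np.median(baseline_hat)
print(effect[target_causal].mean(), effect[target_low_baseline].mean())
# Ranking by estimated effects concentrates more of the true lift
# among the targeted half than ranking by the predicted baseline outcome.
```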
Every Parameter Matters: Ensuring the Convergence of Federated Learning with Dynamic Heterogeneous Models Reduction
paper_authors: Hanhan Zhou, Tian Lan, Guru Venkataramani, Wenbo Ding
for: This paper focuses on addressing the challenges of cross-device Federated Learning (FL) by developing a unifying framework for heterogeneous FL algorithms with online model extraction, and providing a general convergence analysis for the first time.
methods: The paper proposes a holistic approach that considers both model reduction noise and minimum coverage index to enhance the efficiency of heterogeneous FL.
results: The authors prove that under certain sufficient conditions, the proposed algorithms converge to a stationary point of standard FL for general smooth cost functions, both for IID and non-IID data.
Abstract
Cross-device Federated Learning (FL) faces significant challenges where low-end clients that could potentially make unique contributions are excluded from training large models due to their resource bottlenecks. Recent research efforts have focused on model-heterogeneous FL, by extracting reduced-size models from the global model and applying them to local clients accordingly. Despite the empirical success, general theoretical guarantees of convergence on this method remain an open question. This paper presents a unifying framework for heterogeneous FL algorithms with online model extraction and provides a general convergence analysis for the first time. In particular, we prove that under certain sufficient conditions and for both IID and non-IID data, these algorithms converge to a stationary point of standard FL for general smooth cost functions. Moreover, we introduce the concept of minimum coverage index, together with model reduction noise, which will determine the convergence of heterogeneous federated learning, and therefore we advocate for a holistic approach that considers both factors to enhance the efficiency of heterogeneous federated learning.
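A hedged illustration of model-heterogeneous FL with model extraction: each client receives a width-reduced submodel (here, simply the first fraction of the global hidden units), trains it locally (omitted), and the server aggregates updates parameter-wise, averaging each entry over the clients that actually covered it. The "minimum coverage index" is reported here loosely as the number of clients touching the least-covered unit; the paper's definition and extraction scheme are more general.

```python
import numpy as np

rng = np.random.default_rng(0)
global_W = rng.normal(size=(16, 8))          # one global layer: 16 hidden units

def extract(W, keep_fraction):
    k = int(W.shape[0] * keep_fraction)
    return W[:k].copy()                      # reduced-size submodel slice

def aggregate(global_W, client_updates):
    acc = np.zeros_like(global_W)
    coverage = np.zeros(global_W.shape[0])
    for W_local in client_updates:
        k = W_local.shape[0]
        acc[:k] += W_local
        coverage[:k] += 1
    covered = coverage > 0
    new_W = global_W.copy()
    new_W[covered] = acc[covered] / coverage[covered, None]
    return new_W, int(coverage.min())        # aggregated layer, coverage index

# Three clients with different capacities train slices of different widths.
updates = [extract(global_W, f) + 0.01 * rng.normal(size=(int(16 * f), 8))
           for f in (1.0, 0.5, 0.25)]
new_W, min_cov = aggregate(global_W, updates)
print(new_W.shape, min_cov)                  # (16, 8), minimum coverage = 1
```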
Counting and Algorithmic Generalization with Transformers
results: The paper shows that a modified Transformer architecture can achieve good out-of-distribution generalization on counting tasks while using a very lightweight architecture.
Abstract
Algorithmic generalization in machine learning refers to the ability to learn the underlying algorithm that generates data in a way that generalizes out-of-distribution. This is generally considered a difficult task for most machine learning algorithms. Here, we analyze algorithmic generalization when counting is required, either implicitly or explicitly. We show that standard Transformers are based on architectural decisions that hinder out-of-distribution performance for such tasks. In particular, we discuss the consequences of using layer normalization and of normalizing the attention weights via softmax. With ablation of the problematic operations, we demonstrate that a modified transformer can exhibit a good algorithmic generalization performance on counting while using a very lightweight architecture.
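A small numeric illustration of the issue discussed above: with softmax-normalized attention, the weighted sum over n identical tokens is the same for every n (the weights always sum to 1), so the count is erased, whereas an unnormalized sum keeps it. This mirrors why the ablated operations hurt counting, though the paper's analysis concerns full Transformer blocks.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

value = np.array([1.0])                      # value vector of the repeated token
for n in (2, 5, 10):
    scores = np.zeros(n)                     # identical keys -> identical scores
    attn_out = softmax(scores) @ np.tile(value, (n, 1))
    summed_out = np.ones(n) @ np.tile(value, (n, 1))
    print(n, attn_out.item(), summed_out.item())
# The attention output stays 1.0 regardless of n; the plain sum grows with n.
```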
SplitBeam: Effective and Efficient Beamforming in Wi-Fi Networks Through Split Computing
paper_authors: Niloofar Bahadori, Yoshitomo Matsubara, Marco Levorato, Francesco Restuccia
for: Improving the throughput of Wi-Fi networks
methods: Uses a split deep neural network (SplitBeam) to directly generate the beamforming matrix (BM), and formulates and solves a bottleneck optimization problem (BOP)
results: Compared with the standard IEEE 802.11 algorithm and the state-of-the-art DNN-based approach LB-SciFi, SplitBeam reduces the beamforming feedback size and computational complexity while keeping the bit error rate (BER) within about 10^-3 of existing approaches; the DNNs are also implemented on FPGA hardware to estimate the end-to-end BM reporting delay.
Abstract
Modern IEEE 802.11 (Wi-Fi) networks extensively rely on multiple-input multiple-output (MIMO) to significantly improve throughput. To correctly beamform MIMO transmissions, the access point needs to frequently acquire a beamforming matrix (BM) from each connected station. However, the size of the matrix grows with the number of antennas and subcarriers, resulting in an increasing amount of airtime overhead and computational load at the station. Conventional approaches come with either excessive computational load or loss of beamforming precision. For this reason, we propose SplitBeam, a new framework where we train a split deep neural network (DNN) to directly output the BM given the channel state information (CSI) matrix as input. We formulate and solve a bottleneck optimization problem (BOP) to keep computation, airtime overhead, and bit error rate (BER) below application requirements. We perform extensive experimental CSI collection with off-the-shelf Wi-Fi devices in two distinct environments and compare the performance of SplitBeam with the standard IEEE 802.11 algorithm for BM feedback and the state-of-the-art DNN-based approach LB-SciFi. Our experimental results show that SplitBeam reduces the beamforming feedback size and computational complexity by respectively up to 81% and 84% while maintaining BER within about 10^-3 of existing approaches. We also implement the SplitBeam DNNs on FPGA hardware to estimate the end-to-end BM reporting delay, and show that the latter is less than 10 milliseconds in the most complex scenario, which is the target channel sounding frequency in realistic multi-user MIMO scenarios.
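A hedged sketch of the split-computing idea: a station-side encoder compresses the CSI into a small bottleneck (the part transmitted over the air), and an AP-side decoder reconstructs the beamforming feedback from it. The dimensions and layer sizes below are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class StationEncoder(nn.Module):
    def __init__(self, csi_dim=512, bottleneck=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(csi_dim, 128), nn.ReLU(),
                                 nn.Linear(128, bottleneck))

    def forward(self, csi):
        return self.net(csi)                 # compact feedback to transmit

class APDecoder(nn.Module):
    def __init__(self, bottleneck=32, bm_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(bottleneck, 128), nn.ReLU(),
                                 nn.Linear(128, bm_dim))

    def forward(self, feedback):
        return self.net(feedback)            # beamforming matrix entries

csi = torch.randn(4, 512)                    # flattened CSI per station
encoder, decoder = StationEncoder(), APDecoder()
bm = decoder(encoder(csi))
print(bm.shape)                              # torch.Size([4, 256])
```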
Time-vectorized numerical integration for systems of ODEs
results: The method achieves speed-ups of greater than 100x compared to standard, sequential time integration and can fully utilize modern GPUs. The paper also provides several example problems to illustrate the advantages of the method.
Abstract
Stiff systems of ordinary differential equations (ODEs) and sparse training data are common in scientific problems. This paper describes efficient, implicit, vectorized methods for integrating stiff systems of ordinary differential equations through time and calculating parameter gradients with the adjoint method. The main innovation is to vectorize the problem both over the number of independent times series and over a batch or "chunk" of sequential time steps, effectively vectorizing the assembly of the implicit system of ODEs. The block-bidiagonal structure of the linearized implicit system for the backward Euler method allows for further vectorization using parallel cyclic reduction (PCR). Vectorizing over both axes of the input data provides a higher bandwidth of calculations to the computing device, allowing even problems with comparatively sparse data to fully utilize modern GPUs and achieving speed ups of greater than 100x, compared to standard, sequential time integration. We demonstrate the advantages of implicit, vectorized time integration with several example problems, drawn from both analytical stiff and non-stiff ODE models as well as neural ODE models. We also describe and provide a freely available open-source implementation of the methods developed here.
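A hedged sketch of the vectorization idea in its simplest form: backward Euler with Newton iterations applied simultaneously to a whole batch of independent scalar ODEs dy/dt = -k*y^3, with one stiffness constant k per series. The paper additionally vectorizes over chunks of time steps and uses parallel cyclic reduction for the resulting block-bidiagonal systems, which is not shown here.

```python
import numpy as np

def backward_euler_batch(y0, k, dt, n_steps, newton_iters=8):
    y = y0.copy()
    history = [y.copy()]
    for _ in range(n_steps):
        y_prev, y_new = y, y.copy()
        for _ in range(newton_iters):          # Newton on g(y) = 0, per series
            g = y_new - y_prev + dt * k * y_new ** 3
            dg = 1.0 + 3.0 * dt * k * y_new ** 2
            y_new = y_new - g / dg              # all series updated at once
        y = y_new
        history.append(y.copy())
    return np.stack(history)

k = np.array([1.0, 10.0, 1000.0])              # three series, one very stiff
traj = backward_euler_batch(np.ones(3), k, dt=0.1, n_steps=50)
print(traj[-1])                                # stable even for the large k
```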
Bucks for Buckets (B4B): Active Defenses Against Stealing Encoders
paper_authors: Jan Dubiński, Stanisław Pawlak, Franziska Boenisch, Tomasz Trzciński, Adam Dziedzic
For: This study aims to protect high-utility encoders behind Machine Learning as a Service (MLaaS) APIs from model stealing attacks.
Methods: The study proposes an active defense called Bucks for Buckets (B4B) that observes how much of the embedding space a user's queries cover and adaptively adjusts the utility of the returned representations; it also transforms each user's representations individually.
Results: B4B prevents stealing of the encoder's functionality without degrading representation quality for legitimate users, and it prevents adaptive adversaries from evading the defense by creating multiple accounts (sybils).
Abstract
Machine Learning as a Service (MLaaS) APIs provide ready-to-use and high-utility encoders that generate vector representations for given inputs. Since these encoders are very costly to train, they become lucrative targets for model stealing attacks during which an adversary leverages query access to the API to replicate the encoder locally at a fraction of the original training costs. We propose Bucks for Buckets (B4B), the first active defense that prevents stealing while the attack is happening without degrading representation quality for legitimate API users. Our defense relies on the observation that the representations returned to adversaries who try to steal the encoder's functionality cover a significantly larger fraction of the embedding space than representations of legitimate users who utilize the encoder to solve a particular downstream task. B4B leverages this to adaptively adjust the utility of the returned representations according to a user's coverage of the embedding space. To prevent adaptive adversaries from eluding our defense by simply creating multiple user accounts (sybils), B4B also individually transforms each user's representations. This prevents the adversary from directly aggregating representations over multiple accounts to create their stolen encoder copy. Our active defense opens a new path towards securely sharing and democratizing encoders over public APIs.
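A hedged sketch of the coverage signal the defense is built on: returned embeddings are hashed into buckets (here via random hyperplane signs), and the number of distinct buckets a user has occupied estimates how much of the embedding space their queries cover. A legitimate downstream user concentrates in few buckets; an attacker sweeping the space occupies many. The hashing scheme and thresholds are illustrative, not B4B's exact mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_planes = 128, 12                        # 2^12 possible buckets
planes = rng.normal(size=(n_planes, dim))

def bucket_id(embedding):
    bits = (planes @ embedding > 0).astype(int)
    return int("".join(map(str, bits)), 2)

def coverage(user_embeddings):
    return len({bucket_id(e) for e in user_embeddings})

legit = rng.normal(size=(500, dim)) + 3.0      # queries from one narrow task
attacker = rng.normal(size=(500, dim))         # queries spread over the space
print(coverage(legit), coverage(attacker))     # attacker occupies far more buckets
```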
Stronger Coreset Bounds for Kernel Density Estimators via Chaining
results: The paper gives randomized polynomial-time algorithms that produce coresets of size $O\big(\frac{\sqrt{d}}{\varepsilon}\sqrt{\log\log \frac{1}{\varepsilon}}\big)$ for the Gaussian and Laplacian kernels when the data set is uniformly bounded, where $d$ is the dimension of the data and $\varepsilon$ is the error parameter. For the Laplacian kernel with $d$ constant, it obtains coresets of size $O\big(\frac{1}{\varepsilon}\sqrt{\log\log \frac{1}{\varepsilon}}\big)$. Finally, it gives the best known bounds of $O\big(\frac{\sqrt{d}}{\varepsilon}\sqrt{\log(2\max\{1,\alpha\})}\big)$ on the coreset complexity of the exponential, Hellinger, and JS kernels, where $1/\alpha$ is the bandwidth parameter of the kernel.
Abstract
We apply the discrepancy method and a chaining approach to give improved bounds on the coreset complexity of a wide class of kernel functions. Our results give randomized polynomial time algorithms to produce coresets of size $O\big(\frac{\sqrt{d}}{\varepsilon}\sqrt{\log\log \frac{1}{\varepsilon}}\big)$ for the Gaussian and Laplacian kernels in the case that the data set is uniformly bounded, an improvement that was not possible with previous techniques. We also obtain coresets of size $O\big(\frac{1}{\varepsilon}\sqrt{\log\log \frac{1}{\varepsilon}}\big)$ for the Laplacian kernel for $d$ constant. Finally, we give the best known bounds of $O\big(\frac{\sqrt{d}}{\varepsilon}\sqrt{\log(2\max\{1,\alpha\})}\big)$ on the coreset complexity of the exponential, Hellinger, and JS Kernels, where $1/\alpha$ is the bandwidth parameter of the kernel.
Divorce Prediction with Machine Learning: Insights and LIME Interpretability
For: The paper aims to predict whether a couple will divorce or not using machine learning algorithms and interpretability techniques.
Methods: The authors use six machine learning algorithms (Logistic Regression, Linear Discriminant Analysis, K-Nearest Neighbors, Classification and Regression Trees, Gaussian Naïve Bayes, and Support Vector Machines) to classify married and divorced individuals based on a dataset. They also use Local Interpretable Model-Agnostic Explanations (LIME) to provide interpretable results.
Results: The authors achieve an accuracy of 98.57% in predicting divorce using the SVM, KNN, and LDA algorithms. They also use LIME to explain the prediction probabilities and identify the most important features that differentiate divorced and married couples. Additionally, they develop a divorce predictor app that considers the ten most important features to help couples make decisions about their relationship.
Abstract
Divorce is one of the most common social issues in developed countries like the United States. Almost 50% of the recent marriages turn into an involuntary divorce or separation. While it is evident that people vary to a different extent, and even over time, an incident like Divorce does not interrupt the individual's daily activities; still, Divorce has a severe effect on the individual's mental health, and personal life. Within the scope of this research, the divorce prediction was carried out by evaluating a dataset named the 'divorce predictor dataset' to correctly classify between married and divorced people using six different machine learning algorithms: Logistic Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Classification and Regression Trees (CART), Gaussian Naïve Bayes (NB), and Support Vector Machines (SVM). Preliminary computational results show that algorithms such as SVM, KNN, and LDA can perform that task with an accuracy of 98.57%. This work's additional novel contribution is the detailed and comprehensive explanation of prediction probabilities using Local Interpretable Model-Agnostic Explanations (LIME). Utilizing LIME to analyze test results illustrates the possibility of differentiating between divorced and married couples. Finally, we have developed a divorce predictor app considering the ten most important features that potentially affect couples in making decisions in their divorce; such tools can be used by anyone in order to identify their relationship condition.
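A hedged sketch of the SVM + LIME pipeline described above, on synthetic stand-in data with the same shape as the divorce predictor dataset (54 questionnaire attributes). It requires scikit-learn and the `lime` package; the API usage follows the commonly documented form.

```python
import numpy as np
from sklearn.svm import SVC
from lime.lime_tabular import LimeTabularExplainer

rng = np.random.default_rng(0)
n, d = 170, 54                                   # toy data with the same shape
X = rng.integers(0, 5, size=(n, d)).astype(float)
y = (X[:, :10].mean(axis=1) > 2).astype(int)     # synthetic "divorced" label

clf = SVC(probability=True).fit(X, y)            # probability=True needed for LIME

explainer = LimeTabularExplainer(
    X, feature_names=[f"Atr{i+1}" for i in range(d)],
    class_names=["married", "divorced"], discretize_continuous=True)

explanation = explainer.explain_instance(X[0], clf.predict_proba, num_features=10)
for feature, weight in explanation.as_list():    # top contributing features
    print(feature, round(weight, 3))
```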
Characterizing climate pathways using feature importance on echo state networks
results: 研究发现,使用回声状态网络(echo state network,ESN)可以准确地捕捉气候变量之间的关系,并且可以用特征重要性来评估这些关系的重要性。Abstract
The 2022 National Defense Strategy of the United States listed climate change as a serious threat to national security. Climate intervention methods, such as stratospheric aerosol injection, have been proposed as mitigation strategies, but the downstream effects of such actions on a complex climate system are not well understood. The development of algorithmic techniques for quantifying relationships between source and impact variables related to a climate event (i.e., a climate pathway) would help inform policy decisions. Data-driven deep learning models have become powerful tools for modeling highly nonlinear relationships and may provide a route to characterize climate variable relationships. In this paper, we explore the use of an echo state network (ESN) for characterizing climate pathways. ESNs are a computationally efficient neural network variation designed for temporal data, and recent work proposes ESNs as a useful tool for forecasting spatio-temporal climate data. Like other neural networks, ESNs are non-interpretable black-box models, which poses a hurdle for understanding variable relationships. We address this issue by developing feature importance methods for ESNs in the context of spatio-temporal data to quantify variable relationships captured by the model. We conduct a simulation study to assess and compare the feature importance techniques, and we demonstrate the approach on reanalysis climate data. In the climate application, we select a time period that includes the 1991 volcanic eruption of Mount Pinatubo. This event was a significant stratospheric aerosol injection, which we use as a proxy for an artificial stratospheric aerosol injection. Using the proposed approach, we are able to characterize relationships between pathway variables associated with this event.
摘要
美国2022年国防战略将气候变化列为国家安全的严重威胁。气候干预方法(如平流层气溶胶注入)已被提议作为缓解策略,但此类行动对复杂气候系统的下游影响尚不清楚。开发算法技术来量化与某一气候事件相关的源变量与影响变量之间的关系(即气候路径),有助于为政策决策提供依据。数据驱动的深度学习模型已成为建模高度非线性关系的强大工具,可能为刻画气候变量关系提供途径。在这篇文章中,我们探讨使用回声状态网络(ESN)来刻画气候路径。ESN是为时间数据设计的一种计算高效的神经网络变体,近期的研究还提出将ESN用于预测时空气候数据。与其他神经网络一样,ESN是不可解释的黑箱模型,这为理解变量关系带来了障碍。我们通过为时空数据情境下的ESN开发特征重要性方法来解决这一问题,以量化模型所捕捉的变量关系。我们通过模拟研究评估并比较这些特征重要性技术,并在再分析气候数据上演示了该方法。在气候应用中,我们选择了包含1991年皮纳图博火山喷发的时间段;该事件是一次显著的平流层气溶胶注入,我们将其作为人工平流层气溶胶注入的代理。使用所提出的方法,我们能够刻画与该事件相关的路径变量之间的关系。
Strategies and impact of learning curve estimation for CNN-based image classification
results: 根据实验结果,提出的采样策略可以减少模型训练时间,同时仍能准确地估计学习曲线。Abstract
Learning curves are a measure of how the performance of machine learning models improves given a certain volume of training data. Over a wide variety of applications and models it has been observed that learning curves follow -- to a large extent -- a power law behavior. This makes the performance of different models for a given task somewhat predictable and opens the opportunity to reduce training time for practitioners who are exploring the space of possible models and hyperparameters for the problem at hand. By estimating the learning curve of a model from training on small subsets of data, only the best models need to be considered for training on the full dataset. How to choose subset sizes and how often to sample models on these to obtain estimates has, however, not been researched. Given that the goal is to reduce overall training time, strategies are needed that sample the performance in a time-efficient way and yet lead to accurate learning curve estimates. In this paper we formulate the framework for these strategies and propose several strategies. Further, we evaluate the strategies on simulated learning curves and in experiments with popular datasets and models for image classification tasks.
摘要
学习曲线度量机器学习模型的性能如何随训练数据量的增加而提升。在各种应用和模型中观察到,学习曲线在很大程度上遵循幂律行为。这使得给定任务下不同模型的性能具有一定的可预测性,从而为正在探索模型和超参数空间的实践者提供了减少训练时间的机会。通过在小规模数据子集上训练来估计模型的学习曲线,只需将最优的模型用于全数据集训练。然而,如何选择子集大小以及在这些子集上以何种频率采样模型以获得估计,尚未得到研究。鉴于目标是减少总训练时间,需要既能高效采样性能、又能准确估计学习曲线的策略。在这篇论文中,我们给出了这类策略的框架,并提出了若干策略。此外,我们在模拟学习曲线以及使用常见图像分类数据集和模型的实验中评估了这些策略。
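Since the paper builds on the observation that learning curves largely follow a power law, a minimal sketch of that idea is to fit $\mathrm{err}(n) \approx a\,n^{-b}$ to a few (subset size, error) measurements and extrapolate; the numbers below are made up and the fit is an ordinary log-log least squares, not one of the sampling strategies proposed in the paper.

```python
import numpy as np

# Observed (subset size, validation error) pairs from training on small subsets.
n = np.array([100, 200, 400, 800, 1600])
err = np.array([0.42, 0.33, 0.27, 0.22, 0.185])

# Power-law model err ~ a * n^(-b): linear fit in log-log space.
slope, intercept = np.polyfit(np.log(n), np.log(err), 1)
a, b = np.exp(intercept), -slope

# Extrapolate to the full data set size to decide whether full training is worthwhile.
n_full = 50_000
print(f"fitted err ~ {a:.2f} * n^(-{b:.2f}); "
      f"predicted error at n={n_full}: {a * n_full ** (-b):.3f}")
```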
Differentially Private Non-convex Learning for Multi-layer Neural Networks
paper_authors: Hanpu Shen, Cheng-Long Wang, Zihang Xiang, Yiming Ying, Di Wang
for: 本研究探讨了多层全连接神经网络的差分隐私随机优化问题。
methods: 我们提出了若干算法,并进行了分析,证明可以实现与数据维度无关的超额总体风险。
results: 我们的研究表明,在一些特定的情况下,DP-SGD可以提供更优的超额总体风险保证。Abstract
This paper focuses on the problem of Differentially Private Stochastic Optimization for (multi-layer) fully connected neural networks with a single output node. In the first part, we examine cases with no hidden nodes, specifically focusing on Generalized Linear Models (GLMs). We investigate the well-specified model where the random noise possesses a zero mean, and the link function is both bounded and Lipschitz continuous. We propose several algorithms and our analysis demonstrates the feasibility of achieving an excess population risk that remains invariant to the data dimension. We also delve into the scenario involving the ReLU link function, and our findings mirror those of the bounded link function. We conclude this section by contrasting well-specified and misspecified models, using ReLU regression as a representative example. In the second part of the paper, we extend our ideas to two-layer neural networks with sigmoid or ReLU activation functions in the well-specified model. In the third part, we study the theoretical guarantees of DP-SGD in Abadi et al. (2016) for fully connected multi-layer neural networks. By utilizing recent advances in Neural Tangent Kernel theory, we provide the first excess population risk when both the sample size and the width of the network are sufficiently large. Additionally, we discuss the role of some parameters in DP-SGD regarding their utility, both theoretically and empirically.
摘要
本文研究单输出节点的(多层)全连接神经网络的差分隐私随机优化问题。在第一部分,我们研究没有隐藏节点的情形,重点关注广义线性模型(GLM)。我们研究良定(well-specified)模型,其中随机噪声均值为零,且链接函数有界并满足 Lipschitz 连续性。我们提出了若干算法,分析表明可以实现与数据维度无关的超额总体风险。我们还研究了 ReLU 链接函数的情形,结论与有界链接函数的情形一致。在这一部分的最后,我们以 ReLU 回归为例,对良定模型与误定模型进行了对比。在论文的第二部分,我们将这些思想推广到良定模型下带 sigmoid 或 ReLU 激活函数的两层神经网络。在第三部分,我们研究了 Abadi et al. (2016) 中 DP-SGD 在全连接多层神经网络上的理论保证。利用神经正切核(Neural Tangent Kernel)理论的最新进展,我们在样本量和网络宽度都足够大时给出了首个超额总体风险界。此外,我们还从理论和实验两方面讨论了 DP-SGD 中若干参数对效用的影响。
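For readers unfamiliar with the DP-SGD baseline analyzed in the third part, the following NumPy sketch shows the two operations that distinguish it from ordinary SGD, per-example gradient clipping and Gaussian noise addition, on a toy logistic regression; the clipping norm, noise multiplier, and data are illustrative assumptions, and no privacy accounting is performed.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_sgd_logreg(X, y, epochs=5, lr=0.1, batch=64, clip=1.0, noise_mult=1.0):
    """DP-SGD sketch: clip per-example gradients, add Gaussian noise to their sum."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), n // batch):
            p = 1.0 / (1.0 + np.exp(-X[idx] @ w))
            per_ex_grad = (p - y[idx])[:, None] * X[idx]           # one gradient per example
            norms = np.linalg.norm(per_ex_grad, axis=1, keepdims=True)
            clipped = per_ex_grad / np.maximum(1.0, norms / clip)  # enforce L2 norm <= clip
            noisy = clipped.sum(0) + rng.normal(0, noise_mult * clip, d)
            w -= lr * noisy / len(idx)
    return w

X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.1 * rng.normal(size=1000) > 0).astype(float)
w_priv = dp_sgd_logreg(X, y)
```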
Introducing a Deep Neural Network-based Model Predictive Control Framework for Rapid Controller Implementation
paper_authors: David C. Gordon, Alexander Winkler, Julian Bedei, Patrick Schaber, Jakob Andert, Charles R. Koch
For: This paper presents an experimental implementation of a deep neural network (DNN) based nonlinear model predictive control (MPC) for Homogeneous Charge Compression Ignition (HCCI) combustion control.* Methods: The MPC uses a Long Short-Term Memory (LSTM) network surrounded by fully connected layers, which was trained using experimental engine data and showed acceptable prediction performance with under 5% error for all outputs.* Results: The developed controller was able to track the Indicated Mean Effective Pressure (IMEP) and combustion phasing trajectories, while minimizing several parameters. The IMEP trajectory following was excellent, with a root-mean-square error of 0.133 bar, and process constraints were observed.Here is the same information in Simplified Chinese text:* For: 这篇论文介绍了一种基于深度神经网络(DNN)的非线性模型预测控制(MPC)方法的实验实现,用于均质压燃(HCCI)燃烧控制。* Methods: MPC使用由全连接层包围的长短期记忆(LSTM)网络,该网络利用实验发动机数据训练,对所有输出的预测误差低于5%,预测性能可接受。* Results: 所开发的控制器能够跟踪指示平均有效压力(IMEP)和燃烧相位轨迹,同时最小化若干参数。IMEP轨迹跟踪非常出色,均方根误差为0.133 bar,并且满足了过程约束。Abstract
Model Predictive Control (MPC) provides an optimal control solution based on a cost function while allowing for the implementation of process constraints. As a model-based optimal control technique, the performance of MPC strongly depends on the model used where a trade-off between model computation time and prediction performance exists. One solution is the integration of MPC with a machine learning (ML) based process model which are quick to evaluate online. This work presents the experimental implementation of a deep neural network (DNN) based nonlinear MPC for Homogeneous Charge Compression Ignition (HCCI) combustion control. The DNN model consists of a Long Short-Term Memory (LSTM) network surrounded by fully connected layers which was trained using experimental engine data and showed acceptable prediction performance with under 5% error for all outputs. Using this model, the MPC is designed to track the Indicated Mean Effective Pressure (IMEP) and combustion phasing trajectories, while minimizing several parameters. Using the acados software package to enable the real-time implementation of the MPC on an ARM Cortex A72, the optimization calculations are completed within 1.4 ms. The external A72 processor is integrated with the prototyping engine controller using a UDP connection allowing for rapid experimental deployment of the NMPC. The IMEP trajectory following of the developed controller was excellent, with a root-mean-square error of 0.133 bar, in addition to observing process constraints.
摘要
How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
results: 研究发现,只需要少量独立任务即可有效地预训练模型,并且预训练后的模型与最优调参的岭回归(ridge regression)几乎一致。Abstract
Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities, enabling them to solve unseen tasks solely based on input contexts without adjusting model parameters. In this paper, we study ICL in one of its simplest setups: pretraining a linearly parameterized single-layer linear attention model for linear regression with a Gaussian prior. We establish a statistical task complexity bound for the attention model pretraining, showing that effective pretraining only requires a small number of independent tasks. Furthermore, we prove that the pretrained model closely matches the Bayes optimal algorithm, i.e., optimally tuned ridge regression, by achieving nearly Bayes optimal risk on unseen tasks under a fixed context length. These theoretical findings complement prior experimental research and shed light on the statistical foundations of ICL.
摘要
在多样任务上预训练的 Transformer 模型表现出了惊人的上下文学习(ICL)能力,使其能够仅凭输入上下文、无需调整模型参数即可解决未见过的任务。在这篇论文中,我们研究ICL的一个最简单的设置:在高斯先验下,为线性回归预训练一个线性参数化的单层线性注意力模型。我们为该注意力模型的预训练建立了统计任务复杂度界,表明有效的预训练只需要少量独立任务。此外,我们证明了预训练后的模型与贝叶斯最优算法(即最优调参的岭回归)非常接近:在固定上下文长度下,它在未见过的任务上达到了接近贝叶斯最优的风险。这些理论结果补充了先前的实验研究,并揭示了ICL的统计基础。
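The Bayes-optimal baseline the pretrained attention model is compared against is optimally tuned ridge regression; under a prior $w \sim \mathcal{N}(0, \tau^2 I)$ and noise variance $\sigma^2$, the optimal ridge parameter is $\sigma^2/\tau^2$. A minimal NumPy sketch of that reference solution on synthetic in-context examples (dimensions and variances are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, tau2, sigma2 = 8, 16, 1.0, 0.25

w_true = rng.normal(scale=np.sqrt(tau2), size=d)            # task drawn from the Gaussian prior
X = rng.normal(size=(n, d))
y = X @ w_true + rng.normal(scale=np.sqrt(sigma2), size=n)  # in-context examples

lam = sigma2 / tau2                                         # Bayes-optimal ridge parameter
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

x_query = rng.normal(size=d)
print("ridge prediction:", x_query @ w_ridge, " true mean:", x_query @ w_true)
```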
Towards Demystifying the Generalization Behaviors When Neural Collapse Emerges
results: 论文的研究结果表明,在TPT阶段内,不同的标签和特征之间的对齐程度可以导致不同的泛化水平,即”非保守泛化”现象。此外,论文还提供了实验证据来支持其理论结论。Abstract
Neural Collapse (NC) is a well-known phenomenon of deep neural networks in the terminal phase of training (TPT). It is characterized by the collapse of features and classifier into a symmetrical structure, known as simplex equiangular tight frame (ETF). While there have been extensive studies on optimization characteristics showing the global optimality of neural collapse, little research has been done on the generalization behaviors during the occurrence of NC. Particularly, the important phenomenon of generalization improvement during TPT has been remaining in an empirical observation and lacking rigorous theoretical explanation. In this paper, we establish the connection between the minimization of CE and a multi-class SVM during TPT, and then derive a multi-class margin generalization bound, which provides a theoretical explanation for why continuing training can still lead to accuracy improvement on test set, even after the train accuracy has reached 100%. Additionally, our further theoretical results indicate that different alignment between labels and features in a simplex ETF can result in varying degrees of generalization improvement, despite all models reaching NC and demonstrating similar optimization performance on train set. We refer to this newly discovered property as "non-conservative generalization". In experiments, we also provide empirical observations to verify the indications suggested by our theoretical results.
摘要
神经坍缩(Neural Collapse, NC)是深度神经网络在训练末期(TPT)的一个著名现象,其特征是特征与分类器坍缩到一个对称结构,即单纯形等角紧框架(simplex ETF)。尽管已有大量研究从优化角度证明了神经坍缩的全局最优性,但对NC出现时的泛化行为的研究很少。特别是,TPT阶段泛化性能继续提升这一重要现象一直停留在经验观察层面,缺乏严格的理论解释。在这篇论文中,我们建立了TPT阶段交叉熵(CE)最小化与多类SVM之间的联系,并由此推导出一个多类间隔泛化界,从理论上解释了为什么在训练准确率达到100%之后继续训练仍能提升测试集准确率。此外,我们进一步的理论结果表明,即使所有模型都达到NC并在训练集上表现出相近的优化性能,标签与特征在单纯形ETF中的不同对齐方式也会带来不同程度的泛化提升。我们将这一新发现的性质称为"非保守泛化"。在实验中,我们也提供了实证观察来验证理论结果的预示。
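A small NumPy sketch of the simplex equiangular tight frame that features and classifiers collapse to: the construction below follows the standard formula $\sqrt{C/(C-1)}\,(I - \tfrac{1}{C}\mathbf{1}\mathbf{1}^\top)$ and only verifies the equal pairwise angles; it does not reproduce any of the paper's generalization analysis.

```python
import numpy as np

def simplex_etf(C):
    """Columns are the C vertices of a simplex equiangular tight frame in R^C."""
    M = np.sqrt(C / (C - 1)) * (np.eye(C) - np.ones((C, C)) / C)
    return M / np.linalg.norm(M, axis=0)   # unit-norm columns

E = simplex_etf(5)
G = E.T @ E     # Gram matrix: 1 on the diagonal, -1/(C-1) = -0.25 off the diagonal
print(np.round(G, 3))
```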
LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios
results: 论文通过对多种任务和环境进行测试,表明了这种方法的潜在强大性和可扩展性。详细的测试结果表明,通过采用这种方法,可以在多种不同的应用中建立可扩展和高效的决策智能。Abstract
Building agents based on tree-search planning capabilities with learned models has achieved remarkable success in classic decision-making problems, such as Go and Atari. However, it has been deemed challenging or even infeasible to extend Monte Carlo Tree Search (MCTS) based algorithms to diverse real-world applications, especially when these environments involve complex action spaces and significant simulation costs, or inherent stochasticity. In this work, we introduce LightZero, the first unified benchmark for deploying MCTS/MuZero in general sequential decision scenarios. Specificially, we summarize the most critical challenges in designing a general MCTS-style decision-making solver, then decompose the tightly-coupled algorithm and system design of tree-search RL methods into distinct sub-modules. By incorporating more appropriate exploration and optimization strategies, we can significantly enhance these sub-modules and construct powerful LightZero agents to tackle tasks across a wide range of domains, such as board games, Atari, MuJoCo, MiniGrid and GoBigger. Detailed benchmark results reveal the significant potential of such methods in building scalable and efficient decision intelligence. The code is available as part of OpenDILab at https://github.com/opendilab/LightZero.
摘要
基于树搜索规划能力并结合学习模型来构建智能体,已在围棋和 Atari 等经典决策问题中取得了显著成功。然而,将基于蒙特卡洛树搜索(MCTS)的算法推广到多样的真实应用被认为是困难甚至不可行的,尤其是当这些环境涉及复杂的动作空间、显著的仿真成本或固有的随机性时。在这项工作中,我们提出了 LightZero,这是第一个在通用序贯决策场景中部署 MCTS/MuZero 的统一基准。具体而言,我们总结了设计通用 MCTS 风格决策求解器所面临的最关键挑战,然后将树搜索强化学习方法中紧耦合的算法与系统设计分解为独立的子模块。通过引入更合适的探索与优化策略,我们可以显著增强这些子模块,构建强大的 LightZero 智能体,以应对棋类游戏、Atari、MuJoCo、MiniGrid 和 GoBigger 等广泛领域的任务。详细的基准结果揭示了此类方法在构建可扩展、高效的决策智能方面的巨大潜力。代码作为 OpenDILab 的一部分发布于 https://github.com/opendilab/LightZero。
paper_authors: Grigory Bartosh, Dmitry Vetrov, Christian A. Naesseth for:NDMs are designed to train generative distributions more efficiently and simplify the reverse process by allowing time-dependent non-linear transformations of data.methods:NDMs use a variational bound to optimize the model in a simulation-free setting, and a time-continuous formulation allows for fast and reliable inference using off-the-shelf numerical ODE and SDE solvers.results:NDMs outperform conventional diffusion models in terms of likelihood and produce high-quality samples, as demonstrated through experiments on standard image generation benchmarks such as CIFAR-10, downsampled versions of ImageNet, and CelebA-HQ.Abstract
Diffusion models have shown remarkable performance on many generative tasks. Despite recent success, most diffusion models are restricted in that they only allow linear transformation of the data distribution. In contrast, broader family of transformations can potentially help train generative distributions more efficiently, simplifying the reverse process and closing the gap between the true negative log-likelihood and the variational approximation. In this paper, we present Neural Diffusion Models (NDMs), a generalization of conventional diffusion models that enables defining and learning time-dependent non-linear transformations of data. We show how to optimise NDMs using a variational bound in a simulation-free setting. Moreover, we derive a time-continuous formulation of NDMs, which allows fast and reliable inference using off-the-shelf numerical ODE and SDE solvers. Finally, we demonstrate the utility of NDMs with learnable transformations through experiments on standard image generation benchmarks, including CIFAR-10, downsampled versions of ImageNet and CelebA-HQ. NDMs outperform conventional diffusion models in terms of likelihood and produce high-quality samples.
摘要
扩散模型在许多生成任务上表现出色。尽管近来取得成功,大多数扩散模型的局限在于只允许对数据分布做线性变换。相比之下,更广泛的一族变换有望更高效地训练生成分布,简化反向过程,并缩小真实负对数似然与变分近似之间的差距。在这篇论文中,我们提出了神经扩散模型(Neural Diffusion Models, NDMs),它是传统扩散模型的推广,能够定义并学习随时间变化的数据非线性变换。我们展示了如何在无需仿真的设置下利用变分下界来优化NDMs。此外,我们推导了NDMs的时间连续形式,从而可以使用现成的数值ODE和SDE求解器进行快速、可靠的推断。最后,我们通过在标准图像生成基准(包括 CIFAR-10、降采样的 ImageNet 以及 CelebA-HQ)上的实验,展示了带可学习变换的NDMs的实用性。NDMs在似然方面优于传统扩散模型,并能生成高质量样本。
Impact of multi-armed bandit strategies on deep recurrent reinforcement learning
results: 研究表明,使用适应性随机方法可以更好地 approximates the trade-off between exploration and exploitation,而且在总体来说,Softmax和Max-Boltzmann策略可以超越epsilon-greedy策略。Abstract
Incomplete knowledge of the environment leads an agent to make decisions under uncertainty. One of the major dilemmas in Reinforcement Learning (RL), where an autonomous agent has to balance two contrasting needs in making its decisions, is exploiting the current knowledge of the environment to maximize the cumulative reward while also exploring actions that improve the knowledge of the environment, hopefully leading to higher reward values (exploration-exploitation trade-off). Concurrently, another relevant issue regards the full observability of the states, which may not be assumed in all applications, such as when only 2D images are considered as input in an RL approach used for finding the optimal action within a 3D simulation environment. In this work, we address these issues by deploying and testing several techniques to balance the exploration-exploitation trade-off on partially observable systems for predicting steering wheels in an autonomous driving scenario. More precisely, the final aim is to investigate the effects of using both stochastic and deterministic multi-armed bandit strategies coupled with a Deep Recurrent Q-Network. Additionally, we adapted and evaluated the impact of an innovative method to improve the learning phase of the underlying Convolutional Recurrent Neural Network. We aim to show that adaptive stochastic methods for exploration better approximate the trade-off between exploration and exploitation as, in general, Softmax and Max-Boltzmann strategies are able to outperform epsilon-greedy techniques.
摘要
对环境的不完全了解使智能体必须在不确定性下做决策。强化学习(RL)中自主智能体在决策时需要平衡两种相互冲突的需求,这是一个主要困境:一方面利用当前对环境的了解来最大化累积奖励,另一方面探索能够改进环境知识的动作,以期获得更高的奖励值(探索-利用权衡)。同时,另一个相关的问题是状态的完全可观测性,这在许多应用中并不能假设,例如在三维仿真环境中寻找最优动作的RL方法只以二维图像作为输入时。在这项工作中,我们针对部分可观测系统中自动驾驶场景下的方向盘预测问题,部署并测试了多种平衡探索与利用的技术。更确切地说,最终目标是研究将随机与确定性多臂老虎机策略与深度循环Q网络(Deep Recurrent Q-Network)相结合的效果。此外,我们还改造并评估了一种改进底层卷积循环神经网络学习阶段的新方法。我们希望表明,自适应的随机探索方法能更好地逼近探索与利用之间的权衡,并且总体而言,Softmax 和 Max-Boltzmann 策略能够胜过 epsilon-greedy 策略。
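For concreteness, the three exploration rules compared in the paper can be sketched as follows; the Q-values, temperature, and epsilon below are placeholders, and the deep recurrent Q-network that would produce the Q-values is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q, eps=0.1):
    # Random action with probability eps, otherwise the greedy one.
    return int(rng.integers(len(q))) if rng.random() < eps else int(np.argmax(q))

def softmax_boltzmann(q, temp=1.0):
    # Sample an action with probability proportional to exp(Q / temperature).
    p = np.exp((q - q.max()) / temp)
    p /= p.sum()
    return int(rng.choice(len(q), p=p))

def max_boltzmann(q, eps=0.1, temp=1.0):
    # Exploit greedily with probability 1 - eps, otherwise explore with a Boltzmann draw.
    return int(np.argmax(q)) if rng.random() > eps else softmax_boltzmann(q, temp)

q_values = np.array([0.2, 0.5, 0.1, 0.4])   # e.g. Q-network outputs for one state
print(epsilon_greedy(q_values), softmax_boltzmann(q_values), max_boltzmann(q_values))
```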
A Symmetry-Aware Exploration of Bayesian Neural Network Posteriors
results: 我们发现 weight-space symmetries 是理解 posterior 的关键方面,并开发了对这种 symmetries 的深入分析。特别是,我们发现 permutation 和 scaling symmetries 对 Bayesian posterior 有重要影响,并探讨了这些 symmetries 与 L2 正则化之间的关系。此外,我们将 shortly release 一个大规模的 checkpoint 数据集,包括了千个实际模型和我们的代码,以帮助社区更好地理解 Bayesian posterior。Abstract
The distribution of the weights of modern deep neural networks (DNNs) - crucial for uncertainty quantification and robustness - is an eminently complex object due to its extremely high dimensionality. This paper proposes one of the first large-scale explorations of the posterior distribution of deep Bayesian Neural Networks (BNNs), expanding its study to real-world vision tasks and architectures. Specifically, we investigate the optimal approach for approximating the posterior, analyze the connection between posterior quality and uncertainty quantification, delve into the impact of modes on the posterior, and explore methods for visualizing the posterior. Moreover, we uncover weight-space symmetries as a critical aspect for understanding the posterior. To this extent, we develop an in-depth assessment of the impact of both permutation and scaling symmetries that tend to obfuscate the Bayesian posterior. While the first type of transformation is known for duplicating modes, we explore the relationship between the latter and L2 regularization, challenging previous misconceptions. Finally, to help the community improve our understanding of the Bayesian posterior, we will shortly release the first large-scale checkpoint dataset, including thousands of real-world models and our codes.
摘要
现代深度神经网络(DNN)的权重分布对不确定性量化与鲁棒性至关重要,但由于维度极高,它是一个极其复杂的对象。这篇论文开展了对深度贝叶斯神经网络(BNN)后验分布的首批大规模探索之一,并将研究扩展到真实世界的视觉任务与架构。具体而言,我们研究近似后验的最优方法,分析后验质量与不确定性量化之间的联系,深入探讨模式(mode)对后验的影响,并探索后验的可视化方法。此外,我们发现权重空间对称性是理解后验的关键因素。为此,我们对容易混淆贝叶斯后验的置换对称性与缩放对称性的影响进行了深入评估。前一类变换以复制模式而著称,而对于后者,我们探讨了它与 L2 正则化之间的关系,纠正了以往的一些误解。最后,为帮助社区加深对贝叶斯后验的理解,我们将很快发布首个大规模checkpoint数据集,其中包含数千个真实世界模型以及我们的代码。
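The two weight-space symmetries discussed above are easy to verify numerically: permuting hidden units (together with the matching rows/columns) and rescaling a ReLU layer by $\alpha$ while dividing the next layer by $\alpha$ both leave the network function unchanged. A minimal NumPy check on a random two-layer ReLU network (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 4)), rng.normal(size=16)
W2, b2 = rng.normal(size=(1, 16)), rng.normal(size=1)

def forward(x, W1, b1, W2, b2):
    return W2 @ np.maximum(W1 @ x + b1, 0) + b2   # 2-layer ReLU network

x = rng.normal(size=4)
perm = rng.permutation(16)     # permutation symmetry: reorder hidden units consistently
alpha = 3.0                    # scaling symmetry of ReLU: scale layer 1, unscale layer 2

out_ref = forward(x, W1, b1, W2, b2)
out_perm = forward(x, W1[perm], b1[perm], W2[:, perm], b2)
out_scale = forward(x, alpha * W1, alpha * b1, W2 / alpha, b2)
print(np.allclose(out_ref, out_perm), np.allclose(out_ref, out_scale))  # True True
```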
results: 研究人员通过对Ising模型进行测试,发现模型可以提取出尺度不变的核,并且可以确定非平衡系统的临界相变。Abstract
Multiscale modeling of complex systems is crucial for understanding their intricacies. Data-driven multiscale modeling has emerged as a promising approach to tackle challenges associated with complex systems. On the other hand, self-similarity is prevalent in complex systems, hinting that large-scale complex systems can be modeled at a reduced cost. In this paper, we introduce a multiscale neural network framework that incorporates self-similarity as prior knowledge, facilitating the modeling of self-similar dynamical systems. For deterministic dynamics, our framework can discern whether the dynamics are self-similar. For uncertain dynamics, it can compare and determine which parameter set is closer to self-similarity. The framework allows us to extract scale-invariant kernels from the dynamics for modeling at any scale. Moreover, our method can identify the power law exponents in self-similar systems. Preliminary tests on the Ising model yielded critical exponents consistent with theoretical expectations, providing valuable insights for addressing critical phase transitions in non-equilibrium systems.
摘要
对复杂系统进行多尺度建模对于理解其内在机理至关重要。数据驱动的多尺度建模已成为应对复杂系统相关挑战的一种有前景的方法。另一方面,自相似性在复杂系统中十分普遍,这意味着大规模复杂系统可以以更低的成本进行建模。在本文中,我们提出了一种将自相似性作为先验知识的多尺度神经网络框架,以便对自相似动力系统进行建模。对于确定性动力学,我们的框架可以判别动力学是否自相似;对于不确定动力学,它可以比较并确定哪一组参数更接近自相似。该框架使我们能够从动力学中提取尺度不变核,从而在任意尺度上进行建模。此外,我们的方法还能识别自相似系统中的幂律指数。在伊辛(Ising)模型上的初步测试得到了与理论预期一致的临界指数,为处理非平衡系统的临界相变提供了有价值的见解。
Towards a Unified Analysis of Kernel-based Methods Under Covariate Shift
results: 提出了一种统一分析方法,并在一个 ricloss函数家族中得到了正确的理论和数值结果,并且在synthetic和实际示例中进行了广泛的数值研究,证明了方法的有效性。Abstract
Covariate shift occurs prevalently in practice, where the input distributions of the source and target data are substantially different. Despite its practical importance in various learning problems, most of the existing methods only focus on some specific learning tasks and are not well validated theoretically and numerically. To tackle this problem, we propose a unified analysis of general nonparametric methods in a reproducing kernel Hilbert space (RKHS) under covariate shift. Our theoretical results are established for a general loss belonging to a rich loss function family, which includes many commonly used methods as special cases, such as mean regression, quantile regression, likelihood-based classification, and margin-based classification. Two types of covariate shift problems are the focus of this paper and the sharp convergence rates are established for a general loss function to provide a unified theoretical analysis, which concurs with the optimal results in literature where the squared loss is used. Extensive numerical studies on synthetic and real examples confirm our theoretical findings and further illustrate the effectiveness of our proposed method.
摘要
协变量偏移在实践中非常普遍,即源数据与目标数据的输入分布存在显著差异。尽管它在各类学习问题中具有重要的实际意义,但现有方法大多只针对某些特定的学习任务,且缺乏充分的理论和数值验证。为了解决这一问题,我们在再生核希尔伯特空间(RKHS)中对协变量偏移下的一般非参数方法提出了统一分析。我们的理论结果适用于一个丰富的损失函数族中的一般损失,该族将均值回归、分位数回归、基于似然的分类和基于间隔的分类等许多常用方法作为特例包含在内。本文重点研究两类协变量偏移问题,并针对一般损失函数建立了精确的收敛速率,从而给出统一的理论分析;这一结果与文献中使用平方损失时的最优结果相一致。在合成数据和真实数据上的大量数值研究验证了我们的理论发现,并进一步说明了所提方法的有效性。
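One standard kernel-based remedy for covariate shift, which the unified analysis above covers as a special case of a general loss, is importance-weighted kernel ridge regression. The sketch below assumes the density ratio $w(x)=p_{\text{target}}(x)/p_{\text{source}}(x)$ is already known (here a made-up placeholder), which in practice would itself have to be estimated.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Source data and (assumed known) density ratios w(x) = p_target(x) / p_source(x).
X = rng.uniform(-2, 0.5, size=(200, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=200)
w = np.exp(1.5 * X[:, 0])              # placeholder ratio; must be estimated in practice

# Weighted kernel ridge regression: minimize sum_i w_i (y_i - f(x_i))^2 + lam ||f||^2.
lam = 1e-2
K = rbf(X, X)
W = np.diag(w)
alpha = np.linalg.solve(W @ K + lam * np.eye(len(X)), W @ y)

X_test = np.linspace(-2, 0.5, 5)[:, None]
print(rbf(X_test, X) @ alpha)          # predictions weighted toward the target region
```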
Emergence of Latent Binary Encoding in Deep Neural Network Classifiers
results: 研究发现,在训练的末 stages,深度神经网络的latent space中会出现二进制编码,这会加速训练过程中的折衔和提高分类精度。Abstract
We observe the emergence of binary encoding within the latent space of deep-neural-network classifiers. Such binary encoding is induced by introducing a linear penultimate layer, which is equipped during training with a loss function that grows as $\exp(\vec{x}^2)$, where $\vec{x}$ are the coordinates in the latent space. The phenomenon we describe represents a specific instance of a well-documented occurrence known as \textit{neural collapse}, which arises in the terminal phase of training and entails the collapse of latent class means to the vertices of a simplex equiangular tight frame (ETF). We show that binary encoding accelerates convergence toward the simplex ETF and enhances classification accuracy.
摘要
Note:* "latent space" is translated as "latent space" (干净空间)* "deep-neural-network classifiers" is translated as "深度神经网络分类器" (shēn dào shén zhī wǎng wǎng)* "binary encoding" is translated as "二进制编码" (èr jì bìn yì)* "linear penultimate layer" is translated as "线性末层" (xiào xìng zhì yù)* "loss function" is translated as "损失函数" (shū shī fún)* "neural collapse" is translated as "神经塌陷" (shén xiān zhù)* "simplex equiangular tight frame" is translated as "等角紧缩框架" (děng jiàng jǐn zhù kōng jì)* "accelerates convergence" is translated as "加速收敛" (jiā sù shōu jí)* "enhances classification accuracy" is translated as "提高分类准确性" (tí gāo fēn xiǎng yì yì)
Conformal inference for regression on Riemannian Manifolds
results: 该论文提出了一种基于流形的预测集方法,并证明了该方法的渐近几乎必然收敛性和效率。通过全面的模拟实验和真实数据分析,论文还验证了该方法的可行性和适用性。Abstract
Regression on manifolds, and, more broadly, statistics on manifolds, has garnered significant importance in recent years due to the vast number of applications for this type of data. Circular data is a classic example, but so is data in the space of covariance matrices, data on the Grassmannian manifold obtained as a result of principal component analysis, among many others. In this work we investigate prediction sets for regression scenarios when the response variable, denoted by $Y$, resides in a manifold, and the covariable, denoted by X, lies in Euclidean space. This extends the concepts delineated in [Lei and Wasserman, 2014] to this novel context. Aligning with traditional principles in conformal inference, these prediction sets are distribution-free, indicating that no specific assumptions are imposed on the joint distribution of $(X, Y)$, and they maintain a non-parametric character. We prove the asymptotic almost sure convergence of the empirical version of these regions on the manifold to their population counterparts. The efficiency of this method is shown through a comprehensive simulation study and an analysis involving real-world data.
摘要
近年来,由于此类数据的应用数量众多,流形上的回归以及更广泛的流形上的统计变得日益重要。圆形数据是一个经典例子,此外还包括协方差矩阵空间中的数据、由主成分分析得到的 Grassmann 流形上的数据等等。在这项工作中,我们研究当响应变量 $Y$ 位于流形上、协变量 $X$ 位于欧氏空间中时回归情形下的预测集。这将 [Lei and Wasserman, 2014] 中阐述的概念推广到这一新情境。与保形推断(conformal inference)的传统原则一致,这些预测集是无分布假设的,即不对 $(X, Y)$ 的联合分布施加任何特定假设,并保持非参数的特性。我们证明了这些区域在流形上的经验版本几乎必然渐近收敛到其总体版本。通过全面的模拟研究和一个真实数据分析,我们展示了该方法的有效性。
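As background, the split-conformal recipe that this work extends to manifold-valued responses looks as follows in the ordinary Euclidean case; the point predictor, miscoverage level, and data below are placeholders, and the manifold-specific machinery of the paper is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data; any point predictor can be plugged in for `predict`.
X = rng.uniform(-3, 3, size=(600, 1))
y = np.sin(X[:, 0]) + 0.2 * rng.normal(size=600)
fit, cal = slice(0, 300), slice(300, 600)          # fitting split / calibration split

coef = np.polyfit(X[fit, 0], y[fit], deg=3)
predict = lambda x: np.polyval(coef, x)

alpha = 0.1
scores = np.abs(y[cal] - predict(X[cal, 0]))        # conformity scores on the calibration split
n_cal = len(scores)
q = np.quantile(scores, np.ceil((1 - alpha) * (n_cal + 1)) / n_cal)

x_new = 1.2
print(f"~90% prediction interval: [{predict(x_new) - q:.2f}, {predict(x_new) + q:.2f}]")
```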
Infinite Width Graph Neural Networks for Node Regression/ Classification
methods: 这篇论文研究了多种架构,包括标准图神经网络、带跳跃拼接(Skip-Concatenate)连接的图神经网络以及图注意力神经网络,并推导了它们对应的核与高斯过程的闭式表达。
results: 这篇论文在多种数据集上进行了直推式(transductive)节点回归与分类任务,并使用谱稀疏化方法(有效电阻)来改善运行时间和内存需求。此外,还简要讨论了向归纳式图学习任务(图回归/分类)的推广。Abstract
This work analyzes Graph Neural Networks, a generalization of Fully-Connected Deep Neural Nets on graph-structured data, when their width, that is the number of nodes in each fully-connected layer, is increased to infinity. Infinite width neural networks connect Deep Learning to Gaussian Processes and Kernels, both Machine Learning frameworks with long traditions and extensive theoretical foundations. Gaussian Processes and Kernels have far fewer hyperparameters than Neural Networks and can be used for uncertainty estimation, making them more user friendly for applications. This work extends the increasing amount of research connecting Gaussian Processes and Kernels to Neural Networks. The Kernel and Gaussian Process closed forms are derived for a variety of architectures, namely the standard Graph Neural Network, the Graph Neural Network with Skip-Concatenate Connections and the Graph Attention Neural Network. All architectures are evaluated on a variety of datasets on the task of transductive node regression and classification. Additionally, a spectral sparsification method known as Effective Resistance is used to improve runtime and memory requirements. Extending the setting to inductive graph learning tasks (graph regression/classification) is straightforward and is briefly discussed in 3.5.
摘要
这项工作分析了图神经网络(GNN)——全连接深度神经网络在图结构数据上的推广——在宽度(即每个全连接层的节点数)趋于无穷大时的行为。无限宽神经网络将深度学习与高斯过程和核方法联系起来,二者都是具有悠久传统和扎实理论基础的机器学习框架。高斯过程和核方法的超参数远少于神经网络,并可用于不确定性估计,因此在应用中更为友好。这项工作拓展了将高斯过程和核方法与神经网络相联系的日益增多的研究。本文为多种架构推导了核与高斯过程的闭式表达,包括标准图神经网络、带跳跃拼接连接的图神经网络以及图注意力神经网络。所有架构都在多种数据集上就直推式节点回归与分类任务进行了评估。此外,我们还使用一种称为有效电阻(Effective Resistance)的谱稀疏化方法来改善运行时间和内存需求。将该设置推广到归纳式图学习任务(图回归/分类)是直接的,并在3.5节中作了简要讨论。
Interpreting Reward Models in RLHF-Tuned Language Models Using Sparse Autoencoders
paper_authors: Luke Marks, Amir Abdullah, Luna Mendez, Rauno Arike, Philip Torr, Fazl Barez
for: This paper aims to provide a method for interpreting the learned reward functions in reinforcement learning-tuned large language models (LLMs), in order to ensure alignment between the model’s behaviors and the specified objectives.
methods: The proposed method uses sparse autoencoders to compare the activations of a base LLM and its RLHF-tuned version, and identifies unique features that reflect the accuracy of the learned reward model.
results: The method provides an abstract approximation of reward integrity, and is the first application of sparse autoencoders for interpreting learned rewards and broadly inspecting reward learning in LLMs.Abstract
Large language models (LLMs) aligned to human preferences via reinforcement learning from human feedback (RLHF) underpin many commercial applications. However, how RLHF impacts LLM internals remains opaque. We propose a novel method to interpret learned reward functions in RLHF-tuned LLMs using sparse autoencoders. Our approach trains autoencoder sets on activations from a base LLM and its RLHF-tuned version. By comparing autoencoder hidden spaces, we identify unique features that reflect the accuracy of the learned reward model. To quantify this, we construct a scenario where the tuned LLM learns token-reward mappings to maximize reward. This is the first application of sparse autoencoders for interpreting learned rewards and broadly inspecting reward learning in LLMs. Our method provides an abstract approximation of reward integrity. This presents a promising technique for ensuring alignment between specified objectives and model behaviors.
摘要
通过人类反馈强化学习(RLHF)对齐人类偏好的大型语言模型(LLM)支撑着许多商业应用。然而,RLHF如何影响LLM的内部机制仍不清晰。我们提出了一种新方法,利用稀疏自编码器来解释经RLHF调优的LLM所学到的奖励函数。我们的方法在基础LLM及其RLHF调优版本的激活上分别训练自编码器集合,通过比较自编码器的隐空间,识别出能够反映所学奖励模型准确性的独特特征。为了对此进行量化,我们构建了一个场景,让调优后的LLM学习词元-奖励映射以最大化奖励。这是首次将稀疏自编码器用于解释所学奖励并广泛审视LLM中的奖励学习。我们的方法提供了对奖励完整性的抽象近似,为确保指定目标与模型行为之间的一致性提供了一种有前景的技术。
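A minimal PyTorch sketch of the kind of sparse autoencoder used here, an overcomplete linear encoder/decoder with an L1 penalty on the feature activations, trained on a batch of (here random, stand-in) LLM activations; the dictionary size, penalty weight, and training loop are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder intended for residual-stream activations."""
    def __init__(self, act_dim=768, dict_size=4096):
        super().__init__()
        self.encoder = nn.Linear(act_dim, dict_size)
        self.decoder = nn.Linear(dict_size, act_dim, bias=False)

    def forward(self, a):
        f = torch.relu(self.encoder(a))      # sparse feature activations
        return self.decoder(f), f

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
acts = torch.randn(256, 768)                 # stand-in for activations collected from an LLM layer

for _ in range(10):
    recon, feats = sae(acts)
    loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().mean()  # reconstruction + L1 sparsity
    opt.zero_grad(); loss.backward(); opt.step()
```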
On Extreme Value Asymptotics of Projected Sample Covariances in High Dimensions with Applications in Finance and Convolutional Networks
results: 研究应用于最小协variance股票组合优化和对偏好风险的分析,ETF指数追踪 by sparse tracking portfolios, convolutional deep learners for image analysis 和 array-of-sensors数据分析。Abstract
Maximum-type statistics of certain functions of the sample covariance matrix of high-dimensional vector time series are studied to statistically confirm or reject the null hypothesis that a data set has been collected under normal conditions. The approach generalizes the case of the maximal deviation of the sample autocovariance function from its assumed values. Within a linear time series framework it is shown that Gumbel-type extreme value asymptotics holds true. As applications we discuss long-only minimal-variance portfolio optimization and subportfolio analysis with respect to idiosyncratic risks, ETF index tracking by sparse tracking portfolios, convolutional deep learners for image analysis and the analysis of array-of-sensors data.
摘要
本文研究高维向量时间序列样本协方差矩阵某些函数的最大值型统计量,用以在统计意义上确认或否定数据是在正常条件下采集的这一原假设。该方法推广了样本自协方差函数与其假设值之间最大偏差的情形。在线性时间序列框架下,本文证明了 Gumbel 型极值渐近性成立。在应用方面,我们讨论了只做多的最小方差投资组合优化及针对特质风险的子组合分析、基于稀疏跟踪组合的ETF指数跟踪、用于图像分析的卷积深度学习器,以及传感器阵列数据的分析。
Open-Set Knowledge-Based Visual Question Answering with Inference Paths
for: Answering open-set questions with explicit reasoning paths in a knowledge-based visual question answering system.
methods: The proposed Graph pATH rankER (GATHER) framework, which includes graph constructing, pruning, and path-level ranking.
results: The model is able to perform open-set question answering across the whole knowledge base and provide explicit reasoning paths, as demonstrated through extensive experiments on real-world questions.Abstract
Given an image and an associated textual question, the purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases. Prior KB-VQA models are usually formulated as a retriever-classifier framework, where a pre-trained retriever extracts textual or visual information from knowledge graphs and then makes a prediction among the candidates. Despite promising progress, there are two drawbacks with existing models. Firstly, modeling question-answering as multi-class classification limits the answer space to a preset corpus and lacks the ability of flexible reasoning. Secondly, the classifier merely consider "what is the answer" without "how to get the answer", which cannot ground the answer to explicit reasoning paths. In this paper, we confront the challenge of \emph{explainable open-set} KB-VQA, where the system is required to answer questions with entities at wild and retain an explainable reasoning path. To resolve the aforementioned issues, we propose a new retriever-ranker paradigm of KB-VQA, Graph pATH rankER (GATHER for brevity). Specifically, it contains graph constructing, pruning, and path-level ranking, which not only retrieves accurate answers but also provides inference paths that explain the reasoning process. To comprehensively evaluate our model, we reformulate the benchmark dataset OK-VQA with manually corrected entity-level annotations and release it as ConceptVQA. Extensive experiments on real-world questions demonstrate that our framework is not only able to perform open-set question answering across the whole knowledge base but provide explicit reasoning path.
摘要
KB-VQA的目的是使用外部知识库提供问题的正确答案,而且可以提供解释的推理路径。现有的KB-VQA模型通常采用 Retriever-Classifier 框架,其中 Pre-trained Retriever 从知识图中提取文本或视觉信息,然后进行多类分类预测。然而,现有模型存在两个缺陷:首先,问题答案模型为多类分类,限制答案空间为预设的词汇库,缺乏灵活的推理能力。其次,分类器仅考虑“问题的答案”而不考虑“如何得到答案”,无法固定推理路径。在这篇文章中,我们面临KB-VQA中的解释开放集问题,需要在未知知识库中回答问题并提供可解释的推理路径。为解决这些问题,我们提出了一种新的 Retriever-Ranker 模型,即 Graph pATH rankER(GATHER)。GATHER模型包含图构建、剪辑和路径级别排名,不仅可以 Retrieves 精准答案,还可以提供推理路径。为了全面评估我们的模型,我们对 OK-VQA benchmark dataset进行了手动修改Entity-level的注释,并将其发布为 ConceptVQA。广泛的实验表明,我们的框架不仅可以在整个知识库中开放式回答问题,还可以提供可解释的推理路径。
Counterfactual Explanations for Time Series Forecasting
results: 实验结果表明,本方法在不同的深度预测模型上都能取得更高的反事实有效性,并使反事实样本更贴近数据流形。Abstract
Among recent developments in time series forecasting methods, deep forecasting models have gained popularity as they can utilize hidden feature patterns in time series to improve forecasting performance. Nevertheless, the majority of current deep forecasting models are opaque, hence making it challenging to interpret the results. While counterfactual explanations have been extensively employed as a post-hoc approach for explaining classification models, their application to forecasting models still remains underexplored. In this paper, we formulate the novel problem of counterfactual generation for time series forecasting, and propose an algorithm, called ForecastCF, that solves the problem by applying gradient-based perturbations to the original time series. ForecastCF guides the perturbations by applying constraints to the forecasted values to obtain desired prediction outcomes. We experimentally evaluate ForecastCF using four state-of-the-art deep model architectures and compare to two baselines. Our results show that ForecastCF outperforms the baseline in terms of counterfactual validity and data manifold closeness. Overall, our findings suggest that ForecastCF can generate meaningful and relevant counterfactual explanations for various forecasting tasks.
摘要
在近年的时间序列预测方法中,深度预测模型因能够利用时间序列中隐藏的特征模式来提升预测性能而日益流行。然而,目前大多数深度预测模型是不透明的,这使得结果难以解释。反事实解释已被广泛用作解释分类模型的事后方法,但其在预测模型上的应用仍然缺乏探索。在本文中,我们提出了时间序列预测的反事实生成这一新问题,并提出了一种名为 ForecastCF 的算法,通过对原始时间序列施加基于梯度的扰动来求解该问题。ForecastCF 通过对预测值施加约束来引导扰动,从而得到期望的预测结果。我们使用四种最先进的深度模型架构对 ForecastCF 进行了实验评估,并与两个基线进行了比较。结果表明,ForecastCF 在反事实有效性和数据流形贴近度方面均优于基线。总体而言,我们的发现表明 ForecastCF 能够为各类预测任务生成有意义且相关的反事实解释。
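A minimal PyTorch sketch of the gradient-based perturbation idea behind ForecastCF: optimize an additive perturbation of the input window so that the forecasts fall inside a desired band while staying close to the original series. The stand-in forecaster, band, and penalty weights are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Stand-in forecaster: maps 24 past steps to 6 future steps.
forecaster = nn.Sequential(nn.Linear(24, 32), nn.ReLU(), nn.Linear(32, 6))

x = torch.randn(1, 24)                     # original time series window
target_band = (0.5, 1.0)                   # desired range for the forecasted values
delta = torch.zeros_like(x, requires_grad=True)
opt = torch.optim.Adam([delta], lr=0.05)

for _ in range(200):
    y_hat = forecaster(x + delta)
    # Penalise forecasts outside the desired band, plus a proximity term on the perturbation.
    violation = torch.relu(target_band[0] - y_hat) + torch.relu(y_hat - target_band[1])
    loss = violation.mean() + 0.1 * delta.abs().mean()
    opt.zero_grad(); loss.backward(); opt.step()

x_cf = (x + delta).detach()                # candidate counterfactual input
```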
results: 我们进行了多个实验,证明了核心集方法的有效性。具体来说,我们将该核心集方法应用于一个实际的带时间戳消息摘要任务,其中摘要应包含更多较新的消息。我们在多样性仅损失几个百分点的情况下实现了100倍的加速。此外,我们的方法还能改善流式设定下算法的空间使用。Abstract
We study core-set construction algorithms for the task of Diversity Maximization under fairness/partition constraint. Given a set of points $P$ in a metric space partitioned into $m$ groups, and given $k_1,\ldots,k_m$, the goal of this problem is to pick $k_i$ points from each group $i$ such that the overall diversity of the $k=\sum_i k_i$ picked points is maximized. We consider two natural diversity measures: sum-of-pairwise distances and sum-of-nearest-neighbor distances, and show improved core-set construction algorithms with respect to these measures. More precisely, we show the first constant factor core-set w.r.t. sum-of-pairwise distances whose size is independent of the size of the dataset and the aspect ratio. Second, we show the first core-set w.r.t. the sum-of-nearest-neighbor distances. Finally, we run several experiments showing the effectiveness of our core-set approach. In particular, we apply constrained diversity maximization to summarize a set of timed messages that takes into account the messages' recency. Specifically, the summary should include more recent messages compared to older ones. This is a real task in one of the largest communication platforms, affecting the experience of hundreds of millions daily active users. By utilizing our core-set method for this task, we achieve a 100x speed-up while losing the diversity by only a few percent. Moreover, our approach allows us to improve the space usage of the algorithm in the streaming setting.
摘要
我们研究在公平性/划分约束下多样性最大化问题的核心集构造算法。给定度量空间中被划分为 $m$ 个组的点集 $P$,以及 $k_1,\ldots,k_m$,该问题的目标是从每个组 $i$ 中选取 $k_i$ 个点,使所选的 $k=\sum_i k_i$ 个点的总体多样性最大化。我们考虑两种自然的多样性度量:成对距离之和与最近邻距离之和,并针对这两种度量给出了改进的核心集构造算法。更确切地说,我们给出了关于成对距离之和的第一个常数因子核心集,其大小与数据集规模和纵横比无关;其次,我们给出了关于最近邻距离之和的第一个核心集。最后,我们通过多个实验展示了核心集方法的有效性。特别地,我们将带约束的多样性最大化应用于对一组带时间戳的消息进行摘要,其中考虑消息的新近程度:摘要应包含更多较新的消息。这是一个大型通信平台上的真实任务,影响着数亿日活用户的体验。利用我们的核心集方法完成这一任务,我们在多样性仅损失几个百分点的情况下获得了100倍的加速。此外,我们的方法还能改善流式设定下算法的空间使用。
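A simple greedy heuristic (not the paper's core-set construction) makes the objective concrete: repeatedly add the point that maximizes the gain in sum-of-pairwise distances while respecting each group's quota. The data and quotas below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def greedy_fair_diversity(P, groups, k_per_group):
    """Greedy sum-of-pairwise-distances heuristic under per-group quotas."""
    chosen = [int(np.argmax(np.linalg.norm(P - P.mean(0), axis=1)))]  # seed: farthest from centroid
    counts = {g: 0 for g in k_per_group}
    counts[groups[chosen[0]]] += 1
    total = sum(k_per_group.values())
    while len(chosen) < total:
        best, best_gain = None, -np.inf
        for i in range(len(P)):
            g = groups[i]
            if i in chosen or counts[g] >= k_per_group[g]:
                continue
            gain = np.linalg.norm(P[chosen] - P[i], axis=1).sum()     # added pairwise distance
            if gain > best_gain:
                best, best_gain = i, gain
        chosen.append(best)
        counts[groups[best]] += 1
    return chosen

P = rng.normal(size=(300, 2))
groups = rng.integers(0, 3, size=300)
print(greedy_fair_diversity(P, groups, {0: 2, 1: 2, 2: 2}))
```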
Overview of Physics-Informed Machine Learning Inversion of Geophysical Data
For: The paper discusses the use of physics-informed machine learning (PIML) algorithms for geophysical data inversion.* Methods: The paper proposes four different PIML algorithms, each with a unique combination of weights and neural network operations, and uses a joint objective function (Equation \ref{PIML.eq120}) to minimize the difference between observed and predicted data.* Results: The paper highlights the potential advantages of PIML over standard full-waveform inversion (FWI), including the ability to avoid local minima and the option to locally train the inversion operator, but notes that the effectiveness of PIML relies on the similarity between the test and trained data.Here is the information in Simplified Chinese:* For: 这篇论文介绍了物理信息机器学习(PIML)算法在地球物理数据反演中的应用。* Methods: 这篇论文提出了四种不同的 PIML 算法,每种都采用独特的权重与神经网络操作组合,并使用联合目标函数(方程 \ref{PIML.eq120})来最小化观测数据与预测数据之间的差异。* Results: 这篇论文指出了 PIML 相比标准全波形反演(FWI)的潜在优势,包括避免局部极小值以及可局部训练反演算子,但其有效性取决于测试数据与训练数据之间的相似性。Abstract
We review four types of algorithms for physics-informed machine learning (PIML) inversion of geophysical data. The unifying equation is given by the joint objective function $\epsilon$: \begin{eqnarray} \epsilon^{||-PIML}&=&\lambda_1 \overbrace{||{\bf W}^{ML}({\bf H}_{\bf w} {\bf d}^{obs}-{\bf m})||^2}^{NN} + \lambda_2 \overbrace{||{\bf W}^{FWI}({\bf L} {\bf m}-{\bf d}^{obs})||^2}^{FWI} ~+ \nonumber\\ \nonumber\\ && + ~~Regularizer, \label{PIML.eq120} \end{eqnarray}where the optimal model ${\bf m}^*$ and weights $\bf w^*$ minimize $\epsilon$. Here, The matrix weights are given by the boldface symbol $\bf W$, and full waveform inversion (FWI) is typically computed using a finite-difference solution of the wave equation, where $\bf L$ represents the forward modeling operation of the wave equation as a function of the model $\bf m$. Also, a fully-connected neural network (NN) is used to compute the model ${\bf H_w}{\bf d}^{obs} \approx \bf m$ from the observed input data ${\bf d}^{obs}$. The selection of weights $\lambda_i$ and the NN operations determine one of four different PIML algorithms. PIML offers potential advantages over standard FWI through its enhanced ability to avoid local minima and the option to locally train the inversion operator, minimizing the requirement for extensive training data for global applicability. However, the effectiveness of PIML relies on the similarity between the test and trained data. Nevertheless, a possible strategy to overcome this limitation involves initial pretraining of a PIML architecture with data from a broader region, followed by fine-tuning for specific data-a method reminiscent of the way large language models are pretrained and adapted for various tasks.
摘要
我们综述了四类用于地球物理数据反演的物理信息机器学习(PIML)算法。统一的方程由联合目标函数 $\epsilon$ 给出:$$\epsilon^{||-PIML} = \lambda_1 \overbrace{||{\bf W}^{ML}({\bf H}_{\bf w} {\bf d}^{obs}-{\bf m})||^2}^{NN} + \lambda_2 \overbrace{||{\bf W}^{FWI}({\bf L} {\bf m}-{\bf d}^{obs})||^2}^{FWI} + \text{Regularizer},$$其中最优模型 ${\bf m}^*$ 和权重 $\bf w^*$ 使 $\epsilon$ 最小。这里,粗体符号 $\bf W$ 表示矩阵权重;全波形反演(FWI)通常通过波动方程的有限差分求解来计算,其中 $\bf L$ 表示作为模型 $\bf m$ 函数的波动方程正演算子。此外,使用一个全连接神经网络(NN)从观测数据 ${\bf d}^{obs}$ 计算模型 ${\bf H_w}{\bf d}^{obs} \approx \bf m$。权重 $\lambda_i$ 的选取与NN操作决定了四种不同的PIML算法中的哪一种。与标准FWI相比,PIML的潜在优势在于更强的避免局部极小值的能力,以及可以局部训练反演算子,从而降低为获得全局适用性所需的大量训练数据。然而,PIML的有效性依赖于测试数据与训练数据之间的相似性。克服这一局限的一种可能策略是:先用更大区域的数据对PIML架构进行预训练,再针对特定数据进行微调——这与大型语言模型先预训练再适配各种任务的方式类似。
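Equation \ref{PIML.eq120} can be evaluated directly once stand-ins for the operators are fixed; the NumPy sketch below just computes the joint objective for a given model vector, with random placeholder matrices for ${\bf L}$, ${\bf H}_{\bf w}$, and the weights (a real application would use a wave-equation forward operator and a trained network).

```python
import numpy as np

rng = np.random.default_rng(0)
n_d, n_m = 50, 30                          # sizes of data and model vectors (toy)

d_obs = rng.normal(size=n_d)
L = rng.normal(size=(n_d, n_m))            # stand-in for the wave-equation forward operator
H_w = rng.normal(size=(n_m, n_d)) * 0.1    # stand-in for the trained network mapping data -> model
W_ml, W_fwi = np.eye(n_m), np.eye(n_d)     # matrix weights
lam1, lam2, reg_weight = 1.0, 1.0, 1e-3

def piml_objective(m):
    nn_term = lam1 * np.linalg.norm(W_ml @ (H_w @ d_obs - m)) ** 2      # ML (network) misfit
    fwi_term = lam2 * np.linalg.norm(W_fwi @ (L @ m - d_obs)) ** 2      # FWI data misfit
    regularizer = reg_weight * np.linalg.norm(m) ** 2
    return nn_term + fwi_term + regularizer

m0 = H_w @ d_obs                           # network prediction as the starting model
print(piml_objective(m0))
```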
Generative Intrinsic Optimization: Intrisic Control with Model Learning
results: 这个论文通过 теоретиче分析和实验 validate了其方法的可行性和效果,并开启了在机器人决策中利用内生控制和模型学习来提高样本效率和环境不确定性的潜在应用前景。Abstract
Future sequence represents the outcome after executing the action into the environment. When driven by the information-theoretic concept of mutual information, it seeks maximally informative consequences. Explicit outcomes may vary across state, return, or trajectory serving different purposes such as credit assignment or imitation learning. However, the inherent nature of incorporating intrinsic motivation with reward maximization is often neglected. In this work, we propose a variational approach to jointly learn the necessary quantity for estimating the mutual information and the dynamics model, providing a general framework for incorporating different forms of outcomes of interest. Integrated into a policy iteration scheme, our approach guarantees convergence to the optimal policy. While we mainly focus on theoretical analysis, our approach opens the possibilities of leveraging intrinsic control with model learning to enhance sample efficiency and incorporate uncertainty of the environment into decision-making.
摘要
未来序列表示将动作执行到环境中之后的结果。当以互信息这一信息论概念为驱动时,它寻求信息量最大的后果。显式的结果可能是状态、回报或轨迹,服务于信用分配或模仿学习等不同目的。然而,将内在动机与奖励最大化相结合这一内在属性常常被忽视。在这项工作中,我们提出了一种变分方法,联合学习估计互信息所需的量与动力学模型,为纳入不同形式的目标结果提供了一个通用框架。将该方法集成到策略迭代方案中,我们的方法保证收敛到最优策略。虽然我们主要关注理论分析,但该方法开启了利用内在控制与模型学习来提升样本效率、并将环境不确定性纳入决策的可能性。
ClimateBERT-NetZero: Detecting and Assessing Net Zero and Reduction Targets
results: 我们的实验结果表明,使用 ClimateBERT-NetZero 模型可以帮助自动检测和分析网零或减少目标,并且可以在大规模数据中提取有用的信息。Abstract
Public and private actors struggle to assess the vast amounts of information about sustainability commitments made by various institutions. To address this problem, we create a novel tool for automatically detecting corporate, national, and regional net zero and reduction targets in three steps. First, we introduce an expert-annotated data set with 3.5K text samples. Second, we train and release ClimateBERT-NetZero, a natural language classifier to detect whether a text contains a net zero or reduction target. Third, we showcase its analysis potential with two use cases: We first demonstrate how ClimateBERT-NetZero can be combined with conventional question-answering (Q&A) models to analyze the ambitions displayed in net zero and reduction targets. Furthermore, we employ the ClimateBERT-NetZero model on quarterly earning call transcripts and outline how communication patterns evolve over time. Our experiments demonstrate promising pathways for extracting and analyzing net zero and emission reduction targets at scale.
摘要
公共和私人行业努力评估各种机构的可持续发展承诺,但面临巨大的信息量和识别挑战。为解决这问题,我们创建了一种自动检测公司、国家和地区的减少和零排放目标的新工具。我们的方法包括以下三步:第一步,我们提供了一个专家标注的数据集,包含3.5K个文本样本。第二步,我们训练并发布了一个基于自然语言的分类器,用于判断文本是否包含减少或零排放目标。第三步,我们展示了这种分类器的分析潜力,通过两个使用情况:首先,我们将 ClimateBERT-NetZero 与传统的问答(Q&A)模型结合,分析减少和零排放目标中表达的目标。其次,我们使用 ClimateBERT-NetZero 模型对每季财务会议笔记进行分析,并详细描述了时间序列中的沟通趋势。我们的实验结果表明,这种方法可以有效地检测和分析减少和零排放目标。
Dealing with zero-inflated data: achieving SOTA with a two-fold machine learning approach
paper_authors: Jože M. Rožanec, Gašper Petelin, João Costa, Blaž Bertalanič, Gregor Cerar, Marko Guček, Gregor Papa, Dunja Mladenić
for: 本研究旨在应对 zero-inflated 数据,提高模型的预测性能。
methods: 本研究使用了层次模型,将 zero-inflated 数据分解为两个部分,并通过统计学方法来捕捉 zero 的影响。
results: 本研究在实际应用中(包括家用电器分类和机场接驳车需求预测)均取得了优异的结果。例如,在家用电器分类中,与传统方法相比,精确率、召回率、F1 和 AUC ROC 的加权平均分别提升了27%、34%、49%和27%。而在机场接驳车需求预测中,两阶段(two-fold)模型在所有情况下表现最佳,且与其他模型的差异被证明具有统计显著性。Abstract
In many cases, a machine learning model must learn to correctly predict a few data points with particular values of interest in a broader range of data where many target values are zero. Zero-inflated data can be found in diverse scenarios, such as lumpy and intermittent demands, power consumption for home appliances being turned on and off, impurities measurement in distillation processes, and even airport shuttle demand prediction. The presence of zeroes affects the models' learning and may result in poor performance. Furthermore, zeroes also distort the metrics used to compute the model's prediction quality. This paper showcases two real-world use cases (home appliances classification and airport shuttle demand prediction) where a hierarchical model applied in the context of zero-inflated data leads to excellent results. In particular, for home appliances classification, the weighted average of Precision, Recall, F1, and AUC ROC was increased by 27%, 34%, 49%, and 27%, respectively. Furthermore, it is estimated that the proposed approach is also four times more energy efficient than the SOTA approach against which it was compared to. Two-fold models performed best in all cases when predicting airport shuttle demand, and the difference against other models has been proven to be statistically significant.
摘要
在许多情况下,机器学习模型必须在大量目标值为零的数据中,学会正确预测少数具有特定关注取值的数据点。零膨胀数据出现在多种场景中,例如波动且间歇性的需求、家用电器开关所致的功耗、蒸馏过程中的杂质测量,甚至机场接驳车需求预测。零值的存在会影响模型的学习,并可能导致性能不佳;此外,零值还会扭曲用于衡量模型预测质量的指标。本文展示了两个真实应用场景(家用电器分类与机场接驳车需求预测),在零膨胀数据的背景下使用分层模型取得了优异的结果。具体而言,在家用电器分类中,精确率、召回率、F1 和 AUC ROC 的加权平均分别提升了27%、34%、49%和27%;同时据估计,所提方法的能效约为与之比较的SOTA方法的四倍。在预测机场接驳车需求时,两阶段模型在所有情况下表现最佳,且与其他模型的差异被证明具有统计显著性。
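A minimal scikit-learn sketch of the two-fold idea for zero-inflated targets: one model decides whether the value is non-zero, a second model regresses the magnitude on the non-zero subset, and the final prediction multiplies the two. The synthetic data and gradient-boosting choices are assumptions, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
latent = X[:, 0] + 0.5 * X[:, 1]
# Zero-inflated target: zero unless a latent threshold is crossed.
y = np.where(latent > 0.3, np.exp(0.5 * X[:, 2]) + rng.gamma(1.0, 1.0, 2000), 0.0)

clf = GradientBoostingClassifier().fit(X, (y > 0).astype(int))   # stage 1: is the value non-zero?
nonzero = y > 0
reg = GradientBoostingRegressor().fit(X[nonzero], y[nonzero])    # stage 2: magnitude given non-zero

X_new = rng.normal(size=(5, 5))
pred = clf.predict_proba(X_new)[:, 1] * reg.predict(X_new)       # combine the two stages
print(pred)
```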
A Carbon Tracking Model for Federated Learning: Impact of Quantization and Sparsification
results: 研究发现,在通信能效较低(即 < 25 Kbit/Joule)的情况下,采用基于共识(完全去中心化)的 FL 实现可以最大限度地减少碳排放。此外,研究还发现,量化和稀疏化操作可以在学习性能与能耗之间取得平衡,从而使 FL 设计更加可持续。Abstract
Federated Learning (FL) methods adopt efficient communication technologies to distribute machine learning tasks across edge devices, reducing the overhead in terms of data storage and computational complexity compared to centralized solutions. Rather than moving large data volumes from producers (sensors, machines) to energy-hungry data centers, raising environmental concerns due to resource demands, FL provides an alternative solution to mitigate the energy demands of several learning tasks while enabling new Artificial Intelligence of Things (AIoT) applications. This paper proposes a framework for real-time monitoring of the energy and carbon footprint impacts of FL systems. The carbon tracking tool is evaluated for consensus (fully decentralized) and classical FL policies. For the first time, we present a quantitative evaluation of different computationally and communication efficient FL methods from the perspectives of energy consumption and carbon equivalent emissions, suggesting also general guidelines for energy-efficient design. Results indicate that consensus-driven FL implementations should be preferred for limiting carbon emissions when the energy efficiency of the communication is low (i.e., < 25 Kbit/Joule). Besides, quantization and sparsification operations are shown to strike a balance between learning performances and energy consumption, leading to sustainable FL designs.
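The two update-compression operators studied above can be sketched in a few lines of NumPy: top-k sparsification keeps only the largest-magnitude entries of a client update, and uniform quantization maps the remaining values to low-bit integers plus a scale. The ratios and bit width are placeholders, and the carbon-tracking model itself is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
update = rng.normal(size=10_000)                     # a client's model update (flattened gradients)

def top_k_sparsify(v, ratio=0.01):
    k = max(1, int(ratio * v.size))
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]        # keep only the k largest-magnitude entries
    out[idx] = v[idx]
    return out

def uniform_quantize(v, bits=8):
    scale = np.abs(v).max() / (2 ** (bits - 1) - 1)
    return np.round(v / scale).astype(np.int8), scale  # transmit int8 values + one scale factor

sparse = top_k_sparsify(update)
q, scale = uniform_quantize(sparse)
dequant = q.astype(np.float32) * scale               # what the server would reconstruct
payload_fraction = 8 * (q != 0).sum() / (32 * update.size)
print("approx. payload vs dense fp32:", round(payload_fraction, 4))
```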
摘要
Tight Time-Space Lower Bounds for Constant-Pass Learning
paper_authors: Xin Lyu, Avishay Tal, Hongxun Wu, Junzhao Yang
for: 这个论文的目的是为了证明任何偏好学习算法都需要 quadratic memory 或者 exponential number of samples。
methods: 这个论文使用了多 passes 模型,允许学习者在流中多次访问样本。
results: 这个论文证明了任何偏好学习算法在多 passes 模型下都需要 either $\Omega(n^2)$ 内存大小或者至少 $2^{\sqrt{n}}$ 样本。Abstract
In his breakthrough paper, Raz showed that any parity learning algorithm requires either quadratic memory or an exponential number of samples [FOCS'16, JACM'19]. A line of work that followed extended this result to a large class of learning problems. Until recently, all these results considered learning in the streaming model, where each sample is drawn independently, and the learner is allowed a single pass over the stream of samples. Garg, Raz, and Tal [CCC'19] considered a stronger model, allowing multiple passes over the stream. In the $2$-pass model, they showed that learning parities of size $n$ requires either a memory of size $n^{1.5}$ or at least $2^{\sqrt{n}}$ samples. (Their result also generalizes to other learning problems.) In this work, for any constant $q$, we prove tight memory-sample lower bounds for any parity learning algorithm that makes $q$ passes over the stream of samples. We show that such a learner requires either $\Omega(n^{2})$ memory size or at least $2^{\Omega(n)}$ samples. Beyond establishing a tight lower bound, this is the first non-trivial lower bound for $q$-pass learning for any $q\ge 3$. Similar to prior work, our results extend to any learning problem with many nearly-orthogonal concepts. We complement the lower bound with an upper bound, showing that parity learning with $q$ passes can be done efficiently with $O(n^2/\log q)$ memory.
摘要
在他的突破性论文中,Raz 证明了任何奇偶(parity)学习算法要么需要二次规模的内存,要么需要指数数量的样本 [FOCS'16, JACM'19]。随后的一系列工作将这一结果推广到一大类学习问题。直到最近,所有这些结果都是在流式模型下考虑学习:每个样本独立采样,学习者只允许对样本流进行一次遍历。Garg、Raz 和 Tal [CCC'19] 考虑了一个更强的模型,允许对样本流进行多次遍历。在两遍模型中,他们证明了学习规模为 $n$ 的奇偶函数要么需要 $n^{1.5}$ 规模的内存,要么需要至少 $2^{\sqrt{n}}$ 个样本(他们的结果同样可推广到其他学习问题)。在这项工作中,对任意常数 $q$,我们为对样本流进行 $q$ 次遍历的任何奇偶学习算法证明了紧的内存-样本下界:这样的学习者要么需要 $\Omega(n^{2})$ 的内存,要么需要至少 $2^{\Omega(n)}$ 个样本。除了给出紧的下界之外,这也是对任意 $q\ge 3$ 的 $q$ 遍学习的首个非平凡下界。与之前的工作类似,我们的结果可推广到任何具有许多近乎正交概念的学习问题。我们还给出了与下界相配的上界,证明使用 $O(n^2/\log q)$ 的内存即可高效地完成 $q$ 遍奇偶学习。
ETDock: A Novel Equivariant Transformer for Protein-Ligand Docking
results: 我们在真实数据集上进行了实验,结果表明,我们的模型可以达到最先进(state-of-the-art)的性能。Abstract
Predicting the docking between proteins and ligands is a crucial and challenging task for drug discovery. However, traditional docking methods mainly rely on scoring functions, and deep learning-based docking approaches usually neglect the 3D spatial information of proteins and ligands, as well as the graph-level features of ligands, which limits their performance. To address these limitations, we propose an equivariant transformer neural network for protein-ligand docking pose prediction. Our approach involves the fusion of ligand graph-level features by feature processing, followed by the learning of ligand and protein representations using our proposed TAMformer module. Additionally, we employ an iterative optimization approach based on the predicted distance matrix to generate refined ligand poses. The experimental results on real datasets show that our model can achieve state-of-the-art performance.
摘要
预测蛋白质与配体之间的对接是药物发现中的关键且具有挑战性的任务。传统的对接方法主要依赖打分函数,而基于深度学习的对接方法通常忽略蛋白质和配体的3D空间信息以及配体的图级特征,这限制了其性能。为解决这些局限,我们提出一种用于蛋白质-配体对接构象预测的等变Transformer神经网络。我们的方法首先通过特征处理融合配体的图级特征,然后使用我们提出的TAMformer模块学习配体和蛋白质的表示。此外,我们采用基于预测距离矩阵的迭代优化方法来生成精细化的配体构象。在真实数据集上的实验结果表明,我们的模型可以达到最先进的性能。
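The "iterative optimization based on the predicted distance matrix" step can be illustrated generically: given target protein-ligand atom distances (produced synthetically here in place of the model's predictions), ligand coordinates are refined by gradient descent on the squared distance error. This is only a schematic stand-in for ETDock's refinement, with made-up sizes, no equivariant network, and no intra-ligand constraints.

```python
import numpy as np

rng = np.random.default_rng(1)
n_protein, n_ligand = 40, 12
protein = rng.normal(size=(n_protein, 3)) * 5.0          # fixed protein atom coordinates
ligand_true = rng.normal(size=(n_ligand, 3)) + np.array([2.0, 0.0, 0.0])

def cross_distances(lig):
    d = lig[:, None, :] - protein[None, :, :]
    return np.sqrt((d ** 2).sum(-1) + 1e-9)

D_pred = cross_distances(ligand_true)                    # stand-in for the predicted distance matrix

# Refine a random initial pose so that its distances match the predicted matrix.
lig = rng.normal(size=(n_ligand, 3))
lr = 0.1
for step in range(2000):
    diff = lig[:, None, :] - protein[None, :, :]         # (n_ligand, n_protein, 3)
    dist = np.sqrt((diff ** 2).sum(-1) + 1e-9)
    err = dist - D_pred
    # Gradient of 0.5 * mean(err^2) over protein atoms, w.r.t. ligand coordinates.
    grad = ((err / dist)[..., None] * diff).mean(axis=1)
    lig -= lr * grad
    if step % 500 == 0:
        print(f"step {step:4d}  distance RMSE {np.sqrt((err ** 2).mean()):.4f}")

print("final ligand RMSD to true pose:", np.sqrt(((lig - ligand_true) ** 2).sum(-1).mean()))
```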
LGL-BCI: A Lightweight Geometric Learning Framework for Motor Imagery-Based Brain-Computer Interfaces
paper_authors: Jianchao Lu, Yuzhe Tian, Yang Zhang, Jiaqi Ge, Quan Z. Sheng, Xi Zheng
for: 这项研究旨在提高基于脑电图(EEG)的运动想象(MI)任务的精度和效率,并探索几何深度学习在脑机接口(BCI)领域的应用。
methods: 本研究使用几何深度学习框架在非欧几里得度量空间(特别是对称正定(SPD)流形空间)中处理EEG信号,并提出了一种基于特征分解的EEG通道选择方案,通过降低SPD矩阵维度并辅以无损变换来提高推理速度。
results: 实验结果显示LGL-BCI的精度和效率明显超过现有解决方案($82.54%$ vs. $62.22%$),并且具有较少的参数(64.9M)。Abstract
Brain-Computer Interfaces (BCIs) are a groundbreaking technology for interacting with external devices using brain signals. Despite advancements, electroencephalogram (EEG)-based Motor Imagery (MI) tasks face challenges like amplitude and phase variability, and complex spatial correlations, with a need for smaller model size and faster inference. This study introduces the LGL-BCI framework, employing a Geometric Deep Learning Framework for EEG processing in non-Euclidean metric spaces, particularly the Symmetric Positive Definite (SPD) Manifold space. LGL-BCI offers robust EEG data representation and captures spatial correlations. We propose an EEG channel selection solution via a feature decomposition algorithm to reduce SPD matrix dimensionality, with a lossless transformation boosting inference speed. Extensive experiments show LGL-BCI's superior accuracy and efficiency compared to current solutions, highlighting geometric deep learning's potential in MI-BCI applications. Assessed on two public EEG datasets and two real-world EEG devices, LGL-BCI significantly outperforms the state-of-the-art solution in accuracy ($82.54\%$ versus $62.22\%$) with fewer parameters (64.9M compared to 183.7M).
摘要
脑机接口(BCI)是一种利用脑电信号与外部设备交互的突破性技术。尽管已有进展,基于脑电图(EEG)的运动想象(MI)任务仍面临幅度和相位变化、复杂的空间相关性等问题,并且需要更小的模型和更快的推理。本研究提出LGL-BCI框架,利用几何深度学习框架在非欧几里得度量空间(特别是对称正定(SPD)流形空间)中处理EEG数据。LGL-BCI能够稳健地表示EEG数据并捕捉空间相关性。我们提出了基于特征分解算法的EEG通道选择方案,以降低SPD矩阵维度,并通过无损变换提高推理速度。大量实验表明,LGL-BCI的精度和效率均优于现有方案,凸显了几何深度学习在MI-BCI应用中的潜力。在两个公开EEG数据集和两台真实EEG设备上的评估中,LGL-BCI在精度上显著超过最先进方案($82.54\%$ 对 $62.22\%$),且参数更少(64.9M 对 183.7M)。
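The SPD-manifold view of EEG can be illustrated without any of LGL-BCI's learned components: per-trial spatial covariance matrices are SPD, and a common geometric trick is to map them into a tangent (log-Euclidean) space where ordinary classifiers apply. The sketch below uses synthetic signals and is only meant to show that representation, not the LGL-BCI architecture.

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials, n_channels, n_samples = 60, 8, 250

def random_trials(mixing_scale):
    """Synthetic EEG-like trials: channel mixtures of random sources (placeholder data)."""
    mixing = np.eye(n_channels) + mixing_scale * rng.normal(size=(n_channels, n_channels))
    return np.stack([mixing @ rng.normal(size=(n_channels, n_samples)) for _ in range(n_trials)])

X = np.concatenate([random_trials(0.1), random_trials(0.5)])       # two fake MI classes
y = np.array([0] * n_trials + [1] * n_trials)

def spd_covariance(trial, eps=1e-6):
    c = np.cov(trial)                                               # (channels, channels), SPD
    return c + eps * np.eye(c.shape[0])

def log_map(spd):
    """Matrix logarithm via eigendecomposition: the log-Euclidean tangent-space map."""
    w, v = np.linalg.eigh(spd)
    return (v * np.log(w)) @ v.T

def tangent_features(trial):
    log_c = log_map(spd_covariance(trial))
    iu = np.triu_indices(n_channels)                                # upper triangle suffices (symmetric)
    return log_c[iu]

features = np.stack([tangent_features(t) for t in X])
print("tangent-space feature matrix:", features.shape)              # (120, 36)
# Any Euclidean classifier (e.g. logistic regression) can now be applied to `features`.
```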
Exploring the Relationship Between Model Architecture and In-Context Learning Ability
results: 我们的后续实验表明,不同架构对超参数设置和训练动态的敏感性各不相同。另外,我们发现一些新兴的注意力替代方案在上下文学习中比Transformer更加稳健;由于这类方法在推理时的内存占用为常数规模,这为将上下文学习扩展到更多的上下文示例开启了未来可能性。Abstract
What is the relationship between model architecture and the ability to perform in-context learning? In this empirical study, we take the first steps towards answering this question. In particular, we evaluate fifteen model architectures across a suite of synthetic in-context learning tasks. The selected architectures represent a broad range of paradigms, including recurrent and convolution-based neural networks, transformers, and emerging attention alternatives. We discover that all considered architectures can perform in-context learning under certain conditions. However, contemporary architectures are found to be the best performing, especially as task complexity grows. Additionally, our follow-up experiments delve into various factors that influence in-context learning. We observe varied sensitivities among architectures with respect to hyperparameter settings. Our study of training dynamics reveals that certain architectures exhibit a smooth, progressive learning trajectory, while others demonstrate periods of stagnation followed by abrupt mastery of the task. Finally, and somewhat surprisingly, we find that several emerging attention alternatives are more robust in-context learners than transformers; since such approaches have constant-sized memory footprints at inference time, this result opens the future possibility of scaling up in-context learning to vastly larger numbers of in-context examples.
摘要
模型架构与上下文学习(in-context learning)能力之间有什么关系?在这项实证研究中,我们迈出了回答该问题的第一步。具体而言,我们在一组合成的上下文学习任务上评估了十五种模型架构,涵盖循环神经网络、卷积神经网络、Transformer以及新兴的注意力替代方案等多种范式。我们发现,所有被考察的架构在特定条件下都能进行上下文学习,但随着任务复杂度的增加,当代架构表现最佳。我们还进行了后续实验,考察影响上下文学习的多种因素,并观察到不同架构对超参数设置的敏感性各不相同。对训练动态的研究表明,一些架构呈现平滑、渐进的学习轨迹,而另一些则在停滞一段时间后突然掌握任务。最后,令人意外的是,一些新兴的注意力替代方案比Transformer更为稳健;由于这类方法在推理时的内存占用为常数规模,这为将上下文学习扩展到数量大得多的上下文示例开启了未来可能性。
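To make "synthetic in-context learning tasks" concrete, here is a small sketch of the kind of episode such studies use: each sequence interleaves (x, f(x)) pairs for a freshly sampled linear function f, and the model must predict f on a query x from context alone. The construction below is a generic example (with a ridge-regression baseline standing in for a trained sequence model), not the exact task suite used in the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def make_icl_task(n_context=10, dim=4):
    """One in-context regression episode: context pairs from a fresh linear function, plus a query."""
    w = rng.normal(size=dim)                       # task-specific function, resampled every episode
    xs = rng.normal(size=(n_context + 1, dim))
    ys = xs @ w
    context = np.concatenate([xs[:-1], ys[:-1, None]], axis=1)   # rows of (x, f(x))
    return context, xs[-1], ys[-1]                 # context, query input, query target

def ridge_from_context(context, query, lam=1e-3):
    """A non-learned baseline 'in-context learner': ridge regression fit on the context pairs."""
    X, y = context[:, :-1], context[:, -1]
    w_hat = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    return query @ w_hat

errors = []
for _ in range(200):
    ctx, qx, qy = make_icl_task()
    errors.append((ridge_from_context(ctx, qx) - qy) ** 2)
print("mean squared error of the in-context baseline:", float(np.mean(errors)))
# A sequence model would instead receive `ctx` flattened into a token sequence and output f(qx).
```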
SEE-OoD: Supervised Exploration For Enhanced Out-of-Distribution Detection
paper_authors: Xiaoyang Song, Wenbo Sun, Maher Nouiehed, Raed Al Kontar, Judy Jin
for: 提高Out-of-Distribution(OoD)检测精度
methods: 基于 Wasserstein 分数的生成对抗训练方案
results: 在多个计算机视觉数据集上的表现优于现有技术,并且对未见过的OoD数据表现出更好的泛化能力Abstract
Current techniques for Out-of-Distribution (OoD) detection predominantly rely on quantifying predictive uncertainty and incorporating model regularization during the training phase, using either real or synthetic OoD samples. However, methods that utilize real OoD samples lack exploration and are prone to overfit the OoD samples at hand. Whereas synthetic samples are often generated based on features extracted from training data, rendering them less effective when the training and OoD data are highly overlapped in the feature space. In this work, we propose a Wasserstein-score-based generative adversarial training scheme to enhance OoD detection accuracy, which, for the first time, performs data augmentation and exploration simultaneously under the supervision of limited OoD samples. Specifically, the generator explores OoD spaces and generates synthetic OoD samples using feedback from the discriminator, while the discriminator exploits both the observed and synthesized samples for OoD detection using a predefined Wasserstein score. We provide theoretical guarantees that the optimal solutions of our generative scheme are statistically achievable through adversarial training in empirical settings. We then demonstrate that the proposed method outperforms state-of-the-art techniques on various computer vision datasets and exhibits superior generalizability to unseen OoD data.
摘要
现有的分布外(OoD)检测技术主要依赖于量化预测不确定性,并在训练阶段加入模型正则化,使用真实或合成的OoD样本。然而,使用真实OoD样本的方法缺乏探索,容易对手头的OoD样本过拟合;而合成样本通常基于从训练数据提取的特征生成,当训练数据与OoD数据在特征空间高度重叠时,这些合成样本的效果会变差。在本工作中,我们提出一种基于Wasserstein分数的生成对抗训练方案来提高OoD检测精度,首次在有限OoD样本的监督下同时进行数据增广和探索。具体来说,生成器利用判别器的反馈探索OoD空间并生成合成OoD样本,而判别器则利用观测到的和合成的样本,基于预定义的Wasserstein分数进行OoD检测。我们提供了理论保证,表明在经验设置下,通过对抗训练可以在统计意义上达到该生成方案的最优解。我们随后在多个计算机视觉数据集上证明了所提方法优于最先进技术,并对未见过的OoD数据表现出更好的泛化能力。
ZEST: Attention-based Zero-Shot Learning for Unseen IoT Device Classification
results: 我们在真实的 IoT 流量数据上进行了广泛的实验,结果表明:i) ZEST 相比基线模型显著提高了准确率;ii) 与常用于网络流量建模的 LSTM 相比,ZEST 能够更好地提取有意义的表示。Abstract
Recent research works have proposed machine learning models for classifying IoT devices connected to a network. However, there is still a practical challenge of not having all devices (and hence their traffic) available during the training of a model. This essentially means, during the operational phase, we need to classify new devices not seen during the training phase. To address this challenge, we propose ZEST -- a ZSL (zero-shot learning) framework based on self-attention for classifying both seen and unseen devices. ZEST consists of i) a self-attention based network feature extractor, termed SANE, for extracting latent space representations of IoT traffic, ii) a generative model that trains a decoder using latent features to generate pseudo data, and iii) a supervised model that is trained on the generated pseudo data for classifying devices. We carry out extensive experiments on real IoT traffic data; our experiments demonstrate i) ZEST achieves significant improvement (in terms of accuracy) over the baselines; ii) ZEST is able to better extract meaningful representations than LSTM which has been commonly used for modeling network traffic.
摘要
近期的研究工作提出了用机器学习模型对连接到网络的物联网(IoT)设备进行分类。然而,实际中仍存在一个挑战:在模型训练阶段并非所有设备(及其流量)都可获得。这意味着在运行阶段需要对训练阶段未见过的新设备进行分类。为解决这一挑战,我们提出了ZEST——一种基于自注意力的零样本学习(ZSL)框架,用于对已见和未见设备进行分类。ZEST包括三个部分:1. 基于自注意力的网络特征提取器(SANE),用于提取物联网流量的潜在空间表示;2. 一个生成模型,利用潜在特征训练解码器以生成伪数据;3. 一个在生成的伪数据上训练的监督模型,用于设备分类。我们在真实的物联网流量数据上进行了广泛的实验,结果表明:1. ZEST 相比基线模型取得了显著的准确率提升;2. 与常用于网络流量建模的 LSTM 相比,ZEST 能够更好地提取有意义的表示。
paper_authors: Artur Back de Luca, Kimon Fountoulakis, Shenghao Yang
for: 这篇论文主要研究如何利用带噪声的节点标签来提升局部图聚类性能。
methods: 该论文提出了一种基于图扩散的局部聚类方法,利用带噪声的节点标签构造加权图,并在加权图上进行扩散以提升聚类性能。
results: 实验结果表明,只需少量带属性的样本即可获得可靠的节点标签,并且在加权图上对这些标签进行扩散可以显著提升局部聚类性能;在多个真实数据集上,F1分数最多提升13%。Abstract
The growing interest in machine learning problems over graphs with additional node information such as texts, images, or labels has popularized methods that require the costly operation of processing the entire graph. Yet, little effort has been made to the development of fast local methods (i.e. without accessing the entire graph) that extract useful information from such data. To that end, we propose a study of local graph clustering using noisy node labels as a proxy for additional node information. In this setting, nodes receive initial binary labels based on cluster affiliation: 1 if they belong to the target cluster and 0 otherwise. Subsequently, a fraction of these labels is flipped. We investigate the benefits of incorporating noisy labels for local graph clustering. By constructing a weighted graph with such labels, we study the performance of graph diffusion-based local clustering method on both the original and the weighted graphs. From a theoretical perspective, we consider recovering an unknown target cluster with a single seed node in a random graph with independent noisy node labels. We provide sufficient conditions on the label noise under which, with high probability, using diffusion in the weighted graph yields a more accurate recovery of the target cluster. This approach proves more effective than using the given labels alone or using diffusion in the label-free original graph. Empirically, we show that reliable node labels can be obtained with just a few samples from an attributed graph. Moreover, utilizing these labels via diffusion in the weighted graph leads to significantly better local clustering performance across several real-world datasets, improving F1 scores by up to 13%.
摘要
对带有文本、图像或标签等附加节点信息的图上机器学习问题的兴趣日益增长,这使得需要处理整个图的高成本方法得以流行。然而,针对此类数据、无需访问整个图的快速局部方法的研究仍然很少。为此,我们研究了以带噪声的节点标签作为附加节点信息代理的局部图聚类。在该设定下,节点根据其所属聚类获得初始二值标签:属于目标聚类为1,否则为0;随后其中一部分标签被翻转。我们考察了引入带噪声标签对局部图聚类的益处:利用这些标签构造加权图,并研究基于图扩散的局部聚类方法在原图和加权图上的表现。从理论角度,我们考虑在带有独立噪声节点标签的随机图中,仅用单个种子节点恢复未知的目标聚类。我们给出了关于标签噪声的充分条件,在该条件下,以高概率,在加权图上使用扩散能够更准确地恢复目标聚类;这一方法比仅使用给定标签或在不含标签的原图上进行扩散更为有效。实验上,我们表明只需从带属性的图中抽取少量样本即可获得可靠的节点标签;并且通过在加权图上对这些标签进行扩散,可以在多个真实数据集上显著提升局部聚类性能,F1分数最多提升13%。
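A minimal sketch of the core recipe — build a weighted graph that down-weights edges whose endpoints disagree on the noisy label, then run a personalized-PageRank-style diffusion from a single seed — is shown below on a toy two-block graph. The exact edge weighting and sweep procedure used in the paper may differ; this only illustrates the mechanism, with arbitrary toy parameters.

```python
import numpy as np

rng = np.random.default_rng(4)
n, block = 200, 100                               # two clusters of 100 nodes each

# Toy two-cluster graph: denser inside each cluster than across.
in_block = np.arange(n) < block
same_block = in_block[:, None] == in_block[None, :]
A = (rng.random((n, n)) < np.where(same_block, 0.06, 0.03)).astype(float)
A = np.triu(A, 1)
A = A + A.T                                       # symmetric adjacency, no self-loops

# Noisy node labels: 1 inside the target cluster, 0 outside, then 20% flipped.
labels = in_block.astype(float)
flip = rng.random(n) < 0.2
labels[flip] = 1 - labels[flip]

# Weighted graph: down-weight edges whose endpoints disagree on the noisy label.
W = A * np.where(labels[:, None] == labels[None, :], 1.0, 0.2)

def ppr(adj, seed, alpha=0.15, iters=300):
    """Personalized PageRank by power iteration -- the diffusion used for local clustering."""
    deg = adj.sum(1)
    deg[deg == 0] = 1.0
    P_rw = adj / deg[:, None]                     # row-stochastic random-walk matrix
    s = np.zeros(adj.shape[0]); s[seed] = 1.0
    p = s.copy()
    for _ in range(iters):
        p = alpha * s + (1 - alpha) * P_rw.T @ p
    return p

seed = 0                                          # a single seed node inside the target cluster
for name, graph in (("original graph      ", A), ("label-weighted graph", W)):
    scores = ppr(graph, seed)
    top = np.argsort(-scores)[:block]             # take a cluster of the target size
    print(f"{name}: fraction of target cluster recovered = {(top < block).mean():.2f}")
```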
Robust 1-bit Compressed Sensing with Iterative Hard Thresholding
results: 本文表明,BIHT算法在噪声损坏情况下可以提供更好的结果,并且可以在 $\tilde{O}(\epsilon+\tau)$ 误差范围内估计$x$,其中 $\epsilon$ 是约束误差,$\tau$ 是噪声损坏率。这个结果表明了iterative 硬阈值处理在噪声损坏情况下的稳定性。Abstract
In 1-bit compressed sensing, the aim is to estimate a $k$-sparse unit vector $x\in S^{n-1}$ within an $\epsilon$ error (in $\ell_2$) from minimal number of linear measurements that are quantized to just their signs, i.e., from measurements of the form $y = \mathrm{Sign}(\langle a, x\rangle).$ In this paper, we study a noisy version where a fraction of the measurements can be flipped, potentially by an adversary. In particular, we analyze the Binary Iterative Hard Thresholding (BIHT) algorithm, a proximal gradient descent on a properly defined loss function used for 1-bit compressed sensing, in this noisy setting. It is known from recent results that, with $\tilde{O}(\frac{k}{\epsilon})$ noiseless measurements, BIHT provides an estimate within $\epsilon$ error. This result is optimal and universal, meaning one set of measurements work for all sparse vectors. In this paper, we show that BIHT also provides better results than all known methods for the noisy setting. We show that when up to $\tau$-fraction of the sign measurements are incorrect (adversarial error), with the same number of measurements as before, BIHT agnostically provides an estimate of $x$ within an $\tilde{O}(\epsilon+\tau)$ error, maintaining the universality of measurements. This establishes stability of iterative hard thresholding in the presence of measurement error. To obtain the result, we use the restricted approximate invertibility of Gaussian matrices, as well as a tight analysis of the high-dimensional geometry of the adversarially corrupted measurements.
摘要
在1比特压缩感知中,目标是从数量最少、且仅量化为符号的线性测量(即形如 $y = \mathrm{Sign}(\langle a, x\rangle)$ 的测量)中,以 $\ell_2$ 误差不超过 $\epsilon$ 估计一个 $k$-稀疏单位向量 $x\in S^{n-1}$。在这篇论文中,我们研究一种带噪声的版本,其中一部分测量的符号可能被翻转,且可能由对手造成。特别地,我们分析了二值迭代硬阈值(BIHT)算法——一种在为1比特压缩感知适当定义的损失函数上进行的近端梯度下降——在该噪声设定下的表现。由近期结果可知,仅需 $\tilde{O}(\frac{k}{\epsilon})$ 个无噪声测量,BIHT 就能给出 $\epsilon$ 误差内的估计;该结果是最优且通用的,即同一组测量适用于所有稀疏向量。在本文中,我们证明 BIHT 在噪声设定下也优于所有已知方法:当最多 $\tau$ 比例的符号测量被(对抗性地)翻转时,使用与之前相同数量的测量,BIHT 可以在不知晓噪声的情况下给出 $x$ 的 $\tilde{O}(\epsilon+\tau)$ 误差内估计,并保持测量的通用性。这证明了迭代硬阈值在测量误差存在时的稳定性。为得到该结果,我们利用了高斯矩阵的受限近似可逆性,以及对被对抗性破坏的测量的高维几何的紧致分析。
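For concreteness, a small sketch of BIHT itself is below: the update is a gradient-style step $x \leftarrow x + \frac{\eta}{m} A^\top(y - \mathrm{Sign}(Ax))$ followed by hard thresholding to the top-$k$ entries and renormalization, run here on synthetic data with a fraction $\tau$ of adversarially flipped signs. The step size, iteration count, and problem sizes are arbitrary demo choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k, m, tau = 200, 5, 2000, 0.05

# k-sparse unit vector to recover.
x = np.zeros(n); x[rng.choice(n, k, replace=False)] = rng.normal(size=k)
x /= np.linalg.norm(x)

A = rng.normal(size=(m, n))
y = np.sign(A @ x)
flips = rng.choice(m, int(tau * m), replace=False)   # adversary flips a tau-fraction of signs
y[flips] *= -1

def hard_threshold(v, k):
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

# Binary Iterative Hard Thresholding (BIHT).
x_hat = np.zeros(n)
eta = 1.0
for _ in range(300):
    grad_step = x_hat + (eta / m) * A.T @ (y - np.sign(A @ x_hat))
    x_hat = hard_threshold(grad_step, k)
    x_hat /= np.linalg.norm(x_hat) + 1e-12

print("l2 error ||x_hat - x|| =", np.linalg.norm(x_hat - x))
```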
Why Train More? Effective and Efficient Membership Inference via Memorization
results: 研究表明,通过有策略地选择样本,攻击者可以在最大化攻击成功率的同时,减少所需影子模型(shadow model)的数量。Abstract
Membership Inference Attacks (MIAs) aim to identify specific data samples within the private training dataset of machine learning models, leading to serious privacy violations and other sophisticated threats. Many practical black-box MIAs require query access to the data distribution (the same distribution where the private data is drawn) to train shadow models. By doing so, the adversary obtains models trained "with" or "without" samples drawn from the distribution, and analyzes the characteristics of the samples under consideration. The adversary is often required to train more than hundreds of shadow models to extract the signals needed for MIAs; this becomes the computational overhead of MIAs. In this paper, we propose that by strategically choosing the samples, MI adversaries can maximize their attack success while minimizing the number of shadow models. First, our motivational experiments suggest memorization as the key property explaining disparate sample vulnerability to MIAs. We formalize this through a theoretical bound that connects MI advantage with memorization. Second, we show sample complexity bounds that connect the number of shadow models needed for MIAs with memorization. Lastly, we confirm our theoretical arguments with comprehensive experiments; by utilizing samples with high memorization scores, the adversary can (a) significantly improve its efficacy regardless of the MIA used, and (b) reduce the number of shadow models by nearly two orders of magnitude compared to state-of-the-art approaches.
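The core idea — rank target samples by how strongly models memorize them, estimated from a handful of IN/OUT shadow models, and focus the attack on the high-memorization ones — can be sketched on synthetic data as below. The scoring rule (confidence gap between shadow models trained with and without the sample) is a simplified stand-in for the paper's analysis, and the data and model are toys.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
N, d, n_shadow = 600, 10, 8
X = rng.normal(size=(N, d))
w = rng.normal(size=d)
y = (X @ w + 1.5 * rng.normal(size=N) > 0).astype(int)     # noisy labels -> some samples get memorized

in_conf = [[] for _ in range(N)]
out_conf = [[] for _ in range(N)]

for _ in range(n_shadow):
    members = rng.random(N) < 0.5                           # random IN/OUT split for this shadow model
    clf = LogisticRegression(max_iter=1000).fit(X[members], y[members])
    conf = clf.predict_proba(X)[np.arange(N), y]            # confidence on each sample's true label
    for i in range(N):
        (in_conf if members[i] else out_conf)[i].append(conf[i])

def avg(values):
    return float(np.mean(values)) if values else 0.0

# Memorization-style score: how much being in the training set inflates the model's confidence.
score = np.array([avg(in_conf[i]) - avg(out_conf[i]) for i in range(N)])
targets = np.argsort(-score)[:20]
print("most-memorized sample indices:", targets)
print("their scores:", np.round(score[targets], 3))
# The paper's observation is that focusing membership inference on such samples lets the
# adversary reach a given success rate with far fewer shadow models.
```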
AutoFHE: Automated Adaption of CNNs for Efficient Evaluation over FHE
results: 实验结果表明,AutoFHE 可以在 RNS-CKKS 加密 CIFAR 数据集上提高安全执行的速度,比较传统方法快得多,同时也可以提高准确率。相比TFHE,AutoFHE 可以提高执行速度和准确率的同时,达到 $103\times$ 和 3.46% 的提升。Abstract
Secure inference of deep convolutional neural networks (CNNs) under RNS-CKKS involves polynomial approximation of unsupported non-linear activation functions. However, existing approaches have three main limitations: 1) Inflexibility: The polynomial approximation and associated homomorphic evaluation architecture are customized manually for each CNN architecture and do not generalize to other networks. 2) Suboptimal Approximation: Each activation function is approximated instead of the function represented by the CNN. 3) Restricted Design: Either high-degree or low-degree polynomial approximations are used. The former retains high accuracy but slows down inference due to bootstrapping operations, while the latter accelerates ciphertext inference but compromises accuracy. To address these limitations, we present AutoFHE, which automatically adapts standard CNNs for secure inference under RNS-CKKS. The key idea is to adopt layerwise mixed-degree polynomial activation functions, which are optimized jointly with the homomorphic evaluation architecture in terms of the placement of bootstrapping operations. The problem is modeled within a multi-objective optimization framework to maximize accuracy and minimize the number of bootstrapping operations. AutoFHE can be applied flexibly on any CNN architecture, and it provides diverse solutions that span the trade-off between accuracy and latency. Experimental evaluation over RNS-CKKS encrypted CIFAR datasets shows that AutoFHE accelerates secure inference by $1.32\times$ to $1.8\times$ compared to methods employing high-degree polynomials. It also improves accuracy by up to 2.56% compared to methods using low-degree polynomials. Lastly, AutoFHE accelerates inference and improves accuracy by $103\times$ and 3.46%, respectively, compared to CNNs under TFHE.
摘要
在RNS-CKKS下对深度卷积神经网络(CNN)进行安全推理,需要对不受支持的非线性激活函数进行多项式近似。然而,现有方法存在三个主要局限:1)不灵活:多项式近似及相应的同态评估架构需要针对每种CNN架构手工定制,无法推广到其他网络;2)近似欠优:方法逐个近似每个激活函数,而不是近似CNN所表示的整体函数;3)设计受限:要么使用高次多项式近似,要么使用低次多项式近似——前者保持高精度但因自举(bootstrapping)操作而拖慢推理,后者加快密文推理却牺牲精度。为解决这些局限,我们提出AutoFHE,可自动将标准CNN适配为RNS-CKKS下的安全推理。其关键思想是采用逐层混合次数的多项式激活函数,并与同态评估架构(即自举操作的放置)进行联合优化。该问题被建模为一个多目标优化问题,以最大化精度并最小化自举操作数量。AutoFHE可灵活应用于任意CNN架构,并提供横跨精度与时延权衡的多样化解。在RNS-CKKS加密的CIFAR数据集上的实验表明,与使用高次多项式的方法相比,AutoFHE将安全推理加速 $1.32\times$ 至 $1.8\times$;与使用低次多项式的方法相比,精度最多提升2.56%。最后,与TFHE下的CNN相比,AutoFHE将推理加速 $103\times$,并将精度提升3.46%。
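The degree trade-off at the heart of the problem can be seen with a few lines of least-squares fitting: low-degree polynomials approximate ReLU cheaply but coarsely, while high-degree ones approximate it well but (under RNS-CKKS) consume more multiplicative depth and therefore force extra bootstrapping. The fit below is a plain illustration of that trade-off on an assumed input interval, not AutoFHE's layerwise mixed-degree search.

```python
import numpy as np

xs = np.linspace(-5, 5, 2001)        # assumed range of pre-activation values
relu = np.maximum(xs, 0.0)

for degree in (2, 4, 8, 16):
    # Least-squares Chebyshev fit on the interval the activations are assumed to live in.
    coeffs = np.polynomial.chebyshev.chebfit(xs, relu, degree)
    approx = np.polynomial.chebyshev.chebval(xs, coeffs)
    max_err = np.max(np.abs(approx - relu))
    # Homomorphic evaluation cost grows with the multiplicative depth ~ log2(degree);
    # once the ciphertext's depth budget is exhausted, a bootstrapping operation is needed.
    depth = int(np.ceil(np.log2(degree)))
    print(f"degree {degree:2d}: max |error| = {max_err:.3f}, multiplicative depth ~ {depth}")
```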
results: 实验结果显示,与从头训练相比,LEMON可将Vision Transformer的训练计算成本降低56.7%,将BERT的训练计算成本降低33.2%。Abstract
Scaling of deep neural networks, especially Transformers, is pivotal for their surging performance and has further led to the emergence of sophisticated reasoning capabilities in foundation models. Such scaling generally requires training large models from scratch with random initialization, failing to leverage the knowledge acquired by their smaller counterparts, which are already resource-intensive to obtain. To tackle this inefficiency, we present $\textbf{L}$ossl$\textbf{E}$ss $\textbf{MO}$del Expansio$\textbf{N}$ (LEMON), a recipe to initialize scaled models using the weights of their smaller but pre-trained counterparts. This is followed by model training with an optimized learning rate scheduler tailored explicitly for the scaled models, substantially reducing the training time compared to training from scratch. Notably, LEMON is versatile, ensuring compatibility with various network structures, including models like Vision Transformers and BERT. Our empirical results demonstrate that LEMON reduces computational costs by 56.7% for Vision Transformers and 33.2% for BERT when compared to training from scratch.
摘要
深度神经网络(尤其是Transformer)的规模扩展对其性能的飙升至关重要,并进一步促成了基础模型中复杂推理能力的涌现。这种扩展通常需要以随机初始化从零训练大模型,无法利用其较小的、本身已耗费大量资源获得的模型所学到的知识。为解决这种低效问题,我们提出了 $\textbf{L}$ossl$\textbf{E}$ss $\textbf{MO}$del Expansio$\textbf{N}$(LEMON):一种利用较小但已预训练模型的权重来初始化扩展后模型的方法。随后,我们使用专为扩展后模型设计的优化学习率调度器进行训练,相比从零训练可大幅缩短训练时间。值得注意的是,LEMON具有通用性,可兼容包括Vision Transformer和BERT在内的多种网络结构。实验结果表明,与从零训练相比,LEMON将Vision Transformer的计算成本降低56.7%,将BERT的计算成本降低33.2%。
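Initializing a bigger model from a smaller one can be made function-preserving; the sketch below widens the hidden layer of a tiny ReLU MLP by duplicating units and splitting their outgoing weights, so the expanded network computes exactly the same function before any fine-tuning. This is a generic Net2Net-style construction used only to illustrate the idea — LEMON's actual expansion rules for Transformers are more involved.

```python
import numpy as np

rng = np.random.default_rng(7)
d_in, d_hidden, d_out = 6, 4, 3
W1, b1 = rng.normal(size=(d_hidden, d_in)), rng.normal(size=d_hidden)
W2, b2 = rng.normal(size=(d_out, d_hidden)), rng.normal(size=d_out)

def mlp(x, W1, b1, W2, b2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

# Widen the hidden layer from 4 to 7 units by duplicating 3 of them.
dup = np.array([0, 1, 2])
W1_big = np.concatenate([W1, W1[dup]], axis=0)
b1_big = np.concatenate([b1, b1[dup]])
W2_big = np.concatenate([W2, W2[:, dup]], axis=1)
W2_big[:, dup] *= 0.5                       # each duplicated unit's output is split in half...
W2_big[:, d_hidden:] *= 0.5                 # ...between the original copy and the new copy

x = rng.normal(size=d_in)
small = mlp(x, W1, b1, W2, b2)
big = mlp(x, W1_big, b1_big, W2_big, b2)
print("outputs identical after expansion:", np.allclose(small, big))
```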
Multi-View Variational Autoencoder for Missing Value Imputation in Untargeted Metabolomics
paper_authors: Chen Zhao, Kuan-Jui Su, Chong Wu, Xuewei Cao, Qiuying Sha, Wu Li, Zhe Luo, Tian Qin, Chuan Qiu, Lan Juan Zhao, Anqi Liu, Lindong Jiang, Xiao Zhang, Hui Shen, Weihua Zhou, Hong-Wen Deng
For: The paper aims to improve the accuracy of metabolomics data imputation by integrating whole-genome sequencing (WGS) data with metabolomics data.* Methods: The proposed method uses a multi-view variational autoencoder to jointly model the burden score, polygenetic risk score (PGS), and linkage disequilibrium (LD) pruned single nucleotide polymorphisms (SNPs) for feature extraction and missing metabolomics data imputation.* Results: The proposed method achieved r2-scores > 0.01 for 71.55% of metabolites, demonstrating its superiority compared to conventional imputation techniques.Here are the three points in Simplified Chinese:* For: 这项研究旨在通过整合全基因组测序(WGS)数据来提高代谢组学数据填补的精度。* Methods: 该方法使用多视图变分自编码器联合建模负担分数、多基因风险分数(PGS)和连锁不平衡(LD)修剪后的单核苷酸多态性(SNPs),用于特征提取和缺失代谢组学数据的填补。* Results: 该方法对71.55%的代谢物取得了大于0.01的r2分数,表明其优于传统填补方法。Abstract
Background: Missing data is a common challenge in mass spectrometry-based metabolomics, which can lead to biased and incomplete analyses. The integration of whole-genome sequencing (WGS) data with metabolomics data has emerged as a promising approach to enhance the accuracy of data imputation in metabolomics studies. Method: In this study, we propose a novel method that leverages the information from WGS data and reference metabolites to impute unknown metabolites. Our approach utilizes a multi-view variational autoencoder to jointly model the burden score, polygenetic risk score (PGS), and linkage disequilibrium (LD) pruned single nucleotide polymorphisms (SNPs) for feature extraction and missing metabolomics data imputation. By learning the latent representations of both omics data, our method can effectively impute missing metabolomics values based on genomic information. Results: We evaluate the performance of our method on empirical metabolomics datasets with missing values and demonstrate its superiority compared to conventional imputation techniques. Using 35 template metabolites derived burden scores, PGS and LD-pruned SNPs, the proposed methods achieved r2-scores > 0.01 for 71.55% of metabolites. Conclusion: The integration of WGS data in metabolomics imputation not only improves data completeness but also enhances downstream analyses, paving the way for more comprehensive and accurate investigations of metabolic pathways and disease associations. Our findings offer valuable insights into the potential benefits of utilizing WGS data for metabolomics data imputation and underscore the importance of leveraging multi-modal data integration in precision medicine research.
摘要
背景:缺失数据是基于质谱的代谢组学中的常见挑战,可能导致有偏且不完整的分析。将全基因组测序(WGS)数据与代谢组学数据整合,已成为提高代谢组学研究中数据填补精度的一种有前景的方法。方法:在本研究中,我们提出一种新方法,利用WGS数据和参考代谢物的信息来填补未知代谢物。我们的方法使用多视图变分自编码器,联合建模负担分数、多基因风险分数(PGS)以及连锁不平衡(LD)修剪后的单核苷酸多态性(SNPs),用于特征提取和缺失代谢组学数据的填补。通过学习两种组学数据的潜在表示,我们的方法能够基于基因组信息有效地填补缺失的代谢组学值。结果:我们在含缺失值的真实代谢组学数据集上评估了该方法的性能,并证明其优于传统填补技术。使用由35个模板代谢物得到的负担分数、PGS和LD修剪SNPs,所提方法对71.55%的代谢物取得了大于0.01的r2分数。结论:在代谢组学填补中整合WGS数据不仅提高了数据完整性,还增强了下游分析,为更全面、更准确地研究代谢通路和疾病关联铺平了道路。我们的发现为利用WGS数据进行代谢组学数据填补的潜在益处提供了有价值的见解,并强调了在精准医学研究中利用多模态数据整合的重要性。
Semantic-Forward Relaying: A Novel Framework Towards 6G Cooperative Communications
results: 仿真结果表明,即使在恶劣的信道条件下,SF 中继仍然可以有效提升恢复信息的质量。Abstract
This letter proposes a novel relaying framework, semantic-forward (SF), for cooperative communications towards the sixth-generation (6G) wireless networks. The SF relay extracts and transmits the semantic features, which reduces forwarding payload, and also improves the network robustness against intra-link errors. Based on the theoretical basis for cooperative communications with side information and the turbo principle, we design a joint source-channel coding algorithm to iteratively exchange the extrinsic information for enhancing the decoding gains at the destination. Surprisingly, simulation results indicate that even in bad channel conditions, SF relaying can still effectively improve the recovered information quality.
Neural Combinatorial Optimization with Heavy Decoder: Toward Large Scale Generalization
paper_authors: Fu Luo, Xi Lin, Fei Liu, Qingfu Zhang, Zhenkun Wang
for: 解决复杂的 combinatorial 优化问题,无需专家设计专门的算法。
methods: 提出了一种新的 Light Encoder and Heavy Decoder (LEHD) 模型,可以动态捕捉所有节点的关系,提高模型通用性。 并开发了一种数据效率高的训练方案和灵活的解决方案构建机制。
results: 通过训练小规模问题实例,LEHD 模型可以在 TSP 和 CVRP 问题上生成近似优解,并在实际 TSPLib 和 CVRPLib 问题上也达到了优秀表现。这些结果证明了我们提出的 LEHD 模型可以显著提高现有的 NCO 性能。代码可以在 https://github.com/CIAM-Group/NCO_code/tree/main/single_objective/LEHD 上下载。Abstract
Neural combinatorial optimization (NCO) is a promising learning-based approach for solving challenging combinatorial optimization problems without specialized algorithm design by experts. However, most constructive NCO methods cannot solve problems with large-scale instance sizes, which significantly diminishes their usefulness for real-world applications. In this work, we propose a novel Light Encoder and Heavy Decoder (LEHD) model with a strong generalization ability to address this critical issue. The LEHD model can learn to dynamically capture the relationships between all available nodes of varying sizes, which is beneficial for model generalization to problems of various scales. Moreover, we develop a data-efficient training scheme and a flexible solution construction mechanism for the proposed LEHD model. By training on small-scale problem instances, the LEHD model can generate nearly optimal solutions for the Travelling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP) with up to 1000 nodes, and also generalizes well to solve real-world TSPLib and CVRPLib problems. These results confirm our proposed LEHD model can significantly improve the state-of-the-art performance for constructive NCO. The code is available at https://github.com/CIAM-Group/NCO_code/tree/main/single_objective/LEHD.
摘要
神经组合优化(NCO)是一种有前景的基于学习的方法,可在无需专家设计专门算法的情况下求解具有挑战性的组合优化问题。然而,大多数构造式NCO方法无法求解大规模实例,这极大地限制了其在实际应用中的价值。针对这一关键问题,我们提出了一种具有强泛化能力的轻编码器-重解码器(LEHD)模型。LEHD模型能够学习动态捕捉不同规模下所有可用节点之间的关系,这有利于模型向各种规模的问题泛化。此外,我们还为LEHD模型开发了数据高效的训练方案和灵活的解构造机制。仅在小规模问题实例上训练,LEHD模型即可为最多1000个节点的旅行商问题(TSP)和带容量车辆路径问题(CVRP)生成近似最优解,并能很好地泛化到真实世界的TSPLib和CVRPLib问题。这些结果证明,所提的LEHD模型可以显著提升构造式NCO的最先进性能。代码可在 https://github.com/CIAM-Group/NCO_code/tree/main/single_objective/LEHD 获取。
RandCom: Random Communication Skipping Method for Decentralized Stochastic Optimization
results: 作者通过实验和分析表明,RandCom 能够随节点数量的增加实现线性加速;在随机强凸设定下,RandCom 还能在与网络无关的步长下实现线性加速。此外,作者将 RandCom 应用于联邦学习,并就线性加速的潜力给出了积极结果。Abstract
Distributed optimization methods with random communication skips are gaining increasing attention due to their proven benefits in accelerating communication complexity. Nevertheless, existing research mainly focuses on centralized communication protocols for strongly convex deterministic settings. In this work, we provide a decentralized optimization method called RandCom, which incorporates probabilistic local updates. We analyze the performance of RandCom in stochastic non-convex, convex, and strongly convex settings and demonstrate its ability to asymptotically reduce communication overhead by the probability of communication. Additionally, we prove that RandCom achieves linear speedup as the number of nodes increases. In stochastic strongly convex settings, we further prove that RandCom can achieve linear speedup with network-independent stepsizes. Moreover, we apply RandCom to federated learning and provide positive results concerning the potential for achieving linear speedup and the suitability of the probabilistic local update approach for non-convex settings.
摘要
带有随机通信跳过的分布式优化方法因其在降低通信复杂度方面的已证优势而受到越来越多的关注。然而,现有研究主要集中于强凸确定性设定下的中心化通信协议。在本工作中,我们提出了一种称为RandCom的去中心化优化方法,其中引入了概率性的本地更新。我们分析了RandCom在随机非凸、凸和强凸设定下的性能,并证明其能够按通信概率渐近地降低通信开销。此外,我们证明RandCom能随节点数量的增加实现线性加速;在随机强凸设定下,我们进一步证明RandCom可在与网络无关的步长下实现线性加速。最后,我们将RandCom应用于联邦学习,并就实现线性加速的潜力以及概率性本地更新方法在非凸设定下的适用性给出了积极结果。
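The mechanism of random communication skipping is easy to state in code: every node always takes a local stochastic gradient step, but the gossip-averaging round is executed only with probability p. The sketch below runs this on a toy least-squares problem over a ring topology; it mirrors the general idea of probabilistic local updates rather than RandCom's exact update rule, and all constants are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(8)
n_nodes, d, p_comm = 8, 5, 0.3

# Each node holds its own least-squares objective 0.5*||A_i w - b_i||^2.
A = rng.normal(size=(n_nodes, 20, d))
b = rng.normal(size=(n_nodes, 20))

# Ring gossip matrix (doubly stochastic): average with the two neighbours.
W_mix = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W_mix[i, i] = 0.5
    W_mix[i, (i - 1) % n_nodes] = 0.25
    W_mix[i, (i + 1) % n_nodes] = 0.25

w = np.zeros((n_nodes, d))
for t in range(3000):
    lr = 0.5 / (20 + t)                           # decaying stepsize
    # Local stochastic gradient step at every node, every iteration.
    for i in range(n_nodes):
        j = rng.integers(20)                      # one random local sample
        g = (A[i, j] @ w[i] - b[i, j]) * A[i, j]
        w[i] -= lr * g
    # Communication (gossip averaging) happens only with probability p_comm.
    if rng.random() < p_comm:
        w = W_mix @ w

w_star = np.linalg.lstsq(A.reshape(-1, d), b.reshape(-1), rcond=None)[0]
print("max node distance to the global least-squares solution:", np.abs(w - w_star).max())
```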
Reinforcement Learning of Display Transfer Robots in Glass Flow Control Systems: A Physical Simulation-Based Approach
paper_authors: Hwajong Lee, Chan Kim, Seong-Woo Kim
for: 解决制造系统生产能力提高中的流控系统优化问题,提高生产率。
methods: 使用深度强化学习求解制造过程中的调度优化问题,以实现可行的流控系统设计。
results: 通过使用实际工艺中不同类型的机器人进行验证,表明利用强化学习优化显示制造过程中的玻璃流控系统调度是可行的,有望带来更高的生产率。Abstract
A flow control system is a critical concept for increasing the production capacity of manufacturing systems. To solve the scheduling optimization problem related to the flow control with the aim of improving productivity, existing methods depend on a heuristic design by domain human experts. Therefore, the methods require correction, monitoring, and verification by using real equipment. As system designs increase in complexity, the monitoring time increases, which decreases the probability of arriving at the optimal design. As an alternative approach to the heuristic design of flow control systems, the use of deep reinforcement learning to solve the scheduling optimization problem has been considered. Although the existing research on reinforcement learning has yielded excellent performance in some areas, the applicability of the results to actual FAB such as display and semiconductor manufacturing processes is not evident so far. To this end, we propose a method to implement a physical simulation environment and devise a feasible flow control system design using a transfer robot in display manufacturing through reinforcement learning. We present a model and parameter setting to build a virtual environment for different display transfer robots, and training methods of reinforcement learning on the environment to obtain an optimal scheduling of glass flow control systems. Its feasibility was verified by using different types of robots used in the actual process.
摘要
流控系统是提高制造系统产能的关键概念。为了求解与流控相关、以提高生产率为目标的调度优化问题,现有方法依赖领域专家的启发式设计,因而需要借助真实设备进行修正、监控和验证;随着系统设计复杂度的增加,监控时间也随之增加,从而降低了得到最优设计的概率。作为流控系统启发式设计的替代方案,人们开始考虑使用深度强化学习来求解该调度优化问题。虽然现有的强化学习研究在一些领域取得了优秀表现,但其结果能否应用于显示和半导体制造等实际FAB流程尚不明确。为此,我们提出了一种构建物理仿真环境的方法,并通过强化学习为显示制造中的搬运机器人设计可行的流控系统。我们给出了针对不同显示搬运机器人构建虚拟环境的模型与参数设置,以及在该环境上进行强化学习训练以获得玻璃流控系统最优调度的方法,并使用实际工艺中不同类型的机器人验证了其可行性。
GRASP: Accelerating Shortest Path Attacks via Graph Attention
results: 对于这一APX-难问题,GRASP算法可将运行时间最多加速10倍,同时保持解的质量。此外,精心设计输入图的表示(包括与优化任务高度相关的节点特征),可以突出优化解中的重要结构。Abstract
Recent advances in machine learning (ML) have shown promise in aiding and accelerating classical combinatorial optimization algorithms. ML-based speed ups that aim to learn in an end to end manner (i.e., directly output the solution) tend to trade off run time with solution quality. Therefore, solutions that are able to accelerate existing solvers while maintaining their performance guarantees, are of great interest. We consider an APX-hard problem, where an adversary aims to attack shortest paths in a graph by removing the minimum number of edges. We propose the GRASP algorithm: Graph Attention Accelerated Shortest Path Attack, an ML aided optimization algorithm that achieves run times up to 10x faster, while maintaining the quality of solution generated. GRASP uses a graph attention network to identify a smaller subgraph containing the combinatorial solution, thus effectively reducing the input problem size. Additionally, we demonstrate how careful representation of the input graph, including node features that correlate well with the optimization task, can highlight important structure in the optimization solution.
摘要
我们考虑一个APX-难问题:对手试图通过移除最少数量的边来攻击图中的最短路径。为此,我们提出GRASP算法(Graph Attention Accelerated Shortest Path Attack),一种由机器学习辅助的优化算法,可在保持所生成解质量的同时将运行时间最多加速10倍。GRASP利用图注意力网络识别出包含组合解的更小子图,从而有效缩小输入问题的规模。此外,我们还展示了精心设计输入图的表示(包括与优化任务高度相关的节点特征)能够突出优化解中的重要结构。
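As a point of reference for the underlying combinatorial problem — the kind of solver a learned graph-attention component like GRASP's accelerates — here is a simple greedy baseline on a random weighted graph: repeatedly delete the single edge on the current shortest s-t path whose removal increases the shortest-path length the most, until the path exceeds an assumed target length. Both the stopping criterion and the instance are assumptions for illustration; this is a baseline heuristic, not the GRASP method.

```python
import random
import networkx as nx

random.seed(9)
G = nx.connected_watts_strogatz_graph(60, 6, 0.3, seed=9)   # guaranteed-connected toy graph
for u, v in G.edges:
    G[u][v]["weight"] = random.uniform(1.0, 5.0)

s, t = 0, 30
target = 1.5 * nx.shortest_path_length(G, s, t, weight="weight")   # assumed attack goal

removed = []
while nx.has_path(G, s, t) and nx.shortest_path_length(G, s, t, weight="weight") < target:
    path = nx.shortest_path(G, s, t, weight="weight")
    best_edge, best_len = None, -1.0
    # Only edges on the current shortest path can lengthen it when removed.
    for u, v in zip(path, path[1:]):
        w = G[u][v]["weight"]
        G.remove_edge(u, v)
        new_len = (nx.shortest_path_length(G, s, t, weight="weight")
                   if nx.has_path(G, s, t) else float("inf"))
        G.add_edge(u, v, weight=w)
        if new_len > best_len:
            best_edge, best_len = (u, v), new_len
    G.remove_edge(*best_edge)
    removed.append(best_edge)

print(f"removed {len(removed)} edges to push d(s, t) above the target: {removed}")
```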
Graph-SCP: Accelerating Set Cover Problems with Graph Neural Networks
paper_authors: Zohair Shafi, Benjamin A. Miller, Tina Eliassi-Rad, Rajmonda S. Caceres
for: solves the Set Cover Problem (SCP) using graph neural networks to accelerate combinatorial optimization.
methods: uses a graph neural network method called Graph-SCP to identify a smaller sub-problem that contains the solution space, and can be used with other optimization solvers to achieve run time improvement.
results: reduces the problem size by 30-70% and achieves run time speedups up to ~25x compared to commercial solvers, and can achieve 100% optimality given a desired threshold.Abstract
Machine learning (ML) approaches are increasingly being used to accelerate combinatorial optimization (CO) problems. We look specifically at the Set Cover Problem (SCP) and propose Graph-SCP, a graph neural network method that can augment existing optimization solvers by learning to identify a much smaller sub-problem that contains the solution space. We evaluate the performance of Graph-SCP on synthetic weighted and unweighted SCP instances with diverse problem characteristics and complexities, and on instances from the OR Library, a canonical benchmark for SCP. We show that Graph-SCP reduces the problem size by 30-70% and achieves run time speedups up to~25x when compared to commercial solvers (Gurobi). Given a desired optimality threshold, Graph-SCP will improve upon it or even achieve 100% optimality. This is in contrast to fast greedy solutions that significantly compromise solution quality to achieve guaranteed polynomial run time. Graph-SCP can generalize to larger problem sizes and can be used with other conventional or ML-augmented CO solvers to lead to potential additional run time improvement.
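For reference, the classical greedy algorithm below is the kind of baseline whose input Graph-SCP shrinks before a solver is invoked: it repeatedly picks the set covering the most still-uncovered elements. This is standard textbook code on a synthetic instance, independent of Graph-SCP itself.

```python
import random

random.seed(10)
universe = set(range(100))
sets = [set(random.sample(range(100), random.randint(5, 20))) for _ in range(40)]
for e in universe:                      # make sure the instance is feasible
    sets[e % len(sets)].add(e)

def greedy_set_cover(universe, sets):
    """Classical ln(n)-approximation: always take the set covering the most uncovered elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
        chosen.append(best)
        uncovered -= sets[best]
    return chosen

cover = greedy_set_cover(universe, sets)
print(f"greedy cover uses {len(cover)} of {len(sets)} sets")
# Graph-SCP's role is upstream of a solver like this (or Gurobi): it predicts a much
# smaller subset of candidate sets that still contains a (near-)optimal cover.
```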
Hyperparameter Adaptive Search for Surrogate Optimization: A Self-Adjusting Approach
methods: Hyperparameter Adaptive Search for SO (HASSO)方法,一种自适应的SO算法,动态调整自己的超参数,不需要额外评估
results: 实验结果表明,HASSO可以提高多种SO算法在不同全局优化测试问题上的性能。Abstract
Surrogate Optimization (SO) algorithms have shown promise for optimizing expensive black-box functions. However, their performance is heavily influenced by hyperparameters related to sampling and surrogate fitting, which poses a challenge to their widespread adoption. We investigate the impact of hyperparameters on various SO algorithms and propose a Hyperparameter Adaptive Search for SO (HASSO) approach. HASSO is not a hyperparameter tuning algorithm, but a generic self-adjusting SO algorithm that dynamically tunes its own hyperparameters while concurrently optimizing the primary objective function, without requiring additional evaluations. The aim is to improve the accessibility, effectiveness, and convergence speed of SO algorithms for practitioners. Our approach identifies and modifies the most influential hyperparameters specific to each problem and SO approach, reducing the need for manual tuning without significantly increasing the computational burden. Experimental results demonstrate the effectiveness of HASSO in enhancing the performance of various SO algorithms across different global optimization test problems.
摘要
代理优化(Surrogate Optimization,SO)算法在优化昂贵的黑盒函数方面展现出了潜力。然而,其性能在很大程度上受与采样和代理模型拟合相关的超参数影响,这对其广泛采用构成挑战。我们考察了超参数对多种SO算法的影响,并提出了一种面向SO的超参数自适应搜索方法(HASSO)。HASSO不是一个超参数调优算法,而是一种通用的自调整SO算法,它在优化主目标函数的同时动态调整自身的超参数,且不需要额外的函数评估。其目标是提高SO算法对实践者的易用性、有效性和收敛速度。我们的方法能够针对每个问题和每种SO方法识别并调整最具影响力的超参数,从而减少手动调参的需要,且不会显著增加计算负担。实验结果表明,HASSO能够在不同的全局优化测试问题上提升多种SO算法的性能。
Towards Causal Deep Learning for Vulnerability Detection
paper_authors: Md Mahbubur Rahman, Ira Ceka, Chengzhi Mao, Saikat Chakraborty, Baishakhi Ray, Wei Le
for: 这项研究旨在提高深度学习漏洞检测的鲁棒性和泛化能力,使其在实际应用中更加可用。
methods: 我们提出了一种基于因果性的方法CausalVul,包括两个阶段:第一阶段设计新的扰动,以发现模型可能用于预测的虚假特征;第二阶段在现有深度学习模型之上应用因果学习算法(特别是do-calculus),系统地消除对虚假特征的依赖,从而促进基于因果的预测。
results: 结果显示,CausalVul在我们实验的所有最先进模型和数据集上均一致地提升了模型的准确性、鲁棒性和分布外(OOD)性能。此外,这是首个将基于do-calculus的因果学习引入软件工程模型并证明其实际有效的工作。我们的复现包位于 https://figshare.com/s/0ffda320dcb96c249ef2 。Abstract
Deep learning vulnerability detection has shown promising results in recent years. However, an important challenge that still blocks it from being very useful in practice is that the model is not robust under perturbation and it cannot generalize well over the out-of-distribution (OOD) data, e.g., applying a trained model to unseen projects in real world. We hypothesize that this is because the model learned non-robust features, e.g., variable names, that have spurious correlations with labels. When the perturbed and OOD datasets no longer have the same spurious features, the model prediction fails. To address the challenge, in this paper, we introduced causality into deep learning vulnerability detection. Our approach CausalVul consists of two phases. First, we designed novel perturbations to discover spurious features that the model may use to make predictions. Second, we applied the causal learning algorithms, specifically, do-calculus, on top of existing deep learning models to systematically remove the use of spurious features and thus promote causal based prediction. Our results show that CausalVul consistently improved the model accuracy, robustness and OOD performance for all the state-of-the-art models and datasets we experimented. To the best of our knowledge, this is the first work that introduces do calculus based causal learning to software engineering models and shows it's indeed useful for improving the model accuracy, robustness and generalization. Our replication package is located at https://figshare.com/s/0ffda320dcb96c249ef2.
摘要
深度学习漏洞检测近年来展现出有前景的结果。然而,阻碍其真正投入实用的一个重要挑战是:模型在扰动下不够鲁棒,且无法很好地泛化到分布外(OOD)数据,例如将训练好的模型应用到真实世界中未见过的项目。我们推测这是因为模型学习了非鲁棒特征(例如变量名),这些特征与标签之间存在虚假相关。当扰动后的数据和OOD数据不再具有相同的虚假特征时,模型预测便会失败。为了解决这一挑战,本文将因果性引入深度学习漏洞检测。我们的方法CausalVul包括两个阶段:首先,我们设计了新的扰动,用于发现模型可能借以做出预测的虚假特征;其次,我们在现有深度学习模型之上应用因果学习算法(特别是do-calculus),系统地消除对虚假特征的使用,从而促进基于因果的预测。我们的结果显示,在我们实验的所有最先进模型和数据集上,CausalVul均一致地提升了模型的准确性、鲁棒性和OOD性能。据我们所知,这是首个将基于do-calculus的因果学习引入软件工程模型、并证明其确实有助于提升模型准确性、鲁棒性和泛化能力的工作。我们的复现包位于 https://figshare.com/s/0ffda320dcb96c249ef2 。
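The first phase — perturbing code in ways that should not change the label, such as consistently renaming variables, and checking whether the detector's prediction flips — can be sketched as below. The `predict` function is a placeholder standing in for any trained vulnerability detector (it deliberately keys on a variable name to show what a spurious-feature-reliant model looks like); the renaming is a simple whole-word substitution, and the snippet and names are made up.

```python
import re

def rename_variables(code: str, mapping: dict) -> str:
    """Consistently rename identifiers (whole-word matches only); the code's semantics are unchanged."""
    for old, new in mapping.items():
        code = re.sub(rf"\b{re.escape(old)}\b", new, code)
    return code

def predict(code: str) -> int:
    """Placeholder detector that (deliberately) keys on a variable name -- a spurious feature."""
    return int("src" in code)          # toy stand-in, NOT a real vulnerability model

snippet = """
void copy(char *dst, char *src) {
    strcpy(dst, src);
}
"""

perturbed = rename_variables(snippet, {"dst": "buffer_out", "src": "buffer_in"})
print("original prediction: ", predict(snippet))
print("perturbed prediction:", predict(perturbed))
# If predictions differ across label-preserving renamings, the model is leaning on
# spurious, non-causal features (here: variable names) rather than the vulnerability itself.
```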