results: By training the model with a multi-head attention mechanism, the approach better captures the important information in code and generates high-quality summaries.
Abstract
Code summarization aims to generate concise natural language descriptions for source code. The prevailing approaches adopt transformer-based encoder-decoder architectures, where the Abstract Syntax Tree (AST) of the source code is utilized for encoding structural information. However, ASTs are much longer than the corresponding source code, and existing methods ignore this size constraint by directly feeding the entire linearized AST into the encoders. This simplistic approach makes it challenging to extract truly valuable dependency relations from the overlong input sequence and leads to significant computational overhead due to self-attention applied to all nodes in the AST. To address this issue effectively and efficiently, we present AST-MHSA, a model that uses multi-head attention to extract the important semantic information from the AST. The model consists of two main components: an encoder and a decoder. The encoder takes as input the AST of the code and generates a sequence of hidden states. The decoder then takes these hidden states as input and generates a natural language summary of the code. The multi-head attention mechanism allows the model to learn different representations of the input code, which can be combined to generate a more comprehensive summary. The model is trained on a dataset of code and summaries, and the parameters of the model are optimized to minimize the loss between the generated summaries and the ground-truth summaries.
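To make the described encoder concrete, here is a minimal PyTorch sketch of multi-head self-attention over embedded AST node sequences. The vocabulary size, embedding width, head count, and residual/normalization details are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ASTEncoder(nn.Module):
    """Toy encoder: embeds linearized AST node IDs, then applies
    multi-head self-attention to produce hidden states for a decoder."""
    def __init__(self, vocab_size=1000, d_model=128, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, node_ids):                   # (batch, seq_len)
        x = self.embed(node_ids)                   # (batch, seq_len, d_model)
        out, _ = self.attn(x, x, x)                # each head learns a different view
        return self.norm(x + out)                  # hidden states for the decoder

encoder = ASTEncoder()
hidden = encoder(torch.randint(0, 1000, (2, 50)))  # two ASTs of 50 nodes each
print(hidden.shape)                                # torch.Size([2, 50, 128])
```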
IIHT: Medical Report Generation with Image-to-Indicator Hierarchical Transformer
results: Experiments show that the proposed IIHT method achieves highly accurate medical report generation and lets radiologists modify disease indicators in real-world use to improve accuracy.
Abstract
Automated medical report generation has become increasingly important in medical analysis. It can produce computer-aided diagnosis descriptions and thus significantly alleviate the doctors' work. Inspired by the huge success of neural machine translation and image captioning, various deep learning methods have been proposed for medical report generation. However, due to the inherent properties of medical data, including data imbalance and the length and correlation between report sequences, the generated reports by existing methods may exhibit linguistic fluency but lack adequate clinical accuracy. In this work, we propose an image-to-indicator hierarchical transformer (IIHT) framework for medical report generation. It consists of three modules, i.e., a classifier module, an indicator expansion module and a generator module. The classifier module first extracts image features from the input medical images and produces disease-related indicators with their corresponding states. The disease-related indicators are subsequently utilised as input for the indicator expansion module, incorporating the "data-text-data" strategy. The transformer-based generator then leverages these extracted features along with image features as auxiliary information to generate final reports. Furthermore, the proposed IIHT method is feasible for radiologists to modify disease indicators in real-world scenarios and integrate the operations into the indicator expansion module for fluent and accurate medical report generation. Extensive experiments and comparisons with state-of-the-art methods under various evaluation metrics demonstrate the great performance of the proposed method.
results: Experiments show that the new gating mechanism maintains long-term memory on a synthetic sequence learning task while reducing computational cost. On handwritten text recognition tasks, it can also be trained to accuracy comparable to conventional GRU and LSTM baselines.
Abstract
We replace the multiplication and sigmoid function of the conventional recurrent gate with addition and ReLU activation. This mechanism is designed to maintain long-term memory for sequence processing but at a reduced computational cost, thereby opening up for more efficient execution or larger models on restricted hardware. Recurrent Neural Networks (RNNs) with gating mechanisms such as LSTM and GRU have been widely successful in learning from sequential data due to their ability to capture long-term dependencies. Conventionally, the update based on current inputs and the previous state history is each multiplied with dynamic weights and combined to compute the next state. However, multiplication can be computationally expensive, especially for certain hardware architectures or alternative arithmetic systems such as homomorphic encryption. It is demonstrated that the novel gating mechanism can capture long-term dependencies for a standard synthetic sequence learning task while significantly reducing computational costs such that execution time is reduced by half on CPU and by one-third under encryption. Experimental results on handwritten text recognition tasks furthermore show that the proposed architecture can be trained to achieve comparable accuracy to conventional GRU and LSTM baselines. The gating mechanism introduced in this paper may enable privacy-preserving AI applications operating under homomorphic encryption by avoiding the multiplication of encrypted variables. It can also support quantization in (unencrypted) plaintext applications, with the potential for substantial performance gains since the addition-based formulation can avoid the expansion to double precision often required for multiplication.
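One plausible reading of the gate replacement, sketched in numpy below: the conventional GRU-style update multiplies the previous state and candidate elementwise by a sigmoid gate, while the variant combines them with additions and ReLU only. The exact formulation is our illustrative assumption, not the paper's verified equations; note that the relevant saving under homomorphic encryption is avoiding elementwise products between two encrypted quantities (state and gate), while plaintext weight multiplications remain.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conventional_step(h_prev, x, Wz, Uz, Wh, Uh):
    """GRU-style update: sigmoid gate, two elementwise state-gate products."""
    z = sigmoid(Wz @ x + Uz @ h_prev)            # dynamic gate in (0, 1)
    h_cand = np.tanh(Wh @ x + Uh @ h_prev)
    return z * h_prev + (1.0 - z) * h_cand       # multiplicative blending

def additive_relu_step(h_prev, x, Wa, Ua, Wf, Uf):
    """Illustrative multiplication-free variant: write new information with
    one ReLU branch, erase with another, combine purely by add/subtract."""
    write = np.maximum(0.0, Wa @ x + Ua @ h_prev)
    erase = np.maximum(0.0, Wf @ x + Uf @ h_prev)
    return h_prev + write - erase                # additions only

d = 4
rng = np.random.default_rng(0)
W = [rng.normal(size=(d, d)) * 0.1 for _ in range(4)]
print(additive_relu_step(np.zeros(d), rng.normal(size=d), *W))
```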
results: The note obtains a bound that depends on a novel notion of local Hölder smoothness.
Abstract
In this short note, I show how to adapt to H\"{o}lder smoothness using normalized gradients in a black-box way. Moreover, the bound will depend on a novel notion of local H\"{o}lder smoothness. The main idea directly comes from Levy [2017].
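For readers unfamiliar with the terminology, the two standard ingredients are (in our notation, not necessarily the note's):

```latex
% (L, \alpha)-Hölder smoothness of the gradient, \alpha \in (0, 1]
% (\alpha = 1 recovers the usual Lipschitz-smooth case):
\|\nabla f(x) - \nabla f(y)\| \le L\,\|x - y\|^{\alpha}

% normalized-gradient step, usable in a black-box way because the
% step direction does not depend on the unknown (L, \alpha):
x_{t+1} = x_t - \eta_t\, \frac{\nabla f(x_t)}{\|\nabla f(x_t)\|}
```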
Updating Clinical Risk Stratification Models Using Rank-Based Compatibility: Approaches for Evaluating and Optimizing Clinician-Model Team Performance
results: In a mortality risk stratification case study on the MIMIC dataset, the approach increases $C^R$ by $0.019$ ($95\%$ confidence interval: $0.005$, $0.035$) over existing model selection techniques.
Abstract
As data shift or new data become available, updating clinical machine learning models may be necessary to maintain or improve performance over time. However, updating a model can introduce compatibility issues when the behavior of the updated model does not align with user expectations, resulting in poor user-model team performance. Existing compatibility measures depend on model decision thresholds, limiting their applicability in settings where models are used to generate rankings based on estimated risk. To address this limitation, we propose a novel rank-based compatibility measure, $C^R$, and a new loss function that aims to optimize discriminative performance while encouraging good compatibility. Applied to a case study in mortality risk stratification leveraging data from MIMIC, our approach yields more compatible models while maintaining discriminative performance compared to existing model selection techniques, with an increase in $C^R$ of $0.019$ ($95\%$ confidence interval: $0.005$, $0.035$). This work provides new tools to analyze and update risk stratification models used in clinical care.
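The paper defines $C^R$ precisely; purely as an illustration of what a rank-based compatibility statistic can look like, the sketch below measures the fraction of (event, non-event) pairs ordered correctly by the original model that the updated model still orders correctly. This is our hedged stand-in, not the paper's definition.

```python
import itertools
import numpy as np

def rank_compatibility(y, risk_old, risk_new):
    """Of the pairs the old model ranks correctly (event above non-event),
    return the fraction the updated model preserves. Illustrative only."""
    events = np.where(y == 1)[0]
    nonevents = np.where(y == 0)[0]
    kept = total = 0
    for i, j in itertools.product(events, nonevents):
        if risk_old[i] > risk_old[j]:            # correctly ordered before update
            total += 1
            kept += risk_new[i] > risk_new[j]    # still correctly ordered after
    return kept / total if total else float("nan")

y = np.array([1, 0, 1, 0, 0])
old = np.array([0.9, 0.2, 0.6, 0.5, 0.1])
new = np.array([0.8, 0.3, 0.7, 0.6, 0.2])
print(rank_compatibility(y, old, new))           # 1.0: all orderings preserved
```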
Multi-graph Spatio-temporal Graph Convolutional Network for Traffic Flow Prediction
results: On a Chinese provincial highway, the method delivers clear improvements in predictive accuracy over baselines, along with practical business benefits.
Abstract
Inter-city highway transportation is significant for urban life. As one of the key functions in intelligent transportation systems (ITS), traffic evaluation plays a significant role nowadays, and daily traffic flow prediction still faces challenges at network-wide toll stations. On the one hand, the data imbalance in practice among various locations deteriorates the performance of prediction. On the other hand, complex correlative spatio-temporal factors cannot be comprehensively employed over long-term durations. In this paper, a prediction method is proposed for daily traffic flow in the highway domain through spatio-temporal deep learning. In our method, a data normalization strategy is used to deal with data imbalance, due to the long-tail distribution of traffic flow at network-wide toll stations. Then, based on graph convolutional networks, we construct networks with distinct semantics to capture spatio-temporal features. Besides that, meteorology and calendar features are used by our model in the fully connected stage to extract external characteristics of traffic flow. Through extensive experiments and case studies on one Chinese provincial highway, our method shows clear improvements in predictive accuracy over baselines and practical benefits in business.
NUPES: Non-Uniform Post-Training Quantization via Power Exponent Search
results: Enables optimization of the quantized weights over the entire quantized space and of the power exponent, i.e. the quantization operator itself, during training, achieving state-of-the-art compression rates while preserving predictive performance.
Abstract
Deep neural network (DNN) deployment has been confined to larger hardware devices due to their expensive computational requirements. This challenge has recently reached another scale with the emergence of large language models (LLMs). In order to reduce both their memory footprint and latency, a promising technique is quantization. It consists of converting floating point representations to low bit-width fixed point representations, usually by assuming a uniform mapping onto a regular grid. This process, referred to in the literature as uniform quantization, may however be ill-suited as most DNN weights and activations follow a bell-shaped distribution. This is even worse on LLMs whose weight distributions are known to exhibit large, high impact, outlier values. In this work, we propose an improvement over the most commonly adopted way to tackle this limitation in deep learning models quantization, namely, non-uniform quantization. NUPES leverages automorphisms to preserve the scalar multiplications. Such transformations are derived from power functions. However, the optimization of the exponent parameter and weight values remains a challenging and novel problem which could not be solved with previous post-training optimization techniques, which only learn to round weight values up or down in order to preserve the predictive function. We circumvent this limitation with a new paradigm: learning new quantized weights over the entire quantized space. Similarly, we enable the optimization of the power exponent, i.e. the optimization of the quantization operator itself during training, by alleviating all the numerical instabilities. The resulting predictive function is compatible with integer-only low-bit inference. We show the ability of the method to achieve state-of-the-art compression rates in both data-free and data-driven configurations.
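A minimal sketch of the core idea behind power-function (non-uniform) quantization: warp the weights through a power exponent, quantize uniformly in the warped space, then unwarp, so that quantization levels concentrate where the bell-shaped weight mass lies. The operator, grid, and exponent search in NUPES are more sophisticated; everything below (scale choice, symmetric grid, fixed alpha) is an illustrative assumption.

```python
import numpy as np

def power_quantize(w, bits=4, alpha=0.5):
    """Non-uniform quantization via a power warp. With alpha < 1 the warp
    stretches small magnitudes, so the uniform grid in warped space gives
    finer effective resolution near zero after unwarping."""
    scale = np.max(np.abs(w))
    levels = 2 ** (bits - 1) - 1
    warped = np.sign(w) * (np.abs(w) / scale) ** alpha      # flatten the bell
    q = np.round(warped * levels) / levels                  # uniform low-bit grid
    return np.sign(q) * np.abs(q) ** (1.0 / alpha) * scale  # map back

w = np.random.default_rng(0).normal(scale=0.05, size=10000)
for a in (1.0, 0.5):                                        # 1.0 = uniform baseline
    err = np.sqrt(np.mean((w - power_quantize(w, alpha=a)) ** 2))
    print(f"alpha={a}: RMS error {err:.5f}")
```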
Symmetry Defense Against XGBoost Adversarial Perturbation Attacks
for: Defending tree-based ensemble classifiers (such as gradient-boosted decision trees, GBDTs) against adversarial perturbation attacks
methods: A symmetry defense, exploiting the classifiers' lack of invariance with respect to symmetries
results: Achieves up to 100% accuracy on adversarial samples against zero-knowledge adversaries, and over 95% accuracy on adversarial samples for the GBDT classifier of the F-MNIST dataset against perfect-knowledge adversaries.
Abstract
We examine whether symmetry can be used to defend tree-based ensemble classifiers such as gradient-boosting decision trees (GBDTs) against adversarial perturbation attacks. The idea is based on a recent symmetry defense for convolutional neural network classifiers (CNNs) that utilizes CNNs' lack of invariance with respect to symmetries. CNNs lack invariance because they can classify a symmetric sample, such as a horizontally flipped image, differently from the original sample. CNNs' lack of invariance also means that CNNs can classify symmetric adversarial samples differently from the incorrect classification of adversarial samples. Using CNNs' lack of invariance, the recent CNN symmetry defense has shown that the classification of symmetric adversarial samples reverts to the correct sample classification. In order to apply the same symmetry defense to GBDTs, we examine GBDT invariance and are the first to show that GBDTs also lack invariance with respect to symmetries. We apply and evaluate the GBDT symmetry defense for nine datasets against six perturbation attacks with a threat model that ranges from zero-knowledge to perfect-knowledge adversaries. Using the feature inversion symmetry against zero-knowledge adversaries, we achieve up to 100% accuracy on adversarial samples even when default and robust classifiers have 0% accuracy. Using the feature inversion and horizontal flip symmetries against perfect-knowledge adversaries, we achieve up to over 95% accuracy on adversarial samples for the GBDT classifier of the F-MNIST dataset even when default and robust classifiers have 0% accuracy.
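A bare-bones sketch of the defense's mechanics, assuming features normalized to [0, 1] so that "feature inversion" can be written as 1 - x; the paper's symmetry set, training pairing, and threat-model handling are richer than this.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X = MinMaxScaler().fit_transform(X)       # features scaled to [0, 1]

def invert(x):
    return 1.0 - x                        # feature-inversion symmetry

# Train on the symmetric view and route every incoming sample through the
# same symmetry. An adversarial perturbation crafted against a classifier of
# the default view no longer lines up, because GBDTs are not invariant to
# the symmetry.
clf_sym = GradientBoostingClassifier(random_state=0).fit(invert(X), y)

def defended_predict(x):
    return clf_sym.predict(invert(np.atleast_2d(x)))

print((defended_predict(X) == y).mean())  # accuracy on the clean view
```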
AutoGluon-TimeSeries: AutoML for Probabilistic Time Series Forecasting
results: In an evaluation on 29 benchmark datasets, the library demonstrates strong empirical performance, outperforming a range of forecasting methods in both point and quantile forecast accuracy, and often even improving upon the best-in-hindsight combination of prior methods.
Abstract
We introduce AutoGluon-TimeSeries - an open-source AutoML library for probabilistic time series forecasting. Focused on ease of use and robustness, AutoGluon-TimeSeries enables users to generate accurate point and quantile forecasts with just 3 lines of Python code. Built on the design philosophy of AutoGluon, AutoGluon-TimeSeries leverages ensembles of diverse forecasting models to deliver high accuracy within a short training time. AutoGluon-TimeSeries combines both conventional statistical models, machine-learning based forecasting approaches, and ensembling techniques. In our evaluation on 29 benchmark datasets, AutoGluon-TimeSeries demonstrates strong empirical performance, outperforming a range of forecasting methods in terms of both point and quantile forecast accuracy, and often even improving upon the best-in-hindsight combination of prior methods.
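For reference, the advertised three-line usage looks roughly like this (class and method names as in recent AutoGluon releases; exact signatures may vary across versions, and the file and column names are placeholders):

```python
import pandas as pd
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

# Long-format data: one row per (item_id, timestamp) with a 'target' column.
df = pd.read_csv("train.csv")
train_data = TimeSeriesDataFrame.from_data_frame(
    df, id_column="item_id", timestamp_column="timestamp")

# The "3 lines": fit an ensemble of models, then forecast 48 steps ahead.
predictor = TimeSeriesPredictor(prediction_length=48, target="target")
predictor.fit(train_data)
forecasts = predictor.predict(train_data)  # point (mean) plus quantile forecasts
```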
Efficient Variational Inference for Large Skew-t Copulas with Application to Intraday Equity Returns
results: The skew-t copula captures substantial heterogeneity in asymmetric and extreme tail dependence in financial data, yielding more accurate intraday predictive densities and improved portfolio selection performance relative to the benchmark index.
Abstract
Large skew-t factor copula models are attractive for the modeling of financial data because they allow for asymmetric and extreme tail dependence. We show that the copula implicit in the skew-t distribution of Azzalini and Capitanio (2003) allows for a higher level of pairwise asymmetric dependence than two popular alternative skew-t copulas. Estimation of this copula in high dimensions is challenging, and we propose a fast and accurate Bayesian variational inference (VI) approach to do so. The method uses a conditionally Gaussian generative representation of the skew-t distribution to define an augmented posterior that can be approximated accurately. A fast stochastic gradient ascent algorithm is used to solve the variational optimization. The new methodology is used to estimate copula models for intraday returns from 2017 to 2021 on 93 U.S. equities. The copula captures substantial heterogeneity in asymmetric dependence over equity pairs, in addition to the variability in pairwise correlations. We show that intraday predictive densities from the skew-t copula are more accurate than from some other copula models, while portfolio selection strategies based on the estimated pairwise tail dependencies improve performance relative to the benchmark index.
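The conditionally Gaussian generative representation mentioned above is standard for the Azzalini and Capitanio skew-t; in our notation (not necessarily the paper's), with independent $Z_0, Z_1 \sim N(0,1)$ and $W \sim \mathrm{Gamma}(\nu/2, \nu/2)$:

```latex
Y \;=\; \mu + \sigma\, W^{-1/2}\!\left( \delta\,|Z_0| + \sqrt{1-\delta^{2}}\; Z_1 \right),
\qquad
Y \mid \big(W, |Z_0|\big) \;\sim\;
N\!\left( \mu + \sigma\,\delta\,|Z_0|\,W^{-1/2},\;\; \sigma^{2}\,(1-\delta^{2})\,W^{-1} \right).
```

Augmenting the posterior with the latent $(W, |Z_0|)$ makes the conditional distribution Gaussian, which is what allows the variational family to approximate the augmented posterior accurately.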
Critical Points ++: An Agile Point Cloud Importance Measure for Robust Classification, Adversarial Defense and Explainable AI
for: Improving the accuracy and speed with which models cope with out-of-distribution samples in real-world safety-demanding applications.
methods: Studies the interplay between critical points of 3D point clouds and OOD samples, and generalizes the notion of critical points into importance measures.
results: Training a classification network based only on less important points dramatically improves robustness, at a cost of minor performance loss on the clean set. Normalized entropy is found to be highly informative for corruption analysis, and an adaptive threshold based on it is suggested for selecting the set of uncritical points. The proposed importance measure is extremely fast to compute and can be used for a variety of applications, such as Explainable AI (XAI), outlier removal, uncertainty estimation, robust classification, and adversarial defense, reaching SOTA results on the last two tasks. Code is available at: https://github.com/yossilevii100/critical_points2
Abstract
The ability to cope accurately and fast with Out-Of-Distribution (OOD) samples is crucial in real-world safety demanding applications. In this work we first study the interplay between critical points of 3D point clouds and OOD samples. Our findings are that common corruptions and outliers are often interpreted as critical points. We generalize the notion of critical points into importance measures. We show that training a classification network based only on less important points dramatically improves robustness, at a cost of minor performance loss on the clean set. We observe that normalized entropy is highly informative for corruption analysis. An adaptive threshold based on normalized entropy is suggested for selecting the set of uncritical points. Our proposed importance measure is extremely fast to compute. We show it can be used for a variety of applications, such as Explainable AI (XAI), Outlier Removal, Uncertainty Estimation, Robust Classification and Adversarial Defense. We reach SOTA results on the two latter tasks. Code is available at: https://github.com/yossilevii100/critical_points2
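As a small illustration of the normalized-entropy idea (our own sketch; the paper's exact per-point distribution may differ): normalized entropy rescales Shannon entropy by its maximum, so values near 1 indicate a diffuse, uninformative importance pattern, and a threshold on it separates critical from uncritical points.

```python
import numpy as np

def normalized_entropy(p, eps=1e-12):
    """Shannon entropy of a distribution p, rescaled to [0, 1]."""
    p = np.asarray(p, dtype=float)
    p = p / (p.sum() + eps)
    return -np.sum(p * np.log(p + eps)) / np.log(len(p))

peaked  = np.array([0.88, 0.05, 0.04, 0.02, 0.01])  # hypothetical importance scores
diffuse = np.array([0.22, 0.21, 0.20, 0.19, 0.18])
print(normalized_entropy(peaked))   # low: concentrated, informative
print(normalized_entropy(diffuse))  # near 1: spread out, uninformative
```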
Models Matter: The Impact of Single-Step Retrosynthesis on Synthesis Planning
paper_authors: Paula Torren-Peraire, Alan Kai Hassen, Samuel Genheden, Jonas Verhoeven, Djork-Arne Clevert, Mike Preuss, Igor Tetko
for: This study aims at reliable chemical synthesis route planning, so that the best synthesis routes can be found in real-world applications.
methods: Multiple single-step retrosynthesis models are applied within multi-step synthesis planning, and their impact is analyzed using public and proprietary reaction data.
results: High single-step retrosynthesis performance does not necessarily translate into route-finding success; using different single-step models can raise the overall success rate of synthesis planning, and each single-step model finds unique synthesis routes.
Abstract
Retrosynthesis consists of breaking down a chemical compound recursively step-by-step into molecular precursors until a set of commercially available molecules is found with the goal to provide a synthesis route. Its two primary research directions, single-step retrosynthesis prediction, which models the chemical reaction logic, and multi-step synthesis planning, which tries to find the correct sequence of reactions, are inherently intertwined. Still, this connection is not reflected in contemporary research. In this work, we combine these two major research directions by applying multiple single-step retrosynthesis models within multi-step synthesis planning and analyzing their impact using public and proprietary reaction data. We find a disconnection between high single-step performance and potential route-finding success, suggesting that single-step models must be evaluated within synthesis planning in the future. Furthermore, we show that the commonly used single-step retrosynthesis benchmark dataset USPTO-50k is insufficient as this evaluation task does not represent model performance and scalability on larger and more diverse datasets. For multi-step synthesis planning, we show that the choice of the single-step model can improve the overall success rate of synthesis planning by up to +28% compared to the commonly used baseline model. Finally, we show that each single-step model finds unique synthesis routes, and differs in aspects such as route-finding success, the number of found synthesis routes, and chemical validity, making the combination of single-step retrosynthesis prediction and multi-step synthesis planning a crucial aspect when developing future methods.
On the Optimal Expressive Power of ReLU DNNs and Its Application in Approximation with Kolmogorov Superposition Theorem
for: studying the optimal expressive power of ReLU deep neural networks (DNNs) and its application in approximation.
methods: constructively prove that any continuous piecewise linear function on $[0,1]$, comprising $O(N^2L)$ segments, can be represented by ReLU DNNs with $L$ hidden layers and $N$ neurons per layer.
results: achieve an enhanced approximation rate for ReLU DNNs of arbitrary width and depth when dealing with continuous functions in high-dimensional spaces, using the Kolmogorov Superposition Theorem.
Abstract
This paper is devoted to studying the optimal expressive power of ReLU deep neural networks (DNNs) and its application in approximation via the Kolmogorov Superposition Theorem. We first constructively prove that any continuous piecewise linear functions on $[0,1]$, comprising $O(N^2L)$ segments, can be represented by ReLU DNNs with $L$ hidden layers and $N$ neurons per layer. Subsequently, we demonstrate that this construction is optimal regarding the parameter count of the DNNs, achieved through investigating the shattering capacity of ReLU DNNs. Moreover, by invoking the Kolmogorov Superposition Theorem, we achieve an enhanced approximation rate for ReLU DNNs of arbitrary width and depth when dealing with continuous functions in high-dimensional spaces.
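Two standard facts behind such segment counting, stated in our notation for orientation: a single hidden layer of width $N$ on the line produces a continuous piecewise linear function with at most $N+1$ pieces, and composition multiplies piece counts.

```latex
f(x) \;=\; c + \sum_{i=1}^{N} a_i\, \mathrm{ReLU}(x - b_i)
\quad\text{has breakpoints only at } b_1, \dots, b_N,
\qquad
\#\mathrm{pieces}(g \circ f) \;\le\; \#\mathrm{pieces}(g)\cdot \#\mathrm{pieces}(f).
```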
Quality Diversity under Sparse Reward and Sparse Interaction: Application to Grasping in Robotics
results: MAP-Elites variants that select successful solutions in priority outperform all compared methods on the studied metrics by a large margin. The experiments also provide evidence that sparse interaction can lead to deceptive novelty. The ability to efficiently produce examples of grasping trajectories demonstrated in this work has no precedent in the literature.
Abstract
Quality-Diversity (QD) methods are algorithms that aim to generate a set of diverse and high-performing solutions to a given problem. Originally developed for evolutionary robotics, most QD studies are conducted on a limited set of domains - mainly applied to locomotion, where the fitness and the behavior signal are dense. Grasping is a crucial task for manipulation in robotics. Despite the efforts of many research communities, this task is yet to be solved. Grasping cumulates unprecedented challenges in QD literature: it suffers from reward sparsity, behavioral sparsity, and behavior space misalignment. The present work studies how QD can address grasping. Experiments have been conducted on 15 different methods on 10 grasping domains, corresponding to 2 different robot-gripper setups and 5 standard objects. An evaluation framework that distinguishes the evaluation of an algorithm from its internal components has also been proposed for a fair comparison. The obtained results show that MAP-Elites variants that select successful solutions in priority outperform all the compared methods on the studied metrics by a large margin. We also found experimental evidence that sparse interaction can lead to deceptive novelty. To our knowledge, the ability to efficiently produce examples of grasping trajectories demonstrated in this work has no precedent in the literature.
paper_authors: Xuanhe Zhou, Guoliang Li, Zhiyuan Liu
for: This paper aims to provide a revolutionary LLM-centric framework for database maintenance, helping database administrators (DBAs) manage and optimize database systems more efficiently and effectively.
methods: The proposed framework uses large language models (LLMs) to acquire database maintenance experience from textual sources and provide reasonable, well-founded, in-time diagnosis and optimization advice for target databases. It includes three main components: (i) database maintenance knowledge detection from documents and tools, (ii) tree of thought reasoning for root cause analysis, and (iii) collaborative diagnosis among multiple LLMs.
results: Preliminary experimental results show that D-Bot, the proposed LLM-based database administrator, can efficiently and effectively diagnose the root causes of database issues and provide accurate optimization advice. The code is available at github.com/TsinghuaDatabaseGroup/DB-GPT.
Abstract
Database administrators (DBAs) play a crucial role in managing, maintaining and optimizing a database system to ensure data availability, performance, and reliability. However, it is hard and tedious for DBAs to manage a large number of database instances (e.g., millions of instances on the cloud databases). Recently large language models (LLMs) have shown great potential to understand valuable documents and accordingly generate reasonable answers. Thus, we propose D-Bot, a LLM-based database administrator that can continuously acquire database maintenance experience from textual sources, and provide reasonable, well-founded, in-time diagnosis and optimization advice for target databases. This paper presents a revolutionary LLM-centric framework for database maintenance, including (i) database maintenance knowledge detection from documents and tools, (ii) tree of thought reasoning for root cause analysis, and (iii) collaborative diagnosis among multiple LLMs. Our preliminary experimental results that D-Bot can efficiently and effectively diagnose the root causes and our code is available at github.com/TsinghuaDatabaseGroup/DB-GPT.
Exploring Machine Learning and Transformer-based Approaches for Deceptive Text Classification: A Comparative Analysis
results: Through extensive experimentation, the study compares the performance of the different approaches on accuracy, precision, recall, and F1 score, helping researchers and practitioners make informed decisions when dealing with deceptive content.
Abstract
Deceptive text classification is a critical task in natural language processing that aims to identify deceptive or fraudulent content. This study presents a comparative analysis of machine learning and transformer-based approaches for deceptive text classification. We investigate the effectiveness of traditional machine learning algorithms and state-of-the-art transformer models, such as BERT, XLNET, DistilBERT, and RoBERTa, in detecting deceptive text. A labeled dataset consisting of deceptive and non-deceptive texts is used for training and evaluation purposes. Through extensive experimentation, we compare the performance metrics, including accuracy, precision, recall, and F1 score, of the different approaches. The results of this study shed light on the strengths and limitations of machine learning and transformer-based methods for deceptive text classification, enabling researchers and practitioners to make informed decisions when dealing with deceptive content.
Comprehensive Analysis of Network Robustness Evaluation Based on Convolutional Neural Networks with Spatial Pyramid Pooling
results: The proposed CNN model addresses the high computational cost of robustness evaluation across various network types, component types, and attack scenarios. When the predicted network type differs from the trained one, the model still performs well under random node failure, showing scalability and transferability, but performance falls short in other removal scenarios; this scenario-sensitivity was overlooked in previous studies.
Abstract
Connectivity robustness, a crucial aspect for understanding, optimizing, and repairing complex networks, has traditionally been evaluated through time-consuming and often impractical simulations. Fortunately, machine learning provides a new avenue for addressing this challenge. However, several key issues remain unresolved, including the performance in more general edge removal scenarios, capturing robustness through attack curves instead of directly training for robustness, scalability of predictive tasks, and transferability of predictive capabilities. In this paper, we address these challenges by designing a convolutional neural networks (CNN) model with spatial pyramid pooling networks (SPP-net), adapting existing evaluation metrics, redesigning the attack modes, introducing appropriate filtering rules, and incorporating the value of robustness as training data. The results demonstrate the thoroughness of the proposed CNN framework in addressing the challenges of high computational time across various network types, failure component types and failure scenarios. However, the performance of the proposed CNN model varies: for evaluation tasks that are consistent with the trained network type, the proposed CNN model consistently achieves accurate evaluations of both attack curves and robustness values across all removal scenarios. When the predicted network type differs from the trained network, the CNN model still demonstrates favorable performance in the scenario of random node failure, showcasing its scalability and performance transferability. Nevertheless, the performance falls short of expectations in other removal scenarios. This observed scenario-sensitivity in the evaluation of network features has been overlooked in previous studies and necessitates further attention and optimization. Lastly, we discuss important unresolved questions and further investigation.
Provably Efficient Algorithm for Nonstationary Low-Rank MDPs
results: Provides upper bounds on the average dynamic suboptimality gap, showing that as long as the nonstationarity is not significantly large, PORTAL and Ada-PORTAL are sample-efficient and can achieve an arbitrarily small average dynamic suboptimality gap with polynomial sample complexity.
Abstract
Reinforcement learning (RL) under changing environment models many real-world applications via nonstationary Markov Decision Processes (MDPs), and hence gains considerable interest. However, theoretical studies on nonstationary MDPs in the literature have mainly focused on tabular and linear (mixture) MDPs, which do not capture the nature of unknown representation in deep RL. In this paper, we make the first effort to investigate nonstationary RL under episodic low-rank MDPs, where both transition kernels and rewards may vary over time, and the low-rank model contains unknown representation in addition to the linear state embedding function. We first propose a parameter-dependent policy optimization algorithm called PORTAL, and further improve PORTAL to its parameter-free version of Ada-PORTAL, which is able to tune its hyper-parameters adaptively without any prior knowledge of nonstationarity. For both algorithms, we provide upper bounds on the average dynamic suboptimality gap, which show that as long as the nonstationarity is not significantly large, PORTAL and Ada-PORTAL are sample-efficient and can achieve arbitrarily small average dynamic suboptimality gap with polynomial sample complexity.
$\mathcal{G}^2Pxy$: Generative Open-Set Node Classification on Graphs with Proxy Unknowns
paper_authors: Qin Zhang, Zelin Shi, Xiaolin Zhang, Xiaojun Chen, Philippe Fournier-Viger, Shirui Pan
for: This paper focuses on open-set node classification, which is the task of predicting the labels of unlabeled nodes in a graph when some classes are unknown during training.
methods: The proposed method, $\mathcal{G}^2Pxy$, uses a generative approach with proxy unknown nodes generated via mixup to anticipate the distribution of novel classes. It transforms a closed-set classifier into an open-set one by adding an extra proxy classifier, and uses a combination of cross entropy loss and complement entropy loss to optimize the performance.
results: The proposed method achieves superior effectiveness for unknown class detection and known class classification on benchmark graph datasets, and places no specific requirements on the GNN architecture.
Abstract
Node classification is the task of predicting the labels of unlabeled nodes in a graph. State-of-the-art methods based on graph neural networks achieve excellent performance when all labels are available during training. But in real-life, models are often applied on data with new classes, which can lead to massive misclassification and thus significantly degrade performance. Hence, developing open-set classification methods is crucial to determine if a given sample belongs to a known class. Existing methods for open-set node classification generally use transductive learning with part or all of the features of real unseen class nodes to help with open-set classification. In this paper, we propose a novel generative open-set node classification method, i.e. $\mathcal{G}^2Pxy$, which follows a stricter inductive learning setting where no information about unknown classes is available during training and validation. Two kinds of proxy unknown nodes, inter-class unknown proxies and external unknown proxies are generated via mixup to efficiently anticipate the distribution of novel classes. Using the generated proxies, a closed-set classifier can be transformed into an open-set one, by augmenting it with an extra proxy classifier. Under the constraints of both cross entropy loss and complement entropy loss, $\mathcal{G}^2Pxy$ achieves superior effectiveness for unknown class detection and known class classification, which is validated by experiments on benchmark graph datasets. Moreover, $\mathcal{G}^2Pxy$ does not have specific requirement on the GNN architecture and shows good generalizations.
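A minimal sketch of the mixup step for generating proxy unknown nodes (our illustration; the paper's construction of inter-class and external proxies, and the proxy classifier training, are more involved):

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup_proxies(z, labels, n_proxies=64, alpha=1.0):
    """Interpolate embeddings of nodes from *different* known classes to
    anticipate where novel-class nodes may fall in embedding space."""
    proxies = []
    while len(proxies) < n_proxies:
        i, j = rng.integers(0, len(z), size=2)
        if labels[i] != labels[j]:               # inter-class pair
            lam = rng.beta(alpha, alpha)
            proxies.append(lam * z[i] + (1 - lam) * z[j])
    return np.stack(proxies)                     # fed to an extra proxy classifier

z = rng.normal(size=(200, 16))                   # hypothetical GNN node embeddings
labels = rng.integers(0, 5, size=200)
print(mixup_proxies(z, labels).shape)            # (64, 16)
```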
A Forecaster’s Review of Judea Pearl’s Causality: Models, Reasoning and Inference, Second Edition, 2009
results: Discusses potential benefits and challenges of causal inference in time series forecasting, including how to estimate causal effects in different forecasting scenarios.
Abstract
With the big popularity and success of Judea Pearl's original causality book, this review covers the main topics updated in the second edition in 2009 and illustrates an easy-to-follow causal inference strategy in a forecast scenario. It further discusses some potential benefits and challenges for causal inference with time series forecasting when modeling the counterfactuals, estimating the uncertainty and incorporating prior knowledge to estimate causal effects in different forecasting scenarios.
Explainable AI applications in the Medical Domain: a systematic review
results: In medical AI, XAI techniques can improve human understanding of and trust in AI, but most studies realize this only in principle and few report physician participation in the loop. Deep learning models are utilized more than other types of machine learning models, and visual and interactive user interfaces are more useful for understanding explanations and recommendations.
Abstract
Artificial Intelligence in Medicine has made significant progress with emerging applications in medical imaging, patient care, and other areas. While these applications have proven successful in retrospective studies, very few of them were applied in practice. The field of Medical AI faces various challenges in terms of building user trust, complying with regulations, and using data ethically. Explainable AI (XAI) aims to enable humans to understand AI and trust its results. This paper presents a literature review on the recent developments of XAI solutions for medical decision support, based on a representative sample of 198 articles published in recent years. The systematic synthesis of the relevant articles resulted in several findings. (1) model-agnostic XAI techniques were mostly employed in these solutions, (2) deep learning models are utilized more than other types of machine learning models, (3) explainability was applied to promote trust, but very few works reported the physicians' participation in the loop, (4) visual and interactive user interfaces are more useful in understanding the explanation and the recommendation of the system. More research is needed in collaboration between medical and AI experts, which could guide the development of suitable frameworks for the design, implementation, and evaluation of XAI solutions in medicine.
A Comparative Assessment of Multi-view fusion learning for Crop Classification
results: Depending on the test region, different fusion strategies obtain the best performance across the datasets studied; a preliminary criterion for selecting fusion methods is nevertheless proposed.
Abstract
With a rapidly increasing amount and diversity of remote sensing (RS) data sources, there is a strong need for multi-view learning modeling. This is a complex task when considering the differences in resolution, magnitude, and noise of RS data. The typical approach for merging multiple RS sources has been input-level fusion, but other - more advanced - fusion strategies may outperform this traditional approach. This work assesses different fusion strategies for crop classification in the CropHarvest dataset. The fusion methods proposed in this work outperform models based on individual views and previous fusion methods. We do not find one single fusion method that consistently outperforms all other approaches. Instead, we present a comparison of multi-view fusion methods for three different datasets and show that, depending on the test region, different methods obtain the best performance. Despite this, we suggest a preliminary criterion for the selection of fusion methods.
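To fix terminology, here is a compact sketch contrasting input-level fusion with a feature-level alternative for two remote sensing views; the architectures and dimensions are placeholders, not the models evaluated in the paper.

```python
import torch
import torch.nn as nn

class InputFusion(nn.Module):
    """Concatenate the raw views, then encode once (the traditional baseline)."""
    def __init__(self, d1, d2, n_classes):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d1 + d2, 64), nn.ReLU(),
                                 nn.Linear(64, n_classes))

    def forward(self, v1, v2):
        return self.net(torch.cat([v1, v2], dim=-1))

class FeatureFusion(nn.Module):
    """Encode each view separately, then fuse the learned features."""
    def __init__(self, d1, d2, n_classes):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Linear(d1, 32), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Linear(d2, 32), nn.ReLU())
        self.head = nn.Linear(64, n_classes)

    def forward(self, v1, v2):
        return self.head(torch.cat([self.enc1(v1), self.enc2(v2)], dim=-1))

v1, v2 = torch.randn(8, 12), torch.randn(8, 10)   # e.g., optical vs. radar features
print(InputFusion(12, 10, 4)(v1, v2).shape)       # torch.Size([8, 4])
print(FeatureFusion(12, 10, 4)(v1, v2).shape)     # torch.Size([8, 4])
```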
Addressing Data Scarcity in Optical Matrix Multiplier Modeling Using Transfer Learning
paper_authors: Ali Cem, Ognjen Jovanovic, Siqi Yan, Yunhong Ding, Darko Zibar, Francesco Da Ros
for: Using transfer learning to address experimental data scarcity when training neural network (NN) models for Mach-Zehnder interferometer mesh-based optical matrix multipliers.
methods: Pre-training on synthetic data generated from a less accurate analytical model, followed by fine-tuning with experimental data.
results: Significant reductions in modeling errors compared to using an analytical model or a standalone NN model when training data is limited. With regularization techniques and ensemble averaging, the approach achieves < 1 dB root-mean-square error on the matrix weights implemented by a photonic chip while using only 25% of the available data.
Abstract
We present and experimentally evaluate using transfer learning to address experimental data scarcity when training neural network (NN) models for Mach-Zehnder interferometer mesh-based optical matrix multipliers. Our approach involves pre-training the model using synthetic data generated from a less accurate analytical model and fine-tuning with experimental data. Our investigation demonstrates that this method yields significant reductions in modeling errors compared to using an analytical model, or a standalone NN model when training data is limited. Utilizing regularization techniques and ensemble averaging, we achieve < 1 dB root-mean-square error on the matrix weights implemented by a photonic chip while using only 25% of the available data.
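A schematic of the pre-train/fine-tune recipe, with the architecture, data shapes, and hyperparameters all being our placeholder choices:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 10))
loss_fn = nn.MSELoss()

def fit(model, X, Y, lr, epochs):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X), Y).backward()
        opt.step()

# Stage 1: pre-train on abundant synthetic data from the analytical model.
X_syn, Y_syn = torch.randn(5000, 10), torch.randn(5000, 10)  # placeholder data
fit(model, X_syn, Y_syn, lr=1e-3, epochs=200)

# Stage 2: fine-tune on the scarce experimental measurements, with a lower
# learning rate so the pre-trained solution is only gently corrected.
X_exp, Y_exp = torch.randn(100, 10), torch.randn(100, 10)    # placeholder data
fit(model, X_exp, Y_exp, lr=1e-4, epochs=100)
```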
Product Review Image Ranking for Fashion E-commerce
methods: A simple yet effective training procedure for ranking user images. Myntra studio posts and highly engaged (upvotes/downvotes) UGC images serve as the starting point, and selected distortion techniques bring their quality to par with that of bad UGC images; the network is then trained to rank bad-quality images lower than high-quality ones.
results: The proposed method outperforms baseline models by substantial margins on two metrics, namely correlation coefficient and accuracy.
Abstract
In a fashion e-commerce platform where customers can't physically examine the products on their own, being able to see other customers' text and image reviews of the product is critical while making purchase decisions. Given the high reliance on these reviews, over the years we have observed customers proactively sharing their reviews. With an increase in the coverage of User Generated Content (UGC), there has been a corresponding increase in the number of customer images. It is thus imperative to display the most relevant images on top as it may influence users' online shopping choices and behavior. In this paper, we propose a simple yet effective training procedure for ranking customer images. We created a dataset consisting of Myntra (A Major Indian Fashion e-commerce company) studio posts and highly engaged (upvotes/downvotes) UGC images as our starting point and used selected distortion techniques on the images of the above dataset to bring their quality at par with those of bad UGC images. We train our network to rank bad-quality images lower than high-quality ones. Our proposed method outperforms the baseline models on two metrics, namely correlation coefficient, and accuracy, by substantial margins.
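One standard way to realize "rank bad-quality images lower" is a pairwise margin ranking objective over (high-quality, distorted) pairs; the sketch below is our assumption of such a setup, since the abstract does not spell out the loss.

```python
import torch
import torch.nn as nn

scorer = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 1))
rank_loss = nn.MarginRankingLoss(margin=1.0)
opt = torch.optim.Adam(scorer.parameters(), lr=1e-4)

# feats_hi: embeddings of studio / high-engagement images;
# feats_lo: the same images after quality-degrading distortions.
feats_hi, feats_lo = torch.randn(32, 512), torch.randn(32, 512)

s_hi = scorer(feats_hi).squeeze(-1)
s_lo = scorer(feats_lo).squeeze(-1)
target = torch.ones_like(s_hi)            # +1: first argument should rank higher
loss = rank_loss(s_hi, s_lo, target)      # hinge penalty when s_hi - s_lo < margin
loss.backward()
opt.step()
```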
Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models’ Alignment
results: Measurement results indicate that, in general, more aligned models tend to perform better in overall trustworthiness, though the effectiveness of alignment varies across trustworthiness categories, highlighting the need for fine-grained analysis, testing, and continuous improvement. The findings offer valuable insight and guidance for reliable and ethical deployment of LLMs across applications.
Abstract
Ensuring alignment, which refers to making models behave in accordance with human intentions [1,2], has become a critical task before deploying large language models (LLMs) in real-world applications. For instance, OpenAI devoted six months to iteratively aligning GPT-4 before its release [3]. However, a major challenge faced by practitioners is the lack of clear guidance on evaluating whether LLM outputs align with social norms, values, and regulations. This obstacle hinders systematic iteration and deployment of LLMs. To address this issue, this paper presents a comprehensive survey of key dimensions that are crucial to consider when assessing LLM trustworthiness. The survey covers seven major categories of LLM trustworthiness: reliability, safety, fairness, resistance to misuse, explainability and reasoning, adherence to social norms, and robustness. Each major category is further divided into several sub-categories, resulting in a total of 29 sub-categories. Additionally, a subset of 8 sub-categories is selected for further investigation, where corresponding measurement studies are designed and conducted on several widely-used LLMs. The measurement results indicate that, in general, more aligned models tend to perform better in terms of overall trustworthiness. However, the effectiveness of alignment varies across the different trustworthiness categories considered. This highlights the importance of conducting more fine-grained analyses, testing, and making continuous improvements on LLM alignment. By shedding light on these key dimensions of LLM trustworthiness, this paper aims to provide valuable insights and guidance to practitioners in the field. Understanding and addressing these concerns will be crucial in achieving reliable and ethically sound deployment of LLMs in various applications.
Flexible Isosurface Extraction for Gradient-Based Mesh Optimization
results: Experiments show that FlexiCubes delivers significant improvements in mesh quality and geometric fidelity on both synthetic benchmarks and real-world applications.
Abstract
This work considers gradient-based mesh optimization, where we iteratively optimize for a 3D surface mesh by representing it as the isosurface of a scalar field, an increasingly common paradigm in applications including photogrammetry, generative modeling, and inverse physics. Existing implementations adapt classic isosurface extraction algorithms like Marching Cubes or Dual Contouring; these techniques were designed to extract meshes from fixed, known fields, and in the optimization setting they lack the degrees of freedom to represent high-quality feature-preserving meshes, or suffer from numerical instabilities. We introduce FlexiCubes, an isosurface representation specifically designed for optimizing an unknown mesh with respect to geometric, visual, or even physical objectives. Our main insight is to introduce additional carefully-chosen parameters into the representation, which allow local flexible adjustments to the extracted mesh geometry and connectivity. These parameters are updated along with the underlying scalar field via automatic differentiation when optimizing for a downstream task. We base our extraction scheme on Dual Marching Cubes for improved topological properties, and present extensions to optionally generate tetrahedral and hierarchically-adaptive meshes. Extensive experiments validate FlexiCubes on both synthetic benchmarks and real-world applications, showing that it offers significant improvements in mesh quality and geometric fidelity.
Machine Learning aided Computer Architecture Design for CNN Inferencing Systems
for: This paper aims to expedite the Design Space Exploration (DSE) process for selecting the most suitable Graphics Processing Unit (GPU) for Convolutional Neural Network (CNN) inferencing systems.
methods: The authors propose a quick and precise technique for forecasting the power and performance of CNNs during inference, using a Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm and a novel power model.
results: The proposed approach achieves a Mean Absolute Percentage Error (MAPE) of 5.03% and 5.94% for power and performance, respectively, allowing computer architects to estimate power and performance in the early stages of development and reducing the need for numerous prototypes.
Abstract
Efficient and timely calculations of Machine Learning (ML) algorithms are essential for emerging technologies like autonomous driving, the Internet of Things (IoT), and edge computing. One of the primary ML algorithms used in such systems is Convolutional Neural Networks (CNNs), which demand high computational resources. This requirement has led to the use of ML accelerators like GPGPUs to meet design constraints. However, selecting the most suitable accelerator involves Design Space Exploration (DSE), a process that is usually time-consuming and requires significant manual effort. Our work presents approaches to expedite the DSE process by identifying the most appropriate GPGPU for CNN inferencing systems. We have developed a quick and precise technique for forecasting the power and performance of CNNs during inference, with a MAPE of 5.03% and 5.94%, respectively. Our approach empowers computer architects to estimate power and performance in the early stages of development, reducing the necessity for numerous prototypes. This saves time and money while also improving the time-to-market period.
Summary
Efficient and timely computation of machine learning algorithms is essential for technologies such as autonomous driving, the IoT, and edge computing, and the CNNs these systems rely on demand accelerators like GPGPUs. Because selecting the most suitable accelerator through Design Space Exploration (DSE) is time-consuming and requires significant manual effort, this work develops a quick and precise technique for forecasting the power and performance of CNNs during inference, with MAPEs of 5.03% and 5.94%, respectively. This lets computer architects estimate power and performance early in development, reducing the need for numerous prototypes, saving time and money, and improving time-to-market.
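To make the MAPE figure reported above concrete, here is a minimal sketch of the metric; the measurements below are made-up illustrative values, not the paper's data:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# Hypothetical power measurements (watts) vs. model forecasts.
measured = [42.0, 55.5, 61.2, 48.9]
forecast = [40.1, 57.0, 63.5, 47.2]
print(f"MAPE: {mape(measured, forecast):.2f}%")
```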
FINER: Enhancing State-of-the-art Classifiers with Feature Attribution to Facilitate Security Analysis
results: The explanation quality of risk detectors is evaluated extensively, demonstrating that FINER outperforms current leading tools for risk detection.Abstract
Deep learning classifiers achieve state-of-the-art performance in various risk detection applications. They explore rich semantic representations and are supposed to automatically discover risk behaviors. However, due to the lack of transparency, the behavioral semantics cannot be conveyed to downstream security experts to reduce their heavy workload in security analysis. Although feature attribution (FA) methods can be used to explain deep learning, the underlying classifier is still blind to what behavior is suspicious, and the generated explanation cannot adapt to downstream tasks, incurring poor explanation fidelity and intelligibility. In this paper, we propose FINER, the first framework for risk detection classifiers to generate high-fidelity and high-intelligibility explanations. The high-level idea is to gather explanation efforts from model developer, FA designer, and security experts. To improve fidelity, we fine-tune the classifier with an explanation-guided multi-task learning strategy. To improve intelligibility, we engage task knowledge to adjust and ensemble FA methods. Extensive evaluations show that FINER improves explanation quality for risk detection. Moreover, we demonstrate that FINER outperforms a state-of-the-art tool in facilitating malware analysis.
Summary
Deep learning classifiers reach state-of-the-art performance in risk detection but, lacking transparency, cannot convey behavioral semantics to downstream security experts, and standard feature attribution (FA) yields explanations with poor fidelity and intelligibility. FINER gathers explanation effort from model developers, FA designers, and security experts: an explanation-guided multi-task learning strategy fine-tunes the classifier to improve fidelity, while task knowledge is used to adjust and ensemble FA methods to improve intelligibility. Experiments show that FINER improves explanation quality for risk detection and outperforms a state-of-the-art tool in facilitating malware analysis.
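The abstract does not spell out the explanation-guided multi-task objective, but one way to picture the idea is a joint loss combining classification with a term that rewards agreement between the model's attributions and analyst-provided relevance. The sketch below is our illustration under that assumption; the `expert_mask` supervision and the input-gradient attribution are stand-ins, not the authors' exact design:

```python
import torch
import torch.nn.functional as F

def explanation_guided_loss(model, x, y, expert_mask, lam=0.1):
    """Sketch: classification loss plus a term aligning input-gradient
    attributions with a hypothetical expert relevance mask."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    cls_loss = F.cross_entropy(logits, y)
    # Input-gradient attribution for the true-class scores.
    score = logits.gather(1, y.unsqueeze(1)).sum()
    attr = torch.autograd.grad(score, x, create_graph=True)[0].abs()
    attr = attr / (attr.sum(dim=1, keepdim=True) + 1e-8)
    return cls_loss + lam * F.mse_loss(attr, expert_mask)

# Toy usage: linear classifier over 16-dimensional behavior features.
model = torch.nn.Linear(16, 2)
x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
mask = torch.softmax(torch.randn(8, 16), dim=1)  # stand-in expert relevance
explanation_guided_loss(model, x, y, mask).backward()
```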
Preemptive Detection of Fake Accounts on Social Networks via Multi-Class Preferential Attachment Classifiers
results: Our experiments show that PreAttacK can accurately determine that a new account is fake after the new user's first 20 not-yet-answered friend requests, outperforming mainstream algorithms, which need to observe fake accounts' friendships or shared content. Moreover, PreAttacK achieves state-of-the-art performance on the global Facebook network, converging to AUC=0.9 after new users send and receive 20 not-yet-answered friend requests.Abstract
In this paper, we describe a new algorithm called Preferential Attachment k-class Classifier (PreAttacK) for detecting fake accounts in a social network. Recently, several algorithms have obtained high accuracy on this problem. However, they have done so by relying on information about fake accounts' friendships or the content they share with others--the very things we seek to prevent. PreAttacK represents a significant departure from these approaches. We provide some of the first detailed distributional analyses of how new fake (and real) accounts first attempt to request friends after joining a major network (Facebook). We show that even before a new account has made friends or shared content, these initial friend request behaviors evoke a natural multi-class extension of the canonical Preferential Attachment model of social network growth. We use this model to derive a new algorithm, PreAttacK. We prove that in relevant problem instances, PreAttacK near-optimally approximates the posterior probability that a new account is fake under this multi-class Preferential Attachment model of new accounts' (not-yet-answered) friend requests. These are the first provable guarantees for fake account detection that apply to new users, and that do not require strong homophily assumptions. This principled approach also makes PreAttacK the only algorithm with provable guarantees that obtains state-of-the-art performance on new users on the global Facebook network, where it converges to AUC=0.9 after new users send + receive a total of just 20 not-yet-answered friend requests. For comparison, state-of-the-art benchmarks do not obtain this AUC even after observing additional data on new users' first 100 friend requests. Thus, unlike mainstream algorithms, PreAttacK converges before the median new fake account has made a single friendship (accepted friend request) with a human.
Summary
This paper describes PreAttacK, a Preferential Attachment k-class Classifier for detecting fake accounts in a social network without relying on the friendships fake accounts form or the content they share, which is precisely what detection aims to prevent. First detailed distributional analyses of how new fake (and real) accounts issue their initial friend requests on Facebook motivate a natural multi-class extension of the canonical Preferential Attachment model, under which PreAttacK provably near-optimally approximates the posterior probability that a new account is fake, without strong homophily assumptions. On the global Facebook network it converges to AUC=0.9 after a new user sends and receives just 20 not-yet-answered friend requests, before the median new fake account has made a single friendship with a human.
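As a toy illustration of the kind of posterior computation involved, the sketch below scores an account with Bayes' rule under class-conditional multinomial models of friend-request targets. This is a simplified stand-in for the paper's multi-class Preferential Attachment model, and all probabilities are invented:

```python
import numpy as np

def posterior_fake(request_targets, p_fake, p_real, prior_fake=0.05):
    """Sketch: posterior that an account is fake given the classes of its
    first (not-yet-answered) friend-request targets, under per-class
    multinomial target models (a stand-in for the paper's PA model)."""
    counts = np.bincount(request_targets, minlength=len(p_fake))
    ll_fake = np.sum(counts * np.log(p_fake))
    ll_real = np.sum(counts * np.log(p_real))
    log_odds = np.log(prior_fake / (1 - prior_fake)) + ll_fake - ll_real
    return 1.0 / (1.0 + np.exp(-log_odds))

# Hypothetical: fake accounts tend to target class 2 (e.g., high-degree nodes).
p_fake = np.array([0.2, 0.2, 0.6])
p_real = np.array([0.5, 0.3, 0.2])
print(posterior_fake([2, 2, 1, 2, 2], p_fake, p_real))
```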
RTLLM: An Open-Source Benchmark for Design RTL Generation with Large Language Model
results: The results show that the proposed self-planning prompt engineering technique significantly boosts GPT-3.5's performance on RTLLM. The paper also proposes three progressive goals (syntax, functionality, and design quality) to systematically evaluate the quality of generated design RTL.Abstract
Inspired by the recent success of large language models (LLMs) like ChatGPT, researchers have started to explore the adoption of LLMs for agile hardware design, such as generating design RTL based on natural-language instructions. However, in existing works, the target designs are all relatively simple, small in scale, and proposed by the authors themselves, making a fair comparison among different LLM solutions challenging. In addition, many prior works only focus on design correctness, without evaluating the design qualities of the generated design RTL. In this work, we propose an open-source benchmark named RTLLM, for generating design RTL with natural language instructions. To systematically evaluate the auto-generated design RTL, we summarized three progressive goals, named syntax goal, functionality goal, and design quality goal. This benchmark can automatically provide a quantitative evaluation of any given LLM-based solution. Furthermore, we propose an easy-to-use yet surprisingly effective prompt engineering technique named self-planning, which proves to significantly boost the performance of GPT-3.5 in our proposed benchmark.
Summary
RTLLM is an open-source benchmark for generating design RTL from natural-language instructions, addressing prior work's small, author-chosen target designs and its focus on correctness alone. It defines three progressive goals (syntax, functionality, and design quality) and automatically provides a quantitative evaluation of any given LLM-based solution. An easy-to-use yet surprisingly effective prompt engineering technique named self-planning significantly boosts GPT-3.5's performance on the benchmark.
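Self-planning, as described, asks the model to plan before generating. The two-stage prompt below is a hypothetical illustration of that pattern; the wording, the design spec, and the llm() call are all assumptions, not RTLLM's actual prompts:

```python
# A hypothetical two-stage "self-planning" interaction; DESIGN_SPEC, the
# prompt wording, and llm() are illustrative, not RTLLM's actual prompts.
DESIGN_SPEC = "a 4-bit synchronous up-counter with active-high reset"

plan_prompt = (
    "You are a hardware engineer. Before writing any Verilog, produce a "
    f"step-by-step implementation plan for {DESIGN_SPEC}: list the module "
    "ports, internal registers, and the behavior on each clock edge."
)

def generation_prompt(plan: str) -> str:
    """Second stage: condition RTL generation on the model's own plan."""
    return (
        f"Following this plan:\n{plan}\n\n"
        f"Now write complete, synthesizable Verilog for {DESIGN_SPEC}. "
        "Output only the module code."
    )

# plan = llm(plan_prompt)               # llm() is a placeholder chat call
# rtl = llm(generation_prompt(plan))
```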
OpenProteinSet: Training data for structural biology at scale
paper_authors: Gustaf Ahdritz, Nazim Bouatta, Sachin Kadyan, Lukas Jarosch, Daniel Berenberg, Ian Fisk, Andrew M. Watkins, Stephen Ra, Richard Bonneau, Mohammed AlQuraishi
methods: The study uses transformers that attend directly over large quantities of raw MSAs, together with structural homologs from the Protein Data Bank, and uses AlphaFold2 for protein structure prediction.
results: The study introduces OpenProteinSet, an open-source dataset containing more than 16 million MSAs, structural homologs, and AlphaFold2 protein structure predictions, useful for diverse tasks in protein structure, function, and design, and for large-scale multimodal machine learning research.Abstract
Multiple sequence alignments (MSAs) of proteins encode rich biological information and have been workhorses in bioinformatic methods for tasks like protein design and protein structure prediction for decades. Recent breakthroughs like AlphaFold2 that use transformers to attend directly over large quantities of raw MSAs have reaffirmed their importance. Generation of MSAs is highly computationally intensive, however, and no datasets comparable to those used to train AlphaFold2 have been made available to the research community, hindering progress in machine learning for proteins. To remedy this problem, we introduce OpenProteinSet, an open-source corpus of more than 16 million MSAs, associated structural homologs from the Protein Data Bank, and AlphaFold2 protein structure predictions. We have previously demonstrated the utility of OpenProteinSet by successfully retraining AlphaFold2 on it. We expect OpenProteinSet to be broadly useful as training and validation data for 1) diverse tasks focused on protein structure, function, and design and 2) large-scale multimodal machine learning research.
Summary
MSAs of proteins encode rich biological information and have been workhorses of bioinformatics for decades, but generating them is highly computationally intensive, and no datasets comparable to those used to train AlphaFold2 have been available to the research community. OpenProteinSet is an open-source corpus of more than 16 million MSAs, associated structural homologs from the Protein Data Bank, and AlphaFold2 protein structure predictions; its utility has already been demonstrated by successfully retraining AlphaFold2, and it should serve broadly as training and validation data for protein structure, function, and design tasks and for large-scale multimodal machine learning research.
Homophily-enhanced Structure Learning for Graph Clustering
paper_authors: Ming Gu, Gaoming Yang, Sheng Zhou, Ning Ma, Jiawei Chen, Qiaoyu Tan, Meihan Liu, Jiajun Bu
For: The paper is written for graph clustering tasks, specifically addressing the issue of subpar performance due to the lack of consideration of graph structure quality in existing GNN-based methods.* Methods: The paper proposes a novel method called HoLe, which enhances the degree of homophily within the graph structure to improve GNNs and clustering outcomes. This is achieved through two clustering-oriented structure learning modules: hierarchical correlation estimation and cluster-aware sparsification.* Results: The paper reports superior performance of HoLe against state-of-the-art baselines on seven benchmark datasets of various types and scales, across a range of clustering metrics.Abstract
Graph clustering is a fundamental task in graph analysis, and recent advances in utilizing graph neural networks (GNNs) have shown impressive results. Despite the success of existing GNN-based graph clustering methods, they often overlook the quality of graph structure, which is inherent in real-world graphs due to their sparse and multifarious nature, leading to subpar performance. Graph structure learning allows refining the input graph by adding missing links and removing spurious connections. However, previous endeavors in graph structure learning have predominantly centered around supervised settings, and cannot be directly applied to our specific clustering tasks due to the absence of ground-truth labels. To bridge the gap, we propose a novel method called \textbf{ho}mophily-enhanced structure \textbf{le}arning for graph clustering (HoLe). Our motivation stems from the observation that subtly enhancing the degree of homophily within the graph structure can significantly improve GNNs and clustering outcomes. To realize this objective, we develop two clustering-oriented structure learning modules, i.e., hierarchical correlation estimation and cluster-aware sparsification. The former module enables a more accurate estimation of pairwise node relationships by leveraging guidance from latent and clustering spaces, while the latter one generates a sparsified structure based on the similarity matrix and clustering assignments. Additionally, we devise a joint optimization approach alternating between training the homophily-enhanced structure learning and GNN-based clustering, thereby enforcing their reciprocal effects. Extensive experiments on seven benchmark datasets of various types and scales, across a range of clustering metrics, demonstrate the superiority of HoLe against state-of-the-art baselines.
Summary
Existing GNN-based graph clustering methods often overlook the quality of the graph structure, which is inherently imperfect in sparse, multifarious real-world graphs, and prior graph structure learning work is supervised and thus inapplicable without ground-truth labels. HoLe subtly enhances the degree of homophily within the graph structure via two clustering-oriented modules, hierarchical correlation estimation and cluster-aware sparsification, and alternates their training with GNN-based clustering in a joint optimization that enforces their reciprocal effects. Extensive experiments on seven benchmark datasets of various types and scales demonstrate HoLe's superiority over state-of-the-art baselines across a range of clustering metrics.
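To ground the two ideas above, here is a small sketch, under simplifying assumptions, of (a) a common edge-homophily measure and (b) a similarity-and-cluster-driven sparsification; the keep-top-k rule is our illustrative choice, not the paper's module:

```python
import numpy as np

def edge_homophily(edges, labels):
    """Fraction of edges joining same-label nodes (a common homophily measure)."""
    return np.mean([labels[u] == labels[v] for u, v in edges])

def cluster_aware_sparsify(sim, assign, keep=2):
    """Sketch of similarity-based sparsification: for each node, keep edges
    to its `keep` most similar nodes within the same (predicted) cluster."""
    n = sim.shape[0]
    edges = []
    for u in range(n):
        same = [v for v in range(n) if v != u and assign[v] == assign[u]]
        for v in sorted(same, key=lambda v: -sim[u, v])[:keep]:
            edges.append((u, v))
    return edges

sim = np.random.rand(6, 6); sim = (sim + sim.T) / 2   # toy similarity matrix
assign = np.array([0, 0, 0, 1, 1, 1])                 # toy cluster assignments
edges = cluster_aware_sparsify(sim, assign)
print(edges, edge_homophily(edges, assign))
```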
From CNN to Transformer: A Review of Medical Image Segmentation Models
results: Experimental results reported in the paper show that these models all perform well on the benchmark datasets.Abstract
Medical image segmentation is an important step in medical image analysis, especially as a crucial prerequisite for efficient disease diagnosis and treatment. The use of deep learning for image segmentation has become a prevalent trend. The widely adopted approach currently is U-Net and its variants. Additionally, with the remarkable success of pre-trained models in natural language processing tasks, transformer-based models like TransUNet have achieved desirable performance on multiple medical image segmentation datasets. In this paper, we conduct a survey of the most representative four medical image segmentation models in recent years. We theoretically analyze the characteristics of these models and quantitatively evaluate their performance on two benchmark datasets (i.e., Tuberculosis Chest X-rays and ovarian tumors). Finally, we discuss the main challenges and future trends in medical image segmentation. Our work can assist researchers in the related field to quickly establish medical segmentation models tailored to specific regions.
Summary
Medical image segmentation is an important step in medical image analysis and a crucial prerequisite for efficient disease diagnosis and treatment; U-Net and its variants are widely adopted, and with the success of pre-trained models in NLP, transformer-based models such as TransUNet now perform well on multiple segmentation datasets. This survey analyzes the characteristics of the four most representative recent medical image segmentation models and quantitatively evaluates them on two benchmark datasets (Tuberculosis chest X-rays and ovarian tumors), then discusses the field's main challenges and future trends, helping researchers quickly build segmentation models tailored to specific regions.
Byzantine-Robust Decentralized Stochastic Optimization with Stochastic Gradient Noise-Independent Learning Error
For: This paper studies stochastic optimization over a decentralized network that is robust to Byzantine attacks.* Methods: Each agent periodically exchanges local models with its neighbors and updates its own model by stochastic gradient descent (SGD); two variance reduction methods, the stochastic average gradient algorithm (SAGA) and the loopless stochastic variance-reduced gradient (LSVRG), are introduced to reduce the impact of stochastic gradient noise.* Results: The study finds that with these two variance reduction methods, the approach achieves linear convergence speeds and stochastic-gradient-noise-independent learning errors, which are optimal for a class of methods based on total variation (TV)-norm regularization and stochastic subgradient updates; experiments show both methods perform well under various Byzantine attacks.Abstract
This paper studies Byzantine-robust stochastic optimization over a decentralized network, where every agent periodically communicates with its neighbors to exchange local models, and then updates its own local model by stochastic gradient descent (SGD). The performance of such a method is affected by an unknown number of Byzantine agents, which conduct adversarially during the optimization process. To the best of our knowledge, there is no existing work that simultaneously achieves a linear convergence speed and a small learning error. We observe that the learning error is largely dependent on the intrinsic stochastic gradient noise. Motivated by this observation, we introduce two variance reduction methods, stochastic average gradient algorithm (SAGA) and loopless stochastic variance-reduced gradient (LSVRG), to Byzantine-robust decentralized stochastic optimization for eliminating the negative effect of the stochastic gradient noise. The two resulting methods, BRAVO-SAGA and BRAVO-LSVRG, enjoy both linear convergence speeds and stochastic gradient noise-independent learning errors. Such learning errors are optimal for a class of methods based on total variation (TV)-norm regularization and stochastic subgradient update. We conduct extensive numerical experiments to demonstrate their effectiveness under various Byzantine attacks.
Summary
Observing that the learning error of Byzantine-robust decentralized stochastic optimization is dominated by intrinsic stochastic gradient noise, this work introduces two variance reduction methods, SAGA and LSVRG, into the Byzantine-robust setting. The resulting BRAVO-SAGA and BRAVO-LSVRG enjoy both linear convergence speeds and stochastic-gradient-noise-independent learning errors, optimal for a class of methods based on TV-norm regularization and stochastic subgradient updates, and extensive numerical experiments demonstrate their effectiveness under various Byzantine attacks.
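For readers unfamiliar with the variance-reduction building block, below is a minimal SAGA sketch on a least-squares toy problem; it shows only the gradient-table idea and omits the decentralized communication and Byzantine-robust aggregation that BRAVO-SAGA adds:

```python
import numpy as np

def saga_least_squares(A, b, lr=0.1, epochs=50, seed=0):
    """Minimal SAGA sketch on 0.5*||Ax - b||^2 with per-sample gradients
    grad_i(x) = a_i * (a_i @ x - b_i)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    table = np.zeros((n, d))           # last stored gradient per sample
    avg = table.mean(axis=0)
    for _ in range(epochs * n):
        i = rng.integers(n)
        g_i = A[i] * (A[i] @ x - b[i])
        x -= lr * (g_i - table[i] + avg)   # variance-reduced step
        avg += (g_i - table[i]) / n        # keep running mean consistent
        table[i] = g_i
    return x

A = np.random.randn(40, 3)
x_true = np.array([1.0, -2.0, 0.5])
print(saga_least_squares(A, A @ x_true))   # should approach x_true
```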
Investigating disaster response through social media data and the Susceptible-Infected-Recovered (SIR) model: A case study of 2020 Western U.S. wildfire season
methods: The study applies BERT topic modeling to cluster Twitter data and conducts a temporal-spatial analysis of how topics were distributed across regions during the 2020 western U.S. wildfire season, finding that Twitter users mainly focused on three topics: health impact, damage, and evacuation.
results: The study finds that after the disaster, Twitter users mainly attended to health impact, damage, and evacuation. Using SIR theory to explore the magnitude and velocity of topic diffusion, the results show a clear relationship between topic trends and wildfire propagation patterns.Abstract
Effective disaster response is critical for affected communities. Responders and decision-makers would benefit from reliable, timely measures of the issues impacting their communities during a disaster, and social media offers a potentially rich data source. Social media can reflect public concerns and demands during a disaster, offering valuable insights for decision-makers to understand evolving situations and optimize resource allocation. We used Bidirectional Encoder Representations from Transformers (BERT) topic modeling to cluster topics from Twitter data. Then, we conducted a temporal-spatial analysis to examine the distribution of these topics across different regions during the 2020 western U.S. wildfire season. Our results show that Twitter users mainly focused on three topics:"health impact," "damage," and "evacuation." We used the Susceptible-Infected-Recovered (SIR) theory to explore the magnitude and velocity of topic diffusion on Twitter. The results displayed a clear relationship between topic trends and wildfire propagation patterns. The estimated parameters obtained from the SIR model in selected cities revealed that residents exhibited a high level of several concerns during the wildfire. Our study details how the SIR model and topic modeling using social media data can provide decision-makers with a quantitative approach to measure disaster response and support their decision-making processes.
Summary
Effective disaster response is critical for affected communities, and social media can reflect public concerns and demands during a disaster, offering decision-makers insight into evolving situations and resource allocation. BERT topic modeling clustered Twitter data from the 2020 western U.S. wildfire season into three dominant topics (health impact, damage, and evacuation), and a temporal-spatial analysis examined their regional distribution. Fitting the Susceptible-Infected-Recovered (SIR) model to topic diffusion revealed a clear relationship with wildfire propagation patterns and high levels of concern among residents of the selected cities, giving decision-makers a quantitative approach to measuring disaster response.
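The SIR dynamics referenced above are a small system of ODEs and are easy to simulate; a minimal forward-Euler sketch with invented parameters (beta and gamma here are illustrative, not the paper's fitted values):

```python
def simulate_sir(beta, gamma, s0=0.99, i0=0.01, days=60, dt=0.1):
    """Forward-Euler integration of the classic SIR equations:
    dS/dt = -beta*S*I,  dI/dt = beta*S*I - gamma*I,  dR/dt = gamma*I.
    Here 'infected' plays the role of users tweeting about a topic."""
    S, I, R = s0, i0, 0.0
    traj = []
    for step in range(int(days / dt)):
        dS = -beta * S * I
        dI = beta * S * I - gamma * I
        dR = gamma * I
        S, I, R = S + dt * dS, I + dt * dI, R + dt * dR
        if step % int(1 / dt) == 0:        # record one sample per day
            traj.append((S, I, R))
    return traj

# Hypothetical parameters for a fast-spreading "evacuation" topic.
for S, I, R in simulate_sir(beta=0.6, gamma=0.2)[:5]:
    print(f"S={S:.3f} I={I:.3f} R={R:.3f}")
```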
paper_authors: Pengfei Ding, Yan Wang, Guanfeng Liu
for: This work addresses cross-heterogeneity graph few-shot learning, i.e., predicting new classes from few labeled examples in heterogeneous graphs (HGs) that contain various types of nodes and edges.
methods: The paper proposes CGFL, a model for cross-heterogeneity graph few-shot learning. CGFL first extracts meta-patterns to capture heterogeneous information and learns them with a multi-view heterogeneous graph neural network (MHGN); a score module then measures the informativeness of labeled samples and determines the transferability of each source HG.
results: Extensive experiments on four real-world datasets show that CGFL outperforms previous methods on cross-heterogeneity graph few-shot learning.Abstract
In recent years, heterogeneous graph few-shot learning has been proposed to address the label sparsity issue in heterogeneous graphs (HGs), which contain various types of nodes and edges. The existing methods have achieved good performance by transferring generalized knowledge extracted from rich-labeled classes in source HG(s) to few-labeled classes in a target HG. However, these methods only consider the single-heterogeneity scenario where the source and target HGs share a fixed set of node/edge types, ignoring the more general scenario of cross-heterogeneity, where each HG can have a different and non-fixed set of node/edge types. To this end, we focus on the unexplored cross-heterogeneity scenario and propose a novel model for Cross-heterogeneity Graph Few-shot Learning, namely CGFL. In CGFL, we first extract meta-patterns to capture heterogeneous information and propose a multi-view heterogeneous graph neural network (MHGN) to learn meta-patterns across HGs. Then, we propose a score module to measure the informativeness of labeled samples and determine the transferability of each source HG. Finally, by integrating MHGN and the score module into a meta-learning mechanism, CGFL can effectively transfer generalized knowledge to predict new classes with few-labeled data. Extensive experiments on four real-world datasets have demonstrated the superior performance of CGFL over the state-of-the-art methods.
Summary
Existing heterogeneous-graph few-shot learning methods transfer knowledge from rich-labeled classes in source HGs to few-labeled classes in a target HG, but assume the source and target share a fixed set of node/edge types. CGFL targets the more general, unexplored cross-heterogeneity scenario: it extracts meta-patterns with a multi-view heterogeneous graph neural network (MHGN), scores the informativeness of labeled samples to gauge each source HG's transferability, and integrates both into a meta-learning mechanism that effectively transfers generalized knowledge to predict new classes with few-labeled data, outperforming state-of-the-art methods on four real-world datasets.
Data-driven Intra-Autonomous Systems Graph Generator
methods: The paper presents a generator named Deep-generative graphs for the Internet (DGGI), together with Internet Graphs (IGraphs), a massive dataset of real intra-AS graphs. To create IGraphs, the Filtered Recurrent Multi-level (FRM) algorithm for community extraction was developed.
results: Experiments show that the synthetic graphs generated by DGGI accurately reproduce the properties of intra-AS networks, including centrality, clustering, assortativity, and node degree. Compared with existing generators, DGGI improves the Maximum Mean Discrepancy (MMD) metric by 84.4%, 95.1%, 97.9%, and 94.7% for assortativity, betweenness, clustering, and node degree, respectively.Abstract
This paper introduces a novel deep-learning based generator of synthetic graphs that represent intra-Autonomous System (AS) in the Internet, named Deep-generative graphs for the Internet (DGGI). It also presents a novel massive dataset of real intra-AS graphs extracted from the project Internet Topology Data Kit (ITDK), called Internet Graphs (IGraphs). To create IGraphs, the Filtered Recurrent Multi-level (FRM) algorithm for community extraction was developed. It is shown that DGGI creates synthetic graphs which accurately reproduce the properties of centrality, clustering, assortativity, and node degree. The DGGI generator overperforms existing Internet topology generators. On average, DGGI improves the Maximum Mean Discrepancy (MMD) metric 84.4%, 95.1%, 97.9%, and 94.7% for assortativity, betweenness, clustering, and node degree, respectively.
Summary
DGGI is a deep-learning-based generator of synthetic intra-AS Internet graphs, introduced alongside IGraphs, a large dataset of real intra-AS graphs extracted from the Internet Topology Data Kit (ITDK) using the Filtered Recurrent Multi-level (FRM) community-extraction algorithm. DGGI accurately reproduces centrality, clustering, assortativity, and node degree, outperforming existing Internet topology generators with average MMD improvements of 84.4%, 95.1%, 97.9%, and 94.7% for assortativity, betweenness, clustering, and node degree, respectively.
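Since MMD is the headline metric here, a compact sketch of an RBF-kernel MMD^2 estimate between two samples of a scalar graph statistic; the Poisson-distributed degrees are placeholders for statistics from real and generated graphs:

```python
import numpy as np

def mmd_rbf(x, y, sigma=1.0):
    """Biased MMD^2 estimate with an RBF kernel between two samples of a
    scalar graph statistic (e.g., node degrees, betweenness values)."""
    x = np.asarray(x, float)[:, None]
    y = np.asarray(y, float)[:, None]
    k = lambda a, b: np.exp(-(a - b.T) ** 2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

real_degrees = np.random.poisson(5, 200)    # stand-in for ITDK-derived degrees
synth_degrees = np.random.poisson(5, 200)   # stand-in for generator output
print(mmd_rbf(real_degrees, synth_degrees)) # near 0 for matching distributions
```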
AI-Enabled Software and System Architecture Frameworks: Focusing on smart Cyber-Physical Systems (CPS)
paper_authors: Armin Moin, Atta Badii, Stephan Günnemann, Moharram Challenger
For: The paper aims to address the gap in existing architecture frameworks for software, systems, and enterprises by including the concerns of data science and Machine Learning (ML) stakeholders, such as data scientists and data engineers.* Methods: The paper proposes two sets of merit criteria for the efficient development and performance assessment of ML-enabled Cyber-Physical Systems (CPSs), as well as criteria for evaluating and benchmarking the tools used in the modeling and development pipeline. The authors use multiple empirical and qualitative research methods, including literature review, survey instruments, and expert interviews, to devise and validate the proposed framework.* Results: The paper provides a framework adapted to meet the requirements of modern applications and organizations where ML artifacts are prevalent and crucial, and proposes criteria for evaluating and benchmarking ML-enabled CPSs and the tools used in their development. The authors collect and analyze opinions from 77 experts from over 25 organizations in more than 10 countries to validate the proposed framework.Abstract
Several architecture frameworks for software, systems, and enterprises have been proposed in the literature. They identified various stakeholders and defined architecture viewpoints and views to frame and address stakeholder concerns. However, the stakeholders with data science and Machine Learning (ML) related concerns, such as data scientists and data engineers, are yet to be included in existing architecture frameworks. Therefore, they failed to address the architecture viewpoints and views responsive to the concerns of the data science community. In this paper, we address this gap by establishing the architecture frameworks adapted to meet the requirements of modern applications and organizations where ML artifacts are both prevalent and crucial. In particular, we focus on ML-enabled Cyber-Physical Systems (CPSs) and propose two sets of merit criteria for their efficient development and performance assessment, namely the criteria for evaluating and benchmarking ML-enabled CPSs, and the criteria for evaluation and benchmarking of the tools intended to support users through the modeling and development pipeline. In this study, we deploy multiple empirical and qualitative research methods based on literature review and survey instruments including expert interviews and an online questionnaire. We collect, analyze, and integrate the opinions of 77 experts from more than 25 organizations in over 10 countries to devise and validate the proposed framework.
Summary
Existing architecture frameworks for software, systems, and enterprises define stakeholders, viewpoints, and views, but omit stakeholders with data science and machine learning concerns, such as data scientists and data engineers. This work adapts architecture frameworks to modern applications and organizations where ML artifacts are prevalent and crucial, focusing on ML-enabled Cyber-Physical Systems (CPSs), and proposes two sets of merit criteria: one for evaluating and benchmarking ML-enabled CPSs, and one for the tools that support users through the modeling and development pipeline. The framework was devised and validated with multiple empirical and qualitative methods (literature review, expert interviews, and an online questionnaire), collecting and integrating the opinions of 77 experts from more than 25 organizations in over 10 countries.
Financial Fraud Detection: A Comparative Study of Quantum Machine Learning Models
results: The study finds that the Quantum Support Vector Classifier performs best for this financial application, with an F1 score of 0.98. The other models also show potential but have certain limitations; the paper offers solutions to current constraints and new directions for the future of quantum machine learning in fraud detection.Abstract
In this research, a comparative study of four Quantum Machine Learning (QML) models was conducted for fraud detection in finance. We proved that the Quantum Support Vector Classifier model achieved the highest performance, with F1 scores of 0.98 for fraud and non-fraud classes. Other models like the Variational Quantum Classifier, Estimator Quantum Neural Network (QNN), and Sampler QNN demonstrate promising results, propelling the potential of QML classification for financial applications. While they exhibit certain limitations, the insights attained pave the way for future enhancements and optimisation strategies. However, challenges exist, including the need for more efficient Quantum algorithms and larger and more complex datasets. The article provides solutions to overcome current limitations and contributes new insights to the field of Quantum Machine Learning in fraud detection, with important implications for its future development.
Summary
A comparative study of four quantum machine learning (QML) models for financial fraud detection shows that the Quantum Support Vector Classifier achieves the best performance, with F1 scores of 0.98 for both fraud and non-fraud classes, while the Variational Quantum Classifier, Estimator QNN, and Sampler QNN also show promise despite certain limitations. Challenges remain, including the need for more efficient quantum algorithms and larger, more complex datasets; the paper provides solutions to current limitations and new insights for the field's future development.
Spatial Gated Multi-Layer Perceptron for Land Use and Land Cover Mapping
paper_authors: Ali Jamali, Swalpa Kumar Roy, Danfeng Hong, Peter M Atkinson, Pedram Ghamisi
for: This study aims to develop an accurate land use and land cover (LULC) mapping algorithm based on multi-layer perceptrons (MLPs) and spatial gating units (SGUs).
methods: The researchers use a learning algorithm named SGU-MLP, which combines MLPs and SGUs to improve the accuracy of LULC mapping.
results: Experiments show that SGU-MLP outperforms benchmark CNN and CNN-ViT-based methods such as HybridSN, ResNet, iFormer, EfficientFormer, and CoAtNet; in the Houston experiment, SGU-MLP improves on HybridSN, CoAtNet, EfficientFormer, iFormer, and ResNet by approximately 15%, 19%, 20%, 21%, and 25%, respectively.Abstract
Convolutional Neural Networks (CNNs) are models that are utilized extensively for the hierarchical extraction of features. Vision transformers (ViTs), through the use of a self-attention mechanism, have recently achieved superior modeling of global contextual information compared to CNNs. However, to realize their image classification strength, ViTs require substantial training datasets. Where the available training data are limited, current advanced multi-layer perceptrons (MLPs) can provide viable alternatives to both deep CNNs and ViTs. In this paper, we developed the SGU-MLP, a learning algorithm that effectively uses both MLPs and spatial gating units (SGUs) for precise land use land cover (LULC) mapping. Results illustrated the superiority of the developed SGU-MLP classification algorithm over several CNN and CNN-ViT-based models, including HybridSN, ResNet, iFormer, EfficientFormer and CoAtNet. The proposed SGU-MLP algorithm was tested through three experiments in Houston, USA, Berlin, Germany and Augsburg, Germany. The SGU-MLP classification model was found to consistently outperform the benchmark CNN and CNN-ViT-based algorithms. For example, for the Houston experiment, SGU-MLP significantly outperformed HybridSN, CoAtNet, Efficientformer, iFormer and ResNet by approximately 15%, 19%, 20%, 21%, and 25%, respectively, in terms of average accuracy. The code will be made publicly available at https://github.com/aj1365/SGUMLP
Summary
Where the available training data are too limited for vision transformers, advanced MLPs offer viable alternatives to both deep CNNs and ViTs. SGU-MLP combines MLPs with spatial gating units for precise LULC mapping and, in experiments in Houston, Berlin, and Augsburg, consistently outperforms CNN and CNN-ViT-based baselines including HybridSN, ResNet, iFormer, EfficientFormer, and CoAtNet; for Houston, it outperforms those baselines by approximately 15% to 25% in average accuracy. The code will be made publicly available at https://github.com/aj1365/SGUMLP.
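For intuition about the gating block, below is a gMLP-style spatial gating unit in PyTorch; this is a generic sketch of the mechanism, and the authors' exact module, token layout, and initialization may differ:

```python
import torch
import torch.nn as nn

class SpatialGatingUnit(nn.Module):
    """gMLP-style spatial gating unit. Input: (batch, tokens, channels);
    half the channels gate the other half after mixing across tokens."""
    def __init__(self, channels, tokens):
        super().__init__()
        self.norm = nn.LayerNorm(channels // 2)
        self.spatial = nn.Linear(tokens, tokens)
        nn.init.zeros_(self.spatial.weight)   # gate starts near 1 ...
        nn.init.ones_(self.spatial.bias)      # ... so the unit is benign at init

    def forward(self, x):
        u, v = x.chunk(2, dim=-1)             # split channels
        v = self.norm(v)
        v = self.spatial(v.transpose(1, 2)).transpose(1, 2)  # mix across tokens
        return u * v                          # elementwise gate

sgu = SpatialGatingUnit(channels=64, tokens=49)  # e.g., a 7x7 patch grid
out = sgu(torch.randn(2, 49, 64))
print(out.shape)  # torch.Size([2, 49, 32])
```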
Leveraging the Edge and Cloud for V2X-Based Real-Time Object Detection in Autonomous Driving
results: The study finds that offloading model computation to the cloud can improve detection quality while still meeting real-time constraints, and that JPEG and H.265 compression reduce network transmission delay without hurting detection metrics.Abstract
Environmental perception is a key element of autonomous driving because the information received from the perception module influences core driving decisions. An outstanding challenge in real-time perception for autonomous driving lies in finding the best trade-off between detection quality and latency. Major constraints on both computation and power have to be taken into account for real-time perception in autonomous vehicles. Larger object detection models tend to produce the best results, but are also slower at runtime. Since the most accurate detectors cannot run in real-time locally, we investigate the possibility of offloading computation to edge and cloud platforms, which are less resource-constrained. We create a synthetic dataset to train object detection models and evaluate different offloading strategies. Using real hardware and network simulations, we compare different trade-offs between prediction quality and end-to-end delay. Since sending raw frames over the network implies additional transmission delays, we also explore the use of JPEG and H.265 compression at varying qualities and measure their impact on prediction metrics. We show that models with adequate compression can be run in real-time on the cloud while outperforming local detection performance.
Summary
Environmental perception is a key element of autonomous driving, and real-time perception must balance detection quality against latency under tight compute and power constraints; the most accurate detectors cannot run in real time locally. Using a synthetic training dataset, real hardware, and network simulations, this work evaluates offloading detection to less resource-constrained edge and cloud platforms, including sending frames compressed with JPEG and H.265 at varying qualities and measuring the impact on prediction metrics. With adequate compression, models can run in real time on the cloud while outperforming local detection performance.
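The local-versus-offload trade-off can be sanity-checked with back-of-the-envelope arithmetic; all numbers in the sketch below (bandwidth, compression ratio, inference times) are invented for illustration:

```python
def end_to_end_latency(frame_bits, uplink_mbps, remote_infer_ms, rtt_ms=10.0):
    """Toy delay model for offloaded detection:
    transmission time + network round trip + remote inference time."""
    tx_ms = frame_bits / (uplink_mbps * 1e6) * 1e3
    return tx_ms + rtt_ms + remote_infer_ms

raw = 1920 * 1080 * 24          # raw RGB frame, in bits
jpeg = raw * 0.05               # assumed ~20:1 JPEG compression
local_ms = 60.0                 # hypothetical local small-model latency

print(f"raw offload : {end_to_end_latency(raw, 50, 25):.1f} ms")   # ~1030 ms
print(f"jpeg offload: {end_to_end_latency(jpeg, 50, 25):.1f} ms")  # ~85 ms
print(f"local       : {local_ms:.1f} ms")
```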
SegMatch: A semi-supervised learning method for surgical instrument segmentation
paper_authors: Meng Wei, Charlie Budd, Luis C. Garcia-Peraza-Herrera, Reuben Dorent, Miaojing Shi, Tom Vercauteren
for: SegMatch is proposed to reduce the need for expensive annotation of laparoscopic and robotic surgical images, a key enabler for advanced surgical assistance and computer-assisted interventions.
methods: SegMatch is a semi-supervised learning method that combines consistency regularization and pseudo-labeling, adapted for segmentation tasks. Weakly augmented unlabelled images generate pseudo-labels, against which the unsupervised loss is enforced on the model's output for adversarially augmented images; a trainable adversarial augmentation strategy increases the relevance of the augmentations.
results: SegMatch outperforms fully supervised approaches and state-of-the-art semi-supervised semantic segmentation models across different labelled-to-unlabelled data ratios, demonstrating the effectiveness of using unlabelled data for segmentation training.
Surgical instrument segmentation is recognised as a key enabler to provide advanced surgical assistance and improve computer assisted interventions. In this work, we propose SegMatch, a semi supervised learning method to reduce the need for expensive annotation for laparoscopic and robotic surgical images. SegMatch builds on FixMatch, a widespread semi supervised classification pipeline combining consistency regularization and pseudo labelling, and adapts it for the purpose of segmentation. In our proposed SegMatch, the unlabelled images are weakly augmented and fed into the segmentation model to generate a pseudo-label to enforce the unsupervised loss against the output of the model for the adversarial augmented image on the pixels with a high confidence score. Our adaptation for segmentation tasks includes carefully considering the equivariance and invariance properties of the augmentation functions we rely on. To increase the relevance of our augmentations, we depart from using only handcrafted augmentations and introduce a trainable adversarial augmentation strategy. Our algorithm was evaluated on the MICCAI Instrument Segmentation Challenge datasets Robust-MIS 2019 and EndoVis 2017. Our results demonstrate that adding unlabelled data for training purposes allows us to surpass the performance of fully supervised approaches which are limited by the availability of training data in these challenges. SegMatch also outperforms a range of state-of-the-art semi-supervised learning semantic segmentation models in different labelled to unlabelled data ratios.
Summary
SegMatch adapts FixMatch, a widespread semi-supervised pipeline combining consistency regularization and pseudo-labeling, to surgical instrument segmentation in order to reduce costly annotation. Weakly augmented unlabelled images are fed to the segmentation model to generate pseudo-labels, which enforce the unsupervised loss on predictions for adversarially augmented views at high-confidence pixels; the augmentations are chosen with their equivariance and invariance properties in mind, and a trainable adversarial augmentation strategy complements purely handcrafted ones. On the Robust-MIS 2019 and EndoVis 2017 challenge datasets, adding unlabelled data surpasses fully supervised approaches limited by training-data availability and outperforms a range of state-of-the-art semi-supervised baselines across labelled-to-unlabelled ratios.
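The core FixMatch-style unsupervised term is compact; here it is sketched for dense (per-pixel) predictions, with a plain strong augmentation standing in for SegMatch's trainable adversarial augmentation and a 1x1-conv toy model as the segmenter:

```python
import torch
import torch.nn.functional as F

def pseudo_label_loss(model, weak_img, strong_img, tau=0.95):
    """FixMatch-style unsupervised term adapted to segmentation (a sketch):
    pseudo-labels from the weakly augmented view supervise predictions on
    the strongly augmented view, only at high-confidence pixels."""
    with torch.no_grad():
        probs = torch.softmax(model(weak_img), dim=1)   # (B, C, H, W)
        conf, pseudo = probs.max(dim=1)                 # per-pixel confidence/label
    logits = model(strong_img)
    loss = F.cross_entropy(logits, pseudo, reduction="none")  # (B, H, W)
    mask = (conf >= tau).float()
    return (loss * mask).sum() / mask.sum().clamp(min=1.0)

# Toy usage with a 1x1-conv "segmenter" over 3-channel images:
model = torch.nn.Conv2d(3, 2, kernel_size=1)
weak, strong = torch.randn(2, 3, 32, 32), torch.randn(2, 3, 32, 32)
print(pseudo_label_loss(model, weak, strong).item())
```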
Training neural networks with end-to-end optical backpropagation
results: The study achieves a fully optical neural network in which both training and inference are implemented optically, and the approach is adaptable to different optical platforms, materials, and network structures.Abstract
Optics is an exciting route for the next generation of computing hardware for machine learning, promising several orders of magnitude enhancement in both computational speed and energy efficiency. However, to reach the full capacity of an optical neural network it is necessary that the computing not only for the inference, but also for the training be implemented optically. The primary algorithm for training a neural network is backpropagation, in which the calculation is performed in the order opposite to the information flow for inference. While straightforward in a digital computer, optical implementation of backpropagation has so far remained elusive, particularly because of the conflicting requirements for the optical element that implements the nonlinear activation function. In this work, we address this challenge for the first time with a surprisingly simple and generic scheme. Saturable absorbers are employed for the role of the activation units, and the required properties are achieved through a pump-probe process, in which the forward propagating signal acts as the pump and backward as the probe. Our approach is adaptable to various analog platforms, materials, and network structures, and it demonstrates the possibility of constructing neural networks entirely reliant on analog optical processes for both training and inference tasks.
Summary
Optics is a promising route for next-generation machine learning hardware, but reaching the full capacity of an optical neural network requires implementing training, not just inference, optically; optical backpropagation has remained elusive because of conflicting requirements on the element implementing the nonlinear activation function. This work addresses the challenge with a surprisingly simple, generic scheme: saturable absorbers serve as activation units, and the required properties are achieved through a pump-probe process in which the forward-propagating signal acts as the pump and the backward one as the probe. The approach adapts to various analog platforms, materials, and network structures, demonstrating neural networks that rely entirely on analog optical processes for both training and inference.
Conceptualizing Machine Learning for Dynamic Information Retrieval of Electronic Health Record Notes
paper_authors: Sharon Jiang, Shannon Shen, Monica Agrawal, Barbara Lam, Nicholas Kurtzman, Steven Horng, David Karger, David Sontag
for: Reducing clinician burnout and improving the efficiency and quality of clinical documentation.
methods: EHR audit-log data provide machine-learning supervision for proactively and dynamically retrieving relevant patient history during documentation.
results: In an emergency department evaluation, the method achieves an AUC of 0.963 for predicting which notes will be read, and a user study with several clinicians shows it helps them retrieve relevant patient information more efficiently.Abstract
The large amount of time clinicians spend sifting through patient notes and documenting in electronic health records (EHRs) is a leading cause of clinician burnout. By proactively and dynamically retrieving relevant notes during the documentation process, we can reduce the effort required to find relevant patient history. In this work, we conceptualize the use of EHR audit logs for machine learning as a source of supervision of note relevance in a specific clinical context, at a particular point in time. Our evaluation focuses on the dynamic retrieval in the emergency department, a high acuity setting with unique patterns of information retrieval and note writing. We show that our methods can achieve an AUC of 0.963 for predicting which notes will be read in an individual note writing session. We additionally conduct a user study with several clinicians and find that our framework can help clinicians retrieve relevant information more efficiently. Demonstrating that our framework and methods can perform well in this demanding setting is a promising proof of concept that they will translate to other clinical settings and data modalities (e.g., labs, medications, imaging).
Summary
The time clinicians spend sifting through patient notes and documenting in EHRs is a leading cause of burnout; proactively and dynamically retrieving relevant notes during documentation can reduce the effort of finding relevant patient history. Treating EHR audit logs as machine-learning supervision for note relevance in a specific clinical context at a particular point in time, the method achieves an AUC of 0.963 at predicting which notes will be read in an individual note-writing session in the emergency department, and a user study finds it helps clinicians retrieve relevant information more efficiently, a promising proof of concept for other clinical settings and data modalities such as labs, medications, and imaging.
results: The paper demonstrates consistent improvement over other textual saliency methods on multiple benchmark classification datasets, without requiring additional training or labelled data.Abstract
In this paper, we introduce a strategy for identifying textual saliency in large-scale language models applied to classification tasks. In visual networks where saliency is more well-studied, saliency is naturally localized through the convolutional layers of the network; however, the same is not true in modern transformer-stack networks used to process natural language. We adapt gradient-based saliency methods for these networks, propose a method for evaluating the degree of semantic coherence of each layer, and demonstrate consistent improvement over numerous other methods for textual saliency on multiple benchmark classification datasets. Our approach requires no additional training or access to labelled data, and is comparatively very computationally efficient.
Summary
This paper introduces a strategy for identifying textual saliency in large-scale language models applied to classification tasks. Unlike convolutional vision networks, where saliency is naturally localized, modern transformer stacks for natural language require adaptation: the work adapts gradient-based saliency methods, proposes a measure of each layer's semantic coherence, and demonstrates consistent improvement over numerous other textual saliency methods on multiple benchmark classification datasets, with no additional training or labelled data and comparatively high computational efficiency.
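As a concrete reference point for gradient-based saliency, here is a minimal gradient-times-input sketch over token embeddings; the mean-pool classifier is a toy stand-in for a transformer stack, and the paper's layer-coherence evaluation is not reproduced here:

```python
import torch

class MeanPoolClassifier(torch.nn.Module):
    """Toy stand-in for a transformer stack: mean-pool token embeddings, classify."""
    def __init__(self, d=16, classes=2):
        super().__init__()
        self.fc = torch.nn.Linear(d, classes)
    def forward(self, x):                 # x: (batch, tokens, dim)
        return self.fc(x.mean(dim=1))

def grad_x_input_saliency(model, embeds, target_class):
    """Gradient-times-input saliency: one score per token."""
    embeds = embeds.clone().requires_grad_(True)
    logits = model(embeds)
    logits[:, target_class].sum().backward()
    return (embeds.grad * embeds).sum(dim=-1).abs()

sal = grad_x_input_saliency(MeanPoolClassifier(), torch.randn(1, 10, 16), 0)
print(sal.shape)  # torch.Size([1, 10]) -> one saliency score per token
```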
Conformer-based Target-Speaker Automatic Speech Recognition for Single-Channel Audio
methods: The model consists of a TitaNet-based speaker embedding module, Conformer-based masking, and an ASR module. These modules are jointly optimized in the time-frequency domain to transcribe the target speaker while excluding speech from other speakers.
results: Trained with a CTC loss and a scale-invariant spectrogram reconstruction loss, the model achieves state-of-the-art target-speaker word error rate (TS-WER) on WSJ0-2mix-extr (4.2%). It also reports TS-WER on the WSJ0-3mix-extr (12.4%), LibriSpeech2Mix (4.2%), and LibriSpeech3Mix (7.6%) datasets, establishing new benchmarks for TS-ASR.Abstract
We propose CONF-TSASR, a non-autoregressive end-to-end time-frequency domain architecture for single-channel target-speaker automatic speech recognition (TS-ASR). The model consists of a TitaNet based speaker embedding module, a Conformer based masking as well as ASR modules. These modules are jointly optimized to transcribe a target-speaker, while ignoring speech from other speakers. For training we use Connectionist Temporal Classification (CTC) loss and introduce a scale-invariant spectrogram reconstruction loss to encourage the model better separate the target-speaker's spectrogram from mixture. We obtain state-of-the-art target-speaker word error rate (TS-WER) on WSJ0-2mix-extr (4.2%). Further, we report for the first time TS-WER on WSJ0-3mix-extr (12.4%), LibriSpeech2Mix (4.2%) and LibriSpeech3Mix (7.6%) datasets, establishing new benchmarks for TS-ASR. The proposed model will be open-sourced through NVIDIA NeMo toolkit.
Summary
CONF-TSASR is a non-autoregressive, end-to-end time-frequency architecture for single-channel target-speaker automatic speech recognition, consisting of a TitaNet-based speaker embedding module and Conformer-based masking and ASR modules that are jointly optimized to transcribe the target speaker while ignoring others. Training uses CTC loss plus a scale-invariant spectrogram reconstruction loss that encourages better separation of the target speaker's spectrogram from the mixture; the model sets state-of-the-art TS-WER on WSJ0-2mix-extr (4.2%) and reports first TS-WER results on WSJ0-3mix-extr (12.4%), LibriSpeech2Mix (4.2%), and LibriSpeech3Mix (7.6%). The model will be open-sourced through the NVIDIA NeMo toolkit.
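The scale-invariant reconstruction idea can be sketched SI-SDR-style: project out the best scalar scale before measuring the residual. The loss below is our illustration of that principle, not necessarily the paper's exact formulation, and the tensor shapes are assumptions:

```python
import torch

def scale_invariant_recon_loss(est, ref, eps=1e-8):
    """Scale-invariant reconstruction loss between estimated and reference
    spectrograms: fit the optimal scale alpha, then penalize the residual."""
    est, ref = est.flatten(1), ref.flatten(1)
    alpha = (est * ref).sum(1, keepdim=True) / (ref.pow(2).sum(1, keepdim=True) + eps)
    target = alpha * ref
    noise = est - target
    si_ratio = target.pow(2).sum(1) / (noise.pow(2).sum(1) + eps)
    return -(10 * torch.log10(si_ratio + eps)).mean()

est = torch.rand(4, 80, 100)   # (batch, mel bins, frames), hypothetical shapes
print(scale_invariant_recon_loss(est, 2.0 * est).item())  # scaling ref leaves loss tiny
```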
Evaluating Pedestrian Trajectory Prediction Methods for the Application in Autonomous Driving
results: The study finds that simple models remain competitive when generating single trajectories, and that several features commonly assumed to be useful have little impact across architectures. Inference-time measurements over varying numbers of agents further show that simple models scale better; based on these findings, the paper proposes recommendations to guide future trajectory prediction algorithms.Abstract
In this paper, the state of the art in the field of pedestrian trajectory prediction is evaluated alongside the constant velocity model (CVM) with respect to its applicability in autonomous vehicles. The evaluation is conducted on the widely-used ETH/UCY dataset where the Average Displacement Error (ADE) and the Final Displacement Error (FDE) are reported. To align with requirements in real-world applications, modifications are made to the input features of the initially proposed models. An ablation study is conducted to examine the influence of the observed motion history on the prediction performance, thereby establishing a better understanding of its impact. Additionally, the inference time of each model is measured to evaluate the scalability of each model when confronted with varying amounts of agents. The results demonstrate that simple models remain competitive when generating single trajectories, and certain features commonly thought of as useful have little impact on the overall performance across different architectures. Based on these findings, recommendations are proposed to guide the future development of trajectory prediction algorithms.
Summary
The state of the art in pedestrian trajectory prediction is evaluated alongside the constant velocity model (CVM) with respect to its applicability in autonomous vehicles, on the widely used ETH/UCY dataset with ADE and FDE, and with input features modified to align with real-world requirements. An ablation study on the observed motion history and inference-time measurements across varying numbers of agents show that simple models remain competitive when generating single trajectories and that certain features commonly thought useful have little impact on overall performance across architectures, leading to recommendations for the future development of trajectory prediction algorithms.
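The CVM baseline and the two metrics are simple enough to state exactly; a minimal sketch with a synthetic straight-line walk (the 8-step observation and 12-step horizon mirror common ETH/UCY settings, but the data here are toy values):

```python
import numpy as np

def cvm_predict(observed, horizon=12):
    """Constant Velocity Model: extrapolate the last observed displacement.
    observed: (T_obs, 2) array of x/y positions."""
    v = observed[-1] - observed[-2]
    steps = np.arange(1, horizon + 1)[:, None]
    return observed[-1] + steps * v

def ade_fde(pred, gt):
    """Average and Final Displacement Error over one trajectory."""
    d = np.linalg.norm(pred - gt, axis=1)
    return d.mean(), d[-1]

t = np.arange(8)[:, None]
observed = np.hstack([t * 0.4, t * 0.1])                  # toy straight walk
gt = cvm_predict(observed) + np.random.normal(0, 0.05, (12, 2))
print(ade_fde(cvm_predict(observed), gt))
```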
Hierarchical Representations for Spatio-Temporal Visual Attention Modeling and Understanding
results: Through experiments and analyses, the thesis demonstrates the effectiveness and reliability of the proposed computational models and deep network architecture when applied to video sequences.Abstract
This PhD thesis concerns the study and development of hierarchical representations for spatio-temporal visual attention modeling and understanding in video sequences. More specifically, we propose two computational models for visual attention. First, we present a generative probabilistic model for context-aware visual attention modeling and understanding. Secondly, we develop a deep network architecture for visual attention modeling, which first estimates top-down spatio-temporal visual attention, and ultimately serves for modeling attention in the temporal domain.
Summary
This PhD thesis studies hierarchical representations for spatio-temporal visual attention modeling and understanding in video sequences, proposing two computational models: a generative probabilistic model for context-aware visual attention modeling and understanding, and a deep network architecture that first estimates top-down spatio-temporal visual attention and ultimately serves to model attention in the temporal domain.
Deep Learning for Morphological Identification of Extended Radio Galaxies using Weak Labels
paper_authors: Nikhel Gupta, Zeeshan Hayder, Ray P. Norris, Minh Huynh, Lars Petersson, X. Rosalind Wang, Heinz Andernach, Bärbel S. Koribalski, Miranda Yew, Evan J. Crawford
for: This study aims to develop a deep-learning-based algorithm that reduces the cost of pixel-level labelling for complex radio galaxies.
methods: The algorithm is trained on weak class-level labels of radio galaxies to obtain class activation maps (CAMs), which are further refined with an inter-pixel relations network (IRNet) to produce instance segmentation masks over radio galaxies and the positions of their infrared host galaxies.
results: The study uses data from the Australian Square Kilometre Array Pathfinder (ASKAP) telescope, specifically the Evolutionary Map of the Universe (EMU) Pilot Survey, which covered a sky area of 270 square degrees with an RMS sensitivity of 25-35 μJy/beam. The researchers show that weakly supervised deep learning can predict pixel-level information with high accuracy, including masks for the extended radio emission encapsulating all galaxy components and the positions of the infrared host galaxies.Abstract
The present work discusses the use of a weakly-supervised deep learning algorithm that reduces the cost of labelling pixel-level masks for complex radio galaxies with multiple components. The algorithm is trained on weak class-level labels of radio galaxies to get class activation maps (CAMs). The CAMs are further refined using an inter-pixel relations network (IRNet) to get instance segmentation masks over radio galaxies and the positions of their infrared hosts. We use data from the Australian Square Kilometre Array Pathfinder (ASKAP) telescope, specifically the Evolutionary Map of the Universe (EMU) Pilot Survey, which covered a sky area of 270 square degrees with an RMS sensitivity of 25-35 $\mu$Jy/beam. We demonstrate that weakly-supervised deep learning algorithms can achieve high accuracy in predicting pixel-level information, including masks for the extended radio emission encapsulating all galaxy components and the positions of the infrared host galaxies. We evaluate the performance of our method using mean Average Precision (mAP) across multiple classes at a standard intersection over union (IoU) threshold of 0.5. We show that the model achieves a mAP$_{50}$ of 67.5\% and 76.8\% for radio masks and infrared host positions, respectively. The network architecture can be found at the following link: https://github.com/Nikhel1/Gal-CAM
Summary
A weakly supervised deep learning algorithm reduces the cost of labelling pixel-level masks for complex, multi-component radio galaxies: training on weak class-level labels yields class activation maps (CAMs), which an inter-pixel relations network (IRNet) refines into instance segmentation masks and infrared host positions. On ASKAP EMU Pilot Survey data (270 square degrees, RMS sensitivity 25-35 μJy/beam), the model achieves mAP50 of 67.5% for radio masks and 76.8% for infrared host positions at an IoU threshold of 0.5. The network architecture is available at https://github.com/Nikhel1/Gal-CAM.
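For readers new to CAMs, the weak-label starting point of this pipeline is essentially a weighted sum: the final convolutional feature maps weighted by the classifier weights of the class of interest. A generic sketch (the feature shapes and four-class head are assumptions, and the IRNet refinement is not shown):

```python
import torch

def class_activation_map(features, fc_weight, class_idx):
    """Classic CAM: weight the last conv feature maps by one class's
    classifier weights. features: (B, K, H, W); fc_weight: (C, K)."""
    w = fc_weight[class_idx]                         # (K,)
    cam = torch.einsum("k,bkhw->bhw", w, features)   # weighted sum of maps
    cam = torch.relu(cam)
    return cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)

features = torch.rand(1, 512, 14, 14)   # hypothetical backbone output
fc_weight = torch.randn(4, 512)         # assumed 4 morphology classes
print(class_activation_map(features, fc_weight, class_idx=2).shape)
```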
Improved Multi-Shot Diffusion-Weighted MRI with Zero-Shot Self-Supervised Learning Reconstruction
For: This paper aims to improve the resolution of diffusion-weighted images (DWIs) in magnetic resonance imaging (MRI) by developing a novel multi-shot echo-planar imaging (msEPI) reconstruction approach called zero-MIRID.* Methods: The proposed approach uses deep learning-based image regularization techniques, including convolutional neural network (CNN) denoisers in both k- and image-spaces, and leverages virtual coils to enhance image reconstruction conditioning.* Results: The proposed approach achieves superior results compared to the state-of-the-art parallel imaging method, as demonstrated in an in-vivo experiment.Abstract
Diffusion MRI is commonly performed using echo-planar imaging (EPI) due to its rapid acquisition time. However, the resolution of diffusion-weighted images is often limited by magnetic field inhomogeneity-related artifacts and blurring induced by T2- and T2*-relaxation effects. To address these limitations, multi-shot EPI (msEPI) combined with parallel imaging techniques is frequently employed. Nevertheless, reconstructing msEPI can be challenging due to phase variation between multiple shots. In this study, we introduce a novel msEPI reconstruction approach called zero-MIRID (zero-shot self-supervised learning of Multi-shot Image Reconstruction for Improved Diffusion MRI). This method jointly reconstructs msEPI data by incorporating deep learning-based image regularization techniques. The network incorporates CNN denoisers in both k- and image-spaces, while leveraging virtual coils to enhance image reconstruction conditioning. By employing a self-supervised learning technique and dividing sampled data into three groups, the proposed approach achieves superior results compared to the state-of-the-art parallel imaging method, as demonstrated in an in-vivo experiment.
Summary
Diffusion MRI resolution is limited by magnetic field inhomogeneity artifacts and blurring from T2 and T2* relaxation; multi-shot EPI with parallel imaging helps, but is challenging to reconstruct because of phase variation between shots. Zero-MIRID jointly reconstructs msEPI data with deep-learning-based image regularization, using CNN denoisers in both k-space and image space and virtual coils to improve reconstruction conditioning; with zero-shot self-supervised training that divides the sampled data into three groups, it achieves superior results to a state-of-the-art parallel imaging method in an in-vivo experiment.
DOST – Domain Obedient Self-supervised Training for Multi Label Classification with Noisy Labels
for: This paper targets the problem of annotation noise in multi-label classification (MLC) tasks and proposes a novel approach called Domain Obedient Self-supervised Training (DOST) to mitigate the effect of noise.
methods: The paper uses self-supervised learning and domain guidance to detect offending annotations and deter rule-violating predictions, incorporating domain rules into the learning algorithm to improve the model's alignment with them.
results: Experimental results show that the method improves predictive performance in key metrics, reduces the impact of annotation noise, and makes models better aligned with domain rules.Abstract
The enormous demand for annotated data brought forth by deep learning techniques has been accompanied by the problem of annotation noise. Although this issue has been widely discussed in machine learning literature, it has been relatively unexplored in the context of "multi-label classification" (MLC) tasks which feature more complicated kinds of noise. Additionally, when the domain in question has certain logical constraints, noisy annotations often exacerbate their violations, making such a system unacceptable to an expert. This paper studies the effect of label noise on domain rule violation incidents in the MLC task, and incorporates domain rules into our learning algorithm to mitigate the effect of noise. We propose the Domain Obedient Self-supervised Training (DOST) paradigm which not only makes deep learning models more aligned to domain rules, but also improves learning performance in key metrics and minimizes the effect of annotation noise. This novel approach uses domain guidance to detect offending annotations and deter rule-violating predictions in a self-supervised manner, thus making it more "data efficient" and domain compliant. Empirical studies, performed over two large scale multi-label classification datasets, demonstrate that our method results in improvement across the board, and often entirely counteracts the effect of noise.
Summary
Annotation noise is comparatively unexplored in multi-label classification, where it takes more complicated forms and, in domains with logical constraints, exacerbates rule violations that make a system unacceptable to experts. DOST incorporates domain rules into learning: domain guidance detects offending annotations and deters rule-violating predictions in a self-supervised manner, making the approach more data-efficient and domain compliant. On two large-scale MLC datasets, the method improves performance across the board and often entirely counteracts the effect of noise.
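The abstract does not give the loss in detail, so the sketch below shows one generic way a logical domain rule can enter a multi-label objective: penalizing probability mass that violates an implication "label a implies label b". The rule set and weighting are illustrative assumptions, not the authors' exact formulation:

```python
import torch

def rule_violation_penalty(probs, implications):
    """Sketch of a domain-rule term for multi-label outputs: for each rule
    'label a implies label b', penalize cases where p[a] exceeds p[b].
    probs: (B, L) sigmoid outputs; implications: list of (a, b) index pairs."""
    penalty = probs.new_zeros(())
    for a, b in implications:
        penalty = penalty + torch.relu(probs[:, a] - probs[:, b]).mean()
    return penalty

probs = torch.sigmoid(torch.randn(8, 5))
rules = [(0, 1), (2, 3)]   # hypothetical: label 0 implies label 1, etc.
print(rule_violation_penalty(probs, rules).item())
# In training this term would be added, weighted, to the usual BCE loss.
```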
A degree of image identification at sub-human scales could be possible with more advanced clusters
results: Scaling up data volume and image quality at the same time enables human-level object detection performance at sub-human sizes.Abstract
The purpose of the research is to determine whether currently available self-supervised learning techniques can accomplish human-level comprehension of visual images using the same degree and amount of sensory input that people learn from. Initial research on this topic considered only data volume scaling. Here, we scale both the volume of data and the quality of the images. This scaling experiment is a self-supervised learning method that may be done without any outside financing. We find that scaling up data volume and picture resolution at the same time enables human-level item detection performance at sub-human sizes. We run a scaling experiment with vision transformers trained on up to 200,000 images at up to 256 ppi.
Bayesian Inverse Transition Learning for Offline Settings
paper_authors: Leo Benac, Sonali Parbhoo, Finale Doshi-Velez
for: This paper focuses on sequential decision-making domains such as healthcare and education, where the rewards are known and the transition dynamics $T$ must be estimated from batch data.
methods: The paper proposes a novel constraint-based approach that reliably learns a posterior distribution over the transition dynamics $T$ and reduces the variance of the resulting policies.
results: The results show that using these constraints yields a high-performing policy while considerably reducing the policy's variance across datasets. Combined with uncertainty estimation, the constraints also help infer a partial ranking of actions by return and lead to safer, more informative policies for planning.
Abstract
Offline reinforcement learning is commonly used for sequential decision-making in domains such as healthcare and education, where the rewards are known and the transition dynamics $T$ must be estimated on the basis of batch data. A key challenge for all such tasks is learning a reliable estimate of the transition dynamics $T$ that produces near-optimal policies which are safe enough that they never take actions far from the best action with respect to their value functions, and informative enough that they communicate their uncertainties. Using data from an expert, we propose a new constraint-based approach that captures these desiderata for reliably learning a posterior distribution over the transition dynamics $T$ that is free from gradients. Our results demonstrate that by using our constraints, we learn a high-performing policy while considerably reducing the policy's variance over different datasets. We also explain how combining uncertainty estimation with these constraints can help us infer a partial ranking of actions that produce higher returns, and helps us infer safer and more informative policies for planning.
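A minimal tabular sketch of the constraint idea follows: a Dirichlet posterior over $T$ is formed from batch counts, and only samples under which the expert's logged actions remain near-optimal are kept. The toy MDP, the Dirichlet prior, and the rejection-style filtering are illustrative assumptions, not the paper's exact gradient-free procedure.

```python
# Constrained posterior over transition dynamics in a toy 2-state, 2-action MDP.
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma, eps = 2, 2, 0.9, 0.05
R = np.array([[0.0, 1.0], [1.0, 0.0]])        # known rewards R[s, a]
counts = rng.integers(1, 20, size=(S, A, S))  # stand-in batch transition counts
expert = np.array([1, 0])                     # expert's logged action per state

def q_values(T):
    V = np.zeros(S)
    for _ in range(200):                      # value iteration
        Q = R + gamma * T @ V
        V = Q.max(axis=1)
    return Q

accepted = []
for _ in range(500):
    # sample T from the Dirichlet posterior implied by the batch counts
    T = np.array([[rng.dirichlet(counts[s, a] + 1) for a in range(A)]
                  for s in range(S)])
    Q = q_values(T)
    # constraint: expert actions must be within eps of optimal under sampled T
    if all(Q[s].max() - Q[s, expert[s]] <= eps for s in range(S)):
        accepted.append(T)

print(f"kept {len(accepted)}/500 posterior samples")
```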
An Interpretable and Attention-based Method for Gaze Estimation Using Electroencephalography
results: A comprehensive evaluation shows that the framework outperforms current methods in accuracy and robustness, and the accompanying visualizations help explain the results and improve the efficiency and effectiveness of EEG data analysis.
Abstract
Eye movements can reveal valuable insights into various aspects of human mental processes, physical well-being, and actions. Recently, several datasets have been made available that simultaneously record EEG activity and eye movements. This has triggered the development of various methods to predict gaze direction based on brain activity. However, most of these methods lack interpretability, which limits their technology acceptance. In this paper, we leverage a large dataset of simultaneously measured electroencephalography (EEG) and eye-tracking data, and propose an interpretable model for gaze estimation from EEG. More specifically, we present a novel attention-based deep learning framework for EEG signal analysis, which allows the network to focus on the most relevant information in the signal and discard problematic channels. Additionally, we provide a comprehensive evaluation of the presented framework, demonstrating its superiority over current methods in terms of accuracy and robustness. Finally, the study presents visualizations that explain the results of the analysis and highlights the potential of the attention mechanism for improving the efficiency and effectiveness of EEG data analysis in a variety of applications.
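The sketch below shows one way channel-wise attention can make an EEG model interpretable: learned per-channel weights emphasize informative electrodes, suppress problematic ones, and can be visualized directly. The dimensions, pooling scheme, and regression head are assumptions rather than the paper's architecture.

```python
# Channel-attention gaze regressor: attention weights over EEG electrodes
# are both the pooling mechanism and an interpretability artifact.
import torch
import torch.nn as nn

class ChannelAttentionGaze(nn.Module):
    def __init__(self, n_channels=128, n_samples=500):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(n_samples, 64), nn.Tanh(), nn.Linear(64, 1))
        self.head = nn.Linear(n_samples, 2)     # (x, y) gaze coordinates

    def forward(self, eeg):                     # eeg: (batch, channels, samples)
        attn = torch.softmax(self.score(eeg).squeeze(-1), dim=1)  # per-channel weight
        pooled = (attn.unsqueeze(-1) * eeg).sum(dim=1)            # weighted channel pool
        return self.head(pooled), attn          # attn is inspectable per channel

model = ChannelAttentionGaze()
gaze, attn = model(torch.randn(4, 128, 500))
print(gaze.shape, attn.shape)                   # torch.Size([4, 2]) torch.Size([4, 128])
```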
EEG-based Emotion Style Transfer Network for Cross-dataset Emotion Recognition
paper_authors: Yijin Zhou, Fu Li, Yang Li, Youshuo Ji, Lijian Zhang, Yuanfang Chen, Wenming Zheng, Guangming Shi
for: This paper addresses the cross-dataset problem in EEG-based emotion recognition and proposes an EEG-based Emotion Style Transfer Network (E2STN) to obtain representations that carry the content information of the source domain and the style information of the target domain.
methods: E2STN consists of three modules: a transfer module, a transfer evaluation module, and a discriminative prediction module. The transfer module re-encodes domain-specific information from the source and target domains into new stylized representations, while the transfer evaluation module constrains the generated representations so that the two kinds of complementary information are fused more precisely without distortion.
results: Experiments show that E2STN achieves state-of-the-art performance on cross-dataset EEG emotion recognition tasks.
Abstract
As the key to realizing aBCIs, EEG emotion recognition has been widely studied by many researchers. Previous methods have performed well for intra-subject EEG emotion recognition. However, the style mismatch between source domain (training data) and target domain (test data) EEG samples caused by huge inter-domain differences is still a critical problem for EEG emotion recognition. To solve the problem of cross-dataset EEG emotion recognition, in this paper, we propose an EEG-based Emotion Style Transfer Network (E2STN) to obtain EEG representations that contain the content information of source domain and the style information of target domain, which is called stylized emotional EEG representations. The representations are helpful for cross-dataset discriminative prediction. Concretely, E2STN consists of three modules, i.e., transfer module, transfer evaluation module, and discriminative prediction module. The transfer module encodes the domain-specific information of source and target domains and then re-constructs the source domain's emotional pattern and the target domain's statistical characteristics into the new stylized EEG representations. In this process, the transfer evaluation module is adopted to constrain the generated representations that can more precisely fuse two kinds of complementary information from source and target domains and avoid distorting. Finally, the generated stylized EEG representations are fed into the discriminative prediction module for final classification. Extensive experiments show that the E2STN can achieve the state-of-the-art performance on cross-dataset EEG emotion recognition tasks.
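For orientation, here is a structural sketch of the three-module design; the MLP layers, the feature dimension, and the discriminator-style evaluator are placeholders for illustration, not the published E2STN architecture.

```python
# Three-module skeleton: transfer (style fusion), transfer evaluation
# (constrains the stylized representations), discriminative prediction.
import torch
import torch.nn as nn

class E2STNSketch(nn.Module):
    def __init__(self, feat=310, n_classes=3):   # feat: assumed EEG feature size
        super().__init__()
        self.transfer = nn.Sequential(            # fuses source content + target style
            nn.Linear(2 * feat, 256), nn.ReLU(), nn.Linear(256, feat))
        self.evaluator = nn.Sequential(           # scores realism of stylized EEG
            nn.Linear(feat, 64), nn.ReLU(), nn.Linear(64, 1))
        self.classifier = nn.Sequential(          # discriminative prediction
            nn.Linear(feat, 64), nn.ReLU(), nn.Linear(64, n_classes))

    def forward(self, src, tgt):
        stylized = self.transfer(torch.cat([src, tgt], dim=-1))
        return self.classifier(stylized), self.evaluator(stylized)

model = E2STNSketch()
logits, realism = model(torch.randn(8, 310), torch.randn(8, 310))
```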
Prompting In-Context Operator Learning with Sensor Data, Equations, and Natural Language
results: By integrating human knowledge into the learning process, the method boosts learning performance and reduces data needs, while opening a new path for applying language models.
Abstract
In the growing domain of scientific machine learning, in-context operator learning has demonstrated notable potential in learning operators from prompted data during the inference stage without weight updates. However, the current model's overdependence on sensor data may inadvertently overlook the invaluable human insight into the operator. To address this, we present a transformation of in-context operator learning into a multi-modal paradigm. We propose the use of "captions" to integrate human knowledge about the operator, expressed through natural language descriptions and equations. We illustrate how this method not only broadens the flexibility and generality of physics-informed learning, but also significantly boosts learning performance and reduces data needs. Furthermore, we introduce a more efficient neural network architecture for multi-modal in-context operator learning, referred to as "ICON-LM", based on a language-model-like architecture. We demonstrate the viability of "ICON-LM" for scientific machine learning tasks, which creates a new path for the application of language models.
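To illustrate the multi-modal prompting idea, the sketch below concatenates embedded caption tokens with embedded sensor-data triples and lets a causal transformer predict the quantity of interest at a query point. All module sizes, the toy vocabulary, and the embedding scheme are assumptions; this is not the ICON-LM implementation.

```python
# Multi-modal in-context prompt: [caption tokens | example (x, u(x)) pairs | query].
import torch
import torch.nn as nn

d_model = 64
embed_caption = nn.Embedding(1000, d_model)            # toy text/equation vocabulary
embed_data = nn.Linear(3, d_model)                     # (x, u(x), is_query) triples
decoder = nn.TransformerEncoder(                       # causal decoder stand-in
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
readout = nn.Linear(d_model, 1)

caption_ids = torch.randint(0, 1000, (1, 12))          # e.g. tokens of "du/dt = a*u"
examples = torch.randn(1, 20, 3)                       # in-context (x, u(x)) pairs
query = torch.tensor([[[0.5, 0.0, 1.0]]])              # u unknown at query x = 0.5

tokens = torch.cat([embed_caption(caption_ids),
                    embed_data(examples),
                    embed_data(query)], dim=1)
mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
pred = readout(decoder(tokens, mask=mask))[:, -1]      # predicted QoI at the query
```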
A Novel Method for improving accuracy in neural network by reinstating traditional back propagation technique
results: Compared with existing methods, the proposed approach performs better on benchmark datasets and avoids the vanishing gradient problem. This research presents a promising direction for efficient and effective deep neural network training.
Abstract
Deep learning has revolutionized industries like computer vision, natural language processing, and speech recognition. However, back propagation, the main method for training deep neural networks, faces challenges like computational overhead and vanishing gradients. In this paper, we propose a novel instant parameter update methodology that eliminates the need for computing gradients at each layer. Our approach accelerates learning, avoids the vanishing gradient problem, and outperforms state-of-the-art methods on benchmark data sets. This research presents a promising direction for efficient and effective deep neural network training.
Sound propagation in realistic interactive 3D scenes with parameterized sources using deep neural operators
paper_authors: Nikolas Borrel-Jensen, Somdatta Goswami, Allan P. Engsig-Karup, George Em Karniadakis, Cheol-Ho Jeong
for: Sound-field simulation for applications such as virtual/augmented reality, game audio, and spatial computing.
methods: Deep operator networks are used to approximate the linear wave-equation operator.
results: The method computes sound propagation for arbitrary source and receiver positions in realistic 3D acoustic scenes in milliseconds, with good agreement against reference solutions (root mean squared errors of 0.02 Pa to 0.10 Pa).
Abstract
We address the challenge of sound propagation simulations in 3D virtual rooms with moving sources, which have applications in virtual/augmented reality, game audio, and spatial computing. Solutions to the wave equation can describe wave phenomena such as diffraction and interference. However, simulating them using conventional numerical discretization methods with hundreds of source and receiver positions is intractable, making the simulation of a sound field with moving sources impractical. To overcome this limitation, we propose using deep operator networks to approximate linear wave-equation operators. This enables the rapid prediction of sound propagation in realistic 3D acoustic scenes with moving sources, achieving millisecond-scale computations. By learning a compact surrogate model, we avoid the offline calculation and storage of impulse responses for all relevant source/listener pairs. Our experiments, including various complex scene geometries, show good agreement with reference solutions, with root mean squared errors ranging from 0.02 Pa to 0.10 Pa. Notably, our method signifies a paradigm shift, as no prior machine learning approach has achieved precise predictions of complete wave fields within realistic domains. We anticipate that our findings will drive further exploration of deep neural operator methods, advancing research in immersive user experiences within virtual environments.
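A minimal branch-trunk (DeepONet-style) sketch of such a surrogate: the branch net encodes the parameterized source position, the trunk net encodes a receiver coordinate (x, y, z, t), and their inner product approximates the operator's output pressure. Network widths and input dimensions are illustrative assumptions.

```python
# Branch-trunk operator surrogate: pressure = <branch(source), trunk(coord)>.
import torch
import torch.nn as nn

class DeepONetSketch(nn.Module):
    def __init__(self, src_dim=3, coord_dim=4, p=64):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Linear(src_dim, 128), nn.Tanh(), nn.Linear(128, p))
        self.trunk = nn.Sequential(
            nn.Linear(coord_dim, 128), nn.Tanh(), nn.Linear(128, p))

    def forward(self, src_pos, coords):
        # src_pos: (batch, 3) source position; coords: (batch, n_points, 4)
        b = self.branch(src_pos).unsqueeze(1)   # (batch, 1, p)
        t = self.trunk(coords)                  # (batch, n_points, p)
        return (b * t).sum(-1)                  # pressure at each (x, y, z, t)

model = DeepONetSketch()
pressure = model(torch.randn(2, 3), torch.randn(2, 100, 4))
print(pressure.shape)   # torch.Size([2, 100]); a single forward pass once trained
```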
RadGraph2: Modeling Disease Progression in Radiology Reports via Hierarchical Information Extraction
results: Models trained on RadGraph2 capture a wider variety of findings, including changes in disease state, and perform better on relation extraction than models trained on the original RadGraph dataset.
Abstract
We present RadGraph2, a novel dataset for extracting information from radiology reports that focuses on capturing changes in disease state and device placement over time. We introduce a hierarchical schema that organizes entities based on their relationships and show that using this hierarchy during training improves the performance of an information extraction model. Specifically, we propose a modification to the DyGIE++ framework, resulting in our model HGIE, which outperforms previous models in entity and relation extraction tasks. We demonstrate that RadGraph2 enables models to capture a wider variety of findings and perform better at relation extraction compared to those trained on the original RadGraph dataset. Our work provides the foundation for developing automated systems that can track disease progression over time and develop information extraction models that leverage the natural hierarchy of labels in the medical domain.
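As a hypothetical illustration of a hierarchical label schema of the kind RadGraph2 introduces, the snippet below organizes entity types in a tree and recovers a label's ancestors for hierarchy-aware training; the label names here do not reproduce the actual RadGraph2 taxonomy.

```python
# Toy hierarchical schema: coarse entity types subsume finer ones, so a model
# can fall back to a parent label and training can share statistics across siblings.
SCHEMA = {
    "Observation": {
        "Observation::Present": {},
        "Observation::Absent": {},
        "Observation::Change": {"Improved": {}, "Worsened": {}},
    },
    "Anatomy": {},
    "Device": {"Device::Placed": {}, "Device::Removed": {}},
}

def ancestors(label, tree=SCHEMA, path=()):
    """Return the chain of parent labels, for hierarchy-aware training losses."""
    for name, sub in tree.items():
        if name == label:
            return path
        found = ancestors(label, sub, path + (name,))
        if found is not None:
            return found
    return None

print(ancestors("Worsened"))   # ('Observation', 'Observation::Change')
```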
Collaborative Wideband Spectrum Sensing and Scheduling for Networked UAVs in UTM Systems
results: We build a comprehensive simulation framework that uses the MATLAB LTE Toolbox to generate a near-realistic synthetic dataset. This evaluation methodology provides a flexible framework for producing large spectrum datasets that can be used to develop ML/AI-based spectrum-management solutions.
Abstract
In this paper, we propose a data-driven framework for collaborative wideband spectrum sensing and scheduling for networked unmanned aerial vehicles (UAVs), which act as the secondary users to opportunistically utilize detected spectrum holes. To this end, we propose a multi-class classification problem for wideband spectrum sensing to detect vacant spectrum spots based on collected I/Q samples. To enhance the accuracy of the spectrum sensing module, the outputs from the multi-class classification by each individual UAV are fused at a server in the unmanned aircraft system traffic management (UTM) ecosystem. In the spectrum scheduling phase, we leverage reinforcement learning (RL) solutions to dynamically allocate the detected spectrum holes to the secondary users (i.e., UAVs). To evaluate the proposed methods, we establish a comprehensive simulation framework that generates a near-realistic synthetic dataset using the MATLAB LTE toolbox by incorporating base-station (BS) locations in a chosen area of interest, performing ray-tracing, and emulating the primary users' channel usage in terms of I/Q samples. This evaluation methodology provides a flexible framework to generate large spectrum datasets that could be used for developing ML/AI-based spectrum management solutions for aerial devices.
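A minimal sketch of the sensing-and-fusion pipeline: each UAV classifies subband occupancy from raw I/Q windows with a small CNN, and a server fuses the per-UAV probabilities. The window length, subband count, per-subband binary formulation, and mean fusion are assumptions for illustration; the paper formulates sensing as multi-class classification.

```python
# Per-UAV I/Q occupancy classifier plus simple server-side probability fusion.
import torch
import torch.nn as nn

N_SUBBANDS = 8

class IQSensor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2, 32, 7, stride=2), nn.ReLU(),
            nn.Conv1d(32, 32, 7, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(32, N_SUBBANDS))          # one vacancy logit per subband

    def forward(self, iq):                      # iq: (batch, 2, n_samples) I and Q
        return self.net(iq)

def fuse(per_uav_logits):
    # server-side fusion: average vacancy probabilities across UAVs
    probs = torch.sigmoid(torch.stack(per_uav_logits))
    return probs.mean(dim=0) > 0.5              # fused spectrum-hole decision

uavs = [IQSensor() for _ in range(3)]
iq = torch.randn(1, 2, 1024)                    # one sensing window
holes = fuse([u(iq) for u in uavs])
print(holes)                                    # boolean spectrum-hole map
```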
Kairos: Practical Intrusion Detection and Investigation using Whole-system Provenance
results: Experimental results show that the approach detects attacks accurately and produces concise attack summary graphs that help administrators quickly understand and respond to system intrusions.
Abstract
Provenance graphs are structured audit logs that describe the history of a system's execution. Recent studies have explored a variety of techniques to analyze provenance graphs for automated host intrusion detection, focusing particularly on advanced persistent threats. Sifting through their design documents, we identify four common dimensions that drive the development of provenance-based intrusion detection systems (PIDSes): scope (can PIDSes detect modern attacks that infiltrate across application boundaries?), attack agnosticity (can PIDSes detect novel attacks without a priori knowledge of attack characteristics?), timeliness (can PIDSes efficiently monitor host systems as they run?), and attack reconstruction (can PIDSes distill attack activity from large provenance graphs so that sysadmins can easily understand and quickly respond to system intrusion?). We present KAIROS, the first PIDS that simultaneously satisfies the desiderata in all four dimensions, whereas existing approaches sacrifice at least one and struggle to achieve comparable detection performance. Kairos leverages a novel graph neural network-based encoder-decoder architecture that learns the temporal evolution of a provenance graph's structural changes to quantify the degree of anomalousness for each system event. Then, based on this fine-grained information, Kairos reconstructs attack footprints, generating compact summary graphs that accurately describe malicious activity over a stream of system audit logs. Using state-of-the-art benchmark datasets, we demonstrate that Kairos outperforms previous approaches.
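To make the anomalousness-scoring idea concrete, the sketch below trains node embeddings on benign provenance edges with a negative-sampling objective and flags system events whose edge probability is low. This is a simplified stand-in for KAIROS's temporal GNN encoder-decoder; the sizes and objective are assumptions.

```python
# Learn node embeddings from benign provenance history; score each event
# (edge) by how unlikely it is under the learned model.
import torch
import torch.nn as nn

n_nodes, d = 100, 32
emb = nn.Embedding(n_nodes, d)
opt = torch.optim.Adam(emb.parameters(), lr=1e-2)

benign_edges = torch.randint(0, n_nodes, (2, 500))     # (src, dst) from audit logs
for _ in range(100):
    src, dst = benign_edges
    neg = torch.randint(0, n_nodes, (500,))            # random non-edges
    pos_score = (emb(src) * emb(dst)).sum(-1)
    neg_score = (emb(src) * emb(neg)).sum(-1)
    loss = -torch.log(torch.sigmoid(pos_score)).mean() \
           - torch.log(torch.sigmoid(-neg_score)).mean()
    opt.zero_grad(); loss.backward(); opt.step()

def anomalousness(src, dst):
    """Low edge probability => anomalous event worth including in a summary graph."""
    with torch.no_grad():
        return 1 - torch.sigmoid((emb(src) * emb(dst)).sum(-1))

print(anomalousness(torch.tensor([0]), torch.tensor([1])))
```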