results: Experiments show that, compared with the basic mean-teacher method, the proposed density crop-guided semi-supervised object detection method improves detection accuracy by more than 2%, especially for small objects.
Abstract
One of the important bottlenecks in training modern object detectors is the need for labeled images, where bounding box annotations have to be produced for each object present in the image. This bottleneck is further exacerbated in aerial images, where annotators have to label small objects, often distributed in clusters, on high-resolution images. Recently, the mean-teacher approach, trained with pseudo-labels and weak-strong augmentation consistency, has been gaining popularity for semi-supervised object detection. However, a direct adaptation of such semi-supervised detectors to aerial images, where small clustered objects are often present, might not lead to optimal results. In this paper, we propose a density crop-guided semi-supervised detector that identifies clusters of small objects during training and also exploits them to improve performance at inference. During training, image crops of clusters identified from labeled and unlabeled images are used to augment the training set, which in turn increases the chance of detecting small objects and creating good pseudo-labels for small objects on the unlabeled images. During inference, the detector is not only able to detect the objects of interest but also regions with a high density of small objects (density crops), so that detections from the input image and detections from image crops are combined, resulting in overall more accurate object prediction, especially for small objects. Empirical studies on the popular benchmarks of the VisDrone and DOTA datasets show the effectiveness of our density crop-guided semi-supervised detector, with an average improvement of more than 2% over the basic mean-teacher method in COCO-style AP. Our code is available at: https://github.com/akhilpm/DroneSSOD.
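To make the inference-time fusion concrete, here is a minimal sketch of combining full-image detections with detections from density crops; the crop-offset convention, the `fuse_detections` helper, and the IoU threshold are illustrative assumptions, not the paper's exact implementation.

```python
import torch
from torchvision.ops import nms

def fuse_detections(full_boxes, full_scores, crop_dets, iou_thresh=0.5):
    """crop_dets: list of (x0, y0, boxes, scores), crop-local boxes with the
    crop's top-left corner (x0, y0) given in full-image coordinates."""
    boxes, scores = [full_boxes], [full_scores]
    for x0, y0, b, s in crop_dets:
        b = b.clone()
        b[:, [0, 2]] += x0               # shift crop-local boxes back to image space
        b[:, [1, 3]] += y0
        boxes.append(b)
        scores.append(s)
    boxes, scores = torch.cat(boxes), torch.cat(scores)
    keep = nms(boxes, scores, iou_thresh)  # suppress duplicates across the two sources
    return boxes[keep], scores[keep]
```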
An Empirical Study on Using Large Language Models to Analyze Software Supply Chain Security Failures
results: The study finds that LLMs can effectively characterize software supply chain attacks when the source articles are detailed enough for manual analysts to reach consensus. However, LLMs cannot yet fully replace human analysts; future work can further improve LLM performance in this domain and study a broader range of articles and attacks.
Abstract
As we increasingly depend on software systems, the consequences of breaches in the software supply chain become more severe. High-profile cyber attacks like those on SolarWinds and ShadowHammer have resulted in significant financial and data losses, underlining the need for stronger cybersecurity. One way to prevent future breaches is by studying past failures. However, traditional methods of analyzing these failures require manually reading and summarizing reports about them. Automated support could reduce costs and allow analysis of more failures. Natural Language Processing (NLP) techniques such as Large Language Models (LLMs) could be leveraged to assist the analysis of failures. In this study, we assessed the ability of LLMs to analyze historical software supply chain breaches. We used LLMs to replicate the manual analysis of 69 software supply chain security failures performed by members of the Cloud Native Computing Foundation (CNCF). We developed prompts for LLMs to categorize these by four dimensions: type of compromise, intent, nature, and impact. GPT-3.5's categorizations had an average accuracy of 68%, and Bard's an accuracy of 58%, over these dimensions. We report that LLMs effectively characterize software supply chain failures when the source articles are detailed enough for consensus among manual analysts, but cannot yet replace human analysts. Future work can improve LLM performance in this context, and study a broader range of articles and failures.
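A minimal sketch of this categorization setup, assuming a generic text-in/text-out LLM client; the prompt wording, the `call_llm` stand-in, and the JSON output format are illustrative, not the study's exact prompts.

```python
import json

DIMENSIONS = ["type of compromise", "intent", "nature", "impact"]

PROMPT = """You are analyzing a software supply chain security failure.
Article:
{article}

Classify the failure along these dimensions: {dims}.
Answer with a JSON object containing one key per dimension."""

def categorize(article: str, call_llm) -> dict:
    """call_llm: any text-in/text-out client (e.g., a GPT-3.5 or Bard wrapper)."""
    prompt = PROMPT.format(article=article, dims=", ".join(DIMENSIONS))
    return json.loads(call_llm(prompt))   # assumes the model returns valid JSON
```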
Do Diffusion Models Suffer Error Propagation? Theoretical Analysis and Consistency Regularization
results: Experimental results show that the proposed regularization effectively handles error propagation in diffusion models and significantly improves their performance.
Abstract
While diffusion models have achieved promising performance in data synthesis, they might suffer from error propagation because of their cascade structure, where the distributional mismatch spreads and magnifies through the chain of denoising modules. However, a rigorous analysis is needed, since many sequential models such as Conditional Random Fields (CRFs) are free from error propagation. In this paper, we empirically and theoretically verify that diffusion models are indeed affected by error propagation and we then propose a regularization to address this problem. Our theoretical analysis reveals that the question can be reduced to whether every denoising module of the diffusion model is fault-tolerant. We derive insightful transition equations, indicating that the module cannot recover from input errors and even propagates additional errors to the next module. Our analysis directly leads to a consistency regularization scheme for diffusion models, which explicitly reduces the distribution gap between forward and backward processes. We further introduce a bootstrapping algorithm to reduce the computation cost of the regularizer. Our experimental results on multiple image datasets show that our regularization effectively handles error propagation and significantly improves the performance of vanilla diffusion models.
When and How Does Known Class Help Discover Unknown Ones? Provable Understanding Through Spectral Analysis
results: Experiments show that NSCL matches or outperforms several strong baselines on common benchmark datasets, which is appealing for practical use while enjoying theoretical guarantees.
Abstract
Novel Class Discovery (NCD) aims at inferring novel classes in an unlabeled set by leveraging prior knowledge from a labeled set with known classes. Despite its importance, there is a lack of theoretical foundations for NCD. This paper bridges the gap by providing an analytical framework to formalize and investigate when and how known classes can help discover novel classes. Tailored to the NCD problem, we introduce a graph-theoretic representation that can be learned by a novel NCD Spectral Contrastive Loss (NSCL). Minimizing this objective is equivalent to factorizing the graph's adjacency matrix, which allows us to derive a provable error bound and provide the sufficient and necessary condition for NCD. Empirically, NSCL can match or outperform several strong baselines on common benchmark datasets, which is appealing for practical usage while enjoying theoretical guarantees.
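For intuition, here is the generic spectral contrastive loss that NSCL builds on (minimizing it factorizes the adjacency matrix of the underlying graph); how positive pairs are drawn from known-class labels versus unlabeled data is the NCD-specific part and is simplified away in this sketch.

```python
import torch

def spectral_contrastive_loss(z1, z2):
    """z1, z2: [B, D] embeddings of two views forming B positive pairs."""
    b = z1.size(0)
    pos = -2.0 * (z1 * z2).sum(dim=1).mean()        # attract positive pairs
    sim = z1 @ z2.t()                               # [B, B] cross-view similarities
    off_diag = sim.pow(2).sum() - sim.diagonal().pow(2).sum()
    neg = off_diag / (b * (b - 1))                  # repel non-matching pairs
    return pos + neg
```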
An Empirical Study of Bugs in Open-Source Federated Learning Framework
paper_authors: Weijie Shao, Yuyang Gao, Fu Song, Sen Chen, Lingling Fan
for: This study investigates the security issues in federated learning (FL) frameworks.
methods: The study manually collects, classifies, and labels 1,112 FL framework bugs from 12 open-source FL frameworks on GitHub, and constructs taxonomies of 15 symptoms, 12 root causes, and 20 fix patterns of these bugs.
results: The study presents nine findings covering the 15 symptoms, 12 root causes, and 20 fix patterns, and analyzes their correlations and distributions over 23 logical components and two main application scenarios.
Abstract
Federated learning (FL), as a decentralized machine learning solution to the protection of users' private data, has become an important learning paradigm in recent years, especially since the enforcement of stricter laws and regulations in most countries. Therefore, a variety of FL frameworks are released to facilitate the development and application of federated learning. Despite the considerable amount of research on the security and privacy of FL models and systems, the security issues in FL frameworks have not been systematically studied yet. In this paper, we conduct the first empirical study on 1,112 FL framework bugs to investigate their characteristics. These bugs are manually collected, classified, and labeled from 12 open-source FL frameworks on GitHub. In detail, we construct taxonomies of 15 symptoms, 12 root causes, and 20 fix patterns of these bugs and investigate their correlations and distributions on 23 logical components and two main application scenarios. From the results of our study, we present nine findings, discuss their implications, and propound several suggestions to FL framework developers and security researchers on the FL frameworks.
Multi-Class Deep SVDD: Anomaly Detection Approach in Astronomy with Distinct Inlier Categories
paper_authors: Manuel Pérez-Carrasco, Guillermo Cabrera-Vives, Lorena Hernández-García, Francisco Forster, Paula Sánchez-Sáez, Alejandra Muñoz Arancibia, Nicolás Astorga, Franz Bauer, Amelia Bayo, Martina Cádiz-Leyton, Marcio Catelan
methods: The paper proposes a new algorithm called Multi-Class Deep Support Vector Data Description (MCDSVDD), an extension of One-Class Deep SVDD that can handle different inlier categories. MCDSVDD uses a neural network to map the data into hyperspheres, where each hypersphere represents a specific inlier category.
results: The results show that MCDSVDD effectively detects anomalous sources in astronomical data while leveraging the presence of different inlier categories.
Abstract
With the increasing volume of astronomical data generated by modern survey telescopes, automated pipelines and machine learning techniques have become crucial for analyzing and extracting knowledge from these datasets. Anomaly detection, i.e. the task of identifying irregular or unexpected patterns in the data, is a complex challenge in astronomy. In this paper, we propose Multi-Class Deep Support Vector Data Description (MCDSVDD), an extension of the state-of-the-art anomaly detection algorithm One-Class Deep SVDD, specifically designed to handle different inlier categories with distinct data distributions. MCDSVDD uses a neural network to map the data into hyperspheres, where each hypersphere represents a specific inlier category. The distance of each sample from the centers of these hyperspheres determines the anomaly score. We evaluate the effectiveness of MCDSVDD by comparing its performance with several anomaly detection algorithms on a large dataset of astronomical light-curves obtained from the Zwicky Transient Facility. Our results demonstrate the efficacy of MCDSVDD in detecting anomalous sources while leveraging the presence of different inlier categories. The code and the data needed to reproduce our results are publicly available at https://github.com/mperezcarrasco/AnomalyALeRCE.
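A minimal sketch of the scoring rule described above, assuming the hypersphere centers have already been estimated; the training objective and center estimation are omitted.

```python
import torch

def anomaly_score(encoder, x, centers):
    """encoder: maps a batch x to [B, D] latents; centers: [C, D], one
    hypersphere center per inlier category."""
    z = encoder(x)
    dists = torch.cdist(z, centers)        # [B, C] distance to every center
    return dists.min(dim=1).values         # far from all centers => anomalous
```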
Transferable Models for Bioacoustics with Human Language Supervision
results: When fine-tuned, BioLingual sets a new state-of-the-art on nine tasks in the Benchmark of Animal Sounds. In addition, the model can retrieve animal vocalization recordings from natural-language queries and can be applied scalably across different species and environments.
Abstract
Passive acoustic monitoring offers a scalable, non-invasive method for tracking global biodiversity and anthropogenic impacts on species. Although deep learning has become a vital tool for processing this data, current models are inflexible, typically cover only a handful of species, and are limited by data scarcity. In this work, we propose BioLingual, a new model for bioacoustics based on contrastive language-audio pretraining. We first aggregate bioacoustic archives into a language-audio dataset, called AnimalSpeak, with over a million audio-caption pairs holding information on species, vocalization context, and animal behavior. After training on this dataset to connect language and audio representations, our model can identify over a thousand species' calls across taxa, complete bioacoustic tasks zero-shot, and retrieve animal vocalization recordings from natural text queries. When fine-tuned, BioLingual sets a new state-of-the-art on nine tasks in the Benchmark of Animal Sounds. Given its broad taxa coverage and ability to be flexibly queried in human language, we believe this model opens new paradigms in ecological monitoring and research, including free-text search on the world's acoustic monitoring archives. We open-source our models, dataset, and code.
Adversarial ModSecurity: Countering Adversarial SQL Injections with Robust Machine Learning
results: Experimental results show that AdvModSec improves ModSecurity's detection accuracy and robustness: it raises the detection rate by 21% and improves robustness against adversarial SQLi attacks by 42%.
Abstract
ModSecurity is widely recognized as the standard open-source Web Application Firewall (WAF), maintained by the OWASP Foundation. It detects malicious requests by matching them against the Core Rule Set, identifying well-known attack patterns. Each rule in the CRS is manually assigned a weight, based on the severity of the corresponding attack, and a request is detected as malicious if the sum of the weights of the firing rules exceeds a given threshold. In this work, we show that this simple strategy is largely ineffective for detecting SQL injection (SQLi) attacks, as it tends to block many legitimate requests, while also being vulnerable to adversarial SQLi attacks, i.e., attacks intentionally manipulated to evade detection. To overcome these issues, we design a robust machine learning model, named AdvModSec, which uses the CRS rules as input features, and it is trained to detect adversarial SQLi attacks. Our experiments show that AdvModSec, being trained on the traffic directed towards the protected web services, achieves a better trade-off between detection and false positive rates, improving the detection rate of the vanilla version of ModSecurity with CRS by 21%. Moreover, our approach is able to improve its adversarial robustness against adversarial SQLi attacks by 42%, thereby taking a step forward towards building more robust and trustworthy WAFs.
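A minimal sketch of the feature construction the abstract describes, where each request becomes a binary vector over CRS rules; the rule subset, toy data, and classifier choice below are illustrative assumptions rather than AdvModSec's exact setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

CRS_RULE_IDS = [942100, 942190, 942260]          # hypothetical subset of SQLi rules

def to_features(fired_rules):
    """fired_rules: set of CRS rule IDs triggered for one HTTP request."""
    return np.array([int(r in fired_rules) for r in CRS_RULE_IDS])

# Toy examples: which rules fired for each logged request, and its label.
X = np.stack([to_features(f) for f in [{942100}, set(), {942190, 942260}]])
y = np.array([1, 0, 1])                          # 1 = (adversarial) SQLi, 0 = benign
clf = RandomForestClassifier(random_state=0).fit(X, y)
```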
CasCIFF: A Cross-Domain Information Fusion Framework Tailored for Cascade Prediction in Social Networks
methods: The study proposes the Cross-Domain Information Fusion Framework (CasCIFF), which exploits multi-hop neighborhood information to make user embeddings robust. When embedding cascades, the framework intentionally incorporates timestamps to capture the evolving trends of information diffusion.
results: The study shows that CasCIFF better captures the complex relations between user behavior and information diffusion, and delivers superior performance on cascade prediction tasks.
Abstract
Existing approaches for information cascade prediction fall into three main categories: feature-driven methods, point process-based methods, and deep learning-based methods. Among them, deep learning-based methods, characterized by superior learning and representation capabilities, mitigate the shortcomings inherent in the other methods. However, current deep learning methods still face several persistent challenges. In particular, accurate representation of user attributes remains problematic due to factors such as fake followers and complex network configurations. Previous algorithms that focus on the sequential order of user activations often neglect the rich insights offered by activation timing. Furthermore, these techniques often fail to holistically integrate temporal and structural aspects, thus missing the nuanced propagation trends inherent in information cascades. To address these issues, we propose the Cross-Domain Information Fusion Framework (CasCIFF), which is tailored for information cascade prediction. This framework exploits multi-hop neighborhood information to make user embeddings robust. When embedding cascades, the framework intentionally incorporates timestamps, endowing it with the ability to capture evolving patterns of information diffusion. In particular, CasCIFF seamlessly integrates the tasks of user classification and cascade prediction into a consolidated framework, thereby allowing the extraction of common features that prove useful for all tasks, a strategy anchored in the principles of multi-task learning.
Representation Learning for Audio Privacy Preservation using Source Separation and Robust Adversarial Learning
results: Our results show that, compared with systems without source separation, without adversarial learning, and without both, the proposed system significantly improves speech privacy preservation while maintaining good performance on the acoustic monitoring task.
Abstract
Privacy preservation has long been a concern in smart acoustic monitoring systems, where speech can be passively recorded along with a target signal in the system's operating environment. In this study, we propose the integration of two commonly used approaches in privacy preservation: source separation and adversarial representation learning. The proposed system learns the latent representation of audio recordings such that it prevents differentiating between speech and non-speech recordings. Initially, the source separation network filters out some of the privacy-sensitive data, and during the adversarial learning process, the system will learn privacy-preserving representation on the filtered signal. We demonstrate the effectiveness of our proposed method by comparing our method against systems without source separation, without adversarial learning, and without both. Overall, our results suggest that the proposed system can significantly improve speech privacy preservation compared to that of using source separation or adversarial learning solely while maintaining good performance in the acoustic monitoring task.
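One common way to realize the adversarial part is a gradient reversal layer, sketched below under that assumption: the encoder is trained against a speech/non-speech discriminator so the learned representation stops carrying speech-presence information. The source separation front-end is omitted.

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, grad):
        return -grad                     # flip gradients flowing back to the encoder

def adversarial_privacy_loss(encoder, discriminator, audio, has_speech):
    """has_speech: [B, 1] float labels; the discriminator tries to detect speech,
    the reversed gradient trains the encoder to hide it."""
    z = encoder(audio)
    logits = discriminator(GradReverse.apply(z))
    return F.binary_cross_entropy_with_logits(logits, has_speech)
```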
Improving Autonomous Separation Assurance through Distributed Reinforcement Learning with Attention Networks
results: A numerical study shows that the proposed framework ensures safe and efficient aircraft separation in high-density, dynamic environments, and can handle various sources of uncertainty in the information.
Abstract
Advanced Air Mobility (AAM) introduces a new, efficient mode of transportation with the use of vehicle autonomy and electrified aircraft to provide increasingly autonomous transportation between previously underserved markets. Safe and efficient navigation of low altitude aircraft through highly dense environments requires the integration of a multitude of complex observations, such as surveillance, knowledge of vehicle dynamics, and weather. The processing and reasoning on these observations pose challenges due to the various sources of uncertainty in the information while ensuring cooperation with a variable number of aircraft in the airspace. These challenges coupled with the requirement to make safety-critical decisions in real-time rule out the use of conventional separation assurance techniques. We present a decentralized reinforcement learning framework to provide autonomous self-separation capabilities within AAM corridors with the use of speed and vertical maneuvers. The problem is formulated as a Markov Decision Process and solved by developing a novel extension to the sample-efficient, off-policy soft actor-critic (SAC) algorithm. We introduce the use of attention networks for variable-length observation processing and a distributed computing architecture to achieve high training sample throughput as compared to existing approaches. A comprehensive numerical study shows that the proposed framework can ensure safe and efficient separation of aircraft in high density, dynamic environments with various sources of uncertainty.
Variations on the Reinforcement Learning performance of Blackjack
results: The study finds that a card counter perfectly using the basic strategy and the hi-lo system can bring the house to bankruptcy in blackjack, and examines how environment variations impact this outcome. The learning convergence rate of the q-learning algorithm is also studied as a function of deck size.
Abstract
Blackjack or "21" is a popular card-based game of chance and skill. The objective of the game is to win by obtaining a hand total higher than the dealer's without exceeding 21. The ideal blackjack strategy will maximize financial return in the long run while avoiding gambler's ruin. The stochastic environment and inherent reward structure of blackjack present an appealing problem to better understand reinforcement learning agents in the presence of environment variations. Here we consider a q-learning solution for optimal play and investigate the rate of learning convergence of the algorithm as a function of deck size. A blackjack simulator allowing for universal blackjack rules is also implemented to demonstrate the extent to which a card counter perfectly using the basic strategy and hi-lo system can bring the house to bankruptcy and how environment variations impact this outcome. The novelty of our work is to place this conceptual understanding of the impact of deck size in the context of learning agent convergence.
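A minimal tabular q-learning sketch of the kind the study describes: states are (player total, dealer upcard, usable ace) tuples and actions are stand/hit. The state encoding, hyperparameters, and the environment driving the updates are illustrative assumptions.

```python
import random
from collections import defaultdict

Q = defaultdict(lambda: [0.0, 0.0])       # Q[state] = [value(stand), value(hit)]
alpha, gamma, eps = 0.1, 1.0, 0.1         # illustrative hyperparameters

def q_update(s, a, r, s_next, done):
    """One temporal-difference update after observing (s, a, r, s_next)."""
    target = r if done else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])

def policy(s):
    """Epsilon-greedy action selection over {0: stand, 1: hit}."""
    if random.random() < eps:
        return random.randrange(2)
    return max((0, 1), key=lambda a: Q[s][a])
```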
Performance Analysis of Transformer Based Models (BERT, ALBERT and RoBERTa) in Fake News Detection
paper_authors: Shafna Fitria Nur Azizah, Hasan Dwi Cahyono, Sari Widya Sihwi, Wisnu Widiarto
for: This study explores the use of transformer models for fake news detection, aiming to improve detection accuracy.
methods: The study evaluates transformer models, including the improved BERT variants ALBERT and RoBERTa, for fake news detection.
results: The ALBERT model achieved 87.6% accuracy, 86.9% precision, an 86.9% F1-score, and a run-time of 174.5 s/epoch.
Abstract
Fake news is fake material in a news media format that is not processed properly by news agencies. The fake material can provoke or defame significant entities or individuals, or potentially even serve the personal interests of the creators, causing problems for society. Distinguishing fake news from real news is challenging due to limited domain knowledge and time constraints. According to the survey, the top three areas where residents are most exposed to hoaxes and misinformation are Banten, DKI Jakarta, and West Java. The transformer model refers to an approach in the field of artificial intelligence (AI) for natural language processing utilizing deep learning architectures. Transformers exercise a powerful attention mechanism to process text in parallel and produce rich and contextual word representations. A previous study indicates superior performance of a transformer model known as BERT over non-transformer approaches. However, some studies suggest the performance can be improved with the use of improved BERT models known as ALBERT and RoBERTa. These modified BERT models are not yet well explored for detecting fake news in Bahasa Indonesia. In this research, we explored those transformer models and found that ALBERT outperformed the other models with 87.6% accuracy, 86.9% precision, an 86.9% F1-score, and a run-time of 174.5 s/epoch. Source code available at: https://github.com/Shafna81/fakenewsdetection.git
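A minimal sketch of fine-tuning ALBERT for binary fake-news classification with the Hugging Face Transformers API; the placeholder dataset and training arguments are assumptions, not the paper's exact configuration.

```python
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("albert-base-v2")
model = AutoModelForSequenceClassification.from_pretrained("albert-base-v2",
                                                           num_labels=2)

# Placeholder data; the paper uses an Indonesian fake-news corpus.
train_ds = Dataset.from_dict({"text": ["contoh berita", "berita palsu"],
                              "label": [0, 1]})

def encode(batch):
    return tok(batch["text"], truncation=True, padding="max_length", max_length=128)

args = TrainingArguments(output_dir="out", num_train_epochs=3,
                         per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args,
                  train_dataset=train_ds.map(encode, batched=True))
trainer.train()
```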
Methods for Acquiring and Incorporating Knowledge into Stock Price Prediction: A Survey
for: This survey aims to systematically review and summarize the integration of external knowledge into stock price prediction to improve accuracy.
methods: The survey covers non-graph-based and graph-based external knowledge, including text, multimedia descriptions, and interconnection relations in the stock market.
results: The survey presents a systematic description of methods for acquiring external knowledge from various unstructured data sources and fusing it with historical price features, and also compiles relevant datasets and discusses potential future research directions.
Abstract
Predicting stock prices presents a challenging research problem due to the inherent volatility and non-linear nature of the stock market. In recent years, knowledge-enhanced stock price prediction methods have shown groundbreaking results by utilizing external knowledge to understand the stock market. Despite the importance of these methods, there is a scarcity of scholarly works that systematically synthesize previous studies from the perspective of external knowledge types. Specifically, the external knowledge can be modeled in different data structures, which we group into non-graph-based formats and graph-based formats: 1) non-graph-based knowledge captures contextual information and multimedia descriptions specifically associated with an individual stock; 2) graph-based knowledge captures interconnected and interdependent information in the stock market. This survey paper aims to provide a systematic and comprehensive description of methods for acquiring external knowledge from various unstructured data sources and then incorporating it into stock price prediction models. We also explore fusion methods for combining external knowledge with historical price features. Moreover, this paper includes a compilation of relevant datasets and delves into potential future research directions in this domain.
Differentially Private Graph Neural Network with Importance-Grained Noise Adaption
for: Protecting the privacy of graph data, particularly node data, when nodes represent personal and sensitive information.
methods: The paper proposes a differentially private graph neural network (GNN) algorithm, named NAP-GNN, which includes a topology-based node importance estimation (TNIE) method, an adaptive private aggregation method, and private training of the graph learning algorithm with an adaptive residual connection mode.
results: Theoretical analysis shows that NAP-GNN satisfies privacy guarantees, and empirical experiments on real-world graph datasets show that NAP-GNN achieves a better trade-off between privacy and accuracy.
Abstract
Graph Neural Networks (GNNs) with differential privacy have been proposed to preserve graph privacy when nodes represent personal and sensitive information. However, the existing methods ignore that nodes with different importance may yield diverse privacy demands, which may lead to over-protect some nodes and decrease model utility. In this paper, we study the problem of importance-grained privacy, where nodes contain personal data that need to be kept private but are critical for training a GNN. We propose NAP-GNN, a node-importance-grained privacy-preserving GNN algorithm with privacy guarantees based on adaptive differential privacy to safeguard node information. First, we propose a Topology-based Node Importance Estimation (TNIE) method to infer unknown node importance with neighborhood and centrality awareness. Second, an adaptive private aggregation method is proposed to perturb neighborhood aggregation from node-importance-grain. Third, we propose to privately train a graph learning algorithm on perturbed aggregations in adaptive residual connection mode over multi-layers convolution for node-wise tasks. Theoretically analysis shows that NAP-GNN satisfies privacy guarantees. Empirical experiments over real-world graph datasets show that NAP-GNN achieves a better trade-off between privacy and accuracy.
Analyzing the Effect of Data Impurity on the Detection Performances of Mental Disorders
results: The study finds that removing such data impurity significantly improves detection performance for major depressive disorder (MDD) and post-traumatic stress disorder (PTSD).
Abstract
The primary method for identifying mental disorders automatically has traditionally involved using binary classifiers. These classifiers are trained using behavioral data obtained from an interview setup. In this training process, data from individuals with the specific disorder under consideration are categorized as the positive class, while data from all other participants constitute the negative class. In practice, it is widely recognized that certain mental disorders share similar symptoms, causing the collected behavioral data to encompass a variety of attributes associated with multiple disorders. Consequently, attributes linked to the targeted mental disorder might also be present within the negative class. This data impurity may lead to sub-optimal training of the classifier for the mental disorder of interest. In this study, we investigate this hypothesis in the context of major depressive disorder (MDD) and post-traumatic stress disorder (PTSD) detection. The results show that upon removal of such data impurity, MDD and PTSD detection performances are significantly improved.
An In-Depth Analysis of Discretization Methods for Communication Learning using Backpropagation with Multi-Agent Reinforcement Learning
paper_authors: Astrid Vanneste, Simon Vanneste, Kevin Mets, Tom De Schepper, Siegfried Mercelis, Peter Hellinckx
for: This paper compares the performance of different discretization methods for communication learning in multi-agent reinforcement learning, and presents a communication learning approach based on DIAL and COMA.
methods: The paper evaluates several state-of-the-art discretization methods, as well as a novel method named ST-DRU.
results: The results show that ST-DRU performs best across the different environments: it achieves the best or close-to-best performance in every experiment and is the only method that does not fail in any of the tested environments.
Abstract
Communication is crucial in multi-agent reinforcement learning when agents are not able to observe the full state of the environment. The most common approach to allow learned communication between agents is the use of a differentiable communication channel that allows gradients to flow between agents as a form of feedback. However, this is challenging when we want to use discrete messages to reduce the message size, since gradients cannot flow through a discrete communication channel. Previous work proposed methods to deal with this problem. However, these methods are tested in different communication learning architectures and environments, making it hard to compare them. In this paper, we compare several state-of-the-art discretization methods as well as a novel approach. We do this comparison in the context of communication learning using gradients from other agents and perform tests on several environments. In addition, we present COMA-DIAL, a communication learning approach based on DIAL and COMA extended with learning rate scaling and adapted exploration. Using COMA-DIAL allows us to perform experiments on more complex environments. Our results show that the novel ST-DRU method, proposed in this paper, achieves the best results out of all discretization methods across the different environments. It achieves the best or close to the best performance in each of the experiments and is the only method that does not fail on any of the tested environments.
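For intuition, here is a straight-through discretization sketch of the general mechanism such methods rely on: messages are binarized in the forward pass while gradients pass through unchanged, so DIAL-style end-to-end training still works. ST-DRU's exact formulation differs; this is only illustrative, including the DRU-style training noise.

```python
import torch

class STBinarize(torch.autograd.Function):
    @staticmethod
    def forward(ctx, m):
        return (m > 0).float()           # discrete message actually sent
    @staticmethod
    def backward(ctx, grad):
        return grad                      # identity gradient (straight-through)

def send_message(raw_message, training, noise_std=1.0):
    # DRU-style noise during training regularizes messages toward discreteness.
    if training:
        raw_message = raw_message + noise_std * torch.randn_like(raw_message)
    return STBinarize.apply(raw_message)
```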
JEDI: Joint Expert Distillation in a Semi-Supervised Multi-Dataset Student-Teacher Scenario for Video Action Recognition
results: Validated on four video action recognition datasets, the experiments show that simultaneously considering all datasets within a unified semi-supervised setting yields significant improvements over the initial expert models.
Abstract
We propose JEDI, a multi-dataset semi-supervised learning method, which efficiently combines knowledge from multiple experts, learned on different datasets, to train and improve the performance of individual, per dataset, student models. Our approach achieves this by addressing two important problems in current machine learning research: generalization across datasets and limitations of supervised training due to scarcity of labeled data. We start with an arbitrary number of experts, pretrained on their own specific dataset, which form the initial set of student models. The teachers are immediately derived by concatenating the feature representations from the penultimate layers of the students. We then train all models in a student-teacher semi-supervised learning scenario until convergence. In our efficient approach, student-teacher training is carried out jointly and end-to-end, showing that both students and teachers improve their generalization capacity during training. We validate our approach on four video action recognition datasets. By simultaneously considering all datasets within a unified semi-supervised setting, we demonstrate significant improvements over the initial experts.
Deep Learning-Based Prediction of Fractional Flow Reserve along the Coronary Artery
paper_authors: Nils Hampe, Sanne G. M. van Velzen, Jean-Paul Aben, Carlos Collet, Ivana Išgum
for: This paper aims to develop a deep learning-based method for predicting fractional flow reserve (FFR) values along the coronary arteries from coronary computed tomography angiography (CCTA) scans.
methods: The proposed method uses a variational autoencoder to characterize the artery and a convolutional neural network (CNN) to predict the FFR values. The CNN is supervised by multiple loss functions, including a loss inspired by the Earth Mover's Distance (EMD) to predict the correct location of FFR drops and a histogram-based loss to explicitly supervise the slope of the FFR curve.
results: The resulting FFR curves show good agreement with the reference, allowing the distinction between diffuse and focal coronary artery disease (CAD) distributions in most cases. The mean absolute difference in the area under the FFR pullback curve (AUPC) was 1.7.
Abstract
Functionally significant coronary artery disease (CAD) is caused by plaque buildup in the coronary arteries, potentially leading to narrowing of the arterial lumen, i.e. coronary stenosis, that significantly obstructs blood flow to the myocardium. The current reference for establishing the presence of a functionally significant stenosis is invasive fractional flow reserve (FFR) measurement. To avoid invasive measurements, non-invasive prediction of FFR from coronary CT angiography (CCTA) has emerged. For this, machine learning approaches, characterized by fast inference, are increasingly developed. However, these methods predict a single FFR value per artery i.e. they don't provide information about the stenosis location or treatment strategy. We propose a deep learning-based method to predict the FFR along the artery from CCTA scans. This study includes CCTA images of 110 patients who underwent invasive FFR pullback measurement in 112 arteries. First, a multi planar reconstruction (MPR) of the artery is fed to a variational autoencoder to characterize the artery, i.e. through the lumen area and unsupervised artery encodings. Thereafter, a convolutional neural network (CNN) predicts the FFR along the artery. The CNN is supervised by multiple loss functions, notably a loss function inspired by the Earth Mover's Distance (EMD) to predict the correct location of FFR drops and a histogram-based loss to explicitly supervise the slope of the FFR curve. To train and evaluate our model, eight-fold cross-validation was performed. The resulting FFR curves show good agreement with the reference allowing the distinction between diffuse and focal CAD distributions in most cases. Quantitative evaluation yielded a mean absolute difference in the area under the FFR pullback curve (AUPC) of 1.7. The method may pave the way towards fast, accurate, automatic prediction of FFR along the artery from CCTA.
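For a 1-D curve, an EMD-style loss reduces to the L1 distance between cumulative sums, which penalizes FFR drops predicted at the wrong location along the artery. The sketch below shows this generic form; the paper's exact loss may be weighted differently.

```python
import torch

def emd_loss_1d(pred, target):
    """pred, target: [B, L] non-negative per-position drop magnitudes along the
    artery; for 1-D sequences EMD is the L1 distance between cumulative sums."""
    return (torch.cumsum(pred, dim=1) - torch.cumsum(target, dim=1)).abs().mean()
```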
GraphCC: A Practical Graph Learning-based Approach to Congestion Control in Datacenters
results: In the evaluation, GraphCC performs well across a wide variety of scenarios, especially new ones unseen during training (e.g., new traffic workloads, failures, upgrades), outperforming a state-of-the-art MARL-based baseline (ACC) in all scenarios, with improvements of up to 20% in Flow Completion Time and significant reductions in buffer occupancy.
Abstract
Congestion Control (CC) plays a fundamental role in optimizing traffic in Data Center Networks (DCN). Currently, DCNs mainly implement two main CC protocols: DCTCP and DCQCN. Both protocols -- and their main variants -- are based on Explicit Congestion Notification (ECN), where intermediate switches mark packets when they detect congestion. The ECN configuration is thus a crucial aspect on the performance of CC protocols. Nowadays, network experts set static ECN parameters carefully selected to optimize the average network performance. However, today's high-speed DCNs experience quick and abrupt changes that severely change the network state (e.g., dynamic traffic workloads, incast events, failures). This leads to under-utilization and sub-optimal performance. This paper presents GraphCC, a novel Machine Learning-based framework for in-network CC optimization. Our distributed solution relies on a novel combination of Multi-agent Reinforcement Learning (MARL) and Graph Neural Networks (GNN), and it is compatible with widely deployed ECN-based CC protocols. GraphCC deploys distributed agents on switches that communicate with their neighbors to cooperate and optimize the global ECN configuration. In our evaluation, we test the performance of GraphCC under a wide variety of scenarios, focusing on the capability of this solution to adapt to new scenarios unseen during training (e.g., new traffic workloads, failures, upgrades). We compare GraphCC with a state-of-the-art MARL-based solution for ECN tuning -- ACC -- and observe that our proposed solution outperforms the state-of-the-art baseline in all of the evaluation scenarios, showing improvements up to 20% in Flow Completion Time as well as significant reductions in buffer occupancy (38.0-85.7%).
Towards true discovery of the differential equations
results: The paper explores the prerequisites and tools for independent equation discovery without expert input, and addresses the challenge of assessing the adequacy of discovered equations when the correct equation is unknown.
Abstract
Differential equation discovery, a machine learning subfield, is used to develop interpretable models, particularly in nature-related applications. By expertly incorporating the general parametric form of the equation of motion and appropriate differential terms, algorithms can autonomously uncover equations from data. This paper explores the prerequisites and tools for independent equation discovery without expert input, eliminating the need for equation form assumptions. We focus on addressing the challenge of assessing the adequacy of discovered equations when the correct equation is unknown, with the aim of providing insights for reliable equation discovery without prior knowledge of the equation form.
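For readers unfamiliar with the task, the sketch below illustrates the sparse-regression view of equation discovery on a toy ODE; the paper targets assumption-free discovery, so this generic SINDy-style example is only meant to show what recovering an equation from data means.

```python
import numpy as np

t = np.linspace(0, 10, 1000)
u = np.exp(-0.5 * t)                      # toy data generated by du/dt = -0.5 * u
du = np.gradient(u, t)                    # numerical time derivative

# Candidate term library [1, u, u^2]; the "discovered" equation is a sparse fit.
library = np.stack([np.ones_like(u), u, u ** 2], axis=1)
coef, *_ = np.linalg.lstsq(library, du, rcond=None)
coef[np.abs(coef) < 0.05] = 0.0           # hard threshold enforces sparsity
print(coef)                               # approx. [0, -0.5, 0] -> du/dt = -0.5 * u
```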
Unleashing the Power of Extra-Tree Feature Selection and Random Forest Classifier for Improved Survival Prediction in Heart Failure Patients
paper_authors: Md. Simul Hasan Talukder, Rejwan Bin Sulaiman, Mouli Bardhan Paul Angon
for: The paper aims to improve survival prediction in heart failure patients by leveraging data pre-processing techniques and the Extra-Tree (ET) feature selection method in conjunction with the Random Forest (RF) classifier.
methods: The paper uses the public UCL Heart failure (HF) survival dataset and employs the ET feature selection algorithm to identify the most informative features. These features are then used as input for a grid search of RF.
results: The approach achieved 98.33% accuracy, which is the highest over existing work.
Abstract
Heart failure is a life-threatening condition that affects millions of people worldwide. The ability to accurately predict patient survival can aid in early intervention and improve patient outcomes. In this study, we explore the potential of utilizing data pre-processing techniques and the Extra-Tree (ET) feature selection method in conjunction with the Random Forest (RF) classifier to improve survival prediction in heart failure patients. By leveraging the strengths of ET feature selection, we aim to identify the most significant predictors associated with heart failure survival. Using the public UCL Heart failure (HF) survival dataset, we employ the ET feature selection algorithm to identify the most informative features. These features are then used as input for a grid search of RF. Finally, the tuned RF model was trained and evaluated using different metrics. The approach achieved 98.33% accuracy, which is the highest over existing work.
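A minimal sketch of the described pipeline with scikit-learn: Extra-Trees importance-based feature selection followed by a grid search over a Random Forest. The toy data and grid are placeholders for the heart-failure dataset and the paper's actual search space.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=12, random_state=0)

# Extra-Trees importances pick the most informative features.
selector = SelectFromModel(ExtraTreesClassifier(n_estimators=100, random_state=0))
X_sel = selector.fit_transform(X, y)

# Grid search over the Random Forest on the selected features.
grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    {"n_estimators": [100, 300], "max_depth": [None, 10]}, cv=5)
grid.fit(X_sel, y)
print(grid.best_params_, grid.best_score_)
```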
Targeted and Troublesome: Tracking and Advertising on Children’s Websites
paper_authors: Zahra Moti, Asuman Senol, Hamid Bostani, Frederik Zuiderveen Borgesius, Veelasha Moonsamy, Arunesh Mathur, Gunes Acar
for: The paper focuses on measuring tracking and targeted advertising on websites directed at children.
methods: The authors use a multilingual classifier based on web page titles and descriptions to identify child-directed websites, crawl these websites from five vantage points to measure the prevalence of trackers, fingerprinting scripts, and advertisements, and develop an ML pipeline that processes both images and text extracted from ads to identify improper ads on child-directed websites.
results: Around 90% of child-directed websites embed one or more trackers, and about 27% contain targeted advertisements. Improper ads on child-directed websites include ads for dating, weight loss, and mental health, as well as sex toys and flirting chat services, indicating a trend of non-compliance with privacy regulations and troubling ad safety practices among many advertisers and child-directed websites.
Abstract
On the modern web, trackers and advertisers frequently construct and monetize users' detailed behavioral profiles without consent. Despite various studies on web tracking mechanisms and advertisements, there has been no rigorous study focusing on websites targeted at children. To address this gap, we present a measurement of tracking and (targeted) advertising on websites directed at children. Motivated by lacking a comprehensive list of child-directed (i.e., targeted at children) websites, we first build a multilingual classifier based on web page titles and descriptions. Applying this classifier to over two million pages, we compile a list of two thousand child-directed websites. Crawling these sites from five vantage points, we measure the prevalence of trackers, fingerprinting scripts, and advertisements. Our crawler detects ads displayed on child-directed websites and determines if ad targeting is enabled by scraping ad disclosure pages whenever available. Our results show that around 90% of child-directed websites embed one or more trackers, and about 27% contain targeted advertisements--a practice that should require verifiable parental consent. Next, we identify improper ads on child-directed websites by developing an ML pipeline that processes both images and text extracted from ads. The pipeline allows us to run semantic similarity queries for arbitrary search terms, revealing ads that promote services related to dating, weight loss, and mental health; as well as ads for sex toys and flirting chat services. Some of these ads feature repulsive and sexually explicit imagery. In summary, our findings indicate a trend of non-compliance with privacy regulations and troubling ad safety practices among many advertisers and child-directed websites. To protect children and create a safer online environment, regulators and stakeholders must adopt and enforce more stringent measures.
for: Improving the generalization capacity of deep learning models.
methods: Two regularization terms computed from the weights of a minimum spanning tree over the neurons of a given network (or a sample of them), where edge weights are correlation dissimilarities; these terms reduce high correlations between neurons.
results: The terms are compared against popular regularizers, demonstrating their effectiveness and their applicability across different deep learning tasks.
Abstract
We propose a novel way to improve the generalisation capacity of deep learning models by reducing high correlations between neurons. For this, we present two regularisation terms computed from the weights of a minimum spanning tree of the clique whose vertices are the neurons of a given network (or a sample of those), where weights on edges are correlation dissimilarities. We provide an extensive set of experiments to validate the effectiveness of our terms, showing that they outperform popular ones. Also, we demonstrate that naive minimisation of all correlations between neurons obtains lower accuracies than our regularisation terms, suggesting that redundancies play a significant role in artificial neural networks, as evidenced by some studies in neuroscience for real networks. We include a proof of differentiability of our regularisers, thus developing the first effective topological persistence-based regularisation terms that consider the whole set of neurons and that can be applied to a feedforward architecture in any deep learning task such as classification, data generation, or regression.
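A sketch of the regularizer's overall structure, under the assumption that the MST is built on detached correlation dissimilarities and the penalty encourages dissimilarity along its edges; the paper's differentiable persistence-based terms are more involved than this.

```python
import torch
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_correlation_penalty(acts):
    """acts: [batch, n_neurons] activations sampled during a forward pass."""
    corr = torch.corrcoef(acts.t())                  # [n, n] neuron correlations
    dissim = 1.0 - corr.abs()                        # correlation dissimilarity
    mst = minimum_spanning_tree(dissim.detach().cpu().numpy())
    rows, cols = (torch.as_tensor(i) for i in mst.nonzero())
    # Adding this to the loss pushes MST-connected neurons to decorrelate.
    return -dissim[rows, cols].sum()
```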
Are Sex-based Physiological Differences the Cause of Gender Bias for Chest X-ray Diagnosis?
results: Our analysis reveals that dataset-specific factors, rather than fundamental physiological differences, are the main drivers of performance differences in chest X-ray prediction.
Abstract
While many studies have assessed the fairness of AI algorithms in the medical field, the causes of differences in prediction performance are often unknown. This lack of knowledge about the causes of bias hampers the efficacy of bias mitigation, as evidenced by the fact that simple dataset balancing still often performs best in reducing performance gaps but is unable to resolve all performance differences. In this work, we investigate the causes of gender bias in machine learning-based chest X-ray diagnosis. In particular, we explore the hypothesis that breast tissue leads to underexposure of the lungs and causes lower model performance. Methodologically, we propose a new sampling method which addresses the highly skewed distribution of recordings per patient in two widely used public datasets, while at the same time reducing the impact of label errors. Our comprehensive analysis of gender differences across diseases, datasets, and gender representations in the training set shows that dataset imbalance is not the sole cause of performance differences. Moreover, relative group performance differs strongly between datasets, indicating important dataset-specific factors influencing male/female group performance. Finally, we investigate the effect of breast tissue more specifically, by cropping out the breasts from recordings, finding that this does not resolve the observed performance gaps. In conclusion, our results indicate that dataset-specific factors, not fundamental physiological differences, are the main drivers of male-female performance gaps in chest X-ray analyses on the widely used NIH and CheXpert datasets.
Scalability of Message Encoding Techniques for Continuous Communication Learned with Multi-Agent Reinforcement Learning
results: The results show that, as the number of agents grows, the mean message encoder consistently outperforms the attention message encoder. Analysis reveals that agents using the mean message encoder adopt a communication policy combining exponential and logarithmic functions to avoid information loss.
Abstract
Many multi-agent systems require inter-agent communication to properly achieve their goal. By learning the communication protocol alongside the action protocol using multi-agent reinforcement learning techniques, the agents gain the flexibility to determine which information should be shared. However, when the number of agents increases we need to create an encoding of the information contained in these messages. In this paper, we investigate the effect of increasing the amount of information that should be contained in a message and increasing the number of agents. We evaluate these effects on two different message encoding methods, the mean message encoder and the attention message encoder. We perform our experiments on a matrix environment. Surprisingly, our results show that the mean message encoder consistently outperforms the attention message encoder. Therefore, we analyse the communication protocol used by the agents that use the mean message encoder and can conclude that the agents use a combination of an exponential and a logarithmic function in their communication policy to avoid the loss of important information after applying the mean message encoder.
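For intuition, a minimal sketch of the two aggregators being compared follows; the parameter names and the exact attention variant are illustrative assumptions rather than the paper's implementation.

```python
import torch

def mean_message_encoder(messages: torch.Tensor) -> torch.Tensor:
    """Aggregate the (n_agents, msg_dim) messages by simple averaging."""
    return messages.mean(dim=0)

def attention_message_encoder(messages, query, w_k, w_v):
    """Dot-product attention aggregator for comparison.

    query: (msg_dim,) per-receiver query; w_k, w_v: (msg_dim, msg_dim)
    hypothetical projection matrices.
    """
    keys = messages @ w_k                  # (n_agents, msg_dim)
    values = messages @ w_v
    weights = torch.softmax(keys @ query / keys.shape[-1] ** 0.5, dim=0)
    return weights @ values                # weighted combination of messages
```

Averaging discards per-agent identity, which is consistent with the paper's observation that agents compensate by encoding outgoing messages with exponential/logarithmic functions.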
Unlocking the Diagnostic Potential of ECG through Knowledge Transfer from Cardiac MRI
paper_authors: Özgün Turgut, Philip Müller, Paul Hager, Suprosanna Shit, Sophie Starck, Martin J. Menten, Eimo Martens, Daniel Rueckert
For: This work aims to provide a low-cost and fast cardiac health assessment tool by transferring the diagnostic detail of expensive cardiac magnetic resonance (CMR) imaging to the ECG.
Methods: The first self-supervised contrastive approach that transfers domain-specific information from CMR images into ECG embeddings, combining multimodal contrastive learning with masked data modeling to enable holistic cardiac screening from ECG data alone.
Results: Extensive experiments on 40,044 UK Biobank subjects demonstrate the utility and generalizability of the method: subject-specific risks of various cardiovascular diseases are predicted and distinct cardiac phenotypes identified solely from ECG data. A qualitative analysis shows that the learned ECG embeddings incorporate information from CMR image regions of interest. The entire pipeline, including source code and pre-trained model weights, is publicly available.
Abstract
The electrocardiogram (ECG) is a widely available diagnostic tool that allows for a cost-effective and fast assessment of the cardiovascular health. However, more detailed examination with expensive cardiac magnetic resonance (CMR) imaging is often preferred for the diagnosis of cardiovascular diseases. While providing detailed visualization of the cardiac anatomy, CMR imaging is not widely available due to long scan times and high costs. To address this issue, we propose the first self-supervised contrastive approach that transfers domain-specific information from CMR images to ECG embeddings. Our approach combines multimodal contrastive learning with masked data modeling to enable holistic cardiac screening solely from ECG data. In extensive experiments using data from 40,044 UK Biobank subjects, we demonstrate the utility and generalizability of our method. We predict the subject-specific risk of various cardiovascular diseases and determine distinct cardiac phenotypes solely from ECG data. In a qualitative analysis, we demonstrate that our learned ECG embeddings incorporate information from CMR image regions of interest. We make our entire pipeline publicly available, including the source code and pre-trained model weights.
results: Our experiments show that combining this memory network with various surprise predictors yields efficient exploration behavior and significantly boosts final performance in sparse-reward environments, including Noisy-TV, navigation, and challenging Atari games.
Abstract
We present a new computing model for intrinsic rewards in reinforcement learning that addresses the limitations of existing surprise-driven explorations. The reward is the novelty of the surprise rather than the surprise norm. We estimate the surprise novelty as retrieval errors of a memory network wherein the memory stores and reconstructs surprises. Our surprise memory (SM) augments the capability of surprise-based intrinsic motivators, maintaining the agent's interest in exciting exploration while reducing unwanted attraction to unpredictable or noisy observations. Our experiments demonstrate that the SM combined with various surprise predictors exhibits efficient exploring behaviors and significantly boosts the final performance in sparse reward environments, including Noisy-TV, navigation and challenging Atari games.
TSSR: A Truncated and Signed Square Root Activation Function for Neural Networks
methods: Proposes a new activation function called the Truncated and Signed Square Root (TSSR) function.
results: The TSSR function outperforms other state-of-the-art activation functions and has broad applicability in fields such as computer vision, natural language processing, and speech recognition.
Abstract
Activation functions are essential components of neural networks. In this paper, we introduce a new activation function called the Truncated and Signed Square Root (TSSR) function. This function is distinctive because it is odd, nonlinear, monotone and differentiable. Its gradient is continuous and always positive. Thanks to these properties, it has the potential to improve the numerical stability of neural networks. Several experiments confirm that the proposed TSSR has better performance than other state-of-the-art activation functions. The proposed function has significant implications for the development of neural network models and can be applied to a wide range of applications in fields such as computer vision, natural language processing, and speech recognition.
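The abstract does not spell out the formula, but one form consistent with all the stated properties (odd, nonlinear, monotone, differentiable, with a continuous and always-positive gradient) is an identity branch near zero matched to a signed square-root branch outside it. The sketch below is an assumed reconstruction, not the paper's verbatim definition.

```python
import numpy as np

def tssr(x):
    """Truncated and Signed Square Root activation (assumed form):
    identity inside [-1, 1]; a signed square-root branch outside,
    matched so the function and its gradient are continuous at +/-1."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 1.0,
                    x,
                    np.sign(x) * (2.0 * np.sqrt(np.abs(x)) - 1.0))

def tssr_grad(x):
    """Gradient: 1 inside [-1, 1], 1/sqrt(|x|) outside -- continuous and positive."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 1.0, 1.0, 1.0 / np.sqrt(np.abs(x)))
```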
On the Unexpected Abilities of Large Language Models
methods: The paper analyzes the indirect acquisition process behind these abilities and its relation to other known indirect processes.
results: The paper argues that large language models develop integrated abilities as a side effect of indirect acquisition and discusses how predictable these abilities are. It also briefly discusses the relation between the cognitive skills acquired by these systems and human cognition.
Abstract
Large language models are capable of displaying a wide range of abilities that are not directly connected with the task for which they are trained: predicting the next words of human-written texts. In this article, I discuss the nature of this indirect acquisition process and its relation to other known indirect processes. I argue that an important side effect of such indirect acquisition is the development of integrated abilities. I discuss the extent to which the abilities developed by large language models are predictable. Finally, I briefly discuss the relation between the cognitive skills acquired by these systems and human cognition.
Bayes Risk Consistency of Nonparametric Classification Rules for Spike Trains Data
results: The paper derives the optimal Bayes rule and a plug-in nonparametric kernel classifier, and establishes asymptotic properties of the rules, including convergence of the kernel classifier to the Bayes rule as the recording time interval and the training set size grow.
Abstract
Spike trains data find a growing list of applications in computational neuroscience, imaging, streaming data and finance. Machine learning strategies for spike trains are based on various neural network and probabilistic models. The probabilistic approach is relying on parametric or nonparametric specifications of the underlying spike generation model. In this paper we consider the two-class statistical classification problem for a class of spike train data characterized by nonparametrically specified intensity functions. We derive the optimal Bayes rule and next form the plug-in nonparametric kernel classifier. Asymptotical properties of the rules are established including the limit with respect to the increasing recording time interval and the size of a training set. In particular the convergence of the kernel classifier to the Bayes rule is proved. The obtained results are supported by a finite sample simulation studies.
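As a rough illustration of a plug-in kernel classifier for spike trains, the sketch below assumes an inhomogeneous Poisson model: class intensities are kernel-smoothed from training spike times, and a new train is assigned by comparing log-likelihoods. The paper's estimator and its consistency analysis are more careful than this; the bandwidth and kernel choice here are assumptions.

```python
import numpy as np

def kernel_intensity(trains, grid, bandwidth=0.05):
    """Gaussian-kernel estimate of the intensity function from a list of
    spike trains (arrays of spike times) observed on a common interval;
    grid must be an increasing array of evaluation times."""
    pooled = np.concatenate(trains)
    k = np.exp(-0.5 * ((grid[:, None] - pooled[None, :]) / bandwidth) ** 2)
    return k.sum(axis=1) / (len(trains) * bandwidth * np.sqrt(2 * np.pi)) + 1e-12

def classify(spikes, lam0, lam1, grid, prior0=0.5):
    """Plug-in rule: pick the class with the larger Poisson log-likelihood
    (sum of log-intensities at spikes minus the integrated intensity)."""
    def loglik(lam):
        integral = np.sum(0.5 * (lam[1:] + lam[:-1]) * np.diff(grid))
        return np.sum(np.log(np.interp(spikes, grid, lam))) - integral
    score = loglik(lam1) - loglik(lam0) + np.log((1 - prior0) / prior0)
    return int(score > 0)
```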
PETformer: Long-term Time Series Forecasting via Placeholder-enhanced Transformer
For: This paper aims to improve the performance of Transformer-based models in long-term time series forecasting (LTSF) tasks by addressing three key issues: temporal continuity, information density, and multi-channel relationships.
Methods: The proposed model, called PETformer, uses three innovative techniques: Placeholder Enhancement Technique (PET), Long Sub-sequence Division (LSD), and Multi-channel Separation and Interaction (MSI), to introduce prior biases suitable for LTSF tasks.
Results: PETformer achieves state-of-the-art (SOTA) performance on eight commonly used public datasets for LTSF, outperforming all other models currently available, demonstrating that the Transformer still possesses powerful capabilities in LTSF.
Abstract
Recently, Transformer-based models have shown remarkable performance in long-term time series forecasting (LTSF) tasks due to their ability to model long-term dependencies. However, the validity of Transformers for LTSF tasks remains debatable, particularly since recent work has shown that simple linear models can outperform numerous Transformer-based approaches. This suggests that there are limitations to the application of Transformer in LTSF. Therefore, this paper investigates three key issues when applying Transformer to LTSF: temporal continuity, information density, and multi-channel relationships. Accordingly, we propose three innovative solutions, including Placeholder Enhancement Technique (PET), Long Sub-sequence Division (LSD), and Multi-channel Separation and Interaction (MSI), which together form a novel model called PETformer. These three key designs introduce prior biases suitable for LTSF tasks. Extensive experiments have demonstrated that PETformer achieves state-of-the-art (SOTA) performance on eight commonly used public datasets for LTSF, outperforming all other models currently available. This demonstrates that Transformer still possesses powerful capabilities in LTSF.
For: This study proposes a new sparse unmixing technique based on archetypal analysis (SUnAA).
Methods: A new archetypal-analysis model assumes the endmembers of interest are a convex combination of endmembers provided by a spectral library and that their number is known; the resulting non-convex minimization objective is optimized iteratively with an active set algorithm.
Results: Evaluation on two simulated datasets shows SUnAA outperforms conventional and advanced methods in signal-to-reconstruction error. Applied to the Cuprite dataset and compared with the available geological map, the qualitative assessment shows successful estimation of mineral abundances and a clear improvement in detecting dominant minerals.
Abstract
This paper introduces a new sparse unmixing technique using archetypal analysis (SUnAA). First, we design a new model based on archetypal analysis. We assume that the endmembers of interest are a convex combination of endmembers provided by a spectral library and that the number of endmembers of interest is known. Then, we propose a minimization problem. Unlike most conventional sparse unmixing methods, here the minimization problem is non-convex. We minimize the optimization objective iteratively using an active set algorithm. Our method is robust to the initialization and only requires the number of endmembers of interest. SUnAA is evaluated using two simulated datasets for which results confirm its better performance over other conventional and advanced techniques in terms of signal-to-reconstruction error. SUnAA is also applied to Cuprite dataset and the results are compared visually with the available geological map provided for this dataset. The qualitative assessment demonstrates the successful estimation of the minerals abundances and significantly improves the detection of dominant minerals compared to the conventional regression-based sparse unmixing methods. The Python implementation of SUnAA can be found at: https://github.com/BehnoodRasti/SUnAA.
Tram-FL: Routing-based Model Training for Decentralized Federated Learning
results: Experiments on the MNIST, CIFAR-10, and IMDb datasets show that Tram-FL with the proposed routing algorithm achieves high accuracy under non-IID conditions, outperforming baselines while reducing communication costs.
Abstract
In decentralized federated learning (DFL), substantial traffic from frequent inter-node communication and non-independent and identically distributed (non-IID) data challenges high-accuracy model acquisition. We propose Tram-FL, a novel DFL method, which progressively refines a global model by transferring it sequentially amongst nodes, rather than by exchanging and aggregating local models. We also introduce a dynamic model routing algorithm for optimal route selection, aimed at enhancing model precision with minimal forwarding. Our experiments using MNIST, CIFAR-10, and IMDb datasets demonstrate that Tram-FL with the proposed routing delivers high model accuracy under non-IID conditions, outperforming baselines while reducing communication costs.
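At its core the training scheme is a single model making the rounds, which can be sketched as below. The node API and route_selector are hypothetical stand-ins; the paper's dynamic routing algorithm chooses the visiting order to counter non-IID data with minimal forwarding.

```python
def tram_fl_round(model, nodes, route_selector, local_steps=1):
    """One Tram-FL round: the global model is transferred sequentially
    along a route of nodes and refined in place at each stop, instead of
    exchanging and aggregating per-node local models.

    nodes: objects exposing a train(model, steps) method that runs local
    SGD on that node's private data (assumed interface for this sketch).
    """
    for node in route_selector(nodes):                 # dynamic route selection
        model = node.train(model, steps=local_steps)   # sequential transfer
    return model
```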
Feature Matching Data Synthesis for Non-IID Federated Learning
results: Our experimental results show that combining the proposed HFMDS method with federated learning improves model generalization and privacy preservation while lowering computational cost. On several benchmark datasets, the proposed HFMDS-FL algorithm outperforms the baselines in accuracy and privacy preservation at a comparatively low computational cost.
Abstract
Federated learning (FL) has emerged as a privacy-preserving paradigm that trains neural networks on edge devices without collecting data at a central server. However, FL encounters an inherent challenge in dealing with non-independent and identically distributed (non-IID) data among devices. To address this challenge, this paper proposes a hard feature matching data synthesis (HFMDS) method to share auxiliary data besides local models. Specifically, synthetic data are generated by learning the essential class-relevant features of real samples and discarding the redundant features, which helps to effectively tackle the non-IID issue. For better privacy preservation, we propose a hard feature augmentation method to transfer real features towards the decision boundary, with which the synthetic data not only improve the model generalization but also erase the information of real features. By integrating the proposed HFMDS method with FL, we present a novel FL framework with data augmentation to relieve data heterogeneity. The theoretical analysis highlights the effectiveness of our proposed data synthesis method in solving the non-IID challenge. Simulation results further demonstrate that our proposed HFMDS-FL algorithm outperforms the baselines in terms of accuracy, privacy preservation, and computational cost on various benchmark datasets.
Collaborative Learning From Distributed Data With Differentially Private Synthetic Twin Data
paper_authors: Lukas Prediger, Joonas Jälkö, Antti Honkela, Samuel Kaski
for: collaborative learning on sensitive data without violating privacy constraints
methods: Sharing of differentially private synthetic twins of each party's data.
results: Compared to using only local data, parties that collaboratively learn from the shared synthetic data sets obtain more accurate estimates of target statistics, especially for small heterogeneous data sets; the more parties participate, the larger and more consistent the improvements become.
Abstract
Consider a setting where multiple parties holding sensitive data aim to collaboratively learn population level statistics, but pooling the sensitive data sets is not possible. We propose a framework in which each party shares a differentially private synthetic twin of their data. We study the feasibility of combining such synthetic twin data sets for collaborative learning on real-world health data from the UK Biobank. We discover that parties engaging in the collaborative learning via shared synthetic data obtain more accurate estimates of target statistics compared to using only their local data. This finding extends to the difficult case of small heterogeneous data sets. Furthermore, the more parties participate, the larger and more consistent the improvements become. Finally, we find that data sharing can especially help parties whose data contain underrepresented groups to perform better-adjusted analysis for said groups. Based on our results we conclude that sharing of synthetic twins is a viable method for enabling learning from sensitive data without violating privacy constraints even if individual data sets are small or do not represent the overall population well. The setting of distributed sensitive data is often a bottleneck in biomedical research, which our study shows can be alleviated with privacy-preserving collaborative learning methods.
paper_authors: Chunqiu Steven Xia, Matteo Paltenghi, Jia Le Tian, Michael Pradel, Lingming Zhang
for: This paper presents Fuzz4All, a universal fuzzer based on large language models that can target many different input languages and many features of those languages.
methods: It uses large language models (LLMs) as the input generation and mutation engine, together with an autoprompting technique that creates LLM prompts well-suited for fuzzing and an LLM-powered fuzzing loop that iteratively updates the prompt to create new fuzzing inputs.
results: Across six input languages, universal fuzzing with Fuzz4All achieves higher coverage than existing language-specific fuzzers, and it has identified 76 bugs in widely used systems, 47 of which developers have confirmed as previously unknown.
Abstract
Fuzzing has achieved tremendous success in discovering bugs and vulnerabilities in various software systems. Systems under test (SUTs) that take in programming or formal language as inputs, e.g., compilers, runtime engines, constraint solvers, and software libraries with accessible APIs, are especially important as they are fundamental building blocks of software development. However, existing fuzzers for such systems often target a specific language, and thus cannot be easily applied to other languages or even other versions of the same language. Moreover, the inputs generated by existing fuzzers are often limited to specific features of the input language, and thus can hardly reveal bugs related to other or new features. This paper presents Fuzz4All, the first fuzzer that is universal in the sense that it can target many different input languages and many different features of these languages. The key idea behind Fuzz4All is to leverage large language models (LLMs) as an input generation and mutation engine, which enables the approach to produce diverse and realistic inputs for any practically relevant language. To realize this potential, we present a novel autoprompting technique, which creates LLM prompts that are wellsuited for fuzzing, and a novel LLM-powered fuzzing loop, which iteratively updates the prompt to create new fuzzing inputs. We evaluate Fuzz4All on nine systems under test that take in six different languages (C, C++, Go, SMT2, Java and Python) as inputs. The evaluation shows, across all six languages, that universal fuzzing achieves higher coverage than existing, language-specific fuzzers. Furthermore, Fuzz4All has identified 76 bugs in widely used systems, such as GCC, Clang, Z3, CVC5, OpenJDK, and the Qiskit quantum computing platform, with 47 bugs already confirmed by developers as previously unknown.
Optimizing a Transformer-based network for a deep learning seismic processing workflow
results: Experiments show that these modifications let the StorSeismic model pretrain faster and deliver competitive results on realistic Marmousi and offshore field data, while requiring fewer trainable parameters.
Abstract
StorSeismic is a recently introduced model based on the Transformer to adapt to various seismic processing tasks through its pretraining and fine-tuning training strategy. In the original implementation, StorSeismic utilized a sinusoidal positional encoding and a conventional self-attention mechanism, both borrowed from the natural language processing (NLP) applications. For seismic processing they admitted good results, but also hinted to limitations in efficiency and expressiveness. We propose modifications to these two key components, by utilizing relative positional encoding and low-rank attention matrices as replacements to the vanilla ones. The proposed changes are tested on processing tasks applied to a realistic Marmousi and offshore field data as a sequential strategy, starting from denoising, direct arrival removal, multiple attenuation, and finally root-mean-squared velocity ($V_{RMS}$) prediction for normal moveout (NMO) correction. We observe faster pretraining and competitive results on the fine-tuning tasks and, additionally, fewer parameters to train compared to the vanilla model.
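A common way to realize low-rank attention matrices is to project the keys and values along the sequence axis, Linformer-style, so the n x n attention map becomes n x k with k much smaller than n. The sketch below takes this route and should be treated as an assumed stand-in for StorSeismic's exact variant.

```python
import torch
import torch.nn as nn

class LowRankSelfAttention(nn.Module):
    """Self-attention whose (n x n) score matrix is replaced by an (n x k)
    one by linearly projecting keys and values along the sequence axis."""

    def __init__(self, d_model, seq_len, k):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.kv = nn.Linear(d_model, 2 * d_model)
        self.proj_k = nn.Parameter(torch.randn(k, seq_len) / seq_len ** 0.5)
        self.proj_v = nn.Parameter(torch.randn(k, seq_len) / seq_len ** 0.5)

    def forward(self, x):                        # x: (batch, n, d_model)
        q = self.q(x)
        key, val = self.kv(x).chunk(2, dim=-1)
        key = self.proj_k @ key                  # (batch, k, d_model)
        val = self.proj_v @ val
        scores = q @ key.transpose(-2, -1) / q.shape[-1] ** 0.5
        return torch.softmax(scores, dim=-1) @ val   # (batch, n, d_model)
```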
Going Deeper with Five-point Stencil Convolutions for Reaction-Diffusion Equations
for: solving partial differential equations (PDEs) with diverse initial conditions, overcoming optimization challenges faced by physics-informed neural networks (PINNs).
methods: uses five-point stencil convolutional neural networks (FCNNs) with large receptive fields to predict time evolutions, and trains the models using two consecutive snapshots with a time step that satisfies the CFL condition.
results: demonstrates that the proposed deep FCNNs retain certain accuracies for the heat, Fisher's, and Allen-Cahn equations, in contrast to finite difference methods (FDMs) that blow up.
Abstract
Physics-informed neural networks have been widely applied to partial differential equations with great success because the physics-informed loss essentially requires no observations or discretization. However, it is difficult to optimize model parameters, and these parameters must be trained for each distinct initial condition. To overcome these challenges in second-order reaction-diffusion type equations, a possible way is to use five-point stencil convolutional neural networks (FCNNs). FCNNs are trained using two consecutive snapshots, where the time step corresponds to the step size of the given snapshots. Thus, the time evolution of FCNNs depends on the time step, and the time step must satisfy its CFL condition to avoid blow-up solutions. In this work, we propose deep FCNNs that have large receptive fields to predict time evolutions with a time step larger than the threshold of the CFL condition. To evaluate our models, we consider the heat, Fisher's, and Allen-Cahn equations with diverse initial conditions. We demonstrate that deep FCNNs retain certain accuracies, in contrast to FDMs that blow up.
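For context on the CFL constraint the paper works around, here is a minimal sketch of the fixed five-point Laplacian stencil expressed as a convolution, together with one explicit Euler step; the trainable FCNNs in the paper generalize this kernel, and the deep variant enlarges its receptive field.

```python
import torch
import torch.nn as nn

# The classic five-point Laplacian stencil as a fixed 2-D convolution.
# Zero-padding here implies homogeneous boundary values, an assumption
# made only to keep the sketch short.
lap = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
with torch.no_grad():
    lap.weight.copy_(torch.tensor([[[[0., 1., 0.],
                                     [1., -4., 1.],
                                     [0., 1., 0.]]]]))

def heat_step(u, dt, diffusivity, h):
    """One explicit Euler step of u_t = D * (u_xx + u_yy) on a grid of
    spacing h. Plain FDM stability requires the CFL-type bound
    dt <= h**2 / (4 * D); the paper's deep FCNNs target larger steps."""
    return u + dt * diffusivity * lap(u) / h ** 2
```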
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
results: Compared with prevailing methods, JEN-1 shows a clear advantage in both text-music alignment and music quality while maintaining computational efficiency. Demos are available at http://futureverse.com/research/jen/demos/jen1.
Abstract
Music generation has attracted growing interest with the advancement of deep generative models. However, generating music conditioned on textual descriptions, known as text-to-music, remains challenging due to the complexity of musical structures and high sampling rate requirements. Despite the task's significance, prevailing generative models exhibit limitations in music quality, computational efficiency, and generalization. This paper introduces JEN-1, a universal high-fidelity model for text-to-music generation. JEN-1 is a diffusion model incorporating both autoregressive and non-autoregressive training. Through in-context learning, JEN-1 performs various generation tasks including text-guided music generation, music inpainting, and continuation. Evaluations demonstrate JEN-1's superior performance over state-of-the-art methods in text-music alignment and music quality while maintaining computational efficiency. Our demos are available at http://futureverse.com/research/jen/demos/jen1
Data-Free Model Extraction Attacks in the Context of Object Detection
results: Using a reasonable number of queries, the proposed data-free model extraction method achieves significant results on the object detection task of predicting bounding box coordinates. The discovery of this vulnerability will support future efforts to secure such machine learning models.
Abstract
A significant number of machine learning models are vulnerable to model extraction attacks, which focus on stealing the models by using specially curated queries against the target model. This task is well accomplished by using part of the training data or a surrogate dataset to train a new model that mimics a target model in a white-box environment. In pragmatic situations, however, the target models are trained on private datasets that are inaccessible to the adversary. The data-free model extraction technique replaces this problem when it comes to using queries artificially curated by a generator similar to that used in Generative Adversarial Nets. We propose for the first time, to the best of our knowledge, an adversary black box attack extending to a regression problem for predicting bounding box coordinates in object detection. As part of our study, we found that defining a loss function and using a novel generator setup is one of the key aspects in extracting the target model. We find that the proposed model extraction method achieves significant results by using reasonable queries. The discovery of this object detection vulnerability will support future prospects for securing such models.
Slot Induction via Pre-trained Language Model Probing and Multi-level Contrastive Learning
paper_authors: Hoang H. Nguyen, Chenwei Zhang, Ye Liu, Philip S. Yu
For: The paper targets task-oriented dialogue (TOD) systems, specifically improving the performance of natural language understanding (NLU) tasks such as intent detection and slot filling.
Methods: Proposes Slot Induction (SI), which uses unsupervised pre-trained language model (PLM) probing and multi-level contrastive learning to induce slot boundaries without explicit knowledge of token-level slot annotations.
Results: The proposed SI method is effective on the SI task and bridges the gap with token-level supervised models on two NLU benchmark datasets; the SI objectives also provide enhanced slot label representations, improving performance on slot filling tasks.
Abstract
Recent advanced methods in Natural Language Understanding for Task-oriented Dialogue (TOD) Systems (e.g., intent detection and slot filling) require a large amount of annotated data to achieve competitive performance. In reality, token-level annotations (slot labels) are time-consuming and difficult to acquire. In this work, we study the Slot Induction (SI) task whose objective is to induce slot boundaries without explicit knowledge of token-level slot annotations. We propose leveraging Unsupervised Pre-trained Language Model (PLM) Probing and Contrastive Learning mechanism to exploit (1) unsupervised semantic knowledge extracted from PLM, and (2) additional sentence-level intent label signals available from TOD. Our approach is shown to be effective in SI task and capable of bridging the gaps with token-level supervised models on two NLU benchmark datasets. When generalized to emerging intents, our SI objectives also provide enhanced slot label representations, leading to improved performance on the Slot Filling tasks.
Generative Perturbation Analysis for Probabilistic Black-Box Anomaly Attribution
methods: The paper introduces a new framework, Counterfactual Variational Bayes (CVB), to compute the distribution of each input variable's attribution score for an observed anomaly.
results: The paper obtains an attribution method that is free of the deviation-agnostic limitation and can quantify the uncertainty of the attribution scores.
Abstract
We address the task of probabilistic anomaly attribution in the black-box regression setting, where the goal is to compute the probability distribution of the attribution score of each input variable, given an observed anomaly. The training dataset is assumed to be unavailable. This task differs from the standard XAI (explainable AI) scenario, since we wish to explain the anomalous deviation from a black-box prediction rather than the black-box model itself. We begin by showing that mainstream model-agnostic explanation methods, such as the Shapley values, are not suitable for this task because of their ``deviation-agnostic property.'' We then propose a novel framework for probabilistic anomaly attribution that allows us to not only compute attribution scores as the predictive mean but also quantify the uncertainty of those scores. This is done by considering a generative process for perturbations that counter-factually bring the observed anomalous observation back to normalcy. We introduce a variational Bayes algorithm for deriving the distributions of per variable attribution scores. To the best of our knowledge, this is the first probabilistic anomaly attribution framework that is free from being deviation-agnostic.
Pareto Invariant Representation Learning for Multimedia Recommendation
results: Comparisons on three public multimedia recommendation datasets show that the PaInvRL model achieves excellent within-environment and cross-environment learning performance.
Abstract
Multimedia recommendation involves personalized ranking tasks, where multimedia content is usually represented using a generic encoder. However, these generic representations introduce spurious correlations that fail to reveal users' true preferences. Existing works attempt to alleviate this problem by learning invariant representations, but overlook the balance between independent and identically distributed (IID) and out-of-distribution (OOD) generalization. In this paper, we propose a framework called Pareto Invariant Representation Learning (PaInvRL) to mitigate the impact of spurious correlations from an IID-OOD multi-objective optimization perspective, by learning invariant representations (intrinsic factors that attract user attention) and variant representations (other factors) simultaneously. Specifically, PaInvRL includes three iteratively executed modules: (i) heterogeneous identification module, which identifies the heterogeneous environments to reflect distributional shifts for user-item interactions; (ii) invariant mask generation module, which learns invariant masks based on the Pareto-optimal solutions that minimize the adaptive weighted Invariant Risk Minimization (IRM) and Empirical Risk (ERM) losses; (iii) convert module, which generates both variant representations and item-invariant representations for training a multi-modal recommendation model that mitigates spurious correlations and balances the generalization performance within and cross the environmental distributions. We compare the proposed PaInvRL with state-of-the-art recommendation models on three public multimedia recommendation datasets (Movielens, Tiktok, and Kwai), and the experimental results validate the effectiveness of PaInvRL for both within- and cross-environmental learning.
A Feature Set of Small Size for the PDF Malware Detection
results: The study finds that a Random Forest model achieves the highest accuracy of 99.75%, and that the proposed 12-feature set is among the most concise in the field of PDF malware detection.
Abstract
Machine learning (ML)-based malware detection systems are becoming increasingly important as malware threats increase and get more sophisticated. PDF files are often used as vectors for phishing attacks because they are widely regarded as trustworthy data resources, and are accessible across different platforms. Therefore, researchers have developed many different PDF malware detection methods. Performance in detecting PDF malware is greatly influenced by feature selection. In this research, we propose a small features set that don't require too much domain knowledge of the PDF file. We evaluate proposed features with six different machine learning models. We report the best accuracy of 99.75% when using Random Forest model. Our proposed feature set, which consists of just 12 features, is one of the most conciseness in the field of PDF malware detection. Despite its modest size, we obtain comparable results to state-of-the-art that employ a much larger set of features.
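A minimal sketch of the evaluation setup follows: a Random Forest over a compact per-file feature matrix. The file names and the example feature columns mentioned in the comment are hypothetical; the paper defines its own 12 features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# X: one row per PDF with a small structural feature set (the paper uses
# 12 features; columns such as object count, stream count, /JS count,
# /OpenAction count, and file size are illustrative guesses only).
X = np.load("pdf_features.npy")   # shape (n_samples, 12), assumed file
y = np.load("pdf_labels.npy")     # 1 = malicious, 0 = benign, assumed file

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```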
An Analytical Study of Covid-19 Dataset using Graph-Based Clustering Algorithms
results: The study finds that the COVID-19 protein-protein interaction network exhibits strong density and connectivity, characteristics that may be related to the development and progression of the disease.
Abstract
Corona Virus Disease, abbreviated as COVID-19, is a novel virus which was initially identified in Wuhan, China in December of 2019, and this deadly disease has now spread all over the world. According to the World Health Organization (WHO), a total of 3,124,905 people died between 2019 and April 2021. Given this situation, many methods, AI-based techniques, and machine learning algorithms have been researched and are being used to save people from this pandemic. The SARS-CoV and the 2019-nCoV, SARS-CoV-2 virus invade our bodies, causing some differences in the structure of cell proteins. Protein-protein interaction (PPI) is an essential process in our cells and plays a very important role in the development of medicines and gives ideas about the disease. In this study, we performed clustering on PPI networks generated from 92 genes of the Covid-19 dataset. We have used three graph-based clustering algorithms to give intuition to the analysis of clusters.
Explainable AI in Orthopedics: Challenges, Opportunities, and Prospects
paper_authors: Soheyla Amirian, Luke A. Carlson, Matthew F. Gong, Ines Lohse, Kurt R. Weiss, Johannes F. Plate, Ahmad P. Tafti
for: The paper is written to address the challenge of explainable AI (XAI) in orthopedics and to emphasize the need for interdisciplinary collaborations to establish standards and guidelines for the adoption of XAI in orthopedics.
methods: The paper uses a combination of AI models and algorithms that prioritize transparency and interpretability to address the challenge of XAI in orthopedics.
results: The paper highlights the need for interdisciplinary collaborations between AI practitioners, orthopedic specialists, and regulatory entities to establish standards and guidelines for the adoption of XAI in orthopedics.
Abstract
While artificial intelligence (AI) has made many successful applications in various domains, its adoption in healthcare lags a little bit behind other high-stakes settings. Several factors contribute to this slower uptake, including regulatory frameworks, patient privacy concerns, and data heterogeneity. However, one significant challenge that impedes the implementation of AI in healthcare, particularly in orthopedics, is the lack of explainability and interpretability around AI models. Addressing the challenge of explainable AI (XAI) in orthopedics requires developing AI models and algorithms that prioritize transparency and interpretability, allowing clinicians, surgeons, and patients to understand the contributing factors behind any AI-powered predictive or descriptive models. The current contribution outlines several key challenges and opportunities that manifest in XAI in orthopedic practice. This work emphasizes the need for interdisciplinary collaborations between AI practitioners, orthopedic specialists, and regulatory entities to establish standards and guidelines for the adoption of XAI in orthopedics.
Finite Element Operator Network for Solving Parametric PDEs
methods: Proposes the Finite Element Operator Network (FEONet), which combines deep learning with a traditional numerical method, the finite element method, to solve parametric PDEs.
results: Experiments on several benchmark problems show that the method outperforms existing state-of-the-art approaches in accuracy, generalization, and computational flexibility.
Abstract
Partial differential equations (PDEs) underlie our understanding and prediction of natural phenomena across numerous fields, including physics, engineering, and finance. However, solving parametric PDEs is a complex task that necessitates efficient numerical methods. In this paper, we propose a novel approach for solving parametric PDEs using a Finite Element Operator Network (FEONet). Our proposed method leverages the power of deep learning in conjunction with traditional numerical methods, specifically the finite element method, to solve parametric PDEs in the absence of any paired input-output training data. We demonstrate the effectiveness of our approach on several benchmark problems and show that it outperforms existing state-of-the-art methods in terms of accuracy, generalization, and computational flexibility. Our FEONet framework shows potential for application in various fields where PDEs play a crucial role in modeling complex domains with diverse boundary conditions and singular behavior. Furthermore, we provide theoretical convergence analysis to support our approach, utilizing finite element approximation in numerical analysis.
Two Novel Approaches to Detect Community: A Case Study of Omicron Lineage Variants PPI Network
results: The study finds that the different algorithms detect community structures in the variant B.1.1.529 network, and that these communities have distinctive characteristics and properties.
Abstract
The capacity to identify and analyze protein-protein interactions, along with their internal modular organization, plays a crucial role in comprehending the intricate mechanisms underlying biological processes at the molecular level. We can learn a lot about the structure and dynamics of these interactions by using network analysis. We can improve our understanding of the biological roots of disease pathogenesis by recognizing network communities. This knowledge, in turn, holds significant potential for driving advancements in drug discovery and facilitating personalized medicine approaches for disease treatment. In this study, we aimed to uncover the communities within the variant B.1.1.529 (Omicron virus) using two proposed novel algorithm (ABCDE and ALCDE) and four widely recognized algorithms: Girvan-Newman, Louvain, Leiden, and Label Propagation algorithm. Each of these algorithms has established prominence in the field and offers unique perspectives on identifying communities within complex networks. We also compare the networks by the global properties, statistic summary, subgraph count, graphlet and validate by the modulaity. By employing these approaches, we sought to gain deeper insights into the structural organization and interconnections present within the Omicron virus network.
TBIN: Modeling Long Textual Behavior Data for CTR Prediction
results: Experimental results show that TBIN yields better CTR prediction and performs well on a real-world food recommendation platform.
Abstract
Click-through rate (CTR) prediction plays a pivotal role in the success of recommendations. Inspired by the recent thriving of language models (LMs), a surge of works improve prediction by organizing user behavior data in a \textbf{textual} format and using LMs to understand user interest at a semantic level. While promising, these works have to truncate the textual data to reduce the quadratic computational overhead of self-attention in LMs. However, it has been studied that long user behavior data can significantly benefit CTR prediction. In addition, these works typically condense user diverse interests into a single feature vector, which hinders the expressive capability of the model. In this paper, we propose a \textbf{T}extual \textbf{B}ehavior-based \textbf{I}nterest Chunking \textbf{N}etwork (TBIN), which tackles the above limitations by combining an efficient locality-sensitive hashing algorithm and a shifted chunk-based self-attention. The resulting user diverse interests are dynamically activated, producing user interest representation towards the target item. Finally, the results of both offline and online experiments on real-world food recommendation platform demonstrate the effectiveness of TBIN.
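A minimal sketch of chunk-restricted self-attention with an optional half-chunk shift captures the flavor of the efficiency trick: attention cost drops from O(n^2) to O(n * chunk). TBIN's exact shifted chunk-based formulation and its locality-sensitive hashing of behaviors may differ.

```python
import torch

def chunked_self_attention(x, chunk, shift=False):
    """Self-attention restricted to fixed-size chunks of the behavior
    sequence. With shift=True the chunk grid is offset by half a chunk,
    in the spirit of shifted-window schemes.

    x: (batch, n, d) with n divisible by chunk.
    """
    b, n, d = x.shape
    if shift:
        x = torch.roll(x, shifts=chunk // 2, dims=1)
    xc = x.reshape(b, n // chunk, chunk, d)                 # split into chunks
    attn = torch.softmax(xc @ xc.transpose(-2, -1) / d ** 0.5, dim=-1)
    out = (attn @ xc).reshape(b, n, d)
    if shift:
        out = torch.roll(out, shifts=-(chunk // 2), dims=1)
    return out
```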
A General Implicit Framework for Fast NeRF Composition and Rendering
for: This paper aims to provide a general implicit pipeline for composing NeRF objects quickly, enabling the casting of dynamic shadows within or between objects using analytical light sources, and allowing multiple NeRF objects to be seamlessly placed and rendered together with any arbitrary rigid transformations.
methods: The proposed method introduces a new surface representation known as Neural Depth Fields (NeDF), which quickly determines the spatial relationship between objects by allowing direct intersection computation between rays and implicit surfaces. It leverages an intersection neural network to query NeRF for acceleration instead of depending on an explicit spatial structure.
results: The proposed method is the first to enable both the progressive and interactive composition of NeRF objects, and it also serves as a previewing plugin for a range of existing NeRF works.Abstract
A variety of Neural Radiance Fields (NeRF) methods have recently achieved remarkable success in high render speed. However, current accelerating methods are specialized and incompatible with various implicit methods, preventing real-time composition over various types of NeRF works. Because NeRF relies on sampling along rays, it is possible to provide general guidance for acceleration. To that end, we propose a general implicit pipeline for composing NeRF objects quickly. Our method enables the casting of dynamic shadows within or between objects using analytical light sources while allowing multiple NeRF objects to be seamlessly placed and rendered together with any arbitrary rigid transformations. Mainly, our work introduces a new surface representation known as Neural Depth Fields (NeDF) that quickly determines the spatial relationship between objects by allowing direct intersection computation between rays and implicit surfaces. It leverages an intersection neural network to query NeRF for acceleration instead of depending on an explicit spatial structure.Our proposed method is the first to enable both the progressive and interactive composition of NeRF objects. Additionally, it also serves as a previewing plugin for a range of existing NeRF works.
Classification of lung cancer subtypes on CT images with synthetic pathological priors
results: Experimental results show the superiority of the proposed model on the lung cancer pathological subtype classification task, with significant improvements over several state-of-the-art (SOTA) classification models in accuracy (ACC), area under the curve (AUC), and F1 score.
Abstract
The accurate diagnosis on pathological subtypes for lung cancer is of significant importance for the follow-up treatments and prognosis managements. In this paper, we propose self-generating hybrid feature network (SGHF-Net) for accurately classifying lung cancer subtypes on computed tomography (CT) images. Inspired by studies stating that cross-scale associations exist in the image patterns between the same case's CT images and its pathological images, we innovatively developed a pathological feature synthetic module (PFSM), which quantitatively maps cross-modality associations through deep neural networks, to derive the "gold standard" information contained in the corresponding pathological images from CT images. Additionally, we designed a radiological feature extraction module (RFEM) to directly acquire CT image information and integrated it with the pathological priors under an effective feature fusion framework, enabling the entire classification model to generate more indicative and specific pathologically related features and eventually output more accurate predictions. The superiority of the proposed model lies in its ability to self-generate hybrid features that contain multi-modality image information based on a single-modality input. To evaluate the effectiveness, adaptability, and generalization ability of our model, we performed extensive experiments on a large-scale multi-center dataset (i.e., 829 cases from three hospitals) to compare our model and a series of state-of-the-art (SOTA) classification models. The experimental results demonstrated the superiority of our model for lung cancer subtypes classification with significant accuracy improvements in terms of accuracy (ACC), area under the curve (AUC), and F1 score.
Efficient Bayesian Optimization with Deep Kernel Learning and Transformer Pre-trained on Multiple Heterogeneous Datasets
results: On both synthetic and real benchmark problems, the proposed method outperforms existing methods.
Abstract
Bayesian optimization (BO) is widely adopted in black-box optimization problems and it relies on a surrogate model to approximate the black-box response function. With the increasing number of black-box optimization tasks solved and even more to solve, the ability to learn from multiple prior tasks to jointly pre-train a surrogate model is long-awaited to further boost optimization efficiency. In this paper, we propose a simple approach to pre-train a surrogate, which is a Gaussian process (GP) with a kernel defined on deep features learned from a Transformer-based encoder, using datasets from prior tasks with possibly heterogeneous input spaces. In addition, we provide a simple yet effective mix-up initialization strategy for input tokens corresponding to unseen input variables and therefore accelerate new tasks' convergence. Experiments on both synthetic and real benchmark problems demonstrate the effectiveness of our proposed pre-training and transfer BO strategy over existing methods.
Assessing the performance of deep learning-based models for prostate cancer segmentation using uncertainty scores
results: The study finds that the Attention R2U-Net model performs best, achieving a mean Intersection over Union (IoU) of 76.3% and a Dice Similarity Coefficient (DSC) of 85% for segmenting all zones, and that it exhibits the lowest uncertainty values, particularly at the boundaries of the transition zone and the tumor.
Abstract
This study focuses on comparing deep learning methods for the segmentation and quantification of uncertainty in prostate segmentation from MRI images. The aim is to improve the workflow of prostate cancer detection and diagnosis. Seven different U-Net-based architectures, augmented with Monte-Carlo dropout, are evaluated for automatic segmentation of the central zone, peripheral zone, transition zone, and tumor, with uncertainty estimation. The top-performing model in this study is the Attention R2U-Net, achieving a mean Intersection over Union (IoU) of 76.3% and Dice Similarity Coefficient (DSC) of 85% for segmenting all zones. Additionally, Attention R2U-Net exhibits the lowest uncertainty values, particularly in the boundaries of the transition zone and tumor, when compared to the other models.
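The Monte-Carlo dropout used for uncertainty estimation can be sketched as follows: dropout layers stay stochastic at test time, several forward passes are drawn, and the per-pixel spread serves as an uncertainty map. Reading uncertainty off the standard deviation is one common choice; the paper's exact uncertainty score is an assumption here.

```python
import torch

def mc_dropout_predict(model, image, n_samples=20):
    """Run several stochastic forward passes with dropout enabled and
    return the mean class-probability map and its per-pixel std."""
    model.eval()
    for m in model.modules():  # re-enable only the dropout layers
        if isinstance(m, (torch.nn.Dropout, torch.nn.Dropout2d, torch.nn.Dropout3d)):
            m.train()
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(image), dim=1)
                             for _ in range(n_samples)])
    return probs.mean(dim=0), probs.std(dim=0)
```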
Deep Metric Learning for the Hemodynamics Inference with Electrocardiogram Signals
results: The study finds that this self-supervised DML method performs well across different patient subgroups, and its performance is validated by comparison against existing standard baselines.
Abstract
Heart failure is a debilitating condition that affects millions of people worldwide and has a significant impact on their quality of life and mortality rates. An objective assessment of cardiac pressures remains an important method for the diagnosis and treatment prognostication for patients with heart failure. Although cardiac catheterization is the gold standard for estimating central hemodynamic pressures, it is an invasive procedure that carries inherent risks, making it a potentially dangerous procedure for some patients. Approaches that leverage non-invasive signals - such as electrocardiogram (ECG) - have the promise to make the routine estimation of cardiac pressures feasible in both inpatient and outpatient settings. Prior models trained to estimate intracardiac pressures (e.g., mean pulmonary capillary wedge pressure (mPCWP)) in a supervised fashion have shown good discriminatory ability but have been limited to the labeled dataset from the heart failure cohort. To address this issue and build a robust representation, we apply deep metric learning (DML) and propose a novel self-supervised DML with distance-based mining that improves the performance of a model with limited labels. We use a dataset that contains over 5.4 million ECGs without concomitant central pressure labels to pre-train a self-supervised DML model which showed improved classification of elevated mPCWP compared to self-supervised contrastive baselines. Additionally, the supervised DML model that is using ECGs with access to 8,172 mPCWP labels demonstrated significantly better performance on the mPCWP regression task compared to the supervised baseline. Moreover, our data suggest that DML yields models that are performant across patient subgroups, even when some patient subgroups are under-represented in the dataset. Our code is available at https://github.com/mandiehyewon/ssldml
Enhancing Optimization Performance: A Novel Hybridization of Gaussian Crunching Search and Powell’s Method for Derivative-Free Optimization
results: Through experimentation, we found that this hybrid approach can significantly boost optimization performance while retaining the advantages of each constituent method, opening up new possibilities for optimizing complex systems. Abstract
This research paper presents a novel approach to enhance optimization performance through the hybridization of Gaussian Crunching Search (GCS) and Powell's Method for derivative-free optimization. While GCS has shown promise in overcoming challenges faced by traditional derivative-free optimization methods [1], it may not always excel in finding the local minimum. On the other hand, some traditional methods may have better performance in this regard. However, GCS demonstrates its strength in escaping the trap of local minima and approaching the global minima. Through experimentation, we discovered that by combining GCS with certain traditional derivative-free optimization methods, we can significantly boost performance while retaining the respective advantages of each method. This hybrid approach opens up new possibilities for optimizing complex systems and finding optimal solutions in a range of applications.
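The paper does not spell out the hybridization in pseudocode; the sketch below is one plausible reading, with a Gaussian random-perturbation stage (a loose stand-in for GCS, whose exact update rule is not given here) to escape local minima, followed by SciPy's implementation of Powell's method for local refinement.

```python
import numpy as np
from scipy.optimize import minimize

def hybrid_minimize(f, x0, n_global=200, sigma=1.0, seed=0):
    """Illustrative hybrid: a global Gaussian-perturbation search stage,
    then Powell's derivative-free method as the local polish."""
    rng = np.random.default_rng(seed)
    best_x, best_f = np.asarray(x0, float), f(x0)
    for _ in range(n_global):
        cand = best_x + rng.normal(0.0, sigma, size=best_x.shape)
        fc = f(cand)
        if fc < best_f:
            best_x, best_f = cand, fc
        else:
            sigma *= 0.99   # slowly shrink ("crunch") the search radius
    return minimize(f, best_x, method="Powell")  # local refinement

result = hybrid_minimize(lambda x: np.sum(x**2) + 3 * np.sin(5 * x).sum(),
                         x0=np.array([2.5, -1.7]))
print(result.x, result.fun)
```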
Sparse Binary Transformers for Multivariate Time Series Modeling
paper_authors: Matt Gorbett, Hossein Shirazi, Indrakshi Ray
for: Lightweight deep learning models for multivariate time series problems
methods: Transformer models with sparse and binary weights
results: Achieves favorable results on three time series learning tasks (classification, anomaly detection, and single-step forecasting) while reducing computational complexity through two modifications: 1) a fixed mask applied in the classification task, and 2) an attention mask restricted to the current time step for forecasting and anomaly detection. Together with the compression techniques, these modifications substantially reduce the number of non-zero operations in the Transformer with little to no decline in model performance. Abstract
Compressed Neural Networks have the potential to enable deep learning across new applications and smaller computational environments. However, understanding the range of learning tasks in which such models can succeed is not well studied. In this work, we apply sparse and binary-weighted Transformers to multivariate time series problems, showing that the lightweight models achieve accuracy comparable to that of dense floating-point Transformers of the same structure. Our model achieves favorable results across three time series learning tasks: classification, anomaly detection, and single-step forecasting. Additionally, to reduce the computational complexity of the attention mechanism, we apply two modifications, which show little to no decline in model performance: 1) in the classification task, we apply a fixed mask to the query, key, and value activations, and 2) for forecasting and anomaly detection, which rely on predicting outputs at a single point in time, we propose an attention mask to allow computation only at the current time step. Together, each compression technique and attention modification substantially reduces the number of non-zero operations necessary in the Transformer. We measure the computational savings of our approach over a range of metrics including parameter count, bit size, and floating point operation (FLOPs) count, showing up to a 53x reduction in storage size and up to 10.5x reduction in FLOPs.
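Two of the ingredients can be sketched compactly (PyTorch; the paper's exact sparsification schedule and masking details are not reproduced): a binary-weight layer trained with the straight-through estimator, and attention computed only at the current time step, as used for forecasting and anomaly detection.

```python
import torch
import torch.nn as nn

class BinaryLinear(nn.Linear):
    """Binary-weighted layer via the straight-through estimator (sketch):
    the forward pass uses sign(W), while gradients flow to the latent
    full-precision weights."""
    def forward(self, x):
        w = self.weight + (torch.sign(self.weight) - self.weight).detach()
        return nn.functional.linear(x, w, self.bias)

def last_step_attention(q, k, v):
    """Attention restricted to the current (last) time step: only one
    query row is computed, mirroring the proposed attention mask.
    q, k, v: (batch, seq, dim)."""
    q_last = q[:, -1:, :]                                   # single query
    scores = q_last @ k.transpose(1, 2) / k.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v                # (batch, 1, dim)

x = torch.randn(2, 10, 8)
out = last_step_attention(x, x, x)
print(BinaryLinear(8, 4)(out).shape)   # torch.Size([2, 1, 4])
```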
Multiclass Online Learnability under Bandit Feedback
results: The finiteness of the Bandit Littlestone dimension is shown to be necessary and sufficient for bandit online multiclass learnability even when the label space is unbounded, complementing the result of (hanneke2023multiclass) that the Littlestone dimension characterizes online multiclass learnability in the full-information setting. Abstract
We study online multiclass classification under bandit feedback. We extend the results of (daniely2013price) by showing that the finiteness of the Bandit Littlestone dimension is necessary and sufficient for bandit online multiclass learnability even when the label space is unbounded. Our result complements the recent work by (hanneke2023multiclass) who show that the Littlestone dimension characterizes online multiclass learnability in the full-information setting when the label space is unbounded.
Improved Activation Clipping for Universal Backdoor Mitigation and Test-Time Detection
paper_authors: Hang Wang, Zhen Xiang, David J. Miller, George Kesidis
for: Defending deep neural networks against backdoor (Trojan) attacks, in which an attacker poisons the training set with backdoor triggers so that the network classifies test-time triggers to the attacker's designated target class.
methods: A new post-training strategy that bounds internal-layer activations, with the bounds learned from a small set of clean samples and chosen to explicitly limit classification margins.
results: The method shows superior performance on CIFAR-10 image classification, exhibits strong robustness against adaptive and X2X attacks and across different datasets, and is extended with a test-time detection and correction method based on the output differences between the original and activation-bounded networks. Abstract
Deep neural networks are vulnerable to backdoor attacks (Trojans), where an attacker poisons the training set with backdoor triggers so that the neural network learns to classify test-time triggers to the attacker's designated target class. Recent work shows that backdoor poisoning induces over-fitting (abnormally large activations) in the attacked model, which motivates a general, post-training clipping method for backdoor mitigation, i.e., with bounds on internal-layer activations learned using a small set of clean samples. We devise a new such approach, choosing the activation bounds to explicitly limit classification margins. This method gives superior performance against peer methods for CIFAR-10 image classification. We also show that this method has strong robustness against adaptive attacks, X2X attacks, and on different datasets. Finally, we demonstrate a method extension for test-time detection and correction based on the output differences between the original and activation-bounded networks. The code of our method is online available.
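A rough sketch of the general activation-bounding mechanics follows (PyTorch). The paper chooses the bounds by explicitly limiting classification margins; the percentile rule below is a simpler stand-in used only to illustrate estimating and enforcing per-layer bounds from a small clean set.

```python
import torch

def bound_activations(model, clean_loader, layers, q=0.999):
    """Estimate an upper activation bound per layer from clean samples,
    then clamp that layer's output at inference via forward hooks
    (percentile heuristic; the paper learns the bounds differently)."""
    stats = {i: [] for i in range(len(layers))}
    hooks = [layer.register_forward_hook(
                 lambda m, inp, out, i=i: stats[i].append(
                     out.detach().flatten()))
             for i, layer in enumerate(layers)]
    with torch.no_grad():
        for x, _ in clean_loader:       # small clean set, no labels needed
            model(x)
    for h in hooks:
        h.remove()
    bounds = [torch.quantile(torch.cat(stats[i]), q)
              for i in range(len(layers))]
    for layer, b in zip(layers, bounds):     # permanent clamping hooks
        layer.register_forward_hook(lambda m, inp, out, b=b:
                                    torch.clamp(out, max=b))
    return bounds
```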
Machine Learning, Deep Learning and Data Preprocessing Techniques for Detection, Prediction, and Monitoring of Stress and Stress-related Mental Disorders: A Scoping Review
results: The review found that Support Vector Machine (SVM), Neural Network (NN), and Random Forest (RF) models exhibit the highest accuracy and robustness among the machine learning algorithms examined, and that physiological parameters, such as heart rate measurements and skin response, are the most commonly used data types for stress prediction. Abstract
This comprehensive review systematically evaluates Machine Learning (ML) methodologies employed in the detection, prediction, and analysis of mental stress and its consequent mental disorders (MDs). Utilizing a rigorous scoping review process, the investigation delves into the latest ML algorithms, preprocessing techniques, and data types employed in the context of stress and stress-related MDs. The findings highlight that Support Vector Machine (SVM), Neural Network (NN), and Random Forest (RF) models consistently exhibit superior accuracy and robustness among all machine learning algorithms examined. Furthermore, the review underscores that physiological parameters, such as heart rate measurements and skin response, are prevalently used as stress predictors in ML algorithms. This is attributed to their rich explanatory information concerning stress and stress-related MDs, as well as the relative ease of data acquisition. Additionally, the application of dimensionality reduction techniques, including mappings, feature selection, filtering, and noise reduction, is frequently observed as a crucial step preceding the training of ML algorithms. The synthesis of this review identifies significant research gaps and outlines future directions for the field. These encompass areas such as model interpretability, model personalization, the incorporation of naturalistic settings, and real-time processing capabilities for detection and prediction of stress and stress-related MDs.
Sparse Array Design for Direction Finding using Deep Learning
results: Numerical experiments illustrate the performance of model-based optimization and DL techniques, and several practical applications of the sparse array design problem are discussed, including cognitive radar, wireless communications, and integrated sensing and communications (ISAC). Abstract
In the past few years, deep learning (DL) techniques have been introduced for designing sparse arrays. These methods offer the advantages of feature engineering and low prediction-stage complexity, which is helpful in tackling the combinatorial search inherent to finding a sparse array. In this chapter, we provide a synopsis of several direction finding applications of DL-based sparse arrays. We begin by examining supervised and transfer learning techniques that have applications in selecting sparse arrays for a cognitive radar application. Here, we also discuss the use of meta-heuristic learning algorithms such as simulated annealing for the case of designing two-dimensional sparse arrays. Next, we consider DL-based antenna selection for wireless communications, wherein sparse array problem may also be combined with channel estimation, beamforming, or localization. Finally, we provide an example of deep sparse array technique for integrated sensing and communications (ISAC) application, wherein a trade-off of radar and communications performance makes ISAC sparse array problem very challenging. For each setting, we illustrate the performance of model-based optimization and DL techniques through several numerical experiments. We discuss additional considerations required to ensure robustness of DL-based algorithms against various imperfections in array data.
Deep Learning Driven Detection of Tsunami Related Internal GravityWaves: a path towards open-ocean natural hazards detection
paper_authors: Valentino Constantinou, Michela Ravanelli, Hamlin Liu, Jacob Bortnik
for: Detecting the ionospheric signatures of internal gravity waves triggered by tsunamis, to improve the accuracy of early warning systems.
methods: Combines GNSS data and deep learning, using slant total electron content (sTEC) from the VARION algorithm together with Gramian Angular Difference Fields (GADF) from computer vision and convolutional neural networks (CNNs) to detect internal gravity waves in near-real-time.
results: The approach detects internal gravity waves in near-real-time, achieving an F1 score of 91.7% when trained on the 2010 Maule, 2011 Tohoku, and 2012 Haida Gwaii earthquakes and tsunamis and validated on the 2015 Illapel event. Abstract
Tsunamis can trigger internal gravity waves (IGWs) in the ionosphere, perturbing the Total Electron Content (TEC) - referred to as Traveling Ionospheric Disturbances (TIDs) that are detectable through the Global Navigation Satellite System (GNSS). The GNSS are constellations of satellites providing signals from Earth orbit - Europe's Galileo, the United States' Global Positioning System (GPS), Russia's Global'naya Navigatsionnaya Sputnikovaya Sistema (GLONASS) and China's BeiDou. The real-time detection of TIDs provides an approach for tsunami detection, enhancing early warning systems by providing open-ocean coverage in geographic areas not serviceable by buoy-based warning systems. Large volumes of the GNSS data is leveraged by deep learning, which effectively handles complex non-linear relationships across thousands of data streams. We describe a framework leveraging slant total electron content (sTEC) from the VARION (Variometric Approach for Real-Time Ionosphere Observation) algorithm by Gramian Angular Difference Fields (from Computer Vision) and Convolutional Neural Networks (CNNs) to detect TIDs in near-real-time. Historical data from the 2010 Maule, 2011 Tohoku and the 2012 Haida-Gwaii earthquakes and tsunamis are used in model training, and the later-occurring 2015 Illapel earthquake and tsunami in Chile for out-of-sample model validation. Using the experimental framework described in the paper, we achieved a 91.7% F1 score. Source code is available at: https://github.com/vc1492a/tidd. Our work represents a new frontier in detecting tsunami-driven IGWs in open-ocean, dramatically improving the potential for natural hazards detection for coastal communities.
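The Gramian Angular Difference Field step is a standard construction and easy to sketch (NumPy); the paper applies it to windows of the VARION sTEC series before the CNN. The toy signal below is an assumption standing in for a real sTEC window.

```python
import numpy as np

def gadf(series):
    """Gramian Angular Difference Field of a 1-D series: rescale to
    [-1, 1], interpret values as cos(phi), and build the image of
    pairwise angular differences GADF[i, j] = sin(phi_i - phi_j)."""
    x = np.asarray(series, dtype=float)
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1   # rescale to [-1, 1]
    x = np.clip(x, -1.0, 1.0)
    sin_phi = np.sqrt(1.0 - x ** 2)                   # sin(arccos(x))
    return np.outer(sin_phi, x) - np.outer(x, sin_phi)

image = gadf(np.sin(np.linspace(0, 6 * np.pi, 64)))   # toy sTEC-like window
print(image.shape)   # (64, 64) image, fed to a CNN in the paper's pipeline
```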
PSRFlow: Probabilistic Super Resolution with Flow-Based Models for Scientific Data
results: Demonstrates superior performance compared to interpolation and GAN-based super-resolution networks, along with robust quantification of the uncertainty of the super-resolved results. Abstract
Although many deep-learning-based super-resolution approaches have been proposed in recent years, because no ground truth is available in the inference stage, few can quantify the errors and uncertainties of the super-resolved results. For scientific visualization applications, however, conveying uncertainties of the results to scientists is crucial to avoid generating misleading or incorrect information. In this paper, we propose PSRFlow, a novel normalizing flow-based generative model for scientific data super-resolution that incorporates uncertainty quantification into the super-resolution process. PSRFlow learns the conditional distribution of the high-resolution data based on the low-resolution counterpart. By sampling from a Gaussian latent space that captures the missing information in the high-resolution data, one can generate different plausible super-resolution outputs. The efficient sampling in the Gaussian latent space allows our model to perform uncertainty quantification for the super-resolved results. During model training, we augment the training data with samples across various scales to make the model adaptable to data of different scales, achieving flexible super-resolution for a given input. Our results demonstrate superior performance and robust uncertainty quantification compared with existing methods such as interpolation and GAN-based super-resolution networks.
results: The survey finds that the classical client-server FL architecture suffers from security and reliability issues, such as single-point-of-failure risks at the central server and man-in-the-middle attacks, comprehensively reviews the decentralized FL approaches proposed to mitigate them, and outlines emerging challenges and promising future research directions. Abstract
In recent years, federated learning (FL) has become a very popular paradigm for training distributed, large-scale, and privacy-preserving machine learning (ML) systems. In contrast to standard ML, where data must be collected at the exact location where training is performed, FL takes advantage of the computational capabilities of millions of edge devices to collaboratively train a shared, global model without disclosing their local private data. Specifically, in a typical FL system, the central server acts only as an orchestrator; it iteratively gathers and aggregates all the local models trained by each client on its private data until convergence. Although FL undoubtedly has several benefits over traditional ML (e.g., it protects private data ownership by design), it suffers from several weaknesses. One of the most critical challenges is to overcome the centralized orchestration of the classical FL client-server architecture, which is known to be vulnerable to single-point-of-failure risks and man-in-the-middle attacks, among others. To mitigate such exposure, decentralized FL solutions have emerged where all FL clients cooperate and communicate without a central server. This survey comprehensively summarizes and reviews existing decentralized FL approaches proposed in the literature. Furthermore, it identifies emerging challenges and suggests promising research directions in this under-explored domain.
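To give a flavor of the serverless setting the survey covers, here is a minimal PyTorch sketch of one decentralized round using gossip averaging over a communication graph instead of a central aggregator. The local training step is a placeholder, and the topology is a toy assumption.

```python
import copy
import torch

def local_update(model):
    """Placeholder for a client's local training on its private data."""
    pass

def gossip_round(clients, neighbors):
    """One decentralized FL round (sketch): every client trains locally,
    then averages its parameters with its graph neighbors; no central
    server ever sees the models. neighbors[i] lists client i's peers."""
    for c in clients:
        local_update(c)
    snapshots = [copy.deepcopy(c.state_dict()) for c in clients]
    for i, c in enumerate(clients):
        group = [i] + list(neighbors[i])
        avg = {k: torch.stack([snapshots[j][k] for j in group])
                      .float().mean(0)
               for k in snapshots[i]}
        c.load_state_dict(avg)

clients = [torch.nn.Linear(4, 2) for _ in range(3)]   # toy client models
gossip_round(clients, neighbors={0: [1], 1: [0, 2], 2: [1]})
```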
Deep Learning based Image Watermarking: A Brief Survey
paper_authors: Xin Zhong, Arjon Das, Fahad Alrasheedi, Abdullah Tanvir
for: Protecting images against unauthorized use and distribution
methods: Deep learning techniques, categorized into Embedder-Extractor Joint Training, Deep Networks as a Feature Transformation, and Hybrid schemes
results: Analyzes existing deep learning-based image watermarking techniques and discusses potential future research directions. Abstract
The act of secretly embedding and extracting a watermark on a cover image to protect it is known as image watermarking. In recent years, deep learning-based image watermarking techniques have been emerging one after another. To study the state-of-the-art, this survey categorizes cutting-edge deep learning-based image watermarking techniques into Embedder-Extractor Joint Training, Deep Networks as a Feature Transformation, and Hybrid schemes. Research directions in each category are also analyzed and summarized. Additionally, potential future research directions are discussed to envision future studies.
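To make the first category concrete, here is a toy PyTorch sketch of embedder-extractor joint training, with all network sizes hypothetical: the embedder hides a bit string as an image residual, the extractor recovers it, and a fidelity loss keeps the marked image close to the cover. Robust systems typically insert differentiable noise layers (JPEG, crop, blur) between the two, omitted here.

```python
import torch
import torch.nn as nn

class Embedder(nn.Module):
    def __init__(self, n_bits=32):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3 + n_bits, 32, 3, padding=1),
                                 nn.ReLU(), nn.Conv2d(32, 3, 3, padding=1))
    def forward(self, img, bits):                     # img: (B, 3, H, W)
        b = bits[:, :, None, None].expand(-1, -1, *img.shape[2:])
        return img + self.net(torch.cat([img, b], dim=1))  # residual mark

class Extractor(nn.Module):
    def __init__(self, n_bits=32):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(32, n_bits))
    def forward(self, img):
        return self.net(img)                          # bit logits

emb, ext = Embedder(), Extractor()
opt = torch.optim.Adam([*emb.parameters(), *ext.parameters()], lr=1e-4)
img = torch.rand(4, 3, 64, 64)
bits = torch.randint(0, 2, (4, 32)).float()
marked = emb(img, bits)
loss = (nn.functional.binary_cross_entropy_with_logits(ext(marked), bits)
        + 10 * nn.functional.mse_loss(marked, img))   # fidelity term
opt.zero_grad(); loss.backward(); opt.step()
```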
Quantization Aware Factorization for Deep Neural Network Compression
methods: Uses the Alternating Direction Method of Multipliers (ADMM) for Canonical Polyadic (CP) decomposition with quantized factors, combining tensor decomposition and quantization in a single procedure.
results: Experiments show that, compared to state-of-the-art post-training quantization methods, the approach achieves competitive results and high flexibility in reaching a desirable quality-performance tradeoff. Abstract
Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOP in neural networks. Due to memory and power consumption limitations of mobile or embedded devices, the quantization step is usually necessary when pre-trained models are deployed. A conventional post-training quantization approach applied to networks with decomposed weights yields a drop in accuracy. This motivated us to develop an algorithm that finds tensor approximation directly with quantized factors and thus benefit from both compression techniques while keeping the prediction quality of the model. Namely, we propose to use Alternating Direction Method of Multipliers (ADMM) for Canonical Polyadic (CP) decomposition with factors whose elements lie on a specified quantization grid. We compress neural network weights with a devised algorithm and evaluate it's prediction quality and performance. We compare our approach to state-of-the-art post-training quantization methods and demonstrate competitive results and high flexibility in achiving a desirable quality-performance tradeoff.
ScatterUQ: Interactive Uncertainty Visualizations for Multiclass Deep Learning Problems
results: The interactive ScatterUQ system lets users better understand model performance in context-driven uncertainty settings; a hover callback allows users to compare the salient features of test examples with training examples to understand model uncertainty performance and decide on follow-up actions. Abstract
Recently, uncertainty-aware deep learning methods for multiclass labeling problems have been developed that provide calibrated class prediction probabilities and out-of-distribution (OOD) indicators, letting machine learning (ML) consumers and engineers gauge a model's confidence in its predictions. However, this extra neural network prediction information is challenging to scalably convey visually for arbitrary data sources under multiple uncertainty contexts. To address these challenges, we present ScatterUQ, an interactive system that provides targeted visualizations to allow users to better understand model performance in context-driven uncertainty settings. ScatterUQ leverages recent advances in distance-aware neural networks, together with dimensionality reduction techniques, to construct robust, 2-D scatter plots explaining why a model predicts a test example to be (1) in-distribution and of a particular class, (2) in-distribution but unsure of the class, and (3) out-of-distribution. ML consumers and engineers can visually compare the salient features of test samples with training examples through the use of a ``hover callback'' to understand model uncertainty performance and decide follow up courses of action. We demonstrate the effectiveness of ScatterUQ to explain model uncertainty for a multiclass image classification on a distance-aware neural network trained on Fashion-MNIST and tested on Fashion-MNIST (in distribution) and MNIST digits (out of distribution), as well as a deep learning model for a cyber dataset. We quantitatively evaluate dimensionality reduction techniques to optimize our contextually driven UQ visualizations. Our results indicate that the ScatterUQ system should scale to arbitrary, multiclass datasets. Our code is available at https://github.com/mit-ll-responsible-ai/equine-webapp
Kernel Single Proxy Control for Deterministic Confounding
results: Experiments show that a single proxy variable suffices to estimate the causal effect when the outcome is generated deterministically, and the true causal effect is successfully recovered on a synthetic dataset. Abstract
We consider the problem of causal effect estimation with an unobserved confounder, where we observe a proxy variable that is associated with the confounder. Although Proxy Causal Learning (PCL) uses two proxy variables to recover the true causal effect, we show that a single proxy variable is sufficient for causal estimation if the outcome is generated deterministically, generalizing Control Outcome Calibration Approach (COCA). We propose two kernel-based methods for this setting: the first based on the two-stage regression approach, and the second based on a maximum moment restriction approach. We prove that both approaches can consistently estimate the causal effect, and we empirically demonstrate that we can successfully recover the causal effect on a synthetic dataset.
RECipe: Does a Multi-Modal Recipe Knowledge Graph Fit a Multi-Purpose Recommendation System?
results: On two public recipe recommendation datasets, experiments show that the KGE models perform comparably to deep neural collaborative filtering (NCF); the framework additionally supports zero-shot inference for new users (the cold start problem) and conditional recommendation with respect to recipe categories, and RECipe is demonstrated in a multi-purpose recommendation setting. Abstract
Over the past two decades, recommendation systems (RSs) have used machine learning (ML) solutions to recommend items, e.g., movies, books, and restaurants, to clients of a business or an online platform. Recipe recommendation, however, has not yet received much attention compared to those applications. We introduce RECipe as a multi-purpose recipe recommendation framework with a multi-modal knowledge graph (MMKG) backbone. The motivation behind RECipe is to go beyond (deep) neural collaborative filtering (NCF) by recommending recipes to users when they query in natural language or by providing an image. RECipe consists of 3 subsystems: (1) behavior-based recommender, (2) review-based recommender, and (3) image-based recommender. Each subsystem relies on the embedding representations of entities and relations in the graph. We first obtain (pre-trained) embedding representations of textual entities, such as reviews or ingredients, from a fine-tuned model of Microsoft's MPNet. We initialize the weights of the entities with these embeddings to train our knowledge graph embedding (KGE) model. For the visual component, i.e., recipe images, we develop a KGE-Guided variational autoencoder (KG-VAE) to learn the distribution of images and their latent representations. Once KGE and KG-VAE models are fully trained, we use them as a multi-purpose recommendation framework. For benchmarking, we created two knowledge graphs (KGs) from public datasets on Kaggle for recipe recommendation. Our experiments show that the KGE models have comparable performance to the neural solutions. We also present pre-trained NLP embeddings to address important applications such as zero-shot inference for new users (or the cold start problem) and conditional recommendation with respect to recipe categories. We eventually demonstrate the application of RECipe in a multi-purpose recommendation setting.
Copy Number Variation Informs fMRI-based Prediction of Autism Spectrum Disorder
results: Using 10-fold cross-validation on a sex-balanced dataset of 228 ASD and typically developing subjects, the researchers demonstrate that the attention-based method outperforms other multimodal approaches on ASD classification and severity prediction tasks. Abstract
The multifactorial etiology of autism spectrum disorder (ASD) suggests that its study would benefit greatly from multimodal approaches that combine data from widely varying platforms, e.g., neuroimaging, genetics, and clinical characterization. Prior neuroimaging-genetic analyses often apply naive feature concatenation approaches in data-driven work or use the findings from one modality to guide posthoc analysis of another, missing the opportunity to analyze the paired multimodal data in a truly unified approach. In this paper, we develop a more integrative model for combining genetic, demographic, and neuroimaging data. Inspired by the influence of genotype on phenotype, we propose using an attention-based approach where the genetic data guides attention to neuroimaging features of importance for model prediction. The genetic data is derived from copy number variation parameters, while the neuroimaging data is from functional magnetic resonance imaging. We evaluate the proposed approach on ASD classification and severity prediction tasks, using a sex-balanced dataset of 228 ASD and typically developing subjects in a 10-fold cross-validation framework. We demonstrate that our attention-based model combining genetic information, demographic data, and functional magnetic resonance imaging results in superior prediction performance compared to other multimodal approaches.
From Fake to Real (FFR): A two-stage training pipeline for mitigating spurious correlations with synthetic data
paper_authors: Maan Qraitem, Kate Saenko, Bryan A. Plummer
for: Mitigating the bias in visual recognition models caused by an imbalanced training set
methods: Pre-train a model on a balanced synthetic dataset, then fine-tune on real data, learning features that are robust against the bias
results: Improves the performance of bias mitigation methods and achieves state-of-the-art performance on three large-scale datasets. Abstract
Visual recognition models are prone to learning spurious correlations induced by an imbalanced training set where certain groups (\eg Females) are under-represented in certain classes (\eg Programmers). Generative models offer a promising direction in mitigating this bias by generating synthetic data for the minority samples and thus balancing the training set. However, prior work that uses these approaches overlooks that visual recognition models could often learn to differentiate between real and synthetic images and thus fail to unlearn the bias in the original dataset. In our work, we propose a novel two-stage pipeline to mitigate this issue where 1) we pre-train a model on a balanced synthetic dataset and then 2) fine-tune on the real data. Using this pipeline, we avoid training on both real and synthetic data, thus avoiding the bias between real and synthetic data. Moreover, we learn robust features against the bias in the first step that mitigate the bias in the second step. Moreover, our pipeline naturally integrates with bias mitigation methods; they can be simply applied to the fine-tuning step. As our experiments prove, our pipeline can further improve the performance of bias mitigation methods obtaining state-of-the-art performance on three large-scale datasets.
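The pipeline itself is simple to sketch (PyTorch; the loaders, epoch counts, and optimizer settings are placeholders): train on the balanced synthetic set first, then fine-tune on real data, so no batch ever mixes real and synthetic images.

```python
import torch

def ffr_train(model, synthetic_balanced_loader, real_loader,
              epochs=(5, 5), lr=1e-4):
    """Two-stage From-Fake-to-Real sketch: stage 1 pre-trains on a
    group-balanced synthetic set only; stage 2 fine-tunes on real data
    only. Keeping the stages separate prevents the model from exploiting
    a real-vs-synthetic shortcut. Bias-mitigation losses, if any, are
    simply applied in the fine-tuning stage."""
    loss_fn = torch.nn.CrossEntropyLoss()
    for loader, n_epochs in [(synthetic_balanced_loader, epochs[0]),
                             (real_loader, epochs[1])]:
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(n_epochs):
            for x, y in loader:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
    return model
```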
Improving Medical Image Classification in Noisy Labels Using Only Self-supervised Pretraining
results: The study finds that initializing model weights with self-supervised pretraining helps models learn better features for image classification under noisy labels and improves robustness against label noise. Abstract
Noisy labels hurt deep learning-based supervised image classification performance as the models may overfit the noise and learn corrupted feature extractors. For natural image classification training with noisy labeled data, model initialization with contrastive self-supervised pretrained weights has shown to reduce feature corruption and improve classification performance. However, no works have explored: i) how other self-supervised approaches, such as pretext task-based pretraining, impact the learning with noisy label, and ii) any self-supervised pretraining methods alone for medical images in noisy label settings. Medical images often feature smaller datasets and subtle inter class variations, requiring human expertise to ensure correct classification. Thus, it is not clear if the methods improving learning with noisy labels in natural image datasets such as CIFAR would also help with medical images. In this work, we explore contrastive and pretext task-based self-supervised pretraining to initialize the weights of a deep learning classification model for two medical datasets with self-induced noisy labels -- NCT-CRC-HE-100K tissue histological images and COVID-QU-Ex chest X-ray images. Our results show that models initialized with pretrained weights obtained from self-supervised learning can effectively learn better features and improve robustness against noisy labels.
Improving Performance in Continual Learning Tasks using Bio-Inspired Architectures
results: The method outperforms other memory-constrained learning approaches on the Split-MNIST, Split-CIFAR-10, and Split-CIFAR-100 datasets and matches state-of-the-art memory-intensive replay-based approaches. The authors also integrate key design elements into other backpropagation-based continual learning algorithms, significantly improving their accuracy. Abstract
The ability to learn continuously from an incoming data stream without catastrophic forgetting is critical to designing intelligent systems. Many approaches to continual learning rely on stochastic gradient descent and its variants that employ global error updates, and hence need to adopt strategies such as memory buffers or replay to circumvent its stability, greed, and short-term memory limitations. To address this limitation, we have developed a biologically inspired lightweight neural network architecture that incorporates synaptic plasticity mechanisms and neuromodulation and hence learns through local error signals to enable online continual learning without stochastic gradient descent. Our approach leads to superior online continual learning performance on Split-MNIST, Split-CIFAR-10, and Split-CIFAR-100 datasets compared to other memory-constrained learning approaches and matches that of the state-of-the-art memory-intensive replay-based approaches. We further demonstrate the effectiveness of our approach by integrating key design concepts into other backpropagation-based continual learning algorithms, significantly improving their accuracy. Our results provide compelling evidence for the importance of incorporating biological principles into machine learning models and offer insights into how we can leverage them to design more efficient and robust systems for online continual learning.
Deep Learning for Diverse Data Types Steganalysis: A Review
paper_authors: Hamza Kheddar, Mustapha Hemis, Yassine Himeur, David Megías, Abbes Amira
for: Reviewing deep learning-based steganalysis techniques for uncovering concealed information, such as the hidden communications used by cybercriminals or terrorists.
methods: Surveys deep learning techniques for steganalysis across image, audio, and video covers, including more advanced approaches such as deep transfer learning (DTL) and deep reinforcement learning (DRL), with the datasets and evaluation metrics used in recent studies.
results: According to the literature reviewed, deep learning-based steganalysis achieves high detection accuracy and speed, with DTL- and DRL-based methods in particular delivering stronger detection performance across diverse data types. Abstract
Steganography and steganalysis are two interrelated aspects of the field of information security. Steganography seeks to conceal communications, whereas steganalysis is aimed to either find them or even, if possible, recover the data they contain. Steganography and steganalysis have attracted a great deal of interest, particularly from law enforcement. Steganography is often used by cybercriminals and even terrorists to avoid being captured while in possession of incriminating evidence, even encrypted, since cryptography is prohibited or restricted in many countries. Therefore, knowledge of cutting-edge techniques to uncover concealed information is crucial in exposing illegal acts. Over the last few years, a number of strong and reliable steganography and steganalysis techniques have been introduced in the literature. This review paper provides a comprehensive overview of deep learning-based steganalysis techniques used to detect hidden information within digital media. The paper covers all types of cover in steganalysis, including image, audio, and video, and discusses the most commonly used deep learning techniques. In addition, the paper explores the use of more advanced deep learning techniques, such as deep transfer learning (DTL) and deep reinforcement learning (DRL), to enhance the performance of steganalysis systems. The paper provides a systematic review of recent research in the field, including data sets and evaluation metrics used in recent studies. It also presents a detailed analysis of DTL-based steganalysis approaches and their performance on different data sets. The review concludes with a discussion on the current state of deep learning-based steganalysis, challenges, and future research directions.
Dynamic Model Agnostic Reliability Evaluation of Machine-Learning Methods Integrated in Instrumentation & Control Systems
for: This paper aims to improve the trustworthiness of machine learning (ML) predictions in instrumentation and control systems by developing a real-time model-agnostic method to evaluate the relative reliability of ML predictions.
methods: The proposed method, called Laplacian distributed decay for reliability (LADDR), incorporates out-of-distribution detection on the training dataset to determine the difference between the operational and training datasets, which is used to calculate a prediction’s relative reliability.
results: The LADDR method is demonstrated on a feedforward neural network-based model used to predict safety significant factors during different loss-of-flow transients, and is shown to be effective in evaluating the relative reliability of ML predictions for conventional interpolation tasks. Abstract
In recent years, the field of data-driven neural network-based machine learning (ML) algorithms has grown significantly and spurred research in its applicability to instrumentation and control systems. While they are promising in operational contexts, the trustworthiness of such algorithms is not adequately assessed. Failures of ML-integrated systems are poorly understood; the lack of comprehensive risk modeling can degrade the trustworthiness of these systems. In recent reports by the National Institute for Standards and Technology, trustworthiness in ML is a critical barrier to adoption and will play a vital role in intelligent systems' safe and accountable operation. Thus, in this work, we demonstrate a real-time model-agnostic method to evaluate the relative reliability of ML predictions by incorporating out-of-distribution detection on the training dataset. It is well documented that ML algorithms excel at interpolation (or near-interpolation) tasks but significantly degrade at extrapolation. This occurs when new samples are "far" from training samples. The method, referred to as the Laplacian distributed decay for reliability (LADDR), determines the difference between the operational and training datasets, which is used to calculate a prediction's relative reliability. LADDR is demonstrated on a feedforward neural network-based model used to predict safety significant factors during different loss-of-flow transients. LADDR is intended as a "data supervisor" and determines the appropriateness of well-trained ML models in the context of operational conditions. Ultimately, LADDR illustrates how training data can be used as evidence to support the trustworthiness of ML predictions when utilized for conventional interpolation tasks.
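The general shape of the idea is easy to sketch (NumPy): score a new operational sample by how far it sits from the training data, with reliability decaying under a Laplacian kernel. The distance measure, bandwidth, and calibration below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def laddr_reliability(x_new, x_train, bandwidth=1.0):
    """LADDR-style relative reliability (sketch): near the training
    support (interpolation) the score approaches 1; far from it
    (extrapolation) the score decays toward 0."""
    d = np.linalg.norm(x_train - x_new, axis=1).min()  # nearest-sample gap
    return np.exp(-d / bandwidth)                      # Laplacian decay

x_train = np.random.randn(500, 8)          # training (interpolation) region
print(laddr_reliability(np.zeros(8), x_train))       # near data: high score
print(laddr_reliability(10 * np.ones(8), x_train))   # far away: low score
```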
MT-IceNet – A Spatial and Multi-Temporal Deep Learning Model for Arctic Sea Ice Forecasting
for: Forecasting Arctic sea ice concentration (SIC) to better understand and predict Arctic climate change.
methods: A deep learning model called MT-IceNet, which uses a UNet-based encoder-decoder architecture with skip connections and processes multi-temporal input streams to regenerate spatial maps at future timesteps.
results: Using satellite-retrieved sea ice data from NSIDC and atmospheric and oceanic variables from the ERA5 reanalysis, MT-IceNet provides promising predictive performance, with up to a 60% decrease in prediction error at a 6-month lead time compared to its state-of-the-art counterparts. Abstract
Arctic amplification has altered the climate patterns both regionally and globally, resulting in more frequent and more intense extreme weather events in the past few decades. The essential part of Arctic amplification is the unprecedented sea ice loss as demonstrated by satellite observations. Accurately forecasting Arctic sea ice from sub-seasonal to seasonal scales has been a major research question with fundamental challenges at play. In addition to physics-based Earth system models, researchers have been applying multiple statistical and machine learning models for sea ice forecasting. Looking at the potential of data-driven approaches to study sea ice variations, we propose MT-IceNet - a UNet based spatial and multi-temporal (MT) deep learning model for forecasting Arctic sea ice concentration (SIC). The model uses an encoder-decoder architecture with skip connections and processes multi-temporal input streams to regenerate spatial maps at future timesteps. Using bi-monthly and monthly satellite retrieved sea ice data from NSIDC as well as atmospheric and oceanic variables from ERA5 reanalysis product during 1979-2021, we show that our proposed model provides promising predictive performance for per-pixel SIC forecasting with up to 60% decrease in prediction error for a lead time of 6 months as compared to its state-of-the-art counterparts.
Efficient option pricing with unary-based photonic computing chip and generative adversarial learning
paper_authors: Hui Zhang, Lingxiao Wan, Sergi Ramos-Calderer, Yuancheng Zhan, Wai-Keong Mok, Hong Cai, Feng Gao, Xianshu Luo, Guo-Qiang Lo, Leong Chuan Kwek, José Ignacio Latorre, Ai Qun Liu
for: Improving the efficiency and quality of financial services.
methods: A photonic chip implementing the unary approach to European option pricing, combined with the quantum amplitude estimation algorithm to achieve a quadratic speedup over classical Monte Carlo methods.
results: The realized photonic chip rapidly computes European option prices, reducing the computation required by classical Monte Carlo methods, with a generative adversarial network embedded for efficient learning and loading of asset distributions. Abstract
In the modern financial industry system, the structure of products has become more and more complex, and the bottleneck constraint of classical computing power has already restricted the development of the financial industry. Here, we present a photonic chip that implements the unary approach to European option pricing, in combination with the quantum amplitude estimation algorithm, to achieve a quadratic speedup compared to classical Monte Carlo methods. The circuit consists of three modules: a module loading the distribution of asset prices, a module computing the expected payoff, and a module performing the quantum amplitude estimation algorithm to introduce speed-ups. In the distribution module, a generative adversarial network is embedded for efficient learning and loading of asset distributions, which precisely capture the market trends. This work is a step forward in the development of specialized photonic processors for applications in finance, with the potential to improve the efficiency and quality of financial services.
When More is Less: Incorporating Additional Datasets Can Hurt Performance By Introducing Spurious Correlations
paper_authors: Rhys Compton, Lily Zhang, Aahlad Puli, Rajesh Ranganath
for: Investigating how machine learning models behave when trained on multiple datasets, and how those datasets interact.
methods: A large-scale empirical study across combinations of four open-source chest X-ray datasets and nine labels.
results: In 43% of settings, a model trained on data pooled from two hospitals has poorer worst-group accuracy over both hospitals than a model trained on a single hospital's data; this arises from spurious correlations between disease and hospital introduced by hospital-specific image artifacts. Abstract
In machine learning, incorporating more data is often seen as a reliable strategy for improving model performance; this work challenges that notion by demonstrating that the addition of external datasets in many cases can hurt the resulting model's performance. In a large-scale empirical study across combinations of four different open-source chest x-ray datasets and 9 different labels, we demonstrate that in 43% of settings, a model trained on data from two hospitals has poorer worst group accuracy over both hospitals than a model trained on just a single hospital's data. This surprising result occurs even though the added hospital makes the training distribution more similar to the test distribution. We explain that this phenomenon arises from the spurious correlation that emerges between the disease and hospital, due to hospital-specific image artifacts. We highlight the trade-off one encounters when training on multiple datasets, between the obvious benefit of additional data and insidious cost of the introduced spurious correlation. In some cases, balancing the dataset can remove the spurious correlation and improve performance, but it is not always an effective strategy. We contextualize our results within the literature on spurious correlations to help explain these outcomes. Our experiments underscore the importance of exercising caution when selecting training data for machine learning models, especially in settings where there is a risk of spurious correlations such as with medical imaging. The risks outlined highlight the need for careful data selection and model evaluation in future research and practice.
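The evaluation metric at the center of the finding is worth sketching (NumPy; the toy labels below are illustrative): worst-group accuracy reports the minimum per-group accuracy, which exposes models that lean on group-specific artifacts such as hospital watermarks.

```python
import numpy as np

def worst_group_accuracy(y_true, y_pred, groups):
    """Accuracy computed separately per group (e.g., per hospital, or per
    disease-hospital combination), reporting the minimum over groups."""
    accs = [np.mean(y_pred[groups == g] == y_true[groups == g])
            for g in np.unique(groups)]
    return min(accs)

y_true = np.array([1, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1])
hospital = np.array([0, 0, 0, 1, 1, 1])
print(worst_group_accuracy(y_true, y_pred, hospital))  # min over hospitals
```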
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
results: SILO improves language model performance across domains while mitigating legal risk from copyrighted and restricted data; access to the datastore closes 90% of the out-of-domain performance gap with an LM trained on the Pile. The study also analyzes which nonparametric approach works best, where the remaining errors lie, and how performance scales with datastore size. Abstract
The legality of training language models (LMs) on copyrighted or otherwise restricted data is under intense debate. However, as we show, model performance significantly degrades if trained only on low-risk text (e.g., out-of-copyright books or government documents), due to its limited size and domain coverage. We present SILO, a new language model that manages this risk-performance tradeoff during inference. SILO is built by (1) training a parametric LM on Open License Corpus (OLC), a new corpus we curate with 228B tokens of public domain and permissively licensed text and (2) augmenting it with a more general and easily modifiable nonparametric datastore (e.g., containing copyrighted books or news) that is only queried during inference. The datastore allows use of high-risk data without training on it, supports sentence-level data attribution, and enables data producers to opt out from the model by removing content from the store. These capabilities can foster compliance with data-use regulations such as the fair use doctrine in the United States and the GDPR in the European Union. Our experiments show that the parametric LM struggles on domains not covered by OLC. However, access to the datastore greatly improves out of domain performance, closing 90% of the performance gap with an LM trained on the Pile, a more diverse corpus with mostly high-risk text. We also analyze which nonparametric approach works best, where the remaining errors lie, and how performance scales with datastore size. Our results suggest that it is possible to build high quality language models while mitigating their legal risk.
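The inference-time combination of a parametric LM with a retrieval datastore has roughly the shape of kNN-LM-style interpolation, sketched below in NumPy. All sizes, the stand-in LM distribution, and the interpolation weight are illustrative assumptions rather than the paper's settings; the key property is that the datastore is queried, never trained on, so entries can be attributed or removed.

```python
import numpy as np

def knn_next_token(query, keys, next_tokens, vocab_size, k=4, temp=1.0):
    """Nonparametric next-token distribution from a datastore of
    (context-embedding, next-token) pairs: nearest neighbors are
    weighted by a softmax over negative distances."""
    d = np.linalg.norm(keys - query, axis=1)
    nn = np.argsort(d)[:k]
    w = np.exp(-d[nn] / temp)
    w /= w.sum()
    p = np.zeros(vocab_size)
    np.add.at(p, next_tokens[nn], w)   # accumulate neighbor weights
    return p

keys = np.random.randn(1000, 32)                # datastore context vectors
next_tokens = np.random.randint(0, 100, 1000)   # their recorded next tokens
p_knn = knn_next_token(np.random.randn(32), keys, next_tokens,
                       vocab_size=100)
p_lm = np.full(100, 1 / 100)    # stand-in for the parametric LM's output
lam = 0.25                      # interpolation weight (a modeling choice)
p = lam * p_knn + (1 - lam) * p_lm
```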
Meta-Learning Operators to Optimality from Multi-Task Non-IID Data
results: The proposed method learns a common representation function across heterogeneous sources or tasks, benefiting both computational effort and statistical generalization, and the framework extends to a wider range of applications such as controls and dynamical systems. Abstract
A powerful concept behind much of the recent progress in machine learning is the extraction of common features across data from heterogeneous sources or tasks. Intuitively, using all of one's data to learn a common representation function benefits both computational effort and statistical generalization by leaving a smaller number of parameters to fine-tune on a given task. Toward theoretically grounding these merits, we propose a general setting of recovering linear operators $M$ from noisy vector measurements $y = Mx + w$, where the covariates $x$ may be both non-i.i.d. and non-isotropic. We demonstrate that existing isotropy-agnostic meta-learning approaches incur biases on the representation update, which causes the scaling of the noise terms to lose favorable dependence on the number of source tasks. This in turn can cause the sample complexity of representation learning to be bottlenecked by the single-task data size. We introduce an adaptation, $\texttt{De-bias & Feature-Whiten}$ ($\texttt{DFW}$), of the popular alternating minimization-descent (AMD) scheme proposed in Collins et al., (2021), and establish linear convergence to the optimal representation with noise level scaling down with the $\textit{total}$ source data size. This leads to generalization bounds on the same order as an oracle empirical risk minimizer. We verify the vital importance of $\texttt{DFW}$ on various numerical simulations. In particular, we show that vanilla alternating-minimization descent fails catastrophically even for iid, but mildly non-isotropic data. Our analysis unifies and generalizes prior work, and provides a flexible framework for a wider range of applications, such as in controls and dynamical systems.
A Deep-Learning Method Using Auto-encoder and Generative Adversarial Network for Anomaly Detection on Ancient Stone Stele Surfaces
results: Using an autoencoder (AE) and generative adversarial network (GAN) architecture, the method detects emergencies on ancient stone stele surfaces in real time without requiring large numbers of anomaly samples. In a case study on the stone steles of the Longmen Grottoes, an unsupervised learning model is proposed and validated with a reconstruction accuracy of 99.74%, confirming the model's reliability and precision.
Abstract
Accurate detection of natural deterioration and man-made damage on the surfaces of ancient stele in the first instance is essential for their preventive conservation. Existing methods for cultural heritage preservation are not able to achieve this goal perfectly due to the difficulty of balancing accuracy, efficiency, timeliness, and cost. This paper presents a deep-learning method to automatically detect the above-mentioned emergencies on ancient stone stele in real time, employing an autoencoder (AE) and generative adversarial network (GAN). The proposed method overcomes the limitations of existing methods by requiring no extensive anomaly samples while enabling comprehensive detection of unpredictable anomalies. The method includes stages of monitoring, data acquisition, pre-processing, model structuring, and post-processing. Taking the Longmen Grottoes' stone steles as a case study, an unsupervised learning model based on AE and GAN architectures is proposed and validated with a reconstruction accuracy of 99.74\%. The method's evaluation revealed the proficient detection of seven artificially designed anomalies and demonstrated precision and reliability without false alarms. This research provides novel ideas and possibilities for the application of deep learning in the field of cultural heritage.
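The detection principle reduces to a reconstruction-error test: a model trained only on normal surface patches reconstructs them faithfully, so large errors flag candidate deterioration or damage. The sketch below assumes a trained reconstruction function and a calibrated threshold; both are placeholders, not the paper's model.

```python
import numpy as np

def anomaly_flags(reconstruct, patches, threshold):
    """reconstruct: trained AE/GAN generator mapping patches to reconstructions.
    patches: (N, H, W) array of grayscale patches scaled to [0, 1]."""
    recon = reconstruct(patches)
    err = np.mean((patches - recon) ** 2, axis=(1, 2))  # per-patch reconstruction MSE
    return err > threshold  # True where the surface deviates from the learned normal
```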
DiffCR: A Fast Conditional Diffusion Framework for Cloud Removal from Optical Satellite Images
results: Experimental results show that DiffCR achieves the best performance on two commonly used benchmark datasets, with only 5.1% of the parameters and 5.4% of the computational complexity of the previous best methods. All experimental results and code will be made publicly available at https://github.com/XavierJiezou/DiffCR.
Abstract
Optical satellite images are a critical data source; however, cloud cover often compromises their quality, hindering image applications and analysis. Consequently, effectively removing clouds from optical satellite images has emerged as a prominent research direction. While recent advancements in cloud removal primarily rely on generative adversarial networks, which may yield suboptimal image quality, diffusion models have demonstrated remarkable success in diverse image-generation tasks, showcasing their potential in addressing this challenge. This paper presents a novel framework called DiffCR, which leverages conditional guided diffusion with deep convolutional networks for high-performance cloud removal for optical satellite imagery. Specifically, we introduce a decoupled encoder for conditional image feature extraction, providing a robust color representation to ensure the close similarity of appearance information between the conditional input and the synthesized output. Moreover, we propose a novel and efficient time and condition fusion block within the cloud removal model to accurately simulate the correspondence between the appearance in the conditional image and the target image at a low computational cost. Extensive experimental evaluations on two commonly used benchmark datasets demonstrate that DiffCR consistently achieves state-of-the-art performance on all metrics, with parameter and computational complexities amounting to only 5.1% and 5.4%, respectively, of those of the previous best methods. The source code, pre-trained models, and all experimental results will be publicly available at https://github.com/XavierJiezou/DiffCR upon acceptance of this work.
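For orientation, a bare-bones conditional diffusion sampling loop is sketched below: the denoiser receives the cloudy image as conditioning at every step, which is the core of conditional guided diffusion. The noise schedule, step count, and `eps_model` interface are generic DDPM-style assumptions, not DiffCR's actual architecture.

```python
import numpy as np

def sample_cloud_free(eps_model, cloudy, shape, T=1000, seed=0):
    """eps_model(x, cloudy, t) predicts the noise in x given the cloudy input."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal(shape)            # start from pure noise
    for t in reversed(range(T)):
        eps = eps_model(x, cloudy, t)         # conditioning on the cloudy image
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise  # standard DDPM reverse step
    return x                                  # estimated cloud-free image
```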
Vector Embeddings by Sequence Similarity and Context for Improved Compression, Similarity Search, Clustering, Organization, and Manipulation of cDNA Libraries
results: The study finds that this numerical representation enables better clustering of similar sequences and improves compression performance. In addition, context-based learning over codon triplets allows finer-grained clustering and feature analysis.
Abstract
This paper demonstrates the utility of organized numerical representations of genes in research involving flat string gene formats (i.e., FASTA/FASTQ5). FASTA/FASTQ files have several current limitations, such as their large file sizes, slow processing speeds for mapping and alignment, and contextual dependencies. These challenges significantly hinder investigations and tasks that involve finding similar sequences. The solution lies in transforming sequences into an alternative representation that facilitates easier clustering into similar groups compared to the raw sequences themselves. By assigning a unique vector embedding to each short sequence, it is possible to more efficiently cluster and improve upon compression performance for the string representations of cDNA libraries. Furthermore, through learning alternative coordinate vector embeddings based on the contexts of codon triplets, we can demonstrate clustering based on amino acid properties. Finally, using this sequence embedding method to encode barcodes and cDNA sequences, we can improve the time complexity of the similarity search by coupling vector embeddings with an algorithm that determines the proximity of vectors in Euclidean space; this allows us to perform sequence similarity searches in a quicker and more modular fashion.
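The core move can be illustrated with a toy embedding: map each sequence to a fixed-length vector so that similarity search becomes nearest-neighbour lookup in Euclidean space rather than string alignment. The codon-count embedding below is a simple stand-in for the paper's learned embeddings.

```python
import numpy as np
from itertools import product
from scipy.spatial import cKDTree

CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {c: i for i, c in enumerate(CODONS)}

def embed(seq):
    """Length-normalized codon-triplet counts: a crude 64-dim sequence vector."""
    v = np.zeros(len(CODONS))
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in INDEX:
            v[INDEX[codon]] += 1
    return v / max(1.0, v.sum())

library = ["ATGGCTTAA", "ATGGCATAA", "TTTCCCGGG"]      # toy cDNA sequences
tree = cKDTree(np.stack([embed(s) for s in library]))  # Euclidean index
dist, idx = tree.query(embed("ATGGCTTAA"), k=2)        # fast similarity search
print(library[idx[0]], dist[0])                        # exact match at distance 0
```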
Probabilistic Invariant Learning with Randomized Linear Classifiers
paper_authors: Leonardo Cotta, Gal Yehuda, Assaf Schuster, Chris J. Maddison
for: The goal of this paper is to design a class of models that are both expressive and preserve known task invariances while requiring fewer resources.
methods: Drawing on ideas from randomized algorithms, the paper proposes a class of classifiers based on randomized linear models, called Randomized Linear Classifiers (RLCs), and proves that RLCs can, with high probability, approximate any (smooth) function while remaining invariant to compact group transformations.
results: Experiments show that RLCs can outperform deterministic neural networks and their invariant counterparts on invariance tasks while requiring fewer resources.
Abstract
Designing models that are both expressive and preserve known invariances of tasks is an increasingly hard problem. Existing solutions tradeoff invariance for computational or memory resources. In this work, we show how to leverage randomness and design models that are both expressive and invariant but use less resources. Inspired by randomized algorithms, our key insight is that accepting probabilistic notions of universal approximation and invariance can reduce our resource requirements. More specifically, we propose a class of binary classification models called Randomized Linear Classifiers (RLCs). We give parameter and sample size conditions in which RLCs can, with high probability, approximate any (smooth) function while preserving invariance to compact group transformations. Leveraging this result, we design three RLCs that are provably probabilistic invariant for classification tasks over sets, graphs, and spherical data. We show how these models can achieve probabilistic invariance and universality using less resources than (deterministic) neural networks and their invariant counterparts. Finally, we empirically demonstrate the benefits of this new class of models on invariant tasks where deterministic invariant neural networks are known to struggle.
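A minimal sketch of the flavor of an RLC over set-structured inputs follows: sum pooling makes the feature map permutation invariant, and the linear weights are sampled rather than deterministically trained, so correctness holds with high probability rather than always. The repeated-sampling majority vote and the random-feature map are illustrative choices, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def rlc_predict(X_set, d_feat=256, n_votes=32):
    """X_set: (n_items, d) array; returns a {0, 1} label by majority vote."""
    pooled = X_set.sum(axis=0)               # invariant to permutations of the set
    votes = 0
    for _ in range(n_votes):
        W = rng.normal(size=(d_feat, pooled.shape[0]))  # sampled, not trained
        w = rng.normal(size=d_feat)
        votes += (w @ np.tanh(W @ pooled)) > 0          # one randomized linear rule
    return int(votes > n_votes / 2)
```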
paper_authors: Zihan Guan, Mengnan Du, Ninghao Liu
for: The paper is written to detect backdoor attacks on graph learning models.
methods: The paper proposes an explanation-guided backdoor detection method that utilizes topological feature information to distinguish backdoor samples from clean samples.
results: The proposed method is effective in detecting backdoor attacks on multiple popular datasets and attack methods, and provides explainable results through the use of explanation methods.
Abstract
Backdoor attacks pose a significant security risk to graph learning models. Backdoors can be embedded into the target model by inserting backdoor triggers into the training dataset, causing the model to make incorrect predictions when the trigger is present. To counter backdoor attacks, backdoor detection has been proposed. An emerging detection strategy in the vision and NLP domains is based on an intriguing phenomenon: when training models on a mixture of backdoor and clean samples, the loss on backdoor samples drops significantly faster than on clean samples, allowing backdoor samples to be easily detected by selecting samples with the lowest loss values. However, the ignorance of topological feature information on graph data limits its detection effectiveness when applied directly to the graph domain. To this end, we propose an explanation-guided backdoor detection method to take advantage of the topological information. Specifically, we train a helper model on the graph dataset, feed graph samples into the model, and then adopt explanation methods to attribute model prediction to an important subgraph. We observe that backdoor samples have distinct attribution distribution than clean samples, so the explanatory subgraph could serve as more discriminative features for detecting backdoor samples. Comprehensive experiments on multiple popular datasets and attack methods demonstrate the effectiveness and explainability of our method. Our code is available: https://github.com/GuanZihan/GNN_backdoor_detection.
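The scoring step can be pictured as follows: given per-node attribution scores for each graph (from any explanation method), backdoor samples concentrate attribution on a small trigger subgraph, so a concentration statistic separates them from clean samples. The entropy statistic and quantile threshold below are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

def attribution_entropy(node_scores):
    """Low entropy = attribution concentrated on few nodes (trigger-like)."""
    p = np.abs(node_scores)
    p = p / (p.sum() + 1e-12)
    return -(p * np.log(p + 1e-12)).sum()

def flag_backdoor_samples(all_scores, quantile=0.05):
    """all_scores: list of per-graph attribution arrays from the helper model."""
    ents = np.array([attribution_entropy(s) for s in all_scores])
    return ents < np.quantile(ents, quantile)
```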
Event Abstraction for Enterprise Collaboration Systems to Support Social Process Mining
results: 论文的评估结果表明,该算法能够生成准确的结果。ECSEA是一种重要的预处理方法,可以帮助解读ECS中的协作工作活动,我们称之为社会进程挖掘(Social Process Mining)。Abstract
One aim of Process Mining (PM) is the discovery of process models from event logs of information systems. PM has been successfully applied to process-oriented enterprise systems but is less suited for communication- and document-oriented Enterprise Collaboration Systems (ECS). ECS event logs are very fine-granular and PM applied to their logs results in spaghetti models. A common solution for this is event abstraction, i.e., converting low-level logs into more abstract high-level logs before running discovery algorithms. ECS logs come with special characteristics that have so far not been fully addressed by existing event abstraction approaches. We aim to close this gap with a tailored ECS event abstraction (ECSEA) approach that trains a model by comparing recorded actual user activities (high-level traces) with the system-generated low-level traces (extracted from the ECS). The model allows us to automatically convert future low-level traces into an abstracted high-level log that can be used for PM. Our evaluation shows that the algorithm produces accurate results. ECSEA is a preprocessing method that is essential for the interpretation of collaborative work activity in ECS, which we call Social Process Mining.
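A toy version of the abstraction step is sketched below: consecutive low-level ECS events by the same user within a time window are grouped and mapped to one high-level activity. The hand-written mapping table stands in for the model ECSEA trains from recorded high-level traces.

```python
from collections import defaultdict

MAPPING = {("open_doc", "edit_doc", "save_doc"): "Revise document"}  # assumed table

def abstract_log(events, window=300):
    """events: list of (timestamp, user, action) tuples sorted by timestamp."""
    sessions = defaultdict(list)
    high_level = []

    def flush(user):
        actions = tuple(a for _, a in sessions[user])
        high_level.append((user, MAPPING.get(actions, "Other")))
        sessions[user].clear()

    for ts, user, action in events:
        if sessions[user] and ts - sessions[user][-1][0] > window:
            flush(user)                       # a long gap closes the session
        sessions[user].append((ts, action))
    for user in list(sessions):
        if sessions[user]:
            flush(user)
    return high_level

events = [(0, "ann", "open_doc"), (40, "ann", "edit_doc"), (90, "ann", "save_doc")]
print(abstract_log(events))                   # [('ann', 'Revise document')]
```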
Data Augmentation-Based Unsupervised Domain Adaptation In Medical Imaging
results: Extensive experiments and comparisons show that the method achieves high accuracy and broad applicability across a variety of tasks and datasets, and adapts quickly to new scanners and datasets.
Abstract
Deep learning-based models in medical imaging often struggle to generalize effectively to new scans due to data heterogeneity arising from differences in hardware, acquisition parameters, population, and artifacts. This limitation presents a significant challenge in adopting machine learning models for clinical practice. We propose an unsupervised method for robust domain adaptation in brain MRI segmentation by leveraging MRI-specific augmentation techniques. To evaluate the effectiveness of our method, we conduct extensive experiments across diverse datasets, modalities, and segmentation tasks, comparing against the state-of-the-art methods. The results show that our proposed approach achieves high accuracy, exhibits broad applicability, and showcases remarkable robustness against domain shift in various tasks, surpassing the state-of-the-art performance in the majority of cases.
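The augmentations in question are MRI-specific intensity perturbations of the kind sketched below (random multiplicative bias field, gamma shift, scanner noise). The ranges and transform choices are illustrative assumptions; the paper's exact pipeline is not reproduced here.

```python
import numpy as np

def augment_mri(volume, rng):
    """volume: 3D array normalized to [0, 1]; rng: np.random.Generator."""
    axes = [np.linspace(-1, 1, s) for s in volume.shape]
    zz, yy, xx = np.meshgrid(*axes, indexing="ij")
    c = rng.uniform(-0.3, 0.3, size=3)
    bias = np.exp(c[0] * zz + c[1] * yy + c[2] * xx)   # smooth multiplicative bias field
    out = np.clip(volume * bias, 0.0, 1.0)
    out = out ** rng.uniform(0.7, 1.5)                 # random gamma (contrast) shift
    out = out + rng.normal(0.0, 0.02, size=out.shape)  # additive scanner noise
    return np.clip(out, 0.0, 1.0)
```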
Metaheuristic Algorithms in Artificial Intelligence with Applications to Bioinformatics, Biostatistics, Ecology, and the Manufacturing Industries
paper_authors: Elvis Han Cui, Zizhao Zhang, Culsome Junwen Chen, Weng Kee Wong
for: This paper is written to demonstrate the flexibility and out-performance of a newly proposed nature-inspired metaheuristic algorithm called competitive swarm optimizer with mutated agents (CSO-MA) in various optimization problems in the statistical sciences.
methods: The paper uses the CSO-MA algorithm to solve a variety of optimization problems, including finding maximum likelihood estimates of parameters, estimating parameters in a Rasch model, finding M-estimates for a Cox regression, and matrix completion to impute missing values.
results: The paper shows that the CSO-MA algorithm is efficient and can incorporate various cost structures or multiple user-specified nonlinear constraints. The algorithm is applied to a variety of optimization problems, including finding maximum likelihood estimates of parameters in a single cell generalized trend model, estimating parameters in a commonly used Rasch model, finding M-estimates for a Cox regression, and matrix completion to impute missing values.
Abstract
Nature-inspired metaheuristic algorithms are important components of artificial intelligence, and are increasingly used across disciplines to tackle various types of challenging optimization problems. We apply a newly proposed nature-inspired metaheuristic algorithm called competitive swarm optimizer with mutated agents (CSO-MA) and demonstrate its flexibility and out-performance relative to its competitors in a variety of optimization problems in the statistical sciences. In particular, we show the algorithm is efficient and can incorporate various cost structures or multiple user-specified nonlinear constraints. Our applications include (i) finding maximum likelihood estimates of parameters in a single cell generalized trend model to study pseudotime in bioinformatics, (ii) estimating parameters in a commonly used Rasch model in education research, (iii) finding M-estimates for a Cox regression in a Markov renewal model and (iv) matrix completion to impute missing values in a two compartment model. In addition we discuss applications to (v) select variables optimally in an ecology problem and (vi) design a car refueling experiment for the auto industry using a logistic model with multiple interacting factors.
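For readers unfamiliar with competitive swarm optimizers, one update step is sketched below: particles are paired at random, the loser of each pair learns from the winner and from the swarm mean, and a few agents are mutated to preserve diversity. Parameter values and the mutation scheme are generic assumptions, not the exact CSO-MA settings.

```python
import numpy as np

def cso_ma_step(X, V, f, rng, phi=0.1, p_mut=0.05):
    """X: (n, d) positions, V: (n, d) velocities, f: objective to minimize."""
    n, d = X.shape
    order = rng.permutation(n)
    mean = X.mean(axis=0)
    for a, b in zip(order[::2], order[1::2]):          # random pairwise competitions
        win, lose = (a, b) if f(X[a]) < f(X[b]) else (b, a)
        r1, r2, r3 = rng.random((3, d))
        V[lose] = r1 * V[lose] + r2 * (X[win] - X[lose]) + phi * r3 * (mean - X[lose])
        X[lose] = X[lose] + V[lose]                    # only losers move
    mutate = rng.random(n) < p_mut                     # the "mutated agents" part
    X[mutate] += rng.normal(0.0, 0.1, size=(mutate.sum(), d))
    return X, V
```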
AdaptEx: A Self-Service Contextual Bandit Platform
results: The platform quickly improves user experiences while reducing the costs and time associated with traditional testing methods. It can also iterate rapidly toward optimal product solutions, even with ever-changing content and in continuous "cold start" situations.
Abstract
This paper presents AdaptEx, a self-service contextual bandit platform widely used at Expedia Group, that leverages multi-armed bandit algorithms to personalize user experiences at scale. AdaptEx considers the unique context of each visitor to select the optimal variants and learns quickly from every interaction they make. It offers a powerful solution to improve user experiences while minimizing the costs and time associated with traditional testing methods. The platform unlocks the ability to iterate towards optimal product solutions quickly, even in ever-changing content and continuous "cold start" situations gracefully.
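A generic contextual bandit of the kind such a platform runs per use case can be sketched with LinUCB: choose the variant whose upper confidence bound is highest for the visitor's context, then update with the observed reward. This is a textbook sketch, not AdaptEx internals.

```python
import numpy as np

class LinUCBArm:
    """One variant (arm) with a ridge-regression reward model."""
    def __init__(self, d, alpha=1.0):
        self.A = np.eye(d)            # regularized Gram matrix
        self.b = np.zeros(d)
        self.alpha = alpha

    def ucb(self, x):
        theta = np.linalg.solve(self.A, self.b)
        return theta @ x + self.alpha * np.sqrt(x @ np.linalg.solve(self.A, x))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

def choose_variant(arms, x):
    """x: visitor context vector; returns the index of the arm to show."""
    return max(range(len(arms)), key=lambda i: arms[i].ucb(x))
```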
Understanding the Effect of Counterfactual Explanations on Trust and Reliance on AI for Human-AI Collaborative Clinical Decision Making
paper_authors: Min Hun Lee, Chong Jun Chew
for: The paper studies AI-assisted decision-making in high-stakes domains (e.g. healthcare) and the problem of humans over-relying on AI suggestions.
methods: The paper uses salient feature explanations together with counterfactual explanations to make humans evaluate AI suggestions more analytically and reduce over-reliance on AI.
results: The study finds that when the AI model provides correct suggestions, both human performance and agreement levels improve. Humans still over-rely on incorrect AI suggestions, but counterfactual explanations help reduce this over-reliance. In particular, laypersons improved more when given counterfactual explanations, while therapists performed better overall.
Abstract
Artificial intelligence (AI) is increasingly being considered to assist human decision-making in high-stake domains (e.g. health). However, researchers have discussed an issue that humans can over-rely on wrong suggestions of the AI model instead of achieving complementary human-AI performance. In this work, we utilized salient feature explanations along with what-if (counterfactual) explanations to make humans review AI suggestions more analytically to reduce overreliance on AI and explored the effect of these explanations on trust and reliance on AI during clinical decision-making. We conducted an experiment with seven therapists and ten laypersons on the task of assessing post-stroke survivors' quality of motion, and analyzed their performance, agreement level on the task, and reliance on AI without and with two types of AI explanations. Our results showed that the AI model with both salient features and counterfactual explanations assisted therapists and laypersons to improve their performance and agreement level on the task when `right' AI outputs are presented. While both therapists and laypersons over-relied on `wrong' AI outputs, counterfactual explanations assisted both therapists and laypersons to reduce their over-reliance on `wrong' AI outputs by 21\% compared to salient feature explanations. Specifically, laypersons showed larger performance degradation than therapists: 18.0 f1-score with salient feature explanations and 14.0 f1-score with counterfactual explanations, versus 8.6 and 2.8 f1-score respectively for therapists. Our work discusses the potential of counterfactual explanations to better estimate the accuracy of an AI model and reduce over-reliance on `wrong' AI outputs and implications for improving human-AI collaborative decision-making.
Pelta: Shielding Transformers to Mitigate Evasion Attacks in Federated Learning
paper_authors: Simon Queyrut, Yérom-David Bromberg, Valerio Schiavoni
for: The paper is written to address the issue of privacy preservation in federated learning, specifically the problem of malicious probing attacks on the model updates.
methods: The paper proposes a novel shielding mechanism called Pelta, which leverages Trusted Execution Environments (TEEs) to mask part of the back-propagation chain rule and prevent attackers from exploiting it for the design of malicious samples.
results: The paper demonstrates the effectiveness of Pelta against the Self Attention Gradient adversarial attack on a state-of-the-art ensemble model.
Abstract
The main premise of federated learning is that machine learning model updates are computed locally, in particular to preserve user data privacy, as those never leave the perimeter of their device. This mechanism supposes the general model, once aggregated, to be broadcast to collaborating and non malicious nodes. However, without proper defenses, compromised clients can easily probe the model inside their local memory in search of adversarial examples. For instance, considering image-based applications, adversarial examples consist of imperceptibly perturbed images (to the human eye) misclassified by the local model, which can be later presented to a victim node's counterpart model to replicate the attack. To mitigate such malicious probing, we introduce Pelta, a novel shielding mechanism leveraging trusted hardware. By harnessing the capabilities of Trusted Execution Environments (TEEs), Pelta masks part of the back-propagation chain rule, otherwise typically exploited by attackers for the design of malicious samples. We evaluate Pelta on a state of the art ensemble model and demonstrate its effectiveness against the Self Attention Gradient adversarial Attack.
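The shielding idea can be emulated in a few lines of PyTorch: the first block conceptually runs inside the enclave, and only a gradient-blocked activation leaves it, so a compromised client cannot complete the back-propagation chain rule needed to craft adversarial examples. This is a conceptual illustration only; Pelta's actual TEE integration is not shown.

```python
import torch
import torch.nn as nn

class ShieldedBlock(nn.Module):
    """Stands in for layers executed inside the TEE."""
    def __init__(self, inner):
        super().__init__()
        self.inner = inner

    def forward(self, x):
        with torch.no_grad():          # output leaves the enclave without a graph
            return self.inner(x)

model = nn.Sequential(
    ShieldedBlock(nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),       # the unshielded remainder of the model
)
x = torch.randn(1, 3, 32, 32, requires_grad=True)
model(x).sum().backward()              # runs, but the chain rule is masked:
print(x.grad)                          # None -- no input gradient for the attacker
```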
SLEM: Machine Learning for Path Modeling and Causal Inference with Super Learner Equation Modeling
methods: The study uses Super Learner Equation Modeling (SLEM), a path modeling technique built on machine learning Super Learner ensembles, to address functional misspecification in causal inference.
results: Compared with SEM, SLEM performs competitively on linear models and outperforms SEM on non-linear relationships. SLEM also provides consistent and unbiased estimates of causal effects, enabling predictive intervention studies on observational data.
Abstract
Causal inference is a crucial goal of science, enabling researchers to arrive at meaningful conclusions regarding the predictions of hypothetical interventions using observational data. Path models, Structural Equation Models (SEMs), and, more generally, Directed Acyclic Graphs (DAGs), provide a means to unambiguously specify assumptions regarding the causal structure underlying a phenomenon. Unlike DAGs, which make very few assumptions about the functional and parametric form, SEM assumes linearity. This can result in functional misspecification which prevents researchers from undertaking reliable effect size estimation. In contrast, we propose Super Learner Equation Modeling, a path modeling technique integrating machine learning Super Learner ensembles. We empirically demonstrate its ability to provide consistent and unbiased estimates of causal effects, its competitive performance for linear models when compared with SEM, and highlight its superiority over SEM when dealing with non-linear relationships. We provide open-source code, and a tutorial notebook with example usage, accentuating the easy-to-use nature of the method.
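The super-learner component can be sketched per structural equation: each node in the path model is regressed on its parents with a stacked ensemble, so the functional form need not be linear. The base learners and meta-learner below are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

def super_learner_fit(X_parents, y_child):
    """Fit one structural equation y_child = g(X_parents) with a super learner."""
    bases = [LinearRegression(), RandomForestRegressor(n_estimators=100)]
    # Out-of-fold predictions keep the meta-learner honest (no overfitting leak).
    Z = np.column_stack([cross_val_predict(m, X_parents, y_child, cv=5)
                         for m in bases])
    meta = Ridge().fit(Z, y_child)
    for m in bases:
        m.fit(X_parents, y_child)      # refit on all data for downstream prediction
    return bases, meta

def super_learner_predict(bases, meta, X_parents):
    Z = np.column_stack([m.predict(X_parents) for m in bases])
    return meta.predict(Z)
```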