results: Using semi-supervised learning, the team established 11 broad public transit topics and used these data to train and evaluate a language model based on the RoBERTa architecture. The model outperforms classical machine learning approaches across all evaluation metrics, providing a topic classification accuracy of 90%.
Abstract
Transit riders' feedback provided in ridership surveys, customer relationship management (CRM) channels, and in more recent times, through social media is key for transit agencies to better gauge the efficacy of their services and initiatives. Getting a holistic understanding of riders' experience through the feedback shared in those instruments is often challenging, mostly due to the open-ended, unstructured nature of text feedback. In this paper, we propose leveraging traditional transit CRM feedback to develop and deploy a transit-topic-aware large language model (LLM) capable of classifying open-ended text feedback to relevant transit-specific topics. First, we utilize semi-supervised learning to engineer a training dataset of 11 broad transit topics detected in a corpus of 6 years of customer feedback provided to the Washington Metropolitan Area Transit Authority (WMATA). We then use this dataset to train and thoroughly evaluate a language model based on the RoBERTa architecture. We compare our LLM, MetRoBERTa, to classical machine learning approaches utilizing keyword-based and lexicon representations. Our model outperforms those methods across all evaluation metrics, providing an average topic classification accuracy of 90%. Finally, we provide a value proposition of this work demonstrating how the language model, alongside additional text processing tools, can be applied to add structure to open-ended text sources of feedback like Twitter. The framework and results we present provide a pathway for an automated, generalizable approach for ingesting, visualizing, and reporting transit riders' feedback at scale, enabling agencies to better understand and improve customer experience.
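As a rough illustration of the modeling step, fine-tuning a RoBERTa classifier over a fixed set of transit topics is straightforward with standard tooling. The sketch below is a minimal Python example assuming the Hugging Face transformers library; the topic count (11) comes from the paper, while the base checkpoint, example feedback text, and inference-only usage are illustrative assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

NUM_TOPICS = 11  # broad transit topics engineered via semi-supervised learning

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=NUM_TOPICS)  # fine-tune this on the CRM corpus

feedback = ["The escalator at the station has been out of service for two weeks."]
batch = tokenizer(feedback, truncation=True, padding=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**batch).logits       # shape: (batch_size, NUM_TOPICS)
predicted_topic = logits.argmax(dim=-1)  # index of the most likely topic
```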
AspectMMKG: A Multi-modal Knowledge Graph with Aspect-aware Entities
paper_authors: Jingdan Zhang, Jiaan Wang, Xiaodan Wang, Zhixu Li, Yanghua Xiao
for: This paper aims to construct a multi-modal knowledge graph (MMKG) with aspect-aware entity images, enabling a more comprehensive, multi-perspective understanding of entities.
methods: The paper proposes a new approach that matches entity images to different entity aspects: it collects aspect-related images from a knowledge base and extracts aspect-related sentences from the knowledge base as queries to retrieve a large number of aspect-related images via an online image search engine.
results: The paper constructs a new MMKG, AspectMMKG, containing 2,380 entities, 18,139 entity aspects, and 645,383 aspect-related images. It also proposes an aspect-related image retrieval (AIR) model that corrects and expands the entity images in AspectMMKG by jointly modeling entity images, entity aspects, and aspect-related images; experiments show the AIR model retrieves suitable images for a given entity with respect to different aspects.
Abstract
Multi-modal knowledge graphs (MMKGs) combine different modal data (e.g., text and image) for a comprehensive understanding of entities. Despite the recent progress of large-scale MMKGs, existing MMKGs neglect the multi-aspect nature of entities, limiting the ability to comprehend entities from various perspectives. In this paper, we construct AspectMMKG, the first MMKG with aspect-related images by matching images to different entity aspects. Specifically, we collect aspect-related images from a knowledge base, and further extract aspect-related sentences from the knowledge base as queries to retrieve a large number of aspect-related images via an online image search engine. Finally, AspectMMKG contains 2,380 entities, 18,139 entity aspects, and 645,383 aspect-related images. We demonstrate the usability of AspectMMKG in entity aspect linking (EAL) downstream task and show that previous EAL models achieve a new state-of-the-art performance with the help of AspectMMKG. To facilitate the research on aspect-related MMKG, we further propose an aspect-related image retrieval (AIR) model, that aims to correct and expand aspect-related images in AspectMMKG. We train an AIR model to learn the relationship between entity image and entity aspect-related images by incorporating entity image, aspect, and aspect image information. Experimental results indicate that the AIR model could retrieve suitable images for a given entity w.r.t different aspects.
results: Experimental results show that these methods achieve strong classification performance and that the distilled data generalize across architectures. The paper also investigates the language-specific fairness of the generated data summaries.
Abstract
With the rise of deep learning, large datasets and complex models have become common, requiring significant computing power. To address this, data distillation has emerged as a technique to quickly train models with lower memory and time requirements. However, data distillation on text-based datasets hasn't been explored much because of the challenges rising due to its discrete nature. Additionally, existing dataset distillation methods often struggle to generalize to new architectures. In the paper, we propose several data distillation techniques for multilingual text classification datasets using language-model-based learning methods. We conduct experiments to analyze their performance in terms of classification strength, and cross-architecture generalization. Furthermore, we investigate the language-specific fairness of the data summaries generated by these methods. Our approach builds upon existing techniques, enhancing cross-architecture generalization in the text data distillation domain.
Improving Autonomous Separation Assurance through Distributed Reinforcement Learning with Attention Networks
results: A numerical study shows that the proposed framework ensures safe and efficient separation in high-density, dynamic environments and achieves higher training sample throughput than existing approaches.
Abstract
Advanced Air Mobility (AAM) introduces a new, efficient mode of transportation with the use of vehicle autonomy and electrified aircraft to provide increasingly autonomous transportation between previously underserved markets. Safe and efficient navigation of low altitude aircraft through highly dense environments requires the integration of a multitude of complex observations, such as surveillance, knowledge of vehicle dynamics, and weather. The processing and reasoning on these observations pose challenges due to the various sources of uncertainty in the information while ensuring cooperation with a variable number of aircraft in the airspace. These challenges coupled with the requirement to make safety-critical decisions in real-time rule out the use of conventional separation assurance techniques. We present a decentralized reinforcement learning framework to provide autonomous self-separation capabilities within AAM corridors with the use of speed and vertical maneuvers. The problem is formulated as a Markov Decision Process and solved by developing a novel extension to the sample-efficient, off-policy soft actor-critic (SAC) algorithm. We introduce the use of attention networks for variable-length observation processing and a distributed computing architecture to achieve high training sample throughput as compared to existing approaches. A comprehensive numerical study shows that the proposed framework can ensure safe and efficient separation of aircraft in high density, dynamic environments with various sources of uncertainty.
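The variable-length observation processing can be pictured as attention pooling over however many nearby aircraft are observed at a given step. Below is a minimal PyTorch sketch; the dimensions, head count, and single-query design are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

d_model = 64
attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=4, batch_first=True)

own = torch.randn(1, 1, d_model)        # query: the ownship state encoding
intruders = torch.randn(1, 7, d_model)  # keys/values: 7 nearby aircraft this step

context, weights = attn(own, intruders, intruders)  # fixed-size context vector
# `context` feeds the SAC actor/critic regardless of how many intruders exist.
```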
Wirelessly Powered Federated Learning Networks: Joint Power Transfer, Data Sensing, Model Training, and Resource Allocation
results: Simulation results show that the proposed sustainable federated learning (S2FL) algorithm reduces the total completion time by up to 21.45% compared with benchmark schemes. The study also finds that extending the system from FDMA to non-orthogonal multiple access (NOMA) speeds up the total completion time by 8.36% on average.
Abstract
Federated learning (FL) has found many successes in wireless networks; however, the implementation of FL has been hindered by the energy limitation of mobile devices (MDs) and the availability of training data at MDs. How to integrate wireless power transfer and mobile crowdsensing towards sustainable FL solutions is a research topic entirely missing from the open literature. This work for the first time investigates a resource allocation problem in collaborative sensing-assisted sustainable FL (S2FL) networks with the goal of minimizing the total completion time. We investigate a practical harvesting-sensing-training-transmitting protocol in which energy-limited MDs first harvest energy from RF signals, use it to gain a reward for user participation, sense the training data from the environment, train the local models at MDs, and transmit the model updates to the server. The total completion time minimization problem of jointly optimizing power transfer, transmit power allocation, data sensing, bandwidth allocation, local model training, and data transmission is complicated due to the non-convex objective function, highly non-convex constraints, and strongly coupled variables. We propose a computationally-efficient path-following algorithm to obtain the optimal solution via the decomposition technique. In particular, inner convex approximations are developed for the resource allocation subproblem, and the subproblems are performed alternatively in an iterative fashion. Simulation results are provided to evaluate the effectiveness of the proposed S2FL algorithm in reducing the completion time up to 21.45% in comparison with other benchmark schemes. Further, we investigate an extension of our work from frequency division multiple access (FDMA) to non-orthogonal multiple access (NOMA) and show that NOMA can speed up the total completion time 8.36% on average of the considered FL system.
Prototypical Kernel Learning and Open-set Foreground Perception for Generalized Few-shot Semantic Segmentation
results: Extensive experiments on the PASCAL-5i and COCO-20i datasets demonstrate that the method outperforms the previous state of the art.
Abstract
Generalized Few-shot Semantic Segmentation (GFSS) extends Few-shot Semantic Segmentation (FSS) to simultaneously segment unseen and seen classes during evaluation. Previous works leverage an additional branch or prototypical aggregation to eliminate the constrained setting of FSS. However, representation division and embedding prejudice, which heavily degrade the performance of GFSS, have not been jointly considered. We address the aforementioned problems by combining prototypical kernel learning with open-set foreground perception. Specifically, a group of learnable kernels is proposed to perform segmentation, with each kernel in charge of a stuff class. Then, we explore merging prototypical learning into the update of base-class kernels, which is consistent with the prototype knowledge aggregation of few-shot novel classes. In addition, a foreground contextual perception module cooperating with conditional-bias-based inference is adopted to perform class-agnostic as well as open-set foreground detection, thus mitigating embedding prejudice and preventing novel targets from being misclassified as background. Moreover, we adapt our method to Class Incremental Few-shot Semantic Segmentation (CIFSS), which receives the knowledge of novel classes in an incremental stream. Extensive experiments on the PASCAL-5i and COCO-20i datasets demonstrate that our method performs better than previous state-of-the-art approaches.
Variations on the Reinforcement Learning performance of Blackjack
for: The paper is written to explore the impact of deck size on the convergence of q-learning algorithms in the context of blackjack.
methods: The paper uses a q-learning solution for optimal play in blackjack, and investigates the rate of learning convergence as a function of deck size.
results: The paper shows that a card counter perfectly using the basic strategy and hi-lo system can bring the house to bankruptcy, and that environment variations have a significant impact on this outcome.
Abstract
Blackjack or "21" is a popular card-based game of chance and skill. The objective of the game is to win by obtaining a hand total higher than the dealer's without exceeding 21. The ideal blackjack strategy will maximize financial return in the long run while avoiding gambler's ruin. The stochastic environment and inherent reward structure of blackjack presents an appealing problem to better understand reinforcement learning agents in the presence of environment variations. Here we consider a q-learning solution for optimal play and investigate the rate of learning convergence of the algorithm as a function of deck size. A blackjack simulator allowing for universal blackjack rules is also implemented to demonstrate the extent to which a card counter perfectly using the basic strategy and hi-lo system can bring the house to bankruptcy and how environment variations impact this outcome. The novelty of our work is to place this conceptual understanding of the impact of deck size in the context of learning agent convergence.
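The tabular Q-learning loop underlying such an agent is compact. A minimal sketch follows; the state encoding, rewards, and hyperparameters are illustrative assumptions rather than the paper's exact setup.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 1.0, 0.1   # episodic game: no discounting needed
ACTIONS = ["hit", "stand"]

Q = defaultdict(float)  # maps (state, action) -> estimated value

def choose_action(state):
    """Epsilon-greedy over the current Q estimates."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, done):
    """One Q-learning backup: Q <- Q + alpha * (target - Q)."""
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in ACTIONS)
    target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```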
Methods for Acquiring and Incorporating Knowledge into Stock Price Prediction: A Survey
for: This paper aims to provide a systematic and comprehensive overview of methods for incorporating external knowledge into stock price prediction models, including the acquisition of external knowledge from various unstructured data sources and fusion methods for combining external knowledge with historical price features.
methods: The paper covers various methods for acquiring external knowledge, including non-graph-based and graph-based knowledge representations, and explores fusion methods for combining external knowledge with historical price features.
results: The paper includes a compilation of relevant datasets and discusses potential future research directions in this domain.
Abstract
Predicting stock prices presents a challenging research problem due to the inherent volatility and non-linear nature of the stock market. In recent years, knowledge-enhanced stock price prediction methods have shown groundbreaking results by utilizing external knowledge to understand the stock market. Despite the importance of these methods, there is a scarcity of scholarly works that systematically synthesize previous studies from the perspective of external knowledge types. Specifically, the external knowledge can be modeled in different data structures, which we group into non-graph-based formats and graph-based formats: 1) non-graph-based knowledge captures contextual information and multimedia descriptions specifically associated with an individual stock; 2) graph-based knowledge captures interconnected and interdependent information in the stock market. This survey paper aims to provide a systematic and comprehensive description of methods for acquiring external knowledge from various unstructured data sources and then incorporating it into stock price prediction models. We also explore fusion methods for combining external knowledge with historical price features. Moreover, this paper includes a compilation of relevant datasets and delves into potential future research directions in this domain.
LLMeBench: A Flexible Framework for Accelerating LLMs Benchmarking
paper_authors: Fahim Dalvi, Maram Hasanain, Sabri Boughorbel, Basel Mousi, Samir Abdaljalil, Nizi Nazar, Ahmed Abdelali, Shammur Absar Chowdhury, Hamdy Mubarak, Ahmed Ali, Majd Hawasly, Nadir Durrani, Firoj Alam
results: The framework has been evaluated on 31 unique NLP tasks using 53 publicly available datasets within 90 experimental setups, involving approximately 296K data points.
Abstract
The recent development and success of Large Language Models (LLMs) necessitate an evaluation of their performance across diverse NLP tasks in different languages. Although several frameworks have been developed and made publicly available, their customization capabilities for specific tasks and datasets are often complex for different users. In this study, we introduce the LLMeBench framework. Initially developed to evaluate Arabic NLP tasks using OpenAI's GPT and BLOOM models; it can be seamlessly customized for any NLP task and model, regardless of language. The framework also features zero- and few-shot learning settings. A new custom dataset can be added in less than 10 minutes, and users can use their own model API keys to evaluate the task at hand. The developed framework has been already tested on 31 unique NLP tasks using 53 publicly available datasets within 90 experimental setups, involving approximately 296K data points. We plan to open-source the framework for the community (https://github.com/qcri/LLMeBench/). A video demonstrating the framework is available online (https://youtu.be/FkQn4UjYA0s).
Gaussian Image Anomaly Detection with Greedy Eigencomponent Selection
paper_authors: Tetiana Gula, João P C Bertoldo
for: This paper proposes a novel dimensionality reduction approach for image anomaly detection (AD) using pre-trained convolutional neural networks (CNNs) incorporating EfficientNet models.
methods: The paper proposes two types of tree search approaches, both employing a greedy strategy, to select the optimal eigencomponents. Three main experiments evaluate the method: the influence of the test set on component choice, training on one anomaly type while evaluating on all others, and training and selecting with a minimum number of images chosen by anomaly type.
results: The method surpasses both PCA and negated PCA (NPCA) in detection accuracy, even when using fewer components, offering an effective dimensionality reduction alternative that can enhance the efficiency and accuracy of image AD systems.
Abstract
Anomaly detection (AD) in images, identifying significant deviations from normality, is a critical issue in computer vision. This paper introduces a novel approach to dimensionality reduction for AD using pre-trained convolutional neural network (CNN) that incorporate EfficientNet models. We investigate the importance of component selection and propose two types of tree search approaches, both employing a greedy strategy, for optimal eigencomponent selection. Our study conducts three main experiments to evaluate the effectiveness of our approach. The first experiment explores the influence of test set performance on component choice, the second experiment examines the performance when we train on one anomaly type and evaluate on all other types, and the third experiment investigates the impact of using a minimum number of images for training and selecting them based on anomaly types. Our approach aims to find the optimal subset of components that deliver the highest performance score, instead of focusing solely on the proportion of variance explained by each component and also understand the components behaviour in different settings. Our results indicate that the proposed method surpasses both Principal Component Analysis (PCA) and Negated Principal Component Analysis (NPCA) in terms of detection accuracy, even when using fewer components. Thus, our approach provides a promising alternative to conventional dimensionality reduction techniques in AD, and holds potential to enhance the efficiency and effectiveness of AD systems.
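The greedy eigencomponent search can be sketched as forward selection: starting from an empty set, repeatedly add the component that most improves a validation score. The snippet below is a minimal numpy/scikit-learn illustration assuming precomputed CNN features and AUROC as the score; the paper's exact scoring criterion and tree-search variants may differ.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def anomaly_scores(feats, mean, eigvecs):
    """Squared norm of the centered features projected onto the selected
    eigencomponents (per-component variance scaling omitted for brevity)."""
    proj = (feats - mean) @ eigvecs.T           # (n_samples, n_selected)
    return (proj ** 2).sum(axis=1)

def greedy_select(train_ok, val_feats, val_labels, n_keep=10):
    """Forward-select the eigencomponents that maximize validation AUROC."""
    mean = train_ok.mean(axis=0)
    _, vecs = np.linalg.eigh(np.cov(train_ok, rowvar=False))  # (d, d), ascending
    candidates, chosen = list(range(vecs.shape[1])), []
    for _ in range(n_keep):                     # add the component that helps most
        best = max(candidates, key=lambda c: roc_auc_score(
            val_labels,
            anomaly_scores(val_feats, mean, vecs[:, chosen + [c]].T)))
        chosen.append(best)
        candidates.remove(best)
    return vecs[:, chosen].T                    # (n_keep, d)
```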
paper_authors: Guangyuan Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim, Xuemin Shen
for: This paper aims to develop a comprehensive conceptual model for integrating Artificial Intelligence Generated Content (AIGC) and Semantic Communication (SemCom), and to propose a novel framework for generating meaningful and effective content using AIGC technology.
methods: The paper employs a novel framework that uses AIGC technology as an encoder and decoder for semantic information, and jointly optimizes semantic extraction and evaluation metrics tailored to AIGC services. The framework is adaptable to different types of content generated, the required quality, and the semantic information utilized.
results: The paper presents a case study using a Deep Q Network (DQN) to demonstrate the feasibility of the optimization problem and its convergence characteristics, providing useful insights into the effectiveness of the proposed framework.
Abstract
Artificial Intelligence Generated Content (AIGC) Services have significant potential in digital content creation. The distinctive abilities of AIGC, such as content generation based on minimal input, hold huge potential, especially when integrating with semantic communication (SemCom). In this paper, a novel comprehensive conceptual model for the integration of AIGC and SemCom is developed. Particularly, a content generation level is introduced on top of the semantic level that provides a clear outline of how AIGC and SemCom interact with each other to produce meaningful and effective content. Moreover, a novel framework that employs AIGC technology is proposed as an encoder and decoder for semantic information, considering the joint optimization of semantic extraction and evaluation metrics tailored to AIGC services. The framework can adapt to different types of content generated, the required quality, and the semantic information utilized. By employing a Deep Q Network (DQN), a case study is presented that provides useful insights into the feasibility of the optimization problem and its convergence characteristics.
An In-Depth Analysis of Discretization Methods for Communication Learning using Backpropagation with Multi-Agent Reinforcement Learning
results: The study finds that the ST-DRU method performs best across the different environments: it achieves the best or near-best performance in every experiment and is the only method that does not fail in any of the tested environments.
Abstract
Communication is crucial in multi-agent reinforcement learning when agents are not able to observe the full state of the environment. The most common approach to allow learned communication between agents is the use of a differentiable communication channel that allows gradients to flow between agents as a form of feedback. However, this is challenging when we want to use discrete messages to reduce the message size, since gradients cannot flow through a discrete communication channel. Previous work proposed methods to deal with this problem. However, these methods are tested in different communication learning architectures and environments, making it hard to compare them. In this paper, we compare several state-of-the-art discretization methods as well as a novel approach. We do this comparison in the context of communication learning using gradients from other agents and perform tests on several environments. In addition, we present COMA-DIAL, a communication learning approach based on DIAL and COMA extended with learning rate scaling and adapted exploration. Using COMA-DIAL allows us to perform experiments on more complex environments. Our results show that the novel ST-DRU method, proposed in this paper, achieves the best results out of all discretization methods across the different environments. It achieves the best or close to the best performance in each of the experiments and is the only method that does not fail on any of the tested environments.
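Discretization with gradient flow is typically handled with a straight-through estimator: the forward pass sends a discrete message while the backward pass uses the gradient of a continuous surrogate. A minimal PyTorch sketch in the spirit of a DRU with straight-through gradients follows; the exact ST-DRU formulation proposed in the paper may differ.

```python
import torch

def st_dru(message_logits, noise_std=0.5, training=True):
    """Straight-through discretized DRU sketch: discrete forward pass,
    continuous (sigmoid) gradients on the backward pass."""
    if training:
        # DIAL-style regularization: add noise before squashing
        soft = torch.sigmoid(
            message_logits + noise_std * torch.randn_like(message_logits))
    else:
        soft = torch.sigmoid(message_logits)
    hard = (soft > 0.5).float()            # discrete message actually "sent"
    # straight-through: forward uses `hard`, backward uses d(soft)/d(logits)
    return hard + soft - soft.detach()
```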
Service Reservation and Pricing for Green Metaverses: A Stackelberg Game Approach
results: Compared with conventional schemes, the proposed scheme achieves energy savings and satisfies individual rationality simultaneously while meeting users' economic requirements. The article also discusses how several emerging technologies can be combined to realize a sustainable green Metaverse.
Abstract
Metaverse enables users to communicate, collaborate and socialize with each other through their digital avatars. Due to the spatio-temporal characteristics, co-located users are served well by performing their software components in a collaborative manner such that a Metaverse service provider (MSP) eliminates redundant data transmission and processing, ultimately reducing the total energy consumption. The energyefficient service provision is crucial for enabling the green and sustainable Metaverse. In this article, we take an augmented reality (AR) application as an example to achieve this goal. Moreover, we study an economic issue on how the users reserve offloading services from the MSP and how the MSP determines an optimal charging price since each user is rational to decide whether to accept the offloading service by taking into account the monetary cost. A single-leader multi-follower Stackelberg game is formulated between the MSP and users while each user optimizes an offloading probability to minimize the weighted sum of time, energy consumption and monetary cost. Numerical results show that our scheme achieves energy savings and satisfies individual rationality simultaneously compared with the conventional schemes. Finally, we identify and discuss open directions on how several emerging technologies are combined with the sustainable green Metaverse.
LLaMA-E: Empowering E-commerce Authoring with Multi-Aspect Instruction Following
paper_authors: Kaize Shi, Xueyao Sun, Dingxian Wang, Yinlin Fu, Guandong Xu, Qing Li
methods: The paper builds dedicated instruction-following language models (LLaMA-E) trained on a range of e-commerce authoring tasks, including ads generation, query-enhanced product title rewriting, product classification, purchase intent speculation, and general Q&A.
results: The proposed LLaMA-E models achieve state-of-the-art results in quantitative and qualitative evaluations and show an advantage in zero-shot scenarios, indicating that they are well suited to e-commerce content-authoring problems.
Abstract
E-commerce authoring involves creating attractive, abundant, and targeted promotional content to drive product sales. The emergence of large language models (LLMs) introduces an innovative paradigm, offering a unified solution to address various authoring tasks within this scenario. However, mainstream LLMs trained on general corpora with common sense knowledge reveal limitations in fitting complex and personalized features unique to e-commerce products and customers. Furthermore, LLMs like GPT-3.5 necessitate remote accessibility, raising concerns about safeguarding voluminous customer privacy data during transmission. This paper proposes the LLaMA-E, the unified and customized instruction-following language models focusing on diverse e-commerce authoring tasks. Specifically, the domain experts create the seed instruction set from the tasks of ads generation, query-enhanced product title rewriting, product classification, purchase intent speculation, and general Q&A. These tasks enable the models to comprehensively understand precise e-commerce authoring knowledge by interleaving features covering typical service aspects of customers, sellers, and platforms. The GPT-3.5 is introduced as a teacher model, which expands the seed instructions to form a training set for the LLaMA-E models with various scales. The experimental results show that the proposed LLaMA-E models achieve state-of-the-art results in quantitative and qualitative evaluations, also exhibiting the advantage in zero-shot scenes. To the best of our knowledge, this study is the first to serve the LLMs to specific e-commerce authoring scenarios.
SLPT: Selective Labeling Meets Prompt Tuning on Label-Limited Lesion Segmentation
results: The approach achieves state-of-the-art performance on liver tumor segmentation with only 6% of tunable parameters, and reaches 94% of full-data performance while labeling only 5% of the data.
Abstract
Medical image analysis using deep learning is often challenged by limited labeled data and high annotation costs. Fine-tuning the entire network in label-limited scenarios can lead to overfitting and suboptimal performance. Recently, prompt tuning has emerged as a more promising technique that introduces a few additional tunable parameters as prompts to a task-agnostic pre-trained model, and updates only these parameters using supervision from limited labeled data while keeping the pre-trained model unchanged. However, previous work has overlooked the importance of selective labeling in downstream tasks, which aims to select the most valuable downstream samples for annotation to achieve the best performance with minimum annotation cost. To address this, we propose a framework that combines selective labeling with prompt tuning (SLPT) to boost performance in limited labels. Specifically, we introduce a feature-aware prompt updater to guide prompt tuning and a TandEm Selective LAbeling (TESLA) strategy. TESLA includes unsupervised diversity selection and supervised selection using prompt-based uncertainty. In addition, we propose a diversified visual prompt tuning strategy to provide multi-prompt-based discrepant predictions for TESLA. We evaluate our method on liver tumor segmentation and achieve state-of-the-art performance, outperforming traditional fine-tuning with only 6% of tunable parameters, also achieving 94% of full-data performance by labeling only 5% of the data.
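The core mechanics of prompt tuning, freezing the pre-trained backbone and learning only a small set of prompt parameters, can be sketched as follows. The shapes, the way prompts are injected, and the backbone interface are illustrative assumptions, not the paper's SLPT architecture.

```python
import torch
import torch.nn as nn

class PromptTunedModel(nn.Module):
    def __init__(self, backbone, embed_dim=768, n_prompts=8, n_classes=2):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False            # keep pre-trained weights fixed
        self.prompts = nn.Parameter(torch.randn(n_prompts, embed_dim) * 0.02)
        self.head = nn.Linear(embed_dim, n_classes)

    def forward(self, token_embeddings):       # (batch, seq, embed_dim)
        b = token_embeddings.size(0)
        prompts = self.prompts.unsqueeze(0).expand(b, -1, -1)
        x = torch.cat([prompts, token_embeddings], dim=1)
        features = self.backbone(x)            # frozen feature extractor
        return self.head(features.mean(dim=1))

# Only the prompts and the head receive gradients, e.g.:
# optimizer = torch.optim.AdamW(
#     [p for p in model.parameters() if p.requires_grad], lr=1e-3)
```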
Adversarial Deep Reinforcement Learning for Cyber Security in Software Defined Networks
paper_authors: Luke Borchjes, Clement Nyirenda, Louise Leenen
for: The paper explores the impact of adversarial learning in Deep Reinforcement Learning (DRL) for autonomous security in Software Defined Networks (SDN).
methods: The paper compares two algorithms, Double Deep Q-Networks (DDQN) and Neural Episodic Control to Deep Q-Network (NEC2DQN or N2D), and evaluates their performance under a white-box setting with a causative attack.
results: The paper shows that with minute parameter changes the algorithms are still able to defend the network, and that the introduction of the causative attack improves the attacker's performance.
Abstract
This paper focuses on the impact of leveraging autonomous offensive approaches in Deep Reinforcement Learning (DRL) to train more robust agents by exploring the impact of applying adversarial learning to DRL for autonomous security in Software Defined Networks (SDN). Two algorithms, Double Deep Q-Networks (DDQN) and Neural Episodic Control to Deep Q-Network (NEC2DQN or N2D), are compared. NEC2DQN was proposed in 2018 and is a new member of the deep q-network (DQN) family of algorithms. The attacker has full observability of the environment and access to a causative attack that uses state manipulation in an attempt to poison the learning process. The implementation of the attack is done under a white-box setting, in which the attacker has access to the defender's model and experiences. Two games are played; in the first game, DDQN is a defender and N2D is an attacker, and in second game, the roles are reversed. The games are played twice; first, without an active causative attack and secondly, with an active causative attack. For execution, three sets of game results are recorded in which a single set consists of 10 game runs. The before and after results are then compared in order to see if there was actually an improvement or degradation. The results show that with minute parameter changes made to the algorithms, there was growth in the attacker's role, since it is able to win games. Implementation of the adversarial learning by the introduction of the causative attack showed the algorithms are still able to defend the network according to their strengths.
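For reference, the piece that distinguishes DDQN from vanilla DQN is the decoupling of action selection (online network) from action evaluation (target network) in the bootstrap target. A minimal PyTorch sketch, with tensor layouts and the discount factor as assumptions:

```python
import torch

def ddqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Compute DDQN bootstrap targets for a batch of transitions."""
    with torch.no_grad():
        # action selection by the online network (this is what separates
        # DDQN from vanilla DQN and reduces overestimation bias)
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # action evaluation by the target network
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
    return rewards + gamma * (1.0 - dones) * next_q
```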
GraphCC: A Practical Graph Learning-based Approach to Congestion Control in Datacenters
results: GraphCC is tested and compared against ACC across a wide variety of scenarios; GraphCC outperforms ACC in all evaluated scenarios, improving Flow Completion Time by up to 20% and reducing buffer occupancy by 38.0-85.7%.
Abstract
Congestion Control (CC) plays a fundamental role in optimizing traffic in Data Center Networks (DCN). Currently, DCNs mainly implement two main CC protocols: DCTCP and DCQCN. Both protocols -- and their main variants -- are based on Explicit Congestion Notification (ECN), where intermediate switches mark packets when they detect congestion. The ECN configuration is thus a crucial aspect on the performance of CC protocols. Nowadays, network experts set static ECN parameters carefully selected to optimize the average network performance. However, today's high-speed DCNs experience quick and abrupt changes that severely change the network state (e.g., dynamic traffic workloads, incast events, failures). This leads to under-utilization and sub-optimal performance. This paper presents GraphCC, a novel Machine Learning-based framework for in-network CC optimization. Our distributed solution relies on a novel combination of Multi-agent Reinforcement Learning (MARL) and Graph Neural Networks (GNN), and it is compatible with widely deployed ECN-based CC protocols. GraphCC deploys distributed agents on switches that communicate with their neighbors to cooperate and optimize the global ECN configuration. In our evaluation, we test the performance of GraphCC under a wide variety of scenarios, focusing on the capability of this solution to adapt to new scenarios unseen during training (e.g., new traffic workloads, failures, upgrades). We compare GraphCC with a state-of-the-art MARL-based solution for ECN tuning -- ACC -- and observe that our proposed solution outperforms the state-of-the-art baseline in all of the evaluation scenarios, showing improvements up to $20\%$ in Flow Completion Time as well as significant reductions in buffer occupancy ($38.0-85.7\%$).
Unleashing the Power of Extra-Tree Feature Selection and Random Forest Classifier for Improved Survival Prediction in Heart Failure Patients
results: The ET feature selection algorithm identifies the most important predictors, and grid search is used to tune the RF model, achieving 98.33% accuracy, the highest reported so far.
Abstract
Heart failure is a life-threatening condition that affects millions of people worldwide. The ability to accurately predict patient survival can aid in early intervention and improve patient outcomes. In this study, we explore the potential of utilizing data pre-processing techniques and the Extra-Tree (ET) feature selection method in conjunction with the Random Forest (RF) classifier to improve survival prediction in heart failure patients. By leveraging the strengths of ET feature selection, we aim to identify the most significant predictors associated with heart failure survival. Using the public UCL Heart failure (HF) survival dataset, we employ the ET feature selection algorithm to identify the most informative features. These features are then used as input for a grid search of RF. Finally, the tuned RF model was trained and evaluated using different metrics. The approach achieved 98.33% accuracy, which is the highest over the existing work.
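The described pipeline maps directly onto scikit-learn. Below is a minimal sketch; the dataset filename, target column, and grid values are assumptions standing in for the paper's exact configuration.

```python
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV, train_test_split

df = pd.read_csv("heart_failure_clinical_records.csv")   # hypothetical filename
X, y = df.drop(columns=["DEATH_EVENT"]), df["DEATH_EVENT"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# 1) Extra-Trees ranks features; SelectFromModel keeps the most informative ones.
selector = SelectFromModel(ExtraTreesClassifier(n_estimators=200, random_state=42))
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

# 2) Grid search tunes the Random Forest on the selected features.
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 5, 10]},
    cv=5, scoring="accuracy")
grid.fit(X_train_sel, y_train)
print("test accuracy:", grid.score(X_test_sel, y_test))
```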
Learning Type-Generalized Actions for Symbolic Planning
results: In a simulated grid-based kitchen environment, type-generalized actions can be learned from few observations and generalize to novel situations, solving unseen task combinations involving novel entities and unexpected environment behavior.
Abstract
Symbolic planning is a powerful technique to solve complex tasks that require long sequences of actions and can equip an intelligent agent with complex behavior. The downside of this approach is the necessity for suitable symbolic representations describing the state of the environment as well as the actions that can change it. Traditionally such representations are carefully hand-designed by experts for distinct problem domains, which limits their transferability to different problems and environment complexities. In this paper, we propose a novel concept to generalize symbolic actions using a given entity hierarchy and observed similar behavior. In a simulated grid-based kitchen environment, we show that type-generalized actions can be learned from few observations and generalize to novel situations. Incorporating an additional on-the-fly generalization mechanism during planning, unseen task combinations, involving longer sequences, novel entities and unexpected environment behavior, can be solved.
Scalability of Message Encoding Techniques for Continuous Communication Learned with Multi-Agent Reinforcement Learning
paper_authors: Astrid Vanneste, Thomas Somers, Simon Vanneste, Kevin Mets, Tom De Schepper, Siegfried Mercelis, Peter Hellinckx
for: This study investigates the effect of increasing the amount of information contained in multi-agent communication messages, and of increasing the number of agents, on the performance of the system.
methods: The study uses multi-agent reinforcement learning and compares two message encoding methods: the mean message encoder and the attention message encoder.
results: Surprisingly, the mean message encoder consistently outperforms the attention message encoder. Further analysis shows that agents using the mean message encoder adopt a combination of exponential and logarithmic functions in their communication policy to avoid losing important information after the mean message encoder is applied.
Abstract
Many multi-agent systems require inter-agent communication to properly achieve their goal. By learning the communication protocol alongside the action protocol using multi-agent reinforcement learning techniques, the agents gain the flexibility to determine which information should be shared. However, when the number of agents increases we need to create an encoding of the information contained in these messages. In this paper, we investigate the effect of increasing the amount of information that should be contained in a message and increasing the number of agents. We evaluate these effects on two different message encoding methods, the mean message encoder and the attention message encoder. We perform our experiments on a matrix environment. Surprisingly, our results show that the mean message encoder consistently outperforms the attention message encoder. Therefore, we analyse the communication protocol used by the agents that use the mean message encoder and can conclude that the agents use a combination of an exponential and a logarithmic function in their communication policy to avoid the loss of important information after applying the mean message encoder.
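The mean message encoder itself is as simple as it sounds: averaging yields a fixed-size encoding regardless of how many messages arrive. A minimal PyTorch sketch with illustrative shapes:

```python
import torch

def mean_message_encoder(messages):
    """messages: (n_agents, message_dim), with n_agents varying per step."""
    return messages.mean(dim=0)                # -> (message_dim,)

incoming = torch.randn(5, 16)                  # 5 agents, 16-dim messages
encoded = mean_message_encoder(incoming)       # same size for any agent count
```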
Unlocking the Diagnostic Potential of ECG through Knowledge Transfer from Cardiac MRI
paper_authors: Özgün Turgut, Philip Müller, Paul Hager, Suprosanna Shit, Sophie Starck, Martin J. Menten, Eimo Martens, Daniel Rueckert
for: This work uses self-supervised contrastive learning to transfer domain-specific information captured in cardiac magnetic resonance (CMR) imaging to electrocardiogram (ECG) embeddings, improving the efficiency and accuracy of cardiovascular diagnosis.
methods: The study combines multimodal contrastive learning with masked data modeling to align CMR-derived features with ECG data, so that the ECG embeddings learn domain-specific features from CMR imaging.
results: The results show that subject-specific risks of various cardiovascular diseases and distinct cardiac phenotypes can be predicted from ECG data alone, and that the learned ECG embeddings incorporate information from CMR image regions of interest.
Abstract
The electrocardiogram (ECG) is a widely available diagnostic tool that allows for a cost-effective and fast assessment of the cardiovascular health. However, more detailed examination with expensive cardiac magnetic resonance (CMR) imaging is often preferred for the diagnosis of cardiovascular diseases. While providing detailed visualization of the cardiac anatomy, CMR imaging is not widely available due to long scan times and high costs. To address this issue, we propose the first self-supervised contrastive approach that transfers domain-specific information from CMR images to ECG embeddings. Our approach combines multimodal contrastive learning with masked data modeling to enable holistic cardiac screening solely from ECG data. In extensive experiments using data from 40,044 UK Biobank subjects, we demonstrate the utility and generalizability of our method. We predict the subject-specific risk of various cardiovascular diseases and determine distinct cardiac phenotypes solely from ECG data. In a qualitative analysis, we demonstrate that our learned ECG embeddings incorporate information from CMR image regions of interest. We make our entire pipeline publicly available, including the source code and pre-trained model weights.
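The multimodal alignment step can be illustrated with a CLIP-style symmetric InfoNCE loss over paired ECG/CMR embeddings. A minimal PyTorch sketch; the paper's exact loss, projection heads, and temperature may differ.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(ecg_emb, cmr_emb, temperature=0.07):
    """ecg_emb, cmr_emb: (batch, dim), paired row-wise by subject."""
    ecg = F.normalize(ecg_emb, dim=-1)
    cmr = F.normalize(cmr_emb, dim=-1)
    logits = ecg @ cmr.t() / temperature       # (batch, batch) similarities
    targets = torch.arange(ecg.size(0), device=ecg.device)
    # symmetric loss: match ECG -> CMR and CMR -> ECG
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```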
On the Unexpected Abilities of Large Language Models
methods: The paper examines the indirect acquisition process by which large language models gain abilities while being trained to predict the next words of human-written texts.
results: The study argues that an important side effect of this indirect acquisition is the development of integrated abilities, including language understanding and generation, and discusses the extent to which these abilities are predictable.
Abstract
Large language models are capable of displaying a wide range of abilities that are not directly connected with the task for which they are trained: predicting the next words of human-written texts. In this article, I discuss the nature of this indirect acquisition process and its relation to other known indirect processes. I argue that an important side effect of such indirect acquisition is the development of integrated abilities. I discuss the extent to which the abilities developed by large language models are predictable. Finally, I briefly discuss the relation between the cognitive skills acquired by these systems and human cognition.
Neuro-Symbolic RDF and Description Logic Reasoners: The State-Of-The-Art and Challenges
for: This paper provides an overview of the existing literature in the field of neuro-symbolic deductive reasoning supported by RDF(S), the description logics EL and ALC, and OWL 2 RL.
methods: The paper discusses various techniques employed in neuro-symbolic deductive reasoning, including neural networks and symbolic systems.
results: The paper provides a comprehensive overview of the existing literature in the field, discussing the tasks addressed and other relevant efforts in this area.
Abstract
Ontologies are used in various domains, with RDF and OWL being prominent standards for ontology development. RDF is favored for its simplicity and flexibility, while OWL enables detailed domain knowledge representation. However, as ontologies grow larger and more expressive, reasoning complexity increases, and traditional reasoners struggle to perform efficiently. Despite optimization efforts, scalability remains an issue. Additionally, advancements in automated knowledge base construction have created large and expressive ontologies that are often noisy and inconsistent, posing further challenges for conventional reasoners. To address these challenges, researchers have explored neuro-symbolic approaches that combine neural networks' learning capabilities with symbolic systems' reasoning abilities. In this chapter,we provide an overview of the existing literature in the field of neuro-symbolic deductive reasoning supported by RDF(S), the description logics EL and ALC, and OWL 2 RL, discussing the techniques employed, the tasks they address, and other relevant efforts in this area.
A Fast and Optimal Learning-based Path Planning Method for Planetary Rovers
results: On novel maps, the method can quickly search for optimal paths, and under the same hardware conditions the guidance field it generates significantly reduces the search time for optimal paths.
Abstract
Intelligent autonomous path planning is crucial to improve the exploration efficiency of planetary rovers. In this paper, we propose a learning-based method to quickly search for optimal paths in an elevation map, which is called NNPP. The NNPP model learns semantic information about start and goal locations, as well as map representations, from numerous pre-annotated optimal path demonstrations, and produces a probabilistic distribution over each pixel representing the likelihood of it belonging to an optimal path on the map. More specifically, the paper computes the traversal cost for each grid cell from the slope, roughness and elevation difference obtained from the DEM. Subsequently, the start and goal locations are encoded using a Gaussian distribution and different location encoding parameters are analyzed for their effect on model performance. After training, the NNPP model is able to perform path planning on novel maps. Experiments show that the guidance field generated by the NNPP model can significantly reduce the search time for optimal paths under the same hardware conditions, and the advantage of NNPP increases with the scale of the map.
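The per-cell traversal cost can be sketched directly from the DEM using slope, roughness, and elevation difference. The weights and neighborhood definitions below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.ndimage import uniform_filter, maximum_filter, minimum_filter

def traversal_cost(dem, cell_size=1.0, w_slope=1.0, w_rough=0.5, w_step=0.5):
    """Per-cell cost combining slope, roughness, and elevation difference."""
    gy, gx = np.gradient(dem, cell_size)
    slope = np.degrees(np.arctan(np.hypot(gx, gy)))        # surface gradient
    rough = np.abs(dem - uniform_filter(dem, size=3))      # local deviation
    step = maximum_filter(dem, size=3) - minimum_filter(dem, size=3)
    return w_slope * slope + w_rough * rough + w_step * step

cost_map = traversal_cost(np.random.rand(64, 64))          # toy 64x64 DEM
```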
Multi-View Fusion and Distillation for Subgrade Distresses Detection based on 3D-GPR
paper_authors: Chunpeng Zhou, Kangjie Ning, Haishuai Wang, Zhi Yu, Sheng Zhou, Jiajun Bu
for: This paper focuses on the subgrade distress detection task using 3D ground-penetrating radar (3D-GPR) data, with the goal of enhancing efficiency and accuracy through automatic detection techniques and deep learning.
methods: The proposed method leverages multi-view information from 3D-GPR data, constructs a real multi-view image dataset for the detection task, and develops a novel framework, GPR-MVFD, which incorporates multi-view distillation and attention-based fusion to extract significant features for subgrade distresses.
results: The proposed framework outperforms existing GPR baselines and state-of-the-art methods in multi-view learning, multi-modal learning, and knowledge distillation, as demonstrated through extensive experiments on a new GPR benchmark. The constructed multi-view GPR dataset with expert-annotated labels and the source code of the framework will be released.
Abstract
The application of 3D ground-penetrating radar (3D-GPR) for subgrade distress detection has gained widespread popularity. To enhance the efficiency and accuracy of detection, pioneering studies have attempted to adopt automatic detection techniques, particularly deep learning. However, existing works typically rely on traditional 1D A-scan, 2D B-scan or 3D C-scan data of the GPR, resulting in either insufficient spatial information or high computational complexity. To address these challenges, we introduce a novel methodology for the subgrade distress detection task by leveraging the multi-view information from 3D-GPR data. Moreover, we construct a real multi-view image dataset derived from the original 3D-GPR data for the detection task, which provides richer spatial information compared to A-scan and B-scan data, while reducing computational complexity compared to C-scan data. Subsequently, we develop a novel \textbf{M}ulti-\textbf{V}iew \textbf{V}usion and \textbf{D}istillation framework, \textbf{GPR-MVFD}, specifically designed to optimally utilize the multi-view GPR dataset. This framework ingeniously incorporates multi-view distillation and attention-based fusion to facilitate significant feature extraction for subgrade distresses. In addition, a self-adaptive learning mechanism is adopted to stabilize the model training and prevent performance degeneration in each branch. Extensive experiments conducted on this new GPR benchmark demonstrate the effectiveness and efficiency of our proposed framework. Our framework outperforms not only the existing GPR baselines, but also the state-of-the-art methods in the fields of multi-view learning, multi-modal learning, and knowledge distillation. We will release the constructed multi-view GPR dataset with expert-annotated labels and the source codes of the proposed framework.
Multi-modal Multi-view Clustering based on Non-negative Matrix Factorization
results: Experimental results show that the proposed method performs strongly across a variety of datasets, offering higher accuracy and better interpretability than current state-of-the-art methods.
Abstract
By combining related objects, unsupervised machine learning techniques aim to reveal the underlying patterns in a data set. Non-negative Matrix Factorization (NMF) is a data mining technique that splits data matrices by imposing restrictions on the elements' non-negativity into two matrices: one representing the data partitions and the other to represent the cluster prototypes of the data set. This method has attracted a lot of attention and is used in a wide range of applications, including text mining, clustering, language modeling, music transcription, and neuroscience (gene separation). The interpretation of the generated matrices is made simpler by the absence of negative values. In this article, we propose a study on multi-modal clustering algorithms and present a novel method called multi-modal multi-view non-negative matrix factorization, in which we analyze the collaboration of several local NMF models. The experimental results show the value of the proposed approach, which was evaluated using a variety of data sets, and the obtained results are very promising compared to state of art methods.
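For a single view, the NMF decomposition and cluster readout look as follows in scikit-learn; extending this to several collaborating local NMF models, as the paper proposes, is noted only at the comment level and is an assumption about the overall structure.

```python
import numpy as np
from sklearn.decomposition import NMF

# Single-view NMF sketch: X ~ W @ H with non-negativity constraints.
X = np.abs(np.random.rand(100, 50))        # 100 samples, 50 non-negative features

model = NMF(n_components=5, init="nndsvda", random_state=0)
W = model.fit_transform(X)                 # (100, 5): soft cluster memberships
H = model.components_                      # (5, 50): cluster prototypes

clusters = W.argmax(axis=1)                # assign each sample to its top factor
# A multi-view variant would fit one such model per view/modality and
# encourage the per-view W matrices to agree on a consensus partition.
```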
E3-UAV: An Edge-based Energy-Efficient Object Detection System for Unmanned Aerial Vehicles
results: Experimental results show that the system can effectively optimize energy consumption for detection tasks in real-world applications. Abstract
Motivated by the advances in deep learning techniques, the application of Unmanned Aerial Vehicle (UAV)-based object detection has proliferated across a range of fields, including vehicle counting, fire detection, and city monitoring. While most existing research studies only a subset of the challenges inherent to UAV-based object detection, there are few studies that balance various aspects to design a practical system for energy consumption reduction. In response, we present the E3-UAV, an edge-based energy-efficient object detection system for UAVs. The system is designed to dynamically support various UAV devices, edge devices, and detection algorithms, with the aim of minimizing energy consumption by deciding the most energy-efficient flight parameters (including flight altitude, flight speed, detection algorithm, and sampling rate) required to fulfill the detection requirements of the task. We first present an effective evaluation metric for actual tasks and construct a transparent energy consumption model based on hundreds of actual flight data to formalize the relationship between energy consumption and flight parameters. Then we present a lightweight energy-efficient priority decision algorithm based on a large quantity of actual flight data to assist the system in deciding flight parameters. Finally, we evaluate the performance of the system, and our experimental results demonstrate that it can significantly decrease energy consumption in real-world scenarios. Additionally, we provide four insights that can assist researchers and engineers in their efforts to study UAV-based object detection further.
The four insights provided for researchers and engineers are:
1. The choice of detection algorithm has a significant impact on energy consumption, and the most energy-efficient algorithm may not always be the best performer.
2. Flight altitude has a greater impact on energy consumption than flight speed, and adjusting flight altitude can lead to significant energy savings.
3. Sampling rate has a complex relationship with energy consumption, and the optimal sampling rate depends on the specific task and environment.
4. The E3-UAV system can be used for a variety of tasks beyond object detection, such as tracking and monitoring, and can be integrated with other systems to achieve even greater energy savings.
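The abstract's "transparent energy consumption model" relates energy to flight parameters; as a hedged illustration, the sketch below fits a simple linear model E ≈ b0 + b1·altitude + b2·speed + b3·rate to synthetic flight records with least squares. The linear form and all numbers are assumptions, not the paper's fitted model.

import numpy as np

rng = np.random.default_rng(1)
n = 200                                     # stand-in for actual flight records
altitude = rng.uniform(20, 120, n)          # m
speed = rng.uniform(2, 15, n)               # m/s
rate = rng.uniform(1, 30, n)                # sampled frames per second
energy = 50 + 0.8*altitude + 3.0*speed + 0.5*rate + rng.normal(0, 5, n)

A = np.column_stack([np.ones(n), altitude, speed, rate])
coef, *_ = np.linalg.lstsq(A, energy, rcond=None)
print(coef.round(2))                        # recovered [b0, b1, b2, b3]

def predict_energy(alt, spd, r):            # usable inside a flight-parameter search
    return coef @ np.array([1.0, alt, spd, r])
print(round(predict_energy(60, 8, 10), 1))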
Induction Network: Audio-Visual Modality Gap-Bridging for Self-Supervised Sound Source Localization
results: Experiments show that, compared with other state-of-the-art works, the proposed method achieves higher performance and robustness in different challenging scenarios. In particular, it achieves top sound source localization performance on both the SoundNet-Flickr and VGG-Sound Source datasets. Abstract
Self-supervised sound source localization is usually challenged by the modality inconsistency. In recent studies, contrastive learning based strategies have shown promise in establishing a consistent correspondence between audio and sound sources in visual scenarios. Unfortunately, insufficient attention to the influence of heterogeneity across the different modality features still limits further improvement of this scheme, which also motivates our work. In this study, an Induction Network is proposed to bridge the modality gap more effectively. By decoupling the gradients of the visual and audio modalities, discriminative visual representations of sound sources can be learned with the designed Induction Vector in a bootstrap manner, which also enables the audio modality to be aligned with the visual modality consistently. In addition to a visual weighted contrastive loss, an adaptive threshold selection strategy is introduced to enhance the robustness of the Induction Network. Extensive experiments conducted on the SoundNet-Flickr and VGG-Sound Source datasets demonstrate superior performance compared to other state-of-the-art works in different challenging scenarios. The code is available at https://github.com/Tahy1/AVIN
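For readers unfamiliar with the contrastive backbone, here is a minimal numpy sketch of an InfoNCE-style loss over a batch of paired audio and visual embeddings, where matched pairs sit on the diagonal of the similarity matrix. The Induction Vector, visual weighting, and adaptive threshold of the paper are deliberately omitted; the temperature value is a common default, not the paper's.

import numpy as np

def info_nce(audio, visual, tau=0.07):
    # audio, visual: (B, D) embeddings of B paired clips
    a = audio / np.linalg.norm(audio, axis=1, keepdims=True)
    v = visual / np.linalg.norm(visual, axis=1, keepdims=True)
    logits = a @ v.T / tau                         # (B, B) all-pairs similarity
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # pull matched pairs together

rng = np.random.default_rng(0)
print(info_nce(rng.normal(size=(8, 32)), rng.normal(size=(8, 32))))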
Feature Matching Data Synthesis for Non-IID Federated Learning
for: This paper proposes a novel federated learning (FL) framework with data augmentation to relieve data heterogeneity, which can effectively address the non-independent and identically distributed (non-IID) data challenge in FL.
methods: The proposed framework uses a hard feature matching data synthesis (HFMDS) method to share auxiliary data besides local models, which generates synthetic data by learning the essential class-relevant features of real samples and discarding the redundant features. To further enhance privacy preservation, a hard feature augmentation method is proposed to transfer real features towards the decision boundary, making the synthetic data not only improve the model generalization but also erase the information of real features.
results: The theoretical analysis and simulation results demonstrate that the proposed HFMDS-FL algorithm outperforms the baselines in terms of accuracy, privacy preservation, and computational cost on various benchmark datasets. Abstract
Federated learning (FL) has emerged as a privacy-preserving paradigm that trains neural networks on edge devices without collecting data at a central server. However, FL encounters an inherent challenge in dealing with non-independent and identically distributed (non-IID) data among devices. To address this challenge, this paper proposes a hard feature matching data synthesis (HFMDS) method to share auxiliary data besides local models. Specifically, synthetic data are generated by learning the essential class-relevant features of real samples and discarding the redundant features, which helps to effectively tackle the non-IID issue. For better privacy preservation, we propose a hard feature augmentation method to transfer real features towards the decision boundary, with which the synthetic data not only improve the model generalization but also erase the information of real features. By integrating the proposed HFMDS method with FL, we present a novel FL framework with data augmentation to relieve data heterogeneity. The theoretical analysis highlights the effectiveness of our proposed data synthesis method in solving the non-IID challenge. Simulation results further demonstrate that our proposed HFMDS-FL algorithm outperforms the baselines in terms of accuracy, privacy preservation, and computational cost on various benchmark datasets.
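The "hard feature augmentation" idea, transferring a real feature toward the decision boundary, is easy to sketch for a linear classifier: step a fraction of the signed distance along the normal of the hyperplane. The linear form, the step size alpha, and the toy numbers are assumptions for illustration only.

import numpy as np

def toward_boundary(x, w, b, alpha=0.9):
    # Move feature x a fraction alpha of the way to the hyperplane w.x + b = 0;
    # alpha = 1 would land exactly on the decision boundary.
    margin = (w @ x + b) / (w @ w)       # signed distance, in units of w
    return x - alpha * margin * w

w, b = np.array([1.0, -2.0]), 0.5
x = np.array([3.0, 1.0])
x_hard = toward_boundary(x, w, b)
print(w @ x + b, w @ x_hard + b)         # the logit shrinks toward 0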
Automated Driving Without Ethics: Meaning, Design and Real-World Implementation
paper_authors: Katherine Evans, Nelson de Moura, Raja Chatila, Stéphane Chauvier
for: The paper aims to propose an AV decision-making strategy based on pre-defined parameters that can accommodate a range of moral positions and public expectations across various decision scenarios.
methods: The strategy uses the Ethical Valence Theory, which frames AV decision-making as a type of claim mitigation, and proposes multiple possible decision rules for selecting the most suitable action in a specific decision context.
results: The strategy offers a tool for evaluating the social acceptability of an automated vehicle's decision making, while accommodating different moral positions and public expectations. Abstract
The ethics of automated vehicles (AV) has received a great amount of attention in recent years, specifically in regard to their decisional policies in accident situations in which human harm is a likely consequence. We first discuss the pertinence and cogency of the term 'artificial moral agent' to describe AVs that would accomplish these sorts of decisions. Then, starting from the assumption that human harm is unavoidable in some situations, we propose a strategy for AV decision making that uses only pre-defined parameters to characterize the risk of possible accidents and integrates the Ethical Valence Theory, which paints AV decision-making as a type of claim mitigation, into multiple possible decision rules to determine the most suitable action given the specific environment and decision context. The goal of this approach is not to define how moral theory requires vehicles to behave, but rather to provide a computational approach that is flexible enough to accommodate a number of human 'moral positions' concerning what morality demands and what road users may expect, offering an evaluation tool for the social acceptability of an automated vehicle's decision making.
Bird’s-Eye-View Scene Graph for Vision-Language Navigation
methods: The paper proposes a method based on a multi-step BEV representation (the BEV Scene Graph, BSG) to encode the scene layouts and geometric cues of indoor environments. During navigation, BSG builds a local BEV representation at the current step and maintains a BEV-based global scene map, which stores and organizes all the locally collected BEV representations.
results: Compared with existing methods, the approach shows clear improvements on the REVERIE, R2R, and R4R benchmarks, demonstrating the promise of BEV perception in VLN. Abstract
Vision-language navigation (VLN), which entails an agent to navigate 3D environments following human instructions, has shown great advances. However, current agents are built upon panoramic observations, which hinders their ability to perceive 3D scene geometry and easily leads to ambiguous selection of panoramic view. To address these limitations, we present a BEV Scene Graph (BSG), which leverages multi-step BEV representations to encode scene layouts and geometric cues of indoor environment under the supervision of 3D detection. During navigation, BSG builds a local BEV representation at each step and maintains a BEV-based global scene map, which stores and organizes all the online collected local BEV representations according to their topological relations. Based on BSG, the agent predicts a local BEV grid-level decision score and a global graph-level decision score, combined with a sub-view selection score on panoramic views, for more accurate action prediction. Our approach significantly outperforms state-of-the-art methods on REVERIE, R2R, and R4R, showing the potential of BEV perception in VLN.
Enhancing Efficient Continual Learning with Dynamic Structure Development of Spiking Neural Networks
for: The paper proposes a new method for efficient and adaptive continual learning in Spiking Neural Networks (SNNs), inspired by the dynamic structure development of the human brain during child growth and development.
methods: The proposed method, called Dynamic Structure Development of Spiking Neural Networks (DSD-SNN), dynamically assigns and grows new neurons for new tasks, prunes redundant neurons, and leverages overlapping shared structure to quickly adapt to new tasks while reducing computational overhead.
results: The proposed model achieves significant improvements in performance, learning speed, and memory capacity compared to existing SNN-based continual learning methods, and achieves performance comparable to DNN-based methods. Abstract
Children possess the ability to learn multiple cognitive tasks sequentially, which is a major challenge toward the long-term goal of artificial general intelligence. Existing continual learning frameworks are usually applicable to Deep Neural Networks (DNNs) and lack the exploration on more brain-inspired, energy-efficient Spiking Neural Networks (SNNs). Drawing on continual learning mechanisms during child growth and development, we propose Dynamic Structure Development of Spiking Neural Networks (DSD-SNN) for efficient and adaptive continual learning. When learning a sequence of tasks, the DSD-SNN dynamically assigns and grows new neurons to new tasks and prunes redundant neurons, thereby increasing memory capacity and reducing computational overhead. In addition, the overlapping shared structure helps to quickly leverage all acquired knowledge to new tasks, empowering a single network capable of supporting multiple incremental tasks (without the separate sub-network mask for each task). We validate the effectiveness of the proposed model on multiple class incremental learning and task incremental learning benchmarks. Extensive experiments demonstrated that our model could significantly improve performance, learning speed and memory capacity, and reduce computational overhead. Besides, our DSD-SNN model achieves comparable performance with the DNNs-based methods, and significantly outperforms the state-of-the-art (SOTA) performance for existing SNNs-based continual learning methods.
Case Study: Using AI-Assisted Code Generation In Mobile Teams
results: The results of the study show that AI-assisted code generation tools can improve development efficiency and correctness, and can also help developers adapt more quickly to a new technical environment. Abstract
The aim of this study is to evaluate the performance of AI-assisted programming in actual mobile development teams focused on native mobile languages like Kotlin and Swift. The extensive case study involves 16 participants and 2 technical reviewers from a software development department, and is designed to understand the impact of using LLMs trained for code generation in specific phases of a team's work, specifically technical onboarding and technical stack switching. The study uses technical problems dedicated to each phase and asks the participants for solutions with and without using AI code generators. It measures time, correctness, and technical integration using ReviewerScore, a metric specific to this paper and derived from actual industry practice, namely the code review of merge requests. The output is converted and analyzed together with feedback from the participants to determine whether using AI-assisted programming tools has an impact on getting developers on board in a project or on helping them transition smoothly between the two native development environments of mobile development, Android and iOS. The study was performed between May and June 2023 with members of the mobile department of a software development company based in Cluj-Napoca, with Romanian ownership and management.
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
results: Compared with prior methods, JEN-1 shows clear advantages in both text-music alignment and music quality while maintaining computational efficiency. The authors provide online demos detailing the model's applications and implementation. Abstract
Music generation has attracted growing interest with the advancement of deep generative models. However, generating music conditioned on textual descriptions, known as text-to-music, remains challenging due to the complexity of musical structures and high sampling rate requirements. Despite the task's significance, prevailing generative models exhibit limitations in music quality, computational efficiency, and generalization. This paper introduces JEN-1, a universal high-fidelity model for text-to-music generation. JEN-1 is a diffusion model incorporating both autoregressive and non-autoregressive training. Through in-context learning, JEN-1 performs various generation tasks including text-guided music generation, music inpainting, and continuation. Evaluations demonstrate JEN-1's superior performance over state-of-the-art methods in text-music alignment and music quality while maintaining computational efficiency. Our demos are available at http://futureverse.com/research/jen/demos/jen1
Data-Free Model Extraction Attacks in the Context of Object Detection
results: The study finds that defining the loss function and the generator setup are key factors in the extraction attack, and that significant results can be obtained using reasonable queries. Exposing this vulnerability of object detection models will support future efforts to secure them. Abstract
A significant number of machine learning models are vulnerable to model extraction attacks, which focus on stealing the models by using specially curated queries against the target model. This task is well accomplished by using part of the training data or a surrogate dataset to train a new model that mimics a target model in a white-box environment. In pragmatic situations, however, the target models are trained on private datasets that are inaccessible to the adversary. The data-free model extraction technique addresses this problem by using queries artificially curated by a generator similar to that used in Generative Adversarial Nets. We propose, for the first time to the best of our knowledge, an adversarial black-box attack that extends to a regression problem: predicting bounding box coordinates in object detection. As part of our study, we found that defining a suitable loss function and using a novel generator setup are key aspects of extracting the target model. We find that the proposed model extraction method achieves significant results by using reasonable queries. The discovery of this object detection vulnerability will support future prospects for securing such models.
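The query-then-imitate loop at the heart of data-free extraction can be sketched in a few lines: random "generator" inputs are labeled by querying the black box, and a surrogate is fit to the query/response pairs. The target function, the uniform query generator, and the surrogate architecture below are stand-ins; the paper uses a GAN-style generator and attacks bounding-box regression.

import numpy as np
from sklearn.neural_network import MLPRegressor

def target_model(X):                       # the black box; only queries are allowed
    return np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2

rng = np.random.default_rng(0)
X_q = rng.uniform(-2, 2, size=(2000, 2))   # synthetic queries (generator stand-in)
y_q = target_model(X_q)                    # labels obtained by querying the target

surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
surrogate.fit(X_q, y_q)                    # train the stolen copy

X_test = rng.uniform(-2, 2, size=(200, 2))
err = np.mean((surrogate.predict(X_test) - target_model(X_test)) ** 2)
print(f"surrogate MSE vs. target: {err:.4f}")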
JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player Zero-Sum Games
methods: The paper analyzes records of human Xiangqi play and combines Monte-Carlo Tree Search (MCTS) with Policy Space Response Oracles (PSRO) to approximate a Nash equilibrium.
results: Deployed as a WeChat mini program for real-world testing, the system reached a Master level against human players with a win rate of 99.41%, demonstrating the algorithm's effectiveness in overcoming non-transitivity. Abstract
This paper presents an empirical exploration of non-transitivity in perfect-information games, specifically focusing on Xiangqi, a traditional Chinese board game comparable in game-tree complexity to chess and shogi. By analyzing over 10,000 records of human Xiangqi play, we highlight the existence of both transitive and non-transitive elements within the game's strategic structure. To address non-transitivity, we introduce the JiangJun algorithm, an innovative combination of Monte-Carlo Tree Search (MCTS) and Policy Space Response Oracles (PSRO) designed to approximate a Nash equilibrium. We evaluate the algorithm empirically using a WeChat mini program and achieve a Master level with a 99.41\% win rate against human players. The algorithm's effectiveness in overcoming non-transitivity is confirmed by a plethora of metrics, such as relative population performance and visualization results. Our project site is available at \url{https://sites.google.com/view/jiangjun-site/}.
Generative Perturbation Analysis for Probabilistic Black-Box Anomaly Attribution
results: The paper derives the distributions of per-variable attribution scores via a variational Bayes algorithm, which allows the uncertainty of those scores to be quantified. To the authors' knowledge, this is the first probabilistic anomaly attribution framework that is free from being deviation-agnostic. Abstract
We address the task of probabilistic anomaly attribution in the black-box regression setting, where the goal is to compute the probability distribution of the attribution score of each input variable, given an observed anomaly. The training dataset is assumed to be unavailable. This task differs from the standard XAI (explainable AI) scenario, since we wish to explain the anomalous deviation from a black-box prediction rather than the black-box model itself. We begin by showing that mainstream model-agnostic explanation methods, such as the Shapley values, are not suitable for this task because of their ``deviation-agnostic property.'' We then propose a novel framework for probabilistic anomaly attribution that allows us to not only compute attribution scores as the predictive mean but also quantify the uncertainty of those scores. This is done by considering a generative process for perturbations that counter-factually bring the observed anomalous observation back to normalcy. We introduce a variational Bayes algorithm for deriving the distributions of per variable attribution scores. To the best of our knowledge, this is the first probabilistic anomaly attribution framework that is free from being deviation-agnostic.
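A deterministic toy version of the underlying idea, searching for a perturbation that counter-factually brings the black-box output back to a reference "normal" value and reading attributions off that perturbation, is sketched below; the paper's full method is probabilistic and uses variational Bayes, which this does not reproduce. The model f, the reference value, and the penalty weight are assumptions.

import numpy as np
from scipy.optimize import minimize

def f(x):                                   # black-box regression model (stand-in)
    return 3.0 * x[0] - 2.0 * x[1] + x[2] ** 2

x_anom = np.array([2.0, -1.0, 1.5])         # observed anomalous input
y_ref = 1.0                                 # prediction regarded as "normal"
lam = 100.0                                 # weight on restoring normalcy

def objective(delta):
    # smallest perturbation (L2) whose application pulls f back to y_ref
    return delta @ delta + lam * (f(x_anom + delta) - y_ref) ** 2

res = minimize(objective, np.zeros(3), method="BFGS")
attribution = -res.x                        # per-variable responsibility scores
print(attribution.round(3), round(f(x_anom + res.x), 3))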
Explainable AI in Orthopedics: Challenges, Opportunities, and Prospects
results: The study identifies several key challenges and opportunities, including data heterogeneity, model complexity, and interpretability requirements, that must be addressed in AI practice before XAI can be widely adopted in orthopedics. Abstract
While artificial intelligence (AI) has made many successful applications in various domains, its adoption in healthcare lags somewhat behind other high-stakes settings. Several factors contribute to this slower uptake, including regulatory frameworks, patient privacy concerns, and data heterogeneity. However, one significant challenge that impedes the implementation of AI in healthcare, particularly in orthopedics, is the lack of explainability and interpretability around AI models. Addressing the challenge of explainable AI (XAI) in orthopedics requires developing AI models and algorithms that prioritize transparency and interpretability, allowing clinicians, surgeons, and patients to understand the contributing factors behind any AI-powered predictive or descriptive models. The current contribution outlines several key challenges and opportunities that manifest in XAI in orthopedic practice. This work emphasizes the need for interdisciplinary collaborations between AI practitioners, orthopedic specialists, and regulatory entities to establish standards and guidelines for the adoption of XAI in orthopedics.
Finite Element Operator Network for Solving Parametric PDEs
results: On several benchmark problems, the FEONet method outperforms existing state-of-the-art methods in accuracy, generalization, and computational flexibility, and it is applicable across the many domains in which PDEs play a key role. Abstract
Partial differential equations (PDEs) underlie our understanding and prediction of natural phenomena across numerous fields, including physics, engineering, and finance. However, solving parametric PDEs is a complex task that necessitates efficient numerical methods. In this paper, we propose a novel approach for solving parametric PDEs using a Finite Element Operator Network (FEONet). Our proposed method leverages the power of deep learning in conjunction with traditional numerical methods, specifically the finite element method, to solve parametric PDEs in the absence of any paired input-output training data. We demonstrate the effectiveness of our approach on several benchmark problems and show that it outperforms existing state-of-the-art methods in terms of accuracy, generalization, and computational flexibility. Our FEONet framework shows potential for application in various fields where PDEs play a crucial role in modeling complex domains with diverse boundary conditions and singular behavior. Furthermore, we provide theoretical convergence analysis to support our approach, utilizing finite element approximation in numerical analysis.
web crawler strategies for web pages under robot.txt restriction
results: The study answers several basic questions, such as how web pages obtain high rankings in search engines, how a search engine acquires the full content of web pages, and how webmasters can restrict web crawlers through the robots.txt file. Abstract
Today, everyone knows the World Wide Web and works over the Internet daily. In this paper, we introduce how search engines act on the keywords entered by users to find something. A search engine uses different search algorithms to provide convenient results to the net surfer. Net surfers go with the top search results, but how do web pages earn higher ranks in search engines? How does the search engine get all those web pages into its database? This paper gives the answers to these kinds of basic questions. Web crawlers working for search engines, and the robot exclusion protocol rules that govern them, are also addressed in this research paper. Webmasters use different restriction directives in the robots.txt file to instruct web crawlers; some basic formats of robots.txt are also mentioned in this paper.
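As a quick illustration of the robot exclusion protocol from the crawler's side, the sketch below uses Python's standard urllib.robotparser to check a made-up robots.txt policy before fetching; in a real crawler one would load the live file with set_url() and read() instead of parse().

from urllib import robotparser

rules = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10

User-agent: BadBot
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())                # offline parse of the example policy

print(rp.can_fetch("*", "https://example.com/index.html"))       # True
print(rp.can_fetch("*", "https://example.com/private/a.html"))   # False
print(rp.can_fetch("BadBot", "https://example.com/index.html"))  # False
print(rp.crawl_delay("*"))                                       # 10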
Rapid Training Data Creation by Synthesizing Medical Images for Classification and Localization
paper_authors: Abhishek Kushwaha, Sarthak Gupta, Anish Bhanushali, Tathagato Rai Dastidar
for: The paper addresses the high cost of data annotation in medical image analysis and the large amount of annotated data needed to train deep neural networks on medical images.
methods: The paper uses a method that transforms real data into the annotated training data required by deep neural networks, alleviating the annotation burden in medical image analysis.
results: The results show that this method significantly improves the localization accuracy of deep neural networks trained on medical images, and that it can generate large amounts of annotated data whose training results closely parallel those obtained with exhaustively annotated real data.
While the use of artificial intelligence (AI) for medical image analysis is gaining wide acceptance, the expertise, time and cost required to generate annotated data in the medical field are significantly high, due to limited availability of both data and expert annotation. Strongly supervised object localization models require data that is exhaustively annotated, meaning all objects of interest in an image are identified. This is difficult to achieve and verify for medical images. We present a method for the transformation of real data to train any Deep Neural Network to solve the above problems. We show the efficacy of this approach on both a weakly supervised localization model and a strongly supervised localization model. For the weakly supervised model, we show that the localization accuracy increases significantly using the generated data. For the strongly supervised model, this approach overcomes the need for exhaustive annotation on real images. In the latter model, we show that the accuracy, when trained with generated images, closely parallels the accuracy when trained with exhaustively annotated real images. The results are demonstrated on images of human urine samples obtained using microscopy.
Sci-CoT: Leveraging Large Language Models for Enhanced Knowledge Distillation in Small Models for Scientific QA
results: The paper's 80-million-parameter model exceeds the performance of BLOOM-176B on the ARC-Easy dataset under the few-shot setting. Abstract
Large Language Models (LLMs) have shown outstanding performance across wide range of downstream tasks. This competency is attributed to their substantial parameter size and pre-training on extensive corpus. Moreover, LLMs have exhibited enhanced reasoning capabilities in tackling complex reasoning tasks, owing to the utilization of a method named ``Chain-of-Thought (CoT) prompting''. This method is designed to generate intermediate reasoning steps that guide the inference of the final answer. However, it is essential to highlight that these advanced reasoning abilities appear to emerge in models with a minimum of 10 billion parameters, thereby limiting its efficacy in situations where computational resources are constrained. In this paper, we investigate the possibility of transferring the reasoning capabilities of LLMs to smaller models via knowledge distillation. Specifically, we propose Sci-CoT, a two-stage framework that separates the processes of generating rationales and inferring answers. This method enables a more efficient use of rationales during the answer inference stage, leading to improved performance on scientific question-answering tasks. Utilizing Sci-CoT, our 80-million parameter model is able to exceed the performance of BLOOM-176B in the ARC-Easy dataset under the few shot setting.
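For context on the distillation machinery, here is a minimal PyTorch sketch of the standard soft-label distillation loss used to transfer a large teacher's behavior to a small student; the temperature and mixing weight are conventional defaults, and Sci-CoT's two-stage rationale/answer separation is not reproduced here.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # soft term: KL between temperature-softened distributions, scaled by T^2
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)   # ordinary supervised term
    return alpha * soft + (1 - alpha) * hard

s = torch.randn(4, 10, requires_grad=True)           # student logits (toy batch)
t = torch.randn(4, 10)                               # teacher logits
y = torch.randint(0, 10, (4,))
print(distillation_loss(s, t, y))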
Addressing Racial Bias in Facial Emotion Recognition
paper_authors: Alex Fan, Xingshuo Xiao, Peter Washington
for: The study aims to analyze fairness issues in deep learning models trained with high-dimensional inputs and subjective labels.
methods: The study analyzes racial bias by varying the racial distribution of sub-sampled training sets and assessing model performance across racial groups.
results: The study finds that although smaller training sets can improve fairness and performance metrics, fairness metrics generally remain constant in larger datasets, indicating that racial balance by itself is insufficient to achieve parity in performance across racial groups. Abstract
Fairness in deep learning models trained with high-dimensional inputs and subjective labels remains a complex and understudied area. Facial emotion recognition, a domain where datasets are often racially imbalanced, can lead to models that yield disparate outcomes across racial groups. This study focuses on analyzing racial bias by sub-sampling training sets with varied racial distributions and assessing test performance across these simulations. Our findings indicate that smaller datasets with posed faces improve on both fairness and performance metrics as the simulations approach racial balance. Notably, the F1-score increases by $27.2\%$ points, and demographic parity increases by $15.7\%$ points on average across the simulations. However, in larger datasets with greater facial variation, fairness metrics generally remain constant, suggesting that racial balance by itself is insufficient to achieve parity in test performance across different racial groups.
SSL-Auth: An Authentication Framework by Fragile Watermarking for Pre-trained Encoders in Self-supervised Learning
results: Experimental results show that SSL-Auth can verify the integrity of protected encoders and detect potential backdoor and adversarial attacks, without affecting encoder performance. Abstract
Self-supervised learning (SSL), utilizing unlabeled datasets for training powerful encoders, has achieved significant success recently. These encoders serve as feature extractors for downstream tasks, requiring substantial resources. However, the challenge of protecting the intellectual property of encoder trainers and ensuring the trustworthiness of deployed encoders remains a significant gap in SSL. Moreover, recent researches highlight threats to pre-trained encoders, such as backdoor and adversarial attacks. To address these gaps, we propose SSL-Auth, the first authentication framework designed specifically for pre-trained encoders. In particular, SSL-Auth utilizes selected key samples as watermark information and trains a verification network to reconstruct the watermark information, thereby verifying the integrity of the encoder without compromising model performance. By comparing the reconstruction results of the key samples, malicious alterations can be detected, as modified encoders won't mimic the original reconstruction. Comprehensive evaluations on various encoders and diverse downstream tasks demonstrate the effectiveness and fragility of our proposed SSL-Auth.
Resource Constrained Model Compression via Minimax Optimization for Spiking Neural Networks
paper_authors: Jue Chen, Huan Yuan, Jianchao Tan, Bin Chen, Chengru Song, Di Zhang
for: This paper focuses on compressing Brain-inspired Spiking Neural Networks (SNNs) to improve their deployment on edge devices such as neuromorphic chips.
methods: The proposed method uses an improved end-to-end Minimax optimization method for sparse learning to balance model performance and computation efficiency.
results: The compressed SNN models achieved state-of-the-art (SOTA) performance on various benchmark datasets and architectures. Abstract
Brain-inspired Spiking Neural Networks (SNNs) have the characteristics of event-driven and high energy-efficient, which are different from traditional Artificial Neural Networks (ANNs) when deployed on edge devices such as neuromorphic chips. Most previous work focuses on SNNs training strategies to improve model performance and brings larger and deeper network architectures. It is difficult to deploy these complex networks on resource-limited edge devices directly. To meet such demand, people compress SNNs very cautiously to balance the performance and the computation efficiency. Existing compression methods either iteratively pruned SNNs using weights norm magnitude or formulated the problem as a sparse learning optimization. We propose an improved end-to-end Minimax optimization method for this sparse learning problem to better balance the model performance and the computation efficiency. We also demonstrate that jointly applying compression and finetuning on SNNs is better than sequentially, especially for extreme compression ratios. The compressed SNN models achieved state-of-the-art (SOTA) performance on various benchmark datasets and architectures. Our code is available at https://github.com/chenjallen/Resource-Constrained-Compression-on-SNN.
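The magnitude-based pruning baseline that the abstract contrasts with fits in a few lines; the sketch below zeroes the smallest-magnitude weights of a dense matrix (the sparsity level is arbitrary, and the paper's Minimax optimization itself is not reproduced).

import numpy as np

def magnitude_prune(W, sparsity=0.8):
    # zero out the smallest-|w| fraction of weights; return weights and mask
    k = int(sparsity * W.size)
    thresh = np.partition(np.abs(W).ravel(), k)[k]
    mask = (np.abs(W) >= thresh).astype(W.dtype)
    return W * mask, mask

rng = np.random.default_rng(0)
W = rng.normal(size=(128, 128))
W_sparse, mask = magnitude_prune(W, 0.8)
print(round(1.0 - mask.mean(), 3))          # achieved sparsity, ~0.8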
A Hierarchical Destroy and Repair Approach for Solving Very Large-Scale Travelling Salesman Problem
results: A fair comparison on nineteen well-known large-scale instances shows that HDR is competitive in both computational efficiency and solution quality, and it breaks the world records on two large instances. Abstract
For prohibitively large-scale Travelling Salesman Problems (TSPs), existing algorithms face big challenges in terms of both computational efficiency and solution quality. To address this issue, we propose a hierarchical destroy-and-repair (HDR) approach, which attempts to improve an initial solution by applying a series of carefully designed destroy-and-repair operations. A key innovative concept is the hierarchical search framework, which recursively fixes partial edges and compresses the input instance into a small-scale TSP under some equivalence guarantee. This neat search framework is able to deliver highly competitive solutions within a reasonable time. Fair comparisons based on nineteen famous large-scale instances (with 10,000 to 10,000,000 cities) show that HDR is highly competitive against existing state-of-the-art TSP algorithms, in terms of both efficiency and solution quality. Notably, on two large instances with 3,162,278 and 10,000,000 cities, HDR breaks the world records (i.e., best-known results regardless of computation time), which were previously achieved by LKH and its variants, while HDR is completely independent of LKH. Finally, ablation studies are performed to certify the importance and validity of the hierarchical search framework.
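To give a feel for the destroy-and-repair template HDR builds on, here is a toy pass on a small Euclidean TSP: randomly remove k cities, then reinsert each at its cheapest position. HDR's hierarchical edge-fixing and instance compression are far more involved; every constant below is arbitrary.

import math, random

def tour_len(tour, pts):
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def destroy_repair(tour, pts, k=5):
    removed = random.sample(tour, k)                  # destroy: drop k cities
    partial = [c for c in tour if c not in removed]
    for c in removed:                                 # repair: cheapest insertion
        best_pos, best_cost = 0, float("inf")
        for i in range(len(partial)):
            a, b = partial[i], partial[(i + 1) % len(partial)]
            cost = (math.dist(pts[a], pts[c]) + math.dist(pts[c], pts[b])
                    - math.dist(pts[a], pts[b]))
            if cost < best_cost:
                best_pos, best_cost = i + 1, cost
        partial.insert(best_pos, c)
    return partial

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(50)]
tour = list(range(50))
for _ in range(200):                                  # accept improving candidates
    cand = destroy_repair(tour, pts)
    if tour_len(cand, pts) < tour_len(tour, pts):
        tour = cand
print(round(tour_len(tour, pts), 3))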
Sparse Binary Transformers for Multivariate Time Series Modeling
paper_authors: Matt Gorbett, Hossein Shirazi, Indrakshi Ray
for: This paper focuses on applying sparse and binary-weighted Transformers to multivariate time series problems, with the goal of achieving accuracy comparable to that of dense floating-point Transformers while reducing computational complexity.
methods: The authors use two compression techniques to reduce the number of non-zero operations necessary in the Transformer: 1) applying a fixed mask to the query, key, and value activations, and 2) proposing an attention mask to allow computation only at the current time step.
results: The model achieves favorable results across three time series learning tasks: classification, anomaly detection, and single-step forecasting, with up to a 53x reduction in storage size and up to a 10.5x reduction in FLOPs compared to dense floating-point Transformers. Abstract
Compressed Neural Networks have the potential to enable deep learning across new applications and smaller computational environments. However, understanding the range of learning tasks in which such models can succeed is not well studied. In this work, we apply sparse and binary-weighted Transformers to multivariate time series problems, showing that the lightweight models achieve accuracy comparable to that of dense floating-point Transformers of the same structure. Our model achieves favorable results across three time series learning tasks: classification, anomaly detection, and single-step forecasting. Additionally, to reduce the computational complexity of the attention mechanism, we apply two modifications, which show little to no decline in model performance: 1) in the classification task, we apply a fixed mask to the query, key, and value activations, and 2) for forecasting and anomaly detection, which rely on predicting outputs at a single point in time, we propose an attention mask to allow computation only at the current time step. Together, each compression technique and attention modification substantially reduces the number of non-zero operations necessary in the Transformer. We measure the computational savings of our approach over a range of metrics including parameter count, bit size, and floating point operation (FLOPs) count, showing up to a 53x reduction in storage size and up to 10.5x reduction in FLOPs.
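The second attention modification, computing attention only at the current time step for forecasting, reduces the quadratic attention cost to linear in sequence length; a numpy sketch with arbitrary dimensions and random weights follows.

import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def last_step_attention(X, Wq, Wk, Wv):
    # X: (L, D) input sequence; only the final step issues a query,
    # so the score matrix is (1, L) instead of (L, L).
    q = X[-1:] @ Wq                                   # (1, d) single query
    K, V = X @ Wk, X @ Wv                             # (L, d) each
    attn = softmax(q @ K.T / np.sqrt(K.shape[-1]))    # (1, L)
    return attn @ V                                   # (1, d) current-step output

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 16))
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
print(last_step_attention(X, Wq, Wk, Wv).shape)       # (1, 16)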
methods: The article analyzes the liability risks of generative AI models under three liability regimes, tied to common examples of red-teamed model behaviors: defamation, speech integral to criminal conduct, and wrongful death.
results: The article finds that any Section 230 immunity analysis or downstream liability analysis is intimately wrapped up in the technical details of algorithm design, and that many legal roadblocks make it difficult to hold generative AI models and their associated parties liable for generated speech. It argues that AI should not be categorically immune from liability in these scenarios, and that courts and policymakers should think carefully about the technical design incentives they create as they evaluate these issues. Abstract
Generative AI, in particular text-based "foundation models" (large models trained on a huge variety of information including the internet), can generate speech that could be problematic under a wide range of liability regimes. Machine learning practitioners regularly "red team" models to identify and mitigate such problematic speech: from "hallucinations" falsely accusing people of serious misconduct to recipes for constructing an atomic bomb. A key question is whether these red-teamed behaviors actually present any liability risk for model creators and deployers under U.S. law, incentivizing investments in safety mechanisms. We examine three liability regimes, tying them to common examples of red-teamed model behaviors: defamation, speech integral to criminal conduct, and wrongful death. We find that any Section 230 immunity analysis or downstream liability analysis is intimately wrapped up in the technical details of algorithm design. And there are many roadblocks to truly finding models (and their associated parties) liable for generated speech. We argue that AI should not be categorically immune from liability in these scenarios and that as courts grapple with the already fine-grained complexities of platform algorithms, the technical details of generative AI loom above with thornier questions. Courts and policymakers should think carefully about what technical design incentives they create as they evaluate these issues.
A Comparative Study of Sentence Embedding Models for Assessing Semantic Variation
results: The study finds that most of the sentence embedding methods considered infer highly correlated patterns of semantic similarity in a given document, but show interesting differences. Abstract
Analyzing the pattern of semantic variation in long real-world texts such as books or transcripts is interesting from the stylistic, cognitive, and linguistic perspectives. It is also useful for applications such as text segmentation, document summarization, and detection of semantic novelty. The recent emergence of several vector-space methods for sentence embedding has made such analysis feasible. However, this raises the issue of how consistent and meaningful the semantic representations produced by various methods are in themselves. In this paper, we compare several recent sentence embedding methods via time-series of semantic similarity between successive sentences and matrices of pairwise sentence similarity for multiple books of literature. In contrast to previous work using target tasks and curated datasets to compare sentence embedding methods, our approach provides an evaluation of the methods 'in the wild'. We find that most of the sentence embedding methods considered do infer highly correlated patterns of semantic similarity in a given document, but show interesting differences.
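The analysis itself is straightforward to sketch: embed every sentence and track the cosine similarity of successive pairs as a time series, where dips mark semantic shifts. The sentence-transformers package and the all-MiniLM-L6-v2 checkpoint below are one plausible choice of embedding method, not necessarily one studied in the paper.

import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

sentences = [
    "The ship left the harbor at dawn.",
    "Its crew watched the coastline fade.",
    "Meanwhile, grain prices collapsed in the capital.",
    "Merchants blamed the new tariff.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")    # assumed embedding model
E = model.encode(sentences)                        # (N, D) sentence embeddings
E = E / np.linalg.norm(E, axis=1, keepdims=True)

series = (E[:-1] * E[1:]).sum(axis=1)              # cosine sim of successive pairs
print(series.round(3))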
Benchmarking LLM powered Chatbots: Methods and Metrics
for: Evaluating the performance of autonomous conversational agents (chatbots), especially chatbots powered by generative AI tools such as Large Language Models (LLMs).
methods: The paper proposes a novel End-to-End (E2E) benchmark for assessing the accuracy and usefulness of the answers provided by chatbots.
results: Evaluating an example chatbot shows that the E2E benchmark assesses chatbot performance better than other commonly used metrics, and that its associated metric, cosine similarity, performs well. Abstract
Autonomous conversational agents, i.e. chatbots, are becoming an increasingly common mechanism for enterprises to provide support to customers and partners. In order to rate chatbots, especially ones powered by Generative AI tools like Large Language Models (LLMs), we need to be able to accurately assess their performance. This is where chatbot benchmarking becomes important. In this paper, we propose the use of a novel benchmark that we call the E2E (End to End) benchmark, and show how the E2E benchmark can be used to evaluate the accuracy and usefulness of the answers provided by chatbots, especially ones powered by LLMs. We evaluate an example chatbot at different levels of sophistication based on both our E2E benchmark and other metrics commonly used in the state of the art, and observe that the proposed benchmark shows better results compared to the others. In addition, while some metrics proved to be unpredictable, the metric associated with the E2E benchmark, which uses cosine similarity, performed well in evaluating chatbots. The performance of our best models shows that there are several benefits of using the cosine similarity score as a metric in the E2E benchmark.
Accelerating LLM Inference with Staged Speculative Decoding
results: For a 762M parameter GPT-2-L model, single-batch decoding latency is reduced by 3.16x while perfectly preserving output quality. Abstract
Recent advances with large language models (LLM) illustrate their diverse capabilities. We propose a novel algorithm, staged speculative decoding, to accelerate LLM inference in small-batch, on-device scenarios. We address the low arithmetic intensity of small-batch inference by improving upon previous work in speculative decoding. First, we restructure the speculative batch as a tree, which reduces generation costs and increases the expected tokens per batch. Second, we add a second stage of speculative decoding. Taken together, we reduce single-batch decoding latency by 3.16x with a 762M parameter GPT-2-L model while perfectly preserving output quality.
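The core accept/reject rule that makes speculative decoding exact can be shown on a tiny vocabulary: the draft proposes a token, the target accepts it with probability min(1, p/q), and rejections are resampled from the residual distribution. This toy omits the paper's tree restructuring and second speculative stage; both distributions are stand-ins for real models.

import numpy as np

rng = np.random.default_rng(0)

def sample(p):
    return rng.choice(len(p), p=p)

def speculative_step(p_target, q_draft):
    # one token: the cheap draft proposes, the target model verifies
    x = sample(q_draft)
    if rng.random() < min(1.0, p_target[x] / q_draft[x]):
        return x                                   # accepted draft token
    resid = np.maximum(p_target - q_draft, 0.0)    # rejected: resample from the
    return sample(resid / resid.sum())             # residual distribution

p = np.array([0.5, 0.3, 0.15, 0.05])   # "target" next-token distribution
q = np.array([0.4, 0.4, 0.1, 0.1])     # "draft" next-token distribution
tokens = [speculative_step(p, q) for _ in range(10000)]
print(np.bincount(tokens, minlength=4) / 10000)    # empirically matches p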
methods: The model's main methods include a cognitive evolution from infancy to adulthood, described via a consolidation principle for reaching the mature state, together with the duality principle applied to every known intelligent aspect. The model also advocates a holistic approach to AGI design and cognition under constraints or efficiency, in the form of reusability and simplicity.
results: The final product of this cognitive model is a dynamic operational memory of models and instances. The paper also presents some examples and preliminary ideas for the evolution phase toward the mature state. Abstract
This paper proposes a new cognitive model, acting as the main component of an AGI agent. The model is introduced in its mature intelligence state, and as an extension of previous models, DENN, and especially AKREM, by including operational models (frames/classes) and will. This model's core assumption is that cognition is about operating on accumulated knowledge, with the guidance of an appropriate will. Also, we assume that the actions, part of knowledge, are learning to be aligned with will, during the evolution phase that precedes the mature intelligence state. In addition, this model is mainly based on the duality principle in every known intelligent aspect, such as exhibiting both top-down and bottom-up model learning, generalization verse specialization, and more. Furthermore, a holistic approach is advocated for AGI designing, and cognition under constraints or efficiency is proposed, in the form of reusability and simplicity. Finally, reaching this mature state is described via a cognitive evolution from infancy to adulthood, utilizing a consolidation principle. The final product of this cognitive model is a dynamic operational memory of models and instances. Lastly, some examples and preliminary ideas for the evolution phase to reach the mature state are presented.
paper_authors: Tianlu Wang, Ping Yu, Xiaoqing Ellen Tan, Sean O’Brien, Ramakanth Pasunuru, Jane Dwivedi-Yu, Olga Golovneva, Luke Zettlemoyer, Maryam Fazel-Zarandi, Asli Celikyilmaz
results: According to GPT-4 evaluation, Shepherd's critiques are equivalent or preferred to those of competing models (a 53-87% win rate), and in human evaluation Shepherd strictly outperforms other models and on average closely ties with ChatGPT. Abstract
As large language models improve, there is increasing interest in techniques that leverage these models' capabilities to refine their own outputs. In this work, we introduce Shepherd, a language model specifically tuned to critique responses and suggest refinements, extending beyond the capabilities of an untuned model to identify diverse errors and provide suggestions to remedy them. At the core of our approach is a high quality feedback dataset, which we curate from community feedback and human annotations. Even though Shepherd is small (7B parameters), its critiques are either equivalent or preferred to those from established models including ChatGPT. Using GPT-4 for evaluation, Shepherd reaches an average win-rate of 53-87% compared to competitive alternatives. In human evaluation, Shepherd strictly outperforms other models and on average closely ties with ChatGPT.
Temporal DINO: A Self-supervised Video Strategy to Enhance Action Prediction
paper_authors: Izzeddin Teeti, Rongali Sai Bhargav, Vivek Singh, Andrew Bradley, Biplab Banerjee, Fabio Cuzzolin
for: This paper aims to improve action prediction in computer vision applications such as autonomous driving, activity analysis, and human-computer interaction.
methods: The paper introduces a novel self-supervised video strategy called Temporal-DINO, which uses two models (a "student" and a "teacher") to learn future context by only observing past frames.
results: The experimental results show that the proposed method achieves significant improvements in prediction performance across different architectures, with an average enhancement of 9.9% Precision Points (PP), and demonstrates efficiency in terms of the pretraining dataset size and the number of epochs required. Abstract
The emerging field of action prediction plays a vital role in various computer vision applications such as autonomous driving, activity analysis and human-computer interaction. Despite significant advancements, accurately predicting future actions remains a challenging problem due to high dimensionality, complex dynamics and uncertainties inherent in video data. Traditional supervised approaches require large amounts of labelled data, which is expensive and time-consuming to obtain. This paper introduces a novel self-supervised video strategy for enhancing action prediction inspired by DINO (self-distillation with no labels). The Temporal-DINO approach employs two models; a 'student' processing past frames; and a 'teacher' processing both past and future frames, enabling a broader temporal context. During training, the teacher guides the student to learn future context by only observing past frames. The strategy is evaluated on ROAD dataset for the action prediction downstream task using 3D-ResNet, Transformer, and LSTM architectures. The experimental results showcase significant improvements in prediction performance across these architectures, with our method achieving an average enhancement of 9.9% Precision Points (PP), highlighting its effectiveness in enhancing the backbones' capabilities of capturing long-term dependencies. Furthermore, our approach demonstrates efficiency regarding the pretraining dataset size and the number of epochs required. This method overcomes limitations present in other approaches, including considering various backbone architectures, addressing multiple prediction horizons, reducing reliance on hand-crafted augmentations, and streamlining the pretraining process into a single stage. These findings highlight the potential of our approach in diverse video-based tasks such as activity recognition, motion planning, and scene understanding.
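DINO-style self-distillation keeps the teacher as an exponential moving average (EMA) of the student rather than training it directly; a minimal PyTorch sketch of that update follows. The momentum value is a typical DINO choice, and Temporal-DINO's past-frames/future-frames split is not shown.

import copy
import torch
import torch.nn as nn

student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)                 # the teacher receives no gradients

@torch.no_grad()
def ema_update(teacher, student, m=0.996):
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(m).add_(ps, alpha=1.0 - m)  # pt <- m*pt + (1-m)*ps

ema_update(teacher, student)                # call after each student optimizer step
print(next(teacher.parameters()).shape)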
Summary
The emerging field of action prediction plays a vital role in computer vision applications including autonomous driving, activity analysis, and human-computer interaction. Despite significant progress, accurately predicting future actions remains challenging because video data is high-dimensional, has complex dynamics, and carries inherent uncertainty. Traditional supervised methods require large amounts of labelled data, which is time-consuming and costly to obtain. This paper introduces a novel self-supervised video strategy for enhancing action prediction, inspired by DINO (self-distillation with no labels). The Temporal-DINO approach uses two models: a "student" that processes past frames and a "teacher" that processes both past and future frames, giving it a broader temporal context. During training, the teacher guides the student to learn future context while observing only past frames. The strategy is evaluated on the ROAD dataset for the action prediction downstream task using 3D-ResNet, Transformer, and LSTM architectures. Experimental results show that our method improves prediction performance across these architectures, with an average gain of 9.9% Precision Points (PP), indicating that it strengthens the backbones' ability to capture long-term dependencies. Our method is also efficient with respect to pretraining dataset size and the number of epochs required. It overcomes limitations of other approaches by covering multiple backbone architectures, addressing multiple prediction horizons, reducing reliance on hand-crafted augmentations, and condensing pretraining into a single stage. These findings suggest our approach has potential across video-based tasks such as activity recognition, motion planning, and scene understanding.
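To make the teacher-student setup concrete, here is a minimal PyTorch-style sketch of one training step. The `student` and `teacher` encoders, the momentum value, and the temperatures are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def temporal_dino_step(student, teacher, clip, n_past, opt,
                       tau_s=0.1, tau_t=0.04, ema=0.996):
    """One illustrative Temporal-DINO-style step: the student sees only past
    frames, the teacher sees past + future frames, and the student is trained
    to match the teacher's sharpened output distribution."""
    past, full = clip[:, :n_past], clip          # clip: (B, T, C, H, W)
    with torch.no_grad():                        # teacher gets no gradients
        t_out = teacher(full)
    s_out = student(past)
    # cross-entropy between teacher and student distributions
    loss = -(F.softmax(t_out / tau_t, dim=-1)
             * F.log_softmax(s_out / tau_s, dim=-1)).sum(dim=-1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    # teacher tracks the student via an exponential moving average
    with torch.no_grad():
        for p_t, p_s in zip(teacher.parameters(), student.parameters()):
            p_t.mul_(ema).add_(p_s, alpha=1 - ema)
    return loss.item()
```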
Developmental Bootstrapping: From Simple Competences to Intelligent Human-Compatible AIs
results: Developmental robotics has not yet reached adult-level competences.
Abstract
Although some AIs surpass human abilities in closed artificial worlds such as board games, in the real world they make strange mistakes and do not notice them. They cannot be instructed easily, fail to use common sense, and lack curiosity. Mainstream approaches for creating AIs include the traditional manually-constructed symbolic AI approach and the generative and deep learning AI approaches including large language models (LLMs). Although it is outside of the mainstream, the developmental bootstrapping approach may have more potential. In developmental bootstrapping, AIs develop competences like human children do. They start with innate competences. They interact with the environment and learn from their interactions. They incrementally extend their innate competences with self-developed competences. They interact and learn from people and establish perceptual, cognitive, and common grounding. They acquire the competences they need through competence bootstrapping. However, developmental robotics has not yet produced AIs with robust adult-level competences. Projects have typically stopped before reaching the Toddler Barrier. This corresponds to human infant development at about two years of age, before infant speech becomes fluent. They also do not bridge the Reading Barrier, where they could skillfully and skeptically draw on the socially developed online information resources that power LLMs. The next competences in human cognitive development involve intrinsic motivation, imitation learning, imagination, coordination, and communication. This position paper lays out the logic, prospects, gaps, and challenges for extending the practice of developmental bootstrapping to create robust, trustworthy, and human-compatible AIs.
Summary
Although some AIs surpass human abilities in closed artificial worlds such as board games, in the real world they make strange mistakes without noticing them. They cannot be instructed easily, fail to use common sense, and lack curiosity. Mainstream approaches to building AIs include the traditional manually constructed symbolic AI approach and the generative and deep learning approaches, including large language models (LLMs). Although it lies outside the mainstream, the developmental bootstrapping approach may hold more potential. In developmental bootstrapping, AIs develop competences the way human children do: they start with innate competences, interact with the environment, and learn from those interactions, incrementally extending their innate competences with self-developed ones. They interact with and learn from people, establishing perceptual, cognitive, and common grounding, and acquire the competences they need through competence bootstrapping. However, developmental robotics has not yet produced AIs with robust adult-level competences. Projects have typically stopped before reaching the Toddler Barrier, which corresponds to human development at about two years of age, before infant speech becomes fluent. Nor have they bridged the Reading Barrier, beyond which they could skillfully and skeptically draw on the socially developed online information resources that power LLMs. The next competences in human cognitive development involve intrinsic motivation, imitation learning, imagination, coordination, and communication. This position paper lays out the logic, prospects, gaps, and challenges of extending developmental bootstrapping to create robust, trustworthy, and human-compatible AIs.
Improving Performance in Continual Learning Tasks using Bio-Inspired Architectures
results: The authors achieve superior online continual learning performance on the Split-MNIST, Split-CIFAR-10, and Split-CIFAR-100 datasets, outperforming other memory-constrained learning methods and matching state-of-the-art memory-intensive replay-based approaches. They also integrate these design elements into other backpropagation-based continual learning algorithms, improving their accuracy.
Abstract
The ability to learn continuously from an incoming data stream without catastrophic forgetting is critical to designing intelligent systems. Many approaches to continual learning rely on stochastic gradient descent and its variants that employ global error updates, and hence need to adopt strategies such as memory buffers or replay to circumvent its stability, greed, and short-term memory limitations. To address this limitation, we have developed a biologically inspired lightweight neural network architecture that incorporates synaptic plasticity mechanisms and neuromodulation and hence learns through local error signals to enable online continual learning without stochastic gradient descent. Our approach leads to superior online continual learning performance on Split-MNIST, Split-CIFAR-10, and Split-CIFAR-100 datasets compared to other memory-constrained learning approaches and matches that of the state-of-the-art memory-intensive replay-based approaches. We further demonstrate the effectiveness of our approach by integrating key design concepts into other backpropagation-based continual learning algorithms, significantly improving their accuracy. Our results provide compelling evidence for the importance of incorporating biological principles into machine learning models and offer insights into how we can leverage them to design more efficient and robust systems for online continual learning.
Summary
The ability to learn continuously is critical to designing intelligent systems. Many continual learning approaches rely on stochastic gradient descent and its variants, which suffer from stability, greediness, and short-term memory limitations. To address this, we developed a biologically inspired lightweight neural network architecture that incorporates synaptic plasticity mechanisms and neuromodulation and learns through local error signals, enabling online continual learning without stochastic gradient descent. Our method performs strongly on the Split-MNIST, Split-CIFAR-10, and Split-CIFAR-100 datasets, outperforming other memory-constrained learning approaches and matching state-of-the-art memory-intensive replay-based methods. We further integrate key design concepts into other backpropagation-based continual learning algorithms, significantly improving their accuracy. Our results provide compelling evidence for incorporating biological principles into machine learning models and offer insights into how to leverage them to design more efficient and robust systems for online continual learning.
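The abstract does not spell out the exact plasticity rule, so the sketch below shows a generic three-factor local update of the kind such architectures use: a Hebbian pre/post term gated by a locally computed, neuromodulation-like error signal, with no gradient propagated through other layers. All names and constants are illustrative assumptions, not the paper's rule.

```python
import numpy as np

def local_update(w, pre, post, target, lr=0.01):
    """Three-factor local rule: a Hebbian term (pre x post-derivative) gated
    by a layer-local error signal playing the role of a neuromodulator."""
    modulator = target - post                      # local error, no backprop
    return w + lr * np.outer(modulator * post * (1 - post), pre)

# toy usage: a single layer of sigmoid units
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(3, 5))
pre = rng.random(5)
post = 1 / (1 + np.exp(-w @ pre))
w = local_update(w, pre, post, target=np.array([1.0, 0.0, 0.5]))
```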
Generating Modern Persian Carpet Map by Style-transfer
results: The generated carpet maps received satisfactory user evaluations and are produced faster than with traditional methods.
Abstract
Today, the great performance of Deep Neural Networks (DNN) has been proven in various fields. One of its most attractive applications is to produce artistic designs. A carpet that is known as a piece of art is one of the most important items in a house, which has many enthusiasts all over the world. The first stage of producing a carpet is to prepare its map, which is a difficult, time-consuming, and expensive task. In this research work, our purpose is to use DNN for generating a Modern Persian Carpet Map. To reach this aim, three different DNN style transfer methods are proposed and compared against each other. In the proposed methods, the Style-Swap method is utilized to create the initial carpet map, and in the following, to generate more diverse designs, methods Clip-Styler, Gatys, and Style-Swap are used separately. In addition, some methods are examined and introduced for coloring the produced carpet maps. The designed maps are evaluated via the results of filled questionnaires where the outcomes of user evaluations confirm the popularity of generated carpet maps. Eventually, for the first time, intelligent methods are used in producing carpet maps, and it reduces human intervention. The proposed methods can successfully produce diverse carpet designs, and at a higher speed than traditional ways.
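The Gatys method mentioned above optimizes an image against a content loss plus a Gram-matrix style loss over convolutional feature maps. Here is a minimal sketch assuming feature maps have already been extracted (e.g., from a VGG network); the layer choice and loss weights are illustrative assumptions.

```python
import torch

def gram(feat):
    """Gram matrix of a (C, H, W) feature map: channel-channel correlations."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return (f @ f.T) / (c * h * w)

def gatys_loss(gen_feats, content_feats, style_feats, alpha=1.0, beta=1e3):
    """Content loss on the deepest layer plus Gram style loss over all layers."""
    content = torch.mean((gen_feats[-1] - content_feats[-1]) ** 2)
    style = sum(torch.mean((gram(g) - gram(s)) ** 2)
                for g, s in zip(gen_feats, style_feats))
    return alpha * content + beta * style
```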
Deep Learning for Diverse Data Types Steganalysis: A Review
results: The paper reviews detection and analysis across multiple digital media types, systematically covering the evaluation metrics and datasets used, and outlines open challenges and future research directions.
Abstract
Steganography and steganalysis are two interrelated aspects of the field of information security. Steganography seeks to conceal communications, whereas steganalysis is aimed to either find them or even, if possible, recover the data they contain. Steganography and steganalysis have attracted a great deal of interest, particularly from law enforcement. Steganography is often used by cybercriminals and even terrorists to avoid being captured while in possession of incriminating evidence, even encrypted, since cryptography is prohibited or restricted in many countries. Therefore, knowledge of cutting-edge techniques to uncover concealed information is crucial in exposing illegal acts. Over the last few years, a number of strong and reliable steganography and steganalysis techniques have been introduced in the literature. This review paper provides a comprehensive overview of deep learning-based steganalysis techniques used to detect hidden information within digital media. The paper covers all types of cover in steganalysis, including image, audio, and video, and discusses the most commonly used deep learning techniques. In addition, the paper explores the use of more advanced deep learning techniques, such as deep transfer learning (DTL) and deep reinforcement learning (DRL), to enhance the performance of steganalysis systems. The paper provides a systematic review of recent research in the field, including data sets and evaluation metrics used in recent studies. It also presents a detailed analysis of DTL-based steganalysis approaches and their performance on different data sets. The review concludes with a discussion on the current state of deep learning-based steganalysis, challenges, and future research directions.
Summary
Steganography and Steganalysis: A Review for Modern Information Security. Introduction: In information security, steganography and steganalysis are two interrelated topics. Steganography aims to conceal the content of communications, while steganalysis aims to detect and recover hidden content. Because both have broad applications, particularly in law enforcement, uncovering concealed information is critically important. Background: Steganography and steganalysis have received wide attention in recent years, particularly in connection with cybercrime and terrorism, since steganography can help criminals and terrorists avoid capture even when content is encrypted. Knowledge of the latest detection techniques is therefore essential. Methodology: This paper provides a systematic review of deep learning-based steganalysis techniques, covering all cover types, including image, audio, and video, and discussing the most commonly used deep learning techniques. It also explores more advanced techniques, such as deep transfer learning (DTL) and deep reinforcement learning (DRL), for improving the performance of steganalysis systems. Results: The paper synthesizes recent research, including the datasets and evaluation metrics used, and presents a systematic analysis of DTL-based steganalysis methods with detailed performance comparisons across datasets. Discussion and Conclusion: The review closes by assessing the current state of deep learning-based steganalysis, its challenges, and future research directions, giving readers a comprehensive overview and guidance for future work.
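As background on how the surveyed deep steganalysis networks typically ingest images, many begin with a fixed high-pass residual filter that suppresses image content so the weak embedding signal stands out. The sketch below uses the standard 5x5 "KV" kernel from spatial rich models; the surrounding module is an assumption, not a reproduction of any specific surveyed network.

```python
import torch
import torch.nn as nn

# the widely used 5x5 "KV" high-pass kernel from spatial rich models
KV = torch.tensor([[-1,  2,  -2,  2, -1],
                   [ 2, -6,   8, -6,  2],
                   [-2,  8, -12,  8, -2],
                   [ 2, -6,   8, -6,  2],
                   [-1,  2,  -2,  2, -1]], dtype=torch.float32) / 12.0

class ResidualFrontEnd(nn.Module):
    """Fixed (non-trainable) high-pass front end for a steganalysis CNN."""
    def __init__(self):
        super().__init__()
        self.hp = nn.Conv2d(1, 1, kernel_size=5, padding=2, bias=False)
        self.hp.weight.data = KV.view(1, 1, 5, 5)
        self.hp.weight.requires_grad = False

    def forward(self, x):          # x: (B, 1, H, W) grayscale images
        return self.hp(x)          # noise residuals fed to the trainable CNN
```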
results: The paper presents a type-logical syntax for parsing donkey sentences, defines both relational and vector space semantics for it, and demonstrates its effectiveness.
Abstract
We demonstrate how to parse Geach's Donkey sentences in a compositional distributional model of meaning. We build on previous work on the DisCoCat (Distributional Compositional Categorical) framework, including extensions that model discourse, determiners, and relative pronouns. We present a type-logical syntax for parsing donkey sentences, for which we define both relational and vector space semantics.
Summary
We demonstrate how to parse Geach's donkey sentences in a compositional distributional model of meaning. We build on previous work on the DisCoCat (Distributional Compositional Categorical) framework, including extensions that model discourse, determiners, and relative pronouns. We present a type-logical syntax for parsing donkey sentences and define both relational and vector space semantics for it.
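To illustrate the compositional distributional idea behind DisCoCat (the general mechanism, not the donkey-sentence machinery itself), the toy sketch below composes a transitive-sentence meaning by contracting a verb tensor with its subject and object; the two-dimensional noun vectors and verb matrix are invented for illustration.

```python
import numpy as np

# toy 2-d noun space; the vectors are invented, not from the paper
farmer = np.array([1.0, 0.0])
donkey = np.array([0.2, 0.8])

# a transitive verb lives in N (x) S (x) N; with a 1-d sentence space
# it collapses to a matrix, and composition is tensor contraction
owns = np.array([[0.9, 0.3],
                 [0.1, 0.7]])

meaning = farmer @ owns @ donkey    # scalar: degree of acceptability
print(meaning)                      # -> 0.42 for this toy example
```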
MT-IceNet – A Spatial and Multi-Temporal Deep Learning Model for Arctic Sea Ice Forecasting
results: Using satellite-retrieved sea ice data from NSIDC together with atmospheric and oceanic variables from ERA5, the MT-IceNet model reduces prediction error by up to 60% at a 6-month lead time compared with state-of-the-art models.
Abstract
Arctic amplification has altered the climate patterns both regionally and globally, resulting in more frequent and more intense extreme weather events in the past few decades. The essential part of Arctic amplification is the unprecedented sea ice loss as demonstrated by satellite observations. Accurately forecasting Arctic sea ice from sub-seasonal to seasonal scales has been a major research question with fundamental challenges at play. In addition to physics-based Earth system models, researchers have been applying multiple statistical and machine learning models for sea ice forecasting. Looking at the potential of data-driven approaches to study sea ice variations, we propose MT-IceNet - a UNet based spatial and multi-temporal (MT) deep learning model for forecasting Arctic sea ice concentration (SIC). The model uses an encoder-decoder architecture with skip connections and processes multi-temporal input streams to regenerate spatial maps at future timesteps. Using bi-monthly and monthly satellite retrieved sea ice data from NSIDC as well as atmospheric and oceanic variables from ERA5 reanalysis product during 1979-2021, we show that our proposed model provides promising predictive performance for per-pixel SIC forecasting with up to 60% decrease in prediction error for a lead time of 6 months as compared to its state-of-the-art counterparts.
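As a rough illustration of the encoder-decoder-with-skip-connections design described above, here is a tiny UNet-style sketch in PyTorch; the channel counts, input variables, and depth are placeholders, not the published MT-IceNet configuration.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal UNet-style model: multi-temporal inputs stacked as channels,
    one downsampling stage, a skip connection, and an SIC map as output."""
    def __init__(self, in_ch=8):              # e.g. 4 timesteps x 2 variables
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU())
        self.down = nn.Sequential(nn.MaxPool2d(2),
                                  nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, 1, 1), nn.Sigmoid())  # SIC in [0, 1]

    def forward(self, x):                     # x: (B, in_ch, H, W)
        e = self.enc(x)
        d = self.up(self.down(e))
        return self.dec(torch.cat([e, d], dim=1))   # skip connection
```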
results: GPT-4 achieves up to a $65.49$ F\textsubscript{1} score under expert prompting, roughly 5 points above the previous baseline. This shows that LLMs can offer a viable way to generate useful synthetic training data in low-resource settings.
Abstract
Recently, large language models (LLMs) fine-tuned to follow human instruction have exhibited significant capabilities in various English NLP tasks. However, their performance in grammatical error correction (GEC) tasks, particularly in non-English languages, remains significantly unexplored. In this paper, we delve into abilities of instruction fine-tuned LLMs in Arabic GEC, a task made complex due to Arabic's rich morphology. Our findings suggest that various prompting methods, coupled with (in-context) few-shot learning, demonstrate considerable effectiveness, with GPT-4 achieving up to $65.49$ F\textsubscript{1} score under expert prompting (approximately $5$ points higher than our established baseline). This highlights the potential of LLMs in low-resource settings, offering a viable approach for generating useful synthetic data for model training. Despite these positive results, we find that instruction fine-tuned models, regardless of their size, significantly underperform compared to fully fine-tuned models of significantly smaller sizes. This disparity highlights a substantial room for improvements for LLMs. Inspired by methods from low-resource machine translation, we also develop a method exploiting synthetic data that significantly outperforms previous models on two standard Arabic benchmarks. Our work sets new SoTA for Arabic GEC, with $72.19\%$ and $73.26$ F$_{1}$ on the 2014 and 2015 QALB datasets, respectively.
Summary
Large language models (LLMs) fine-tuned to follow human instructions have shown significant capabilities in English natural language processing (NLP) tasks. However, their performance on grammatical error correction (GEC) in non-English languages remains largely unexplored. In this paper, we examine the abilities of instruction fine-tuned LLMs on Arabic GEC, a task complicated by Arabic's rich morphology. We find that various prompting methods combined with (in-context) few-shot learning are considerably effective, with GPT-4 reaching a $65.49$ F\textsubscript{1} score under expert prompting (about 5 points above our established baseline). This highlights the potential of LLMs in low-resource settings and offers a viable approach for generating useful synthetic data for model training. Despite these positive results, we find that instruction fine-tuned models, regardless of size, underperform fully fine-tuned models of much smaller size, leaving substantial room for improvement. Drawing inspiration from low-resource machine translation, we also develop a method exploiting synthetic data that significantly outperforms previous models on two standard Arabic benchmarks. Our work sets a new SoTA for Arabic GEC, with $72.19\%$ and $73.26$ F\textsubscript{1} on the 2014 and 2015 QALB datasets, respectively.
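A minimal sketch of the kind of few-shot GEC prompting described above; the instruction wording, the example pair, and the `complete` function standing in for an LLM API call are all invented for illustration.

```python
# invented (source, corrected) example pairs for in-context learning
FEW_SHOT = [
    ("انا ذاهب الي المدرسه", "أنا ذاهب إلى المدرسة"),
]

def build_gec_prompt(sentence: str) -> str:
    """Assemble instruction + in-context examples + the target sentence."""
    lines = ["Correct the grammatical and spelling errors in the Arabic sentence."]
    for src, tgt in FEW_SHOT:
        lines += [f"Input: {src}", f"Output: {tgt}"]
    lines += [f"Input: {sentence}", "Output:"]
    return "\n".join(lines)

# corrected = complete(build_gec_prompt("..."))  # `complete` is a stand-in LLM call
```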
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
methods: The study pairs a parametric language model trained on the Open License Corpus (OLC) with a modifiable nonparametric datastore (e.g., containing copyrighted books or news) that is queried only at inference time. This lets high-risk data be used at inference without ever being trained on.
results: Experiments show that combining the OLC-trained model with the nonparametric datastore substantially improves language model performance, especially out of domain. The study also analyzes the effectiveness of different nonparametric methods and how performance scales with datastore size. These results indicate that high-quality language models can be built while complying with legal requirements.
Abstract
The legality of training language models (LMs) on copyrighted or otherwise restricted data is under intense debate. However, as we show, model performance significantly degrades if trained only on low-risk text (e.g., out-of-copyright books or government documents), due to its limited size and domain coverage. We present SILO, a new language model that manages this risk-performance tradeoff during inference. SILO is built by (1) training a parametric LM on Open License Corpus (OLC), a new corpus we curate with 228B tokens of public domain and permissively licensed text and (2) augmenting it with a more general and easily modifiable nonparametric datastore (e.g., containing copyrighted books or news) that is only queried during inference. The datastore allows use of high-risk data without training on it, supports sentence-level data attribution, and enables data producers to opt out from the model by removing content from the store. These capabilities can foster compliance with data-use regulations such as the fair use doctrine in the United States and the GDPR in the European Union. Our experiments show that the parametric LM struggles on domains not covered by OLC. However, access to the datastore greatly improves out of domain performance, closing 90% of the performance gap with an LM trained on the Pile, a more diverse corpus with mostly high-risk text. We also analyze which nonparametric approach works best, where the remaining errors lie, and how performance scales with datastore size. Our results suggest that it is possible to build high quality language models while mitigating their legal risk.
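One nonparametric mechanism a SILO-style setup can use is kNN-LM-style interpolation at inference time. The sketch below assumes precomputed datastore keys (context vectors) and values (next tokens) plus a next-token distribution already produced by the parametric LM; the interpolation weight and temperature are illustrative.

```python
import numpy as np

def knn_interpolate(p_lm, query, keys, values, vocab_size,
                    k=8, lam=0.25, temp=1.0):
    """Blend the parametric LM's next-token distribution with a distribution
    read off the k nearest datastore entries (key = context vector,
    value = the token that followed that context)."""
    d = np.linalg.norm(keys - query, axis=1)      # distances to all keys
    nn = np.argsort(d)[:k]
    w = np.exp(-d[nn] / temp)
    w /= w.sum()                                  # softmax over -distance
    p_knn = np.zeros(vocab_size)
    for token, weight in zip(values[nn], w):
        p_knn[token] += weight
    return (1 - lam) * p_lm + lam * p_knn         # final next-token probs
```

Deleting an entry from `keys`/`values` removes its influence immediately, which is what lets data producers opt out without retraining.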
Probabilistic Invariant Learning with Randomized Linear Classifiers
results: The authors prove that, under certain conditions, RLCs can with high probability approximate any (smooth) function while preserving invariance to compact group transformations. They also design three RLC-based randomized classification models that achieve probabilistic invariance and universality over sets, graphs, and spherical data. Finally, experiments show that this new class of models can outperform deterministic invariant neural networks on invariant tasks.
Abstract
Designing models that are both expressive and preserve known invariances of tasks is an increasingly hard problem. Existing solutions tradeoff invariance for computational or memory resources. In this work, we show how to leverage randomness and design models that are both expressive and invariant but use less resources. Inspired by randomized algorithms, our key insight is that accepting probabilistic notions of universal approximation and invariance can reduce our resource requirements. More specifically, we propose a class of binary classification models called Randomized Linear Classifiers (RLCs). We give parameter and sample size conditions in which RLCs can, with high probability, approximate any (smooth) function while preserving invariance to compact group transformations. Leveraging this result, we design three RLCs that are provably probabilistic invariant for classification tasks over sets, graphs, and spherical data. We show how these models can achieve probabilistic invariance and universality using less resources than (deterministic) neural networks and their invariant counterparts. Finally, we empirically demonstrate the benefits of this new class of models on invariant tasks where deterministic invariant neural networks are known to struggle.
Summary
Designing models that are both expressive and preserve the known invariances of a task is an increasingly hard problem, and existing solutions trade invariance against computation or memory. In this work, we show how to leverage randomness to design models that are expressive and invariant while using fewer resources. Inspired by randomized algorithms, our key insight is that accepting probabilistic notions of universal approximation and invariance can reduce resource requirements. More specifically, we propose a class of binary classification models called Randomized Linear Classifiers (RLCs), and we give parameter and sample-size conditions under which RLCs can, with high probability, approximate any (smooth) function while preserving invariance to compact group transformations. Building on this result, we design three randomized linear models that are provably probabilistically invariant for classification tasks over sets, graphs, and spherical data, using fewer resources than deterministic neural networks and their invariant counterparts. Finally, we empirically demonstrate the benefits of this new class of models on invariant tasks where deterministic invariant neural networks are known to struggle.
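For intuition about probabilistic invariance (a generic illustration, not the paper's construction): a classifier can sample its weights freshly at prediction time, and correctness then only needs to hold with high probability over that sampling. In the toy set classifier below, element scores are summed, so predictions are insensitive to reordering the set, while the weights are random per call.

```python
import numpy as np

def randomized_set_classifier(x, rng):
    """Toy randomized linear classifier over a set of feature vectors:
    weights are sampled per call, and the order-free sum over elements
    makes the prediction invariant to permutations of the set."""
    w = rng.normal(size=x.shape[1])
    return float(np.tanh(x @ w).sum() > 0)

x = np.random.default_rng(0).normal(size=(5, 3))     # a set of 5 points in R^3
perm = np.random.default_rng(1).permutation(5)
# with the same weight seed, x and any permutation of x get the same label
print(randomized_set_classifier(x, np.random.default_rng(42)),
      randomized_set_classifier(x[perm], np.random.default_rng(42)))
```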
Fine-Tuning Games: Bargaining and Adaptation for General-Purpose Models
results: The paper finds that profit sharing can arise even when one firm's costs are significantly higher than the other's, and it provides methods for identifying reasonable bargaining arrangements.
Abstract
Major advances in Machine Learning (ML) and Artificial Intelligence (AI) increasingly take the form of developing and releasing general-purpose models. These models are designed to be adapted by other businesses and agencies to perform a particular, domain-specific function. This process has become known as adaptation or fine-tuning. This paper offers a model of the fine-tuning process where a Generalist brings the technological product (here an ML model) to a certain level of performance, and one or more Domain-specialist(s) adapts it for use in a particular domain. Both entities are profit-seeking and incur costs when they invest in the technology, and they must reach a bargaining agreement on how to share the revenue for the technology to reach the market. For a relatively general class of cost and revenue functions, we characterize the conditions under which the fine-tuning game yields a profit-sharing solution. We observe that any potential domain-specialization will either contribute, free-ride, or abstain in their uptake of the technology, and we provide conditions yielding these different strategies. We show how methods based on bargaining solutions and sub-game perfect equilibria provide insights into the strategic behavior of firms in these types of interactions, and we find that profit-sharing can still arise even when one firm has significantly higher costs than another. We also provide methods for identifying Pareto-optimal bargaining arrangements for a general set of utility functions.
Summary
Major advances in machine learning (ML) and artificial intelligence (AI) increasingly take the form of developing and releasing general-purpose models. These models are designed to be adapted by other businesses and agencies for particular, domain-specific functions, a process known as adaptation or fine-tuning. This paper offers a model of the fine-tuning process in which a Generalist brings a technological product (here an ML model) to a certain level of performance, and one or more Domain specialists adapt it for use in a particular domain. Both entities are profit-seeking, incur costs when they invest in the technology, and must reach a bargaining agreement on how to share the revenue for the technology to reach the market. For a relatively general class of cost and revenue functions, we characterize the conditions under which the fine-tuning game yields a profit-sharing solution. We observe that a potential domain specialist will either contribute, free-ride, or abstain in its uptake of the technology, and we give conditions yielding each strategy. We show how methods based on bargaining solutions and subgame-perfect equilibria provide insight into the strategic behavior of firms in these interactions, and we find that profit sharing can still arise even when one firm has significantly higher costs than the other. We also provide methods for identifying Pareto-optimal bargaining arrangements for a general set of utility functions.
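For intuition about the bargaining logic involved, here is a textbook symmetric Nash-bargaining toy example, not the paper's specific model: with total revenue R and investment costs c_G and c_D, each firm recoups its cost plus half the remaining surplus, so profit sharing survives asymmetric costs.

```python
def nash_bargain(revenue, cost_g, cost_d):
    """Symmetric Nash bargaining over revenue with disagreement payoffs of
    zero: each firm recoups its cost plus half the surplus."""
    surplus = revenue - cost_g - cost_d
    if surplus <= 0:
        return None            # no deal: the technology never reaches market
    return cost_g + surplus / 2, cost_d + surplus / 2

# asymmetric costs, yet both firms net the same profit of 2.5
print(nash_bargain(revenue=10, cost_g=4, cost_d=1))   # -> (6.5, 3.5)
```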
Event Abstraction for Enterprise Collaboration Systems to Support Social Process Mining
paper_authors: Jonas Blatt, Patrick Delfmann, Petra Schubert
for: The paper is written for Process Mining (PM) and Enterprise Collaboration Systems (ECS).
methods: The paper proposes a tailored ECS event abstraction (ECSEA) approach that trains a model by comparing recorded actual user activities (high-level traces) with the system-generated low-level traces extracted from the ECS.
results: The algorithm produces accurate results.
Abstract
One aim of Process Mining (PM) is the discovery of process models from event logs of information systems. PM has been successfully applied to process-oriented enterprise systems but is less suited for communication- and document-oriented Enterprise Collaboration Systems (ECS). ECS event logs are very fine-granular and PM applied to their logs results in spaghetti models. A common solution for this is event abstraction, i.e., converting low-level logs into more abstract high-level logs before running discovery algorithms. ECS logs come with special characteristics that have so far not been fully addressed by existing event abstraction approaches. We aim to close this gap with a tailored ECS event abstraction (ECSEA) approach that trains a model by comparing recorded actual user activities (high-level traces) with the system-generated low-level traces (extracted from the ECS). The model allows us to automatically convert future low-level traces into an abstracted high-level log that can be used for PM. Our evaluation shows that the algorithm produces accurate results. ECSEA is a preprocessing method that is essential for the interpretation of collaborative work activity in ECS, which we call Social Process Mining.
Summary
One aim of process mining (PM) is the discovery of process models from the event logs of information systems. PM has been applied successfully to process-oriented enterprise systems but is less suited to communication- and document-oriented Enterprise Collaboration Systems (ECS). ECS event logs are very fine-granular, and applying PM to them yields spaghetti models. A common solution is event abstraction, i.e., converting low-level logs into more abstract high-level logs before running discovery algorithms. ECS logs have special characteristics that existing event abstraction approaches have not fully addressed. We aim to close this gap with a tailored ECS event abstraction (ECSEA) approach that trains a model by comparing recorded actual user activities (high-level traces) with the system-generated low-level traces extracted from the ECS. The model lets us automatically convert future low-level traces into an abstracted high-level log usable for PM. Our evaluation shows that the algorithm produces accurate results. ECSEA is a preprocessing method essential for interpreting collaborative work activity in ECS, which we call Social Process Mining.
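To make the abstraction step concrete, here is a toy sketch of rewriting a low-level ECS trace into a high-level trace with a learned pattern-to-activity mapping; the event names and patterns are invented, and the paper's actual trained model is not reproduced here.

```python
# invented mapping from low-level ECS event patterns to high-level activities
LEARNED_MAP = {
    ("open_doc", "edit_doc", "save_doc"): "revise document",
    ("create_post", "attach_file"): "share document",
}

def abstract_trace(low_level):
    """Greedy left-to-right rewriting of a low-level trace via the mapping."""
    out, i = [], 0
    while i < len(low_level):
        for pattern, activity in LEARNED_MAP.items():
            if tuple(low_level[i:i + len(pattern)]) == pattern:
                out.append(activity)
                i += len(pattern)
                break
        else:
            out.append(low_level[i])   # no pattern matched: keep the raw event
            i += 1
    return out

print(abstract_trace(["open_doc", "edit_doc", "save_doc",
                      "create_post", "attach_file"]))
# -> ['revise document', 'share document']
```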
Metaheuristic Algorithms in Artificial Intelligence with Applications to Bioinformatics, Biostatistics, Ecology, and the Manufacturing Industries
paper_authors: Elvis Han Cui, Zizhao Zhang, Culsome Junwen Chen, Weng Kee Wong
for: This study applies a newly proposed nature-inspired algorithm, the Competitive Swarm Optimizer with Mutated Agents (CSO-MA), to a variety of optimization problems in the statistical sciences, demonstrating its flexibility and comparing it with other algorithms.
methods: The study uses the CSO-MA algorithm, which can handle different cost structures or multiple user-specified nonlinear constraints.
results: The study applies CSO-MA to a range of optimization problems, such as finding maximum likelihood estimates of parameters in a single-cell generalized trend model, estimating parameters in the Rasch model commonly used in education research, finding M-estimates for a Cox regression in a Markov renewal model, and matrix completion to impute missing values in a two-compartment model. It is also applied to optimal variable selection in an ecology problem and to designing a car refueling experiment for the manufacturing industry using a logistic model with multiple interacting factors.
Abstract
Nature-inspired metaheuristic algorithms are important components of artificial intelligence, and are increasingly used across disciplines to tackle various types of challenging optimization problems. We apply a newly proposed nature-inspired metaheuristic algorithm called competitive swarm optimizer with mutated agents (CSO-MA) and demonstrate its flexibility and out-performance relative to its competitors in a variety of optimization problems in the statistical sciences. In particular, we show the algorithm is efficient and can incorporate various cost structures or multiple user-specified nonlinear constraints. Our applications include (i) finding maximum likelihood estimates of parameters in a single cell generalized trend model to study pseudotime in bioinformatics, (ii) estimating parameters in a commonly used Rasch model in education research, (iii) finding M-estimates for a Cox regression in a Markov renewal model and (iv) matrix completion to impute missing values in a two compartment model. In addition we discuss applications to (v) select variables optimally in an ecology problem and (vi) design a car refueling experiment for the auto industry using a logistic model with multiple interacting factors.
Summary
Nature-inspired metaheuristic algorithms are important components of artificial intelligence and are increasingly used across disciplines to tackle challenging optimization problems. We apply a newly proposed nature-inspired metaheuristic algorithm called the competitive swarm optimizer with mutated agents (CSO-MA) and demonstrate its flexibility and outperformance relative to its competitors on a variety of optimization problems in the statistical sciences. In particular, we show the algorithm is efficient and can incorporate various cost structures or multiple user-specified nonlinear constraints. Our applications include (i) finding maximum likelihood estimates of parameters in a single-cell generalized trend model to study pseudotime in bioinformatics, (ii) estimating parameters in the commonly used Rasch model in education research, (iii) finding M-estimates for a Cox regression in a Markov renewal model, (iv) matrix completion to impute missing values in a two-compartment model, (v) selecting variables optimally in an ecology problem, and (vi) designing a car refueling experiment for the auto industry using a logistic model with multiple interacting factors.
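For readers unfamiliar with competitive swarm optimizers, the sketch below shows the core pairwise-competition update, where each randomly drawn pair's loser learns from its winner and the swarm mean, plus a simplified stand-in for the mutated-agents component; all constants and the mutation scheme are assumptions, not the paper's exact formulation.

```python
import numpy as np

def cso_ma_step(X, V, f, rng, phi=0.1, mut_rate=0.05):
    """One generation: random pairing, losers move toward winners and the
    swarm mean; a few agents are mutated with Gaussian noise."""
    n, d = X.shape
    idx = rng.permutation(n)
    mean = X.mean(axis=0)
    for a, b in zip(idx[::2], idx[1::2]):
        w, l = (a, b) if f(X[a]) < f(X[b]) else (b, a)   # minimization
        r1, r2, r3 = rng.random((3, d))
        V[l] = r1 * V[l] + r2 * (X[w] - X[l]) + phi * r3 * (mean - X[l])
        X[l] = X[l] + V[l]
    mutants = rng.random(n) < mut_rate                   # "mutated agents"
    X[mutants] += rng.normal(scale=0.5, size=(int(mutants.sum()), d))
    return X, V

# usage: minimize the sphere function in 5 dimensions
rng = np.random.default_rng(0)
X, V = rng.normal(size=(20, 5)), np.zeros((20, 5))
for _ in range(200):
    X, V = cso_ma_step(X, V, lambda x: (x ** 2).sum(), rng)
print(min((x ** 2).sum() for x in X))
```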
AdaptEx: A Self-Service Contextual Bandit Platform
results: The platform improves user experiences quickly while reducing the costs and time associated with traditional testing methods, and it adapts gracefully to ever-changing content and continuous "cold start" situations.
Abstract
This paper presents AdaptEx, a self-service contextual bandit platform widely used at Expedia Group, that leverages multi-armed bandit algorithms to personalize user experiences at scale. AdaptEx considers the unique context of each visitor to select the optimal variants and learns quickly from every interaction they make. It offers a powerful solution to improve user experiences while minimizing the costs and time associated with traditional testing methods. The platform unlocks the ability to iterate towards optimal product solutions quickly, even in ever-changing content and continuous "cold start" situations gracefully.
Summary
This paper presents AdaptEx, a self-service contextual bandit platform widely used at Expedia Group that leverages multi-armed bandit algorithms to personalize user experiences at scale. AdaptEx considers the unique context of each visitor when selecting the optimal variants and learns quickly from every interaction. It offers a powerful solution for improving user experiences while minimizing the costs and time associated with traditional testing methods, and it enables rapid iteration toward optimal product solutions even amid ever-changing content and continuous "cold start" situations.
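For a flavor of the kind of algorithm such a platform can run, here is a standard disjoint LinUCB sketch, a common contextual bandit baseline; it is not AdaptEx's actual implementation, and the feature layout and exploration parameter are illustrative.

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per arm (variant), picking
    the arm with the highest upper confidence bound for the given context."""
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]     # X^T X + I per arm
        self.b = [np.zeros(dim) for _ in range(n_arms)]   # X^T rewards per arm

    def choose(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# usage: pick a variant for a visitor context, then log the observed reward
bandit = LinUCB(n_arms=3, dim=4)
ctx = np.array([1.0, 0.2, 0.0, 0.5])      # visitor context features
arm = bandit.choose(ctx)
bandit.update(arm, ctx, reward=1.0)       # e.g. 1.0 if the visitor converted
```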
Understanding the Effect of Counterfactual Explanations on Trust and Reliance on AI for Human-AI Collaborative Clinical Decision Making
results: The study finds that when AI suggestions are correct, both salient feature and counterfactual explanations help therapists and laypersons review them more analytically and improve performance and agreement; when AI suggestions are wrong, counterfactual explanations help both groups reduce their over-reliance on the incorrect suggestions.
Abstract
Artificial intelligence (AI) is increasingly being considered to assist human decision-making in high-stake domains (e.g. health). However, researchers have discussed an issue that humans can over-rely on wrong suggestions of the AI model instead of achieving human AI complementary performance. In this work, we utilized salient feature explanations along with what-if, counterfactual explanations to make humans review AI suggestions more analytically to reduce overreliance on AI and explored the effect of these explanations on trust and reliance on AI during clinical decision-making. We conducted an experiment with seven therapists and ten laypersons on the task of assessing post-stroke survivors' quality of motion, and analyzed their performance, agreement level on the task, and reliance on AI without and with two types of AI explanations. Our results showed that the AI model with both salient features and counterfactual explanations assisted therapists and laypersons to improve their performance and agreement level on the task when `right' AI outputs are presented. While both therapists and laypersons over-relied on `wrong' AI outputs, counterfactual explanations assisted both therapists and laypersons to reduce their over-reliance on `wrong' AI outputs by 21\% compared to salient feature explanations. Specifically, laypersons had higher performance degrades by 18.0 f1-score with salient feature explanations and 14.0 f1-score with counterfactual explanations than therapists with performance degrades of 8.6 and 2.8 f1-scores respectively. Our work discusses the potential of counterfactual explanations to better estimate the accuracy of an AI model and reduce over-reliance on `wrong' AI outputs and implications for improving human-AI collaborative decision-making.
Summary
Artificial intelligence (AI) is increasingly being considered to assist human decision-making in high-stakes domains such as health. However, researchers have found that humans can over-rely on wrong AI suggestions instead of achieving complementary human-AI performance. In this work, we used salient feature explanations together with what-if, counterfactual explanations to make humans review AI suggestions more analytically and reduce over-reliance on AI, and we studied the effect of these explanations on trust in and reliance on AI during clinical decision-making. We conducted an experiment with seven therapists and ten laypersons on the task of assessing post-stroke survivors' quality of motion, and analyzed their performance, agreement level, and reliance on AI without and with two types of AI explanations. Our results showed that the AI model with both salient features and counterfactual explanations helped therapists and laypersons improve their performance and agreement when "right" AI outputs were presented. While both groups over-relied on "wrong" AI outputs, counterfactual explanations reduced their over-reliance by 21% compared with salient feature explanations. Specifically, laypersons' performance degraded by 18.0 F1 points with salient feature explanations and 14.0 with counterfactual explanations, versus degradations of 8.6 and 2.8 F1 points for therapists. Our work discusses the potential of counterfactual explanations to better estimate the accuracy of an AI model, reduce over-reliance on "wrong" AI outputs, and improve human-AI collaborative decision-making.
Some Options for Instantiation of Bipolar Argument Graphs with Deductive Arguments
results: The results help clarify the arguments in a bipolar argument graph and how they interact, and provide a framework based on logical arguments for instantiating such graphs.
Abstract
Argument graphs provide an abstract representation of an argumentative situation. A bipolar argument graph is a directed graph where each node denotes an argument, and each arc denotes the influence of one argument on another. Here we assume that the influence is supporting, attacking, or ambiguous. In a bipolar argument graph, each argument is atomic and so it has no internal structure. Yet to better understand the nature of the individual arguments, and how they interact, it is important to consider their internal structure. To address this need, this paper presents a framework based on the use of logical arguments to instantiate bipolar argument graphs, and a set of possible constraints on instantiating arguments that take into account the internal structure of the arguments, and the types of relationship between arguments.
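A minimal sketch of the bipolar-argument-graph structure described above; the node contents are invented, and the deductive instantiation of each atomic argument is left out.

```python
from dataclasses import dataclass, field
from enum import Enum

class Influence(Enum):
    SUPPORT = "support"
    ATTACK = "attack"
    AMBIGUOUS = "ambiguous"

@dataclass
class BipolarArgumentGraph:
    """Directed graph whose nodes are atomic arguments and whose arcs
    carry a support/attack/ambiguous influence label."""
    nodes: set = field(default_factory=set)
    arcs: dict = field(default_factory=dict)   # (src, dst) -> Influence

    def add_arc(self, src, dst, kind):
        self.nodes |= {src, dst}
        self.arcs[(src, dst)] = kind

g = BipolarArgumentGraph()
g.add_arc("A: the trains run often", "C: take the train", Influence.SUPPORT)
g.add_arc("B: a strike is planned", "C: take the train", Influence.ATTACK)
```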
paper_authors: Yifan Zhang, Jingqin Yang, Yang Yuan, Andrew Chi-Chih Yao
for: solves complex problems with human-like thought processes
methods: employs language models in a cumulative and iterative manner, decomposing tasks into smaller components
results: consistently outperforms existing methods with an improvement up to 9.3%, achieves 98.04% accuracy on the curated FOLIO wiki dataset, and achieves 94% accuracy on the Game of 24 with a 20% enhancement over the previous state-of-the-art method.
Abstract
While language models are powerful and versatile, they often fail to address highly complex problems. This is because solving complex problems requires deliberate thinking, which has been only minimally guided during training. In this paper, we propose a new method called Cumulative Reasoning (CR), which employs language models in a cumulative and iterative manner to emulate human thought processes. By decomposing tasks into smaller components, CR streamlines the problem-solving process, rendering it both more manageable and effective. For logical inference tasks, CR consistently outperforms existing methods with an improvement up to 9.3%, and achieves the astonishing accuracy of 98.04% on the curated FOLIO wiki dataset. In the context of the Game of 24, CR achieves an accuracy of 94%, which signifies a substantial enhancement of 20% over the previous state-of-the-art method (code is available at https://github.com/iiis-ai/cumulative-reasoning).
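A schematic of the cumulative, iterative loop described above; the `propose`, `verify`, and `solved` helpers are hypothetical stand-ins for LLM calls, and this is not the authors' released code (which lives at the linked repository).

```python
def cumulative_reasoning(question, propose, verify, solved, max_steps=16):
    """Accumulate verified intermediate propositions until the question can
    be answered: each step proposes a new claim from the current context
    and keeps it only if the verifier accepts it."""
    context = [question]
    for _ in range(max_steps):
        claim = propose(context)        # proposer LLM suggests a deduction
        if verify(context, claim):      # verifier LLM checks it follows
            context.append(claim)
        answer = solved(context)        # final answer, or None to continue
        if answer is not None:
            return answer, context
    return None, context
```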
Learning Unbiased Image Segmentation: A Case Study with Plain Knee Radiographs
paper_authors: Nickolas Littlefield, Johannes F. Plate, Kurt R. Weiss, Ines Lohse, Avani Chhabra, Ismaeel A. Siddiqui, Zoe Menezes, George Mastorakos, Sakshi Mehul Thakar, Mehrnaz Abedian, Matthew F. Gong, Luke A. Carlson, Hamidreza Moradi, Soheyla Amirian, Ahmad P. Tafti
results: The study uncovered gender and racial biases and proposed mitigation strategies to correct them, ensuring fair and unbiased segmentation results.
Abstract
Automatic segmentation of knee bony anatomy is essential in orthopedics, and it has been around for several years in both pre-operative and post-operative settings. While deep learning algorithms have demonstrated exceptional performance in medical image analysis, the assessment of fairness and potential biases within these models remains limited. This study aims to revisit deep learning-powered knee-bony anatomy segmentation using plain radiographs to uncover visible gender and racial biases. The current contribution offers the potential to advance our understanding of biases, and it provides practical insights for researchers and practitioners in medical imaging. The proposed mitigation strategies mitigate gender and racial biases, ensuring fair and unbiased segmentation results. Furthermore, this work promotes equal access to accurate diagnoses and treatment outcomes for diverse patient populations, fostering equitable and inclusive healthcare provision.
Summary
Automatic segmentation of knee bony anatomy is essential in orthopedics and has been used for several years in both pre-operative and post-operative settings. While deep learning algorithms have shown exceptional performance in medical image analysis, the assessment of fairness and potential biases within these models remains limited. This study revisits deep learning-powered knee bony anatomy segmentation using plain radiographs to uncover visible gender and racial biases. The contribution can advance our understanding of biases and provides practical insights for researchers and practitioners in medical imaging. The proposed mitigation strategies reduce gender and racial biases, ensuring fair and unbiased segmentation results. Furthermore, this work promotes equal access to accurate diagnoses and treatment outcomes for diverse patient populations, fostering equitable and inclusive healthcare.
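One simple way to quantify the kind of segmentation bias discussed here (an illustrative metric choice, not the paper's protocol) is to compare mean Dice scores across demographic groups:

```python
import numpy as np

def dice(pred, truth, eps=1e-8):
    """Dice coefficient between two binary segmentation masks."""
    inter = np.logical_and(pred, truth).sum()
    return 2 * inter / (pred.sum() + truth.sum() + eps)

def group_gap(preds, truths, groups):
    """Largest difference in mean Dice between any two demographic groups."""
    means = {g: np.mean([dice(p, t)
                         for p, t, gg in zip(preds, truths, groups) if gg == g])
             for g in set(groups)}
    return max(means.values()) - min(means.values()), means
```

A shrinking gap before and after applying a mitigation strategy is one concrete check that the strategy has not traded one group's accuracy for another's.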