results: An experimental study demonstrates the effectiveness of the mobile AIGC-driven HDT solution, which is applied in a virtual physical therapy teaching platform.
Abstract
Mobile Artificial Intelligence-Generated Content (AIGC) technology refers to the adoption of AI algorithms deployed at mobile edge networks to automate the information creation process while fulfilling the requirements of end users. Mobile AIGC has recently attracted phenomenal attention and can be a key enabling technology for an emerging application, called human digital twin (HDT). HDT empowered by mobile AIGC is expected to revolutionize personalized healthcare by generating rare disease data, modeling high-fidelity digital twins, building versatile testbeds, and providing 24/7 customized medical services. To promote the development of this new paradigm, in this article we propose a system architecture of mobile AIGC-driven HDT and highlight the corresponding design requirements and challenges. Moreover, we illustrate two use cases, i.e., mobile AIGC-driven HDT in customized surgery planning and personalized medication. In addition, we conduct an experimental study to prove the effectiveness of the proposed mobile AIGC-driven HDT solution, showing a particular application in a virtual physical therapy teaching platform. Finally, we conclude this article by briefly discussing several open issues and future directions.
A Zero-shot and Few-shot Study of Instruction-Finetuned Large Language Models Applied to Clinical and Biomedical Tasks
results: The evaluated LLMs approach the performance of state-of-the-art models on most tasks in zero- and few-shot settings, performing particularly well on question answering even without having seen examples from these tasks. For classification and RE tasks, however, they fall short of models trained specifically for the medical domain, such as PubMedBERT. Moreover, no single LLM leads on all tasks; different models are better suited to certain tasks.
Abstract
We evaluate four state-of-the-art instruction-tuned large language models (LLMs) -- ChatGPT, Flan-T5 UL2, Tk-Instruct, and Alpaca -- on a set of 13 real-world clinical and biomedical natural language processing (NLP) tasks in English, such as named-entity recognition (NER), question-answering (QA), relation extraction (RE), etc. Our overall results demonstrate that the evaluated LLMs begin to approach the performance of state-of-the-art models in zero- and few-shot scenarios for most tasks, performing particularly well on the QA task, even though they have never seen examples from these tasks before. However, we observed that performance on the classification and RE tasks falls below what can be achieved with a model specifically trained for the medical field, such as PubMedBERT. Finally, we noted that no LLM outperforms all the others on all the studied tasks, with some models being better suited for certain tasks than others.
CFR-p: Counterfactual Regret Minimization with Hierarchical Policy Abstraction, and its Application to Two-player Mahjong
results: The study finds that this CFR framework based on a hierarchical abstraction of winning policies can be generalized to other imperfect information games.
Abstract
Counterfactual Regret Minimization (CFR) has shown its success in Texas Hold'em poker. We apply this algorithm to another popular incomplete information game, Mahjong. Compared to poker, Mahjong is much more complex, with many variants. We study two-player Mahjong by conducting game-theoretic analysis and making a hierarchical abstraction to CFR based on winning policies. This framework can be generalized to other imperfect information games.
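To make the CFR machinery concrete, below is a minimal sketch of the regret-matching update at the heart of CFR. It is a generic single-information-set illustration with hypothetical action values, not the paper's hierarchical two-player Mahjong abstraction.

```python
import numpy as np

def regret_matching(cumulative_regrets):
    """Derive a strategy from cumulative counterfactual regrets.

    Actions with positive regret are played in proportion to that
    regret; if no regret is positive, play uniformly at random.
    """
    positive = np.maximum(cumulative_regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(cumulative_regrets), 1.0 / len(cumulative_regrets))

# Toy example: one information set with three actions whose
# counterfactual values would be estimated by game-tree traversal.
regrets = np.zeros(3)
for _ in range(1000):
    strategy = regret_matching(regrets)
    action_values = np.array([1.0, 0.5, -0.2])   # hypothetical estimates
    node_value = strategy @ action_values
    regrets += action_values - node_value        # accumulate regret

print(regret_matching(regrets))  # converges toward the best action
```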
Enhancing Temporal Planning Domains by Sequential Macro-actions (Extended Version)
results: Improvements in obtained satisficing plans and plan quality were achieved across multiple planners and domains.
Abstract
Temporal planning is an extension of classical planning involving concurrent execution of actions and alignment with temporal constraints. Durative actions along with invariants allow for modeling domains in which multiple agents operate in parallel on shared resources. Hence, it is often important to avoid resource conflicts, where temporal constraints establish the consistency of concurrent actions and events. Unfortunately, the performance of temporal planning engines tends to sharply deteriorate when the number of agents and objects in a domain gets large. A possible remedy is to use macro-actions, which are well-studied in the context of classical planning. In temporal planning settings, however, introducing macro-actions is significantly more challenging when the concurrent execution of actions and shared use of resources, subject to compliance with temporal constraints, should not be suppressed entirely. Our work contributes a general concept of sequential temporal macro-actions that guarantees the applicability of obtained plans, i.e., the sequence of original actions encapsulated by a macro-action is always executable. We apply our approach to several temporal planners and domains, stemming from the International Planning Competition and the RoboCup Logistics League. Our experiments yield improvements in terms of obtained satisficing plans as well as plan quality for the majority of tested planners and domains.
paper_authors: Yongyuan Liang, Yanchao Sun, Ruijie Zheng, Xiangyu Liu, Tuomas Sandholm, Furong Huang, Stephen McAleer
for: This work aims to train RL policies that perform well under environment perturbations or adversarial attacks.
methods: We propose GRAD, which treats the temporally-coupled perturbation problem as a partially observable two-player zero-sum game; by finding an approximate equilibrium of this game, it ensures the agent's robustness to temporally-coupled perturbations.
results: Experiments on a range of continuous control tasks show that, compared to baselines, the proposed method offers significant robustness advantages in both state and action spaces, especially against temporally-coupled perturbation attacks.
Abstract
Robust reinforcement learning (RL) seeks to train policies that can perform well under environment perturbations or adversarial attacks. Existing approaches typically assume that the space of possible perturbations remains the same across timesteps. However, in many settings, the space of possible perturbations at a given timestep depends on past perturbations. We formally introduce temporally-coupled perturbations, presenting a novel challenge for existing robust RL methods. To tackle this challenge, we propose GRAD, a novel game-theoretic approach that treats the temporally-coupled robust RL problem as a partially-observable two-player zero-sum game. By finding an approximate equilibrium in this game, GRAD ensures the agent's robustness against temporally-coupled perturbations. Empirical experiments on a variety of continuous control tasks demonstrate that our proposed approach exhibits significant robustness advantages compared to baselines against both standard and temporally-coupled attacks, in both state and action spaces.
Fast Knowledge Graph Completion using Graphics Processing Units
results: Our experiments show that the proposed framework can process the knowledge graph completion problem efficiently.
Abstract
Knowledge graphs can be used in many areas related to data semantics, such as question-answering systems and knowledge-based systems. However, currently constructed knowledge graphs need to be complemented with missing relations; this task is called knowledge graph completion. To add new relations to an existing knowledge graph using knowledge graph embedding models, we have to evaluate $N\times N \times R$ vector operations, where $N$ is the number of entities and $R$ is the number of relation types, which is very costly. In this paper, we provide an efficient knowledge graph completion framework on GPUs to get new relations using knowledge graph embedding vectors. In the proposed framework, we first define "transformable to a metric space" and then provide a method to transform the knowledge graph completion problem into the similarity join problem for any model that is "transformable to a metric space". To efficiently process the similarity join problem, we then derive formulas using the properties of a metric space. Based on the formulas, we develop a fast knowledge graph completion algorithm. Finally, we experimentally show that our framework can efficiently process the knowledge graph completion problem.
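The reduction the abstract describes can be pictured with a TransE-style model, whose score $\|h + r - t\|$ is already a distance in a metric space; completing the graph then becomes a similarity join between translated head vectors and tail vectors. The sketch below illustrates that reduction on CPU with a spatial index (the paper's actual contribution is the GPU formulas and algorithm); the embeddings and the distance threshold are random stand-ins.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
n_entities, dim = 1000, 32
entity_emb = rng.normal(size=(n_entities, dim))     # stand-in embeddings
relation_emb = rng.normal(size=dim)                 # one relation type

# Similarity join: for each head h, find tails t with ||h + r - t|| <= eps.
# A spatial index replaces the naive N x N scan of score evaluations.
tree = cKDTree(entity_emb)
queries = entity_emb + relation_emb
eps = 6.0                                           # illustrative threshold
candidates = tree.query_ball_point(queries, r=eps)

new_triples = [(h, t) for h, tails in enumerate(candidates)
               for t in tails if h != t]
print(f"{len(new_triples)} candidate (head, relation, tail) triples")
```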
Extracting Molecular Properties from Natural Language with Multimodal Contrastive Learning
results: Our model performs strongly on downstream MoleculeNet property classification tasks, with a +4.26% AUROC gain over models pre-trained on the graph modality alone and a +1.54% gain over the MoMu model (Su et al. 2022).
Abstract
Deep learning in computational biochemistry has traditionally focused on molecular graphs neural representations; however, recent advances in language models highlight how much scientific knowledge is encoded in text. To bridge these two modalities, we investigate how molecular property information can be transferred from natural language to graph representations. We study property prediction performance gains after using contrastive learning to align neural graph representations with representations of textual descriptions of their characteristics. We implement neural relevance scoring strategies to improve text retrieval, introduce a novel chemically-valid molecular graph augmentation strategy inspired by organic reactions, and demonstrate improved performance on downstream MoleculeNet property classification tasks. We achieve a +4.26% AUROC gain versus models pre-trained on the graph modality alone, and a +1.54% gain compared to recently proposed molecular graph/text contrastively trained MoMu model (Su et al. 2022).
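The alignment step described above is typically realized with a symmetric InfoNCE objective, as in CLIP-style training. The following sketch shows such a loss over a batch of paired graph and text embeddings; the encoders are replaced by random tensors and the temperature is illustrative, so this is a sketch of the general technique rather than the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(graph_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss: matched graph/text pairs sit on the
    diagonal of the similarity matrix and act as positives; all other
    pairs in the batch serve as negatives."""
    g = F.normalize(graph_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = g @ t.T / temperature
    targets = torch.arange(g.size(0), device=g.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

# Stand-ins for the outputs of a graph encoder and a text encoder.
graph_emb = torch.randn(16, 256, requires_grad=True)
text_emb = torch.randn(16, 256, requires_grad=True)
loss = contrastive_alignment_loss(graph_emb, text_emb)
loss.backward()
print(loss.item())
```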
How to Design and Deliver Courses for Higher Education in the AI Era: Insights from Exam Data Analysis
For: The paper advocates for the idea that courses and exams in the AI era should be designed based on the strengths and limitations of AI, as well as pedagogical educational objectives.
Methods: The paper explores the strengths and limitations of AI based on current advances in the field, and provides examples of how courses and exams can be designed based on these factors. It also describes a pedagogical approach inspired by the Socratic teaching method that was adopted from January 2023 to May 2023.
Results: The paper presents data analysis results of seven ChatGPT-authorized exams conducted between December 2022 and March 2023, which show no correlation between students' grades and whether or not they use ChatGPT to answer their exam questions. The paper also proposes a new exam system that allows for the application of the pedagogical approach in the AI era.
Abstract
In this position paper, we advocate for the idea that courses and exams in the AI era have to be designed based on two factors: (1) the strengths and limitations of AI, and (2) the pedagogical educational objectives. Based on insights from the Delors report on education [1], we first address the role of education and recall the main objectives that educational institutes must strive to achieve independently of any technology. We then explore the strengths and limitations of AI, based on current advances in AI. We explain how courses and exams can be designed based on these strengths and limitations of AI, providing different examples in the IT, English, and Art domains. We show how we adopted a pedagogical approach that is inspired from the Socratic teaching method from January 2023 to May 2023. Then, we present the data analysis results of seven ChatGPT-authorized exams conducted between December 2022 and March 2023. Our exam data results show that there is no correlation between students' grades and whether or not they use ChatGPT to answer their exam questions. Finally, we present a new exam system that allows us to apply our pedagogical approach in the AI era.
Model Predictive Control (MPC) of an Artificial Pancreas with Data-Driven Learning of Multi-Step-Ahead Blood Glucose Predictors
results: In simulation, the LSTM-MPC controller outperformed the traditional controller across three scenarios: a nominal case, a random meal disturbance case, and a case with a 25% decrease in insulin sensitivity. The approach also produced more accurate predictions of future blood glucose concentrations and better closed-loop performance.
Abstract
We present the design and \textit{in-silico} evaluation of a closed-loop insulin delivery algorithm to treat type 1 diabetes (T1D) consisting of a data-driven multi-step-ahead blood glucose (BG) predictor integrated into a Linear Time-Varying (LTV) Model Predictive Control (MPC) framework. Instead of identifying an open-loop model of the glucoregulatory system from available data, we propose to directly fit the entire BG prediction over a predefined prediction horizon to be used in the MPC, as a nonlinear function of past input-output data and an affine function of future insulin control inputs. For the nonlinear part, a Long Short-Term Memory (LSTM) network is proposed, while for the affine component a linear regression model is chosen. To assess benefits and drawbacks when compared to a traditional linear MPC based on an auto-regressive with exogenous (ARX) input model identified from data, we evaluated the proposed LSTM-MPC controller in three simulation scenarios: a nominal case with 3 meals per day, a random meal disturbances case where meals were generated with a recently published meal generator, and a case with 25$\%$ decrease in the insulin sensitivity. Further, in all the scenarios, no feedforward meal bolus was administered. For the more challenging random meal generation scenario, the mean $\pm$ standard deviation percent time in the range 70-180 [mg/dL] was 74.99 $\pm$ 7.09 vs. 54.15 $\pm$ 14.89, the mean $\pm$ standard deviation percent time in the tighter range 70-140 [mg/dL] was 47.78 $\pm$ 8.55 vs. 34.62 $\pm$ 9.04, while the mean $\pm$ standard deviation percent time in severe hypoglycemia, i.e., $<$ 54 [mg/dL], was 1.00 $\pm$ 3.18 vs. 9.45 $\pm$ 11.71, for our proposed LSTM-MPC controller and the traditional ARX-MPC, respectively. Our approach provided accurate predictions of future glucose concentrations and good closed-loop performances of the overall MPC controller.
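The predictor structure described in the abstract, a nonlinear function of past input-output data plus an affine function of future insulin inputs, can be sketched as follows. All dimensions (past window, prediction horizon, hidden size) are illustrative stand-ins, not the paper's configuration.

```python
import torch
import torch.nn as nn

class BGPredictor(nn.Module):
    """Multi-step blood-glucose predictor of the form in the abstract:
    an LSTM maps past input-output data to a nonlinear term, and a
    linear layer adds an affine contribution from the planned future
    insulin inputs."""
    def __init__(self, past_features=3, horizon=12, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(past_features, hidden, batch_first=True)
        self.nonlinear_head = nn.Linear(hidden, horizon)
        self.affine_insulin = nn.Linear(horizon, horizon)  # affine in inputs

    def forward(self, past_io, future_insulin):
        _, (h, _) = self.lstm(past_io)          # summarize the past
        return self.nonlinear_head(h[-1]) + self.affine_insulin(future_insulin)

model = BGPredictor()
past_io = torch.randn(8, 36, 3)        # 36 past steps of (BG, insulin, meal)
future_insulin = torch.randn(8, 12)    # candidate control inputs for MPC
bg_forecast = model(past_io, future_insulin)   # shape (8, 12), in [mg/dL]
print(bg_forecast.shape)
```

Because the forecast is affine in `future_insulin`, the downstream MPC optimization over the insulin inputs stays tractable for quadratic cost functions.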
Psy-LLM: Scaling up Global Mental Health Psychological Services with AI-based Large Language Models
paper_authors: Tin Lai, Yukun Shi, Zicong Du, Jiajie Wu, Ken Fu, Yichao Dou, Ziqi Wang
for: The paper aims to provide a novel AI-based system for online psychological consultation, which can assist healthcare professionals in providing timely and professional mental health support.
methods: The proposed framework, called Psy-LLM, leverages Large Language Models (LLMs) for question-answering in online psychological consultation. The framework combines pre-trained LLMs with real-world professional Q&A from psychologists and extensively crawled psychological articles.
results: The authors evaluated the framework using intrinsic metrics such as perplexity and extrinsic evaluation metrics including human participant assessments of response helpfulness, fluency, relevance, and logic. The results demonstrate the effectiveness of the Psy-LLM framework in generating coherent and relevant answers to psychological questions.
Abstract
The demand for psychological counseling has grown significantly in recent years, particularly with the global outbreak of COVID-19, which has heightened the need for timely and professional mental health support. Online psychological counseling has emerged as the predominant mode of providing services in response to this demand. In this study, we propose the Psy-LLM framework, an AI-based system leveraging Large Language Models (LLMs) for question-answering in online psychological consultation. Our framework combines pre-trained LLMs with real-world professional Q&A from psychologists and extensively crawled psychological articles. The Psy-LLM framework serves as a front-end tool for healthcare professionals, allowing them to provide immediate responses and mindfulness activities to alleviate patient stress. Additionally, it functions as a screening tool to identify urgent cases requiring further assistance. We evaluated the framework using intrinsic metrics, such as perplexity, and extrinsic evaluation metrics, with human participant assessments of response helpfulness, fluency, relevance, and logic. The results demonstrate the effectiveness of the Psy-LLM framework in generating coherent and relevant answers to psychological questions. This article concludes by discussing the potential of large language models to enhance mental health support through AI technologies in online psychological consultation.
Sparse then Prune: Toward Efficient Vision Transformers
paper_authors: Yogi Prasetyo, Novanto Yudistira, Agus Wahyu Widodo
for: This study aims to investigate the possibility of applying Sparse Regularization and Pruning methods to the Vision Transformer architecture for image classification tasks, and to explore the trade-off between performance and efficiency.
methods: The study uses Sparse Regularization and Pruning methods, with experiments on the CIFAR-10, CIFAR-100, and ImageNet-100 datasets. The training process consists of two parts: pre-training on ImageNet21K data, followed by 20 epochs of fine-tuning.
results: When testing on CIFAR-100 and ImageNet-100, models with Sparse Regularization improve accuracy by 0.12%. Moreover, pruning models that use Sparse Regularization yields even better average accuracy: gains of 0.568% on CIFAR-10, 1.764% on CIFAR-100, and 0.256% on ImageNet-100 compared to pruning without Sparse Regularization.
Abstract
The Vision Transformer architecture is a deep learning model inspired by the success of the Transformer model in Natural Language Processing. However, the self-attention mechanism, large number of parameters, and the requirement for a substantial amount of training data still make Vision Transformers computationally burdensome. In this research, we investigate the possibility of applying Sparse Regularization to Vision Transformers and the impact of Pruning, either after Sparse Regularization or without it, on the trade-off between performance and efficiency. To accomplish this, we apply Sparse Regularization and Pruning methods to the Vision Transformer architecture for image classification tasks on the CIFAR-10, CIFAR-100, and ImageNet-100 datasets. The training process for the Vision Transformer model consists of two parts: pre-training and fine-tuning. Pre-training utilizes ImageNet21K data, followed by fine-tuning for 20 epochs. The results show that when testing with CIFAR-100 and ImageNet-100 data, models with Sparse Regularization can increase accuracy by 0.12%. Furthermore, applying pruning to models with Sparse Regularization yields even better results. Specifically, it increases the average accuracy by 0.568% on CIFAR-10 data, 1.764% on CIFAR-100, and 0.256% on ImageNet-100 data compared to pruning models without Sparse Regularization. Code can be accessed here: https://github.com/yogiprsty/Sparse-ViT
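As a starting point, the following sketch combines an L1 sparsity penalty on the training loss with magnitude-based pruning via `torch.nn.utils.prune`, on a toy model. It illustrates the general Sparse Regularization-then-Prune recipe, not the authors' exact ViT training pipeline; see their repository for that.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(64, 128), nn.GELU(), nn.Linear(128, 10))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
l1_lambda = 1e-4   # illustrative regularization strength

x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))
for _ in range(100):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    # Sparse regularization: an L1 penalty pushes parameters toward zero.
    loss = loss + l1_lambda * sum(p.abs().sum() for p in model.parameters())
    loss.backward()
    optimizer.step()

# Pruning: remove the 30% smallest-magnitude weights in each linear layer.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")   # make the pruning mask permanent
```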
Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels?
results: The study finds that CLIP's prompt tuning process is highly robust to label noise, mainly because the fixed classname tokens provide strong regularization and because CLIP's powerful pre-trained image-text embedding, learned from diverse web data, supplies strong prior knowledge for image classification.
Abstract
Vision-language models such as CLIP learn a generic text-image embedding from large-scale training data. A vision-language model can be adapted to a new classification task through few-shot prompt tuning. We find that such a prompt tuning process is highly robust to label noises. This motivates us to study the key reasons contributing to the robustness of the prompt tuning paradigm. We conducted extensive experiments to explore this property and find the key factors are: 1) the fixed classname tokens provide a strong regularization to the optimization of the model, reducing gradients induced by the noisy samples; 2) the powerful pre-trained image-text embedding that is learned from diverse and generic web data provides strong prior knowledge for image classification. Further, we demonstrate that noisy zero-shot predictions from CLIP can be used to tune its own prompt, significantly enhancing prediction accuracy in the unsupervised setting. The code is available at https://github.com/CEWu/PTNL.
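The prompt-tuning setup the paper studies can be sketched in the CoOp style: only shared context vectors are optimized while the classname token embeddings stay frozen, which is precisely the regularization the abstract credits. In the sketch below the text encoder and all embeddings are simple stand-ins for CLIP's real components.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptTuner(nn.Module):
    """CoOp-style prompt tuning sketch: only the shared context vectors
    are learned; the classname token embeddings stay fixed."""
    def __init__(self, classname_emb, n_ctx=4, dim=512):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)  # learnable
        self.register_buffer("classname_emb", classname_emb)      # frozen

    def forward(self, image_features, text_encoder):
        n_cls = self.classname_emb.size(0)
        ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)
        prompts = torch.cat([ctx, self.classname_emb.unsqueeze(1)], dim=1)
        text_features = text_encoder(prompts)          # (n_cls, dim)
        image_features = F.normalize(image_features, dim=-1)
        text_features = F.normalize(text_features, dim=-1)
        return 100.0 * image_features @ text_features.T   # class logits

# Stand-ins: a mean-pooling "text encoder" and random class embeddings.
text_encoder = lambda prompts: prompts.mean(dim=1)
tuner = PromptTuner(classname_emb=torch.randn(10, 512))
logits = tuner(torch.randn(8, 512), text_encoder)
print(logits.shape)   # (8, 10)
```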
Multi-representations Space Separation based Graph-level Anomaly-aware Detection
results: We extensively evaluated our approach against baseline methods and obtained significant improvements.
Abstract
Graph structure patterns are widely used to model data from different areas. How to detect anomalous graph information in such graph data has become a popular research problem. This research centers on a particular issue: how to detect abnormal graphs within a graph set. Previous works have observed that abnormal graphs mainly show node-level and graph-level anomalies, but these methods treat the two anomaly forms equally when evaluating abnormal graphs, which is contrary to the fact that different types of abnormal graph data exhibit node-level and graph-level anomalies to different degrees. Furthermore, abnormal graphs that have subtle differences from normal graphs easily escape detection by existing methods. Thus, we propose a multi-representations space separation based graph-level anomaly-aware detection framework in this paper. To consider the different importance of node-level and graph-level anomalies, we design an anomaly-aware module to learn the specific weight between them in the abnormal graph evaluation process. In addition, we learn strictly separate normal and abnormal graph representation spaces by weighing four types of graph representations against each other: anchor normal graphs, anchor abnormal graphs, training normal graphs, and training abnormal graphs. Based on the distance error between the graph representations of the test graph and both the normal and abnormal graph representation spaces, we can accurately determine whether the test graph is anomalous. Our approach has been extensively evaluated against baseline methods using ten public graph datasets, and the results demonstrate its effectiveness.
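The final decision rule, the distance error between the test graph's representation and the two learned spaces, can be illustrated as below. Here each space is summarized by a set of anchor embeddings and a nearest-anchor distance, a simplification of the paper's four weighted representation types.

```python
import numpy as np

def anomaly_score(test_emb, normal_anchors, abnormal_anchors):
    """Score a test graph by its distance error against the two learned
    representation spaces; each space is summarized by anchor embeddings
    (a simplification of the four weighted representations in the paper)."""
    d_normal = np.linalg.norm(normal_anchors - test_emb, axis=1).min()
    d_abnormal = np.linalg.norm(abnormal_anchors - test_emb, axis=1).min()
    return d_normal - d_abnormal   # positive => closer to the abnormal space

rng = np.random.default_rng(1)
normal_anchors = rng.normal(0.0, 1.0, size=(50, 16))
abnormal_anchors = rng.normal(3.0, 1.0, size=(50, 16))
test_emb = rng.normal(3.0, 1.0, size=16)          # behaves abnormally
print(anomaly_score(test_emb, normal_anchors, abnormal_anchors) > 0)
```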
Pyrus Base: An Open Source Python Framework for the RoboCup 2D Soccer Simulation
For: The paper introduces Pyrus, a Python base code for the RoboCup Soccer Simulation 2D (SS2D) league, to provide a more accessible and efficient platform for researchers to develop their ideas and integrate machine learning algorithms into their teams.
Methods: Using the C++ base codes as a foundation, the authors develop Pyrus, a Python base code, to overcome the challenges of C++ base codes and provide a more user-friendly platform for researchers.
Results: Pyrus is introduced as a powerful baseline for developing machine learning concepts in SS2D; it is open source and publicly available under the MIT License on GitHub, encouraging researchers to efficiently develop their ideas and integrate machine learning algorithms into their teams.
Abstract
Soccer, also known as football in some parts of the world, involves two teams of eleven players whose objective is to score more goals than the opposing team. To simulate this game and attract scientists from all over the world to conduct research and participate in an annual computer-based soccer world cup, Soccer Simulation 2D (SS2D) was one of the leagues initiated in the RoboCup competition. In every SS2D game, two teams of 11 players and one coach connect to the RoboCup Soccer Simulation Server and compete against each other. Over the past few years, several C++ base codes have been employed to control agents' behavior and their communication with the server. Although C++ base codes have laid the foundation for the SS2D, developing them requires an advanced level of C++ programming. C++ language complexity is a limiting disadvantage of C++ base codes for all users, especially for beginners. To conquer the challenges of C++ base codes and provide a powerful baseline for developing machine learning concepts, we introduce Pyrus, the first Python base code for SS2D. Pyrus is developed to encourage researchers to efficiently develop their ideas and integrate machine learning algorithms into their teams. Pyrus base is open-source code, and it is publicly available under MIT License on GitHub
On-Robot Bayesian Reinforcement Learning for POMDPs
results: The paper achieves near-optimal performance in two human-robot interaction tasks after only a handful of real-world episodes. A video demonstration is available at https://youtu.be/H9xp60ngOes.
Abstract
Robot learning is often difficult due to the expense of gathering data. The need for large amounts of data can, and should, be tackled with effective algorithms and leveraging expert information on robot dynamics. Bayesian reinforcement learning (BRL), thanks to its sample efficiency and ability to exploit prior knowledge, is uniquely positioned as such a solution method. Unfortunately, the application of BRL has been limited due to the difficulties of representing expert knowledge as well as solving the subsequent inference problem. This paper advances BRL for robotics by proposing a specialized framework for physical systems. In particular, we capture this knowledge in a factored representation, then demonstrate the posterior factorizes in a similar shape, and ultimately formalize the model in a Bayesian framework. We then introduce a sample-based online solution method, based on Monte-Carlo tree search and particle filtering, specialized to solve the resulting model. This approach can, for example, utilize typical low-level robot simulators and handle uncertainty over unknown dynamics of the environment. We empirically demonstrate its efficiency by performing on-robot learning in two human-robot interaction tasks with uncertainty about human behavior, achieving near-optimal performance after only a handful of real-world episodes. A video of learned policies is at https://youtu.be/H9xp60ngOes.
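One building block of the proposed sample-based solver is maintaining a belief over the unknown dynamics with particle filtering. The sketch below shows a single propagate-reweight-resample step on a toy 1-D system; the transition and observation models are hypothetical placeholders for a low-level robot simulator.

```python
import numpy as np

def particle_filter_step(particles, weights, action, observation,
                         transition, likelihood, rng):
    """One belief update: propagate each particle through the (sampled)
    dynamics, reweight by observation likelihood, then resample."""
    particles = np.array([transition(p, action, rng) for p in particles])
    weights = weights * np.array([likelihood(observation, p) for p in particles])
    weights /= weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Toy 1-D example with hypothetical dynamics and Gaussian observations.
rng = np.random.default_rng(0)
transition = lambda s, a, rng: s + a + rng.normal(0, 0.1)
likelihood = lambda o, s: np.exp(-0.5 * (o - s) ** 2)
particles = rng.normal(0, 1, size=200)
weights = np.full(200, 1.0 / 200)
particles, weights = particle_filter_step(
    particles, weights, action=1.0, observation=1.2,
    transition=transition, likelihood=likelihood, rng=rng)
print(particles.mean())   # belief concentrates near the observation
```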
Pathology-and-genomics Multimodal Transformer for Survival Outcome Prediction
methods: The study proposes a multimodal transformer (PathOmics) that uses unsupervised pretraining to capture the intrinsic interactions of the tissue microenvironment and fuses this information with a wide range of genomics data (e.g., mRNA sequence, copy number variant, and methylation).
results: On the TCGA colon and rectum cancer cohorts, the proposed method is competitive with and outperforms state-of-the-art studies. It can also use a limited number of fine-tuned samples for data-efficient survival outcome prediction.
Abstract
Survival outcome assessment is challenging and inherently associated with multiple clinical factors (e.g., imaging and genomics biomarkers) in cancer. Enabling multimodal analytics promises to reveal novel predictive patterns of patient outcomes. In this study, we propose a multimodal transformer (PathOmics) integrating pathology and genomics insights into colon-related cancer survival prediction. We emphasize the unsupervised pretraining to capture the intrinsic interaction between tissue microenvironments in gigapixel whole slide images (WSIs) and a wide range of genomics data (e.g., mRNA-sequence, copy number variant, and methylation). After the multimodal knowledge aggregation in pretraining, our task-specific model finetuning could expand the scope of data utility applicable to both multi- and single-modal data (e.g., image- or genomics-only). We evaluate our approach on both TCGA colon and rectum cancer cohorts, showing that the proposed approach is competitive and outperforms state-of-the-art studies. Finally, our approach is desirable to utilize the limited number of finetuned samples towards data-efficient analytics for survival outcome prediction. The code is available at https://github.com/Cassie07/PathOmics.
HIQL: Offline Goal-Conditioned RL with Latent States as Actions
results: The method can solve long-horizon tasks and scales to high-dimensional image observations. Code is available at https://seohong.me/projects/hiql/.
Abstract
Unsupervised pre-training has recently become the bedrock for computer vision and natural language processing. In reinforcement learning (RL), goal-conditioned RL can potentially provide an analogous self-supervised approach for making use of large quantities of unlabeled (reward-free) data. However, building effective algorithms for goal-conditioned RL that can learn directly from diverse offline data is challenging, because it is hard to accurately estimate the exact value function for faraway goals. Nonetheless, goal-reaching problems exhibit structure, such that reaching distant goals entails first passing through closer subgoals. This structure can be very useful, as assessing the quality of actions for nearby goals is typically easier than for more distant goals. Based on this idea, we propose a hierarchical algorithm for goal-conditioned RL from offline data. Using one action-free value function, we learn two policies that allow us to exploit this structure: a high-level policy that treats states as actions and predicts (a latent representation of) a subgoal and a low-level policy that predicts the action for reaching this subgoal. Through analysis and didactic examples, we show how this hierarchical decomposition makes our method robust to noise in the estimated value function. We then apply our method to offline goal-reaching benchmarks, showing that our method can solve long-horizon tasks that stymie prior methods, can scale to high-dimensional image observations, and can readily make use of action-free data. Our code is available at https://seohong.me/projects/hiql/
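The hierarchical decomposition can be sketched structurally as two small networks: a high-level policy that maps (state, goal) to a subgoal latent, and a low-level policy that maps (state, subgoal latent) to an action. The sketch below shows only this wiring; the action-free value function and the training losses that extract both policies from it are omitted, and all dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class HighLevelPolicy(nn.Module):
    """pi_hi(z | s, g): predicts a latent subgoal representation."""
    def __init__(self, state_dim=32, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * state_dim, 256), nn.ReLU(),
                                 nn.Linear(256, latent_dim))
    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))

class LowLevelPolicy(nn.Module):
    """pi_lo(a | s, z): predicts the action for reaching the subgoal."""
    def __init__(self, state_dim=32, latent_dim=16, action_dim=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + latent_dim, 256),
                                 nn.ReLU(), nn.Linear(256, action_dim))
    def forward(self, state, subgoal_latent):
        return self.net(torch.cat([state, subgoal_latent], dim=-1))

pi_hi, pi_lo = HighLevelPolicy(), LowLevelPolicy()
state, goal = torch.randn(4, 32), torch.randn(4, 32)
action = pi_lo(state, pi_hi(state, goal))   # act toward a nearby subgoal
print(action.shape)   # (4, 8)
```

The point of the split is that each policy only needs value differences over nearby states or subgoals, where the estimated value function is less noisy than for faraway goals.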
Selective Perception: Optimizing State Descriptions with Reinforcement Learning for Language Model Actors
results: Experiments show that BLINDER improves task success rate, reduces input size and compute costs, and generalizes between LLM actors.
Abstract
Large language models (LLMs) are being applied as actors for sequential decision making tasks in domains such as robotics and games, utilizing their general world knowledge and planning abilities. However, previous work does little to explore what environment state information is provided to LLM actors via language. Exhaustively describing high-dimensional states can impair performance and raise inference costs for LLM actors. Previous LLM actors avoid the issue by relying on hand-engineered, task-specific protocols to determine which features to communicate about a state and which to leave out. In this work, we propose Brief Language INputs for DEcision-making Responses (BLINDER), a method for automatically selecting concise state descriptions by learning a value function for task-conditioned state descriptions. We evaluate BLINDER on the challenging video game NetHack and a robotic manipulation task. Our method improves task success rate, reduces input size and compute costs, and generalizes between LLM actors.
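The selection loop implied by the abstract can be sketched as greedy construction of a description under a learned, task-conditioned value function. In the sketch below the value function is a hypothetical stand-in; in BLINDER it would be learned from task performance.

```python
def select_description(features, value_fn, task, budget=3):
    """Greedily grow a state description: at each step, add the feature
    whose inclusion most increases the learned value estimate for this
    task, stopping at the budget or when nothing helps."""
    chosen = []
    while len(chosen) < budget:
        scored = [(value_fn(task, chosen + [f]), f)
                  for f in features if f not in chosen]
        best_score, best_feature = max(scored)
        if chosen and best_score <= value_fn(task, chosen):
            break                       # no remaining feature improves value
        chosen.append(best_feature)
    return chosen

# Hypothetical value function: rewards task-relevant, concise descriptions.
relevant = {"agent position", "monster nearby", "door locked"}
value_fn = lambda task, desc: sum(f in relevant for f in desc) - 0.1 * len(desc)

features = ["agent position", "wall texture", "monster nearby",
            "door locked", "floor tile count"]
print(select_description(features, value_fn, task="escape the dungeon"))
```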
Bibliometric Analysis of Publisher and Journal Instructions to Authors on Generative-AI in Academic and Scientific Publishing
paper_authors: Conner Ganjavi, Michael B. Eppler, Asli Pekcan, Brett Biedermann, Andre Abreu, Gary S. Collins, Inderbir S. Gill, Giovanni E. Cacciamani
for: The paper aims to determine the extent and content of guidance for authors regarding the use of generative-AI (GAI), Generative Pretrained model (GPT), and Large Language Model (LLM) powered tools among the top 100 academic publishers and journals in science.
methods: The study screened the websites of the top 100 publishers and journals from May 19th to May 20th, 2023, to identify guidance on the use of GAI.
results: The study found that 17% of the largest 100 publishers and 70% of the top 100 journals provided guidance on the use of GAI. Most publishers and journals prohibited the inclusion of GAI as an author, but there was variability in how to disclose the use of GAI and in the allowable uses of GAI. Some top publishers and journals lacked guidance on the use of GAI by authors, and there was a need for standardized guidelines to protect the integrity of scientific output.
Abstract
We aim to determine the extent and content of guidance for authors regarding the use of generative-AI (GAI), Generative Pretrained models (GPTs) and Large Language Models (LLMs) powered tools among the top 100 academic publishers and journals in science. The websites of these publishers and journals were screened between the 19th and 20th of May 2023. Among the largest 100 publishers, 17% provided guidance on the use of GAI, of which 12 (70.6%) were among the top 25 publishers. Among the top 100 journals, 70% have provided guidance on GAI. Of those with guidance, 94.1% of publishers and 95.7% of journals prohibited the inclusion of GAI as an author. Four journals (5.7%) explicitly prohibit the use of GAI in the generation of a manuscript, while 3 (17.6%) publishers and 15 (21.4%) journals indicated their guidance exclusively applies to the writing process. When disclosing the use of GAI, 42.8% of publishers and 44.3% of journals included specific disclosure criteria. There was variability in guidance of where to disclose the use of GAI, including in the methods, acknowledgments, cover letter, or a new section. There was also variability in how to access GAI guidance and the linking of journal and publisher instructions to authors. There is a lack of guidance by some top publishers and journals on the use of GAI by authors. Among those publishers and journals that provide guidance, there is substantial heterogeneity in the allowable uses of GAI and in how it should be disclosed, with this heterogeneity persisting among affiliated publishers and journals in some instances. The lack of standardization burdens authors and threatens to limit the effectiveness of these regulations. There is a need for standardized guidelines in order to protect the integrity of scientific output as GAI continues to grow in popularity.
Hindsight-DICE: Stable Credit Assignment for Deep Reinforcement Learning
results: Our method enables stable, efficient learning across a broad range of environments and alleviates the credit assignment difficulties that hamper reinforcement learning agents.
Abstract
Oftentimes, environments for sequential decision-making problems can be quite sparse in the provision of evaluative feedback to guide reinforcement-learning agents. In the extreme case, long trajectories of behavior are merely punctuated with a single terminal feedback signal, leading to a significant temporal delay between the observation of a non-trivial reward and the individual steps of behavior culpable for achieving said reward. Coping with such a credit assignment challenge is one of the hallmark characteristics of reinforcement learning. While prior work has introduced the concept of hindsight policies to develop a theoretically motivated method for reweighting on-policy data by impact on achieving the observed trajectory return, we show that these methods experience instabilities which lead to inefficient learning in complex environments. In this work, we adapt existing importance-sampling ratio estimation techniques for off-policy evaluation to drastically improve the stability and efficiency of these so-called hindsight policy methods. Our hindsight distribution correction facilitates stable, efficient learning across a broad range of environments where credit assignment plagues baseline methods.
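A simplified view of the stabilized reweighting: hindsight ratios compare how likely each action was given the achieved outcome versus under the policy alone, and clipping plus self-normalization keeps the resulting weights well-behaved. This is a deliberately reduced sketch — the paper estimates these ratios with DICE-style off-policy techniques rather than taking them as given — and all probabilities below are stand-ins.

```python
import numpy as np

def hindsight_weights(p_action_given_outcome, p_action, clip=10.0):
    """Per-step credit weights: how much more likely was this action
    under the achieved outcome than under the policy alone? Clipping
    and self-normalization tame the variance that destabilizes naive
    hindsight reweighting."""
    ratios = np.clip(p_action_given_outcome / p_action, 0.0, clip)
    return ratios / ratios.mean()   # self-normalized estimator

# Stand-in probabilities for a 5-step trajectory: a learned hindsight
# model would supply p(a | s, outcome); the policy supplies p(a | s).
p_hindsight = np.array([0.9, 0.1, 0.6, 0.05, 0.8])
p_policy = np.array([0.5, 0.2, 0.5, 0.30, 0.4])
weights = hindsight_weights(p_hindsight, p_policy)
terminal_return = 1.0
per_step_credit = weights * terminal_return
print(per_step_credit)   # steps 1, 3 and 5 receive most of the credit
```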
On the Vulnerability of Fairness Constrained Learning to Malicious Noise
paper_authors: Avrim Blum, Princewill Okoroafor, Aadirupa Saha, Kevin Stangl
for: This paper studies the vulnerability of fairness-constrained learning to small amounts of malicious noise.
methods: The paper uses randomized classifiers to mitigate the impact of malicious noise.
results: The study finds that when randomized classifiers are allowed, fairness-constrained learning is quite resilient to small amounts of malicious noise. For Demographic Parity, the accuracy loss is only $\Theta(\alpha)$, where $\alpha$ is the malicious noise rate, matching the best possible even without fairness constraints. For Equal Opportunity, the loss is $O(\sqrt{\alpha})$, with a matching $\Omega(\sqrt{\alpha})$ lower bound. Compared with Konstantinov and Lampert (2021), these results show that randomized classifiers make fairness-constrained learning far more resilient to malicious noise. The paper also considers additional fairness notions, including Equalized Odds and Calibration, for which the excess accuracy loss clusters into three natural regimes: $O(\alpha)$, $O(\sqrt{\alpha})$, and $O(1)$. These results provide a more fine-grained view of the sensitivity of fairness-constrained learning to adversarial noise in training data.
Abstract
We consider the vulnerability of fairness-constrained learning to small amounts of malicious noise in the training data. Konstantinov and Lampert (2021) initiated the study of this question and presented negative results showing there exist data distributions where for several fairness constraints, any proper learner will exhibit high vulnerability when group sizes are imbalanced. Here, we present a more optimistic view, showing that if we allow randomized classifiers, then the landscape is much more nuanced. For example, for Demographic Parity we show we can incur only a $\Theta(\alpha)$ loss in accuracy, where $\alpha$ is the malicious noise rate, matching the best possible even without fairness constraints. For Equal Opportunity, we show we can incur an $O(\sqrt{\alpha})$ loss, and give a matching $\Omega(\sqrt{\alpha})$lower bound. In contrast, Konstantinov and Lampert (2021) showed for proper learners the loss in accuracy for both notions is $\Omega(1)$. The key technical novelty of our work is how randomization can bypass simple "tricks" an adversary can use to amplify his power. We also consider additional fairness notions including Equalized Odds and Calibration. For these fairness notions, the excess accuracy clusters into three natural regimes $O(\alpha)$,$O(\sqrt{\alpha})$ and $O(1)$. These results provide a more fine-grained view of the sensitivity of fairness-constrained learning to adversarial noise in training data.
Multimodal Document Analytics for Banking Process Automation
paper_authors: Christopher Gerling, Stefan Lessmann
For: This paper aims to understand the potential of advanced document analytics, specifically using multimodal models, in banking processes to improve operational and process efficiency.
Methods: The paper provides a comprehensive analysis of the diverse banking document landscape, highlighting opportunities for efficiency gains through automation and advanced analytics techniques in the customer business. The study also employs natural language processing (NLP) techniques, including LayoutXLM, a cross-lingual, multimodal, pre-trained model, to analyze diverse documents in the banking sector.
Results: The study achieves an overall F1 score of around 80% on German company register extracts, demonstrating the efficiency of LayoutXLM. Additionally, the study finds that over 75% F1 score can be achieved with only 30% of the training data, highlighting the benefits of integrating image information and the real-world applicability of multimodal models within banking.
Abstract
In response to growing FinTech competition and the need for improved operational efficiency, this research focuses on understanding the potential of advanced document analytics, particularly using multimodal models, in banking processes. We perform a comprehensive analysis of the diverse banking document landscape, highlighting the opportunities for efficiency gains through automation and advanced analytics techniques in the customer business. Building on the rapidly evolving field of natural language processing (NLP), we illustrate the potential of models such as LayoutXLM, a cross-lingual, multimodal, pre-trained model, for analyzing diverse documents in the banking sector. This model performs text token classification on German company register extracts with an overall F1 score of around 80%. Our empirical evidence confirms the critical role of layout information in improving model performance and further underscores the benefits of integrating image information. Interestingly, our study shows that over 75% F1 score can be achieved with only 30% of the training data, demonstrating the efficiency of LayoutXLM. Through addressing state-of-the-art document analysis frameworks, our study aims to enhance process efficiency and demonstrate the real-world applicability and benefits of multimodal models within banking.
eXplainable Artificial Intelligence (XAI) in age prediction: A systematic review
results: Reviewing work across multiple body systems, the paper finds that XAI can help improve both the accuracy and the interpretability of age prediction.
Abstract
eXplainable Artificial Intelligence (XAI) is now an important and essential part of machine learning, making it possible to explain the predictions of complex models. XAI is especially required in risky applications, particularly in health care, where human lives depend on the decisions of AI systems. One area of medical research is age prediction and the identification of biomarkers of aging and age-related diseases. However, the role of XAI in the age prediction task has not previously been explored directly. In this review, we discuss the application of XAI approaches to age prediction tasks. We give a systematic review of the works organized by body systems, and discuss the benefits of XAI in medical applications and, in particular, in the age prediction domain.
HybridAugment++: Unified Frequency Spectra Perturbations for Model Robustness
results: HybridAugment and HybridAugment++ match or exceed the state of the art in clean accuracy on CIFAR-10/100 and ImageNet and on the ImageNet-C, CIFAR-10-C, and CIFAR-100-C corruption benchmarks, and achieve competitive adversarial robustness on CIFAR-10 and out-of-distribution detection on various datasets.
Abstract
Convolutional Neural Networks (CNN) are known to exhibit poor generalization performance under distribution shifts. Their generalization has been studied extensively, and one line of work approaches the problem from a frequency-centric perspective. These studies highlight the fact that humans and CNNs might focus on different frequency components of an image. First, inspired by these observations, we propose a simple yet effective data augmentation method HybridAugment that reduces the reliance of CNNs on high-frequency components, and thus improves their robustness while keeping their clean accuracy high. Second, we propose HybridAugment++, which is a hierarchical augmentation method that attempts to unify various frequency-spectrum augmentations. HybridAugment++ builds on HybridAugment, and also reduces the reliance of CNNs on the amplitude component of images, and promotes phase information instead. This unification results in competitive to or better than state-of-the-art results on clean accuracy (CIFAR-10/100 and ImageNet), corruption benchmarks (ImageNet-C, CIFAR-10-C and CIFAR-100-C), adversarial robustness on CIFAR-10 and out-of-distribution detection on various datasets. HybridAugment and HybridAugment++ are implemented in a few lines of code, do not require extra data, ensemble models or additional networks.
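The frequency-domain manipulation underlying HybridAugment++ can be illustrated with a minimal amplitude/phase recombination in NumPy: keep one image's phase, borrow another's amplitude. This shows the general idea only; the paper's method applies such perturbations hierarchically and combines them with its high/low-frequency augmentation.

```python
import numpy as np

def swap_amplitude(image, donor):
    """Recombine the phase of `image` with the amplitude of `donor` in
    the Fourier domain; training on such hybrids de-emphasizes the
    amplitude component and promotes phase information."""
    spec_img = np.fft.fft2(image, axes=(0, 1))
    spec_don = np.fft.fft2(donor, axes=(0, 1))
    hybrid = np.abs(spec_don) * np.exp(1j * np.angle(spec_img))
    out = np.real(np.fft.ifft2(hybrid, axes=(0, 1)))
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))   # stand-ins for normalized images
donor = rng.random((32, 32, 3))
augmented = swap_amplitude(image, donor)
print(augmented.shape)
```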
Mitigating Communications Threats in Decentralized Federated Learning through Moving Target Defense
paper_authors: Enrique Tomás Martínez Beltrán, Pedro Miguel Sánchez Sánchez, Sergio López Bernal, Gérôme Bovet, Manuel Gil Pérez, Gregorio Martínez Pérez, Alberto Huertas Celdrán
results: In experiments with the MNIST dataset and eclipse attacks, the security module achieved an average F1 score of 95%, with moderate increases in CPU usage (up to 63.2% +-3.5%) and network traffic (up to 230 MB +-15 MB).
Abstract
The rise of Decentralized Federated Learning (DFL) has enabled the training of machine learning models across federated participants, fostering decentralized model aggregation and reducing dependence on a server. However, this approach introduces unique communication security challenges that have yet to be thoroughly addressed in the literature. These challenges primarily originate from the decentralized nature of the aggregation process, the varied roles and responsibilities of the participants, and the absence of a central authority to oversee and mitigate threats. Addressing these challenges, this paper first delineates a comprehensive threat model, highlighting the potential risks of DFL communications. In response to these identified risks, this work introduces a security module designed for DFL platforms to counter communication-based attacks. The module combines security techniques such as symmetric and asymmetric encryption with Moving Target Defense (MTD) techniques, including random neighbor selection and IP/port switching. The security module is implemented in a DFL platform called Fedstellar, allowing the deployment and monitoring of the federation. A DFL scenario has been deployed, involving eight physical devices implementing three security configurations: (i) a baseline with no security, (ii) an encrypted configuration, and (iii) a configuration integrating both encryption and MTD techniques. The effectiveness of the security module is validated through experiments with the MNIST dataset and eclipse attacks. The results indicated an average F1 score of 95%, with moderate increases in CPU usage (up to 63.2% +-3.5%) and network traffic (230 MB +-15 MB) under the most secure configuration, mitigating the risks posed by eavesdropping or eclipse attacks.
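Two ingredients of the module can be sketched in a few lines: symmetric encryption of a serialized model update (here with Fernet from the `cryptography` package, one possible choice) and MTD-style random neighbor selection with port switching. This is a minimal illustration, not the Fedstellar implementation, which also uses asymmetric encryption and real networking.

```python
import random
from cryptography.fernet import Fernet   # pip install cryptography

# Symmetric encryption of a serialized model update.
key = Fernet.generate_key()               # exchanged out-of-band in practice
cipher = Fernet(key)
update = b"serialized model weights"      # stand-in payload
ciphertext = cipher.encrypt(update)
assert cipher.decrypt(ciphertext) == update

# Moving Target Defense: each round, gossip with a fresh random subset of
# neighbors on a rotated port, shrinking the window for eavesdropping or
# eclipse attacks that rely on a static communication topology.
peers = [f"10.0.0.{i}" for i in range(2, 10)]

def mtd_round(peers, k=3, port_range=(49152, 65535)):
    neighbors = random.sample(peers, k)    # random neighbor selection
    port = random.randint(*port_range)     # IP/port switching
    return neighbors, port

for rnd in range(3):
    neighbors, port = mtd_round(peers)
    print(f"round {rnd}: gossip with {neighbors} on port {port}")
```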
Benchmark datasets for biomedical knowledge graphs with negative statements
paper_authors: Rita T. Sousa, Sara Silva, Catia Pesquita
for: addresses the lack of benchmark datasets for knowledge graphs with negative statements, especially in the biomedical domain.
methods: two popular path-based methods are used to generate knowledge graph embeddings for each dataset.
results: negative statements can improve the performance of knowledge graph embeddings in relation prediction tasks, such as protein-protein interaction prediction, gene-disease association prediction, and disease prediction.
Abstract
Knowledge graphs represent facts about real-world entities. Most of these facts are defined as positive statements. The negative statements are scarce but highly relevant under the open-world assumption. Furthermore, they have been demonstrated to improve the performance of several applications, namely in the biomedical domain. However, no benchmark dataset supports the evaluation of the methods that consider these negative statements. We present a collection of datasets for three relation prediction tasks - protein-protein interaction prediction, gene-disease association prediction and disease prediction - that aim at circumventing the difficulties in building benchmarks for knowledge graphs with negative statements. These datasets include data from two successful biomedical ontologies, Gene Ontology and Human Phenotype Ontology, enriched with negative statements. We also generate knowledge graph embeddings for each dataset with two popular path-based methods and evaluate the performance in each task. The results show that the negative statements can improve the performance of knowledge graph embeddings.
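One way explicitly asserted negative statements can enter embedding training is as guaranteed-false triples in a margin ranking loss, in place of randomly corrupted negatives. The sketch below shows this with a TransE-style scorer; it illustrates the general idea, not the two path-based methods the paper actually evaluates, and all triples are hypothetical.

```python
import torch
import torch.nn as nn

n_entities, n_relations, dim = 100, 5, 32
ent = nn.Embedding(n_entities, dim)
rel = nn.Embedding(n_relations, dim)
opt = torch.optim.Adam(list(ent.parameters()) + list(rel.parameters()), lr=1e-2)

def score(h, r, t):
    """TransE distance: small for true triples."""
    return (ent(h) + rel(r) - ent(t)).norm(dim=-1)

# Positive statements and explicitly asserted NEGATIVE statements
# (e.g., "protein A does NOT interact with protein B"), as index triples.
pos = torch.tensor([[0, 1, 2], [3, 1, 4]])
neg = torch.tensor([[0, 1, 5], [3, 1, 6]])   # hypothetical asserted negatives

margin = 1.0
for _ in range(200):
    opt.zero_grad()
    # Asserted negatives are trustworthy, unlike random corruptions.
    loss = torch.relu(margin + score(*pos.T) - score(*neg.T)).mean()
    loss.backward()
    opt.step()
print(loss.item())
```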
Statement-based Memory for Neural Source Code Summarization
For: The paper is written for programmers who want to quickly understand the behavior of source code without having to read the code itself. It aims to provide natural language descriptions of code behavior.
Methods: The paper proposes a statement-based memory encoder that learns the important elements of flow during training, allowing for a statement-based subroutine representation without the need for dynamic analysis.
Results: The paper demonstrates a significant improvement over the state-of-the-art in code summarization using the proposed statement-based memory encoder.
Abstract
Source code summarization is the task of writing natural language descriptions of source code behavior. Code summarization underpins software documentation for programmers. Short descriptions of code help programmers understand the program quickly without having to read the code itself. Lately, neural source code summarization has emerged as the frontier of research into automated code summarization techniques. By far the most popular targets for summarization are program subroutines. The idea, in a nutshell, is to train an encoder-decoder neural architecture using large sets of examples of subroutines extracted from code repositories. The encoder represents the code and the decoder represents the summary. However, most current approaches attempt to treat the subroutine as a single unit. For example, by taking the entire subroutine as input to a Transformer or RNN-based encoder. But code behavior tends to depend on the flow from statement to statement. Normally dynamic analysis may shed light on this flow, but dynamic analysis on hundreds of thousands of examples in large datasets is not practical. In this paper, we present a statement-based memory encoder that learns the important elements of flow during training, leading to a statement-based subroutine representation without the need for dynamic analysis. We implement our encoder for code summarization and demonstrate a significant improvement over the state-of-the-art.
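The two-level structure the paper argues for can be sketched as a token-level encoder per statement followed by a statement-level recurrent "memory" over the resulting vectors, so that statement-to-statement flow is represented without dynamic analysis. The sketch below uses GRUs and hypothetical sizes; the paper's encoder and decoder details differ.

```python
import torch
import torch.nn as nn

class StatementMemoryEncoder(nn.Module):
    """Two-level encoder sketch: a token-level GRU summarizes each
    statement, and a statement-level GRU (the 'memory') models the
    flow from statement to statement across the subroutine."""
    def __init__(self, vocab=5000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.token_gru = nn.GRU(dim, dim, batch_first=True)
        self.statement_gru = nn.GRU(dim, dim, batch_first=True)

    def forward(self, statements):
        # statements: (batch, n_statements, tokens_per_statement)
        b, s, t = statements.shape
        tokens = self.embed(statements.view(b * s, t))
        _, h = self.token_gru(tokens)               # one vector per statement
        stmt_vecs = h[-1].view(b, s, -1)
        memory, _ = self.statement_gru(stmt_vecs)   # flow-aware representation
        return memory                                # feed to a summary decoder

enc = StatementMemoryEncoder()
subroutine = torch.randint(0, 5000, (2, 10, 20))    # 10 statements, 20 tokens
print(enc(subroutine).shape)                         # (2, 10, 128)
```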