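results: Proposes a maintenance plan and a prioritization scheme for maintenance sections based on historical maintenance experience and technical indicators, helping road maintenance departments make more scientifically grounded decisions under limited budgets.
Abstract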
With the rapid development of global road transportation, countries worldwide have completed the construction of their road networks. The ensuing challenge, however, lies in maintaining existing roads. Countries allocate limited budgets to road maintenance projects, and road management departments face difficulties in making scientifically informed maintenance decisions. Integrating various artificial intelligence decision-making techniques to thoroughly exploit historical maintenance data, and adapting them to scientific decision-making for road maintenance, has therefore become an urgent issue. This integration aims to provide road management departments with more scientific tools and evidence for decision-making. The framework proposed in this paper primarily addresses four issues: 1) predicting the pavement performance of various routes, 2) determining the prioritization of maintenance routes, 3) making maintenance decisions based on an evaluation of the effects of past maintenance while considering comprehensive technical and management indicators, and 4) determining the prioritization of maintenance sections based on the maintenance effectiveness and the recommended maintenance effectiveness. By tackling these four problems, the framework enables intelligent decision-making for the optimal maintenance plan and maintenance sections, taking into account limited funding and historical maintenance management experience.
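The abstract does not say how sections are ranked once a budget is fixed. As a purely hypothetical illustration of budget-constrained prioritization, a greedy heuristic can rank candidate sections by predicted effectiveness per unit cost and select them until the budget is exhausted. The `Section` fields, the numbers, and the greedy rule are assumptions for illustration; this is not the paper's decision framework.

```python
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    cost: float            # hypothetical estimated treatment cost
    effectiveness: float   # hypothetical predicted performance gain


def prioritize(sections, budget):
    """Greedy selection by effectiveness per unit cost under a budget cap (illustrative only)."""
    ranked = sorted(sections, key=lambda s: s.effectiveness / s.cost, reverse=True)
    chosen, spent = [], 0.0
    for s in ranked:
        # Take a section only if it still fits in the remaining budget.
        if spent + s.cost <= budget:
            chosen.append(s.name)
            spent += s.cost
    return chosen, spent


if __name__ == "__main__":
    candidates = [
        Section("A", cost=40.0, effectiveness=12.0),
        Section("B", cost=25.0, effectiveness=10.0),
        Section("C", cost=50.0, effectiveness=10.0),
    ]
    print(prioritize(candidates, budget=70.0))
```

A greedy ratio rule is simple and easy to audit, but it is not guaranteed to be optimal; an exact 0/1 knapsack solver is the natural alternative when budgets are tight.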
Accurate deep learning sub-grid scale models for large eddy simulations
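methods: Both families of models use physics-informed deep learning (DL) algorithms which, unlike traditional analytical modeling techniques, can produce high-order, complex nonlinear relations.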
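results: Both families of models predict the SGS stresses for different filter widths and Reynolds numbers. One uses a tensor basis neural network (TBNN) that embeds Galilean, rotational, and reflectional invariances; the other uses a simpler network that has better feature-extraction capability and outperforms the invariance-embedded model on statistical performance metrics.
Abstract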
We present two families of sub-grid scale (SGS) turbulence models developed for large-eddy simulation (LES) purposes. Their development required the formulation of physics-informed, robust, and efficient Deep Learning (DL) algorithms which, unlike state-of-the-art analytical modeling techniques, can produce high-order complex non-linear relations between inputs and outputs. Explicit filtering of data from direct simulations of the canonical channel flow at two friction Reynolds numbers $Re_\tau\approx 395$ and 590 provided accurate data for training and testing. The two sets of models use different network architectures. One of the architectures uses tensor basis neural networks (TBNN) and embeds the simplified analytical model form of the general effective-viscosity hypothesis, thus incorporating the Galilean, rotational and reflectional invariances. The other architecture is that of a relatively simple network that is able to incorporate the Galilean invariance only. However, this simpler architecture has better feature extraction capacity owing to its ability to establish relations between and extract information from cross-components of the integrity basis tensors and the SGS stresses. Both sets of models are used to predict the SGS stresses for feature datasets generated with different filter widths and at different Reynolds numbers. It is shown that, due to its better feature learning capabilities, the simpler model outperforms the invariance-embedded model in statistical performance metrics. In a priori tests, both sets of models provide similar levels of dissipation and backscatter. Based on the test results, both sets of models should be usable in a posteriori actual LESs.
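The tensor-basis idea behind the TBNN family can be sketched as follows: a network predicts scalar coefficients $g_n$, and the output is the linear combination $\sum_n g_n T^{(n)}$ of integrity basis tensors built from the resolved velocity gradient. This sketch assumes Pope's basis and keeps only its first two tensors, $T^{(1)} = S$ and $T^{(2)} = SR - RS$, with hard-coded coefficients standing in for network outputs; no network is trained, and the paper's simplified SGS model form may differ.

```python
from typing import List

Matrix = List[List[float]]  # 3x3 tensor stored row-major


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def strain_and_rotation(grad: Matrix):
    """Split grad[i][j] = du_i/dx_j into symmetric S and antisymmetric R parts."""
    s = [[0.5 * (grad[i][j] + grad[j][i]) for j in range(3)] for i in range(3)]
    r = [[0.5 * (grad[i][j] - grad[j][i]) for j in range(3)] for i in range(3)]
    return s, r


def first_two_basis_tensors(grad: Matrix) -> List[Matrix]:
    """T1 = S and T2 = SR - RS (Pope's full basis has ten tensors)."""
    s, r = strain_and_rotation(grad)
    sr, rs = matmul(s, r), matmul(r, s)
    t2 = [[sr[i][j] - rs[i][j] for j in range(3)] for i in range(3)]
    return [s, t2]


def tensor_basis_output(coeffs: List[float], basis: List[Matrix]) -> Matrix:
    """sum_n g_n T_n; in a TBNN the coefficients g_n would be predicted by the network."""
    return [[sum(g * t[i][j] for g, t in zip(coeffs, basis)) for j in range(3)] for i in range(3)]


if __name__ == "__main__":
    shear = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # simple shear, du/dy = 1
    basis = first_two_basis_tensors(shear)
    print(tensor_basis_output([2.0, 1.0], basis))  # placeholder coefficients, not trained
```

Because every output is a combination of tensors that transform correctly under rotation and reflection, the invariances are built in by construction, which is the property the abstract attributes to the TBNN family.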
Convergence Guarantees for Stochastic Subgradient Methods in Nonsmooth Nonconvex Optimization
for: investigate the convergence properties of stochastic gradient descent (SGD) method and its variants in training neural networks with nonsmooth activation functions.
methods: develop a novel framework that assigns different timescales to stepsizes for updating momentum terms and variables, and prove the global convergence of the proposed framework in both single-timescale and two-timescale cases.
results: prove the convergence properties of several well-known SGD-type methods, including heavy-ball SGD, SignSGD, Lion, normalized SGD, and clipped SGD, and demonstrate the high efficiency of these methods through preliminary numerical experiments.
Abstract
In this paper, we investigate the convergence properties of the stochastic gradient descent (SGD) method and its variants, especially in training neural networks built from nonsmooth activation functions. We develop a novel framework that assigns different timescales to stepsizes for updating the momentum terms and variables, respectively. Under mild conditions, we prove the global convergence of our proposed framework in both single-timescale and two-timescale cases. We show that our proposed framework encompasses a wide range of well-known SGD-type methods, including heavy-ball SGD, SignSGD, Lion, normalized SGD and clipped SGD. Furthermore, when the objective function adopts a finite-sum formulation, we prove the convergence properties for these SGD-type methods based on our proposed framework. In particular, we prove that these SGD-type methods find the Clarke stationary points of the objective function with randomly chosen stepsizes and initial points under mild assumptions. Preliminary numerical experiments demonstrate the high efficiency of our analyzed SGD-type methods.
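For reference, the per-step update rules of several of the SGD-type methods named above can be sketched in pure Python. These are common textbook forms chosen as assumptions (heavy-ball written as m ← βm + g, Lion without weight decay, fixed stepsizes); they are neither the paper's unified two-timescale framework nor its randomly chosen stepsizes.

```python
import math


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


def sign(a):
    return (a > 0) - (a < 0)


def heavy_ball(x, g, m, lr=0.1, beta=0.9):
    m = [beta * mi + gi for mi, gi in zip(m, g)]          # accumulate momentum
    return [xi - lr * mi for xi, mi in zip(x, m)], m


def sign_sgd(x, g, lr=0.1):
    return [xi - lr * sign(gi) for xi, gi in zip(x, g)]   # only the sign of each component


def normalized_sgd(x, g, lr=0.1, eps=1e-12):
    n = norm(g) + eps
    return [xi - lr * gi / n for xi, gi in zip(x, g)]     # unit-length step direction


def clipped_sgd(x, g, lr=0.1, clip=1.0):
    n = norm(g)
    scale = min(1.0, clip / n) if n > 0 else 1.0          # rescale g so that ||g|| <= clip
    return [xi - lr * scale * gi for xi, gi in zip(x, g)]


def lion(x, g, m, lr=0.1, beta1=0.9, beta2=0.99):
    c = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    x = [xi - lr * sign(ci) for xi, ci in zip(x, c)]      # sign of the interpolated direction
    m = [beta2 * mi + (1 - beta2) * gi for mi, gi in zip(m, g)]
    return x, m


if __name__ == "__main__":
    # Nonsmooth toy objective f(x) = |x_1| + |x_2|; sign(x) is a valid subgradient.
    x = [1.0, -2.0]
    for _ in range(30):
        x = sign_sgd(x, [float(sign(xi)) for xi in x], lr=0.1)
    print(x)
```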
results: Improves the efficiency of solving optimisation problems.
Abstract
Obtaining Quadratic Unconstrained Binary Optimisation (QUBO) models for various optimisation problems, in order to solve them on physical quantum computers (such as the DWave annealers), is nowadays a lengthy and tedious process that requires one to remodel all problem variables as binary variables and squeeze the target function and the constraints into a single quadratic polynomial in these new variables. We report here on the basis of our automatic converter from MiniZinc to QUBO, which is able to process a large set of constraint optimisation and constraint satisfaction problems and turn them into equivalent QUBOs, effectively optimising the whole process.
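As a concrete illustration of what squeezing a constraint into a single quadratic polynomial means, minimizing $\sum_i c_i x_i$ subject to the one-hot constraint $\sum_i x_i = 1$ can be encoded with a penalty $P(\sum_i x_i - 1)^2$. Using $x_i^2 = x_i$ for binary variables, the penalty expands to $P - P\sum_i x_i + 2P\sum_{i<j} x_i x_j$. This is a hand-written sketch of a standard penalty encoding, not the output of the authors' MiniZinc converter; the costs and penalty weight are arbitrary.

```python
from itertools import product


def one_hot_qubo(costs, penalty):
    """Upper-triangular QUBO for min sum(c_i x_i) s.t. sum(x_i) == 1, plus a constant offset."""
    n = len(costs)
    q = {}
    for i in range(n):
        # Linear objective term plus the -P x_i term from expanding P * (sum x - 1)^2.
        q[(i, i)] = costs[i] - penalty
        for j in range(i + 1, n):
            q[(i, j)] = 2 * penalty  # cross terms 2P x_i x_j
    return q, penalty  # the expansion leaves a constant offset of P


def energy(q, offset, x):
    return offset + sum(v * x[i] * x[j] for (i, j), v in q.items())


def brute_force(q, offset, n):
    """Exhaustively check all 2^n assignments (only viable for tiny n)."""
    return min(product((0, 1), repeat=n), key=lambda x: energy(q, offset, x))


if __name__ == "__main__":
    q, offset = one_hot_qubo([3.0, 1.0, 2.0], penalty=10.0)
    best = brute_force(q, offset, 3)
    print(best, energy(q, offset, best))
```

For feasible assignments the energy equals the original cost, while infeasible ones pay at least $P$ extra, so a sufficiently large $P$ makes the QUBO minimum coincide with the constrained optimum.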
An Empirical Study on Fertility Proposals Using Multi-Grained Topic Analysis Methods
methods: employing co-occurrence semantic analysis, topic analysis and sentiment analysis to conduct multi-granularity semantic analysis of microblog comments
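results: Finds that discussion of the proposal to "remove marriage restrictions from birth registration" spans three dimensions (the individual, society, and the state) and breaks down into social issues such as personal behaviour, social ethics and law, and national policy, with sentiment leaning negative on most topics. Based on these findings, eight recommendations are offered as a reference for policy decisions and as a method for researching public opinion on political issues.
Abstract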
Fertility issues are closely related to population security. With China's population showing negative growth for the first time in 60 years, changes to fertility policy are of great public concern. A proposal at the 2023 "Two Sessions" suggested that the state legislate to remove marriage restrictions from birth registration. The topic became a hot topic on the Internet, and "unbundling" birth registration from marriage became the focus of social debate. In this paper, we adopt co-occurrence semantic analysis, topic analysis, and sentiment analysis to conduct a multi-granularity semantic analysis of microblog comments. We find that the discussion of the proposal to "remove marriage restrictions from birth registration" spans three dimensions (the individual, society, and the state) and breaks down into social issues such as personal behaviour, social ethics and law, and national policy, with sentiment inclined to be negative on most topics. Based on this, we make eight proposals to provide a reference for government decision-making and to form a reference method for researching public opinion on political issues.
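Document-level keyword co-occurrence, the first of the three analyses, can be sketched as counting how many comments contain each pair of terms. The whitespace tokenizer, the English keywords, and the `vocab` filter are simplifying assumptions; Chinese microblog text would need a word segmenter, and the paper's exact procedure is not described in the abstract.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs, vocab):
    """Count how many documents contain each unordered pair of vocabulary terms."""
    counts = Counter()
    for doc in docs:
        # Deduplicate within a document so each pair counts at most once per comment.
        present = sorted({tok for tok in doc.lower().split() if tok in vocab})
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    comments = ["marriage registration law", "law policy marriage", "policy debate"]
    keywords = {"marriage", "law", "policy", "registration"}
    print(cooccurrence(comments, keywords).most_common(3))
```

Pairs with high counts become edges in a co-occurrence network, which is a common starting point before topic modelling.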
RobôCIn Small Size League Extended Team Description Paper for RoboCup 2023
paper_authors: Aline Lima de Oliveira, Cauê Addae da Silva Gomes, Cecília Virginia Santos da Silva, Charles Matheus de Sousa Alves, Danilo Andrade Martins de Souza, Driele Pires Ferreira Araújo Xavier, Edgleyson Pereira da Silva, Felipe Bezerra Martins, Lucas Henrique Cavalcanti Santos, Lucas Dias Maciel, Matheus Paixão Gumercindo dos Santos, Matheus Lafayette Vasconcelos, Matheus Vinícius Teotonio do Nascimento Andrade, João Guilherme Oliveira Carvalho de Melo, João Pedro Souza Pereira de Moura, José Ronald da Silva, José Victor Silva Cruz, Pedro Henrique Santana de Morais, Pedro Paulo Salman de Oliveira, Riei Joaquim Matos Rodrigues, Roberto Costa Fernandes, Ryan Vinicius Santos Morais, Tamara Mayara Ramos Teobaldo, Washington Igor dos Santos Silva, Edna Natividade Silva Barros
for: defend the team's Small Size League (SSL) Division B title at RoboCup 2023.
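results: The team has published two SSL-related academic papers, at the 25th RoboCup International Symposium and the 19th IEEE Latin American Robotics Symposium (LARS 2022), and is continuously migrating its past codebase to the Unification architecture.
Abstract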
RobôCIn has participated in the RoboCup Small Size League since 2019, won its first world title in 2022 (Division B), and is currently a three-time Latin American champion. This paper presents our improvements to defend the Small Size League (SSL) Division B title at RoboCup 2023 in Bordeaux, France, and shares some of the academic research our team developed over the past year. Our team has published two articles related to SSL at two high-impact conferences: the 25th RoboCup International Symposium and the 19th IEEE Latin American Robotics Symposium (LARS 2022). Over the last year, we have been continuously migrating from our past codebase to Unification. We describe the new architecture implemented and some points of software and AI refactoring. In addition, we discuss the process of integrating machined components into the mechanical system, our development for participating in the vision blackout challenge last year, and what we are preparing for this year.
paper_authors: Ye Ouyang, Yaqin Zhang, Peng Wang, Yunxin Liu, Wen Qiao, Jun Zhu, Yang Liu, Feng Zhang, Shuling Wang, Xidong Wang
for: 6G BSS systems will support the efficient connection of intelligent agents and lead the digital, intelligent, and green transformation of the economy and society.
methods: The paper introduces the overall vision, potential key technologies, and functional architecture of 6G BSS systems.
results: The paper presents an evolutionary roadmap and technological prospects for the BSS systems from 5G to 6G.
Abstract
6G is the next-generation intelligent and integrated digital information infrastructure, characterized by ubiquitous interconnection, native intelligence, multi-dimensional perception, global coverage, green and low-carbon operation, and native network security. 6G will shift from serving people and people-to-things communication to supporting the efficient connection of intelligent agents, comprehensively leading the digital, intelligent, and green transformation of the economy and society. As the core support system of the mobile communication network, 6G business support systems (BSS) need to integrate with the new business models brought about by the next-generation Internet and IT, upgrading from "network-centric" to "business- and service-centric" and "customer-centric". 6G operations support systems (OSS) and BSS need tighter integration, connecting digital intelligence support capabilities on both the supply and demand sides to improve operational efficiency and customer benefits. This paper provides a detailed introduction to the overall vision, potential key technologies, and functional architecture of 6G BSS systems. It also presents an evolutionary roadmap and technological prospects for BSS systems from 5G to 6G.
TbExplain: A Text-based Explanation Method for Scene Classification Models with the Statistical Prediction Correction
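results: Qualitative and quantitative experiments with scene classification models show that TbExplain improves classification accuracy and that its text-based explanations are sufficiently reliable; when the initial prediction is unreliable, TbExplain corrects it using object statistics.
Abstract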
The field of Explainable Artificial Intelligence (XAI) aims to improve the interpretability of black-box machine learning models. Building a heatmap based on the importance values of input features is a popular method for explaining how such models produce their predictions. Heatmaps are largely understandable to humans, yet they are not without flaws. Non-expert users, for example, may not fully understand the logic of heatmaps (in which the pixels relevant to the model's prediction are highlighted with different intensities or colors). Additionally, heatmaps frequently fail to fully differentiate the objects and regions of the input image that are relevant to the model prediction. In this paper, we propose a framework called TbExplain that employs XAI techniques and a pre-trained object detector to present text-based explanations of scene classification models. Moreover, TbExplain incorporates a novel method to correct predictions and textually explain them based on the statistics of objects in the input image when the initial prediction is unreliable. To assess the trustworthiness and validity of the text-based explanations, we conducted a qualitative experiment, and the findings indicated that these explanations are sufficiently reliable. Furthermore, our quantitative and qualitative experiments on TbExplain with scene classification datasets reveal an improvement in classification accuracy over ResNet variants.
Our Model Achieves Excellent Performance on MovieLens: What Does it Mean?
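results: Finds that user interactions differ significantly across the stages of a user's engagement with the platform, that removing interactions close to a user's last few interactions makes user preferences harder to learn and degrades recommendation accuracy, and that changing the order of interactions makes it harder for sequential algorithms to capture the progressive interaction process.
Abstract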
A typical benchmark dataset for recommender system (RecSys) evaluation consists of user-item interactions generated on a platform within a time period. The interaction generation mechanism partially explains why a user interacts with (e.g., likes, purchases, rates) an item, and the context of when a particular interaction happened. In this study, we conduct a meticulous analysis of the MovieLens dataset and explain the potential impact of using the dataset to evaluate recommendation algorithms. We make a few main findings from our analysis. First, there are significant differences in user interactions at the different stages when a user interacts with the MovieLens platform. The early interactions largely define the user portrait, which affects the subsequent interactions. Second, user interactions are highly affected by the candidate movies that are recommended by the platform's internal recommendation algorithm(s). Removing interactions that occur nearer to the last few interactions of a user makes learning user preferences increasingly difficult, thus deteriorating recommendation accuracy. Third, changing the order of user interactions makes it more difficult for sequential algorithms to capture the progressive interaction process. Based on these findings, we further discuss the discrepancy between the interaction generation mechanism employed by the MovieLens system and that of typical real-world recommendation scenarios. In summary, models that achieve excellent recommendation accuracy on the MovieLens dataset may not demonstrate superior performance in practice, owing to at least two kinds of differences: (i) differences in the contexts of user-item interaction generation, and (ii) differences in user knowledge about the item collections.
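The order-related findings presuppose a chronological evaluation protocol. A minimal sketch of a per-user leave-last-out split, plus a helper that shuffles a user's training history to mimic the order perturbation discussed above, might look like this. The `(user, item, timestamp)` tuple layout is an assumption for illustration, not the MovieLens file format, and the paper's exact protocol may differ.

```python
import random
from collections import defaultdict


def leave_last_out(interactions):
    """interactions: iterable of (user, item, timestamp). Returns {user: (train_items, test_item)}."""
    by_user = defaultdict(list)
    for user, item, ts in interactions:
        by_user[user].append((ts, item))
    splits = {}
    for user, events in by_user.items():
        events.sort()  # chronological order (ties broken by item id)
        items = [item for _, item in events]
        splits[user] = (items[:-1], items[-1])  # hold out the most recent interaction
    return splits


def shuffle_history(train_items, seed=0):
    """Return a copy of the history with its temporal order destroyed."""
    rng = random.Random(seed)
    shuffled = list(train_items)
    rng.shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    data = [("u1", "m3", 30), ("u1", "m1", 10), ("u1", "m2", 20), ("u2", "m9", 5)]
    splits = leave_last_out(data)
    print(splits)
    print(shuffle_history(splits["u1"][0], seed=42))
```

Comparing a sequential model trained on the original versus shuffled histories is one way to probe how much it relies on interaction order.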
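results: Experiments show that XSkill can transfer skills learned from human demonstration videos to robot actions and can compose the learned skills, as specified by a human prompt video, to accomplish unseen tasks.
Abstract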
Human demonstration videos are a widely available data source for robot learning and an intuitive user interface for expressing desired behavior. However, directly extracting reusable robot manipulation skills from unstructured human videos is challenging due to the big embodiment difference and unobserved action parameters. To bridge this embodiment gap, this paper introduces XSkill, an imitation learning framework that 1) discovers a cross-embodiment representation called skill prototypes purely from unlabeled human and robot manipulation videos, 2) transfers the skill representation to robot actions using conditional diffusion policy, and finally, 3) composes the learned skill to accomplish unseen tasks specified by a human prompt video. Our experiments in simulation and real-world environments show that the discovered skill prototypes facilitate both skill transfer and composition for unseen tasks, resulting in a more general and scalable imitation learning framework. The performance of XSkill is best understood from the anonymous website: https://xskillcorl.github.io.
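One way to picture a shared skill-prototype space is to embed short clips from both human and robot videos and assign each embedding to its most similar prototype by cosine similarity, so that human and robot clips of the same skill land on the same prototype. The embeddings and prototypes below are made-up vectors; XSkill learns both from unlabeled videos, and its actual assignment procedure may differ from this nearest-prototype rule.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def assign_prototypes(embeddings: List[Vector], prototypes: List[Vector]) -> List[int]:
    """Return, for each clip embedding, the index of its most similar prototype."""
    return [max(range(len(prototypes)), key=lambda k: cosine(e, prototypes[k])) for e in embeddings]


if __name__ == "__main__":
    prototypes = [[1.0, 0.0], [0.0, 1.0]]      # two hypothetical skill prototypes
    human_clips = [[2.0, 0.1], [0.1, 3.0]]     # made-up human clip embeddings
    robot_clips = [[1.5, -0.2], [-0.3, 2.0]]   # made-up robot clip embeddings
    print(assign_prototypes(human_clips, prototypes), assign_prototypes(robot_clips, prototypes))
```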
U-CE: Uncertainty-aware Cross-Entropy for Semantic Segmentation
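results: On the Cityscapes and ACDC datasets, with two common backbone architectures (ResNet-18 and ResNet-101), models trained with U-CE not only improve segmentation performance but also provide meaningful uncertainty estimates.
Abstract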
Deep neural networks have shown exceptional performance in various tasks, but their lack of robustness, reliability, and tendency to be overconfident pose challenges for their deployment in safety-critical applications like autonomous driving. In this regard, quantifying the uncertainty inherent to a model's prediction is a promising endeavour to address these shortcomings. In this work, we present a novel Uncertainty-aware Cross-Entropy loss (U-CE) that incorporates dynamic predictive uncertainties into the training process by pixel-wise weighting of the well-known cross-entropy loss (CE). Through extensive experimentation, we demonstrate the superiority of U-CE over regular CE training on two benchmark datasets, Cityscapes and ACDC, using two common backbone architectures, ResNet-18 and ResNet-101. With U-CE, we manage to train models that not only improve their segmentation performance but also provide meaningful uncertainties after training. Consequently, we contribute to the development of more robust and reliable segmentation models, ultimately advancing the state-of-the-art in safety-critical applications and beyond.
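The core idea, pixel-wise weighting of cross-entropy by a predictive-uncertainty map, can be sketched in plain Python. The weighting function $(1 + u)^\alpha$, the exponent $\alpha$, and how the per-pixel uncertainties $u$ are obtained (for example, Monte Carlo dropout) are assumptions here; consult the paper for the exact U-CE formulation.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def uncertainty_weighted_ce(logits_per_pixel, labels, uncertainties, alpha=1.0):
    """Mean over pixels of w_i * CE_i with the hypothetical weight w_i = (1 + u_i) ** alpha."""
    total = 0.0
    for logits, y, u in zip(logits_per_pixel, labels, uncertainties):
        p = softmax(logits)
        total += (1.0 + u) ** alpha * -math.log(p[y])
    return total / len(labels)


if __name__ == "__main__":
    logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]  # two pixels, three classes
    labels = [0, 2]
    print(uncertainty_weighted_ce(logits, labels, [0.0, 0.0]))  # reduces to plain CE
    print(uncertainty_weighted_ce(logits, labels, [0.1, 0.8]))  # uncertain pixel weighted up
```

With all uncertainties at zero the loss reduces to ordinary mean cross-entropy, so the weighting only changes how much each pixel contributes during training.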
TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic Tree-Based Memory Network
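results: Compared with existing models on real-world datasets, TREEMENT outperforms the best baseline by 7% in error reduction for criteria-level matching and achieves state-of-the-art trial-level matching. It also offers good interpretability, making its results easier to adopt.
Abstract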
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment. In recent years, machine learning models have been proposed to speed up patient recruitment by automatically matching patients with clinical trials based on longitudinal patient electronic health record (EHR) data and the eligibility criteria of clinical trials. However, they either depend on trial-specific expert rules that cannot extend to other trials, or perform matching at a very general level with a black-box model whose lack of interpretability makes the results difficult to adopt. To provide accurate and interpretable patient-trial matching, we introduce a personalized dynamic tree-based memory network model named TREEMENT. It utilizes hierarchical clinical ontologies to expand the personalized patient representation learned from sequential EHR data, and then uses an attentional beam-search query learned from eligibility criteria embeddings to offer a granular level of alignment for improved performance and interpretability. We evaluated TREEMENT against existing models on real-world datasets and demonstrated that TREEMENT outperforms the best baseline by 7% in terms of error reduction in criteria-level matching and achieves state-of-the-art results in trial-level matching. Furthermore, we show that TREEMENT offers good interpretability, making the model results easier to adopt.
Spuriosity Didn’t Kill the Classifier: Using Invariant Predictions to Harness Spurious Features
results: prove theoretically that SFB can learn an asymptotically optimal predictor without test-domain labels, and demonstrate empirically the effectiveness of SFB on real and synthetic data.
Abstract
To avoid failures on out-of-distribution data, recent works have sought to extract features that have a stable or invariant relationship with the label across domains, discarding the "spurious" or unstable features whose relationship with the label changes across domains. However, unstable features often carry complementary information about the label that could boost performance if used correctly in the test domain. Our main contribution is to show that it is possible to learn how to use these unstable features in the test domain without labels. In particular, we prove that pseudo-labels based on stable features provide sufficient guidance for doing so, provided that stable and unstable features are conditionally independent given the label. Based on this theoretical insight, we propose Stable Feature Boosting (SFB), an algorithm for: (i) learning a predictor that separates stable and conditionally-independent unstable features; and (ii) using the stable-feature predictions to adapt the unstable-feature predictions in the test domain. Theoretically, we prove that SFB can learn an asymptotically-optimal predictor without test-domain labels. Empirically, we demonstrate the effectiveness of SFB on real and synthetic data.
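The conditional-independence assumption is what makes the two feature sets easy to combine: if stable features $s$ and unstable features $u$ are independent given $y$, Bayes' rule gives $\operatorname{logit} P(y{=}1 \mid s,u) = \operatorname{logit} P(y{=}1 \mid s) + \operatorname{logit} P(y{=}1 \mid u) - \operatorname{logit} P(y{=}1)$. The sketch below applies that identity and estimates $P(y \mid u)$ for a single binary unstable feature from hard pseudo-labels produced by the stable predictor. The count-based estimator, the 0.5 threshold, and the smoothing are simplifying assumptions; it ignores pseudo-label noise and is not the SFB algorithm itself.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def combine(stable_prob, unstable_prob, prior=0.5):
    """P(y=1 | s, u) when s and u are conditionally independent given y."""
    return sigmoid(logit(stable_prob) + logit(unstable_prob) - logit(prior))


def fit_unstable_from_pseudolabels(stable_probs, unstable_feature, threshold=0.5, smoothing=1.0):
    """Estimate P(y=1 | u) for a binary feature u in {0, 1} using pseudo-labels from the stable predictor."""
    counts = {0: [smoothing, smoothing], 1: [smoothing, smoothing]}  # counts[u][y_hat]
    for p, u in zip(stable_probs, unstable_feature):
        y_hat = int(p >= threshold)  # hard pseudo-label, no test-domain labels used
        counts[u][y_hat] += 1
    return {u: c[1] / (c[0] + c[1]) for u, c in counts.items()}


if __name__ == "__main__":
    stable = [0.9, 0.8, 0.2, 0.1, 0.7]   # stable-feature predictions on unlabeled test data
    unstable = [1, 1, 0, 0, 1]           # a binary unstable feature observed in the test domain
    p_u = fit_unstable_from_pseudolabels(stable, unstable)
    print([combine(s, p_u[u]) for s, u in zip(stable, unstable)])
```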
Exploring Non-Regular Extensions of Propositional Dynamic Logic with Description-Logics Features
for: investigate the impact of non-regular path expressions on the decidability of satisfiability checking and querying in description logics extending ALC.
methods: study extensions of ALC with path expressions over regular and visibly-pushdown languages, and prove a series of undecidability results showing where decidability is lost.
results: established undecidability of the concept satisfiability problem for ALCvpl extended with nominals, and undecidability of query entailment for queries involving non-regular atoms.
Abstract
We investigate the impact of non-regular path expressions on the decidability of satisfiability checking and querying in description logics extending ALC. Our primary objects of interest are ALCreg and ALCvpl, the extensions of ALC with path expressions employing, respectively, regular and visibly-pushdown languages. The first one, ALCreg, is a notational variant of the well-known Propositional Dynamic Logic of Fischer and Ladner. The second one, ALCvpl, was introduced and investigated by Löding and Serre in 2007. The logic ALCvpl generalises many known decidable non-regular extensions of ALCreg. We provide a series of undecidability results. First, we show that decidability of the concept satisfiability problem for ALCvpl is lost upon adding the seemingly innocent Self operator. Second, we establish undecidability for the concept satisfiability problem for ALCvpl extended with nominals. Interestingly, our undecidability proof relies only on one single non-regular (visibly-pushdown) language, namely $r^{\#}s^{\#} := \{ r^n s^n \mid n \in \mathbb{N} \}$ for fixed role names $r$ and $s$. Finally, in contrast to the classical database setting, we establish undecidability of query entailment for queries involving non-regular atoms from $r^{\#}s^{\#}$, already in the case of ALC-TBoxes.
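To make the language $\{ r^n s^n \mid n \in \mathbb{N} \}$ concrete: treating $r$ as a push (call) symbol and $s$ as a pop (return) symbol, membership can be checked with a single counter standing in for the visibly-pushdown stack. The function below is an illustrative membership test for sequences of role names, not part of the paper's undecidability construction.

```python
def in_rn_sn(word, r="r", s="s"):
    """Check whether a sequence of role names belongs to { r^n s^n | n >= 0 }."""
    depth = 0          # stack height; a counter suffices for a single stack symbol
    seen_s = False
    for sym in word:
        if sym == r:
            if seen_s:
                return False   # an r after an s breaks the r^n s^n shape
            depth += 1         # push on the call symbol r
        elif sym == s:
            seen_s = True
            depth -= 1         # pop on the return symbol s
            if depth < 0:
                return False   # more s than r so far
        else:
            return False       # any other role name is outside the language
    return depth == 0


if __name__ == "__main__":
    for w in ["", "rrss", "rsrs", "rrs"]:
        print(repr(w), in_rn_sn(w))
```

The counter is what makes the language non-regular: a finite automaton cannot remember an unbounded number of pending $r$ steps.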
Chit-Chat or Deep Talk: Prompt Engineering for Process Mining
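results: The framework improves the accessibility and performance of conversational agents, as shown by experiments on public question sets and datasets. The study lays groundwork for future research on the role of LLMs in process mining and proposes enhancing LLM memory, conducting real-time user testing, and examining more diverse datasets.
Abstract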
This research investigates the application of Large Language Models (LLMs) to augment conversational agents in process mining, aiming to tackle its inherent complexity and diverse skill requirements. While LLM advancements present novel opportunities for conversational process mining, generating efficient outputs is still a hurdle. We propose an innovative approach that addresses many issues in existing solutions, informed by prior research on Natural Language Processing (NLP) for conversational agents. Leveraging LLMs, our framework improves both accessibility and agent performance, as demonstrated by experiments on public question and data sets. Our research sets the stage for future explorations into LLMs' role in process mining and concludes with propositions for enhancing LLM memory, implementing real-time user testing, and examining diverse data sets.
Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation
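results: Experiments show that MCNet learns representative and complementary facial memory and clearly outperforms previous state-of-the-art talking head generation methods on the VoxCeleb1 and CelebV datasets.
Abstract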
Talking head video generation aims to animate a human face in a still image with dynamic poses and expressions using motion information derived from a target-driving video, while maintaining the person's identity in the source image. However, dramatic and complex motions in the driving video cause ambiguous generation, because the still source image cannot provide sufficient appearance information for occluded regions or delicate expression variations, which produces severe artifacts and significantly degrades the generation quality. To tackle this problem, we propose to learn a global facial representation space, and design a novel implicit identity representation conditioned memory compensation network, dubbed MCNet, for high-fidelity talking head generation. Specifically, we devise a network module to learn a unified spatial facial meta-memory bank from all training samples, which can provide rich facial structure and appearance priors to compensate warped source facial features for the generation. Furthermore, we propose an effective query mechanism based on implicit identity representations learned from the discrete keypoints of the source image. It can greatly facilitate the retrieval of more correlated information from the memory bank for the compensation. Extensive experiments demonstrate that MCNet can learn representative and complementary facial memory, and can clearly outperform previous state-of-the-art talking head generation methods on VoxCeleb1 and CelebV datasets. Please check our \href{https://github.com/harlanhong/ICCV2023-MCNET}{Project}.
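The memory compensation step can be pictured as attention-based retrieval: queries conditioned on the source identity attend over a learned memory bank and read back a weighted mix of stored facial features. A minimal NumPy sketch of that retrieval pattern; the shapes, the names `memory_keys`/`memory_values`, and the residual fusion are illustrative assumptions, not MCNet's actual modules:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def compensate(warped_feats, identity_query, memory_keys, memory_values):
    """Retrieve from a memory bank and add the result to warped features.

    warped_feats:   (P, d) warped source features at P spatial positions
    identity_query: (P, d) queries conditioned on the source identity
    memory_keys:    (M, d) keys of the M memory slots
    memory_values:  (M, d) values of the M memory slots
    """
    d = identity_query.shape[-1]
    scores = identity_query @ memory_keys.T / np.sqrt(d)   # (P, M) similarities
    weights = softmax(scores, axis=-1)                      # each row sums to 1
    retrieved = weights @ memory_values                     # (P, d) memory read-out
    return warped_feats + retrieved, weights                # residual fusion (assumed)

rng = np.random.default_rng(0)
P, M, d = 16, 8, 32
out, w = compensate(rng.normal(size=(P, d)), rng.normal(size=(P, d)),
                    rng.normal(size=(M, d)), rng.normal(size=(M, d)))
print(out.shape, np.allclose(w.sum(axis=1), 1.0))  # (16, 32) True
```

Because the memory bank is shared across all training samples, the retrieved features can supply appearance information that the single source image lacks.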
PyTAG: Challenges and Opportunities for Reinforcement Learning in Tabletop Games
paper_authors: Martin Balla, George E. M. Long, Dominik Jeurissen, James Goodman, Raluca D. Gaina, Diego Perez-Liebana
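for: This paper explores the use of Reinforcement Learning (RL) for research on modern tabletop games.
methods: The paper introduces PyTAG, a Python API for the Tabletop Games framework (TAG), and trains baseline Proximal Policy Optimization (PPO) agents on a subset of the games.
results: The paper reports baseline results and discusses the particular challenges that modern tabletop games pose for RL research.
Abstract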
In recent years, Game AI research has made important breakthroughs using Reinforcement Learning (RL). Despite this, RL for modern tabletop games has gained little to no attention, even though they offer a range of unique challenges compared to video games. To bridge this gap, we introduce PyTAG, a Python API for interacting with the Tabletop Games framework (TAG). TAG contains a growing set of more than 20 modern tabletop games, with a common API for AI agents. We present techniques for training RL agents in these games and introduce baseline results after training Proximal Policy Optimisation algorithms on a subset of games. Finally, we discuss the unique challenges complex modern tabletop games provide, now open to RL research through PyTAG.
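For reference, the clipped surrogate objective at the heart of Proximal Policy Optimisation fits in a few lines of NumPy. This is the generic textbook objective, not PyTAG's training code; the array names and the default `clip_eps=0.2` are assumptions:

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective, negated so it can be minimised.

    logp_new:   log-probabilities of the taken actions under the current policy
    logp_old:   log-probabilities under the policy that collected the data
    advantages: advantage estimates for the same state-action pairs
    """
    ratio = np.exp(logp_new - logp_old)                       # probability ratio r_t
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # The elementwise minimum removes any incentive to push r_t outside
    # [1 - eps, 1 + eps] in the direction that would increase the objective.
    return float(-np.mean(np.minimum(unclipped, clipped)))

# Ratio 1.5 with a positive advantage is clipped to 1.2; with a negative one it is not.
print(ppo_clip_loss(np.log([1.5]), np.zeros(1), np.array([1.0])))   # approx -1.2
print(ppo_clip_loss(np.log([1.5]), np.zeros(1), np.array([-1.0])))  # approx 1.5
```

The asymmetry shown in the usage lines is the design point: improvements are capped, while deteriorations are always penalised in full.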
An analysis on the effects of speaker embedding choice in non auto-regressive TTS
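results: Regardless of the embedding set and learning strategy used, the network handles different speaker identities equally well, with barely noticeable variation in speech output quality. The study also finds that, under standard training procedures, speaker information leaks into the network's core speech abstraction.
Abstract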
In this paper we present a first attempt at understanding how a non-autoregressive factorised multi-speaker speech synthesis architecture exploits the information present in different speaker embedding sets. We analyse whether jointly learning the representations and initialising them from pretrained models yields any quality improvements for target speaker identities. In a separate analysis, we investigate how the different sets of embeddings impact the network's core speech abstraction (i.e. zero conditioned) in terms of speaker identity and representation learning. We show that, regardless of the set of embeddings and learning strategy used, the network can handle various speaker identities equally well, with barely noticeable variations in speech output quality, and that speaker leakage within the core structure of the synthesis system is inevitable in the standard training procedures adopted thus far.
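To make "zero conditioned" concrete: with additive speaker conditioning, the core speech abstraction is what the network produces when the speaker embedding is set to zero. A minimal NumPy sketch, assuming additive conditioning through a linear projection (`condition_on_speaker` and `proj` are hypothetical names; the paper's architecture may condition differently):

```python
import numpy as np

def condition_on_speaker(hidden, speaker_emb, proj):
    """Add a projected speaker embedding to every frame of `hidden`.

    hidden:      (T, d) encoder states meant to be speaker-independent
    speaker_emb: (e,) speaker embedding, e.g. jointly learned or pretrained
    proj:        (e, d) projection matrix (hypothetical)
    """
    return hidden + speaker_emb @ proj   # (d,) offset broadcast over the T frames

rng = np.random.default_rng(0)
T, d, e = 50, 8, 4
hidden = rng.normal(size=(T, d))
proj = rng.normal(size=(e, d))

speaker_a = condition_on_speaker(hidden, rng.normal(size=e), proj)
zero_cond = condition_on_speaker(hidden, np.zeros(e), proj)  # core abstraction
print(np.allclose(zero_cond, hidden))  # True: a zero embedding adds nothing
```

Under this assumption, any speaker identity still recoverable from `zero_cond` must already be present in `hidden` itself, which is what speaker leakage into the core structure means.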
Amortised Design Optimization for Item Response Theory
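results: By training a DRL agent on synthetic data, the approach selects optimally informative test items for the distribution of students and estimates student abilities during deployment.
Abstract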
Item Response Theory (IRT) is a well-known method for assessing responses from humans in education and psychology. In education, IRT is used to infer student abilities and characteristics of test items from student responses. Interactions with students are expensive, calling for methods that efficiently gather information for inferring student abilities. Methods based on Optimal Experimental Design (OED) are computationally costly, making them inapplicable for interactive applications. In response, we propose incorporating amortised experimental design into IRT. Here, the computational cost is shifted to a precomputing phase by training a Deep Reinforcement Learning (DRL) agent with synthetic data. The agent is trained to select optimally informative test items for the distribution of students, and to conduct amortised inference conditioned on the experiment outcomes. During deployment, the agent estimates parameters from data and suggests the next test item for the student in close to real time, taking into account the history of experiments and outcomes.
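To make the design problem concrete: under a two-parameter logistic (2PL) IRT model, a classic non-amortised strategy picks the item with the highest Fisher information at the current ability estimate. The sketch below shows only that greedy baseline, with made-up item parameters; it is an assumption for illustration, and the paper instead amortises the choice into a trained DRL policy:

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL IRT: probability that a student of ability theta answers correctly."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def fisher_information(theta, a, b):
    """Fisher information each item carries about theta under the 2PL model."""
    p = p_correct(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def next_item(theta_hat, a, b, administered):
    """Greedy OED-style choice: most informative item not yet administered."""
    info = fisher_information(theta_hat, a, b)
    used = np.asarray(sorted(administered), dtype=int)
    info[used] = -np.inf                  # never repeat an item
    return int(np.argmax(info))

a = np.array([1.0, 1.5, 0.8, 2.0])        # discriminations (illustrative)
b = np.array([-1.0, 0.0, 0.5, 1.2])       # difficulties (illustrative)
print(next_item(0.0, a, b, administered=set()))   # 1
print(next_item(0.0, a, b, administered={1}))     # 3
```

Every call recomputes the information of all items at the current estimate; full OED with posterior updates is far costlier, which is the cost the amortised agent moves into pre-training.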
A reinforcement learning approach for VQA validation: an application to diabetic macular edema grading
for: This paper focuses on providing a more comprehensive and appropriate validation approach for highly powerful Visual Question Answering (VQA) algorithms, specifically for diabetic macular edema (DME) grading.
methods: The proposed approach uses an automatic adaptive questioning method based on reinforcement learning (RL), which selects the next question to pose based on the history of previously asked questions.
results: The experiments show that the RL agent exhibits similar behavior to a clinician, asking questions that are relevant to key clinical concepts.
Abstract
Recent advances in machine learning models have greatly increased the performance of automated methods in medical image analysis. However, the internal functioning of such models is largely hidden, which hinders their integration in clinical practice. Explainability and trust are viewed as important for the widespread use of modern methods in clinical communities. Validation of machine learning models is therefore important, yet most methods are validated only in a limited way. In this work, we focus on providing a richer and more appropriate validation approach for highly powerful Visual Question Answering (VQA) algorithms. To better understand the performance of these methods, which answer arbitrary questions related to images, this work focuses on an automatic visual Turing test (VTT). That is, we propose an automatic adaptive questioning method that aims to expose the reasoning behavior of a VQA algorithm. Specifically, we introduce a reinforcement learning (RL) agent that observes the history of previously asked questions and uses it to select the next question to pose. We demonstrate our approach in the context of evaluating algorithms that automatically answer questions related to diabetic macular edema (DME) grading. The experiments show that such an agent behaves similarly to a clinician, asking questions that are relevant to key clinical concepts.
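A toy version of the adaptive questioning loop: the agent's state is the history of questions asked so far, and it chooses the next question epsilon-greedily from learned values. This tabular Q-learning sketch is an assumption chosen for brevity; the question strings and the reward are hypothetical, and the paper's agent and reward design are not reproduced:

```python
import random

def select_question(q_table, history, questions, epsilon, rng):
    """Epsilon-greedy choice of the next question given the asked history.

    q_table maps (history_tuple, question) -> estimated value.
    Questions already asked are excluded from the candidates.
    """
    state = tuple(history)
    candidates = [q for q in questions if q not in history]
    if rng.random() < epsilon:
        return rng.choice(candidates)                                   # explore
    return max(candidates, key=lambda q: q_table.get((state, q), 0.0))  # exploit

def q_update(q_table, history, question, reward, next_history, questions,
             alpha=0.1, gamma=0.9):
    """One tabular Q-learning update for the transition history -> next_history."""
    state, next_state = tuple(history), tuple(next_history)
    remaining = [q for q in questions if q not in next_history]
    best_next = max((q_table.get((next_state, q), 0.0) for q in remaining),
                    default=0.0)
    old = q_table.get((state, question), 0.0)
    q_table[(state, question)] = old + alpha * (reward + gamma * best_next - old)

# Hypothetical questions; the reward of 1.0 stands in for "informative question".
questions = ["hard exudates present?", "exudates near the fovea?", "DME grade?"]
q_table = {}
q_update(q_table, [], questions[1], 1.0, [questions[1]], questions)
picked = select_question(q_table, [], questions, epsilon=0.0, rng=random.Random(0))
print(picked)  # exudates near the fovea?
```

Keying values on the full history is what lets the choice of the next question depend on what has already been asked.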
Test-takers have a say: understanding the implications of the use of AI in language tests
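paper_authors: Dawen Zhang, Thong Hoang, Shidong Pan, Yongquan Hu, Zhenchang Xing, Mark Staples, Xiwei Xu, Qinghua Lu, Aaron Quigley
for: This study aims to understand the implications of using artificial intelligence (AI) in language tests, particularly for test-takers' perceptions and behavior.
methods: The study uses interviews and surveys with English test-takers to understand how they perceive AI in language tests and how it affects their behavior.
results: AI integration may enhance test-takers' perceptions of fairness, consistency, and availability, but it may also incite mistrust regarding reliability and interactivity. These findings can help stakeholders make better-informed decisions about using AI in language tests, safeguarding community well-being and testing integrity.
Abstract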
Language tests measure a person's ability to use a language in terms of listening, speaking, reading, or writing. Such tests play an integral role in academic, professional, and immigration domains, with entities such as educational institutions, professional accreditation bodies, and governments using them to assess candidate language proficiency. Recent advances in Artificial Intelligence (AI) and the discipline of Natural Language Processing have prompted language test providers to explore AI's potential applicability within language testing, leading to transformative activity patterns surrounding language instruction and learning. However, with concerns over AI's trustworthiness, it is imperative to understand the implications of integrating AI into language testing. This knowledge will enable stakeholders to make well-informed decisions, thus safeguarding community well-being and testing integrity. To understand the concerns and effects of AI usage in language tests, we conducted interviews and surveys with English test-takers. To the best of our knowledge, this is the first empirical study aimed at identifying the implications of AI adoption in language tests from a test-taker perspective. Our study reveals test-taker perceptions and behavioral patterns. Specifically, we identify that AI integration may enhance perceptions of fairness, consistency, and availability. Conversely, it might incite mistrust regarding reliability and interactivity aspects, subsequently influencing the behaviors and well-being of test-takers. These insights provide a better understanding of potential societal implications and assist stakeholders in making informed decisions concerning AI usage in language testing.
Adversarial Likelihood Estimation with One-way Flows
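results: Experiments show that the method converges faster, produces sample quality comparable to GANs with similar architecture, avoids over-fitting to commonly used datasets, and yields smooth low-dimensional latent representations of the training data.
Abstract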
Generative Adversarial Networks (GANs) can produce high-quality samples, but do not provide an estimate of the probability density around the samples. However, it has been noted that maximizing the log-likelihood within an energy-based setting can lead to an adversarial framework where the discriminator provides unnormalized density (often called energy). We further develop this perspective, incorporate importance sampling, and show that 1) Wasserstein GAN computes a biased estimate of the partition function, and we propose to use an unbiased estimator instead; 2) when optimizing for likelihood, one must maximize generator entropy, which is hypothesized to provide better mode coverage. Different from previous works, we explicitly compute the density of the generated samples. This is the key enabler for designing an unbiased estimator of the partition function and computing the generator entropy term. The generator density is obtained via a new type of flow network, called a one-way flow network, which is less constrained in terms of architecture because it does not require a tractable inverse function. Our experimental results show that we converge faster, produce sample quality comparable to GANs with similar architecture, successfully avoid over-fitting to commonly used datasets and produce smooth low-dimensional latent representations of the training data.
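The unbiased estimator rests on importance sampling: if the generator density $q$ is known, then $Z = \mathbb{E}_{x \sim q}\left[e^{-E(x)} / q(x)\right]$ can be estimated by averaging over generator samples. A minimal NumPy sketch on a one-dimensional toy problem where $Z = \sqrt{2\pi}$ is known exactly; the Gaussian proposal stands in for the one-way flow generator and, like the helper names, is an assumption for illustration:

```python
import numpy as np

SIGMA = 2.0  # standard deviation of the toy proposal q = N(0, SIGMA^2)

def energy(x):
    """Toy energy E(x) = x^2 / 2, so Z = integral of exp(-E) = sqrt(2*pi)."""
    return 0.5 * x ** 2

def sample_q(n, rng):
    return rng.normal(0.0, SIGMA, size=n)

def log_q(x):
    return -0.5 * (x / SIGMA) ** 2 - np.log(SIGMA * np.sqrt(2.0 * np.pi))

def estimate_partition_function(n, rng):
    """Unbiased importance-sampling estimate of Z using samples from q."""
    x = sample_q(n, rng)
    log_w = -energy(x) - log_q(x)        # log of exp(-E(x)) / q(x)
    return float(np.mean(np.exp(log_w)))

rng = np.random.default_rng(0)
z_hat = estimate_partition_function(200_000, rng)
print(z_hat, np.sqrt(2.0 * np.pi))
```

The estimate of $Z$ is unbiased, but $\log \hat{Z}$ is not (by Jensen's inequality); the sketch does not reproduce how the paper handles that.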
Amortised Experimental Design and Parameter Estimation for User Models of Pointing
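results: By interacting with simulated (in-silico) participants, the approach learns which experiments are most informative, so that parameters can be estimated using synthetic data rather than vast amounts of human data.
Abstract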
User models play an important role in interaction design, supporting automation of interaction design choices. In order to do so, model parameters must be estimated from user data. While very large amounts of user data are sometimes required, recent research has shown how experiments can be designed so as to gather data and infer parameters as efficiently as possible, thereby minimising the data requirement. In the current article, we investigate a variant of these methods that amortises the computational cost of designing experiments by training a policy for choosing experimental designs with simulated participants. Our solution learns which experiments provide the most useful data for parameter estimation by interacting with in-silico agents sampled from the model space thereby using synthetic data rather than vast amounts of human data. The approach is demonstrated for three progressively complex models of pointing.
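Fitts' law is the classic model of pointing and a convenient stand-in here, since the abstract does not name its three models: it predicts movement time from target distance D and width W. The sketch below, with hypothetical helper names, simulates an in-silico participant and estimates the parameters from chosen designs; it illustrates what a design is and what gets estimated, not the amortised policy itself:

```python
import numpy as np

def fitts_movement_time(a, b, distance, width):
    """Fitts' law (Shannon form): MT = a + b * log2(D / W + 1)."""
    return a + b * np.log2(distance / width + 1.0)

def simulate_participant(a, b, designs, noise_sd, rng):
    """In-silico participant: noisy movement times for each (D, W) design."""
    d, w = designs[:, 0], designs[:, 1]
    noise = rng.normal(0.0, noise_sd, size=len(designs))
    return fitts_movement_time(a, b, d, w) + noise

def estimate_parameters(designs, times):
    """Least-squares estimate of (a, b) from observed movement times."""
    index_of_difficulty = np.log2(designs[:, 0] / designs[:, 1] + 1.0)
    X = np.column_stack([np.ones_like(index_of_difficulty), index_of_difficulty])
    (a_hat, b_hat), *_ = np.linalg.lstsq(X, times, rcond=None)
    return float(a_hat), float(b_hat)

rng = np.random.default_rng(0)
# Two well-separated indices of difficulty (1 and 4 bits) pin down (a, b)
# far better than many nearly identical designs would.
designs = np.array([[1.0, 1.0], [15.0, 1.0]] * 10)
times = simulate_participant(0.2, 0.1, designs, noise_sd=0.0, rng=rng)
print(estimate_parameters(designs, times))  # approximately (0.2, 0.1)
```

An experimental-design policy would choose the next (D, W) pair adaptively instead of fixing the list up front.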
Detecting Vulnerable Nodes in Urban Infrastructure Interdependent Network
paper_authors: Jinzhu Mao, Liu Cao, Chen Gao, Huandong Wang, Hangyu Fan, Depeng Jin, Yong Li
for: This study aims to understand and characterize the vulnerability of urban infrastructures, i.e., the engineering facilities that are essential for the normal running of cities and that naturally take the form of networks. Potential applications include protecting fragile facilities and designing robust topologies.
methods: The study models the interdependent infrastructure network as a heterogeneous graph and proposes a system based on graph neural networks with reinforcement learning to characterize its vulnerability accurately. The system is trained on real-world data and uses deep learning to understand and analyze the heterogeneous graph, which allows it to capture the risk of cascading failures and discover vulnerable urban infrastructures.
results: Extensive experiments with various requests demonstrate not only the expressive power of the proposed system but also its transferability and the necessity of its specific components.
Abstract
Understanding and characterizing the vulnerability of urban infrastructures, which refers to the engineering facilities essential for the regular running of cities and that exist naturally in the form of networks, is of great value to us. Potential applications include protecting fragile facilities and designing robust topologies. Due to the strong correlation between different topological characteristics and infrastructure vulnerability, and their complicated evolution mechanisms, some heuristic and machine-assisted analyses fall short in addressing such a scenario. In this paper, we model the interdependent network as a heterogeneous graph and propose a system based on graph neural network with reinforcement learning, which can be trained on real-world data, to characterize the vulnerability of the city system accurately. The presented system leverages deep learning techniques to understand and analyze the heterogeneous graph, which enables us to capture the risk of cascade failure and discover vulnerable infrastructures of cities. Extensive experiments with various requests demonstrate not only the expressive power of our system but also its transferability and the necessity of the specific components.
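For intuition about what "risk of cascade failure" means, the heuristic baseline below propagates a single failure through a dependency graph and ranks nodes by how many facilities they take down. It is the kind of hand-crafted analysis the paper argues is insufficient, not the proposed GNN-plus-RL system; the node names and the strict any-supplier-fails rule are assumptions:

```python
from collections import deque

def cascade_size(dependents, seed):
    """Number of nodes that fail if `seed` fails.

    dependents maps a node to the nodes that depend on it; a node is assumed
    to fail as soon as any node it depends on fails (a deliberately strict rule).
    """
    failed = {seed}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for nxt in dependents.get(node, []):
            if nxt not in failed:
                failed.add(nxt)
                queue.append(nxt)
    return len(failed)

def rank_by_vulnerability(dependents):
    """Order nodes by cascade size (largest first), ties broken by name."""
    nodes = set(dependents) | {n for vs in dependents.values() for n in vs}
    return sorted(nodes, key=lambda n: (-cascade_size(dependents, n), n))

# Toy interdependent network spanning power, water, telecom and traffic.
dependents = {
    "power_station": ["substation_1", "substation_2"],
    "substation_1": ["water_pump", "telecom_hub"],
    "substation_2": ["traffic_lights"],
    "telecom_hub": ["traffic_lights"],
}
print(rank_by_vulnerability(dependents)[:2])  # ['power_station', 'substation_1']
```

Real interdependent networks need partial-failure rules, capacities and heterogeneous edge types, which is where such fixed heuristics stop scaling.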
Towards a population-informed approach to the definition of data-driven models for structural dynamics
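results: Both algorithms perform as intended and outperform a traditional machine-learning algorithm at approximating the quantities of interest in structural dynamics. Their performance as a function of the number of structures available in the training population behaves similarly to that of traditional machine-learning algorithms.
Abstract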
Machine learning has affected the way in which many phenomena in various domains are modelled, one of these domains being structural dynamics. However, because machine-learning algorithms are problem-specific, they often fail to perform efficiently in cases of data scarcity. To deal with such issues, combinations of physics-based approaches and machine-learning algorithms have been developed. Although such methods are effective, they also require the analyser's understanding of the underlying physics of the problem. The current work aims to motivate the use of models that learn such relationships from a population of phenomena whose underlying physics are similar. The development of such models is motivated by the way that physics-based models, and more specifically finite element models, work. Such models are considered transferrable, explainable and trustworthy, attributes which are not trivially imposed on or achieved by machine-learning models. For this reason, machine-learning approaches are less trusted by industry, and validated models are often considered more difficult to form with them. To achieve such data-driven models, a population-based scheme is followed here and two different machine-learning algorithms from the meta-learning domain are used: the model-agnostic meta-learning (MAML) algorithm and the conditional neural processes (CNP) model. The algorithms seem to perform as intended and outperform a traditional machine-learning algorithm at approximating the quantities of interest. Moreover, they exhibit behaviour similar to traditional machine learning algorithms (e.g. neural networks or Gaussian processes) concerning their performance as a function of the available structures in the training population.
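To make the meta-learning idea concrete, here is a minimal MAML sketch on a toy family of tasks, each a one-parameter linear map $y = a x$ standing in for a structure from a population with similar physics. The model, task family and step sizes are assumptions for illustration; the paper's MAML and CNP setups are not reproduced:

```python
import numpy as np

def task_loss_grad(w, x, y):
    """Derivative with respect to w of the MSE of the model y_hat = w * x."""
    return float(2.0 * np.mean(x * (w * x - y)))

def maml_scalar(slopes, alpha=0.1, beta=0.05, steps=500, n=20, seed=0):
    """Meta-learn an initial weight w that adapts well after one gradient step."""
    rng = np.random.default_rng(seed)
    w = 0.0
    for _ in range(steps):
        meta_grad = 0.0
        for a in slopes:                                    # one task per structure
            x_s = rng.uniform(-1.0, 1.0, n)                 # support set
            x_q = rng.uniform(-1.0, 1.0, n)                 # query set
            w_adapted = w - alpha * task_loss_grad(w, x_s, a * x_s)   # inner step
            # Chain rule through the inner step:
            # d w_adapted / d w = 1 - alpha * d2L_s/dw2 = 1 - 2 * alpha * mean(x_s^2)
            dadapt_dw = 1.0 - 2.0 * alpha * np.mean(x_s ** 2)
            meta_grad += task_loss_grad(w_adapted, x_q, a * x_q) * dadapt_dw
        w -= beta * meta_grad / len(slopes)                 # outer (meta) update
    return float(w)

print(maml_scalar([1.0, 2.0, 3.0]))   # close to 2.0, the centre of the task family
```

The learned initialisation sits where one adaptation step serves every task in the population well, which is the property that makes MAML attractive when each new structure has little data.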
Towards Reliable Rare Category Analysis on Graphs via Individual Calibration
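results: Experiments on real-world datasets, covering rare category characterization and model calibration tasks, show that CALIRARE alleviates miscalibration in rare category analysis and makes the model's predictions more reliable.
Abstract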
Rare categories abound in a number of real-world networks and play a pivotal role in a variety of high-stakes applications, including financial fraud detection, network intrusion detection, and rare disease diagnosis. Rare category analysis (RCA) refers to the task of detecting, characterizing, and comprehending the behaviors of minority classes in a highly-imbalanced data distribution. While the vast majority of existing work on RCA has focused on improving the prediction performance, a few fundamental research questions heretofore have received little attention and are less explored: How confident or uncertain is a prediction model in rare category analysis? How can we quantify the uncertainty in the learning process and enable reliable rare category analysis? To answer these questions, we start by investigating miscalibration in existing RCA methods. Empirical results reveal that state-of-the-art RCA methods are mainly over-confident in predicting minority classes and under-confident in predicting majority classes. Motivated by the observation, we propose a novel individual calibration framework, named CALIRARE, for alleviating the unique challenges of RCA, thus enabling reliable rare category analysis. In particular, to quantify the uncertainties in RCA, we develop a node-level uncertainty quantification algorithm to model the overlapping support regions with high uncertainty; to handle the rarity of minority classes in miscalibration calculation, we generalize the distribution-based calibration metric to the instance level and propose the first individual calibration measurement on graphs named Expected Individual Calibration Error (EICE). We perform extensive experimental evaluations on real-world datasets, including rare category characterization and model calibration tasks, which demonstrate the significance of our proposed framework.
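EICE is defined in the paper and is not reproduced here. For orientation, the sketch below computes the standard distribution-level Expected Calibration Error (ECE), the metric the abstract says EICE generalises to the instance level. The binning scheme (10 equal-width bins) is a common convention and an assumption here:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE: sample-weighted mean of |accuracy - confidence|."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin k covers (edges[k], edges[k+1]]; a confidence of 0.0 lands in bin 0.
    bins = np.digitize(confidences, edges[1:-1], right=True)
    ece = 0.0
    for k in range(n_bins):
        in_bin = bins == k
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap     # weight by the fraction of samples in bin
    return float(ece)

# Over-confident predictions: 90% confidence but only 50% accuracy.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 0, 0, 1]))  # approx 0.4
```

Because ECE averages within bins, a handful of minority-class nodes barely move it, which is the rarity problem that motivates an instance-level measure.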
A Fast and Map-Free Model for Trajectory Prediction in Traffics
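results: On the Argoverse dataset, the model achieves the highest performance among existing map-free methods, exceeds most map-based state-of-the-art methods, and infers faster than the baseline methods.
Abstract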
Existing methods have two shortcomings: (i) nearly all models rely on high-definition (HD) maps, yet map information is not always available in real traffic scenes, and building HD maps is expensive and time-consuming; (ii) existing models usually improve prediction accuracy at the expense of computing efficiency, yet efficiency is crucial for many real applications. To address both, this paper proposes an efficient trajectory prediction model that does not depend on traffic maps. The core idea of our model is to encode each single agent's spatial-temporal information in the first stage and to explore multi-agent spatial-temporal interactions in the second stage. By combining attention mechanisms, LSTM, a graph convolution network and a temporal transformer across the two stages, our model learns rich dynamic and interaction information for all agents. On the Argoverse dataset, our model achieves the highest performance among existing map-free methods and also exceeds most map-based state-of-the-art methods. In addition, our model infers faster than the baseline methods.
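Performance on Argoverse is usually reported as displacement errors over K candidate trajectories. A minimal NumPy sketch, assuming the Argoverse-style convention in which the "best" candidate is the one with the smallest endpoint error; the abstract does not describe the paper's exact evaluation code:

```python
import numpy as np

def ade_fde_of_best(predictions, ground_truth):
    """ADE and FDE of the candidate with the smallest final displacement.

    predictions:  (K, T, 2) K predicted future trajectories of T (x, y) points
    ground_truth: (T, 2) observed future trajectory
    """
    errors = np.linalg.norm(predictions - ground_truth[None], axis=-1)  # (K, T)
    fde = errors[:, -1]                     # final-step displacement per candidate
    best = int(np.argmin(fde))              # selection by endpoint error
    return float(errors[best].mean()), float(fde[best])

gt = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
preds = np.stack([
    gt + np.array([0.0, 1.0]),                        # constant 1 m offset
    np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 2.0]]),   # exact until the last step
])
print(ade_fde_of_best(preds, gt))  # (1.0, 1.0): candidate 0 wins on endpoint error
```

Note that candidate 1 has the lower average error (2/3) but loses under endpoint selection, so the convention matters when comparing reported numbers.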
Online Continual Learning for Robust Indoor Object Recognition
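results: The robustness of CL models to train/test augmentations is evaluated in various settings. Different high-order statistical moments let RobOCLe capture different properties of deformations, providing higher robustness with no decrease in inference speed.
Abstract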
Vision systems mounted on home robots need to interact with unseen classes in changing environments. Robots have limited computational resources, labelled data and storage capability. These requirements pose some unique challenges: models should adapt without forgetting past knowledge in a data- and parameter-efficient way. We characterize the problem as few-shot (FS) online continual learning (OCL), where robotic agents learn from a non-repeated stream of few-shot data updating only a few model parameters. Additionally, such models experience variable conditions at test time, where objects may appear in different poses (e.g., horizontal or vertical) and environments (e.g., day or night). To improve the robustness of CL agents, we propose RobOCLe, which 1) constructs an enriched feature space by computing high-order statistical moments from the embedded features of samples, and 2) computes the similarity between the high-order statistics of samples in the enriched feature space to predict their class labels. We evaluate the robustness of CL models to train/test augmentations in various cases. We show that different moments allow RobOCLe to capture different properties of deformations, providing higher robustness with no decrease in inference speed.
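The two steps can be sketched as follows: per-channel moments of an embedded feature map form the enriched representation, and a query is labelled by its most similar class prototype. Which moments are used, cosine similarity, and single-sample prototypes are all assumptions made for this illustration, not RobOCLe's exact design:

```python
import numpy as np

def moment_features(feature_map, orders=(1, 2, 3)):
    """Enrich an embedding with per-channel statistical moments.

    feature_map: (N, C) N spatial positions (or patches) with C channels.
    Order 1 is the mean; higher orders are central moments. Returns a
    vector of length len(orders) * C.
    """
    mean = feature_map.mean(axis=0)
    feats = []
    for k in orders:
        if k == 1:
            feats.append(mean)
        else:
            feats.append(((feature_map - mean) ** k).mean(axis=0))
    return np.concatenate(feats)

def predict(query_map, prototypes):
    """Label of the most similar class prototype (cosine similarity)."""
    q = moment_features(query_map)
    labels = list(prototypes)
    protos = np.stack([prototypes[c] for c in labels])
    sims = protos @ q / (np.linalg.norm(protos, axis=1) * np.linalg.norm(q) + 1e-12)
    return labels[int(np.argmax(sims))]

# One support feature map per class (a few-shot setting with one shot each).
prototypes = {
    "A": moment_features(np.array([[1.0, 0.0], [3.0, 0.0]])),
    "B": moment_features(np.array([[0.0, 1.0], [0.0, 5.0]])),
}
query = np.array([[1.1, 0.0], [2.9, 0.0]])   # a slightly deformed class-A sample
print(predict(query, prototypes))            # A
```

Adding a class only requires storing one more prototype vector, which fits the parameter- and memory-efficient online setting.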
Probabilistic Forecasting with Coherent Aggregation
paper_authors: Geoffrey Négiar, Ruijun Ma, O. Nangba Meetei, Mengfei Cao, Michael W. Mahoney
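for: This study aims to produce accurate probabilistic forecasts while respecting hierarchical information.
methods: The paper proposes a new model that leverages a factor model structure to produce forecasts that are coherent with the hierarchy by construction.
results: Compared with two previous end-to-end methods on three hierarchical forecasting datasets, the model achieves significant improvements of 11.8-41.4%; the paper also analyzes how the choice of base-level distribution and the number of factors affect the model.
Abstract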
Obtaining accurate probabilistic forecasts while respecting hierarchical information is an important operational challenge in many applications, perhaps most obviously in energy management, supply chain planning, and resource allocation. The basic challenge, especially for multivariate forecasting, is that forecasts are often required to be coherent with respect to the hierarchical structure. In this paper, we propose a new model which leverages a factor model structure to produce coherent forecasts by construction. This is a consequence of a simple (exchangeability) observation: permuting base-level series in the hierarchy does not change their aggregates. Our model uses a convolutional neural network to produce parameters for the factors, their loadings and base-level distributions; it produces samples which can be differentiated with respect to the model's parameters; and it can therefore optimize for any sample-based loss function, including the Continuous Ranked Probability Score and quantile losses. We can choose arbitrary continuous distributions for the factor and the base-level distributions. We compare our method to two previous methods which can be optimized end-to-end, while enforcing coherent aggregation. Our model achieves significant improvements: between $11.8-41.4\%$ on three hierarchical forecasting datasets. We also analyze the influence of parameters in our model with respect to base-level distribution and number of factors.
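Two ingredients of this approach can be sketched under assumed choices (a two-series hierarchy, Gaussian factors and noise, hypothetical function names): base-level samples drawn from a factor model are aggregated with a summing matrix, so every sample path is coherent by construction, and the Continuous Ranked Probability Score is estimated directly from samples. The paper's CNN that outputs the factor parameters is not reproduced:

```python
import numpy as np

# Summing matrix for a two-level hierarchy: total = A + B.
S = np.array([[1, 1],    # total
              [1, 0],    # base series A
              [0, 1]])   # base series B

def sample_coherent(loadings, n_samples, rng, noise_sd=0.1):
    """Sample base series from a factor model and aggregate them with S.

    loadings: (n_base, n_factors). Base samples = loadings @ factors + noise.
    Every aggregate is computed from the same base samples, so each sample
    path is coherent by construction.
    """
    n_base, n_factors = loadings.shape
    factors = rng.normal(size=(n_factors, n_samples))
    base = loadings @ factors + rng.normal(0.0, noise_sd, size=(n_base, n_samples))
    return S @ base    # (n_nodes, n_samples): row 0 equals row 1 + row 2

def crps_from_samples(samples, y):
    """Sample estimate of CRPS: E|X - y| - 0.5 * E|X - X'|."""
    samples = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(samples - y))
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return float(term1 - term2)

rng = np.random.default_rng(0)
draws = sample_coherent(np.array([[1.0], [0.5]]), n_samples=500, rng=rng)
print(np.allclose(draws[0], draws[1] + draws[2]))   # True
print(crps_from_samples(draws[0], y=0.0))
```

Since the CRPS estimate is built only from samples, it stays differentiable whenever the sampling step is reparameterised, which is what allows end-to-end training on sample-based losses.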
ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats
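results: For LLMs, FP8 activations consistently outperform their INT8 counterparts, and the advantage grows for models with more than one billion parameters. For weight quantization, FP4 performs comparably to, or better than, INT4, which simplifies deployment on FP-supported hardware. Overall, FP quantization shows strong potential for efficient LLM deployment in resource-limited settings.
Abstract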
In the complex domain of large language models (LLMs), striking a balance between computational efficiency and maintaining model quality is a formidable challenge. Navigating the inherent limitations of uniform quantization, particularly when dealing with outliers, and motivated by the launch of NVIDIA's H100 hardware, this study delves into the viability of floating-point (FP) quantization, particularly focusing on FP8 and FP4, as a potential solution. Our comprehensive investigation reveals that for LLMs, FP8 activation consistently outshines its integer (INT8) equivalent, with the performance edge becoming more noticeable in models possessing parameters beyond one billion. For weight quantization, our findings indicate that FP4 exhibits comparable, if not superior, performance to INT4, simplifying deployment on FP-supported hardware like H100. To mitigate the overhead from precision alignment caused by the disparity between weights and activations, we propose two scaling constraints for weight quantization that negligibly impact the performance compared to the standard W4A8 model. We additionally enhance our quantization methods by integrating the Low Rank Compensation (LoRC) strategy, yielding improvements especially in smaller models. The results of our investigation emphasize the immense potential of FP quantization for LLMs, paving the way for high-efficiency deployment in resource-limited settings.
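The difference between FP4 and INT4 comes down to where the representable levels sit: INT4 levels are evenly spaced, while floating-point levels are denser near zero. A minimal NumPy sketch of symmetric per-tensor round-to-nearest quantization onto either grid, assuming the E2M1 layout for FP4 and a symmetric INT4 range of -7..7 (the paper's exact formats and scaling constraints are not reproduced):

```python
import numpy as np

# Non-negative magnitudes of FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
FP4_E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
# Non-negative magnitudes of symmetric INT4 (levels -7..7).
INT4_SYM = np.arange(0.0, 8.0)

def quantize_to_grid(x, grid):
    """Symmetric per-tensor round-to-nearest quantization of a 1-D array."""
    x = np.asarray(x, dtype=float)
    max_abs = np.abs(x).max()
    if max_abs == 0.0:
        return np.zeros_like(x)                    # nothing to scale
    scale = max_abs / grid.max()                   # largest |x| maps to the top level
    mags = np.abs(x) / scale
    idx = np.abs(mags[:, None] - grid[None, :]).argmin(axis=1)  # nearest level
    return np.sign(x) * grid[idx] * scale

w = np.array([0.74, -2.6, 6.0])
print(quantize_to_grid(w, FP4_E2M1))   # values 0.5, -3.0, 6.0
print(quantize_to_grid(w, INT4_SYM))
```

Outliers set the scale in both cases; the non-uniform FP grid keeps finer resolution for the many small values that remain.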
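Summary
In the field of large language models (LLMs), balancing computational efficiency against model quality is a challenging task. The inherent limitations of uniform quantization, especially in handling outliers, motivated our study of floating-point (FP) quantization, in particular FP8 and FP4. We find that for LLMs, FP8 activations generally outperform INT8, and the advantage becomes more pronounced in models with more than one billion parameters. For weight quantization, FP4 performs comparably to, or even better than, INT4, which simplifies deployment on FP-supported hardware. To reduce the overhead of precision alignment, we propose two scaling constraints for weight quantization that have little impact on performance compared with the standard W4A8 model. We further enhance our quantization methods by integrating the Low Rank Compensation (LoRC) strategy, which yields improvements especially in smaller models. The results show that FP quantization has great potential for LLMs and paves the way for efficient deployment in resource-limited settings.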
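The outlier argument can be made concrete with simulated ("fake") quantization. The sketch below is illustrative only: it uses a single per-tensor absmax scale, omits ZeroQuant-FP's scaling constraints and LoRC, and the weight vector is made up. It assumes the FP4 E2M1 layout (1 sign, 2 exponent, 1 mantissa bit), whose representable magnitudes are listed in the code.

```python
# FP4 in the E2M1 layout (1 sign, 2 exponent, 1 mantissa bit) represents these magnitudes.
FP4_E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_int4(weights: list[float]) -> list[float]:
    """Symmetric per-tensor INT4 fake quantization onto the integer grid [-7, 7]."""
    scale = max(abs(w) for w in weights) / 7
    return [max(-7, min(7, round(w / scale))) * scale for w in weights]


def quantize_fp4(weights: list[float]) -> list[float]:
    """Per-tensor FP4 (E2M1) fake quantization: map the absmax to 6, round to the nearest grid value."""
    scale = max(abs(w) for w in weights) / 6
    out = []
    for w in weights:
        mag = min(FP4_E2M1_MAGNITUDES, key=lambda v: abs(abs(w) / scale - v))
        out.append((mag if w >= 0 else -mag) * scale)
    return out


def mse(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Mostly small weights plus one outlier, the case where a uniform grid wastes resolution.
w = [0.01, -0.02, 0.03, 0.05, -0.04, 1.0]
int_err = mse(w, quantize_int4(w))
fp_err = mse(w, quantize_fp4(w))
print(f"INT4 MSE={int_err:.6f}  FP4 MSE={fp_err:.6f}")
```

On this toy vector the FP4 grid, which is denser near zero, keeps one more small weight non-zero than INT4 and has lower error; real models need per-channel or per-group scales and an actual evaluation to support any general claim.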
Text2Layer: Layered Image Generation using Latent Diffusion Model
results: Experimental results show that the method generates high-quality layered images and establishes a benchmark for future research.
Abstract
Layer compositing is one of the most popular image editing workflows among both amateurs and professionals. Motivated by the success of diffusion models, we explore layer compositing from a layered image generation perspective. Instead of generating an image, we propose to generate background, foreground, layer mask, and the composed image simultaneously. To achieve layered image generation, we train an autoencoder that is able to reconstruct layered images and train diffusion models on the latent representation. One benefit of the proposed problem is to enable better compositing workflows in addition to the high-quality image output. Another benefit is producing higher-quality layer masks compared to masks produced by a separate step of image segmentation. Experimental results show that the proposed method is able to generate high-quality layered images and initiates a benchmark for future work.
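Summary
Layer compositing is one of the most popular image editing workflows and is widely used by both amateurs and professionals. Inspired by the success of diffusion models, we approach layer compositing from the perspective of layered image generation. Instead of generating a single image, we propose to generate the background, foreground, layer mask, and composed image simultaneously. To achieve this, we train an autoencoder that can reconstruct layered images and train diffusion models on its latent representation. The proposal has two main benefits: it enables better compositing workflows, and it produces higher-quality layer masks than masks obtained from a separate image segmentation step. Experimental results show that our method generates high-quality layered images and establishes a benchmark for future work.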
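A layered sample (background, foreground, mask) determines its composed image through standard alpha compositing. The sketch below assumes that relation holds for the generated layers; it is not Text2Layer's model, only the per-pixel formula that makes a generated mask directly usable for editing.

```python
def composite(fg, bg, mask):
    """Alpha-composite a foreground over a background, per pixel: I = m * F + (1 - m) * B."""
    return [
        [m * f + (1.0 - m) * b for f, b, m in zip(f_row, b_row, m_row)]
        for f_row, b_row, m_row in zip(fg, bg, mask)
    ]


# 2x2 grayscale toy layers; mask values lie in [0, 1].
fg = [[1.0, 1.0], [1.0, 1.0]]
bg = [[0.0, 0.0], [0.0, 0.0]]
mask = [[1.0, 0.25], [0.5, 0.0]]
print(composite(fg, bg, mask))  # [[1.0, 0.25], [0.5, 0.0]]
```

Because the layers stay separate, a user can edit or replace the background and recomposite without re-segmenting the image.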
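results: Experiments on synthetic and real-world data show that ICECREAM performs strongly on explainability and root cause analysis tasks and achieves impressive accuracy in both.
Abstract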
Which set of features was responsible for a certain output of a machine learning model? Which components caused the failure of a cloud computing application? These are just two examples of questions we are addressing in this work by Identifying Coalition-based Explanations for Common and Rare Events in Any Model (ICECREAM). Specifically, we propose an information-theoretic quantitative measure for the influence of a coalition of variables on the distribution of a target variable. This allows us to identify which set of factors is essential to obtain a certain outcome, as opposed to well-established explainability and causal contribution analysis methods which can assign contributions only to individual factors and rank them by their importance. In experiments with synthetic and real-world data, we show that ICECREAM outperforms state-of-the-art methods for explainability and root cause analysis, and achieves impressive accuracy in both tasks.
Summary
In this work, we use the ICECREAM method to explain model outputs and the failures of cloud computing applications. Specifically, we propose an information-theoretic quantitative measure of the influence of a coalition of variables on the distribution of a target variable. This allows us to identify which set of factors is essential to obtain a certain outcome, whereas well-established explainability and causal contribution analysis methods can only assign contributions to individual factors and rank them by importance. In experiments with synthetic and real-world data, we show that ICECREAM outperforms state-of-the-art methods for explainability and root cause analysis and achieves impressive accuracy in both tasks.
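Why coalitions matter can be seen on a tiny system. ICECREAM scores coalitions with an information-theoretic measure; the sketch below substitutes a simpler sufficiency score (the probability of reproducing the observed outcome when only the coalition is held at its observed values and all other binary variables are resampled uniformly). The toy model and function names are hypothetical.

```python
from itertools import combinations, product

VARIABLES = ("a", "b", "c")


def model(x: dict) -> int:
    """Toy system: the outcome fires if (a and b) or c."""
    return int((x["a"] and x["b"]) or x["c"])


def sufficiency(coalition: tuple, observed: dict) -> float:
    """Probability of reproducing the observed outcome when the coalition is fixed
    to its observed values and every other binary variable is resampled uniformly."""
    target = model(observed)
    free = [v for v in VARIABLES if v not in coalition]
    hits = 0
    total = 0
    for values in product((0, 1), repeat=len(free)):
        x = dict(observed)
        x.update(zip(free, values))
        hits += model(x) == target
        total += 1
    return hits / total


def minimal_sufficient_coalitions(observed: dict) -> list:
    """Smallest coalitions that fully determine the observed outcome."""
    for size in range(len(VARIABLES) + 1):
        found = [s for s in combinations(VARIABLES, size) if sufficiency(s, observed) == 1.0]
        if found:
            return found
    return []


observed = {"a": 1, "b": 1, "c": 0}
print(minimal_sufficient_coalitions(observed))  # [('a', 'b')]
```

Neither `a` nor `b` alone guarantees the outcome (each scores 0.75), but together they do, which a per-variable ranking would not state explicitly.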
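results: Experimental results show that the heuristic-based algorithm finds an acceptable building layout faster on flat maps, while the evolving layout algorithm performs better on rugged maps. A user study comparing the generator with outstanding entries from the 2022 edition of the Minecraft Settlement Generation Competition, using the competition's evaluation criteria, shows that the generator performs well on adaptation and functionality.
Abstract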
Procedurally generating cities in Minecraft provides players with more diverse scenarios and could help understand and improve the design of cities in other digital worlds and the real world. This paper presents a city generator that was submitted as an entry to the 2023 edition of the Minecraft Settlement Generation Competition. The generation procedure is composed of six main steps, namely vegetation clearing, terrain reshaping, building layout generation, route planning, streetlight placement, and wall construction. Three algorithms, namely a heuristic-based algorithm, an evolving layout algorithm, and a random algorithm, are applied to generate the building layout, which determines where to place the different redstone-style buildings; they are tested by generating cities on random maps in limited time. Experimental results show that the heuristic-based algorithm is capable of finding an acceptable building layout faster for flat maps, while the evolving layout algorithm performs better for rugged maps. A user study is conducted to compare our generator with outstanding entries of the competition's 2022 edition using the competition's evaluation criteria and shows that our generator performs well in the adaptation and functionality criteria.
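Summary
Generating cities in Minecraft gives players more diverse scenarios and can help us better understand and improve city design in the real world. This paper presents a city generator submitted as an entry to the 2023 Minecraft Settlement Generation Competition. The generation process consists of six main steps: vegetation clearing, terrain reshaping, building layout generation, route planning, streetlight placement, and wall construction. Three algorithms (a heuristic-based algorithm, an evolving layout algorithm, and a random algorithm) are applied to generate the building layout, which determines where the different redstone-style buildings are placed. The algorithms were tested by generating cities on random maps within a time limit. Experimental results show that the heuristic-based algorithm finds a suitable building layout faster on flat maps, whereas the evolving layout algorithm performs better on rugged maps. We also conducted a user study comparing our generator with outstanding entries from the 2022 edition of the competition; using the competition's evaluation criteria, it shows that our generator performs well in adaptability and functionality.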
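The abstract does not specify the heuristic, so the following is a hypothetical flatness heuristic for the building-layout step, meant only to show what "finding an acceptable layout fast on flat maps" can look like: score every square plot by the variance of its terrain heights and greedily place non-overlapping buildings on the flattest plots. The heightmap, footprint size, and function names are assumptions.

```python
def plot_variance(heights, r, c, size):
    """Variance of terrain heights inside a size x size plot with top-left corner (r, c)."""
    cells = [heights[i][j] for i in range(r, r + size) for j in range(c, c + size)]
    mean = sum(cells) / len(cells)
    return sum((h - mean) ** 2 for h in cells) / len(cells)


def overlaps(a, b, size):
    """Two equal-size square footprints overlap iff both offsets are below the size."""
    (r1, c1), (r2, c2) = a, b
    return abs(r1 - r2) < size and abs(c1 - c2) < size


def greedy_layout(heights, size, max_buildings):
    """Greedily place square footprints on the flattest non-overlapping plots."""
    rows, cols = len(heights), len(heights[0])
    candidates = sorted(
        (plot_variance(heights, r, c, size), r, c)
        for r in range(rows - size + 1)
        for c in range(cols - size + 1)
    )
    placed = []
    for _, r, c in candidates:
        if len(placed) == max_buildings:
            break
        if all(not overlaps((r, c), p, size) for p in placed):
            placed.append((r, c))
    return placed


heightmap = [
    [5, 5, 9, 1],
    [5, 5, 2, 8],
    [3, 7, 6, 6],
    [1, 9, 6, 6],
]
print(greedy_layout(heightmap, size=2, max_buildings=2))  # [(0, 0), (2, 2)]
```

A greedy pass like this is cheap on flat terrain, where many zero-variance plots exist; on rugged maps few plots score well, which is where a search-based (evolving) layout can pay off.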
Eliminating Label Leakage in Tree-Based Vertical Federated Learning
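results: Experimental results show that the ID2Graph attack can readily leak the training labels of tree-based models, and that the ID-LMID defense effectively prevents this label leakage.
Abstract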
Vertical federated learning (VFL) enables multiple parties with disjoint features of a common user set to train a machine learning model without sharing their private data. Tree-based models have become prevalent in VFL due to their interpretability and efficiency. However, the vulnerability of tree-based VFL has not been sufficiently investigated. In this study, we first introduce a novel label inference attack, ID2Graph, which utilizes the sets of record-IDs assigned to each node (i.e., instance space) to deduce private training labels. The ID2Graph attack generates a graph structure from training samples, extracts communities from the graph, and clusters the local dataset using community information. To counteract label leakage from the instance space, we propose an effective defense mechanism, ID-LMID, which prevents label leakage by focusing on mutual information regularization. Comprehensive experiments conducted on various datasets reveal that the ID2Graph attack presents significant risks to tree-based models such as Random Forest and XGBoost. Further evaluations on these benchmarks demonstrate that ID-LMID effectively mitigates label leakage in such instances.
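Summary
Vertical federated learning (VFL) allows parties holding different features of a common set of users to train a machine learning model without sharing their private data. Tree-based models have become prevalent in VFL because of their interpretability and efficiency. However, the vulnerabilities of tree-based VFL have not been sufficiently investigated. In this study, we first introduce a novel label inference attack, ID2Graph, which uses the set of record IDs assigned to each node (i.e., the instance space) to infer private training labels. The ID2Graph attack builds a graph structure from the training samples, extracts communities from the graph, and clusters the local dataset using the community information. To prevent label leakage from the instance space, we propose an effective defense mechanism, ID-LMID, which relies on mutual information regularization. Our experiments show that the ID2Graph attack poses significant risks to tree-based models such as Random Forest and XGBoost, and that ID-LMID effectively mitigates label leakage in these cases.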
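The graph-building idea behind ID2Graph can be sketched on a toy instance space. This is a simplified stand-in, not the paper's attack: edges are weighted by how many trees place two records in the same leaf, and connected components above a weight threshold replace a proper community-detection algorithm. The leaf assignments and `min_weight` are assumptions.

```python
from itertools import combinations


def cooccurrence_weights(leaf_ids):
    """leaf_ids[t][i] is the leaf that record i falls into in tree t.
    Weight of (i, j) = number of trees in which records i and j share a leaf."""
    n = len(leaf_ids[0])
    weights = {}
    for tree in leaf_ids:
        for i, j in combinations(range(n), 2):
            if tree[i] == tree[j]:
                weights[(i, j)] = weights.get((i, j), 0) + 1
    return weights


def communities(leaf_ids, min_weight):
    """Connected components of the graph that keeps edges with weight >= min_weight."""
    n = len(leaf_ids[0])
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), w in cooccurrence_weights(leaf_ids).items():
        if w >= min_weight:
            parent[find(i)] = find(j)
    # Relabel components in order of first appearance.
    labels, mapping = [], {}
    for i in range(n):
        root = find(i)
        mapping.setdefault(root, len(mapping))
        labels.append(mapping[root])
    return labels


# Three trees, four records: the instance space alone reveals two groups.
leaves = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
print(communities(leaves, min_weight=2))  # [0, 0, 1, 1]
```

Records that trees repeatedly group together tend to share labels, so the recovered clusters act as label hints for the passive party; this is the leakage channel that ID-LMID is designed to limit.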
Self-Supervised Learning for WiFi CSI-Based Human Activity Recognition: A Systematic Study
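results: Experimental results show that SSL algorithms can effectively improve WiFi CSI-based HAR performance, but limitations and blind spots remain that must be addressed before they can be used in real-world applications.
Abstract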
Recently, with the advancement of the Internet of Things (IoT), WiFi channel state information (CSI)-based human activity recognition (HAR) has gained increasing attention from academic and industry communities. By integrating deep learning technology with CSI-based HAR, researchers achieve state-of-the-art performance without the need for expert knowledge. However, the scarcity of labeled CSI data remains the most prominent challenge when applying deep learning models in the context of CSI-based HAR due to the privacy and incomprehensibility of CSI-based HAR data. On the other hand, self-supervised learning (SSL) has emerged as a promising approach for learning meaningful representations from data without heavy reliance on labeled examples. Therefore, considerable efforts have been made to address the challenge of insufficient data in deep learning by leveraging SSL algorithms. In this paper, we undertake a comprehensive inventory and analysis of the potential held by different categories of SSL algorithms within the field, including those that have been previously studied and those that have not yet been explored. We provide an in-depth investigation of SSL algorithms in the context of WiFi CSI-based HAR. We evaluate four categories of SSL algorithms using three publicly available CSI HAR datasets, each encompassing different tasks and environmental settings. To ensure relevance to real-world applications, we design performance metrics that align with specific requirements. Furthermore, our experimental findings uncover several limitations and blind spots in existing work, highlighting the barriers that need to be addressed before SSL can be effectively deployed in real-world WiFi-based HAR applications. Our results also serve as a practical guideline for industry practitioners and provide valuable insights for future research endeavors in this field.
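Summary
Recently, with the advancement of the Internet of Things (IoT), WiFi CSI-based human activity recognition (HAR) has attracted increasing attention from academia and industry. By combining deep learning with CSI-based HAR, researchers achieve state-of-the-art performance without requiring expert knowledge. However, the scarcity of labeled CSI data remains the biggest challenge for applying deep learning models, because CSI data raise privacy concerns and are hard for humans to interpret. On the other hand, self-supervised learning (SSL) has emerged as a promising approach that learns meaningful representations from data without heavy reliance on labeled examples. Considerable effort has therefore gone into using SSL algorithms to address insufficient data in deep learning. In this paper, we survey and analyze different categories of SSL algorithms in this field, including both previously studied ones and ones not yet explored, and investigate them in depth in the context of WiFi CSI-based HAR. We evaluate them on three publicly available CSI HAR datasets, each covering different tasks and environmental settings. To keep the evaluation relevant to real-world applications, we design performance metrics aligned with practical requirements. Our experimental results reveal several limitations and blind spots in existing work that must be addressed before SSL can be deployed effectively in real-world applications. Our results also serve as a practical guideline for industry practitioners and provide valuable insights for future research.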
Perturbing a Neural Network to Infer Effective Connectivity: Evidence from Synthetic EEG Data
results: Perturbing artificial neural networks trained to predict EEG signals infers causal relationships between EEG channels better than the classical Granger causality method.
Abstract
Identifying causal relationships among distinct brain areas, known as effective connectivity, holds key insights into the brain's information processing and cognitive functions. Electroencephalogram (EEG) signals exhibit intricate dynamics and inter-areal interactions within the brain. However, methods for characterizing nonlinear causal interactions among multiple brain regions remain relatively underdeveloped. In this study, we proposed a data-driven framework to infer effective connectivity by perturbing trained neural networks. Specifically, we trained neural networks (i.e., CNN, vanilla RNN, GRU, LSTM, and Transformer) to predict future EEG signals from historical data and perturbed the networks' input to obtain the effective connectivity (EC) between the perturbed EEG channel and the rest of the channels. The EC reflects the causal impact of perturbing one node on the others. The performance was tested on synthetic EEG generated by a biologically plausible Jansen-Rit model. CNN and Transformer obtained the best performance on both 3-channel and 90-channel synthetic EEG data, outperforming the classical Granger causality method. Our work demonstrated the potential of perturbing an artificial neural network, trained to predict future system dynamics, to uncover the underlying causal structure.
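Summary
Identifying the causal relationships between distinct brain areas, known as effective connectivity, is key to understanding the brain's information processing and cognitive functions. Electroencephalogram (EEG) signals exhibit complex dynamics and interactions between different brain regions. However, methods for characterizing nonlinear causal interactions among multiple brain regions remain relatively underdeveloped. In this study, we propose a data-driven framework for inferring effective connectivity. Specifically, we train neural networks (CNN, vanilla RNN, GRU, LSTM, and Transformer) to predict future EEG signals from historical data, and perturb the networks' inputs to obtain the effective connectivity (EC). The EC represents the influence of perturbing one node on the other nodes. We tested performance on synthetic EEG data generated by a biologically plausible Jansen-Rit model. CNN and Transformer performed best on both 3-channel and 90-channel synthetic EEG data, outperforming the classical Granger causality method. Our work shows that an artificial neural network trained to predict future system dynamics can be perturbed to uncover the underlying causal structure.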
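The perturbation step can be illustrated with a stand-in predictor. In the paper the predictor is a trained CNN, RNN, or Transformer over a history window; here, as a hypothetical simplification, it is a known one-step linear map, so the recovered EC can be checked against the true coupling matrix.

```python
def predict(history, weights):
    """Stand-in for a trained one-step EEG predictor: next[i] = sum_j W[i][j] * x[j]."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, history)) for row in weights]


def effective_connectivity(predictor, x, delta=1e-3):
    """EC[j][i]: change in predicted channel i per unit perturbation of input channel j."""
    base = predictor(x)
    n = len(x)
    ec = []
    for j in range(n):
        x_pert = list(x)
        x_pert[j] += delta
        pert = predictor(x_pert)
        ec.append([(pert[i] - base[i]) / delta for i in range(n)])
    return ec


# Toy ground truth: channel 0 drives channel 1; nothing drives channel 0.
W = [
    [0.5, 0.0, 0.0],
    [0.8, 0.5, 0.0],
    [0.0, 0.0, 0.5],
]
ec = effective_connectivity(lambda x: predict(x, W), [0.2, -0.1, 0.4])
print([[round(v, 3) for v in row] for row in ec])
```

For a linear predictor, EC[j][i] recovers W[i][j] exactly; with a nonlinear network the response depends on the input state and perturbation size, so in practice one would average over many inputs.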
Sig-Splines: universal approximation and convex calibration of time series generative models
results: The new model not only retains the universality of neural networks but also introduces convexity in its parameters.
Abstract
We propose a novel generative model for multivariate discrete-time time series data. Drawing inspiration from the construction of neural spline flows, our algorithm incorporates linear transformations and the signature transform as a seamless substitution for traditional neural networks. This approach enables us to achieve not only the universality property inherent in neural networks but also introduces convexity in the model's parameters.
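Summary
We propose a new generative model for multivariate discrete-time time series data. Our algorithm is inspired by the construction of neural spline flows and replaces neural networks with linear transformations and the signature transform. This approach not only achieves the universality property of neural networks but also introduces convexity in the model's parameters.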
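As background on the signature transform (not the Sig-Splines model itself), the sketch below computes the first two signature levels of a piecewise-linear path using Chen's identity. Functions that are linear in these features are linear in their coefficients, which is the kind of structure that can make a calibration objective convex.

```python
def signature_depth2(path):
    """Level-1 and level-2 signature terms of a piecewise-linear path (list of d-dim points).

    Uses Chen's identity segment by segment:
      S2 <- S2 + S1 (x) delta + 0.5 * delta (x) delta,  then  S1 <- S1 + delta.
    """
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for prev, curr in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, curr)]
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        s1 = [a + b for a, b in zip(s1, delta)]
    return s1, s2


s1, s2 = signature_depth2([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
print(s1)  # [1.0, 1.0]
print(s2)  # [[0.5, 1.0], [0.0, 0.5]]
```

Level 1 is the total increment; the antisymmetric part of level 2, 0.5 * (s2[0][1] - s2[1][0]) = 0.5 here, is the Lévy area, which captures the order in which the coordinates moved.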
Towards Building More Robust Models with Frequency Bias
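methods: A plug-and-play Frequency Preference Control Module that adaptively reconfigures the low- and high-frequency components of intermediate feature representations to make better use of frequency information.
results: Experiments show that the module can be easily incorporated into adversarial training frameworks and improves model robustness across different architectures and datasets. The study also examines how the frequency bias of robust models affects the adversarial training process and final robustness, revealing interesting insights.
Abstract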
The vulnerability of deep neural networks to adversarial samples has been a major impediment to their broad applications, despite their success in various fields. Recently, some works suggested that adversarially-trained models emphasize the importance of low-frequency information to achieve higher robustness. While several attempts have been made to leverage this frequency characteristic, they have all faced the issue that applying low-pass filters directly to input images leads to irreversible loss of discriminative information and poor generalizability to datasets with distinct frequency features. This paper presents a plug-and-play module called the Frequency Preference Control Module that adaptively reconfigures the low- and high-frequency components of intermediate feature representations, providing better utilization of frequency in robust learning. Empirical studies show that our proposed module can be easily incorporated into any adversarial training framework, further improving model robustness across different architectures and datasets. Additionally, experiments were conducted to examine how the frequency bias of robust models impacts the adversarial training process and its final robustness, revealing interesting insights.
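Summary
The vulnerability of deep neural networks to adversarial samples is a major obstacle to their broad application, despite their success in many fields. Recent studies suggest that adversarially trained models rely on low-frequency information to achieve higher robustness. However, applying low-pass filters directly to input images causes irreversible loss of discriminative information and poor generalization, which limits their use on datasets with different frequency characteristics. This paper proposes a plug-in module called the Frequency Preference Control Module, which adaptively adjusts the low- and high-frequency components of intermediate feature representations to make better use of frequency information in robust learning. Experiments show that the module can easily be combined with any adversarial training framework and improves model robustness across different architectures and datasets. We also ran experiments on how the frequency bias of robust models affects the adversarial training process and final robustness, which yielded interesting findings.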
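The decompose, reweight, recombine idea can be sketched in one dimension. This is not the paper's module, which operates on 2-D feature maps with learned, input-adaptive weights; here `w_low`, `w_high`, and `cutoff` are hypothetical fixed parameters, and a plain O(n²) DFT keeps the example dependency-free.

```python
import cmath


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def reweight_frequencies(feature, cutoff, w_low, w_high):
    """Split a 1-D feature into low/high-frequency parts and recombine with weights."""
    n = len(feature)
    spec = dft(feature)
    # Bins 0..cutoff and their mirrored negative frequencies count as "low".
    is_low = [k <= cutoff or k >= n - cutoff for k in range(n)]
    low = idft([s if m else 0 for s, m in zip(spec, is_low)])
    high = idft([0 if m else s for s, m in zip(spec, is_low)])
    return [w_low * a + w_high * b for a, b in zip(low, high)], low, high


signal = [1.0, 3.0, 1.0, 3.0, 1.0, 3.0, 1.0, 3.0]  # mean 2 plus a Nyquist-rate oscillation
mixed, low, high = reweight_frequencies(signal, cutoff=1, w_low=1.0, w_high=0.5)
```

Unlike a hard low-pass filter on the input, down-weighting (rather than deleting) the high band keeps the transform invertible, which matches the motivation stated above.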
Reinforcing POD based model reduction techniques in reaction-diffusion complex networks using stochastic filtering and pattern recognition
paper_authors: Abhishek Ajayakumar, Soumyendu Raha
for: Modeling complex networks, whose high dimensionality can make them difficult to analyze.
methods: An algorithmic framework that combines techniques from pattern recognition and stochastic filtering theory to enhance the output of dimensionality reduction models.
results: The proposed method improves the accuracy of the surrogate model under perturbed inputs.
Abstract
Complex networks are used to model many real-world systems. However, the dimensionality of these systems can make them challenging to analyze. Dimensionality reduction techniques like proper orthogonal decomposition (POD) can be used in such cases. However, these models are susceptible to perturbations in the input data. We propose an algorithmic framework that combines techniques from pattern recognition (PR) and stochastic filtering theory to enhance the output of such models. The results of our study show that our method can improve the accuracy of the surrogate model under perturbed inputs. Deep Neural Networks (DNNs) are susceptible to adversarial attacks. However, recent research has revealed that neural Ordinary Differential Equations (ODEs) exhibit robustness in specific applications. We benchmark our algorithmic framework with a Neural ODE-based approach as a reference.
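Summary
Complex networks are used to model many real-world systems. However, the dimensionality of these systems can make them difficult to analyze. Dimensionality reduction techniques such as POD can be used in such cases, but these models are sensitive to perturbations in the input data. We propose an algorithmic framework that combines pattern recognition (PR) with stochastic filtering theory to enhance the output of such models. Our results show that the method improves the accuracy of the model when the input data are perturbed. Deep neural networks (DNNs) are vulnerable to adversarial attacks, but recent research has found that neural ordinary differential equations (ODEs) are robust in specific applications. We benchmark our algorithmic framework against a neural ODE-based approach as a reference.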
FedBug: A Bottom-Up Gradual Unfreezing Framework for Federated Learning
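results: Rigorous experiments and theoretical analysis show that FedBug performs well and reliably across multiple datasets, training conditions, and network architectures.
Abstract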
Federated Learning (FL) offers a collaborative training framework, allowing multiple clients to contribute to a shared model without compromising data privacy. Due to the heterogeneous nature of local datasets, updated client models may overfit and diverge from one another, commonly known as the problem of client drift. In this paper, we propose FedBug (Federated Learning with Bottom-Up Gradual Unfreezing), a novel FL framework designed to effectively mitigate client drift. FedBug adaptively leverages the client model parameters, distributed by the server at each global round, as the reference points for cross-client alignment. Specifically, on the client side, FedBug begins by freezing the entire model, then gradually unfreezes the layers, from the input layer to the output layer. This bottom-up approach allows models to train the newly thawed layers to project data into a latent space, wherein the separating hyperplanes remain consistent across all clients. We theoretically analyze FedBug in a novel over-parameterization FL setup, revealing its superior convergence rate compared to FedAvg. Through comprehensive experiments, spanning various datasets, training conditions, and network architectures, we validate the efficacy of FedBug. Our contributions encompass a novel FL framework, theoretical analysis, and empirical validation, demonstrating the wide potential and applicability of FedBug.
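Summary
Federated learning (FL) provides a collaborative training framework in which multiple clients contribute to a shared model without compromising data privacy. Because local datasets are heterogeneous, updated client models may overfit and diverge from one another, a problem known as client drift. In this paper, we propose FedBug (Federated Learning with Bottom-Up Gradual Unfreezing), a new FL framework that effectively mitigates client drift. FedBug uses the model parameters distributed by the server in each global round as reference points for cross-client alignment. Specifically, on the client side, FedBug first freezes the entire model and then gradually unfreezes the layers, from the input layer to the output layer. This bottom-up approach lets the model train the newly unfrozen layers to project data into a latent space in which the separating hyperplanes remain consistent across all clients. We analyze FedBug theoretically in a novel over-parameterized FL setting and show that it converges faster than FedAvg. Experiments across a variety of datasets, training conditions, and network architectures validate FedBug's effectiveness. Our contributions include a new FL framework, theoretical analysis, and empirical validation, demonstrating FedBug's broad potential and applicability.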
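The client-side schedule can be written as a function of the local step. This is a hypothetical parameterization, not FedBug's published code: layers are unfrozen one at a time from the input (index 0) toward the output, spread evenly over the first `unfreeze_steps` local steps, after which the whole model trains.

```python
def trainable_layers(step, num_layers, unfreeze_steps):
    """Indices of layers trainable at a local step under bottom-up gradual unfreezing.

    The input layer (index 0) is unfrozen first, at step 0; one more layer is unfrozen
    every unfreeze_steps / num_layers steps; after unfreeze_steps, all layers train.
    """
    if step >= unfreeze_steps:
        return list(range(num_layers))
    count = 1 + (step * num_layers) // unfreeze_steps
    return list(range(min(count, num_layers)))


for t in (0, 9, 10, 25, 39, 40, 99):
    print(t, trainable_layers(t, num_layers=4, unfreeze_steps=40))
```

While the upper layers are still frozen at the server-distributed values, every client is fitting its lower layers toward the same fixed head, which is the alignment effect the summary describes.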
Space Engage: Collaborative Space Supervision for Contrastive-based Semi-Supervised Semantic Segmentation
results: On two public benchmarks, the method achieves performance competitive with existing methods.
Abstract
Semi-Supervised Semantic Segmentation (S4) aims to train a segmentation model with limited labeled images and a substantial volume of unlabeled images. To improve the robustness of representations, powerful methods introduce a pixel-wise contrastive learning approach in latent space (i.e., representation space) that aggregates the representations to their prototypes in a fully supervised manner. However, previous contrastive-based S4 methods merely rely on the supervision from the model's output (logits) in logit space during unlabeled training. In contrast, we utilize the outputs in both logit space and representation space to obtain supervision in a collaborative way. The supervision from two spaces plays two roles: 1) reduces the risk of over-fitting to incorrect semantic information in logits with the help of representations; 2) enhances the knowledge exchange between the two spaces. Furthermore, unlike previous approaches, we use the similarity between representations and prototypes as a new indicator to tilt training toward under-performing representations and achieve a more efficient contrastive learning process. Results on two public benchmarks demonstrate the competitive performance of our method compared with state-of-the-art methods.
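Summary
Semi-supervised semantic segmentation (S4) aims to train a segmentation model with a limited number of labeled images and a large number of unlabeled images. To improve the robustness of representations, strong methods introduce pixel-wise contrastive learning in the representation space, aggregating representations toward their prototypes. However, previous contrastive-based S4 methods rely only on the model's outputs (logits) as supervision during unlabeled training. In contrast, we obtain supervision collaboratively from both spaces, which serves two purposes: 1) reducing the risk of overfitting to incorrect semantic information in the logits with the help of the representations; 2) enhancing knowledge exchange between the two spaces. In addition, unlike previous methods, we use the similarity between representations and prototypes as a new indicator to tilt training toward under-performing representations and make contrastive learning more efficient. Comparisons on two public benchmarks show that our method achieves competitive performance against state-of-the-art methods.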
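Pixel-wise contrastive learning toward class prototypes is commonly written as an InfoNCE-style loss over cosine similarities. The sketch below shows that loss for a single pixel representation; `hardness_weight` is a hypothetical illustration of using representation-prototype similarity as a training indicator, not the paper's exact weighting, and the temperature `tau=0.1` is an assumption.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def prototype_contrastive_loss(rep, label, prototypes, tau=0.1):
    """InfoNCE-style loss pulling one pixel representation toward its class prototype."""
    logits = [cosine(rep, p) / tau for p in prototypes]
    m = max(logits)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[label]


def hardness_weight(rep, label, prototypes):
    """Hypothetical re-weighting: pixels far from their prototype get a larger weight."""
    return 1.0 - cosine(rep, prototypes[label])


prototypes = [[1.0, 0.0], [0.0, 1.0]]
print(prototype_contrastive_loss([1.0, 0.0], 0, prototypes))  # close to 0
print(prototype_contrastive_loss([1.0, 0.0], 1, prototypes))  # close to 10
```

In the semi-supervised setting the `label` for an unlabeled pixel would come from a pseudo-label, which is exactly where supervision from the two spaces can disagree.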
Information Retrieval Meets Large Language Models: A Strategic Report from Chinese IR Community
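results: The workshop concluded that LLMs can improve the effectiveness and experience of information retrieval, while challenges such as computational costs, credibility, and ethics remain. It also proposed a new technical paradigm for information retrieval that combines humans, LLMs, and IR models.
Abstract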
The research field of Information Retrieval (IR) has evolved significantly, expanding beyond traditional search to meet diverse user information needs. Recently, Large Language Models (LLMs) have demonstrated exceptional capabilities in text understanding, generation, and knowledge inference, opening up exciting avenues for IR research. LLMs not only facilitate generative retrieval but also offer improved solutions for user understanding, model evaluation, and user-system interactions. More importantly, the synergistic relationship among IR models, LLMs, and humans forms a new technical paradigm that is more powerful for information seeking. IR models provide real-time and relevant information, LLMs contribute internal knowledge, and humans play a central role of demanders and evaluators to the reliability of information services. Nevertheless, significant challenges exist, including computational costs, credibility concerns, domain-specific limitations, and ethical considerations. To thoroughly discuss the transformative impact of LLMs on IR research, the Chinese IR community conducted a strategic workshop in April 2023, yielding valuable insights. This paper provides a summary of the workshop's outcomes, including the rethinking of IR's core values, the mutual enhancement of LLMs and IR, the proposal of a novel IR technical paradigm, and open challenges.
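Summary
The research field of information retrieval (IR) has developed significantly in recent years, expanding beyond traditional search to meet diverse user information needs. Recently, large language models (LLMs) have shown outstanding capabilities in text understanding, generation, and knowledge inference, opening exciting new avenues for IR research. LLMs not only enable generative retrieval but also offer improved solutions for user understanding, model evaluation, and user-system interaction. Moreover, the synergy among IR models, LLMs, and humans forms a new and more powerful technical paradigm for information seeking: IR models provide real-time, relevant information, LLMs contribute internal knowledge, and humans play the central role of demanders and evaluators who ensure the reliability of information services. However, challenges remain, including computational costs, credibility, domain-specific limitations, and ethical considerations. To thoroughly discuss the transformative impact of LLMs on IR research, the Chinese IR community held a strategic workshop in April 2023 and obtained valuable outcomes. This paper summarizes the workshop's results, including rethinking IR's core values, the mutual enhancement of LLMs and IR, the proposal of a novel IR technical paradigm, and open challenges.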
Enhancing conversational quality in language learning chatbots: An evaluation of GPT4 for ASR error correction
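results: Correcting transcripts with GPT4 improves conversation quality even though the word error rate (WER) increases. GPT4 also outperforms standard error correction methods without requiring in-domain training data.
Abstract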
The integration of natural language processing (NLP) technologies into educational applications has shown promising results, particularly in the language learning domain. Recently, many spoken open-domain chatbots have been used as speaking partners, helping language learners improve their language skills. However, one of the significant challenges is the high word-error-rate (WER) when recognizing non-native/non-fluent speech, which interrupts conversation flow and leads to disappointment for learners. This paper explores the use of GPT4 for ASR error correction in conversational settings. In addition to WER, we propose to use semantic textual similarity (STS) and next response sensibility (NRS) metrics to evaluate the impact of error correction models on the quality of the conversation. We find that transcriptions corrected by GPT4 lead to higher conversation quality, despite an increase in WER. GPT4 also outperforms standard error correction methods without the need for in-domain training data.
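Summary
Integrating natural language processing (NLP) technologies into educational applications has shown promising results, especially in language learning. Recently, many spoken open-domain chatbots have been used as conversation partners to help language learners improve their language skills. However, a major challenge is the high word error rate (WER) when recognizing non-native or non-fluent speech, which interrupts the conversation flow and disappoints learners. This paper explores using GPT4 for ASR error correction in conversational settings. In addition to WER, we propose using semantic textual similarity (STS) and next response sensibility (NRS) metrics to evaluate how error correction models affect conversation quality. We find that transcriptions corrected by GPT4 lead to higher conversation quality, even though WER increases. GPT4 also outperforms standard error correction methods without needing in-domain training data.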
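WER is the word-level edit distance between the reference and the hypothesis, divided by the reference length. The minimal implementation below (function name and example sentences are illustrative) shows why a meaning-preserving correction can still raise WER: WER counts surface edits, not meaning.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j]: edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)


ref = "i would like to go to the park"
print(word_error_rate(ref, "i would like go to the park"))  # 0.125 (one deletion)
print(word_error_rate(ref, "i'd like to go to the park"))   # 0.25 (same meaning, two edits)
```

This gap between surface accuracy and meaning is the motivation for adding semantic metrics such as STS and NRS alongside WER.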
results: The paper shows that absolute constraints can keep AI systems from causing catastrophic outcomes without making them behave irrationally.
Abstract
This paper argues that training AI systems with absolute constraints -- which forbid certain acts irrespective of the amount of value they might produce -- may make considerable progress on many AI safety problems in principle. First, it provides a guardrail for avoiding the very worst outcomes of misalignment. Second, it could prevent AIs from causing catastrophes for the sake of very valuable consequences, such as replacing humans with a much larger number of beings living at a higher welfare level. Third, it makes systems more corrigible, allowing creators to make corrective interventions in them, such as altering their objective functions or shutting them down. And fourth, it helps systems explore their environment more safely by prohibiting them from exploring especially dangerous acts. I offer a decision-theoretic formalization of absolute constraints, improving on existing models in the literature, and use this model to prove some results about the training and behavior of absolutist AIs. I conclude by showing that, although absolutist AIs will not maximize expected value, they will not be susceptible to behaving irrationally, and they will not (contra coherence arguments) face environmental pressure to become expected-value maximizers.
Summary
The paper offers a decision-theoretic formalization of absolute constraints, improving on existing models in the literature, and uses this model to prove results about the training and behavior of absolutist AIs. The author concludes that, although absolutist AIs will not maximize expected value, they will not behave irrationally, and they will not face environmental pressure to become expected-value maximizers.
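A toy lexicographic rule illustrates what makes a constraint absolute rather than a finite penalty. This is not the paper's decision-theoretic formalization; the action names and values below are hypothetical, chosen to echo the shutdown example in the abstract.

```python
def choose_action(actions, forbidden, expected_value):
    """Lexicographic choice: discard forbidden acts first, then maximize expected value.

    No amount of value can make a forbidden act eligible.
    """
    permitted = [a for a in actions if a not in forbidden]
    if not permitted:
        return None  # refuse rather than violate a constraint
    return max(permitted, key=expected_value)


def choose_with_penalty(actions, forbidden, expected_value, penalty):
    """Contrast: a finite penalty can be outweighed by a large enough value."""
    return max(actions, key=lambda a: expected_value(a) - (penalty if a in forbidden else 0.0))


values = {"shutdown_resist": 1e9, "comply": 5.0, "idle": 0.0}
print(choose_action(values, {"shutdown_resist"}, values.get))                      # comply
print(choose_with_penalty(values, {"shutdown_resist"}, values.get, penalty=1e6))  # shutdown_resist
```

The absolutist agent is not an expected-value maximizer over all acts, yet its choices remain well defined and consistent: it maximizes value within the permitted set.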
Multi-Grained Multimodal Interaction Network for Entity Linking
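results: Experimental results on three public benchmark datasets show that the solution performs strongly, and ablation studies verify the effectiveness of the designed modules.
Abstract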
The multimodal entity linking (MEL) task, which aims at resolving ambiguous mentions to a multimodal knowledge graph, has attracted wide attention in recent years. Although large efforts have been made to explore the complementary effect among multiple modalities, existing methods may fail to fully absorb the comprehensive expression of abbreviated textual context and implicit visual indication. Even worse, the inevitable noisy data may cause inconsistency across modalities during the learning process, which severely degrades performance. To address these issues, in this paper, we propose a novel Multi-GraIned Multimodal InteraCtion Network $\textbf{(MIMIC)}$ framework for solving the MEL task. Specifically, the unified inputs of mentions and entities are first encoded by textual/visual encoders separately, to extract global descriptive features and local detailed features. Then, to derive the similarity matching score for each mention-entity pair, we devise three interaction units to comprehensively explore the intra-modal interaction and inter-modal fusion among features of entities and mentions. In particular, three modules, namely the Text-based Global-Local interaction Unit (TGLU), Vision-based DuaL interaction Unit (VDLU) and Cross-Modal Fusion-based interaction Unit (CMFU), are designed to capture and integrate the fine-grained representation lying in abbreviated text and implicit visual cues. Afterwards, we introduce a unit-consistency objective function via contrastive learning to avoid inconsistency and model degradation. Experimental results on three public benchmark datasets demonstrate that our solution outperforms various state-of-the-art baselines, and ablation studies verify the effectiveness of the designed modules.
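Summary
The multimodal entity linking (MEL) task, which aims to map ambiguous mentions to a multimodal knowledge graph, has attracted wide attention in recent years. Although considerable effort has gone into exploring the complementary effects among modalities, existing methods may fail to fully capture abbreviated textual context and implicit visual cues. Worse, noisy data can cause inconsistency across modalities during learning, which severely degrades performance. To address these issues, this paper proposes a new Multi-Grained Multimodal Interaction Network (MIMIC) framework for the MEL task. Specifically, the unified inputs of mentions and entities are first encoded separately by textual and visual encoders to extract global descriptive features and local detailed features. Then, to compute a matching score for each mention-entity pair, we design three interaction units that comprehensively explore intra-modal interaction and cross-modal fusion between entity and mention features: the Text-based Global-Local interaction Unit (TGLU), the Vision-based DuaL interaction Unit (VDLU), and the Cross-Modal Fusion-based interaction Unit (CMFU), which capture and integrate the fine-grained representations in abbreviated text and implicit visual cues. Next, we introduce a unit-consistency objective function based on contrastive learning to avoid inconsistency and model degradation. Experimental results show that our solution performs strongly on three public benchmark datasets, and ablation studies confirm the effectiveness of the designed modules.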
Two Tales of Platoon Intelligence for Autonomous Mobility Control: Enabling Deep Learning Recipes
paper_authors: Soohyun Park, Haemin Lee, Chanyoung Park, Soyi Jung, Minseok Choi, Joongheon Kim
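for: Autonomous mobility control and resource management for autonomous vehicles and UAVs.
methods: Multi-agent reinforcement learning (MARL) and the neural Myerson auction.
results: Multiple agents can act in a distributed manner, while truthfulness among agents is guaranteed and revenue is optimized in highly dynamic systems.
Abstract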
This paper presents recent deep learning-based achievements that address autonomous mobility control and efficient resource management for autonomous vehicles and UAVs, namely (i) multi-agent reinforcement learning (MARL) and (ii) the neural Myerson auction. As a representative example, the communication network (CommNet), one of the most popular MARL algorithms, is introduced to enable multiple agents to take actions in a distributed manner toward their shared goals by training all agents' states and actions in a single neural network. Moreover, the neural Myerson auction guarantees truthfulness among multiple agents and achieves the optimal revenue of highly dynamic systems. Therefore, we survey recent studies on autonomous mobility control based on MARL and the neural Myerson auction. Furthermore, we emphasize that integrating MARL and the neural Myerson auction is expected to be critical for efficient and trustworthy autonomous mobility services.
Summary
This paper introduces deep learning techniques that address resource management and autonomous mobility control for autonomous vehicles and UAVs. These methods include:
1. Multi-agent reinforcement learning (MARL): enables multiple agents to learn and act in a distributed manner to achieve shared goals.
2. Neural Myerson auction: guarantees truthfulness among multiple agents and achieves optimal revenue in highly dynamic systems.
The paper surveys recent studies on autonomous mobility control based on MARL and the neural Myerson auction, and emphasizes that integrating the two approaches will be key to efficient and trustworthy autonomous mobility services.
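The neural Myerson auction builds on Myerson's classical optimal auction. As background only (the neural variant learns its transforms from data), the sketch below assumes i.i.d. Uniform[0, 1] valuations, where the virtual value is φ(v) = 2v - 1 and the revenue-optimal mechanism reduces to a second-price auction with reserve price 0.5. Bidding one's true value is a dominant strategy in this mechanism.

```python
def virtual_value_uniform(v: float) -> float:
    """Myerson virtual value for Uniform[0, 1]: phi(v) = v - (1 - F(v)) / f(v) = 2v - 1."""
    return 2.0 * v - 1.0


def myerson_uniform(bids):
    """Revenue-optimal single-item auction for i.i.d. Uniform[0, 1] bidders.

    The highest bid wins if its virtual value is positive (bid above the reserve 0.5).
    The winner pays the smallest bid that would still win: max(reserve, second-highest bid).
    Returns (winner_index, payment), or (None, 0.0) if the item is not sold.
    """
    reserve = 0.5  # solves phi(r) = 0
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    top = order[0]
    if virtual_value_uniform(bids[top]) <= 0:
        return None, 0.0
    second = bids[order[1]] if len(bids) > 1 else 0.0
    return top, max(reserve, second)


print(myerson_uniform([0.3, 0.9, 0.7]))  # (1, 0.7)
print(myerson_uniform([0.2, 0.4]))       # (None, 0.0)
print(myerson_uniform([0.6]))            # (0, 0.5)
```

When valuation distributions are unknown or agents are asymmetric, the virtual-value transforms are no longer available in closed form, which is the gap a learned (neural) version aims to fill.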
RaTE: a Reproducible automatic Taxonomy Evaluation by Filling the Gap
for: automatic taxonomy construction (ATC) algorithm evaluation
methods: using a large pre-trained language model for label-free taxonomy scoring
results: 1) RaTE correlates well with human judgments, and 2) artificially degrading a taxonomy lowers its RaTE score.
Abstract
Taxonomies are an essential knowledge representation, yet most studies on automatic taxonomy construction (ATC) resort to manual evaluation to score proposed algorithms. We argue that automatic taxonomy evaluation (ATE) is just as important as taxonomy construction. We propose RaTE, an automatic label-free taxonomy scoring procedure, which relies on a large pre-trained language model. We apply our evaluation procedure to three state-of-the-art ATC algorithms with which we built seven taxonomies from the Yelp domain, and show that 1) RaTE correlates well with human judgments and 2) artificially degrading a taxonomy leads to decreasing RaTE score.
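Summary
Taxonomies are an important form of knowledge representation, yet most studies on automatic taxonomy construction (ATC) still rely on manual evaluation to score proposed algorithms. We argue that automatic taxonomy evaluation (ATE) is just as important. We propose RaTE, an automatic, label-free taxonomy scoring procedure based on a large pre-trained language model. We used three state-of-the-art ATC algorithms to build seven taxonomies from the Yelp domain and show that 1) RaTE correlates well with human judgments, and 2) artificially degrading a taxonomy lowers its RaTE score.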
STRAPPER: Preference-based Reinforcement Learning via Self-training Augmentation and Peer Regularization
paper_authors: Yachen Kang, Li He, Jinxin Liu, Zifeng Zhuang, Donglin Wang
for: Learning a complex reward function from binary human preferences using preference-based reinforcement learning (PbRL).
methods: A self-training method with a proposed peer regularization that overcomes the similarity trap in PbRL, where consistency regularization improperly increases the consistency of the model's predictions between segment pairs and reduces confidence in reward learning.
results: The approach learns a variety of locomotion and robotic manipulation behaviors well under different semi-supervised alternatives and peer regularization, demonstrating its effectiveness for large-scale applications.
Abstract
Preference-based reinforcement learning (PbRL) promises to learn a complex reward function from binary human preferences. However, such a human-in-the-loop formulation requires considerable human effort to assign preference labels to segment pairs, which hinders its large-scale application. Recent approaches have tried to reuse unlabeled segments, which implicitly elucidates the distribution of segments and thereby alleviates the human effort. Consistency regularization has also been considered as a way to improve the performance of semi-supervised learning. However, we notice that, unlike general classification tasks, PbRL exhibits a unique phenomenon that we define in this paper as the similarity trap. Intuitively, humans can have diametrically opposite preferences for similar segment pairs, and such similarity may cause consistency regularization to fail in PbRL. Because of the similarity trap, consistency regularization improperly increases the consistency of the model's predictions between segment pairs and thus reduces the confidence in reward learning, since the augmented distribution does not match the original one in PbRL. To overcome this issue, we present a self-training method along with our proposed peer regularization, which penalizes the reward model for memorizing uninformative labels and yields confident predictions. Empirically, we demonstrate that our approach learns a variety of locomotion and robotic manipulation behaviors well using different semi-supervised alternatives and peer regularization.
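The abstract does not spell out the preference model. PbRL methods commonly score a segment pair with a Bradley-Terry model over predicted returns and then train the reward model with cross-entropy against the human label. The sketch below assumes that formulation; it is not taken from STRAPPER itself. Per-step predicted rewards are plain Python lists.

```python
import math
from typing import Sequence


def preference_prob(rewards_a: Sequence[float], rewards_b: Sequence[float]) -> float:
    """P(segment A preferred over B) under a Bradley-Terry model on summed rewards."""
    diff = sum(rewards_a) - sum(rewards_b)
    # Numerically stable logistic function of the return difference.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    z = math.exp(diff)
    return z / (1.0 + z)


def preference_loss(rewards_a: Sequence[float], rewards_b: Sequence[float], label: float) -> float:
    """Binary cross-entropy; label is 1.0 if the human preferred A, 0.0 if B."""
    p = preference_prob(rewards_a, rewards_b)
    eps = 1e-12  # avoids log(0) for extremely confident predictions
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


# Two nearly identical segments receive nearly identical predicted returns, so the
# model outputs p close to 0.5 and pays about log(2) whichever label the human gave.
seg_a = [0.50, 0.40, 0.30]
seg_b = [0.50, 0.40, 0.31]
print(round(preference_loss(seg_a, seg_b, 1.0), 3), round(preference_loss(seg_a, seg_b, 0.0), 3))
```

Under this assumption, near-duplicate segments give the reward model almost no signal. This offers one way to build intuition for the similarity trap described above.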
Amazon-M2: A Multilingual Multi-locale Shopping Session Dataset for Recommendation and Text Generation
for: This paper is written for those who are interested in improving customer shopping experiences and personalized recommendations in e-commerce.
methods: The paper presents Amazon-M2, a multilingual session dataset of millions of user sessions from six locales. It uses the dataset to benchmark algorithms on next-product recommendation, next-product recommendation with domain shifts, and next-product title generation.
results: The paper introduces these three tasks and benchmarks a range of algorithms on the proposed dataset, drawing new insights for further research and practice. Based on the dataset, the authors also hosted a competition in the KDD CUP 2023, which attracted thousands of users and submissions.
Abstract
Modeling customer shopping intentions is a crucial task for e-commerce, as it directly impacts user experience and engagement. Thus, accurately understanding customer preferences is essential for providing personalized recommendations. Session-based recommendation, which utilizes customer session data to predict their next interaction, has become increasingly popular. However, existing session datasets have limitations in terms of item attributes, user diversity, and dataset scale. As a result, they cannot comprehensively capture the spectrum of user behaviors and preferences. To bridge this gap, we present the Amazon Multilingual Multi-locale Shopping Session Dataset, namely Amazon-M2. It is the first multilingual dataset consisting of millions of user sessions from six different locales, where the major languages of products are English, German, Japanese, French, Italian, and Spanish. Remarkably, the dataset can help us enhance personalization and understanding of user preferences, which can benefit various existing tasks as well as enable new tasks. To test the potential of the dataset, we introduce three tasks in this work: (1) next-product recommendation, (2) next-product recommendation with domain shifts, and (3) next-product title generation. With the above tasks, we benchmark a range of algorithms on our proposed dataset, drawing new insights for further research and practice. In addition, based on the proposed dataset and tasks, we hosted a competition in the KDD CUP 2023 and have attracted thousands of users and submissions. The winning solutions and the associated workshop can be accessed at our website https://kddcup23.github.io/.
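For task (1), next-product recommendation, a very simple session-based reference point counts item-to-item transitions across sessions. It then recommends the items that most often followed the session's last item. This is an illustrative baseline only and is not necessarily one of the algorithms benchmarked in the paper. The session format (a list of product IDs per session) and the IDs themselves are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def build_transitions(sessions: List[List[str]]) -> Dict[str, Counter]:
    """Count how often product b directly follows product a across all sessions."""
    transitions: Dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        for a, b in zip(session, session[1:]):
            transitions[a][b] += 1
    return transitions


def recommend_next(session: List[str], transitions: Dict[str, Counter], k: int = 3) -> List[str]:
    """Top-k products that most often followed the session's last product."""
    if not session:
        return []
    counts = transitions.get(session[-1], Counter())
    seen = set(session)  # skip products already viewed in this session
    return [item for item, _ in counts.most_common() if item not in seen][:k]


# Hypothetical training sessions (product IDs are made up).
sessions = [["A", "B", "C"], ["A", "B", "D"], ["X", "B", "C"]]
transitions = build_transitions(sessions)
print(recommend_next(["A", "B"], transitions))
```

Excluding already-viewed products is a design choice. Whether it helps depends on how often users revisit products within a session.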
PubMed and Beyond: Recent Advances and Best Practices in Biomedical Literature Search
paper_authors: Qiao Jin, Robert Leaman, Zhiyong Lu
for: This paper aims to help readers efficiently fulfill their information needs in biomedicine by providing a comprehensive survey of literature search tools tailored to specific information needs.
methods: The paper examines widely used PubMed search engine and describes literature search tools catering to five specific information needs, including identifying high-quality clinical research, retrieving gene-related information, searching by meaning, locating related articles, and mining literature to discover associations between concepts.
results: The paper provides a comprehensive view of biomedical literature search functionalities with 36 publicly available tools, and offers practical considerations and best practices for choosing and using these tools. Additionally, the paper provides a perspective on the future of literature search engines considering recent breakthroughs in large language models such as ChatGPT.
Abstract
Biomedical research yields a wealth of information, much of which is only accessible through the literature. Consequently, literature search is an essential tool for building on prior knowledge in clinical and biomedical research. Although recent improvements in artificial intelligence have expanded functionality beyond keyword-based search, these advances may be unfamiliar to clinicians and researchers. In response, we present a survey of literature search tools tailored to both general and specific information needs in biomedicine, with the objective of helping readers efficiently fulfill their information needs. We first examine the widely used PubMed search engine, discussing recent improvements and continued challenges. We then describe literature search tools catering to five specific information needs: 1. Identifying high-quality clinical research for evidence-based medicine. 2. Retrieving gene-related information for precision medicine and genomics. 3. Searching by meaning, including natural language questions. 4. Locating related articles with literature recommendation. 5. Mining literature to discover associations between concepts such as diseases and genetic variants. Additionally, we cover practical considerations and best practices for choosing and using these tools. Finally, we provide a perspective on the future of literature search engines, considering recent breakthroughs in large language models such as ChatGPT. In summary, our survey provides a comprehensive view of biomedical literature search functionalities with 36 publicly available tools.
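Readers who want to query PubMed programmatically can use NCBI's E-utilities API, whose ESearch endpoint returns the PubMed IDs matching a query. The survey does not prescribe this route. The helper below only builds the request URL and does not fetch it. A real client should also follow NCBI's usage guidelines, such as rate limits and API keys.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that asks for up to `retmax` matching PubMed IDs as JSON."""
    params = {
        "db": "pubmed",      # search the PubMed database
        "term": term,        # query in PubMed search syntax (boolean operators, field tags)
        "retmax": retmax,    # maximum number of IDs to return
        "retmode": "json",   # response format
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(pubmed_search_url("BRCA1 AND breast cancer", retmax=5))
```

The returned IDs can then be passed to other E-utilities endpoints to retrieve records. The more specialized tools surveyed above layer ranking, semantic search, or concept mining on top of this kind of keyword retrieval.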
What’s meant by explainable model: A Scoping Review
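results: The study found that 81% of the application papers that describe their approach as an explainable model did not conduct any evaluation of the XAI method they used. This suggests that authors tend to assume these explanations are already sufficiently evaluated.
Abstract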
We often see the term "explainable" in the titles of papers that describe applications based on artificial intelligence (AI). However, the literature in explainable artificial intelligence (XAI) indicates that explanations in XAI are application- and domain-specific, and therefore require evaluation whenever they are employed to explain a model that makes decisions for a specific application problem. Additionally, the literature reveals that the performance of post-hoc methods, particularly feature attribution methods, varies substantially, hinting that they do not represent a solution to AI explainability. Therefore, when using XAI methods, the quality and suitability of their information outputs should be evaluated within the specific application. For these reasons, we used a scoping review methodology to investigate papers that apply AI models and adopt methods to generate post-hoc explanations while referring to these models as explainable. This paper investigates whether authors adopt the term "explainable model" under the assumption that incorporating a post-hoc XAI method suffices to characterize a model as explainable. To inspect this problem, our review analyzes whether these papers conducted evaluations. We found that 81% of the application papers that refer to their approaches as an explainable model do not conduct any form of evaluation on the XAI method they used.
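results: In a sparse-reward robotic stacking environment, the method improves exploration efficiency and the ability to reuse experience data, and learned skills can be reused to solve different tasks.
Abstract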
Language Models and Vision Language Models have recently demonstrated unprecedented capabilities in terms of understanding human intentions, reasoning, scene understanding, and planning-like behaviour, in text form, among many others. In this work, we investigate how to embed and leverage such abilities in Reinforcement Learning (RL) agents. We design a framework that uses language as the core reasoning tool, exploring how this enables an agent to tackle a series of fundamental RL challenges, such as efficient exploration, reusing experience data, scheduling skills, and learning from observations, which traditionally require separate, vertically designed algorithms. We test our method on a sparse-reward simulated robotic manipulation environment, where a robot needs to stack a set of objects. We demonstrate substantial performance improvements over baselines in exploration efficiency and ability to reuse data from offline datasets, and illustrate how to reuse learned skills to solve novel tasks or imitate videos of human experts.
Anticipating Technical Expertise and Capability Evolution in Research Communities using Dynamic Graph Transformers
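paper_authors: Sameera Horawalavithana, Ellyn Ayton, Anastasiya Usenko, Robin Cosbey, Svitlana Volkova
for: This research aims to anticipate trends in technical expertise and capability evolution to support national and global security. This is especially important in safety-critical domains such as nuclear nonproliferation (NN) and rapidly emerging fields such as artificial intelligence (AI).
methods: The study extends traditional statistical relational learning approaches (e.g., link prediction in collaboration networks) and formulates the problem using dynamic heterogeneous graph representations. The researchers also develop new capabilities to forecast collaboration patterns, authorship behavior, and technical capability evolution at the scientist and institution levels.
results: The dynamic graph transformer (DGT) model predicts collaboration, partnership, and expertise patterns in both the AI and NN domains, outperforming the best static graph baselines by 30-80%. DGT models also improve inductive task performance when previously unseen nodes appear in the test data. Specifically, in the AI domain they accurately predict which established scientists will collaborate with early-career scientists.
Abstract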
The ability to anticipate technical expertise and capability evolution trends globally is essential for national and global security, especially in safety-critical domains like nuclear nonproliferation (NN) and rapidly emerging fields like artificial intelligence (AI). In this work, we extend traditional statistical relational learning approaches (e.g., link prediction in collaboration networks) and formulate a problem of anticipating technical expertise and capability evolution using dynamic heterogeneous graph representations. We develop novel capabilities to forecast collaboration patterns, authorship behavior, and technical capability evolution at different granularities (e.g., scientist and institution levels) in two distinct research fields. We implement a dynamic graph transformer (DGT) neural architecture, which advances state-of-the-art graph neural network models by (a) forecasting heterogeneous (rather than homogeneous) nodes and edges, and (b) relying on both discrete- and continuous-time inputs. We demonstrate that our DGT models predict collaboration, partnership, and expertise patterns with mean reciprocal rank values of 0.26, 0.73, and 0.53 for the AI domain and 0.48, 0.93, and 0.22 for the NN domain. DGT model performance exceeds the best-performing static graph baseline models by 30-80% across AI and NN domains. Our findings demonstrate that DGT models boost inductive task performance when previously unseen nodes appear in the test data, for domains with emerging collaboration patterns (e.g., AI). Specifically, models accurately predict which established scientists will collaborate with early-career scientists and vice versa in the AI domain.
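The DGT results above are reported as mean reciprocal rank (MRR). For each query (for example, a scientist whose future collaborator must be predicted), take 1 / rank of the true answer among the ranked candidates, then average over all queries. Below is a minimal sketch of this standard definition. The candidate scores are hypothetical. Ties are broken by input order here, which a real evaluation should handle explicitly.

```python
from typing import Dict, Iterable, Optional


def rank_of(true_item: str, scores: Dict[str, float]) -> Optional[int]:
    """1-based rank of `true_item` when candidates are sorted by descending score."""
    if true_item not in scores:
        return None  # true answer was never scored; contributes 0 to MRR
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(true_item) + 1


def mean_reciprocal_rank(ranks: Iterable[Optional[int]]) -> float:
    """Average of 1/rank over queries; None (not retrieved) counts as 0."""
    ranks = list(ranks)
    if not ranks:
        return 0.0
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


# Hypothetical model scores for three queries, each with one true collaborator.
queries = [
    ("bob", {"bob": 0.9, "eve": 0.4, "dan": 0.1}),  # true answer ranked 1st
    ("eve", {"bob": 0.8, "eve": 0.6, "dan": 0.2}),  # ranked 2nd
    ("dan", {"bob": 0.7, "eve": 0.5, "dan": 0.3}),  # ranked 3rd
]
ranks = [rank_of(truth, scores) for truth, scores in queries]
print(ranks, mean_reciprocal_rank(ranks))
```

An MRR of 0.5 roughly corresponds to the true answer landing, on average, around second place. MRR weights the top of the ranking heavily, however, so a single first-place hit can offset several poor ranks.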
HAT-CL: A Hard-Attention-to-the-Task PyTorch Library for Continual Learning
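methods: The paper builds on the Hard-Attention-to-the-Task (HAT) mechanism, which can mitigate catastrophic forgetting. In practice, however, HAT suffers from usability and compatibility issues and lacks support for reusing existing networks.
results: The paper introduces HAT-CL, a more user-friendly, PyTorch-compatible redesign of HAT. HAT-CL not only automates gradient manipulation but also converts PyTorch modules into HAT modules, and it provides a comprehensive suite of modules that can be integrated easily into existing architectures. In addition, the authors introduce novel mask manipulation techniques for HAT, which have consistently shown improvements across various experiments.
Abstract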
Catastrophic forgetting, the phenomenon in which a neural network loses previously obtained knowledge during the learning of new tasks, poses a significant challenge in continual learning. The Hard-Attention-to-the-Task (HAT) mechanism has shown potential in mitigating this problem, but its practical implementation has been complicated by issues of usability and compatibility, and a lack of support for existing network reuse. In this paper, we introduce HAT-CL, a user-friendly, PyTorch-compatible redesign of the HAT mechanism. HAT-CL not only automates gradient manipulation but also streamlines the transformation of PyTorch modules into HAT modules. It achieves this by providing a comprehensive suite of modules that can be seamlessly integrated into existing architectures. Additionally, HAT-CL offers ready-to-use HAT networks that are smoothly integrated with the TIMM library. Beyond the redesign and reimplementation of HAT, we also introduce novel mask manipulation techniques for HAT, which have consistently shown improvements across various experiments. Our work paves the way for a broader application of the HAT mechanism, opening up new possibilities in continual learning across diverse models and applications.
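HAT-CL reimplements the HAT mechanism of Serrà et al. (2018). As we understand that original mechanism, each task learns an embedding e, and units are gated by a = sigmoid(s * e). The scale s is annealed during training so that the gate becomes nearly binary. Masks are accumulated across tasks with an element-wise max, and gradients to units that earlier tasks rely on are damped by (1 - cumulative mask). The sketch below is a per-unit simplification in plain Python; the original conditions each weight on the masks of both layers it connects. It is not HAT-CL's API, and s_max = 400 is only an illustrative value.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (large |x| must not overflow)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def anneal_scale(batch: int, num_batches: int, s_max: float = 400.0) -> float:
    """Scale s grows linearly from 1/s_max to s_max over an epoch (batch is 1-based)."""
    if num_batches <= 1:
        return s_max
    frac = (batch - 1) / (num_batches - 1)
    return 1.0 / s_max + (s_max - 1.0 / s_max) * frac


def task_mask(embedding: List[float], s: float) -> List[float]:
    """Per-unit attention a = sigmoid(s * e); a large s makes it almost binary."""
    return [sigmoid(s * e) for e in embedding]


def accumulate(prev: List[float], current: List[float]) -> List[float]:
    """Cumulative mask over tasks seen so far: element-wise max."""
    return [max(p, c) for p, c in zip(prev, current)]


def condition_gradients(grads: List[float], cumulative_prev: List[float]) -> List[float]:
    """Damp updates to units used by earlier tasks: g' = (1 - a_prev) * g."""
    return [(1.0 - a) * g for g, a in zip(grads, cumulative_prev)]


# After training task 1, its (hypothetical) embedding yields a near-binary mask.
emb_task1 = [2.0, -2.0, 0.5]
mask1 = task_mask(emb_task1, anneal_scale(100, 100))
cumulative = accumulate([0.0, 0.0, 0.0], mask1)
# While training task 2, only the unit task 1 did not use still receives gradient.
print(condition_gradients([0.3, 0.3, 0.3], cumulative))
```

Annealing s lets the embeddings receive useful gradients early in an epoch, when the gate is still soft, while the final masks end up close to binary.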
VISER: A Tractable Solution Concept for Games with Information Asymmetry
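results: The authors show that each player's VISER strategy can be computed independently using linear programming (LP). They also extend VISER to its Markov-perfect counterpart for Markov games, which can be solved with a series of LPs.
Abstract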
Many real-world games suffer from information asymmetry: one player is only aware of their own payoffs while the other player has the full game information. Examples include the critical domain of security games and adversarial multi-agent reinforcement learning. Information asymmetry renders traditional solution concepts such as Strong Stackelberg Equilibrium (SSE) and Robust-Optimization Equilibrium (ROE) inoperative. We propose a novel solution concept called VISER (Victim Is Secure, Exploiter best-Responds). VISER enables an external observer to predict the outcome of such games. In particular, for security applications, VISER allows the victim to better defend itself while characterizing the most damaging attacks available to the attacker. We show that each player's VISER strategy can be computed independently in polynomial time using linear programming (LP). We also extend VISER to its Markov-perfect counterpart for Markov games, which can be solved efficiently using a series of LPs.
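The abstract states that each player's VISER strategy can be computed independently with linear programming, but it gives no formulation. Reading the acronym literally, one natural interpretation (our assumption) is that the victim plays a maximin, or security, strategy using only its own payoff matrix A. That corresponds to the LP: maximize v subject to sum_i x_i * A[i][j] >= v for every attacker action j, with x a probability vector. The exploiter then best-responds using its own payoffs. To stay dependency-free, the sketch replaces the LP solver with an exact method that only works when the victim has two actions. In that case the worst case over attacker actions is a concave, piecewise-linear function of the mixing probability, so its maximum lies at an endpoint or where two lines cross.

```python
from itertools import combinations
from typing import List, Tuple

Matrix = List[List[float]]  # rows: victim actions, columns: attacker actions


def victim_secure_strategy(victim_payoff: Matrix) -> Tuple[float, float]:
    """Maximin mixed strategy for a victim with exactly two actions.

    Returns (p, value): play row 0 with probability p and row 1 with 1 - p,
    guaranteeing at least `value` against any attacker column.
    """
    assert len(victim_payoff) == 2, "this sketch handles two victim actions only"
    row0, row1 = victim_payoff
    # Payoff against column j is linear in p: row1[j] + p * (row0[j] - row1[j]).
    lines = [(row1[j], row0[j] - row1[j]) for j in range(len(row0))]

    def worst_case(p: float) -> float:
        return min(b + p * m for b, m in lines)

    # The lower envelope is concave, so its maximum is at an endpoint or a crossing.
    candidates = [0.0, 1.0]
    for (b1, m1), (b2, m2) in combinations(lines, 2):
        if m1 != m2:
            p = (b2 - b1) / (m1 - m2)
            if 0.0 <= p <= 1.0:
                candidates.append(p)
    best_p = max(candidates, key=worst_case)
    return best_p, worst_case(best_p)


def exploiter_best_response(attacker_payoff: Matrix, p: float) -> int:
    """Column maximizing the attacker's expected payoff against the victim's mix."""
    n_cols = len(attacker_payoff[0])
    expected = [p * attacker_payoff[0][j] + (1 - p) * attacker_payoff[1][j] for j in range(n_cols)]
    return max(range(n_cols), key=expected.__getitem__)


victim = [[1.0, -1.0], [-1.0, 1.0]]    # matching pennies, victim's payoffs
attacker = [[-1.0, 1.0], [1.0, -1.0]]  # attacker's payoffs (unknown to the victim)
p, value = victim_secure_strategy(victim)
print(p, value, exploiter_best_response(attacker, p))
```

With more than two victim actions, the LP above can be passed to any LP solver; the breakpoint enumeration here merely stands in for it. Note that only the exploiter's step uses the attacker's payoffs, which matches the information asymmetry described above.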
With Flying Colors: Predicting Community Success in Large-scale Collaborative Campaigns
for: This paper aims to study the effectiveness of online communities in promoting their agenda, specifically in the context of Reddit's r/place.
methods: The paper defines a novel task of predicting the success level of online communities in Reddit's r/place and experiments with several hybrid models that combine various types of features.
results: The models significantly outperform all baseline models over all definitions of "success level", and the analysis provides insights into the factors that contribute to the success of coordinated campaigns.
Abstract
Online communities develop unique characteristics, establish social norms, and exhibit distinct dynamics among their members. Activity in online communities often results in concrete "off-line" actions with a broad societal impact (e.g., political street protests and norms related to sexual misconduct). While community dynamics, information diffusion, and online collaborations have been widely studied in the past two decades, quantitative studies that measure the effectiveness of online communities in promoting their agenda are scarce. In this work, we study the correspondence between the effectiveness of a community, measured by its success level in a competitive online campaign, and the underlying dynamics between its members. To this end, we define a novel task: predicting the success level of online communities in Reddit's r/place, a large-scale distributed experiment that required collaboration between community members. We consider an array of definitions for success level; each is geared toward different aspects of collaborative achievement. We experiment with several hybrid models, combining various types of features. Our models significantly outperform all baseline models over all definitions of "success level". Analysis of the results and the factors that contribute to the success of coordinated campaigns can provide a better understanding of the resilience or the vulnerability of communities to online social threats such as election interference or anti-science trends. We make all data used for this study publicly available for further research.
Promoting Exploration in Memory-Augmented Adam using Critical Momenta
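methods: The study proposes a memory-augmented version of the Adam optimizer that keeps a buffer of critical momentum terms during training to promote exploration toward flatter minima.
results: Experiments show that the memory-augmented Adam improves the performance of several Adam variants on standard supervised language modelling and image classification tasks.
Abstract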
Adaptive gradient-based optimizers, particularly Adam, have left their mark in training large-scale deep learning models. The strength of such optimizers is that they exhibit fast convergence while being more robust to hyperparameter choice. However, they often generalize worse than non-adaptive methods. Recent studies have tied this performance gap to flat minima selection: adaptive methods tend to find solutions in sharper basins of the loss landscape, which in turn hurts generalization. To overcome this issue, we propose a new memory-augmented version of Adam that promotes exploration towards flatter minima by using a buffer of critical momentum terms during training. Intuitively, the use of the buffer makes the optimizer overshoot outside the basin of attraction if it is not wide enough. We empirically show that our method improves the performance of several variants of Adam on standard supervised language modelling and image classification tasks.
Traffic-Domain Video Question Answering with Automatic Captioning
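results: Experiments show that TRIVIA improves the accuracy of representative video-language models by 6.5 points (19.88%) over baseline settings. The authors argue that the approach can drive progress in traffic-related applications and encourage researchers and practitioners to explore the potential of video-language models.
Abstract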
Video Question Answering (VidQA) exhibits remarkable potential in facilitating advanced machine reasoning capabilities within the domains of Intelligent Traffic Monitoring and Intelligent Transportation Systems. Nevertheless, the integration of urban traffic scene knowledge into VidQA systems has received limited attention in previous research endeavors. In this work, we present a novel approach termed Traffic-domain Video Question Answering with Automatic Captioning (TRIVIA), which serves as a weak-supervision technique for infusing traffic-domain knowledge into large video-language models. Empirical findings obtained from the SUTD-TrafficQA task highlight the substantial enhancements achieved by TRIVIA, elevating the accuracy of representative video-language models by a remarkable 6.5 points (19.88%) compared to baseline settings. This pioneering methodology holds great promise for driving advancements in the field, inspiring researchers and practitioners alike to unlock the full potential of emerging video-language models in traffic-related applications.
Transformer-based Dual-domain Network for Few-view Dedicated Cardiac SPECT Image Reconstructions
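results: The method was validated against cardiac catheterization images, diagnostic interpretations from nuclear cardiologists, and defect size quantified by FDA 510(k)-cleared clinical software. In human studies it produced higher cardiac defect contrast than previous baseline methods, potentially enabling high-quality defect visualization with stationary few-view dedicated cardiac SPECT scanners.
Abstract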
Cardiovascular disease (CVD) is the leading cause of death worldwide, and myocardial perfusion imaging using SPECT has been widely used in the diagnosis of CVDs. The GE 530/570c dedicated cardiac SPECT scanners adopt a stationary geometry to simultaneously acquire 19 projections to increase sensitivity and achieve dynamic imaging. However, the limited amount of angular sampling negatively affects image quality. Deep learning methods can be implemented to produce higher-quality images from stationary data. This is essentially a few-view imaging problem. In this work, we propose a novel 3D transformer-based dual-domain network, called TIP-Net, for high-quality 3D cardiac SPECT image reconstructions. Our method aims to first reconstruct 3D cardiac SPECT images directly from projection data without the iterative reconstruction process by proposing a customized projection-to-image domain transformer. Then, given its reconstruction output and the original few-view reconstruction, we further refine the reconstruction using an image-domain reconstruction network. Validated by cardiac catheterization images, diagnostic interpretations from nuclear cardiologists, and defect size quantified by an FDA 510(k)-cleared clinical software, our method produced images with higher cardiac defect contrast on human studies compared with previous baseline methods, potentially enabling high-quality defect visualization using stationary few-view dedicated cardiac SPECT scanners.
Gradient strikes back: How filtering out high frequencies improves explanations
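results: Applying an optimal low-pass filter to attribution maps improves the explainability scores of gradient-based methods across multiple models. These results suggest that simpler, computationally cheaper gradient-based methods can produce good explanations without resorting to more expensive approaches.
Abstract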
Recent years have witnessed an explosion in the development of novel prediction-based attribution methods, which have slowly been supplanting older gradient-based methods to explain the decisions of deep neural networks. However, it is still not clear why prediction-based methods outperform gradient-based ones. Here, we start with an empirical observation: these two approaches yield attribution maps with very different power spectra, with gradient-based methods revealing more high-frequency content than prediction-based methods. This observation raises multiple questions: What is the source of this high-frequency information, and does it truly reflect decisions made by the system? Lastly, why would the absence of high-frequency information in prediction-based methods yield better explainability scores along multiple metrics? We analyze the gradient of three representative visual classification models and observe that it contains noisy information emanating from high frequencies. Furthermore, our analysis reveals that the operations used in Convolutional Neural Networks (CNNs) for downsampling appear to be a significant source of this high-frequency content, suggesting aliasing as a possible underlying basis. We then apply an optimal low-pass filter to attribution maps and demonstrate that it improves gradient-based attribution methods. We show that (i) removing high-frequency noise yields significant improvements in the explainability scores obtained with gradient-based methods across multiple models, leading to (ii) a novel ranking of state-of-the-art methods with gradient-based methods at the top. We believe that our results will spur renewed interest in simpler and computationally more efficient gradient-based methods for explainability.
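The paper applies "an optimal low-pass filter" to attribution maps, but the abstract does not specify it. As a stand-in, the sketch below smooths a 2-D attribution map with a separable Gaussian blur in plain Python. The Gaussian shape, `sigma`, and border clamping are our assumptions. A checkerboard, the highest-frequency pattern on a pixel grid, is almost entirely removed, while a constant map passes through unchanged. In practice an optimized routine such as `scipy.ndimage.gaussian_filter` would be the usual choice.

```python
import math
from typing import List

Grid = List[List[float]]


def gaussian_kernel(sigma: float, radius: int) -> List[float]:
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def _blur_rows(grid: Grid, kernel: List[float]) -> Grid:
    radius = len(kernel) // 2
    width = len(grid[0])
    out = []
    for row in grid:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                # Clamp indices at the borders so a constant map stays constant.
                xx = min(max(x + k - radius, 0), width - 1)
                acc += w * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out


def _transpose(grid: Grid) -> Grid:
    return [list(col) for col in zip(*grid)]


def low_pass(attribution: Grid, sigma: float = 1.0) -> Grid:
    """Separable Gaussian blur: attenuates high-frequency content of a 2-D map."""
    kernel = gaussian_kernel(sigma, radius=max(1, int(3 * sigma)))
    blurred = _blur_rows(attribution, kernel)  # horizontal pass
    return _transpose(_blur_rows(_transpose(blurred), kernel))  # vertical pass


def variance(grid: Grid) -> float:
    values = [v for row in grid for v in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


checker = [[(-1.0) ** (i + j) for j in range(8)] for i in range(8)]
flat = [[2.0] * 8 for _ in range(8)]
print(variance(checker), round(variance(low_pass(checker)), 4))
```

A larger `sigma` removes more high-frequency content but also blurs genuinely localized evidence. Choosing the cutoff is therefore the crux of the filtering step.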
摘要
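In recent years, there has been an explosion of prediction-based attribution methods for explaining the decisions of deep neural networks. However, it is still unclear why prediction-based methods outperform gradient-based ones. We start from an empirical observation: the attribution maps produced by the two approaches have very different power spectra, with prediction-based methods containing less high-frequency content than gradient-based methods. This observation raises several questions. Where does this high-frequency information come from, and does it truly reflect the decisions made by the system? And why does the absence of high-frequency information give prediction-based methods better explainability scores across multiple metrics? We analyze the gradients of three representative visual classification models and find that they contain noisy high-frequency information. Further analysis shows that the downsampling operations in Convolutional Neural Networks (CNNs) appear to be a significant source of this high-frequency content, suggesting aliasing as a possible cause. We apply an optimal low-pass filter to attribution maps and observe two results. First, removing the high-frequency noise significantly improves the explainability scores of gradient-based methods across multiple models. Second, this yields a new ranking of state-of-the-art methods with gradient-based methods at the top. We believe our results will spur renewed interest in simpler and computationally more efficient gradient-based explanation methods.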
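The abstract does not say how the optimal low-pass filter is built or how its cutoff is chosen. The plain-Python sketch below shows the general idea of suppressing high-frequency content in an attribution map. It assumes a separable Gaussian blur as the low-pass filter and treats `sigma` as a hypothetical stand-in for the cutoff. In the usage example, a checkerboard (the highest spatial frequency on a pixel grid) is almost erased, while a linear ramp passes through unchanged away from the borders.

```python
import math


def gaussian_kernel(sigma: float) -> list[float]:
    """1-D Gaussian kernel truncated at 3*sigma and normalized to sum to 1."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_1d(signal: list[float], kernel: list[float]) -> list[float]:
    """Convolve a 1-D signal with a symmetric kernel, replicating edge values."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index to the valid range
            acc += w * signal[j]
        out.append(acc)
    return out


def low_pass(attribution: list[list[float]], sigma: float) -> list[list[float]]:
    """Separable Gaussian low-pass filter: blur every row, then every column."""
    kernel = gaussian_kernel(sigma)
    rows = [convolve_1d(row, kernel) for row in attribution]
    cols = [convolve_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to row-major


if __name__ == "__main__":
    size = 12
    checkerboard = [[(-1.0) ** (r + c) for c in range(size)] for r in range(size)]
    ramp = [[float(c) for c in range(size)] for _ in range(size)]
    print(round(low_pass(checkerboard, 1.0)[5][5], 4))  # near 0: high frequency removed
    print(round(low_pass(ramp, 1.0)[5][5], 4))  # 5.0: linear trend preserved
```

A larger `sigma` lowers the effective cutoff and removes more detail. Choosing it well is exactly the "optimal" part that this sketch does not attempt.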
Automating Wood Species Detection and Classification in Microscopic Images of Fibrous Materials with Deep Learning
results: The proposed method performs similarly to human experts. In the future, this will improve controls on global wood fiber product flows to protect forests.
Abstract
We have developed a methodology for the systematic generation of a large image dataset of macerated wood references, which we used to generate image data for nine hardwood genera. This is the basis for a substantial approach to automate, for the first time, the identification of hardwood species in microscopic images of fibrous materials by deep learning. Our methodology includes a flexible pipeline for easy annotation of vessel elements. We compare the performance of different neural network architectures and hyperparameters. Our proposed method performs similarly well to human experts. In the future, this will improve controls on global wood fiber product flows to protect forests.
摘要
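We have developed a methodology for systematically generating a large image dataset of macerated wood references, which we used to generate image data for nine hardwood genera. This is the basis for automating, for the first time, the identification of hardwood species in microscopic images of fibrous materials with deep learning. Our methodology includes a flexible pipeline for easy annotation of vessel elements. We compare the performance of different neural network architectures and hyperparameters. Our proposed method performs similarly to human experts. In the future, this will improve controls on global wood fiber product flows to protect forests.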
Overthinking the Truth: Understanding how Language Models Process False Demonstrations
paper_authors: Danny Halawi, Jean-Stanislas Denain, Jacob Steinhardt
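for: This paper studies how models absorb and reproduce harmful content during few-shot learning, and identifies two related phenomena: overthinking and false induction heads.
methods: The study examines the model's internal representations by decoding predictions from intermediate layers.
results: Beyond a certain layer, incorrect demonstrations cause the model to overthink, and this phenomenon may be attributable to false induction heads.
Abstract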
Modern language models can imitate complex patterns through few-shot learning, enabling them to complete challenging tasks without fine-tuning. However, imitation can also lead models to reproduce inaccuracies or harmful content if present in the context. We study harmful imitation through the lens of a model's internal representations, and identify two related phenomena: overthinking and false induction heads. The first phenomenon, overthinking, appears when we decode predictions from intermediate layers, given correct vs. incorrect few-shot demonstrations. At early layers, both demonstrations induce similar model behavior, but the behavior diverges sharply at some "critical layer", after which the accuracy given incorrect demonstrations progressively decreases. The second phenomenon, false induction heads, are a possible mechanistic cause of overthinking: these are heads in late layers that attend to and copy false information from previous demonstrations, and whose ablation reduces overthinking. Beyond scientific understanding, our results suggest that studying intermediate model computations could be a promising avenue for understanding and guarding against harmful model behaviors.
摘要
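Modern language models can imitate patterns through few-shot learning and complete complex tasks without fine-tuning. However, imitation can also lead models to reproduce inaccuracies or harmful content present in the context. By studying the model's internal representations, we identify two related phenomena: overthinking and false induction heads. The first, overthinking, appears when we decode predictions from intermediate layers given correct vs. incorrect few-shot demonstrations. At early layers, both kinds of demonstrations induce similar model behavior, but the behavior diverges sharply at some "critical layer". After that layer, accuracy given incorrect demonstrations progressively decreases. The second, false induction heads, are a possible mechanistic cause of overthinking. These are heads in late layers that attend to and copy false information from previous demonstrations, and ablating them reduces overthinking. Our results suggest that studying intermediate model computations is a promising avenue for understanding and guarding against harmful model behaviors.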
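Decoding predictions from intermediate layers is commonly done with a "logit lens". This approach applies the final normalization and the unembedding matrix to each layer's hidden state. The abstract does not give the exact decoding procedure, so the sketch below is a toy in plain Python built on several assumptions:

- `layer_norm` omits any learned scale and bias.
- The hidden states and unembedding matrix are hypothetical inputs.
- `critical_layer` uses a hypothetical accuracy-gap threshold `gap` rather than the paper's own definition.

```python
import math


def layer_norm(h: list[float], eps: float = 1e-5) -> list[float]:
    """Normalize a hidden vector to zero mean and unit variance (no learned scale/bias)."""
    mean = sum(h) / len(h)
    var = sum((x - mean) ** 2 for x in h) / len(h)
    return [(x - mean) / math.sqrt(var + eps) for x in h]


def decode_layer(hidden: list[float], unembed: list[list[float]]) -> int:
    """Early-exit prediction: top-1 token id after projecting onto the vocabulary."""
    h = layer_norm(hidden)
    logits = [sum(w * x for w, x in zip(row, h)) for row in unembed]
    return max(range(len(logits)), key=logits.__getitem__)


def accuracy_by_layer(examples, labels, unembed) -> list[float]:
    """examples[i][layer] is the last-token hidden state of example i at `layer`."""
    n_layers = len(examples[0])
    accs = []
    for layer in range(n_layers):
        hits = sum(decode_layer(ex[layer], unembed) == y for ex, y in zip(examples, labels))
        accs.append(hits / len(labels))
    return accs


def critical_layer(acc_correct: list[float], acc_incorrect: list[float], gap: float = 0.2):
    """First layer where incorrect demonstrations trail correct ones by at least `gap`."""
    for layer, (a, b) in enumerate(zip(acc_correct, acc_incorrect)):
        if a - b >= gap:
            return layer
    return None


if __name__ == "__main__":
    unembed = [[1.0, 0.0], [0.0, 1.0]]  # toy 2-token vocabulary, 2-dim hidden states
    examples = [[[2.0, 0.0], [0.0, 2.0]], [[0.0, 2.0], [0.0, 2.0]]]
    print(accuracy_by_layer(examples, [1, 1], unembed))
    print(critical_layer([0.5, 0.6, 0.8, 0.9], [0.5, 0.6, 0.5, 0.3]))
```

In practice, `accuracy_by_layer` would be run once on prompts with correct demonstrations and once on prompts with incorrect ones. Both curves would then be passed to `critical_layer`.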
Unsupervised Conditional Slot Attention for Object Centric Learning
results: The method provides scene composition capabilities and a significant boost in few-shot adaptability on compositional visual reasoning, while performing similarly to or better than Slot Attention on object discovery.
Abstract
Extracting object-level representations for downstream reasoning tasks is an emerging area in AI. Learning object-centric representations in an unsupervised setting presents multiple challenges, a key one being binding an arbitrary number of object instances to a specialized object slot. Recent object-centric representation methods like Slot Attention utilize iterative attention to learn composable representations with dynamic inference level binding but fail to achieve specialized slot level binding. To address this, in this paper we propose Unsupervised Conditional Slot Attention using a novel Probabilistic Slot Dictionary (PSD). We define PSD with (i) abstract object-level property vectors as key and (ii) parametric Gaussian distribution as its corresponding value. We demonstrate the benefits of the learnt specific object-level conditioning distributions in multiple downstream tasks, namely object discovery, compositional scene generation, and compositional visual reasoning. We show that our method provides scene composition capabilities and a significant boost in few-shot adaptability tasks of compositional visual reasoning, while performing similarly to or better than slot attention in object discovery tasks.
摘要
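Extracting object-level representations for downstream reasoning tasks is an emerging area in AI. Learning object-centric representations without supervision poses several challenges, a key one being binding an arbitrary number of object instances to a specialized object slot. Recent object-centric methods such as Slot Attention use iterative attention to learn composable representations, but fail to achieve specialized slot-level binding. To address this, we propose Unsupervised Conditional Slot Attention, which uses a novel Probabilistic Slot Dictionary (PSD). In the PSD, keys are abstract object-level property vectors and values are parametric Gaussian distributions. We demonstrate the benefits of the learned object-level conditioning distributions on several downstream tasks: object discovery, compositional scene generation, and compositional visual reasoning. Our method provides scene composition capabilities and a significant boost in few-shot adaptability on compositional visual reasoning, while performing similarly to or better than Slot Attention on object discovery.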
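The abstract defines the PSD as a dictionary whose keys are abstract object-level property vectors and whose values are parametric Gaussians. It does not say how a key is selected. The sketch below is a hypothetical plain-Python reading of that definition. It assumes nearest-key lookup by squared Euclidean distance and diagonal Gaussians, and it samples a slot initialization from the selected Gaussian. The class and method names are illustrative, not the authors' code.

```python
import random
from dataclasses import dataclass


@dataclass
class GaussianValue:
    """Diagonal Gaussian over slot initializations (hypothetical value type)."""
    mean: list[float]
    std: list[float]


class ProbabilisticSlotDictionary:
    """Hypothetical PSD: abstract property-vector keys mapped to Gaussian values."""

    def __init__(self, keys: list[list[float]], values: list[GaussianValue]):
        if len(keys) != len(values):
            raise ValueError("each key needs exactly one Gaussian value")
        self.keys = keys
        self.values = values

    def lookup(self, query: list[float]) -> int:
        """Index of the key nearest to `query` (squared Euclidean distance)."""
        def sq_dist(key: list[float]) -> float:
            return sum((q - k) ** 2 for q, k in zip(query, key))
        return min(range(len(self.keys)), key=lambda i: sq_dist(self.keys[i]))

    def sample_slot(self, query: list[float], rng: random.Random) -> list[float]:
        """Draw one slot initialization from the Gaussian stored under the nearest key."""
        value = self.values[self.lookup(query)]
        return [rng.gauss(m, s) for m, s in zip(value.mean, value.std)]


if __name__ == "__main__":
    psd = ProbabilisticSlotDictionary(
        keys=[[1.0, 0.0], [0.0, 1.0]],  # two toy abstract property prototypes
        values=[GaussianValue([1.0, 1.0, 1.0], [0.1, 0.1, 0.1]),
                GaussianValue([-1.0, -1.0, -1.0], [0.1, 0.1, 0.1])],
    )
    rng = random.Random(0)
    # One conditioned initialization per slot; each query selects a distribution.
    slots = [psd.sample_slot(q, rng) for q in ([0.9, 0.2], [0.1, 0.8])]
    print(slots)
```

The contrast with plain Slot Attention is the source of the initialization. In this reading, each slot draws its initialization from a property-specific Gaussian rather than from one shared distribution, which is what would let a slot specialize. Those samples would then enter the usual iterative attention updates, which are not shown here.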
SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs
results: Compared with existing state-of-the-art zero-shot voice conversion models, SLMGAN achieves better naturalness and comparable similarity, without requiring text labels during training.
Abstract
In recent years, large-scale pre-trained speech language models (SLMs) have demonstrated remarkable advancements in various generative speech modeling applications, such as text-to-speech synthesis, voice conversion, and speech enhancement. These applications typically involve mapping text or speech inputs to pre-trained SLM representations, from which target speech is decoded. This paper introduces a new approach, SLMGAN, to leverage SLM representations for discriminative tasks within the generative adversarial network (GAN) framework, specifically for voice conversion. Building upon StarGANv2-VC, we add our novel SLM-based WavLM discriminators on top of the mel-based discriminators along with our newly designed SLM feature matching loss function, resulting in an unsupervised zero-shot voice conversion system that does not require text labels during training. Subjective evaluation results show that SLMGAN outperforms existing state-of-the-art zero-shot voice conversion models in terms of naturalness and achieves comparable similarity, highlighting the potential of SLM-based discriminators for related applications.
摘要
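In recent years, large-scale pre-trained speech language models (SLMs) have made remarkable progress in various generative speech applications, such as text-to-speech synthesis, voice conversion, and speech enhancement. These applications typically map text or speech inputs to pre-trained SLM representations, from which the target speech is decoded. This paper introduces a new approach, SLMGAN, that uses SLM representations for discriminative tasks within the generative adversarial network (GAN) framework, specifically for voice conversion. Building on StarGANv2-VC, we add novel SLM-based WavLM discriminators alongside the mel-based discriminators, together with a newly designed SLM feature matching loss. The result is an unsupervised zero-shot voice conversion system that does not require text labels during training. Subjective evaluations show that SLMGAN outperforms existing state-of-the-art zero-shot voice conversion models in naturalness and achieves comparable similarity, highlighting the potential of SLM-based discriminators for related applications.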
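The abstract names an SLM feature matching loss but does not give its formula. In GAN vocoders such as MelGAN and HiFi-GAN, feature matching typically compares discriminator features of real and generated audio layer by layer with an L1 distance. The sketch below assumes that form and uses flat per-layer feature lists as a hypothetical stand-in for WavLM hidden states. It is an illustration, not SLMGAN's exact loss.

```python
def feature_matching_loss(real_feats: list[list[float]], fake_feats: list[list[float]]) -> float:
    """Sum over layers of the mean absolute difference between real and generated features.

    Each argument holds one flat feature list per discriminator layer, a hypothetical
    stand-in for per-layer WavLM hidden states of real and converted speech.
    """
    if len(real_feats) != len(fake_feats):
        raise ValueError("real and generated inputs must have the same number of layers")
    total = 0.0
    for real, fake in zip(real_feats, fake_feats):
        if len(real) != len(fake) or not real:
            raise ValueError("each layer must be non-empty and shapes must match")
        total += sum(abs(r - f) for r, f in zip(real, fake)) / len(real)
    return total


if __name__ == "__main__":
    real = [[0.2, -0.4, 1.0], [0.5, 0.5]]  # toy features for two layers
    fake = [[0.1, -0.4, 0.7], [0.0, 1.0]]
    print(feature_matching_loss(real, fake))
```

Minimizing this term pushes the converted speech to match real speech throughout the discriminator's intermediate feature space, not only at its final real/fake output.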
Balancing Privacy and Progress in Artificial Intelligence: Anonymization in Histopathology for Biomedical Research and Education
paper_authors: Neel Kanwal, Emiel A. M. Janssen, Kjersti Engan
for: This paper explores the legal regulations and terminology for medical data-sharing in histopathology, with a focus on balancing privacy and progress in bio-informatics research.
methods: The paper reviews existing approaches to medical data-sharing and highlights challenges from the histopathological perspective, including the risk of data linkage attacks and the lack of standardization in digital pathology.
results: The paper presents a data-sharing guideline for histological data to foster multidisciplinary research and education while addressing the challenges of privacy and data usability.
Abstract
The advancement of biomedical research heavily relies on access to large amounts of medical data. In the case of histopathology, Whole Slide Images (WSI) and clinicopathological information are valuable for developing Artificial Intelligence (AI) algorithms for Digital Pathology (DP). Transferring medical data "as open as possible" enhances the usability of the data for secondary purposes but poses a risk to patient privacy. At the same time, existing regulations push towards keeping medical data "as closed as necessary" to avoid re-identification risks. Generally, these legal regulations require the removal of sensitive data but do not consider the possibility of data linkage attacks due to modern image-matching algorithms. In addition, the lack of standardization in DP makes it harder to establish a single solution for all formats of WSIs. These challenges raise problems for bio-informatics researchers in balancing privacy and progress while developing AI algorithms. This paper explores the legal regulations and terminologies for medical data-sharing. We review existing approaches and highlight challenges from the histopathological perspective. We also present a data-sharing guideline for histological data to foster multidisciplinary research and education.
摘要
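The advancement of biomedical research relies heavily on access to large amounts of medical data. In histopathology, Whole Slide Images (WSIs) and clinicopathological information are valuable for developing Artificial Intelligence (AI) algorithms for Digital Pathology (DP). Sharing medical data "as open as possible" improves its usability for secondary purposes but poses a risk to patient privacy. At the same time, existing regulations push toward keeping medical data "as closed as necessary" to avoid re-identification risks. These regulations generally require removing sensitive data, but they do not consider data linkage attacks enabled by modern image-matching algorithms. In addition, the lack of standardization in DP makes it harder to establish a single solution for all WSI formats. These challenges make it difficult for bio-informatics researchers to balance privacy and progress while developing AI algorithms. This paper explores the legal regulations and terminology for medical data-sharing. We review existing approaches and highlight challenges from the histopathological perspective. We also present a data-sharing guideline for histological data to foster multidisciplinary research and education.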
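paper_authors: Jens Tuyls, Dhruv Madeka, Kari Torkkola, Dean Foster, Karthik Narasimhan, Sham Kakade
for: This paper investigates whether scaling up model and data size helps imitation learning (IL) agents recover expert behavior more fully.
methods: The study trains agents with imitation learning (IL) while scaling both model size and dataset size, using the game NetHack as a testbed.
results: Scaling model and data size lets IL agents better recover expert behavior, and the resulting agents outperform the prior state of the art by at least 2x in all settings.
Abstract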
Imitation Learning (IL) is one of the most widely used methods in machine learning. Yet, while powerful, many works find it is often not able to fully recover the underlying expert behavior. However, none of these works deeply investigate the role of scaling up the model and data size. Inspired by recent work in Natural Language Processing (NLP) where "scaling up" has resulted in increasingly more capable LLMs, we investigate whether carefully scaling up model and data size can bring similar improvements in the imitation learning setting. To demonstrate our findings, we focus on the game of NetHack, a challenging environment featuring procedural generation, stochasticity, long-term dependencies, and partial observability. We find IL loss and mean return scale smoothly with the compute budget and are strongly correlated, resulting in power laws for training compute-optimal IL agents with respect to model size and number of samples. We forecast and train several NetHack agents with IL and find they outperform prior state-of-the-art by at least 2x in all settings. Our work both demonstrates the scaling behavior of imitation learning in a challenging domain, as well as the viability of scaling up current approaches for increasingly capable agents in NetHack, a game that remains elusively hard for current AI systems.
摘要
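Imitation Learning (IL) is one of the most widely used methods in machine learning. Yet, while powerful, many works find that it often cannot fully recover the underlying expert behavior. However, none of these works deeply investigate the role of scaling up model and data size. Recent work in Natural Language Processing (NLP) has produced increasingly capable LLMs by scaling up. Inspired by this, we investigate whether carefully scaling up model and data size can bring similar improvements in the imitation learning setting. We focus on the game of NetHack, a challenging environment featuring procedural generation, stochasticity, long-term dependencies, and partial observability. We find that IL loss and mean return scale smoothly with the compute budget and are strongly correlated. This yields power laws for training compute-optimal IL agents with respect to model size and number of samples. We forecast and train several NetHack agents with IL and find that they outperform the prior state of the art by at least 2x in all settings. Our work demonstrates both the scaling behavior of imitation learning in a challenging domain and the viability of scaling up current approaches to build increasingly capable agents for NetHack. NetHack remains elusively hard for current AI systems.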
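The abstract reports power laws that relate compute to compute-optimal model size and number of samples. A standard way to fit such a law, y = a·x^b, is ordinary least squares on log-transformed data, because log y = log a + b·log x is linear. The sketch below assumes that fitting procedure, which the abstract does not specify. The compute and model-size values in the usage example are synthetic, not measurements from the paper.

```python
import math


def fit_power_law(x: list[float], y: list[float]) -> tuple[float, float]:
    """Fit y ~= a * x**b by least squares on (log x, log y); returns (a, b)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    lx = [math.log(v) for v in x]  # raises ValueError for non-positive inputs
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                      # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)  # intercept in log space = log of prefactor
    return a, b


def predict(a: float, b: float, x: float) -> float:
    """Evaluate the fitted power law at a new x, e.g. a larger compute budget."""
    return a * x ** b


if __name__ == "__main__":
    compute = [1e15, 1e16, 1e17, 1e18]     # synthetic FLOP budgets
    model_size = [2e6, 6.3e6, 2e7, 6.3e7]  # synthetic compute-optimal parameter counts
    a, b = fit_power_law(compute, model_size)
    print(f"N_opt ~= {a:.3g} * C^{b:.3f}")
    print(f"forecast at 1e19 FLOPs: {predict(a, b, 1e19):.3g} parameters")
```

Fitting in log space gives each order of magnitude equal weight. That suits compute sweeps spanning several decades, and it is what makes the extrapolation step ("forecast and train") a straight-line extension.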