paper_authors: Adu-Gyamfi Kojo, Kandiboina Raghupathi, Ravichandra-Mouli Varsha, Knickerbocker Skylar, Hans Zachary N, Hawkins Neal R, Sharma Anuj
for: Research on rapidly improving transportation systems in urban landscapes, covering road network planning, traffic safety, and the commuting experience.
methods: Connected vehicle data and cutting-edge deep learning techniques are used to automatically identify intersections.
results: Experiments show an overall classification accuracy of 95%, with a 97% F1 score on straight road segments and a 90% F1 score on intersections.
Abstract
In today's rapidly evolving urban landscapes, efficient and accurate mapping of road infrastructure is critical for optimizing transportation systems, enhancing road safety, and improving the overall mobility experience for drivers and commuters. Yet, a formidable bottleneck obstructs progress - the laborious and time-intensive manual identification of intersections. Simply considering the sheer number of intersections that need to be identified, and the labor hours required per intersection, the need for an automated solution becomes undeniable. To address this challenge, we propose a novel approach that leverages connected vehicle data and cutting-edge deep learning techniques. By employing geohashing to segment vehicle trajectories and then generating image representations of road segments, we utilize the YOLOv5 (You Only Look Once version 5) algorithm for accurate classification of both straight road segments and intersections. Experimental results demonstrate an impressive overall classification accuracy of 95%, with straight roads achieving a remarkable 97% F1 score and intersections reaching a 90% F1 score. This approach not only saves time and resources but also enables more frequent updates and a comprehensive understanding of the road network. Our research showcases the potential impact on traffic management, urban planning, and autonomous vehicle navigation systems. The fusion of connected vehicle data and deep learning models holds promise for a transformative shift in road infrastructure mapping, propelling us towards a smarter, safer, and more connected transportation ecosystem.
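To make the geohashing step concrete, here is a minimal sketch of how vehicle trajectory points could be bucketed into geohash cells before each cell's points are rasterized into an image for the YOLOv5 classifier. It assumes the standard base32 geohash scheme; the function names and the precision parameter are illustrative, not taken from the paper.

    import collections

    _BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

    def geohash_encode(lat, lon, precision=7):
        """Standard geohash: interleave longitude/latitude bisection bits."""
        lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
        code, ch, bit, even = [], 0, 0, True
        while len(code) < precision:
            rng, val = (lon_rng, lon) if even else (lat_rng, lat)
            mid = (rng[0] + rng[1]) / 2
            if val > mid:
                ch = (ch << 1) | 1
                rng[0] = mid
            else:
                ch <<= 1
                rng[1] = mid
            even, bit = not even, bit + 1
            if bit == 5:                      # 5 bits per base32 character
                code.append(_BASE32[ch])
                ch, bit = 0, 0
        return "".join(code)

    def segment_by_geohash(points, precision=7):
        """Group (lat, lon) trajectory points by the geohash cell they fall in."""
        cells = collections.defaultdict(list)
        for lat, lon in points:
            cells[geohash_encode(lat, lon, precision)].append((lat, lon))
        return cells  # each cell's points can then be rasterized into an image

At precision 7 a geohash cell is roughly 150 m on a side, a plausible scale for isolating a single intersection; the precision actually used by the paper is not stated here.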
Updating Clinical Risk Stratification Models Using Rank-Based Compatibility: Approaches for Evaluating and Optimizing Clinician-Model Team Performance
results: In a case study on mortality risk stratification using the MIMIC dataset, the approach yields more compatible models while maintaining discriminative performance; compared with existing model selection techniques, $C^R$ increases by $0.019$ ($95\%$ confidence interval: $0.005$, $0.035$).
Abstract
As data shift or new data become available, updating clinical machine learning models may be necessary to maintain or improve performance over time. However, updating a model can introduce compatibility issues when the behavior of the updated model does not align with user expectations, resulting in poor user-model team performance. Existing compatibility measures depend on model decision thresholds, limiting their applicability in settings where models are used to generate rankings based on estimated risk. To address this limitation, we propose a novel rank-based compatibility measure, $C^R$, and a new loss function that aims to optimize discriminative performance while encouraging good compatibility. Applied to a case study in mortality risk stratification leveraging data from MIMIC, our approach yields more compatible models while maintaining discriminative performance compared to existing model selection techniques, with an increase in $C^R$ of $0.019$ ($95\%$ confidence interval: $0.005$, $0.035$). This work provides new tools to analyze and update risk stratification models used in clinical care.
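The abstract does not give the formula for $C^R$, but the flavor of a rank-based compatibility measure can be sketched as follows: among the (negative, positive) patient pairs that the original model ranks correctly by estimated risk, count the fraction the updated model also ranks correctly. Both the function name and the definition below are assumptions for illustration, not the paper's exact measure.

    import itertools
    import numpy as np

    def rank_compatibility(y, s_old, s_new):
        """Illustrative rank-based compatibility: of the pairs the original
        model orders correctly (higher risk score for the positive case),
        the fraction the updated model also orders correctly."""
        y, s_old, s_new = map(np.asarray, (y, s_old, s_new))
        neg, pos = np.flatnonzero(y == 0), np.flatnonzero(y == 1)
        ok_old = ok_both = 0
        for i, j in itertools.product(neg, pos):
            if s_old[j] > s_old[i]:       # correctly ranked by the old model
                ok_old += 1
                ok_both += s_new[j] > s_new[i]
        return ok_both / ok_old if ok_old else float("nan")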
A Neural Network Based Choice Model for Assortment Optimization
results: The performance of the neural network model is compared against benchmark discrete-choice models, and training tricks are provided to improve the robustness and performance of the neural network model.
Abstract
Discrete-choice models are used in economics, marketing and revenue management to predict customer purchase probabilities, say as a function of prices and other features of the offered assortment. While they have been shown to be expressive, capturing customer heterogeneity and behaviour, they are also hard to estimate, often based on many unobservables like utilities; and moreover, they still fail to capture many salient features of customer behaviour. A natural question then, given their success in other contexts, is if neural networks can eliminate the necessity of carefully building a context-dependent customer behaviour model and hand-coding and tuning the estimation. It is unclear however how one would incorporate assortment effects into such a neural network, and also how one would optimize the assortment with such a black-box generative model of choice probabilities. In this paper we investigate first whether a single neural network architecture can predict purchase probabilities for datasets from various contexts and generated under various models and assumptions. Next, we develop an assortment optimization formulation that is solvable by off-the-shelf integer programming solvers. We compare against a variety of benchmark discrete-choice models on simulated as well as real-world datasets, developing training tricks along the way to make the neural network prediction and subsequent optimization robust and comparable in performance to the alternates.
A Smart Robotic System for Industrial Plant Supervision
paper_authors: D. Adriana Gómez-Rosal, Max Bergau, Georg K. J. Fischer, Andreas Wachaja, Johannes Gräter, Matthias Odenweller, Uwe Piechottka, Fabian Hoeflinger, Nikhil Gosala, Niklas Wetzel, Daniel Büscher, Abhinav Valada, Wolfram Burgard
results: We evaluated the system extensively at a wastewater treatment facility under full working conditions; the results show that the system can navigate the plant autonomously and provide useful information about anomalous operating conditions.
Abstract
In today's chemical production plants, human field operators perform frequent checks on the plant's integrity to guarantee high safety standards, and thus are possibly the first to encounter dangerous operating conditions. To alleviate their tasks of failure detection and monitoring by audio, visual, and olfactory perceptions, we present a robotic system that consists of an autonomously navigating robot integrated with various sensors and data processing. We aim to resemble the human sensing and interpretation capabilities of sight, smell, and hearing, for providing automated inspection. We evaluate our system extensively at a wastewater facility in full working conditions. Our results demonstrate that the system is able to robustly navigate a plant and to provide useful information about critical operating conditions.
Multi-graph Spatio-temporal Graph Convolutional Network for Traffic Flow Prediction
results: Clear improvements in predictive accuracy over baseline methods, along with practical benefits in business applications.
Abstract
Inter-city highway transportation is significant for urban life. As one of the key functions in intelligent transportation systems (ITS), traffic evaluation plays a significant role nowadays, and daily traffic flow prediction still faces challenges at network-wide toll stations. On the one hand, the data imbalance in practice among various locations deteriorates the performance of prediction. On the other hand, complex correlative spatio-temporal factors cannot be comprehensively employed over a long-term duration. In this paper, a prediction method is proposed for daily traffic flow in the highway domain through spatio-temporal deep learning. In our method, a data normalization strategy is used to deal with data imbalance, due to the long-tail distribution of traffic flow at network-wide toll stations. Then, based on graph convolutional networks, we construct networks with distinct semantics to capture spatio-temporal features. Besides that, meteorology and calendar features are used by our model in the fully connected stage to extract external characteristics of traffic flow. Through extensive experiments and case studies on one Chinese provincial highway, our method shows clear improvement in predictive accuracy over baselines and practical benefits in business.
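As a concrete illustration of the data-normalization idea, long-tailed traffic counts are often compressed with a log transform before standardization. The paper's exact strategy is not specified in the abstract, so the sketch below is only one plausible choice.

    import numpy as np

    def normalize_flow(flow):
        """Log-compress long-tailed daily flow counts, then standardize."""
        logged = np.log1p(np.asarray(flow, dtype=float))  # tames the long tail
        return (logged - logged.mean()) / (logged.std() + 1e-8)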
Optical Script Identification for multi-lingual Indic-script
results: The paper surveys approaches to script pre-processing and text recognition and provides a comparative performance analysis of existing methods, with the aim of reducing recognition error rates and improving recognition accuracy.
Abstract
Script identification and text recognition are some of the major domains in the application of Artificial Intelligence. In this era of digitalization, the use of digital note-taking has become a common practice. Still, conventional methods of using pen and paper are a prominent way of writing. This leads to the classification of scripts based on the method by which they are obtained. A survey of the current methodologies and state-of-the-art methods used for processing and identification would prove beneficial for researchers. The aim of this article is to discuss the advancement in the techniques for script pre-processing and text recognition. In India there are twelve prominent Indic scripts; unlike the English language, these scripts have layers of characteristics. Complex characteristics such as similarity in text shape make them difficult to recognize and analyze, thus requiring advanced preprocessing methods for their accurate recognition. A sincere attempt is made in this survey to provide a comparison between all algorithms. We hope that this survey will provide insight to researchers working not only on Indic scripts but also other languages.
Proximal Policy Optimization Actual Combat: Manipulating Output Tokenizer Length
results: Experimental results show that PPO can not only manipulate the output token length in this type of task, but also trains more easily once the influence of the reward model's effectiveness is excluded.
Abstract
The Reinforcement Learning from Human Feedback (RLHF) plays a pivotal role in shaping the impact of large language models (LLMs), contributing significantly to controlling output toxicity and selecting output styles, particularly as LLMs often harbor misleading content, highlighting the urgency to align them with human values for secure AI systems. The RLHF, characterized by complexity, instability, and sensitivity to hyperparameters, makes the evaluation of the reward model for complex tasks challenging, thereby further complicating the use of Proximal Policy Optimization (PPO). In this paper, we introduce a simple task designed to employ Golden as a reward model that validates the effectiveness of PPO and inspires it, primarily explaining the task of utilizing PPO to manipulate the tokenizer length of the output generated by the model. Experiments confirm that PPO is not only effective in manipulating the output tokenizer length to a certain extent in this type of task but also exhibits facilitated training once the influence of the reward model effect is excluded, making it an exciting development.
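A rule-based "golden" reward for this task can be as simple as penalizing the distance between the generated length and a target length; PPO then maximizes that reward. The exact reward shape used in the paper is not given in the abstract, so the linear penalty below is an assumption for illustration.

    def length_reward(output_token_ids, target_len=50):
        """Rule-based ("golden") reward: closer to the target length is better.
        Returns 0 at the target and decreases linearly with the deviation."""
        return -abs(len(output_token_ids) - target_len)

    # e.g., reward for a 42-token completion with a 50-token target: -8
    print(length_reward(list(range(42))))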
Generative Diffusion Models for Radio Wireless Channel Modelling and Sampling
paper_authors: Ushnish Sengupta, Chinkuo Jao, Alberto Bernacchia, Sattar Vakili, Da-shan Shiu
for: A diffusion-model-based channel sampling approach for rapidly synthesizing channel realizations from limited data.
methods: We use a diffusion model with a U-Net based architecture operating in the frequency-space domain.
results: We compare our diffusion model with existing GAN-based approaches and find that it trains stably and generates diverse, high-fidelity channel samples. We also show that the model can be pretrained on a simulated dataset and fine-tuned on a smaller, out-of-distribution channel dataset, demonstrating the feasibility of modelling real-world channels with limited data.
Abstract
Channel modelling is essential to designing modern wireless communication systems. The increasing complexity of channel modelling and the cost of collecting high-quality wireless channel data have become major challenges. In this paper, we propose a diffusion model based channel sampling approach for rapidly synthesizing channel realizations from limited data. We use a diffusion model with a U Net based architecture operating in the frequency space domain. To evaluate how well the proposed model reproduces the true distribution of channels in the training dataset, two evaluation metrics are used: $i)$ the approximate $2$-Wasserstein distance between real and generated distributions of the normalized power spectrum in the antenna and frequency domains and $ii)$ precision and recall metric for distributions. We show that, compared to existing GAN based approaches which suffer from mode collapse and unstable training, our diffusion based approach trains stably and generates diverse and high-fidelity samples from the true channel distribution. We also show that we can pretrain the model on a simulated urban macro-cellular channel dataset and fine-tune it on a smaller, out-of-distribution urban micro-cellular dataset, therefore showing that it is feasible to model real world channels using limited data with this approach.
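For intuition, the 2-Wasserstein distance between two one-dimensional empirical distributions reduces to the L2 distance between their quantile functions, which can be estimated from sorted samples. The sketch below is generic; the paper applies such a metric to normalized power spectra in the antenna and frequency domains.

    import numpy as np

    def w2_empirical(x, y, n=10_000, seed=0):
        """Monte-Carlo estimate of the 2-Wasserstein distance between two 1-D
        samples via the quantile coupling: sort equal-size resamples and take
        the root-mean-square difference."""
        rng = np.random.default_rng(seed)
        xs = np.sort(rng.choice(np.ravel(x), size=n, replace=True))
        ys = np.sort(rng.choice(np.ravel(y), size=n, replace=True))
        return float(np.sqrt(np.mean((xs - ys) ** 2)))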
C5: Towards Better Conversation Comprehension and Contextual Continuity for ChatGPT
paper_authors: Pan Liang, Danwei Ye, Zihao Zhu, Yunchao Wang, Wang Xia, Ronghua Liang, Guodao Sun
for: This paper proposes an interactive conversation visualization system to help users better comprehend conversations and maintain contextual information.
methods: The paper presents an interactive conversation visualization system called C5, comprising a Global View, a Topic View, and a Context-associated Q&A View. The Global View uses a GitLog diagram metaphor to represent the conversation structure, showing the trend of conversation evolution and supporting exploration of locally salient features. The Topic View displays all question and answer nodes and their relationships within a topic using the structure of a knowledge graph, showing the relevance and evolution of conversations. The Context-associated Q&A View consists of three linked views that let users explore individual conversations in depth while providing specific contextual information when posing questions.
results: A user study shows that C5 helps users better understand and maintain the contextual information of conversations, improving user satisfaction and efficiency.
Abstract
Large language models (LLMs), such as ChatGPT, have demonstrated outstanding performance in various fields, particularly in natural language understanding and generation tasks. In complex application scenarios, users tend to engage in multi-turn conversations with ChatGPT to keep contextual information and obtain comprehensive responses. However, human forgetting and model contextual forgetting remain prominent issues in multi-turn conversation scenarios, which challenge the users' conversation comprehension and contextual continuity for ChatGPT. To address these challenges, we propose an interactive conversation visualization system called C5, which includes Global View, Topic View, and Context-associated Q\&A View. The Global View uses the GitLog diagram metaphor to represent the conversation structure, presenting the trend of conversation evolution and supporting the exploration of locally salient features. The Topic View is designed to display all the question and answer nodes and their relationships within a topic using the structure of a knowledge graph, thereby display the relevance and evolution of conversations. The Context-associated Q\&A View consists of three linked views, which allow users to explore individual conversations deeply while providing specific contextual information when posing questions. The usefulness and effectiveness of C5 were evaluated through a case study and a user study.
Recent Advancements In The Field Of Deepfake Detection
paper_authors: Natalie Krueger, Dr. Mounika Vanamala, Dr. Rushit Dave
for: The purpose of this study is to survey and analyze current methods and advances in the field of deepfake detection, in order to address the problems of malicious deepfake creation and the lack of universal deepfake detection methods.
results: The study analyzes and evaluates a variety of deepfake detection methods and highlights new methods and techniques for improving the accuracy and effectiveness of deepfake detection.
Abstract
A deepfake is a photo or video of a person whose image has been digitally altered or partially replaced with an image of someone else. Deepfakes have the potential to cause a variety of problems and are often used maliciously. A common usage is altering videos of prominent political figures and celebrities. These deepfakes can portray them making offensive, problematic, and/or untrue statements. Current deepfakes can be very realistic, and when used in this way, can spread panic and even influence elections and political opinions. There are many deepfake detection strategies currently in use but finding the most comprehensive and universal method is critical. So, in this survey we will address the problems of malicious deepfake creation and the lack of universal deepfake detection methods. Our objective is to survey and analyze a variety of current methods and advances in the field of deepfake detection.
results: The paper highlights practical applications of distributed optimization in fields such as machine learning and imaging, as well as the performance advantages of the Augmented Lagrangian-based ALADIN algorithm on non-convex optimization problems.
Abstract
This paper provides an overview of the historical progression of distributed optimization techniques, tracing their development from early duality-based methods pioneered by Dantzig, Wolfe, and Benders in the 1960s to the emergence of the Augmented Lagrangian Alternating Direction Inexact Newton (ALADIN) algorithm. The initial focus on Lagrangian relaxation for convex problems and decomposition strategies led to the refinement of methods like the Alternating Direction Method of Multipliers (ADMM). The resurgence of interest in distributed optimization in the late 2000s, particularly in machine learning and imaging, demonstrated ADMM's practical efficacy and its unifying potential. This overview also highlights the emergence of the proximal center method and its applications in diverse domains. Furthermore, the paper underscores the distinctive features of ALADIN, which offers convergence guarantees for non-convex scenarios without introducing auxiliary variables, differentiating it from traditional augmentation techniques. In essence, this work encapsulates the historical trajectory of distributed optimization and underscores the promising prospects of ALADIN in addressing non-convex optimization challenges.
Enhancing AUV Autonomy With Model Predictive Path Integral Control
paper_authors: Pierre Nicolay, Yvan Petillot, Mykhaylo Marfeychuk, Sen Wang, Ignacio Carlucho
for: This paper investigates the feasibility of Model Predictive Path Integral (MPPI) control for autonomous underwater vehicle (AUV) control.
methods: The paper uses a non-linear AUV model to propagate the MPPI samples, allowing the control action to be computed in real time.
results: The paper compares the performance of the MPPI controller with classical PID and cascade PID controllers, demonstrating the superiority of the MPPI controller. It also shows how the MPPI controller handles environmental constraints by incorporating them into the cost function.
Abstract
Autonomous underwater vehicles (AUVs) play a crucial role in surveying marine environments, carrying out underwater inspection tasks, and ocean exploration. However, in order to ensure that the AUV is able to carry out its mission successfully, a control system capable of adapting to changing environmental conditions is required. Furthermore, to ensure the robotic platform's safe operation, the onboard controller should be able to operate under certain constraints. In this work, we investigate the feasibility of Model Predictive Path Integral Control (MPPI) for the control of an AUV. We utilise a non-linear model of the AUV to propagate the samples of the MPPI, which allow us to compute the control action in real time. We provide a detailed evaluation of the effect of the main hyperparameters on the performance of the MPPI controller. Furthermore, we compared the performance of the proposed method with a classical PID and Cascade PID approach, demonstrating the superiority of our proposed controller. Finally, we present results where environmental constraints are added and show how MPPI can handle them by simply incorporating those constraints in the cost function.
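The core MPPI update is compact enough to sketch: perturb a nominal control sequence with Gaussian noise, roll out the non-linear model, and average the perturbations weighted by the exponentiated negative trajectory cost. This is a generic textbook-style sketch with `dynamics` and `cost` as user-supplied callables, not the paper's implementation; constraints can be handled by adding penalty terms to `cost`, as the abstract describes.

    import numpy as np

    def mppi_step(x0, U, dynamics, cost, n_samples=256, sigma=0.3, lam=1.0):
        """One MPPI update for a horizon-H control sequence U of shape (H, m)."""
        H, m = U.shape
        noise = np.random.normal(0.0, sigma, size=(n_samples, H, m))
        costs = np.zeros(n_samples)
        for k in range(n_samples):            # roll out each perturbed sequence
            x = x0
            for t in range(H):
                x = dynamics(x, U[t] + noise[k, t])
                costs[k] += cost(x)
        beta = costs.min()                    # shift costs for numerical stability
        w = np.exp(-(costs - beta) / lam)
        w /= w.sum()
        return U + np.einsum("k,khm->hm", w, noise)  # cost-weighted perturbation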
Models Matter: The Impact of Single-Step Retrosynthesis on Synthesis Planning
results: The study finds that high single-step retrosynthesis performance does not necessarily translate into strong multi-step synthesis planning performance, and that choosing a different single-step retrosynthesis model can improve the success rate of multi-step synthesis planning.
Abstract
Retrosynthesis consists of breaking down a chemical compound recursively step-by-step into molecular precursors until a set of commercially available molecules is found with the goal to provide a synthesis route. Its two primary research directions, single-step retrosynthesis prediction, which models the chemical reaction logic, and multi-step synthesis planning, which tries to find the correct sequence of reactions, are inherently intertwined. Still, this connection is not reflected in contemporary research. In this work, we combine these two major research directions by applying multiple single-step retrosynthesis models within multi-step synthesis planning and analyzing their impact using public and proprietary reaction data. We find a disconnection between high single-step performance and potential route-finding success, suggesting that single-step models must be evaluated within synthesis planning in the future. Furthermore, we show that the commonly used single-step retrosynthesis benchmark dataset USPTO-50k is insufficient as this evaluation task does not represent model performance and scalability on larger and more diverse datasets. For multi-step synthesis planning, we show that the choice of the single-step model can improve the overall success rate of synthesis planning by up to +28% compared to the commonly used baseline model. Finally, we show that each single-step model finds unique synthesis routes, and differs in aspects such as route-finding success, the number of found synthesis routes, and chemical validity, making the combination of single-step retrosynthesis prediction and multi-step synthesis planning a crucial aspect when developing future methods.
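To see why the single-step model matters so much, consider a toy best-first planner in which the single-step model is a pluggable callable proposing scored precursor sets: which routes are found, and whether a route is found at all, depends entirely on what that callable proposes. This is a deliberately simplified sketch; the real planners studied in the paper are considerably more involved.

    import heapq
    import itertools

    def find_route(target, one_step, purchasable, max_expansions=1000):
        """Best-first search over retrosynthetic expansions.
        one_step(mol) -> list of (score, [precursor molecules]);
        purchasable(mol) -> bool. Returns a list of (product, precursors)
        steps, or None if no route is found within the budget."""
        tiebreak = itertools.count()
        frontier = [(0.0, next(tiebreak), [target], [])]
        for _ in range(max_expansions):
            if not frontier:
                return None
            neg_score, _, todo, route = heapq.heappop(frontier)
            todo = [m for m in todo if not purchasable(m)]
            if not todo:
                return route                  # every leaf is purchasable
            mol, rest = todo[0], todo[1:]
            for score, precursors in one_step(mol):
                heapq.heappush(frontier, (neg_score - score, next(tiebreak),
                                          rest + list(precursors),
                                          route + [(mol, precursors)]))
        return None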
Mono-hydra: Real-time 3D scene graph construction from monocular camera input with IMU
results: The system processes in real time and achieves sub-20 cm error, enabling robots to make decisions more quickly and improving the efficiency and effectiveness of decision-making.
Abstract
The ability of robots to autonomously navigate through 3D environments depends on their comprehension of spatial concepts, ranging from low-level geometry to high-level semantics, such as objects, places, and buildings. To enable such comprehension, 3D scene graphs have emerged as a robust tool for representing the environment as a layered graph of concepts and their relationships. However, building these representations using monocular vision systems in real-time remains a difficult task that has not been explored in depth. This paper puts forth a real-time spatial perception system Mono-Hydra, combining a monocular camera and an IMU sensor setup, focusing on indoor scenarios. However, the proposed approach is adaptable to outdoor applications, offering flexibility in its potential uses. The system employs a suite of deep learning algorithms to derive depth and semantics. It uses a robocentric visual-inertial odometry (VIO) algorithm based on square-root information, thereby ensuring consistent visual odometry with an IMU and a monocular camera. This system achieves sub-20 cm error in real-time processing at 15 fps, enabling real-time 3D scene graph construction using a laptop GPU (NVIDIA 3080). This enhances decision-making efficiency and effectiveness in simple camera setups, augmenting robotic system agility. We make Mono-Hydra publicly available at: https://github.com/UAV-Centre-ITC/Mono_Hydra
Multi-domain Recommendation with Embedding Disentangling and Domain Alignment
for: This study targets multi-domain recommendation (MDR), i.e., providing recommendations for users/items across different domains (e.g., product types). Existing MDR models face two challenges: first, it is difficult to disentangle knowledge that generalizes across domains (e.g., a user likes cheap items) from knowledge specific to a single domain (e.g., a user likes blue clothing but not blue cars); second, they have limited ability to transfer knowledge across domains with small overlaps.
methods: We propose a new MDR method named EDDA with two key components: an embedding disentangling recommender and domain alignment. The embedding disentangling recommender separates both the model and the embeddings into an inter-domain part and an intra-domain part, whereas most existing MDR methods focus only on model-level disentangling. Domain alignment uses random walks from graph processing to identify similar user/item pairs across domains and encourages similar pairs to have similar embeddings, enhancing knowledge transfer.
results: We compare EDDA with 12 state-of-the-art baselines on 3 real datasets. EDDA consistently outperforms the baselines on all datasets and domains. All datasets and code are available at https://github.com/Stevenn9981/EDDA.
Abstract
Multi-domain recommendation (MDR) aims to provide recommendations for different domains (e.g., types of products) with overlapping users/items and is common for platforms such as Amazon, Facebook, and LinkedIn that host multiple services. Existing MDR models face two challenges: First, it is difficult to disentangle knowledge that generalizes across domains (e.g., a user likes cheap items) and knowledge specific to a single domain (e.g., a user likes blue clothing but not blue cars). Second, they have limited ability to transfer knowledge across domains with small overlaps. We propose a new MDR method named EDDA with two key components, i.e., embedding disentangling recommender and domain alignment, to tackle the two challenges respectively. In particular, the embedding disentangling recommender separates both the model and embedding for the inter-domain part and the intra-domain part, while most existing MDR methods only focus on model-level disentangling. The domain alignment leverages random walks from graph processing to identify similar user/item pairs from different domains and encourages similar user/item pairs to have similar embeddings, enhancing knowledge transfer. We compare EDDA with 12 state-of-the-art baselines on 3 real datasets. The results show that EDDA consistently outperforms the baselines on all datasets and domains. All datasets and codes are available at https://github.com/Stevenn9981/EDDA.
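The domain-alignment idea can be illustrated with a small random-walk routine: walks over the cross-domain user-item interaction graph surface node pairs that frequently co-occur, and those pairs are then encouraged to have similar embeddings. The sketch below shows only the pair-mining step and is generic, not EDDA's exact procedure.

    import random
    from collections import defaultdict

    def random_walk_pairs(adj, num_walks=10, walk_len=8, window=2, seed=0):
        """Mine co-occurring node pairs from truncated random walks.
        adj maps a node id to a list of neighbour ids (users and items from
        all domains share one graph). Frequent pairs are alignment candidates."""
        rng = random.Random(seed)
        pairs = defaultdict(int)
        for start in adj:
            for _ in range(num_walks):
                walk = [start]
                while len(walk) < walk_len and adj[walk[-1]]:
                    walk.append(rng.choice(adj[walk[-1]]))
                for i, u in enumerate(walk):
                    for v in walk[max(0, i - window):i]:
                        if u != v:
                            pairs[tuple(sorted((u, v)))] += 1
        return pairs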
results: Proves that EFX allocations always exist for general binary valuations and provides a polynomial-time algorithm for computing an EFX allocation.
Abstract
We study the fair division problem and the existence of allocations satisfying the fairness criterion envy-freeness up to any item (EFX). The existence of EFX allocations is a major open problem in the fair division literature. We consider binary valuations where the marginal gain of the value by receiving an extra item is either $0$ or $1$. Babaioff et al. [2021] proved that EFX allocations always exist for binary and submodular valuations. In this paper, by using completely different techniques, we extend this existence result to general binary valuations that are not necessarily submodular, and we present a polynomial time algorithm for computing an EFX allocation.
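The EFX criterion is easy to state operationally, and a checker makes the definition concrete: agent i is EFX-satisfied if, for every other agent j and every item g in j's bundle, i values its own bundle at least as much as j's bundle with g removed. The sketch below takes arbitrary valuation functions, so it covers the binary (0/1 marginal gain) valuations studied in the paper as a special case.

    def is_efx(bundles, valuations):
        """bundles: list of sets of items, one per agent.
        valuations: list of functions, each mapping a frozenset of items to
        that agent's value. Returns True iff the allocation is EFX."""
        n = len(bundles)
        for i in range(n):
            own = valuations[i](frozenset(bundles[i]))
            for j in range(n):
                if i == j:
                    continue
                for g in bundles[j]:
                    if own < valuations[i](frozenset(bundles[j]) - {g}):
                        return False        # i envies j even after dropping g
        return True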
Bringing order into the realm of Transformer-based language models for artificial intelligence and law
for: This paper provides a systematic overview of Transformer-based language models (TLMs) for AI-driven problems and tasks in the legal domain, with a focus on highlighting research advances and current limitations.
methods: The paper uses TLMs, specifically BERT and related models, to address AI-driven problems and tasks in the legal domain.
results: The paper provides a comprehensive overview of the current state of TLM-based methods in the legal domain, highlighting their contributions and limitations, and identifying opportunities for further research development.
Abstract
Transformer-based language models (TLMs) have widely been recognized to be a cutting-edge technology for the successful development of deep-learning-based solutions to problems and applications that require natural language processing and understanding. Like for other textual domains, TLMs have indeed pushed the state-of-the-art of AI approaches for many tasks of interest in the legal domain. Despite the first Transformer model being proposed about six years ago, there has been rapid progress of this technology at an unprecedented rate, whereby BERT and related models represent a major reference, also in the legal domain. This article provides the first systematic overview of TLM-based methods for AI-driven problems and tasks in the legal sphere. A major goal is to highlight research advances in this field so as to understand, on the one hand, how the Transformers have contributed to the success of AI in supporting legal processes, and on the other hand, what are the current limitations and opportunities for further research development.
More Than Meets the Eye: Analyzing Anesthesiologists’ Visual Attention in the Operating Room Using Deep Learning Models
for: This study aims to improve the safe management of patients under general anesthesia by analyzing the visual attention of anesthesiologists during surgery.
methods: The study uses a novel eye-tracking method based on deep learning models that process monitor-mounted webcams to collect continuous behavioral data without disturbing the anesthesiologists’ natural workflow.
results: The study found that the proposed framework can distinguish between different visual behavioral patterns, including baseline visual attention during uneventful periods, patterns associated with active phases, and patterns during critical, unanticipated incidents.
Abstract
Patient's vital signs, which are displayed on monitors, make the anesthesiologist's visual attention (VA) a key component in the safe management of patients under general anesthesia; moreover, the distribution of said VA and the ability to acquire specific cues throughout the anesthetic, may have a direct impact on patient's outcome. Currently, most studies employ wearable eye-tracking technologies to analyze anesthesiologists' visual patterns. Albeit being able to produce meticulous data, wearable devices are not a sustainable solution for large-scale or long-term use for data collection in the operating room (OR). Thus, by utilizing a novel eye-tracking method in the form of deep learning models that process monitor-mounted webcams, we collected continuous behavioral data and gained insight into the anesthesiologist's VA distribution with minimal disturbance to their natural workflow. In this study, we collected OR video recordings using the proposed framework and compared different visual behavioral patterns. We distinguished between baseline VA distribution during uneventful periods to patterns associated with active phases or during critical, unanticipated incidents. In the future, such a platform may serve as a crucial component of context-aware assistive technologies in the OR.
Exploring XAI for the Arts: Explaining Latent Space in Generative Music
paper_authors: Nick Bryan-Kinns, Berker Banar, Corey Ford, Courtney N. Reed, Yixiao Zhang, Simon Colton, Jack Armitage
for: This work aims to enhance eXplainable AI (XAI) capabilities so that creative AI systems can be better debugged and understood.
methods: The study extends the MeasureVAE music generation model, using latent space regularisation to increase the model's explainability.
results: The researchers provide a user interface feedback loop and a visualisation of the latent space to help people better understand and predict the model's outputs.
Abstract
Explainable AI has the potential to support more interactive and fluid co-creative AI systems which can creatively collaborate with people. To do this, creative AI models need to be amenable to debugging by offering eXplainable AI (XAI) features which are inspectable, understandable, and modifiable. However, currently there is very little XAI for the arts. In this work, we demonstrate how a latent variable model for music generation can be made more explainable; specifically we extend MeasureVAE which generates measures of music. We increase the explainability of the model by: i) using latent space regularisation to force some specific dimensions of the latent space to map to meaningful musical attributes, ii) providing a user interface feedback loop to allow people to adjust dimensions of the latent space and observe the results of these changes in real-time, iii) providing a visualisation of the musical attributes in the latent space to help people understand and predict the effect of changes to latent space dimensions. We suggest that in doing so we bridge the gap between the latent space and the generated musical outcomes in a meaningful way which makes the model and its outputs more explainable and more debuggable.
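One way to force a latent dimension to track a musical attribute, in the spirit of the regularisation described above, is to penalize disagreement between the sign pattern of pairwise differences in that latent dimension and in the attribute values across a batch. The sketch below follows this general recipe; the loss actually used with MeasureVAE may differ in its details.

    import torch
    import torch.nn.functional as F

    def attribute_reg_loss(z_dim, attr, gamma=1.0):
        """z_dim: (B,) values of one latent dimension; attr: (B,) values of a
        musical attribute (e.g., note density). Encourages the latent dimension
        to order batch items the same way the attribute does."""
        dz = z_dim.unsqueeze(0) - z_dim.unsqueeze(1)   # (B, B) pairwise diffs
        da = attr.unsqueeze(0) - attr.unsqueeze(1)
        return F.l1_loss(torch.tanh(gamma * dz), torch.sign(da))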
results: Experimental results show that D-Bot can efficiently and effectively diagnose the root causes of issues in target databases and can help DBAs maintain and optimize database systems.
Abstract
Database administrators (DBAs) play a crucial role in managing, maintaining and optimizing a database system to ensure data availability, performance, and reliability. However, it is hard and tedious for DBAs to manage a large number of database instances (e.g., millions of instances on the cloud databases). Recently large language models (LLMs) have shown great potential to understand valuable documents and accordingly generate reasonable answers. Thus, we propose D-Bot, a LLM-based database administrator that can continuously acquire database maintenance experience from textual sources, and provide reasonable, well-founded, in-time diagnosis and optimization advice for target databases. This paper presents a revolutionary LLM-centric framework for database maintenance, including (i) database maintenance knowledge detection from documents and tools, (ii) tree of thought reasoning for root cause analysis, and (iii) collaborative diagnosis among multiple LLMs. Our preliminary experimental results show that D-Bot can efficiently and effectively diagnose the root causes, and our code is available at github.com/TsinghuaDatabaseGroup/DB-GPT.
Reviewing 3D Object Detectors in the Context of High-Resolution 3+1D Radar
paper_authors: Patrick Palmer, Martin Krueger, Richard Altendorfer, Ganesh Adam, Torsten Bertram
for: This study explores deep learning models for 3D object detection on high-resolution radar data.
methods: The study adapts existing 3D object detectors developed for lidar point clouds to high-resolution radar data.
results: The study identifies 3D object detection models suited to high-resolution radar data and evaluates their performance on different datasets.
Abstract
Recent developments and the beginning market introduction of high-resolution imaging 4D (3+1D) radar sensors have initialized deep learning-based radar perception research. We investigate deep learning-based models operating on radar point clouds for 3D object detection. 3D object detection on lidar point cloud data is a mature area of 3D vision. Many different architectures have been proposed, each with strengths and weaknesses. Due to similarities between 3D lidar point clouds and 3+1D radar point clouds, those existing 3D object detectors are a natural basis to start deep learning-based 3D object detection on radar data. Thus, the first step is to analyze the detection performance of the existing models on the new data modality and evaluate them in depth. In order to apply existing 3D point cloud object detectors developed for lidar point clouds to the radar domain, they need to be adapted first. While some detectors, such as PointPillars, have already been adapted to be applicable to radar data, we have adapted others, e.g., Voxel R-CNN, SECOND, PointRCNN, and PV-RCNN. To this end, we conduct a cross-model validation (evaluating a set of models on one particular data set) as well as a cross-data set validation (evaluating all models in the model set on several data sets). The high-resolution radar data used are the View-of-Delft and Astyx data sets. Finally, we evaluate several adaptations of the models and their training procedures. We also discuss major factors influencing the detection performance on radar data and propose possible solutions indicating potential future research avenues.
Explainable AI applications in the Medical Domain: a systematic review
results: The results show that most XAI solutions in medical AI employ model-agnostic techniques, that deep learning models are used more widely than other types of machine learning models, and that explainability is applied to promote trust, although few works report physician participation in the loop. Visual and interactive user interfaces also proved more useful for understanding explanations.
Abstract
Artificial Intelligence in Medicine has made significant progress with emerging applications in medical imaging, patient care, and other areas. While these applications have proven successful in retrospective studies, very few of them were applied in practice. The field of Medical AI faces various challenges in terms of building user trust, complying with regulations, and using data ethically. Explainable AI (XAI) aims to enable humans to understand AI and trust its results. This paper presents a literature review on the recent developments of XAI solutions for medical decision support, based on a representative sample of 198 articles published in recent years. The systematic synthesis of the relevant articles resulted in several findings. (1) model-agnostic XAI techniques were mostly employed in these solutions, (2) deep learning models are utilized more than other types of machine learning models, (3) explainability was applied to promote trust, but very few works reported the physicians' participation in the loop, (4) visual and interactive user interfaces are more useful in understanding the explanation and the recommendation of the system. More research is needed in collaboration between medical and AI experts, which could guide the development of suitable frameworks for the design, implementation, and evaluation of XAI solutions in medicine.
A Comparative Assessment of Multi-view fusion learning for Crop Classification
paper_authors: Francisco Mena, Diego Arenas, Marlon Nuske, Andreas Dengel
for: This study aims to assess the performance of different multi-view fusion learning models on crop classification tasks.
methods: The study uses multiple multi-view fusion strategies, including input-level, feature-level, and layer-level fusion.
results: The study finds that different fusion strategies obtain the best performance in different test regions. It also finds that multi-view fusion strategies can outperform single-view models and previous fusion approaches.
Abstract
With a rapidly increasing amount and diversity of remote sensing (RS) data sources, there is a strong need for multi-view learning modeling. This is a complex task when considering the differences in resolution, magnitude, and noise of RS data. The typical approach for merging multiple RS sources has been input-level fusion, but other - more advanced - fusion strategies may outperform this traditional approach. This work assesses different fusion strategies for crop classification in the CropHarvest dataset. The fusion methods proposed in this work outperform models based on individual views and previous fusion methods. We do not find one single fusion method that consistently outperforms all other approaches. Instead, we present a comparison of multi-view fusion methods for three different datasets and show that, depending on the test region, different methods obtain the best performance. Despite this, we suggest a preliminary criterion for the selection of fusion methods.
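The difference between input-level and feature-level fusion is easiest to see in code: input-level fusion concatenates the raw views and uses a single encoder, while feature-level fusion gives each view its own encoder and merges the learned features. The sketch below shows a feature-level variant with a simple concatenation merge; the encoders and merge operations compared in the paper differ.

    import torch
    import torch.nn as nn

    class FeatureLevelFusion(nn.Module):
        """Two-view feature-level fusion: per-view encoders, merged features."""
        def __init__(self, d1, d2, hidden=64, n_classes=10):
            super().__init__()
            self.enc1 = nn.Sequential(nn.Linear(d1, hidden), nn.ReLU())
            self.enc2 = nn.Sequential(nn.Linear(d2, hidden), nn.ReLU())
            self.head = nn.Linear(2 * hidden, n_classes)

        def forward(self, x1, x2):
            # input-level fusion would instead encode torch.cat([x1, x2], -1)
            # with one shared encoder
            h = torch.cat([self.enc1(x1), self.enc2(x2)], dim=-1)
            return self.head(h)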
results: Experiments show that existential rule reasoners can improve the performance of \nthree reasoning, particularly for use cases containing many facts. In addition, the EYE reasoner is especially fast when dealing with a high number of dependent rules.
Abstract
Notation3 Logic (\nthree) is an extension of RDF that allows the user to write rules introducing new blank nodes to RDF graphs. Many applications (e.g., ontology mapping) rely on this feature as blank nodes -- used directly or in auxiliary constructs -- are omnipresent on the Web. However, the number of fast \nthree reasoners covering this very important feature of the logic is rather limited. On the other hand, there are engines like VLog or Nemo which do not directly support Semantic Web rule formats but which are developed and optimized for very similar constructs: existential rules. In this paper, we investigate the relation between \nthree rules with blank nodes in their heads and existential rules. We identify a subset of \nthree which can be mapped directly to existential rules and define such a mapping preserving the equivalence of \nthree formulae. In order to also illustrate that in some cases \nthree reasoning could benefit from our translation, we then employ this mapping in an implementation to compare the performance of the \nthree reasoners EYE and cwm to VLog and Nemo on \nthree rules and their mapped counterparts. Our tests show that the existential rule reasoners perform particularly well for use cases containing many facts while especially the EYE reasoner is very fast when dealing with a high number of dependent rules. We thus provide a tool enabling the Semantic Web community to directly use existing and future existential rule reasoners and benefit from the findings of this active community.
Enhancing Trust in LLM-Based AI Automation Agents: New Considerations and Future Challenges
paper_authors: Sivan Schwartz, Avi Yaeli, Segev Shlomov
for: This study examines the issue of trust in AI agents within the new generation of automation tools and how these tools address it.
methods: The study analyzes the main aspects of trust in AI agents discussed in the existing literature and evaluates how the new generation of automation tools addresses these considerations.
results: The study finds that the new generation of automation tools can help address many trust concerns, but several challenges remain that require further research.
Abstract
Trust in AI agents has been extensively studied in the literature, resulting in significant advancements in our understanding of this field. However, the rapid advancements in Large Language Models (LLMs) and the emergence of LLM-based AI agent frameworks pose new challenges and opportunities for further research. In the field of process automation, a new generation of AI-based agents has emerged, enabling the execution of complex tasks. At the same time, the process of building automation has become more accessible to business users via user-friendly no-code tools and training mechanisms. This paper explores these new challenges and opportunities, analyzes the main aspects of trust in AI agents discussed in existing literature, and identifies specific considerations and challenges relevant to this new generation of automation agents. We also evaluate how nascent products in this category address these considerations. Finally, we highlight several challenges that the research community should address in this evolving landscape.
Adaptive Taxonomy Learning and Historical Patterns Modelling for Patent Classification
paper_authors: Tao Zou, Le Yu, Leilei Sun, Bowen Du, Deqing Wang, Fuzhen Zhuang
for: This study proposes a patent classification method that comprehensively considers both patent application texts and assignees' historical application patterns to improve classification accuracy.
methods: The study uses an integrated learning model comprising an IPC codes correlations learning module and a historical application patterns learning module. The IPC codes correlations learning module derives semantic representations of IPC codes by adaptively passing and aggregating messages along the hierarchical taxonomy; the historical application patterns learning module incorporates an assignee's previous application texts through a dual-channel aggregation mechanism.
results: Experimental results show that the method outperforms existing methods on real-world datasets, and that the model captures assignees' temporal patterns and the semantic dependencies among IPC codes.
Abstract
Patent classification aims to assign multiple International Patent Classification (IPC) codes to a given patent. Recent methods for automatically classifying patents mainly focus on analyzing the text descriptions of patents. However, apart from the texts, each patent is also associated with some assignees, and the knowledge of their applied patents is often valuable for classification. Furthermore, the hierarchical taxonomy formulated by the IPC system provides important contextual information and enables models to leverage the correlations between IPC codes for more accurate classification. However, existing methods fail to incorporate the above aspects. In this paper, we propose an integrated framework that comprehensively considers the information on patents for patent classification. To be specific, we first present an IPC codes correlations learning module to derive their semantic representations via adaptively passing and aggregating messages within the same level and across different levels along the hierarchical taxonomy. Moreover, we design a historical application patterns learning component to incorporate the corresponding assignee's previous patents by a dual channel aggregation mechanism. Finally, we combine the contextual information of patent texts that contains the semantics of IPC codes, and assignees' sequential preferences to make predictions. Experiments on real-world datasets demonstrate the superiority of our approach over the existing methods. Besides, we present the model's ability to capture the temporal patterns of assignees and the semantic dependencies among IPC codes.
Beyond Semantics: Learning a Behavior Augmented Relevance Model with Self-supervised Learning
results: Experimental results show that the proposed relevance model better captures users' search intent, improving the accuracy of search results and user satisfaction.
Abstract
Relevance modeling aims to locate desirable items for corresponding queries, which is crucial for search engines to ensure user experience. Although most conventional approaches address this problem by assessing the semantic similarity between the query and item, pure semantic matching is not everything.
Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models’ Alignment
results: Measurement results indicate that, in general, more aligned models exhibit higher overall trustworthiness; however, fine-grained analysis, testing, and improvement across the individual trustworthiness categories can further improve the reliability and ethical soundness of LLMs. These findings provide practical guidance for deploying LLMs in real-world applications.
Abstract
Ensuring alignment, which refers to making models behave in accordance with human intentions [1,2], has become a critical task before deploying large language models (LLMs) in real-world applications. For instance, OpenAI devoted six months to iteratively aligning GPT-4 before its release [3]. However, a major challenge faced by practitioners is the lack of clear guidance on evaluating whether LLM outputs align with social norms, values, and regulations. This obstacle hinders systematic iteration and deployment of LLMs. To address this issue, this paper presents a comprehensive survey of key dimensions that are crucial to consider when assessing LLM trustworthiness. The survey covers seven major categories of LLM trustworthiness: reliability, safety, fairness, resistance to misuse, explainability and reasoning, adherence to social norms, and robustness. Each major category is further divided into several sub-categories, resulting in a total of 29 sub-categories. Additionally, a subset of 8 sub-categories is selected for further investigation, where corresponding measurement studies are designed and conducted on several widely-used LLMs. The measurement results indicate that, in general, more aligned models tend to perform better in terms of overall trustworthiness. However, the effectiveness of alignment varies across the different trustworthiness categories considered. This highlights the importance of conducting more fine-grained analyses, testing, and making continuous improvements on LLM alignment. By shedding light on these key dimensions of LLM trustworthiness, this paper aims to provide valuable insights and guidance to practitioners in the field. Understanding and addressing these concerns will be crucial in achieving reliable and ethically sound deployment of LLMs in various applications.
Machine Learning aided Computer Architecture Design for CNN Inferencing Systems
methods: The paper uses Design Space Exploration (DSE) to identify the most suitable accelerator, and develops a quick and precise technique for predicting accelerator power and performance.
results: The prediction technique achieves MAPE values of 5.03% and 5.94% for power and performance, respectively, allowing computer architects to evaluate accelerator choices early in development, saving time and money and improving time-to-market.
Abstract
Efficient and timely calculations of Machine Learning (ML) algorithms are essential for emerging technologies like autonomous driving, the Internet of Things (IoT), and edge computing. One of the primary ML algorithms used in such systems is Convolutional Neural Networks (CNNs), which demand high computational resources. This requirement has led to the use of ML accelerators like GPGPUs to meet design constraints. However, selecting the most suitable accelerator involves Design Space Exploration (DSE), a process that is usually time-consuming and requires significant manual effort. Our work presents approaches to expedite the DSE process by identifying the most appropriate GPGPU for CNN inferencing systems. We have developed a quick and precise technique for forecasting the power and performance of CNNs during inference, with a MAPE of 5.03% and 5.94%, respectively. Our approach empowers computer architects to estimate power and performance in the early stages of development, reducing the necessity for numerous prototypes. This saves time and money while also improving the time-to-market period.
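For reference, the MAPE figures quoted above follow the standard mean-absolute-percentage-error definition; a minimal sketch with made-up measurements (not the paper's data):

```python
import numpy as np

def mape(y_true, y_pred) -> float:
    """Mean Absolute Percentage Error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

# Hypothetical measured vs. predicted inference power (watts) for a GPGPU:
measured  = [62.0, 75.5, 48.3, 91.2]
predicted = [59.1, 78.0, 50.0, 87.5]
print(f"MAPE: {mape(measured, predicted):.2f}%")
```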
Metacognitive Prompting Improves Understanding in Large Language Models
results: MP outperforms standard and chain-of-thought prompting; PaLM, when equipped with MP, approaches the performance level of GPT-4, and MP consistently outperforms the other prompting methods across models and datasets.
Abstract
In Large Language Models (LLMs), there have been consistent advancements in task-specific performance, largely influenced by effective prompt design. While recent research on prompting has enhanced the reasoning capabilities of LLMs, a gap remains in further improving their understanding abilities. In this study, we introduce Metacognitive Prompting (MP), a strategy inspired by human introspective reasoning processes. Using MP, LLMs undergo a systematic series of structured, self-aware evaluations, drawing on both their vast inherent knowledge and new insights. Our experiments involve five prevalent LLMs: Llama2, Vicuna, PaLM, GPT-3.5, and GPT-4, all of which span various general natural language understanding (NLU) tasks from the GLUE and SuperGLUE benchmarks. Results indicate that, although GPT-4 consistently excels in most tasks, PaLM, when equipped with MP, approaches its performance level. Furthermore, across models and datasets, MP consistently outperforms existing prompting methods, including standard and chain-of-thought prompting. This study underscores the potential to amplify the understanding abilities of LLMs and highlights the benefits of mirroring human introspective reasoning in NLU tasks.
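The paper's exact prompt text is not reproduced here, but a metacognitive prompt for an NLU task might be structured as in the following sketch; the staged self-evaluation wording is our illustrative assumption, not the paper's verbatim template.

```python
# Illustrative metacognitive prompt for a sentence-pair NLU task (e.g., an
# entailment item from GLUE). The stage wording below is an assumption for
# illustration, not the paper's exact prompt.
def metacognitive_prompt(premise: str, hypothesis: str) -> str:
    return (
        "Task: decide whether the hypothesis follows from the premise "
        "(entailment / neutral / contradiction).\n"
        f"Premise: {premise}\n"
        f"Hypothesis: {hypothesis}\n\n"
        "1. Restate your understanding of both sentences in your own words.\n"
        "2. Make a preliminary judgment and explain the evidence for it.\n"
        "3. Critically re-evaluate that judgment: what could you have missed?\n"
        "4. Give your final answer and rate your confidence (low/medium/high)."
    )

print(metacognitive_prompt(
    "A man is playing a guitar on stage.",
    "A musician is performing."))
```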
Classification of Human- and AI-Generated Texts: Investigating Features for ChatGPT
results: Experiments show that the new features substantially improve the performance of many classifiers, and the best basic text-rephrasing detection system even outperforms GPTZero by 183.8% relative in F1-score.
Abstract
Recently, generative AIs like ChatGPT have become available to the wide public. These tools can for instance be used by students to generate essays or whole theses. But how does a teacher know whether a text is written by a student or an AI? In our work, we explore traditional and new features to (1) detect text generated by AI from scratch and (2) text rephrased by AI. Since we found that classification is more difficult when the AI has been instructed to create the text in a way that a human would not recognize that it was generated by an AI, we also investigate this more advanced case. For our experiments, we produced a new text corpus covering 10 school topics. Our best systems to classify basic and advanced human-generated/AI-generated texts have F1-scores of over 96%. Our best systems for classifying basic and advanced human-generated/AI-rephrased texts have F1-scores of more than 78%. The systems use a combination of perplexity, semantic, list lookup, error-based, readability, AI feedback, and text vector features. Our results show that the new features substantially help to improve the performance of many classifiers. Our best basic text rephrasing detection system even outperforms GPTZero by 183.8% relative in F1-score.
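Several of the listed features are easy to reproduce independently. Below is a sketch of a perplexity feature using GPT-2 via Hugging Face transformers; the choice of GPT-2 as the scoring model is our assumption, and the paper's exact feature extraction may differ.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# GPT-2 as the scoring model is our assumption for illustration.
tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy per token
    return float(torch.exp(loss))

# AI-generated text often scores lower (more predictable) than human text.
print(perplexity("The cat sat on the mat."))
```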
Adv-Inpainting: Generating Natural and Transferable Adversarial Patch via Attention-guided Feature Fusion
for: The paper aims to improve the stealthiness and transferability of adversarial attacks on facial recognition (FR) models by generating natural-looking and highly transferable adversarial patches.
methods: The proposed method, called Adv-Inpainting, consists of two stages: (1) an attention-guided StyleGAN (Att-StyleGAN) that adaptively combines texture and identity features based on the attention map to generate highly transferable and natural adversarial patches, and (2) a refinement network with a new boundary variance loss that further improves the coherence between the patch and its surrounding area.
results: The proposed method demonstrates stronger transferability and improved visual quality than previous adversarial patch attacks, making it a more effective and stealthy approach for attacking FR models.
Abstract
The rudimentary adversarial attacks utilize additive noise to attack facial recognition (FR) models. However, because manipulating the total face is impractical in the physical setting, most real-world FR attacks are based on adversarial patches, which limit perturbations to a small area. Previous adversarial patch attacks often resulted in unnatural patterns and clear boundaries that were easily noticeable. In this paper, we argue that generating adversarial patches with plausible content can result in stronger transferability than using additive noise or directly sampling from the latent space. To generate natural-looking and highly transferable adversarial patches, we propose an innovative two-stage coarse-to-fine attack framework called Adv-Inpainting. In the first stage, we propose an attention-guided StyleGAN (Att-StyleGAN) that adaptively combines texture and identity features based on the attention map to generate high-transferable and natural adversarial patches. In the second stage, we design a refinement network with a new boundary variance loss to further improve the coherence between the patch and its surrounding area. Experiment results demonstrate that Adv-Inpainting is stealthy and can produce adversarial patches with stronger transferability and improved visual quality than previous adversarial patch attacks.
Homophily-enhanced Structure Learning for Graph Clustering
methods: The study proposes a new method called HoLe, which improves GNNs and graph clustering by enhancing the degree of homophily within the graph structure. HoLe comprises two structure learning modules: hierarchical correlation estimation and cluster-aware sparsification.
results: On seven benchmark datasets of various types and scales, HoLe outperforms state-of-the-art baselines across a range of clustering metrics.
Abstract
Graph clustering is a fundamental task in graph analysis, and recent advances in utilizing graph neural networks (GNNs) have shown impressive results. Despite the success of existing GNN-based graph clustering methods, they often overlook the quality of graph structure, which is inherent in real-world graphs due to their sparse and multifarious nature, leading to subpar performance. Graph structure learning allows refining the input graph by adding missing links and removing spurious connections. However, previous endeavors in graph structure learning have predominantly centered around supervised settings, and cannot be directly applied to our specific clustering tasks due to the absence of ground-truth labels. To bridge the gap, we propose a novel method called \textbf{ho}mophily-enhanced structure \textbf{le}arning for graph clustering (HoLe). Our motivation stems from the observation that subtly enhancing the degree of homophily within the graph structure can significantly improve GNNs and clustering outcomes. To realize this objective, we develop two clustering-oriented structure learning modules, i.e., hierarchical correlation estimation and cluster-aware sparsification. The former module enables a more accurate estimation of pairwise node relationships by leveraging guidance from latent and clustering spaces, while the latter one generates a sparsified structure based on the similarity matrix and clustering assignments. Additionally, we devise a joint optimization approach alternating between training the homophily-enhanced structure learning and GNN-based clustering, thereby enforcing their reciprocal effects. Extensive experiments on seven benchmark datasets of various types and scales, across a range of clustering metrics, demonstrate the superiority of HoLe against state-of-the-art baselines.
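As a rough illustration of the cluster-aware sparsification idea, the sketch below keeps, for each node, only its most similar same-cluster neighbors from a similarity matrix; HoLe's actual scoring and joint optimization with the GNN are more involved.

```python
import numpy as np

def cluster_aware_sparsify(S: np.ndarray, labels: np.ndarray, k: int = 5) -> np.ndarray:
    """Keep, per node, the top-k most similar same-cluster neighbors.

    S: (n, n) pairwise node similarity matrix; labels: (n,) cluster assignments.
    Returns a sparsified, symmetric 0/1 adjacency matrix. A simplified reading
    of cluster-aware sparsification, not HoLe's exact procedure.
    """
    n = S.shape[0]
    A = np.zeros((n, n))
    for i in range(n):
        scores = np.where(labels == labels[i], S[i], -np.inf)  # favor homophily
        scores[i] = -np.inf                                    # no self-loop
        for j in np.argsort(scores)[-k:]:                      # top-k neighbors
            if np.isfinite(scores[j]):
                A[i, j] = 1.0
    return np.maximum(A, A.T)  # symmetrize

S = np.random.rand(8, 8); S = (S + S.T) / 2
print(cluster_aware_sparsify(S, np.array([0, 0, 0, 0, 1, 1, 1, 1]), k=2))
```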
Double-chain Constraints for 3D Human Pose Estimation in Images and Videos
results: Experiments show that DC-GCT achieves state-of-the-art performance on two challenging datasets (Human3.6M and MPI-INF-3DHP); notably, on Human3.6M the model achieves state-of-the-art results across all action categories.
Abstract
Reconstructing 3D poses from 2D poses lacking depth information is particularly challenging due to the complexity and diversity of human motion. The key is to effectively model the spatial constraints between joints to leverage their inherent dependencies. Thus, we propose a novel model, called Double-chain Graph Convolutional Transformer (DC-GCT), to constrain the pose through a double-chain design consisting of local-to-global and global-to-local chains to obtain a complex representation more suitable for the current human pose. Specifically, we combine the advantages of GCN and Transformer and design a Local Constraint Module (LCM) based on GCN and a Global Constraint Module (GCM) based on self-attention mechanism as well as a Feature Interaction Module (FIM). The proposed method fully captures the multi-level dependencies between human body joints to optimize the modeling capability of the model. Moreover, we propose a method to use temporal information into the single-frame model by guiding the video sequence embedding through the joint embedding of the target frame, with negligible increase in computational cost. Experimental results demonstrate that DC-GCT achieves state-of-the-art performance on two challenging datasets (Human3.6M and MPI-INF-3DHP). Notably, our model achieves state-of-the-art performance on all action categories in the Human3.6M dataset using detected 2D poses from CPN, and our code is available at: https://github.com/KHB1698/DC-GCT.
Multimodal Pretrained Models for Sequential Decision-Making: Synthesis, Verification, Grounding, and Perception
results: Through a suite of experiments, the authors show that the algorithm can successfully construct, verify, and ground automaton-based controllers, yielding effective controllers on real-world tasks.
Abstract
Recently developed pretrained models can encode rich world knowledge expressed in multiple modalities, such as text and images. However, the outputs of these models cannot be integrated into algorithms to solve sequential decision-making tasks. We develop an algorithm that utilizes the knowledge from pretrained models to construct and verify controllers for sequential decision-making tasks, and to ground these controllers to task environments through visual observations. In particular, the algorithm queries a pretrained model with a user-provided, text-based task description and uses the model's output to construct an automaton-based controller that encodes the model's task-relevant knowledge. It then verifies whether the knowledge encoded in the controller is consistent with other independently available knowledge, which may include abstract information on the environment or user-provided specifications. If this verification step discovers any inconsistency, the algorithm automatically refines the controller to resolve the inconsistency. Next, the algorithm leverages the vision and language capabilities of pretrained models to ground the controller to the task environment. It collects image-based observations from the task environment and uses the pretrained model to link these observations to the text-based control logic encoded in the controller (e.g., actions and conditions that trigger the actions). We propose a mechanism to ensure the controller satisfies the user-provided specification even when perceptual uncertainties are present. We demonstrate the algorithm's ability to construct, verify, and ground automaton-based controllers through a suite of real-world tasks, including daily life and robot manipulation tasks.
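To make the automaton-based controller concrete, here is a toy transition table for a made-up fetch task; the states, conditions, and actions are invented for illustration, whereas in the paper they are derived from a pretrained model's output and grounded through visual observations.

```python
# Toy automaton-based controller for a hypothetical "fetch the cup" task.
# All states, conditions, and actions below are invented for illustration.
TRANSITIONS = {
    # (state, observed condition) -> (action, next state)
    ("search",   "cup_visible"):   ("approach_cup", "approach"),
    ("search",   "cup_hidden"):    ("scan_room",    "search"),
    ("approach", "cup_reachable"): ("grasp_cup",    "done"),
    ("approach", "cup_hidden"):    ("scan_room",    "search"),
}

def step(state: str, observation: str) -> tuple[str, str]:
    action, nxt = TRANSITIONS[(state, observation)]
    return action, nxt

state = "search"
for obs in ["cup_hidden", "cup_visible", "cup_reachable"]:
    action, state = step(state, obs)
    print(f"obs={obs:13s} -> action={action:13s} state={state}")
```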
paper_authors: Pengfei Ding, Yan Wang, Guanfeng Liu
for: addresses the label sparsity issue in heterogeneous graphs (HGs)
methods: proposes a novel model for Cross-heterogeneity Graph Few-shot Learning, including extracting meta-patterns and a score module to measure the informativeness of labeled samples
results: demonstrates superior performance over the state-of-the-art methods in predicting new classes with few labeled data on four real-world datasets.
Abstract
In recent years, heterogeneous graph few-shot learning has been proposed to address the label sparsity issue in heterogeneous graphs (HGs), which contain various types of nodes and edges. The existing methods have achieved good performance by transferring generalized knowledge extracted from rich-labeled classes in source HG(s) to few-labeled classes in a target HG. However, these methods only consider the single-heterogeneity scenario where the source and target HGs share a fixed set of node/edge types, ignoring the more general scenario of cross-heterogeneity, where each HG can have a different and non-fixed set of node/edge types. To this end, we focus on the unexplored cross-heterogeneity scenario and propose a novel model for Cross-heterogeneity Graph Few-shot Learning, namely CGFL. In CGFL, we first extract meta-patterns to capture heterogeneous information and propose a multi-view heterogeneous graph neural network (MHGN) to learn meta-patterns across HGs. Then, we propose a score module to measure the informativeness of labeled samples and determine the transferability of each source HG. Finally, by integrating MHGN and the score module into a meta-learning mechanism, CGFL can effectively transfer generalized knowledge to predict new classes with few-labeled data. Extensive experiments on four real-world datasets have demonstrated the superior performance of CGFL over the state-of-the-art methods.
AI4GCC – Track 3: Consumption and the Challenges of Multi-Agent RL
for: The paper aims to improve the integration of machine learning with traditional economic policy analysis.
methods: The authors suggest including an additional index that accounts for consumption/utility in the evaluation criteria and further investigating the learning dynamics of agents in the simulator and the game theoretic properties of outcomes from proposed negotiation protocols.
results: The paper does not provide specific results, but rather suggests areas for improvement for future iterations of the competition/simulation.
Abstract
The AI4GCC competition presents a bold step forward in the direction of integrating machine learning with traditional economic policy analysis. Below, we highlight two potential areas for improvement that could enhance the competition's ability to identify and evaluate proposed negotiation protocols. Firstly, we suggest the inclusion of an additional index that accounts for consumption/utility as part of the evaluation criteria. Secondly, we recommend further investigation into the learning dynamics of agents in the simulator and the game theoretic properties of outcomes from proposed negotiation protocols. We hope that these suggestions can be of use for future iterations of the competition/simulation.
Vector quantization loss analysis in VQGANs: a single-GPU ablation study for image-to-image synthesis
results: Although the study does not surpass existing benchmarks, it sheds light on several phenomena, including artifacts, codebook size optimization, and a comparative analysis with PCA. It also finds that introducing 2D positional encodings markedly reduces artifacts and offers insight into balancing clarity and overfitting.
Abstract
This study performs an ablation analysis of Vector Quantized Generative Adversarial Networks (VQGANs), concentrating on image-to-image synthesis utilizing a single NVIDIA A100 GPU. The current work explores the nuanced effects of varying critical parameters including the number of epochs, image count, and attributes of codebook vectors and latent dimensions, specifically within the constraint of limited resources. Notably, our focus is pinpointed on the vector quantization loss, keeping other hyperparameters and loss components (GAN loss) fixed. This was done to delve into a deeper understanding of the discrete latent space, and to explore how varying its size affects the reconstruction. Though, our results do not surpass the existing benchmarks, however, our findings shed significant light on VQGAN's behaviour for a smaller dataset, particularly concerning artifacts, codebook size optimization, and comparative analysis with Principal Component Analysis (PCA). The study also uncovers the promising direction by introducing 2D positional encodings, revealing a marked reduction in artifacts and insights into balancing clarity and overfitting.
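For context, the vector quantization loss that the ablation isolates is, in its standard VQ-VAE/VQGAN form, a codebook term plus a commitment term with a straight-through estimator; a minimal sketch (the β value and dimensions are illustrative):

```python
import torch
import torch.nn.functional as F

def vq_loss(z_e: torch.Tensor, codebook: torch.Tensor, beta: float = 0.25):
    """Standard VQ loss: codebook term + beta * commitment term.

    z_e: (n, d) encoder outputs; codebook: (K, d) embedding vectors.
    Returns (quantized vectors with straight-through gradient, scalar loss).
    Sketch of the standard formulation; VQGAN's full objective adds GAN and
    perceptual terms that are held fixed in the study above.
    """
    d2 = torch.cdist(z_e, codebook)              # (n, K) pairwise distances
    idx = d2.argmin(dim=1)                       # nearest codebook entry
    z_q = codebook[idx]                          # (n, d) quantized vectors
    loss = (F.mse_loss(z_q, z_e.detach())        # pull codebook toward encoder
            + beta * F.mse_loss(z_e, z_q.detach()))  # commitment term
    z_q_st = z_e + (z_q - z_e).detach()          # straight-through estimator
    return z_q_st, loss

z = torch.randn(16, 64)
book = torch.randn(512, 64, requires_grad=True)
print(vq_loss(z, book)[1])
```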
Leveraging the Edge and Cloud for V2X-Based Real-Time Object Detection in Autonomous Driving
results: The study finds that highly accurate detection models can run in real time on the cloud rather than locally, and it characterizes the trade-offs between compression algorithms and detection models so that detection quality and latency can be balanced for a given use case.
Abstract
Environmental perception is a key element of autonomous driving because the information received from the perception module influences core driving decisions. An outstanding challenge in real-time perception for autonomous driving lies in finding the best trade-off between detection quality and latency. Major constraints on both computation and power have to be taken into account for real-time perception in autonomous vehicles. Larger object detection models tend to produce the best results, but are also slower at runtime. Since the most accurate detectors cannot run in real-time locally, we investigate the possibility of offloading computation to edge and cloud platforms, which are less resource-constrained. We create a synthetic dataset to train object detection models and evaluate different offloading strategies. Using real hardware and network simulations, we compare different trade-offs between prediction quality and end-to-end delay. Since sending raw frames over the network implies additional transmission delays, we also explore the use of JPEG and H.265 compression at varying qualities and measure their impact on prediction metrics. We show that models with adequate compression can be run in real-time on the cloud while outperforming local detection performance.
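The compression side of this trade-off is easy to probe directly; the sketch below measures JPEG size and a rough transmission delay at several quality settings with Pillow, using a placeholder frame and an assumed 10 Mbit/s uplink (both are our assumptions, not the paper's setup).

```python
import io
from PIL import Image

# Placeholder frame; in the paper's setting this would be a camera frame.
frame = Image.new("RGB", (1280, 720), color=(90, 120, 90))

for quality in (95, 75, 50, 25):
    buf = io.BytesIO()
    frame.save(buf, format="JPEG", quality=quality)
    bits = buf.tell() * 8
    # Transmission delay over an assumed 10 Mbit/s uplink, overhead ignored:
    delay_ms = bits / 10e6 * 1000
    print(f"quality={quality:3d}  size={buf.tell()/1024:7.1f} KiB  "
          f"~{delay_ms:.2f} ms to send")
```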
Alexa, play with robot: Introducing the First Alexa Prize SimBot Challenge on Embodied AI
paper_authors: Hangjie Shi, Leslie Ball, Govind Thattai, Desheng Zhang, Lucy Hu, Qiaozi Gao, Suhaila Shakiah, Xiaofeng Gao, Aishwarya Padmakumar, Bofei Yang, Cadence Chung, Dinakar Guthy, Gaurav Sukhatme, Karthika Arumugam, Matthew Wen, Osman Ipek, Patrick Lange, Rohan Khanna, Shreyas Pansare, Vasu Sharma, Chao Zhang, Cris Flagg, Daniel Pressel, Lavina Vaz, Luke Dai, Prasoon Goyal, Sattvik Sahai, Shaohua Liu, Yao Lu, Anna Gottardi, Shui Hu, Yang Liu, Dilek Hakkani-Tur, Kate Bland, Heather Rocker, James Jeun, Yadunandana Rao, Michael Johnston, Akshaya Iyengar, Arindam Mandal, Prem Natarajan, Reza Ghanadan
for: The paper is written to describe the SimBot Challenge, a new challenge for university teams to build robot assistants that complete tasks in a simulated physical environment.
methods: The paper describes the infrastructure and support provided to the teams, including Alexa Arena, the simulated environment, and an ML toolkit to accelerate the building of vision and language models.
results: The paper summarizes the approaches taken by the participating teams to overcome research challenges, extracts key lessons learned, and provides analysis of the performance of the competing SimBots during the competition.
Abstract
The Alexa Prize program has empowered numerous university students to explore, experiment, and showcase their talents in building conversational agents through challenges like the SocialBot Grand Challenge and the TaskBot Challenge. As conversational agents increasingly appear in multimodal and embodied contexts, it is important to explore the affordances of conversational interaction augmented with computer vision and physical embodiment. This paper describes the SimBot Challenge, a new challenge in which university teams compete to build robot assistants that complete tasks in a simulated physical environment. This paper provides an overview of the SimBot Challenge, which included both online and offline challenge phases. We describe the infrastructure and support provided to the teams including Alexa Arena, the simulated environment, and the ML toolkit provided to teams to accelerate their building of vision and language models. We summarize the approaches the participating teams took to overcome research challenges and extract key lessons learned. Finally, we provide analysis of the performance of the competing SimBots during the competition.
“Generate” the Future of Work through AI: Empirical Evidence from Online Labor Markets
methods: The study interprets the launch of ChatGPT as an exogenous shock and applies a difference-in-differences (DID) design to quantify its influence on text-related jobs and freelancers.
results: The launch of ChatGPT led to a significant decrease in transaction volume for text-related gigs and freelancers, particularly in units with higher past transaction volume or lower quality standards; however, this negative effect was not universally experienced among service providers.
Abstract
With the advent of general-purpose Generative AI, the interest in discerning its impact on the labor market escalates. In an attempt to bridge the extant empirical void, we interpret the launch of ChatGPT as an exogenous shock, and implement a Difference-in-Differences (DID) approach to quantify its influence on text-related jobs and freelancers within an online labor marketplace. Our results reveal a significant decrease in transaction volume for gigs and freelancers directly exposed to ChatGPT. Additionally, this decline is particularly marked in units of relatively higher past transaction volume or lower quality standards. Yet, the negative effect is not universally experienced among service providers. Subsequent analyses illustrate that freelancers proficiently adapting to novel advancements and offering services that augment AI technologies can yield substantial benefits amidst this transformative period. Consequently, even though the advent of ChatGPT could conceivably substitute existing occupations, it also unfolds immense opportunities and carries the potential to reconfigure the future of work. This research contributes to the limited empirical repository exploring the profound influence of LLM-based generative AI on the labor market, furnishing invaluable insights for workers, job intermediaries, and regulatory bodies navigating this evolving landscape.
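The DID design can be expressed as a two-way interaction regression; a sketch with statsmodels on synthetic data (the variable names and effect size are ours, not the paper's estimates):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic panel for illustration only: gigs are "treated" if directly
# exposed to ChatGPT; "post" marks the period after the launch.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),
    "post":    rng.integers(0, 2, n),
})
# Build in a negative effect on treated units after the launch:
df["log_volume"] = 1.0 - 0.3 * df.treated * df.post + rng.normal(0, 0.5, n)

# The coefficient on treated:post is the difference-in-differences estimate.
model = smf.ols("log_volume ~ treated * post", data=df).fit()
print(model.params["treated:post"])  # ~ -0.3 by construction
```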
Hierarchical Representations for Spatio-Temporal Visual Attention Modeling and Understanding
results: We propose a context-aware visual attention model and a deep network architecture that support modeling and understanding visual attention in video sequences.
Abstract
This PhD. Thesis concerns the study and development of hierarchical representations for spatio-temporal visual attention modeling and understanding in video sequences. More specifically, we propose two computational models for visual attention. First, we present a generative probabilistic model for context-aware visual attention modeling and understanding. Secondly, we develop a deep network architecture for visual attention modeling, which first estimates top-down spatio-temporal visual attention, and ultimately serves for modeling attention in the temporal domain.
PromptPaint: Steering Text-to-Image Generation Through Paint Medium-like Interactions
results: Through a set of studies, we characterize different approaches for mixing prompts, design trade-offs, and socio-technical challenges for generative models, and point toward future steerable generative tools.
Abstract
While diffusion-based text-to-image (T2I) models provide a simple and powerful way to generate images, guiding this generation remains a challenge. For concepts that are difficult to describe through language, users may struggle to create prompts. Moreover, many of these models are built as end-to-end systems, lacking support for iterative shaping of the image. In response, we introduce PromptPaint, which combines T2I generation with interactions that model how we use colored paints. PromptPaint allows users to go beyond language to mix prompts that express challenging concepts. Just as we iteratively tune colors through layered placements of paint on a physical canvas, PromptPaint similarly allows users to apply different prompts to different canvas areas and times of the generative process. Through a set of studies, we characterize different approaches for mixing prompts, design trade-offs, and socio-technical challenges for generative models. With PromptPaint we provide insight into future steerable generative tools.
FPGA Resource-aware Structured Pruning for Real-Time Neural Networks
results: In experiments, the method achieves reductions of 55%-92% in DSP block utilization and up to 81% in BRAM utilization, performing well on tasks such as real-time particle classification and fast image classification.
Abstract
Neural networks achieve state-of-the-art performance in image classification, speech recognition, scientific analysis and many more application areas. With the ever-increasing need for faster computation and lower power consumption, driven by real-time systems and Internet-of-Things (IoT) devices, FPGAs have emerged as suitable devices for deep learning inference. Due to the high computational complexity and memory footprint of neural networks, various compression techniques, such as pruning, quantization and knowledge distillation, have been proposed in literature. Pruning sparsifies a neural network, reducing the number of multiplications and memory. However, pruning often fails to capture properties of the underlying hardware, causing unstructured sparsity and load-balance inefficiency, thus bottlenecking resource improvements. We propose a hardware-centric formulation of pruning, by formulating it as a knapsack problem with resource-aware tensor structures. The primary emphasis is on real-time inference, with latencies in the order of 1$\mu$s, accelerated with hls4ml, an open-source framework for deep learning inference on FPGAs. Evaluated on a range of tasks, including real-time particle classification at CERN's Large Hadron Collider and fast image classification, the proposed method achieves a reduction ranging between 55% and 92% in the utilization of digital signal processing blocks (DSP) and up to 81% in block memory (BRAM) utilization.
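The knapsack framing can be illustrated with a toy 0/1 dynamic program: choose which structures (e.g., filter groups) to keep so that retained importance is maximized under a DSP budget. The scores and costs below are invented; the paper's resource-aware tensor structures and cost model are more involved.

```python
def knapsack_prune(importance, dsp_cost, dsp_budget):
    """0/1 knapsack: pick structures (e.g., filter groups) to KEEP.

    Maximizes total importance subject to sum(dsp_cost) <= dsp_budget.
    Toy illustration of the resource-aware formulation.
    """
    n = len(importance)
    dp = [0.0] * (dsp_budget + 1)  # dp[b] = best importance within budget b
    keep = [[False] * (dsp_budget + 1) for _ in range(n)]
    for i in range(n):
        for b in range(dsp_budget, dsp_cost[i] - 1, -1):
            cand = dp[b - dsp_cost[i]] + importance[i]
            if cand > dp[b]:
                dp[b] = cand
                keep[i][b] = True
    # Backtrack to recover the kept set.
    chosen, b = [], dsp_budget
    for i in range(n - 1, -1, -1):
        if keep[i][b]:
            chosen.append(i)
            b -= dsp_cost[i]
    return sorted(chosen), dp[dsp_budget]

imp  = [0.9, 0.4, 0.7, 0.2, 0.5]   # importance of each filter group
cost = [3,   1,   2,   1,   2]     # DSP blocks each group consumes
print(knapsack_prune(imp, cost, dsp_budget=5))  # -> ([0, 2], ~1.6)
```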
DOST – Domain Obedient Self-supervised Training for Multi Label Classification with Noisy Labels
results: Experimental results show that DOST improves performance on multi-label classification tasks while being more data efficient and domain compliant.
Abstract
The enormous demand for annotated data brought forth by deep learning techniques has been accompanied by the problem of annotation noise. Although this issue has been widely discussed in machine learning literature, it has been relatively unexplored in the context of "multi-label classification" (MLC) tasks which feature more complicated kinds of noise. Additionally, when the domain in question has certain logical constraints, noisy annotations often exacerbate their violations, making such a system unacceptable to an expert. This paper studies the effect of label noise on domain rule violation incidents in the MLC task, and incorporates domain rules into our learning algorithm to mitigate the effect of noise. We propose the Domain Obedient Self-supervised Training (DOST) paradigm which not only makes deep learning models more aligned to domain rules, but also improves learning performance in key metrics and minimizes the effect of annotation noise. This novel approach uses domain guidance to detect offending annotations and deter rule-violating predictions in a self-supervised manner, thus making it more "data efficient" and domain compliant. Empirical studies, performed over two large scale multi-label classification datasets, demonstrate that our method results in improvement across the board, and often entirely counteracts the effect of noise.
LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation
results: Compared with previous state-of-the-art models, the proposed method performs significantly better in both layout planning and image generation, and it achieves high-fidelity image generation without any manually provided guidance.
Abstract
In the text-to-image generation field, recent remarkable progress in Stable Diffusion makes it possible to generate rich kinds of novel photorealistic images. However, current models still face misalignment issues (e.g., problematic spatial relation understanding and numeration failure) in complex natural scenes, which impedes the high-faithfulness text-to-image generation. Although recent efforts have been made to improve controllability by giving fine-grained guidance (e.g., sketch and scribbles), this issue has not been fundamentally tackled since users have to provide such guidance information manually. In this work, we strive to synthesize high-fidelity images that are semantically aligned with a given textual prompt without any guidance. Toward this end, we propose a coarse-to-fine paradigm to achieve layout planning and image generation. Concretely, we first generate the coarse-grained layout conditioned on a given textual prompt via in-context learning based on Large Language Models. Afterward, we propose a fine-grained object-interaction diffusion method to synthesize high-faithfulness images conditioned on the prompt and the automatically generated layout. Extensive experiments demonstrate that our proposed method outperforms the state-of-the-art models in terms of layout and image generation. Our code and settings are available at https://layoutllm-t2i.github.io.
Organizational Bulk Email Systems: Their Role and Performance in Remote Work
results: The paper does not present specific empirical results; it mainly examines the problems of organizational communication systems in remote work environments and proposes directions for addressing them.
Abstract
The COVID-19 pandemic has forced many employees to work from home. Organizational bulk emails now play a critical role to reach employees with central information in this work-from-home environment. However, we know from our own recent work that organizational bulk email has problems: recipients fail to retain the bulk messages they received from the organization; recipients and senders have different opinions on which bulk messages were important; and communicators lack technology support to better target and design messages. In this position paper, first we review the prior work on evaluating, designing, and prototyping organizational communication systems. Second we review our recent findings and some research techniques we found useful in studying organizational communication. Last we propose a research agenda to study organizational communications in remote work environment and suggest some key questions and potential study directions.
Drones4Good: Supporting Disaster Relief Through Remote Sensing and AI
paper_authors: Nina Merkle, Reza Bahmanyar, Corentin Henry, Seyed Majid Azimi, Xiangtian Yuan, Simon Schopferer, Veronika Gstaiger, Stefan Auer, Anne Schneibel, Marc Wieland, Thomas Kraft
results: The results demonstrate the feasibility of rapid, large-scale image analysis in the field, and show that onboard image processing can increase the safety of autonomous drone-based aid deliveries.
Abstract
In order to respond effectively in the aftermath of a disaster, emergency services and relief organizations rely on timely and accurate information about the affected areas. Remote sensing has the potential to significantly reduce the time and effort required to collect such information by enabling a rapid survey of large areas. To achieve this, the main challenge is the automatic extraction of relevant information from remotely sensed data. In this work, we show how the combination of drone-based data with deep learning methods enables automated and large-scale situation assessment. In addition, we demonstrate the integration of onboard image processing techniques for the deployment of autonomous drone-based aid delivery. The results show the feasibility of a rapid and large-scale image analysis in the field, and that onboard image processing can increase the safety of drone-based aid deliveries.
Competitions in AI – Robustly Ranking Solvers Using Statistical Resampling
results: Applied to recent SAT, AI planning, and computer vision competitions, the analysis reveals frequent statistical ties in solver performance, as well as some rank inversions compared to the official results.
Abstract
Solver competitions play a prominent role in assessing and advancing the state of the art for solving many problems in AI and beyond. Notably, in many areas of AI, competitions have had substantial impact in guiding research and applications for many years, and for a solver to be ranked highly in a competition carries considerable weight. But to which extent can we expect competition results to generalise to sets of problem instances different from those used in a particular competition? This is the question we investigate here, using statistical resampling techniques. We show that the rankings resulting from the standard interpretation of competition results can be very sensitive to even minor changes in the benchmark instance set used as the basis for assessment and can therefore not be expected to carry over to other samples from the same underlying instance distribution. To address this problem, we introduce a novel approach to statistically meaningful analysis of competition results based on resampling performance data. Our approach produces confidence intervals of competition scores as well as statistically robust solver rankings with bounded error. Applied to recent SAT, AI planning and computer vision competitions, our analysis reveals frequent statistical ties in solver performance as well as some inversions of ranks compared to the official results based on simple scoring.
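A minimal version of the resampling idea: bootstrap the benchmark instance set and recompute each solver's score to obtain confidence intervals; overlapping intervals then indicate a statistical tie rather than a strict ranking. The per-instance scores below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic per-instance scores for two solvers over 100 benchmark instances
# (illustration only; real competitions use runtimes or PAR scores).
scores = {
    "solver_A": rng.normal(0.80, 0.15, 100),
    "solver_B": rng.normal(0.78, 0.15, 100),
}

def bootstrap_ci(per_instance, n_boot=10_000, alpha=0.05):
    """Percentile bootstrap CI for a solver's mean score over instances."""
    n = len(per_instance)
    means = np.array([
        per_instance[rng.integers(0, n, n)].mean() for _ in range(n_boot)
    ])
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])

for name, s in scores.items():
    lo, hi = bootstrap_ci(s)
    print(f"{name}: mean={s.mean():.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
# Overlapping intervals suggest a statistical tie between the two solvers.
```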
paper_authors: Xubo Liu, Qiuqiang Kong, Yan Zhao, Haohe Liu, Yi Yuan, Yuzhuo Liu, Rui Xia, Yuxuan Wang, Mark D. Plumbley, Wenwu Wang
for: AudioSep is a foundation model for open-domain audio source separation with natural language queries, aiming to separate a target sound from an audio mixture given a natural language query.
methods: AudioSep is trained on large-scale multimodal datasets and evaluated on numerous tasks including audio event separation, musical instrument separation, and speech enhancement.
results: AudioSep demonstrates strong separation performance and impressive zero-shot generalization ability using audio captions or text labels as queries, substantially outperforming previous audio-queried and language-queried sound separation models.
Abstract
Language-queried audio source separation (LASS) is a new paradigm for computational auditory scene analysis (CASA). LASS aims to separate a target sound from an audio mixture given a natural language query, which provides a natural and scalable interface for digital audio applications. Recent works on LASS, despite attaining promising separation performance on specific sources (e.g., musical instruments, limited classes of audio events), are unable to separate audio concepts in the open domain. In this work, we introduce AudioSep, a foundation model for open-domain audio source separation with natural language queries. We train AudioSep on large-scale multimodal datasets and extensively evaluate its capabilities on numerous tasks including audio event separation, musical instrument separation, and speech enhancement. AudioSep demonstrates strong separation performance and impressive zero-shot generalization ability using audio captions or text labels as queries, substantially outperforming previous audio-queried and language-queried sound separation models. For reproducibility of this work, we will release the source code, evaluation benchmark and pre-trained model at: https://github.com/Audio-AGI/AudioSep.
Expert load matters: operating networks at high accuracy and low manual effort
results: Experiments show that the proposed complementary loss simultaneously improves classification accuracy and reduces the number of samples delegated to human experts, while also achieving better out-of-distribution sample detection.
Abstract
In human-AI collaboration systems for critical applications, in order to ensure minimal error, users should set an operating point based on model confidence to determine when the decision should be delegated to human experts. Samples for which model confidence is lower than the operating point would be manually analysed by experts to avoid mistakes. Such systems can become truly useful only if they consider two aspects: models should be confident only for samples for which they are accurate, and the number of samples delegated to experts should be minimized. The latter aspect is especially crucial for applications where available expert time is limited and expensive, such as healthcare. The trade-off between the model accuracy and the number of samples delegated to experts can be represented by a curve that is similar to an ROC curve, which we refer to as confidence operating characteristic (COC) curve. In this paper, we argue that deep neural networks should be trained by taking into account both accuracy and expert load and, to that end, propose a new complementary loss function for classification that maximizes the area under this COC curve. This promotes simultaneously the increase in network accuracy and the reduction in number of samples delegated to humans. We perform experiments on multiple computer vision and medical image datasets for classification. Our results demonstrate that the proposed loss improves classification accuracy and delegates less number of decisions to experts, achieves better out-of-distribution samples detection and on par calibration performance compared to existing loss functions.
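The COC curve can be computed much like an ROC curve: sweep an operating point over model confidence, delegate the low-confidence samples, and record accuracy on the retained set against the fraction delegated. The sketch below illustrates this on synthetic predictions; the paper's exact construction may differ in detail.

```python
import numpy as np

def coc_curve(confidence, correct):
    """Confidence operating characteristic: accuracy on retained samples
    vs. fraction delegated to experts, swept over operating points.

    Sketch of the concept as described above.
    """
    order = np.argsort(confidence)          # least confident first
    correct = np.asarray(correct, float)[order]
    n = len(correct)
    frac_delegated, retained_acc = [], []
    for k in range(n):                      # delegate the k least confident
        frac_delegated.append(k / n)
        retained_acc.append(correct[k:].mean())
    return np.array(frac_delegated), np.array(retained_acc)

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, 1000)
# Make correctness more likely when confidence is high (roughly calibrated):
correct = rng.uniform(size=1000) < conf
frac, acc = coc_curve(conf, correct)
auc = np.trapz(acc, frac)                  # area under the COC curve
print(f"accuracy with no delegation: {acc[0]:.3f}, AUC-COC: {auc:.3f}")
```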
results: By training the model and optimizing its parameters, higher code-summary generation accuracy and greater efficiency can be achieved.
Abstract
Code summarization aims to generate concise natural language descriptions for source code. The prevailing approaches adopt transformer-based encoder-decoder architectures, where the Abstract Syntax Tree (AST) of the source code is utilized for encoding structural information. However, ASTs are much longer than the corresponding source code, and existing methods ignore this size constraint by directly feeding the entire linearized AST into the encoders. This simplistic approach makes it challenging to extract truly valuable dependency relations from the overlong input sequence and leads to significant computational overhead due to self-attention applied to all nodes in the AST. To address this issue effectively and efficiently, we present a model, AST-MHSA that uses multi-head attention to extract the important semantic information from the AST. The model consists of two main components: an encoder and a decoder. The encoder takes as input the abstract syntax tree (AST) of the code and generates a sequence of hidden states. The decoder then takes these hidden states as input and generates a natural language summary of the code. The multi-head attention mechanism allows the model to learn different representations of the input code, which can be combined to generate a more comprehensive summary. The model is trained on a dataset of code and summaries, and the parameters of the model are optimized to minimize the loss between the generated summaries and the ground-truth summaries.
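The multi-head attention step at the heart of the encoder can be sketched with PyTorch's built-in module; the dimensions below are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

# Illustrative dimensions, not the paper's configuration.
d_model, n_heads, ast_len = 256, 8, 512

# A linearized AST becomes a sequence of node embeddings...
ast_nodes = torch.randn(1, ast_len, d_model)   # (batch, seq, dim)

# ...over which multi-head self-attention learns several complementary
# views of the dependencies between AST nodes.
mhsa = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
hidden, attn_weights = mhsa(ast_nodes, ast_nodes, ast_nodes)

print(hidden.shape)        # torch.Size([1, 512, 256]) -> encoder hidden states
print(attn_weights.shape)  # torch.Size([1, 512, 512]), averaged over heads
```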
IIHT: Medical Report Generation with Image-to-Indicator Hierarchical Transformer
results: Experimental results show that the proposed IIHT method generates high-quality medical reports and performs strongly under various evaluation metrics. Moreover, the method allows radiologists to modify disease indicators in real-world scenarios, keeping the generated reports accurate and fluent.
Abstract
Automated medical report generation has become increasingly important in medical analysis. It can produce computer-aided diagnosis descriptions and thus significantly alleviate the doctors' work. Inspired by the huge success of neural machine translation and image captioning, various deep learning methods have been proposed for medical report generation. However, due to the inherent properties of medical data, including data imbalance and the length and correlation between report sequences, the generated reports by existing methods may exhibit linguistic fluency but lack adequate clinical accuracy. In this work, we propose an image-to-indicator hierarchical transformer (IIHT) framework for medical report generation. It consists of three modules, i.e., a classifier module, an indicator expansion module and a generator module. The classifier module first extracts image features from the input medical images and produces disease-related indicators with their corresponding states. The disease-related indicators are subsequently utilised as input for the indicator expansion module, incorporating the "data-text-data" strategy. The transformer-based generator then leverages these extracted features along with image features as auxiliary information to generate final reports. Furthermore, the proposed IIHT method is feasible for radiologists to modify disease indicators in real-world scenarios and integrate the operations into the indicator expansion module for fluent and accurate medical report generation. Extensive experiments and comparisons with state-of-the-art methods under various evaluation metrics demonstrate the great performance of the proposed method.
LASIGE and UNICAGE solution to the NASA LitCoin NLP Competition
methods: integrates industry data engineering solutions with academic systems for Named Entity Recognition (LasigeUnicage_NER) and Relation Extraction (BiOnt), and incorporates external knowledge (additional training datasets and biomedical ontologies) into the pipeline.
results: the team was awarded the 7th Prize in the 2022 LitCoin NLP Challenge, reflecting a successful collaboration between academia (LASIGE) and industry (Unicage). The software supporting this work is available on GitHub: https://github.com/lasigeBioTM/Litcoin-Lasige_Unicage.
Abstract
Biomedical Natural Language Processing (NLP) tends to become cumbersome for most researchers, frequently due to the amount and heterogeneity of text to be processed. To address this challenge, the industry is continuously developing highly efficient tools and creating more flexible engineering solutions. This work presents the integration between industry data engineering solutions for efficient data processing and academic systems developed for Named Entity Recognition (LasigeUnicage\_NER) and Relation Extraction (BiOnt). Our design reflects an integration of those components with external knowledge in the form of additional training data from other datasets and biomedical ontologies. We used this pipeline in the 2022 LitCoin NLP Challenge, where our team LasigeUnicage was awarded the 7th Prize out of approximately 200 participating teams, reflecting a successful collaboration between the academia (LASIGE) and the industry (Unicage). The software supporting this work is available at \url{https://github.com/lasigeBioTM/Litcoin-Lasige_Unicage}.
You Only Prompt Once: On the Capabilities of Prompt Learning on Large Language Models to Tackle Toxic Content
paper_authors: Xinlei He, Savvas Zannettou, Yun Shen, Yang Zhang
for: investigates how large language models (LLMs) and prompt learning can be used to tackle toxic content online.
methods: evaluates LLMs with prompt learning across five model architectures and eight datasets on three tasks: toxicity classification, toxic span detection, and detoxification.
results: LLMs with prompt learning achieve similar or even better performance than models trained for these specific tasks; prompt learning yields around a 10% improvement on toxicity classification, slightly better performance than the best baseline on toxic span detection (0.643 vs. 0.640 F1), and reduces the average toxicity score in detoxification from 0.775 to 0.213 while preserving semantic meaning.
Abstract
The spread of toxic content online is an important problem that has adverse effects on user experience online and in our society at large. Motivated by the importance and impact of the problem, research focuses on developing solutions to detect toxic content, usually leveraging machine learning (ML) models trained on human-annotated datasets. While these efforts are important, these models usually do not generalize well and they can not cope with new trends (e.g., the emergence of new toxic terms). Currently, we are witnessing a shift in the approach to tackling societal issues online, particularly leveraging large language models (LLMs) like GPT-3 or T5 that are trained on vast corpora and have strong generalizability. In this work, we investigate how we can use LLMs and prompt learning to tackle the problem of toxic content, particularly focusing on three tasks; 1) Toxicity Classification, 2) Toxic Span Detection, and 3) Detoxification. We perform an extensive evaluation over five model architectures and eight datasets demonstrating that LLMs with prompt learning can achieve similar or even better performance compared to models trained on these specific tasks. We find that prompt learning achieves around 10\% improvement in the toxicity classification task compared to the baselines, while for the toxic span detection task we find better performance to the best baseline (0.643 vs. 0.640 in terms of $F_1$-score). Finally, for the detoxification task, we find that prompt learning can successfully reduce the average toxicity score (from 0.775 to 0.213) while preserving semantic meaning.
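As a minimal sketch of the prompt-learning setup (not the paper's exact templates or models), a text-to-text model can be queried with a cloze-style prompt and its output mapped to a label; the checkpoint below is an instruction-tuned stand-in and the template wording is an assumption.

```python
# Prompt-based toxicity classification sketch with a Hugging Face pipeline.
from transformers import pipeline

generate = pipeline("text2text-generation", model="google/flan-t5-base")

def classify_toxicity(text):
    prompt = f'Is the following text toxic? Answer yes or no. Text: "{text}"'
    answer = generate(prompt, max_new_tokens=3)[0]["generated_text"].lower()
    return "toxic" if "yes" in answer else "non-toxic"

print(classify_toxicity("You are a wonderful person."))
```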
results: the analysis suggests that the outputs of LMs can achieve word-to-world connections, and that these connections arise through language.
Abstract
What do language models (LMs) do with language? Everyone agrees that they produce sequences of (mostly) coherent sentences. But are they saying anything with those strings or simply babbling in a convincing simulacrum of language use? This is a vague question, and there are many ways of making it precise. Here we will address one aspect of the question, namely, whether LMs' words refer: that is, whether the outputs of LMs achieve "word-to-world" connections. There is prima facie reason to think they do not since LMs do not interact with the world in the way that ordinary language users do. Drawing on insights from the externalist tradition in philosophy of language, we argue that appearances are misleading and that there is good reason to think that LMs can refer.
Exploring Linguistic Similarity and Zero-Shot Learning for Multilingual Translation of Dravidian Languages
methods: using transliteration and linguistic similarity
results: achieves scores within 3 BLEU of large-scale pivot-based models when trained on 50% of the language directions.
Abstract
Current research in zero-shot translation is plagued by several issues such as high compute requirements, increased training time and off target translations. Proposed remedies often come at the cost of additional data or compute requirements. Pivot based neural machine translation is preferred over a single-encoder model for most settings despite the increased training and evaluation time. In this work, we overcome the shortcomings of zero-shot translation by taking advantage of transliteration and linguistic similarity. We build a single encoder-decoder neural machine translation system for Dravidian-Dravidian multilingual translation and perform zero-shot translation. We compare the data vs zero-shot accuracy tradeoff and evaluate the performance of our vanilla method against the current state of the art pivot based method. We also test the theory that morphologically rich languages require large vocabularies by restricting the vocabulary using an optimal transport based technique. Our model manages to achieves scores within 3 BLEU of large-scale pivot-based models when it is trained on 50\% of the language directions.
Exploring Machine Learning and Transformer-based Approaches for Deceptive Text Classification: A Comparative Analysis
results: extensive experiments compare the performance metrics of the different approaches, including accuracy, precision, recall, and F1 score, helping researchers and practitioners make informed decisions when dealing with deceptive content.
Abstract
Deceptive text classification is a critical task in natural language processing that aims to identify deceptive or fraudulent content. This study presents a comparative analysis of machine learning and transformer-based approaches for deceptive text classification. We investigate the effectiveness of traditional machine learning algorithms and state-of-the-art transformer models, such as BERT, XLNET, DistilBERT, and RoBERTa, in detecting deceptive text. A labeled dataset consisting of deceptive and non-deceptive texts is used for training and evaluation purposes. Through extensive experimentation, we compare the performance metrics, including accuracy, precision, recall, and F1 score, of the different approaches. The results of this study shed light on the strengths and limitations of machine learning and transformer-based methods for deceptive text classification, enabling researchers and practitioners to make informed decisions when dealing with deceptive content.
WeaverBird: Empowering Financial Decision-Making with Large Language Model, Knowledge Base, and Search Engine
results: the study demonstrates WeaverBird's superior performance compared to other models across a range of finance-related questions. A live demo is available at https://weaverbird.ttic.edu, and a 2-minute video illustration at https://www.youtube.com/watch?v=yofgeqnlrMc.
Abstract
We present WeaverBird, an intelligent dialogue system designed specifically for the finance domain. Our system harnesses a large language model of GPT architecture that has been tuned using extensive corpora of finance-related text. As a result, our system possesses the capability to understand complex financial queries, such as "How should I manage my investments during inflation?", and provide informed responses. Furthermore, our system incorporates a local knowledge base and a search engine to retrieve relevant information. The final responses are conditioned on the search results and include proper citations to the sources, thus enjoying an enhanced credibility. Through a range of finance-related questions, we have demonstrated the superior performance of our system compared to other models. To experience our system firsthand, users can interact with our live demo at https://weaverbird.ttic.edu, as well as watch our 2-min video illustration at https://www.youtube.com/watch?v=yofgeqnlrMc.
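Schematically, the retrieve-then-generate loop with citations looks like the sketch below; `search` and `llm` are placeholders for the system's retriever (local knowledge base and/or search engine) and its tuned GPT-style model, not WeaverBird's actual code.

```python
# Schematic retrieval-augmented answering in the spirit of the system above:
# search results are stuffed into the prompt and the model is asked to cite them.
def answer_with_citations(question, search, llm, k=3):
    docs = search(question)[:k]                       # top-k retrieved passages
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(docs))
    prompt = (f"Answer the question using only the sources below and cite "
              f"them as [n].\n{context}\n\nQuestion: {question}\nAnswer:")
    return llm(prompt)                                # response with citations
```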
results: the paper builds a parallel corpus of 50,000 colloquial-formal sentence pairs with word/phrase-level alignments, containing about 530,000 alignments and a dictionary of 49,397 word and phrase pairs.
Abstract
Informal language is a style of spoken or written language frequently used in casual conversations, social media, weblogs, emails and text messages. In informal writing, the language faces some lexical and/or syntactic changes varying among different languages. Persian is one of the languages with many differences between its formal and informal styles of writing, thus developing informal language processing tools for this language seems necessary. Such a converter needs a large aligned parallel corpus of colloquial-formal sentences which can be useful for linguists to extract a regulated grammar and orthography for colloquial Persian as is done for the formal language. In this paper we explain our methodology in building a parallel corpus of 50,000 sentence pairs with alignments in the word/phrase level. The sentences were attempted to cover almost all kinds of lexical and syntactic changes between informal and formal Persian, therefore both methods of exploring and collecting from the different resources of informal scripts and following the phonological and morphological patterns of changes were applied to find as much instances as possible. The resulting corpus has about 530,000 alignments and a dictionary containing 49,397 word and phrase pairs.
Few-Shot Data-to-Text Generation via Unified Representation and Multi-Source Learning
results: the method adapts effectively to new structured forms and significantly improves performance over existing methods, e.g., a 66% improvement in zero-shot BLEU scores when transferring models trained on table inputs to a knowledge graph dataset.
Abstract
We present a novel approach for structured data-to-text generation that addresses the limitations of existing methods that primarily focus on specific types of structured data. Our proposed method aims to improve performance in multi-task training, zero-shot and few-shot scenarios by providing a unified representation that can handle various forms of structured data such as tables, knowledge graph triples, and meaning representations. We demonstrate that our proposed approach can effectively adapt to new structured forms, and can improve performance in comparison to current methods. For example, our method resulted in a 66% improvement in zero-shot BLEU scores when transferring models trained on table inputs to a knowledge graph dataset. Our proposed method is an important step towards a more general data-to-text generation framework.
Investigating disaster response through social media data and the Susceptible-Infected-Recovered (SIR) model: A case study of 2020 Western U.S. wildfire season
results: Twitter users mainly focused on three topics: "health impact," "damage," and "evacuation." Analyzing the magnitude and velocity of topic diffusion with the SIR model shows that residents exhibited a high level of concern during the wildfire, with topic trends closely tied to wildfire propagation patterns.
Abstract
Effective disaster response is critical for affected communities. Responders and decision-makers would benefit from reliable, timely measures of the issues impacting their communities during a disaster, and social media offers a potentially rich data source. Social media can reflect public concerns and demands during a disaster, offering valuable insights for decision-makers to understand evolving situations and optimize resource allocation. We used Bidirectional Encoder Representations from Transformers (BERT) topic modeling to cluster topics from Twitter data. Then, we conducted a temporal-spatial analysis to examine the distribution of these topics across different regions during the 2020 western U.S. wildfire season. Our results show that Twitter users mainly focused on three topics:"health impact," "damage," and "evacuation." We used the Susceptible-Infected-Recovered (SIR) theory to explore the magnitude and velocity of topic diffusion on Twitter. The results displayed a clear relationship between topic trends and wildfire propagation patterns. The estimated parameters obtained from the SIR model in selected cities revealed that residents exhibited a high level of several concerns during the wildfire. Our study details how the SIR model and topic modeling using social media data can provide decision-makers with a quantitative approach to measure disaster response and support their decision-making processes.
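For reference, the SIR dynamics underlying the diffusion analysis can be integrated in a few lines; the contact rate beta and recovery rate gamma below are illustrative values, not the paper's fitted parameters.

```python
# Classic SIR dynamics, here reused to model topic diffusion on Twitter.
import numpy as np
from scipy.integrate import odeint

def sir(y, t, beta, gamma):
    S, I, R = y
    dS = -beta * S * I           # susceptible users encountering the topic
    dI = beta * S * I - gamma * I  # users actively discussing the topic
    dR = gamma * I               # users who stopped discussing it
    return [dS, dI, dR]

t = np.linspace(0, 60, 300)                      # days
y0 = [0.99, 0.01, 0.0]                           # S, I, R fractions at t=0
S, I, R = odeint(sir, y0, t, args=(0.4, 0.1)).T  # beta=0.4, gamma=0.1 (toy)
print(f"peak topic prevalence: {I.max():.3f} at day {t[I.argmax()]:.1f}")
```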
A Novel Self-training Approach for Low-resource Speech Recognition
paper_authors: Satwinder Singh, Feng Hou, Ruili Wang
for: improving the accuracy of automatic speech recognition (ASR) for low-resource languages.
methods: proposes a self-training approach that generates highly accurate pseudo-labels for unlabeled low-resource speech, addressing the scarcity of annotated data that hinders the development of ASR systems for low-resource languages.
results: experiments on four real speech datasets show a relative word error rate improvement of 14.94% over a baseline model, with the best results reported on the Common Voice Punjabi dataset.
Abstract
In this paper, we propose a self-training approach for automatic speech recognition (ASR) for low-resource settings. While self-training approaches have been extensively developed and evaluated for high-resource languages such as English, their applications to low-resource languages like Punjabi have been limited, despite the language being spoken by millions globally. The scarcity of annotated data has hindered the development of accurate ASR systems, especially for low-resource languages (e.g., Punjabi and M\=aori languages). To address this issue, we propose an effective self-training approach that generates highly accurate pseudo-labels for unlabeled low-resource speech. Our experimental analysis demonstrates that our approach significantly improves word error rate, achieving a relative improvement of 14.94% compared to a baseline model across four real speech datasets. Further, our proposed approach reports the best results on the Common Voice Punjabi dataset.
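The core self-training loop follows a familiar pattern, sketched below with a placeholder `model` object and an assumed confidence-threshold filter; the paper's pseudo-label generation and filtering details may differ.

```python
# Generic ASR self-training sketch: train on labeled data, pseudo-label the
# unlabeled pool, keep confident hypotheses, and retrain on the union.
def self_train(model, labeled, unlabeled, rounds=3, threshold=0.9):
    model.fit(labeled)
    for _ in range(rounds):
        pseudo = []
        for audio in unlabeled:
            text, confidence = model.transcribe(audio)  # hypothesis + score
            if confidence >= threshold:                 # keep confident labels
                pseudo.append((audio, text))
        model.fit(labeled + pseudo)                     # retrain on the union
    return model
```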
Conceptualizing Machine Learning for Dynamic Information Retrieval of Electronic Health Record Notes
paper_authors: Sharon Jiang, Shannon Shen, Monica Agrawal, Barbara Lam, Nicholas Kurtzman, Steven Horng, David Karger, David Sontag
for: reducing clinician burnout and improving documentation efficiency
methods: uses electronic health record (EHR) audit log data as machine learning supervision to dynamically retrieve relevant patient history in real time during documentation
results: in the emergency department, the method achieves an AUC of 0.963 for predicting which notes will be read, and a user study with several clinicians shows it helps them retrieve relevant information more efficiently.
Abstract
The large amount of time clinicians spend sifting through patient notes and documenting in electronic health records (EHRs) is a leading cause of clinician burnout. By proactively and dynamically retrieving relevant notes during the documentation process, we can reduce the effort required to find relevant patient history. In this work, we conceptualize the use of EHR audit logs for machine learning as a source of supervision of note relevance in a specific clinical context, at a particular point in time. Our evaluation focuses on the dynamic retrieval in the emergency department, a high acuity setting with unique patterns of information retrieval and note writing. We show that our methods can achieve an AUC of 0.963 for predicting which notes will be read in an individual note writing session. We additionally conduct a user study with several clinicians and find that our framework can help clinicians retrieve relevant information more efficiently. Demonstrating that our framework and methods can perform well in this demanding setting is a promising proof of concept that they will translate to other clinical settings and data modalities (e.g., labs, medications, imaging).
results: the paper demonstrates consistent improvement over other textual saliency methods on multiple benchmark classification datasets, requiring no additional training or access to labeled data.
Abstract
In this paper, we introduce a strategy for identifying textual saliency in large-scale language models applied to classification tasks. In visual networks where saliency is more well-studied, saliency is naturally localized through the convolutional layers of the network; however, the same is not true in modern transformer-stack networks used to process natural language. We adapt gradient-based saliency methods for these networks, propose a method for evaluating the degree of semantic coherence of each layer, and demonstrate consistent improvement over numerous other methods for textual saliency on multiple benchmark classification datasets. Our approach requires no additional training or access to labelled data, and is comparatively very computationally efficient.
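A minimal gradient-times-input token saliency computation for a transformer classifier looks like the following; the public checkpoint is a common default, not necessarily the one used in the paper, and this is the basic variant of the gradient-based family the paper adapts.

```python
# Gradient x input saliency on the embedding layer of a transformer classifier.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "distilbert-base-uncased-finetuned-sst-2-english"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

inputs = tok("The movie was surprisingly good", return_tensors="pt")
embeds = model.get_input_embeddings()(inputs["input_ids"])
embeds.retain_grad()                                  # keep grad on activations
logits = model(inputs_embeds=embeds,
               attention_mask=inputs["attention_mask"]).logits
logits[0, logits.argmax()].backward()                 # grad of predicted class
saliency = (embeds.grad * embeds).sum(-1).abs().squeeze()
for token, score in zip(tok.convert_ids_to_tokens(inputs["input_ids"][0]),
                        saliency):
    print(f"{token:>12s} {score.item():.3f}")
```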
Constructing Holistic Spatio-Temporal Scene Graph for Video Semantic Role Labeling
for: Video Semantic Role Labeling (VidSRL), which aims to detect salient events in videos by recognizing predicate-argument event structures and the interrelationships between events.
results: on the benchmark dataset, the framework significantly outperforms the current best-performing model; further analyses are provided for a better understanding of the method's advances.
Abstract
Video Semantic Role Labeling (VidSRL) aims to detect the salient events from given videos, by recognizing the predicate-argument event structures and the interrelationships between events. While recent endeavors have put forth methods for VidSRL, they can be mostly subject to two key drawbacks, including the lack of fine-grained spatial scene perception and the insufficiently modeling of video temporality. Towards this end, this work explores a novel holistic spatio-temporal scene graph (namely HostSG) representation based on the existing dynamic scene graph structures, which well model both the fine-grained spatial semantics and temporal dynamics of videos for VidSRL. Built upon the HostSG, we present a niche-targeting VidSRL framework. A scene-event mapping mechanism is first designed to bridge the gap between the underlying scene structure and the high-level event semantic structure, resulting in an overall hierarchical scene-event (termed ICE) graph structure. We further perform iterative structure refinement to optimize the ICE graph, such that the overall structure representation can best coincide with end task demand. Finally, three subtask predictions of VidSRL are jointly decoded, where the end-to-end paradigm effectively avoids error propagation. On the benchmark dataset, our framework boosts significantly over the current best-performing model. Further analyses are shown for a better understanding of the advances of our methods.
RadGraph2: Modeling Disease Progression in Radiology Reports via Hierarchical Information Extraction
results: compared to models trained on the original RadGraph dataset, RadGraph2 enables models to capture a wider variety of findings and perform better at relation extraction, providing a foundation for developing automated systems that track disease progression over time and information extraction models for the medical domain.
Abstract
We present RadGraph2, a novel dataset for extracting information from radiology reports that focuses on capturing changes in disease state and device placement over time. We introduce a hierarchical schema that organizes entities based on their relationships and show that using this hierarchy during training improves the performance of an information extraction model. Specifically, we propose a modification to the DyGIE++ framework, resulting in our model HGIE, which outperforms previous models in entity and relation extraction tasks. We demonstrate that RadGraph2 enables models to capture a wider variety of findings and perform better at relation extraction compared to those trained on the original RadGraph dataset. Our work provides the foundation for developing automated systems that can track disease progression over time and develop information extraction models that leverage the natural hierarchy of labels in the medical domain.
results: experiments show that the novel gating mechanism maintains long-term memory on a synthetic sequence learning task at reduced computational cost, and that on handwritten text recognition it can be trained to accuracy comparable to conventional GRU and LSTM baselines.
Abstract
We replace the multiplication and sigmoid function of the conventional recurrent gate with addition and ReLU activation. This mechanism is designed to maintain long-term memory for sequence processing but at a reduced computational cost, thereby opening up for more efficient execution or larger models on restricted hardware. Recurrent Neural Networks (RNNs) with gating mechanisms such as LSTM and GRU have been widely successful in learning from sequential data due to their ability to capture long-term dependencies. Conventionally, the update based on current inputs and the previous state history is each multiplied with dynamic weights and combined to compute the next state. However, multiplication can be computationally expensive, especially for certain hardware architectures or alternative arithmetic systems such as homomorphic encryption. It is demonstrated that the novel gating mechanism can capture long-term dependencies for a standard synthetic sequence learning task while significantly reducing computational costs such that execution time is reduced by half on CPU and by one-third under encryption. Experimental results on handwritten text recognition tasks furthermore show that the proposed architecture can be trained to achieve comparable accuracy to conventional GRU and LSTM baselines. The gating mechanism introduced in this paper may enable privacy-preserving AI applications operating under homomorphic encryption by avoiding the multiplication of encrypted variables. It can also support quantization in (unencrypted) plaintext applications, with the potential for substantial performance gains since the addition-based formulation can avoid the expansion to double precision often required for multiplication.
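The operation substitution can be shown side by side. The additive variant below is one possible formulation for illustration only (the paper's exact update rule may differ); the point is that the sigmoid and the elementwise products between dynamic values, which dominate cost under homomorphic encryption, are replaced by addition and ReLU.

```python
# Side-by-side sketch of the gating substitution. The additive form is an
# illustrative assumption, not the paper's exact equations.
import torch

def conventional_update(h, h_tilde, gate_pre):
    z = torch.sigmoid(gate_pre)            # sigmoid gate
    return z * h + (1 - z) * h_tilde       # two elementwise multiplications

def additive_relu_update(h, h_tilde, gate_pre):
    # addition + ReLU only: no sigmoid, no product of dynamic values
    return torch.relu(h + h_tilde + gate_pre)

h, h_tilde, g = torch.randn(3, 8)          # toy state, candidate, pre-gate
print(conventional_update(h, h_tilde, g).shape,
      additive_relu_update(h, h_tilde, g).shape)
```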
results: the note obtains a bound that depends on a novel notion of local Hölder smoothness.
Abstract
In this short note, I show how to adapt to H\"{o}lder smoothness using normalized gradients in a black-box way. Moreover, the bound will depend on a novel notion of local H\"{o}lder smoothness. The main idea directly comes from Levy [2017].
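For reference, a standard statement of the setting (a sketch; the note's contribution is a bound in terms of a local version of the constant $H$):

```latex
% Hölder-smooth gradients with exponent beta (beta = 1 recovers the usual
% Lipschitz-gradient case) and the normalized-gradient step used to adapt to it.
\[
\|\nabla f(x) - \nabla f(y)\| \;\le\; H\,\|x-y\|^{\beta}, \qquad \beta \in (0,1],
\]
\[
x_{t+1} \;=\; x_t - \eta_t\, \frac{\nabla f(x_t)}{\|\nabla f(x_t)\|}.
\]
```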
Multi-graph Spatio-temporal Graph Convolutional Network for Traffic Flow Prediction
results: on a Chinese provincial highway, the method shows clear improvements in predictive accuracy over baseline methods and practical benefits in business.
Abstract
Inter-city highway transportation is significant for urban life. As one of the key functions in an intelligent transportation system (ITS), traffic evaluation plays a significant role nowadays, and daily traffic flow prediction still faces challenges at network-wide toll stations. On the one hand, the data imbalance in practice among various locations deteriorates the performance of prediction. On the other hand, complex correlative spatio-temporal factors cannot be comprehensively employed in long-term duration. In this paper, a prediction method is proposed for daily traffic flow in the highway domain through spatio-temporal deep learning. In our method, a data normalization strategy is used to deal with data imbalance, due to the long-tail distribution of traffic flow at network-wide toll stations. Then, based on graph convolutional networks, we construct networks in distinct semantics to capture spatio-temporal features. Besides that, meteorology and calendar features are used by our model in the full connection stage to extract external characteristics of traffic flow. By extensive experiments and case studies on one Chinese provincial highway, our method shows clear improvement in predictive accuracy over baselines and practical benefits in business.
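A common normalization recipe for long-tailed count data such as toll-station flows is a log transform followed by per-station standardization; the sketch below illustrates the kind of strategy described, though the paper's exact transform is not spelled out here.

```python
# Long-tail normalization sketch: log1p to compress the tail, then
# per-station z-scoring so all stations contribute on a comparable scale.
import numpy as np

def normalize_flows(flows):                   # flows: (stations, timesteps)
    x = np.log1p(flows)                       # compress the long tail
    mean = x.mean(axis=1, keepdims=True)      # per-station statistics
    std = x.std(axis=1, keepdims=True) + 1e-8
    return (x - mean) / std, (mean, std)      # keep stats to invert later

flows = np.random.lognormal(mean=3.0, sigma=1.5, size=(5, 96))
normalized, stats = normalize_flows(flows)
print(normalized.mean(axis=1).round(3))       # ~0 per station
```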
NUPES : Non-Uniform Post-Training Quantization via Power Exponent Search
results: jointly optimizes the quantized weights and the power exponent of the quantization operator during training, achieving state-of-the-art compression rates while preserving predictive performance.
Abstract
Deep neural network (DNN) deployment has been confined to larger hardware devices due to their expensive computational requirements. This challenge has recently reached another scale with the emergence of large language models (LLMs). In order to reduce both their memory footprint and latency, a promising technique is quantization. It consists in converting floating point representations to low bit-width fixed point representations, usually by assuming a uniform mapping onto a regular grid. This process, referred to in the literature as uniform quantization, may however be ill-suited as most DNN weights and activations follow a bell-shaped distribution. This is even worse on LLMs whose weight distributions are known to exhibit large, high impact, outlier values. In this work, we propose an improvement over the most commonly adopted way to tackle this limitation in deep learning models quantization, namely, non-uniform quantization. NUPES leverages automorphisms to preserve the scalar multiplications. Such transformations are derived from power functions. However, the optimization of the exponent parameter and weight values remains a challenging and novel problem which could not be solved with previous post training optimization techniques which only learn to round up or down weight values in order to preserve the predictive function. We circumvent this limitation with a new paradigm: learning new quantized weights over the entire quantized space. Similarly, we enable the optimization of the power exponent, i.e. the optimization of the quantization operator itself during training by alleviating all the numerical instabilities. The resulting predictive function is compatible with integer-only low-bit inference. We show the ability of the method to achieve state-of-the-art compression rates in both, data-free and data-driven configurations.
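The idea of a power-exponent quantizer can be sketched as warping values by |w|^a before uniform rounding, which concentrates grid points near zero where bell-shaped weight distributions are dense. This toy version only illustrates the role of the exponent a; it is not the NUPES training procedure, which learns a and the weights jointly.

```python
# Power-warp quantizer sketch: warp, round on a uniform grid, unwarp.
import numpy as np

def power_quantize(w, bits=4, a=0.5):
    s = np.abs(w).max()                           # scale to [-1, 1]
    levels = 2 ** (bits - 1) - 1
    warped = np.sign(w) * np.abs(w / s) ** a      # power warp (a=1: identity)
    q = np.round(warped * levels) / levels        # uniform grid in warped space
    return np.sign(q) * np.abs(q) ** (1 / a) * s  # unwarp back

w = np.random.randn(100_000) * 0.05               # bell-shaped toy weights
for a in (1.0, 0.5, 0.25):                        # a=1.0 is uniform quantization
    err = np.mean((w - power_quantize(w, a=a)) ** 2)
    print(f"a={a:.2f}  MSE={err:.2e}")
```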
Symmetry Defense Against XGBoost Adversarial Perturbation Attacks
for: defending tree-based ensemble classifiers (such as gradient-boosting decision trees) against adversarial perturbation attacks
methods: using a symmetry defense
results: achieves up to 100% accuracy on adversarial samples against zero-knowledge adversaries, and over 95% accuracy on adversarial samples for the GBDT classifier of the F-MNIST dataset against perfect-knowledge adversaries.
Abstract
We examine whether symmetry can be used to defend tree-based ensemble classifiers such as gradient-boosting decision trees (GBDTs) against adversarial perturbation attacks. The idea is based on a recent symmetry defense for convolutional neural network classifiers (CNNs) that utilizes CNNs' lack of invariance with respect to symmetries. CNNs lack invariance because they can classify a symmetric sample, such as a horizontally flipped image, differently from the original sample. CNNs' lack of invariance also means that CNNs can classify symmetric adversarial samples differently from the incorrect classification of adversarial samples. Using CNNs' lack of invariance, the recent CNN symmetry defense has shown that the classification of symmetric adversarial samples reverts to the correct sample classification. In order to apply the same symmetry defense to GBDTs, we examine GBDT invariance and are the first to show that GBDTs also lack invariance with respect to symmetries. We apply and evaluate the GBDT symmetry defense for nine datasets against six perturbation attacks with a threat model that ranges from zero-knowledge to perfect-knowledge adversaries. Using the feature inversion symmetry against zero-knowledge adversaries, we achieve up to 100% accuracy on adversarial samples even when default and robust classifiers have 0% accuracy. Using the feature inversion and horizontal flip symmetries against perfect-knowledge adversaries, we achieve up to over 95% accuracy on adversarial samples for the GBDT classifier of the F-MNIST dataset even when default and robust classifiers have 0% accuracy.
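At inference time, the defense amounts to classifying symmetric versions of the input, as in the simplified sketch below; how the predictions are combined across symmetries is an assumption here (the paper evaluates the symmetries under several threat models).

```python
# Symmetry-defense inference sketch for an image-input GBDT: predict on the
# original, the horizontally flipped, and the feature-inverted (x -> 1 - x,
# for pixels in [0, 1]) versions, then majority-vote.
import numpy as np

def symmetric_predict(gbdt, x, side):           # x: flattened side*side image
    img = x.reshape(side, side)
    flipped = img[:, ::-1].reshape(-1)          # horizontal-flip symmetry
    inverted = 1.0 - x                          # feature-inversion symmetry
    preds = [gbdt.predict([v])[0] for v in (x, flipped, inverted)]
    return max(set(preds), key=preds.count)     # majority vote

# `gbdt` is any fitted scikit-learn-style classifier exposing .predict().
```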
AutoGluon-TimeSeries: AutoML for Probabilistic Time Series Forecasting
results: evaluated on 29 benchmark datasets, the library shows strong empirical performance, outperforming a range of forecasting methods in both point and quantile forecast accuracy, and often even improving upon the best-in-hindsight combination of prior methods.
Abstract
We introduce AutoGluon-TimeSeries - an open-source AutoML library for probabilistic time series forecasting. Focused on ease of use and robustness, AutoGluon-TimeSeries enables users to generate accurate point and quantile forecasts with just 3 lines of Python code. Built on the design philosophy of AutoGluon, AutoGluon-TimeSeries leverages ensembles of diverse forecasting models to deliver high accuracy within a short training time. AutoGluon-TimeSeries combines both conventional statistical models, machine-learning based forecasting approaches, and ensembling techniques. In our evaluation on 29 benchmark datasets, AutoGluon-TimeSeries demonstrates strong empirical performance, outperforming a range of forecasting methods in terms of both point and quantile forecast accuracy, and often even improving upon the best-in-hindsight combination of prior methods.
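The advertised three-line workflow looks like the following, shown against the autogluon.timeseries API as of recent releases (check the documentation for your installed version); the example dataset URL is the one used in the library's quickstart.

```python
# AutoGluon-TimeSeries quickstart-style usage: load a long-format table with
# item id, timestamp, and target columns, fit an ensemble, and forecast.
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

train_data = TimeSeriesDataFrame.from_path(
    "https://autogluon.s3.amazonaws.com/datasets/timeseries/m4_hourly/train.csv")
predictor = TimeSeriesPredictor(prediction_length=48).fit(train_data)
predictions = predictor.predict(train_data)   # point + quantile forecasts
```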
Efficient Variational Inference for Large Skew-t Copulas with Application to Intraday Equity Returns
results: the study shows that the skew-t copula better captures asymmetric and extreme tail dependence in financial data, yielding more accurate intraday predictive densities and improved portfolio selection performance relative to the benchmark index.
Abstract
Large skew-t factor copula models are attractive for the modeling of financial data because they allow for asymmetric and extreme tail dependence. We show that the copula implicit in the skew-t distribution of Azzalini and Capitanio (2003) allows for a higher level of pairwise asymmetric dependence than two popular alternative skew-t copulas. Estimation of this copula in high dimensions is challenging, and we propose a fast and accurate Bayesian variational inference (VI) approach to do so. The method uses a conditionally Gaussian generative representation of the skew-t distribution to define an augmented posterior that can be approximated accurately. A fast stochastic gradient ascent algorithm is used to solve the variational optimization. The new methodology is used to estimate copula models for intraday returns from 2017 to 2021 on 93 U.S. equities. The copula captures substantial heterogeneity in asymmetric dependence over equity pairs, in addition to the variability in pairwise correlations. We show that intraday predictive densities from the skew-t copula are more accurate than from some other copula models, while portfolio selection strategies based on the estimated pairwise tail dependencies improve performance relative to the benchmark index.
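For orientation, the generative form of the Azzalini-Capitanio skew-t that makes the augmented posterior conditionally Gaussian is the following (a sketch; location and scale conventions vary across papers):

```latex
% Conditional on (W, |U_0|) the variable Y is Gaussian, which is the
% augmentation the variational approximation exploits.
\[
Y \;=\; \xi + \frac{\omega}{\sqrt{W}}\Big(\delta\,|U_0| + \sqrt{1-\delta^2}\,U_1\Big),
\qquad U_0, U_1 \sim \mathcal{N}(0,1),
\quad W \sim \mathrm{Gamma}\!\Big(\tfrac{\nu}{2},\tfrac{\nu}{2}\Big),
\]
% where $\delta \in (-1,1)$ controls skewness and $\nu$ the tail heaviness.
```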
Critical Points ++: An Agile Point Cloud Importance Measure for Robust Classification, Adversarial Defense and Explainable AI
For: improving model accuracy and speed in real-world, safety-demanding applications.
Methods: studies the interplay between critical points of 3D point clouds and out-of-distribution samples, and generalizes the notion of critical points into importance measures.
Results: training a classification network based only on less important points dramatically improves robustness, at a cost of minor performance loss on the clean set. Normalized entropy is found to be highly informative for corruption analysis, and an adaptive threshold based on normalized entropy is suggested for selecting the set of uncritical points. The proposed importance measure is extremely fast to compute and can be used for a variety of applications, such as Explainable AI (XAI), outlier removal, uncertainty estimation, robust classification, and adversarial defense, reaching SOTA results on the last two tasks. Code is available at: https://github.com/yossilevii100/critical_points2
Abstract
The ability to cope accurately and fast with Out-Of-Distribution (OOD) samples is crucial in real-world safety demanding applications. In this work we first study the interplay between critical points of 3D point clouds and OOD samples. Our findings are that common corruptions and outliers are often interpreted as critical points. We generalize the notion of critical points into importance measures. We show that training a classification network based only on less important points dramatically improves robustness, at a cost of minor performance loss on the clean set. We observe that normalized entropy is highly informative for corruption analysis. An adaptive threshold based on normalized entropy is suggested for selecting the set of uncritical points. Our proposed importance measure is extremely fast to compute. We show it can be used for a variety of applications, such as Explainable AI (XAI), Outlier Removal, Uncertainty Estimation, Robust Classification and Adversarial Defense. We reach SOTA results on the two latter tasks. Code is available at: https://github.com/yossilevii100/critical_points2
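A PointNet-style version of the importance idea can be sketched as follows: a point's importance is how many globally max-pooled feature channels it wins, and the normalized entropy of that importance distribution serves as a cheap corruption statistic. The exact measure in the paper may differ.

```python
# Per-point importance from max-pool "wins", plus its normalized entropy.
import numpy as np

def point_importance(features):            # features: (n_points, n_channels)
    winners = features.argmax(axis=0)      # point index winning each channel
    counts = np.bincount(winners, minlength=features.shape[0])
    return counts / counts.sum()           # importance distribution over points

def normalized_entropy(p):
    p = p[p > 0]
    if len(p) <= 1:
        return 0.0
    return float(-(p * np.log(p)).sum() / np.log(len(p)))

feats = np.random.rand(1024, 256)          # stand-in for per-point embeddings
imp = point_importance(feats)
print(f"normalized entropy: {normalized_entropy(imp):.3f}")
```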
Models Matter: The Impact of Single-Step Retrosynthesis on Synthesis Planning
paper_authors: Paula Torren-Peraire, Alan Kai Hassen, Samuel Genheden, Jonas Verhoeven, Djork-Arne Clevert, Mike Preuss, Igor Tetko
for: providing a reliable approach to chemical synthesis route planning for practical applications.
methods: applies multiple single-step retrosynthesis models within multi-step synthesis planning and analyzes their impact using public and proprietary reaction data.
results: high single-step performance does not necessarily translate into route-finding success; choosing a different single-step model can improve the overall success rate of synthesis planning by up to +28% compared to the commonly used baseline, and each single-step model finds unique synthesis routes.
Abstract
Retrosynthesis consists of breaking down a chemical compound recursively step-by-step into molecular precursors until a set of commercially available molecules is found with the goal to provide a synthesis route. Its two primary research directions, single-step retrosynthesis prediction, which models the chemical reaction logic, and multi-step synthesis planning, which tries to find the correct sequence of reactions, are inherently intertwined. Still, this connection is not reflected in contemporary research. In this work, we combine these two major research directions by applying multiple single-step retrosynthesis models within multi-step synthesis planning and analyzing their impact using public and proprietary reaction data. We find a disconnection between high single-step performance and potential route-finding success, suggesting that single-step models must be evaluated within synthesis planning in the future. Furthermore, we show that the commonly used single-step retrosynthesis benchmark dataset USPTO-50k is insufficient as this evaluation task does not represent model performance and scalability on larger and more diverse datasets. For multi-step synthesis planning, we show that the choice of the single-step model can improve the overall success rate of synthesis planning by up to +28% compared to the commonly used baseline model. Finally, we show that each single-step model finds unique synthesis routes, and differs in aspects such as route-finding success, the number of found synthesis routes, and chemical validity, making the combination of single-step retrosynthesis prediction and multi-step synthesis planning a crucial aspect when developing future methods.
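To see why the single-step model matters, consider a minimal best-first planner in which the single-step model is a pluggable component; `single_step(mol)` is a placeholder returning (precursors, score) proposals, and real planners (e.g., MCTS- or Retro*-style searches) are considerably more sophisticated.

```python
# Best-first multi-step retrosynthesis sketch around a pluggable single-step
# model; the route found (if any) depends entirely on that model's proposals.
import heapq

def plan(target, single_step, purchasable, max_iters=1000):
    queue = [(0.0, 0, [target], [])]   # (cost, tiebreak, open molecules, route)
    tie = 0
    for _ in range(max_iters):
        if not queue:
            return None
        cost, _, mols, route = heapq.heappop(queue)
        mols = [m for m in mols if m not in purchasable]
        if not mols:
            return route               # every leaf is commercially available
        mol, rest = mols[0], mols[1:]
        for precursors, score in single_step(mol):  # score: e.g., log-prob
            tie += 1
            heapq.heappush(queue, (cost - score, tie,
                                   rest + list(precursors),
                                   route + [(mol, precursors)]))
    return None
```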
On the Optimal Expressive Power of ReLU DNNs and Its Application in Approximation with Kolmogorov Superposition Theorem
for: studying the optimal expressive power of ReLU deep neural networks (DNNs) and its application in approximation.
methods: constructively prove that any continuous piecewise linear function on $[0,1]$, comprising $O(N^2L)$ segments, can be represented by ReLU DNNs with $L$ hidden layers and $N$ neurons per layer.
results: achieve an enhanced approximation rate for ReLU DNNs of arbitrary width and depth when dealing with continuous functions in high-dimensional spaces, using the Kolmogorov Superposition Theorem.
Abstract
This paper is devoted to studying the optimal expressive power of ReLU deep neural networks (DNNs) and its application in approximation via the Kolmogorov Superposition Theorem. We first constructively prove that any continuous piecewise linear functions on $[0,1]$, comprising $O(N^2L)$ segments, can be represented by ReLU DNNs with $L$ hidden layers and $N$ neurons per layer. Subsequently, we demonstrate that this construction is optimal regarding the parameter count of the DNNs, achieved through investigating the shattering capacity of ReLU DNNs. Moreover, by invoking the Kolmogorov Superposition Theorem, we achieve an enhanced approximation rate for ReLU DNNs of arbitrary width and depth when dealing with continuous functions in high-dimensional spaces.
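A tiny concrete instance of the representation claim: the hat function min(2x, 2 - 2x) on [0, 1] written exactly as a one-hidden-layer ReLU network with three neurons.

```python
# hat(x) = 2 ReLU(x) - 4 ReLU(x - 1/2) + 2 ReLU(x - 1): a 3-neuron ReLU
# network reproducing a continuous piecewise linear function exactly.
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

def hat_relu(x):
    return 2 * relu(x) - 4 * relu(x - 0.5) + 2 * relu(x - 1.0)

x = np.linspace(0, 1, 11)
print(np.allclose(hat_relu(x), np.minimum(2 * x, 2 - 2 * x)))  # True
```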
Quality Diversity under Sparse Reward and Sparse Interaction: Application to Grasping in Robotics
results: MAP-Elites variants that select successful solutions in priority outperform all the compared methods on the studied metrics by a large margin. The experiments also provide evidence that sparse interaction can lead to deceptive novelty, and the ability to efficiently produce examples of grasping trajectories demonstrated in this work has no precedent in the literature.
Abstract
Quality-Diversity (QD) methods are algorithms that aim to generate a set of diverse and high-performing solutions to a given problem. Originally developed for evolutionary robotics, most QD studies are conducted on a limited set of domains - mainly applied to locomotion, where the fitness and the behavior signal are dense. Grasping is a crucial task for manipulation in robotics. Despite the efforts of many research communities, this task is yet to be solved. Grasping cumulates unprecedented challenges in QD literature: it suffers from reward sparsity, behavioral sparsity, and behavior space misalignment. The present work studies how QD can address grasping. Experiments have been conducted on 15 different methods on 10 grasping domains, corresponding to 2 different robot-gripper setups and 5 standard objects. An evaluation framework that distinguishes the evaluation of an algorithm from its internal components has also been proposed for a fair comparison. The obtained results show that MAP-Elites variants that select successful solutions in priority outperform all the compared methods on the studied metrics by a large margin. We also found experimental evidence that sparse interaction can lead to deceptive novelty. To our knowledge, the ability to efficiently produce examples of grasping trajectories demonstrated in this work has no precedent in the literature.
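For context, the MAP-Elites skeleton that the best-performing variants build on is sketched below; selection, mutation, and the (fitness, behavior) evaluation are problem-specific placeholders, and the successful-solution prioritization of the winning variants is not shown.

```python
# MAP-Elites skeleton: an archive keeps the best solution per behavior niche.
import random

def map_elites(evaluate, random_solution, mutate, niche_key, iters=10_000):
    archive = {}                                   # niche key -> (fitness, solution)
    for i in range(iters):
        if archive and i > 100:
            parent = random.choice(list(archive.values()))[1]
            candidate = mutate(parent)
        else:
            candidate = random_solution()          # bootstrap phase
        fitness, behavior = evaluate(candidate)    # e.g., grasp quality + descriptor
        key = niche_key(behavior)                  # discretize behavior space
        if key not in archive or fitness > archive[key][0]:
            archive[key] = (fitness, candidate)    # keep the best per niche
    return archive
```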
paper_authors: Xuanhe Zhou, Guoliang Li, Zhiyuan Liu
for: providing a revolutionary LLM-centric framework for database maintenance that helps database administrators (DBAs) manage and optimize database systems more efficiently and effectively.
methods: uses large language models (LLMs) to acquire database maintenance experience from textual sources and provide reasonable, well-founded, in-time diagnosis and optimization advice for target databases; the framework includes three main components: (i) database maintenance knowledge detection from documents and tools, (ii) tree of thought reasoning for root cause analysis, and (iii) collaborative diagnosis among multiple LLMs.
results: preliminary experimental results show that D-Bot, the proposed LLM-based database administrator, can efficiently and effectively diagnose the root causes of database issues and provide accurate optimization advice. The code is available at github.com/TsinghuaDatabaseGroup/DB-GPT.
Abstract
Database administrators (DBAs) play a crucial role in managing, maintaining and optimizing a database system to ensure data availability, performance, and reliability. However, it is hard and tedious for DBAs to manage a large number of database instances (e.g., millions of instances on the cloud databases). Recently large language models (LLMs) have shown great potential to understand valuable documents and accordingly generate reasonable answers. Thus, we propose D-Bot, a LLM-based database administrator that can continuously acquire database maintenance experience from textual sources, and provide reasonable, well-founded, in-time diagnosis and optimization advice for target databases. This paper presents a revolutionary LLM-centric framework for database maintenance, including (i) database maintenance knowledge detection from documents and tools, (ii) tree of thought reasoning for root cause analysis, and (iii) collaborative diagnosis among multiple LLMs. Our preliminary experimental results that D-Bot can efficiently and effectively diagnose the root causes and our code is available at github.com/TsinghuaDatabaseGroup/DB-GPT.
Exploring Machine Learning and Transformer-based Approaches for Deceptive Text Classification: A Comparative Analysis
results: 经过广泛的实验,本研究发现了不同方法的性能指标,包括准确率、精度、回归率和F1分数,可以帮助研究者和实践者在面临欺骗内容时做出了 Informed decisions。Abstract
Deceptive text classification is a critical task in natural language processing that aims to identify deceptive o fraudulent content. This study presents a comparative analysis of machine learning and transformer-based approaches for deceptive text classification. We investigate the effectiveness of traditional machine learning algorithms and state-of-the-art transformer models, such as BERT, XLNET, DistilBERT, and RoBERTa, in detecting deceptive text. A labeled dataset consisting of deceptive and non-deceptive texts is used for training and evaluation purposes. Through extensive experimentation, we compare the performance metrics, including accuracy, precision, recall, and F1 score, of the different approaches. The results of this study shed light on the strengths and limitations of machine learning and transformer-based methods for deceptive text classification, enabling researchers and practitioners to make informed decisions when dealing with deceptive content.
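To make the comparison concrete on the classical side, here is a minimal TF-IDF plus logistic regression baseline evaluated with the same four metrics; the toy texts are invented, and the paper's labeled dataset would be substituted in practice.

```python
# Minimal sketch of the classical-ML side of such a comparison: a TF-IDF +
# logistic regression baseline evaluated with accuracy, precision, recall, F1.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

texts = ["free money click now", "meeting at noon", "you won a prize", "see report attached"]
labels = [1, 0, 1, 0]  # 1 = deceptive, 0 = non-deceptive (toy data)

X_tr, X_te, y_tr, y_te = train_test_split(texts, labels, test_size=0.5,
                                          stratify=labels, random_state=0)
vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(X_tr), y_tr)
pred = clf.predict(vec.transform(X_te))

acc = accuracy_score(y_te, pred)
prec, rec, f1, _ = precision_recall_fscore_support(y_te, pred, average="binary", zero_division=0)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```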
Comprehensive Analysis of Network Robustness Evaluation Based on Convolutional Neural Networks with Spatial Pyramid Pooling
results: The results show that the proposed CNN model achieves efficient computation time and good scalability across different network types, component types, and attack scenarios. When the predicted network type differs from the trained one, the model still performs well in the random node failure scenario, but its performance degrades in other attack scenarios. This scenario-sensitivity was overlooked in previous studies.
Abstract
Connectivity robustness, a crucial aspect for understanding, optimizing, and repairing complex networks, has traditionally been evaluated through time-consuming and often impractical simulations. Fortunately, machine learning provides a new avenue for addressing this challenge. However, several key issues remain unresolved, including the performance in more general edge removal scenarios, capturing robustness through attack curves instead of directly training for robustness, scalability of predictive tasks, and transferability of predictive capabilities. In this paper, we address these challenges by designing a convolutional neural network (CNN) model with spatial pyramid pooling networks (SPP-net), adapting existing evaluation metrics, redesigning the attack modes, introducing appropriate filtering rules, and incorporating the value of robustness as training data. The results demonstrate the thoroughness of the proposed CNN framework in addressing the challenges of high computational time across various network types, failure component types and failure scenarios. However, the performance of the proposed CNN model varies: for evaluation tasks that are consistent with the trained network type, the proposed CNN model consistently achieves accurate evaluations of both attack curves and robustness values across all removal scenarios. When the predicted network type differs from the trained network, the CNN model still demonstrates favorable performance in the scenario of random node failure, showcasing its scalability and performance transferability. Nevertheless, the performance falls short of expectations in other removal scenarios. This observed scenario-sensitivity in the evaluation of network features has been overlooked in previous studies and necessitates further attention and optimization. Lastly, we discuss important unresolved questions and directions for further investigation.
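The spatial pyramid pooling mechanism is what lets a single CNN accept networks (adjacency matrices) of different sizes; a minimal PyTorch sketch, with illustrative layer sizes rather than the paper's configuration, looks like this:

```python
# Minimal sketch of a spatial pyramid pooling (SPP) layer in PyTorch: adaptive
# pooling at several pyramid levels yields a fixed-length vector regardless of
# input size. Pyramid levels here are illustrative.
import torch
import torch.nn as nn

class SPPLayer(nn.Module):
    def __init__(self, levels=(1, 2, 4)):
        super().__init__()
        self.pools = nn.ModuleList(nn.AdaptiveMaxPool2d(n) for n in levels)

    def forward(self, x):  # x: (batch, channels, H, W) with arbitrary H, W
        return torch.cat([p(x).flatten(start_dim=1) for p in self.pools], dim=1)

spp = SPPLayer()
for size in (32, 100):  # networks of different sizes yield the same-length vector
    feats = torch.randn(1, 8, size, size)
    print(size, spp(feats).shape)  # always (1, 8 * (1 + 4 + 16)) = (1, 168)
```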
Provably Efficient Algorithm for Nonstationary Low-Rank MDPs
results: The paper provides upper bounds on the average dynamic suboptimality gap, showing that as long as the nonstationarity is not significantly large, PORTAL and Ada-PORTAL can achieve an arbitrarily small average dynamic suboptimality gap with polynomial sample complexity.
Abstract
Reinforcement learning (RL) under changing environments models many real-world applications via nonstationary Markov Decision Processes (MDPs) and hence has gained considerable interest. However, theoretical studies on nonstationary MDPs in the literature have mainly focused on tabular and linear (mixture) MDPs, which do not capture the nature of unknown representation in deep RL. In this paper, we make the first effort to investigate nonstationary RL under episodic low-rank MDPs, where both transition kernels and rewards may vary over time, and where the low-rank model contains an unknown representation in addition to the linear state embedding function. We first propose a parameter-dependent policy optimization algorithm called PORTAL, and further improve PORTAL to its parameter-free version Ada-PORTAL, which is able to tune its hyper-parameters adaptively without any prior knowledge of nonstationarity. For both algorithms, we provide upper bounds on the average dynamic suboptimality gap, which show that as long as the nonstationarity is not significantly large, PORTAL and Ada-PORTAL are sample-efficient and can achieve an arbitrarily small average dynamic suboptimality gap with polynomial sample complexity.
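For reference, the average dynamic suboptimality gap that these bounds control is typically defined as follows in nonstationary RL; the paper's exact notation may differ:

```latex
% Average dynamic suboptimality gap over K episodes: V^{*,k} is the optimal
% value of the episode-k MDP (which may change with k) and \pi_k is the policy
% the algorithm plays in episode k.
\mathrm{Gap}(K) = \frac{1}{K} \sum_{k=1}^{K} \left( V^{*,k} - V^{\pi_k,\,k} \right)
```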
$\mathcal{G}^2Pxy$: Generative Open-Set Node Classification on Graphs with Proxy Unknowns
paper_authors: Qin Zhang, Zelin Shi, Xiaolin Zhang, Xiaojun Chen, Philippe Fournier-Viger, Shirui Pan
for: This paper focuses on open-set node classification, which is the task of predicting the labels of unlabeled nodes in a graph when some classes are unknown during training.
methods: The proposed method, $\mathcal{G}^2Pxy$, uses a generative approach with proxy unknown nodes generated via mixup to anticipate the distribution of novel classes. It transforms a closed-set classifier into an open-set one by adding an extra proxy classifier, and uses a combination of cross entropy loss and complement entropy loss to optimize the performance.
results: The proposed method achieves superior effectiveness for unknown class detection and known class classification on benchmark graph datasets, and does not impose specific requirements on the GNN architecture.
Abstract
Node classification is the task of predicting the labels of unlabeled nodes in a graph. State-of-the-art methods based on graph neural networks achieve excellent performance when all labels are available during training. But in real life, models are often applied to data with new classes, which can lead to massive misclassification and thus significantly degrade performance. Hence, developing open-set classification methods is crucial to determine if a given sample belongs to a known class. Existing methods for open-set node classification generally use transductive learning with part or all of the features of real unseen class nodes to help with open-set classification. In this paper, we propose a novel generative open-set node classification method, i.e., $\mathcal{G}^2Pxy$, which follows a stricter inductive learning setting where no information about unknown classes is available during training and validation. Two kinds of proxy unknown nodes, inter-class unknown proxies and external unknown proxies, are generated via mixup to efficiently anticipate the distribution of novel classes. Using the generated proxies, a closed-set classifier can be transformed into an open-set one, by augmenting it with an extra proxy classifier. Under the constraints of both cross entropy loss and complement entropy loss, $\mathcal{G}^2Pxy$ achieves superior effectiveness for unknown class detection and known class classification, which is validated by experiments on benchmark graph datasets. Moreover, $\mathcal{G}^2Pxy$ does not impose specific requirements on the GNN architecture and generalizes well.
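A minimal sketch of the mixup idea behind the inter-class proxies, with an invented mixing scheme standing in for the paper's exact sampling procedure:

```python
# Minimal sketch of generating proxy "unknown" nodes via mixup of node
# embeddings, in the spirit of the inter-class proxies described above.
import torch

def interclass_proxies(emb: torch.Tensor, labels: torch.Tensor, n: int, alpha: float = 1.0):
    """Mix pairs of embeddings drawn from *different* known classes to mimic novel classes."""
    proxies = []
    beta = torch.distributions.Beta(alpha, alpha)
    while len(proxies) < n:
        i, j = torch.randint(len(emb), (2,))
        if labels[i] != labels[j]:                 # keep only inter-class pairs
            lam = beta.sample()
            proxies.append(lam * emb[i] + (1 - lam) * emb[j])
    return torch.stack(proxies)

emb = torch.randn(100, 16)                # node embeddings from a GNN
labels = torch.randint(0, 5, (100,))      # 5 known classes
unknown = interclass_proxies(emb, labels, n=32)
print(unknown.shape)                      # (32, 16) proxy "unknown" nodes
```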
A Forecaster’s Review of Judea Pearl’s Causality: Models, Reasoning and Inference, Second Edition, 2009
results: The review discusses potential benefits and challenges of causal inference in time series forecasting, and how to estimate causal effects in different forecasting scenarios.
Abstract
Following the great popularity and success of Judea Pearl's original causality book, this review covers the main topics updated in the second edition in 2009 and illustrates an easy-to-follow causal inference strategy in a forecast scenario. It further discusses potential benefits and challenges of causal inference with time series forecasting when modeling counterfactuals, estimating uncertainty, and incorporating prior knowledge to estimate causal effects in different forecasting scenarios.
Explainable AI applications in the Medical Domain: a systematic review
results: The review finds that XAI techniques can improve human understanding of and trust in medical AI, but most studies demonstrate this only in principle and have yet to be applied in practice. Deep learning models are applied more often than other types of machine learning models, and visual and interactive user interfaces prove more useful for understanding explanations.
Abstract
Artificial Intelligence in Medicine has made significant progress, with emerging applications in medical imaging, patient care, and other areas. While these applications have proven successful in retrospective studies, very few of them have been applied in practice. The field of medical AI faces various challenges in building user trust, complying with regulations, and using data ethically. Explainable AI (XAI) aims to enable humans to understand AI and trust its results. This paper presents a literature review on recent developments in XAI solutions for medical decision support, based on a representative sample of 198 articles published in recent years. The systematic synthesis of the relevant articles yielded several findings: (1) model-agnostic XAI techniques were mostly employed in these solutions, (2) deep learning models are utilized more than other types of machine learning models, (3) explainability was applied to promote trust, but very few works reported physicians' participation in the loop, and (4) visual and interactive user interfaces are more useful for understanding the explanation and the recommendation of the system. More research is needed, in collaboration between medical and AI experts, to guide the development of suitable frameworks for the design, implementation, and evaluation of XAI solutions in medicine.
A Comparative Assessment of Multi-view fusion learning for Crop Classification
results: The study finds that, depending on the test region, different fusion strategies achieve the best performance on different datasets; nevertheless, a preliminary criterion for selecting the best fusion strategy is proposed.
Abstract
With a rapidly increasing amount and diversity of remote sensing (RS) data sources, there is a strong need for multi-view learning modeling. This is a complex task when considering the differences in resolution, magnitude, and noise of RS data. The typical approach for merging multiple RS sources has been input-level fusion, but other, more advanced fusion strategies may outperform this traditional approach. This work assesses different fusion strategies for crop classification in the CropHarvest dataset. The fusion methods proposed in this work outperform models based on individual views and previous fusion methods. We do not find one single fusion method that consistently outperforms all other approaches. Instead, we present a comparison of multi-view fusion methods for three different datasets and show that, depending on the test region, different methods obtain the best performance. Despite this, we suggest a preliminary criterion for the selection of fusion methods.
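To illustrate the distinction the paper studies, the sketch below contrasts input-level fusion with a simple feature-level alternative; all dimensions are illustrative, not those of CropHarvest.

```python
# Minimal sketch contrasting input-level and feature-level fusion of two
# remote-sensing views; feature dimensions and class count are illustrative.
import torch
import torch.nn as nn

optical = torch.randn(8, 12)   # e.g., optical time-series features per sample
radar = torch.randn(8, 6)      # e.g., SAR features per sample

# Input-level fusion: concatenate raw views, then a single encoder.
input_fusion = nn.Sequential(nn.Linear(12 + 6, 32), nn.ReLU(), nn.Linear(32, 4))
logits_a = input_fusion(torch.cat([optical, radar], dim=1))

# Feature-level fusion: one encoder per view, then merge the learned features.
enc_opt = nn.Sequential(nn.Linear(12, 16), nn.ReLU())
enc_sar = nn.Sequential(nn.Linear(6, 16), nn.ReLU())
head = nn.Linear(16 + 16, 4)
logits_b = head(torch.cat([enc_opt(optical), enc_sar(radar)], dim=1))

print(logits_a.shape, logits_b.shape)  # both (8, 4) class logits
```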
Addressing Data Scarcity in Optical Matrix Multiplier Modeling Using Transfer Learning
paper_authors: Ali Cem, Ognjen Jovanovic, Siqi Yan, Yunhong Ding, Darko Zibar, Francesco Da Ros
for: Using transfer learning to address experimental data scarcity when training neural network (NN) models for Mach-Zehnder interferometer mesh-based optical matrix multipliers.
methods: Pre-training on synthetic data generated from a less accurate analytical model, followed by fine-tuning with experimental data.
results: Significant reductions in model error compared with using the analytical model or a standalone NN model when training data is limited. Using regularization techniques and ensemble averaging, the approach achieves < 1 dB root-mean-square error on the matrix weights implemented by a photonic chip while using only 25% of the available data.
Abstract
We present and experimentally evaluate using transfer learning to address experimental data scarcity when training neural network (NN) models for Mach-Zehnder interferometer mesh-based optical matrix multipliers. Our approach involves pre-training the model using synthetic data generated from a less accurate analytical model and fine-tuning with experimental data. Our investigation demonstrates that this method yields significant reductions in modeling errors compared to using an analytical model, or a standalone NN model when training data is limited. Utilizing regularization techniques and ensemble averaging, we achieve < 1 dB root-mean-square error on the matrix weights implemented by a photonic chip while using only 25% of the available data.
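A minimal sketch of the pre-train-then-fine-tune recipe described above; the model, shapes, and random tensors are illustrative stand-ins for (device settings, matrix weights) pairs from the analytical model and from chip measurements.

```python
# Minimal sketch of pre-training on plentiful synthetic data and fine-tuning
# on scarce experimental data. Random tensors stand in for real pairs of
# (device settings, measured matrix weights).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 10))
mse = nn.MSELoss()

def fit(model, X, y, lr, epochs):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = mse(model(X), y)
        loss.backward()
        opt.step()
    return loss.item()

# 1) Pre-train on abundant synthetic data from an analytical device model.
X_syn, y_syn = torch.randn(5000, 10), torch.randn(5000, 10)
fit(model, X_syn, y_syn, lr=1e-3, epochs=200)

# 2) Fine-tune on scarce experimental data with a smaller learning rate.
X_exp, y_exp = torch.randn(100, 10), torch.randn(100, 10)
final = fit(model, X_exp, y_exp, lr=1e-4, epochs=50)
print(f"fine-tuned training loss: {final:.4f}")
```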
Product Review Image Ranking for Fashion E-commerce
methods: The paper proposes a simple yet effective training procedure for ranking user images. Myntra studio posts and highly engaged (upvoted/downvoted) UGC images serve as the starting point, and selected distortion techniques bring image quality down to that of bad UGC images; the network is then trained to rank bad-quality images below high-quality ones.
results: The proposed method outperforms baseline models by substantial margins on two metrics, namely correlation coefficient and accuracy.
Abstract
In a fashion e-commerce platform where customers can't physically examine the products on their own, being able to see other customers' text and image reviews of the product is critical while making purchase decisions. Given the high reliance on these reviews, over the years we have observed customers proactively sharing their reviews. With an increase in the coverage of User Generated Content (UGC), there has been a corresponding increase in the number of customer images. It is thus imperative to display the most relevant images on top as it may influence users' online shopping choices and behavior. In this paper, we propose a simple yet effective training procedure for ranking customer images. We created a dataset consisting of Myntra (A Major Indian Fashion e-commerce company) studio posts and highly engaged (upvotes/downvotes) UGC images as our starting point and used selected distortion techniques on the images of the above dataset to bring their quality at par with those of bad UGC images. We train our network to rank bad-quality images lower than high-quality ones. Our proposed method outperforms the baseline models on two metrics, namely correlation coefficient, and accuracy, by substantial margins.
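A minimal sketch of the pairwise ranking objective implied by "rank bad-quality images lower than high-quality ones", using a margin ranking loss; the scorer and features are stand-ins for an image CNN.

```python
# Minimal sketch of pairwise rank training: score images with a small network
# and push high-quality images above their degraded counterparts by a margin.
import torch
import torch.nn as nn

scorer = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.MarginRankingLoss(margin=1.0)
opt = torch.optim.Adam(scorer.parameters(), lr=1e-3)

good = torch.randn(32, 128)       # features of original (high-quality) images
bad = torch.randn(32, 128)        # features of their distorted counterparts
target = torch.ones(32, 1)        # +1 means: score(good) should exceed score(bad)

for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(scorer(good), scorer(bad), target)
    loss.backward()
    opt.step()
print(f"ranking loss: {loss.item():.4f}")
```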
Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models’ Alignment
results: The measurement results indicate that, in terms of overall trustworthiness, more aligned models tend to perform better. However, the effectiveness of alignment varies across trustworthiness categories, highlighting the need for fine-grained analysis, testing, and continuous improvement. These findings provide valuable insights and guidance for the reliable and ethical deployment of large language models across applications.
Abstract
Ensuring alignment, which refers to making models behave in accordance with human intentions [1,2], has become a critical task before deploying large language models (LLMs) in real-world applications. For instance, OpenAI devoted six months to iteratively aligning GPT-4 before its release [3]. However, a major challenge faced by practitioners is the lack of clear guidance on evaluating whether LLM outputs align with social norms, values, and regulations. This obstacle hinders systematic iteration and deployment of LLMs. To address this issue, this paper presents a comprehensive survey of key dimensions that are crucial to consider when assessing LLM trustworthiness. The survey covers seven major categories of LLM trustworthiness: reliability, safety, fairness, resistance to misuse, explainability and reasoning, adherence to social norms, and robustness. Each major category is further divided into several sub-categories, resulting in a total of 29 sub-categories. Additionally, a subset of 8 sub-categories is selected for further investigation, where corresponding measurement studies are designed and conducted on several widely-used LLMs. The measurement results indicate that, in general, more aligned models tend to perform better in terms of overall trustworthiness. However, the effectiveness of alignment varies across the different trustworthiness categories considered. This highlights the importance of conducting more fine-grained analyses, testing, and making continuous improvements on LLM alignment. By shedding light on these key dimensions of LLM trustworthiness, this paper aims to provide valuable insights and guidance to practitioners in the field. Understanding and addressing these concerns will be crucial in achieving reliable and ethically sound deployment of LLMs in various applications.
Flexible Isosurface Extraction for Gradient-Based Mesh Optimization
results: Experiments show that FlexiCubes delivers significant improvements in mesh quality and geometric fidelity on both synthetic benchmarks and real-world applications.
Abstract
This work considers gradient-based mesh optimization, where we iteratively optimize for a 3D surface mesh by representing it as the isosurface of a scalar field, an increasingly common paradigm in applications including photogrammetry, generative modeling, and inverse physics. Existing implementations adapt classic isosurface extraction algorithms like Marching Cubes or Dual Contouring; these techniques were designed to extract meshes from fixed, known fields, and in the optimization setting they lack the degrees of freedom to represent high-quality feature-preserving meshes, or suffer from numerical instabilities. We introduce FlexiCubes, an isosurface representation specifically designed for optimizing an unknown mesh with respect to geometric, visual, or even physical objectives. Our main insight is to introduce additional carefully-chosen parameters into the representation, which allow local flexible adjustments to the extracted mesh geometry and connectivity. These parameters are updated along with the underlying scalar field via automatic differentiation when optimizing for a downstream task. We base our extraction scheme on Dual Marching Cubes for improved topological properties, and present extensions to optionally generate tetrahedral and hierarchically-adaptive meshes. Extensive experiments validate FlexiCubes on both synthetic benchmarks and real-world applications, showing that it offers significant improvements in mesh quality and geometric fidelity.
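The principle that makes gradient-based mesh optimization possible, gradients flowing from extracted surface points back into the scalar field, can be shown in one dimension, where the zero crossing between two grid samples is a differentiable function of the field values. This toy example illustrates that principle only; FlexiCubes itself adds carefully chosen per-cell parameters and operates on 3D grids.

```python
# Toy 1D analogue of gradient-based isosurface optimization: the zero crossing
# x = x0 + s0 / (s0 - s1) * (x1 - x0) is differentiable w.r.t. the field
# values, so the field can be optimized until the crossing hits a target.
import torch

x0, x1 = 0.0, 1.0
s = torch.tensor([1.0, -1.0], requires_grad=True)  # field samples at x0, x1
target = torch.tensor(0.3)                          # desired crossing position
opt = torch.optim.Adam([s], lr=0.05)

for step in range(200):
    opt.zero_grad()
    crossing = x0 + s[0] / (s[0] - s[1]) * (x1 - x0)  # linear-interpolated zero
    loss = (crossing - target) ** 2
    loss.backward()
    opt.step()

print(f"crossing = {crossing.item():.3f} (target 0.3)")
```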
Machine Learning aided Computer Architecture Design for CNN Inferencing Systems
for: This paper aims to expedite the Design Space Exploration (DSE) process for selecting the most suitable Graphics Processing Unit (GPU) for Convolutional Neural Network (CNN) inferencing systems.
methods: The authors propose a quick and precise technique for forecasting the power and performance of CNNs during inference, using a Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm and a novel power model.
results: The proposed approach achieves a Mean Absolute Percentage Error (MAPE) of 5.03% and 5.94% for power and performance, respectively, allowing computer architects to estimate power and performance in the early stages of development and reducing the need for numerous prototypes.
Abstract
Efficient and timely calculations of Machine Learning (ML) algorithms are essential for emerging technologies like autonomous driving, the Internet of Things (IoT), and edge computing. One of the primary ML algorithms used in such systems is Convolutional Neural Networks (CNNs), which demand high computational resources. This requirement has led to the use of ML accelerators like GPGPUs to meet design constraints. However, selecting the most suitable accelerator involves Design Space Exploration (DSE), a process that is usually time-consuming and requires significant manual effort. Our work presents approaches to expedite the DSE process by identifying the most appropriate GPGPU for CNN inferencing systems. We have developed a quick and precise technique for forecasting the power and performance of CNNs during inference, with a MAPE of 5.03% and 5.94%, respectively. Our approach empowers computer architects to estimate power and performance in the early stages of development, reducing the necessity for numerous prototypes. This saves time and money while also shortening the time to market.
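A minimal sketch of the evaluation side of such a predictor: fit a regressor from architecture/GPU descriptors to measured power and report MAPE; the features and targets are synthetic stand-ins for real profiling data.

```python
# Minimal sketch of a power predictor evaluated with MAPE, the paper's metric.
# Synthetic features/targets stand in for real CNN and GPU profiling data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((500, 6))                 # e.g., layer counts, FLOPs, GPU clocks
power = 50 + 200 * X[:, 1] + 30 * X[:, 4] + rng.normal(0, 5, 500)  # watts

X_tr, X_te, y_tr, y_te = train_test_split(X, power, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)

mape = np.mean(np.abs((y_te - pred) / y_te)) * 100
print(f"MAPE: {mape:.2f}%")
```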
FINER: Enhancing State-of-the-art Classifiers with Feature Attribution to Facilitate Security Analysis
results: Extensive evaluation of explanation quality for risk detectors shows that FINER improves explanation quality and outperforms a state-of-the-art tool in facilitating malware analysis.
Abstract
Deep learning classifiers achieve state-of-the-art performance in various risk detection applications. They explore rich semantic representations and are supposed to automatically discover risk behaviors. However, due to the lack of transparency, the behavioral semantics cannot be conveyed to downstream security experts to reduce their heavy workload in security analysis. Although feature attribution (FA) methods can be used to explain deep learning, the underlying classifier is still blind to what behavior is suspicious, and the generated explanation cannot adapt to downstream tasks, incurring poor explanation fidelity and intelligibility. In this paper, we propose FINER, the first framework for risk detection classifiers to generate high-fidelity and high-intelligibility explanations. The high-level idea is to gather explanation efforts from model developer, FA designer, and security experts. To improve fidelity, we fine-tune the classifier with an explanation-guided multi-task learning strategy. To improve intelligibility, we engage task knowledge to adjust and ensemble FA methods. Extensive evaluations show that FINER improves explanation quality for risk detection. Moreover, we demonstrate that FINER outperforms a state-of-the-art tool in facilitating malware analysis.
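To make the explanation-guided multi-task idea concrete, the sketch below augments a classification loss with an input-gradient attribution penalty; this illustrates fidelity-aware fine-tuning generically, not FINER's actual objective, and the "irrelevant feature" mask is invented.

```python
# Minimal sketch of explanation-guided multi-task training: alongside the
# classification loss, penalize input-gradient attributions that fall on
# features assumed irrelevant. Illustrative, not FINER's actual loss.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
ce = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(64, 20)
y = torch.randint(0, 2, (64,))
irrelevant = torch.zeros(20)
irrelevant[10:] = 1.0  # assume the last 10 features are known noise

for _ in range(100):
    opt.zero_grad()
    x_in = x.clone().requires_grad_(True)
    logits = model(x_in)
    true_class_scores = logits[torch.arange(len(y)), y].sum()
    # Input-gradient attributions, kept differentiable so the penalty trains.
    attributions = torch.autograd.grad(true_class_scores, x_in, create_graph=True)[0]
    loss = ce(logits, y) + 0.1 * (attributions.abs() * irrelevant).mean()
    loss.backward()
    opt.step()
print(f"loss: {loss.item():.4f}")
```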
Preemptive Detection of Fake Accounts on Social Networks via Multi-Class Preferential Attachment Classifiers
results: Experiments show that PreAttacK can accurately determine whether a new account is fake after it sends and receives just 20 not-yet-answered friend requests, outperforming mainstream algorithms that must observe fake accounts' friendships or shared content. On the global Facebook network it achieves state-of-the-art performance, converging to AUC=0.9 under these conditions.
Abstract
In this paper, we describe a new algorithm called Preferential Attachment k-class Classifier (PreAttacK) for detecting fake accounts in a social network. Recently, several algorithms have obtained high accuracy on this problem. However, they have done so by relying on information about fake accounts' friendships or the content they share with others--the very things we seek to prevent. PreAttacK represents a significant departure from these approaches. We provide some of the first detailed distributional analyses of how new fake (and real) accounts first attempt to request friends after joining a major network (Facebook). We show that even before a new account has made friends or shared content, these initial friend request behaviors evoke a natural multi-class extension of the canonical Preferential Attachment model of social network growth. We use this model to derive a new algorithm, PreAttacK. We prove that in relevant problem instances, PreAttacK near-optimally approximates the posterior probability that a new account is fake under this multi-class Preferential Attachment model of new accounts' (not-yet-answered) friend requests. These are the first provable guarantees for fake account detection that apply to new users, and that do not require strong homophily assumptions. This principled approach also makes PreAttacK the only algorithm with provable guarantees that obtains state-of-the-art performance on new users on the global Facebook network, where it converges to AUC=0.9 after new users send + receive a total of just 20 not-yet-answered friend requests. For comparison, state-of-the-art benchmarks do not obtain this AUC even after observing additional data on new users' first 100 friend requests. Thus, unlike mainstream algorithms, PreAttacK converges before the median new fake account has made a single friendship (accepted friend request) with a human.
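The flavor of inference over not-yet-answered friend requests can be illustrated with a toy Bayes posterior over {real, fake} given the degree buckets of request targets; the class-conditional distributions here are invented, and PreAttacK's multi-class preferential attachment likelihood is considerably more involved.

```python
# Toy Bayes posterior over {real, fake} from the degree buckets of friend-
# request targets. Distributions and prior are invented for illustration.
import numpy as np

# P(target degree bucket | class): in this toy, fake accounts skew toward
# high-degree (popular) targets.
p_real = {"low": 0.5, "mid": 0.35, "high": 0.15}
p_fake = {"low": 0.15, "mid": 0.25, "high": 0.6}
prior_fake = 0.05

def posterior_fake(request_targets: list[str]) -> float:
    log_real = np.log(1 - prior_fake) + sum(np.log(p_real[d]) for d in request_targets)
    log_fake = np.log(prior_fake) + sum(np.log(p_fake[d]) for d in request_targets)
    m = max(log_real, log_fake)  # log-sum-exp trick for numerical stability
    return np.exp(log_fake - m) / (np.exp(log_real - m) + np.exp(log_fake - m))

requests = ["high", "high", "mid", "high", "high"]  # first 5 unanswered requests
print(f"P(fake | requests) = {posterior_fake(requests):.3f}")
```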
RTLLM: An Open-Source Benchmark for Design RTL Generation with Large Language Model
results: The results show that the performance of GPT-3.5 on RTLLM can be significantly improved with the proposed self-planning technique. The paper also defines three progressive goals, namely the syntax goal, the functionality goal, and the design quality goal, to systematically evaluate the quality of the generated design RTL.
Abstract
Inspired by the recent success of large language models (LLMs) like ChatGPT, researchers have started to explore the adoption of LLMs for agile hardware design, such as generating design RTL based on natural-language instructions. However, in existing works, the target designs are all relatively simple, at a small scale, and proposed by the authors themselves, making a fair comparison among different LLM solutions challenging. In addition, many prior works only focus on design correctness, without evaluating the design quality of the generated RTL. In this work, we propose an open-source benchmark named RTLLM for generating design RTL with natural language instructions. To systematically evaluate the auto-generated design RTL, we summarize three progressive goals, named syntax goal, functionality goal, and design quality goal. This benchmark can automatically provide a quantitative evaluation of any given LLM-based solution. Furthermore, we propose an easy-to-use yet surprisingly effective prompt engineering technique named self-planning, which proves to significantly boost the performance of GPT-3.5 on our proposed benchmark.
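A minimal sketch of what a two-stage self-planning prompt could look like for RTL generation; the wording is illustrative rather than the paper's exact prompts, and `ask_llm` is a hypothetical stand-in for an API call.

```python
# Minimal sketch of a two-stage "self-planning" prompt: ask the model to plan
# before writing RTL. Prompt wording is illustrative; `ask_llm` is a
# hypothetical stand-in for an LLM API call.
def ask_llm(prompt: str) -> str:
    # Placeholder so the script runs end to end; a real call goes here.
    return f"<model response to: {prompt[:40]}...>"

spec = "Design a 4-bit synchronous up-counter with active-low reset."

# Stage 1: have the model produce its own plan from the natural-language spec.
plan = ask_llm(
    "You are a hardware engineer. Read the specification and list the module "
    "interface (ports, widths) and a step-by-step implementation plan. "
    f"Do not write code yet.\n\nSpecification: {spec}"
)

# Stage 2: condition RTL generation on the model's own plan.
rtl = ask_llm(
    f"Specification: {spec}\n\nYour plan:\n{plan}\n\n"
    "Now write complete, synthesizable Verilog following the plan exactly."
)
print(rtl)
```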
OpenProteinSet: Training data for structural biology at scale
paper_authors: Gustaf Ahdritz, Nazim Bouatta, Sachin Kadyan, Lukas Jarosch, Daniel Berenberg, Ian Fisk, Andrew M. Watkins, Stephen Ra, Richard Bonneau, Mohammed AlQuraishi
methods: The study uses transformers that attend directly over large quantities of raw MSAs, together with structural homologs from the Protein Data Bank and protein structure predictions from AlphaFold2.
results: The study introduces OpenProteinSet, an open-source corpus of more than 16 million MSAs, associated structural homologs, and AlphaFold2 protein structure predictions, which can serve as training and validation data for diverse tasks in protein structure, function, and design, as well as for large-scale multimodal machine learning research.
Abstract
Multiple sequence alignments (MSAs) of proteins encode rich biological information and have been workhorses in bioinformatic methods for tasks like protein design and protein structure prediction for decades. Recent breakthroughs like AlphaFold2 that use transformers to attend directly over large quantities of raw MSAs have reaffirmed their importance. Generation of MSAs is highly computationally intensive, however, and no datasets comparable to those used to train AlphaFold2 have been made available to the research community, hindering progress in machine learning for proteins. To remedy this problem, we introduce OpenProteinSet, an open-source corpus of more than 16 million MSAs, associated structural homologs from the Protein Data Bank, and AlphaFold2 protein structure predictions. We have previously demonstrated the utility of OpenProteinSet by successfully retraining AlphaFold2 on it. We expect OpenProteinSet to be broadly useful as training and validation data for 1) diverse tasks focused on protein structure, function, and design and 2) large-scale multimodal machine learning research.
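As a practical note on working with such corpora, MSAs are commonly shipped in A3M format, where lowercase letters mark insertions relative to the query; a minimal parser, with a made-up example alignment, looks like this:

```python
# Minimal sketch of reading an A3M-format MSA: lowercase letters mark
# insertions relative to the query, so stripping them yields equal-length
# aligned rows. The example alignment is made up.
a3m_text = """>query
MKTAYIAK
>hit1
MKTaAYIG-
>hit2
M-TAYIAK
"""

def parse_a3m(text: str) -> dict[str, str]:
    msa, name = {}, None
    for line in text.strip().splitlines():
        if line.startswith(">"):
            name = line[1:].split()[0]
            msa[name] = ""
        else:
            # Drop lowercase insertion states to recover the aligned columns.
            msa[name] += "".join(c for c in line if not c.islower())
    return msa

msa = parse_a3m(a3m_text)
for name, seq in msa.items():
    print(f"{name:>6}: {seq}")  # every row has the query's length (8)
```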
Homophily-enhanced Structure Learning for Graph Clustering
paper_authors: Ming Gu, Gaoming Yang, Sheng Zhou, Ning Ma, Jiawei Chen, Qiaoyu Tan, Meihan Liu, Jiajun Bu
for: The paper is written for graph clustering tasks, specifically addressing the issue of subpar performance due to the lack of consideration of graph structure quality in existing GNN-based methods.
methods: The paper proposes a novel method called HoLe, which enhances the degree of homophily within the graph structure to improve GNNs and clustering outcomes. This is achieved through two clustering-oriented structure learning modules: hierarchical correlation estimation and cluster-aware sparsification.
results: The paper reports superior performance of HoLe against state-of-the-art baselines on seven benchmark datasets of various types and scales, across a range of clustering metrics.
Abstract
Graph clustering is a fundamental task in graph analysis, and recent advances in utilizing graph neural networks (GNNs) have shown impressive results. Despite the success of existing GNN-based graph clustering methods, they often overlook the quality of graph structure, which is inherent in real-world graphs due to their sparse and multifarious nature, leading to subpar performance. Graph structure learning allows refining the input graph by adding missing links and removing spurious connections. However, previous endeavors in graph structure learning have predominantly centered around supervised settings, and cannot be directly applied to our specific clustering tasks due to the absence of ground-truth labels. To bridge the gap, we propose a novel method called homophily-enhanced structure learning for graph clustering (HoLe). Our motivation stems from the observation that subtly enhancing the degree of homophily within the graph structure can significantly improve GNNs and clustering outcomes. To realize this objective, we develop two clustering-oriented structure learning modules, i.e., hierarchical correlation estimation and cluster-aware sparsification. The former module enables a more accurate estimation of pairwise node relationships by leveraging guidance from latent and clustering spaces, while the latter one generates a sparsified structure based on the similarity matrix and clustering assignments. Additionally, we devise a joint optimization approach alternating between training the homophily-enhanced structure learning and GNN-based clustering, thereby enforcing their reciprocal effects. Extensive experiments on seven benchmark datasets of various types and scales, across a range of clustering metrics, demonstrate the superiority of HoLe against state-of-the-art baselines.
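The quantity HoLe manipulates can be made concrete with the edge homophily ratio, the fraction of edges whose endpoints share a label:

```python
# Minimal sketch of the edge homophily ratio: the fraction of edges whose
# endpoints carry the same label. Toy graph for illustration.
def edge_homophily(edges: list[tuple[int, int]], labels: list[int]) -> float:
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)

labels = [0, 0, 1, 1, 2]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
print(f"homophily = {edge_homophily(edges, labels):.2f}")  # 2/5 = 0.40
```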
From CNN to Transformer: A Review of Medical Image Segmentation Models
results: Experimental results show that these models achieve excellent performance on the benchmark datasets.
Abstract
Medical image segmentation is an important step in medical image analysis, especially as a crucial prerequisite for efficient disease diagnosis and treatment. The use of deep learning for image segmentation has become a prevalent trend. The widely adopted approach currently is U-Net and its variants. Additionally, with the remarkable success of pre-trained models in natural language processing tasks, transformer-based models like TransUNet have achieved desirable performance on multiple medical image segmentation datasets. In this paper, we conduct a survey of the most representative four medical image segmentation models in recent years. We theoretically analyze the characteristics of these models and quantitatively evaluate their performance on two benchmark datasets (i.e., Tuberculosis Chest X-rays and ovarian tumors). Finally, we discuss the main challenges and future trends in medical image segmentation. Our work can assist researchers in the related field to quickly establish medical segmentation models tailored to specific regions.
Byzantine-Robust Decentralized Stochastic Optimization with Stochastic Gradient Noise-Independent Learning Error
for: This paper studies stochastic optimization over a decentralized network that is robust to Byzantine attacks.
methods: Each agent exchanges local models with its neighbors and updates its own model by stochastic gradient descent (SGD). Two variance reduction methods are introduced, the stochastic average gradient algorithm (SAGA) and the loopless stochastic variance-reduced gradient (LSVRG), to reduce the impact of stochastic gradient noise.
results: The resulting methods achieve linear convergence speeds and stochastic gradient noise-independent learning errors, which are optimal for a class of methods based on total variation (TV)-norm regularization and stochastic subgradient updates. Experiments demonstrate their effectiveness under various Byzantine attacks.
Abstract
This paper studies Byzantine-robust stochastic optimization over a decentralized network, where every agent periodically communicates with its neighbors to exchange local models, and then updates its own local model by stochastic gradient descent (SGD). The performance of such a method is affected by an unknown number of Byzantine agents, which conduct adversarially during the optimization process. To the best of our knowledge, there is no existing work that simultaneously achieves a linear convergence speed and a small learning error. We observe that the learning error is largely dependent on the intrinsic stochastic gradient noise. Motivated by this observation, we introduce two variance reduction methods, stochastic average gradient algorithm (SAGA) and loopless stochastic variance-reduced gradient (LSVRG), to Byzantine-robust decentralized stochastic optimization for eliminating the negative effect of the stochastic gradient noise. The two resulting methods, BRAVO-SAGA and BRAVO-LSVRG, enjoy both linear convergence speeds and stochastic gradient noise-independent learning errors. Such learning errors are optimal for a class of methods based on total variation (TV)-norm regularization and stochastic subgradient update. We conduct extensive numerical experiments to demonstrate their effectiveness under various Byzantine attacks.
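A minimal sketch of the SAGA building block used here: keep a table of the last gradient seen per sample and correct each stochastic gradient with it; the least-squares problem is illustrative.

```python
# Minimal sketch of the SAGA variance-reduced step: correct each stochastic
# gradient with the stored gradient table and its running mean. The
# least-squares objective is an illustrative stand-in.
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.normal(size=(200, 10)), rng.normal(size=200)
x = np.zeros(10)

grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i]        # per-sample gradient
table = np.array([grad_i(x, i) for i in range(200)])  # stored gradients
table_mean = table.mean(axis=0)
lr = 0.01

for step in range(5000):
    i = rng.integers(200)
    g = grad_i(x, i)
    x -= lr * (g - table[i] + table_mean)              # SAGA update
    table_mean += (g - table[i]) / 200                 # keep the mean in sync
    table[i] = g

print("relative residual:", np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```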
Investigating disaster response through social media data and the Susceptible-Infected-Recovered (SIR) model: A case study of 2020 Western U.S. wildfire season
methods: The study applies BERT topic modeling to cluster topics from Twitter data and conducts a temporal-spatial analysis of topic distributions across regions during the 2020 western U.S. wildfire season. Twitter users mainly focused on three topics: "health impact," "damage," and "evacuation."
results: Using the Susceptible-Infected-Recovered (SIR) theory to explore the magnitude and velocity of topic diffusion, the study finds a clear relationship between topic trends and wildfire propagation patterns. Parameters estimated from the SIR model in selected cities show that residents exhibited a high level of concern during the wildfires.
Abstract
Effective disaster response is critical for affected communities. Responders and decision-makers would benefit from reliable, timely measures of the issues impacting their communities during a disaster, and social media offers a potentially rich data source. Social media can reflect public concerns and demands during a disaster, offering valuable insights for decision-makers to understand evolving situations and optimize resource allocation. We used Bidirectional Encoder Representations from Transformers (BERT) topic modeling to cluster topics from Twitter data. Then, we conducted a temporal-spatial analysis to examine the distribution of these topics across different regions during the 2020 western U.S. wildfire season. Our results show that Twitter users mainly focused on three topics:"health impact," "damage," and "evacuation." We used the Susceptible-Infected-Recovered (SIR) theory to explore the magnitude and velocity of topic diffusion on Twitter. The results displayed a clear relationship between topic trends and wildfire propagation patterns. The estimated parameters obtained from the SIR model in selected cities revealed that residents exhibited a high level of several concerns during the wildfire. Our study details how the SIR model and topic modeling using social media data can provide decision-makers with a quantitative approach to measure disaster response and support their decision-making processes.
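A minimal sketch of the SIR dynamics repurposed for topic diffusion: S(t) are users yet to engage with a topic, I(t) active spreaders, and R(t) users who have stopped posting; the beta and gamma values are illustrative, not the paper's fitted parameters.

```python
# Minimal sketch of the SIR dynamics used to model topic diffusion on Twitter.
# beta (transmission) and gamma (recovery) are illustrative placeholders.
import numpy as np
from scipy.integrate import odeint

def sir(y, t, beta, gamma, N):
    S, I, R = y
    dS = -beta * S * I / N
    dI = beta * S * I / N - gamma * I
    dR = gamma * I
    return [dS, dI, dR]

N = 10_000                          # potentially engaged Twitter users
y0 = [N - 10, 10, 0]                # 10 initial spreaders
t = np.linspace(0, 60, 61)          # days
S, I, R = odeint(sir, y0, t, args=(0.4, 0.1, N)).T

peak_day = int(t[I.argmax()])
print(f"topic engagement peaks on day {peak_day} with {I.max():.0f} active spreaders")
```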
Cross-heterogeneity Graph Few-shot Learning
paper_authors: Pengfei Ding, Yan Wang, Guanfeng Liu
for: This paper targets cross-heterogeneity graph few-shot learning, i.e., predicting new classes with few labeled data in heterogeneous graphs (HGs) that contain various types of nodes and edges.
methods: The paper proposes CGFL, a model for cross-heterogeneity graph few-shot learning. CGFL first extracts meta-patterns to capture heterogeneous information and learns them with a multi-view heterogeneous graph neural network (MHGN). It then uses a score module to measure the informativeness of labeled samples and determine the transferability of each source HG.
results: Extensive experiments on four real-world datasets show that CGFL outperforms previous state-of-the-art methods.
Abstract
In recent years, heterogeneous graph few-shot learning has been proposed to address the label sparsity issue in heterogeneous graphs (HGs), which contain various types of nodes and edges. The existing methods have achieved good performance by transferring generalized knowledge extracted from rich-labeled classes in source HG(s) to few-labeled classes in a target HG. However, these methods only consider the single-heterogeneity scenario where the source and target HGs share a fixed set of node/edge types, ignoring the more general scenario of cross-heterogeneity, where each HG can have a different and non-fixed set of node/edge types. To this end, we focus on the unexplored cross-heterogeneity scenario and propose a novel model for Cross-heterogeneity Graph Few-shot Learning, namely CGFL. In CGFL, we first extract meta-patterns to capture heterogeneous information and propose a multi-view heterogeneous graph neural network (MHGN) to learn meta-patterns across HGs. Then, we propose a score module to measure the informativeness of labeled samples and determine the transferability of each source HG. Finally, by integrating MHGN and the score module into a meta-learning mechanism, CGFL can effectively transfer generalized knowledge to predict new classes with few-labeled data. Extensive experiments on four real-world datasets have demonstrated the superior performance of CGFL over the state-of-the-art methods.
Data-driven Intra-Autonomous Systems Graph Generator
methods: The paper proposes a generator named Deep-generative graphs for the Internet (DGGI) and a massive dataset of real intra-AS graphs named Internet Graphs (IGraphs). To create IGraphs, the Filtered Recurrent Multi-level (FRM) algorithm for community extraction was developed.
results: Experiments show that the synthetic graphs generated by DGGI accurately reproduce the properties of centrality, clustering, assortativity, and node degree of intra-AS networks. Compared with existing generators, DGGI improves the Maximum Mean Discrepancy (MMD) metric by 84.4%, 95.1%, 97.9%, and 94.7% for assortativity, betweenness, clustering, and node degree, respectively.
Abstract
This paper introduces a novel deep-learning-based generator of synthetic graphs that represent intra-Autonomous System (AS) topologies in the Internet, named Deep-generative graphs for the Internet (DGGI). It also presents a novel massive dataset of real intra-AS graphs extracted from the project Internet Topology Data Kit (ITDK), called Internet Graphs (IGraphs). To create IGraphs, the Filtered Recurrent Multi-level (FRM) algorithm for community extraction was developed. It is shown that DGGI creates synthetic graphs which accurately reproduce the properties of centrality, clustering, assortativity, and node degree. The DGGI generator outperforms existing Internet topology generators. On average, DGGI improves the Maximum Mean Discrepancy (MMD) metric by 84.4%, 95.1%, 97.9%, and 94.7% for assortativity, betweenness, clustering, and node degree, respectively.
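The evaluation metric can be made concrete with a squared-MMD estimate under a Gaussian kernel between two samples of graph statistics (e.g., node degrees); the bandwidth and toy samples are illustrative.

```python
# Minimal sketch of squared MMD with a Gaussian kernel between two samples of
# graph statistics (here, node degrees). Bandwidth and data are illustrative.
import numpy as np

def mmd2(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    k = lambda a, b: np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma**2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(0)
real_degrees = rng.pareto(2.0, 500)        # heavy-tailed, like real AS graphs
synth_good = rng.pareto(2.0, 500)          # generator matching the tail
synth_bad = rng.normal(5.0, 1.0, 500)      # generator missing the tail

print(f"MMD^2(real, good synthetic) = {mmd2(real_degrees, synth_good):.4f}")
print(f"MMD^2(real, bad synthetic)  = {mmd2(real_degrees, synth_bad):.4f}")
```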
AI-Enabled Software and System Architecture Frameworks: Focusing on smart Cyber-Physical Systems (CPS)
paper_authors: Armin Moin, Atta Badii, Stephan Günnemann, Moharram Challenger
for: The paper aims to address the gap in existing architecture frameworks for software, systems, and enterprises by including the concerns of data science and Machine Learning (ML) stakeholders, such as data scientists and data engineers.
methods: The paper proposes two sets of merit criteria for the efficient development and performance assessment of ML-enabled Cyber-Physical Systems (CPSs), as well as criteria for evaluating and benchmarking the tools used in the modeling and development pipeline. The authors use multiple empirical and qualitative research methods, including literature review, survey instruments, and expert interviews, to devise and validate the proposed framework.
results: The paper provides a framework adapted to meet the requirements of modern applications and organizations where ML artifacts are prevalent and crucial, and proposes criteria for evaluating and benchmarking ML-enabled CPSs and the tools used in their development. The authors collect and analyze opinions from 77 experts from over 25 organizations in more than 10 countries to validate the proposed framework.
Abstract
Several architecture frameworks for software, systems, and enterprises have been proposed in the literature. They identified various stakeholders and defined architecture viewpoints and views to frame and address stakeholder concerns. However, stakeholders with data science and Machine Learning (ML) related concerns, such as data scientists and data engineers, are yet to be included in existing architecture frameworks, which therefore fail to offer architecture viewpoints and views responsive to the concerns of the data science community. In this paper, we address this gap by establishing architecture frameworks adapted to meet the requirements of modern applications and organizations where ML artifacts are both prevalent and crucial. In particular, we focus on ML-enabled Cyber-Physical Systems (CPSs) and propose two sets of merit criteria for their efficient development and performance assessment, namely criteria for evaluating and benchmarking ML-enabled CPSs, and criteria for evaluating and benchmarking the tools intended to support users through the modeling and development pipeline. In this study, we deploy multiple empirical and qualitative research methods based on literature review and survey instruments, including expert interviews and an online questionnaire. We collect, analyze, and integrate the opinions of 77 experts from more than 25 organizations in over 10 countries to devise and validate the proposed framework.
Financial Fraud Detection: A Comparative Study of Quantum Machine Learning Models
results: The study finds that the Quantum Support Vector Classifier achieves the highest performance for this financial application, with an F1 score of 0.98. The other models also show potential, though with certain limitations. The study offers solutions to current limitations and new perspectives for the future development of quantum machine learning in fraud detection.
Abstract
In this research, a comparative study of four Quantum Machine Learning (QML) models was conducted for fraud detection in finance. We found that the Quantum Support Vector Classifier model achieved the highest performance, with F1 scores of 0.98 for the fraud and non-fraud classes. Other models, like the Variational Quantum Classifier, Estimator Quantum Neural Network (QNN), and Sampler QNN, demonstrate promising results, underscoring the potential of QML classification for financial applications. While they exhibit certain limitations, the insights attained pave the way for future enhancements and optimisation strategies. However, challenges exist, including the need for more efficient quantum algorithms and for larger and more complex datasets. The article provides solutions to overcome current limitations and contributes new insights to the field of Quantum Machine Learning in fraud detection, with important implications for its future development.
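As a sketch of the quantum-kernel approach, the snippet below assumes the qiskit-machine-learning package and its ZZFeatureMap / FidelityQuantumKernel / QSVC APIs (which vary across versions, so treat this as illustrative rather than the paper's exact setup); the toy two-feature data is invented.

```python
# Minimal sketch of a quantum-kernel SVC, assuming qiskit-machine-learning's
# API (version-dependent; illustrative, not the paper's configuration).
import numpy as np
from qiskit.circuit.library import ZZFeatureMap
from qiskit_machine_learning.kernels import FidelityQuantumKernel
from qiskit_machine_learning.algorithms import QSVC

rng = np.random.default_rng(0)
X = rng.random((40, 2))                      # toy transaction features
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)    # toy "fraud" labels

feature_map = ZZFeatureMap(feature_dimension=2, reps=2)
kernel = FidelityQuantumKernel(feature_map=feature_map)
qsvc = QSVC(quantum_kernel=kernel)
qsvc.fit(X[:30], y[:30])
print("test accuracy:", qsvc.score(X[30:], y[30:]))
```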
Spatial Gated Multi-Layer Perceptron for Land Use and Land Cover Mapping
paper_authors: Ali Jamali, Swalpa Kumar Roy, Danfeng Hong, Peter M Atkinson, Pedram Ghamisi
for: This study aims to develop an accurate land use and land cover (LULC) mapping algorithm based on multi-layer perceptrons (MLPs) and spatial gating units (SGUs).
methods: The researchers propose SGU-MLP, a learning algorithm that combines MLPs with SGUs to improve LULC mapping accuracy.
results: Experiments show that the SGU-MLP classification algorithm outperforms CNN and CNN-ViT-based baselines including HybridSN, ResNet, iFormer, EfficientFormer, and CoAtNet; in the Houston experiment, for example, SGU-MLP improves average accuracy over HybridSN, CoAtNet, EfficientFormer, iFormer, and ResNet by approximately 15%, 19%, 20%, 21%, and 25%, respectively.
Abstract
Convolutional Neural Networks (CNNs) are models that are utilized extensively for the hierarchical extraction of features. Vision transformers (ViTs), through the use of a self-attention mechanism, have recently achieved superior modeling of global contextual information compared to CNNs. However, to realize their image classification strength, ViTs require substantial training datasets. Where the available training data are limited, current advanced multi-layer perceptrons (MLPs) can provide viable alternatives to both deep CNNs and ViTs. In this paper, we developed the SGU-MLP, a learning algorithm that effectively uses both MLPs and spatial gating units (SGUs) for precise land use land cover (LULC) mapping. Results illustrated the superiority of the developed SGU-MLP classification algorithm over several CNN and CNN-ViT-based models, including HybridSN, ResNet, iFormer, EfficientFormer and CoAtNet. The proposed SGU-MLP algorithm was tested through three experiments in Houston, USA, Berlin, Germany and Augsburg, Germany. The SGU-MLP classification model was found to consistently outperform the benchmark CNN and CNN-ViT-based algorithms. For example, for the Houston experiment, SGU-MLP significantly outperformed HybridSN, CoAtNet, Efficientformer, iFormer and ResNet by approximately 15%, 19%, 20%, 21%, and 25%, respectively, in terms of average accuracy. The code will be made publicly available at https://github.com/aj1365/SGUMLP
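A minimal sketch of a spatial gating unit in the gMLP style, the mechanism the SGU-MLP name refers to: split the channels, mix one half along the spatial (token) axis with a learned projection, and use it to gate the other half; dimensions are illustrative.

```python
# Minimal sketch of a gMLP-style spatial gating unit (SGU): gate half the
# channels with a spatially mixed version of the other half.
import torch
import torch.nn as nn

class SpatialGatingUnit(nn.Module):
    def __init__(self, dim: int, seq_len: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim // 2)
        self.spatial_proj = nn.Linear(seq_len, seq_len)
        nn.init.zeros_(self.spatial_proj.weight)  # near-identity gate at init
        nn.init.ones_(self.spatial_proj.bias)

    def forward(self, x):                 # x: (batch, seq_len, dim)
        u, v = x.chunk(2, dim=-1)         # split channels into two halves
        v = self.norm(v)
        v = self.spatial_proj(v.transpose(1, 2)).transpose(1, 2)  # mix tokens
        return u * v                      # element-wise gating

sgu = SpatialGatingUnit(dim=64, seq_len=49)  # e.g., a 7x7 grid of image patches
print(sgu(torch.randn(2, 49, 64)).shape)     # (2, 49, 32)
```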
Leveraging the Edge and Cloud for V2X-Based Real-Time Object Detection in Autonomous Driving
results: The study finds that offloading model computation to the cloud can improve detection quality while meeting real-time constraints, and that JPEG and H.265 compression can reduce network transmission delay without hurting detection metrics.
Abstract
Environmental perception is a key element of autonomous driving because the information received from the perception module influences core driving decisions. An outstanding challenge in real-time perception for autonomous driving lies in finding the best trade-off between detection quality and latency. Major constraints on both computation and power have to be taken into account for real-time perception in autonomous vehicles. Larger object detection models tend to produce the best results, but are also slower at runtime. Since the most accurate detectors cannot run in real-time locally, we investigate the possibility of offloading computation to edge and cloud platforms, which are less resource-constrained. We create a synthetic dataset to train object detection models and evaluate different offloading strategies. Using real hardware and network simulations, we compare different trade-offs between prediction quality and end-to-end delay. Since sending raw frames over the network implies additional transmission delays, we also explore the use of JPEG and H.265 compression at varying qualities and measure their impact on prediction metrics. We show that models with adequate compression can be run in real-time on the cloud while outperforming local detection performance.
SegMatch: A semi-supervised learning method for surgical instrument segmentation
paper_authors: Meng Wei, Charlie Budd, Luis C. Garcia-Peraza-Herrera, Reuben Dorent, Miaojing Shi, Tom Vercauteren
for: SegMatch is proposed to reduce the need for expensive annotation of laparoscopic and robotic surgical images, a key enabler for advanced surgical assistance and computer-assisted interventions.
methods: SegMatch is a semi-supervised learning method that combines consistency regularization and pseudo-labelling, adapted for segmentation tasks. Weakly augmented unlabelled images are used to generate pseudo-labels, and an unsupervised loss is enforced against the model's output for the corresponding adversarially augmented images. The algorithm also introduces a trainable adversarial augmentation strategy to increase the relevance of the augmentations.
results: SegMatch outperforms fully supervised approaches and state-of-the-art semi-supervised semantic segmentation models across different labelled-to-unlabelled data ratios, demonstrating the effectiveness of using unlabelled data for training segmentation models.Abstract
Surgical instrument segmentation is recognised as a key enabler to provide advanced surgical assistance and improve computer-assisted interventions. In this work, we propose SegMatch, a semi-supervised learning method to reduce the need for expensive annotation for laparoscopic and robotic surgical images. SegMatch builds on FixMatch, a widespread semi-supervised classification pipeline combining consistency regularization and pseudo-labelling, and adapts it for the purpose of segmentation. In our proposed SegMatch, unlabelled images are weakly augmented and fed into the segmentation model to generate pseudo-labels; the unsupervised loss is then enforced between these pseudo-labels and the model's output for the adversarially augmented image, restricted to pixels with a high confidence score. Our adaptation for segmentation tasks includes carefully considering the equivariance and invariance properties of the augmentation functions we rely on. To increase the relevance of our augmentations, we depart from using only handcrafted augmentations and introduce a trainable adversarial augmentation strategy. Our algorithm was evaluated on the MICCAI Instrument Segmentation Challenge datasets Robust-MIS 2019 and EndoVis 2017. Our results demonstrate that adding unlabelled data for training purposes allows us to surpass the performance of fully supervised approaches, which are limited by the availability of training data in these challenges. SegMatch also outperforms a range of state-of-the-art semi-supervised learning semantic segmentation models in different labelled to unlabelled data ratios.
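A minimal sketch of the FixMatch-style unsupervised loss adapted to per-pixel prediction is given below, assuming the weak and strong views are already geometrically aligned; it shows the consistency-plus-pseudo-labelling pattern SegMatch adapts, not the paper's exact loss or its trainable adversarial augmentation.

```python
import torch
import torch.nn.functional as F

def fixmatch_seg_loss(model, weak_img, strong_img, tau=0.95):
    """FixMatch-style unsupervised loss for segmentation (a sketch).
    `model` maps images to per-pixel class logits (B, C, H, W);
    weak_img / strong_img are two augmented views of the same batch."""
    with torch.no_grad():
        probs = model(weak_img).softmax(dim=1)   # teacher pass on the weak view
        conf, pseudo = probs.max(dim=1)          # per-pixel pseudo-labels
        mask = (conf >= tau).float()             # keep only confident pixels
    logits = model(strong_img)                   # student pass on the strong view
    loss = F.cross_entropy(logits, pseudo, reduction="none")
    return (loss * mask).sum() / mask.sum().clamp(min=1.0)
```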
Training neural networks with end-to-end optical backpropagation
results: The study realizes a neural network that relies entirely on optical processes for both training and inference, in a scheme adaptable to various optical platforms, materials, and network structures.Abstract
Optics is an exciting route for the next generation of computing hardware for machine learning, promising several orders of magnitude enhancement in both computational speed and energy efficiency. However, to reach the full capacity of an optical neural network it is necessary that not only the inference but also the training be implemented optically. The primary algorithm for training a neural network is backpropagation, in which the calculation is performed in the order opposite to the information flow for inference. While straightforward in a digital computer, optical implementation of backpropagation has so far remained elusive, particularly because of the conflicting requirements for the optical element that implements the nonlinear activation function. In this work, we address this challenge for the first time with a surprisingly simple and generic scheme. Saturable absorbers are employed for the role of the activation units, and the required properties are achieved through a pump-probe process, in which the forward propagating signal acts as the pump and the backward one as the probe. Our approach is adaptable to various analog platforms, materials, and network structures, and it demonstrates the possibility of constructing neural networks entirely reliant on analog optical processes for both training and inference tasks.
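A toy numerical model helps fix ideas: treat the activation as transmission through a saturable absorber whose absorption bleaches with intensity, so the backward probe effectively samples the local slope of the nonlinearity. The absorbance and saturation intensity below are arbitrary assumptions, and the finite-difference check validates only the toy math, not the optics.

```python
import numpy as np

ALPHA0_L = 2.0  # assumed small-signal absorbance (alpha_0 * length)
I_SAT = 1.0     # assumed saturation intensity (arbitrary units)

def sat_abs_activation(I):
    """Intensity transmitted through a saturable absorber (toy model):
    f(I) = I * exp(-ALPHA0_L / (1 + I / I_SAT))."""
    return I * np.exp(-ALPHA0_L / (1.0 + I / I_SAT))

def sat_abs_grad(I):
    """Analytic slope f'(I), the quantity a weak counter-propagating
    probe would effectively sample in a pump-probe arrangement."""
    s = 1.0 + I / I_SAT
    T = np.exp(-ALPHA0_L / s)
    return T * (1.0 + ALPHA0_L * I / (I_SAT * s**2))

I = np.linspace(0.01, 5.0, 50)
fd = (sat_abs_activation(I + 1e-6) - sat_abs_activation(I - 1e-6)) / 2e-6
print(np.allclose(fd, sat_abs_grad(I), atol=1e-5))  # True
```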
Conceptualizing Machine Learning for Dynamic Information Retrieval of Electronic Health Record Notes
paper_authors: Sharon Jiang, Shannon Shen, Monica Agrawal, Barbara Lam, Nicholas Kurtzman, Steven Horng, David Karger, David Sontag
for: Reducing clinician burnout and improving the efficiency and quality of clinical documentation.
methods: EHR audit-log data are used as a source of machine learning supervision to dynamically retrieve relevant patient history in real time during note writing.
results: In an emergency department evaluation, the method achieves an AUC of 0.963 for predicting which notes will be read, and a user study shows the framework helps clinicians retrieve relevant patient information more efficiently.Abstract
The large amount of time clinicians spend sifting through patient notes and documenting in electronic health records (EHRs) is a leading cause of clinician burnout. By proactively and dynamically retrieving relevant notes during the documentation process, we can reduce the effort required to find relevant patient history. In this work, we conceptualize the use of EHR audit logs for machine learning as a source of supervision of note relevance in a specific clinical context, at a particular point in time. Our evaluation focuses on the dynamic retrieval in the emergency department, a high acuity setting with unique patterns of information retrieval and note writing. We show that our methods can achieve an AUC of 0.963 for predicting which notes will be read in an individual note writing session. We additionally conduct a user study with several clinicians and find that our framework can help clinicians retrieve relevant information more efficiently. Demonstrating that our framework and methods can perform well in this demanding setting is a promising proof of concept that they will translate to other clinical settings and data modalities (e.g., labs, medications, imaging).
results: The method demonstrates consistent improvement over numerous other textual saliency methods on multiple benchmark classification datasets, while requiring no additional training or labelled data.Abstract
In this paper, we introduce a strategy for identifying textual saliency in large-scale language models applied to classification tasks. In visual networks where saliency is more well-studied, saliency is naturally localized through the convolutional layers of the network; however, the same is not true in modern transformer-stack networks used to process natural language. We adapt gradient-based saliency methods for these networks, propose a method for evaluating the degree of semantic coherence of each layer, and demonstrate consistent improvement over numerous other methods for textual saliency on multiple benchmark classification datasets. Our approach requires no additional training or access to labelled data, and is comparatively very computationally efficient.
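As one concrete instance of a gradient-based saliency method for transformer stacks, the sketch below computes gradient-x-input attributions on token embeddings; it assumes a classifier that maps embeddings directly to logits and does not include the paper's layer-wise semantic-coherence evaluation.

```python
import torch

def grad_x_input_saliency(classifier, embeddings, target_class):
    """Gradient-x-input token saliency (a generic sketch).
    `classifier` is assumed to map token embeddings (1, T, D) to class
    logits (1, C); returns one saliency score per token."""
    embeddings = embeddings.detach().requires_grad_(True)
    logits = classifier(embeddings)
    logits[0, target_class].backward()
    # Aggregate |gradient * input| over the hidden dimension.
    return (embeddings.grad * embeddings).abs().sum(dim=-1).squeeze(0)
```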
Conformer-based Target-Speaker Automatic Speech Recognition for Single-Channel Audio
methods: The model consists of a TitaNet-based speaker embedding module, a Conformer-based masking module, and an ASR module, jointly optimized to transcribe the target speaker while suppressing speech from other speakers.
results: Trained with a CTC loss and a scale-invariant spectrogram reconstruction loss, the model achieves state-of-the-art target-speaker word error rate (TS-WER) on WSJ0-2mix-extr (4.2%), and reports TS-WER for the first time on WSJ0-3mix-extr (12.4%), LibriSpeech2Mix (4.2%) and LibriSpeech3Mix (7.6%), establishing new benchmarks for TS-ASR.Abstract
We propose CONF-TSASR, a non-autoregressive end-to-end time-frequency domain architecture for single-channel target-speaker automatic speech recognition (TS-ASR). The model consists of a TitaNet based speaker embedding module, a Conformer based masking as well as ASR modules. These modules are jointly optimized to transcribe a target-speaker, while ignoring speech from other speakers. For training we use Connectionist Temporal Classification (CTC) loss and introduce a scale-invariant spectrogram reconstruction loss to encourage the model better separate the target-speaker's spectrogram from mixture. We obtain state-of-the-art target-speaker word error rate (TS-WER) on WSJ0-2mix-extr (4.2%). Further, we report for the first time TS-WER on WSJ0-3mix-extr (12.4%), LibriSpeech2Mix (4.2%) and LibriSpeech3Mix (7.6%) datasets, establishing new benchmarks for TS-ASR. The proposed model will be open-sourced through NVIDIA NeMo toolkit.
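One plausible form of the scale-invariant spectrogram reconstruction loss is an SI-SDR-style objective applied to magnitude spectrograms, sketched below; the abstract does not pin down the exact formula, so treat this as an assumption-laden illustration rather than the CONF-TSASR loss.

```python
import torch

def scale_invariant_spec_loss(est, ref, eps=1e-8):
    """SI-SDR-style scale-invariant loss on magnitude spectrograms
    (a sketch, not necessarily the exact CONF-TSASR objective).
    est, ref: (batch, freq, time)."""
    est, ref = est.flatten(1), ref.flatten(1)
    # Optimal per-example scale aligning the reference to the estimate.
    alpha = (est * ref).sum(1, keepdim=True) / (ref.pow(2).sum(1, keepdim=True) + eps)
    target = alpha * ref
    noise = est - target
    si_sdr = 10 * torch.log10(target.pow(2).sum(1) / (noise.pow(2).sum(1) + eps) + eps)
    return -si_sdr.mean()
```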
Evaluating Pedestrian Trajectory Prediction Methods for the Application in Autonomous Driving
results: Simple models remain competitive when generating single trajectories, and several features commonly thought useful have little impact on overall performance across architectures. Inference-time measurements over varying numbers of agents further show that simpler models scale better. Based on these findings, recommendations are proposed to guide future trajectory prediction algorithms.Abstract
In this paper, the state of the art in the field of pedestrian trajectory prediction is evaluated alongside the constant velocity model (CVM) with respect to its applicability in autonomous vehicles. The evaluation is conducted on the widely-used ETH/UCY dataset where the Average Displacement Error (ADE) and the Final Displacement Error (FDE) are reported. To align with requirements in real-world applications, modifications are made to the input features of the initially proposed models. An ablation study is conducted to examine the influence of the observed motion history on the prediction performance, thereby establishing a better understanding of its impact. Additionally, the inference time of each model is measured to evaluate the scalability of each model when confronted with varying amounts of agents. The results demonstrate that simple models remain competitive when generating single trajectories, and certain features commonly thought of as useful have little impact on the overall performance across different architectures. Based on these findings, recommendations are proposed to guide the future development of trajectory prediction algorithms.
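Both the constant velocity baseline and the two reported metrics are compact enough to state in code; a minimal sketch, assuming uniform time steps:

```python
import numpy as np

def cvm_predict(history, horizon):
    """Constant Velocity Model: extrapolate the last observed velocity.
    history: (T_obs, 2) positions at uniform time steps."""
    v = history[-1] - history[-2]
    return history[-1] + np.arange(1, horizon + 1)[:, None] * v

def ade_fde(pred, gt):
    """Average / Final Displacement Error between (T, 2) trajectories."""
    d = np.linalg.norm(pred - gt, axis=-1)
    return d.mean(), d[-1]

obs = np.array([[0.0, 0.0], [0.4, 0.1], [0.8, 0.2], [1.2, 0.3]])
gt = np.array([[1.6, 0.4], [2.0, 0.6], [2.4, 0.9]])
pred = cvm_predict(obs, horizon=3)
print(ade_fde(pred, gt))  # (0.1333..., 0.3)
```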
Hierarchical Representations for Spatio-Temporal Visual Attention Modeling and Understanding
results: Experiments and analysis demonstrate the effectiveness and reliability of the proposed computational models and deep network architecture for spatio-temporal visual attention in video sequences.Abstract
This Ph.D. thesis concerns the study and development of hierarchical representations for spatio-temporal visual attention modeling and understanding in video sequences. More specifically, we propose two computational models for visual attention. First, we present a generative probabilistic model for context-aware visual attention modeling and understanding. Second, we develop a deep network architecture for visual attention modeling, which first estimates top-down spatio-temporal visual attention and ultimately serves to model attention in the temporal domain.
Deep Learning for Morphological Identification of Extended Radio Galaxies using Weak Labels
paper_authors: Nikhel Gupta, Zeeshan Hayder, Ray P. Norris, Minh Huynh, Lars Petersson, X. Rosalind Wang, Heinz Andernach, Bärbel S. Koribalski, Miranda Yew, Evan J. Crawford
for: This study develops a deep-learning-based algorithm that reduces the cost of pixel-level labelling for complex radio galaxies with multiple components.
methods: The algorithm is trained on weak class-level labels of radio galaxies to obtain class activation maps (CAMs), which are further refined with an inter-pixel relations network (IRNet) to produce instance segmentation masks over radio galaxies and the positions of their infrared hosts.
results: Using data from the Australian Square Kilometre Array Pathfinder (ASKAP) telescope, specifically the Evolutionary Map of the Universe (EMU) Pilot Survey covering 270 square degrees of sky at an RMS sensitivity of 25-35 μJy/beam, the weakly-supervised algorithm predicts pixel-level information with high accuracy, including masks for the extended radio emission encapsulating all galaxy components and the positions of the infrared host galaxies.Abstract
The present work discusses the use of a weakly-supervised deep learning algorithm that reduces the cost of labelling pixel-level masks for complex radio galaxies with multiple components. The algorithm is trained on weak class-level labels of radio galaxies to get class activation maps (CAMs). The CAMs are further refined using an inter-pixel relations network (IRNet) to get instance segmentation masks over radio galaxies and the positions of their infrared hosts. We use data from the Australian Square Kilometre Array Pathfinder (ASKAP) telescope, specifically the Evolutionary Map of the Universe (EMU) Pilot Survey, which covered a sky area of 270 square degrees with an RMS sensitivity of 25-35 $\mu$Jy/beam. We demonstrate that weakly-supervised deep learning algorithms can achieve high accuracy in predicting pixel-level information, including masks for the extended radio emission encapsulating all galaxy components and the positions of the infrared host galaxies. We evaluate the performance of our method using mean Average Precision (mAP) across multiple classes at a standard intersection over union (IoU) threshold of 0.5. We show that the model achieves a mAP$_{50}$ of 67.5\% and 76.8\% for radio masks and infrared host positions, respectively. The network architecture can be found at the following link: https://github.com/Nikhel1/Gal-CAM
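The pipeline starts from plain class activation maps; the sketch below shows the classic CAM computation (Zhou et al., 2016) for one class, with illustrative shapes, and omits the IRNet refinement stage.

```python
import torch
import torch.nn.functional as F

def class_activation_map(features, fc_weight, class_idx, out_size):
    """Plain CAM: weight the final conv feature maps by the classifier
    weights of one class (a sketch of the starting point, not IRNet).
    features: (C, h, w) last-conv activations; fc_weight: (num_classes, C)."""
    cam = torch.einsum("c,chw->hw", fc_weight[class_idx], features)
    cam = F.relu(cam)
    cam = F.interpolate(cam[None, None], size=out_size,
                        mode="bilinear", align_corners=False)[0, 0]
    return cam / (cam.max() + 1e-8)  # normalised to [0, 1]
```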
Improved Multi-Shot Diffusion-Weighted MRI with Zero-Shot Self-Supervised Learning Reconstruction
for: This paper aims to improve the resolution of diffusion-weighted images (DWIs) in magnetic resonance imaging (MRI) by developing a novel multi-shot echo-planar imaging (msEPI) reconstruction approach called zero-MIRID.
methods: The proposed approach uses deep learning-based image regularization techniques, including convolutional neural network (CNN) denoisers in both k- and image-spaces, and leverages virtual coils to enhance image reconstruction conditioning.
results: The proposed approach achieves superior results compared to the state-of-the-art parallel imaging method, as demonstrated in an in-vivo experiment.Abstract
Diffusion MRI is commonly performed using echo-planar imaging (EPI) due to its rapid acquisition time. However, the resolution of diffusion-weighted images is often limited by magnetic field inhomogeneity-related artifacts and blurring induced by T2- and T2*-relaxation effects. To address these limitations, multi-shot EPI (msEPI) combined with parallel imaging techniques is frequently employed. Nevertheless, reconstructing msEPI can be challenging due to phase variation between multiple shots. In this study, we introduce a novel msEPI reconstruction approach called zero-MIRID (zero-shot self-supervised learning of Multi-shot Image Reconstruction for Improved Diffusion MRI). This method jointly reconstructs msEPI data by incorporating deep learning-based image regularization techniques. The network incorporates CNN denoisers in both k- and image-spaces, while leveraging virtual coils to enhance image reconstruction conditioning. By employing a self-supervised learning technique and dividing sampled data into three groups, the proposed approach achieves superior results compared to the state-of-the-art parallel imaging method, as demonstrated in an in-vivo experiment.
DOST – Domain Obedient Self-supervised Training for Multi Label Classification with Noisy Labels
for: This paper targets the problem of annotation noise in multi-label classification (MLC) tasks and proposes a novel approach, Domain Obedient Self-supervised Training (DOST), to mitigate its effect.
methods: Self-supervised learning with domain guidance is used to detect offending annotations and deter rule-violating predictions, incorporating domain rules into the learning algorithm to improve the model's alignment with them.
results: Experiments show that the method improves predictive performance in key metrics, minimizes the effect of annotation noise, and makes models more compliant with domain rules.Abstract
The enormous demand for annotated data brought forth by deep learning techniques has been accompanied by the problem of annotation noise. Although this issue has been widely discussed in machine learning literature, it has been relatively unexplored in the context of "multi-label classification" (MLC) tasks which feature more complicated kinds of noise. Additionally, when the domain in question has certain logical constraints, noisy annotations often exacerbate their violations, making such a system unacceptable to an expert. This paper studies the effect of label noise on domain rule violation incidents in the MLC task, and incorporates domain rules into our learning algorithm to mitigate the effect of noise. We propose the Domain Obedient Self-supervised Training (DOST) paradigm which not only makes deep learning models more aligned to domain rules, but also improves learning performance in key metrics and minimizes the effect of annotation noise. This novel approach uses domain guidance to detect offending annotations and deter rule-violating predictions in a self-supervised manner, thus making it more "data efficient" and domain compliant. Empirical studies, performed over two large scale multi-label classification datasets, demonstrate that our method results in improvement across the board, and often entirely counteracts the effect of noise.
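One simple way to fold domain rules into a multi-label objective is an implication penalty, sketched below; this illustrates the idea of penalizing rule-violating predictions and is not the exact DOST objective.

```python
import torch

def rule_violation_penalty(probs, implications):
    """Penalty for violating implication rules 'label a implies label b'
    in multi-label classification (an illustration, not the DOST loss).
    probs: (batch, num_labels) sigmoid outputs."""
    penalty = probs.new_zeros(())
    for a, b in implications:  # each rule: y_a = 1  =>  y_b = 1
        penalty = penalty + torch.relu(probs[:, a] - probs[:, b]).mean()
    return penalty
```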
A degree of image identification at sub-human scales could be possible with more advanced clusters
results: Scaling up data volume and image resolution at the same time enables human-level object detection performance at sub-human dataset sizes.Abstract
The purpose of the research is to determine whether currently available self-supervised learning techniques can accomplish human-level comprehension of visual images using the same kind and amount of sensory input that people acquire. Initial research on this topic considered only data-volume scaling. Here, we scale both the volume of data and the quality of the images. This scaling experiment is a self-supervised learning study that can be carried out without any outside financing. We find that scaling up data volume and picture resolution at the same time enables human-level item detection performance at sub-human dataset sizes. We run a scaling experiment with vision transformers trained on up to 200,000 images at up to 256 ppi.
Bayesian Inverse Transition Learning for Offline Settings
paper_authors: Leo Benac, Sonali Parbhoo, Finale Doshi-Velez
for: This paper targets sequential decision-making domains such as healthcare and education, where the rewards are known and the transition dynamics T must be estimated from batch data.
methods: A new constraint-based approach is proposed that reliably learns a posterior distribution over the transition dynamics T while reducing policy variance.
results: Using these constraints, a high-performing policy can be learned while considerably reducing its variance across datasets; combined with uncertainty estimation, the constraints also help infer a partial ranking of actions that produce higher returns, and safer, more informative policies for planning.Abstract
Offline reinforcement learning is commonly used for sequential decision-making in domains such as healthcare and education, where the rewards are known and the transition dynamics $T$ must be estimated on the basis of batch data. A key challenge across these tasks is learning a reliable estimate of the transition dynamics $T$ that produces near-optimal policies which are safe enough never to take actions far from the best action with respect to their value functions, and informative enough to communicate the uncertainties they carry. Using data from an expert, we propose a new constraint-based approach that captures our desiderata for reliably learning a posterior distribution of the transition dynamics $T$ that is free from gradients. Our results demonstrate that by using our constraints, we learn a high-performing policy, while considerably reducing the policy's variance over different datasets. We also explain how combining uncertainty estimation with these constraints can help us infer a partial ranking of actions that produce higher returns, and helps us infer safer and more informative policies for planning.
An Interpretable and Attention-based Method for Gaze Estimation Using Electroencephalography
results: A comprehensive evaluation shows the framework outperforms current methods in accuracy and robustness, and visualizations are provided that explain the analysis and highlight the potential of attention mechanisms to improve the efficiency and effectiveness of EEG data analysis.Abstract
Eye movements can reveal valuable insights into various aspects of human mental processes, physical well-being, and actions. Recently, several datasets have been made available that simultaneously record EEG activity and eye movements. This has triggered the development of various methods to predict gaze direction based on brain activity. However, most of these methods lack interpretability, which limits their technology acceptance. In this paper, we leverage a large data set of simultaneously measured Electroencephalography (EEG) and Eye tracking, proposing an interpretable model for gaze estimation from EEG data. More specifically, we present a novel attention-based deep learning framework for EEG signal analysis, which allows the network to focus on the most relevant information in the signal and discard problematic channels. Additionally, we provide a comprehensive evaluation of the presented framework, demonstrating its superiority over current methods in terms of accuracy and robustness. Finally, the study presents visualizations that explain the results of the analysis and highlights the potential of attention mechanism for improving the efficiency and effectiveness of EEG data analysis in a variety of applications.
EEG-based Emotion Style Transfer Network for Cross-dataset Emotion Recognition
paper_authors: Yijin Zhou, Fu Li, Yang Li, Youshuo Ji, Lijian Zhang, Yuanfang Chen, Wenming Zheng, Guangming Shi
for: This paper addresses the cross-dataset problem in EEG emotion recognition by proposing an EEG-based Emotion Style Transfer Network (E2STN) that produces representations carrying the content information of the source domain and the style information of the target domain.
methods: E2STN consists of three modules: a transfer module, a transfer evaluation module, and a discriminative prediction module. The transfer module re-constructs domain-specific information from the source and target domains into new stylized representations, and the transfer evaluation module constrains the generated representations so that the two kinds of complementary information are fused precisely and without distortion.
results: Experiments show that E2STN achieves state-of-the-art performance on cross-dataset EEG emotion recognition tasks.Abstract
As the key to realizing aBCIs, EEG emotion recognition has been widely studied by many researchers. Previous methods have performed well for intra-subject EEG emotion recognition. However, the style mismatch between source domain (training data) and target domain (test data) EEG samples caused by huge inter-domain differences is still a critical problem for EEG emotion recognition. To solve the problem of cross-dataset EEG emotion recognition, in this paper, we propose an EEG-based Emotion Style Transfer Network (E2STN) to obtain EEG representations that contain the content information of source domain and the style information of target domain, which is called stylized emotional EEG representations. The representations are helpful for cross-dataset discriminative prediction. Concretely, E2STN consists of three modules, i.e., transfer module, transfer evaluation module, and discriminative prediction module. The transfer module encodes the domain-specific information of source and target domains and then re-constructs the source domain's emotional pattern and the target domain's statistical characteristics into the new stylized EEG representations. In this process, the transfer evaluation module is adopted to constrain the generated representations that can more precisely fuse two kinds of complementary information from source and target domains and avoid distorting. Finally, the generated stylized EEG representations are fed into the discriminative prediction module for final classification. Extensive experiments show that the E2STN can achieve the state-of-the-art performance on cross-dataset EEG emotion recognition tasks.
Prompting In-Context Operator Learning with Sensor Data, Equations, and Natural Language
results: Integrating human knowledge into the learning process broadens the flexibility of physics-informed learning, boosts performance, and reduces data needs, while opening a new path for the application of language models.Abstract
In the growing domain of scientific machine learning, in-context operator learning has demonstrated notable potential in learning operators from prompted data during the inference stage without weight updates. However, the current model's overdependence on sensor data may inadvertently overlook the invaluable human insight into the operator. To address this, we present a transformation of in-context operator learning into a multi-modal paradigm. We propose the use of "captions" to integrate human knowledge about the operator, expressed through natural language descriptions and equations. We illustrate how this method not only broadens the flexibility and generality of physics-informed learning, but also significantly boosts learning performance and reduces data needs. Furthermore, we introduce a more efficient neural network architecture for multi-modal in-context operator learning, referred to as "ICON-LM", based on a language-model-like architecture. We demonstrate the viability of "ICON-LM" for scientific machine learning tasks, which creates a new path for the application of language models.
A Novel Method for improving accuracy in neural network by reinstating traditional back propagation technique
results: The method outperforms state-of-the-art methods on benchmark datasets while avoiding the vanishing gradient problem, presenting a promising direction for efficient and effective deep neural network training.Abstract
Deep learning has revolutionized industries like computer vision, natural language processing, and speech recognition. However, back propagation, the main method for training deep neural networks, faces challenges like computational overhead and vanishing gradients. In this paper, we propose a novel instant parameter update methodology that eliminates the need for computing gradients at each layer. Our approach accelerates learning, avoids the vanishing gradient problem, and outperforms state-of-the-art methods on benchmark data sets. This research presents a promising direction for efficient and effective deep neural network training.
Sound propagation in realistic interactive 3D scenes with parameterized sources using deep neural operators
paper_authors: Nikolas Borrel-Jensen, Somdatta Goswami, Allan P. Engsig-Karup, George Em Karniadakis, Cheol-Ho Jeong
for: Sound-field simulation for virtual/augmented reality, game audio, and spatial computing.
methods: Deep operator networks are used to approximate linear wave-equation operators.
results: Sound propagation in realistic 3D acoustic scenes with moving sources is predicted with millisecond-scale computation times, in good agreement with reference solutions (root mean squared errors of 0.02 Pa to 0.10 Pa).Abstract
We address the challenge of sound propagation simulations in 3D virtual rooms with moving sources, which have applications in virtual/augmented reality, game audio, and spatial computing. Solutions to the wave equation can describe wave phenomena such as diffraction and interference. However, simulating them using conventional numerical discretization methods with hundreds of source and receiver positions is intractable, making simulation of a sound field with moving sources impractical. To overcome this limitation, we propose using deep operator networks to approximate linear wave-equation operators. This enables the rapid prediction of sound propagation in realistic 3D acoustic scenes with moving sources, achieving millisecond-scale computations. By learning a compact surrogate model, we avoid the offline calculation and storage of impulse responses for all relevant source/listener pairs. Our experiments, including various complex scene geometries, show good agreement with reference solutions, with root mean squared errors ranging from 0.02 Pa to 0.10 Pa. Notably, our method signifies a paradigm shift as no prior machine learning approach has achieved precise predictions of complete wave fields within realistic domains. We anticipate that our findings will drive further exploration of deep neural operator methods, advancing research in immersive user experiences within virtual environments.
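A minimal deep operator network sketch shows the pattern: a branch net encodes the parameterized input (e.g., the source position) and a trunk net encodes the query coordinate (receiver position and time); their dot product gives the field value. Widths, inputs, and depth below are illustrative, not the paper's network.

```python
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    """Minimal deep operator network (a sketch of the idea)."""
    def __init__(self, in_branch, in_trunk, width=128, p=64):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(in_branch, width), nn.Tanh(),
                                    nn.Linear(width, p))
        self.trunk = nn.Sequential(nn.Linear(in_trunk, width), nn.Tanh(),
                                   nn.Linear(width, p))

    def forward(self, u, y):  # u: (B, in_branch) source params, y: (B, in_trunk) query
        return (self.branch(u) * self.trunk(y)).sum(dim=-1, keepdim=True)

net = DeepONet(in_branch=3, in_trunk=4)             # source xyz; receiver xyz + time
pressure = net(torch.rand(8, 3), torch.rand(8, 4))  # (8, 1) field values
```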
RadGraph2: Modeling Disease Progression in Radiology Reports via Hierarchical Information Extraction
results: Models trained on RadGraph2 capture a wider variety of findings, including changes in disease state, and perform better at relation extraction than models trained on the original RadGraph dataset.Abstract
We present RadGraph2, a novel dataset for extracting information from radiology reports that focuses on capturing changes in disease state and device placement over time. We introduce a hierarchical schema that organizes entities based on their relationships and show that using this hierarchy during training improves the performance of an information extraction model. Specifically, we propose a modification to the DyGIE++ framework, resulting in our model HGIE, which outperforms previous models in entity and relation extraction tasks. We demonstrate that RadGraph2 enables models to capture a wider variety of findings and perform better at relation extraction compared to those trained on the original RadGraph dataset. Our work provides the foundation for developing automated systems that can track disease progression over time and develop information extraction models that leverage the natural hierarchy of labels in the medical domain.
Collaborative Wideband Spectrum Sensing and Scheduling for Networked UAVs in UTM Systems
results: A comprehensive simulation framework built around the MATLAB LTE toolbox generates a near-realistic synthetic dataset for developing ML/AI-based spectrum management solutions; the evaluation methodology provides a flexible way to produce large spectrum datasets for such solutions.Abstract
In this paper, we propose a data-driven framework for collaborative wideband spectrum sensing and scheduling for networked unmanned aerial vehicles (UAVs), which act as the secondary users to opportunistically utilize detected spectrum holes. To this end, we propose a multi-class classification problem for wideband spectrum sensing to detect vacant spectrum spots based on collected I/Q samples. To enhance the accuracy of the spectrum sensing module, the outputs from the multi-class classification by each individual UAV are fused at a server in the unmanned aircraft system traffic management (UTM) ecosystem. In the spectrum scheduling phase, we leverage reinforcement learning (RL) solutions to dynamically allocate the detected spectrum holes to the secondary users (i.e., UAVs). To evaluate the proposed methods, we establish a comprehensive simulation framework that generates a near-realistic synthetic dataset using MATLAB LTE toolbox by incorporating base-station~(BS) locations in a chosen area of interest, performing ray-tracing, and emulating the primary users channel usage in terms of I/Q samples. This evaluation methodology provides a flexible framework to generate large spectrum datasets that could be used for developing ML/AI-based spectrum management solutions for aerial devices.
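Server-side fusion of the per-UAV classifier outputs can be as simple as a product (log-sum) rule over class posteriors; the fusion rule at the UTM server is not specified in detail here, so the sketch below is only a baseline illustration with hypothetical shapes.

```python
import numpy as np

def fuse_uav_predictions(probs):
    """Product-rule fusion of per-UAV class posteriors per sub-band
    (a baseline illustration, not the paper's fusion scheme).
    probs: (num_uavs, num_bands, num_classes) -> (num_bands,) class ids."""
    return np.log(probs + 1e-12).sum(axis=0).argmax(axis=-1)
```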
Kairos: Practical Intrusion Detection and Investigation using Whole-system Provenance
results: Experiments show the approach detects attacks accurately and distills concise attack summary graphs that let system administrators quickly understand and respond to intrusions.Abstract
Provenance graphs are structured audit logs that describe the history of a system's execution. Recent studies have explored a variety of techniques to analyze provenance graphs for automated host intrusion detection, focusing particularly on advanced persistent threats. Sifting through their design documents, we identify four common dimensions that drive the development of provenance-based intrusion detection systems (PIDSes): scope (can PIDSes detect modern attacks that infiltrate across application boundaries?), attack agnosticity (can PIDSes detect novel attacks without a priori knowledge of attack characteristics?), timeliness (can PIDSes efficiently monitor host systems as they run?), and attack reconstruction (can PIDSes distill attack activity from large provenance graphs so that sysadmins can easily understand and quickly respond to system intrusion?). We present KAIROS, the first PIDS that simultaneously satisfies the desiderata in all four dimensions, whereas existing approaches sacrifice at least one and struggle to achieve comparable detection performance. Kairos leverages a novel graph neural network-based encoder-decoder architecture that learns the temporal evolution of a provenance graph's structural changes to quantify the degree of anomalousness for each system event. Then, based on this fine-grained information, Kairos reconstructs attack footprints, generating compact summary graphs that accurately describe malicious activity over a stream of system audit logs. Using state-of-the-art benchmark datasets, we demonstrate that Kairos outperforms previous approaches.
results: In ablation experiments on 792 subjects from the ADNI database using two imaging modalities, the method achieves AD diagnostic accuracies of 89.71% (sMRI) and 91.18% (PET), outperforming several state-of-the-art methods.Abstract
Structural MRI and PET imaging play an important role in the diagnosis of Alzheimer's disease (AD), showing the morphological changes and glucose metabolism changes in the brain, respectively. The manifestations in the brain images of some cognitive impairment patients are relatively inconspicuous; for example, accurate diagnosis through sMRI alone remains difficult in clinical practice. With the emergence of deep learning, convolutional neural networks (CNNs) have become a valuable method in AD-aided diagnosis, but some CNN methods cannot effectively learn the features of brain images, so AD diagnosis still presents challenges. In this work, we propose an end-to-end 3D CNN framework for AD diagnosis based on ResNet, which integrates multi-layer features obtained under the effect of an attention mechanism to better capture subtle differences in brain images. The attention maps show that our model can focus on key brain regions related to the disease diagnosis. Our method was verified in ablation experiments with two modality images on 792 subjects from the ADNI database, where AD diagnostic accuracies of 89.71% and 91.18% were achieved based on sMRI and PET respectively, outperforming several state-of-the-art methods.
SAR Target Image Generation Method Using Azimuth-Controllable Generative Adversarial Network
results: Extensive experiments show the superiority of the proposed method in azimuth controllability and accuracy of SAR target image generation.Abstract
A sufficient supply of synthetic aperture radar (SAR) target images is very important for research, yet available SAR target images are often limited in practice, which hinders the progress of SAR applications. In this paper, we propose an azimuth-controllable generative adversarial network that generates precise SAR target images with an intermediate azimuth between the azimuths of two given SAR images. The network contains three parts: a generator, a discriminator, and a predictor. Through the proposed network structure, the generator extracts and fuses the optimal target features from two input SAR target images to generate a SAR target image. A similarity discriminator and an azimuth predictor are then designed: the similarity discriminator differentiates the generated SAR target images from real SAR images to ensure the accuracy of the generated images, while the azimuth predictor measures the azimuth difference between the generated and desired images to ensure azimuth controllability. Therefore, the proposed network can generate precise SAR images whose azimuths are well controlled by the inputs of the deep network, producing target images at different azimuths to alleviate the small-sample problem to some degree and benefit research on SAR images. Extensive experimental results show the superiority of the proposed method in azimuth controllability and accuracy of SAR target image generation.
Surface Masked AutoEncoder: Self-Supervision for Cortical Imaging Data
paper_authors: Simon Dahan, Mariana da Silva, Daniel Rueckert, Emma C Robinson
for: This paper aims to improve the performance of vision transformer models in cortical surface learning tasks, specifically in the context of cortical imaging where datasets are limited in size.
methods: The proposed method uses Masked AutoEncoder (MAE) self-supervision to pre-train vision transformer models on large datasets, such as the UK Biobank (UKB), and then fine-tunes the models on smaller cortical phenotype regression datasets.
results: The pre-trained models achieve a 26% improvement in performance and an 80% faster convergence compared to models trained from scratch, demonstrating the effectiveness of the proposed method in learning strong representations for cortical surface learning tasks.Abstract
Self-supervision has been widely explored as a means of addressing the lack of inductive biases in vision transformer architectures, which limits generalisation when networks are trained on small datasets. This is crucial in the context of cortical imaging, where phenotypes are complex and heterogeneous, but the available datasets are limited in size. This paper builds upon recent advancements in translating vision transformers to surface meshes and investigates the potential of Masked AutoEncoder (MAE) self-supervision for cortical surface learning. By reconstructing surface data from a masked version of the input, the proposed method effectively models cortical structure to learn strong representations that translate to improved performance in downstream tasks. We evaluate our approach on cortical phenotype regression using the developing Human Connectome Project (dHCP) and demonstrate that pre-training leads to a 26\% improvement in performance, with an 80\% faster convergence, compared to models trained from scratch. Furthermore, we establish that pre-training vision transformer models on large datasets, such as the UK Biobank (UKB), enables the acquisition of robust representations for finetuning in low-data scenarios. Our code and pre-trained models are publicly available at \url{https://github.com/metrics-lab/surface-vision-transformers}.
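The random masking at the heart of MAE pre-training takes only a few lines; the sketch below mirrors the standard MAE recipe on a generic token sequence (here the tokens would be surface mesh patches), not the authors' surface-specific code.

```python
import torch

def random_masking(patches, mask_ratio=0.75):
    """MAE-style random masking (a generic sketch).
    patches: (B, N, D) token features; returns the visible subset, the
    binary mask (0 = kept, 1 = masked) and indices to restore order."""
    B, N, D = patches.shape
    n_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N)
    ids_shuffle = noise.argsort(dim=1)   # random permutation per sample
    ids_restore = ids_shuffle.argsort(dim=1)
    ids_keep = ids_shuffle[:, :n_keep]
    visible = torch.gather(patches, 1, ids_keep[..., None].expand(-1, -1, D))
    mask = torch.ones(B, N)
    mask.scatter_(1, ids_keep, 0.0)
    return visible, mask, ids_restore
```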
Global in Local: A Convolutional Transformer for SAR ATR FSL
results: Experiments on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset show pioneering performance without requiring additional SAR target images for training.Abstract
Convolutional neural networks (CNNs) have dominated synthetic aperture radar (SAR) automatic target recognition (ATR) for years. However, with limited SAR images, the width and depth of CNN-based models are constrained and widening the receptive field to capture global image features is hindered, which ultimately leads to low recognition performance. To address these challenges, we propose a Convolutional Transformer (ConvT) for SAR ATR few-shot learning (FSL). The proposed method focuses on constructing a hierarchical feature representation and capturing global dependencies of local features in each layer, named global in local. A novel hybrid loss is proposed that interprets the few SAR images in the form of recognition labels and contrastive image pairs, constructs abundant anchor-positive and anchor-negative image pairs in one batch, and provides sufficient loss for optimizing the ConvT to overcome the few-sample effect. An auto-augmentation strategy is proposed to enhance and enrich the diversity and amount of the few training samples, exploring hidden features in a few SAR images and avoiding over-fitting in SAR ATR FSL. Experiments conducted on the Moving and Stationary Target Acquisition and Recognition dataset (MSTAR) have shown the effectiveness of our proposed ConvT for SAR ATR FSL. Unlike existing SAR ATR FSL methods that employ additional training datasets, our method achieves pioneering performance without other SAR target images in training.
Transforming Breast Cancer Diagnosis: Towards Real-Time Ultrasound to Mammogram Conversion for Cost-Effective Diagnosis
paper_authors: Sahar Almahfouz Nasser, Ashutosh Sharma, Anmol Saraf, Amruta Mahendra Parulekar, Purvi Haria, Amit Sethi
for: This research aims to provide surgeons with mammogram-like image quality in real-time from noisy ultrasound (US) images.
methods: The Stride software is used to numerically solve the forward model, generating ultrasound images from mammogram images, and generative adversarial networks (GANs) tackle the inverse problem of generating mammogram-quality images from ultrasound images.
results: The resultant images have considerably more discernible details than the original US images.Abstract
Ultrasound (US) imaging is better suited for intraoperative settings because it is real-time and more portable than other imaging techniques, such as mammography. However, US images are characterized by lower spatial resolution and noise-like artifacts. This research aims to address these limitations by providing surgeons with mammogram-like image quality in real-time from noisy US images. Unlike previous approaches for improving US image quality that aim to reduce artifacts by treating them as speckle noise, we recognize their value as an informative wave interference pattern (WIP). To achieve this, we utilize the Stride software to numerically solve the forward model, generating ultrasound images from mammogram images by solving wave equations. Additionally, we leverage the power of domain adaptation to enhance the realism of the simulated ultrasound images. Then, we utilize generative adversarial networks (GANs) to tackle the inverse problem of generating mammogram-quality images from ultrasound images. The resultant images have considerably more discernible details than the original US images.
A Generalized Physical-knowledge-guided Dynamic Model for Underwater Image Enhancement
methods: The method consists of three parts: an Atmosphere-based Dynamic Structure (ADS), a Transmission-guided Dynamic Structure (TDS), and a Prior-based Multi-scale Structure (PMS). To cover complex underwater scenes, the study varies the global atmosphere light and the transmission through the formation model to simulate various underwater image types (e.g., underwater image colors ranging from yellow to blue). ADS and TDS then use dynamic convolutions to adaptively extract prior information from underwater images and generate the parameters for PMS.
results: The method adapts to different underwater image types, improving contrast and color accuracy; for different water types it adaptively selects appropriate parameters, boosting enhancement quality.Abstract
Underwater images often suffer from color distortion and low contrast, resulting in various image types, due to the scattering and absorption of light by water, and it is difficult to obtain high-quality paired training samples for a generalized model. To tackle these challenges, we design a Generalized Underwater image enhancement method via a Physical-knowledge-guided Dynamic Model (short for GUPDM), consisting of three parts: Atmosphere-based Dynamic Structure (ADS), Transmission-guided Dynamic Structure (TDS), and Prior-based Multi-scale Structure (PMS). In particular, to cover complex underwater scenes, this study changes the global atmosphere light and the transmission to simulate various underwater image types (e.g., the underwater image color ranging from yellow to blue) through the formation model. We then design ADS and TDS that use dynamic convolutions to adaptively extract prior information from underwater images and generate parameters for PMS. These two modules enable the network to select appropriate parameters for various water types adaptively. Besides, the multi-scale feature extraction module in PMS uses convolution blocks with different kernel sizes, obtains weights for each feature map via a channel attention block, and fuses them to boost the receptive field of the network. The source code will be available at \href{https://github.com/shiningZZ/GUPDM}{https://github.com/shiningZZ/GUPDM}.
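ADS and TDS rest on dynamic convolutions, in which attention over global context mixes several candidate kernels per sample so the filter adapts to the input (here, the water type). The sketch below is a generic dynamic convolution layer illustrating that mechanism, not the paper's exact ADS/TDS modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Dynamic convolution: K candidate kernels mixed per sample by
    attention over pooled context (a generic sketch)."""
    def __init__(self, cin, cout, k=3, K=4):
        super().__init__()
        self.K, self.cin, self.cout, self.k = K, cin, cout, k
        self.weight = nn.Parameter(torch.randn(K, cout, cin, k, k) * 0.02)
        self.attn = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(cin, K))

    def forward(self, x):                       # x: (B, cin, H, W)
        B = x.size(0)
        pi = self.attn(x).softmax(dim=-1)       # (B, K) kernel mixture weights
        w = torch.einsum("bk,koihw->boihw", pi, self.weight)
        w = w.reshape(B * self.cout, self.cin, self.k, self.k)
        x = x.reshape(1, B * self.cin, *x.shape[2:])
        out = F.conv2d(x, w, padding=self.k // 2, groups=B)  # one conv per sample
        return out.reshape(B, self.cout, *out.shape[2:])
```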
Enhancing Low-light Light Field Images with A Deep Compensation Unfolding Network
for: The paper addresses the restoration of light field (LF) images captured under low-light conditions.
methods: The paper proposes a novel and interpretable end-to-end learning framework, the deep compensation unfolding network (DCUNet), designed to mimic the optimization process of solving an inverse imaging problem in a data-driven fashion. The framework includes a multi-stage architecture and a content-associated deep compensation module to suppress noise and illumination-map estimation errors. Additionally, the paper proposes a pseudo-explicit feature interaction module to comprehensively exploit redundant information in LF images.
results: Experimental results on both simulated and real datasets demonstrate the superiority of DCUNet over state-of-the-art methods, both qualitatively and quantitatively, while preserving the essential geometric structure of enhanced LF images much better than other methods.
Abstract
This paper presents a novel and interpretable end-to-end learning framework, called the deep compensation unfolding network (DCUNet), for restoring light field (LF) images captured under low-light conditions. DCUNet is designed with a multi-stage architecture that mimics the optimization process of solving an inverse imaging problem in a data-driven fashion. The framework uses the intermediate enhanced result to estimate the illumination map, which is then employed in the unfolding process to produce a new enhanced result. Additionally, DCUNet includes a content-associated deep compensation module at each optimization stage to suppress noise and illumination map estimation errors. To properly mine and leverage the unique characteristics of LF images, this paper proposes a pseudo-explicit feature interaction module that comprehensively exploits redundant information in LF images. The experimental results on both simulated and real datasets demonstrate the superiority of our DCUNet over state-of-the-art methods, both qualitatively and quantitatively. Moreover, DCUNet preserves the essential geometric structure of enhanced LF images much better. The code will be publicly available at https://github.com/lyuxianqiang/LFLL-DCU.
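To make the unfolding idea concrete, here is a schematic PyTorch sketch of a multi-stage loop in which each stage estimates an illumination map from the current enhanced result and applies a learned compensation term. The module sizes and the Retinex-style update are illustrative assumptions, not DCUNet's published architecture:

```python
import torch
import torch.nn as nn

class Stage(nn.Module):
    """One unfolding stage: estimate an illumination map from the current
    enhanced estimate, divide it out Retinex-style, then apply a small
    compensation CNN to suppress noise and illumination-estimation errors."""
    def __init__(self, ch=3):
        super().__init__()
        self.illum = nn.Sequential(nn.Conv2d(ch, 16, 3, padding=1), nn.ReLU(),
                                   nn.Conv2d(16, ch, 3, padding=1), nn.Sigmoid())
        self.comp  = nn.Sequential(nn.Conv2d(2 * ch, 16, 3, padding=1), nn.ReLU(),
                                   nn.Conv2d(16, ch, 3, padding=1))

    def forward(self, x_low, x_enh):
        L = self.illum(x_enh).clamp(min=1e-3)        # illumination map in (0, 1]
        x_new = x_low / L                            # inverse of I = R * L
        return x_new + self.comp(torch.cat([x_new, x_enh], dim=1))  # residual compensation

class UnfoldingNet(nn.Module):
    def __init__(self, n_stages=3):
        super().__init__()
        self.stages = nn.ModuleList(Stage() for _ in range(n_stages))

    def forward(self, x_low):
        x_enh = x_low
        for stage in self.stages:        # data-driven mimic of iterative optimization
            x_enh = stage(x_low, x_enh)
        return x_enh

out = UnfoldingNet()(torch.rand(1, 3, 32, 32))
```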
TriDo-Former: A Triple-Domain Transformer for Direct PET Reconstruction from Low-Dose Sinograms
paper_authors: Jiaqi Cui, Pinxian Zeng, Xinyi Zeng, Peng Wang, Xi Wu, Jiliu Zhou, Yan Wang, Dinggang Shen
For: Improving the quality of low-dose positron emission tomography (PET) images in order to reduce radiation exposure.
* Methods: A transformer-based model, named TriDo-Former, is proposed to reconstruct standard-dose PET (SPET) images directly from low-dose PET (LPET) sinograms. The model comprises two cascaded networks: a sinogram enhancement transformer (SE-Former) for denoising the LPET sinograms, and a spatial-spectral reconstruction transformer (SSR-Former) for reconstructing SPET images from the denoised sinograms.
* Results: Compared with existing methods, TriDo-Former better preserves image details and edges and better captures global structures. Validations on a clinical dataset show that TriDo-Former performs better both qualitatively and quantitatively.
Abstract
To obtain high-quality positron emission tomography (PET) images while minimizing radiation exposure, various methods have been proposed for reconstructing standard-dose PET (SPET) images from low-dose PET (LPET) sinograms directly. However, current methods often neglect boundaries during sinogram-to-image reconstruction, resulting in high-frequency distortion in the frequency domain and diminished or fuzzy edges in the reconstructed images. Furthermore, the convolutional architectures, which are commonly used, lack the ability to model long-range non-local interactions, potentially leading to inaccurate representations of global structures. To alleviate these problems, we propose a transformer-based model that unites triple domains of sinogram, image, and frequency for direct PET reconstruction, namely TriDo-Former. Specifically, the TriDo-Former consists of two cascaded networks, i.e., a sinogram enhancement transformer (SE-Former) for denoising the input LPET sinograms and a spatial-spectral reconstruction transformer (SSR-Former) for reconstructing SPET images from the denoised sinograms. Different from the vanilla transformer that splits an image into 2D patches, our SE-Former, based specifically on the PET imaging mechanism, divides the sinogram into 1D projection view angles to maintain its inner structure while denoising, preventing the noise in the sinogram from propagating into the image domain. Moreover, to mitigate high-frequency distortion and improve reconstruction details, we integrate global frequency parsers (GFPs) into SSR-Former. The GFP serves as a learnable frequency filter that globally adjusts the frequency components in the frequency domain, forcing the network to restore high-frequency details resembling real SPET images. Validations on a clinical dataset demonstrate that our TriDo-Former outperforms the state-of-the-art methods qualitatively and quantitatively.
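The global frequency parser is described as a learnable filter that globally adjusts frequency components. A minimal PyTorch sketch of such a filter; the shape of the learned mask and its initialization are assumptions, not the paper's exact module:

```python
import torch
import torch.nn as nn

class GlobalFrequencyParser(nn.Module):
    """Learnable global filter in the frequency domain: FFT the feature map,
    re-weight every frequency bin with a learned mask, transform back."""
    def __init__(self, channels, height, width):
        super().__init__()
        # one real-valued weight per channel and frequency bin (rfft2 halves the last axis)
        self.weight = nn.Parameter(torch.ones(channels, height, width // 2 + 1))

    def forward(self, x):                               # x: (B, C, H, W)
        spec = torch.fft.rfft2(x, norm="ortho")         # complex spectrum
        spec = spec * self.weight                       # globally adjust frequency components
        return torch.fft.irfft2(spec, s=x.shape[-2:], norm="ortho")

y = GlobalFrequencyParser(8, 32, 32)(torch.rand(2, 8, 32, 32))
```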
Towards General and Fast Video Derain via Knowledge Distillation
results: Our developed general method achieves the best results in terms of both running speed and deraining effect.
Abstract
As a common natural weather condition, rain can obscure video frames and thus affect the performance of the visual system, so video deraining receives a lot of attention. In natural environments, rain has a wide variety of streak types, which increases the difficulty of the rain removal task. In this paper, we propose a Rain Review-based General video derain Network via knowledge distillation (named RRGNet) that handles different rain streak types with a single set of pre-trained weights. Specifically, we design a frame grouping-based encoder-decoder network that makes full use of the temporal information of the video. Further, we use the old-task model to guide the current model in learning new rain streak types while avoiding forgetting. To consolidate the network's deraining ability, we design a rain review module to play back data from old tasks for the current model. The experimental results show that our developed general method achieves the best results in terms of both running speed and deraining effect.
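The review-plus-distillation recipe can be sketched as a training step that combines a supervised loss on the new rain type with a distillation term on replayed old-task data. The loss choices and the weighting below are assumptions, not the published RRGNet losses:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def derain_step(student, teacher, rainy_new, clean_new, rainy_replay):
    """One training step: new-task supervision plus a review term where the
    frozen old-task model guides the student on replayed samples, so
    previously learned streak types are not forgotten."""
    loss_task = F.l1_loss(student(rainy_new), clean_new)     # new rain type

    with torch.no_grad():                                    # old model as teacher
        target_old = teacher(rainy_replay)
    loss_review = F.l1_loss(student(rainy_replay), target_old)

    return loss_task + 0.5 * loss_review                     # 0.5 is an arbitrary weight

# toy usage with stand-in single-layer "networks"
net, old = nn.Conv2d(3, 3, 3, padding=1), nn.Conv2d(3, 3, 3, padding=1)
loss = derain_step(net, old, torch.rand(2, 3, 16, 16),
                   torch.rand(2, 3, 16, 16), torch.rand(2, 3, 16, 16))
loss.backward()
```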
Geometric Learning-Based Transformer Network for Estimation of Segmentation Errors
results: The method is evaluated on a high-resolution micro-CT dataset, achieving a mean absolute error of ~0.042 and an accuracy of 79.53%, outperforming other graph neural networks (GNNs). In addition, vertex-normal prediction is proposed as a self-supervised pretext task to improve the network's overall performance.
Abstract
Many segmentation networks have been proposed for 3D volumetric segmentation of tumors and organs at risk. Hospitals and clinical institutions seek to accelerate and minimize the efforts of specialists in image segmentation. Still, in case of errors generated by these networks, clinicians would have to manually edit the generated segmentation maps. Given a 3D volume and its putative segmentation map, we propose an approach to identify and measure erroneous regions in the segmentation map. Our method can estimate error at any point or node in a 3D mesh generated from a possibly erroneous volumetric segmentation map, serving as a quality assurance tool. We propose a graph neural network-based transformer based on the Nodeformer architecture to measure and classify the segmentation errors at any point. We have evaluated our network on a high-resolution micro-CT dataset of the human inner-ear bony labyrinth structure by simulating erroneous 3D segmentation maps. Our network incorporates a convolutional encoder to compute node-centric features from the input micro-CT data, the Nodeformer to learn the latent graph embeddings, and a multi-layer perceptron (MLP) to compute and classify the node-wise errors. Our network achieves a mean absolute error of ~0.042 in estimating node-wise errors and an accuracy of 79.53% in classifying them, outperforming other graph neural networks (GNNs). We also put forth vertex-normal prediction as a custom pretext task for pre-training the CNN encoder to improve the network's overall performance. Qualitative analysis shows the efficiency of our network in correctly classifying errors and reducing misclassifications.
results: Our method significantly outperforms other state-of-the-art OOD detection methods.
Abstract
Dialect classification is used in a variety of applications, such as machine translation and speech recognition, to improve the overall performance of the system. In a real-world scenario, a deployed dialect classification model can encounter anomalous inputs that differ from the training data distribution, also called out-of-distribution (OOD) samples. Those OOD samples can lead to unexpected outputs, as dialects of those samples are unseen during model training. Out-of-distribution detection is a new research area that has received little attention in the context of dialect classification. To this end, we propose a simple yet effective unsupervised Mahalanobis-distance-based method to detect out-of-distribution samples. We utilize the latent embeddings from all intermediate layers of a wav2vec 2.0 transformer-based dialect classifier model for multi-task learning. Our proposed approach outperforms other state-of-the-art OOD detection methods significantly.
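The Mahalanobis-distance recipe itself is standard: fit class-conditional Gaussians with a shared covariance on in-distribution features, then score new samples by the distance to the nearest class mean. A minimal numpy sketch on a single embedding layer (the paper combines all intermediate wav2vec 2.0 layers, which is omitted here):

```python
import numpy as np

def fit_mahalanobis(features, labels):
    """Class-conditional means with a shared (tied) covariance matrix."""
    classes = np.unique(labels)
    means = {c: features[labels == c].mean(axis=0) for c in classes}
    centered = np.concatenate([features[labels == c] - means[c] for c in classes])
    cov = centered.T @ centered / len(centered)
    precision = np.linalg.pinv(cov)
    return means, precision

def ood_score(x, means, precision):
    """Negative squared distance to the closest class mean; lower = more OOD."""
    d = [(x - mu) @ precision @ (x - mu) for mu in means.values()]
    return -min(d)

rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 16))
labs = rng.integers(0, 4, size=200)
means, prec = fit_mahalanobis(feats, labs)
print(ood_score(rng.normal(size=16), means, prec))
```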
DiVa: An Iterative Framework to Harvest More Diverse and Valid Labels from User Comments for Music
paper_authors: Hongru Liang, Jingyao Liu, Yuanxin Xiang, Jiachen Du, Lanjun Zhou, Shushen Pan, Wenqiang Lei
for: automatic music labeling in an essential but under-explored setting
methods: uses pre-trained classifiers and a novel joint score function to harvest more diverse and valid labels from user comments
results: produces more diverse labels missed by the gold labels, superior to state-of-the-art solutions
Abstract
For sufficient music searching, it is vital to form a complete set of labels for each song. However, current solutions fail to resolve this, as they cannot produce diverse enough mappings to make up for the information missed by the gold labels. Based on the observation that such missing information may already be present in user comments, we propose to study automated music labeling in an essential but under-explored setting, where the model is required to harvest more diverse and valid labels from users' comments given limited gold labels. To this end, we design an iterative framework (DiVa) to harvest more $\underline{\text{Di}}$verse and $\underline{\text{Va}}$lid labels from user comments for music. The framework makes a classifier able to form complete sets of labels for songs via pseudo-labels inferred from pre-trained classifiers and a novel joint score function. The experiment on a densely annotated testing set reveals the superiority of DiVa over state-of-the-art solutions in producing more diverse labels missed by the gold labels. We hope our work can inspire future research on automated music labeling.
Induction Network: Audio-Visual Modality Gap-Bridging for Self-Supervised Sound Source Localization
results: Substantial experiments on the SoundNet-Flickr and VGG-Sound Source datasets demonstrate superior performance compared to other state-of-the-art methods in different challenging scenarios.
Abstract
Self-supervised sound source localization is usually challenged by modality inconsistency. In recent studies, contrastive-learning-based strategies have shown promise in establishing a consistent correspondence between audio and sound sources in visual scenarios. Unfortunately, insufficient attention to the influence of heterogeneity in the different modality features still prevents this scheme from being further improved, which motivates our work. In this study, an Induction Network is proposed to bridge the modality gap more effectively. By decoupling the gradients of the visual and audio modalities, the discriminative visual representations of sound sources can be learned with the designed Induction Vector in a bootstrap manner, which also enables the audio modality to be aligned with the visual modality consistently. In addition to a visual weighted contrastive loss, an adaptive threshold selection strategy is introduced to enhance the robustness of the Induction Network. Substantial experiments conducted on the SoundNet-Flickr and VGG-Sound Source datasets have demonstrated a superior performance compared to other state-of-the-art works in different challenging scenarios. The code is available at https://github.com/Tahy1/AVIN
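A loose sketch of the gradient-decoupling idea: detach() blocks gradients from crossing between the modalities while the audio branch aligns to the visual embeddings. The induction-vector term and the loss forms below are assumptions about a much more involved design:

```python
import torch
import torch.nn.functional as F

def decoupled_alignment_loss(v_feat, a_feat, induction_vec, tau=0.07):
    """v_feat, a_feat: (B, D) embeddings; induction_vec: (D,) bootstrap target.
    The visual branch is pulled toward the induction vector, while the audio
    branch aligns to *detached* visual embeddings, so no gradient flows across
    the modality boundary."""
    v = F.normalize(v_feat, dim=-1)
    a = F.normalize(a_feat, dim=-1)
    ind = F.normalize(induction_vec, dim=-1)

    # visual side: learn discriminative source representations against the induction vector
    loss_v = (1 - (v * ind).sum(-1)).mean()

    # audio side: InfoNCE-style alignment to frozen visual targets
    logits = a @ v.detach().t() / tau
    loss_a = F.cross_entropy(logits, torch.arange(len(a)))
    return loss_v + loss_a

loss = decoupled_alignment_loss(torch.rand(8, 64), torch.rand(8, 64), torch.rand(64))
```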
Speaker Recognition Using Isomorphic Graph Attention Network Based Pooling on Self-Supervised Representation
results: Experiments on speaker recognition tasks were conducted on the VoxCeleb1&2 datasets, comparing the proposed method with existing pooling approaches.
Abstract
The emergence of self-supervised representations (e.g., wav2vec 2.0) allows speaker-recognition approaches to process spoken signals through foundation models built on speech data. Nevertheless, effective fusion of these representations requires further investigation, due to the inclusion of fixed or sub-optimal temporal pooling strategies. Despite improved strategies considering graph learning and graph attention factors, non-injective aggregation still exists in these approaches, which may influence speaker recognition performance. In this regard, we propose a speaker recognition approach using an Isomorphic Graph ATtention network (IsoGAT) on self-supervised representations. The proposed approach contains three modules: representation learning, graph attention, and aggregation, jointly considering learning on the self-supervised representation and the IsoGAT. We then perform experiments for speaker recognition tasks on the VoxCeleb1\&2 datasets, with the experimental results demonstrating the recognition performance of the proposed approach compared with existing pooling approaches on the self-supervised representation.
results: Experimental results show that the density crop-guided semi-supervised detector improves object detection accuracy by more than 2% in COCO-style AP.
Abstract
One of the important bottlenecks in training modern object detectors is the need for labeled images where bounding box annotations have to be produced for each object present in the image. This bottleneck is further exacerbated in aerial images where the annotators have to label small objects often distributed in clusters on high-resolution images. Recently, the mean-teacher approach trained with pseudo-labels and weak-strong augmentation consistency has been gaining popularity for semi-supervised object detection. However, a direct adaptation of such semi-supervised detectors for aerial images where small clustered objects are often present, might not lead to optimal results. In this paper, we propose a density crop-guided semi-supervised detector that identifies the cluster of small objects during training and also exploits them to improve performance at inference. During training, image crops of clusters identified from labeled and unlabeled images are used to augment the training set, which in turn increases the chance of detecting small objects and creating good pseudo-labels for small objects on the unlabeled images. During inference, the detector is not only able to detect the objects of interest but also regions with a high density of small objects (density crops) so that detections from the input image and detections from image crops are combined, resulting in an overall more accurate object prediction, especially for small objects. Empirical studies on the popular benchmarks of VisDrone and DOTA datasets show the effectiveness of our density crop-guided semi-supervised detector with an average improvement of more than 2\% over the basic mean-teacher method in COCO style AP. Our code is available at: https://github.com/akhilpm/DroneSSOD.
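At inference, detections from the density crops must be shifted back into full-image coordinates and merged with the full-image detections. A simplified PyTorch sketch of that fusion; class-wise NMS and crop rescaling are omitted:

```python
import torch
from torchvision.ops import nms

def merge_crop_detections(full_dets, crop_dets, crop_boxes, iou_thr=0.5):
    """full_dets:  dict with 'boxes' (N, 4 xyxy) and 'scores' (N,)
    crop_dets:  list of such dicts, one per density crop
    crop_boxes: (K, 4) xyxy locations of the crops in the full image"""
    boxes, scores = [full_dets["boxes"]], [full_dets["scores"]]
    for det, cb in zip(crop_dets, crop_boxes):
        x1, y1 = float(cb[0]), float(cb[1])
        offset = torch.tensor([x1, y1, x1, y1], dtype=det["boxes"].dtype)
        boxes.append(det["boxes"] + offset)   # shift crop-space boxes to image space
        scores.append(det["scores"])
    boxes, scores = torch.cat(boxes), torch.cat(scores)
    keep = nms(boxes, scores, iou_thr)        # suppress duplicates across sources
    return boxes[keep], scores[keep]

full = {"boxes": torch.tensor([[0., 0., 10., 10.]]), "scores": torch.tensor([0.9])}
crop = [{"boxes": torch.tensor([[1., 1., 3., 3.]]), "scores": torch.tensor([0.8])}]
boxes, scores = merge_crop_detections(full, crop, torch.tensor([[50., 50., 100., 100.]]))
```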
An End-to-End Framework of Road User Detection, Tracking, and Prediction from Monocular Images
results: Extensive experiments show that ODTP achieves high-performance end-to-end trajectory prediction on the nuScenes dataset. DCENet++, with the enhanced dynamic maps, predicts more accurate trajectories and is more robust than other generative and deterministic trajectory prediction models.
Abstract
Perception, which involves multi-object detection and tracking, and trajectory prediction are two major tasks of autonomous driving. However, they are currently mostly studied separately, which results in most trajectory prediction modules being developed based on ground truth trajectories without taking into account that trajectories extracted from the detection and tracking modules in real-world scenarios are noisy. These noisy trajectories can have a significant impact on the performance of the trajectory predictor and can lead to serious prediction errors. In this paper, we build an end-to-end framework for detection, tracking, and trajectory prediction called ODTP (Online Detection, Tracking and Prediction). It adopts the state-of-the-art online multi-object tracking model, QD-3DT, for perception and trains the trajectory predictor, DCENet++, directly based on the detection results without purely relying on ground truth trajectories. We evaluate the performance of ODTP on the widely used nuScenes dataset for autonomous driving. Extensive experiments show that ODTP achieves high-performance end-to-end trajectory prediction. DCENet++, with the enhanced dynamic maps, predicts more accurate trajectories than its base model. It is also more robust when compared with other generative and deterministic trajectory prediction models trained on noisy detection results.
Feature Modulation Transformer: Cross-Refinement of Global Representation via High-Frequency Prior for Image Super-Resolution
For: The paper is written for improving the performance of single image super-resolution (SISR) using transformer-based methods.
* Methods: The paper proposes a new method called CRAFT, which integrates the strengths of both convolutional and transformer structures. CRAFT consists of three key components: the high-frequency enhancement residual block (HFERB), the shift rectangle window attention block (SRWAB), and the hybrid fusion block (HFB).
* Results: The paper reports that CRAFT outperforms state-of-the-art methods by up to 0.29dB while using fewer parameters, as demonstrated through experiments on multiple datasets.
Abstract
Transformer-based methods have exhibited remarkable potential in single image super-resolution (SISR) by effectively extracting long-range dependencies. However, most of the current research in this area has prioritized the design of transformer blocks to capture global information, while overlooking the importance of incorporating high-frequency priors, which we believe could be beneficial. In our study, we conducted a series of experiments and found that transformer structures are more adept at capturing low-frequency information, but have limited capacity in constructing high-frequency representations when compared to their convolutional counterparts. Our proposed solution, the cross-refinement adaptive feature modulation transformer (CRAFT), integrates the strengths of both convolutional and transformer structures. It comprises three key components: the high-frequency enhancement residual block (HFERB) for extracting high-frequency information, the shift rectangle window attention block (SRWAB) for capturing global information, and the hybrid fusion block (HFB) for refining the global representation. Our experiments on multiple datasets demonstrate that CRAFT outperforms state-of-the-art methods by up to 0.29dB while using fewer parameters. The source code will be made available at: https://github.com/AVC2-UESTC/CRAFT-SR.git.
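A crude way to expose the high-frequency prior that an HFERB-style branch could amplify is the residual between a feature map and its local average. This is an illustrative stand-in, since CRAFT's actual block is learned:

```python
import torch
import torch.nn.functional as F

def high_frequency_prior(x, k=3):
    """Residual high-pass: subtract a local average so edges and fine texture
    remain while smooth regions cancel. x: (B, C, H, W)."""
    low = F.avg_pool2d(x, kernel_size=k, stride=1, padding=k // 2)
    return x - low

hf = high_frequency_prior(torch.rand(1, 3, 32, 32))
```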
results: Experiments show that ROMTrack sets a new state-of-the-art on multiple benchmarks, indicating that the proposed framework is effective in improving the stability and performance of visual tracking.
Abstract
Object modeling has become a core part of recent tracking frameworks. Current popular trackers use Transformer attention to extract the template feature separately or interactively with the search region. However, separate template learning lacks communication between the template and search regions, which brings difficulty in extracting discriminative target-oriented features. On the other hand, interactive template learning produces hybrid template features, which may introduce potential distractors to the template via the cluttered search regions. To enjoy the merits of both methods, we propose a robust object modeling framework for visual tracking (ROMTrack), which simultaneously models the inherent template and the hybrid template features. As a result, harmful distractors can be suppressed by combining the inherent features of target objects with search regions' guidance. Target-related features can also be extracted using the hybrid template, thus resulting in a more robust object modeling framework. To further enhance robustness, we present novel variation tokens to depict the ever-changing appearance of target objects. Variation tokens are adaptable to object deformation and appearance variations, which can boost overall performance with negligible computation. Experiments show that our ROMTrack sets a new state-of-the-art on multiple benchmarks.
Do Diffusion Models Suffer Error Propagation? Theoretical Analysis and Consistency Regularization
paper_authors: Yangming Li, Zhaozhi Qian, Mihaela van der Schaar
for: This paper aims to address the error propagation issue in diffusion models, which can cause the cascade structure to magnify distributional mismatches.
methods: The paper proposes a regularization scheme to address error propagation in diffusion models, which is based on a consistency constraint that ensures the forward and backward processes have similar distributions.
results: The paper shows through theoretical analysis and experimental results that the proposed regularization scheme can effectively reduce error propagation in diffusion models, leading to improved performance on multiple image datasets.
Abstract
While diffusion models have achieved promising performances in data synthesis, they might suffer from error propagation because of their cascade structure, where the distributional mismatch spreads and magnifies through the chain of denoising modules. However, a rigorous analysis is needed, since many sequential models, such as Conditional Random Fields (CRFs), are free from error propagation. In this paper, we empirically and theoretically verify that diffusion models are indeed affected by error propagation and we then propose a regularization to address this problem. Our theoretical analysis reveals that the question can be reduced to whether every denoising module of the diffusion model is fault-tolerant. We derive insightful transition equations, indicating that the module can't recover from input errors and even propagates additional errors to the next module. Our analysis directly leads to a consistency regularization scheme for diffusion models, which explicitly reduces the distribution gap between forward and backward processes. We further introduce a bootstrapping algorithm to reduce the computation cost of the regularizer. Our experimental results on multiple image datasets show that our regularization effectively handles error propagation and significantly improves the performance of vanilla diffusion models.
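One way to read the proposed consistency regularization: corrupt the same clean sample to two timesteps and penalize disagreement between the two implied reconstructions, so per-module errors cannot drift apart along the chain. The sketch below is a guess at such a regularizer under the standard DDPM parameterization; the paper's exact constraint and its bootstrapped variant differ in detail:

```python
import torch

def x0_prediction(eps_model, x_t, t, alpha_bar):
    """Recover the model's implied clean sample under the DDPM parameterization
    x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps."""
    a = alpha_bar[t].view(-1, 1, 1, 1)
    return (x_t - (1 - a).sqrt() * eps_model(x_t, t)) / a.sqrt()

def consistency_reg(eps_model, x0, t1, t2, alpha_bar):
    """Penalize disagreement between x0 reconstructions at two timesteps."""
    eps = torch.randn_like(x0)
    def corrupt(t):
        a = alpha_bar[t].view(-1, 1, 1, 1)
        return a.sqrt() * x0 + (1 - a).sqrt() * eps
    rec1 = x0_prediction(eps_model, corrupt(t1), t1, alpha_bar)
    rec2 = x0_prediction(eps_model, corrupt(t2), t2, alpha_bar)
    return ((rec1 - rec2) ** 2).mean()

alpha_bar = torch.linspace(0.99, 0.01, 1000)
model = lambda x, t: torch.zeros_like(x)          # dummy noise predictor
t = torch.randint(1, 999, (4,))
loss = consistency_reg(model, torch.rand(4, 3, 16, 16), t, t + 1, alpha_bar)
```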
Deep Learning Model Transfer in Forest Mapping using Multi-source Satellite SAR and Optical Images
results: With transfer learning, SeUNet predictions achieved a root mean squared error (RMSE) of 2.70 m and an R$^2$ of 0.882, considerably more accurate than traditional benchmark methods. The authors expect such forest-specific deep learning model transfer to be suitable for other forest variables and other Earth observation data sources.
Abstract
Deep learning (DL) models are gaining popularity in forest variable prediction using Earth Observation images. However, in practical forest inventories, reference datasets are often represented by plot- or stand-level measurements, while high-quality representative wall-to-wall reference data for end-to-end training of DL models are rarely available. Transfer learning facilitates expansion of the use of deep learning models into areas with sub-optimal training data by allowing pretraining of the model in areas where high-quality teaching data are available. In this study, we perform a "model transfer" (or domain adaptation) of a pretrained DL model into a target area using plot-level measurements and compare performance versus other machine learning models. We use an earlier developed UNet based model (SeUNet) to demonstrate the approach on two distinct taiga sites with varying forest structure and composition. Multisource Earth Observation (EO) data are represented by a combination of Copernicus Sentinel-1 C-band SAR and Sentinel-2 multispectral images, JAXA ALOS-2 PALSAR-2 SAR mosaic and TanDEM-X bistatic interferometric radar data. The training study site is located in Finnish Lapland, while the target site is located in Southern Finland. By leveraging transfer learning, the prediction of SeUNet achieved root mean squared error (RMSE) of 2.70 m and R$^2$ of 0.882, considerably more accurate than traditional benchmark methods. We expect such forest-specific DL model transfer can be suitable also for other forest variables and other EO data sources that are sensitive to forest structure.
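Operationally, this kind of model transfer often amounts to fine-tuning a source-pretrained network on the sparse target references with a reduced learning rate on the pretrained body. A hedged PyTorch sketch; the attribute names and learning rates are placeholders, not values from the paper:

```python
import torch
import torch.nn as nn

def adapt_pretrained(model, head_attr="head", lr_body=1e-5, lr_head=1e-3):
    """Split parameters into pretrained body vs. prediction head and give the
    head a larger learning rate for target-site fine-tuning."""
    head_params, body_params = [], []
    for name, p in model.named_parameters():
        (head_params if name.startswith(head_attr) else body_params).append(p)
    return torch.optim.Adam([
        {"params": body_params, "lr": lr_body},
        {"params": head_params, "lr": lr_head},
    ])

class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Linear(8, 8)   # stands in for a pretrained SeUNet-style encoder
        self.head = nn.Linear(8, 1)   # regression head for the forest variable

opt = adapt_pretrained(ToyNet())
```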
Discrepancy-based Active Learning for Weakly Supervised Bleeding Segmentation in Wireless Capsule Endoscopy Images
results: Experimental results show that the proposed method outperforms state-of-the-art active learning methods and reaches performance comparable to models trained on fully annotated datasets, with only 10% of the training data labeled.
Abstract
Weakly supervised methods, such as those based on class activation maps (CAM), have been applied to achieve bleeding segmentation with low annotation effort in Wireless Capsule Endoscopy (WCE) images. However, the CAM labels tend to be extremely noisy, and there is an irreparable gap between CAM labels and ground truths for medical images. This paper proposes a new Discrepancy-basEd Active Learning (DEAL) approach to bridge the gap between CAMs and ground truths with a few annotations. Specifically, to liberate labor, we design a novel discrepancy decoder model and a CAMPUS (CAM, Pseudo-label and groUnd-truth Selection) criterion to replace the noisy CAMs with accurate model predictions and a few human labels. The discrepancy decoder model is trained with a unique scheme to generate standard, coarse and fine predictions. The CAMPUS criterion is then proposed to predict the gaps between CAMs and ground truths based on model divergence and CAM divergence. We evaluate our method on the WCE dataset, and results show that our method outperforms state-of-the-art active learning methods and reaches comparable performance to those trained with fully annotated datasets with only 10% of the training data labeled.
IDiff-Face: Synthetic-based Face Recognition through Fizzy Identity-Conditioned Diffusion Models
paper_authors: Fadi Boutros, Jonas Henry Grebe, Arjan Kuijper, Naser Damer
for: This paper aims to address the issue of limited intra-class diversity and cross-class discrimination in synthetic face datasets, which hinders the performance of face recognition models trained on these datasets.
methods: The proposed approach, IDiff-Face, uses conditional latent diffusion models to generate synthetic identities with realistic identity variations for face recognition training.
results: The proposed approach achieved state-of-the-art performance on the LFW benchmark, with an accuracy of 98.00%, significantly outperforming recent synthetic-based face recognition solutions (95.40%) and bridging the gap to authentic-based face recognition (99.82%).
Abstract
The availability of large-scale authentic face databases has been crucial to the significant advances made in face recognition research over the past decade. However, legal and ethical concerns led to the recent retraction of many of these databases by their creators, raising questions about the continuity of future face recognition research without one of its key resources. Synthetic datasets have emerged as a promising alternative to privacy-sensitive authentic data for face recognition development. However, recent synthetic datasets that are used to train face recognition models suffer either from limitations in intra-class diversity or cross-class (identity) discrimination, leading to less optimal accuracies, far away from the accuracies achieved by models trained on authentic data. This paper targets this issue by proposing IDiff-Face, a novel approach based on conditional latent diffusion models for synthetic identity generation with realistic identity variations for face recognition training. Through extensive evaluations, our proposed synthetic-based face recognition approach pushed the limits of state-of-the-art performance, achieving, for example, 98.00% accuracy on the Labeled Faces in the Wild (LFW) benchmark, far ahead of recent synthetic-based face recognition solutions (95.40%) and bridging the gap to authentic-based face recognition (99.82% accuracy).
Foreground Object Search by Distilling Composite Image Feature
results: For the FOS task, the proposed method outperforms previous approaches. The paper also contributes two new datasets (S-FOSD and R-FOSD) to facilitate further exploration of FOS.
Abstract
Foreground object search (FOS) aims to find compatible foreground objects for a given background image, producing realistic composite images. We observe that competitive retrieval performance could be achieved by using a discriminator to predict the compatibility of the composite image, but this approach has an unaffordable time cost. To this end, we propose a novel FOS method via distilling composite feature (DiscoFOS). Specifically, the abovementioned discriminator serves as the teacher network. The student network employs two encoders to extract the foreground feature and background feature. Their interaction output is enforced to match the composite image feature from the teacher network. Additionally, previous works did not release their datasets, so we contribute two datasets for the FOS task: the S-FOSD dataset with synthetic composite images and the R-FOSD dataset with real composite images. Extensive experiments on our two datasets demonstrate the superiority of the proposed method over previous approaches. The datasets and code are available at https://github.com/bcmi/Foreground-Object-Search-Dataset-FOSD.
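The distillation setup can be sketched as a two-encoder student whose interaction output is regressed onto the frozen discriminator's composite-image feature, so no composite needs to be rendered at retrieval time. The encoder widths and the interaction head below are illustrative, not the published DiscoFOS design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StudentFOS(nn.Module):
    """Separate foreground/background encoders plus a small interaction MLP
    whose output is trained to match the teacher's composite feature."""
    def __init__(self, dim=128):
        super().__init__()
        self.enc_fg = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, dim))
        self.enc_bg = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, dim))
        self.interact = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, fg, bg):
        return self.interact(torch.cat([self.enc_fg(fg), self.enc_bg(bg)], dim=-1))

def distill_loss(student, teacher_feat, fg, bg):
    # teacher_feat: precomputed composite-image feature from the frozen discriminator
    return F.mse_loss(student(fg, bg), teacher_feat)

s = StudentFOS()
loss = distill_loss(s, torch.rand(4, 128), torch.rand(4, 3, 32, 32), torch.rand(4, 3, 32, 32))
```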
Self-supervised Landmark Learning with Deformation Reconstruction and Cross-subject Consistency Objectives
results: Our method outperforms existing image-based and point-based approaches on an osteoarthritis progression prediction task.
Abstract
A Point Distribution Model (PDM) is the basis of a Statistical Shape Model (SSM) that relies on a set of landmark points to represent a shape and characterize the shape variation. In this work, we present a self-supervised approach to extract landmark points from a given registration model for the PDMs. Based on the assumption that the landmarks are the points that have the most influence on registration, existing works learn a point-based registration model with a small number of points to estimate the landmark points that influence the deformation the most. However, such approaches assume that the deformation can be captured by point-based registration and that quality landmarks can be learned solely with the deformation-capturing objective. We argue that data with complicated deformations cannot easily be modeled with point-based registration when only a limited number of points is used to extract influential landmark points. Further, landmark consistency is not assured in existing approaches. In contrast, we propose to extract landmarks based on a given registration model, which is tailored for the target data, so we can obtain more accurate correspondences. Secondly, to establish the anatomical consistency of the predicted landmarks, we introduce a landmark discovery loss to explicitly encourage the model to predict landmarks that are anatomically consistent across subjects. We conduct experiments on an osteoarthritis progression prediction task and show our method outperforms existing image-based and point-based approaches.
ACE-HetEM for ab initio Heterogeneous Cryo-EM 3D Reconstruction
results: On simulated datasets, ACE-HetEM achieves pose-estimation accuracy comparable to non-amortized methods and produces even better reconstruction resolution. ACE-HetEM is also applicable to real experimental datasets.
Abstract
Due to the extremely low signal-to-noise ratio (SNR) and unknown poses (projection angles and image translation) in cryo-EM experiments, reconstructing 3D structures from 2D images is very challenging. On top of these challenges, heterogeneous cryo-EM reconstruction also has an additional requirement: conformation classification. An emerging solution to this problem is called amortized inference, implemented using the autoencoder architecture or its variants. Instead of searching for the correct image-to-pose/conformation mapping for every image in the dataset as in non-amortized methods, amortized inference only needs to train an encoder that maps images to appropriate latent spaces representing poses or conformations. Unfortunately, standard amortized-inference-based methods with entangled latent spaces have difficulty learning the distribution of conformations and poses from cryo-EM images. In this paper, we propose an unsupervised deep learning architecture called "ACE-HetEM" based on amortized inference. To explicitly enforce the disentanglement of conformation classifications and pose estimations, we designed two alternating training tasks in our method: image-to-image task and pose-to-pose task. Results on simulated datasets show that ACE-HetEM has comparable accuracy in pose estimation and produces even better reconstruction resolution than non-amortized methods. Furthermore, we show that ACE-HetEM is also applicable to real experimental datasets.
Branches Mutual Promotion for End-to-End Weakly Supervised Semantic Segmentation
results: Experiments indicate that our method outperforms existing end-to-end weakly supervised segmentation methods.
Abstract
End-to-end weakly supervised semantic segmentation aims at optimizing a segmentation model in a single-stage training process based on only image annotations. Existing methods adopt an online-trained classification branch to provide pseudo annotations for supervising the segmentation branch. However, this strategy makes the classification branch dominate the whole concurrent training process, hindering these two branches from assisting each other. In our work, we treat these two branches equally by viewing them as diverse ways to generate the segmentation map, and add interactions on both their supervision and operation to achieve mutual promotion. For this purpose, a bidirectional supervision mechanism is elaborated to force the consistency between the outputs of these two branches. Thus, the segmentation branch can also give feedback to the classification branch to enhance the quality of localization seeds. Moreover, our method also designs interaction operations between these two branches to exchange their knowledge to assist each other. Experiments indicate our work outperforms existing end-to-end weakly supervised segmentation methods.
SelectNAdapt: Support Set Selection for Few-Shot Domain Adaptation
results: Experiments show that, compared to related methods and standard random selection, SelectNAdapt better adapts deep neural networks pre-trained on the source domain to the target domain, achieving promising results on three few-shot domain adaptation benchmarks for image recognition.
Abstract
Generalisation of deep neural networks becomes vulnerable when distribution shifts are encountered between train (source) and test (target) domain data. Few-shot domain adaptation mitigates this issue by adapting deep neural networks pre-trained on the source domain to the target domain using a randomly selected and annotated support set from the target domain. This paper argues that randomly selecting the support set can be further improved for effectively adapting the pre-trained source models to the target domain. Alternatively, we propose SelectNAdapt, an algorithm to curate the selection of the target domain samples, which are then annotated and included in the support set. In particular, for the K-shot adaptation problem, we first leverage self-supervision to learn features of the target domain data. Then, we propose a per-class clustering scheme of the learned target domain features and select K representative target samples using a distance-based scoring function. Finally, we make our selection setup practical by relying on pseudo-labels for clustering semantically similar target domain samples. Our experiments show promising results on three few-shot domain adaptation benchmarks for image recognition compared to related approaches and standard random selection.
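The per-class clustering selection can be sketched with scikit-learn: cluster each pseudo-class's self-supervised features and annotate the sample nearest to each centroid. The scoring function below (plain Euclidean distance to the centroid) is an assumption about the paper's distance-based score:

```python
import numpy as np
from sklearn.cluster import KMeans

def select_support_set(features, pseudo_labels, k_per_class):
    """Return indices of K representative target samples per (pseudo-)class:
    cluster each class's features, then take the sample closest to each centroid."""
    selected = []
    for c in np.unique(pseudo_labels):
        idx = np.where(pseudo_labels == c)[0]
        km = KMeans(n_clusters=min(k_per_class, len(idx)), n_init=10).fit(features[idx])
        for centroid in km.cluster_centers_:
            dists = np.linalg.norm(features[idx] - centroid, axis=1)
            selected.append(idx[np.argmin(dists)])
    return sorted(set(selected))   # samples to annotate for the support set

rng = np.random.default_rng(0)
feats = rng.normal(size=(300, 32))
pl = rng.integers(0, 5, size=300)
support = select_support_set(feats, pl, k_per_class=3)
```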
JEDI: Joint Expert Distillation in a Semi-Supervised Multi-Dataset Student-Teacher Scenario for Video Action Recognition
paper_authors: Lucian Bicsi, Bogdan Alexe, Radu Tudor Ionescu, Marius Leordeanu
For: The paper proposes JEDI, a multi-dataset semi-supervised learning method that improves the performance of individual, per-dataset student models.
* Methods: The method distills knowledge from multiple experts, each pre-trained on its own dataset, by concatenating the experts' feature representations to form teacher models. Student-teacher semi-supervised learning is then carried out jointly and end-to-end, improving learning efficiency and generalization capacity.
* Results: Validated on four video action recognition datasets, the results show that simultaneously considering all datasets within a unified semi-supervised setting yields significant improvements over the initial experts.
Abstract
We propose JEDI, a multi-dataset semi-supervised learning method, which efficiently combines knowledge from multiple experts, learned on different datasets, to train and improve the performance of individual, per dataset, student models. Our approach achieves this by addressing two important problems in current machine learning research: generalization across datasets and limitations of supervised training due to scarcity of labeled data. We start with an arbitrary number of experts, pretrained on their own specific dataset, which form the initial set of student models. The teachers are immediately derived by concatenating the feature representations from the penultimate layers of the students. We then train all models in a student-teacher semi-supervised learning scenario until convergence. In our efficient approach, student-teacher training is carried out jointly and end-to-end, showing that both students and teachers improve their generalization capacity during training. We validate our approach on four video action recognition datasets. By simultaneously considering all datasets within a unified semi-supervised setting, we demonstrate significant improvements over the initial experts.
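The teacher construction is simple to sketch: concatenate the penultimate features of all per-dataset students and classify with a joint head, whose pseudo-labels then supervise each student on unlabeled data. The `penultimate` method name and the toy backbones below are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyStudent(nn.Module):
    def __init__(self, dim=16, n_cls=5):
        super().__init__()
        self.backbone = nn.Linear(8, dim)
        self.cls = nn.Linear(dim, n_cls)
    def penultimate(self, x):               # assumed accessor for the feature layer
        return torch.relu(self.backbone(x))
    def forward(self, x):
        return self.cls(self.penultimate(x))

def teacher_logits(students, x, joint_head):
    """Teacher = joint head over the concatenated penultimate features."""
    feats = torch.cat([s.penultimate(x) for s in students], dim=-1)
    return joint_head(feats)

students = [ToyStudent(), ToyStudent()]
joint_head = nn.Linear(32, 5)
x_unlabeled = torch.rand(4, 8)

# semi-supervised step: teacher pseudo-labels supervise each student
with torch.no_grad():
    pseudo = teacher_logits(students, x_unlabeled, joint_head).argmax(-1)
loss = sum(F.cross_entropy(s(x_unlabeled), pseudo) for s in students)
```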
GeodesicPSIM: Predicting the Quality of Static Mesh with Texture Map via Geodesic Patch Similarity
paper_authors: Qi Yang, Joel Jung, Xiaozhong Xu, Shan Liu
for: GeodesicPSIM is proposed to accurately predict human perception quality for static meshes with texture maps.
methods: The paper uses a two-step patch cropping algorithm and a patch texture mapping module to refine the size of 1-hop geodesic patches and build the relationship between mesh geometry and color information. Three types of features are extracted to quantify the distortion.
results: GeodesicPSIM provides state-of-the-art performance in comparison with image-based, point-based, and video-based metrics on a newly created and challenging database. The paper also proves the robustness of GeodesicPSIM by introducing different settings of hyperparameters and exhibits the effectiveness of the three proposed features and the patch cropping algorithm through ablation studies.
Abstract
Static meshes with texture maps have attracted considerable attention in both industrial manufacturing and academic research, leading to an urgent requirement for effective and robust objective quality evaluation. However, current model-based static mesh quality metrics have obvious limitations: most of them only consider geometry information, while color information is ignored, and they have strict constraints on the meshes' geometrical topology. Other metrics, such as image-based and point-based metrics, are easily influenced by preprocessing algorithms, e.g., projection and sampling, hampering their ability to perform at their best. In this paper, we propose Geodesic Patch Similarity (GeodesicPSIM), a novel model-based metric to accurately predict human perception quality for static meshes. After selecting a group of keypoints, 1-hop geodesic patches are constructed based on both the reference and distorted meshes, cleaned by an effective mesh cleaning algorithm. A two-step patch cropping algorithm and a patch texture mapping module refine the size of the 1-hop geodesic patches and build the relationship between the mesh geometry and color information, resulting in the generation of 1-hop textured geodesic patches. Three types of features are extracted to quantify the distortion: patch color smoothness, patch discrete mean curvature, and patch pixel color average and variance. To the best of our knowledge, GeodesicPSIM is the first model-based metric especially designed for static meshes with texture maps. GeodesicPSIM provides state-of-the-art performance in comparison with image-based, point-based, and video-based metrics on a newly created and challenging database. We also prove the robustness of GeodesicPSIM by introducing different settings of hyperparameters. Ablation studies also exhibit the effectiveness of the three proposed features and the patch cropping algorithm.
Deep Learning-Based Prediction of Fractional Flow Reserve along the Coronary Artery
paper_authors: Nils Hampe, Sanne G. M. van Velzen, Jean-Paul Aben, Carlos Collet, Ivana Išgum
for: This paper aims to develop a deep learning-based method to predict the fractional flow reserve (FFR) along the coronary artery from coronary CT angiography (CCTA) scans, which can help doctors identify functionally significant coronary artery disease (CAD) and determine the best treatment strategy.
methods: The proposed method uses a combination of a variational autoencoder (VAE) and a convolutional neural network (CNN) to predict the FFR along the artery. The VAE is used to characterize the artery and generate an unsupervised artery encoding, while the CNN uses this encoding and other inputs to predict the FFR. The CNN is supervised by multiple loss functions, including a loss function inspired by the Earth Mover’s Distance (EMD) to predict the correct location of FFR drops and a histogram-based loss to explicitly supervise the slope of the FFR curve.
results: The proposed method was evaluated using eight-fold cross-validation on a dataset of 110 patients who underwent invasive FFR pullback measurement in 112 arteries. The resulting FFR curves showed good agreement with the reference, allowing the distinction between diffuse and focal CAD distributions in most cases. Quantitative evaluation yielded a mean absolute difference in the area under the FFR pullback curve (AUPC) of 1.7. The method has the potential to provide fast, accurate, and automatic prediction of FFR along the artery from CCTA, which may help doctors make more informed decisions about treatment strategies for CAD patients.
Abstract
Functionally significant coronary artery disease (CAD) is caused by plaque buildup in the coronary arteries, potentially leading to narrowing of the arterial lumen, i.e. coronary stenosis, that significantly obstructs blood flow to the myocardium. The current reference for establishing the presence of a functionally significant stenosis is invasive fractional flow reserve (FFR) measurement. To avoid invasive measurements, non-invasive prediction of FFR from coronary CT angiography (CCTA) has emerged. For this, machine learning approaches, characterized by fast inference, are increasingly developed. However, these methods predict a single FFR value per artery i.e. they don't provide information about the stenosis location or treatment strategy. We propose a deep learning-based method to predict the FFR along the artery from CCTA scans. This study includes CCTA images of 110 patients who underwent invasive FFR pullback measurement in 112 arteries. First, a multi-planar reconstruction (MPR) of the artery is fed to a variational autoencoder to characterize the artery, i.e. through the lumen area and unsupervised artery encodings. Thereafter, a convolutional neural network (CNN) predicts the FFR along the artery. The CNN is supervised by multiple loss functions, notably a loss function inspired by the Earth Mover's Distance (EMD) to predict the correct location of FFR drops and a histogram-based loss to explicitly supervise the slope of the FFR curve. To train and evaluate our model, eight-fold cross-validation was performed. The resulting FFR curves show good agreement with the reference allowing the distinction between diffuse and focal CAD distributions in most cases. Quantitative evaluation yielded a mean absolute difference in the area under the FFR pullback curve (AUPC) of 1.7. The method may pave the way towards fast, accurate, automatic prediction of FFR along the artery from CCTA.
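For one-dimensional profiles, the Earth Mover's Distance reduces to the L1 distance between cumulative sums, which is the intuition behind a loss that penalizes FFR drops at wrong locations along the artery. A hedged sketch follows; the normalization and input parameterization are assumptions, not the paper's exact loss.

```python
import torch

# Hedged sketch of an Earth Mover's Distance-style loss for 1D curves.
# For 1D distributions, EMD equals the L1 distance between their CDFs,
# which pushes predicted FFR drops toward the correct centerline locations.

def emd_loss_1d(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # pred, target: (batch, num_points) non-negative "drop" profiles,
    # e.g. negative discrete derivatives of the FFR curve (an assumption).
    pred = pred / (pred.sum(dim=1, keepdim=True) + 1e-8)
    target = target / (target.sum(dim=1, keepdim=True) + 1e-8)
    cdf_pred = torch.cumsum(pred, dim=1)
    cdf_target = torch.cumsum(target, dim=1)
    return (cdf_pred - cdf_target).abs().mean()

# Example: drop profiles along 100 centerline points for a batch of 4 arteries.
ffr_pred = torch.rand(4, 100)
ffr_ref = torch.rand(4, 100)
print(emd_loss_1d(ffr_pred, ffr_ref))
```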
Cross-view Semantic Alignment for Livestreaming Product Recognition
results: On the LPR4M dataset, the method achieves state-of-the-art performance. The paper also provides evaluation and analysis of the multimodal dataset and studies the impact of different views and datasets.
Abstract
Live commerce is the act of selling products online through live streaming. The customer's diverse demands for online products introduce more challenges to Livestreaming Product Recognition. Previous works have primarily focused on fashion clothing data or utilize single-modal input, which does not reflect the real-world scenario where multimodal data from various categories are present. In this paper, we present LPR4M, a large-scale multimodal dataset that covers 34 categories, comprises 3 modalities (image, video, and text), and is 50x larger than the largest publicly available dataset. LPR4M contains diverse videos and noise modality pairs while exhibiting a long-tailed distribution, resembling real-world problems. Moreover, a cRoss-vIew semantiC alignmEnt (RICE) model is proposed to learn discriminative instance features from the image and video views of the products. This is achieved through instance-level contrastive learning and cross-view patch-level feature propagation. A novel Patch Feature Reconstruction loss is proposed to penalize the semantic misalignment between cross-view patches. Extensive experiments demonstrate the effectiveness of RICE and provide insights into the importance of dataset diversity and expressivity. The dataset and code are available at https://github.com/adxcreative/RICE
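Instance-level contrastive learning between image-view and video-view embeddings can be sketched with a standard symmetric InfoNCE loss, as below; the temperature and embedding dimension are assumptions rather than the paper's settings.

```python
import torch
import torch.nn.functional as F

# Minimal instance-level contrastive loss between image-view and video-view
# embeddings of the same product, in the spirit of RICE's cross-view training.

def info_nce(img_emb, vid_emb, temperature=0.07):
    img_emb = F.normalize(img_emb, dim=-1)
    vid_emb = F.normalize(vid_emb, dim=-1)
    logits = img_emb @ vid_emb.t() / temperature    # (B, B) similarities
    targets = torch.arange(img_emb.size(0))         # matching pairs on diagonal
    # Symmetric loss: image->video and video->image retrieval directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

img = torch.randn(16, 512)   # one image-view embedding per product instance
vid = torch.randn(16, 512)   # the paired video-view embedding
print(info_nce(img, vid))
```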
StableVQA: A Deep No-Reference Quality Assessment Model for Video Stability
results: Experimental results show that the StableVQA model correlates more strongly with subjective opinions than existing VQA-S models and generic VQA models.
Abstract
Video shakiness is an unpleasant distortion of User Generated Content (UGC) videos, which is usually caused by the unstable hold of cameras. In recent years, many video stabilization algorithms have been proposed, yet no specific and accurate metric enables comprehensively evaluating the stability of videos. Indeed, most existing quality assessment models evaluate video quality as a whole without specifically taking the subjective experience of video stability into consideration. Therefore, these models cannot measure the video stability explicitly and precisely when severe shakes are present. In addition, there is no large-scale video database in public that includes various degrees of shaky videos with the corresponding subjective scores available, which hinders the development of Video Quality Assessment for Stability (VQA-S). To this end, we build a new database named StableDB that contains 1,952 diversely-shaky UGC videos, where each video has a Mean Opinion Score (MOS) on the degree of video stability rated by 34 subjects. Moreover, we elaborately design a novel VQA-S model named StableVQA, which consists of three feature extractors to acquire the optical flow, semantic, and blur features respectively, and a regression layer to predict the final stability score. Extensive experiments demonstrate that the StableVQA achieves a higher correlation with subjective opinions than the existing VQA-S models and generic VQA models. The database and codes are available at https://github.com/QMME/StableVQA.
Histogram-guided Video Colorization Structure with Spatial-Temporal Connection
paper_authors: Zheyuan Liu, Pan Mu, Hanning Xu, Cong Bai
for: Video colorization, aiming at obtaining colorful and plausible results from grayish frames.
methods: Uses a Histogram-guided Video Colorization with Spatial-Temporal connection structure (named ST-HVC) that combines histogram and flow features, together with a combination scheme to handle blur and artifacts.
results: Compared with several state-of-the-art image- and video-based methods, the approach shows excellent quantitative and qualitative performance on two video datasets.
Abstract
Video colorization, aiming at obtaining colorful and plausible results from grayish frames, has aroused a lot of interest recently. Nevertheless, how to maintain temporal consistency while keeping the quality of colorized results remains challenging. To tackle the above problems, we present a Histogram-guided Video Colorization with Spatial-Temporal connection structure (named ST-HVC). To fully exploit the chroma and motion information, the joint flow and histogram module is tailored to integrate the histogram and flow features. To manage blur and artifacts, we design a combination scheme attending to temporal detail and flow feature combination. We further recombine the histogram, flow and sharpness features via a U-shape network. Extensive comparisons are conducted with several state-of-the-art image and video-based methods, demonstrating that the developed method achieves excellent performance both quantitatively and qualitatively on two video datasets.
Transmission and Color-guided Network for Underwater Image Enhancement
results: Extensive experiments were conducted on multiple benchmark datasets, achieving state-of-the-art performance.
Abstract
In recent years, with the continuous development of the marine industry, underwater image enhancement has attracted plenty of attention. Unfortunately, the propagation of light in water will be absorbed by water bodies and scattered by suspended particles, resulting in color deviation and low contrast. To solve these two problems, we propose an Adaptive Transmission and Dynamic Color guided network (named ATDCnet) for underwater image enhancement. In particular, to exploit the knowledge of physics, we design an Adaptive Transmission-directed Module (ATM) to better guide the network. To deal with the color deviation problem, we design a Dynamic Color-guided Module (DCM) to post-process the enhanced image color. Further, we design an Encoder-Decoder-based Compensation (EDC) structure with attention and a multi-stage feature fusion mechanism to perform color restoration and contrast enhancement simultaneously. Extensive experiments demonstrate the state-of-the-art performance of the ATDCnet on multiple benchmark datasets.
Deep Generative Networks for Heterogeneous Augmentation of Cranial Defects
for: This study aims to improve the automation of personalized cranial implant design using deep learning techniques.
methods: Three deep generative models are used to augment the dataset: a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP), a WGAN-GP hybrid with Variational Autoencoder pretraining (VAE/WGAN-GP), and an Introspective Variational Autoencoder (IntroVAE).
results: Generating a large variety of defective skulls with compatible defects substantially improves the automatic design of personalized cranial implants. The study shows that the generated skulls improve defect segmentation accuracy and provide additional cases for real-world studies.
Abstract
The design of personalized cranial implants is a challenging and tremendous task that has become a hot topic in terms of process automation with the use of deep learning techniques. The main challenge is associated with the high diversity of possible cranial defects. The lack of appropriate data sources negatively influences the data-driven nature of deep learning algorithms. Hence, one of the possible solutions to overcome this problem is to rely on synthetic data. In this work, we propose three volumetric variations of deep generative models to augment the dataset by generating synthetic skulls, i.e. Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP), WGAN-GP hybrid with Variational Autoencoder pretraining (VAE/WGAN-GP) and Introspective Variational Autoencoder (IntroVAE). We show that it is possible to generate tens of thousands of defective skulls with compatible defects that achieve a trade-off between defect heterogeneity and the realistic shape of the skull. We evaluate the obtained synthetic data quantitatively by defect segmentation with the use of V-Net and qualitatively by their latent space exploration. We show that the synthetically generated skulls highly improve the segmentation process compared to using only the original unaugmented data. The generated skulls may improve the automatic design of personalized cranial implants for real medical cases.
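The WGAN-GP variant relies on the standard gradient penalty of Gulrajani et al. (2017); a minimal sketch on toy skull volumes follows, with a stand-in linear critic instead of the paper's volumetric networks.

```python
import torch

# Standard WGAN-GP gradient penalty, one of the three generative variants
# the paper augments skull data with. The critic below is a stand-in; the
# paper uses volumetric (3D) networks on binary skull volumes.

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    eps = torch.rand(real.size(0), 1, 1, 1, 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads, = torch.autograd.grad(
        outputs=scores.sum(), inputs=interp, create_graph=True)
    grad_norm = grads.flatten(1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1) ** 2).mean()

critic = torch.nn.Sequential(
    torch.nn.Flatten(), torch.nn.Linear(32 ** 3, 1))   # toy volumetric critic
real_skulls = torch.rand(4, 1, 32, 32, 32)             # stand-in skull volumes
fake_skulls = torch.rand(4, 1, 32, 32, 32)
print(gradient_penalty(critic, real_skulls, fake_skulls))
```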
Learning multi-domain feature relation for visible and Long-wave Infrared image patch matching
results: The study finds that the MD-FRN network improves cross-spectral image patch matching performance, especially under significant appearance variations across different modalities.
Abstract
Recently, learning-based algorithms have achieved promising performance on cross-spectral image patch matching, which, however, is still far from satisfactory for practical application. On the one hand, the lack of a large-scale dataset with diverse scenes hampers further improvement of learning-based algorithms, whose performance and generalization rely heavily on dataset size and diversity. On the other hand, more emphasis has been put on feature relations in the spatial domain, whereas the scale dependency between features has often been ignored, leading to performance degradation especially when encountering significant appearance variations for cross-spectral patches. To address these issues, we publish, to the best of our knowledge, the largest visible and Long-wave Infrared (LWIR) image patch matching dataset, termed VL-CMIM, which contains 1300 pairs of strictly aligned visible and LWIR images and over 2 million patch pairs covering diverse scenes such as asteroid, field, country, build, street and water. In addition, a multi-domain feature relation learning network (MD-FRN) is proposed. Taking as input the features extracted from a four-branch network, feature relations in both the spatial and scale domains are learned via a spatial correlation module (SCM) and a multi-scale adaptive aggregation module (MSAG), respectively. To further aggregate the multi-domain relations, a deep domain interactive mechanism (DIM) is applied, where the learnt spatial-relation and scale-relation features are exchanged and further input into MSAG and SCM. This mechanism allows our model to learn interactive cross-domain feature relations, leading to improved robustness to significant appearance changes due to different modalities.
Tracking Players in a Badminton Court by Two Cameras
results: The method alleviates player occlusion and overlap, providing player trajectory tracking and multi-angle analysis. The system offers information on player positions and movement postures and can serve as a coaching or self-training tool to help players improve their game strategies.
Abstract
This study proposes a simple method for multi-object tracking (MOT) of players in a badminton court. We leverage two off-the-shelf cameras, one on the top of the court and the other on the side of the court. The one on the top is to track players' trajectories, while the one on the side is to analyze the pixel features of players. By computing the correlations between adjacent frames and engaging the information of the two cameras, MOT of badminton players is obtained. This two-camera approach addresses the challenge of player occlusion and overlapping in a badminton court, providing player trajectory tracking and multi-angle analysis. The presented system offers insights into the positions and movements of badminton players, thus serving as a coaching or self-training tool for badminton players to improve their gaming strategies.
InstantAvatar: Efficient 3D Head Reconstruction via Surface Rendering
paper_authors: Antonio Canela, Pol Caselles, Ibrar Malik, Eduard Ramon, Jaime García, Jordi Sánchez-Riera, Gil Triginer, Francesc Moreno-Noguer
for: Fast generation of full-head avatars from few images (down to just one).
methods: Combines a voxel-grid neural field representation with a surface renderer, and uses a novel statistical model to learn a prior distribution over 3D head signed distance functions.
results: Achieves 3D head reconstructions with accuracy comparable to the state-of-the-art, with a 100x speed-up.
Abstract
Recent advances in full-head reconstruction have been obtained by optimizing a neural field through differentiable surface or volume rendering to represent a single scene. While these techniques achieve an unprecedented accuracy, they take several minutes, or even hours, due to the expensive optimization process required. In this work, we introduce InstantAvatar, a method that recovers full-head avatars from few images (down to just one) in a few seconds on commodity hardware. In order to speed up the reconstruction process, we propose a system that combines, for the first time, a voxel-grid neural field representation with a surface renderer. Notably, a naive combination of these two techniques leads to unstable optimizations that do not converge to valid solutions. In order to overcome this limitation, we present a novel statistical model that learns a prior distribution over 3D head signed distance functions using a voxel-grid based architecture. The use of this prior model, in combination with other design choices, results into a system that achieves 3D head reconstructions with comparable accuracy as the state-of-the-art with a 100x speed-up.
Are Sex-based Physiological Differences the Cause of Gender Bias for Chest X-ray Diagnosis?
results: The study finds that performance differences between sexes are driven by dataset-specific factors rather than dataset imbalance alone. Moreover, relative performance differences between male and female groups vary strongly across diseases and datasets. Finally, cropping out breast tissue does not resolve the observed performance gaps.
Abstract
While many studies have assessed the fairness of AI algorithms in the medical field, the causes of differences in prediction performance are often unknown. This lack of knowledge about the causes of bias hampers the efficacy of bias mitigation, as evidenced by the fact that simple dataset balancing still often performs best in reducing performance gaps but is unable to resolve all performance differences. In this work, we investigate the causes of gender bias in machine learning-based chest X-ray diagnosis. In particular, we explore the hypothesis that breast tissue leads to underexposure of the lungs and causes lower model performance. Methodologically, we propose a new sampling method which addresses the highly skewed distribution of recordings per patient in two widely used public datasets, while at the same time reducing the impact of label errors. Our comprehensive analysis of gender differences across diseases, datasets, and gender representations in the training set shows that dataset imbalance is not the sole cause of performance differences. Moreover, relative group performance differs strongly between datasets, indicating important dataset-specific factors influencing male/female group performance. Finally, we investigate the effect of breast tissue more specifically, by cropping out the breasts from recordings, finding that this does not resolve the observed performance gaps. In conclusion, our results indicate that dataset-specific factors, not fundamental physiological differences, are the main drivers of male--female performance gaps in chest X-ray analyses on widely used NIH and CheXpert Dataset.
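A patient-level sampler of the kind alluded to above can be sketched in a few lines: drawing one image per patient per epoch prevents frequently scanned patients from dominating. This is an illustrative simplification; the paper's sampler additionally reduces the impact of label errors, which is omitted here.

```python
import random
from collections import defaultdict

# Hedged sketch of patient-level sampling for chest X-ray datasets with a
# highly skewed number of recordings per patient: draw one image per
# patient per epoch so heavily-scanned patients do not dominate training.

def one_image_per_patient(records, seed=0):
    """records: list of (patient_id, image_path) tuples."""
    by_patient = defaultdict(list)
    for patient_id, image_path in records:
        by_patient[patient_id].append(image_path)
    rng = random.Random(seed)
    return [rng.choice(paths) for paths in by_patient.values()]

records = [("p1", "a.png"), ("p1", "b.png"), ("p1", "c.png"),
           ("p2", "d.png"), ("p3", "e.png")]
print(one_image_per_patient(records))   # one image for each of p1, p2, p3
```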
View while Moving: Efficient Video Recognition in Long-untrimmed Videos
results: Experimental results show that the method achieves state-of-the-art performance on both long-untrimmed and short-trimmed video recognition tasks, while offering new efficiency-accuracy trade-offs.
Abstract
Recent adaptive methods for efficient video recognition mostly follow the two-stage paradigm of "preview-then-recognition" and have achieved great success on multiple video benchmarks. However, this two-stage paradigm involves two visits of raw frames from coarse-grained to fine-grained during inference (cannot be parallelized), and the captured spatiotemporal features cannot be reused in the second stage (due to varying granularity), being not friendly to efficiency and computation optimization. To this end, inspired by human cognition, we propose a novel recognition paradigm of "View while Moving" for efficient long-untrimmed video recognition. In contrast to the two-stage paradigm, our paradigm only needs to access the raw frame once. The two phases of coarse-grained sampling and fine-grained recognition are combined into unified spatiotemporal modeling, showing great performance. Moreover, we investigate the properties of semantic units in video and propose a hierarchical mechanism to efficiently capture and reason about the unit-level and video-level temporal semantics in long-untrimmed videos respectively. Extensive experiments on both long-untrimmed and short-trimmed videos demonstrate that our approach outperforms state-of-the-art methods in terms of accuracy as well as efficiency, yielding new efficiency and accuracy trade-offs for video spatiotemporal modeling.
VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer
results: Experimental results show that the proposed method can extract expressive facial styles from arbitrary video prompts in a zero-shot manner and transfer them onto a personalized image renderer, yielding more vivid, authentic, and expressive talking avatars.
Abstract
Current talking face generation methods mainly focus on speech-lip synchronization. However, insufficient investigation on the facial talking style leads to a lifeless and monotonous avatar. Most previous works fail to imitate expressive styles from arbitrary video prompts and ensure the authenticity of the generated video. This paper proposes an unsupervised variational style transfer model (VAST) to vivify the neutral photo-realistic avatars. Our model consists of three key components: a style encoder that extracts facial style representations from the given video prompts; a hybrid facial expression decoder to model accurate speech-related movements; a variational style enhancer that enhances the style space to be highly expressive and meaningful. With our essential designs on facial style learning, our model is able to flexibly capture the expressive facial style from arbitrary video prompts and transfer it onto a personalized image renderer in a zero-shot manner. Experimental results demonstrate the proposed approach contributes to a more vivid talking avatar with higher authenticity and richer expressiveness.
MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation
results: Achieves significant improvements on multiple zero-shot semantic segmentation benchmarks; compared with GroupViT, MixReorg improves mIoU by 5.0%, 6.2%, 2.5%, and 3.4% on PASCAL VOC2012, PASCAL Context, MS COCO, and ADE20K, respectively.
Abstract
Recently, semantic segmentation models trained with image-level text supervision have shown promising results in challenging open-world scenarios. However, these models still face difficulties in learning fine-grained semantic alignment at the pixel level and predicting accurate object masks. To address this issue, we propose MixReorg, a novel and straightforward pre-training paradigm for semantic segmentation that enhances a model's ability to reorganize patches mixed across images, exploring both local visual relevance and global semantic coherence. Our approach involves generating fine-grained patch-text pairs data by mixing image patches while preserving the correspondence between patches and text. The model is then trained to minimize the segmentation loss of the mixed images and the two contrastive losses of the original and restored features. With MixReorg as a mask learner, conventional text-supervised semantic segmentation models can achieve highly generalizable pixel-semantic alignment ability, which is crucial for open-world segmentation. After training with large-scale image-text data, MixReorg models can be applied directly to segment visual objects of arbitrary categories, without the need for further fine-tuning. Our proposed framework demonstrates strong performance on popular zero-shot semantic segmentation benchmarks, outperforming GroupViT by significant margins of 5.0%, 6.2%, 2.5%, and 3.4% mIoU on PASCAL VOC2012, PASCAL Context, MS COCO, and ADE20K, respectively.
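The core mixing step can be illustrated as follows: same-position patches are shuffled across a batch while a source index records which image (and hence which caption) each patch came from. Patch size and tensor shapes are assumptions for the sketch, not the paper's configuration.

```python
import torch

# Toy version of the mixing step: shuffle same-position patches across a
# batch of images while recording which source image each patch came from,
# preserving the patch-text correspondence needed for training.

def mix_patches(images, patch=16):
    b, c, h, w = images.shape
    gh, gw = h // patch, w // patch
    # (B, gh*gw, C, patch, patch): grid of patches per image.
    patches = (images
               .unfold(2, patch, patch).unfold(3, patch, patch)
               .permute(0, 2, 3, 1, 4, 5)
               .reshape(b, gh * gw, c, patch, patch))
    # src[i, j]: index of the image that patch j of mixed image i comes from.
    src = torch.stack([torch.randperm(b) for _ in range(gh * gw)], dim=1)
    mixed = patches[src, torch.arange(gh * gw)]
    return mixed, src

images = torch.randn(4, 3, 64, 64)
mixed, src = mix_patches(images)
print(mixed.shape, src.shape)   # torch.Size([4, 16, 3, 16, 16]) torch.Size([4, 16])
```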
Seeing in Flowing: Adapting CLIP for Action Recognition with Motion Prompts Learning
results: Extensive experiments on the HMDB-51, UCF-101, and Kinetics-400 datasets show that the method outperforms most existing state-of-the-art methods under "few-shot" and "zero-shot" training, and achieves competitive "closed-set" performance with very few trainable parameters and little additional computational cost.
Abstract
The Contrastive Language-Image Pre-training (CLIP) has recently shown remarkable generalization on "zero-shot" training and has applied to many downstream tasks. We explore the adaptation of CLIP to achieve a more efficient and generalized action recognition method. We propose that the key lies in explicitly modeling the motion cues flowing in video frames. To that end, we design a two-stream motion modeling block to capture motion and spatial information at the same time. And then, the obtained motion cues are utilized to drive a dynamic prompts learner to generate motion-aware prompts, which contain much semantic information concerning human actions. In addition, we propose a multimodal communication block to achieve a collaborative learning and further improve the performance. We conduct extensive experiments on HMDB-51, UCF-101, and Kinetics-400 datasets. Our method outperforms most existing state-of-the-art methods by a significant margin on "few-shot" and "zero-shot" training. We also achieve competitive performance on "closed-set" training with extremely few trainable parameters and additional computational costs.
paper_authors: Muyu Xu, Fangneng Zhan, Jiahui Zhang, Yingchen Yu, Xiaoqin Zhang, Christian Theobalt, Ling Shao, Shijian Lu
for: Novel view synthesis via implicit scene representation, which usually suffers from degraded rendering quality on new scenes.
methods: Integrates the Multi-View Stereo (MVS) technique into NeRF, which previously still entailed some fine-tuning on new scenes.
results: By integrating wavelet frequency decomposition into MVS and NeRF, high-quality and generalizable synthesis is achieved without per-scene fine-tuning.
Abstract
Neural Radiance Field (NeRF) has shown impressive performance in novel view synthesis via implicit scene representation. However, it usually suffers from poor scalability as requiring densely sampled images for each new scene. Several studies have attempted to mitigate this problem by integrating Multi-View Stereo (MVS) technique into NeRF while they still entail a cumbersome fine-tuning process for new scenes. Notably, the rendering quality will drop severely without this fine-tuning process and the errors mainly appear around the high-frequency features. In the light of this observation, we design WaveNeRF, which integrates wavelet frequency decomposition into MVS and NeRF to achieve generalizable yet high-quality synthesis without any per-scene optimization. To preserve high-frequency information when generating 3D feature volumes, WaveNeRF builds Multi-View Stereo in the Wavelet domain by integrating the discrete wavelet transform into the classical cascade MVS, which disentangles high-frequency information explicitly. With that, disentangled frequency features can be injected into classic NeRF via a novel hybrid neural renderer to yield faithful high-frequency details, and an intuitive frequency-guided sampling strategy can be designed to suppress artifacts around high-frequency regions. Extensive experiments over three widely studied benchmarks show that WaveNeRF achieves superior generalizable radiance field modeling when only given three images as input.
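The frequency disentanglement at the heart of WaveNeRF can be illustrated with a standard 2D discrete wavelet transform, which splits a signal into a low-frequency approximation and explicit high-frequency detail bands without losing information; the 'haar' wavelet here is an assumption, not necessarily the paper's choice.

```python
import numpy as np
import pywt

# Sketch of wavelet frequency decomposition: a DWT splits a feature map
# (here a toy image) into a low-frequency approximation and explicit
# high-frequency detail bands that separate branches can process.

image = np.random.rand(64, 64).astype(np.float32)
low, (horiz, vert, diag) = pywt.dwt2(image, "haar")

print(low.shape, horiz.shape)   # (32, 32) (32, 32): half-resolution subbands

# Perfect reconstruction shows no information is lost by the split.
recon = pywt.idwt2((low, (horiz, vert, diag)), "haar")
print(np.allclose(recon, image, atol=1e-5))   # True
```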
HyperCoil-Recon: A Hypernetwork-based Adaptive Coil Configuration Task Switching Network for MRI Reconstruction
paper_authors: Sriprabha Ramanarayanan, Mohammad Al Fahim, Rahul G. S., Amrit Kumar Jethi, Keerthi Ram, Mohanasankar Sivaprakasam
for: HyperCoil-Recon is proposed to address the challenge of training deep learning-based image reconstruction models for multi-coil MRI reconstruction, which requires adapting to diverse coil configurations.
methods: The approach uses a hypernetwork-based coil configuration task-switching network, which encodes varying configurations of the number of coils in a multi-tasking perspective. The hypernetworks infer and embed task-specific weights into the reconstruction network, leveraging contextual knowledge of common and varying image features among the various fields-of-view of the coils.
results: The approach adapts on the fly to various unseen configurations up to 32 coils when trained on lower numbers (i.e. 7 to 11) of randomly varying coils, and to 120 deviated unseen configurations when trained on 18 configurations in a single model. It matches the performance of coil configuration-specific models and outperforms configuration-invariant models with improvement margins of around 1 dB / 0.03 and 0.3 dB / 0.02 in PSNR / SSIM for knee and brain data.
Abstract
Parallel imaging, a fast MRI technique, involves dynamic adjustments based on the configuration i.e. number, positioning, and sensitivity of the coils with respect to the anatomy under study. Conventional deep learning-based image reconstruction models have to be trained or fine-tuned for each configuration, posing a barrier to clinical translation, given the lack of computational resources and machine learning expertise for clinicians to train models at deployment. Joint training on diverse datasets learns a single weight set that might underfit to deviated configurations. We propose, HyperCoil-Recon, a hypernetwork-based coil configuration task-switching network for multi-coil MRI reconstruction that encodes varying configurations of the numbers of coils in a multi-tasking perspective, posing each configuration as a task. The hypernetworks infer and embed task-specific weights into the reconstruction network, 1) effectively utilizing the contextual knowledge of common and varying image features among the various fields-of-view of the coils, and 2) enabling generality to unseen configurations at test time. Experiments reveal that our approach 1) adapts on the fly to various unseen configurations up to 32 coils when trained on lower numbers (i.e. 7 to 11) of randomly varying coils, and to 120 deviated unseen configurations when trained on 18 configurations in a single model, 2) matches the performance of coil configuration-specific models, and 3) outperforms configuration-invariant models with improvement margins of around 1 dB / 0.03 and 0.3 dB / 0.02 in PSNR / SSIM for knee and brain data. Our code is available at https://github.com/sriprabhar/HyperCoil-Recon
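A minimal hypernetwork sketch, assuming the coil configuration is summarized by the coil count alone: a configuration embedding is mapped to the weights of one convolution in the reconstruction network, so the same model switches tasks without retraining. All sizes are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

# Minimal hypernetwork: an embedding of the coil configuration (here just
# the number of coils) is mapped to the weights of one convolution inside
# the reconstruction network.

class HyperConv(nn.Module):
    def __init__(self, ch=16, k=3, ctx_dim=8):
        super().__init__()
        self.ch, self.k = ch, k
        self.hyper = nn.Sequential(
            nn.Linear(ctx_dim, 64), nn.ReLU(),
            nn.Linear(64, ch * ch * k * k))

    def forward(self, x, task_embedding):
        # Task-specific conv weights are inferred from the configuration.
        w = self.hyper(task_embedding).view(self.ch, self.ch, self.k, self.k)
        return nn.functional.conv2d(x, w, padding=self.k // 2)

embed = nn.Embedding(33, 8)          # one embedding per possible coil count
layer = HyperConv()

x = torch.randn(1, 16, 64, 64)       # intermediate reconstruction features
for num_coils in (7, 11, 32):        # switch tasks on the fly, no retraining
    y = layer(x, embed(torch.tensor(num_coils)))
    print(num_coils, y.shape)
```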
Joint-Relation Transformer for Multi-Person Motion Prediction
results: Experiments show that our method achieves a 13.4% improvement of 900ms VIM on 3DPW-SoMoF/RC and 17.8%/12.0% improvement of 3s MPJPE on the CMU-Mocap/MuPoTS-3D datasets.
Abstract
Multi-person motion prediction is a challenging problem due to the dependency of motion on both individual past movements and interactions with other people. Transformer-based methods have shown promising results on this task, but they miss the explicit relation representation between joints, such as skeleton structure and pairwise distance, which is crucial for accurate interaction modeling. In this paper, we propose the Joint-Relation Transformer, which utilizes relation information to enhance interaction modeling and improve future motion prediction. Our relation information contains the relative distance and the intra-/inter-person physical constraints. To fuse relation and joint information, we design a novel joint-relation fusion layer with relation-aware attention to update both features. Additionally, we supervise the relation information by forecasting future distance. Experiments show that our method achieves a 13.4% improvement of 900ms VIM on 3DPW-SoMoF/RC and 17.8%/12.0% improvement of 3s MPJPE on the CMU-Mocap/MuPoTS-3D datasets.
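Relation-aware attention can be sketched by biasing attention logits with pairwise joint distances, as below; the particular bias parameterization is an assumption, not the paper's exact layer.

```python
import torch
import torch.nn.functional as F

# Sketch of relation-aware attention: attention logits between joints are
# biased by pairwise Euclidean distance, so relation information shapes
# interaction modeling alongside the usual content similarity.

def relation_aware_attention(q, k, v, joint_xyz, dist_scale):
    # q, k, v: (J, d) per-joint features; joint_xyz: (J, 3) positions.
    d = q.size(-1)
    logits = q @ k.t() / d ** 0.5                     # (J, J) content term
    dist = torch.cdist(joint_xyz, joint_xyz)          # (J, J) relation term
    attn = F.softmax(logits - dist_scale * dist, dim=-1)
    return attn @ v

J, d = 15, 32                                         # 15 body joints
q, k, v = torch.randn(J, d), torch.randn(J, d), torch.randn(J, d)
xyz = torch.randn(J, 3)
dist_scale = torch.nn.Parameter(torch.tensor(1.0))    # learnable bias scale
out = relation_aware_attention(q, k, v, xyz, dist_scale)
print(out.shape)                                      # torch.Size([15, 32])
```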
methods: Proposes a new research problem, Generalized Unbiased Scene Graph Generation (G-USGG), together with a Multi-Concept Learning (MCL) framework that ensures a balanced learning process across concepts. A Balanced Prototypical Memory (BPM) is also introduced to achieve balanced learning over different concepts.
results: Extensive experiments on the VG-SGG and OI-SGG datasets demonstrate that this model-agnostic technique is highly effective at improving predicate-level unbiased relation recognition and concept-level compositional generability, achieving new state-of-the-art results in both key aspects.
Abstract
Existing Unbiased Scene Graph Generation (USGG) methods only focus on addressing the predicate-level imbalance that high-frequency classes dominate predictions of rare ones, while overlooking the concept-level imbalance. Actually, even if predicates themselves are balanced, there is still a significant concept-imbalance within them due to the long-tailed distribution of contexts (i.e., subject-object combinations). This concept-level imbalance poses a more pervasive and challenging issue compared to the predicate-level imbalance since subject-object pairs are inherently complex in combinations. Hence, we introduce a novel research problem: Generalized Unbiased Scene Graph Generation (G-USGG), which takes into account both predicate-level and concept-level imbalance. To this end, we propose the Multi-Concept Learning (MCL) framework, which ensures a balanced learning process across rare/uncommon/common concepts. MCL first quantifies the concept-level imbalance across predicates in terms of different amounts of concepts, representing as multiple concept-prototypes within the same class. It then effectively learns concept-prototypes by applying the Concept Regularization (CR) technique. Furthermore, to achieve balanced learning over different concepts, we introduce the Balanced Prototypical Memory (BPM), which guides SGG models to generate balanced representations for concept-prototypes. Extensive experiments demonstrate the remarkable efficacy of our model-agnostic strategy in enhancing the performance of benchmark models on both VG-SGG and OI-SGG datasets, leading to new state-of-the-art achievements in two key aspects: predicate-level unbiased relation recognition and concept-level compositional generability.
High-Level Features Parallelization for Inference Cost Reduction Through Selective Attention
paper_authors: André Peter Kelm, Lucas Schmidt, Tim Rolff, Christian Wilms, Ehsan Yaghoubi, Simone Frintrop
for: Reducing the inference cost of deep learning models, particularly for mobile, industrial, and robotic applications.
methods: Uses parallel high-level features to selectively skip or select class-specific features, reducing inference cost. The approach is motivated by neuroscientific findings of spatially and contextually separated neural activations in the human brain.
results: High performance is maintained while the number of parameters, the computational complexity, and the power consumption are reduced. In some examples, up to 75% of parameters are skipped without retraining. The approach also allows processing to be directly influenced by enhancing or inhibiting high-level class-specific features, similar to the mechanism of selective attention in the human brain.
Abstract
In this work, we parallelize high-level features in deep networks to selectively skip or select class-specific features to reduce inference costs. This challenges most deep learning methods due to their limited ability to efficiently and effectively focus on selected class-specific features without retraining. We propose a serial-parallel hybrid architecture with serial generic low-level features and parallel high-level features. This accounts for the fact that many high-level features are class-specific rather than generic, and has connections to recent neuroscientific findings that observe spatially and contextually separated neural activations in the human brain. Our approach provides the unique functionality of cutouts: selecting parts of the network to focus on only relevant subsets of classes without requiring retraining. High performance is maintained, but the cost of inference can be significantly reduced. In some of our examples, up to $75\,\%$ of parameters are skipped and $35\,\%$ fewer GMACs (Giga multiply-accumulate) operations are used as the approach adapts to a change in task complexity. This is important for mobile, industrial, and robotic applications where reducing the number of parameters, the computational complexity, and thus the power consumption can be paramount. Another unique functionality is that it allows processing to be directly influenced by enhancing or inhibiting high-level class-specific features, similar to the mechanism of selective attention in the human brain. This can be relevant for cross-modal applications, the use of semantic prior knowledge, and/or context-aware processing.
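A toy version of the serial-parallel hybrid with "cutouts" is sketched below: shared low-level features feed parallel class-specific heads, and only the selected heads are evaluated. The one-head-per-class-group granularity is an assumption for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Toy serial-parallel hybrid: serial generic low-level features feed
# parallel class-specific high-level branches; a "cutout" selects which
# branches to run, skipping the rest entirely at inference time.

class SerialParallelNet(nn.Module):
    def __init__(self, num_branches=10, feat_dim=128, classes_per_branch=10):
        super().__init__()
        self.low = nn.Sequential(nn.Linear(784, feat_dim), nn.ReLU())
        self.branches = nn.ModuleList(
            nn.Linear(feat_dim, classes_per_branch) for _ in range(num_branches))

    def forward(self, x, active=None):
        feat = self.low(x)                  # shared serial computation
        idx = active if active is not None else range(len(self.branches))
        # Only the selected class-specific branches are evaluated.
        return {i: self.branches[i](feat) for i in idx}

net = SerialParallelNet()
x = torch.randn(1, 784)
full = net(x)                    # all 10 branches: full class coverage
subset = net(x, active=[2, 7])   # cutout: 8 of 10 high-level heads skipped
print(len(full), len(subset))    # 10 2
```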
Enhancing Mobile Privacy and Security: A Face Skin Patch-Based Anti-Spoofing Approach
results: Experiments on multiple public datasets show that the algorithm is superior in both accuracy and speed.
Abstract
Facial Recognition Systems (FRS) are widely applied in areas such as access control and mobile payments due to their convenience and high accuracy, so the security of facial recognition is highly regarded. The Face Anti-Spoofing (FAS) system is an important component used to enhance the security of face recognition systems. Traditional FAS uses images containing identity information to detect spoofing traces; however, there is a risk of privacy leakage during the transmission and storage of these images. Besides, the encryption and decryption of such privacy-sensitive data take far longer than the FAS model's inference time. To address the above issues, we propose a face anti-spoofing algorithm based on facial skin patches that takes pure facial skin patch images as input; these images contain no privacy information, so no encryption or decryption is needed. We conduct experiments on several public datasets, and the results prove that our algorithm demonstrates superiority in both accuracy and speed.
Multi-Scale Memory Comparison for Zero-/Few-Shot Anomaly Detection
results: The method achieved 4th place in the zero-shot track and 2nd place in the few-shot track of the Visual Anomaly and Novelty Detection (VAND) competition.
Abstract
Anomaly detection has gained considerable attention due to its broad range of applications, particularly in industrial defect detection. To address the challenges of data collection, researchers have introduced zero-/few-shot anomaly detection techniques that require minimal normal images for each category. However, complex industrial scenarios often involve multiple objects, presenting a significant challenge. In light of this, we propose a straightforward yet powerful multi-scale memory comparison framework for zero-/few-shot anomaly detection. Our approach employs a global memory bank to capture features across the entire image, while an individual memory bank focuses on simplified scenes containing a single object. The efficacy of our method is validated by its remarkable achievement of 4th place in the zero-shot track and 2nd place in the few-shot track of the Visual Anomaly and Novelty Detection (VAND) competition.
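The memory-comparison idea can be reduced to a nearest-neighbor score against a bank of normal features, as in the sketch below; the paper's multi-scale design with separate global and individual (single-object) banks is omitted here.

```python
import torch

# Minimal memory-comparison anomaly score: patch features of a test image
# are compared against a memory bank of normal features; a patch's anomaly
# score is its distance to the nearest normal feature.

def anomaly_scores(test_feats, memory_bank):
    # test_feats: (P, d) patch features; memory_bank: (M, d) normal features.
    dists = torch.cdist(test_feats, memory_bank)   # (P, M) pairwise distances
    return dists.min(dim=1).values                 # nearest-neighbor distance

memory_bank = torch.randn(1000, 64)   # features collected from normal images
test_feats = torch.randn(196, 64)     # e.g. 14x14 patches of a test image
scores = anomaly_scores(test_feats, memory_bank)
print(scores.shape, scores.max())     # per-patch anomaly map and peak score
```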
PointMBF: A Multi-scale Bidirectional Fusion Network for Unsupervised RGB-D Point Cloud Registration
results: Extensive experiments on ScanNet and 3DMatch show that our method achieves new state-of-the-art performance.
Abstract
Point cloud registration is a task to estimate the rigid transformation between two unaligned scans, which plays an important role in many computer vision applications. Previous learning-based works commonly focus on supervised registration, which have limitations in practice. Recently, with the advance of inexpensive RGB-D sensors, several learning-based works utilize RGB-D data to achieve unsupervised registration. However, most of existing unsupervised methods follow a cascaded design or fuse RGB-D data in a unidirectional manner, which do not fully exploit the complementary information in the RGB-D data. To leverage the complementary information more effectively, we propose a network implementing multi-scale bidirectional fusion between RGB images and point clouds generated from depth images. By bidirectionally fusing visual and geometric features in multi-scales, more distinctive deep features for correspondence estimation can be obtained, making our registration more accurate. Extensive experiments on ScanNet and 3DMatch demonstrate that our method achieves new state-of-the-art performance. Code will be released at https://github.com/phdymz/PointMBF
results: On two simulated datasets, the results show that SUnAA performs better than conventional and advanced methods in terms of signal-to-reconstruction error. SUnAA is also applied to the Cuprite dataset and compared with the available geological map; the qualitative assessment shows that SUnAA successfully estimates mineral abundances and significantly improves the detection of dominant minerals.
Abstract
This paper introduces a new sparse unmixing technique using archetypal analysis (SUnAA). First, we design a new model based on archetypal analysis. We assume that the endmembers of interest are a convex combination of endmembers provided by a spectral library and that the number of endmembers of interest is known. Then, we propose a minimization problem. Unlike most conventional sparse unmixing methods, here the minimization problem is non-convex. We minimize the optimization objective iteratively using an active set algorithm. Our method is robust to the initialization and only requires the number of endmembers of interest. SUnAA is evaluated using two simulated datasets for which results confirm its better performance over other conventional and advanced techniques in terms of signal-to-reconstruction error. SUnAA is also applied to Cuprite dataset and the results are compared visually with the available geological map provided for this dataset. The qualitative assessment demonstrates the successful estimation of the minerals abundances and significantly improves the detection of dominant minerals compared to the conventional regression-based sparse unmixing methods. The Python implementation of SUnAA can be found at: https://github.com/BehnoodRasti/SUnAA.
Objects do not disappear: Video object detection by single-frame object location anticipation
paper_authors: Xin Liu, Fatemeh Karimi Nejadasl, Jan C. van Gemert, Olaf Booij, Silvia L. Pintea
for: Improving video object detection accuracy and efficiency while reducing annotation cost.
methods: Exploits the continuous smooth motion of objects in videos to improve detection accuracy and efficiency and to reduce annotation cost.
results: Achieves higher mean average precision than the state-of-the-art on four datasets, together with improved computational and annotation efficiency.
Abstract
Objects in videos are typically characterized by continuous smooth motion. We exploit continuous smooth motion in three ways. 1) Improved accuracy by using object motion as an additional source of supervision, which we obtain by anticipating object locations from a static keyframe. 2) Improved efficiency by only doing the expensive feature computations on a small subset of all frames. Because neighboring video frames are often redundant, we only compute features for a single static keyframe and predict object locations in subsequent frames. 3) Reduced annotation cost, where we only annotate the keyframe and use smooth pseudo-motion between keyframes. We demonstrate computational efficiency, annotation efficiency, and improved mean average precision compared to the state-of-the-art on four datasets: ImageNet VID, EPIC KITCHENS-55, YouTube-BoundingBoxes, and Waymo Open dataset. Our source code is available at https://github.com/L-KID/Videoobject-detection-by-location-anticipation.
FaceSkin: A Privacy Preserving Facial skin patch Dataset for multi Attributes classification
for: attribute classification, such as age, race, and gender
methods: utilizes a dataset called FaceSkin, which includes diverse ages and races, as well as synthetic skin-patches from 2D and 3D attack images
results: effective in attribute classification and has potential for various downstream tasks, such as Face anti-spoofing and Age estimation.
Abstract
Human facial skin images contain abundant textural information that can serve as valuable features for attribute classification, such as age, race, and gender. Additionally, facial skin images offer the advantages of easy collection and minimal privacy concerns. However, the availability of well-labeled human skin datasets with a sufficient number of images is limited. To address this issue, we introduce a dataset called FaceSkin, which encompasses a diverse range of ages and races. Furthermore, to broaden the application scenarios, we incorporate synthetic skin-patches obtained from 2D and 3D attack images, including printed paper, replays, and 3D masks. We evaluate the FaceSkin dataset across distinct categories and present experimental results demonstrating its effectiveness in attribute classification, as well as its potential for various downstream tasks, such as Face anti-spoofing and Age estimation.
SAfER: Layer-Level Sensitivity Assessment for Efficient and Robust Neural Network Inference
results: Proposes a new approach for assessing the importance of DNN layers, together with a novel dataset for evaluating it. Benchmarking several criteria yields conclusions on how to assess layer importance and, consequently, how to budget layers for increased DNN efficiency (with applications to pruning and quantization) and for robustness to hardware failure (e.g., bit swaps).
Abstract
Deep neural networks (DNNs) demonstrate outstanding performance across most computer vision tasks. Some critical applications, such as autonomous driving or medical imaging, also require investigation into their behavior and the reasons behind the decisions they make. In this vein, DNN attribution consists in studying the relationship between the predictions of a DNN and its inputs. Attribution methods have been adapted to highlight the most relevant weights or neurons in a DNN, allowing to more efficiently select which weights or neurons can be pruned. However, a limitation of these approaches is that weights are typically compared within each layer separately, while some layers might appear as more critical than others. In this work, we propose to investigate DNN layer importance, i.e. to estimate the sensitivity of the accuracy w.r.t. perturbations applied at the layer level. To do so, we propose a novel dataset to evaluate our method as well as future works. We benchmark a number of criteria and draw conclusions regarding how to assess DNN layer importance and, consequently, how to budgetize layers for increased DNN efficiency (with applications for DNN pruning and quantization), as well as robustness to hardware failure (e.g. bit swaps).
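To make the layer-level sensitivity idea concrete, here is a minimal sketch of one plausible criterion (our illustration, not the paper's benchmarked code): perturb one layer's weights at a time with relative Gaussian noise and record the accuracy drop; larger drops indicate more critical layers.

```python
# Sketch: per-layer sensitivity as accuracy drop under weight noise.
import copy
import torch

@torch.no_grad()
def layer_sensitivity(model, loader, noise_std=0.05, device="cpu"):
    def accuracy(m):
        correct = total = 0
        for x, y in loader:
            pred = m(x.to(device)).argmax(dim=1)
            correct += (pred == y.to(device)).sum().item()
            total += y.numel()
        return correct / total

    base = accuracy(model)
    scores = {}
    for name, _ in model.named_parameters():
        if "weight" not in name:
            continue
        perturbed = copy.deepcopy(model)
        p = dict(perturbed.named_parameters())[name]
        p.add_(noise_std * p.std() * torch.randn_like(p))  # relative noise
        scores[name] = base - accuracy(perturbed)          # accuracy drop
    return scores
```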
TextPainter: Multimodal Text Image Generation with Visual-harmony and Text-comprehension for Poster Design
for: This paper is written for researchers and practitioners in the field of text design and multimodal processing, with a focus on generating visually-and-semantically-harmonious text images for posters.
methods: The paper proposes a novel multimodal approach called TextPainter, which leverages contextual visual information and corresponding text semantics to generate text images. The approach takes the global-local background image as a hint of style and guides the text image generation with visual harmony. Additionally, the paper introduces a text comprehension module to achieve both sentence-level and word-level style variations.
results: The paper presents extensive quantitative and qualitative experiments that demonstrate the effectiveness of TextPainter in generating visually-and-semantically-harmonious text images for posters. The results show that TextPainter can generate high-quality text images that are both aesthetically pleasing and semantically consistent with the context.
Abstract
Text design is one of the most critical procedures in poster design, as it relies heavily on the creativity and expertise of humans to design text images considering the visual harmony and text-semantic. This study introduces TextPainter, a novel multimodal approach that leverages contextual visual information and corresponding text semantics to generate text images. Specifically, TextPainter takes the global-local background image as a hint of style and guides the text image generation with visual harmony. Furthermore, we leverage the language model and introduce a text comprehension module to achieve both sentence-level and word-level style variations. Besides, we construct the PosterT80K dataset, consisting of about 80K posters annotated with sentence-level bounding boxes and text contents. We hope this dataset will pave the way for further research on multimodal text image generation. Extensive quantitative and qualitative experiments demonstrate that TextPainter can generate visually-and-semantically-harmonious text images for posters.
Self-supervised Learning of Rotation-invariant 3D Point Set Features using Transformer and its Self-distillation
results: The algorithm learns accurate, rotation-invariant 3D point set features that are more accurate than those learned by existing algorithms. Training combines multi-crop and cut-mix data augmentation to diversify the 3D point sets.
Abstract
Invariance against rotations of 3D objects is an important property in analyzing 3D point set data. Conventional 3D point set DNNs having rotation invariance typically obtain accurate 3D shape features via supervised learning by using labeled 3D point sets as training samples. However, due to the rapid increase in 3D point set data and the high cost of labeling, a framework to learn rotation-invariant 3D shape features from numerous unlabeled 3D point sets is required. This paper proposes a novel self-supervised learning framework for acquiring accurate and rotation-invariant 3D point set features at object-level. Our proposed lightweight DNN architecture decomposes an input 3D point set into multiple global-scale regions, called tokens, that preserve the spatial layout of partial shapes composing the 3D object. We employ a self-attention mechanism to refine the tokens and aggregate them into an expressive rotation-invariant feature per 3D point set. Our DNN is effectively trained by using pseudo-labels generated by a self-distillation framework. To facilitate the learning of accurate features, we propose to combine multi-crop and cut-mix data augmentation techniques to diversify 3D point sets for training. Through a comprehensive evaluation, we empirically demonstrate that, (1) existing rotation-invariant DNN architectures designed for supervised learning do not necessarily learn accurate 3D shape features under a self-supervised learning scenario, and (2) our proposed algorithm learns rotation-invariant 3D point set features that are more accurate than those learned by existing algorithms. Code will be available at https://github.com/takahikof/RIPT_SDMM
Continual Road-Scene Semantic Segmentation via Feature-Aligned Symmetric Multi-Modal Network
results: Evaluated on the SemanticKITTI dataset with favorable results against the closest competitor. An ad-hoc continual learning scheme is also introduced, and results in a class-incremental continual learning scenario prove the method's effectiveness in that setting as well.
Abstract
State-of-the-art multimodal semantic segmentation approaches combining LiDAR and color data are usually designed on top of asymmetric information-sharing schemes and assume that both modalities are always available. Regrettably, this strong assumption may not hold in real-world scenarios, where sensors are prone to failure or can face adverse conditions (night-time, rain, fog, etc.) that make the acquired information unreliable. Moreover, these architectures tend to fail in continual learning scenarios. In this work, we re-frame the task of multimodal semantic segmentation by enforcing a tightly-coupled feature representation and a symmetric information-sharing scheme, which allows our approach to work even when one of the input modalities is missing. This makes our model reliable even in safety-critical settings, as is the case of autonomous driving. We evaluate our approach on the SemanticKITTI dataset, comparing it with our closest competitor. We also introduce an ad-hoc continual learning scheme and show results in a class-incremental continual learning scenario that prove the effectiveness of the approach also in this setting.
GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization
paper_authors: Hao Fang, Bin Chen, Xuan Wang, Zhi Wang, Shu-Tao Xia
for: Studying privacy leakage in Federated Learning (FL) by inverting the gradients shared with the server to recover clients' sensitive data.
methods: The proposed method, Gradient Inversion over Feature Domains (GIFD), disassembles the Generative Adversarial Network (GAN) model and searches the feature domains of its intermediate layers, with a regularizer to avoid unreal image generation.
results: Achieves pixel-level reconstruction and outperforms existing methods, demonstrating strong generalizability across different defense strategy settings and batch sizes.
Abstract
Federated Learning (FL) has recently emerged as a promising distributed machine learning framework to preserve clients' privacy, by allowing multiple clients to upload the gradients calculated from their local data to a central server. Recent studies find that the exchanged gradients also take the risk of privacy leakage, e.g., an attacker can invert the shared gradients and recover sensitive data against an FL system by leveraging pre-trained generative adversarial networks (GAN) as prior knowledge. However, performing gradient inversion attacks in the latent space of the GAN model limits their expression ability and generalizability. To tackle these challenges, we propose \textbf{G}radient \textbf{I}nversion over \textbf{F}eature \textbf{D}omains (GIFD), which disassembles the GAN model and searches the feature domains of the intermediate layers. Instead of optimizing only over the initial latent code, we progressively change the optimized layer, from the initial latent space to intermediate layers closer to the output images. In addition, we design a regularizer to avoid unreal image generation by adding a small ${l_1}$ ball constraint to the searching range. We also extend GIFD to the out-of-distribution (OOD) setting, which weakens the assumption that the training sets of GANs and FL tasks obey the same data distribution. Extensive experiments demonstrate that our method can achieve pixel-level reconstruction and is superior to the existing methods. Notably, GIFD also shows great generalizability under different defense strategy settings and batch sizes.
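The small ${l_1}$ ball constraint on the search range can be implemented with the standard Euclidean projection onto an l1 ball; the sketch below is our illustration of that constraint, not GIFD's released code.

```python
# Euclidean projection onto {v : ||v||_1 <= radius}, applied after each
# gradient step on a searched offset (illustrative only).
import torch

def project_l1_ball(v, radius=1.0):
    if v.abs().sum() <= radius:
        return v
    u, _ = v.abs().flatten().sort(descending=True)
    cssv = u.cumsum(dim=0) - radius
    idx = torch.arange(1, u.numel() + 1, device=v.device)
    rho = (u * idx > cssv).nonzero().max()
    theta = cssv[rho] / (rho + 1).float()
    return torch.sign(v) * torch.clamp(v.abs() - theta, min=0.0)

# After each optimization step on an intermediate-layer offset:
offset = torch.randn(512, requires_grad=True)
with torch.no_grad():
    offset.copy_(project_l1_ball(offset, radius=5.0))
```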
Score Priors Guided Deep Variational Inference for Unsupervised Real-World Single Image Denoising
results: The method outperforms other single image-based real-world denoising methods and achieves performance comparable to dataset-based unsupervised methods.
Abstract
Real-world single image denoising is crucial and practical in computer vision. Bayesian inversions combined with score priors now have proven effective for single image denoising but are limited to white Gaussian noise. Moreover, applying existing score-based methods for real-world denoising requires not only the explicit train of score priors on the target domain but also the careful design of sampling procedures for posterior inference, which is complicated and impractical. To address these limitations, we propose a score priors-guided deep variational inference, namely ScoreDVI, for practical real-world denoising. By considering the deep variational image posterior with a Gaussian form, score priors are extracted based on easily accessible minimum MSE Non-$i.i.d$ Gaussian denoisers and variational samples, which in turn facilitate optimizing the variational image posterior. Such a procedure adaptively applies cheap score priors to denoising. Additionally, we exploit a Non-$i.i.d$ Gaussian mixture model and variational noise posterior to model the real-world noise. This scheme also enables the pixel-wise fusion of multiple image priors and variational image posteriors. Besides, we develop a noise-aware prior assignment strategy that dynamically adjusts the weight of image priors in the optimization. Our method outperforms other single image-based real-world denoising methods and achieves comparable performance to dataset-based unsupervised methods.
A General Implicit Framework for Fast NeRF Composition and Rendering
methods: Introduces a new surface representation, Neural Depth Fields (NeDF), which quickly determines the spatial relationship between objects via direct intersection computation between rays and implicit surfaces, and uses analytical light sources to render dynamic shadows.
results: Enables fast composition and previewing of NeRF objects, including progressive and interactive composition of multiple NeRF objects in real time, and can serve as a previewing plugin for a range of existing NeRF works.
Abstract
A variety of Neural Radiance Fields (NeRF) methods have recently achieved remarkable success in high render speed. However, current accelerating methods are specialized and incompatible with various implicit methods, preventing real-time composition over various types of NeRF works. Because NeRF relies on sampling along rays, it is possible to provide general guidance for acceleration. To that end, we propose a general implicit pipeline for composing NeRF objects quickly. Our method enables the casting of dynamic shadows within or between objects using analytical light sources while allowing multiple NeRF objects to be seamlessly placed and rendered together with any arbitrary rigid transformations. Mainly, our work introduces a new surface representation known as Neural Depth Fields (NeDF) that quickly determines the spatial relationship between objects by allowing direct intersection computation between rays and implicit surfaces. It leverages an intersection neural network to query NeRF for acceleration instead of depending on an explicit spatial structure. Our proposed method is the first to enable both the progressive and interactive composition of NeRF objects. Additionally, it also serves as a previewing plugin for a range of existing NeRF works.
Classification of lung cancer subtypes on CT images with synthetic pathological priors
results: For lung cancer subtype classification, the SGHF-Net model significantly outperforms SOTA models, with notable gains in accuracy (ACC), area under the curve (AUC), and F1 score.
Abstract
The accurate diagnosis on pathological subtypes for lung cancer is of significant importance for the follow-up treatments and prognosis managements. In this paper, we propose self-generating hybrid feature network (SGHF-Net) for accurately classifying lung cancer subtypes on computed tomography (CT) images. Inspired by studies stating that cross-scale associations exist in the image patterns between the same case's CT images and its pathological images, we innovatively developed a pathological feature synthetic module (PFSM), which quantitatively maps cross-modality associations through deep neural networks, to derive the "gold standard" information contained in the corresponding pathological images from CT images. Additionally, we designed a radiological feature extraction module (RFEM) to directly acquire CT image information and integrated it with the pathological priors under an effective feature fusion framework, enabling the entire classification model to generate more indicative and specific pathologically related features and eventually output more accurate predictions. The superiority of the proposed model lies in its ability to self-generate hybrid features that contain multi-modality image information based on a single-modality input. To evaluate the effectiveness, adaptability, and generalization ability of our model, we performed extensive experiments on a large-scale multi-center dataset (i.e., 829 cases from three hospitals) to compare our model and a series of state-of-the-art (SOTA) classification models. The experimental results demonstrated the superiority of our model for lung cancer subtypes classification with significant accuracy improvements in terms of accuracy (ACC), area under the curve (AUC), and F1 score.
Which Tokens to Use? Investigating Token Reduction in Vision Transformers
paper_authors: Joakim Bruslund Haurum, Sergio Escalera, Graham W. Taylor, Thomas B. Moeslund
for: Understanding the reduction patterns of different token reduction methods across image classification tasks.
methods: Systematically compares 10 different token reduction methods on four image classification datasets.
results: The Top-K pruning method is a surprisingly strong baseline. Reduction patterns are generally not consistent when varying the capacity of the backbone model; the patterns of pruning-based methods differ significantly from fixed radial patterns and are correlated across classification datasets; and the similarity of reduction patterns is a moderate-to-strong proxy for model performance. Project page: https://vap.aau.dk/tokens.
Abstract
Since the introduction of the Vision Transformer (ViT), researchers have sought to make ViTs more efficient by removing redundant information in the processed tokens. While different methods have been explored to achieve this goal, we still lack understanding of the resulting reduction patterns and how those patterns differ across token reduction methods and datasets. To close this gap, we set out to understand the reduction patterns of 10 different token reduction methods using four image classification datasets. By systematically comparing these methods on the different classification tasks, we find that the Top-K pruning method is a surprisingly strong baseline. Through in-depth analysis of the different methods, we determine that: the reduction patterns are generally not consistent when varying the capacity of the backbone model, the reduction patterns of pruning-based methods significantly differ from fixed radial patterns, and the reduction patterns of pruning-based methods are correlated across classification datasets. Finally we report that the similarity of reduction patterns is a moderate-to-strong proxy for model performance. Project page at https://vap.aau.dk/tokens.
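For reference, the Top-K baseline discussed above reduces, in its simplest form, to keeping the K tokens that receive the most attention from the [CLS] token. A hedged sketch (the benchmarked implementations may differ in detail):

```python
# Keep the K patch tokens with the highest [CLS] attention, plus [CLS].
import torch

def topk_prune(tokens, cls_attn, k):
    """tokens: (B, N, D) incl. [CLS] at index 0; cls_attn: (B, N-1)."""
    idx = cls_attn.topk(k, dim=1).indices + 1          # skip the [CLS] slot
    cls_tok = tokens[:, :1]
    kept = torch.gather(
        tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))
    return torch.cat([cls_tok, kept], dim=1)           # (B, k+1, D)

tokens = torch.randn(2, 197, 768)        # ViT-B/16 sequence length
cls_attn = torch.rand(2, 196)            # [CLS]-to-patch attention scores
print(topk_prune(tokens, cls_attn, k=98).shape)        # (2, 99, 768)
```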
Assessing the performance of deep learning-based models for prostate cancer segmentation using uncertainty scores
methods: Evaluates seven different U-Net-based architectures augmented with Monte-Carlo dropout.
results: The Attention R2U-Net model achieves the highest mean Intersection over Union (IoU) and Dice Similarity Coefficient (DSC), accurately segmenting all zones with the lowest uncertainty values, particularly at the boundaries of the transition zone and tumor.
Abstract
This study focuses on comparing deep learning methods for the segmentation and quantification of uncertainty in prostate segmentation from MRI images. The aim is to improve the workflow of prostate cancer detection and diagnosis. Seven different U-Net-based architectures, augmented with Monte-Carlo dropout, are evaluated for automatic segmentation of the central zone, peripheral zone, transition zone, and tumor, with uncertainty estimation. The top-performing model in this study is the Attention R2U-Net, achieving a mean Intersection over Union (IoU) of 76.3% and Dice Similarity Coefficient (DSC) of 85% for segmenting all zones. Additionally, Attention R2U-Net exhibits the lowest uncertainty values, particularly in the boundaries of the transition zone and tumor, when compared to the other models.
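Monte-Carlo dropout, as used to augment the U-Net variants above, amounts to keeping dropout active at test time and aggregating several stochastic forward passes. A minimal sketch, assuming a PyTorch segmentation model:

```python
# T stochastic passes give a mean segmentation and a per-pixel
# uncertainty map (illustrative; not the study's exact pipeline).
import torch

def mc_dropout_predict(model, x, T=20):
    model.eval()
    for m in model.modules():            # re-enable dropout layers only
        if isinstance(m, (torch.nn.Dropout, torch.nn.Dropout2d,
                          torch.nn.Dropout3d)):
            m.train()
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(dim=1) for _ in range(T)])
    return probs.mean(dim=0), probs.std(dim=0)   # prediction, uncertainty
```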
Long-Distance Gesture Recognition using Dynamic Neural Networks
results: On the LD-ConGR long-distance dataset, the method significantly outperforms previous state-of-the-art approaches in both recognition accuracy and computational efficiency.
Abstract
Gestures form an important medium of communication between humans and machines. An overwhelming majority of existing gesture recognition methods are tailored to a scenario where humans and machines are located very close to each other. This short-distance assumption does not hold true for several types of interactions, for example gesture-based interactions with a floor cleaning robot or with a drone. Methods made for short-distance recognition are unable to perform well on long-distance recognition due to gestures occupying only a small portion of the input data. Their performance is especially worse in resource constrained settings where they are not able to effectively focus their limited compute on the gesturing subject. We propose a novel, accurate and efficient method for the recognition of gestures from longer distances. It uses a dynamic neural network to select features from gesture-containing spatial regions of the input sensor data for further processing. This helps the network focus on features important for gesture recognition while discarding background features early on, thus making it more compute efficient compared to other techniques. We demonstrate the performance of our method on the LD-ConGR long-distance dataset where it outperforms previous state-of-the-art methods on recognition accuracy and compute efficiency.
GeoAdapt: Self-Supervised Test-Time Adaption in LiDAR Place Recognition Using Geometric Priors
results: Experiments show that GeoAdapt significantly boosts place recognition performance under moderate to severe domain shift and is competitive with fully supervised test-time adaptation approaches.
Abstract
LiDAR place recognition approaches based on deep learning suffer a significant degradation in performance when there is a shift between the distribution of the training and testing datasets, with re-training often required to achieve top performance. However, obtaining accurate ground truth on new environments can be prohibitively expensive, especially in complex or GPS-deprived environments. To address this issue we propose GeoAdapt, which introduces a novel auxiliary classification head to generate pseudo-labels for re-training on unseen environments in a self-supervised manner. GeoAdapt uses geometric consistency as a prior to improve the robustness of our generated pseudo-labels against domain shift, improving the performance and reliability of our Test-Time Adaptation approach. Comprehensive experiments show that GeoAdapt significantly boosts place recognition performance across moderate to severe domain shifts, and is competitive with fully supervised test-time adaptation approaches. Our code will be available at https://github.com/csiro-robotics/GeoAdapt.
Rendering Humans from Object-Occluded Monocular Videos
methods: Proposes OccNeRF, a neural rendering method that better renders humans in occluded scenes. It directly addresses two drawbacks of existing approaches: standard point-to-point rendering can cause dramatic disparities between visible and occluded body areas, and direct regression ignores any feasibility criteria (i.e., prior information) for rendering under occlusion. Both are tackled with surface-based rendering that integrates geometry and visibility priors.
results: Validated on both simulated and real-world occlusions, demonstrating superior human rendering compared to prior methods.
Abstract
3D understanding and rendering of moving humans from monocular videos is a challenging task. Despite recent progress, the task remains difficult in real-world scenarios, where obstacles may block the camera view and cause partial occlusions in the captured videos. Existing methods cannot handle such defects for two reasons. First, the standard rendering strategy relies on point-point mapping, which could lead to dramatic disparities between the visible and occluded areas of the body. Second, the naive direct regression approach does not consider any feasibility criteria (i.e., prior information) for rendering under occlusions. To tackle the above drawbacks, we present OccNeRF, a neural rendering method that achieves better rendering of humans in severely occluded scenes. As direct solutions to the two drawbacks, we propose surface-based rendering by integrating geometry and visibility priors. We validate our method on both simulated and real-world occlusions and demonstrate our method's superiority.
PSRFlow: Probabilistic Super Resolution with Flow-Based Models for Scientific Data
results: Compared with alternatives such as interpolation and GAN-based super-resolution networks, the PSRFlow model excels at uncertainty quantification and supports flexible super-resolution across different data scales.
Abstract
Although many deep-learning-based super-resolution approaches have been proposed in recent years, because no ground truth is available in the inference stage, few can quantify the errors and uncertainties of the super-resolved results. For scientific visualization applications, however, conveying uncertainties of the results to scientists is crucial to avoid generating misleading or incorrect information. In this paper, we propose PSRFlow, a novel normalizing flow-based generative model for scientific data super-resolution that incorporates uncertainty quantification into the super-resolution process. PSRFlow learns the conditional distribution of the high-resolution data based on the low-resolution counterpart. By sampling from a Gaussian latent space that captures the missing information in the high-resolution data, one can generate different plausible super-resolution outputs. The efficient sampling in the Gaussian latent space allows our model to perform uncertainty quantification for the super-resolved results. During model training, we augment the training data with samples across various scales to make the model adaptable to data of different scales, achieving flexible super-resolution for a given input. Our results demonstrate superior performance and robust uncertainty quantification compared with existing methods such as interpolation and GAN-based super-resolution networks.
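Conceptually, the uncertainty quantification step reduces to sampling several latents and decoding each into a plausible high-resolution field. The sketch below is our paraphrase of the abstract; `flow.inverse` and `flow.latent_shape` are hypothetical API names, not PSRFlow's actual interface.

```python
# Draw latent samples, decode each conditioned on the low-res input,
# and report the per-point spread as uncertainty (illustrative only).
import torch

def super_resolve_with_uncertainty(flow, low_res, n_samples=32):
    outs = []
    with torch.no_grad():
        for _ in range(n_samples):
            z = torch.randn(flow.latent_shape)          # Gaussian latent
            outs.append(flow.inverse(z, cond=low_res))  # conditioned decode
    outs = torch.stack(outs)                            # (S, ...) samples
    return outs.mean(dim=0), outs.std(dim=0)            # estimate, uncertainty
```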
1st Place Solution for CVPR2023 BURST Long Tail and Open World Challenges
paper_authors: Kaer Huang
for: Addressing video instance segmentation (VIS) in long-tailed and open-world scenarios.
methods: Trains on a combination of LVISv0.5 and the COCO dataset using repeat factor sampling: the detector is trained with segmentation and CEM on LVISv0.5 + COCO, and the instance appearance similarity head is then trained on the TAO dataset.
results: Achieves 14.9 HOTAall on the BURST test set, ranking 1st on the long-tail benchmark, and 61.4 OWTAall on the open-world challenge, also ranking 1st.
Abstract
Currently, Video Instance Segmentation (VIS) aims at segmenting and categorizing objects in videos from a closed set of training categories that contains only a few dozen categories, lacking the ability to handle diverse objects in real-world videos. With the release of the TAO and BURST datasets, we have the opportunity to research VIS in long-tailed and open-world scenarios. Traditional VIS methods are evaluated on benchmarks limited to a small number of common classes, but practical applications require trackers that go beyond these common classes, detecting and tracking rare and even never-before-seen objects. Inspired by the latest MOT paper for the long-tail task (Tracking Every Thing in the Wild, Siyuan Li et al.), for the BURST long-tail challenge we train our model on a combination of LVISv0.5 and the COCO dataset using repeat factor sampling. First, we train the detector with segmentation and CEM on the LVISv0.5 + COCO dataset. Then, we train the instance appearance similarity head on the TAO dataset. Finally, our method (LeTracker) achieves 14.9 HOTAall on the BURST test set, ranking 1st on the benchmark. For the open-world challenge, we use only annotations for 64 classes (the intersection of the BURST train subset and the COCO dataset, without the LVIS dataset) for training, test on the BURST test set, and achieve 61.4 OWTAall, ranking 1st on the benchmark. Our code will be released to facilitate future research.
LATR: 3D Lane Detection from Monocular Images with Transformer
results: LATR outperforms previous state-of-the-art methods by large margins on the synthetic Apollo, realistic OpenLane, and ONCE-3DLanes datasets (e.g., an 11.4-point F1 gain on OpenLane).
Abstract
3D lane detection from monocular images is a fundamental yet challenging task in autonomous driving. Recent advances primarily rely on structural 3D surrogates (e.g., bird's eye view) built from front-view image features and camera parameters. However, the depth ambiguity in monocular images inevitably causes misalignment between the constructed surrogate feature map and the original image, posing a great challenge for accurate lane detection. To address the above issue, we present a novel LATR model, an end-to-end 3D lane detector that uses 3D-aware front-view features without transformed view representation. Specifically, LATR detects 3D lanes via cross-attention based on query and key-value pairs, constructed using our lane-aware query generator and dynamic 3D ground positional embedding. On the one hand, each query is generated based on 2D lane-aware features and adopts a hybrid embedding to enhance lane information. On the other hand, 3D space information is injected as positional embedding from an iteratively-updated 3D ground plane. LATR outperforms previous state-of-the-art methods on both synthetic Apollo, realistic OpenLane and ONCE-3DLanes by large margins (e.g., 11.4 gain in terms of F1 score on OpenLane). Code will be released at https://github.com/JMoonr/LATR .
Optimizing Algorithms From Pairwise User Preferences
results: SortCMA is applied to tuning a commercial depth sensor without ground truth and to robot social navigation, which involves highly complex preferences over robot behavior; it succeeds in optimizing for the user's goals, and a user study evaluates the social navigation results.
Abstract
Typical black-box optimization approaches in robotics focus on learning from metric scores. However, that is not always possible, as not all developers have ground truth available. Learning appropriate robot behavior in human-centric contexts often requires querying users, who typically cannot provide precise metric scores. Existing approaches leverage human feedback in an attempt to model an implicit reward function; however, this reward may be difficult or impossible to effectively capture. In this work, we introduce SortCMA to optimize algorithm parameter configurations in high dimensions based on pairwise user preferences. SortCMA efficiently and robustly leverages user input to find parameter sets without directly modeling a reward. We apply this method to tuning a commercial depth sensor without ground truth, and to robot social navigation, which involves highly complex preferences over robot behavior. We show that our method succeeds in optimizing for the user's goals and perform a user study to evaluate social navigation results.
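A toy version of optimizing from pairwise preferences alone is a comparison sort driven by a user oracle. The sketch below is illustrative only: SortCMA itself couples such comparisons with a CMA-ES-style search over high-dimensional parameters, and `ask_user` is a hypothetical stand-in for a real query to a person.

```python
# Rank candidate parameter configs using only pairwise answers.
from functools import cmp_to_key

def ask_user(a, b):
    """Return -1 if the user prefers config a, +1 if b (simulated here)."""
    # Stand-in oracle: prefer the config closer to a hidden optimum.
    return -1 if abs(a - 3.7) < abs(b - 3.7) else 1

candidates = [0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 8.0]
ranked = sorted(candidates, key=cmp_to_key(ask_user))
print(ranked[0])   # best candidate according to pairwise answers: 4.0
```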
FocalFormer3D : Focusing on Hard Instance for 3D Object Detection
methods: Proposes Hard Instance Probing (HIP), a general pipeline that identifies missed predictions in a multi-stage manner and guides the model to focus on excavating difficult instances. For 3D object detection, this is instantiated as FocalFormer3D, a simple yet effective detector that excels at finding hard objects and improving prediction recall, using multi-stage query generation to discover hard objects and a box-level transformer decoder to efficiently distinguish objects from massive object candidates.
results: Experiments on the nuScenes and Waymo datasets validate FocalFormer3D's superior performance in both detection and tracking, in both LiDAR and multi-modal settings, including 70.5 mAP and 73.9 NDS on the nuScenes detection benchmark, ranking 1st on the nuScenes LiDAR leaderboard.
Abstract
False negatives (FN) in 3D object detection, {\em e.g.}, missing predictions of pedestrians, vehicles, or other obstacles, can lead to potentially dangerous situations in autonomous driving. While being fatal, this issue is understudied in many current 3D detection methods. In this work, we propose Hard Instance Probing (HIP), a general pipeline that identifies \textit{FN} in a multi-stage manner and guides the models to focus on excavating difficult instances. For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall. FocalFormer3D features a multi-stage query generation to discover hard objects and a box-level transformer decoder to efficiently distinguish objects from massive object candidates. Experimental results on the nuScenes and Waymo datasets validate the superior performance of FocalFormer3D. The advantage leads to strong performance on both detection and tracking, in both LiDAR and multi-modal settings. Notably, FocalFormer3D achieves a 70.5 mAP and 73.9 NDS on nuScenes detection benchmark, while the nuScenes tracking benchmark shows 72.1 AMOTA, both ranking 1st place on the nuScenes LiDAR leaderboard. Our code is available at \url{https://github.com/NVlabs/FocalFormer3D}.
Towards Automatic Scoring of Spinal X-ray for Ankylosing Spondylitis
results: VertXGradeNet can automatically grade limited and imbalanced data, achieving balanced accuracies of 0.56 and 0.51 on two test datasets.
Abstract
Manually grading structural changes with the modified Stoke Ankylosing Spondylitis Spinal Score (mSASSS) on spinal X-ray imaging is costly and time-consuming due to bone shape complexity and image quality variations. In this study, we address this challenge by prototyping a 2-step auto-grading pipeline, called VertXGradeNet, to automatically predict mSASSS scores for the cervical and lumbar vertebral units (VUs) in X-ray spinal imaging. The VertXGradeNet utilizes VUs generated by our previously developed VU extraction pipeline (VertXNet) as input and predicts mSASSS based on those VUs. VertXGradeNet was evaluated on an in-house dataset of lateral cervical and lumbar X-ray images for axial spondylarthritis patients. Our results show that VertXGradeNet can predict the mSASSS score for each VU when the data is limited in quantity and imbalanced. Overall, it can achieve a balanced accuracy of 0.56 and 0.51 for 4 different mSASSS scores (i.e., a score of 0, 1, 2, 3) on two test datasets. The accuracy of the presented method shows the potential to streamline the spinal radiograph readings and therefore reduce the cost of future clinical trials.
Copy Number Variation Informs fMRI-based Prediction of Autism Spectrum Disorder
paper_authors: Nicha C. Dvornek, Catherine Sullivan, James S. Duncan, Abha R. Gupta
for: This paper aims to develop a more integrative model for combining genetic, demographic, and neuroimaging data to better understand the multifactorial etiology of autism spectrum disorder (ASD).
methods: The proposed approach uses an attention-based model that guides attention to neuroimaging features of importance for model prediction based on genetic data derived from copy number variation parameters.
results: The attention-based model combining genetic information, demographic data, and functional magnetic resonance imaging results in superior prediction performance compared to other multimodal approaches, as demonstrated on ASD classification and severity prediction tasks using a sex-balanced dataset of 228 ASD and typically developing subjects.
Abstract
The multifactorial etiology of autism spectrum disorder (ASD) suggests that its study would benefit greatly from multimodal approaches that combine data from widely varying platforms, e.g., neuroimaging, genetics, and clinical characterization. Prior neuroimaging-genetic analyses often apply naive feature concatenation approaches in data-driven work or use the findings from one modality to guide posthoc analysis of another, missing the opportunity to analyze the paired multimodal data in a truly unified approach. In this paper, we develop a more integrative model for combining genetic, demographic, and neuroimaging data. Inspired by the influence of genotype on phenotype, we propose using an attention-based approach where the genetic data guides attention to neuroimaging features of importance for model prediction. The genetic data is derived from copy number variation parameters, while the neuroimaging data is from functional magnetic resonance imaging. We evaluate the proposed approach on ASD classification and severity prediction tasks, using a sex-balanced dataset of 228 ASD and typically developing subjects in a 10-fold cross-validation framework. We demonstrate that our attention-based model combining genetic information, demographic data, and functional magnetic resonance imaging results in superior prediction performance compared to other multimodal approaches.
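One way to read the attention mechanism described above is that genetic features produce weights over ROI-level neuroimaging features before classification. The following sketch is our interpretation of the abstract, not the released model; all module names and dimensions are made up for illustration.

```python
# CNV-derived genetic features gate attention over fMRI ROI features,
# concatenated with demographics for classification (conceptual sketch).
import torch
import torch.nn as nn

class GeneticGuidedAttention(nn.Module):
    def __init__(self, n_rois=200, fmri_dim=32, gene_dim=16, demo_dim=4):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(gene_dim, n_rois),
                                  nn.Softmax(dim=-1))
        self.clf = nn.Linear(fmri_dim + demo_dim, 2)

    def forward(self, fmri, genes, demo):
        # fmri: (B, n_rois, fmri_dim); genes: (B, gene_dim); demo: (B, demo_dim)
        w = self.attn(genes).unsqueeze(-1)        # (B, n_rois, 1) attention
        pooled = (w * fmri).sum(dim=1)            # attention-weighted pooling
        return self.clf(torch.cat([pooled, demo], dim=-1))

model = GeneticGuidedAttention()
logits = model(torch.randn(8, 200, 32), torch.randn(8, 16), torch.randn(8, 4))
print(logits.shape)    # torch.Size([8, 2])
```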
From Fake to Real (FFR): A two-stage training pipeline for mitigating spurious correlations with synthetic data
results: Proposes a two-stage pipeline: pre-train a model on a balanced synthetic dataset, then fine-tune on real data. This avoids training on real and synthetic data together, sidestepping the bias between them, while the first stage learns bias-robust features that mitigate bias in the second. The pipeline also integrates naturally with bias mitigation methods, which can simply be applied at the fine-tuning step; experiments show it further improves their performance, achieving state-of-the-art results on three large-scale datasets.
Abstract
Visual recognition models are prone to learning spurious correlations induced by an imbalanced training set where certain groups (\eg Females) are under-represented in certain classes (\eg Programmers). Generative models offer a promising direction in mitigating this bias by generating synthetic data for the minority samples and thus balancing the training set. However, prior work that uses these approaches overlooks that visual recognition models could often learn to differentiate between real and synthetic images and thus fail to unlearn the bias in the original dataset. In our work, we propose a novel two-stage pipeline to mitigate this issue where 1) we pre-train a model on a balanced synthetic dataset and then 2) fine-tune on the real data. Using this pipeline, we avoid training on both real and synthetic data, thus avoiding the bias between real and synthetic data. Moreover, we learn robust features against the bias in the first step that mitigate the bias in the second step. Moreover, our pipeline naturally integrates with bias mitigation methods; they can be simply applied to the fine-tuning step. As our experiments prove, our pipeline can further improve the performance of bias mitigation methods obtaining state-of-the-art performance on three large-scale datasets.
Improving Medical Image Classification in Noisy Labels Using Only Self-supervised Pretraining
results: Models initialized with self-supervised pretrained weights learn better features from noisy-labeled medical images and achieve improved classification performance.
Abstract
Noisy labels hurt deep learning-based supervised image classification performance as the models may overfit the noise and learn corrupted feature extractors. For natural image classification training with noisy labeled data, model initialization with contrastive self-supervised pretrained weights has shown to reduce feature corruption and improve classification performance. However, no works have explored: i) how other self-supervised approaches, such as pretext task-based pretraining, impact the learning with noisy label, and ii) any self-supervised pretraining methods alone for medical images in noisy label settings. Medical images often feature smaller datasets and subtle inter class variations, requiring human expertise to ensure correct classification. Thus, it is not clear if the methods improving learning with noisy labels in natural image datasets such as CIFAR would also help with medical images. In this work, we explore contrastive and pretext task-based self-supervised pretraining to initialize the weights of a deep learning classification model for two medical datasets with self-induced noisy labels -- NCT-CRC-HE-100K tissue histological images and COVID-QU-Ex chest X-ray images. Our results show that models initialized with pretrained weights obtained from self-supervised learning can effectively learn better features and improve robustness against noisy labels.
Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation
results: On the Kinetics-400 and Something-Something V2 datasets, the STA module reduces computation by about 30% with a negligible accuracy drop of roughly 0.2%.
Abstract
Transformers have become the primary backbone of the computer vision community due to their impressive performance. However, the unfriendly computation cost impedes their potential in the video recognition domain. To optimize the speed-accuracy trade-off, we propose Semantic-aware Temporal Accumulation score (STA) to prune spatio-temporal tokens integrally. STA score considers two critical factors: temporal redundancy and semantic importance. The former depicts a specific region based on whether it is a new occurrence or a seen entity by aggregating token-to-token similarity in consecutive frames while the latter evaluates each token based on its contribution to the overall prediction. As a result, tokens with higher scores of STA carry more temporal redundancy as well as lower semantics thus being pruned. Based on the STA score, we are able to progressively prune the tokens without introducing any additional parameters or requiring further re-training. We directly apply the STA module to off-the-shelf ViT and VideoSwin backbones, and the empirical results on Kinetics-400 and Something-Something V2 achieve over 30% computation reduction with a negligible ~0.2% accuracy drop. The code is released at https://github.com/Mark12Ding/STA.
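A hedged sketch of the two STA factors, as we read them from the abstract (the paper aggregates token-to-token similarity across consecutive frames more carefully; here we use same-position cosine similarity for brevity, and the mixing weight `alpha` is an assumption):

```python
# STA-style score: high temporal redundancy plus low semantic
# importance -> candidate for pruning (illustrative only).
import torch
import torch.nn.functional as F

def sta_scores(tokens_t, tokens_prev, cls_attn, alpha=0.5):
    """tokens_*: (B, N, D) patch tokens; cls_attn: (B, N) attention."""
    sim = F.cosine_similarity(tokens_t, tokens_prev, dim=-1)  # redundancy
    redundancy = (sim + 1) / 2                                # map to [0, 1]
    importance = cls_attn / cls_attn.amax(dim=1, keepdim=True)
    return alpha * redundancy + (1 - alpha) * (1 - importance)

scores = sta_scores(torch.randn(1, 196, 768), torch.randn(1, 196, 768),
                    torch.rand(1, 196))
prune_idx = scores.topk(k=59, dim=1).indices   # prune ~30% of tokens
```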
results: The method detects object positions and directions accurately and efficiently. An extended Skew Intersection over Union (SkewIoU) calculation for rotated boxes, the directed IoU (DirIoU), is also introduced.
Abstract
This paper presents an efficient way of detecting directed objects by predicting their center coordinates and direction angle. Since the objects are of uniform size, the proposed model works without predicting the object's width and height. The dataset used for this problem is presented in the Honeybee Segmentation and Tracking Datasets project. One of the contributions of this work is an examination of how a standard real-time object detection architecture, YoloV7, can be customized for position and direction detection. A very efficient, tiny version of the architecture is used in this approach. Moreover, only one of three detection heads without anchors is sufficient for this task. We also introduce the extended Skew Intersection over Union (SkewIoU) calculation for rotated boxes - directed IoU (DirIoU), which includes an absolute angle difference. DirIoU is used both in the matching procedure of target and predicted bounding boxes for mAP calculation, and in the NMS filtering procedure. The code and models are available at https://github.com/djordjened92/yudo.
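Under one plausible reading of the abstract, DirIoU discounts the rotated-box SkewIoU by the normalized absolute angle difference; the exact form used in the paper may differ. A sketch using shapely for the polygon overlap:

```python
# Illustrative DirIoU for fixed-size rotated boxes (cx, cy, w, h, theta).
import math
from shapely.geometry import Polygon

def rot_box(cx, cy, w, h, theta):
    c, s = math.cos(theta), math.sin(theta)
    pts = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return Polygon([(cx + c * x - s * y, cy + s * x + c * y) for x, y in pts])

def dir_iou(b1, b2):
    p1, p2 = rot_box(*b1), rot_box(*b2)
    skew_iou = p1.intersection(p2).area / p1.union(p2).area
    d_theta = abs(b1[4] - b2[4]) % (2 * math.pi)
    d_theta = min(d_theta, 2 * math.pi - d_theta)   # wrap to [0, pi]
    return skew_iou * (1 - d_theta / math.pi)       # assumed angle penalty

print(dir_iou((0, 0, 4, 2, 0.0), (0.5, 0, 4, 2, 0.2)))
```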
Facial Prior Based First Order Motion Model for Micro-expression Generation
results: The model is trained on the public CASME II, SAMM, and SMIC datasets and evaluated by generating new micro-expression videos. It took first place in the Facial Micro-Expression Challenge 2021 (MEGC2021), with its superior performance verified by three experts holding Facial Action Coding System certification.
Abstract
Spotting facial micro-expression from videos finds various potential applications in fields including clinical diagnosis and interrogation, meanwhile this task is still difficult due to the limited scale of training data. To solve this problem, this paper tries to formulate a new task called micro-expression generation and then presents a strong baseline which combines the first order motion model with facial prior knowledge. Given a target face, we intend to drive the face to generate micro-expression videos according to the motion patterns of source videos. Specifically, our new model involves three modules. First, we extract facial prior features from a region focusing module. Second, we estimate facial motion using key points and local affine transformations with a motion prediction module. Third, expression generation module is used to drive the target face to generate videos. We train our model on public CASME II, SAMM and SMIC datasets and then use the model to generate new micro-expression videos for evaluation. Our model achieves the first place in the Facial Micro-Expression Challenge 2021 (MEGC2021), where our superior performance is verified by three experts with Facial Action Coding System certification. Source code is provided in https://github.com/Necolizer/Facial-Prior-Based-FOMM.
Estimation of Human Condition at Disaster Site Using Aerial Drone Images
results: Statuses with characteristic human actions were classified with a recall rate above 80%, while other statuses with similar human actions reached only about 50%. A cloud-based VR presentation application further suggested the effectiveness of using drones to understand the disaster site and estimate the human condition.
Abstract
Drones are being used to assess the situation in various disasters. In this study, we investigate a method to automatically estimate the damage status of people based on their actions in aerial drone images in order to understand disaster sites faster and save labor. We constructed a new dataset of aerial images of human actions in a hypothetical disaster that occurred in an urban area, and classified the human damage status using 3D ResNet. The results showed that the status with characteristic human actions could be classified with a recall rate of more than 80%, while other statuses with similar human actions could only be classified with a recall rate of about 50%. In addition, a cloud-based VR presentation application suggested the effectiveness of using drones to understand the disaster site and estimate the human condition.
Unsupervised Camouflaged Object Segmentation as Domain Adaptation
results: The baseline model achieves superior segmentation performance on the UCOS benchmark compared with competing unsupervised models, using a training set only one tenth the scale of the supervised COS counterpart.
Abstract
Deep learning for unsupervised image segmentation remains challenging due to the absence of human labels. The common idea is to train a segmentation head, with the supervision of pixel-wise pseudo-labels generated based on the representation of self-supervised backbones. By doing so, the model performance depends much on the distance between the distributions of target datasets and the pre-training dataset (e.g., ImageNet). In this work, we investigate a new task, namely unsupervised camouflaged object segmentation (UCOS), where the target objects own a common rarely-seen attribute, i.e., camouflage. Unsurprisingly, we find that the state-of-the-art unsupervised models struggle in adapting UCOS, due to the domain gap between the properties of generic and camouflaged objects. To this end, we formulate the UCOS as a source-free unsupervised domain adaptation task (UCOS-DA), where both source labels and target labels are absent during the whole model training process. Specifically, we define a source model consisting of self-supervised vision transformers pre-trained on ImageNet. On the other hand, the target domain includes a simple linear layer (i.e., our target model) and unlabeled camouflaged objects. We then design a pipeline for foreground-background-contrastive self-adversarial domain adaptation, to achieve robust UCOS. As a result, our baseline model achieves superior segmentation performance when compared with competing unsupervised models on the UCOS benchmark, with the training set which's scale is only one tenth of the supervised COS counterpart.
摘要
深度学习的无监督图像分割仍然具有挑战性,因为缺乏人工标签。常见的思路是利用自监督骨干网络的表征生成逐像素伪标签,以此监督分割头的训练。这样一来,模型性能在很大程度上取决于目标数据集与预训练数据集(如ImageNet)之间的分布距离。在这项工作中,我们研究一个新任务,即无监督伪装目标分割(UCOS),其中目标对象具有一种少见的共同属性,即伪装。不出所料,我们发现由于普通目标与伪装目标属性之间的域差距,现有state-of-the-art无监督模型难以适应UCOS。为此,我们将UCOS建模为无源的无监督领域适应任务(UCOS-DA),即在整个模型训练过程中源标签和目标标签均不可用。具体而言,我们定义一个源模型,即基于ImageNet自监督预训练的vision transformers;目标领域则包括一个简单的线性层(即我们的目标模型)和无标签的伪装目标。我们随后设计了一个前景-背景对比的自对抗领域适应管道,以实现robust UCOS。结果表明,我们的基线模型在UCOS benchmark上优于同类无监督模型,且训练集规模仅为有监督COS counterpart的十分之一。
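A rough sketch of the UCOS-DA setup described above: a frozen self-supervised ViT as the source model and a single linear layer as the target model. The timm checkpoint name, input size, and omission of the actual training loss are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import timm  # assumes timm is installed; 'vit_small_patch16_224.dino' is one self-supervised checkpoint

# Source model: frozen self-supervised ViT (pre-trained on ImageNet, never fine-tuned here).
backbone = timm.create_model("vit_small_patch16_224.dino", pretrained=True)
backbone.eval()
for p in backbone.parameters():
    p.requires_grad_(False)

# Target model: a single linear layer scoring each patch as foreground vs background.
head = nn.Linear(backbone.embed_dim, 1)

x = torch.randn(2, 3, 224, 224)             # unlabeled camouflaged images
tokens = backbone.forward_features(x)       # (B, 1 + 14*14, C); first token is [CLS]
patches = tokens[:, 1:, :]
logits = head(patches).reshape(2, 14, 14)   # coarse foreground map, to be upsampled

# Training would optimize `head` with the paper's foreground-background-contrastive,
# self-adversarial objective; that loss is not specified in the abstract, so it is omitted.
```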
Large-Scale Multi-Hypotheses Cell Tracking Using Ultrametric Contours Maps
results: 该方法在3D图像中实现了状态领先的细胞跟踪结果,并且比使用深度学习方法要快得多。此外,该方法可以支持多种 Cell segmentation 模型,并可以将它们组合成一个 ensemble 来提高跟踪性能。Abstract
In this work, we describe a method for large-scale 3D cell tracking through a segmentation selection approach. The proposed method is effective at tracking cells across large microscopy datasets on two fronts: (i) it can solve problems containing millions of segmentation instances in terabyte-scale 3D+t datasets; (ii) it achieves competitive results with or without deep learning, which requires 3D annotated data that is scarce in the fluorescence microscopy field. The proposed method computes cell tracks and segments using a hierarchy of segmentation hypotheses and selects disjoint segments by maximizing the overlap between adjacent frames. We show that this method achieves state-of-the-art results on 3D images from the cell tracking challenge and has a faster integer linear programming formulation. Moreover, our framework is flexible and supports segmentations from off-the-shelf cell segmentation models and can combine them into an ensemble that improves tracking. The code is available at https://github.com/royerlab/ultrack.
摘要
在这项工作中,我们描述了一种通过分割假设选择实现的大规模3D细胞跟踪方法。该方法在大型显微镜数据集上的细胞跟踪有两方面优势:(i) 能够求解TB级3D+t数据集中包含数百万个分割实例的问题;(ii) 无论是否使用深度学习(深度学习需要3D标注数据,而这在荧光显微镜领域非常稀缺),都能达到有竞争力的结果。该方法基于层次化的分割假设计算细胞轨迹与分割,并通过最大化相邻帧之间的重叠来选择互不相交的分割。我们展示了该方法在细胞跟踪挑战赛的3D图像上达到了最先进的结果,并且具有更快的整数线性规划形式。此外,我们的框架十分灵活,支持现成的细胞分割模型,并可将它们组合成一个 ensemble 来提升跟踪性能。代码可以在 https://github.com/royerlab/ultrack 上获取。
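The disjoint-segment selection can be posed as a small integer linear program. Below is a hedged toy version using SciPy's milp: the scores and conflicts are made up, and the real method operates at a vastly larger scale with hierarchy-derived constraints.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Toy hypothesis selection: pick a disjoint subset of segmentation hypotheses
# maximizing overlap with the adjacent frame (scores are invented for illustration).
scores = np.array([0.9, 0.7, 0.6])   # frame-to-frame overlap per hypothesis
conflicts = [(0, 1)]                 # hypotheses 0 and 1 share pixels -> mutually exclusive

A = np.zeros((len(conflicts), len(scores)))
for row, (i, j) in enumerate(conflicts):
    A[row, i] = A[row, j] = 1.0

res = milp(c=-scores,                # milp minimizes, so negate to maximize
           constraints=LinearConstraint(A, ub=1.0),
           integrality=np.ones_like(scores),
           bounds=Bounds(0, 1))
print(res.x)                         # e.g. [1., 0., 1.]: keep hypotheses 0 and 2
```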
Toward unlabeled multi-view 3D pedestrian detection by generalizable AI: techniques and performance analysis
paper_authors: João Paulo Lima, Diego Thomas, Hideaki Uchiyama, Veronica Teichrieb
for: 提高多视图3D人体检测中的泛化能力
methods: 自动标注目标数据和使用无需训练的检测器
results: 使用自动标注方法可以获得更高的泛化性能,比直接使用无需训练的检测器或使用现有的标注源数据训练的检测器更好。Abstract
We unveil how generalizable AI can be used to improve multi-view 3D pedestrian detection in unlabeled target scenes. One way to increase generalization to new scenes is to automatically label target data, which can then be used for training a detector model. In this context, we investigate two approaches for automatically labeling target data: pseudo-labeling using a supervised detector and automatic labeling using an untrained detector (that can be applied out of the box without any training). We adopt a training framework for optimizing detector models using automatic labeling procedures. This framework encompasses different training sets/modes and multi-round automatic labeling strategies. We conduct our analyses on the publicly-available WILDTRACK and MultiviewX datasets. We show that, by using the automatic labeling approach based on an untrained detector, we can obtain superior results than directly using the untrained detector or a detector trained with an existing labeled source dataset. It achieved a MODA about 4% and 1% better than the best existing unlabeled method when using WILDTRACK and MultiviewX as target datasets, respectively.
摘要
我们揭示了如何利用可泛化的AI来提升无标注目标场景中的多视图3D行人检测。提升对新场景泛化能力的一种途径是自动标注目标数据,再用其训练检测模型。在此背景下,我们研究了两种自动标注目标数据的方法:使用有监督检测器生成伪标签,以及使用无需训练、开箱即用的检测器进行自动标注。我们采用了一个利用自动标注流程优化检测模型的训练框架,该框架涵盖不同的训练集/模式以及多轮自动标注策略。我们在公开的WILDTRACK和MultiviewX数据集上进行了分析。结果表明,基于无需训练检测器的自动标注方法,比直接使用该检测器或使用现有带标注源数据集训练的检测器能获得更优的结果:以WILDTRACK和MultiviewX为目标数据集时,其MODA分别比现有最佳无标注方法高出约4%和1%。
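The multi-round automatic labeling described above can be summarized as a simple loop. The function and attribute names below are placeholders sketching the idea, not the paper's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Detection:
    location: Tuple[float, float]   # ground-plane (x, y)
    confidence: float

def auto_label_rounds(unlabeled_scenes: List, detector, train_detector: Callable,
                      rounds: int = 3, conf_thresh: float = 0.8):
    """Multi-round automatic labeling: keep confident detections as pseudo-labels,
    retrain the detector on them, and repeat. `detector.detect` and
    `train_detector` are placeholders for the framework's components."""
    for _ in range(rounds):
        pseudo = []
        for scene in unlabeled_scenes:
            dets = [d for d in detector.detect(scene) if d.confidence >= conf_thresh]
            pseudo.append((scene, dets))
        detector = train_detector(pseudo)   # retrain on its own confident outputs
    return detector
```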
When More is Less: Incorporating Additional Datasets Can Hurt Performance By Introducing Spurious Correlations
paper_authors: Rhys Compton, Lily Zhang, Aahlad Puli, Rajesh Ranganath
for: 这个研究旨在探讨 whether incorporating more data can always improve machine learning model performance,以及在医学影像数据中存在假相关性的问题。
methods: 这个研究使用了大规模的实验,对四个开源胸部X射线图像集和九个标签进行了组合。
results: 研究发现,在43%的情况下,将两个医院的数据作为训练数据,会使模型在两个医院的数据上具有更差的最坏群体精度。这种结果尽管训练数据更加相似于测试数据,但是由医院特有的图像 artifacts 导致的假相关性的出现。Abstract
In machine learning, incorporating more data is often seen as a reliable strategy for improving model performance; this work challenges that notion by demonstrating that the addition of external datasets in many cases can hurt the resulting model's performance. In a large-scale empirical study across combinations of four different open-source chest x-ray datasets and 9 different labels, we demonstrate that in 43% of settings, a model trained on data from two hospitals has poorer worst group accuracy over both hospitals than a model trained on just a single hospital's data. This surprising result occurs even though the added hospital makes the training distribution more similar to the test distribution. We explain that this phenomenon arises from the spurious correlation that emerges between the disease and hospital, due to hospital-specific image artifacts. We highlight the trade-off one encounters when training on multiple datasets, between the obvious benefit of additional data and insidious cost of the introduced spurious correlation. In some cases, balancing the dataset can remove the spurious correlation and improve performance, but it is not always an effective strategy. We contextualize our results within the literature on spurious correlations to help explain these outcomes. Our experiments underscore the importance of exercising caution when selecting training data for machine learning models, especially in settings where there is a risk of spurious correlations such as with medical imaging. The risks outlined highlight the need for careful data selection and model evaluation in future research and practice.
摘要
在机器学习中,通常认为更多数据会提高模型性能,但这项工作挑战了这一观点,表明在许多情况下加入外部数据集实际上会降低模型性能。我们在四个不同的开源胸部X射线图像集和九个标签的组合上进行了大规模实验,发现在43%的情况下,使用两家医院数据训练的模型,在这两家医院上的最坏群体精度反而比仅用单家医院数据训练的模型更差。即使新增的医院使训练分布更接近测试分布,这一出人意料的结果仍然出现。我们解释了其成因:医院特有的图像伪影使疾病与医院之间产生了虚假相关。我们强调了在多个数据集上训练时面临的权衡,即额外数据带来的明显收益与引入虚假相关的隐性代价之间的取舍。在某些情况下,平衡数据集可以消除虚假相关并提升性能,但这并不总是有效的策略。我们结合虚假相关的相关文献来解释这些结果。我们的实验警示,在为机器学习模型选择训练数据时必须谨慎,尤其是在医疗影像等存在虚假相关风险的场景中。这些风险表明,未来的研究和实践需要仔细的数据选择与模型评估。
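The worst-group accuracy metric used in the study above can be computed as follows; the toy data merely illustrates a model that degrades on one hospital.

```python
import numpy as np

def worst_group_accuracy(y_true, y_pred, group_ids):
    """Accuracy of the weakest group (e.g., a hospital x label combination)."""
    accs = [np.mean(y_pred[group_ids == g] == y_true[group_ids == g])
            for g in np.unique(group_ids)]
    return min(accs)

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)                                 # disease label
hospital = rng.integers(0, 2, 200)                          # which hospital the scan came from
pred = np.where(hospital == 0, y, rng.integers(0, 2, 200))  # model that fails on hospital 1
groups = hospital * 2 + y                                   # hospital x label groups
print(worst_group_accuracy(y, pred, groups))                # roughly 0.5 for the weakest group
```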
A Deep-Learning Method Using Auto-encoder and Generative Adversarial Network for Anomaly Detection on Ancient Stone Stele Surfaces
results: 在使用Longmen洞雕像石刻为案例研究中,提出了一种无监督学习模型,实现了99.74%的重建精度。该方法可以准确地检测七种人工设计的异常,无误告警。Abstract
Accurate detection of natural deterioration and man-made damage on the surfaces of ancient stelae in the first instance is essential for their preventive conservation. Existing methods for cultural heritage preservation are not able to achieve this goal perfectly due to the difficulty of balancing accuracy, efficiency, timeliness, and cost. This paper presents a deep-learning method to automatically detect the above-mentioned emergencies on ancient stone stelae in real time, employing an autoencoder (AE) and a generative adversarial network (GAN). The proposed method overcomes the limitations of existing methods by requiring no extensive anomaly samples while enabling comprehensive detection of unpredictable anomalies. The method includes stages of monitoring, data acquisition, pre-processing, model structuring, and post-processing. Taking the Longmen Grottoes' stone stelae as a case study, an unsupervised learning model based on AE and GAN architectures is proposed and validated with a reconstruction accuracy of 99.74\%. The method's evaluation revealed proficient detection of seven artificially designed anomalies and demonstrated precision and reliability without false alarms. This research provides novel ideas and possibilities for the application of deep learning in the field of cultural heritage.
摘要
通过检测古代碑刻表面的自然衰败和人工损害,可以采取预防保护措施。现有的文化遗产保护方法不能完全实现这个目标,因为很难平衡准确性、效率、时效性和成本。本文提出了一种基于深度学习的方法,可以在实时中自动检测古代石碑上的紧急情况,使用自适应网络(AE)和生成对抗网络(GAN)。该方法可以减少现有方法的限制,不需要大量的异常样本,同时可以全面检测不可预测的异常。该方法包括监测、数据收集、预处理、模型结构和后处理等阶段。通过使用长门石窟的石碑作为案例研究,我们提出了一种无监督学习模型,并在重建精度为99.74%的基础上验证了其可靠性和精度。测试结果表明该方法可以准确检测七种人工设计的异常情况,而无 FALSE ALARM 问题。这些研究提供了深度学习在文化遗产保护领域的新想法和可能性。
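A minimal sketch of the reconstruction-based detection idea: an autoencoder trained only on anomaly-free surface patches should reconstruct normal texture well, so large residuals flag anomalies. The architecture and scoring are illustrative; the paper's AE+GAN model is more elaborate.

```python
import torch
import torch.nn as nn

# A tiny convolutional autoencoder, far simpler than the paper's architecture.
class TinyAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )
    def forward(self, x):
        return self.dec(self.enc(x))

model = TinyAE()                      # would be trained on anomaly-free patches only
patch = torch.rand(1, 1, 64, 64)      # a grayscale patch of the stele surface
recon = model(patch)
anomaly_map = (patch - recon).abs()   # large residuals flag surface anomalies
score = anomaly_map.mean().item()     # threshold would be chosen on anomaly-free data
```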
DiffCR: A Fast Conditional Diffusion Framework for Cloud Removal from Optical Satellite Images
results: 我们在两个常用的数据集上进行了广泛的实验评估,并证明了DiffCR在所有指标上均达到了当前最佳性能,其参数量和计算复杂度仅分别为此前最佳方法的5.1%和5.4%。所有实验结果和代码将在 https://github.com/XavierJiezou/DiffCR 上公开发布。Abstract
Optical satellite images are a critical data source; however, cloud cover often compromises their quality, hindering image applications and analysis. Consequently, effectively removing clouds from optical satellite images has emerged as a prominent research direction. While recent advancements in cloud removal primarily rely on generative adversarial networks, which may yield suboptimal image quality, diffusion models have demonstrated remarkable success in diverse image-generation tasks, showcasing their potential in addressing this challenge. This paper presents a novel framework called DiffCR, which leverages conditional guided diffusion with deep convolutional networks for high-performance cloud removal for optical satellite imagery. Specifically, we introduce a decoupled encoder for conditional image feature extraction, providing a robust color representation to ensure the close similarity of appearance information between the conditional input and the synthesized output. Moreover, we propose a novel and efficient time and condition fusion block within the cloud removal model to accurately simulate the correspondence between the appearance in the conditional image and the target image at a low computational cost. Extensive experimental evaluations on two commonly used benchmark datasets demonstrate that DiffCR consistently achieves state-of-the-art performance on all metrics, with parameter and computational complexities amounting to only 5.1% and 5.4%, respectively, of those previous best methods. The source code, pre-trained models, and all the experimental results will be publicly available at https://github.com/XavierJiezou/DiffCR upon the paper's acceptance of this work.
摘要
“光学卫星图像是一种重要的数据源;然而,云层覆盖通常会降低图像质量,妨碍图像的应用与分析。因此,光学卫星图像去云已成为一个重要的研究方向。近期的去云进展大多基于生成对抗网络,但其生成的图像质量可能欠佳;而扩散模型在多种图像生成任务中表现出色,这表明它们有潜力解决这一挑战。本文提出了一个名为DiffCR的新框架,它利用条件引导扩散和深度卷积网络来实现高性能的光学卫星图像去云。特别是,我们引入了解耦的编码器来提取条件图像特征,提供鲁棒的颜色表示,以确保条件输入和生成输出的外观信息高度相似。此外,我们还提出了一种新颖且高效的时间与条件融合模块,以较低的计算成本准确模拟条件图像与目标图像外观之间的对应关系。我们在两个常用的基准数据集上进行了广泛的实验评估,结果表明,DiffCR在所有指标上都达到了最先进的表现,而其参数量和计算复杂度仅分别为此前最佳方法的5.1%和5.4%。论文录用后,源代码、预训练模型和所有实验结果将在 https://github.com/XavierJiezou/DiffCR 上公开发布。”
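As a hedged illustration of conditional diffusion for cloud removal, the sketch below shows one epsilon-prediction training step in which the cloudy image conditions the denoiser via channel concatenation. DiffCR's decoupled condition encoder and fusion block are more sophisticated, and `denoiser` here is a placeholder network.

```python
import torch
import torch.nn.functional as F

# A simple linear beta schedule for illustration; other schedules also work.
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def diffusion_step(denoiser, clear, cloudy):
    """One epsilon-prediction training step; `denoiser(x, t)` is a placeholder
    network receiving the noisy clear image concatenated with the cloudy condition."""
    b = clear.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,))
    a = alphas_cumprod[t].view(b, 1, 1, 1)
    noise = torch.randn_like(clear)
    noisy = a.sqrt() * clear + (1 - a).sqrt() * noise       # forward process q(x_t | x_0)
    pred = denoiser(torch.cat([noisy, cloudy], dim=1), t)   # conditioning by concatenation
    return F.mse_loss(pred, noise)
```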
Digging into Depth Priors for Outdoor Neural Radiance Fields
for: The paper investigates the impact of using depth priors in outdoor NeRF training, to address the shape-radiance ambiguity problem in radiance fields.
methods: The study benchmarks two representative NeRF methods equipped with four commonly used depth priors and different depth usage strategies.
results: The experimental results reveal the effects of different depth priors and depth usage strategies in outdoor NeRF training, and provide useful practical experience and research directions.Abstract
Neural Radiance Fields (NeRF) have demonstrated impressive performance in vision and graphics tasks, such as novel view synthesis and immersive reality. However, the shape-radiance ambiguity of radiance fields remains a challenge, especially in the sparse viewpoints setting. Recent work resorts to integrating depth priors into outdoor NeRF training to alleviate the issue. However, the criteria for selecting depth priors and the relative merits of different priors have not been thoroughly investigated. Moreover, the relative merits of selecting different approaches to use the depth priors is also an unexplored problem. In this paper, we provide a comprehensive study and evaluation of employing depth priors to outdoor neural radiance fields, covering common depth sensing technologies and most application ways. Specifically, we conduct extensive experiments with two representative NeRF methods equipped with four commonly-used depth priors and different depth usages on two widely used outdoor datasets. Our experimental results reveal several interesting findings that can potentially benefit practitioners and researchers in training their NeRF models with depth priors. Project Page: https://cwchenwang.github.io/outdoor-nerf-depth
摘要
神经辐射场(NeRF)在视觉和图形任务中表现出色,例如新视角合成和沉浸式现实。然而,辐射场的形状-辐射歧义仍是一大挑战,尤其是在稀疏视点设置下。近期工作尝试在户外NeRF训练中引入深度先验来缓解该问题。然而,选择深度先验的准则以及不同先验的相对优劣尚未得到充分研究;此外,深度先验的不同使用方式之间的相对优劣同样是一个未探索的问题。本文对在户外神经辐射场中使用深度先验进行了全面的研究与评价,涵盖了常见的深度感知技术和大多数应用方式。具体而言,我们在两个广泛使用的户外数据集上,对两个代表性的NeRF方法搭配四种常用深度先验及不同深度使用方式进行了广泛的实验。实验结果揭示了若干有价值的发现,有望帮助实践者和研究人员使用深度先验训练NeRF模型。项目页面:https://cwchenwang.github.io/outdoor-nerf-depth
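One common way to use a depth prior, among the usages benchmarked above, is an extra penalty between the rendered expected depth and the prior. A hedged sketch follows; the weighting `lam`, the masking, and the toy shapes are assumptions.

```python
import torch

def depth_supervised_loss(rgb_pred, rgb_gt, weights, z_vals, depth_prior, valid, lam=0.1):
    """Photometric loss plus an L2 depth term; `weights` are the volume-rendering
    compositing weights per ray sample and `z_vals` the sample depths along each ray."""
    rendered_depth = (weights * z_vals).sum(dim=-1)   # expected ray termination depth
    color_loss = ((rgb_pred - rgb_gt) ** 2).mean()
    depth_loss = ((rendered_depth - depth_prior) ** 2)[valid].mean()
    return color_loss + lam * depth_loss

# Toy shapes: 1024 rays, 64 samples per ray; the prior covers only some rays.
w = torch.rand(1024, 64)
w = w / w.sum(-1, keepdim=True)
loss = depth_supervised_loss(torch.rand(1024, 3), torch.rand(1024, 3),
                             w, torch.linspace(2, 6, 64).expand(1024, 64),
                             torch.full((1024,), 4.0), torch.rand(1024) > 0.5)
```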
V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection
paper_authors: Yichao Shen, Zigang Geng, Yuhui Yuan, Yutong Lin, Ze Liu, Chunyu Wang, Han Hu, Nanning Zheng, Baining Guo
for: The paper proposes a highly performant 3D object detector for point clouds using the DETR framework.
methods: The paper introduces a novel 3D Vertex Relative Position Encoding (3DV-RPE) method that computes a position encoding for each point based on its relative position to the 3D boxes predicted by the queries in each decoder layer, which helps the model focus on points near the objects and improves object detection accuracy.
results: The paper achieves significant improvements over the previous 3DETR on the challenging ScanNetV2 benchmark, raising $\rm{AP}_{25}$/$\rm{AP}_{50}$ from 65.0%/47.0% to 77.8%/66.0%. The method also sets a new record on the ScanNetV2 and SUN RGB-D datasets.Abstract
We introduce a highly performant 3D object detector for point clouds using the DETR framework. The prior attempts all end up with suboptimal results because they fail to learn accurate inductive biases from the limited scale of training data. In particular, the queries often attend to points that are far away from the target objects, violating the locality principle in object detection. To address the limitation, we introduce a novel 3D Vertex Relative Position Encoding (3DV-RPE) method which computes position encoding for each point based on its relative position to the 3D boxes predicted by the queries in each decoder layer, thus providing clear information to guide the model to focus on points near the objects, in accordance with the principle of locality. In addition, we systematically improve the pipeline from various aspects such as data normalization based on our understanding of the task. We show exceptional results on the challenging ScanNetV2 benchmark, achieving significant improvements over the previous 3DETR in $\rm{AP}_{25}$/$\rm{AP}_{50}$ from 65.0\%/47.0\% to 77.8\%/66.0\%, respectively. In addition, our method sets a new record on ScanNetV2 and SUN RGB-D datasets.Code will be released at http://github.com/yichaoshen-MS/V-DETR.
摘要
我们介绍一个基于DETR框架的高性能点云3D物体检测器。先前的尝试结果均不理想,因为它们无法从规模有限的训练数据中学习准确的归纳偏置。特别是,查询经常关注远离目标物体的点,违反了物体检测中的局部性原则。为解决这一限制,我们提出了一种新的3D顶点相对位置编码方法(3DV-RPE),它在每个解码器层中根据每个点与查询所预测3D框的相对位置计算位置编码,从而提供明确的信息,引导模型依照局部性原则专注于目标附近的点。此外,我们基于对任务的理解,从数据归一化等多个方面系统地改进了整个管线。我们在具有挑战性的ScanNetV2基准上获得了出色的结果,将AP25/AP50从65.0%/47.0%提升到77.8%/66.0%,相比之前的3DETR有显著进步。此外,我们的方法在ScanNetV2和SUN RGB-D数据集上创下了新纪录。代码将在 http://github.com/yichaoshen-MS/V-DETR 上发布。
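The geometric core of the vertex-relative encoding can be sketched as computing offsets from each point to the corners of each query's predicted box. Rotation handling and the encoding network are omitted, so this is only an illustration of the idea, not the paper's implementation.

```python
import torch

def box_corners(center, size):
    # center, size: (B, Q, 3) -> corners: (B, Q, 8, 3), for axis-aligned boxes.
    signs = torch.tensor([[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)],
                         dtype=center.dtype)
    return center[:, :, None, :] + 0.5 * size[:, :, None, :] * signs

def vertex_relative_positions(points, center, size):
    # points: (B, N, 3) -> offsets (B, Q, N, 8, 3) from every point to every box corner,
    # which an MLP would then turn into the per-layer position encoding.
    corners = box_corners(center, size)
    return points[:, None, :, None, :] - corners[:, :, None, :, :]

pts = torch.randn(1, 1024, 3)
ctr, sz = torch.zeros(1, 8, 3), torch.ones(1, 8, 3)   # 8 hypothetical query boxes
rel = vertex_relative_positions(pts, ctr, sz)          # (1, 8, 1024, 8, 3)
```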
Person Re-Identification without Identification via Event Anonymization
paper_authors: Shafiq Ahmad, Pietro Morerio, Alessio Del Bue
for: 避免人员隐私泄露,对于event-camera视觉应用进行隐私保护。
methods: 提出了一个统一的网络架构,同时实现隐私保护和下游任务(如人ReId)的两重目标。
results: 实现了对于event-camera资料的隐私保护,并在实验中证明了其效果。Abstract
Wide-scale use of visual surveillance in public spaces puts individual privacy at stake while increasing resource consumption (energy, bandwidth, and computation). Neuromorphic vision sensors (event-cameras) have been recently considered a valid solution to the privacy issue because they do not capture detailed RGB visual information of the subjects in the scene. However, recent deep learning architectures have been able to reconstruct images from event cameras with high fidelity, reintroducing a potential threat to privacy for event-based vision applications. In this paper, we aim to anonymize event-streams to protect the identity of human subjects against such image reconstruction attacks. To achieve this, we propose an end-to-end network architecture jointly optimized for the twofold objective of preserving privacy and performing a downstream task such as person ReId. Our network learns to scramble events, enforcing the degradation of images recovered from the privacy attacker. In this work, we also bring to the community the first ever event-based person ReId dataset gathered to evaluate the performance of our approach. We validate our approach with extensive experiments and report results on the synthetic event data simulated from the publicly available SoftBio dataset and our proposed Event-ReId dataset.
摘要
在公共空间广泛使用视觉监控会危及个人隐私,同时增加资源消耗(能源、带宽和计算)。神经形态视觉传感器(事件相机)最近被视为隐私问题的一种有效解决方案,因为它们不会捕捉场景中主体的详细RGB视觉信息。然而,最新的深度学习架构已能够以高保真度从事件相机数据重建图像,这给基于事件的视觉应用重新带来了潜在的隐私威胁。在这篇论文中,我们希望对事件流进行匿名化,以保护人类主体的身份免受此类图像重建攻击。为实现这一目标,我们提出了一个端到端的网络架构,同时针对隐私保护和下游任务(如人员重识别,ReId)这两个目标进行联合优化。我们的网络学习扰乱事件,迫使隐私攻击者恢复出的图像发生退化。在这项工作中,我们还为社区贡献了首个基于事件的人员重识别数据集,用于评估我们方法的性能。我们通过大量实验验证了我们的方法,并在由公开的SoftBio数据集模拟的合成事件数据以及我们提出的Event-ReId数据集上报告了结果。
LEFormer: A Hybrid CNN-Transformer Architecture for Accurate Lake Extraction from Remote Sensing Imagery
paper_authors: Ben Chen, Xuechao Zou, Yu Zhang, Jiayu Li, Kai Li, Pin Tao
for: Accurate lake extraction from remote sensing imagery.
methods: A hybrid CNN-Transformer architecture (LEFormer) with four main modules: a CNN encoder, a Transformer encoder, cross-encoder fusion, and a lightweight decoder.
results: Consistently achieves state-of-the-art (SOTA) performance and efficiency on two datasets (Surface Water and Qinghai-Tibet Plateau Lake), with mIoU scores of 90.86% and 97.42%, outperforming existing methods while being 20x smaller.Abstract
Lake extraction from remote sensing imagery is challenging due to the complex shapes of lakes and the presence of noise. Existing methods suffer from blurred segmentation boundaries and poor foreground modeling. In this paper, we propose a hybrid CNN-Transformer architecture, called LEFormer, for accurate lake extraction. LEFormer contains four main modules: a CNN encoder, a Transformer encoder, cross-encoder fusion, and a lightweight decoder. The CNN encoder recovers local spatial information and improves fine-scale details. Simultaneously, the Transformer encoder captures long-range dependencies between sequences of any length, allowing it to obtain global features and context information. Finally, a lightweight decoder is employed for mask prediction. We evaluate the performance and efficiency of LEFormer on two datasets, Surface Water (SW) and Qinghai-Tibet Plateau Lake (QTPL). Experimental results show that LEFormer consistently achieves state-of-the-art (SOTA) performance and efficiency on these two datasets, outperforming existing methods. Specifically, LEFormer achieves 90.86% and 97.42% mIoU on the SW and QTPL datasets with a parameter count of only 3.61M, which is 20x smaller than that of the previous SOTA method.
摘要
从遥感影像中提取湖泊是一项具有挑战性的任务,原因在于湖泊形状复杂且存在噪声。现有方法存在分割边界模糊和前景建模能力不足的问题。在这篇论文中,我们提出了一种混合CNN-Transformer架构,称为LEFormer,用于精确的湖泊提取。LEFormer包括四个主要模块:CNN编码器、Transformer编码器、交叉编码器融合和轻量级解码器。CNN编码器恢复局部空间信息,改善细粒度细节;同时,Transformer编码器捕捉任意长度序列之间的长距离依赖关系,以获得更好的全局特征和上下文信息;最后,使用一个轻量级解码器进行掩码预测。我们在Surface Water(SW)和Qinghai-Tibet Plateau Lake(QTPL)两个数据集上评估了LEFormer的性能和效率。实验结果表明,LEFormer在这两个数据集上一致地取得了SOTA的性能和效率,超越现有方法:在SW和QTPL数据集上分别达到90.86%和97.42%的mIoU,而参数量仅为3.61M,约为此前SOTA方法的二十分之一。
Data Augmentation-Based Unsupervised Domain Adaptation In Medical Imaging
results: 结果显示,该方法能够实现高精度、广泛适用和对数据集shift具有强大的韧性,在大多数情况下超越了现有的表现。Abstract
Deep learning-based models in medical imaging often struggle to generalize effectively to new scans due to data heterogeneity arising from differences in hardware, acquisition parameters, population, and artifacts. This limitation presents a significant challenge in adopting machine learning models for clinical practice. We propose an unsupervised method for robust domain adaptation in brain MRI segmentation by leveraging MRI-specific augmentation techniques. To evaluate the effectiveness of our method, we conduct extensive experiments across diverse datasets, modalities, and segmentation tasks, comparing against the state-of-the-art methods. The results show that our proposed approach achieves high accuracy, exhibits broad applicability, and showcases remarkable robustness against domain shift in various tasks, surpassing the state-of-the-art performance in the majority of cases.
摘要
深度学习模型在医疗影像中经常陷于新扫描数据不好适应的问题,这导致模型在实际应用中表现不佳。我们提出了一种无监督的多元领域适应方法,通过利用特定于MRI的扩展技术来强化模型的鲁棒性。为评估我们的方法的有效性,我们在多个数据集、模式和分割任务中进行了广泛的实验,与当前最佳方法进行比较。结果显示,我们的提议方法在多个任务中达到了高精度,具有广泛的应用性和强大的鲁棒性,在大多数情况下超过了当前最佳性能。
DELFlow: Dense Efficient Learning of Scene Flow for Large-Scale Point Clouds
results: 与此前最先进的方法相比,该方法在 FlyingThings3D 和 KITTI 数据集上实现了更高的效率和准确性。Abstract
Point clouds are naturally sparse, while image pixels are dense. The inconsistency limits feature fusion from both modalities for point-wise scene flow estimation. Previous methods rarely predict scene flow from the entire point clouds of the scene with one-time inference due to the memory inefficiency and heavy overhead from distance calculation and sorting involved in commonly used farthest point sampling, KNN, and ball query algorithms for local feature aggregation. To mitigate these issues in scene flow learning, we regularize raw points to a dense format by storing 3D coordinates in 2D grids. Unlike the sampling operation commonly used in existing works, the dense 2D representation 1) preserves most points in the given scene, 2) brings in a significant boost of efficiency, and 3) eliminates the density gap between points and pixels, allowing us to perform effective feature fusion. We also present a novel warping projection technique to alleviate the information loss problem resulting from the fact that multiple points could be mapped into one grid during projection when computing cost volume. Sufficient experiments demonstrate the efficiency and effectiveness of our method, outperforming the prior-arts on the FlyingThings3D and KITTI dataset.
摘要
点云天然稀疏,而图像像素稠密。这种不一致限制了两种模态特征的融合,从而影响逐点场景流估计。由于常用的最远点采样、KNN和球查询等局部特征聚合算法涉及距离计算和排序,内存效率低、开销大,以往方法很少能一次推理就从整个场景点云预测场景流。为缓解这些问题,我们将原始点云规则化为稠密格式,即把3D坐标存储在2D网格中。与现有工作常用的采样操作不同,这种稠密2D表示:1)保留了场景中的绝大多数点;2)带来显著的效率提升;3)消除了点与像素之间的密度差距,使我们能够进行有效的特征融合。我们还提出了一种新的变形投影技术,以缓解在计算代价体时多个点可能被投影到同一网格而造成的信息丢失问题。充分的实验证明了我们方法的效率和有效性,在FlyingThings3D和KITTI数据集上超越了以往方法。
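A hedged sketch of the dense 2D regularization: project each 3D point with a made-up camera intrinsic and store its coordinates at the hit pixel. The paper's warping projection mitigates the collision problem, whereas in this toy version later points simply overwrite earlier ones.

```python
import numpy as np

def points_to_grid(points, K, height, width):
    """Store 3D coordinates in a dense (H, W, 3) grid via pinhole projection."""
    grid = np.zeros((height, width, 3), dtype=np.float32)
    front = points[:, 2] > 0                      # keep points in front of the camera
    uvw = (K @ points[front].T).T                 # project to the image plane
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    ok = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    grid[v[ok], u[ok]] = points[front][ok]        # collisions: later points overwrite
    return grid

K = np.array([[500.0, 0, 320], [0, 500.0, 180], [0, 0, 1]])   # invented intrinsics
pts = (np.random.randn(10000, 3) * [5, 2, 0] + [0, 0, 15]).astype(np.float32)
dense = points_to_grid(pts, K, 360, 640)
```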
Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination
results: 经过广泛的实验 validate 了我们的提案,在Flickr30K和MS-COCO数据集上达到了比较好的表现,并且在假性负样本的存在下保持了模型的表现稳定性。Abstract
Most existing image-text matching methods adopt triplet loss as the optimization objective, and choosing a proper negative sample for the triplet of anchor, positive, and negative is important for effectively training the model; e.g., hard negatives make the model learn efficiently and effectively. However, we observe that existing methods mainly employ the most similar samples as hard negatives, which may not be true negatives. In other words, samples with high similarity but not paired with the anchor may preserve positive semantic associations, and we call them false negatives. Repelling these false negatives in the triplet loss would mislead the semantic representation learning and result in inferior retrieval performance. In this paper, we propose a novel False Negative Elimination (FNE) strategy to select negatives via sampling, which could alleviate the problem introduced by false negatives. Specifically, we first construct the distributions of positive and negative samples separately via their similarities with the anchor, based on the features extracted from the image and text encoders. Then we calculate the false negative probability of a given sample based on its similarity with the anchor and the above distributions via Bayes' rule, which is employed as the sampling weight during the negative sampling process. Since there may not exist any false negatives in a small batch, we design a memory module with momentum to retain a large negative buffer and implement our negative sampling strategy spanning over the buffer. In addition, to make the model focus on hard negatives, we reassign the sampling weights for the simple negatives with a cut-down strategy. Extensive experiments are conducted on Flickr30K and MS-COCO, and the results demonstrate the superiority of our proposed false negative elimination strategy. The code is available at https://github.com/LuminosityX/FNE.
摘要
现有的图像文本匹配方法大多采用 triplet loss 作为优化目标,而为三元组中的锚点、正例选择合适的负例对高效训练模型十分重要,例如困难负例能使模型学习得更高效、更有效。然而,我们发现现有方法主要使用最相似的样本作为困难负例,而这些样本可能并非真正的负例。也就是说,与锚点相似度很高却未与其配对的样本,可能仍保留正面的语义关联,我们称之为假负例。在 triplet loss 中排斥这些假负例会误导语义表示学习,导致检索性能下降。在这篇文章中,我们提出了一种假负例消除(FNE)策略,通过采样来选择负例,以缓解假负例带来的问题。具体来说,我们基于图像和文本编码器提取的特征,分别依据与锚点的相似度构建正例和负例的分布;然后依据贝叶斯公式,根据给定样本与锚点的相似度及上述分布计算其假负例概率,并将其用作负例采样过程中的采样权重。由于小批量中可能不存在假负例,我们设计了一个带动量的记忆模块,以维持一个大型负例缓冲区,并在整个缓冲区范围内实施我们的负例采样策略。此外,为使模型专注于困难负例,我们用截减策略为简单负例重新分配采样权重。在 Flickr30K 和 MS-COCO 上的大量实验表明了我们提出的假负例消除策略的优越性。代码可以在 https://github.com/LuminosityX/FNE 上获取。
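A rough sketch of the Bayes-rule weighting: estimate the similarity distributions of matched and unmatched pairs, then score how likely a candidate negative actually is a false negative. The prior and histogram estimator are assumptions; in the paper this probability steers the sampling weights over the memory buffer.

```python
import numpy as np

def false_negative_prob(s, pos_sims, neg_sims, prior_fn=0.1, bins=50):
    """p(false negative | similarity s) via Bayes' rule from empirical
    similarity histograms; `prior_fn` is an assumed base rate."""
    lo = min(pos_sims.min(), neg_sims.min())
    hi = max(pos_sims.max(), neg_sims.max())
    edges = np.linspace(lo, hi, bins + 1)
    p_s_pos, _ = np.histogram(pos_sims, bins=edges, density=True)
    p_s_neg, _ = np.histogram(neg_sims, bins=edges, density=True)
    i = np.clip(np.digitize(s, edges) - 1, 0, bins - 1)
    num = p_s_pos[i] * prior_fn                    # behaves like a matched pair
    den = num + p_s_neg[i] * (1 - prior_fn) + 1e-12
    return num / den                               # feeds the negative-sampling weights

rng = np.random.default_rng(0)
pos = rng.normal(0.7, 0.1, 5000)   # similarities of true matched pairs
neg = rng.normal(0.2, 0.1, 5000)   # similarities of unmatched pairs
print(false_negative_prob(np.array([0.15, 0.65]), pos, neg))
```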
Pelta: Shielding Transformers to Mitigate Evasion Attacks in Federated Learning
results: Pelta 在一个 state-of-the-art 的集成模型上进行评估,并证明了其对 Self Attention Gradient 攻击的有效性。Abstract
The main premise of federated learning is that machine learning model updates are computed locally, in particular to preserve user data privacy, as those never leave the perimeter of their device. This mechanism supposes the general model, once aggregated, to be broadcast to collaborating and non malicious nodes. However, without proper defenses, compromised clients can easily probe the model inside their local memory in search of adversarial examples. For instance, considering image-based applications, adversarial examples consist of imperceptibly perturbed images (to the human eye) misclassified by the local model, which can be later presented to a victim node's counterpart model to replicate the attack. To mitigate such malicious probing, we introduce Pelta, a novel shielding mechanism leveraging trusted hardware. By harnessing the capabilities of Trusted Execution Environments (TEEs), Pelta masks part of the back-propagation chain rule, otherwise typically exploited by attackers for the design of malicious samples. We evaluate Pelta on a state of the art ensemble model and demonstrate its effectiveness against the Self Attention Gradient adversarial Attack.
摘要
联邦学习的主要前提是,机器学习模型的更新在本地计算,特别是为了保护用户数据隐私,因为这些数据从不离开用户设备的范围。该机制假定聚合后的全局模型会被广播给协作且无恶意的节点。然而,若缺乏适当的防御,被攻破的客户端可以轻易地在本地内存中探测模型,以寻找对抗样本。例如,在基于图像的应用中,对抗样本是(人眼)难以察觉的扰动图像,会被本地模型错误分类,随后可被呈交给受害节点的对应模型以复现攻击。为缓解此类恶意探测,我们提出了Pelta,一种利用可信硬件的新型屏蔽机制。借助可信执行环境(TEE)的能力,Pelta掩蔽了反向传播链式法则的一部分,而这部分通常被攻击者用来设计恶意样本。我们在一个最先进的集成模型上评估了Pelta,并证明了它对Self Attention Gradient对抗攻击的有效性。
When Super-Resolution Meets Camouflaged Object Detection: A Comparison Study
results: 本研究通过对SR和COD两个领域的联合评估,探讨了这两个领域之间的关系,发现了一些新的实验现象,并总结了新的研究方向。Abstract
Super Resolution (SR) and Camouflaged Object Detection (COD) are two hot topics in computer vision with various joint applications. For instance, low-resolution surveillance images can be successively processed by super-resolution techniques and camouflaged object detection. However, in previous work, these two areas have always been studied in isolation. In this paper, we, for the first time, conduct an integrated comparative evaluation of both. Specifically, we benchmark different super-resolution methods on commonly used COD datasets, and meanwhile, we evaluate the robustness of different COD models by using COD data processed by SR methods. Our goal is to bridge these two domains, discover novel experimental phenomena, and summarize new experimental insights and research directions.
摘要
超分辨率(SR)和伪装目标检测(COD)是计算机视觉中两个热门话题,且在多种应用场景中可以结合使用。例如,低分辨率的监控图像可以先经超分辨率技术处理,再进行伪装目标检测。然而,在以往的研究中,这两个领域一直被孤立地研究。在这篇论文中,我们首次对两者进行了综合的比较评估。具体而言,我们在常用的COD数据集上对不同的超分辨率方法进行了基准测试,同时也用经SR方法处理过的COD数据来评估不同COD模型的鲁棒性。我们的目标是连接这两个领域,发现新的实验现象,并总结新的经验与研究方向。
SSTFormer: Bridging Spiking Neural Network and Memory Support Transformer for Frame-Event based Recognition
results: 实验结果显示,提出的混合模式识别架构可以实现高性能和能效的模式识别,并且可以处理RGB帧和事件流的融合运算。Abstract
Event camera-based pattern recognition is a newly arising research topic in recent years. Current researchers usually transform the event streams into images, graphs, or voxels, and adopt deep neural networks for event-based classification. Although good performance can be achieved on simple event recognition datasets, their results may still be limited by the following two issues. Firstly, they adopt spatially sparse event streams for recognition only, which may fail to capture color and detailed texture information well. Secondly, they adopt either Spiking Neural Networks (SNN) for energy-efficient recognition with suboptimal results, or Artificial Neural Networks (ANN) for energy-intensive, high-performance recognition. However, few of them consider achieving a balance between these two aspects. In this paper, we formally propose to recognize patterns by fusing RGB frames and event streams simultaneously and propose a new RGB frame-event recognition framework to address the aforementioned issues. The proposed method contains four main modules, i.e., a memory support Transformer network for RGB frame encoding, a spiking neural network for raw event stream encoding, a multi-modal bottleneck fusion module for RGB-Event feature aggregation, and a prediction head. Due to the scarcity of RGB-Event based classification datasets, we also propose a large-scale PokerEvent dataset which contains 114 classes and 27,102 frame-event pairs recorded using a DVS346 event camera. Extensive experiments on two RGB-Event based classification datasets fully validated the effectiveness of our proposed framework. We hope this work will boost the development of pattern recognition by fusing RGB frames and event streams. Both our dataset and the source code of this work will be released at https://github.com/Event-AHU/SSTFormer.
摘要
基于事件相机的模式识别是近年来新兴的研究领域。现有研究者通常将事件流转换为图像、图或体素,并采用深度神经网络进行基于事件的分类。尽管在简单的事件识别数据集上可以获得良好的性能,但其结果可能仍受以下两个问题的限制。首先,它们仅使用空间稀疏的事件流进行识别,这可能无法很好地捕捉颜色和细节纹理信息。其次,它们要么采用脉冲神经网络(SNN)进行节能但效果欠佳的识别,要么采用人工神经网络(ANN)进行高能耗、高性能的识别,而很少有工作考虑在这两方面之间取得平衡。在本文中,我们正式提出同时融合RGB帧和事件流来识别模式,并提出了一种新的RGB帧-事件识别框架来解决上述问题。该框架包括四个主要模块:用于RGB帧编码的记忆支持Transformer网络、用于原始事件流编码的脉冲神经网络、用于RGB-Event特征聚合的多模态瓶颈融合模块,以及预测头。由于RGB-Event分类数据集稀缺,我们还提出了一个大规模的PokerEvent数据集,包含114个类别和27,102个帧-事件对(使用DVS346事件相机采集)。在两个RGB-Event分类数据集上的大量实验充分验证了我们所提框架的有效性。我们希望这项工作能够推动融合RGB帧与事件流的模式识别的发展。我们的数据集和源代码将在 https://github.com/Event-AHU/SSTFormer 上发布。
results: 使用半监督学习构建了一个涵盖11个公共交通主题的训练数据集,并用其训练和评估了一个基于RoBERTa架构的语言模型。该模型在所有评价指标上都超过了经典机器学习方法,主题分类精度达到90%。Abstract
Transit riders' feedback provided in ridership surveys, customer relationship management (CRM) channels, and in more recent times, through social media is key for transit agencies to better gauge the efficacy of their services and initiatives. Getting a holistic understanding of riders' experience through the feedback shared in those instruments is often challenging, mostly due to the open-ended, unstructured nature of text feedback. In this paper, we propose leveraging traditional transit CRM feedback to develop and deploy a transit-topic-aware large language model (LLM) capable of classifying open-ended text feedback to relevant transit-specific topics. First, we utilize semi-supervised learning to engineer a training dataset of 11 broad transit topics detected in a corpus of 6 years of customer feedback provided to the Washington Metropolitan Area Transit Authority (WMATA). We then use this dataset to train and thoroughly evaluate a language model based on the RoBERTa architecture. We compare our LLM, MetRoBERTa, to classical machine learning approaches utilizing keyword-based and lexicon representations. Our model outperforms those methods across all evaluation metrics, providing an average topic classification accuracy of 90%. Finally, we provide a value proposition of this work demonstrating how the language model, alongside additional text processing tools, can be applied to add structure to open-ended text sources of feedback like Twitter. The framework and results we present provide a pathway for an automated, generalizable approach for ingesting, visualizing, and reporting transit riders' feedback at scale, enabling agencies to better understand and improve customer experience.
摘要
公共交通使用者的反馈,从乘客关系管理(CRM)渠道、客户反馈Surveys以及最近的社交媒体,对公共交通机构来说非常重要。通过反馈来了解乘客的体验,可以帮助机构更好地了解自己的服务和活动的效果。然而,由于反馈的开放性和无结构性,通常是困难的获得整体的理解。在这篇文章中,我们提议利用传统的公共交通CRM反馈,开发和部署一个具有公共交通话题意识的大语言模型(LLM),可以将开放式文本反馈分类到相关的公共交通话题。我们首先利用半监督学习,engineer一个训练集,其中包含6年的客户反馈数据,来自华盛顿都会区公共交通管理局(WMATA)。然后,我们使用这个数据集来训练和评估一个基于RoBERTa架构的语言模型。我们与键字基本架构和词汇表表示方法进行比较。我们的模型在所有评价指标上都高于这些方法,提供了90%的话题分类精度。最后,我们提供了这种工作的价值提案,说明如何使用语言模型, alongside其他文本处理工具,对开放式文本反馈 sources like Twitter进行结构化处理,以提高客户体验的理解和改进。我们的框架和结果提供了一种可扩展的自动化方法,可以在大规模的客户反馈数据中快速、自动地进行分类和报告,帮助机构更好地了解和改进客户体验。
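A minimal fine-tuning step for an 11-topic transit classifier built on a RoBERTa checkpoint, using the Hugging Face transformers API. The example text and topic id are invented for illustration; MetRoBERTa's actual training data and labels come from the WMATA feedback corpus.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

NUM_TOPICS = 11  # the paper's 11 broad transit topics (names not listed in the abstract)
tok = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base",
                                                           num_labels=NUM_TOPICS)
optim = torch.optim.AdamW(model.parameters(), lr=2e-5)

batch_text = ["Elevator at Metro Center has been out for a week."]
batch_labels = torch.tensor([3])  # e.g., a hypothetical "accessibility" topic id

inputs = tok(batch_text, return_tensors="pt", truncation=True, padding=True)
loss = model(**inputs, labels=batch_labels).loss
loss.backward()
optim.step()
```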
AspectMMKG: A Multi-modal Knowledge Graph with Aspect-aware Entities
paper_authors: Jingdan Zhang, Jiaan Wang, Xiaodan Wang, Zhixu Li, Yanghua Xiao
for: 构建一个包含方面感知实体的多模态知识图谱(MMKG),以便从多个方面更全面地理解实体。
methods: 通过将图像与不同的实体方面相匹配来扩展现有MMKG:从知识库中收集方面相关图像,并从知识库中抽取方面相关句子作为查询,借助在线图像搜索引擎检索大量方面相关图像。
results: 构建了AspectMMKG,包含2,380个实体、18,139个实体方面和645,383张方面相关图像;并提出了一种新的方面相关图像检索(AIR)模型,用于修正和扩展AspectMMKG中的实体图像。该模型通过整合实体图像、实体方面和方面图像信息来学习实体图像与方面相关图像之间的关系。实验结果表明,AIR模型能够针对给定实体的不同方面检索到合适的图像。Abstract
Multi-modal knowledge graphs (MMKGs) combine different modal data (e.g., text and image) for a comprehensive understanding of entities. Despite the recent progress of large-scale MMKGs, existing MMKGs neglect the multi-aspect nature of entities, limiting the ability to comprehend entities from various perspectives. In this paper, we construct AspectMMKG, the first MMKG with aspect-related images by matching images to different entity aspects. Specifically, we collect aspect-related images from a knowledge base, and further extract aspect-related sentences from the knowledge base as queries to retrieve a large number of aspect-related images via an online image search engine. Finally, AspectMMKG contains 2,380 entities, 18,139 entity aspects, and 645,383 aspect-related images. We demonstrate the usability of AspectMMKG in entity aspect linking (EAL) downstream task and show that previous EAL models achieve a new state-of-the-art performance with the help of AspectMMKG. To facilitate the research on aspect-related MMKG, we further propose an aspect-related image retrieval (AIR) model, that aims to correct and expand aspect-related images in AspectMMKG. We train an AIR model to learn the relationship between entity image and entity aspect-related images by incorporating entity image, aspect, and aspect image information. Experimental results indicate that the AIR model could retrieve suitable images for a given entity w.r.t different aspects.
摘要
多模态知识图谱(MMKG)结合不同模态数据(例如文本和图像)以实现对实体的全面理解。尽管大规模MMKG近来取得了进展,但现有MMKG忽视了实体的多方面性,限制了从不同角度理解实体的能力。在本文中,我们构建了AspectMMKG,这是首个通过将图像与不同实体方面相匹配而包含方面相关图像的MMKG。具体来说,我们从知识库中收集方面相关图像,并进一步从知识库中抽取方面相关句子作为查询,通过在线图像搜索引擎检索大量方面相关图像。最终,AspectMMKG包含2,380个实体、18,139个实体方面和645,383张方面相关图像。我们展示了AspectMMKG在实体方面链接(EAL)下游任务中的可用性,并证明了以往的EAL模型借助AspectMMKG实现了新的最佳性能。为便于方面相关MMKG的研究,我们还提出了一种方面相关图像检索(AIR)模型,旨在修正和扩展AspectMMKG中的方面相关图像。AIR模型通过整合实体图像、方面和方面图像信息,学习实体图像与实体方面相关图像之间的关系。实验结果表明,AIR模型可以针对不同方面为给定实体检索合适的图像。
results: 实验结果表明,这些方法能够取得较强的分类性能,并具备跨架构泛化能力。此外,本文还研究了这些方法所生成的数据摘要在不同语言上的公平性。Abstract
With the rise of deep learning, large datasets and complex models have become common, requiring significant computing power. To address this, data distillation has emerged as a technique to quickly train models with lower memory and time requirements. However, data distillation on text-based datasets hasn't been explored much because of the challenges arising from its discrete nature. Additionally, existing dataset distillation methods often struggle to generalize to new architectures. In this paper, we propose several data distillation techniques for multilingual text classification datasets using language-model-based learning methods. We conduct experiments to analyze their performance in terms of classification strength and cross-architecture generalization. Furthermore, we investigate the language-specific fairness of the data summaries generated by these methods. Our approach builds upon existing techniques, enhancing cross-architecture generalization in the text data distillation domain.
摘要
随着深度学习的出现,大量数据和复杂的模型变得普遍,需要大量的计算能力。为解决这问题,数据简化技术得到了广泛应用,但是在文本数据集上,数据简化技术尚未得到了充分的研究,这是因为文本数据的精度性带来了很多挑战。现有的数据简化方法通常难以泛化到新的架构上。在本文中,我们提出了一些基于语言模型学习方法的文本分类数据集简化技术。我们通过实验分析这些技术的分类强度和泛化性。此外,我们还研究了这些方法生成的语言特异性数据概要的公平性。我们的方法基于现有技术,提高了文本数据简化领域的跨架构泛化性。
Improving Autonomous Separation Assurance through Distributed Reinforcement Learning with Attention Networks
results: 通过实验研究,表明提议的框架可以在高密度、动态环境中保证安全高效的分离,并且比现有方法具有更高的训练样本通过率。Abstract
Advanced Air Mobility (AAM) introduces a new, efficient mode of transportation with the use of vehicle autonomy and electrified aircraft to provide increasingly autonomous transportation between previously underserved markets. Safe and efficient navigation of low altitude aircraft through highly dense environments requires the integration of a multitude of complex observations, such as surveillance, knowledge of vehicle dynamics, and weather. The processing and reasoning on these observations pose challenges due to the various sources of uncertainty in the information while ensuring cooperation with a variable number of aircraft in the airspace. These challenges coupled with the requirement to make safety-critical decisions in real-time rule out the use of conventional separation assurance techniques. We present a decentralized reinforcement learning framework to provide autonomous self-separation capabilities within AAM corridors with the use of speed and vertical maneuvers. The problem is formulated as a Markov Decision Process and solved by developing a novel extension to the sample-efficient, off-policy soft actor-critic (SAC) algorithm. We introduce the use of attention networks for variable-length observation processing and a distributed computing architecture to achieve high training sample throughput as compared to existing approaches. A comprehensive numerical study shows that the proposed framework can ensure safe and efficient separation of aircraft in high density, dynamic environments with various sources of uncertainty.
摘要
先进空中交通(AAM)借助载具自主技术和电动飞行器引入了一种新的高效交通方式,在以往服务不足的市场之间提供日益自主的运输。要在高密度环境中安全高效地引导低空飞行器,需要融合监视信息、载具动力学知识和气象等多种复杂观测。处理和推理这些观测面临信息中多种不确定性来源的挑战,同时还须与空域中数量可变的飞行器协同。这些挑战加上需要实时做出安全关键决策,使传统的间隔保障技术不再适用。我们提出了一种去中心化强化学习框架,通过速度和垂直机动,在AAM走廊内提供自主的自我间隔能力。该问题被建模为马尔可夫决策过程,并通过对样本高效的离策略soft actor-critic(SAC)算法进行一种新的扩展来求解。我们引入注意力网络来处理变长观测,并采用分布式计算架构,相比现有方法实现了更高的训练样本吞吐量。一项全面的数值研究表明,所提框架能够在存在多种不确定性来源的高密度动态环境中确保飞行器安全高效的间隔。
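The attention-based handling of variable-length observations can be sketched as follows: intruder observations are zero-padded and masked, so a single network serves any traffic density. The feature sizes and the 4-feature intruder encoding are assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

embed = nn.Linear(4, 32)                 # 4 hypothetical features per intruder aircraft
attn = nn.MultiheadAttention(32, num_heads=4, batch_first=True)

ownship = torch.randn(2, 1, 32)          # query: the ego aircraft's state embedding
intruders = torch.randn(2, 5, 4)         # up to 5 intruders per scene, zero-padded
pad_mask = torch.tensor([[False, False, True, True, True],     # scene 1: 2 intruders
                         [False, False, False, False, True]])  # scene 2: 4 intruders

keys = embed(intruders)
ctx, _ = attn(ownship, keys, keys, key_padding_mask=pad_mask)
# `ctx` is a fixed-size traffic summary fed to the SAC policy/value heads,
# regardless of how many aircraft are present.
```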
Wirelessly Powered Federated Learning Networks: Joint Power Transfer, Data Sensing, Model Training, and Resource Allocation
results: 研究结果表明,提出的可持续联邦学习(S2FL)算法相比其他基准方案可将总完成时间减少21.45%。此外,研究还发现,采用非正交多址接入(NOMA)可将所考虑联邦学习系统的总完成时间平均缩短8.36%。Abstract
Federated learning (FL) has found many successes in wireless networks; however, the implementation of FL has been hindered by the energy limitation of mobile devices (MDs) and the availability of training data at MDs. How to integrate wireless power transfer and mobile crowdsensing towards sustainable FL solutions is a research topic entirely missing from the open literature. This work for the first time investigates a resource allocation problem in collaborative sensing-assisted sustainable FL (S2FL) networks with the goal of minimizing the total completion time. We investigate a practical harvesting-sensing-training-transmitting protocol in which energy-limited MDs first harvest energy from RF signals, use it to gain a reward for user participation, sense the training data from the environment, train the local models at MDs, and transmit the model updates to the server. The total completion time minimization problem of jointly optimizing power transfer, transmit power allocation, data sensing, bandwidth allocation, local model training, and data transmission is complicated due to the non-convex objective function, highly non-convex constraints, and strongly coupled variables. We propose a computationally-efficient path-following algorithm to obtain the optimal solution via the decomposition technique. In particular, inner convex approximations are developed for the resource allocation subproblem, and the subproblems are performed alternatively in an iterative fashion. Simulation results are provided to evaluate the effectiveness of the proposed S2FL algorithm in reducing the completion time up to 21.45% in comparison with other benchmark schemes. Further, we investigate an extension of our work from frequency division multiple access (FDMA) to non-orthogonal multiple access (NOMA) and show that NOMA can speed up the total completion time 8.36% on average of the considered FL system.
摘要
联邦学习(FL)在无线网络中取得了许多成功;然而,移动设备(MD)的能量限制以及训练数据在MD侧的可用性阻碍了FL的落地。如何整合无线能量传输与移动群智感知以实现可持续的FL解决方案,是公开文献中完全缺失的研究课题。本研究首次研究了协同感知辅助的可持续FL(S2FL)网络中的资源分配问题,目标是最小化总完成时间。我们研究了一种实用的“收集-感知-训练-传输”协议:能量受限的MD首先从射频信号中收集能量,借此获得参与奖励,从环境中感知训练数据,在本地训练模型,并将模型更新发送到服务器。由于目标函数非凸、约束高度非凸且变量强耦合,联合优化能量传输、发射功率分配、数据感知、带宽分配、本地模型训练和数据传输的总完成时间最小化问题十分复杂。我们提出了一种计算高效的路径跟踪算法,通过分解技术获得最优解。具体来说,我们针对资源分配子问题构造了内凸近似,并以迭代方式交替求解各子问题。仿真结果表明,与其他基准方案相比,所提S2FL算法可将完成时间降低21.45%。此外,我们将工作从频分多址(FDMA)扩展到非正交多址(NOMA),并表明NOMA可将所考虑FL系统的总完成时间平均缩短8.36%。
Prototypical Kernel Learning and Open-set Foreground Perception for Generalized Few-shot Semantic Segmentation
results: 在 PASCAL-5i 和 COCO-20i 数据集上进行了广泛的实验,并证明了我们的方法在 previous state-of-the-art 之上表现更好。Abstract
Generalized Few-shot Semantic Segmentation (GFSS) extends Few-shot Semantic Segmentation (FSS) to simultaneously segment unseen and seen classes during evaluation. Previous works leverage an additional branch or prototypical aggregation to eliminate the constrained setting of FSS. However, representation division and embedding prejudice, which heavily degrade the performance of GFSS, have not been considered jointly. We address the aforementioned problems by combining prototypical kernel learning and open-set foreground perception. Specifically, a group of learnable kernels is proposed to perform segmentation, with each kernel in charge of a stuff class. Then, we explore merging prototypical learning into the update of base-class kernels, which is consistent with the prototype knowledge aggregation of few-shot novel classes. In addition, a foreground contextual perception module cooperating with conditional-bias-based inference is adopted to perform class-agnostic as well as open-set foreground detection, thus mitigating the embedding prejudice and preventing novel targets from being misclassified as background. Moreover, we also adjust our method to Class Incremental Few-shot Semantic Segmentation (CIFSS), which takes in the knowledge of novel classes in an incremental stream. Extensive experiments on the PASCAL-5i and COCO-20i datasets demonstrate that our method performs better than the previous state-of-the-art.
Variations on the Reinforcement Learning performance of Blackjack
for: The paper is written to explore the impact of deck size on the convergence of q-learning algorithms in the context of blackjack.
methods: The paper uses a q-learning solution for optimal play in blackjack, and investigates the rate of learning convergence as a function of deck size.
results: The paper shows that a card counter perfectly using the basic strategy and hi-lo system can bring the house to bankruptcy, and that environment variations have a significant impact on this outcome.Abstract
Blackjack or "21" is a popular card-based game of chance and skill. The objective of the game is to win by obtaining a hand total higher than the dealer's without exceeding 21. The ideal blackjack strategy will maximize financial return in the long run while avoiding gambler's ruin. The stochastic environment and inherent reward structure of blackjack presents an appealing problem to better understand reinforcement learning agents in the presence of environment variations. Here we consider a q-learning solution for optimal play and investigate the rate of learning convergence of the algorithm as a function of deck size. A blackjack simulator allowing for universal blackjack rules is also implemented to demonstrate the extent to which a card counter perfectly using the basic strategy and hi-lo system can bring the house to bankruptcy and how environment variations impact this outcome. The novelty of our work is to place this conceptual understanding of the impact of deck size in the context of learning agent convergence.
摘要
黑杰克(Blackjack)或“21点”是一款流行的、兼具运气与技巧的卡牌游戏。游戏的目标是在不超过21点的前提下,使手牌总点数高于庄家从而获胜。理想的黑杰克策略能在长期内最大化收益,同时避免赌徒破产。黑杰克的随机环境和内在的回报结构,为研究强化学习智能体在环境变化下的行为提供了一个颇具吸引力的问题。在这里,我们考虑用Q-learning求解最优打法,并研究算法的学习收敛速度与牌组数量(deck size)之间的关系。我们还实现了一个支持通用黑杰克规则的模拟器,用以展示一个完美运用基本策略和高低(hi-lo)计牌系统的算牌者能在多大程度上使庄家破产,以及环境变化如何影响这一结果。我们工作的新颖之处在于,将牌组数量影响的概念性理解置于学习智能体收敛的背景之下。
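A minimal tabular Q-learning loop for blackjack, using Gymnasium's built-in environment. Note that Blackjack-v1 deals with replacement (an effectively infinite deck), so studying deck-size effects as in the paper would require a custom environment; the hyperparameters below are illustrative.

```python
import numpy as np
import gymnasium as gym
from collections import defaultdict

env = gym.make("Blackjack-v1")
Q = defaultdict(lambda: np.zeros(env.action_space.n))
alpha, gamma, eps = 0.1, 1.0, 0.1

for _ in range(100_000):
    state, _ = env.reset()   # state = (player_sum, dealer_card, usable_ace)
    done = False
    while not done:
        a = env.action_space.sample() if np.random.rand() < eps else int(np.argmax(Q[state]))
        nxt, r, term, trunc, _ = env.step(a)
        done = term or trunc
        target = r + gamma * (0.0 if done else np.max(Q[nxt]))
        Q[state][a] += alpha * (target - Q[state][a])   # standard Q-learning update
        state = nxt
```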
Methods for Acquiring and Incorporating Knowledge into Stock Price Prediction: A Survey
for: This paper aims to provide a systematic and comprehensive overview of methods for incorporating external knowledge into stock price prediction models, including the acquisition of external knowledge from various unstructured data sources and fusion methods for combining external knowledge with historical price features.
methods: The paper covers various methods for acquiring external knowledge, including non-graph-based and graph-based knowledge representations, and explores fusion methods for combining external knowledge with historical price features.
results: The paper includes a compilation of relevant datasets and discusses potential future research directions in this domain.Abstract
Predicting stock prices presents a challenging research problem due to the inherent volatility and non-linear nature of the stock market. In recent years, knowledge-enhanced stock price prediction methods have shown groundbreaking results by utilizing external knowledge to understand the stock market. Despite the importance of these methods, there is a scarcity of scholarly works that systematically synthesize previous studies from the perspective of external knowledge types. Specifically, the external knowledge can be modeled in different data structures, which we group into non-graph-based formats and graph-based formats: 1) non-graph-based knowledge captures contextual information and multimedia descriptions specifically associated with an individual stock; 2) graph-based knowledge captures interconnected and interdependent information in the stock market. This survey paper aims to provide a systematic and comprehensive description of methods for acquiring external knowledge from various unstructured data sources and then incorporating it into stock price prediction models. We also explore fusion methods for combining external knowledge with historical price features. Moreover, this paper includes a compilation of relevant datasets and delves into potential future research directions in this domain.
摘要
预测股票价格是一个复杂的研究问题,因为股票市场本身具有不确定性和非线性。在过去几年,带有知识的股票价格预测方法已经显示出了创新的成果,通过利用外部知识来理解股票市场。尽管这些方法的重要性,但是学术研究中对外部知识类型的系统化synthesis却相对罕见。特别是,外部知识可以通过不同的数据结构来表示,我们将其分为非图形化格式和图形化格式:1)非图形化知识捕捉特定股票的上下文信息和 multimedia描述; 2)图形化知识捕捉股票市场中的相互连接和相互依赖信息。本文旨在提供一个系统性和全面的描述,涵盖从不同的不结构化数据源中获取外部知识,并将其与历史价格特征合并。此外,本文还探讨了外部知识与历史价格特征的融合方法,以及相关数据集和未来研究方向。
LLMeBench: A Flexible Framework for Accelerating LLMs Benchmarking
paper_authors: Fahim Dalvi, Maram Hasanain, Sabri Boughorbel, Basel Mousi, Samir Abdaljalil, Nizi Nazar, Ahmed Abdelali, Shammur Absar Chowdhury, Hamdy Mubarak, Ahmed Ali, Majd Hawasly, Nadir Durrani, Firoj Alam
results: 在 90 个实验设置中,基于 53 个公开数据集对 31 个不同的 NLP 任务进行了测试,涉及约 296K 个数据点。Abstract
The recent development and success of Large Language Models (LLMs) necessitate an evaluation of their performance across diverse NLP tasks in different languages. Although several frameworks have been developed and made publicly available, their customization capabilities for specific tasks and datasets are often complex for different users. In this study, we introduce the LLMeBench framework. Initially developed to evaluate Arabic NLP tasks using OpenAI's GPT and BLOOM models; it can be seamlessly customized for any NLP task and model, regardless of language. The framework also features zero- and few-shot learning settings. A new custom dataset can be added in less than 10 minutes, and users can use their own model API keys to evaluate the task at hand. The developed framework has been already tested on 31 unique NLP tasks using 53 publicly available datasets within 90 experimental setups, involving approximately 296K data points. We plan to open-source the framework for the community (https://github.com/qcri/LLMeBench/). A video demonstrating the framework is available online (https://youtu.be/FkQn4UjYA0s).
摘要
大语言模型(LLM)近来的发展与成功,要求在不同语言的多种NLP任务上评估其性能。虽然已有多个公开的评估框架,但针对特定任务和数据集的定制对不同用户而言往往较为复杂。在本研究中,我们介绍LLMeBench框架。它最初是为使用OpenAI的GPT和BLOOM模型评估阿拉伯语NLP任务而开发,但可以无缝地定制到任何语言的任何NLP任务和模型,并支持零样本和少样本学习设置。添加一个新的自定义数据集不到10分钟,用户可以使用自己的模型API密钥来评估手头的任务。该框架已在90个实验设置中、基于53个公开数据集对31个不同的NLP任务进行了测试,涉及约296K个数据点。我们计划向社区开源该框架(https://github.com/qcri/LLMeBench/)。框架演示视频可在线观看(https://youtu.be/FkQn4UjYA0s)。
Gaussian Image Anomaly Detection with Greedy Eigencomponent Selection
paper_authors: Tetiana Gula, João P C Bertoldo
for: 本文旨在提出一种用于图像异常检测(AD)的新型降维方法,基于预训练的卷积神经网络(CNN)并采用EfficientNet模型。
methods: 本文使用两种均采用贪婪策略的树搜索方法来选择最优的特征成分(eigencomponent);同时进行了三组主要实验来评估方法的效果:测试集表现对成分选择的影响、在一种异常类型上训练并在所有其他类型上评估、以及使用最少数量的图像并按异常类型选择训练图像。
results: 结果显示,即使使用更少的成分,该方法在检测精度上也优于PCA和NPCA。这表明该方法为AD提供了一种有效的降维途径,有望提升图像异常检测系统的效率与精度。Abstract
Anomaly detection (AD) in images, identifying significant deviations from normality, is a critical issue in computer vision. This paper introduces a novel approach to dimensionality reduction for AD using pre-trained convolutional neural networks (CNNs) that incorporate EfficientNet models. We investigate the importance of component selection and propose two types of tree search approaches, both employing a greedy strategy, for optimal eigencomponent selection. Our study conducts three main experiments to evaluate the effectiveness of our approach. The first experiment explores the influence of test set performance on component choice, the second experiment examines the performance when we train on one anomaly type and evaluate on all other types, and the third experiment investigates the impact of using a minimum number of images for training and selecting them based on anomaly types. Our approach aims to find the optimal subset of components that delivers the highest performance score, instead of focusing solely on the proportion of variance explained by each component, and also to understand the components' behaviour in different settings. Our results indicate that the proposed method surpasses both Principal Component Analysis (PCA) and Negated Principal Component Analysis (NPCA) in terms of detection accuracy, even when using fewer components. Thus, our approach provides a promising alternative to conventional dimensionality reduction techniques in AD, and holds potential to enhance the efficiency and effectiveness of AD systems.
摘要
“图像中的异常检测(AD),即识别与正常情况的显著偏离,是计算机视觉中的关键问题。本文介绍了一种新的降维方法,用于基于预训练卷积神经网络(CNN)及EfficientNet模型的异常检测。我们研究了成分选择的重要性,并提出了两种均采用贪婪策略的树搜索方法,用于选择最优的特征成分。我们开展了三组主要实验来评估方法的效果:第一组实验考察测试集表现对成分选择的影响;第二组实验考察在一种异常类型上训练、在所有其他类型上评估时的性能;第三组实验考察使用最少数量的图像进行训练并按异常类型选择图像的影响。我们的方法旨在找到能带来最高性能得分的成分子集,而不是仅关注每个成分所解释的方差比例,同时理解各成分在不同设置下的行为。结果表明,即使使用更少的成分,所提方法在检测精度上也超过了主成分分析(PCA)和负主成分分析(NPCA)。因此,我们的方法为AD中的传统降维技术提供了一种有前景的替代方案,有望提升AD系统的效率与效果。”
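A hedged sketch of greedy eigencomponent selection for a Gaussian-style anomaly score: fit PCA on normal features, then greedily add the component whose inclusion most improves AUROC. The synthetic features, the per-component score, and the use of the test labels for selection (as in the paper's first experiment) are simplifications of the full setup.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
normal_feats = rng.normal(size=(500, 64))              # e.g., pooled CNN features
test_feats = np.vstack([rng.normal(size=(100, 64)),
                        rng.normal(2.0, 1.0, size=(100, 64))])
test_labels = np.array([0] * 100 + [1] * 100)          # 1 = anomalous

pca = PCA().fit(normal_feats)
proj = pca.transform(test_feats)                       # components ordered by variance
comp_scores = proj ** 2 / pca.explained_variance_      # per-component anomaly evidence

selected, best_auc = [], 0.0
for _ in range(10):                                    # greedy: add the most helpful component
    gains = []
    for c in range(comp_scores.shape[1]):
        if c in selected:
            gains.append(-np.inf)
            continue
        gains.append(roc_auc_score(test_labels,
                                   comp_scores[:, selected + [c]].sum(axis=1)))
    c_best = int(np.argmax(gains))
    if gains[c_best] <= best_auc:
        break                                          # stop when no component improves AUROC
    selected.append(c_best)
    best_auc = gains[c_best]
print(selected, best_auc)
```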
paper_authors: Guangyuan Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim, Xuemin, Shen
for: This paper aims to develop a comprehensive conceptual model for integrating Artificial Intelligence Generated Content (AIGC) and Semantic Communication (SemCom), and to propose a novel framework for generating meaningful and effective content using AIGC technology.
methods: The paper employs a novel framework that uses AIGC technology as an encoder and decoder for semantic information, and jointly optimizes semantic extraction and evaluation metrics tailored to AIGC services. The framework is adaptable to different types of content generated, the required quality, and the semantic information utilized.
results: The paper presents a case study using a Deep Q Network (DQN) to demonstrate the feasibility of the optimization problem and its convergence characteristics. The study provides useful insights into the effectiveness of the proposed framework for generating meaningful and effective content using AIGC technology.Abstract
Artificial Intelligence Generated Content (AIGC) Services have significant potential in digital content creation. The distinctive abilities of AIGC, such as content generation based on minimal input, hold huge potential, especially when integrating with semantic communication (SemCom). In this paper, a novel comprehensive conceptual model for the integration of AIGC and SemCom is developed. Particularly, a content generation level is introduced on top of the semantic level that provides a clear outline of how AIGC and SemCom interact with each other to produce meaningful and effective content. Moreover, a novel framework that employs AIGC technology is proposed as an encoder and decoder for semantic information, considering the joint optimization of semantic extraction and evaluation metrics tailored to AIGC services. The framework can adapt to different types of content generated, the required quality, and the semantic information utilized. By employing a Deep Q Network (DQN), a case study is presented that provides useful insights into the feasibility of the optimization problem and its convergence characteristics.
An In-Depth Analysis of Discretization Methods for Communication Learning using Backpropagation with Multi-Agent Reinforcement Learning
results: The study finds that the ST-DRU method performs best across the different environments: it reaches or matches the best performance of the other methods in every experiment and is the only method that does not fail in any of the tested environments.
Abstract
Communication is crucial in multi-agent reinforcement learning when agents are not able to observe the full state of the environment. The most common approach to allow learned communication between agents is the use of a differentiable communication channel that allows gradients to flow between agents as a form of feedback. However, this is challenging when we want to use discrete messages to reduce the message size, since gradients cannot flow through a discrete communication channel. Previous work proposed methods to deal with this problem. However, these methods are tested in different communication learning architectures and environments, making it hard to compare them. In this paper, we compare several state-of-the-art discretization methods as well as a novel approach. We do this comparison in the context of communication learning using gradients from other agents and perform tests on several environments. In addition, we present COMA-DIAL, a communication learning approach based on DIAL and COMA extended with learning rate scaling and adapted exploration. Using COMA-DIAL allows us to perform experiments on more complex environments. Our results show that the novel ST-DRU method, proposed in this paper, achieves the best results out of all discretization methods across the different environments. It achieves the best or close to the best performance in each of the experiments and is the only method that does not fail on any of the tested environments.
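To make the discretization problem concrete, here is a minimal straight-through, noise-regularized message unit in the spirit of the DRU/ST-DRU family: a hard binary message in the forward pass, smooth gradients in the backward pass. The noise level and threshold are assumptions for illustration, not the authors' exact formulation.

```python
import torch

def st_dru(message, training=True, noise_std=0.5):
    # Noise during training regularizes the channel; the straight-through
    # trick sends the hard value forward while gradients follow the sigmoid.
    pre = message + noise_std * torch.randn_like(message) if training else message
    soft = torch.sigmoid(pre)
    hard = (soft > 0.5).float()
    return hard + soft - soft.detach()               # hard forward, soft backward

logits = torch.randn(4, 8, requires_grad=True)       # 4 agents, 8-bit messages
out = st_dru(logits)
out.sum().backward()
print(out.unique(), logits.grad.abs().sum() > 0)     # binary outputs, nonzero gradients
```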
Service Reservation and Pricing for Green Metaverses: A Stackelberg Game Approach
results: Compared with conventional schemes, the proposed scheme achieves energy savings and individual rationality simultaneously while satisfying users' economic needs; the article also discusses how several emerging technologies can be combined to realize a sustainable green Metaverse.
Abstract
Metaverse enables users to communicate, collaborate and socialize with each other through their digital avatars. Due to the spatio-temporal characteristics, co-located users are served well by performing their software components in a collaborative manner, such that a Metaverse service provider (MSP) eliminates redundant data transmission and processing, ultimately reducing the total energy consumption. Energy-efficient service provision is crucial for enabling the green and sustainable Metaverse. In this article, we take an augmented reality (AR) application as an example to achieve this goal. Moreover, we study an economic issue on how the users reserve offloading services from the MSP and how the MSP determines an optimal charging price, since each user rationally decides whether to accept the offloading service by taking into account the monetary cost. A single-leader multi-follower Stackelberg game is formulated between the MSP and users, while each user optimizes an offloading probability to minimize the weighted sum of time, energy consumption and monetary cost. Numerical results show that our scheme achieves energy savings and satisfies individual rationality simultaneously compared with the conventional schemes. Finally, we identify and discuss open directions on how several emerging technologies are combined with the sustainable green Metaverse.
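A toy numeric sketch of the game structure, reduced to a single follower: the user best-responds to a posted price with an offloading probability minimizing a weighted sum of time, energy, and money, and the MSP then prices against that best response. All constants and the linear cost form are illustrative assumptions, not the paper's utility model.

```python
import numpy as np

def follower_cost(p, price, w=(1.0, 1.0, 1.0),
                  t_local=2.0, t_off=0.6, e_local=3.0, e_off=0.4):
    time = (1 - p) * t_local + p * t_off       # expected completion time
    energy = (1 - p) * e_local + p * e_off     # expected energy consumption
    money = p * price                          # expected payment to the MSP
    return w[0] * time + w[1] * energy + w[2] * money

def best_response(price, grid=np.linspace(0, 1, 101)):
    return grid[int(np.argmin([follower_cost(p, price) for p in grid]))]

# Leader: sweep prices and keep the one maximizing revenue under best response.
prices = np.linspace(0.1, 8.0, 80)
revenues = [price * best_response(price) for price in prices]
p_star = prices[int(np.argmax(revenues))]
print(f"leader price ~ {p_star:.2f}, follower offloads with p = {best_response(p_star):.2f}")
```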
LLaMA-E: Empowering E-commerce Authoring with Multi-Aspect Instruction Following
paper_authors: Kaize Shi, Xueyao Sun, Dingxian Wang, Yinlin Fu, Guandong Xu, Qing Li
methods: The paper builds dedicated instruction-following language models (LLaMA-E) and trains them on a range of e-commerce authoring tasks, including ads generation, product title rewriting, product classification, purchase intent speculation, and general Q&A.
results: The proposed LLaMA-E models achieve state-of-the-art results in both quantitative and qualitative evaluations and show an advantage in zero-shot scenarios; the study also finds that the models offer a strong solution to e-commerce content-authoring problems.
Abstract
E-commerce authoring involves creating attractive, abundant, and targeted promotional content to drive product sales. The emergence of large language models (LLMs) introduces an innovative paradigm, offering a unified solution to address various authoring tasks within this scenario. However, mainstream LLMs trained on general corpora with common sense knowledge reveal limitations in fitting complex and personalized features unique to e-commerce products and customers. Furthermore, LLMs like GPT-3.5 necessitate remote accessibility, raising concerns about safeguarding voluminous customer privacy data during transmission. This paper proposes the LLaMA-E, the unified and customized instruction-following language models focusing on diverse e-commerce authoring tasks. Specifically, the domain experts create the seed instruction set from the tasks of ads generation, query-enhanced product title rewriting, product classification, purchase intent speculation, and general Q&A. These tasks enable the models to comprehensively understand precise e-commerce authoring knowledge by interleaving features covering typical service aspects of customers, sellers, and platforms. The GPT-3.5 is introduced as a teacher model, which expands the seed instructions to form a training set for the LLaMA-E models with various scales. The experimental results show that the proposed LLaMA-E models achieve state-of-the-art results in quantitative and qualitative evaluations, also exhibiting the advantage in zero-shot scenes. To the best of our knowledge, this study is the first to serve the LLMs to specific e-commerce authoring scenarios.
SLPT: Selective Labeling Meets Prompt Tuning on Label-Limited Lesion Segmentation
results: The method achieves state-of-the-art performance on liver tumor segmentation with only 6% of the parameters tunable, and reaches 94% of full-data performance while labeling only 5% of the data.
Abstract
Medical image analysis using deep learning is often challenged by limited labeled data and high annotation costs. Fine-tuning the entire network in label-limited scenarios can lead to overfitting and suboptimal performance. Recently, prompt tuning has emerged as a more promising technique that introduces a few additional tunable parameters as prompts to a task-agnostic pre-trained model, and updates only these parameters using supervision from limited labeled data while keeping the pre-trained model unchanged. However, previous work has overlooked the importance of selective labeling in downstream tasks, which aims to select the most valuable downstream samples for annotation to achieve the best performance with minimum annotation cost. To address this, we propose a framework that combines selective labeling with prompt tuning (SLPT) to boost performance in limited labels. Specifically, we introduce a feature-aware prompt updater to guide prompt tuning and a TandEm Selective LAbeling (TESLA) strategy. TESLA includes unsupervised diversity selection and supervised selection using prompt-based uncertainty. In addition, we propose a diversified visual prompt tuning strategy to provide multi-prompt-based discrepant predictions for TESLA. We evaluate our method on liver tumor segmentation and achieve state-of-the-art performance, outperforming traditional fine-tuning with only 6% of tunable parameters, also achieving 94% of full-data performance by labeling only 5% of the data.
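The core mechanics of prompt tuning, freezing the pre-trained model and updating only a few prompt parameters plus a light head, can be sketched as follows. The backbone, injection scheme, and sizes are illustrative stand-ins rather than the SLPT architecture.

```python
import torch
import torch.nn as nn

class PromptTunedModel(nn.Module):
    def __init__(self, backbone, dim=256, n_prompts=8, n_classes=2):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False            # pre-trained weights stay unchanged
        self.prompts = nn.Parameter(0.02 * torch.randn(n_prompts, dim))
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):
        feats = self.backbone(x)                              # (B, dim)
        attn = torch.softmax(feats @ self.prompts.T, dim=-1)  # attend to prompt tokens
        return self.head(feats + attn @ self.prompts)

backbone = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 256))  # stand-in encoder
model = PromptTunedModel(backbone)
tunable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"tuning {tunable}/{total} parameters ({100 * tunable / total:.1f}%)")
```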
Adversarial Deep Reinforcement Learning for Cyber Security in Software Defined Networks
paper_authors: Luke Borchjes, Clement Nyirenda, Louise Leenen
for: The paper explores the impact of adversarial learning in Deep Reinforcement Learning (DRL) for autonomous security in Software Defined Networks (SDN).
methods: The paper compares two algorithms, Double Deep Q-Networks (DDQN) and Neural Episodic Control to Deep Q-Network (NEC2DQN or N2D), and evaluates their performance under a white-box setting with a causative attack.
results: The paper shows that with minute parameter changes, the algorithms are still able to defend the network, and the introduction of the causative attack improves the attacker's performance.
Abstract
This paper focuses on the impact of leveraging autonomous offensive approaches in Deep Reinforcement Learning (DRL) to train more robust agents, by exploring the impact of applying adversarial learning to DRL for autonomous security in Software Defined Networks (SDN). Two algorithms, Double Deep Q-Networks (DDQN) and Neural Episodic Control to Deep Q-Network (NEC2DQN or N2D), are compared. NEC2DQN was proposed in 2018 and is a new member of the deep q-network (DQN) family of algorithms. The attacker has full observability of the environment and access to a causative attack that uses state manipulation in an attempt to poison the learning process. The attack is implemented under a white-box setting, in which the attacker has access to the defender's model and experiences. Two games are played: in the first game, DDQN is the defender and N2D is the attacker, and in the second game, the roles are reversed. The games are played twice, first without an active causative attack and then with one. For execution, three sets of game results are recorded, where a single set consists of 10 game runs. The before and after results are then compared to determine whether there was an improvement or a degradation. The results show that with minute parameter changes made to the algorithms, the attacker's performance grows, as it becomes able to win games. Introducing the causative attack as a form of adversarial learning showed that the algorithms are still able to defend the network according to their strengths.
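For reference, the update that distinguishes DDQN from vanilla DQN decouples action selection (online network) from action evaluation (target network), which reduces Q-value over-estimation. A generic sketch, without the SDN environment or the causative attack:

```python
import torch
import torch.nn as nn

def ddqn_target(q_net, target_net, reward, next_state, done, gamma=0.99):
    with torch.no_grad():
        best_a = q_net(next_state).argmax(dim=1, keepdim=True)        # select action online
        q_next = target_net(next_state).gather(1, best_a).squeeze(1)  # evaluate with target net
        return reward + gamma * (1.0 - done) * q_next

q_net, target_net = nn.Linear(4, 3), nn.Linear(4, 3)   # toy Q-functions: 4-dim state, 3 actions
y = ddqn_target(q_net, target_net,
                reward=torch.tensor([1.0]),
                next_state=torch.randn(1, 4),
                done=torch.tensor([0.0]))
print(y)
```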
GraphCC: A Practical Graph Learning-based Approach to Congestion Control in Datacenters
results: GraphCC and ACC are tested and compared under a wide variety of scenarios; GraphCC outperforms ACC in all evaluation scenarios, with improvements of up to 20% in Flow Completion Time and reductions of 38.0-85.7% in buffer occupancy.
Abstract
Congestion Control (CC) plays a fundamental role in optimizing traffic in Data Center Networks (DCN). Currently, DCNs mainly implement two main CC protocols: DCTCP and DCQCN. Both protocols -- and their main variants -- are based on Explicit Congestion Notification (ECN), where intermediate switches mark packets when they detect congestion. The ECN configuration is thus a crucial aspect on the performance of CC protocols. Nowadays, network experts set static ECN parameters carefully selected to optimize the average network performance. However, today's high-speed DCNs experience quick and abrupt changes that severely change the network state (e.g., dynamic traffic workloads, incast events, failures). This leads to under-utilization and sub-optimal performance. This paper presents GraphCC, a novel Machine Learning-based framework for in-network CC optimization. Our distributed solution relies on a novel combination of Multi-agent Reinforcement Learning (MARL) and Graph Neural Networks (GNN), and it is compatible with widely deployed ECN-based CC protocols. GraphCC deploys distributed agents on switches that communicate with their neighbors to cooperate and optimize the global ECN configuration. In our evaluation, we test the performance of GraphCC under a wide variety of scenarios, focusing on the capability of this solution to adapt to new scenarios unseen during training (e.g., new traffic workloads, failures, upgrades). We compare GraphCC with a state-of-the-art MARL-based solution for ECN tuning -- ACC -- and observe that our proposed solution outperforms the state-of-the-art baseline in all of the evaluation scenarios, showing improvements up to $20\%$ in Flow Completion Time as well as significant reductions in buffer occupancy ($38.0-85.7\%$).
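Structurally, the per-switch agents can be pictured as one GNN message-passing step over the switch adjacency followed by a policy head over ECN configurations. The sketch below is a schematic of that idea; the dimensions, aggregation rule, and action space are assumptions, not GraphCC's actual design.

```python
import torch
import torch.nn as nn

class SwitchAgent(nn.Module):
    def __init__(self, dim=32, n_actions=4):
        super().__init__()
        self.msg = nn.Linear(dim, dim)           # message function
        self.update = nn.GRUCell(dim, dim)       # state update from aggregated messages
        self.policy = nn.Linear(dim, n_actions)  # e.g. discrete ECN threshold levels

    def forward(self, h, adj):
        # h: (n_switches, dim) hidden states; adj: (n, n) 0/1 adjacency matrix
        agg = adj @ torch.relu(self.msg(h)) / adj.sum(1, keepdim=True).clamp(min=1)
        h = self.update(agg, h)
        return h, self.policy(h)

n = 6                                            # a small fabric of 6 switches
h = torch.zeros(n, 32)
adj = (torch.rand(n, n) > 0.5).float()
h, logits = SwitchAgent()(h, adj)
print(logits.shape)                              # per-switch ECN action logits
```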
Unleashing the Power of Extra-Tree Feature Selection and Random Forest Classifier for Improved Survival Prediction in Heart Failure Patients
results: The ET feature selection algorithm identifies the most important predictors, and grid search is used to tune the RF model, achieving 98.33% accuracy, the highest reported to date.
Abstract
Heart failure is a life-threatening condition that affects millions of people worldwide. The ability to accurately predict patient survival can aid in early intervention and improve patient outcomes. In this study, we explore the potential of utilizing data pre-processing techniques and the Extra-Tree (ET) feature selection method in conjunction with the Random Forest (RF) classifier to improve survival prediction in heart failure patients. By leveraging the strengths of ET feature selection, we aim to identify the most significant predictors associated with heart failure survival. Using the public UCL Heart failure (HF) survival dataset, we employ the ET feature selection algorithm to identify the most informative features. These features are then used as input for a grid search of the RF model. Finally, the tuned RF model is trained and evaluated using different metrics. The approach achieved 98.33% accuracy, which is the highest among existing works.
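The described pipeline maps naturally onto scikit-learn primitives. A hedged sketch on synthetic stand-in data, with an illustrative selection threshold and hyperparameter grid rather than the study's exact settings:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=299, n_features=12, random_state=42)  # HF-sized stand-in
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# Extra-Trees importances pick the most informative features.
selector = SelectFromModel(ExtraTreesClassifier(n_estimators=200, random_state=42),
                           threshold="median").fit(X_tr, y_tr)
X_tr_sel, X_te_sel = selector.transform(X_tr), selector.transform(X_te)

# Grid search tunes the Random Forest on the selected features only.
grid = GridSearchCV(RandomForestClassifier(random_state=42),
                    {"n_estimators": [100, 300], "max_depth": [None, 5, 10]},
                    cv=5, scoring="accuracy").fit(X_tr_sel, y_tr)
print("best params:", grid.best_params_)
print("test accuracy:", grid.score(X_te_sel, y_te))
```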
Learning Type-Generalized Actions for Symbolic Planning
results: In a simulated grid-based kitchen environment, type-generalized actions can be learned from few observations and generalize to novel situations, solving unseen task combinations, novel entities, and unexpected environment behavior.
Abstract
Symbolic planning is a powerful technique to solve complex tasks that require long sequences of actions and can equip an intelligent agent with complex behavior. The downside of this approach is the necessity for suitable symbolic representations describing the state of the environment as well as the actions that can change it. Traditionally such representations are carefully hand-designed by experts for distinct problem domains, which limits their transferability to different problems and environment complexities. In this paper, we propose a novel concept to generalize symbolic actions using a given entity hierarchy and observed similar behavior. In a simulated grid-based kitchen environment, we show that type-generalized actions can be learned from few observations and generalize to novel situations. Incorporating an additional on-the-fly generalization mechanism during planning, unseen task combinations, involving longer sequences, novel entities and unexpected environment behavior, can be solved.
Scalability of Message Encoding Techniques for Continuous Communication Learned with Multi-Agent Reinforcement Learning
paper_authors: Astrid Vanneste, Thomas Somers, Simon Vanneste, Kevin Mets, Tom De Schepper, Siegfried Mercelis, Peter Hellinckx
for: This study investigates the effect of increasing the amount of information contained in multi-agent communication messages, as well as the number of agents, on system performance.
methods: The study uses multi-agent reinforcement learning and compares two message encoding methods: the mean message encoder and the attention message encoder.
results: Surprisingly, the mean message encoder consistently outperforms the attention message encoder; further analysis shows that agents using the mean message encoder adopt a communication policy combining exponential and logarithmic functions to ensure important information is not lost after the mean is applied.
Abstract
Many multi-agent systems require inter-agent communication to properly achieve their goal. By learning the communication protocol alongside the action protocol using multi-agent reinforcement learning techniques, the agents gain the flexibility to determine which information should be shared. However, when the number of agents increases we need to create an encoding of the information contained in these messages. In this paper, we investigate the effect of increasing the amount of information that should be contained in a message and increasing the number of agents. We evaluate these effects on two different message encoding methods, the mean message encoder and the attention message encoder. We perform our experiments on a matrix environment. Surprisingly, our results show that the mean message encoder consistently outperforms the attention message encoder. Therefore, we analyse the communication protocol used by the agents that use the mean message encoder and can conclude that the agents use a combination of an exponential and a logarithmic function in their communication policy to avoid the loss of important information after applying the mean message encoder.
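The two encoders under comparison reduce to a few lines each: the mean encoder averages incoming messages, giving a fixed-size encoding for any number of agents, while the attention encoder computes a learned weighted average. A minimal sketch with assumed dimensions:

```python
import torch
import torch.nn as nn

def mean_encoder(msgs):                  # msgs: (n_agents, dim)
    return msgs.mean(dim=0)              # same output size regardless of agent count

class AttentionEncoder(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim))

    def forward(self, msgs):
        w = torch.softmax(msgs @ self.query / msgs.shape[-1] ** 0.5, dim=0)
        return (w.unsqueeze(-1) * msgs).sum(dim=0)   # learned weighted average

msgs = torch.randn(5, 16)                # messages received from 5 agents
print(mean_encoder(msgs).shape, AttentionEncoder(16)(msgs).shape)
```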
Unlocking the Diagnostic Potential of ECG through Knowledge Transfer from Cardiac MRI
paper_authors: Özgün Turgut, Philip Müller, Paul Hager, Suprosanna Shit, Sophie Starck, Martin J. Menten, Eimo Martens, Daniel Rueckert
for: This study aims to transfer domain-specific features captured in cardiac magnetic resonance (CMR) images to electrocardiogram (ECG) embeddings via self-supervised contrastive learning, improving the efficiency and accuracy of cardiovascular diagnosis.
methods: The study combines multimodal contrastive learning with masked data modeling, contrasting CMR image features against ECG data to learn the domain-specific information contained in the ECG.
results: The results show that subject-specific risks of various cardiovascular diseases and distinct cardiac phenotypes can be predicted from ECG data alone; qualitatively, the learned ECG embeddings are shown to incorporate information from CMR image regions of interest.
Abstract
The electrocardiogram (ECG) is a widely available diagnostic tool that allows for a cost-effective and fast assessment of the cardiovascular health. However, more detailed examination with expensive cardiac magnetic resonance (CMR) imaging is often preferred for the diagnosis of cardiovascular diseases. While providing detailed visualization of the cardiac anatomy, CMR imaging is not widely available due to long scan times and high costs. To address this issue, we propose the first self-supervised contrastive approach that transfers domain-specific information from CMR images to ECG embeddings. Our approach combines multimodal contrastive learning with masked data modeling to enable holistic cardiac screening solely from ECG data. In extensive experiments using data from 40,044 UK Biobank subjects, we demonstrate the utility and generalizability of our method. We predict the subject-specific risk of various cardiovascular diseases and determine distinct cardiac phenotypes solely from ECG data. In a qualitative analysis, we demonstrate that our learned ECG embeddings incorporate information from CMR image regions of interest. We make our entire pipeline publicly available, including the source code and pre-trained model weights.
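The multimodal contrastive objective at the heart of such approaches is typically a symmetric InfoNCE loss over paired embeddings. The sketch below shows that generic form only; the paper additionally combines it with masked data modeling, which is not reproduced here.

```python
import torch
import torch.nn.functional as F

def multimodal_contrastive_loss(ecg_emb, cmr_emb, temperature=0.07):
    # Matched ECG/CMR pairs are pulled together; all other pairings in the
    # batch act as negatives, in both directions.
    e = F.normalize(ecg_emb, dim=-1)
    c = F.normalize(cmr_emb, dim=-1)
    logits = e @ c.T / temperature                 # (B, B) similarity matrix
    targets = torch.arange(len(e))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

loss = multimodal_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
print(loss)
```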
On the Unexpected Abilities of Large Language Models
methods: The article examines the indirect acquisition process through which large language models develop abilities beyond their training objective of predicting the next words of human-written texts, and relates it to other known indirect processes.
results: The article finds that an important side effect of such indirect acquisition is the development of integrated abilities, including language understanding and generation; it also discusses the extent to which these abilities are predictable and their relation to human cognition.
Abstract
Large language models are capable of displaying a wide range of abilities that are not directly connected with the task for which they are trained: predicting the next words of human-written texts. In this article, I discuss the nature of this indirect acquisition process and its relation to other known indirect processes. I argue that an important side effect of such indirect acquisition is the development of integrated abilities. I discuss the extent to which the abilities developed by large language models are predictable. Finally, I briefly discuss the relation between the cognitive skills acquired by these systems and human cognition.
Neuro-Symbolic RDF and Description Logic Reasoners: The State-Of-The-Art and Challenges
for: This paper provides an overview of the existing literature in the field of neuro-symbolic deductive reasoning supported by RDF(S), the description logics EL and ALC, and OWL 2 RL.
methods: The paper discusses various techniques employed in neuro-symbolic deductive reasoning, including neural networks and symbolic systems.
results: The paper provides a comprehensive overview of the existing literature in the field, discussing the tasks addressed and other relevant efforts in this area.
Abstract
Ontologies are used in various domains, with RDF and OWL being prominent standards for ontology development. RDF is favored for its simplicity and flexibility, while OWL enables detailed domain knowledge representation. However, as ontologies grow larger and more expressive, reasoning complexity increases, and traditional reasoners struggle to perform efficiently. Despite optimization efforts, scalability remains an issue. Additionally, advancements in automated knowledge base construction have created large and expressive ontologies that are often noisy and inconsistent, posing further challenges for conventional reasoners. To address these challenges, researchers have explored neuro-symbolic approaches that combine neural networks' learning capabilities with symbolic systems' reasoning abilities. In this chapter,we provide an overview of the existing literature in the field of neuro-symbolic deductive reasoning supported by RDF(S), the description logics EL and ALC, and OWL 2 RL, discussing the techniques employed, the tasks they address, and other relevant efforts in this area.
A Fast and Optimal Learning-based Path Planning Method for Planetary Rovers
results: After training, the model can quickly search for optimal paths on novel maps; under the same hardware conditions, the guidance field it generates significantly reduces the search time for optimal paths.
Abstract
Intelligent autonomous path planning is crucial to improve the exploration efficiency of planetary rovers. In this paper, we propose a learning-based method to quickly search for optimal paths in an elevation map, which is called NNPP. The NNPP model learns semantic information about start and goal locations, as well as map representations, from numerous pre-annotated optimal path demonstrations, and produces a probabilistic distribution over each pixel representing the likelihood of it belonging to an optimal path on the map. More specifically, the paper computes the traversal cost for each grid cell from the slope, roughness and elevation difference obtained from the DEM. Subsequently, the start and goal locations are encoded using a Gaussian distribution and different location encoding parameters are analyzed for their effect on model performance. After training, the NNPP model is able to perform path planning on novel maps. Experiments show that the guidance field generated by the NNPP model can significantly reduce the search time for optimal paths under the same hardware conditions, and the advantage of NNPP increases with the scale of the map.
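The traversal-cost construction can be illustrated directly on a DEM array. The weights and the roughness proxy below are assumptions made for the sketch, since the abstract only states that slope, roughness, and elevation difference are derived from the DEM and combined.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def traversal_cost(dem, w_slope=1.0, w_rough=0.5, w_diff=0.2, cell=1.0):
    gy, gx = np.gradient(dem, cell)
    slope = np.hypot(gx, gy)                            # gradient magnitude per cell
    rough = np.abs(dem - uniform_filter(dem, size=3))   # deviation from the local mean
    diff = dem - dem.min()                              # elevation above the lowest point
    return w_slope * slope + w_rough * rough + w_diff * diff

dem = np.random.default_rng(0).normal(size=(64, 64)).cumsum(axis=0)  # toy terrain
print(traversal_cost(dem).shape)
```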
Multi-View Fusion and Distillation for Subgrade Distresses Detection based on 3D-GPR
paper_authors: Chunpeng Zhou, Kangjie Ning, Haishuai Wang, Zhi Yu, Sheng Zhou, Jiajun Bu
for: This paper focuses on the subgrade distress detection task using 3D ground-penetrating radar (3D-GPR) data, with the goal of enhancing efficiency and accuracy through the use of automatic detection techniques and deep learning.
methods: The proposed method leverages multi-view information from 3D-GPR data and constructs a real multi-view image dataset for the detection task. The method also develops a novel framework called GPR-MVFD, which incorporates multi-view distillation and attention-based fusion to extract significant features for subgrade distresses.
results: The proposed framework outperforms existing GPR baselines and state-of-the-art methods in multi-view learning, multi-modal learning, and knowledge distillation, as demonstrated through extensive experiments on a new GPR benchmark. The constructed multi-view GPR dataset with expert-annotated labels and the source codes of the proposed framework will be released.
Abstract
The application of 3D ground-penetrating radar (3D-GPR) for subgrade distress detection has gained widespread popularity. To enhance the efficiency and accuracy of detection, pioneering studies have attempted to adopt automatic detection techniques, particularly deep learning. However, existing works typically rely on traditional 1D A-scan, 2D B-scan or 3D C-scan data of the GPR, resulting in either insufficient spatial information or high computational complexity. To address these challenges, we introduce a novel methodology for the subgrade distress detection task by leveraging the multi-view information from 3D-GPR data. Moreover, we construct a real multi-view image dataset derived from the original 3D-GPR data for the detection task, which provides richer spatial information compared to A-scan and B-scan data, while reducing computational complexity compared to C-scan data. Subsequently, we develop a novel Multi-View Fusion and Distillation framework, GPR-MVFD, specifically designed to optimally utilize the multi-view GPR dataset. This framework ingeniously incorporates multi-view distillation and attention-based fusion to facilitate significant feature extraction for subgrade distresses. In addition, a self-adaptive learning mechanism is adopted to stabilize the model training and prevent performance degeneration in each branch. Extensive experiments conducted on this new GPR benchmark demonstrate the effectiveness and efficiency of our proposed framework. Our framework outperforms not only the existing GPR baselines, but also the state-of-the-art methods in the fields of multi-view learning, multi-modal learning, and knowledge distillation. We will release the constructed multi-view GPR dataset with expert-annotated labels and the source codes of the proposed framework.
Multi-modal Multi-view Clustering based on Non-negative Matrix Factorization
results: Experimental results show that the proposed method performs strongly across a variety of datasets, achieving higher accuracy and better interpretability than current state-of-the-art methods.
Abstract
By combining related objects, unsupervised machine learning techniques aim to reveal the underlying patterns in a data set. Non-negative Matrix Factorization (NMF) is a data mining technique that, by imposing non-negativity constraints on the elements, splits a data matrix into two matrices: one representing the data partitions and the other representing the cluster prototypes of the data set. This method has attracted a lot of attention and is used in a wide range of applications, including text mining, clustering, language modeling, music transcription, and neuroscience (gene separation). The interpretation of the generated matrices is made simpler by the absence of negative values. In this article, we propose a study on multi-modal clustering algorithms and present a novel method called multi-modal multi-view non-negative matrix factorization, in which we analyze the collaboration of several local NMF models. The experimental results show the value of the proposed approach, which was evaluated using a variety of data sets, and the obtained results are very promising compared to state-of-the-art methods.
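In the single-view case the factorization and its clustering reading take only a few lines; the paper's multi-modal multi-view method coordinates several such local NMF models, which this hedged sketch does not attempt to reproduce.

```python
import numpy as np
from sklearn.decomposition import NMF

X = np.abs(np.random.default_rng(0).normal(size=(100, 40)))  # non-negative data matrix
nmf = NMF(n_components=4, init="nndsvda", random_state=0, max_iter=500)
W = nmf.fit_transform(X)     # (samples x clusters): soft partition indicators
H = nmf.components_          # (clusters x features): cluster prototypes
labels = W.argmax(axis=1)    # harden the soft partition into cluster assignments
print(labels[:10], np.linalg.norm(X - W @ H))
```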
E3-UAV: An Edge-based Energy-Efficient Object Detection System for Unmanned Aerial Vehicles
results: Experimental results show that the system can effectively optimize energy consumption for detection tasks in real-world applications.
Abstract
Motivated by the advances in deep learning techniques, the application of Unmanned Aerial Vehicle (UAV)-based object detection has proliferated across a range of fields, including vehicle counting, fire detection, and city monitoring. While most existing research studies only a subset of the challenges inherent to UAV-based object detection, there are few studies that balance various aspects to design a practical system for energy consumption reduction. In response, we present the E3-UAV, an edge-based energy-efficient object detection system for UAVs. The system is designed to dynamically support various UAV devices, edge devices, and detection algorithms, with the aim of minimizing energy consumption by deciding the most energy-efficient flight parameters (including flight altitude, flight speed, detection algorithm, and sampling rate) required to fulfill the detection requirements of the task. We first present an effective evaluation metric for actual tasks and construct a transparent energy consumption model based on hundreds of actual flight data to formalize the relationship between energy consumption and flight parameters. Then we present a lightweight energy-efficient priority decision algorithm based on a large quantity of actual flight data to assist the system in deciding flight parameters. Finally, we evaluate the performance of the system, and our experimental results demonstrate that it can significantly decrease energy consumption in real-world scenarios. Additionally, we provide four insights that can assist researchers and engineers in their efforts to study UAV-based object detection further.
Induction Network: Audio-Visual Modality Gap-Bridging for Self-Supervised Sound Source Localization
results: Experiments show that, compared with other state-of-the-art works, the proposed method achieves higher performance and robustness across different challenging scenarios; in particular, it attains the best sound source localization performance on the SoundNet-Flickr and VGG-Sound Source datasets.
Abstract
Self-supervised sound source localization is usually challenged by modality inconsistency. In recent studies, contrastive learning-based strategies have shown promise in establishing a consistent correspondence between audio and sound sources in visual scenarios. However, insufficient attention to the influence of heterogeneity across the different modality features still limits further improvement of this scheme, which motivates our work. In this study, an Induction Network is proposed to bridge the modality gap more effectively. By decoupling the gradients of visual and audio modalities, the discriminative visual representations of sound sources can be learned with the designed Induction Vector in a bootstrap manner, which also enables the audio modality to be aligned with the visual modality consistently. In addition to a visual weighted contrastive loss, an adaptive threshold selection strategy is introduced to enhance the robustness of the Induction Network. Substantial experiments conducted on the SoundNet-Flickr and VGG-Sound Source datasets have demonstrated superior performance compared to other state-of-the-art works in different challenging scenarios. The code is available at https://github.com/Tahy1/AVIN
Feature Matching Data Synthesis for Non-IID Federated Learning
for: This paper proposes a novel federated learning (FL) framework with data augmentation to relieve data heterogeneity, which can effectively address the non-independent and identically distributed (non-IID) data challenge in FL.
methods: The proposed framework uses a hard feature matching data synthesis (HFMDS) method to share auxiliary data besides local models, which generates synthetic data by learning the essential class-relevant features of real samples and discarding the redundant features. To further enhance privacy preservation, a hard feature augmentation method is proposed to transfer real features towards the decision boundary, making the synthetic data not only improve the model generalization but also erase the information of real features.
results: The theoretical analysis and simulation results demonstrate that the proposed HFMDS-FL algorithm outperforms the baselines in terms of accuracy, privacy preservation, and computational cost on various benchmark datasets.
Abstract
Federated learning (FL) has emerged as a privacy-preserving paradigm that trains neural networks on edge devices without collecting data at a central server. However, FL encounters an inherent challenge in dealing with non-independent and identically distributed (non-IID) data among devices. To address this challenge, this paper proposes a hard feature matching data synthesis (HFMDS) method to share auxiliary data besides local models. Specifically, synthetic data are generated by learning the essential class-relevant features of real samples and discarding the redundant features, which helps to effectively tackle the non-IID issue. For better privacy preservation, we propose a hard feature augmentation method to transfer real features towards the decision boundary, with which the synthetic data not only improve the model generalization but also erase the information of real features. By integrating the proposed HFMDS method with FL, we present a novel FL framework with data augmentation to relieve data heterogeneity. The theoretical analysis highlights the effectiveness of our proposed data synthesis method in solving the non-IID challenge. Simulation results further demonstrate that our proposed HFMDS-FL algorithm outperforms the baselines in terms of accuracy, privacy preservation, and computational cost on various benchmark datasets.
Automated Driving Without Ethics: Meaning, Design and Real-World Implementation
paper_authors: Katherine Evans, Nelson de Moura, Raja Chatila, Stéphane Chauvier
for: The paper aims to propose an AV decision-making strategy based on pre-defined parameters that can accommodate different moral positions and public expectations across decision scenarios.
methods: The strategy uses the Ethical Valence Theory, which frames AV decision-making as a type of claim mitigation, and proposes multiple possible decision rules to determine the most suitable action in a given decision context.
results: The strategy can serve as a tool to evaluate whether an automated vehicle's decision making is socially acceptable while accommodating a range of moral positions and public expectations.
Abstract
The ethics of automated vehicles (AV) has received a great amount of attention in recent years, specifically regarding their decisional policies in accident situations in which human harm is a likely consequence. After discussing the pertinence and cogency of the term 'artificial moral agent' for AVs that would make these sorts of decisions, and starting from the assumption that human harm is unavoidable in some situations, a strategy for AV decision making is proposed that uses only pre-defined parameters to characterize the risk of possible accidents. It also integrates the Ethical Valence Theory, which frames AV decision-making as a type of claim mitigation, into multiple possible decision rules to determine the most suitable action given the specific environment and decision context. The goal of this approach is not to define how moral theory requires vehicles to behave, but rather to provide a computational approach that is flexible enough to accommodate a number of human 'moral positions' concerning what morality demands and what road users may expect, offering an evaluation tool for the social acceptability of an automated vehicle's decision making.
Bird’s-Eye-View Scene Graph for Vision-Language Navigation
methods: The paper proposes a multi-step Bird's-Eye-View representation (BEV Scene Graph, BSG) that encodes the scene layouts and geometric cues of indoor environments under the supervision of 3D detection. During navigation, BSG builds a local BEV representation at each step and maintains a BEV-based global scene map that stores and organizes all the locally collected BEV representations according to their topological relations.
results: Compared with existing methods, the approach shows clear improvements on the REVERIE, R2R, and R4R benchmarks, indicating the potential of BEV perception in VLN.
Abstract
Vision-language navigation (VLN), which entails an agent to navigate 3D environments following human instructions, has shown great advances. However, current agents are built upon panoramic observations, which hinders their ability to perceive 3D scene geometry and easily leads to ambiguous selection of panoramic view. To address these limitations, we present a BEV Scene Graph (BSG), which leverages multi-step BEV representations to encode scene layouts and geometric cues of indoor environment under the supervision of 3D detection. During navigation, BSG builds a local BEV representation at each step and maintains a BEV-based global scene map, which stores and organizes all the online collected local BEV representations according to their topological relations. Based on BSG, the agent predicts a local BEV grid-level decision score and a global graph-level decision score, combined with a sub-view selection score on panoramic views, for more accurate action prediction. Our approach significantly outperforms state-of-the-art methods on REVERIE, R2R, and R4R, showing the potential of BEV perception in VLN.
Enhancing Efficient Continual Learning with Dynamic Structure Development of Spiking Neural Networks
for: The paper proposes a new method for efficient and adaptive continual learning in Spiking Neural Networks (SNNs), inspired by the dynamic structure development of the human brain during child growth and development.
methods: The proposed method, called Dynamic Structure Development of Spiking Neural Networks (DSD-SNN), dynamically assigns and grows new neurons for new tasks, prunes redundant neurons, and leverages overlapping shared structure to quickly adapt to new tasks while reducing computational overhead.
results: The proposed model achieves significant improvements in performance, learning speed, and memory capacity compared to existing SNN-based continual learning methods, and achieves comparable performance with DNN-based methods.
Abstract
Children possess the ability to learn multiple cognitive tasks sequentially, which is a major challenge toward the long-term goal of artificial general intelligence. Existing continual learning frameworks are usually applicable to Deep Neural Networks (DNNs) and lack the exploration on more brain-inspired, energy-efficient Spiking Neural Networks (SNNs). Drawing on continual learning mechanisms during child growth and development, we propose Dynamic Structure Development of Spiking Neural Networks (DSD-SNN) for efficient and adaptive continual learning. When learning a sequence of tasks, the DSD-SNN dynamically assigns and grows new neurons to new tasks and prunes redundant neurons, thereby increasing memory capacity and reducing computational overhead. In addition, the overlapping shared structure helps to quickly leverage all acquired knowledge to new tasks, empowering a single network capable of supporting multiple incremental tasks (without the separate sub-network mask for each task). We validate the effectiveness of the proposed model on multiple class incremental learning and task incremental learning benchmarks. Extensive experiments demonstrated that our model could significantly improve performance, learning speed and memory capacity, and reduce computational overhead. Besides, our DSD-SNN model achieves comparable performance with the DNNs-based methods, and significantly outperforms the state-of-the-art (SOTA) performance for existing SNNs-based continual learning methods.
Case Study: Using AI-Assisted Code Generation In Mobile Teams
results: The results indicate that AI-assisted programming tools can improve development efficiency and correctness, while also helping developers adapt more quickly to new technical environments.
Abstract
The aim of this study is to evaluate the performance of AI-assisted programming in actual mobile development teams that are focused on native mobile languages like Kotlin and Swift. The extensive case study involves 16 participants and 2 technical reviewers from a software development department, and is designed to understand the impact of using LLMs trained for code generation in specific phases of the team, more specifically technical onboarding and technical stack switch. The study uses technical problems dedicated to each phase and requests solutions from the participants with and without using AI code generators. It measures time, correctness, and technical integration using ReviewerScore, a metric introduced in this paper and derived from actual industry practice, namely the code review of merge requests. The output is converted and analyzed together with feedback from the participants in an attempt to determine whether using AI-assisted programming tools has an impact on getting developers onboard in a project or helping them with a smooth transition between the two native development environments of mobile development, Android and iOS. The study was performed between May and June 2023 with members of the mobile department of a software development company based in Cluj-Napoca, with Romanian ownership and management.
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
results: Compared with prior methods, JEN-1 shows clear advantages in both text-music alignment and music quality while maintaining computational efficiency; online demos illustrating the model are provided.
Abstract
Music generation has attracted growing interest with the advancement of deep generative models. However, generating music conditioned on textual descriptions, known as text-to-music, remains challenging due to the complexity of musical structures and high sampling rate requirements. Despite the task's significance, prevailing generative models exhibit limitations in music quality, computational efficiency, and generalization. This paper introduces JEN-1, a universal high-fidelity model for text-to-music generation. JEN-1 is a diffusion model incorporating both autoregressive and non-autoregressive training. Through in-context learning, JEN-1 performs various generation tasks including text-guided music generation, music inpainting, and continuation. Evaluations demonstrate JEN-1's superior performance over state-of-the-art methods in text-music alignment and music quality while maintaining computational efficiency. Our demos are available at http://futureverse.com/research/jen/demos/jen1
Data-Free Model Extraction Attacks in the Context of Object Detection
results: The study finds that defining a suitable loss function and generator setup is a key factor in the extraction attack, and that significant results can be obtained with a reasonable number of queries; the discovery of this object detection vulnerability will support future efforts to secure such models.
Abstract
A significant number of machine learning models are vulnerable to model extraction attacks, which focus on stealing the models by using specially curated queries against the target model. This task is well accomplished by using part of the training data or a surrogate dataset to train a new model that mimics a target model in a white-box environment. In pragmatic situations, however, the target models are trained on private datasets that are inaccessible to the adversary. The data-free model extraction technique addresses this problem by using queries artificially curated by a generator similar to that used in Generative Adversarial Nets. We propose, for the first time to the best of our knowledge, an adversary black-box attack extending to a regression problem for predicting bounding box coordinates in object detection. As part of our study, we found that defining a loss function and using a novel generator setup is one of the key aspects in extracting the target model. We find that the proposed model extraction method achieves significant results by using reasonable queries. The discovery of this object detection vulnerability will support future prospects for securing such models.
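A schematic of the data-free extraction loop on a toy bounding-box regression: a generator proposes synthetic queries, the student is fit to the victim's responses, and the generator is pushed adversarially toward inputs where the two disagree. The networks, sizes, and the L1 objective are illustrative assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(),
                  nn.Linear(256, 3 * 32 * 32), nn.Tanh())            # query generator
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 4))     # predicts (x, y, w, h)
victim = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 4))      # stand-in black box

opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
for step in range(200):
    x = G(torch.randn(16, 100)).view(16, 3, 32, 32)
    with torch.no_grad():
        y_victim = victim(x)                                # query the black-box target
    loss_s = F.l1_loss(student(x.detach()), y_victim)       # student imitates the victim
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()

    x2 = G(torch.randn(16, 100)).view(16, 3, 32, 32)
    with torch.no_grad():
        y2 = victim(x2)
    loss_g = -F.l1_loss(student(x2), y2)                    # generator seeks disagreement
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```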
JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player Zero-Sum Games
methods: The paper analyzes over 10,000 records of human Xiangqi play and combines Monte-Carlo Tree Search (MCTS) with Policy Space Response Oracles (PSRO) to approximate a Nash equilibrium.
results: Deployed as a WeChat mini program, the algorithm achieves a Master level against human players with a 99.41% win rate, confirming its effectiveness in overcoming non-transitivity.
Abstract
This paper presents an empirical exploration of non-transitivity in perfect-information games, specifically focusing on Xiangqi, a traditional Chinese board game comparable in game-tree complexity to chess and shogi. By analyzing over 10,000 records of human Xiangqi play, we highlight the existence of both transitive and non-transitive elements within the game's strategic structure. To address non-transitivity, we introduce the JiangJun algorithm, an innovative combination of Monte-Carlo Tree Search (MCTS) and Policy Space Response Oracles (PSRO) designed to approximate a Nash equilibrium. We evaluate the algorithm empirically using a WeChat mini program and achieve a Master level with a 99.41\% win rate against human players. The algorithm's effectiveness in overcoming non-transitivity is confirmed by a plethora of metrics, such as relative population performance and visualization results. Our project site is available at \url{https://sites.google.com/view/jiangjun-site/}.
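As a rough illustration of the PSRO component, the sketch below grows a policy population, estimates an empirical payoff matrix, computes an approximate Nash meta-strategy for the zero-sum matrix game via fictitious play, and then trains a best response against that mixture. The `evaluate` and `train_best_response` callables (e.g., head-to-head win rates and MCTS-based self-play) are assumptions; JiangJun's MCTS side is not shown.

```python
import numpy as np

def fictitious_play(A, iters=2000):
    """Approximate Nash mixtures for the zero-sum matrix game with payoffs A."""
    m, n = A.shape
    row_counts, col_counts = np.ones(m) / m, np.ones(n) / n
    for _ in range(iters):
        row_counts[np.argmax(A @ (col_counts / col_counts.sum()))] += 1.0
        col_counts[np.argmin((row_counts / row_counts.sum()) @ A)] += 1.0
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

def psro(init_policy, train_best_response, evaluate, generations=5):
    population, sigma = [init_policy], np.array([1.0])
    for _ in range(generations):
        # Empirical payoff matrix: every policy against every other.
        A = np.array([[evaluate(p, q) for q in population] for p in population])
        sigma, _ = fictitious_play(A)   # meta-strategy over current population
        # New policy trained against the Nash mixture (e.g., via MCTS self-play).
        population.append(train_best_response(population, sigma))
    return population, sigma            # sigma covers all but the newest policy
```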
Generative Perturbation Analysis for Probabilistic Black-Box Anomaly Attribution
results: The paper introduces a variational Bayes algorithm to derive the distributions of per-variable attribution scores and to quantify their uncertainty. To the authors' knowledge, this is the first probabilistic anomaly attribution framework that is free from being deviation-agnostic.Abstract
We address the task of probabilistic anomaly attribution in the black-box regression setting, where the goal is to compute the probability distribution of the attribution score of each input variable, given an observed anomaly. The training dataset is assumed to be unavailable. This task differs from the standard XAI (explainable AI) scenario, since we wish to explain the anomalous deviation from a black-box prediction rather than the black-box model itself. We begin by showing that mainstream model-agnostic explanation methods, such as the Shapley values, are not suitable for this task because of their ``deviation-agnostic property.'' We then propose a novel framework for probabilistic anomaly attribution that allows us to not only compute attribution scores as the predictive mean but also quantify the uncertainty of those scores. This is done by considering a generative process for perturbations that counter-factually bring the observed anomalous observation back to normalcy. We introduce a variational Bayes algorithm for deriving the distributions of per variable attribution scores. To the best of our knowledge, this is the first probabilistic anomaly attribution framework that is free from being deviation-agnostic.
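A schematic form of the counterfactual perturbation idea, in our own notation (the paper's probabilistic treatment replaces this point estimate with a posterior over the perturbation, inferred by variational Bayes): given a black-box regressor $f$, an anomalous input $\mathbf{x}$, and an observed outcome $y$, one seeks the smallest perturbation that restores normalcy,
$$\boldsymbol{\delta}^{\star} = \arg\min_{\boldsymbol{\delta}} \; \big(y - f(\mathbf{x} + \boldsymbol{\delta})\big)^2 + \lambda \lVert \boldsymbol{\delta} \rVert_2^2,$$
with the attribution for variable $i$ read off from the component $\delta_i^{\star}$; the probabilistic framework yields a full distribution over each component rather than a single score.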
Explainable AI in Orthopedics: Challenges, Opportunities, and Prospects
results: The study identifies key challenges and opportunities, including data heterogeneity, model complexity, and interpretability requirements, that must be addressed in AI practice before XAI can be widely adopted in orthopedics.Abstract
While artificial intelligence (AI) has made many successful applications in various domains, its adoption in healthcare lags a little bit behind other high-stakes settings. Several factors contribute to this slower uptake, including regulatory frameworks, patient privacy concerns, and data heterogeneity. However, one significant challenge that impedes the implementation of AI in healthcare, particularly in orthopedics, is the lack of explainability and interpretability around AI models. Addressing the challenge of explainable AI (XAI) in orthopedics requires developing AI models and algorithms that prioritize transparency and interpretability, allowing clinicians, surgeons, and patients to understand the contributing factors behind any AI-powered predictive or descriptive models. The current contribution outlines several key challenges and opportunities that manifest in XAI in orthopedic practice. This work emphasizes the need for interdisciplinary collaborations between AI practitioners, orthopedic specialists, and regulatory entities to establish standards and guidelines for the adoption of XAI in orthopedics.
Finite Element Operator Network for Solving Parametric PDEs
results: On several benchmark problems, FEONet outperforms existing state-of-the-art methods in accuracy, generalization, and computational flexibility, and is applicable across fields where PDEs play a crucial role.Abstract
Partial differential equations (PDEs) underlie our understanding and prediction of natural phenomena across numerous fields, including physics, engineering, and finance. However, solving parametric PDEs is a complex task that necessitates efficient numerical methods. In this paper, we propose a novel approach for solving parametric PDEs using a Finite Element Operator Network (FEONet). Our proposed method leverages the power of deep learning in conjunction with traditional numerical methods, specifically the finite element method, to solve parametric PDEs in the absence of any paired input-output training data. We demonstrate the effectiveness of our approach on several benchmark problems and show that it outperforms existing state-of-the-art methods in terms of accuracy, generalization, and computational flexibility. Our FEONet framework shows potential for application in various fields where PDEs play a crucial role in modeling complex domains with diverse boundary conditions and singular behavior. Furthermore, we provide theoretical convergence analysis to support our approach, utilizing finite element approximation in numerical analysis.
Web Crawler Strategies for Web Pages under Robot.txt Restriction
results: The paper answers basic questions such as how web pages earn high rankings in search engines, how search engines obtain all the web pages in their databases, and how webmasters restrict web crawlers through the robot.txt file.Abstract
Today, everyone knows the World Wide Web and works over the Internet daily. In this paper, we introduce how search engines work on the keywords that users enter to find something. A search engine uses different search algorithms to provide convenient results to the net surfer. Net surfers go with the top search results, but how did those web pages earn higher ranks in search engines? How did the search engine get all the web pages into its database? This paper gives the answers to these kinds of basic questions. Web crawlers working for search engines and robot exclusion protocol rules for web crawlers are also addressed in this research paper. Webmasters use different restriction directives in the robot.txt file to instruct web crawlers; some basic formats of robot.txt are also covered in this paper.
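As a concrete illustration of the robot exclusion protocol the paper discusses, a robots.txt file lists per-user-agent Allow/Disallow rules, and a compliant crawler checks them before fetching a page. The sketch below uses Python's standard-library parser; the URL and user-agent string are illustrative.

```python
from urllib.robotparser import RobotFileParser

# A typical robots.txt contains entries such as:
#   User-agent: *
#   Disallow: /private/

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetch and parse the site's robots.txt

url = "https://example.com/private/page.html"
if rp.can_fetch("MyCrawler", url):
    print("Allowed to crawl:", url)
else:
    print("Disallowed by robots.txt:", url)
```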
Rapid Training Data Creation by Synthesizing Medical Images for Classification and Localization
paper_authors: Abhishek Kushwaha, Sarthak Gupta, Anish Bhanushali, Tathagato Rai Dastidar
for: This paper aims to address the annotation bottleneck in medical image analysis: producing the large volume of annotated data required to train deep neural networks on medical images.
methods: The paper presents a method for transforming real data into synthesized, annotated training data for deep neural networks, removing the need for exhaustive manual annotation of medical images.
results: Training with the generated data significantly improves localization accuracy for a weakly supervised model, and for a strongly supervised model the accuracy closely parallels that obtained with exhaustively annotated real images.Abstract
While the use of artificial intelligence (AI) for medical image analysis is gaining wide acceptance, the expertise, time and cost required to generate annotated data in the medical field are significantly high, due to limited availability of both data and expert annotation. Strongly supervised object localization models require data that is exhaustively annotated, meaning all objects of interest in an image are identified. This is difficult to achieve and verify for medical images. We present a method for the transformation of real data to train any Deep Neural Network to solve the above problems. We show the efficacy of this approach on both a weakly supervised localization model and a strongly supervised localization model. For the weakly supervised model, we show that the localization accuracy increases significantly using the generated data. For the strongly supervised model, this approach overcomes the need for exhaustive annotation on real images. In the latter model, we show that the accuracy, when trained with generated images, closely parallels the accuracy when trained with exhaustively annotated real images. The results are demonstrated on images of human urine samples obtained using microscopy.
Sci-CoT: Leveraging Large Language Models for Enhanced Knowledge Distillation in Small Models for Scientific QA
results: The paper's 80-million parameter model surpasses the performance of BLOOM-176B on the ARC-Easy dataset under the few-shot setting.Abstract
Large Language Models (LLMs) have shown outstanding performance across wide range of downstream tasks. This competency is attributed to their substantial parameter size and pre-training on extensive corpus. Moreover, LLMs have exhibited enhanced reasoning capabilities in tackling complex reasoning tasks, owing to the utilization of a method named ``Chain-of-Thought (CoT) prompting''. This method is designed to generate intermediate reasoning steps that guide the inference of the final answer. However, it is essential to highlight that these advanced reasoning abilities appear to emerge in models with a minimum of 10 billion parameters, thereby limiting its efficacy in situations where computational resources are constrained. In this paper, we investigate the possibility of transferring the reasoning capabilities of LLMs to smaller models via knowledge distillation. Specifically, we propose Sci-CoT, a two-stage framework that separates the processes of generating rationales and inferring answers. This method enables a more efficient use of rationales during the answer inference stage, leading to improved performance on scientific question-answering tasks. Utilizing Sci-CoT, our 80-million parameter model is able to exceed the performance of BLOOM-176B in the ARC-Easy dataset under the few shot setting.
Addressing Racial Bias in Facial Emotion Recognition
paper_authors: Alex Fan, Xingshuo Xiao, Peter Washington
for: The study analyzes fairness issues in deep learning models trained with high-dimensional inputs and subjective labels.
methods: The study analyzes racial bias by sub-sampling training sets with varied racial distributions and assessing model performance across racial groups.
results: Smaller datasets with posed faces improve both fairness and performance metrics as the simulations approach racial balance, but in larger datasets with greater facial variation the fairness metrics generally remain constant, indicating that racial balance by itself is insufficient to achieve parity in test performance across racial groups.Abstract
Fairness in deep learning models trained with high-dimensional inputs and subjective labels remains a complex and understudied area. Facial emotion recognition, a domain where datasets are often racially imbalanced, can lead to models that yield disparate outcomes across racial groups. This study focuses on analyzing racial bias by sub-sampling training sets with varied racial distributions and assessing test performance across these simulations. Our findings indicate that smaller datasets with posed faces improve on both fairness and performance metrics as the simulations approach racial balance. Notably, the F1-score increases by $27.2\%$ points, and demographic parity increases by $15.7\%$ points on average across the simulations. However, in larger datasets with greater facial variation, fairness metrics generally remain constant, suggesting that racial balance by itself is insufficient to achieve parity in test performance across different racial groups.
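For reference, the two reported quantities can be computed as below; this is a minimal binary-classification sketch (the paper's task is multiclass emotion recognition, and its exact demographic-parity formulation may differ).

```python
import numpy as np
from sklearn.metrics import f1_score

def demographic_parity_gap(y_pred, group):
    """Largest difference in positive-prediction rates across groups."""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

# Illustrative data: binary predictions and a group label per sample.
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1])
group = np.array(["a", "a", "a", "b", "b", "b"])

print(f1_score(y_true, y_pred))               # performance metric
print(demographic_parity_gap(y_pred, group))  # fairness metric
```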
SSL-Auth: An Authentication Framework by Fragile Watermarking for Pre-trained Encoders in Self-supervised Learning
results: Experiments show that SSL-Auth verifies the integrity of pre-trained encoders and detects potential backdoor and adversarial attacks, without degrading encoder performance.Abstract
Self-supervised learning (SSL), utilizing unlabeled datasets for training powerful encoders, has achieved significant success recently. These encoders serve as feature extractors for downstream tasks, requiring substantial resources. However, the challenge of protecting the intellectual property of encoder trainers and ensuring the trustworthiness of deployed encoders remains a significant gap in SSL. Moreover, recent researches highlight threats to pre-trained encoders, such as backdoor and adversarial attacks. To address these gaps, we propose SSL-Auth, the first authentication framework designed specifically for pre-trained encoders. In particular, SSL-Auth utilizes selected key samples as watermark information and trains a verification network to reconstruct the watermark information, thereby verifying the integrity of the encoder without compromising model performance. By comparing the reconstruction results of the key samples, malicious alterations can be detected, as modified encoders won't mimic the original reconstruction. Comprehensive evaluations on various encoders and diverse downstream tasks demonstrate the effectiveness and fragility of our proposed SSL-Auth.
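The verification step can be pictured as follows; this is a minimal sketch under our own assumptions (a `verifier` network already trained to reconstruct the watermark from the encoder's embeddings of the key samples, and an illustrative error threshold), not SSL-Auth's exact procedure.

```python
import torch

def authenticate(encoder, verifier, key_samples, watermark, tau=0.05):
    """Pass iff the verifier can still reconstruct the watermark."""
    with torch.no_grad():
        recon = verifier(encoder(key_samples))
    err = torch.mean((recon - watermark) ** 2).item()
    return err <= tau  # a large error suggests the encoder was modified
```

Because the watermark is fragile, even small malicious alterations to the encoder break the reconstruction and trip the check.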
Resource Constrained Model Compression via Minimax Optimization for Spiking Neural Networks
paper_authors: Jue Chen, Huan Yuan, Jianchao Tan, Bin Chen, Chengru Song, Di Zhang
for: This paper focuses on compressing Brain-inspired Spiking Neural Networks (SNNs) to improve their deployment on edge devices such as neuromorphic chips.
methods: The proposed method uses an improved end-to-end Minimax optimization method for sparse learning to balance model performance and computation efficiency.
results: The compressed SNN models achieved state-of-the-art (SOTA) performance on various benchmark datasets and architectures.Abstract
Brain-inspired Spiking Neural Networks (SNNs) have the characteristics of event-driven and high energy-efficient, which are different from traditional Artificial Neural Networks (ANNs) when deployed on edge devices such as neuromorphic chips. Most previous work focuses on SNNs training strategies to improve model performance and brings larger and deeper network architectures. It is difficult to deploy these complex networks on resource-limited edge devices directly. To meet such demand, people compress SNNs very cautiously to balance the performance and the computation efficiency. Existing compression methods either iteratively pruned SNNs using weights norm magnitude or formulated the problem as a sparse learning optimization. We propose an improved end-to-end Minimax optimization method for this sparse learning problem to better balance the model performance and the computation efficiency. We also demonstrate that jointly applying compression and finetuning on SNNs is better than sequentially, especially for extreme compression ratios. The compressed SNN models achieved state-of-the-art (SOTA) performance on various benchmark datasets and architectures. Our code is available at https://github.com/chenjallen/Resource-Constrained-Compression-on-SNN.
A Hierarchical Destroy and Repair Approach for Solving Very Large-Scale Travelling Salesman Problem
results: Fair comparisons on nineteen well-known large-scale instances show that HDR is highly competitive in both computational efficiency and solution quality, and it breaks the world records on two large instances.Abstract
For prohibitively large-scale Travelling Salesman Problems (TSPs), existing algorithms face big challenges in terms of both computational efficiency and solution quality. To address this issue, we propose a hierarchical destroy-and-repair (HDR) approach, which attempts to improve an initial solution by applying a series of carefully designed destroy-and-repair operations. A key innovative concept is the hierarchical search framework, which recursively fixes partial edges and compresses the input instance into a small-scale TSP under some equivalence guarantee. This neat search framework is able to deliver highly competitive solutions within a reasonable time. Fair comparisons based on nineteen famous large-scale instances (with 10,000 to 10,000,000 cities) show that HDR is highly competitive against existing state-of-the-art TSP algorithms, in terms of both efficiency and solution quality. Notably, on two large instances with 3,162,278 and 10,000,000 cities, HDR breaks the world records (i.e., best-known results regardless of computation time), which were previously achieved by LKH and its variants, while HDR is completely independent of LKH. Finally, ablation studies are performed to certify the importance and validity of the hierarchical search framework.
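A bare-bones destroy-and-repair loop looks like the sketch below; the hierarchical edge-fixing and instance compression that distinguish HDR are elided, and the operators shown (random removal, greedy cheapest insertion) are generic placeholders.

```python
import random

def destroy(tour, k):
    """Remove k random cities from the tour."""
    removed = set(random.sample(tour, k))
    return [c for c in tour if c not in removed], list(removed)

def repair(partial, removed, dist):
    """Greedy cheapest insertion of each removed city."""
    for c in removed:
        best_i, best_cost = 0, float("inf")
        for i in range(len(partial)):
            a, b = partial[i], partial[(i + 1) % len(partial)]
            cost = dist(a, c) + dist(c, b) - dist(a, b)
            if cost < best_cost:
                best_i, best_cost = i + 1, cost
        partial.insert(best_i, c)
    return partial

def tour_length(tour, dist):
    return sum(dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))

def destroy_and_repair(tour, dist, iters=1000, k=20):
    best = tour[:]
    for _ in range(iters):
        cand, removed = destroy(best, k)
        cand = repair(cand, removed, dist)
        if tour_length(cand, dist) < tour_length(best, dist):
            best = cand
    return best
```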
Sparse Binary Transformers for Multivariate Time Series Modeling
paper_authors: Matt Gorbett, Hossein Shirazi, Indrakshi Ray
for: This paper focuses on applying sparse and binary-weighted Transformers to multivariate time series problems, with the goal of achieving accuracy comparable to that of dense floating-point Transformers while reducing computational complexity.
methods: The authors use two compression techniques to reduce the number of non-zero operations necessary in the Transformer: 1) applying a fixed mask to the query, key, and value activations, and 2) proposing an attention mask to allow computation only at the current time step.
results: The model achieves favorable results across three time series learning tasks: classification, anomaly detection, and single-step forecasting, with up to a 53x reduction in storage size and up to 10.5x reduction in FLOPs compared to the dense floating-point Transformers.Abstract
Compressed Neural Networks have the potential to enable deep learning across new applications and smaller computational environments. However, understanding the range of learning tasks in which such models can succeed is not well studied. In this work, we apply sparse and binary-weighted Transformers to multivariate time series problems, showing that the lightweight models achieve accuracy comparable to that of dense floating-point Transformers of the same structure. Our model achieves favorable results across three time series learning tasks: classification, anomaly detection, and single-step forecasting. Additionally, to reduce the computational complexity of the attention mechanism, we apply two modifications, which show little to no decline in model performance: 1) in the classification task, we apply a fixed mask to the query, key, and value activations, and 2) for forecasting and anomaly detection, which rely on predicting outputs at a single point in time, we propose an attention mask to allow computation only at the current time step. Together, each compression technique and attention modification substantially reduces the number of non-zero operations necessary in the Transformer. We measure the computational savings of our approach over a range of metrics including parameter count, bit size, and floating point operation (FLOPs) count, showing up to a 53x reduction in storage size and up to 10.5x reduction in FLOPs.
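The second modification is easy to picture: for single-step forecasting, only the final time step's query needs to attend, so the score matrix shrinks from $T \times T$ to $1 \times T$. A minimal dense sketch (binarization and the fixed Q/K/V mask elided, shapes illustrative):

```python
import torch

def last_step_attention(q, k, v):
    """q, k, v: (batch, T, d). Only the final query row attends."""
    q_t = q[:, -1:, :]                                     # (batch, 1, d)
    scores = q_t @ k.transpose(1, 2) / k.shape[-1] ** 0.5  # (batch, 1, T)
    attn = torch.softmax(scores, dim=-1)
    return attn @ v                                        # (batch, 1, d)
```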
methods: The article examines three liability regimes for generative AI models: defamation, speech integral to criminal conduct, and wrongful death.
results: The article finds that any Section 230 immunity analysis or downstream liability analysis is intimately wrapped up in the technical details of algorithm design, and that many roadblocks make it difficult to hold generative AI models and their associated parties liable for generated speech. It argues that AI should not be categorically immune from liability and that courts and policymakers should think carefully about the technical design incentives they create.Abstract
Generative AI, in particular text-based "foundation models" (large models trained on a huge variety of information including the internet), can generate speech that could be problematic under a wide range of liability regimes. Machine learning practitioners regularly "red team" models to identify and mitigate such problematic speech: from "hallucinations" falsely accusing people of serious misconduct to recipes for constructing an atomic bomb. A key question is whether these red-teamed behaviors actually present any liability risk for model creators and deployers under U.S. law, incentivizing investments in safety mechanisms. We examine three liability regimes, tying them to common examples of red-teamed model behaviors: defamation, speech integral to criminal conduct, and wrongful death. We find that any Section 230 immunity analysis or downstream liability analysis is intimately wrapped up in the technical details of algorithm design. And there are many roadblocks to truly finding models (and their associated parties) liable for generated speech. We argue that AI should not be categorically immune from liability in these scenarios and that as courts grapple with the already fine-grained complexities of platform algorithms, the technical details of generative AI loom above with thornier questions. Courts and policymakers should think carefully about what technical design incentives they create as they evaluate these issues.
A Comparative Study of Sentence Embedding Models for Assessing Semantic Variation
results: The study finds that most sentence embedding methods infer highly correlated patterns of semantic similarity in a given document, but show interesting differences.Abstract
Analyzing the pattern of semantic variation in long real-world texts such as books or transcripts is interesting from the stylistic, cognitive, and linguistic perspectives. It is also useful for applications such as text segmentation, document summarization, and detection of semantic novelty. The recent emergence of several vector-space methods for sentence embedding has made such analysis feasible. However, this raises the issue of how consistent and meaningful the semantic representations produced by various methods are in themselves. In this paper, we compare several recent sentence embedding methods via time-series of semantic similarity between successive sentences and matrices of pairwise sentence similarity for multiple books of literature. In contrast to previous work using target tasks and curated datasets to compare sentence embedding methods, our approach provides an evaluation of the methods 'in the wild'. We find that most of the sentence embedding methods considered do infer highly correlated patterns of semantic similarity in a given document, but show interesting differences.
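The core analysis reduces to two arrays per document, sketched below with an off-the-shelf encoder (the model name is an example, not necessarily one the paper compares).

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = ["First sentence.", "Second sentence.", "A third one."]
E = model.encode(sentences, normalize_embeddings=True)

successive = np.sum(E[:-1] * E[1:], axis=1)  # cosine sim of sentence i vs i+1
pairwise = E @ E.T                           # full pairwise similarity matrix
```

The `successive` series traces semantic variation through the text, while `pairwise` supports comparisons between any two sentences.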
Benchmarking LLM powered Chatbots: Methods and Metrics
for: Evaluating the performance of autonomous conversational agents (chatbots), especially those powered by generative AI tools such as Large Language Models (LLMs).
methods: The paper proposes a novel End to End (E2E) benchmark for assessing the accuracy and usefulness of chatbot answers.
results: Evaluating an example chatbot shows that the E2E benchmark assesses chatbot performance better than other commonly used state-of-the-art metrics, with its cosine similarity metric performing well.Abstract
Autonomous conversational agents, i.e. chatbots, are becoming an increasingly common mechanism for enterprises to provide support to customers and partners. In order to rate chatbots, especially ones powered by Generative AI tools like Large Language Models (LLMs) we need to be able to accurately assess their performance. This is where chatbot benchmarking becomes important. In this paper, we propose the use of a novel benchmark that we call the E2E (End to End) benchmark, and show how the E2E benchmark can be used to evaluate accuracy and usefulness of the answers provided by chatbots, especially ones powered by LLMs. We evaluate an example chatbot at different levels of sophistication based on both our E2E benchmark, as well as other available metrics commonly used in the state of art, and observe that the proposed benchmark show better results compared to others. In addition, while some metrics proved to be unpredictable, the metric associated with the E2E benchmark, which uses cosine similarity performed well in evaluating chatbots. The performance of our best models shows that there are several benefits of using the cosine similarity score as a metric in the E2E benchmark.
Accelerating LLM Inference with Staged Speculative Decoding
results: For a 762M parameter GPT-2-L model, single-batch decoding latency is reduced by 3.16x while output quality is perfectly preserved.Abstract
Recent advances with large language models (LLM) illustrate their diverse capabilities. We propose a novel algorithm, staged speculative decoding, to accelerate LLM inference in small-batch, on-device scenarios. We address the low arithmetic intensity of small-batch inference by improving upon previous work in speculative decoding. First, we restructure the speculative batch as a tree, which reduces generation costs and increases the expected tokens per batch. Second, we add a second stage of speculative decoding. Taken together, we reduce single-batch decoding latency by 3.16x with a 762M parameter GPT-2-L model while perfectly preserving output quality.
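For orientation, below is a minimal single-stage, greedy-acceptance speculative decoding sketch; the paper's contributions — restructuring the speculative batch as a tree and adding a second speculative stage — sit on top of this pattern. The model interfaces (callables returning per-position logits over a 1-D token tensor) are assumptions.

```python
import torch

def speculative_step(draft_model, target_model, prefix, k=4):
    """prefix: 1-D LongTensor of tokens; models return (seq_len, vocab) logits."""
    # 1) Draft k tokens cheaply and greedily with the small model.
    draft = prefix
    for _ in range(k):
        logits = draft_model(draft)
        draft = torch.cat([draft, logits[-1].argmax().view(1)])
    # 2) Score every drafted position in ONE pass of the large model.
    t_logits = target_model(draft)
    accepted = prefix
    for i in range(len(prefix), len(draft)):
        best = t_logits[i - 1].argmax()
        if best == draft[i]:
            accepted = torch.cat([accepted, draft[i].view(1)])  # accept
        else:
            accepted = torch.cat([accepted, best.view(1)])      # fix and stop
            break
    return accepted
```

Because several draft tokens are usually accepted per large-model pass, the expensive model runs far fewer times than in plain autoregressive decoding.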
methods: The model's main methods include a duality-based cognitive evolution from infancy to adulthood, described via a consolidation principle for reaching the mature state; a holistic approach to AGI design; and cognition under constraints or efficiency, in the form of reusability and simplicity.
results: The model's final product is a dynamic operational memory of models and instances; the paper also provides examples and preliminary ideas for the evolution phase that reaches the mature state.Abstract
This paper proposes a new cognitive model, acting as the main component of an AGI agent. The model is introduced in its mature intelligence state, and as an extension of previous models, DENN, and especially AKREM, by including operational models (frames/classes) and will. This model's core assumption is that cognition is about operating on accumulated knowledge, with the guidance of an appropriate will. Also, we assume that the actions, part of knowledge, are learning to be aligned with will, during the evolution phase that precedes the mature intelligence state. In addition, this model is mainly based on the duality principle in every known intelligent aspect, such as exhibiting both top-down and bottom-up model learning, generalization verse specialization, and more. Furthermore, a holistic approach is advocated for AGI designing, and cognition under constraints or efficiency is proposed, in the form of reusability and simplicity. Finally, reaching this mature state is described via a cognitive evolution from infancy to adulthood, utilizing a consolidation principle. The final product of this cognitive model is a dynamic operational memory of models and instances. Lastly, some examples and preliminary ideas for the evolution phase to reach the mature state are presented.
paper_authors: Tianlu Wang, Ping Yu, Xiaoqing Ellen Tan, Sean O’Brien, Ramakanth Pasunuru, Jane Dwivedi-Yu, Olga Golovneva, Luke Zettlemoyer, Maryam Fazel-Zarandi, Asli Celikyilmaz
results: Under GPT-4 evaluation, Shepherd's critiques are equivalent or preferred to those of competitive alternatives, with an average win-rate of 53-87%; in human evaluation, Shepherd strictly outperforms other models and on average closely ties with ChatGPT.Abstract
As large language models improve, there is increasing interest in techniques that leverage these models' capabilities to refine their own outputs. In this work, we introduce Shepherd, a language model specifically tuned to critique responses and suggest refinements, extending beyond the capabilities of an untuned model to identify diverse errors and provide suggestions to remedy them. At the core of our approach is a high quality feedback dataset, which we curate from community feedback and human annotations. Even though Shepherd is small (7B parameters), its critiques are either equivalent or preferred to those from established models including ChatGPT. Using GPT-4 for evaluation, Shepherd reaches an average win-rate of 53-87% compared to competitive alternatives. In human evaluation, Shepherd strictly outperforms other models and on average closely ties with ChatGPT.
Temporal DINO: A Self-supervised Video Strategy to Enhance Action Prediction
paper_authors: Izzeddin Teeti, Rongali Sai Bhargav, Vivek Singh, Andrew Bradley, Biplab Banerjee, Fabio Cuzzolin
for: This paper aims to improve action prediction in computer vision applications such as autonomous driving, activity analysis, and human-computer interaction.
methods: The paper introduces a novel self-supervised video strategy called Temporal-DINO, which uses two models (a “student” and a “teacher”) to learn future context by only observing past frames.
results: The experimental results show that the proposed method achieves significant improvements in prediction performance across different architectures, with an average enhancement of 9.9% Precision Points (PP), and demonstrates efficiency in terms of the pretraining dataset size and the number of epochs required.Abstract
The emerging field of action prediction plays a vital role in various computer vision applications such as autonomous driving, activity analysis and human-computer interaction. Despite significant advancements, accurately predicting future actions remains a challenging problem due to high dimensionality, complex dynamics and uncertainties inherent in video data. Traditional supervised approaches require large amounts of labelled data, which is expensive and time-consuming to obtain. This paper introduces a novel self-supervised video strategy for enhancing action prediction inspired by DINO (self-distillation with no labels). The Temporal-DINO approach employs two models; a 'student' processing past frames; and a 'teacher' processing both past and future frames, enabling a broader temporal context. During training, the teacher guides the student to learn future context by only observing past frames. The strategy is evaluated on ROAD dataset for the action prediction downstream task using 3D-ResNet, Transformer, and LSTM architectures. The experimental results showcase significant improvements in prediction performance across these architectures, with our method achieving an average enhancement of 9.9% Precision Points (PP), highlighting its effectiveness in enhancing the backbones' capabilities of capturing long-term dependencies. Furthermore, our approach demonstrates efficiency regarding the pretraining dataset size and the number of epochs required. This method overcomes limitations present in other approaches, including considering various backbone architectures, addressing multiple prediction horizons, reducing reliance on hand-crafted augmentations, and streamlining the pretraining process into a single stage. These findings highlight the potential of our approach in diverse video-based tasks such as activity recognition, motion planning, and scene understanding.
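A minimal sketch of the training signal, under our own simplifications (DINO's EMA teacher update, centering, and multi-crop machinery are elided; `past` and `future` are batched frame tensors and both models output per-clip logits):

```python
import torch
import torch.nn.functional as F

def temporal_dino_loss(student, teacher, past, future, temp_s=0.1, temp_t=0.04):
    """Student sees only past frames; teacher sees past + future frames."""
    with torch.no_grad():
        t_out = teacher(torch.cat([past, future], dim=1))  # broader context
    s_out = student(past)                                  # past frames only
    p_t = F.softmax(t_out / temp_t, dim=-1)
    log_p_s = F.log_softmax(s_out / temp_s, dim=-1)
    return -(p_t * log_p_s).sum(dim=-1).mean()             # cross-entropy
```

Matching the teacher's distribution forces the student to encode future context it never observes directly.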
Developmental Bootstrapping: From Simple Competences to Intelligent Human-Compatible AIs
results: Developmental robotics projects have not yet reached adult-level competences.Abstract
Although some AIs surpass human abilities in closed artificial worlds such as board games, in the real world they make strange mistakes and do not notice them. They cannot be instructed easily, fail to use common sense, and lack curiosity. Mainstream approaches for creating AIs include the traditional manually-constructed symbolic AI approach and the generative and deep learning AI approaches including large language models (LLMs). Although it is outside of the mainstream, the developmental bootstrapping approach may have more potential. In developmental bootstrapping, AIs develop competences like human children do. They start with innate competences. They interact with the environment and learn from their interactions. They incrementally extend their innate competences with self-developed competences. They interact and learn from people and establish perceptual, cognitive, and common grounding. They acquire the competences they need through competence bootstrapping. However, developmental robotics has not yet produced AIs with robust adult-level competences. Projects have typically stopped before reaching the Toddler Barrier. This corresponds to human infant development at about two years of age, before infant speech becomes fluent. They also do not bridge the Reading Barrier, where they could skillfully and skeptically draw on the socially developed online information resources that power LLMs. The next competences in human cognitive development involve intrinsic motivation, imitation learning, imagination, coordination, and communication. This position paper lays out the logic, prospects, gaps, and challenges for extending the practice of developmental bootstrapping to create robust, trustworthy, and human-compatible AIs.
Improving Performance in Continual Learning Tasks using Bio-Inspired Architectures
results: The approach achieves superior online continual learning performance on the Split-MNIST, Split-CIFAR-10, and Split-CIFAR-100 datasets compared to other memory-constrained learning approaches, and matches state-of-the-art memory-intensive replay-based methods. The authors also integrate these design elements into other backpropagation-based continual learning algorithms, improving their accuracy.Abstract
The ability to learn continuously from an incoming data stream without catastrophic forgetting is critical to designing intelligent systems. Many approaches to continual learning rely on stochastic gradient descent and its variants that employ global error updates, and hence need to adopt strategies such as memory buffers or replay to circumvent its stability, greed, and short-term memory limitations. To address this limitation, we have developed a biologically inspired lightweight neural network architecture that incorporates synaptic plasticity mechanisms and neuromodulation and hence learns through local error signals to enable online continual learning without stochastic gradient descent. Our approach leads to superior online continual learning performance on Split-MNIST, Split-CIFAR-10, and Split-CIFAR-100 datasets compared to other memory-constrained learning approaches and matches that of the state-of-the-art memory-intensive replay-based approaches. We further demonstrate the effectiveness of our approach by integrating key design concepts into other backpropagation-based continual learning algorithms, significantly improving their accuracy. Our results provide compelling evidence for the importance of incorporating biological principles into machine learning models and offer insights into how we can leverage them to design more efficient and robust systems for online continual learning.
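The flavor of learning from local error signals can be conveyed by a generic three-factor plasticity rule, in which a weight change depends only on pre-synaptic activity, post-synaptic activity, and a scalar neuromodulatory signal; this is an illustration, not the paper's exact architecture.

```python
import numpy as np

def local_update(W, pre, post, modulator, lr=0.01):
    """Three-factor rule: dW = lr * modulator * outer(post, pre)."""
    return W + lr * modulator * np.outer(post, pre)

# Illustrative call: 4 post-synaptic units, 3 pre-synaptic units.
W = np.zeros((4, 3))
W = local_update(W, pre=np.array([1.0, 0.0, 0.5]),
                 post=np.array([0.2, 0.0, 1.0, 0.1]), modulator=+1.0)
```

Because no global gradient is propagated, each synapse can be updated online as data streams in, which is what makes the approach suitable for continual learning.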
Generating Modern Persian Carpet Map by Style-transfer
results: User evaluations confirm the popularity of the generated carpet maps, and the method produces designs faster than traditional approaches.Abstract
Today, the great performance of Deep Neural Networks(DNN) has been proven in various fields. One of its most attractive applications is to produce artistic designs. A carpet that is known as a piece of art is one of the most important items in a house, which has many enthusiasts all over the world. The first stage of producing a carpet is to prepare its map, which is a difficult, time-consuming, and expensive task. In this research work, our purpose is to use DNN for generating a Modern Persian Carpet Map. To reach this aim, three different DNN style transfer methods are proposed and compared against each other. In the proposed methods, the Style-Swap method is utilized to create the initial carpet map, and in the following, to generate more diverse designs, methods Clip-Styler, Gatys, and Style-Swap are used separately. In addition, some methods are examined and introduced for coloring the produced carpet maps. The designed maps are evaluated via the results of filled questionnaires where the outcomes of user evaluations confirm the popularity of generated carpet maps. Eventually, for the first time, intelligent methods are used in producing carpet maps, and it reduces human intervention. The proposed methods can successfully produce diverse carpet designs, and at a higher speed than traditional ways.
Deep Learning for Diverse Data Types Steganalysis: A Review
results: The paper reviews deep learning-based steganalysis across multiple digital media types, surveys the datasets and evaluation metrics used in recent studies, and discusses open challenges and future research directions.Abstract
Steganography and steganalysis are two interrelated aspects of the field of information security. Steganography seeks to conceal communications, whereas steganalysis is aimed to either find them or even, if possible, recover the data they contain. Steganography and steganalysis have attracted a great deal of interest, particularly from law enforcement. Steganography is often used by cybercriminals and even terrorists to avoid being captured while in possession of incriminating evidence, even encrypted, since cryptography is prohibited or restricted in many countries. Therefore, knowledge of cutting-edge techniques to uncover concealed information is crucial in exposing illegal acts. Over the last few years, a number of strong and reliable steganography and steganalysis techniques have been introduced in the literature. This review paper provides a comprehensive overview of deep learning-based steganalysis techniques used to detect hidden information within digital media. The paper covers all types of cover in steganalysis, including image, audio, and video, and discusses the most commonly used deep learning techniques. In addition, the paper explores the use of more advanced deep learning techniques, such as deep transfer learning (DTL) and deep reinforcement learning (DRL), to enhance the performance of steganalysis systems. The paper provides a systematic review of recent research in the field, including data sets and evaluation metrics used in recent studies. It also presents a detailed analysis of DTL-based steganalysis approaches and their performance on different data sets. The review concludes with a discussion on the current state of deep learning-based steganalysis, challenges, and future research directions.
results: The paper presents a type-logical syntax for parsing donkey sentences and defines both relational and vector space semantics for it.Abstract
We demonstrate how to parse Geach's Donkey sentences in a compositional distributional model of meaning. We build on previous work on the DisCoCat (Distributional Compositional Categorical) framework, including extensions that model discourse, determiners, and relative pronouns. We present a type-logical syntax for parsing donkey sentences, for which we define both relational and vector space semantics.
MT-IceNet – A Spatial and Multi-Temporal Deep Learning Model for Arctic Sea Ice Forecasting
results: Using satellite-retrieved sea ice data from NSIDC and atmospheric and oceanic variables from the ERA5 reanalysis, MT-IceNet reduces prediction error by up to 60% at a lead time of 6 months compared with state-of-the-art models.Abstract
Arctic amplification has altered the climate patterns both regionally and globally, resulting in more frequent and more intense extreme weather events in the past few decades. The essential part of Arctic amplification is the unprecedented sea ice loss as demonstrated by satellite observations. Accurately forecasting Arctic sea ice from sub-seasonal to seasonal scales has been a major research question with fundamental challenges at play. In addition to physics-based Earth system models, researchers have been applying multiple statistical and machine learning models for sea ice forecasting. Looking at the potential of data-driven approaches to study sea ice variations, we propose MT-IceNet - a UNet based spatial and multi-temporal (MT) deep learning model for forecasting Arctic sea ice concentration (SIC). The model uses an encoder-decoder architecture with skip connections and processes multi-temporal input streams to regenerate spatial maps at future timesteps. Using bi-monthly and monthly satellite retrieved sea ice data from NSIDC as well as atmospheric and oceanic variables from ERA5 reanalysis product during 1979-2021, we show that our proposed model provides promising predictive performance for per-pixel SIC forecasting with up to 60% decrease in prediction error for a lead time of 6 months as compared to its state-of-the-art counterparts.
results: GPT-4 achieves up to a $65.49$ F\textsubscript{1} score under expert prompting, roughly $5$ points above the established baseline, highlighting the potential of LLMs in low-resource settings and as a viable source of useful synthetic data for model training.Abstract
Recently, large language models (LLMs) fine-tuned to follow human instruction have exhibited significant capabilities in various English NLP tasks. However, their performance in grammatical error correction (GEC) tasks, particularly in non-English languages, remains significantly unexplored. In this paper, we delve into abilities of instruction fine-tuned LLMs in Arabic GEC, a task made complex due to Arabic's rich morphology. Our findings suggest that various prompting methods, coupled with (in-context) few-shot learning, demonstrate considerable effectiveness, with GPT-4 achieving up to $65.49$ F\textsubscript{1} score under expert prompting (approximately $5$ points higher than our established baseline). This highlights the potential of LLMs in low-resource settings, offering a viable approach for generating useful synthetic data for model training. Despite these positive results, we find that instruction fine-tuned models, regardless of their size, significantly underperform compared to fully fine-tuned models of significantly smaller sizes. This disparity highlights a substantial room for improvements for LLMs. Inspired by methods from low-resource machine translation, we also develop a method exploiting synthetic data that significantly outperforms previous models on two standard Arabic benchmarks. Our work sets new SoTA for Arabic GEC, with $72.19\%$ and $73.26$ F$_{1}$ on the 2014 and 2015 QALB datasets, respectively.
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
methods: The study trains a parametric language model on the Open License Corpus (OLC) and augments it with a general, easily modifiable nonparametric datastore (e.g., containing copyrighted books or news) that is queried only during inference, allowing high-risk data to be used without training on it.
results: Experiments show that access to the nonparametric datastore greatly improves out-of-domain performance. The study also analyzes which nonparametric approach works best and how performance scales with datastore size, suggesting that high-quality language models can be built while mitigating their legal risk.Abstract
The legality of training language models (LMs) on copyrighted or otherwise restricted data is under intense debate. However, as we show, model performance significantly degrades if trained only on low-risk text (e.g., out-of-copyright books or government documents), due to its limited size and domain coverage. We present SILO, a new language model that manages this risk-performance tradeoff during inference. SILO is built by (1) training a parametric LM on Open License Corpus (OLC), a new corpus we curate with 228B tokens of public domain and permissively licensed text and (2) augmenting it with a more general and easily modifiable nonparametric datastore (e.g., containing copyrighted books or news) that is only queried during inference. The datastore allows use of high-risk data without training on it, supports sentence-level data attribution, and enables data producers to opt out from the model by removing content from the store. These capabilities can foster compliance with data-use regulations such as the fair use doctrine in the United States and the GDPR in the European Union. Our experiments show that the parametric LM struggles on domains not covered by OLC. However, access to the datastore greatly improves out of domain performance, closing 90% of the performance gap with an LM trained on the Pile, a more diverse corpus with mostly high-risk text. We also analyze which nonparametric approach works best, where the remaining errors lie, and how performance scales with datastore size. Our results suggest that it is possible to build high quality language models while mitigating their legal risk.
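One standard way to consult such a datastore only at inference time is a kNN-LM-style interpolation of the parametric model's next-token distribution with a distribution built from retrieved neighbors; the sketch below illustrates that combination (SILO's specific retrieval variants and hyperparameters may differ).

```python
import numpy as np

def interpolate(p_lm, knn_tokens, knn_dists, vocab_size, lam=0.25, temp=1.0):
    """Mix LM probabilities with a distribution built from retrieved neighbors.
    p_lm: (vocab,) next-token probabilities from the parametric LM.
    knn_tokens / knn_dists: next tokens and distances of retrieved neighbors."""
    w = np.exp(-np.asarray(knn_dists, dtype=float) / temp)
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, np.asarray(knn_tokens), w)  # aggregate weight per token
    p_knn /= p_knn.sum()
    return lam * p_knn + (1.0 - lam) * p_lm
```

Because the datastore contributes only at query time, removing a document from it immediately removes that document's influence, which is what enables the opt-out and attribution properties described above.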
Probabilistic Invariant Learning with Randomized Linear Classifiers
results: The authors prove conditions under which RLCs can, with high probability, approximate any smooth function while preserving invariance to compact group transformations. They design three RLC-based classification models that are provably probabilistically invariant for sets, graphs, and spherical data, and show empirically that these models outperform deterministic invariant neural networks on invariant tasks.Abstract
Designing models that are both expressive and preserve known invariances of tasks is an increasingly hard problem. Existing solutions tradeoff invariance for computational or memory resources. In this work, we show how to leverage randomness and design models that are both expressive and invariant but use less resources. Inspired by randomized algorithms, our key insight is that accepting probabilistic notions of universal approximation and invariance can reduce our resource requirements. More specifically, we propose a class of binary classification models called Randomized Linear Classifiers (RLCs). We give parameter and sample size conditions in which RLCs can, with high probability, approximate any (smooth) function while preserving invariance to compact group transformations. Leveraging this result, we design three RLCs that are provably probabilistic invariant for classification tasks over sets, graphs, and spherical data. We show how these models can achieve probabilistic invariance and universality using less resources than (deterministic) neural networks and their invariant counterparts. Finally, we empirically demonstrate the benefits of this new class of models on invariant tasks where deterministic invariant neural networks are known to struggle.
Fine-Tuning Games: Bargaining and Adaptation for General-Purpose Models
results: The paper finds that profit-sharing can arise even when one firm has significantly higher costs than the other, and provides methods for identifying Pareto-optimal bargaining arrangements for a general set of utility functions.Abstract
Major advances in Machine Learning (ML) and Artificial Intelligence (AI) increasingly take the form of developing and releasing general-purpose models. These models are designed to be adapted by other businesses and agencies to perform a particular, domain-specific function. This process has become known as adaptation or fine-tuning. This paper offers a model of the fine-tuning process where a Generalist brings the technological product (here an ML model) to a certain level of performance, and one or more Domain-specialist(s) adapts it for use in a particular domain. Both entities are profit-seeking and incur costs when they invest in the technology, and they must reach a bargaining agreement on how to share the revenue for the technology to reach the market. For a relatively general class of cost and revenue functions, we characterize the conditions under which the fine-tuning game yields a profit-sharing solution. We observe that any potential domain-specialization will either contribute, free-ride, or abstain in their uptake of the technology, and we provide conditions yielding these different strategies. We show how methods based on bargaining solutions and sub-game perfect equilibria provide insights into the strategic behavior of firms in these types of interactions, and we find that profit-sharing can still arise even when one firm has significantly higher costs than another. We also provide methods for identifying Pareto-optimal bargaining arrangements for a general set of utility functions.
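A toy version of the revenue-split computation: for a fixed revenue and sunk costs, the Nash bargaining share maximizes the product of the two firms' surpluses over their (here zero) outside options. The functional forms are placeholders for the paper's general cost and revenue functions.

```python
import numpy as np

def nash_share(revenue, cost_g, cost_d, grid=10_001):
    """Revenue share s for the Generalist that maximizes the Nash product
    of both firms' surpluses (outside options normalized to 0)."""
    s = np.linspace(0.0, 1.0, grid)
    u_g = s * revenue - cost_g          # Generalist payoff at share s
    u_d = (1.0 - s) * revenue - cost_d  # Domain-specialist payoff
    feasible = (u_g > 0) & (u_d > 0)
    if not feasible.any():
        return None                     # no profitable agreement exists
    prod = np.where(feasible, u_g * u_d, -np.inf)
    return float(s[np.argmax(prod)])

print(nash_share(revenue=10.0, cost_g=2.0, cost_d=1.0))  # -> about 0.55
```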
Event Abstraction for Enterprise Collaboration Systems to Support Social Process Mining
paper_authors: Jonas Blatt, Patrick Delfmann, Petra Schubert
For: The paper is written for Process Mining (PM) on Enterprise Collaboration Systems (ECS).
Methods: The paper proposes a tailored ECS event abstraction (ECSEA) approach that trains a model by comparing recorded actual user activities (high-level traces) with the system-generated low-level traces (extracted from the ECS).
Results: The algorithm produces accurate results.
Abstract
One aim of Process Mining (PM) is the discovery of process models from event logs of information systems. PM has been successfully applied to process-oriented enterprise systems but is less suited for communication- and document-oriented Enterprise Collaboration Systems (ECS). ECS event logs are very fine-grained, and PM applied to such logs results in spaghetti models. A common solution for this is event abstraction, i.e., converting low-level logs into more abstract high-level logs before running discovery algorithms. ECS logs come with special characteristics that have so far not been fully addressed by existing event abstraction approaches. We aim to close this gap with a tailored ECS event abstraction (ECSEA) approach that trains a model by comparing recorded actual user activities (high-level traces) with the system-generated low-level traces (extracted from the ECS). The model allows us to automatically convert future low-level traces into an abstracted high-level log that can be used for PM. Our evaluation shows that the algorithm produces accurate results. ECSEA is a preprocessing method that is essential for the interpretation of collaborative work activity in ECS, which we call Social Process Mining.
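A minimal sketch of the abstraction step, assuming a simple majority-vote mapping learned from aligned (low-level event, high-level activity) pairs; ECSEA's actual training procedure is considerably richer than this.

    from collections import Counter, defaultdict

    def learn_mapping(aligned_pairs):
        # aligned_pairs: iterable of (low_level_event_type, high_level_activity)
        votes = defaultdict(Counter)
        for low, high in aligned_pairs:
            votes[low][high] += 1
        return {low: c.most_common(1)[0][0] for low, c in votes.items()}

    def abstract_trace(low_level_trace, mapping):
        high = [mapping[e] for e in low_level_trace if e in mapping]
        # collapse consecutive duplicates: one activity spans many events
        return [a for i, a in enumerate(high) if i == 0 or a != high[i - 1]]

    mapping = learn_mapping([("doc_open", "Review"), ("doc_edit", "Review"),
                             ("post_create", "Discuss")])
    print(abstract_trace(["doc_open", "doc_edit", "post_create"], mapping))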
Metaheuristic Algorithms in Artificial Intelligence with Applications to Bioinformatics, Biostatistics, Ecology, and the Manufacturing Industries
paper_authors: Elvis Han Cui, Zizhao Zhang, Culsome Junwen Chen, Weng Kee Wong
for: This study applies a newly proposed nature-inspired algorithm, Competitive Swarm Optimizer with Mutated Agents (CSO-MA), to a variety of optimization problems in the statistical sciences, demonstrating its flexibility and comparing it with other algorithms.
methods: The study uses the CSO-MA algorithm, which can handle different cost structures and multiple user-specified nonlinear constraints.
results: The study applies CSO-MA to a range of optimization problems, such as finding maximum likelihood estimates of parameters in a single-cell generalized trend model, estimating parameters in the Rasch model commonly used in education research, finding M-estimates for a Cox regression in a Markov renewal model, and matrix completion to impute missing values in a two-compartment model. Further applications include optimal variable selection in an ecology problem and designing a car refueling experiment for the auto industry using a logistic model with multiple interacting factors.
Abstract
Nature-inspired metaheuristic algorithms are important components of artificial intelligence, and are increasingly used across disciplines to tackle various types of challenging optimization problems. We apply a newly proposed nature-inspired metaheuristic algorithm called competitive swarm optimizer with mutated agents (CSO-MA) and demonstrate its flexibility and superior performance relative to its competitors in a variety of optimization problems in the statistical sciences. In particular, we show the algorithm is efficient and can incorporate various cost structures or multiple user-specified nonlinear constraints. Our applications include (i) finding maximum likelihood estimates of parameters in a single-cell generalized trend model to study pseudotime in bioinformatics, (ii) estimating parameters in a commonly used Rasch model in education research, (iii) finding M-estimates for a Cox regression in a Markov renewal model and (iv) matrix completion to impute missing values in a two compartment model. In addition we discuss applications to (v) select variables optimally in an ecology problem and (vi) design a car refueling experiment for the auto industry using a logistic model with multiple interacting factors.
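For intuition, here is a compact sketch of a competitive swarm optimizer with an added mutation step, loosely following the generic CSO update in which pairwise losers learn from winners; the paper's CSO-MA variant may differ in its mutation and update details.

    import numpy as np

    def cso_ma(f, dim, n=40, iters=200, bounds=(-5.0, 5.0), p_mut=0.1, seed=0):
        rng = np.random.default_rng(seed)
        lo, hi = bounds
        x = rng.uniform(lo, hi, size=(n, dim))
        v = np.zeros((n, dim))
        for _ in range(iters):
            fx = np.apply_along_axis(f, 1, x)
            mean = x.mean(axis=0)
            for i, j in rng.permutation(n).reshape(-1, 2):  # random pairs
                w, l = (i, j) if fx[i] < fx[j] else (j, i)  # minimize f
                r1, r2, r3 = rng.random((3, dim))
                v[l] = r1 * v[l] + r2 * (x[w] - x[l]) + r3 * (mean - x[l])
                x[l] = np.clip(x[l] + v[l], lo, hi)
                if rng.random() < p_mut:  # occasionally mutate a losing agent
                    x[l] = np.clip(x[l] + rng.normal(0.0, 0.1, dim), lo, hi)
        return x[np.argmin(np.apply_along_axis(f, 1, x))]

    print(cso_ma(lambda z: float(np.sum(z ** 2)), dim=5))  # sphere test function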
AdaptEx: A Self-Service Contextual Bandit Platform
results: The platform rapidly improves user experiences while reducing the costs and time associated with traditional testing methods. It also adapts gracefully to ever-changing content and continuous "cold start" situations.
Abstract
This paper presents AdaptEx, a self-service contextual bandit platform widely used at Expedia Group, that leverages multi-armed bandit algorithms to personalize user experiences at scale. AdaptEx considers the unique context of each visitor to select the optimal variants and learns quickly from every interaction they make. It offers a powerful solution to improve user experiences while minimizing the costs and time associated with traditional testing methods. The platform unlocks the ability to iterate towards optimal product solutions quickly, handling ever-changing content and continuous "cold start" situations gracefully.
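AdaptEx's internals are not described in detail here, so as a generic illustration, below is a contextual epsilon-greedy bandit of the kind such a platform might run; all names and parameters are placeholders.

    import random
    from collections import defaultdict

    class ContextualBandit:
        def __init__(self, arms, epsilon=0.1):
            self.arms, self.epsilon = arms, epsilon
            self.counts = defaultdict(lambda: defaultdict(int))
            self.values = defaultdict(lambda: defaultdict(float))

        def select(self, context):
            if random.random() < self.epsilon:
                return random.choice(self.arms)  # explore
            vals = self.values[context]
            return max(self.arms, key=lambda a: vals[a])  # exploit

        def update(self, context, arm, reward):
            self.counts[context][arm] += 1
            n = self.counts[context][arm]
            self.values[context][arm] += (reward - self.values[context][arm]) / n

    bandit = ContextualBandit(arms=["variant_a", "variant_b"])
    arm = bandit.select(context="mobile_us")
    bandit.update("mobile_us", arm, reward=1.0)  # e.g., the user clicked

A fresh context (a new "cold start") simply accumulates its own statistics from the first interaction onward, which is one reason bandit approaches degrade more gracefully than fixed A/B splits.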
Understanding the Effect of Counterfactual Explanations on Trust and Reliance on AI for Human-AI Collaborative Clinical Decision Making
results: The study found that when AI suggestions were correct, both salient feature and counterfactual explanations helped therapists and laypersons review the suggestions more analytically and improve performance and agreement. When AI suggestions were wrong, counterfactual explanations helped both therapists and laypersons reduce their over-reliance on the incorrect suggestions.
Abstract
Artificial intelligence (AI) is increasingly being considered to assist human decision-making in high-stakes domains (e.g. health). However, researchers have discussed an issue that humans can over-rely on wrong suggestions of the AI model instead of achieving human-AI complementary performance. In this work, we utilized salient feature explanations along with what-if, counterfactual explanations to make humans review AI suggestions more analytically to reduce overreliance on AI, and explored the effect of these explanations on trust and reliance on AI during clinical decision-making. We conducted an experiment with seven therapists and ten laypersons on the task of assessing post-stroke survivors' quality of motion, and analyzed their performance, agreement level on the task, and reliance on AI without and with two types of AI explanations. Our results showed that the AI model with both salient features and counterfactual explanations assisted therapists and laypersons to improve their performance and agreement level on the task when `right' AI outputs are presented. While both therapists and laypersons over-relied on `wrong' AI outputs, counterfactual explanations assisted both therapists and laypersons to reduce their over-reliance on `wrong' AI outputs by 21\% compared to salient feature explanations. Specifically, laypersons showed larger performance degradations (18.0 F1 points with salient feature explanations and 14.0 F1 points with counterfactual explanations) than therapists (8.6 and 2.8 F1 points, respectively). Our work discusses the potential of counterfactual explanations to better estimate the accuracy of an AI model and reduce over-reliance on `wrong' AI outputs, and the implications for improving human-AI collaborative decision-making.
Some Options for Instantiation of Bipolar Argument Graphs with Deductive Arguments
results: The results of this study help us better understand the arguments in a bipolar argument graph and the interactions between them, and provide a framework based on logical arguments to achieve this goal.
Abstract
Argument graphs provide an abstract representation of an argumentative situation. A bipolar argument graph is a directed graph where each node denotes an argument, and each arc denotes the influence of one argument on another. Here we assume that the influence is supporting, attacking, or ambiguous. In a bipolar argument graph, each argument is atomic and so it has no internal structure. Yet to better understand the nature of the individual arguments, and how they interact, it is important to consider their internal structure. To address this need, this paper presents a framework based on the use of logical arguments to instantiate bipolar argument graphs, and a set of possible constraints on instantiating arguments that take into account the internal structure of the arguments, and the types of relationship between arguments.
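A minimal data-structure sketch of a bipolar argument graph with the three arc labels assumed in the paper; the instantiation with structured deductive arguments goes well beyond this toy representation.

    from dataclasses import dataclass, field

    @dataclass
    class BipolarArgumentGraph:
        nodes: set = field(default_factory=set)
        arcs: dict = field(default_factory=dict)  # (src, dst) -> label

        def add_arc(self, src, dst, label):
            assert label in {"support", "attack", "ambiguous"}
            self.nodes |= {src, dst}
            self.arcs[(src, dst)] = label

    g = BipolarArgumentGraph()
    g.add_arc("A: premises entail urgent action",
              "B: premises entail that delay is acceptable", "attack")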
Cumulative Reasoning with Large Language Models
paper_authors: Yifan Zhang, Jingqin Yang, Yang Yuan, Andrew Chi-Chih Yao
for: solves complex problems with human-like thought processes
methods: employs language models in a cumulative and iterative manner, decomposing tasks into smaller components
results: consistently outperforms existing methods with an improvement up to 9.3%, achieves 98.04% accuracy on the curated FOLIO wiki dataset, and achieves 94% accuracy on the Game of 24 with a 20% enhancement over the previous state-of-the-art method.
Abstract
While language models are powerful and versatile, they often fail to address highly complex problems. This is because solving complex problems requires deliberate thinking, which has been only minimally guided during training. In this paper, we propose a new method called Cumulative Reasoning (CR), which employs language models in a cumulative and iterative manner to emulate human thought processes. By decomposing tasks into smaller components, CR streamlines the problem-solving process, rendering it both more manageable and effective. For logical inference tasks, CR consistently outperforms existing methods with an improvement up to 9.3%, and achieves the astonishing accuracy of 98.04% on the curated FOLIO wiki dataset. In the context of the Game of 24, CR achieves an accuracy of 94%, which signifies a substantial enhancement of 20% over the previous state-of-the-art method (code is available at https://github.com/iiis-ai/cumulative-reasoning).
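In outline, the cumulative loop proposes a small deduction, verifies it, and appends accepted steps to a growing premise set. The sketch below assumes a hypothetical text-completion callable `llm` and is not the official implementation, which is at https://github.com/iiis-ai/cumulative-reasoning.

    def cumulative_reasoning(question, llm, max_steps=8):
        accepted = []  # verified intermediate propositions
        for _ in range(max_steps):
            context = "\n".join(accepted)
            proposal = llm(f"Premises:\n{context}\nQuestion: {question}\n"
                           "Propose one new intermediate deduction:")
            verdict = llm(f"Premises:\n{context}\nClaim: {proposal}\n"
                          "Is the claim entailed? Answer yes or no:")
            if verdict.strip().lower().startswith("yes"):
                accepted.append(proposal)  # accumulate only verified steps
            answer = llm(f"Premises:\n{context}\nQuestion: {question}\n"
                         "Answer if the premises suffice, else say CONTINUE:")
            if "CONTINUE" not in answer:
                return answer, accepted
        return None, accepted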
Learning Unbiased Image Segmentation: A Case Study with Plain Knee Radiographs
paper_authors: Nickolas Littlefield, Johannes F. Plate, Kurt R. Weiss, Ines Lohse, Avani Chhabra, Ismaeel A. Siddiqui, Zoe Menezes, George Mastorakos, Sakshi Mehul Thakar, Mehrnaz Abedian, Matthew F. Gong, Luke A. Carlson, Hamidreza Moradi, Soheyla Amirian, Ahmad P. Tafti
results: The study uncovered gender and racial biases and proposed mitigation strategies to correct them, ensuring fair and unbiased segmentation results.
Abstract
Automatic segmentation of knee bony anatomy is essential in orthopedics, and it has been around for several years in both pre-operative and post-operative settings. While deep learning algorithms have demonstrated exceptional performance in medical image analysis, the assessment of fairness and potential biases within these models remains limited. This study aims to revisit deep learning-powered knee-bony anatomy segmentation using plain radiographs to uncover visible gender and racial biases. The current contribution offers the potential to advance our understanding of biases, and it provides practical insights for researchers and practitioners in medical imaging. The proposed mitigation strategies mitigate gender and racial biases, ensuring fair and unbiased segmentation results. Furthermore, this work promotes equal access to accurate diagnoses and treatment outcomes for diverse patient populations, fostering equitable and inclusive healthcare provision.
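One simple way to surface such biases is a per-group audit of segmentation quality; the sketch below compares mean Dice scores across demographic groups and illustrates the general idea, not the paper's exact protocol.

    import numpy as np

    def dice(pred, truth):
        inter = np.logical_and(pred, truth).sum()
        return 2.0 * inter / (pred.sum() + truth.sum() + 1e-8)

    def audit_by_group(preds, truths, groups):
        scores = {}
        for g in set(groups):
            idx = [i for i, gg in enumerate(groups) if gg == g]
            scores[g] = float(np.mean([dice(preds[i], truths[i]) for i in idx]))
        gap = max(scores.values()) - min(scores.values())
        return scores, gap  # a large gap flags potential bias

    rng = np.random.default_rng(0)
    preds = [rng.random((8, 8)) > 0.5 for _ in range(6)]
    truths = [rng.random((8, 8)) > 0.5 for _ in range(6)]
    print(audit_by_group(preds, truths, ["f", "m", "f", "m", "f", "m"]))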
methods: This study uses two improved transformer models, ALBERT and RoBERTa, and compares their performance at detecting fake news.
results: The study finds that the ALBERT model achieves 87.6% accuracy, 86.9% precision, an 86.9% F1-score, and a run time of 174.5 s/epoch, outperforming non-transformer approaches.
Abstract
Fake news is false material presented in a news media format that has not been properly vetted by news agencies. Such material can provoke or defame significant entities or individuals, or serve the personal interests of its creators, causing problems for society. Distinguishing fake news from real news is challenging due to limited domain knowledge and time constraints. According to the survey, the top three areas where residents are most exposed to hoaxes and misinformation are Banten, DKI Jakarta and West Java. Transformers refer to an approach in the field of artificial intelligence (AI) for natural language processing that utilizes deep learning architectures. Transformers exercise a powerful attention mechanism to process text in parallel and produce rich and contextual word representations. A previous study indicates superior performance of a transformer model known as BERT over non-transformer approaches. Some studies suggest the performance can be further improved with the use of improved BERT models known as ALBERT and RoBERTa. However, these modified BERT models have not been well explored for detecting fake news in Bahasa Indonesia. In this research, we explore those transformer models and find that ALBERT outperforms the other models with 87.6% accuracy, 86.9% precision, an 86.9% F1-score, and a run time of 174.5 s/epoch. Source code available at: https://github.com/Shafna81/fakenewsdetection.git
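For readers who want to try a comparable setup, a minimal Hugging Face inference sketch follows; "albert-base-v2" is a placeholder checkpoint (the paper fine-tunes on Bahasa Indonesia data, so see the repository above for the actual setup).

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tok = AutoTokenizer.from_pretrained("albert-base-v2")
    model = AutoModelForSequenceClassification.from_pretrained(
        "albert-base-v2", num_labels=2)  # 0 = real, 1 = fake, after fine-tuning

    inputs = tok("Contoh judul berita yang akan diperiksa.",
                 return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    print(logits.softmax(dim=-1))  # class probabilities (head untrained here)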
Extrapolating Large Language Models to Non-English by Aligning Languages
paper_authors: Wenhao Zhu, Yunzhe Lv, Qingxiu Dong, Fei Yuan, Jingjing Xu, Shujian Huang, Lingpeng Kong, Jiajun Chen, Lei Li
for: Improving the capability of large language models (LLMs) on non-English languages
methods: Strengthening pre-trained LLMs on non-English languages with semantic alignment across languages and instruction tuning
results: The x-LLaMA models outperform the English instruction-tuned counterpart (Alpaca) by an average of 42.50% on cross-lingual benchmarks across six non-English languages, and achieve an 8.2% improvement on Chinese humanities tasks.
Abstract
Due to the unbalanced training data distribution, the language ability of large language models (LLMs) is often biased towards English. In this paper, we propose to empower pre-trained LLMs on non-English languages by building semantic alignment across languages. We perform instruction-tuning on LLaMA with both translation task data and cross-lingual general task data to obtain cross-lingual models (x-LLaMA). Experiment results on cross-lingual benchmarks XQUAD and MLQA show that x-LLaMA models outperform the English instruction-tuned counterpart (Alpaca) by 42.50% on average on six non-English languages. Further experiments on the Chinese benchmark C-Eval show that x-LLaMA achieves significant improvement on Chinese humanities tasks, outperforming Alpaca by 8.2%. We also discover that incorporating non-English text on the target side of translation data is particularly effective for boosting non-English ability. Besides, we find that semantic alignment within LLM can be further strengthened as translation task data scales up and we present the formulation of the underlying scaling law. Evaluation results on the translation dataset Flores-101 show that our method outperforms previous LLaMA-based models in all evaluated directions. Code and data will be available at: https://github.com/OwenNJU/x-LLM.
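One plausible way to build the translation-side training data the abstract describes is to format parallel sentences as instruction-tuning examples; the field names below are illustrative, not the released x-LLM format.

    def make_translation_examples(pairs, src_lang="English", tgt_lang="Chinese"):
        examples = []
        for src, tgt in pairs:
            examples.append({
                "instruction": f"Translate the following {src_lang} text "
                               f"into {tgt_lang}.",
                "input": src,
                # non-English text on the target side is what the paper
                # finds especially effective for boosting non-English ability
                "output": tgt,
            })
        return examples

    print(make_translation_examples([("Hello, world.", "你好，世界。")]))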
Integrating large language models and active inference to understand eye movements in reading and dyslexia
paper_authors: Francesco Donnarumma, Mirco Frosolone, Giovanni Pezzulo
for: simulating reading and eye movements using a computational model
methods: hierarchical active inference, combining strengths of large language models and active inference
results: proficiency in reading known and unknown words and sentences, exploration of maladaptive inference effects in dyslexia, potential implications for understanding and addressing dyslexia
Abstract
We present a novel computational model employing hierarchical active inference to simulate reading and eye movements. The model characterizes linguistic processing as inference over a hierarchical generative model, facilitating predictions and inferences at various levels of granularity, from syllables to sentences. Our approach combines the strengths of large language models for realistic textual predictions and active inference for guiding eye movements to informative textual information, enabling the testing of predictions. The model exhibits proficiency in reading both known and unknown words and sentences, adhering to the distinction between lexical and nonlexical routes in dual-route theories of reading. Notably, our model permits the exploration of maladaptive inference effects on eye movements during reading, such as in dyslexia. To simulate this condition, we attenuate the contribution of priors during the reading process, leading to incorrect inferences and a more fragmented reading style, characterized by a greater number of shorter saccades. This alignment with empirical findings regarding eye movements in dyslexic individuals highlights the model's potential to aid in understanding the cognitive processes underlying reading and eye movements, as well as how reading deficits associated with dyslexia may emerge from maladaptive predictive processing. In summary, our model represents a significant advancement in comprehending the intricate cognitive processes involved in reading and eye movements, with potential implications for understanding and addressing dyslexia through the simulation of maladaptive inference. It may offer valuable insights into this condition and contribute to the development of more effective interventions for treatment.
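The attenuated-priors manipulation can be pictured with a toy Bayesian word-recognition step, where down-weighting the lexical prior yields noisier inferences; the paper's hierarchical active-inference model is far richer than this sketch.

    import numpy as np

    def posterior(likelihood, prior, prior_weight=1.0):
        log_post = np.log(likelihood) + prior_weight * np.log(prior)
        p = np.exp(log_post - log_post.max())
        return p / p.sum()

    lik = np.array([0.5, 0.3, 0.2])  # evidence for three candidate words
    pri = np.array([0.7, 0.2, 0.1])  # lexical prior

    print(posterior(lik, pri, prior_weight=1.0))  # typical reading
    print(posterior(lik, pri, prior_weight=0.2))  # attenuated priors (dyslexia)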
Unsupervised Out-of-Distribution Dialect Detection with Mahalanobis Distance
results: Compared with other state-of-the-art OOD detection methods, our method significantly improves detection accuracy.
Abstract
Dialect classification is used in a variety of applications, such as machine translation and speech recognition, to improve the overall performance of the system. In a real-world scenario, a deployed dialect classification model can encounter anomalous inputs that differ from the training data distribution, also called out-of-distribution (OOD) samples. Those OOD samples can lead to unexpected outputs, as dialects of those samples are unseen during model training. Out-of-distribution detection is a new research area that has received little attention in the context of dialect classification. To this end, we propose a simple yet effective unsupervised Mahalanobis distance feature-based method to detect out-of-distribution samples. We utilize the latent embeddings from all intermediate layers of a wav2vec 2.0 transformer-based dialect classifier model for multi-task learning. Our proposed approach outperforms other state-of-the-art OOD detection methods significantly.
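A minimal version of the Mahalanobis score over one embedding layer, assuming class-conditional Gaussians with a shared covariance; the paper aggregates such signals across all intermediate wav2vec 2.0 layers.

    import numpy as np

    def fit_gaussians(feats, labels):
        classes = np.unique(labels)
        means = {c: feats[labels == c].mean(axis=0) for c in classes}
        centered = np.vstack([feats[labels == c] - means[c] for c in classes])
        cov = np.cov(centered, rowvar=False) + 1e-6 * np.eye(feats.shape[1])
        return means, np.linalg.inv(cov)

    def ood_score(x, means, prec):
        # minimum squared Mahalanobis distance to any dialect-class mean;
        # larger values indicate more out-of-distribution inputs
        return min(float((x - m) @ prec @ (x - m)) for m in means.values())

    rng = np.random.default_rng(0)
    feats = rng.normal(size=(200, 8))
    labels = rng.integers(0, 4, size=200)
    means, prec = fit_gaussians(feats, labels)
    print(ood_score(rng.normal(5.0, 1.0, 8), means, prec))  # far-away sample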
Information-Theoretic Characterization of Vowel Harmony: A Cross-Linguistic Study on Word Lists
results: By using lemma forms without difficult inflection, the researchers cover more languages; models trained on word lists successfully capture vowel harmony patterns in several languages.
Abstract
We present a cross-linguistic study that aims to quantify vowel harmony using data-driven computational modeling. Concretely, we define an information-theoretic measure of harmonicity based on the predictability of vowels in a natural language lexicon, which we estimate using phoneme-level language models (PLMs). Prior quantitative studies have relied heavily on inflected word-forms in the analysis of vowel harmony. We instead train our models using cross-linguistically comparable lemma forms with little or no inflection, which enables us to cover more under-studied languages. Training data for our PLMs consists of word lists with a maximum of 1000 entries per language. Despite the fact that the data we employ are substantially smaller than previously used corpora, our experiments demonstrate the neural PLMs capture vowel harmony patterns in a set of languages that exhibit this phenomenon. Our work also demonstrates that word lists are a valuable resource for typological research, and offers new possibilities for future studies on low-resource, under-studied languages.
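As a toy illustration of the underlying idea, the predictability of the next vowel given the previous one can be estimated even with simple bigram counts; the paper instead trains neural phoneme-level language models, so the estimator below is purely illustrative.

    import math
    from collections import Counter

    def entropy(counter):
        n = sum(counter.values())
        return -sum(c / n * math.log2(c / n) for c in counter.values())

    def harmony_signal(words, vowels=set("aeiouyäöü")):
        uni, pairs = Counter(), Counter()
        for w in words:
            vs = [c for c in w.lower() if c in vowels]
            uni.update(vs)
            pairs.update(zip(vs, vs[1:]))
        firsts = Counter(a for a, _ in pairs.elements())
        h_cond = entropy(pairs) - entropy(firsts)  # H(next vowel | previous)
        return entropy(uni) - h_cond  # information gain; higher suggests harmony

    print(harmony_signal(["talo", "kylä", "pöytä", "katu", "mökki"]))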
Emotion-Conditioned Text Generation through Automatic Prompt Optimization
results: On the emotion-conditioned text generation task, the optimized prompts achieve a higher macro-average F1 (0.75) than the manually designed seed prompts, which reach a macro-average F1 of only 0.22.
Abstract
Conditional natural language generation methods often require either expensive fine-tuning or training a large language model from scratch. Both are unlikely to lead to good results without a substantial amount of data and computational resources. Prompt learning without changing the parameters of a large language model presents a promising alternative. It is a cost-effective approach, while still achieving competitive results. While this procedure is now established for zero- and few-shot text classification and structured prediction, it has received limited attention in conditional text generation. We present the first automatic prompt optimization approach for emotion-conditioned text generation with instruction-fine-tuned models. Our method uses an iterative optimization procedure that changes the prompt by adding, removing, or replacing tokens. As objective function, we only require a text classifier that measures the realization of the conditional variable in the generated text. We evaluate the method on emotion-conditioned text generation with a focus on event reports and compare it to manually designed prompts that also act as the seed for the optimization procedure. The optimized prompts achieve 0.75 macro-average F1 to fulfill the emotion condition in contrast to manually designed seed prompts with only 0.22 macro-average F1.
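The iterative search can be pictured as hill-climbing over token edits scored by the emotion classifier; `generate` and `emotion_score` below are hypothetical stand-ins for the instruction-fine-tuned generator and the classifier objective.

    import random

    def mutate(tokens, vocab):
        t = tokens[:]
        op = random.choice(["add", "remove", "replace"])
        i = random.randrange(len(t) + (op == "add"))
        if op == "add":
            t.insert(i, random.choice(vocab))
        elif op == "remove" and len(t) > 1:
            t.pop(i)
        else:
            t[i] = random.choice(vocab)
        return t

    def optimize_prompt(seed, vocab, generate, emotion_score, emotion, steps=100):
        best = seed.split()
        best_score = emotion_score(generate(" ".join(best)), emotion)
        for _ in range(steps):
            cand = mutate(best, vocab)
            s = emotion_score(generate(" ".join(cand)), emotion)
            if s > best_score:
                best, best_score = cand, s  # keep the better-scoring prompt
        return " ".join(best), best_score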
TSSR: A Truncated and Signed Square Root Activation Function for Neural Networks
results: Experiments show that the proposed TSSR function performs better than other activation functions on challenging problems in fields such as computer vision, natural language processing, and speech recognition.
Abstract
Activation functions are essential components of neural networks. In this paper, we introduce a new activation function called the Truncated and Signed Square Root (TSSR) function. This function is distinctive because it is odd, nonlinear, monotone and differentiable. Its gradient is continuous and always positive. Thanks to these properties, it has the potential to improve the numerical stability of neural networks. Several experiments confirm that the proposed TSSR has better performance than other state-of-the-art activation functions. The proposed function has significant implications for the development of neural network models and can be applied to a wide range of applications in fields such as computer vision, natural language processing, and speech recognition.
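The exact TSSR formula is not restated in the abstract, so the snippet below implements a plausible signed-square-root-style activation that satisfies the stated properties (odd, nonlinear, monotone, differentiable, with a continuous and strictly positive gradient); treat it as an assumption, not the paper's definition.

    import numpy as np

    def signed_sqrt(x):
        # odd and monotone: f(-x) = -f(x), strictly increasing
        return np.sign(x) * (np.sqrt(np.abs(x) + 1.0) - 1.0)

    def signed_sqrt_grad(x):
        return 0.5 / np.sqrt(np.abs(x) + 1.0)  # continuous, always positive

    xs = np.linspace(-4.0, 4.0, 9)
    print(signed_sqrt(xs))
    print(signed_sqrt_grad(xs))  # bounded gradient aids numerical stability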
Evaluating the Generation Capabilities of Large Chinese Language Models
paper_authors: Hui Zeng, Jingyuan Xue, Meng Hao, Chen Sun, Bin Ning, Na Zhang
for: This paper evaluates the generation capabilities of large Chinese language models across different academic disciplines.
methods: The paper uses multiple metrics to assess the quality of model generations, including accuracy, relevance, and other quality measures.
results: The paper finds that the generation ability of large Chinese language models varies across the six disciplines, with the strongest performance in science and engineering and the weakest on the judicial examination. The paper also proposes Gscore, a reproducible composite index for measuring generation quality.
Abstract
This paper presents CG-Eval, the first comprehensive evaluation of the generation capabilities of large Chinese language models across a wide range of academic disciplines. The models' performance was assessed based on their ability to generate accurate and relevant responses to different types of questions in six disciplines, namely, Science and Engineering, Humanities and Social Sciences, Mathematical Calculations, Medical Practitioner Qualification Examination, Judicial Examination, and Certified Public Accountant Examination. This paper also presents Gscore, a composite index derived from the weighted sum of multiple metrics to measure the quality of model's generation against a reference. The test data and test results can be found at http://cgeval.besteasy.com/.
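Since Gscore is described as a weighted sum of multiple metrics, a minimal sketch is a one-liner; the official metric definitions and weights live at http://cgeval.besteasy.com/ and may differ from these placeholders.

    def gscore(metrics, weights):
        # metrics and weights: dicts keyed by metric name; weights sum to 1
        assert abs(sum(weights.values()) - 1.0) < 1e-9
        return sum(weights[k] * metrics[k] for k in weights)

    print(gscore({"accuracy": 0.8, "relevance": 0.9, "fluency": 0.7},
                 {"accuracy": 0.5, "relevance": 0.3, "fluency": 0.2}))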
CLEVA: Chinese Language Models EVAluation Platform
paper_authors: Yanyang Li, Jianqiao Zhao, Duo Zheng, Zi-Yuan Hu, Zhi Chen, Xiaohui Su, Yongfeng Huang, Shijia Huang, Dahua Lin, Michael R. Lyu, Liwei Wang
for: Evaluating the capabilities of Chinese large language models (LLMs) has become an increasingly significant issue, and the paper aims to address this issue by presenting a comprehensive platform for evaluating Chinese LLMs.
methods: The platform, called CLEVA, employs a standardized workflow to assess LLMs’ performance across various dimensions, regularly updating a competitive leaderboard. It also curates a significant proportion of new data and develops a sampling strategy to alleviate contamination.
results: Large-scale experiments featuring 23 influential Chinese LLMs have validated CLEVA’s efficacy.
Abstract
With the continuous emergence of Chinese Large Language Models (LLMs), how to evaluate a model's capabilities has become an increasingly significant issue. The absence of a comprehensive Chinese benchmark that thoroughly assesses a model's performance, the unstandardized and incomparable prompting procedure, and the prevalent risk of contamination pose major challenges in the current evaluation of Chinese LLMs. We present CLEVA, a user-friendly platform crafted to holistically evaluate Chinese LLMs. Our platform employs a standardized workflow to assess LLMs' performance across various dimensions, regularly updating a competitive leaderboard. To alleviate contamination, CLEVA curates a significant proportion of new data and develops a sampling strategy that guarantees a unique subset for each leaderboard round. Empowered by an easy-to-use interface that requires just a few mouse clicks and a model API, users can conduct a thorough evaluation with minimal coding. Large-scale experiments featuring 23 influential Chinese LLMs have validated CLEVA's efficacy.
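One way to guarantee a unique evaluation subset per leaderboard round, in the spirit of the sampling strategy described above, is deterministic per-round seeding; the scheme below is an illustrative assumption, not CLEVA's actual code.

    import hashlib
    import random

    def round_subset(pool, round_id, k):
        seed = int(hashlib.sha256(f"round-{round_id}".encode()).hexdigest(), 16)
        rng = random.Random(seed)
        return rng.sample(pool, k)  # reproducible within a round, varies across rounds

    pool = [f"item-{i}" for i in range(1000)]
    print(round_subset(pool, round_id=3, k=5))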
A Bipartite Graph is All We Need for Enhancing Emotional Reasoning with Commonsense Knowledge
results: The method significantly outperforms existing knowledge infusion methods and generalizes directly to knowledge sources of different types and granularities.
Abstract
The context-aware emotional reasoning ability of AI systems, especially in conversations, is of vital importance in applications such as online opinion mining from social media and empathetic dialogue systems. Due to the implicit nature of conveying emotions in many scenarios, commonsense knowledge is widely utilized to enrich utterance semantics and enhance conversation modeling. However, most previous knowledge infusion methods perform empirical knowledge filtering and design highly customized architectures for knowledge interaction with the utterances, which can discard useful knowledge aspects and limit their generalizability to different knowledge sources. Based on these observations, we propose a Bipartite Heterogeneous Graph (BHG) method for enhancing emotional reasoning with commonsense knowledge. In BHG, the extracted context-aware utterance representations and knowledge representations are modeled as heterogeneous nodes. Two more knowledge aggregation node types are proposed to perform automatic knowledge filtering and interaction. BHG-based knowledge infusion can be directly generalized to multi-type and multi-grained knowledge sources. In addition, we propose a Multi-dimensional Heterogeneous Graph Transformer (MHGT) to perform graph reasoning, which can retain unchanged feature spaces and unequal dimensions for heterogeneous node types during inference to prevent unnecessary loss of information. Experiments show that BHG-based methods significantly outperform state-of-the-art knowledge infusion methods and show generalized knowledge infusion ability with higher efficiency. Further analysis proves that previous empirical knowledge filtering methods do not guarantee to provide the most useful knowledge information. Our code is available at: https://github.com/SteveKGYang/BHG.
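Structurally, the graph pairs each utterance node with an aggregation node that gathers its candidate knowledge nodes; the sketch below only assembles the edges (naming is illustrative), while the released code at https://github.com/SteveKGYang/BHG adds the learned filtering and MHGT reasoning.

    def build_bhg(utterances, knowledge_per_utt):
        edges = []
        for i, _utt in enumerate(utterances):
            agg = f"agg_{i}"  # aggregation node filters knowledge for utterance i
            edges.append((f"utt_{i}", agg))
            for j, _k in enumerate(knowledge_per_utt[i]):
                edges.append((f"know_{i}_{j}", agg))  # knowledge -> aggregation
        return edges

    print(build_bhg(["I lost my keys"], [["losing an item causes frustration"]]))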
ADMUS: A Progressive Question Answering Framework Adaptable to Multiple Knowledge Sources
results: Substantial experiments on a variety of datasets demonstrate the effectiveness and flexibility of the ADMUS system. An online demonstration is available at https://answer.gstore.cn/pc/index.html.
Abstract
With the introduction of deep learning models, semantic parsing-based knowledge base question answering (KBQA) systems have achieved high performance in handling complex questions. However, most existing approaches primarily focus on enhancing the model's effectiveness on individual benchmark datasets, disregarding the high costs of adapting the system to disparate datasets in real-world scenarios (e.g., multi-tenant platform). Therefore, we present ADMUS, a progressive knowledge base question answering framework designed to accommodate a wide variety of datasets, including multiple languages, diverse backbone knowledge bases, and disparate question answering datasets. To accomplish this, we decouple the architecture of conventional KBQA systems and propose this dataset-independent framework. Our framework supports the seamless integration of new datasets with minimal effort, only requiring the creation of a dataset-related micro-service at a negligible cost. To enhance the usability of ADMUS, we design a progressive framework consisting of three stages, ranging from executing exact queries to generating approximate queries and retrieving open-domain knowledge from large language models. An online demonstration of ADMUS is available at: https://answer.gstore.cn/pc/index.html
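The three-stage progression amounts to a fallback chain; the sketch below captures only the control flow, with all stage functions as hypothetical placeholders for ADMUS's micro-services.

    def answer(question, exact_query, approximate_query, llm_fallback):
        # try an exact structured query first, then an approximate query,
        # then fall back to open-domain knowledge from a large language model
        for stage in (exact_query, approximate_query, llm_fallback):
            result = stage(question)
            if result is not None:
                return result
        return "No answer found."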
Automatically measuring speech fluency in people with aphasia: first achievements using read-speech data
for: This study aims to assess the relevance of a signal processing algorithm for the automatic measurement of speech fluency in people with aphasia (PWA).
methods: The study uses a forward-backward divergence segmentation and a clustering algorithm to compute four automatic predictors of speech fluency, and combines these predictors into multivariate regression models to predict the average SLP ratings of speech fluency.
results: The study finds that the algorithms used can constitute a cost-effective and reliable tool for the assessment of the speech fluency of patients with aphasia in read-aloud tasks, with accurate predictions and high correlation coefficients between the automatic predictions and SLP ratings.
Abstract
Background: Speech and language pathologists (SLPs) often rely on judgements of speech fluency for diagnosing or monitoring patients with aphasia. However, such subjective methods have been criticised for their lack of reliability and their clinical cost in terms of time. Aims: This study aims at assessing the relevance of a signal processing algorithm, initially developed in the field of language acquisition, for the automatic measurement of speech fluency in people with aphasia (PWA). Methods & Procedures: Twenty-nine PWA and five control participants were recruited via non-profit organizations and SLP networks. All participants were recorded while reading out loud a set of sentences taken from the French version of the Boston Diagnostic Aphasia Examination. Three trained SLPs assessed the fluency of each sentence on a five-point qualitative scale. A forward-backward divergence segmentation and a clustering algorithm were used to compute, for each sentence, four automatic predictors of speech fluency: pseudo-syllable rate, speech ratio, rate of silent breaks, and standard deviation of pseudo-syllable length. The four predictors were finally combined into multivariate regression models (a multiple linear regression - MLR, and two non-linear models) to predict the average SLP ratings of speech fluency, using a leave-one-speaker-out validation scheme. Outcomes & Results: All models achieved accurate predictions of speech fluency ratings, with average root-mean-square errors as low as 0.5. The MLR yielded a correlation coefficient of 0.87 with reference ratings at the sentence level, and of 0.93 when aggregating the data for each participant. The inclusion of an additional predictor sensitive to repetitions improved further the predictions with a correlation coefficient of 0.91 at the sentence level, and of 0.96 at the participant level. Conclusions: The algorithms used in this study can constitute a cost-effective and reliable tool for the assessment of the speech fluency of patients with aphasia in read-aloud tasks. Perspectives for the assessment of spontaneous speech are discussed.
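Two of the four predictors can be computed directly from detected segment boundaries; the sketch below assumes the (start, end) times of pseudo-syllables are already available from the forward-backward divergence segmentation.

    def fluency_predictors(speech_segments, total_duration):
        # speech_segments: list of (start, end) pseudo-syllable times in seconds
        speech_time = sum(e - s for s, e in speech_segments)
        return {
            "pseudo_syllable_rate": len(speech_segments) / total_duration,
            "speech_ratio": speech_time / total_duration,
        }

    print(fluency_predictors([(0.0, 0.2), (0.3, 0.5), (1.1, 1.3)], 2.0))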
Building Interpretable and Reliable Open Information Retriever for New Domains Overnight
results: The paper proposes an information retrieval pipeline that uses an entity/event linking model and a query decomposition model to focus more accurately on different information units of the query. Compared with single dense vectors and end-to-end supervision, the pipeline improves passage coverage and denotation accuracy while being more interpretable and reliable.
Abstract
Information retrieval (IR) or knowledge retrieval, is a critical component for many down-stream tasks such as open-domain question answering (QA). It is also very challenging, as it requires succinctness, completeness, and correctness. In recent works, dense retrieval models have achieved state-of-the-art (SOTA) performance on in-domain IR and QA benchmarks by representing queries and knowledge passages with dense vectors and learning the lexical and semantic similarity. However, using single dense vectors and end-to-end supervision are not always optimal because queries may require attention to multiple aspects and even implicit knowledge. In this work, we propose an information retrieval pipeline that uses entity/event linking model and query decomposition model to focus more accurately on different information units of the query. We show that, while being more interpretable and reliable, our proposed pipeline significantly improves passage coverages and denotation accuracies across five IR and QA benchmarks. It will be the go-to system to use for applications that need to perform IR on a new domain without much dedicated effort, because of its superior interpretability and cross-domain performance.
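In outline, the pipeline decomposes the query, links entities and events, retrieves per information unit, and merges the results; the four component functions below are hypothetical stand-ins for the trained models.

    def retrieve(query, decompose, link, dense_retrieve, merge):
        units = decompose(query)  # sub-questions / information units
        passages = []
        for u in units:
            anchors = link(u)  # linked entities and events for this unit
            passages.extend(dense_retrieve(u, anchors))
        return merge(passages)  # deduplicate and rank the final passage set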
Slot Induction via Pre-trained Language Model Probing and Multi-level Contrastive Learning
results: The results show that PLM probing and the contrastive learning mechanism are effective for the slot induction task, achieving performance comparable to or better than token-level supervised models. When generalized to emerging intents, the SI objectives also improve performance on slot filling tasks.
Abstract
Recent advanced methods in Natural Language Understanding for Task-oriented Dialogue (TOD) Systems (e.g., intent detection and slot filling) require a large amount of annotated data to achieve competitive performance. In reality, token-level annotations (slot labels) are time-consuming and difficult to acquire. In this work, we study the Slot Induction (SI) task whose objective is to induce slot boundaries without explicit knowledge of token-level slot annotations. We propose leveraging Unsupervised Pre-trained Language Model (PLM) Probing and Contrastive Learning mechanism to exploit (1) unsupervised semantic knowledge extracted from PLM, and (2) additional sentence-level intent label signals available from TOD. Our approach is shown to be effective in SI task and capable of bridging the gaps with token-level supervised models on two NLU benchmark datasets. When generalized to emerging intents, our SI objectives also provide enhanced slot label representations, leading to improved performance on the Slot Filling tasks.
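As a rough intuition for inducing boundaries without token-level labels, one could propose a slot boundary wherever adjacent contextual embeddings are dissimilar; this cosine heuristic is purely illustrative, whereas the paper probes PLM-internal signals and refines spans with multi-level contrastive learning.

    import numpy as np

    def induce_boundaries(token_embs, threshold=0.6):
        bounds = []
        for i in range(len(token_embs) - 1):
            a, b = token_embs[i], token_embs[i + 1]
            cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
            if cos < threshold:
                bounds.append(i + 1)  # boundary between tokens i and i+1
        return bounds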
Answering Unseen Questions With Smaller Language Models Using Rationale Generation and Dense Retrieval
paper_authors: Tim Hartill, Diana Benavides-Prado, Michael Witbrock, Patricia J. Riddle
For: The paper aims to improve the performance of smaller language models on challenging short-answer question-answering tasks by combining rationales generated by a larger language model with longer contexts created from a multi-hop dense retrieval system.
Methods: The paper proposes two methods for combining rationales and contexts: Rationale Ranking (RR) and Reasoning with Retrieval-Augmented Training Data (RATD). RR involves training a model to score both generated rationales and retrieved contexts with respect to relevance and truthfulness, and then combining the scores to derive combined contexts. RATD involves training a smaller reasoning model using retrieval-augmented training datasets to utilize relevant information from longer text sequences.
Results: The paper finds that both methods are effective, but the RATD method is more straightforward to apply and produces the strongest results in unseen settings. The proposed models also generally outperform direct prompts against much larger models in both few-shot chain-of-thought and few-shot answer-only settings.
Abstract
When provided with sufficient explanatory context, smaller Language Models have been shown to exhibit strong reasoning ability on challenging short-answer question-answering tasks where the questions are unseen in training. We evaluate two methods for further improvement in this setting. Both methods focus on combining rationales generated by a larger Language Model with longer contexts created from a multi-hop dense retrieval system. The first method ($\textit{RR}$) involves training a Rationale Ranking model to score both generated rationales and retrieved contexts with respect to relevance and truthfulness. We then use the scores to derive combined contexts from both knowledge sources using a number of combinatory strategies. For the second method ($\textit{RATD}$) we train a smaller Reasoning model using retrieval-augmented training datasets such that it becomes proficient at utilising relevant information from longer text sequences that may be only partially evidential and frequently contain many irrelevant sentences. Generally we find that both methods are effective but that the $\textit{RATD}$ method is more straightforward to apply and produces the strongest results in the unseen setting on which we focus. Our single best Reasoning model using only 440 million parameters materially improves upon strong comparable prior baselines for unseen evaluation datasets (StrategyQA 58.9 $\rightarrow$ 61.7 acc., CommonsenseQA 63.6 $\rightarrow$ 72.7 acc., ARC-DA 31.6 $\rightarrow$ 52.1 F1, IIRC 25.5 $\rightarrow$ 27.3 F1) and a version utilising our prior knowledge of each type of question in selecting a context combination strategy does even better. Our proposed models also generally outperform direct prompts against much larger models (BLOOM 175B and StableVicuna 13B) in both few-shot chain-of-thought and few-shot answer-only settings.
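The RR combination step can be pictured as score-then-concatenate; `rr_score` below is a hypothetical stand-in for the trained Rationale Ranking model, and the strategy shown (top-k by score) is just one of the combinatory strategies the abstract mentions.

    def combine_contexts(question, rationales, retrieved, rr_score, k=3):
        scored = [(rr_score(question, t), t) for t in rationales + retrieved]
        scored.sort(key=lambda p: p[0], reverse=True)
        return "\n".join(t for _, t in scored[:k])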
A Comparative Study of Open-Source Large Language Models, GPT-4 and Claude 2: Multiple-Choice Test Taking in Nephrology
paper_authors: Sean Wu, Michael Koo, Lesley Blum, Andy Black, Liyo Kao, Fabien Scalzo, Ira Kurtz
for: This study investigated the medical knowledge capability of large language models (LLMs) in the context of internal medicine subspecialty multiple-choice test-taking ability.
methods: The study compared the performance of several open-source LLMs (Koala 7B, Falcon 7B, Stable-Vicuna 13B, and Orca Mini 13B) to GPT-4 and Claude 2 on multiple-choice questions in the field of Nephrology.
results: The study found that current widely used open-sourced LLMs have poor zero-shot reasoning ability compared to GPT-4 and Claude 2, with an overall success rate of 17.1% - 25.5% in answering nephSAP multiple-choice questions correctly.
Abstract
In recent years, there have been significant breakthroughs in the field of natural language processing, particularly with the development of large language models (LLMs). These LLMs have showcased remarkable capabilities on various benchmarks. In the healthcare field, the exact role LLMs and other future AI models will play remains unclear. There is a potential for these models in the future to be used as part of adaptive physician training, medical co-pilot applications, and digital patient interaction scenarios. The ability of AI models to participate in medical training and patient care will depend in part on their mastery of the knowledge content of specific medical fields. This study investigated the medical knowledge capability of LLMs, specifically in the context of internal medicine subspecialty multiple-choice test-taking ability. We compared the performance of several open-source LLMs (Koala 7B, Falcon 7B, Stable-Vicuna 13B, and Orca Mini 13B), to GPT-4 and Claude 2 on multiple-choice questions in the field of Nephrology. Nephrology was chosen as an example of a particularly conceptually complex subspecialty field within internal medicine. The study was conducted to evaluate the ability of LLM models to provide correct answers to nephSAP (Nephrology Self-Assessment Program) multiple-choice questions. The overall success of open-sourced LLMs in answering the 858 nephSAP multiple-choice questions correctly was 17.1% - 25.5%. In contrast, Claude 2 answered 54.4% of the questions correctly, whereas GPT-4 achieved a score of 73.3%. We show that current widely used open-sourced LLMs do poorly in their ability for zero-shot reasoning when compared to GPT-4 and Claude 2. The findings of this study potentially have significant implications for the future of subspecialty medical training and patient care.
Generating News-Centric Crossword Puzzles As A Constraint Satisfaction and Optimization Problem
results: The study found that news-centric crossword puzzles can be generated even with only a few news-derived words, and reports generation probabilities and required time under several conditions.
Abstract
Crossword puzzles have traditionally served not only as entertainment but also as an educational tool that can be used to acquire vocabulary and language proficiency. One strategy to enhance the educational purpose is personalization, such as including more words on a particular topic. This paper focuses on the case of encouraging people's interest in news and proposes a framework for automatically generating news-centric crossword puzzles. We designed possible scenarios and built a prototype as a constraint satisfaction and optimization problem, that is, containing as many news-derived words as possible. Our experiments reported the generation probabilities and time required under several conditions. The results showed that news-centric crossword puzzles can be generated even with few news-derived words. We summarize the current issues and future research directions through a qualitative evaluation of the prototype. This is the first proposal showing that a constraint satisfaction and optimization formulation can be beneficial as an educational application.
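At its core, the formulation is a backtracking fill with an optimization bias toward news-derived words; the slots, crossings, and word lists below are toy placeholders for the real constraint model.

    def fill(slots, crossings, words, news_words, assignment=None):
        assignment = assignment or {}
        if len(assignment) == len(slots):
            return assignment
        slot = next(s for s in slots if s not in assignment)
        # try news-derived words first to maximize their count in the grid
        for w in sorted(words[slot], key=lambda c: c not in news_words):
            ok = all(w[i] == assignment[o][j]
                     for (s, i, o, j) in crossings
                     if s == slot and o in assignment)
            if ok:
                result = fill(slots, crossings, words, news_words,
                              {**assignment, slot: w})
                if result:
                    return result
        return None

    slots = ["1A", "1D"]
    crossings = [("1A", 0, "1D", 0), ("1D", 0, "1A", 0)]
    words = {"1A": ["date", "news"], "1D": ["nile", "tide"]}
    print(fill(slots, crossings, words, news_words={"news"}))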
TBIN: Modeling Long Textual Behavior Data for CTR Prediction
results: Experimental results show that TBIN predicts CTR effectively, and online experiments on a real-world food recommendation platform achieved high prediction accuracy.
Abstract
Click-through rate (CTR) prediction plays a pivotal role in the success of recommendations. Inspired by the recent thriving of language models (LMs), a surge of works improve prediction by organizing user behavior data in a \textbf{textual} format and using LMs to understand user interest at a semantic level. While promising, these works have to truncate the textual data to reduce the quadratic computational overhead of self-attention in LMs. However, it has been studied that long user behavior data can significantly benefit CTR prediction. In addition, these works typically condense user diverse interests into a single feature vector, which hinders the expressive capability of the model. In this paper, we propose a \textbf{T}extual \textbf{B}ehavior-based \textbf{I}nterest Chunking \textbf{N}etwork (TBIN), which tackles the above limitations by combining an efficient locality-sensitive hashing algorithm and a shifted chunk-based self-attention. The resulting user diverse interests are dynamically activated, producing user interest representation towards the target item. Finally, the results of both offline and online experiments on real-world food recommendation platform demonstrate the effectiveness of TBIN.
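For the hashing side, a standard random-hyperplane LSH groups similar behavior embeddings into buckets so attention can run within chunks; TBIN's actual hashing and shifted chunk-based self-attention are more elaborate than this toy.

    import numpy as np

    def lsh_buckets(embs, n_planes=8, seed=0):
        rng = np.random.default_rng(seed)
        planes = rng.normal(size=(embs.shape[1], n_planes))
        bits = (embs @ planes) > 0
        return [int("".join(str(int(b)) for b in row), 2) for row in bits]

    embs = np.random.default_rng(1).normal(size=(6, 16))
    print(lsh_buckets(embs))  # similar behaviors tend to share bucket ids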
Sudowoodo: a Chinese Lyric Imitation System with Source Lyrics
results: Experimental results show that the proposed framework and model perform better Chinese lyric imitation. The paper also provides a demo video that walks through the system's usage and applications.
Abstract
Lyrics generation is a well-known application in natural language generation research, with several previous studies focusing on generating accurate lyrics under precise control such as keywords, rhymes, etc. However, lyrics imitation, which involves writing new lyrics by imitating the style and content of source lyrics, remains a challenging task due to the lack of a parallel corpus. In this paper, we introduce Sudowoodo, a Chinese lyrics imitation system that can generate new lyrics based on the text of source lyrics. To address the lack of a parallel training corpus for lyrics imitation, we propose a novel framework that constructs a parallel corpus using a keyword-based lyrics model applied to source lyrics. The resulting (new lyrics, source lyrics) pairs are then used to train the lyrics imitation model. During inference, a post-processing module filters and ranks the generated lyrics, selecting the highest-quality ones. As a bonus, we incorporate audio information and align the lyrics with the audio to form complete songs. Human evaluation results show that our framework performs better lyric imitation. The Sudowoodo system and a demo video are available at https://Sudowoodo.apps-hp.danlu.netease.com/ and https://youtu.be/u5BBT_j1L5M.
Cross-Lingual Constituency Parsing for Middle High German: A Delexicalized Approach
paper_authors: Ercong Nie, Helmut Schmid, Hinrich Schütze
for: This study aims to build an automatic syntactic analysis system for Middle High German (MHG); due to the lack of annotated parse data, training such a system is a difficult task.
results: The delexicalized constituency parser performs strongly on the MHG test set, achieving an F1 score of 67.3%, 28.6 percentage points above the best zero-shot cross-lingual baseline. This encouraging result demonstrates the practical feasibility of automatic syntactic analysis for other ancient languages facing similar challenges.
Abstract
Constituency parsing plays a fundamental role in advancing natural language processing (NLP) tasks. However, training an automatic syntactic analysis system for ancient languages solely relying on annotated parse data is a formidable task due to the inherent challenges in building treebanks for such languages. It demands extensive linguistic expertise, leading to a scarcity of available resources. To overcome this hurdle, cross-lingual transfer techniques, which require minimal or even no annotated data for low-resource target languages, offer a promising solution. In this study, we focus on building a constituency parser for Middle High German (MHG) under realistic conditions, where no annotated MHG treebank is available for training. In our approach, we leverage the linguistic continuity and structural similarity between MHG and Modern German (MG), along with the abundance of MG treebank resources. Specifically, by employing the delexicalization method, we train a constituency parser on MG parse datasets and perform cross-lingual transfer to MHG parsing. Our delexicalized constituency parser demonstrates remarkable performance on the MHG test set, achieving an F1-score of 67.3%. It outperforms the best zero-shot cross-lingual baseline by a margin of 28.6 percentage points. These encouraging results underscore the practicality and potential of automatic syntactic analysis for other ancient languages that face similar challenges as MHG.
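The core trick of delexicalization is simple enough to show in a few lines: tokens are replaced by their part-of-speech tags, so a parser trained on Modern German treebanks sees only the tag sequence and can transfer to Middle High German text tagged with the same tag set. The sentences and tag set below are illustrative.

```python
# Minimal delexicalization sketch; tagger output is assumed.
def delexicalize(tagged_sentence):
    """Map a list of (token, pos_tag) pairs to the tag sequence."""
    return [tag for _, tag in tagged_sentence]

mg  = [("der", "DET"), ("ritter", "NOUN"), ("kämpft", "VERB")]
mhg = [("der", "DET"), ("rîter", "NOUN"), ("vihtet", "VERB")]

# Both sentences reduce to the same delexicalized input, which is why
# lexical divergence between MG and MHG stops mattering for the parser.
print(delexicalize(mg))   # ['DET', 'NOUN', 'VERB']
print(delexicalize(mhg))  # ['DET', 'NOUN', 'VERB']
```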
Single-Sentence Reader: A Novel Approach for Addressing Answer Position Bias
results: Experiments show that the proposed Single-Sentence Reader nearly matches the performance of models trained on conventional training sets, demonstrating its effectiveness.
Abstract
Machine Reading Comprehension (MRC) models tend to take advantage of spurious correlations (also known as dataset bias or annotation artifacts in the research community). Consequently, these models may perform the MRC task without fully comprehending the given context and question, which is undesirable since it may result in low robustness against distribution shift. This paper delves into the concept of answer-position bias, where a significant percentage of training questions have answers located solely in the first sentence of the context. We propose a Single-Sentence Reader as a new approach for addressing answer position bias in MRC. We implement this approach using six different models and thoroughly analyze their performance. Remarkably, our proposed Single-Sentence Readers achieve results that nearly match those of models trained on conventional training sets, proving their effectiveness. Our study also discusses several challenges our Single-Sentence Readers encounter and proposes a potential solution.
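Before applying a mitigation such as the Single-Sentence Reader, one would typically quantify the bias itself. The following diagnostic sketch (an assumed utility, not from the paper) measures the fraction of training answers that fall entirely within the first sentence of the context.

```python
# Measure answer-position bias in an extractive QA training set.
import re

def first_sentence_end(context):
    match = re.search(r"[.!?]", context)       # naive sentence splitter
    return match.end() if match else len(context)

def answer_position_bias(examples):
    """examples: dicts with 'context' and character-offset answer spans."""
    in_first = sum(
        ex["answer_end"] <= first_sentence_end(ex["context"])
        for ex in examples
    )
    return in_first / len(examples)

data = [
    {"context": "Paris is the capital of France. It lies on the Seine.",
     "answer_start": 0, "answer_end": 5},    # "Paris" -> first sentence
    {"context": "The talk was long. It ended at noon.",
     "answer_start": 31, "answer_end": 35},  # "noon" -> second sentence
]
print(answer_position_bias(data))  # 0.5
```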
Ahead of the Text: Leveraging Entity Preposition for Financial Relation Extraction
results: The method achieved 1st place on the competition's public leaderboard.
Abstract
In the context of the ACM KDF-SIGIR 2023 competition, we undertook an entity relation task on a dataset of financial entity relations called REFind. Our top-performing solution involved a multi-step approach. Initially, we inserted the provided entities at their corresponding locations within the text. Subsequently, we fine-tuned the transformer-based language model roberta-large for text classification by utilizing a labeled training set to predict the entity relations. Lastly, we implemented a post-processing phase to identify and handle improbable predictions generated by the model. As a result of our methodology, we achieved the 1st place ranking on the competition's public leaderboard.
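The pipeline described above can be sketched compactly: insert the two entities at their character offsets (here via bracketed markers, an assumed format) and classify the relation with roberta-large. The label set, offsets, and marker scheme are illustrative; in practice the markers would also be registered as special tokens and the model fine-tuned on the labeled training set.

```python
# Sketch of entity insertion + relation classification with roberta-large.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

def mark_entities(text, e1_span, e2_span):
    """Wrap the entity character spans in markers; assumes e1 precedes e2."""
    (s1, t1), (s2, t2) = e1_span, e2_span
    return (text[:s1] + "[E1]" + text[s1:t1] + "[/E1]" + text[t1:s2]
            + "[E2]" + text[s2:t2] + "[/E2]" + text[t2:])

text = "Apple acquired Beats for $3 billion."
marked = mark_entities(text, (0, 5), (15, 20))
print(marked)  # [E1]Apple[/E1] acquired [E2]Beats[/E2] for $3 billion.

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-large", num_labels=3)  # e.g. {acquirer_of, subsidiary_of, no_rel}

inputs = tokenizer(marked, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(-1))  # predicted relation id (model untrained here)
```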
Revisiting Disentanglement and Fusion on Modality and Context in Conversational Multimodal Emotion Recognition
paper_authors: Bobo Li, Hao Fei, Lizi Liao, Yu Zhao, Chong Teng, Tat-Seng Chua, Donghong Ji, Fei Li
for: This work targets further improving the performance of multimodal emotion recognition in conversation (MM-ERC) by properly modeling multimodal features and the conversational context.
results: On two public MM-ERC datasets, the system achieves new state-of-the-art performance. Further analyses show that the proposed method makes full use of multimodal and contextual features and has the potential to generalize to a broader range of conversational multimodal tasks.
Abstract
It has been a hot research topic to enable machines to understand human emotions in multimodal contexts under dialogue scenarios, a task known as multimodal emotion recognition in conversation (MM-ERC). MM-ERC has received consistent attention in recent years, and a diverse range of methods has been proposed for securing better task performance. Most existing works treat MM-ERC as a standard multimodal classification problem and perform multimodal feature disentanglement and fusion to maximize feature utility. Yet after revisiting the characteristics of MM-ERC, we argue that both the feature multimodality and the conversational contextualization should be properly modeled simultaneously during the feature disentanglement and fusion steps. In this work, we target further pushing the task performance by taking full consideration of the above insights. On the one hand, during feature disentanglement, based on the contrastive learning technique, we devise a Dual-level Disentanglement Mechanism (DDM) to decouple the features into both the modality space and the utterance space. On the other hand, during the feature fusion stage, we propose a Contribution-aware Fusion Mechanism (CFM) and a Context Refusion Mechanism (CRM) for multimodal and context integration, respectively. Together they schedule the proper integration of multimodal and context features. Specifically, CFM explicitly manages the multimodal feature contributions dynamically, while CRM flexibly coordinates the introduction of dialogue contexts. On two public MM-ERC datasets, our system achieves new state-of-the-art performance consistently. Further analyses demonstrate that all our proposed mechanisms greatly facilitate the MM-ERC task by making full use of the multimodal and context features adaptively. Note that our proposed methods have great potential to facilitate a broader range of other conversational multimodal tasks.
DialogRE^C+: An Extension of DialogRE to Investigate How Much Coreference Helps Relation Extraction in Dialogs
paper_authors: Yiyun Xiong, Mengwei Dai, Fei Li, Hao Fei, Bobo Li, Shengqiong Wu, Donghong Ji, Chong Teng
for: This paper introduces a new benchmark dataset called DialogRE^C+, which incorporates coreference resolution into the dialogue relation extraction (DRE) task.
methods: The paper manually annotates a total of 5,068 coreference chains over 36,369 argument mentions based on existing DialogRE data, and develops four coreference-enhanced graph-based DRE models.
results: The paper evaluates the effect of automatically extracted coreference chains and demonstrates the practicality of the DialogRE^C+ dataset and its potential for other domains and tasks.
Abstract
Dialogue relation extraction (DRE), which identifies the relations between argument pairs in dialogue text, suffers greatly from the frequent occurrence of personal pronouns and entity and speaker coreference. This work introduces a new benchmark dataset, DialogRE^C+, introducing coreference resolution into the DRE scenario. With the aid of high-quality coreference knowledge, the reasoning over argument relations is expected to be enhanced. In the DialogRE^C+ dataset, we manually annotate a total of 5,068 coreference chains over 36,369 argument mentions based on the existing DialogRE data, where four different coreference chain types, namely speaker chains, person chains, location chains, and organization chains, are explicitly marked. We further develop four coreference-enhanced graph-based DRE models, which learn effective coreference representations for improving the DRE task. We also train a coreference resolution model based on our annotations and evaluate the effect of automatically extracted coreference chains, demonstrating the practicality of our dataset and its potential for other domains and tasks.
A Bi-directional Multi-hop Inference Model for Joint Dialog Sentiment Classification and Act Recognition
results: Experiments on two widely-used datasets show that the model outperforms state-of-the-art baselines by at least 2.6% F1 on DAR and 1.4% F1 on DSC. Besides improving performance, the model also enhances the interpretability of the joint sentiment and act prediction task.
Abstract
The joint task of Dialog Sentiment Classification (DSC) and Act Recognition (DAR) aims to predict the sentiment label and act label for each utterance in a dialog simultaneously. However, current methods encode the dialog context in only one direction, which limits their ability to thoroughly comprehend the context. Moreover, these methods overlook the explicit correlations between sentiment and act labels, which leads to an insufficient ability to capture rich sentiment and act clues and hinders effective and accurate reasoning. To address these issues, we propose a Bi-directional Multi-hop Inference Model (BMIM) that leverages a feature selection network and a bi-directional multi-hop inference network to iteratively extract and integrate rich sentiment and act clues in a bi-directional manner. We also employ contrastive learning and dual learning to explicitly model the correlations of sentiment and act labels. Our experiments on two widely-used datasets show that BMIM outperforms state-of-the-art baselines by at least 2.6% on F1 score in DAR and 1.4% on F1 score in DSC. Additionally, Our proposed model not only improves the performance but also enhances the interpretability of the joint sentiment and act prediction task.
results: The study finds that translation between similar languages benefits from character-level input segmentation, while for less related languages the character-level vanilla Transformer-base often lags behind subword-level segmentation. It also confirms earlier findings that the gap can be closed by fine-tuning already trained subword-level models to the character level.
Abstract
We explore the effectiveness of character-level neural machine translation using the Transformer architecture for various levels of language similarity and sizes of the training dataset on translation between Czech and Croatian, German, Hungarian, Slovak, and Spanish. We evaluate the models using automatic MT metrics and show that translation between similar languages benefits from character-level input segmentation, while for less related languages, the character-level vanilla Transformer-base often lags behind subword-level segmentation. We confirm previous findings that the gap can be closed by finetuning already trained subword-level models to the character level.
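The two input segmentations being compared are easy to contrast on a single sentence. The sketch below is illustrative (the subword pieces are assumed, as if produced by a trained BPE model): character-level input yields much longer sequences but a tiny, spelling-aware vocabulary, which is what helps between closely related languages.

```python
# Contrast subword- vs character-level segmentation for NMT input.
sentence = "dobar dan"          # Croatian: "good day"

# Subword-level (pieces assumed, as from a trained BPE/SentencePiece model):
subwords = ["▁dobar", "▁dan"]

# Character-level: every character, with an explicit space symbol.
chars = [c if c != " " else "▁" for c in sentence]

print(len(subwords), subwords)  # 2 ['▁dobar', '▁dan']
print(len(chars), chars)        # 9 ['d','o','b','a','r','▁','d','a','n']
```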
Learning Evaluation Models from Large Language Models for Sequence Generation
results: Experiments show that applying the evaluation models learned via ECT to sequence generation models yields better generated sequences, as judged by commonly used metrics and by ChatGPT.
Abstract
Large language models achieve state-of-the-art performance on sequence generation evaluation, but typically have a large number of parameters, which makes applying their evaluation capability at scale computationally expensive. To overcome this challenge, in this paper we propose ECT, an evaluation capability transfer method, to transfer the evaluation capability from LLMs to relatively lightweight language models. Based on the proposed ECT, we learn various evaluation models from ChatGPT and employ them as reward models to improve sequence generation models via reinforcement learning and reranking approaches. Experimental results on machine translation, text style transfer, and summarization tasks demonstrate the effectiveness of ECT. Notably, applying the learned evaluation models to sequence generation models results in better generated sequences, as evaluated by commonly used metrics and by ChatGPT.
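The reranking use of such a learned evaluator is straightforward to sketch: sample several candidates from the generation model, score each with the lightweight evaluation model, and keep the best. Both models below are toy stand-ins; `eval_score` represents the distilled evaluator the paper trains via ECT.

```python
# Reranking with a learned evaluation model (stand-ins for illustration).
import random

def rerank(source, generate_fn, eval_score, num_candidates=8):
    candidates = [generate_fn(source) for _ in range(num_candidates)]
    return max(candidates, key=lambda hyp: eval_score(source, hyp))

# Toy sampler returning variants, and a toy scorer preferring character
# overlap with the source (purely illustrative).
def toy_generate(src):
    return random.choice([src.upper(), src[::-1], src + "!", src])

def toy_score(src, hyp):
    return sum(a == b for a, b in zip(src, hyp)) / max(len(src), len(hyp))

print(rerank("hello world", toy_generate, toy_score))
```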
results: Experiments show that, compared with the basic mean-teacher method, the density crop-guided semi-supervised object detector improves detection accuracy by more than 2% in COCO-style AP, especially for small objects.
Abstract
One of the important bottlenecks in training modern object detectors is the need for labeled images, where bounding box annotations have to be produced for each object present in the image. This bottleneck is further exacerbated in aerial images, where annotators have to label small objects, often distributed in clusters, on high-resolution images. Recently, the mean-teacher approach, trained with pseudo-labels and weak-strong augmentation consistency, has been gaining popularity for semi-supervised object detection. However, a direct adaptation of such semi-supervised detectors to aerial images, where small clustered objects are often present, might not lead to optimal results. In this paper, we propose a density crop-guided semi-supervised detector that identifies clusters of small objects during training and also exploits them to improve performance at inference. During training, image crops of clusters identified from labeled and unlabeled images are used to augment the training set, which in turn increases the chance of detecting small objects and creating good pseudo-labels for small objects in the unlabeled images. During inference, the detector is not only able to detect the objects of interest but also regions with a high density of small objects (density crops), so that detections from the input image and detections from image crops are combined, resulting in overall more accurate object predictions, especially for small objects. Empirical studies on the popular VisDrone and DOTA benchmarks show the effectiveness of our density crop-guided semi-supervised detector, with an average improvement of more than 2% over the basic mean-teacher method in COCO-style AP. Our code is available at: https://github.com/akhilpm/DroneSSOD.
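The core idea is to turn clusters of small boxes into crop regions that can be re-detected at higher resolution. The sketch below is illustrative (see the authors' repository for the real implementation): clustering is approximated here by merging boxes whose dilated extents overlap, and only sufficiently dense groups become crops.

```python
# Illustrative density-crop extraction from small detection boxes.
def dilate(box, margin):
    x1, y1, x2, y2 = box
    return (x1 - margin, y1 - margin, x2 + margin, y2 + margin)

def overlaps(a, b):
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def density_crops(small_boxes, margin=20, min_cluster=2):
    clusters = []                           # each cluster: list of boxes
    for box in small_boxes:
        d = dilate(box, margin)
        merged = [c for c in clusters
                  if any(overlaps(d, dilate(m, margin)) for m in c)]
        for c in merged:                    # union all clusters box touches
            clusters.remove(c)
        clusters.append(sum(merged, []) + [box])
    crops = []
    for c in clusters:
        if len(c) >= min_cluster:           # only dense groups become crops
            xs1, ys1, xs2, ys2 = zip(*c)
            crops.append((min(xs1), min(ys1), max(xs2), max(ys2)))
    return crops

boxes = [(10, 10, 20, 20), (25, 12, 35, 22), (400, 400, 410, 410)]
print(density_crops(boxes))  # [(10, 10, 35, 22)] -- one crop, two boxes
```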
An Empirical Study on Using Large Language Models to Analyze Software Supply Chain Security Failures
results: The study finds that when source articles are detailed enough to produce consensus among human analysts, LLMs can effectively characterize software supply chain attacks. However, LLMs cannot yet replace human analysts; future work can further improve LLM performance in this domain and study a broader range of articles and attacks.
Abstract
As we increasingly depend on software systems, the consequences of breaches in the software supply chain become more severe. High-profile cyber attacks like those on SolarWinds and ShadowHammer have resulted in significant financial and data losses, underlining the need for stronger cybersecurity. One way to prevent future breaches is by studying past failures. However, traditional methods of analyzing these failures require manually reading and summarizing reports about them. Automated support could reduce costs and allow analysis of more failures. Natural Language Processing (NLP) techniques such as Large Language Models (LLMs) could be leveraged to assist the analysis of failures. In this study, we assessed the ability of LLMs to analyze historical software supply chain breaches. We used LLMs to replicate the manual analysis of 69 software supply chain security failures performed by members of the Cloud Native Computing Foundation (CNCF). We developed prompts for LLMs to categorize these failures along four dimensions: type of compromise, intent, nature, and impact. GPT-3.5's categorizations had an average accuracy of 68%, and Bard's an accuracy of 58%, across these dimensions. We report that LLMs effectively characterize software supply chain failures when the source articles are detailed enough for consensus among manual analysts, but cannot yet replace human analysts. Future work can improve LLM performance in this context and study a broader range of articles and failures.
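A prompting setup of the kind described can be sketched as follows. The prompt wording, parsing logic, and stubbed LLM call are assumptions for illustration, not the study's exact prompts or models.

```python
# Sketch of prompting an LLM to categorize a breach report along the
# four dimensions studied (type of compromise, intent, nature, impact).
PROMPT_TEMPLATE = """You are analyzing a software supply chain security failure.
Based on the article below, answer with one label per line:
Type of compromise: <e.g. malicious code injection, compromised credentials>
Intent: <malicious or accidental>
Nature: <e.g. build system, dependency, update channel>
Impact: <e.g. data exfiltration, financial loss, service disruption>

Article:
{article}
"""

def categorize(article, llm_call):
    """llm_call: any function str -> str wrapping an LLM API."""
    reply = llm_call(PROMPT_TEMPLATE.format(article=article))
    labels = {}
    for line in reply.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            labels[key.strip()] = value.strip()
    return labels

# Usage with a stubbed LLM so the sketch runs standalone:
stub = lambda prompt: ("Type of compromise: malicious code injection\n"
                       "Intent: malicious\nNature: build system\n"
                       "Impact: data exfiltration")
print(categorize("Attackers tampered with the vendor's build server...", stub))
```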
Do Diffusion Models Suffer Error Propagation? Theoretical Analysis and Consistency Regularization
results: Experiments show that the proposed regularization effectively addresses error propagation in diffusion models and improves their performance.
Abstract
While diffusion models have achieved promising performance in data synthesis, they may suffer from error propagation because of their cascade structure, where the distributional mismatch spreads and magnifies through the chain of denoising modules. However, a rigorous analysis is needed, since many sequential models such as Conditional Random Fields (CRFs) are free from error propagation. In this paper, we empirically and theoretically verify that diffusion models are indeed affected by error propagation, and we then propose a regularization to address this problem. Our theoretical analysis reveals that the question can be reduced to whether every denoising module of the diffusion model is fault-tolerant. We derive insightful transition equations, indicating that a module cannot recover from input errors and even propagates additional errors to the next module. Our analysis directly leads to a consistency regularization scheme for diffusion models, which explicitly reduces the distribution gap between the forward and backward processes. We further introduce a bootstrapping algorithm to reduce the computational cost of the regularizer. Our experimental results on multiple image datasets show that our regularization effectively handles error propagation and significantly improves the performance of vanilla diffusion models.
When and How Does Known Class Help Discover Unknown Ones? Provable Understanding Through Spectral Analysis
results: Experiments show that NSCL matches or outperforms several strong baselines on common benchmark datasets, which is appealing for practical use while enjoying theoretical guarantees.
Abstract
Novel Class Discovery (NCD) aims at inferring novel classes in an unlabeled set by leveraging prior knowledge from a labeled set with known classes. Despite its importance, there is a lack of theoretical foundations for NCD. This paper bridges the gap by providing an analytical framework to formalize and investigate when and how known classes can help discover novel classes. Tailored to the NCD problem, we introduce a graph-theoretic representation that can be learned by a novel NCD Spectral Contrastive Loss (NSCL). Minimizing this objective is equivalent to factorizing the graph's adjacency matrix, which allows us to derive a provable error bound and provide the sufficient and necessary condition for NCD. Empirically, NSCL can match or outperform several strong baselines on common benchmark datasets, which is appealing for practical usage while enjoying theoretical guarantees.
An Empirical Study of Bugs in Open-Source Federated Learning Framework
paper_authors: Weijie Shao, Yuyang Gao, Fu Song, Sen Chen, Lingling Fan
for: This study investigates security issues in federated learning (FL) frameworks.
methods: The study manually collects, classifies, and labels 1,112 FL framework bugs from 12 open-source FL frameworks on GitHub, and constructs taxonomies of 15 symptoms, 12 root causes, and 20 fix patterns for these bugs.
results: The study presents nine findings, based on the taxonomies of 15 symptoms, 12 root causes, and 20 fix patterns, analyzed across 23 logical components and two main application scenarios.
Abstract
Federated learning (FL), as a decentralized machine learning solution to the protection of users' private data, has become an important learning paradigm in recent years, especially since the enforcement of stricter laws and regulations in most countries. Therefore, a variety of FL frameworks are released to facilitate the development and application of federated learning. Despite the considerable amount of research on the security and privacy of FL models and systems, the security issues in FL frameworks have not been systematically studied yet. In this paper, we conduct the first empirical study on 1,112 FL framework bugs to investigate their characteristics. These bugs are manually collected, classified, and labeled from 12 open-source FL frameworks on GitHub. In detail, we construct taxonomies of 15 symptoms, 12 root causes, and 20 fix patterns of these bugs and investigate their correlations and distributions on 23 logical components and two main application scenarios. From the results of our study, we present nine findings, discuss their implications, and propound several suggestions to FL framework developers and security researchers on the FL frameworks.
Multi-Class Deep SVDD: Anomaly Detection Approach in Astronomy with Distinct Inlier Categories
paper_authors: Manuel Pérez-Carrasco, Guillermo Cabrera-Vives, Lorena Hernández-García, Francisco Forster, Paula Sánchez-Sáez, Alejandra Muñoz Arancibia, Nicolás Astorga, Franz Bauer, Amelia Bayo, Martina Cádiz-Leyton, Marcio Catelan
methods: The paper proposes a new algorithm called Multi-Class Deep Support Vector Data Description (MCDSVDD), an extension of One-Class Deep SVDD that handles multiple inlier categories with distinct data distributions. MCDSVDD uses a neural network to map the data into hyperspheres, where each hypersphere represents a specific inlier category.
results: The results show that MCDSVDD effectively detects anomalous sources in astronomical data while leveraging the presence of different inlier categories.
Abstract
With the increasing volume of astronomical data generated by modern survey telescopes, automated pipelines and machine learning techniques have become crucial for analyzing and extracting knowledge from these datasets. Anomaly detection, i.e. the task of identifying irregular or unexpected patterns in the data, is a complex challenge in astronomy. In this paper, we propose Multi-Class Deep Support Vector Data Description (MCDSVDD), an extension of the state-of-the-art anomaly detection algorithm One-Class Deep SVDD, specifically designed to handle different inlier categories with distinct data distributions. MCDSVDD uses a neural network to map the data into hyperspheres, where each hypersphere represents a specific inlier category. The distance of each sample from the centers of these hyperspheres determines the anomaly score. We evaluate the effectiveness of MCDSVDD by comparing its performance with several anomaly detection algorithms on a large dataset of astronomical light-curves obtained from the Zwicky Transient Facility. Our results demonstrate the efficacy of MCDSVDD in detecting anomalous sources while leveraging the presence of different inlier categories. The code and the data needed to reproduce our results are publicly available at https://github.com/mperezcarrasco/AnomalyALeRCE.
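The scoring rule described above is compact enough to sketch: a network maps each sample to a latent point, each known inlier class has a hypersphere center, and the anomaly score is the distance to the nearest center. The encoder architecture, fixed random centers, and threshold below are illustrative assumptions (in practice the centers would be initialized from encoded training data and the network trained to pull inliers toward their class center).

```python
# Minimal multi-center anomaly scoring in the spirit of MCDSVDD.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
centers = torch.randn(3, 4)       # one center per inlier category (assumed)

def anomaly_score(x):
    z = encoder(x)                            # (batch, latent_dim)
    d = torch.cdist(z, centers)               # distance to each center
    return d.min(dim=1).values                # nearest-center distance

x = torch.randn(5, 8)                         # e.g. light-curve features
scores = anomaly_score(x)
print(scores > 2.0)                           # flagged as anomalous (toy threshold)
```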
Transferable Models for Bioacoustics with Human Language Supervision
results: When fine-tuned, BioLingual sets a new state of the art on nine tasks in the Benchmark of Animal Sounds. The model can also retrieve animal vocalization recordings from natural-language queries and scales across species and environments.
Abstract
Passive acoustic monitoring offers a scalable, non-invasive method for tracking global biodiversity and anthropogenic impacts on species. Although deep learning has become a vital tool for processing this data, current models are inflexible, typically cover only a handful of species, and are limited by data scarcity. In this work, we propose BioLingual, a new model for bioacoustics based on contrastive language-audio pretraining. We first aggregate bioacoustic archives into a language-audio dataset, called AnimalSpeak, with over a million audio-caption pairs holding information on species, vocalization context, and animal behavior. After training on this dataset to connect language and audio representations, our model can identify over a thousand species' calls across taxa, complete bioacoustic tasks zero-shot, and retrieve animal vocalization recordings from natural text queries. When fine-tuned, BioLingual sets a new state-of-the-art on nine tasks in the Benchmark of Animal Sounds. Given its broad taxa coverage and ability to be flexibly queried in human language, we believe this model opens new paradigms in ecological monitoring and research, including free-text search on the world's acoustic monitoring archives. We open-source our models, dataset, and code.
Adversarial ModSecurity: Countering Adversarial SQL Injections with Robust Machine Learning
results: Experiments show that AdvModSec improves ModSecurity's detection accuracy and robustness: the detection rate improves by 21%, and robustness against adversarial SQLi attacks improves by 42%.
Abstract
ModSecurity is widely recognized as the standard open-source Web Application Firewall (WAF), maintained by the OWASP Foundation. It detects malicious requests by matching them against the Core Rule Set (CRS), identifying well-known attack patterns. Each rule in the CRS is manually assigned a weight based on the severity of the corresponding attack, and a request is flagged as malicious if the sum of the weights of the firing rules exceeds a given threshold. In this work, we show that this simple strategy is largely ineffective for detecting SQL injection (SQLi) attacks, as it tends to block many legitimate requests while also being vulnerable to adversarial SQLi attacks, i.e., attacks intentionally manipulated to evade detection. To overcome these issues, we design a robust machine learning model, named AdvModSec, which uses the CRS rules as input features and is trained to detect adversarial SQLi attacks. Our experiments show that AdvModSec, being trained on the traffic directed towards the protected web services, achieves a better trade-off between detection and false positive rates, improving the detection rate of the vanilla version of ModSecurity with CRS by 21%. Moreover, our approach improves adversarial robustness against adversarial SQLi attacks by 42%, thereby taking a step forward towards building more robust and trustworthy WAFs.
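The feature construction implied by "uses the CRS rules as input features" can be sketched as follows: each request becomes a binary vector over rule IDs (1 if the rule fired), and a classifier is trained on these vectors instead of summing fixed rule weights. The rule IDs, toy data, and random-forest choice are illustrative assumptions, not the paper's exact setup.

```python
# CRS rule hits as features for a learned WAF decision (illustrative).
from sklearn.ensemble import RandomForestClassifier

RULE_IDS = ["942100", "942190", "942260", "949110"]  # example rule IDs

def to_features(fired_rules):
    return [1 if rid in fired_rules else 0 for rid in RULE_IDS]

# (fired rules, label) with label 1 = malicious SQLi, 0 = legitimate
train = [
    ({"942100", "942190"}, 1),
    ({"942260"}, 1),
    ({"949110"}, 0),
    (set(), 0),
]
X = [to_features(fired) for fired, _ in train]
y = [label for _, label in train]

clf = RandomForestClassifier(random_state=0).fit(X, y)
print(clf.predict([to_features({"942100"})]))  # label for a new request
```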
CasCIFF: A Cross-Domain Information Fusion Framework Tailored for Cascade Prediction in Social Networks
methods: The study proposes the Cross-Domain Information Fusion Framework (CasCIFF), which exploits multi-hop neighborhood information to make user embeddings more robust. When embedding cascades, the framework intentionally incorporates timestamps to capture evolving trends in the information diffusion process.
results: The study shows that CasCIFF better captures the complex relationships between user behavior and information diffusion, and delivers superior performance on cascade prediction tasks.
Abstract
Existing approaches for information cascade prediction fall into three main categories: feature-driven methods, point process-based methods, and deep learning-based methods. Among them, deep learning-based methods, characterized by superior learning and representation capabilities, mitigate the shortcomings inherent in the other methods. However, current deep learning methods still face several persistent challenges. In particular, accurate representation of user attributes remains problematic due to factors such as fake followers and complex network configurations. Previous algorithms that focus on the sequential order of user activations often neglect the rich insights offered by activation timing. Furthermore, these techniques often fail to holistically integrate temporal and structural aspects, thus missing the nuanced propagation trends inherent in information cascades. To address these issues, we propose the Cross-Domain Information Fusion Framework (CasCIFF), which is tailored for information cascade prediction. This framework exploits multi-hop neighborhood information to make user embeddings robust. When embedding cascades, the framework intentionally incorporates timestamps, endowing it with the ability to capture evolving patterns of information diffusion. In particular, CasCIFF seamlessly integrates the tasks of user classification and cascade prediction into a consolidated framework, thereby allowing the extraction of common features that prove useful for all tasks, a strategy anchored in the principles of multi-task learning.
Representation Learning for Audio Privacy Preservation using Source Separation and Robust Adversarial Learning
results: Compared with systems without source separation, without adversarial learning, and without both, the proposed system significantly improves speech privacy preservation while maintaining good performance on the acoustic monitoring task.
Abstract
Privacy preservation has long been a concern in smart acoustic monitoring systems, where speech can be passively recorded along with a target signal in the system's operating environment. In this study, we propose the integration of two commonly used approaches in privacy preservation: source separation and adversarial representation learning. The proposed system learns the latent representation of audio recordings such that it prevents differentiating between speech and non-speech recordings. Initially, the source separation network filters out some of the privacy-sensitive data, and during the adversarial learning process, the system will learn privacy-preserving representation on the filtered signal. We demonstrate the effectiveness of our proposed method by comparing our method against systems without source separation, without adversarial learning, and without both. Overall, our results suggest that the proposed system can significantly improve speech privacy preservation compared to that of using source separation or adversarial learning solely while maintaining good performance in the acoustic monitoring task.
Improving Autonomous Separation Assurance through Distributed Reinforcement Learning with Attention Networks
results: A numerical study shows that the proposed method ensures safe and efficient aircraft separation in high-density, dynamic environments and can handle various sources of uncertainty.
Abstract
Advanced Air Mobility (AAM) introduces a new, efficient mode of transportation with the use of vehicle autonomy and electrified aircraft to provide increasingly autonomous transportation between previously underserved markets. Safe and efficient navigation of low altitude aircraft through highly dense environments requires the integration of a multitude of complex observations, such as surveillance, knowledge of vehicle dynamics, and weather. The processing and reasoning on these observations pose challenges due to the various sources of uncertainty in the information while ensuring cooperation with a variable number of aircraft in the airspace. These challenges coupled with the requirement to make safety-critical decisions in real-time rule out the use of conventional separation assurance techniques. We present a decentralized reinforcement learning framework to provide autonomous self-separation capabilities within AAM corridors with the use of speed and vertical maneuvers. The problem is formulated as a Markov Decision Process and solved by developing a novel extension to the sample-efficient, off-policy soft actor-critic (SAC) algorithm. We introduce the use of attention networks for variable-length observation processing and a distributed computing architecture to achieve high training sample throughput as compared to existing approaches. A comprehensive numerical study shows that the proposed framework can ensure safe and efficient separation of aircraft in high density, dynamic environments with various sources of uncertainty.
Variations on the Reinforcement Learning performance of Blackjack
results: The study finds that a card counter perfectly using basic strategy and the Hi-Lo system can drive the house to bankruptcy, and examines how environment variations affect this outcome. The learning convergence rate of the Q-learning algorithm is also studied as a function of deck size.
Abstract
Blackjack or "21" is a popular card-based game of chance and skill. The objective of the game is to win by obtaining a hand total higher than the dealer's without exceeding 21. The ideal blackjack strategy maximizes financial return in the long run while avoiding gambler's ruin. The stochastic environment and inherent reward structure of blackjack present an appealing problem for better understanding reinforcement learning agents in the presence of environment variations. Here we consider a Q-learning solution for optimal play and investigate the rate of learning convergence of the algorithm as a function of deck size. A blackjack simulator allowing for universal blackjack rules is also implemented to demonstrate the extent to which a card counter perfectly using the basic strategy and the Hi-Lo system can bring the house to bankruptcy, and how environment variations impact this outcome. The novelty of our work is to place this conceptual understanding of the impact of deck size in the context of learning agent convergence.
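A compact sketch of tabular Q-learning as it would apply to blackjack is shown below. The state abstraction (player total, dealer upcard, usable ace) and hyperparameters are standard choices, not taken from the paper, and Gymnasium's Blackjack-v1 environment stands in for the authors' simulator.

```python
# Tabular Q-learning on blackjack (illustrative; not the paper's code).
import random
from collections import defaultdict
import gymnasium as gym

env = gym.make("Blackjack-v1")
Q = defaultdict(lambda: [0.0, 0.0])       # state -> [Q(stick), Q(hit)]
alpha, gamma, eps = 0.1, 1.0, 0.1

for episode in range(50_000):
    state, _ = env.reset()
    done = False
    while not done:
        action = (random.randrange(2) if random.random() < eps
                  else max((0, 1), key=lambda a: Q[state][a]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        target = reward + gamma * (0 if done else max(Q[next_state]))
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state

# Greedy policy for a sample state: player 16 vs dealer 10, no usable ace.
print("hit" if Q[(16, 10, 0)][1] > Q[(16, 10, 0)][0] else "stick")
```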
Performance Analysis of Transformer Based Models (BERT, ALBERT and RoBERTa) in Fake News Detection
paper_authors: Shafna Fitria Nur Azizah, Hasan Dwi Cahyono, Sari Widya Sihwi, Wisnu Widiarto
for: This study explores the use of transformer models for fake news detection, aiming to improve detection accuracy.
methods: The study evaluates BERT and its improved variants ALBERT and RoBERTa for fake news detection.
results: ALBERT achieves 87.6% accuracy, 86.9% precision, an 86.9% F1-score, and a run-time of 174.5 s/epoch.
Abstract
Fake news is fabricated material presented in a news media format that has not gone through proper editorial processing by news agencies. Such material can provoke or defame significant entities or individuals, or even serve the personal interests of its creators, causing problems for society. Distinguishing fake news from real news is challenging due to limited domain knowledge and time constraints. According to the survey, the top three areas in Indonesia most exposed to hoaxes and misinformation are Banten, DKI Jakarta, and West Java. Transformers are an approach in the field of artificial intelligence (AI) for natural language processing that utilizes deep learning architectures. Transformers exercise a powerful attention mechanism to process text in parallel and produce rich, contextual word representations. A previous study indicated the superior performance of a transformer model known as BERT over non-transformer approaches. However, some studies suggest that performance can be improved further with improved BERT models known as ALBERT and RoBERTa. These modified BERT models have not been well explored for detecting fake news in Bahasa Indonesia. In this research, we explore these transformer models and find that ALBERT outperformed the other models with 87.6% accuracy, 86.9% precision, an 86.9% F1-score, and a run-time of 174.5 s/epoch. Source code is available at: https://github.com/Shafna81/fakenewsdetection.git
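A condensed sketch of fine-tuning ALBERT for binary fake-news classification with Hugging Face Transformers follows. The checkpoint, dataset fields, and hyperparameters are illustrative (an Indonesian-pretrained checkpoint would be used in practice; albert-base-v2 is a stand-in), not the study's exact setup.

```python
# Fine-tuning ALBERT for fake-news classification (illustrative setup).
from transformers import (AlbertTokenizer, AlbertForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import Dataset

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
model = AlbertForSequenceClassification.from_pretrained(
    "albert-base-v2", num_labels=2)  # 0 = real, 1 = fake

data = Dataset.from_dict({
    "text": ["Government confirms new policy.",
             "Miracle cure hidden by doctors!"],
    "label": [0, 1],
})
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True,
                                     padding="max_length", max_length=128))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=data,
)
trainer.train()
```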
Methods for Acquiring and Incorporating Knowledge into Stock Price Prediction: A Survey
for: This survey systematically reviews how external knowledge can be incorporated to improve the accuracy of stock price prediction.
methods: The survey covers non-graph-based and graph-based external knowledge, including text, multimedia descriptions, and the interconnections of the stock market.
results: The survey systematically describes methods for acquiring external knowledge from various unstructured data sources and fusing it with historical price features, and also compiles relevant datasets and discusses future research directions.
Abstract
Predicting stock prices presents a challenging research problem due to the inherent volatility and non-linear nature of the stock market. In recent years, knowledge-enhanced stock price prediction methods have shown groundbreaking results by utilizing external knowledge to understand the stock market. Despite the importance of these methods, there is a scarcity of scholarly works that systematically synthesize previous studies from the perspective of external knowledge types. Specifically, the external knowledge can be modeled in different data structures, which we group into non-graph-based formats and graph-based formats: 1) non-graph-based knowledge captures contextual information and multimedia descriptions specifically associated with an individual stock; 2) graph-based knowledge captures interconnected and interdependent information in the stock market. This survey paper aims to provide a systematic and comprehensive description of methods for acquiring external knowledge from various unstructured data sources and then incorporating it into stock price prediction models. We also explore fusion methods for combining external knowledge with historical price features. Moreover, this paper includes a compilation of relevant datasets and delves into potential future research directions in this domain.
Differentially Private Graph Neural Network with Importance-Grained Noise Adaption
for: Protecting the privacy of graph data, particularly node data, when nodes represent personal and sensitive information.
methods: The paper proposes a differentially private graph neural network (GNN) algorithm named NAP-GNN, which includes a topology-based node importance estimation (TNIE) method, an adaptive private aggregation method, and private training of the graph learning algorithm with an adaptive residual connection mode.
results: Theoretical analysis shows that NAP-GNN satisfies privacy guarantees, and empirical experiments on real-world graph datasets show that NAP-GNN achieves a better trade-off between privacy and accuracy.
Abstract
Graph Neural Networks (GNNs) with differential privacy have been proposed to preserve graph privacy when nodes represent personal and sensitive information. However, existing methods ignore that nodes with different importance may have diverse privacy demands, which may lead to over-protecting some nodes and decreasing model utility. In this paper, we study the problem of importance-grained privacy, where nodes contain personal data that need to be kept private but are critical for training a GNN. We propose NAP-GNN, a node-importance-grained privacy-preserving GNN algorithm with privacy guarantees based on adaptive differential privacy to safeguard node information. First, we propose a Topology-based Node Importance Estimation (TNIE) method to infer unknown node importance with neighborhood and centrality awareness. Second, an adaptive private aggregation method is proposed to perturb neighborhood aggregation according to node-importance grain. Third, we propose to privately train a graph learning algorithm on perturbed aggregations in adaptive residual connection mode over multi-layer convolutions for node-wise tasks. Theoretical analysis shows that NAP-GNN satisfies privacy guarantees. Empirical experiments over real-world graph datasets show that NAP-GNN achieves a better trade-off between privacy and accuracy.
Analyzing the Effect of Data Impurity on the Detection Performances of Mental Disorders
results: The study finds that removing such data impurity significantly improves the detection performance for major depressive disorder (MDD) and post-traumatic stress disorder (PTSD).
Abstract
The primary method for identifying mental disorders automatically has traditionally involved using binary classifiers. These classifiers are trained using behavioral data obtained from an interview setup. In this training process, data from individuals with the specific disorder under consideration are categorized as the positive class, while data from all other participants constitute the negative class. In practice, it is widely recognized that certain mental disorders share similar symptoms, causing the collected behavioral data to encompass a variety of attributes associated with multiple disorders. Consequently, attributes linked to the targeted mental disorder might also be present within the negative class. This data impurity may lead to sub-optimal training of the classifier for a mental disorder of interest. In this study, we investigate this hypothesis in the context of major depressive disorder (MDD) and post-traumatic stress disorder detection (PTSD). The results show that upon removal of such data impurity, MDD and PTSD detection performances are significantly improved.
An In-Depth Analysis of Discretization Methods for Communication Learning using Backpropagation with Multi-Agent Reinforcement Learning
paper_authors: Astrid Vanneste, Simon Vanneste, Kevin Mets, Tom De Schepper, Siegfried Mercelis, Peter Hellinckx
for: This paper compares the performance of several discretization methods for communication learning in multi-agent reinforcement learning, and presents a communication learning approach based on DIAL and COMA.
methods: The paper evaluates several state-of-the-art discretization methods as well as a novel method named ST-DRU.
results: The results show that ST-DRU performs best across the different environments: it achieves the best or close to the best performance in every experiment and is the only method that does not fail in any of the tested environments.
Abstract
Communication is crucial in multi-agent reinforcement learning when agents are not able to observe the full state of the environment. The most common approach to allow learned communication between agents is the use of a differentiable communication channel that allows gradients to flow between agents as a form of feedback. However, this is challenging when we want to use discrete messages to reduce the message size, since gradients cannot flow through a discrete communication channel. Previous work proposed methods to deal with this problem. However, these methods are tested in different communication learning architectures and environments, making it hard to compare them. In this paper, we compare several state-of-the-art discretization methods as well as a novel approach. We do this comparison in the context of communication learning using gradients from other agents and perform tests on several environments. In addition, we present COMA-DIAL, a communication learning approach based on DIAL and COMA extended with learning rate scaling and adapted exploration. Using COMA-DIAL allows us to perform experiments on more complex environments. Our results show that the novel ST-DRU method, proposed in this paper, achieves the best results out of all discretization methods across the different environments. It achieves the best or close to the best performance in each of the experiments and is the only method that does not fail on any of the tested environments.
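The reason gradients cannot flow through a discrete channel, and how discretization methods work around it, is captured by the generic straight-through trick sketched below: messages are binarized on the forward pass, but gradients pass through as if the operation were the identity, so feedback can still propagate between agents. This is the standard straight-through estimator underlying DRU-style units, not the paper's exact ST-DRU formulation.

```python
# Straight-through binarization for a learned communication channel.
import torch

class StraightThroughBinarize(torch.autograd.Function):
    @staticmethod
    def forward(ctx, logits):
        return (logits > 0).float()            # discrete message bits

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output                     # identity gradient

message_logits = torch.randn(4, requires_grad=True)
bits = StraightThroughBinarize.apply(message_logits)
print(bits)                                    # e.g. tensor([1., 0., 1., 0.])

loss = bits.sum()                              # stand-in downstream loss
loss.backward()
print(message_logits.grad)                     # tensor([1., 1., 1., 1.])
```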
JEDI: Joint Expert Distillation in a Semi-Supervised Multi-Dataset Student-Teacher Scenario for Video Action Recognition
results: Validated on four video action recognition datasets, the experiments show that, by simultaneously considering all datasets within a unified semi-supervised setting, the models achieve significant improvements over the initial experts.
Abstract
We propose JEDI, a multi-dataset semi-supervised learning method, which efficiently combines knowledge from multiple experts, learned on different datasets, to train and improve the performance of individual, per dataset, student models. Our approach achieves this by addressing two important problems in current machine learning research: generalization across datasets and limitations of supervised training due to scarcity of labeled data. We start with an arbitrary number of experts, pretrained on their own specific dataset, which form the initial set of student models. The teachers are immediately derived by concatenating the feature representations from the penultimate layers of the students. We then train all models in a student-teacher semi-supervised learning scenario until convergence. In our efficient approach, student-teacher training is carried out jointly and end-to-end, showing that both students and teachers improve their generalization capacity during training. We validate our approach on four video action recognition datasets. By simultaneously considering all datasets within a unified semi-supervised setting, we demonstrate significant improvements over the initial experts.
Deep Learning-Based Prediction of Fractional Flow Reserve along the Coronary Artery
paper_authors: Nils Hampe, Sanne G. M. van Velzen, Jean-Paul Aben, Carlos Collet, Ivana Išgum
for: This paper aims to develop a deep learning-based method for predicting fractional flow reserve (FFR) values along the coronary arteries from coronary computed tomography angiography (CCTA) scans.
methods: The proposed method uses a variational autoencoder to characterize the artery and a convolutional neural network (CNN) to predict the FFR values. The CNN is supervised by multiple loss functions, including a loss function inspired by the Earth Mover's Distance (EMD) to predict the correct location of FFR drops and a histogram-based loss to explicitly supervise the slope of the FFR curve.
results: The resulting FFR curves show good agreement with the reference, allowing the distinction between diffuse and focal coronary artery disease (CAD) distributions in most cases. The mean absolute difference in the area under the FFR pullback curve (AUPC) was 1.7.
Abstract
Functionally significant coronary artery disease (CAD) is caused by plaque buildup in the coronary arteries, potentially leading to narrowing of the arterial lumen, i.e. coronary stenosis, that significantly obstructs blood flow to the myocardium. The current reference for establishing the presence of a functionally significant stenosis is invasive fractional flow reserve (FFR) measurement. To avoid invasive measurements, non-invasive prediction of FFR from coronary CT angiography (CCTA) has emerged. For this, machine learning approaches, characterized by fast inference, are increasingly developed. However, these methods predict a single FFR value per artery i.e. they don't provide information about the stenosis location or treatment strategy. We propose a deep learning-based method to predict the FFR along the artery from CCTA scans. This study includes CCTA images of 110 patients who underwent invasive FFR pullback measurement in 112 arteries. First, a multi planar reconstruction (MPR) of the artery is fed to a variational autoencoder to characterize the artery, i.e. through the lumen area and unsupervised artery encodings. Thereafter, a convolutional neural network (CNN) predicts the FFR along the artery. The CNN is supervised by multiple loss functions, notably a loss function inspired by the Earth Mover's Distance (EMD) to predict the correct location of FFR drops and a histogram-based loss to explicitly supervise the slope of the FFR curve. To train and evaluate our model, eight-fold cross-validation was performed. The resulting FFR curves show good agreement with the reference allowing the distinction between diffuse and focal CAD distributions in most cases. Quantitative evaluation yielded a mean absolute difference in the area under the FFR pullback curve (AUPC) of 1.7. The method may pave the way towards fast, accurate, automatic prediction of FFR along the artery from CCTA.
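An EMD-inspired loss is well suited to localizing FFR drops because, for one-dimensional profiles, the Earth Mover's Distance reduces to the L1 distance between cumulative sums. A hedged sketch of such a loss (the per-point increment representation and the mean reduction are assumptions; the paper's exact formulation may differ):

```python
import torch

def emd_1d_loss(pred_drops: torch.Tensor, ref_drops: torch.Tensor) -> torch.Tensor:
    """EMD-style loss for 1-D profiles: compare cumulative sums of per-point
    FFR increments along the pullback, so a drop predicted at the wrong
    location is penalized in proportion to how far it is misplaced."""
    pred_cdf = torch.cumsum(pred_drops, dim=-1)
    ref_cdf = torch.cumsum(ref_drops, dim=-1)
    return (pred_cdf - ref_cdf).abs().mean()
```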
GraphCC: A Practical Graph Learning-based Approach to Congestion Control in Datacenters
results: In the evaluation, GraphCC performs well across a wide variety of scenarios, particularly in new scenarios unseen during training (e.g., new traffic workloads, failures, upgrades), outperforming the state-of-the-art CC solution (ACC) in Flow Completion Time (FCT) and buffer occupancy (BO). Improvements reach up to 20% in Flow Completion Time, with consistently strong performance across the evaluation scenarios.Abstract
Congestion Control (CC) plays a fundamental role in optimizing traffic in Data Center Networks (DCN). Currently, DCNs mainly implement two main CC protocols: DCTCP and DCQCN. Both protocols -- and their main variants -- are based on Explicit Congestion Notification (ECN), where intermediate switches mark packets when they detect congestion. The ECN configuration is thus a crucial aspect on the performance of CC protocols. Nowadays, network experts set static ECN parameters carefully selected to optimize the average network performance. However, today's high-speed DCNs experience quick and abrupt changes that severely change the network state (e.g., dynamic traffic workloads, incast events, failures). This leads to under-utilization and sub-optimal performance. This paper presents GraphCC, a novel Machine Learning-based framework for in-network CC optimization. Our distributed solution relies on a novel combination of Multi-agent Reinforcement Learning (MARL) and Graph Neural Networks (GNN), and it is compatible with widely deployed ECN-based CC protocols. GraphCC deploys distributed agents on switches that communicate with their neighbors to cooperate and optimize the global ECN configuration. In our evaluation, we test the performance of GraphCC under a wide variety of scenarios, focusing on the capability of this solution to adapt to new scenarios unseen during training (e.g., new traffic workloads, failures, upgrades). We compare GraphCC with a state-of-the-art MARL-based solution for ECN tuning -- ACC -- and observe that our proposed solution outperforms the state-of-the-art baseline in all of the evaluation scenarios, showing improvements up to $20\%$ in Flow Completion Time as well as significant reductions in buffer occupancy ($38.0-85.7\%$).
Towards true discovery of the differential equations
results: The paper explores the prerequisites and tools for independent equation discovery and addresses the challenge of assessing whether a discovered equation is adequate when the correct equation is unknown.Abstract
Differential equation discovery, a machine learning subfield, is used to develop interpretable models, particularly in nature-related applications. By expertly incorporating the general parametric form of the equation of motion and appropriate differential terms, algorithms can autonomously uncover equations from data. This paper explores the prerequisites and tools for independent equation discovery without expert input, eliminating the need for equation form assumptions. We focus on addressing the challenge of assessing the adequacy of discovered equations when the correct equation is unknown, with the aim of providing insights for reliable equation discovery without prior knowledge of the equation form.
Unleashing the Power of Extra-Tree Feature Selection and Random Forest Classifier for Improved Survival Prediction in Heart Failure Patients
paper_authors: Md. Simul Hasan Talukder, Rejwan Bin Sulaiman, Mouli Bardhan Paul Angon
For: The paper aims to improve survival prediction in heart failure patients by leveraging data pre-processing techniques and the Extra-Tree (ET) feature selection method in conjunction with the Random Forest (RF) classifier.
Methods: The paper uses the public UCL Heart failure (HF) survival dataset and employs the ET feature selection algorithm to identify the most informative features. These features are then used as input for grid search of RF.
Results: The approach achieved 98.33% accuracy, which is the highest over existing work.Abstract
Heart failure is a life-threatening condition that affects millions of people worldwide. The ability to accurately predict patient survival can aid in early intervention and improve patient outcomes. In this study, we explore the potential of utilizing data pre-processing techniques and the Extra-Tree (ET) feature selection method in conjunction with the Random Forest (RF) classifier to improve survival prediction in heart failure patients. By leveraging the strengths of ET feature selection, we aim to identify the most significant predictors associated with heart failure survival. Using the public UCL Heart failure (HF) survival dataset, we employ the ET feature selection algorithm to identify the most informative features. These features are then used as input for grid search of RF. Finally, the tuned RF model was trained and evaluated using different metrics. The approach achieved 98.33% accuracy, which is the highest among existing work.
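A minimal scikit-learn sketch of the described pipeline, ET-based feature selection followed by a grid search over a Random Forest (the stand-in data, hyperparameter grid, and importance threshold are illustrative assumptions, not the paper's exact settings):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV

# stand-in for the tabular HF survival data (299 patients, 12 features)
X_train, y_train = make_classification(n_samples=299, n_features=12, random_state=0)

# rank features with an Extra-Trees model and keep the most informative ones
et = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
selector = SelectFromModel(et, prefit=True)
X_sel = selector.transform(X_train)

# grid search over Random Forest hyperparameters on the selected features
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 5, 10]},
    cv=5,
)
grid.fit(X_sel, y_train)
print(grid.best_params_, grid.best_score_)
```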
Targeted and Troublesome: Tracking and Advertising on Children’s Websites
paper_authors: Zahra Moti, Asuman Senol, Hamid Bostani, Frederik Zuiderveen Borgesius, Veelasha Moonsamy, Arunesh Mathur, Gunes Acar
for: The paper focuses on the measurement of tracking and targeted advertising on websites directed at children.
methods: The authors use a multilingual classifier based on web page titles and descriptions to identify child-directed websites; crawl these websites from five vantage points to measure the prevalence of trackers, fingerprinting scripts, and advertisements; and develop an ML pipeline that processes both images and text extracted from ads to identify improper ads on child-directed websites.
results: The authors find that around 90% of child-directed websites embed one or more trackers, and about 27% contain targeted advertisements. They identify improper ads on child-directed websites, including ads for dating, weight loss, and mental health, as well as sex toys and flirting chat services, and conclude that there is a trend of non-compliance with privacy regulations and troubling ad safety practices among many advertisers and child-directed websites.Abstract
On the modern web, trackers and advertisers frequently construct and monetize users' detailed behavioral profiles without consent. Despite various studies on web tracking mechanisms and advertisements, there has been no rigorous study focusing on websites targeted at children. To address this gap, we present a measurement of tracking and (targeted) advertising on websites directed at children. Motivated by lacking a comprehensive list of child-directed (i.e., targeted at children) websites, we first build a multilingual classifier based on web page titles and descriptions. Applying this classifier to over two million pages, we compile a list of two thousand child-directed websites. Crawling these sites from five vantage points, we measure the prevalence of trackers, fingerprinting scripts, and advertisements. Our crawler detects ads displayed on child-directed websites and determines if ad targeting is enabled by scraping ad disclosure pages whenever available. Our results show that around 90% of child-directed websites embed one or more trackers, and about 27% contain targeted advertisements--a practice that should require verifiable parental consent. Next, we identify improper ads on child-directed websites by developing an ML pipeline that processes both images and text extracted from ads. The pipeline allows us to run semantic similarity queries for arbitrary search terms, revealing ads that promote services related to dating, weight loss, and mental health; as well as ads for sex toys and flirting chat services. Some of these ads feature repulsive and sexually explicit imagery. In summary, our findings indicate a trend of non-compliance with privacy regulations and troubling ad safety practices among many advertisers and child-directed websites. To protect children and create a safer online environment, regulators and stakeholders must adopt and enforce more stringent measures.
For: Improving the generalisation capacity of deep learning models.
Methods: Compute correlation dissimilarities between neurons via a minimum spanning tree, and use the resulting regularisation terms to reduce high correlations between neurons.
Results: The proposed terms outperform popular regularisation terms, and their applicability is demonstrated across different deep learning tasks.Abstract
We propose a novel way to improve the generalisation capacity of deep learning models by reducing high correlations between neurons. For this, we present two regularisation terms computed from the weights of a minimum spanning tree of the clique whose vertices are the neurons of a given network (or a sample of those), where weights on edges are correlation dissimilarities. We provide an extensive set of experiments to validate the effectiveness of our terms, showing that they outperform popular ones. Also, we demonstrate that naive minimisation of all correlations between neurons obtains lower accuracies than our regularisation terms, suggesting that redundancies play a significant role in artificial neural networks, as evidenced by some studies in neuroscience for real networks. We include a proof of differentiability of our regularisers, thus developing the first effective topological persistence-based regularisation terms that consider the whole set of neurons and that can be applied to a feedforward architecture in any deep learning task such as classification, data generation, or regression.
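A hedged sketch of the underlying statistic: build correlation dissimilarities between neuron activations, take the minimum spanning tree over the resulting clique, and summarize its total weight. How the paper turns this into a differentiable loss, and the sign convention for the penalty, are not reproduced here:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_correlation_statistic(acts: np.ndarray) -> float:
    """Total MST weight over correlation dissimilarities between neurons.

    `acts` is (n_samples, n_neurons) activations from one layer (or a sample
    of neurons). Edge weights are 1 - |corr|; a larger total suggests more
    decorrelated neurons. Note scipy treats explicit zero weights as missing
    edges, so perfectly correlated neuron pairs would need a small epsilon."""
    corr = np.corrcoef(acts, rowvar=False)     # (n_neurons, n_neurons)
    dissim = 1.0 - np.abs(corr)
    mst = minimum_spanning_tree(dissim)        # sparse MST over the clique
    return float(mst.sum())

rng = np.random.default_rng(0)
print(mst_correlation_statistic(rng.normal(size=(256, 32))))
```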
Are Sex-based Physiological Differences the Cause of Gender Bias for Chest X-ray Diagnosis?
results: Our analysis finds that dataset-specific factors, rather than fundamental physiological differences, are the main drivers of performance differences in chest X-ray prediction.Abstract
While many studies have assessed the fairness of AI algorithms in the medical field, the causes of differences in prediction performance are often unknown. This lack of knowledge about the causes of bias hampers the efficacy of bias mitigation, as evidenced by the fact that simple dataset balancing still often performs best in reducing performance gaps but is unable to resolve all performance differences. In this work, we investigate the causes of gender bias in machine learning-based chest X-ray diagnosis. In particular, we explore the hypothesis that breast tissue leads to underexposure of the lungs and causes lower model performance. Methodologically, we propose a new sampling method which addresses the highly skewed distribution of recordings per patient in two widely used public datasets, while at the same time reducing the impact of label errors. Our comprehensive analysis of gender differences across diseases, datasets, and gender representations in the training set shows that dataset imbalance is not the sole cause of performance differences. Moreover, relative group performance differs strongly between datasets, indicating important dataset-specific factors influencing male/female group performance. Finally, we investigate the effect of breast tissue more specifically, by cropping out the breasts from recordings, finding that this does not resolve the observed performance gaps. In conclusion, our results indicate that dataset-specific factors, not fundamental physiological differences, are the main drivers of male--female performance gaps in chest X-ray analyses on widely used NIH and CheXpert Dataset.
Scalability of Message Encoding Techniques for Continuous Communication Learned with Multi-Agent Reinforcement Learning
results: The results show that, as the number of agents grows, the mean message encoder consistently outperforms the attention message encoder. The study finds that agents using the mean message encoder adopt a communication policy combining exponential and logarithmic functions to avoid losing important information.Abstract
Many multi-agent systems require inter-agent communication to properly achieve their goal. By learning the communication protocol alongside the action protocol using multi-agent reinforcement learning techniques, the agents gain the flexibility to determine which information should be shared. However, when the number of agents increases we need to create an encoding of the information contained in these messages. In this paper, we investigate the effect of increasing the amount of information that should be contained in a message and increasing the number of agents. We evaluate these effects on two different message encoding methods, the mean message encoder and the attention message encoder. We perform our experiments on a matrix environment. Surprisingly, our results show that the mean message encoder consistently outperforms the attention message encoder. Therefore, we analyse the communication protocol used by the agents that use the mean message encoder and can conclude that the agents use a combination of an exponential and a logarithmic function in their communication policy to avoid the loss of important information after applying the mean message encoder.
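A minimal sketch of the two encoders being compared: a parameter-free mean over senders versus a learned attention weighting. The attention variant shown is a generic single-query form; the paper's exact architecture may differ:

```python
import torch
import torch.nn as nn

class MeanMessageEncoder(nn.Module):
    """Average incoming messages over the sender dimension."""
    def forward(self, msgs: torch.Tensor) -> torch.Tensor:
        # msgs: (batch, n_senders, msg_dim) -> (batch, msg_dim)
        return msgs.mean(dim=1)

class AttentionMessageEncoder(nn.Module):
    """Weight senders with a learned query before summing (illustrative)."""
    def __init__(self, msg_dim: int):
        super().__init__()
        self.query = nn.Parameter(torch.randn(msg_dim))

    def forward(self, msgs: torch.Tensor) -> torch.Tensor:
        scores = torch.softmax(msgs @ self.query, dim=1)   # (batch, n_senders)
        return (scores.unsqueeze(-1) * msgs).sum(dim=1)

msgs = torch.randn(2, 5, 8)                  # batch of 2, 5 senders, dim 8
print(MeanMessageEncoder()(msgs).shape)      # torch.Size([2, 8])
print(AttentionMessageEncoder(8)(msgs).shape)
```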
Unlocking the Diagnostic Potential of ECG through Knowledge Transfer from Cardiac MRI
paper_authors: Özgün Turgut, Philip Müller, Paul Hager, Suprosanna Shit, Sophie Starck, Martin J. Menten, Eimo Martens, Daniel Rueckert
For: This work aims to provide a low-cost and fast tool for assessing cardiovascular health, reducing reliance on the more expensive cardiac magnetic resonance (CMR) imaging usually preferred for detailed cardiac diagnosis.
Methods: The work proposes the first self-supervised contrastive approach that transfers domain-specific information from CMR images to ECG embeddings. The approach combines multimodal contrastive learning with masked data modeling to enable holistic cardiac screening from ECG data alone.
Results: In extensive experiments on 40,044 UK Biobank subjects, the method demonstrates its utility and generalizability, predicting subject-specific risk of various cardiovascular diseases and identifying distinct cardiac phenotypes from ECG data. A qualitative analysis shows that the learned ECG embeddings incorporate information from CMR image regions of interest. The entire pipeline, including source code and pre-trained model weights, is made publicly available.Abstract
The electrocardiogram (ECG) is a widely available diagnostic tool that allows for a cost-effective and fast assessment of the cardiovascular health. However, more detailed examination with expensive cardiac magnetic resonance (CMR) imaging is often preferred for the diagnosis of cardiovascular diseases. While providing detailed visualization of the cardiac anatomy, CMR imaging is not widely available due to long scan times and high costs. To address this issue, we propose the first self-supervised contrastive approach that transfers domain-specific information from CMR images to ECG embeddings. Our approach combines multimodal contrastive learning with masked data modeling to enable holistic cardiac screening solely from ECG data. In extensive experiments using data from 40,044 UK Biobank subjects, we demonstrate the utility and generalizability of our method. We predict the subject-specific risk of various cardiovascular diseases and determine distinct cardiac phenotypes solely from ECG data. In a qualitative analysis, we demonstrate that our learned ECG embeddings incorporate information from CMR image regions of interest. We make our entire pipeline publicly available, including the source code and pre-trained model weights.
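A hedged sketch of the multimodal contrastive component: a symmetric InfoNCE loss over paired ECG and CMR embeddings of the same subjects. The masked data modeling component is not shown, and the temperature value is an illustrative assumption:

```python
import torch
import torch.nn.functional as F

def multimodal_contrastive_loss(ecg_emb, cmr_emb, temperature=0.07):
    """Symmetric InfoNCE: each ECG embedding should be most similar to the
    CMR embedding of the same subject within the batch, and vice versa."""
    ecg = F.normalize(ecg_emb, dim=-1)
    cmr = F.normalize(cmr_emb, dim=-1)
    logits = ecg @ cmr.t() / temperature          # (batch, batch) similarities
    targets = torch.arange(ecg.size(0), device=ecg.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = multimodal_contrastive_loss(torch.randn(16, 128), torch.randn(16, 128))
```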
results: Our experiments show that combining this memory network with various surprise predictors improves the efficiency of exploration and significantly boosts final performance in sparse-reward environments, including Noisy-TV, navigation, and challenging Atari games.Abstract
We present a new computing model for intrinsic rewards in reinforcement learning that addresses the limitations of existing surprise-driven explorations. The reward is the novelty of the surprise rather than the surprise norm. We estimate the surprise novelty as retrieval errors of a memory network wherein the memory stores and reconstructs surprises. Our surprise memory (SM) augments the capability of surprise-based intrinsic motivators, maintaining the agent's interest in exciting exploration while reducing unwanted attraction to unpredictable or noisy observations. Our experiments demonstrate that the SM combined with various surprise predictors exhibits efficient exploring behaviors and significantly boosts the final performance in sparse reward environments, including Noisy-TV, navigation and challenging Atari games.
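A minimal stand-in for the surprise memory idea: score novelty as the retrieval error of a small autoencoder that is updated online on past surprises, so repeated surprises stop earning intrinsic reward. The paper's actual store-and-reconstruct memory mechanism may differ in detail:

```python
import torch
import torch.nn as nn

class SurpriseMemory(nn.Module):
    """Intrinsic reward = reconstruction (retrieval) error of past surprises."""
    def __init__(self, surprise_dim: int, code_dim: int = 16):
        super().__init__()
        self.enc = nn.Linear(surprise_dim, code_dim)
        self.dec = nn.Linear(code_dim, surprise_dim)
        self.opt = torch.optim.Adam(self.parameters(), lr=1e-3)

    def intrinsic_reward(self, surprise: torch.Tensor) -> torch.Tensor:
        recon = self.dec(torch.relu(self.enc(surprise)))
        error = (recon - surprise).pow(2).mean(dim=-1)  # novelty of the surprise
        # keep the memory up to date so familiar surprises stop being novel
        self.opt.zero_grad(); error.mean().backward(); self.opt.step()
        return error.detach()

sm = SurpriseMemory(surprise_dim=32)
print(sm.intrinsic_reward(torch.randn(4, 32)))
```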
TSSR: A Truncated and Signed Square Root Activation Function for Neural Networks
methods: Proposes a new activation function called the Truncated and Signed Square Root (TSSR) function
results: The TSSR function performs well across application areas such as computer vision, natural language processing, and speech recognition.Abstract
Activation functions are essential components of neural networks. In this paper, we introduce a new activation function called the Truncated and Signed Square Root (TSSR) function. This function is distinctive because it is odd, nonlinear, monotone and differentiable. Its gradient is continuous and always positive. Thanks to these properties, it has the potential to improve the numerical stability of neural networks. Several experiments confirm that the proposed TSSR has better performance than other state-of-the-art activation functions. The proposed function has significant implications for the development of neural network models and can be applied to a wide range of applications in fields such as computer vision, natural language processing, and speech recognition.
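One function satisfying the listed properties (odd, nonlinear, monotone, differentiable, with continuous positive gradient) is the identity inside [-1, 1] with signed square-root growth outside, sketched below. This reconstruction is an assumption; consult the paper for the exact definition of TSSR:

```python
import torch

def tssr(x: torch.Tensor) -> torch.Tensor:
    """Identity for |x| <= 1, sign(x) * (2*sqrt(|x|) - 1) otherwise.
    Values and slopes match at |x| = 1, so the gradient is continuous
    and strictly positive everywhere (assumed form, not the paper's)."""
    inner = x.abs() <= 1.0
    # clamp keeps the unused sqrt branch numerically safe under torch.where
    outer = torch.sign(x) * (2.0 * torch.sqrt(x.abs().clamp(min=1.0)) - 1.0)
    return torch.where(inner, x, outer)

print(tssr(torch.tensor([-4.0, -0.5, 0.5, 4.0])))  # [-3.0, -0.5, 0.5, 3.0]
```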
On the Unexpected Abilities of Large Language Models
methods: The paper examines this indirect acquisition process and its relation to other known indirect processes.
results: The paper argues that large language models develop integrated abilities as a side effect of indirect acquisition, and discusses the predictability of these abilities. Additionally, it briefly discusses the relation between the cognitive skills acquired by these systems and human cognition.Abstract
Large language models are capable of displaying a wide range of abilities that are not directly connected with the task for which they are trained: predicting the next words of human-written texts. In this article, I discuss the nature of this indirect acquisition process and its relation to other known indirect processes. I argue that an important side effect of such indirect acquisition is the development of integrated abilities. I discuss the extent to which the abilities developed by large language models are predictable. Finally, I briefly discuss the relation between the cognitive skills acquired by these systems and human cognition.
Bayes Risk Consistency of Nonparametric Classification Rules for Spike Trains Data
results: The paper derives the optimal Bayes rule and establishes asymptotic properties of the plug-in kernel classifier, including its convergence to the Bayes rule as the recording time interval and the training-set size increase.Abstract
Spike train data find a growing list of applications in computational neuroscience, imaging, streaming data and finance. Machine learning strategies for spike trains are based on various neural network and probabilistic models. The probabilistic approach relies on parametric or nonparametric specifications of the underlying spike generation model. In this paper we consider the two-class statistical classification problem for a class of spike train data characterized by nonparametrically specified intensity functions. We derive the optimal Bayes rule and next form the plug-in nonparametric kernel classifier. Asymptotical properties of the rules are established, including the limit with respect to the increasing recording time interval and the size of a training set. In particular, the convergence of the kernel classifier to the Bayes rule is proved. The obtained results are supported by finite-sample simulation studies.
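For concreteness, a standard plug-in rule of this type estimates each class intensity by kernel smoothing of the training spike times and classifies a new train by a Poisson-type likelihood comparison; the paper's exact estimator and normalization may differ:

```latex
\hat{\lambda}_k(t) = \frac{1}{n_k} \sum_{i=1}^{n_k} \sum_{j} K_h\!\left(t - t^{(k)}_{ij}\right),
\qquad
\hat{g}(s) = \arg\max_{k \in \{1,2\}} \left[ \sum_{j} \log \hat{\lambda}_k(s_j)
 - \int_0^T \hat{\lambda}_k(t)\, dt + \log \hat{\pi}_k \right]
```

Here $t^{(k)}_{ij}$ denotes the $j$-th spike of the $i$-th training trial in class $k$, $K_h$ is a kernel with bandwidth $h$, $s_1, s_2, \dots$ are the spikes of a new trial observed on $[0, T]$, and $\hat{\pi}_k$ are the class priors.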
PETformer: Long-term Time Series Forecasting via Placeholder-enhanced Transformer
For: This paper aims to improve the performance of Transformer-based models in long-term time series forecasting (LTSF) tasks by addressing three key issues: temporal continuity, information density, and multi-channel relationships.
Methods: The proposed model, called PETformer, uses three innovative techniques: Placeholder Enhancement Technique (PET), Long Sub-sequence Division (LSD), and Multi-channel Separation and Interaction (MSI) to introduce prior biases suitable for LTSF tasks.
Results: The proposed PETformer model achieves state-of-the-art (SOTA) performance on eight commonly used public datasets for LTSF, outperforming all other models currently available. This demonstrates that Transformer still possesses powerful capabilities in LTSF.Abstract
Recently, Transformer-based models have shown remarkable performance in long-term time series forecasting (LTSF) tasks due to their ability to model long-term dependencies. However, the validity of Transformers for LTSF tasks remains debatable, particularly since recent work has shown that simple linear models can outperform numerous Transformer-based approaches. This suggests that there are limitations to the application of Transformer in LTSF. Therefore, this paper investigates three key issues when applying Transformer to LTSF: temporal continuity, information density, and multi-channel relationships. Accordingly, we propose three innovative solutions, including Placeholder Enhancement Technique (PET), Long Sub-sequence Division (LSD), and Multi-channel Separation and Interaction (MSI), which together form a novel model called PETformer. These three key designs introduce prior biases suitable for LTSF tasks. Extensive experiments have demonstrated that PETformer achieves state-of-the-art (SOTA) performance on eight commonly used public datasets for LTSF, outperforming all other models currently available. This demonstrates that Transformer still possesses powerful capabilities in LTSF.
For: This study proposes a new sparse unmixing technique based on archetypal analysis (SUnAA) to solve the sparse unmixing problem.
Methods: A new model based on archetypal analysis assumes that the endmembers of interest are a convex combination of endmembers provided by a spectral library and that their number is known; a non-convex optimization objective is then minimized iteratively using an active set algorithm.
Results: Evaluated on two simulated datasets, SUnAA outperforms conventional and advanced methods in terms of signal-to-reconstruction error. Applied to the Cuprite dataset and compared visually with the available geological map, the qualitative assessment shows successful estimation of mineral abundances and significantly improved detection of dominant minerals over conventional regression-based sparse unmixing methods.Abstract
This paper introduces a new sparse unmixing technique using archetypal analysis (SUnAA). First, we design a new model based on archetypal analysis. We assume that the endmembers of interest are a convex combination of endmembers provided by a spectral library and that the number of endmembers of interest is known. Then, we propose a minimization problem. Unlike most conventional sparse unmixing methods, here the minimization problem is non-convex. We minimize the optimization objective iteratively using an active set algorithm. Our method is robust to the initialization and only requires the number of endmembers of interest. SUnAA is evaluated using two simulated datasets for which results confirm its better performance over other conventional and advanced techniques in terms of signal-to-reconstruction error. SUnAA is also applied to Cuprite dataset and the results are compared visually with the available geological map provided for this dataset. The qualitative assessment demonstrates the successful estimation of the minerals abundances and significantly improves the detection of dominant minerals compared to the conventional regression-based sparse unmixing methods. The Python implementation of SUnAA can be found at: https://github.com/BehnoodRasti/SUnAA.
Tram-FL: Routing-based Model Training for Decentralized Federated Learning
results: Experiments on the MNIST, CIFAR-10, and IMDb datasets show that Tram-FL with the proposed routing algorithm achieves high accuracy under non-IID conditions, outperforming baselines while reducing communication costs.Abstract
In decentralized federated learning (DFL), substantial traffic from frequent inter-node communication and non-independent and identically distributed (non-IID) data challenges high-accuracy model acquisition. We propose Tram-FL, a novel DFL method, which progressively refines a global model by transferring it sequentially amongst nodes, rather than by exchanging and aggregating local models. We also introduce a dynamic model routing algorithm for optimal route selection, aimed at enhancing model precision with minimal forwarding. Our experiments using MNIST, CIFAR-10, and IMDb datasets demonstrate that Tram-FL with the proposed routing delivers high model accuracy under non-IID conditions, outperforming baselines while reducing communication costs.
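A toy sketch of the sequential transfer at the heart of Tram-FL: one global model visits the nodes along a route and is refined on each node's local data, with no model exchange or averaging. The dynamic routing algorithm that picks the route is not reproduced; `local_step` is a stand-in for a few local training steps:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Node:
    """A participant holding private (possibly non-IID) local data."""
    local_step: Callable  # model -> model, e.g. a few epochs of local SGD

def tram_fl_round(model, nodes: List[Node], route: List[int]):
    """Hand the single global model from node to node along `route`,
    refining it in place at each stop, instead of aggregating local models."""
    for idx in route:
        model = nodes[idx].local_step(model)
    return model

# toy usage: the "model" is a scalar pulled toward each node's data mean
nodes = [Node(local_step=lambda m, mu=mu: m + 0.5 * (mu - m)) for mu in (0.0, 4.0, 8.0)]
model = 0.0
for _ in range(10):
    model = tram_fl_round(model, nodes, route=[0, 1, 2])
print(model)  # drifts toward a consensus of the node means
```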
Feature Matching Data Synthesis for Non-IID Federated Learning
results: Experiments show that integrating the proposed HFMDS method with federated learning improves model generalization and privacy preservation while reducing computational cost. On several benchmark datasets, the proposed HFMDS-FL algorithm outperforms baselines in terms of accuracy and privacy preservation, with a comparatively low computational cost.Abstract
Federated learning (FL) has emerged as a privacy-preserving paradigm that trains neural networks on edge devices without collecting data at a central server. However, FL encounters an inherent challenge in dealing with non-independent and identically distributed (non-IID) data among devices. To address this challenge, this paper proposes a hard feature matching data synthesis (HFMDS) method to share auxiliary data besides local models. Specifically, synthetic data are generated by learning the essential class-relevant features of real samples and discarding the redundant features, which helps to effectively tackle the non-IID issue. For better privacy preservation, we propose a hard feature augmentation method to transfer real features towards the decision boundary, with which the synthetic data not only improve the model generalization but also erase the information of real features. By integrating the proposed HFMDS method with FL, we present a novel FL framework with data augmentation to relieve data heterogeneity. The theoretical analysis highlights the effectiveness of our proposed data synthesis method in solving the non-IID challenge. Simulation results further demonstrate that our proposed HFMDS-FL algorithm outperforms the baselines in terms of accuracy, privacy preservation, and computational cost on various benchmark datasets.
Collaborative Learning From Distributed Data With Differentially Private Synthetic Twin Data
paper_authors: Lukas Prediger, Joonas Jälkö, Antti Honkela, Samuel Kaski
for: collaborative learning on sensitive data without violating privacy constraints
methods: sharing differentially private synthetic twins of each party's data
results: Compared with using only local data, parties engaging in collaborative learning via the shared synthetic datasets obtain more accurate estimates of target statistics, especially for small heterogeneous datasets; the more parties participate, the larger and more consistent the improvements become.Abstract
Consider a setting where multiple parties holding sensitive data aim to collaboratively learn population level statistics, but pooling the sensitive data sets is not possible. We propose a framework in which each party shares a differentially private synthetic twin of their data. We study the feasibility of combining such synthetic twin data sets for collaborative learning on real-world health data from the UK Biobank. We discover that parties engaging in the collaborative learning via shared synthetic data obtain more accurate estimates of target statistics compared to using only their local data. This finding extends to the difficult case of small heterogeneous data sets. Furthermore, the more parties participate, the larger and more consistent the improvements become. Finally, we find that data sharing can especially help parties whose data contain underrepresented groups to perform better-adjusted analysis for said groups. Based on our results we conclude that sharing of synthetic twins is a viable method for enabling learning from sensitive data without violating privacy constraints even if individual data sets are small or do not represent the overall population well. The setting of distributed sensitive data is often a bottleneck in biomedical research, which our study shows can be alleviated with privacy-preserving collaborative learning methods.
paper_authors: Chunqiu Steven Xia, Matteo Paltenghi, Jia Le Tian, Michael Pradel, Lingming Zhang
for: This paper presents Fuzz4All, a universal fuzzer based on large language models that can target many different input languages and many different features of these languages.
methods: The paper uses a large language model (LLM) as the input generation and mutation engine, together with an autoprompting technique that creates LLM prompts well suited for fuzzing and an LLM-powered fuzzing loop that iteratively updates the prompt to create new fuzzing inputs.
results: Across six input languages, universal fuzzing achieves higher coverage than existing language-specific fuzzers, and Fuzz4All has identified 76 bugs in widely used systems, 47 of which developers have already confirmed as previously unknown.Abstract
Fuzzing has achieved tremendous success in discovering bugs and vulnerabilities in various software systems. Systems under test (SUTs) that take in programming or formal language as inputs, e.g., compilers, runtime engines, constraint solvers, and software libraries with accessible APIs, are especially important as they are fundamental building blocks of software development. However, existing fuzzers for such systems often target a specific language, and thus cannot be easily applied to other languages or even other versions of the same language. Moreover, the inputs generated by existing fuzzers are often limited to specific features of the input language, and thus can hardly reveal bugs related to other or new features. This paper presents Fuzz4All, the first fuzzer that is universal in the sense that it can target many different input languages and many different features of these languages. The key idea behind Fuzz4All is to leverage large language models (LLMs) as an input generation and mutation engine, which enables the approach to produce diverse and realistic inputs for any practically relevant language. To realize this potential, we present a novel autoprompting technique, which creates LLM prompts that are wellsuited for fuzzing, and a novel LLM-powered fuzzing loop, which iteratively updates the prompt to create new fuzzing inputs. We evaluate Fuzz4All on nine systems under test that take in six different languages (C, C++, Go, SMT2, Java and Python) as inputs. The evaluation shows, across all six languages, that universal fuzzing achieves higher coverage than existing, language-specific fuzzers. Furthermore, Fuzz4All has identified 76 bugs in widely used systems, such as GCC, Clang, Z3, CVC5, OpenJDK, and the Qiskit quantum computing platform, with 47 bugs already confirmed by developers as previously unknown.
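A hedged sketch of the shape of an LLM-powered fuzzing loop like the one described: sample a candidate input from the model, run it against the system under test, and update the prompt. The `llm_generate` stub and the prompt mutations below are hypothetical placeholders, not Fuzz4All's actual autoprompting strategy:

```python
import random
import subprocess
import tempfile

def llm_generate(prompt: str) -> str:
    """Hypothetical stand-in for an LLM completion call; any text-generation
    API could be plugged in here. The canned output keeps the sketch runnable."""
    return "int main() { return %d; }" % random.randint(0, 9)

def fuzz_loop(sut_cmd: list, seed_prompt: str, iterations: int = 100) -> None:
    """Generate, run, and mutate the prompt, flagging abnormal exits."""
    prompt = seed_prompt
    for _ in range(iterations):
        candidate = llm_generate(prompt)
        with tempfile.NamedTemporaryFile("w", suffix=".c", delete=False) as f:
            f.write(candidate)
        result = subprocess.run([*sut_cmd, f.name], capture_output=True)
        # exit codes 0/1 usually mean accepted/rejected input; anything else
        # (e.g., a signal-induced crash) is flagged for inspection
        if result.returncode not in (0, 1):
            print("potential bug, input saved at:", f.name)
        # illustrative prompt update; the real strategies are more involved
        prompt = seed_prompt + random.choice([
            "\nPlease create a mutated variant of the previous program.",
            "\nPlease use a rarely used feature of the language.",
        ])
```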
Optimizing a Transformer-based network for a deep learning seismic processing workflow
results: Experiments show that these modifications make the StorSeismic model pretrain faster and achieve competitive results on realistic Marmousi and offshore field data, while requiring fewer trainable parameters.Abstract
StorSeismic is a recently introduced model based on the Transformer to adapt to various seismic processing tasks through its pretraining and fine-tuning training strategy. In the original implementation, StorSeismic utilized a sinusoidal positional encoding and a conventional self-attention mechanism, both borrowed from natural language processing (NLP) applications. For seismic processing these yielded good results, but also hinted at limitations in efficiency and expressiveness. We propose modifications to these two key components, by utilizing relative positional encoding and low-rank attention matrices as replacements for the vanilla ones. The proposed changes are tested on processing tasks applied to realistic Marmousi and offshore field data as a sequential strategy, starting from denoising, direct arrival removal, multiple attenuation, and finally root-mean-squared velocity ($V_{RMS}$) prediction for normal moveout (NMO) correction. We observe faster pretraining and competitive results on the fine-tuning tasks and, additionally, fewer parameters to train compared to the vanilla model.
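One common low-rank attention construction projects keys and values to a small number of landmark positions so attention costs scale linearly in sequence length. The sketch below is a Linformer-style factorization; the exact low-rank form in the modified StorSeismic, and its relative positional encoding, may differ:

```python
import torch
import torch.nn as nn

class LowRankSelfAttention(nn.Module):
    """Project keys/values to r landmarks: O(n*r) instead of O(n^2)."""
    def __init__(self, dim: int, seq_len: int, rank: int = 32):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.proj_k = nn.Parameter(torch.randn(rank, seq_len) / seq_len**0.5)
        self.proj_v = nn.Parameter(torch.randn(rank, seq_len) / seq_len**0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.q(x), self.k(x), self.v(x)   # (B, n, d)
        k = self.proj_k @ k                         # (B, r, d)
        v = self.proj_v @ v                         # (B, r, d)
        attn = torch.softmax(q @ k.transpose(-2, -1) / x.size(-1)**0.5, dim=-1)
        return attn @ v                             # (B, n, d)

out = LowRankSelfAttention(dim=64, seq_len=512)(torch.randn(2, 512, 64))
```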
Going Deeper with Five-point Stencil Convolutions for Reaction-Diffusion Equations
for: predicting time evolutions of reaction-diffusion type partial differential equations (PDEs) with diverse initial conditions, addressing limitations of physics-informed neural networks (PINNs).
methods: uses five-point stencil convolutional neural networks (FCNNs) with large receptive fields to predict time evolutions, and trains the models using two consecutive snapshots with a time step that satisfies the CFL condition.
results: demonstrates that the proposed deep FCNNs retain certain accuracies for the heat, Fisher’s, and Allen-Cahn equations, in contrast to finite difference methods (FDMs) that blow up.Abstract
Physics-informed neural networks have been widely applied to partial differential equations with great success because the physics-informed loss essentially requires no observations or discretization. However, it is difficult to optimize model parameters, and these parameters must be trained for each distinct initial condition. To overcome these challenges in second-order reaction-diffusion type equations, a possible way is to use five-point stencil convolutional neural networks (FCNNs). FCNNs are trained using two consecutive snapshots, where the time step corresponds to the step size of the given snapshots. Thus, the time evolution of FCNNs depends on the time step, and the time step must satisfy its CFL condition to avoid blow-up solutions. In this work, we propose deep FCNNs that have large receptive fields to predict time evolutions with a time step larger than the threshold of the CFL condition. To evaluate our models, we consider the heat, Fisher's, and Allen-Cahn equations with diverse initial conditions. We demonstrate that deep FCNNs retain certain accuracies, in contrast to FDMs that blow up.
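A minimal sketch of the building block: a 3x3 convolution masked to the five-point stencil (center plus the four axis neighbors), matching the finite-difference pattern; stacking such layers is what enlarges the receptive field for time steps beyond the CFL threshold. The masking approach and single-channel setup are illustrative assumptions:

```python
import torch
import torch.nn as nn

class FivePointStencilConv(nn.Module):
    """3x3 convolution constrained to the five-point stencil pattern."""
    def __init__(self, channels: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        mask = torch.tensor([[0., 1., 0.],
                             [1., 1., 1.],
                             [0., 1., 0.]])
        self.register_buffer("mask", mask)

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        # next snapshot predicted from the current one; corner weights are
        # zeroed so only center + N/S/E/W neighbors contribute
        return nn.functional.conv2d(u, self.conv.weight * self.mask, padding=1)

u0 = torch.rand(1, 1, 64, 64)       # an initial-condition snapshot
u1 = FivePointStencilConv()(u0)     # predicted next snapshot (untrained)
```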
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
results: Compared with prevailing methods, JEN-1 shows clear advantages in both text-music alignment and music quality while maintaining computational efficiency. Demos are available at http://futureverse.com/research/jen/demos/jen1.Abstract
Music generation has attracted growing interest with the advancement of deep generative models. However, generating music conditioned on textual descriptions, known as text-to-music, remains challenging due to the complexity of musical structures and high sampling rate requirements. Despite the task's significance, prevailing generative models exhibit limitations in music quality, computational efficiency, and generalization. This paper introduces JEN-1, a universal high-fidelity model for text-to-music generation. JEN-1 is a diffusion model incorporating both autoregressive and non-autoregressive training. Through in-context learning, JEN-1 performs various generation tasks including text-guided music generation, music inpainting, and continuation. Evaluations demonstrate JEN-1's superior performance over state-of-the-art methods in text-music alignment and music quality while maintaining computational efficiency. Our demos are available at http://futureverse.com/research/jen/demos/jen1
Data-Free Model Extraction Attacks in the Context of Object Detection
results: Using a reasonable number of queries, the proposed data-free model extraction method achieves significant results on the object detection task of predicting bounding box coordinates. The discovery of this vulnerability will support future efforts to secure such models.Abstract
A significant number of machine learning models are vulnerable to model extraction attacks, which focus on stealing the models by using specially curated queries against the target model. This task is well accomplished by using part of the training data or a surrogate dataset to train a new model that mimics a target model in a white-box environment. In pragmatic situations, however, the target models are trained on private datasets that are inaccessible to the adversary. The data-free model extraction technique replaces this problem when it comes to using queries artificially curated by a generator similar to that used in Generative Adversarial Nets. We propose for the first time, to the best of our knowledge, an adversary black box attack extending to a regression problem for predicting bounding box coordinates in object detection. As part of our study, we found that defining a loss function and using a novel generator setup is one of the key aspects in extracting the target model. We find that the proposed model extraction method achieves significant results by using reasonable queries. The discovery of this object detection vulnerability will support future prospects for securing such models.
Slot Induction via Pre-trained Language Model Probing and Multi-level Contrastive Learning
paper_authors: Hoang H. Nguyen, Chenwei Zhang, Ye Liu, Philip S. Yu
For: The paper is written for task-oriented dialogue (TOD) systems, specifically to improve the performance of natural language understanding (NLU) tasks such as intent detection and slot filling.
Methods: The paper proposes a method called Slot Induction (SI) that uses unsupervised pre-trained language model (PLM) probing and contrastive learning to induce slot boundaries without explicit knowledge of token-level slot annotations.
Results: The paper shows that the proposed SI method is effective in the SI task and can bridge the gap with token-level supervised models on two NLU benchmark datasets. Additionally, the SI objectives provide enhanced slot label representations, leading to improved performance on Slot Filling tasks.Abstract
Recent advanced methods in Natural Language Understanding for Task-oriented Dialogue (TOD) Systems (e.g., intent detection and slot filling) require a large amount of annotated data to achieve competitive performance. In reality, token-level annotations (slot labels) are time-consuming and difficult to acquire. In this work, we study the Slot Induction (SI) task whose objective is to induce slot boundaries without explicit knowledge of token-level slot annotations. We propose leveraging Unsupervised Pre-trained Language Model (PLM) Probing and Contrastive Learning mechanism to exploit (1) unsupervised semantic knowledge extracted from PLM, and (2) additional sentence-level intent label signals available from TOD. Our approach is shown to be effective in SI task and capable of bridging the gaps with token-level supervised models on two NLU benchmark datasets. When generalized to emerging intents, our SI objectives also provide enhanced slot label representations, leading to improved performance on the Slot Filling tasks.
Generative Perturbation Analysis for Probabilistic Black-Box Anomaly Attribution
methods: The paper uses a new framework, Counterfactual Variational Bayes (CVB), to compute the distribution of the attribution score of each input variable.
results: The paper obtains an anomaly attribution method that is free from the deviation-agnostic property and can quantify the uncertainty of the attribution scores.Abstract
We address the task of probabilistic anomaly attribution in the black-box regression setting, where the goal is to compute the probability distribution of the attribution score of each input variable, given an observed anomaly. The training dataset is assumed to be unavailable. This task differs from the standard XAI (explainable AI) scenario, since we wish to explain the anomalous deviation from a black-box prediction rather than the black-box model itself. We begin by showing that mainstream model-agnostic explanation methods, such as the Shapley values, are not suitable for this task because of their ``deviation-agnostic property.'' We then propose a novel framework for probabilistic anomaly attribution that allows us to not only compute attribution scores as the predictive mean but also quantify the uncertainty of those scores. This is done by considering a generative process for perturbations that counter-factually bring the observed anomalous observation back to normalcy. We introduce a variational Bayes algorithm for deriving the distributions of per variable attribution scores. To the best of our knowledge, this is the first probabilistic anomaly attribution framework that is free from being deviation-agnostic.
Pareto Invariant Representation Learning for Multimedia Recommendation
results: Comparisons on three public multimedia recommendation datasets show that the PaInvRL model achieves strong learning performance both within and across environmental distributions.Abstract
Multimedia recommendation involves personalized ranking tasks, where multimedia content is usually represented using a generic encoder. However, these generic representations introduce spurious correlations that fail to reveal users' true preferences. Existing works attempt to alleviate this problem by learning invariant representations, but overlook the balance between independent and identically distributed (IID) and out-of-distribution (OOD) generalization. In this paper, we propose a framework called Pareto Invariant Representation Learning (PaInvRL) to mitigate the impact of spurious correlations from an IID-OOD multi-objective optimization perspective, by learning invariant representations (intrinsic factors that attract user attention) and variant representations (other factors) simultaneously. Specifically, PaInvRL includes three iteratively executed modules: (i) heterogeneous identification module, which identifies the heterogeneous environments to reflect distributional shifts for user-item interactions; (ii) invariant mask generation module, which learns invariant masks based on the Pareto-optimal solutions that minimize the adaptive weighted Invariant Risk Minimization (IRM) and Empirical Risk (ERM) losses; (iii) convert module, which generates both variant representations and item-invariant representations for training a multi-modal recommendation model that mitigates spurious correlations and balances the generalization performance within and cross the environmental distributions. We compare the proposed PaInvRL with state-of-the-art recommendation models on three public multimedia recommendation datasets (Movielens, Tiktok, and Kwai), and the experimental results validate the effectiveness of PaInvRL for both within- and cross-environmental learning.
A Feature Set of Small Size for the PDF Malware Detection
results: The study finds that the Random Forest model achieves the highest accuracy of 99.75%, and the feature set of 12 features is one of the most concise in the field of PDF malware detection.Abstract
Machine learning (ML)-based malware detection systems are becoming increasingly important as malware threats increase and get more sophisticated. PDF files are often used as vectors for phishing attacks because they are widely regarded as trustworthy data resources, and are accessible across different platforms. Therefore, researchers have developed many different PDF malware detection methods. Performance in detecting PDF malware is greatly influenced by feature selection. In this research, we propose a small features set that don't require too much domain knowledge of the PDF file. We evaluate proposed features with six different machine learning models. We report the best accuracy of 99.75% when using Random Forest model. Our proposed feature set, which consists of just 12 features, is one of the most conciseness in the field of PDF malware detection. Despite its modest size, we obtain comparable results to state-of-the-art that employ a much larger set of features.
An Analytical Study of Covid-19 Dataset using Graph-Based Clustering Algorithms
results: The study finds that the protein-protein interaction networks of the COVID-19 genes exhibit strong density and connectivity, characteristics that may be related to the development and progression of the disease.Abstract
Coronavirus Disease, abbreviated as COVID-19, is caused by a novel virus that was initially identified in Wuhan, China in December 2019, and this deadly disease has since spread all over the world. According to the World Health Organization (WHO), a total of 3,124,905 people died from 2019 to April 2021. In this context, many methods, AI-based techniques, and machine learning algorithms have been researched and are being used to save people from this pandemic. The SARS-CoV and 2019-nCoV (SARS-CoV-2) viruses invade our bodies, causing some differences in the structure of cell proteins. Protein-protein interaction (PPI) is an essential process in our cells and plays a very important role in the development of medicines and gives ideas about the disease. In this study, we performed clustering on PPI networks generated from 92 genes of the Covid-19 dataset. We have used three graph-based clustering algorithms to give intuition to the analysis of clusters.
Explainable AI in Orthopedics: Challenges, Opportunities, and Prospects
paper_authors: Soheyla Amirian, Luke A. Carlson, Matthew F. Gong, Ines Lohse, Kurt R. Weiss, Johannes F. Plate, Ahmad P. Tafti
for: The paper is written to address the challenge of explainable AI (XAI) in orthopedics and to emphasize the need for interdisciplinary collaborations to establish standards and guidelines for the adoption of XAI in orthopedics.
methods: The paper uses a combination of AI models and algorithms that prioritize transparency and interpretability to address the challenge of XAI in orthopedics.
results: The paper highlights the need for interdisciplinary collaborations between AI practitioners, orthopedic specialists, and regulatory entities to establish standards and guidelines for the adoption of XAI in orthopedics.
Abstract
While artificial intelligence (AI) has made many successful applications in various domains, its adoption in healthcare lags slightly behind other high-stakes settings. Several factors contribute to this slower uptake, including regulatory frameworks, patient privacy concerns, and data heterogeneity. However, one significant challenge that impedes the implementation of AI in healthcare, particularly in orthopedics, is the lack of explainability and interpretability around AI models. Addressing the challenge of explainable AI (XAI) in orthopedics requires developing AI models and algorithms that prioritize transparency and interpretability, allowing clinicians, surgeons, and patients to understand the contributing factors behind any AI-powered predictive or descriptive models. The current contribution outlines several key challenges and opportunities that manifest in XAI in orthopedic practice. This work emphasizes the need for interdisciplinary collaborations between AI practitioners, orthopedic specialists, and regulatory entities to establish standards and guidelines for the adoption of XAI in orthopedics.
Finite Element Operator Network for Solving Parametric PDEs
methods: Proposes the Finite Element Operator Network (FEONet), which combines deep learning with traditional numerical methods, in particular the finite element method, to solve parametric PDEs.
results: Experiments on several benchmark problems demonstrate that the approach outperforms existing state-of-the-art methods in accuracy, generalization, and computational flexibility.
Abstract
Partial differential equations (PDEs) underlie our understanding and prediction of natural phenomena across numerous fields, including physics, engineering, and finance. However, solving parametric PDEs is a complex task that necessitates efficient numerical methods. In this paper, we propose a novel approach for solving parametric PDEs using a Finite Element Operator Network (FEONet). Our proposed method leverages the power of deep learning in conjunction with traditional numerical methods, specifically the finite element method, to solve parametric PDEs in the absence of any paired input-output training data. We demonstrate the effectiveness of our approach on several benchmark problems and show that it outperforms existing state-of-the-art methods in terms of accuracy, generalization, and computational flexibility. Our FEONet framework shows potential for application in various fields where PDEs play a crucial role in modeling complex domains with diverse boundary conditions and singular behavior. Furthermore, we provide theoretical convergence analysis to support our approach, utilizing finite element approximation in numerical analysis.
Two Novel Approaches to Detect Community: A Case Study of Omicron Lineage Variants PPI Network
results: The study finds that community structure in the variant B.1.1.529 network can be detected with different algorithms, and that the detected communities exhibit distinctive characteristics and properties.
Abstract
The capacity to identify and analyze protein-protein interactions, along with their internal modular organization, plays a crucial role in comprehending the intricate mechanisms underlying biological processes at the molecular level. We can learn a lot about the structure and dynamics of these interactions by using network analysis. We can improve our understanding of the biological roots of disease pathogenesis by recognizing network communities. This knowledge, in turn, holds significant potential for driving advancements in drug discovery and facilitating personalized medicine approaches for disease treatment. In this study, we aimed to uncover the communities within the variant B.1.1.529 (Omicron virus) network using two proposed novel algorithms (ABCDE and ALCDE) and four widely recognized algorithms: Girvan-Newman, Louvain, Leiden, and the Label Propagation algorithm. Each of these algorithms has established prominence in the field and offers unique perspectives on identifying communities within complex networks. We also compare the networks by their global properties, summary statistics, subgraph counts, and graphlets, and validate the results by modularity. By employing these approaches, we sought to gain deeper insights into the structural organization and interconnections present within the Omicron virus network.
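ABCDE and ALCDE are the authors' own algorithms and Leiden requires an external package (e.g., leidenalg), so a minimal networkx sketch can only compare the remaining three baselines by modularity; the edge-list filename is an assumption.

import networkx as nx
from networkx.algorithms import community

G = nx.read_edgelist("omicron_ppi.edgelist")  # assumed PPI edge list

partitions = {
    "louvain": community.louvain_communities(G, seed=0),
    "label_propagation": list(community.label_propagation_communities(G)),
    # Girvan-Newman is hierarchical; take the first split for illustration
    "girvan_newman": [set(c) for c in next(community.girvan_newman(G))],
}

for name, comms in partitions.items():
    q = community.modularity(G, comms)
    print(f"{name}: {len(comms)} communities, modularity = {q:.3f}")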
TBIN: Modeling Long Textual Behavior Data for CTR Prediction
results: Experimental results show that TBIN improves CTR prediction and performs well on a real-world food recommendation platform.
Abstract
Click-through rate (CTR) prediction plays a pivotal role in the success of recommendations. Inspired by the recent thriving of language models (LMs), a surge of works improve prediction by organizing user behavior data in a \textbf{textual} format and using LMs to understand user interest at a semantic level. While promising, these works have to truncate the textual data to reduce the quadratic computational overhead of self-attention in LMs. However, it has been studied that long user behavior data can significantly benefit CTR prediction. In addition, these works typically condense user diverse interests into a single feature vector, which hinders the expressive capability of the model. In this paper, we propose a \textbf{T}extual \textbf{B}ehavior-based \textbf{I}nterest Chunking \textbf{N}etwork (TBIN), which tackles the above limitations by combining an efficient locality-sensitive hashing algorithm and a shifted chunk-based self-attention. The resulting user diverse interests are dynamically activated, producing user interest representation towards the target item. Finally, the results of both offline and online experiments on real-world food recommendation platform demonstrate the effectiveness of TBIN.
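TBIN's locality-sensitive hashing stage is not reproduced here; the sketch below shows only the shifted chunk-based self-attention idea, assuming q = k = v for brevity, which replaces the quadratic cost of full attention with a per-chunk cost.

import torch

def chunked_self_attention(x, chunk_size, shift=False):
    # x: (batch, seq_len, dim); seq_len assumed divisible by chunk_size
    B, T, D = x.shape
    if shift:  # shift by half a chunk so information crosses chunk borders
        x = torch.roll(x, shifts=chunk_size // 2, dims=1)
    xc = x.view(B, T // chunk_size, chunk_size, D)
    attn = torch.softmax(xc @ xc.transpose(-1, -2) / D ** 0.5, dim=-1)
    out = (attn @ xc).view(B, T, D)
    if shift:
        out = torch.roll(out, shifts=-(chunk_size // 2), dims=1)
    return out

x = torch.randn(2, 128, 64)                      # toy behavior sequence
y = chunked_self_attention(x, chunk_size=16, shift=True)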
A General Implicit Framework for Fast NeRF Composition and Rendering
for: This paper aims to provide a general implicit pipeline for composing NeRF objects quickly, enabling the casting of dynamic shadows within or between objects using analytical light sources, and allowing multiple NeRF objects to be seamlessly placed and rendered together with any arbitrary rigid transformations.
methods: The proposed method introduces a new surface representation known as Neural Depth Fields (NeDF), which quickly determines the spatial relationship between objects by allowing direct intersection computation between rays and implicit surfaces. It leverages an intersection neural network to query NeRF for acceleration instead of depending on an explicit spatial structure.
results: The proposed method is the first to enable both the progressive and interactive composition of NeRF objects, and it also serves as a previewing plugin for a range of existing NeRF works.
Abstract
A variety of Neural Radiance Fields (NeRF) methods have recently achieved remarkable success in high render speed. However, current accelerating methods are specialized and incompatible with various implicit methods, preventing real-time composition over various types of NeRF works. Because NeRF relies on sampling along rays, it is possible to provide general guidance for acceleration. To that end, we propose a general implicit pipeline for composing NeRF objects quickly. Our method enables the casting of dynamic shadows within or between objects using analytical light sources while allowing multiple NeRF objects to be seamlessly placed and rendered together with any arbitrary rigid transformations. Mainly, our work introduces a new surface representation known as Neural Depth Fields (NeDF) that quickly determines the spatial relationship between objects by allowing direct intersection computation between rays and implicit surfaces. It leverages an intersection neural network to query NeRF for acceleration instead of depending on an explicit spatial structure. Our proposed method is the first to enable both the progressive and interactive composition of NeRF objects. Additionally, it also serves as a previewing plugin for a range of existing NeRF works.
Classification of lung cancer subtypes on CT images with synthetic pathological priors
results: Experimental results demonstrate the superiority of the proposed model for lung cancer subtype classification, with significant accuracy improvements over several state-of-the-art (SOTA) classification models in terms of accuracy (ACC), area under the curve (AUC), and F1 score.
Abstract
The accurate diagnosis of pathological subtypes of lung cancer is of significant importance for follow-up treatment and prognosis management. In this paper, we propose a self-generating hybrid feature network (SGHF-Net) for accurately classifying lung cancer subtypes on computed tomography (CT) images. Inspired by studies stating that cross-scale associations exist in the image patterns between the same case's CT images and its pathological images, we innovatively developed a pathological feature synthetic module (PFSM), which quantitatively maps cross-modality associations through deep neural networks, to derive the "gold standard" information contained in the corresponding pathological images from CT images. Additionally, we designed a radiological feature extraction module (RFEM) to directly acquire CT image information and integrated it with the pathological priors under an effective feature fusion framework, enabling the entire classification model to generate more indicative and specific pathologically related features and eventually output more accurate predictions. The superiority of the proposed model lies in its ability to self-generate hybrid features that contain multi-modality image information based on a single-modality input. To evaluate the effectiveness, adaptability, and generalization ability of our model, we performed extensive experiments on a large-scale multi-center dataset (i.e., 829 cases from three hospitals) to compare our model with a series of state-of-the-art (SOTA) classification models. The experimental results demonstrated the superiority of our model for lung cancer subtype classification, with significant improvements in terms of accuracy (ACC), area under the curve (AUC), and F1 score.
Efficient Bayesian Optimization with Deep Kernel Learning and Transformer Pre-trained on Multiple Heterogeneous Datasets
results: The proposed method outperforms existing approaches on both synthetic and real benchmark problems.
Abstract
Bayesian optimization (BO) is widely adopted in black-box optimization problems and it relies on a surrogate model to approximate the black-box response function. With the increasing number of black-box optimization tasks solved and even more to solve, the ability to learn from multiple prior tasks to jointly pre-train a surrogate model is long-awaited to further boost optimization efficiency. In this paper, we propose a simple approach to pre-train a surrogate, which is a Gaussian process (GP) with a kernel defined on deep features learned from a Transformer-based encoder, using datasets from prior tasks with possibly heterogeneous input spaces. In addition, we provide a simple yet effective mix-up initialization strategy for input tokens corresponding to unseen input variables and therefore accelerate new tasks' convergence. Experiments on both synthetic and real benchmark problems demonstrate the effectiveness of our proposed pre-training and transfer BO strategy over existing methods.
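A minimal sketch of the core idea, assuming a frozen pre-trained encoder: an RBF kernel is evaluated on encoder features instead of raw inputs, and the standard GP posterior gives the surrogate's mean and variance. The encoder callable and all hyperparameters are placeholders, not the paper's implementation.

import torch

def deep_kernel(fa, fb, lengthscale=1.0):
    # RBF kernel on learned features phi(x) rather than raw inputs
    d2 = torch.cdist(fa, fb) ** 2
    return torch.exp(-0.5 * d2 / lengthscale ** 2)

def gp_posterior(encoder, x_train, y_train, x_test, noise=1e-3):
    # encoder maps inputs to 2-D feature matrices; frozen here for simplicity
    with torch.no_grad():
        f_tr, f_te = encoder(x_train), encoder(x_test)
    K = deep_kernel(f_tr, f_tr) + noise * torch.eye(len(x_train))
    k_star = deep_kernel(f_te, f_tr)
    mean = k_star @ torch.linalg.solve(K, y_train)
    var = (1.0 - (k_star @ torch.linalg.solve(K, k_star.T)).diagonal()).clamp_min(0.0)
    return mean, var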
Assessing the performance of deep learning-based models for prostate cancer segmentation using uncertainty scores
results: The study finds that the Attention R2U-Net model achieves the best performance for segmenting all zones, with a mean Intersection over Union (IoU) of 76.3% and a Dice Similarity Coefficient (DSC) of 85%, and exhibits the lowest uncertainty values, particularly at the boundaries of the transition zone and tumor.
Abstract
This study focuses on comparing deep learning methods for the segmentation and quantification of uncertainty in prostate segmentation from MRI images. The aim is to improve the workflow of prostate cancer detection and diagnosis. Seven different U-Net-based architectures, augmented with Monte-Carlo dropout, are evaluated for automatic segmentation of the central zone, peripheral zone, transition zone, and tumor, with uncertainty estimation. The top-performing model in this study is the Attention R2U-Net, achieving a mean Intersection over Union (IoU) of 76.3% and Dice Similarity Coefficient (DSC) of 85% for segmenting all zones. Additionally, Attention R2U-Net exhibits the lowest uncertainty values, particularly in the boundaries of the transition zone and tumor, when compared to the other models.
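The Monte-Carlo dropout mechanics are standard, so a short sketch may help: only dropout layers are kept stochastic at test time, several forward passes are averaged, and the per-pixel predictive entropy serves as the uncertainty map. The model and input shapes are assumptions.

import torch

def enable_dropout(model):
    # Keep only Dropout layers stochastic; BatchNorm etc. stay in eval mode
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()

def mc_dropout_segment(model, image, n_samples=20):
    model.eval()
    enable_dropout(model)
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(image), dim=1)
                             for _ in range(n_samples)])
    mean = probs.mean(0)                              # (B, classes, H, W)
    entropy = -(mean * (mean + 1e-8).log()).sum(1)    # per-pixel uncertainty
    return mean.argmax(1), entropy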
Deep Metric Learning for the Hemodynamics Inference with Electrocardiogram Signals
results: The study finds that the self-supervised DML method performs well across patient subgroups, even when some subgroups are under-represented, and its performance is validated through comparison with existing baseline methods.
Abstract
Heart failure is a debilitating condition that affects millions of people worldwide and has a significant impact on their quality of life and mortality rates. An objective assessment of cardiac pressures remains an important method for the diagnosis and treatment prognostication for patients with heart failure. Although cardiac catheterization is the gold standard for estimating central hemodynamic pressures, it is an invasive procedure that carries inherent risks, making it a potentially dangerous procedure for some patients. Approaches that leverage non-invasive signals - such as electrocardiogram (ECG) - have the promise to make the routine estimation of cardiac pressures feasible in both inpatient and outpatient settings. Prior models trained to estimate intracardiac pressures (e.g., mean pulmonary capillary wedge pressure (mPCWP)) in a supervised fashion have shown good discriminatory ability but have been limited to the labeled dataset from the heart failure cohort. To address this issue and build a robust representation, we apply deep metric learning (DML) and propose a novel self-supervised DML with distance-based mining that improves the performance of a model with limited labels. We use a dataset that contains over 5.4 million ECGs without concomitant central pressure labels to pre-train a self-supervised DML model which showed improved classification of elevated mPCWP compared to self-supervised contrastive baselines. Additionally, the supervised DML model that is using ECGs with access to 8,172 mPCWP labels demonstrated significantly better performance on the mPCWP regression task compared to the supervised baseline. Moreover, our data suggest that DML yields models that are performant across patient subgroups, even when some patient subgroups are under-represented in the dataset. Our code is available at https://github.com/mandiehyewon/ssldml
Enhancing Optimization Performance: A Novel Hybridization of Gaussian Crunching Search and Powell’s Method for Derivative-Free Optimization
results: Experiments show that the hybrid approach significantly boosts optimization performance while retaining the respective advantages of each method, opening up new possibilities for optimizing complex systems.
Abstract
This research paper presents a novel approach to enhance optimization performance through the hybridization of Gaussian Crunching Search (GCS) and Powell's Method for derivative-free optimization. While GCS has shown promise in overcoming challenges faced by traditional derivative-free optimization methods [1], it may not always excel in finding the local minimum. On the other hand, some traditional methods may have better performance in this regard. However, GCS demonstrates its strength in escaping the trap of local minima and approaching the global minima. Through experimentation, we discovered that by combining GCS with certain traditional derivative-free optimization methods, we can significantly boost performance while retaining the respective advantages of each method. This hybrid approach opens up new possibilities for optimizing complex systems and finding optimal solutions in a range of applications.
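The GCS internals are the authors' own, so the sketch below substitutes a simple Gaussian-perturbation search with a shrinking step size as a stand-in for the global phase, then hands its result to SciPy's Powell implementation for derivative-free local refinement.

import numpy as np
from scipy.optimize import minimize

def rosenbrock(x):
    return sum(100 * (x[1:] - x[:-1] ** 2) ** 2 + (1 - x[:-1]) ** 2)

def gaussian_search(f, x0, sigma=1.0, iters=500, seed=0):
    # Stand-in for GCS: accept Gaussian perturbations that improve f,
    # shrinking the step size over time to "crunch" toward good regions
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, float)
    fx = f(x)
    for i in range(iters):
        cand = x + rng.normal(scale=sigma * (1 - i / iters) + 1e-3, size=x.shape)
        fc = f(cand)
        if fc < fx:
            x, fx = cand, fc
    return x

x_global = gaussian_search(rosenbrock, x0=np.full(4, 3.0))   # global phase
result = minimize(rosenbrock, x_global, method="Powell")     # local refinement
print(result.x, result.fun)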
Sparse Binary Transformers for Multivariate Time Series Modeling
paper_authors: Matt Gorbett, Hossein Shirazi, Indrakshi Ray
for: Lightweight deep learning models for multivariate time series problems
methods: Transformer models with sparse and binary weights
results: Achieves favorable results on three time series learning tasks (classification, anomaly detection, and single-step forecasting), while two attention modifications (a fixed mask for classification and a current-time-step mask for forecasting and anomaly detection) together with compression substantially reduce the number of non-zero operations in the Transformer with little to no drop in performance.
Abstract
Compressed Neural Networks have the potential to enable deep learning across new applications and smaller computational environments. However, understanding the range of learning tasks in which such models can succeed is not well studied. In this work, we apply sparse and binary-weighted Transformers to multivariate time series problems, showing that the lightweight models achieve accuracy comparable to that of dense floating-point Transformers of the same structure. Our model achieves favorable results across three time series learning tasks: classification, anomaly detection, and single-step forecasting. Additionally, to reduce the computational complexity of the attention mechanism, we apply two modifications, which show little to no decline in model performance: 1) in the classification task, we apply a fixed mask to the query, key, and value activations, and 2) for forecasting and anomaly detection, which rely on predicting outputs at a single point in time, we propose an attention mask to allow computation only at the current time step. Together, each compression technique and attention modification substantially reduces the number of non-zero operations necessary in the Transformer. We measure the computational savings of our approach over a range of metrics including parameter count, bit size, and floating point operation (FLOPs) count, showing up to a 53x reduction in storage size and up to 10.5x reduction in FLOPs.
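The binarization and pruning details cannot be reproduced from the abstract alone, but the proposed current-time-step attention mask is simple enough to sketch: only the last query attends over the history, so per-layer attention cost falls from O(T^2) to O(T). Shapes are assumptions.

import torch

def last_step_attention(q, k, v):
    # Only the current (last) time step attends to the full history
    q_last = q[:, -1:, :]                                        # (B, 1, D)
    scores = q_last @ k.transpose(-1, -2) / q.shape[-1] ** 0.5   # (B, 1, T)
    return torch.softmax(scores, dim=-1) @ v                     # (B, 1, D)

B, T, D = 8, 96, 32
q = k = v = torch.randn(B, T, D)
out = last_step_attention(q, k, v)   # representation for single-step forecasting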
Multiclass Online Learnability under Bandit Feedback
results: The result complements the recent work of hanneke2023multiclass, who show that the Littlestone dimension characterizes online multiclass learnability in the full-information setting with an unbounded label space; under bandit feedback, learnability is instead characterized by the finiteness of the Bandit Littlestone dimension.
Abstract
We study online multiclass classification under bandit feedback. We extend the results of (daniely2013price) by showing that the finiteness of the Bandit Littlestone dimension is necessary and sufficient for bandit online multiclass learnability even when the label space is unbounded. Our result complements the recent work by (hanneke2023multiclass) who show that the Littlestone dimension characterizes online multiclass learnability in the full-information setting when the label space is unbounded.
摘要
我们研究在抽奖式多类分类中的在线学习。我们将(daniely2013price)的结果推广到不确定label空间的情况下,证明在线多类学习的可学习性需要和充分条件是抽奖式Littlestone维度的 фиnisiteness。我们的结果与(hanneke2023multiclass)的最近研究相 complement,其证明在全信息设置下,无限大的label空间下的多类学习可学习性是Littlestone维度的Characterize。
Improved Activation Clipping for Universal Backdoor Mitigation and Test-Time Detection
paper_authors: Hang Wang, Zhen Xiang, David J. Miller, George Kesidis
for: Defending deep neural networks against backdoor (Trojan) attacks, in which an attacker poisons the training set with backdoor triggers so that the network classifies test-time triggers to the attacker's designated target class.
methods: A new post-training strategy that learns bounds on internal-layer activations from a small set of clean samples, with the bounds chosen to explicitly limit classification margins and thus constrain the abnormally large activations induced by backdoor poisoning.
results: The method outperforms peer methods on CIFAR-10 image classification, shows strong robustness against adaptive attacks, X2X attacks, and different datasets, and extends to test-time detection and correction based on output differences between the original and activation-bounded networks.
Abstract
Deep neural networks are vulnerable to backdoor attacks (Trojans), where an attacker poisons the training set with backdoor triggers so that the neural network learns to classify test-time triggers to the attacker's designated target class. Recent work shows that backdoor poisoning induces over-fitting (abnormally large activations) in the attacked model, which motivates a general, post-training clipping method for backdoor mitigation, i.e., with bounds on internal-layer activations learned using a small set of clean samples. We devise a new such approach, choosing the activation bounds to explicitly limit classification margins. This method gives superior performance against peer methods for CIFAR-10 image classification. We also show that this method has strong robustness against adaptive attacks, X2X attacks, and on different datasets. Finally, we demonstrate a method extension for test-time detection and correction based on the output differences between the original and activation-bounded networks. The code of our method is online available.
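The paper selects activation bounds by explicitly limiting classification margins; as a rough illustration of the mechanics only, the sketch below estimates per-layer bounds from a small clean set with a simple quantile rule and clamps activations via forward hooks. The quantile heuristic and all names are assumptions, not the paper's criterion.

import torch

def fit_activation_bounds(model, layers, clean_loader, q=0.99):
    # Record activations of the chosen layers on a small clean set
    acts = {l: [] for l in layers}
    hooks = [l.register_forward_hook(
        lambda m, i, o, key=l: acts[key].append(o.detach().flatten()))
        for l in layers]
    with torch.no_grad():
        for x, _ in clean_loader:
            model(x)
    for h in hooks:
        h.remove()
    return {l: torch.quantile(torch.cat(acts[l]), q) for l in layers}

def install_clipping(layers, bounds):
    # Clamp each monitored layer's output to its bound at inference time
    for l in layers:
        l.register_forward_hook(
            lambda m, i, o, b=bounds[l]: torch.clamp(o, max=b.item()))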
Machine Learning, Deep Learning and Data Preprocessing Techniques for Detection, Prediction, and Monitoring of Stress and Stress-related Mental Disorders: A Scoping Review
results: The review finds that Support Vector Machine (SVM), Neural Network (NN), and Random Forest (RF) models consistently exhibit the highest accuracy and robustness among the machine learning algorithms examined, and that physiological parameters such as heart rate measurements and skin response are the most commonly used data types for stress prediction.
Abstract
This comprehensive review systematically evaluates Machine Learning (ML) methodologies employed in the detection, prediction, and analysis of mental stress and its consequent mental disorders (MDs). Utilizing a rigorous scoping review process, the investigation delves into the latest ML algorithms, preprocessing techniques, and data types employed in the context of stress and stress-related MDs. The findings highlight that Support Vector Machine (SVM), Neural Network (NN), and Random Forest (RF) models consistently exhibit superior accuracy and robustness among all machine learning algorithms examined. Furthermore, the review underscores that physiological parameters, such as heart rate measurements and skin response, are prevalently used as stress predictors in ML algorithms. This is attributed to their rich explanatory information concerning stress and stress-related MDs, as well as the relative ease of data acquisition. Additionally, the application of dimensionality reduction techniques, including mappings, feature selection, filtering, and noise reduction, is frequently observed as a crucial step preceding the training of ML algorithms. The synthesis of this review identifies significant research gaps and outlines future directions for the field. These encompass areas such as model interpretability, model personalization, the incorporation of naturalistic settings, and real-time processing capabilities for detection and prediction of stress and stress-related MDs.
Sparse Array Design for Direction Finding using Deep Learning
results: Numerical experiments illustrate the performance of model-based optimization and DL techniques, and several practical applications of sparse array design are discussed, including cognitive radar, wireless communications, and integrated sensing and communications (ISAC).
Abstract
In the past few years, deep learning (DL) techniques have been introduced for designing sparse arrays. These methods offer the advantages of feature engineering and low prediction-stage complexity, which is helpful in tackling the combinatorial search inherent to finding a sparse array. In this chapter, we provide a synopsis of several direction finding applications of DL-based sparse arrays. We begin by examining supervised and transfer learning techniques that have applications in selecting sparse arrays for a cognitive radar application. Here, we also discuss the use of meta-heuristic learning algorithms such as simulated annealing for the case of designing two-dimensional sparse arrays. Next, we consider DL-based antenna selection for wireless communications, wherein sparse array problem may also be combined with channel estimation, beamforming, or localization. Finally, we provide an example of deep sparse array technique for integrated sensing and communications (ISAC) application, wherein a trade-off of radar and communications performance makes ISAC sparse array problem very challenging. For each setting, we illustrate the performance of model-based optimization and DL techniques through several numerical experiments. We discuss additional considerations required to ensure robustness of DL-based algorithms against various imperfections in array data.
Deep Learning Driven Detection of Tsunami Related Internal Gravity Waves: a path towards open-ocean natural hazards detection
paper_authors: Valentino Constantinou, Michela Ravanelli, Hamlin Liu, Jacob Bortnik
for: Detecting the signatures of tsunami-driven internal gravity waves in the ionosphere to improve the accuracy of early warning systems.
methods: Combines GNSS data with deep learning, using slant total electron content (sTEC) from the VARION algorithm together with Gramian Angular Difference Fields (from computer vision) and convolutional neural networks (CNNs) to detect the waves in near-real-time.
results: The approach detects the resulting ionospheric disturbances in near-real-time, achieving a 91.7% F1 score with models trained on the 2010 Maule, 2011 Tohoku, and 2012 Haida Gwaii events and validated out-of-sample on the 2015 Illapel earthquake and tsunami.
Abstract
Tsunamis can trigger internal gravity waves (IGWs) in the ionosphere, perturbing the Total Electron Content (TEC) - referred to as Traveling Ionospheric Disturbances (TIDs) that are detectable through the Global Navigation Satellite System (GNSS). The GNSS are constellations of satellites providing signals from Earth orbit - Europe's Galileo, the United States' Global Positioning System (GPS), Russia's Global'naya Navigatsionnaya Sputnikovaya Sistema (GLONASS) and China's BeiDou. The real-time detection of TIDs provides an approach for tsunami detection, enhancing early warning systems by providing open-ocean coverage in geographic areas not serviceable by buoy-based warning systems. Large volumes of GNSS data are leveraged by deep learning, which effectively handles complex non-linear relationships across thousands of data streams. We describe a framework leveraging slant total electron content (sTEC) from the VARION (Variometric Approach for Real-Time Ionosphere Observation) algorithm by Gramian Angular Difference Fields (from Computer Vision) and Convolutional Neural Networks (CNNs) to detect TIDs in near-real-time. Historical data from the 2010 Maule, 2011 Tohoku and the 2012 Haida-Gwaii earthquakes and tsunamis are used in model training, and the later-occurring 2015 Illapel earthquake and tsunami in Chile is used for out-of-sample model validation. Using the experimental framework described in the paper, we achieved a 91.7% F1 score. Source code is available at: https://github.com/vc1492a/tidd. Our work represents a new frontier in detecting tsunami-driven IGWs in open-ocean, dramatically improving the potential for natural hazards detection for coastal communities.
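The Gramian Angular Difference Field step is standard and compact enough to sketch: the sTEC trace is rescaled to [-1, 1], mapped to angles, and turned into an image of pairwise angular differences that the CNN consumes. The synthetic input trace below is a placeholder, not real VARION output.

import numpy as np

def gramian_angular_difference_field(series):
    # GADF[i, j] = sin(phi_i - phi_j), with phi = arccos of the rescaled series
    x = np.asarray(series, float)
    x = 2 * (x - x.min()) / (x.max() - x.min() + 1e-12) - 1
    phi = np.arccos(np.clip(x, -1, 1))
    return np.sin(phi[:, None] - phi[None, :])

stec = np.sin(np.linspace(0, 6 * np.pi, 128))   # stand-in for a VARION sTEC trace
img = gramian_angular_difference_field(stec)    # (128, 128) input image for the CNN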
PSRFlow: Probabilistic Super Resolution with Flow-Based Models for Scientific Data
results: Outperforms interpolation and GAN-based super-resolution networks while correctly quantifying the uncertainty of the super-resolved results.
Abstract
Although many deep-learning-based super-resolution approaches have been proposed in recent years, because no ground truth is available in the inference stage, few can quantify the errors and uncertainties of the super-resolved results. For scientific visualization applications, however, conveying uncertainties of the results to scientists is crucial to avoid generating misleading or incorrect information. In this paper, we propose PSRFlow, a novel normalizing flow-based generative model for scientific data super-resolution that incorporates uncertainty quantification into the super-resolution process. PSRFlow learns the conditional distribution of the high-resolution data based on the low-resolution counterpart. By sampling from a Gaussian latent space that captures the missing information in the high-resolution data, one can generate different plausible super-resolution outputs. The efficient sampling in the Gaussian latent space allows our model to perform uncertainty quantification for the super-resolved results. During model training, we augment the training data with samples across various scales to make the model adaptable to data of different scales, achieving flexible super-resolution for a given input. Our results demonstrate superior performance and robust uncertainty quantification compared with existing methods such as interpolation and GAN-based super-resolution networks.
results: The survey's main finding is that the classical client-server FL architecture suffers from security and reliability weaknesses, such as the single-point-of-failure risk of the central server and man-in-the-middle attacks; it reviews decentralized FL approaches that address these issues and outlines promising directions for future research.
Abstract
In recent years, federated learning (FL) has become a very popular paradigm for training distributed, large-scale, and privacy-preserving machine learning (ML) systems. In contrast to standard ML, where data must be collected at the exact location where training is performed, FL takes advantage of the computational capabilities of millions of edge devices to collaboratively train a shared, global model without disclosing their local private data. Specifically, in a typical FL system, the central server acts only as an orchestrator; it iteratively gathers and aggregates all the local models trained by each client on its private data until convergence. Although FL undoubtedly has several benefits over traditional ML (e.g., it protects private data ownership by design), it suffers from several weaknesses. One of the most critical challenges is to overcome the centralized orchestration of the classical FL client-server architecture, which is known to be vulnerable to single-point-of-failure risks and man-in-the-middle attacks, among others. To mitigate such exposure, decentralized FL solutions have emerged where all FL clients cooperate and communicate without a central server. This survey comprehensively summarizes and reviews existing decentralized FL approaches proposed in the literature. Furthermore, it identifies emerging challenges and suggests promising research directions in this under-explored domain.
Deep Learning based Image Watermarking: A Brief Survey
paper_authors: Xin Zhong, Arjon Das, Fahad Alrasheedi, Abdullah Tanvir
for: Protecting images against unauthorized use and distribution
methods: Deep learning techniques, categorized into Embedder-Extractor Joint Training, Deep Networks as a Feature Transformation, and Hybrid schemes
results: Analyzes existing deep learning-based image watermarking techniques and outlines potential directions for future research.
Abstract
The act of secretly embedding and extracting a watermark on a cover image to protect it is known as image watermarking. In recent years, deep learning-based image watermarking techniques have been emerging one after another. To study the state-of-the-art, this survey categorizes cutting-edge deep learning-based image watermarking techniques into Embedder-Extractor Joint Training, Deep Networks as a Feature Transformation, and Hybrid schemes. Research directions in each category are also analyzed and summarized. Additionally, potential future research directions are discussed to envision future studies.
Quantization Aware Factorization for Deep Neural Network Compression
methods: Uses the Alternating Direction Method of Multipliers (ADMM) for Canonical Polyadic (CP) decomposition with quantized factors, combining tensor decomposition with quantization.
results: Experiments show that, compared to state-of-the-art post-training quantization methods, the approach achieves a desirable quality-performance tradeoff with high flexibility.
Abstract
Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOPs in neural networks. Due to the memory and power consumption limitations of mobile or embedded devices, the quantization step is usually necessary when pre-trained models are deployed. A conventional post-training quantization approach applied to networks with decomposed weights yields a drop in accuracy. This motivated us to develop an algorithm that finds a tensor approximation directly with quantized factors and thus benefits from both compression techniques while keeping the prediction quality of the model. Namely, we propose to use the Alternating Direction Method of Multipliers (ADMM) for Canonical Polyadic (CP) decomposition with factors whose elements lie on a specified quantization grid. We compress neural network weights with the devised algorithm and evaluate its prediction quality and performance. We compare our approach to state-of-the-art post-training quantization methods and demonstrate competitive results and high flexibility in achieving a desirable quality-performance tradeoff.
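The full ADMM alternation is too long to reproduce here; the sketch below shows only the constraint step that ADMM alternates with the least-squares CP factor updates, namely the nearest-point projection of a factor matrix onto a uniform quantization grid. Grid parameters are assumptions.

import numpy as np

def project_to_grid(factor, bits=8):
    # Nearest-point projection onto a uniform grid spanning the factor's range
    lo, hi = factor.min(), factor.max()
    step = (hi - lo) / (2 ** bits - 1) + 1e-12
    return lo + np.round((factor - lo) / step) * step

A = np.random.randn(64, 16)          # toy CP factor matrix
Aq = project_to_grid(A, bits=4)
print("levels:", np.unique(Aq).size, "| max projection error:", np.abs(A - Aq).max())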
ScatterUQ: Interactive Uncertainty Visualizations for Multiclass Deep Learning Problems
results: The interactive ScatterUQ system lets users better understand model performance in context-driven uncertainty settings, using hover callbacks to compare the salient features of test and training examples, assess the model's uncertainty behavior, and decide on follow-up actions.
Abstract
Recently, uncertainty-aware deep learning methods for multiclass labeling problems have been developed that provide calibrated class prediction probabilities and out-of-distribution (OOD) indicators, letting machine learning (ML) consumers and engineers gauge a model's confidence in its predictions. However, this extra neural network prediction information is challenging to scalably convey visually for arbitrary data sources under multiple uncertainty contexts. To address these challenges, we present ScatterUQ, an interactive system that provides targeted visualizations to allow users to better understand model performance in context-driven uncertainty settings. ScatterUQ leverages recent advances in distance-aware neural networks, together with dimensionality reduction techniques, to construct robust, 2-D scatter plots explaining why a model predicts a test example to be (1) in-distribution and of a particular class, (2) in-distribution but unsure of the class, and (3) out-of-distribution. ML consumers and engineers can visually compare the salient features of test samples with training examples through the use of a ``hover callback'' to understand model uncertainty performance and decide follow up courses of action. We demonstrate the effectiveness of ScatterUQ to explain model uncertainty for a multiclass image classification on a distance-aware neural network trained on Fashion-MNIST and tested on Fashion-MNIST (in distribution) and MNIST digits (out of distribution), as well as a deep learning model for a cyber dataset. We quantitatively evaluate dimensionality reduction techniques to optimize our contextually driven UQ visualizations. Our results indicate that the ScatterUQ system should scale to arbitrary, multiclass datasets. Our code is available at https://github.com/mit-ll-responsible-ai/equine-webapp
Kernel Single Proxy Control for Deterministic Confounding
results: Experiments show that a single proxy variable suffices to estimate the causal effect, successfully recovering the true causal effect on a synthetic dataset.
Abstract
We consider the problem of causal effect estimation with an unobserved confounder, where we observe a proxy variable that is associated with the confounder. Although Proxy Causal Learning (PCL) uses two proxy variables to recover the true causal effect, we show that a single proxy variable is sufficient for causal estimation if the outcome is generated deterministically, generalizing Control Outcome Calibration Approach (COCA). We propose two kernel-based methods for this setting: the first based on the two-stage regression approach, and the second based on a maximum moment restriction approach. We prove that both approaches can consistently estimate the causal effect, and we empirically demonstrate that we can successfully recover the causal effect on a synthetic dataset.
RECipe: Does a Multi-Modal Recipe Knowledge Graph Fit a Multi-Purpose Recommendation System?
results: On recipe recommendation over two public datasets, experiments show that the KGE models perform comparably to deep neural collaborative filtering (NCF); the paper also presents approaches for new users (the cold-start problem) and for conditional recommendation with respect to recipe categories, and finally demonstrates RECipe in a multi-purpose recommendation setting.
Abstract
Over the past two decades, recommendation systems (RSs) have used machine learning (ML) solutions to recommend items, e.g., movies, books, and restaurants, to clients of a business or an online platform. Recipe recommendation, however, has not yet received much attention compared to those applications. We introduce RECipe as a multi-purpose recipe recommendation framework with a multi-modal knowledge graph (MMKG) backbone. The motivation behind RECipe is to go beyond (deep) neural collaborative filtering (NCF) by recommending recipes to users when they query in natural language or by providing an image. RECipe consists of 3 subsystems: (1) behavior-based recommender, (2) review-based recommender, and (3) image-based recommender. Each subsystem relies on the embedding representations of entities and relations in the graph. We first obtain (pre-trained) embedding representations of textual entities, such as reviews or ingredients, from a fine-tuned model of Microsoft's MPNet. We initialize the weights of the entities with these embeddings to train our knowledge graph embedding (KGE) model. For the visual component, i.e., recipe images, we develop a KGE-Guided variational autoencoder (KG-VAE) to learn the distribution of images and their latent representations. Once KGE and KG-VAE models are fully trained, we use them as a multi-purpose recommendation framework. For benchmarking, we created two knowledge graphs (KGs) from public datasets on Kaggle for recipe recommendation. Our experiments show that the KGE models have comparable performance to the neural solutions. We also present pre-trained NLP embeddings to address important applications such as zero-shot inference for new users (or the cold start problem) and conditional recommendation with respect to recipe categories. We eventually demonstrate the application of RECipe in a multi-purpose recommendation setting.
Copy Number Variation Informs fMRI-based Prediction of Autism Spectrum Disorder
results: Using 10-fold cross-validation on data from 228 ASD and typically developing subjects, the attention-based approach outperforms other multimodal methods on ASD classification and severity prediction tasks.
Abstract
The multifactorial etiology of autism spectrum disorder (ASD) suggests that its study would benefit greatly from multimodal approaches that combine data from widely varying platforms, e.g., neuroimaging, genetics, and clinical characterization. Prior neuroimaging-genetic analyses often apply naive feature concatenation approaches in data-driven work or use the findings from one modality to guide posthoc analysis of another, missing the opportunity to analyze the paired multimodal data in a truly unified approach. In this paper, we develop a more integrative model for combining genetic, demographic, and neuroimaging data. Inspired by the influence of genotype on phenotype, we propose using an attention-based approach where the genetic data guides attention to neuroimaging features of importance for model prediction. The genetic data is derived from copy number variation parameters, while the neuroimaging data is from functional magnetic resonance imaging. We evaluate the proposed approach on ASD classification and severity prediction tasks, using a sex-balanced dataset of 228 ASD and typically developing subjects in a 10-fold cross-validation framework. We demonstrate that our attention-based model combining genetic information, demographic data, and functional magnetic resonance imaging results in superior prediction performance compared to other multimodal approaches.
From Fake to Real (FFR): A two-stage training pipeline for mitigating spurious correlations with synthetic data
paper_authors: Maan Qraitem, Kate Saenko, Bryan A. Plummer
for: mitigate the bias in visual recognition models caused by an imbalanced training set
methods: pre-train a model on a balanced synthetic dataset, fine-tune on real data, and learn robust features against the bias
results: improve the performance of bias mitigation methods and achieve state-of-the-art performance on three large-scale datasets
Abstract
Visual recognition models are prone to learning spurious correlations induced by an imbalanced training set where certain groups (\eg Females) are under-represented in certain classes (\eg Programmers). Generative models offer a promising direction in mitigating this bias by generating synthetic data for the minority samples and thus balancing the training set. However, prior work that uses these approaches overlooks that visual recognition models could often learn to differentiate between real and synthetic images and thus fail to unlearn the bias in the original dataset. In our work, we propose a novel two-stage pipeline to mitigate this issue where 1) we pre-train a model on a balanced synthetic dataset and then 2) fine-tune on the real data. Using this pipeline, we avoid training on both real and synthetic data, thus avoiding the bias between real and synthetic data. Moreover, we learn robust features against the bias in the first step that mitigate the bias in the second step. Moreover, our pipeline naturally integrates with bias mitigation methods; they can be simply applied to the fine-tuning step. As our experiments prove, our pipeline can further improve the performance of bias mitigation methods obtaining state-of-the-art performance on three large-scale datasets.
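The two-stage pipeline itself is easy to sketch: the same model is first trained on the balanced synthetic loader, then fine-tuned on the real loader, and the two sources are never mixed in one stage. The loaders, epoch counts, and loss are assumptions.

import torch

def ffr_train(model, synthetic_loader, real_loader, epochs=(5, 5), lr=1e-3):
    # Stage 1: pre-train on balanced synthetic data; Stage 2: fine-tune on real
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for loader, n_epochs in zip((synthetic_loader, real_loader), epochs):
        for _ in range(n_epochs):
            for x, y in loader:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
    return model

Because no stage sees real and synthetic images together, the model never gets the chance to separate the two domains, which is the failure mode the paper identifies.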
Improving Medical Image Classification in Noisy Labels Using Only Self-supervised Pretraining
results: The study finds that initializing model weights with self-supervised pretraining helps the model learn better features for image classification under noisy labels and improves robustness against label noise.
Abstract
Noisy labels hurt deep learning-based supervised image classification performance as the models may overfit the noise and learn corrupted feature extractors. For natural image classification training with noisy labeled data, model initialization with contrastive self-supervised pretrained weights has shown to reduce feature corruption and improve classification performance. However, no works have explored: i) how other self-supervised approaches, such as pretext task-based pretraining, impact the learning with noisy label, and ii) any self-supervised pretraining methods alone for medical images in noisy label settings. Medical images often feature smaller datasets and subtle inter class variations, requiring human expertise to ensure correct classification. Thus, it is not clear if the methods improving learning with noisy labels in natural image datasets such as CIFAR would also help with medical images. In this work, we explore contrastive and pretext task-based self-supervised pretraining to initialize the weights of a deep learning classification model for two medical datasets with self-induced noisy labels -- NCT-CRC-HE-100K tissue histological images and COVID-QU-Ex chest X-ray images. Our results show that models initialized with pretrained weights obtained from self-supervised learning can effectively learn better features and improve robustness against noisy labels.
Improving Performance in Continual Learning Tasks using Bio-Inspired Architectures
results: The method performs strongly on the Split-MNIST, Split-CIFAR-10, and Split-CIFAR-100 datasets, outperforming other memory-constrained learning approaches and matching state-of-the-art memory-intensive replay-based methods. The authors also integrate key design elements into other backpropagation-based continual learning algorithms, improving their accuracy.
Abstract
The ability to learn continuously from an incoming data stream without catastrophic forgetting is critical to designing intelligent systems. Many approaches to continual learning rely on stochastic gradient descent and its variants that employ global error updates, and hence need to adopt strategies such as memory buffers or replay to circumvent its stability, greed, and short-term memory limitations. To address this limitation, we have developed a biologically inspired lightweight neural network architecture that incorporates synaptic plasticity mechanisms and neuromodulation and hence learns through local error signals to enable online continual learning without stochastic gradient descent. Our approach leads to superior online continual learning performance on Split-MNIST, Split-CIFAR-10, and Split-CIFAR-100 datasets compared to other memory-constrained learning approaches and matches that of the state-of-the-art memory-intensive replay-based approaches. We further demonstrate the effectiveness of our approach by integrating key design concepts into other backpropagation-based continual learning algorithms, significantly improving their accuracy. Our results provide compelling evidence for the importance of incorporating biological principles into machine learning models and offer insights into how we can leverage them to design more efficient and robust systems for online continual learning.
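As a toy illustration of learning from local error signals rather than backpropagated global gradients (this is a generic three-factor-style rule, not the paper's architecture), consider a single-layer classifier whose update uses only quantities available at that layer, gated by a scalar "neuromodulatory" factor:

```python
# Local, online weight update: the change to W depends only on the input,
# the local output error, and a scalar modulation signal -- no global
# gradient is propagated through a deep network.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(10, 784))    # 10 classes, 784 inputs

def local_update(W, x, y_onehot, lr=0.01, modulation=1.0):
    logits = W @ x
    p = np.exp(logits - logits.max()); p /= p.sum()
    err = y_onehot - p                         # error local to the output layer
    return W + lr * modulation * np.outer(err, x)

x = rng.random(784)
y = np.eye(10)[3]
for _ in range(100):                           # online, sample-by-sample learning
    W = local_update(W, x, y)
```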
Deep Learning for Diverse Data Types Steganalysis: A Review
paper_authors: Hamza Kheddar, Mustapha Hemis, Yassine Himeur, David Megías, Abbes Amira
for: This paper surveys deep learning-based steganalysis techniques to help uncover the hidden communications used by cybercriminals and terrorists.
methods: It reviews deep learning techniques for steganalysis, including deep transfer learning and deep reinforcement learning, and assesses their performance on different datasets.
results: The surveyed literature shows that deep learning-based steganalysis achieves high detection accuracy and speed, with deep transfer learning and deep reinforcement learning methods performing particularly well across diverse data types.
Abstract
Steganography and steganalysis are two interrelated aspects of the field of information security. Steganography seeks to conceal communications, whereas steganalysis aims to detect them or, if possible, recover the data they contain. Both have attracted a great deal of interest, particularly from law enforcement. Steganography is often used by cybercriminals and even terrorists to avoid being caught in possession of incriminating evidence, even in encrypted form, since cryptography is prohibited or restricted in many countries. Therefore, knowledge of cutting-edge techniques to uncover concealed information is crucial in exposing illegal acts. Over the last few years, a number of strong and reliable steganography and steganalysis techniques have been introduced in the literature. This review paper provides a comprehensive overview of deep learning-based steganalysis techniques used to detect hidden information within digital media. The paper covers all types of cover in steganalysis, including image, audio, and video, and discusses the most commonly used deep learning techniques. In addition, the paper explores the use of more advanced deep learning techniques, such as deep transfer learning (DTL) and deep reinforcement learning (DRL), to enhance the performance of steganalysis systems. The paper provides a systematic review of recent research in the field, including the datasets and evaluation metrics used in recent studies. It also presents a detailed analysis of DTL-based steganalysis approaches and their performance on different datasets. The review concludes with a discussion of the current state of deep learning-based steganalysis, challenges, and future research directions.
Dynamic Model Agnostic Reliability Evaluation of Machine-Learning Methods Integrated in Instrumentation & Control Systems
for: This paper aims to improve the trustworthiness of machine learning (ML) predictions in instrumentation and control systems by developing a real-time model-agnostic method to evaluate the relative reliability of ML predictions.
methods: The proposed method, called Laplacian distributed decay for reliability (LADDR), incorporates out-of-distribution detection on the training dataset to determine the difference between the operational and training datasets, which is used to calculate a prediction’s relative reliability.
results: The LADDR method is demonstrated on a feedforward neural network-based model used to predict safety-significant factors during different loss-of-flow transients, and is shown to be effective in evaluating the relative reliability of ML predictions for conventional interpolation tasks.
Abstract
In recent years, the field of data-driven neural network-based machine learning (ML) algorithms has grown significantly and spurred research into its applicability to instrumentation and control systems. While promising in operational contexts, the trustworthiness of such algorithms is not adequately assessed. Failures of ML-integrated systems are poorly understood, and the lack of comprehensive risk modeling can degrade the trustworthiness of these systems. In recent reports by the National Institute of Standards and Technology, trustworthiness in ML is identified as a critical barrier to adoption and will play a vital role in the safe and accountable operation of intelligent systems. Thus, in this work, we demonstrate a real-time model-agnostic method to evaluate the relative reliability of ML predictions by incorporating out-of-distribution detection on the training dataset. It is well documented that ML algorithms excel at interpolation (or near-interpolation) tasks but degrade significantly at extrapolation, which occurs when new samples are "far" from training samples. The method, referred to as the Laplacian distributed decay for reliability (LADDR), determines the difference between the operational and training datasets, which is used to calculate a prediction's relative reliability. LADDR is demonstrated on a feedforward neural network-based model used to predict safety-significant factors during different loss-of-flow transients. LADDR is intended as a "data supervisor" and determines the appropriateness of well-trained ML models in the context of operational conditions. Ultimately, LADDR illustrates how training data can be used as evidence to support the trustworthiness of ML predictions when utilized for conventional interpolation tasks.
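The exact formulation of LADDR is given in the paper; the following is only a hedged sketch of the general recipe it describes, assuming nearest-neighbor distance as the out-of-distribution signal and an illustrative Laplacian bandwidth b:

```python
# Reliability sketch: score a prediction by how far the operational input
# lies from the training data, decayed through a Laplacian kernel
# (bandwidth b and the nearest-neighbor distance are illustrative choices,
# not the paper's exact formulation).
import numpy as np

def relative_reliability(x_op, X_train, b=1.0):
    d = np.min(np.linalg.norm(X_train - x_op, axis=1))  # distance to closest
    return np.exp(-d / b)                               # training point

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 8))
print(relative_reliability(X_train[0], X_train))         # ~1.0: interpolation
print(relative_reliability(X_train[0] + 10.0, X_train))  # ~0.0: extrapolation
```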
MT-IceNet – A Spatial and Multi-Temporal Deep Learning Model for Arctic Sea Ice Forecasting
for: The goal of this paper is to forecast Arctic sea ice concentration (SIC) in order to better understand and predict Arctic climate change.
methods: The paper proposes a deep learning model, MT-IceNet, which uses a UNet architecture with multi-temporal and spatial input streams to predict Arctic sea ice concentration.
results: Using satellite-retrieved data from NSIDC and ERA5 reanalysis variables, the paper shows that MT-IceNet delivers strong predictive performance, reducing prediction error by up to 60% at a 6-month lead time compared to state-of-the-art alternatives.
Abstract
Arctic amplification has altered the climate patterns both regionally and globally, resulting in more frequent and more intense extreme weather events in the past few decades. The essential part of Arctic amplification is the unprecedented sea ice loss as demonstrated by satellite observations. Accurately forecasting Arctic sea ice from sub-seasonal to seasonal scales has been a major research question with fundamental challenges at play. In addition to physics-based Earth system models, researchers have been applying multiple statistical and machine learning models for sea ice forecasting. Looking at the potential of data-driven approaches to study sea ice variations, we propose MT-IceNet - a UNet based spatial and multi-temporal (MT) deep learning model for forecasting Arctic sea ice concentration (SIC). The model uses an encoder-decoder architecture with skip connections and processes multi-temporal input streams to regenerate spatial maps at future timesteps. Using bi-monthly and monthly satellite retrieved sea ice data from NSIDC as well as atmospheric and oceanic variables from ERA5 reanalysis product during 1979-2021, we show that our proposed model provides promising predictive performance for per-pixel SIC forecasting with up to 60% decrease in prediction error for a lead time of 6 months as compared to its state-of-the-art counterparts.
Efficient option pricing with unary-based photonic computing chip and generative adversarial learning
paper_authors: Hui Zhang, Lingxiao Wan, Sergi Ramos-Calderer, Yuancheng Zhan, Wai-Keong Mok, Hong Cai, Feng Gao, Xianshu Luo, Guo-Qiang Lo, Leong Chuan Kwek, José Ignacio Latorre, Ai Qun Liu
for: This paper aims to improve the efficiency and quality of financial services.
methods: The paper implements the unary approach to European option pricing on a photonic chip, combined with the quantum amplitude estimation algorithm, to achieve a quadratic speedup over classical Monte Carlo methods.
results: The paper demonstrates a photonic chip that prices European options quickly, reducing the computation required by classical Monte Carlo methods.
Abstract
In the modern financial industry system, the structure of products has become more and more complex, and the bottleneck constraint of classical computing power has already restricted the development of the financial industry. Here, we present a photonic chip that implements the unary approach to European option pricing, in combination with the quantum amplitude estimation algorithm, to achieve a quadratic speedup compared to classical Monte Carlo methods. The circuit consists of three modules: a module loading the distribution of asset prices, a module computing the expected payoff, and a module performing the quantum amplitude estimation algorithm to introduce speed-ups. In the distribution module, a generative adversarial network is embedded for efficient learning and loading of asset distributions, which precisely capture the market trends. This work is a step forward in the development of specialized photonic processors for applications in finance, with the potential to improve the efficiency and quality of financial services.
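For context, the classical baseline the chip targets is plain Monte Carlo estimation of the discounted expected payoff, whose error shrinks as O(1/√N); amplitude estimation improves this to O(1/N). A minimal sketch of that baseline under geometric Brownian motion, with illustrative parameters:

```python
# Classical Monte Carlo pricing of a European call: simulate terminal
# asset prices under geometric Brownian motion, average the discounted
# payoff. This is the reference method against which the quantum-
# amplitude-estimation speedup is claimed.
import numpy as np

def mc_european_call(S0, K, r, sigma, T, n_paths=100_000, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
    payoff = np.maximum(ST - K, 0.0)
    return np.exp(-r * T) * payoff.mean()

print(mc_european_call(S0=100, K=105, r=0.02, sigma=0.2, T=1.0))
```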
When More is Less: Incorporating Additional Datasets Can Hurt Performance By Introducing Spurious Correlations
paper_authors: Rhys Compton, Lily Zhang, Aahlad Puli, Rajesh Ranganath
for: This paper examines how machine learning models behave when trained on multiple datasets and how those datasets interact.
methods: The authors conduct a large-scale empirical study over combinations of four open-source chest X-ray datasets.
results: The study finds that in 43% of settings, pooling data from multiple hospitals to train a model can degrade its performance. This occurs when spurious correlations emerge between the training datasets and the prediction target.
Abstract
In machine learning, incorporating more data is often seen as a reliable strategy for improving model performance; this work challenges that notion by demonstrating that the addition of external datasets in many cases can hurt the resulting model's performance. In a large-scale empirical study across combinations of four different open-source chest x-ray datasets and 9 different labels, we demonstrate that in 43% of settings, a model trained on data from two hospitals has poorer worst group accuracy over both hospitals than a model trained on just a single hospital's data. This surprising result occurs even though the added hospital makes the training distribution more similar to the test distribution. We explain that this phenomenon arises from the spurious correlation that emerges between the disease and hospital, due to hospital-specific image artifacts. We highlight the trade-off one encounters when training on multiple datasets, between the obvious benefit of additional data and insidious cost of the introduced spurious correlation. In some cases, balancing the dataset can remove the spurious correlation and improve performance, but it is not always an effective strategy. We contextualize our results within the literature on spurious correlations to help explain these outcomes. Our experiments underscore the importance of exercising caution when selecting training data for machine learning models, especially in settings where there is a risk of spurious correlations such as with medical imaging. The risks outlined highlight the need for careful data selection and model evaluation in future research and practice.
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
results: SILO improves language model performance across domains while reducing the legal risk from copyrighted and restricted data. Specifically, access to the datastore closes 90% of the performance gap with an LM trained on the Pile corpus. The paper also analyzes the effectiveness of different nonparametric methods and how performance scales with datastore size.
Abstract
The legality of training language models (LMs) on copyrighted or otherwise restricted data is under intense debate. However, as we show, model performance significantly degrades if trained only on low-risk text (e.g., out-of-copyright books or government documents), due to its limited size and domain coverage. We present SILO, a new language model that manages this risk-performance tradeoff during inference. SILO is built by (1) training a parametric LM on Open License Corpus (OLC), a new corpus we curate with 228B tokens of public domain and permissively licensed text and (2) augmenting it with a more general and easily modifiable nonparametric datastore (e.g., containing copyrighted books or news) that is only queried during inference. The datastore allows use of high-risk data without training on it, supports sentence-level data attribution, and enables data producers to opt out from the model by removing content from the store. These capabilities can foster compliance with data-use regulations such as the fair use doctrine in the United States and the GDPR in the European Union. Our experiments show that the parametric LM struggles on domains not covered by OLC. However, access to the datastore greatly improves out of domain performance, closing 90% of the performance gap with an LM trained on the Pile, a more diverse corpus with mostly high-risk text. We also analyze which nonparametric approach works best, where the remaining errors lie, and how performance scales with datastore size. Our results suggest that it is possible to build high quality language models while mitigating their legal risk.
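SILO compares several nonparametric approaches; the sketch below shows the general kNN-LM-style mechanism with toy dimensions (it is not SILO's actual code): the parametric model's next-token distribution is interpolated with a distribution built from neighbors retrieved from an editable datastore.

```python
# kNN-LM-style interpolation sketch: the datastore of (context embedding,
# next token) pairs can be edited without retraining, which is what lets
# data producers opt out.
import numpy as np

rng = np.random.default_rng(0)
V, D, N = 100, 16, 1000                       # vocab, key dim, datastore size
keys = rng.normal(size=(N, D))                # stored context embeddings
values = rng.integers(0, V, size=N)           # next token for each context

def knn_lm_probs(p_lm, query, k=8, lam=0.3, tau=1.0):
    d = np.linalg.norm(keys - query, axis=1)
    idx = np.argsort(d)[:k]                   # k nearest datastore entries
    w = np.exp(-d[idx] / tau); w /= w.sum()
    p_knn = np.zeros(V)
    np.add.at(p_knn, values[idx], w)          # aggregate neighbor tokens
    return (1 - lam) * p_lm + lam * p_knn     # mix the two experts

p_lm = np.full(V, 1.0 / V)                    # stand-in parametric LM output
print(knn_lm_probs(p_lm, rng.normal(size=D)).sum())  # still a distribution
```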
Meta-Learning Operators to Optimality from Multi-Task Non-IID Data
results: The paper shows that the proposed method learns a shared representation across heterogeneous sources or tasks, reducing computational effort and improving statistical generalization, and provides a flexible framework that extends to a wider range of applications, such as controls and dynamical systems.
Abstract
A powerful concept behind much of the recent progress in machine learning is the extraction of common features across data from heterogeneous sources or tasks. Intuitively, using all of one's data to learn a common representation function benefits both computational effort and statistical generalization by leaving a smaller number of parameters to fine-tune on a given task. Toward theoretically grounding these merits, we propose a general setting of recovering linear operators $M$ from noisy vector measurements $y = Mx + w$, where the covariates $x$ may be both non-i.i.d. and non-isotropic. We demonstrate that existing isotropy-agnostic meta-learning approaches incur biases on the representation update, which causes the scaling of the noise terms to lose favorable dependence on the number of source tasks. This in turn can cause the sample complexity of representation learning to be bottlenecked by the single-task data size. We introduce an adaptation, $\texttt{De-bias & Feature-Whiten}$ ($\texttt{DFW}$), of the popular alternating minimization-descent (AMD) scheme proposed in Collins et al., (2021), and establish linear convergence to the optimal representation with noise level scaling down with the $\textit{total}$ source data size. This leads to generalization bounds on the same order as an oracle empirical risk minimizer. We verify the vital importance of $\texttt{DFW}$ on various numerical simulations. In particular, we show that vanilla alternating-minimization descent fails catastrophically even for iid, but mildly non-isotropic data. Our analysis unifies and generalizes prior work, and provides a flexible framework for a wider range of applications, such as in controls and dynamical systems.
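To give a concrete picture of the feature-whitening half of DFW (the de-biasing step and the full alternating scheme are in the paper; this is a generic ZCA whitening transform, not the authors' code):

```python
# Whitening non-isotropic covariates: rescale the data so its empirical
# covariance becomes the identity before the representation update.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
X = rng.normal(size=(2000, 5)) @ A.T            # non-isotropic covariates

cov = np.cov(X, rowvar=False)
eigval, eigvec = np.linalg.eigh(cov)
W_white = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T   # ZCA whitening matrix
Xw = (X - X.mean(axis=0)) @ W_white

print(np.round(np.cov(Xw, rowvar=False), 2))    # approximately the identity
```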
A Deep-Learning Method Using Auto-encoder and Generative Adversarial Network for Anomaly Detection on Ancient Stone Stele Surfaces
results: Using autoencoder (AE) and generative adversarial network (GAN) architectures, the method detects emergencies on ancient stone stele surfaces in real time without requiring extensive anomaly samples. In a case study on the Longmen Grottoes' stone stelae, the proposed unsupervised model achieves a reconstruction accuracy of 99.74% and demonstrates reliable, precise detection.
Abstract
Accurate first-instance detection of natural deterioration and man-made damage on the surfaces of ancient stelae is essential for their preventive conservation. Existing methods for cultural heritage preservation cannot achieve this goal perfectly because of the difficulty of balancing accuracy, efficiency, timeliness, and cost. This paper presents a deep-learning method to automatically detect the above-mentioned emergencies on ancient stone stelae in real time, employing an autoencoder (AE) and a generative adversarial network (GAN). The proposed method overcomes the limitations of existing methods by requiring no extensive anomaly samples while enabling comprehensive detection of unpredictable anomalies. The method includes stages of monitoring, data acquisition, pre-processing, model structuring, and post-processing. Taking the Longmen Grottoes' stone stelae as a case study, an unsupervised learning model based on AE and GAN architectures is proposed and validated with a reconstruction accuracy of 99.74%. The evaluation showed proficient detection of seven artificially designed anomalies, with precision and reliability and no false alarms. This research provides novel ideas and possibilities for the application of deep learning in the field of cultural heritage.
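The core AE idea can be sketched in a few lines (the full pipeline adds a GAN and domain-specific pre- and post-processing; the patch size and threshold here are illustrative): train on normal surface patches only, then flag patches whose reconstruction error is unusually high.

```python
# Reconstruction-error anomaly detection: an autoencoder trained only on
# normal patches reconstructs unseen (damaged) regions poorly, so a high
# error flags an anomaly.
import torch
import torch.nn as nn

ae = nn.Sequential(                       # tiny autoencoder for 16x16 patches
    nn.Flatten(), nn.Linear(256, 32), nn.ReLU(),
    nn.Linear(32, 256), nn.Unflatten(1, (1, 16, 16)))
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)

normal = torch.rand(512, 1, 16, 16)       # stand-in for normal stele patches
for _ in range(50):
    opt.zero_grad()
    loss = nn.functional.mse_loss(ae(normal), normal)
    loss.backward(); opt.step()

def is_anomalous(patch, threshold=0.1):
    with torch.no_grad():
        err = nn.functional.mse_loss(ae(patch), patch).item()
    return err > threshold                # high error => defective region

print(is_anomalous(torch.rand(1, 1, 16, 16)))
```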
DiffCR: A Fast Conditional Diffusion Framework for Cloud Removal from Optical Satellite Images
results: Experiments show that DiffCR achieves the best performance on two commonly used benchmark datasets, with only 5.1% of the parameters and 5.4% of the computational complexity of the previous best methods. All experimental results and code will be released at https://github.com/XavierJiezou/DiffCR.
Abstract
Optical satellite images are a critical data source; however, cloud cover often compromises their quality, hindering image applications and analysis. Consequently, effectively removing clouds from optical satellite images has emerged as a prominent research direction. While recent advancements in cloud removal primarily rely on generative adversarial networks, which may yield suboptimal image quality, diffusion models have demonstrated remarkable success in diverse image-generation tasks, showcasing their potential in addressing this challenge. This paper presents a novel framework called DiffCR, which leverages conditional guided diffusion with deep convolutional networks for high-performance cloud removal for optical satellite imagery. Specifically, we introduce a decoupled encoder for conditional image feature extraction, providing a robust color representation to ensure the close similarity of appearance information between the conditional input and the synthesized output. Moreover, we propose a novel and efficient time and condition fusion block within the cloud removal model to accurately simulate the correspondence between the appearance in the conditional image and the target image at a low computational cost. Extensive experimental evaluations on two commonly used benchmark datasets demonstrate that DiffCR consistently achieves state-of-the-art performance on all metrics, with parameter and computational complexities amounting to only 5.1% and 5.4%, respectively, of those previous best methods. The source code, pre-trained models, and all the experimental results will be publicly available at https://github.com/XavierJiezou/DiffCR upon the paper's acceptance of this work.
Vector Embeddings by Sequence Similarity and Context for Improved Compression, Similarity Search, Clustering, Organization, and Manipulation of cDNA Libraries
results: The study finds that this numerical representation clusters sequences more effectively and improves compression performance. Furthermore, context-based learning over codon triplets enables finer-grained clustering and feature analysis.
Abstract
This paper demonstrates the utility of organized numerical representations of genes in research involving flat string gene formats (i.e., FASTA/FASTQ5). FASTA/FASTQ files have several current limitations, such as their large file sizes, slow processing speeds for mapping and alignment, and contextual dependencies. These challenges significantly hinder investigations and tasks that involve finding similar sequences. The solution lies in transforming sequences into an alternative representation that facilitates easier clustering into similar groups compared to the raw sequences themselves. By assigning a unique vector embedding to each short sequence, it is possible to more efficiently cluster and improve upon compression performance for the string representations of cDNA libraries. Furthermore, through learning alternative coordinate vector embeddings based on the contexts of codon triplets, we can demonstrate clustering based on amino acid properties. Finally, using this sequence embedding method to encode barcodes and cDNA sequences, we can improve the time complexity of the similarity search by coupling vector embeddings with an algorithm that determines the proximity of vectors in Euclidean space; this allows us to perform sequence similarity searches in a quicker and more modular fashion.
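As a toy version of the embed-then-search mechanics (the paper learns its embeddings; this substitutes simple normalized k-mer counts to keep the sketch self-contained), each sequence becomes a fixed-length vector and similar sequences are found by Euclidean proximity instead of string alignment:

```python
# k-mer count embedding and nearest-neighbor similarity search.
import itertools
import numpy as np

K = 3
KMERS = {"".join(p): i for i, p in enumerate(itertools.product("ACGT", repeat=K))}

def embed(seq):
    v = np.zeros(len(KMERS))
    for i in range(len(seq) - K + 1):
        v[KMERS[seq[i:i + K]]] += 1
    return v / max(1, len(seq) - K + 1)       # length-normalized counts

library = ["ACGTACGTGGCA", "ACGTACGTGGCT", "TTTTGGGGCCCC"]
E = np.stack([embed(s) for s in library])
query = embed("ACGTACGTGGCC")
print(library[np.argmin(np.linalg.norm(E - query, axis=1))])  # closest sequence
```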
Probabilistic Invariant Learning with Randomized Linear Classifiers
paper_authors: Leonardo Cotta, Gal Yehuda, Assaf Schuster, Chris J. Maddison
for: The goal of this paper is to design a class of models that are both expressive and preserve known task invariances while requiring fewer resources.
methods: Drawing on ideas from randomized algorithms, the paper proposes Randomized Linear Classifiers (RLCs) and proves that RLCs can, with high probability, approximate any (smooth) function while remaining invariant to compact group transformations.
results: Experiments show that RLCs outperform deterministic neural networks and their invariant counterparts on invariant tasks while using fewer resources.
Abstract
Designing models that are both expressive and preserve known invariances of tasks is an increasingly hard problem. Existing solutions trade off invariance for computational or memory resources. In this work, we show how to leverage randomness and design models that are both expressive and invariant but use fewer resources. Inspired by randomized algorithms, our key insight is that accepting probabilistic notions of universal approximation and invariance can reduce our resource requirements. More specifically, we propose a class of binary classification models called Randomized Linear Classifiers (RLCs). We give parameter and sample size conditions in which RLCs can, with high probability, approximate any (smooth) function while preserving invariance to compact group transformations. Leveraging this result, we design three RLCs that are provably probabilistically invariant for classification tasks over sets, graphs, and spherical data. We show how these models can achieve probabilistic invariance and universality using fewer resources than (deterministic) neural networks and their invariant counterparts. Finally, we empirically demonstrate the benefits of this new class of models on invariant tasks where deterministic invariant neural networks are known to struggle.
paper_authors: Zihan Guan, Mengnan Du, Ninghao Liu
for: The paper aims to detect backdoor attacks on graph learning models.
methods: The paper proposes an explanation-guided backdoor detection method that uses topological feature information to distinguish backdoor samples from clean samples.
results: The proposed method is effective in detecting backdoor attacks across multiple popular datasets and attack methods, and provides explainable results through the use of explanation methods.
Abstract
Backdoor attacks pose a significant security risk to graph learning models. Backdoors can be embedded into the target model by inserting backdoor triggers into the training dataset, causing the model to make incorrect predictions when the trigger is present. To counter backdoor attacks, backdoor detection has been proposed. An emerging detection strategy in the vision and NLP domains is based on an intriguing phenomenon: when training models on a mixture of backdoor and clean samples, the loss on backdoor samples drops significantly faster than on clean samples, allowing backdoor samples to be easily detected by selecting samples with the lowest loss values. However, the ignorance of topological feature information on graph data limits its detection effectiveness when applied directly to the graph domain. To this end, we propose an explanation-guided backdoor detection method to take advantage of the topological information. Specifically, we train a helper model on the graph dataset, feed graph samples into the model, and then adopt explanation methods to attribute model prediction to an important subgraph. We observe that backdoor samples have distinct attribution distribution than clean samples, so the explanatory subgraph could serve as more discriminative features for detecting backdoor samples. Comprehensive experiments on multiple popular datasets and attack methods demonstrate the effectiveness and explainability of our method. Our code is available: https://github.com/GuanZihan/GNN_backdoor_detection.
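The loss-based strategy the paper builds on is easy to illustrate with synthetic numbers: after a few epochs on the mixed dataset, backdoor samples concentrate at the bottom of the per-sample loss ranking, so flagging the lowest-loss fraction already yields a detector. The paper's contribution replaces this signal with explanation subgraphs, which suit graph data better.

```python
# Loss-ranking backdoor detection sketch with synthetic loss values:
# backdoor samples are learned faster, so their losses cluster near zero.
import numpy as np

rng = np.random.default_rng(0)
losses = np.concatenate([rng.normal(2.0, 0.5, 900),    # clean samples
                         rng.normal(0.2, 0.1, 100)])   # backdoor samples
is_backdoor = np.arange(1000) >= 900

flagged = np.argsort(losses)[:100]        # flag the lowest-loss 10%
print(is_backdoor[flagged].mean())        # fraction of flags that are correct
```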
Event Abstraction for Enterprise Collaboration Systems to Support Social Process Mining
results: The evaluation shows that the algorithm produces accurate results. ECSEA is an essential preprocessing method for interpreting collaborative work activity in ECS, which the authors call Social Process Mining.
Abstract
One aim of Process Mining (PM) is the discovery of process models from event logs of information systems. PM has been successfully applied to process-oriented enterprise systems but is less suited for communication- and document-oriented Enterprise Collaboration Systems (ECS). ECS event logs are very fine-granular and PM applied to their logs results in spaghetti models. A common solution for this is event abstraction, i.e., converting low-level logs into more abstract high-level logs before running discovery algorithms. ECS logs come with special characteristics that have so far not been fully addressed by existing event abstraction approaches. We aim to close this gap with a tailored ECS event abstraction (ECSEA) approach that trains a model by comparing recorded actual user activities (high-level traces) with the system-generated low-level traces (extracted from the ECS). The model allows us to automatically convert future low-level traces into an abstracted high-level log that can be used for PM. Our evaluation shows that the algorithm produces accurate results. ECSEA is a preprocessing method that is essential for the interpretation of collaborative work activity in ECS, which we call Social Process Mining.
Data Augmentation-Based Unsupervised Domain Adaptation In Medical Imaging
results: Across extensive experiments and comparisons, the method achieves high accuracy and broad applicability over diverse tasks and datasets, while adapting quickly to new scanners and datasets.
Abstract
Deep learning-based models in medical imaging often struggle to generalize effectively to new scans due to data heterogeneity arising from differences in hardware, acquisition parameters, population, and artifacts. This limitation presents a significant challenge in adopting machine learning models for clinical practice. We propose an unsupervised method for robust domain adaptation in brain MRI segmentation by leveraging MRI-specific augmentation techniques. To evaluate the effectiveness of our method, we conduct extensive experiments across diverse datasets, modalities, and segmentation tasks, comparing against the state-of-the-art methods. The results show that our proposed approach achieves high accuracy, exhibits broad applicability, and showcases remarkable robustness against domain shift in various tasks, surpassing the state-of-the-art performance in the majority of cases.
Metaheuristic Algorithms in Artificial Intelligence with Applications to Bioinformatics, Biostatistics, Ecology and, the Manufacturing Industries
paper_authors: Elvis Han Cui, Zizhao Zhang, Culsome Junwen Chen, Weng Kee Wong
for: This paper is written to demonstrate the flexibility and out-performance of a newly proposed nature-inspired metaheuristic algorithm called competitive swarm optimizer with mutated agents (CSO-MA) in various optimization problems in the statistical sciences.
methods: The paper uses the CSO-MA algorithm to solve a variety of optimization problems, including finding maximum likelihood estimates of parameters, estimating parameters in a Rasch model, finding M-estimates for a Cox regression, and matrix completion to impute missing values.
results: The paper shows that the CSO-MA algorithm is efficient, flexibly incorporates various cost structures or multiple user-specified nonlinear constraints, and outperforms its competitors across these applications.
Abstract
Nature-inspired metaheuristic algorithms are important components of artificial intelligence, and are increasingly used across disciplines to tackle various types of challenging optimization problems. We apply a newly proposed nature-inspired metaheuristic algorithm called competitive swarm optimizer with mutated agents (CSO-MA) and demonstrate its flexibility and out-performance relative to its competitors in a variety of optimization problems in the statistical sciences. In particular, we show the algorithm is efficient and can incorporate various cost structures or multiple user-specified nonlinear constraints. Our applications include (i) finding maximum likelihood estimates of parameters in a single cell generalized trend model to study pseudotime in bioinformatics, (ii) estimating parameters in a commonly used Rasch model in education research, (iii) finding M-estimates for a Cox regression in a Markov renewal model and (iv) matrix completion to impute missing values in a two compartment model. In addition we discuss applications to (v) select variables optimally in an ecology problem and (vi) design a car refueling experiment for the auto industry using a logistic model with multiple interacting factors.
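A hedged sketch of a competitive swarm optimizer with a simple mutation step (the authors' CSO-MA has its own mutation scheme and parameterization; this generic version on the sphere function only illustrates the pairwise-competition mechanics):

```python
# Competitive swarm optimizer: particles compete in random pairs each
# iteration; the loser learns from the winner and the swarm mean, and a
# randomly mutated agent adds exploration.
import numpy as np

def cso_ma(f, dim=10, n=40, iters=200, phi=0.1, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5, 5, (n, dim))
    V = np.zeros((n, dim))
    for _ in range(iters):
        mean = X.mean(axis=0)
        for a, b in rng.permutation(n).reshape(-1, 2):
            w, l = (a, b) if f(X[a]) < f(X[b]) else (b, a)
            r1, r2, r3 = rng.random((3, dim))
            V[l] = r1 * V[l] + r2 * (X[w] - X[l]) + phi * r3 * (mean - X[l])
            X[l] = X[l] + V[l]               # only the loser is updated
        j = rng.integers(n)                  # mutate one random agent
        X[j] = X[j] + rng.normal(scale=0.5, size=dim)
    return X[np.argmin([f(x) for x in X])]

best = cso_ma(lambda x: np.sum(x ** 2))      # sphere test function
print(np.round(np.sum(best ** 2), 4))        # close to the optimum 0
```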
AdaptEx: A Self-Service Contextual Bandit Platform
results: The platform improves user experiences quickly while reducing the costs and time associated with traditional testing methods. It can also iterate rapidly towards optimal product solutions under ever-changing content and continuous "cold start" conditions.
Abstract
This paper presents AdaptEx, a self-service contextual bandit platform widely used at Expedia Group that leverages multi-armed bandit algorithms to personalize user experiences at scale. AdaptEx considers the unique context of each visitor to select the optimal variants and learns quickly from every interaction they make. It offers a powerful solution for improving user experiences while minimizing the costs and time associated with traditional testing methods. The platform unlocks the ability to iterate quickly towards optimal product solutions, gracefully handling ever-changing content and continuous "cold start" situations.
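The platform's internal algorithms are not specified beyond "multi-armed bandit"; the following LinUCB-style loop is one plausible shape for such a per-visitor selection-and-update cycle, with illustrative parameters:

```python
# LinUCB contextual bandit: each arm keeps a ridge-regression model of
# reward given visitor context and is chosen by an optimistic score.
import numpy as np

d, n_arms, alpha = 5, 3, 1.0
A = [np.eye(d) for _ in range(n_arms)]       # per-arm Gram matrices
b = [np.zeros(d) for _ in range(n_arms)]

def choose(context):
    scores = []
    for a in range(n_arms):
        A_inv = np.linalg.inv(A[a])
        theta = A_inv @ b[a]
        ucb = theta @ context + alpha * np.sqrt(context @ A_inv @ context)
        scores.append(ucb)
    return int(np.argmax(scores))

def update(arm, context, reward):
    A[arm] += np.outer(context, context)
    b[arm] += reward * context

rng = np.random.default_rng(0)
for _ in range(1000):                        # online loop, one visitor at a time
    ctx = rng.random(d)
    arm = choose(ctx)
    update(arm, ctx, reward=float(rng.random() < 0.1 * (arm + 1)))
```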
Understanding the Effect of Counterfactual Explanations on Trust and Reliance on AI for Human-AI Collaborative Clinical Decision Making
paper_authors: Min Hun Lee, Chong Jun Chew
for: This paper studies artificial intelligence (AI) for decision support in high-stakes domains (e.g. health) and the problem of human over-reliance on AI suggestions.
methods: The paper uses salient feature explanations together with counterfactual (what-if) explanations to encourage humans to review AI suggestions more analytically and reduce over-reliance on AI.
results: The study finds that when the AI model provides correct suggestions, both human performance and agreement levels improve. However, humans still over-rely on wrong AI suggestions, and counterfactual explanations help reduce this over-reliance. Specifically, laypersons benefited more from counterfactual explanations, while therapists performed better overall.
Abstract
Artificial intelligence (AI) is increasingly being considered to assist human decision-making in high-stakes domains (e.g. health). However, researchers have noted that humans can over-rely on wrong suggestions of an AI model instead of achieving complementary human-AI performance. In this work, we utilized salient feature explanations along with what-if, counterfactual explanations to make humans review AI suggestions more analytically, reducing over-reliance on AI, and explored the effect of these explanations on trust and reliance on AI during clinical decision-making. We conducted an experiment with seven therapists and ten laypersons on the task of assessing post-stroke survivors' quality of motion, and analyzed their performance, agreement level on the task, and reliance on AI without and with two types of AI explanations. Our results showed that the AI model with both salient feature and counterfactual explanations assisted therapists and laypersons to improve their performance and agreement level on the task when 'right' AI outputs were presented. While both therapists and laypersons over-relied on 'wrong' AI outputs, counterfactual explanations helped both groups reduce their over-reliance on 'wrong' AI outputs by 21% compared to salient feature explanations. Specifically, laypersons' performance degraded by 18.0 F1-score with salient feature explanations and 14.0 F1-score with counterfactual explanations, compared to therapists' degradations of 8.6 and 2.8 F1-score respectively. Our work discusses the potential of counterfactual explanations to better estimate the accuracy of an AI model, reduce over-reliance on 'wrong' AI outputs, and the implications for improving human-AI collaborative decision-making.
Pelta: Shielding Transformers to Mitigate Evasion Attacks in Federated Learning
paper_authors: Simon Queyrut, Yérom-David Bromberg, Valerio Schiavoni
for: The paper is written to address the issue of privacy preservation in federated learning, specifically the problem of malicious probing attacks on the model updates.
methods: The paper proposes a novel shielding mechanism called Pelta, which leverages Trusted Execution Environments (TEEs) to mask part of the back-propagation chain rule and prevent attackers from exploiting it for the design of malicious samples.
results: The paper demonstrates the effectiveness of Pelta against the Self Attention Gradient adversarial attack on a state-of-the-art ensemble model.
Abstract
The main premise of federated learning is that machine learning model updates are computed locally, in particular to preserve user data privacy, as the data never leave the perimeter of the user's device. This mechanism supposes the general model, once aggregated, to be broadcast to collaborating and non-malicious nodes. However, without proper defenses, compromised clients can easily probe the model inside their local memory in search of adversarial examples. For instance, considering image-based applications, adversarial examples consist of imperceptibly perturbed images (to the human eye) misclassified by the local model, which can later be presented to a victim node's counterpart model to replicate the attack. To mitigate such malicious probing, we introduce Pelta, a novel shielding mechanism leveraging trusted hardware. By harnessing the capabilities of Trusted Execution Environments (TEEs), Pelta masks part of the back-propagation chain rule, which is otherwise typically exploited by attackers to design malicious samples. We evaluate Pelta on a state-of-the-art ensemble model and demonstrate its effectiveness against the Self Attention Gradient adversarial attack.
SLEM: Machine Learning for Path Modeling and Causal Inference with Super Learner Equation Modeling
methods: The study uses Super Learner Equation Modeling (SLEM), a path-modeling method built on machine learning Super Learner ensembles, to address functional misspecification in causal inference.
results: Compared with SEM, SLEM is competitive on linear models and superior when relationships are non-linear. SLEM also provides consistent and unbiased estimates of causal effects, enabling studies of hypothetical interventions on observational data.
Abstract
Causal inference is a crucial goal of science, enabling researchers to arrive at meaningful conclusions regarding the predictions of hypothetical interventions using observational data. Path models, Structural Equation Models (SEMs), and, more generally, Directed Acyclic Graphs (DAGs), provide a means to unambiguously specify assumptions regarding the causal structure underlying a phenomenon. Unlike DAGs, which make very few assumptions about the functional and parametric form, SEM assumes linearity. This can result in functional misspecification which prevents researchers from undertaking reliable effect size estimation. In contrast, we propose Super Learner Equation Modeling, a path modeling technique integrating machine learning Super Learner ensembles. We empirically demonstrate its ability to provide consistent and unbiased estimates of causal effects, its competitive performance for linear models when compared with SEM, and highlight its superiority over SEM when dealing with non-linear relationships. We provide open-source code, and a tutorial notebook with example usage, accentuating the easy-to-use nature of the method.
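A minimal stand-in for the Super Learner component (SLEM fits one such cross-validated ensemble per structural equation; see the authors' repository for the real implementation): a stacking ensemble replaces the linear form SEM assumes for each node given its parents.

```python
# Cross-validated stacking ensemble as the per-equation learner: the
# meta-learner weights base models by out-of-fold performance, so a
# nonlinear parent-child relationship is captured where SEM would misfit.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1))                         # parent variable
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=500)  # nonlinear child

super_learner = StackingRegressor(
    estimators=[("lin", LinearRegression()),
                ("rf", RandomForestRegressor(n_estimators=100, random_state=0))],
    final_estimator=Ridge(), cv=5)                    # CV-weighted combination
super_learner.fit(X, y)
print(super_learner.score(X, y))                      # captures the nonlinearity
```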
paper_authors: Weijie Chen, Lin Yao, Zeqing Xia, Yuhang Wang
for: This paper aims to improve the accuracy of 3D structure reconstruction from cryo-EM images with unknown poses and low signal-to-noise ratio.
methods: The proposed method, called ACE-HetEM, uses an unsupervised deep learning architecture based on amortized inference to disentangle conformation classifications and pose estimations.
results: ACE-HetEM has comparable accuracy in pose estimation and produces better reconstruction resolution than non-amortized methods on simulated datasets, and is also applicable to real experimental datasets.
Abstract
Due to the extremely low signal-to-noise ratio (SNR) and unknown poses (projection angles and image translation) in cryo-EM experiments, reconstructing 3D structures from 2D images is very challenging. On top of these challenges, heterogeneous cryo-EM reconstruction also has an additional requirement: conformation classification. An emerging solution to this problem is called amortized inference, implemented using the autoencoder architecture or its variants. Instead of searching for the correct image-to-pose/conformation mapping for every image in the dataset as in non-amortized methods, amortized inference only needs to train an encoder that maps images to appropriate latent spaces representing poses or conformations. Unfortunately, standard amortized-inference-based methods with entangled latent spaces have difficulty learning the distribution of conformations and poses from cryo-EM images. In this paper, we propose an unsupervised deep learning architecture called "ACE-HetEM" based on amortized inference. To explicitly enforce the disentanglement of conformation classifications and pose estimations, we designed two alternating training tasks in our method: image-to-image task and pose-to-pose task. Results on simulated datasets show that ACE-HetEM has comparable accuracy in pose estimation and produces even better reconstruction resolution than non-amortized methods. Furthermore, we show that ACE-HetEM is also applicable to real experimental datasets.
HSD-PAM: High Speed Super Resolution Deep Penetration Photoacoustic Microscopy Imaging Boosted by Dual Branch Fusion Network
results: Extensive numerical and in vivo experiments validate that the algorithm increases both the speed and the resolution of the PAM system while preserving the deep-penetration capability of the AR-PAM modality. Starting from low-resolution, undersampled AR-PAM images, upsampling and resolution enhancement yield a high-speed, super-resolution, deep-penetration PAM system (HSD-PAM).
Abstract
Photoacoustic microscopy (PAM) is a novel implementation of photoacoustic imaging (PAI) for visualizing 3D bio-structure, realized by raster scanning of the tissue. However, the three critical imaging parameters involved -- imaging speed, lateral resolution, and penetration depth -- mutually affect one another: improving one degrades the other two, which constrains the overall performance of the PAM system. Here, we propose to break these limitations through hardware and software co-design. Starting with low-lateral-resolution, low-sampling-rate AR-PAM imaging, which possesses deep penetration capability, we aim to enhance the lateral resolution and upsample the images, so that high speed, super resolution, and deep penetration can be achieved in one PAM system (HSD-PAM). A data-driven algorithm is a promising approach to this problem, and we therefore propose a dedicated novel dual-branch fusion network comprising a high-resolution branch and a high-speed branch. Thanks to the availability of a switchable AR-OR-PAM imaging system, corresponding low-resolution, undersampled AR-PAM and high-resolution, fully sampled OR-PAM image pairs are used for training the network. Extensive simulation and in vivo experiments have been conducted to validate the trained model; the enhancement results show that the proposed algorithm achieves the best perceptual and quantitative image quality. As a result, imaging speed is increased 16-fold and lateral resolution is improved 5-fold, while the deep penetration merit of the AR-PAM modality is preserved.
StableVQA: A Deep No-Reference Quality Assessment Model for Video Stability
paper_authors: Tengchuan Kou, Xiaohong Liu, Wei Sun, Jun Jia, Xiongkuo Min, Guangtao Zhai, Ning Liu
for: The paper evaluates the stability of User Generated Content (UGC) videos and proposes a novel Video Quality Assessment for Stability (VQA-S) model.
methods: The paper introduces a VQA-S model named StableVQA, which consists of three feature extractors to acquire optical flow, semantic, and blur features, and a regression layer to predict the final stability score.
results: StableVQA achieves a higher correlation with subjective opinions than existing VQA-S models and generic VQA models, and the paper releases a new database, StableDB, containing 1,952 diversely-shaky UGC videos with subjective scores for video stability.
Abstract
Video shakiness is an unpleasant distortion of User Generated Content (UGC) videos, which is usually caused by the unstable hold of cameras. In recent years, many video stabilization algorithms have been proposed, yet no specific and accurate metric enables comprehensively evaluating the stability of videos. Indeed, most existing quality assessment models evaluate video quality as a whole without specifically taking the subjective experience of video stability into consideration. Therefore, these models cannot measure the video stability explicitly and precisely when severe shakes are present. In addition, there is no large-scale video database in public that includes various degrees of shaky videos with the corresponding subjective scores available, which hinders the development of Video Quality Assessment for Stability (VQA-S). To this end, we build a new database named StableDB that contains 1,952 diversely-shaky UGC videos, where each video has a Mean Opinion Score (MOS) on the degree of video stability rated by 34 subjects. Moreover, we elaborately design a novel VQA-S model named StableVQA, which consists of three feature extractors to acquire the optical flow, semantic, and blur features respectively, and a regression layer to predict the final stability score. Extensive experiments demonstrate that the StableVQA achieves a higher correlation with subjective opinions than the existing VQA-S models and generic VQA models. The database and codes are available at https://github.com/QMME/StableVQA.
An automated pipeline for quantitative T2* fetal body MRI and segmentation at low field
paper_authors: Kelly Payette, Alena Uus, Jordina Aviles Verdera, Carla Avena Zampieri, Megan Hall, Lisa Story, Maria Deprez, Mary A. Rutherford, Joseph V. Hajnal, Sebastien Ourselin, Raphael Tomi-Tricot, Jana Hutter
results: The study shows that the pipeline successfully performs accurate T2*-relaxometry analysis for fetuses at 17-40 weeks of gestation and is robust to motion artefacts.
Abstract
Fetal Magnetic Resonance Imaging at low field strengths is emerging as an exciting direction in perinatal health. Clinical low field (0.55T) scanners are beneficial for fetal imaging due to their reduced susceptibility-induced artefacts, increased T2* values, and wider bore (widening access for the increasingly obese pregnant population). However, the lack of standard automated image processing tools such as segmentation and reconstruction hampers wider clinical use. In this study, we introduce a semi-automatic pipeline using quantitative MRI for the fetal body at low field strength resulting in fast and detailed quantitative T2* relaxometry analysis of all major fetal body organs. Multi-echo dynamic sequences of the fetal body were acquired and reconstructed into a single high-resolution volume using deformable slice-to-volume reconstruction, generating both structural and quantitative T2* 3D volumes. A neural network trained using a semi-supervised approach was created to automatically segment these fetal body 3D volumes into ten different organs (resulting in dice values > 0.74 for 8 out of 10 organs). The T2* values revealed a strong relationship with GA in the lungs, liver, and kidney parenchyma (R^2>0.5). This pipeline was used successfully for a wide range of GAs (17-40 weeks), and is robust to motion artefacts. Low field fetal MRI can be used to perform advanced MRI analysis, and is a viable option for clinical scanning.
Transmission and Color-guided Network for Underwater Image Enhancement
results: Extensive experiments show that ATDCnet achieves state-of-the-art performance on multiple benchmark datasets.
Abstract
In recent years, with the continuous development of the marine industry, underwater image enhancement has attracted plenty of attention. Unfortunately, the propagation of light in water will be absorbed by water bodies and scattered by suspended particles, resulting in color deviation and low contrast. To solve these two problems, we propose an Adaptive Transmission and Dynamic Color guided network (named ATDCnet) for underwater image enhancement. In particular, to exploit the knowledge of physics, we design an Adaptive Transmission-directed Module (ATM) to better guide the network. To deal with the color deviation problem, we design a Dynamic Color-guided Module (DCM) to post-process the enhanced image color. Further, we design an Encoder-Decoder-based Compensation (EDC) structure with attention and a multi-stage feature fusion mechanism to perform color restoration and contrast enhancement simultaneously. Extensive experiments demonstrate the state-of-the-art performance of the ATDCnet on multiple benchmark datasets.
Deep Generative Networks for Heterogeneous Augmentation of Cranial Defects
results: The generated defective skulls can help increase the automation of personalized cranial implant design and improve the segmentation of defective regions. The study shows that using the generated defective skulls improves the accuracy and efficiency of implant design.
Abstract
The design of personalized cranial implants is a challenging and tremendous task that has become a hot topic in terms of process automation with the use of deep learning techniques. The main challenge is associated with the high diversity of possible cranial defects. The lack of appropriate data sources negatively influences the data-driven nature of deep learning algorithms. Hence, one of the possible solutions to overcome this problem is to rely on synthetic data. In this work, we propose three volumetric variations of deep generative models to augment the dataset by generating synthetic skulls, i.e. Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP), WGAN-GP hybrid with Variational Autoencoder pretraining (VAE/WGAN-GP) and Introspective Variational Autoencoder (IntroVAE). We show that it is possible to generate dozens of thousands of defective skulls with compatible defects that achieve a trade-off between defect heterogeneity and the realistic shape of the skull. We evaluate obtained synthetic data quantitatively by defect segmentation with the use of V-Net and qualitatively by their latent space exploration. We show that the synthetically generated skulls highly improve the segmentation process compared to using only the original unaugmented data. The generated skulls may improve the automatic design of personalized cranial implants for real medical cases.
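The WGAN-GP variant's defining ingredient is the gradient penalty of Gulrajani et al. (2017), which penalizes the critic's gradient norm on points interpolated between real and generated samples. A minimal sketch, with shapes kept generic (the paper operates on 3D skull volumes):

```python
import torch

def gradient_penalty(critic, real, fake, device="cpu"):
    """Standard WGAN-GP penalty: push the critic's gradient norm toward 1
    on random interpolates between real and fake samples."""
    eps = torch.rand(real.size(0), *([1] * (real.dim() - 1)), device=device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    crit_out = critic(interp)
    grads = torch.autograd.grad(
        outputs=crit_out, inputs=interp,
        grad_outputs=torch.ones_like(crit_out),
        create_graph=True, retain_graph=True,
    )[0]
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)
    return ((grad_norm - 1) ** 2).mean()
```

In training, this term is added to the critic loss with a weight (commonly 10), which is what allows stable adversarial training without weight clipping.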
HyperCoil-Recon: A Hypernetwork-based Adaptive Coil Configuration Task Switching Network for MRI Reconstruction
results: The method adapts on the fly to varying coil configurations and generalizes to configurations unseen at test time. Experiments show that it matches the performance of configuration-specific models and outperforms configuration-invariant models, with improvement margins of around 1 dB / 0.03 and 0.3 dB / 0.02 in PSNR / SSIM.
Abstract
Parallel imaging, a fast MRI technique, involves dynamic adjustments based on the configuration i.e. number, positioning, and sensitivity of the coils with respect to the anatomy under study. Conventional deep learning-based image reconstruction models have to be trained or fine-tuned for each configuration, posing a barrier to clinical translation, given the lack of computational resources and machine learning expertise for clinicians to train models at deployment. Joint training on diverse datasets learns a single weight set that might underfit to deviated configurations. We propose, HyperCoil-Recon, a hypernetwork-based coil configuration task-switching network for multi-coil MRI reconstruction that encodes varying configurations of the numbers of coils in a multi-tasking perspective, posing each configuration as a task. The hypernetworks infer and embed task-specific weights into the reconstruction network, 1) effectively utilizing the contextual knowledge of common and varying image features among the various fields-of-view of the coils, and 2) enabling generality to unseen configurations at test time. Experiments reveal that our approach 1) adapts on the fly to various unseen configurations up to 32 coils when trained on lower numbers (i.e. 7 to 11) of randomly varying coils, and to 120 deviated unseen configurations when trained on 18 configurations in a single model, 2) matches the performance of coil configuration-specific models, and 3) outperforms configuration-invariant models with improvement margins of around 1 dB / 0.03 and 0.3 dB / 0.02 in PSNR / SSIM for knee and brain data. Our code is available at https://github.com/sriprabhar/HyperCoil-Recon
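The core hypernetwork idea can be sketched in a few lines: a small MLP maps a coil-configuration embedding to the weights of a layer in the reconstruction network. The dimensions, the embedding scheme, and the single-layer scope below are illustrative assumptions, not the HyperCoil-Recon architecture.

```python
import torch
import torch.nn as nn

class HyperConv(nn.Module):
    """Minimal hypernetwork sketch: an MLP infers the weights of a conv
    layer from a task embedding describing the coil configuration."""

    def __init__(self, emb_dim=16, in_ch=2, out_ch=32, k=3):
        super().__init__()
        self.in_ch, self.out_ch, self.k = in_ch, out_ch, k
        n_weights = out_ch * in_ch * k * k
        self.hyper = nn.Sequential(
            nn.Linear(emb_dim, 64), nn.ReLU(), nn.Linear(64, n_weights)
        )

    def forward(self, x, task_emb):
        # task_emb: (emb_dim,) encoding, e.g., the number of coils.
        w = self.hyper(task_emb).view(self.out_ch, self.in_ch, self.k, self.k)
        return nn.functional.conv2d(x, w, padding=self.k // 2)

layer = HyperConv()
out = layer(torch.randn(1, 2, 64, 64), torch.randn(16))
print(out.shape)  # torch.Size([1, 32, 64, 64])
```

Because the weights are generated per task rather than stored per task, a single model can serve configurations it never saw during training.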
An Integrated Visual Analytics System for Studying Clinical Carotid Artery Plaques
results: Two case studies on carotid artery plaques, using real data obtained from a hospital, show that the designed carotid analysis system can effectively provide clinical diagnosis and treatment guidance for vascular surgeons.
Abstract
Carotid artery plaques can cause arterial vascular diseases such as stroke and myocardial infarction, posing a severe threat to human life. However, the current clinical examination mainly relies on a direct assessment by physicians of patients' clinical indicators and medical images, lacking an integrated visualization tool for analyzing the influencing factors and composition of carotid artery plaques. We have designed an intelligent carotid artery plaque visual analysis system for vascular surgery experts to comprehensively analyze the clinical physiological and imaging indicators of carotid artery diseases. The system mainly includes two functions: First, it displays the correlation between carotid artery plaque and various factors through a series of information visualization methods and integrates the analysis of patient physiological indicator data. Second, it enhances the interface guidance analysis of the inherent correlation between the components of carotid artery plaque through machine learning and displays the spatial distribution of the plaque on medical images. Additionally, we conducted two case studies on carotid artery plaques using real data obtained from a hospital, and the results indicate that our designed carotid analysis system can effectively provide clinical diagnosis and treatment guidance for vascular surgeons.
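The first system function, correlating plaque measures with clinical indicators, can be illustrated with a toy correlation heatmap; the column names and data below are synthetic placeholders, not the hospital dataset.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for patient physiological indicator data.
df = pd.DataFrame(
    np.random.rand(50, 4),
    columns=["plaque_volume", "ldl", "systolic_bp", "age"],  # hypothetical
)
corr = df.corr()
plt.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
plt.xticks(range(len(corr)), corr.columns, rotation=45)
plt.yticks(range(len(corr)), corr.columns)
plt.colorbar(label="Pearson r")
plt.title("Plaque vs. clinical indicators (synthetic data)")
plt.tight_layout()
plt.show()
```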
Long-Distance Gesture Recognition using Dynamic Neural Networks
paper_authors: Shubhang Bhatnagar, Sharath Gopal, Narendra Ahuja, Liu Ren
for: This paper targets recognizing gestures from longer distances, for applications such as gesture-based interaction with a floor cleaning robot or a drone.
methods: The proposed method uses a dynamic neural network to select features from gesture-containing spatial regions of the input sensor data for further processing, which helps the network focus on features important for gesture recognition while discarding background features early on.
results: The proposed method outperforms previous state-of-the-art methods in recognition accuracy and compute efficiency on the LD-ConGR long-distance dataset.
Abstract
Gestures form an important medium of communication between humans and machines. An overwhelming majority of existing gesture recognition methods are tailored to a scenario where humans and machines are located very close to each other. This short-distance assumption does not hold true for several types of interactions, for example gesture-based interactions with a floor cleaning robot or with a drone. Methods made for short-distance recognition are unable to perform well on long-distance recognition due to gestures occupying only a small portion of the input data. Their performance is especially worse in resource constrained settings where they are not able to effectively focus their limited compute on the gesturing subject. We propose a novel, accurate and efficient method for the recognition of gestures from longer distances. It uses a dynamic neural network to select features from gesture-containing spatial regions of the input sensor data for further processing. This helps the network focus on features important for gesture recognition while discarding background features early on, thus making it more compute efficient compared to other techniques. We demonstrate the performance of our method on the LD-ConGR long-distance dataset where it outperforms previous state-of-the-art methods on recognition accuracy and compute efficiency.
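A minimal sketch of the dynamic-selection idea: a cheap localizer scores spatial regions, and only the top-scoring crop is passed to the heavier recognition branch. The grid-based crop and all layer sizes are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionGatedRecognizer(nn.Module):
    """Toy dynamic network: score grid cells cheaply, then run the
    heavier recognizer only on the most gesture-like crop."""

    def __init__(self, n_classes=10, grid=4):
        super().__init__()
        self.grid = grid
        self.localizer = nn.Conv2d(3, 1, kernel_size=3, padding=1)
        self.recognizer = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, n_classes),
        )

    def forward(self, x):
        b, _, h, w = x.shape
        scores = F.adaptive_avg_pool2d(self.localizer(x), self.grid)
        idx = scores.flatten(1).argmax(dim=1)        # best cell per image
        gy, gx = idx // self.grid, idx % self.grid
        ch, cw = h // self.grid, w // self.grid
        crops = torch.stack(
            [x[i, :, gy[i]*ch:(gy[i]+1)*ch, gx[i]*cw:(gx[i]+1)*cw]
             for i in range(b)]
        )
        return self.recognizer(crops)

logits = RegionGatedRecognizer()(torch.randn(2, 3, 128, 128))
print(logits.shape)  # torch.Size([2, 10])
```

The compute saving comes from the recognizer seeing only a small crop instead of the full frame, which matters most when the gesturing subject is far away and tiny in the input.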
1st Place Solution for CVPR2023 BURST Long Tail and Open World Challenges
paper_authors: Kaer Huang
for: This paper focuses on video instance segmentation (VIS) in long-tailed and open-world scenarios, where traditional VIS methods are limited to a small number of common classes but real-world applications require detection and tracking of rare and never-before-seen objects.
methods: The proposed method, LeTracker, trains the detector with segmentation and CEM on the LVISv0.5 + COCO dataset, and trains an instance appearance similarity head on the TAO dataset.
results: LeTracker achieves 14.9 HOTAall on the BURST test set and 61.4 OWTAall in the open-world challenge, ranking 1st on the benchmark in both settings.
Abstract
Currently, Video Instance Segmentation (VIS) aims at segmenting and categorizing objects in videos from a closed set of training categories containing only a few dozen classes, and thus lacks the ability to handle the diverse objects found in real-world videos. With the release of the TAO and BURST datasets, we have the opportunity to study VIS in long-tailed and open-world scenarios. Traditional VIS methods are evaluated on benchmarks limited to a small number of common classes, but practical applications require trackers that go beyond these common classes, detecting and tracking rare and even never-before-seen objects. Inspired by the latest MOT work on the long-tail task (Tracking Every Thing in the Wild, Siyuan Li et al.), for the BURST long-tail challenge we train our model on a combination of LVISv0.5 and the COCO dataset using repeat factor sampling. First, we train the detector with segmentation and CEM on the LVISv0.5 + COCO dataset. Then, we train the instance appearance similarity head on the TAO dataset. With this setup, our method (LeTracker) achieves 14.9 HOTAall on the BURST test set, ranking 1st on the benchmark. For the open-world challenge, we train using only annotations for 64 classes (the intersection of the BURST train subset and the COCO dataset, without the LVIS dataset), test on the BURST test set, and obtain 61.4 OWTAall, ranking 1st on the benchmark. Our code will be released to facilitate future research.
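Repeat factor sampling, the long-tail ingredient named above, follows the LVIS recipe of Gupta et al. (2019): each category c gets a factor r(c) = max(1, sqrt(t / f(c))) based on its image frequency f(c), and each image is repeated according to the rarest category it contains. A small self-contained sketch:

```python
import math
from collections import defaultdict

def repeat_factors(image_categories, t=0.001):
    """LVIS-style repeat factor sampling: oversample images containing
    rare categories. `image_categories`: one set of category ids per
    image. `t` is the frequency threshold hyperparameter."""
    n = len(image_categories)
    freq = defaultdict(int)
    for cats in image_categories:
        for c in cats:
            freq[c] += 1
    # Category-level factor: r(c) = max(1, sqrt(t / f(c))).
    r_cat = {c: max(1.0, math.sqrt(t / (k / n))) for c, k in freq.items()}
    # Image-level factor: the max over the categories the image contains.
    return [max(r_cat[c] for c in cats) if cats else 1.0
            for cats in image_categories]

imgs = [{1}, {1}, {1, 2}, {2}, {3}]   # category 3 is rare
print(repeat_factors(imgs, t=0.5))    # [1.0, 1.0, ~1.12, ~1.12, ~1.58]
```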
Semi-Supervised Semantic Segmentation of Cell Nuclei via Diffusion-based Large-Scale Pre-Training and Collaborative Learning
results: Experiments on four publicly available datasets demonstrate that our framework compares favorably with competitive semi-supervised learning methods and further improves upon supervised segmentation baselines.
Abstract
Automated semantic segmentation of cell nuclei in microscopic images is crucial for disease diagnosis and tissue microenvironment analysis. Nonetheless, this task presents challenges due to the complexity and heterogeneity of cells. While supervised deep learning methods are promising, they necessitate large annotated datasets that are time-consuming and error-prone to acquire. Semi-supervised approaches could provide feasible alternatives to this issue. However, the limited annotated data may lead to subpar performance of semi-supervised methods, regardless of the abundance of unlabeled data. In this paper, we introduce a novel unsupervised pre-training-based semi-supervised framework for cell-nuclei segmentation. Our framework is comprised of three main components. Firstly, we pretrain a diffusion model on a large-scale unlabeled dataset. The diffusion model's explicit modeling capability facilitates the learning of semantic feature representation from the unlabeled data. Secondly, we achieve semantic feature aggregation using a transformer-based decoder, where the pretrained diffusion model acts as the feature extractor, enabling us to fully utilize the small amount of labeled data. Finally, we implement a collaborative learning framework between the diffusion-based segmentation model and a supervised segmentation model to further enhance segmentation performance. Experiments were conducted on four publicly available datasets to demonstrate significant improvements compared to competitive semi-supervised segmentation methods and supervised baselines. A series of out-of-distribution tests further confirmed the generality of our framework. Furthermore, thorough ablation experiments and visual analysis confirmed the superiority of our proposed method.
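One common way to use a pretrained diffusion model as a feature extractor, in the spirit of the first two components described above, is to noise the input to a chosen timestep, run the denoising UNet, and read off intermediate activations for the downstream decoder. The sketch below assumes a generic UNet; the hook target `mid_block` and the call signature are placeholders, not the paper's implementation.

```python
import torch

def diffusion_features(unet, x0, t, alphas_cumprod):
    """Sketch: extract intermediate features from a pretrained denoising
    diffusion UNet. `alphas_cumprod` is the noise schedule's cumulative
    product; `unet.mid_block` is a hypothetical module name."""
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t]
    # Forward-process noising: x_t = sqrt(a_bar) x_0 + sqrt(1 - a_bar) eps.
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    feats = {}
    handle = unet.mid_block.register_forward_hook(
        lambda module, inp, out: feats.update(mid=out)
    )
    with torch.no_grad():
        unet(x_t, torch.tensor([t]))  # assumed (image, timestep) signature
    handle.remove()
    return feats["mid"]  # passed on to the transformer-based decoder
```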
Towards Automatic Scoring of Spinal X-ray for Ankylosing Spondylitis
methods: The pipeline uses our previously developed VU extraction pipeline (VertXNet) to generate VUs, which are then used as input to predict the mSASSS scores.
results: Our results show that the pipeline can predict the mSASSS score for each VU even when the data is limited in quantity and imbalanced. Overall, it achieves balanced accuracies of 0.56 and 0.51 for the 4 different mSASSS scores (i.e., scores of 0, 1, 2, 3) on two test datasets.
Abstract
Manually grading structural changes with the modified Stoke Ankylosing Spondylitis Spinal Score (mSASSS) on spinal X-ray imaging is costly and time-consuming due to bone shape complexity and image quality variations. In this study, we address this challenge by prototyping a 2-step auto-grading pipeline, called VertXGradeNet, to automatically predict mSASSS scores for the cervical and lumbar vertebral units (VUs) in X-ray spinal imaging. The VertXGradeNet utilizes VUs generated by our previously developed VU extraction pipeline (VertXNet) as input and predicts mSASSS based on those VUs. VertXGradeNet was evaluated on an in-house dataset of lateral cervical and lumbar X-ray images for axial spondylarthritis patients. Our results show that VertXGradeNet can predict the mSASSS score for each VU when the data is limited in quantity and imbalanced. Overall, it can achieve a balanced accuracy of 0.56 and 0.51 for 4 different mSASSS scores (i.e., a score of 0, 1, 2, 3) on two test datasets. The accuracy of the presented method shows the potential to streamline the spinal radiograph readings and therefore reduce the cost of future clinical trials.
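Balanced accuracy, the metric reported above, is the mean of per-class recalls, which keeps the score honest under the class imbalance described. A quick check with scikit-learn (toy labels, illustrative only):

```python
from sklearn.metrics import balanced_accuracy_score

# Four mSASSS grades (0-3); labels below are synthetic examples.
y_true = [0, 0, 1, 2, 3, 3, 2, 1]
y_pred = [0, 1, 1, 2, 3, 2, 2, 1]
print(balanced_accuracy_score(y_true, y_pred))  # 0.75 = mean per-class recall
```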