results: Across five classes of experiments, ViTScore evaluates image semantic similarity better than the three conventional metrics PSNR, MS-SSIM, and LPIPS.
Abstract
Semantic communication (SC) is expected to be a new paradigm that catalyzes next-generation communication, shifting the main concern from accurate bit transmission to effective semantic information exchange. However, the widely used image metrics are not suited to evaluating image semantic similarity in SC. Classical metrics for the similarity between two images, such as PSNR and MS-SSIM, usually operate at the pixel level or the structural level. Directly using tailored deep-learning-based metrics from the CV community, such as LPIPS, is infeasible for SC. To tackle this, inspired by BERTScore in the NLP community, we propose a novel metric for evaluating image semantic similarity, named Vision Transformer Score (ViTScore). We prove theoretically that ViTScore has three important properties, namely symmetry, boundedness, and normalization, which make it convenient and intuitive for image measurement. To evaluate ViTScore, we compare it with three typical metrics (PSNR, MS-SSIM, and LPIPS) through five classes of experiments. The results demonstrate that ViTScore evaluates image semantic similarity better than the other three metrics, which indicates that it is an effective performance metric for SC scenarios.
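Summary
Semantic communication (SC) is expected to become a new paradigm that catalyzes next-generation communication, moving the focus from accurate bit transmission to effective exchange of semantic information. However, the widely used image metrics are not suited to evaluating image semantic similarity. Classical image similarity measures, such as PSNR and MS-SSIM, usually work at the pixel or structural level, and directly using deep-learning-based metrics from the CV community, such as LPIPS, is infeasible in SC. To address this, we propose a new metric for image semantic similarity named Vision Transformer Score (ViTScore). We prove that ViTScore has three important properties, namely symmetry, boundedness, and normalization, which make it convenient and intuitive for image measurement. To evaluate ViTScore, we compare it with three typical metrics (PSNR, MS-SSIM, and LPIPS) in five classes of experiments. The results show that ViTScore evaluates image semantic similarity better, which indicates that it is an effective performance metric in SC scenarios.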
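The abstract states only that ViTScore is inspired by BERTScore; it does not give the formula. The sketch below therefore shows BERTScore-style greedy matching applied to two sets of patch embeddings, on the assumption that ViTScore follows the same pattern. The function names and the toy two-dimensional "embeddings" are hypothetical; in practice the rows would come from a Vision Transformer.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def greedy_match_f1(ref_patches, cand_patches):
    """BERTScore-style F1 between two lists of patch embeddings (assumed form)."""
    ref = [_normalize(v) for v in ref_patches]
    cand = [_normalize(v) for v in cand_patches]
    # Pairwise cosine similarity: rows are reference patches, columns candidates.
    sim = [[sum(a * b for a, b in zip(r, c)) for c in cand] for r in ref]
    # Recall: each reference patch is matched to its most similar candidate patch.
    recall = sum(max(row) for row in sim) / len(ref)
    # Precision: each candidate patch is matched to its most similar reference patch.
    precision = sum(max(row[j] for row in sim) for j in range(len(cand))) / len(cand)
    if precision <= 0 or recall <= 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


image_a = [[1.0, 0.0], [0.0, 1.0]]
image_b = [[1.0, 1.0], [0.0, 2.0]]
print(greedy_match_f1(image_a, image_a))
print(greedy_match_f1(image_a, image_b))
```

Swapping the two arguments exchanges precision and recall, so the F1 value is unchanged. This is one simple way a matching-based score can be symmetric.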
Evaluating Chatbots to Promote Users’ Trust – Practices and Open Problems
results: Evaluates chatbot performance and user satisfaction, as well as long-term effects on society.
Abstract
Chatbots, the common moniker for collaborative assistants, are Artificial Intelligence (AI) software that enables people to naturally interact with them to get tasks done. Although chatbots have been studied since the dawn of AI, they have particularly caught the imagination of the public and businesses since the launch of easy-to-use and general-purpose Large Language Model-based chatbots like ChatGPT. As businesses look towards chatbots as a potential technology to engage users, who may be end customers, suppliers, or even their own employees, proper testing of chatbots is important to address and mitigate issues of trust related to service or product performance, user satisfaction and long-term unintended consequences for society. This paper reviews current practices for chatbot testing, identifies gaps as open problems in pursuit of user trust, and outlines a path forward.
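Summary
Chatbots are AI software that lets people interact with them naturally to get tasks done. Although chatbots have been studied since the emergence of AI, they have drawn particular attention from the public and businesses since easy-to-use, general-purpose chatbots based on large language models, such as ChatGPT, appeared. As businesses look to chatbots to engage users, including customers, suppliers, and employees, proper testing of chatbots is important for addressing issues of service or product performance, user satisfaction, and long-term consequences for society. This paper reviews current chatbot testing practices, describes the existing gaps and challenges, and outlines a path forward.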
Recall-driven Precision Refinement: Unveiling Accurate Fall Detection using LSTM
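results: Our experiments show that the system attains high recall with a specificity of 96%, meeting the goal of fall detection. The work advances fall detection technology and offers a reliable and effective solution for fall prevention and intervention.
Abstract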
This paper presents an innovative approach to address the pressing concern of fall incidents among the elderly by developing an accurate fall detection system. Our proposed system combines state-of-the-art technologies, including accelerometer and gyroscope sensors, with deep learning models, specifically Long Short-Term Memory (LSTM) networks. Real-time execution capabilities are achieved through the integration of Raspberry Pi hardware. We introduce pruning techniques that strategically fine-tune the LSTM model's architecture and parameters to optimize the system's performance. We prioritize recall over precision, aiming to accurately identify falls and minimize false negatives for timely intervention. Extensive experimentation and meticulous evaluation demonstrate remarkable performance metrics, emphasizing a high recall rate while maintaining a specificity of 96%. Our research culminates in a state-of-the-art fall detection system that promptly sends notifications, ensuring vulnerable individuals receive timely assistance and improve their overall well-being. Applying LSTM models and incorporating pruning techniques represent a significant advancement in fall detection technology, offering an effective and reliable fall prevention and intervention solution.
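Summary
This paper presents an innovative approach to the problem of falls among the elderly: a highly accurate fall detection system. The proposed system combines accelerometer and gyroscope sensors with deep learning models, specifically Long Short-Term Memory (LSTM) networks. Real-time execution is achieved by integrating Raspberry Pi hardware. We introduce pruning techniques that tune the LSTM model's architecture and parameters to improve system performance. We prioritize recall over precision, aiming to identify falls accurately. Rigorous experiments and evaluation show strong performance, with a high recall rate and a specificity of 96%. The work culminates in a state-of-the-art fall detection system that promptly sends notifications, so that vulnerable individuals receive timely assistance and their overall well-being improves. Applying LSTM models with pruning yields an effective and reliable solution for fall prevention and intervention.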
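To make the recall, precision, and specificity trade-off concrete, here is a minimal sketch that computes the three metrics from a detector's scores at a chosen decision threshold. The labels, scores, and thresholds are made-up illustration data, not results from the paper, and the function names are hypothetical.

```python
def confusion_counts(y_true, scores, threshold):
    """Count (tp, fp, tn, fn) when a score >= threshold is reported as a fall."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted_fall = score >= threshold
        if label == 1 and predicted_fall:
            tp += 1
        elif label == 1:
            fn += 1  # missed fall: the error a recall-first design tries to avoid
        elif predicted_fall:
            fp += 1  # false alarm
        else:
            tn += 1
    return tp, fp, tn, fn


def recall_precision_specificity(tp, fp, tn, fn):
    """Return the three rates, using 0.0 when a denominator is empty."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return recall, precision, specificity


# Toy data: 1 = fall, 0 = normal activity.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.45, 0.2, 0.1, 0.05]

for threshold in (0.5, 0.35):
    counts = confusion_counts(labels, scores, threshold)
    print(threshold, recall_precision_specificity(*counts))
```

On this toy data, lowering the threshold from 0.5 to 0.35 removes the missed fall but adds a false alarm, which is the kind of trade a recall-first design accepts.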
Distributional Data Augmentation Methods for Low Resource Language
results: On Swedish corpora, the proposed methods improve classification performance, particularly in low-resource settings.
Abstract
Text augmentation is a technique for constructing synthetic data from an under-resourced corpus to improve predictive performance. Synthetic data generation is common in numerous domains. However, recently text augmentation has emerged in natural language processing (NLP) to improve downstream tasks. One of the current state-of-the-art text augmentation techniques is easy data augmentation (EDA), which augments the training data by injecting and replacing synonyms and randomly permuting sentences. One major obstacle with EDA is the need for versatile and complete synonym dictionaries, which cannot be easily found in low-resource languages. To improve the utility of EDA, we propose two extensions, easy distributional data augmentation (EDDA) and type specific similar word replacement (TSSR), which use semantic word context information and part-of-speech tags for word replacement and augmentation. In an extensive empirical evaluation, we show the utility of the proposed methods, measured by F1 score, on two representative datasets in Swedish as an example of a low-resource language. With the proposed methods, we show that augmented data improve classification performances in low-resource settings.
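Summary
Text augmentation is a technique for constructing synthetic data from an under-resourced corpus to improve predictive performance. Synthetic data generation is common in many domains, but text augmentation has only recently been applied in natural language processing (NLP). One current state-of-the-art text augmentation technique is easy data augmentation (EDA), which injects and replaces synonyms and randomly permutes sentences in the training data. However, EDA needs broad and complete synonym dictionaries, which are hard to find for low-resource languages. To improve the utility of EDA, we propose two extensions, easy distributional data augmentation (EDDA) and type specific similar word replacement (TSSR), which use semantic word context information and part-of-speech tags for word replacement and augmentation. An extensive empirical evaluation on two representative datasets shows the utility of the proposed methods, measured by F1 score. With the augmented data, we show improved predictive performance in low-resource settings.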
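The two EDA-style operations mentioned above, synonym replacement and random reordering, can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the tiny synonym table is hypothetical, and the reordering here swaps tokens within one sentence. A distributional variant in the spirit of EDDA would presumably fill the same table from nearest neighbors in a word-embedding space rather than from a hand-built dictionary.

```python
import random


def synonym_replace(tokens, synonyms, n, rng):
    """Replace up to n tokens that have an entry in the synonym table."""
    out = list(tokens)
    candidates = [i for i, tok in enumerate(out) if tok in synonyms]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(synonyms[out[i]])
    return out


def random_swap(tokens, n, rng):
    """Swap two distinct positions, n times."""
    out = list(tokens)
    for _ in range(n):
        if len(out) < 2:
            break
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


rng = random.Random(0)
sentence = "the movie was good".split()
# Hypothetical synonym table; none of its entries equals the word it replaces.
table = {"good": ["fine", "great"], "movie": ["film"]}

replaced = synonym_replace(sentence, table, 1, rng)
swapped = random_swap(sentence, 1, rng)
print(replaced)
print(swapped)
```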
AmbientFlow: Invertible generative models from incomplete, noisy measurements
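results: Numerical studies show that AmbientFlow correctly learns the object distribution and is useful for image reconstruction as a downstream inference task.
Abstract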
Generative models have gained popularity for their potential applications in imaging science, such as image reconstruction, posterior sampling and data sharing. Flow-based generative models are particularly attractive due to their ability to tractably provide exact density estimates along with fast, inexpensive and diverse samples. Training such models, however, requires a large, high quality dataset of objects. In applications such as computed imaging, it is often difficult to acquire such data due to requirements such as long acquisition time or high radiation dose, while acquiring noisy or partially observed measurements of these objects is more feasible. In this work, we propose AmbientFlow, a framework for learning flow-based generative models directly from noisy and incomplete data. Using variational Bayesian methods, a novel framework for establishing flow-based generative models from noisy, incomplete data is proposed. Extensive numerical studies demonstrate the effectiveness of AmbientFlow in correctly learning the object distribution. The utility of AmbientFlow in a downstream inference task of image reconstruction is demonstrated.
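Summary
Generative models are widely used in imaging science, for example in image reconstruction, posterior sampling, and data sharing. Flow-based generative models are especially attractive because they tractably provide exact density estimates together with fast, inexpensive, and diverse samples. Training these models, however, requires a large, high-quality dataset of objects. In computed imaging applications such data are often hard to obtain because of long acquisition times or high radiation doses, whereas acquiring noisy or partially observed measurements of the objects is more feasible. In this work we propose AmbientFlow, a framework that learns flow-based generative models directly from noisy and incomplete data using variational Bayesian methods. Numerical studies show that AmbientFlow correctly learns the object distribution, and its utility in image reconstruction as a downstream inference task is also demonstrated.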
Speech Emotion Recognition with Distilled Prosodic and Linguistic Affect Representations
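results: Experiments show that our method outperforms other unimodal and multimodal methods on the IEMOCAP benchmark by a considerable margin and achieves state-of-the-art performance (77.49% unweighted accuracy and 78.91% weighted accuracy). Detailed ablation studies show the impact of each component.
Abstract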
We propose EmoDistill, a novel speech emotion recognition (SER) framework that leverages cross-modal knowledge distillation during training to learn strong linguistic and prosodic representations of emotion from speech. During inference, our method only uses a stream of speech signals to perform unimodal SER thus reducing computation overhead and avoiding run-time transcription and prosodic feature extraction errors. During training, our method distills information at both embedding and logit levels from a pair of pre-trained Prosodic and Linguistic teachers that are fine-tuned for SER. Experiments on the IEMOCAP benchmark demonstrate that our method outperforms other unimodal and multimodal techniques by a considerable margin, and achieves state-of-the-art performance of 77.49% unweighted accuracy and 78.91% weighted accuracy. Detailed ablation studies demonstrate the impact of each component of our method.
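Summary
We propose EmoDistill, a new speech emotion recognition (SER) framework that uses cross-modal knowledge distillation during training to learn strong linguistic and prosodic representations of emotion from speech. During inference, the method uses only a stream of speech signals to perform unimodal SER, which reduces computation overhead and avoids run-time transcription and prosodic feature extraction errors. During training, the method distills information at both the embedding and logit levels from a pair of pre-trained Prosodic and Linguistic teachers that are fine-tuned for SER. Experiments on the IEMOCAP benchmark show that the method outperforms other unimodal and multimodal techniques by a considerable margin and achieves state-of-the-art performance of 77.49% unweighted accuracy and 78.91% weighted accuracy. Detailed ablation studies show the impact of each component.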
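The two reported numbers measure different things. Under the convention commonly used in SER work (an assumption here, since the abstract does not define the terms), weighted accuracy is the overall fraction of correct predictions, while unweighted accuracy is the mean of per-class recalls, so every emotion counts equally regardless of how often it occurs. A small sketch with made-up labels:

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """Overall fraction of correct predictions (frequent classes dominate)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls (each class counts equally)."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy, imbalanced example: the single "sad" utterance is misclassified.
truth = ["neutral", "neutral", "neutral", "neutral", "angry", "sad"]
preds = ["neutral", "neutral", "neutral", "neutral", "angry", "neutral"]

print(weighted_accuracy(truth, preds))    # 5 of 6 utterances are correct
print(unweighted_accuracy(truth, preds))  # recalls are 1, 1, and 0
```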
Verifiable Reinforcement Learning Systems via Compositionality
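results: Experiments show that the framework accomplishes tasks effectively in environments with both full and partial observability. It also handles discrete and continuous state and action spaces, as well as deterministic and stochastic dynamics.
Abstract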
We propose a framework for verifiable and compositional reinforcement learning (RL) in which a collection of RL subsystems, each of which learns to accomplish a separate subtask, are composed to achieve an overall task. The framework consists of a high-level model, represented as a parametric Markov decision process, which is used to plan and analyze compositions of subsystems, and of the collection of low-level subsystems themselves. The subsystems are implemented as deep RL agents operating under partial observability. By defining interfaces between the subsystems, the framework enables automatic decompositions of task specifications, e.g., reach a target set of states with a probability of at least 0.95, into individual subtask specifications, i.e. achieve the subsystem's exit conditions with at least some minimum probability, given that its entry conditions are met. This in turn allows for the independent training and testing of the subsystems. We present theoretical results guaranteeing that if each subsystem learns a policy satisfying its subtask specification, then their composition is guaranteed to satisfy the overall task specification. Conversely, if the subtask specifications cannot all be satisfied by the learned policies, we present a method, formulated as the problem of finding an optimal set of parameters in the high-level model, to automatically update the subtask specifications to account for the observed shortcomings. The result is an iterative procedure for defining subtask specifications, and for training the subsystems to meet them. Experimental results demonstrate the presented framework's novel capabilities in environments with both full and partial observability, discrete and continuous state and action spaces, as well as deterministic and stochastic dynamics.
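Summary
We propose a framework for verifiable and compositional reinforcement learning (RL) in which a collection of RL subsystems, each learning to accomplish a separate subtask, are composed to achieve an overall task. The framework includes a high-level model, represented as a parametric Markov decision process, that is used to plan and analyze compositions of subsystems. The subsystems are implemented as deep RL agents operating under partial observability. By defining interfaces between the subsystems, the framework automatically decomposes a task specification, for example reaching a target set of states with probability at least 0.95, into individual subtask specifications, that is, achieving a subsystem's exit conditions with at least some minimum probability given that its entry conditions are met. This allows the subsystems to be trained and tested independently. We give theoretical results guaranteeing that if each subsystem learns a policy satisfying its subtask specification, then the composition satisfies the overall task specification. Conversely, if the learned policies cannot satisfy all subtask specifications, we present a method, formulated as finding an optimal set of parameters in the high-level model, that automatically updates the subtask specifications to account for the observed shortcomings. The result is an iterative procedure for defining subtask specifications and training the subsystems to meet them. Experiments demonstrate the framework's novel capabilities in environments with full and partial observability and with discrete and continuous state spaces.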
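For intuition about how a task-level probability requirement turns into subtask requirements, consider the simplest possible composition: a single chain of subsystems executed one after another. This is a deliberate simplification, since the paper's high-level model is a parametric Markov decision process that can branch. In the chain case, if each subsystem reaches its exit conditions with probability at least p_i whenever its entry conditions hold, the whole chain succeeds with probability at least the product of the p_i. The function names are hypothetical.

```python
import math


def composed_lower_bound(subtask_probs):
    """Lower bound on chain success: product of per-subtask success probabilities."""
    return math.prod(subtask_probs)


def uniform_subtask_requirement(task_prob, num_subtasks):
    """Equal per-subtask probability that makes the chain meet task_prob."""
    return task_prob ** (1.0 / num_subtasks)


required = uniform_subtask_requirement(0.95, 3)
print(required)                                   # about 0.983 per subtask
print(composed_lower_bound([0.99, 0.98, 0.97]))   # about 0.941, below 0.95
```

The second line shows why specifications may need updating: three subsystems that each look strong in isolation can still fall short of the overall requirement.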
Global Convergence of Receding-Horizon Policy Search in Learning Estimator Designs
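methods: The RHPG algorithm integrates vanilla policy search directions into a dynamic programming outer loop, which converts the infinite-horizon KF problem into a sequence of static estimation problems, and it comes with an analysis of the optimization landscape and sample complexity guarantees.
results: The RHPG algorithm achieves global convergence and requires neither prior knowledge of the system nor open-loop stability. We also provide a fine-grained analysis of the optimization landscape together with sample complexity guarantees. The work is an initial attempt to develop reinforcement learning algorithms with performance guarantees for control applications, drawing on classic control theory in both algorithm design and theoretical analysis. We also validate RHPG on a large-scale convection-diffusion model. The code repository is available at https://github.com/xiangyuan-zhang/LearningKF.
Abstract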
We introduce the receding-horizon policy gradient (RHPG) algorithm, the first PG algorithm with provable global convergence in learning the optimal linear estimator designs, i.e., the Kalman filter (KF). Notably, the RHPG algorithm does not require any prior knowledge of the system for initialization and does not require the target system to be open-loop stable. The key of RHPG is that we integrate vanilla PG (or any other policy search directions) into a dynamic programming outer loop, which iteratively decomposes the infinite-horizon KF problem that is constrained and non-convex in the policy parameter into a sequence of static estimation problems that are unconstrained and strongly-convex, thus enabling global convergence. We further provide fine-grained analyses of the optimization landscape under RHPG and detail the convergence and sample complexity guarantees of the algorithm. This work serves as an initial attempt to develop reinforcement learning algorithms specifically for control applications with performance guarantees by utilizing classic control theory in both algorithmic design and theoretical analyses. Lastly, we validate our theories by deploying the RHPG algorithm to learn the Kalman filter design of a large-scale convection-diffusion model. We open-source the code repository at https://github.com/xiangyuan-zhang/LearningKF.
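Summary
We introduce the receding-horizon policy gradient (RHPG) algorithm, the first PG algorithm with provable global convergence in learning optimal linear estimator designs, i.e., the Kalman filter (KF). Notably, RHPG needs no prior knowledge of the system for initialization and does not require the target system to be open-loop stable. The key idea is to integrate vanilla PG (or any other policy search direction) into a dynamic programming outer loop, which decomposes the infinite-horizon KF problem, constrained and non-convex in the policy parameter, into a sequence of static estimation problems that are unconstrained and strongly convex, thus enabling global convergence. We also give a fine-grained analysis of the optimization landscape under RHPG and detail the algorithm's convergence and sample complexity guarantees. This work is an initial attempt to develop reinforcement learning algorithms for control applications by using classic control theory in both algorithm design and theoretical analysis. Finally, we validate the theory by applying RHPG to learn the Kalman filter design of a large-scale convection-diffusion model. The code is open-sourced at https://github.com/xiangyuan-zhang/LearningKF.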
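The claim that a static estimation problem is unconstrained and strongly convex can be checked on a scalar toy case. Assume a state x with variance p observed as y = c*x + v, where the noise v has variance r, and a linear estimate gain*y. The mean squared error is then a quadratic in the gain, so plain gradient descent from any starting point reaches the unique minimizer, which coincides with the scalar Kalman gain. The sketch uses the exact gradient instead of a sampled policy gradient and covers a single static problem, not the receding-horizon recursion; all names and values are illustrative.

```python
def static_estimation_cost(gain, c, p, r):
    """Mean squared error E[(x - gain * y)^2] for y = c * x + v."""
    return (1.0 - gain * c) ** 2 * p + gain ** 2 * r


def gradient_descent_gain(c, p, r, step=0.1, iters=500):
    """Minimize the quadratic cost with fixed-step gradient descent."""
    gain = 0.0
    for _ in range(iters):
        grad = -2.0 * c * (1.0 - gain * c) * p + 2.0 * gain * r
        gain -= step * grad
    return gain


def kalman_gain(c, p, r):
    """Closed-form minimizer of the same cost (scalar Kalman gain)."""
    return c * p / (c * c * p + r)


learned = gradient_descent_gain(c=1.0, p=2.0, r=1.0)
print(learned, kalman_gain(1.0, 2.0, 1.0))
```

With these values the curvature of the cost is 2*(c*c*p + r) = 6, so a step size of 0.1 contracts the error by a factor of 0.4 per iteration.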
Good-looking but Lacking Faithfulness: Understanding Local Explanation Methods through Trend-based Testing
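results: The study finds that the new trend-based tests assess faithfulness better and enable the first assessment of explanation methods on complex data. Downstream tasks also benefit; for example, model debugging equipped with faithful explanation methods is better at detecting and correcting accuracy and security problems.
Abstract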
While enjoying the great achievements brought by deep learning (DL), people are also worried about the decisions made by DL models, since the high degree of non-linearity of DL models makes the decisions extremely difficult to understand. Consequently, attacks such as adversarial attacks are easy to carry out, but difficult to detect and explain, which has led to a boom in the research on local explanation methods for explaining model decisions. In this paper, we evaluate the faithfulness of explanation methods and find that traditional tests on faithfulness encounter the random dominance problem, i.e., the random selection performs the best, especially for complex data. To further solve this problem, we propose three trend-based faithfulness tests and empirically demonstrate that the new trend tests can better assess faithfulness than traditional tests on image, natural language and security tasks. We implement the assessment system and evaluate ten popular explanation methods. Benefiting from the trend tests, we successfully assess the explanation methods on complex data for the first time, bringing unprecedented discoveries and inspiring future research. Downstream tasks also greatly benefit from the tests. For example, model debugging equipped with faithful explanation methods performs much better for detecting and correcting accuracy and security problems.
Summary
While enjoying the achievements of deep learning (DL), people also worry about the decisions made by DL models, whose high degree of non-linearity makes those decisions extremely difficult to understand. As a result, attacks such as adversarial attacks are easy to carry out but hard to detect and explain, which has led to a boom in research on local explanation methods for model decisions. In this paper we evaluate the faithfulness of explanation methods and find that traditional faithfulness tests run into the random dominance problem, i.e., random selection performs best, especially on complex data. To solve this problem, we propose three trend-based faithfulness tests and show that they assess faithfulness better than traditional tests on image, natural language, and security tasks. We implement the assessment system and evaluate ten popular explanation methods. Thanks to the trend tests, we assess explanation methods on complex data for the first time, which brings new discoveries and inspires future research. Downstream tasks also benefit: for example, model debugging equipped with faithful explanation methods performs much better at detecting and correcting accuracy and security problems.
Timely Fusion of Surround Radar/Lidar for Object Detection in Autonomous Driving Systems
paper_authors: Wenjing Xie, Tao Hu, Neiwen Ling, Guoliang Xing, Shaoshan Liu, Nan Guan
for: This paper aims to improve the fusion of surround Radar and Lidar sensor data for autonomous driving systems by developing techniques to work with the faster Lidar data instead of the slower Radar data.
methods: The proposed method uses the state-of-the-art object detection model MVDNet to fuse surround Radar/Lidar data, but with enhanced training to tolerate the temporal unalignment of input data.
results: The proposed method achieves high output frequency with little accuracy loss, making it a promising solution for real-time object detection in autonomous driving systems.
Abstract
Fusing Radar and Lidar sensor data can fully utilize their complementary advantages and provide more accurate reconstruction of the surroundings for autonomous driving systems. Surround Radar/Lidar can provide 360-degree view sampling at minimal cost, making them promising sensing hardware solutions for autonomous driving systems. However, due to the intrinsic physical constraints, the rotating speed of surround Radar, and thus the frequency to generate Radar data frames, is much lower than surround Lidar. Existing Radar/Lidar fusion methods have to work at the low frequency of surround Radar, which cannot meet the high responsiveness requirement of autonomous driving systems. This paper develops techniques to fuse surround Radar/Lidar with working frequency only limited by the faster surround Lidar instead of the slower surround Radar, based on the state-of-the-art object detection model MVDNet. The basic idea of our approach is simple: we let MVDNet work with temporally unaligned data from Radar/Lidar, so that fusion can take place at any time when a new Lidar data frame arrives, instead of waiting for the slow Radar data frame. However, directly applying MVDNet to temporally unaligned Radar/Lidar data greatly degrades its object detection accuracy. The key information revealed in this paper is that we can achieve high output frequency with little accuracy loss by enhancing the training procedure to explore the temporal redundancy in MVDNet so that it can tolerate the temporal unalignment of input data. We explore several different ways of training enhancement and compare them quantitatively with experiments.
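Summary
Fusing Radar and Lidar sensor data can fully exploit their complementary advantages and provide a more accurate reconstruction of the surroundings for autonomous driving systems. Surround Radar/Lidar provides 360-degree view sampling and is a promising sensing hardware solution for such systems. However, because of physical constraints, the rotating speed of surround Radar is relatively low, so the frequency of Radar data frames is much lower than that of Lidar. Existing Radar/Lidar fusion methods have to work at the low frequency of Radar data frames and cannot meet the high responsiveness requirement of autonomous driving systems. This paper proposes a solution that fuses Radar/Lidar based on the state-of-the-art object detection model MVDNet. The idea is simple: let MVDNet work with temporally unaligned Radar/Lidar data, so that fusion can take place whenever a new Lidar data frame arrives instead of waiting for the slow Radar data frame. However, directly applying MVDNet to temporally unaligned Radar/Lidar data degrades object detection accuracy. We find that enhancing the training procedure to exploit the temporal redundancy in MVDNet lets it tolerate temporal unalignment of the input data. We try several training enhancement methods and compare them quantitatively.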
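The scheduling idea, fusing whenever a Lidar frame arrives and reusing the most recent Radar frame, can be sketched independently of MVDNet. The snippet below pairs each Lidar timestamp with the latest Radar timestamp at or before it and reports the resulting lag, which is the temporal unalignment the enhanced training has to tolerate. The frame rates (one Lidar frame every 50 ms, one Radar frame every 250 ms) are hypothetical values chosen for illustration, and the function name is an assumption.

```python
import bisect


def pair_with_latest_radar(lidar_times, radar_times):
    """Pair each Lidar timestamp with the latest Radar timestamp at or before it.

    Both lists hold sorted timestamps in milliseconds.
    Returns tuples (lidar_time, radar_time, lag).
    """
    pairs = []
    for t in lidar_times:
        idx = bisect.bisect_right(radar_times, t) - 1
        if idx < 0:
            continue  # no Radar frame has arrived yet
        pairs.append((t, radar_times[idx], t - radar_times[idx]))
    return pairs


lidar = [0, 50, 100, 150, 200, 250]  # hypothetical Lidar frame times
radar = [0, 250]                     # hypothetical Radar frame times

for lidar_t, radar_t, lag in pair_with_latest_radar(lidar, radar):
    print(f"lidar={lidar_t} ms  radar={radar_t} ms  lag={lag} ms")
```

Waiting for aligned frames would yield two fused outputs over this window; pairing with the latest Radar frame yields six, at the cost of a lag of up to 200 ms.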
Finding Influencers in Complex Networks: An Effective Deep Reinforcement Learning Approach
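methods: The paper proposes DREIM, an end-to-end learning framework that combines a graph neural network with reinforcement learning. After extensive training on small synthetic graphs, it outperforms existing baseline methods on large synthetic and real-world networks and empirically shows linear scalability with respect to network size.
results: On the influence maximization problem, DREIM achieves higher solution quality than existing baseline methods and scales linearly with network size.
Abstract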
Maximizing influences in complex networks is a practically important but computationally challenging task for social network analysis, due to its NP-hard nature. Most current approximation or heuristic methods either require tremendous human design efforts or achieve unsatisfying balances between effectiveness and efficiency. Recent machine learning attempts only focus on speed but lack performance enhancement. In this paper, different from previous attempts, we propose an effective deep reinforcement learning model that achieves superior performances over traditional best influence maximization algorithms. Specifically, we design an end-to-end learning framework that combines graph neural network as the encoder and reinforcement learning as the decoder, named DREIM. Through extensive training on small synthetic graphs, DREIM outperforms the state-of-the-art baseline methods on very large synthetic and real-world networks on solution quality, and we also empirically show its linear scalability with regard to the network size, which demonstrates its superiority in solving this problem.
Summary
Maximizing influence in complex networks is a practically important task in social network analysis, but it is NP-hard. Existing approximation or heuristic methods usually either require tremendous human design effort or strike an unsatisfying balance between effectiveness and efficiency. Recent machine learning attempts focus only on speed and lack performance gains. In this paper, unlike previous attempts, we propose an effective deep reinforcement learning model that outperforms the best traditional influence maximization algorithms. Specifically, we design an end-to-end learning framework, named DREIM, that uses a graph neural network as the encoder and reinforcement learning as the decoder. After extensive training on small synthetic graphs, DREIM outperforms state-of-the-art baseline methods in solution quality on very large synthetic and real-world networks, and we also show empirically that it scales linearly with network size, which demonstrates its advantage on this problem.
Towards Real-World Burst Image Super-Resolution: Benchmark and Method
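results: Extensive experiments on two versions of the dataset show that FBAnet outperforms existing state-of-the-art burst SR methods and produces visually pleasant SR image predictions. The dataset, code, and models are publicly available on GitHub.
Abstract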
Despite substantial advances, single-image super-resolution (SISR) is always in a dilemma to reconstruct high-quality images with limited information from one input image, especially in realistic scenarios. In this paper, we establish a large-scale real-world burst super-resolution dataset, i.e., RealBSR, to explore the faithful reconstruction of image details from multiple frames. Furthermore, we introduce a Federated Burst Affinity network (FBAnet) to investigate non-trivial pixel-wise displacements among images under real-world image degradation. Specifically, rather than using pixel-wise alignment, our FBAnet employs a simple homography alignment from a structural geometry aspect and a Federated Affinity Fusion (FAF) strategy to aggregate the complementary information among frames. Those fused informative representations are fed to a Transformer-based module of burst representation decoding. Besides, we have conducted extensive experiments on two versions of our datasets, i.e., RealBSR-RAW and RealBSR-RGB. Experimental results demonstrate that our FBAnet outperforms existing state-of-the-art burst SR methods and also achieves visually-pleasant SR image predictions with model details. Our dataset, codes, and models are publicly available at https://github.com/yjsunnn/FBANet.
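Summary
Despite substantial advances, single-image super-resolution (SISR) still struggles to reconstruct high-quality images from the limited information in one input image, especially in realistic scenarios. In this paper we establish a large-scale real-world burst super-resolution dataset, RealBSR, to explore the faithful reconstruction of image details from multiple frames. We also introduce a Federated Burst Affinity network (FBAnet) to investigate non-trivial pixel-wise displacements among images under real-world degradation. Specifically, rather than using pixel-wise alignment, FBAnet employs a simple homography alignment from a structural geometry aspect and a Federated Affinity Fusion (FAF) strategy to aggregate the complementary information among frames. The fused representations are fed to a Transformer-based module for burst representation decoding. Extensive experiments on two versions of the dataset, RealBSR-RAW and RealBSR-RGB, show that FBAnet outperforms existing state-of-the-art burst SR methods and achieves visually pleasant SR image predictions with model details. The dataset, code, and models are publicly available at https://github.com/yjsunnn/FBANet.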
CPMR: Context-Aware Incremental Sequential Recommendation with Pseudo-Multi-Task Learning
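results: Experiments on four benchmark recommendation datasets show that CPMR consistently outperforms state-of-the-art baselines and achieves significant gains on three of them.
Abstract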
The motivations of users to make interactions can be divided into static preference and dynamic interest. To accurately model user representations over time, recent studies in sequential recommendation utilize information propagation and evolution to mine from batches of arriving interactions. However, they ignore the fact that people are easily influenced by the recent actions of other users in the contextual scenario, and applying evolution across all historical interactions dilutes the importance of recent ones, thus failing to model the evolution of dynamic interest accurately. To address this issue, we propose a Context-Aware Pseudo-Multi-Task Recommender System (CPMR) to model the evolution in both historical and contextual scenarios by creating three representations for each user and item under different dynamics: static embedding, historical temporal states, and contextual temporal states. To dually improve the performance of temporal states evolution and incremental recommendation, we design a Pseudo-Multi-Task Learning (PMTL) paradigm by stacking the incremental single-target recommendations into one multi-target task for joint optimization. Within the PMTL paradigm, CPMR employs a shared-bottom network to conduct the evolution of temporal states across historical and contextual scenarios, as well as the fusion of them at the user-item level. In addition, CPMR incorporates one real tower for incremental predictions, and two pseudo towers dedicated to updating the respective temporal states based on new batches of interactions. Experimental results on four benchmark recommendation datasets show that CPMR consistently outperforms state-of-the-art baselines and achieves significant gains on three of them. The code is available at: https://github.com/DiMarzioBian/CPMR.
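Summary
Users' motivations for interacting can be divided into static preference and dynamic interest. To model user representations accurately over time, recent studies in sequential recommendation use information propagation and evolution to mine batches of arriving interactions. However, they ignore the fact that people are easily influenced by the recent actions of other users in the contextual scenario, and they apply evolution across all historical interactions, so they fail to model the evolution of dynamic interest accurately. To address this, we propose a Context-Aware Pseudo-Multi-Task Recommender System (CPMR) that models the evolution of users and items in both historical and contextual scenarios using three representations: static embedding, historical temporal states, and contextual temporal states. To improve both temporal state evolution and incremental recommendation, we design a Pseudo-Multi-Task Learning (PMTL) paradigm in which CPMR uses a shared-bottom network to evolve the temporal states and fuse them at the user-item level. CPMR also includes one real tower for incremental predictions and two pseudo towers that update the respective temporal states from new batches of interactions. Experiments show that CPMR outperforms the baselines on four benchmark recommendation datasets and achieves significant gains on three of them. The code is available at https://github.com/DiMarzioBian/CPMR.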
TMComposites: Plug-and-Play Collaboration Between Specialized Tsetlin Machines
results: The TM Composite increases accuracy on Fashion-MNIST by 2 percentage points, on CIFAR-10 by 12 points, and on CIFAR-100 by 9 points, achieving new state-of-the-art results for TMs.
Abstract
Tsetlin Machines (TMs) provide a fundamental shift from arithmetic-based to logic-based machine learning. Supporting convolution, they deal successfully with image classification datasets like MNIST, Fashion-MNIST, and CIFAR-2. However, the TM struggles with getting state-of-the-art performance on CIFAR-10 and CIFAR-100, representing more complex tasks. This paper introduces plug-and-play collaboration between specialized TMs, referred to as TM Composites. The collaboration relies on a TM's ability to specialize during learning and to assess its competence during inference. When teaming up, the most confident TMs make the decisions, relieving the uncertain ones. In this manner, a TM Composite becomes more competent than its members, benefiting from their specializations. The collaboration is plug-and-play in that members can be combined in any way, at any time, without fine-tuning. We implement three TM specializations in our empirical evaluation: Histogram of Gradients, Adaptive Gaussian Thresholding, and Color Thermometers. The resulting TM Composite increases accuracy on Fashion-MNIST by two percentage points, CIFAR-10 by twelve points, and CIFAR-100 by nine points, yielding new state-of-the-art results for TMs. Overall, we envision that TM Composites will enable an ultra-low energy and transparent alternative to state-of-the-art deep learning on more tasks and datasets.
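Summary
Tsetlin Machines (TMs) represent a fundamental shift from arithmetic-based to logic-based machine learning. They deal successfully with image classification datasets such as MNIST, Fashion-MNIST, and CIFAR-2, but perform less well on the more complex CIFAR-10 and CIFAR-100 tasks. This paper introduces collaboration between specialized TMs, referred to as TM Composites. The collaboration relies on a TM's ability to specialize during learning and to assess its competence during inference. When the TMs team up, the most confident ones make the decisions, relieving the uncertain ones. A TM Composite thus becomes more competent than its members, benefiting from their specializations. The collaboration is plug-and-play: members can be combined in any way, at any time, without fine-tuning. We implement three TM specializations in our experiments: Histogram of Gradients, Adaptive Gaussian Thresholding, and Color Thermometers. Their combination increases accuracy on Fashion-MNIST by two percentage points, on CIFAR-10 by twelve points, and on CIFAR-100 by nine points, yielding new state-of-the-art results. Overall, we expect TM Composites to offer a low-energy and transparent alternative on more tasks and datasets.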
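One plausible way to let "the most confident TMs make the decisions" is to put every member's class scores on a common scale and add them, so a member with a large spread between classes outweighs one whose scores are nearly tied. The abstract does not specify the combination rule, so the sketch below is an assumption: the per-member scale (for example, the largest absolute class score seen on calibration data) and all numbers are hypothetical.

```python
def composite_predict(member_scores, member_scales):
    """Sum scale-normalized class scores across members and return the best class.

    member_scores: one list of per-class scores per member, for a single input.
    member_scales: one positive normalization constant per member (assumed known).
    """
    num_classes = len(member_scores[0])
    combined = [0.0] * num_classes
    for scores, scale in zip(member_scores, member_scales):
        for cls, score in enumerate(scores):
            combined[cls] += score / scale
    return max(range(num_classes), key=combined.__getitem__)


# Member A is nearly undecided; member B strongly favors class 1.
uncertain_member = [30, 20, 25]
confident_member = [-40, 45, -30]

print(composite_predict([uncertain_member], [100]))
print(composite_predict([uncertain_member, confident_member], [100, 50]))
```

No retraining is involved in adding the second member; only its scores and scale enter the sum, which is in keeping with the plug-and-play property described above.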
A Fast Algorithm for Moderating Critical Nodes via Edge Removal
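results: Experiments show that the proposed algorithms are effective and efficient across various settings and reduce the computational cost of moderation in networks.
Abstract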
Critical nodes in networks are extremely vulnerable to malicious attacks to trigger negative cascading events such as the spread of misinformation and diseases. Therefore, effective moderation of critical nodes is very vital for mitigating the potential damages caused by such malicious diffusions. The current moderation methods are computationally expensive. Furthermore, they disregard the fundamental metric of information centrality, which measures the dissemination power of nodes. We investigate the problem of removing $k$ edges from a network to minimize the information centrality of a target node $\lea$ while preserving the network's connectivity. We prove that this problem is computationally challenging: it is NP-complete and its objective function is not supermodular. However, we propose three approximation greedy algorithms using novel techniques such as random walk-based Schur complement approximation and fast sum estimation. One of our algorithms runs in nearly linear time in the number of edges. To complement our theoretical analysis, we conduct a comprehensive set of experiments on synthetic and real networks with over one million nodes. Across various settings, the experimental results illustrate the effectiveness and efficiency of our proposed algorithms.
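Summary
Critical nodes in networks are highly vulnerable to malicious attacks that trigger negative cascading events, such as the spread of misinformation and diseases. Effective moderation of critical nodes is therefore important for mitigating the potential damage of such attacks. Existing moderation methods are computationally expensive and disregard information centrality, a fundamental metric of a node's dissemination power. We study the problem of removing $k$ edges from a network to minimize the information centrality of a target node $\lea$ while preserving the network's connectivity. We prove that the problem is computationally challenging: it is NP-complete and its objective function is not supermodular. Nevertheless, we propose three approximation algorithms that use novel techniques such as random walk-based Schur complement approximation and fast sum estimation. One of the algorithms runs in nearly linear time in the number of edges. We also run extensive experiments on real and synthetic networks with over one million nodes. Across various settings, the results show the effectiveness and efficiency of the proposed algorithms.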
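To make the optimization problem concrete, here is a naive exhaustive greedy baseline for small graphs. It is not one of the paper's algorithms, which rely on Schur complement approximation and fast sum estimation to reach nearly linear time. The sketch assumes the common resistance-distance formulation in which the information centrality of a node equals n divided by the sum of effective resistances between that node and all others. That sum is computed as the trace of the inverse of the Laplacian with the target's row and column removed. All function names are hypothetical.

```python
from collections import deque
from fractions import Fraction


def is_connected(n, edges):
    """Breadth-first search from node 0 over an undirected edge list."""
    adj = {i: [] for i in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, queue = {0}, deque([0])
    while queue:
        for w in adj[queue.popleft()]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == n


def information_centrality(n, edges, target):
    """n / (sum of effective resistances to target); graph must be connected."""
    keep = [i for i in range(n) if i != target]
    pos = {node: i for i, node in enumerate(keep)}
    m = n - 1
    # Augmented matrix [grounded Laplacian | identity], exact arithmetic.
    a = [[Fraction(0)] * (2 * m) for _ in range(m)]
    for i in range(m):
        a[i][m + i] = Fraction(1)
    for u, v in edges:
        for x, y in ((u, v), (v, u)):
            if x != target:
                a[pos[x]][pos[x]] += 1
                if y != target:
                    a[pos[x]][pos[y]] -= 1
    # Gauss-Jordan elimination turns the right half into the inverse.
    for col in range(m):
        pivot = next(r for r in range(col, m) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [x / pv for x in a[col]]
        for r in range(m):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    total_resistance = sum(a[i][m + i] for i in range(m))
    return Fraction(n) / total_resistance


def greedy_edge_removal(n, edges, target, k):
    """Remove up to k edges, each time the one that lowers centrality most."""
    edges, removed = list(edges), []
    for _ in range(k):
        best = None
        for e in edges:
            rest = [x for x in edges if x != e]
            if not is_connected(n, rest):
                continue  # removal must preserve connectivity
            score = information_centrality(n, rest, target)
            if best is None or score < best[0]:
                best = (score, e)
        if best is None:
            break
        edges.remove(best[1])
        removed.append(best[1])
    return removed, information_centrality(n, edges, target)


triangle = [(0, 1), (1, 2), (0, 2)]
print(information_centrality(3, triangle, 0))
print(greedy_edge_removal(3, triangle, 0, 1))
```

On the triangle, the target's centrality starts at 9/4, and removing either edge incident to the target turns the graph into a path with the target at one end, lowering its centrality to 1.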
A Full-fledged Commit Message Quality Checker Based on Machine Learning
paper_authors: David Faragó, Michael Färber, Christian Petrov
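for: This paper studies how to improve the quality of commit messages in version control, to better support software maintenance and evolution.
methods: The paper uses machine learning methods to assess commit message quality, including semantics and context.
results: The approach assesses commit message quality accurately, with a lowest F$_1$ score of 82.9%, which indicates that machine learning can measure commit message quality well.
Abstract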
Commit messages (CMs) are an essential part of version control. By providing important context in regard to what has changed and why, they strongly support software maintenance and evolution. But writing good CMs is difficult and often neglected by developers. So far, there is no tool suitable for practice that automatically assesses how well a CM is written, including its meaning and context. Since this task is challenging, we ask the research question: how well can the CM quality, including semantics and context, be measured with machine learning methods? By considering all rules from the most popular CM quality guideline, creating datasets for those rules, and training and evaluating state-of-the-art machine learning models to check those rules, we can answer the research question with: sufficiently well for practice, with the lowest F$_1$ score of 82.9\%, for the most challenging task. We develop a full-fledged open-source framework that checks all these CM quality rules. It is useful for research, e.g., automatic CM generation, but most importantly for software practitioners to raise the quality of CMs and thus the maintainability and evolution speed of their software.
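Summary
Commit messages (CMs) are an essential part of version control. They provide important context about what changed and why, and so strongly support software maintenance and evolution. However, writing good CMs is difficult and often neglected by developers. So far, no tool suitable for practice automatically assesses CM quality, including semantics and context. Because this task is challenging, we ask the research question: how well can CM quality, including semantics and context, be measured with machine learning methods? We consider all rules from the most popular CM quality guideline, create datasets for those rules, and train and evaluate state-of-the-art machine learning models to check them. Our answer is that machine learning measures CM quality well, with a lowest F$_1$ score of 82.9% on the most challenging task. We develop a free, open-source framework that checks all of these CM quality rules. It is useful for research, for example automatic CM generation, but most importantly it helps software practitioners raise the quality of their CMs and thus the maintainability and evolution speed of their software.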
Towards Robust Model Watermark via Reducing Parametric Vulnerability
results: Proposes a mini-max formulation that improves the robustness of model watermarks against parametric changes and numerous watermark-removal attacks.
Abstract
Deep neural networks are valuable assets considering their commercial benefits and huge demands for costly annotation and computation resources. To protect the copyright of DNNs, backdoor-based ownership verification has recently become popular, in which the model owner can watermark the model by embedding a specific backdoor behavior before releasing it. The defenders (usually the model owners) can identify whether a suspicious third-party model is ``stolen'' from them based on the presence of the behavior. Unfortunately, these watermarks are proven to be vulnerable to removal attacks, even ones as simple as fine-tuning. To further explore this vulnerability, we investigate the parameter space and find there exist many watermark-removed models in the vicinity of the watermarked one, which may be easily used by removal attacks. Inspired by this finding, we propose a mini-max formulation to find these watermark-removed models and recover their watermark behavior. Extensive experiments demonstrate that our method improves the robustness of the model watermarking against parametric changes and numerous watermark-removal attacks. The codes for reproducing our main experiments are available at \url{https://github.com/GuanhaoGan/robust-model-watermarking}.
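Summary
Deep neural networks are commercially valuable assets that also require large amounts of costly annotation and computation. To protect the copyright of DNNs, ownership verification that uses a specific backdoor behavior as a watermark has come into use in recent years. However, these watermarks have been shown to be vulnerable to removal attacks, even fine-tuning. To explore this vulnerability further, we investigate the parameter space and find that many watermark-removed models exist in the vicinity of the watermarked one, and these may be exploited by removal attacks. Inspired by this finding, we propose a mini-max formulation to find these watermark-removed models and recover their watermark behavior. Our method improves the robustness of model watermarking against parametric changes and numerous watermark-removal attacks. The codes for reproducing our main experiments are available at \url{https://github.com/GuanhaoGan/robust-model-watermarking}.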
SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning
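results: The study finds that many foundation models behave inconsistently, for example when given paraphrased instructions, and that they exhibit positional bias and majority label bias. In addition, many models respond inconsistently to questions rooted in factual, scientific, and commonsense knowledge.
Abstract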
We present SeaEval, a benchmark for multilingual foundation models. In addition to characterizing how these models understand and reason with natural language, we also investigate how well they comprehend cultural practices, nuances, and values. Alongside standard accuracy metrics, we investigate the brittleness of foundation models in the dimensions of semantics and multilinguality. Our analyses span both open-sourced and closed models, leading to empirical results across classic NLP tasks, reasoning, and cultural comprehension. Key findings indicate (1) Most models exhibit varied behavior when given paraphrased instructions. (2) Many models still suffer from exposure bias (e.g., positional bias, majority label bias). (3) For questions rooted in factual, scientific, and commonsense knowledge, consistent responses are expected across multilingual queries that are semantically equivalent. Yet, most models surprisingly demonstrate inconsistent performance on these queries. (4) Multilingually-trained models have not attained "balanced multilingual" capabilities. Our endeavors underscore the need for more generalizable semantic representations and enhanced multilingual contextualization. SeaEval can serve as a launchpad for more thorough investigations and evaluations for multilingual and multicultural scenarios.
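Summary
We present SeaEval, a benchmark for multilingual foundation models. Beyond characterizing how these models understand and reason with natural language, we also investigate how well they comprehend cultural practices, nuances, and values. Alongside standard accuracy metrics, we examine the brittleness of foundation models in the dimensions of semantics and multilinguality. Our analyses cover both open-source and closed models and yield empirical results across classic NLP tasks, reasoning, and cultural comprehension. Key findings are: (1) Most models behave differently when given paraphrased instructions. (2) Many models still suffer from exposure bias (e.g., positional bias, majority label bias). (3) For questions rooted in factual, scientific, and commonsense knowledge, semantically equivalent multilingual queries are expected to receive consistent answers, yet most models respond inconsistently to them. (4) Multilingually-trained models have not yet attained "balanced multilingual" capabilities. Our work underscores the need for more generalizable semantic representations and better multilingual contextualization. SeaEval can serve as a starting point for evaluation and research in multilingual and multicultural scenarios.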
AudRandAug: Random Image Augmentations for Audio Classification
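methods: The paper uses a search-space-based random data augmentation method (AudRandAug) that randomly selects data augmentation techniques from a dedicated audio search space.
results: In our experiments, AudRandAug achieves higher accuracy than other existing data augmentation methods.
Abstract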
Data augmentation has proven to be effective in training neural networks. Recently, a method called RandAug was proposed, randomly selecting data augmentation techniques from a predefined search space. RandAug has demonstrated significant performance improvements for image-related tasks while imposing minimal computational overhead. However, no prior research has explored the application of RandAug specifically for audio data augmentation, which converts audio into an image-like pattern. To address this gap, we introduce AudRandAug, an adaptation of RandAug for audio data. AudRandAug selects data augmentation policies from a dedicated audio search space. To evaluate the effectiveness of AudRandAug, we conducted experiments using various models and datasets. Our findings indicate that AudRandAug outperforms other existing data augmentation methods regarding accuracy performance.
Summary
Data augmentation has proven effective in training neural networks. Recently, a method called RandAug was proposed, which randomly selects data augmentation policies from a predefined search space. RandAug has shown significant performance improvements on image-related tasks at very low computational cost. However, no prior research has explored applying RandAug specifically to audio data augmentation, which converts audio into an image-like pattern. To fill this gap, we introduce AudRandAug, an adaptation of RandAug for audio data. AudRandAug selects data augmentation policies from a dedicated audio search space. To evaluate AudRandAug, we ran experiments with various models and datasets. Our findings indicate that AudRandAug outperforms other existing data augmentation methods in accuracy.
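The random selection from an audio search space can be sketched as follows. This is a minimal illustration assuming a spectrogram stored as a 2-D NumPy array; the three operations (`time_mask`, `freq_mask`, `gain`) and the single shared magnitude are hypothetical stand-ins, since the summary does not list the paper's actual search space.

```python
import random
import numpy as np

def time_mask(spec, magnitude, rng):
    """Zero out a random block of time frames (columns)."""
    out = spec.copy()
    width = int(round(magnitude * spec.shape[1]))
    start = rng.randint(0, spec.shape[1] - width)
    out[:, start:start + width] = 0.0
    return out

def freq_mask(spec, magnitude, rng):
    """Zero out a random block of frequency bins (rows)."""
    out = spec.copy()
    height = int(round(magnitude * spec.shape[0]))
    start = rng.randint(0, spec.shape[0] - height)
    out[start:start + height, :] = 0.0
    return out

def gain(spec, magnitude, rng):
    """Scale all values up by a factor of (1 + magnitude)."""
    return spec * (1.0 + magnitude)

# Hypothetical audio search space; not the one used in the paper.
AUDIO_OPS = [time_mask, freq_mask, gain]

def rand_augment_audio(spec, n_ops=2, magnitude=0.2, seed=None):
    """Apply n_ops operations drawn uniformly at random, in sequence."""
    rng = random.Random(seed)
    out = spec
    for op in rng.choices(AUDIO_OPS, k=n_ops):
        out = op(out, magnitude, rng)
    return out

spectrogram = np.ones((8, 10))
augmented = rand_augment_audio(spectrogram, n_ops=2, magnitude=0.2, seed=0)
```

Keeping only two knobs (number of operations and magnitude, with magnitude between 0 and 1) is what keeps the overhead of this style of augmentation low: there is no policy search, only sampling.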
RR-CP: Reliable-Region-Based Conformal Prediction for Trustworthy Medical Image Classification
results: Experiments on five public datasets show that RR-CP achieves the user-specified error rate (e.g., 0.5%) with a relatively small prediction set and with higher reliability.
Abstract
Conformal prediction (CP) generates a set of predictions for a given test sample such that the prediction set almost always contains the true label (e.g., 99.5\% of the time). CP provides comprehensive predictions on possible labels of a given test sample, and the size of the set indicates how certain the predictions are (e.g., a set larger than one is `uncertain'). Such distinct properties of CP enable effective collaborations between human experts and medical AI models, allowing efficient intervention and quality check in clinical decision-making. In this paper, we propose a new method called Reliable-Region-Based Conformal Prediction (RR-CP), which aims to impose a stronger statistical guarantee so that the user-specified error rate (e.g., 0.5\%) can be achieved in the test time, and under this constraint, the size of the prediction set is optimized (to be small). We consider a small prediction set size an important measure only when the user-specified error rate is achieved. Experiments on five public datasets show that our RR-CP performs well: with a reasonably small-sized prediction set, it achieves the user-specified error rate (e.g., 0.5\%) significantly more frequently than existing CP methods.
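Summary
Conformal prediction (CP) generates a set of predictions for a given test sample such that the set almost always contains the true label (e.g., 99.5% of the time). CP provides comprehensive predictions of the possible labels of a test sample, and the size of the set indicates how certain the predictions are (e.g., a set larger than one is "uncertain"). These distinct properties make collaboration between human experts and medical AI models more effective, allowing efficient intervention and quality checks in clinical decision-making. In this paper, we propose a new method called Reliable-Region-Based Conformal Prediction (RR-CP), which aims to achieve the user-specified error rate (e.g., 0.5%) at test time and, under this constraint, to optimize the size of the prediction set. We consider a small prediction set an important measure only when the user-specified error rate is achieved. In experiments on five public datasets, RR-CP performs well: with a reasonably small prediction set, it achieves the user-specified error rate (e.g., 0.5%) significantly more frequently than existing CP methods.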
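For readers unfamiliar with prediction sets, the following sketch shows standard split conformal prediction, the baseline idea that RR-CP builds on. It is not the RR-CP method itself. It assumes softmax-like class probabilities and uses one minus the probability of the true class as the nonconformity score; the function names and the toy numbers are illustrative.

```python
import math
import numpy as np

def calibrate_threshold(cal_probs, cal_labels, alpha):
    """Conformal quantile of the scores 1 - p(true label) on calibration data."""
    cal_probs = np.asarray(cal_probs, dtype=float)
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    # Finite-sample corrected rank for target error rate alpha.
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points: include every class
    return float(np.sort(scores)[rank - 1])

def prediction_set(probs, threshold):
    """All classes whose nonconformity score is within the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]

cal_probs = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.0], [0.5, 0.3, 0.2], [0.3, 0.3, 0.4]]
cal_labels = [0, 1, 0, 2]
threshold = calibrate_threshold(cal_probs, cal_labels, alpha=0.25)
uncertain_set = prediction_set([0.55, 0.42, 0.03], threshold)
confident_set = prediction_set([0.95, 0.03, 0.02], threshold)
```

The ambiguous test sample receives a two-label set while the clear-cut one receives a single label, which is the sense in which set size signals certainty.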
Towards Real-time Training of Physics-informed Neural Networks: Applications in Ultrafast Ultrasound Blood Flow Imaging
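results: On finite-element simulations and \emph{in vitro} phantoms of single-branch and trifurcate blood vessels, both SeqPINN and SP-PINN recover blood flow velocities quickly, with RMSEs of 1.01 cm/s and 1.26 cm/s, respectively, on the straight vessel and 1.91 cm/s and 2.56 cm/s on the trifurcate vessel.
Abstract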
Physics-informed Neural Network (PINN) is one of the most preeminent solvers of Navier-Stokes equations, which are widely used as the governing equation of blood flow. However, current approaches, relying on full Navier-Stokes equations, are impractical for ultrafast Doppler ultrasound, the state-of-the-art technique for depicting complex blood flow dynamics \emph{in vivo} by acquiring thousands of frames (or timestamps) per second. In this article, we first propose a novel training framework of PINN for solving Navier-Stokes equations by discretizing Navier-Stokes equations into steady state and sequentially solving steady-state Navier-Stokes equations with transfer learning. The novel training framework is coined as SeqPINN. Upon the success of SeqPINN, we adopt the idea of averaged constant stochastic gradient descent (SGD) as initialization and propose a parallel training scheme for all timestamps. To ensure an initialization that generalizes well, we borrow the concept of Stochastic Weight Averaging Gaussian to perform uncertainty estimation as an indicator of generalizability of the initialization. This algorithm, named SP-PINN, further expedites training of PINN while achieving comparable accuracy with SeqPINN. Finite-element simulations and \emph{in vitro} phantoms of single-branch and trifurcate blood vessels are used to evaluate the performance of SeqPINN and SP-PINN. Results show that both SeqPINN and SP-PINN are manyfold faster than the original design of PINN, while respectively achieving Root Mean Square Errors (RMSEs) of 1.01 cm/s and 1.26 cm/s on the straight vessel and 1.91 cm/s and 2.56 cm/s on the trifurcate blood vessel when recovering blood flow velocities.
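Summary
The Physics-informed Neural Network (PINN) is one of the most prominent solvers of the Navier-Stokes equations, which are widely used to model blood flow. However, existing approaches that rely on the full Navier-Stokes equations are impractical for ultrafast Doppler ultrasound. In this article, we first propose a new training framework, SeqPINN, for solving the Navier-Stokes equations. We discretize the Navier-Stokes equations into steady states and solve the steady-state equations sequentially with transfer learning. Building on SeqPINN, we adopt averaged constant stochastic gradient descent (SGD) as initialization and propose a parallel training scheme. To ensure an initialization that generalizes well, we borrow Stochastic Weight Averaging Gaussian (SWAG) to perform uncertainty estimation; the resulting algorithm is named SP-PINN. SP-PINN trains PINN even faster while achieving accuracy comparable to SeqPINN. We evaluate SeqPINN and SP-PINN on finite-element simulations and \emph{in vitro} phantoms of single-branch and trifurcate blood vessels. Results show that both are much faster than the original PINN, with Root Mean Square Errors (RMSEs) of 1.01 cm/s and 1.26 cm/s, respectively, on the straight vessel and 1.91 cm/s and 2.56 cm/s on the trifurcate vessel.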
A Spatiotemporal Deep Neural Network for Fine-Grained Multi-Horizon Wind Prediction
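results: Evaluation shows that MHSTN outperforms competitors by a significant margin, and it has already been integrated into the scheduling platform of one of the busiest international airports in China.
Abstract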
The prediction of wind in terms of both wind speed and direction, which has a crucial impact on many real-world applications like aviation and wind power generation, is extremely challenging due to the high stochasticity and complicated correlation in the weather data. Existing methods typically focus on a subset of influential factors and thus lack a systematic treatment of the problem. In addition, fine-grained forecasting is essential for efficient industry operations, but has received less attention in the literature. In this work, we propose a novel data-driven model, Multi-Horizon SpatioTemporal Network (MHSTN), for accurate and efficient fine-grained wind prediction. MHSTN integrates multiple deep neural networks targeting different factors in a sequence-to-sequence (Seq2Seq) backbone to effectively extract features from various data sources and produce multi-horizon predictions for all sites within a given region. MHSTN is composed of four major modules. First, a temporal module fuses coarse-grained forecasts derived by Numerical Weather Prediction (NWP) and historical on-site observation data at stations so as to leverage both global and local atmospheric information. Second, a spatial module exploits spatial correlation by modeling the joint representation of all stations. Third, an ensemble module weighs the above two modules for final predictions. Furthermore, a covariate selection module automatically chooses influential meteorological variables as initial input. MHSTN is already integrated into the scheduling platform of one of the busiest international airports of China. The evaluation results demonstrate that our model outperforms competitors by a significant margin.
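Summary
Predicting wind, in terms of both speed and direction, is crucial for many real-world applications such as aviation and wind power generation, and it is extremely challenging because of the high stochasticity and complicated correlations in weather data. Existing methods typically focus on a subset of influential factors and thus lack a systematic treatment of the problem. In addition, fine-grained forecasting is key to efficient industry operations but has received less attention in the literature. In this work, we propose a new data-driven model, the Multi-Horizon SpatioTemporal Network (MHSTN), for accurate and efficient fine-grained wind prediction. MHSTN consists of four major modules. First, a temporal module fuses Numerical Weather Prediction (NWP) forecasts with historical on-site observations to leverage both global and local atmospheric information. Second, a spatial module exploits spatial correlation by modeling the joint representation of all stations. Third, an ensemble module weighs the two modules above to produce the final predictions. Finally, a covariate selection module automatically selects influential meteorological variables as input. MHSTN has already been integrated into the scheduling platform of one of the busiest international airports in China. Evaluation results show that our model outperforms competitors by a significant margin.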
TCGAN: Convolutional Generative Adversarial Network for Time Series Classification and Clustering
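results: Extensive experiments on synthetic and real-world data show that TCGAN is faster and more accurate than existing time-series GANs. The learned representations improve the performance of simple classification and clustering methods and remain effective with few-labeled and imbalanced-labeled data.
Abstract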
Recent works have demonstrated the superiority of supervised Convolutional Neural Networks (CNNs) in learning hierarchical representations from time series data for successful classification. These methods require sufficiently large labeled data for stable learning, however acquiring high-quality labeled time series data can be costly and potentially infeasible. Generative Adversarial Networks (GANs) have achieved great success in enhancing unsupervised and semi-supervised learning. Nonetheless, to our best knowledge, it remains unclear how effectively GANs can serve as a general-purpose solution to learn representations for time series recognition, i.e., classification and clustering. The above considerations inspire us to introduce a Time-series Convolutional GAN (TCGAN). TCGAN learns by playing an adversarial game between two one-dimensional CNNs (i.e., a generator and a discriminator) in the absence of label information. Parts of the trained TCGAN are then reused to construct a representation encoder to empower linear recognition methods. We conducted comprehensive experiments on synthetic and real-world datasets. The results demonstrate that TCGAN is faster and more accurate than existing time-series GANs. The learned representations enable simple classification and clustering methods to achieve superior and stable performance. Furthermore, TCGAN retains high efficacy in scenarios with few-labeled and imbalanced-labeled data. Our work provides a promising path to effectively utilize abundant unlabeled time series data.
Summary
Recent research has shown that supervised Convolutional Neural Networks (CNNs) can learn hierarchical representations from time series data for successful classification. However, acquiring high-quality labeled time series data can be costly and potentially infeasible. Generative Adversarial Networks (GANs) have achieved great success in enhancing unsupervised and semi-supervised learning. Nonetheless, it remains unclear how effectively GANs can serve as a general-purpose solution for time series recognition, i.e., classification and clustering. Our answer is a Time-series Convolutional GAN (TCGAN). TCGAN learns through an adversarial game between two one-dimensional CNNs (a generator and a discriminator) without label information. Parts of the trained TCGAN are then reused to construct a representation encoder that empowers linear recognition methods. We ran extensive experiments on synthetic and real-world datasets. The results show that TCGAN is faster and more accurate than existing time-series GANs. The learned representations enable simple classification and clustering methods to achieve superior and stable performance. TCGAN also remains effective with few-labeled and imbalanced-labeled data. Our work provides a promising path to making full use of abundant unlabeled time series data.
Transitions in echo index and dependence on input repetitions
results: The echo index is found to depend on the forcing amplitude and, in the intermediate-amplitude regime, also on more subtle properties of the input.
Abstract
The echo index counts the number of simultaneously stable asymptotic responses of a nonautonomous (i.e. input-driven) dynamical system. It generalizes the well-known echo state property for recurrent neural networks - this corresponds to the echo index being equal to one. In this paper, we investigate how the echo index depends on parameters that govern typical responses to a finite-state ergodic external input that forces the dynamics. We consider the echo index for a nonautonomous system that switches between a finite set of maps, where we assume that each map possesses a finite set of hyperbolic equilibrium attractors. We find the minimum and maximum repetitions of each map are crucial for the resulting echo index. Casting our theoretical findings in the RNN computing framework, we obtain that for small amplitude forcing the echo index corresponds to the number of attractors for the input-free system, while for large amplitude forcing, the echo index reduces to one. The intermediate regime is the most interesting; in this region the echo index depends not just on the amplitude of forcing but also on more subtle properties of the input.
SHAPE: A Sample-adaptive Hierarchical Prediction Network for Medication Recommendation
results: Experiments on a benchmark dataset show that SHAPE achieves higher accuracy and better generalization than several existing baselines.
Abstract
Effective medication recommendation under complex multimorbidity conditions is a critical task in healthcare. Most existing works predict medications based on longitudinal records, assuming that the information transmission patterns of longitudinal sequence data are stable and that intra-visit medical events are serialized. However, the following conditions may have been ignored: 1) A more compact encoder for the relationships among intra-visit medical events is urgently needed; 2) Strategies for learning accurate representations of the variable-length longitudinal sequences of patients differ. In this paper, we propose a novel Sample-adaptive Hierarchical medicAtion Prediction nEtwork, termed SHAPE, to tackle the above challenges in the medication recommendation task. Specifically, we design a compact intra-visit set encoder to encode the relationship in the medical event for obtaining visit-level representation and then develop an inter-visit longitudinal encoder to learn the patient-level longitudinal representation efficiently. To endow the model with the capability of modeling the variable visit length, we introduce a soft curriculum learning method to assign the difficulty of each sample automatically by the visit length. Extensive experiments on a benchmark dataset verify the superiority of our model compared with several state-of-the-art baselines.
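Summary
Medication recommendation is a critical task in healthcare. Most existing works predict medications from longitudinal records, assuming that the information transmission patterns in longitudinal sequence data are stable and that intra-visit medical events are serialized. However, the following may have been overlooked: 1) a more compact encoder is needed for the relationships among intra-visit medical events, in order to obtain visit-level representations; 2) learning accurate patient-level representations of variable-length longitudinal sequences requires different strategies. In this paper, we propose a new Sample-adaptive Hierarchical medicAtion Prediction nEtwork, termed SHAPE, to address these challenges. Specifically, we design a compact intra-visit set encoder to encode the relationships among medical events, and develop an inter-visit longitudinal encoder to learn patient-level longitudinal representations efficiently. To let the model handle variable visit lengths, we introduce a soft curriculum learning method that automatically assigns each sample's difficulty by its visit length. Extensive experiments on a benchmark dataset show that our model performs well compared with several state-of-the-art baselines.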
Toward Reproducing Network Research Results Using Large Language Models
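results: The paper reports observations and lessons from four students' reproduction efforts and raises open research questions for future work.
Abstract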
Reproducing research results in the networking community is important for both academia and industry. The current best practice typically resorts to three approaches: (1) looking for publicly available prototypes; (2) contacting the authors to get a private prototype; and (3) manually implementing a prototype following the description of the publication. However, most published network research does not have public prototypes and private prototypes are hard to get. As such, most reproducing efforts are spent on manual implementation based on the publications, which is both time and labor consuming and error-prone. In this paper, we boldly propose reproducing network research results using the emerging large language models (LLMs). In particular, we first prove its feasibility with a small-scale experiment, in which four students with essential networking knowledge each reproduces a different networking system published in prominent conferences and journals by prompt engineering ChatGPT. We report the experiment's observations and lessons and discuss future open research questions of this proposal. This work raises no ethical issue.
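Summary
Reproducing research results in the networking community is important for both academia and industry. Current best practice typically takes one of three approaches: (1) looking for publicly available prototypes; (2) contacting the authors to get a private prototype; and (3) manually implementing a prototype from the description in the publication. However, most published network research has no public prototype, and private prototypes are hard to get. As a result, most reproduction effort goes into manual implementation based on the publications, which is time-consuming, labor-intensive, and error-prone. In this paper, we boldly propose reproducing network research results using emerging large language models (LLMs). Specifically, we first demonstrate feasibility with a small-scale experiment in which four students each reproduce a different networking system, published in prominent conferences and journals, by prompt engineering ChatGPT. We report the experiment's observations and lessons and discuss open research questions for the future. This work raises no ethical issue.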
Jade: A Differentiable Physics Engine for Articulated Rigid Bodies with Intersection-Free Frictional Contact
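methods: The paper models contacts as a Linear Complementarity Problem (LCP), uses continuous collision detection to find the time of impact, and adopts a backtracking strategy to prevent intersection between bodies. It also derives gradients so that the whole simulation remains differentiable.
results: Extensive experiments show that the differentiable physics simulation is effective and stable across a variety of contact-rich tasks.
Abstract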
We present Jade, a differentiable physics engine for articulated rigid bodies. Jade models contacts as the Linear Complementarity Problem (LCP). Compared to existing differentiable simulations, Jade offers features including intersection-free collision simulation and stable LCP solutions for multiple frictional contacts. We use continuous collision detection to detect the time of impact and adopt the backtracking strategy to prevent intersection between bodies with complex geometry shapes. We derive the gradient calculation to ensure the whole simulation process is differentiable under the backtracking mechanism. We modify the popular Dantzig algorithm to get valid solutions under multiple frictional contacts. We conduct extensive experiments to demonstrate the effectiveness of our differentiable physics simulation over a variety of contact-rich tasks.
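Summary
We present Jade, a differentiable physics engine that models contacts as a Linear Complementarity Problem (LCP). Compared with existing differentiable simulations, Jade offers features including intersection-free collision simulation and stable LCP solutions for multiple frictional contacts. We use continuous collision detection to detect the time of impact and adopt a backtracking strategy to prevent intersection between bodies with complex geometry. We also derive the gradient calculation to ensure the whole simulation process is differentiable. We modify the popular Dantzig algorithm to obtain valid solutions under multiple frictional contacts. We conduct extensive experiments to demonstrate the effectiveness of our differentiable physics simulation on a variety of contact-rich tasks.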
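As background on the LCP formulation, the sketch below solves a small LCP: find $z \ge 0$ such that $w = Mz + q \ge 0$ and $z^\top w = 0$. It uses projected Gauss-Seidel, a simple iterative method swapped in here for brevity; it is not the modified Dantzig algorithm the paper describes, and it assumes a symmetric positive definite $M$ so that the iteration converges.

```python
import numpy as np

def solve_lcp_pgs(M, q, iterations=200):
    """Projected Gauss-Seidel for the LCP: z >= 0, Mz + q >= 0, z.(Mz + q) = 0."""
    M = np.asarray(M, dtype=float)
    q = np.asarray(q, dtype=float)
    z = np.zeros_like(q)
    for _ in range(iterations):
        for i in range(len(q)):
            residual = M[i] @ z + q[i]
            # Gauss-Seidel update for row i, then project onto z_i >= 0.
            z[i] = max(0.0, z[i] - residual / M[i, i])
    return z

M = np.array([[2.0, 1.0], [1.0, 2.0]])
z_active = solve_lcp_pgs(M, [-1.0, -1.0])    # both constraints active
z_inactive = solve_lcp_pgs(M, [1.0, 1.0])    # no contact force needed
w_active = M @ z_active + np.array([-1.0, -1.0])
```

In a contact setting, `z` plays the role of contact impulses and `w` of separation velocities: either a contact pushes (`z > 0`, `w = 0`) or it separates (`z = 0`, `w > 0`), never both.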
Advantage Actor-Critic with Reasoner: Explaining the Agent’s Behavior from an Exploratory Perspective
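methods: We propose Advantage Actor-Critic with Reasoner (A2CR), which can be easily applied to Actor-Critic-based RL models to make them interpretable. A2CR consists of three interconnected networks: the Policy Network, the Value Network, and the Reasoner Network. By predefining and classifying the underlying purpose of the actor's actions, A2CR automatically generates a more comprehensive and interpretable paradigm for understanding the decision-making process.
results: Evaluations in action-rich Super Mario Bros environments show that Reasoner-predicted label proportions decrease for "Breakout" and increase for "Hovering" as the exploration level of the RL algorithm intensifies, and that purpose-based saliencies are more focused and comprehensible.
Abstract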
Reinforcement learning (RL) is a powerful tool for solving complex decision-making problems, but its lack of transparency and interpretability has been a major challenge in domains where decisions have significant real-world consequences. In this paper, we propose a novel Advantage Actor-Critic with Reasoner (A2CR), which can be easily applied to Actor-Critic-based RL models and make them interpretable. A2CR consists of three interconnected networks: the Policy Network, the Value Network, and the Reasoner Network. By predefining and classifying the underlying purpose of the actor's actions, A2CR automatically generates a more comprehensive and interpretable paradigm for understanding the agent's decision-making process. It offers a range of functionalities such as purpose-based saliency, early failure detection, and model supervision, thereby promoting responsible and trustworthy RL. Evaluations conducted in action-rich Super Mario Bros environments yield intriguing findings: Reasoner-predicted label proportions decrease for ``Breakout" and increase for ``Hovering" as the exploration level of the RL algorithm intensifies. Additionally, purpose-based saliencies are more focused and comprehensible.
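Summary
Reinforcement learning (RL) is a powerful tool for solving complex decision-making problems, but its lack of transparency and interpretability is a major challenge in domains where decisions have significant real-world consequences. In this paper, we propose a new Advantage Actor-Critic with Reasoner (A2CR) model, which can be easily applied to Actor-Critic-based RL models and makes them more transparent. A2CR consists of three interconnected networks: the Policy Network, the Value Network, and the Reasoner Network. By predefining and classifying the purpose of the actor's actions, A2CR automatically generates a more comprehensive and interpretable model of the agent's decision-making process. It offers functionalities such as purpose-based saliency, early failure detection, and model supervision, thereby promoting responsible and trustworthy RL. Evaluations in action-rich Super Mario Bros environments show that, as the exploration level of the RL algorithm intensifies, Reasoner-predicted label proportions decrease for "Breakout" and increase for "Hovering", and that purpose-based saliencies are more focused and comprehensible.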
Analysis of Disinformation and Fake News Detection Using Fine-Tuned Large Language Model
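results: The study shows that the fine-tuned Llama 2 model can analyse texts in depth and reveal complex styles and narratives. Extracted sentiments for named entities can serve as predictive features in supervised machine learning models.
Abstract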
The paper considers the possibility of fine-tuning Llama 2 large language model (LLM) for the disinformation analysis and fake news detection. For fine-tuning, the PEFT/LoRA based approach was used. In the study, the model was fine-tuned for the following tasks: analysing a text on revealing disinformation and propaganda narratives, fact checking, fake news detection, manipulation analytics, extracting named entities with their sentiments. The obtained results show that the fine-tuned Llama 2 model can perform a deep analysis of texts and reveal complex styles and narratives. Extracted sentiments for named entities can be considered as predictive features in supervised machine learning models.
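Summary
The paper considers the possibility of using the Llama 2 large language model (LLM) for fake news detection and disinformation analysis. Fine-tuning used a PEFT/LoRA-based approach. The model was fine-tuned for the following tasks: analysing texts to reveal disinformation and propaganda narratives, fact checking, fake news detection, manipulation analytics, and extracting named entities with their sentiments. The results show that the fine-tuned Llama 2 model can analyse texts in depth and reveal complex styles and narratives. The extracted sentiments can serve as predictive features in supervised machine learning models.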
Advancements in Upper Body Exoskeleton: Implementing Active Gravity Compensation with a Feedforward Controller
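results: Tests on hardware and in simulation show stable performance: the system maintains its position over an extended period, with minimal friction and without undesired slewing.
Abstract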
In this study, we present a feedforward control system designed for active gravity compensation on an upper body exoskeleton. The system utilizes only positional data from internal motor sensors to calculate torque, employing analytical control equations based on Newton-Euler Inverse Dynamics. Compared to feedback control systems, the feedforward approach offers several advantages. It eliminates the need for external torque sensors, resulting in reduced hardware complexity and weight. Moreover, the feedforward control exhibits a more proactive response, leading to enhanced performance. The exoskeleton used in the experiments is lightweight and comprises 4 Degrees of Freedom, closely mimicking human upper body kinematics and three-dimensional range of motion. We conducted tests on both hardware and simulations of the exoskeleton, demonstrating stable performance. The system maintained its position over an extended period, exhibiting minimal friction and avoiding undesired slewing.
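Summary
In this study, we present a feedforward control system for active gravity compensation on an upper body exoskeleton. The system uses only positional data from internal motor sensors to calculate torque, with analytical control equations based on Newton-Euler Inverse Dynamics. Compared with feedback control systems, the feedforward approach has several advantages. It removes the need for external torque sensors, which reduces hardware complexity and weight. In addition, feedforward control responds more proactively, which improves performance. The exoskeleton we use is lightweight and has 4 Degrees of Freedom, closely mimicking human upper body kinematics and three-dimensional range of motion. We tested the exoskeleton both on hardware and in simulation and observed stable performance. The system maintained its position over an extended period, with minimal friction and without undesired slewing.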
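The idea of computing compensation torque from joint positions alone can be sketched on a much simpler system than the 4-DoF exoskeleton: a planar two-link arm with joint angles measured from the horizontal. The masses, lengths, and centre-of-mass offsets below are illustrative assumptions, and the static gravity terms shown are only the part of the inverse dynamics that matters when the arm holds still.

```python
import math

def gravity_torques(theta1, theta2, m1=1.0, m2=1.0, l1=1.0,
                    lc1=0.5, lc2=0.5, g=9.81):
    """Joint torques (N*m) that hold a planar 2-link arm still against gravity.

    theta1: shoulder angle from the horizontal (rad)
    theta2: elbow angle relative to link 1 (rad)
    lc1, lc2: distance from each joint to its link's centre of mass (m)
    """
    # The elbow carries only link 2.
    tau2 = m2 * g * lc2 * math.cos(theta1 + theta2)
    # The shoulder carries link 1 plus everything attached at its tip.
    tau1 = (m1 * lc1 + m2 * l1) * g * math.cos(theta1) + tau2
    return tau1, tau2

horizontal = gravity_torques(0.0, 0.0)          # arm stretched out sideways
vertical = gravity_torques(math.pi / 2, 0.0)    # arm pointing straight up
```

Because the torques depend only on the measured angles, a controller of this kind needs no torque sensor: it reads the encoders, evaluates the model, and commands the result directly.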
Code-Style In-Context Learning for Knowledge-Based Question Answering
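results: Experiments show that the code-style in-context learning method mitigates format errors when generating logical forms and achieves new state-of-the-art results on WebQSP, GrailQA, and GraphQ under the few-shot setting.
Abstract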
Current methods for Knowledge-Based Question Answering (KBQA) usually rely on complex training techniques and model frameworks, leading to many limitations in practical applications. Recently, the emergence of In-Context Learning (ICL) capabilities in Large Language Models (LLMs) provides a simple and training-free semantic parsing paradigm for KBQA: Given a small number of questions and their labeled logical forms as demo examples, LLMs can understand the task intent and generate the logic form for a new question. However, current powerful LLMs have little exposure to logic forms during pre-training, resulting in a high format error rate. To solve this problem, we propose a code-style in-context learning method for KBQA, which converts the generation process of unfamiliar logical form into the more familiar code generation process for LLMs. Experimental results on three mainstream datasets show that our method dramatically mitigated the formatting error problem in generating logic forms while realizing a new SOTA on WebQSP, GrailQA, and GraphQ under the few-shot setting.
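Summary
Existing methods for Knowledge-Based Question Answering (KBQA) usually rely on complex training techniques and model frameworks, which leads to many limitations in practical applications. Recently, the In-Context Learning (ICL) capabilities of Large Language Models (LLMs) have provided a simple, training-free semantic parsing paradigm for KBQA: given a small number of questions and their labeled logical forms as examples, LLMs can understand the task intent and generate the logical form for a new question. However, current powerful LLMs have had little exposure to logical forms during pre-training, which results in a high format error rate. To solve this problem, we propose a code-style in-context learning method for KBQA, which converts the generation of unfamiliar logical forms into the more familiar process of code generation. Experiments show that our method mitigates format errors when generating logical forms while achieving a new SOTA on WebQSP, GrailQA, and GraphQ under the few-shot setting.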
Flexible and Robust Counterfactual Explanations with Minimal Satisfiable Perturbations
for: The paper aims to provide more robust and flexible counterfactual explanations (CFEs) for enhancing informational fairness and trustworthiness in machine learning models.
methods: The proposed method, called Counterfactual Explanations with Minimal Satisfiable Perturbations (CEMSP), constrains changing values of abnormal features with their semantically meaningful normal ranges, and models the problem as a Boolean satisfiability problem to modify as few features as possible.
results: The proposed method provides more robust explanations while preserving flexibility, and is demonstrated to be more effective than existing methods through comprehensive experiments on both synthetic and real-world datasets.
Abstract
Counterfactual explanations (CFEs) exemplify how to minimally modify a feature vector to achieve a different prediction for an instance. CFEs can enhance informational fairness and trustworthiness, and provide suggestions for users who receive adverse predictions. However, recent research has shown that multiple CFEs can be offered for the same instance or for instances with slight differences. Multiple CFEs provide flexible choices and cover diverse desiderata for user selection. However, individual fairness and model reliability will be damaged if unstable CFEs with different costs are returned. Existing methods fail to exploit flexibility and address the concerns of non-robustness simultaneously. To address these issues, we propose a conceptually simple yet effective solution named Counterfactual Explanations with Minimal Satisfiable Perturbations (CEMSP). Specifically, CEMSP constrains changing values of abnormal features with the help of their semantically meaningful normal ranges. For efficiency, we model the problem as a Boolean satisfiability problem to modify as few features as possible. Additionally, CEMSP is a general framework and can easily accommodate more practical requirements, e.g., causality and actionability. We conduct comprehensive experiments on both synthetic and real-world datasets and demonstrate that, compared to existing methods, our method provides more robust explanations while preserving flexibility.
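Summary
Counterfactual explanations (CFEs) show how to minimally modify a feature vector so that an instance receives a different prediction. CFEs can improve informational fairness and trustworthiness and give suggestions to users who receive adverse predictions. However, multiple CFEs can be offered for the same instance or for slightly different instances. Multiple CFEs give users flexible choices, but individual fairness and model reliability suffer if unstable CFEs are returned. Existing methods cannot exploit flexibility and address non-robustness at the same time. To solve these problems, we propose a conceptually simple yet effective solution named Counterfactual Explanations with Minimal Satisfiable Perturbations (CEMSP). Specifically, CEMSP constrains the changes to abnormal features using their semantically meaningful normal ranges. For efficiency, we model the problem as a Boolean satisfiability problem so as to modify as few features as possible. In addition, CEMSP is a general framework that can easily accommodate further practical requirements, such as causality and actionability. Comprehensive experiments on both synthetic and real-world datasets show that, compared with existing methods, our method provides more robust explanations while preserving flexibility.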
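The search for minimal perturbations described in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the CEMSP algorithm itself: each abnormal feature gets one Boolean choice (keep its value, or replace it with a single representative value from its normal range), and a brute-force enumeration by increasing subset size stands in for a SAT-based search. The feature names, thresholds, and the toy `predict` function are all hypothetical.

```python
from itertools import combinations


def minimal_perturbations(x, normal_values, predict, target):
    """Return every minimal set of abnormal features whose replacement by a
    normal value yields the target prediction (no returned set contains another)."""
    abnormal = sorted(normal_values)
    found = []
    for size in range(1, len(abnormal) + 1):
        for subset in combinations(abnormal, size):
            # Skip supersets of an already-found solution: they are not minimal.
            if any(set(prev) <= set(subset) for prev in found):
                continue
            candidate = dict(x)
            for name in subset:
                candidate[name] = normal_values[name]
            if predict(candidate) == target:
                found.append(subset)
    return found


# Hypothetical toy classifier standing in for a trained model.
def predict(features):
    high_risk = features["glucose"] > 125 or (
        features["bmi"] > 30 and features["bp"] > 140
    )
    return "high" if high_risk else "low"


instance = {"bmi": 35, "glucose": 180, "bp": 150}
normal = {"bmi": 24, "glucose": 100, "bp": 120}  # one value per normal range
print(minimal_perturbations(instance, normal, predict, "low"))
# [('bmi', 'glucose'), ('bp', 'glucose')]
```

Returning all minimal sets rather than a single one reflects the flexibility the abstract mentions, while restricting new values to normal ranges limits how much the suggested changes can vary between similar instances.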
FIAT: Fusing learning paradigms with Instruction-Accelerated Tuning
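results: The study finds that FIAT performs better than both ICL and fine-tuning at scales ranging from 100 to 10,000 training examples. FIAT draws on both the very largest models and parameter-efficient tuning of a modestly-sized LLM to improve performance.
Abstract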
Learning paradigms for large language models (LLMs) currently tend to fall within either in-context learning (ICL) or full fine-tuning. Each comes with its own trade-offs in available data, model size, compute cost, ease of use, and final quality, and neither performs well across the board. In this article, we first describe the ICL and fine-tuning paradigms in a way that highlights their natural connections. Based on these connections, we propose a new learning paradigm called FIAT that fuses the best of these paradigms together, enabling prompt-engineered instructions and chain-of-thought reasoning with the very largest models while also using similar methods to perform parameter updates on a modestly-sized LLM with parameter-efficient tuning. We evaluate FIAT's effectiveness on a variety of multilingual tasks and observe that FIAT performs better than both ICL and fine-tuning at scales ranging from 100 to 10,000 training examples. We hope that FIAT provides a practical way of harnessing the full potential of LLMs without needing to make a hard choice between learning paradigms.
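Summary
Current learning paradigms for large language models (LLMs) fall mainly into two categories: in-context learning (ICL) and full fine-tuning. Each has its own trade-offs in data availability, model size, compute cost, ease of use, and final quality, and neither performs well across the board. In this article, we first describe the ICL and fine-tuning paradigms in a way that brings out their natural connections. Building on these connections, we propose a new learning paradigm called FIAT, which combines the strengths of ICL and fine-tuning: it enables prompt-engineered instructions and chain-of-thought reasoning with the very largest models, while using similar methods to update the parameters of a modestly-sized LLM through parameter-efficient tuning. We evaluate FIAT on a variety of multilingual tasks and find that it outperforms both ICL and fine-tuning with 100 to 10,000 training examples. We hope FIAT offers a practical way to harness the full potential of LLMs without having to choose between learning paradigms.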
Video and Synthetic MRI Pre-training of 3D Vision Architectures for Neuroimage Analysis
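paper_authors: Nikhil J. Dhinagar, Amit Singh, Saket Ozarkar, Ketaki Buwa, Sophia I. Thomopoulos, Conor Owens-Walton, Emily Laltoo, Yao-Liang Chen, Philip Cook, Corey McMillan, Chih-Chien Tsai, J-J Wang, Yih-Ru Wu, Paul M. Thompson
for: This paper evaluates different pre-training approaches for improving model performance on 3D medical imaging tasks.
methods: The authors benchmark vision transformers (ViTs) and convolutional neural networks (CNNs) initialized with varied pre-training approaches.
results: Pre-training improved performance across all tasks, including gains of 7.4% for AD classification and 4.6% for PD classification for the ViT, and a reduction of 1.26 years in brain age prediction error for CNNs. Pre-training on large-scale video or synthetic MRI data boosted ViT performance, CNNs were robust in limited-data settings, and pre-training improved generalization to out-of-distribution data.
Abstract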
Transfer learning represents a recent paradigm shift in the way we build artificial intelligence (AI) systems. In contrast to training task-specific models, transfer learning involves pre-training deep learning models on a large corpus of data and minimally fine-tuning them for adaptation to specific tasks. Even so, for 3D medical imaging tasks, we do not know if it is best to pre-train models on natural images, medical images, or even synthetically generated MRI scans or video data. To evaluate these alternatives, here we benchmarked vision transformers (ViTs) and convolutional neural networks (CNNs), initialized with varied upstream pre-training approaches. These methods were then adapted to three unique downstream neuroimaging tasks with a range of difficulty: Alzheimer's disease (AD) classification, Parkinson's disease (PD) classification, and "brain age" prediction. Experimental tests led to the following key observations: 1. Pre-training improved performance across all tasks, including a boost of 7.4% for AD classification and 4.6% for PD classification for the ViT, and a boost of 19.1% for PD classification and a reduction in brain age prediction error of 1.26 years for CNNs; 2. Pre-training on large-scale video or synthetic MRI data boosted the performance of ViTs; 3. CNNs were robust in limited-data settings, and in-domain pre-training enhanced their performance; 4. Pre-training improved generalization to out-of-distribution datasets and sites. Overall, we benchmarked different vision architectures, revealing the value of pre-training them with emerging datasets for model initialization. The resulting pre-trained models can be adapted to a range of downstream neuroimaging tasks, even when training data for the target task is limited.
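Summary
Transfer learning represents a recent shift in how artificial intelligence (AI) systems are built. Rather than training task-specific models, transfer learning pre-trains deep learning models on large amounts of data and then fine-tunes them for specific tasks. For 3D medical imaging tasks, however, it is not known whether it is best to pre-train on natural images, medical images, or synthetically generated MRI scans or video data. To evaluate these options, we benchmarked ViTs and convolutional neural networks (CNNs) initialized with different upstream pre-training approaches. The models were then adapted to three downstream neuroimaging tasks: Alzheimer's disease (AD) classification, Parkinson's disease (PD) classification, and "brain age" prediction. The experiments led to the following key observations: 1. Pre-training improved performance on all tasks, including a 7.4% gain in AD classification and a 4.6% gain in PD classification for ViTs, and a 19.1% gain in PD classification and a 1.26-year reduction in brain age prediction error for CNNs. 2. Pre-training on large-scale video or synthetic MRI data improved ViT performance. 3. CNNs were robust in limited-data settings, and in-domain pre-training improved their performance. 4. Pre-training improved generalization to out-of-distribution datasets and sites. Overall, we benchmarked different vision architectures and showed the value of pre-training them on emerging datasets. The resulting pre-trained models can be adapted to a range of downstream neuroimaging tasks, even when training data for the target task is limited.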
Efficient Finetuning Large Language Models For Vietnamese Chatbot
results: Evaluated with GPT-4 as an automated scoring mechanism, our method shows an improvement of about 20-30% on the evaluation tasks.
Abstract
Large language models (LLMs), such as GPT-4, PaLM, and LLaMa, have been shown to achieve remarkable performance across a variety of natural language tasks. Recent advancements in instruction tuning give LLMs the ability to follow users' instructions and produce human-like responses. However, the high costs associated with training and deploying LLMs pose challenges to academic research. Furthermore, the availability of pretrained LLMs and instruction-tuning datasets for the Vietnamese language is limited. To tackle these concerns, we leverage large-scale instruction-following datasets from open-source projects, namely Alpaca, GPT4All, and Chat-Doctor, which cover the general domain and the specific medical domain. To the best of our knowledge, these are the first instruction datasets for Vietnamese. Subsequently, we utilize parameter-efficient tuning through Low-Rank Adaptation (LoRA) on two open LLMs, Bloomz (Multilingual) and GPTJ-6B (Vietnamese), resulting in four models: Bloomz-Chat, Bloomz-Doctor, GPTJ-Chat, and GPTJ-Doctor. Finally, we assess the effectiveness of our methodology on a per-sample basis, taking into consideration the helpfulness, relevance, accuracy, and level of detail of the responses. This evaluation uses GPT-4 as an automated scoring mechanism. Despite utilizing a low-cost setup, our method demonstrates about 20-30% improvement over the original models in our evaluation tasks.
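Summary
Large language models (LLMs) such as GPT-4, PaLM, and LLaMa perform remarkably well across a wide range of natural language tasks. Recent instruction-tuning techniques allow LLMs to follow users' instructions and produce human-like responses. However, the high cost of training and deploying LLMs is a challenge for academic research. In addition, pretrained LLMs and instruction-tuning datasets for Vietnamese are scarce. To address these problems, we use large-scale instruction-following datasets from open-source projects, including Alpaca, GPT4All, and Chat-Doctor, which cover the general domain and the medical domain. To the best of our knowledge, these are the first instruction datasets for Vietnamese. We then apply parameter-efficient tuning with LoRA to two open LLMs, producing four models: Bloomz-Chat, Bloomz-Doctor, GPTJ-Chat, and GPTJ-Doctor. Finally, we evaluate each sample on the helpfulness, relevance, accuracy, and level of detail of the response, using GPT-4 as an automated scoring mechanism. Despite the low-cost setup, our method shows an improvement of about 20-30% on our evaluation tasks.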
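The parameter-efficient tuning step mentioned above relies on Low-Rank Adaptation (LoRA). As a rough illustration of the general technique, not of the authors' training code, the sketch below wraps a frozen weight matrix W with a trainable low-rank update so that the layer computes W x + (alpha / r) · B A x. The class name `LoRALinear`, the pure-Python matrix helpers, and the toy dimensions are assumptions chosen to keep the example dependency-free.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRALinear:
    """A frozen linear layer plus a trainable rank-r update B @ A."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # pretrained weights, kept frozen
        self.scale = alpha / rank
        # A starts with small random values and B with zeros, so the
        # adapted layer initially behaves exactly like the frozen one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_parameters(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Toy 4x4 identity weight adapted with rank 1.
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
layer = LoRALinear(identity, rank=1, alpha=2)
print(layer.forward([1, 2, 3, 4]))   # [1.0, 2.0, 3.0, 4.0]
print(layer.trainable_parameters())  # 8, versus 16 weights in the full matrix
```

Only A and B would receive gradient updates during tuning, which is why the number of trainable parameters, and hence the cost of adaptation, stays small relative to the full model.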