results: Based on performance metrics, the paper compares the VALERIE22 dataset with other publicly available datasets, showing that VALERIE22 is among the best-performing synthetic datasets currently available in the open domain.
Abstract
The VALERIE tool pipeline is a synthetic data generator developed with the goal to contribute to the understanding of domain-specific factors that influence perception performance of DNNs (deep neural networks). This work was carried out under the German research project KI Absicherung in order to develop a methodology for the validation of DNNs in the context of pedestrian detection in urban environments for automated driving. The VALERIE22 dataset was generated with the VALERIE procedural tools pipeline providing a photorealistic sensor simulation rendered from automatically synthesized scenes. The dataset provides a uniquely rich set of metadata, allowing extraction of specific scene and semantic features (like pixel-accurate occlusion rates, positions in the scene and distance + angle to the camera). This enables a multitude of possible tests on the data and we hope to stimulate research on understanding performance of DNNs. Based on performance metrics, a comparison with several other publicly available datasets is provided, demonstrating that VALERIE22 is one of the best-performing synthetic datasets currently available in the open domain.
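To give a feel for how such metadata can be used, here is a minimal sketch that selects pedestrian instances by occlusion rate and distance to the camera. The file layout and field names (`instances`, `occlusion_rate`, `distance_m`) are hypothetical placeholders, not the dataset's actual schema.

```python
import json
from pathlib import Path

# Hypothetical sketch: filter pedestrian instances using the kind of metadata
# VALERIE22 provides (pixel-accurate occlusion rates, distance to the camera).
# The directory layout and JSON field names below are assumptions.

def select_instances(metadata_dir, max_occlusion=0.25, max_distance_m=30.0):
    selected = []
    for meta_file in Path(metadata_dir).glob("*.json"):
        frame = json.loads(meta_file.read_text())
        for inst in frame.get("instances", []):
            if inst.get("class") != "pedestrian":
                continue
            if (inst["occlusion_rate"] <= max_occlusion
                    and inst["distance_m"] <= max_distance_m):
                selected.append((meta_file.stem, inst["id"]))
    return selected

# e.g. build a test slice of barely occluded, nearby pedestrians:
# easy_slice = select_instances("valerie22/metadata", 0.1, 15.0)
```

Slicing the dataset along one metadata dimension at a time is what makes the per-factor performance tests described above possible.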
Minimum Coverage Sets for Training Robust Ad Hoc Teamwork Agents
results: Our experiments show that L-BRDiv produces more robust AHT agents in a broader range of two-player cooperative problems without requiring extensive hyperparameter tuning for its objectives, and that it outperforms baseline methods by prioritizing the discovery of distinct MCS members rather than repeatedly finding redundant policies.
Abstract
Robustly cooperating with unseen agents and human partners presents significant challenges due to the diverse cooperative conventions these partners may adopt. Existing Ad Hoc Teamwork (AHT) methods address this challenge by training an agent with a population of diverse teammate policies obtained through maximizing specific diversity metrics. However, these heuristic diversity metrics do not always maximize the agent's robustness in all cooperative problems. In this work, we first propose that maximizing an AHT agent's robustness requires it to emulate policies in the minimum coverage set (MCS), the set of best-response policies to any partner policies in the environment. We then introduce the L-BRDiv algorithm that generates a set of teammate policies that, when used for AHT training, encourage agents to emulate policies from the MCS. L-BRDiv works by solving a constrained optimization problem to jointly train teammate policies for AHT training and approximating AHT agent policies that are members of the MCS. We empirically demonstrate that L-BRDiv produces more robust AHT agents than state-of-the-art methods in a broader range of two-player cooperative problems without the need for extensive hyperparameter tuning for its objectives. Our study shows that L-BRDiv outperforms the baseline methods by prioritizing discovering distinct members of the MCS instead of repeatedly finding redundant policies.
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
results: Through extensive experiments on the GSM8k and MATH mathematical reasoning benchmarks, WizardMath surpasses all existing open-source LLMs by a substantial margin, outperforms ChatGPT-3.5, Claude Instant-1, PaLM-2 and Minerva on GSM8k, and surpasses Text-davinci-002, PaLM-1 and GPT-3 on MATH.
Abstract
Large language models (LLMs), such as GPT-4, have shown remarkable performance in natural language processing (NLP) tasks, including challenging mathematical reasoning. However, most existing open-source models are only pre-trained on large-scale internet data without math-related optimization. In this paper, we present WizardMath, which enhances the mathematical reasoning abilities of Llama-2, by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. Through extensive experiments on two mathematical reasoning benchmarks, namely GSM8k and MATH, we reveal the extraordinary capabilities of our model. WizardMath surpasses all other open-source LLMs by a substantial margin. Furthermore, our model even outperforms ChatGPT-3.5, Claude Instant-1, PaLM-2 and Minerva on GSM8k, and simultaneously surpasses Text-davinci-002, PaLM-1 and GPT-3 on MATH. More details and model weights are public at https://github.com/nlpxucan/WizardLM and https://huggingface.co/WizardLM.
Investigating the Interplay between Features and Structures in Graph Learning
for: This paper aims to investigate the relationship between node features and target labels in deep graph networks, and to develop new metrics to measure the influence of node features on target labels.
methods: The paper uses two generative processes to build and study ad-hoc node classification tasks, and evaluates the performance of six models, including structure-agnostic ones.
results: The paper finds that previously defined metrics are not adequate when the assumption of a strong correlation between node features and target labels is relaxed, and presents novel research findings that could help advance our understanding of the field.
Abstract
In the past, the dichotomy between homophily and heterophily has inspired research contributions toward a better understanding of Deep Graph Networks' inductive bias. In particular, it was believed that homophily strongly correlates with better node classification predictions of message-passing methods. More recently, however, researchers pointed out that such dichotomy is too simplistic as we can construct node classification tasks where graphs are completely heterophilic but the performance remains high. Most of these works have also proposed new quantitative metrics to understand when a graph structure is useful, which implicitly or explicitly assume the correlation between node features and target labels. Our work empirically investigates what happens when this strong assumption does not hold, by formalising two generative processes for node classification tasks that allow us to build and study ad-hoc problems. To quantitatively measure the influence of the node features on the target labels, we also use a metric we call Feature Informativeness. We construct six synthetic tasks and evaluate the performance of six models, including structure-agnostic ones. Our findings reveal that previously defined metrics are not adequate when we relax the above assumption. Our contribution to the workshop aims at presenting novel research findings that could help advance our understanding of the field.
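The paper's exact definition of Feature Informativeness is not reproduced here, but a rough structure-agnostic proxy can be sketched by estimating the mutual information between node features and labels; treat this purely as an illustration of the idea.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Rough proxy for a "feature informativeness" score: how much do node
# features alone (ignoring graph structure) tell us about the labels?
def feature_informativeness(X, y):
    """X: (num_nodes, num_features) node features; y: (num_nodes,) labels."""
    mi = mutual_info_classif(X, y, random_state=0)  # per-feature MI estimate
    return float(mi.mean())

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
X_informative = y[:, None] + 0.1 * rng.normal(size=(500, 4))  # encodes y
X_noise = rng.normal(size=(500, 4))                           # ignores y
print(feature_informativeness(X_informative, y))  # high
print(feature_informativeness(X_noise, y))        # close to 0
```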
results: Experiments show that this incremental spectral clustering method accurately clusters large datasets while avoiding the complexity growth that comes with increasing data size.
Abstract
Our previous experiments demonstrated that subset collections of (short) documents (with several hundred entries) share a common, suitably normalized eigenvalue spectrum of the combinatorial Laplacian. Based on this insight, we propose a method of incremental spectral clustering. The method consists of the following steps: (1) split the data into manageable subsets, (2) cluster each of the subsets, (3) merge clusters from different subsets based on eigenvalue spectrum similarity to form clusters of the entire set. This method can be especially useful for clustering methods whose complexity increases strongly with the size of the data sample, as in the case of typical spectral clustering. Experiments show that clustering the subsets and merging them indeed yields clusters close to those obtained by clustering the entire dataset.
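As a minimal sketch of the three steps (split, cluster, merge by spectrum similarity), the following uses off-the-shelf scikit-learn/SciPy pieces; the merge threshold and spectrum normalization are illustrative choices, not the paper's.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

def cluster_spectrum(X, k_eigs=10):
    """Leading combinatorial-Laplacian eigenvalues of a cluster, normalized."""
    L = laplacian(rbf_kernel(X))
    vals = np.sort(np.linalg.eigvalsh(L))
    vals = np.pad(vals, (0, max(0, k_eigs - len(vals))), mode="edge")[:k_eigs]
    return vals / (np.abs(vals).max() + 1e-12)

def incremental_spectral_clustering(X, n_subsets=4, k=3, merge_tol=0.1):
    rng = np.random.default_rng(0)
    subsets = np.array_split(rng.permutation(len(X)), n_subsets)   # step (1)
    parts = []
    for idx in subsets:                                            # step (2)
        labels = SpectralClustering(n_clusters=k, affinity="rbf").fit_predict(X[idx])
        parts += [idx[labels == c] for c in range(k)]
    merged = []                                                    # step (3)
    for idx in parts:
        spec = cluster_spectrum(X[idx])
        for m in merged:
            if np.linalg.norm(spec - m["spec"]) < merge_tol:
                m["idx"] = np.concatenate([m["idx"], idx])
                break
        else:
            merged.append({"idx": idx, "spec": spec})
    return [m["idx"] for m in merged]
```

Each subset is clustered independently, so the expensive eigendecomposition only ever runs on subset-sized similarity matrices.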
Adapt Your Teacher: Improving Knowledge Distillation for Exemplar-free Continual Learning
results: The study finds that TA integrates seamlessly with KD-based CIL methods and provides consistent performance gains across multiple exemplar-free CIL benchmarks.
Abstract
In this work, we investigate exemplar-free class incremental learning (CIL) with knowledge distillation (KD) as a regularization strategy, aiming to prevent forgetting. KD-based methods are successfully used in CIL, but they often struggle to regularize the model without access to exemplars of the training data from previous tasks. Our analysis reveals that this issue originates from substantial representation shifts in the teacher network when dealing with out-of-distribution data. This causes large errors in the KD loss component, leading to performance degradation in CIL. Inspired by recent test-time adaptation methods, we introduce Teacher Adaptation (TA), a method that concurrently updates the teacher and the main model during incremental training. Our method seamlessly integrates with KD-based CIL approaches and allows for consistent enhancement of their performance across multiple exemplar-free CIL benchmarks.
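A hedged sketch of the mechanism: keep the teacher's weights frozen, but let it adapt to the new task's data stream during KD (in the spirit of the test-time-adaptation methods the abstract cites, e.g. by refreshing normalization statistics). Which teacher components TA actually updates is a detail of the paper; the snippet below only illustrates the shape of such a training step.

```python
import torch
import torch.nn.functional as F

def kd_step(student, teacher, x, y, optimizer, alpha=1.0, T=2.0):
    # teacher.train() lets its BatchNorm running statistics follow the new
    # task's data distribution; no teacher weights receive gradients.
    teacher.train()
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    ce = F.cross_entropy(s_logits, y)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction="batchmean") * T * T
    loss = ce + alpha * kd
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The point is that the KD target comes from a teacher whose representations have not drifted out of distribution on the current task's inputs.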
paper_authors: Lu Zhang, Chenbo Zhang, Jiajia Zhao, Jihong Guan, Shuigeng Zhou
for: This paper tackles zero-shot object detection, in particular the low recall for unseen classes and the confusion of unseen classes with background.
methods: The paper combines DETR and meta-learning in a method named Meta-ZSDETR. Unlike Faster R-CNN based methods that first generate class-agnostic proposals and then classify them with a visual-semantic alignment module, Meta-ZSDETR directly predicts class-specific box coordinates with class-specific queries and further filters them with the predicted accuracy from the classification head.
results: Experimental results on the MS COCO and PASCAL VOC benchmark datasets show that the method outperforms existing ZSD methods by a large margin.
Abstract
Zero-shot object detection aims to localize and recognize objects of unseen classes. Most existing works face two problems: the low recall of the RPN on unseen classes and the confusion of unseen classes with background. In this paper, we present the first method that combines DETR and meta-learning to perform zero-shot object detection, named Meta-ZSDETR, where model training is formalized as an individual episode based meta-learning task. Different from Faster R-CNN based methods that first generate class-agnostic proposals and then classify them with a visual-semantic alignment module, Meta-ZSDETR directly predicts class-specific boxes with class-specific queries and further filters them with the predicted accuracy from the classification head. The model is optimized with meta-contrastive learning, which contains a regression head to generate the coordinates of class-specific boxes, a classification head to predict the accuracy of generated boxes, and a contrastive head that utilizes the proposed contrastive-reconstruction loss to further separate different classes in visual space. We conduct extensive experiments on two benchmark datasets, MS COCO and PASCAL VOC. Experimental results show that our method outperforms the existing ZSD methods by a large margin.
Proceedings of the 2nd International Workshop on Adaptive Cyber Defense
results: Through experiments and demonstrations, the contributors showed the feasibility and effectiveness of AI and ML techniques for cyber defense and identified several potential application scenarios.
Abstract
The 2nd International Workshop on Adaptive Cyber Defense was held at the Florida Institute of Technology, Florida. This workshop was organized to share research that explores unique applications of Artificial Intelligence (AI) and Machine Learning (ML) as foundational capabilities for the pursuit of adaptive cyber defense. The cyber domain cannot currently be reliably and effectively defended without extensive reliance on human experts. Skilled cyber defenders are in short supply and often cannot respond fast enough to cyber threats. Building on recent advances in AI and ML the Cyber defense research community has been motivated to develop new dynamic and sustainable defenses through the adoption of AI and ML techniques to cyber settings. Bridging critical gaps between AI and Cyber researchers and practitioners can accelerate efforts to create semi-autonomous cyber defenses that can learn to recognize and respond to cyber attacks or discover and mitigate weaknesses in cooperation with other cyber operation systems and human experts. Furthermore, these defenses are expected to be adaptive and able to evolve over time to thwart changes in attacker behavior, changes in the system health and readiness, and natural shifts in user behavior over time. The workshop was comprised of invited keynote talks, technical presentations and a panel discussion about how AI/ML can enable autonomous mitigation of current and future cyber attacks. Workshop submissions were peer reviewed by a panel of domain experts with a proceedings consisting of six technical articles exploring challenging problems of critical importance to national and global security. Participation in this workshop offered new opportunities to stimulate research and innovation in the emerging domain of adaptive and autonomous cyber defense.
Spatial LibriSpeech: An Augmented Dataset for Spatial Audio Learning
results: The authors train models on four spatial audio tasks, obtaining a median absolute error of 6.60° on 3D sound source localization, 0.43m on distance estimation, 90.66ms on T30 estimation, and 2.74dB on DRR estimation. They also show that these models generalize well to widely used evaluation datasets.
Abstract
We present Spatial LibriSpeech, a spatial audio dataset with over 650 hours of 19-channel audio, first-order ambisonics, and optional distractor noise. Spatial LibriSpeech is designed for machine learning model training, and it includes labels for source position, speaking direction, room acoustics and geometry. Spatial LibriSpeech is generated by augmenting LibriSpeech samples with 200k+ simulated acoustic conditions across 8k+ synthetic rooms. To demonstrate the utility of our dataset, we train models on four spatial audio tasks, resulting in a median absolute error of 6.60° on 3D source localization, 0.43m on distance, 90.66ms on T30, and 2.74dB on DRR estimation. We show that the same models generalize well to widely-used evaluation datasets, e.g., obtaining a median absolute error of 12.43° on 3D source localization on TUT Sound Events 2018, and 157.32ms on T30 estimation on ACE Challenge.
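For reference, the 3D-localization figure quoted above is a median absolute angular error; here is a sketch of one plausible way to compute it from predicted and ground-truth direction vectors (whether the paper computes it exactly this way is an assumption):

```python
import numpy as np

def median_angular_error_deg(pred_dirs, true_dirs):
    """Both inputs: (n, 3) arrays of source direction vectors."""
    pred = pred_dirs / np.linalg.norm(pred_dirs, axis=1, keepdims=True)
    true = true_dirs / np.linalg.norm(true_dirs, axis=1, keepdims=True)
    cos = np.clip(np.sum(pred * true, axis=1), -1.0, 1.0)
    return float(np.degrees(np.median(np.arccos(cos))))

# Perfect predictions give 0 deg; uniformly random ones hover around 90 deg.
```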
Semantic relatedness in DBpedia: A comparative and experimental assessment
results: According to the experimental results, weighting the RDF triples in combination with evaluating all the directed paths linking the compared resources is the best strategy for computing semantic relatedness in DBpedia.
Abstract
Evaluating semantic relatedness of Web resources is still an open challenge. This paper focuses on knowledge-based methods, which represent an alternative to corpus-based approaches and rely in general on the availability of knowledge graphs. In particular, we have selected 10 methods from the existing literature, organized into adjacent-resources, triple-pattern, and triple-weight based methods. They have been implemented and evaluated using DBpedia as the reference RDF knowledge graph. Since DBpedia is continuously evolving, the experimental results reported for these methods in the literature are not comparable. For this reason, in this work, the methods have been re-evaluated by running them all at once on the same DBpedia release and against 14 well-known golden datasets. On the basis of the correlation values with human judgment obtained from the experimental results, weighting the RDF triples in combination with evaluating all the directed paths linking the compared resources is the best strategy for computing semantic relatedness in DBpedia.
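To make the winning strategy concrete, here is an illustrative sketch that weights triples and sums over all short directed paths between two resources. The edge weights and the path-combination function are placeholders; the 10 surveyed methods each define these differently.

```python
import networkx as nx

def relatedness(G, a, b, max_len=3):
    """Sum of path scores; a path scores the product of its triple weights,
    discounted by path length."""
    score = 0.0
    for path in nx.all_simple_paths(G, a, b, cutoff=max_len):
        edges = list(zip(path, path[1:]))
        w = 1.0
        for u, v in edges:
            w *= G[u][v].get("weight", 1.0)  # e.g. predicate informativeness
        score += w / len(edges)
    return score

G = nx.DiGraph()
G.add_edge("dbr:Quentin_Tarantino", "dbr:Pulp_Fiction", weight=0.9)
G.add_edge("dbr:Pulp_Fiction", "dbr:Neo-noir", weight=0.4)
print(relatedness(G, "dbr:Quentin_Tarantino", "dbr:Neo-noir"))  # 0.18
```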
Predictive Authoring for Brazilian Portuguese Augmentative and Alternative Communication
for: This paper proposes using a BERT-like model for pictogram prediction in AAC systems to improve the efficiency of message authoring for individuals with complex communication needs.
methods: The authors finetune BERTimbau, a Brazilian Portuguese version of BERT, using an AAC corpus for Brazilian Portuguese, and test different approaches to representing a pictogram for prediction, including as a word, as a concept, and as a set of synonyms.
results: The results demonstrate that embeddings computed from the pictograms' captions, synonyms, or definitions have similar performance: using synonyms leads to lower perplexity, but using captions leads to the highest accuracies. The paper provides insight into how to represent a pictogram for prediction with a BERT-like model and into the potential of using images for pictogram prediction.
Abstract
Individuals with complex communication needs (CCN) often rely on augmentative and alternative communication (AAC) systems to have conversations and communicate their wants. Such systems allow message authoring by arranging pictograms in sequence. However, the difficulty of finding the desired item to complete a sentence can increase as the user's vocabulary increases. This paper proposes using BERTimbau, a Brazilian Portuguese version of BERT, for pictogram prediction in AAC systems. To finetune BERTimbau, we constructed an AAC corpus for Brazilian Portuguese to use as a training corpus. We tested different approaches to representing a pictogram for prediction: as a word (using pictogram captions), as a concept (using a dictionary definition), and as a set of synonyms (using related terms). We also evaluated the usage of images for pictogram prediction. The results demonstrate that using embeddings computed from the pictograms' captions, synonyms, or definitions have a similar performance. Using synonyms leads to lower perplexity, but using captions leads to the highest accuracies. This paper provides insight into how to represent a pictogram for prediction using a BERT-like model and the potential of using images for pictogram prediction.
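The "pictogram as word" setup can be sketched with the public BERTimbau checkpoint and a masked next-pictogram slot; the actual system is fine-tuned on the authors' AAC corpus and restricts candidates to the user's pictogram vocabulary, which this toy example does not do.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

name = "neuralmind/bert-base-portuguese-cased"  # public BERTimbau release
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)

# Pictograms represented by their captions; predict the next slot.
sentence = f"eu quero beber {tokenizer.mask_token}"  # "I want to drink [MASK]"
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
top5 = logits[0, mask_pos].topk(5).indices
print(tokenizer.convert_ids_to_tokens(top5))  # candidate pictogram captions
```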
Balancing Transparency and Risk: The Security and Privacy Risks of Open-Source Machine Learning Models
paper_authors: Dominik Hintersdorf, Lukas Struppek, Kristian Kersting
for: This work aims to raise awareness of the privacy and security risks that come with using open-source machine learning models.
methods: The study surveys common privacy and security threats in open-source models, drawing on analyses of known attacks and privacy leaks.
results: The study identifies many examples of privacy and security risks in open-source models, including hidden functionalities triggered by specific input patterns. The potential consequences range from service interruptions to the exposure of sensitive user data and even physical harm.
Abstract
The field of artificial intelligence (AI) has experienced remarkable progress in recent years, driven by the widespread adoption of open-source machine learning models in both research and industry. Considering the resource-intensive nature of training on vast datasets, many applications opt for models that have already been trained. Hence, a small number of key players undertake the responsibility of training and publicly releasing large pre-trained models, providing a crucial foundation for a wide range of applications. However, the adoption of these open-source models carries inherent privacy and security risks that are often overlooked. To provide a concrete example, an inconspicuous model may conceal hidden functionalities that, when triggered by specific input patterns, can manipulate the behavior of the system, such as instructing self-driving cars to ignore the presence of other vehicles. The implications of successful privacy and security attacks encompass a broad spectrum, ranging from relatively minor damage like service interruptions to highly alarming scenarios, including physical harm or the exposure of sensitive user data. In this work, we present a comprehensive overview of common privacy and security threats associated with the use of open-source models. By raising awareness of these dangers, we strive to promote the responsible and secure use of AI systems.
Modelling Electricity Consumption in Irish Dairy Farms Using Agent-Based Modelling
results: The study finds that the agent-based model can accurately predict the electricity consumption of dairy farms while producing fully explainable outputs, an advantage over other AI techniques such as deep learning models.
Abstract
Dairy farming can be an energy intensive form of farming. Understanding the factors affecting electricity consumption on dairy farms is crucial for farm owners and energy providers. In order to accurately estimate electricity demands in dairy farms, it is necessary to develop a model. In this research paper, an agent-based model is proposed to model the electricity consumption of Irish dairy farms. The model takes into account various factors that affect the energy consumption of dairy farms, including herd size, number of milking machines, and time of year. The outputs are validated using existing state-of-the-art dairy farm modelling frameworks. The proposed agent-based model is fully explainable, which is an advantage over other Artificial Intelligence techniques, e.g. deep learning.
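A toy illustration of why an agent-based formulation stays explainable: each farm agent's consumption decomposes into named, auditable terms. The coefficients below are invented placeholders, not the paper's calibrated values.

```python
import random

class DairyFarmAgent:
    def __init__(self, herd_size, n_milking_machines):
        self.herd_size = herd_size
        self.n_machines = n_milking_machines

    def daily_kwh(self, month):
        # Seasonal milk yield (litres/day); grazing season assumed Apr-Sep.
        milk_yield = self.herd_size * (22 if 4 <= month <= 9 else 14)
        cooling = 0.016 * milk_yield          # kWh per litre cooled
        milking = 0.7 * self.n_machines * 2   # two milking sessions per day
        water_heating = 0.005 * milk_yield
        return cooling + milking + water_heating

random.seed(0)
farms = [DairyFarmAgent(random.randint(60, 200), random.randint(4, 16))
         for _ in range(100)]
july_total = sum(f.daily_kwh(month=7) for f in farms)
print(f"fleet-wide July consumption: {july_total:.0f} kWh/day")
```

Every kWh in the output can be traced back to a specific farm, term, and assumption, which is exactly the transparency argument made against black-box predictors.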
Poison Dart Frog: A Clean-Label Attack with Low Poisoning Rate and High Attack Success Rate in the Absence of Training Data
paper_authors: Binhao Ma, Jiahui Wang, Dejun Wang, Bo Meng
for: Studying backdoor attacks, in particular clean-label attacks that do not require relabeling the injected data.
methods: The paper proposes 'Poison Dart Frog', a clean-label method that does not require the attacker to know the entire training set or any portion of it; only knowledge of the target class (e.g., 'frog') is needed.
results: On CIFAR10, Tiny-ImageNet and TSRD, with poisoning rates of only 0.1%, 0.025% and 0.4% of the training set size respectively, Poison Dart Frog achieves a high attack success rate, comparable to the state-of-the-art method NARCISSUS, without any knowledge of the training data.
Abstract
To successfully launch backdoor attacks, injected data needs to be correctly labeled; otherwise, it can be easily detected by even basic data filters. Hence, the concept of clean-label attacks was introduced, which is more dangerous as it doesn't require changing the labels of injected data. To the best of our knowledge, existing clean-label backdoor attacks largely rely on an understanding of the entire training set or a portion of it. In practice, however, it is very difficult for attackers to obtain this, because training datasets are often collected from multiple independent sources. Unlike all current clean-label attacks, we propose a novel clean-label method called 'Poison Dart Frog'. Poison Dart Frog does not require access to any training data; it only necessitates knowledge of the target class for the attack, such as 'frog'. On CIFAR10, Tiny-ImageNet, and TSRD, with a mere 0.1%, 0.025%, and 0.4% poisoning rate of the training set size, respectively, Poison Dart Frog achieves a high Attack Success Rate compared to LC, HTBA, BadNets, and Blend. Furthermore, compared to the state-of-the-art attack, NARCISSUS, Poison Dart Frog achieves similar attack success rates without any training data. Finally, we demonstrate that four typical backdoor defense algorithms struggle to counter Poison Dart Frog.
RBA-GCN: Relational Bilevel Aggregation Graph Convolutional Network for Emotion Recognition
results: On the IEMOCAP and MELD datasets, the weighted average F1 score of RBA-GCN improves on the most advanced method by 2.17-5.21%.
Abstract
Emotion recognition in conversation (ERC) has received increasing attention from researchers due to its wide range of applications. As conversation has a natural graph structure, numerous approaches used to model ERC based on graph convolutional networks (GCNs) have yielded significant results. However, the aggregation approach of traditional GCNs suffers from the node information redundancy problem, leading to node discriminant information loss. Additionally, single-layer GCNs lack the capacity to capture long-range contextual information from the graph. Furthermore, the majority of approaches are based on textual modality or stitching together different modalities, resulting in a weak ability to capture interactions between modalities. To address these problems, we present the relational bilevel aggregation graph convolutional network (RBA-GCN), which consists of three modules: the graph generation module (GGM), similarity-based cluster building module (SCBM) and bilevel aggregation module (BiAM). First, GGM constructs a novel graph to reduce the redundancy of target node information. Then, SCBM calculates the node similarity in the target node and its structural neighborhood, where noisy information with low similarity is filtered out to preserve the discriminant information of the node. Meanwhile, BiAM is a novel aggregation method that can preserve the information of nodes during the aggregation process. This module can construct the interaction between different modalities and capture long-range contextual information based on similarity clusters. On both the IEMOCAP and MELD datasets, the weighted average F1 score of RBA-GCN has a 2.17-5.21% improvement over that of the most advanced method.
AI Hilbert: From Data and Background Knowledge to Automated Scientific Discovery
paper_authors: Ryan Cory-Wright, Bachir El Khadir, Cristina Cornelio, Sanjeeb Dash, Lior Horesh
for: The goal of this work is to discover scientific formulae that parsimoniously explain natural phenomena and align with existing background theory.
methods: The work combines regression and reasoning to eliminate formulae inconsistent with background theory.
results: The study shows that combining polynomial optimization with logic can efficiently find the scientific formula that best fits the data, and that the validity of the discoveries can be proved automatically.
Abstract
The discovery of scientific formulae that parsimoniously explain natural phenomena and align with existing background theory is a key goal in science. Historically, scientists have derived natural laws by manipulating equations based on existing knowledge, forming new equations, and verifying them experimentally. In recent years, data-driven scientific discovery has emerged as a viable competitor in settings with large amounts of experimental data. Unfortunately, data-driven methods often fail to discover valid laws when data is noisy or scarce. Accordingly, recent works combine regression and reasoning to eliminate formulae inconsistent with background theory. However, the problem of searching over the space of formulae consistent with background theory to find one that fits the data best is not well solved. We propose a solution to this problem when all axioms and scientific laws are expressible via polynomial equalities and inequalities and argue that our approach is widely applicable. We further model notions of minimal complexity using binary variables and logical constraints, solve polynomial optimization problems via mixed-integer linear or semidefinite optimization, and automatically prove the validity of our scientific discoveries via Positivestellensatz certificates. Remarkably, the optimization techniques leveraged in this paper allow our approach to run in polynomial time with fully correct background theory, or non-deterministic polynomial (NP) time with partially correct background theory. We experimentally demonstrate that some famous scientific laws, including Kepler's Third Law of Planetary Motion, the Hagen-Poiseuille Equation, and the Radiated Gravitational Wave Power equation, can be automatically derived from sets of partially correct background axioms.
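The equality-only core of this idea can be sketched with off-the-shelf computer algebra: encode the axioms as polynomials and test whether a candidate law lies in the ideal they generate. The paper goes much further (mixed-integer and semidefinite optimization, inequality axioms, Positivstellensatz certificates); this toy covers only the Kepler example's equality axioms.

```python
import sympy as sp

v, F, m, r, T, G, M, pi = sp.symbols("v F m r T G M pi")
axioms = [
    v * T - 2 * pi * r,    # orbital speed: v = 2*pi*r / T
    F * r - m * v**2,      # centripetal force: F = m*v^2 / r
    F * r**2 - G * M * m,  # Newtonian gravity: F = G*M*m / r^2
]
basis = sp.groebner(axioms, v, F, m, r, T, G, M, pi, order="lex")

# Kepler's Third Law (multiplied by m, since m never cancels within the ideal):
kepler = m * (4 * pi**2 * r**3 - G * M * T**2)
print(basis.contains(kepler))  # True: the law is derivable from the axioms
```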
Vision Relation Transformer for Unbiased Scene Graph Generation
paper_authors: Gopika Sudhakaran, Devendra Singh Dhami, Kristian Kersting, Stefan Roth
for: scene graph generation (SGG) task, aiming to predict entity relationships using a relation encoder-decoder pipeline stacked on top of an object encoder-decoder backbone.
methods: introduces the Vision rElation TransfOrmer (VETO), consisting of a novel local-level entity relation encoder, and the Mutually Exclusive ExperT (MEET) learning strategy to capture important relation features without bias towards head or tail classes.
results: experimental results on the VG and GQA datasets demonstrate that VETO + MEET boosts the predictive performance by up to 47 percent over the state of the art while being 10 times smaller.
Abstract
Recent years have seen a growing interest in Scene Graph Generation (SGG), a comprehensive visual scene understanding task that aims to predict entity relationships using a relation encoder-decoder pipeline stacked on top of an object encoder-decoder backbone. Unfortunately, current SGG methods suffer from an information loss regarding the entities' local-level cues during the relation encoding process. To mitigate this, we introduce the Vision rElation TransfOrmer (VETO), consisting of a novel local-level entity relation encoder. We further observe that many existing SGG methods claim to be unbiased, but are still biased towards either head or tail classes. To overcome this bias, we introduce a Mutually Exclusive ExperT (MEET) learning strategy that captures important relation features without bias towards head or tail classes. Experimental results on the VG and GQA datasets demonstrate that VETO + MEET boosts the predictive performance by up to 47 percent over the state of the art while being 10 times smaller.
Artificial-Spiking Hierarchical Networks for Vision-Language Representation Learning
results: The proposed STUA pre-training method and the ASH-Nets model achieve competitive results on multiple established downstream VL tasks.
Abstract
With the success of self-supervised learning, multimodal foundation models have rapidly been adapted to a wide range of downstream tasks driven by vision and language (VL) pretraining. State-of-the-art methods achieve impressive performance by pre-training on large-scale datasets. However, bridging the semantic gap between the two modalities remains a nonnegligible challenge for VL tasks. In this work, we propose an efficient computation framework for multimodal alignment by introducing a novel visual semantic module to further improve the performance of the VL tasks. Specifically, we propose a flexible model, namely Artificial-Spiking Hierarchical Networks (ASH-Nets), which combines the complementary advantages of Artificial neural networks (ANNs) and Spiking neural networks (SNNs) to enrich visual semantic representations. In particular, a visual concrete encoder and a semantic abstract encoder are constructed to learn continuous and discrete latent variables to enhance the flexibility of semantic encoding. Considering the spatio-temporal properties of SNNs modeling, we introduce a contrastive learning method to optimize the inputs of similar samples. This can improve the computational efficiency of the hierarchical network, while the augmentation of hard samples is beneficial to the learning of visual representations. Furthermore, the Spiking to Text Uni-Alignment Learning (STUA) pre-training method is proposed, which only relies on text features to enhance the encoding ability of abstract semantics. We validate the performance on multiple well-established downstream VL tasks. Experiments show that the proposed ASH-Nets achieve competitive results.
Logistics Hub Location Optimization: A K-Means and P-Median Model Hybrid Approach Using Road Network Distances
paper_authors: Muhammad Abdul Rahman, Muhammad Aamir Basheer, Zubair Khalid, Muhammad Tahir, Momin Uppal
for: Optimizing the placement of logistics hubs to improve the efficiency and environmental footprint of e-commerce operations.
methods: A hybrid approach: delivery points are first clustered with K-Means using road-network distances, and hubs are then placed at optimal locations with the P-Median method.
results: Serving deliveries from the optimized hub locations saves 815 meters (10%) per delivery.
Abstract
Logistics hubs play a pivotal role in the last-mile delivery distance; even a slight increment in distance negatively impacts the business of the e-commerce industry while also increasing its carbon footprint. The growth of this industry, particularly after Covid-19, has further intensified the need for optimized allocation of resources in an urban environment. In this study, we use a hybrid approach to optimize the placement of logistic hubs. The approach sequentially employs different techniques. Initially, delivery points are clustered using K-Means in relation to their spatial locations. The clustering method utilizes road network distances as opposed to Euclidean distances. Non-road network-based approaches have been avoided since they lead to erroneous and misleading results. Finally, hubs are located using the P-Median method. The P-Median method also incorporates the number of deliveries and population as weights. Real-world delivery data from Muller and Phipps (M&P) is used to demonstrate the effectiveness of the approach. Serving deliveries from the optimal hub locations results in the saving of 815 (10%) meters per delivery.
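A structural sketch of the two-stage pipeline follows; Euclidean distance stands in for the road-network distances the paper insists on, and the per-cluster 1-median weighted only by delivery counts is a simplification of the full P-Median formulation, so treat this purely as an illustration of the hybrid idea.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
points = rng.uniform(0, 10, size=(500, 2))   # delivery point coordinates
deliveries = rng.integers(1, 20, size=500)   # weight: deliveries per point

k = 5
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(points)

hubs = []
for c in range(k):
    idx = np.where(labels == c)[0]
    P, w = points[idx], deliveries[idx]
    D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)  # pairwise
    cost = (D * w[None, :]).sum(axis=1)  # total weighted distance per candidate
    hubs.append(P[cost.argmin()])        # 1-median of the cluster
print(np.array(hubs))
```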
From Hope to Safety: Unlearning Biases of Deep Models by Enforcing the Right Reasons in Latent Space
results: Experiments in controlled and real-world settings on the ISIC, Bone Age, ImageNet and CelebA datasets with VGG, ResNet and EfficientNet architectures show that the method effectively mitigates biases.
Abstract
Deep Neural Networks are prone to learning spurious correlations embedded in the training data, leading to potentially biased predictions. This poses risks when deploying these models for high-stake decision-making, such as in medical applications. Current methods for post-hoc model correction either require input-level annotations, which are only possible for spatially localized biases, or augment the latent feature space, thereby hoping to enforce the right reasons. We present a novel method ensuring the right reasons on the concept level by reducing the model's sensitivity towards biases through the gradient. When modeling biases via Concept Activation Vectors, we highlight the importance of choosing robust directions, as traditional regression-based approaches such as Support Vector Machines tend to result in diverging directions. We effectively mitigate biases in controlled and real-world settings on the ISIC, Bone Age, ImageNet and CelebA datasets using VGG, ResNet and EfficientNet architectures.
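A hedged sketch of the core mechanism: fit a robust concept direction in some latent layer, then penalize the alignment between that direction and the gradient of the target-class output with respect to the latent activations. Which layer is used, how the CAV is fit, and the loss weighting all follow the paper; the snippet assumes pooled (B, D) features.

```python
import torch

def fit_cav(acts_concept, acts_neutral):
    # Difference-of-means direction; the paper argues for robust directions
    # over SVM-style regression directions, which tend to diverge.
    cav = acts_concept.mean(0) - acts_neutral.mean(0)
    return cav / cav.norm()

def right_reason_penalty(feature_extractor, head, x, y, cav):
    z = feature_extractor(x)                    # (B, D) latent activations
    logits = head(z)
    score = logits.gather(1, y[:, None]).sum()  # target-class outputs
    grad_z, = torch.autograd.grad(score, z, create_graph=True)
    return (grad_z @ cav).pow(2).mean()         # sensitivity along the CAV

# total_loss = task_loss + lam * right_reason_penalty(fe, head, x, y, cav)
```

Because the penalty is built with create_graph=True, it is differentiable and can be minimized jointly with the task loss, pushing the model's sensitivity to the bias direction toward zero.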
Enhancing Agent Communication and Learning through Action and Language
results: The study finds that combining action- and language-based communication modes improves learning outcomes.
Abstract
We introduce a novel category of GC-agents capable of functioning as both teachers and learners. Leveraging action-based demonstrations and language-based instructions, these agents enhance communication efficiency. We investigate the incorporation of pedagogy and pragmatism, essential elements in human communication and goal achievement, enhancing the agents' teaching and learning capabilities. Furthermore, we explore the impact of combining communication modes (action and language) on learning outcomes, highlighting the benefits of a multi-modal approach.
ICU Mortality Prediction Using Long Short-Term Memory Networks
results: Experimental results show that the LSTM model, given rigorous multivariate time-series measurements, is effective for building real-world prediction engines.
Abstract
Extensive bedside monitoring in Intensive Care Units (ICUs) has resulted in complex temporal data regarding patient physiology, which presents a rich context for clinical data analysis. At the same time, identifying the time-series patterns within these data may provide strong predictive power for clinical events. Hence, in this work we investigate the implementation of an automatic data-driven system, which analyzes large amounts of multivariate temporal data derived from Electronic Health Records (EHRs) and extracts high-level information so as to predict in-hospital mortality and Length of Stay (LOS) early. Practically, we investigate the applicability of an LSTM network with the time frame reduced to 6 hours so as to enhance clinical tasks. The experimental results highlight the efficiency of the LSTM model with rigorous multivariate time-series measurements for building real-world prediction engines.
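A minimal sketch of the model described above: an LSTM over the first 6 hours of multivariate measurements producing a mortality logit. The feature count, resampling to hourly windows, and imputation are assumed preprocessing, not shown.

```python
import torch
import torch.nn as nn

class ICUMortalityLSTM(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):              # x: (batch, 6 timesteps, n_features)
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1]).squeeze(-1)  # in-hospital mortality logit

model = ICUMortalityLSTM(n_features=17)       # e.g. 17 vitals/labs per hour
x = torch.randn(32, 6, 17)                    # a batch of 6-hour windows
risk = torch.sigmoid(model(x))                # predicted mortality probability
```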
Multi-Level Compositional Reasoning for Interactive Instruction Following
results: The approach achieves a 2.03% absolute gain (PLWSR on the unseen set) over comparable state-of-the-art methods without using rule-based planning or a semantic spatial memory.
Abstract
Robotic agents performing domestic chores from natural language directives must master the complex job of navigating the environment and interacting with objects in it. The tasks given to the agents are often composite and thus challenging, as completing them requires reasoning about multiple subtasks, e.g., bring a cup of coffee. To address the challenge, we propose to divide and conquer it by breaking the task into multiple subgoals and attending to them individually for better navigation and interaction. We call it the Multi-level Compositional Reasoning Agent (MCR-Agent). Specifically, we learn a three-level action policy. At the highest level, we infer a sequence of human-interpretable subgoals to be executed based on language instructions by a high-level policy composition controller. At the middle level, we discriminatively control the agent's navigation by a master policy by alternating between a navigation policy and various independent interaction policies. Finally, at the lowest level, we infer manipulation actions with the corresponding object masks using the appropriate interaction policy. Our approach not only generates human-interpretable subgoals but also achieves a 2.03% absolute gain over comparable state-of-the-art methods in the efficiency metric (PLWSR on the unseen set) without using rule-based planning or a semantic spatial memory.
Deciphering knee osteoarthritis diagnostic features with explainable artificial intelligence: A systematic review
results: The review finds that XAI techniques can improve the reliability and trustworthiness of knee osteoarthritis diagnosis and offers useful insights to encourage the adoption of XAI in clinical practice.
Abstract
Existing artificial intelligence (AI) models for diagnosing knee osteoarthritis (OA) have faced criticism for their lack of transparency and interpretability, despite achieving medical-expert-like performance. This opacity makes them challenging to trust in clinical practice. Recently, explainable artificial intelligence (XAI) has emerged as a specialized technique that can provide confidence in the model's prediction by revealing how the prediction is derived, thus promoting the use of AI systems in healthcare. This paper presents the first survey of XAI techniques used for knee OA diagnosis. The XAI techniques are discussed from two perspectives: data interpretability and model interpretability. The aim of this paper is to provide valuable insights into XAI's potential towards a more reliable knee OA diagnosis approach and encourage its adoption in clinical practice.
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
paper_authors: Tobias Christian Nauen, Sebastian Palacio, Andreas Dengel
for: This paper aims to provide a comprehensive evaluation of vision transformers and related architectures, focusing on their efficiency across multiple performance metrics.
methods: The authors use more than 30 models and consider various performance metrics to evaluate the efficiency of different architectures. They also propose a hybrid attention-CNN model that performs well with low inference memory and number of parameters.
results: The study finds that ViT is still Pareto optimal across multiple efficiency metrics, despite the existence of alternative approaches claiming to be more efficient. The authors also discover a strong positive correlation between the number of FLOPS and training memory, and that scaling the model size is more effective than scaling the image size. The study provides valuable insights for practitioners and researchers when selecting models for specific applications.
Abstract
The growing popularity of Vision Transformers as the go-to models for image classification has led to an explosion of architectural modifications claiming to be more efficient than the original ViT. However, a wide diversity of experimental conditions prevents a fair comparison between all of them, based solely on their reported results. To address this gap in comparability, we conduct a comprehensive analysis of more than 30 models to evaluate the efficiency of vision transformers and related architectures, considering various performance metrics. Our benchmark provides a comparable baseline across the landscape of efficiency-oriented transformers, unveiling a plethora of surprising insights. For example, we discover that ViT is still Pareto optimal across multiple efficiency metrics, despite the existence of several alternative approaches claiming to be more efficient. Results also indicate that hybrid attention-CNN models fare particularly well when it comes to low inference memory and number of parameters, and also that it is better to scale the model size, than the image size. Furthermore, we uncover a strong positive correlation between the number of FLOPS and the training memory, which enables the estimation of required VRAM from theoretical measurements alone. Thanks to our holistic evaluation, this study offers valuable insights for practitioners and researchers, facilitating informed decisions when selecting models for specific applications. We publicly release our code and data at https://github.com/tobna/WhatTransformerToFavor
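To illustrate the kind of raw measurement such a benchmark rests on, here is a single-model sketch recording parameter count and peak training memory for one step (the model name, batch size, and a CUDA device are illustrative assumptions; the paper sweeps 30+ models and several more metrics).

```python
import torch
import timm

model = timm.create_model("vit_base_patch16_224", num_classes=10).cuda()
opt = torch.optim.AdamW(model.parameters())
x = torch.randn(32, 3, 224, 224, device="cuda")
y = torch.randint(0, 10, (32,), device="cuda")

torch.cuda.reset_peak_memory_stats()
loss = torch.nn.functional.cross_entropy(model(x), y)  # one training step
loss.backward()
opt.step()

n_params = sum(p.numel() for p in model.parameters())
print(f"params: {n_params / 1e6:.1f}M")
print(f"peak train memory: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")
```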
RLIPv2: Fast Scaling of Relational Language-Image Pre-training
paper_authors: Hangjie Yuan, Shiwei Zhang, Xiang Wang, Samuel Albanie, Yining Pan, Tao Feng, Jianwen Jiang, Dong Ni, Yingya Zhang, Deli Zhao
for: Improving relational reasoning in computer vision tasks by aligning vision representations with relational texts.
methods: The paper proposes RLIPv2, a fast-converging model that scales relational pre-training to large-scale pseudo-labelled scene graph data. RLIPv2 introduces Asymmetric Language-Image Fusion (ALIF), a mechanism that enables earlier and deeper gated cross-modal fusion with sparsified language encoding layers.
results: Extensive experiments on Human-Object Interaction Detection and Scene Graph Generation show state-of-the-art performance on three benchmarks under fully-finetuning, few-shot and zero-shot settings. Notably, the largest RLIPv2 achieves 23.29mAP on HICO-DET without any fine-tuning, 32.22mAP with just 1% of the data, and 45.09mAP with 100% of the data.
Abstract
Relational Language-Image Pre-training (RLIP) aims to align vision representations with relational texts, thereby advancing the capability of relational reasoning in computer vision tasks. However, hindered by the slow convergence of RLIPv1 architecture and the limited availability of existing scene graph data, scaling RLIPv1 is challenging. In this paper, we propose RLIPv2, a fast converging model that enables the scaling of relational pre-training to large-scale pseudo-labelled scene graph data. To enable fast scaling, RLIPv2 introduces Asymmetric Language-Image Fusion (ALIF), a mechanism that facilitates earlier and deeper gated cross-modal fusion with sparsified language encoding layers. ALIF leads to comparable or better performance than RLIPv1 in a fraction of the time for pre-training and fine-tuning. To obtain scene graph data at scale, we extend object detection datasets with free-form relation labels by introducing a captioner (e.g., BLIP) and a designed Relation Tagger. The Relation Tagger assigns BLIP-generated relation texts to region pairs, thus enabling larger-scale relational pre-training. Through extensive experiments conducted on Human-Object Interaction Detection and Scene Graph Generation, RLIPv2 shows state-of-the-art performance on three benchmarks under fully-finetuning, few-shot and zero-shot settings. Notably, the largest RLIPv2 achieves 23.29mAP on HICO-DET without any fine-tuning, yields 32.22mAP with just 1% data and yields 45.09mAP with 100% data. Code and models are publicly available at https://github.com/JacobYuan7/RLIPv2.
Surprise machines: revealing Harvard Art Museums’ image collection
paper_authors: Dario Rodighiero, Lins Derry, Douglas Duhaime, Jordan Kruguer, Maximilian C. Mueller, Christopher Pietsch, Jeffrey T. Schnapp, Jeff Steward
results: The project successfully creates a visual experience of surprise, enabling visitors to understand and explore the Harvard Art Museums' image collection in greater depth.
Abstract
Surprise Machines is a project of experimental museology that sets out to visualize the entire image collection of the Harvard Art Museums, intending to open up unexpected vistas on more than 200,000 objects usually inaccessible to visitors. Part of the exhibition Curatorial A(i)gents organized by metaLAB (at) Harvard, the project explores the limits of artificial intelligence to display a large set of images and create surprise among visitors. To achieve such a feeling of surprise, a choreographic interface was designed to connect the audience's movement with several unique views of the collection.
Distributed Neurodynamics-Based Backstepping Optimal Control for Robust Constrained Consensus of Underactuated Underwater Vehicles Fleet
results: The study shows that the proposed consensus-based optimal coordination protocol and robust controller achieve optimal formation tracking of the UUV fleet while fulfilling the constraints, and that the fleet remains stable and reliable in the presence of unknown disturbances.
Abstract
Robust constrained formation tracking control of underactuated underwater vehicles (UUVs) fleet in three-dimensional space is a challenging but practical problem. To address this problem, this paper develops a novel consensus based optimal coordination protocol and a robust controller, which adopts a hierarchical architecture. On the top layer, the spherical coordinate transform is introduced to tackle the nonholonomic constraint, and then a distributed optimal motion coordination strategy is developed. As a result, the optimal formation tracking of UUVs fleet can be achieved, and the constraints are fulfilled. To realize the generated optimal commands better and, meanwhile, deal with the underactuation, at the lower-level control loop a neurodynamics based robust backstepping controller is designed, and in particular, the issue of "explosion of terms" appearing in conventional backstepping based controllers is avoided and control activities are improved. The stability of the overall UUVs formation system is established to ensure that all the states of the UUVs are uniformly ultimately bounded in the presence of unknown disturbances. Finally, extensive simulation comparisons are made to illustrate the superiority and effectiveness of the derived optimal formation tracking protocol.
Audio-Visual Glance Network for Efficient Video Recognition
results: Sets new state-of-the-art performance on multiple video recognition benchmarks while achieving faster processing speed.
Abstract
Deep learning has made significant strides in video understanding tasks, but the computation required to classify lengthy and massive videos using clip-level video classifiers remains impractical and prohibitively expensive. To address this issue, we propose Audio-Visual Glance Network (AVGN), which leverages the commonly available audio and visual modalities to efficiently process the spatio-temporally important parts of a video. AVGN firstly divides the video into snippets of image-audio clip pair and employs lightweight unimodal encoders to extract global visual features and audio features. To identify the important temporal segments, we use an Audio-Visual Temporal Saliency Transformer (AV-TeST) that estimates the saliency scores of each frame. To further increase efficiency in the spatial dimension, AVGN processes only the important patches instead of the whole images. We use an Audio-Enhanced Spatial Patch Attention (AESPA) module to produce a set of enhanced coarse visual features, which are fed to a policy network that produces the coordinates of the important patches. This approach enables us to focus only on the most important spatio-temporally parts of the video, leading to more efficient video recognition. Moreover, we incorporate various training techniques and multi-modal feature fusion to enhance the robustness and effectiveness of our AVGN. By combining these strategies, our AVGN sets new state-of-the-art performance in multiple video recognition benchmarks while achieving faster processing speed.
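The core efficiency trick, scoring frames for saliency and processing only the top-scoring ones, can be sketched in a few lines of PyTorch. The stand-in saliency scores below take the place of AV-TeST's outputs, and all shapes are assumptions.

```python
# A minimal sketch of the "glance" idea: score each frame's saliency and keep
# only the top-k frames for expensive downstream processing.
import torch

def select_salient_frames(frame_feats: torch.Tensor,
                          saliency: torch.Tensor,
                          k: int) -> torch.Tensor:
    """frame_feats: (T, D) per-frame features; saliency: (T,) scores."""
    topk = torch.topk(saliency, k=k).indices.sort().values  # keep temporal order
    return frame_feats[topk]

feats = torch.randn(64, 512)   # 64 frames of 512-d features
scores = torch.rand(64)        # stand-in for AV-TeST saliency scores
glance = select_salient_frames(feats, scores, k=8)
print(glance.shape)  # torch.Size([8, 512])
```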
Distributed Robust Learning-Based Backstepping Control Aided with Neurodynamics for Consensus Formation Tracking of Underwater Vessels
for: This paper proposes a distributed robust learning-based control method for consensus formation tracking of multiple marine vessels whose system parameters are entirely unknown and subject to modeling mismatch, oceanic disturbances, and noise.
methods: Graph theory is used to synthesize the distributed controller with a stability guarantee. Since the parameter uncertainties arise only in the vessels' dynamic model, the backstepping control technique is employed. An online learning procedure is then developed to handle time-varying and unknown systems, and modeling errors, environmental disturbances, and measurement noise are tackled by introducing a neurodynamics model into the controller design.
results: The proposed distributed control protocol achieves robust consensus formation tracking despite modeling errors, oceanic disturbances, and measurement noise, as verified by extensive simulation experiments.
Abstract
This paper addresses distributed robust learning-based control for consensus formation tracking of multiple underwater vessels, in which the system parameters of the marine vessels are assumed to be entirely unknown and subject to the modeling mismatch, oceanic disturbances, and noises. Towards this end, graph theory is used to allow us to synthesize the distributed controller with a stability guarantee. Due to the fact that the parameter uncertainties only arise in the vessels' dynamic model, the backstepping control technique is then employed. Subsequently, to overcome the difficulties in handling time-varying and unknown systems, an online learning procedure is developed in the proposed distributed formation control protocol. Moreover, modeling errors, environmental disturbances, and measurement noises are considered and tackled by introducing a neurodynamics model in the controller design to obtain a robust solution. Then, the stability analysis of the overall closed-loop system under the proposed scheme is provided to ensure the robust adaptive performance at the theoretical level. Finally, extensive simulation experiments are conducted to further verify the efficacy of the presented distributed control protocol.
Towards Attack-tolerant Federated Learning via Critical Parameter Analysis
results: Experimental results show that the model defends against poisoning attacks more effectively than existing defense strategies across different attack scenarios.
Abstract
Federated learning is used to train a shared model in a decentralized way without clients sharing private data with each other. Federated learning systems are susceptible to poisoning attacks when malicious clients send false updates to the central server. Existing defense strategies are ineffective under non-IID data settings. This paper proposes a new defense strategy, FedCPA (Federated learning with Critical Parameter Analysis). Our attack-tolerant aggregation method is based on the observation that benign local models have similar sets of top-k and bottom-k critical parameters, whereas poisoned local models do not. Experiments with different attack scenarios on multiple datasets demonstrate that our model outperforms existing defense strategies in defending against poisoning attacks.
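The observation driving FedCPA, that benign local models share similar top-k and bottom-k critical-parameter sets while poisoned ones do not, suggests a simple overlap score between flattened client updates. A toy sketch follows; the Jaccard-based scoring and the choice of k are illustrative assumptions, not the paper's exact aggregation rule.

```python
# A toy sketch of critical-parameter overlap between client updates: benign
# updates should share top-k / bottom-k index sets, poisoned ones should not.
import torch

def critical_sets(update: torch.Tensor, k: int):
    """Index sets of the k largest and k smallest entries of a flat update."""
    top = set(torch.topk(update, k).indices.tolist())
    bottom = set(torch.topk(-update, k).indices.tolist())
    return top, bottom

def overlap_score(u: torch.Tensor, v: torch.Tensor, k: int = 100) -> float:
    """Average Jaccard overlap of the two critical-parameter sets."""
    ut, ub = critical_sets(u, k)
    vt, vb = critical_sets(v, k)
    jac = lambda a, b: len(a & b) / len(a | b)
    return 0.5 * (jac(ut, vt) + jac(ub, vb))

benign_a = torch.randn(10_000)
benign_b = benign_a + 0.1 * torch.randn(10_000)  # a similar benign update
poisoned = torch.randn(10_000)                   # an unrelated poisoned update
print(overlap_score(benign_a, benign_b))  # high overlap
print(overlap_score(benign_a, poisoned))  # near-zero overlap
```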
Robust Audio Anti-Spoofing with Fusion-Reconstruction Learning on Multi-Order Spectrograms
results: Achieves state-of-the-art performance with an EER of 0.77% on the ASVspoof2019 LA Challenge.
Abstract
Robust audio anti-spoofing has been increasingly challenging due to the recent advancements on deepfake techniques. While spectrograms have demonstrated their capability for anti-spoofing, complementary information presented in multi-order spectral patterns have not been well explored, which limits their effectiveness for varying spoofing attacks. Therefore, we propose a novel deep learning method with a spectral fusion-reconstruction strategy, namely S2pecNet, to utilise multi-order spectral patterns for robust audio anti-spoofing representations. Specifically, spectral patterns up to second-order are fused in a coarse-to-fine manner and two branches are designed for the fine-level fusion from the spectral and temporal contexts. A reconstruction from the fused representation to the input spectrograms further reduces the potential fused information loss. Our method achieved the state-of-the-art performance with an EER of 0.77% on a widely used dataset: ASVspoof2019 LA Challenge.
V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
paper_authors: Heng Wang, Jianbo Ma, Santiago Pascual, Richard Cartwright, Weidong Cai
for: This paper focuses on the problem of generating semantically-relevant sound from visual input, specifically using foundation models (FMs) to bridge the domain gap between visual and auditory modalities.
methods: The proposed method uses a simple yet effective mapper mechanism (V2A-Mapper) to translate the visual input between the CLIP and CLAP spaces, and then uses the pretrained audio generative FM AudioLDM to produce high-fidelity and visually-aligned sound.
results: Compared to previous approaches, the proposed method achieves superior performance in both objective and subjective evaluations, with 53% and 19% improvement in fidelity and relevance, respectively, while using 86% fewer parameters.
Abstract
Building artificial intelligence (AI) systems on top of a set of foundation models (FMs) is becoming a new paradigm in AI research. Their representative and generative abilities learnt from vast amounts of data can be easily adapted and transferred to a wide range of downstream tasks without extra training from scratch. However, leveraging FMs in cross-modal generation remains under-researched when audio modality is involved. On the other hand, automatically generating semantically-relevant sound from visual input is an important problem in cross-modal generation studies. To solve this vision-to-audio (V2A) generation problem, existing methods tend to design and build complex systems from scratch using modestly sized datasets. In this paper, we propose a lightweight solution to this problem by leveraging foundation models, specifically CLIP, CLAP, and AudioLDM. We first investigate the domain gap between the latent space of the visual CLIP and the auditory CLAP models. Then we propose a simple yet effective mapper mechanism (V2A-Mapper) to bridge the domain gap by translating the visual input between CLIP and CLAP spaces. Conditioned on the translated CLAP embedding, pretrained audio generative FM AudioLDM is adopted to produce high-fidelity and visually-aligned sound. Compared to previous approaches, our method only requires a quick training of the V2A-Mapper. We further analyze and conduct extensive experiments on the choice of the V2A-Mapper and show that a generative mapper is better at fidelity and variability (FD) while a regression mapper is slightly better at relevance (CS). Both objective and subjective evaluation on two V2A datasets demonstrate the superiority of our proposed method compared to current state-of-the-art approaches - trained with 86% fewer parameters but achieving 53% and 19% improvement in FD and CS, respectively.
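Since the mapper is described as a lightweight translator between the CLIP and CLAP embedding spaces, a minimal version can be pictured as a small MLP. The embedding sizes and architecture below are illustrative assumptions; in the paper's setup only the mapper is trained, with CLIP, CLAP, and AudioLDM kept frozen.

```python
# A minimal sketch of the V2A-Mapper idea: a small trainable module that maps
# a CLIP image embedding into the CLAP embedding space, whose output then
# conditions a pretrained audio generator. Sizes are assumptions.
import torch
import torch.nn as nn

class V2AMapper(nn.Module):
    def __init__(self, clip_dim: int = 512, clap_dim: int = 512, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(clip_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, clap_dim),
        )

    def forward(self, clip_emb: torch.Tensor) -> torch.Tensor:
        # Normalize so the output lives on the same unit hypersphere as CLAP embeddings
        out = self.net(clip_emb)
        return out / out.norm(dim=-1, keepdim=True)

mapper = V2AMapper()
clip_emb = torch.randn(4, 512)   # a batch of CLIP image embeddings
clap_like = mapper(clip_emb)     # pseudo-CLAP embeddings for audio conditioning
print(clap_like.shape)  # torch.Size([4, 512])
```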
How important are specialized transforms in Neural Operators?
results: The study finds that replacing the specialized transform layers with learnable linear layers yields performance comparable to the best-known transform layers, along with a compute-time advantage. This observation may have significant implications for future research on Neural Operators.
Abstract
Simulating physical systems using Partial Differential Equations (PDEs) has become an indispensable part of modern industrial process optimization. Traditionally, numerical solvers have been used to solve the associated PDEs, however recently Transform-based Neural Operators such as the Fourier Neural Operator and Wavelet Neural Operator have received a lot of attention for their potential to provide fast solutions for systems of PDEs. In this work, we investigate the importance of the transform layers to the reported success of transform-based neural operators. In particular, we record the cost in terms of performance, if all the transform layers are replaced by learnable linear layers. Surprisingly, we observe that linear layers suffice to provide performance comparable to the best-known transform-based layers and seem to do so with a compute time advantage as well. We believe that this observation can have significant implications for future work on Neural Operators, and might point to other sources of efficiencies for these architectures.
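The experiment at the heart of the paper, swapping a spectral transform layer for a plain learnable linear layer, is easy to picture in code. The sketch below contrasts a simplified FNO-style Fourier mixing layer with its linear drop-in replacement; both layers are toy versions under assumed shapes, not the paper's implementations.

```python
# A toy contrast between a Fourier-transform mixing layer (FNO-style) and the
# learnable linear mixing layer the paper finds competitive.
import torch
import torch.nn as nn

class FourierMix(nn.Module):
    """Mix along the spatial axis in frequency space, keeping `modes` modes."""
    def __init__(self, channels: int, modes: int):
        super().__init__()
        self.modes = modes
        self.weight = nn.Parameter(
            torch.randn(channels, modes, dtype=torch.cfloat) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, N)
        xf = torch.fft.rfft(x, dim=-1)
        xf[..., :self.modes] = xf[..., :self.modes] * self.weight
        return torch.fft.irfft(xf, n=x.shape[-1], dim=-1)

class LinearMix(nn.Module):
    """The drop-in replacement: one learnable linear map over the spatial axis."""
    def __init__(self, n_points: int):
        super().__init__()
        self.mix = nn.Linear(n_points, n_points)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, N)
        return self.mix(x)

x = torch.randn(8, 16, 64)  # (batch, channels, grid points)
print(FourierMix(16, 12)(x).shape, LinearMix(64)(x).shape)
```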
Graph-based Alignment and Uniformity for Recommendation
paper_authors: Liangwei Yang, Zhiwei Liu, Chen Wang, Mingdai Yang, Xiaolong Liu, Jing Ma, Philip S. Yu
for: addressing the sparsity issue in collaborative filtering-based recommender systems (RecSys)
methods: proposes a novel approach called graph-based alignment and uniformity (GraphAU), which explicitly considers high-order connectivities in the user-item bipartite graph
results: significantly alleviates the sparsity issue and achieves state-of-the-art performance on four datasets, with the open-source code available at https://github.com/YangLiangwei/GraphAU.
Abstract
Collaborative filtering-based recommender systems (RecSys) rely on learning representations for users and items to predict preferences accurately. Representation learning on the hypersphere is a promising approach due to its desirable properties, such as alignment and uniformity. However, the sparsity issue arises when it encounters RecSys. To address this issue, we propose a novel approach, graph-based alignment and uniformity (GraphAU), that explicitly considers high-order connectivities in the user-item bipartite graph. GraphAU aligns the user/item embedding to the dense vector representations of high-order neighbors using a neighborhood aggregator, eliminating the need to compute the burdensome alignment to high-order neighborhoods individually. To address the discrepancy in alignment losses, GraphAU includes a layer-wise alignment pooling module to integrate alignment losses layer-wise. Experiments on four datasets show that GraphAU significantly alleviates the sparsity issue and achieves state-of-the-art performance. We open-source GraphAU at https://github.com/YangLiangwei/GraphAU.
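GraphAU builds on the alignment and uniformity objectives for representation learning on the hypersphere (Wang and Isola, 2020), extending alignment to aggregated high-order graph neighbors. A short sketch of the two base objectives is given below; the loss weighting and batch sizes are illustrative assumptions.

```python
# A short sketch of the alignment and uniformity losses on the unit
# hypersphere that GraphAU builds on. Hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def alignment(x: torch.Tensor, y: torch.Tensor, alpha: float = 2.0) -> torch.Tensor:
    """Pull matched user/item embeddings together (inputs are L2-normalized)."""
    x, y = F.normalize(x, dim=-1), F.normalize(y, dim=-1)
    return (x - y).norm(p=2, dim=1).pow(alpha).mean()

def uniformity(x: torch.Tensor, t: float = 2.0) -> torch.Tensor:
    """Spread embeddings uniformly over the unit hypersphere."""
    x = F.normalize(x, dim=-1)
    return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()

users, items = torch.randn(256, 64), torch.randn(256, 64)
loss = alignment(users, items) + 0.5 * (uniformity(users) + uniformity(items))
print(loss.item())
```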
results: Training with an additional physics-informed loss component (HyperPINN) enables fast learning of solutions to parameterized partial differential equations such as Burgers' equation and the Navier-Stokes Kovasznay flow, with an 8x reduction in prediction parameters on average and no loss of accuracy.
Abstract
Physics-informed neural networks (PINNs) have been widely used to develop neural surrogates for solutions of Partial Differential Equations. A drawback of PINNs is that they have to be retrained with every change in initial-boundary conditions and PDE coefficients. The Hypernetwork, a model-based meta learning technique, takes in a parameterized task embedding as input and predicts the weights of PINN as output. Predicting weights of a neural network however, is a high-dimensional regression problem, and hypernetworks perform sub-optimally while predicting parameters for large base networks. To circumvent this issue, we use a low ranked adaptation (LoRA) formulation to decompose every layer of the base network into low-ranked tensors and use hypernetworks to predict the low-ranked tensors. Despite the reduced dimensionality of the resulting weight-regression problem, LoRA-based Hypernetworks violate the underlying physics of the given task. We demonstrate that the generalization capabilities of LoRA-based hypernetworks drastically improve when trained with an additional physics-informed loss component (HyperPINN) to satisfy the governing differential equations. We observe that LoRA-based HyperPINN training allows us to learn fast solutions for parameterized PDEs like Burger's equation and Navier Stokes: Kovasznay flow, while having an 8x reduction in prediction parameters on average without compromising on accuracy when compared to all other baselines.
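The LoRA trick in this setting is to have the hypernetwork emit two low-rank factors per layer instead of a full weight matrix, shrinking the regression target from in_dim x out_dim to rank x (in_dim + out_dim) values. A minimal PyTorch sketch under assumed sizes follows; the hypernetwork body and rank are illustrative, not the paper's configuration.

```python
# A minimal sketch of LoRA-style weight prediction: a hypernetwork predicts
# low-rank factors A (out x r) and B (r x in), and the layer uses W = A @ B.
import torch
import torch.nn as nn

class LoRAHyperLayer(nn.Module):
    def __init__(self, task_dim: int, in_dim: int, out_dim: int, rank: int = 4):
        super().__init__()
        self.in_dim, self.out_dim, self.rank = in_dim, out_dim, rank
        n_params = rank * (in_dim + out_dim)  # far fewer than in_dim * out_dim
        self.hyper = nn.Sequential(nn.Linear(task_dim, 64), nn.Tanh(),
                                   nn.Linear(64, n_params))

    def forward(self, x: torch.Tensor, task_emb: torch.Tensor) -> torch.Tensor:
        p = self.hyper(task_emb)
        A = p[: self.out_dim * self.rank].view(self.out_dim, self.rank)
        B = p[self.out_dim * self.rank:].view(self.rank, self.in_dim)
        return x @ (A @ B).T  # low-rank linear layer for this task

layer = LoRAHyperLayer(task_dim=8, in_dim=32, out_dim=32)
x, task = torch.randn(16, 32), torch.randn(8)  # PDE coefficients as task embedding
print(layer(x, task).shape)  # torch.Size([16, 32])
```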
Preference-conditioned Pixel-based AI Agent For Game Testing
paper_authors: Sherif Abdelfattah, Adrian Brown, Pushi Zhang
for: This paper aims to improve game-testing AI agents' ability to explore and test games with high quality and efficiency, addressing the limitations of current methods that rely on game state information and lack explicit control over exploration style.
methods: The proposed agent design uses pixel-based state observations and imitation learning with self-supervised and supervised learning objectives to improve exploration coverage and test execution quality.
results: The proposed agent significantly outperforms state-of-the-art pixel-based game testing agents in exploration coverage and test execution quality when evaluated on a complex open-world environment resembling many aspects of real AAA games.
Abstract
The game industry is challenged to cope with increasing growth in demand and game complexity while maintaining acceptable quality standards for released games. Classic approaches solely depending on human efforts for quality assurance and game testing do not scale effectively in terms of time and cost. Game-testing AI agents that learn by interaction with the environment have the potential to mitigate these challenges with good scalability properties on time and costs. However, most recent work in this direction depends on game state information for the agent's state representation, which limits generalization across different game scenarios. Moreover, game test engineers usually prefer exploring a game in a specific style, such as exploring the golden path. However, current game testing AI agents do not provide an explicit way to satisfy such a preference. This paper addresses these limitations by proposing an agent design that mainly depends on pixel-based state observations while exploring the environment conditioned on a user's preference specified by demonstration trajectories. In addition, we propose an imitation learning method that couples self-supervised and supervised learning objectives to enhance the quality of imitation behaviors. Our agent significantly outperforms state-of-the-art pixel-based game testing agents over exploration coverage and test execution quality when evaluated on a complex open-world environment resembling many aspects of real AAA games.
Enhancing Reasoning Capabilities of Large Language Models: A Graph-Based Verification Approach
results: Experimental results show that the graph-based verification method not only significantly enhances the reasoning abilities of LLMs but also outperforms existing verifier methods in improving these models' reasoning performance.
Abstract
Large Language Models (LLMs) have showcased impressive reasoning capabilities, particularly when guided by specifically designed prompts in complex reasoning tasks such as math word problems. These models typically solve tasks using a chain-of-thought approach, which not only bolsters their reasoning abilities but also provides valuable insights into their problem-solving process. However, there is still significant room for enhancing the reasoning abilities of LLMs. Some studies suggest that the integration of an LLM output verifier can boost reasoning accuracy without necessitating additional model training. In this paper, we follow these studies and introduce a novel graph-based method to further augment the reasoning capabilities of LLMs. We posit that multiple solutions to a reasoning task, generated by an LLM, can be represented as a reasoning graph due to the logical connections between intermediate steps from different reasoning paths. Therefore, we propose the Reasoning Graph Verifier (RGV) to analyze and verify the solutions generated by LLMs. By evaluating these graphs, models can yield more accurate and reliable results. Our experimental results show that our graph-based verification method not only significantly enhances the reasoning abilities of LLMs but also outperforms existing verifier methods in terms of improving these models' reasoning performance.
Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos
results: Experiments show that the method outperforms supervised counterparts on a wide range of downstream tasks and demonstrates the superior transferability of the learned representations.
Abstract
We propose a unified point cloud video self-supervised learning framework for object-centric and scene-centric data. Previous methods commonly conduct representation learning at the clip or frame level and cannot well capture fine-grained semantics. Instead of contrasting the representations of clips or frames, in this paper, we propose a unified self-supervised framework by conducting contrastive learning at the point level. Moreover, we introduce a new pretext task by achieving semantic alignment of superpoints, which further facilitates the representations to capture semantic cues at multiple scales. In addition, due to the high redundancy in the temporal dimension of dynamic point clouds, directly conducting contrastive learning at the point level usually leads to massive undesired negatives and insufficient modeling of positive representations. To remedy this, we propose a selection strategy to retain proper negatives and make use of high-similarity samples from other instances as positive supplements. Extensive experiments show that our method outperforms supervised counterparts on a wide range of downstream tasks and demonstrates the superior transferability of the learned representations.
A Robust Policy Bootstrapping Algorithm for Multi-objective Reinforcement Learning in Non-stationary Environments
results: The paper introduces a developmental optimization approach that evolves the policy coverage set online while exploring the preference space over the defined objectives. The algorithm significantly outperforms existing multi-objective reinforcement learning algorithms in non-stationary environments and achieves comparable results in stationary ones.
Abstract
Multi-objective Markov decision processes are a special kind of multi-objective optimization problem that involves sequential decision making while satisfying the Markov property of stochastic processes. Multi-objective reinforcement learning methods address this problem by fusing the reinforcement learning paradigm with multi-objective optimization techniques. One major drawback of these methods is the lack of adaptability to non-stationary dynamics in the environment. This is because they adopt optimization procedures that assume stationarity to evolve a coverage set of policies that can solve the problem. This paper introduces a developmental optimization approach that can evolve the policy coverage set while exploring the preference space over the defined objectives in an online manner. We propose a novel multi-objective reinforcement learning algorithm that can robustly evolve a convex coverage set of policies in an online manner in non-stationary environments. We compare the proposed algorithm with two state-of-the-art multi-objective reinforcement learning algorithms in stationary and non-stationary environments. Results showed that the proposed algorithm significantly outperforms the existing algorithms in non-stationary environments while achieving comparable results in stationary environments.
Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos
results: Extensive experiments on MSRAction-3D, NTU-RGBD, NvGesture, and SHREC'17 demonstrate the effectiveness of the proposed method.
Abstract
Recently, the community has made tremendous progress in developing effective methods for point cloud video understanding that learn from massive amounts of labeled data. However, annotating point cloud videos is usually notoriously expensive. Moreover, training via one or only a few traditional tasks (e.g., classification) may be insufficient to learn subtle details of the spatio-temporal structure existing in point cloud videos. In this paper, we propose a Masked Spatio-Temporal Structure Prediction (MaST-Pre) method to capture the structure of point cloud videos without human annotations. MaST-Pre is based on spatio-temporal point-tube masking and consists of two self-supervised learning tasks. First, by reconstructing masked point tubes, our method is able to capture the appearance information of point cloud videos. Second, to learn motion, we propose a temporal cardinality difference prediction task that estimates the change in the number of points within a point tube. In this way, MaST-Pre is forced to model the spatial and temporal structure in point cloud videos. Extensive experiments on MSRAction-3D, NTU-RGBD, NvGesture, and SHREC'17 demonstrate the effectiveness of the proposed method.
Intrinsically Motivated Hierarchical Policy Learning in Multi-objective Markov Decision Processes
paper_authors: Sherif Abdelfattah, Kathryn Merrick, Jiankun Hu
for: solve multi-objective Markov decision processes in non-stationary environments
methods: intrinsically motivated reinforcement learning with dual-phase learning
results: significantly outperforms state-of-the-art multi-objective reinforcement learning methods in a dynamic robotics environment
Abstract
Multi-objective Markov decision processes are sequential decision-making problems that involve multiple conflicting reward functions that cannot be optimized simultaneously without a compromise. This type of problems cannot be solved by a single optimal policy as in the conventional case. Alternatively, multi-objective reinforcement learning methods evolve a coverage set of optimal policies that can satisfy all possible preferences in solving the problem. However, many of these methods cannot generalize their coverage sets to work in non-stationary environments. In these environments, the parameters of the state transition and reward distribution vary over time. This limitation results in significant performance degradation for the evolved policy sets. In order to overcome this limitation, there is a need to learn a generic skill set that can bootstrap the evolution of the policy coverage set for each shift in the environment dynamics therefore, it can facilitate a continuous learning process. In this work, intrinsically motivated reinforcement learning has been successfully deployed to evolve generic skill sets for learning hierarchical policies to solve multi-objective Markov decision processes. We propose a novel dual-phase intrinsically motivated reinforcement learning method to address this limitation. In the first phase, a generic set of skills is learned. While in the second phase, this set is used to bootstrap policy coverage sets for each shift in the environment dynamics. We show experimentally that the proposed method significantly outperforms state-of-the-art multi-objective reinforcement methods in a dynamic robotics environment.
Digital Twin-Oriented Complex Networked Systems based on Heterogeneous node features and interaction rules
paper_authors: Jiaqi Wen, Bogdan Gabrys, Katarzyna Musial
for: This paper proposes a modelling framework for Digital Twin-Oriented Complex Networked Systems (DT-CNSs) to generate networks that faithfully represent real systems.
methods: The modelling process focuses on features of nodes and interaction rules for creating connections based on individual node preferences.
results: The paper presents a case study on the disaster resilience of social networks during an epidemic outbreak, showing how different levels of structural and dynamics complexity influence network growth and epidemic spread. The analysis reveals that mitigation policies should target nodes with preferred features, as they have higher infection risks and should be the focus of epidemic control.
Abstract
This study proposes an extendable modelling framework for Digital Twin-Oriented Complex Networked Systems (DT-CNSs) with a goal of generating networks that faithfully represent real systems. Modelling process focuses on (i) features of nodes and (ii) interaction rules for creating connections that are built based on individual node's preferences. We conduct experiments on simulation-based DT-CNSs that incorporate various features and rules about network growth and different transmissibilities related to an epidemic spread on these networks. We present a case study on disaster resilience of social networks given an epidemic outbreak by investigating the infection occurrence within specific time and social distance. The experimental results show how different levels of the structural and dynamics complexities, concerned with feature diversity and flexibility of interaction rules respectively, influence network growth and epidemic spread. The analysis revealed that, to achieve maximum disaster resilience, mitigation policies should be targeted at nodes with preferred features as they have higher infection risks and should be the focus of the epidemic control.
Improving Buoy Detection with Deep Transfer Learning for Mussel Farm Automation
paper_authors: Carl McMillan, Junhong Zhao, Bing Xue, Ross Vennell, Mengjie Zhang
for: Improving the operational efficiency and management of mussel farms through more accurate and robust monitoring
methods: Applies artificial intelligence and computer vision techniques, including deep learning-based object detection
results: Through transfer learning and data diversity, achieves a significant improvement in buoy detection performance, with good consistency and robustness across varying weather and lighting conditions
Abstract
The aquaculture sector in New Zealand is experiencing rapid expansion, with a particular emphasis on mussel exports. As the demands of mussel farming operations continue to evolve, the integration of artificial intelligence and computer vision techniques, such as intelligent object detection, is emerging as an effective approach to enhance operational efficiency. This study delves into advancing buoy detection by leveraging deep learning methodologies for intelligent mussel farm monitoring and management. The primary objective centers on improving accuracy and robustness in detecting buoys across a spectrum of real-world scenarios. A diverse dataset sourced from mussel farms is captured and labeled for training, encompassing imagery taken from cameras mounted on both floating platforms and traversing vessels, capturing various lighting and weather conditions. To establish an effective deep learning model for buoy detection with a limited number of labeled data, we employ transfer learning techniques. This involves adapting a pre-trained object detection model to create a specialized deep learning buoy detection model. We explore different pre-trained models, including YOLO and its variants, alongside data diversity to investigate their effects on model performance. Our investigation demonstrates a significant enhancement in buoy detection performance through deep learning, accompanied by improved generalization across diverse weather conditions, highlighting the practical effectiveness of our approach.
Advancing Relation Extraction through Language Probing with Exemplars from Set Co-Expansion
methods: Integrates representative examples through co-set expansion, context-free Hearst patterns, and contrastive example tuning
results: Experimental results show that the proposed method significantly improves relation extraction accuracy while reducing confusion between similar contrastive classes
Abstract
Relation Extraction (RE) is a pivotal task in automatically extracting structured information from unstructured text. In this paper, we present a multi-faceted approach that integrates representative examples and through co-set expansion. The primary goal of our method is to enhance relation classification accuracy and mitigating confusion between contrastive classes. Our approach begins by seeding each relationship class with representative examples. Subsequently, our co-set expansion algorithm enriches training objectives by incorporating similarity measures between target pairs and representative pairs from the target class. Moreover, the co-set expansion process involves a class ranking procedure that takes into account exemplars from contrastive classes. Contextual details encompassing relation mentions are harnessed via context-free Hearst patterns to ascertain contextual similarity. Empirical evaluation demonstrates the efficacy of our co-set expansion approach, resulting in a significant enhancement of relation classification performance. Our method achieves an observed margin of at least 1 percent improvement in accuracy in most settings, on top of existing fine-tuning approaches. To further refine our approach, we conduct an in-depth analysis that focuses on tuning contrastive examples. This strategic selection and tuning effectively reduce confusion between classes sharing similarities, leading to a more precise classification process. Experimental results underscore the effectiveness of our proposed framework for relation extraction. The synergy between co-set expansion and context-aware prompt tuning substantially contributes to improved classification accuracy. Furthermore, the reduction in confusion between contrastive classes through contrastive examples tuning validates the robustness and reliability of our method.
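Context-free Hearst patterns are lexico-syntactic templates ("X such as Y", "Y and other X") that can be matched with plain regular expressions to surface contextual cues. A toy single-word-match version is sketched below; real pattern sets handle multi-word noun phrases and many more templates.

```python
# A toy illustration of context-free Hearst patterns: lexico-syntactic
# templates matched with regular expressions. Patterns and the example
# sentence are illustrative assumptions.
import re

HEARST_PATTERNS = [
    r"(?P<hyper>\w+) such as (?P<hypo>\w+)",
    r"(?P<hypo>\w+) and other (?P<hyper>\w+)",
    r"(?P<hyper>\w+) including (?P<hypo>\w+)",
]

def extract_isa_pairs(text: str):
    """Yield (hyponym, hypernym) pairs matched by any Hearst pattern."""
    for pattern in HEARST_PATTERNS:
        for m in re.finditer(pattern, text):
            yield m.group("hypo"), m.group("hyper")

sentence = "He collaborated with composers such as Mozart and other musicians."
print(list(extract_isa_pairs(sentence)))
# [('Mozart', 'composers'), ('Mozart', 'musicians')]
```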
Baird Counterexample Is Solved: with an example of How to Debug a Two-time-scale Algorithm
results: Explains why TDC is slow on this example, provides a debugging technique for studying the convergence behavior of off-policy learning algorithms, and presents empirical results showing that the recent Impression GTD algorithm converges very fast, in fact at a linear rate, on this example.
Abstract
The Baird counterexample was proposed by Leemon Baird in 1995, first used to show that the Temporal Difference (TD(0)) algorithm diverges on this example. Since then, it has often been used to test and compare off-policy learning algorithms. Gradient TD algorithms solved the divergence issue of TD on the Baird counterexample. However, their convergence on this example is still very slow, and the nature of the slowness is not well understood, e.g., see (Sutton and Barto 2018). This note aims to understand, in particular, why TDC is slow on this example, and provides a debugging analysis to explain this behavior. Our debugging technique can be used to study the convergence behavior of two-time-scale stochastic approximation algorithms. We also provide empirical results of the recent Impression GTD algorithm on this example, showing that convergence is very fast, in fact, at a linear rate. We conclude that the Baird counterexample is solved, by an algorithm with a convergence guarantee to the TD solution in general and a fast convergence rate.
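For readers unfamiliar with the setup, Baird's counterexample (as formulated in Sutton and Barto, 2018, Example 11.1) is small enough to reproduce in a few lines: seven states, eight weights, zero rewards, and off-policy semi-gradient TD(0) with importance sampling. A compact sketch follows; the step size and horizon are arbitrary illustrative choices, and the weight norm grows without bound.

```python
# A compact reproduction of Baird's counterexample: off-policy semi-gradient
# TD(0) with importance sampling diverges on this 7-state MDP.
import numpy as np

gamma, alpha = 0.99, 0.01
Phi = np.zeros((7, 8))               # rows: states, columns: features
for i in range(6):                   # six "upper" states: 2*e_i + e_7
    Phi[i, i], Phi[i, 7] = 2.0, 1.0
Phi[6, 6], Phi[6, 7] = 1.0, 2.0      # "lower" state: e_6 + 2*e_7

w = np.ones(8)
w[6] = 10.0                          # classic initialization
rng = np.random.default_rng(0)
s = int(rng.integers(7))
for step in range(1, 20001):
    solid = rng.random() < 1 / 7     # behavior: solid w.p. 1/7, dashed else
    s_next = 6 if solid else int(rng.integers(6))
    rho = 7.0 if solid else 0.0      # target policy always takes solid
    delta = 0.0 + gamma * Phi[s_next] @ w - Phi[s] @ w
    w = w + alpha * rho * delta * Phi[s]   # semi-gradient off-policy TD(0)
    s = s_next
    if step % 5000 == 0:
        print(step, np.linalg.norm(w))  # the weight norm keeps growing
```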
Learning in Cooperative Multiagent Systems Using Cognitive and Machine Models
results: Across different stochastic reward settings, the MAIBL models exhibit faster learning and better coordination in a dynamic CMOTP task than existing MADRL models.
Abstract
Developing effective Multi-Agent Systems (MAS) is critical for many applications requiring collaboration and coordination with humans. Despite the rapid advance of Multi-Agent Deep Reinforcement Learning (MADRL) in cooperative MAS, one major challenge is the simultaneous learning and interaction of independent agents in dynamic environments in the presence of stochastic rewards. State-of-the-art MADRL models struggle to perform well in Coordinated Multi-agent Object Transportation Problems (CMOTPs), wherein agents must coordinate with each other and learn from stochastic rewards. In contrast, humans often learn rapidly to adapt to nonstationary environments that require coordination among people. In this paper, motivated by the demonstrated ability of cognitive models based on Instance-Based Learning Theory (IBLT) to capture human decisions in many dynamic decision making tasks, we propose three variants of Multi-Agent IBL models (MAIBL). The idea of these MAIBL algorithms is to combine the cognitive mechanisms of IBLT and the techniques of MADRL models to deal with coordination MAS in stochastic environments from the perspective of independent learners. We demonstrate that the MAIBL models exhibit faster learning and achieve better coordination in a dynamic CMOTP task with various settings of stochastic rewards compared to current MADRL models. We discuss the benefits of integrating cognitive insights into MADRL models.
GPU Accelerated Color Correction and Frame Warping for Real-time Video Stitching
paper_authors: Lu Yang, Zhenglun Kong, Ting Li, Xinyi Bai, Zhiye Lin, Hong Cheng
for: Stitching multiple video sequences into a panoramic video in real time
methods: GPU-accelerated color correction and frame warping that do not require accurate camera parameters
results: Generates high-quality panoramic videos in real time
Abstract
Traditional image stitching focuses on a single panorama frame without considering the spatial-temporal consistency in videos. The straightforward image stitching approach will cause temporal flickering and color inconsistency when it is applied to the video stitching task. Besides, inaccurate camera parameters will cause artifacts in the image warping. In this paper, we propose a real-time system to stitch multiple video sequences into a panoramic video, which is based on GPU accelerated color correction and frame warping without accurate camera parameters. We extend the traditional 2D-Matrix (2D-M) color correction approach and present a spatio-temporal 3D-Matrix (3D-M) color correction method for the overlap local regions with online color balancing using a piecewise function on global frames. Furthermore, we use pairwise homography matrices given by coarse camera calibration for global warping followed by accurate local warping based on the optical flow. Experimental results show that our system can generate high-quality panorama videos in real time.
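The global-warp step the abstract describes, applying pairwise homographies from coarse calibration before optical-flow-based local refinement, can be illustrated with OpenCV. The homography matrix and frame below are made-up placeholders, not calibrated values.

```python
# A small sketch of the global-warp step: map one frame toward the panorama's
# reference view with a pairwise homography from coarse calibration.
import cv2
import numpy as np

frame = np.full((720, 1280, 3), 128, np.uint8)  # stand-in for a camera frame
H = np.array([[1.02, 0.01, 150.0],              # placeholder homography from
              [0.00, 1.01,  10.0],              # coarse camera calibration
              [0.00, 0.00,   1.0]])

pano_size = (frame.shape[1] + 300, frame.shape[0] + 50)  # (width, height)
warped = cv2.warpPerspective(frame, H, pano_size)
print(warped.shape)  # (770, 1580, 3)
# In the full pipeline, this coarse global warp is refined by optical-flow
# based local warping, and overlap regions receive 3D-M color correction.
```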
A Model-Agnostic Framework for Recommendation via Interest-aware Item Embeddings
results: Extensive experiments on several benchmark datasets demonstrate that the approach enhances recommendation performance, particularly with respect to capturing user interests.
Abstract
Item representation holds significant importance in recommendation systems, which encompass domains such as news, retail, and videos. Retrieval and ranking models utilise item representation to capture the user-item relationship based on user behaviours. Existing representation learning methods primarily focus on optimising item-based mechanisms, such as attention and sequential modelling. However, these methods lack a modelling mechanism to directly reflect user interests within the learned item representations. Consequently, these methods may be less effective in capturing user interests indirectly. To address this challenge, we propose a novel Interest-aware Capsule network (IaCN) recommendation model, a model-agnostic framework that directly learns interest-oriented item representations. IaCN serves as an auxiliary task, enabling the joint learning of both item-based and interest-based representations. This framework adopts existing recommendation models without requiring substantial redesign. We evaluate the proposed approach on benchmark datasets, exploring various scenarios involving different deep neural networks, behaviour sequence lengths, and joint learning ratios of interest-oriented item representations. Experimental results demonstrate significant performance enhancements across diverse recommendation models, validating the effectiveness of our approach.
Regularizing Adversarial Imitation Learning Using Causal Invariance
results: The paper shows that using causal invariance as a regularization principle alleviates the model's tendency to absorb spurious correlations present in the expert data.
Abstract
Imitation learning methods are used to infer a policy in a Markov decision process from a dataset of expert demonstrations by minimizing a divergence measure between the empirical state occupancy measures of the expert and the policy. The guiding signal to the policy is provided by the discriminator used as part of an adversarial optimization procedure. We observe that this model is prone to absorbing spurious correlations present in the expert data. To alleviate this issue, we propose to use causal invariance as a regularization principle for adversarial training of these models. The regularization objective is applicable in a straightforward manner to existing adversarial imitation frameworks. We demonstrate the efficacy of the regularized formulation in an illustrative two-dimensional setting as well as a number of high-dimensional robot locomotion benchmark tasks.
ChatGPT-HealthPrompt. Harnessing the Power of XAI in Prompt-Based Healthcare Decision Support using ChatGPT
methods: The approach leverages domain knowledge extracted from high-performing interpretable ML models and seamlessly incorporates it into prompt design.
results: The study shows that this approach enables high-quality binary classification even in data-scarce scenarios, and that OpenAI's ChatGPT can outperform traditional supervised ML models under various data conditions, offering deeper insight and support for clinical decision-making.
Abstract
This study presents an innovative approach to the application of large language models (LLMs) in clinical decision-making, focusing on OpenAI's ChatGPT. Our approach introduces the use of contextual prompts-strategically designed to include task description, feature description, and crucially, integration of domain knowledge-for high-quality binary classification tasks even in data-scarce scenarios. The novelty of our work lies in the utilization of domain knowledge, obtained from high-performing interpretable ML models, and its seamless incorporation into prompt design. By viewing these ML models as medical experts, we extract key insights on feature importance to aid in decision-making processes. This interplay of domain knowledge and AI holds significant promise in creating a more insightful diagnostic tool. Additionally, our research explores the dynamics of zero-shot and few-shot prompt learning based on LLMs. By comparing the performance of OpenAI's ChatGPT with traditional supervised ML models in different data conditions, we aim to provide insights into the effectiveness of prompt engineering strategies under varied data availability. In essence, this paper bridges the gap between AI and healthcare, proposing a novel methodology for LLMs application in clinical decision support systems. It highlights the transformative potential of effective prompt design, domain knowledge integration, and flexible learning approaches in enhancing automated decision-making.
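The prompt-design idea, folding feature importances from an interpretable model into the task description, can be sketched as simple string construction. Everything below (feature names, importances, wording) is an illustrative assumption, not the paper's actual prompts or dataset.

```python
# A minimal sketch of knowledge-infused prompt construction: rank features by
# importance from an interpretable model and fold them into the task prompt.
def build_clinical_prompt(task: str, features: dict, importances: dict) -> str:
    ranked = sorted(importances, key=importances.get, reverse=True)
    knowledge = "; ".join(f"{f} (importance {importances[f]:.2f})" for f in ranked)
    case = "; ".join(f"{k} = {v}" for k, v in features.items())
    return (
        f"Task: {task} Answer strictly with 'yes' or 'no'.\n"
        f"Domain knowledge: the most predictive features, per an interpretable "
        f"ML model, are: {knowledge}.\n"
        f"Patient record: {case}\n"
        f"Answer:"
    )

prompt = build_clinical_prompt(
    task="Predict whether this patient is at risk of heart disease.",
    features={"age": 63, "cholesterol": 233, "max_heart_rate": 150},
    importances={"cholesterol": 0.41, "age": 0.33, "max_heart_rate": 0.26},
)
print(prompt)
```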
How Does Pruning Impact Long-Tailed Multi-Label Medical Image Classifiers?
paper_authors: Gregory Holste, Ziyu Jiang, Ajay Jaiswal, Maria Hanna, Shlomo Minkowitz, Alan C. Legasto, Joanna G. Escalon, Sharon Steinberger, Mark Bittman, Thomas C. Shen, Ying Ding, Ronald M. Summers, George Shih, Yifan Peng, Zhangyang Wang
for: This study investigates the impact of pruning on deep neural networks trained for thorax disease diagnosis from chest X-rays (CXRs), and how pruning affects model behavior on long-tailed, multi-label datasets.
methods: The study uses two large CXR datasets and analyzes the effect of pruning on disease classification. It also identifies individual CXRs on which uncompressed and heavily pruned models disagree, known as pruning-identified exemplars (PIEs), and conducts a human reader study to evaluate their unifying qualities.
results: Pruning reduces deep neural networks' memory usage and inference time, but it can adversely affect model behavior, particularly on long-tailed, multi-label datasets. The study characterizes class forgettability based on disease frequency and co-occurrence, and the reader study evaluates the qualities of the identified exemplars.
Abstract
Pruning has emerged as a powerful technique for compressing deep neural networks, reducing memory usage and inference time without significantly affecting overall performance. However, the nuanced ways in which pruning impacts model behavior are not well understood, particularly for long-tailed, multi-label datasets commonly found in clinical settings. This knowledge gap could have dangerous implications when deploying a pruned model for diagnosis, where unexpected model behavior could impact patient well-being. To fill this gap, we perform the first analysis of pruning's effect on neural networks trained to diagnose thorax diseases from chest X-rays (CXRs). On two large CXR datasets, we examine which diseases are most affected by pruning and characterize class "forgettability" based on disease frequency and co-occurrence behavior. Further, we identify individual CXRs where uncompressed and heavily pruned models disagree, known as pruning-identified exemplars (PIEs), and conduct a human reader study to evaluate their unifying qualities. We find that radiologists perceive PIEs as having more label noise, lower image quality, and higher diagnosis difficulty. This work represents a first step toward understanding the impact of pruning on model behavior in deep long-tailed, multi-label medical image classification. All code, model weights, and data access instructions can be found at https://github.com/VITA-Group/PruneCXR.
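As background for the kind of compression studied here, magnitude pruning can be reproduced with PyTorch's built-in utilities. The stand-in classifier and 70% sparsity below are illustrative; the paper prunes CXR classifiers across a range of sparsities.

```python
# A small sketch of magnitude pruning with PyTorch's pruning utilities on a
# stand-in classifier (14 outputs, echoing common CXR label sets).
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 14))

# Prune 70% of weights with the smallest L1 magnitude, layer by layer
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.7)
        prune.remove(module, "weight")  # make the sparsity permanent

zeros = sum((m.weight == 0).sum().item() for m in model
            if isinstance(m, nn.Linear))
total = sum(m.weight.numel() for m in model if isinstance(m, nn.Linear))
print(f"Global sparsity: {zeros / total:.1%}")
# PIEs are then found by comparing the pruned model's predictions with the
# uncompressed model's on each image and flagging disagreements.
```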
Diversifying AI: Towards Creative Chess with AlphaZero
results: Experiments show that AZ_db plays chess in diverse ways, solves more puzzles as a group, and outperforms a more homogeneous team. When playing from different openings, members of AZ_db specialize in different openings, and selecting a player for each opening via sub-additive planning yields a 50 Elo improvement. The findings suggest that diversity bonuses emerge in teams of AI agents, just as they do in teams of humans, and that diversity is a valuable asset in solving computationally hard problems.
Abstract
In recent years, Artificial Intelligence (AI) systems have surpassed human intelligence in a variety of computational tasks. However, AI systems, like humans, make mistakes, have blind spots, hallucinate, and struggle to generalize to new situations. This work explores whether AI can benefit from creative decision-making mechanisms when pushed to the limits of its computational rationality. In particular, we investigate whether a team of diverse AI systems can outperform a single AI in challenging tasks by generating more ideas as a group and then selecting the best ones. We study this question in the game of chess, the so-called drosophila of AI. We build on AlphaZero (AZ) and extend it to represent a league of agents via a latent-conditioned architecture, which we call AZ_db. We train AZ_db to generate a wider range of ideas using behavioral diversity techniques and select the most promising ones with sub-additive planning. Our experiments suggest that AZ_db plays chess in diverse ways, solves more puzzles as a group and outperforms a more homogeneous team. Notably, AZ_db solves twice as many challenging puzzles as AZ, including the challenging Penrose positions. When playing chess from different openings, we notice that players in AZ_db specialize in different openings, and that selecting a player for each opening using sub-additive planning results in a 50 Elo improvement over AZ. Our findings suggest that diversity bonuses emerge in teams of AI agents, just as they do in teams of humans and that diversity is a valuable asset in solving computationally hard problems.
Forensic Data Analytics for Anomaly Detection in Evolving Networks
results: Experimental results on real-world evolving network data show that the proposed forensic data analytics solution effectively detects anomalous behavior in the network.
Abstract
In the prevailing convergence of traditional infrastructure-based deployment (i.e., Telco and industry operational networks) towards evolving deployments enabled by 5G and virtualization, there is a keen interest in elaborating effective security controls to protect these deployments in-depth. By considering key enabling technologies like 5G and virtualization, evolving networks are democratized, facilitating the establishment of point presences integrating different business models ranging from media, dynamic web content, gaming, and a plethora of IoT use cases. Despite the increasing services provided by evolving networks, many cybercrimes and attacks have been launched in evolving networks to perform malicious activities. Due to the limitations of traditional security artifacts (e.g., firewalls and intrusion detection systems), the research on digital forensic data analytics has attracted more attention. Digital forensic analytics enables people to derive detailed information and comprehensive conclusions from different perspectives of cybercrimes to assist in convicting criminals and preventing future crimes. This chapter presents a digital analytics framework for network anomaly detection, including multi-perspective feature engineering, unsupervised anomaly detection, and comprehensive result correction procedures. Experiments on real-world evolving network data show the effectiveness of the proposed forensic data analytics solution.
Semantic Consistency for Assuring Reliability of Large Language Models
results: Experiments show that the Ask-to-Choose (A2C) prompting strategy improves accuracy metrics by up to 47% and semantic consistency metrics by up to 7-fold.
Abstract
Large Language Models (LLMs) exhibit remarkable fluency and competence across various natural language tasks. However, recent research has highlighted their sensitivity to variations in input prompts. To deploy LLMs in a safe and reliable manner, it is crucial for their outputs to be consistent when prompted with expressions that carry the same meaning or intent. While some existing work has explored how state-of-the-art LLMs address this issue, their evaluations have been confined to assessing lexical equality of single- or multi-word answers, overlooking the consistency of generative text sequences. For a more comprehensive understanding of the consistency of LLMs in open-ended text generation scenarios, we introduce a general measure of semantic consistency, and formulate multiple versions of this metric to evaluate the performance of various LLMs. Our proposal demonstrates significantly higher consistency and stronger correlation with human evaluations of output consistency than traditional metrics based on lexical consistency. Finally, we propose a novel prompting strategy, called Ask-to-Choose (A2C), to enhance semantic consistency. When evaluated for closed-book question answering based on answer variations from the TruthfulQA benchmark, A2C increases accuracy metrics for pretrained and finetuned LLMs by up to 47%, and semantic consistency metrics for instruction-tuned models by up to 7-fold.
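One plausible instantiation of a semantic (rather than lexical) consistency score is the mean pairwise embedding similarity between a model's answers to paraphrased prompts. The paper formulates multiple versions of its metric; this sketch is not their exact formulation, and the encoder choice is an assumption.

```python
from itertools import combinations
from sentence_transformers import SentenceTransformer, util

def semantic_consistency(outputs: list[str], model) -> float:
    """Mean pairwise cosine similarity between answers produced for
    paraphrases of the same question -- high when the model says the
    same thing in different words, regardless of lexical overlap."""
    emb = model.encode(outputs, convert_to_tensor=True)
    sims = [util.cos_sim(emb[i], emb[j]).item()
            for i, j in combinations(range(len(outputs)), 2)]
    return sum(sims) / len(sims)

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice
answers = [
    "The Eiffel Tower is in Paris.",
    "Paris is home to the Eiffel Tower.",
    "It is located in Berlin.",  # an inconsistent answer drags the score down
]
print(f"semantic consistency: {semantic_consistency(answers, encoder):.3f}")
```

The A2C strategy itself amounts to asking the model to first generate candidate answers and then choose among them; a prompt template along those lines can be layered on top of this measure.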
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding
results: Evaluation on EgoSchema shows that the long-form video understanding of modern vision-and-language systems falls far short of human accuracy (about 76%); even models with billions of parameters answer fewer than 33% of questions correctly, where random chance is 20%. The authors argue that EgoSchema, with its long intrinsic temporal structures and diverse complexity, will serve as a valuable evaluation probe for developing effective long-form video understanding systems.
Abstract
We introduce EgoSchema, a very long-form video question-answering dataset, and benchmark to evaluate long video understanding capabilities of modern vision and language systems. Derived from Ego4D, EgoSchema consists of over 5000 human curated multiple choice question answer pairs, spanning over 250 hours of real video data, covering a very broad range of natural human activity and behavior. For each question, EgoSchema requires the correct answer to be selected between five given options based on a three-minute-long video clip. While some prior works have proposed video datasets with long clip lengths, we posit that merely the length of the video clip does not truly capture the temporal difficulty of the video task that is being considered. To remedy this, we introduce temporal certificate sets, a general notion for capturing the intrinsic temporal understanding length associated with a broad range of video understanding tasks & datasets. Based on this metric, we find EgoSchema to have intrinsic temporal lengths over 5.7x longer than the second closest dataset and 10x to 100x longer than any other video understanding dataset. Further, our evaluation of several current state-of-the-art video and language models shows them to be severely lacking in long-term video understanding capabilities. Even models with several billions of parameters achieve QA accuracy less than 33% (random is 20%) on the EgoSchema multi-choice question answering task, while humans achieve about 76% accuracy. We posit that EgoSchema, with its long intrinsic temporal structures and diverse complexity, would serve as a valuable evaluation probe for developing effective long-term video understanding systems in the future. Data and Zero-shot model evaluation code are open-sourced for both public and commercial use under the Ego4D license at http://egoschema.github.io
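The temporal certificate idea, the minimal set of sub-clips a human needs to watch to verify an answer, can be made concrete as the total duration of the union of annotated intervals. This sketch assumes certificates are annotated as (start, end) pairs in seconds; the dataset's actual annotation format may differ.

```python
def certificate_length(intervals: list[tuple[float, float]]) -> float:
    """Total duration covered by the union of certificate intervals,
    merging overlaps so shared seconds are not double-counted."""
    if not intervals:
        return 0.0
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return sum(end - start for start, end in merged)

# A question whose evidence is spread across a 180 s clip:
print(certificate_length([(10.0, 25.0), (20.0, 40.0), (150.0, 170.0)]))  # 50.0
```

Averaging this quantity over a dataset gives the "intrinsic temporal length" used above to compare EgoSchema against prior benchmarks, independent of nominal clip length.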
Spectral information criterion for automatic elbow detection
methods: The paper uses the spectral information criterion (SIC) to extract geometric features of the error curve, without requiring knowledge of a likelihood function.
results: Experiments show that SIC yields a small subset of candidate models, each corresponding to an elbow of the error curve. A practical rule for selecting a unique model from this subset is also proposed.
Abstract
We introduce a generalized information criterion that contains other well-known information criteria, such as the Bayesian information criterion (BIC) and the Akaike information criterion (AIC), as special cases. Furthermore, the proposed spectral information criterion (SIC) is also more general than the other information criteria, e.g., since knowledge of a likelihood function is not strictly required. SIC extracts geometric features of the error curve and, as a consequence, it can be considered an automatic elbow detector. SIC provides a subset of all possible models, with a cardinality that is often much smaller than the total number of possible models. The elements of this subset are elbows of the error curve. A practical rule for selecting a unique model within the set of elbows is suggested as well. Theoretical invariance properties of SIC are analyzed. Moreover, we test SIC in ideal scenarios where it always provides the optimal expected results. We also test SIC in several numerical experiments: some involving synthetic data, and two experiments involving real datasets. They are all real-world applications such as clustering, variable selection, or polynomial order selection, to name a few. The results show the benefits of the proposed scheme. Matlab code related to the experiments is also provided. Possible future research lines are finally discussed.
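Since the abstract does not reproduce the SIC formula itself, the following sketch conveys the "automatic elbow detection" idea with a simpler geometric stand-in: the point of the error curve farthest from the chord joining its endpoints. This is an illustrative heuristic, not the SIC construction.

```python
import numpy as np

def elbow_index(errors: np.ndarray) -> int:
    """Index of the point on the error curve farthest from the straight
    line between its first and last points -- a common geometric proxy
    for an elbow (SIC itself is a more general spectral construction)."""
    n = len(errors)
    x = np.arange(n, dtype=float)
    # Unit vector along the chord from (0, errors[0]) to (n-1, errors[-1]).
    chord = np.array([n - 1, errors[-1] - errors[0]])
    chord = chord / np.linalg.norm(chord)
    # Perpendicular distance of each curve point to the chord.
    vecs = np.stack([x, errors - errors[0]], axis=1)
    proj = vecs @ chord
    dists = np.linalg.norm(vecs - np.outer(proj, chord), axis=1)
    return int(np.argmax(dists))

# Error decreasing with model order k = 1..8, flattening after k = 3.
errors = np.array([10.0, 5.0, 2.5, 2.0, 1.8, 1.7, 1.65, 1.6])
print(elbow_index(errors))  # 2  (the third model, at the elbow)
```

Where the curve has several elbows, collecting all local maxima of the distance profile (rather than the single argmax) would mirror SIC's behavior of returning a small subset of candidate models.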
MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models
results: Experiments on three question-answering datasets show that MindMap prompting yields a striking empirical gain. For instance, GPT-3.5 prompted with MindMap consistently outperforms GPT-4. Moreover, by retrieving structured knowledge from the KG, MindMap outperforms a series of prompting-with-document-retrieval methods, benefiting from more accurate, concise, and comprehensive knowledge.
Abstract
LLMs usually exhibit limitations in their ability to incorporate new knowledge, a tendency to generate hallucinations, and limited transparency in their decision-making process. In this paper, we explore how to prompt LLMs with knowledge graphs (KG), working as a remedy to engage LLMs with up-to-date knowledge and elicit the reasoning pathways from LLMs. Specifically, we build a prompting pipeline that endows LLMs with the capability of comprehending KG inputs and inferring with a combination of implicit knowledge and the retrieved external knowledge. In addition, we investigate eliciting the mind map on which LLMs perform the reasoning and generate the answers. It is identified that the produced mind map exhibits the reasoning pathways of LLMs grounded on the ontology of knowledge, hence bringing the prospects of probing and gauging LLM inference in production. The experiments on three question & answering datasets also show that MindMap prompting leads to a striking empirical gain. For instance, prompting GPT-3.5 with MindMap yields consistently better performance than GPT-4. We also demonstrate that with structured facts retrieved from KG, MindMap can outperform a series of prompting-with-document-retrieval methods, benefiting from more accurate, concise, and comprehensive knowledge from KGs.
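A minimal sketch of the retrieve-then-prompt flow follows: extract entities from the question, fetch KG triples for them, and ask the model to merge the evidence into a reasoning mind map before answering. The toy KG, `lookup_triples`, and the implied LLM call are hypothetical stubs, not the paper's actual retrieval or model interface.

```python
# Minimal sketch of a KG-to-prompt pipeline in the spirit of MindMap.

TOY_KG = {
    "metformin": [("metformin", "treats", "type 2 diabetes"),
                  ("metformin", "may_cause", "lactic acidosis")],
}

def lookup_triples(entity: str) -> list[tuple[str, str, str]]:
    """Hypothetical KG lookup; a real system would query a graph store."""
    return TOY_KG.get(entity.lower(), [])

def build_mindmap_prompt(question: str, entities: list[str]) -> str:
    facts = [f"({h}) -[{r}]-> ({t})"
             for e in entities for h, r, t in lookup_triples(e)]
    evidence = "\n".join(facts) if facts else "(no KG facts retrieved)"
    return (
        "Evidence graph:\n" + evidence + "\n\n"
        "First merge the evidence into a mind map of your reasoning, "
        "then answer the question, citing which facts you used.\n\n"
        f"Question: {question}"
    )

prompt = build_mindmap_prompt(
    "What condition does metformin treat, and what is a known risk?",
    ["metformin"],
)
print(prompt)  # this string would then be sent to the LLM
```

Asking for the mind map explicitly is what makes the model's reasoning pathway inspectable, which is the transparency benefit claimed above.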
Identity-Aware Semi-Supervised Learning for Comic Character Re-Identification
results: The method re-identifies comic characters accurately and is more effective than re-identification using face or body features alone.
Abstract
Character re-identification, recognizing characters consistently across different panels in comics, presents significant challenges due to limited annotated data and complex variations in character appearances. To tackle this issue, we introduce a robust semi-supervised framework that combines metric learning with a novel 'Identity-Aware' self-supervision method by contrastive learning of face and body pairs of characters. Our approach involves processing both facial and bodily features within a unified network architecture, facilitating the extraction of identity-aligned character embeddings that capture individual identities while preserving the effectiveness of face and body features. This integrated character representation enhances feature extraction and improves character re-identification compared to re-identification by face or body independently, offering a parameter-efficient solution. By extensively validating our method using in-series and inter-series evaluation metrics, we demonstrate its effectiveness in consistently re-identifying comic characters. Compared to existing methods, our approach not only addresses the challenge of character re-identification but also serves as a foundation for downstream tasks since it can produce character embeddings without restrictions of face and body availability, enriching the comprehension of comic books. In our experiments, we leverage two newly curated datasets: the 'Comic Character Instances Dataset', comprising over a million character instances and the 'Comic Sequence Identity Dataset', containing annotations of identities within more than 3000 sets of four consecutive comic panels that we collected.
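A minimal PyTorch sketch of the unified face-and-body embedding idea follows, using an InfoNCE-style contrastive loss over face/body pairs of the same character. The layer sizes, toy backbones, and loss choice are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnifiedEmbedder(nn.Module):
    """One backbone per region, shared projection head: face and body
    crops are mapped into a common identity-aligned embedding space.
    Sizes are illustrative assumptions, not the paper's."""
    def __init__(self, feat_dim=512, emb_dim=128):
        super().__init__()
        self.face_net = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.body_net = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.proj = nn.Linear(feat_dim, emb_dim)  # shared head aligns the two views

    def forward(self, face, body):
        zf = F.normalize(self.proj(self.face_net(face)), dim=1)
        zb = F.normalize(self.proj(self.body_net(body)), dim=1)
        return zf, zb

def identity_contrastive_loss(zf, zb, temperature=0.1):
    """InfoNCE over face/body pairs: the i-th face should match the
    i-th body (same character) against all other bodies in the batch."""
    logits = zf @ zb.t() / temperature
    targets = torch.arange(zf.size(0))
    return F.cross_entropy(logits, targets)

model = UnifiedEmbedder()
faces = torch.randn(8, 3, 64, 64)    # toy face crops
bodies = torch.randn(8, 3, 128, 64)  # toy body crops
zf, zb = model(faces, bodies)
print(identity_contrastive_loss(zf, zb).item())
```

Because both regions project into one space, either embedding can be used alone at inference time, which is what lets the method cope with panels where only a face or only a body is visible.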
Fast Decision Support for Air Traffic Management at Urban Air Mobility Vertiports using Graph Learning
results: Realistic AirSim simulations with scaled-down multi-rotor vehicles show that graph reinforcement learning effectively solves the Urban Air Mobility - Vertiport Schedule Management (UAM-VSM) problem and outperforms basic reinforcement learning (with graph embeddings) and random-choice baselines, as measured by delays, safety (number of collisions), and battery consumption.
Abstract
Urban Air Mobility (UAM) promises a new dimension to decongested, safe, and fast travel in urban and suburban hubs. These UAM aircraft are conceived to operate from small airports called vertiports each comprising multiple take-off/landing and battery-recharging spots. Since they might be situated in dense urban areas and need to handle many aircraft landings and take-offs each hour, managing this schedule in real-time becomes challenging for a traditional air-traffic controller but instead calls for an automated solution. This paper provides a novel approach to this problem of Urban Air Mobility - Vertiport Schedule Management (UAM-VSM), which leverages graph reinforcement learning to generate decision-support policies. Here the designated physical spots within the vertiport's airspace and the vehicles being managed are represented as two separate graphs, with feature extraction performed through a graph convolutional network (GCN). Extracted features are passed onto perceptron layers to decide actions such as continue to hover or cruise, continue idling or take-off, or land on an allocated vertiport spot. Performance is measured based on delays, safety (no. of collisions) and battery consumption. Through realistic simulations in AirSim applied to scaled down multi-rotor vehicles, our results demonstrate the suitability of using graph reinforcement learning to solve the UAM-VSM problem and its superiority to basic reinforcement learning (with graph embeddings) or random choice baselines.
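A compact sketch of the two-graph policy described above follows: one GCN over the vertiport-spot graph, one over the vehicle graph, pooled and concatenated into a perceptron action head. Feature dimensions and the discrete action set (hover/cruise, idle/take-off, land-on-spot) are assumptions drawn loosely from the abstract, and the dense single-layer GCN is a simplification.

```python
import torch
import torch.nn as nn

class DenseGCNLayer(nn.Module):
    """One graph-convolution step on a dense adjacency:
    H' = relu(A_hat @ H @ W), with A_hat the row-normalized
    adjacency plus self-loops."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, adj, h):
        a_hat = adj + torch.eye(adj.size(0))
        a_hat = a_hat / a_hat.sum(dim=1, keepdim=True)
        return torch.relu(self.lin(a_hat @ h))

class VertiportPolicy(nn.Module):
    """Separate GCNs for the vertiport-spot graph and the vehicle graph,
    mean-pooled and concatenated, then a perceptron head over the
    discrete actions. A sketch, not the paper's exact network."""
    def __init__(self, spot_dim=4, vehicle_dim=6, hidden=32, n_actions=3):
        super().__init__()
        self.spot_gcn = DenseGCNLayer(spot_dim, hidden)
        self.vehicle_gcn = DenseGCNLayer(vehicle_dim, hidden)
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden),
                                  nn.ReLU(), nn.Linear(hidden, n_actions))

    def forward(self, spot_adj, spot_x, veh_adj, veh_x):
        s = self.spot_gcn(spot_adj, spot_x).mean(dim=0)
        v = self.vehicle_gcn(veh_adj, veh_x).mean(dim=0)
        return self.head(torch.cat([s, v]))  # logits over discrete actions

policy = VertiportPolicy()
logits = policy(torch.eye(5), torch.randn(5, 4),   # 5 spots, toy features
                torch.eye(3), torch.randn(3, 6))   # 3 vehicles, toy features
print(logits.softmax(dim=-1))
```

In training, these logits would parameterize the RL action distribution, with the reward shaped by the delay, collision, and battery-consumption criteria the abstract reports against.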