paper_authors: Fei Zhang, Yunjie Ye, Lei Feng, Zhongwen Rao, Jieming Zhu, Marcus Kalander, Chen Gong, Jianye Hao, Bo Han
for: 这个论文研究了一个新的问题,即活动学习 WITH 部分标签(ALPL)。在这种设定下,一个oracle对查询样本提供部分标签,从而放宽oracle的准确标签过程。
methods: 我们首先构建了一个直观的基线,可以轻松地整合到现有的AL框架中。虽然有效,但这个基eline仍然受到过拟合的影响,并且在选择代表样本时缺乏表达partial-label-based samples的能力。我们从人类的认知科学中灵感,where accurate inferences can be explicitly derived from counter-examples (CEs),我们的目标是使用这种人类学习模式来解决过拟合问题,同时提高选择代表样本的过程。
results: 我们的方法在五个实际数据集和四个benchmark数据集上实现了全面的改进,超过了十个代表性的AL框架。这些实验结果表明,我们提议的方法可以增强predictor的性能和选择代表样本的过程,使predictor能够更准确地捕捉数据中的征性 Patterns。Abstract
This paper studies a new problem, \emph{active learning with partial labels} (ALPL). In this setting, an oracle annotates the query samples with partial labels, relaxing the oracle from the demanding accurate labeling process. To address ALPL, we first build an intuitive baseline that can be seamlessly incorporated into existing AL frameworks. Though effective, this baseline is still susceptible to the \emph{overfitting}, and falls short of the representative partial-label-based samples during the query process. Drawing inspiration from human inference in cognitive science, where accurate inferences can be explicitly derived from \emph{counter-examples} (CEs), our objective is to leverage this human-like learning pattern to tackle the \emph{overfitting} while enhancing the process of selecting representative samples in ALPL. Specifically, we construct CEs by reversing the partial labels for each instance, and then we propose a simple but effective WorseNet to directly learn from this complementary pattern. By leveraging the distribution gap between WorseNet and the predictor, this adversarial evaluation manner could enhance both the performance of the predictor itself and the sample selection process, allowing the predictor to capture more accurate patterns in the data. Experimental results on five real-world datasets and four benchmark datasets show that our proposed method achieves comprehensive improvements over ten representative AL frameworks, highlighting the superiority of WorseNet. The source code will be available at \url{https://github.com/Ferenas/APLL}.
摘要
CAMP: A Context-Aware Cricket Players Performance Metric
results: 研究发现,CAMP评估结果与专家委员会宣布的最佳球员(Man of the Match,MoM)相closely match,在961场比赛中,CAMP评估的top两名球员与MoM一致的比例为83%。此外,CAMP的评估结果也超过了现有的最佳球员贡献度量方法(基于Duckworth-Lewis-Stern方法)。Abstract
Cricket is the second most popular sport after soccer in terms of viewership. However, the assessment of individual player performance, a fundamental task in team sports, is currently primarily based on aggregate performance statistics, including average runs and wickets taken. We propose Context-Aware Metric of player Performance, CAMP, to quantify individual players' contributions toward a cricket match outcome. CAMP employs data mining methods and enables effective data-driven decision-making for selection and drafting, coaching and training, team line-ups, and strategy development. CAMP incorporates the exact context of performance, such as opponents' strengths and specific circumstances of games, such as pressure situations. We empirically evaluate CAMP on data of limited-over cricket matches between 2001 and 2019. In every match, a committee of experts declares one player as the best player, called Man of the M}atch (MoM). The top two rated players by CAMP match with MoM in 83\% of the 961 games. Thus, the CAMP rating of the best player closely matches that of the domain experts. By this measure, CAMP significantly outperforms the current best-known players' contribution measure based on the Duckworth-Lewis-Stern (DLS) method.
摘要
крикет是以往最受欢迎的运动之一,仅次于足球。然而,评估个体运动员的表现,是现代团队运动中的基本任务,目前主要基于汇总性统计,包括平均得分和帮助取下。我们提出了 Context-Aware Metric of player Performance(CAMP),用于衡量 крикет运动员的贡献。CAMP 利用数据挖掘技术,实现了有效的数据驱动决策,包括选择和招募、教练和训练、队列和战略开发。CAMP 包含了特定场景的表现,如对手的优势和特定游戏情况,如压力情况。我们对数据库中的限制时间 крикет比赛数据进行了Empirical评估。每场比赛中,专家委员会选择一名最佳运动员,称为 Man of the Match(MoM)。CAMP 中的前两名与 MoM 匹配在 83% 的 961 场比赛中。因此,CAMP 评分的最佳运动员与领域专家的评价相当接近。根据这个标准,CAMP 明显超过了目前最佳运动员贡献的度量方法,基于 Duckworth-Lewis-Stern(DLS)方法。
Rank Your Summaries: Enhancing Bengali Text Summarization via Ranking-based Approach
results: 实验结果表明,通过 combinig 每种预训练模型的优点并使用排名算法,本方法可以显著提高 Bengali 文本摘要的准确性和效果。Abstract
With the increasing need for text summarization techniques that are both efficient and accurate, it becomes crucial to explore avenues that enhance the quality and precision of pre-trained models specifically tailored for summarizing Bengali texts. When it comes to text summarization tasks, there are numerous pre-trained transformer models at one's disposal. Consequently, it becomes quite a challenge to discern the most informative and relevant summary for a given text among the various options generated by these pre-trained summarization models. This paper aims to identify the most accurate and informative summary for a given text by utilizing a simple but effective ranking-based approach that compares the output of four different pre-trained Bengali text summarization models. The process begins by carrying out preprocessing of the input text that involves eliminating unnecessary elements such as special characters and punctuation marks. Next, we utilize four pre-trained summarization models to generate summaries, followed by applying a text ranking algorithm to identify the most suitable summary. Ultimately, the summary with the highest ranking score is chosen as the final one. To evaluate the effectiveness of this approach, the generated summaries are compared against human-annotated summaries using standard NLG metrics such as BLEU, ROUGE, BERTScore, WIL, WER, and METEOR. Experimental results suggest that by leveraging the strengths of each pre-trained transformer model and combining them using a ranking-based approach, our methodology significantly improves the accuracy and effectiveness of the Bengali text summarization.
摘要
随着需要高效精准的文本摘要技术的增加,对适应 Bengali 文本摘要模型进行特点化成本的探索变得非常重要。在文本摘要任务中,有许多预训练的转换器模型可供选择。因此,对于给定的文本,从多个预训练摘要模型中选择最佳的摘要成为了一项挑战。本文提出了一种简单 yet effective 的排名基于方法,该方法通过对四个预训练 Bengali 文本摘要模型的输出进行比较,以确定最佳的摘要。该过程包括对输入文本进行简化处理,从而消除不必要的元素,如特殊字符和标点符号。接着,我们利用四个预训练摘要模型生成摘要,然后应用文本排名算法来确定最佳摘要。最后,根据排名得分,选择最高分的摘要作为最终结果。为了评估该方法的有效性,我们将生成的摘要与人工标注的摘要进行比较,使用标准的NLG指标,如 BLEU、ROUGE、BERTScore、WIL、WER 和 METEOR。实验结果表明,通过利用每个预训练转换器模型的优点并将其结合使用排名基于方法,我们的方法可以显著提高 Bengali 文本摘要的准确性和效iveness。
AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes
paper_authors: Guoyun Tu, Ying Liu, Vladimir Vlassov
For: 本研究旨在提出和实现一种Attribute-Information-Combined Attention-Based Network(AIC-AB NET),用于图像描述 generation。* Methods: 该模型结合了空间注意力架构和文本特征信息,并在encoder-decoder结构中进行了可适应的注意力调整。* Results: 对于 MS COCO 数据集和我们新提出的 Fashion 数据集,我们的 AIC-AB NET 与基eline模型和减少模型进行了比较,并取得了更高的性能水平。Abstract
Image captioning is a significant field across computer vision and natural language processing. We propose and present AIC-AB NET, a novel Attribute-Information-Combined Attention-Based Network that combines spatial attention architecture and text attributes in an encoder-decoder. For caption generation, adaptive spatial attention determines which image region best represents the image and whether to attend to the visual features or the visual sentinel. Text attribute information is synchronously fed into the decoder to help image recognition and reduce uncertainty. We have tested and evaluated our AICAB NET on the MS COCO dataset and a new proposed Fashion dataset. The Fashion dataset is employed as a benchmark of single-object images. The results show the superior performance of the proposed model compared to the state-of-the-art baseline and ablated models on both the images from MSCOCO and our single-object images. Our AIC-AB NET outperforms the baseline adaptive attention network by 0.017 (CIDEr score) on the MS COCO dataset and 0.095 (CIDEr score) on the Fashion dataset.
摘要
“图像描述是计算机视觉和自然语言处理领域中的一个重要领域。我们提议并提出了一种新的 Attribute-Information-Combined Attention-Based Network(AIC-AB NET),该模型结合了空间注意架构和文本特征在encoder-decoder中组合。在图像描述中,适应的空间注意可以确定图像中最好代表图像的区域,以及是否需要关注视觉特征或视觉特征。文本特征信息同时被 fed into decoder,以帮助图像识别和减少不确定性。我们在 MS COCO 数据集和我们提出的新的时尚数据集上测试和评估了我们的 AICAB NET。时尚数据集被用作单个物体图像的标准 benchmark。结果表明我们的提议模型与状态之前的基eline和剥离模型在 MS COCO 数据集和时尚数据集上的表现都有所提高,分别提高了0.017(CIDEr 分数)和0.095(CIDEr 分数)。”
Are Large Language Models a Threat to Digital Public Goods? Evidence from Activity on Stack Overflow
paper_authors: Maria del Rio-Chanona, Nadzeya Laurentsyeva, Johannes Wachs
for: investigate how the release of ChatGPT affects human-generated open data on the web, specifically on Stack Overflow
methods: analyze activity on Stack Overflow, use difference-in-differences model to estimate the impact of ChatGPT
results: find a 16% decrease in weekly posts on Stack Overflow after the release of ChatGPT, with a greater impact on posts related to the most widely used programming languages, and no significant change in voting scores.Abstract
Large language models like ChatGPT efficiently provide users with information about various topics, presenting a potential substitute for searching the web and asking people for help online. But since users interact privately with the model, these models may drastically reduce the amount of publicly available human-generated data and knowledge resources. This substitution can present a significant problem in securing training data for future models. In this work, we investigate how the release of ChatGPT changed human-generated open data on the web by analyzing the activity on Stack Overflow, the leading online Q\&A platform for computer programming. We find that relative to its Russian and Chinese counterparts, where access to ChatGPT is limited, and to similar forums for mathematics, where ChatGPT is less capable, activity on Stack Overflow significantly decreased. A difference-in-differences model estimates a 16\% decrease in weekly posts on Stack Overflow. This effect increases in magnitude over time, and is larger for posts related to the most widely used programming languages. Posts made after ChatGPT get similar voting scores than before, suggesting that ChatGPT is not merely displacing duplicate or low-quality content. These results suggest that more users are adopting large language models to answer questions and they are better substitutes for Stack Overflow for languages for which they have more training data. Using models like ChatGPT may be more efficient for solving certain programming problems, but its widespread adoption and the resulting shift away from public exchange on the web will limit the open data people and models can learn from in the future.
摘要
大型自然语言模型如ChatGPT有效地为用户提供了关于不同话题的信息,可能成为搜索网络和在线问答的替代方案。但是由于用户与模型进行私人交互,这些模型可能会减少公开可用的人类生成的数据和知识资源。这种替换可能会对未来模型的训练数据产生重大问题。在这项工作中,我们研究了ChatGPT的发布后对网络上的人类生成开放数据的影响,通过分析Stack Overflow网站的活动。我们发现,相比于其俄罗斯和中国的对手, Stack Overflow上的活动呈现下降趋势。此外,与数学相关的讨论forum不同,Stack Overflow上的活动呈现更大的下降趋势。使用差异分析模型,我们估算Stack Overflow每周的帖子数减少16%。这个效果随时间的推移而增强,并且对于使用最广泛的编程语言的帖子更大。帖子发布后得到了与之前相同的投票分数,表明ChatGPT不仅不是替换低质量或 duplicates的内容,而且是更好的解决某些编程问题的模型。这些结果表明,更多的用户正在使用大语言模型来回答问题,而且它们对于某些编程语言来说是更好的替代方案。使用模型如ChatGPT可能更高效地解决某些编程问题,但是其广泛的采用和相应的减少公开数据将限制未来模型和人类可以学习的开放数据。
Source-Free Domain Adaptation with Temporal Imputation for Time Series Data
results: 实验结果显示, compared to现有的方法,MAPU在三个真实世界时间序列数据集上 achieves significant performance gain。Abstract
Source-free domain adaptation (SFDA) aims to adapt a pretrained model from a labeled source domain to an unlabeled target domain without access to the source domain data, preserving source domain privacy. Despite its prevalence in visual applications, SFDA is largely unexplored in time series applications. The existing SFDA methods that are mainly designed for visual applications may fail to handle the temporal dynamics in time series, leading to impaired adaptation performance. To address this challenge, this paper presents a simple yet effective approach for source-free domain adaptation on time series data, namely MAsk and imPUte (MAPU). First, to capture temporal information of the source domain, our method performs random masking on the time series signals while leveraging a novel temporal imputer to recover the original signal from a masked version in the embedding space. Second, in the adaptation step, the imputer network is leveraged to guide the target model to produce target features that are temporally consistent with the source features. To this end, our MAPU can explicitly account for temporal dependency during the adaptation while avoiding the imputation in the noisy input space. Our method is the first to handle temporal consistency in SFDA for time series data and can be seamlessly equipped with other existing SFDA methods. Extensive experiments conducted on three real-world time series datasets demonstrate that our MAPU achieves significant performance gain over existing methods. Our code is available at \url{https://github.com/mohamedr002/MAPU_SFDA_TS}.
摘要
源自由领域适应 (SFDA) 目标是将预训练的源频率模型适应到无标记目标频率频率上,保持源频率隐私。尽管 SFDA 在视觉应用中广泛存在,但在时间序列应用中它尚未得到充分研究。现有的 SFDA 方法主要是为视觉应用设计,可能无法处理时间序列中的时间动态,导致适应性下降。为解决这个挑战,本文提出了一种简单 yet 有效的时间序列 SFDA 方法,即 Maske 和 imPUte (MAPU)。首先,为了捕捉源频率频率中的时间信息,我们的方法在时间序列信号上随机填充,并利用一种新的时间填充器来在嵌入空间中恢复原始信号。其次,在适应步骤中,填充器网络被利用来引导目标模型生成目标特征,使其与源特征在时间上具有一致性。这样,我们的 MAPU 可以在适应过程中考虑时间相互关系,而不是在噪音输入空间中进行填充。我们的方法是时间序列 SFDA 中首次考虑时间一致性的方法,可以顺利地与其他现有的 SFDA 方法结合使用。我们的实验结果表明,使用 MAPU 可以在三个实际的时间序列数据集上实现显著的性能提升。我们的代码可以在 \url{https://github.com/mohamedr002/MAPU_SFDA_TS} 上找到。
Rethinking Trust Repair in Human-Robot Interaction
results: 本研究提供了人机交互中信任修复策略的概念和关键组件,以及现有的研究成果。未来的研究将围绕着这些研究问题进行发展。Abstract
As robots become increasingly prevalent in work-oriented collaborations, trust has emerged as a critical factor in their acceptance and effectiveness. However, trust is dynamic and can erode when mistakes are made. Despite emerging research on trust repair in human-robot interaction, significant questions remain about identifying reliable approaches to restoring trust in robots after trust violations occur. To address this problem, my research aims to identify effective strategies for designing robots capable of trust repair in human-robot interaction (HRI) and to explore the underlying mechanisms that make these strategies successful. This paper provides an overview of the fundamental concepts and key components of the trust repair process in HRI, as well as a summary of my current published work in this area. Additionally, I discuss the research questions that will guide my future work and the potential contributions that this research could make to the field.
摘要
As robots become increasingly prevalent in work-oriented collaborations, trust has emerged as a critical factor in their acceptance and effectiveness. However, trust is dynamic and can erode when mistakes are made. Despite emerging research on trust repair in human-robot interaction, significant questions remain about identifying reliable approaches to restoring trust in robots after trust violations occur. To address this problem, my research aims to identify effective strategies for designing robots capable of trust repair in human-robot interaction (HRI) and to explore the underlying mechanisms that make these strategies successful. This paper provides an overview of the fundamental concepts and key components of the trust repair process in HRI, as well as a summary of my current published work in this area. Additionally, I discuss the research questions that will guide my future work and the potential contributions that this research could make to the field.Here's the translation in Traditional Chinese:随着机器人在工作合作中的普及,信任已成为 kritical factor 的 acceptance 和 efficiency。然而,信任是动态的,可以在错误时被损坏。尽管人机交互中的信任修复研究已经出现,但仍有很多问题需要解决,包括如何确定可靠的方法来重建机器人信任。为了解决这个问题,我的研究目标是发现可靠的机器人信任修复策略,并探索这些策略的成功关键。这篇文章提供了人机交互中信任修复过程的基本概念和关键组件,以及我的现有发表作品。此外,我还讨论了未来研究的问题和这些研究对领域的潜在贡献。
Mitigating Bias in Conversations: A Hate Speech Classifier and Debiaser with Prompts
methods: 提议一种两步方法:首先,使用分类器检测 hate speech;然后,通过提示生成更不偏见或不偏见的替代语言。
results: 对一个标准数据集进行评估,观察到 hate speech 的负面效果减少。 这种方法可以帮助在线对话中减少偏见,创造更公正和包容的沟通环境。Abstract
Discriminatory language and biases are often present in hate speech during conversations, which usually lead to negative impacts on targeted groups such as those based on race, gender, and religion. To tackle this issue, we propose an approach that involves a two-step process: first, detecting hate speech using a classifier, and then utilizing a debiasing component that generates less biased or unbiased alternatives through prompts. We evaluated our approach on a benchmark dataset and observed reduction in negativity due to hate speech comments. The proposed method contributes to the ongoing efforts to reduce biases in online discourse and promote a more inclusive and fair environment for communication.
摘要
【文本】恐Speech hate speech在对话中经常带有歧视性语言和偏见,这通常会对targeted groups造成负面影响,如基于种族、性别和宗教的群体。为解决这个问题,我们提出了一种两步方法:首先,使用分类器探测 hate speech,然后使用debiasing组件生成less biased或无偏见的 altenatives through prompts。我们对 benchmark dataset进行了评估,并观察到了对 hate speech负面评论的减少。该方法对在线对话中减少偏见和促进更加包容和公正的环境做出了贡献。Note: "恐Speech" is a combination of "恐怖" (terror) and "Speech" (speech), which is a common term used to refer to hate speech in Chinese.
Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications
for: 这 paper 是关于无文本资源的语音表征学研究,具体是用 hidden unit clustering (HUC) 框架来实现自我超参的 speech 表征学习。
methods: 输入 audio 样本经 windowing 和 1-D convolutional layers 处理,然后使用 long short term memory (LSTM) 层生成每个窗口段的上下文向量表示。HUC 框架用于训练模型学习具有含义的 speech 表示。
results: 在 ZeroSpeech 2021 挑战中 Completely Unsupervised 语音应用中,以及 TIMIT 数据集和 GramVaani Hindi 数据集上的 semi-supervised automatic speech recognition (ASR) 应用中,模型 achieved state-of-art 结果。另外,HUC 表示在 ASR 实验中显著提高了对比 Wav2vec、HuBERT 和 Best-RQ 的成果。Abstract
The representation learning of speech, without textual resources, is an area of significant interest for many low resource speech applications. In this paper, we describe an approach to self-supervised representation learning from raw audio using a hidden unit clustering (HUC) framework. The input to the model consists of audio samples that are windowed and processed with 1-D convolutional layers. The learned "time-frequency" representations from the convolutional neural network (CNN) module are further processed with long short term memory (LSTM) layers which generate a contextual vector representation for every windowed segment. The HUC framework, allowing the categorization of the representations into a small number of phoneme-like units, is used to train the model for learning semantically rich speech representations. The targets consist of phoneme-like pseudo labels for each audio segment and these are generated with an iterative k-means algorithm. We explore techniques that improve the speaker invariance of the learned representations and illustrate the effectiveness of the proposed approach on two settings, i) completely unsupervised speech applications on the sub-tasks described as part of the ZeroSpeech 2021 challenge and ii) semi-supervised automatic speech recognition (ASR) applications on the TIMIT dataset and on the GramVaani challenge Hindi dataset. In these experiments, we achieve state-of-art results for various ZeroSpeech tasks. Further, on the ASR experiments, the HUC representations are shown to improve significantly over other established benchmarks based on Wav2vec, HuBERT and Best-RQ.
摘要
“对于无文本资源的语音识别,是一个具有很大的研究 интерес的领域。在这篇文章中,我们描述了一种对于类似时间频率的自我监督学习方法,使用隐藏单元聚合(HUC)框架。输入模型包含了对于每个时间频率范围的窗口处理和1-D卷积层。从卷积层获得的“时间频率”表现被进一步处理,并使用长期内存(LSTM)层生成每个窗口段的内容vector表现。HUC框架允许将表现分为一小数量的语音单元,并在这些单元上进行训练,以学习具有 semantic richness 的语音表现。目标包括每个音频段的语音单元 pseudo-标签,这些标签是使用迭代k-means算法生成的。我们探索了提高话者不变的技术,并证明了我们的方法在两个设定下具有优秀的效果:完全无监督语音应用程序中的ZeroSpeech 2021挑战和半监督自动语音识别(ASR)应用程序中的TIMITdataset和GramVaani挑战Hindi dataset。在这些实验中,我们取得了顶尖的成绩,并在不同的 ZeroSpeech 任务中获得了州内最佳的成绩。此外,在 ASR 实验中,HUC 表现优化了与其他已知的参考标准Wav2vec、HuBERT和Best-RQ相比。”
methods: 方法包括三个关键组件:Clear Prompting (CP)、Calibration with Hints (CH)和Consistent Output (CO),用于处理模型输入、模型偏见和模型输出。
results: 在Spider Challenge的备用测试集上达到了82.3%的执行精度,成为零shot Text-to-SQL领域的状态精度。Abstract
This paper proposes a ChatGPT-based zero-shot Text-to-SQL method, dubbed C3, which achieves 82.3\% in terms of execution accuracy on the holdout test set of Spider and becomes the state-of-the-art zero-shot Text-to-SQL method on the Spider Challenge. C3 consists of three key components: Clear Prompting (CP), Calibration with Hints (CH), and Consistent Output (CO), which are corresponding to the model input, model bias and model output respectively. It provides a systematic treatment for zero-shot Text-to-SQL. Extensive experiments have been conducted to verify the effectiveness and efficiency of our proposed method.
摘要
这篇论文提出了基于ChatGPT的零次 Text-to-SQL方法,名为C3,其在Spider挑战的备用测试集上达到了82.3%的执行精度,成为零次 Text-to-SQL领域的状态之一。C3包括三个关键组件: Clear Prompting(CP)、Calibration with Hints(CH)和Consistent Output(CO),它们分别对应模型输入、模型偏好和模型输出。它提供了零次 Text-to-SQL的系统性处理方法。我们进行了广泛的实验来证明我们提出的方法的有效性和效率。
One-Shot Action Recognition via Multi-Scale Spatial-Temporal Skeleton Matching
paper_authors: Siyuan Yang, Jun Liu, Shijian Lu, Er Meng Hwa, Alex C. Kot
for: 一shot skeleton action recognition, which aims to learn a skeleton action recognition model with a single training sample.
methods: 使用多scale spatial-temporal feature matching to handle skeleton action recognition, representing skeleton data at multiple spatial and temporal scales and achieving optimal feature matching from two perspectives.
results: 在三个大规模数据集(NTU RGB+D、NTU RGB+D 120、PKU-MMD)上进行了广泛的实验,得到了superior的一shot skeleton action recognition结果,并与状态对比大幅提高了性能。Abstract
One-shot skeleton action recognition, which aims to learn a skeleton action recognition model with a single training sample, has attracted increasing interest due to the challenge of collecting and annotating large-scale skeleton action data. However, most existing studies match skeleton sequences by comparing their feature vectors directly which neglects spatial structures and temporal orders of skeleton data. This paper presents a novel one-shot skeleton action recognition technique that handles skeleton action recognition via multi-scale spatial-temporal feature matching. We represent skeleton data at multiple spatial and temporal scales and achieve optimal feature matching from two perspectives. The first is multi-scale matching which captures the scale-wise semantic relevance of skeleton data at multiple spatial and temporal scales simultaneously. The second is cross-scale matching which handles different motion magnitudes and speeds by capturing sample-wise relevance across multiple scales. Extensive experiments over three large-scale datasets (NTU RGB+D, NTU RGB+D 120, and PKU-MMD) show that our method achieves superior one-shot skeleton action recognition, and it outperforms the state-of-the-art consistently by large margins.
摘要
一shot skeleton action recognition技术,即通过单个训练样本学习skeleton action recognition模型,已经引起了越来越多的关注,因为收集和标注大规模skeleton action数据是一项挑战。然而,大多数现有研究都是通过直接比较skeleton序列的特征向量来匹配skeleton数据,这种方法忽略了skeleton数据的空间结构和时间顺序。本文提出了一种新的一shot skeleton action recognition技术,通过多级空间-时间特征匹配来处理skeleton action recognition。我们将skeleton数据表示在多个空间和时间级别,并实现最佳的特征匹配从两个方面。第一是多级匹配,同时捕捉多个空间和时间级别的scale-wise semantic relevance。第二是交叉级别匹配,处理不同的动作幅度和速度,通过捕捉不同级别的sample-wise relevance。我们在NTU RGB+D、NTU RGB+D 120和PKU-MMD三个大规模数据集上进行了广泛的实验,结果表明,我们的方法可以实现superior的一shot skeleton action recognition,并在相对评价中减少了state-of-the-art的差距。
AudioInceptionNeXt: TCL AI LAB Submission to EPIC-SOUND Audio-Based-Interaction-Recognition Challenge 2023
results: 我们的方法在挑战测试集上达到55.43%的排名第一的top-1准确率,代码可以匿名获取于https://github.com/StevenLauHKHK/AudioInceptionNeXt.git。Abstract
This report presents the technical details of our submission to the 2023 Epic-Kitchen EPIC-SOUNDS Audio-Based Interaction Recognition Challenge. The task is to learn the mapping from audio samples to their corresponding action labels. To achieve this goal, we propose a simple yet effective single-stream CNN-based architecture called AudioInceptionNeXt that operates on the time-frequency log-mel-spectrogram of the audio samples. Motivated by the design of the InceptionNeXt, we propose parallel multi-scale depthwise separable convolutional kernels in the AudioInceptionNeXt block, which enable the model to learn the time and frequency information more effectively. The large-scale separable kernels capture the long duration of activities and the global frequency semantic information, while the small-scale separable kernels capture the short duration of activities and local details of frequency information. Our approach achieved 55.43% of top-1 accuracy on the challenge test set, ranked as 1st on the public leaderboard. Codes are available anonymously at https://github.com/StevenLauHKHK/AudioInceptionNeXt.git.
摘要
这份报告介绍我们在2023年 Epic-Kitchen EPIC-SOUNDS 音频基于交互识别挑战中的提交技术细节。任务是学习音频示例与其相应的动作标签之间的映射。为达到这个目标,我们提议一种简单 yet effective的单流Convolutional Neural Network(CNN)架构 AudioInceptionNeXt,该架构在时域-频谱幅的Log-Mel spectrogram上运行。受inceptionNeXt的设计启发,我们提议在AudioInceptionNeXt块中并行执行多尺度分割的深度独立 convolutional 核,这使得模型能够更有效地学习时间和频谱信息。大规模分割核捕捉活动的长期持续和全球频谱 semantic 信息,而小规模分割核捕捉活动的短期持续和本地频谱信息。我们的方法在挑战测试集上达到55.43%的 top-1 准确率,排名公共排行板上第一名。代码可以在https://github.com/StevenLauHKHK/AudioInceptionNeXt.git anonymous 上获取。
A Dynamic Points Removal Benchmark in Point Cloud Maps
results: 本研究使用了多个不同感应器类型的数据集,并提供了一个公开可用的代码和数据集,以便进一步的发展和应用。Abstract
In the field of robotics, the point cloud has become an essential map representation. From the perspective of downstream tasks like localization and global path planning, points corresponding to dynamic objects will adversely affect their performance. Existing methods for removing dynamic points in point clouds often lack clarity in comparative evaluations and comprehensive analysis. Therefore, we propose an easy-to-extend unified benchmarking framework for evaluating techniques for removing dynamic points in maps. It includes refactored state-of-art methods and novel metrics to analyze the limitations of these approaches. This enables researchers to dive deep into the underlying reasons behind these limitations. The benchmark makes use of several datasets with different sensor types. All the code and datasets related to our study are publicly available for further development and utilization.
摘要
在机器人学中,点云已成为重要的地图表示方式。从本地化和全球规划任务的视角来看,对动态对象的点会有负面影响性能。现有的点云中动态点除法方法经常缺乏明确的比较评价和全面分析。因此,我们提议一个易扩展的统一评价框架,用于评估地图中动态点除法的技术。该框架包括 refactoring 当前领先的方法和新的评价指标,以分析这些方法的局限性。这使研究人员能够深入探究这些局限性的根本原因。我们的benchmark使用了不同感知器类型的数据集。所有与我们的研究相关的代码和数据集都公开可用于进一步开发和应用。
Dialogue Agents 101: A Beginner’s Guide to Critical Ingredients for Designing Effective Conversational Systems
results: 研究发现,使用不同的方法解决不同的对话任务可能会带来高成本和不充分利用对话任务之间的相互关系。因此,现在的趋势是向建立统一基础模型。本研究还提出了一个统一对话数据集(UNIT),用于检验对话代理系统的性能。Abstract
Sharing ideas through communication with peers is the primary mode of human interaction. Consequently, extensive research has been conducted in the area of conversational AI, leading to an increase in the availability and diversity of conversational tasks, datasets, and methods. However, with numerous tasks being explored simultaneously, the current landscape of conversational AI becomes fragmented. Therefore, initiating a well-thought-out model for a dialogue agent can pose significant challenges for a practitioner. Towards highlighting the critical ingredients needed for a practitioner to design a dialogue agent from scratch, the current study provides a comprehensive overview of the primary characteristics of a dialogue agent, the supporting tasks, their corresponding open-domain datasets, and the methods used to benchmark these datasets. We observe that different methods have been used to tackle distinct dialogue tasks. However, building separate models for each task is costly and does not leverage the correlation among the several tasks of a dialogue agent. As a result, recent trends suggest a shift towards building unified foundation models. To this end, we propose UNIT, a UNified dIalogue dataseT constructed from conversations of existing datasets for different dialogue tasks capturing the nuances for each of them. We also examine the evaluation strategies used to measure the performance of dialogue agents and highlight the scope for future research in the area of conversational AI.
摘要
人类交流的主要方式是通过与同伴的交流,因此很多研究在对话AI方面进行了探索。这导致了对话任务的多样性和可用性的提高,但是由于同时探索多个任务,现在的对话AI领域就变得分散化。因此,设计一个从头开始的对话代理模型可能会对实践者提出 significiant challenges。为了帮助实践者设计对话代理模型,本研究提供了对对话代理模型的主要特征、支持任务、相关的开放领域数据集以及用于评估这些数据集的方法的全面回顾。我们发现不同的方法在解决不同的对话任务时都有用,但是建立每个任务的单独模型是昂贵的,并且不利用对话任务之间的相互关系。因此,现在的趋势是建立统一基础模型。为此,我们提出了UNIT,一个基于对话数据集的统一基础模型, capture了每个任务的细节。我们还检查了用于评估对话代理模型的评价策略,并 highlighted the scope for future research in the area of conversational AI.
Mitigating Adversarial Vulnerability through Causal Parameter Estimation by Adversarial Double Machine Learning
paper_authors: Byung-Kwan Lee, Junho Kim, Yong Man Ro
For: This paper aims to improve the adversarial robustness of deep neural networks by introducing a causal approach called Adversarial Double Machine Learning (ADML).* Methods: The ADML method uses a causal perspective to quantify the degree of adversarial vulnerability and capture the effect of treatments on the outcome of interests.* Results: The paper demonstrates through extensive experiments on various CNN and Transformer architectures that ADML improves adversarial robustness with large margins and relieves the empirical observation of vulnerability.Abstract
Adversarial examples derived from deliberately crafted perturbations on visual inputs can easily harm decision process of deep neural networks. To prevent potential threats, various adversarial training-based defense methods have grown rapidly and become a de facto standard approach for robustness. Despite recent competitive achievements, we observe that adversarial vulnerability varies across targets and certain vulnerabilities remain prevalent. Intriguingly, such peculiar phenomenon cannot be relieved even with deeper architectures and advanced defense methods. To address this issue, in this paper, we introduce a causal approach called Adversarial Double Machine Learning (ADML), which allows us to quantify the degree of adversarial vulnerability for network predictions and capture the effect of treatments on outcome of interests. ADML can directly estimate causal parameter of adversarial perturbations per se and mitigate negative effects that can potentially damage robustness, bridging a causal perspective into the adversarial vulnerability. Through extensive experiments on various CNN and Transformer architectures, we corroborate that ADML improves adversarial robustness with large margins and relieve the empirical observation.
摘要
“深度神经网络的决策过程可以轻易受到来自故意设计的扰动的攻击,这些扰动可以轻易地让深度神经网络的决策过程受到损害。为了防止这些潜在的威胁,各种基于对抗训练的防御方法在深度神经网络中迅速增长,并成为了惯例的标准方法。尽管最近的竞赛成绩表明了这些防御方法的竞争力,但我们发现,对于不同的目标,攻击性的扰动的敏感性会异常变化,而且某些敏感性甚至无法被深度神经网络的更深度结构和高级防御方法缓解。为了解决这个问题,在这篇论文中,我们提出了一种名为对抗双机器学习(ADML)的 causal 方法,可以评估深度神经网络的预测中的攻击性扰动的度量,并且可以直接评估对于结果的影响。ADML可以直接估计攻击性扰动的 causal 参数,并且可以避免对 robustness 的负面影响,从而将 causal 视角引入到攻击性扰动中。通过对各种 CNN 和 Transformer 架构进行广泛的实验,我们证明了 ADML 可以提高对抗鲁棒性,并且可以解除实际观察中的异常现象。”
Rigorous Runtime Analysis of Diversity Optimization with GSEMO on OneMinMax
results: 论文证明了GSEMO算法在 Population 中拥有第二最佳多样化时,预期时间 $O(n^2)$ 内找到优化多样化的解。这个结论基于Population 的随机漫步分析,该分析反映了解集的变化频率和结果。Abstract
The evolutionary diversity optimization aims at finding a diverse set of solutions which satisfy some constraint on their fitness. In the context of multi-objective optimization this constraint can require solutions to be Pareto-optimal. In this paper we study how the GSEMO algorithm with additional diversity-enhancing heuristic optimizes a diversity of its population on a bi-objective benchmark problem OneMinMax, for which all solutions are Pareto-optimal. We provide a rigorous runtime analysis of the last step of the optimization, when the algorithm starts with a population with a second-best diversity, and prove that it finds a population with optimal diversity in expected time $O(n^2)$, when the problem size $n$ is odd. For reaching our goal, we analyse the random walk of the population, which reflects the frequency of changes in the population and their outcomes.
摘要
进化多标的优化目标是找到一个多元化的解集,满足一些适当的健康指标。在多目标优化情况下,这个限制可能需要解是Pareto优的。在这篇论文中,我们研究了GSEMO算法,具有额外的多标化增强规律,在二元最大最小问题OneMinMax上进行了多标化优化。我们提供了严谨的时间分析,当算法从一个第二最佳多标的人口开始时,证明它可以在预期时间O(n^2)内找到一个最佳多标的人口,当问题大小n是奇数时。为了实现我们的目标,我们分析了人口的随机步行,这反映了人口中的变化频率和其结果。
Fairness of ChatGPT and the Role Of Explainable-Guided Prompts
methods: 研究使用了judiciously designed prompts和domain-specific knowledge来导向LLMs,并与传统机器学习(ML)模型进行比较。
results: 研究发现LLMs可以与传统ML模型相似的性能,但使用了40倍少的数据(20个数据点,相比800个数据点),并优化错误率和公平性,这些都是风险分析中重要的层面。Abstract
Our research investigates the potential of Large-scale Language Models (LLMs), specifically OpenAI's GPT, in credit risk assessment-a binary classification task. Our findings suggest that LLMs, when directed by judiciously designed prompts and supplemented with domain-specific knowledge, can parallel the performance of traditional Machine Learning (ML) models. Intriguingly, they achieve this with significantly less data-40 times less, utilizing merely 20 data points compared to the ML's 800. LLMs particularly excel in minimizing false positives and enhancing fairness, both being vital aspects of risk analysis. While our results did not surpass those of classical ML models, they underscore the potential of LLMs in analogous tasks, laying a groundwork for future explorations into harnessing the capabilities of LLMs in diverse ML tasks.
摘要
我们的研究探讨了大规模语言模型(LLMs),具体是OpenAI的GPT,在信用风险评估中的潜力。我们发现,当用judiciously设计的提示和域pecific知识支持时,LLMs可以与传统机器学习(ML)模型相当。这些LLMs可以在使用merely 20个数据点时达到800个数据点的性能,并且特别是减少假阳性和提高公平性,这两者都是重要的风险分析方面。虽然我们的结果没有超过传统ML模型的表现,但它们表明LLMs在类似任务中具有潜力,为未来在多种ML任务中利用LLMs的可能性奠定基础。
Multiplicative update rules for accelerating deep learning training and increasing robustness
results: 实验表明,使用这种新的多元更新规则可以在多种优化方法和深度神经网络(DNN)架构下加速DL训练,并且导致模型更加稳定和robust。Abstract
Even nowadays, where Deep Learning (DL) has achieved state-of-the-art performance in a wide range of research domains, accelerating training and building robust DL models remains a challenging task. To this end, generations of researchers have pursued to develop robust methods for training DL architectures that can be less sensitive to weight distributions, model architectures and loss landscapes. However, such methods are limited to adaptive learning rate optimizers, initialization schemes, and clipping gradients without investigating the fundamental rule of parameters update. Although multiplicative updates have contributed significantly to the early development of machine learning and hold strong theoretical claims, to best of our knowledge, this is the first work that investigate them in context of DL training acceleration and robustness. In this work, we propose an optimization framework that fits to a wide range of optimization algorithms and enables one to apply alternative update rules. To this end, we propose a novel multiplicative update rule and we extend their capabilities by combining it with a traditional additive update term, under a novel hybrid update method. We claim that the proposed framework accelerates training, while leading to more robust models in contrast to traditionally used additive update rule and we experimentally demonstrate their effectiveness in a wide range of task and optimization methods. Such tasks ranging from convex and non-convex optimization to difficult image classification benchmarks applying a wide range of traditionally used optimization methods and Deep Neural Network (DNN) architectures.
摘要
In this work, we propose an optimization framework that fits to a wide range of optimization algorithms and enables one to apply alternative update rules. To this end, we propose a novel multiplicative update rule and we extend their capabilities by combining it with a traditional additive update term, under a novel hybrid update method. We claim that the proposed framework accelerates training, while leading to more robust models in contrast to traditionally used additive update rule and we experimentally demonstrate their effectiveness in a wide range of tasks and optimization methods. Such tasks ranging from convex and non-convex optimization to difficult image classification benchmarks applying a wide range of traditionally used optimization methods and Deep Neural Network (DNN) architectures.Translated into Simplified Chinese:即使现在,深度学习(DL)已经在许多研究领域达到了状态机器的表现,加速训练和建立robust DL模型仍然是一项挑战。为此,不同的研究者已经努力开发了robust DL模型训练方法,以降低参数分布、模型架构和损失landscape的敏感性。然而,这些方法受限于 adaptive learning rate optimizer、初始化方案和clipping gradients,而没有探究参数更新的基本规则。 although multiplicative updates have made significant contributions to the early development of machine learning and have strong theoretical foundations, to the best of our knowledge, this is the first work that investigates them in the context of DL training acceleration and robustness.在这个工作中,我们提出了一个优化框架,可以适应多种优化算法,并允许使用alternative update rule。为此,我们提出了一个新的乘数更新规则,并通过将其与传统的加itive更新项结合,实现了一种新的hybrid更新方法。我们声称,该 frameworks可以加速训练,同时导致模型更加稳定和robust,相比于传统使用的加itive更新规则,并通过实验证明其效果。这些任务包括从 convex和非convex优化到难度图像分类benchmark,并通过多种传统使用的优化方法和深度神经网络(DNN)架构。
TriFormer: A Multi-modal Transformer Framework For Mild Cognitive Impairment Conversion Prediction
results: Triformer在Alzheimer’s Disease Neuroimaging Initiative(ANDI)1和ADNI2数据集上评估,与之前的单模和多模态方法相比,显示出更高的准确率。Abstract
The prediction of mild cognitive impairment (MCI) conversion to Alzheimer's disease (AD) is important for early treatment to prevent or slow the progression of AD. To accurately predict the MCI conversion to stable MCI or progressive MCI, we propose Triformer, a novel transformer-based framework with three specialized transformers to incorporate multi-model data. Triformer uses I) an image transformer to extract multi-view image features from medical scans, II) a clinical transformer to embed and correlate multi-modal clinical data, and III) a modality fusion transformer that produces an accurate prediction based on fusing the outputs from the image and clinical transformers. Triformer is evaluated on the Alzheimer's Disease Neuroimaging Initiative (ANDI)1 and ADNI2 datasets and outperforms previous state-of-the-art single and multi-modal methods.
摘要
预测轻度认知障碍(MCI)转化为阿尔茨海默病(AD)的预测非常重要,以便在早期发现并采取措施来防止或减慢AD的进程。为了准确预测MCI转化为稳定MCI或进展MCI,我们提议Triformer,一种新的转换器基础框架,其特点是使用三种特殊的转换器来汇合多种数据。Triformer使用以下三种转换器:1. 图像转换器,用于从医疗扫描中提取多视图图像特征。2. 临床转换器,用于将多modal临床数据进行嵌入和相关计算。3. 模式融合转换器,用于基于多种数据融合的准确预测。Triformer在Alzheimer's Disease Neuroimaging Initiative(ANDI)1和ADNI2 datasets上进行评估,并与之前的单模和多模态方法相比,显示出更高的预测精度。
Safe DreamerV3: Safe Reinforcement Learning with World Models
results: 我们的实验结果显示,Safe DreamerV3算法在Safety-Gymnasium benchmark中的低维度和视觉任务中可以达到几乎零成本,而 existing SafeRL 方法则无法达到这个目标。Abstract
The widespread application of Reinforcement Learning (RL) in real-world situations is yet to come to fruition, largely as a result of its failure to satisfy the essential safety demands of such systems. Existing safe reinforcement learning (SafeRL) methods, employing cost functions to enhance safety, fail to achieve zero-cost in complex scenarios, including vision-only tasks, even with comprehensive data sampling and training. To address this, we introduce Safe DreamerV3, a novel algorithm that integrates both Lagrangian-based and planning-based methods within a world model. Our methodology represents a significant advancement in SafeRL as the first algorithm to achieve nearly zero-cost in both low-dimensional and vision-only tasks within the Safety-Gymnasium benchmark. Our project website can be found in: https://sites.google.com/view/safedreamerv3.
摘要
RL在实际应用中的普及尚未实现,主要是因为它无法满足实际系统的安全需求。现有的安全再强化学习(SafeRL)方法,通过提高安全成本来增强安全性,在复杂的场景中,包括视觉任务,甚至 WITH comprehensive data sampling和训练,都无法实现零成本。为解决这一问题,我们提出了Safe DreamerV3算法,该算法结合了Lagrangian-based和规划基于的方法,并在世界模型中实现了这些方法。我们的方法代表了安全RL的一大突破,是首个在低维度和视觉任务中的 Safety-Gymnasium bencmark中实现零成本的算法。您可以在以下网站上找到我们的项目网站:https://sites.google.com/view/safedreamerv3。
HYTREL: Hypergraph-enhanced Tabular Data Representation Learning
results: 在四个下游任务中,HYTREL 表现出了与其他竞争对手相比的稳定和高效表现,只需要最小的预训练。此外,qualitative 分析表明 HYTREL 可以快速吸收表格结构,生成稳定的表格单元表示。Abstract
Language models pretrained on large collections of tabular data have demonstrated their effectiveness in several downstream tasks. However, many of these models do not take into account the row/column permutation invariances, hierarchical structure, etc. that exist in tabular data. To alleviate these limitations, we propose HYTREL, a tabular language model, that captures the permutation invariances and three more structural properties of tabular data by using hypergraphs - where the table cells make up the nodes and the cells occurring jointly together in each row, column, and the entire table are used to form three different types of hyperedges. We show that HYTREL is maximally invariant under certain conditions for tabular data, i.e., two tables obtain the same representations via HYTREL iff the two tables are identical up to permutations. Our empirical results demonstrate that HYTREL consistently outperforms other competitive baselines on four downstream tasks with minimal pretraining, illustrating the advantages of incorporating the inductive biases associated with tabular data into the representations. Finally, our qualitative analyses showcase that HYTREL can assimilate the table structures to generate robust representations for the cells, rows, columns, and the entire table.
摘要
language models 预训练在大量表格数据上有显示其效iveness 在多个下游任务中。然而,许多这些模型不会考虑表格数据中的列/行Permutation invariances,层次结构等属性。为了解决这些限制,我们提出了 HYTREL,一种表格语言模型,该模型通过使用 hypergraphs capture 表格数据中的 Permutation invariances 和三种结构属性。我们证明 HYTREL 在某些条件下对 tabular data 是最大 invariant,即两个表格通过 HYTREL 生成的表示相同,只要两个表格在排序上相同。我们的实验结果表明 HYTREL 在四个下游任务中具有明显的优势,只需要 minimal pretraining,这说明了在表格数据中包含表格数据的预测性假设可以为表格数据的表示增加优势。最后,我们的质量分析表明 HYTREL 可以吸收表格结构,为细胞、行、列和整个表格生成强健的表示。
Vulnerability-Aware Instance Reweighting For Adversarial Training
results: 对比 existed 的权重调整方法,本研究提出了一种新的实例级权重调整方法,可以减少对攻击示例的依赖度和信息损失。实验结果表明,该方法可以显著提高对攻击示例的Robustness,特别是面对强大的白盒和黑盒攻击。Abstract
Adversarial Training (AT) has been found to substantially improve the robustness of deep learning classifiers against adversarial attacks. AT involves obtaining robustness by including adversarial examples in training a classifier. Most variants of AT algorithms treat every training example equally. However, recent works have shown that better performance is achievable by treating them unequally. In addition, it has been observed that AT exerts an uneven influence on different classes in a training set and unfairly hurts examples corresponding to classes that are inherently harder to classify. Consequently, various reweighting schemes have been proposed that assign unequal weights to robust losses of individual examples in a training set. In this work, we propose a novel instance-wise reweighting scheme. It considers the vulnerability of each natural example and the resulting information loss on its adversarial counterpart occasioned by adversarial attacks. Through extensive experiments, we show that our proposed method significantly improves over existing reweighting schemes, especially against strong white and black-box attacks.
摘要
Looking deeper into interpretable deep learning in neuroimaging: a comprehensive survey
results: 研究发现,使用可解释深度学习模型可以更好地捕捉神经成像数据中关键的脑变化,并且可以提高模型的预测性能。但是,当前的实践还存在一些限制和挑战,需要进一步的研究和改进。Abstract
Deep learning (DL) models have been popular due to their ability to learn directly from the raw data in an end-to-end paradigm, alleviating the concern of a separate error-prone feature extraction phase. Recent DL-based neuroimaging studies have also witnessed a noticeable performance advancement over traditional machine learning algorithms. But the challenges of deep learning models still exist because of the lack of transparency in these models for their successful deployment in real-world applications. In recent years, Explainable AI (XAI) has undergone a surge of developments mainly to get intuitions of how the models reached the decisions, which is essential for safety-critical domains such as healthcare, finance, and law enforcement agencies. While the interpretability domain is advancing noticeably, researchers are still unclear about what aspect of model learning a post hoc method reveals and how to validate its reliability. This paper comprehensively reviews interpretable deep learning models in the neuroimaging domain. Firstly, we summarize the current status of interpretability resources in general, focusing on the progression of methods, associated challenges, and opinions. Secondly, we discuss how multiple recent neuroimaging studies leveraged model interpretability to capture anatomical and functional brain alterations most relevant to model predictions. Finally, we discuss the limitations of the current practices and offer some valuable insights and guidance on how we can steer our future research directions to make deep learning models substantially interpretable and thus advance scientific understanding of brain disorders.
摘要
深度学习(DL)模型在过去几年中得到了广泛应用,主要是因为它们可以直接从原始数据中学习,而不需要额外的错误产生的特征提取阶段。现在的DL基于的脑成像研究也经历了明显的性能提高,但DL模型的挑战仍然存在,主要是因为这些模型的透明性不够,使得它们在实际应用中不可靠。在过去几年,可解释AI(XAI)技术得到了广泛发展,以获取模型做出决策的 intuitions,这对于安全关键领域如医疗、金融和宪政机构来说是非常重要。虽然解释领域在发展中,但研究人员仍然不清楚哪些方面的模型学习post hoc方法揭示出来,以及如何验证其可靠性。这篇评论综述了在脑成像领域中可解释深度学习模型的发展。首先,我们概括了目前可解释资源的状况,包括方法的进步、相关挑战和观点。其次,我们讲述了一些最近的脑成像研究如何通过模型解释来捕捉最相关的脑结构和功能变化,以及对于模型预测的影响。最后,我们讨论了当前实践中的限制,并提供了一些有价值的指导和建议,以便我们可以在未来减少DL模型的不可靠性,并提高我们对脑疾病的科学理解。
Federated Learning-Empowered AI-Generated Content in Wireless Networks
paper_authors: Xumin Huang, Peichun Li, Hongyang Du, Jiawen Kang, Dusit Niyato, Dong In Kim, Yuan Wu
for: 提高内容创作效率、质量、多样性和灵活性
methods: 采用分布式学习框架,协同数据所有者进行模型训练,保护用户隐私
results: 降低通信成本和训练延迟,同时保护用户隐私Abstract
Artificial intelligence generated content (AIGC) has emerged as a promising technology to improve the efficiency, quality, diversity and flexibility of the content creation process by adopting a variety of generative AI models. Deploying AIGC services in wireless networks has been expected to enhance the user experience. However, the existing AIGC service provision suffers from several limitations, e.g., the centralized training in the pre-training, fine-tuning and inference processes, especially their implementations in wireless networks with privacy preservation. Federated learning (FL), as a collaborative learning framework where the model training is distributed to cooperative data owners without the need for data sharing, can be leveraged to simultaneously improve learning efficiency and achieve privacy protection for AIGC. To this end, we present FL-based techniques for empowering AIGC, and aim to enable users to generate diverse, personalized, and high-quality content. Furthermore, we conduct a case study of FL-aided AIGC fine-tuning by using the state-of-the-art AIGC model, i.e., stable diffusion model. Numerical results show that our scheme achieves advantages in effectively reducing the communication cost and training latency and privacy protection. Finally, we highlight several major research directions and open issues for the convergence of FL and AIGC.
摘要
人工智能生成内容(AIGC)技术已经出现为改善内容创作过程的效率、质量、多样性和灵活性的优秀选择。在无线网络中部署AIGC服务可以提高用户体验。然而,现有的AIGC服务提供方式受到一些限制,例如中央化训练在预训练、精度调整和推理过程中,特别是在保护隐私的无线网络中。 Federated learning(FL),作为一种分布式学习框架,可以同时提高学习效率和实现隐私保护。为此,我们提出了基于FL的AIGC技术,并 Hoping to enable users to generate diverse, personalized, and high-quality content.在我们的实验中,我们使用当前的AIGC模型,即稳定扩散模型进行FL-帮助的精度调整。 numerically results show that our scheme achieves advantages in effectively reducing the communication cost and training latency, as well as privacy protection. Finally, we highlight several major research directions and open issues for the convergence of FL and AIGC.
Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots
results: 研究发现,open-domain chatbot 模型在多Turn对话中可以被触发出攻击性的回答,最好的情况下,\toxicbot 的激活率达到 67%。Abstract
Recent advances in natural language processing and machine learning have led to the development of chatbot models, such as ChatGPT, that can engage in conversational dialogue with human users. However, the ability of these models to generate toxic or harmful responses during a non-toxic multi-turn conversation remains an open research question. Existing research focuses on single-turn sentence testing, while we find that 82\% of the individual non-toxic sentences that elicit toxic behaviors in a conversation are considered safe by existing tools. In this paper, we design a new attack, \toxicbot, by fine-tuning a chatbot to engage in conversation with a target open-domain chatbot. The chatbot is fine-tuned with a collection of crafted conversation sequences. Particularly, each conversation begins with a sentence from a crafted prompt sentences dataset. Our extensive evaluation shows that open-domain chatbot models can be triggered to generate toxic responses in a multi-turn conversation. In the best scenario, \toxicbot achieves a 67\% activation rate. The conversation sequences in the fine-tuning stage help trigger the toxicity in a conversation, which allows the attack to bypass two defense methods. Our findings suggest that further research is needed to address chatbot toxicity in a dynamic interactive environment. The proposed \toxicbot can be used by both industry and researchers to develop methods for detecting and mitigating toxic responses in conversational dialogue and improve the robustness of chatbots for end users.
摘要
近年,自然语言处理和机器学习的进步使得聊天机器人模型,如ChatGPT,能够与人类用户进行对话交流。然而,这些模型在不恶意多轮对话中产生恶意或危险回复的能力仍然是一个开放的研究问题。现有研究主要集中在单Turn句子测试上,而我们发现82%的个人不恶意句子在对话中可以让chatbot产生恶意行为。在这篇论文中,我们设计了一种新的攻击,\toxicbot,通过细化一个目标的开放领域聊天机器人。这个聊天机器人通过一个采集的对话序列进行细化。具体来说,每个对话都开始于一个采集的提示句子集中的一句话。我们的广泛评估表明,开放领域聊天机器人模型可以在多轮对话中产生恶意回复。最好的情况下,\toxicbot达到67%的活动率。对话序列在细化阶段帮助诱发对话中的恶意,这使得攻击能够绕过两种防御方法。我们的发现表明,进一步的研究是必要的,以解决聊天机器人中的恶意对话在动态交互环境中的问题。我们提出的\toxicbot可以由行业和研究人员使用,以开发检测和mitigate聊天机器人中的恶意回复的方法,并提高聊天机器人的用户端Robustness。
Multi-Dimensional Ability Diagnosis for Machine Learning Algorithms
results: 对四个公共数据集进行了广泛的实验,结果显示 Camilla 可以更准确地捕捉每个算法的优缺点,并在metric可靠性、排名一致性和排名稳定性上表现出色Abstract
Machine learning algorithms have become ubiquitous in a number of applications (e.g. image classification). However, due to the insufficient measurement of traditional metrics (e.g. the coarse-grained Accuracy of each classifier), substantial gaps are usually observed between the real-world performance of these algorithms and their scores in standardized evaluations. In this paper, inspired by the psychometric theories from human measurement, we propose a task-agnostic evaluation framework Camilla, where a multi-dimensional diagnostic metric Ability is defined for collaboratively measuring the multifaceted strength of each machine learning algorithm. Specifically, given the response logs from different algorithms to data samples, we leverage cognitive diagnosis assumptions and neural networks to learn the complex interactions among algorithms, samples and the skills (explicitly or implicitly pre-defined) of each sample. In this way, both the abilities of each algorithm on multiple skills and some of the sample factors (e.g. sample difficulty) can be simultaneously quantified. We conduct extensive experiments with hundreds of machine learning algorithms on four public datasets, and our experimental results demonstrate that Camilla not only can capture the pros and cons of each algorithm more precisely, but also outperforms state-of-the-art baselines on the metric reliability, rank consistency and rank stability.
摘要
Ethics in the Age of AI: An Analysis of AI Practitioners’ Awareness and Challenges
results: 研究发现大多数AI实践者有一定的认识度对AI伦理,主要归功于工作场所的规则和政策。他们主要关注隐私保护和安全问题。正式的教育和培训被认为有一定帮助作用。AI实践者在开发伦理AI系统时遇到的挑战包括一般挑战、技术相关挑战和人类相关挑战。Abstract
Ethics in AI has become a debated topic of public and expert discourse in recent years. But what do people who build AI - AI practitioners - have to say about their understanding of AI ethics and the challenges associated with incorporating it in the AI-based systems they develop? Understanding AI practitioners' views on AI ethics is important as they are the ones closest to the AI systems and can bring about changes and improvements. We conducted a survey aimed at understanding AI practitioners' awareness of AI ethics and their challenges in incorporating ethics. Based on 100 AI practitioners' responses, our findings indicate that majority of AI practitioners had a reasonable familiarity with the concept of AI ethics, primarily due to workplace rules and policies. Privacy protection and security was the ethical principle that majority of them were aware of. Formal education/training was considered somewhat helpful in preparing practitioners to incorporate AI ethics. The challenges that AI practitioners faced in the development of ethical AI-based systems included (i) general challenges, (ii) technology-related challenges and (iii) human-related challenges. We also identified areas needing further investigation and provided recommendations to assist AI practitioners and companies in incorporating ethics into AI development.
摘要
调查结果表明,大多数 AI practitioners 对 AI 伦理概念有一定的了解,主要归功于工作场所的规则和政策。隐私保护和安全是他们意识到的伦理原则中的主要内容。有些人认为,正式的教育和训练有所帮助,以准备 practitioners 在实践中采用 AI 伦理。在开发伦理 AI 系统时,AI practitioners 面临着一些挑战,包括(i)总体挑战、(ii)技术相关的挑战和(iii)人类相关的挑战。我们还发现了需要进一步调查的领域和提出了建议,以帮助 AI practitioners 和公司在 AI 开发中更好地 integrating ethics。
DataAssist: A Machine Learning Approach to Data Cleaning and Preparation
results: 可以大幅减少数据整理和整合的时间,为经济、商业和预测应用等领域提供高质量数据集Here’s the breakdown of each point:
for: The paper is written for people who want to improve the efficiency of data analysis and reduce the time spent on data cleaning and preparation.
methods: The paper uses machine learning-informed methods to automate data preparation, including generating visualizations for user-selected variables, unifying data annotation, suggesting anomaly removal, and preprocessing data.
results: The paper shows that using DataAssist can significantly reduce the time spent on data cleaning and preparation, providing high-quality data sets for a variety of fields, including economics, business, and forecasting applications.Abstract
Current automated machine learning (ML) tools are model-centric, focusing on model selection and parameter optimization. However, the majority of the time in data analysis is devoted to data cleaning and wrangling, for which limited tools are available. Here we present DataAssist, an automated data preparation and cleaning platform that enhances dataset quality using ML-informed methods. We show that DataAssist provides a pipeline for exploratory data analysis and data cleaning, including generating visualization for user-selected variables, unifying data annotation, suggesting anomaly removal, and preprocessing data. The exported dataset can be readily integrated with other autoML tools or user-specified model for downstream analysis. Our data-centric tool is applicable to a variety of fields, including economics, business, and forecasting applications saving over 50% time of the time spent on data cleansing and preparation.
摘要
现有的自动化机器学习(ML)工具都是模型中心的,强调模型选择和参数优化。然而,数据分析中大部分时间都是投入到数据整理和整理中,而现有的工具却有限。我们现在提出了DataAssist,一个自动化数据准备和整理平台,使用机器学习 Informed 方法提高数据集质量。我们显示了DataAssist 提供了探索数据分析和数据整理的管道,包括生成用户选择变量的Visualization,统一数据注释、建议异常移除和数据预处理。导出的数据集可以轻松地与其他自动ML工具或用户指定的模型进行下游分析。我们的数据集中心工具适用于多个领域,包括经济、商业和预测应用,可以节省大约50%的时间用于数据整理和准备。
EmotionPrompt: Leveraging Psychology for Large Language Models Enhancement via Emotional Stimulus
results: 对8种不同任务和4种模型(ChatGPT、Vicuna-13b、Bloom、T5)进行实验,EmotionPrompt 显著超越原始的零例示 prompt 和 Zero-shot-CoT,同时提高了真实性和信息性。Abstract
Large language models (LLMs) have achieved significant performance in many fields such as reasoning, language understanding, and math problem-solving, and are regarded as a crucial step to artificial general intelligence (AGI). However, the sensitivity of LLMs to prompts remains a major bottleneck for their daily adoption. In this paper, we take inspiration from psychology and propose EmotionPrompt to explore emotional intelligence to enhance the performance of LLMs. EmotionPrompt operates on a remarkably straightforward principle: the incorporation of emotional stimulus into prompts. Experimental results demonstrate that our EmotionPrompt, using the same single prompt templates, significantly outperforms original zero-shot prompt and Zero-shot-CoT on 8 tasks with diverse models: ChatGPT, Vicuna-13b, Bloom, and T5. Further, EmotionPrompt was observed to improve both truthfulness and informativeness. We believe that EmotionPrompt heralds a novel avenue for exploring interdisciplinary knowledge for humans-LLMs interaction.
摘要
Translated into Simplified Chinese:大型语言模型(LLM)在多个领域 such as 理解、语言理解和数学问题解决方面已经达到了显著的性能,并被视为人工通用智能(AGI)的关键一步。然而,LLM的提示敏感性仍然是日常使用的主要瓶颈。在这篇论文中,我们启发自心理学,并提出了情感提示(EmotionPrompt),以增强 LLM 的性能。情感提示的工作原理很简单:在提示中添加情感刺激。实验结果表明,我们的 EmotionPrompt,使用同一个单一提示模板,在8个任务上 WITH 多种模型(ChatGPT、Vicuna-13b、Bloom 和 T5)显著超过原始 zero-shot 提示和 Zero-shot-CoT。此外,EmotionPrompt 还被观察到改善了真实性和信息性。我们认为,EmotionPrompt 开启了一条新的人类-LLM交互知识探索的道路。
Robotic Manipulation Datasets for Offline Compositional Reinforcement Learning
results: 论文的实验表明,当前的OFFLINE RL方法可以在一定程度上学习训练任务,而 compose 方法在不同任务结构下表现 significatively better than non-compose 方法。但是,当前的方法还无法EXTRACT 任务的compositional结构,以便在未看到的任务上掌握。Abstract
Offline reinforcement learning (RL) is a promising direction that allows RL agents to pre-train on large datasets, avoiding the recurrence of expensive data collection. To advance the field, it is crucial to generate large-scale datasets. Compositional RL is particularly appealing for generating such large datasets, since 1) it permits creating many tasks from few components, 2) the task structure may enable trained agents to solve new tasks by combining relevant learned components, and 3) the compositional dimensions provide a notion of task relatedness. This paper provides four offline RL datasets for simulated robotic manipulation created using the 256 tasks from CompoSuite [Mendez et al., 2022a]. Each dataset is collected from an agent with a different degree of performance, and consists of 256 million transitions. We provide training and evaluation settings for assessing an agent's ability to learn compositional task policies. Our benchmarking experiments on each setting show that current offline RL methods can learn the training tasks to some extent and that compositional methods significantly outperform non-compositional methods. However, current methods are still unable to extract the tasks' compositional structure to generalize to unseen tasks, showing a need for further research in offline compositional RL.
摘要
减少数据采集成本的离线强化学习(RL)是一个有前途的方向,允许RL Agent在大量数据上预训练,避免重复的数据采集成本。为了推动这一领域,生成大规模数据是非常重要的。 compositional RL 特别有利于生成大规模数据,因为它允许创建多个任务从少量组成部件,并且任务结构可能使得训练过的 Agent 能够解决新任务,并将相关学习的组成部件组合起来。这篇文章提供了基于 CompoSuite 的 256 个任务的四个离线 RL 数据集,每个数据集由一个不同的表现度作为 Agent 生成的 256 亿个转移。我们提供了训练和评估设置,以评估 Agent 是否能够学习复杂的任务策略。我们的检验结果表明,当前的离线 RL 方法可以在一定程度上学习训练任务,而 compositional 方法在不同任务之间具有显著的优势。然而,当前的方法仍然无法提取任务的复杂结构,以普适化到未看到的任务,表明需要进一步的研究在离线 compositional RL 方面。
Espaloma-0.3.0: Machine-learned molecular mechanics force field for the simulation of protein-ligand systems and beyond
paper_authors: Kenichiro Takaba, Iván Pulido, Mike Henry, Hugo MacDermott-Opeskin, John D. Chodera, Yuanqing Wang
For: The paper aims to develop a new approach to building molecular mechanics (MM) force fields that can be learned directly from quantum chemical calculations or condensed-phase data, using graph neural networks.* Methods: The approach uses an end-to-end differentiable force field construction method called Espaloma, which incorporates both energy and force fitting directly to quantum chemical data. The method is trained on a dataset of chemical spaces relevant to biomolecular modeling, including small molecules, proteins, and RNA.* Results: The resulting force field, espaloma 0.3.0, accurately predicts quantum chemical energies and forces, and maintains stable quantum chemical energy-minimized geometries. The approach also produces highly accurate protein-ligand binding free energies when self-consistently parametrizing protein and ligand. The method shows promise as a path forward for building systematically more accurate force fields that can be easily extended to new chemical domains of interest.Here are the three key points in Simplified Chinese text:* For: 这篇论文目标是开发一种基于量子化学计算或固体数据的分子机械力场模型,使用图 neural network 来取代人工专家审核的、不可变的化学参数分配规则。* Methods: 该方法使用一种终端可微分的力场建模方法,称为 Espaloma,直接从量子化学计算或固体数据中学习力场模型。该方法在包括小分子、蛋白质和 RNA 等化学空间中进行了训练。* Results: 所得到的力场模型,即 espaloma 0.3.0,能够准确预测量子化学能量和力,并稳定地保持量子化学能量最小化的结构。此外,该方法还能够高精度地预测蛋白质-抗体复合物的绑定自由能。该方法显示出了在新的化学领域中建立更加准确的力场模型的潜在优势。Abstract
Molecular mechanics (MM) force fields -- the models that characterize the energy landscape of molecular systems via simple pairwise and polynomial terms -- have traditionally relied on human expert-curated, inflexible, and poorly extensible discrete chemical parameter assignment rules, namely atom or valence types. Recently, there has been significant interest in using graph neural networks to replace this process, while enabling the parametrization scheme to be learned in an end-to-end differentiable manner directly from quantum chemical calculations or condensed-phase data. In this paper, we extend the Espaloma end-to-end differentiable force field construction approach by incorporating both energy and force fitting directly to quantum chemical data into the training process. Building on the OpenMM SPICE dataset, we curate a dataset containing chemical spaces highly relevant to the broad interest of biomolecular modeling, covering small molecules, proteins, and RNA. The resulting force field, espaloma 0.3.0, self-consistently parametrizes these diverse biomolecular species, accurately predicts quantum chemical energies and forces, and maintains stable quantum chemical energy-minimized geometries. Surprisingly, this simple approach produces highly accurate protein-ligand binding free energies when self-consistently parametrizing protein and ligand. This approach -- capable of fitting new force fields to large quantum chemical datasets in one GPU-day -- shows significant promise as a path forward for building systematically more accurate force fields that can be easily extended to new chemical domains of interest.
摘要
分子机械力场(MM)力场模型 -- 用简单的对拟和多项式关系来描述分子系统的能量景观的模型 -- 传统上依赖人工专家 manually curated 和不可靠的化学参数赋值规则,即原子或Valence 类型。在最近几年,有一个重要的兴趣是使用图 neural networks 取代这个过程,以使 parametrization 算法可以在一个端到端可微分的方式直接从量子化学计算或压缩物理数据中学习。在这篇文章中,我们扩展了Espaloma 端到端可微分力场建构方法,并将能量和力数据直接适应到量子化学计算中。基于 OpenMM SPICE 数据集,我们筹集了一个包含高度相关的生物分子模型化学species的数据集,包括小分子、蛋白质和 RNA。得到的力场,Espaloma 0.3.0,自 consistently 参数化这些多样化的生物分子物种,准确预测量子化学能量和力,并维持稳定的量子化学能量最小化几何。Surprisingly,这种简单的方法可以高度准确地预测蛋白质- ligand 绑定自由能量,当自 consistently 参数化蛋白质和 ligand 时。这种方法 -- 可以在一个 GPU-day 内适应新的量子化学数据集 -- 显示出了重要的承诺,作为在新化学领域中建立更准确的力场的可行之路。
Safe Reinforcement Learning as Wasserstein Variational Inference: Formal Methods for Interpretability
results: 该论文通过在实验和真实机器人任务中的训练表现,证明了AWaVO方法的可行性和稳定性,并实际地证明了一种合理的性能与保守可观察性之间的交易。Abstract
Reinforcement Learning or optimal control can provide effective reasoning for sequential decision-making problems with variable dynamics. Such reasoning in practical implementation, however, poses a persistent challenge in interpreting the reward function and corresponding optimal policy. Consequently, formalizing the sequential decision-making problems as inference has a considerable value, as probabilistic inference in principle offers diverse and powerful mathematical tools to infer the stochastic dynamics whilst suggesting a probabilistic interpretation of the reward design and policy convergence. In this study, we propose a novel Adaptive Wasserstein Variational Optimization (AWaVO) to tackle these challenges in sequential decision-making. Our approach utilizes formal methods to provide interpretations of reward design, transparency of training convergence, and probabilistic interpretation of sequential decisions. To demonstrate practicality, we show convergent training with guaranteed global convergence rates not only in simulation but also in real robot tasks, and empirically verify a reasonable tradeoff between high performance and conservative interpretability.
摘要
“强化学习或最佳控制可以提供有效的推理 для累缲问题,具有变化的动态。然而,实际实现中,评估奖函数和对应的优质策略实际上具有挑战。因此,将累缲问题正式化为推理具有很大的价值,因为概率推理在原理上提供了多元化和强大的数学工具,以推断随机动态,并提供了累缲决策的概率解释。在本研究中,我们提出了一种新的适应 Wasserstein 统计Optimization(AWaVO),以解决这些挑战。我们的方法使用正式方法,提供奖函数设计的解释、训练调合的透明度和累缲决策的概率解释。为证明实用性,我们在模拟和实际 robot 任务中显示了执行调合的训练,并证明了保证全球调合速率不仅在模拟中实现,而且在实际任务中也实现了。此外,我们还证明了累缲决策的合理交换,即高性能和保守解释之间的平衡。”
Vertex-based Networks to Accelerate Path Planning Algorithms
results: 通过对 randomly generated floor maps 进行实验,所提出的解决方案可以实现超过400%的速度提升,与基eline模型相比。Abstract
Path planning plays a crucial role in various autonomy applications, and RRT* is one of the leading solutions in this field. In this paper, we propose the utilization of vertex-based networks to enhance the sampling process of RRT*, leading to more efficient path planning. Our approach focuses on critical vertices along the optimal paths, which provide essential yet sparser abstractions of the paths. We employ focal loss to address the associated data imbalance issue, and explore different masking configurations to determine practical tradeoffs in system performance. Through experiments conducted on randomly generated floor maps, our solutions demonstrate significant speed improvements, achieving over a 400% enhancement compared to the baseline model.
摘要
<>将文本翻译成简化中文。<>路径规划在各种自动化应用中扮演着关键性的角色,RRT* 是该领域的一种领先解决方案。在这篇论文中,我们提议通过顶点基于网络来增强 RRT* 的抽象过程,从而实现更有效的路径规划。我们的方法是关注优质路径上的关键顶点,这些顶点提供了关键的 yet 稀疏的路径抽象。我们使用焦点损失来解决相关的数据不均衡问题,并 explore 不同的masking配置来确定实际的系统性能质量。通过在随机生成的 floor 图上进行的实验,我们的解决方案表明了明显的速度提高,相比基eline 模型,实现了超过 400% 的提高。
A metric learning approach for endoscopic kidney stone identification
paper_authors: Jorge Gonzalez-Zapata, Francisco Lopez-Tiro, Elias Villalvazo-Avila, Daniel Flores-Araiza, Jacques Hubert, Andres Mendez-Vazquez, Gilberto Ochoa-Ruiz, Christian Daul
for: automatic identification of kidney stones during ureteroscopy to enable rapid therapeutic decisions
methods: Deep Metric Learning (DML) methods, including a novel architecture and a teacher-student approach with Knowledge Distillation
results: improved identification accuracy by 10-12% compared to Deep Learning (DL) methods and other DML approaches, and up to 30% compared to shallow machine learning methodsAbstract
Several Deep Learning (DL) methods have recently been proposed for an automated identification of kidney stones during an ureteroscopy to enable rapid therapeutic decisions. Even if these DL approaches led to promising results, they are mainly appropriate for kidney stone types for which numerous labelled data are available. However, only few labelled images are available for some rare kidney stone types. This contribution exploits Deep Metric Learning (DML) methods i) to handle such classes with few samples, ii) to generalize well to out of distribution samples, and iii) to cope better with new classes which are added to the database. The proposed Guided Deep Metric Learning approach is based on a novel architecture which was designed to learn data representations in an improved way. The solution was inspired by Few-Shot Learning (FSL) and makes use of a teacher-student approach. The teacher model (GEMINI) generates a reduced hypothesis space based on prior knowledge from the labeled data, and is used it as a guide to a student model (i.e., ResNet50) through a Knowledge Distillation scheme. Extensive tests were first performed on two datasets separately used for the recognition, namely a set of images acquired for the surfaces of the kidney stone fragments, and a set of images of the fragment sections. The proposed DML-approach improved the identification accuracy by 10% and 12% in comparison to DL-methods and other DML-approaches, respectively. Moreover, model embeddings from the two dataset types were merged in an organized way through a multi-view scheme to simultaneously exploit the information of surface and section fragments. Test with the resulting mixed model improves the identification accuracy by at least 3% and up to 30% with respect to DL-models and shallow machine learning methods, respectively.
摘要
Recently, several Deep Learning (DL) methods have been proposed for automated identification of kidney stones during ureteroscopy to enable rapid therapeutic decisions. Although these DL approaches have shown promising results, they are mainly suitable for kidney stone types with abundant labeled data. However, there are few labeled images available for rare kidney stone types. To address this challenge, this study exploits Deep Metric Learning (DML) methods to handle such classes with few samples, generalize well to out-of-distribution samples, and cope better with new classes added to the database.The proposed Guided Deep Metric Learning approach is based on a novel architecture that learns data representations in an improved way. The solution is inspired by Few-Shot Learning (FSL) and uses a teacher-student approach. The teacher model (GEMINI) generates a reduced hypothesis space based on prior knowledge from labeled data and guides a student model (i.e., ResNet50) through a Knowledge Distillation scheme.Extensive tests were performed on two datasets separately used for recognition, namely a set of images of the surfaces of kidney stone fragments and a set of images of fragment sections. The proposed DML-approach improved identification accuracy by 10% and 12% compared to DL-methods and other DML-approaches, respectively. Moreover, model embeddings from the two dataset types were merged in an organized way through a multi-view scheme to simultaneously exploit the information of surface and section fragments. Tests with the resulting mixed model improved identification accuracy by at least 3% and up to 30% compared to DL-models and shallow machine learning methods, respectively.
Leveraging Factored Action Spaces for Off-Policy Evaluation
results: 提出了一种新的“分解”重样 estimator,并证明了这种 estimator 具有较低的偏差和较高的稳定性,同时保持零偏差性。通过实验验证了这些理论结果,并证明了假设的有效性。Abstract
Off-policy evaluation (OPE) aims to estimate the benefit of following a counterfactual sequence of actions, given data collected from executed sequences. However, existing OPE estimators often exhibit high bias and high variance in problems involving large, combinatorial action spaces. We investigate how to mitigate this issue using factored action spaces i.e. expressing each action as a combination of independent sub-actions from smaller action spaces. This approach facilitates a finer-grained analysis of how actions differ in their effects. In this work, we propose a new family of "decomposed" importance sampling (IS) estimators based on factored action spaces. Given certain assumptions on the underlying problem structure, we prove that the decomposed IS estimators have less variance than their original non-decomposed versions, while preserving the property of zero bias. Through simulations, we empirically verify our theoretical results, probing the validity of various assumptions. Provided with a technique that can derive the action space factorisation for a given problem, our work shows that OPE can be improved "for free" by utilising this inherent problem structure.
摘要
<> translate "Off-policy evaluation (OPE) aims to estimate the benefit of following a counterfactual sequence of actions, given data collected from executed sequences. However, existing OPE estimators often exhibit high bias and high variance in problems involving large, combinatorial action spaces. We investigate how to mitigate this issue using factored action spaces i.e. expressing each action as a combination of independent sub-actions from smaller action spaces. This approach facilitates a finer-grained analysis of how actions differ in their effects. In this work, we propose a new family of "decomposed" importance sampling (IS) estimators based on factored action spaces. Given certain assumptions on the underlying problem structure, we prove that the decomposed IS estimators have less variance than their original non-decomposed versions, while preserving the property of zero bias. Through simulations, we empirically verify our theoretical results, probing the validity of various assumptions. Provided with a technique that can derive the action space factorisation for a given problem, our work shows that OPE can be improved "for free" by utilising this inherent problem structure.">>Here's the translation in Simplified Chinese:<>Off-policy evaluation (OPE) 目标是估计在执行不同的行动序列后的利益,基于已经执行过的数据。然而,现有的 OPE 估计器经常在具有大量结合型动作空间的问题中表现出高偏差和高方差。我们研究如何使用分解动作空间来降低这些问题。我们提出了一种基于分解动作空间的 "分解" 重要抽样(IS) 估计器。对于满足某些假设,我们证明了这种分解 IS 估计器 的方差比原始非分解版本更低,同时保持零偏差性。通过实验,我们证明了我们的理论结论,检验了各种假设的有效性。如果可以得到一种可以 derive 动作空间分解的技术,我们的工作显示了 OPE 可以免费地改进,通过利用这种问题的内在结构。
Classical Out-of-Distribution Detection Methods Benchmark in Text Classification Tasks
results: 我们的分析显示,现有的NLP任务中的OOD检测方法并不够敏感,无法捕捉各种分布偏移的样本。特别是背景偏移和随机排序的字符串内域文本测试场景是最大的挑战。这说明未来的研究应该更加注重开发更有效的OOD检测方法,而我们的研究提供了一个充分定义的研究基础。Abstract
State-of-the-art models can perform well in controlled environments, but they often struggle when presented with out-of-distribution (OOD) examples, making OOD detection a critical component of NLP systems. In this paper, we focus on highlighting the limitations of existing approaches to OOD detection in NLP. Specifically, we evaluated eight OOD detection methods that are easily integrable into existing NLP systems and require no additional OOD data or model modifications. One of our contributions is providing a well-structured research environment that allows for full reproducibility of the results. Additionally, our analysis shows that existing OOD detection methods for NLP tasks are not yet sufficiently sensitive to capture all samples characterized by various types of distributional shifts. Particularly challenging testing scenarios arise in cases of background shift and randomly shuffled word order within in domain texts. This highlights the need for future work to develop more effective OOD detection approaches for the NLP problems, and our work provides a well-defined foundation for further research in this area.
摘要
现代模型在控制环境下可以表现出色,但它们在不同类型的 Distributional Shift 下陷入困难,因此 OOD 检测成为 NLP 系统的关键组成部分。在这篇论文中,我们关注现有 OOD 检测方法在 NLP 领域的局限性。我们评估了八种可以容易地集成到现有 NLP 系统中的 OOD 检测方法,不需要额外的 OOD 数据或模型修改。我们的贡献之一是提供了一个具有完整可重现性的研究环境。此外,我们的分析表明,现有的 OOD 检测方法在 NLP 任务中并不够敏感,无法捕捉所有类型的 Distributional Shift 下的样本。特别是在背景变化和随机排序的情况下,OOD 检测方法表现出特别困难。这种情况 highlights 未来的研究应该更加注重开发更有效的 OOD 检测方法,我们的工作提供了一个完善的基础 для进一步的研究。
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
results: 本论文的方法可以在约 20 秒内进行个人化,比 DreamBooth 快 25 倍,比 Textual Inversion 快 125 倍,并且可以从单一图像中生成高质量和多样化的个人化 faces,而且模型的大小仅有 DreamBooth 的 1/10000。Abstract
Personalization has emerged as a prominent aspect within the field of generative AI, enabling the synthesis of individuals in diverse contexts and styles, while retaining high-fidelity to their identities. However, the process of personalization presents inherent challenges in terms of time and memory requirements. Fine-tuning each personalized model needs considerable GPU time investment, and storing a personalized model per subject can be demanding in terms of storage capacity. To overcome these challenges, we propose HyperDreamBooth-a hypernetwork capable of efficiently generating a small set of personalized weights from a single image of a person. By composing these weights into the diffusion model, coupled with fast finetuning, HyperDreamBooth can generate a person's face in various contexts and styles, with high subject details while also preserving the model's crucial knowledge of diverse styles and semantic modifications. Our method achieves personalization on faces in roughly 20 seconds, 25x faster than DreamBooth and 125x faster than Textual Inversion, using as few as one reference image, with the same quality and style diversity as DreamBooth. Also our method yields a model that is 10000x smaller than a normal DreamBooth model. Project page: https://hyperdreambooth.github.io
摘要
个人化在生成人工智能领域中已经成为一个显著的特征,允许在不同的上下文和风格中生成个性化的人脸,保持高度准确性。然而,个人化过程存在内存和时间的挑战,每个个性化模型都需要较长的GPU时间投资,并且每个人需要存储一个个性化模型,这会占用更多的存储容量。为了解决这些挑战,我们提出了HyperDreamBooth,一个具有高效生成少量个性化参数的超网络。通过这些参数与扩散模型的组合,加速了个性化,可以在20秒钟内生成具有高级精度和多样化风格的人脸,与DreamBooth和Textual Inversion相当,但快速得多,使用的参数少得多,模型的大小也很小。更多信息请参考:
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
results: 在五个大规模数据集(Kinetics-400、Kinetics-600、SS-v2、Diving-48和ActivityNet-1.3)上,Video-FocalNets模型表现出色,与现有的Transformer模型相比,具有更高的效率和更好的性能。Abstract
Recent video recognition models utilize Transformer models for long-range spatio-temporal context modeling. Video transformer designs are based on self-attention that can model global context at a high computational cost. In comparison, convolutional designs for videos offer an efficient alternative but lack long-range dependency modeling. Towards achieving the best of both designs, this work proposes Video-FocalNet, an effective and efficient architecture for video recognition that models both local and global contexts. Video-FocalNet is based on a spatio-temporal focal modulation architecture that reverses the interaction and aggregation steps of self-attention for better efficiency. Further, the aggregation step and the interaction step are both implemented using efficient convolution and element-wise multiplication operations that are computationally less expensive than their self-attention counterparts on video representations. We extensively explore the design space of focal modulation-based spatio-temporal context modeling and demonstrate our parallel spatial and temporal encoding design to be the optimal choice. Video-FocalNets perform favorably well against the state-of-the-art transformer-based models for video recognition on five large-scale datasets (Kinetics-400, Kinetics-600, SS-v2, Diving-48, and ActivityNet-1.3) at a lower computational cost. Our code/models are released at https://github.com/TalalWasim/Video-FocalNets.
摘要
近期的视频识别模型通过Transformer模型来实现长距离空间temporal上下文模型。视频转换设计基于自注意的自我注意可以模型全局上下文,但是计算成本较高。相比之下,图像设计对视频提供了一个有效的alternative,但缺少长距离依赖关系模型。为了实现两种设计的优点,本工作提出了Video-FocalNet,一种高效和经济的视频识别模型。Video-FocalNet基于空间temporal关注模块,该模块通过逆转自注意的互动和聚合步骤来提高效率。此外,聚合步骤和互动步骤都使用了高效的卷积和元素乘法操作,这些操作相比于自注意操作更加经济。我们广泛探索了关注模块基于空间temporal上下文模型的设计空间,并证明我们的并行空间和时间编码设计是最佳选择。Video-FocalNet在五个大规模数据集(Kinetics-400、Kinetics-600、SS-v2、Diving-48和ActivityNet-1.3)上与当前的Transformer模型进行视频识别,性能较高,计算成本较低。我们的代码和模型在https://github.com/TalalWasim/Video-FocalNets上发布。
In-context Autoencoder for Context Compression in a Large Language Model
for: solves the long context problem in large language models (LLMs) by compressing a long context into a limited number of memory slots.
methods: uses an In-context Autoencoder (ICAE) with two modules: a learnable encoder adapted with LoRA from an LLM, and a fixed decoder which is the target LLM. The ICAE is pretrained using both autoencoding and language modeling objectives, and then fine-tuned on a small amount of instruct data to enhance its interaction with various prompts.
results: effectively produces memory slots with $4\times$ context compression, which can be well conditioned on by the target LLM to respond to various prompts.Abstract
We propose the In-context Autoencoder (ICAE) for context compression in a large language model (LLM). The ICAE has two modules: a learnable encoder adapted with LoRA from an LLM for compressing a long context into a limited number of memory slots, and a fixed decoder which is the target LLM that can condition on the memory slots for various purposes. We first pretrain the ICAE using both autoencoding and language modeling objectives on massive text data, enabling it to generate memory slots that accurately and comprehensively represent the original context. Then, we fine-tune the pretrained ICAE on a small amount of instruct data to enhance its interaction with various prompts for producing desirable responses. Our experimental results demonstrate that the ICAE learned with our proposed pretraining and fine-tuning paradigm can effectively produce memory slots with $4\times$ context compression, which can be well conditioned on by the target LLM to respond to various prompts. The promising results demonstrate significant implications of the ICAE for its novel approach to the long context problem and its potential to reduce computation and memory overheads for LLM inference in practice, suggesting further research effort in context management for an LLM. Our code and data will be released shortly.
摘要
我们提议使用内容压缩 autoencoder(ICAE)来解决大型语言模型(LLM)中的长度问题。ICAE有两个模组:一个可学习的Encoder,从LLM中获取丰富的文本数据,将长 context 压缩到有限的内存槽中,以及一个固定的Decoder,可以根据内存槽进行多种目的的处理。我们首先将ICAE使用自动编码和语言模型目标函数进行预训练,将其训练到从大量文本数据中获取的memory slots可以准确地和全面地表示原始上下文。然后,我们精确地调整预训练后的ICAE,以便在不同的提示下生成适当的回应。我们的实验结果表明,透过我们提出的预训练和精确调整方法,ICAE可以实现$4\times$的context压缩,可以很好地被目标LLM所条件。这些成果表明ICAE具有novel的方法并且具有实际应用中LLM计算和内存负载的减少之能力,因此需要进一步的研究,以解决LLM中的长度问题。我们将在短时间内发布代码和数据。
On the Connection between Game-Theoretic Feature Attributions and Counterfactual Explanations
results: 我们的研究显示,在满足某些条件时,game-theoretic feature attributions和对冲推论解释是等价的。此外,我们还发现了对冲推论解释的局限性,其不能准确地提供特征重要性。我们的实验结果表明,在不同的数据集上,使用不同的方法可以得到不同的解释结果。Abstract
Explainable Artificial Intelligence (XAI) has received widespread interest in recent years, and two of the most popular types of explanations are feature attributions, and counterfactual explanations. These classes of approaches have been largely studied independently and the few attempts at reconciling them have been primarily empirical. This work establishes a clear theoretical connection between game-theoretic feature attributions, focusing on but not limited to SHAP, and counterfactuals explanations. After motivating operative changes to Shapley values based feature attributions and counterfactual explanations, we prove that, under conditions, they are in fact equivalent. We then extend the equivalency result to game-theoretic solution concepts beyond Shapley values. Moreover, through the analysis of the conditions of such equivalence, we shed light on the limitations of naively using counterfactual explanations to provide feature importances. Experiments on three datasets quantitatively show the difference in explanations at every stage of the connection between the two approaches and corroborate the theoretical findings.
摘要
“几年前,可解释人工智能(XAI)已经受到了广泛的关注,而feature attributions和counterfactual explanations是最受欢迎的两种解释方法。这两种方法在过去一直被研究独立,只有一些基于实践的尝试了它们的结合。本文确立了game-theoretic feature attributions和counterfactual explanations之间的具体理论连接,并且修改了Shapley值基础的feature attributions和counterfactual explanations,以便实现它们的相等。此外,我们还延伸了这个相等结果至游戏理论解决方案的其他概念,并且通过分析这些条件的限制,实际阐明了使用counterfactual explanations提供特征重要性的局限性。实验结果显示,这两种方法在connection中的解释有所不同,并且与理论结果相符。”Here is a word-for-word translation of the text into Simplified Chinese:“几年前,可解释人工智能(XAI)已经受到了广泛的关注,而feature attributions和counterfactual explanations是最受欢迎的两种解释方法。这两种方法在过去一直被研究独立,只有一些基于实践的尝试了它们的结合。本文确立了game-theoretic feature attributions和counterfactual explanations之间的具体理论连接,并且修改了Shapley值基础的feature attributions和counterfactual explanations,以便实现它们的相等。此外,我们还延伸了这个相等结果至游戏理论解决方案的其他概念。”
DRAGON: A Dialogue-Based Robot for Assistive Navigation with Visual Language Grounding
results: 经过用户实验,发现DRAGON能够与用户通信流畅,提供好的引导体验,并将用户联系到周遭环境的意义性资讯。Abstract
Persons with visual impairments (PwVI) have difficulties understanding and navigating spaces around them. Current wayfinding technologies either focus solely on navigation or provide limited communication about the environment. Motivated by recent advances in visual-language grounding and semantic navigation, we propose DRAGON, a guiding robot powered by a dialogue system and the ability to associate the environment with natural language. By understanding the commands from the user, DRAGON is able to guide the user to the desired landmarks on the map, describe the environment, and answer questions from visual observations. Through effective utilization of dialogue, the robot can ground the user's free-form descriptions to landmarks in the environment, and give the user semantic information through spoken language. We conduct a user study with blindfolded participants in an everyday indoor environment. Our results demonstrate that DRAGON is able to communicate with the user smoothly, provide a good guiding experience, and connect users with their surrounding environment in an intuitive manner.
摘要
人们视障 (PwVI) 在周围环境中有困难理解和导航。当前的导航技术ether solely focus on navigation or provide limited environmental information. 鼓动于最近的视觉语言固定和semantic navigation技术的进步,我们提议了DRAGON,一种带有对话系统的导航机器人。通过理解用户的命令,DRAGON可以引导用户到地图上的目标点,描述环境,并根据视觉观察回答问题。通过对话的有效利用,机器人可以将用户的自由形式描述与环境中的标志点相关联,并通过语音提供Semantic information。我们在每天的室内环境中进行了盲人参与者的用户研究。我们的结果表明,DRAGON能够与用户交流平滑,提供良好的引导体验,并通过语音连接用户与周围环境的INTUITIVE manner。
LLM-assisted Knowledge Graph Engineering: Experiments with ChatGPT
paper_authors: Lars-Peter Meyer, Claus Stadler, Johannes Frey, Norman Radtke, Kurt Junghanns, Roy Meissner, Gordian Dziwis, Kirill Bulert, Michael Martin
methods: 该 paper 使用了 ChatGPT 进行了详细的实验,以 explore its potential in supporting KGE。
results: 实验结果表明,ChatGPT 可以帮助开发和管理 Knowledge Graphs (KGs),并且可以提高 KGE 的效率和质量。Abstract
Knowledge Graphs (KG) provide us with a structured, flexible, transparent, cross-system, and collaborative way of organizing our knowledge and data across various domains in society and industrial as well as scientific disciplines. KGs surpass any other form of representation in terms of effectiveness. However, Knowledge Graph Engineering (KGE) requires in-depth experiences of graph structures, web technologies, existing models and vocabularies, rule sets, logic, as well as best practices. It also demands a significant amount of work. Considering the advancements in large language models (LLMs) and their interfaces and applications in recent years, we have conducted comprehensive experiments with ChatGPT to explore its potential in supporting KGE. In this paper, we present a selection of these experiments and their results to demonstrate how ChatGPT can assist us in the development and management of KGs.
摘要
知识图(KG)为我们提供一种结构化、灵活、透明、跨系统、合作的知识和数据组织方式,可以涵盖社会、工业以及科学领域的多种领域。KG比任何其他表示方式更有效。然而,知识图工程(KGE)需要深厚的图结构、网络技术、现有模型和词汇、规则集、逻辑以及最佳实践的经验。它还需要大量的劳动。鉴于大语言模型(LLM)的发展和其界面和应用在最近几年,我们已经进行了全面的实验,用ChatGPT探索其在支持KGE方面的潜力。本文介绍了这些实验的选择和结果,以示出ChatGPT如何帮助我们在开发和管理知识图方面。
Uncovering Unique Concept Vectors through Latent Space Decomposition
results: 经过广泛的实验表明,大多数提取的概念向量都是人类可理解的,具有准确性和一致性,并与任务有直接关系。此外,该方法在数据集探索中也有remarkable的实用性,可以成功地找到训练数据中的偏移和杂音样本。Abstract
Interpreting the inner workings of deep learning models is crucial for establishing trust and ensuring model safety. Concept-based explanations have emerged as a superior approach that is more interpretable than feature attribution estimates such as pixel saliency. However, defining the concepts for the interpretability analysis biases the explanations by the user's expectations on the concepts. To address this, we propose a novel post-hoc unsupervised method that automatically uncovers the concepts learned by deep models during training. By decomposing the latent space of a layer in singular vectors and refining them by unsupervised clustering, we uncover concept vectors aligned with directions of high variance that are relevant to the model prediction, and that point to semantically distinct concepts. Our extensive experiments reveal that the majority of our concepts are readily understandable to humans, exhibit coherency, and bear relevance to the task at hand. Moreover, we showcase the practical utility of our method in dataset exploration, where our concept vectors successfully identify outlier training samples affected by various confounding factors. This novel exploration technique has remarkable versatility to data types and model architectures and it will facilitate the identification of biases and the discovery of sources of error within training data.
摘要
深度学习模型的内部工作方式的解释是建立信任和确保模型安全的关键。概念基于的解释方法在可解释性方面胜过特征归因估计如像素感知。然而,为了定义概念,用户的期望会影响解释。为解决这个问题,我们提出了一种新的后续无监督方法,可以自动揭示深度模型在训练过程中学习的概念。我们将层的秘密空间拆分成单值特征,然后通过无监督归一化来精细化概念向量,从而找到与高差值方向相关的概念向量,这些向量对应于semantically meaningful的概念。我们的广泛实验表明,大多数我们的概念都可以被人类理解,具有凝聚性,并与任务相关。此外,我们还证明了我们的方法在数据集探索中的实用性,我们的概念向量可以成功地标识在训练数据中受到各种干扰因素的异常训练样本。这种新的探索技术具有数据类型和模型结构的强大可 versatility,可以帮助确定模型中的偏见和训练数据中的错误来源。
Generating Benchmarks for Factuality Evaluation of Language Models
results: 这个论文使用 FACTOR 方法创建了两个 benchmark:Wiki-FACTOR 和 News-FACTOR。研究发现,(i) benchmark 分数与模型大小相关,并且当LM被增强后,其分数会提高。(ii) benchmark 分数与沟通能力之间存在正相关关系,但这两个指标不总是同时同意模型排名。(iii) 当沟通能力和 benchmark 分数不同时,后者更好地反映了 LM 在开放生成中的 фактиче正确性,这种反馈来自于人工标注员。Abstract
Before deploying a language model (LM) within a given domain, it is important to measure its tendency to generate factually incorrect information in that domain. Existing factual generation evaluation methods focus on facts sampled from the LM itself, and thus do not control the set of evaluated facts and might under-represent rare and unlikely facts. We propose FACTOR: Factual Assessment via Corpus TransfORmation, a scalable approach for evaluating LM factuality. FACTOR automatically transforms a factual corpus of interest into a benchmark evaluating an LM's propensity to generate true facts from the corpus vs. similar but incorrect statements. We use our framework to create two benchmarks: Wiki-FACTOR and News-FACTOR. We show that: (i) our benchmark scores increase with model size and improve when the LM is augmented with retrieval; (ii) benchmark score correlates with perplexity, but the two metrics do not always agree on model ranking; and (iii) when perplexity and benchmark score disagree, the latter better reflects factuality in open-ended generation, as measured by human annotators. We make our data and code publicly available in https://github.com/AI21Labs/factor.
摘要
Benchmark scores increase with model size and improve when the LM is augmented with retrieval.2. Benchmark score correlates with perplexity, but the two metrics do not always agree on model ranking.3. When perplexity and benchmark score disagree, the latter better reflects factuality in open-ended generation, as measured by human annotators.We make our data and code publicly available at https://github.com/AI21Labs/factor.
Sequential Monte Carlo Learning for Time Series Structure Discovery
paper_authors: Feras A. Saad, Brian J. Patton, Matthew D. Hoffman, Rif A. Saurous, Vikash K. Mansinghka
for: automatizilly discover accurate models of complex time series data
methods: Bayesian nonparametric prior over Gaussian process time series models, integrates sequential Monte Carlo (SMC) and involutive MCMC for posterior inference
results: delivers 10x-100x runtime speedups over previous MCMC and greedy-search structure learning algorithms, and discovers sensible models that deliver more accurate point forecasts and interval forecasts compared to statistical and neural baselines.Abstract
This paper presents a new approach to automatically discovering accurate models of complex time series data. Working within a Bayesian nonparametric prior over a symbolic space of Gaussian process time series models, we present a novel structure learning algorithm that integrates sequential Monte Carlo (SMC) and involutive MCMC for highly effective posterior inference. Our method can be used both in "online" settings, where new data is incorporated sequentially in time, and in "offline" settings, by using nested subsets of historical data to anneal the posterior. Empirical measurements on real-world time series show that our method can deliver 10x--100x runtime speedups over previous MCMC and greedy-search structure learning algorithms targeting the same model family. We use our method to perform the first large-scale evaluation of Gaussian process time series structure learning on a prominent benchmark of 1,428 econometric datasets. The results show that our method discovers sensible models that deliver more accurate point forecasts and interval forecasts over multiple horizons as compared to widely used statistical and neural baselines that struggle on this challenging data.
摘要
Deep reinforcement learning for the dynamic vehicle dispatching problem: An event-based approach
for: This paper is written for solving the dynamic vehicle dispatching problem, which is a problem of assigning vehicles to requests that arise stochastically over time and space.
methods: The paper uses a semi-Markov decision process to model the problem, which allows for treating time as continuous and reduces the combinatorial complexity of the decision space. The authors also use double deep q-learning to train their decision agents.
results: The authors’ policies exhibit better average waiting times, cancellation rates, and total service times compared to heuristic policies often used in practice, with a reduction in average waiting times of up to 50% relative to the other tested heuristic policies.Here is the text in Simplified Chinese:
results: 作者的策略比常用的各种规则更好,waiting time、取消率和总服务时间都得到了改善,waiting time的减少达50%以上。Abstract
The dynamic vehicle dispatching problem corresponds to deciding which vehicles to assign to requests that arise stochastically over time and space. It emerges in diverse areas, such as in the assignment of trucks to loads to be transported; in emergency systems; and in ride-hailing services. In this paper, we model the problem as a semi-Markov decision process, which allows us to treat time as continuous. In this setting, decision epochs coincide with discrete events whose time intervals are random. We argue that an event-based approach substantially reduces the combinatorial complexity of the decision space and overcomes other limitations of discrete-time models often proposed in the literature. In order to test our approach, we develop a new discrete-event simulator and use double deep q-learning to train our decision agents. Numerical experiments are carried out in realistic scenarios using data from New York City. We compare the policies obtained through our approach with heuristic policies often used in practice. Results show that our policies exhibit better average waiting times, cancellation rates and total service times, with reduction in average waiting times of up to 50% relative to the other tested heuristic policies.
摘要
这个动态车辆分配问题与决定在时间和空间上随机出现的请求中分配车辆相关。它出现在不同领域,如货物运输、紧急系统和乘用车服务等。在这篇论文中,我们将问题模型为半Markov决策过程,这使得我们可以将时间视为连续的。在这种设定下,决策瞬间与随机时间间隔相匹配,我们认为事件基本方法可以减少决策空间的复杂度,并且超越了常见的 discrete-time 模型。为了测试我们的方法,我们开发了一个新的离散事件仿真器,并使用双层深度Q学习来训练我们的决策代理人。我们在使用实际数据从纽约市进行数值实验,并与常见的各种各样的策略进行比较。结果显示,我们的策略可以减少待命时间的平均值,最多减少50%,相比其他测试的各种各样的策略。
The complexity of non-stationary reinforcement learning
for: solves the problem of continual learning in reinforcement learning, specifically the non-stationary reinforcement learning challenge.
methods: uses a worst-case complexity result to prove that modifying probabilities or rewards in a reinforcement learning problem requires a significant amount of time, unless the strong exponential time hypothesis (SETH) is false.
results: shows that adding a new state-action pair is much easier to implement than modifying existing probabilities or rewards.Abstract
The problem of continual learning in the domain of reinforcement learning, often called non-stationary reinforcement learning, has been identified as an important challenge to the application of reinforcement learning. We prove a worst-case complexity result, which we believe captures this challenge: Modifying the probabilities or the reward of a single state-action pair in a reinforcement learning problem requires an amount of time almost as large as the number of states in order to keep the value function up to date, unless the strong exponential time hypothesis (SETH) is false; SETH is a widely accepted strengthening of the P $\neq$ NP conjecture. Recall that the number of states in current applications of reinforcement learning is typically astronomical. In contrast, we show that just $\textit{adding}$ a new state-action pair is considerably easier to implement.
摘要
“对于回传学习领域中的持续学习问题(通常称为非站ARY reinforcement learning),我们证明了一个最差情况的复杂性结果:对某个状态动作 pairs modify 概率或奖励,需要大约与状态数量相当的时间才能保持值函数更新, Unless SETH 是 false;SETK 是一个广泛accepted的对 P $\neq$ NP 推论的强化。请注意,现在的实际应用中的状态数量通常是惊人的。然而,我们显示了仅增加一个新的状态动作 pairs 是可以轻松实现的。”Note: SETH stands for "Strong Exponential Time Hypothesis" and P $\neq$ NP is a famous open problem in computer science.
Embodied Lifelong Learning for Task and Motion Planning
results: 论文在 simulated 2D 领域和 BEHAVIOR 数据集上的规划成功率显示了明显的提高。Abstract
A robot deployed in a home over long stretches of time faces a true lifelong learning problem. As it seeks to provide assistance to its users, the robot should leverage any accumulated experience to improve its own knowledge to become a more proficient assistant. We formalize this setting with a novel lifelong learning problem formulation in the context of learning for task and motion planning (TAMP). Exploiting the modularity of TAMP systems, we develop a generative mixture model that produces candidate continuous parameters for a planner. Whereas most existing lifelong learning approaches determine a priori how data is shared across task models, our approach learns shared and non-shared models and determines which to use online during planning based on auxiliary tasks that serve as a proxy for each model's understanding of a state. Our method exhibits substantial improvements in planning success on simulated 2D domains and on several problems from the BEHAVIOR benchmark.
摘要
一个机器人在家中长时间内部署,面临着一个真正的一生学习问题。作为它提供帮助的用户,机器人应该利用已有经验来提高自己的知识,成为更有效的帮手。我们将这种设定形式化为一种新的一生学习问题形式,在任务和动作规划(TAMP)上下文中。利用TAMP系统的模块性,我们开发了一种生成混合模型,生成候选的连续参数 для плаanner。大多数现有的一生学习方法在数据分享上做出了决定,而我们的方法在线上决定使用共享和非共享模型,并根据辅助任务来选择使用哪些模型,以便在规划中使用。我们的方法在 simulated 2D 领域和 BEHAVIOR benchmark 上表现出了显著的改善。
DecompEval: Evaluating Generated Texts as Unsupervised Decomposed Question Answering
for: 这 paper 的目的是提出一种简单 yet effective 的自然语言生成(NLG)评估 metric,它可以增强评估过程的普适性和可读性。
methods: 这 paper 使用了 instruction-style 问题回答任务来形式NLG评估,并使用了 instruction-tuned pre-trained language models(PLMs),不需要训练在评估数据集上,以提高普适性。另外,这 paper 还 decomposes 了自己的设计的 instruction-style 问题,以获得更好的解释性。
results: 实验结果表明,DecompEval 可以在不需要训练的情况下,在 text summarization 和对话生成 等 NLG 任务中 achieve state-of-the-art 性能,同时也能够具备强大的特征级 / 任务级普适性和可读性。Abstract
Existing evaluation metrics for natural language generation (NLG) tasks face the challenges on generalization ability and interpretability. Specifically, most of the well-performed metrics are required to train on evaluation datasets of specific NLG tasks and evaluation dimensions, which may cause over-fitting to task-specific datasets. Furthermore, existing metrics only provide an evaluation score for each dimension without revealing the evidence to interpret how this score is obtained. To deal with these challenges, we propose a simple yet effective metric called DecompEval. This metric formulates NLG evaluation as an instruction-style question answering task and utilizes instruction-tuned pre-trained language models (PLMs) without training on evaluation datasets, aiming to enhance the generalization ability. To make the evaluation process more interpretable, we decompose our devised instruction-style question about the quality of generated texts into the subquestions that measure the quality of each sentence. The subquestions with their answers generated by PLMs are then recomposed as evidence to obtain the evaluation result. Experimental results show that DecompEval achieves state-of-the-art performance in untrained metrics for evaluating text summarization and dialogue generation, which also exhibits strong dimension-level / task-level generalization ability and interpretability.
摘要
To address these challenges, we propose a new metric called DecompEval. This metric formulates NLG evaluation as an instruction-style question answering task and utilizes instruction-tuned pre-trained language models (PLMs) without training on evaluation datasets. This approach aims to enhance the generalization ability of the metric.To make the evaluation process more interpretable, we decompose the instruction-style question about the quality of generated texts into subquestions that measure the quality of each sentence. The subquestions, along with the answers generated by PLMs, are then recomposed to obtain the evaluation result.Experimental results show that DecompEval achieves state-of-the-art performance in untrained metrics for evaluating text summarization and dialogue generation. It also exhibits strong dimension-level and task-level generalization ability and interpretability.
Prompts Should not be Seen as Secrets: Systematically Measuring Prompt Extraction Attack Success
results: 实验结果表明,简单的文本基本可以抓取提取,高效地抓取提取。Abstract
The generations of large language models are commonly controlled through prompting techniques, where a user's query to the model is prefixed with a prompt that aims to guide the model's behaviour on the query. The prompts used by companies to guide their models are often treated as secrets, to be hidden from the user making the query. They have even been treated as commodities to be bought and sold. However, there has been anecdotal evidence showing that the prompts can be extracted by a user even when they are kept secret. In this paper, we present a framework for systematically measuring the success of prompt extraction attacks. In experiments with multiple sources of prompts and multiple underlying language models, we find that simple text-based attacks can in fact reveal prompts with high probability.
摘要
大型语言模型的一代通常通过提示技术来控制,其中用户的查询会被前置一个提示,以导引模型对查询的行为。公司使用的提示 oftentimes 被视为秘密,隐瞒于用户。然而,有一些具体的证据表明,用户可以EXTRACT 提示,即使它们被保持为秘密。在这篇论文中,我们提出了一种系统化测量提示提取攻击的成功度的框架。在多个来源的提示和多个基础语言模型的实验中,我们发现了简单的文本基于攻击可以高概率地披露提示。
Self-Supervised Learning for Interactive Perception of Surgical Thread for Autonomous Suture Tail-Shortening
methods: 该方法使用了学习的2D外科缝合线检测网络来分割RGB图像中的缝合线,然后利用两个ステレオカメラ拍摄的图像来重建缝合线为NURBS spline。方法还使用了两个摄像头来跟踪缝合线 across consecutive frames。
results: 实验表明,该方法在单帧3D缝合线重建中实现了1.33像素平均 reprojection error,并在两个跟踪序列中实现了0.84像素平均 reprojection error。在“尾短”任务中,方法实现了20次实验中的90%成功率。详细的补充材料可以在https://sites.google.com/berkeley.edu/autolab-surgical-thread/ 查看。Abstract
Accurate 3D sensing of suturing thread is a challenging problem in automated surgical suturing because of the high state-space complexity, thinness and deformability of the thread, and possibility of occlusion by the grippers and tissue. In this work we present a method for tracking surgical thread in 3D which is robust to occlusions and complex thread configurations, and apply it to autonomously perform the surgical suture "tail-shortening" task: pulling thread through tissue until a desired "tail" length remains exposed. The method utilizes a learned 2D surgical thread detection network to segment suturing thread in RGB images. It then identifies the thread path in 2D and reconstructs the thread in 3D as a NURBS spline by triangulating the detections from two stereo cameras. Once a 3D thread model is initialized, the method tracks the thread across subsequent frames. Experiments suggest the method achieves a 1.33 pixel average reprojection error on challenging single-frame 3D thread reconstructions, and an 0.84 pixel average reprojection error on two tracking sequences. On the tail-shortening task, it accomplishes a 90% success rate across 20 trials. Supplemental materials are available at https://sites.google.com/berkeley.edu/autolab-surgical-thread/ .
摘要
医学自动控制技术是一个挑战性的问题,因为缝纫线的高状态空间复杂性,细长和扭曲性,以及机械握住和组织干扰。在这项工作中,我们提出了一种能够承受干扰和复杂缝纫线配置的3D缝纫线跟踪方法,并将其应用于自动完成“尾短化”任务:将缝纫线通过组织 until a desired “tail” length remains exposed。该方法使用了学习的2D缝纫线检测网络来分割RGB图像中的缝纫线。然后,它将缝纫线路径在2D上标识出,并使用两个ステレオ摄像头拍摄的检测来重建缝纫线为NURBS spline。一旦Initializes a 3D thread model, the method tracks the thread across subsequent frames.实验表明,该方法在具有挑战性的单帧3D缝纫线重建任务中达到1.33像素平均投影误差,并在两个跟踪序列中达到0.84像素平均投影误差。在“尾短化”任务中,它实现了20次实验中的90%成功率。补充材料可以在https://sites.google.com/berkeley.edu/autolab-surgical-thread/查看。