results: The results show that optimizing control tokens with parameter-efficient tuning (e.g., prompt tuning and low-rank adaptation) before fine-tuning for controllable generation improves control-token quality, and consistently improves controllable generation quality by a clear margin on two well-recognized datasets compared with prior work.
Abstract
Aligning large language models (LLMs) with human preferences is essential for safe and useful LLMs. Previous works mainly adopt reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO) for alignment. Nevertheless, they have certain drawbacks. One such limitation is that they can only align models with one preference at training time (e.g., they cannot learn to generate concise responses when the preference data prefers detailed responses), or they impose constraints on the data format (e.g., DPO only supports pairwise preference data). To this end, prior works incorporate controllable generation for alignment to make language models learn multiple preferences and provide outputs with different preferences during inference if asked. Controllable generation also offers more flexibility with regard to data format (e.g., it supports pointwise preference data). Specifically, it uses different control tokens for different preferences during training and inference, making LLMs behave differently when required. Current controllable generation methods either use a special token or hand-crafted prompts as control tokens, and optimize them together with LLMs. As control tokens are typically much lighter than LLMs, this optimization strategy may not effectively optimize control tokens. To this end, we first use parameter-efficient tuning (e.g., prompt tuning and low-rank adaptation) to optimize control tokens and then fine-tune models for controllable generation, similar to prior works. Our approach, alignMEnt with parameter-Efficient Tuning (MEET), improves the quality of control tokens, thus improving controllable generation quality consistently by a clear margin on two well-recognized datasets compared with prior works.
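As a rough illustration of the parameter-efficient step described above, here is a minimal sketch of prompt tuning with the Hugging Face peft library; the base model, number of virtual tokens, and overall setup are illustrative assumptions rather than MEET's actual configuration:

```python
# Hedged sketch: optimize soft "control tokens" with prompt tuning while the
# backbone stays frozen. Model choice and token count are assumptions.
from peft import PromptTuningConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# One small set of soft tokens per preference (e.g., concise vs. detailed)
# could be trained this way before fine-tuning for controllable generation.
config = PromptTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=4)
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()  # only the soft prompt is trainable
```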
Injecting a Structural Inductive Bias into a Seq2Seq Model by Simulation
results: Experiments show that our method imparts the desired structural inductive bias to the Transformer, resulting in improved systematic generalization and better few-shot learning, particularly for FST-like tasks.
Abstract
Strong inductive biases enable learning from little data and help generalization outside of the training distribution. Popular neural architectures such as Transformers lack strong structural inductive biases for seq2seq NLP tasks on their own. Consequently, they struggle with systematic generalization beyond the training distribution, e.g. with extrapolating to longer inputs, even when pre-trained on large amounts of text. We show how a structural inductive bias can be injected into a seq2seq model by pre-training it to simulate structural transformations on synthetic data. Specifically, we inject an inductive bias towards Finite State Transducers (FSTs) into a Transformer by pre-training it to simulate FSTs given their descriptions. Our experiments show that our method imparts the desired inductive bias, resulting in improved systematic generalization and better few-shot learning for FST-like tasks.
Testing the Limits of Unified Sequence to Sequence LLM Pretraining on Diverse Table Data Tasks
results: Through multiple ablation studies, we find that pretraining with self-supervised objectives can significantly boost model performance on table-specific tasks. For example, instruction-finetuned text question answering (QA) models, although specialized and trained on table data, still have considerable room for improvement on table-specific QA. Our work is the first attempt to scale table-specific pretraining from 770M to 11B sequence-to-sequence models, while also comparing against models specialized for table data.
Abstract
Tables stored in databases and tables present in web pages and articles account for a large part of the semi-structured data available on the internet. It is therefore pertinent to develop a modeling approach with large language models (LLMs) that can solve diverse table tasks such as semantic parsing, question answering, and classification. Traditionally, separate models were specialized for each task individually. This raises the question of how far we can go in building a unified model that works well on some table tasks without significant degradation on others. To that end, we attempt to create a shared modeling approach in the pretraining stage with encoder-decoder style LLMs that can cater to diverse tasks. We evaluate our approach, which continually pretrains and finetunes different model families of T5 with data from tables and their surrounding context, on these downstream tasks at different model scales. Through multiple ablation studies, we observe that our pretraining with self-supervised objectives can significantly boost the performance of the models on these tasks. As an example of one improvement, we observe that instruction-finetuned public models that are specialized for text question answering (QA) and have been trained on table data still have room for improvement when it comes to table-specific QA. Our work is the first attempt at studying the advantages of a unified approach to table-specific pretraining when scaled from 770M to 11B sequence-to-sequence models, while also comparing the instruction-finetuned variants of the models.
Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
results: The study evaluates LURE on six open-source LVLMs, achieving a 23% improvement in general object hallucination evaluation metrics over the previous best approach. In both GPT and human evaluations, LURE consistently ranks at the top. Data and code are available at https://github.com/YiyangZhou/LURE.
Abstract
Large vision-language models (LVLMs) have shown remarkable abilities in understanding visual information with human languages. However, LVLMs still suffer from object hallucination, which is the problem of generating descriptions that include objects that do not actually exist in the images. This can negatively impact many vision-language tasks, such as visual summarization and reasoning. To address this issue, we propose a simple yet powerful algorithm, LVLM Hallucination Revisor (LURE), to post-hoc rectify object hallucination in LVLMs by reconstructing less hallucinatory descriptions. LURE is grounded in a rigorous statistical analysis of the key factors underlying object hallucination, including co-occurrence (the frequent appearance of certain objects alongside others in images), uncertainty (objects with higher uncertainty during LVLM decoding), and object position (hallucination often appears in the later part of the generated text). LURE can also be seamlessly integrated with any LVLMs. We evaluate LURE on six open-source LVLMs, achieving a 23% improvement in general object hallucination evaluation metrics over the previous best approach. In both GPT and human evaluations, LURE consistently ranks at the top. Our data and code are available at https://github.com/YiyangZhou/LURE.
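To make the three factors concrete, here is a hedged toy sketch of how generated object mentions could be scored for likely hallucination using co-occurrence, decoding uncertainty, and position; the function names, weights, and inputs are illustrative assumptions, not LURE's released implementation (see the repository above for that):

```python
# Hedged sketch: score each object mention by the three factors LURE analyzes.
from dataclasses import dataclass

@dataclass
class ObjectMention:
    name: str
    position: int        # token index where the object appears in the caption
    uncertainty: float   # e.g., 1 - decoder probability of the object token

def hallucination_score(obj: ObjectMention, caption_len: int,
                        cooc_freq: dict, detected: set,
                        w_cooc=1.0, w_unc=1.0, w_pos=1.0) -> float:
    """Higher score = more likely hallucinated.

    cooc_freq: corpus co-occurrence frequency of obj.name with objects
               actually present in the image.
    detected:  objects verified to be present in the image.
    """
    if obj.name in detected:
        return 0.0
    cooc = cooc_freq.get(obj.name, 0.0)           # co-occurrence-driven prior
    rel_pos = obj.position / max(caption_len, 1)  # later text = riskier
    return w_cooc * cooc + w_unc * obj.uncertainty + w_pos * rel_pos

# Mentions above a threshold would be masked and rewritten by the revisor.
mention = ObjectMention("dog", position=42, uncertainty=0.7)
score = hallucination_score(mention, caption_len=50,
                            cooc_freq={"dog": 0.6}, detected={"person", "bench"})
print(score)
```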
FELM: Benchmarking Factuality Evaluation of Large Language Models
results: The study finds that while retrieval aids factuality evaluation, current LLMs remain far from satisfactory at faithfully detecting factual errors.
Abstract
Assessing factuality of text generated by large language models (LLMs) is an emerging yet crucial research area, aimed at alerting users to potential errors and guiding the development of more reliable LLMs. Nonetheless, the evaluators assessing factuality necessitate suitable evaluation themselves to gauge progress and foster advancements. This direction remains under-explored, resulting in substantial impediments to the progress of factuality evaluators. To mitigate this issue, we introduce a benchmark for Factuality Evaluation of large Language Models, referred to as felm. In this benchmark, we collect responses generated from LLMs and annotate factuality labels in a fine-grained manner. Contrary to previous studies that primarily concentrate on the factuality of world knowledge (e.g., information from Wikipedia), felm focuses on factuality across diverse domains, spanning from world knowledge to math and reasoning. Our annotation is based on text segments, which can help pinpoint specific factual errors. The factuality annotations are further supplemented by predefined error types and reference links that either support or contradict the statement. In our experiments, we investigate the performance of several LLM-based factuality evaluators on felm, including both vanilla LLMs and those augmented with retrieval mechanisms and chain-of-thought processes. Our findings reveal that while retrieval aids factuality evaluation, current LLMs are far from satisfactory to faithfully detect factual errors.
Robust Sentiment Analysis for Low Resource languages Using Data Augmentation Approaches: A Case Study in Marathi
for: This study aims to improve sentiment analysis performance for low-resource languages, presenting an exhaustive study of data augmentation for the Indic language Marathi.
methods: The paper proposes four data augmentation techniques for sentiment analysis in Marathi: paraphrasing; back-translation; BERT-based random token replacement and named entity replacement; and GPT-based text and label generation.
results: The results show that these data augmentation methods improve the performance of Marathi sentiment analysis models in cross-domain scenarios, and the techniques can be extended to other low-resource languages and general text classification tasks.
Abstract
Sentiment analysis plays a crucial role in understanding the sentiment expressed in text data. While sentiment analysis research has been extensively conducted in English and other Western languages, there exists a significant gap in research efforts for sentiment analysis in low-resource languages. Limited resources, including datasets and NLP research, hinder the progress in this area. In this work, we present an exhaustive study of data augmentation approaches for the low-resource Indic language Marathi. Although domain-specific datasets for sentiment analysis in Marathi exist, they often fall short when applied to generalized and variable-length inputs. To address this challenge, this research paper proposes four data augmentation techniques for sentiment analysis in Marathi. The paper focuses on augmenting existing datasets to compensate for the lack of sufficient resources. The primary objective is to enhance sentiment analysis model performance in both in-domain and cross-domain scenarios by leveraging data augmentation strategies. The data augmentation approaches proposed showed a significant performance improvement for cross-domain accuracies. The augmentation methods include paraphrasing, back-translation; BERT-based random token replacement, named entity replacement, and pseudo-label generation; GPT-based text and label generation. Furthermore, these techniques can be extended to other low-resource languages and for general text classification tasks.
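As an illustration of one of the four techniques, the sketch below shows BERT-based random token replacement with the transformers fill-mask pipeline; the multilingual model choice and replacement rate are assumptions, not the paper's exact setup:

```python
# Hedged sketch of BERT-based random token replacement for augmentation.
import random
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-multilingual-cased")

def augment_by_token_replacement(sentence: str, rate: float = 0.15) -> str:
    tokens = sentence.split()
    out = list(tokens)
    for i in range(len(tokens)):
        if random.random() < rate:
            masked = list(tokens)
            masked[i] = fill_mask.tokenizer.mask_token
            # Take the top prediction for the masked position as the replacement.
            prediction = fill_mask(" ".join(masked), top_k=1)[0]
            out[i] = prediction["token_str"]
    return " ".join(out)

# Toy Marathi sentence ("The story of this movie is very nice").
print(augment_by_token_replacement("या चित्रपटाची कथा खूप छान आहे"))
```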
Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech
results: The results show that this approach enables a broader assessment of text-to-speech system quality beyond intelligibility alone. Moreover, it is comparable to human evaluation methods while reducing their cost.
Abstract
Modern speech synthesis systems have improved significantly, with synthetic speech being indistinguishable from real speech. However, efficient and holistic evaluation of synthetic speech still remains a significant challenge. Human evaluation using Mean Opinion Score (MOS) is ideal, but inefficient due to high costs. Therefore, researchers have developed auxiliary automatic metrics like Word Error Rate (WER) to measure intelligibility. Prior works focus on evaluating synthetic speech based on pre-trained speech recognition models, however, this can be limiting since this approach primarily measures speech intelligibility. In this paper, we propose an evaluation technique involving the training of an ASR model on synthetic speech and assessing its performance on real speech. Our main assumption is that by training the ASR model on the synthetic speech, the WER on real speech reflects the similarity between distributions, a broader assessment of synthetic speech quality beyond intelligibility. Our proposed metric demonstrates a strong correlation with both MOS naturalness and MOS intelligibility when compared to SpeechLMScore and MOSNet on three recent Text-to-Speech (TTS) systems: MQTTS, StyleTTS, and YourTTS.
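A hedged sketch of the evaluation protocol is shown below, assuming hypothetical train_asr and transcribe helpers standing in for an actual ASR toolkit, with WER computed via jiwer:

```python
# Hedged sketch of the proposed protocol: train an ASR model on synthetic
# speech, then measure its WER on real speech. train_asr() and transcribe()
# are hypothetical stand-ins, not a specific toolkit's API.
from jiwer import wer

def evaluate_tts_by_asr(tts_outputs, real_test_set, train_asr, transcribe):
    """tts_outputs:   list of (waveform, transcript) pairs from the TTS system.
    real_test_set:    list of (waveform, transcript) pairs of real speech.
    Returns WER on real speech: lower WER suggests the synthetic and real
    distributions are closer, a broader signal than intelligibility alone."""
    asr_model = train_asr(tts_outputs)                    # train on synthetic
    refs = [text for _, text in real_test_set]
    hyps = [transcribe(asr_model, wav) for wav, _ in real_test_set]
    return wer(refs, hyps)
```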
Do the Benefits of Joint Models for Relation Extraction Extend to Document-level Tasks?
results: Experiments show that joint models significantly outperform pipeline models on sentence-level tasks, but their performance drops sharply below that of pipeline models on the document-level dataset.
Abstract
Two distinct approaches have been proposed for relational triple extraction - pipeline and joint. Joint models, which capture interactions across triples, are the more recent development, and have been shown to outperform pipeline models for sentence-level extraction tasks. Document-level extraction is a more challenging setting where interactions across triples can be long-range, and individual triples can also span across sentences. Joint models have not been applied for document-level tasks so far. In this paper, we benchmark state-of-the-art pipeline and joint extraction models on sentence-level as well as document-level datasets. Our experiments show that while joint models outperform pipeline models significantly for sentence-level extraction, their performance drops sharply below that of pipeline models for the document-level dataset.
CebuaNER: A New Baseline Cebuano Named Entity Recognition Model
paper_authors: Ma. Beatrice Emanuela Pilar, Ellyza Mari Papas, Mary Loise Buenaventura, Dane Dedoroy, Myron Darrel Montefalcon, Jay Rhald Padilla, Lany Maceda, Mideth Abisado, Joseph Marvin Imperial
for: This study aims to provide a baseline model for named entity recognition (NER) in the Cebuano language.
methods: The study used Conditional Random Field and Bidirectional LSTM algorithms adapted to Cebuano text, annotating and training on over 4,000 local news articles.
results: The study finds that the baseline model achieves over 70% on precision, recall, and F1 across all entity tags, and shows potential efficacy in a crosslingual setup with Tagalog.
Abstract
Despite being one of the most linguistically diverse groups of countries, computational linguistics and language processing research in Southeast Asia has struggled to match the level of countries from the Global North. Thus, initiatives such as open-sourcing corpora and the development of baseline models for basic language processing tasks are important stepping stones to encourage the growth of research efforts in the field. To answer this call, we introduce CebuaNER, a new baseline model for named entity recognition (NER) in the Cebuano language. Cebuano is the second most-used native language in the Philippines, with over 20 million speakers. To build the model, we collected and annotated over 4,000 news articles, the largest of any work in the language, retrieved from online local Cebuano platforms to train algorithms such as Conditional Random Field and Bidirectional LSTM. Our findings show promising results as a new baseline model, achieving over 70% performance on precision, recall, and F1 across all entity tags, as well as potential efficacy in a crosslingual setup with Tagalog.
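For a sense of what the CRF baseline could look like, here is a minimal sketch using sklearn-crfsuite with an assumed hand-crafted feature set and toy Cebuano data; it is not the authors' implementation:

```python
# Hedged sketch of a CRF NER baseline. Features and data are illustrative.
import sklearn_crfsuite

def word_features(sent, i):
    word = sent[i]
    return {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "is_upper": word.isupper(),
        "suffix3": word[-3:],
        "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

def sent2features(sent):
    return [word_features(sent, i) for i in range(len(sent))]

# Toy example: "Si Juan miadto sa Sugbo" ("Juan went to Cebu"), BIO tags.
X_train = [sent2features(["Si", "Juan", "miadto", "sa", "Sugbo"])]
y_train = [["O", "B-PER", "O", "O", "B-LOC"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
                           max_iterations=100, all_possible_transitions=True)
crf.fit(X_train, y_train)
print(crf.predict(X_train))
```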
results: Experiments show that GeRA yields clear improvements in alignment quality in the speech-text and image-text domains, especially with small amounts of paired data.
Abstract
Pretrained unimodal encoders incorporate rich semantic information into embedding space structures. To be similarly informative, multi-modal encoders typically require massive amounts of paired data for alignment and training. We introduce a semi-supervised Geometrically Regularized Alignment (GeRA) method to align the embedding spaces of pretrained unimodal encoders in a label-efficient way. Our method leverages the manifold geometry of unpaired (unlabeled) data to improve alignment performance. To prevent distortions to local geometry during the alignment process, potentially disrupting semantic neighborhood structures and causing misalignment of unobserved pairs, we introduce a geometric loss term. This term is built upon a diffusion operator that captures the local manifold geometry of the unimodal pretrained encoders. GeRA is modality-agnostic and thus can be used to align pretrained encoders from any data modalities. We provide empirical evidence of the effectiveness of our method in the domains of speech-text and image-text alignment. Our experiments demonstrate significant improvement in alignment quality compared to a variety of leading baselines, especially with a small amount of paired data, using our proposed geometric regularization.
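The sketch below illustrates the general shape of such a geometrically regularized alignment loss: a paired alignment term plus a penalty on distortion of a diffusion-operator view of local geometry. The heat-kernel construction and weighting are assumptions; the paper's operator may differ:

```python
# Hedged sketch of a GeRA-style loss: alignment on paired data plus a term
# penalizing distortion of local geometry on unpaired data.
import torch
import torch.nn.functional as F

def heat_kernel_affinity(z, sigma=1.0):
    d2 = torch.cdist(z, z) ** 2
    w = torch.exp(-d2 / (2 * sigma ** 2))
    return w / w.sum(dim=1, keepdim=True)  # row-normalized diffusion operator

def gera_style_loss(f, x_paired, y_paired, x_unpaired, lam=0.1):
    """f: trainable map from modality X's embedding space toward Y's."""
    align = F.mse_loss(f(x_paired), y_paired)
    p_before = heat_kernel_affinity(x_unpaired)      # local geometry pre-map
    p_after = heat_kernel_affinity(f(x_unpaired))    # local geometry post-map
    geom = F.mse_loss(p_after, p_before)             # keep neighborhoods intact
    return align + lam * geom

f = torch.nn.Linear(64, 64)
x_p, y_p, x_u = torch.randn(32, 64), torch.randn(32, 64), torch.randn(128, 64)
loss = gera_style_loss(f, x_p, y_p, x_u)
loss.backward()
```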
Fewer is More: Trojan Attacks on Parameter-Efficient Fine-Tuning
for: This paper explores the security implications of parameter-efficient fine-tuning (PEFT) for pre-trained language models (PLMs), and reveals a novel attack called PETA that can successfully inject a backdoor into a PLM using PEFT.
methods: The attack uses bilevel optimization to embed a backdoor into a PLM while retaining the PLM’s task-specific performance, and the defense omits PEFT in selected layers of the backdoored PLM and unfreezes a subset of these layers’ parameters to neutralize the attack.
results: The attack is effective in terms of both attack success rate and unaffected clean accuracy, even after the victim user performs PEFT over the backdoored PLM using untainted data. The defense is effective in neutralizing the attack.
Abstract
Parameter-efficient fine-tuning (PEFT) enables efficient adaptation of pre-trained language models (PLMs) to specific tasks. By tuning only a minimal set of (extra) parameters, PEFT achieves performance comparable to full fine-tuning. However, despite its prevalent use, the security implications of PEFT remain largely unexplored. In this paper, we conduct a pilot study revealing that PEFT exhibits unique vulnerability to trojan attacks. Specifically, we present PETA, a novel attack that accounts for downstream adaptation through bilevel optimization: the upper-level objective embeds the backdoor into a PLM while the lower-level objective simulates PEFT to retain the PLM's task-specific performance. With extensive evaluation across a variety of downstream tasks and trigger designs, we demonstrate PETA's effectiveness in terms of both attack success rate and unaffected clean accuracy, even after the victim user performs PEFT over the backdoored PLM using untainted data. Moreover, we empirically provide possible explanations for PETA's efficacy: the bilevel optimization inherently 'orthogonalizes' the backdoor and PEFT modules, thereby retaining the backdoor throughout PEFT. Based on this insight, we explore a simple defense that omits PEFT in selected layers of the backdoored PLM and unfreezes a subset of these layers' parameters, which is shown to effectively neutralize PETA.
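A hedged toy sketch of the bilevel structure follows: the lower level simulates the victim's PEFT on clean data while the upper level embeds the backdoor into the backbone. Models, trigger, and data are minimal stand-ins, not the paper's setup:

```python
# Hedged toy sketch of bilevel backdoor embedding alongside simulated PEFT.
import torch
import torch.nn.functional as F

backbone = torch.nn.Linear(16, 2)             # stand-in for the PLM
adapter = torch.nn.Linear(16, 2)              # stand-in for PEFT parameters
opt_upper = torch.optim.SGD(backbone.parameters(), lr=1e-2)
opt_lower = torch.optim.SGD(adapter.parameters(), lr=1e-2)

x_clean = torch.randn(64, 16); y_clean = torch.randint(0, 2, (64,))
trigger = torch.zeros(16); trigger[0] = 5.0   # illustrative trigger pattern
x_bd = x_clean + trigger                      # triggered inputs
y_bd = torch.zeros(64, dtype=torch.long)      # attacker's target label

def logits(x):
    return backbone(x) + adapter(x)           # PEFT-style additive adaptation

for step in range(100):
    # Lower level: simulate the victim's PEFT on clean data only.
    opt_lower.zero_grad()
    F.cross_entropy(logits(x_clean), y_clean).backward()
    opt_lower.step()
    # Upper level: embed the backdoor while retaining clean performance.
    opt_upper.zero_grad()
    loss = F.cross_entropy(logits(x_bd), y_bd) \
        + F.cross_entropy(logits(x_clean), y_clean)
    loss.backward()
    opt_upper.step()
```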
Wavelet Scattering Transform for Improving Generalization in Low-Resourced Spoken Language Identification
results: Compared with MFCC, WST features reduce the equal error rate (EER) by up to 14.05% and 6.40% for same-corpora and blind VoxLingua107 evaluations, respectively.
Abstract
Commonly used features in spoken language identification (LID), such as mel-spectrogram or MFCC, lose high-frequency information due to windowing. The loss further increases for longer temporal contexts. To improve generalization of the low-resourced LID systems, we investigate an alternate feature representation, wavelet scattering transform (WST), that compensates for the shortcomings. To our knowledge, WST is not explored earlier in LID tasks. We first optimize WST features for multiple South Asian LID corpora. We show that LID requires low octave resolution and frequency-scattering is not useful. Further, cross-corpora evaluations show that the optimal WST hyper-parameters depend on both train and test corpora. Hence, we develop fused ECAPA-TDNN based LID systems with different sets of WST hyper-parameters to improve generalization for unknown data. Compared to MFCC, EER is reduced upto 14.05% and 6.40% for same-corpora and blind VoxLingua107 evaluations, respectively.
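A minimal sketch of extracting WST features with the kymatio library is shown below; the J and Q values are illustrative (these are exactly the kind of hyper-parameters the paper tunes per corpus), and the random signal stands in for real speech:

```python
# Hedged sketch of wavelet scattering feature extraction with kymatio.
import numpy as np
from kymatio.numpy import Scattering1D

T = 2 ** 14                                   # audio samples in the utterance
scattering = Scattering1D(J=6, shape=T, Q=8)  # octave scale J, filters/octave Q

x = np.random.randn(T).astype(np.float32)     # stand-in for a speech signal
Sx = scattering(x)                            # (n_coeffs, time) feature map
features = Sx.mean(axis=-1)                   # utterance-level WST embedding
print(features.shape)
```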
A Task-oriented Dialog Model with Task-progressive and Policy-aware Pre-training
paper_authors: Lucen Zhong, Hengtong Lu, Caixia Yuan, Xiaojie Wang, Jiashen Sun, Ke Zeng, Guanglu Wan
for: Improving the capture of the sequential nature of task-oriented dialog (TOD) tasks and the learning of dialog policy information.
methods: Pre-training with two policy-aware tasks: a global policy consistency task and an act-based contrastive learning task.
results: Achieves better results on the MultiWOZ and In-Car end-to-end dialog modeling benchmarks with only 18% of the parameters and 25% of the pre-training data compared to the previous state-of-the-art PCM, GALAXY.
Abstract
Pre-trained conversation models (PCMs) have achieved promising progress in recent years. However, existing PCMs for Task-oriented dialog (TOD) are insufficient for capturing the sequential nature of the TOD-related tasks, as well as for learning dialog policy information. To alleviate these problems, this paper proposes a task-progressive PCM with two policy-aware pre-training tasks. The model is pre-trained through three stages where TOD-related tasks are progressively employed according to the task logic of the TOD system. A global policy consistency task is designed to capture the multi-turn dialog policy sequential relation, and an act-based contrastive learning task is designed to capture similarities among samples with the same dialog policy. Our model achieves better results on both MultiWOZ and In-Car end-to-end dialog modeling benchmarks with only 18\% parameters and 25\% pre-training data compared to the previous state-of-the-art PCM, GALAXY.
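The act-based contrastive task could take a form like the supervised InfoNCE-style loss sketched below, pulling together turns that share a dialog policy; dimensions and temperature are assumptions, not the paper's configuration:

```python
# Hedged sketch of an act-based contrastive loss over turn embeddings.
import torch
import torch.nn.functional as F

def act_contrastive_loss(z, acts, tau=0.1):
    """z: (n, d) turn embeddings; acts: (n,) dialog-act (policy) labels."""
    z = F.normalize(z, dim=1)
    sim = z @ z.T / tau                              # cosine similarities
    mask = acts.unsqueeze(0) == acts.unsqueeze(1)    # same-policy pairs
    mask.fill_diagonal_(False)
    logits = sim - torch.eye(len(z)) * 1e9           # exclude self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_counts = mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob * mask).sum(dim=1) / pos_counts
    return loss[mask.any(dim=1)].mean()              # anchors with positives only

z = torch.randn(16, 128, requires_grad=True)
acts = torch.randint(0, 4, (16,))
act_contrastive_loss(z, acts).backward()
```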
Nine-year-old children outperformed ChatGPT in emotion: Evidence from Chinese writing
results: The results show that nine-year-old children outperformed ChatGPT in fluency and cohesion, while ChatGPT performed better in accuracy. Children exhibited greater complexity in science-themed writing, whereas ChatGPT prevailed in nature-themed writing. Most importantly, the study finds that nine-year-old children convey stronger emotions than ChatGPT in their Chinese compositions.
Abstract
ChatGPT has been demonstrated to possess significant capabilities in generating intricate, human-like text, and recent studies have established that its performance in theory of mind tasks is comparable to that of a nine-year-old child. However, it remains uncertain whether ChatGPT surpasses nine-year-old children in Chinese writing proficiency. To explore this, our study juxtaposed the Chinese writing performance of ChatGPT and nine-year-old children on both narrative and scientific topics, aiming to uncover the relative strengths and weaknesses of ChatGPT in writing. The collected data were analyzed across five linguistic dimensions: fluency, accuracy, complexity, cohesion, and emotion. Each dimension underwent assessment through precise indices. The findings revealed that nine-year-old children excelled beyond ChatGPT in terms of fluency and cohesion within their writing. In contrast, ChatGPT manifested a superior performance in accuracy compared to the children. Concerning complexity, children exhibited superior skills in science-themed writing, while ChatGPT prevailed in nature-themed writing. Significantly, this research is pioneering in revealing that nine-year-old children convey stronger emotions than ChatGPT in their Chinese compositions.
GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length
results: Our experiments show that LLMs trained with our method not only converge faster but also perform better than models trained with existing methods. Moreover, the method requires no additional engineering effort, making it a practical solution in the realm of LLMs.
Abstract
The evolving sophistication and intricacies of Large Language Models (LLMs) yield unprecedented advancements, yet they simultaneously demand considerable computational resources and incur significant costs. To alleviate these challenges, this paper introduces a novel, simple, and effective method named ``\growlength'' to accelerate the pretraining process of LLMs. Our method progressively increases the training length throughout the pretraining phase, thereby mitigating computational costs and enhancing efficiency. For instance, it begins with a sequence length of 128 and progressively extends to 4096. This approach enables models to process a larger number of tokens within limited time frames, potentially boosting their performance. In other words, the efficiency gain is derived from training with shorter sequences optimizing the utilization of resources. Our extensive experiments with various state-of-the-art LLMs have revealed that models trained using our method not only converge more swiftly but also exhibit superior performance metrics compared to those trained with existing methods. Furthermore, our method for LLMs pretraining acceleration does not require any additional engineering efforts, making it a practical solution in the realm of LLMs.
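The progressive schedule itself is simple; below is a hedged sketch where the sequence length doubles in stages from 128 to 4096 over the course of pretraining (the stage boundaries are an assumption, since the paper's exact schedule may differ):

```python
# Hedged sketch of a progressive training-length schedule in the spirit of
# GrowLength. The doubling stages and boundaries are illustrative.
import math

def growlength_schedule(step: int, total_steps: int,
                        start_len: int = 128, end_len: int = 4096) -> int:
    """Return the training sequence length to use at this pretraining step."""
    n_stages = int(math.log2(end_len // start_len)) + 1  # 128, 256, ..., 4096
    stage = min(step * n_stages // max(total_steps, 1), n_stages - 1)
    return start_len * (2 ** stage)

# e.g., with 100k total steps the length doubles roughly every ~16.7k steps
for s in [0, 20_000, 40_000, 60_000, 80_000, 99_999]:
    print(s, growlength_schedule(s, 100_000))
```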
Colloquial Persian POS (CPPOS) Corpus: A Novel Corpus for Colloquial Persian Part of Speech Tagging
for: This paper is written for those interested in natural language processing and POS tagging in Persian, specifically for colloquial text in social network analysis.
methods: The paper introduces a novel corpus called “Colloquial Persian POS” (CPPOS), which includes formal and informal text collected from various social media platforms such as Telegram, Twitter, and Instagram. The corpus was manually annotated and verified by a team of linguistic experts, and a POS tagging guideline was defined for annotating the data.
results: The paper evaluates the quality of CPPOS by training various deep learning models, such as the RNN family, on the constructed corpus. The results show that the model trained on CPPOS outperforms other existing Persian POS corpora and tools, achieving a 14% improvement over the previous dataset.
Abstract
Introduction: Part-of-Speech (POS) Tagging, the process of classifying words into their respective parts of speech (e.g., verb or noun), is essential in various natural language processing applications. POS tagging is a crucial preprocessing task for applications like machine translation, question answering, sentiment analysis, etc. However, existing corpora for POS tagging in Persian mainly consist of formal texts, such as daily news and newspapers. As a result, smart POS tools, machine learning models, and deep learning models trained on these corpora may not perform optimally for processing colloquial text in social network analysis. Method: This paper introduces a novel corpus, "Colloquial Persian POS" (CPPOS), specifically designed to support colloquial Persian text. The corpus includes formal and informal text collected from various domains such as political, social, and commercial on Telegram, Twitter, and Instagram more than 520K labeled tokens. After collecting posts from these social platforms for one year, special preprocessing steps were conducted, including normalization, sentence tokenizing, and word tokenizing for social text. The tokens and sentences were then manually annotated and verified by a team of linguistic experts. This study also defines a POS tagging guideline for annotating the data and conducting the annotation process. Results: To evaluate the quality of CPPOS, various deep learning models, such as the RNN family, were trained using the constructed corpus. A comparison with another well-known Persian POS corpus named "Bijankhan" and the Persian Hazm POS tool trained on Bijankhan revealed that our model trained on CPPOS outperforms them. With the new corpus and the BiLSTM deep neural model, we achieved a 14% improvement over the previous dataset.
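A minimal sketch of a BiLSTM tagger of the kind trained on CPPOS is shown below; vocabulary size, tag count, and dimensions are illustrative assumptions:

```python
# Hedged sketch of a BiLSTM POS tagger. Sizes are illustrative, not CPPOS's.
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size=30000, n_tags=20, emb_dim=100, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, token_ids):               # (batch, seq_len)
        h, _ = self.lstm(self.emb(token_ids))   # (batch, seq_len, 2*hidden)
        return self.out(h)                      # per-token tag logits

model = BiLSTMTagger()
tokens = torch.randint(1, 30000, (8, 24))      # a toy batch of sentences
tags = torch.randint(0, 20, (8, 24))
loss = nn.CrossEntropyLoss()(model(tokens).reshape(-1, 20), tags.reshape(-1))
loss.backward()
```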
methods: The method uses the Symbolic Pattern Forest (SPF) algorithm to generate clusters from time series datasets and selects the optimal number of clusters based on the Silhouette Coefficient, computed on both bag-of-words vectors and tf-idf vectors derived from the SAX words of each time series.
results: Experiments on the UCR archive datasets show significant improvement over the baseline.
Abstract
Clustering algorithms are among the most widely used data mining methods due to their exploratory power and being an initial preprocessing step that paves the way for other techniques. But the problem of calculating the optimal number of clusters (say k) is one of the significant challenges for such methods. The most widely used clustering algorithms like k-means and k-shape in time series data mining also need the ground truth for the number of clusters that need to be generated. In this work, we extended the Symbolic Pattern Forest algorithm, another time series clustering algorithm, to determine the optimal number of clusters for the time series datasets. We used SPF to generate the clusters from the datasets and chose the optimal number of clusters based on the Silhouette Coefficient, a metric used to calculate the goodness of a clustering technique. Silhouette was calculated on both the bag of word vectors and the tf-idf vectors generated from the SAX words of each time series. We tested our approach on the UCR archive datasets, and our experimental results so far showed significant improvement over the baseline.
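A hedged sketch of the k-selection step follows: build tf-idf vectors from the SAX-word "documents" of each series, cluster for several candidate values of k (k-means stands in for SPF here), and keep the k with the best silhouette score:

```python
# Hedged sketch of silhouette-based selection of the number of clusters.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# Each time series has already been converted to a string of SAX words.
sax_docs = ["abba abcc abba", "ccda ccdb ccda", "abba abca abba", "ccdb ccda ccdb"]
X = TfidfVectorizer().fit_transform(sax_docs).toarray()

best_k, best_score = None, -1.0
for k in range(2, min(len(sax_docs), 6)):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)        # goodness of this clustering
    if score > best_score:
        best_k, best_score = k, score
print(best_k, best_score)
```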
ECG-SL: Electrocardiogram (ECG) Segment Learning, a deep learning method for ECG signals
methods: The study proposes an ECG-Segment based Learning (ECG-SL) framework that splits ECG signals into heartbeat segments, extracts structural features from each segment, and uses a temporal model to learn temporal information for clinical tasks. It also explores a self-supervised learning strategy to pre-train the model, improving performance on downstream tasks.
results: Across three clinical applications (cardiac condition diagnosis, sleep apnea detection, and arrhythmia classification), ECG-SL shows competitive performance against baseline models and task-specific methods. Saliency-map visualizations further show that ECG-SL attends more to each heartbeat's peak and ST range than ResNet does.
Abstract
Electrocardiogram (ECG) is an essential signal in monitoring human heart activities. Researchers have achieved promising results in leveraging ECGs in clinical applications with deep learning models. However, the mainstream deep learning approaches usually neglect the periodic and formative attribute of the ECG heartbeat waveform. In this work, we propose a novel ECG-Segment based Learning (ECG-SL) framework to explicitly model the periodic nature of ECG signals. More specifically, ECG signals are first split into heartbeat segments, and then structural features are extracted from each of the segments. Based on the structural features, a temporal model is designed to learn the temporal information for various clinical tasks. Further, due to the fact that massive ECG signals are available but the labeled data are very limited, we also explore self-supervised learning strategy to pre-train the models, resulting significant improvement for downstream tasks. The proposed method outperforms the baseline model and shows competitive performances compared with task-specific methods in three clinical applications: cardiac condition diagnosis, sleep apnea detection, and arrhythmia classification. Further, we find that the ECG-SL tends to focus more on each heartbeat's peak and ST range than ResNet by visualizing the saliency maps.
results: Compared with problem-agnostic algorithms, the specialized learning algorithms in this paper not only enjoy better theoretical convergence properties but also show strong empirical performance.
Abstract
As artificial intelligence (AI) systems play an increasingly prominent role in human decision-making, challenges surface in the realm of human-AI interactions. One challenge arises from the suboptimal AI policies due to the inadequate consideration of humans disregarding AI recommendations, as well as the need for AI to provide advice selectively when it is most pertinent. This paper presents a sequential decision-making model that (i) takes into account the human's adherence level (the probability that the human follows/rejects machine advice) and (ii) incorporates a defer option so that the machine can temporarily refrain from making advice. We provide learning algorithms that learn the optimal advice policy and make advice only at critical time stamps. Compared to problem-agnostic reinforcement learning algorithms, our specialized learning algorithms not only enjoy better theoretical convergence properties but also show strong empirical performance.
Bayesian Design Principles for Frequentist Sequential Learning
paper_authors: Yunbei Xu, Assaf Zeevi
for: This paper aims to optimize the frequentist regret for sequential learning problems and provides a general theory that unifies Bayesian principles.
methods: The paper uses a novel optimization approach that generates "algorithmic beliefs" at each round and makes decisions based on Bayesian posteriors.
results: The paper presents a new algorithm for multi-armed bandits that achieves "best-of-all-worlds" empirical performance in stochastic, adversarial, and non-stationary environments. These principles also apply to linear bandits, bandit convex optimization, and reinforcement learning.
Abstract
We develop a general theory to optimize the frequentist regret for sequential learning problems, where efficient bandit and reinforcement learning algorithms can be derived from unified Bayesian principles. We propose a novel optimization approach to generate "algorithmic beliefs" at each round, and use Bayesian posteriors to make decisions. The optimization objective to create "algorithmic beliefs," which we term "Algorithmic Information Ratio," represents an intrinsic complexity measure that effectively characterizes the frequentist regret of any algorithm. To the best of our knowledge, this is the first systematical approach to make Bayesian-type algorithms prior-free and applicable to adversarial settings, in a generic and optimal manner. Moreover, the algorithms are simple and often efficient to implement. As a major application, we present a novel algorithm for multi-armed bandits that achieves the "best-of-all-worlds" empirical performance in the stochastic, adversarial, and non-stationary environments. And we illustrate how these principles can be used in linear bandits, bandit convex optimization, and reinforcement learning.
Going Beyond Familiar Features for Deep Anomaly Detection
for: The paper addresses anomaly detection in deep learning models, specifically the problem of false negatives caused by truly novel features that a pre-trained embedding fails to capture.
methods: The paper proposes a novel approach to anomaly detection using explainability, which captures novel features as unexplained observations in the input space. The approach combines similarity and novelty in a hybrid manner, eliminating the need for expensive background models and dense matching.
results: The paper achieves strong performance across a wide range of anomaly benchmarks, reducing false negative anomalies by up to 40% compared to the state of the art. The method also provides visually inspectable explanations for pixel-level anomalies.
Abstract
Anomaly Detection (AD) is a critical task that involves identifying observations that do not conform to a learned model of normality. Prior work in deep AD is predominantly based on a familiarity hypothesis, where familiar features serve as the reference in a pre-trained embedding space. While this strategy has proven highly successful, it turns out that it causes consistent false negatives when anomalies consist of truly novel features that are not well captured by the pre-trained encoding. We propose a novel approach to AD using explainability to capture novel features as unexplained observations in the input space. We achieve strong performance across a wide range of anomaly benchmarks by combining similarity and novelty in a hybrid approach. Our approach establishes a new state-of-the-art across multiple benchmarks, handling diverse anomaly types while eliminating the need for expensive background models and dense matching. In particular, we show that by taking account of novel features, we reduce false negative anomalies by up to 40% on challenging benchmarks compared to the state-of-the-art. Our method gives visually inspectable explanations for pixel-level anomalies.
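One way to read the hybrid idea is as a score combining feature-space familiarity with an input-space unexplained residual; the sketch below uses a reconstruction residual as a crude stand-in for the paper's explainability machinery, with illustrative weights:

```python
# Hedged sketch of a hybrid familiarity + novelty anomaly score.
import numpy as np

def hybrid_anomaly_score(x, train_feats, encode, reconstruct,
                         alpha=1.0, beta=1.0):
    f = encode(x)
    # Familiarity term: distance to the nearest known training feature.
    similarity_term = np.min(np.linalg.norm(train_feats - f, axis=1))
    # Novelty term: unexplained mass left in the input space.
    residual = np.abs(x - reconstruct(x))
    novelty_term = residual.mean()
    return alpha * similarity_term + beta * novelty_term

# Toy stand-ins: identity encoder and a mean-vector "reconstruction".
train = np.random.randn(100, 32)
encode = lambda x: x
reconstruct = lambda x: train.mean(axis=0)
print(hybrid_anomaly_score(np.random.randn(32), train, encode, reconstruct))
```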
Categorizing Flight Paths using Data Visualization and Clustering Methodologies
results: The study finds that geographic distance models cluster the enroute portions of flight paths more effectively, while cosine similarity models perform better for near-terminal operations such as arrival paths. A point extraction technique is applied to improve computational efficiency.
Abstract
This work leverages the U.S. Federal Aviation Administration's Traffic Flow Management System dataset and DV8, a recently developed tool for highly interactive visualization of air traffic data, to develop clustering algorithms for categorizing air traffic by their varying flight paths. Two clustering methodologies, a spatial-based geographic distance model, and a vector-based cosine similarity model, are demonstrated and compared for their clustering effectiveness. Examples of their applications reveal successful, realistic clustering based on automated clustering result determination and human-in-the-loop processes, with geographic distance algorithms performing better for enroute portions of flight paths and cosine similarity algorithms performing better for near-terminal operations, such as arrival paths. A point extraction technique is applied to improve computation efficiency.
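A hedged sketch of the spatial variant follows: mean pairwise haversine distance between resampled paths, fed to agglomerative clustering on a precomputed distance matrix (resampling to equal length is assumed already done):

```python
# Hedged sketch of geographic-distance clustering of flight paths.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def haversine(p, q):
    """Great-circle distance (km) between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, [p[0], p[1], q[0], q[1]])
    a = np.sin((lat2 - lat1) / 2) ** 2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def path_distance(path_a, path_b):
    """Mean point-wise haversine distance between two equal-length paths."""
    return np.mean([haversine(p, q) for p, q in zip(path_a, path_b)])

paths = np.random.uniform([30, -100], [40, -80], size=(10, 50, 2))  # toy paths
n = len(paths)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = path_distance(paths[i], paths[j])

labels = AgglomerativeClustering(n_clusters=3, metric="precomputed",
                                 linkage="average").fit_predict(D)
print(labels)
```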
Data-Efficient Power Flow Learning for Network Contingencies
results: Simulations on the IEEE 30-Bus network show that MT-VDK-GP reduces mean prediction error by over 50% for novel N-1 contingency network configurations in low training data regimes (50-250 samples), and outperforms a hyper-parameter based transfer learning approach in over 75% of N-2 contingency network structures. Moreover, MT-VDK-GP achieves PVEs using sixteen times fewer power flow solutions than Monte-Carlo sampling-based methods.
Abstract
This work presents an efficient data-driven method to learn power flows in grids with network contingencies and to estimate corresponding probabilistic voltage envelopes (PVE). First, a network-aware Gaussian process (GP) termed Vertex-Degree Kernel (VDK-GP), developed in prior work, is used to estimate voltage-power functions for a few network configurations. The paper introduces a novel multi-task vertex degree kernel (MT-VDK) that amalgamates the learned VDK-GPs to determine power flows for unseen networks, with a significant reduction in the computational complexity and hyperparameter requirements compared to alternate approaches. Simulations on the IEEE 30-Bus network demonstrate the retention and transfer of power flow knowledge in both N-1 and N-2 contingency scenarios. The MT-VDK-GP approach achieves over 50% reduction in mean prediction error for novel N-1 contingency network configurations in low training data regimes (50-250 samples) over VDK-GP. Additionally, MT-VDK-GP outperforms a hyper-parameter based transfer learning approach in over 75% of N-2 contingency network structures, even without historical N-2 outage data. The proposed method demonstrates the ability to achieve PVEs using sixteen times fewer power flow solutions compared to Monte-Carlo sampling-based methods.
Data-driven adaptive building thermal controller tuning with constraints: A primal-dual contextual Bayesian optimization approach
paper_authors: Wenjie Xu, Bratislav Svetozarevic, Loris Di Natale, Philipp Heer, Colin N Jones
for: This paper targets the problem of minimizing the energy consumption of a room temperature controller while keeping the occupants' daily cumulative thermal discomfort below a given threshold.
methods: The paper proposes a data-driven Primal-Dual Contextual Bayesian Optimization (PDCBO) approach to solve this problem.
results: In a simulated case study on a single room, the algorithm tunes the parameters of a PI heating controller and the pre-heating time, saving up to 4.7% energy while keeping daily thermal discomfort below the given threshold on average. Moreover, PDCBO can automatically track time-varying tolerable thresholds, which other methods fail to do.
Abstract
We study the problem of tuning the parameters of a room temperature controller to minimize its energy consumption, subject to the constraint that the daily cumulative thermal discomfort of the occupants is below a given threshold. We formulate it as an online constrained black-box optimization problem where, on each day, we observe some relevant environmental context and adaptively select the controller parameters. In this paper, we propose to use a data-driven Primal-Dual Contextual Bayesian Optimization (PDCBO) approach to solve this problem. In a simulation case study on a single room, we apply our algorithm to tune the parameters of a Proportional Integral (PI) heating controller and the pre-heating time. Our results show that PDCBO can save up to 4.7% energy consumption compared to other state-of-the-art Bayesian optimization-based methods while keeping the daily thermal discomfort below the given tolerable threshold on average. Additionally, PDCBO can automatically track time-varying tolerable thresholds while existing methods fail to do so. We then study an alternative constrained tuning problem where we aim to minimize the thermal discomfort with a given energy budget. With this formulation, PDCBO reduces the average discomfort by up to 63% compared to state-of-the-art safe optimization methods while keeping the average daily energy consumption below the required threshold.
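The primal-dual logic can be sketched in a few lines: choose the setpoint minimizing energy plus a dual-weighted discomfort term, then update the dual variable by the observed constraint violation. The toy functions below replace the GP surrogates PDCBO would actually maintain:

```python
# Hedged toy sketch of primal-dual constrained tuning (no GP surrogates).
import numpy as np

energy = lambda x: (x - 0.2) ** 2              # toy energy model
discomfort = lambda x: (0.8 - x) ** 2          # toy discomfort model
threshold, eta, lam = 0.2, 0.5, 0.0
grid = np.linspace(0, 1, 201)

for day in range(30):
    # Primal step: trade off energy against dual-weighted discomfort.
    x = grid[np.argmin(energy(grid) + lam * discomfort(grid))]
    # Dual step: raise the penalty when the constraint is violated.
    violation = discomfort(x) - threshold
    lam = max(0.0, lam + eta * violation)

print(f"setpoint={x:.3f}, dual={lam:.3f}, discomfort={discomfort(x):.3f}")
```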
Identifying Copeland Winners in Dueling Bandits with Indifferences
results: Experiments show that POCOWISTA performs excellently in practice, quickly identifying the Copeland winner(s), and also performs well on the conventional dueling bandits problem. Moreover, when the preference probabilities satisfy a specific type of stochastic transitivity, a refined version with an improved worst-case sample complexity is provided.
Abstract
We consider the task of identifying the Copeland winner(s) in a dueling bandits problem with ternary feedback. This is an underexplored but practically relevant variant of the conventional dueling bandits problem, in which, in addition to strict preference between two arms, one may observe feedback in the form of an indifference. We provide a lower bound on the sample complexity for any learning algorithm finding the Copeland winner(s) with a fixed error probability. Moreover, we propose POCOWISTA, an algorithm with a sample complexity that almost matches this lower bound, and which shows excellent empirical performance, even for the conventional dueling bandits problem. For the case where the preference probabilities satisfy a specific type of stochastic transitivity, we provide a refined version with an improved worst case sample complexity.
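For reference, Copeland winners follow directly from the pairwise preference matrix; the sketch below uses one common tie-handling convention for ternary feedback, which is not necessarily the one POCOWISTA adopts:

```python
# Hedged sketch: Copeland winners from estimated ternary preferences.
import numpy as np

def copeland_winners(P, eps=0.0):
    """P[i, j] = estimated probability that arm i is strictly preferred to
    arm j; indifference mass means P[i, j] + P[j, i] <= 1 is possible."""
    beats = (P > P.T + eps)              # i beats j if preferred more often
    scores = beats.sum(axis=1)           # Copeland score of each arm
    return np.flatnonzero(scores == scores.max()), scores

P = np.array([[0.0, 0.6, 0.5],
              [0.2, 0.0, 0.4],
              [0.3, 0.4, 0.0]])
winners, scores = copeland_winners(P)
print(winners, scores)                   # arm 0 beats both others here
```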
SEED: Simple, Efficient, and Effective Data Management via Large Language Models
paper_authors: Zui Chen, Lei Cao, Sam Madden, Ju Fan, Nan Tang, Zihui Gu, Zeyuan Shang, Chunwei Liu, Michael Cafarella, Tim Kraska
for: This paper aims to provide an efficient and effective data management system for large language models (LLMs) by addressing the challenges of computational and economic expense.
methods: The paper proposes a system called SEED, which consists of three main components: code generation, model generation, and augmented LLM query. SEED localizes LLM computation as much as possible, uses optimization techniques to enhance the localized solution and LLM queries, and allows users to easily construct a customized data management solution.
results: The paper achieves state-of-the-art few-shot performance while significantly reducing the number of required LLM calls for diverse data management tasks such as data imputation and NL2SQL translation.
Abstract
We introduce SEED, an LLM-centric system that allows users to easily create efficient, and effective data management applications. SEED comprises three main components: code generation, model generation, and augmented LLM query to address the challenges that LLM services are computationally and economically expensive and do not always work well on all cases for a given data management task. SEED addresses the expense challenge by localizing LLM computation as much as possible. This includes replacing most of LLM calls with local code, local models, and augmenting LLM queries with batching and data access tools, etc. To ensure effectiveness, SEED features a bunch of optimization techniques to enhance the localized solution and the LLM queries, including automatic code validation, code ensemble, model representatives selection, selective tool usages, etc. Moreover, with SEED users are able to easily construct a data management solution customized to their applications. It allows the users to configure each component and compose an execution pipeline in natural language. SEED then automatically compiles it into an executable program. We showcase the efficiency and effectiveness of SEED using diverse data management tasks such as data imputation, NL2SQL translation, etc., achieving state-of-the-art few-shot performance while significantly reducing the number of required LLM calls.
Deterministic Langevin Unconstrained Optimization with Normalizing Flows
results: The method achieves superior or competitive progress toward objective optima on standard synthetic test functions, and is competitive with state-of-the-art baselines on real-world scientific and neural network hyperparameter optimization problems.
Abstract
We introduce a global, gradient-free surrogate optimization strategy for expensive black-box functions inspired by the Fokker-Planck and Langevin equations. These can be written as an optimization problem where the objective is the target function to maximize minus the logarithm of the current density of evaluated samples. This objective balances exploitation of the target objective with exploration of low-density regions. The method, Deterministic Langevin Optimization (DLO), relies on a Normalizing Flow density estimate to perform active learning and select proposal points for evaluation. This strategy differs qualitatively from the widely-used acquisition functions employed by Bayesian Optimization methods, and can accommodate a range of surrogate choices. We demonstrate superior or competitive progress toward objective optima on standard synthetic test functions, as well as on non-convex and multi-modal posteriors of moderate dimension. On real-world objectives, such as scientific and neural network hyperparameter optimization, DLO is competitive with state-of-the-art baselines.
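The acquisition rule has a compact form: propose the candidate maximizing f(x) minus the log-density of previously evaluated points. In the hedged sketch below, a kernel density estimate stands in for the paper's normalizing flow:

```python
# Hedged sketch of a DLO-style proposal: maximize f(x) - log q(x).
import numpy as np
from sklearn.neighbors import KernelDensity

def dlo_propose(f, X_evaluated, bounds, n_candidates=2048, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    q = KernelDensity(bandwidth=0.3).fit(X_evaluated)  # density of past evals
    lo, hi = bounds
    cand = rng.uniform(lo, hi, size=(n_candidates, X_evaluated.shape[1]))
    # Exploitation (high f) balanced against exploration (low density q).
    acq = f(cand) - q.score_samples(cand)
    return cand[np.argmax(acq)]

f = lambda X: -np.sum((X - 0.7) ** 2, axis=1)          # toy objective
X_seen = np.random.default_rng(1).uniform(0, 1, size=(20, 2))
x_next = dlo_propose(f, X_seen, bounds=(0.0, 1.0))
print(x_next)
```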
Spectral Neural Networks: Approximation Theory and Optimization Landscape
paper_authors: Chenghui Li, Rishi Sonthalia, Nicolas Garcia Trillos
for: This paper investigates the theoretical aspects of Spectral Neural Networks (SNN) and their tradeoffs with respect to the number of neurons and the amount of spectral geometric information learned.
methods: The paper uses a theoretical approach to explore the optimization landscape of SNN’s objective function, shedding light on the training dynamics of SNN and its non-convex ambient loss function.
results: The paper presents quantitative insights into the tradeoff between the number of neurons and the amount of spectral geometric information a neural network learns, and initiates a theoretical exploration of the training dynamics of SNN.
Abstract
There is a large variety of machine learning methodologies that are based on the extraction of spectral geometric information from data. However, the implementations of many of these methods often depend on traditional eigensolvers, which present limitations when applied in practical online big data scenarios. To address some of these challenges, researchers have proposed different strategies for training neural networks as alternatives to traditional eigensolvers, with one such approach known as Spectral Neural Network (SNN). In this paper, we investigate key theoretical aspects of SNN. First, we present quantitative insights into the tradeoff between the number of neurons and the amount of spectral geometric information a neural network learns. Second, we initiate a theoretical exploration of the optimization landscape of SNN's objective to shed light on the training dynamics of SNN. Unlike typical studies of convergence to global solutions of NN training dynamics, SNN presents an additional complexity due to its non-convex ambient loss function.
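For readers unfamiliar with the setup, the following PyTorch sketch shows one common spectral-embedding objective of the kind SNNs optimize: a Rayleigh-quotient term plus an orthogonality penalty. This is a generic formulation, not necessarily the exact loss analyzed in the paper.

    import torch

    def spectral_loss(Y, L, mu=1.0):
        # Y = f_theta(X), shape (n, k); L is a graph Laplacian, shape (n, n).
        # The first term prefers low-frequency directions of L; the penalty
        # pushes Y^T Y / n toward the identity so columns stay orthonormal.
        n, k = Y.shape
        rayleigh = torch.trace(Y.T @ L @ Y) / n
        gram = Y.T @ Y / n
        ortho = ((gram - torch.eye(k)) ** 2).sum()
        return rayleigh + mu * ortho

Penalized objectives like this one are non-convex in Y, which gives a flavor of the additional complexity in SNN's training dynamics that the paper studies.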
Physics-Informed Graph Neural Network for Dynamic Reconfiguration of Power Systems
results: The results show that GraPhyR learns to solve the DyR task faster and more effectively than traditional methods.
Abstract
To maintain a reliable grid we need fast decision-making algorithms for complex problems like Dynamic Reconfiguration (DyR). DyR optimizes distribution grid switch settings in real-time to minimize grid losses and dispatches resources to supply loads with available generation. DyR is a mixed-integer problem and can be computationally intractable to solve for large grids and at fast timescales. We propose GraPhyR, a Physics-Informed Graph Neural Network (GNNs) framework tailored for DyR. We incorporate essential operational and connectivity constraints directly within the GNN framework and train it end-to-end. Our results show that GraPhyR is able to learn to optimize the DyR task.
Learning How to Propagate Messages in Graph Neural Networks
results: Experiments show that the proposed framework significantly improves performance on various types of graph benchmarks and can effectively learn interpretable and personalized propagation strategies in GNNs.
Abstract
This paper studies the problem of learning message propagation strategies for graph neural networks (GNNs). One of the challenges for graph neural networks is that of defining the propagation strategy. For instance, the choices of propagation steps are often specialized to a single graph and are not personalized to different nodes. To compensate for this, in this paper, we present learning to propagate, a general learning framework that not only learns the GNN parameters for prediction but more importantly, can explicitly learn interpretable and personalized propagation strategies for different nodes and various types of graphs. We introduce the optimal propagation steps as latent variables to help find the maximum-likelihood estimation of the GNN parameters in a variational Expectation-Maximization (VEM) framework. Extensive experiments on various types of graph benchmarks demonstrate that our proposed framework can significantly achieve better performance compared with the state-of-the-art methods, and can effectively learn personalized and interpretable propagation strategies of messages in GNNs.
The Noise Geometry of Stochastic Gradient Descent: A Quantitative and Analytical Characterization
results: The study finds that the SGD noise aligns with the local geometry of the loss landscape, and that when escaping sharp minima SGD moves along flat directions of the loss, in stark contrast to GD, which escapes only along the sharpest directions. Experiments validate the theoretical findings.
Abstract
Empirical studies have demonstrated that the noise in stochastic gradient descent (SGD) aligns favorably with the local geometry of loss landscape. However, theoretical and quantitative explanations for this phenomenon remain sparse. In this paper, we offer a comprehensive theoretical investigation into the aforementioned {\em noise geometry} for over-parameterized linear (OLMs) models and two-layer neural networks. We scrutinize both average and directional alignments, paying special attention to how factors like sample size and input data degeneracy affect the alignment strength. As a specific application, we leverage our noise geometry characterizations to study how SGD escapes from sharp minima, revealing that the escape direction has significant components along flat directions. This is in stark contrast to GD, which escapes only along the sharpest directions. To substantiate our theoretical findings, both synthetic and real-world experiments are provided.
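For concreteness, the objects the abstract refers to can be written down in standard notation (this formalization is generic, not quoted from the paper). With full-batch gradient $\nabla L(\theta)$ and minibatch gradient $\nabla \hat L_B(\theta)$, the SGD noise and its covariance are

    \epsilon_B(\theta) = \nabla \hat L_B(\theta) - \nabla L(\theta), \qquad \Sigma(\theta) = \mathbb{E}\left[ \epsilon_B(\theta)\, \epsilon_B(\theta)^\top \right],

and the alignment with the local geometry can be quantified by, e.g., the cosine similarity with the Hessian $H(\theta)$,

    \mathrm{align}(\theta) = \frac{\langle \Sigma(\theta), H(\theta) \rangle_F}{\|\Sigma(\theta)\|_F \, \|H(\theta)\|_F}.

For least-squares linear models, a frequently derived approximation (valid only under simplifying assumptions on the residuals) is $\Sigma(\theta) \approx \frac{2L(\theta)}{B} H(\theta)$, which makes the favorable alignment explicit: the noise is largest along the sharpest directions.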
PharmacoNet: Accelerating Large-Scale Virtual Screening by Deep Pharmacophore Modeling
results: Compared with existing methods, PharmacoNet is substantially faster while remaining reasonably accurate, and it retains hit candidates even under high pre-screening filtration rates. The study shows that deep-learning-based pharmacophore modeling can unlock untapped potential in virtual screening.
Abstract
As the size of accessible compound libraries expands to over 10 billion, the need for more efficient structure-based virtual screening methods is emerging. Different pre-screening methods have been developed to rapidly screen the library, but the structure-based methods applicable to general proteins are still lacking: the challenge is to predict the binding pose between proteins and ligands and perform scoring in an extremely short time. We introduce PharmacoNet, a deep learning framework that identifies the optimal 3D pharmacophore arrangement which a ligand should have for stable binding from the binding site. By coarse-grained graph matching between ligands and the generated pharmacophore arrangement, we solve the expensive binding pose sampling and scoring procedures of existing methods in a single step. PharmacoNet is significantly faster than state-of-the-art structure-based approaches, yet reasonably accurate with a simple scoring function. Furthermore, we show the promising result that PharmacoNet effectively retains hit candidates even under the high pre-screening filtration rates. Overall, our study uncovers the hitherto untapped potential of a pharmacophore modeling approach in deep learning-based drug discovery.
A General Offline Reinforcement Learning Framework for Interactive Recommendation
methods: The paper proposes a general offline reinforcement learning framework for recommendation that maximizes cumulative user rewards without online exploration. Specifically, it first introduces a probabilistic generative model for interactive recommendation and then proposes an effective inference algorithm for discrete and stochastic policy learning based on logged feedback.
results: The paper proposes five approaches to minimize the distribution mismatch between the logging policy and the recommendation policy: support constraints, supervised regularization, policy constraints, dual constraints, and reward extrapolation. Experiments on two public real-world datasets show that the proposed methods outperform existing supervised learning and reinforcement learning methods for recommendation.
Abstract
This paper studies the problem of learning interactive recommender systems from logged feedback without any exploration in online environments. We address the problem by proposing a general offline reinforcement learning framework for recommendation, which enables maximizing cumulative user rewards without online exploration. Specifically, we first introduce a probabilistic generative model for interactive recommendation, and then propose an effective inference algorithm for discrete and stochastic policy learning based on logged feedback. In order to perform offline learning more effectively, we propose five approaches to minimize the distribution mismatch between the logging policy and recommendation policy: support constraints, supervised regularization, policy constraints, dual constraints and reward extrapolation. We conduct extensive experiments on two public real-world datasets, demonstrating that the proposed methods can achieve superior performance over existing supervised learning and reinforcement learning methods for recommendation.
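Of the five mismatch-reduction ideas listed above, the policy-constraint one is the easiest to write down. The PyTorch sketch below is a generic form of such a constraint, not the paper's exact formulation; q_values and logging_probs are assumed to come from an off-policy value estimate and the logged propensities.

    import torch
    import torch.nn.functional as F

    def constrained_policy_loss(logits, q_values, logging_probs, lam=0.1):
        # Maximize expected return under the learned policy while penalizing
        # KL(pi || pi_logging), which keeps the recommendation policy close
        # to the logging policy and limits distribution mismatch.
        pi = F.softmax(logits, dim=-1)                       # (batch, n_items)
        expected_q = (pi * q_values).sum(-1).mean()
        kl = (pi * (torch.log(pi + 1e-12)
                    - torch.log(logging_probs + 1e-12))).sum(-1).mean()
        return -expected_q + lam * kl

Support constraints and the other variants differ mainly in how the penalty is shaped, e.g., forbidding actions the logging policy never took rather than softly penalizing them.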
Optimization or Architecture: How to Hack Kalman Filtering
results: The paper shows that optimizing the KF with OKF makes it competitive with neural network models on problems where it previously appeared inferior. In addition, OKF enjoys both stronger theoretical grounding and better empirical performance than the standard KF.
Abstract
In non-linear filtering, it is traditional to compare non-linear architectures such as neural networks to the standard linear Kalman Filter (KF). We observe that this mixes the evaluation of two separate components: the non-linear architecture, and the parameters optimization method. In particular, the non-linear model is often optimized, whereas the reference KF model is not. We argue that both should be optimized similarly, and to that end present the Optimized KF (OKF). We demonstrate that the KF may become competitive to neural models - if optimized using OKF. This implies that experimental conclusions of certain previous studies were derived from a flawed process. The advantage of OKF over the standard KF is further studied theoretically and empirically, in a variety of problems. Conveniently, OKF can replace the KF in real-world systems by merely updating the parameters.
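The abstract's point, that the reference KF should be optimized just like the neural baselines, is easy to realize with automatic differentiation. The PyTorch sketch below parameterizes the noise covariances via Cholesky-style factors and fits them by gradient descent on prediction error; it is a minimal illustration of the idea, not the authors' OKF implementation.

    import torch

    class DifferentiableKF(torch.nn.Module):
        def __init__(self, F, H, dim_x):
            super().__init__()
            self.F, self.H = F, H                 # known dynamics / observation matrices
            self.Lq = torch.nn.Parameter(torch.eye(dim_x))        # Q = Lq @ Lq.T
            self.Lr = torch.nn.Parameter(torch.eye(H.shape[0]))   # R = Lr @ Lr.T

        def forward(self, zs, x0, P0):
            Q, R = self.Lq @ self.Lq.T, self.Lr @ self.Lr.T
            x, P, preds = x0, P0, []
            for z in zs:                          # standard predict/update recursion
                x = self.F @ x
                P = self.F @ P @ self.F.T + Q
                S = self.H @ P @ self.H.T + R
                K = P @ self.H.T @ torch.linalg.inv(S)
                x = x + K @ (z - self.H @ x)
                P = (torch.eye(P.shape[0]) - K @ self.H) @ P
                preds.append(x)
            return torch.stack(preds)

    # Training sketch: minimize filtering MSE against ground-truth states.
    # opt = torch.optim.Adam(kf.parameters(), lr=1e-2)
    # loss = ((kf(zs, x0, P0) - xs_true) ** 2).mean(); loss.backward(); opt.step()

Because only Q and R change, the trained parameters can be dropped into an existing KF deployment, which is the "merely updating the parameters" convenience the abstract mentions.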
Learning Type Inference for Enhanced Dataflow Analysis
paper_authors: Lukas Seidel, Sedick David Baker Effendi, Xavier Pinho, Konrad Rieck, Brink van der Merwe, Fabian Yamaguchi
for: This paper aims to improve the accuracy and efficiency of type inference for dynamically-typed languages, specifically TypeScript, by using machine learning techniques.
methods: The paper proposes a Transformer-based model called CodeTIDAL5, which is trained to predict type annotations and integrates with an open-source static analysis tool called Joern.
results: The paper reports that CodeTIDAL5 outperforms the current state-of-the-art by 7.85% on the ManyTypes4TypeScript benchmark, achieving 71.27% accuracy overall, and demonstrates the benefits of the additional type information for security research.
Abstract
Statically analyzing dynamically-typed code is a challenging endeavor, as even seemingly trivial tasks such as determining the targets of procedure calls are non-trivial without knowing the types of objects at compile time. Addressing this challenge, gradual typing is increasingly added to dynamically-typed languages, a prominent example being TypeScript that introduces static typing to JavaScript. Gradual typing improves the developer's ability to verify program behavior, contributing to robust, secure and debuggable programs. In practice, however, users only sparsely annotate types directly. At the same time, conventional type inference faces performance-related challenges as program size grows. Statistical techniques based on machine learning offer faster inference, but although recent approaches demonstrate overall improved accuracy, they still perform significantly worse on user-defined types than on the most common built-in types. Limiting their real-world usefulness even more, they rarely integrate with user-facing applications. We propose CodeTIDAL5, a Transformer-based model trained to reliably predict type annotations. For effective result retrieval and re-integration, we extract usage slices from a program's code property graph. Comparing our approach against recent neural type inference systems, our model outperforms the current state-of-the-art by 7.85% on the ManyTypes4TypeScript benchmark, achieving 71.27% accuracy overall. Furthermore, we present JoernTI, an integration of our approach into Joern, an open source static analysis tool, and demonstrate that the analysis benefits from the additional type information. As our model allows for fast inference times even on commodity CPUs, making our system available through Joern leads to high accessibility and facilitates security research.
Balancing Efficiency vs. Effectiveness and Providing Missing Label Robustness in Multi-Label Stream Classification
results: ML-BELS is evaluated extensively against 11 baseline models on five synthetic and 13 real-world datasets; the results show that it balances efficiency and effectiveness while remaining robust to missing labels and concept drift.
Abstract
Existing works addressing multi-label classification in a data stream environment focus on proposing accurate models; however, these models often exhibit inefficiency and cannot balance effectiveness and efficiency. In this work, we propose a neural network-based approach that tackles this issue and is suitable for high-dimensional multi-label classification. Our model uses a selective concept drift adaptation mechanism that makes it suitable for a non-stationary environment. Additionally, we adapt our model to an environment with missing labels using a simple yet effective imputation strategy and demonstrate that it outperforms a vast majority of the state-of-the-art supervised models. To achieve our purposes, we introduce a weighted binary relevance-based approach named ML-BELS using the Broad Ensemble Learning System (BELS) as its base classifier. Instead of a chain of stacked classifiers, our model employs independent weighted ensembles, with the weights generated by the predictions of a BELS classifier. We show that using the weighting strategy on datasets with low label cardinality negatively impacts the accuracy of the model; with this in mind, we use the label cardinality as a trigger for applying the weights. We present an extensive assessment of our model using 11 state-of-the-art baselines, five synthetic datasets, and 13 real-world datasets, all with different characteristics. Our results demonstrate that the proposed approach ML-BELS is successful in balancing effectiveness and efficiency, and is robust to missing labels and concept drift.
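The label-cardinality trigger described above reduces to a few lines. The sketch below is schematic; the threshold, shapes, and function names are illustrative, not taken from the paper.

    import numpy as np

    def combine_ensemble(member_scores, weights, labels_seen):
        # member_scores: (n_members, n_labels) per-member label scores.
        # weights: per-member weights produced by the BELS meta-classifier.
        # labels_seen: (n_instances, n_labels) binary labels observed so far.
        cardinality = labels_seen.sum(axis=1).mean()   # mean labels per instance
        if cardinality > 1.5:                          # illustrative threshold
            w = weights / weights.sum()
            return (w[:, None] * member_scores).sum(axis=0)
        return member_scores.mean(axis=0)              # unweighted on low cardinality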
results: On small to medium-sized datasets, the algorithm outperforms both neural networks and k-nearest neighbor regression.
Abstract
Twin neural network regression is trained to predict differences between regression targets rather than the targets themselves. A solution to the original regression problem can be obtained by ensembling predicted differences between the targets of an unknown data point and multiple known anchor data points. Choosing the anchors to be the nearest neighbors of the unknown data point leads to a neural network-based improvement of k-nearest neighbor regression. This algorithm is shown to outperform both neural networks and k-nearest neighbor regression on small to medium-sized data sets.
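Inference in this scheme is short once the difference network is trained; the Python sketch below shows the anchor-ensembling step described above, with diff_model(a, b) assumed to approximate y(a) - y(b).

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def tnnr_predict(diff_model, x, X_train, y_train, k=8):
        # Anchors are the k nearest training points; each contributes the
        # estimate y(anchor) + predicted difference y(x) - y(anchor).
        nn = NearestNeighbors(n_neighbors=k).fit(X_train)
        _, idx = nn.kneighbors(x[None, :])
        anchors, anchor_y = X_train[idx[0]], y_train[idx[0]]
        diffs = np.array([diff_model(x, a) for a in anchors])
        return float(np.mean(anchor_y + diffs))

Averaging over anchors is what buys the improvement over plain k-nearest neighbors: each anchor's label is corrected by a learned, input-dependent offset before the vote.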
PatchMixer: A Patch-Mixing Architecture for Long-Term Time Series Forecasting
methods: The paper proposes a novel CNN-based model, PatchMixer, which uses a permutation-variant convolutional structure to preserve temporal information. Unlike conventional CNNs in this field, which often employ multiple scales or numerous branches, the method relies exclusively on depthwise separable convolutions to extract both local features and global correlations.
results: Experiments show that, relative to the state-of-the-art method and the best-performing CNN, PatchMixer yields 3.9% and 21.2% relative improvements, respectively, on seven time series forecasting benchmarks, while being 2-3x faster than the most advanced method.
Abstract
Although the Transformer has been the dominant architecture for time series forecasting tasks in recent years, a fundamental challenge remains: the permutation-invariant self-attention mechanism within Transformers leads to a loss of temporal information. To tackle these challenges, we propose PatchMixer, a novel CNN-based model. It introduces a permutation-variant convolutional structure to preserve temporal information. Diverging from conventional CNNs in this field, which often employ multiple scales or numerous branches, our method relies exclusively on depthwise separable convolutions. This allows us to extract both local features and global correlations using a single-scale architecture. Furthermore, we employ dual forecasting heads that encompass both linear and nonlinear components to better model future curve trends and details. Our experimental results on seven time-series forecasting benchmarks indicate that compared with the state-of-the-art method and the best-performing CNN, PatchMixer yields $3.9\%$ and $21.2\%$ relative improvements, respectively, while being 2-3x faster than the most advanced method. We will release our code and model.
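The single-scale depthwise-separable design with dual heads can be sketched in a few lines of PyTorch; this is a schematic reading of the abstract, not the released model.

    import torch
    import torch.nn as nn

    class PatchMixerSketch(nn.Module):
        def __init__(self, n_patches, d_model, horizon, kernel=8):
            super().__init__()
            # Depthwise conv mixes within each channel (local features);
            # the 1x1 pointwise conv mixes across channels (global correlations).
            self.depthwise = nn.Conv1d(d_model, d_model, kernel,
                                       padding="same", groups=d_model)
            self.pointwise = nn.Conv1d(d_model, d_model, 1)
            flat = n_patches * d_model
            self.linear_head = nn.Linear(flat, horizon)               # trend
            self.mlp_head = nn.Sequential(nn.Linear(flat, 2 * horizon),
                                          nn.GELU(),
                                          nn.Linear(2 * horizon, horizon))  # detail

        def forward(self, x):                 # x: (batch, d_model, n_patches)
            z = self.pointwise(self.depthwise(x)) + x
            z = z.flatten(1)
            return self.linear_head(z) + self.mlp_head(z)

Because convolution is applied along the patch axis in a fixed order, the architecture is permutation-variant, which is exactly the property the abstract contrasts with self-attention.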
A primal-dual perspective for distributed TD-learning
for: investigate distributed temporal difference (TD) learning for a networked multi-agent Markov decision process
methods: based on distributed optimization algorithms, which can be interpreted as primal-dual Ordinary differential equation (ODE) dynamics subject to null-space constraints
results: examined the behavior of the final iterate in various distributed TD-learning scenarios, considering both constant and diminishing step-sizes and incorporating both i.i.d. and Markovian observation models, without assuming a doubly stochastic matrix for the communication network structure.
Abstract
The goal of this paper is to investigate distributed temporal difference (TD) learning for a networked multi-agent Markov decision process. The proposed approach is based on distributed optimization algorithms, which can be interpreted as primal-dual Ordinary differential equation (ODE) dynamics subject to null-space constraints. Based on the exponential convergence behavior of the primal-dual ODE dynamics subject to null-space constraints, we examine the behavior of the final iterate in various distributed TD-learning scenarios, considering both constant and diminishing step-sizes and incorporating both i.i.d. and Markovian observation models. Unlike existing methods, the proposed algorithm does not require the assumption that the underlying communication network structure is characterized by a doubly stochastic matrix.
GNRK: Graph Neural Runge-Kutta method for solving partial differential equations
paper_authors: Hoyun Choi, Sungyeop Lee, B. Kahng, Junghyo Jo
for: The method is designed to solve a broad range of partial differential equations (PDEs), without being tied to specific initial conditions or PDE coefficients.
methods: The method operates on graph structures, making it robust to changes in spatial and temporal resolution during domain discretization; it combines graph neural network modules with a recurrent structure inspired by classical solvers to improve efficiency and generality.
results: On the 2-dimensional Burgers' equation, GNRK outperforms existing neural network-based PDE solvers in both model size and accuracy, and it extends straightforwardly to coupled differential equations.
Abstract
Neural networks have proven to be efficient surrogate models for tackling partial differential equations (PDEs). However, their applicability is often confined to specific PDEs under certain constraints, in contrast to classical PDE solvers that rely on numerical differentiation. Striking a balance between efficiency and versatility, this study introduces a novel approach called Graph Neural Runge-Kutta (GNRK), which integrates graph neural network modules with a recurrent structure inspired by the classical solvers. The GNRK operates on graph structures, ensuring its resilience to changes in spatial and temporal resolutions during domain discretization. Moreover, it demonstrates the capability to address general PDEs, irrespective of initial conditions or PDE coefficients. To assess its performance, we benchmark the GNRK against existing neural network based PDE solvers using the 2-dimensional Burgers' equation, revealing the GNRK's superiority in terms of model size and accuracy. Additionally, this graph-based methodology offers a straightforward extension for solving coupled differential equations, typically necessitating more intricate models.
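The recurrent structure "inspired by the classical solvers" can be read as a Runge-Kutta recursion whose vector field is a graph neural network. The sketch below shows one RK4 step under that reading; f_theta and graph are assumed interfaces, not the authors' code.

    import torch

    def gnrk_rk4_step(f_theta, graph, u, dt):
        # f_theta(graph, u) is a GNN estimating du/dt from the node states u.
        k1 = f_theta(graph, u)
        k2 = f_theta(graph, u + 0.5 * dt * k1)
        k3 = f_theta(graph, u + 0.5 * dt * k2)
        k4 = f_theta(graph, u + dt * k3)
        return u + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

Because only the graph (and not a fixed grid) enters the computation, refining the spatial discretization changes the graph but not the model, which is the resolution-robustness the abstract claims.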
On the Onset of Robust Overfitting in Adversarial Training
results: Experimental results show that the two proposed measures effectively mitigate robust overfitting and improve the adversarial robustness of the model.
Abstract
Adversarial Training (AT) is a widely-used algorithm for building robust neural networks, but it suffers from the issue of robust overfitting, the fundamental mechanism of which remains unclear. In this work, we consider normal data and adversarial perturbation as separate factors, and identify that the underlying causes of robust overfitting stem from the normal data through factor ablation in AT. Furthermore, we explain the onset of robust overfitting as a result of the model learning features that lack robust generalization, which we refer to as non-effective features. Specifically, we provide a detailed analysis of the generation of non-effective features and how they lead to robust overfitting. Additionally, we explain various empirical behaviors observed in robust overfitting and revisit different techniques to mitigate robust overfitting from the perspective of non-effective features, providing a comprehensive understanding of the robust overfitting phenomenon. This understanding inspires us to propose two measures, attack strength and data augmentation, to hinder the learning of non-effective features by the neural network, thereby alleviating robust overfitting. Extensive experiments conducted on benchmark datasets demonstrate the effectiveness of the proposed methods in mitigating robust overfitting and enhancing adversarial robustness.
Path Structured Multimarginal Schrödinger Bridge for Probabilistic Learning of Hardware Resource Usage by Control Software
results: The method converges rapidly to an accurate prediction of hardware resource usage, and it can be applied broadly to any software to predict cyber-physical, context-dependent performance at arbitrary times.
Abstract
The solution of the path structured multimarginal Schr\"{o}dinger bridge problem (MSBP) is the most-likely measure-valued trajectory consistent with a sequence of observed probability measures or distributional snapshots. We leverage recent algorithmic advances in solving such structured MSBPs for learning stochastic hardware resource usage by control software. The solution enables predicting the time-varying distribution of hardware resource availability at a desired time with guaranteed linear convergence. We demonstrate the efficacy of our probabilistic learning approach in a model predictive control software execution case study. The method exhibits rapid convergence to an accurate prediction of hardware resource utilization of the controller. The method can be broadly applied to any software to predict cyber-physical context-dependent performance at arbitrary time.
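The computational kernel behind such Schrödinger bridge solvers is Sinkhorn-type scaling, whose fixed-point iteration is the source of the guaranteed linear convergence mentioned above. A two-marginal sketch follows; the path-structured multimarginal case chains analogous updates along the sequence of snapshots.

    import numpy as np

    def sinkhorn_bridge(mu, nu, C, eps=0.1, iters=500):
        # Entropic bridge between distributional snapshots mu and nu:
        # K encodes the prior dynamics via cost C; the scalings u, v are
        # alternately updated until the coupling matches both marginals.
        K = np.exp(-C / eps)
        u, v = np.ones_like(mu), np.ones_like(nu)
        for _ in range(iters):
            u = mu / (K @ v)
            v = nu / (K.T @ u)
        return u[:, None] * K * v[None, :]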
SIMD Dataflow Co-optimization for Efficient Neural Networks Inferences on CPUs
results: The results show that the dataflow that keeps outputs in SIMD registers while maximizing both input and weight reuse consistently yields the best performance across a wide variety of inference workloads, achieving up to 3x speedup for 8-bit neural networks and up to 4.8x speedup for binary neural networks over today's optimized neural network implementations.
Abstract
We address the challenges associated with deploying neural networks on CPUs, with a particular focus on minimizing inference time while maintaining accuracy. Our novel approach is to use the dataflow (i.e., computation order) of a neural network to explore data reuse opportunities using heuristic-guided analysis and a code generation framework, which enables exploration of various Single Instruction, Multiple Data (SIMD) implementations to achieve optimized neural network execution. Our results demonstrate that the dataflow that keeps outputs in SIMD registers while also maximizing both input and weight reuse consistently yields the best performance for a wide variety of inference workloads, achieving up to 3x speedup for 8-bit neural networks, and up to 4.8x speedup for binary neural networks, respectively, over the optimized implementations of neural networks today.
for: This paper establishes a connection between discrete choice models and online learning with multi-armed bandit algorithms. The contributions are twofold: (1) sublinear regret bounds for a comprehensive family of algorithms, with the Exp3 algorithm as a special case; (2) a novel family of adversarial multi-armed bandit algorithms, drawing inspiration from the generalized nested logit models introduced by \citet{wen:2001}.
methods: The algorithms build on discrete choice models and admit closed-form sampling distribution probabilities, which makes them efficient to implement.
results: Numerical experiments on the stochastic bandit case demonstrate the practical performance of the algorithms.
Abstract
This paper establishes a connection between a category of discrete choice models and the realms of online learning and multiarmed bandit algorithms. Our contributions can be summarized in two key aspects. Firstly, we furnish sublinear regret bounds for a comprehensive family of algorithms, encompassing the Exp3 algorithm as a particular case. Secondly, we introduce a novel family of adversarial multiarmed bandit algorithms, drawing inspiration from the generalized nested logit models initially introduced by \citet{wen:2001}. These algorithms offer users the flexibility to fine-tune the model extensively, as they can be implemented efficiently due to their closed-form sampling distribution probabilities. To demonstrate the practical implementation of our algorithms, we present numerical experiments, focusing on the stochastic bandit case.
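Since Exp3 is the anchor special case above, a textbook implementation is worth spelling out; reward_fn is an assumed interface returning rewards in [0, 1].

    import numpy as np

    def exp3(K, T, reward_fn, gamma=0.1, seed=0):
        # Exponential weights with uniform exploration; the importance-weighted
        # reward estimate x_hat is unbiased under the sampling distribution p.
        rng = np.random.default_rng(seed)
        w = np.ones(K)
        for t in range(T):
            p = (1 - gamma) * w / w.sum() + gamma / K
            arm = rng.choice(K, p=p)
            x_hat = reward_fn(t, arm) / p[arm]
            w[arm] *= np.exp(gamma * x_hat / K)
        return w / w.sum()

In the discrete-choice view, the sampling distribution p plays the role of the choice probabilities; swapping these softmax-style weights for generalized nested logit probabilities yields the new family of algorithms, which remain efficient because those probabilities are likewise available in closed form.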
results: Experimental results show that the horizontal class backdoor attack is highly efficient, achieves high attack success rates, and easily evades a wide range of existing defenses.
Abstract
All existing backdoor attacks to deep learning (DL) models belong to the vertical class backdoor (VCB). That is, any sample from a class will activate the implanted backdoor in the presence of the secret trigger, regardless of source-class-agnostic or source-class-specific backdoor. Current trends of existing defenses are overwhelmingly devised for VCB attacks especially the source-class-agnostic backdoor, which essentially neglects other potential simple but general backdoor types, thus giving false security implications. It is thus urgent to discover unknown backdoor types. This work reveals a new, simple, and general horizontal class backdoor (HCB) attack. We show that the backdoor can be naturally bounded with innocuous natural features that are common and pervasive in the real world. Note that an innocuous feature (e.g., expression) is irrelevant to the main task of the model (e.g., recognizing a person from one to another). The innocuous feature spans across classes horizontally but is exhibited by partial samples per class -- satisfying the horizontal class (HC) property. Only when the trigger is concurrently presented with the HC innocuous feature, can the backdoor be effectively activated. Extensive experiments on attacking performance in terms of high attack success rates with tasks of 1) MNIST, 2) facial recognition, 3) traffic sign recognition, and 4) object detection demonstrate that the HCB is highly efficient and effective. We extensively evaluate the HCB evasiveness against a (chronologically) series of 9 influential countermeasures of Fine-Pruning (RAID 18'), STRIP (ACSAC 19'), Neural Cleanse (Oakland 19'), ABS (CCS 19'), Februus (ACSAC 20'), MNTD (Oakland 21'), SCAn (USENIX SEC 21'), MOTH (Oakland 22'), and Beatrix (NDSS 23'), where none of them can succeed even when a simplest trigger is used.
Robust Nonparametric Hypothesis Testing to Understand Variability in Training Neural Networks
results: The results show that the proposed measure better assesses the closeness of DNN models and can capture differences in the functions the models compute; the approach extends to other quantities derived from trained models.
Abstract
Training a deep neural network (DNN) often involves stochastic optimization, which means each run will produce a different model. Several works suggest this variability is negligible when models have the same performance, which in the case of classification is test accuracy. However, models with similar test accuracy may not be computing the same function. We propose a new measure of closeness between classification models based on the output of the network before thresholding. Our measure is based on a robust hypothesis-testing framework and can be adapted to other quantities derived from trained models.
Thompson Exploration with Best Challenger Rule in Best Arm Identification
results: The paper proves that the policy is asymptotically optimal for any two-armed bandit problem and near optimal for general $K$-armed bandit problems with $K \geq 3$. In numerical experiments, the policy matches the sample complexity of asymptotically optimal policies at a lower computational cost.
Abstract
This paper studies the fixed-confidence best arm identification (BAI) problem in the bandit framework in the canonical single-parameter exponential models. For this problem, many policies have been proposed, but most of them require solving an optimization problem at every round and/or are forced to explore an arm at least a certain number of times except those restricted to the Gaussian model. To address these limitations, we propose a novel policy that combines Thompson sampling with a computationally efficient approach known as the best challenger rule. While Thompson sampling was originally considered for maximizing the cumulative reward, we demonstrate that it can be used to naturally explore arms in BAI without forcing it. We show that our policy is asymptotically optimal for any two-armed bandit problems and achieves near optimality for general $K$-armed bandit problems for $K\geq 3$. Nevertheless, in numerical experiments, our policy shows competitive performance compared to asymptotically optimal policies in terms of sample complexity while requiring less computation cost. In addition, we highlight the advantages of our policy by comparing it to the concept of $\beta$-optimality, a relaxed notion of asymptotic optimality commonly considered in the analysis of a class of policies including the proposed one.
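To give the flavor of Thompson-style BAI sampling without claiming the paper's exact rule, here is top-two Thompson sampling for Gaussian rewards, a well-known relative that also alternates between a leader and a challenger.

    import numpy as np

    def top_two_thompson(means, counts, beta=0.5, seed=0):
        # Posterior per arm is N(mean, 1/count). With probability beta play
        # the Thompson leader; otherwise resample until a different arm
        # (the challenger) wins, and play that arm instead.
        rng = np.random.default_rng(seed)
        std = 1.0 / np.sqrt(np.maximum(counts, 1))
        leader = int(np.argmax(rng.normal(means, std)))
        if rng.random() < beta:
            return leader
        while True:
            challenger = int(np.argmax(rng.normal(means, std)))
            if challenger != leader:
                return challenger

The paper's policy replaces the resampling loop with the best challenger rule, avoiding both the per-round optimization and the forced-exploration requirements it criticizes in prior BAI policies.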
Statistical Limits of Adaptive Linear Models: Low-Dimensional Estimation and Inference
results: The paper shows that when data are collected adaptively, the OLS estimator can suffer inflated estimation error that grows with the degree of adaptivity, whereas with i.i.d. data its error is optimal. It further proposes a novel estimator that enables single coordinate inference under adaptive data collection and establishes its asymptotic normality.
Abstract
Estimation and inference in statistics pose significant challenges when data are collected adaptively. Even in linear models, the Ordinary Least Squares (OLS) estimator may fail to exhibit asymptotic normality for single coordinate estimation and have inflated error. This issue is highlighted by a recent minimax lower bound, which shows that the error of estimating a single coordinate can be enlarged by a multiple of $\sqrt{d}$ when data are allowed to be arbitrarily adaptive, compared with the case when they are i.i.d. Our work explores this striking difference in estimation performance between utilizing i.i.d. and adaptive data. We investigate how the degree of adaptivity in data collection impacts the performance of estimating a low-dimensional parameter component in high-dimensional linear models. We identify conditions on the data collection mechanism under which the estimation error for a low-dimensional parameter component matches its counterpart in the i.i.d. setting, up to a factor that depends on the degree of adaptivity. We show that OLS or OLS on centered data can achieve this matching error. In addition, we propose a novel estimator for single coordinate inference via solving a Two-stage Adaptive Linear Estimating equation (TALE). Under a weaker form of adaptivity in data collection, we establish an asymptotic normality property of the proposed estimator.
paper_authors: Ruipeng Guo, Qianwan Yang, Andrew S. Chang, Guorong Hu, Joseph Greene, Christopher V. Gabel, Sixian You, Lei Tian
for: This paper aims to develop a new imaging technique for visualizing complex and dynamic biological processes with high speed and large 3D space-bandwidth product (SBP).
methods: The proposed technique, called EventLFM, combines an event camera with Fourier light field microscopy (LFM) to achieve single-shot 3D wide-field imaging with asynchronous readout and high data throughput.
results: The authors demonstrate the ability of EventLFM to image fast-moving and rapidly blinking 3D samples at kHz frame rates and to track GFP-labeled neurons in freely moving C. elegans with high accuracy.
Abstract
Ultrafast 3D imaging is indispensable for visualizing complex and dynamic biological processes. Conventional scanning-based techniques necessitate an inherent tradeoff between the acquisition speed and space-bandwidth product (SBP). While single-shot 3D wide-field techniques have emerged as an attractive solution, they are still bottlenecked by the synchronous readout constraints of conventional CMOS architectures, thereby limiting the data throughput by frame rate to maintain a high SBP. Here, we present EventLFM, a straightforward and cost-effective system that circumnavigates these challenges by integrating an event camera with Fourier light field microscopy (LFM), a single-shot 3D wide-field imaging technique. The event camera operates on a novel asynchronous readout architecture, thereby bypassing the frame rate limitations intrinsic to conventional CMOS systems. We further develop a simple and robust event-driven LFM reconstruction algorithm that can reliably reconstruct 3D dynamics from the unique spatiotemporal measurements from EventLFM. We experimentally demonstrate that EventLFM can robustly image fast-moving and rapidly blinking 3D samples at KHz frame rates and furthermore, showcase EventLFM's ability to achieve 3D tracking of GFP-labeled neurons in freely moving C. elegans. We believe that the combined ultrafast speed and large 3D SBP offered by EventLFM may open up new possibilities across many biomedical applications.
Spatiotemporal Image Reconstruction to Enable High-Frame Rate Dynamic Photoacoustic Tomography with Rotating-Gantry Volumetric Imagers
paper_authors: Refik M. Cam, Chao Wang, Weylan Thompson, Sergey A. Ermilov, Mark A. Anastasio, Umberto Villa
for: The study aims to develop a spatiotemporal image reconstruction method that enables dynamic volumetric photoacoustic computed tomography (PACT) on commercially available imagers that employ a sequential, rotating-gantry scanning strategy, addressing the data-incompleteness challenges this poses.
methods: A low-rank matrix estimation-based spatiotemporal image reconstruction method (LRME-STIR) leverages spatiotemporal redundancies to accurately reconstruct a 4D spatiotemporal image.
results: Numerical studies substantiate the method's efficacy in reconstructing 4D dynamic images, and an experimental study demonstrates faithful recovery of contrast agent flow at a frame rate of 0.1 s even with a single tomographic measurement per frame.
Abstract
Significance: Dynamic photoacoustic computed tomography (PACT) is a valuable technique for monitoring physiological processes. However, current dynamic PACT techniques are often limited to 2D spatial imaging. While volumetric PACT imagers are commercially available, these systems typically employ a rotating gantry in which the tomographic data are sequentially acquired. Because the object varies during the data-acquisition process, the sequential data-acquisition poses challenges to image reconstruction associated with data incompleteness. The proposed method is highly significant in that it will address these challenges and enable volumetric dynamic PACT imaging with existing imagers. Aim: The aim of this study is to develop a spatiotemporal image reconstruction (STIR) method for dynamic PACT that can be applied to commercially available volumetric PACT imagers that employ a sequential scanning strategy. The proposed method aims to overcome the challenges caused by the limited number of tomographic measurements acquired per frame. Approach: A low-rank matrix estimation-based STIR method (LRME-STIR) is proposed to enable dynamic volumetric PACT. The LRME-STIR method leverages the spatiotemporal redundancies to accurately reconstruct a 4D spatiotemporal image. Results: The numerical studies substantiate the LRME-STIR method's efficacy in reconstructing 4D dynamic images from measurements acquired with a rotating gantry. The experimental study demonstrates the method's ability to faithfully recover the flow of a contrast agent at a frame rate of 0.1 s even when only a single tomographic measurement per frame is available. Conclusions: The LRME-STIR method offers a promising solution to the challenges faced by enabling 4D dynamic imaging using commercially available volumetric imagers. By enabling accurate 4D reconstruction, this method has the potential to advance preclinical research.
results: The paper presents a fully-passive malicious jammer that requires neither jamming power nor channel state information (CSI), proposes an anti-jamming strategy that relies only on statistical CSI, and introduces a data frame structure that enables estimation of the statistical CSI in the presence of the jamming.
Abstract
Emerging intelligent reflective surfaces (IRSs) significantly improve system performance, but also pose a signifcant risk for physical layer security (PLS). Unlike the extensive research on legitimate IRS-enhanced communications, in this article we present an adversarial IRS-based fully-passive jammer (FPJ). We describe typical application scenarios for Disco IRS (DIRS)-based FPJ, where an illegitimate IRS with random, time-varying reflection properties acts like a "disco ball" to randomly change the propagation environment. We introduce the principles of DIRS-based FPJ and overview existing investigations of the technology, including a design example employing one-bit phase shifters. The DIRS-based FPJ can be implemented without either jamming power or channel state information (CSI) for the legitimate users (LUs). It does not suffer from the energy constraints of traditional active jammers, nor does it require any knowledge of the LU channels. In addition to the proposed jamming attack, we also propose an anti-jamming strategy that requires only statistical rather than instantaneous CSI. Furthermore, we present a data frame structure that enables the legitimate access point (AP) to estimate the statistical CSI in the presence of the DIRS jamming. Typical cases are discussed to show the impact of the DIRS-based FPJ and the feasibility of the anti-jamming precoder. Moreover, we outline future research directions and challenges for the DIRS-based FPJ and its anti-jamming precoding to stimulate this line of research and pave the way for practical applications.
Sequential Monte Carlo Graph Convolutional Network for Dynamic Brain Connectivity
methods: The method builds on the particle filtering algorithm, which estimates the hidden states of a dynamic graph from only partial and noisy observations without assuming a stationary connectivity topology, and limits spurious connections through a Sequential Monte Carlo Graph Convolutional Network (SMC-GCN).
results: Experimental studies show that SMC-GCN achieves superior performance over several methods in brain disorder classification.
Abstract
An increasingly important brain function analysis modality is functional connectivity analysis which regards connections as statistical codependency between the signals of different brain regions. Graph-based analysis of brain connectivity provides a new way of exploring the association between brain functional deficits and the structural disruption related to brain disorders, but the current implementations have limited capability due to the assumptions of noise-free data and stationary graph topology. We propose a new methodology based on the particle filtering algorithm, with proven success in tracking problems, which estimates the hidden states of a dynamic graph with only partial and noisy observations, without the assumptions of stationarity on connectivity. We enrich the particle filtering state equation with a graph Neural Network called Sequential Monte Carlo Graph Convolutional Network (SMC-GCN), which due to the nonlinear regression capability, can limit spurious connections in the graph. Experiment studies demonstrate that SMC-GCN achieves the superior performance of several methods in brain disorder classification.
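Underneath SMC-GCN sits a standard sequential-importance-resampling step. The Python sketch below shows that backbone; in SMC-GCN the transition and likelihood models are parameterized by a graph convolutional network rather than fixed by hand.

    import numpy as np

    def bootstrap_pf_step(particles, weights, transition, likelihood, obs, rng):
        # Propagate hidden graph states, reweight by the noisy/partial
        # observation, and resample when the effective sample size collapses.
        particles = transition(particles)
        weights = weights * likelihood(obs, particles)
        weights = weights / weights.sum()
        ess = 1.0 / np.sum(weights ** 2)
        if ess < 0.5 * len(weights):
            idx = rng.choice(len(weights), size=len(weights), p=weights)
            particles = particles[idx]
            weights = np.full(len(weights), 1.0 / len(weights))
        return particles, weights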
Nonlinear Multi-Carrier System with Signal Clipping: Measurement, Analysis, and Optimization
for: Reducing the peak-to-average power ratio (PAPR) in OFDM systems
methods: Using the Bessel-Fourier PA (BFPA) model to analyze the nonlinearity of the power amplifier (PA), and simplifying the power expression using inter-modulation product (IMP) analysis
results: Optimizing the system setting for a nonlinear clipped OFDM system to achieve the symbol error rate (SER) lower bound in a practical system that considers both PA nonlinearity and clipping distortion.
Abstract
Signal clipping is a classic technique for reducing peak-to-average power ratio (PAPR) in orthogonal frequency division multiplexing (OFDM) systems. It has been widely applied in consumer electronic devices owing to its low complexity and high efficiency. Although clipping reduces the nonlinear distortion caused by power amplifiers (PAs), it induces additional clipping distortion. Optimizing the joint system performance with consideration of both PA nonlinearity and clipping distortion remains an open problem due to the complex PA modeling. In this paper, we analyze the PA nonlinearity through the Bessel-Fourier PA (BFPA) model and simplify its power expression using inter-modulation product (IMP) analysis. We derive expressions of the receiver signal-to-noise ratio (SNR) and system symbol error rate (SER) for the nonlinear clipped OFDM system. With the derivations, we investigate the optimal system setting to achieve the SER lower bound in a practical OFDM system that considers both PA nonlinearity and clipping distortion. The methods and results presented in this paper can serve as a useful reference for the system-level optimization of clipped OFDM systems with nonlinear PA.
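Since the whole analysis starts from amplitude clipping, a concrete reference implementation helps fix notation; the clipping-ratio convention below is one common choice, not necessarily the paper's.

    import numpy as np

    def clip_ofdm(x, clip_ratio_db=5.0):
        # Limit the magnitude of the complex baseband signal to A while
        # preserving phase; A is set relative to the RMS level.
        rms = np.sqrt(np.mean(np.abs(x) ** 2))
        A = rms * 10 ** (clip_ratio_db / 20)
        mag = np.maximum(np.abs(x), 1e-12)
        return np.where(mag > A, A * x / mag, x)

    def papr_db(x):
        p = np.abs(x) ** 2
        return 10 * np.log10(p.max() / p.mean())

    # Example: clipping a 1024-subcarrier OFDM symbol lowers its PAPR.
    X = (np.random.randn(1024) + 1j * np.random.randn(1024)) / np.sqrt(2)
    x = np.fft.ifft(X) * np.sqrt(1024)
    print(papr_db(x), papr_db(clip_ofdm(x)))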
An IRS-Assisted Secure Dual-Function Radar-Communication System
methods: The system uses an intelligent reflecting surface (IRS) and artificial noise (AN), and jointly optimizes the radar waveform, the AN jamming noise, and the IRS parameters to maximize the communication secrecy rate while meeting radar signal-to-noise ratio (SNR) constraints.
results: The paper proposes a novel system design in which a fractional programming technique transforms the fractional-form objective into more tractable non-fractional polynomials. Numerical results demonstrate the convergence of the design algorithm and show the impact of the power assigned to the AN on the secrecy performance of the system.
Abstract
In dual-function radar-communication (DFRC) systems the probing signal contains information intended for the communication users, which makes that information vulnerable to eavesdropping by the targets. We propose a novel design for enhancing the physical layer security (PLS) of DFRC systems, via the help of intelligent reflecting surface (IRS) and artificial noise (AN), transmitted along with the probing waveform. The radar waveform, the AN jamming noise and the IRS parameters are designed to optimize the communication secrecy rate while meeting radar signal-to-noise ratio (SNR) constraints. Key challenges in the resulting optimization problem include the fractional form objective, the SNR being a quartic function of the IRS parameters, and the unit-modulus constraint of the IRS parameters. A fractional programming technique is used to transform the fractional form objective of the optimization problem into more tractable non-fractional polynomials. Numerical results are provided to demonstrate the convergence of the proposed system design algorithm, and also show the impact of the power assigned to the AN on the secrecy performance of the designed system.
An Experimental Prototype for Multistatic Asynchronous ISAC
results: 实验结果表明,得益于接收节点的空间多样性,多站ISAC系统可以从多个视角提供更强的感知能力。Abstract
We prototype and validate a multistatic mmWave ISAC system based on IEEE 802.11ay. Compensation of the clock asynchrony between each TX and RX pair is performed using only the line-of-sight (LoS) wireless signal propagation. As a result, our system provides concurrent target tracking and micro-Doppler estimation from multiple points of view, paving the way for practical multistatic data fusion. Our results on human movement sensing, complemented with precise, quantitative ground-truth (GT) data, demonstrate the enhanced sensing capabilities of multistatic ISAC, due to the spatial diversity of the receiver nodes.
摘要
我们构建并验证了一个基于IEEE 802.11ay的多站毫米波ISAC系统。我们仅利用视距(LoS)无线信号传播对每对收发节点之间的时钟偏差进行补偿,因此系统可以从多个视角同时进行目标跟踪和微多普勒估计,为实用的多站数据融合铺平道路。我们在人体运动感知上的结果,辅以精确的量化真值(GT)数据,表明得益于接收节点的空间多样性,多站ISAC具有更强的感知能力。
results: 该算法在多种流行编程语言中的实现可以避免可感知的切换伪影,并保持恒定的计算成本和较低的内存占用。代码可在GitHub上免费获取。Abstract
Virtual and augmented realities are increasingly popular tools in many domains such as architecture, production, training and education, (psycho)therapy, gaming, and others. For a convincing rendering of sound in virtual and augmented environments, audio signals must be convolved in real-time with impulse responses that change from one moment in time to another. Key requirements for the implementation of such time-variant real-time convolution algorithms are short latencies, moderate computational cost and memory footprint, and no perceptible switching artifacts. In this engineering report, we introduce a partitioned convolution algorithm that is able to quickly switch between impulse responses without introducing perceptible artifacts, while maintaining a constant computational load and low memory usage. Implementations in several popular programming languages are freely available via GitHub.
摘要
虚拟和增强现实技术在建筑、生产、培训和教育、心理治疗、游戏等多个领域得到了广泛应用。为在虚拟和增强环境中提供逼真的声音渲染,音频信号必须与随时间变化的冲激响应进行实时卷积。实现这种时变实时卷积算法的关键要求包括:低延迟、适中的计算成本和内存占用,以及无可感知的切换伪影。在本工程报告中,我们介绍了一种分块卷积算法,可以在不引入可感知伪影的前提下快速切换冲激响应,同时保持恒定的计算负担和较低的内存占用。该算法在多种流行编程语言中的实现可在GitHub上免费获取。
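The artifact-free switching the report describes hinges on not hard-switching impulse responses; a common ingredient is to run both the old and new IR for one block and crossfade the two outputs. Below is a minimal block-convolution sketch of that idea (per-block FFT convolution with carried tails); the report's uniformly partitioned scheme and exact crossfade schedule are not reproduced here.

```python
import numpy as np
from scipy.signal import fftconvolve

def switch_ir_block(x_block, ir_old, ir_new, tail_old, tail_new):
    """Process one audio block while switching impulse responses.

    The block is convolved with both the outgoing and incoming IR and the
    two outputs are crossfaded with a raised-cosine ramp, avoiding the
    audible click of a hard switch. tail_* carry convolution tails
    between blocks (overlap-add state).
    """
    n = len(x_block)

    def run(ir, tail):
        y = fftconvolve(x_block, ir)
        y[: len(tail)] += tail            # add the previous block's tail
        return y[:n], y[n:]               # this block's output, new tail

    y_old, _ = run(ir_old, tail_old)
    y_new, new_tail = run(ir_new, tail_new)

    fade = 0.5 * (1.0 - np.cos(np.pi * np.arange(n) / n))
    return (1.0 - fade) * y_old + fade * y_new, new_tail

# Usage: a 512-sample block switching between two random 128-tap IRs.
rng = np.random.default_rng(1)
x = rng.standard_normal(512)
h_old, h_new = rng.standard_normal(128), rng.standard_normal(128)
out, tail = switch_ir_block(x, h_old, h_new, np.zeros(127), np.zeros(127))
```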
A Novel U-Net Architecture for Denoising of Real-world Noise Corrupted Phonocardiogram Signal
for: 本研究旨在提出一种基于U-Net深度神经网络架构的心音信号去噪方法,以解决医学听诊中心音信号的噪声污染问题。
methods: 为了设计、开发和验证所提架构,提出了一种合成真实世界噪声污染PCG信号的新方法,并使用了一个真实噪声样本数据集和一个开放PCG数据集。
results: 在合成噪声PCG数据集上的性能评估表明,与现有最先进技术相比,所提去噪方法在定性和定量评估中均有显著改进。Abstract
The bio-acoustic information contained within heart sound signals is utilized by physicians worldwide for auscultation purposes. However, heart sounds are inherently susceptible to noise contamination from sources such as lung sounds, coughing, sneezing, and other background noises. Such corruption of the heart sound signal often leads to inconclusive or false diagnosis. To address this issue, we propose a novel U-Net based deep neural network architecture for denoising of phonocardiogram (PCG) signals in this paper. For the design, development and validation of the proposed architecture, a novel approach of synthesizing real-world noise corrupted PCG signals has been proposed. For this purpose, an open-access real-world noise sample dataset and an open-access PCG dataset have been utilized. The performance of the proposed denoising methodology has been evaluated on the synthesized noisy PCG dataset and compared with existing state-of-the-art (SoA) denoising algorithms qualitatively and quantitatively. The proposed denoising technique has shown improved performance in comparison to the SoAs.
摘要
心音信号中包含的生物声学信息被世界各地的医生用于听诊。然而,心音信号天然容易受到噪声污染,噪声来源包括肺音、咳嗽、喷嚏以及其他背景噪声等。这种对心音信号的污染常常导致无法确诊或误诊。为解决这一问题,我们在本文中提出了一种基于U-Net深度神经网络架构的心音图(PCG)信号去噪方法。为了设计、开发和验证该架构,我们提出了一种合成真实世界噪声污染PCG信号的新方法,并为此使用了一个开放获取的真实噪声样本数据集和一个开放获取的PCG数据集。我们在合成噪声PCG数据集上评估了所提去噪方法,并与现有最先进(SoA)方法进行了定性和定量比较。结果表明,所提去噪方法的性能优于SoA方法。
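The noisy-PCG synthesis the paper relies on is, at its core, additive mixing of real noise into clean recordings at a controlled SNR. Here is a sketch under that assumption; the paper's exact mixing protocol and the datasets' sampling details may differ, and both signals below are random placeholders.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db, rng=None):
    """Additively mix `noise` into `clean` at a target SNR in dB.

    The noise sample is tiled and randomly cropped to the clean signal's
    length, then scaled so 10*log10(P_clean / P_noise) equals snr_db.
    """
    rng = rng or np.random.default_rng()
    reps = int(np.ceil(len(clean) / len(noise))) + 1
    start = rng.integers(0, len(noise))
    seg = np.tile(noise, reps)[start : start + len(clean)]

    gain = np.sqrt(np.mean(clean ** 2) / (np.mean(seg ** 2) * 10 ** (snr_db / 10)))
    return clean + gain * seg

# Usage: corrupt a clean PCG segment with a lung-sound noise sample at 0 dB.
rng = np.random.default_rng(0)
pcg = rng.standard_normal(10_000)    # placeholder for a clean PCG recording
lung = rng.standard_normal(3_000)    # placeholder for a real-world noise sample
noisy = mix_at_snr(pcg, lung, snr_db=0.0, rng=rng)
```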
results: 研究发现,现有的深度神经网络模型在黑人肤色区域中的性能不佳,只能在白皮肤区域中表现出色。这显示了这些模型在不同肤色区域中的一致性不足。Abstract
Melanoma is the most severe type of skin cancer due to its ability to cause metastasis. It is more common in black people, often affecting acral regions: palms, soles, and nails. Deep neural networks have shown tremendous potential for improving clinical care and skin cancer diagnosis. Nevertheless, prevailing studies predominantly rely on datasets of white skin tones, neglecting to report diagnostic outcomes for diverse patient skin tones. In this work, we evaluate supervised and self-supervised models in skin lesion images extracted from acral regions commonly observed in black individuals. Also, we carefully curate a dataset containing skin lesions in acral regions and assess the datasets concerning the Fitzpatrick scale to verify performance on black skin. Our results expose the poor generalizability of these models, revealing their favorable performance for lesions on white skin. Neglecting to create diverse datasets, which necessitates the development of specialized models, is unacceptable. Deep neural networks have great potential to improve diagnosis, particularly for populations with limited access to dermatology. However, including black skin lesions is necessary to ensure these populations can access the benefits of inclusive technology.
摘要
黑色素瘤是最严重的皮肤癌,因为它会发生转移。它在黑人中常见于肢端部位:手掌、脚底和指甲。深度神经网络在临床护理和皮肤癌诊断方面展现出巨大潜力。然而,现有研究大多依赖白皮肤数据集,未报告不同肤色患者的诊断结果。在这项工作中,我们评估了监督和自监督模型在黑人常见的肢端部位(手掌、脚底和指甲)皮损图像上的表现。此外,我们精心整理了一个包含肢端部位皮损的数据集,并依据Fitzpatrick量表对数据进行评估,以验证模型在黑皮肤上的性能。我们的结果暴露了这些模型泛化能力的不足:它们仅对白皮肤上的皮损表现良好。忽视构建多样化的数据集是不可接受的,这也要求开发专门的模型。深度神经网络在改善诊断方面潜力巨大,尤其是对皮肤科资源有限的人群;但必须纳入黑皮肤皮损数据,才能确保这些人群享受到包容性技术的益处。
Exploring SAM Ablations for Enhancing Medical Segmentation in Radiology and Pathology
results: 一系列精心设计的实验表明,SAM在放射学(特别是脑肿瘤分割)和病理学(特别是乳腺癌分割)中的应用具有很高的潜力,可以帮助解决医学影像分割的挑战。Abstract
Medical imaging plays a critical role in the diagnosis and treatment planning of various medical conditions, with radiology and pathology heavily reliant on precise image segmentation. The Segment Anything Model (SAM) has emerged as a promising framework for addressing segmentation challenges across different domains. In this white paper, we delve into SAM, breaking down its fundamental components and uncovering the intricate interactions between them. We also explore the fine-tuning of SAM and assess its profound impact on the accuracy and reliability of segmentation results, focusing on applications in radiology (specifically, brain tumor segmentation) and pathology (specifically, breast cancer segmentation). Through a series of carefully designed experiments, we analyze SAM's potential application in the field of medical imaging. We aim to bridge the gap between advanced segmentation techniques and the demanding requirements of healthcare, shedding light on SAM's transformative capabilities.
摘要
医学影像在各种疾病的诊断和治疗规划中扮演着关键角色,放射学和病理学都高度依赖精确的图像分割。Segment Anything Model(SAM)已成为应对不同领域分割挑战的一个有前途的框架。在这份白皮书中,我们深入剖析SAM,分解其基本组件并揭示组件之间的复杂交互。我们还探讨了SAM的微调,并评估其对分割结果准确性和可靠性的深远影响,重点关注放射学(特别是脑肿瘤分割)和病理学(特别是乳腺癌分割)中的应用。通过一系列精心设计的实验,我们分析了SAM在医学影像领域的潜在应用,旨在弥合先进分割技术与医疗严苛需求之间的差距,并揭示SAM的变革性能力。
Black-box Attacks on Image Activity Prediction and its Natural Language Explanations
results: 研究发现,生成此类自然语言解释的模型很容易受到黑盒攻击:仅访问模型的最终输出,就可以构造对抗图像,使模型生成不忠实的解释。Abstract
Explainable AI (XAI) methods aim to describe the decision process of deep neural networks. Early XAI methods produced visual explanations, whereas more recent techniques generate multimodal explanations that include textual information and visual representations. Visual XAI methods have been shown to be vulnerable to white-box and gray-box adversarial attacks, with an attacker having full or partial knowledge of and access to the target system. As the vulnerabilities of multimodal XAI models have not been examined, in this paper we assess for the first time the robustness to black-box attacks of the natural language explanations generated by a self-rationalizing image-based activity recognition model. We generate unrestricted, spatially variant perturbations that disrupt the association between the predictions and the corresponding explanations to mislead the model into generating unfaithful explanations. We show that we can create adversarial images that manipulate the explanations of an activity recognition model by having access only to its final output.
摘要
可解释AI(XAI)方法旨在描述深度神经网络的决策过程。早期的XAI方法生成视觉解释,而较新的技术生成包含文本信息和视觉表示的多模态解释。视觉XAI方法已被证明在白盒和灰盒对抗攻击下十分脆弱,此时攻击者对目标系统拥有全部或部分的了解和访问权限。由于多模态XAI模型的脆弱性尚未被研究,本文首次评估了自解释的图像活动识别模型所生成的自然语言解释在黑盒攻击下的鲁棒性。我们生成无限制、空间可变的扰动,破坏预测与对应解释之间的关联,诱导模型生成不忠实的解释。我们证明,仅访问模型的最终输出,就可以构造操纵活动识别模型解释的对抗图像。
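The attack setting, with access only to the model's final output, can be illustrated with a generic random-search query loop that accepts a perturbation whenever it pushes the generated explanation further from the clean one. Everything here (`model`, `expl_sim`, and the toy stand-ins) is an assumed placeholder, not the paper's actual components or attack.

```python
import numpy as np

def black_box_attack(image, model, expl_sim, steps=300, eps=8 / 255, rng=None):
    """Random-search black-box attack on a model's generated explanation.

    `model(x)` -> (label, explanation) and `expl_sim(a, b)` -> similarity
    in [0, 1] are assumed placeholders; only model *outputs* are used.
    A candidate perturbation is kept whenever it pushes the explanation
    further from the clean one, within an L-inf budget `eps`.
    """
    rng = rng or np.random.default_rng()
    _, ref = model(image)
    delta, best = np.zeros_like(image), 1.0
    for _ in range(steps):
        cand = np.clip(delta + rng.uniform(-eps / 4, eps / 4, image.shape), -eps, eps)
        _, expl = model(np.clip(image + cand, 0.0, 1.0))
        sim = expl_sim(ref, expl)
        if sim < best:                    # explanation drifted further away
            delta, best = cand, sim
    return np.clip(image + delta, 0.0, 1.0), best

# Toy stand-ins: the "explanation" is the set of the 10 brightest pixels.
def toy_model(x):
    return int(x.mean() > 0.5), x.reshape(-1).argsort()[-10:]

def toy_sim(a, b):
    return len(np.intersect1d(a, b)) / 10.0

adv, sim = black_box_attack(np.random.default_rng(0).uniform(size=(8, 8)),
                            toy_model, toy_sim)
```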
Small Visual Language Models can also be Open-Ended Few-Shot Learners
results: 在多个多模态少样本数据集上表现出优秀的灵活性和性能,并且可以用小型模型(约1B参数)实现,而不需要大型模型或专有模型。Abstract
We present Self-Context Adaptation (SeCAt), a self-supervised approach that unlocks open-ended few-shot abilities of small visual language models. Our proposed adaptation algorithm explicitly learns from symbolic, yet self-supervised training tasks. Specifically, our approach imitates image captions in a self-supervised way based on clustering a large pool of images followed by assigning semantically-unrelated names to clusters. By doing so, we construct the `self-context', a training signal consisting of interleaved sequences of image and pseudo-caption pairs and a query image for which the model is trained to produce the right pseudo-caption. We demonstrate the performance and flexibility of SeCAt on several multimodal few-shot datasets, spanning various granularities. By using models with approximately 1B parameters we outperform the few-shot abilities of much larger models, such as Frozen and FROMAGe. SeCAt opens new possibilities for research in open-ended few-shot learning that otherwise requires access to large or proprietary models.
摘要
我们介绍Self-Context Adaptation(SeCAt),一种自监督方法,可以解锁小型视觉语言模型的开放式少样本学习能力。我们提出的适应算法显式地从符号化但自监督的训练任务中学习。具体来说,我们的方法以自监督方式模仿图像描述:先对大量图像进行聚类,再为各个簇分配语义无关的名称。由此我们构建了"自我上下文"——一种由图像与伪描述对交错排列的序列以及一张查询图像组成的训练信号,模型被训练为该查询图像生成正确的伪描述。我们在多个不同粒度的多模态少样本数据集上展示了SeCAt的性能和灵活性。使用约1B参数的模型,我们超越了Frozen和FROMAGe等更大模型的少样本能力。SeCAt为开放式少样本学习研究开辟了新的可能性,而此前这类研究需要访问大型或专有模型。
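The "self-context" construction can be sketched in a few lines: cluster a pool of image embeddings, assign semantically unrelated pseudo-names to the clusters, and build interleaved support pairs plus a query. The feature matrix, the nonsense-name scheme (`dax0`, `dax1`, ...), and the episode shape below are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-ins: 1,000 image embeddings (e.g. from a frozen vision encoder)
# and a pool of semantically unrelated nonsense names for the clusters.
feats = rng.standard_normal((1000, 64)).astype(np.float32)
names = [f"dax{i}" for i in range(50)]      # pseudo-captions, not real classes

labels = KMeans(n_clusters=50, n_init=10, random_state=0).fit_predict(feats)

def make_episode(n_way=2, n_shots=2):
    """One 'self-context' episode: interleaved (image, pseudo-caption)
    support pairs from n_way clusters, then a query image for which the
    model must produce the right pseudo-caption."""
    ways = rng.choice(50, size=n_way, replace=False)
    support = []
    for c in ways:
        idx = rng.choice(np.flatnonzero(labels == c), size=n_shots + 1, replace=False)
        support += [(int(i), names[c]) for i in idx[:n_shots]]
    query_idx, target = int(idx[-1]), names[ways[-1]]  # query from last cluster
    return support, query_idx, target

support, query_idx, target = make_episode()
```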
The Sparsity Roofline: Understanding the Hardware Limits of Sparse Neural Networks
paper_authors: Cameron Shinn, Collin McCarthy, Saurav Muralidharan, Muhammad Osama, John D. Owens
for: 评估神经网络中稀疏性对性能的影响
methods: 提出了一种名为"稀疏屋顶线"(Sparsity Roofline)的可视化性能模型,用于评估神经网络中的稀疏性
results: 通过一种新的分析模型预测稀疏神经网络的性能,并在多种真实世界计算机视觉架构上以不同的稀疏模式和程度进行剪枝,验证了预测的加速比。Abstract
We introduce the Sparsity Roofline, a visual performance model for evaluating sparsity in neural networks. The Sparsity Roofline jointly models network accuracy, sparsity, and predicted inference speedup. Our approach does not require implementing and benchmarking optimized kernels, and the predicted speedup is equal to what would be measured when the corresponding dense and sparse kernels are equally well-optimized. We achieve this through a novel analytical model for predicting sparse network performance, and validate the predicted speedup using several real-world computer vision architectures pruned across a range of sparsity patterns and degrees. We demonstrate the utility and ease-of-use of our model through two case studies: (1) we show how machine learning researchers can predict the performance of unimplemented or unoptimized block-structured sparsity patterns, and (2) we show how hardware designers can predict the performance implications of new sparsity patterns and sparse data formats in hardware. In both scenarios, the Sparsity Roofline helps performance experts identify sparsity regimes with the highest performance potential.
摘要
我们介绍稀疏屋顶线(Sparsity Roofline),一种用于评估神经网络稀疏性的可视化性能模型。稀疏屋顶线同时建模网络精度、稀疏度和预测的推理加速比。我们的方法不需要实现和基准测试优化的内核,且当对应的稠密内核与稀疏内核被同等程度优化时,预测的加速比与实测值一致。我们通过一个新的分析模型来预测稀疏网络的性能,并在多种真实世界计算机视觉架构上以不同的稀疏模式和程度进行剪枝,验证了预测的加速比。我们通过两个案例研究展示该模型的实用性和易用性:其一,机器学习研究人员可以预测尚未实现或未优化的块结构稀疏模式的性能;其二,硬件设计师可以预测新的稀疏模式和稀疏数据格式在硬件上的性能影响。在这两个场景中,稀疏屋顶线都能帮助性能专家识别性能潜力最高的稀疏区间。
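The paper's analytical prediction is in the spirit of a roofline bound: a layer's time is the larger of its compute time and its memory-traffic time, evaluated for the dense and the sparse variant. Below is a deliberately crude sketch of that estimate; the A100-like peak numbers, the metadata overhead factor, and the assumption that all traffic scales with density are simplifications of mine, not the Sparsity Roofline model itself.

```python
def roofline_time(flops, bytes_moved, peak_flops=19.5e12, peak_bw=1.555e12):
    """Roofline execution-time estimate: a kernel is limited either by
    compute (flops/peak_flops) or by memory traffic (bytes/peak_bw).
    Defaults are A100-like fp32 peaks, used purely for illustration."""
    return max(flops / peak_flops, bytes_moved / peak_bw)

def predicted_speedup(dense_flops, dense_bytes, sparsity, meta_overhead=1.1):
    """Speedup of a weight-sparse layer over its dense counterpart,
    crudely assuming FLOPs and traffic scale with density (1 - sparsity)
    and that sparse formats pay a metadata-traffic overhead."""
    density = 1.0 - sparsity
    t_dense = roofline_time(dense_flops, dense_bytes)
    t_sparse = roofline_time(dense_flops * density,
                             dense_bytes * density * meta_overhead)
    return t_dense / t_sparse

# Example: a 4096x4096 linear layer, batch 1024, fp32, pruned to 90% sparsity.
flops = 2 * 1024 * 4096 * 4096
bytes_moved = 4 * (2 * 1024 * 4096 + 4096 * 4096)  # activations in/out + weights
print(f"predicted speedup: {predicted_speedup(flops, bytes_moved, 0.9):.2f}x")
```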
Diff-DOPE: Differentiable Deep Object Pose Estimation
results: 在姿态估计数据集上实现了最先进的效果,并且可以处理多种模态,如RGB、深度、强度边缘和物体分割掩码。Abstract
We introduce Diff-DOPE, a 6-DoF pose refiner that takes as input an image, a 3D textured model of an object, and an initial pose of the object. The method uses differentiable rendering to update the object pose to minimize the visual error between the image and the projection of the model. We show that this simple, yet effective, idea is able to achieve state-of-the-art results on pose estimation datasets. Our approach is a departure from recent methods in which the pose refiner is a deep neural network trained on a large synthetic dataset to map inputs to refinement steps. Rather, our use of differentiable rendering allows us to avoid training altogether. Our approach performs multiple gradient descent optimizations in parallel with different random learning rates to avoid local minima from symmetric objects, similar appearances, or wrong step size. Various modalities can be used, e.g., RGB, depth, intensity edges, and object segmentation masks. We present experiments examining the effect of various choices, showing that the best results are found when the RGB image is accompanied by an object mask and depth image to guide the optimization process.
摘要
我们介绍Diff-DOPE,一种6自由度(6-DoF)姿态精化器,它以一张图像、一个带纹理的3D物体模型和物体的初始姿态作为输入。该方法使用可微渲染更新物体姿态,以最小化图像与模型投影之间的视觉误差。我们表明,这个简单而有效的想法能够在姿态估计数据集上取得最先进的结果。我们的方法不同于近期将姿态精化器实现为在大规模合成数据上训练、把输入映射到精化步骤的深度神经网络的做法;借助可微渲染,我们完全无需训练。我们的方法以不同的随机学习率并行执行多次梯度下降优化,以避免由物体对称性、相似外观或错误步长导致的局部极小值。可以使用多种模态,例如RGB、深度、强度边缘和物体分割掩码。我们通过实验考察了各种选择的影响,结果表明当RGB图像辅以物体掩码和深度图像来引导优化过程时效果最佳。
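The refinement loop itself is ordinary gradient descent through a differentiable renderer, replicated with random learning rates to dodge local minima. Here is a PyTorch-shaped sketch in which `render(pose) -> image` stands in for the differentiable renderer (not a real API) and the 6-vector pose parameterization is an assumption of the sketch, not necessarily the paper's.

```python
import torch

def refine_pose(image, init_pose, render, steps=100, n_parallel=8):
    """Gradient-based 6-DoF pose refinement through a differentiable
    renderer: minimize the photometric error between the observed image
    and the rendered model. Several copies run in parallel with random
    learning rates to escape local minima caused by symmetries, similar
    appearances, or a bad step size.

    `render(pose) -> image` is an assumed differentiable callable;
    `init_pose` is a 6-vector (translation + axis-angle rotation).
    """
    poses = init_pose.detach().repeat(n_parallel, 1).requires_grad_(True)
    lrs = 10 ** torch.empty(n_parallel).uniform_(-3, -1)  # LRs in [1e-3, 1e-1]

    def losses():
        return torch.stack([((render(p) - image) ** 2).mean() for p in poses])

    for _ in range(steps):
        grads = torch.autograd.grad(losses().sum(), poses)[0]
        with torch.no_grad():
            poses -= lrs[:, None] * grads    # one SGD step per replica
    return poses[losses().argmin()].detach() # keep the best replica
```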
UniLVSeg: Unified Left Ventricular Segmentation with Sparsely Annotated Echocardiogram Videos through Self-Supervised Temporal Masking and Weakly Supervised Training
paper_authors: Fadillah Maani, Asim Ukaye, Nada Saadi, Numan Saeed, Mohammad Yaqub
for: 本研究旨在提出一种可靠且高效的左心室(LV)分割方法,以帮助医生更精确地诊断心血管疾病。
methods: 结合基于时间掩蔽的自监督学习(SSL)和弱监督训练两种方法,并考察了两种不同的分割方案:3D分割和一种新颖的2D超图(SI)。
results: 所提方法在大规模数据集(EchoNet-Dynamic)上获得了93.32%(95%CI 93.21-93.43%)的Dice分数,优于此前的方法且更为高效。我们还提供了广泛的消融研究,包括预训练设置和不同的深度学习骨干网络。Abstract
Echocardiography has become an indispensable clinical imaging modality for general heart health assessment. From calculating biomarkers such as ejection fraction to the probability of a patient's heart failure, accurate segmentation of the heart and its structures allows doctors to plan and execute treatments with greater precision and accuracy. However, achieving accurate and robust left ventricle segmentation is time-consuming and challenging due to different reasons. This work introduces a novel approach for consistent left ventricular (LV) segmentation from sparsely annotated echocardiogram videos. We achieve this through (1) self-supervised learning (SSL) using temporal masking followed by (2) weakly supervised training. We investigate two different segmentation approaches: 3D segmentation and a novel 2D superimage (SI). We demonstrate how our proposed method outperforms the state-of-the-art solutions by achieving a 93.32% (95%CI 93.21-93.43%) dice score on a large-scale dataset (EchoNet-Dynamic) while being more efficient. To show the effectiveness of our approach, we provide extensive ablation studies, including pre-training settings and various deep learning backbones. Additionally, we discuss how our proposed methodology achieves high data utility by incorporating unlabeled frames in the training process. To help support the AI in medicine community, the complete solution with the source code will be made publicly available upon acceptance.
摘要
超声心动图已成为评估心脏健康不可或缺的临床影像手段。从计算射血分数等生物标志物到评估患者心力衰竭的概率,对心脏及其结构的准确分割能帮助医生更精确地规划和执行治疗。然而,由于多种原因,实现准确且鲁棒的左心室(LV)分割耗时且具有挑战性。本工作提出了一种从稀疏标注的超声心动图视频中获得一致LV分割的新方法。我们通过(1)基于时间掩蔽的自监督学习(SSL)和(2)弱监督训练来实现这一目标,并考察了两种不同的分割方案:3D分割和一种新颖的2D超图(SI)。我们展示了所提方法在大规模数据集(EchoNet-Dynamic)上取得93.32%(95%CI 93.21-93.43%)的Dice分数,优于最先进的解决方案且更为高效。为证明方法的有效性,我们提供了广泛的消融研究,包括预训练设置和多种深度学习骨干网络。此外,我们还讨论了该方法如何通过在训练过程中纳入无标注帧来实现高数据利用率。为支持医学AI社区,完整的解决方案和源代码将在论文接收后公开。
On the Role of Neural Collapse in Meta Learning Models for Few-shot Learning
for: 本论文探讨了面向少样本学习的元学习框架,以及这些框架学到的表示在新类别上的泛化性。
methods: 在少样本设定下,于Omniglot数据集上对元学习框架学到的特征进行研究。
results: 研究发现,随着模型规模增大,学到的特征呈现出神经坍缩的趋势,但不一定完全满足神经坍缩的全部性质。Abstract
Meta-learning frameworks for few-shot learning aim to learn models that can learn new skills or adapt to new environments rapidly with a few training examples. This has led to the generalizability of the developed models towards new classes with just a few labelled samples. However, these networks are seen as black-box models, and understanding the representations learnt under different learning scenarios is crucial. Neural collapse ($\mathcal{NC}$) is a recently discovered phenomenon which showcases unique properties as the network proceeds towards zero loss. The input features collapse to their respective class means, the class means form a Simplex equiangular tight frame (ETF) where the class means are maximally distant and linearly separable, and the classifier acts as a simple nearest neighbor classifier. While these phenomena have been observed in simple classification networks, this study is the first to explore and understand the properties of neural collapse in meta-learning frameworks for few-shot learning. We perform studies on the Omniglot dataset in the few-shot setting and study the neural collapse phenomenon. We observe that the learnt features indeed have the trend of neural collapse, especially as model size grows, but they do not necessarily showcase the complete collapse as measured by the $\mathcal{NC}$ properties.
摘要
面向少样本学习的元学习框架旨在学习能够凭借少量训练样本快速掌握新技能或适应新环境的模型,从而使所得模型仅凭少量标注样本就能泛化到新类别。然而,这些网络通常被视为黑盒模型,理解其在不同学习场景下学到的表示至关重要。神经坍缩(NC)是最近发现的现象,它展示了网络趋近零损失时的独特性质:输入特征坍缩到各自的类均值;类均值构成单纯形等角紧框架(Simplex ETF),彼此距离最大且线性可分;分类器退化为简单的最近邻分类器。虽然这些现象已在简单分类网络中被观察到,但本研究首次在面向少样本学习的元学习框架中探索和理解神经坍缩的性质。我们在少样本设定下对Omniglot数据集进行研究,观察到学到的特征确实呈现神经坍缩的趋势,尤其是随着模型规模增大,但不一定表现出以NC性质度量的完全坍缩。
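The NC properties the study measures can be probed directly on penultimate-layer features: within-class variability shrinking relative to between-class scatter (NC1), and the centered class means approaching equal pairwise angles with cosine -1/(K-1) (NC2, the simplex-ETF geometry). A sketch using a simplified trace-ratio proxy for NC1, not the paper's exact estimators:

```python
import numpy as np

def nc_metrics(feats, labels):
    """Neural-collapse diagnostics on penultimate-layer features.

    Returns (nc1, cos_mean, cos_std):
      nc1      -- within-/between-class scatter ratio (trace proxy); -> 0 under NC1
      cos_mean -- mean pairwise cosine of centered class means; -> -1/(K-1) under NC2
      cos_std  -- spread of those cosines; -> 0 under NC2 (equiangularity)
    """
    classes = np.unique(labels)
    mu_g = feats.mean(axis=0)
    means = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    centered = means - mu_g

    s_w = sum(((feats[labels == c] - means[i]) ** 2).sum()
              for i, c in enumerate(classes)) / len(feats)
    s_b = (centered ** 2).sum() / len(classes)

    unit = centered / np.linalg.norm(centered, axis=1, keepdims=True)
    cos = unit @ unit.T
    off = cos[~np.eye(len(classes), dtype=bool)]
    return s_w / s_b, off.mean(), off.std()

# Toy check: 5 well-separated Gaussian classes in 64-D collapse strongly.
rng = np.random.default_rng(0)
K, n, d = 5, 200, 64
centers = 5.0 * rng.standard_normal((K, d))
feats = np.concatenate([centers[k] + 0.1 * rng.standard_normal((n, d)) for k in range(K)])
labels = np.repeat(np.arange(K), n)
print(nc_metrics(feats, labels))  # tiny NC1; an ETF would give cosine -1/(K-1) = -0.25
```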
results: 研究人员仅用记号笔在图像上绘制线条,即可对YOLO模型实现高达81.8%的攻击成功率。此外,研究人员还进行了数字和物理世界的广泛测试,并证明该方法可以由未经训练的人员应用。Abstract
Visual adversarial examples have so far been restricted to pixel-level image manipulations in the digital world, or have required sophisticated equipment such as 2D or 3D printers to be produced in the physical real world. We present the first ever method of generating human-producible adversarial examples for the real world that requires nothing more complicated than a marker pen. We call them $\textbf{adversarial tags}$. First, building on top of differential rendering, we demonstrate that it is possible to build potent adversarial examples with just lines. We find that by drawing just $4$ lines we can disrupt a YOLO-based model in $54.8\%$ of cases; increasing this to $9$ lines disrupts $81.8\%$ of the cases tested. Next, we devise an improved method for line placement to be invariant to human drawing error. We evaluate our system thoroughly in both digital and analogue worlds and demonstrate that our tags can be applied by untrained humans. We demonstrate the effectiveness of our method for producing real-world adversarial examples by conducting a user study where participants were asked to draw over printed images using digital equivalents as guides. We further evaluate the effectiveness of both targeted and untargeted attacks, and discuss various trade-offs and method limitations, as well as the practical and ethical implications of our work. The source code will be released publicly.
摘要
视觉对抗样本迄今为止要么局限于数字世界中像素级的图像操纵,要么需要2D或3D打印机等复杂设备才能在物理世界中制作。我们提出了首个只需一支记号笔即可由人手工制作的真实世界对抗样本生成方法,并将其称为"对抗涂鸦"(adversarial tags)。首先,基于可微渲染,我们证明仅用线条就能构造强力的对抗样本:仅绘制4条线即可在54.8%的情形下扰乱基于YOLO的模型,增加到9条线则可扰乱81.8%的测试情形。其次,我们设计了一种改进的线条放置方法,使其对人手绘制误差保持不变性。我们在数字和模拟世界中对系统进行了全面评估,证明未经训练的人也能应用这些涂鸦。我们通过一项用户研究(参与者以数字版本为参照,在打印图像上绘制线条)展示了该方法生成真实世界对抗样本的有效性。我们进一步评估了有目标和无目标攻击的效果,讨论了各种权衡与方法局限,以及这项工作的实践与伦理影响。源代码将公开发布。
DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models
results: 我们的方法在评测中超越了现有方法,并在用户研究中获得了更高评价。代码和数据将公开发布。Abstract
The generation of stylistic 3D facial animations driven by speech poses a significant challenge as it requires learning a many-to-many mapping between speech, style, and the corresponding natural facial motion. However, existing methods either employ a deterministic model for speech-to-motion mapping or encode the style using a one-hot encoding scheme. Notably, the one-hot encoding approach fails to capture the complexity of the style and thus limits generalization ability. In this paper, we propose DiffPoseTalk, a generative framework based on the diffusion model combined with a style encoder that extracts style embeddings from short reference videos. During inference, we employ classifier-free guidance to guide the generation process based on the speech and style. We extend this to include the generation of head poses, thereby enhancing user perception. Additionally, we address the shortage of scanned 3D talking face data by training our model on reconstructed 3DMM parameters from a high-quality, in-the-wild audio-visual dataset. Our extensive experiments and user study demonstrate that our approach outperforms state-of-the-art methods. The code and dataset will be made publicly available.
摘要
语音驱动的风格化3D面部动画生成面临一项重大挑战,因为它需要学习语音、风格与对应的自然面部运动之间多对多的映射关系。然而,现有方法要么采用确定性的语音到运动映射模型,要么使用独热(one-hot)编码来表示风格。独热编码无法捕捉风格的复杂性,从而限制了泛化能力。在本文中,我们提出DiffPoseTalk,一种基于扩散模型并结合风格编码器的生成框架,风格编码器从简短的参考视频中提取风格嵌入。在推理过程中,我们采用无分类器引导(classifier-free guidance),根据语音和风格来引导生成过程。我们还将其扩展到头部姿态的生成,从而提升用户感知。此外,我们通过在高质量的真实场景视听数据集上重建3DMM参数来训练模型,以缓解3D说话人脸扫描数据的短缺。广泛的实验和用户研究表明,我们的方法优于现有最先进方法。代码和数据集将公开发布。
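The classifier-free guidance the method samples with follows the standard combination rule: run the denoiser with and without the condition and extrapolate. It is sketched below for a single condition; how the paper weights the separate speech and style conditions is not spelled out here, and `model` is an assumed denoiser trained with condition dropout.

```python
def cfg_eps(model, x_t, t, cond, w=2.0):
    """Classifier-free guidance at one denoising step:
        eps = eps_uncond + w * (eps_cond - eps_uncond)

    `model(x_t, t, cond)` is an assumed noise predictor trained with
    condition dropout, so cond=None yields the unconditional prediction.
    w=1 recovers plain conditional sampling; larger w trades diversity
    for adherence to the speech/style condition.
    """
    eps_cond = model(x_t, t, cond)
    eps_uncond = model(x_t, t, None)
    return eps_uncond + w * (eps_cond - eps_uncond)
```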
Technical Report of 2023 ABO Fine-grained Semantic Segmentation Competition
results: 我们的模型在2023年ICCV 3DVeComm Workshop Challenge的Dev阶段取得了第3名。Abstract
In this report, we describe the technical details of our submission to the 2023 ABO Fine-grained Semantic Segmentation Competition, by Team "Zeyu\_Dong" (username: ZeyuDong). The task is to predict the semantic labels for the convex shapes of five categories, which consist of high-quality, standardized 3D models of real products available for purchase online. Using DGCNN as the backbone to classify the different structures of the five classes, we carried out numerous experiments and found that stochastic gradient descent with warm restarts for the learning rate schedule, together with setting different rate factors for the various categories, contributed most to the performance of the model. This approach helped us rank 3rd place in the Dev phase of the 2023 ICCV 3DVeComm Workshop Challenge.
摘要
在本报告中,我们介绍了Team "Zeyu\_Dong"(用户名:ZeyuDong)提交2023年ABO细粒度语义分割比赛的技术细节。任务是为五个类别的凸形预测语义标签,这五个类别由可在线购买的真实产品的高质量、标准化3D模型组成。我们使用DGCNN作为骨干对五个类别的不同结构进行分类,并进行了大量实验,发现带热重启的随机梯度下降学习率调度以及为不同类别设置不同的比例因子对模型性能贡献最大。这一方法帮助我们在2023年ICCV 3DVeComm Workshop Challenge的Dev阶段取得第三名。
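The ingredient the report credits most, SGD with warm restarts plus per-category rate factors, maps naturally onto PyTorch's built-in scheduler and parameter groups. A minimal sketch; the stand-in model, the group split, and all numbers are illustrative assumptions, not the team's settings.

```python
from torch import nn, optim

model = nn.Linear(1024, 5)   # stand-in for the DGCNN backbone + 5-class head

# Parameter groups with different base LRs stand in for per-category factors.
optimizer = optim.SGD(
    [{"params": [model.weight], "lr": 0.1},
     {"params": [model.bias], "lr": 0.01}],
    momentum=0.9,
)
# Cosine annealing with warm restarts: first restart after T_0 epochs,
# each subsequent period T_mult times longer.
scheduler = optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10, T_mult=2)

for epoch in range(30):
    # ... one training epoch (forward, loss, backward, optimizer.step()) ...
    scheduler.step()
```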
PixArt-$α$: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
results: 这篇论文显示PIXART-$\alpha$的训练速度明显快于现有的大规模T2I模型,例如PIXART-$\alpha$只需Stable Diffusion v1.5训练时间的10.8%(675 vs. 6,250 A100 GPU天),节省约$300,000($26,000 vs. $320,000)并减少90%的二氧化碳排放。此外,与更大的SOTA模型相比,我们的训练成本仅为其1%。PIXART-$\alpha$在图像质量、艺术性和语义控制方面表现出色。Abstract
The most advanced text-to-image (T2I) models require significant training costs (e.g., millions of GPU hours), seriously hindering the fundamental innovation for the AIGC community while increasing CO2 emissions. This paper introduces PIXART-$\alpha$, a Transformer-based T2I diffusion model whose image generation quality is competitive with state-of-the-art image generators (e.g., Imagen, SDXL, and even Midjourney), reaching near-commercial application standards. Additionally, it supports high-resolution image synthesis up to 1024px resolution with low training cost, as shown in Figure 1 and 2. To achieve this goal, three core designs are proposed: (1) Training strategy decomposition: We devise three distinct training steps that separately optimize pixel dependency, text-image alignment, and image aesthetic quality; (2) Efficient T2I Transformer: We incorporate cross-attention modules into Diffusion Transformer (DiT) to inject text conditions and streamline the computation-intensive class-condition branch; (3) High-informative data: We emphasize the significance of concept density in text-image pairs and leverage a large Vision-Language model to auto-label dense pseudo-captions to assist text-image alignment learning. As a result, PIXART-$\alpha$'s training speed markedly surpasses existing large-scale T2I models, e.g., PIXART-$\alpha$ only takes 10.8% of Stable Diffusion v1.5's training time (675 vs. 6,250 A100 GPU days), saving nearly \$300,000 (\$26,000 vs. \$320,000) and reducing 90% CO2 emissions. Moreover, compared with a larger SOTA model, RAPHAEL, our training cost is merely 1%. Extensive experiments demonstrate that PIXART-$\alpha$ excels in image quality, artistry, and semantic control. We hope PIXART-$\alpha$ will provide new insights to the AIGC community and startups to accelerate building their own high-quality yet low-cost generative models from scratch.
摘要
最先进的文本到图像(T2I)模型需要巨大的训练成本(例如数百万GPU小时),这严重阻碍了AIGC社区的基础创新,同时增加二氧化碳排放。本文介绍PIXART-$\alpha$,一种基于Transformer的T2I扩散模型,其图像生成质量可与最先进的图像生成器(如Imagen、SDXL乃至Midjourney)媲美,达到接近商用的水准。此外,它支持最高1024px分辨率的高分辨率图像合成,且训练成本低廉,如图1和图2所示。为实现这一目标,我们提出了三个核心设计:1. 训练策略分解:我们设计了三个独立的训练步骤,分别优化像素依赖关系、文本-图像对齐和图像美学质量;2. 高效的T2I Transformer:我们在扩散Transformer(DiT)中加入交叉注意力模块以注入文本条件,并精简计算密集的类别条件分支;3. 高信息量数据:我们强调文本-图像对中概念密度的重要性,并利用大型视觉-语言模型自动标注密集伪描述,以辅助文本-图像对齐学习。因此,PIXART-$\alpha$的训练速度明显快于现有的大规模T2I模型,例如只需Stable Diffusion v1.5训练时间的10.8%(675 vs. 6,250 A100 GPU天),节省约300,000美元(26,000 vs. 320,000美元),并减少90%的二氧化碳排放。此外,与更大的SOTA模型RAPHAEL相比,我们的训练成本仅为1%。广泛的实验表明,PIXART-$\alpha$在图像质量、艺术性和语义控制方面表现优异。我们希望PIXART-$\alpha$能为AIGC社区和初创公司提供新的思路,帮助他们从零开始构建高质量且低成本的生成模型。
MVC: A Multi-Task Vision Transformer Network for COVID-19 Diagnosis from Chest X-ray Images
results: 实验结果显示,所提方法优于基线方法,在胸部X光图像分类和受影响区域识别两个任务上都达到了更高的准确率。Abstract
Medical image analysis using computer-based algorithms has attracted considerable attention from the research community and achieved tremendous progress in the last decade. With recent advances in computing resources and availability of large-scale medical image datasets, many deep learning models have been developed for disease diagnosis from medical images. However, existing techniques focus on sub-tasks, e.g., disease classification and identification, individually, while there is a lack of a unified framework enabling multi-task diagnosis. Inspired by the capability of Vision Transformers in both local and global representation learning, we propose in this paper a new method, namely Multi-task Vision Transformer (MVC) for simultaneously classifying chest X-ray images and identifying affected regions from the input data. Our method is built upon the Vision Transformer but extends its learning capability in a multi-task setting. We evaluated our proposed method and compared it with existing baselines on a benchmark dataset of COVID-19 chest X-ray images. Experimental results verified the superiority of the proposed method over the baselines on both the image classification and affected region identification tasks.
摘要
基于计算机算法的医学图像分析在过去十年间吸引了研究界的广泛关注,并取得了巨大进步。随着计算资源的进步和大规模医学图像数据集的可用,许多用于从医学图像进行疾病诊断的深度学习模型相继问世。然而,现有技术分别关注疾病分类和病灶识别等子任务,缺乏支持多任务诊断的统一框架。受视觉Transformer在局部和全局表示学习方面能力的启发,我们在本文中提出一种新方法——多任务视觉Transformer(MVC),用于同时对胸部X光图像进行分类并从输入数据中识别受影响区域。我们的方法以视觉Transformer为基础,并在多任务设定下扩展其学习能力。我们在COVID-19胸部X光图像基准数据集上评估了所提方法,并与现有基线进行比较。实验结果证实,所提方法在图像分类和受影响区域识别两个任务上均优于基线。
SSIF: Learning Continuous Image Representation for Spatial-Spectral Super-Resolution
results: 在两个具有挑战性的空间-光谱超分辨率基准上进行了实验,证明SSIF在未见过的空间和光谱分辨率下均表现出色,并可将下游任务(例如土地利用分类)的性能提高1.7%-7%。Abstract
Existing digital sensors capture images at fixed spatial and spectral resolutions (e.g., RGB, multispectral, and hyperspectral images), and each combination requires bespoke machine learning models. Neural Implicit Functions partially overcome the spatial resolution challenge by representing an image in a resolution-independent way. However, they still operate at fixed, pre-defined spectral resolutions. To address this challenge, we propose Spatial-Spectral Implicit Function (SSIF), a neural implicit model that represents an image as a function of both continuous pixel coordinates in the spatial domain and continuous wavelengths in the spectral domain. We empirically demonstrate the effectiveness of SSIF on two challenging spatio-spectral super-resolution benchmarks. We observe that SSIF consistently outperforms state-of-the-art baselines even when the baselines are allowed to train separate models at each spectral resolution. We show that SSIF generalizes well to both unseen spatial resolutions and spectral resolutions. Moreover, SSIF can generate high-resolution images that improve the performance of downstream tasks (e.g., land use classification) by 1.7%-7%.
摘要
现有的数字传感器以固定的空间和光谱分辨率捕获图像(例如RGB、多光谱和高光谱图像),每种组合都需要专门的机器学习模型。神经隐式函数以与分辨率无关的方式表示图像,从而部分克服了空间分辨率的挑战,但它们仍然工作在固定的、预先定义的光谱分辨率上。为应对这一挑战,我们提出空间-光谱隐式函数(SSIF),一种把图像表示为空间域连续像素坐标与光谱域连续波长的函数的神经隐式模型。我们在两个具有挑战性的空间-光谱超分辨率基准上实证了SSIF的有效性。即使允许基线在每个光谱分辨率下分别训练模型,SSIF仍然稳定地优于最先进的基线。我们还表明SSIF能够很好地泛化到未见过的空间分辨率和光谱分辨率。此外,SSIF生成的高分辨率图像可将下游任务(例如土地利用分类)的性能提高1.7%-7%。
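The central object, an image as a continuous function of pixel position and wavelength, can be sketched as a coordinate MLP with Fourier features, queried at arbitrary (x, y, λ). The encoding and architecture below are generic assumptions of the sketch, not the paper's design.

```python
import torch
from torch import nn

class SSIFSketch(nn.Module):
    """Toy spatial-spectral implicit function: maps a continuous pixel
    coordinate (x, y) and a wavelength lambda to a radiance value."""

    def __init__(self, hidden=256, n_freq=8):
        super().__init__()
        self.register_buffer(
            "freqs", (2.0 ** torch.arange(n_freq, dtype=torch.float32)) * torch.pi)
        self.net = nn.Sequential(
            nn.Linear(3 * 2 * n_freq, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, coords):                 # coords: (N, 3) in [0, 1]
        ang = coords[..., None] * self.freqs   # (N, 3, n_freq) Fourier features
        feats = torch.cat([ang.sin(), ang.cos()], dim=-1).flatten(1)
        return self.net(feats)

# The same network answers queries at any spatial grid and any wavelength set.
model = SSIFSketch()
xy = torch.rand(4096, 2)                   # continuous pixel coordinates
lam = torch.full((4096, 1), 0.55)          # e.g. 550 nm, normalized to [0, 1]
vals = model(torch.cat([xy, lam], dim=1))  # -> (4096, 1)
```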
Controlling Neural Style Transfer with Deep Reinforcement Learning
results: 实验结果表明,我们基于RL的方法更加有效和鲁棒,并且比现有的一步式DL模型具有更低的计算复杂度和更轻量的计算负担。Abstract
Controlling the degree of stylization in Neural Style Transfer (NST) is tricky since it usually requires hand-engineering of hyper-parameters. In this paper, we propose the first deep Reinforcement Learning (RL) based architecture that splits one-step style transfer into a step-wise process for the NST task. Our RL-based method tends to preserve more details and structures of the content image in early steps, and synthesize more style patterns in later steps, making the degree of stylization easy for users to control. Additionally, as our RL-based model performs the stylization progressively, it is lightweight and has lower computational complexity than existing one-step Deep Learning (DL) based models. Experimental results demonstrate the effectiveness and robustness of our method.
摘要
控制神经风格迁移(NST)的风格化程度比较困难,通常需要手工调整超参数。在本文中,我们提出了首个基于深度强化学习(RL)的架构,将一步式风格迁移分解为NST任务中的逐步过程。我们基于RL的方法在早期步骤中保留更多内容图像的细节和结构,并在后期步骤中合成更多的风格模式,是一种便于用户控制的风格迁移方法。此外,由于我们的RL模型以渐进方式完成风格化,它较为轻量,计算复杂度低于现有的一步式DL模型。实验结果表明了我们方法的有效性和鲁棒性。
MonoGAE: Roadside Monocular 3D Object Detection with Ground-Aware Embeddings
paper_authors: Lei Yang, Jiaxin Yu, Xinyu Zhang, Jun Li, Li Wang, Yi Huang, Chuang Zhang, Hong Wang, Yiming Li
for: 提高路边摄像头的自动驾驶系统能力
methods: 利用智能路侧摄像头,通过学习高维地面感知嵌入来提高物体检测精度
results: 取得了比此前方法更高的3D物体检测性能,可帮助路侧摄像头扩展自动驾驶系统的感知能力。Abstract
Although the majority of recent autonomous driving systems concentrate on developing perception methods based on ego-vehicle sensors, there is an overlooked alternative approach that involves leveraging intelligent roadside cameras to help extend the ego-vehicle perception ability beyond the visual range. We discover that most existing monocular 3D object detectors rely on the ego-vehicle prior assumption that the optical axis of the camera is parallel to the ground. However, the roadside camera is installed on a pole with a pitched angle, which makes the existing methods not optimal for roadside scenes. In this paper, we introduce a novel framework for Roadside Monocular 3D object detection with ground-aware embeddings, named MonoGAE. Specifically, the ground plane is a stable and strong prior knowledge due to the fixed installation of cameras in roadside scenarios. In order to reduce the domain gap between the ground geometry information and high-dimensional image features, we employ a supervised training paradigm with a ground plane to predict high-dimensional ground-aware embeddings. These embeddings are subsequently integrated with image features through cross-attention mechanisms. Furthermore, to improve the detector's robustness to the divergences in cameras' installation poses, we replace the ground plane depth map with a novel pixel-level refined ground plane equation map. Our approach demonstrates a substantial performance advantage over all previous monocular 3D object detectors on widely recognized 3D detection benchmarks for roadside cameras. The code and pre-trained models will be released soon.
摘要
尽管当前大多数自动驾驶系统专注于开发基于自车传感器的感知方法,但还有一种被忽视的替代思路:利用智能路侧摄像头将自车的感知能力扩展到视觉范围之外。我们发现,大多数现有的单目3D物体检测器依赖于自车场景的先验假设,即摄像头光轴与地面平行。然而,路侧摄像头安装在带有俯仰角的杆上,这使得现有方法并不适用于路侧场景。在本文中,我们提出一种名为MonoGAE的新框架,用于具有地面感知嵌入的路侧单目3D物体检测。具体而言,由于路侧场景中摄像头是固定安装的,地面是一种稳定且强大的先验知识。为了缩小地面几何信息与高维图像特征之间的领域差距,我们采用以地面平面为监督的训练范式来预测高维地面感知嵌入,随后通过交叉注意力机制将这些嵌入与图像特征融合。此外,为了提高检测器对摄像头安装姿态差异的鲁棒性,我们用一种新的像素级精细地面方程图取代地面深度图。我们的方法在广泛认可的路侧摄像头3D检测基准上显著优于此前所有单目3D物体检测器。代码和预训练模型即将发布。
InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists
results: 实验表明,该模型能够与其他通用和任务特定视觉模型相比竞争,并且具有出色的扩展性,可以在未经见过的数据、类别和用户指令下进行高效的执行。Abstract
Recent advances in generative diffusion models have enabled text-controlled synthesis of realistic and diverse images with impressive quality. Despite these remarkable advances, the application of text-to-image generative models in computer vision for standard visual recognition tasks remains limited. The current de facto approach for these tasks is to design model architectures and loss functions that are tailored to the task at hand. In this paper, we develop a unified language interface for computer vision tasks that abstracts away task-specific design choices and enables task execution by following natural language instructions. Our approach involves casting multiple computer vision tasks as text-to-image generation problems. Here, the text represents an instruction describing the task, and the resulting image is a visually-encoded task output. To train our model, we pool commonly-used computer vision datasets covering a range of tasks, including segmentation, object detection, depth estimation, and classification. We then use a large language model to paraphrase prompt templates that convey the specific tasks to be conducted on each image, and through this process, we create a multi-modal and multi-task training dataset comprising input and output images along with annotated instructions. Following the InstructPix2Pix architecture, we apply instruction-tuning to a text-to-image diffusion model using our constructed dataset, steering its functionality from a generative model to an instruction-guided multi-task vision learner. Experiments demonstrate that our model, dubbed InstructCV, performs competitively compared to other generalist and task-specific vision models. Moreover, it exhibits compelling generalization capabilities to unseen data, categories, and user instructions.
摘要
生成扩散模型的最新进展使得以文本控制合成质量出众、逼真且多样的图像成为可能。尽管进展显著,文本到图像生成模型在计算机视觉标准视觉识别任务中的应用仍然有限。目前这些任务的事实标准做法是针对具体任务设计模型结构和损失函数。在本文中,我们为计算机视觉任务开发了一个统一的自然语言接口,它抽象掉任务特定的设计选择,使任务可以通过遵循自然语言指令来执行。我们的方法是将多种计算机视觉任务转化为文本到图像生成问题:文本表示描述任务的指令,生成的图像则是以视觉方式编码的任务输出。为了训练模型,我们汇集了覆盖分割、物体检测、深度估计和分类等一系列任务的常用计算机视觉数据集,然后使用大型语言模型改写传达每张图像上待执行任务的提示模板,由此构建了一个包含输入图像、输出图像及标注指令的多模态多任务训练数据集。按照InstructPix2Pix架构,我们使用构建的数据集对文本到图像扩散模型进行指令调优,将其功能从生成模型转变为指令引导的多任务视觉学习器。实验表明,我们的模型InstructCV与其他通用及任务特定的视觉模型相比具有竞争力,并且对未见过的数据、类别和用户指令表现出令人信服的泛化能力。
Deep Active Learning with Noisy Oracle in Object Detection
paper_authors: Marius Schubert, Tobias Riedlinger, Karsten Kahl, Matthias Rottmann
for: 提高对象检测器的性能,减少人工标注的数量和质量不良的影响。
methods: 使用活动学习算法和标签审核模块,对活动数据中的噪声标注进行纠正,提高模型性能。
results: 在实验中,通过包括标签审核模块,使用部分标注预算来纠正噪声标注,提高对象检测器的性能,最高提升4.5个mAP点。Abstract
Obtaining annotations for complex computer vision tasks such as object detection is an expensive and time-intensive endeavor involving a large number of human workers or expert opinions. Reducing the amount of annotations required while maintaining algorithm performance is, therefore, desirable for machine learning practitioners and has been successfully achieved by active learning algorithms. However, it is not merely the amount of annotations which influences model performance but also the annotation quality. In practice, the oracles that are queried for new annotations frequently contain significant amounts of noise. Therefore, cleansing procedures are oftentimes necessary to review and correct given labels. This process is subject to the same budget as the initial annotation itself since it requires human workers or even domain experts. Here, we propose a composite active learning framework including a label review module for deep object detection. We show that utilizing part of the annotation budget to correct noisy annotations in the active dataset leads to early improvements in model performance, especially when coupled with uncertainty-based query strategies. The precision of the label error proposals has a significant influence on the measured effect of the label review. In our experiments we achieve improvements of up to 4.5 mAP points of object detection performance by incorporating label reviews at equal annotation budget.
摘要
为物体检测等复杂计算机视觉任务获取标注是一项昂贵且耗时的工作,需要大量人工或专家意见。因此,在保持算法性能的同时减少所需标注数量是机器学习从业者所期望的,而主动学习算法已成功实现了这一点。然而,影响模型性能的不仅是标注数量,还有标注质量。在实践中,被查询以获取新标注的oracle往往包含大量噪声,因此常常需要清洗流程来审核并更正给定的标签。由于同样需要人工甚至领域专家,这一过程与初始标注共享同一预算。我们提出一种复合主动学习框架,在深度物体检测中加入标签审核模块。我们表明,将部分标注预算用于更正主动数据集中的噪声标注能够较早地提升模型性能,尤其是与基于不确定性的查询策略结合时。标签错误提案的精度对标签审核的实测效果有显著影响。在实验中,在相同标注预算下引入标签审核使物体检测性能最多提高了4.5个mAP点。
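The budgeting idea, spending part of each round on reviewing suspect labels rather than only on new queries, fits in a few lines. All four callables below are assumed placeholders for the paper's components (uncertainty scoring, label-error proposals, and the human oracles), not its actual implementation.

```python
def active_round(unlabeled, labeled, budget, review_frac,
                 uncertainty, label_error_score, query_oracle, review_oracle):
    """One acquisition round that splits the annotation budget between
    querying new labels (uncertainty sampling) and reviewing suspect ones.

    All four callables are placeholders: the score functions return one
    float per sample, and each oracle call spends one unit of budget.
    """
    n_review = int(budget * review_frac)
    n_query = budget - n_review

    # 1) Review: re-check the labeled samples most likely to be mislabeled.
    for s in sorted(labeled, key=label_error_score, reverse=True)[:n_review]:
        review_oracle(s)                    # corrects the label if it was noisy

    # 2) Query: annotate the most uncertain unlabeled samples.
    for p in sorted(unlabeled, key=uncertainty, reverse=True)[:n_query]:
        labeled.append(query_oracle(p))
        unlabeled.remove(p)
    return labeled, unlabeled
```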
Distilling Inductive Bias: Knowledge Distillation Beyond Model Compression
results: 通过由具有不同架构倾向的轻量级教师模型组成的集成进行联合教学,学生模型能够积累广泛的知识,从而提升性能;预先计算并存储教师logits还可避免蒸馏过程中的重复前向传播。Abstract
With the rapid development of computer vision, Vision Transformers (ViTs) offer the tantalizing prospect of unified information processing across visual and textual domains. But due to the lack of inherent inductive biases in ViTs, they require enormous amounts of data for training. To make their applications practical, we introduce an innovative ensemble-based distillation approach that distills inductive bias from complementary lightweight teacher models. Prior systems relied solely on convolution-based teaching. In contrast, our method incorporates an ensemble of light teachers with different architectural tendencies, such as convolution and involution, to instruct the student transformer jointly. Because of these unique inductive biases, the instructors can accumulate a wide range of knowledge, even from readily available standard datasets, which leads to enhanced student performance. Our proposed framework also involves precomputing and storing logits in advance, essentially the unnormalized predictions of the model. This optimization accelerates the distillation process by eliminating the need for repeated forward passes during knowledge distillation, significantly reducing the computational burden and enhancing efficiency.
摘要
随着计算机视觉的快速发展,视觉Transformer(ViT)展现出统一处理视觉与文本领域信息的诱人前景。然而,由于ViT缺乏固有的归纳偏置,它们的训练需要海量数据。为了使其应用可行,我们提出了一种创新的基于集成的蒸馏方法,从互补的轻量级教师模型中蒸馏归纳偏置。此前的系统仅依赖基于卷积的教学,而我们的方法结合了具有不同架构倾向(如卷积和内卷)的轻量教师集成,共同指导学生Transformer。得益于这些独特的归纳偏置,教师可以积累广泛的知识,从而提升学生性能。我们提出的框架还包括预先计算并存储logits(即模型的未归一化预测)。这一优化免除了知识蒸馏过程中的重复前向传播,显著降低计算负担并提高效率。
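The efficiency trick is to cache the ensemble teachers' logits once so distillation never re-runs a teacher. Below is a sketch of that caching plus a conventional KD loss; the temperature, the loss weighting, and averaging the teachers' logits are common defaults I assume here, not necessarily the paper's choices.

```python
import torch
import torch.nn.functional as F

def cache_teacher_logits(teachers, loader, path="teacher_logits.pt"):
    """Run each frozen teacher over the dataset once, average the logits,
    and store them, so training never repeats a teacher forward pass."""
    cached = []
    with torch.no_grad():
        for x, _ in loader:
            cached.append(torch.stack([t(x) for t in teachers]).mean(dim=0))
    torch.save(torch.cat(cached), path)

def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.7):
    """Distillation against cached ensemble logits plus hard-label CE."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard
```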
Diffusion Posterior Illumination for Ambiguity-aware Inverse Rendering
results: 可以从多种照明和空间分布式表面材料中采样,并且能够生成高质量、多样化的环境地图样本,并准确地反映输入图像的照明情况。Abstract
Inverse rendering, the process of inferring scene properties from images, is a challenging inverse problem. The task is ill-posed, as many different scene configurations can give rise to the same image. Most existing solutions incorporate priors into the inverse-rendering pipeline to encourage plausible solutions, but they do not consider the inherent ambiguities and the multi-modal distribution of possible decompositions. In this work, we propose a novel scheme that integrates a denoising diffusion probabilistic model pre-trained on natural illumination maps into an optimization framework involving a differentiable path tracer. The proposed method allows sampling from combinations of illumination and spatially-varying surface materials that are, both, natural and explain the image observations. We further conduct an extensive comparative study of different priors on illumination used in previous work on inverse rendering. Our method excels in recovering materials and producing highly realistic and diverse environment map samples that faithfully explain the illumination of the input images.
摘要
逆渲染(inverse rendering),即从图像推断场景属性的过程,是一个具有挑战性的反问题。该任务是不适定的,因为许多不同的场景配置都可以产生同一张图像。现有解决方案大多在逆渲染管线中引入先验以鼓励合理的解,但它们没有考虑固有的歧义性以及可能分解的多峰分布。在这项工作中,我们提出一种新方案,将在自然光照贴图上预训练的去噪扩散概率模型整合到包含可微路径追踪器的优化框架中。所提方法允许从既自然、又能解释图像观测的光照与空间变化表面材质的组合中采样。我们还对此前逆渲染工作中使用的各种光照先验进行了广泛的对比研究。我们的方法在恢复材质以及生成高度逼真、多样且忠实解释输入图像光照的环境贴图样本方面表现出色。
Improving Cross-dataset Deepfake Detection with Deep Information Decomposition
results: 实验结果显示,该框架在不同测试数据集中具有更高的检测精度和普遍能力,并且能够对不同的伪造方法进行更好的检测。Abstract
Deepfake technology poses a significant threat to security and social trust. Although existing detection methods have demonstrated high performance in identifying forgeries within datasets using the same techniques for training and testing, they suffer from sharp performance degradation when faced with cross-dataset scenarios where unseen deepfake techniques are tested. To address this challenge, we propose a deep information decomposition (DID) framework in this paper. Unlike most existing deepfake detection methods, our framework prioritizes high-level semantic features over visual artifacts. Specifically, it decomposes facial features into deepfake-related and irrelevant information and optimizes the deepfake information for real/fake discrimination to be independent of other factors. Our approach improves the robustness of deepfake detection against various irrelevant information changes and enhances the generalization ability of the framework to detect unseen forgery methods. Extensive experimental comparisons with existing state-of-the-art detection methods validate the effectiveness and superiority of the DID framework on cross-dataset deepfake detection.
摘要
深度伪造(deepfake)技术对安全和社会信任构成重大威胁。尽管现有检测方法在训练与测试采用相同伪造技术的数据集内识别伪造时表现出色,但在面对跨数据集场景(即测试未见过的深度伪造技术)时性能急剧下降。为应对这一挑战,本文提出深度信息分解(DID)框架。与大多数现有深度伪造检测方法不同,我们的框架优先考虑高层语义特征而非视觉伪影。具体而言,它将面部特征分解为与深度伪造相关的信息和无关信息,并优化深度伪造信息,使真/假判别独立于其他因素。该方法提高了深度伪造检测对各种无关信息变化的鲁棒性,并增强了框架检测未见伪造方法的泛化能力。与现有最先进检测方法的大量实验比较验证了DID框架在跨数据集深度伪造检测上的有效性和优越性。
Structural Adversarial Objectives for Self-Supervised Representation Learning
results: 实验表明,通过本文提出的自监督目标,GAN可以学习到高质量的表示,与对比学习方法相当。Abstract
Within the framework of generative adversarial networks (GANs), we propose objectives that task the discriminator for self-supervised representation learning via additional structural modeling responsibilities. In combination with an efficient smoothness regularizer imposed on the network, these objectives guide the discriminator to learn to extract informative representations, while maintaining a generator capable of sampling from the domain. Specifically, our objectives encourage the discriminator to structure features at two levels of granularity: aligning distribution characteristics, such as mean and variance, at coarse scales, and grouping features into local clusters at finer scales. Operating as a feature learner within the GAN framework frees our self-supervised system from the reliance on hand-crafted data augmentation schemes that are prevalent across contrastive representation learning methods. Across CIFAR-10/100 and an ImageNet subset, experiments demonstrate that equipping GANs with our self-supervised objectives suffices to produce discriminators which, evaluated in terms of representation learning, compete with networks trained by contrastive learning approaches.
摘要
在生成对抗网络(GAN)框架内,我们提出了一些目标函数,通过附加的结构建模职责让判别器进行自监督表示学习。这些目标与施加在网络上的高效平滑正则化相结合,引导判别器学习提取有信息量的表示,同时保持生成器能够从数据域中采样。具体而言,我们的目标促使判别器在两个粒度层面上组织特征:在粗粒度上对齐均值和方差等分布特性,在细粒度上将特征聚成局部簇。作为GAN框架内的特征学习器,我们的自监督系统摆脱了对比表示学习方法中普遍存在的手工数据增强方案的依赖。在CIFAR-10/100和一个ImageNet子集上的实验表明,为GAN配备我们的自监督目标,便足以得到在表示学习评估中与对比学习方法训练的网络相竞争的判别器。
RBF Weighted Hyper-Involution for RGB-D Object Detection
paper_authors: Mehfuz A Rahman, Jiju Peethambaran, Neil London
for: 实时RGBD物体检测模型
methods: 提出一种根据原始深度图动态自适应的深度引导超内卷(hyper-involution),以及一种基于上采样的可训练融合层
results: 在NYU Depth v2数据集上优于其他基于RGB-D的物体检测模型,在SUN RGB-D上取得相当(第二好)的结果,并在新的室外RGB-D物体检测数据集上优于其他模型。Abstract
A vast majority of conventional augmented reality devices are equipped with depth sensors. Depth images produced by such sensors contain complementary information for object detection when used with color images. Despite the benefits, it remains a complex task to simultaneously extract photometric and depth features in real time due to the inherent difference between depth and color images. Moreover, standard convolution operations are not sufficient to properly extract information directly from raw depth images, leading to inefficient intermediate representations of depth. To address these issues, we propose a real-time and two-stream RGBD object detection model. The proposed model consists of two new components: a depth guided hyper-involution that adapts dynamically based on the spatial interaction pattern in the raw depth map and an up-sampling based trainable fusion layer that combines the extracted depth and color image features without blocking the information transfer between them. We show that the proposed model outperforms other RGB-D based object detection models on NYU Depth v2 dataset and achieves comparable (second best) results on SUN RGB-D. Additionally, we introduce a new outdoor RGB-D object detection dataset where our proposed model outperforms other models. The performance evaluation on diverse synthetic data generated from CAD models and images shows the potential of the proposed model to be adapted to augmented reality based applications.
摘要
绝大多数传统增强现实设备都配备深度传感器。此类传感器生成的深度图像与彩色图像结合使用时,可为物体检测提供互补信息。尽管有这些优势,由于深度图像与彩色图像之间的固有差异,实时同时提取光度特征和深度特征仍是一项复杂任务。此外,标准卷积操作不足以直接从原始深度图像中恰当地提取信息,导致低效的深度中间表示。为解决这些问题,我们提出了一种实时双流RGB-D物体检测模型。该模型包含两个新组件:一个根据原始深度图中空间交互模式动态自适应的深度引导超内卷,以及一个基于上采样的可训练融合层,可在不阻断信息传递的情况下融合提取到的深度与彩色图像特征。实验表明,所提模型在NYU Depth v2数据集上优于其他基于RGB-D的物体检测模型,并在SUN RGB-D上取得相当(第二好)的结果。此外,我们还引入了一个新的室外RGB-D物体检测数据集,所提模型在其上优于其他模型。在由CAD模型和图像生成的多种合成数据上的性能评估显示,所提模型具有适配增强现实应用的潜力。
MFL Data Preprocessing and CNN-based Oil Pipeline Defects Detection
results: 该论文通过使用实际数据进行验证,并达到了高度的性能水平,以增强石油管道异常检测的精度和效果。Abstract
Recently, the application of computer vision for anomaly detection has been attracting attention in several industrial fields. An important example is oil pipeline defect detection. Failure of one oil pipeline can interrupt the operation of the entire transportation system or cause a far-reaching failure. Automated defect detection could significantly decrease the inspection time and the related costs. However, there is a gap in the related literature when it comes to dealing with this task: the existing studies do not sufficiently cover Magnetic Flux Leakage (MFL) data and the preprocessing techniques that allow overcoming the limitations set by the available data. This work focuses on alleviating these issues. Moreover, in doing so, we exploited recent convolutional neural network structures and proposed robust approaches, aiming to acquire high performance considering the related metrics. The proposed approaches and their applicability were verified using real-world data.
摘要
近来,计算机视觉在多个工业领域中的异常检测应用引起了关注,其中一个重要例子是油管缺陷检测。一条油管的失效可能中断整个输送系统的运行,或引发波及广泛的故障,而自动缺陷检测可以显著减少检测时间和相关成本。然而,相关文献在处理该任务时存在空白:现有研究未充分涵盖磁通泄漏(MFL)数据以及能够克服现有数据限制的预处理技术。本工作致力于缓解这些问题。此外,我们还利用了最新的卷积神经网络结构,提出了鲁棒的方法,力求在相关指标上取得高性能。我们的方法及其适用性已通过真实数据验证。
Decoding Realistic Images from Brain Activity with Contrastive Self-supervision and Latent Diffusion
results: Experiments show that CnD reconstructs highly plausible images on challenging benchmarks and provides a quantitative interpretation of the connection between latent diffusion model (LDM) components and the human visual system.
Abstract
Reconstructing visual stimuli from human brain activities provides a promising opportunity to advance our understanding of the brain's visual system and its connection with computer vision models. Although deep generative models have been employed for this task, the challenge of generating high-quality images with accurate semantics persists due to the intricate underlying representations of brain signals and the limited availability of parallel data. In this paper, we propose a two-phase framework named Contrast and Diffuse (CnD) to decode realistic images from functional magnetic resonance imaging (fMRI) recordings. In the first phase, we acquire representations of fMRI data through self-supervised contrastive learning. In the second phase, the encoded fMRI representations condition the diffusion model to reconstruct visual stimulus through our proposed concept-aware conditioning method. Experimental results show that CnD reconstructs highly plausible images on challenging benchmarks. We also provide a quantitative interpretation of the connection between the latent diffusion model (LDM) components and the human brain's visual system. In summary, we present an effective approach for reconstructing visual stimuli based on human brain activity and offer a novel framework to understand the relationship between the diffusion model and the human brain visual system.
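The first phase relies on self-supervised contrastive learning over fMRI data; a standard InfoNCE objective of the kind such pipelines typically use is sketched below. The pairing of two encoded views of the same sample is an assumption about the setup, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """InfoNCE between two augmented views of the same fMRI batch (sketch).

    z1, z2: (B, D) encoder outputs; matching rows are positive pairs.
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / tau                                  # (B, B) similarities
    targets = torch.arange(z1.size(0), device=z1.device)      # positives on the diagonal
    return F.cross_entropy(logits, targets)
```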
An easy zero-shot learning combination: Texture Sensitive Semantic Segmentation IceHrNet and Advanced Style Transfer Learning Strategy
results: Experiments show that IceHrNet outperforms state-of-the-art methods on the texture-focused dataset IPC_RI_SEG and achieves excellent results on shape-focused river ice datasets. In zero-shot transfer learning, IceHrNet improves on other methods by 2 percentage points. Code and models are published at https://github.com/PL23K/IceHrNet.
Abstract
We propose a simple method for zero-shot semantic segmentation using style transfer. In this case, we successfully used a medical imaging dataset (blood cell imagery) to train a model for river ice semantic segmentation. First, we built a river ice semantic segmentation dataset, IPC_RI_SEG, using a fixed camera and covering the entire melting process of the river ice. Second, we propose a high-resolution texture-fusion semantic segmentation network named IceHrNet. The network uses HRNet as the backbone and adds ASPP and decoder segmentation heads to retain low-level texture features for fine semantic segmentation. Finally, we propose a simple and effective advanced style transfer learning strategy that performs zero-shot transfer learning across cross-domain semantic segmentation datasets, achieving a practical 87% mIoU for river ice semantic segmentation without any target-domain training dataset (25% mIoU with no stylization, 65% mIoU with conventional stylization; our strategy improves on the latter by 22 percentage points). Experiments showed that IceHrNet outperforms state-of-the-art methods on the texture-focused dataset IPC_RI_SEG and achieves excellent results on shape-focused river ice datasets. In zero-shot transfer learning, IceHrNet achieved an increase of 2 percentage points over other methods. Our code and model are published at https://github.com/PL23K/IceHrNet.
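Since the strategy is evaluated almost entirely in mIoU, a minimal reference implementation of mean intersection-over-union from integer label maps may help; the function name and the confusion-matrix route are illustrative choices, not code from the IceHrNet repository.

```python
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Mean IoU from integer label maps, via a class confusion matrix (sketch)."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    for p, g in zip(pred.ravel(), gt.ravel()):
        conf[g, p] += 1
    inter = np.diag(conf).astype(float)
    union = conf.sum(0) + conf.sum(1) - np.diag(conf)
    valid = union > 0                     # ignore classes absent from both maps
    return float((inter[valid] / union[valid]).mean())
```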
Dual-Augmented Transformer Network for Weakly Supervised Semantic Segmentation
results: On the challenging PASCAL VOC 2012 benchmark, the method achieves the best results, outperforming previous state-of-the-art methods.
Abstract
Weakly supervised semantic segmentation (WSSS) is a fundamental computer vision task that aims to segment objects given only class-level labels. Traditional methods adopt CNN-based networks and utilize the class activation map (CAM) strategy to discover object regions. However, such methods focus only on the most discriminative region of the object, resulting in incomplete segmentation. An alternative is to explore vision transformers (ViT) to encode the image and acquire global semantic information; yet the lack of an inductive bias toward objects is a flaw of ViT. In this paper, we explore a dual-augmented transformer network with self-regularization constraints for WSSS. Specifically, we propose a dual network with both a CNN-based and a transformer network for mutually complementary learning, where both networks augment the final output for enhancement. Extensive systematic evaluations on the challenging PASCAL VOC 2012 benchmark demonstrate the effectiveness of our method, outperforming previous state-of-the-art methods.
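For readers unfamiliar with the CAM strategy this abstract builds on, here is the textbook computation: the classifier weights for a class reweight the final convolutional feature maps to localize that class. This is the generic CAM recipe, not this paper's network.

```python
import torch

def class_activation_map(features: torch.Tensor, fc_weight: torch.Tensor, cls: int) -> torch.Tensor:
    """CAM for one class: weighted sum of the last conv feature maps.

    features:  (C, H, W) output of the final conv layer
    fc_weight: (num_classes, C) classifier weights applied after global average pooling
    """
    cam = torch.einsum("c,chw->hw", fc_weight[cls], features)
    cam = torch.relu(cam)
    return cam / (cam.max() + 1e-8)   # normalize to [0, 1] for thresholding into seeds
```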
QUIZ: An Arbitrary Volumetric Point Matching Method for Medical Image Registration
results: Experiments on a large-deformation dataset of cervical cancer patients show registration results that are more stable and accurate than existing methods; even on cross-modality subjects, the method surpasses the current state of the art.
Abstract
Rigid pre-registration involving local-global matching or other large deformation scenarios is crucial. Current popular methods rely on unsupervised learning based on grayscale similarity, but under circumstances where different poses lead to varying tissue structures, or where image quality is poor, these methods tend to exhibit instability and inaccuracies. In this study, we propose a novel method for medical image registration based on arbitrary voxel point of interest matching, called query point quizzer (QUIZ). QUIZ focuses on the correspondence between local-global matching points, specifically employing CNN for feature extraction and utilizing the Transformer architecture for global point matching queries, followed by applying average displacement for local image rigid transformation. We have validated this approach on a large deformation dataset of cervical cancer patients, with results indicating substantially smaller deviations compared to state-of-the-art methods. Remarkably, even for cross-modality subjects, it achieves results surpassing the current state-of-the-art.
Pubic Symphysis-Fetal Head Segmentation Using Pure Transformer with Bi-level Routing Attention
results: The method is evaluated on a transperineal ultrasound image dataset and achieves a comparable final score. The code will be released on GitHub.
Abstract
In this paper, we propose a method, named BRAU-Net, to solve the pubic symphysis-fetal head segmentation task. The method adopts a U-Net-like pure Transformer architecture with bi-level routing attention and skip connections, which effectively learns local-global semantic information. The proposed BRAU-Net was evaluated on the transperineal ultrasound image dataset from the pubic symphysis-fetal head segmentation and angle of progression (FH-PS-AOP) challenge. The results demonstrate that the proposed BRAU-Net achieves a comparable final score. The code will be available at https://github.com/Caipengzhou/BRAU-Net.
InFER: A Multi-Ethnic Indian Facial Expression Recognition Dataset
results: Experiments show that deep learning techniques can achieve high facial expression recognition accuracy across the multi-ethnic population of the Indian subcontinent.
Abstract
The rapid advancement of deep learning over the past decade has transformed Facial Expression Recognition (FER) systems, as newer methods have been proposed that outperform traditional handcrafted techniques. However, such a supervised learning approach requires a sufficiently large training dataset covering all possible scenarios. Since facial expressions vary with age group, gender, and ethnicity, a diverse facial expression dataset is needed. This becomes even more crucial when developing a FER system for the Indian subcontinent, which comprises a diverse multi-ethnic population. In this work, we present InFER, a real-world multi-ethnic Indian facial expression recognition dataset consisting of 10,200 images and 4,200 short videos of seven basic facial expressions. The dataset contains posed expressions from 600 human subjects and 6,000 spontaneous/acted expression images crowd-sourced from the internet. To the best of our knowledge, InFER is the first of its kind, consisting of images from 600 subjects drawn from the very diverse ethnicities of the Indian subcontinent. We also present experimental results of baseline and deep FER methods on our dataset to substantiate its usability in real-world practical applications.
Unleash Data Generation for Efficient and Effective Data-free Knowledge Distillation
results: Experiments show that NAYER not only outperforms existing methods but also runs 5 to 15 times faster than previous approaches.
Abstract
Data-Free Knowledge Distillation (DFKD) has recently made remarkable advancements with its core principle of transferring knowledge from a teacher neural network to a student neural network without requiring access to the original data. Nonetheless, existing approaches encounter a significant challenge when attempting to generate samples from random noise inputs, which inherently lack meaningful information. Consequently, these models struggle to effectively map this noise to the ground-truth sample distribution, resulting in the production of low-quality data and imposing substantial time requirements for training the generator. In this paper, we propose a novel Noisy Layer Generation method (NAYER) which relocates the randomness source from the input to a noisy layer and utilizes the meaningful label-text embedding (LTE) as the input. The significance of LTE lies in its ability to contain substantial meaningful inter-class information, enabling the generation of high-quality samples with only a few training steps. Simultaneously, the noisy layer plays a key role in addressing the issue of diversity in sample generation by preventing the model from overemphasizing the constrained label information. By reinitializing the noisy layer in each iteration, we aim to facilitate the generation of diverse samples while still retaining the method's efficiency, thanks to the ease of learning provided by LTE. Experiments carried out on multiple datasets demonstrate that our NAYER not only outperforms the state-of-the-art methods but also achieves speeds 5 to 15 times faster than previous approaches.
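A minimal sketch of the core idea, sourcing randomness from a noisy layer applied to a frozen label-text embedding rather than from the generator input, is given below. The layer shape, generator, and embedding table are placeholders; only the placement of the noise and the per-iteration reinitialization follow the abstract.

```python
import torch
import torch.nn as nn

class NoisyLayer(nn.Module):
    """Lightweight randomness source that perturbs the label-text embedding (sketch)."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, lte: torch.Tensor) -> torch.Tensor:
        return self.proj(lte) + torch.randn_like(lte)   # noise lives here, not in the input

# hypothetical usage inside one distillation round
dim, num_classes = 512, 10
lte_table = torch.randn(num_classes, dim)   # stands in for frozen label-text embeddings
generator = nn.Sequential(nn.Linear(dim, 1024), nn.ReLU(), nn.Linear(1024, 3 * 32 * 32))

noisy = NoisyLayer(dim)                     # reinitialized every iteration for diversity
labels = torch.randint(0, num_classes, (64,))
fake = generator(noisy(lte_table[labels])).view(-1, 3, 32, 32)
```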
MMPI: a Flexible Radiance Field Representation by Multiple Multi-plane Images Blending
results: Experiments show that the method synthesizes high-quality novel views across diverse camera distributions and viewpoints and trains faster than previous fast-training NeRF methods. The authors also show that it can handle extremely long trajectories and produce novel view renderings, indicating its potential for applications such as autonomous driving.
Abstract
This paper presents a flexible representation of neural radiance fields based on multi-plane images (MPI), for high-quality view synthesis of complex scenes. MPI with Normalized Device Coordinate (NDC) parameterization is widely used in NeRF learning for its simple definition, easy calculation, and powerful ability to represent unbounded scenes. However, existing NeRF works that adopt MPI representation for novel view synthesis can only handle simple forward-facing unbounded scenes, where the input cameras are all observing in similar directions with small relative translations. Hence, extending these MPI-based methods to more complex scenes like large-range or even 360-degree scenes is very challenging. In this paper, we explore the potential of MPI and show that MPI can synthesize high-quality novel views of complex scenes with diverse camera distributions and view directions, which are not only limited to simple forward-facing scenes. Our key idea is to encode the neural radiance field with multiple MPIs facing different directions and blend them with an adaptive blending operation. For each region of the scene, the blending operation gives larger blending weights to those advantaged MPIs with stronger local representation abilities while giving lower weights to those with weaker representation abilities. Such blending operation automatically modulates the multiple MPIs to appropriately represent the diverse local density and color information. Experiments on the KITTI dataset and ScanNet dataset demonstrate that our proposed MMPI synthesizes high-quality images from diverse camera pose distributions and is fast to train, outperforming the previous fast-training NeRF methods for novel view synthesis. Moreover, we show that MMPI can encode extremely long trajectories and produce novel view renderings, demonstrating its potential in applications like autonomous driving.
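The adaptive blending operation can be pictured as a per-pixel softmax over per-MPI quality scores, as sketched below; this is an illustrative reading of the abstract, with the quality logits assumed to come from some learned head.

```python
import torch

def blend_mpis(rgbs: torch.Tensor, logits: torch.Tensor) -> torch.Tensor:
    """Adaptive per-pixel blending of renderings from several MPIs (sketch).

    rgbs:   (N, 3, H, W) colors rendered from N differently oriented MPIs
    logits: (N, 1, H, W) predicted local representation quality for each MPI
    """
    weights = torch.softmax(logits, dim=0)   # larger weight to the better-suited MPI
    return (weights * rgbs).sum(dim=0)       # (3, H, W) blended image

blended = blend_mpis(torch.rand(4, 3, 64, 64), torch.randn(4, 1, 64, 64))
```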
Walking = Traversable? : Traversability Prediction via Multiple Human Object Tracking under Occlusion
results: The method achieves stable traversability predictions in visually challenging scenarios, including occlusion, nonlinear perspective, depth uncertainty, and intersections involving multiple people.
Abstract
The emerging "Floor plan from human trails (PfH)" technique has great potential for improving indoor robot navigation by predicting the traversability of occluded floors. This study presents an innovative approach that replaces first-person-view sensors with a third-person-view monocular camera mounted on the observer robot. This approach can gather measurements from multiple humans, expanding its range of applications. The key idea is to use two types of trackers, SLAM and MOT, to monitor stationary objects and moving humans and assess their interactions. This method achieves stable predictions of traversability even in challenging visual scenarios, such as occlusions, nonlinear perspectives, depth uncertainty, and intersections involving multiple humans. Additionally, we extend map quality metrics to apply to traversability maps, facilitating future research. We validate our proposed method through fusion and comparison with established techniques.
Learning Mask-aware CLIP Representations for Zero-Shot Segmentation
paper_authors: Siyu Jiao, Yunchao Wei, Yaowei Wang, Yao Zhao, Humphrey Shi
for: The paper aims to improve the performance of zero-shot segmentation methods by addressing the insensitivity of CLIP to different mask proposals.
methods: The proposed method, Mask-aware Fine-tuning (MAFT), uses an Image-Proposals CLIP Encoder (IP-CLIP Encoder) to handle arbitrary numbers of image and mask proposals simultaneously. MAFT introduces a mask-aware loss and a self-distillation loss to fine-tune the IP-CLIP Encoder, ensuring CLIP is responsive to different mask proposals while maintaining transferability.
results: With MAFT, the performance of state-of-the-art methods is promoted by a large margin on popular zero-shot benchmarks, including COCO, Pascal-VOC, and ADE20K; the mIoU for unseen classes improves by 8.2%, 3.2%, and 4.3%, respectively.
Abstract
Recently, pre-trained vision-language models have been increasingly used to tackle the challenging zero-shot segmentation task. Typical solutions follow the paradigm of first generating mask proposals and then adopting CLIP to classify them. To maintain the CLIP's zero-shot transferability, previous practices favour to freeze CLIP during training. However, in the paper, we reveal that CLIP is insensitive to different mask proposals and tends to produce similar predictions for various mask proposals of the same image. This insensitivity results in numerous false positives when classifying mask proposals. This issue mainly relates to the fact that CLIP is trained with image-level supervision. To alleviate this issue, we propose a simple yet effective method, named Mask-aware Fine-tuning (MAFT). Specifically, Image-Proposals CLIP Encoder (IP-CLIP Encoder) is proposed to handle arbitrary numbers of image and mask proposals simultaneously. Then, mask-aware loss and self-distillation loss are designed to fine-tune IP-CLIP Encoder, ensuring CLIP is responsive to different mask proposals while not sacrificing transferability. In this way, mask-aware representations can be easily learned to make the true positives stand out. Notably, our solution can seamlessly plug into most existing methods without introducing any new parameters during the fine-tuning process. We conduct extensive experiments on the popular zero-shot benchmarks. With MAFT, the performance of the state-of-the-art methods is promoted by a large margin: 50.4% (+ 8.2%) on COCO, 81.8% (+ 3.2%) on Pascal-VOC, and 8.7% (+4.3%) on ADE20K in terms of mIoU for unseen classes. The code is available at https://github.com/jiaosiyu1999/MAFT.git.
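One way to picture an encoder that handles many mask proposals at once is mask-weighted pooling of dense image features into one embedding per proposal, as sketched below; this is an illustrative reading of the IP-CLIP Encoder, not its actual architecture.

```python
import torch

def mask_pool(dense_feats: torch.Tensor, masks: torch.Tensor) -> torch.Tensor:
    """Pool dense image features into one embedding per mask proposal (sketch).

    dense_feats: (C, H, W) patch-level features from the image encoder
    masks:       (P, H, W) soft mask proposals in [0, 1]
    returns:     (P, C) proposal embeddings to match against text embeddings
    """
    flat = dense_feats.flatten(1)           # (C, H*W)
    m = masks.flatten(1)                    # (P, H*W)
    pooled = m @ flat.T                     # (P, C) mask-weighted feature sums
    return pooled / m.sum(dim=1, keepdim=True).clamp(min=1e-6)

embeds = mask_pool(torch.randn(512, 24, 24), torch.rand(100, 24, 24))
```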
Domain-Controlled Prompt Learning
results: Our method achieves state-of-the-art performance on specialized-domain image recognition tasks.
Abstract
Large pre-trained vision-language models, such as CLIP, have shown remarkable generalization capabilities across various tasks when appropriate text prompts are provided. However, adapting these models to specialized domains, like remote sensing images (RSIs), medical images, etc., remains unexplored and challenging. Existing prompt learning methods often lack domain awareness or domain-transfer mechanisms, leading to suboptimal performance due to the misinterpretation of specialized images in natural image patterns. To tackle this dilemma, we propose Domain-Controlled Prompt Learning for specialized domains. Specifically, a large-scale specialized domain foundation model (LSDM) is first introduced to provide essential specialized domain knowledge. Using lightweight neural networks, we transfer this knowledge into domain biases, which control both the visual and language branches to obtain domain-adaptive prompts in a directly incorporating manner. Simultaneously, to overcome the existing overfitting challenge, we propose a novel noise-adding strategy, without extra trainable parameters, to help the model escape the suboptimal solution in a global domain oscillation manner. Experimental results show our method achieves state-of-the-art performance on specialized domain image recognition datasets. Our code is available at https://anonymous.4open.science/r/DCPL-8588.
Pixel-Inconsistency Modeling for Image Manipulation Localization
results: Experiments show that the proposed method successfully localizes image manipulations and exhibits excellent generalization and robustness across datasets and perturbed images.
Abstract
Digital image forensics plays a crucial role in image authentication and manipulation localization. Despite the progress powered by deep neural networks, existing forgery localization methodologies exhibit limitations when deployed to unseen datasets and perturbed images (i.e., lack of generalization and robustness to real-world applications). To circumvent these problems and aid image integrity, this paper presents a generalized and robust manipulation localization model through the analysis of pixel inconsistency artifacts. The rationale is grounded on the observation that most image signal processors (ISP) involve the demosaicing process, which introduces pixel correlations in pristine images. Moreover, manipulating operations, including splicing, copy-move, and inpainting, directly affect such pixel regularity. We, therefore, first split the input image into several blocks and design masked self-attention mechanisms to model the global pixel dependency in input images. Simultaneously, we optimize another local pixel dependency stream to mine local manipulation clues within input forgery images. In addition, we design novel Learning-to-Weight Modules (LWM) to combine features from the two streams, thereby enhancing the final forgery localization performance. To improve the training process, we propose a novel Pixel-Inconsistency Data Augmentation (PIDA) strategy, driving the model to focus on capturing inherent pixel-level artifacts instead of mining semantic forgery traces. This work establishes a comprehensive benchmark integrating 15 representative detection models across 12 datasets. Extensive experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints and achieve state-of-the-art generalization and robustness performances in image manipulation localization.
Scaling for Training Time and Post-hoc Out-of-distribution Detection Enhancement
paper_authors: Kai Xu, Rongyu Chen, Gianni Franchi, Angela Yao
for: This paper focuses on the task of out-of-distribution (OOD) detection in deep learning systems, specifically on the recent state-of-the-art method of activation shaping (ASH).
methods: The paper proposes two novel methods for OOD detection: 1) SCALE, a post-hoc network enhancement method that achieves state-of-the-art OOD detection performance without compromising in-distribution (ID) accuracy, and 2) Intermediate Tensor SHaping (ISH), a lightweight method for training time OOD detection enhancement.
results: The paper reports AUROC gains of +1.85% for near-OOD and +0.74% for far-OOD datasets on the OpenOOD v1.5 ImageNet-1K benchmark, demonstrating the effectiveness of the proposed methods for OOD detection.
Abstract
The capacity of a modern deep learning system to determine if a sample falls within its realm of knowledge is fundamental and important. In this paper, we offer insights and analyses of recent state-of-the-art out-of-distribution (OOD) detection methods - extremely simple activation shaping (ASH). We demonstrate that activation pruning has a detrimental effect on OOD detection, while activation scaling enhances it. Moreover, we propose SCALE, a simple yet effective post-hoc network enhancement method for OOD detection, which attains state-of-the-art OOD detection performance without compromising in-distribution (ID) accuracy. By integrating scaling concepts into the training process to capture a sample's ID characteristics, we propose Intermediate Tensor SHaping (ISH), a lightweight method for training time OOD detection enhancement. We achieve AUROC scores of +1.85\% for near-OOD and +0.74\% for far-OOD datasets on the OpenOOD v1.5 ImageNet-1K benchmark. Our code and models are available at https://github.com/kai422/SCALE.
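As a rough illustration of post-hoc activation scaling, the sketch below rescales penultimate activations by a factor derived from how much of their mass sits above a percentile threshold. The exact statistic (the percentile and the exponential form) is an assumption made for this sketch; consult the released code for the paper's actual formulation.

```python
import torch

def scale_activations(x: torch.Tensor, p: float = 0.85) -> torch.Tensor:
    """Post-hoc activation scaling at the penultimate layer (sketch, not the exact formula).

    x: (B, C) penultimate activations. A shared factor, derived from how much mass
    sits in the largest activations, rescales the whole vector; nothing is pruned,
    so in-distribution accuracy is untouched when the factor feeds only the OOD score.
    """
    thresh = torch.quantile(x, p, dim=1, keepdim=True)
    s1 = x.sum(dim=1, keepdim=True)                                 # total activation mass
    s2 = torch.where(x >= thresh, x, torch.zeros_like(x)).sum(dim=1, keepdim=True)
    return x * torch.exp(s1 / s2.clamp(min=1e-6))

scores = scale_activations(torch.rand(8, 2048)).sum(dim=1)          # e.g. energy-style score
```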
Longitudinally-consistent Self-Organized Representation Learning
results: The method produces an interpretable latent space stratified by brain age from longitudinal MRIs alone and achieves comparable or higher accuracy than state-of-the-art representations on downstream diagnostic tasks. Code is available at https://github.com/ouyangjiahong/longitudinal-som-single-modality.
Abstract
Interpretability is a key issue when applying deep learning models to longitudinal brain MRIs. One way to address this issue is by visualizing the high-dimensional latent spaces generated by deep learning via self-organizing maps (SOM). SOM separates the latent space into clusters and then maps the cluster centers to a discrete (typically 2D) grid preserving the high-dimensional relationship between clusters. However, learning SOM in a high-dimensional latent space tends to be unstable, especially in a self-supervision setting. Furthermore, the learned SOM grid does not necessarily capture clinically interesting information, such as brain age. To resolve these issues, we propose the first self-supervised SOM approach that derives a high-dimensional, interpretable representation stratified by brain age solely based on longitudinal brain MRIs (i.e., without demographic or cognitive information). Called Longitudinally-consistent Self-Organized Representation learning (LSOR), the method is stable during training as it relies on soft clustering (vs. the hard cluster assignments used by existing SOM). Furthermore, our approach generates a latent space stratified according to brain age by aligning trajectories inferred from longitudinal MRIs to the reference vector associated with the corresponding SOM cluster. When applied to longitudinal MRIs of the Alzheimer's Disease Neuroimaging Initiative (ADNI, N=632), LSOR generates an interpretable latent space and achieves comparable or higher accuracy than the state-of-the-art representations with respect to the downstream tasks of classification (static vs. progressive mild cognitive impairment) and regression (determining ADAS-Cog score of all subjects). The code is available at https://github.com/ouyangjiahong/longitudinal-som-single-modality.
DeformUX-Net: Exploring a 3D Foundation Backbone for Medical Image Segmentation with Depthwise Deformable Convolution
paper_authors: Ho Hin Lee, Quan Liu, Qi Yang, Xin Yu, Shunxing Bao, Yuankai Huo, Bennett A. Landman
for: The paper focuses on improving medical image segmentation using 3D ViTs and deformable convolution.
methods: The proposed model, 3D DeformUX-Net, combines long-range dependency, adaptive spatial aggregation, and computational efficiency by revisiting volumetric deformable convolution in a depth-wise setting. The model also includes a parallel branch for generating deformable tri-planar offsets, which provides adaptive spatial aggregation across all channels.
results: The proposed model consistently outperforms existing state-of-the-art ViTs and large-kernel convolution models across four challenging public datasets, achieving better segmentation results in terms of mean Dice.
Abstract
The application of 3D ViTs to medical image segmentation has seen remarkable strides, somewhat overshadowing the budding advancements in Convolutional Neural Network (CNN)-based models. Large kernel depthwise convolution has emerged as a promising technique, showcasing capabilities akin to hierarchical transformers and facilitating an expansive effective receptive field (ERF) vital for dense predictions. Despite this, existing core operators, ranging from global-local attention to large kernel convolution, exhibit inherent trade-offs and limitations (e.g., global-local range trade-off, aggregating attentional features). We hypothesize that deformable convolution can be an exploratory alternative to combine all advantages from the previous operators, providing long-range dependency, adaptive spatial aggregation and computational efficiency as a foundation backbone. In this work, we introduce 3D DeformUX-Net, a pioneering volumetric CNN model that adeptly navigates the shortcomings traditionally associated with ViTs and large kernel convolution. Specifically, we revisit volumetric deformable convolution in depth-wise setting to adapt long-range dependency with computational efficiency. Inspired by the concepts of structural re-parameterization for convolution kernel weights, we further generate the deformable tri-planar offsets by adapting a parallel branch (starting from $1\times1\times1$ convolution), providing adaptive spatial aggregation across all channels. Our empirical evaluations reveal that the 3D DeformUX-Net consistently outperforms existing state-of-the-art ViTs and large kernel convolution models across four challenging public datasets, spanning various scales from organs (KiTS: 0.680 to 0.720, MSD Pancreas: 0.676 to 0.717, AMOS: 0.871 to 0.902) to vessels (e.g., MSD hepatic vessels: 0.635 to 0.671) in mean Dice.
results: An extensive user study in an aircraft cabin mockup evaluates the algorithm's effectiveness and learning behavior. The results show that the algorithm learns user preferences and successfully adapts to a wide range of environmental conditions and user characteristics, indicating broad applicability in smart lighting.
Abstract
Lighting requirements are subjective, and one light setting cannot work for all. However, there is little work on developing smart lighting algorithms that can adapt to user preferences. To address this gap, this paper uses fuzzy logic and reinforcement learning to develop an adaptive lighting algorithm. In particular, we develop a baseline fuzzy inference system (FIS) using domain knowledge. We use the existing literature to create a FIS that generates lighting setting recommendations based on environmental conditions, i.e., the daily glare index, and user information including age, activity, and chronotype. Through a feedback mechanism, the user interacts with the algorithm, correcting its output to their preferences. We interpret these corrections as rewards for a Q-learning agent, which tunes the FIS parameters online to match the user's preferences. We implement the algorithm in an aircraft cabin mockup and conduct an extensive user study to evaluate its effectiveness and understand its learning behavior. Our results demonstrate that the developed algorithm can learn user preferences while successfully adapting to a wide range of environmental conditions and user characteristics. This underscores its viability as a potent solution for intelligent light management, featuring advanced learning capabilities.
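The reward-driven tuning loop is standard Q-learning; a minimal tabular sketch follows, with the context/adjustment discretization and the synthetic reward invented purely for illustration (the paper tunes continuous FIS parameters rather than a table).

```python
import numpy as np

# Tabular Q-learning sketch: user corrections act as rewards that select which
# (hypothetical) FIS parameter adjustment to apply in each discretized context.
n_contexts, n_adjustments = 12, 5       # e.g. glare-index x chronotype bins; FIS tweaks
Q = np.zeros((n_contexts, n_adjustments))
alpha, gamma, eps = 0.1, 0.9, 0.1

def pick_action(ctx: int, rng: np.random.Generator) -> int:
    """Epsilon-greedy action selection over the Q table."""
    return int(rng.integers(n_adjustments)) if rng.random() < eps else int(Q[ctx].argmax())

rng = np.random.default_rng(0)
ctx = 0
for _ in range(1000):
    a = pick_action(ctx, rng)
    reward = -abs(rng.normal())          # stands in for the user's correction magnitude
    nxt = int(rng.integers(n_contexts))  # next observed context
    Q[ctx, a] += alpha * (reward + gamma * Q[nxt].max() - Q[ctx, a])
    ctx = nxt
```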
Learning Informative Latent Representation for Quantum State Tomography
results: By pre-training the encoder with a pretext task that reconstructs high-quality frequencies from measured frequencies, the high-dimensional informative latent representation (ILR) captures comprehensive information about quantum states, and a decoder predicts the quantum state from the ILR. Extensive simulations and experiments demonstrate the remarkable ability of the approach to reconstruct quantum states from imperfect measurement data.
Abstract
Quantum state tomography (QST) is the process of reconstructing the complete state of a quantum system (mathematically described as a density matrix) through a series of different measurements. These measurements are performed on a number of identical copies of the quantum system, with outcomes gathered as frequencies. QST aims to recover the density matrix and the corresponding properties of the quantum state from the measured frequencies. Although an informationally complete set of measurements can specify quantum state accurately in an ideal scenario with a large number of identical copies, both measurements and identical copies are restricted and imperfect in practical scenarios, making QST highly ill-posed. The conventional QST methods usually assume adequate or accurate measured frequencies or rely on manually designed regularizers to handle the ill-posed reconstruction problem, suffering from limited applications in realistic scenarios. Recent advances in deep neural networks (DNNs) led to the emergence of deep learning (DL) in QST. However, existing DL-based QST approaches often employ generic DNN models that are not optimized for imperfect conditions of QST. In this paper, we propose a transformer-based autoencoder architecture tailored for QST with imperfect measurement data. Our method leverages a transformer-based encoder to extract an informative latent representation (ILR) from imperfect measurement data and employs a decoder to predict the quantum states based on the ILR. We anticipate that the high-dimensional ILR will capture more comprehensive information about quantum states. To achieve this, we conduct pre-training of the encoder using a pretext task that involves reconstructing high-quality frequencies from measured frequencies. Extensive simulations and experiments demonstrate the remarkable ability of the ILR in dealing with imperfect measurement data in QST.
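The measured-frequency data the network consumes arise from the Born rule, p = Tr(ρE), sampled over finitely many copies. A single-qubit toy example (with a density matrix chosen arbitrarily for illustration) shows how few shots yield the noisy frequencies that make QST ill-posed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Born rule: p(outcome) = Tr(rho @ E); finite shots give the noisy frequencies.
rho = np.array([[0.7, 0.3], [0.3, 0.3]], dtype=complex)   # a valid density matrix
proj_z0 = np.array([[1, 0], [0, 0]], dtype=complex)       # projector onto |0>

p0 = np.trace(rho @ proj_z0).real
shots = 100                                               # few copies -> noisy estimate
freq0 = rng.binomial(shots, p0) / shots
print(f"ideal p(0) = {p0:.3f}, measured frequency = {freq0:.3f}")
```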
A Brief History of Prompt: Leveraging Language Models
results: The paper details the rise of contextual prompting and transfer learning in 2020 and 2021, and the emergence of unsupervised pre-training and novel reward shaping in 2022 and 2023, citing specific research studies to illustrate the impact of each development on prompt engineering.
Abstract
This paper presents a comprehensive exploration of the evolution of prompt engineering and generation in the field of natural language processing (NLP). Starting from the early language models and information retrieval systems, we trace the key developments that have shaped prompt engineering over the years. The introduction of attention mechanisms in 2015 revolutionized language understanding, leading to advancements in controllability and context-awareness. Subsequent breakthroughs in reinforcement learning techniques further enhanced prompt engineering, addressing issues like exposure bias and biases in generated text. We examine the significant contributions in 2018 and 2019, focusing on fine-tuning strategies, control codes, and template-based generation. The paper also discusses the growing importance of fairness, human-AI collaboration, and low-resource adaptation. In 2020 and 2021, contextual prompting and transfer learning gained prominence, while 2022 and 2023 witnessed the emergence of advanced techniques like unsupervised pre-training and novel reward shaping. Throughout the paper, we reference specific research studies that exemplify the impact of various developments on prompt engineering. The journey of prompt engineering continues, with ethical considerations being paramount for the responsible and inclusive future of AI systems.
Unveiling the Unborn: Advancing Fetal Health Classification through Machine Learning
results: The model achieves 98.31% accuracy, demonstrating the potential of machine learning for fetal health assessment.
Abstract
Fetal health classification is a critical task in obstetrics, enabling early identification and management of potential health problems. However, it remains challenging due to data complexity and limited labeled samples. This research paper presents a novel machine-learning approach for fetal health classification, leveraging a LightGBM classifier trained on a comprehensive dataset. The proposed model achieves an impressive accuracy of 98.31% on a test set. Our findings demonstrate the potential of machine learning in enhancing fetal health classification, offering a more objective and accurate assessment. Notably, our approach combines various features, such as fetal heart rate, uterine contractions, and maternal blood pressure, to provide a comprehensive evaluation. This methodology holds promise for improving early detection and treatment of fetal health issues, ensuring better outcomes for both mothers and babies. Beyond the high accuracy achieved, the novelty of our approach lies in its comprehensive feature selection and assessment methodology. By incorporating multiple data points, our model offers a more holistic and reliable evaluation compared to traditional methods. This research has significant implications in the field of obstetrics, paving the way for advancements in early detection and intervention of fetal health concerns. Future work involves validating the model on a larger dataset and developing a clinical application. Ultimately, we anticipate that our research will revolutionize the assessment and management of fetal health, contributing to improved healthcare outcomes for expectant mothers and their babies.
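A minimal LightGBM pipeline of the kind the abstract describes looks like the sketch below. The feature matrix and labels are random stand-ins (real cardiotocography features such as fetal heart rate, uterine contractions, and maternal blood pressure would replace them), so the printed accuracy is meaningless.

```python
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2126, 21))       # stand-in for CTG-style tabular features
y = rng.integers(0, 3, size=2126)     # 3 classes: normal / suspect / pathological

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
clf = LGBMClassifier(n_estimators=300, learning_rate=0.05)
clf.fit(X_tr, y_tr)
print("test accuracy:", (clf.predict(X_te) == y_te).mean())
```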
From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning
results: The study finds three major effects of instruction fine-tuning on pre-trained models: 1) it helps models better recognize the instruction parts of user prompts, improving response quality and addressing the "lost-in-the-middle" issue; 2) it aligns the knowledge stored in feed-forward layers with user-oriented tasks while remaining stable across linguistic levels; and 3) through the self-attention mechanism, particularly in the lower and middle layers, it improves recognition of instruction words. These findings contribute to understanding the behavior shift of pre-trained models after instruction fine-tuning and lay the groundwork for interpreting and optimizing LLMs for various applications.
Abstract
Large Language Models (LLMs) have achieved remarkable success, demonstrating powerful instruction-following capabilities across diverse tasks. Instruction fine-tuning is critical in enabling LLMs to align with user intentions and effectively follow instructions. In this work, we investigate how instruction fine-tuning modifies pre-trained models, focusing on two perspectives: instruction recognition and knowledge evolution. To study the behavior shift of LLMs, we employ a suite of local and global explanation methods, including a gradient-based approach for input-output attribution and techniques for interpreting patterns and concepts in self-attention and feed-forward layers. Our findings reveal three significant impacts of instruction fine-tuning: 1) It empowers LLMs to better recognize the instruction parts of user prompts, thereby facilitating high-quality response generation and addressing the "lost-in-the-middle" issue observed in pre-trained models; 2) It aligns the knowledge stored in feed-forward layers with user-oriented tasks, exhibiting minimal shifts across linguistic levels. 3) It facilitates the learning of word-word relations with instruction verbs through the self-attention mechanism, particularly in the lower and middle layers, indicating enhanced recognition of instruction words. These insights contribute to a deeper understanding of the behavior shifts in LLMs after instruction fine-tuning and lay the groundwork for future research aimed at interpreting and optimizing LLMs for various applications. We will release our code and data soon.
results: The study finds that optimizers seeking flatter optima improve generalization especially on certain atypical data points, but that this can also increase privacy risks; mitigation strategies are proposed to alleviate those risks.
Abstract
In many recent works, there is an increased focus on designing algorithms that seek flatter optima for neural network loss optimization as there is empirical evidence that it leads to better generalization performance in many datasets. In this work, we dissect these performance gains through the lens of data memorization in overparameterized models. We define a new metric that helps us identify which data points specifically do algorithms seeking flatter optima do better when compared to vanilla SGD. We find that the generalization gains achieved by Sharpness Aware Minimization (SAM) are particularly pronounced for atypical data points, which necessitate memorization. This insight helps us unearth higher privacy risks associated with SAM, which we verify through exhaustive empirical evaluations. Finally, we propose mitigation strategies to achieve a more desirable accuracy vs privacy tradeoff.
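For reference, the flatness-seeking optimizer studied here, Sharpness-Aware Minimization, performs a two-phase update: ascend to a nearby worst-case point, take the gradient there, then descend from the original weights. Below is a minimal PyTorch sketch of that standard recipe (simplified, without the per-layer scaling variants); `base_opt` is assumed to be any torch optimizer over `model.parameters()`.

```python
import torch

def sam_step(model, loss_fn, x, y, base_opt, rho=0.05):
    """One Sharpness-Aware Minimization step (sketch of the two-phase update)."""
    loss_fn(model(x), y).backward()
    grads = {p: p.grad.clone() for p in model.parameters() if p.grad is not None}
    norm = torch.sqrt(sum((g ** 2).sum() for g in grads.values())) + 1e-12

    with torch.no_grad():
        for p, g in grads.items():
            p.add_(g * (rho / norm))        # ascend to a nearby worst-case point
    model.zero_grad()
    loss_fn(model(x), y).backward()         # sharpness-aware gradient at that point
    with torch.no_grad():
        for p, g in grads.items():
            p.sub_(g * (rho / norm))        # restore the original weights
    base_opt.step()                         # descend using the perturbed-point gradient
    base_opt.zero_grad()
```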
UPAR: A Kantian-Inspired Prompting Framework for Enhancing Large Language Model Capabilities
for: The paper aims to improve the inferential capabilities of large language models (LLMs) by proposing a new prompting framework called UPAR, which is inspired by Kant’s a priori philosophy.
methods: The UPAR framework consists of four phases: “Understand”, “Plan”, “Act”, and “Reflect”. It enables the extraction of structured information from complex contexts, prior planning of solutions, execution according to plan, and self-reflection.
results: The paper demonstrates the effectiveness of the UPAR framework by testing it on two tasks: a challenging subset of GSM8K and the causal judgment task. The results show that the accuracy of LLM inference is significantly improved, with an increase from 22.92% to 58.33% in the GSM8K task and from 67.91% to 75.40% in the causal judgment task.
Abstract
Large Language Models (LLMs) have demonstrated impressive inferential capabilities, with numerous research endeavors devoted to enhancing this capacity through prompting. Despite these efforts, a unified epistemological foundation is still conspicuously absent. Drawing inspiration from Kant's a priori philosophy, we propose the UPAR prompting framework, designed to emulate the structure of human cognition within LLMs. The UPAR framework is delineated into four phases: "Understand", "Plan", "Act", and "Reflect", enabling the extraction of structured information from complex contexts, prior planning of solutions, execution according to plan, and self-reflection. This structure significantly augments the explainability and accuracy of LLM inference, producing a human-understandable and inspectable inferential trajectory. Furthermore, our work offers an epistemological foundation for existing prompting techniques, allowing for a possible systematic integration of these methods. With GPT-4, our approach elevates the accuracy from COT baseline of 22.92% to 58.33% in a challenging subset of GSM8K, and from 67.91% to 75.40% in the causal judgment task.
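In practice the framework amounts to a structured prompt; a hypothetical skeleton is sketched below. The phase wording is invented to match the four-phase description and is not copied from the paper.

```python
# Minimal prompt skeleton following the four UPAR phases (wording is an assumption).
UPAR_TEMPLATE = """\
Question: {question}

Understand: List the entities, quantities, and relations stated in the question.
Plan: Outline the steps needed to reach the answer before computing anything.
Act: Execute the plan step by step, showing intermediate results.
Reflect: Check each step against the original question and correct any mistakes.
Final answer:"""

prompt = UPAR_TEMPLATE.format(
    question="A farm has 15 cows; each gives 7 liters of milk a day. Weekly total?"
)
# `prompt` would then be sent to an LLM via whatever API client is in use.
```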
Encouraging Inferable Behavior for Autonomy: Repeated Bimatrix Stackelberg Games with Observations
paper_authors: Mustafa O. Karabag, Sophia Smith, David Fridovich-Keil, Ufuk Topcu
For: The paper studies how an autonomous agent interacting with other non-competitive decision-making agents can convey its intentions and strategy.
Methods: The authors model the interaction as a repeated bimatrix Stackelberg game with observations, in which the leader uses a fixed, potentially mixed strategy and the follower reacts based on the leader's previous actions.
Results: The authors show that the leader's inferability loss, i.e., the performance gap relative to the setting where the follower knows the leader's strategy exactly, is upper-bounded by a function of the number of interactions and the stochasticity of the leader's strategy; conversely, they provide a game in which the number of interactions required is lower-bounded by a function of the desired inferability loss.
Abstract
When interacting with other non-competitive decision-making agents, it is critical for an autonomous agent to have inferable behavior: Their actions must convey their intention and strategy. For example, an autonomous car's strategy must be inferable by the pedestrians interacting with the car. We model the inferability problem using a repeated bimatrix Stackelberg game with observations where a leader and a follower repeatedly interact. During the interactions, the leader uses a fixed, potentially mixed strategy. The follower, on the other hand, does not know the leader's strategy and dynamically reacts based on observations that are the leader's previous actions. In the setting with observations, the leader may suffer from an inferability loss, i.e., the performance compared to the setting where the follower has perfect information of the leader's strategy. We show that the inferability loss is upper-bounded by a function of the number of interactions and the stochasticity level of the leader's strategy, encouraging the use of inferable strategies with lower stochasticity levels. As a converse result, we also provide a game where the required number of interactions is lower bounded by a function of the desired inferability loss.
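A toy version of the repeated game makes the inferability question concrete: the follower best-responds to the empirical frequency of the leader's past actions, so a less stochastic leader strategy is identified faster. The bimatrix payoffs and strategy below are arbitrary illustrative numbers, not from the paper.

```python
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 1.0]])   # leader payoffs (hypothetical bimatrix game)
B = np.array([[1.0, 0.0], [0.0, 2.0]])   # follower payoffs

rng = np.random.default_rng(0)
leader_mix = np.array([0.8, 0.2])         # fixed, mildly stochastic leader strategy
counts = np.ones(2)                       # follower's running estimate (Laplace prior)

payoff = 0.0
for t in range(50):
    a = rng.choice(2, p=leader_mix)                        # leader samples its action
    b = int(((counts / counts.sum()) @ B).argmax())        # follower best-responds to estimate
    payoff += A[a, b]
    counts[a] += 1                                         # follower observes leader's action

print("leader average payoff:", payoff / 50)
```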
Consistent Aggregation of Objectives with Diverse Time Preferences Requires Non-Markovian Rewards
results: The paper presents a practical non-Markovian aggregation scheme that overcomes this impossibility with only one additional parameter per objective. These results offer new insights into sequential, multi-objective agency and intertemporal choice, with practical implications for designing AI systems that serve multiple principals with varying time preferences.
Abstract
As the capabilities of artificial agents improve, they are being increasingly deployed to service multiple diverse objectives and stakeholders. However, the composition of these objectives is often performed ad hoc, with no clear justification. This paper takes a normative approach to multi-objective agency: from a set of intuitively appealing axioms, it is shown that Markovian aggregation of Markovian reward functions is not possible when the time preference (discount factor) for each objective may vary. It follows that optimal multi-objective agents must admit rewards that are non-Markovian with respect to the individual objectives. To this end, a practical non-Markovian aggregation scheme is proposed, which overcomes the impossibility with only one additional parameter for each objective. This work offers new insights into sequential, multi-objective agency and intertemporal choice, and has practical implications for the design of AI systems deployed to serve multiple generations of principals with varying time preference.
Active-Perceptive Motion Generation for Mobile Manipulation
results: Experiments show that the method improves the grasp success rate of mobile manipulation systems in simulated cluttered scenes and transfers to real-world settings; it also optimizes motions toward the grasping task to improve success rate and efficiency.
Abstract
Mobile Manipulation (MoMa) systems incorporate the benefits of mobility and dexterity, thanks to the enlarged space in which they can move and interact with their environment. MoMa robots can also continuously perceive their environment when equipped with onboard sensors, e.g., an embodied camera. However, extracting task-relevant visual information in unstructured and cluttered environments such as households remains a challenge. In this work, we introduce an active perception pipeline for mobile manipulators to generate motions that are informative toward manipulation tasks such as grasping, in initially unknown, cluttered scenes. Our proposed approach ActPerMoMa generates robot trajectories in a receding horizon fashion, sampling trajectories and computing path-wise utilities that trade off reconstructing the unknown scene by maximizing the visual information gain and the task-oriented objective, e.g., grasp success, by maximizing grasp reachability efficiently. We demonstrate the efficacy of our method in simulated experiments with a dual-arm TIAGo++ MoMa robot performing mobile grasping in cluttered scenes and when its path is obstructed by external obstacles. We empirically analyze the contribution of various utilities and hyperparameters, and compare against representative baselines both with and without active perception objectives. Finally, we demonstrate the transfer of our mobile grasping strategy to the real world, showing a promising direction for active-perceptive MoMa.
Making Friends in the Dark: Ad Hoc Teamwork Under Partial Observability
results: Experiments on 70 POMDPs across 11 domains show that the approach not only assists unknown teammates in solving unknown tasks but also scales robustly to more challenging problems.
Abstract
This paper introduces a formal definition of the setting of ad hoc teamwork under partial observability and proposes a first-principled model-based approach which relies only on prior knowledge and partial observations of the environment in order to perform ad hoc teamwork. We make three distinct assumptions that set it apart from previous works, namely: i) the state of the environment is always partially observable, ii) the actions of the teammates are always unavailable to the ad hoc agent and iii) the ad hoc agent has no access to a reward signal which could be used to learn the task from scratch. Our results in 70 POMDPs from 11 domains show that our approach is not only effective in assisting unknown teammates in solving unknown tasks but is also robust in scaling to more challenging problems.
Building Flexible, Scalable, and Machine Learning-ready Multimodal Oncology Datasets
methods: This work proposes the Multimodal Integration of Oncology Data System (MINDS), a flexible, scalable, and cost-effective metadata framework that efficiently fuses medical data from disparate sources and offers an interface for exploring relationships across data types and building large-scale multimodal machine learning models.
results: MINDS harmonizes multimodal data into a patient-centric framework in support of precision medicine and personalized treatment. It tracks granular end-to-end data provenance to ensure reproducibility and transparency, and its cloud-native architecture handles rapid data growth while keeping data processing secure, reliable, scalable, and efficient.
Abstract
The advancements in data acquisition, storage, and processing techniques have resulted in the rapid growth of heterogeneous medical data. Integrating radiological scans, histopathology images, and molecular information with clinical data is essential for developing a holistic understanding of the disease and optimizing treatment. The need for integrating data from multiple sources is further pronounced in complex diseases such as cancer for enabling precision medicine and personalized treatments. This work proposes Multimodal Integration of Oncology Data System (MINDS) - a flexible, scalable, and cost-effective metadata framework for efficiently fusing disparate data from public sources such as the Cancer Research Data Commons (CRDC) into an interconnected, patient-centric framework. MINDS offers an interface for exploring relationships across data types and building cohorts for developing large-scale multimodal machine learning models. By harmonizing multimodal data, MINDS aims to potentially empower researchers with greater analytical ability to uncover diagnostic and prognostic insights and enable evidence-based personalized care. MINDS tracks granular end-to-end data provenance, ensuring reproducibility and transparency. The cloud-native architecture of MINDS can handle exponential data growth in a secure, cost-optimized manner while ensuring substantial storage optimization, replication avoidance, and dynamic access capabilities. Auto-scaling, access controls, and other mechanisms guarantee pipelines' scalability and security. MINDS overcomes the limitations of existing biomedical data silos via an interoperable metadata-driven approach that represents a pivotal step toward the future of oncology data integration.
Refutation of Shapley Values for XAI – Additional Evidence
methods: Uses families of classifiers whose features are not Boolean, as well as families of classifiers for which multiple classes can be picked.
results: Shows that Shapley values are inadequate for XAI, and that the features changed in any minimal $l_0$ distance adversarial example do not include irrelevant features.
Abstract
Recent work demonstrated the inadequacy of Shapley values for explainable artificial intelligence (XAI). Although to disprove a theory a single counterexample suffices, a possible criticism of earlier work is that the focus was solely on Boolean classifiers. To address such possible criticism, this paper demonstrates the inadequacy of Shapley values for families of classifiers where features are not boolean, but also for families of classifiers for which multiple classes can be picked. Furthermore, the paper shows that the features changed in any minimal $l_0$ distance adversarial examples do not include irrelevant features, thus offering further arguments regarding the inadequacy of Shapley values for XAI.
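For readers who want to probe such counterexamples themselves, a minimal exact Shapley-value computation by subset enumeration is sketched below. This is a generic tool, not the authors' code; the 3-valued classifier f and the all-zeros baseline are hypothetical stand-ins for the non-Boolean classifier families discussed above.

    from itertools import combinations
    from math import factorial

    def shapley(f, x, baseline):
        # Exact Shapley values via enumeration; feasible for small feature counts.
        n = len(x)
        def value(S):
            # Replace features outside S with the baseline, then evaluate f.
            z = [x[i] if i in S else baseline[i] for i in range(n)]
            return f(z)
        phi = []
        for i in range(n):
            others = [j for j in range(n) if j != i]
            total = 0.0
            for k in range(n):
                for S in combinations(others, k):
                    w = factorial(k) * factorial(n - k - 1) / factorial(n)
                    total += w * (value(set(S) | {i}) - value(set(S)))
            phi.append(total)
        return phi

    # Hypothetical 3-valued classifier; the output ignores feature 1 entirely.
    f = lambda z: (z[0] + 2 * z[2]) % 3
    print(shapley(f, x=[1, 1, 2], baseline=[0, 0, 0]))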
results: Experiments show that OP-GFNs achieve strong performance on single-objective maximization tasks and multi-objective Pareto front approximation tasks, including synthetic datasets, molecule generation, and neural architecture search.
Abstract
Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates with probabilities proportional to a given reward. However, GFlowNets can only be used with a predefined scalar reward, which can be either computationally expensive or not directly accessible, in the case of multi-objective optimization (MOO) tasks for example. Moreover, to prioritize identifying high-reward candidates, the conventional practice is to raise the reward to a higher exponent, the optimal choice of which may vary across different environments. To address these issues, we propose Order-Preserving GFlowNets (OP-GFNs), which sample with probabilities in proportion to a learned reward function that is consistent with a provided (partial) order on the candidates, thus eliminating the need for an explicit formulation of the reward function. We theoretically prove that the training process of OP-GFNs gradually sparsifies the learned reward landscape in single-objective maximization tasks. The sparsification concentrates on candidates of a higher hierarchy in the ordering, ensuring exploration at the beginning and exploitation towards the end of the training. We demonstrate OP-GFN's state-of-the-art performance in single-objective maximization (totally ordered) and multi-objective Pareto front approximation (partially ordered) tasks, including synthetic datasets, molecule generation, and neural architecture search.
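One illustrative reading of the order-preserving idea, sketched below under assumptions (this is not the authors' exact objective): train a reward network r_theta with a pairwise logistic loss so that candidates higher in the provided (partial) order receive higher learned reward, without ever writing down an explicit reward function.

    import torch

    def order_preserving_loss(r_theta, x_better, x_worse):
        # Pairwise logistic loss: push r_theta(x_better) above r_theta(x_worse).
        return torch.nn.functional.softplus(r_theta(x_worse) - r_theta(x_better)).mean()

    # Hypothetical small reward network over 4-dimensional candidate encodings.
    r_theta = torch.nn.Sequential(torch.nn.Linear(4, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
    opt = torch.optim.Adam(r_theta.parameters(), lr=1e-3)

    # Stand-in ordered pairs; real pairs would come from the given partial order.
    x_better, x_worse = torch.randn(16, 4), torch.randn(16, 4)
    loss = order_preserving_loss(r_theta, x_better, x_worse)
    loss.backward()
    opt.step()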
Dynamic Demonstrations Controller for In-Context Learning
results: Experiments show that D$^2$Controller yields a 5.4% relative improvement in ICL performance across eight different sizes of LLMs on ten datasets, and the method also extends to previous ICL models with competitive results.
Abstract
In-Context Learning (ICL) is a new paradigm for natural language processing (NLP), where a large language model (LLM) observes a small number of demonstrations and a test instance as its input, and directly makes predictions without updating model parameters. Previous studies have revealed that ICL is sensitive to the selection and the ordering of demonstrations. However, there are few studies regarding the impact of the demonstration number on the ICL performance within a limited input length of LLM, because it is commonly believed that the number of demonstrations is positively correlated with model performance. In this paper, we found this conclusion does not always hold true. Through pilot experiments, we discover that increasing the number of demonstrations does not necessarily lead to improved performance. Building upon this insight, we propose a Dynamic Demonstrations Controller (D$^2$Controller), which can improve the ICL performance by adjusting the number of demonstrations dynamically. The experimental results show that D$^2$Controller yields a 5.4% relative improvement on eight different sizes of LLMs across ten datasets. Moreover, we also extend our method to previous ICL models and achieve competitive results.
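A minimal sketch of the controller's core idea, with llm_predict as a hypothetical stand-in for an actual LLM call: rather than assuming more demonstrations is always better, pick the demonstration count k by validation accuracy.

    def llm_predict(demonstrations, test_input):
        # Hypothetical stand-in; a real controller would prompt an LLM with the
        # demonstrations followed by test_input and parse the predicted label.
        return (test_input + len(demonstrations)) % 2

    def choose_k(train_pool, val_set, candidate_ks):
        # Select the demonstration count with the best validation accuracy.
        best_k, best_acc = None, -1.0
        for k in candidate_ks:
            demos = train_pool[:k]
            acc = sum(llm_predict(demos, x) == y for x, y in val_set) / len(val_set)
            if acc > best_acc:
                best_k, best_acc = k, acc
        return best_k

    val = [(0, 0), (1, 1), (2, 0), (3, 1)]
    print(choose_k(train_pool=list(range(8)), val_set=val, candidate_ks=[1, 2, 4, 8]))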
Measuring Value Understanding in Language Models through Discriminator-Critique Gap
methods: This work uses the Schwartz Value Survey to specify the evaluation values and develops a thousand-level dialogue dataset with GPT-4. LLMs are assessed by measuring both the gap between their outputs and baseline answers and the gap between their stated reasons for value recognition and GPT-4's annotations.
results: The study finds that scaling significantly affects the "know what" gap but barely changes the "know why" gap, which has consistently remained high. This may indicate that LLMs craft plausible explanations without truly understanding their inherent values, pointing to potential risks.
Abstract
Recent advancements in Large Language Models (LLMs) have heightened concerns about their potential misalignment with human values. However, evaluating their grasp of these values is complex due to their intricate and adaptable nature. We argue that truly understanding values in LLMs requires considering both "know what" and "know why". To this end, we present the Value Understanding Measurement (VUM) framework that quantitatively assesses both "know what" and "know why" by measuring the discriminator-critique gap related to human values. Using the Schwartz Value Survey, we specify our evaluation values and develop a thousand-level dialogue dataset with GPT-4. Our assessment looks at both the value alignment of LLM's outputs compared to baseline answers and how LLM responses align with reasons for value recognition versus GPT-4's annotations. We evaluate five representative LLMs and provide strong evidence that the scaling law significantly impacts "know what" but not much on "know why", which has consistently maintained a high level. This may further suggest that LLMs might craft plausible explanations based on the provided context without truly understanding their inherent value, indicating potential risks.
AI-Dentify: Deep learning for proximal caries detection on bitewing x-ray – HUNT4 Oral Health Study
results: The trained models show an increase in average precision and F1-score and a decrease in false negative rate relative to the dental clinicians. YOLOv5 performs best, with a mean average precision of 0.647, a mean F1-score of 0.548, and a mean false negative rate of 0.149.
Abstract
Background: Dental caries diagnosis requires the manual inspection of diagnostic bitewing images of the patient, followed by a visual inspection and probing of the identified dental pieces with potential lesions. Yet the use of artificial intelligence, and in particular deep learning, has the potential to aid in the diagnosis by providing a quick and informative analysis of the bitewing images. Methods: A dataset of 13,887 bitewings from the HUNT4 Oral Health Study was annotated individually by six different experts and used to train three different object detection deep-learning architectures: RetinaNet (ResNet50), YOLOv5 (M size), and EfficientDet (D0 and D1 sizes). A consensus dataset of 197 images, annotated jointly by the same six dentists, was used for evaluation. A five-fold cross validation scheme was used to evaluate the performance of the AI models. Results: The trained models show an increase in average precision and F1-score, and a decrease in false negative rate, with respect to the dental clinicians. Out of the three architectures studied, YOLOv5 shows the largest improvement, reporting 0.647 mean average precision, 0.548 mean F1-score, and 0.149 mean false negative rate, whereas the best annotators on each of these metrics reported 0.299, 0.495, and 0.164 respectively. Conclusion: Deep-learning models have shown the potential to assist dental professionals in the diagnosis of caries, yet the task remains challenging due to the artifacts natural to the bitewings.
Neuroadaptation in Physical Human-Robot Collaboration
results: Experimental results show that the closed-loop neuroadaptive framework reduces the level of cognitive conflict during physical human-robot collaboration, thereby increasing its smoothness and intuitiveness. These results suggest the feasibility of using electroencephalogram (EEG) signals in future pHRC control systems.
Abstract
Robots for physical Human-Robot Collaboration (pHRC) systems need to change their behavior and how they operate in consideration of several factors, such as the performance and intention of a human co-worker and the capabilities of different human co-workers in collision avoidance and singularity of the robot operation. As the system's admittance becomes variable throughout the workspace, a potential solution is to tune the interaction forces and control the parameters based on the operator's requirements. To overcome this issue, we have demonstrated a novel closed-loop neuroadaptive framework for pHRC. We have applied cognitive conflict information in a closed-loop manner, with the help of reinforcement learning, to adapt robot strategy and compare this with open-loop settings. The experiment results show that the closed-loop-based neuroadaptive framework successfully reduces the level of cognitive conflict during pHRC, consequently increasing the smoothness and intuitiveness of human-robot collaboration. These results suggest the feasibility of a neuroadaptive approach for future pHRC control systems through electroencephalogram (EEG) signals.
Visual Political Communication in a Polarized Society: A Longitudinal Study of Brazilian Presidential Elections on Instagram
paper_authors: Mathias-Felipe de-Lima-Santos, Isabella Gonçalves, Marcos G. Quiles, Lucia Mesquita, Wilson Ceron
for: This study investigates the visual communication strategies employed by Brazilian presidential candidates on Instagram in the 2018 and 2022 national elections.
methods: The study combines computational methods and a qualitative approach to analyze a dataset of 11,263 Instagram posts by 19 Brazilian presidential candidates.
results: The study finds consistent patterns of celebratory and positively toned images, a strong sense of personalization, and unique contextual nuances specific to the Brazilian political landscape. It also uncovers the prevalence of screenshots from news websites and other social media platforms, as well as text-edited images with portrayals.
Abstract
In today's digital age, images have emerged as powerful tools for politicians to engage with their voters on social media platforms. Visual content possesses a unique emotional appeal that often leads to increased user engagement. However, research on visual communication remains relatively limited, particularly in the Global South. This study aims to bridge this gap by employing a combination of computational methods and qualitative approach to investigate the visual communication strategies employed in a dataset of 11,263 Instagram posts by 19 Brazilian presidential candidates in 2018 and 2022 national elections. Through two studies, we observed consistent patterns across these candidates on their use of visual political communication. Notably, we identify a prevalence of celebratory and positively toned images. They also exhibit a strong sense of personalization, portraying candidates connected with their voters on a more emotional level. Our research also uncovers unique contextual nuances specific to the Brazilian political landscape. We note a substantial presence of screenshots from news websites and other social media platforms. Furthermore, text-edited images with portrayals emerge as a prominent feature. In light of these results, we engage in a discussion regarding the implications for the broader field of visual political communication. This article serves as a testament to the pivotal role that Instagram has played in shaping the narrative of two fiercely polarized Brazilian elections, casting a revealing light on the ever-evolving dynamics of visual political communication in the digital age. Finally, we propose avenues for future research in the realm of visual political communication.
Unlocking Bias Detection: Leveraging Transformer-Based Models for Content Analysis
results: Rigorous tests on multiple datasets show that the CBDT model distinguishes biased from neutral statements and pinpoints the exact biased lexemes, outperforming existing methods by 2-4%.
Abstract
Bias detection in text is imperative due to its role in reinforcing negative stereotypes, disseminating misinformation, and influencing decisions. Current language models often fall short in generalizing beyond their training sets. In response, we introduce the Contextualized Bi-Directional Dual Transformer (CBDT) Classifier. This novel architecture utilizes two synergistic transformer networks: the Context Transformer and the Entity Transformer, aiming for enhanced bias detection. Our dataset preparation follows the FAIR principles, ensuring ethical data usage. Through rigorous testing on various datasets, CBDT showcases its ability in distinguishing biased from neutral statements, while also pinpointing exact biased lexemes. Our approach outperforms existing methods, achieving a 2-4\% increase over benchmark performances. This opens avenues for adapting the CBDT model across diverse linguistic and cultural landscapes.
Deep Reinforcement Learning for Autonomous Vehicle Intersection Navigation
results: Our TD3-based method, trained and tested on the CARLA simulation platform, demonstrates stable convergence and improved safety performance across various traffic densities. The results show that the approach enables the AV to navigate T-intersections effectively, outperforming previous methods in terms of travel delays, collision minimization, and overall cost. The study contributes to reinforcement learning applications in autonomous driving and highlights the potential of single-agent, cost-effective methods for more complex driving scenarios and for advancing reinforcement learning algorithms in the future.
Abstract
In this paper, we explore the challenges associated with navigating complex T-intersections in dense traffic scenarios for autonomous vehicles (AVs). Reinforcement learning algorithms have emerged as a promising approach to address these challenges by enabling AVs to make safe and efficient decisions in real-time. Here, we address the problem of efficiently and safely navigating T-intersections using a lower-cost, single-agent approach based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) reinforcement learning algorithm. We show that our TD3-based method, when trained and tested in the CARLA simulation platform, demonstrates stable convergence and improved safety performance in various traffic densities. Our results reveal that the proposed approach enables the AV to effectively navigate T-intersections, outperforming previous methods in terms of travel delays, collision minimization, and overall cost. This study contributes to the growing body of knowledge on reinforcement learning applications in autonomous driving and highlights the potential of single-agent, cost-effective methods for addressing more complex driving scenarios and advancing reinforcement learning algorithms in the future.
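For reference, a minimal sketch of TD3's core target computation (the generic algorithm, not the paper's CARLA setup): target policy smoothing plus clipped double-Q learning, with tiny stand-in networks so the snippet executes.

    import torch

    def td3_target(critic1_t, critic2_t, actor_t, next_s, r, done,
                   gamma=0.99, noise_std=0.2, noise_clip=0.5):
        # Target policy smoothing: add clipped noise to the target action.
        noise = (torch.randn_like(actor_t(next_s)) * noise_std).clamp(-noise_clip, noise_clip)
        next_a = (actor_t(next_s) + noise).clamp(-1.0, 1.0)
        # Clipped double-Q: take the minimum of the two target critics.
        q_next = torch.min(critic1_t(next_s, next_a), critic2_t(next_s, next_a))
        return r + gamma * (1.0 - done) * q_next

    # Tiny stand-in networks, just to make the snippet executable.
    actor_t = lambda s: torch.tanh(s.mean(dim=1, keepdim=True)).repeat(1, 2)
    critic = lambda s, a: s.sum(dim=1, keepdim=True) + a.sum(dim=1, keepdim=True)
    y = td3_target(critic, critic, actor_t, torch.randn(8, 4), torch.ones(8, 1), torch.zeros(8, 1))
    print(y.shape)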
FedLPA: Personalized One-shot Federated Learning with Layer-Wise Posterior Aggregation
results: Achieves significant improvements over state-of-the-art methods across several metrics, including learning performance and communication overhead.
Abstract
Efficiently aggregating trained neural networks from local clients into a global model on a server is a widely researched topic in federated learning. Recently, motivated by diminishing privacy concerns, mitigating potential attacks, and reducing the overhead of communication, one-shot federated learning (i.e., limiting client-server communication into a single round) has gained popularity among researchers. However, the one-shot aggregation performances are sensitively affected by the non-identical training data distribution, which exhibits high statistical heterogeneity in some real-world scenarios. To address this issue, we propose a novel one-shot aggregation method with Layer-wise Posterior Aggregation, named FedLPA. FedLPA aggregates local models to obtain a more accurate global model without requiring extra auxiliary datasets or exposing any confidential local information, e.g., label distributions. To effectively capture the statistics maintained in the biased local datasets in the practical non-IID scenario, we efficiently infer the posteriors of each layer in each local model using layer-wise Laplace approximation and aggregate them to train the global parameters. Extensive experimental results demonstrate that FedLPA significantly improves learning performance over state-of-the-art methods across several metrics.
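A minimal sketch of what layer-wise posterior aggregation can look like under a diagonal-Gaussian Laplace approximation (an illustrative reading, not FedLPA's exact algorithm): each client's per-layer posterior N(mu, 1/precision) fuses as a product of Gaussians, so precisions add and means are precision-weighted. The client means and precisions below are hypothetical.

    import numpy as np

    def aggregate_layer(mus, precisions):
        # Product of Gaussians: precisions add, means are precision-weighted.
        P = sum(precisions)
        mu = sum(p * m for m, p in zip(mus, precisions)) / P
        return mu, P

    client_mus = [np.array([0.2, -1.0]), np.array([0.4, -0.8])]
    client_precs = [np.array([50.0, 5.0]), np.array([10.0, 40.0])]
    mu_global, prec_global = aggregate_layer(client_mus, client_precs)
    print(mu_global)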
Quantization of Deep Neural Networks to facilitate self-correction of weights on Phase Change Memory-based analog hardware
results: Results show that, when paired with an on-chip pulse generator, our self-correcting neural networks perform comparably to those trained with analog-aware algorithms.
Abstract
In recent years, hardware-accelerated neural networks have gained significant attention for edge computing applications. Among various hardware options, crossbar arrays offer a promising avenue for efficient storage and manipulation of neural network weights. However, the transition from trained floating-point models to hardware-constrained analog architectures remains a challenge. In this work, we combine a quantization technique specifically designed for such architectures with a novel self-correcting mechanism. By utilizing dual crossbar connections to represent both the positive and negative parts of a single weight, we develop an algorithm to approximate a set of multiplicative weights. These weights, along with their differences, aim to represent the original network's weights with minimal loss in performance. We implement the models using IBM's aihwkit and evaluate their efficacy over time. Our results demonstrate that, when paired with an on-chip pulse generator, our self-correcting neural network performs comparably to those trained with analog-aware algorithms.
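A minimal sketch of the dual-crossbar representation described above, with a toy self-correction loop (an illustration of the idea, not IBM aihwkit code): each weight is stored as w = g_pos - g_neg with non-negative conductances, and the loop nudges a drifted device back toward the target weight. The drift values and step size are hypothetical.

    import numpy as np

    w = np.array([0.6, -0.3, 0.1])                 # target weights
    g_pos, g_neg = np.clip(w, 0, None), np.clip(-w, 0, None)

    drift = np.array([0.05, -0.02, 0.01])          # hypothetical conductance drift
    g_pos_hw = np.clip(g_pos + drift, 0, None)     # what the hardware now stores

    for _ in range(10):                            # self-correction loop
        err = (g_pos_hw - g_neg) - w               # read-out vs. target weight
        g_pos_hw = np.clip(g_pos_hw - 0.5 * err, 0, None)

    print(np.abs((g_pos_hw - g_neg) - w).max())    # residual error after correction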
methods: This paper builds on latent action spaces, capturing only possible actions within the behavior policy support and decoupling the temporal structure between planning and modeling; existing latent-action-based methods, however, are limited to discrete spaces and require expensive planning.
results: The proposed $\texttt{LatentDiffuser}$ demonstrates competitive performance on low-dimensional locomotion control tasks and surpasses existing methods on higher-dimensional tasks.
Abstract
Temporal abstraction and efficient planning pose significant challenges in offline reinforcement learning, mainly when dealing with domains that involve temporally extended tasks and delayed sparse rewards. Existing methods typically plan in the raw action space and can be inefficient and inflexible. Latent action spaces offer a more flexible paradigm, capturing only possible actions within the behavior policy support and decoupling the temporal structure between planning and modeling. However, current latent-action-based methods are limited to discrete spaces and require expensive planning. This paper presents a unified framework for continuous latent action space representation learning and planning by leveraging latent, score-based diffusion models. We establish the theoretical equivalence between planning in the latent action space and energy-guided sampling with a pretrained diffusion model and incorporate a novel sequence-level exact sampling method. Our proposed method, $\texttt{LatentDiffuser}$, demonstrates competitive performance on low-dimensional locomotion control tasks and surpasses existing methods in higher-dimensional tasks.
A Hierarchical Approach to Environment Design with Generative Trajectory Modeling
for: trains generally capable agents to achieve good zero-shot transfer performance
methods: uses a hierarchical MDP and a synthetic experience dataset to reduce resource-intensive agent-environment interactions
results: significantly improves the efficiency and robustness of the agent under limited training resources, with manifold advantages across various domains.
Abstract
Unsupervised Environment Design (UED) is a paradigm for training generally capable agents to achieve good zero-shot transfer performance. This paradigm hinges on automatically generating a curriculum of training environments. Leading approaches for UED predominantly use randomly generated environment instances to train the agent. While these methods exhibit good zero-shot transfer performance, they often encounter challenges in effectively exploring large design spaces or leveraging previously discovered underlying structures. To address these challenges, we introduce a novel framework based on Hierarchical MDP (Markov Decision Processes). Our approach includes an upper-level teacher's MDP responsible for training a lower-level MDP student agent, guided by the student's performance. To expedite the learning of the upper-level MDP, we leverage recent advancements in generative modeling to generate a synthetic experience dataset for training the teacher agent. Our algorithm, called Synthetically-enhanced Hierarchical Environment Design (SHED), significantly reduces the resource-intensive interactions between the agent and the environment. To validate the effectiveness of SHED, we conduct empirical experiments across various domains, with the goal of developing an efficient and robust agent under limited training resources. Our results show the manifold advantages of SHED and highlight its effectiveness as a potent instrument for curriculum-based learning within the UED framework. This work contributes to exploring the next generation of RL agents capable of adeptly handling an ever-expanding range of complex tasks.
results: Experimental results show that embedding GPT-4 into GNAS outperforms existing GNAS methods and achieves faster convergence.
Abstract
Graph Neural Architecture Search (GNAS) has shown promising results in automatically designing graph neural networks. However, GNAS still requires intensive human labor with rich domain knowledge to design the search space and search strategy. In this paper, we integrate GPT-4 into GNAS and propose a new GPT-4 based Graph Neural Architecture Search method (GPT4GNAS for short). The basic idea of our method is to design a new class of prompts for GPT-4 to guide GPT-4 toward the generative task of graph neural architectures. The prompts consist of descriptions of the search space, search strategy, and search feedback of GNAS. By iteratively running GPT-4 with the prompts, GPT4GNAS generates more accurate graph neural networks with fast convergence. Experimental results show that embedding GPT-4 into GNAS outperforms the state-of-the-art GNAS methods.
Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition
results: Experiments show that using only 20% of the samples improves accuracy by 8.45 percentage points while reducing time consumption by 79%.
Abstract
Speech emotion recognition (SER) has drawn increasing attention for its applications in human-machine interaction. However, existing SER methods ignore the information gap between the pre-training speech recognition task and the downstream SER task, leading to sub-optimal performance. Moreover, they require much time to fine-tune on each specific speech dataset, restricting their effectiveness in real-world scenes with large-scale noisy data. To address these issues, we propose an active learning (AL) based fine-tuning framework for SER that leverages task adaptation pre-training (TAPT) and AL methods to enhance performance and efficiency. Specifically, we first use TAPT to minimize the information gap between the pre-training and the downstream task. Then, AL methods are used to iteratively select a subset of the most informative and diverse samples for fine-tuning, reducing time consumption. Experiments demonstrate that using only 20% of the samples improves accuracy by 8.45 percentage points and reduces time consumption by 79%.
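A minimal sketch of the active-learning loop described above, using entropy-based uncertainty sampling as the selection criterion (the paper may use different AL criteria); DummySER is a hypothetical stand-in for a TAPT-pretrained SER model.

    import numpy as np

    class DummySER:
        # Hypothetical stand-in for a pre-trained SER model with TAPT applied.
        def predict_proba(self, xs):
            rng = np.random.default_rng(0)
            p = rng.random((len(xs), 4))
            return p / p.sum(1, keepdims=True)
        def fine_tune(self, xs):
            pass  # a real model would run gradient steps on the selected samples

    def active_finetune(model, pool, rounds=3, batch=8):
        for _ in range(rounds):
            probs = model.predict_proba(pool)
            entropy = -(probs * np.log(probs + 1e-12)).sum(1)    # uncertainty score
            picked = set(np.argsort(entropy)[-batch:].tolist())  # most uncertain samples
            model.fine_tune([pool[i] for i in picked])
            pool = [x for i, x in enumerate(pool) if i not in picked]
        return model

    active_finetune(DummySER(), pool=list(range(100)))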
Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration
results: Extensive experiments across four different types of reasoning tasks show that multi-model collaboration substantially improves reasoning performance compared with existing methods. Further results and in-depth analysis demonstrate the cost-effectiveness and annotation efficiency of the approach.
Abstract
Large Language Models (LLMs) are evolving at an unprecedented pace and have exhibited considerable capability in the realm of natural language processing (NLP) with world knowledge. Benefiting from ultra-large-scale training corpora, a single LLM can manage typical NLP tasks competently. However, its performance in executing reasoning tasks is still confined by the limitations of its internal representations. To push this boundary further, we introduce Corex in this paper, a suite of novel general-purpose strategies that transform LLMs into autonomous agents pioneering multi-model collaborations for complex task-solving. Inspired by human behaviors, Corex is constituted by diverse collaboration paradigms including Debate, Review, and Retrieve modes, which collectively work towards enhancing the factuality, faithfulness, and reliability of the reasoning process. These paradigms foster task-agnostic approaches that enable LLMs to ''think outside the box,'' thereby overcoming hallucinations and providing better solutions. Through extensive experiments across four different types of reasoning tasks, we demonstrate that orchestrating multiple LLMs to work in concert yields substantially better performance compared to existing methods. Further results and in-depth analysis demonstrate the cost-effectiveness of our method, facilitating collaboration among different LLMs and promoting annotation efficiency.
A Unified Framework for Generative Data Augmentation: A Comprehensive Survey
for: To alleviate the problem of data scarcity in machine learning applications.
methods: Uses generative data augmentation (GDA) techniques to increase data quantity.
results: Provides a comprehensive framework for the GDA landscape and reveals research directions such as effective data selection, theoretical development for the application of large-scale models in GDA, and establishing a benchmark for GDA.
Abstract
Generative data augmentation (GDA) has emerged as a promising technique to alleviate data scarcity in machine learning applications. This thesis presents a comprehensive survey and unified framework of the GDA landscape. We first provide an overview of GDA, discussing its motivation, taxonomy, and key distinctions from synthetic data generation. We then systematically analyze the critical aspects of GDA - selection of generative models, techniques to utilize them, data selection methodologies, validation approaches, and diverse applications. Our proposed unified framework categorizes the extensive GDA literature, revealing gaps such as the lack of universal benchmarks. The thesis summarises promising research directions, including effective data selection, theoretical development for large-scale models' application in GDA, and establishing a benchmark for GDA. By laying a structured foundation, this thesis aims to nurture more cohesive development and accelerate progress in the vital arena of generative data augmentation.
Investigating the Efficacy of Large Language Models in Reflective Assessment Methods through Chain of Thoughts Prompting
methods: This study uses Chain of Thought (CoT) prompting to instruct large language models to carry out a specific task.
results: The results show that, among all the models, Llama-7b performs the least effectively, displaying the highest mean squared error; conversely, ChatGPT emerges as the superior model, with a Cohen kappa score of 0.53.
Abstract
Large Language Models, such as Generative Pre-trained Transformer 3 (aka GPT-3), have been developed to understand language through the analysis of extensive text data, allowing them to identify patterns and connections between words. While LLMs have demonstrated impressive performance across various text-related tasks, they encounter challenges in tasks associated with reasoning. To address this challenge, the Chain of Thought (CoT) prompting method has been proposed as a means to enhance LLMs' proficiency in complex reasoning tasks like solving math word problems and answering questions based on logical argumentative reasoning. The primary aim of this research is to assess how well four language models can grade reflective essays of third-year medical students. The assessment will specifically target the evaluation of critical thinking skills using CoT prompting. The research makes the following contributions: it introduces and educates on the process of instructing models to evaluate reflective essays from a dataset they have not been previously trained on, and it illustrates the use of CoT prompting as an instructional approach for training large models to carry out particular tasks. Our results suggest that among all the models, Llama-7b performs the least effectively, displaying the highest mean squared error. Conversely, ChatGPT emerges as the superior model, boasting a higher Cohen kappa score value of 0.53. Lastly, it is important to note that the selected models prioritise user privacy by allowing users to delete their own conversations.
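For concreteness, the agreement metric cited above can be computed as follows (hypothetical rubric scores; scikit-learn assumed):

    from sklearn.metrics import cohen_kappa_score

    human = [2, 3, 1, 2, 0, 3]   # hypothetical human-assigned essay grades
    model = [2, 3, 2, 2, 0, 3]   # hypothetical model-assigned grades
    print(cohen_kappa_score(human, model))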
Unravel Anomalies: An End-to-end Seasonal-Trend Decomposition Approach for Time Series Anomaly Detection
paper_authors: Zhenwei Zhang, Ruiqi Wang, Ran Ding, Yuantao Gu
for: This work targets anomaly detection in complex time-series data, where traditional time-series anomaly detection methods often struggle with the composite nature of the data and the diversity of anomalies.
methods: We propose TADNet, an end-to-end time-series anomaly detection model that leverages seasonal-trend decomposition to link various types of anomalies to specific decomposition components, simplifying the analysis of complex time series and improving detection performance. Our training methodology, pre-training on a synthetic dataset followed by fine-tuning, strikes a balance between effective decomposition and precise anomaly detection.
results: Experiments validate TADNet's state-of-the-art performance on real-world datasets across a diverse range of anomalies.
Abstract
Traditional Time-series Anomaly Detection (TAD) methods often struggle with the composite nature of complex time-series data and a diverse array of anomalies. We introduce TADNet, an end-to-end TAD model that leverages Seasonal-Trend Decomposition to link various types of anomalies to specific decomposition components, thereby simplifying the analysis of complex time-series and enhancing detection performance. Our training methodology, which includes pre-training on a synthetic dataset followed by fine-tuning, strikes a balance between effective decomposition and precise anomaly detection. Experimental validation on real-world datasets confirms TADNet's state-of-the-art performance across a diverse range of anomalies.
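A minimal sketch of the decomposition-based recipe TADNet builds on (not the paper's network): split the series into trend, seasonal, and residual components, then flag points with outlying residuals. The synthetic series and threshold are illustrative.

    import numpy as np
    from statsmodels.tsa.seasonal import STL

    np.random.seed(0)
    t = np.arange(400)
    series = np.sin(2 * np.pi * t / 50) + 0.01 * t + 0.1 * np.random.randn(400)
    series[123] += 3.0                                 # inject a point anomaly

    res = STL(series, period=50).fit()                 # seasonal-trend decomposition
    z = (res.resid - res.resid.mean()) / res.resid.std()
    print(np.where(np.abs(z) > 4)[0])                  # flagged indices, e.g. [123]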
The Physics of Preference: Unravelling Imprecision of Human Preferences through Magnetisation Dynamics
results: Tested against a spectrum of psychological data, this blend of physics and psychology adeptly captures the complexities inherent in human decision-making.
Abstract
Paradoxical decision-making behaviours such as preference reversal often arise from imprecise or noisy human preferences. By harnessing the physical principle of magnetisation reversal in ferromagnetic nanostructures driven by electric current, we developed a model that closely reflects human decision-making dynamics. Tested against a spectrum of psychological data, our model adeptly captures the complexities inherent in individual choices. This blend of physics and psychology paves the way for fresh perspectives on understanding human decision-making processes.
A quantum system control method based on enhanced reinforcement learning
results: Compared with other methods, QSC-ERL achieves close-to-1 fidelity learning control of quantum systems and takes fewer episodes for quantum state evolution under limited resources.
Abstract
Traditional quantum system control methods often face different constraints and, under limited resources, are prone to both leakage and stochastic control errors. Reinforcement learning has been shown to be an efficient way to complete quantum system control tasks. To learn a satisfactory control strategy under limited resources, a quantum system control method based on enhanced reinforcement learning (QSC-ERL) is proposed. The states and actions in reinforcement learning are mapped to quantum states and control operations in quantum systems. By using new enhanced neural networks, reinforcement learning can quickly achieve the maximization of long-term cumulative rewards, and a quantum state can be evolved accurately from an initial state to a target state. According to the number of candidate unitary operations, the three-switch control is used for simulation experiments. Compared with other methods, QSC-ERL achieves close-to-1 fidelity learning control of quantum systems and takes fewer episodes for quantum state evolution under limited resources.
AdaptNet: Policy Adaptation for Physics-Based Character Control
results: The method quickly adapts existing physics-based controllers and shows significant gains across a wide range of new locomotion styles, new task targets, changes in character morphology, and extensive changes in environment. It also exhibits a marked increase in learning efficiency, i.e., greatly reduced training times compared with training from scratch or other approaches.
Abstract
Motivated by humans' ability to adapt skills in the learning of new ones, this paper presents AdaptNet, an approach for modifying the latent space of existing policies to allow new behaviors to be quickly learned from like tasks in comparison to learning from scratch. Building on top of a given reinforcement learning controller, AdaptNet uses a two-tier hierarchy that augments the original state embedding to support modest changes in a behavior and further modifies the policy network layers to make more substantive changes. The technique is shown to be effective for adapting existing physics-based controllers to a wide range of new styles for locomotion, new task targets, changes in character morphology and extensive changes in environment. Furthermore, it exhibits significant increase in learning efficiency, as indicated by greatly reduced training times when compared to training from scratch or using other approaches that modify existing policies. Code is available at https://motion-lab.github.io/AdaptNet.
Combining Spatial and Temporal Abstraction in Planning for Better Generalization
results: Compared with existing state-of-the-art hierarchical planning methods, Skipper shows a significant advantage in zero-shot generalization.
Abstract
Inspired by human conscious planning, we propose Skipper, a model-based reinforcement learning agent that utilizes spatial and temporal abstractions to generalize learned skills in novel situations. It automatically decomposes the task at hand into smaller-scale, more manageable subtasks and hence enables sparse decision-making and focuses its computation on the relevant parts of the environment. This relies on the definition of a high-level proxy problem represented as a directed graph, in which vertices and edges are learned end-to-end using hindsight. Our theoretical analyses provide performance guarantees under appropriate assumptions and establish where our approach is expected to be helpful. Generalization-focused experiments validate Skipper's significant advantage in zero-shot generalization, compared to existing state-of-the-art hierarchical planning methods.
Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis
paper_authors: Nithin Gopalakrishnan Nair, Anoop Cherian, Suhas Lohit, Ye Wang, Toshiaki Koike-Akino, Vishal M. Patel, Tim K. Marks
for: This work targets plug-and-play conditional image generation, improving the quality and precision of the generated images.
methods: The work uses a diffusion model trained for unconditional generation and steers its sampling at inference time via a loss designed with a pre-trained inverse model that characterizes the conditional task.
results: The results show that the method achieves high-quality zero-shot conditional image generation without task-specific training and easily incorporates multiple conditions during inference.
Abstract
Conditional generative models typically demand large annotated training sets to achieve high-quality synthesis. As a result, there has been significant interest in designing models that perform plug-and-play generation, i.e., to use a predefined or pretrained model, which is not explicitly trained on the generative task, to guide the generative process (e.g., using language). However, such guidance is typically useful only towards synthesizing high-level semantics rather than editing fine-grained details as in image-to-image translation tasks. To this end, and capitalizing on the powerful fine-grained generative control offered by the recent diffusion-based generative models, we introduce Steered Diffusion, a generalized framework for photorealistic zero-shot conditional image generation using a diffusion model trained for unconditional generation. The key idea is to steer the image generation of the diffusion model at inference time via designing a loss using a pre-trained inverse model that characterizes the conditional task. This loss modulates the sampling trajectory of the diffusion process. Our framework allows for easy incorporation of multiple conditions during inference. We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution. Our results demonstrate clear qualitative and quantitative improvements over state-of-the-art diffusion-based plug-and-play models while adding negligible additional computational cost.
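A minimal sketch of the loss-guided sampling mechanism described above (dummy stand-ins replace the real diffusion and inverse models): each reverse step is followed by a gradient nudge on a conditional loss defined through the inverse model.

    import torch

    def guided_sampling(denoise_step, inverse_model, y_target, steps=10, scale=1.0):
        x = torch.randn(1, 3, 8, 8)
        for t in reversed(range(steps)):
            x = denoise_step(x, t)                     # unconditional reverse step
            x = x.detach().requires_grad_(True)
            loss = ((inverse_model(x) - y_target) ** 2).mean()
            grad, = torch.autograd.grad(loss, x)
            x = (x - scale * grad).detach()            # steer toward the condition
        return x

    # Dummy stand-ins so the sketch executes; real models replace these.
    denoise_step = lambda x, t: 0.9 * x
    inverse_model = lambda x: x.mean()
    out = guided_sampling(denoise_step, inverse_model, y_target=torch.tensor(0.5))
    print(out.shape)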
Beyond Random Noise: Insights on Anonymization Strategies from a Latent Bandit Study
results: Experiments on three open real-world datasets show that adding noise to individual user data records is a poor choice. Combining noise with appropriate aggregation strategies offers more flexibility than the standard noise mechanism alone; for example, using averages from clusters of different sizes provides flexibility not achievable by varying the amount of noise alone. Overall, no single aggregation strategy consistently achieves the optimum regret at a given level of privacy.
Abstract
This paper investigates the issue of privacy in a learning scenario where users share knowledge for a recommendation task. Our study contributes to the growing body of research on privacy-preserving machine learning and underscores the need for tailored privacy techniques that address specific attack patterns rather than relying on one-size-fits-all solutions. We use the latent bandit setting to evaluate the trade-off between privacy and recommender performance by employing various aggregation strategies, such as averaging, nearest neighbor, and clustering combined with noise injection. More specifically, we simulate a linkage attack scenario leveraging publicly available auxiliary information acquired by the adversary. Our results on three open real-world datasets reveal that adding noise using the Laplace mechanism to an individual user's data record is a poor choice. It provides the highest regret for any noise level, relative to de-anonymization probability and the ADS metric. Instead, one should combine noise with appropriate aggregation strategies. For example, using averages from clusters of different sizes provides flexibility not achievable by varying the amount of noise alone. Generally, no single aggregation strategy can consistently achieve the optimum regret for a given desired level of privacy.
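A minimal sketch contrasting the two anonymization routes compared above, per-record Laplace noise versus cluster-mean aggregation (stand-in user vectors; scikit-learn k-means assumed):

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    records = rng.normal(size=(100, 5))                 # stand-in user data records

    # Route 1: Laplace noise added directly to each individual record.
    noisy = records + rng.laplace(scale=0.5, size=records.shape)

    # Route 2: replace each record with the mean of its cluster.
    km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(records)
    cluster_means = km.cluster_centers_[km.labels_]

    # Distortion of each anonymized view relative to the raw records.
    print(np.linalg.norm(noisy - records), np.linalg.norm(cluster_means - records))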
Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment
results: Experiments show that P3O outperforms PPO in the KL-reward trade-off and aligns with human preferences as well as or better than prior methods, while optimizing directly on comparative (relative) feedback.
Abstract
Large Language Models (LLMs) can acquire extensive world knowledge through pre-training on large corpora. However, due to exposure to low-quality data, LLMs may exhibit harmful behavior without aligning with human values. The dominant approach for steering LLMs towards beneficial behavior involves Reinforcement Learning with Human Feedback (RLHF), with Proximal Policy Optimization (PPO) serving as the default RL optimizer. Despite its effectiveness, PPO has limitations when optimizing rewards trained from comparison-based loss. Primarily, PPO is not invariant to equivalent reward functions containing identical preference information due to the need to calibrate the reward scale. Additionally, PPO's necessity for token-wise updates introduces complexity in both function approximation and algorithm design compared to trajectory-wise optimization. This paper proposes a new framework, reinforcement learning with relative feedback, and a novel trajectory-wise policy gradient algorithm, Pairwise Proximal Policy Optimization (P3O) that operates directly on comparative rewards. We show theoretically that P3O is invariant to equivalent rewards and avoids the complexity of PPO. Empirical evaluations demonstrate that P3O outperforms PPO in the KL-Reward trade-off and can align with human preferences as well as or better than prior methods. In summary, this work introduces a simpler yet effective approach for aligning LLMs to human preferences through relative feedback.
A Prefrontal Cortex-inspired Architecture for Planning in Large Language Models
results: A black-box architecture with multiple LLM-based (GPT-4) modules yields significant improvements on two challenging planning tasks, graph traversal and Tower of Hanoi, over standard methods such as zero-shot prompting or in-context learning. These results demonstrate that knowledge from cognitive neuroscience can improve planning in LLMs.
Abstract
Large language models (LLMs) demonstrate impressive performance on a wide variety of tasks, but they often struggle with tasks that require multi-step reasoning or goal-directed planning. To address this, we take inspiration from the human brain, in which planning is accomplished via the recurrent interaction of specialized modules in the prefrontal cortex (PFC). These modules perform functions such as conflict monitoring, state prediction, state evaluation, task decomposition, and task coordination. We find that LLMs are sometimes capable of carrying out these functions in isolation, but struggle to autonomously coordinate them in the service of a goal. Therefore, we propose a black box architecture with multiple LLM-based (GPT-4) modules. The architecture improves planning through the interaction of specialized PFC-inspired modules that break down a larger problem into multiple brief automated calls to the LLM. We evaluate the combined architecture on two challenging planning tasks -- graph traversal and Tower of Hanoi -- finding that it yields significant improvements over standard LLM methods (e.g., zero-shot prompting or in-context learning). These results demonstrate the benefit of utilizing knowledge from cognitive neuroscience to improve planning in LLMs.
results: On three real-world human evaluation tasks, the method shows superior capability and efficiency in predicting the aggregated behaviours of human annotators, matching the distribution of human annotations, and simulating inter-annotator disagreements.
Abstract
Human annotator simulation (HAS) serves as a cost-effective substitute for human evaluation such as data annotation and system assessment. Human perception and behaviour during human evaluation exhibit inherent variability due to diverse cognitive processes and subjective interpretations, which should be taken into account in modelling to better mimic the way people perceive and interact with the world. This paper introduces a novel meta-learning framework that treats HAS as a zero-shot density estimation problem, which incorporates human variability and allows for the efficient generation of human-like annotations for unlabelled test inputs. Under this framework, we propose two new model classes, conditional integer flows and conditional softmax flows, to account for ordinal and categorical annotations, respectively. The proposed method is evaluated on three real-world human evaluation tasks and shows superior capability and efficiency to predict the aggregated behaviours of human annotators, match the distribution of human annotations, and simulate the inter-annotator disagreements.
Question-Answering Model for Schizophrenia Symptoms and Their Impact on Daily Life using Mental Health Forums Data
results: 实验验证表明,所提方法可以获得一个准确的数据集;通过微调 BioBERT QA 模型,实现了精神疾病领域的问答模型。该模型的 F1 分数达到 0.885,超过了该领域当前最先进的模型。Abstract
In recent years, there is strong emphasis on mining medical data using machine learning techniques. A common problem is to obtain a noiseless set of textual documents, with a relevant content for the research question, and developing a Question Answering (QA) model for a specific medical field. The purpose of this paper is to present a new methodology for building a medical dataset and obtain a QA model for analysis of symptoms and impact on daily life for a specific disease domain. The ``Mental Health'' forum was used, a forum dedicated to people suffering from schizophrenia and different mental disorders. Relevant posts of active users, who regularly participate, were extrapolated providing a new method of obtaining low-bias content and without privacy issues. Furthermore, it is shown how to pre-process the dataset to convert it into a QA dataset. The Bidirectional Encoder Representations from Transformers (BERT), DistilBERT, RoBERTa, and BioBERT models were fine-tuned and evaluated via F1-Score, Exact Match, Precision and Recall. Accurate empirical experiments demonstrated the effectiveness of the proposed method for obtaining an accurate dataset for QA model implementation. By fine-tuning the BioBERT QA model, we achieved an F1 score of 0.885, showing a considerable improvement and outperforming the state-of-the-art model for mental disorders domain.
摘要
近年来,利用机器学习技术挖掘医疗数据受到高度重视。一个常见的问题是获取噪声少、内容与研究问题相关的文本文档,并为特定医疗领域开发问答(QA)模型。本文旨在提出一种构建医疗数据集并获得 QA 模型的新方法,用于分析特定疾病领域的症状及其对日常生活的影响。我们使用了“Mental Health”论坛,这是一个专门面向精神分裂症及其他精神障碍患者的论坛。我们从经常参与的活跃用户的相关帖子中进行提取,提供了一种获取低偏差内容且无隐私问题的新方法。此外,我们还展示了如何预处理数据,将其转换为 QA 数据集。我们对 BERT、DistilBERT、RoBERTa 和 BioBERT 模型进行了微调,并通过 F1 分数、精确匹配、精确率和召回率进行评估。严谨的实验证明了所提方法在构建 QA 模型所需的高质量数据集方面的有效性。通过微调 BioBERT QA 模型,我们取得了 0.885 的 F1 分数,显著超越了精神障碍领域的最先进模型。
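The entry above reports fine-tuning extractive QA models; a minimal sketch of querying a BioBERT-style QA checkpoint with Hugging Face transformers is shown below. The checkpoint name and the example forum post are illustrative assumptions, not the paper's released dataset or weights.

```python
# Minimal extractive-QA sketch; the checkpoint name and inputs are assumptions.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="dmis-lab/biobert-base-cased-v1.1-squad",  # hypothetical checkpoint choice
)

context = (
    "Since my diagnosis I struggle to concentrate at work and avoid crowded "
    "places because the voices get worse under stress."
)
result = qa(question="How do the symptoms affect daily life?", context=context)
print(result["answer"], result["score"])  # extracted span plus confidence
```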
The Many Voices of Duying: Revisiting the Disputed Essays Between Lu Xun and Zhou Zuoren
paper_authors: Xin Xie, Jiangqiong Li, Haining Wang
for: This research aims to revisit three disputed essays pseudonymously published by Lu Xun and Zhou Zuoren in 1912, using quantitative methods and stylometric analysis to investigate the authors’ respective writing styles and examine the essays’ authorship.
methods: The research employs an interpretable authorship attribution model and visual representations of essay features to facilitate a nuanced understanding of the brothers’ formative intellectual trajectories and their collaboration on these early works.
results: The findings suggest that ‘Looking at the Country of China’ was authored by Lu Xun, while ‘People of Yue, Forget Not Your Ancestors’ Instructions’ seems to be either predominantly authored or extensively revised by Lu Xun, with notable stylistic similarities to ‘Looking at the Land of Yue,’ which Zhou Zuoren recognized as his own but edited by Lu Xun. The third essay, ‘Where Has the Character of the Republic Gone?’, exhibits a ‘diluted’, mixed writing style, suggesting thorough collaboration between the brothers.Abstract
Lu Xun and Zhou Zuoren stand as two of the most influential writers in modern Chinese literature. Beyond their familial ties as brothers, they were also intimate collaborators during the nascent stages of their writing careers. This research employs quantitative methods to revisit three disputed essays pseudonymously published by the brothers in 1912. Our stylometric analysis uses an interpretable authorship attribution model to investigate the essays' authorship and examine the brothers' respective writing styles. Our findings suggest that 'Looking at the Country of China' was authored by Lu Xun. Moreover, 'People of Yue, Forget Not Your Ancestors' Instructions' seems to be either predominantly authored or extensively revised by Lu Xun given its notable stylistic similarities to 'Looking at the Land of Yue,' a piece Zhou Zuoren recognized as his own, but edited by Lu Xun. The third essay, 'Where Has the Character of the Republic Gone?,' exhibits a 'diluted', mixed writing style, suggesting thorough collaboration. We offer visual representations of essay features to facilitate a nuanced and intuitive understanding. We have uncovered evidence suggesting Lu Xun's covert engagement with social issues during his purported 'silent era' and provided insights into the brothers' formative intellectual trajectories.
摘要
鲁迅和周作人是现代中国文学中最具影响力的两位作家。除了兄弟关系之外,他们在写作生涯初期还是亲密的合作者。本研究使用量化方法,重新审视兄弟二人于 1912 年以笔名发表的三篇有争议的文章。我们的文体计量分析采用可解释的作者归属模型,考察这些文章的作者归属,并分析兄弟二人各自的写作风格。研究结果表明,《看中国的国情》为鲁迅所作;《越人,勿忘祖训》与周作人承认为己作、但经鲁迅修改的《看越地》在文体上高度相似,因而很可能主要由鲁迅执笔或经其大幅修改;第三篇《共和国的品格何在?》呈现出一种“稀释的”混合写作风格,表明兄弟二人进行了充分的合作。我们提供了文章特征的可视化表示,以便更细致、直观地理解。我们还发现了鲁迅在其所谓“沉默时期”暗中关注社会问题的证据,并为理解兄弟二人早期思想轨迹的形成提供了新的视角。
Enhancing Representation Generalization in Authorship Identification
results: 该论文的结果表明,选择合适的语言特征对于作者识别具有重要性,特别是在域外场景下。同时,使用深度学习模型可以提高作者识别的普适性。Abstract
Authorship identification ascertains the authorship of texts whose origins remain undisclosed. That authorship identification techniques work as reliably as they do has been attributed to the fact that authorial style is properly captured and represented. Although modern authorship identification methods have evolved significantly over the years and have proven effective in distinguishing authorial styles, the generalization of stylistic features across domains has not been systematically reviewed. The presented work addresses the challenge of enhancing the generalization of stylistic representations in authorship identification, particularly when there are discrepancies between training and testing samples. A comprehensive review of empirical studies was conducted, focusing on various stylistic features and their effectiveness in representing an author's style. The influencing factors such as topic, genre, and register on writing style were also explored, along with strategies to mitigate their impact. While some stylistic features, like character n-grams and function words, have proven to be robust and discriminative, others, such as content words, can introduce biases and hinder cross-domain generalization. Representations learned using deep learning models, especially those incorporating character n-grams and syntactic information, show promise in enhancing representation generalization. The findings underscore the importance of selecting appropriate stylistic features for authorship identification, especially in cross-domain scenarios. The recognition of the strengths and weaknesses of various linguistic features paves the way for more accurate authorship identification in diverse contexts.
摘要
作者识别旨在确定来源不明文本的作者归属,其可靠性通常归功于作者风格被恰当地捕捉和表示。本文针对作者识别中风格表示的泛化问题,尤其是训练样本与测试样本存在差异的情形,对多项实证研究进行了全面综述,考察了各种风格特征及其在表示作者风格方面的效果。我们还研究了主题、体裁和语域等因素对写作风格的影响,以及减轻这些影响的策略。结果显示,字符 n-gram 和功能词等风格特征稳健且有判别力,而内容词等特征则可能引入偏差并妨碍跨领域泛化。深度学习模型学习的表示,特别是融合字符 n-gram 和句法信息的模型,展示了提升表示泛化能力的潜力。这些发现强调了为作者识别(尤其是跨领域场景)选择适当风格特征的重要性。认识不同语言特征的优劣,有助于在多种不同情境下实现更准确的作者识别。
Open-Domain Dialogue Quality Evaluation: Deriving Nugget-level Scores from Turn-level Scores
results: 案例研究表明,该方法有助于定位对话轮次中的具体问题,从而改进对话系统的表现。Abstract
Existing dialogue quality evaluation systems can return a score for a given system turn from a particular viewpoint, e.g., engagingness. However, to improve dialogue systems by locating exactly where in a system turn potential problems lie, a more fine-grained evaluation may be necessary. We therefore propose an evaluation approach where a turn is decomposed into nuggets (i.e., expressions associated with a dialogue act), and nugget-level evaluation is enabled by leveraging an existing turn-level evaluation system. We demonstrate the potential effectiveness of our evaluation method through a case study.
摘要
现有的对话质量评估系统可以从某一特定角度(例如吸引力)为给定的系统轮次返回一个分数。然而,要通过准确定位系统轮次中潜在问题所在来改进对话系统,可能需要更细粒度的评估。因此,我们提出一种评估方法:将一个轮次分解为若干“信息块”(nugget,即与对话行为相关联的表达),并利用现有的轮次级评估系统来实现块级评估。我们通过案例研究展示了该评估方法的潜在效果。
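As a rough illustration of the nugget-level idea described above, the sketch below decomposes a system turn into nuggets and reuses an existing turn-level scorer as a black box; the `turn_level_score` callable and the example nuggets are stand-in assumptions, not the paper's evaluator.

```python
from typing import Callable, List

def nugget_scores(
    context: str,
    nuggets: List[str],
    turn_level_score: Callable[[str, str], float],
) -> List[float]:
    """Score each nugget as if it were the whole turn, holding context fixed."""
    return [turn_level_score(context, nugget) for nugget in nuggets]

# Usage with a trivial stand-in scorer; a real system would call an evaluator.
dummy_scorer = lambda ctx, turn: len(turn.split()) / 10.0
print(nugget_scores(
    "User: Any plans this weekend?",
    ["I'm going hiking.", "By the way, do you like hiking?"],
    dummy_scorer,
))  # one score per nugget, locating where problems lie within the turn
```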
AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with TikZ
results: 对于人工和自动评估,CLiMA 和 LLaMA 都能够超越商业 GPT-4 和 Claude 2 模型,在人工创建的图表的相似性方面获得更高的分数。此外,CLiMA 还能够改善文本-图像对齐。Abstract
Generating bitmap graphics from text has gained considerable attention, yet for scientific figures, vector graphics are often preferred. Given that vector graphics are typically encoded using low-level graphics primitives, generating them directly is difficult. To address this, we propose the use of TikZ, a well-known abstract graphics language that can be compiled to vector graphics, as an intermediate representation of scientific figures. TikZ offers human-oriented, high-level commands, thereby facilitating conditional language modeling with any large language model. To this end, we introduce DaTikZ the first large-scale TikZ dataset, consisting of 120k TikZ drawings aligned with captions. We fine-tune LLaMA on DaTikZ, as well as our new model CLiMA, which augments LLaMA with multimodal CLIP embeddings. In both human and automatic evaluation, CLiMA and LLaMA outperform commercial GPT-4 and Claude 2 in terms of similarity to human-created figures, with CLiMA additionally improving text-image alignment. Our detailed analysis shows that all models generalize well and are not susceptible to memorization. GPT-4 and Claude 2, however, tend to generate more simplistic figures compared to both humans and our models. We make our framework, AutomaTikZ, along with model weights and datasets, publicly available.
摘要
从文本生成位图图形已受到广泛关注,但对于科学图表而言,矢量图形通常更受青睐。由于矢量图形通常以低级图形原语编码,直接生成它们十分困难。为解决这一问题,我们提出使用 TikZ(一种广为人知、可编译为矢量图形的抽象图形语言)作为科学图表的中间表示。TikZ 提供面向人类的高级命令,从而便于用任何大型语言模型进行条件语言建模。为此,我们引入了 DaTikZ,首个大规模 TikZ 数据集,包含 12 万幅与标题对齐的 TikZ 绘图。我们在 DaTikZ 上微调了 LLaMA,以及我们的新模型 CLiMA(在 LLaMA 基础上加入多模态 CLIP 嵌入)。在人工和自动评估中,CLiMA 和 LLaMA 在与人工绘制图表的相似度方面均超过商用的 GPT-4 和 Claude 2,且 CLiMA 还进一步改善了文本-图像对齐。我们的详细分析表明,所有模型都具有良好的泛化能力,且不易出现记忆现象;而 GPT-4 和 Claude 2 生成的图表相比人类和我们的模型更为简单。我们公开了 AutomaTikZ 框架以及模型权重和数据集。
Gaze-Driven Sentence Simplification for Language Learners: Enhancing Comprehension and Readability
results: 实验结果表明,该系统可以准确估计句子级别的理解程度;并且 GPT-3.5 的简化在传统可读性指标和单词难度方面均提升了文本的可读性。Abstract
Language learners should regularly engage in reading challenging materials as part of their study routine. Nevertheless, constantly referring to dictionaries is time-consuming and distracting. This paper presents a novel gaze-driven sentence simplification system designed to enhance reading comprehension while maintaining their focus on the content. Our system incorporates machine learning models tailored to individual learners, combining eye gaze features and linguistic features to assess sentence comprehension. When the system identifies comprehension difficulties, it provides simplified versions by replacing complex vocabulary and grammar with simpler alternatives via GPT-3.5. We conducted an experiment with 19 English learners, collecting data on their eye movements while reading English text. The results demonstrated that our system is capable of accurately estimating sentence-level comprehension. Additionally, we found that GPT-3.5 simplification improved readability in terms of traditional readability metrics and individual word difficulty, paraphrasing across different linguistic levels.
摘要
语言学习者应在学习过程中经常阅读具有挑战性的材料。然而,频繁查阅词典既耗时又分散注意力。本文提出了一种新颖的基于视线的句子简化系统,旨在提高阅读理解的同时保持学习者对内容的专注。我们的系统结合面向个体学习者的机器学习模型,综合眼动特征和语言特征来评估句子理解程度。当系统发现理解困难时,它会借助 GPT-3.5 将复杂的词汇和语法替换为更简单的表达,从而提供简化版本。我们对 19 名英语学习者进行了实验,收集了他们阅读英文文本时的眼动数据。结果表明,我们的系统能够准确估计句子级别的理解程度。此外,我们发现 GPT-3.5 的简化在传统可读性指标和单词难度方面均提升了可读性,并能在不同语言层面进行改写。
Red Teaming Game: A Game-Theoretic Framework for Red Teaming Language Models
results: 实验结果显示,GRTS 能够自动发现多样化的攻击策略并有效提升 LLM 的安全性,优于现有的启发式红队设计方法。Abstract
Deployable Large Language Models (LLMs) must conform to the criterion of helpfulness and harmlessness, thereby achieving consistency between LLMs outputs and human values. Red-teaming techniques constitute a critical way towards this criterion. Existing work rely solely on manual red team designs and heuristic adversarial prompts for vulnerability detection and optimization. These approaches lack rigorous mathematical formulation, thus limiting the exploration of diverse attack strategy within quantifiable measure and optimization of LLMs under convergence guarantees. In this paper, we present Red-teaming Game (RTG), a general game-theoretic framework without manual annotation. RTG is designed for analyzing the multi-turn attack and defense interactions between Red-team language Models (RLMs) and Blue-team Language Model (BLM). Within the RTG, we propose Gamified Red-teaming Solver (GRTS) with diversity measure of the semantic space. GRTS is an automated red teaming technique to solve RTG towards Nash equilibrium through meta-game analysis, which corresponds to the theoretically guaranteed optimization direction of both RLMs and BLM. Empirical results in multi-turn attacks with RLMs show that GRTS autonomously discovered diverse attack strategies and effectively improved security of LLMs, outperforming existing heuristic red-team designs. Overall, RTG has established a foundational framework for red teaming tasks and constructed a new scalable oversight technique for alignment.
摘要
可部署的大型语言模型(LLM)必须符合有益且无害的准则,从而实现 LLM 输出与人类价值观的一致。红队技术是实现这一准则的关键途径。现有工作仅依赖人工设计的红队方案和启发式对抗提示来进行漏洞检测和优化。这些方法缺乏严格的数学表述,因而限制了在可量化的度量下探索多样化攻击策略,也无法在收敛性保证下优化 LLM。本文提出红队博弈(RTG),一种无需人工标注的通用博弈论框架。RTG 用于分析红队语言模型(RLM)与蓝队语言模型(BLM)之间多轮攻防交互。在 RTG 中,我们提出了带有语义空间多样性度量的博弈化红队求解器(GRTS)。GRTS 是一种自动化红队技术,通过元博弈分析将 RTG 求解至纳什均衡,对应于 RLM 和 BLM 在理论上有保证的优化方向。多轮攻击实验结果表明,GRTS 能自主发现多样化的攻击策略并有效提升 LLM 的安全性,优于现有的启发式红队设计。总体而言,RTG 为红队任务建立了基础框架,并构建了一种新的可扩展的对齐监督技术。
In-Context Learning in Large Language Models: A Neuroscience-inspired Analysis of Representations
paper_authors: Safoora Yousefi, Leo Betthauser, Hosein Hasanbeig, Akanksha Saran, Raphaël Millière, Ida Momennejad
for: investigate the mechanisms behind the improvement of large language models (LLMs) through in-context learning (ICL)
methods: employ neuroscience-inspired techniques such as representational similarity analysis (RSA) and propose novel methods for parameterized probing and attention ratio analysis (ARA)
results: found a meaningful correlation between changes in both embeddings and attention representations with improvements in behavioral performance after ICL, offering valuable tools and insights for future research and practical applications.
results: 发现在ICL后, embedding 和注意力表示变化与行为性能改进存在明确的相关性,为未来研究和实际应用提供有价值的工具和洞察。Abstract
Large language models (LLMs) exhibit remarkable performance improvement through in-context learning (ICL) by leveraging task-specific examples in the input. However, the mechanisms behind this improvement remain elusive. In this work, we investigate embeddings and attention representations in Llama-2 70B and Vicuna 13B. Specifically, we study how embeddings and attention change after in-context-learning, and how these changes mediate improvement in behavior. We employ neuroscience-inspired techniques, such as representational similarity analysis (RSA), and propose novel methods for parameterized probing and attention ratio analysis (ARA, measuring the ratio of attention to relevant vs. irrelevant information). We designed three tasks with a priori relationships among their conditions: reading comprehension, linear regression, and adversarial prompt injection. We formed hypotheses about expected similarities in task representations to investigate latent changes in embeddings and attention. Our analyses revealed a meaningful correlation between changes in both embeddings and attention representations with improvements in behavioral performance after ICL. This empirical framework empowers a nuanced understanding of how latent representations affect LLM behavior with and without ICL, offering valuable tools and insights for future research and practical applications.
摘要
大型语言模型(LLM)通过在输入中利用任务特定示例的上下文学习(ICL),表现出显著的性能提升,但其背后的机制仍不清楚。在这项工作中,我们研究了 Llama-2 70B 和 Vicuna 13B 中的嵌入和注意力表示。具体而言,我们研究嵌入和注意力在上下文学习后如何变化,以及这些变化如何介导行为上的改进。我们采用了受神经科学启发的技术,如表示相似性分析(RSA),并提出了参数化探测和注意力比率分析(ARA,度量对相关信息与无关信息的注意力之比)等新方法。我们设计了三个条件间具有先验关系的任务:阅读理解、线性回归和对抗性提示注入,并就任务表示的预期相似性提出假设,以考察嵌入和注意力中的潜在变化。分析结果显示,嵌入和注意力表示的变化与 ICL 后行为性能的提升之间存在有意义的相关性。这一实证框架有助于细致理解潜在表示如何影响 LLM 在有无 ICL 时的行为,为未来研究和实际应用提供了有价值的工具和洞见。
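The attention ratio analysis (ARA) mentioned above measures attention mass on relevant versus irrelevant information; a toy version over a single attention matrix is sketched below, with the matrix and index sets assumed for illustration.

```python
import numpy as np

def attention_ratio(attn, query_pos, relevant, irrelevant):
    """attn: (seq_len, seq_len) attention weights for one head/layer."""
    rel = attn[query_pos, relevant].sum()
    irr = attn[query_pos, irrelevant].sum()
    return float(rel / (irr + 1e-9))

rng = np.random.default_rng(0)
attn = rng.dirichlet(np.ones(8), size=8)  # rows sum to 1, like softmax attention
print(attention_ratio(attn, query_pos=7, relevant=[2, 3], irrelevant=[4, 5]))
```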
Towards LLM-based Fact Verification on News Claims with a Hierarchical Step-by-Step Prompting Method
results: 在两个公开虚假信息数据集上,HiSS 提示方法超过了现有的全监督方法和强大的少样本 ICL 基线。Abstract
While large pre-trained language models (LLMs) have shown their impressive capabilities in various NLP tasks, they are still under-explored in the misinformation domain. In this paper, we examine LLMs with in-context learning (ICL) for news claim verification, and find that only with 4-shot demonstration examples, the performance of several prompting methods can be comparable with previous supervised models. To further boost performance, we introduce a Hierarchical Step-by-Step (HiSS) prompting method which directs LLMs to separate a claim into several subclaims and then verify each of them via multiple questions-answering steps progressively. Experiment results on two public misinformation datasets show that HiSS prompting outperforms state-of-the-art fully-supervised approach and strong few-shot ICL-enabled baselines.
摘要
大型预训练语言模型(LLM)已在各种 NLP 任务中展示出令人印象深刻的能力,但在虚假信息领域仍未得到充分探索。本文考察了利用上下文学习(ICL)的 LLM 进行新闻声明核查,发现仅用 4 个示例演示,多种提示方法的性能即可与先前的监督模型相当。为进一步提升性能,我们提出了层次化逐步(HiSS)提示方法,引导 LLM 将一项声明分解为若干子声明,然后通过多个问答步骤逐一加以核实。在两个公开虚假信息数据集上的实验结果表明,HiSS 提示方法优于最先进的全监督方法和强大的少样本 ICL 基线。
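A minimal sketch of the hierarchical step-by-step (HiSS) flow described above follows; `llm` stands in for any chat-completion call, and the prompt wording is an assumption rather than the paper's exact templates.

```python
from typing import Callable

def hiss_verify(claim: str, llm: Callable[[str], str]) -> str:
    subclaims = llm(
        f"Decompose the claim into independent subclaims, one per line:\n{claim}"
    ).splitlines()
    verdicts = []
    for sub in filter(None, map(str.strip, subclaims)):
        question = llm(f"What question would help verify this subclaim?\n{sub}")
        answer = llm(f"Answer concisely: {question}")
        verdicts.append(
            llm(f"Subclaim: {sub}\nEvidence: {answer}\nSupported, refuted, or unclear?")
        )
    return llm(f"Given subclaim verdicts {verdicts}, is the claim true or false?")
```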
results: RelBERT 模型能够建模远超训练数据所见的关系,例如命名实体之间的关系,并且显著优于基于提示规模大几个数量级的语言模型(包括近期的 GPT 系列模型和开源模型)的策略。Abstract
Many applications need access to background knowledge about how different concepts and entities are related. Although Knowledge Graphs (KG) and Large Language Models (LLM) can address this need to some extent, KGs are inevitably incomplete and their relational schema is often too coarse-grained, while LLMs are inefficient and difficult to control. As an alternative, we propose to extract relation embeddings from relatively small language models. In particular, we show that masked language models such as RoBERTa can be straightforwardly fine-tuned for this purpose, using only a small amount of training data. The resulting model, which we call RelBERT, captures relational similarity in a surprisingly fine-grained way, allowing us to set a new state-of-the-art in analogy benchmarks. Crucially, RelBERT is capable of modelling relations that go well beyond what the model has seen during training. For instance, we obtained strong results on relations between named entities with a model that was only trained on lexical relations between concepts, and we observed that RelBERT can recognise morphological analogies despite not being trained on such examples. Overall, we find that RelBERT significantly outperforms strategies based on prompting language models that are several orders of magnitude larger, including recent GPT-based models and open source models.
摘要
许多应用需要获取关于不同概念和实体之间关系的背景知识。尽管知识图谱(KG)和大型语言模型(LLM)在一定程度上可以满足这一需求,但 KG 难免不完整且其关系模式往往过于粗粒度,而 LLM 则效率低且难以控制。作为替代方案,我们提出从相对较小的语言模型中提取关系嵌入。特别地,我们展示了 RoBERTa 等掩码语言模型只需少量训练数据即可直接为此目的进行微调。所得模型(我们称之为 RelBERT)以惊人的细粒度捕捉了关系相似性,使我们在类比基准上创造了新的最先进水平。关键的是,RelBERT 能够建模远超训练所见范围的关系。例如,仅在概念间词汇关系上训练的模型在命名实体间关系上取得了强劲结果;我们还观察到 RelBERT 能识别形态学类比,尽管从未在此类示例上训练。总体而言,RelBERT 显著优于基于提示规模大几个数量级的语言模型的策略,包括近期的 GPT 系列模型和开源模型。
Understanding In-Context Learning from Repetitions
results: 该研究揭示了表面特征在文本生成中的双重作用,并阐释了上下文学习的内在机制及其可能的局限性。Abstract
This paper explores the elusive mechanism underpinning in-context learning in Large Language Models (LLMs). Our work provides a novel perspective by examining in-context learning via the lens of surface repetitions. We quantitatively investigate the role of surface features in text generation, and empirically establish the existence of \emph{token co-occurrence reinforcement}, a principle that strengthens the relationship between two tokens based on their contextual co-occurrences. By investigating the dual impacts of these features, our research illuminates the internal workings of in-context learning and expounds on the reasons for its failures. This paper provides an essential contribution to the understanding of in-context learning and its potential limitations, providing a fresh perspective on this exciting capability.
摘要
本文探究了大型语言模型(LLM)中上下文学习背后难以捉摸的机制。我们的工作从表面重复的视角审视上下文学习,提供了一个新颖的视角。我们定量考察了表面特征在文本生成中的作用,并通过实证确立了“词元共现强化”原则的存在,即两个词元之间的关联会随其在上下文中的共现而增强。通过考察这些特征的双重影响,我们的研究阐明了上下文学习的内部运作机制,并解释了其失效的原因。本文为理解上下文学习及其潜在局限性作出了重要贡献,为这一激动人心的能力提供了全新的视角。
AfriSpeech-200: Pan-African Accented Speech Dataset for Clinical and General Domain ASR
paper_authors: Tobi Olatunji, Tejumade Afonja, Aditya Yadavalli, Chris Chinenye Emezue, Sahib Singh, Bonaventure F. P. Dossou, Joanne Osuchukwu, Salomey Osei, Atnafu Lambebo Tonja, Naome Etori, Clinton Mbataku
For: The paper aims to address the lack of productivity tools for overworked clinicians in Africa, where the doctor-to-patient ratio is very low.* Methods: The paper uses clinical automatic speech recognition (ASR) systems, which are mature and ubiquitous in developed nations but have not been widely available to clinicians in Africa. The authors also release a new dataset called AfriSpeech, which includes 200 hours of Pan-African English speech from 2,463 unique speakers across 120 indigenous accents from 13 countries.* Results: The authors release pre-trained models with state-of-the-art (SOTA) performance on the AfriSpeech benchmark, which can be used to improve the accuracy of clinical ASR systems for African accents.* For: 该论文旨在解决非洲临床医生缺乏生产力工具的问题;非洲的医生与患者比例非常低。* Methods: 论文使用临床自动语音识别(ASR)系统,这些系统在发达国家已经成熟,但在非洲尚未广泛应用。作者们还发布了名为 AfriSpeech 的新数据集,包括 200 小时的泛非英语语音,来自 13 个国家 120 种本地口音的 2,463 名说话者。* Results: 作者们发布了在 AfriSpeech 基准上达到最先进(SOTA)性能的预训练模型,可用于提高非洲口音临床 ASR 系统的准确率。Abstract
Africa has a very low doctor-to-patient ratio. At very busy clinics, doctors could see 30+ patients per day -- a heavy patient burden compared with developed countries -- but productivity tools such as clinical automatic speech recognition (ASR) are lacking for these overworked clinicians. However, clinical ASR is mature, even ubiquitous, in developed nations, and clinician-reported performance of commercial clinical ASR systems is generally satisfactory. Furthermore, the recent performance of general domain ASR is approaching human accuracy. However, several gaps exist. Several publications have highlighted racial bias with speech-to-text algorithms and performance on minority accents lags significantly. To our knowledge, there is no publicly available research or benchmark on accented African clinical ASR, and speech data is non-existent for the majority of African accents. We release AfriSpeech, 200hrs of Pan-African English speech, 67,577 clips from 2,463 unique speakers across 120 indigenous accents from 13 countries for clinical and general domain ASR, a benchmark test set, with publicly available pre-trained models with SOTA performance on the AfriSpeech benchmark.
摘要
非洲的医生与患者比例非常低。在非常繁忙的诊所里,医生每天可能要接诊 30 多名患者——与发达国家相比,这是沉重的患者负担——然而这些超负荷工作的临床医生却缺乏临床自动语音识别(ASR)这类生产力工具。相比之下,临床 ASR 在发达国家已经成熟甚至无处不在,且临床医生报告的商用临床 ASR 系统性能总体令人满意。此外,通用领域 ASR 的最新性能已接近人类水平。然而,仍存在若干差距:多篇文章指出了语音转文本算法中的种族偏见,而少数群体口音上的性能明显落后。据我们所知,目前尚无公开的关于非洲口音临床 ASR 的研究或基准,且大多数非洲口音的语音数据根本不存在。我们发布了 AfriSpeech:200 小时的泛非英语语音,共 67,577 条音频片段,来自 13 个国家 120 种本地口音的 2,463 名说话者,可用于临床和通用领域 ASR;同时发布了一个基准测试集,以及在 AfriSpeech 基准上达到最先进性能的公开预训练模型。
AutoHall: Automated Hallucination Dataset Generation for Large Language Models
for: This paper is written for detecting non-factual or hallucinatory content generated by large language models (LLMs).
methods: The paper proposes a method for automatically constructing model-specific hallucination datasets based on existing fact-checking datasets, called AutoHall. Additionally, the paper proposes a zero-resource and black-box hallucination detection method based on self-contradiction.
results: The paper achieves superior hallucination detection performance compared to extant baselines, and reveals variations in hallucination proportions and types among different models.
results: 论文取得了优于现有基线的幻觉检测性能,并发现不同模型中幻觉的比例和类型存在差异。Abstract
While Large language models (LLMs) have garnered widespread applications across various domains due to their powerful language understanding and generation capabilities, the detection of non-factual or hallucinatory content generated by LLMs remains scarce. Currently, one significant challenge in hallucination detection is the laborious task of time-consuming and expensive manual annotation of the hallucinatory generation. To address this issue, this paper first introduces a method for automatically constructing model-specific hallucination datasets based on existing fact-checking datasets called AutoHall. Furthermore, we propose a zero-resource and black-box hallucination detection method based on self-contradiction. We conduct experiments towards prevalent open-/closed-source LLMs, achieving superior hallucination detection performance compared to extant baselines. Moreover, our experiments reveal variations in hallucination proportions and types among different models.
摘要
尽管大型语言模型(LLM)凭借强大的语言理解与生成能力在各领域得到广泛应用,但针对 LLM 所生成的非事实或幻觉内容的检测仍然匮乏。当前,幻觉检测面临的一大挑战是对幻觉生成内容进行人工标注既耗时又昂贵。为解决这一问题,本文首先提出了一种基于现有事实核查数据集自动构建模型特定幻觉数据集的方法,称为 AutoHall。此外,我们还提出了一种基于自相矛盾的零资源、黑盒幻觉检测方法。我们针对流行的开源/闭源 LLM 进行了实验,取得了优于现有基线的幻觉检测性能。此外,我们的实验还揭示了不同模型之间幻觉比例和类型的差异。
SLM: Bridge the thin gap between speech and text foundation models
For: The paper is written for the task of speech and language modeling, with a focus on multitask, multilingual, and dual-modal models.* Methods: The paper uses pretrained foundational speech and language models, and trains a simple adapter with just 1% of the foundation models’ parameters to adapt the model to new tasks.* Results: The paper demonstrates strong performance on conventional tasks such as speech recognition and speech translation, and introduces the novel capability of zero-shot instruction-following for more diverse tasks such as contextual biasing ASR, dialog generation, speech continuation, and question answering.Abstract
We present a joint Speech and Language Model (SLM), a multitask, multilingual, and dual-modal model that takes advantage of pretrained foundational speech and language models. SLM freezes the pretrained foundation models to maximally preserves their capabilities, and only trains a simple adapter with just 1\% (156M) of the foundation models' parameters. This adaptation not only leads SLM to achieve strong performance on conventional tasks such as speech recognition (ASR) and speech translation (AST), but also introduces the novel capability of zero-shot instruction-following for more diverse tasks: given a speech input and a text instruction, SLM is able to perform unseen generation tasks including contextual biasing ASR using real-time context, dialog generation, speech continuation, and question answering, etc. Our approach demonstrates that the representational gap between pretrained speech and language models might be narrower than one would expect, and can be bridged by a simple adaptation mechanism. As a result, SLM is not only efficient to train, but also inherits strong capabilities already acquired in foundation models of different modalities.
摘要
我们提出了一个联合语音和语言模型(SLM),这是一种多任务、多语言、双模态模型,充分利用了预训练的基础语音和语言模型。SLM 冻结预训练的基础模型以最大程度地保留其能力,仅训练一个只占基础模型参数 1%(156M)的简单适配器。这种适配不仅使 SLM 在语音识别(ASR)和语音翻译(AST)等传统任务上取得强劲表现,还引入了面向更多样任务的零样本指令遵循新能力:给定语音输入和文本指令,SLM 能执行未见过的生成任务,包括利用实时上下文的上下文偏置 ASR、对话生成、语音续写和问答等。我们的方法表明,预训练语音模型与语言模型之间的表示差距可能比人们预期的更小,并且可以通过简单的适配机制来弥合。因此,SLM 不仅训练高效,还继承了不同模态基础模型已经获得的强大能力。
Detecting Unseen Multiword Expressions in American Sign Language
results: 词嵌入所携带的信息能够以不错的准确率检测非组合性。Abstract
Multiword expressions present unique challenges in many translation tasks. In an attempt to ultimately apply a multiword expression detection system to the translation of American Sign Language, we built and tested two systems that apply word embeddings from GloVe to determine whether or not the word embeddings of lexemes can be used to predict whether or not those lexemes compose a multiword expression. It became apparent that word embeddings carry data that can detect non-compositionality with decent accuracy.
摘要
多词表达在许多翻译任务中带来独特的挑战。为了最终将多词表达检测系统应用于美国手语翻译,我们构建并测试了两个系统,利用 GloVe 词嵌入来判断词位的嵌入能否用于预测这些词位是否构成多词表达。结果表明,词嵌入所携带的信息能够以不错的准确率检测非组合性。
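One way to read the finding above is as an embedding compositionality test: if an expression's vector is far from the composition of its constituents' vectors, the expression is likely non-compositional. The sketch below assumes a GloVe file converted to word2vec format and a pre-computed phrase vector; both, like the threshold, are hypothetical.

```python
import numpy as np
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("glove.6B.100d.w2v.txt")  # hypothetical path

def compositionality(phrase_vec, word_vecs):
    composed = np.mean(word_vecs, axis=0)  # naive additive composition
    return float(composed @ phrase_vec /
                 (np.linalg.norm(composed) * np.linalg.norm(phrase_vec)))

# "kick_the_bucket" assumes a phrase vector was trained; GloVe 6B itself has none.
score = compositionality(vectors["kick_the_bucket"],
                         [vectors[w] for w in ("kick", "the", "bucket")])
print("likely MWE" if score < 0.4 else "compositional")  # threshold is an assumption
```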
results: 研究发现,不同学科的学术文献具有类似的结构和表达路径,学科之间既有相似性也有差异。这些结果为未来评估研究质量、领域风格迁移和进一步的语用分析提供了基础。Abstract
Scholarly documents have a great degree of variation, both in terms of content (semantics) and structure (pragmatics). Prior work in scholarly document understanding emphasizes semantics through document summarization and corpus topic modeling but tends to omit pragmatics such as document organization and flow. Using a corpus of scholarly documents across 19 disciplines and state-of-the-art language modeling techniques, we learn a fixed set of domain-agnostic descriptors for document sections and "retrofit" the corpus to these descriptors (also referred to as "normalization"). Then, we analyze the position and ordering of these descriptors across documents to understand the relationship between discipline and structure. We report within-discipline structural archetypes, variability, and between-discipline comparisons, supporting the hypothesis that scholarly communities, despite their size, diversity, and breadth, share similar avenues for expressing their work. Our findings lay the foundation for future work in assessing research quality, domain style transfer, and further pragmatic analysis.
摘要
学术文献在内容(语义)和结构(语用)两方面都呈现出很大的差异。先前的学术文献理解工作通过文档摘要和语料主题建模强调语义,但往往忽略文档组织和行文脉络等语用层面。我们使用一个涵盖 19 个学科的学术文献语料库和最先进的语言建模技术,学习了一组固定的、与领域无关的文档章节描述符,并将语料“改装”到这些描述符上(亦称“规范化”)。随后,我们分析这些描述符在不同文档中的位置和顺序,以理解学科与结构之间的关系。我们报告了学科内部的结构原型及其变异性,并进行了学科之间的比较,支持了这一假设:尽管学术社区规模庞大、多元而广泛,它们仍然共享相似的表达路径。这些发现为未来评估研究质量、领域风格迁移和进一步的语用分析奠定了基础。
The Sem-Lex Benchmark: Modeling ASL Signs and Their Phonemes
methods: 我们介绍了一个用于美国手语(ASL)建模的新资源,即 Sem-Lex Benchmark。该资源包括超过 84k 段孤立手语产出的视频,这些视频来自提供知情同意并获得报酬的聋人 ASL 手语者。人工专家将这些视频与 ASL-LEX、SignBank 和 ASL Citizen 等其他手语资源进行了对齐,从而实现了面向手语和音系特征识别的有用扩展。
results: 我们进行了一系列实验,使用 SL-GCN 模型证明手语音系特征的识别可以达到 85% 的准确率,并且这些特征是孤立手语识别(ISR)中有效的辅助目标。在学习识别手语词目的同时学习识别音系特征,可将少样本 ISR 精度提高 6%,将总体 ISR 精度提高 2%。数据下载说明可在 GitHub 上找到。Abstract
Sign language recognition and translation technologies have the potential to increase access and inclusion of deaf signing communities, but research progress is bottlenecked by a lack of representative data. We introduce a new resource for American Sign Language (ASL) modeling, the Sem-Lex Benchmark. The Benchmark is the current largest of its kind, consisting of over 84k videos of isolated sign productions from deaf ASL signers who gave informed consent and received compensation. Human experts aligned these videos with other sign language resources including ASL-LEX, SignBank, and ASL Citizen, enabling useful expansions for sign and phonological feature recognition. We present a suite of experiments which make use of the linguistic information in ASL-LEX, evaluating the practicality and fairness of the Sem-Lex Benchmark for isolated sign recognition (ISR). We use an SL-GCN model to show that the phonological features are recognizable with 85% accuracy, and that they are effective as an auxiliary target to ISR. Learning to recognize phonological features alongside gloss results in a 6% improvement for few-shot ISR accuracy and a 2% improvement for ISR accuracy overall. Instructions for downloading the data can be found at https://github.com/leekezar/SemLex.
摘要
手语识别与翻译技术有望提高聋人手语社群的可及性与包容性,但研究进展受制于缺乏有代表性的数据。我们介绍了一个用于美国手语(ASL)建模的新资源,即 Sem-Lex Benchmark。它是目前同类中规模最大的资源,包含超过 84k 段孤立手语产出视频,这些视频来自提供知情同意并获得报酬的聋人 ASL 手语者。人工专家将这些视频与 ASL-LEX、SignBank 和 ASL Citizen 等其他手语资源进行了对齐,从而实现了面向手语和音系特征识别的有用扩展。我们开展了一系列利用 ASL-LEX 语言学信息的实验,评估 Sem-Lex Benchmark 用于孤立手语识别(ISR)的实用性和公平性。我们使用 SL-GCN 模型表明音系特征识别可达 85% 的准确率,且其可作为 ISR 的有效辅助目标:在识别词目的同时学习识别音系特征,可将少样本 ISR 准确率提高 6%,将总体 ISR 准确率提高 2%。数据下载说明见 https://github.com/leekezar/SemLex。
Exploring Strategies for Modeling Sign Language Phonology
results: 在 Sem-Lex Benchmark 上的测试结果表明,课程学习策略在所有音位类型上取得 87% 的平均准确率,并在大多数音位类型上优于微调和多任务策略。Abstract
Like speech, signs are composed of discrete, recombinable features called phonemes. Prior work shows that models which can recognize phonemes are better at sign recognition, motivating deeper exploration into strategies for modeling sign language phonemes. In this work, we learn graph convolution networks to recognize the sixteen phoneme "types" found in ASL-LEX 2.0. Specifically, we explore how learning strategies like multi-task and curriculum learning can leverage mutually useful information between phoneme types to facilitate better modeling of sign language phonemes. Results on the Sem-Lex Benchmark show that curriculum learning yields an average accuracy of 87% across all phoneme types, outperforming fine-tuning and multi-task strategies for most phoneme types.
摘要
与口语类似,手语也由离散、可重组的特征(即音位)构成。先前的工作表明,能够识别音位的模型在手语识别方面表现更好,这促使我们更深入地探索手语音位的建模策略。在这项工作中,我们学习图卷积网络来识别 ASL-LEX 2.0 中的十六种音位“类型”。具体而言,我们探索多任务学习和课程学习等学习策略如何利用音位类型之间互为有用的信息,以促进对手语音位更好的建模。在 Sem-Lex Benchmark 上的结果表明,课程学习在所有音位类型上取得了 87% 的平均准确率,在大多数音位类型上优于微调和多任务策略。
results: 研究表明,结合互信息等方法进行特征选择可以提高分类器性能;仅使用互信息选出 25% 和 50% 的输入特征并配合随机森林分类器即可获得最佳结果。这些发现有助于增强对恶意软件安全威胁的防范。Abstract
Malware poses a significant security risk to individuals, organizations, and critical infrastructure by compromising systems and data. Leveraging memory dumps that offer snapshots of computer memory can aid the analysis and detection of malicious content, including malware. To improve the efficacy and address privacy concerns in malware classification systems, feature selection can play a critical role as it is capable of identifying the most relevant features, thus, minimizing the amount of data fed to classifiers. In this study, we employ three feature selection approaches to identify significant features from memory content and use them with a diverse set of classifiers to enhance the performance and privacy of the classification task. Comprehensive experiments are conducted across three levels of malware classification tasks: i) binary-level benign or malware classification, ii) malware type classification (including Trojan horse, ransomware, and spyware), and iii) malware family classification within each family (with varying numbers of classes). Results demonstrate that the feature selection strategy, incorporating mutual information and other methods, enhances classifier performance for all tasks. Notably, selecting only 25\% and 50\% of input features using Mutual Information and then employing the Random Forest classifier yields the best results. Our findings reinforce the importance of feature selection for malware classification and provide valuable insights for identifying appropriate approaches. By advancing the effectiveness and privacy of malware classification systems, this research contributes to safeguarding against security threats posed by malicious software.
摘要
恶意软件通过破坏系统和数据,对个人、组织和关键基础设施构成重大安全风险。利用提供计算机内存快照的内存转储,可以辅助对包括恶意软件在内的恶意内容进行分析和检测。为了提高恶意软件分类系统的效果并解决隐私问题,特征选择可以发挥关键作用:它能够识别最相关的特征,从而最大限度地减少输入分类器的数据量。在本研究中,我们采用三种特征选择方法从内存内容中识别显著特征,并将其与多种分类器结合使用,以提升分类任务的性能和隐私性。我们在三个层级的恶意软件分类任务上进行了全面的实验:(i)二元级别的良性或恶意软件分类;(ii)恶意软件类型分类(包括特洛伊木马、勒索软件和间谍软件);(iii)各家族内部的恶意软件家族分类(类别数各不相同)。结果表明,结合互信息等方法的特征选择策略可提升所有任务的分类器性能。特别是,使用互信息仅选择 25% 和 50% 的输入特征、再配合随机森林分类器,可获得最佳结果。我们的发现进一步印证了特征选择对恶意软件分类的重要性,并为选择合适方法提供了有价值的见解。通过提升恶意软件分类系统的效果和隐私性,本研究有助于防范恶意软件带来的安全威胁。
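The selection-then-classify recipe the entry reports working best (mutual information followed by Random Forest) can be sketched with scikit-learn as below; the synthetic data stands in for real memory-dump features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=1000, n_features=40, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = make_pipeline(
    SelectKBest(mutual_info_classif, k=X.shape[1] // 4),  # keep 25% of features
    RandomForestClassifier(n_estimators=200, random_state=0),
)
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```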
Nonparametric active learning for cost-sensitive classification
results: 我们证明了该算法在与特征向量空间交互次数方面达到最优收敛速率;并且在一个更一般的 Tsybakov 噪声假设下,相对于被动学习的增益由边界决策的概率质量显式刻画。Abstract
Cost-sensitive learning is a common type of machine learning problem where different errors of prediction incur different costs. In this paper, we design a generic nonparametric active learning algorithm for cost-sensitive classification. Based on the construction of confidence bounds for the expected prediction cost functions of each label, our algorithm sequentially selects the most informative vector points. Then it interacts with them by only querying the costs of prediction that could be the smallest. We prove that our algorithm attains optimal rate of convergence in terms of the number of interactions with the feature vector space. Furthermore, in terms of a general version of Tsybakov's noise assumption, the gain over the corresponding passive learning is explicitly characterized by the probability-mass of the boundary decision. Additionally, we prove the near-optimality of obtained upper bounds by providing matching (up to logarithmic factor) lower bounds.
摘要
成本敏感学习是一类常见的机器学习问题,其中不同的预测错误会带来不同的成本。在本文中,我们为成本敏感分类设计了一种通用的非参数主动学习算法。基于为每个标签的期望预测成本函数构造置信界,我们的算法依次选择信息量最大的特征向量点,然后只查询那些可能最小的预测成本,与之进行交互。我们证明该算法在与特征向量空间的交互次数方面达到最优收敛速率。此外,在一个更一般的 Tsybakov 噪声假设下,相对于被动学习的增益由边界决策的概率质量显式刻画。另外,我们通过给出(至多相差对数因子的)匹配下界,证明了所得上界的近乎最优性。
Automated Gait Generation For Walking, Soft Robotic Quadrupeds
results: 实验结果表明,该方法可以在短至 4 分钟的硬件实验内生成优于手工设计步态的平移和旋转步态,实现软体机器人步态的完全自主生成。Abstract
Gait generation for soft robots is challenging due to the nonlinear dynamics and high dimensional input spaces of soft actuators. Limitations in soft robotic control and perception force researchers to hand-craft open loop controllers for gait sequences, which is a non-trivial process. Moreover, short soft actuator lifespans and natural variations in actuator behavior limit machine learning techniques to settings that can be learned on the same time scales as robot deployment. Lastly, simulation is not always possible, due to heterogeneity and nonlinearity in soft robotic materials and their dynamics change due to wear. We present a sample-efficient, simulation free, method for self-generating soft robot gaits, using very minimal computation. This technique is demonstrated on a motorized soft robotic quadruped that walks using four legs constructed from 16 "handed shearing auxetic" (HSA) actuators. To manage the dimension of the search space, gaits are composed of two sequential sets of leg motions selected from 7 possible primitives. Pairs of primitives are executed on one leg at a time; we then select the best-performing pair to execute while moving on to subsequent legs. This method -- which uses no simulation, sophisticated computation, or user input -- consistently generates good translation and rotation gaits in as low as 4 minutes of hardware experimentation, outperforming hand-crafted gaits. This is the first demonstration of completely autonomous gait generation in a soft robot.
摘要
软体机器人的步态生成具有挑战性,主要因为软体驱动器的非线性动力学和高维输入空间。软体机器人在控制和感知上的局限迫使研究人员手工设计步态序列的开环控制器,而这并非易事。此外,软体驱动器寿命短且其行为存在自然变异,使得机器学习技术只能应用于可在与机器人部署相同时间尺度上完成学习的场景。最后,由于软体材料的异质性和非线性,以及磨损导致的动力学变化,仿真并不总是可行。我们提出了一种样本高效、无需仿真、只需极少计算的软体机器人步态自生成方法。该技术在一个电机驱动的软体四足机器人上得到演示,该机器人用由 16 个“手性剪切拉胀”(HSA)驱动器构成的四条腿行走。为控制搜索空间的维度,步态由两组依次执行的腿部动作构成,动作从 7 种可能的基元中选取。基元对每次在一条腿上执行;随后我们选出表现最好的基元对加以执行,再处理后续的腿。这种方法不需要仿真、复杂计算或用户输入,只需 4 分钟的硬件实验即可稳定地生成良好的平移和旋转步态,优于手工设计的步态。这是软体机器人中完全自主步态生成的首次演示。
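The leg-by-leg greedy search described above can be sketched as follows; `execute_and_measure` stands in for a hardware trial returning displacement, and the primitive identifiers are abstract placeholders.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

def greedy_gait(
    legs: Sequence[int],
    primitives: Sequence[str],
    execute_and_measure: Callable[[int, str, str], float],
) -> Dict[int, Tuple[str, str]]:
    gait: Dict[int, Tuple[str, str]] = {}
    for leg in legs:
        # try every ordered pair of primitives on this leg, keep the best
        best = max(product(primitives, repeat=2),
                   key=lambda pair: execute_and_measure(leg, *pair))
        gait[leg] = best  # freeze this leg's pair before tuning the next leg
    return gait
```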
Dynamic DAG Discovery for Interpretable Imitation Learning
results: 实验结果表明,所提方法能够准确捕捉模仿学习决策背后的动态因果关系,并在保持高预测精度的同时提供可解释性。Abstract
Imitation learning, which learns agent policy by mimicking expert demonstration, has shown promising results in many applications such as medical treatment regimes and self-driving vehicles. However, it remains a difficult task to interpret control policies learned by the agent. Difficulties mainly come from two aspects: 1) agents in imitation learning are usually implemented as deep neural networks, which are black-box models and lack interpretability; 2) the latent causal mechanism behind agents' decisions may vary along the trajectory, rather than staying static throughout time steps. To increase transparency and offer better interpretability of the neural agent, we propose to expose its captured knowledge in the form of a directed acyclic causal graph, with nodes being action and state variables and edges denoting the causal relations behind predictions. Furthermore, we design this causal discovery process to be state-dependent, enabling it to model the dynamics in latent causal graphs. Concretely, we conduct causal discovery from the perspective of Granger causality and propose a self-explainable imitation learning framework, {\method}. The proposed framework is composed of three parts: a dynamic causal discovery module, a causality encoding module, and a prediction module, and is trained in an end-to-end manner. After the model is learned, we can obtain causal relations among states and action variables behind its decisions, exposing policies learned by it. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of the proposed {\method} in learning the dynamic causal graphs for understanding the decision-making of imitation learning meanwhile maintaining high prediction accuracy.
摘要
模仿学习通过模仿专家演示来学习智能体策略,已在医疗治疗方案和自动驾驶等诸多应用中展现出有前景的成果。然而,解释智能体学到的控制策略仍然是一项困难的任务。困难主要来自两个方面:1)模仿学习中的智能体通常以深度神经网络实现,属于黑盒模型,缺乏可解释性;2)智能体决策背后的潜在因果机制可能沿轨迹变化,而非在各时间步保持不变。为了增加神经智能体的透明度并提供更好的可解释性,我们提出以有向无环因果图的形式展示其捕获的知识,其中节点为动作和状态变量,边表示预测背后的因果关系。此外,我们将该因果发现过程设计为依赖于状态,使其能够建模潜在因果图的动态变化。具体而言,我们从 Granger 因果性的角度进行因果发现,并提出了一个可自我解释的模仿学习框架 {\method}。该框架由三部分组成:动态因果发现模块、因果编码模块和预测模块,并以端到端方式训练。模型学习完毕后,我们可以获得其决策背后状态与动作变量之间的因果关系,从而展示其学到的策略。在合成数据集和真实数据集上的实验结果表明,所提出的 {\method} 能在保持高预测精度的同时,有效学习动态因果图,以理解模仿学习的决策过程。
Prompting Code Interpreter to Write Better Unit Tests on Quixbugs Functions
results: 研究发现,对提示中细节的小幅修改不会影响生成单元测试的质量。然而,Code Interpreter 能够有效地识别并纠正自己所写代码中的错误,因此提供可运行的代码来检查其输出的正确性是有益的。我们的发现表明,在提示类似 Code Interpreter 的模型时,只需提供生成单元测试所需的基本信息即可,措辞层面的细节并不那么重要。Abstract
Unit testing is a commonly-used approach in software engineering to test the correctness and robustness of written code. Unit tests are tests designed to test small components of a codebase in isolation, such as an individual function or method. Although unit tests have historically been written by human programmers, recent advancements in AI, particularly LLMs, have shown corresponding advances in automatic unit test generation. In this study, we explore the effect of different prompts on the quality of unit tests generated by Code Interpreter, a GPT-4-based LLM, on Python functions provided by the Quixbugs dataset, and we focus on prompting due to the ease with which users can make use of our findings and observations. We find that the quality of the generated unit tests is not sensitive to changes in minor details in the prompts provided. However, we observe that Code Interpreter is often able to effectively identify and correct mistakes in code that it writes, suggesting that providing it runnable code to check the correctness of its outputs would be beneficial, even though we find that it is already often able to generate correctly-formatted unit tests. Our findings suggest that, when prompting models similar to Code Interpreter, it is important to include the basic information necessary to generate unit tests, but minor details are not as important.
摘要
单元测试是软件工程中常用的方法,用于测试所编写代码的正确性和健壮性。单元测试旨在孤立地测试代码库中的小组件,例如单个函数或方法。尽管单元测试历来由人类程序员编写,但近期人工智能(尤其是 LLM)的进展也带来了自动单元测试生成方面的相应进步。在本研究中,我们探究不同提示对 Code Interpreter(一种基于 GPT-4 的 LLM)为 Quixbugs 数据集中的 Python 函数生成单元测试质量的影响;我们之所以关注提示,是因为用户可以很容易地利用我们的发现和观察。我们发现,所生成单元测试的质量对提示中细节的微小变化并不敏感。然而,我们观察到 Code Interpreter 通常能够有效地识别并纠正自己所写代码中的错误,这表明向其提供可运行的代码以检查其输出的正确性会有所助益,尽管我们发现它通常已能生成格式正确的单元测试。我们的研究结果表明,在对类似 Code Interpreter 的模型进行提示时,重要的是包含生成单元测试所需的基本信息,而细枝末节并不那么重要。
Generative Design of inorganic compounds using deep diffusion language models
paper_authors: Rongzhi Dong, Nihang Fu, Edirisuriya M. D. Siriwardane, Jianjun Hu
For: The paper aims to discover new materials with specific functions by leveraging deep learning and chemical knowledge.* Methods: The authors use a deep learning-based generative model for material composition and structure design, which includes deep diffusion language models and a template-based crystal structure prediction algorithm. They also use a universal graph neural network-based potential for structure relaxation and density functional theory (DFT) calculations for validation.* Results: The authors discovered six new materials with formation energy less than zero, among which four materials (Ti2HfO5, TaNbP, YMoN2, and TaReO4) have an e-above-hull energy of less than 0.3 eV, demonstrating the effectiveness of their approach.* 为:本文旨在利用深度学习和化学知识发现具有特定功能的材料。* 方法:作者们使用基于深度学习的生成模型进行材料组成与结构设计,其中包括深度扩散语言模型和基于模板的晶体结构预测算法;并使用基于图神经网络的通用势函数进行结构弛豫,以及密度泛函理论(DFT)计算进行验证。* 结果:作者们发现了六种形成能小于零的新材料,其中四种(Ti2HfO5、TaNbP、YMoN2 和 TaReO4)的 e-above-hull 能量小于 0.3 eV,证明了该方法的有效性。Abstract
Due to the vast chemical space, discovering materials with a specific function is challenging. Chemical formulas are obligated to conform to a set of exacting criteria such as charge neutrality, balanced electronegativity, synthesizability, and mechanical stability. In response to this formidable task, we introduce a deep learning-based generative model for material composition and structure design by learning and exploiting explicit and implicit chemical knowledge. Our pipeline first uses deep diffusion language models as the generator of compositions and then applies a template-based crystal structure prediction algorithm to predict their corresponding structures, which is then followed by structure relaxation using a universal graph neural network-based potential. The density functional theory (DFT) calculations of the formation energies and energy-above-the-hull analysis are used to validate new structures generated through our pipeline. Based on the DFT calculation results, six new materials, including Ti2HfO5, TaNbP, YMoN2, TaReO4, HfTiO2, and HfMnO2, with formation energy less than zero have been found. Remarkably, among these, four materials, namely Ti2$HfO5, TaNbP, YMoN2, and TaReO4, exhibit an e-above-hull energy of less than 0.3 eV. These findings have proved the effectiveness of our approach.
摘要
由于化学空间极其庞大,发现具有特定功能的材料颇具挑战。化学式必须满足一系列严格的标准,如电荷中性、电负性平衡、可合成性和力学稳定性。针对这一艰巨任务,我们通过学习并利用显式和隐式的化学知识,提出了一种基于深度学习的材料组成与结构设计生成模型。我们的流程首先使用深度扩散语言模型作为组成的生成器,然后应用基于模板的晶体结构预测算法来预测其对应的结构,随后利用基于图神经网络的通用势函数进行结构弛豫。我们使用密度泛函理论(DFT)计算形成能并进行 e-above-hull 分析,以验证流程生成的新结构。基于 DFT 计算结果,我们发现了六种形成能小于零的新材料,包括 Ti2HfO5、TaNbP、YMoN2、TaReO4、HfTiO2 和 HfMnO2。值得注意的是,其中四种材料(Ti2HfO5、TaNbP、YMoN2 和 TaReO4)的 e-above-hull 能量小于 0.3 eV。这些发现证明了我们方法的有效性。
Enhancing Mortality Prediction in Heart Failure Patients: Exploring Preprocessing Methods for Imbalanced Clinical Datasets
paper_authors: Hanif Kia, Mansour Vali, Hadi Sabahi
for: 这篇论文旨在提高心力衰竭(HF)患者一个月死亡率预测的精度。
methods: 这篇论文使用了一个全面的预处理框架,包括缩放、异常值处理和重采样等关键技术,并采用一种感知编码方法来处理临床数据集中的缺失值。
results: 这篇论文使用 PROVE 数据集,借助适当的预处理技术和机器学习(ML)算法,改进了一个月死亡率预测。结果显示,这些预处理技术可将基于树的模型(如随机森林和 XGB)的 F1 分数和 MCC 分别提高约 3.6% 和 2.7%,表明了该预处理方法在处理不平衡临床数据集时的有效性。Abstract
Heart failure (HF) is a critical condition in which the accurate prediction of mortality plays a vital role in guiding patient management decisions. However, clinical datasets used for mortality prediction in HF often suffer from an imbalanced distribution of classes, posing significant challenges. In this paper, we explore preprocessing methods for enhancing one-month mortality prediction in HF patients. We present a comprehensive preprocessing framework including scaling, outliers processing and resampling as key techniques. We also employed an aware encoding approach to effectively handle missing values in clinical datasets. Our study utilizes a comprehensive dataset from the Persian Registry Of cardio Vascular disease (PROVE) with a significant class imbalance. By leveraging appropriate preprocessing techniques and Machine Learning (ML) algorithms, we aim to improve mortality prediction performance for HF patients. The results reveal an average enhancement of approximately 3.6% in F1 score and 2.7% in MCC for tree-based models, specifically Random Forest (RF) and XGBoost (XGB). This demonstrates the efficiency of our preprocessing approach in effectively handling Imbalanced Clinical Datasets (ICD). Our findings hold promise in guiding healthcare professionals to make informed decisions and improve patient outcomes in HF management.
摘要
心力衰竭(HF)是一种危重病症,准确预测其死亡率对指导患者管理决策至关重要。然而,用于 HF 死亡率预测的临床数据集往往存在类别分布不平衡的问题,带来显著挑战。本文探讨了用于提升 HF 患者一个月死亡率预测的预处理方法。我们提出了一个全面的预处理框架,以缩放、异常值处理和重采样为关键技术,并采用一种感知编码方法来有效处理临床数据集中的缺失值。我们的研究使用了来自伊朗心血管疾病登记系统(PROVE)的综合数据集,该数据集存在显著的类别不平衡。通过利用适当的预处理技术和机器学习(ML)算法,我们旨在改进 HF 患者的死亡率预测性能。结果显示,基于树的模型(特别是随机森林 RF 和 XGBoost)的 F1 分数平均提升约 3.6%,MCC 平均提升约 2.7%。这表明了我们的预处理方法在有效处理不平衡临床数据集(ICD)方面的效率。我们的发现有望指导医疗专业人员做出明智决策,改善 HF 管理中的患者预后。
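A compact sketch of the scale / handle-outliers / resample recipe outlined above is given below; SMOTE and the IQR clipping rule are assumed stand-ins for the paper's exact steps.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

def iqr_clip(X: np.ndarray, k: float = 1.5) -> np.ndarray:
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    return np.clip(X, q1 - k * (q3 - q1), q3 + k * (q3 - q1))

def fit_on_imbalanced(X: np.ndarray, y: np.ndarray) -> RandomForestClassifier:
    X = StandardScaler().fit_transform(iqr_clip(X))          # outliers, then scaling
    X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)  # rebalance classes
    return RandomForestClassifier(random_state=0).fit(X_res, y_res)
```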
results: 该研究创建了一个名为 JustLMD 的新多模态数据集,包含 4.6 小时的 3D 舞蹈动作及其配套的音乐和英文歌词。此外,该研究还展示了一个跨模态扩散网络,可以根据音乐和歌词生成 3D 舞蹈动作。Abstract
Lyrics often convey information about the songs that are beyond the auditory dimension, enriching the semantic meaning of movements and musical themes. Such insights are important in the dance choreography domain. However, most existing dance synthesis methods mainly focus on music-to-dance generation, without considering the semantic information. To complement it, we introduce JustLMD, a new multimodal dataset of 3D dance motion with music and lyrics. To the best of our knowledge, this is the first dataset with triplet information including dance motion, music, and lyrics. Additionally, we showcase a cross-modal diffusion-based network designed to generate 3D dance motion conditioned on music and lyrics. The proposed JustLMD dataset encompasses 4.6 hours of 3D dance motion in 1867 sequences, accompanied by musical tracks and their corresponding English lyrics.
摘要
歌词往往传达超出听觉维度的歌曲信息,丰富动作和音乐主题的语义内涵,这类洞见在舞蹈编排领域十分重要。然而,大多数现有舞蹈合成方法主要关注从音乐到舞蹈的生成,而未考虑语义信息。为补足这一点,我们引入了 JustLMD,一个包含 3D 舞蹈动作、音乐和歌词的新多模态数据集。据我们所知,这是首个包含舞蹈动作、音乐与歌词三元信息的数据集。此外,我们还展示了一个基于跨模态扩散的网络,用于在音乐和歌词条件下生成 3D 舞蹈动作。JustLMD 数据集包含 1867 段序列、共 4.6 小时的 3D 舞蹈动作,并配有音乐曲目及其对应的英文歌词。
The objective function equality property of infoGAN for two-layer network
results: 研究表明,当判别器和生成器的样本数趋于无穷大时,infoGAN 中的两个目标函数变得等价。这一等价性通过判别器和生成器函数类的 Rademacher 复杂度加以证明。此外,判别器和生成器均采用带有 Lipschitz 且非递减激活函数的两层网络时,也验证了这一等价性。Abstract
Information Maximizing Generative Adversarial Network (infoGAN) can be understood as a minimax problem involving two networks: discriminators and generators with mutual information functions. The infoGAN incorporates various components, including latent variables, mutual information, and objective function. This research demonstrates that the two objective functions in infoGAN become equivalent as the discriminator and generator sample size approaches infinity. This equivalence is established by considering the disparity between the empirical and population versions of the objective function. The bound on this difference is determined by the Rademacher complexity of the discriminator and generator function class. Furthermore, the utilization of a two-layer network for both the discriminator and generator, featuring Lipschitz and non-decreasing activation functions, validates this equality
摘要
信息最大化生成对抗网络(infoGAN)可以理解为一个涉及两个网络(判别器和带互信息函数的生成器)的极小极大问题。infoGAN 包含多个组成部分,包括隐变量、互信息和目标函数。本研究表明,当判别器和生成器的样本量趋于无穷大时,infoGAN 中的两个目标函数变得等价。这一等价性是通过考虑目标函数的经验版本与总体版本之间的差异而建立的,该差异的界由判别器和生成器函数类的 Rademacher 复杂度确定。此外,判别器和生成器均采用带有 Lipschitz 且非递减激活函数的两层网络时,也验证了这一等价性。
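For reference, the two objectives the entry compares are the standard GAN value function and infoGAN's information-regularized version; the restatement below follows the original infoGAN paper's notation and is ours, not the paper under summary.

```latex
\min_{G,Q}\;\max_{D}\; V_I(D,G,Q) \;=\; V(D,G) \;-\; \lambda\, L_I(G,Q),
\qquad
L_I(G,Q) \;=\; \mathbb{E}_{c\sim P(c),\; x\sim G(z,c)}\!\left[\log Q(c\mid x)\right] + H(c)
\;\le\; I\bigl(c;\,G(z,c)\bigr),
```

where \(V(D,G)\) is the usual two-player GAN objective and \(Q\) is the auxiliary network that recovers the latent code \(c\).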
ResolvNet: A Graph Convolutional Network with multi-scale Consistency
paper_authors: Christian Koke, Abhishek Saroha, Yuesong Shen, Marvin Eisenberger, Daniel Cremers
for: The paper is written to address the limitations of graph neural networks (GNNs) in propagating information over long distances, particularly in the presence of bottlenecks and strongly connected sub-graphs.
methods: The paper introduces a new graph neural network architecture called ResolvNet, which is based on the mathematical concept of resolvents. The authors claim that ResolvNet is more consistent across multiple scales and outperforms baseline models on many tasks.
results: The authors report extensive experimental results on real-world data that demonstrate the effectiveness of ResolvNet in various tasks, including those with bottlenecks and strongly connected sub-graphs. The results show that ResolvNet outperforms baseline models significantly, and that it is more consistent across multiple scales.Abstract
It is by now a well known fact in the graph learning community that the presence of bottlenecks severely limits the ability of graph neural networks to propagate information over long distances. What so far has not been appreciated is that, counter-intuitively, also the presence of strongly connected sub-graphs may severely restrict information flow in common architectures. Motivated by this observation, we introduce the concept of multi-scale consistency. At the node level this concept refers to the retention of a connected propagation graph even if connectivity varies over a given graph. At the graph-level, multi-scale consistency refers to the fact that distinct graphs describing the same object at different resolutions should be assigned similar feature vectors. As we show, both properties are not satisfied by popular graph neural network architectures. To remedy these shortcomings, we introduce ResolvNet, a flexible graph neural network based on the mathematical concept of resolvents. We rigorously establish its multi-scale consistency theoretically and verify it in extensive experiments on real world data: Here networks based on this ResolvNet architecture prove expressive; out-performing baselines significantly on many tasks; in- and outside the multi-scale setting.
摘要
图学习社区现已熟知的一个事实是:瓶颈的存在会严重限制图神经网络在长距离上传播信息的能力。而迄今为止尚未被重视的是:与直觉相反,强连通子图的存在同样可能严重限制常见架构中的信息流动。受此观察的启发,我们引入了多尺度一致性的概念。在节点层面,该概念指即使给定图上的连通性发生变化,仍保留一个连通的传播图;在图层面,多尺度一致性指在不同分辨率下描述同一对象的不同图应被赋予相似的特征向量。我们证明,流行的图神经网络架构并不满足这两个性质。为弥补这些缺陷,我们引入了 ResolvNet,一种基于预解算子(resolvent)这一数学概念的灵活图神经网络。我们从理论上严格地建立了其多尺度一致性,并在真实数据上的大量实验中加以验证:基于 ResolvNet 架构的网络表现出很强的表达能力,在多尺度设定内外的许多任务上都显著优于基线。
On the Stability of Iterative Retraining of Generative Models on their own Data
results: Experiments validate that these conditions ensure the stability of deep generative models trained on mixed datasets, and that the framework applies across different datasets.
Abstract
Deep generative models have made tremendous progress in modeling complex data, often exhibiting generation quality that surpasses a typical human's ability to discern the authenticity of samples. Undeniably, a key driver of this success is the massive amount of web-scale data consumed by these models. Due to these models' striking performance and ease of availability, the web will inevitably be increasingly populated with synthetic content. Such a fact directly implies that future iterations of generative models must contend with the reality that their training is curated from both clean data and artificially generated data from past models. In this paper, we develop a framework to rigorously study the impact of training generative models on mixed datasets (of real and synthetic data) on their stability. We first prove the stability of iterative training under the condition that the initial generative models approximate the data distribution well enough and the proportion of clean training data (w.r.t. synthetic data) is large enough. We empirically validate our theory on both synthetic and natural images by iteratively training normalizing flows and state-of-the-art diffusion models on CIFAR10 and FFHQ.
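A toy simulation of the phenomenon the paper formalizes: a one-dimensional Gaussian "generative model" is refit over many generations on a mixture of clean and self-generated data (a sketch under simplifying assumptions; the paper's analysis covers far richer model classes).

import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=10_000)  # the "clean" data distribution

def fit(x):
    return x.mean(), x.std()  # maximum-likelihood Gaussian fit

def retrain(clean_frac, generations=50, n=10_000):
    mu, sigma = fit(real)  # initial model approximates the data well
    for _ in range(generations):
        n_clean = int(n * clean_frac)
        synth = rng.normal(mu, sigma, size=n - n_clean)      # past model's output
        clean = rng.choice(real, size=n_clean, replace=False)
        mu, sigma = fit(np.concatenate([clean, synth]))      # next-generation model
    return mu, sigma

for frac in (0.0, 0.2, 0.8):
    print(f"clean fraction {frac}: final (mu, sigma) = {retrain(frac)}")

# With enough clean data the fitted parameters stay near (0, 1); with none,
# estimation error compounds across generations and the fit drifts.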
An Efficient Algorithm for Clustered Multi-Task Compressive Sensing
methods: We propose a new algorithm that substantially reduces the time complexity of model inference without explicitly computing the multiple large covariance matrices. Our approach combines Monte Carlo sampling with iterative linear solvers.
results: Our experiments show that, compared with the existing baseline, our algorithm is faster in high dimensions and reduces memory usage. Specifically, in some cases it is up to thousands of times faster than the baseline and an order of magnitude more memory-efficient.
Abstract
This paper considers clustered multi-task compressive sensing, a hierarchical model that solves multiple compressive sensing tasks by finding clusters of tasks that leverage shared information to mutually improve signal reconstruction. The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions. The main bottleneck involves repeated matrix inversion and log-determinant computation for multiple large covariance matrices. We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices. Our approach combines Monte Carlo sampling with iterative linear solvers. Our experiments reveal that compared to the existing baseline, our algorithm can be up to thousands of times faster and an order of magnitude more memory-efficient.
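A sketch of the iterative-linear-solver half of this idea (shapes and names like noise_prec and alpha are hypothetical; the full algorithm also uses Monte Carlo sampling for log-determinant terms): the posterior-mean system (noise_prec * Phi^T Phi + diag(alpha)) x = noise_prec * Phi^T y is solved matrix-free with conjugate gradients, so the d x d covariance is never formed or inverted.

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
n, d = 100, 2000                          # compressive regime: n << d
Phi = rng.normal(size=(n, d)) / np.sqrt(n)
signal = rng.normal(size=d) * (rng.random(d) < 0.05)  # sparse ground truth
y = Phi @ signal
alpha = np.ones(d)                        # per-coefficient prior precisions
noise_prec = 100.0

def matvec(v):
    # Applies A = noise_prec * Phi^T Phi + diag(alpha) without forming A.
    return noise_prec * (Phi.T @ (Phi @ v)) + alpha * v

A = LinearOperator((d, d), matvec=matvec)
b = noise_prec * (Phi.T @ y)
mean, info = cg(A, b)                     # posterior mean; info == 0 means converged
print(info, np.linalg.norm(Phi @ mean - y))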
Linear Convergence of Pre-Conditioned PI Consensus Algorithm under Restricted Strong Convexity
results: The authors numerically validate the effectiveness of the PI consensus algorithm and compare it with other distributed convex optimization algorithms. The results show that local pre-conditioning reduces the effect of the communication graph and improves the performance of the PI consensus algorithm.
Abstract
This paper considers solving distributed convex optimization problems in peer-to-peer multi-agent networks. The network is assumed to be synchronous and connected. By using the proportional-integral (PI) control strategy, various algorithms with fixed stepsize have been developed. The earliest among them is the PI consensus algorithm. Using Lyapunov theory, we guarantee, for the first time, exponential convergence of the PI consensus algorithm for restricted strongly convex functions with rate-matching discretization, without requiring convexity of the individual local cost functions. In order to accelerate the PI consensus algorithm, we incorporate local pre-conditioning in the form of constant positive definite matrices and numerically validate its efficiency compared to the prominent distributed convex optimization algorithms. Unlike classical pre-conditioning, where only the gradients are multiplied by a pre-conditioner, the proposed pre-conditioning modifies both the gradients and the consensus terms, thereby controlling the effect of the communication graph between the agents on the PI consensus algorithm.
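A toy discretization of the pre-conditioned PI consensus iteration on scalar local costs (gains, step size, and topology are assumptions chosen empirically; note the pre-conditioner P multiplies the gradient and both consensus terms, as described above):

import numpy as np

# Ring of four agents; Lap couples neighbours through the graph Laplacian.
Adj = np.array([[0., 1., 0., 1.],
                [1., 0., 1., 0.],
                [0., 1., 0., 1.],
                [1., 0., 1., 0.]])
Lap = np.diag(Adj.sum(axis=1)) - Adj

# Local costs f_i(x) = 0.5 * a_i * (x - b_i)^2; the global minimizer of
# sum_i f_i is the a-weighted mean of the b_i.
a = np.array([1.0, 2.0, 0.5, 4.0])
b = np.array([1.0, -2.0, 3.0, 0.5])
grad = lambda x: a * (x - b)

x = np.zeros(4)            # proportional (primal) state, one scalar per agent
v = np.zeros(4)            # integral state accumulating disagreement
h, kP, kI = 0.02, 1.0, 1.0
P = np.diag(1.0 / a)       # constant positive definite local pre-conditioner

for _ in range(20000):
    x = x - h * (P @ (grad(x) + kP * (Lap @ x) + kI * v))
    v = v + h * (Lap @ x)

print(x, "target:", (a * b).sum() / a.sum())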
Better Situational Graphs by Inferring High-level Semantic-Relational Concepts
results: On both simulated and real datasets, the method infers room entities more accurately and more efficiently than the baseline algorithm, and introduces a new semantic concept, the wall, together with its relationship to wall surfaces.
Abstract
Recent works on SLAM extend their pose graphs with higher-level semantic concepts exploiting relationships between them, to provide not only a richer representation of the situation/environment but also to improve the accuracy of its estimation. Concretely, our previous work, Situational Graphs (S-Graphs), a pioneer in jointly leveraging semantic relationships in the factor optimization process, relies on semantic entities such as wall surfaces and rooms, whose relationships are mathematically defined. Nevertheless, extracting these high-level concepts relying exclusively on the lower-level factor graph remains a challenge; it is currently done with ad-hoc algorithms, which limits the capability to include new semantic-relational concepts. To overcome this limitation, in this work we propose a Graph Neural Network (GNN) for learning high-level semantic-relational concepts that can be inferred from the low-level factor graph. We demonstrate that we can infer room entities and their relationships to the mapped wall surfaces more accurately and more efficiently than the baseline algorithm. Additionally, to demonstrate the versatility of our method, we introduce a new semantic concept, i.e. the wall, and its relationship with its wall surfaces. Our proposed method has been integrated into S-Graphs+ and validated on both simulated and real datasets. A docker container with our software will be made available to the scientific community.
Mitigating the Effect of Incidental Correlations on Part-based Learning
results: The study shows that our method achieves state-of-the-art (SoTA) performance with limited data, and that the learned part representations generalize better and remain more interpretable under background shifts and common data corruptions.
Abstract
Intelligent systems possess a crucial characteristic of breaking complicated problems into smaller reusable components or parts and adjusting to new tasks using these part representations. However, current part-learners encounter difficulties in dealing with incidental correlations resulting from the limited observations of objects that may appear only in specific arrangements or with specific backgrounds. These incidental correlations may have a detrimental impact on the generalization and interpretability of learned part representations. This study asserts that part-based representations could be more interpretable and generalize better with limited data, employing two innovative regularization methods. The first regularization separates foreground and background information's generative process via a unique mixture-of-parts formulation. Structural constraints are imposed on the parts using a weakly-supervised loss, guaranteeing that the mixture-of-parts for foreground and background entails soft, object-agnostic masks. The second regularization assumes the form of a distillation loss, ensuring the invariance of the learned parts to the incidental background correlations. Furthermore, we incorporate sparse and orthogonal constraints to facilitate learning high-quality part representations. By reducing the impact of incidental background correlations on the learned parts, we exhibit state-of-the-art (SoTA) performance on few-shot learning tasks on benchmark datasets, including MiniImagenet, TieredImageNet, and FC100. We also demonstrate that the part-based representations acquired through our approach generalize better than existing techniques, even under domain shifts of the background and common data corruption on the ImageNet-9 dataset. The implementation is available on GitHub: https://github.com/GauravBh1010tt/DPViT.git
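The sparse and orthogonal constraints mentioned above admit a generic form; a minimal sketch (the weighting coefficients are illustrative assumptions, and the paper's full objective additionally contains the mixture-of-parts and distillation terms):

import torch

def part_regularizers(parts, l1_weight=1e-4, ortho_weight=1e-3):
    # parts: (K, D) matrix of K part vectors. The orthogonality term pushes
    # the Gram matrix towards the identity so parts capture distinct factors;
    # the L1 term encourages sparse part vectors.
    K = parts.shape[0]
    gram = parts @ parts.t()
    ortho = ((gram - torch.eye(K, device=parts.device)) ** 2).sum()
    sparse = parts.abs().sum()
    return ortho_weight * ortho + l1_weight * sparse

parts = torch.randn(8, 64, requires_grad=True)
loss = part_regularizers(parts)
loss.backward()
print(loss.item(), parts.grad.norm().item())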
Harmony World Models: Boosting Sample Efficiency for Model-based Reinforcement Learning
results: We run experiments on three visual control domains; the results show that base MBRL methods equipped with Harmony World Models gain 10%-55% absolute performance boosts.
Abstract
Model-based reinforcement learning (MBRL) holds the promise of sample-efficient learning by utilizing a world model, which models how the environment works and typically encompasses components for two tasks: observation modeling and reward modeling. In this paper, through a dedicated empirical investigation, we gain a deeper understanding of the role each task plays in world models and uncover the overlooked potential of more efficient MBRL by harmonizing the interference between observation and reward modeling. Our key insight is that while prevalent approaches of explicit MBRL attempt to restore abundant details of the environment through observation models, it is difficult due to the environment's complexity and limited model capacity. On the other hand, reward models, while dominating in implicit MBRL and adept at learning task-centric dynamics, are inadequate for sample-efficient learning without richer learning signals. Capitalizing on these insights and discoveries, we propose a simple yet effective method, Harmony World Models (HarmonyWM), that introduces a lightweight harmonizer to maintain a dynamic equilibrium between the two tasks in world model learning. Our experiments on three visual control domains show that the base MBRL method equipped with HarmonyWM gains 10%-55% absolute performance boosts.
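The abstract does not spell out the harmonizer's form; one plausible lightweight realization (an assumption on our part, in the spirit of uncertainty-based loss weighting, and not necessarily HarmonyWM's exact formulation) learns a scale per task so that neither the observation loss nor the reward loss dominates:

import torch
import torch.nn as nn

class Harmonizer(nn.Module):
    # Learnable loss balancing: L = sum_i L_i / sigma_i + log sigma_i.
    # Hypothetical sketch of a "lightweight harmonizer".
    def __init__(self, n_tasks=2):
        super().__init__()
        self.log_sigma = nn.Parameter(torch.zeros(n_tasks))

    def forward(self, losses):
        sigma = self.log_sigma.exp()
        stacked = torch.stack(losses)
        return (stacked / sigma + self.log_sigma).sum()

harmonizer = Harmonizer()
obs_loss = torch.tensor(3.2)    # observation modeling usually dominates in scale
rew_loss = torch.tensor(0.04)   # reward modeling is numerically much smaller
print(harmonizer([obs_loss, rew_loss]))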
DURENDAL: Graph deep learning framework for temporal heterogeneous networks
results: Experiments show that DURENDAL performs strongly on future link prediction tasks over four dynamic heterogeneous network datasets, with more predictive power than existing solutions. The paper also demonstrates the effectiveness of its model design.
Abstract
Temporal heterogeneous networks (THNs) are evolving networks that characterize many real-world applications such as citation and events networks, recommender systems, and knowledge graphs. Although different Graph Neural Networks (GNNs) have been successfully applied to dynamic graphs, most of them only support homogeneous graphs or suffer from model design heavily influenced by specific THNs prediction tasks. Furthermore, there is a lack of temporal heterogeneous networked data in current standard graph benchmark datasets. Hence, in this work, we propose DURENDAL, a graph deep learning framework for THNs. DURENDAL can help to easily repurpose any heterogeneous graph learning model to evolving networks by combining design principles from snapshot-based and multirelational message-passing graph learning models. We introduce two different schemes to update embedding representations for THNs, discussing the strengths and weaknesses of both strategies. We also extend the set of benchmarks for THNs by introducing two novel high-resolution temporal heterogeneous graph datasets derived from an emerging Web3 platform and a well-established e-commerce website. Overall, we conducted the experimental evaluation of the framework over four temporal heterogeneous network datasets on future link prediction tasks in an evaluation setting that takes into account the evolving nature of the data. Experiments show the prediction power of DURENDAL compared to current solutions for evolving and dynamic graphs, and the effectiveness of its model design.
Anomaly Detection in Power Generation Plants with Generative Adversarial Networks
results: The study finds that GAN-based anomaly detection achieves high accuracy, especially when large amounts of data are used. In this study, the model reached an accuracy of 98.99%, far higher than the 66.45% obtained without data augmentation.
Abstract
Anomaly detection is a critical task that involves the identification of data points that deviate from a predefined pattern, useful for fraud detection and related activities. Various techniques are employed for anomaly detection, but recent research indicates that deep learning methods, with their ability to discern intricate data patterns, are well-suited for this task. This study explores the use of Generative Adversarial Networks (GANs) for anomaly detection in power generation plants. The dataset used in this investigation comprises fuel consumption records obtained from power generation plants operated by a telecommunications company. The data was initially collected in response to observed irregularities in the fuel consumption patterns of the generating sets situated at the company's base stations. The dataset was divided into anomalous and normal data points based on specific variables, with 64.88% classified as normal and 35.12% as anomalous. An analysis of feature importance, employing the random forest classifier, revealed that Running Time Per Day exhibited the highest relative importance. A GANs model was trained and fine-tuned both with and without data augmentation, with the goal of increasing the dataset size to enhance performance. The generator model consisted of five dense layers using the tanh activation function, while the discriminator comprised six dense layers, each integrated with a dropout layer to prevent overfitting. Following data augmentation, the model achieved an accuracy rate of 98.99%, compared to 66.45% before augmentation. This demonstrates that the model nearly perfectly classified data points into normal and anomalous categories, with the augmented data significantly enhancing the GANs' performance in anomaly detection. Consequently, this study recommends the use of GANs, particularly when using large datasets, for effective anomaly detection.
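A sketch of the described architectures in PyTorch (layer widths, the discriminator's intermediate activations, and the dropout rate are assumptions; the abstract fixes only the layer counts, tanh in the generator, and per-layer dropout in the discriminator):

import torch.nn as nn

latent_dim, n_features = 32, 8  # n_features: assumed size of a fuel-consumption record

# Generator: five dense layers with tanh activations, as described above.
generator = nn.Sequential(
    nn.Linear(latent_dim, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, n_features), nn.Tanh(),
)

# Discriminator: six dense layers; here dropout follows the first five,
# with a sigmoid output head (placement of the last dropout is an assumption).
def dense_dropout(i, o, p=0.3):
    return [nn.Linear(i, o), nn.LeakyReLU(0.2), nn.Dropout(p)]

discriminator = nn.Sequential(
    *dense_dropout(n_features, 128), *dense_dropout(128, 128),
    *dense_dropout(128, 64), *dense_dropout(64, 64),
    *dense_dropout(64, 32),
    nn.Linear(32, 1), nn.Sigmoid(),
)
print(generator, discriminator)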
Memorization with neural nets: going beyond the worst case
results: We demonstrate the effectiveness of the algorithm through extensive numerical experiments and link the theoretical insights back to practice.
Abstract
In practice, deep neural networks are often able to easily interpolate their training data. To understand this phenomenon, many works have aimed to quantify the memorization capacity of a neural network architecture: the largest number of points such that the architecture can interpolate any placement of these points with any assignment of labels. For real-world data, however, one intuitively expects the presence of a benign structure so that interpolation already occurs at a smaller network size than suggested by memorization capacity. In this paper, we investigate interpolation by adopting an instance-specific viewpoint. We introduce a simple randomized algorithm that, given a fixed finite dataset with two classes, with high probability constructs an interpolating three-layer neural network in polynomial time. The required number of parameters is linked to geometric properties of the two classes and their mutual arrangement. As a result, we obtain guarantees that are independent of the number of samples and hence move beyond worst-case memorization capacity bounds. We illustrate the effectiveness of the algorithm in non-pathological situations with extensive numerical experiments and link the insights back to the theoretical results.
Mathematical structure of perfect predictive reservoir computing for autoregressive type of time series data
results: The paper shows that RC neural networks achieve perfect prediction on AR-type time series data, and demonstrates their advantages of low training cost, high speed, and high computational power.
Abstract
Reservoir Computing (RC) is a type of recurrent neural network (RNN), and it will doubtless be used more and more widely for building prediction models for time-series data, owing to its low training cost, high speed, and high computational power. However, research into the mathematical structure of RC neural networks has only recently begun. Bollt (2021) clarified the necessity of the autoregressive (AR) model for gaining insight into the mathematical structure of RC neural networks, and indicated that the Wold decomposition theorem is a milestone for understanding them. Keeping this celebrated result in mind, in this paper we clarify hidden structures of the input and recurrent weight matrices in RC neural networks, and show that such structures attain perfect prediction for AR-type time series data.
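A minimal echo-state sketch on an AR(2) series (random reservoir weights; the paper's point is precisely that suitably structured input and recurrent matrices make such prediction exact, which this random instance only approximates):

import numpy as np

rng = np.random.default_rng(0)

# AR(2) series: x_t = 0.5 x_{t-1} - 0.3 x_{t-2} + noise
T = 2000
x = np.zeros(T)
for t in range(2, T):
    x[t] = 0.5 * x[t-1] - 0.3 * x[t-2] + 0.1 * rng.normal()

# Echo-state reservoir: r_{t+1} = tanh(W r_t + w_in x_t)
N = 100
W = rng.normal(size=(N, N)) / np.sqrt(N)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius < 1
w_in = rng.normal(size=N)

R = np.zeros((T, N))
for t in range(T - 1):
    R[t+1] = np.tanh(W @ R[t] + w_in * x[t])

# Linear ridge readout trained to predict the next value from the reservoir state.
train = slice(100, 1500)
A = R[train]
y = x[train]
w_out = np.linalg.solve(A.T @ A + 1e-6 * np.eye(N), A.T @ y)

pred = R[1500:] @ w_out
print("test MSE:", np.mean((pred - x[1500:]) ** 2))  # close to the noise floor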
SpatialRank: Urban Event Ranking with NDCG Optimization on Spatiotemporal Data
results: Comprehensive experiments on three real-world datasets show that SpatialRank effectively ranks urban event locations and improves NDCG over state-of-the-art methods by up to 12.7%.
Abstract
The problem of urban event ranking aims at predicting the top-k most risky locations of future events such as traffic accidents and crimes. This problem is of fundamental importance to public safety and urban administration especially when limited resources are available. The problem is, however, challenging due to complex and dynamic spatio-temporal correlations between locations, uneven distribution of urban events in space, and the difficulty to correctly rank nearby locations with similar features. Prior works on event forecasting mostly aim at accurately predicting the actual risk score or counts of events for all the locations. Rankings obtained as such usually have low quality due to prediction errors. Learning-to-rank methods directly optimize measures such as Normalized Discounted Cumulative Gain (NDCG), but cannot handle the spatiotemporal autocorrelation existing among locations. In this paper, we bridge the gap by proposing a novel spatial event ranking approach named SpatialRank. SpatialRank features adaptive graph convolution layers that dynamically learn the spatiotemporal dependencies across locations from data. In addition, the model optimizes through surrogates a hybrid NDCG loss with a spatial component to better rank neighboring spatial locations. We design an importance-sampling with a spatial filtering algorithm to effectively evaluate the loss during training. Comprehensive experiments on three real-world datasets demonstrate that SpatialRank can effectively identify the top riskiest locations of crimes and traffic accidents and outperform state-of-art methods in terms of NDCG by up to 12.7%.
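For reference, the metric being optimized: a minimal NDCG@k computation (SpatialRank itself optimizes a differentiable surrogate of this, plus a spatial component, since NDCG is not differentiable):

import numpy as np

def dcg(relevance, k):
    rel = relevance[:k]
    return float(np.sum((2.0 ** rel - 1) / np.log2(np.arange(2, rel.size + 2))))

def ndcg_at_k(true_risk, predicted_risk, k):
    # NDCG@k: how well predicted scores rank the truly riskiest locations.
    order = np.argsort(-predicted_risk)
    ideal = np.sort(true_risk)[::-1]
    return dcg(true_risk[order], k) / dcg(ideal, k)

true_risk = np.array([5, 0, 3, 0, 1, 2, 0, 4], dtype=float)   # e.g. accident counts
pred = np.array([4.2, 0.3, 3.9, 0.1, 0.5, 1.0, 0.2, 2.5])
print(ndcg_at_k(true_risk, pred, k=3))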
paper_authors: Zhaonan Qu, Alfred Galichon, Johan Ugander
for: This paper studies choice and ranking models based on Luce's choice axiom, including the Bradley-Terry-Luce and Plackett-Luce models.
methods: The paper uses Sinkhorn's algorithm, a classic method for matrix balancing, to solve the maximum likelihood estimation problems arising under Luce's choice axiom.
results: The paper establishes the global linear convergence of Sinkhorn's algorithm for non-negative matrices, characterizes the global rate in terms of the algebraic connectivity of the bipartite graph constructed from the data, and derives the sharp asymptotic rate of linear convergence, generalizing a classic result of Knight (2008) through a more explicit analysis that exploits an intrinsic orthogonality structure.
Abstract
For a broad class of choice and ranking models based on Luce's choice axiom, including the Bradley--Terry--Luce and Plackett--Luce models, we show that the associated maximum likelihood estimation problems are equivalent to a classic matrix balancing problem with target row and column sums. This perspective opens doors between two seemingly unrelated research areas, and allows us to unify existing algorithms in the choice modeling literature as special instances or analogs of Sinkhorn's celebrated algorithm for matrix balancing. We draw inspiration from these connections and resolve important open problems on the study of Sinkhorn's algorithm. We first prove the global linear convergence of Sinkhorn's algorithm for non-negative matrices whenever finite solutions to the matrix balancing problem exist. We characterize this global rate of convergence in terms of the algebraic connectivity of the bipartite graph constructed from data. Next, we also derive the sharp asymptotic rate of linear convergence, which generalizes a classic result of Knight (2008), but with a more explicit analysis that exploits an intrinsic orthogonality structure. To our knowledge, these are the first quantitative linear convergence results for Sinkhorn's algorithm for general non-negative matrices and positive marginals. The connections we establish in this paper between matrix balancing and choice modeling could help motivate further transmission of ideas and interesting results in both directions.
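A minimal sketch of the Sinkhorn iteration for matrix balancing with target row and column sums; under the paper's equivalence, running this iteration on the appropriate data matrix corresponds to maximum likelihood estimation in Luce-type choice models.

import numpy as np

def sinkhorn_balance(K, row_sums, col_sums, n_iter=1000, tol=1e-10):
    # Find diagonal scalings d1, d2 such that diag(d1) @ K @ diag(d2) has the
    # prescribed row and column sums (when a finite solution exists).
    d1 = np.ones(K.shape[0])
    d2 = np.ones(K.shape[1])
    for _ in range(n_iter):
        d1_new = row_sums / (K @ d2)
        d2_new = col_sums / (K.T @ d1_new)
        if np.max(np.abs(d1_new - d1)) < tol and np.max(np.abs(d2_new - d2)) < tol:
            d1, d2 = d1_new, d2_new
            break
        d1, d2 = d1_new, d2_new
    return d1, d2

rng = np.random.default_rng(0)
K = rng.random((4, 4)) + 0.1          # positive matrix
r = np.array([1.0, 2.0, 1.0, 1.0])
c = np.array([2.0, 1.0, 1.0, 1.0])    # sum(r) == sum(c) is required
d1, d2 = sinkhorn_balance(K, r, c)
B = np.diag(d1) @ K @ np.diag(d2)
print(B.sum(axis=1), B.sum(axis=0))   # approximately r and c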
Learning State-Augmented Policies for Information Routing in Communication Networks
for: This paper studies the information routing problem in large-scale communication networks, which can be formulated as a constrained statistical learning problem with access to only local information.
methods: The paper proposes a novel state augmentation (SA) strategy that deploys graph convolutions over the communication network via a graph neural network (GNN) architecture to maximize the aggregate information at the source nodes.
results: Experiments show that the proposed method effectively routes the desired information to destination nodes, evaluated on real-time network topologies. Numerical simulations show that it trains the GNN parameterization better than baseline algorithms.
Abstract
This paper examines the problem of information routing in a large-scale communication network, which can be formulated as a constrained statistical learning problem having access to only local information. We delineate a novel State Augmentation (SA) strategy to maximize the aggregate information at source nodes using graph neural network (GNN) architectures, by deploying graph convolutions over the topological links of the communication network. The proposed technique leverages only the local information available at each node and efficiently routes desired information to the destination nodes. We leverage an unsupervised learning procedure to convert the output of the GNN architecture to optimal information routing strategies. In the experiments, we perform the evaluation on real-time network topologies to validate our algorithms. Numerical simulations depict the improved performance of the proposed method in training a GNN parameterization as compared to baseline algorithms.
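A schematic of the state-augmentation idea (all shapes and the toy topology are hypothetical): the dual variables of the constrained learning problem are appended to the local node state, and a graph-convolutional filter, a polynomial in the network's support matrix S, maps the augmented state to routing scores.

import numpy as np

def graph_filter(S, X, weights):
    # Graph convolution sum_k S^k @ X @ W_k followed by ReLU (one GNN layer).
    Z = np.zeros((X.shape[0], weights[0].shape[1]))
    Sk_X = X
    for W in weights:
        Z += Sk_X @ W
        Sk_X = S @ Sk_X
    return np.maximum(Z, 0.0)

rng = np.random.default_rng(0)
n = 6
S = (rng.random((n, n)) < 0.4).astype(float)   # random topology (support matrix)
np.fill_diagonal(S, 0)

queue_state = rng.random((n, 1))     # local queue lengths at each node
duals = rng.random((n, 1))           # dual variables of the constrained problem
X = np.hstack([queue_state, duals])  # state augmentation: duals join the input

weights = [rng.normal(size=(2, 4)) * 0.1 for _ in range(3)]  # filter taps, K = 3
routing_scores = graph_filter(S, X, weights)
print(routing_scores.shape)          # per-node scores used to pick outgoing links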
Bridging the Gap Between Foundation Models and Heterogeneous Federated Learning
methods: The paper proposes a framework called Resource-aware Federated Foundation Models (RaFFM), which uses specialized model compression algorithms, such as salient parameter prioritization and high-performance subnetwork extraction, to fit the resource constraints of edge devices. These algorithms allow dynamic scaling of foundation models in edge FL systems to match heterogeneous resource environments.
results: Experimental results show that RaFFM reduces resource usage while maintaining model performance. Specifically, RaFFM matches the performance of traditional edge FL methods on tasks in natural language processing and computer vision while consuming fewer resources.
Abstract
Federated learning (FL) offers privacy-preserving decentralized machine learning, optimizing models at edge clients without sharing private data. Simultaneously, foundation models (FMs) have gained traction in the artificial intelligence (AI) community due to their exceptional performance across various tasks. However, integrating FMs into FL presents challenges, primarily due to their substantial size and intensive resource requirements. This is especially true when considering the resource heterogeneity in edge FL systems. We present an adaptive framework for Resource-aware Federated Foundation Models (RaFFM) to address these challenges. RaFFM introduces specialized model compression algorithms tailored for FL scenarios, such as salient parameter prioritization and high-performance subnetwork extraction. These algorithms enable dynamic scaling of given transformer-based FMs to fit heterogeneous resource constraints at the network edge during both FL's optimization and deployment stages. Experimental results demonstrate that RaFFM shows significant superiority in resource utilization efficiency and uses fewer resources to deploy FMs to FL. Despite the lower resource consumption, target models optimized by RaFFM achieve performance on par with traditional FL methods applied to full-sized FMs. This is evident across tasks in both natural language processing and computer vision domains.
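A sketch of one plausible salient-parameter subnetwork extraction step on a dense layer (the saliency criterion and the slicing here are illustrative assumptions, not RaFFM's actual algorithm, and downstream layers must be sliced consistently):

import torch
import torch.nn as nn

def extract_subnetwork(layer, keep_ratio):
    # Keep the most salient output neurons of a dense layer; saliency here is
    # the L1 norm of each neuron's weights, one simple prioritization scheme.
    saliency = layer.weight.abs().sum(dim=1)             # one score per output
    k = max(1, int(keep_ratio * layer.out_features))
    top = torch.topk(saliency, k).indices.sort().values  # keep original order
    sub = nn.Linear(layer.in_features, k, bias=layer.bias is not None)
    with torch.no_grad():
        sub.weight.copy_(layer.weight[top])
        if layer.bias is not None:
            sub.bias.copy_(layer.bias[top])
    return sub

layer = nn.Linear(768, 3072)                       # e.g. a transformer FFN projection
print(extract_subnetwork(layer, keep_ratio=0.5))   # fits a smaller edge client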
A hybrid quantum-classical conditional generative adversarial network algorithm for human-centered paradigm in cloud
For: The paper aims to improve the quantum generative adversarial network (QGAN) algorithm to conform to the human-centered paradigm, and to solve the problems of random generation and lack of human-computer interaction in QGAN.
Methods: The proposed algorithm, called hybrid quantum-classical conditional generative adversarial network (QCGAN), combines quantum and classical computing to achieve a knowledge-driven human-computer interaction computing mode. The generator uses a parameterized quantum circuit with an all-to-all connected topology, while the discriminator uses a classical neural network.
Results: The QCGAN algorithm can effectively converge to the Nash equilibrium point after training and perform human-centered classification generation tasks, as demonstrated on the quantum cloud computing platform using the BAS training set.
Abstract
As an emerging field that aims to bridge the gap between human activities and computing systems, human-centered computing (HCC) in cloud, edge, and fog environments has had a huge impact on artificial intelligence algorithms. The quantum generative adversarial network (QGAN) is considered one of the quantum machine learning algorithms with great application prospects, but it, too, should be improved to conform to the human-centered paradigm. The generation process of QGAN is relatively random, and the generated model does not conform to the human-centered concept, so it is not well suited to real scenarios. In order to solve these problems, a hybrid quantum-classical conditional generative adversarial network (QCGAN) algorithm is proposed, which is a knowledge-driven human-computer interaction computing mode that can be implemented in the cloud. The purposes of stabilizing the generation process and realizing the interaction between human and computing process are achieved by inputting artificial conditional information in the generator and discriminator. The generator uses a parameterized quantum circuit with an all-to-all connected topology, which facilitates the tuning of network parameters during the training process. The discriminator uses a classical neural network, which effectively avoids the "input bottleneck" of quantum machine learning. Finally, the BAS training set is selected to conduct experiments on the quantum cloud computing platform. The results show that the QCGAN algorithm can effectively converge to the Nash equilibrium point after training and perform human-centered classification generation tasks.
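A hedged PennyLane sketch of a conditional parameterized quantum circuit with all-to-all entanglement for the generator (qubit count, depth, and the condition encoding are assumptions; the classical discriminator would be an ordinary dense network and is omitted):

import pennylane as qml
import numpy as np

n_qubits = 3
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def generator(weights, condition):
    # Conditional information is injected as input rotations; this plays the
    # role of the "artificial conditional information" described above.
    for i in range(n_qubits):
        qml.RY(np.pi * condition[i], wires=i)
    # All-to-all entangling block, then trainable single-qubit rotations.
    for layer in range(weights.shape[0]):
        for i in range(n_qubits):
            for j in range(i + 1, n_qubits):
                qml.CNOT(wires=[i, j])
        for i in range(n_qubits):
            qml.RY(weights[layer, i], wires=i)
    return qml.probs(wires=range(n_qubits))

weights = np.random.uniform(0, 2 * np.pi, size=(2, n_qubits))
condition = np.array([1.0, 0.0, 1.0])   # class label encoded in basis rotations
print(generator(weights, condition))    # distribution the discriminator sees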
CausalImages: An R Package for Causal Inference with Earth Observation, Bio-medical, and Social Science Images
results: The package enables fast and accessible analysis of large-scale image and video data, and provides embeddings that serve as vector summaries of image or video content.
Abstract
The causalimages R package enables causal inference with image and image sequence data, providing new tools for integrating novel data sources like satellite and bio-medical imagery into the study of cause and effect. One set of functions enables image-based causal inference analyses. For example, one key function decomposes treatment effect heterogeneity by images using an interpretable Bayesian framework. This allows for determining which types of images or image sequences are most responsive to interventions. A second modeling function allows researchers to control for confounding using images. The package also allows investigators to produce embeddings that serve as vector summaries of the image or video content. Finally, infrastructural functions are also provided, such as tools for writing large-scale image and image sequence data as sequentialized byte strings for more rapid image analysis. causalimages therefore opens new capabilities for causal inference in R, letting researchers use informative imagery in substantive analyses in a fast and accessible manner.
Accelerating Non-IID Federated Learning via Heterogeneity-Guided Client Sampling
results: Experimental results show that in non-IID settings, HiCS-FL converges to the training objective faster than prior FL client selection methods, with lower training variance and lower computation cost, and adapts to different heterogeneity scenarios.
Abstract
Statistical heterogeneity of data present at client devices in a federated learning (FL) system renders the training of a global model in such systems difficult. Particularly challenging are the settings where due to resource constraints only a small fraction of clients can participate in any given round of FL. Recent approaches to training a global model in FL systems with non-IID data have focused on developing client selection methods that aim to sample clients with more informative updates of the model. However, existing client selection techniques either introduce significant computation overhead or perform well only in the scenarios where clients have data with similar heterogeneity profiles. In this paper, we propose HiCS-FL (Federated Learning via Hierarchical Clustered Sampling), a novel client selection method in which the server estimates statistical heterogeneity of a client's data using the client's update of the network's output layer and relies on this information to cluster and sample the clients. We analyze the ability of the proposed techniques to compare heterogeneity of different datasets, and characterize convergence of the training process that deploys the introduced client selection method. Extensive experimental results demonstrate that in non-IID settings HiCS-FL achieves faster convergence and lower training variance than state-of-the-art FL client selection schemes. Notably, HiCS-FL drastically reduces computation cost compared to existing selection schemes and is adaptable to different heterogeneity scenarios.
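A toy sketch of the selection idea (the entropy proxy and the clustering details are assumptions for illustration; HiCS-FL derives its estimator from the output-layer update in a specific way): clients are scored by how diverse their output-layer bias updates look, clustered by that score, and sampled across clusters.

import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)

# Each client reports its update of the output (classification) layer bias;
# clients with skewed local label distributions produce skewed bias updates.
n_clients, n_classes = 40, 10
bias_updates = rng.normal(size=(n_clients, n_classes))
bias_updates[:20, :2] += 3.0        # first 20 clients mostly hold 2 classes

def heterogeneity_score(update):
    # Entropy of the softmaxed bias update as a proxy for label diversity.
    p = np.exp(update - update.max())
    p /= p.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

scores = np.array([heterogeneity_score(u) for u in bias_updates])
_, labels = kmeans2(scores.reshape(-1, 1), k=3, seed=0, minit="points")

# Sample one client per cluster so each round sees diverse data profiles.
selected = [rng.choice(np.where(labels == c)[0]) for c in np.unique(labels)]
print(selected)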
paper_authors: Dimitrios G. Gkotsoulias, Carsten Jäger, Roland Müller, Tobias Gräßle, Karin M. Olofsson, Torsten Møller, Steve Unwin, Catherine Crockford, Roman M. Wittig, Berkin Bilgic, Harald E. Möller
For: The paper aims to investigate the effects of non-ideal sampling and susceptibility anisotropy on the accuracy of quantitative susceptibility mapping (QSM) in the primate brain using the COSMOS method.
Methods: The authors used gradient-recalled echo (GRE) data of an entire fixed chimpanzee brain acquired at 7 T, including ideal COSMOS sampling and realistic in-vivo rotations. They compared ideal COSMOS, in-vivo feasible acquisitions with 3-8 orientations, and single-orientation iLSQR QSM.
Results: The authors found that in-vivo feasible and optimal COSMOS yielded high-quality susceptibility maps with increased signal-to-noise ratio (SNR) resulting from averaging multiple acquisitions. However, COSMOS reconstructions from non-ideal rotations about a single axis required additional L2-regularization to mitigate residual streaking artifacts.
Abstract
Purpose: Field-to-susceptibility inversion in quantitative susceptibility mapping (QSM) is ill-posed and needs numerical stabilization through either regularization or oversampling by acquiring data at three or more object orientations. Calculation Of Susceptibility through Multiple Orientations Sampling (COSMOS) is an established oversampling approach and regarded as QSM gold standard. It achieves a well-conditioned inverse problem, requiring rotations by 0°, 60° and 120° in the yz-plane. However, this is impractical in vivo, where head rotations are typically restricted to a range of ±25°. Non-ideal sampling degrades the conditioning with residual streaking artifacts whose mitigation needs further regularization. Moreover, susceptibility anisotropy in white matter is not considered in the COSMOS model, which may introduce additional bias. The current work presents a thorough investigation of these effects in primate brain. Methods: Gradient-recalled echo (GRE) data of an entire fixed chimpanzee brain were acquired at 7 T (350 microns resolution, 10 orientations) including ideal COSMOS sampling and realistic rotations in vivo. Comparisons of the results included ideal COSMOS, in-vivo feasible acquisitions with 3-8 orientations and single-orientation iLSQR QSM. Results: In-vivo feasible and optimal COSMOS yielded high-quality susceptibility maps with increased SNR resulting from averaging multiple acquisitions. COSMOS reconstructions from non-ideal rotations about a single axis required additional L2-regularization to mitigate residual streaking artifacts. Conclusion: In view of unconsidered anisotropy effects, added complexity of the reconstruction, and the general challenge of multi-orientation acquisitions, advantages of sub-optimal COSMOS schemes over regularized single-orientation QSM appear limited in in-vivo settings.
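For context, the forward model underlying COSMOS: in k-space, the GRE phase-derived field perturbation relates to the susceptibility distribution through the unit dipole kernel, and sampling at several orientations makes the pointwise inversion well-conditioned:

\delta B(\mathbf{k}) = D(\mathbf{k})\,\chi(\mathbf{k}), \qquad D(\mathbf{k}) = \frac{1}{3} - \frac{k_z^2}{|\mathbf{k}|^2},

\chi^{\star} = \arg\min_{\chi} \sum_{i=1}^{N} \left\| F^{-1} D_i F \chi - \phi_i \right\|_2^2 \; \left( +\, \lambda \|\chi\|_2^2 \right),

where D_i is the dipole kernel rotated to the i-th head orientation, \phi_i the measured field map, and F the Fourier transform. The ideal 0°/60°/120° scheme keeps \sum_i D_i^2(\mathbf{k}) bounded away from zero almost everywhere, whereas restricted in-vivo rotations leave poorly conditioned regions near the dipole's conical zeros, motivating the additional L2 term above.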