cs.AI - 2023-11-19

LLM aided semi-supervision for Extractive Dialog Summarization

  • paper_url: http://arxiv.org/abs/2311.11462
  • repo_url: None
  • paper_authors: Nishant Mishra, Gaurav Sahu, Iacer Calixto, Ameen Abu-Hanna, Issam H. Laradji
  • for: Improving the effectiveness of chat summarization
  • methods: Use state-of-the-art large language models (LLMs) to generate pseudo-labels for dialogs, then fine-tune a smaller specialized chat summarization model on these pseudo-labels, effectively transferring knowledge from the large LLM into the smaller model
  • results: Achieves 65.9/57.0/61.0 ROUGE-1/-2/-L on the TweetSumm dataset using only 10% of the original labeled data, versus 65.16/55.81/64.37 for the current state of the art trained on the full training set; in the worst case (ROUGE-L), 94.7% of the performance is retained with only 10% of the data
    Abstract Generating high-quality summaries for chat dialogs often requires large labeled datasets. We propose a method to efficiently use unlabeled data for extractive summarization of customer-agent dialogs. In our method, we frame summarization as a question-answering problem and use state-of-the-art large language models (LLMs) to generate pseudo-labels for a dialog. We then use these pseudo-labels to fine-tune a chat summarization model, effectively transferring knowledge from the large LLM into a smaller specialized model. We demonstrate our method on the TweetSumm dataset, and show that using 10% of the original labelled data set we can achieve 65.9/57.0/61.0 ROUGE-1/-2/-L, whereas the current state-of-the-art trained on the entire training data set obtains 65.16/55.81/64.37 ROUGE-1/-2/-L. In other words, in the worst case (i.e., ROUGE-L) we still effectively retain 94.7% of the performance while using only 10% of the data.
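A rough illustration of the pseudo-labeling step is sketched below: summarization is framed as question answering over numbered utterances, and `llm_generate` stands in for any text-in/text-out LLM call. The prompt wording is an assumption, not the authors' exact prompt.

```python
# Sketch: ask an LLM (framed as QA) which utterances summarize the dialog,
# then turn its answer into per-utterance binary pseudo-labels.
def make_pseudo_labels(dialog_turns, llm_generate):
    """dialog_turns: list of utterance strings; llm_generate: any LLM call."""
    numbered = "\n".join(f"{i}: {turn}" for i, turn in enumerate(dialog_turns))
    prompt = ("Which utterances best summarize this customer-agent dialog? "
              "Answer with utterance numbers only.\n" + numbered)
    answer = llm_generate(prompt)  # e.g. "1, 4, 7"
    picked = {int(tok) for tok in answer.replace(",", " ").split() if tok.isdigit()}
    # 1 = utterance belongs to the extractive summary, 0 = it does not.
    return [int(i in picked) for i in range(len(dialog_turns))]

# The resulting labels can then supervise a smaller per-utterance classifier,
# fine-tuned with ordinary cross-entropy on the otherwise unlabeled dialogs.
```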

SecureBERT and LLAMA 2 Empowered Control Area Network Intrusion Detection and Classification

  • paper_url: http://arxiv.org/abs/2311.12074
  • repo_url: None
  • paper_authors: Xuemei Li, Huirong Fu
  • for: This study evaluates the adaptability of pre-trained models to Control Area Network (CAN) intrusion detection.
  • methods: We developed two distinct models, CAN-SecureBERT and CAN-LLAMA2, using large language models as the foundation and attaching adapters for the cybersecurity task while preserving the models' inherent language capabilities.
  • results: CAN-LLAMA2 surpasses state-of-the-art models, achieving 0.999993 in balanced accuracy, precision detection rate, and F1 score, with a false alarm rate of 3.10e-6, which is 52 times smaller than that of the previous leading model, MTH-IDS (Multitiered Hybrid Intrusion Detection System).
    Abstract Numerous studies have proved their effective strength in detecting Control Area Network (CAN) attacks. In the realm of understanding the human semantic space, transformer-based models have demonstrated remarkable effectiveness. Leveraging pre-trained transformers has become a common strategy in various language-related tasks, enabling these models to grasp human semantics more comprehensively. To delve into the adaptability evaluation on pre-trained models for CAN intrusion detection, we have developed two distinct models: CAN-SecureBERT and CAN-LLAMA2. Notably, our CAN-LLAMA2 model surpasses the state-of-the-art models by achieving an exceptional performance 0.999993 in terms of balanced accuracy, precision detection rate, F1 score, and a remarkably low false alarm rate of 3.10e-6. Impressively, the false alarm rate is 52 times smaller than that of the leading model, MTH-IDS (Multitiered Hybrid Intrusion Detection System). Our study underscores the promise of employing a Large Language Model as the foundational model, while incorporating adapters for other cybersecurity-related tasks and maintaining the model's inherent language-related capabilities.
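The adapter idea can be sketched with the PEFT library as below; the base checkpoint, label count, and LoRA hyperparameters are illustrative assumptions, since the paper's exact configuration is not given here.

```python
# Sketch: freeze a base LLM and train only a small LoRA adapter for CAN
# intrusion classification. Checkpoint, label count, and hyperparameters
# are assumptions for illustration.
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # assumed base checkpoint
model = AutoModelForSequenceClassification.from_pretrained(
    base, num_labels=5)            # e.g. normal traffic + 4 attack classes
config = LoraConfig(r=16, lora_alpha=32, task_type="SEQ_CLS",
                    target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, config)  # only adapter weights are trainable
model.print_trainable_parameters()
```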

Unveiling Public Perceptions: Machine Learning-Based Sentiment Analysis of COVID-19 Vaccines in India

  • paper_url: http://arxiv.org/abs/2311.11435
  • repo_url: None
  • paper_authors: Milind Gupta, Abhishek Kaushik
  • for: This study examines Indian public sentiment towards COVID-19 vaccines, to help inform the Indian government's vaccination campaign.
  • methods: Data mining techniques are applied to Reddit comments to gauge Indian users' views on COVID-19 vaccines; Python's TextBlob library is used to annotate comments and assess overall sentiment.
  • results: Most Reddit users in India expressed neutrality towards vaccination, which poses a challenge for the Indian government's efforts to vaccinate a significant portion of the population.
    Abstract In March 2020, the World Health Organisation declared COVID-19 a global pandemic as it spread to nearly every country. By mid-2021, India had introduced three vaccines: Covishield, Covaxin, and Sputnik. To ensure successful vaccination in a densely populated country like India, understanding public sentiment was crucial. Social media, particularly Reddit with over 430 million users, played a vital role in disseminating information. This study employs data mining techniques to analyze Reddit data and gauge Indian sentiments towards COVID-19 vaccines. Using Python's Text Blob library, comments are annotated to assess general sentiments. Results show that most Reddit users in India expressed neutrality about vaccination, posing a challenge for the Indian government's efforts to vaccinate a significant portion of the population.
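The annotation step is concrete enough to sketch directly with TextBlob's polarity score; the neutrality band `eps` below is an illustrative choice, as the paper's exact thresholds are not stated in the abstract.

```python
# Sketch of the annotation step: bucket Reddit comments into
# positive / neutral / negative using TextBlob's polarity score.
from textblob import TextBlob

def label_comment(text, eps=0.05):  # eps: neutrality band (our choice)
    polarity = TextBlob(text).sentiment.polarity  # in [-1, 1]
    if polarity > eps:
        return "positive"
    if polarity < -eps:
        return "negative"
    return "neutral"

print(label_comment("Covishield worked fine for me, no side effects."))
```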

Appearance Codes using Joint Embedding Learning of Multiple Modalities

  • paper_url: http://arxiv.org/abs/2311.11427
  • repo_url: https://github.com/edogariu/alex-zhang
  • paper_authors: Alex Zhang, Evan Dogariu
  • for: Addresses a major limitation of existing generative models: the need to re-train new appearance codes for every scene at inference time.
  • methods: Proposes a framework that learns a joint embedding space for the appearance and structure of a scene by enforcing a contrastive loss constraint between the two modalities.
  • results: Applied to a simple Variational Auto-Encoder on the RADIATE dataset, the framework qualitatively demonstrates generation of night-time renders from day-time appearance codes without additional optimization iterations; compared to a baseline VAE using the standard per-image appearance code technique, it achieves generations of similar quality without learning appearance codes for any unseen images at inference.
    Abstract The use of appearance codes in recent work on generative modeling has enabled novel view renders with variable appearance and illumination, such as day-time and night-time renders of a scene. A major limitation of this technique is the need to re-train new appearance codes for every scene on inference, so in this work we address this problem proposing a framework that learns a joint embedding space for the appearance and structure of the scene by enforcing a contrastive loss constraint between different modalities. We apply our framework to a simple Variational Auto-Encoder model on the RADIATE dataset \cite{sheeny2021radiate} and qualitatively demonstrate that we can generate new renders of night-time photos using day-time appearance codes without additional optimization iterations. Additionally, we compare our model to a baseline VAE that uses the standard per-image appearance code technique and show that our approach achieves generations of similar quality without learning appearance codes for any unseen images on inference.
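A minimal sketch of the cross-modal contrastive constraint, written as a symmetric InfoNCE loss over matched (appearance, structure) pairs; the exact loss used in the paper may differ.

```python
# Sketch: matching (appearance, structure) pairs from the same scene are
# pulled together, mismatched pairs pushed apart. Encoders are placeholders.
import torch
import torch.nn.functional as F

def cross_modal_contrastive(appearance_z, structure_z, temperature=0.07):
    """appearance_z, structure_z: [batch, dim] embeddings of the same scenes."""
    a = F.normalize(appearance_z, dim=-1)
    s = F.normalize(structure_z, dim=-1)
    logits = a @ s.t() / temperature                     # [batch, batch]
    targets = torch.arange(a.size(0), device=a.device)   # diagonal = positives
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```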

LifeLearner: Hardware-Aware Meta Continual Learning System for Embedded Computing Platforms

  • paper_url: http://arxiv.org/abs/2311.11420
  • repo_url: None
  • paper_authors: Young D. Kwon, Jagmohan Chauhan, Hong Jia, Stylianos I. Venieris, Cecilia Mascolo
  • for: This work develops a hardware-aware meta continual learning system that improves learning efficiency and adaptability on resource-constrained embedded devices.
  • methods: Meta-learning and rehearsal strategies are used to cope with data scarcity and ensure high accuracy, while lossless and lossy compression are combined to reduce the resource requirements of continual learning (CL) and rehearsal samples.
  • results: LifeLearner achieves near-optimal CL performance, falling short of an Oracle baseline by only 2.8% accuracy. Compared to the SOTA Meta CL method, it reduces the memory footprint by 178.7x, end-to-end latency by 80.8-94.2%, and energy consumption by 80.9-94.2%. LifeLearner was also successfully deployed on two edge devices and a microcontroller unit, enabling efficient CL on resource-constrained platforms.
    Abstract Continual Learning (CL) allows applications such as user personalization and household robots to learn on the fly and adapt to context. This is an important feature when context, actions, and users change. However, enabling CL on resource-constrained embedded systems is challenging due to the limited labeled data, memory, and computing capacity. In this paper, we propose LifeLearner, a hardware-aware meta continual learning system that drastically optimizes system resources (lower memory, latency, energy consumption) while ensuring high accuracy. Specifically, we (1) exploit meta-learning and rehearsal strategies to explicitly cope with data scarcity issues and ensure high accuracy, (2) effectively combine lossless and lossy compression to significantly reduce the resource requirements of CL and rehearsal samples, and (3) developed hardware-aware system on embedded and IoT platforms considering the hardware characteristics. As a result, LifeLearner achieves near-optimal CL performance, falling short by only 2.8% on accuracy compared to an Oracle baseline. With respect to the state-of-the-art (SOTA) Meta CL method, LifeLearner drastically reduces the memory footprint (by 178.7x), end-to-end latency by 80.8-94.2%, and energy consumption by 80.9-94.2%. In addition, we successfully deployed LifeLearner on two edge devices and a microcontroller unit, thereby enabling efficient CL on resource-constrained platforms where it would be impractical to run SOTA methods and the far-reaching deployment of adaptable CL in a ubiquitous manner. Code is available at https://github.com/theyoungkwon/LifeLearner.
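A toy sketch of the lossy side of the rehearsal compression: storing latent replay samples as int8 with a per-tensor scale gives a 4x memory saving over float32. LifeLearner's actual codec is more elaborate; this only illustrates the trade-off.

```python
# Sketch: quantize rehearsal latents to int8 for storage, dequantize on replay.
import numpy as np

def compress(latent):                  # latent: float32 activation array
    scale = np.abs(latent).max() / 127.0 + 1e-12
    return (latent / scale).round().astype(np.int8), scale

def decompress(q, scale):
    return q.astype(np.float32) * scale

z = np.random.randn(256).astype(np.float32)
q, s = compress(z)
print("reconstruction error:", np.abs(z - decompress(q, s)).max())
```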

A Security Risk Taxonomy for Large Language Models

  • paper_url: http://arxiv.org/abs/2311.11415
  • repo_url: None
  • paper_authors: Erik Derner, Kristina Batistič, Jan Zahálka, Robert Babuška
  • for: This paper assesses the security risks of large language models (LLMs), whose potential for exploitation ranges from disinformation to data breaches and reputation damage.
  • methods: Proposes a taxonomy of security risks along the user-model communication pipeline, focusing on prompt-based attacks on LLMs, categorized by target and attack type.
  • results: The taxonomy is reinforced with specific attack examples that showcase the real-world impact of these risks, informing the development of robust and secure LLM applications.
    Abstract As large language models (LLMs) permeate more and more applications, an assessment of their associated security risks becomes increasingly necessary. The potential for exploitation by malicious actors, ranging from disinformation to data breaches and reputation damage, is substantial. This paper addresses a gap in current research by focusing on the security risks posed by LLMs, which extends beyond the widely covered ethical and societal implications. Our work proposes a taxonomy of security risks along the user-model communication pipeline, explicitly focusing on prompt-based attacks on LLMs. We categorize the attacks by target and attack type within a prompt-based interaction scheme. The taxonomy is reinforced with specific attack examples to showcase the real-world impact of these risks. Through this taxonomy, we aim to inform the development of robust and secure LLM applications, enhancing their safety and trustworthiness.

Make me an Offer: Forward and Reverse Auctioning Problems in the Tourism Industry

  • paper_url: http://arxiv.org/abs/2311.11400
  • repo_url: None
  • paper_authors: Ioannis T. Christou, Dimitris Doukas, Konstantina Skouri, Gerasimos Meletiou
  • for: Helping hoteliers and travelers counter the persistent seasonality of tourist destinations and its economic and social impacts.
  • methods: Two auctioning systems are developed: a forward auction model that lets hoteliers in lower-popularity areas or low-season periods auction their rooms, and a customer-initiated reverse auction model, similar to the bidding concept of priceline.com, whereby hoteliers in an area make offers to the customer for their rooms.
  • results: Mathematical programming models define both auction types explicitly, and in each type there are significant benefits to be gained on both the hotelier's and the customer's side; exact optimization solvers solve the models to guaranteed optimality.
    Abstract Most tourist destinations are facing regular and consistent seasonality with significant economic and social impacts. This phenomenon is more pronounced in the post-covid era, where demand for travel has increased but unevenly among different geographic areas. To counter these problems that both customers and hoteliers are facing, we have developed two auctioning systems that allow hoteliers of lower popularity tier areas or during low season periods to auction their rooms in what we call a forward auction model, and also allows customers to initiate a bidding process whereby hoteliers in an area may make offers to the customer for their rooms, in what constitutes a reverse auction model initiated by the customer, similar to the bidding concept of priceline.com. We develop mathematical programming models that define explicitly both types of auctions, and show that in each type, there are significant benefits to be gained both on the side of the hotelier as well as on the side of the customer. We discuss algorithmic techniques for the approximate solution of these optimization problems, and present results using exact optimization solvers to solve them to guaranteed optimality. These techniques could be beneficial to both customer and hotelier reducing seasonality during middle and low season and providing the customer with attractive offers.
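A toy sketch of the reverse auction as a deliberately tiny integer program in PuLP; the offer data, single-winner constraint, and quality filter are illustrative, not the paper's full formulation.

```python
# Toy reverse auction: hoteliers post per-night offers, the customer needs one
# room, and the solver picks the cheapest offer meeting a minimum star rating.
from pulp import LpProblem, LpVariable, LpMinimize, lpSum, LpBinary

offers = {"hotelA": (80, 3), "hotelB": (65, 2), "hotelC": (90, 4)}  # (price, stars)
min_stars = 3

prob = LpProblem("reverse_auction", LpMinimize)
pick = {h: LpVariable(f"pick_{h}", cat=LpBinary) for h in offers}
prob += lpSum(offers[h][0] * pick[h] for h in offers)   # minimize total price
prob += lpSum(pick.values()) == 1                       # accept exactly one offer
for h, (_, stars) in offers.items():
    if stars < min_stars:
        prob += pick[h] == 0                            # quality filter
prob.solve()
print({h: int(pick[h].value()) for h in offers})        # hotelA wins here
```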

Enhancing Novel Object Detection via Cooperative Foundational Models

  • paper_url: http://arxiv.org/abs/2311.12068
  • repo_url: https://github.com/rohit901/cooperative-foundational-models
  • paper_authors: Rohit Bharadwaj, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan
  • for: Addresses the challenging problem of novel object detection (NOD): accurately detecting both known and novel object categories during inference, which traditional closed-set object detectors cannot handle.
  • methods: Proposes transforming existing closed-set detectors into open-set detectors by leveraging the complementary strengths of the pre-trained foundational models CLIP and SAM through a cooperative mechanism, further integrated with the state-of-the-art open-set detector GDINO.
  • results: Achieves 17.42 mAP for novel objects and 42.08 mAP for known objects on the challenging LVIS dataset, and surpasses the current state of the art on the COCO OVD split by 7.2 AP50 for novel classes. Code: https://github.com/rohit901/cooperative-foundational-models
    Abstract In this work, we address the challenging and emergent problem of novel object detection (NOD), focusing on the accurate detection of both known and novel object categories during inference. Traditional object detection algorithms are inherently closed-set, limiting their capability to handle NOD. We present a novel approach to transform existing closed-set detectors into open-set detectors. This transformation is achieved by leveraging the complementary strengths of pre-trained foundational models, specifically CLIP and SAM, through our cooperative mechanism. Furthermore, by integrating this mechanism with state-of-the-art open-set detectors such as GDINO, we establish new benchmarks in object detection performance. Our method achieves 17.42 mAP in novel object detection and 42.08 mAP for known objects on the challenging LVIS dataset. Adapting our approach to the COCO OVD split, we surpass the current state-of-the-art by a margin of 7.2 $ \text{AP}_{50} $ for novel classes. Our code is available at https://github.com/rohit901/cooperative-foundational-models .
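The cooperative mechanism can be approximated as below: SAM proposes class-agnostic masks and CLIP scores each masked region against an open vocabulary. Checkpoint paths, image path, and the label list are assumptions; the full method (including GDINO integration) is in the authors' repo.

```python
# Sketch: SAM proposes masks; CLIP names each region against an open vocabulary.
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator
from transformers import CLIPModel, CLIPProcessor

image_np = np.array(Image.open("scene.jpg").convert("RGB"))
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
masks = SamAutomaticMaskGenerator(sam).generate(image_np)

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
labels = ["a dog", "a traffic cone", "a skateboard"]  # assumed open vocabulary

for m in masks:
    x, y, w, h = m["bbox"]                            # XYWH box around the mask
    crop = Image.fromarray(image_np[y:y + h, x:x + w])
    inputs = proc(text=labels, images=crop, return_tensors="pt", padding=True)
    probs = clip(**inputs).logits_per_image.softmax(-1)[0]
    print(labels[probs.argmax().item()], round(m["predicted_iou"], 3))
```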

Inspecting Explainability of Transformer Models with Additional Statistical Information

  • paper_url: http://arxiv.org/abs/2311.11378
  • repo_url: None
  • paper_authors: Hoang C. Nguyen, Haeil Lee, Junmo Kim
  • for: Finding an effective way to interpret Transformer models on vision and multi-modal tasks through visualization.
  • methods: Considers the statistics of tokens in layer normalization layers, rather than only combining attention layers to show the importance of each image patch.
  • results: The method shows strong ability to explain both Swin Transformer and ViT, and can focus on the predicted object where prior attention-combination methods cannot.
    Abstract Transformer becomes more popular in the vision domain in recent years so there is a need for finding an effective way to interpret the Transformer model by visualizing it. In recent work, Chefer et al. can visualize the Transformer on vision and multi-modal tasks effectively by combining attention layers to show the importance of each image patch. However, when applying to other variants of Transformer such as the Swin Transformer, this method can not focus on the predicted object. Our method, by considering the statistics of tokens in layer normalization layers, shows a great ability to interpret the explainability of Swin Transformer and ViT.

SOccDPT: Semi-Supervised 3D Semantic Occupancy from Dense Prediction Transformers trained under memory constraints

  • paper_url: http://arxiv.org/abs/2311.11371
  • repo_url: None
  • paper_authors: Aditya Nalgunda Ganesh
  • for: Proposes a memory-efficient method for 3D semantic occupancy prediction from monocular images, targeting unstructured traffic environments where existing methods trained on structured traffic datasets fall short.
  • methods: Uses dense prediction transformers with a semi-supervised training pipeline that learns from unstructured traffic datasets (the Indian Driving Dataset and Bengaluru Driving Dataset) via pseudo-ground-truth labels, and introduces patch-wise training, selecting a subset of parameters to train each epoch, to cope with memory constraints.
  • results: Outperforms existing disparity estimation approaches in unstructured traffic scenarios with an RMSE score of 9.1473 and a semantic segmentation IoU of 46.02%, while running at a competitive 69.47 Hz.
    Abstract We present SOccDPT, a memory-efficient approach for 3D semantic occupancy prediction from monocular image input using dense prediction transformers. To address the limitations of existing methods trained on structured traffic datasets, we train our model on unstructured datasets including the Indian Driving Dataset and Bengaluru Driving Dataset. Our semi-supervised training pipeline allows SOccDPT to learn from datasets with limited labels by reducing the requirement for manual labelling by substituting it with pseudo-ground truth labels to produce our Bengaluru Semantic Occupancy Dataset. This broader training enhances our model's ability to handle unstructured traffic scenarios effectively. To overcome memory limitations during training, we introduce patch-wise training where we select a subset of parameters to train each epoch, reducing memory usage during auto-grad graph construction. In the context of unstructured traffic and memory-constrained training and inference, SOccDPT outperforms existing disparity estimation approaches as shown by the RMSE score of 9.1473, achieves a semantic segmentation IoU score of 46.02% and operates at a competitive frequency of 69.47 Hz. We make our code and semantic occupancy dataset public.
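Patch-wise training can be sketched in a few lines of PyTorch: each epoch, only a random subset of parameter tensors requires gradients, shrinking the autograd graph and peak memory. The fraction below is an illustrative choice, not the paper's setting.

```python
# Sketch: re-draw the trainable parameter subset each epoch to cap memory.
import random
import torch

def select_trainable_patch(model, fraction=0.25):
    params = list(model.parameters())
    k = max(1, int(fraction * len(params)))
    chosen = set(random.sample(range(len(params)), k))
    for i, p in enumerate(params):
        p.requires_grad_(i in chosen)

model = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.ReLU(),
                            torch.nn.Linear(16, 1))
for epoch in range(3):
    select_trainable_patch(model)   # only this subset builds autograd state
    # ... run the usual forward/backward/optimizer steps here ...
```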

Using Causal Threads to Explain Changes in a Dynamic System

  • paper_url: http://arxiv.org/abs/2311.11334
  • repo_url: None
  • paper_authors: Robert B. Allen
  • for: Building rich semantic models of systems, specifically structured causal explanations of state changes in those systems.
  • methods: Uses structured causal explanations and process-based dynamic knowledge graphs to model systems.
  • results: Constructs a model of the causal threads for the geological changes proposed by the Snowball Earth theory, along with an early graphical-interface prototype to present the explanations; unlike statistical approaches such as Large Language Models (LLMs), this direct representation can be inspected and verified directly.
    Abstract We explore developing rich semantic models of systems. Specifically, we consider structured causal explanations about state changes in those systems. Essentially, we are developing process-based dynamic knowledge graphs. As an example, we construct a model of the causal threads for geological changes proposed by the Snowball Earth theory. Further, we describe an early prototype of a graphical interface to present the explanations. Unlike statistical approaches to summarization and explanation such as Large Language Models (LLMs), our approach of direct representation can be inspected and verified directly.

Portuguese FAQ for Financial Services

  • paper_url: http://arxiv.org/abs/2311.11331
  • repo_url: None
  • paper_authors: Paulo Finardi, Wanderley M. Melo, Edgard D. Medeiros Neto, Alex F. Mansano, Pablo B. Costa, Vinicius F. Caridá
  • for: Advancing natural language processing (NLP) applications in the Portuguese financial domain, where the scarcity of domain-specific data has hindered research and development.
  • methods: Data augmentation techniques are used to generate synthetic datasets from the Central Bank of Brazil FAQ, with variants of differing semantic similarity; the impact of the augmented data is evaluated on both supervised and unsupervised tasks.
  • results: The augmented data is assessed in both low and high semantic similarity scenarios, and the resulting dataset is publicly released on the Hugging Face Datasets platform, enhancing accessibility and fostering broader engagement within the NLP research community.
    Abstract Scarcity of domain-specific data in the Portuguese financial domain has disfavored the development of Natural Language Processing (NLP) applications. To address this limitation, the present study advocates for the utilization of synthetic data generated through data augmentation techniques. The investigation focuses on the augmentation of a dataset sourced from the Central Bank of Brazil FAQ, employing techniques that vary in semantic similarity. Supervised and unsupervised tasks are conducted to evaluate the impact of augmented data on both low and high semantic similarity scenarios. Additionally, the resultant dataset will be publicly disseminated on the Hugging Face Datasets platform, thereby enhancing accessibility and fostering broader engagement within the NLP research community.

Bounds on Representation-Induced Confounding Bias for Treatment Effect Estimation

  • paper_url: http://arxiv.org/abs/2311.11321
  • repo_url: None
  • paper_authors: Valentyn Melnychuk, Dennis Frauen, Stefan Feuerriegel
  • for: Proposes a new, representation-agnostic framework for bounding the representation-induced confounding bias that arises in conditional average treatment effect (CATE) estimation when low-dimensional (constrained) representations are used.
  • methods: Theoretically establishes the conditions under which CATEs are non-identifiable given low-dimensional representations, then performs partial identification of CATEs, i.e., estimates lower and upper bounds on the representation-induced confounding bias.
  • results: A series of experiments demonstrates the effectiveness of the proposed bounds; the framework is directly relevant in practice wherever the validity of CATE estimation matters.
    Abstract State-of-the-art methods for conditional average treatment effect (CATE) estimation make widespread use of representation learning. Here, the idea is to reduce the variance of the low-sample CATE estimation by a (potentially constrained) low-dimensional representation. However, low-dimensional representations can lose information about the observed confounders and thus lead to bias, because of which the validity of representation learning for CATE estimation is typically violated. In this paper, we propose a new, representation-agnostic framework for estimating bounds on the representation-induced confounding bias that comes from dimensionality reduction (or other constraints on the representations) in CATE estimation. First, we establish theoretically under which conditions CATEs are non-identifiable given low-dimensional (constrained) representations. Second, as our remedy, we propose to perform partial identification of CATEs or, equivalently, aim at estimating of lower and upper bounds of the representation-induced confounding bias. We demonstrate the effectiveness of our bounds in a series of experiments. In sum, our framework is of direct relevance in practice where the validity of CATE estimation is of importance.

GeoSAM: Fine-tuning SAM with Sparse and Dense Visual Prompting for Automated Segmentation of Mobility Infrastructure

  • paper_url: http://arxiv.org/abs/2311.11319
  • repo_url: None
  • paper_authors: Rafi Ibn Sultan, Chengyin Li, Hui Zhu, Prashant Khanduri, Marco Brocanelli, Dongxiao Zhu
  • for: Addressing the difficulty of segmenting mobility infrastructure (roads, sidewalks, crosswalks) in geographical images such as aerial and satellite imagery.
  • methods: Proposes Geographical SAM (GeoSAM), a SAM-based framework that applies a fine-tuning strategy using dense visual prompts from zero-shot learning and sparse visual prompts from a pre-trained CNN segmentation model.
  • results: GeoSAM outperforms existing approaches for geographical image segmentation by 20% for road infrastructure, 14.29% for pedestrian infrastructure, and 17.65% on average, a notable advance in leveraging foundation models to segment mobility infrastructure.
    Abstract The Segment Anything Model (SAM) has shown impressive performance when applied to natural image segmentation. However, it struggles with geographical images like aerial and satellite imagery, especially when segmenting mobility infrastructure including roads, sidewalks, and crosswalks. This inferior performance stems from the narrow features of these objects, their textures blending into the surroundings, and interference from objects like trees, buildings, vehicles, and pedestrians - all of which can disorient the model to produce inaccurate segmentation maps. To address these challenges, we propose Geographical SAM (GeoSAM), a novel SAM-based framework that implements a fine-tuning strategy using the dense visual prompt from zero-shot learning, and the sparse visual prompt from a pre-trained CNN segmentation model. The proposed GeoSAM outperforms existing approaches for geographical image segmentation, specifically by 20%, 14.29%, and 17.65% for road infrastructure, pedestrian infrastructure, and on average, respectively, representing a momentous leap in leveraging foundation models to segment mobility infrastructure including both road and pedestrian infrastructure in geographical images.
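A sketch of the prompting scheme under stated assumptions: a pre-trained CNN's low-resolution logits act as the dense prompt (`mask_input`) and a point acts as the sparse prompt. The checkpoint, image path, and the random `cnn_logits` below are stand-ins for the real pipeline.

```python
# Sketch: feed SAM a sparse point prompt plus a dense prompt built from a
# pre-trained CNN's low-resolution logits.
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

image_np = np.array(Image.open("aerial_tile.jpg").convert("RGB"))
predictor = SamPredictor(sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth"))
predictor.set_image(image_np)

cnn_logits = np.random.randn(1, 256, 256).astype(np.float32)  # dense-prompt stand-in
point = np.array([[128.0, 200.0]])                            # sparse prompt (x, y)
masks, scores, _ = predictor.predict(
    point_coords=point,
    point_labels=np.array([1]),     # 1 = foreground point
    mask_input=cnn_logits)          # SAM expects 1x256x256 low-res logits
```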

TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems

  • paper_url: http://arxiv.org/abs/2311.11315
  • repo_url: None
  • paper_authors: Yilun Kong, Jingqing Ruan, Yihong Chen, Bin Zhang, Tianpeng Bao, Shiwei Shi, Guoqing Du, Xiaoru Hu, Hangyu Mao, Ziyue Li, Xingyu Zeng, Rui Zhao
  • for: Boosting the task planning and tool usage (TPTU) abilities of LLM-based agents operating in real-world systems.
  • methods: Proposes a comprehensive framework with three components addressing three real-world challenges: an API Retriever that selects the most pertinent APIs for the user task from a vast array, an LLM Finetuner that tunes a base LLM for better task planning and API calling, and a Demo Selector that adaptively retrieves demonstrations for hard-to-distinguish APIs for in-context learning.
  • results: Validation on a real-world commercial system and an open-sourced academic dataset shows the efficacy of each individual component as well as the integrated framework.
    Abstract Large Language Models (LLMs) have demonstrated proficiency in addressing tasks that necessitate a combination of task planning and the usage of external tools that require a blend of task planning and the utilization of external tools, such as APIs. However, real-world complex systems present three prevalent challenges concerning task planning and tool usage: (1) The real system usually has a vast array of APIs, so it is impossible to feed the descriptions of all APIs to the prompt of LLMs as the token length is limited; (2) the real system is designed for handling complex tasks, and the base LLMs can hardly plan a correct sub-task order and API-calling order for such tasks; (3) Similar semantics and functionalities among APIs in real systems create challenges for both LLMs and even humans in distinguishing between them. In response, this paper introduces a comprehensive framework aimed at enhancing the Task Planning and Tool Usage (TPTU) abilities of LLM-based agents operating within real-world systems. Our framework comprises three key components designed to address these challenges: (1) the API Retriever selects the most pertinent APIs for the user task among the extensive array available; (2) LLM Finetuner tunes a base LLM so that the finetuned LLM can be more capable for task planning and API calling; (3) the Demo Selector adaptively retrieves different demonstrations related to hard-to-distinguish APIs, which is further used for in-context learning to boost the final performance. We validate our methods using a real-world commercial system as well as an open-sourced academic dataset, and the outcomes clearly showcase the efficacy of each individual component as well as the integrated framework.
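The API Retriever component can be sketched as dense retrieval over API descriptions; the sentence-transformers checkpoint and the toy API catalog below are assumptions, not the paper's setup.

```python
# Sketch: embed every API description once, then pick the top-k APIs most
# similar to the user task, so only those reach the LLM's prompt.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")
api_docs = {
    "create_ticket": "Open a new support ticket for a customer issue.",
    "query_balance": "Return the current account balance for a user.",
    "send_report":   "Email a usage report to a given address.",
}
api_emb = encoder.encode(list(api_docs.values()), convert_to_tensor=True)

task = "email last month's usage summary to the finance team"
hits = util.semantic_search(encoder.encode(task, convert_to_tensor=True),
                            api_emb, top_k=2)[0]
print([list(api_docs)[h["corpus_id"]] for h in hits])
```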

What Lies beyond the Pareto Front? A Survey on Decision-Support Methods for Multi-Objective Optimization

  • paper_url: http://arxiv.org/abs/2311.11288
  • repo_url: None
  • paper_authors: Zuzanna Osika, Jazmin Zatarain Salazar, Diederik M. Roijers, Frans A. Oliehoek, Pradeep K. Murukannaiah
  • for: This review unifies decision-support methods for exploring the solutions produced by multi-objective optimization (MOO) algorithms.
  • methods: Surveys methods for analyzing the trade-offs offered by MOO algorithms, covering visualization, mining of the solution set, and uncertainty exploration, as well as emerging research directions including interactivity, explainability, and ethics.
  • results: Synthesizes methods drawn from different fields of research into a unified, application-independent approach, reducing the entry barrier for researchers and practitioners using MOO algorithms and providing novel research directions.
    Abstract We present a review that unifies decision-support methods for exploring the solutions produced by multi-objective optimization (MOO) algorithms. As MOO is applied to solve diverse problems, approaches for analyzing the trade-offs offered by MOO algorithms are scattered across fields. We provide an overview of the advances on this topic, including methods for visualization, mining the solution set, and uncertainty exploration as well as emerging research directions, including interactivity, explainability, and ethics. We synthesize these methods drawing from different fields of research to build a unified approach, independent of the application. Our goals are to reduce the entry barrier for researchers and practitioners on using MOO algorithms and to provide novel research directions.

Tactile Active Inference Reinforcement Learning for Efficient Robotic Manipulation Skill Acquisition

  • paper_url: http://arxiv.org/abs/2311.11287
  • repo_url: None
  • paper_authors: Zihao Liu, Xing Liu, Yizhai Zhang, Zhengxiong Liu, Panfeng Huang
  • for: This work proposes Tactile Active Inference Reinforcement Learning (Tactile-AIRL), a novel method for efficient robotic manipulation skill acquisition.
  • methods: Integrates active inference, combining model-based techniques and intrinsic curiosity, into the RL process to improve training efficiency and adaptability to sparse rewards, and uses a vision-based tactile sensor for detailed perception; appropriate actions are imagined and planned through free energy minimization.
  • results: Simulations show significantly high training efficiency on non-prehensile object-pushing tasks, excelling in both dense and sparse reward settings with just a few interaction episodes and surpassing the SAC baseline. Physical experiments on a gripper screwing task further demonstrate the algorithm's rapid learning capability and its potential for practical applications.
    Abstract Robotic manipulation holds the potential to replace humans in the execution of tedious or dangerous tasks. However, control-based approaches are not suitable due to the difficulty of formally describing open-world manipulation in reality, and the inefficiency of existing learning methods. Thus, applying manipulation in a wide range of scenarios presents significant challenges. In this study, we propose a novel method for skill learning in robotic manipulation called Tactile Active Inference Reinforcement Learning (Tactile-AIRL), aimed at achieving efficient training. To enhance the performance of reinforcement learning (RL), we introduce active inference, which integrates model-based techniques and intrinsic curiosity into the RL process. This integration improves the algorithm's training efficiency and adaptability to sparse rewards. Additionally, we utilize a vision-based tactile sensor to provide detailed perception for manipulation tasks. Finally, we employ a model-based approach to imagine and plan appropriate actions through free energy minimization. Simulation results demonstrate that our method achieves significantly high training efficiency in non-prehensile objects pushing tasks. It enables agents to excel in both dense and sparse reward tasks with just a few interaction episodes, surpassing the SAC baseline. Furthermore, we conduct physical experiments on a gripper screwing task using our method, which showcases the algorithm's rapid learning capability and its potential for practical applications.

Adversarial Prompt Tuning for Vision-Language Models

  • paper_url: http://arxiv.org/abs/2311.11261
  • repo_url: None
  • paper_authors: Jiaming Zhang, Xingjun Ma, Xin Wang, Lingyu Qiu, Jiaqi Wang, Yu-Gang Jiang, Jitao Sang
  • for: This work aims to improve the robustness of pre-trained Vision-Language Models (VLMs) against adversarial images in multimodal learning.
  • methods: Introduces Adversarial Prompt Tuning (AdvPT), which aligns learnable text prompts with adversarial image embeddings to address vulnerabilities inherent in VLMs, without extensive parameter training or changes to the model architecture.
  • results: Experiments show that AdvPT improves resistance to white-box and black-box adversarial attacks, and exhibits a synergistic effect when combined with existing image-processing-based defenses, further boosting defensive capabilities.
    Abstract With the rapid advancement of multimodal learning, pre-trained Vision-Language Models (VLMs) such as CLIP have demonstrated remarkable capacities in bridging the gap between visual and language modalities. However, these models remain vulnerable to adversarial attacks, particularly in the image modality, presenting considerable security risks. This paper introduces Adversarial Prompt Tuning (AdvPT), a novel technique to enhance the adversarial robustness of image encoders in VLMs. AdvPT innovatively leverages learnable text prompts and aligns them with adversarial image embeddings, to address the vulnerabilities inherent in VLMs without the need for extensive parameter training or modification of the model architecture. We demonstrate that AdvPT improves resistance against white-box and black-box adversarial attacks and exhibits a synergistic effect when combined with existing image-processing-based defense techniques, further boosting defensive capabilities. Comprehensive experimental analyses provide insights into adversarial prompt tuning, a novel paradigm devoted to improving resistance to adversarial images through textual input modifications, paving the way for future robust multimodal learning research. These findings open up new possibilities for enhancing the security of VLMs. Our code will be available upon publication of the paper.
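A conceptual sketch of the tuning loop: the encoders stay frozen and only a learnable prompt embedding is optimized so text features align with adversarial image embeddings. The stub `text_features` stands in for CLIP's text encoder, and the objective below is illustrative, not the paper's exact loss.

```python
# Conceptual sketch: optimize learnable prompt context vectors so text
# features move toward adversarial image embeddings of the same class.
import torch
import torch.nn.functional as F

dim, n_ctx = 512, 8
ctx = torch.nn.Parameter(torch.randn(n_ctx, dim) * 0.02)  # learnable prompt
opt = torch.optim.Adam([ctx], lr=1e-3)

def text_features(ctx, class_emb):          # stand-in for CLIP's text encoder
    return F.normalize(ctx.mean(0) + class_emb, dim=-1)

adv_img = F.normalize(torch.randn(16, dim), dim=-1)  # adversarial image embeddings
cls_emb = torch.randn(dim)                           # one class's token embedding
for step in range(100):
    loss = 1 - (adv_img @ text_features(ctx, cls_emb)).mean()  # pull together
    opt.zero_grad()
    loss.backward()
    opt.step()
```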

Tensor networks for interpretable and efficient quantum-inspired machine learning

  • paper_url: http://arxiv.org/abs/2311.11258
  • repo_url: None
  • paper_authors: Shi-Ju Ran, Gang Su
  • for: This paper reviews the application and development of tensor networks (TNs) in quantum-inspired machine learning.
  • methods: Tensor networks, a well-established mathematical tool with solid theoretical foundations in quantum information and many-body physics, are used to build highly interpretable "white-box" ML schemes.
  • results: The review covers inspiring progress in TN-based ML, including interpretable ML schemes and efficient computational techniques drawn from quantum many-body physics; with the rapid development of quantum computers, TNs are expected to yield novel schemes runnable on quantum hardware, heading towards "quantum artificial intelligence".
    Abstract It is a critical challenge to simultaneously gain high interpretability and efficiency with the current schemes of deep machine learning (ML). Tensor network (TN), which is a well-established mathematical tool originating from quantum mechanics, has shown its unique advantages on developing efficient ``white-box'' ML schemes. Here, we give a brief review on the inspiring progresses made in TN-based ML. On one hand, interpretability of TN ML is accommodated with the solid theoretical foundation based on quantum information and many-body physics. On the other hand, high efficiency can be rendered from the powerful TN representations and the advanced computational techniques developed in quantum many-body physics. With the fast development on quantum computers, TN is expected to conceive novel schemes runnable on quantum hardware, heading towards the ``quantum artificial intelligence'' in the forthcoming future.

A Comprehensive Review on Sentiment Analysis: Tasks, Approaches and Applications

  • paper_url: http://arxiv.org/abs/2311.11250
  • repo_url: None
  • paper_authors: Sudhanshu Kumar, Partha Pratim Roy, Debi Prosad Dogra, Byung-Gyu Kim
  • for: This survey paper introduces sentiment analysis (SA) and its recent research and development across domains, including voice, images, videos, and text.
  • methods: Covers lexicon-based approaches, machine learning, and deep learning methods for sentiment analysis.
  • results: Summarizes the challenges and opportunities of sentiment analysis and its applications across different domains.
    Abstract Sentiment analysis (SA) is an emerging field in text mining. It is the process of computationally identifying and categorizing opinions expressed in a piece of text over different social media platforms. Social media plays an essential role in knowing the customer mindset towards a product, services, and the latest market trends. Most organizations depend on the customer's response and feedback to upgrade their offered products and services. SA or opinion mining seems to be a promising research area for various domains. It plays a vital role in analyzing big data generated daily in structured and unstructured formats over the internet. This survey paper defines sentiment and its recent research and development in different domains, including voice, images, videos, and text. The challenges and opportunities of sentiment analysis are also discussed in the paper. \keywords{Sentiment Analysis, Machine Learning, Lexicon-based approach, Deep Learning, Natural Language Processing}

Open Set Dandelion Network for IoT Intrusion Detection

  • paper_url: http://arxiv.org/abs/2311.11249
  • repo_url: None
  • paper_authors: Jiashu Wu, Hao Dai, Kenneth B. Kent, Jerome Yen, Chengzhong Xu, Yang Wang
  • for: This paper addresses intrusion detection for IoT devices, which is crucial given their increasing use; traditional intrusion detection methods are ineffective here due to the data scarcity of IoT devices.
  • methods: The proposed Open-Set Dandelion Network (OSDN) uses unsupervised heterogeneous domain adaptation in an open-set manner to transfer intrusion knowledge from a knowledge-rich source network-intrusion domain to the data-scarce target IoT-intrusion domain. OSDN forms the source domain into a dandelion-like feature space and applies a target membership mechanism, a dandelion angular separation mechanism, and a dandelion embedding alignment mechanism to achieve better inter-category separability and intra-category compactness.
  • results: OSDN outperforms three state-of-the-art baseline methods by 16.9% in intrusion detection accuracy, as shown by comprehensive experiments on several intrusion datasets.
    Abstract As IoT devices become widely, it is crucial to protect them from malicious intrusions. However, the data scarcity of IoT limits the applicability of traditional intrusion detection methods, which are highly data-dependent. To address this, in this paper we propose the Open-Set Dandelion Network (OSDN) based on unsupervised heterogeneous domain adaptation in an open-set manner. The OSDN model performs intrusion knowledge transfer from the knowledge-rich source network intrusion domain to facilitate more accurate intrusion detection for the data-scarce target IoT intrusion domain. Under the open-set setting, it can also detect newly-emerged target domain intrusions that are not observed in the source domain. To achieve this, the OSDN model forms the source domain into a dandelion-like feature space in which each intrusion category is compactly grouped and different intrusion categories are separated, i.e., simultaneously emphasising inter-category separability and intra-category compactness. The dandelion-based target membership mechanism then forms the target dandelion. Then, the dandelion angular separation mechanism achieves better inter-category separability, and the dandelion embedding alignment mechanism further aligns both dandelions in a finer manner. To promote intra-category compactness, the discriminating sampled dandelion mechanism is used. Assisted by the intrusion classifier trained using both known and generated unknown intrusion knowledge, a semantic dandelion correction mechanism emphasises easily-confused categories and guides better inter-category separability. Holistically, these mechanisms form the OSDN model that effectively performs intrusion knowledge transfer to benefit IoT intrusion detection. Comprehensive experiments on several intrusion datasets verify the effectiveness of the OSDN model, outperforming three state-of-the-art baseline methods by 16.9%.

AtomXR: Streamlined XR Prototyping with Natural Language and Immersive Physical Interaction

  • paper_url: http://arxiv.org/abs/2311.11238
  • repo_url: None
  • paper_authors: Alice Cai, Caine Ardayfio, AnhPhu Nguyen, Tica Lin, Elena Glassman
  • for: Improving development speed and user experience, and lowering the barrier for developers inexperienced with XR development.
  • methods: Provides an immersive in-headset authoring environment driven by natural language, eye-gaze, and touch interactions, and uses large language models (LLMs) with multimodal inputs to generate AtomScript, a high-level human-interpretable scripting language for rapid prototyping.
  • results: Empirical evaluation through two user studies shows that AtomXR provides significant improvements in speed and user experience compared to traditional systems.
    Abstract As technological advancements in extended reality (XR) amplify the demand for more XR content, traditional development processes face several challenges: 1) a steep learning curve for inexperienced developers, 2) a disconnect between 2D development environments and 3D user experiences inside headsets, and 3) slow iteration cycles due to context switching between development and testing environments. To address these challenges, we introduce AtomXR, a streamlined, immersive, no-code XR prototyping tool designed to empower both experienced and inexperienced developers in creating applications using natural language, eye-gaze, and touch interactions. AtomXR consists of: 1) AtomScript, a high-level human-interpretable scripting language for rapid prototyping, 2) a natural language interface that integrates LLMs and multimodal inputs for AtomScript generation, and 3) an immersive in-headset authoring environment. Empirical evaluation through two user studies offers insights into natural language-based and immersive prototyping, and shows AtomXR provides significant improvements in speed and user experience compared to traditional systems.

Implementation of AI Deep Learning Algorithm For Multi-Modal Sentiment Analysis

  • paper_url: http://arxiv.org/abs/2311.11237
  • repo_url: None
  • paper_authors: Jiazhen Wang
  • for: This multi-modal emotion recognition method aims to improve emotion recognition accuracy and learning efficiency.
  • methods: Combines a two-channel convolutional neural network with a ring network; words are vectorized with GloVe and fed to the CNN, and an attention mechanism with a max-pooling converter BiSRU channel captures local deep emotion and pre/post sequential emotion semantics. Multiple features are then fused and used as input to predict emotion polarity.
  • results: Experiments show that this feature-fusion-based sentiment analysis method effectively improves recognition accuracy on emotion datasets and reduces learning time, and the model generalizes reasonably well.
    Abstract A multi-modal emotion recognition method was established by combining two-channel convolutional neural network with ring network. This method can extract emotional information effectively and improve learning efficiency. The words were vectorized with GloVe, and the word vector was input into the convolutional neural network. Combining attention mechanism and maximum pool converter BiSRU channel, the local deep emotion and pre-post sequential emotion semantics are obtained. Finally, multiple features are fused and input as the polarity of emotion, so as to achieve the emotion analysis of the target. Experiments show that the emotion analysis method based on feature fusion can effectively improve the recognition accuracy of emotion data set and reduce the learning time. The model has a certain generalization.

Unraveling the 'Anomaly' in Time Series Anomaly Detection: A Self-supervised Tri-domain Solution

  • paper_url: http://arxiv.org/abs/2311.11235
  • repo_url: https://github.com/pseudo-Skye/TriAD
  • paper_authors: Yuting Sun, Guansong Pang, Guanhua Ye, Tong Chen, Xia Hu, Hongzhi Yin
  • for: This work proposes a self-supervised tri-domain anomaly detector (TriAD) to address key challenges in time series anomaly detection (TSAD), including the scarcity of anomaly labels and the variability of anomaly lengths and shapes.
  • methods: TriAD models features across three data domains (temporal, frequency, and residual) without relying on anomaly labels, using both inter-domain and intra-domain contrastive losses to learn common attributes of normal data and differentiate them from anomalies; integration with a discord discovery algorithm enables detection of anomalies of varying lengths.
  • results: On the rigorously designed UCR datasets, TriAD achieves a three-fold increase in PA%K-based F1 scores over SOTA deep learning models and a 50% increase in accuracy compared to SOTA discord discovery algorithms.
    Abstract The ongoing challenges in time series anomaly detection (TSAD), notably the scarcity of anomaly labels and the variability in anomaly lengths and shapes, have led to the need for a more efficient solution. As limited anomaly labels hinder traditional supervised models in TSAD, various SOTA deep learning techniques, such as self-supervised learning, have been introduced to tackle this issue. However, they encounter difficulties handling variations in anomaly lengths and shapes, limiting their adaptability to diverse anomalies. Additionally, many benchmark datasets suffer from the problem of having explicit anomalies that even random functions can detect. This problem is exacerbated by ill-posed evaluation metrics, known as point adjustment (PA), which can result in inflated model performance. In this context, we propose a novel self-supervised learning based Tri-domain Anomaly Detector (TriAD), which addresses these challenges by modeling features across three data domains - temporal, frequency, and residual domains - without relying on anomaly labels. Unlike traditional contrastive learning methods, TriAD employs both inter-domain and intra-domain contrastive loss to learn common attributes among normal data and differentiate them from anomalies. Additionally, our approach can detect anomalies of varying lengths by integrating with a discord discovery algorithm. It is worth noting that this study is the first to reevaluate the deep learning potential in TSAD, utilizing both rigorously designed datasets (i.e., UCR Archive) and evaluation metrics (i.e., PA%K and affiliation). Through experimental results on the UCR dataset, TriAD achieves an impressive three-fold increase in PA%K based F1 scores over SOTA deep learning models, and 50% increase of accuracy as compared to SOTA discord discovery algorithms.
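The three domains can be sketched as three views of the same window: the raw signal, its magnitude spectrum, and a residual after removing a moving-average trend. The exact transforms in TriAD may differ; see the authors' repo for the real implementation.

```python
# Sketch: build temporal / frequency / residual views of a series window,
# each of which would feed its own encoder under a contrastive objective.
import numpy as np

def tri_domain_views(x, trend_window=25):
    temporal = x
    frequency = np.abs(np.fft.rfft(x))                  # magnitude spectrum
    kernel = np.ones(trend_window) / trend_window
    residual = x - np.convolve(x, kernel, mode="same")  # detrended signal
    return temporal, frequency, residual

t = np.linspace(0, 10, 1000)
x = np.sin(2 * np.pi * t) + 0.1 * np.random.randn(t.size)
views = tri_domain_views(x)
print([v.shape for v in views])
```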

FedRA: A Random Allocation Strategy for Federated Tuning to Unleash the Power of Heterogeneous Clients

  • paper_url: http://arxiv.org/abs/2311.11227
  • repo_url: https://github.com/leondada/fedra
  • paper_authors: Shangchao Su, Bin Li, Xiangyang Xue
  • for: This work proposes a novel federated tuning algorithm (FedRA) for realistic federated learning settings, where clients with heterogeneous data and computation resources collaboratively fine-tune a foundation model.
  • methods: In each communication round, FedRA randomly generates an allocation matrix; resource-constrained clients reorganize a small number of layers from the original model according to this matrix and fine-tune them with LoRA, and the server aggregates the updated LoRA parameters back into the corresponding layers. FedRA also supports scenarios where no single client can hold the entire global model, and it integrates with any transformer-based model without modification.
  • results: Experiments on two large-scale image datasets (DomainNet and NICO++) under various non-iid settings show that FedRA significantly outperforms the compared methods.
    Abstract With the increasing availability of Foundation Models, federated tuning has garnered attention in the field of federated learning, utilizing data and computation resources from multiple clients to collaboratively fine-tune foundation models. However, in real-world federated scenarios, there often exist a multitude of heterogeneous clients with varying computation and communication resources, rendering them incapable of supporting the entire model fine-tuning process. In response to this challenge, we propose a novel federated tuning algorithm, FedRA. The implementation of FedRA is straightforward and can be seamlessly integrated into any transformer-based model without the need for further modification to the original model. Specifically, in each communication round, FedRA randomly generates an allocation matrix. For resource-constrained clients, it reorganizes a small number of layers from the original model based on the allocation matrix and fine-tunes using LoRA. Subsequently, the server aggregates the updated LoRA parameters from the clients according to the current allocation matrix into the corresponding layers of the original model. It is worth noting that FedRA also supports scenarios where none of the clients can support the entire global model, which is an impressive advantage. We conduct experiments on two large-scale image datasets, DomainNet and NICO++, under various non-iid settings. The results demonstrate that FedRA outperforms the compared methods significantly. The source code is available at \url{https://github.com/leondada/FedRA}.
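A sketch of the allocation-and-merge step under stated assumptions: a random binary matrix assigns layers to clients each round, and the server averages each layer's update over the clients that held it. The stand-in `updates` array takes the place of real LoRA deltas.

```python
# Sketch: random layer-to-client allocation plus server-side per-layer merging.
import numpy as np

n_layers, n_clients = 12, 4
rng = np.random.default_rng(0)

def sample_allocation(layers_per_client=3):
    alloc = np.zeros((n_clients, n_layers), dtype=bool)
    for c in range(n_clients):
        alloc[c, rng.choice(n_layers, layers_per_client, replace=False)] = True
    return alloc

alloc = sample_allocation()                          # one round's allocation matrix
updates = rng.standard_normal((n_clients, n_layers)) # stand-in for LoRA deltas
counts = np.maximum(alloc.sum(0), 1)                 # clients holding each layer
merged = (updates * alloc).sum(0) / counts           # per-layer aggregation
print(merged.shape)                                  # one merged delta per layer
```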

An Interactive Query Generation Assistant using LLM-based Prompt Modification and User Feedback

  • paper_url: http://arxiv.org/abs/2311.11226
  • repo_url: None
  • paper_authors: Kaustubh D. Dhole, Ramraj Chandradevan, Eugene Agichtein
  • for: This paper presents an assistant that helps users formulate more effective queries during search, especially for unfamiliar domains, cross-language retrieval, or complex information needs such as events that are not easily expressible as queries.
  • methods: The Query Generation Assistant is a novel search interface supporting automatic and interactive query generation over mono-lingual or multi-lingual document collections; users can refine queries generated by different LLMs and provide feedback on retrieved documents or passages, which is incorporated as prompts to generate more effective queries.
  • results: The interface serves as a valuable experimental tool for exploring fine-tuning and prompting of LLMs for query generation, for qualitatively evaluating retrieval and ranking models, and for conducting Human-in-the-Loop (HITL) experiments on complex search tasks where users struggle to formulate queries unaided.
    Abstract While search is the predominant method of accessing information, formulating effective queries remains a challenging task, especially for situations where the users are not familiar with a domain, or searching for documents in other languages, or looking for complex information such as events, which are not easily expressible as queries. Providing example documents or passages of interest, might be easier for a user, however, such query-by-example scenarios are prone to concept drift, and are highly sensitive to the query generation method. This demo illustrates complementary approaches of using LLMs interactively, assisting and enabling the user to provide edits and feedback at all stages of the query formulation process. The proposed Query Generation Assistant is a novel search interface which supports automatic and interactive query generation over a mono-linguial or multi-lingual document collection. Specifically, the proposed assistive interface enables the users to refine the queries generated by different LLMs, to provide feedback on the retrieved documents or passages, and is able to incorporate the users' feedback as prompts to generate more effective queries. The proposed interface is a valuable experimental tool for exploring fine-tuning and prompting of LLMs for query generation to qualitatively evaluate the effectiveness of retrieval and ranking models, and for conducting Human-in-the-Loop (HITL) experiments for complex search tasks where users struggle to formulate queries without such assistance.

SPLAIN: Augmenting CybersecurityWarnings with Reasons and Data

  • paper_url: http://arxiv.org/abs/2311.11215
  • repo_url: None
  • paper_authors: Vera A. Kazakova, Jena D. Hwang, Bonnie J. Dorr, Yorick Wilks, J. Blake Gage, Alex Memory, Mark A. Clark
  • for: Presents a natural language generator that converts warning data into user-friendly cyber threat explanations.
  • methods: Uses a template-based approach with hierarchically organized warning structure and vocabulary to ensure consistency; given sensor-induced forecasting signals and an overall warning from a fusion module, SPLAIN queries each signal for information on contributing sensors and data signals.
  • results: SPLAIN produces clear, actionable output: coherent English explanations covering forecasting, sensing, and data elements, with a hierarchical structure that lets each threat and its components be expanded on demand.
    Abstract Effective cyber threat recognition and prevention demand comprehensible forecasting systems, as prior approaches commonly offer limited and, ultimately, unconvincing information. We introduce Simplified Plaintext Language (SPLAIN), a natural language generator that converts warning data into user-friendly cyber threat explanations. SPLAIN is designed to generate clear, actionable outputs, incorporating hierarchically organized explanatory details about input data and system functionality. Given the inputs of individual sensor-induced forecasting signals and an overall warning from a fusion module, SPLAIN queries each signal for information on contributing sensors and data signals. This collected data is processed into a coherent English explanation, encompassing forecasting, sensing, and data elements for user review. SPLAIN's template-based approach ensures consistent warning structure and vocabulary. SPLAIN's hierarchical output structure allows each threat and its components to be expanded to reveal underlying explanations on demand. Our conclusions emphasize the need for designers to specify the "how" and "why" behind cyber warnings, advocate for simple structured templates in generating consistent explanations, and recognize that direct causal links in Machine Learning approaches may not always be identifiable, requiring some explanations to focus on general methodologies, such as model and training data.
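
A minimal rendition of the template-based, hierarchical output the abstract describes is sketched below. The dictionary schema (threat, confidence, signals) and the template text are hypothetical; SPLAIN's real inputs come from its sensors and fusion module.

```python
WARNING_TEMPLATE = ("Threat: {threat}. Forecast confidence: {confidence}. "
                    "This warning was raised because {reasons}.")

def explain(warning):
    """Render a fused warning and its contributing signals as plain English."""
    reasons = "; ".join(
        f"sensor '{s['name']}' reported {s['observation']}"
        for s in warning["signals"]
    )
    summary = WARNING_TEMPLATE.format(threat=warning["threat"],
                                      confidence=warning["confidence"],
                                      reasons=reasons)
    # Hierarchical output: each signal can be expanded on demand to reveal
    # the underlying data elements it contributed.
    details = {s["name"]: s.get("data_elements", []) for s in warning["signals"]}
    return summary, details
```

Using a fixed template per warning level is what gives the consistent structure and vocabulary the abstract emphasizes.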

Can We Utilize Pre-trained Language Models within Causal Discovery Algorithms?

  • paper_url: http://arxiv.org/abs/2311.11212
  • repo_url: None
  • paper_authors: Chanhui Lee, Juhyeon Kim, Yongjun Jeong, Juhyun Lyu, Junghee Kim, Sangmin Lee, Sangjun Han, Hyeokjun Choe, Soyeon Park, Woohyung Lim, Sungbin Lim, Sanghack Lee
  • for: This study examines whether pre-trained language models (PLMs) can perform causal reasoning and how PLMs can be applied to discovering causal relationships.
  • methods: The study tests PLM capabilities through repeated causal reasoning elicited with specifically designed prompts.
  • results: The study finds that PLM-based causal reasoning has notable limitations, including sensitivity to prompt design and the risk of false predictions. Experiments demonstrate these limitations, and the paper proposes a new framework that integrates PLMs with causal discovery. The framework not only improves performance through this integration, but also suggests how prior knowledge extracted from PLMs can be combined with existing causal discovery algorithms.
    Abstract Scaling laws have allowed Pre-trained Language Models (PLMs) into the field of causal reasoning. Causal reasoning of PLM relies solely on text-based descriptions, in contrast to causal discovery which aims to determine the causal relationships between variables utilizing data. Recently, there has been current research regarding a method that mimics causal discovery by aggregating the outcomes of repetitive causal reasoning, achieved through specifically designed prompts. It highlights the usefulness of PLMs in discovering cause and effect, which is often limited by a lack of data, especially when dealing with multiple variables. Conversely, the characteristics of PLMs which are that PLMs do not analyze data and they are highly dependent on prompt design leads to a crucial limitation for directly using PLMs in causal discovery. Accordingly, PLM-based causal reasoning deeply depends on the prompt design and carries out the risk of overconfidence and false predictions in determining causal relationships. In this paper, we empirically demonstrate the aforementioned limitations of PLM-based causal reasoning through experiments on physics-inspired synthetic data. Then, we propose a new framework that integrates prior knowledge obtained from PLM with a causal discovery algorithm. This is accomplished by initializing an adjacency matrix for causal discovery and incorporating regularization using prior knowledge. Our proposed framework not only demonstrates improved performance through the integration of PLM and causal discovery but also suggests how to leverage PLM-extracted prior knowledge with existing causal discovery algorithms.
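
A compact sketch of the paper's core idea of regularizing causal discovery with PLM-derived prior knowledge appears below. The quadratic penalty and the `lam` weight are our illustration, not the paper's exact objective.

```python
import numpy as np

def prior_regularized_score(W, data_score, W_prior, lam=0.1):
    """Score a candidate adjacency matrix W by trading off data fit
    against agreement with a PLM-elicited prior graph.

    data_score(W) is any standard causal-discovery score (e.g. BIC);
    W_prior[i, j] = 1 when the PLM, queried via prompts, answers that
    variable i causes variable j.
    """
    return data_score(W) + lam * np.sum((W - W_prior) ** 2)

# As the abstract suggests, the prior can also initialize the search:
# W0 = W_prior.astype(float)
```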

Leveraging Generative AI for Clinical Evidence Summarization Needs to Achieve Trustworthiness

  • paper_url: http://arxiv.org/abs/2311.11211
  • repo_url: None
  • paper_authors: Gongbo Zhang, Qiao Jin, Denis Jered McInerney, Yong Chen, Fei Wang, Curtis L. Cole, Qian Yang, Yanshan Wang, Bradley A. Malin, Mor Peleg, Byron C. Wallace, Zhiyong Lu, Chunhua Weng, Yifan Peng
  • for: Improve healthcare quality by grounding medical decisions and practice in the best available evidence.
  • methods: Use large language models to automatically summarize medical evidence, easing the collection, appraisal, and synthesis of evidence.
  • results: Developing trustworthy generative AI models can improve the efficiency and accuracy of medical evidence summarization.
    Abstract Evidence-based medicine aims to improve the quality of healthcare by empowering medical decisions and practices with the best available evidence. The rapid growth of medical evidence, which can be obtained from various sources, poses a challenge in collecting, appraising, and synthesizing the evidential information. Recent advancements in generative AI, exemplified by large language models, hold promise in facilitating the arduous task. However, developing accountable, fair, and inclusive models remains a complicated undertaking. In this perspective, we discuss the trustworthiness of generative AI in the context of automated summarization of medical evidence.

On the Noise Scheduling for Generating Plausible Designs with Diffusion Models

  • paper_url: http://arxiv.org/abs/2311.11207
  • repo_url: None
  • paper_authors: Jiajie Fan, Laure Vuaille, Thomas Bäck, Hao Wang
  • for: This paper examines deep generative models (DGMs), diffusion models in particular, for creating novel designs where both high visual quality and structural-semantic constraints (plausibility) must be satisfied.
  • methods: The paper studies how the noise schedule of diffusion models affects the plausibility of generated designs, proposes two techniques to determine the plausibility-critical noise range for a given image set, and devises a novel parametric noise schedule.
  • results: Compared to the default schedule, the proposed noise schedule raises the rate of plausible designs from 83.4% to 93.5% and lowers the Fréchet Inception Distance (FID) from 7.84 to 4.87, indicating a solid structural understanding by the model.
    Abstract Deep Generative Models (DGMs) are widely used to create innovative designs across multiple industries, ranging from fashion to the automotive sector. In addition to generating images of high visual quality, the task of structural design generation imposes more stringent constrains on the semantic expression, e.g., no floating material or missing part, which we refer to as plausibility in this work. We delve into the impact of noise schedules of diffusion models on the plausibility of the outcome: there exists a range of noise levels at which the model's performance decides the result plausibility. Also, we propose two techniques to determine such a range for a given image set and devise a novel parametric noise schedule for better plausibility. We apply this noise schedule to the training and sampling of the well-known diffusion model EDM and compare it to its default noise schedule. Compared to EDM, our schedule significantly improves the rate of plausible designs from 83.4% to 93.5% and Fr\'echet Inception Distance (FID) from 7.84 to 4.87. Further applications of advanced image editing tools demonstrate the model's solid understanding of structure.
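
For context, the sketch below reproduces EDM's default noise levels and a toy single-parameter variant that warps where the sampling steps concentrate along the noise axis. The paper's actual parametric schedule differs, and its critical noise range would come from the two range-finding techniques it proposes.

```python
import numpy as np

def edm_sigmas(n, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """EDM's default sampling noise levels (Karras et al., 2022)."""
    i = np.arange(n)
    return (sigma_max ** (1 / rho)
            + i / (n - 1) * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho

def parametric_sigmas(n, gamma=2.0, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Toy one-parameter variant: gamma warps the interpolation so that
    gamma > 1 places more sampling steps at high noise levels and
    gamma < 1 at low noise levels. Illustration only; the paper defines
    its own parameterization."""
    t = (np.arange(n) / (n - 1)) ** gamma
    return (sigma_max ** (1 / rho)
            + t * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho
```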

Unmasking and Improving Data Credibility: A Study with Datasets for Training Harmless Language Models

  • paper_url: http://arxiv.org/abs/2311.11202
  • repo_url: https://github.com/docta-ai/docta
  • paper_authors: Zhaowei Zhu, Jialu Wang, Hao Cheng, Yang Liu
  • for: This study aims to improve the credibility of real-world datasets used to train and align language models.
  • methods: The study proposes a systematic framework for evaluating dataset credibility, combining automated annotation with human review to detect label errors in the datasets.
  • results: Across 11 real-world datasets, the framework finds an average of 6.16% label errors; directly fixing these errors improves both data credibility and downstream learning performance.
    Abstract Language models have shown promise in various tasks but can be affected by undesired data during training, fine-tuning, or alignment. For example, if some unsafe conversations are wrongly annotated as safe ones, the model fine-tuned on these samples may be harmful. Therefore, the correctness of annotations, i.e., the credibility of the dataset, is important. This study focuses on the credibility of real-world datasets, including the popular benchmarks Jigsaw Civil Comments, Anthropic Harmless & Red Team, PKU BeaverTails & SafeRLHF, that can be used for training a harmless language model. Given the cost and difficulty of cleaning these datasets by humans, we introduce a systematic framework for evaluating the credibility of datasets, identifying label errors, and evaluating the influence of noisy labels in the curated language data, specifically focusing on unsafe comments and conversation classification. With the framework, we find and fix an average of 6.16% label errors in 11 datasets constructed from the above benchmarks. The data credibility and downstream learning performance can be remarkably improved by directly fixing label errors, indicating the significance of cleaning existing real-world datasets. Open-source: https://github.com/Docta-ai/docta.
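
One simple instance of the kind of check such a framework can run is nearest-neighbor label agreement, sketched below. The paper's framework is more involved, so treat this purely as an illustration of flagging suspect annotations; the `k` and `threshold` values are arbitrary.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def flag_label_errors(embeddings, labels, k=10, threshold=0.5):
    """Flag examples whose label disagrees with most of their neighbors.

    embeddings: (N, d) text embeddings of the comments/conversations;
    labels: (N,) annotated classes (e.g. safe vs. unsafe).
    Returns a boolean array marking likely label errors for human review.
    """
    nn = NearestNeighbors(n_neighbors=k + 1).fit(embeddings)
    _, idx = nn.kneighbors(embeddings)
    flags = []
    for i, neigh in enumerate(idx):
        votes = labels[neigh[1:]]            # skip self at position 0
        agree = np.mean(votes == labels[i])  # fraction of agreeing neighbors
        flags.append(agree < threshold)
    return np.array(flags)
```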

Assessing AI Impact Assessments: A Classroom Study

  • paper_url: http://arxiv.org/abs/2311.11193
  • repo_url: None
  • paper_authors: Nari Johnson, Hoda Heidari
  • for: This paper evaluates the effectiveness of existing AI impact assessments (AIIAs) and provides recommendations for future work on developing and validating AIIAs.
  • methods: The paper uses a classroom study with 38 students at a large research-intensive university to evaluate how AIIAs affect participants’ perceptions of the potential risks of generative AI systems and of the level of responsibility held by AI experts in addressing potential harm.
  • results: The study finds preliminary evidence that impact assessments can influence participants’ perceptions of the potential risks of generative AI systems, and identifies a consistent set of limitations shared by several existing AIIA instruments.
    Abstract Artificial Intelligence Impact Assessments ("AIIAs"), a family of tools that provide structured processes to imagine the possible impacts of a proposed AI system, have become an increasingly popular proposal to govern AI systems. Recent efforts from government or private-sector organizations have proposed many diverse instantiations of AIIAs, which take a variety of forms ranging from open-ended questionnaires to graded score-cards. However, to date that has been limited evaluation of existing AIIA instruments. We conduct a classroom study (N = 38) at a large research-intensive university (R1) in an elective course focused on the societal and ethical implications of AI. We assign students to different organizational roles (for example, an ML scientist or product manager) and ask participant teams to complete one of three existing AI impact assessments for one of two imagined generative AI systems. In our thematic analysis of participants' responses to pre- and post-activity questionnaires, we find preliminary evidence that impact assessments can influence participants' perceptions of the potential risks of generative AI systems, and the level of responsibility held by AI experts in addressing potential harm. We also discover a consistent set of limitations shared by several existing AIIA instruments, which we group into concerns about their format and content, as well as the feasibility and effectiveness of the activity in foreseeing and mitigating potential harms. Drawing on the findings of this study, we provide recommendations for future work on developing and validating AIIAs.

Attention-Based Real-Time Defenses for Physical Adversarial Attacks in Vision Applications

  • paper_url: http://arxiv.org/abs/2311.11191
  • repo_url: None
  • paper_authors: Giulio Rossolini, Alessandro Biondi, Giorgio Buttazzo
  • for: Defend deep neural networks against real-world physical adversarial attacks, enabling their safe deployment in safety-critical domains.
  • methods: Uses a channel-attention mechanism to quickly identify and track malicious objects in shallow network layers and to mask their adversarial effects in multi-frame settings.
  • results: Improves on existing over-activation techniques and provides an efficient multi-frame defense framework whose effectiveness is validated through extensive experiments.
    Abstract Deep neural networks exhibit excellent performance in computer vision tasks, but their vulnerability to real-world adversarial attacks, achieved through physical objects that can corrupt their predictions, raises serious security concerns for their application in safety-critical domains. Existing defense methods focus on single-frame analysis and are characterized by high computational costs that limit their applicability in multi-frame scenarios, where real-time decisions are crucial. To address this problem, this paper proposes an efficient attention-based defense mechanism that exploits adversarial channel-attention to quickly identify and track malicious objects in shallow network layers and mask their adversarial effects in a multi-frame setting. This work advances the state of the art by enhancing existing over-activation techniques for real-world adversarial attacks to make them usable in real-time applications. It also introduces an efficient multi-frame defense framework, validating its efficacy through extensive experiments aimed at evaluating both defense performance and computational cost.
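
A bare-bones version of masking over-activated regions in a shallow feature map is sketched below, assuming anomalously high channel energy signals an adversarial patch. The paper's defense adds channel attention and cross-frame tracking of the flagged regions, which this sketch omits.

```python
import torch

def mask_overactivations(features, z_thresh=3.0):
    """Zero out spatial cells whose activation energy is anomalously high.

    features: (B, C, H, W) activations from a shallow layer. Treating
    statistically extreme channel energy as evidence of an adversarial
    patch is the over-activation heuristic.
    """
    energy = features.abs().mean(dim=1, keepdim=True)   # (B, 1, H, W)
    mu = energy.mean(dim=(2, 3), keepdim=True)
    sigma = energy.std(dim=(2, 3), keepdim=True)
    keep = (energy - mu) / (sigma + 1e-6) < z_thresh    # keep normal cells
    return features * keep
```

Because the computation stays in shallow layers and is a few tensor reductions per frame, this style of check is cheap enough for the multi-frame, real-time setting the abstract targets.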

Few-Shot Classification & Segmentation Using Large Language Models Agent

  • paper_url: http://arxiv.org/abs/2311.12065
  • repo_url: None
  • paper_authors: Tian Meng, Yang Tao, Wuliang Yin
  • for: Addresses few-shot image classification and segmentation (FS-CS), which requires classifying and segmenting target objects in a query image given only a few examples of the target classes.
  • methods: Uses a large language model (LLM) as an agent to solve FS-CS without training. The LLM serves as the task planner, while off-the-shelf vision models (such as the Segment Anything Model and GPT-4Vision) help it understand spatial and semantic information. Chain-of-thought prompting and in-context learning guide the LLM to observe the support images and to classify and segment the target objects in the query image.
  • results: The proposed method achieves state-of-the-art performance on the Pascal-5i dataset.
    Abstract The task of few-shot image classification and segmentation (FS-CS) requires the classification and segmentation of target objects in a query image, given only a few examples of the target classes. We introduce a method that utilises large language models (LLM) as an agent to address the FS-CS problem in a training-free manner. By making the LLM the task planner and off-the-shelf vision models the tools, the proposed method is capable of classifying and segmenting target objects using only image-level labels. Specifically, chain-of-thought prompting and in-context learning guide the LLM to observe support images like human; vision models such as Segment Anything Model (SAM) and GPT-4Vision assist LLM understand spatial and semantic information at the same time. Ultimately, the LLM uses its summarizing and reasoning capabilities to classify and segment the query image. The proposed method's modular framework makes it easily extendable. Our approach achieves state-of-the-art performance on the Pascal-5i dataset.
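
A skeleton of the training-free agent loop the abstract outlines is given below, with `llm`, `sam`, and `classify_crop` as hypothetical callables standing in for the LLM planner, the Segment Anything Model, and a GPT-4Vision-style classifier; the prompt text is our own.

```python
def fscs_agent(llm, sam, classify_crop, query_image, support_images, class_names):
    """Training-free FS-CS loop: the LLM plans, off-the-shelf vision tools act."""
    # Chain-of-thought planning over the support images (image-level labels only).
    plan = llm(
        "You are solving few-shot classification and segmentation. "
        f"The target classes are {class_names}; the support images show "
        "examples of each. Reason step by step about which visual "
        "attributes identify each class.",
        images=support_images,  # hypothetical multimodal interface
    )
    masks = sam(query_image)    # candidate object masks from the query image
    results = []
    for mask in masks:
        # Classify each masked region against the target classes, with the
        # LLM's plan supplied as context.
        label = classify_crop(query_image, mask, class_names, context=plan)
        if label in class_names:
            results.append((label, mask))
    return results              # (class, mask) pairs for the query image
```

The modularity the abstract claims falls out of this structure: swapping the mask proposer or the crop classifier leaves the planning loop unchanged.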