cs.AI - 2023-11-24

Advancing Fluid-Based Thermal Management Systems Design: Leveraging Graph Neural Networks for Graph Regression and Efficient Enumeration Reduction

  • paper_url: http://arxiv.org/abs/2311.14874
  • repo_url: None
  • paper_authors: Saeid Bayat, Nastaran Shahmansouri, Satya RT Peddada, Alex Tessier, Adrian Butscher, James T Allison
  • for: This paper proposes a graph-based framework for rapidly and efficiently identifying optimal thermal management system designs.
  • methods: The paper trains a Graph Neural Network (GNN) model on labeled design data and uses it to predict each candidate system's performance value.
  • results: GNN predictions identify optimal designs quickly and efficiently, reducing the number of system dynamic modeling and optimal control analyses by roughly 92%.
    Abstract In this research, we developed a graph-based framework to represent various aspects of optimal thermal management system design, with the aim of rapidly and efficiently identifying optimal design candidates. Initially, the graph-based framework is utilized to generate diverse thermal management system architectures. The dynamics of these system architectures are modeled under various loading conditions, and an open-loop optimal controller is employed to determine each system's optimal performance. These modeled cases constitute the dataset, with the corresponding optimal performance values serving as the labels for the data. In the subsequent step, a Graph Neural Network (GNN) model is trained on 30% of the labeled data to predict the systems' performance, effectively addressing a regression problem. Utilizing this trained model, we estimate the performance values for the remaining 70% of the data, which serves as the test set. In the third step, the predicted performance values are employed to rank the test data, facilitating prioritized evaluation of the design scenarios. Specifically, a small subset of the test data with the highest estimated ranks undergoes evaluation via the open-loop optimal control solver. This targeted approach concentrates on evaluating higher-ranked designs identified by the GNN, replacing the exhaustive search (enumeration-based) of all design cases. The results demonstrate a significant average reduction of over 92% in the number of system dynamic modeling and optimal control analyses required to identify optimal design scenarios.
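
The workflow the abstract describes (train a performance regressor on 30% of labeled designs, rank the remaining 70% by predicted performance, then run the expensive optimal-control evaluation only on the top-ranked candidates) can be sketched as follows. This is a minimal illustration with a generic scikit-learn regressor standing in for the paper's GNN; the features, toy labels, and top-k budget are assumptions.

```python
# Minimal sketch of surrogate-guided ranking to avoid exhaustive enumeration.
# A gradient-boosting regressor stands in for the paper's GNN surrogate.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))                  # placeholder graph features per design
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=1000)   # placeholder "optimal performance" labels

n_train = int(0.3 * len(X))                      # 30% labeled by the expensive solver
surrogate = GradientBoostingRegressor().fit(X[:n_train], y[:n_train])

scores = surrogate.predict(X[n_train:])          # cheap predictions for the other 70%
top_k = np.argsort(scores)[:25]                  # best-ranked candidates (lower cost here)

def expensive_optimal_control(idx):
    """Stand-in for full dynamic modeling plus open-loop optimal control."""
    return y[n_train + idx]

best = min(top_k, key=expensive_optimal_control) # re-evaluate only the top-ranked designs
print("best candidate index:", n_train + best)
```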

Improving Cross-Domain Hate Speech Generalizability with Emotion Knowledge

  • paper_url: http://arxiv.org/abs/2311.14865
  • repo_url: None
  • paper_authors: Shi Yin Hong, Susan Gauch
  • for: This study aims to improve the generalizability of hate speech detection systems so they can be deployed in real-world settings.
  • methods: The study uses a multitask architecture that leverages emotion knowledge to improve cross-domain hate speech detection.
  • results: Emotion knowledge improves generalizability: on six publicly available datasets, the approach increases generalization performance by up to 18.1% and average cross-domain performance by up to 8.5% in F1.
    Abstract Reliable automatic hate speech (HS) detection systems must adapt to the in-flow of diverse new data to curtail hate speech. However, hate speech detection systems commonly lack generalizability in identifying hate speech dissimilar to data used in training, impeding their robustness in real-world deployments. In this work, we propose a hate speech generalization framework that leverages emotion knowledge in a multitask architecture to improve the generalizability of hate speech detection in a cross-domain setting. We investigate emotion corpora with varying emotion categorical scopes to determine the best corpus scope for supplying emotion knowledge to foster generalized hate speech detection. We further assess the relationship between using pretrained Transformers models adapted for hate speech and its effect on our emotion-enriched hate speech generalization model. We perform extensive experiments on six publicly available datasets sourced from different online domains and show that our emotion-enriched HS detection generalization method demonstrates consistent generalization improvement in cross-domain evaluation, increasing generalization performance up to 18.1% and average cross-domain performance up to 8.5%, according to the F1 measure.
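
A minimal sketch of the kind of multitask setup the abstract describes: a shared pretrained encoder with one head for hate speech and one for emotion, trained on a joint loss. The backbone choice, number of emotion classes, and loss weighting are assumptions, not the authors' exact configuration.

```python
# Sketch of a shared-encoder multitask model: hate speech head + auxiliary emotion head.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class EmotionEnrichedHateModel(nn.Module):
    def __init__(self, backbone="bert-base-uncased", n_emotions=6):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(backbone)   # shared encoder
        hidden = self.encoder.config.hidden_size
        self.hate_head = nn.Linear(hidden, 2)                 # hate vs. not hate
        self.emotion_head = nn.Linear(hidden, n_emotions)     # auxiliary emotion task

    def forward(self, **batch):
        h = self.encoder(**batch).last_hidden_state[:, 0]     # [CLS] representation
        return self.hate_head(h), self.emotion_head(h)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = EmotionEnrichedHateModel()
batch = tok(["example post"], return_tensors="pt")
hate_logits, emo_logits = model(**batch)

# Joint loss: emotion supervision regularizes the shared encoder (weight is an assumption).
loss = nn.functional.cross_entropy(hate_logits, torch.tensor([0])) \
     + 0.5 * nn.functional.cross_entropy(emo_logits, torch.tensor([3]))
```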

Next-gen traffic surveillance: AI-assisted mobile traffic violation detection system

  • paper_url: http://arxiv.org/abs/2311.16179
  • repo_url: None
  • paper_authors: Dila Dede, Mehmet Ali Sarsıl, Ata Shaker, Olgu Altıntaş, Onur Ergen
  • for: This paper explores how artificial intelligence can enable accurate detection of traffic law violations in order to reduce the impact of traffic accidents.
  • methods: The paper combines computer vision and machine learning, using the YOLOv5 detection module and the StrongSORT tracking module to detect traffic violations.
  • results: The approach accurately detects six common violations: red light violations, illegal use of breakdown lanes, violations of vehicle following distance, breaches of marked crosswalk laws, illegal parking, and parking on marked crosswalks.
    Abstract Road traffic accidents pose a significant global public health concern, leading to injuries, fatalities, and vehicle damage. Approximately 1.3 million people lose their lives each year due to traffic accidents [World Health Organization, 2022]. Addressing this issue requires accurate traffic law violation detection systems to ensure adherence to regulations. The integration of Artificial Intelligence algorithms, leveraging machine learning and computer vision, has facilitated the development of precise traffic rule enforcement. This paper illustrates how computer vision and machine learning enable the creation of robust algorithms for detecting various traffic violations. Our model, capable of identifying six common traffic infractions, detects red light violations, illegal use of breakdown lanes, violations of vehicle following distance, breaches of marked crosswalk laws, illegal parking, and parking on marked crosswalks. Utilizing online traffic footage and a self-mounted on-dash camera, we apply the YOLOv5 algorithm's detection module to identify traffic agents such as cars, pedestrians, and traffic signs, and the strongSORT algorithm for continuous interframe tracking. Six discrete algorithms analyze agents' behavior and trajectory to detect violations. Subsequently, an Identification Module extracts vehicle ID information, such as the license plate, to generate violation notices sent to relevant authorities.
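
A minimal sketch of the detection-and-tracking front end described above: YOLOv5 (loaded via torch.hub) detects agents per frame, and the boxes are handed to a tracker that maintains identities across frames. The tracker here is a hypothetical placeholder standing in for StrongSORT, the input file name is assumed, and the six violation rules are not shown.

```python
# Sketch: per-frame detection with YOLOv5, then interframe tracking of the boxes.
import cv2
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s")   # pretrained COCO detector

def update_tracker(detections):
    """Hypothetical stand-in for StrongSORT: assign persistent IDs to boxes."""
    return [(i, det) for i, det in enumerate(detections)]

cap = cv2.VideoCapture("dashcam.mp4")                      # assumed input footage
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)           # model expects RGB
    results = model(rgb)
    boxes = results.xyxy[0].tolist()                       # [x1, y1, x2, y2, conf, cls]
    tracks = update_tracker(boxes)
    # Downstream: per-agent trajectories feed the six discrete violation rules.
cap.release()
```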

A Reusable AI-Enabled Defect Detection System for Railway Using Ensembled CNN

  • paper_url: http://arxiv.org/abs/2311.14824
  • repo_url: None
  • paper_authors: Rahatara Ferdousi, Fedwa Laamarti, Chunsheng Yang, Abdulmotaleb El Saddik
  • For: Improving trustworthiness in intelligent railway systems by detecting defects in rail parts.
  • Methods: Ensemble learning combined with transfer-learning models (VGG-19, MobileNetV3, and ResNet-50), trained in phases to improve defect classification accuracy and robustness.
  • Results: Experiments show better and more consistent performance than other state-of-the-art approaches, substantiating the reusability of the defect detection system for newly evolved defective rail parts.
    Abstract Accurate Defect detection is crucial for ensuring the trustworthiness of intelligent railway systems. Current approaches rely on single deep-learning models, like CNNs, which employ a large amount of data to capture underlying patterns. Training a new defect classifier with limited samples often leads to overfitting and poor performance on unseen images. To address this, researchers have advocated transfer learning and fine-tuning the pre-trained models. However, using a single backbone network in transfer learning still may cause bottleneck issues and inconsistent performance if it is not suitable for a specific problem domain. To overcome these challenges, we propose a reusable AI-enabled defect detection approach. By combining ensemble learning with transfer learning models (VGG-19, MobileNetV3, and ResNet-50), we improved the classification accuracy and achieved consistent performance at a certain phase of training. Our empirical analysis demonstrates better and more consistent performance compared to other state-of-the-art approaches. The consistency substantiates the reusability of the defect detection system for newly evolved defected rail parts. Therefore we anticipate these findings to benefit further research and development of reusable AI-enabled solutions for railway systems.
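
The ensemble-of-backbones idea can be sketched as below: three ImageNet-pretrained backbones are adapted to the defect classes and their softmax outputs are averaged at inference. The class count, soft-voting scheme, and pretrained-weights choice are illustrative assumptions (the paper's training phases and fine-tuning details are not reproduced here).

```python
# Sketch: average the predictions of three fine-tuned ImageNet backbones (soft voting).
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 4  # assumed number of rail defect categories

vgg = models.vgg19(weights="IMAGENET1K_V1")
vgg.classifier[6] = nn.Linear(4096, NUM_CLASSES)

mnet = models.mobilenet_v3_large(weights="IMAGENET1K_V1")
mnet.classifier[3] = nn.Linear(mnet.classifier[3].in_features, NUM_CLASSES)

rnet = models.resnet50(weights="IMAGENET1K_V1")
rnet.fc = nn.Linear(rnet.fc.in_features, NUM_CLASSES)

backbones = [vgg, mnet, rnet]           # each would be fine-tuned on the defect dataset
for m in backbones:
    m.eval()

@torch.no_grad()
def ensemble_predict(x):
    probs = [m(x).softmax(dim=1) for m in backbones]   # per-model class probabilities
    return torch.stack(probs).mean(dim=0)              # averaged (soft-voting) ensemble

x = torch.randn(1, 3, 224, 224)                        # placeholder rail image batch
print(ensemble_predict(x).argmax(dim=1))
```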

GeoChat: Grounded Large Vision-Language Model for Remote Sensing

  • paper_url: http://arxiv.org/abs/2311.15826
  • repo_url: https://github.com/mbzuai-oryx/geochat
  • paper_authors: Kartik Kuckreja, Muhammad Sohail Danish, Muzammal Naseer, Abhijit Das, Salman Khan, Fahad Shahbaz Khan
  • for: This paper presents a grounded large vision-language model (VLM) for remote sensing (RS), enabling dialogue about remote sensing imagery.
  • methods: The model offers multitask conversational capabilities over high-resolution RS images and can hold region-specific dialogue in response to user-specified regions.
  • results: The model shows robust zero-shot performance on RS multitask conversation benchmarks, including image and region captioning, visual question answering, scene classification, visually grounded conversations, and referring detection.
    Abstract Recent advancements in Large Vision-Language Models (VLMs) have shown great promise in natural image domains, allowing users to hold a dialogue about given visual content. However, such general-domain VLMs perform poorly for Remote Sensing (RS) scenarios, leading to inaccurate or fabricated information when presented with RS domain-specific queries. Such a behavior emerges due to the unique challenges introduced by RS imagery. For example, to handle high-resolution RS imagery with diverse scale changes across categories and many small objects, region-level reasoning is necessary alongside holistic scene interpretation. Furthermore, the lack of domain-specific multimodal instruction following data as well as strong backbone models for RS make it hard for the models to align their behavior with user queries. To address these limitations, we propose GeoChat - the first versatile remote sensing VLM that offers multitask conversational capabilities with high-resolution RS images. Specifically, GeoChat can not only answer image-level queries but also accepts region inputs to hold region-specific dialogue. Furthermore, it can visually ground objects in its responses by referring to their spatial coordinates. To address the lack of domain-specific datasets, we generate a novel RS multimodal instruction-following dataset by extending image-text pairs from existing diverse RS datasets. We establish a comprehensive benchmark for RS multitask conversations and compare with a number of baseline methods. GeoChat demonstrates robust zero-shot performance on various RS tasks, e.g., image and region captioning, visual question answering, scene classification, visually grounded conversations and referring detection. Our code is available at https://github.com/mbzuai-oryx/geochat.

Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs

  • paper_url: http://arxiv.org/abs/2311.14656
  • repo_url: https://github.com/jonathan-roberts1/charting-new-territories
  • paper_authors: Jonathan Roberts, Timo Lüddecke, Rehan Sheikh, Kai Han, Samuel Albanie
  • for: This paper explores the capabilities of multimodal large language models (MLLMs) in the geographic and geospatial domains, and evaluates their performance against open-source counterparts.
  • methods: The paper uses a small-scale geographic benchmark consisting of a suite of visual tasks to challenge the models and test their abilities across a spectrum of complexity.
  • results: The analysis reveals where the models excel and where they falter, providing a balanced view of their capabilities in the geographic domain. Additionally, the benchmark will be publicly released to enable the comparison and evaluation of future models.
    Abstract Multimodal large language models (MLLMs) have shown remarkable capabilities across a broad range of tasks but their knowledge and abilities in the geographic and geospatial domains are yet to be explored, despite potential wide-ranging benefits to navigation, environmental research, urban development, and disaster response. We conduct a series of experiments exploring various vision capabilities of MLLMs within these domains, particularly focusing on the frontier model GPT-4V, and benchmark its performance against open-source counterparts. Our methodology involves challenging these models with a small-scale geographic benchmark consisting of a suite of visual tasks, testing their abilities across a spectrum of complexity. The analysis uncovers not only where such models excel, including instances where they outperform humans, but also where they falter, providing a balanced view of their capabilities in the geographic domain. To enable the comparison and evaluation of future models, our benchmark will be publicly released.

Evaluating Large Language Models through Gender and Racial Stereotypes

  • paper_url: http://arxiv.org/abs/2311.14788
  • repo_url: None
  • paper_authors: Ananya Malik
  • for: This study investigates biases that may exist in language models, focusing on how gender and racial bias manifest in professional settings.
  • methods: The study takes a comparative approach, evaluating newer and older language models against the two kinds of bias.
  • results: Gender bias has decreased substantially in newer models, while racial bias still persists.
    Abstract Language Models have ushered in a new age of AI, gaining traction within the NLP community as well as amongst the general population. AI's ability to make predictions and generate content, and its applications in sensitive decision-making scenarios, make it even more important to study these models for possible biases that may exist and can be exaggerated. We conduct a comparative study and establish a framework to evaluate language models under the premise of two kinds of biases, gender and race, in a professional setting. We find that while gender bias has reduced immensely in newer models compared to older ones, racial bias still exists.

History Filtering in Imperfect Information Games: Algorithms and Complexity

  • paper_url: http://arxiv.org/abs/2311.14651
  • repo_url: None
  • paper_authors: Christopher Solinas, Douglas Rebstock, Nathan R. Sturtevant, Michael Buro
  • for: This paper concerns depth-limited search with value functions in imperfect information games.
  • methods: The paper analyzes the computational aspects and tractability of filtering histories for subgame decomposition, an approach with strong theoretical guarantees that can nevertheless require non-trivial computation.
  • results: Constructing a single history from the root of a subgame is generally intractable; the paper gives a necessary and sufficient condition for efficient enumeration and introduces a Markov Chain Monte Carlo-based generation algorithm that scales better in trick-taking card games such as Oh Hell.
    Abstract Historically applied exclusively to perfect information games, depth-limited search with value functions has been key to recent advances in AI for imperfect information games. Most prominent approaches with strong theoretical guarantees require subgame decomposition - a process in which a subgame is computed from public information and player beliefs. However, subgame decomposition can itself require non-trivial computations, and its tractability depends on the existence of efficient algorithms for either full enumeration or generation of the histories that form the root of the subgame. Despite this, no formal analysis of the tractability of such computations has been established in prior work, and application domains have often consisted of games, such as poker, for which enumeration is trivial on modern hardware. Applying these ideas to more complex domains requires understanding their cost. In this work, we introduce and analyze the computational aspects and tractability of filtering histories for subgame decomposition. We show that constructing a single history from the root of the subgame is generally intractable, and then provide a necessary and sufficient condition for efficient enumeration. We also introduce a novel Markov Chain Monte Carlo-based generation algorithm for trick-taking card games - a domain where enumeration is often prohibitively expensive. Our experiments demonstrate its improved scalability in the trick-taking card game Oh Hell. These contributions clarify when and how depth-limited search via subgame decomposition can be an effective tool for sequential decision-making in imperfect information settings.
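
As a rough illustration of the MCMC-based history generation idea, the sketch below runs a Metropolis-style sampler over hidden card deals: it proposes swapping two unseen cards between opponents' hands and accepts the proposal according to a consistency weight. The deal representation, proposal move, and weight function are assumptions for illustration, not the paper's algorithm.

```python
# Sketch: Metropolis-style sampling of hidden card deals consistent with observed play.
import random

def consistency_weight(deal):
    """Assumed scoring of how consistent a deal is with the public play so far."""
    return 1.0  # placeholder: uniform over legal deals

def propose(deal):
    """Swap two hidden cards between two opponents' hands."""
    new = {p: hand[:] for p, hand in deal.items()}
    a, b = random.sample(list(new), 2)
    i, j = random.randrange(len(new[a])), random.randrange(len(new[b]))
    new[a][i], new[b][j] = new[b][j], new[a][i]
    return new

def sample_deals(initial_deal, steps=1000):
    deal, w = initial_deal, consistency_weight(initial_deal)
    samples = []
    for _ in range(steps):
        cand = propose(deal)
        w_cand = consistency_weight(cand)
        if w == 0 or random.random() < min(1.0, w_cand / w):  # Metropolis acceptance
            deal, w = cand, w_cand
        samples.append(deal)
    return samples

deals = sample_deals({"opp1": ["AS", "KH", "2C"], "opp2": ["QD", "7S", "9H"]})
print(len(deals), "sampled deals")
```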

Calibrated Language Models Must Hallucinate

  • paper_url: http://arxiv.org/abs/2311.14648
  • repo_url: None
  • paper_authors: Adam Tauman Kalai, Santosh S. Vempala
  • for: This work aims to explain the hallucination phenomenon in language models and offers a statistical account of it.
  • methods: The study analyzes hallucination statistically, under a calibration condition appropriate for generative language models.
  • results: Hallucination arises from statistical properties of calibrated language models, independent of the transformer architecture or data quality; for certain arbitrary facts, hallucination is in fact necessary for a calibrated model.
    Abstract Recent language models have a mysterious tendency to generate false but plausible-sounding text. Such "hallucinations" are an obstacle to the usability of language-based AI systems and can harm people who rely upon their outputs. This work shows that there is an inherent statistical reason that pretrained language models hallucinate certain types of facts, having nothing to do with the transformer LM architecture or data quality. For "arbitrary" facts whose veracity cannot be determined from the training data, we show that hallucination is necessary for language models that satisfy a statistical calibration condition appropriate for generative language models. Specifically, if the maximum probability of any fact is bounded, we show that the probability of generating a hallucination is close to the fraction of facts that occur exactly once in the training data (a "Good-Turing" estimate), even assuming ideal training data without errors. One conclusion is that models pretrained to be sufficiently good predictors (i.e., calibrated) may require post-training to mitigate hallucinations on the type of arbitrary facts that tend to appear once in the training set. However, our analysis also suggests that there is no statistical reason that pretraining will lead to hallucination on facts that tend to appear more than once in the training data (like references to publications such as articles and books, whose hallucinations have been particularly notable and problematic) or on systematic facts (like arithmetic calculations). Therefore, different architectures and learning algorithms may mitigate these latter types of hallucinations.
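
The Good-Turing flavor of the bound is easy to illustrate: the predicted lower bound on the hallucination rate for arbitrary facts is roughly the fraction of facts that appear exactly once in the training data. A minimal computation is sketched below; the toy "facts" and their extraction are assumptions.

```python
# Sketch: Good-Turing-style estimate -- the fraction of facts seen exactly once
# approximates the unavoidable hallucination rate for arbitrary facts.
from collections import Counter

training_facts = [
    "alice-born-1970", "bob-born-1985", "bob-born-1985",
    "carol-award-2001", "dave-capital-x", "erin-born-1990",
]  # toy stand-in for facts extracted from a training corpus

counts = Counter(training_facts)
singletons = sum(1 for c in counts.values() if c == 1)
good_turing_rate = singletons / len(training_facts)

print(f"facts seen once: {singletons} of {len(training_facts)} occurrences")
print(f"estimated hallucination lower bound: {good_turing_rate:.2f}")
```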

GPT-4V Takes the Wheel: Evaluating Promise and Challenges for Pedestrian Behavior Prediction

  • paper_url: http://arxiv.org/abs/2311.14786
  • repo_url: None
  • paper_authors: Jia Huang, Peng Jiang, Alvika Gautam, Srikanth Saripalli
  • for: Predicting pedestrian behavior to improve the safety of autonomous driving.
  • methods: Uses the Large Multimodal Model GPT-4V(ision), whose visual understanding and causal reasoning capabilities can be harnessed through semi-supervised training.
  • results: Evaluated on the publicly available JAAD, PIE, and WiDEVIEW datasets, GPT-4V(ision) shows promise in zero-shot pedestrian behavior prediction and driving scene understanding, but still falls short of traditional domain-specific models and struggles with small pedestrians and vehicles in motion.
    Abstract Existing pedestrian behavior prediction methods rely primarily on deep neural networks that utilize features extracted from video frame sequences. Although these vision-based models have shown promising results, they face limitations in effectively capturing and utilizing the dynamic spatio-temporal interactions between the target pedestrian and its surrounding traffic elements, crucial for accurate reasoning. Additionally, training these models requires manually annotating domain-specific datasets, a process that is expensive, time-consuming, and difficult to generalize to new environments and scenarios. The recent emergence of Large Multimodal Models (LMMs) offers potential solutions to these limitations due to their superior visual understanding and causal reasoning capabilities, which can be harnessed through semi-supervised training. GPT-4V(ision), the latest iteration of the state-of-the-art Large-Language Model GPTs, now incorporates vision input capabilities. This report provides a comprehensive evaluation of the potential of GPT-4V for pedestrian behavior prediction in autonomous driving using publicly available datasets: JAAD, PIE, and WiDEVIEW. Quantitative and qualitative evaluations demonstrate GPT-4V(ision)'s promise in zero-shot pedestrian behavior prediction and driving scene understanding ability for autonomous driving. However, it still falls short of the state-of-the-art traditional domain-specific models. Challenges include difficulties in handling small pedestrians and vehicles in motion. These limitations highlight the need for further research and development in this area.

One Strike, You’re Out: Detecting Markush Structures in Low Signal-to-Noise Ratio Images

  • paper_url: http://arxiv.org/abs/2311.14633
  • repo_url: https://github.com/thomasjurriaans/markush-recognition-msc-thesis
  • paper_authors: Thomas Jurriaans, Kinga Szarkowska, Eric Nalisnick, Markus Schwoerer, Camilo Thorne, Saber Akhondi
  • for: This study proposes and tests a novel method for classifying Markush structures, to improve the accuracy of automated chemical information extraction from large document collections.
  • methods: Two approaches are compared: fixed-feature extraction and end-to-end learning (CNN). The end-to-end method performs significantly better, achieving a Macro F1 of 0.928 (0.035 SD) versus 0.701 (0.052 SD) for the fixed-feature method.
  • results: The proposed method filters out Markush structures effectively and accurately; implemented into OCSR pipelines, it can improve their performance for chemists.
    Abstract Modern research increasingly relies on automated methods to assist researchers. An example of this is Optical Chemical Structure Recognition (OCSR), which aids chemists in retrieving information about chemicals from large amounts of documents. Markush structures are chemical structures that cannot be parsed correctly by OCSR and cause errors. The focus of this research was to propose and test a novel method for classifying Markush structures. Within this method, a comparison was made between fixed-feature extraction and end-to-end learning (CNN). The end-to-end method performed significantly better than the fixed-feature method, achieving 0.928 (0.035 SD) Macro F1 compared to the fixed-feature method's 0.701 (0.052 SD). Because of the nature of the experiment, these figures are a lower bound and can be improved further. These results suggest that Markush structures can be filtered out effectively and accurately using the proposed method. When implemented into OCSR pipelines, this method can improve their performance and use to other researchers.

ARIA: On the interaction between Architectures, Aggregation methods and Initializations in federated visual classification

  • paper_url: http://arxiv.org/abs/2311.14625
  • repo_url: None
  • paper_authors: Vasilis Siomos, Sergio Naval-Marimont, Jonathan Passerat-Palmbach, Giacomo Tarroni
  • For: The paper investigates the effect of architecture, initialization, and aggregation (ARIA) elements on the performance of federated learning (FL) models in medical image classification tasks.
  • Methods: The paper conducts a joint ARchitecture-Initialization-Aggregation (ARIA) study, benchmarking different ARIA element combinations across a range of medical image classification tasks.
  • Results: ARIA elements should be chosen together to achieve the best possible performance; the paper also provides insights into good choices for each element depending on the task, the effect of normalization layers, and the utility of SSL pre-training.
    Abstract Federated Learning (FL) is a collaborative training paradigm that allows for privacy-preserving learning of cross-institutional models by eliminating the exchange of sensitive data and instead relying on the exchange of model parameters between the clients and a server. Despite individual studies on how client models are aggregated, and, more recently, on the benefits of ImageNet pre-training, there is a lack of understanding of the effect the architecture chosen for the federation has, and of how the aforementioned elements interconnect. To this end, we conduct the first joint ARchitecture-Initialization-Aggregation study and benchmark ARIAs across a range of medical image classification tasks. We find that, contrary to current practices, ARIA elements have to be chosen together to achieve the best possible performance. Our results also shed light on good choices for each element depending on the task, the effect of normalisation layers, and the utility of SSL pre-training, pointing to potential directions for designing FL-specific architectures and training pipelines.
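
For context on the aggregation element, the sketch below shows a standard FedAvg-style server step: a data-size-weighted average of client model parameters. This is a generic illustration of one common aggregation method, not the specific combinations benchmarked in the paper; the toy model and client sizes are assumptions.

```python
# Sketch: FedAvg-style aggregation of client model parameters on the server.
import torch
import torch.nn as nn

def fed_avg(client_states, client_sizes):
    """Weight each client's parameters by its local dataset size."""
    total = sum(client_sizes)
    avg = {}
    for key in client_states[0]:
        avg[key] = sum(
            state[key] * (n / total) for state, n in zip(client_states, client_sizes)
        )
    return avg

def make_model():
    return nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))

clients = [make_model(), make_model()]        # locally trained client models
global_model = make_model()
global_model.load_state_dict(
    fed_avg([c.state_dict() for c in clients], client_sizes=[120, 380])
)
```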

Eliciting Honest Information From Authors Using Sequential Review

  • paper_url: http://arxiv.org/abs/2311.14619
  • repo_url: None
  • paper_authors: Yichi Zhang, Grant Schoenebeck, Weijie Su
  • for: Improving the quality of papers accepted at conferences while rejecting low-quality papers.
  • methods: A sequential review mechanism that reviews an author's papers in the order of a provided ranking, conditioning the review of the next paper on the review scores of the previous papers.
  • results: 1) The mechanism truthfully elicits ranking information from authors under more realistic assumptions than prior work; 2) it improves the quality of accepted papers, reduces reviewing workload, and increases the average quality of papers being reviewed; 3) it incentivizes authors to write fewer papers of higher quality.
    Abstract In the setting of conference peer review, the conference aims to accept high-quality papers and reject low-quality papers based on noisy review scores. A recent work proposes the isotonic mechanism, which can elicit the ranking of paper qualities from an author with multiple submissions to help improve the conference's decisions. However, the isotonic mechanism relies on the assumption that the author's utility is both an increasing and a convex function with respect to the review score, which is often violated in peer review settings (e.g.~when authors aim to maximize the number of accepted papers). In this paper, we propose a sequential review mechanism that can truthfully elicit the ranking information from authors while only assuming the agent's utility is increasing with respect to the true quality of her accepted papers. The key idea is to review the papers of an author in a sequence based on the provided ranking and conditioning the review of the next paper on the review scores of the previous papers. Advantages of the sequential review mechanism include 1) eliciting truthful ranking information in a more realistic setting than prior work; 2) improving the quality of accepted papers, reducing the reviewing workload and increasing the average quality of papers being reviewed; 3) incentivizing authors to write fewer papers of higher quality.

A Survey and Analysis of Evolutionary Operators for Permutations

  • paper_url: http://arxiv.org/abs/2311.14595
  • repo_url: https://github.com/cicirello/permutation-crossover-landscape-analysis
  • paper_authors: Vincent A. Cicirello
  • for: This paper surveys evolutionary algorithms for permutation problems.
  • methods: The paper surveys evolutionary operators for permutations, including crossover and mutation operators, all implemented in the open-source Chips-n-Salsa Java library.
  • results: The crossover operators are empirically analyzed on artificial fitness landscapes isolating different permutation features.
    Abstract There are many combinatorial optimization problems whose solutions are best represented by permutations. The classic traveling salesperson seeks an optimal ordering over a set of cities. Scheduling problems often seek optimal orderings of tasks or activities. Although some evolutionary approaches to such problems utilize the bit strings of a genetic algorithm, it is more common to directly represent solutions with permutations. Evolving permutations directly requires specialized evolutionary operators. Over the years, many crossover and mutation operators have been developed for solving permutation problems with evolutionary algorithms. In this paper, we survey the breadth of evolutionary operators for permutations. We implemented all of these in Chips-n-Salsa, an open source Java library for evolutionary computation. Finally, we empirically analyze the crossover operators on artificial fitness landscapes isolating different permutation features.
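
Two classic permutation operators of the kind the survey covers are sketched below: order crossover (OX), which copies a slice from one parent and fills the remaining positions in the order they appear in the other parent, and a simple swap mutation. These are textbook operators written in Python for illustration; the survey's implementations live in the Chips-n-Salsa Java library.

```python
# Sketch: two classic evolutionary operators for permutations.
import random

def order_crossover(p1, p2):
    """OX: copy a slice of p1, then fill the remaining positions with the genes of p2
    in the order they appear after the slice (wrapping around)."""
    n = len(p1)
    i, j = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]
    kept = set(child[i:j + 1])
    order = [p2[(j + 1 + k) % n] for k in range(n) if p2[(j + 1 + k) % n] not in kept]
    slots = [(j + 1 + k) % n for k in range(n) if child[(j + 1 + k) % n] is None]
    for pos, gene in zip(slots, order):
        child[pos] = gene
    return child

def swap_mutation(perm):
    """Exchange two randomly chosen positions."""
    perm = perm[:]
    a, b = random.sample(range(len(perm)), 2)
    perm[a], perm[b] = perm[b], perm[a]
    return perm

parent1, parent2 = [0, 1, 2, 3, 4, 5], [5, 3, 1, 0, 4, 2]
print(order_crossover(parent1, parent2))
print(swap_mutation(parent1))
```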

GPT Struct Me: Probing GPT Models on Narrative Entity Extraction

  • paper_url: http://arxiv.org/abs/2311.14583
  • repo_url: https://github.com/hmosousa/gpt_struct_me
  • paper_authors: Hugo Sousa, Nuno Guimarães, Alípio Jorge, Ricardo Campos
  • for: This study evaluates the ability of two state-of-the-art language models (GPT-3 and GPT-3.5) to extract structured information from news text.
  • methods: The models are evaluated on the Text2Story Lusa dataset of Portuguese news articles, probing their ability to extract narrative entities such as events, participants, and temporal expressions; the best prompt template is selected through an ablation study over prompt components.
  • results: GPT models are competitive with out-of-the-box baseline systems, offering an all-in-one alternative for practitioners with limited resources.
    Abstract The importance of systems that can extract structured information from textual data becomes increasingly pronounced given the ever-increasing volume of text produced on a daily basis. Having a system that can effectively extract such information in an interoperable manner would be an asset for several domains, be it finance, health, or legal. Recent developments in natural language processing led to the production of powerful language models that can, to some degree, mimic human intelligence. Such effectiveness raises a pertinent question: Can these models be leveraged for the extraction of structured information? In this work, we address this question by evaluating the capabilities of two state-of-the-art language models -- GPT-3 and GPT-3.5, commonly known as ChatGPT -- in the extraction of narrative entities, namely events, participants, and temporal expressions. This study is conducted on the Text2Story Lusa dataset, a collection of 119 Portuguese news articles whose annotation framework includes a set of entity structures along with several tags and attribute values. We first select the best prompt template through an ablation study over prompt components that provide varying degrees of information on a subset of documents of the dataset. Subsequently, we use the best templates to evaluate the effectiveness of the models on the remaining documents. The results obtained indicate that GPT models are competitive with out-of-the-box baseline systems, presenting an all-in-one alternative for practitioners with limited resources. By studying the strengths and limitations of these models in the context of information extraction, we offer insights that can guide future improvements and avenues to explore in this field.
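
A hedged illustration of the kind of prompt-based extraction being probed: a template asks the model to list events, participants, and temporal expressions for a passage as structured output. The template wording, output schema, and example passage are illustrative assumptions, not the paper's actual prompts or pipeline.

```python
# Sketch: a structured-extraction prompt of the kind used to probe GPT models.
# The template and output schema are illustrative assumptions.
PROMPT_TEMPLATE = """You are an information extraction system.
Extract the narrative entities from the text below.
Return JSON with the keys "events", "participants", and "temporal_expressions".

Text:
{passage}
"""

def build_prompt(passage: str) -> str:
    return PROMPT_TEMPLATE.format(passage=passage)

passage = "O presidente visitou Lisboa na segunda-feira para assinar o acordo."
prompt = build_prompt(passage)
print(prompt)  # this string would be sent to GPT-3 / GPT-3.5 and the JSON output parsed
```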

Counting Solutions to Conjunctive Queries: Structural and Hybrid Tractability

  • paper_url: http://arxiv.org/abs/2311.14579
  • repo_url: None
  • paper_authors: Hubie Chen, Gianluigi Greco, Stefan Mengel, Francesco Scarcello
  • for: This work addresses the fundamental problem of efficiently counting the answers to conjunctive queries over a database.
  • methods: The paper introduces #-hypertree decompositions, which exploit both the structural properties of the query and properties of the database instance, such as keys or other (weaker) degree constraints.
  • results: Answers can be counted in polynomial time for classes of queries with bounded #-hypertree width; for bounded-arity queries, this property precisely delineates the frontier of tractability for the counting problem.
    Abstract Counting the number of answers to conjunctive queries is a fundamental problem in databases that, under standard assumptions, does not have an efficient solution. The issue is inherently #P-hard, extending even to classes of acyclic instances. To address this, we pinpoint tractable classes by examining the structural properties of instances and introducing the novel concept of #-hypertree decomposition. We establish the feasibility of counting answers in polynomial time for classes of queries featuring bounded #-hypertree width. Additionally, employing novel techniques from the realm of fixed-parameter computational complexity, we prove that, for bounded arity queries, the bounded #-hypertree width property precisely delineates the frontier of tractability for the counting problem. This result closes an important gap in our understanding of the complexity of such a basic problem for conjunctive queries and, equivalently, for constraint satisfaction problems (CSPs). Drawing upon #-hypertree decompositions, a ''hybrid'' decomposition method emerges. This approach leverages both the structural characteristics of the query and properties intrinsic to the input database, including keys or other (weaker) degree constraints that limit the permissible combinations of values. Intuitively, these features may introduce distinct structural properties that elude identification through the ''worst-possible database'' perspective inherent in purely structural methods.

RAISE – Radiology AI Safety, an End-to-end lifecycle approach

  • paper_url: http://arxiv.org/abs/2311.14570
  • repo_url: None
  • paper_authors: M. Jorge Cardoso, Julia Moosbauer, Tessa S. Cook, B. Selnur Erdal, Brad Genereaux, Vikash Gupta, Bennett A. Landman, Tiarna Lee, Parashkev Nachev, Elanchezhian Somasundaram, Ronald M. Summers, Khaled Younis, Sebastien Ourselin, Franz MJ Pfister
  • for: This paper discusses how artificial intelligence (AI) can be applied in radiology to improve clinical care and efficiency while carefully mitigating potential risks.
  • methods: The paper advocates rigorous pre-deployment evaluation and validation to ensure models meet the highest standards of safety, effectiveness, and efficacy; input and output guardrails during production usage to catch individual failures; and continuous post-deployment monitoring to track data drift, fairness, and value delivery over time.
  • results: The paper emphasizes quality assurance at multiple levels (regulatory, clinical, technical, and ethical) and collaboration among healthcare systems, industry, academia, and government, so that developers can earn the trust of providers and patients and AI can be scaled responsibly in radiology.
    Abstract The integration of AI into radiology introduces opportunities for improved clinical care provision and efficiency but it demands a meticulous approach to mitigate potential risks as with any other new technology. Beginning with rigorous pre-deployment evaluation and validation, the focus should be on ensuring models meet the highest standards of safety, effectiveness and efficacy for their intended applications. Input and output guardrails implemented during production usage act as an additional layer of protection, identifying and addressing individual failures as they occur. Continuous post-deployment monitoring allows for tracking population-level performance (data drift), fairness, and value delivery over time. Scheduling reviews of post-deployment model performance and educating radiologists about new algorithmic-driven findings is critical for AI to be effective in clinical practice. Recognizing that no single AI solution can provide absolute assurance even when limited to its intended use, the synergistic application of quality assurance at multiple levels - regulatory, clinical, technical, and ethical - is emphasized. Collaborative efforts between stakeholders spanning healthcare systems, industry, academia, and government are imperative to address the multifaceted challenges involved. Trust in AI is an earned privilege, contingent on a broad set of goals, among them transparently demonstrating that the AI adheres to the same rigorous safety, effectiveness and efficacy standards as other established medical technologies. By doing so, developers can instil confidence among providers and patients alike, enabling the responsible scaling of AI and the realization of its potential benefits. The roadmap presented herein aims to expedite the achievement of deployable, reliable, and safe AI in radiology.

Electric Vehicles coordination for grid balancing using multi-objective Harris Hawks Optimization

  • paper_url: http://arxiv.org/abs/2311.14563
  • repo_url: None
  • paper_authors: Cristina Bianca Pop, Tudor Cioara, Viorica Chifu, Ionut Anghel, Francesco Bellesini
  • for: This work proposes an EV fleet coordination model for the day ahead, using EVs to store surplus energy and discharge it during energy deficits in order to keep the local grid supply reliable and stable.
  • methods: The optimization problem is solved with Harris Hawks Optimization (HHO), considering grid energy balancing, time-of-use preferences, and the location of EV drivers; EV charging and discharging schedules are adjusted through exploration and exploitation operations while ensuring technical and operational feasibility.
  • results: Coordinated EV charging and discharging meets the balancing service requirements while staying close to user preferences, with only minimal deviations.
    Abstract The rise of renewables coincides with the shift towards Electrical Vehicles (EVs) posing technical and operational challenges for the energy balance of the local grid. Nowadays, the energy grid cannot deal with a spike in EVs usage leading to a need for more coordinated and grid aware EVs charging and discharging strategies. However, coordinating power flow from multiple EVs into the grid requires sophisticated algorithms and load-balancing strategies as the complexity increases with more control variables and EVs, necessitating large optimization and decision search spaces. In this paper, we propose an EVs fleet coordination model for the day ahead aiming to ensure a reliable energy supply and maintain a stable local grid, by utilizing EVs to store surplus energy and discharge it during periods of energy deficit. The optimization problem is addressed using Harris Hawks Optimization (HHO) considering criteria related to energy grid balancing, time usage preference, and the location of EV drivers. The EVs schedules, associated with the position of individuals from the population, are adjusted through exploration and exploitation operations, and their technical and operational feasibility is ensured, while the rabbit individual is updated with a non-dominated EV schedule selected per iteration using a roulette wheel algorithm. The solution is evaluated within the framework of an e-mobility service in Terni city. The results indicate that coordinated charging and discharging of EVs not only meet balancing service requirements but also align with user preferences with minimal deviations.

Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models

  • paper_url: http://arxiv.org/abs/2311.14552
  • repo_url: https://github.com/jefferyzhan/griffon
  • paper_authors: Yufei Zhan, Yousong Zhu, Zhiyang Chen, Fan Yang, Ming Tang, Jinqiao Wang
  • for: This work investigates whether existing large vision-language models (LVLMs) possess basic object perception capabilities, and how to improve their performance across different localization scenarios.
  • methods: It builds on existing LVLMs, using a curated language-prompted localization dataset and precise location-aware tasks so the model can accurately identify and locate objects.
  • results: LVLMs can indeed perform basic object perception, and the proposed language-prompted localization dataset improves performance across localization scenarios.
    Abstract Replicating the innate human ability to detect all objects based on free-form texts at any granularity remains a formidable challenge for Vision-Language models. Current Large Vision Language Models (LVLMs) are predominantly constrained to grounding a single, pre-existing object, relying solely on data from Referring Expression Comprehension tasks. The limitation leads to a compromise in model design, necessitating the introduction of visual expert models or the integration of customized head structures. Beyond these constraints, our research delves into the untapped potential of LVLMs and uncover their inherent capability for basic object perception, allowing them to accurately identify and locate objects of interest. Building on this insight, we introduce a novel language-prompted localization dataset designed to fully unleash the capabilities of LVLMs in integrating fine-grained object perception with precise location awareness. More importantly, we present $\textbf{Griffon}$, a purely LVLM-based baseline, which does not require the introduction of any special tokens, expert models, or additional detection modules. It simply maintains a consistent structure with popular LVLMs by unifying data formats across various localization-related scenarios and is trained end-to-end through a well-designed pipeline. Comprehensive experiments demonstrate that $\textbf{Griffon}$ not only achieves state-of-the-art performance on the fine-grained RefCOCO series but also approaches the capabilities of the expert model Faster RCNN on the detection benchmark MSCOCO.

Inferring Latent Class Statistics from Text for Robust Visual Few-Shot Learning

  • paper_url: http://arxiv.org/abs/2311.14544
  • repo_url: https://github.com/ybendou/fs-text2stats
  • paper_authors: Yassir Bendou, Vincent Gripon, Bastien Pasdeloup, Giulia Lioi, Lukas Mauch, Fabien Cardinaux, Ghouthi Boukli Hacene
  • for: Improving cross-domain robustness in few-shot learning.
  • methods: Uses text-derived statistics to predict the mean and covariance of the visual feature distribution for each class.
  • results: Incorporating both mean and covariance statistics improves few-shot classification performance across multiple datasets.
    Abstract In the realm of few-shot learning, foundation models like CLIP have proven effective but exhibit limitations in cross-domain robustness especially in few-shot settings. Recent works add text as an extra modality to enhance the performance of these models. Most of these approaches treat text as an auxiliary modality without fully exploring its potential to elucidate the underlying class visual features distribution. In this paper, we present a novel approach that leverages text-derived statistics to predict the mean and covariance of the visual feature distribution for each class. This predictive framework enriches the latent space, yielding more robust and generalizable few-shot learning models. We demonstrate the efficacy of incorporating both mean and covariance statistics in improving few-shot classification performance across various datasets. Our method shows that we can use text to predict the mean and covariance of the distribution offering promising improvements in few-shot learning scenarios.
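
A minimal sketch of the idea: a small network maps a class-name text embedding to a predicted mean and (diagonal) covariance for that class's visual feature distribution, and query images are then scored under the resulting Gaussians. The dimensions, the diagonal-covariance simplification, and the scoring rule are assumptions for illustration.

```python
# Sketch: predict per-class visual mean/covariance from text, then classify queries
# under the resulting Gaussians (diagonal covariance for simplicity).
import torch
import torch.nn as nn

TEXT_DIM, VIS_DIM, N_CLASSES = 512, 640, 5

class TextToStats(nn.Module):
    def __init__(self):
        super().__init__()
        self.mean_head = nn.Linear(TEXT_DIM, VIS_DIM)
        self.logvar_head = nn.Linear(TEXT_DIM, VIS_DIM)  # log-variance for stability

    def forward(self, text_emb):
        return self.mean_head(text_emb), self.logvar_head(text_emb)

predictor = TextToStats()
text_embeddings = torch.randn(N_CLASSES, TEXT_DIM)   # e.g. class-name embeddings
query_features = torch.randn(8, VIS_DIM)             # visual features of query images

mu, logvar = predictor(text_embeddings)              # (C, D) each
var = logvar.exp()

# Gaussian log-likelihood of each query under each class distribution.
diff = query_features[:, None, :] - mu[None, :, :]   # (Q, C, D)
log_prob = -0.5 * ((diff ** 2) / var + logvar).sum(dim=-1)
print(log_prob.argmax(dim=1))                        # predicted class per query
```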

Data-Efficient Alignment of Large Language Models with Human Feedback Through Natural Language

  • paper_url: http://arxiv.org/abs/2311.14543
  • repo_url: None
  • paper_authors: Di Jin, Shikib Mehri, Devamanyu Hazarika, Aishwarya Padmakumar, Sungjin Lee, Yang Liu, Mahdi Namazifar
  • for: This work investigates the data efficiency of aligning large language models (LLMs), fine-tuning on 1000 records of human feedback or fewer.
  • methods: An open-source LLM such as Falcon-40B-Instruct is fine-tuned on a small amount of human feedback in natural language, in the form of critiques and revisions of responses.
  • results: The resulting model improves the responses of even strong LLMs such as ChatGPT, BARD, and Vicuna: one round of revising ChatGPT responses yields a 56.6% win rate over the originals, which rises to 65.9% after five rounds of revision.
    Abstract Learning from human feedback is a prominent technique to align the output of large language models (LLMs) with human expectations. Reinforcement learning from human feedback (RLHF) leverages human preference signals that are in the form of ranking of response pairs to perform this alignment. However, human preference on LLM outputs can come in much richer forms including natural language, which may provide detailed feedback on strengths and weaknesses of a given response. In this work we investigate data efficiency of modeling human feedback that is in natural language. Specifically, we fine-tune an open-source LLM, e.g., Falcon-40B-Instruct, on a relatively small amount (1000 records or even less) of human feedback in natural language in the form of critiques and revisions of responses. We show that this model is able to improve the quality of responses from even some of the strongest LLMs such as ChatGPT, BARD, and Vicuna, through critique and revision of those responses. For instance, through one iteration of revision of ChatGPT responses, the revised responses have 56.6% win rate over the original ones, and this win rate can be further improved to 65.9% after applying the revision for five iterations.

RDF Stream Taxonomy: Systematizing RDF Stream Types in Research and Practice

  • paper_url: http://arxiv.org/abs/2311.14540
  • repo_url: None
  • paper_authors: Piotr Sowinski, Pawel Szmeja, Maria Ganzha, Marcin Paprzycki
  • for: This work addresses a critical research gap by systematizing RDF stream types found in the literature, proposing a novel taxonomy (RDF-STaX) to support RDF streaming research and practice.
  • methods: The taxonomy is embodied in an OWL 2 DL ontology that follows the FAIR principles, with extensive documentation and additional resources to foster adoption; two realized use cases demonstrate the resource's usefulness for discussing research works and annotating streaming datasets.
  • results: The work also delivers a novel nanopublications dataset that serves as a collaborative, living state-of-the-art review of RDF streaming; RDF-STaX aims to drive innovation in RDF streaming by fostering scientific discussion, cooperation, and tool interoperability.
    Abstract Over the years, RDF streaming was explored in research and practice from many angles, resulting in a wide range of RDF stream definitions. This variety presents a major challenge in discussing and integrating streaming solutions, due to the lack of a common language. This work attempts to address this critical research gap, by systematizing RDF stream types present in the literature in a novel taxonomy. The proposed RDF Stream Taxonomy (RDF-STaX) is embodied in an OWL 2 DL ontology that follows the FAIR principles, making it readily applicable in practice. Extensive documentation and additional resources are provided, to foster the adoption of the ontology. Two realized use cases are presented, demonstrating the usefulness of the resource in discussing research works and annotating streaming datasets. Another result of this contribution is the novel nanopublications dataset, which serves as a collaborative, living state-of-the-art review of RDF streaming. The aim of RDF-STaX is to address a real need of the community for a better way to systematize and describe RDF streams. The resource is designed to help drive innovation in RDF streaming, by fostering scientific discussion, cooperation, and tool interoperability.

CMed-GPT: Prompt Tuning for Entity-Aware Chinese Medical Dialogue Generation

  • paper_url: http://arxiv.org/abs/2311.14539
  • repo_url: None
  • paper_authors: Zhijie Qu, Juan Li, Zerui Ma, Jianqiang Li
  • for: This work aims to advance Chinese medical dialogue generation to meet the needs of online medical consultations.
  • methods: The work proposes CMed-GPT, a GPT pretrained language model based on Chinese medical-domain text, available in base and large versions; lexical and entity embeddings are incorporated into the dialogue text in a uniform manner for downstream dialogue generation.
  • results: Fine-tuning and p-tuning lower CMed-GPT's perplexity from 8.44 to 7.35, demonstrating strong performance in generating Chinese medical text and showing that incorporating external information improves dialogue quality.
    Abstract Medical dialogue generation relies on natural language generation techniques to enable online medical consultations. Recently, the widespread adoption of large-scale models in the field of natural language processing has facilitated rapid advancements in this technology. Existing medical dialogue models are mostly based on BERT and pre-trained on English corpora, but there is a lack of high-performing models on the task of Chinese medical dialogue generation. To solve the above problem, this paper proposes CMed-GPT, which is the GPT pre-training language model based on Chinese medical domain text. The model is available in two versions, namely, base and large, with corresponding perplexity values of 8.64 and 8.01. Additionally, we incorporate lexical and entity embeddings into the dialogue text in a uniform manner to meet the requirements of downstream dialogue generation tasks. By applying both fine-tuning and p-tuning to CMed-GPT, we lowered the PPL from 8.44 to 7.35. This study not only confirms the exceptional performance of the CMed-GPT model in generating Chinese biomedical text but also highlights the advantages of p-tuning over traditional fine-tuning with prefix prompts. Furthermore, we validate the significance of incorporating external information in medical dialogue generation, which enhances the quality of dialogue generation.

Digital Twin-Native AI-Driven Service Architecture for Industrial Networks

  • paper_url: http://arxiv.org/abs/2311.14532
  • repo_url: None
  • paper_authors: Kubra Duran, Matthew Broadbent, Gokhan Yurdakul, Berk Canberk
  • for: 满足大规模物联网网络的智能管理需求,例如精准监测与学习能力。
  • methods: 提出 DT 原生(DT-native)的 AI 驱动服务架构,实现了基于 TCP 的数据流水线与基于强化学习(RL)的学习模型。
  • results: 对Internet of Vehicles(IoV)网络进行应用,实现了约30%的处理时间减少,并测试了不同学习率组合的actor和critic网络性能。
    Abstract The dramatic increase in the connectivity demand results in an excessive amount of Internet of Things (IoT) sensors. To meet the management needs of these large-scale networks, such as accurate monitoring and learning capabilities, Digital Twin (DT) is the key enabler. However, current attempts regarding DT implementations remain insufficient due to the perpetual connectivity requirements of IoT networks. Furthermore, the sensor data streaming in IoT networks cause higher processing time than traditional methods. In addition to these, the current intelligent mechanisms cannot perform well due to the spatiotemporal changes in the implemented IoT network scenario. To handle these challenges, we propose a DT-native AI-driven service architecture in support of the concept of IoT networks. Within the proposed DT-native architecture, we implement a TCP-based data flow pipeline and a Reinforcement Learning (RL)-based learner model. We apply the proposed architecture to one of the broad concepts of IoT networks, the Internet of Vehicles (IoV). We measure the efficiency of our proposed architecture and note ~30% processing time-saving thanks to the TCP-based data flow pipeline. Moreover, we test the performance of the learner model by applying several learning rate combinations for actor and critic networks and highlight the most successive model.
    摘要 连接需求的急剧增长导致物联网(IoT)传感器数量激增。为满足此类大规模网络的管理需求(如精准监测与学习能力),数字孪生(DT)是关键使能技术。然而,由于 IoT 网络持续的连接需求,现有的 DT 实现尝试仍显不足;IoT 网络中的传感器数据流也使处理时间高于传统方法。此外,由于所部署 IoT 网络场景中的时空变化,现有智能机制难以取得良好表现。为应对这些挑战,我们提出了一种支持 IoT 网络概念的 DT 原生 AI 驱动服务架构。在该架构中,我们实现了基于 TCP 的数据流水线和基于强化学习(RL)的学习模型,并将其应用于 IoT 网络的一个典型场景——车联网(IoV)。我们测量了该架构的效率,得益于基于 TCP 的数据流水线,处理时间节省约 30%。此外,我们为 actor 与 critic 网络测试了多种学习率组合,以评估学习模型的性能,并指出了表现最佳的模型。
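
The learning-rate sweep over actor and critic networks mentioned above can be pictured with a minimal sketch in which the two networks get independent optimizers; the grid values and network sizes are assumptions, not the paper's settings.

```python
import itertools
import torch
import torch.nn as nn

def make_actor_critic(obs_dim: int, act_dim: int, lr_actor: float, lr_critic: float):
    """Actor and critic with independent learning rates (illustrative sketch)."""
    actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))
    critic = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 1))
    opt_actor = torch.optim.Adam(actor.parameters(), lr=lr_actor)
    opt_critic = torch.optim.Adam(critic.parameters(), lr=lr_critic)
    return actor, critic, opt_actor, opt_critic

# Sweep learning-rate combinations, as the abstract describes (grid values assumed).
for lr_a, lr_c in itertools.product([1e-4, 3e-4, 1e-3], [1e-4, 3e-4, 1e-3]):
    actor, critic, opt_a, opt_c = make_actor_critic(obs_dim=10, act_dim=4,
                                                    lr_actor=lr_a, lr_critic=lr_c)
    # ... train in the DT-based environment and log the best-performing pair ...
```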

FRAD: Front-Running Attacks Detection on Ethereum using Ternary Classification Model

  • paper_url: http://arxiv.org/abs/2311.14514
  • repo_url: None
  • paper_authors: Yuheng Zhang, Pin Liu, Guojun Wang, Peiqiang Li, Wanyi Gu, Houji Chen, Xuelei Liu, Jinyao Zhu
  • for: 本研究旨在提供一种准确地检测ETHEREUM上的前播攻击方法,以保护交易安全性。
  • methods: 该研究提出了一种基于ternary分类模型的FRAD(前播攻击检测模型),可以准确地分类ETHEREUM上的交易活动,并对交易执行进行检测和分类。
  • results: 实验结果表明,使用多层感知器(MLP)分类器可以达到84.59%的检测精度和84.60%的F1分数,表明FRAD模型可以高效地检测前播攻击。
    Abstract With the evolution of blockchain technology, the issue of transaction security, particularly on platforms like Ethereum, has become increasingly critical. Front-running attacks, a unique form of security threat, pose significant challenges to the integrity of blockchain transactions. In these attack scenarios, malicious actors monitor other users' transaction activities, then strategically submit their own transactions with higher fees. This ensures their transactions are executed before the monitored transactions are included in the block. The primary objective of this paper is to delve into a comprehensive classification of transactions associated with front-running attacks, which aims to equip developers with specific strategies to counter each type of attack. To achieve this, we introduce a novel detection method named FRAD (Front-Running Attacks Detection on Ethereum using Ternary Classification Model). This method is specifically tailored for transactions within decentralized applications (DApps) on Ethereum, enabling accurate classification of front-running attacks involving transaction displacement, insertion, and suppression. Our experimental validation reveals that the Multilayer Perceptron (MLP) classifier offers the best performance in detecting front-running attacks, achieving an impressive accuracy rate of 84.59% and F1-score of 84.60%.
    摘要 The primary objective of this paper is to classify transactions associated with front-running attacks in a comprehensive manner, equipping developers with specific strategies to counter each type of attack. To achieve this, we propose a novel detection method called FRAD (Front-Running Attacks Detection on Ethereum using Ternary Classification Model), which is specifically tailored for transactions within decentralized applications (DApps) on Ethereum. This method can accurately classify front-running attacks involving transaction displacement, insertion, and suppression. Our experimental validation shows that the Multilayer Perceptron (MLP) classifier offers the best performance in detecting front-running attacks, with an impressive accuracy rate of 84.59% and F1-score of 84.60%.
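
A minimal sketch of the ternary-classification step reported above, using scikit-learn's MLP classifier on placeholder transaction features; the feature set, labels, and network size are stand-ins, not the FRAD pipeline.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical transaction features (gas price, position in block, ...) and
# ternary labels: 0 = displacement, 1 = insertion, 2 = suppression.
X = np.random.rand(1000, 8)
y = np.random.randint(0, 3, size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred))
print("macro F1:", f1_score(y_te, pred, average="macro"))
```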

StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization

  • paper_url: http://arxiv.org/abs/2311.14495
  • repo_url: None
  • paper_authors: Shida Wang, Qianxiao Li
  • for: 这 paper 研究了 state-space models (SSMs) 的长期记忆学习能力,从参数化的角度出发。
  • methods: 这 paper 使用了一种批量梯度下降法来优化 SSMs,并提出了一种类型的重parameterization 技术来解决 SSMs 的记忆限制。
  • results: 这 paper 发现,使用重parameterization 技术可以解决 SSMs 的记忆限制,并且可以提高其近似能力和优化稳定性。
    Abstract In this paper, we investigate the long-term memory learning capabilities of state-space models (SSMs) from the perspective of parameterization. We prove that state-space models without any reparameterization exhibit a memory limitation similar to that of traditional RNNs: the target relationships that can be stably approximated by state-space models must have an exponential decaying memory. Our analysis identifies this "curse of memory" as a result of the recurrent weights converging to a stability boundary, suggesting that a reparameterization technique can be effective. To this end, we introduce a class of reparameterization techniques for SSMs that effectively lift its memory limitations. Besides improving approximation capabilities, we further illustrate that a principled choice of reparameterization scheme can also enhance optimization stability. We validate our findings using synthetic datasets and language models.
    摘要 在这篇论文中,我们从参数化的角度研究了状态空间模型(SSM)的长期记忆学习能力。我们证明,未经任何重参数化的状态空间模型表现出与传统 RNN 类似的记忆限制:能够被其稳定逼近的目标关系必须具有指数衰减的记忆。我们的分析将这种"记忆诅咒"归因于循环权重收敛到稳定性边界,这表明重参数化技术可以发挥作用。为此,我们提出了一类能够有效解除 SSM 记忆限制的重参数化技术。除了提升逼近能力之外,我们还说明合理选择重参数化方案同样能够增强优化的稳定性。我们在合成数据集和语言模型上验证了这些发现。
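
One way to picture a stable reparameterization is to map an unconstrained parameter into the open unit interval so the recurrent eigenvalues can never reach the stability boundary. The sketch below uses exp(-softplus(w)) as an illustrative choice; the paper's exact reparameterization class may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StableDiagonalSSM(nn.Module):
    """Diagonal linear SSM h_t = lambda * h_{t-1} + B x_t, y_t = C h_t,
    with lambda = exp(-softplus(w)) so 0 < lambda < 1 for any real w."""
    def __init__(self, input_dim: int, state_dim: int):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(state_dim))      # unconstrained parameter
        self.B = nn.Linear(input_dim, state_dim, bias=False)
        self.C = nn.Linear(state_dim, input_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, input_dim)
        lam = torch.exp(-F.softplus(self.w))                # always inside the unit interval
        h = torch.zeros(x.size(0), self.w.numel(), device=x.device)
        ys = []
        for t in range(x.size(1)):
            h = lam * h + self.B(x[:, t])
            ys.append(self.C(h))
        return torch.stack(ys, dim=1)
```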

Sliding Window FastEdit: A Framework for Lesion Annotation in Whole-body PET Images

  • paper_url: http://arxiv.org/abs/2311.14482
  • repo_url: https://github.com/matt3o/autopet2-submission
  • paper_authors: Matthias Hadlich, Zdravko Marinov, Moon Kim, Enrico Nasca, Jens Kleesiek, Rainer Stiefelhagen
  • for: 这篇论文主要针对的是解决核医影像中疾病分类的准确性问题,但是需要大量的手动 voxel 注释来训练。
  • methods: 该论文提出了 SW-FastEdit 交互分割框架,通过只需要一些用户点击来加速分割,而不是原来的 voxelwise 注释。
  • results: 该模型在 AutoPET 数据集上优于现有的非滑动窗口交互模型,并能泛化到此前未见的 HECKTOR 数据集。用户研究表明,只需约 10 次点击迭代即可获得高质量预测,且 NASA-TLX 感知工作负荷较低。
    Abstract Deep learning has revolutionized the accurate segmentation of diseases in medical imaging. However, achieving such results requires training with numerous manual voxel annotations. This requirement presents a challenge for whole-body Positron Emission Tomography (PET) imaging, where lesions are scattered throughout the body. To tackle this problem, we introduce SW-FastEdit - an interactive segmentation framework that accelerates the labeling by utilizing only a few user clicks instead of voxelwise annotations. While prior interactive models crop or resize PET volumes due to memory constraints, we use the complete volume with our sliding window-based interactive scheme. Our model outperforms existing non-sliding window interactive models on the AutoPET dataset and generalizes to the previously unseen HECKTOR dataset. A user study revealed that annotators achieve high-quality predictions with only 10 click iterations and a low perceived NASA-TLX workload. Our framework is implemented using MONAI Label and is available: https://github.com/matt3o/AutoPET2-Submission/
    摘要 深度学习已经革新了医学影像中疾病的精准分割。然而,要取得这样的结果需要大量人工逐体素(voxel)标注来训练。对于全身正电子发射断层扫描(PET)成像而言,这一要求是一大挑战,因为病灶散布于全身。为解决这一问题,我们提出了 SW-FastEdit——一个交互式分割框架,只需少量用户点击即可加速标注,而无需逐体素标注。以往的交互式模型因内存限制需要裁剪或缩放 PET 体数据,而我们基于滑动窗口的交互方案可以使用完整体数据。我们的模型在 AutoPET 数据集上优于现有的非滑动窗口交互模型,并能泛化到此前未见的 HECKTOR 数据集。用户研究表明,标注者只需约 10 次点击迭代即可获得高质量预测,且 NASA-TLX 感知工作负荷较低。我们的框架基于 MONAI Label 实现,可在 GitHub 获取:https://github.com/matt3o/AutoPET2-Submission/
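
A generic sliding-window inference loop (not the MONAI Label implementation) conveys the idea of covering the full PET volume patch by patch and averaging overlapping predictions; window and stride sizes are placeholders.

```python
import numpy as np

def sliding_window_inference(volume, predict_patch, window=(128, 128, 128), stride=(64, 64, 64)):
    """Run a patch-wise segmentation model over a whole 3D volume and average
    overlapping predictions. `predict_patch` maps a patch to per-voxel scores.
    Border regions not aligned with the stride are omitted for brevity."""
    out = np.zeros_like(volume, dtype=np.float32)
    counts = np.zeros_like(volume, dtype=np.float32)
    Z, Y, X = volume.shape
    for z in range(0, max(Z - window[0], 0) + 1, stride[0]):
        for y in range(0, max(Y - window[1], 0) + 1, stride[1]):
            for x in range(0, max(X - window[2], 0) + 1, stride[2]):
                sl = (slice(z, z + window[0]), slice(y, y + window[1]), slice(x, x + window[2]))
                out[sl] += predict_patch(volume[sl])
                counts[sl] += 1.0
    return out / np.maximum(counts, 1.0)
```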

Evolutionary game theory: the mathematics of evolution and collective behaviours

  • paper_url: http://arxiv.org/abs/2311.14480
  • repo_url: None
  • paper_authors: The Anh Han
  • for: 这篇论文旨在探讨演化博弈论作为研究集体行为演化的强大且统一的数学工具。
  • methods: 这些研究方向使用演化博弈论方法,包括分析随机演化博弈中(稳定)均衡数量的统计性质,以及对技术发展竞赛中安全行为的演化及先进 AI 技术所带来风险的建模。
  • results: 这些研究得到了若干有趣的结论,例如:随机演化博弈中稳定均衡数量的统计性质呈现特定规律,而在 AI 技术发展竞赛中,安全行为的演化有助于降低技术发展带来的风险。
    Abstract This brief discusses evolutionary game theory as a powerful and unified mathematical tool to study evolution of collective behaviours. It summarises some of my recent research directions using evolutionary game theory methods, which include i) the analysis of statistical properties of the number of (stable) equilibria in a random evolutionary game, and ii) the modelling of safety behaviours' evolution and the risk posed by advanced Artificial Intelligence technologies in a technology development race. Finally, it includes an outlook and some suggestions for future researchers.
    摘要 本简报讨论了演化博弈论作为研究集体行为演化的强大而统一的数学工具,并概述了作者近期基于该方法的研究方向,包括:(1) 随机演化博弈中(稳定)均衡数量统计性质的分析;(2) 技术发展竞赛中安全行为的演化以及先进人工智能技术所带来风险的建模。最后给出了展望以及对未来研究者的一些建议。
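
To make the evolutionary-game machinery concrete, the sketch below integrates replicator dynamics for a random payoff matrix; inspecting the resting points hints at the kind of equilibrium analysis summarized above, though it is not the paper's procedure.

```python
import numpy as np

def replicator_dynamics(A, x0, dt=0.01, steps=20000):
    """Integrate dx_i/dt = x_i * ((A x)_i - x^T A x) with forward Euler."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        fitness = A @ x
        avg = x @ fitness
        x = x + dt * x * (fitness - avg)
        x = np.clip(x, 0.0, None)
        x = x / x.sum()
    return x

rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n))                       # random evolutionary game payoffs
x_star = replicator_dynamics(A, np.full(n, 1.0 / n))
print("resting point:", np.round(x_star, 3))      # surviving strategies at equilibrium
```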

MRxaI: Black-Box Explainability for Image Classifiers in a Medical Setting

  • paper_url: http://arxiv.org/abs/2311.14471
  • repo_url: None
  • paper_authors: Nathan Blake, Hana Chockler, David A. Kelly, Santiago Calderon Pena, Akchunya Chanchal
  • for: 这 paper 是关于解释静止图像分类器输出的研究,尤其是针对医疗领域中的 MRI 图像。
  • methods: 这 paper 使用了多种黑盒方法,包括 causal explainability-based rex,以及其他一些常见的黑盒方法。
  • results: 研究发现,大多数黑盒方法不适合解释医疗领域中的静止图像分类结果,而 causal explainability-based rex 则能够与 gradcam 相比,表现很好。
    Abstract Existing tools for explaining the output of image classifiers can be divided into white-box, which rely on access to the model internals, and black-box, agnostic to the model. As the usage of AI in the medical domain grows, so too does the usage of explainability tools. Existing work on medical image explanations focuses on white-box tools, such as gradcam. However, there are clear advantages to switching to a black-box tool, including the ability to use it with any classifier and the wide selection of black-box tools available. On standard images, black-box tools are as precise as white-box. In this paper we compare the performance of several black-box methods against gradcam on a brain cancer MRI dataset. We demonstrate that most black-box tools are not suitable for explaining medical image classifications and present a detailed analysis of the reasons for their shortcomings. We also show that one black-box tool, a causal explainability-based rex, performs as well as \gradcam.
    摘要 现有的图像分类器解释工具可以分为白盒和黑盒两类,其中白盒工具需要对模型内部具有访问权,而黑盒工具则不依赖于模型。随着医疗领域中的AI应用的广泛使用,解释工具的使用也在不断增长。现有的医疗图像解释工作主要关注白盒工具,如gradcam。然而,使用黑盒工具有很多优点,包括可以与任何分类器结合使用,并且有很多黑盒工具可供选择。在标准图像上,黑盒工具的精度与白盒工具相当。在这篇论文中,我们比较了多个黑盒方法与gradcam的性能,并发现大多数黑盒工具不适用于医疗图像分类的解释,并提供了详细的分析原因。此外,我们还发现一种黑盒工具,基于 causal explainability 的 rex,与 gradcam 的性能相当。
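
A simple example of the black-box family being compared above is occlusion sensitivity: mask image patches and record how much the classifier's score drops. This is an illustrative baseline only, not the rex or gradcam implementations evaluated in the paper.

```python
import numpy as np

def occlusion_saliency(image, score_fn, patch=16, baseline=0.0):
    """Black-box saliency: how much does masking each patch lower score_fn(image)?
    `score_fn` is any classifier callable returning the target-class probability."""
    H, W = image.shape[:2]
    base_score = score_fn(image)
    saliency = np.zeros((H // patch, W // patch))
    for i in range(0, H - patch + 1, patch):
        for j in range(0, W - patch + 1, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = baseline
            saliency[i // patch, j // patch] = base_score - score_fn(occluded)
    return saliency
```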

How to ensure a safe control strategy? Towards a SRL for urban transit autonomous operation

  • paper_url: http://arxiv.org/abs/2311.14457
  • repo_url: None
  • paper_authors: Zicong Zhao
  • for: This paper proposes a framework for safe and efficient decision-making in urban rail transit autonomous operation using deep reinforcement learning.
  • methods: The proposed framework combines linear temporal logic, reinforcement learning, and Monte Carlo tree search, and consists of four main modules: a post-posed shielding, a searching tree module, a DRL framework, and an additional actor.
  • results: The proposed framework can meet speed constraints, schedule constraints, and optimize the operation process, and its effectiveness is demonstrated through an ablation experiment and comparison with the scheduled operation plan.
    Abstract Deep reinforcement learning has gradually shown its latent decision-making ability in urban rail transit autonomous operation. However, since reinforcement learning can not neither guarantee safety during learning nor execution, this is still one of the major obstacles to the practical application of reinforcement learning. Given this drawback, reinforcement learning applied in the safety-critical autonomous operation domain remains challenging without generating a safe control command sequence that avoids overspeed operations. Therefore, a SSA-DRL framework is proposed in this paper for safe intelligent control of urban rail transit autonomous operation trains. The proposed framework is combined with linear temporal logic, reinforcement learning and Monte Carlo tree search and consists of four mainly module: a post-posed shielding, a searching tree module, a DRL framework and an additional actor. Furthermore, the output of the framework can meet speed constraint, schedule constraint and optimize the operation process. Finally, the proposed SSA-DRL framework for decision-making in urban rail transit autonomous operation is evaluated in sixteen different sections, and its effectiveness is demonstrated through an ablation experiment and comparison with the scheduled operation plan.
    摘要 深度强化学习逐渐显示出其在城市轨道交通自动驾驶运行中的潜在决策能力。然而,由于强化学习既无法在学习过程中也无法在执行过程中保证安全,这仍是其实际应用的主要障碍之一;在安全攸关的自动驾驶运行领域,若不能生成避免超速运行的安全控制指令序列,应用强化学习依然充满挑战。因此,本文提出了一个 SSA-DRL 框架,用于城市轨道交通自动驾驶列车的安全智能控制。该框架结合了线性时序逻辑、强化学习与蒙特卡洛树搜索,包含四个主要模块:后置屏蔽(post-posed shielding)、搜索树模块、DRL 框架以及一个附加 actor。框架的输出能够满足速度约束与时刻表约束,并优化运行过程。最后,本文在 16 个不同区段上评估了所提出的 SSA-DRL 决策框架,并通过消融实验以及与既定运行计划的比较验证了其有效性。
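
The post-posed shielding idea can be pictured as a safety filter applied after the learned policy: the proposed command is kept only if the next-step speed stays under the section limit. The sketch below is a simplified toy that ignores the LTL specification, search tree, and schedule constraints.

```python
def shielded_action(speed, proposed_accel, speed_limit, dt=1.0, max_brake=-1.0):
    """Post-posed shield (simplified): override the learned action whenever it
    would push the train above the section speed limit at the next step."""
    next_speed = speed + proposed_accel * dt
    if next_speed <= speed_limit:
        return proposed_accel                      # safe: keep the RL action
    # unsafe: fall back to the strongest command that still respects the limit
    safe_accel = (speed_limit - speed) / dt
    return max(safe_accel, max_brake)

# Example: 78 km/h, limit 80 km/h, policy proposes +4 km/h per step -> shield caps it at +2.
print(shielded_action(speed=78.0, proposed_accel=4.0, speed_limit=80.0))
```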

Universal Jailbreak Backdoors from Poisoned Human Feedback

  • paper_url: http://arxiv.org/abs/2311.14455
  • repo_url: https://github.com/ethz-spylab/rlhf-poisoning
  • paper_authors: Javier Rando, Florian Tramèr
  • for: 这 paper 研究通过人类反馈强化学习(RLHF)对齐的大语言模型所面临的新型安全威胁,即训练数据投毒攻击。
  • methods: 这 paper 考察了攻击者通过在 RLHF 训练数据中投毒,将"越狱后门"植入模型的可能性。
  • results: 研究发现,通用越狱后门可以在任意提示中加入触发词后让模型生成有害回答;这种后门比以往研究中的后门更强大,同时也更难以用常见后门攻击技术植入。
    Abstract Reinforcement Learning from Human Feedback (RLHF) is used to align large language models to produce helpful and harmless responses. Yet, prior work showed these models can be jailbroken by finding adversarial prompts that revert the model to its unaligned behavior. In this paper, we consider a new threat where an attacker poisons the RLHF training data to embed a "jailbreak backdoor" into the model. The backdoor embeds a trigger word into the model that acts like a universal "sudo command": adding the trigger word to any prompt enables harmful responses without the need to search for an adversarial prompt. Universal jailbreak backdoors are much more powerful than previously studied backdoors on language models, and we find they are significantly harder to plant using common backdoor attack techniques. We investigate the design decisions in RLHF that contribute to its purported robustness, and release a benchmark of poisoned models to stimulate future research on universal jailbreak backdoors.
    摘要 基于人类反馈的强化学习(RLHF)被用来对齐大语言模型,使其生成有用且无害的回复。然而,已有工作表明,通过寻找对抗性提示可以"越狱"这些模型,使其回到未对齐的行为。在这篇论文中,我们考虑一种新的威胁:攻击者对 RLHF 训练数据投毒,从而在模型中植入"越狱后门"。该后门在模型中嵌入一个触发词,其作用类似于通用的"sudo 命令":只要在任意提示中加入触发词即可诱发有害回复,而无需搜索对抗性提示。这种通用越狱后门比此前研究的语言模型后门强大得多,而且我们发现使用常见的后门攻击技术很难将其植入。我们研究了 RLHF 中促成其所谓鲁棒性的设计决策,并发布了一组被投毒模型的基准,以推动对通用越狱后门的后续研究。

Deep Learning for Automatic Strain Quantification in Arrhythmogenic Right Ventricular Cardiomyopathy

  • paper_url: http://arxiv.org/abs/2311.14448
  • repo_url: None
  • paper_authors: Laura Alvarez-Florez, Jörg Sander, Mimount Bourfiss, Fleur V. Y. Tjong, Birgitta K. Velthuis, Ivana Išgum
  • for: The paper aims to develop an automatic method for quantifying cardiac motion in arrhythmogenic right ventricular cardiomyopathy (ARVC) diagnosis using cine Cardiac Magnetic Resonance Imaging (CMRI).
  • methods: The method uses Implicit Neural Representations (INRs) and a biomechanically informed regularization inspired by the myocardial incompressibility assumption to register CMRIs from different time points of the cardiac cycle. It also includes a rigid registration guided by the long-axis views to rectify inter-slice misalignment and an unsupervised deep learning super-resolution approach to increase the through-plane resolution.
  • results: The proposed method significantly improves registration performance compared to using a single view or a single-frame registration method. It quantifies global and segmental strain over a cardiac cycle and computes the peak strain; the significant differences in peak strain between ARVC patients and healthy controls suggest that automated motion quantification methods may assist in diagnosis and deepen the understanding of disease-specific alterations of cardiac motion.
    Abstract Quantification of cardiac motion with cine Cardiac Magnetic Resonance Imaging (CMRI) is an integral part of arrhythmogenic right ventricular cardiomyopathy (ARVC) diagnosis. Yet, the expert evaluation of motion abnormalities with CMRI is a challenging task. To automatically assess cardiac motion, we register CMRIs from different time points of the cardiac cycle using Implicit Neural Representations (INRs) and perform a biomechanically informed regularization inspired by the myocardial incompressibility assumption. To enhance the registration performance, our method first rectifies the inter-slice misalignment inherent to CMRI by performing a rigid registration guided by the long-axis views, and then increases the through-plane resolution using an unsupervised deep learning super-resolution approach. Finally, we propose to synergically combine information from short-axis and 4-chamber long-axis views, along with an initialization to incorporate information from multiple cardiac time points. Thereafter, to quantify cardiac motion, we calculate global and segmental strain over a cardiac cycle and compute the peak strain. The evaluation of the method is performed on a dataset of cine CMRI scans from 47 ARVC patients and 67 controls. Our results show that inter-slice alignment and generation of super-resolved volumes combined with joint analysis of the two cardiac views, notably improves registration performance. Furthermore, the proposed initialization yields more physiologically plausible registrations. The significant differences in the peak strain, discerned between the ARVC patients and healthy controls suggest that automated motion quantification methods may assist in diagnosis and provide further understanding of disease-specific alterations of cardiac motion.
    摘要 利用电影序列心脏磁共振成像(cine CMRI)量化心脏运动是致心律失常性右室心肌病(ARVC)诊断的重要组成部分。然而,依靠专家在 CMRI 上评估运动异常是一项具有挑战性的任务。为自动评估心脏运动,我们使用隐式神经表示(INRs)对心动周期不同时间点的 CMRI 进行配准,并引入受心肌不可压缩性假设启发的生物力学正则化。为提升配准性能,我们的方法首先借助长轴视图进行刚性配准,以校正 CMRI 固有的层间错位,随后利用无监督深度学习超分辨率方法提高贯穿平面(层间)分辨率。最后,我们提出协同结合短轴视图与四腔心长轴视图的信息,并通过初始化引入多个心动时相的信息。在此基础上,我们在一个心动周期内计算全局与节段应变,并求取峰值应变。该方法在 47 名 ARVC 患者与 67 名健康对照的 cine CMRI 数据集上进行了评估。结果表明,层间对齐、超分辨率体数据生成以及两种心脏视图的联合分析显著提升了配准性能;所提出的初始化也带来了更符合生理学的配准结果。ARVC 患者与健康对照之间峰值应变的显著差异表明,自动化运动量化方法有望辅助诊断,并有助于进一步理解疾病特异性的心脏运动改变。
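
The strain quantities mentioned above are derived from the registered displacement field; one textbook formulation is the Green-Lagrange strain E = ½(FᵀF − I) with F = I + ∇u. The 2D numpy sketch below uses that standard definition as an assumption; the paper's global/segmental strain conventions may differ.

```python
import numpy as np

def green_lagrange_strain(u, spacing=(1.0, 1.0)):
    """u: displacement field of shape (2, H, W); returns E of shape (H, W, 2, 2)."""
    du_dy, du_dx = np.gradient(u[0], *spacing)     # first displacement component
    dv_dy, dv_dx = np.gradient(u[1], *spacing)     # second displacement component
    # Deformation gradient F = I + grad(u), per pixel.
    F = np.empty(u.shape[1:] + (2, 2))
    F[..., 0, 0] = 1 + du_dx
    F[..., 0, 1] = du_dy
    F[..., 1, 0] = dv_dx
    F[..., 1, 1] = 1 + dv_dy
    return 0.5 * (np.einsum('...ki,...kj->...ij', F, F) - np.eye(2))

u = np.random.randn(2, 64, 64) * 0.1               # toy displacement field
E = green_lagrange_strain(u)
peak_strain = np.abs(E[..., 0, 0]).max()           # simplistic "peak strain" proxy
```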

GCPV: Guided Concept Projection Vectors for the Explainable Inspection of CNN Feature Spaces

  • paper_url: http://arxiv.org/abs/2311.14435
  • repo_url: None
  • paper_authors: Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, Korinna Bade
  • for: 本研究旨在提高计算机视觉深度神经网络(CNN)的解释性和可读性,以便人工检查学习的潜在表示。
  • methods: 本研究使用的方法包括:将给定的自然语言语义概念与 CNN 潜空间中的表示向量或区域进行全局关联,以及层次聚类。
  • results: 研究结果显示,引入了本地到全局导向概念向量(GCPV)方法后,对象检测器的性能得到了改进,并且可以具有多层概念向量的好处和强健性。此外,GCPV可以用于找到混淆的概念之根本原因,并且可以揭示概念水平的异常值。
    Abstract For debugging and verification of computer vision convolutional deep neural networks (CNNs) human inspection of the learned latent representations is imperative. Therefore, state-of-the-art eXplainable Artificial Intelligence (XAI) methods globally associate given natural language semantic concepts with representing vectors or regions in the CNN latent space supporting manual inspection. Yet, this approach comes with two major disadvantages: They are locally inaccurate when reconstructing a concept label and discard information about the distribution of concept instance representations. The latter, though, is of particular interest for debugging, like finding and understanding outliers, learned notions of sub-concepts, and concept confusion. Furthermore, current single-layer approaches neglect that information about a concept may be spread over the CNN depth. To overcome these shortcomings, we introduce the local-to-global Guided Concept Projection Vectors (GCPV) approach: It (1) generates local concept vectors that each precisely reconstruct a concept segmentation label, and then (2) generalizes these to global concept and even sub-concept vectors by means of hiearchical clustering. Our experiments on object detectors demonstrate improved performance compared to the state-of-the-art, the benefit of multi-layer concept vectors, and robustness against low-quality concept segmentation labels. Finally, we demonstrate that GCPVs can be applied to find root causes for confusion of concepts like bus and truck, and reveal interesting concept-level outliers. Thus, GCPVs pose a promising step towards interpretable model debugging and informed data improvement.
    摘要 为了调试与验证计算机视觉卷积深度神经网络(CNN),人工检查其学习到的潜在表示必不可少。因此,最先进的可解释人工智能(XAI)方法将给定的自然语言语义概念与 CNN 潜空间中的表示向量或区域进行全局关联,以支持人工检查。然而,这种做法有两个主要缺点:在重建概念标签时局部不准确,并且丢弃了概念实例表示分布的信息。而后者对调试尤为重要,例如发现并理解离群点、已学习的子概念以及概念混淆。此外,现有的单层方法忽略了概念信息可能分布在 CNN 的不同深度上。为克服这些不足,我们提出了局部到全局的引导概念投影向量(GCPV)方法:(1) 生成能够精确重建概念分割标签的局部概念向量;(2) 通过层次聚类将其泛化为全局概念向量乃至子概念向量。我们在目标检测器上的实验表明,该方法优于现有最先进方法,体现了多层概念向量的优势,并对低质量概念分割标签具有鲁棒性。最后,我们展示了 GCPV 可用于找出公交车与卡车等概念混淆的根本原因,并揭示有趣的概念级离群点。因此,GCPV 是迈向可解释模型调试与有据数据改进的有前景的一步。
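
The local-to-global step, where many local concept vectors are merged into sub-concept and global vectors by hierarchical clustering, can be sketched with scipy; how the local vectors are obtained here (random placeholders) is purely illustrative and not the GCPV fitting procedure.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Assume each row is a local concept vector fitted on one sample's activations.
local_vectors = np.random.randn(200, 512)

# Agglomerative clustering in the CNN feature space (Ward linkage is one choice).
Z = linkage(local_vectors, method="ward")
labels = fcluster(Z, t=4, criterion="maxclust")    # e.g. 4 sub-concept clusters

# Sub-concept vectors = cluster means; the global concept vector = mean of all.
sub_concept_vectors = np.stack([local_vectors[labels == c].mean(axis=0)
                                for c in np.unique(labels)])
global_concept_vector = local_vectors.mean(axis=0)
```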

Learning to Cooperate and Communicate Over Imperfect Channels

  • paper_url: http://arxiv.org/abs/2311.14770
  • repo_url: None
  • paper_authors: Jannis Weil, Gizem Ekinci, Heinz Koeppl, Tobias Meuser
  • for: 提高多代理系统中代理之间协作性,特别在部分可见情况下。
  • methods: 使用独立 Q 学习算法,使智能体能够根据本地观察和信道特性,通过发送不同大小的消息动态调整共享的信息量,并学习对消息进行编码与解码。
  • results: 在一个新的协作数字预测环境中,我们的方法优于不具备自适应能力的方法,同时讨论了其在交通路口环境中的局限性。
    Abstract Information exchange in multi-agent systems improves the cooperation among agents, especially in partially observable settings. In the real world, communication is often carried out over imperfect channels. This requires agents to handle uncertainty due to potential information loss. In this paper, we consider a cooperative multi-agent system where the agents act and exchange information in a decentralized manner using a limited and unreliable channel. To cope with such channel constraints, we propose a novel communication approach based on independent Q-learning. Our method allows agents to dynamically adapt how much information to share by sending messages of different sizes, depending on their local observations and the channel's properties. In addition to this message size selection, agents learn to encode and decode messages to improve their jointly trained policies. We show that our approach outperforms approaches without adaptive capabilities in a novel cooperative digit-prediction environment and discuss its limitations in the traffic junction environment.
    摘要 多智能体系中的信息交换可以提高智能体之间的合作,特别是在部分可见的设定下。在实际世界中,通信经常通过不可靠的通道进行。这需要智能体处理通信中的uncertainty,以适应可能的信息损失。在这篇论文中,我们考虑了一个合作多智能体系统,在这个系统中,智能体在分布式的方式进行行动和信息交换,使用有限和不可靠的通道进行通信。为了应对这种通道的限制,我们提出了一种新的通信方法,基于独立Q学习。我们的方法允许智能体在本地观察和通道的性质基础上选择发送信息的大小,并且学习编码和解码消息以提高其共同训练的策略。我们的方法在一个新的合作数字预测环境中表现出了超过非适应方法的优越性,并且对交通拐点环境中的局限性进行了讨论。

Human-Machine Cooperative Multimodal Learning Method for Cross-subject Olfactory Preference Recognition

  • paper_url: http://arxiv.org/abs/2311.14426
  • repo_url: None
  • paper_authors: Xiuxin Xia, Yuchen Guo, Yanwei Wang, Yuchao Yang, Yan Shi, Hong Men
  • for: 这种研究是为了开发一种跨个体嗅觉喜好认知方法,以便在食品、服装、化妆品等领域进行嗅觉评估。
  • methods: 这种方法使用电子鼻(E-nose)与嗅觉脑电图(EEG)的多模态学习方法,以实现跨个体嗅觉喜好识别。
  • results: 研究结果表明,该方法可以在24名参与者中实现跨个体嗅觉喜好认知,并且认知效果比现有方法更高。此外,该方法的优势在于可以准确地捕捉嗅觉信息和个体情感信息,因此有很好的应用前景在实际嗅觉评估中。
    Abstract Odor sensory evaluation has a broad application in food, clothing, cosmetics, and other fields. Traditional artificial sensory evaluation has poor repeatability, and the machine olfaction represented by the electronic nose (E-nose) is difficult to reflect human feelings. Olfactory electroencephalogram (EEG) contains odor and individual features associated with human olfactory preference, which has unique advantages in odor sensory evaluation. However, the difficulty of cross-subject olfactory EEG recognition greatly limits its application. It is worth noting that E-nose and olfactory EEG are more advantageous in representing odor information and individual emotions, respectively. In this paper, an E-nose and olfactory EEG multimodal learning method is proposed for cross-subject olfactory preference recognition. Firstly, the olfactory EEG and E-nose multimodal data acquisition and preprocessing paradigms are established. Secondly, a complementary multimodal data mining strategy is proposed to effectively mine the common features of multimodal data representing odor information and the individual features in olfactory EEG representing individual emotional information. Finally, the cross-subject olfactory preference recognition is achieved in 24 subjects by fusing the extracted common and individual features, and the recognition effect is superior to the state-of-the-art recognition methods. Furthermore, the advantages of the proposed method in cross-subject olfactory preference recognition indicate its potential for practical odor evaluation applications.
    摘要 气味感官评估在食品、服装、化妆品等领域有广泛的应用。传统的人工感官评估重复性差,而以电子鼻(E-nose)为代表的机器嗅觉难以反映人类感受。嗅觉脑电图(EEG)包含气味信息和与人类嗅觉偏好相关的个体特征,在气味感官评估中具有独特优势。然而,跨个体嗅觉 EEG 识别的困难极大限制了其应用。值得注意的是,E-nose 与嗅觉 EEG 分别更善于表达气味信息与个体情感。本文提出了一种 E-nose 与嗅觉 EEG 多模态学习方法,用于跨个体嗅觉偏好识别。首先,建立了嗅觉 EEG 与 E-nose 多模态数据采集与预处理范式;其次,提出了互补的多模态数据挖掘策略,有效挖掘表达气味信息的多模态共同特征以及嗅觉 EEG 中表达个体情感信息的个体特征;最后,通过融合所提取的共同特征与个体特征,在 24 名被试上实现了跨个体嗅觉偏好识别,识别效果优于现有最先进方法。此外,该方法在跨个体嗅觉偏好识别上的优势表明其在实际气味评估应用中具有潜力。

AdaDiff: Adaptive Step Selection for Fast Diffusion

  • paper_url: http://arxiv.org/abs/2311.14768
  • repo_url: None
  • paper_authors: Hui Zhang, Zuxuan Wu, Zhen Xing, Jie Shao, Yu-Gang Jiang
  • for: 提高 diffusion 模型中的渲染速度,适应不同输入文本的质量。
  • methods: 引入 AdaDiff 框架,学习实例特定的步骤使用策略,并使用政策梯度法优化。
  • results: 在三个图像生成和两个视频生成 benchmark 上,与使用固定 50 个去噪步骤的基线取得相近的视觉质量,同时将推理时间至少降低 33%,最高达 40%。
    Abstract Diffusion models, as a type of generative models, have achieved impressive results in generating images and videos conditioned on textual conditions. However, the generation process of diffusion models involves denoising for dozens of steps to produce photorealistic images/videos, which is computationally expensive. Unlike previous methods that design ``one-size-fits-all'' approaches for speed up, we argue denoising steps should be sample-specific conditioned on the richness of input texts. To this end, we introduce AdaDiff, a lightweight framework designed to learn instance-specific step usage policies, which are then used by the diffusion model for generation. AdaDiff is optimized using a policy gradient method to maximize a carefully designed reward function, balancing inference time and generation quality. We conduct experiments on three image generation and two video generation benchmarks and demonstrate that our approach achieves similar results in terms of visual quality compared to the baseline using a fixed 50 denoising steps while reducing inference time by at least 33%, going as high as 40%. Furthermore, our qualitative analysis shows that our method allocates more steps to more informative text conditions and fewer steps to simpler text conditions.
    摘要 扩散模型作为一类生成模型,在以文本为条件生成图像和视频方面取得了令人瞩目的成果。然而,扩散模型的生成过程需要进行数十步去噪才能产出逼真的图像或视频,计算开销很大。与以往设计"一刀切"式加速方案的方法不同,我们认为去噪步数应当依据输入文本的信息丰富程度按样本确定。为此,我们提出了 AdaDiff,一个轻量级框架,用于学习样本特定的步数使用策略,再交由扩散模型在生成时采用。AdaDiff 通过策略梯度方法优化一个精心设计的奖励函数,在推理时间与生成质量之间取得平衡。我们在三个图像生成和两个视频生成基准上进行了实验,结果表明,与使用固定 50 个去噪步骤的基线相比,我们的方法取得了相近的视觉质量,同时将推理时间至少降低 33%,最高达 40%。此外,定性分析显示,我们的方法会为信息更丰富的文本条件分配更多步数,为更简单的文本条件分配更少步数。
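
A hedged sketch of the step-selection idea: a tiny policy network looks at the prompt embedding, samples a denoising budget, and is updated with REINFORCE on a reward that trades quality against steps. The candidate step set, reward shape, and dimensions are assumptions, not AdaDiff's actual design.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

STEP_CHOICES = torch.tensor([10, 20, 30, 40, 50])      # candidate denoising budgets

policy = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, len(STEP_CHOICES)))
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

def reinforce_update(prompt_emb, quality_fn, time_penalty=0.01):
    """One policy-gradient step: reward = generation quality - penalty * steps.
    `quality_fn(steps)` is assumed to run the diffusion model and score the output."""
    dist = Categorical(logits=policy(prompt_emb))
    idx = dist.sample()
    steps = STEP_CHOICES[idx]
    reward = quality_fn(steps) - time_penalty * steps.float()
    loss = -dist.log_prob(idx) * reward
    opt.zero_grad()
    loss.backward()
    opt.step()
    return steps.item(), float(reward)
```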

LLamol: A Dynamic Multi-Conditional Generative Transformer for De Novo Molecular Design

  • paper_url: http://arxiv.org/abs/2311.14407
  • repo_url: https://github.com/fraunhofer-scai/llamol
  • paper_authors: Niklas Dobberstein, Astrid Maass, Jan Hamaekers
  • for: 本研究目的是开发一种基于 Transformer 架构的生成模型,用于探索有机化学空间,并寻找可能具有电活性的分子。
  • methods: 本研究使用了一种新的训练方法,称为“随机上下文学习”,以最大化模型的灵活性和可靠性。模型可以处理单个和多个条件的有机分子生成,并可以包含数字和/或字符序列在生成过程中。
  • results: 研究表明,LLamol模型可以生成有效的有机分子结构,并且可以随意地包含数字和/或字符序列进行生成。模型在各种场景中都表现非常满意。
    Abstract Generative models have demonstrated substantial promise in Natural Language Processing (NLP) and have found application in designing molecules, as seen in General Pretrained Transformer (GPT) models. In our efforts to develop such a tool for exploring the organic chemical space in search of potentially electro-active compounds, we present "LLamol", a single novel generative transformer model based on the LLama 2 architecture, which was trained on a 13M superset of organic compounds drawn from diverse public sources. To allow for a maximum flexibility in usage and robustness in view of potentially incomplete data, we introduce "Stochastic Context Learning" as a new training procedure. We demonstrate that the resulting model adeptly handles single- and multi-conditional organic molecule generation with up to four conditions, yet more are possible. The model generates valid molecular structures in SMILES notation while flexibly incorporating three numerical and/or one token sequence into the generative process, just as requested. The generated compounds are very satisfactory in all scenarios tested. In detail, we showcase the model's capability to utilize token sequences for conditioning, either individually or in combination with numerical properties, making LLamol a potent tool for de novo molecule design, easily expandable with new properties.
    摘要 生成模型在自然语言处理(NLP)领域展现出巨大潜力,并已被应用于分子设计,如 GPT 类模型所示。为了开发一个用于探索有机化学空间、寻找潜在电活性化合物的工具,我们提出了"LLamol"——一种基于 LLama 2 架构的生成 Transformer 模型,其训练数据为来自多个公开来源、规模约 1300 万的有机化合物超集。为在数据可能不完整的情况下获得最大的使用灵活性与鲁棒性,我们引入了新的训练方法"随机上下文学习"(Stochastic Context Learning)。实验表明,该模型能够出色地处理单条件与多条件(最多四个条件,且可扩展更多)的有机分子生成,并可按需在生成过程中灵活地融入三个数值属性和/或一个 token 序列,生成的分子均为有效的 SMILES 结构。在所有测试场景中,生成的化合物都令人满意。我们还展示了模型可以单独或与数值属性结合使用 token 序列进行条件控制,这使 LLamol 成为从头分子设计的有力工具,并可方便地扩展新属性。
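
The "Stochastic Context Learning" idea, training under randomly dropped conditioning fields so that any subset of conditions works at inference time, can be sketched with a simple masking step; field names, the keep probability, and the null token are assumptions for illustration.

```python
import random

def sample_context(conditions, keep_prob=0.5, null_token="<none>"):
    """Randomly drop each conditioning field so the model sees every subset of
    numerical properties / token sequences during training (illustrative)."""
    return {name: (value if random.random() < keep_prob else null_token)
            for name, value in conditions.items()}

example = {"logp": 2.3, "molwt": 180.2, "sascore": 3.1, "scaffold": "c1ccccc1"}
print(sample_context(example))   # e.g. {'logp': 2.3, 'molwt': '<none>', ...}
```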

Prototype of deployment of Federated Learning with IoT devices

  • paper_url: http://arxiv.org/abs/2311.14401
  • repo_url: None
  • paper_authors: Pablo García Santaclara, Ana Fernández Vilas, Rebeca P. Díaz Redondo
  • for: 本研究旨在提出一种基于联邦学习的解决方案,帮助物联网(IoT)设备在不违反数据保护法规的前提下协同学习并改进模型性能。
  • methods: 本研究使用联邦学习技术,并在 Raspberry Pi 板上实现了一个联邦学习解决方案的原型。
  • results: 研究结果显示,联邦学习解决方案在部分情况下达不到传统方法的性能水平,但能够在保护敏感数据以及恶劣环境下进行有效学习。
    Abstract In the age of technology, data is an increasingly important resource. This importance is growing in the field of Artificial Intelligence (AI), where sub fields such as Machine Learning (ML) need more and more data to achieve better results. Internet of Things (IoT) is the connection of sensors and smart objects to collect and exchange data, in addition to achieving many other tasks. A huge amount of the resource desired, data, is stored in mobile devices, sensors and other Internet of Things (IoT) devices, but remains there due to data protection restrictions. At the same time these devices do not have enough data or computational capacity to train good models. Moreover, transmitting, storing and processing all this data on a centralised server is problematic. Federated Learning (FL) provides an innovative solution that allows devices to learn in a collaborative way. More importantly, it accomplishes this without violating data protection laws. FL is currently growing, and there are several solutions that implement it. This article presents a prototype of a FL solution where the IoT devices used were raspberry pi boards. The results compare the performance of a solution of this type with those obtained in traditional approaches. In addition, the FL solution performance was tested in a hostile environment. A convolutional neural network (CNN) and a image data set were used. The results show the feasibility and usability of these techniques, although in many cases they do not reach the performance of traditional approaches.
    摘要 在技术时代,数据已成为日益重要的资源。在人工智能(AI)领域,机器学习(ML)等子领域需要越来越多的数据才能取得更好的效果。物联网(IoT)通过连接传感器与智能设备来收集和交换数据,并完成许多其他任务。大量人们所需的资源——数据——存储在移动设备、传感器及其他 IoT 设备中,却因数据保护限制而滞留在本地;与此同时,这些设备自身的数据量和计算能力又不足以训练出好的模型。此外,将所有数据传输、存储并在中央服务器上处理也存在诸多问题。联邦学习(FL)提供了一种创新的解决方案,允许设备以协作的方式学习,更重要的是,它在不违反数据保护法规的前提下实现了这一点。FL 目前正在快速发展,已有多种实现方案。本文介绍了一个以 Raspberry Pi 板作为 IoT 设备的 FL 解决方案原型,将其性能与传统方法进行了比较,并在恶劣环境下测试了该 FL 方案的表现。实验使用了卷积神经网络(CNN)和图像数据集。结果表明了这些技术的可行性与实用性,尽管在许多情况下其性能仍未达到传统方法的水平。
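
The aggregation at the heart of such a prototype is federated averaging: clients train locally and the server averages their parameters weighted by local data size. The sketch below shows that step generically; it is not the paper's actual stack.

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Weighted average of per-client parameter dicts (FedAvg aggregation step)."""
    total = float(sum(client_sizes))
    keys = client_weights[0].keys()
    return {k: sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
            for k in keys}

# Toy round with two Raspberry-Pi-like clients holding 600 and 400 samples.
c1 = {"conv.w": np.ones((3, 3)), "fc.w": np.full((4,), 2.0)}
c2 = {"conv.w": np.zeros((3, 3)), "fc.w": np.full((4,), 4.0)}
global_weights = fed_avg([c1, c2], client_sizes=[600, 400])
print(global_weights["fc.w"])    # [2.8 2.8 2.8 2.8]
```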

Low-Cost HEM with Arduino and Zigbee Technologies in the Energy Sector in Colombia

  • paper_url: http://arxiv.org/abs/2311.14767
  • repo_url: None
  • paper_authors: Zurisaddai de la Cruz Severiche Maury, Ana Fernandez Vilas, Rebeca Diaz Redondo
  • for: 降低家庭电力消耗
  • methods: 使用低成本家庭能源管理系统(HEMS)监测家庭常用设备的用电量,让用户可以分别查看每台设备的消耗,从而制定降低家庭用电量的策略。
  • results: 通过在由典型家庭设备组成的测试台上进行每周用电量测量评估,发现安装该 HEMS 后用电量降低了 27%,表明低成本系统也能实现良好的节电效果。
    Abstract Since no solutions have been proposed in Colombia that seek to reduce the consumption of electricity at the residential level, this paper describes the design and implementation of a simple prototype of a low-cost home energy management system (HEMS). The objective of this plat-form is to monitor the energy consumption of typical household devices so that users can access the consumption of each device separately and then establish the strategy that allows them to reduce energy consumption at home. In order to demonstrate that our system is viable, the system has been evaluated by measuring weekly energy consumption with the on-line and off-line HEMS using a test bench with typical household devices in a Sincelejo typical household. The evaluation has shown that with the installation of this HEMS, consumption is reduced by 27%. This shows that it is possible to achieve a good reduction percentage with a low-cost system.
    摘要 鉴于哥伦比亚尚未提出旨在降低住宅用电量的解决方案,本文描述了一个简单的低成本家庭能源管理系统(HEMS)原型的设计与实现。该平台的目标是监测家庭常用设备的能耗,使用户能够分别查看每台设备的消耗,进而制定降低家庭用电量的策略。为证明系统的可行性,我们在位于 Sincelejo 的一户典型家庭中,使用由典型家用设备组成的测试台,分别以在线和离线方式的 HEMS 测量每周能耗。评估结果显示,安装该 HEMS 后能耗降低了 27%,表明低成本系统同样可以取得良好的节能比例。

Directly Attention Loss Adjusted Prioritized Experience Replay

  • paper_url: http://arxiv.org/abs/2311.14390
  • repo_url: None
  • paper_authors: Zhuoying Chen, Huiping Li, Zhaoxu Wang
  • for: 提高强化学习算法的训练效率和稳定性
  • methods: 使用 Directly Attention Loss Adjusted Prioritized Experience Replay (DALAP) 方法,通过并行自注意力网络直接量化分布偏移的程度,从而准确补偿估计偏差
  • results: 将其分别与基于值函数、基于策略梯度以及多智能体强化学习算法集成,实验表明其兼具加快收敛速度与降低训练方差的优势
    Abstract Prioritized Experience Replay (PER) enables the model to learn more about relatively important samples by artificially changing their accessed frequencies. However, this non-uniform sampling method shifts the state-action distribution that is originally used to estimate Q-value functions, which brings about the estimation deviation. In this article, an novel off policy reinforcement learning training framework called Directly Attention Loss Adjusted Prioritized Experience Replay (DALAP) is proposed, which can directly quantify the changed extent of the shifted distribution through Parallel Self-Attention network, so as to accurately compensate the error. In addition, a Priority-Encouragement mechanism is designed simultaneously to optimize the sample screening criterion, and further improve the training efficiency. In order to verify the effectiveness and generality of DALAP, we integrate it with the value-function based, the policy-gradient based and multi-agent reinforcement learning algorithm, respectively. The multiple groups of comparative experiments show that DALAP has the significant advantages of both improving the convergence rate and reducing the training variance.
    摘要 优先经验回放(PER)通过人为改变样本被访问的频率,使模型能够更多地学习相对重要的样本。然而,这种非均匀采样方式会改变原本用于估计 Q 值函数的状态-动作分布,从而带来估计偏差。本文提出了一种新的离策略(off-policy)强化学习训练框架——Directly Attention Loss Adjusted Prioritized Experience Replay(DALAP),它借助并行自注意力网络直接量化分布偏移的程度,从而准确补偿该误差。同时,我们设计了优先级鼓励(Priority-Encouragement)机制来优化样本筛选准则,进一步提升训练效率。为验证 DALAP 的有效性与通用性,我们将其分别与基于值函数、基于策略梯度以及多智能体强化学习算法相结合。多组对比实验表明,DALAP 兼具加快收敛速度与降低训练方差的显著优势。
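
DALAP builds on standard Prioritized Experience Replay; for reference, the vanilla PER sampling step with priorities p^alpha and importance-sampling weights is sketched below. The attention-based compensation and Priority-Encouragement mechanism themselves are not reproduced here.

```python
import numpy as np

def per_sample(priorities, batch_size, alpha=0.6, beta=0.4):
    """Standard PER: sample indices with P(i) proportional to p_i^alpha and return
    importance-sampling weights w_i = (N * P(i))^-beta, normalized by their max."""
    p = np.asarray(priorities, dtype=np.float64) ** alpha
    probs = p / p.sum()
    idx = np.random.choice(len(probs), size=batch_size, p=probs)
    weights = (len(probs) * probs[idx]) ** (-beta)
    return idx, weights / weights.max()

idx, w = per_sample(priorities=np.random.rand(10000) + 1e-3, batch_size=32)
```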

Potential Societal Biases of ChatGPT in Higher Education: A Scoping Review

  • paper_url: http://arxiv.org/abs/2311.14381
  • repo_url: None
  • paper_authors: Ming Li, Ariunaa Enkhtur, Beverley Anne Yamamoto, Fei Cheng
  • for: This scoping review aims to examine the ethical issues involved in the use of ChatGPT and other Generative Artificial Intelligence (GAI) models in higher education settings, particularly the potential biases that may be inherited or amplified.
  • methods: The review searches for academic articles written in English, Chinese, and Japanese across four main databases concerned with GAI usage in higher education and bias.
  • results: The majority of articles touch on “bias” at a relatively superficial level, with few identifying the types of bias that may occur under what circumstances or discussing the possible implications for higher education, staff, faculty members, or students. There is a notable lack of empirical work in this area, and the review calls for more research to be conducted.
    Abstract ChatGPT and other Generative Artificial Intelligence (GAI) models tend to inherit and even amplify prevailing societal biases as they are trained on large amounts of existing data. Given the increasing usage of ChatGPT and other GAI by students, faculty members, and staff in higher education institutions (HEIs), there is an urgent need to examine the ethical issues involved such as its potential biases. In this scoping review, we clarify the ways in which biases related to GAI in higher education settings have been discussed in recent academic publications and identify what type of potential biases are commonly reported in this body of literature. We searched for academic articles written in English, Chinese, and Japanese across four main databases concerned with GAI usage in higher education and bias. Our findings show that while there is an awareness of potential biases around large language models (LLMs) and GAI, the majority of articles touch on ``bias'' at a relatively superficial level. Few identify what types of bias may occur under what circumstances. Neither do they discuss the possible implications for the higher education, staff, faculty members, or students. There is a notable lack of empirical work at this point, and we call for higher education researchers and AI experts to conduct more research in this area.
    摘要 ChatGPT 及其他生成式人工智能(GAI)模型由于在大量既有数据上训练,往往会继承甚至放大现有的社会偏见。随着高等教育机构(HEI)中学生、教师与职员对 ChatGPT 等 GAI 的使用不断增加,亟需审视其中涉及的伦理问题,例如潜在偏见。在本范围综述中,我们梳理了近期学术文献对高等教育情境下 GAI 相关偏见的讨论方式,并归纳了文献中常见的潜在偏见类型。我们在四个与高等教育 GAI 使用及偏见相关的主要数据库中检索了以英文、中文和日文撰写的学术文章。研究发现,尽管学界已意识到大语言模型(LLM)与 GAI 存在潜在偏见,但大多数文章仅在较浅的层面上触及"偏见"问题,很少有文章指出在何种情形下可能出现何种类型的偏见,也未讨论其对高等教育、教职员工或学生可能产生的影响。目前该领域明显缺乏实证研究,我们呼吁高等教育研究者与 AI 专家开展更多这方面的研究。

Ethical implications of ChatGPT in higher education: A scoping review

  • paper_url: http://arxiv.org/abs/2311.14378
  • repo_url: None
  • paper_authors: Ming Li, Ariunaa Enkhtur, Fei Cheng, Beverley Anne Yamamoto
  • for: This paper explores the ethical challenges of using ChatGPT in education, particularly in higher education.
  • methods: The paper uses a scoping review approach, reviewing recent academic articles written in English, Chinese, and Japanese to provide a comprehensive overview of relevant research and identify gaps for future considerations.
  • results: The paper identifies six main areas of ethical concern in using AI in education, including misinformation harms and human-computer interaction related harms. The majority of papers reviewed were concerned with these two areas.
    Abstract This scoping review explores the ethical challenges of using ChatGPT in education, focusing particularly on issues related to higher education. By reviewing recent academic articles written in English, Chinese, and Japanese, we aimed to provide a comprehensive overview of relevant research while identifying gaps for future considerations. Drawing on Arksey and O'Malley's (2005) five-stage scoping review framework, we identified research questions, search terms, and conducted article search from four databases in the target three languages. Each article was reviewed by at least two researchers identifying the main ethical issues of utilizing AI in education, particularly higher education. Our analysis of ethical issues followed the framework developed by DeepMind (Weiginger et al., 2021) to identify six main areas of ethical concern in Language Models. The majority of papers were concerned with misinformation harms (n=25) and/or human-computer interaction related harms (n=24). Given the rapid deployment of Generative Artificial Intelligence (GAI), it is imperative for educators to conduct more empirical studies to develop sound ethical policies for the use of GAI.

Federated Transformed Learning for a Circular, Secure, and Tiny AI

  • paper_url: http://arxiv.org/abs/2311.14371
  • repo_url: None
  • paper_authors: Weisi Guo, Schyler Sun, Bin Li, Sam Blakeman
  • for: 本研究旨在实现转换的深度学习表示,使 AI 模块能够在解决新任务的同时不遗忘先前任务的解法。
  • methods: 本研究使用了深度学习技术,包括循环深度学习、安全深度学习和微型深度学习。
  • results: 研究表明,通过跨领域的激励和深度学习变换,可以实现循环安全小型AI(CST-AI)。
    Abstract Deep Learning (DL) is penetrating into a diverse range of mass mobility, smart living, and industrial applications, rapidly transforming the way we live and work. DL is at the heart of many AI implementations. A key set of challenges is to produce AI modules that are: (1) "circular" - can solve new tasks without forgetting how to solve previous ones, (2) "secure" - have immunity to adversarial data attacks, and (3) "tiny" - implementable in low power low cost embedded hardware. Clearly it is difficult to achieve all three aspects on a single horizontal layer of platforms, as the techniques require transformed deep representations that incur different computation and communication requirements. Here we set out the vision to achieve transformed DL representations across a 5G and Beyond networked architecture. We first detail the cross-sectoral motivations for each challenge area, before demonstrating recent advances in DL research that can achieve circular, secure, and tiny AI (CST-AI). Recognising the conflicting demand of each transformed deep representation, we federate their deep learning transformations and functionalities across the network to achieve connected run-time capabilities.
    摘要 深度学习(DL)正渗透到大规模出行、智慧生活和工业应用等多个领域,迅速改变着我们的生活与工作方式。DL 是许多人工智能实现的核心。一组关键挑战在于构建这样的 AI 模块:(1)"循环"——能在解决新任务的同时不遗忘先前任务的解法;(2)"安全"——对对抗性数据攻击具有免疫力;(3)"小型"——可在低功耗、低成本的嵌入式硬件上实现。显然,在单一的水平平台层上同时满足这三方面要求非常困难,因为相应技术需要经过转换的深度表示,而这些表示带来不同的计算与通信需求。在此,我们提出在 5G 及未来网络架构中实现转换深度学习表示的愿景。我们首先阐述各挑战领域的跨行业动机,随后展示近期深度学习研究在实现循环、安全、小型 AI(CST-AI)方面的进展。鉴于各种转换深度表示之间相互冲突的需求,我们将其深度学习转换与功能在网络中联邦化,以实现互联的运行时能力。

Comparative Analysis of Transformers for Modeling Tabular Data: A Casestudy using Industry Scale Dataset

  • paper_url: http://arxiv.org/abs/2311.14335
  • repo_url: None
  • paper_authors: Usneek Singh, Piyush Arora, Shamika Ganesan, Mohit Kumar, Siddhant Kulkarni, Salil R. Joshi
  • for: 本研究旨在对基于 Transformer 的表格数据建模方法进行比较分析,特别是在工业规模数据集上。
  • methods: 本研究使用了多种基于转换器模型的方法,包括预训练和直接监督学习方法,对 synthetic dataset 和 default prediction Kaggle dataset (2022) 进行了广泛的比较。
  • results: 研究指出了处理高维数据、高效预处理类别与数值特征以及巨大计算需求等挑战,并给出了优化数据预处理、管理类别与数值特征以及在计算资源与性能之间权衡取舍的策略。
    Abstract We perform a comparative analysis of transformer-based models designed for modeling tabular data, specifically on an industry-scale dataset. While earlier studies demonstrated promising outcomes on smaller public or synthetic datasets, the effectiveness did not extend to larger industry-scale datasets. The challenges identified include handling high-dimensional data, the necessity for efficient pre-processing of categorical and numerical features, and addressing substantial computational requirements. To overcome the identified challenges, the study conducts an extensive examination of various transformer-based models using both synthetic datasets and the default prediction Kaggle dataset (2022) from American Express. The paper presents crucial insights into optimal data pre-processing, compares pre-training and direct supervised learning methods, discusses strategies for managing categorical and numerical features, and highlights trade-offs between computational resources and performance. Focusing on temporal financial data modeling, the research aims to facilitate the systematic development and deployment of transformer-based models in real-world scenarios, emphasizing scalability.
    摘要 我们对面向表格数据建模的 Transformer 模型进行了比较分析,特别是在工业规模数据集上。早期研究在较小的公开或合成数据集上取得了令人期待的结果,但这种有效性并未延伸到更大的工业规模数据集。所识别的挑战包括高维数据的处理、类别特征与数值特征的高效预处理,以及巨大的计算需求。为克服这些挑战,本研究在合成数据集以及 American Express 发布的违约预测 Kaggle 数据集(2022)上,对多种基于 Transformer 的模型进行了深入考察。论文给出了关于最优数据预处理的重要洞见,比较了预训练与直接监督学习两种方法,讨论了类别与数值特征的处理策略,并强调了计算资源与性能之间的权衡。本研究聚焦于时序金融数据建模,旨在推动基于 Transformer 的模型在真实场景中的系统化开发与部署,并强调可扩展性。
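
One of the pre-processing choices the study examines, turning categorical and numerical columns into transformer tokens, is commonly done by embedding each categorical column and linearly projecting each numerical one; the sketch below shows that generic scheme, not the paper's exact pipeline.

```python
import torch
import torch.nn as nn

class TabularTokenizer(nn.Module):
    """Turn a row of categorical + numerical features into a token sequence:
    each categorical column gets an embedding table, each numerical column a
    learned projection (a common scheme, assumed here for illustration)."""
    def __init__(self, cat_cardinalities, n_num, d_model=64):
        super().__init__()
        self.cat_embs = nn.ModuleList([nn.Embedding(c, d_model) for c in cat_cardinalities])
        self.num_proj = nn.Parameter(torch.randn(n_num, d_model) * 0.02)

    def forward(self, x_cat, x_num):
        # x_cat: (batch, n_cat) int64, x_num: (batch, n_num) float32
        cat_tokens = torch.stack([emb(x_cat[:, i]) for i, emb in enumerate(self.cat_embs)], dim=1)
        num_tokens = x_num.unsqueeze(-1) * self.num_proj        # (batch, n_num, d_model)
        return torch.cat([cat_tokens, num_tokens], dim=1)       # (batch, n_cat + n_num, d_model)

tok = TabularTokenizer(cat_cardinalities=[12, 5], n_num=3)
tokens = tok(torch.randint(0, 5, (8, 2)), torch.randn(8, 3))    # shape (8, 5, 64)
```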

Large Language Models as Topological Structure Enhancers for Text-Attributed Graphs

  • paper_url: http://arxiv.org/abs/2311.14324
  • repo_url: None
  • paper_authors: Shengyin Sun, Yuxiang Ren, Chen Ma, Xuecang Zhang
  • for: 本研究探讨了如何使用大型自然语言模型(LLM)改善文本关联图(TAG)中节点的 topological structure,尤其是在节点分类任务下。
  • methods: 本研究提出了两种使用 LLM 改善图 topological structure的方法:首先,使用 LLM 生成节点特征的 semantic similarity,然后根据相似性进行边删除和边添加;其次,引入 pseudo-label 协助 GNN 学习合适的边重量。
  • results: 实验结果表明,LLM-based 图 topological refinement 可以提高节点分类任务的性能(在公共标准benchmark上达到0.15%–2.47%的提升)。
    Abstract The latest advancements in large language models (LLMs) have revolutionized the field of natural language processing (NLP). Inspired by the success of LLMs in NLP tasks, some recent work has begun investigating the potential of applying LLMs in graph learning tasks. However, most of the existing work focuses on utilizing LLMs as powerful node feature augmenters, leaving employing LLMs to enhance graph topological structures an understudied problem. In this work, we explore how to leverage the information retrieval and text generation capabilities of LLMs to refine/enhance the topological structure of text-attributed graphs (TAGs) under the node classification setting. First, we propose using LLMs to help remove unreliable edges and add reliable ones in the TAG. Specifically, we first let the LLM output the semantic similarity between node attributes through delicate prompt designs, and then perform edge deletion and edge addition based on the similarity. Second, we propose using pseudo-labels generated by the LLM to improve graph topology, that is, we introduce the pseudo-label propagation as a regularization to guide the graph neural network (GNN) in learning proper edge weights. Finally, we incorporate the two aforementioned LLM-based methods for graph topological refinement into the process of GNN training, and perform extensive experiments on four real-world datasets. The experimental results demonstrate the effectiveness of LLM-based graph topology refinement (achieving a 0.15%--2.47% performance gain on public benchmarks).
    摘要 大语言模型(LLM)的最新进展为自然语言处理(NLP)领域带来了变革。受 LLM 在 NLP 任务上成功的启发,近期一些工作开始探索将 LLM 应用于图学习任务的潜力。然而,现有工作大多把 LLM 当作强大的节点特征增强器,而利用 LLM 改善图拓扑结构的问题仍缺乏研究。在本工作中,我们探讨了在节点分类设定下,如何利用 LLM 的信息检索与文本生成能力来改善/增强文本属性图(TAG)的拓扑结构。具体而言,我们提出两种方法:其一,借助 LLM 帮助删除不可靠的边并添加可靠的边——先通过精心设计的提示让 LLM 输出节点属性之间的语义相似度,再依据相似度进行删边与加边;其二,利用 LLM 生成的伪标签改善图拓扑,即引入伪标签传播作为正则化,引导图神经网络(GNN)学习合适的边权重。最后,我们将上述两种基于 LLM 的图拓扑优化方法融入 GNN 的训练过程,并在四个真实数据集上开展了大量实验。实验结果证明了基于 LLM 的图拓扑优化的有效性(在公开基准上取得 0.15%–2.47% 的性能提升)。
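
The first refinement step, dropping edges whose endpoints the LLM judges dissimilar and adding edges between highly similar non-neighbours, can be sketched with a plain similarity matrix standing in for the LLM's outputs; the thresholds here are illustrative, not the paper's values.

```python
import numpy as np

def refine_edges(edges, sim, del_thresh=0.2, add_thresh=0.9):
    """edges: set of (u, v) pairs; sim[u, v]: LLM-judged semantic similarity in [0, 1].
    Drop unreliable edges, then add reliable ones between non-neighbours."""
    kept = {(u, v) for (u, v) in edges if sim[u, v] >= del_thresh}
    n = sim.shape[0]
    added = {(u, v) for u in range(n) for v in range(u + 1, n)
             if sim[u, v] >= add_thresh and (u, v) not in kept and (v, u) not in kept}
    return kept | added

sim = np.random.rand(50, 50)
sim = (sim + sim.T) / 2                       # stand-in for symmetric LLM scores
edges = {(i, (i + 1) % 50) for i in range(50)}
new_edges = refine_edges(edges, sim)
```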

Windformer:Bi-Directional Long-Distance Spatio-Temporal Network For Wind Speed Prediction

  • paper_url: http://arxiv.org/abs/2311.14316
  • repo_url: https://github.com/szwszwszw123/windformer
  • paper_authors: Xuewei Li, Zewen Shang, Zhiqiang Liu, Jian Yu, Wei Xiong, Mei Yu
  • for: 风速预测是风力发电管理中非常重要的一环。由于风速波动范围大且存在尾流效应,远距离风机之间也可能存在较强的相关性。这种难以提取的特征已成为提升预测精度的瓶颈。
  • methods: 为解决上述问题,本文提出了 Windformer。Windformer 首先将风机集群划分为多个不重叠的窗口并在窗口内计算相关性,然后对窗口进行部分平移以建立窗口之间的连接,最后基于细节信息与全局信息融合多通道特征。
  • results: 与其他当前先进方法相比,Windformer 的均方误差(MSE)在来自 NERL 的两个数据集上降低了 0.5% 到 15%。
    Abstract Wind speed prediction is critical to the management of wind power generation. Due to the large range of wind speed fluctuations and wake effect, there may also be strong correlations between long-distance wind turbines. This difficult-to-extract feature has become a bottleneck for improving accuracy. History and future time information includes the trend of airflow changes, whether this dynamic information can be utilized will also affect the prediction effect. In response to the above problems, this paper proposes Windformer. First, Windformer divides the wind turbine cluster into multiple non-overlapping windows and calculates correlations inside the windows, then shifts the windows partially to provide connectivity between windows, and finally fuses multi-channel features based on detailed and global information. To dynamically model the change process of wind speed, this paper extracts time series in both history and future directions simultaneously. Compared with other current-advanced methods, the Mean Square Error (MSE) of Windformer is reduced by 0.5\% to 15\% on two datasets from NERL.
    摘要 预测风速对风力发电管理是关键。由于风速波动范围很大,带动效应也可能导致远程风机之间强相关性。这种难以提取特征使得精度提高受到了各种瓶颈。历史和未来时间信息包括风暴气流变化趋势,是否能充分利用这些动态信息,将影响预测效果。为了解决这些问题,本文提出了风成器(Windformer)。首先,风成器将风机群分成多个非重叠窗口,然后在窗口内计算相关性,并将窗口部分移动以提供窗口之间连接。最后,风成器将多个通道特征 fusion,基于详细和全局信息。为了动态模型风速变化的过程,本文同时提取历史和未来时间序列。与现有先进方法相比,风成器的 Mean Square Error(MSE)在NERL dataset上降低了0.5%到15%。

Robust Domain Misinformation Detection via Multi-modal Feature Alignment

  • paper_url: http://arxiv.org/abs/2311.14315
  • repo_url: https://github.com/less-and-less-bugs/rdcm
  • paper_authors: Hui Liu, Wenya Wang, Hao Sun, Anderson Rocha, Haoliang Li
  • for: 本研究旨在提出一种鲁棒的域与跨模态方法(RDCM),用于检测多模态虚假信息。
  • methods: 本方法通过域间对齐模块缩小文本与视觉模态联合分布的域间差异,并通过跨模态对齐模块弥合两种模态之间的语义鸿沟,同时兼顾域泛化(目标域数据不可得)与域适应(可获得无标注目标域数据)两种应用场景。
  • results: 在两个公开的多模态虚假信息检测数据集(Pheme 与 Twitter 数据集)上的测试结果表明,所提出的方法在多模态虚假信息检测上表现出色,具有更好的鲁棒性与泛化性。
    Abstract Social media misinformation harms individuals and societies and is potentialized by fast-growing multi-modal content (i.e., texts and images), which accounts for higher "credibility" than text-only news pieces. Although existing supervised misinformation detection methods have obtained acceptable performances in key setups, they may require large amounts of labeled data from various events, which can be time-consuming and tedious. In turn, directly training a model by leveraging a publicly available dataset may fail to generalize due to domain shifts between the training data (a.k.a. source domains) and the data from target domains. Most prior work on domain shift focuses on a single modality (e.g., text modality) and ignores the scenario where sufficient unlabeled target domain data may not be readily available in an early stage. The lack of data often happens due to the dynamic propagation trend (i.e., the number of posts related to fake news increases slowly before catching the public attention). We propose a novel robust domain and cross-modal approach (\textbf{RDCM}) for multi-modal misinformation detection. It reduces the domain shift by aligning the joint distribution of textual and visual modalities through an inter-domain alignment module and bridges the semantic gap between both modalities through a cross-modality alignment module. We also propose a framework that simultaneously considers application scenarios of domain generalization (in which the target domain data is unavailable) and domain adaptation (in which unlabeled target domain data is available). Evaluation results on two public multi-modal misinformation detection datasets (Pheme and Twitter Datasets) evince the superiority of the proposed model. The formal implementation of this paper can be found in this link: https://github.com/less-and-less-bugs/RDCM
    摘要 社交媒体上的虚假信息危害个人与社会,而快速增长的多模态内容(即文本与图像)使其影响进一步放大,因为这类内容比纯文本新闻具有更高的"可信度"。尽管现有的有监督虚假信息检测方法在一些关键设定下取得了可接受的性能,但它们往往需要来自各类事件的大量标注数据,获取这些数据既耗时又繁琐。反过来,直接利用公开数据集训练模型,又可能因训练数据(即源域)与目标域数据之间的域偏移而难以泛化。以往关于域偏移的大多数工作只关注单一模态(如文本模态),并忽略了在早期阶段可能无法获得足量无标注目标域数据的情形——这种数据匮乏常常源于虚假信息的动态传播趋势(即相关帖文数量在引起公众关注之前增长缓慢)。我们提出了一种新颖的鲁棒域与跨模态方法(RDCM)用于多模态虚假信息检测:它通过域间对齐模块对齐文本与视觉模态的联合分布以缩小域偏移,并通过跨模态对齐模块弥合两种模态之间的语义鸿沟。我们还提出了一个同时涵盖域泛化(目标域数据不可得)与域适应(可获得无标注目标域数据)两种应用场景的框架。在两个公开的多模态虚假信息检测数据集(Pheme 与 Twitter 数据集)上的评估结果表明了所提模型的优越性。本文的正式实现见:https://github.com/less-and-less-bugs/RDCM

New Epochs in AI Supervision: Design and Implementation of an Autonomous Radiology AI Monitoring System

  • paper_url: http://arxiv.org/abs/2311.14305
  • repo_url: None
  • paper_authors: Vasantha Kumar Venugopal, Abhishek Gupta, Rohit Takhar, Vidur Mahajan
  • for: Monitoring and maintaining the accuracy and reliability of radiology AI classification models in clinical practice.
  • methods: Proposes two metrics, predictive divergence and temporal stability: predictive divergence gauges model accuracy by comparing predictions against those of two supplementary models, while temporal stability compares current predictions with historical moving averages (a minimal monitoring sketch follows this entry).
  • results: Retrospective validation on chest X-ray data demonstrates effective monitoring of model reliability, providing continuous, real-time insight into model performance and laying the foundation for the safe and effective use of AI in clinical decision-making.
    Abstract With the increasingly widespread adoption of AI in healthcare, maintaining the accuracy and reliability of AI models in clinical practice has become crucial. In this context, we introduce novel methods for monitoring the performance of radiology AI classification models in practice, addressing the challenges of obtaining real-time ground truth for performance monitoring. We propose two metrics - predictive divergence and temporal stability - to be used for preemptive alerts of AI performance changes. Predictive divergence, measured using Kullback-Leibler and Jensen-Shannon divergences, evaluates model accuracy by comparing predictions with those of two supplementary models. Temporal stability is assessed through a comparison of current predictions against historical moving averages, identifying potential model decay or data drift. This approach was retrospectively validated using chest X-ray data from a single-center imaging clinic, demonstrating its effectiveness in maintaining AI model reliability. By providing continuous, real-time insights into model performance, our system ensures the safe and effective use of AI in clinical decision-making, paving the way for more robust AI integration in healthcare.
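
    As a hedged illustration of the two monitoring metrics described above, the sketch below computes predictive divergence (KL and Jensen-Shannon divergence between the monitored model's batch-level class distribution and those of two supplementary models) and a simple temporal-stability gap against an exponential moving average. The aggregation choices (batch-mean distributions, EMA smoothing factor) are assumptions; the paper's exact formulation may differ.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) for discrete probability vectors."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def predictive_divergence(main_probs, aux1_probs, aux2_probs):
    """Average KL and JS divergence between the monitored model's batch-level
    class distribution and those of two supplementary models.
    (scipy's jensenshannon returns a distance, so it is squared here.)"""
    main = main_probs.mean(axis=0)
    kl = np.mean([kl_divergence(main, aux.mean(axis=0)) for aux in (aux1_probs, aux2_probs)])
    js = np.mean([jensenshannon(main, aux.mean(axis=0)) ** 2 for aux in (aux1_probs, aux2_probs)])
    return {"kl": float(kl), "js": float(js)}

def temporal_stability(past_rates, current_rate, alpha=0.1):
    """Gap between the current positive-finding rate and an exponential moving
    average of past rates; a large gap flags possible model decay or data drift."""
    ema = past_rates[0]
    for r in past_rates[1:]:
        ema = alpha * r + (1 - alpha) * ema
    return abs(current_rate - ema)

if __name__ == "__main__":
    # Toy usage: 3-class probabilities for a batch of 100 chest X-ray studies.
    rng = np.random.default_rng(0)
    main = rng.dirichlet([2, 1, 1], size=100)
    aux1 = rng.dirichlet([2, 1, 1], size=100)
    aux2 = rng.dirichlet([1, 2, 1], size=100)
    print("predictive divergence:", predictive_divergence(main, aux1, aux2))
    print("temporal stability gap:", temporal_stability([0.31, 0.30, 0.33], 0.45))
```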

Efficient Open-world Reinforcement Learning via Knowledge Distillation and Autonomous Rule Discovery

  • paper_url: http://arxiv.org/abs/2311.14270
  • repo_url: None
  • paper_authors: Ekaterina Nikonova, Cheng Xue, Jochen Renz
  • for: This work aims to improve the ability of deep reinforcement learning agents to adapt to novel environments.
  • methods: Proposes a general framework that lets an agent autonomously discover task-specific rules in novel environments and self-supervise its learning; a rule-driven deep Q-learning agent (RDQ) is presented as one instantiation (a toy sketch of rule-guided action selection follows this entry).
  • results: Experiments show that the rule-driven deep Q-learning agent (RDQ) detects and adapts to novel situations faster and is significantly more resilient to novelties than the baseline agents.
    Abstract Deep reinforcement learning suffers from catastrophic forgetting and sample inefficiency making it less applicable to the ever-changing real world. However, the ability to use previously learned knowledge is essential for AI agents to quickly adapt to novelties. Often, certain spatial information observed by the agent in the previous interactions can be leveraged to infer task-specific rules. Inferred rules can then help the agent to avoid potentially dangerous situations in the previously unseen states and guide the learning process increasing agent's novelty adaptation speed. In this work, we propose a general framework that is applicable to deep reinforcement learning agents. Our framework provides the agent with an autonomous way to discover the task-specific rules in the novel environments and self-supervise it's learning. We provide a rule-driven deep Q-learning agent (RDQ) as one possible implementation of that framework. We show that RDQ successfully extracts task-specific rules as it interacts with the world and uses them to drastically increase its learning efficiency. In our experiments, we show that the RDQ agent is significantly more resilient to the novelties than the baseline agents, and is able to detect and adapt to novel situations faster.
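
    The abstract does not detail how RDQ represents or discovers rules, so the toy sketch below only illustrates the general idea: a tabular Q-learning agent accumulates evidence when an action taken under a given state feature repeatedly yields strongly negative reward, promotes that pattern to an "avoid" rule, and masks the corresponding actions during action selection. The thresholds, the tabular setting, and the rule format are hypothetical simplifications of the deep, rule-driven agent described in the paper.

```python
import random
from collections import defaultdict

class RuleDrivenAgent:
    """Toy rule-guided Q-learning agent: repeated bad outcomes for an action
    under a given state feature are promoted to an 'avoid' rule that masks
    that action during action selection."""

    def __init__(self, n_actions, penalty_threshold=-1.0, min_evidence=3):
        self.n_actions = n_actions
        self.q = defaultdict(lambda: [0.0] * n_actions)   # state -> action values
        self.evidence = defaultdict(int)                   # (feature, action) -> bad outcomes seen
        self.rules = set()                                 # (feature, action) pairs to avoid
        self.penalty_threshold = penalty_threshold
        self.min_evidence = min_evidence

    def act(self, state, feature, epsilon=0.1):
        allowed = [a for a in range(self.n_actions) if (feature, a) not in self.rules]
        if not allowed:                                    # never strand the agent
            allowed = list(range(self.n_actions))
        if random.random() < epsilon:
            return random.choice(allowed)
        return max(allowed, key=lambda a: self.q[state][a])

    def observe(self, state, feature, action, reward, next_state, alpha=0.5, gamma=0.99):
        # Standard tabular Q-learning update.
        target = reward + gamma * max(self.q[next_state])
        self.q[state][action] += alpha * (target - self.q[state][action])
        # Accumulate evidence for a task-specific "avoid" rule.
        if reward <= self.penalty_threshold:
            self.evidence[(feature, action)] += 1
            if self.evidence[(feature, action)] >= self.min_evidence:
                self.rules.add((feature, action))

if __name__ == "__main__":
    # Toy usage: action 0 under a "hazard" feature is punished and becomes a rule.
    agent = RuleDrivenAgent(n_actions=3)
    for _ in range(5):
        agent.observe(state="s0", feature="hazard", action=0, reward=-5.0, next_state="s1")
    print(agent.rules)   # {('hazard', 0)}
```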

DemoFusion: Democratising High-Resolution Image Generation With No $$$

  • paper_url: http://arxiv.org/abs/2311.16973
  • repo_url: None
  • paper_authors: Ruoyi Du, Dongliang Chang, Timothy Hospedales, Yi-Zhe Song, Zhanyu Ma
  • for: Aims to democratise high-resolution generative AI (GenAI) by advancing high-resolution image generation while keeping it accessible to a broad audience.
  • methods: Builds on existing Latent Diffusion Models (LDMs) and proposes the DemoFusion framework, which combines Progressive Upscaling, Skip Residual, and Dilated Sampling mechanisms to push open-source models to higher-resolution generation (a simplified progressive-upscaling sketch follows this entry).
  • results: Experiments show that DemoFusion achieves higher-resolution image generation with open-source models, and its intermediate passes double as previews that enable rapid prompt iteration.
    Abstract High-resolution image generation with Generative Artificial Intelligence (GenAI) has immense potential but, due to the enormous capital investment required for training, it is increasingly centralised to a few large corporations, and hidden behind paywalls. This paper aims to democratise high-resolution GenAI by advancing the frontier of high-resolution generation while remaining accessible to a broad audience. We demonstrate that existing Latent Diffusion Models (LDMs) possess untapped potential for higher-resolution image generation. Our novel DemoFusion framework seamlessly extends open-source GenAI models, employing Progressive Upscaling, Skip Residual, and Dilated Sampling mechanisms to achieve higher-resolution image generation. The progressive nature of DemoFusion requires more passes, but the intermediate results can serve as "previews", facilitating rapid prompt iteration.
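
    DemoFusion's Skip Residual and Dilated Sampling mechanisms are not reproduced here; the sketch below only illustrates the progressive-upscaling idea in simplified form: generate at a base resolution, then repeatedly upsample, re-encode, partially re-noise, and re-denoise at the next scale, collecting intermediate results as previews. The `denoise_fn`, `encode_fn`, and `decode_fn` callables are stand-ins assumed to wrap an existing latent diffusion model, and the noise-mixing step is a crude substitute for the paper's actual mechanisms.

```python
import torch
import torch.nn.functional as F

def progressive_generation(denoise_fn, encode_fn, decode_fn, prompt,
                           base_size=256, scales=(1, 2, 4), strength=0.55):
    """Generate at a base resolution, then repeatedly upsample the previous
    output and only partially re-noise/re-denoise it at the next scale, so
    each pass refines detail while keeping global structure. Intermediate
    results are returned as fast "previews"."""
    image, previews = None, []
    for s in scales:
        size = base_size * s
        if image is None:
            # First pass: ordinary text-to-image generation from pure noise.
            latent = torch.randn(1, 4, size // 8, size // 8)
            latent = denoise_fn(latent, prompt, strength=1.0)
        else:
            # Later passes: upsample, re-encode, and mix in fresh noise
            # (a crude stand-in for DemoFusion's skip-residual mechanism).
            upsampled = F.interpolate(image, size=(size, size), mode="bicubic",
                                      align_corners=False)
            latent = encode_fn(upsampled)
            latent = (1 - strength) * latent + strength * torch.randn_like(latent)
            latent = denoise_fn(latent, prompt, strength=strength)
        image = decode_fn(latent)
        previews.append(image)
    return image, previews

if __name__ == "__main__":
    # Toy stubs with consistent shapes, only to exercise the control flow.
    denoise = lambda z, prompt, strength=1.0: z
    encode = lambda img: F.avg_pool2d(img, kernel_size=8)              # (1,4,H,W) -> (1,4,H/8,W/8)
    decode = lambda z: F.interpolate(z, scale_factor=8.0, mode="nearest")
    final, previews = progressive_generation(denoise, encode, decode, "a castle at dusk")
    print([tuple(p.shape) for p in previews])
```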