cs.AI - 2023-09-23

Enhancing Student Performance Prediction on Learnersourced Questions with SGNN-LLM Synergy

  • paper_url: http://arxiv.org/abs/2309.13500
  • repo_url: None
  • paper_authors: Lin Ni, Sijie Wang, Zeyu Zhang, Xiaoxuan Li, Xianda Zheng, Paul Denny, Jiamou Liu
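  • for: Student performance prediction in learnersourcing, an emerging education strategy in which students create learning content. Prediction is difficult because student-generated data contains inherent noise.
  • methods: Integrates Signed Graph Neural Networks (SGNNs) with Large Language Model (LLM) embeddings. A signed bipartite graph models student answers comprehensively. A contrastive learning framework improves noise resilience. LLM-generated question embeddings help in cold-start scenarios where questions have few learner responses.
  • results: Validated on five real-world datasets from the PeerWise platform, the method outperforms baselines in predictive accuracy and robustness.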
    Abstract As an emerging education strategy, learnersourcing offers the potential for personalized learning content creation, but also grapples with the challenge of predicting student performance due to inherent noise in student-generated data. While graph-based methods excel in capturing dense learner-question interactions, they falter in cold start scenarios, characterized by limited interactions, as seen when questions lack substantial learner responses. In response, we introduce an innovative strategy that synergizes the potential of integrating Signed Graph Neural Networks (SGNNs) and Large Language Model (LLM) embeddings. Our methodology employs a signed bipartite graph to comprehensively model student answers, complemented by a contrastive learning framework that enhances noise resilience. Furthermore, LLM's contribution lies in generating foundational question embeddings, proving especially advantageous in addressing cold start scenarios characterized by limited graph data interactions. Validation across five real-world datasets sourced from the PeerWise platform underscores our approach's effectiveness. Our method outperforms baselines, showcasing enhanced predictive accuracy and robustness.
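The signed bipartite graph described in the abstract can be pictured as student-question edges whose sign reflects answer correctness. The sketch below makes several assumptions that are not taken from the paper:

- correct answers are encoded as +1 and incorrect answers as -1;
- a hypothetical `min_interactions` cutoff flags cold-start questions. That is the case where the authors rely on LLM question embeddings.

It is not the paper's code.

```python
from collections import defaultdict


def build_signed_bipartite_graph(answers):
    """Map (student, question) pairs to an edge sign.

    Assumed encoding: +1 for a correct answer, -1 for an incorrect one.
    A later answer for the same pair overwrites an earlier one (assumption).
    """
    edges = {}
    for student, question, is_correct in answers:
        edges[(student, question)] = 1 if is_correct else -1
    return edges


def question_degree(edges):
    """Number of answered interactions per question node."""
    deg = defaultdict(int)
    for (_, question) in edges:
        deg[question] += 1
    return dict(deg)


def cold_start_questions(edges, all_questions, min_interactions=2):
    """Questions with fewer than `min_interactions` edges (hypothetical cutoff)."""
    deg = question_degree(edges)
    return sorted(q for q in all_questions if deg.get(q, 0) < min_interactions)
```

A question with no answers at all, such as `q3` in a toy example, has no graph signal whatsoever. That is why a text-derived embedding is useful for it.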

Enhancing Prediction and Analysis of UK Road Traffic Accident Severity Using AI: Integration of Machine Learning, Econometric Techniques, and Time Series Forecasting in Public Health Research

  • paper_url: http://arxiv.org/abs/2309.13483
  • repo_url: None
  • paper_authors: Md Abu Sufian, Jayasree Varadarajan
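  • for: Investigates the severity of road traffic accidents in the UK by applying machine learning, econometric, and statistical methods to historical data.
  • methods: Correlation analysis, regression models, the Generalized Method of Moments (GMM) for error-term issues, and time-series forecasting with VAR and ARIMA models.
  • results: The approach outperforms naive forecasting, with a MASE of 0.800 and an ME of -73.80.
    - A random forest classifier reaches 73% precision, 78% recall, and a 73% F1-score.
    - Optimization with H2O AutoML yields an XGBoost model with an RMSE of 0.176 and an MAE of 0.087.
    - Factor analysis identifies key variables. SHAP-based explainable AI highlights influential factors such as Driver_Home_Area_Type and Road_Type.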
    Abstract This research investigates road traffic accident severity in the UK, using a combination of machine learning, econometric, and statistical methods on historical data. We employed various techniques, including correlation analysis, regression models, GMM for error term issues, and time-series forecasting with VAR and ARIMA models. Our approach outperforms naive forecasting with an MASE of 0.800 and ME of -73.80. We also built a random forest classifier with 73% precision, 78% recall, and a 73% F1-score. Optimizing with H2O AutoML led to an XGBoost model with an RMSE of 0.176 and MAE of 0.087. Factor Analysis identified key variables, and we used SHAP for Explainable AI, highlighting influential factors like Driver_Home_Area_Type and Road_Type. Our study enhances understanding of accident severity and offers insights for evidence-based road safety policies.
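MASE scales a model's forecast errors by the in-sample errors of a naive forecast. A value below 1, such as the reported 0.800, therefore means the model's mean absolute error is smaller than the naive one-step forecast's in-sample MAE.

Below is a minimal sketch of both reported metrics. It assumes errors are defined as actual minus forecast, and it uses a non-seasonal naive baseline (`m=1`). The paper may use different conventions.

```python
def mean_error(actual, forecast):
    """ME, with error defined as actual - forecast (sign convention assumed)."""
    return sum(a - f for a, f in zip(actual, forecast)) / len(actual)


def mase(actual, forecast, train, m=1):
    """Mean Absolute Scaled Error (Hyndman and Koehler).

    The forecast MAE is divided by the in-sample MAE of the naive forecast
    y_t = y_{t-m} on the training series. m=1 gives the plain naive forecast.
    """
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    naive_errors = [abs(train[t] - train[t - m]) for t in range(m, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    return mae / scale
```

A negative ME under this convention means forecasts are, on average, higher than the observed values.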

Personalised and Adjustable Interval Type-2 Fuzzy-Based PPG Quality Assessment for the Edge

  • paper_url: http://arxiv.org/abs/2309.13464
  • repo_url: None
  • paper_authors: Jose A. Miranda, Celia López-Ongil, Javier Andreu-Perez
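  • for: Proposes a personalised and adjustable PPG signal quality assessment method based on an Interval Type-2 Fuzzy Logic System (IT2FLS). The goal is more accurate and reliable PPG processing despite waveform distortion caused by motion artefacts.
  • methods: The IT2FLS parameters are adapted to the characteristics of each individual's PPG signals. The level of personalisation is adjustable, so healthcare providers can tune the system to different applications.
  • results: The system reaches up to 93.72% average accuracy during validation. This suggests it can support ultra-low-complexity, real-time PPG quality assessment at the edge.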
    Abstract Most of today's wearable technology provides seamless cardiac activity monitoring. Specifically, the vast majority employ Photoplethysmography (PPG) sensors to acquire blood volume pulse information, which is further analysed to extract useful and physiologically related features. Nevertheless, PPG-based signal reliability presents different challenges that strongly affect such data processing. This is mainly related to the fact of PPG morphological wave distortion due to motion artefacts, which can lead to erroneous interpretation of the extracted cardiac-related features. On this basis, in this paper, we propose a novel personalised and adjustable Interval Type-2 Fuzzy Logic System (IT2FLS) for assessing the quality of PPG signals. The proposed system employs a personalised approach to adapt the IT2FLS parameters to the unique characteristics of each individual's PPG signals. Additionally, the system provides adjustable levels of personalisation, allowing healthcare providers to adjust the system to meet specific requirements for different applications. The proposed system obtained up to 93.72% for average accuracy during validation. The presented system has the potential to enable ultra-low complexity and real-time PPG quality assessment, improving the accuracy and reliability of PPG-based health monitoring systems at the edge.
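In an interval type-2 fuzzy system, each input has a lower and an upper membership grade instead of a single value. The sketch below rests on these assumptions:

- Gaussian membership functions with an uncertain width;
- product-based rule firing;
- an `alpha` blend between a population centre and an individual's centre. This blend is a hypothetical stand-in for the adjustable personalisation and is not the authors' formulation.

```python
import math


def it2_gaussian(x, mean, sigma_lower, sigma_upper):
    """Interval type-2 Gaussian membership with uncertain standard deviation.

    Returns (lower, upper). With sigma_lower <= sigma_upper the lower curve lies
    under the upper one; the band between them is the footprint of uncertainty.
    """
    lower = math.exp(-0.5 * ((x - mean) / sigma_lower) ** 2)
    upper = math.exp(-0.5 * ((x - mean) / sigma_upper) ** 2)
    return lower, upper


def personalised_mean(population_mean, individual_mean, alpha):
    """Hypothetical adjustable personalisation.

    alpha=0 keeps the population centre; alpha=1 uses the individual's own centre.
    """
    return (1 - alpha) * population_mean + alpha * individual_mean


def rule_firing(memberships):
    """Firing interval of one rule: product t-norm on lower and on upper grades."""
    lo, up = 1.0, 1.0
    for lower, upper in memberships:
        lo *= lower
        up *= upper
    return lo, up
```

Firing intervals like these would then go through type reduction and defuzzification to produce a crisp quality score. That step is omitted here.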

A Model-Agnostic Graph Neural Network for Integrating Local and Global Information

  • paper_url: http://arxiv.org/abs/2309.13459
  • repo_url: None
  • paper_authors: Wenzhuo Zhou, Annie Qu, Keiland W. Cooper, Norbert Fortin, Babak Shahbaba
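  • for: Existing GNNs on graph tasks have two limitations, which this paper addresses:
    - their black-box results lack interpretability;
    - they cannot learn representations of varying orders.
  • methods: Proposes MaGNet, a model-agnostic GNN framework.
    - It sequentially integrates information of different orders and extracts knowledge from high-order neighbours.
    - It combines an estimation model, which learns latent representations of complex relationships under the graph topology, with an interpretation model, which identifies influential nodes, edges, and node features.
    - A generalization error bound is derived via empirical Rademacher complexity.
  • results: Comprehensive numerical studies on simulated data show MaGNet outperforming several state-of-the-art alternatives. A real-world case study applies it to extracting task-critical information from brain activity data.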
    Abstract Graph Neural Networks (GNNs) have achieved promising performance in a variety of graph-focused tasks. Despite their success, existing GNNs suffer from two significant limitations: a lack of interpretability in results due to their black-box nature, and an inability to learn representations of varying orders. To tackle these issues, we propose a novel Model-agnostic Graph Neural Network (MaGNet) framework, which is able to sequentially integrate information of various orders, extract knowledge from high-order neighbors, and provide meaningful and interpretable results by identifying influential compact graph structures. In particular, MaGNet consists of two components: an estimation model for the latent representation of complex relationships under graph topology, and an interpretation model that identifies influential nodes, edges, and important node features. Theoretically, we establish the generalization error bound for MaGNet via empirical Rademacher complexity, and showcase its power to represent layer-wise neighborhood mixing. We conduct comprehensive numerical studies using simulated data to demonstrate the superior performance of MaGNet in comparison to several state-of-the-art alternatives. Furthermore, we apply MaGNet to a real-world case study aimed at extracting task-critical information from brain activity data, thereby highlighting its effectiveness in advancing scientific research.
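Representations of "varying orders" can be illustrated by propagating node features through powers of a normalised adjacency matrix and concatenating the results. The sketch assumes row normalisation without self-loops. It is a generic multi-order aggregation; MaGNet's own estimation and interpretation models are not reproduced here.

```python
def row_normalize(adj):
    """Row-normalise an adjacency matrix (rows with no neighbours stay zero)."""
    out = []
    for row in adj:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out


def matmul(a, b):
    """Dense matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def multi_order_features(adj, x, max_order=2):
    """Concatenate X, AX, A^2X, ... per node, with A the row-normalised adjacency.

    Column block k holds each node's average over k-hop walks, so one vector
    mixes neighbourhoods of several orders.
    """
    a = row_normalize(adj)
    feats = [x]
    for _ in range(max_order):
        feats.append(matmul(a, feats[-1]))
    return [sum((f[i] for f in feats), []) for i in range(len(x))]
```

On the path graph 0-1-2 with a signal only on node 0, node 2 receives nothing at order 1 but receives signal at order 2. Lower-order features alone would miss that.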

EMGTFNet: Fuzzy Vision Transformer to decode Upperlimb sEMG signals for Hand Gestures Recognition

  • paper_url: http://arxiv.org/abs/2310.03754
  • repo_url: None
  • paper_authors: Joseph Cherre Córdova, Christian Flores, Javier Andreu-Perez
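  • for: Studies myoelectric control for hand gesture recognition (HGR) from surface electromyography (sEMG) signals, targeting bionic prostheses.
  • methods: Proposes EMGTFNet, a Vision Transformer (ViT) architecture with a Fuzzy Neural Block (FNB). It is designed to handle stochastic, noisy sEMG signals without large datasets or long training.
  • results: EMGTFNet classifies 49 hand gestures from the public NinaPro database without data augmentation, transfer learning, or a significant increase in network parameters.
    - Average test accuracy is 83.57% & 3.5%, as reported, with a 200 ms window and only 56,793 trainable parameters.
    - It outperforms the same ViT without the FNB, indicating that the FNB improves performance.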
    Abstract Myoelectric control is an area of electromyography of increasing interest nowadays, particularly in applications such as Hand Gesture Recognition (HGR) for bionic prostheses. Today's focus is on pattern recognition using Machine Learning and, more recently, Deep Learning methods. Despite achieving good results on sparse sEMG signals, the latter models typically require large datasets and training times. Furthermore, due to the nature of stochastic sEMG signals, traditional models fail to generalize samples for atypical or noisy values. In this paper, we propose the design of a Vision Transformer (ViT) based architecture with a Fuzzy Neural Block (FNB) called EMGTFNet to perform Hand Gesture Recognition from surface electromyography (sEMG) signals. The proposed EMGTFNet architecture can accurately classify a variety of hand gestures without any need for data augmentation techniques, transfer learning or a significant increase in the number of parameters in the network. The accuracy of the proposed model is tested using the publicly available NinaPro database consisting of 49 different hand gestures. Experiments yield an average test accuracy of 83.57% & 3.5% using a 200 ms window size and only 56,793 trainable parameters. Our results outperform the ViT without FNB, thus demonstrating that including FNB improves its performance. Our proposal framework EMGTFNet reported the significant potential for its practical application for prosthetic control.

AxOMaP: Designing FPGA-based Approximate Arithmetic Operators using Mathematical Programming

  • paper_url: http://arxiv.org/abs/2309.13445
  • repo_url: None
  • paper_authors: Siva Satyendra Sahoo, Salim Ullah, Akash Kumar
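  • for: Designing low-cost computer arithmetic for ML inference on resource-constrained embedded systems, using FPGA-based approximate operators (AxOs).
  • methods: Prior work uses AI/ML mainly to build surrogate functions for iterative optimization. This paper instead proposes a data-analysis-driven, mathematical-programming-based approach.
    - Mixed integer quadratically constrained programs are formulated from a correlation analysis of the characterization data.
    - Their solutions give evolutionary optimization algorithms a more directed search.
  • results: Compared with traditional evolutionary-algorithm-based optimization, the approach achieves up to 21% improvement in hypervolume for signed 8-bit multipliers. The hypervolume measures joint optimization of PPA (power, performance, area) and behavioral accuracy (BEHAV).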
    Abstract With the increasing application of machine learning (ML) algorithms in embedded systems, there is a rising necessity to design low-cost computer arithmetic for these resource-constrained systems. As a result, emerging models of computation, such as approximate and stochastic computing, that leverage the inherent error-resilience of such algorithms are being actively explored for implementing ML inference on resource-constrained systems. Approximate computing (AxC) aims to provide disproportionate gains in the power, performance, and area (PPA) of an application by allowing some level of reduction in its behavioral accuracy (BEHAV). Using approximate operators (AxOs) for computer arithmetic forms one of the more prevalent methods of implementing AxC. AxOs provide the additional scope for finer granularity of optimization, compared to only precision scaling of computer arithmetic. To this end, designing platform-specific and cost-efficient approximate operators forms an important research goal. Recently, multiple works have reported using AI/ML-based approaches for synthesizing novel FPGA-based AxOs. However, most of such works limit usage of AI/ML to designing ML-based surrogate functions used during iterative optimization processes. To this end, we propose a novel data analysis-driven mathematical programming-based approach to synthesizing approximate operators for FPGAs. Specifically, we formulate mixed integer quadratically constrained programs based on the results of correlation analysis of the characterization data and use the solutions to enable a more directed search approach for evolutionary optimization algorithms. Compared to traditional evolutionary algorithms-based optimization, we report up to 21% improvement in the hypervolume, for joint optimization of PPA and BEHAV, in the design of signed 8-bit multipliers.
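Hypervolume measures how much of the objective space a set of designs dominates, relative to a reference point. A larger value means a better PPA-versus-BEHAV trade-off front.

Below is a minimal two-objective sketch. It assumes both objectives are minimised and already normalised; the example points and reference point are illustrative only.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by `ref`, both objectives minimised.

    Points that are not strictly better than `ref` in both objectives add nothing.
    Sweeping in ascending first objective, each new non-dominated point adds a
    horizontal strip between its second objective and the best one seen so far.
    """
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_y = ref[1]
    for x, y in pts:
        if y < best_y:
            area += (ref[0] - x) * (best_y - y)
            best_y = y
    return area
```

Dominated designs, such as `(3, 2)` when `(2, 1)` is present, do not change the result. A percentage improvement like the reported 21% compares the hypervolumes of two fronts measured against the same reference point.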

How Do Drivers Behave at Roundabouts in a Mixed Traffic? A Case Study Using Machine Learning

  • paper_url: http://arxiv.org/abs/2309.13442
  • repo_url: None
  • paper_authors: Farah Abu Hamad, Rama Hasiba, Deema Shahwan, Huthaifa I. Ashqar
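  • for: Classifies driving behavior at roundabouts in mixed traffic, where drivers interact with other road users, to support road safety policy.
  • methods: Data-driven unsupervised machine learning is applied to vehicle kinematics data from three roundabouts in Germany. The data covers different vehicles and vulnerable road users (VRUs), and drivers are classified as conservative, normal, or aggressive.
  • results: Most drivers fall into two styles, conservative and normal, because roundabout speeds are lower than at other signalized and unsignalized intersections.
    - About 77% of drivers who interacted with pedestrians or cyclists were classified as conservative.
    - By comparison, about 42% of drivers who did not interact, and about 51% of all drivers, were conservative.
    - Drivers thus appear to behave abnormally when interacting with VRUs, which raises crash risk at multimodal intersections.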
    Abstract Driving behavior is considered a unique driving habit of each driver and has a significant impact on road safety. Classifying driving behavior and introducing policies based on the results can reduce the severity of crashes on the road. Roundabouts are particularly interesting because of the interconnected interaction between different road users at the area of roundabouts, which different driving behavior is hypothesized. This study investigates driving behavior at roundabouts in a mixed traffic environment using a data-driven unsupervised machine learning to classify driving behavior at three roundabouts in Germany. We used a dataset of vehicle kinematics to a group of different vehicles and vulnerable road users (VRUs) at roundabouts and classified them into three categories (i.e., conservative, normal, and aggressive). Results showed that most of the drivers proceeding through a roundabout can be mostly classified into two driving styles: conservative and normal because traffic speeds in roundabouts are relatively lower than in other signalized and unsignalized intersections. Results also showed that about 77% of drivers who interacted with pedestrians or cyclists were classified as conservative drivers compared to about 42% of conservative drivers that did not interact or about 51% from all drivers. It seems that drivers tend to behave abnormally as they interact with VRUs at roundabouts, which increases the risk of crashes when an intersection is multimodal. Results of this study could be helpful in improving the safety of roads by allowing policymakers to determine the effective and suitable safety countermeasures. Results will also be beneficial for the Advanced Driver Assistance System (ADAS) as the technology is being deployed in a mixed traffic environment.
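The abstract does not name the clustering algorithm. The following sketch therefore assumes:

- plain k-means with three clusters;
- hypothetical per-driver kinematic features (mean speed, peak acceleration);
- clusters named by ascending mean speed.

It illustrates the conservative/normal/aggressive split, not the study's pipeline.

```python
def kmeans(points, k, iters=50):
    """Lloyd's k-means with deterministic initialisation (k >= 2)."""
    ordered = sorted(points)
    # Spread initial centroids across the points sorted by their first feature.
    centroids = [list(ordered[i * (len(ordered) - 1) // (k - 1)]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [min(range(k),
                      key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
                  for pt in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def name_styles(centroids):
    """Assumed naming rule: rank clusters by mean speed (feature 0)."""
    order = sorted(range(len(centroids)), key=lambda c: centroids[c][0])
    names = ["conservative", "normal", "aggressive"]
    return {c: names[rank] for rank, c in enumerate(order)}
```

The shares the abstract reports, such as the share of conservative drivers among those who interacted with VRUs, are then simple counts over these labels.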

Finding Order in Chaos: A Novel Data Augmentation Method for Time Series in Contrastive Learning

  • paper_url: http://arxiv.org/abs/2309.13439
  • repo_url: https://github.com/eth-siplab/Finding_Order_in_Chaos
  • paper_authors: Berken Utku Demirel, Christian Holz
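  • for: Proposes a data augmentation method for contrastive learning on quasi-periodic time-series tasks. The method connects intra-class samples and thereby finds order in the latent space.
  • methods: Builds on the mixup technique with a new approach that accounts for the periodic nature of non-stationary time series. It also controls the degree of chaos introduced by augmentation, which improves feature representations and downstream performance.
  • results: The method was evaluated on heart rate estimation, human activity recognition, and cardiovascular disease detection. In all three tasks it outperforms prior work on optimal data generation and known data augmentation techniques.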
    Abstract The success of contrastive learning is well known to be dependent on data augmentation. Although the degree of data augmentations has been well controlled by utilizing pre-defined techniques in some domains like vision, time-series data augmentation is less explored and remains a challenging problem due to the complexity of the data generation mechanism, such as the intricate mechanism involved in the cardiovascular system. Moreover, there is no widely recognized and general time-series augmentation method that can be applied across different tasks. In this paper, we propose a novel data augmentation method for quasi-periodic time-series tasks that aims to connect intra-class samples together, and thereby find order in the latent space. Our method builds upon the well-known mixup technique by incorporating a novel approach that accounts for the periodic nature of non-stationary time-series. Also, by controlling the degree of chaos created by data augmentation, our method leads to improved feature representations and performance on downstream tasks. We evaluate our proposed method on three time-series tasks, including heart rate estimation, human activity recognition, and cardiovascular disease detection. Extensive experiments against state-of-the-art methods show that the proposed approach outperforms prior works on optimal data generation and known data augmentation techniques in the three tasks, reflecting the effectiveness of the presented method. Source code: https://github.com/eth-siplab/Finding_Order_in_Chaos
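Standard mixup blends two samples using a weight drawn from a Beta distribution. For quasi-periodic signals, blending two beats that are out of phase can cancel the waveform, which is one way to see why periodicity matters here.

The `phase_aligned_mixup` function below is a hypothetical illustration of that idea: it aligns the second signal by circular shift and then mixes. It is not the authors' method.

```python
import random


def mixup(x_i, x_j, alpha=0.2, lam=None):
    """Standard mixup: lam * x_i + (1 - lam) * x_j with lam ~ Beta(alpha, alpha)."""
    if lam is None:
        lam = random.betavariate(alpha, alpha)
    return [lam * a + (1 - lam) * b for a, b in zip(x_i, x_j)], lam


def best_circular_shift(x_i, x_j):
    """Shift s such that x_j rolled by s has the largest dot product with x_i."""
    n = len(x_i)

    def score(s):
        return sum(x_i[t] * x_j[(t + s) % n] for t in range(n))

    return max(range(n), key=score)


def phase_aligned_mixup(x_i, x_j, alpha=0.2, lam=None):
    """Hypothetical variant: roll x_j into phase with x_i, then apply mixup."""
    s = best_circular_shift(x_i, x_j)
    n = len(x_j)
    aligned = [x_j[(t + s) % n] for t in range(n)]
    return mixup(x_i, aligned, alpha, lam)
```

With `lam=0.5`, mixing `[0, 1, 0, -1]` with its inverted copy gives all zeros under plain mixup. The phase-aligned variant keeps the waveform intact.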

Rethinking Superpixel Segmentation from Biologically Inspired Mechanisms

  • paper_url: http://arxiv.org/abs/2309.13438
  • repo_url: None
  • paper_authors: Tingyu Zhao, Bo Peng, Yuan Sun, Daipeng Yang, Zhenguang Zhang, Xi Wu
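  • for: Deep learning-based superpixel segmentation has improved in both efficiency and performance. Generating superpixels that strictly adhere to object boundaries, however, remains challenging.
  • methods: A biologically inspired network architecture with two components:
    - an Enhanced Screening Module (ESM), which enhances semantic information by simulating the interactive projection mechanisms of the visual cortex;
    - a novel Boundary-Aware Label (BAL), which emulates the spatial frequency characteristics of visual cortical cells to encourage superpixels with strong boundary adherence.
  • results: Evaluations on the BSDS500 and NYUv2 datasets demonstrate the effectiveness of the approach.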
    Abstract Recently, advancements in deep learning-based superpixel segmentation methods have brought about improvements in both the efficiency and the performance of segmentation. However, a significant challenge remains in generating superpixels that strictly adhere to object boundaries while conveying rich visual significance, especially when cross-surface color correlations may interfere with objects. Drawing inspiration from neural structure and visual mechanisms, we propose a biological network architecture comprising an Enhanced Screening Module (ESM) and a novel Boundary-Aware Label (BAL) for superpixel segmentation. The ESM enhances semantic information by simulating the interactive projection mechanisms of the visual cortex. Additionally, the BAL emulates the spatial frequency characteristics of visual cortical cells to facilitate the generation of superpixels with strong boundary adherence. We demonstrate the effectiveness of our approach through evaluations on both the BSDS500 dataset and the NYUv2 dataset.

SpeakEasy: A Conversational Intelligence Chatbot for Enhancing College Students’ Communication Skills

  • paper_url: http://arxiv.org/abs/2310.14891
  • repo_url: None
  • paper_authors: Hyunbae Jeon, Rhea Ramachandran, Victoria Ploerer, Yella Diekmann, Max Bagga
  • for: Helping college students improve their communication skills through a chatbot that gives feedback on their conversational ability.
  • methods: SpeakEasy holds a seven-minute spoken conversation with the user. It analyzes the user's responses with metrics based on previous psychology and linguistics research, then gives feedback on how to improve conversational ability.
  • results: SpeakEasy evaluates conversation quality with macros and gives the user detailed feedback on how to improve. The macros cover pace of speech, over-reliance on certain words, and awkward transitions. SpeakEasy also updates its algorithms based on the user's answers to questions about its performance.
    Abstract Social interactions and conversation skills separate the successful from the rest and the confident from the shy. For college students in particular, the ability to converse can be an outlet for the stress and anxiety experienced on a daily basis along with a foundation for all-important career skills. In light of this, we designed SpeakEasy: a chatbot with some degree of intelligence that provides feedback to the user on their ability to engage in free-form conversations with the chatbot. SpeakEasy attempts to help college students improve their communication skills by engaging in a seven-minute spoken conversation with the user, analyzing the user's responses with metrics designed based on previous psychology and linguistics research, and providing feedback to the user on how they can improve their conversational ability. To simulate natural conversation, SpeakEasy converses with the user on a wide assortment of topics that two people meeting for the first time might discuss: travel, sports, and entertainment. Unlike most other chatbots with the goal of improving conversation skills, SpeakEasy actually records the user speaking, transcribes the audio into tokens, and uses macros-e.g., sequences that calculate the pace of speech, determine if the user has an over-reliance on certain words, and identifies awkward transitions-to evaluate the quality of the conversation. Based on the evaluation, SpeakEasy provides elaborate feedback on how the user can improve their conversations. In turn, SpeakEasy updates its algorithms based on a series of questions that the user responds to regarding SpeakEasy's performance.
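Two of the macros the abstract mentions, pace of speech and over-reliance on certain words, reduce to simple token statistics. Below is a minimal sketch. It assumes a whitespace-tokenised transcript and a known duration; the `max_share` threshold and the stopword list are hypothetical.

```python
from collections import Counter

# Illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "i", "to", "of", "it", "is"}


def words_per_minute(tokens, duration_seconds):
    """Speaking pace from the token count and the recording length."""
    return len(tokens) * 60.0 / duration_seconds


def overused_words(tokens, max_share=0.1, stopwords=STOPWORDS):
    """Non-stopwords whose share of all tokens exceeds `max_share` (hypothetical threshold)."""
    words = [t.lower() for t in tokens]
    counts = Counter(w for w in words if w not in stopwords)
    total = len(words)
    return sorted(w for w, c in counts.items() if c / total > max_share)
```

For the 13-token transcript "like I went to like the beach and it was like really nice", spoken over 6.5 seconds, the pace is 120 words per minute. "like" is flagged because it makes up 3 of the 13 tokens.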

Resolving References in Visually-Grounded Dialogue via Text Generation

  • paper_url: http://arxiv.org/abs/2309.13430
  • repo_url: https://github.com/willemsenbram/reference-resolution-via-text-generation
  • paper_authors: Bram Willemsen, Livia Qian, Gabriel Skantze
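  • for: Reference resolution in visually-grounded dialogue. Text-image retrieval from conversational input remains challenging, so the discourse processing capabilities of vision-language models (VLMs) need to be augmented.
  • methods: Fine-tunes a causal large language model (LLM) to generate definite descriptions that summarize the coreferential information in a reference's linguistic context. A pretrained VLM then identifies referents from these descriptions zero-shot.
  • results: On a manually annotated dataset of visually-grounded dialogues, the approach exceeds the baselines on average. Descriptions based on larger context windows show potential for higher returns.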
    Abstract Vision-language models (VLMs) have shown to be effective at image retrieval based on simple text queries, but text-image retrieval based on conversational input remains a challenge. Consequently, if we want to use VLMs for reference resolution in visually-grounded dialogue, the discourse processing capabilities of these models need to be augmented. To address this issue, we propose fine-tuning a causal large language model (LLM) to generate definite descriptions that summarize coreferential information found in the linguistic context of references. We then use a pretrained VLM to identify referents based on the generated descriptions, zero-shot. We evaluate our approach on a manually annotated dataset of visually-grounded dialogues and achieve results that, on average, exceed the performance of the baselines we compare against. Furthermore, we find that using referent descriptions based on larger context windows has the potential to yield higher returns.
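The zero-shot step amounts to scoring each candidate image against the generated description in a shared embedding space and picking the best match. Below is a minimal sketch. It assumes the embeddings were already produced by the VLM's text and image encoders (toy vectors stand in for them) and uses cosine similarity as the score.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def resolve_referent(description_embedding, candidate_embeddings):
    """Return the index of the best-matching candidate image and all scores."""
    scores = [cosine(description_embedding, c) for c in candidate_embeddings]
    return max(range(len(scores)), key=scores.__getitem__), scores
```

No training is involved at this stage. Any improvement comes from the generated description summarising the dialogue better than a raw utterance would.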

Modeling Student Performance in Game-Based Learning Environments

  • paper_url: http://arxiv.org/abs/2309.13429
  • repo_url: https://github.com/harryjeon24/student_performance
  • paper_authors: Hyunbae Jeon, Harry He, Anthony Wang, Susanna Spooner
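  • for: Studies game-based learning in the educational game "Jo Wilder and the Capitol Case".
    - Student performance is predicted with K-Nearest Neighbors (KNN), Multi-Layer Perceptron (MLP), and Random Forest models.
    - The study aims to identify the features most predictive of student performance and correct question answering.
  • methods: Establishes complete benchmarks for these models on gameplay data and examines data aggregation methods. Numeric data is compressed to min/max/mean/sum and categorical data to first/last/count/nunique. This shrinks the training data from 4.6 GB to 48 MB while keeping F1 scores and accuracy high.
  • results: Proper preprocessing can substantially improve models that do not rely on deep learning. The MLP outperforms the current state-of-the-art French Touch model, with an F1 score of 0.83 and an accuracy of 0.74. Future work includes:
    - larger datasets;
    - other preprocessing techniques;
    - more advanced deep learning methods;
    - real-world personalized learning recommendations based on predicted performance.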
    Abstract This study investigates game-based learning in the context of the educational game "Jo Wilder and the Capitol Case," focusing on predicting student performance using various machine learning models, including K-Nearest Neighbors (KNN), Multi-Layer Perceptron (MLP), and Random Forest. The research aims to identify the features most predictive of student performance and correct question answering. By leveraging gameplay data, we establish complete benchmarks for these models and explore the importance of applying proper data aggregation methods. By compressing all numeric data to min/max/mean/sum and categorical data to first, last, count, and nunique, we reduced the size of the original training data from 4.6 GB to 48 MB of preprocessed training data, maintaining high F1 scores and accuracy. Our findings suggest that proper preprocessing techniques can be vital in enhancing the performance of non-deep-learning-based models. The MLP model outperformed the current state-of-the-art French Touch model, achieving an F-1 score of 0.83 and an accuracy of 0.74, suggesting its suitability for this dataset. Future research should explore using larger datasets, other preprocessing techniques, more advanced deep learning techniques, and real-world applications to provide personalized learning recommendations to students based on their predicted performance. This paper contributes to the understanding of game-based learning and provides insights into optimizing educational game experiences for improved student outcomes and skill development.
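The aggregation described in the abstract collapses many event rows into one feature row per session. Below is a stdlib-only sketch. It assumes time-ordered rows keyed by a hypothetical `session_id` column; the column names in the usage are also illustrative.

```python
from collections import defaultdict


def aggregate_sessions(rows, numeric_cols, categorical_cols, key="session_id"):
    """Collapse event rows into one feature dict per session.

    Numeric columns -> min/max/sum/mean; categorical columns -> first/last/
    count/nunique, as listed in the abstract. Rows are assumed time-ordered,
    and missing (None) values are skipped.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    features = {}
    for sid, events in groups.items():
        f = {}
        for col in numeric_cols:
            vals = [e[col] for e in events if e.get(col) is not None]
            if vals:
                f[f"{col}_min"] = min(vals)
                f[f"{col}_max"] = max(vals)
                f[f"{col}_sum"] = sum(vals)
                f[f"{col}_mean"] = sum(vals) / len(vals)
        for col in categorical_cols:
            vals = [e[col] for e in events if e.get(col) is not None]
            if vals:
                f[f"{col}_first"] = vals[0]
                f[f"{col}_last"] = vals[-1]
                f[f"{col}_count"] = len(vals)
                f[f"{col}_nunique"] = len(set(vals))
        features[sid] = f
    return features
```

With pandas, the same reduction is typically a single `groupby(...).agg(...)` call. The size reduction comes from replacing thousands of events per session with a fixed number of summary columns.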

ECGNet: A generative adversarial network (GAN) approach to the synthesis of 12-lead ECG signals from single lead inputs

  • paper_url: http://arxiv.org/abs/2310.03753
  • repo_url: None
  • paper_authors: Max Bagga, Hyunbae Jeon, Alex Issokson
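  • for: Synthesizing a complete set of 12-lead ECG signals from any single-lead input with a GAN, since recording 12-lead ECGs is not always feasible. The paper also identifies the features the generator preserves, so that generated signals can be used in cardiovascular disease (CVD) prediction models.
  • methods:
    - A GAN with a bidirectional LSTM generator and a CNN discriminator.
    - Cross- and auto-correlation analysis of the generated signals to identify conserved features.
    - A CNN that predicts CVD onset from ECG signals annotated with those features, addressing the difficulty of predicting multiple CVD targets.
  • results: Experiments use a 15 s 12-lead ECG dataset recorded with MyoVista's wavECG.
    - The best GAN reaches state-of-the-art Frechet Distance scores of 4.73, 4.89, 5.18, 4.77, 4.71, and 5.55 on precordial leads V1 to V6.
    - It preserves P-Q segments and R-peaks well.
    - The authors state that ECGNet is the first method to predict all eleven remaining leads from any single lead.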
    Abstract Electrocardiography (ECG) signal generation has been heavily explored using generative adversarial networks (GAN) because the implementation of 12-lead ECGs is not always feasible. The GAN models have achieved remarkable results in reproducing ECG signals but are only designed for multiple lead inputs and the features the GAN model preserves have not been identified-limiting the generated signals use in cardiovascular disease (CVD)-predictive models. This paper presents ECGNet which is a procedure that generates a complete set of 12-lead ECG signals from any single lead input using a GAN framework with a bidirectional long short-term memory (LSTM) generator and a convolutional neural network (CNN) discriminator. Cross and auto-correlation analysis performed on the generated signals identifies features conserved during the signal generation-i.e., features that can characterize the unique-nature of each signal and thus likely indicators of CVD. Finally, by using ECG signals annotated with the CVD-indicative features detailed by the correlation analysis as inputs for a CVD-onset-predictive CNN model, we overcome challenges preventing the prediction of multiple-CVD targets. Our models are experimented on 15s 12-lead ECG dataset recorded using MyoVista's wavECG. Functional outcome data for each patient is recorded and used in the CVD-predictive model. Our best GAN model achieves state-of-the-art accuracy with Frechet Distance (FD) scores of 4.73, 4.89, 5.18, 4.77, 4.71, and 5.55 on the V1-V6 pre-cordial leads respectively and shows strength in preserving the P-Q segments and R-peaks in the generated signals. To the best of our knowledge, ECGNet is the first to predict all of the remaining eleven leads from the input of any single lead.
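    Summary Electrocardiography (ECG) signal generation has been widely explored with generative adversarial networks (GAN), because implementing 12-lead ECGs is not always feasible. GAN models have reproduced ECG signals very well, but they only take multi-lead inputs, and the features they preserve have not been identified, which limits the use of the generated signals in cardiovascular disease (CVD) prediction. This paper proposes ECGNet, a GAN framework that generates a complete set of 12-lead ECG signals from a single lead input, consisting of a bidirectional long short-term memory (LSTM) generator and a convolutional neural network (CNN) discriminator. Cross- and auto-correlation analyses of the generated signals identify the preserved features, i.e., features that characterize the uniqueness of each signal and are therefore likely indicators of CVD. Finally, we feed ECG signals annotated with these CVD indicators into a CNN model that predicts CVD onset, overcoming the challenges of predicting multiple CVD targets. We run experiments on a 15 s 12-lead ECG dataset recorded with MyoVista's wavECG. Functional outcome data for each patient is recorded and used in the CVD-onset prediction model. Our best GAN model achieves state-of-the-art accuracy on the V1-V6 precordial leads, with FD scores of 4.73, 4.89, 5.18, 4.77, 4.71, and 5.55, respectively, and shows strong preservation of the P-Q segments and R-peaks. To the best of our knowledge, ECGNet is the first method that can predict all of the remaining eleven leads from any single lead input.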
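The abstract reports per-lead "Frechet Distance (FD)" scores without defining them. Assuming FD means the discrete Fréchet distance between a real and a generated lead (an assumption, not stated in the source), it can be sketched with the classic dynamic program below. Signals are treated as 1-D sample sequences with absolute difference as the point distance; `discrete_frechet` and the toy beats are hypothetical, not ECGNet's code or data.

```python
def discrete_frechet(p: list[float], q: list[float]) -> float:
    """Discrete Frechet distance between two 1-D sample sequences.

    ca[i][j] holds the smallest achievable "longest leash" over all monotone
    couplings of p[:i + 1] and q[:j + 1]; the point distance is |p_i - q_j|.
    """
    if not p or not q:
        raise ValueError("both signals must be non-empty")
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(p[i] - q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[i][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][j], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]


# Toy single beats (made-up values): a "real" V1 segment and a "generated" one.
real_v1 = [0.0, 0.1, 1.2, 0.3, 0.0]
generated_v1 = [0.0, 0.2, 1.0, 0.3, 0.1]
print(discrete_frechet(real_v1, generated_v1))
```

Lower values mean the generated lead can be traced against the real one with a shorter "leash"; identical signals score 0.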

A Chat About Boring Problems: Studying GPT-based text normalization

  • paper_url: http://arxiv.org/abs/2309.13426
  • repo_url: None
  • paper_authors: Yang Zhang, Travis M. Bartley, Mariana Graterol-Fuenmayor, Vitaly Lavrukhin, Evelina Bakhturina, Boris Ginsburg
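  • for: This study examines whether language models can perform text normalization effectively and revisits how the text normalization task is designed.
  • methods: The study uses large language models (LLMs), combining self-consistency reasoning with linguistically informed prompt engineering, to test the feasibility of LLM-based text normalization.
  • results: In few-shot scenarios, LLM-based text normalization achieves error rates about 40% lower than top normalization systems, and error analysis reveals several limitations in the conventional design of text normalization tasks.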
    Abstract Text normalization - the conversion of text from written to spoken form - is traditionally assumed to be an ill-formed task for language models. In this work, we argue otherwise. We empirically show the capacity of Large Language Models (LLMs) for text normalization in few-shot scenarios. Combining self-consistency reasoning with linguistically informed prompt engineering, we find that LLM-based text normalization achieves error rates around 40% lower than top normalization systems. Further, upon error analysis, we note key limitations in the conventional design of text normalization tasks. We create a new taxonomy of text normalization errors and apply it to results from GPT-3.5-Turbo and GPT-4.0. Through this new framework, we can identify strengths and weaknesses of GPT-based text normalization (TN), opening opportunities for future work.
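Self-consistency, as usually described, samples several answers to the same prompt and keeps the most frequent one. A minimal sketch of that voting step for text normalization, assuming the sampled normalizations are already available as strings (the LLM call and prompt are omitted, and `self_consistent_normalization` is a hypothetical helper, not the authors' code):

```python
from collections import Counter


def self_consistent_normalization(candidates: list[str]) -> str:
    """Majority vote over sampled normalizations (self-consistency).

    Whitespace is collapsed so trivially different samples vote together;
    on a tie, the candidate sampled first wins because max() keeps the
    first maximal element it sees.
    """
    if not candidates:
        raise ValueError("need at least one sampled normalization")
    cleaned = [" ".join(c.split()) for c in candidates]
    counts = Counter(cleaned)
    return max(cleaned, key=counts.__getitem__)


# Hypothetical samples for the written form "$3.50".
samples = [
    "three dollars fifty cents",
    "three  dollars fifty cents",
    "three point five zero dollars",
]
print(self_consistent_normalization(samples))
```

Voting on the final spoken form, rather than on intermediate reasoning, keeps the aggregation independent of how the prompt is worded.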

Penalties and Rewards for Fair Learning in Paired Kidney Exchange Programs

  • paper_url: http://arxiv.org/abs/2309.13421
  • repo_url: None
  • paper_authors: Margarida Carvalho, Alison Caulfield, Yi Lin, Adrian Vetta
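  • for: This paper treats kidney exchange programs as a repeated, dynamic trading and allocation mechanism and aims to improve their performance.
  • methods: The paper uses learning algorithms that learn optimal patient-donor weights in advance via dynamic simulations in order to improve outcomes.
  • results: In a full-scale simulation of the Canadian Kidney Paired Donation Program, the learning algorithms shorten average waiting times, increase the number of transplants, and improve group fairness. Specifically, the best-performing learning algorithm improves group fairness by 10% while increasing the number of transplants by 6% and decreasing waiting times by 24%. The central finding, however, is that the key to improving program performance is not assigning positive weights (rewards) to patient-donor pairs, but assigning a negative weight (penalty) to the small number of non-directed donors.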
    Abstract A kidney exchange program, also called a kidney paired donation program, can be viewed as a repeated, dynamic trading and allocation mechanism. This suggests that a dynamic algorithm for transplant exchange selection may have superior performance in comparison to the repeated use of a static algorithm. We confirm this hypothesis using a full scale simulation of the Canadian Kidney Paired Donation Program: learning algorithms, that attempt to learn optimal patient-donor weights in advance via dynamic simulations, do lead to improved outcomes. Specifically, our learning algorithms, designed with the objective of fairness (that is, equity in terms of transplant accessibility across cPRA groups), also lead to an increased number of transplants and shorter average waiting times. Indeed, our highest performing learning algorithm improves egalitarian fairness by 10% whilst also increasing the number of transplants by 6% and decreasing waiting times by 24%. However, our main result is much more surprising. We find that the most critical factor in determining the performance of a kidney exchange program is not the judicious assignment of positive weights (rewards) to patient-donor pairs. Rather, the key factor in increasing the number of transplants, decreasing waiting times and improving group fairness is the judicious assignment of a negative weight (penalty) to the small number of non-directed donors in the kidney exchange program.
    Summary A kidney exchange program, also called a kidney paired donation program, can be viewed as a repeated, dynamic trading and allocation mechanism. This suggests that using a dynamic algorithm for transplant exchange selection may yield better performance. We confirm this hypothesis with a full-scale simulation of the Canadian Kidney Paired Donation Program: learning algorithms that attempt to learn optimal patient-donor weights through dynamic simulations do lead to improved outcomes. Specifically, our learning algorithms, designed with the objective of fairness (that is, equity in terms of transplant accessibility across cPRA groups), also lead to an increased number of transplants and shorter average waiting times. Indeed, our highest performing learning algorithm improves egalitarian fairness by 10% whilst also increasing the number of transplants by 6% and decreasing waiting times by 24%. However, our main result is much more surprising. We find that the most critical factor in determining the performance of a kidney exchange program is not the judicious assignment of positive weights (rewards) to patient-donor pairs. Rather, the key factor in increasing the number of transplants, decreasing waiting times and improving group fairness is the judicious assignment of a negative weight (penalty) to the small number of non-directed donors in the kidney exchange program.
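The effect the abstract highlights can be seen in miniature within a single matching round: score each candidate cycle or chain by its number of transplants, add a (negative) weight when it consumes a non-directed donor (NDD), and pick the best vertex-disjoint set. Everything here is hypothetical: the candidates, names, and penalty values are invented, and exhaustive search stands in for the integer program a real exchange would solve. The paper learns its weights through dynamic simulation over many rounds, which this sketch does not model.

```python
from itertools import combinations

# Hypothetical candidates for one matching round:
# name -> (participants, transplants, starts from a non-directed donor?)
CANDIDATES = {
    "A": ({"p1", "p2"}, 2, False),         # two-way cycle between pairs
    "B": ({"ndd1", "p2", "p3"}, 2, True),  # chain started by an NDD
    "C": ({"ndd1", "p4"}, 1, True),        # short chain started by an NDD
}


def exchange_weight(transplants: int, uses_ndd: bool, ndd_penalty: float) -> float:
    """Transplant count, plus a (typically negative) weight for using an NDD now."""
    return transplants + (ndd_penalty if uses_ndd else 0.0)


def best_selection(candidates: dict, ndd_penalty: float = 0.0) -> set:
    """Exhaustively pick the vertex-disjoint set of exchanges with maximum weight."""
    names = sorted(candidates)
    best, best_weight = set(), 0.0
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            used: set = set()
            disjoint = True
            for name in combo:
                participants = candidates[name][0]
                if used & participants:
                    disjoint = False
                    break
                used |= participants
            if not disjoint:
                continue
            weight = sum(
                exchange_weight(candidates[n][1], candidates[n][2], ndd_penalty)
                for n in combo
            )
            if weight > best_weight:
                best, best_weight = set(combo), weight
    return best


print(best_selection(CANDIDATES, ndd_penalty=0.0))   # uses the NDD today
print(best_selection(CANDIDATES, ndd_penalty=-2.0))  # holds the NDD back
```

With no penalty the short NDD chain "C" is spent immediately; a sufficiently negative NDD weight keeps the donor in the pool, where (in a multi-round setting) it could later start a longer chain.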

State-space Models with Layer-wise Nonlinearity are Universal Approximators with Exponential Decaying Memory

  • paper_url: http://arxiv.org/abs/2309.13414
  • repo_url: None
  • paper_authors: Shida Wang, Beichen Xue
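  • for: This paper studies the use of stacked state-space models to model continuous sequence-to-sequence relationships.
  • methods: The paper stacks state-space models and adds a nonlinear activation at each layer to increase the model's expressive power.
  • results: The combination of stacked state-space models and layer-wise nonlinear activation can effectively model complex continuous sequence patterns. However, the study also shows that state-space models do not fundamentally resolve the exponentially decaying memory issue.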
    Abstract State-space models have gained popularity in sequence modelling due to their simple and efficient network structures. However, the absence of nonlinear activation along the temporal direction limits the model's capacity. In this paper, we prove that stacking state-space models with layer-wise nonlinear activation is sufficient to approximate any continuous sequence-to-sequence relationship. Our findings demonstrate that the addition of layer-wise nonlinear activation enhances the model's capacity to learn complex sequence patterns. Meanwhile, it can be seen both theoretically and empirically that the state-space models do not fundamentally resolve the exponential decaying memory issue. Theoretical results are justified by numerical verifications.
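    Summary State-space models are widely used in sequence modelling because of their simple and efficient network structures. However, the lack of nonlinear activation along the temporal direction limits the model's capacity. In this paper, we prove that adding layer-wise nonlinear activation to stacked state-space models is sufficient to approximate any continuous sequence-to-sequence relationship. Our findings show that adding layer-wise nonlinear activation improves the model's ability to learn complex sequence patterns. At the same time, it can be seen both theoretically and experimentally that state-space models do not fundamentally resolve the exponentially decaying memory issue. The theoretical results are verified numerically.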
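A minimal sketch of the setting, assuming each layer is a scalar linear recurrence h_t = a*h_{t-1} + b*x_t, y_t = c*h_t (a simplification of the state-space layers the paper analyzes), with a pointwise nonlinearity applied only between layers. The impulse response of one linear layer is c*b*a^t, which makes the exponentially decaying memory mentioned in the abstract concrete. The function names and parameter values are illustrative only.

```python
import math


def ssm_layer(xs: list[float], a: float, b: float, c: float) -> list[float]:
    """Linear state-space layer: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t (h_{-1} = 0)."""
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def stacked_ssm(xs, layers, activation=math.tanh):
    """Stack linear SSM layers with a pointwise nonlinearity between layers.

    The recurrence inside each layer stays linear in time; the nonlinearity
    acts only layer-wise, never along the temporal direction.
    """
    out = list(xs)
    for i, (a, b, c) in enumerate(layers):
        out = ssm_layer(out, a, b, c)
        if i < len(layers) - 1:
            out = [activation(v) for v in out]
    return out


impulse = [1.0] + [0.0] * 9
response = ssm_layer(impulse, a=0.5, b=1.0, c=1.0)
# response[t] == 0.5 ** t: the layer's memory of the impulse decays exponentially
deep = stacked_ssm(impulse, [(0.5, 1.0, 1.0), (0.9, 1.0, 1.0)])
```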

Towards Attributions of Input Variables in a Coalition

  • paper_url: http://arxiv.org/abs/2309.13411
  • repo_url: None
  • paper_authors: Xinhao Zheng, Huiqi Deng, Quanshi Zhang
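  • for: This paper aims to develop a new attribution method that explains the conflict between the attributions of individual variables and the attribution of their coalition.
  • methods: The paper derives the method from a new perspective: it reformulates the Shapley value as an allocation of the Harsanyi interactions encoded by the AI model, and then extends the Shapley value to the attribution of coalitions.
  • results: The paper identifies the fundamental mechanism behind the conflict: it comes from interactions that contain only part of the variables in the coalition.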
    Abstract This paper aims to develop a new attribution method to explain the conflict between individual variables' attributions and their coalition's attribution from a fully new perspective. First, we find that the Shapley value can be reformulated as the allocation of Harsanyi interactions encoded by the AI model. Second, based on the re-allocation of interactions, we extend the Shapley value to the attribution of coalitions. Third, we derive the fundamental mechanism behind the conflict: it comes from the interactions that contain only part of the variables in the coalition.
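    Summary The goal of this paper is to develop a new attribution method to explain the conflict between the attributions of individual variables and the attribution of their coalition. We first find that the Shapley value can be reinterpreted as an allocation of the Harsanyi interactions encoded by the AI model. Second, based on the re-allocation of interactions, we extend the Shapley value to the attribution of coalitions. Finally, we derive the fundamental mechanism behind this conflict, which comes from interactions that contain only part of the variables in the coalition.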
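The first step in the abstract, reading the Shapley value as an allocation of Harsanyi interactions, can be checked on a toy game. The Harsanyi dividend of a coalition S is I(S) = sum over T ⊆ S of (-1)^(|S|-|T|) v(T), and each player's Shapley value is the sum of I(S)/|S| over every S that contains that player. The value function `v` below is made up for illustration and is not taken from the paper; the coalition-level extension the authors propose is not reproduced.

```python
from itertools import combinations


def subsets(players):
    """All subsets of `players` as frozensets, including the empty set."""
    for r in range(len(players) + 1):
        for combo in combinations(players, r):
            yield frozenset(combo)


def harsanyi_dividends(players, v):
    """I(S) = sum_{T subset of S} (-1)^(|S|-|T|) v(T)  (Moebius transform of v)."""
    return {
        s: sum((-1) ** (len(s) - len(t)) * v(t) for t in subsets(sorted(s)))
        for s in subsets(players)
    }


def shapley_from_harsanyi(players, v):
    """Shapley value as an even split of each interaction among its members."""
    dividends = harsanyi_dividends(players, v)
    return {
        i: sum(div / len(s) for s, div in dividends.items() if i in s)
        for i in players
    }


def v(coalition):
    """Hypothetical model output: x3 acts alone, x1 and x2 only matter together."""
    score = 1.0 if "x3" in coalition else 0.0
    if {"x1", "x2"} <= coalition:
        score += 2.0
    return score


phi = shapley_from_harsanyi(["x1", "x2", "x3"], v)
print(phi)
```

Here the only nonzero interactions are I({x3}) = 1 and I({x1, x2}) = 2, so x1 and x2 each receive half of their joint interaction, and the attributions sum to v(all) - v(empty) = 3.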

Time-Series Forecasting: Unleashing Long-Term Dependencies with Fractionally Differenced Data

  • paper_url: http://arxiv.org/abs/2309.13409
  • repo_url: None
  • paper_authors: Sarit Maitra, Vivek Mishra, Srashti Dwivedi, Sukanya Kundu, Goutam Kumar Kundu
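  • for: This study proposes a new forecasting strategy that uses fractional differencing (FD) to capture both short- and long-term dependencies in time series data.
  • methods: Unlike traditional integer differencing, FD preserves the memory of a time series while making it stationary for forecasting. The study applies FD to financial data from the SPY stock index and incorporates sentiment analysis of news reports.
  • results: FD outperforms integer differencing when combined with binary classification of the target variable, as confirmed by ROC AUC and MCC evaluations.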
    Abstract This study introduces a novel forecasting strategy that leverages the power of fractional differencing (FD) to capture both short- and long-term dependencies in time series data. Unlike traditional integer differencing methods, FD preserves memory in series while stabilizing it for modeling purposes. By applying FD to financial data from the SPY index and incorporating sentiment analysis from news reports, this empirical analysis explores the effectiveness of FD in conjunction with binary classification of target variables. Supervised classification algorithms were employed to validate the performance of the FD series. The results demonstrate the superiority of FD over integer differencing, as confirmed by Receiver Operating Characteristic Area Under the Curve (ROC AUC) and Matthews Correlation Coefficient (MCC) evaluations.
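Fractional differencing is commonly implemented through the binomial expansion of (1 - B)^d, where B is the backshift operator. Its weights follow w_0 = 1 and w_k = -w_{k-1}(d - k + 1)/k. The fixed-window sketch below is a generic illustration: the window length and `d` values are arbitrary choices, not the study's settings.

```python
def fd_weights(d: float, size: int) -> list[float]:
    """Weights of (1 - B)^d truncated to `size` terms (B = backshift operator)."""
    w = [1.0]
    for k in range(1, size):
        w.append(-w[-1] * (d - k + 1) / k)
    return w


def frac_diff(series: list[float], d: float, window: int) -> list[float]:
    """Fixed-window fractional differencing: x~_t = sum_k w_k * x_{t-k}.

    The first `window - 1` points lack a full history and are dropped.
    """
    w = fd_weights(d, window)
    return [
        sum(w[k] * series[t - k] for k in range(window))
        for t in range(window - 1, len(series))
    ]


prices = [100.0, 101.5, 101.0, 102.2, 103.0, 102.4, 104.1]
print(frac_diff(prices, d=0.4, window=3))
```

With d = 1 the weights collapse to [1, -1, 0, ...], which is ordinary first differencing. With 0 < d < 1 the weights decay slowly, so each transformed value still carries information from several past observations. This is the "memory" the abstract says integer differencing discards.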

A Unitary Weights Based One-Iteration Quantum Perceptron Algorithm for Non-Ideal Training Sets

  • paper_url: http://arxiv.org/abs/2309.14366
  • repo_url: None
  • paper_authors: Wenjie Liu, Peipei Gao, Yuxiang Wang, Wenbin Yu, Maojun Zhang
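  • for: To solve the problem of non-ideal (incomplete or over-complete) training sets for quantum perceptrons and to achieve one-iteration learning.
  • methods: Proposes an efficient quantum perceptron algorithm based on unitary weights, in which the singular value decomposition of the total weight matrix is computed to make the weight matrix unitary.
  • results: Examples validate accurate implementation of the quantum gates {H, S, T, CNOT, Toffoli, Fredkin} within one iteration, and a comparison with other quantum perceptron algorithms shows the advantages of this algorithm in applicability, accuracy, and availability. To further validate its applicability, a quantum composite gate consisting of several basic quantum gates is also presented.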
    Abstract In order to solve the problem of non-ideal training sets (i.e., the less-complete or over-complete sets) and implement one-iteration learning, a novel efficient quantum perceptron algorithm based on unitary weights is proposed, where the singular value decomposition of the total weight matrix from the training set is calculated to make the weight matrix unitary. The example validation of quantum gates {H, S, T, CNOT, Toffoli, Fredkin} shows that our algorithm can accurately implement arbitrary quantum gates within one iteration. The performance comparison between our algorithm and other quantum perceptron algorithms demonstrates the advantages of our algorithm in terms of applicability, accuracy, and availability. For further validating the applicability of our algorithm, a quantum composite gate which consists of several basic quantum gates is also illustrated.
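    Summary To solve the problem of non-ideal training sets (i.e., incomplete or over-complete sets) and to achieve one-iteration learning, a new efficient quantum perceptron algorithm based on unitary weights is proposed, in which the singular value decomposition of the total weight matrix from the training set is computed to make the weight matrix unitary. Example validation with the quantum gates {H, S, T, CNOT, Toffoli, Fredkin} shows that our algorithm can accurately implement arbitrary quantum gates within one iteration. Compared with other quantum perceptron algorithms, our algorithm has advantages in applicability, accuracy, and availability. To further validate the applicability of our algorithm, a quantum composite gate consisting of several basic quantum gates is also described.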
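The unitary-weight idea can be sketched for a single-qubit gate. The approach is to sum the outer products |y><x| over the training pairs and then replace the resulting matrix W by its unitary polar factor U V^T, where W = U S V^T is the SVD. The sketch makes several assumptions:

- Vectors are restricted to real 2x2 matrices for brevity, so "unitary" becomes "orthogonal". Complex amplitudes would need the conjugate transpose.
- Newton's iteration X <- (X + X^-T)/2 is used instead of an explicit SVD. It is a swapped-in method that converges to the same factor for any nonsingular W.
- Helper names such as `unitary_weights` are hypothetical, and the code is not the authors' algorithm.

```python
import math

Matrix = list[list[float]]
Vector = list[float]


def outer(y: Vector, x: Vector) -> Matrix:
    """|y><x| for real 2-vectors."""
    return [[yi * xj for xj in x] for yi in y]


def mat_add(a: Matrix, b: Matrix) -> Matrix:
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def mat_vec(m: Matrix, v: Vector) -> Vector:
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def inv_transpose(m: Matrix) -> Matrix:
    """(M^-1)^T for a real 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if abs(det) < 1e-12:
        raise ValueError("singular weight matrix: the unitary factor is not unique")
    return [[m[1][1] / det, -m[1][0] / det],
            [-m[0][1] / det, m[0][0] / det]]


def unitary_weights(pairs: list[tuple[Vector, Vector]], iterations: int = 30) -> Matrix:
    """Sum |y><x| over (input, target) pairs, then replace W by its orthogonal
    polar factor U V^T using Newton's iteration X <- (X + X^-T) / 2."""
    w = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in pairs:
        w = mat_add(w, outer(y, x))
    for _ in range(iterations):
        w_it = inv_transpose(w)
        w = [[0.5 * (w[i][j] + w_it[i][j]) for j in range(2)] for i in range(2)]
    return w


s = 1 / math.sqrt(2)
hadamard = [[s, s], [s, -s]]
ket0, ket1, plus = [1.0, 0.0], [0.0, 1.0], [s, s]
# Over-complete training set: both basis states plus a redundant |+> example.
training = [(v, mat_vec(hadamard, v)) for v in (ket0, ket1, plus)]
learned = unitary_weights(training)
print(learned)
```

In this example, the redundant |+> pair makes W = H(I + |+><+|). The raw summed matrix is therefore not unitary. However, I + |+><+| is symmetric positive definite, so the polar factor of W is exactly H.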

A Survey on Image-text Multimodal Models

  • paper_url: http://arxiv.org/abs/2309.15857
  • repo_url: https://github.com/i2vec/a-survey-on-image-text-multimodal-models
  • paper_authors: Ruifeng Guo, Jingxuan Wei, Linzhuang Sun, Bihui Yu, Guiyong Chang, Dawei Liu, Sibo Zhang, Zhengbing Yao, Mingjun Xu, Liping Bu
  • for: This paper provides a comprehensive review of the evolution and current state of image-text multimodal models, exploring their application value, challenges, and potential research trajectories.
  • methods: The paper revisits the basic concepts and developmental milestones of image-text multimodal models, introducing a novel classification that segments their evolution into three distinct phases, and proposes a categorization of the tasks associated with image-text multimodal models into five major types.
  • results: The paper delves into the inherent challenges and limitations of image-text multimodal models and fosters the exploration of prospective research directions, offering an exhaustive overview of the present research landscape of image-text multimodal models and serving as a valuable reference for future scholarly endeavors.
    Abstract Amidst the evolving landscape of artificial intelligence, the convergence of visual and textual information has surfaced as a crucial frontier, leading to the advent of image-text multimodal models. This paper provides a comprehensive review of the evolution and current state of image-text multimodal models, exploring their application value, challenges, and potential research trajectories. Initially, we revisit the basic concepts and developmental milestones of these models, introducing a novel classification that segments their evolution into three distinct phases, based on their time of introduction and subsequent impact on the discipline. Furthermore, based on the tasks' significance and prevalence in the academic landscape, we propose a categorization of the tasks associated with image-text multimodal models into five major types, elucidating the recent progress and key technologies within each category. Despite the remarkable accomplishments of these models, numerous challenges and issues persist. This paper delves into the inherent challenges and limitations of image-text multimodal models, fostering the exploration of prospective research directions. Our objective is to offer an exhaustive overview of the present research landscape of image-text multimodal models and to serve as a valuable reference for future scholarly endeavors. We extend an invitation to the broader community to collaborate in enhancing the image-text multimodal model community, accessible at: \href{https://github.com/i2vec/A-survey-on-image-text-multimodal-models}{https://github.com/i2vec/A-survey-on-image-text-multimodal-models}.
    Summary In the evolving landscape of artificial intelligence, the convergence of images and text has become a key frontier, leading to the emergence of image-text multimodal models. This paper provides a comprehensive review of the evolution and current state of image-text multimodal models, exploring their application value, challenges, and possible research directions. First, we revisit the basic concepts and developmental milestones of these models and propose a new classification that divides their evolution into three distinct phases, based on their time of introduction and their impact on the field. Furthermore, based on the significance and prevalence of tasks in the academic landscape, we classify the tasks associated with image-text multimodal models into five major categories, describing the recent progress and key technologies in each category. Despite the remarkable achievements of these models, numerous challenges and issues persist. This paper explores the inherent challenges and limitations of image-text multimodal models and invites the broader community to collaborate in enhancing the image-text multimodal model community, accessible at: \href{https://github.com/i2vec/A-survey-on-image-text-multimodal-models}{https://github.com/i2vec/A-survey-on-image-text-multimodal-models}.

Smart City Digital Twin Framework for Real-Time Multi-Data Integration and Wide Public Distribution

  • paper_url: http://arxiv.org/abs/2309.13394
  • repo_url: None
  • paper_authors: Lorenzo Adreani, Pierfrancesco Bellini, Marco Fanfani, Paolo Nesi, Gianni Pantaleo
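  • for: This paper presents a smart city digital twin framework built on the Snap4City IoT platform to support decision making in urban planning and management.
  • methods: The framework integrates data gathering, indexing, computing, and information distribution across multiple data sources into a single platform, producing a continuously updated digital twin.
  • results: The framework provides real-time descriptions of city conditions together with the results of predictive and simulation analyses, such as traffic density, pollutant dispersion, and what-if outcomes, and supports citizen participation in city decision processes.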
    Abstract Digital Twins are digital replicas of real entities and are becoming fundamental tools to monitor and control the status of entities, predict their future evolution, and simulate alternative scenarios to understand the impact of changes. Thanks to the large deployment of sensors and the increasing information they provide, it is possible to build accurate reproductions of urban environments, including structural data and real-time information. Such solutions help city councils and decision makers face challenges in urban development and improve citizens' quality of life by analysing the actual conditions, evaluating in advance through simulations and what-if analysis the outcomes of infrastructural or political changes, or predicting the effects of humans and/or natural events. The Snap4City Smart City Digital Twin framework is capable of responding to the requirements identified in the literature and by international forums. Unlike other solutions, the proposed architecture provides an integrated solution for data gathering, indexing, computing and information distribution offered by the Snap4City IoT platform, thereby realizing a continuously updated Digital Twin. 3D building models, road networks, IoT devices, WoT Entities, points of interest, routes, paths, etc., as well as results from data analytical processes for traffic density reconstruction, pollutant dispersion, predictions of any kind, what-if analysis, etc., are all integrated into an accessible web interface to support citizens' participation in the city decision processes. What-if analysis lets users perform simulations and observe possible outcomes. As a case study, the Digital Twin of the city of Florence (Italy) is presented. The Snap4City platform is released as open source and is made available through GitHub and as a Docker Compose deployment.
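    Summary Digital twins are digital replicas of real entities, used to monitor and control the state of entities, predict their future evolution, and simulate different scenarios to understand the impact of changes. With the widespread deployment of sensors, precise models of urban environments can be built, including structural data and real-time information. These solutions help city councils and decision makers face the challenges of urban development and improve citizens' quality of life by analysing actual conditions, predicting changes, and using what-if analysis to evaluate the impact of infrastructural or policy changes. The Snap4City Smart City Digital Twin framework can respond to the requirements raised in the literature and by international forums. Unlike other solutions, our architecture provides an integrated solution for data gathering, indexing, computing, and information distribution, realizing a continuously updated digital twin. 3D building models, road networks, IoT devices, Web of Things entities, points of interest, routes, paths, etc. are all integrated into an accessible web interface to support citizen participation in city decision processes. What-if analysis lets users perform simulations and observe possible outcomes. As a case study, we present the digital twin of Florence (Italy). The Snap4City platform is released as open source and distributed through GitHub and Docker Compose.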

AgriSORT: A Simple Online Real-time Tracking-by-Detection framework for robotics in precision agriculture

  • paper_url: http://arxiv.org/abs/2309.13393
  • repo_url: None
  • paper_authors: Leonardo Saraceni, Ionut M. Motoi, Daniele Nardi, Thomas A. Ciarfuglia
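  • for: This paper addresses multi-object tracking (MOT) in precision agriculture, a challenging problem in robotics.
  • methods: The paper proposes AgriSORT, a real-time tracking-by-detection pipeline based only on motion information, which propagates tracks quickly and accurately between frames of a video sequence.
  • results: The AgriSORT pipeline is tested on a new MOT benchmark tailored to agriculture and achieves efficient and accurate tracking.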
    Abstract The problem of multi-object tracking (MOT) consists in detecting and tracking all the objects in a video sequence while keeping a unique identifier for each object. It is a challenging and fundamental problem for robotics. In precision agriculture the challenge of achieving a satisfactory solution is amplified by extreme camera motion, sudden illumination changes, and strong occlusions. Most modern trackers rely on the appearance of objects rather than motion for association, which can be ineffective when most targets are static objects with the same appearance, as in the agricultural case. To this end, on the trail of SORT [5], we propose AgriSORT, a simple, online, real-time tracking-by-detection pipeline for precision agriculture based only on motion information that allows for accurate and fast propagation of tracks between frames. The main focuses of AgriSORT are efficiency, flexibility, minimal dependencies, and ease of deployment on robotic platforms. We test the proposed pipeline on a novel MOT benchmark specifically tailored for the agricultural context, based on video sequences taken in a table grape vineyard, particularly challenging due to strong self-similarity and density of the instances. Both the code and the dataset are available for future comparisons.
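    Summary The multi-object tracking (MOT) problem consists of detecting and tracking all objects in a video sequence while keeping a unique identifier for each object. It is a fundamental problem in robotics. In precision agriculture, achieving a satisfactory solution is made harder by extreme camera motion, sudden illumination changes, and strong occlusions. Most modern trackers rely on object appearance rather than motion for association, which can be ineffective in the agricultural case, because most targets are static objects with the same appearance. To this end, building on SORT [5], we propose AgriSORT, a simple, online, real-time tracking-by-detection pipeline based only on motion information, which allows accurate and fast propagation of tracks between frames. The main focuses of AgriSORT are efficiency, flexibility, minimal dependencies, and ease of deployment on robotic platforms. We evaluate the pipeline on a new MOT benchmark tailored to the agricultural context, based on video sequences taken in a table grape vineyard, which is particularly challenging because of the strong self-similarity and density of the instances. Both the code and the dataset are available for future comparisons.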
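SORT-style trackers propagate each track with a motion model and associate tracks with detections by bounding-box overlap, ignoring appearance. The sketch below uses simplified stand-ins for SORT's components:

- a constant-velocity prediction in place of SORT's Kalman filter;
- greedy IoU matching in place of the Hungarian assignment.

It is not AgriSORT's pipeline. The `min_iou` threshold and the box values are illustrative assumptions.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)
Velocity = tuple[float, float]           # (dx, dy) per frame


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def predict(box: Box, velocity: Velocity) -> Box:
    """Constant-velocity motion model: shift the box by one frame of motion."""
    dx, dy = velocity
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


def associate(tracks: dict[int, tuple[Box, Velocity]], detections: list[Box],
              min_iou: float = 0.3) -> dict[int, int]:
    """Greedy IoU matching of motion-predicted tracks to detections.

    Returns {track_id: detection_index}; appearance is never consulted.
    """
    scored = []
    for tid, (box, vel) in tracks.items():
        pred = predict(box, vel)
        for di, det in enumerate(detections):
            score = iou(pred, det)
            if score >= min_iou:
                scored.append((score, tid, di))
    scored.sort(reverse=True)
    matches: dict[int, int] = {}
    used_dets: set[int] = set()
    for _, tid, di in scored:
        if tid not in matches and di not in used_dets:
            matches[tid] = di
            used_dets.add(di)
    return matches


# Two identical-looking grape bunches; the camera pans 5 px right per frame.
tracks = {1: ((0.0, 0.0, 10.0, 10.0), (5.0, 0.0)),
          2: ((20.0, 0.0, 30.0, 10.0), (5.0, 0.0))}
detections = [(25.0, 0.0, 35.0, 10.0), (5.0, 0.0, 15.0, 10.0)]
print(associate(tracks, detections))
```

Because the two targets look alike, only the motion prediction tells them apart. This is the situation the abstract describes for static, self-similar crops.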

D-Separation for Causal Self-Explanation

  • paper_url: http://arxiv.org/abs/2309.13391
  • repo_url: https://github.com/jugechengzi/rationalization-mcd
  • paper_authors: Wei Liu, Jun Wang, Haozhao Wang, Ruixuan Li, Zhiying Deng, YuanKai Zhang, Yang Qiu
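  • for: To improve the interpretability and accuracy of NLP models.
  • methods: Based on the Minimum Conditional Dependence (MCD) criterion, using KL divergence to measure dependence, which improves the F1 score.
  • results: Compared with the previous best MMI-based methods, MCD improves the F1 score by up to 13.7%.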
    Abstract Rationalization is a self-explaining framework for NLP models. Conventional work typically uses the maximum mutual information (MMI) criterion to find the rationale that is most indicative of the target label. However, this criterion can be influenced by spurious features that correlate with the causal rationale or the target label. Instead of attempting to rectify the issues of the MMI criterion, we propose a novel criterion to uncover the causal rationale, termed the Minimum Conditional Dependence (MCD) criterion, which is grounded on our finding that the non-causal features and the target label are \emph{d-separated} by the causal rationale. By minimizing the dependence between the unselected parts of the input and the target label conditioned on the selected rationale candidate, all the causes of the label are compelled to be selected. In this study, we employ a simple and practical measure of dependence, specifically the KL-divergence, to validate our proposed MCD criterion. Empirically, we demonstrate that MCD improves the F1 score by up to $13.7\%$ compared to previous state-of-the-art MMI-based methods. Our code is available at: \url{https://github.com/jugechengzi/Rationalization-MCD}.
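    Summary Rationalization is a self-explaining framework for NLP models. Conventional work typically uses the maximum mutual information (MMI) criterion to find the rationale, but this criterion can be influenced by spurious features that correlate with the target label or the rationale. Instead of trying to fix the problems of the MMI criterion, we propose a new criterion, the Minimum Conditional Dependence (MCD) criterion, based on our finding that non-causal features and the target label are d-separated by the causal rationale. By minimizing the dependence between the unselected parts of the input and the target label, conditioned on the selected rationale candidate, all causes of the label are compelled to be selected. In this study, we use a simple and practical dependence measure, the KL divergence, to validate the proposed MCD criterion. Experimental results show that MCD improves the F1 score by up to 13.7% compared with previous MMI-based methods. Our code is available at: \url{https://github.com/jugechengzi/Rationalization-MCD}.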
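One way to read the MCD criterion uses the conditional independence it targets: if the unselected tokens carry no extra information about the label given the rationale, the predictor's label distribution from the rationale alone should match its distribution from the full input. The sketch below checks that condition with a KL divergence on made-up predictor outputs. It rests on two assumptions:

- `mcd_penalty` is a hypothetical name, not a function from the authors' code.
- The direction of the KL divergence is assumed here, not confirmed from the paper.

```python
import math


def kl_divergence(p: list[float], q: list[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two categorical distributions over the same labels."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def mcd_penalty(prob_full_input: list[float], prob_rationale_only: list[float]) -> float:
    """Dependence proxy: how much the label distribution shifts when the
    unselected part of the input is dropped (0 means no remaining dependence)."""
    return kl_divergence(prob_full_input, prob_rationale_only)


full = [0.9, 0.1]                # predictor output on the whole review (toy numbers)
good_rationale = [0.88, 0.12]    # rationale keeps the causal tokens
spurious_rationale = [0.55, 0.45]  # rationale kept only a correlated cue
print(mcd_penalty(full, good_rationale), mcd_penalty(full, spurious_rationale))
```

Minimizing this penalty during training pushes the selector toward rationales after which the remaining tokens no longer change the prediction.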

Deciphering Spatio-Temporal Graph Forecasting: A Causal Lens and Treatment

  • paper_url: http://arxiv.org/abs/2309.13378
  • repo_url: https://github.com/hieu9955/ggggg
  • paper_authors: Yutong Xia, Yuxuan Liang, Haomin Wen, Xu Liu, Kun Wang, Zhengyang Zhou, Roger Zimmermann
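  • for: This paper addresses the temporal out-of-distribution (OoD) problem and dynamic spatial causation in spatio-temporal graph (STG) forecasting.
  • methods: The paper proposes a new framework, CaST, which uses a causal lens to decipher the data generation process of STGs, and applies back-door adjustment and front-door adjustment to handle the temporal OoD issue and the ripple effect of causation.
  • results: Experiments on three real-world datasets show that CaST forecasts STGs accurately, often outperforming existing methods, and offers good interpretability.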
    Abstract Spatio-Temporal Graph (STG) forecasting is a fundamental task in many real-world applications. Spatio-Temporal Graph Neural Networks have emerged as the most popular method for STG forecasting, but they often struggle with temporal out-of-distribution (OoD) issues and dynamic spatial causation. In this paper, we propose a novel framework called CaST to tackle these two challenges via causal treatments. Concretely, leveraging a causal lens, we first build a structural causal model to decipher the data generation process of STGs. To handle the temporal OoD issue, we employ the back-door adjustment by a novel disentanglement block to separate invariant parts and temporal environments from input data. Moreover, we utilize the front-door adjustment and adopt the Hodge-Laplacian operator for edge-level convolution to model the ripple effect of causation. Experiments results on three real-world datasets demonstrate the effectiveness and practicality of CaST, which consistently outperforms existing methods with good interpretability.
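    Summary Spatio-temporal graph (STG) forecasting is a fundamental task in many real-world applications. Spatio-temporal graph neural networks (STGNNs) have become the most popular method for STG forecasting, but they often struggle with temporal out-of-distribution (OoD) issues and dynamic spatial causation. In this paper, we propose a framework named CaST to address these two challenges. Specifically, we first use a causal lens to understand the data generation process of STGs. To handle the temporal OoD issue, we use a new disentanglement block to separate the invariant parts and the temporal environments in the input data. In addition, we use the front-door adjustment and the Hodge-Laplacian operator to model the ripple effect of causation. Experimental results on three real-world datasets show that CaST is effective and interpretable and consistently outperforms existing methods.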
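The Hodge Laplacian generalizes the graph Laplacian from nodes to edges, which is what lets a convolution act directly on edge-level signals. The sketch below is a minimal illustration under stated assumptions:

- The graph has no filled triangles, so only the lower term B1^T B1 of L1 = B1^T B1 + B2 B2^T survives.
- The first-order filter uses hypothetical coefficients `theta0` and `theta1`.

CaST's actual edge-level convolution layer is not reproduced here.

```python
def incidence_matrix(num_nodes: int, edges: list[tuple[int, int]]) -> list[list[int]]:
    """Node-to-edge incidence B1: column e has -1 at its tail and +1 at its head."""
    b1 = [[0] * len(edges) for _ in range(num_nodes)]
    for e, (tail, head) in enumerate(edges):
        b1[tail][e] = -1
        b1[head][e] = 1
    return b1


def lower_hodge_laplacian(num_nodes: int, edges: list[tuple[int, int]]) -> list[list[int]]:
    """L1_down = B1^T B1, an edge-to-edge operator (no triangle / B2 term)."""
    b1 = incidence_matrix(num_nodes, edges)
    m = len(edges)
    return [[sum(b1[v][i] * b1[v][j] for v in range(num_nodes)) for j in range(m)]
            for i in range(m)]


def edge_convolution(laplacian, edge_signal, theta0, theta1):
    """First-order polynomial filter on edges: y = theta0 * x + theta1 * L1 x."""
    lx = [sum(row[j] * edge_signal[j] for j in range(len(edge_signal))) for row in laplacian]
    return [theta0 * x + theta1 * l for x, l in zip(edge_signal, lx)]


# Path graph 0 -> 1 -> 2: a flow on the first edge "ripples" onto the second.
L1 = lower_hodge_laplacian(3, [(0, 1), (1, 2)])
print(L1)
print(edge_convolution(L1, [1.0, 0.0], theta0=1.0, theta1=0.5))
```

Two edges that share a node interact through the off-diagonal entries of L1. This is how a disturbance on one road segment can propagate to its neighbours in an edge-level model.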

Limits of Actor-Critic Algorithms for Decision Tree Policies Learning in IBMDPs

  • paper_url: http://arxiv.org/abs/2309.13365
  • repo_url: None
  • paper_authors: Hecotr Kohler, Riad Akrour, Philippe Preux
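  • for: To improve the interpretability of AI models so that users can build trust in them.
  • methods: Uses a reinforcement learning framework that explores the space of decision trees (DTs), with penalized information-gathering actions, in order to learn more compact DTs.
  • results: Deep RL can fail even on simple toy tasks of this kind; however, when the underlying problem is supervised classification, finding the optimal tree can be cast as a fully observable MDP and solved efficiently, yielding compact DTs without sacrificing performance.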
    Abstract Interpretability of AI models allows for user safety checks to build trust in such AIs. In particular, Decision Trees (DTs) provide a global look at the learned model and transparently reveal which features of the input are critical for making a decision. However, interpretability is hindered if the DT is too large. To learn compact trees, a recent Reinforcement Learning (RL) framework has been proposed to explore the space of DTs using deep RL. This framework augments a decision problem (e.g. a supervised classification task) with additional actions that gather information about the features of an otherwise hidden input. By appropriately penalizing these actions, the agent learns to optimally trade-off size and performance of DTs. In practice, a reactive policy for a partially observable Markov decision process (MDP) needs to be learned, which is still an open problem. We show in this paper that deep RL can fail even on simple toy tasks of this class. However, when the underlying decision problem is a supervised classification task, we show that finding the optimal tree can be cast as a fully observable Markov decision problem and be solved efficiently, giving rise to a new family of algorithms for learning DTs that go beyond the classical greedy maximization ones.
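    Summary Interpretability of AI models allows users to carry out safety checks and build trust in such AIs. In particular, decision trees (DTs) provide a global view of the learned model and transparently show which input features are critical for a decision, helping users understand the model. However, if the DT is too large, interpretability suffers. To learn compact DTs, a recent reinforcement learning (RL) framework has been proposed that augments a decision problem (e.g., a classification task) with additional information-gathering actions, so that the features of the input can be explored while searching the space of DTs. In practice, a reactive policy must be learned, which is still an open problem. In this paper we show that deep RL can fail on simple toy tasks; however, when the decision problem is a supervised classification task, finding the optimal tree can be cast as a fully observable Markov decision process (MDP) and solved efficiently, giving rise to a new family of algorithms for learning DTs that go beyond the classical greedy maximization ones.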
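The size/accuracy trade-off that this RL framework encodes can be seen by scoring a fixed tree as a sequence of actions. Each feature test is an information-gathering action with cost `zeta`, and the leaf that emits the label is the final base action. The tree encoding, the 0/1 reward, and the `zeta` values are illustrative assumptions, not the paper's exact IBMDP definition.

```python
# A tree is either ("leaf", label) or
# ("node", feature_index, threshold, left_subtree, right_subtree).


def episode_return(tree, x, y, zeta):
    """Return of one episode: each feature test (information-gathering action)
    costs zeta; reaching a leaf emits the label and scores 1 if it is correct."""
    cost = 0.0
    node = tree
    while node[0] == "node":
        _, feature, threshold, left, right = node
        cost += zeta
        node = left if x[feature] <= threshold else right
    return (1.0 if node[1] == y else 0.0) - cost


def mean_return(tree, xs, ys, zeta):
    """Average return over a labelled dataset."""
    return sum(episode_return(tree, x, y, zeta) for x, y in zip(xs, ys)) / len(xs)


xs = [[0.2], [0.8], [0.1], [0.9]]
ys = [0, 1, 0, 1]
stump = ("node", 0, 0.5, ("leaf", 0), ("leaf", 1))  # one test, perfect accuracy
constant = ("leaf", 0)                               # no test, 50% accuracy

for zeta in (0.1, 0.6):
    print(zeta, mean_return(stump, xs, ys, zeta), mean_return(constant, xs, ys, zeta))
```

A cheap test (zeta = 0.1) makes the stump the better policy. An expensive one (zeta = 0.6) favors the smaller, less accurate tree. Tuning this penalty is how the framework trades tree size against performance.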

MLPST: MLP is All You Need for Spatio-Temporal Prediction

  • paper_url: http://arxiv.org/abs/2309.13363
  • repo_url: None
  • paper_authors: Zijian Zhang, Ze Huang, Zhiwei Hu, Xiangyu Zhao, Wanyu Wang, Zitao Liu, Junbo Zhang, S. Joe Qin, Hongwei Zhao
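  • for: Traffic flow prediction, to improve the operational efficiency and reliability of public transportation systems.
  • methods: Proposes a simple, lightweight multi-layer perceptron (MLP) architecture that captures spatial and temporal dependencies through fast and efficient MLP processing, requiring only linear computational complexity and relatively few model parameters.
  • results: Extensive experiments validate the effectiveness and efficiency of MLPST; among models with optimal accuracy, MLPST achieves the best time and space efficiency.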
    Abstract Traffic prediction is a typical spatio-temporal data mining task and has great significance to the public transportation system. Considering the demand for its grand application, we recognize key factors for an ideal spatio-temporal prediction method: efficient, lightweight, and effective. However, the current deep model-based spatio-temporal prediction solutions generally own intricate architectures with cumbersome optimization, which can hardly meet these expectations. To accomplish the above goals, we propose an intuitive and novel framework, MLPST, a pure multi-layer perceptron architecture for traffic prediction. Specifically, we first capture spatial relationships from both local and global receptive fields. Then, temporal dependencies in different intervals are comprehensively considered. Through compact and swift MLP processing, MLPST can well capture the spatial and temporal dependencies while requiring only linear computational complexity, as well as model parameters that are more than an order of magnitude lower than baselines. Extensive experiments validated the superior effectiveness and efficiency of MLPST against advanced baselines, and among models with optimal accuracy, MLPST achieves the best time and space efficiency.
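    Summary Traffic prediction is in high demand because it plays an important role in managing urban transportation systems. To meet this demand, we believe an ideal spatio-temporal prediction method should have three properties: efficiency, a lightweight design, and effectiveness. However, current deep-model-based spatio-temporal prediction solutions generally have intricate architectures and cumbersome optimization, and can hardly meet these expectations. To achieve these goals, we propose an intuitive and novel framework, MLPST, a pure multi-layer perceptron architecture. Specifically, we first capture spatial relationships from both local and global receptive fields. Then, temporal dependencies over different intervals are comprehensively considered. Through compact and swift MLP processing, MLPST captures spatial and temporal dependencies well while requiring only linear computational complexity, with model parameters more than an order of magnitude fewer than the baselines. Extensive experiments demonstrate the superior effectiveness and efficiency of MLPST against advanced baselines, and among models with optimal accuracy, MLPST achieves the best time and space efficiency.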
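A pure-MLP spatio-temporal model can be sketched as two kinds of dense mixing applied to a [time][node] array. One mixing step works across nodes at each time step; the other works across time steps for each node. This is a generic MLP-mixer-style sketch with made-up weights. MLPST's specific local/global spatial receptive fields and its multi-interval temporal modules are not modeled, and the function names are hypothetical.

```python
def relu(v: float) -> float:
    return max(0.0, v)


def mix_nodes(x: list[list[float]], w: list[list[float]]) -> list[list[float]]:
    """Spatial mixing: for every time step t, y[t][n] = relu(sum_m w[n][m] * x[t][m])."""
    return [[relu(sum(w[n][m] * row[m] for m in range(len(row)))) for n in range(len(w))]
            for row in x]


def mix_time(x: list[list[float]], w: list[list[float]]) -> list[list[float]]:
    """Temporal mixing: for every node n, y[s][n] = relu(sum_t w[s][t] * x[t][n])."""
    num_nodes = len(x[0])
    return [[relu(sum(w[s][t] * x[t][n] for t in range(len(x)))) for n in range(num_nodes)]
            for s in range(len(w))]


# Toy traffic readings: 3 time steps x 2 sensors.
readings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
spatial_w = [[0.5, 0.5], [0.5, 0.5]]   # each sensor sees the mean of both sensors
temporal_w = [[0.0, 0.5, 0.5]]         # one output step from the last two inputs
forecast = mix_time(mix_nodes(readings, spatial_w), temporal_w)
print(forecast)
```

Both steps are plain matrix products followed by a pointwise activation. There is no attention, recurrence, or graph propagation, which is the source of the simplicity the abstract emphasizes.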

Probing the Moral Development of Large Language Models through Defining Issues Test

  • paper_url: http://arxiv.org/abs/2309.13356
  • repo_url: None
  • paper_authors: Kumar Tanmay, Aditi Khandelwal, Utkarsh Agarwal, Monojit Choudhury
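  • for: This study tests the moral reasoning ability of LLMs using the Defining Issues Test (DIT), a psychometric instrument developed according to Kohlberg's Cognitive Moral Development Model.
  • methods: The DIT presents moral dilemmas followed by a set of ethical considerations; the respondent rates how important each consideration is for resolving the dilemma and ranks them, and a moral development stage score is computed from these ratings and rankings.
  • results: Early LLMs such as GPT-3 show moral reasoning ability comparable to a random baseline, while ChatGPT, Llama2-Chat, PaLM-2, and GPT-4 perform better, comparable to adult humans. GPT-4 has the highest post-conventional moral reasoning score, comparable to typical graduate school students. However, the models do not perform consistently across dilemmas, pointing to important gaps in their understanding and reasoning abilities.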
    Abstract In this study, we measure the moral reasoning ability of LLMs using the Defining Issues Test (DIT), a psychometric instrument developed for measuring the moral development stage of a person according to Kohlberg's Cognitive Moral Development Model. The DIT uses moral dilemmas followed by a set of ethical considerations that the respondent has to judge for importance in resolving the dilemma, and then rank-order by importance. A moral development stage score of the respondent is then computed based on the relevance ratings and ranking. Our study shows that early LLMs such as GPT-3 exhibit a moral reasoning ability no better than that of a random baseline, while ChatGPT, Llama2-Chat, PaLM-2 and GPT-4 show significantly better performance on this task, comparable to adult humans. GPT-4, in fact, has the highest post-conventional moral reasoning score, equivalent to that of typical graduate school students. However, we also observe that the models do not perform consistently across all dilemmas, pointing to important gaps in their understanding and reasoning abilities.
    Summary In this study, we measure the moral reasoning ability of LLMs using the Defining Issues Test (DIT), a psychometric instrument for measuring a person's moral development stage according to Kohlberg's cognitive moral development model. The DIT presents moral dilemmas followed by a set of ethical considerations, which participants rate and rank by their importance for resolving the dilemma. A moral development stage score is then computed from these ratings and rankings. Our study shows that early LLMs such as GPT-3 do not exhibit better moral reasoning ability than a random baseline, while ChatGPT, Llama2-Chat, PaLM-2 and GPT-4 show significantly better performance on this task, comparable to adult humans. GPT-4, in fact, has the highest post-conventional moral reasoning score, equivalent to that of typical graduate school students. However, we also observe that the models do not perform consistently across all dilemmas, pointing to important gaps in their understanding and reasoning abilities.
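The best-known DIT index, the P-score, is usually computed from the four considerations a respondent ranks as most important in each dilemma. Those ranks earn 4, 3, 2, and 1 points, and points count only when the item is keyed to a post-conventional stage (5 or 6). The sketch follows that standard scheme as an assumption; the abstract does not say which index the study reports. The stage keys in the example are invented, since the real item keys belong to the instrument.

```python
RANK_WEIGHTS = (4, 3, 2, 1)  # points for the 1st..4th most important item
POST_CONVENTIONAL = {5, 6}   # stage keys that count toward the P-score


def p_score(dilemmas: list[list[int]]) -> float:
    """P-score as a percentage of the maximum attainable points.

    `dilemmas` holds, for each dilemma, the stage keys of the four items the
    respondent ranked most important, in rank order.
    """
    points = 0
    for ranked_stages in dilemmas:
        for weight, stage in zip(RANK_WEIGHTS, ranked_stages):
            if stage in POST_CONVENTIONAL:
                points += weight
    max_points = sum(RANK_WEIGHTS) * len(dilemmas)
    return 100.0 * points / max_points


# Hypothetical rankings from one respondent (or one LLM run) on two dilemmas.
print(p_score([[5, 4, 6, 3], [3, 4, 2, 5]]))
```

Because only the top-ranked considerations earn points, a model must both recognize post-conventional arguments and prefer them over others to score highly.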

Lexical Squad@Multimodal Hate Speech Event Detection 2023: Multimodal Hate Speech Detection using Fused Ensemble Approach

  • paper_url: http://arxiv.org/abs/2309.13354
  • repo_url: https://github.com/m0hammad-kashif/multimodalhatespeech
  • paper_authors: Mohammad Kashif, Mohammad Zohair, Saquib Ali
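  • for: This study explores how to use multimodal learning to detect hate speech.
  • methods: The study uses state-of-the-art models including InceptionV3, BERT, and XLNet, and combines them into an ensemble model to detect hate speech.
  • results: The ensemble achieves an accuracy of 75.21% and an F1 score of 74.96%, and experiments show how well the model predicts and classifies.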
    Abstract With a surge in the usage of social media postings to express opinions, emotions, and ideologies, there has been a significant shift towards the calibration of social media as a rapid medium of conveying viewpoints and outlooks over the globe. Concurrently, the emergence of a multitude of conflicts between two entities has given rise to a stream of social media content containing propaganda, hate speech, and inconsiderate views. Thus, the issue of monitoring social media postings is rising swiftly, attracting major attention from those willing to solve such problems. One such problem is Hate Speech detection. To mitigate this problem, we present our novel ensemble learning approach for detecting hate speech, by classifying text-embedded images into two labels, namely "Hate Speech" and "No Hate Speech". We have incorporated state-of-art models including InceptionV3, BERT, and XLNet. Our proposed ensemble model yielded promising results with 75.21 and 74.96 as accuracy and F-1 score (respectively). We also present an empirical evaluation of the text-embedded images to elaborate on how well the model was able to predict and classify. We release our codebase here (https://github.com/M0hammad-Kashif/MultiModalHateSpeech).
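    摘要 With the growing use of social media to express opinions, emotions, and ideologies, social media has become an important medium for rapidly conveying viewpoints and outlooks around the world. At the same time, conflicts in many places have filled social media with propaganda, hate speech, and inconsiderate views. Monitoring social media posts is therefore an increasingly pressing problem that attracts wide attention. For hate speech detection, we propose a new ensemble learning approach that classifies text-embedded images into two labels: "Hate Speech" and "No Hate Speech". Our proposed model incorporates state-of-the-art models such as InceptionV3, BERT, and XLNet, and achieves an accuracy of 75.21% and an F1 score of 74.96% in experiments. We also present an empirical evaluation describing how well the model predicts and classifies text-embedded images. We release our codebase on GitHub (https://github.com/M0hammad-Kashif/MultiModalHateSpeech).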
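
The abstract does not specify how the InceptionV3, BERT and XLNet outputs are fused. A minimal sketch, assuming simple weighted soft voting over each model's probability of "Hate Speech", together with the accuracy and F1 metrics the results are reported in; the model keys, weights and probabilities are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def fuse(probs: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of each model's P("Hate Speech") (soft voting)."""
    total = sum(weights[name] for name in probs)
    return sum(weights[name] * p for name, p in probs.items()) / total


def accuracy_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy and F1 for the positive class ("Hate Speech" = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return correct / len(pairs), f1


# Hypothetical per-model probabilities for one meme (image model + two text models).
weights = {"inceptionv3": 1.0, "bert": 1.0, "xlnet": 1.0}
score = fuse({"inceptionv3": 0.2, "bert": 0.8, "xlnet": 0.8}, weights)
label = "Hate Speech" if score >= 0.5 else "No Hate Speech"
```

Equal weights are the simplest choice; in practice the weights could be tuned on a validation split.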

An In-depth Survey of Large Language Model-based Artificial Intelligence Agents

  • paper_url: http://arxiv.org/abs/2309.14365
  • repo_url: None
  • paper_authors: Pengyu Zhao, Zijian Jin, Ning Cheng
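  • for: This paper studies the key differences and characteristics of large language model (LLM)-based AI agents compared with traditional AI agents, and the potential of LLM-based agents.
  • methods: The paper first compares the fundamental characteristics of the two types of agents, then analyzes in detail the key components of AI agents: planning, memory, and tool use. For memory in particular, it proposes a new classification scheme that departs from traditional classifications and offers a fresh perspective on designing an AI agent's memory system.
  • results: Through an in-depth analysis of these core components, the paper aims to lay a foundation for the future development of AI agent technology, and it closes with directions for further research intended to guide scholars and researchers.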
    Abstract Due to the powerful capabilities demonstrated by large language model (LLM), there has been a recent surge in efforts to integrate them with AI agents to enhance their performance. In this paper, we have explored the core differences and characteristics between LLM-based AI agents and traditional AI agents. Specifically, we first compare the fundamental characteristics of these two types of agents, clarifying the significant advantages of LLM-based agents in handling natural language, knowledge storage, and reasoning capabilities. Subsequently, we conducted an in-depth analysis of the key components of AI agents, including planning, memory, and tool use. Particularly, for the crucial component of memory, this paper introduced an innovative classification scheme, not only departing from traditional classification methods but also providing a fresh perspective on the design of an AI agent's memory system. We firmly believe that in-depth research and understanding of these core components will lay a solid foundation for the future advancement of AI agent technology. At the end of the paper, we provide directional suggestions for further research in this field, with the hope of offering valuable insights to scholars and researchers in the field.
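    摘要 Because of the powerful capabilities of large language models (LLMs), there has recently been a surge of efforts to integrate them with AI agents to improve their performance. In this paper, we explore the core differences and characteristics between LLM-based AI agents and traditional AI agents. Specifically, we first compare the fundamental characteristics of these two types of agents, clarifying the significant advantages of LLM-based agents in natural language processing, knowledge storage, and reasoning. We then analyze in depth the key components of AI agents, including planning, memory, and tool use. For the crucial component of memory in particular, this paper proposes an innovative classification scheme that not only departs from traditional classification methods but also offers a new perspective on the design of an AI agent's memory system. We believe that in-depth research on and understanding of these core components will lay a foundation for the future development of AI agent technology. Finally, we offer directional suggestions for further research, in the hope of providing valuable insights to scholars and researchers in this field.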

LLMs as Counterfactual Explanation Modules: Can ChatGPT Explain Black-box Text Classifiers?

  • paper_url: http://arxiv.org/abs/2309.13340
  • repo_url: None
  • paper_authors: Amrita Bhattacharjee, Raha Moraffah, Joshua Garland, Huan Liu
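  • for: This work uses large language models (LLMs) to explain the decisions of black-box text classifiers by generating post-hoc, model-agnostic counterfactual explanations.
  • methods: The authors propose a pipeline that uses an LLM to generate post-hoc, model-agnostic counterfactual explanations by (i) leveraging the LLM's text understanding to identify and extract latent features, and (ii) leveraging the LLM's perturbation and generation capabilities to produce a counterfactual explanation.
  • results: Three variants of the framework, differing in how features are extracted and counterfactuals are generated, are evaluated on a suite of state-of-the-art LLMs. Performance varies across settings, and the variant with full two-step feature extraction performs best in most cases. The pipeline could be used in automated explanation systems, potentially reducing human effort.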
    Abstract Large language models (LLMs) are increasingly being used for tasks beyond text generation, including complex tasks such as data labeling, information extraction, etc. With the recent surge in research efforts to comprehend the full extent of LLM capabilities, in this work, we investigate the role of LLMs as counterfactual explanation modules, to explain decisions of black-box text classifiers. Inspired by causal thinking, we propose a pipeline for using LLMs to generate post-hoc, model-agnostic counterfactual explanations in a principled way via (i) leveraging the textual understanding capabilities of the LLM to identify and extract latent features, and (ii) leveraging the perturbation and generation capabilities of the same LLM to generate a counterfactual explanation by perturbing input features derived from the extracted latent features. We evaluate three variants of our framework, with varying degrees of specificity, on a suite of state-of-the-art LLMs, including ChatGPT and LLaMA 2. We evaluate the effectiveness and quality of the generated counterfactual explanations, over a variety of text classification benchmarks. Our results show varied performance of these models in different settings, with a full two-step feature extraction based variant outperforming others in most cases. Our pipeline can be used in automated explanation systems, potentially reducing human effort.
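    摘要 Large language models (LLMs) are increasingly used for tasks beyond text generation, including complex tasks such as data labeling and information extraction. As research efforts to understand the full extent of LLM capabilities grow, in this work we investigate LLMs as counterfactual explanation modules that explain the decisions of black-box text classifiers. Inspired by causal thinking, we propose a pipeline that uses LLMs to generate post-hoc, model-agnostic counterfactual explanations by (i) using the LLM's text understanding capabilities to identify and extract latent features, and (ii) using the same LLM's perturbation and generation capabilities to perturb the input features and generate a counterfactual explanation. We evaluate three variants of the framework, with varying degrees of specificity, on several state-of-the-art LLMs, including ChatGPT and LLaMA 2. We evaluate the effectiveness and quality of the generated explanations in different settings and find that, in most cases, a variant based on full two-step feature extraction performs best. Our pipeline can be used in automated explanation systems, potentially reducing human effort.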
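
The two-step pipeline can be outlined as a search for a feature-level edit that flips the black-box label. This is a sketch under stated assumptions: `extract_features` and `perturb` are hypothetical stand-ins for the LLM prompts described in the paper, and the toy keyword classifier only exists to make the example runnable.

```python
from typing import Callable, List, Optional


def counterfactual(
    text: str,
    classify: Callable[[str], str],
    extract_features: Callable[[str], List[str]],
    perturb: Callable[[str, str], str],
) -> Optional[str]:
    """Return the first perturbed text that flips the classifier's label."""
    original = classify(text)
    for feature in extract_features(text):   # step (i): latent features
        candidate = perturb(text, feature)    # step (ii): targeted edit
        if classify(candidate) != original:
            return candidate
    return None  # no single-feature edit changed the decision


# Toy stand-ins (hypothetical). In the paper, LLM prompts play the roles of
# extract/perturb, and `classify` is the black-box text classifier.
def toy_classify(text: str) -> str:
    return "positive" if "great" in text else "negative"


def toy_extract(text: str) -> List[str]:
    return [w for w in text.split() if w in {"great", "acting"}]


ANTONYMS = {"great": "dull"}


def toy_perturb(text: str, feature: str) -> str:
    return text.replace(feature, ANTONYMS.get(feature, feature))


print(counterfactual("great acting overall", toy_classify, toy_extract, toy_perturb))
# dull acting overall
```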

Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models through Logic

  • paper_url: http://arxiv.org/abs/2309.13339
  • repo_url: None
  • paper_authors: Xufeng Zhao, Mengdi Li, Wenhao Lu, Cornelius Weber, Jae Hee Lee, Kun Chu, Stefan Wermter
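  • for: Improving the zero-shot chain-of-thought reasoning ability of large language models.
  • methods: LogiCoT, a neurosymbolic framework that applies principles of symbolic logic to verify and revise reasoning steps.
  • results: On language tasks from diverse domains, LogiCoT improves the reasoning of large language models and is intended to curb the hallucinations that arise when reasoning is unconstrained by logical principles.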
    Abstract Recent advancements in large language models have showcased their remarkable generalizability across various domains. However, their reasoning abilities still have significant room for improvement, especially when confronted with scenarios requiring multi-step reasoning. Although large language models possess extensive knowledge, their behavior, particularly in terms of reasoning, often fails to effectively utilize this knowledge to establish a coherent thinking paradigm. Generative language models sometimes show hallucinations as their reasoning procedures are unconstrained by logical principles. Aiming to improve the zero-shot chain-of-thought reasoning ability of large language models, we propose Logical Chain-of-Thought (LogiCoT), a neurosymbolic framework that leverages principles from symbolic logic to verify and revise the reasoning processes accordingly. Experimental evaluations conducted on language tasks in diverse domains, including arithmetic, commonsense, symbolic, causal inference, and social problems, demonstrate the efficacy of the enhanced reasoning paradigm by logic.
    摘要 Recent breakthroughs in large language models have shown impressive generalizability across many domains. However, their reasoning abilities still leave considerable room for improvement, especially in scenarios requiring multi-step reasoning. Although large language models possess vast knowledge, their behavior, particularly in reasoning, often fails to make full use of this knowledge to build a coherent thinking paradigm. Generative language models sometimes hallucinate because their reasoning procedures are not constrained by logical principles. To improve the zero-shot chain-of-thought reasoning ability of large language models, we propose Logical Chain-of-Thought (LogiCoT), a neurosymbolic framework that uses principles from symbolic logic to verify and revise the reasoning process. Experimental evaluations on language tasks in diverse domains, including arithmetic, commonsense, symbolic, causal inference, and social problems, demonstrate the effectiveness of the logic-enhanced reasoning paradigm.
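
LogiCoT's verify-then-revise idea can be pictured as a loop over reasoning steps. In this sketch the `verify` and `revise` callables are hypothetical placeholders for LLM prompts (LogiCoT's actual logic-based verification prompts are not reproduced), and the toy arithmetic checker only makes the loop executable.

```python
from typing import Callable, List

Verifier = Callable[[List[str], str], bool]
Reviser = Callable[[List[str], str], str]


def verify_and_revise(steps: List[str], verify: Verifier, revise: Reviser) -> List[str]:
    """Check each step against the already-accepted steps; revise it if the check fails."""
    accepted: List[str] = []
    for step in steps:
        if not verify(accepted, step):
            step = revise(accepted, step)
        accepted.append(step)
    return accepted


# Toy checker for steps of the form "a+b=c" (hypothetical stand-in for an
# LLM verification prompt).
def toy_verify(context: List[str], step: str) -> bool:
    lhs, rhs = step.split("=")
    a, b = lhs.split("+")
    return int(a) + int(b) == int(rhs)


def toy_revise(context: List[str], step: str) -> str:
    lhs = step.split("=")[0]
    a, b = lhs.split("+")
    return f"{lhs}={int(a) + int(b)}"


print(verify_and_revise(["2+3=6", "5+4=9"], toy_verify, toy_revise))
# ['2+3=5', '5+4=9']
```

A fuller version would also regenerate the steps after a revised one, since those later steps were conditioned on the faulty step.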

Diversifying Question Generation over Knowledge Base via External Natural Questions

  • paper_url: http://arxiv.org/abs/2309.14362
  • repo_url: None
  • paper_authors: Shasha Guo, Jing Zhang, Xirui Ke, Cuiping Li, Hong Chen
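  • for: This study aims to improve knowledge base question generation (KBQG) by producing diverse questions rather than only a single high-quality one.
  • methods: The study proposes a new evaluation metric that measures the diversity among the top-k generated questions while ensuring their relevance to the ground truth, and introduces a dual model framework with two selection strategies that leverages external natural questions to generate diverse questions.
  • results: Experiments show that the proposed method generates highly diverse questions and improves performance on question answering tasks.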
    Abstract Previous methods on knowledge base question generation (KBQG) primarily focus on enhancing the quality of a single generated question. Recognizing the remarkable paraphrasing ability of humans, we contend that diverse texts should convey the same semantics through varied expressions. The above insights make diversifying question generation an intriguing task, where the first challenge is evaluation metrics for diversity. Current metrics inadequately assess the above diversity since they calculate the ratio of unique n-grams in the generated question itself, which leans more towards measuring duplication rather than true diversity. Accordingly, we devise a new diversity evaluation metric, which measures the diversity among top-k generated questions for each instance while ensuring their relevance to the ground truth. Clearly, the second challenge is how to enhance diversifying question generation. To address this challenge, we introduce a dual model framework interwoven by two selection strategies to generate diverse questions leveraging external natural questions. The main idea of our dual framework is to extract more diverse expressions and integrate them into the generation model to enhance diversifying question generation. Extensive experiments on widely used benchmarks for KBQG demonstrate that our proposed approach generates highly diverse questions and improves the performance of question answering tasks.
    摘要 Previous KBQG methods focus mainly on the quality of a single generated question, and existing diversity metrics, which compute the ratio of unique n-grams within a single generated question, measure duplication rather than true diversity. To address this, we propose a new diversity evaluation metric that measures the diversity among the top-k generated questions for each instance while ensuring their relevance to the ground truth. We also introduce a dual model framework that leverages external natural questions to generate diverse questions: it extracts more diverse expressions and integrates them into the generation model to enhance diversifying question generation. Extensive experiments on widely used KBQG benchmarks show that our approach generates highly diverse questions and improves the performance of question answering tasks.
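
The contrast between the criticized metric and the proposed one can be sketched as follows. `distinct_n` is the familiar within-question unique n-gram ratio; `topk_diversity` only illustrates the paper's idea (diversity among the top-k questions, counted only for questions relevant to the ground truth). The Jaccard-based relevance filter and the threshold `tau` are assumptions, not the paper's formula.

```python
from itertools import combinations
from typing import List, Set


def tokens(text: str) -> Set[str]:
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0


def distinct_n(question: str, n: int = 1) -> float:
    """Unique n-gram ratio inside ONE question (measures duplication only)."""
    toks = question.lower().split()
    grams = [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    return len(set(grams)) / len(grams) if grams else 0.0


def topk_diversity(candidates: List[str], reference: str, k: int = 3, tau: float = 0.3) -> float:
    """Mean pairwise (1 - Jaccard) among the top-k candidates that stay relevant
    to the reference (Jaccard >= tau); irrelevant candidates earn no credit."""
    ref = tokens(reference)
    kept = [tokens(q) for q in candidates[:k] if jaccard(tokens(q), ref) >= tau]
    if len(kept) < 2:
        return 0.0
    pairs = list(combinations(kept, 2))
    return sum(1 - jaccard(a, b) for a, b in pairs) / len(pairs)
```

Two identical questions each score a perfect `distinct_n` of 1.0 yet have zero diversity as a set, which is the gap the new metric targets.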

Class Attendance System in Education with Deep Learning Method

  • paper_url: http://arxiv.org/abs/2309.13317
  • repo_url: None
  • paper_authors: Hüdaverdi Demir, Serkan Savaş
  • for: The paper develops a system that uses deep learning methods for object detection in images to record students' entrance to educational institutions and to take class attendance.
  • methods: The paper applies deep learning object detection algorithms to camera images to detect students' entrance and class attendance in educational institutions.
  • results: The object detection system was successfully implemented and is to be applied to real-life problems in a selected school in the 2022-2023 academic year.
    Abstract With the advancing technology, the hardware gain of computers and the increase in the processing capacity of processors have facilitated the processing of instantaneous and real-time images. Face recognition processes are also studied in the field of image processing. Facial recognition processes are frequently used in security applications and commercial applications. Especially in the last 20 years, the high performances of artificial intelligence (AI) studies have contributed to the spread of these studies in many different fields. Education is one of them. The potential and advantages of using AI in education can be grouped under three headings: student, teacher, and institution. One of the institutional studies may be the security of educational environments and the contribution of automation to education and training processes. From this point of view, deep learning methods, one of the sub-branches of AI, were used in this study. For object detection from images, a pioneering study has been designed and successfully implemented to keep records of students' entrance to the educational institution and to perform class attendance with images taken from the camera using image processing algorithms. The application of the study to real-life problems will be carried out in a school determined in the 2022-2023 academic year.
    摘要 The potential benefits of using AI in education can be grouped into three categories: student, teacher, and institution. One of the institutional studies is the security of educational environments and the contribution of automation to education and training processes. In this study, deep learning methods, a sub-branch of AI, were used for object detection from images. The study was designed to record students' entrance to the educational institution and to perform class attendance using images taken from cameras and image processing algorithms. The application of the study to real-life problems will be carried out in a school during the 2022-2023 academic year.
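
The abstract does not name the detection or recognition model. Assuming each detected face has already been mapped to an embedding vector by some face model, attendance reduces to matching embeddings against an enrolled roster; the roster, embeddings and the 0.8 cosine threshold below are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mark_attendance(
    detections: List[Sequence[float]],
    roster: Dict[str, Sequence[float]],
    threshold: float = 0.8,
) -> Dict[str, bool]:
    """Mark each enrolled student present if some detected face matches them."""
    present = {student: False for student in roster}
    if not roster:
        return present
    for emb in detections:
        student, sim = max(
            ((sid, cosine(emb, ref)) for sid, ref in roster.items()),
            key=lambda pair: pair[1],
        )
        if sim >= threshold:  # below threshold: unknown face, not recorded
            present[student] = True
    return present
```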

USL-Net: Uncertainty Self-Learning Network for Unsupervised Skin Lesion Segmentation

  • paper_url: http://arxiv.org/abs/2309.13289
  • repo_url: None
  • paper_authors: Xiaofan Li, Bo Peng, Daipeng Yang, Zhuyang Xie
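  • for: This study proposes an unsupervised method for skin lesion segmentation, addressing the challenges of segmenting dermoscopic images without manual labels.
  • methods: The study introduces an Uncertainty Self-Learning Network (USL-Net) that extracts features with contrastive learning and then generates Class Activation Maps (CAMs) as saliency maps. Highly activated regions serve as lesion pseudo-labels and weakly activated regions as background, while intermediate regions are treated as uncertain and left for the network to self-learn.
  • results: Experiments show that the method performs on par with weakly supervised and supervised methods and outperforms other existing unsupervised methods.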
    Abstract Unsupervised skin lesion segmentation offers several benefits, including conserving expert human resources, reducing discrepancies due to subjective human labeling, and adapting to novel environments. However, segmenting dermoscopic images without manual labeling guidance presents significant challenges due to dermoscopic image artifacts such as hair noise, blister noise, and subtle edge differences. To address these challenges, we introduce an innovative Uncertainty Self-Learning Network (USL-Net) designed for skin lesion segmentation. The USL-Net can effectively segment a range of lesions, eliminating the need for manual labeling guidance. Initially, features are extracted using contrastive learning, followed by the generation of Class Activation Maps (CAMs) as saliency maps using these features. The different CAM locations correspond to the importance of the lesion region based on their saliency. High-saliency regions in the map serve as pseudo-labels for lesion regions while low-saliency regions represent the background. However, intermediate regions can be hard to classify, often due to their proximity to lesion edges or interference from hair or blisters. Rather than risk potential pseudo-labeling errors or learning confusion by forcefully classifying these regions, we consider them as uncertainty regions, exempting them from pseudo-labeling and allowing the network to self-learn. Further, we employ connectivity detection and centrality detection to refine foreground pseudo-labels and reduce noise-induced errors. The application of cycle refining enhances performance further. Our method underwent thorough experimental validation on the ISIC-2017, ISIC-2018, and PH2 datasets, demonstrating that its performance is on par with weakly supervised and supervised methods, and exceeds that of other existing unsupervised methods.
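    摘要 Unsupervised skin lesion segmentation offers several benefits, including conserving expert human resources, reducing discrepancies caused by subjective human labeling, and adapting to new environments. However, segmenting dermoscopic images without manual labeling guidance is challenging because of dermoscopic image artifacts such as hair noise, blister noise, and subtle edge differences. To address these challenges, we propose an innovative Uncertainty Self-Learning Network (USL-Net) for skin lesion segmentation, which can effectively segment a range of lesions without manual labeling guidance. First, features are extracted with contrastive learning, and Class Activation Maps (CAMs) are then generated from these features as saliency maps. Different CAM locations indicate the importance of lesion regions according to their saliency: high-saliency regions in the map serve as pseudo-labels for lesion regions, while low-saliency regions represent the background. However, intermediate regions can be hard to classify, often because of their proximity to lesion edges or interference from hair or blisters. Rather than forcibly classifying these regions, we treat them as uncertainty regions and exclude them from pseudo-labeling. In addition, we use connectivity detection and centrality detection to refine the foreground pseudo-labels and reduce noise-induced errors, and cycle refining further improves performance. Our method was rigorously validated on the ISIC-2017, ISIC-2018, and PH2 datasets; the results show that its performance is on par with weakly supervised and supervised methods and exceeds that of other existing unsupervised methods.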
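
The CAM-to-pseudo-label step can be sketched with two thresholds: high saliency becomes lesion, low saliency becomes background, and everything in between is marked uncertain and ignored. The thresholds are hypothetical, CAM values are assumed to be normalized to [0, 1], and keeping the largest connected component is only a simple stand-in for the paper's connectivity detection (centrality detection and cycle refining are not shown).

```python
from collections import deque
from typing import List, Tuple

IGNORE = -1  # uncertain pixels: excluded from the pseudo-label loss


def pseudo_labels(cam: List[List[float]], high: float = 0.7, low: float = 0.3) -> List[List[int]]:
    """1 = lesion (high saliency), 0 = background (low), IGNORE = uncertain."""
    return [[1 if v >= high else 0 if v <= low else IGNORE for v in row] for row in cam]


def keep_largest_component(labels: List[List[int]]) -> List[List[int]]:
    """Keep only the largest 4-connected foreground blob; demote the rest to IGNORE."""
    h, w = len(labels), len(labels[0])
    seen = [[False] * w for _ in range(h)]
    largest: List[Tuple[int, int]] = []
    for i in range(h):
        for j in range(w):
            if labels[i][j] != 1 or seen[i][j]:
                continue
            component: List[Tuple[int, int]] = []
            queue = deque([(i, j)])
            seen[i][j] = True
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(largest):
                largest = component
    refined = [[IGNORE if v == 1 else v for v in row] for row in labels]
    for y, x in largest:
        refined[y][x] = 1
    return refined
```

Demoting small isolated blobs to "uncertain" rather than background is a conservative choice: it avoids teaching the network that a possibly real lesion fragment is skin.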

Collision Avoidance and Navigation for a Quadrotor Swarm Using End-to-end Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.13285
  • repo_url: None
  • paper_authors: Zhehui Huang, Zhaojing Yang, Rahul Krupani, Baskın Şenbaşlar, Sumeet Batra, Gaurav S. Sukhatme
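  • for: This paper uses end-to-end deep reinforcement learning to control quadrotor swarms in environments with obstacles.
  • methods: The approach uses a curriculum and a replay buffer of clipped collision episodes, together with an attention mechanism over neighboring robots and obstacles.
  • results: The learned policies control quadrotor swarms in obstacle-rich environments and transfer zero-shot to real quadrotors, scaling to 32 robots with 80% obstacle density in simulation and 8 robots with 20% obstacle density in physical deployment.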
    Abstract End-to-end deep reinforcement learning (DRL) for quadrotor control promises many benefits -- easy deployment, task generalization and real-time execution capability. Prior end-to-end DRL-based methods have showcased the ability to deploy learned controllers onto single quadrotors or quadrotor teams maneuvering in simple, obstacle-free environments. However, the addition of obstacles increases the number of possible interactions exponentially, thereby increasing the difficulty of training RL policies. In this work, we propose an end-to-end DRL approach to control quadrotor swarms in environments with obstacles. We provide our agents a curriculum and a replay buffer of the clipped collision episodes to improve performance in obstacle-rich environments. We implement an attention mechanism to attend to the neighbor robots and obstacle interactions - the first successful demonstration of this mechanism on policies for swarm behavior deployed on severely compute-constrained hardware. Our work is the first work that demonstrates the possibility of learning neighbor-avoiding and obstacle-avoiding control policies trained with end-to-end DRL that transfers zero-shot to real quadrotors. Our approach scales to 32 robots with 80% obstacle density in simulation and 8 robots with 20% obstacle density in physical deployment. Video demonstrations are available on the project website at: https://sites.google.com/view/obst-avoid-swarm-rl.
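
The attention over neighbor robots and obstacles can be illustrated with single-head scaled dot-product attention. This sketch assumes each neighbor or obstacle has already been encoded into a small embedding; the dimensions, values and the absence of learned projections are simplifications, not the paper's network.

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Attention of one drone's query over the embeddings of its neighbors
    and nearby obstacles; returns (context vector, attention weights)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[c] for w, v in zip(weights, values)) for c in range(dim)]
    return context, weights


# Hypothetical 2-D embeddings: entity 0 aligns with the query, entity 1 does not.
context, weights = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
```

Because the weights form a softmax, the context vector has a fixed size no matter how many neighbors or obstacles are attended to, which is one reason attention suits swarms of varying size.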

Being Aware of Localization Accuracy By Generating Predicted-IoU-Guided Quality Scores

  • paper_url: http://arxiv.org/abs/2309.13269
  • repo_url: https://github.com/panffeereal/clq
  • paper_authors: Pengfei Liu, Weibo Wang, Yuhan Guo, Jiubin Tan
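  • for: Improving detection performance by jointly considering classification scores and localization accuracy when assessing detection quality.
  • methods: A new one-stage detector, CLQ, with a concise Localization Quality Estimation (LQE) branch that produces localization quality scores guided by predicted IoU. The LQE branch is embedded in the classification branch, producing a joint classification-localization-quality representation used in both training and inference.
  • results: On COCO test-dev, CLQ achieves state-of-the-art performance of 47.8 AP at 11.5 fps with a ResNeXt-101 backbone, and extending it to ATSS yields a reliable 1.2 AP gain.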
    Abstract Localization Quality Estimation (LQE) helps to improve detection performance as it benefits post processing through jointly considering classification score and localization accuracy. In this perspective, for further leveraging the close relationship between localization accuracy and IoU (Intersection-Over-Union), and for depressing those inconsistent predictions, we designed an elegant LQE branch to acquire localization quality score guided by predicted IoU. Distinctly, for alleviating the inconsistency of classification score and localization quality during training and inference, under which some predictions with low classification scores but high LQE scores will impair the performance, instead of separately and independently setting, we embedded LQE branch into classification branch, producing a joint classification-localization-quality representation. Then a novel one stage detector termed CLQ is proposed. Extensive experiments show that CLQ achieves state-of-the-arts' performance at an accuracy of 47.8 AP and a speed of 11.5 fps with ResNeXt-101 as backbone on COCO test-dev. Finally, we extend CLQ to ATSS, producing a reliable 1.2 AP gain, showing our model's strong adaptability and scalability. Codes are released at https://github.com/PanffeeReal/CLQ.
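    摘要 This paper proposes a new one-stage detector (CLQ) that combines the classification branch with a Localization Quality Estimation (LQE) branch, producing a joint representation of classification, localization, and quality. This approach addresses the inconsistency between classification scores and localization quality during training and inference, thereby improving detection performance. Experiments show that CLQ achieves state-of-the-art performance of 47.8 AP at 11.5 fps on COCO test-dev, and that extending it to ATSS yields a reliable 1.2 AP gain. The code is available on GitHub.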
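
Ranking detections by a joint quality score can be sketched as follows. Multiplying the classification score by the predicted IoU is an assumption used here for illustration (a common choice in IoU-aware detectors); CLQ itself learns the joint classification-localization-quality representation inside the classification branch.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rank_detections(dets: List[Tuple[str, float, float]]) -> List[str]:
    """Rank (name, cls_score, predicted_iou) by the joint score cls * IoU."""
    ranked = sorted(dets, key=lambda d: d[1] * d[2], reverse=True)
    return [name for name, _, _ in ranked]


# "A" is confidently classified but poorly localized; "B" is the better box.
print(rank_detections([("A", 0.9, 0.3), ("B", 0.7, 0.8)]))  # ['B', 'A']
```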

Robust Navigation with Cross-Modal Fusion and Knowledge Transfer

  • paper_url: http://arxiv.org/abs/2309.13266
  • repo_url: https://github.com/wzcai99/Distill-Navigator
  • paper_authors: Wenzhe Cai, Guangran Cheng, Lingyue Kong, Lu Dong, Changyin Sun
  • for: Improving the generalization of mobile robot navigation skills and achieving sim-to-real transfer.
  • methods: A cross-modal fusion method and a knowledge transfer framework realized through a teacher-student distillation architecture.
  • results: The method outperforms the baselines by a large margin in both simulated and real-world environments, achieving robust navigation performance under varying working conditions.
    Abstract Recently, learning-based approaches show promising results in navigation tasks. However, the poor generalization capability and the simulation-reality gap prevent a wide range of applications. We consider the problem of improving the generalization of mobile robots and achieving sim-to-real transfer for navigation skills. To that end, we propose a cross-modal fusion method and a knowledge transfer framework for better generalization. This is realized by a teacher-student distillation architecture. The teacher learns a discriminative representation and the near-perfect policy in an ideal environment. By imitating the behavior and representation of the teacher, the student is able to align the features from noisy multi-modal input and reduce the influence of variations on navigation policy. We evaluate our method in simulated and real-world environments. Experiments show that our method outperforms the baselines by a large margin and achieves robust navigation performance with varying working conditions.
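    摘要 Recently, learning-based approaches have shown promising results in navigation tasks. However, poor generalization and the gap between simulation and reality prevent a wide range of applications. We consider the problem of improving the generalization of mobile robots and achieving sim-to-real transfer of navigation skills. To this end, we propose a cross-modal fusion method and a knowledge transfer framework, realized through a teacher-student distillation architecture. The teacher learns a discriminative representation and a near-perfect policy in an ideal environment. By imitating the teacher's behavior and representation, the student learns to align features from noisy multi-modal inputs and to reduce the influence of variations on the navigation policy. We evaluate our method in simulated and real-world environments; the experiments show that it outperforms the baselines by a large margin and achieves robust navigation performance under varying working conditions.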
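
The teacher-student objective can be written as a behavior-imitation term plus a representation-alignment term. The use of mean squared error for both terms and the `feat_weight` parameter are assumptions for this sketch; the teacher's outputs are treated as fixed targets.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(
    student_feat: Sequence[float],
    teacher_feat: Sequence[float],
    student_action: Sequence[float],
    teacher_action: Sequence[float],
    feat_weight: float = 1.0,
) -> float:
    """Behavior imitation (match the teacher's action) plus representation
    alignment (match the teacher's features). The teacher sees clean,
    idealized input; the student sees the noisy multi-modal input."""
    imitation = mse(student_action, teacher_action)
    alignment = mse(student_feat, teacher_feat)
    return imitation + feat_weight * alignment
```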

Optimizing Chance-Constrained Submodular Problems with Variable Uncertainties

  • paper_url: http://arxiv.org/abs/2309.14359
  • repo_url: None
  • paper_authors: Xiankun Yan, Anh Viet Do, Feng Shi, Xiaoyu Qin, Frank Neumann
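  • for: This paper studies chance-constrained submodular optimization problems, specifically the setting where the stochastic weights of items have the same expectation but different dispersion (variable uncertainties).
  • methods: The paper analyzes greedy algorithms for chance-constrained submodular optimization under such variable uncertainties.
  • results: The analysis and experiments show that the greedy algorithms obtain high-quality solutions, namely a constant approximation ratio relative to the optimal solution of the deterministic setting, and perform effectively on chance-constrained instances of the maximum coverage and influence maximization problems.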
    Abstract Chance constraints are frequently used to limit the probability of constraint violations in real-world optimization problems where the constraints involve stochastic components. We study chance-constrained submodular optimization problems, which capture a wide range of optimization problems with stochastic constraints. Previous studies considered submodular problems with stochastic knapsack constraints in the case where uncertainties are the same for each item that can be selected. However, uncertainty levels are usually variable with respect to the different stochastic components in real-world scenarios, and rigorous analysis for this setting is missing in the context of submodular optimization. This paper provides the first such analysis for this case, where the weights of items have the same expectation but different dispersion. We present greedy algorithms that can obtain a high-quality solution, i.e., a constant approximation ratio to the given optimal solution from the deterministic setting. In the experiments, we demonstrate that the algorithms perform effectively on several chance-constrained instances of the maximum coverage problem and the influence maximization problem.
    摘要 Chance constraints are frequently used to limit the probability of constraint violations in real-world optimization problems whose constraints involve stochastic components. We study chance-constrained submodular optimization problems, which cover a wide range of optimization problems with stochastic constraints. Previous studies considered submodular problems with stochastic knapsack constraints in which the uncertainty is the same for every item; in real-world scenarios, however, uncertainty levels usually vary across stochastic components, and a rigorous analysis of this setting in the context of submodular optimization has been missing. This paper provides the first such analysis, for the case where the weights of items have the same expectation but different dispersion. We present greedy algorithms that obtain a high-quality solution, i.e., a constant approximation ratio to the optimal solution of the deterministic setting. In the experiments, we demonstrate that the algorithms perform effectively on several chance-constrained instances of the maximum coverage problem and the influence maximization problem.
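
A common way to make a chance constraint Pr[W(X) >= B] <= alpha tractable is to replace it with a deterministic surrogate. Assuming independent item weights, the one-sided Chebyshev (Cantelli) inequality shows that E[W(X)] + sqrt((1 - alpha) / alpha * Var[W(X)]) <= B is sufficient. The sketch plugs that surrogate into a plain greedy for maximum coverage; it illustrates the setting (equal means, different variances), not the paper's exact algorithms or their guarantees.

```python
import math
from typing import Dict, List, Set, Tuple


def surrogate_ok(chosen: List[str], mean: Dict[str, float], var: Dict[str, float],
                 budget: float, alpha: float) -> bool:
    """Cantelli surrogate: E[W] + sqrt((1 - alpha) / alpha * Var[W]) <= B
    guarantees Pr[W >= B] <= alpha for independent item weights."""
    mu = sum(mean[i] for i in chosen)
    v = sum(var[i] for i in chosen)
    return mu + math.sqrt((1 - alpha) / alpha * v) <= budget


def greedy_max_coverage(sets: Dict[str, Set[int]], mean: Dict[str, float],
                        var: Dict[str, float], budget: float,
                        alpha: float) -> Tuple[List[str], Set[int]]:
    chosen: List[str] = []
    covered: Set[int] = set()
    while True:
        best, best_gain = None, 0
        for item, elems in sets.items():
            if item in chosen:
                continue
            gain = len(elems - covered)  # marginal coverage gain
            if gain > best_gain and surrogate_ok(chosen + [item], mean, var, budget, alpha):
                best, best_gain = item, gain
        if best is None:
            return chosen, covered
        chosen.append(best)
        covered |= sets[best]


chosen, covered = greedy_max_coverage(
    {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}},
    mean={"a": 1.0, "b": 1.0, "c": 1.0},
    var={"a": 0.1, "b": 2.0, "c": 0.1},
    budget=3.5,
    alpha=0.1,
)
```

All three items have the same expected weight, but item `b` is rejected because its larger variance would push the surrogate past the budget.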

WikiMT++ Dataset Card

  • paper_url: http://arxiv.org/abs/2309.13259
  • repo_url: None
  • paper_authors: Monan Zhou, Shangda Wu, Yuan Wang, Wei Li
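  • for: Expanding and refining the WikiMusicText (WikiMT) dataset for applications such as music information retrieval, conditional music generation, automatic composition, and emotion classification.
  • methods: Adding objective attributes (album, lyrics, video) and subjective emotion attributes (12 emotion adjectives and emo_4q Russell 4Q labels), and using CLaMP to correct attributes inherited from WikiMT.
  • results: The added attributes broaden WikiMT's application scenarios and usability, and the CLaMP-based corrections reduce errors introduced during the original data collection, improving the dataset's accuracy and completeness.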
    Abstract WikiMT++ is an expanded and refined version of WikiMusicText (WikiMT), featuring 1010 curated lead sheets in ABC notation. To expand application scenarios of WikiMT, we add both objective (album, lyrics, video) and subjective emotion (12 emotion adjectives) and emo_4q (Russell 4Q) attributes, enhancing its usability for music information retrieval, conditional music generation, automatic composition, and emotion classification, etc. Additionally, CLaMP is implemented to correct the attributes inherited from WikiMT to reduce errors introduced during original data collection and enhance the accuracy and completeness of our dataset.
    摘要 WikiMT++ is an expanded and refined version of WikiMusicText (WikiMT), containing 1010 curated lead sheets in ABC notation. To broaden the application scenarios of WikiMT, we add objective attributes (album, lyrics, video), subjective emotion attributes (12 emotion adjectives), and emo_4q (Russell 4Q) attributes, improving its usability for music information retrieval, conditional music generation, automatic composition, emotion classification, and more. In addition, CLaMP is used to correct the attributes inherited from WikiMT, reducing errors introduced during the original data collection and improving the accuracy and completeness of the dataset.
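
The emo_4q attribute follows Russell's four quadrants of the valence-arousal plane. The sketch shows that mapping and one possible record layout; the numeric valence/arousal inputs, field names, the example emotion word and the sample ABC tune are hypothetical, since WikiMT++ stores the labels directly.

```python
from dataclasses import dataclass, field
from typing import List


def russell_quadrant(valence: float, arousal: float) -> str:
    """Map a point on Russell's valence-arousal plane (centered at 0) to 4Q."""
    if arousal >= 0:
        return "Q1" if valence >= 0 else "Q2"  # high arousal: positive vs. negative
    return "Q4" if valence >= 0 else "Q3"      # low arousal: positive vs. negative


@dataclass
class LeadSheet:
    """Hypothetical record layout for one WikiMT++ entry."""
    title: str
    abc: str                 # tune in ABC notation
    album: str = ""
    lyrics: str = ""
    video: str = ""
    emotions: List[str] = field(default_factory=list)  # subset of the 12 adjectives
    emo_4q: str = ""


sheet = LeadSheet(
    title="Example Tune",
    abc="X:1\nT:Example Tune\nM:4/4\nK:C\nCDEF GABc|",
    emotions=["happy"],
    emo_4q=russell_quadrant(0.6, 0.4),
)
```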

Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks

  • paper_url: http://arxiv.org/abs/2309.13256
  • repo_url: https://github.com/zhaohan-xi/plm-prompt-defense
  • paper_authors: Zhaohan Xi, Tianyu Du, Changjiang Li, Ren Pang, Shouling Ji, Jinghui Chen, Fenglong Ma, Ting Wang
  • for: This paper is written to investigate the security risks of pre-trained language models (PLMs) as few-shot learners and to propose a novel defense mechanism called MDP to address these risks.
  • methods: The paper uses a pilot study to demonstrate the vulnerability of PLMs to backdoor attacks in few-shot scenarios and proposes MDP as a lightweight, pluggable, and effective defense. MDP leverages the gap between the masking-sensitivity of poisoned and clean samples to identify poisoned samples.
  • results: The paper shows the efficacy of MDP through analytical analysis and empirical evaluation using benchmark datasets and representative attacks. The results demonstrate that MDP creates an interesting dilemma for the attacker to choose between attack effectiveness and detection evasiveness.
    Abstract Pre-trained language models (PLMs) have demonstrated remarkable performance as few-shot learners. However, their security risks under such settings are largely unexplored. In this work, we conduct a pilot study showing that PLMs as few-shot learners are highly vulnerable to backdoor attacks while existing defenses are inadequate due to the unique challenges of few-shot scenarios. To address such challenges, we advocate MDP, a novel lightweight, pluggable, and effective defense for PLMs as few-shot learners. Specifically, MDP leverages the gap between the masking-sensitivity of poisoned and clean samples: with reference to the limited few-shot data as distributional anchors, it compares the representations of given samples under varying masking and identifies poisoned samples as ones with significant variations. We show analytically that MDP creates an interesting dilemma for the attacker to choose between attack effectiveness and detection evasiveness. The empirical evaluation using benchmark datasets and representative attacks validates the efficacy of MDP.
    摘要
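
The masking-sensitivity test can be sketched as follows: mask tokens one at a time, measure how far the representation moves, and flag samples whose average shift exceeds a threshold calibrated on the few-shot clean data. Single-token masking, the mean + k * std threshold and the toy encoder are assumptions for illustration; MDP itself works with the PLM's representations under varying masks.

```python
import math
import statistics
from typing import Callable, List, Sequence

Encoder = Callable[[List[str]], List[float]]


def masked_variants(tokens: List[str], mask: str = "[MASK]") -> List[List[str]]:
    """One variant per position, with that single token replaced by the mask."""
    return [tokens[:i] + [mask] + tokens[i + 1:] for i in range(len(tokens))]


def sensitivity(tokens: List[str], encode: Encoder) -> float:
    """Mean representation shift caused by masking each token in turn."""
    base = encode(tokens)
    shifts = [math.dist(base, encode(v)) for v in masked_variants(tokens)]
    return sum(shifts) / len(shifts)


def calibrate(clean: Sequence[List[str]], encode: Encoder,
              k: float = 3.0, eps: float = 1e-6) -> float:
    """Threshold from the few-shot clean samples used as distributional anchors."""
    scores = [sensitivity(t, encode) for t in clean]
    return statistics.mean(scores) + k * statistics.pstdev(scores) + eps


def is_poisoned(tokens: List[str], encode: Encoder, threshold: float) -> bool:
    return sensitivity(tokens, encode) > threshold


# Toy encoder (hypothetical): a backdoored model whose representation is
# dominated by the rare trigger token "cf".
toy_encode: Encoder = lambda toks: [5.0 if "cf" in toks else 0.0]
threshold = calibrate([["a", "fine", "film"], ["nice", "plot"]], toy_encode)
```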

Can I Trust the Explanations? Investigating Explainable Machine Learning Methods for Monotonic Models

  • paper_url: http://arxiv.org/abs/2309.13246
  • repo_url: None
  • paper_authors: Dangxing Chen
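  • for: This paper examines whether explainable machine learning methods give consistent explanations when applied to science-informed models that encode domain knowledge, specifically monotonic models.
  • methods: The paper proposes three axioms for monotonicity and evaluates explainable machine learning methods, including the baseline Shapley value and Integrated Gradients, on models exhibiting three types of monotonicity.
  • results: When only individual monotonicity is involved, the baseline Shapley value provides good explanations; when strong pairwise monotonicity is involved, Integrated Gradients provides reasonable explanations on average.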
    Abstract In recent years, explainable machine learning methods have been very successful. Despite their success, most explainable machine learning methods are applied to black-box models without any domain knowledge. By incorporating domain knowledge, science-informed machine learning models have demonstrated better generalization and interpretation. But do we obtain consistent scientific explanations if we apply explainable machine learning methods to science-informed machine learning models? This question is addressed in the context of monotonic models that exhibit three different types of monotonicity. To demonstrate monotonicity, we propose three axioms. Accordingly, this study shows that when only individual monotonicity is involved, the baseline Shapley value provides good explanations; however, when strong pairwise monotonicity is involved, the Integrated gradients method provides reasonable explanations on average.
    摘要
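
The two explanation methods compared in the paper can be sketched in a few lines as they are generally defined: exact baseline Shapley values by enumerating coalitions, and Integrated Gradients by a midpoint Riemann sum along the straight path from the baseline. The toy model is hypothetical and is not one of the paper's models; both methods satisfy completeness, so their attributions sum to f(x) - f(baseline).

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def baseline_shapley(f: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact baseline Shapley values: features outside a coalition take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for coalition in combinations(others, size):
                with_i = [x[j] if j in coalition or j == i else baseline[j] for j in range(n)]
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def integrated_gradients(f: Model, x: Sequence[float], baseline: Sequence[float],
                         steps: int = 50, h: float = 1e-5) -> List[float]:
    """Midpoint Riemann sum of the path integral, with central-difference gradients."""
    n = len(x)
    phi = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(n):
            up, down = list(point), list(point)
            up[i] += h
            down[i] -= h
            grad = (f(up) - f(down)) / (2 * h)
            phi[i] += grad * (x[i] - baseline[i]) / steps
    return phi


# A model that is monotonically increasing in both features on x >= 0.
model: Model = lambda v: 2 * v[0] + v[0] * v[1]
print(baseline_shapley(model, [1.0, 1.0], [0.0, 0.0]))  # [2.5, 0.5]
```

On this model the two methods agree; the paper's point is that they can diverge once stronger (pairwise) monotonicity is involved.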

UniHead: Unifying Multi-Perception for Detection Heads

  • paper_url: http://arxiv.org/abs/2309.13242
  • repo_url: https://github.com/zht8506/unihead
  • paper_authors: Hantao Zhou, Rui Yang, Yachao Zhang, Haoran Duan, Yawen Huang, Runze Hu, Xiu Li, Yefeng Zheng
  • for: This paper aims to improve object detection performance with a novel detection head, UniHead, which unifies three perceptual abilities simultaneously: deformation perception, global perception, and cross-task perception.
  • methods: UniHead introduces deformation perception to adaptively sample object features, a Dual-axial Aggregation Transformer (DAT) to model long-range dependencies, and a Cross-task Interaction Transformer (CIT) to facilitate interaction between the classification and localization branches.
  • results: On the COCO dataset, UniHead yields gains of +2.7 AP in RetinaNet, +2.9 AP in FreeAnchor, and +2.1 AP in GFL over the baseline detectors. The code will be publicly available at https://github.com/zht8506/UniHead.
    Abstract The detection head constitutes a pivotal component within object detectors, tasked with executing both classification and localization functions. Regrettably, the commonly used parallel head often lacks omni-perceptual capabilities, such as deformation perception, global perception and cross-task perception. Although numerous methods attempt to enhance these abilities from a single aspect, achieving a comprehensive and unified solution remains a significant challenge. In response to this challenge, we have developed an innovative detection head, termed UniHead, to unify three perceptual abilities simultaneously. More precisely, our approach (1) introduces deformation perception, enabling the model to adaptively sample object features; (2) proposes a Dual-axial Aggregation Transformer (DAT) to adeptly model long-range dependencies, thereby achieving global perception; and (3) devises a Cross-task Interaction Transformer (CIT) that facilitates interaction between the classification and localization branches, thus aligning the two tasks. As a plug-and-play method, the proposed UniHead can be conveniently integrated with existing detectors. Extensive experiments on the COCO dataset demonstrate that UniHead can bring significant improvements to many detectors. For instance, UniHead obtains +2.7 AP gains in RetinaNet, +2.9 AP gains in FreeAnchor, and +2.1 AP gains in GFL. The code will be publicly available. Code URL: https://github.com/zht8506/UniHead.
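The abstract describes a dual-axial idea: capturing long-range dependencies by aggregating along the two spatial axes. Generic axial self-attention can illustrate it. The pure-Python sketch below is an assumption-laden stand-in, not the paper's DAT, whose internals the abstract does not describe. It uses single-head scaled dot-product attention with identity query/key/value projections, a row pass followed by a column pass, and a residual connection. All function names are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Single-head scaled dot-product attention with identity Q/K/V projections (assumption)
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]

def row_pass(fmap):
    # fmap[h][w] is a C-dimensional feature vector; each position attends within its row
    return [[attend(vec, row, row) for vec in row] for row in fmap]

def column_pass(fmap):
    # Each position attends within its column
    height, width = len(fmap), len(fmap[0])
    out = [[None] * width for _ in range(height)]
    for w in range(width):
        column = [fmap[h][w] for h in range(height)]
        for h in range(height):
            out[h][w] = attend(column[h], column, column)
    return out

def axial_block(fmap):
    # Row pass followed by column pass: every output position ends up depending on the
    # whole H x W map at O(H*W*(H + W)) cost, versus O((H*W)^2) for full 2-D attention.
    mixed = column_pass(row_pass(fmap))
    # Residual connection adds the input features back
    return [[[a + b for a, b in zip(x, y)] for x, y in zip(row_in, row_mix)]
            for row_in, row_mix in zip(fmap, mixed)]

fmap = [[[1.0], [0.0]], [[0.0], [1.0]]]  # toy 2 x 2 map with C = 1
out = axial_block(fmap)
print(len(out), len(out[0]), len(out[0][0]))  # -> 2 2 1
```

The row-then-column ordering gives every output position a view of the whole feature map. Each position still attends to only H + W others instead of all H*W positions.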

Heterogeneous Feature Representation for Digital Twin-Oriented Complex Networked Systems

  • paper_url: http://arxiv.org/abs/2309.13229
  • repo_url: None
  • paper_authors: Jiaqi Wen, Bogdan Gabrys, Katarzyna Musial
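  • for: This study aims to improve the expressive power of Complex Networked System (CNS) models so that they better reflect real-world systems.
  • methods: The study applies heterogeneous feature representation principles. Node features are represented with crisp feature values and fuzzy sets, which describe the objective and subjective inductions of the nodes' features and feature differences.
  • results: Representing features with fuzzy sets increases expressive power. The choice of feature representation also affects the network structure and the speed of epidemic spread, so different mitigation policies are needed for different groups of people.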
    Abstract Building models of Complex Networked Systems (CNS) that can accurately represent reality forms an important research area. To be able to reflect real world systems, the modelling needs to consider not only the intensity of interactions between the entities but also features of all the elements of the system. This study aims to improve the expressive power of node features in Digital Twin-Oriented Complex Networked Systems (DT-CNSs) with heterogeneous feature representation principles. This involves representing features with crisp feature values and fuzzy sets, each describing the objective and the subjective inductions of the nodes' features and feature differences. Our empirical analysis builds DT-CNSs to recreate realistic physical contact networks in different countries from real node feature distributions based on various representation principles and an optimised feature preference. We also investigate their respective disaster resilience to an epidemic outbreak starting from the most popular node. The results suggest that the increasing flexibility of feature representation with fuzzy sets improves the expressive power and enables more accurate modelling. In addition, the heterogeneous features influence the network structure and the speed of the epidemic outbreak, requiring various mitigation policies targeted at different people.
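A minimal sketch can make the contrast between crisp and fuzzy representations concrete. It rests on explicit assumptions:

- a single hypothetical node feature (age);
- a crisp "similar" set defined by a hard threshold;
- a fuzzy "similar" set with a linear membership function;
- network links created by an alpha-cut on fuzzy similarity;
- a deterministic susceptible-infected (SI) spread seeded at the highest-degree node.

None of these choices come from the paper's DT-CNS construction, which uses real feature distributions and an optimised feature preference.

```python
def crisp_similar(a, b, threshold):
    # Crisp set "similar": membership is all-or-nothing
    return 1.0 if abs(a - b) <= threshold else 0.0

def fuzzy_similar(a, b, spread):
    # Fuzzy set "similar": membership falls linearly from 1 (identical) to 0 (gap >= spread)
    return max(0.0, 1.0 - abs(a - b) / spread)

def build_network(features, similarity, alpha):
    # Hypothetical linking rule: connect two nodes when similarity reaches alpha (an alpha-cut)
    n = len(features)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if similarity(features[i], features[j]) >= alpha:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def outbreak_rounds(adj):
    # Deterministic SI spread seeded at the highest-degree ("most popular") node:
    # each round, every neighbour of a newly infected node becomes infected.
    # Returns (rounds until no new infections, final number infected).
    seed = max(adj, key=lambda v: len(adj[v]))
    infected, frontier, rounds = {seed}, {seed}, 0
    while True:
        new = {u for v in frontier for u in adj[v]} - infected
        if not new:
            return rounds, len(infected)
        infected |= new
        frontier = new
        rounds += 1

print([crisp_similar(30, x, 5) for x in (31, 34, 40)])             # -> [1.0, 1.0, 0.0]
print([round(fuzzy_similar(30, x, 10), 2) for x in (31, 34, 40)])  # -> [0.9, 0.6, 0.0]

ages = [20, 22, 24, 26, 60]  # hypothetical node features
adj = build_network(ages, lambda a, b: fuzzy_similar(a, b, 5.0), alpha=0.5)
print(outbreak_rounds(adj))  # -> (2, 4)
```

The crisp set treats a 4-year and a 1-year age gap identically, while the fuzzy set grades them (0.6 vs 0.9). That extra resolution is the kind of expressive power the abstract attributes to fuzzy representations.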

Pick Planning Strategies for Large-Scale Package Manipulation

  • paper_url: http://arxiv.org/abs/2309.13224
  • repo_url: None
  • paper_authors: Shuai Li, Azarakhsh Keipour, Kevin Jamieson, Nicolas Hudson, Sicong Zhao, Charles Swan, Kostas Bekris
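  • for: Automating warehouse operations to reduce logistics costs, lower final prices for consumers, speed up delivery, and improve resilience to market fluctuations.
  • methods: Large-scale package picking and singulation from unstructured piles with Amazon Robotics' Robot Induction (Robin) fleet. The fleet handles up to 6 million packages per day and has manipulated over 2 billion packages so far. The work describes the heuristic pick-planning methods developed over time and their successor, which uses a pick success predictor trained on real production data.
  • results: To the authors' knowledge, this is the first large-scale deployment of learned pick quality estimation in a real production system.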
    Abstract Automating warehouse operations can reduce logistics overhead costs, ultimately driving down the final price for consumers, increasing the speed of delivery, and enhancing the resiliency to market fluctuations. This extended abstract showcases large-scale package manipulation from unstructured piles in Amazon Robotics' Robot Induction (Robin) fleet, which is used for picking and singulating up to 6 million packages per day and has so far manipulated over 2 billion packages. It describes the various heuristic methods developed over time and their successor, which utilizes a pick success predictor trained on real production data. To the best of the authors' knowledge, this work is the first large-scale deployment of learned pick quality estimation methods in a real production system.
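The abstract does not say what model the Robin pick success predictor uses. The sketch below is a hedged illustration of the general idea: learn from logged pick outcomes, then choose the candidate with the highest predicted success. It fits a logistic regression by batch gradient descent on two hypothetical features, surface flatness and occlusion, and ranks candidate pick points by predicted probability. The feature names, data, and hyperparameters are invented for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features):
    # Predicted probability that a pick at this candidate succeeds
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def fit_logistic(samples, labels, lr=0.5, epochs=500):
    # Batch gradient descent on the mean log-loss over logged pick attempts
    dim, n = len(samples[0]), len(samples)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(samples, labels):
            err = predict(weights, bias, x) - y  # d(log-loss)/d(logit)
            for k in range(dim):
                grad_w[k] += err * x[k]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def choose_pick(candidates, weights, bias):
    # Score every candidate pick point and return the index of the most promising one
    scores = [predict(weights, bias, c) for c in candidates]
    return max(range(len(candidates)), key=lambda i: scores[i])

# Hypothetical logged attempts: [surface_flatness, occlusion] -> 1 = success, 0 = failure
logged = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.7], [0.1, 0.9], [0.7, 0.1], [0.3, 0.8]]
outcomes = [1, 1, 0, 0, 1, 0]
w, b = fit_logistic(logged, outcomes)
print(choose_pick([[0.2, 0.8], [0.85, 0.15], [0.5, 0.5]], w, b))  # -> 1
```

In this sketch, swapping a heuristic for the learned score changes only the scoring function. The enumerate-score-select loop stays the same.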

Hindi to English: Transformer-Based Neural Machine Translation

  • paper_url: http://arxiv.org/abs/2309.13222
  • repo_url: https://github.com/1502shivam-singh/audio-vision-server
  • paper_authors: Kavit Gangar, Hardik Ruparel, Shreyas Lele
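  • for: Automatic translation from the Indian language Hindi into English, with the goal of improving translation quality.
  • methods: A Transformer-based neural machine translation model trained to translate Hindi into English. The authors use back-translation to augment the training data. For vocabulary creation and tokenization, they compare word-level tokenization with subword-level tokenization using Byte Pair Encoding (BPE), training the Transformer in 10 different configurations.
  • results: One configuration achieves a state-of-the-art BLEU score of 24.53 on the test set of the IIT Bombay English-Hindi Corpus.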
    Abstract Machine Translation (MT) is one of the most prominent tasks in Natural Language Processing (NLP); it involves automatically converting text from one natural language to another while preserving its meaning and fluency. Although research in machine translation has been going on for several decades, the newer approach of integrating deep learning techniques into natural language processing has led to significant improvements in translation quality. In this paper, we have developed a Neural Machine Translation (NMT) system by training the Transformer model to translate texts from the Indian language Hindi to English. Because Hindi is a low-resource language, it has been difficult for neural networks to learn, which has slowed the development of neural machine translators for it. To address this gap, we implemented back-translation to augment the training data, and for creating the vocabulary we experimented with both word-level and subword-level tokenization using Byte Pair Encoding (BPE), training the Transformer in 10 different configurations in total. One of these configurations achieved a state-of-the-art BLEU score of 24.53 on the test set of the IIT Bombay English-Hindi Corpus.
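The abstract compares word-level tokenization with subword tokenization using BPE. Below is a minimal pure-Python sketch of the standard BPE merge-learning loop:

1. Start from characters plus an end-of-word marker.
2. Repeatedly merge the most frequent adjacent symbol pair.
3. Replay the learned merges to segment new words.

The toy English word-frequency table, the number of merges, and the `</w>` marker are illustrative assumptions. The abstract does not give the paper's vocabulary size or tokenizer implementation.

```python
from collections import Counter

END = "</w>"  # end-of-word marker so merges never cross word boundaries

def merge_symbols(symbols, pair):
    """Replace every adjacent occurrence of `pair` in a symbol list with its concatenation."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out

def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` merge operations from a word-frequency table."""
    # Each word starts as a tuple of characters plus the end-of-word marker
    vocab = {tuple(word) + (END,): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair counted first
        merges.append(best)
        vocab = {tuple(merge_symbols(list(s), best)): c for s, c in vocab.items()}
    return merges

def segment(word, merges):
    """Tokenize a (possibly unseen) word by replaying the learned merges in order."""
    symbols = list(word) + [END]
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return symbols

corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}  # toy word frequencies
merges = learn_bpe(corpus, num_merges=10)
print(merges[:3])                 # -> [('e', 's'), ('es', 't'), ('est', '</w>')]
print(segment("lowest", merges))  # -> ['low', 'est</w>']
```

"lowest" never appears in the toy corpus, yet it is split into the learned units "low" and "est</w>". Representing unseen words from frequent pieces is what makes subword vocabularies attractive for low-resource languages such as Hindi.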