cs.AI - 2023-08-07

SemOpenAlex: The Scientific Landscape in 26 Billion RDF Triples

  • paper_url: http://arxiv.org/abs/2308.03671
  • repo_url: None
  • paper_authors: Michael Färber, David Lamprecht, Johan Krause, Linn Aung, Peter Haase
  • for: This paper describes SemOpenAlex, a large-scale RDF knowledge graph containing over 26 billion triples about scientific publications and their associated entities, such as authors, institutions, journals, and concepts.
  • methods: The data are released under the CC0 license with free and open access, offered through multiple channels including RDF dump files, a SPARQL endpoint, and the Linked Open Data cloud, together with knowledge graph entity embeddings computed using high-performance computing.
  • results: SemOpenAlex supports a broad range of use-case scenarios, such as exploratory semantic search, large-scale scientific impact quantification, analytics within and across scientific disciplines, and academic recommender systems (collaborators, publications, and venues).
    Abstract We present SemOpenAlex, an extensive RDF knowledge graph that contains over 26 billion triples about scientific publications and their associated entities, such as authors, institutions, journals, and concepts. SemOpenAlex is licensed under CC0, providing free and open access to the data. We offer the data through multiple channels, including RDF dump files, a SPARQL endpoint, and as a data source in the Linked Open Data cloud, complete with resolvable URIs and links to other data sources. Moreover, we provide embeddings for knowledge graph entities using high-performance computing. SemOpenAlex enables a broad range of use-case scenarios, such as exploratory semantic search via our website, large-scale scientific impact quantification, and other forms of scholarly big data analytics within and across scientific disciplines. Additionally, it enables academic recommender systems, such as recommending collaborators, publications, and venues, including explainability capabilities. Finally, SemOpenAlex can serve for RDF query optimization benchmarks, creating scholarly knowledge-guided language models, and as a hub for semantic scientific publishing.
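For readers who want to try the SPARQL access path, here is a minimal sketch using SPARQLWrapper; the endpoint URL is taken from the project site and, like any schema detail, should be treated as an assumption.

```python
# Minimal sketch: fetch a few triples from the SemOpenAlex SPARQL endpoint.
# The endpoint URL is an assumption based on the project site and may change.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://semopenalex.org/sparql")
sparql.setQuery("""
    SELECT ?s ?p ?o
    WHERE { ?s ?p ?o }
    LIMIT 5
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["s"]["value"], row["p"]["value"], row["o"]["value"])
```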

Diffusion Model in Causal Inference with Unmeasured Confounders

  • paper_url: http://arxiv.org/abs/2308.03669
  • repo_url: https://github.com/tatsu432/BDCM
  • paper_authors: Tatsuhiro Shimizu
  • for: This work extends the diffusion model to answer causal questions from observational data in the presence of unmeasured confounders.
  • methods: Building on the Diffusion-based Causal Model (DCM), which uses a Directed Acyclic Graph (DAG) to capture causal structure, the authors propose the Backdoor Criterion based DCM (BDCM), which applies the backdoor criterion to select the DAG variables to include in the diffusion model's decoding process.
  • results: Synthetic-data experiments show that the proposed model captures counterfactual distributions more precisely than DCM when unmeasured confounders are present.
    Abstract We study how to extend the use of the diffusion model to answer the causal question from the observational data under the existence of unmeasured confounders. In Pearl's framework of using a Directed Acyclic Graph (DAG) to capture the causal intervention, a Diffusion-based Causal Model (DCM) was proposed incorporating the diffusion model to answer the causal questions more accurately, assuming that all of the confounders are observed. However, unmeasured confounders in practice exist, which hinders DCM from being applicable. To alleviate this limitation of DCM, we propose an extended model called Backdoor Criterion based DCM (BDCM), whose idea is rooted in the Backdoor criterion to find the variables in DAG to be included in the decoding process of the diffusion model so that we can extend DCM to the case with unmeasured confounders. Synthetic data experiment demonstrates that our proposed model captures the counterfactual distribution more precisely than DCM under the unmeasured confounders.
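To make the backdoor criterion concrete, here is a hedged, brute-force sketch of finding valid adjustment sets in a toy DAG with NetworkX; it illustrates the criterion itself, not BDCM's decoding procedure (recent NetworkX versions rename nx.d_separated to nx.is_d_separator).

```python
import itertools
import networkx as nx

def backdoor_sets(dag, x, y):
    """Brute-force search for backdoor adjustment sets for (x, y).

    Z is valid if (1) no member of Z is a descendant of x, and
    (2) Z d-separates x and y once the edges leaving x are removed,
    i.e. Z blocks every backdoor path into x.
    """
    candidates = set(dag.nodes) - {x, y} - nx.descendants(dag, x)
    backdoor_graph = dag.copy()
    backdoor_graph.remove_edges_from(list(dag.out_edges(x)))
    valid = []
    for r in range(len(candidates) + 1):
        for z in itertools.combinations(sorted(candidates), r):
            if nx.d_separated(backdoor_graph, {x}, {y}, set(z)):
                valid.append(set(z))
    return valid

# Classic confounded triangle: u -> x, u -> y, x -> y.
g = nx.DiGraph([("u", "x"), ("u", "y"), ("x", "y")])
print(backdoor_sets(g, "x", "y"))  # [{'u'}]: the confounder must be adjusted for
```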

QDax: A Library for Quality-Diversity and Population-based Algorithms with Hardware Acceleration

  • paper_url: http://arxiv.org/abs/2308.03665
  • repo_url: None
  • paper_authors: Felix Chalumeau, Bryan Lim, Raphael Boige, Maxime Allard, Luca Grillotti, Manon Flageat, Valentin Macé, Arthur Flajolet, Thomas Pierrot, Antoine Cully
  • for: The paper is written for researchers and practitioners who are interested in using Quality-Diversity (QD) optimization algorithms in Jax for various optimization purposes, including black-box optimization and continuous control.
  • methods: The paper presents QDax, an open-source library with a streamlined and modular API for QD optimization algorithms in Jax. The library offers implementations of popular QD, Neuroevolution, and Reinforcement Learning (RL) algorithms, supported by various examples.
  • results: The paper demonstrates QDax's efficiency and flexibility: all implementations can be just-in-time compiled with Jax for efficient execution across multiple accelerators, including GPUs and TPUs, and the library is thoroughly documented and tested with 95% coverage.
    Abstract QDax is an open-source library with a streamlined and modular API for Quality-Diversity (QD) optimization algorithms in Jax. The library serves as a versatile tool for optimization purposes, ranging from black-box optimization to continuous control. QDax offers implementations of popular QD, Neuroevolution, and Reinforcement Learning (RL) algorithms, supported by various examples. All the implementations can be just-in-time compiled with Jax, facilitating efficient execution across multiple accelerators, including GPUs and TPUs. These implementations effectively demonstrate the framework's flexibility and user-friendliness, easing experimentation for research purposes. Furthermore, the library is thoroughly documented and tested with 95\% coverage.
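Since reproducing QDax's own API from memory would be unreliable, the sketch below instead shows a toy MAP-Elites step written directly in JAX, illustrating the jit/vmap pattern that the library's hardware acceleration rests on; the task, grid, and all names are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

NUM_CELLS, DIM, BATCH = 64, 8, 32

def fitness_and_descriptor(x):
    # Toy task: sphere fitness; behavior descriptor = first coordinate.
    return -jnp.sum(x ** 2), jnp.clip(x[0], 0.0, 0.999)

@jax.jit
def map_elites_step(repertoire, fitnesses, key, sigma=0.1):
    key, k1, k2 = jax.random.split(key, 3)
    parents = repertoire[jax.random.randint(k1, (BATCH,), 0, NUM_CELLS)]
    offspring = parents + sigma * jax.random.normal(k2, parents.shape)
    fit, desc = jax.vmap(fitness_and_descriptor)(offspring)  # batched eval
    cells = jnp.floor(desc * NUM_CELLS).astype(jnp.int32)
    # Deterministic per-cell winner via scatter-max, then keep improvements.
    best = jnp.full(NUM_CELLS, -jnp.inf).at[cells].max(fit)
    winner = jnp.argmax(
        jnp.where(cells[None, :] == jnp.arange(NUM_CELLS)[:, None],
                  fit[None, :], -jnp.inf), axis=1)
    improved = best > fitnesses
    repertoire = jnp.where(improved[:, None], offspring[winner], repertoire)
    fitnesses = jnp.where(improved, best, fitnesses)
    return repertoire, fitnesses, key

key = jax.random.PRNGKey(0)
repertoire = jax.random.normal(key, (NUM_CELLS, DIM))
fitnesses = jnp.full((NUM_CELLS,), -jnp.inf)
for _ in range(200):
    repertoire, fitnesses, key = map_elites_step(repertoire, fitnesses, key)
print("cells filled:", int(jnp.isfinite(fitnesses).sum()))
```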

Detecting Spells in Fantasy Literature with a Transformer Based Artificial Intelligence

  • paper_url: http://arxiv.org/abs/2308.03660
  • repo_url: None
  • paper_authors: Marcel Moravek, Alexander Zender, Andreas Müller
  • for: 本研究使用BERT架构进行魔法咒语识别,以识别哈利·波特小说系列中的魔法咒语。
  • methods: 我们使用预训练的BERT模型,并对不同的数据集和训练方法进行了微调,以识别咒语的上下文。
  • results: 我们的实验结果表明,可以使用BERT模型来识别咒语的上下文,并且采用不同的序列分类和Token分类方法可以提高模型的准确率。此外,我们还发现了咒语的总体特征,可以将模型应用于其他奇幻世界。
    Abstract Transformer architectures and models have made significant progress in language-based tasks. In this area, BERT is one of the most widely used and freely available transformer architectures. In our work, we use BERT for context-based phrase recognition of magic spells in the Harry Potter novel series. Spells are a common part of active magic in fantasy novels. Typically, spells are used in a specific context to achieve a supernatural effect. A series of investigations were conducted to see if a Transformer architecture could recognize such phrases based on their context in the Harry Potter saga. For our studies, a pre-trained BERT model was used and fine-tuned utilising different datasets and training methods to identify the searched context. By considering different approaches for sequence classification as well as token classification, it is shown that the context of spells can be recognised. According to our investigations, the examined sequence length for fine-tuning and validation of the model plays a significant role in context recognition. Based on this, we have investigated whether spells have overarching properties that allow a transfer of the neural network models to other fantasy universes as well. The application of our model showed promising results and is worth deepening in subsequent studies.
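For orientation, here is a minimal token-classification sketch with Hugging Face Transformers in the spirit of the setup described above; the BIO label set and example sentence are illustrative, not the paper's corpus, and the classification head is untrained until fine-tuned.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-SPELL", "I-SPELL"]  # assumed BIO tagging for spell phrases
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(labels)
)

text = "Harry raised his wand and shouted Expelliarmus at the figure."
enc = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits              # (1, seq_len, num_labels)
pred = logits.argmax(-1)[0]
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
for tok, p in zip(tokens, pred):
    print(tok, labels[p])  # random until the classification head is fine-tuned
```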

FFF: Fragments-Guided Flexible Fitting for Building Complete Protein Structures

  • paper_url: http://arxiv.org/abs/2308.03654
  • repo_url: None
  • paper_authors: Weijie Chen, Xinyan Wang, Yuhang Wang
  • for: The paper presents a new method, FFF, that bridges protein structure prediction and protein structure recognition to build complete protein structures.
  • methods: A multi-level recognition network captures structural features from the input 3D cryo-EM map; protein structural fragments are then generated using pseudo peptide vectors and a protein sequence alignment method based on the extracted features, and a complete structural model is built via flexible fitting.
  • results: In the authors' benchmark tests, FFF outperforms baseline methods for building complete protein structures.
    Abstract Cryo-electron microscopy (cryo-EM) is a technique for reconstructing the 3-dimensional (3D) structure of biomolecules (especially large protein complexes and molecular assemblies). As the resolution increases to the near-atomic scale, building protein structures de novo from cryo-EM maps becomes possible. Recently, recognition-based de novo building methods have shown the potential to streamline this process. However, it cannot build a complete structure due to the low signal-to-noise ratio (SNR) problem. At the same time, AlphaFold has led to a great breakthrough in predicting protein structures. This has inspired us to combine fragment recognition and structure prediction methods to build a complete structure. In this paper, we propose a new method named FFF that bridges protein structure prediction and protein structure recognition with flexible fitting. First, a multi-level recognition network is used to capture various structural features from the input 3D cryo-EM map. Next, protein structural fragments are generated using pseudo peptide vectors and a protein sequence alignment method based on these extracted features. Finally, a complete structural model is constructed using the predicted protein fragments via flexible fitting. Based on our benchmark tests, FFF outperforms the baseline methods for building complete protein structures.
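As a hedged illustration of the sequence-alignment step (locating a recognized fragment within the full chain), here is a toy example with Biopython; the sequences are placeholders, and the paper's actual alignment method is not reproduced.

```python
from Bio import Align

full_sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # toy target chain
fragment = "QRQISFVKS"            # residues recovered from the cryo-EM map

aligner = Align.PairwiseAligner()
aligner.mode = "local"            # the fragment matches a local region
alignment = aligner.align(full_sequence, fragment)[0]
start, end = alignment.aligned[0][0]
print(f"fragment anchors at residues {start}..{end} of the full chain")
```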

Segmentation Framework for Heat Loss Identification in Thermal Images: Empowering Scottish Retrofitting and Thermographic Survey Companies

  • paper_url: http://arxiv.org/abs/2308.03631
  • repo_url: None
  • paper_authors: Md Junayed Hasan, Eyad Elyan, Yijun Yan, Jinchang Ren, Md Mostafa Kamal Sarker
  • For: This study aims to tackle fuel poverty in Scotland by automating the identification of heat loss sources in thermal images of homes, using a deep learning-based segmentation framework.
  • Methods: The proposed framework uses a Mask Region Proposal Convolutional Neural Network (Mask RCNN) to segment heat loss sources caused by weak insulation, and eliminates obstructive objects present in the images.
  • Results: The final fine-tuned model achieved a mean average precision (mAP) score of 77.2% for segmenting the target objects (heat loss sources), demonstrating the potential of the proposed framework in accurately quantifying energy loss in Scottish homes.
    Abstract Retrofitting and thermographic survey (TS) companies in Scotland collaborate with social housing providers to tackle fuel poverty. They employ ground-level infrared (IR) camera-based TSs (GIRTSs) for collecting thermal images to identify the heat loss sources resulting from poor insulation. However, this identification process is labor-intensive and time-consuming, necessitating extensive data processing. To automate this, an AI-driven approach is necessary. Therefore, this study proposes a deep learning (DL)-based segmentation framework using the Mask Region Proposal Convolutional Neural Network (Mask RCNN) to validate its applicability to these thermal images. The objective of the framework is to automatically identify and crop heat loss sources caused by weak insulation, while also eliminating obstructive objects present in those images. By doing so, it minimizes labor-intensive tasks and provides an automated, consistent, and reliable solution. To validate the proposed framework, approximately 2500 thermal images were collected in collaboration with an industrial TS partner. Then, 1800 representative images were carefully selected with the assistance of experts and annotated to highlight the target objects (TO) to form the final dataset. Subsequently, a transfer learning strategy was employed to train the dataset, progressively augmenting the training data volume and fine-tuning the pre-trained baseline Mask RCNN. As a result, the final fine-tuned model achieved a mean average precision (mAP) score of 77.2% for segmenting the TO, demonstrating the significant potential of the proposed framework in accurately quantifying energy loss in Scottish homes.
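A minimal sketch of the transfer-learning setup with torchvision's Mask R-CNN is shown below (requires a recent torchvision); the class count and names are assumptions, not the paper's exact configuration.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

num_classes = 3  # assumed: background + heat-loss source + obstructive object
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

# Swap the box and mask heads for the new label set (transfer learning).
in_feat = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feat, num_classes)
in_feat_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feat_mask, 256, num_classes)

model.eval()
thermal_image = torch.rand(3, 480, 640)        # stand-in for an IR frame
with torch.no_grad():
    out = model([thermal_image])[0]
print(out["boxes"].shape, out["masks"].shape)  # per-instance boxes and masks
```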

MedMine: Examining Pre-trained Language Models on Medication Mining

  • paper_url: http://arxiv.org/abs/2308.03629
  • repo_url: https://github.com/hecta-uom/m3
  • paper_authors: Haifa Alrdahi, Lifeng Han, Hendrik Šuvalov, Goran Nenadic
  • for: This work examines how well current pre-trained language models (PLMs) perform on automatic medication mining and how they could be applied in clinical practice.
  • methods: Existing PLMs, including the monolingual model Med7 and the multilingual XLM-RoBERTa, are fine-tuned and compared on shared-task medication mining data.
  • results: The examined PLMs show imbalanced performance on medication mining tasks, particularly across different entity types and clinical events.
    Abstract Automatic medication mining from clinical and biomedical text has become a popular topic due to its real impact on healthcare applications and the recent development of powerful language models (LMs). However, fully-automatic extraction models still face obstacles to be overcome such that they can be deployed directly into clinical practice for better impacts. Such obstacles include their imbalanced performances on different entity types and clinical events. In this work, we examine current state-of-the-art pre-trained language models (PLMs) on such tasks, via fine-tuning including the monolingual model Med7 and multilingual large language model (LLM) XLM-RoBERTa. We compare their advantages and drawbacks using historical medication mining shared task data sets from n2c2-2018 challenges. We report the findings we get from these fine-tuning experiments such that they can facilitate future research on addressing them, for instance, how to combine their outputs, merge such models, or improve their overall accuracy by ensemble learning and data augmentation. MedMine is part of the M3 Initiative: https://github.com/HECTA-UoM/M3
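The per-entity-type imbalance discussed above is usually exposed with entity-level metrics; here is a hedged sketch with seqeval, where the BIO tags (DRUG, DOSAGE) are illustrative, not the exact n2c2-2018 label set.

```python
from seqeval.metrics import classification_report

y_true = [["O", "B-DRUG", "I-DRUG", "O", "B-DOSAGE"]]
y_pred = [["O", "B-DRUG", "I-DRUG", "O", "O"]]
print(classification_report(y_true, y_pred))
# Per-type precision/recall/F1 shows which entity types lag behind.
```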

A Meta-learning based Stacked Regression Approach for Customer Lifetime Value Prediction

  • paper_url: http://arxiv.org/abs/2308.08502
  • repo_url: None
  • paper_authors: Karan Gadgil, Sukhpal Singh Gill, Ahmed M. Abdelmoniem
  • For: The paper proposes a simple yet effective and interpretable Customer Lifetime Value (CLV) prediction model that can handle a wide variety of input features and is applicable in various business domains.
  • Methods: The proposed model is based on a meta-learning-based stacked regression approach that combines the predictions from bagging and boosting models.
  • Results: The proposed model was empirically tested on an openly available Online Retail dataset and showed superior performance compared to existing distribution-based and basic models.
    Abstract Companies across the globe are keen on targeting potential high-value customers in an attempt to expand revenue and this could be achieved only by understanding the customers more. Customer Lifetime Value (CLV) is the total monetary value of transactions/purchases made by a customer with the business over an intended period of time and is used as means to estimate future customer interactions. CLV finds application in a number of distinct business domains such as Banking, Insurance, Online-entertainment, Gaming, and E-Commerce. The existing distribution-based and basic (recency, frequency & monetary) based models face a limitation in terms of handling a wide variety of input features. Moreover, the more advanced Deep learning approaches could be superfluous and add an undesirable element of complexity in certain application areas. We, therefore, propose a system which is able to qualify both as effective, and comprehensive yet simple and interpretable. With that in mind, we develop a meta-learning-based stacked regression model which combines the predictions from bagging and boosting models that each is found to perform well individually. Empirical tests have been carried out on an openly available Online Retail dataset to evaluate various models and show the efficacy of the proposed approach.
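A minimal sketch of the stacked-regression idea (bagging and boosting base learners combined by a meta-learner) is shown below with scikit-learn; the model choices and data are illustrative, and the paper's exact configuration may differ.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor,
                              RandomForestRegressor, StackingRegressor)
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingRegressor(
    estimators=[
        ("bagging", RandomForestRegressor(n_estimators=200, random_state=0)),
        ("boosting", GradientBoostingRegressor(random_state=0)),
    ],
    final_estimator=RidgeCV(),  # meta-learner fit on out-of-fold predictions
)
stack.fit(X_tr, y_tr)
print("R^2 on held-out data:", stack.score(X_te, y_te))
```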

Stock Market Price Prediction: A Hybrid LSTM and Sequential Self-Attention based Approach

  • paper_url: http://arxiv.org/abs/2308.04419
  • repo_url: None
  • paper_authors: Karan Pardeshi, Sukhpal Singh Gill, Ahmed M. Abdelmoniem
  • for: Predicting stock prices to help investors make the best decisions at the right time.
  • methods: A deep learning approach is proposed: a Long Short-Term Memory (LSTM) model with a Sequential Self-Attention Mechanism (LSTM-SSAM), to improve the accuracy of stock price prediction.
  • results: Extensive experiments on three stock datasets (SBIN, HDFCBANK, and BANKBARODA) show the proposed model to be more effective and feasible than existing models, with the best results on the RMSE and R2 evaluation indicators.
    Abstract One of the most enticing research areas is the stock market, and projecting stock prices may help investors profit by making the best decisions at the correct time. Deep learning strategies have emerged as a critical technique in the field of the financial market. The stock market is impacted due to two aspects, one is the geo-political, social and global events on the bases of which the price trends could be affected. Meanwhile, the second aspect purely focuses on historical price trends and seasonality, allowing us to forecast stock prices. In this paper, our aim is to focus on the second aspect and build a model that predicts future prices with minimal errors. In order to provide better prediction results of stock price, we propose a new model named Long Short-Term Memory (LSTM) with Sequential Self-Attention Mechanism (LSTM-SSAM). Finally, we conduct extensive experiments on the three stock datasets: SBIN, HDFCBANK, and BANKBARODA. The experimental results prove the effectiveness and feasibility of the proposed model compared to existing models. The experimental findings demonstrate that the root-mean-squared error (RMSE), and R-square (R2) evaluation indicators are giving the best results.
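For orientation, here is a hedged Keras sketch of an LSTM followed by self-attention over the time steps; the layer sizes and the use of the built-in MultiHeadAttention layer are assumptions about the general pattern, not the paper's exact LSTM-SSAM architecture.

```python
import tensorflow as tf

window = 60   # past days fed to the model
inp = tf.keras.Input(shape=(window, 1))
h = tf.keras.layers.LSTM(64, return_sequences=True)(inp)    # temporal features
h = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=16)(h, h)  # self-attention
h = tf.keras.layers.GlobalAveragePooling1D()(h)
out = tf.keras.layers.Dense(1)(h)                           # next-day price

model = tf.keras.Model(inp, out)
model.compile(optimizer="adam", loss="mse")
model.summary()
```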

Why We Don’t Have AGI Yet

  • paper_url: http://arxiv.org/abs/2308.03598
  • repo_url: None
  • paper_authors: Peter Voss, Mladjan Jovanovic
  • for: This paper examines the development of AI, in particular the concept of Artificial General Intelligence (AGI) and why it has been so difficult to achieve.
  • methods: The paper analyzes whether purely statistical approaches can lead to AGI, and examines the key cognitive abilities required for human-like adaptability and autonomous learning.
  • results: The paper concludes that purely statistical approaches are unlikely to lead to AGI, identifies several crucial cognitive abilities required to achieve it, and surveys socio-technical factors that have slowed progress towards AGI.
    Abstract The original vision of AI was re-articulated in 2002 via the term 'Artificial General Intelligence' or AGI. This vision is to build 'Thinking Machines' - computer systems that can learn, reason, and solve problems similar to the way humans do. This is in stark contrast to the 'Narrow AI' approach practiced by almost everyone in the field over the many decades. While several large-scale efforts have nominally been working on AGI (most notably DeepMind), the field of pure focused AGI development has not been well funded or promoted. This is surprising given the fantastic value that true AGI can bestow on humanity. In addition to the dearth of effort in this field, there are also several theoretical and methodical missteps that are hampering progress. We highlight why purely statistical approaches are unlikely to lead to AGI, and identify several crucial cognitive abilities required to achieve human-like adaptability and autonomous learning. We conclude with a survey of socio-technical factors that have undoubtedly slowed progress towards AGI.

Feature Importance versus Feature Influence and What It Signifies for Explainable AI

  • paper_url: http://arxiv.org/abs/2308.03589
  • repo_url: None
  • paper_authors: Kary Främling
  • for: This work provides a unified definition of global and local feature importance, together with an instance-level assessment of how favorable a feature value is for the outcome, based on the concept of value utility.
  • methods: The Contextual Importance and Utility (CIU) method is used to assess feature importance, and the fidelity and stability of different methods are evaluated.
  • results: The study shows that explanations based on contextual importance and contextual utility are more expressive and flexible than those based on feature influence alone, which is measured against a reference level or baseline.
    Abstract When used in the context of decision theory, feature importance expresses how much changing the value of a feature can change the model outcome (or the utility of the outcome), compared to other features. Feature importance should not be confused with the feature influence used by most state-of-the-art post-hoc Explainable AI methods. Contrary to feature importance, feature influence is measured against a reference level or baseline. The Contextual Importance and Utility (CIU) method provides a unified definition of global and local feature importance that is applicable also for post-hoc explanations, where the value utility concept provides instance-level assessment of how favorable or not a feature value is for the outcome. The paper shows how CIU can be applied to both global and local explainability, assesses the fidelity and stability of different methods, and shows how explanations that use contextual importance and contextual utility can provide more expressive and flexible explanations than when using influence only.
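A minimal numeric sketch of CIU for one feature is shown below, following the definitions commonly given for the method: CI = (Cmax - Cmin) / (MAX - MIN) over the feature's range in the current context, and CU = (f(x) - Cmin) / (Cmax - Cmin); the grid search is a simplification of the estimation procedure.

```python
import numpy as np

def ciu_one_feature(f, x, j, lo, hi, out_min, out_max, steps=100):
    outs = []
    for v in np.linspace(lo, hi, steps):
        xv = x.copy()
        xv[j] = v                     # vary feature j, hold the rest fixed
        outs.append(f(xv))
    cmin, cmax = min(outs), max(outs)
    ci = (cmax - cmin) / (out_max - out_min)
    cu = (f(x) - cmin) / (cmax - cmin) if cmax > cmin else 0.5
    return ci, cu

# Toy model: feature 0 matters a lot, feature 1 barely at all.
f = lambda x: 0.9 * x[0] + 0.1 * x[1]
x = np.array([0.7, 0.3])
for j in range(2):
    ci, cu = ciu_one_feature(f, x, j, 0.0, 1.0, out_min=0.0, out_max=1.0)
    print(f"feature {j}: CI={ci:.2f}, CU={cu:.2f}")
```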

A machine-learning sleep-wake classification model using a reduced number of features derived from photoplethysmography and activity signals

  • paper_url: http://arxiv.org/abs/2308.05759
  • repo_url: None
  • paper_authors: Douglas A. Almeida, Felipe M. Dias, Marcelo A. F. Toledo, Diego A. C. Cardenas, Filipe A. C. Oliveira, Estela Ribeiro, Jose E. Krieger, Marco A. Gutierrez
  • for: This study develops a machine-learning sleep-wake classification model based on the eXtreme Gradient Boosting (XGBoost) algorithm and features derived from photoplethysmography (PPG) and activity signals, to support the assessment of sleep quality and overall health.
  • methods: Features are extracted from the PPG signal and activity counts, and an XGBoost classifier is trained on a reduced feature set.
  • results: The model's performance is comparable to state-of-the-art methods, with a Sensitivity of 91.15 ± 1.16%, Specificity of 53.66 ± 1.12%, F1-score of 83.88 ± 0.56%, and Kappa of 48.0 ± 0.86%; the reduced number of features makes the approach suitable for wearable devices with limited computational power.
    Abstract Sleep is a crucial aspect of our overall health and well-being. It plays a vital role in regulating our mental and physical health, impacting our mood, memory, and cognitive function to our physical resilience and immune system. The classification of sleep stages is a mandatory step to assess sleep quality, providing the metrics to estimate the quality of sleep and how well our body is functioning during this essential period of rest. Photoplethysmography (PPG) has been demonstrated to be an effective signal for sleep stage inference, meaning it can be used on its own or in a combination with others signals to determine sleep stage. This information is valuable in identifying potential sleep issues and developing strategies to improve sleep quality and overall health. In this work, we present a machine learning sleep-wake classification model based on the eXtreme Gradient Boosting (XGBoost) algorithm and features extracted from PPG signal and activity counts. The performance of our method was comparable to current state-of-the-art methods with a Sensitivity of 91.15 $\pm$ 1.16%, Specificity of 53.66 $\pm$ 1.12%, F1-score of 83.88 $\pm$ 0.56%, and Kappa of 48.0 $\pm$ 0.86%. Our method offers a significant improvement over other approaches as it uses a reduced number of features, making it suitable for implementation in wearable devices that have limited computational power.
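A hedged sketch of the classification stage is shown below: an XGBoost model on a small set of stand-in features; the feature names and data are placeholders, and the paper's feature-extraction pipeline is not reproduced.

```python
import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
# Columns stand in for PPG-derived heart rate, an HRV proxy, activity count.
X = rng.normal(size=(2000, 3))
y = (X[:, 2] + 0.5 * X[:, 0] + rng.normal(scale=0.5, size=2000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05)
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te), target_names=["sleep", "wake"]))
```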

Revealing the Underlying Patterns: Investigating Dataset Similarity, Performance, and Generalization

  • paper_url: http://arxiv.org/abs/2308.03580
  • repo_url: None
  • paper_authors: Akshit Achara, Ram Krishna Pandey
  • for: Understanding the performance and generalization of deep learning models.
  • methods: Image-image, dataset-dataset, and image-dataset distances are used to gain insight into model behavior.
  • results: Adding a small number of unseen images (e.g., 1, 3, or 7) to the training set improves generalization while reducing training and annotation costs.
    Abstract Supervised deep learning models require significant amount of labelled data to achieve an acceptable performance on a specific task. However, when tested on unseen data, the models may not perform well. Therefore, the models need to be trained with additional and varying labelled data to improve the generalization. In this work, our goal is to understand the models, their performance and generalization. We establish image-image, dataset-dataset, and image-dataset distances to gain insights into the model's behavior. Our proposed distance metric when combined with model performance can help in selecting an appropriate model/architecture from a pool of candidate architectures. We have shown that the generalization of these models can be improved by only adding a small number of unseen images (say 1, 3 or 7) into the training set. Our proposed approach reduces training and annotation costs while providing an estimate of model performance on unseen data in dynamic environments.
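The distances discussed above can be sketched in an embedding space as follows; the mean-pairwise-Euclidean choice is illustrative, and the paper's exact metric may differ.

```python
import numpy as np

def image_to_dataset(img_emb, dataset_embs):
    return np.linalg.norm(dataset_embs - img_emb, axis=1).mean()

def dataset_to_dataset(a_embs, b_embs):
    diffs = a_embs[:, None, :] - b_embs[None, :, :]   # all cross pairs
    return np.linalg.norm(diffs, axis=-1).mean()

rng = np.random.default_rng(0)
train = rng.normal(size=(100, 64))               # embeddings of training images
shifted = rng.normal(loc=0.5, size=(100, 64))    # a slightly shifted dataset
print(image_to_dataset(shifted[0], train))
print(dataset_to_dataset(train, shifted))
```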

Provably Efficient Learning in Partially Observable Contextual Bandit

  • paper_url: http://arxiv.org/abs/2308.03572
  • repo_url: None
  • paper_authors: Xueping Gong, Jiheng Zhang
  • for: investigate transfer learning in partially observable contextual bandits
  • methods: convert the problem to identifying or partially identifying causal effects through optimization problems, and use sampling algorithms to obtain causal bounds
  • results: improve the performance of classical bandit algorithms and achieve orders of magnitude faster convergence rates, especially in tasks with function approximation.
    Abstract In this paper, we investigate transfer learning in partially observable contextual bandits, where agents have limited knowledge from other agents and partial information about hidden confounders. We first convert the problem to identifying or partially identifying causal effects between actions and rewards through optimization problems. To solve these optimization problems, we discretize the original functional constraints of unknown distributions into linear constraints, and sample compatible causal models via sequentially solving linear programmings to obtain causal bounds with the consideration of estimation error. Our sampling algorithms provide desirable convergence results for suitable sampling distributions. We then show how causal bounds can be applied to improving classical bandit algorithms and affect the regrets with respect to the size of action sets and function spaces. Notably, in the task with function approximation which allows us to handle general context distributions, our method improves the order dependence on function space size compared with previous literatures. We formally prove that our causally enhanced algorithms outperform classical bandit algorithms and achieve orders of magnitude faster convergence rates. Finally, we perform simulations that demonstrate the efficiency of our strategy compared to the current state-of-the-art methods. This research has the potential to enhance the performance of contextual bandit agents in real-world applications where data is scarce and costly to obtain.
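The general device behind such causal bounds (discretize into linear constraints, then optimize) can be sketched with a linear program in a binary toy setting: parameterize unobserved response types, constrain them to match the observed joint P(X, Y), and minimize/maximize the interventional quantity. This is the classic Manski-style construction, not the paper's algorithm.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Observed joint P(X=x, Y=y) (toy numbers).
p_obs = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

# Variables q[x, t] with response type t = (y if x=0, y if x=1).
types = list(itertools.product([0, 1], repeat=2))
var = [(x, t) for x in (0, 1) for t in types]        # 8 unknowns

A_eq, b_eq = [], []
for (x, y), p in p_obs.items():                      # match the observed joint
    A_eq.append([1.0 if v[0] == x and v[1][x] == y else 0.0 for v in var])
    b_eq.append(p)
A_eq.append([1.0] * len(var)); b_eq.append(1.0)      # probabilities sum to 1

# Objective: P(Y=1 | do(X=1)) = total mass on types with t[1] == 1.
c = np.array([1.0 if v[1][1] == 1 else 0.0 for v in var])

lo = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).fun
hi = -linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).fun
print(f"P(Y=1 | do(X=1)) in [{lo:.2f}, {hi:.2f}]")   # here: [0.40, 0.90]
```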

MSLE: An ontology for Materials Science Laboratory Equipment. Large-Scale Devices for Materials Characterization

  • paper_url: http://arxiv.org/abs/2308.07325
  • repo_url: None
  • paper_authors: Mehrdad Jalali, Matthias Mail, Rossella Aversa, Christian Kübel
  • for: This paper develops a Materials Science Laboratory Equipment (MSLE) ontology to unify the description and use of equipment in materials science laboratories.
  • methods: Two existing ontologies, the Semantic Sensor Network (SSN) and the Material Vocabulary (MatVoc), are integrated into the MSLE core to build a coherent ontology, and the Simple Knowledge Organization System (SKOS) is used to represent the hierarchical structure of equipment terms collected in various languages and abbreviations.
  • results: In close collaboration with domain experts, the large-scale devices for materials characterization in the authors' research group were studied and modeled, with constraints expressed in SHACL; the resulting ontology can answer competency questions about materials science laboratory equipment.
    Abstract This paper introduces a new ontology for Materials Science Laboratory Equipment, termed MSLE. A fundamental issue with materials science laboratory (hereafter lab) equipment in the real world is that scientists work with various types of equipment with multiple specifications. For example, there are many electron microscopes with different parameters in chemical and physical labs. A critical development to unify the description is to build an equipment domain ontology as basic semantic knowledge and to guide the user to work with the equipment appropriately. Here, we propose to develop a consistent ontology for equipment, the MSLE ontology. In the MSLE, two main existing ontologies, the Semantic Sensor Network (SSN) and the Material Vocabulary (MatVoc), have been integrated into the MSLE core to build a coherent ontology. Since various acronyms and terms have been used for equipment, this paper proposes an approach to use a Simple Knowledge Organization System (SKOS) to represent the hierarchical structure of equipment terms. Equipment terms were collected in various languages and abbreviations and coded into the MSLE using the SKOS model. The ontology development was conducted in close collaboration with domain experts and focused on the large-scale devices for materials characterization available in our research group. Competency questions are expected to be addressed through the MSLE ontology. Constraints are modeled in the Shapes Constraint Language (SHACL); a prototype is shown and validated to show the value of the modeling constraints.
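A hedged sketch of representing an equipment-term hierarchy with SKOS via rdflib is shown below; the namespace URI and terms are invented placeholders, not MSLE's published IRIs.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

MSLE = Namespace("https://example.org/msle#")  # placeholder namespace
g = Graph()
g.bind("skos", SKOS)

tem = MSLE["TransmissionElectronMicroscope"]
em = MSLE["ElectronMicroscope"]
for concept in (tem, em):
    g.add((concept, RDF.type, SKOS.Concept))
g.add((tem, SKOS.prefLabel, Literal("Transmission Electron Microscope", lang="en")))
g.add((tem, SKOS.altLabel, Literal("TEM")))    # common abbreviation
g.add((tem, SKOS.broader, em))                 # hierarchy: a TEM is an EM

print(g.serialize(format="turtle"))
```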

Measuring Variety, Balance, and Disparity: An Analysis of Media Coverage of the 2021 German Federal Election

  • paper_url: http://arxiv.org/abs/2308.03531
  • repo_url: None
  • paper_authors: Michael Färber, Jannik Schwade, Adam Jatowt
  • for: This work examines how diversity in news articles can be measured, with a view to preventing filter bubbles and fueling public discourse, especially before elections.
  • methods: A framework is proposed for measuring the diversity of news articles along three dimensions (variety, balance, and disparity), considering individuals, parties, and topics; in addition, a Google Top Stories dataset was created, encompassing more than 26,000 unique headlines from more than 900 news outlets, collected within two weeks before and after the 2021 German federal election.
  • results: Diversity was high for more general search terms (e.g., "election"), while a range of more specific search terms ("education," "Europe," "climate protection," "government") resulted in news articles with high diversity in two out of three dimensions, reflecting a more subjective, dedicated discussion of rather future-oriented topics.
    Abstract Determining and measuring diversity in news articles is important for a number of reasons, including preventing filter bubbles and fueling public discourse, especially before elections. So far, the identification and analysis of diversity have been illuminated in a variety of ways, such as measuring the overlap of words or topics between news articles related to US elections. However, the question of how diversity in news articles can be measured holistically, i.e., with respect to (1) variety, (2) balance, and (3) disparity, considering individuals, parties, and topics, has not been addressed. In this paper, we present a framework for determining diversity in news articles according to these dimensions. Furthermore, we create and provide a dataset of Google Top Stories, encompassing more than 26,000 unique headlines from more than 900 news outlets collected within two weeks before and after the 2021 German federal election. While we observe high diversity for more general search terms (e.g., "election"), a range of search terms ("education," "Europe," "climate protection," "government") resulted in news articles with high diversity in two out of three dimensions. This reflects a more subjective, dedicated discussion on rather future-oriented topics.
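The three dimensions can be sketched as they are often operationalized in diversity research (variety = number of categories, balance = evenness of their shares, disparity = how different the categories are); the exact formulas used in the paper may differ.

```python
import numpy as np

counts = np.array([12, 9, 3, 1])     # e.g. articles per party mentioned
p = counts / counts.sum()

variety = len(counts)
balance = -(p * np.log(p)).sum() / np.log(variety)  # normalized entropy in [0, 1]

# Disparity: average pairwise distance between categories in some
# embedding (here a toy 2-D placement of the categories).
pos = np.array([[0.0, 0.0], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2]])
d = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
disparity = d[np.triu_indices(variety, k=1)].mean()

print(variety, round(balance, 2), round(disparity, 2))
```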

Deep Feature Learning for Wireless Spectrum Data

  • paper_url: http://arxiv.org/abs/2308.03530
  • repo_url: None
  • paper_authors: Ljupcho Milosheski, Gregor Cerar, Blaž Bertalanič, Carolina Fortuna, Mihael Mohorčič
  • for: This work learns feature representations for clustering wireless transmissions in a completely unsupervised manner, requiring no labels.
  • methods: A convolutional neural network (CNN) model automatically learns a reduced-dimensionality representation of the input data, compared against a principal component analysis (PCA) baseline.
  • results: The automatically learned representations extract fine-grained clusters containing the shapes of the wireless transmission bursts, whereas the baseline only enables a general separation of the data based on background noise.
    Abstract In recent years, the traditional feature engineering process for training machine learning models is being automated by the feature extraction layers integrated in deep learning architectures. In wireless networks, many studies were conducted in automatic learning of feature representations for domain-related challenges. However, most of the existing works assume some supervision along the learning process by using labels to optimize the model. In this paper, we investigate an approach to learning feature representations for wireless transmission clustering in a completely unsupervised manner, i.e. requiring no labels in the process. We propose a model based on convolutional neural networks that automatically learns a reduced dimensionality representation of the input data with 99.3% less components compared to a baseline principal component analysis (PCA). We show that the automatic representation learning is able to extract fine-grained clusters containing the shapes of the wireless transmission bursts, while the baseline enables only general separability of the data based on the background noise.
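The unsupervised pattern described above can be sketched with a small convolutional autoencoder on spectrogram-like inputs, whose bottleneck code plays the role of the learned representation; layer sizes and the 64x64 input are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SpectrogramAE(nn.Module):
    def __init__(self, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
            nn.Linear(16 * 16 * 16, latent_dim),                  # compact code
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 16 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (16, 16, 16)),
            nn.ConvTranspose2d(16, 8, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(8, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = SpectrogramAE()
batch = torch.rand(4, 1, 64, 64)              # stand-in spectrogram snippets
recon, codes = model(batch)
loss = nn.functional.mse_loss(recon, batch)   # reconstruction: no labels needed
loss.backward()
print(codes.shape)  # (4, 16): features that can then be clustered
```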

Exploring ChatGPT’s Empathic Abilities

  • paper_url: http://arxiv.org/abs/2308.03527
  • repo_url: None
  • paper_authors: Kristina Schaaff, Caroline Reinig, Tim Schlippe
  • For: This study investigates the empathetic responses and emotional expressions of ChatGPT, a chatbot based on GPT-3.5.
  • Methods: The study evaluates ChatGPT's empathy in three aspects: understanding and expressing emotions, parallel emotional response, and empathic personality.
  • Results: ChatGPT was able to correctly identify emotions and produce appropriate answers in 91.7% of cases, and reacted with a parallel emotion in 70.7% of conversations. Its empathic capabilities were found to be better than those of people diagnosed with Asperger syndrome / high-functioning autism, but still below the average of healthy humans.
    Abstract Empathy is often understood as the ability to share and understand another individual's state of mind or emotion. With the increasing use of chatbots in various domains, e.g., children seeking help with homework, individuals looking for medical advice, and people using the chatbot as a daily source of everyday companionship, the importance of empathy in human-computer interaction has become more apparent. Therefore, our study investigates the extent to which ChatGPT based on GPT-3.5 can exhibit empathetic responses and emotional expressions. We analyzed the following three aspects: (1) understanding and expressing emotions, (2) parallel emotional response, and (3) empathic personality. Thus, we not only evaluate ChatGPT on various empathy aspects and compare it with human behavior but also show a possible way to analyze the empathy of chatbots in general. Our results show, that in 91.7% of the cases, ChatGPT was able to correctly identify emotions and produces appropriate answers. In conversations, ChatGPT reacted with a parallel emotion in 70.7% of cases. The empathic capabilities of ChatGPT were evaluated using a set of five questionnaires covering different aspects of empathy. Even though the results indicate that the empathic abilities of ChatGPT are still below the average of healthy humans, the scores are better than those of people who have been diagnosed with Asperger syndrome / high-functioning autism.

AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.03526
  • repo_url: None
  • paper_authors: Michaël Mathieu, Sherjil Ozair, Srivatsan Srinivasan, Caglar Gulcehre, Shangtong Zhang, Ray Jiang, Tom Le Paine, Richard Powell, Konrad Żołna, Julian Schrittwieser, David Choi, Petko Georgiev, Daniel Toyama, Aja Huang, Roman Ring, Igor Babuschkin, Timo Ewalds, Mahyar Bordbar, Sarah Henderson, Sergio Gómez Colmenarejo, Aäron van den Oord, Wojciech Marian Czarnecki, Nando de Freitas, Oriol Vinyals
  • for: This paper aims to advance offline reinforcement learning (RL) algorithms.
  • methods: It introduces a benchmark, AlphaStar Unplugged, based on Blizzard's massive dataset of StarCraft II games played by humans, together with a standardized API and an evaluation protocol; baseline agents include behavior cloning and offline variants of actor-critic and MuZero.
  • results: Using only offline data, the paper improves on the previously published AlphaStar behavior cloning agent, achieving a 90% win rate against it.
    Abstract StarCraft II is one of the most challenging simulated reinforcement learning environments; it is partially observable, stochastic, multi-agent, and mastering StarCraft II requires strategic planning over long time horizons with real-time low-level execution. It also has an active professional competitive scene. StarCraft II is uniquely suited for advancing offline RL algorithms, both because of its challenging nature and because Blizzard has released a massive dataset of millions of StarCraft II games played by human players. This paper leverages that and establishes a benchmark, called AlphaStar Unplugged, introducing unprecedented challenges for offline reinforcement learning. We define a dataset (a subset of Blizzard's release), tools standardizing an API for machine learning methods, and an evaluation protocol. We also present baseline agents, including behavior cloning, offline variants of actor-critic and MuZero. We improve the state of the art of agents using only offline data, and we achieve 90% win rate against previously published AlphaStar behavior cloning agent.
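For orientation, the simplest baseline named above, behavior cloning, is just supervised learning of actions from logged (observation, action) pairs; the sketch below uses random stand-ins and a toy MLP policy, whereas the real agents consume structured StarCraft II observations.

```python
import torch
import torch.nn as nn

obs_dim, n_actions = 128, 10
policy = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                       nn.Linear(256, n_actions))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# One gradient step on a batch of logged human play (random stand-ins here).
obs = torch.randn(64, obs_dim)
actions = torch.randint(0, n_actions, (64,))
loss = nn.functional.cross_entropy(policy(obs), actions)  # imitate the data
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```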

Vocab-Expander: A System for Creating Domain-Specific Vocabularies Based on Word Embeddings

  • paper_url: http://arxiv.org/abs/2308.03519
  • repo_url: None
  • paper_authors: Michael Färber, Nicholas Popovic
  • for: The paper presents an online tool that enables end-users (e.g., technology scouts) to create and expand vocabularies for their domain of interest.
  • methods: The tool uses an ensemble of state-of-the-art word embedding techniques, based on web text and the common-sense knowledge base ConceptNet, to suggest terms related to already given terms.
  • results: The system has an easy-to-use interface that allows users to quickly confirm or reject term suggestions. Vocab-Expander supports a variety of use cases, such as improving concept-based information retrieval in technology and innovation management, enhancing communication and collaboration within organizations or interdisciplinary projects, and creating vocabularies for specific courses in education.
    Abstract In this paper, we propose Vocab-Expander at https://vocab-expander.com, an online tool that enables end-users (e.g., technology scouts) to create and expand a vocabulary of their domain of interest. It utilizes an ensemble of state-of-the-art word embedding techniques based on web text and ConceptNet, a common-sense knowledge base, to suggest related terms for already given terms. The system has an easy-to-use interface that allows users to quickly confirm or reject term suggestions. Vocab-Expander offers a variety of potential use cases, such as improving concept-based information retrieval in technology and innovation management, enhancing communication and collaboration within organizations or interdisciplinary projects, and creating vocabularies for specific courses in education.
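The embedding-based suggestion mechanism can be sketched with off-the-shelf vectors as follows; the small GloVe model is a demo stand-in, and Vocab-Expander's actual ensemble and ConceptNet fusion are not reproduced.

```python
import gensim.downloader

vectors = gensim.downloader.load("glove-wiki-gigaword-50")  # small demo model
for term in ["battery", "electrode"]:
    suggestions = vectors.most_similar(term, topn=5)
    print(term, "->", [w for w, _ in suggestions])
# A user would then confirm or reject each suggestion to grow the vocabulary.
```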

Balanced Face Dataset: Guiding StyleGAN to Generate Labeled Synthetic Face Image Dataset for Underrepresented Group

  • paper_url: http://arxiv.org/abs/2308.03495
  • repo_url: None
  • paper_authors: Kidist Amde Mekonnen
  • for: The goal of this study is to generate a reliable, labeled synthetic face image dataset with a balanced distribution across demographic groups, reducing manual annotation costs and bias.
  • methods: The StyleGAN model is used to generate the face image dataset, with the generation process controlled to ensure a balanced and representative distribution, and the images are annotated for different downstream tasks.
  • results: The study shows that the generated face image dataset is representative and accurate, and can be used for various downstream tasks.
    Abstract For a machine learning model to generalize effectively to unseen data within a particular problem domain, it is well-understood that the data needs to be of sufficient size and representative of real-world scenarios. Nonetheless, real-world datasets frequently have overrepresented and underrepresented groups. One solution to mitigate bias in machine learning is to leverage a diverse and representative dataset. Training a model on a dataset that covers all demographics is crucial to reducing bias in machine learning. However, collecting and labeling large-scale datasets has been challenging, prompting the use of synthetic data generation and active labeling to decrease the costs of manual labeling. The focus of this study was to generate a robust face image dataset using the StyleGAN model. In order to achieve a balanced distribution of the dataset among different demographic groups, a synthetic dataset was created by controlling the generation process of StyleGaN and annotated for different downstream tasks.

No Length Left Behind: Enhancing Knowledge Tracing for Modeling Sequences of Excessive or Insufficient Lengths

  • paper_url: http://arxiv.org/abs/2308.03488
  • repo_url: None
  • paper_authors: Moyu Zhang, Xinning Zhu, Chunhong Zhang, Feng Pan, Wenchen Qian, Hui Zhao
  • for: Predicting students' responses to practice exercises based on their historical question-answering behaviors.
  • methods: A model called Sequence-Flexible Knowledge Tracing (SFKT) is proposed to address the problems existing methods have with sequences of excessive or insufficient length.
  • results: The model captures students' complete historical practice behaviors more effectively and avoids the overfitting that arises when modeling short practice sequences.
    Abstract Knowledge tracing (KT) aims to predict students' responses to practices based on their historical question-answering behaviors. However, most current KT methods focus on improving overall AUC, leaving ample room for optimization in modeling sequences of excessive or insufficient lengths. As sequences get longer, computational costs will increase exponentially. Therefore, KT methods usually truncate sequences to an acceptable length, which makes it difficult for models on online service systems to capture complete historical practice behaviors of students with too long sequences. Conversely, modeling students with short practice sequences using most KT methods may result in overfitting due to limited observation samples. To address the above limitations, we propose a model called Sequence-Flexible Knowledge Tracing (SFKT).

CIRO: COVID-19 infection risk ontology

  • paper_url: http://arxiv.org/abs/2308.09719
  • repo_url: https://github.com/plod-info/plod
  • paper_authors: Shusaku Egami, Yasunori Yamamoto, Ikki Ohmukai, Takashi Okumura
  • For: The paper aims to automate the assessment of COVID-19 infection risks for individuals, based on the Japanese government's formulation of infection risks.
  • Methods: The paper uses an ontology called the COVID-19 Infection Risk Ontology (CIRO), together with the Resource Description Framework (RDF) and SPARQL queries, to automate the assessment of infection risks.
  • Results: The knowledge graph built using CIRO and RDF/SPARQL queries can infer the infection risks formulated by the Japanese government, and reasoning experiments demonstrated the usefulness of the knowledge processing; some issues were identified for further deployment.
    Abstract Public health authorities perform contact tracing for highly contagious agents to identify close contacts with the infected cases. However, during the pandemic caused by coronavirus disease 2019 (COVID-19), this operation was not employed in countries with high patient volumes. Meanwhile, the Japanese government conducted this operation, thereby contributing to the control of infections, at the cost of arduous manual labor by public health officials. To ease the burden of the officials, this study attempted to automate the assessment of each person's infection risk through an ontology, called COVID-19 Infection Risk Ontology (CIRO). This ontology expresses infection risks of COVID-19 formulated by the Japanese government, toward automated assessment of infection risks of individuals, using Resource Description Framework (RDF) and SPARQL (SPARQL Protocol and RDF Query Language) queries. For evaluation, we demonstrated that the knowledge graph built could infer the risks, formulated by the government. Moreover, we conducted reasoning experiments to analyze the computational efficiency. The experiments demonstrated usefulness of the knowledge processing, and identified issues left for deployment.
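The knowledge-processing pattern (encode an encounter in RDF, infer a risk flag with a SPARQL query) can be sketched locally with rdflib; the namespace, properties, and thresholds are invented placeholders, not CIRO's actual vocabulary or the government's criteria.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

EX = Namespace("https://example.org/risk#")
g = Graph()
g.add((EX.encounter1, EX.distanceMeters, Literal(0.8, datatype=XSD.double)))
g.add((EX.encounter1, EX.durationMinutes, Literal(20, datatype=XSD.integer)))

# Toy close-contact rule: closer than 1 m for at least 15 minutes.
result = g.query("""
    PREFIX ex: <https://example.org/risk#>
    ASK { ?e ex:distanceMeters ?d ; ex:durationMinutes ?t .
          FILTER (?d < 1.0 && ?t >= 15) }
""")
print(result.askAnswer)  # True: the encounter is flagged as high risk
```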

Exploring the Physical World Adversarial Robustness of Vehicle Detection

  • paper_url: http://arxiv.org/abs/2308.03476
  • repo_url: None
  • paper_authors: Wei Jiang, Tianyuan Zhang, Shuangcheng Liu, Weiyu Ji, Zichao Zhang, Gang Xiao
  • for: This work evaluates the robustness of vehicle detection models under physical-world adversarial attacks and proposes a fast data generation pipeline based on the CARLA simulator, used to build the Discrete and Continuous Instant-level (DCI) dataset.
  • methods: Comprehensive experiments are conducted on the generated data with three detection models and three physical adversarial attacks.
  • results: Yolo v6 proved the most resilient, with only a 6.59% average drop in AP under attack, while the ASA attack was the strongest, reducing AP by 14.51% on average, about twice the effect of the other algorithms; static scenes yielded higher recognition AP, and results remained relatively consistent across weather conditions.
    Abstract Adversarial attacks can compromise the robustness of real-world detection models. However, evaluating these models under real-world conditions poses challenges due to resource-intensive experiments. Virtual simulations offer an alternative, but the absence of standardized benchmarks hampers progress. Addressing this, we propose an innovative instant-level data generation pipeline using the CARLA simulator. Through this pipeline, we establish the Discrete and Continuous Instant-level (DCI) dataset, enabling comprehensive experiments involving three detection models and three physical adversarial attacks. Our findings highlight diverse model performances under adversarial conditions. Yolo v6 demonstrates remarkable resilience, experiencing just a marginal 6.59% average drop in average precision (AP). In contrast, the ASA attack yields a substantial 14.51% average AP reduction, twice the effect of other algorithms. We also note that static scenes yield higher recognition AP values, and outcomes remain relatively consistent across varying weather conditions. Intriguingly, our study suggests that advancements in adversarial attack algorithms may be approaching its ``limitation''.In summary, our work underscores the significance of adversarial attacks in real-world contexts and introduces the DCI dataset as a versatile benchmark. Our findings provide valuable insights for enhancing the robustness of detection models and offer guidance for future research endeavors in the realm of adversarial attacks.

Wide Gaps and Clustering Axioms

  • paper_url: http://arxiv.org/abs/2308.03464
  • repo_url: None
  • paper_authors: Mieczysław A. Kłopotek
  • for: The paper is written to address the issue of the k-means algorithm producing clusterings that violate our expectations with respect to high/low similarity/density, and to reconcile k-means with Kleinberg's axiomatic framework in Euclidean and non-Euclidean settings.
  • methods: The paper introduces two new clusterability properties, variational k-separability and residual k-separability, and proposes extensions of k-means algorithm that fit approximately the Kleinberg’s richness axiom.
  • results: The paper demonstrates that, given the proposed clusterability properties, Kleinberg's consistency axiom holds for k-means in both Euclidean and non-Euclidean settings, and provides a method for constructing datasets with a known global optimum for testing algorithms that optimize the k-means cost function; it thereby contributes both to clusterability theory and to the theory of axiomatic frameworks of clustering.
    Abstract The widely applied k-means algorithm produces clusterings that violate our expectations with respect to high/low similarity/density and is in conflict with Kleinberg's axiomatic system for distance based clustering algorithms that formalizes those expectations in a natural way. k-means violates in particular the consistency axiom. We hypothesise that this clash is due to the not explicated expectation that the data themselves should have the property of being clusterable in order to expect the algorithm clustering them to fit a clustering axiomatic system. To demonstrate this, we introduce two new clusterability properties, variational k-separability and residual k-separability, and show that then the Kleinberg's consistency axiom holds for k-means operating in the Euclidean or non-Euclidean space. Furthermore, we propose extensions of the k-means algorithm that fit approximately the Kleinberg's richness axiom that does not hold for k-means. In this way, we reconcile k-means with Kleinberg's axiomatic framework in Euclidean and non-Euclidean settings. Besides the contribution to the theory of axiomatic frameworks of clustering and to clusterability theory, a practical contribution is the possibility to construct datasets for testing purposes of algorithms optimizing the k-means cost function. This includes a method of construction of clusterable data with a known in advance global optimum.
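The practical contribution named in the abstract, constructing datasets with a global optimum known in advance, can be illustrated with a short sketch: when Gaussian blobs are separated widely relative to their spread, the planted centers are (with overwhelming probability) the optimum of the k-means cost. This is an illustrative sketch under that assumption, not the paper's actual construction.

```python
import numpy as np

def make_separated_blobs(k=3, n_per=100, dim=2, gap=50.0, sigma=1.0, seed=0):
    """Generate k Gaussian blobs whose centers lie far apart relative to
    their spread, so the planted centers are (with overwhelming
    probability) the global optimum of the k-means cost."""
    rng = np.random.default_rng(seed)
    # centers spaced `gap` apart along the all-ones direction
    centers = gap * np.arange(k)[:, None] * np.ones((1, dim))
    X = np.concatenate(
        [c + sigma * rng.standard_normal((n_per, dim)) for c in centers])
    labels = np.repeat(np.arange(k), n_per)
    return X, labels, centers

def kmeans_cost(X, centers):
    """Sum of squared distances from each point to its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1).sum()

X, y, planted = make_separated_blobs()
print("k-means cost at the planted (optimal) centers:",
      kmeans_cost(X, planted))
```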

High-Resolution Cranial Defect Reconstruction by Iterative, Low-Resolution, Point Cloud Completion Transformers

  • paper_url: http://arxiv.org/abs/2308.03813
  • repo_url: https://github.com/MWod/DeepImplant_MICCAI_2023
  • paper_authors: Marek Wodzinski, Mateusz Daniol, Daria Hemmerling, Miroslaw Socha
  • for: This paper aims to develop an automatic, dedicated system for personalized cranial reconstruction, to increase the availability of cranial implants and reduce the time and cost of manual design.
  • methods: The proposed method reformulates the problem as a point cloud completion task and uses an iterative, transformer-based approach to reconstruct the cranial defect at any resolution, while being fast and resource-efficient during training and inference.
  • results: The proposed method shows superior performance compared to state-of-the-art volumetric approaches in terms of GPU memory consumption, while maintaining high quality of the reconstructed defects.
    Abstract Each year thousands of people suffer from various types of cranial injuries and require personalized implants whose manual design is expensive and time-consuming. Therefore, an automatic, dedicated system to increase the availability of personalized cranial reconstruction is highly desirable. The problem of automatic cranial defect reconstruction can be formulated as a shape completion task and solved using dedicated deep networks. Currently, the most common approach is to use a volumetric representation and apply deep networks dedicated to image segmentation. However, this approach has several limitations: it does not scale well to high-resolution volumes, nor does it take into account data sparsity. In our work, we reformulate the problem as a point cloud completion task. We propose an iterative, transformer-based method to reconstruct the cranial defect at any resolution while also being fast and resource-efficient during training and inference. We compare the proposed method to state-of-the-art volumetric approaches and show superior performance in terms of GPU memory consumption while maintaining high quality of the reconstructed defects.

Intelligence-Endogenous Management Platform for Computing and Network Convergence

  • paper_url: http://arxiv.org/abs/2308.03450
  • repo_url: None
  • paper_authors: Zicong Hong, Xiaoyu Qiu, Jian Lin, Wuhui Chen, Yue Yu, Hui Wang, Song Guo, Wen Gao
  • for: This paper presents a concept for an intelligence-endogenous management platform for Computing and Network Convergence (CNC), called the "CNC brain", based on artificial intelligence technologies.
  • methods: The proposed CNC brain platform uses four key building blocks: perception, scheduling, adaptation, and governance, to efficiently and automatically match supply and demand with high heterogeneity in a CNC throughout its life cycle.
  • results: The proposed method is evaluated on a CNC testbed that integrates two open-source frameworks (OpenFaas and Kubernetes) and a real-world business dataset provided by Microsoft Azure; the evaluation shows that it is effective in terms of resource utilization and performance.
    Abstract Massive emerging applications are driving demand for the ubiquitous deployment of computing power today. This trend not only spurs the recent popularity of the \emph{Computing and Network Convergence} (CNC), but also introduces an urgent need for the intelligentization of a management platform to coordinate changing resources and tasks in the CNC. Therefore, in this article, we present the concept of an intelligence-endogenous management platform for CNCs called \emph{CNC brain} based on artificial intelligence technologies. It aims at efficiently and automatically matching the supply and demand with high heterogeneity in a CNC via four key building blocks, i.e., perception, scheduling, adaptation, and governance, throughout the CNC's life cycle. Their functionalities, goals, and challenges are presented. To examine the effectiveness of the proposed concept and framework, we also implement a prototype for the CNC brain based on a deep reinforcement learning technology. Also, it is evaluated on a CNC testbed that integrates two open-source and popular frameworks (OpenFaas and Kubernetes) and a real-world business dataset provided by Microsoft Azure. The evaluation results prove the proposed method's effectiveness in terms of resource utilization and performance. Finally, we highlight the future research directions of the CNC brain.

Biomedical Knowledge Graph Embeddings with Negative Statements

  • paper_url: http://arxiv.org/abs/2308.03447
  • repo_url: https://github.com/liseda-lab/truewalks
  • paper_authors: Rita T. Sousa, Sara Silva, Heiko Paulheim, Catia Pesquita
  • for: To improve the accuracy of knowledge graph embedding approaches by properly incorporating negative statements into entity representations.
  • methods: The paper proposes TrueWalks, a novel method that incorporates negative statements into the knowledge graph embedding learning process and takes into account the semantic implications of negation at the ontology level.
  • results: The method is evaluated on ontology-rich biomedical knowledge graphs in two different prediction tasks and performs well against existing benchmarks.
    Abstract A knowledge graph is a powerful representation of real-world entities and their relations. The vast majority of these relations are defined as positive statements, but the importance of negative statements is increasingly recognized, especially under an Open World Assumption. Explicitly considering negative statements has been shown to improve performance on tasks such as entity summarization and question answering or domain-specific tasks such as protein function prediction. However, no attention has been given to the exploration of negative statements by knowledge graph embedding approaches despite the potential of negative statements to produce more accurate representations of entities in a knowledge graph. We propose a novel approach, TrueWalks, to incorporate negative statements into the knowledge graph representation learning process. In particular, we present a novel walk-generation method that is able to not only differentiate between positive and negative statements but also take into account the semantic implications of negation in ontology-rich knowledge graphs. This is of particular importance for applications in the biomedical domain, where the inadequacy of embedding approaches regarding negative statements at the ontology level has been identified as a crucial limitation. We evaluate TrueWalks in ontology-rich biomedical knowledge graphs in two different predictive tasks based on KG embeddings: protein-protein interaction prediction and gene-disease association prediction. We conduct an extensive analysis over established benchmarks and demonstrate that our method is able to improve the performance of knowledge graph embeddings on all tasks.
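The core of TrueWalks is a walk-generation method that treats positively and negatively asserted statements differently. The toy sketch below is not the API of the released library (https://github.com/liseda-lab/truewalks) but one plausible reading of the abstract: walks restricted by edge polarity, with negated relations given distinct tokens so that a downstream word2vec-style embedder keeps them apart.

```python
import random

# toy KG: (head, relation, tail, polarity), polarity in {+1, -1}
edges = [
    ("P1", "hasFunction", "GO:0005215", +1),
    ("P1", "hasFunction", "GO:0016301", -1),  # negative statement about P1
    ("GO:0005215", "subClassOf", "GO:0003674", +1),
]

def neighbours(node, polarity):
    return [(r, t) for h, r, t, p in edges if h == node and p == polarity]

def walk(start, polarity, length=4, rng=random.Random(0)):
    """One random walk that starts through edges of the given polarity.
    Negation is marked on the relation token so 'not_hasFunction' and
    'hasFunction' receive distinct embeddings downstream."""
    path, node = [start], start
    for _ in range(length):
        nbrs = neighbours(node, polarity) or neighbours(node, +1)  # fall back to ontology edges
        if not nbrs:
            break
        r, t = rng.choice(nbrs)
        tag = r if polarity > 0 else f"not_{r}"
        path += [tag, t]
        node = t
    return path

print(walk("P1", +1))   # positive-statement corpus
print(walk("P1", -1))   # negative-statement corpus
```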

Doubly Robust Estimator for Off-Policy Evaluation with Large Action Spaces

  • paper_url: http://arxiv.org/abs/2308.03443
  • repo_url: https://github.com/tatsu432/DR-estimator-OPE-large-action
  • paper_authors: Tatsuhiro Shimizu, Laura Forastiere
  • for: The paper focuses on Off-Policy Evaluation (OPE) in contextual bandit settings with large action spaces, and aims to develop a more accurate and efficient estimator.
  • methods: The paper proposes a new estimator called the Marginalized Doubly Robust (MDR) estimator, which combines the strengths of Marginalized Inverse Propensity Scoring (MIPS) and doubly robust estimation; it uses embeddings of actions to mitigate the estimator's variance and improve accuracy.
  • results: The proposed MDR estimator is shown to be unbiased under weaker assumptions than MIPS while maintaining MIPS's variance reduction against IPS; an empirical experiment verifies the superiority of MDR over existing estimators.
    Abstract We study Off-Policy Evaluation (OPE) in contextual bandit settings with large action spaces. The benchmark estimators suffer from severe bias and variance tradeoffs. Parametric approaches suffer from bias due to difficulty specifying the correct model, whereas ones with importance weight suffer from variance. To overcome these limitations, Marginalized Inverse Propensity Scoring (MIPS) was proposed to mitigate the estimator's variance via embeddings of an action. To make the estimator more accurate, we propose the doubly robust estimator of MIPS called the Marginalized Doubly Robust (MDR) estimator. Theoretical analysis shows that the proposed estimator is unbiased under weaker assumptions than MIPS while maintaining variance reduction against IPS, which was the main advantage of MIPS. The empirical experiment verifies the supremacy of MDR against existing estimators.
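The estimator's doubly robust structure can be written down compactly from the abstract: an outcome-model baseline plus a correction term weighted by embedding-level (marginalized) importance weights rather than action-level ones. The function below is a hedged sketch of that structure, not the paper's exact MDR formula; all inputs are assumed to be precomputed.

```python
import numpy as np

def mdr_estimate(r, w_emb, q_logged, q_eval_mean):
    """Doubly-robust style estimate with marginalized importance weights:
      r           : observed rewards, shape (n,)
      w_emb       : p_eval(e_i|x_i) / p_log(e_i|x_i), importance weights of
                    the logged action *embeddings* (as in MIPS), shape (n,)
      q_logged    : reward-model prediction for the logged (x_i, e_i), shape (n,)
      q_eval_mean : E_{a ~ pi_eval}[q(x_i, e(a))], model baseline, shape (n,)
    The correction term uses embedding-level weights instead of IPS's
    action-level weights, which is the source of the variance reduction."""
    return np.mean(w_emb * (r - q_logged) + q_eval_mean)

# sanity check: with a perfect reward model, the correction vanishes and
# the estimate equals the model baseline regardless of the weights
n = 5
r = np.array([1., 0., 1., 1., 0.])
w = np.array([0.5, 2.0, 1.0, 1.5, 0.8])
print(mdr_estimate(r, w, q_logged=r, q_eval_mean=np.full(n, 0.6)))  # -> 0.6
```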

RCMHA: Relative Convolutional Multi-Head Attention for Natural Language Modelling

  • paper_url: http://arxiv.org/abs/2308.03429
  • repo_url: None
  • paper_authors: Herman Sugiharto, Aradea, Husni Mubarok
  • for: To improve the Multi-Head Attention (MHA) module so as to raise accuracy while reducing memory usage.
  • methods: The study combines relative positional encoding with a depth-wise separable convolutional layer, applying the depth-wise convolution to the input embeddings.
  • results: Experimental results show that the proposed RCMHA achieves higher accuracy (0.572) than competing attention modules (MHA, MDHA, RMHA) while keeping memory usage comparatively low (2.98 GB).
    Abstract The Attention module finds common usage in language modeling, presenting distinct challenges within the broader scope of Natural Language Processing. Multi-Head Attention (MHA) employs an absolute positional encoding, which imposes limitations on token length and entails substantial memory consumption during the processing of embedded inputs. The current remedy proposed by researchers involves the utilization of relative positional encoding, similar to the approach adopted in Transformer-XL or Relative Multi-Head Attention (RMHA), albeit the employed architecture consumes considerable memory resources. To address these challenges, this study endeavors to refine MHA, leveraging relative positional encoding in conjunction with the Depth-Wise Convolutional Layer architecture, which promises heightened accuracy coupled with minimized memory usage. The proposed RCMHA framework entails the modification of two integral components: firstly, the application of the Depth-Wise Convolutional Layer to the input embedding, encompassing Query, Key, and Value parameters; secondly, the incorporation of Relative Positional Encoding into the attention scoring phase, harmoniously integrated with Scaled Dot-Product Attention. Empirical experiments underscore the advantages of RCMHA, wherein it exhibits superior accuracy, boasting a score of 0.572 in comparison to alternative attention modules such as MHA, Multi-DConv-Head Attention (MDHA), and RMHA. Concerning memory utilization, RCMHA emerges as the most frugal, demonstrating an average consumption of 2.98 GB, compared with RMHA, which necessitates 3.5 GB.
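The two modifications the abstract names, a depth-wise convolution over the input embedding feeding Q/K/V and relative positional information in the attention scoring, can be sketched in a few lines of PyTorch. The module below is an illustrative reconstruction under those assumptions, not the authors' implementation; the learned per-offset bias is one common way to realize relative positional encoding.

```python
import torch, torch.nn as nn, torch.nn.functional as F

class RCMHASketch(nn.Module):
    """Sketch of Relative Convolutional Multi-Head Attention: depth-wise
    convolution on the embeddings that feed Q/K/V, plus a relative
    position bias added to the scaled dot-product scores."""
    def __init__(self, d_model=64, n_heads=4, max_len=512, kernel=3):
        super().__init__()
        self.h, self.dk = n_heads, d_model // n_heads
        # depth-wise conv: groups == channels
        self.dw = nn.Conv1d(d_model, d_model, kernel, padding=kernel // 2,
                            groups=d_model)
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # one learned bias per (head, relative offset)
        self.rel = nn.Parameter(torch.zeros(n_heads, 2 * max_len - 1))

    def forward(self, x):                       # x: (B, T, d_model)
        B, T, D = x.shape
        x = self.dw(x.transpose(1, 2)).transpose(1, 2)     # depth-wise conv
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(B, T, self.h, self.dk).transpose(1, 2)
                   for t in (q, k, v))          # (B, h, T, dk)
        scores = q @ k.transpose(-2, -1) / self.dk ** 0.5  # (B, h, T, T)
        idx = torch.arange(T)
        rel_idx = idx[:, None] - idx[None, :] + self.rel.shape[1] // 2
        scores = scores + self.rel[:, rel_idx]             # relative bias
        attn = F.softmax(scores, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(B, T, D)
        return self.out(y)

print(RCMHASketch()(torch.randn(2, 16, 64)).shape)  # torch.Size([2, 16, 64])
```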

TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents

  • paper_url: http://arxiv.org/abs/2308.03427
  • repo_url: None
  • paper_authors: Jingqing Ruan, Yihong Chen, Bin Zhang, Zhiwei Xu, Tianpeng Bao, Guoqing Du, Shiwei Shi, Hangyu Mao, Xingyu Zeng, Rui Zhao
  • for: To explore the application potential of Large Language Models (LLMs) across different domains and how they can be used to solve complex tasks.
  • methods: The paper proposes a structured framework for LLM-based AI agents and discusses the crucial capabilities needed to execute complex tasks; within this framework, two distinct types of agents (a one-step agent and a sequential agent) are designed to carry out the inference process.
  • results: The framework is instantiated with various LLMs, whose Task Planning and Tool Usage (TPTU) abilities are evaluated on typical tasks; LLMs show substantial potential for executing complex tasks, but challenges remain that call for further research.
    Abstract With recent advancements in natural language processing, Large Language Models (LLMs) have emerged as powerful tools for various real-world applications. Despite their prowess, the intrinsic generative abilities of LLMs may prove insufficient for handling complex tasks which necessitate a combination of task planning and the usage of external tools. In this paper, we first propose a structured framework tailored for LLM-based AI Agents and discuss the crucial capabilities necessary for tackling intricate problems. Within this framework, we design two distinct types of agents (i.e., one-step agent and sequential agent) to execute the inference process. Subsequently, we instantiate the framework using various LLMs and evaluate their Task Planning and Tool Usage (TPTU) abilities on typical tasks. By highlighting key findings and challenges, our goal is to provide a helpful resource for researchers and practitioners to leverage the power of LLMs in their AI applications. Our study emphasizes the substantial potential of these models, while also identifying areas that need more investigation and improvement.

Boosting Chinese ASR Error Correction with Dynamic Error Scaling Mechanism

  • paper_url: http://arxiv.org/abs/2308.03423
  • repo_url: None
  • paper_authors: Jiaxin Fan, Yong Zhang, Hanzhang Li, Jianzong Wang, Zhitao Li, Sheng Ouyang, Ning Cheng, Jing Xiao
  • for: To improve automatic speech recognition (ASR) error correction for Chinese.
  • methods: The paper proposes a novel dynamic error scaling mechanism that detects and corrects phonetically erroneous text generated by ASR output; the mechanism dynamically fuses word-level features with phonetic information, enriching the model with additional semantic data, and implements dedicated error reduction and amplification strategies to handle mismatched words caused by incorrect characters.
  • results: Experimental results show that the proposed method substantially improves ASR error correction, with strong results on established datasets.
    Abstract Chinese Automatic Speech Recognition (ASR) error correction presents significant challenges due to the Chinese language's unique features, including a large character set and borderless, morpheme-based structure. Current mainstream models often struggle with effectively utilizing word-level features and phonetic information. This paper introduces a novel approach that incorporates a dynamic error scaling mechanism to detect and correct phonetically erroneous text generated by ASR output. This mechanism operates by dynamically fusing word-level features and phonetic information, thereby enriching the model with additional semantic data. Furthermore, our method implements unique error reduction and amplification strategies to address the issues of matching wrong words caused by incorrect characters. Experimental results indicate substantial improvements in ASR error correction, demonstrating the effectiveness of our proposed method and yielding promising results on established datasets.

Prompt Guided Copy Mechanism for Conversational Question Answering

  • paper_url: http://arxiv.org/abs/2308.03422
  • repo_url: None
  • paper_authors: Yong Zhang, Zhitao Li, Jianzong Wang, Yiming Gao, Ning Cheng, Fengying Yu, Jing Xiao
  • for: To improve the fluency and appropriateness of answers to conversational questions by proposing a pluggable approach for extractive methods.
  • methods: The method uses prompts to link questions to answers and employs an attention-guided copy mechanism to verify that extracted answers are natural and appropriate, making necessary edits.
  • results: Experiments show that the approach effectively improves the naturalness and appropriateness of answers to conversational questions, achieving good results on the CoQA challenge.
    Abstract Conversational Question Answering (CQA) is a challenging task that aims to generate natural answers for conversational flow questions. In this paper, we propose a pluggable approach for extractive methods that introduces a novel prompt-guided copy mechanism to improve the fluency and appropriateness of the extracted answers. Our approach uses prompts to link questions to answers and employs attention to guide the copy mechanism to verify the naturalness of extracted answers, making necessary edits to ensure that the answers are fluent and appropriate. The three prompts, including a question-rationale relationship prompt, a question description prompt, and a conversation history prompt, enhance the copy mechanism's performance. Our experiments demonstrate that this approach effectively promotes the generation of natural answers and achieves good results in the CoQA challenge.

RecycleGPT: An Autoregressive Language Model with Recyclable Module

  • paper_url: http://arxiv.org/abs/2308.03421
  • repo_url: None
  • paper_authors: Yufan Jiang, Qiaozhi He, Xiaomin Zhuang, Zhihua Wu, Kunpeng Wang, Wenlai Zhao, Guangwen Yang
  • for: To improve the response speed of language models and reduce inference time.
  • methods: Exploiting the strong correlations between adjacent tokens, the model recycles pre-generated model states instead of running the whole model multiple times.
  • results: Experiments and analysis show that the approach significantly reduces inference latency, achieving up to 1.4x speedup while maintaining high performance.
    Abstract Existing large language models have to run K times to generate a sequence of K tokens. In this paper, we present RecycleGPT, a generative language model with fast decoding speed by recycling pre-generated model states without running the whole model in multiple steps. Our approach relies on the observation that adjacent tokens in a sequence usually have strong correlations and the next token in a sequence can be reasonably guessed or inferred based on the preceding ones. Experiments and analysis demonstrate the effectiveness of our approach in lowering inference latency, achieving up to 1.4x speedup while preserving high performance.
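The recycling idea can be made concrete with a toy decoding loop: a cheap recyclable module predicts an extra token from the full model's cached state, so the expensive network runs roughly once per two tokens. Everything below (ToyLM, ToyRecycleHead, the +1 token rule) is a hypothetical stand-in sketched from the abstract, not RecycleGPT's actual architecture.

```python
class ToyLM:
    """Stand-in 'full model': next token = (last + 1) % 10; its returned
    'state' is just the last token.  Purely illustrative."""
    def step(self, ids):
        return (ids[-1] + 1) % 10, ids[-1]

class ToyRecycleHead:
    """Stand-in recyclable module: guesses the token after next from the
    cached state and the current token, here by the same +1 rule."""
    def predict(self, state, tok):
        return (tok + 1) % 10

def recycled_decode(full_model, recycle_head, prompt_ids, max_new=8):
    """Alternate one expensive full-model step with one cheap recycled
    step, so the large network runs roughly once per two generated tokens
    (the source of the reported ~1.4x speedup)."""
    ids = list(prompt_ids)
    for _ in range(max_new // 2):
        tok, state = full_model.step(ids)              # expensive forward pass
        ids.append(tok)
        ids.append(recycle_head.predict(state, tok))   # cheap recycled step
    return ids

print(recycled_decode(ToyLM(), ToyRecycleHead(), [0]))
```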

End-to-End Evaluation for Low-Latency Simultaneous Speech Translation

  • paper_url: http://arxiv.org/abs/2308.03415
  • repo_url: None
  • paper_authors: Christian Huber, Tu Anh Dinh, Carlos Mullov, Ngoc Quan Pham, Thai Binh Nguyen, Fabian Retkowski, Stefan Constantin, Enes Yavuz Ugan, Danni Liu, Zhaolin Li, Sai Koneru, Jan Niehues, Alexander Waibel
  • for: To evaluate how different low-latency speech translation approaches perform in realistic scenarios.
  • methods: The paper proposes the first framework for performing and evaluating the various aspects of low-latency speech translation under realistic conditions, in an end-to-end fashion that covers audio segmentation and the run-time of the individual components.
  • results: Using this framework, the paper compares different low-latency speech translation approaches, including models that can revise their output, methods with fixed output, and state-of-the-art cascaded as well as end-to-end systems; the framework automatically evaluates both translation quality and latency, and provides a web interface that shows the low-latency model outputs to the user.
    Abstract The challenge of low-latency speech translation has recently drawn significant interest in the research community, as shown by several publications and shared tasks. It is therefore essential to evaluate these different approaches in realistic scenarios. However, currently only specific aspects of the systems are evaluated, and it is often not possible to compare different approaches. In this work, we propose the first framework to perform and evaluate the various aspects of low-latency speech translation under realistic conditions. The evaluation is carried out in an end-to-end fashion. This includes the segmentation of the audio as well as the run-time of the different components. Secondly, we compare different approaches to low-latency speech translation using this framework. We evaluate models with the option to revise the output as well as methods with fixed output. Furthermore, we directly compare state-of-the-art cascaded as well as end-to-end systems. Finally, the framework allows automatic evaluation of the translation quality as well as the latency, and also provides a web interface to show the low-latency model outputs to the user.

Counterfactual Monotonic Knowledge Tracing for Assessing Students’ Dynamic Mastery of Knowledge Concepts

  • paper_url: http://arxiv.org/abs/2308.03377
  • repo_url: None
  • paper_authors: Moyu Zhang, Xinning Zhu, Chunhong Zhang, Wenchen Qian, Feng Pan, Hui Zhao
  • for: Assessing students' dynamic mastery of knowledge concepts is the core of the Knowledge Tracing (KT) task, needed by both offline teaching and online educational applications.
  • methods: Because concept mastery is usually unlabeled, existing KT methods rely on the implicit paradigm that links historical practice to mastery of knowledge concepts and mastery to students' responses to practice.
  • results: The paper proposes a principled approach called Counterfactual Monotonic Knowledge Tracing (CMKT), which builds on the implicit paradigm described above and uses a counterfactual assumption to constrain the evolution of students' mastery of knowledge concepts.
    Abstract As the core of the Knowledge Tracing (KT) task, assessing students' dynamic mastery of knowledge concepts is crucial for both offline teaching and online educational applications. Since students' mastery of knowledge concepts is often unlabeled, existing KT methods rely on the implicit paradigm that links historical practice to mastery of knowledge concepts and mastery to students' responses to practice, in order to address the challenge of unlabeled concept mastery. However, purely predicting student responses without imposing specific constraints on hidden concept mastery values does not guarantee the accuracy of these intermediate values as concept mastery values. To address this issue, we propose a principled approach called Counterfactual Monotonic Knowledge Tracing (CMKT), which builds on the implicit paradigm described above by using a counterfactual assumption to constrain the evolution of students' mastery of knowledge concepts.

Robust Ordinal Regression for Subsets Comparisons with Interactions

  • paper_url: http://arxiv.org/abs/2308.03376
  • repo_url: None
  • paper_authors: Hugo Gilbert, Mohamed Ouaguenouni, Meltem Ozturk, Olivier Spanjaard
  • for: To develop a robust ordinal method for learning a decision maker's preferences between subsets.
  • methods: The method builds on the decision model of Fishburn and LaValle (1996), learning its parameters while accounting for possible interactions between elements. No prediction is made when the available preference data do not support a reliable one; a predicted preference is considered reliable when all of the simplest models explaining the data (Occam's razor) agree on it.
  • results: Following the robust ordinal regression methodology, predictions are based on an uncertainty set over the possible model parameter values; a robust ordinal dominance relation between subsets is defined, together with a procedure to determine whether it holds. Numerical tests on synthetic and real-world data assess the richness and reliability of the preference predictions.
    Abstract This paper is dedicated to a robust ordinal method for learning the preferences of a decision maker between subsets. The decision model, derived from Fishburn and LaValle (1996) and whose parameters we learn, is general enough to be compatible with any strict weak order on subsets, thanks to the consideration of possible interactions between elements. Moreover, we accept not to predict some preferences if the available preference data are not compatible with a reliable prediction. A predicted preference is considered reliable if all the simplest models (Occam's razor) explaining the preference data agree on it. Following the robust ordinal regression methodology, our predictions are based on an uncertainty set encompassing the possible values of the model parameters. We define a robust ordinal dominance relation between subsets and we design a procedure to determine whether this dominance relation holds. Numerical tests are provided on synthetic and real-world data to evaluate the richness and reliability of the preference predictions made.

A reading survey on adversarial machine learning: Adversarial attacks and their understanding

  • paper_url: http://arxiv.org/abs/2308.03363
  • repo_url: None
  • paper_authors: Shashank Kotyan
  • for: This paper provides a survey of existing adversarial attacks and their understanding based on different perspectives, with the goal of classifying adversarial attacks and understanding their vulnerabilities in a systematic order.
  • methods: The paper uses a comprehensive review of existing literature on adversarial attacks and defenses to provide a detailed understanding of the different types of attacks and their characteristics.
  • results: The paper concludes with a discussion on the future research directions in the field of adversarial machine learning, highlighting the limitations of existing defenses and the need for further research to mitigate the effects of adversarial attacks.
    Abstract Deep Learning has empowered us to train neural networks for complex data with high performance. However, with the growing research, several vulnerabilities in neural networks have been exposed. A particular branch of research, Adversarial Machine Learning, exploits and understands some of the vulnerabilities that cause the neural networks to misclassify for near original input. A class of algorithms called adversarial attacks is proposed to make the neural networks misclassify for various tasks in different domains. With the extensive and growing research in adversarial attacks, it is crucial to understand the classification of adversarial attacks. This will help us understand the vulnerabilities in a systematic order and help us to mitigate the effects of adversarial attacks. This article provides a survey of existing adversarial attacks and their understanding based on different perspectives. We also provide a brief overview of existing adversarial defences and their limitations in mitigating the effect of adversarial attacks. Further, we conclude with a discussion on the future research directions in the field of adversarial machine learning.
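Since the survey is organized around classes of adversarial attacks, one concrete instance helps anchor the discussion. The snippet below is the classic Fast Gradient Sign Method (FGSM), the canonical white-box gradient attack covered by such surveys; it is standard background in PyTorch, not code from the paper.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    """Fast Gradient Sign Method: take one gradient step on the *input*
    in the direction that increases the loss, then clip to a valid image
    range.  Works with any differentiable classifier `model`."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()   # move up the loss surface
    return x_adv.clamp(0, 1).detach()
```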

Discrete Message via Online Clustering Labels in Decentralized POMDP

  • paper_url: http://arxiv.org/abs/2308.03358
  • repo_url: None
  • paper_authors: Jingdi Chen, Tian Lan
  • for: To address the problem of sharing information in cooperative multi-agent reinforcement learning tasks.
  • methods: The approach recasts multi-agent communication as an online clustering problem over each agent's local observations, with messages as cluster labels and an upper bound on the return gap as the clustering loss.
  • results: The paper proposes a surprisingly simple design of message generation functions, integrated with reinforcement learning through a Regularized Information Maximization loss function; the resulting discrete communication significantly outperforms state-of-the-art multi-agent communication baselines and achieves nearly-optimal returns with few-bit, naturally interpretable messages.
    Abstract Communication is crucial for solving cooperative Multi-Agent Reinforcement Learning tasks in Partially-Observable Markov Decision Processes. Existing works often rely on black-box methods to encode local information/features into messages shared with other agents. However, such black-box approaches are unable to provide any quantitative guarantees on the expected return and often lead to the generation of continuous messages with high communication overhead and poor interpretability. In this paper, we establish an upper bound on the return gap between an ideal policy with full observability and an optimal partially-observable policy with discrete communication. This result enables us to recast multi-agent communication into a novel online clustering problem over the local observations at each agent, with messages as cluster labels and the upper bound on the return gap as clustering loss. By minimizing the upper bound, we propose a surprisingly simple design of message generation functions in multi-agent communication and integrate it with reinforcement learning using a Regularized Information Maximization loss function. Evaluations show that the proposed discrete communication significantly outperforms state-of-the-art multi-agent communication baselines and can achieve nearly-optimal returns with few-bit messages that are naturally interpretable.
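The recipe the abstract describes, messages as online clustering labels over local observations, can be sketched with an off-the-shelf online clusterer. MiniBatchKMeans stands in for the paper's learned message-generation function; the class, its warm-up, and the dimensions are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

class DiscreteMessenger:
    """Send the cluster label of the local observation as the message.
    With 8 clusters the label fits in 3 bits, versus a full continuous
    observation vector."""
    def __init__(self, n_clusters=8, dim=16, seed=0):
        self.km = MiniBatchKMeans(n_clusters=n_clusters, random_state=seed)
        warmup = np.random.default_rng(seed).standard_normal((256, dim))
        self.km.partial_fit(warmup)            # initialise the centroids

    def message(self, obs):
        self.km.partial_fit(obs[None, :])      # online clustering update
        return int(self.km.predict(obs[None, :])[0])   # few-bit message

agent = DiscreteMessenger()
obs = np.random.default_rng(1).standard_normal(16)
print(agent.message(obs))
```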

SciGraphQA: A Large-Scale Synthetic Multi-Turn Question-Answering Dataset for Scientific Graphs

  • paper_url: http://arxiv.org/abs/2308.03349
  • repo_url: https://github.com/findalexli/SciGraphQA
  • paper_authors: Shengzhi Li, Nima Tajbakhsh
  • for: This paper presents SciGraphQA, a new synthetic multi-turn question-answer dataset related to academic graphs.
  • methods: The dataset is built by using Palm-2 to generate open-vocabulary multi-turn question-answering dialogues about the graphs, with an average of 2.23 question-answer turns per graph. The paper's context (title, abstract, the paragraph mentioning the graph, and rich text contextual data from the graph) is provided as input to GPT-4 to assess the matching quality of the question-answer turns.
  • results: The question-answer turns receive an average rating of 8.7/10 given the paper's context on a test set. The most popular MLLM models (LLaVa, mPLUGowl, BLIP-2, and openFlamingo) are evaluated on the dataset, with LLaVA-13B the most performant at a CIDEr score of 0.08. Enriching LLaVA's question prompts with serialized data tables extracted from the graphs using the DePlot model boosts its 0-shot CIDEr to 0.15, and fine-tuning LLaVA on the dataset yields a substantially higher CIDEr score of 0.26.
    Abstract In this work, we present SciGraphQA, a synthetic multi-turn question-answer dataset related to academic graphs. SciGraphQA is 13 times larger than ChartVQA, the previously largest chart-visual question-answering dataset. It is also the largest open-sourced chart VQA dataset with non-synthetic charts. To build our dataset, we selected 290,000 Computer Science or Machine Learning ArXiv papers published between 2010 and 2020, and then used Palm-2 to generate 295K samples of open-vocabulary multi-turn question-answering dialogues about the graphs. As context, we provided the text-only Palm-2 with paper title, abstract, paragraph mentioning the graph, and rich text contextual data from the graph itself, obtaining dialogues with an average 2.23 question-answer turns for each graph. We asked GPT-4 to assess the matching quality of our question-answer turns given the paper's context, obtaining an average rating of 8.7/10 on our 3K test set. We evaluated the 0-shot capability of the most popular MLLM models such as LLaVa, mPLUGowl, BLIP-2, and openFlamingo's on our dataset, finding LLaVA-13B being the most performant with a CIDEr score of 0.08. We further enriched the question prompts for LLAVA by including the serialized data tables extracted from the graphs using the DePlot model, boosting LLaVA's 0-shot CIDEr to 0.15. To verify the validity of our dataset, we also fine-tuned LLaVa using our dataset, reaching a substantially higher CIDEr score of 0.26. We anticipate further accuracy improvement by including segmentation mask tokens and leveraging larger LLM backbones coupled with emergent prompting techniques. Our code and data are open-sourced.

Solving Falkner-Skan type equations via Legendre and Chebyshev Neural Blocks

  • paper_url: http://arxiv.org/abs/2308.03337
  • repo_url: None
  • paper_authors: Alireza Afzal Aghaei, Kourosh Parand, Ali Nikkhah, Shakila Jaberi
  • for: Solving the nonlinear Falkner-Skan equation.
  • methods: Legendre and Chebyshev neural blocks, which exploit orthogonal polynomials within neural networks to increase their approximation capability.
  • results: Simulations of various configurations of the Falkner-Skan equation show that the proposed method reduces computational complexity and improves efficiency.
    Abstract In this paper, a new deep-learning architecture for solving the non-linear Falkner-Skan equation is proposed. Using Legendre and Chebyshev neural blocks, this approach shows how orthogonal polynomials can be used in neural networks to increase the approximation capability of artificial neural networks. In addition, utilizing the mathematical properties of these functions, we overcome the computational complexity of the backpropagation algorithm by using the operational matrices of the derivative. The efficiency of the proposed method is carried out by simulating various configurations of the Falkner-Skan equation.
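The role of the orthogonal-polynomial blocks is easy to see in isolation: the input is expanded in a Chebyshev basis via the three-term recurrence before entering ordinary dense layers, and because the basis derivatives admit operational matrices, gradients need not be backpropagated through the recurrence. The feature map below is a minimal sketch of such a block, not the paper's full architecture.

```python
import numpy as np

def chebyshev_features(x, degree=5):
    """Map x in [-1, 1] to [T_0(x), ..., T_degree(x)] using the
    three-term recurrence T_{k+1} = 2x T_k - T_{k-1}."""
    feats = [np.ones_like(x), x]
    for _ in range(2, degree + 1):
        feats.append(2 * x * feats[-1] - feats[-2])
    return np.stack(feats[: degree + 1], axis=-1)

x = np.linspace(-1, 1, 5)
print(chebyshev_features(x, degree=3))   # shape (5, 4): T_0..T_3 per point
```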

Heterogeneous Knowledge Fusion: A Novel Approach for Personalized Recommendation via LLM

  • paper_url: http://arxiv.org/abs/2308.03333
  • repo_url: None
  • paper_authors: Bin Yin, Junjie Xie, Yu Qin, Zixiang Ding, Zhichao Feng, Xiang Li, Wei Lin
  • for: This work proposes a personalized recommendation approach based on Large Language Models (LLMs) that extracts and fuses heterogeneous knowledge from users' heterogeneous behavior information, and instruction-tunes the LLM to improve recommendation performance.
  • methods: The study draws on various kinds of heterogeneous behavior information, such as users' search history, purchase records, and social media activity, and uses an LLM for learning and recommendation.
  • results: Experimental results show that the method effectively fuses users' heterogeneous behavior information and yields clear improvements in recommendation performance.
    Abstract The analysis and mining of user heterogeneous behavior are of paramount importance in recommendation systems. However, the conventional approach of incorporating various types of heterogeneous behavior into recommendation models leads to feature sparsity and knowledge fragmentation issues. To address this challenge, we propose a novel approach for personalized recommendation via Large Language Model (LLM), by extracting and fusing heterogeneous knowledge from user heterogeneous behavior information. In addition, by combining heterogeneous knowledge and recommendation tasks, instruction tuning is performed on LLM for personalized recommendations. The experimental results demonstrate that our method can effectively integrate user heterogeneous behavior and significantly improve recommendation performance.

Improving Deep Attractor Network by BGRU and GMM for Speech Separation

  • paper_url: http://arxiv.org/abs/2308.03332
  • repo_url: None
  • paper_authors: Rawad Melhem, Assef Jafar, Riad Hamadeh
  • for: To propose a simplified DANet model that improves speech separation performance and learning speed.
  • methods: A Bidirectional Gated Recurrent Unit (BGRU) network is used instead of BLSTM, and a Gaussian Mixture Model (GMM) replaces k-means as the clustering algorithm to reduce model complexity.
  • results: Evaluated on two-speaker mixture datasets prepared from the TIMIT corpus, the proposed model achieves SDR and PESQ scores of 12.3 dB and 2.94, better than the original DANet model, while reducing the number of parameters by 20.7% and the training time by 17.9%; on mixed Arabic speech signals the model also achieves better results than on English.
    Abstract Deep Attractor Network (DANet) is the state-of-the-art technique in the speech separation field, which uses Bidirectional Long Short-Term Memory (BLSTM), but the complexity of the DANet model is very high. In this paper, a simplified and powerful DANet model is proposed using a Bidirectional Gated Recurrent Unit (BGRU) network instead of BLSTM. A Gaussian Mixture Model (GMM) was applied in DANet as the clustering algorithm instead of k-means, to reduce the complexity and increase the learning speed and accuracy. The metrics used in this paper are Signal to Distortion Ratio (SDR), Signal to Interference Ratio (SIR), Signal to Artifact Ratio (SAR), and the Perceptual Evaluation of Speech Quality (PESQ) score. Two-speaker mixture datasets from the TIMIT corpus were prepared to evaluate the proposed model, and the system achieved 12.3 dB and 2.94 for the SDR and PESQ scores respectively, which were better than the original DANet model. Further improvements were reductions of 20.7% in the number of parameters and 17.9% in training time. The model was applied to mixed Arabic speech signals and the results were better than those for English.
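In attractor-style separation, every time-frequency bin receives an embedding and the bins are grouped per speaker; the paper's change is to do this grouping with a GMM rather than k-means. The sketch below shows that step in isolation with scikit-learn, with soft masks given by the GMM posteriors; the embedding network itself is assumed to exist upstream.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_masks(embeddings, n_speakers=2, seed=0):
    """Cluster per-bin embeddings with a GMM (in place of k-means) and
    return soft time-frequency masks, one per speaker.
      embeddings : (n_bins, emb_dim), one row per T-F bin
    Returns an (n_bins, n_speakers) array of posterior responsibilities."""
    gmm = GaussianMixture(n_components=n_speakers, covariance_type="diag",
                          random_state=seed)
    gmm.fit(embeddings)
    return gmm.predict_proba(embeddings)   # soft masks sum to 1 per bin

emb = np.random.default_rng(0).standard_normal((1000, 20))
masks = gmm_masks(emb)
print(masks.shape, masks.sum(axis=1)[:3])  # (1000, 2) [1. 1. 1.]
```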

Expediting Neural Network Verification via Network Reduction

  • paper_url: http://arxiv.org/abs/2308.03330
  • repo_url: None
  • paper_authors: Yuyi Zhong, Ruiwei Wang, Siau-Cheng Khoo
  • for: Verifying the safety properties of deep neural networks to ensure that they function correctly in critical applications.
  • methods: Although many verification methods have been proposed, well-known verification tools still struggle with complicated network architectures and large networks; this work proposes a network reduction technique as a pre-processing step that eliminates stable ReLU neurons and transforms the network into a sequential neural network of ReLU and affine layers, which most verification tools can handle.
  • results: Experiments on a large set of benchmarks show that the proposed technique significantly reduces neural networks and speeds up existing verification tools; the reduction also improves the availability of existing verification tools on many networks by turning them into sequential neural networks.
    Abstract A wide range of verification methods have been proposed to verify the safety properties of deep neural networks ensuring that the networks function correctly in critical applications. However, many well-known verification tools still struggle with complicated network architectures and large network sizes. In this work, we propose a network reduction technique as a pre-processing method prior to verification. The proposed method reduces neural networks via eliminating stable ReLU neurons, and transforming them into a sequential neural network consisting of ReLU and Affine layers which can be handled by the most verification tools. We instantiate the reduction technique on the state-of-the-art complete and incomplete verification tools, including alpha-beta-crown, VeriNet and PRIMA. Our experiments on a large set of benchmarks indicate that the proposed technique can significantly reduce neural networks and speed up existing verification tools. Furthermore, the experiment results also show that network reduction can improve the availability of existing verification tools on many networks by reducing them into sequential neural networks.
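The elimination of stable ReLU neurons can be sketched with interval bound propagation: a unit whose pre-activation lower bound is non-negative acts as the identity, one whose upper bound is non-positive outputs zero, and either way the surrounding affine layers can be merged. The code below is a simplified illustration on a single Linear-ReLU-Linear block, not the tool's implementation; a real reducer would also shrink the layer around the remaining unstable units rather than give up.

```python
import numpy as np

def interval_bounds(W, b, l, u):
    """Propagate box bounds [l, u] through an affine layer y = Wx + b."""
    Wp, Wn = np.maximum(W, 0), np.minimum(W, 0)
    return Wp @ l + Wn @ u + b, Wp @ u + Wn @ l + b

def reduce_if_stable(W1, b1, W2, b2, l, u):
    """If every ReLU unit is provably stable on the input box [l, u],
    replace Linear->ReLU->Linear with one equivalent affine layer
    (identity for always-active units, zero for always-inactive ones).
    Otherwise report which units remain unstable."""
    lo, hi = interval_bounds(W1, b1, l, u)
    active, inactive = lo >= 0, hi <= 0
    unstable = ~(active | inactive)
    if unstable.any():
        return None, None, unstable
    D = np.diag(active.astype(float))           # 1 if active, 0 if inactive
    return W2 @ D @ W1, W2 @ (D @ b1) + b2, unstable

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((8, 4)), rng.standard_normal(8)
W2, b2 = rng.standard_normal((3, 8)), rng.standard_normal(3)
l, u = -0.1 * np.ones(4), 0.1 * np.ones(4)
Wm, bm, uns = reduce_if_stable(W1, b1, W2, b2, l, u)
print("unstable units kept for the verifier:", int(uns.sum()))
```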

Generative AI trial for nonviolent communication mediation

  • paper_url: http://arxiv.org/abs/2308.03326
  • repo_url: None
  • paper_authors: Takeshi Kato
  • for: Aiming at a mixbiotic society that combines freedom and solidarity among people with diverse values, the study uses nonviolent communication (NVC) to help people express feelings and needs in situations of social division and conflict, working toward broad cooperation and harmony.
  • methods: Generative AI was trialed in place of a traditional certified trainer: ChatGPT was used to test the possibility of mediating (modifying) input sentences across the four NVC processes of observation, feelings, needs, and requests.
  • results: The results indicate potential for applying generative AI, though not yet at a practical level; suggested improvements include adding model responses, relearning revised responses, specifying appropriate terminology for each process, and re-asking for required information.
    Abstract Aiming for a mixbiotic society that combines freedom and solidarity among people with diverse values, I focused on nonviolent communication (NVC) that enables compassionate giving in various situations of social division and conflict, and tried a generative AI for it. Specifically, ChatGPT was used in place of the traditional certified trainer to test the possibility of mediating (modifying) input sentences in four processes: observation, feelings, needs, and requests. The results indicate that there is potential for the application of generative AI, although not yet at a practical level. Suggested improvement guidelines included adding model responses, relearning revised responses, specifying appropriate terminology for each process, and re-asking for required information. The use of generative AI will be useful initially to assist certified trainers, to prepare for and review events and workshops, and in the future to support consensus building and cooperative behavior in digital democracy, platform cooperatives, and cyber-human social co-operating systems. It is hoped that the widespread use of NVC mediation using generative AI will lead to the early realization of a mixbiotic society.

Part-Aware Transformer for Generalizable Person Re-identification

  • paper_url: http://arxiv.org/abs/2308.03322
  • repo_url: https://github.com/liyuke65535/part-aware-transformer
  • paper_authors: Hao Ni, Yuke Li, Heng Tao Shen, Jingkuan Song
  • for: To improve the generalization ability of models for domain generalization person re-identification (DG-ReID), in particular when training and testing occur on different domains.
  • methods: The paper proposes a pure transformer model, the Part-aware Transformer, which learns local visual information shared across identities through an auxiliary task called Cross-ID Similarity Learning (CSL), thereby improving the model's generalization ability.
  • results: The method achieves state-of-the-art performance under most DG-ReID settings; under the Market$\to$Duke setting, it exceeds the state of the art by 10.9% and 12.8% in Rank1 and mAP, respectively.
    Abstract Domain generalization person re-identification (DG-ReID) aims to train a model on source domains and generalize well on unseen domains. Vision Transformer usually yields better generalization ability than common CNN networks under distribution shifts. However, Transformer-based ReID models inevitably over-fit to domain-specific biases due to the supervised learning strategy on the source domain. We observe that while the global images of different IDs should have different features, their similar local parts (e.g., black backpack) are not bounded by this constraint. Motivated by this, we propose a pure Transformer model (termed Part-aware Transformer) for DG-ReID by designing a proxy task, named Cross-ID Similarity Learning (CSL), to mine local visual information shared by different IDs. This proxy task allows the model to learn generic features because it only cares about the visual similarity of the parts regardless of the ID labels, thus alleviating the side effect of domain-specific biases. Based on the local similarity obtained in CSL, a Part-guided Self-Distillation (PSD) is proposed to further improve the generalization of global features. Our method achieves state-of-the-art performance under most DG ReID settings. Under the Market$\to$Duke setting, our method exceeds state-of-the-art by 10.9% and 12.8% in Rank1 and mAP, respectively. The code is available at https://github.com/liyuke65535/Part-Aware-Transformer.

Binary Federated Learning with Client-Level Differential Privacy

  • paper_url: http://arxiv.org/abs/2308.03320
  • repo_url: None
  • paper_authors: Lumin Liu, Jun Zhang, Shenghui Song, Khaled B. Letaief
  • for: To improve both privacy protection and performance in federated learning systems.
  • methods: Binary Neural Networks (BNNs) are adopted and discrete noise is added to achieve client-level differential privacy; the discrete noise is tuned to balance privacy protection against performance.
  • results: Experimental results on the MNIST and Fashion-MNIST datasets show that the proposed training algorithm achieves client-level privacy protection while enjoying the low communication overhead of binary model updates.
    Abstract Federated learning (FL) is a privacy-preserving collaborative learning framework, and differential privacy can be applied to further enhance its privacy protection. Existing FL systems typically adopt Federated Average (FedAvg) as the training algorithm and implement differential privacy with a Gaussian mechanism. However, the inherent privacy-utility trade-off in these systems severely degrades the training performance if a tight privacy budget is enforced. Besides, the Gaussian mechanism requires model weights to be of high-precision. To improve communication efficiency and achieve a better privacy-utility trade-off, we propose a communication-efficient FL training algorithm with differential privacy guarantee. Specifically, we propose to adopt binary neural networks (BNNs) and introduce discrete noise in the FL setting. Binary model parameters are uploaded for higher communication efficiency and discrete noise is added to achieve the client-level differential privacy protection. The achieved performance guarantee is rigorously proved, and it is shown to depend on the level of discrete noise. Experimental results based on MNIST and Fashion-MNIST datasets will demonstrate that the proposed training algorithm achieves client-level privacy protection with performance gain while enjoying the benefits of low communication overhead from binary model updates.
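One standard discrete mechanism conveys the flavor of combining binary updates with client-level noise: binarize the update to signs and apply randomized response, flipping each sign with probability p before upload, which the server can debias in aggregate. This is an illustrative stand-in only; the paper's actual discrete noise design and its differential privacy accounting are its own.

```python
import numpy as np

def private_binary_update(delta, p_flip=0.25, seed=0):
    """Binarize a client's model update to +/-1 and apply randomized
    response: flip each sign with probability p_flip before upload.
    For this simple mechanism, the per-bit local privacy loss is
    eps = ln((1 - p_flip) / p_flip); the paper's mechanism differs."""
    rng = np.random.default_rng(seed)
    signs = np.sign(delta).astype(np.int8)
    signs[signs == 0] = 1
    flips = rng.random(signs.shape) < p_flip
    return np.where(flips, -signs, signs)

def debias(mean_of_noisy_signs, p_flip=0.25):
    """Server-side unbiased estimate of the true mean sign."""
    return mean_of_noisy_signs / (1 - 2 * p_flip)

noisy = private_binary_update(np.random.default_rng(1).standard_normal(1000))
print(debias(noisy.mean()))
```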

When GPT Meets Program Analysis: Towards Intelligent Detection of Smart Contract Logic Vulnerabilities in GPTScan

  • paper_url: http://arxiv.org/abs/2308.03314
  • repo_url: None
  • paper_authors: Yuqiang Sun, Daoyuan Wu, Yue Xue, Han Liu, Haijun Wang, Zhengzi Xu, Xiaofei Xie, Yang Liu
  • for: This paper targets the detection of logic vulnerabilities in smart contracts.
  • methods: The paper combines Generative Pre-trained Transformers (GPT) with static analysis to detect smart contract logic vulnerabilities.
  • results: By using GPT to understand code, the approach achieves high precision (over 90%) in smart contract logic vulnerability detection and newly uncovers 9 vulnerabilities missed by human auditors.
    Abstract Smart contracts are prone to various vulnerabilities, leading to substantial financial losses over time. Current analysis tools mainly target vulnerabilities with fixed control or dataflow patterns, such as re-entrancy and integer overflow. However, a recent study on Web3 security bugs revealed that about 80% of these bugs cannot be audited by existing tools due to the lack of domain-specific property description and checking. Given recent advances in Generative Pretraining Transformer (GPT), it is worth exploring how GPT could aid in detecting logic vulnerabilities in smart contracts. In this paper, we propose GPTScan, the first tool combining GPT with static analysis for smart contract logic vulnerability detection. Instead of relying solely on GPT to identify vulnerabilities, which can lead to high false positives and is limited by GPT's pre-trained knowledge, we utilize GPT as a versatile code understanding tool. By breaking down each logic vulnerability type into scenarios and properties, GPTScan matches candidate vulnerabilities with GPT. To enhance accuracy, GPTScan further instructs GPT to intelligently recognize key variables and statements, which are then validated by static confirmation. Evaluation on diverse datasets with around 400 contract projects and 3K Solidity files shows that GPTScan achieves high precision (over 90%) for token contracts and acceptable precision (57.14%) for large projects like Web3Bugs. It effectively detects groundtruth logic vulnerabilities with a recall of over 80%, including 9 new vulnerabilities missed by human auditors. GPTScan is fast and cost-effective, taking an average of 14.39 seconds and 0.01 USD to scan per thousand lines of Solidity code. Moreover, static confirmation helps GPTScan reduce two-thirds of false positives.

CrossTalk: Intelligent Substrates for Language-Oriented Interaction in Video-Based Communication and Collaboration

  • paper_url: http://arxiv.org/abs/2308.03311
  • repo_url: None
  • paper_authors: Haijun Xia, Tony Wang, Aditya Gunturu, Peiling Jiang, William Duan, Xiaoshuo Yao
  • for: This paper proposes an intelligence-augmented videoconferencing system to better support communication and collaboration.
  • methods: The paper presents three key design concepts: the panel substrate, language-based intent recognition, and lightweight interaction techniques.
  • results: The authors developed CrossTalk, a videoconferencing system that instantiates these concepts and was found to enable a more fluid and flexible communication and collaboration experience.
    Abstract Despite the advances and ubiquity of digital communication media such as videoconferencing and virtual reality, they remain oblivious to the rich intentions expressed by users. Beyond transmitting audio, videos, and messages, we envision digital communication media as proactive facilitators that can provide unobtrusive assistance to enhance communication and collaboration. Informed by the results of a formative study, we propose three key design concepts to explore the systematic integration of intelligence into communication and collaboration, including the panel substrate, language-based intent recognition, and lightweight interaction techniques. We developed CrossTalk, a videoconferencing system that instantiates these concepts, which was found to enable a more fluid and flexible communication and collaboration experience.

What has ChatGPT read? The origins of archaeological citations used by a generative artificial intelligence application

  • paper_url: http://arxiv.org/abs/2308.03301
  • repo_url: None
  • paper_authors: Dirk HR Spennemann
  • for: The paper tests what archaeological literature was included in ChatGPT's training phase.
  • methods: The paper uses cloze analysis to infer which sources the generative AI model has memorized.
  • results: A large percentage of the references provided by ChatGPT proved to be fictitious, while all genuine references had also been cited on Wikipedia pages, strongly suggesting that the source base for at least some of the data is found in those pages.
    Abstract The public release of ChatGPT has resulted in considerable publicity and has led to wide-spread discussion of the usefulness and capabilities of generative AI language models. Its ability to extract and summarise data from textual sources and present them as human-like contextual responses makes it an eminently suitable tool to answer questions users might ask. This paper tested what archaeological literature appears to have been included in ChatGPT's training phase. While ChatGPT offered seemingly pertinent references, a large percentage proved to be fictitious. Using cloze analysis to make inferences on the sources 'memorised' by a generative AI model, this paper was unable to prove that ChatGPT had access to the full texts of the genuine references. It can be shown that all references provided by ChatGPT that were found to be genuine have also been cited on Wikipedia pages. This strongly indicates that the source base for at least some of the data is found in those pages. The implications of this in relation to data quality are discussed.

DOMINO: Domain-invariant Hyperdimensional Classification for Multi-Sensor Time Series Data

  • paper_url: http://arxiv.org/abs/2308.03295
  • repo_url: None
  • paper_authors: Junyao Wang, Luke Chen, Mohammad Abdullah Al Faruque
  • for: To address the distribution shift problem in data-driven machine learning on edge devices and improve the classification of multi-sensor time-series data.
  • methods: The paper proposes DOMINO, a hyperdimensional computing (HDC) learning framework that exploits efficient, parallel matrix operations in high-dimensional space to dynamically identify and filter out domain-variant dimensions.
  • results: Evaluated on a wide range of multi-sensor time-series classification tasks, DOMINO achieves on average 2.04% higher accuracy than state-of-the-art (SOTA) DNN-based domain generalization techniques, with 16.34x faster training and 2.89x faster inference; it performs notably better when learning from partially labeled and highly imbalanced data, offering 10.93x higher robustness against hardware noise than SOTA DNNs.
    Abstract With the rapid evolution of the Internet of Things, many real-world applications utilize heterogeneously connected sensors to capture time-series information. Edge-based machine learning (ML) methodologies are often employed to analyze locally collected data. However, a fundamental issue across data-driven ML approaches is distribution shift. It occurs when a model is deployed on a data distribution different from what it was trained on, and can substantially degrade model performance. Additionally, increasingly sophisticated deep neural networks (DNNs) have been proposed to capture spatial and temporal dependencies in multi-sensor time series data, requiring intensive computational resources beyond the capacity of today's edge devices. While brain-inspired hyperdimensional computing (HDC) has been introduced as a lightweight solution for edge-based learning, existing HDCs are also vulnerable to the distribution shift challenge. In this paper, we propose DOMINO, a novel HDC learning framework addressing the distribution shift problem in noisy multi-sensor time-series data. DOMINO leverages efficient and parallel matrix operations on high-dimensional space to dynamically identify and filter out domain-variant dimensions. Our evaluation on a wide range of multi-sensor time series classification tasks shows that DOMINO achieves on average 2.04% higher accuracy than state-of-the-art (SOTA) DNN-based domain generalization techniques, and delivers 16.34x faster training and 2.89x faster inference. More importantly, DOMINO performs notably better when learning from partially labeled and highly imbalanced data, providing 10.93x higher robustness against hardware noises than SOTA DNNs.
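As a rough illustration of filtering domain-variant dimensions in a hyperdimensional space, the following numpy sketch bundles class prototypes from two domains and masks the dimensions on which they disagree. The random-projection encoder and the disagreement heuristic are our assumptions, not DOMINO's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4096  # dimensionality of the hyperdimensional space

def encode(x: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Encode a feature vector into a bipolar hypervector via random projection."""
    return np.sign(proj @ x)

def similarity(h1: np.ndarray, h2: np.ndarray, mask: np.ndarray) -> float:
    """Normalized dot product restricted to domain-invariant dimensions."""
    return float(h1[mask] @ h2[mask]) / mask.sum()

# Toy setup: the same class observed in two domains with a covariate shift.
proj = rng.standard_normal((D, 16))
domain_a = rng.standard_normal((50, 16))
domain_b = domain_a + rng.normal(2.0, 0.5, size=(50, 16))

proto_a = np.sign(sum(encode(x, proj) for x in domain_a))  # bundled prototype, domain A
proto_b = np.sign(sum(encode(x, proj) for x in domain_b))  # same class, domain B

# Dimensions where the prototypes disagree are treated as domain-variant and dropped.
mask = proto_a == proto_b
print(f"kept {mask.mean():.0%} of {D} dimensions")
print("similarity on kept dims:", similarity(encode(domain_b[0], proj), proto_a, mask))
```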

SynJax: Structured Probability Distributions for JAX

  • paper_url: http://arxiv.org/abs/2308.03291
  • repo_url: https://github.com/deepmind/synjax
  • paper_authors: Miloš Stanojević, Laurent Sartran
  • for: The paper aims to provide efficient vectorized implementations of inference algorithms for structured distributions, so that large-scale deep learning models can explicitly represent structure in data.
  • methods: It introduces the SynJax library, which supplies efficient vectorized inference algorithms for structured distributions covering alignment, tagging, segmentation, constituency trees, and spanning trees.
  • results: With SynJax, large-scale differentiable models that explicitly model structure in the data can be built and run efficiently on modern hardware accelerators.
    Abstract The development of deep learning software libraries enabled significant progress in the field by allowing users to focus on modeling, while letting the library take care of the tedious and time-consuming task of optimizing execution for modern hardware accelerators. However, this has benefited only particular types of deep learning models, such as Transformers, whose primitives map easily to the vectorized computation. The models that explicitly account for structured objects, such as trees and segmentations, did not benefit equally because they require custom algorithms that are difficult to implement in a vectorized form. SynJax directly addresses this problem by providing an efficient vectorized implementation of inference algorithms for structured distributions covering alignment, tagging, segmentation, constituency trees and spanning trees. With SynJax we can build large-scale differentiable models that explicitly model structure in the data. The code is available at https://github.com/deepmind/synjax.
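SynJax's own API is documented in the linked repository; to illustrate the kind of vectorized structured inference it packages, here is a hand-rolled JAX sketch of the forward algorithm for a linear-chain CRF, where transition marginals fall out of automatic differentiation of the log-partition function. This is not SynJax code, only the flavor of computation it provides.

```python
import jax
import jax.numpy as jnp

def crf_log_partition(log_potentials: jnp.ndarray) -> jnp.ndarray:
    """Forward algorithm for a linear-chain CRF.

    log_potentials: [T-1, K, K] scores for each transition (t, prev_tag, next_tag).
    Returns log Z, the normaliser over all K^T tag sequences.
    """
    def step(alpha, phi):
        # alpha: [K] log-scores of prefixes ending in each tag.
        alpha = jax.nn.logsumexp(alpha[:, None] + phi, axis=0)
        return alpha, None

    init = jnp.zeros(log_potentials.shape[1])
    alpha, _ = jax.lax.scan(step, init, log_potentials)
    return jax.nn.logsumexp(alpha)

key = jax.random.PRNGKey(0)
phi = jax.random.normal(key, (9, 5, 5))   # 10 positions, 5 tags
log_z = crf_log_partition(phi)
# Transition marginals fall out of autodiff, fully vectorised:
marginals = jax.grad(crf_log_partition)(phi)
print(log_z, marginals.shape)
```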

Local Structure-aware Graph Contrastive Representation Learning

  • paper_url: http://arxiv.org/abs/2308.03271
  • repo_url: None
  • paper_authors: Kai Yang, Yuan Liu, Zijuan Zhao, Peijin Ding, Wenqian Zhao
  • for: The paper proposes Local Structure-aware Graph Contrastive representation Learning (LS-GCL) to model the structural information of nodes from multiple views.
  • methods: The method constructs semantic subgraphs that are not limited to first-order neighbors; a shared GNN encoder learns target-node embeddings at the subgraph level, a pooling function generates subgraph-level graph embeddings, and the same encoder produces node embeddings at the global-graph level, all optimized with a multi-level contrastive loss.
  • results: Experiments on five datasets show that LS-GCL outperforms state-of-the-art graph representation learning approaches on both node classification and link prediction tasks.
    Abstract Traditional Graph Neural Network (GNN), as a graph representation learning method, is constrained by label information. However, Graph Contrastive Learning (GCL) methods, which tackle the label problem effectively, mainly focus on the feature information of the global graph or small subgraph structure (e.g., the first-order neighborhood). In the paper, we propose a Local Structure-aware Graph Contrastive representation Learning method (LS-GCL) to model the structural information of nodes from multiple views. Specifically, we construct the semantic subgraphs that are not limited to the first-order neighbors. For the local view, the semantic subgraph of each target node is input into a shared GNN encoder to obtain the target node embeddings at the subgraph-level. Then, we use a pooling function to generate the subgraph-level graph embeddings. For the global view, considering the original graph preserves indispensable semantic information of nodes, we leverage the shared GNN encoder to learn the target node embeddings at the global graph-level. The proposed LS-GCL model is optimized to maximize the common information among similar instances at three various perspectives through a multi-level contrastive loss function. Experimental results on five datasets illustrate that our method outperforms state-of-the-art graph representation learning approaches for both node classification and link prediction tasks.
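A minimal PyTorch sketch of the multi-view contrastive idea, assuming an InfoNCE-style objective that aligns each node's subgraph-level embedding with its global-graph embedding. The paper's loss spans three perspectives and its encoder and pooling details are omitted here; this is only the core contrastive term.

```python
import torch
import torch.nn.functional as F

def info_nce(z_local: torch.Tensor, z_global: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Contrast each node's subgraph-level embedding against its global embedding.

    Positive pairs sit on the diagonal of the similarity matrix; all other
    nodes in the batch serve as negatives.
    """
    z_local = F.normalize(z_local, dim=1)
    z_global = F.normalize(z_global, dim=1)
    logits = z_local @ z_global.t() / tau          # [N, N] cosine similarities
    targets = torch.arange(z_local.size(0))
    return F.cross_entropy(logits, targets)

# Toy embeddings for 8 nodes from a subgraph-level and a global-graph view.
z_sub = torch.randn(8, 32)
z_glob = z_sub + 0.1 * torch.randn(8, 32)  # two views of a node should align
print(info_nce(z_sub, z_glob).item())
```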

Simple Rule Injection for ComplEx Embeddings

  • paper_url: http://arxiv.org/abs/2308.03269
  • repo_url: None
  • paper_authors: Haodi Ma, Anthony Colas, Yuejie Wang, Ali Sadeghian, Daisy Zhe Wang
  • for: This paper is written for researchers and practitioners interested in neural knowledge graph inference, particularly those looking to combine logic rules with knowledge graph embeddings.
  • methods: The paper proposes a mechanism called InjEx, which injects multiple types of rules through simple constraints to capture definite Horn rules.
  • results: The paper evaluates InjEx on both the knowledge graph completion (KGC) and few-shot knowledge graph completion (FKGC) settings, and shows that it outperforms baseline KGC models as well as specialized few-shot models while maintaining its scalability and efficiency.
    Abstract Recent works in neural knowledge graph inference attempt to combine logic rules with knowledge graph embeddings to benefit from prior knowledge. However, they usually cannot avoid rule grounding, and injecting a diverse set of rules has still not been thoroughly explored. In this work, we propose InjEx, a mechanism to inject multiple types of rules through simple constraints, which capture definite Horn rules. To start, we theoretically prove that InjEx can inject such rules. Next, to demonstrate that InjEx infuses interpretable prior knowledge into the embedding space, we evaluate InjEx on both the knowledge graph completion (KGC) and few-shot knowledge graph completion (FKGC) settings. Our experimental results reveal that InjEx outperforms both baseline KGC models as well as specialized few-shot models while maintaining its scalability and efficiency.
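The paper's exact constraints are its own; as an illustrative reading, the sketch below injects a definite Horn rule body(x, y) -> head(x, y) into ComplEx-style relation embeddings through a soft penalty that ties the two relation vectors, to be added (with a weight) to the usual KGC training loss. The relation indices and weight are arbitrary assumptions.

```python
import torch

n_rel, dim = 10, 64
rel_re = torch.nn.Parameter(torch.randn(n_rel, dim))  # real parts of relation embeddings
rel_im = torch.nn.Parameter(torch.randn(n_rel, dim))  # imaginary parts

def complex_score(h_re, h_im, r, t_re, t_im):
    """ComplEx triple score: Re(<h, w_r, conj(t)>)."""
    re, im = rel_re[r], rel_im[r]
    return ((h_re * re - h_im * im) * t_re + (h_re * im + h_im * re) * t_im).sum(-1)

def horn_rule_penalty(body: int, head: int) -> torch.Tensor:
    """Soft constraint for the rule body(x, y) -> head(x, y): pulling the head
    relation's embedding toward the body relation's makes any pair that scores
    high under `body` also score high under `head`."""
    return ((rel_re[body] - rel_re[head]) ** 2
            + (rel_im[body] - rel_im[head]) ** 2).sum()

loss = 0.1 * horn_rule_penalty(body=3, head=7)  # add to the usual KGC loss
loss.backward()
print(rel_re.grad.abs().sum().item() > 0)       # gradients flow into the embeddings
```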

Redundancy-aware Transformer for Video Question Answering

  • paper_url: http://arxiv.org/abs/2308.03267
  • repo_url: None
  • paper_authors: Yicong Li, Xun Yang, An Zhang, Chun Feng, Xiang Wang, Tat-Seng Chua
  • for: The work aims to improve both accuracy and efficiency in VideoQA by suppressing neighboring-frame redundancy and cross-modal redundancy.
  • methods: It proposes a transformer-based architecture whose video encoder emphasizes object-level changes between neighboring frames (with message passing restricted to distant frames), and whose fusion module uses adaptive sampling to identify the small subset of visual elements that exclusively support the answer.
  • results: Experiments on multiple VideoQA benchmarks show that the method achieves state-of-the-art results.
    Abstract This paper identifies two kinds of redundancy in the current VideoQA paradigm. Specifically, the current video encoders tend to holistically embed all video clues at different granularities in a hierarchical manner, which inevitably introduces \textit{neighboring-frame redundancy} that can overwhelm detailed visual clues at the object level. Subsequently, prevailing vision-language fusion designs introduce the \textit{cross-modal redundancy} by exhaustively fusing all visual elements with question tokens without explicitly differentiating their pairwise vision-language interactions, thus making a pernicious impact on the answering. To this end, we propose a novel transformer-based architecture, that aims to model VideoQA in a redundancy-aware manner. To address the neighboring-frame redundancy, we introduce a video encoder structure that emphasizes the object-level change in neighboring frames, while adopting an out-of-neighboring message-passing scheme that imposes attention only on distant frames. As for the cross-modal redundancy, we equip our fusion module with a novel adaptive sampling, which explicitly differentiates the vision-language interactions by identifying a small subset of visual elements that exclusively support the answer. Upon these advancements, we find this \underline{R}edundancy-\underline{a}ware trans\underline{former} (RaFormer) can achieve state-of-the-art results on multiple VideoQA benchmarks.

TempFuser: Learning Tactical and Agile Flight Maneuvers in Aerial Dogfights using a Long Short-Term Temporal Fusion Transformer

  • paper_url: http://arxiv.org/abs/2308.03257
  • repo_url: None
  • paper_authors: Hyunki Seong, David Hyunchul Shim
  • for: The paper proposes a flight-policy model that combines long-term tactical context with short-term aerodynamic dynamics to learn agile and tactical maneuvers in aerial dogfights.
  • methods: Two LSTM-based input embeddings encode long-term sparse state trajectories and short-term dense state trajectories; the embeddings are integrated through a transformer encoder, from which end-to-end flight commands are derived, and the policy is trained with deep reinforcement learning.
  • results: Extensively validated in a high-fidelity environment against a diverse range of opponent aircraft, the model outperforms baseline models; it learns basic fighter maneuvers, human-pilot-like tactical maneuvers, and robust supersonic pursuit at low altitude without explicitly coded prior knowledge. Videos are available at \url{https://sites.google.com/view/tempfuser}.
    Abstract Aerial dogfights necessitate understanding the tactically changing maneuvers from a long-term perspective, along with the rapidly changing aerodynamics from a short-term view. In this paper, we propose a novel long short-term temporal fusion transformer (TempFuser) for a policy network in aerial dogfights. Our method uses two LSTM-based input embeddings to encode long-term, sparse state trajectories, as well as short-term, dense state trajectories. By integrating the two embeddings through a transformer encoder, the method subsequently derives end-to-end flight commands for agile and tactical maneuvers. We formulate a deep reinforcement learning framework to train our TempFuser-based policy model. We then extensively validate our model, demonstrating that it outperforms other baseline models against a diverse range of opponent aircraft in a high-fidelity environment. Our model successfully learns basic fighter maneuvers, human pilot-like tactical maneuvers, and robust supersonic pursuit in low altitudes without explicitly coded prior knowledge. Videos are available at \url{https://sites.google.com/view/tempfuser}
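A structural PyTorch sketch of the two-stream design described above: two LSTMs embed the long-term sparse and short-term dense state trajectories, a transformer encoder fuses them, and a linear head emits flight commands. All dimensions and the command head are placeholder assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class TempFuserSketch(nn.Module):
    """Two LSTM embeddings (long-term sparse, short-term dense) fused by a transformer."""
    def __init__(self, state_dim=16, d_model=64, n_cmd=4):
        super().__init__()
        self.long_lstm = nn.LSTM(state_dim, d_model, batch_first=True)
        self.short_lstm = nn.LSTM(state_dim, d_model, batch_first=True)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.fuser = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_cmd)  # end-to-end flight commands

    def forward(self, long_traj, short_traj):
        h_long, _ = self.long_lstm(long_traj)     # [B, T_long, d_model]
        h_short, _ = self.short_lstm(short_traj)  # [B, T_short, d_model]
        fused = self.fuser(torch.cat([h_long, h_short], dim=1))
        return self.head(fused[:, -1])            # command from the fused summary

model = TempFuserSketch()
cmd = model(torch.randn(2, 30, 16), torch.randn(2, 8, 16))
print(cmd.shape)  # torch.Size([2, 4])
```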

PaniniQA: Enhancing Patient Education Through Interactive Question Answering

  • paper_url: http://arxiv.org/abs/2308.03253
  • repo_url: https://github.com/pengshancai/paniniqa
  • paper_authors: Pengshan Cai, Zonghai Yao, Fei Liu, Dakuo Wang, Meghan Reilly, Huixue Zhou, Lingxi Li, Yi Cao, Alok Kapoor, Adarsha Bajracharya, Dan Berlowitz, Hong Yu
  • for: To help discharged patients understand the personalized discharge instructions in their electronic health records.
  • methods: A patient-centric interactive question-answering system that extracts the important clinical content from a patient's discharge instructions, formulates patient-specific educational questions, and verifies answers to give timely corrective feedback.
  • results: Comprehensive automatic and human evaluations show that PaniniQA effectively improves patients' mastery of their medical instructions through these interactions.
    Abstract Patient portal allows discharged patients to access their personalized discharge instructions in electronic health records (EHRs). However, many patients have difficulty understanding or memorizing their discharge instructions. In this paper, we present PaniniQA, a patient-centric interactive question answering system designed to help patients understand their discharge instructions. PaniniQA first identifies important clinical content from patients' discharge instructions and then formulates patient-specific educational questions. In addition, PaniniQA is also equipped with answer verification functionality to provide timely feedback to correct patients' misunderstandings. Our comprehensive automatic and human evaluation results demonstrate our PaniniQA is capable of improving patients' mastery of their medical instructions through effective interactions

Analysis of Optical Loss and Crosstalk Noise in MZI-based Coherent Photonic Neural Networks

  • paper_url: http://arxiv.org/abs/2308.03249
  • repo_url: None
  • paper_authors: Amin Shafiee, Sanmitra Banerjee, Krishnendu Chakrabarty, Sudeep Pasricha, Mahdi Nikdast
  • for: The paper presents a bottom-up model for analyzing how optical loss and crosstalk noise affect silicon-photonic neural networks (SP-NNs) built from Mach-Zehnder interferometer (MZI) devices.
  • methods: The model spans from the device level to the system level, capturing how loss and crosstalk stemming from fabrication imperfections and undesired optical couplings accumulate across different mesh designs.
  • results: As SP-NNs scale up, accumulated loss and crosstalk impose a high power penalty and inferencing accuracy drops of up to 84% (falling below 10% accuracy in some cases) for the Reck, Clements, and Diamond MZI mesh configurations.
    Abstract With the continuous increase in the size and complexity of machine learning models, the need for specialized hardware to efficiently run such models is rapidly growing. To address such a need, silicon-photonic-based neural network (SP-NN) accelerators have recently emerged as a promising alternative to electronic accelerators due to their lower latency and higher energy efficiency. Not only can SP-NNs alleviate the fan-in and fan-out problem with linear algebra processors, their operational bandwidth can match that of the photodetection rate (typically 100 GHz), which is at least over an order of magnitude faster than electronic counterparts that are restricted to a clock rate of a few GHz. Unfortunately, the underlying silicon photonic devices in SP-NNs suffer from inherent optical losses and crosstalk noise originating from fabrication imperfections and undesired optical couplings, the impact of which accumulates as the network scales up. Consequently, the inferencing accuracy in an SP-NN can be affected by such inefficiencies -- e.g., can drop to below 10% -- the impact of which is yet to be fully studied. In this paper, we comprehensively model the optical loss and crosstalk noise using a bottom-up approach, from the device to the system level, in coherent SP-NNs built using Mach-Zehnder interferometer (MZI) devices. The proposed models can be applied to any SP-NN architecture with different configurations to analyze the effect of loss and crosstalk. Such an analysis is important where there are inferencing accuracy and scalability requirements to meet when designing an SP-NN. Using the proposed analytical framework, we show a high power penalty and a catastrophic inferencing accuracy drop of up to 84% for SP-NNs of different scales with three known MZI mesh configurations (i.e., Reck, Clements, and Diamond) due to accumulated optical loss and crosstalk noise.
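A back-of-the-envelope numpy sketch of why these effects compound with network scale: insertion loss grows linearly in dB with mesh depth, and per-stage crosstalk leakage accumulates, eroding the signal-to-crosstalk ratio. The per-MZI figures below are illustrative assumptions, not the paper's measured values.

```python
import numpy as np

def accumulated_loss_db(n_stages: int, loss_per_mzi_db: float) -> float:
    """Insertion loss compounds linearly in dB with the number of cascaded MZIs."""
    return n_stages * loss_per_mzi_db

def signal_to_crosstalk_db(n_stages: int, xt_per_mzi_db: float) -> float:
    """Crude worst-case estimate: each stage leaks xt_per_mzi_db of power into
    the wrong port, and the leaked power adds up incoherently."""
    leak = 10 ** (xt_per_mzi_db / 10)   # linear leakage per stage
    total_leak = n_stages * leak        # incoherent accumulation
    return -10 * np.log10(total_leak)

# A Clements-style mesh implementing an N x N unitary has depth on the order of N.
for n in (8, 16, 32, 64):
    print(f"N={n:3d}  loss={accumulated_loss_db(n, 0.25):5.1f} dB  "
          f"SXR={signal_to_crosstalk_db(n, -30):5.1f} dB")
```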

Mind the Gap: Improving Success Rate of Vision-and-Language Navigation by Revisiting Oracle Success Routes

  • paper_url: http://arxiv.org/abs/2308.03244
  • repo_url: None
  • paper_authors: Chongyang Zhao, Yuankai Qi, Qi Wu
  • for: The paper aims to narrow the gap between Success Rate (SR) and Oracle Success Rate (OSR) in Vision-and-Language Navigation (VLN).
  • methods: It proposes a multi-module transformer-based model that learns compact, discriminative trajectory-viewpoint representations and uses them to predict the confidence that a viewpoint is the target location described in the instruction, mining the target from trajectories produced by off-the-shelf VLN models.
  • results: Evaluated on three widely adopted benchmarks (R2R, REVERIE, and NDH), the method shows promising results, demonstrating the potential of this direction.
    Abstract Vision-and-Language Navigation (VLN) aims to navigate to the target location by following a given instruction. Unlike existing methods focused on predicting a more accurate action at each step in navigation, in this paper, we make the first attempt to tackle a long-ignored problem in VLN: narrowing the gap between Success Rate (SR) and Oracle Success Rate (OSR). We observe a consistently large gap (up to 9%) on four state-of-the-art VLN methods across two benchmark datasets: R2R and REVERIE. The high OSR indicates the robot agent passes the target location, while the low SR suggests the agent actually fails to stop at the target location at last. Instead of predicting actions directly, we propose to mine the target location from a trajectory given by off-the-shelf VLN models. Specially, we design a multi-module transformer-based model for learning compact discriminative trajectory viewpoint representation, which is used to predict the confidence of being a target location as described in the instruction. The proposed method is evaluated on three widely-adopted datasets: R2R, REVERIE and NDH, and shows promising results, demonstrating the potential for more future research.

Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining

  • paper_url: http://arxiv.org/abs/2308.03235
  • repo_url: https://github.com/zekaouinoureddine/Opinion-Transformers
  • paper_authors: Nour Eddine Zekaoui, Siham Yousfi, Maryem Rhanoui, Mounia Mikram
  • for: The study examines how well high-performing transformer-based language models perform on opinion mining (sentiment analysis) and provides a high-level comparison between them.
  • methods: State-of-the-art transformer-based language models, including BERT, RoBERTa, and XLNet, are evaluated and compared across multiple corpora.
  • results: The models perform strongly on opinion mining: BERT and RoBERTa reach average accuracies above 90% across the corpora, while XLNet exceeds 95%.
    Abstract Opinion mining, also known as sentiment analysis, is a subfield of natural language processing (NLP) that focuses on identifying and extracting subjective information in textual material. This can include determining the overall sentiment of a piece of text (e.g., positive or negative), as well as identifying specific emotions or opinions expressed in the text, that involves the use of advanced machine and deep learning techniques. Recently, transformer-based language models make this task of human emotion analysis intuitive, thanks to the attention mechanism and parallel computation. These advantages make such models very powerful on linguistic tasks, unlike recurrent neural networks that spend a lot of time on sequential processing, making them prone to fail when it comes to processing long text. The scope of our paper aims to study the behaviour of the cutting-edge Transformer-based language models on opinion mining and provide a high-level comparison between them to highlight their key particularities. Additionally, our comparative study shows leads and paves the way for production engineers regarding the approach to focus on and is useful for researchers as it provides guidelines for future research subjects.
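The transformer families benchmarked above are available through the Hugging Face Transformers library; a minimal classification example follows, using the library's standard sentiment checkpoint (the model choice here is ours, not the paper's).

```python
from transformers import pipeline

# Any fine-tuned checkpoint (BERT, RoBERTa, XLNet, ...) can be swapped in here.
classifier = pipeline("sentiment-analysis",
                      model="distilbert-base-uncased-finetuned-sst-2-english")
print(classifier(["The battery life is fantastic.",
                  "Support never answered my ticket."]))
# [{'label': 'POSITIVE', 'score': ...}, {'label': 'NEGATIVE', 'score': ...}]
```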

Why Linguistics Will Thrive in the 21st Century: A Reply to Piantadosi (2023)

  • paper_url: http://arxiv.org/abs/2308.03228
  • repo_url: None
  • paper_authors: Jordan Kodner, Sarah Payne, Jeffrey Heinz
  • for: The paper critiques Piantadosi's (2023) claim that "modern language models refute Chomsky's approach to language," focusing on four main points.
  • methods: It weighs the impressive performance and utility of large language models (LLMs) against the central mystery of language acquisition: that children become competent, fluent speakers from orders of magnitude less data.
  • results: The authors conclude that, despite their utility, LLMs neither explain human language acquisition nor constitute interpretable scientific theories, so generative linguistics will remain an indispensable scientific discipline throughout the 21st century and beyond.
    Abstract We present a critical assessment of Piantadosi's (2023) claim that "Modern language models refute Chomsky's approach to language," focusing on four main points. First, despite the impressive performance and utility of large language models (LLMs), humans achieve their capacity for language after exposure to several orders of magnitude less data. The fact that young children become competent, fluent speakers of their native languages with relatively little exposure to them is the central mystery of language learning to which Chomsky initially drew attention, and LLMs currently show little promise of solving this mystery. Second, what can the artificial reveal about the natural? Put simply, the implications of LLMs for our understanding of the cognitive structures and mechanisms underlying language and its acquisition are like the implications of airplanes for understanding how birds fly. Third, LLMs cannot constitute scientific theories of language for several reasons, not least of which is that scientific theories must provide interpretable explanations, not just predictions. This leads to our final point: to even determine whether the linguistic and cognitive capabilities of LLMs rival those of humans requires explicating what humans' capacities actually are. In other words, it requires a separate theory of language and cognition; generative linguistics provides precisely such a theory. As such, we conclude that generative linguistics as a scientific discipline will remain indispensable throughout the 21st century and beyond.

Investigation of Self-supervised Pre-trained Models for Classification of Voice Quality from Speech and Neck Surface Accelerometer Signals

  • paper_url: http://arxiv.org/abs/2308.03226
  • repo_url: None
  • paper_authors: Sudarsana Reddy Kadiri, Farhad Javanmardi, Paavo Alku
  • for: The study addresses automatic classification of voice quality (breathy, modal, and pressed).
  • methods: Simultaneously recorded speech and neck-surface accelerometer (NSA) signals are used as inputs; features are derived from three self-supervised pre-trained models (wav2vec2-BASE, wav2vec2-LARGE, and HuBERT) alongside MFCCs and glottal source features, with SVM and CNN classifiers.
  • results: NSA inputs yield better classification performance than speech, pre-trained-model features outperform conventional features for both input types, and HuBERT features perform best overall.
    Abstract Prior studies in the automatic classification of voice quality have mainly studied the use of the acoustic speech signal as input. Recently, a few studies have been carried out by jointly using both speech and neck surface accelerometer (NSA) signals as inputs, and by extracting MFCCs and glottal source features. This study examines simultaneously-recorded speech and NSA signals in the classification of voice quality (breathy, modal, and pressed) using features derived from three self-supervised pre-trained models (wav2vec2-BASE, wav2vec2-LARGE, and HuBERT) and using a SVM as well as CNNs as classifiers. Furthermore, the effectiveness of the pre-trained models is compared in feature extraction between glottal source waveforms and raw signal waveforms for both speech and NSA inputs. Using two signal processing methods (quasi-closed phase (QCP) glottal inverse filtering and zero frequency filtering (ZFF)), glottal source waveforms are estimated from both speech and NSA signals. The study has three main goals: (1) to study whether features derived from pre-trained models improve classification accuracy compared to conventional features (spectrogram, mel-spectrogram, MFCCs, i-vector, and x-vector), (2) to investigate which of the two modalities (speech vs. NSA) is more effective in the classification task with pre-trained model-based features, and (3) to evaluate whether the deep learning-based CNN classifier can enhance the classification accuracy in comparison to the SVM classifier. The results revealed that the use of the NSA input showed better classification performance compared to the speech signal. Between the features, the pre-trained model-based features showed better classification accuracies, both for speech and NSA inputs compared to the conventional features. It was also found that the HuBERT features performed better than the wav2vec2-BASE and wav2vec2-LARGE features.
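A hedged sketch of the pre-trained-feature pipeline described above: utterance-level features are mean-pooled from wav2vec2-BASE hidden states and fed to an SVM. The random waveforms and labels below are stand-ins for the speech or NSA recordings and their breathy/modal/pressed annotations; the pooling choice is ours.

```python
import torch
import numpy as np
from transformers import Wav2Vec2Model, Wav2Vec2FeatureExtractor
from sklearn.svm import SVC

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()

def embed(waveform: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Mean-pool the last hidden states into one utterance-level feature vector."""
    inputs = extractor(waveform, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # [1, T', 768]
    return hidden.mean(dim=1).squeeze(0).numpy()

# Stand-ins for the recordings and their voice-quality annotations.
waveforms = [np.random.randn(16000).astype(np.float32) for _ in range(6)]
labels = ["breathy", "modal", "pressed"] * 2
X = np.stack([embed(w) for w in waveforms])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X[:2]))
```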

Source-free Domain Adaptive Human Pose Estimation

  • paper_url: http://arxiv.org/abs/2308.03202
  • repo_url: https://github.com/davidpengucf/sfdahpe
  • paper_authors: Qucheng Peng, Ce Zheng, Chen Chen
  • for: Addresses the challenge of cross-domain learning for Human Pose Estimation (HPE) without access to source data during the adaptation process.
  • methods: Proposes a novel framework consisting of three models (source, intermediate, and target) that explores the task from both source-protect and target-relevant perspectives.
  • results: Comprehensive experiments on several domain-adaptive HPE benchmarks show that the proposed method outperforms existing approaches by a considerable margin.
    Abstract Human Pose Estimation (HPE) is widely used in various fields, including motion analysis, healthcare, and virtual reality. However, the great expenses of labeled real-world datasets present a significant challenge for HPE. To overcome this, one approach is to train HPE models on synthetic datasets and then perform domain adaptation (DA) on real-world data. Unfortunately, existing DA methods for HPE neglect data privacy and security by using both source and target data in the adaptation process. To this end, we propose a new task, named source-free domain adaptive HPE, which aims to address the challenges of cross-domain learning of HPE without access to source data during the adaptation process. We further propose a novel framework that consists of three models: source model, intermediate model, and target model, which explores the task from both source-protect and target-relevant perspectives. The source-protect module preserves source information more effectively while resisting noise, and the target-relevant module reduces the sparsity of spatial representations by building a novel spatial probability space, and pose-specific contrastive learning and information maximization are proposed on the basis of this space. Comprehensive experiments on several domain adaptive HPE benchmarks show that the proposed method outperforms existing approaches by a considerable margin. The codes are available at https://github.com/davidpengucf/SFDAHPE.

Unmasking the Invisible: Finding Location-Specific Aggregated Air Quality Index with Smartphone-Captured Images

  • paper_url: http://arxiv.org/abs/2308.03200
  • repo_url: None
  • paper_authors: Joyanta Jyoti Mondal, Md. Farhadul Islam, Raima Islam, Nowsin Kabir Rhidi, A. B. M. Alim Al Islam, Meem Arafat Manab, Jannatun Noor
  • for: The paper investigates predicting a location-specific aggregated air quality index from smartphone-captured images, focusing on PM2.5 concentration in Dhaka, the capital of Bangladesh.
  • methods: A Deep Convolutional Neural Network (DCNN) is trained on over a thousand annotated outdoor images, labeled with PM2.5 concentrations from the local US consulate (computed with the NowCast algorithm); supervised learning establishes a correlation index so the model acts as a Picture-based Predictor of PM2.5 Concentration (PPPC).
  • results: The model outperforms popular models such as ViT and INN as well as CNN-based models like VGG19, ResNet50, and MobileNetV2 at predicting location-specific PM2.5 concentration, while remaining resource-efficient with far fewer parameters.
    Abstract The prevalence and mobility of smartphones make these a widely used tool for environmental health research. However, their potential for determining aggregated air quality index (AQI) based on PM2.5 concentration in specific locations remains largely unexplored in the existing literature. In this paper, we thoroughly examine the challenges associated with predicting location-specific PM2.5 concentration using images taken with smartphone cameras. The focus of our study is on Dhaka, the capital of Bangladesh, due to its significant air pollution levels and the large population exposed to it. Our research involves the development of a Deep Convolutional Neural Network (DCNN), which we train using over a thousand outdoor images taken and annotated. These photos are captured at various locations in Dhaka, and their labels are based on PM2.5 concentration data obtained from the local US consulate, calculated using the NowCast algorithm. Through supervised learning, our model establishes a correlation index during training, enhancing its ability to function as a Picture-based Predictor of PM2.5 Concentration (PPPC). This enables the algorithm to calculate an equivalent daily averaged AQI index from a smartphone image. Unlike popular overly parameterized models, our model shows resource efficiency since it uses fewer parameters. Furthermore, test results indicate that our model outperforms popular models like ViT and INN, as well as popular CNN-based models such as VGG19, ResNet50, and MobileNetV2, in predicting location-specific PM2.5 concentration. Our dataset is the first publicly available collection that includes atmospheric images and corresponding PM2.5 measurements from Dhaka. Our code and dataset will be made public when publishing the paper.
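Once a network predicts a PM2.5 concentration from an image, converting it to an AQI is a piecewise-linear mapping. The sketch below uses the standard US EPA breakpoint table (as published before the 2024 revision; verify against the current standard before reuse).

```python
# US EPA PM2.5 breakpoints (ug/m3) -> AQI index ranges (pre-2024 table).
BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]

def pm25_to_aqi(c: float) -> int:
    """Piecewise-linear AQI: I = (I_hi - I_lo) / (C_hi - C_lo) * (C - C_lo) + I_lo."""
    c = min(max(c, 0.0), 500.4)
    for c_lo, c_hi, i_lo, i_hi in BREAKPOINTS:
        if c <= c_hi:
            return round((i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo)
    return 500

# e.g. a CNN's predicted concentration for one smartphone image:
print(pm25_to_aqi(87.3))  # -> AQI in the 151-200 ("unhealthy") band
```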

Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies

  • paper_url: http://arxiv.org/abs/2308.03188
  • repo_url: https://github.com/teacherpeterpan/self-correction-llm-papers
  • paper_authors: Liangming Pan, Michael Saxon, Wenda Xu, Deepak Nathani, Xinyi Wang, William Yang Wang
  • for: The survey examines self-correction techniques in which large language models (LLMs) are prompted or guided, often with automated feedback, to fix problems in their own output, improving performance and practicality.
  • methods: It analyzes and taxonomizes a wide array of recent work, covering training-time, generation-time, and post-hoc correction strategies.
  • results: Automated feedback is a promising way to make LLM-based solutions more practical and deployable with minimal human feedback, though open challenges and future directions remain.
    Abstract Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks. However, their efficacy is undermined by undesired and inconsistent behaviors, including hallucination, unfaithful reasoning, and toxic content. A promising approach to rectify these flaws is self-correction, where the LLM itself is prompted or guided to fix problems in its own output. Techniques leveraging automated feedback -- either produced by the LLM itself or some external system -- are of particular interest as they are a promising way to make LLM-based solutions more practical and deployable with minimal human feedback. This paper presents a comprehensive review of this emerging class of techniques. We analyze and taxonomize a wide array of recent work utilizing these strategies, including training-time, generation-time, and post-hoc correction. We also summarize the major applications of this strategy and conclude by discussing future directions and challenges.

VN-Solver: Vision-based Neural Solver for Combinatorial Optimization over Graphs

  • paper_url: http://arxiv.org/abs/2308.03185
  • repo_url: https://github.com/minasmz/VN-Solver
  • paper_authors: Mina Samizadeh, Guangmo Tong
  • for: Solving combinatorial optimization problems over graphs, such as the traveling salesman problem and the vehicle routing problem.
  • methods: A conceptually novel vision-based approach: unlike common neural combinatorial solvers that consume adjacency matrices, the neural model solves graph optimization problems by looking at the graph pattern as an image.
  • results: The performance of such vision-based methods is not only non-trivial but also comparable to state-of-the-art matrix-based methods, opening a new avenue for data-driven optimization solvers.
    Abstract Data-driven approaches have been proven effective in solving combinatorial optimization problems over graphs such as the traveling salesman problems and the vehicle routing problem. The rationale behind such methods is that the input instances may follow distributions with salient patterns that can be leveraged to overcome the worst-case computational hardness. For optimization problems over graphs, the common practice of neural combinatorial solvers consumes the inputs in the form of adjacency matrices. In this paper, we explore a vision-based method that is conceptually novel: can neural models solve graph optimization problems by \textit{taking a look at the graph pattern}? Our results suggest that the performance of such vision-based methods is not only non-trivial but also comparable to the state-of-the-art matrix-based methods, which opens a new avenue for developing data-driven optimization solvers.

Empirical Optimal Risk to Quantify Model Trustworthiness for Failure Detection

  • paper_url: http://arxiv.org/abs/2308.03179
  • repo_url: None
  • paper_authors: Shuang Ao, Stefan Rueger, Advaith Siddharthan
  • for: This paper focuses on the problem of failure detection (FD) in AI systems, specifically the evaluation of FD performance and the trade-off between data coverage rate and performance on accepted data.
  • methods: The paper proposes two new evaluation metrics, the Excess Area Under the Optimal RC Curve (E-AUoptRC) and the Trust Index (TI), to better reflect the trustworthiness of FD models; they are designed to provide a more intuitive and meaningful evaluation of FD performance, especially when the data coverage rate is partial.
  • results: Extensive experiments on three benchmark image datasets with ten variants of transformer and CNN models demonstrate that the proposed metrics better reflect model trustworthiness than existing evaluation metrics; high overall accuracy does not always yield high TI, highlighting the necessity of the Trust Index as a complement to overall accuracy.
    Abstract Failure detection (FD) in AI systems is a crucial safeguard for deployment on safety-critical tasks. The common evaluation method of FD performance is the Risk-coverage (RC) curve, which reveals the trade-off between the data coverage rate and the performance on accepted data. One common way to quantify the RC curve is by calculating the area under it. However, this metric does not inform on how suited any method is for FD, or what the optimal coverage rate should be. As FD aims to achieve higher performance with fewer data discarded, evaluating with partial coverage excluding the most uncertain samples is more intuitive and meaningful than full coverage. In addition, there is an optimal point in the coverage where the model could achieve ideal performance theoretically. We propose the Excess Area Under the Optimal RC Curve (E-AUoptRC), with the area in coverage from the optimal point to the full coverage. Further, the model performance at this optimal point can represent both model learning ability and calibration. We propose it as the Trust Index (TI), a complementary evaluation metric to the overall model accuracy. We report extensive experiments on three benchmark image datasets with ten variants of transformer and CNN models. Our results show that our proposed methods can better reflect the model trustworthiness than existing evaluation metrics. We further observe that the model with high overall accuracy does not always yield the high TI, which indicates the necessity of the proposed Trust Index as a complementary metric to the model overall accuracy. The code is available at \url{https://github.com/AoShuang92/optimal_risk}.
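A numpy sketch of the quantities discussed above: the empirical risk-coverage curve, its area, and the area accumulated past the oracle-optimal coverage point. The "tail area" here is a simplified stand-in for the paper's E-AUoptRC, not its exact definition.

```python
import numpy as np

def risk_coverage(confidence, correct):
    """Empirical risk-coverage curve: rank predictions by confidence
    (descending) and report the error rate among the fraction retained."""
    order = np.argsort(-confidence)
    errors = (~correct[order]).astype(float)
    kept = np.arange(1, len(errors) + 1)
    return kept / len(errors), np.cumsum(errors) / kept

rng = np.random.default_rng(0)
conf = rng.random(1000)
correct = rng.random(1000) < conf             # a roughly calibrated toy model

cov, risk = risk_coverage(conf, correct)
aurc = np.trapz(risk, cov)                    # area under the full RC curve
opt_cov = correct.mean()                      # an oracle keeps exactly the correct samples
tail = cov >= opt_cov
tail_area = np.trapz(risk[tail], cov[tail])   # area accumulated past the optimal point
print(f"AURC={aurc:.3f}  optimal coverage={opt_cov:.2f}  tail area={tail_area:.3f}")
```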

Building Safe and Reliable AI systems for Safety Critical Tasks with Vision-Language Processing

  • paper_url: http://arxiv.org/abs/2308.03176
  • repo_url: None
  • paper_authors: Shuang Ao
  • for: The thesis aims to build safe and reliable AI systems for safety-critical tasks, where small mistakes can have severe consequences.
  • methods: Focusing on vision-language data (classification, image captioning, and visual question answering) with open-source benchmark datasets, it further develops techniques for failure and out-of-distribution detection, overfitting identification, uncertainty quantification, and robustness to data perturbations.
  • results: Current AI systems are unable to identify common causes of failure and lack techniques to quantify prediction quality, which motivates the safeguards developed in this thesis for accurate model uncertainty on safety-critical tasks.
    Abstract Although AI systems have been applied in various fields and achieved impressive performance, their safety and reliability are still a big concern. This is especially important for safety-critical tasks. One shared characteristic of these critical tasks is their risk sensitivity, where small mistakes can cause big consequences and even endanger life. There are several factors that could be guidelines for the successful deployment of AI systems in sensitive tasks: (i) failure detection and out-of-distribution (OOD) detection; (ii) overfitting identification; (iii) uncertainty quantification for predictions; (iv) robustness to data perturbations. These factors are also challenges of current AI systems, which are major blocks for building safe and reliable AI. Specifically, the current AI algorithms are unable to identify common causes for failure detection. Furthermore, additional techniques are required to quantify the quality of predictions. All these contribute to inaccurate uncertainty quantification, which lowers trust in predictions. Hence obtaining accurate model uncertainty quantification and its further improvement are challenging. To address these issues, many techniques have been proposed, such as regularization methods and learning strategies. As vision and language are the most typical data type and have many open source benchmark datasets, this thesis will focus on vision-language data processing for tasks like classification, image captioning, and vision question answering. In this thesis, we aim to build a safeguard by further developing current techniques to ensure the accurate model uncertainty for safety-critical tasks.

Two Sides of Miscalibration: Identifying Over and Under-Confidence Prediction for Network Calibration

  • paper_url: http://arxiv.org/abs/2308.03172
  • repo_url: https://github.com/aoshuang92/miscalibration_ts
  • paper_authors: Shuang Ao, Stefan Rueger, Advaith Siddharthan
  • for: The paper addresses miscalibration in deep neural networks so that model confidence matches model accuracy, which is essential for reliable predictions in safety-critical tasks.
  • methods: It introduces a novel miscalibration score that identifies overall and class-wise calibration status, including over- and under-confidence, and uses the class-wise score as a proxy to design a calibration technique that tackles both failure modes.
  • results: Extensive experiments show the proposed method substantially outperforms existing calibration techniques, and on an automatic failure detection task with a risk-coverage curve it improves both failure detection and model trustworthiness.
    Abstract Proper confidence calibration of deep neural networks is essential for reliable predictions in safety-critical tasks. Miscalibration can lead to model over-confidence and/or under-confidence; i.e., the model's confidence in its prediction can be greater or less than the model's accuracy. Recent studies have highlighted the over-confidence issue by introducing calibration techniques and demonstrated success on various tasks. However, miscalibration through under-confidence has yet to receive much attention. In this paper, we address the necessity of paying attention to the under-confidence issue. We first introduce a novel metric, a miscalibration score, to identify the overall and class-wise calibration status, including being over or under-confident. Our proposed metric reveals the pitfalls of existing calibration techniques, where they often overly calibrate the model and worsen under-confident predictions. Then we utilize the class-wise miscalibration score as a proxy to design a calibration technique that can tackle both over and under-confidence. We report extensive experiments that show our proposed methods substantially outperforming existing calibration techniques. We also validate our proposed calibration technique on an automatic failure detection task with a risk-coverage curve, reporting that our methods improve failure detection as well as trustworthiness of the model. The code is available at \url{https://github.com/AoShuang92/miscalibration_TS}.
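To illustrate measuring both sides of miscalibration, the sketch below computes a signed, binned confidence-accuracy gap and aggregates over- and under-confident mass separately. The paper's miscalibration score is related in spirit, but its exact formula should be taken from the paper itself.

```python
import numpy as np

def signed_calibration_gaps(conf, correct, n_bins=10):
    """Per-bin (mean confidence - accuracy), aggregated separately:
    positive gaps indicate over-confidence, negative gaps under-confidence."""
    bins = np.clip((conf * n_bins).astype(int), 0, n_bins - 1)
    over = under = 0.0
    for b in range(n_bins):
        m = bins == b
        if not m.any():
            continue
        gap = conf[m].mean() - correct[m].mean()
        if gap > 0:
            over += m.mean() * gap
        else:
            under -= m.mean() * gap
    return over, under

rng = np.random.default_rng(1)
conf = rng.beta(5, 2, 5000)                   # skewed-high confidences
acc = np.clip(np.sqrt(conf) - 0.1, 0.0, 1.0)  # true per-sample accuracy
correct = rng.random(5000) < acc              # over-confident at high conf,
over, under = signed_calibration_gaps(conf, correct)  # under-confident lower down
print(f"over-confidence mass={over:.3f}  under-confidence mass={under:.3f}")
```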

Strategic Preys Make Acute Predators: Enhancing Camouflaged Object Detectors by Generating Camouflaged Objects

  • paper_url: http://arxiv.org/abs/2308.03166
  • repo_url: None
  • paper_authors: Chunming He, Kai Li, Yachao Zhang, Yulun Zhang, Zhenhua Guo, Xiu Li, Martin Danelljan, Fisher Yu
  • for: Improving the accuracy of camouflaged object detection (COD), especially in challenging cases.
  • methods: Inspired by the prey-vs-predator game, the paper works from both sides: an adversarial training framework, Camouflageator, whose auxiliary generator produces harder-to-detect camouflaged objects, and a new COD method, Internal Coherence and Edge Guidance (ICEG), with a camouflaged-feature coherence module and an edge-guided separated calibration module.
  • results: ICEG outperforms existing COD detectors, and Camouflageator is flexible enough to improve various COD detectors, including ICEG, yielding state-of-the-art COD performance.
    Abstract Camouflaged object detection (COD) is the challenging task of identifying camouflaged objects visually blended into surroundings. Albeit achieving remarkable success, existing COD detectors still struggle to obtain precise results in some challenging cases. To handle this problem, we draw inspiration from the prey-vs-predator game that leads preys to develop better camouflage and predators to acquire more acute vision systems and develop algorithms from both the prey side and the predator side. On the prey side, we propose an adversarial training framework, Camouflageator, which introduces an auxiliary generator to generate more camouflaged objects that are harder for a COD method to detect. Camouflageator trains the generator and detector in an adversarial way such that the enhanced auxiliary generator helps produce a stronger detector. On the predator side, we introduce a novel COD method, called Internal Coherence and Edge Guidance (ICEG), which introduces a camouflaged feature coherence module to excavate the internal coherence of camouflaged objects, striving to obtain more complete segmentation results. Additionally, ICEG proposes a novel edge-guided separated calibration module to remove false predictions to avoid obtaining ambiguous boundaries. Extensive experiments show that ICEG outperforms existing COD detectors and Camouflageator is flexible to improve various COD detectors, including ICEG, which brings state-of-the-art COD performance.

Precise Benchmarking of Explainable AI Attribution Methods

  • paper_url: http://arxiv.org/abs/2308.03161
  • repo_url: https://github.com/rbrandt1/precise-benchmarking-of-xai
  • paper_authors: Rafaël Brandt, Daan Raatjens, Georgi Gaydadjiev
  • for: The paper aims to develop a novel evaluation approach for benchmarking state-of-the-art explainable AI (XAI) attribution methods, in order to provide deeper insights into the output of XAI models.
  • methods: The proposed evaluation approach includes a synthetic classification model accompanied by its derived ground truth explanations, as well as new high-fidelity metrics to quantify the difference between explanations of the investigated XAI method and those derived from the synthetic model.
  • results: The authors investigate their proposal by constructing a synthetic convolutional image classification model and benchmarking several widely used XAI attribution methods using their evaluation approach. They compare their results with established prior XAI evaluation metrics, and show that their metrics provide deeper insights into the performance of XAI methods, including the poor precision scores among negatively contributing pixels. Additionally, they demonstrate that their metrics are among the fastest in terms of execution time.
    Abstract The rationale behind a deep learning model's output is often difficult to understand by humans. EXplainable AI (XAI) aims at solving this by developing methods that improve interpretability and explainability of machine learning models. Reliable evaluation metrics are needed to assess and compare different XAI methods. We propose a novel evaluation approach for benchmarking state-of-the-art XAI attribution methods. Our proposal consists of a synthetic classification model accompanied by its derived ground truth explanations allowing high precision representation of input nodes contributions. We also propose new high-fidelity metrics to quantify the difference between explanations of the investigated XAI method and those derived from the synthetic model. Our metrics allow assessment of explanations in terms of precision and recall separately. Also, we propose metrics to independently evaluate negative or positive contributions of inputs. Our proposal provides deeper insights into XAI methods output. We investigate our proposal by constructing a synthetic convolutional image classification model and benchmarking several widely used XAI attribution methods using our evaluation approach. We compare our results with established prior XAI evaluation metrics. By deriving the ground truth directly from the constructed model in our method, we ensure the absence of bias, e.g., subjective either based on the training set. Our experimental results provide novel insights into the performance of Guided-Backprop and Smoothgrad XAI methods that are widely in use. Both have good precision and recall scores among positively contributing pixels (0.7, 0.76 and 0.7, 0.77, respectively), but poor precision scores among negatively contributing pixels (0.44, 0.61 and 0.47, 0.75, resp.). The recall scores in the latter case remain close. We show that our metrics are among the fastest in terms of execution time.
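A numpy sketch of evaluating an attribution map against signed ground truth, scoring precision and recall separately for positively and negatively contributing pixels, in the spirit of the metrics described above; thresholding by sign is our simplification of the paper's procedure.

```python
import numpy as np

def attribution_pr(attr: np.ndarray, gt: np.ndarray):
    """Precision/recall of an attribution map against signed ground truth.

    attr, gt: same-shape maps where the sign marks positive/negative
    contribution and zero marks irrelevant pixels. Scores are per sign."""
    out = {}
    for name, sign in (("positive", 1), ("negative", -1)):
        pred = np.sign(attr) == sign
        true = np.sign(gt) == sign
        tp = (pred & true).sum()
        out[name] = (tp / max(pred.sum(), 1),   # precision
                     tp / max(true.sum(), 1))   # recall
    return out

rng = np.random.default_rng(0)
gt = rng.choice([-1, 0, 1], size=(32, 32), p=[0.1, 0.7, 0.2])
attr = gt * rng.random((32, 32)) + 0.05 * rng.standard_normal((32, 32))  # noisy explanation
for sign, (p, r) in attribution_pr(attr, gt).items():
    print(f"{sign}: precision={p:.2f} recall={r:.2f}")
```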