cs.AI - 2023-09-05

Utilizing Generative Adversarial Networks for Stable Structure Generation in Angry Birds

  • paper_url: http://arxiv.org/abs/2309.02614
  • repo_url: https://github.com/Blaxzter/Utilizing-Generative-Adversarial-Networks-for-Stable-Structure-Generation-in-Angry-Birds
  • paper_authors: Frederic Abraham, Matthew Stephenson
  • for: investigate the suitability of using Generative Adversarial Networks (GANs) to generate stable structures for the physics-based puzzle game Angry Birds.
  • methods: using a detailed encoding/decoding process to convert between Angry Birds level descriptions and a suitable grid-based representation, and utilizing state-of-the-art GAN architectures and training methods to produce new structure designs (a toy encode/decode sketch follows the abstract below).
  • results: GANs can be successfully applied to generate a varied range of complex and stable Angry Birds structures.
    Abstract This paper investigates the suitability of using Generative Adversarial Networks (GANs) to generate stable structures for the physics-based puzzle game Angry Birds. While previous applications of GANs for level generation have been mostly limited to tile-based representations, this paper explores their suitability for creating stable structures made from multiple smaller blocks. This includes a detailed encoding/decoding process for converting between Angry Birds level descriptions and a suitable grid-based representation, as well as utilizing state-of-the-art GAN architectures and training methods to produce new structure designs. Our results show that GANs can be successfully applied to generate a varied range of complex and stable Angry Birds structures.
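The encoding/decoding step is the interesting engineering detail here. As a rough illustration of the idea only — the block palette, grid size, and single-channel labeling below are invented for this sketch, not taken from the paper — a grid rasterization and its greedy inverse might look like:

```python
import numpy as np

# Toy block palette: (width, height) in grid cells, loosely inspired by Angry
# Birds block shapes; the paper uses the game's actual block set and a richer
# encoding, so treat this purely as an illustration of the round-trip idea.
BLOCKS = {"square": (1, 1), "plank": (4, 1), "pillar": (1, 4)}
TYPE_IDS = {name: i + 1 for i, name in enumerate(BLOCKS)}  # 0 = empty cell

def encode(structure, grid_w=16, grid_h=16):
    """Rasterize a list of (block_type, x, y) placements into an integer grid."""
    grid = np.zeros((grid_h, grid_w), dtype=np.int64)
    for block_type, x, y in structure:
        w, h = BLOCKS[block_type]
        grid[y:y + h, x:x + w] = TYPE_IDS[block_type]
    return grid

def decode(grid):
    """Greedy inverse scan: emit a placement at the first unvisited cell of each
    block. Assumes same-type blocks never touch, which a real decoder would
    have to disambiguate."""
    names = {v: k for k, v in TYPE_IDS.items()}
    seen = np.zeros(grid.shape, dtype=bool)
    structure = []
    for y in range(grid.shape[0]):
        for x in range(grid.shape[1]):
            tid = grid[y, x]
            if tid and not seen[y, x]:
                name = names[tid]
                w, h = BLOCKS[name]
                structure.append((name, x, y))
                seen[y:y + h, x:x + w] = True
    return structure

level = [("pillar", 2, 4), ("pillar", 6, 4), ("plank", 2, 8)]
assert decode(encode(level)) == level  # round-trip on this toy level
```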

Detection of Unknown-Unknowns in Cyber-Physical Systems using Statistical Conformance with Physics Guided Process Models

  • paper_url: http://arxiv.org/abs/2309.02603
  • repo_url: None
  • paper_authors: Aranyak Maity, Ayan Banerjee, Sandeep Gupta
  • for: analyzing and evaluating cyber-physical systems under unknown-unknown scenarios, where operational behavior is not guaranteed to meet safety and efficacy requirements.
  • methods: dynamics-induced hybrid recurrent neural networks (DiH-RNN) mine a physics-guided surrogate model (PGSM), which is used to check model conformance of operational output characteristics via STL on the model coefficients (a toy STL robustness check follows the abstract below).
  • results: detects operational changes in an Artificial Pancreas (AP) caused by unknown insulin cartridge errors.
    Abstract Unknown unknowns are operational scenarios in a cyber-physical system that are not accounted for in the design and test phase. As such, under unknown-unknown scenarios, the operational behavior of the CPS is not guaranteed to meet requirements such as safety and efficacy specified using Signal Temporal Logic (STL) on the output trajectories. We propose a novel framework for analyzing the stochastic conformance of operational output characteristics of safety-critical cyber-physical systems that can discover unknown-unknown scenarios and evaluate potential safety hazards. We propose dynamics-induced hybrid recurrent neural networks (DiH-RNN) to mine a physics-guided surrogate model (PGSM) which is used to check the model conformance using STL on the model coefficients. We demonstrate the detection of operational changes in an Artificial Pancreas (AP) due to unknown insulin cartridge errors.
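For intuition, the STL safety requirements mentioned above reduce, in the simplest case, to computing a robustness margin over an output trajectory. A minimal sketch with an invented glucose trace and bounds (the paper's actual specifications and DiH-RNN machinery are much richer):

```python
# Robustness of the bounded "always" spec G(lo <= x <= hi): the worst-case
# margin over the trace. Positive => satisfied; negative => violated.
import numpy as np

def robustness_always_within(signal, lo, hi):
    margins = np.minimum(signal - lo, hi - signal)
    return margins.min()

glucose = np.array([110.0, 130.0, 160.0, 185.0, 175.0])  # mg/dL, toy trace
print(robustness_always_within(glucose, lo=70.0, hi=180.0))  # -5.0: violated
```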

Comparative Evaluation of Metaheuristic Algorithms for Hyperparameter Selection in Short-Term Weather Forecasting

  • paper_url: http://arxiv.org/abs/2309.02600
  • repo_url: None
  • paper_authors: Anuvab Sen, Arul Rhik Mazumder, Dibyarup Dutta, Udayon Sen, Pathikrit Syam, Sandipan Dhar
  • for: accurate short-term weather forecasting, where traditional statistical models struggle to capture the complex dynamics of weather systems.
  • methods: deep learning models (vanilla ANNs, LSTM and GRU networks) whose hyperparameters are selected automatically by metaheuristic algorithms (GA, DE, PSO); a toy GA search loop follows the abstract below.
  • results: the metaheuristics effectively find good hyperparameters and improve forecasting accuracy, measured by MSE and MAPE; performance depends on the pairing of model architecture and metaheuristic algorithm.
    Abstract Weather forecasting plays a vital role in numerous sectors, but accurately capturing the complex dynamics of weather systems remains a challenge for traditional statistical models. Apart from Auto Regressive time forecasting models like ARIMA, deep learning techniques (Vanilla ANNs, LSTM and GRU networks) have shown promise in improving forecasting accuracy by capturing temporal dependencies. This paper explores the application of metaheuristic algorithms, namely Genetic Algorithm (GA), Differential Evolution (DE), and Particle Swarm Optimization (PSO), to automate the search for optimal hyperparameters in these model architectures. Metaheuristic algorithms excel in global optimization, offering robustness, versatility, and scalability in handling non-linear problems. We present a comparative analysis of different model architectures integrated with metaheuristic optimization, evaluating their performance in weather forecasting based on metrics such as Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE). The results demonstrate the potential of metaheuristic algorithms in enhancing weather forecasting accuracy and help in determining the optimal set of hyper-parameters for each model. The paper underscores the importance of harnessing advanced optimization techniques to select the most suitable metaheuristic algorithm for the given weather forecasting task.
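As a sketch of the hyperparameter-search side only: a miniature genetic algorithm over an invented search space, with a stand-in fitness where a real run would train and validate an LSTM/GRU. None of the values below come from the paper.

```python
import random

SPACE = {"units": [16, 32, 64, 128], "lr": [1e-4, 1e-3, 1e-2], "window": [6, 12, 24]}

def fitness(cfg):
    # Stand-in for validation MSE of a model trained with cfg (lower is better);
    # the minimum is placed at units=64, lr=1e-3, window=12 for illustration.
    return ((cfg["units"] - 64) ** 2 / 1e4 + abs(cfg["lr"] - 1e-3)
            + (cfg["window"] - 12) ** 2 / 100)

def random_cfg():
    return {k: random.choice(v) for k, v in SPACE.items()}

def crossover(a, b):
    return {k: random.choice([a[k], b[k]]) for k in SPACE}

def mutate(cfg, rate=0.2):
    return {k: (random.choice(SPACE[k]) if random.random() < rate else v)
            for k, v in cfg.items()}

def ga_search(pop_size=10, generations=15):
    pop = [random_cfg() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)                 # rank by (stand-in) validation MSE
        parents = pop[: pop_size // 2]        # truncation selection
        children = [mutate(crossover(*random.sample(parents, 2)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return min(pop, key=fitness)

print(ga_search())  # expected to converge near units=64, lr=1e-3, window=12
```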

Approximating High-Dimensional Minimal Surfaces with Physics-Informed Neural Networks

  • paper_url: http://arxiv.org/abs/2309.02589
  • repo_url: None
  • paper_authors: Steven Zhou, Xiaojing Ye
  • for: computing numerical approximations of high-dimensional minimal surfaces, where classical methods fail due to the Curse of Dimensionality.
  • methods: a Physics-Informed Neural Network (PINN) that trains a deep neural network (DNN) to solve the minimal surface PDE (a compact PyTorch-style sketch follows the abstract below).
  • results: the PINN scales to higher dimensions and can be trained relatively quickly even on a laptop without a GPU; high-dimensional outputs are visualized as 3-D slices with enough axes held fixed.
    Abstract In this paper, we compute numerical approximations of the minimal surfaces, an essential type of Partial Differential Equation (PDE), in higher dimensions. Classical methods cannot handle it in this case because of the Curse of Dimensionality, where the computational cost of these methods increases exponentially fast in response to higher problem dimensions, far beyond the computing capacity of any modern supercomputers. Only in the past few years have machine learning researchers been able to mitigate this problem. The solution method chosen here is a model known as a Physics-Informed Neural Network (PINN) which trains a deep neural network (DNN) to solve the minimal surface PDE. It can be scaled up into higher dimensions and trained relatively quickly even on a laptop with no GPU. Due to the inability to view the high-dimension output, our data is presented as snippets of a higher-dimension shape with enough fixed axes so that it is viewable with 3-D graphs. Not only will the functionality of this method be tested, but we will also explore potential limitations in the method's performance.
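The PINN recipe itself is compact enough to sketch. Below is a generic PyTorch-style version for the minimal surface equation div(∇u / √(1 + |∇u|²)) = 0 on the unit cube; the network width, sampling scheme, and toy Dirichlet boundary data are all illustrative choices, not the paper's configuration.

```python
import torch

d = 4  # spatial dimension; the paper's point is that d can be pushed much higher
net = torch.nn.Sequential(
    torch.nn.Linear(d, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)

def pde_residual(x):
    """Residual of div( grad(u) / sqrt(1 + |grad(u)|^2) ) via autograd."""
    x = x.requires_grad_(True)
    u = net(x)
    grad_u = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    flux = grad_u / torch.sqrt(1.0 + (grad_u ** 2).sum(dim=1, keepdim=True))
    div = torch.zeros(x.shape[0])
    for i in range(d):  # divergence: sum over i of d(flux_i)/dx_i
        div = div + torch.autograd.grad(flux[:, i].sum(), x, create_graph=True)[0][:, i]
    return div

def boundary_values(x):  # toy Dirichlet data; a real problem supplies its own
    return (x ** 2).sum(dim=1, keepdim=True)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):
    x_in = torch.rand(256, d)          # interior collocation points
    x_bd = torch.rand(256, d)          # boundary points: snap one coordinate
    idx = torch.randint(d, (256,))     # of each point to a random cube face
    x_bd[torch.arange(256), idx] = torch.randint(2, (256,)).float()
    loss = pde_residual(x_in).pow(2).mean() \
         + (net(x_bd) - boundary_values(x_bd)).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```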

Representation Learning for Sequential Volumetric Design Tasks

  • paper_url: http://arxiv.org/abs/2309.02583
  • repo_url: None
  • paper_authors: Md Ferdous Alam, Yi Wang, Linh Tran, Chin-Yi Cheng, Jieliang Luo
  • for: encoding the design knowledge embedded in sequential volumetric (massing) design so that design solutions can be evaluated and generated automatically.
  • methods: transformer-based models extract useful representations from a collection of expert or high-performing design sequences; these representations drive design preference evaluation (by estimating their density) and procedural design generation (via an autoregressive transformer). A toy density-based preference sketch follows the abstract below.
  • results: on a novel dataset of thousands of sequential volumetric designs, the preference model compares two arbitrarily given design sequences with almost 90% accuracy against random design sequences, and the autoregressive model can autocomplete a volumetric design sequence from a partial one.
    Abstract Volumetric design, also called massing design, is the first and critical step in professional building design which is sequential in nature. As the volumetric design process is complex, the underlying sequential design process encodes valuable information for designers. Many efforts have been made to automatically generate reasonable volumetric designs, but the quality of the generated design solutions varies, and evaluating a design solution requires either a prohibitively comprehensive set of metrics or expensive human expertise. While previous approaches focused on learning only the final design instead of sequential design tasks, we propose to encode the design knowledge from a collection of expert or high-performing design sequences and extract useful representations using transformer-based models. Later we propose to utilize the learned representations for crucial downstream applications such as design preference evaluation and procedural design generation. We develop the preference model by estimating the density of the learned representations whereas we train an autoregressive transformer model for sequential design generation. We demonstrate our ideas by leveraging a novel dataset of thousands of sequential volumetric designs. Our preference model can compare two arbitrarily given design sequences and is almost 90% accurate in evaluation against random design sequences. Our autoregressive model is also capable of autocompleting a volumetric design sequence from a partial design sequence.
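The preference-model half has a particularly simple skeleton: fit a density to embeddings of good sequences and prefer the higher-density candidate. A sketch with random vectors standing in for the learned transformer representations (the KDE choice and all numbers are invented for illustration):

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
expert_embeddings = rng.normal(0.0, 1.0, size=(500, 16))  # stand-in for learned features
kde = KernelDensity(bandwidth=0.5).fit(expert_embeddings)

def prefer(seq_a_embedding, seq_b_embedding):
    """Return 'A' if sequence A looks more expert-like than B, else 'B'."""
    score_a, score_b = kde.score_samples(np.stack([seq_a_embedding, seq_b_embedding]))
    return "A" if score_a > score_b else "B"

good = rng.normal(0.0, 1.0, size=16)   # near the expert distribution
bad = rng.normal(5.0, 1.0, size=16)    # far from it, like a random design
print(prefer(good, bad))               # 'A'
```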

Unveiling Intractable Epileptogenic Brain Networks with Deep Learning Algorithms: A Novel and Comprehensive Framework for Scalable Seizure Prediction with Unimodal Neuroimaging Data in Pediatric Patients

  • paper_url: http://arxiv.org/abs/2309.02580
  • repo_url: None
  • paper_authors: Bliss Singhal, Fnu Pooja
  • for: predicting seizures in pediatric patients with intractable epilepsy.
  • methods: machine learning algorithms evaluated on unimodal neuroimaging data consisting of electroencephalogram signals; bandpass filtering and independent component analysis reduce noise and artifacts in the dataset (a preprocessing sketch follows the abstract below).
  • results: deep learning algorithms predict seizures more successfully than logistic regression and k-nearest neighbors; the RNN gives the highest precision and F1 score, the LSTM the highest accuracy, and the CNN the highest specificity.
    Abstract Epilepsy is a prevalent neurological disorder affecting 50 million individuals worldwide and 1.2 million Americans. There exist millions of pediatric patients with intractable epilepsy, a condition in which seizures fail to come under control. The occurrence of seizures can result in physical injury, disorientation, unconsciousness, and additional symptoms that could impede children's ability to participate in everyday tasks. Predicting seizures can help parents and healthcare providers take precautions, prevent risky situations, and mentally prepare children to minimize anxiety and nervousness associated with the uncertainty of a seizure. This research proposes a novel and comprehensive framework to predict seizures in pediatric patients by evaluating machine learning algorithms on unimodal neuroimaging data consisting of electroencephalogram signals. The bandpass filtering and independent component analysis proved to be effective in reducing the noise and artifacts from the dataset. Various machine learning algorithms' performance is evaluated on important metrics such as accuracy, precision, specificity, sensitivity, F1 score and MCC. The results show that the deep learning algorithms are more successful in predicting seizures than logistic Regression, and k nearest neighbors. The recurrent neural network (RNN) gave the highest precision and F1 Score, long short-term memory (LSTM) outperformed RNN in accuracy and convolutional neural network (CNN) resulted in the highest Specificity. This research has significant implications for healthcare providers in proactively managing seizure occurrence in pediatric patients, potentially transforming clinical practices, and improving pediatric care.
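The preprocessing pipeline named above (bandpass filtering plus independent component analysis) is standard enough to sketch; the sampling rate, band edges, and synthetic channels below are illustrative stand-ins for real EEG, not the paper's settings.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.decomposition import FastICA

fs = 256.0                                    # Hz; an assumed sampling rate
t = np.arange(0, 10, 1 / fs)
# Three synthetic "channels": theta, alpha, and a 60 Hz line-noise artifact.
eeg = np.stack([np.sin(2 * np.pi * f * t) for f in (4, 10, 60)])
eeg += 0.1 * np.random.default_rng(0).normal(size=eeg.shape)

def bandpass(x, lo=0.5, hi=40.0):
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x, axis=-1)         # zero-phase bandpass

filtered = bandpass(eeg)                      # attenuates the 60 Hz channel
components = FastICA(n_components=3, random_state=0).fit_transform(filtered.T).T
print(components.shape)                       # (3, 2560) independent components
```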

Recurrence-Free Survival Prediction for Anal Squamous Cell Carcinoma Chemoradiotherapy using Planning CT-based Radiomics Model

  • paper_url: http://arxiv.org/abs/2309.02562
  • repo_url: None
  • paper_authors: Shanshan Tang, Kai Wang, David Hein, Gloria Lin, Nina N. Sanford, Jing Wang
  • for: developing a model that uses radiation pretreatment planning CT images to predict recurrence-free survival (RFS) in non-metastatic anal squamous cell carcinoma (ASCC) patients after chemoradiotherapy (CRT).
  • methods: radiomics features extracted from planning CT images of 96 patients; step-forward feature selection with a multivariate Cox proportional hazard model selects the optimal feature set, and a radiomics-clinical combined model is evaluated with five repeats of five-fold cross-validation (a forward-selection sketch follows the abstract below).
  • results: shape- and texture-based radiomics features significantly predict RFS; the combined model outperforms the clinical-only model on the testing cohort (C-index 0.80 vs 0.73; AUC 0.84 vs 0.79 for 1-year, 0.84 vs 0.78 for 2-year, and 0.86 vs 0.83 for 3-year RFS), yielding distinct high- and low-risk groups (p<0.001).
    Abstract Objectives: Approximately 30% of non-metastatic anal squamous cell carcinoma (ASCC) patients will experience recurrence after chemoradiotherapy (CRT), and currently available clinical variables are poor predictors of treatment response. We aimed to develop a model leveraging information extracted from radiation pretreatment planning CT to predict recurrence-free survival (RFS) in ASCC patients after CRT. Methods: Radiomics features were extracted from planning CT images of 96 ASCC patients. Following pre-feature selection, the optimal feature set was selected via step-forward feature selection with a multivariate Cox proportional hazard model. The RFS prediction was generated from a radiomics-clinical combined model based on an optimal feature set with five repeats of five-fold cross validation. The risk stratification ability of the proposed model was evaluated with Kaplan-Meier analysis. Results: Shape- and texture-based radiomics features significantly predicted RFS. Compared to a clinical-only model, radiomics-clinical combined model achieves better performance in the testing cohort with higher C-index (0.80 vs 0.73) and AUC (0.84 vs 0.79 for 1-year RFS, 0.84 vs 0.78 for 2-year RFS, and 0.86 vs 0.83 for 3-year RFS), leading to distinctive high- and low-risk of recurrence groups (p<0.001). Conclusions: A treatment planning CT based radiomics and clinical combined model had improved prognostic performance in predicting RFS for ASCC patients treated with CRT as compared to a model using clinical features only.
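The feature-selection loop is the reusable part. A sketch of step-forward selection driven by the Cox model's concordance index, on a synthetic dataframe standing in for the radiomics features; lifelines is an assumed library choice here, not one named by the paper.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 96  # matches the cohort size, but the data itself is synthetic
df = pd.DataFrame({f"feat{i}": rng.normal(size=n) for i in range(6)})
df["duration"] = rng.exponential(scale=24, size=n) + df["feat0"].clip(0) * 6
df["event"] = rng.integers(0, 2, size=n)

def forward_select(df, candidates):
    """Greedily add the feature that most improves the model's C-index."""
    selected, best_so_far = [], -np.inf
    while True:
        scores = {}
        for f in candidates:
            if f in selected:
                continue
            cph = CoxPHFitter().fit(df[selected + [f, "duration", "event"]],
                                    "duration", "event")
            scores[f] = cph.concordance_index_
        if not scores:
            break
        best = max(scores, key=scores.get)
        if scores[best] <= best_so_far:       # stop once the C-index stalls
            break
        selected.append(best)
        best_so_far = scores[best]
    return selected

print(forward_select(df, [c for c in df.columns if c.startswith("feat")]))
```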

Physically Grounded Vision-Language Models for Robotic Manipulation

  • paper_url: http://arxiv.org/abs/2309.02561
  • repo_url: https://github.com/Stanford-ILIAD/pg-vlm
  • paper_authors: Jensen Gao, Bidipta Sarkar, Fei Xia, Ted Xiao, Jiajun Wu, Brian Ichter, Anirudha Majumdar, Dorsa Sadigh
  • for: improving vision-language models' understanding of physical object concepts (e.g., material, fragility) so they are more useful for robotic manipulation tasks.
  • methods: fine-tuning a VLM on PhysObjects, an object-centric dataset of 39.6K crowd-sourced and 417K automated physical concept annotations of common household objects, capturing human priors about these concepts from visual appearance; the physically grounded VLM is then integrated with an LLM-based robotic planner.
  • results: improved understanding of physical object concepts, including generalization to held-out concepts; better planning performance on tasks requiring physical reasoning than baselines without physically grounded VLMs; and improved task success rates on a real robot.
    Abstract Recent advances in vision-language models (VLMs) have led to improved performance on tasks such as visual question answering and image captioning. Consequently, these models are now well-positioned to reason about the physical world, particularly within domains such as robotic manipulation. However, current VLMs are limited in their understanding of the physical concepts (e.g., material, fragility) of common objects, which restricts their usefulness for robotic manipulation tasks that involve interaction and physical reasoning about such objects. To address this limitation, we propose PhysObjects, an object-centric dataset of 39.6K crowd-sourced and 417K automated physical concept annotations of common household objects. We demonstrate that fine-tuning a VLM on PhysObjects improves its understanding of physical object concepts, including generalization to held-out concepts, by capturing human priors of these concepts from visual appearance. We incorporate this physically-grounded VLM in an interactive framework with a large language model-based robotic planner, and show improved planning performance on tasks that require reasoning about physical object concepts, compared to baselines that do not leverage physically-grounded VLMs. We additionally illustrate the benefits of our physically-grounded VLM on a real robot, where it improves task success rates. We release our dataset and provide further details and visualizations of our results at https://iliad.stanford.edu/pg-vlm/.

Automating Behavioral Testing in Machine Translation

  • paper_url: http://arxiv.org/abs/2309.02553
  • repo_url: https://github.com/Sfedfcv/redesigned-pancake
  • paper_authors: Javier Ferrando, Matthias Sperber, Hendra Setiawan, Dominic Telaar, Saša Hasan
  • for: fine-grained behavioral evaluation of machine translation systems' linguistic capabilities through analysis of input-output behavior.
  • methods: large language models generate a diverse set of source sentences tailored to probe MT behavior in a range of situations, along with candidate sets used to verify whether the MT model exhibits the expected behavior (a toy test-harness sketch follows the abstract below).
  • results: across multiple available MT systems, pass rates broadly follow the trends of traditional accuracy-based metrics, but the method uncovers several important differences and potential bugs that go unnoticed when relying on accuracy alone.
    Abstract Behavioral testing in NLP allows fine-grained evaluation of systems by examining their linguistic capabilities through the analysis of input-output behavior. Unfortunately, existing work on behavioral testing in Machine Translation (MT) is currently restricted to largely handcrafted tests covering a limited range of capabilities and languages. To address this limitation, we propose to use Large Language Models (LLMs) to generate a diverse set of source sentences tailored to test the behavior of MT models in a range of situations. We can then verify whether the MT model exhibits the expected behavior through matching candidate sets that are also generated using LLMs. Our approach aims to make behavioral testing of MT systems practical while requiring only minimal human effort. In our experiments, we apply our proposed evaluation framework to assess multiple available MT systems, revealing that while in general pass-rates follow the trends observable from traditional accuracy-based metrics, our method was able to uncover several important differences and potential bugs that go unnoticed when relying only on accuracy.
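Stripped of the LLM machinery, the evaluation loop reduces to matching MT output against a candidate set per probe sentence. A toy harness with hand-written probes and a stubbed `translate`, where the paper would plug in LLM-generated sentences and candidates plus a real MT system:

```python
def translate(src):  # stand-in for the MT system under test
    return {"She gave him her book.": "Sie gab ihm ihr Buch."}.get(src, "")

tests = [
    # (source, acceptable German candidates probing pronoun-gender handling)
    ("She gave him her book.", {"Sie gab ihm ihr Buch."}),
    ("He gave her his book.", {"Er gab ihr sein Buch."}),
]

passes = sum(translate(src) in candidates for src, candidates in tests)
print(f"pass rate: {passes}/{len(tests)}")  # 1/2 with this toy stub
```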

Continual Improvement of Threshold-Based Novelty Detection

  • paper_url: http://arxiv.org/abs/2309.02551
  • repo_url: None
  • paper_authors: Abe Ejilemele, Jorge Mendez-Mendez
  • for: neural networks struggle to detect unseen classes in dynamic, open-world settings, and threshold-based novelty detection methods require manually specified thresholds that cannot adapt to the nature of the data.
  • methods: automatic threshold selection using a linear search and leave-one-out cross-validation on the in-distribution (ID) classes (a toy threshold-search sketch follows the abstract below).
  • results: improved total accuracy on MNIST, Fashion MNIST, and CIFAR-10.
    Abstract When evaluated in dynamic, open-world situations, neural networks struggle to detect unseen classes. This issue complicates the deployment of continual learners in realistic environments where agents are not explicitly informed when novel categories are encountered. A common family of techniques for detecting novelty relies on thresholds of similarity between observed data points and the data used for training. However, these methods often require manually specifying (ahead of time) the value of these thresholds, and are therefore incapable of adapting to the nature of the data. We propose a new method for automatically selecting these thresholds utilizing a linear search and leave-one-out cross-validation on the ID classes. We demonstrate that this novel method for selecting thresholds results in improved total accuracy on MNIST, Fashion MNIST, and CIFAR-10.
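The selection procedure is easy to sketch: sweep candidate thresholds and score each one by treating each held-out ID class in turn as pseudo-novel. The similarity scores below are synthetic, and this exact leave-one-out protocol is a guess at the paper's setup rather than a reproduction of it.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic similarity scores for 5 ID classes: `own` is each validation
# sample's score against its own class's training data, `cross` its score
# against the other classes' data (used when the class plays the novel role).
own = {c: rng.normal(0.8, 0.1, size=50) for c in range(5)}
cross = {c: rng.normal(0.4, 0.1, size=50) for c in range(5)}

def pick_threshold(grid=np.linspace(0.0, 1.0, 101)):
    best_t, best_acc = None, -1.0
    for t in grid:                       # the linear search
        accs = []
        for held_out in own:             # leave one ID class out as pseudo-novel
            id_scores = np.concatenate([own[c] for c in own if c != held_out])
            accs.append(0.5 * ((id_scores >= t).mean()
                               + (cross[held_out] < t).mean()))
        if np.mean(accs) > best_acc:
            best_t, best_acc = float(t), float(np.mean(accs))
    return best_t, best_acc

print(pick_threshold())                  # threshold lands near 0.6
```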

Structural Concept Learning via Graph Attention for Multi-Level Rearrangement Planning

  • paper_url: http://arxiv.org/abs/2309.02547
  • repo_url: None
  • paper_authors: Manav Kulshrestha, Ahmed H. Qureshi
  • for: robotic manipulation tasks such as object rearrangement in complex, unconstrained environments, where scenes have structural dependency hierarchies rather than geometrically simple dependencies like tower stacking.
  • methods: Structural Concept Learning (SCL), a deep learning approach that uses graph attention networks for multi-level object rearrangement planning; it is trained on a self-generated simulation dataset with intuitive structures and infers independent substructures, allowing task parallelization over multiple manipulators.
  • results: compared with a range of classical and model-based baselines, SCL leverages its scene understanding to achieve better performance, flexibility, and efficiency; it works on unseen scenes with arbitrary numbers of objects and generalizes to the real world.
    Abstract Robotic manipulation tasks, such as object rearrangement, play a crucial role in enabling robots to interact with complex and arbitrary environments. Existing work focuses primarily on single-level rearrangement planning and, even if multiple levels exist, dependency relations among substructures are geometrically simpler, like tower stacking. We propose Structural Concept Learning (SCL), a deep learning approach that leverages graph attention networks to perform multi-level object rearrangement planning for scenes with structural dependency hierarchies. It is trained on a self-generated simulation data set with intuitive structures, works for unseen scenes with an arbitrary number of objects and higher complexity of structures, infers independent substructures to allow for task parallelization over multiple manipulators, and generalizes to the real world. We compare our method with a range of classical and model-based baselines to show that our method leverages its scene understanding to achieve better performance, flexibility, and efficiency. The dataset, supplementary details, videos, and code implementation are available at: https://manavkulshrestha.github.io/scl

Experience and Prediction: A Metric of Hardness for a Novel Litmus Test

  • paper_url: http://arxiv.org/abs/2309.02534
  • repo_url: None
  • paper_authors: Nicos Isaak, Loizos Michael
  • for: developing a machine-learning-based system that outputs the hardness index of any Winograd schema faster and more accurately than previously used methods.
  • methods: two approaches to estimating schema hardness: a random forest and a deep learning (LSTM-based) model.
  • results: the system can serve as an extension of any system that differentiates Winograd schemas by their perceived hardness for humans (e.g., future challenges or the WSC CAPTCHA service); a large-scale experiment additionally shows how human performance varies across Winograd schemas.
    Abstract In the last decade, the Winograd Schema Challenge (WSC) has become a central aspect of the research community as a novel litmus test. Consequently, the WSC has spurred research interest because it can be seen as the means to understand human behavior. In this regard, the development of new techniques has made possible the usage of Winograd schemas in various fields, such as the design of novel forms of CAPTCHAs. Work from the literature that established a baseline for human adult performance on the WSC has shown that not all schemas are the same, meaning that they could potentially be categorized according to their perceived hardness for humans. In this regard, this hardness-metric could be used in future challenges or in the WSC CAPTCHA service to differentiate between Winograd schemas. Recent work of ours has shown that this could be achieved via the design of an automated system that is able to output the hardness-indexes of Winograd schemas, albeit with limitations regarding the number of schemas it could be applied on. This paper adds to previous research by presenting a new system that is based on Machine Learning (ML), able to output the hardness of any Winograd schema faster and more accurately than any other previously used method. Our developed system, which works within two different approaches, namely the random forest and deep learning (LSTM-based), is ready to be used as an extension of any other system that aims to differentiate between Winograd schemas, according to their perceived hardness for humans. At the same time, along with our developed system we extend previous work by presenting the results of a large-scale experiment that shows how human performance varies across Winograd schemas.

Do You Trust ChatGPT? – Perceived Credibility of Human and AI-Generated Content

  • paper_url: http://arxiv.org/abs/2309.02524
  • repo_url: None
  • paper_authors: Martin Huschens, Martin Briesch, Dominik Sobania, Franz Rothlauf
  • for: examining how individuals perceive the credibility of human-authored content versus content generated by large language models, across different user interface versions.
  • methods: participants rated the credibility, competence, and trustworthiness of human- and AI-generated content presented in different user interface versions.
  • results: regardless of the user interface presentation, participants attributed similar levels of credibility and reported no difference in perceived competence and trustworthiness, yet rated AI-generated content as clearer and more engaging; the findings call for caution and critical thinking when engaging with AI-generated content.
    Abstract This paper examines how individuals perceive the credibility of content originating from human authors versus content generated by large language models, like the GPT language model family that powers ChatGPT, in different user interface versions. Surprisingly, our results demonstrate that regardless of the user interface presentation, participants tend to attribute similar levels of credibility. While participants also do not report any different perceptions of competence and trustworthiness between human and AI-generated content, they rate AI-generated content as being clearer and more engaging. The findings from this study serve as a call for a more discerning approach to evaluating information sources, encouraging users to exercise caution and critical thinking when engaging with content generated by AI systems.

Efficient RL via Disentangled Environment and Agent Representations

  • paper_url: http://arxiv.org/abs/2309.02435
  • repo_url: None
  • paper_authors: Kevin Gmelin, Shikhar Bahl, Russell Mendonca, Deepak Pathak
  • for: improving the visual representations learned by RL agents.
  • methods: learning structured representations that separate agent from environment, using visual knowledge of the agent such as its shape or mask (often inexpensive to obtain), incorporated into the RL objective via a simple auxiliary loss (a toy auxiliary-loss sketch follows the abstract below).
  • results: outperforms state-of-the-art model-free approaches across 18 challenging visual simulation environments spanning 5 different robots.
    Abstract Agents that are aware of the separation between themselves and their environments can leverage this understanding to form effective representations of visual input. We propose an approach for learning such structured representations for RL algorithms, using visual knowledge of the agent, such as its shape or mask, which is often inexpensive to obtain. This is incorporated into the RL objective using a simple auxiliary loss. We show that our method, Structured Environment-Agent Representations, outperforms state-of-the-art model-free approaches over 18 different challenging visual simulation environments spanning 5 different robots. Website at https://sear-rl.github.io/
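The auxiliary-loss idea is simple to sketch: a decoder head predicts the agent's mask from the shared encoding, pushing the representation to disentangle agent from environment. Network sizes, shapes, and the random batch below are illustrative, and the RL loss itself is elided.

```python
import torch

encoder = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, 2, 1), torch.nn.ReLU(),
    torch.nn.Conv2d(16, 32, 3, 2, 1), torch.nn.ReLU(),
)
mask_head = torch.nn.Sequential(
    torch.nn.ConvTranspose2d(32, 16, 4, 2, 1), torch.nn.ReLU(),
    torch.nn.ConvTranspose2d(16, 1, 4, 2, 1),
)

obs = torch.rand(8, 3, 64, 64)                           # batch of observations
agent_mask = (torch.rand(8, 1, 64, 64) > 0.8).float()    # cheap-to-obtain agent mask

z = encoder(obs)                                         # shared features for the policy
aux_loss = torch.nn.functional.binary_cross_entropy_with_logits(
    mask_head(z), agent_mask)
# total_loss = rl_loss + lambda_aux * aux_loss           # combined objective (RL part elided)
```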

Building a Winning Team: Selecting Source Model Ensembles using a Submodular Transferability Estimation Approach

  • paper_url: http://arxiv.org/abs/2309.02429
  • repo_url: None
  • paper_authors: Vimal K B, Saketh Bachu, Tanmay Garg, Niveditha Lakshmi Narasimhan, Raghavan Konuru, Vineeth N Balasubramanian
  • for: estimating the transferability of ensembles of publicly available pretrained source models to a given target task.
  • methods: OSBORN, a novel Optimal Transport-based Submodular Transferability metric that collectively accounts for image domain difference, task difference, and the cohesiveness of the models in the ensemble.
  • results: benchmarked across 28 source datasets, 11 target datasets, 5 model architectures, and 2 pre-training methods on image classification and semantic segmentation, OSBORN consistently outperforms the state-of-the-art metrics MS-LEEP and E-LEEP.
    Abstract Estimating the transferability of publicly available pretrained models to a target task has assumed an important place for transfer learning tasks in recent years. Existing efforts propose metrics that allow a user to choose one model from a pool of pre-trained models without having to fine-tune each model individually and identify one explicitly. With the growth in the number of available pre-trained models and the popularity of model ensembles, it also becomes essential to study the transferability of multiple-source models for a given target task. The few existing efforts study transferability in such multi-source ensemble settings using just the outputs of the classification layer and neglect possible domain or task mismatch. Moreover, they overlook the most important factor while selecting the source models, viz., the cohesiveness factor between them, which can impact the performance and confidence in the prediction of the ensemble. To address these gaps, we propose a novel Optimal tranSport-based suBmOdular tRaNsferability metric (OSBORN) to estimate the transferability of an ensemble of models to a downstream task. OSBORN collectively accounts for image domain difference, task difference, and cohesiveness of models in the ensemble to provide reliable estimates of transferability. We gauge the performance of OSBORN on both image classification and semantic segmentation tasks. Our setup includes 28 source datasets, 11 target datasets, 5 model architectures, and 2 pre-training methods. We benchmark our method against current state-of-the-art metrics MS-LEEP and E-LEEP, and outperform them consistently using the proposed approach.

Cognitive Architectures for Language Agents

  • paper_url: http://arxiv.org/abs/2309.02427
  • repo_url: https://github.com/ysymyth/awesome-language-agents
  • paper_authors: Theodore Sumers, Shunyu Yao, Karthik Narasimhan, Thomas L. Griffiths
  • for: developing a blueprint for a new wave of cognitive language agents that combine LLMs with external resources (e.g., the Internet) or internal control flows (e.g., prompt chaining).
  • methods: drawing on the history of agent design in symbolic artificial intelligence, the paper proposes Cognitive Architectures for Language Agents (CoALA), a conceptual framework that systematizes diverse methods for LLM-based reasoning, grounding, learning, and decision making.
  • results: LLMs share many properties with production systems, and recent efforts to improve their grounding or reasoning mirror the development of cognitive architectures built around production systems; the CoALA framework highlights gaps and actionable directions toward more capable language agents.
    Abstract Recent efforts have incorporated large language models (LLMs) with external resources (e.g., the Internet) or internal control flows (e.g., prompt chaining) for tasks requiring grounding or reasoning. However, these efforts have largely been piecemeal, lacking a systematic framework for constructing a fully-fledged language agent. To address this challenge, we draw on the rich history of agent design in symbolic artificial intelligence to develop a blueprint for a new wave of cognitive language agents. We first show that LLMs have many of the same properties as production systems, and recent efforts to improve their grounding or reasoning mirror the development of cognitive architectures built around production systems. We then propose Cognitive Architectures for Language Agents (CoALA), a conceptual framework to systematize diverse methods for LLM-based reasoning, grounding, learning, and decision making as instantiations of language agents in the framework. Finally, we use the CoALA framework to highlight gaps and propose actionable directions toward more capable language agents in the future.

A Context-Sensitive Approach to XAI in Music Performance

  • paper_url: http://arxiv.org/abs/2309.04491
  • repo_url: None
  • paper_authors: Nicola Privato, Jack Armitage
  • for: proposing an Explanatory Pragmatism (EP) framework for explainable AI (XAI) in music performance.
  • methods: developing explainability requirements that are sensitive to context and audience, tailoring explanations to specific audiences and continuously refining them based on feedback.
  • results: EP offers a promising direction for enhancing the transparency and interpretability of AI systems in broad artistic applications, and in music performance specifically.
    Abstract The rapidly evolving field of Explainable Artificial Intelligence (XAI) has generated significant interest in developing methods to make AI systems more transparent and understandable. However, the problem of explainability cannot be exhaustively solved in the abstract, as there is no single approach that can be universally applied to generate adequate explanations for any given AI system, and this is especially true in the arts. In this position paper, we propose an Explanatory Pragmatism (EP) framework for XAI in music performance, emphasising the importance of context and audience in the development of explainability requirements. By tailoring explanations to specific audiences and continuously refining them based on feedback, EP offers a promising direction for enhancing the transparency and interpretability of AI systems in broad artistic applications and more specifically to music performance.

Information Processing by Neuron Populations in the Central Nervous System: Mathematical Structure of Data and Operations

  • paper_url: http://arxiv.org/abs/2309.02332
  • repo_url: None
  • paper_authors: Martin N. P. Nilsson
  • for: understanding how neuron populations in the mammalian central nervous system encode and operate on information.
  • methods: starting from a state-of-the-art mechanistic model of a generic neuron endowed with plasticity, the paper characterizes the mathematical structure of population representations and operations.
  • results: the representation and manipulation of information can be precisely characterized by an algebra of finite convex cones; interconnected neuron populations act as operators within this structure, implementing operations such as specialization, generalization, novelty detection, dimensionality reduction, inverse modeling, prediction, and associative memory, with implications for cognitive science and AI.
    Abstract In the intricate architecture of the mammalian central nervous system, neurons form populations. Axonal bundles communicate between these clusters using spike trains as their medium. However, these neuron populations' precise encoding and operations have yet to be discovered. In our analysis, the starting point is a state-of-the-art mechanistic model of a generic neuron endowed with plasticity. From this simple framework emerges a profound mathematical construct: The representation and manipulation of information can be precisely characterized by an algebra of finite convex cones. Furthermore, these neuron populations are not merely passive transmitters. They act as operators within this algebraic structure, mirroring the functionality of a low-level programming language. When these populations interconnect, they embody succinct yet potent algebraic expressions. These networks allow them to implement many operations, such as specialization, generalization, novelty detection, dimensionality reduction, inverse modeling, prediction, and associative memory. In broader terms, this work illuminates the potential of matrix embeddings in advancing our understanding in fields like cognitive science and AI. These embeddings enhance the capacity for concept processing and hierarchical description over their vector counterparts.

Neurosymbolic Meta-Reinforcement Lookahead Learning Achieves Safe Self-Driving in Non-Stationary Environments

  • paper_url: http://arxiv.org/abs/2309.02328
  • repo_url: None
  • paper_authors: Haozhe Lei, Quanyan Zhu
  • for: This paper focuses on the integration of machine learning into self-driving technology, with a specific emphasis on ensuring safety and efficiency in real-world applications.
  • methods: The paper introduces an algorithm for online meta-reinforcement learning, called Neurosymbolic Meta-Reinforcement Lookahead Learning (NUMERLA), which combines lookahead symbolic constraints with online adaptation to ensure both efficiency and safety.
  • results: The experimental results demonstrate that NUMERLA enables the self-driving agent to adapt in real-time to non-stationary urban human-vehicle interaction scenarios, leading to safe and self-adaptive driving.
    Abstract In the area of learning-driven artificial intelligence advancement, the integration of machine learning (ML) into self-driving (SD) technology stands as an impressive engineering feat. Yet, in real-world applications outside the confines of controlled laboratory scenarios, the deployment of self-driving technology assumes a life-critical role, necessitating heightened attention from researchers towards both safety and efficiency. To illustrate, when a self-driving model encounters an unfamiliar environment in real-time execution, the focus must not solely revolve around enhancing its anticipated performance; equal consideration must be given to ensuring its execution or real-time adaptation maintains a requisite level of safety. This study introduces an algorithm for online meta-reinforcement learning, employing lookahead symbolic constraints based on Neurosymbolic Meta-Reinforcement Lookahead Learning (NUMERLA). NUMERLA proposes a lookahead updating mechanism that harmonizes the efficiency of online adaptations with the overarching goal of ensuring long-term safety. Experimental results demonstrate NUMERLA confers the self-driving agent with the capacity for real-time adaptability, leading to safe and self-adaptive driving under non-stationary urban human-vehicle interaction scenarios.

Revisiting File Context for Source Code Summarization

  • paper_url: http://arxiv.org/abs/2309.02326
  • repo_url: https://github.com/apcl-research/transformerfc
  • paper_authors: Aakash Bansal, Chia-Yi Su, Collin McMillan
  • for: improving source code summarization, since the information needed to describe a subroutine often resides in nearby code rather than in the snippet itself.
  • methods: a novel modification of the Transformer architecture purpose-built to encode "file context", i.e., select information from other subroutines in the same file.
  • results: file context improves over several baselines and helps on a subset of challenging examples where traditional approaches struggle.
    Abstract Source code summarization is the task of writing natural language descriptions of source code. A typical use case is generating short summaries of subroutines for use in API documentation. The heart of almost all current research into code summarization is the encoder-decoder neural architecture, and the encoder input is almost always a single subroutine or other short code snippet. The problem with this setup is that the information needed to describe the code is often not present in the code itself -- that information often resides in other nearby code. In this paper, we revisit the idea of ``file context'' for code summarization. File context is the idea of encoding select information from other subroutines in the same file. We propose a novel modification of the Transformer architecture that is purpose-built to encode file context and demonstrate its improvement over several baselines. We find that file context helps on a subset of challenging examples where traditional approaches struggle.

SeisCLIP: A seismology foundation model pre-trained by multi-modal data for multi-purpose seismic feature extraction

  • paper_url: http://arxiv.org/abs/2309.02320
  • repo_url: https://github.com/sixu0/SeisCLIP
  • paper_authors: Xu Si, Xinming Wu, Hanlin Sheng, Jun Zhu, Zefeng Li
  • for: developing a seismology foundation model that serves seismologists across tasks and regions.
  • methods: contrastive learning jointly pre-trains a transformer encoder for time-frequency seismic spectra and an MLP encoder for the phase and source information of the same event on a vast multi-modal dataset; the spectrum encoder is subsequently fine-tuned on smaller datasets for downstream tasks (a generic contrastive-loss sketch follows the abstract below).
  • results: SeisCLIP surpasses baseline methods on event classification, localization, and focal mechanism analysis, using distinct datasets from different regions.
    Abstract Training specific deep learning models for particular tasks is common across various domains within seismology. However, this approach encounters two limitations: inadequate labeled data for certain tasks and limited generalization across regions. To address these challenges, we develop SeisCLIP, a seismology foundation model trained through contrastive learning from multi-modal data. It consists of a transformer encoder for extracting crucial features from time-frequency seismic spectrum and an MLP encoder for integrating the phase and source information of the same event. These encoders are jointly pre-trained on a vast dataset and the spectrum encoder is subsequently fine-tuned on smaller datasets for various downstream tasks. Notably, SeisCLIP's performance surpasses that of baseline methods in event classification, localization, and focal mechanism analysis tasks, employing distinct datasets from different regions. In conclusion, SeisCLIP holds significant potential as a foundational model in the field of seismology, paving the way for innovative directions in foundation-model-based seismology research.
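The pre-training signal is the standard symmetric contrastive (CLIP-style) objective over matched pairs, which is easy to sketch; linear layers stand in for the paper's transformer spectrum encoder and MLP metadata encoder, and all dimensions and the temperature are invented.

```python
import torch
import torch.nn.functional as F

spec_enc = torch.nn.Linear(128, 32)   # stand-in spectrum encoder
meta_enc = torch.nn.Linear(8, 32)     # stand-in phase/source-info encoder

spectra = torch.rand(16, 128)         # batch of time-frequency spectra (flattened)
metadata = torch.rand(16, 8)          # matching phase/source attributes

z_s = F.normalize(spec_enc(spectra), dim=1)
z_m = F.normalize(meta_enc(metadata), dim=1)
logits = z_s @ z_m.T / 0.07           # temperature-scaled similarity matrix
targets = torch.arange(16)            # i-th spectrum matches i-th metadata row
loss = (F.cross_entropy(logits, targets)
        + F.cross_entropy(logits.T, targets)) / 2
loss.backward()
```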

Graph Self-Contrast Representation Learning

  • paper_url: http://arxiv.org/abs/2309.02304
  • repo_url: https://github.com/GRAND-Lab/MERIT
  • paper_authors: Minjie Chen, Yao Cheng, Ye Wang, Xiang Li, Ming Gao
  • for: graph representation learning without the drawbacks of 1-vs-K negative sampling (where K is hard to set) or the collapse-avoidance strategies needed by negative-free methods.
  • methods: GraphSC uses one positive and one negative view per graph, generated from the graph itself via augmentation functions of varying intensity, with a triplet loss as the objective; HSIC factorizes the representations into multiple factors, a masked self-contrast mechanism better separates positive and negative samples, and the anchor-positive absolute distance is explicitly reduced to accelerate convergence (a toy triplet-loss sketch follows the abstract below).
  • results: extensive experiments against 19 state-of-the-art methods show strong performance in both unsupervised and transfer learning settings.
    Abstract Graph contrastive learning (GCL) has recently emerged as a promising approach for graph representation learning. Some existing methods adopt the 1-vs-K scheme to construct one positive and K negative samples for each graph, but it is difficult to set K. For those methods that do not use negative samples, it is often necessary to add additional strategies to avoid model collapse, which could only alleviate the problem to some extent. All these drawbacks will undoubtedly have an adverse impact on the generalizability and efficiency of the model. In this paper, to address these issues, we propose a novel graph self-contrast framework GraphSC, which only uses one positive and one negative sample, and chooses triplet loss as the objective. Specifically, self-contrast has two implications. First, GraphSC generates both positive and negative views of a graph sample from the graph itself via graph augmentation functions of various intensities, and use them for self-contrast. Second, GraphSC uses Hilbert-Schmidt Independence Criterion (HSIC) to factorize the representations into multiple factors and proposes a masked self-contrast mechanism to better separate positive and negative samples. Further, Since the triplet loss only optimizes the relative distance between the anchor and its positive/negative samples, it is difficult to ensure the absolute distance between the anchor and positive sample. Therefore, we explicitly reduced the absolute distance between the anchor and positive sample to accelerate convergence. Finally, we conduct extensive experiments to evaluate the performance of GraphSC against 19 other state-of-the-art methods in both unsupervised and transfer learning settings.
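At its core the objective is a triplet loss over one weak (positive) and one strong (negative) augmentation per graph, plus an explicit anchor-positive distance term. A sketch with a linear layer standing in for the GNN encoder and Gaussian noise standing in for graph augmentations (all weights and noise scales invented):

```python
import torch
import torch.nn.functional as F

encoder = torch.nn.Linear(64, 32)     # stand-in for a GNN graph encoder

graphs = torch.rand(8, 64)            # pooled graph features for a batch
weak = graphs + 0.01 * torch.randn_like(graphs)    # mild augmentation -> positive view
strong = graphs + 0.50 * torch.randn_like(graphs)  # harsh augmentation -> negative view

a, p, n = encoder(graphs), encoder(weak), encoder(strong)
triplet = F.triplet_margin_loss(a, p, n, margin=1.0)
absolute = (a - p).pow(2).sum(dim=1).mean()  # anchor-positive absolute distance
loss = triplet + 0.1 * absolute              # 0.1 is an illustrative weight
loss.backward()
```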

Enhancing Semantic Communication with Deep Generative Models – An ICASSP Special Session Overview

  • paper_url: http://arxiv.org/abs/2309.02478
  • repo_url: None
  • paper_authors: Eleonora Grassucci, Yuki Mitsufuji, Ping Zhang, Danilo Comminiello
  • for: surveying the pivotal role of semantic communication in future AI-driven communication systems, where semantic information must be extracted from complex content and semantically consistent data regenerated at the receiver.
  • methods: addresses semantic communication challenges from the machine learning perspective using deep generative models that handle real-world complex data, extract and exploit semantic information, and remain robust to channel corruptions.
  • results: charts novel research pathways for the next generation of generative semantic communication frameworks.
    Abstract Semantic communication is poised to play a pivotal role in shaping the landscape of future AI-driven communication systems. Its challenge of extracting semantic information from the original complex content and regenerating semantically consistent data at the receiver, possibly being robust to channel corruptions, can be addressed with deep generative models. This ICASSP special session overview paper discloses the semantic communication challenges from the machine learning perspective and unveils how deep generative models will significantly enhance semantic communication frameworks in dealing with real-world complex data, extracting and exploiting semantic information, and being robust to channel corruptions. Alongside establishing this emerging field, this paper charts novel research pathways for the next generative semantic communication frameworks.

Optimal Observation-Intervention Trade-Off in Optimisation Problems with Causal Structure

  • paper_url: http://arxiv.org/abs/2309.02287
  • repo_url: None
  • paper_authors: Kim Hammar, Neil Dhir
  • for: optimizing an expensive-to-evaluate grey-box objective function within a finite budget, given side-information in the form of the causal structure between the design variables.
  • methods: formulates the observation-intervention trade-off that emerges when estimating causal effects as a non-myopic optimal stopping problem, which permits an efficient solution and can be integrated with existing causal Bayesian optimisation algorithms.
  • results: experimental results show that the formulation enhances existing algorithms on real and synthetic benchmarks.
    Abstract We consider the problem of optimising an expensive-to-evaluate grey-box objective function, within a finite budget, where known side-information exists in the form of the causal structure between the design variables. Standard black-box optimisation ignores the causal structure, often making it inefficient and expensive. The few existing methods that consider the causal structure are myopic and do not fully accommodate the observation-intervention trade-off that emerges when estimating causal effects. In this paper, we show that the observation-intervention trade-off can be formulated as a non-myopic optimal stopping problem which permits an efficient solution. We give theoretical results detailing the structure of the optimal stopping times and demonstrate the generality of our approach by showing that it can be integrated with existing causal Bayesian optimisation algorithms. Experimental results show that our formulation can enhance existing algorithms on real and synthetic benchmarks.

s-ID: Causal Effect Identification in a Sub-Population

  • paper_url: http://arxiv.org/abs/2309.02281
  • repo_url: None
  • paper_authors: Amir Mohammad Abouei, Ehsan Mokhtarian, Negar Kiyavash
  • for: causal inference in a sub-population, i.e., identifying the causal effect of an intervention on a specific subgroup when only observational data of that subgroup is available.
  • methods: introduces the s-ID problem, in which the given data distribution originates from the targeted sub-population rather than the entire population, and derives necessary and sufficient conditions on the causal graph for identifiability.
  • results: given these conditions, a sound and complete algorithm for the s-ID problem.
    Abstract Causal inference in a sub-population involves identifying the causal effect of an intervention on a specific subgroup within a larger population. However, ignoring the subtleties introduced by sub-populations can either lead to erroneous inference or limit the applicability of existing methods. We introduce and advocate for a causal inference problem in sub-populations (henceforth called s-ID), in which we merely have access to observational data of the targeted sub-population (as opposed to the entire population). Existing inference problems in sub-populations operate on the premise that the given data distributions originate from the entire population, thus, cannot tackle the s-ID problem. To address this gap, we provide necessary and sufficient conditions that must hold in the causal graph for a causal effect in a sub-population to be identifiable from the observational distribution of that sub-population. Given these conditions, we present a sound and complete algorithm for the s-ID problem.

MA-VAE: Multi-head Attention-based Variational Autoencoder Approach for Anomaly Detection in Multivariate Time-series Applied to Automotive Endurance Powertrain Testing

  • paper_url: http://arxiv.org/abs/2309.02253
  • repo_url: https://github.com/lcs-crr/ma-vae
  • paper_authors: Lucas Correia, Jan-Christoph Goos, Philipp Klein, Thomas Bäck, Anna V. Kononova
  • for: automatic anomaly detection in automotive endurance powertrain testing, a real-world application with massive, diverse, multivariate, and temporal data where manual evaluation by humans has reached its capacity.
  • methods: a variational autoencoder with multi-head attention (MA-VAE) trained on unlabelled data; the approach also offers a novel way to avoid the bypass phenomenon and a new method for remapping individual windows to a continuous time series.
  • results: on a real-world industrial data set, the model detects 67% of the anomalies present and is wrong only 9% of the time when an anomaly is flagged; it can potentially perform well with only a fraction of the training and validation subset, though a more sophisticated threshold estimation method is required to extract this.
    Abstract A clear need for automatic anomaly detection applied to automotive testing has emerged as more and more attention is paid to the data recorded and manual evaluation by humans reaches its capacity. Such real-world data is massive, diverse, multivariate and temporal in nature, therefore requiring modelling of the testee behaviour. We propose a variational autoencoder with multi-head attention (MA-VAE), which, when trained on unlabelled data, not only provides very few false positives but also manages to detect the majority of the anomalies present. In addition, the approach offers a novel way to avoid the bypass phenomenon, an undesirable behaviour investigated in the literature. Lastly, the approach also introduces a new method to remap individual windows to a continuous time series. The results are presented in the context of a real-world industrial data set, and several experiments are undertaken to further investigate certain aspects of the proposed model. When configured properly, it is wrong only 9% of the time when an anomaly is flagged and discovers 67% of the anomalies present. MA-VAE also has the potential to perform well with only a fraction of the training and validation subset; however, extracting that performance requires a more sophisticated threshold estimation method.
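A minimal sketch of an MA-VAE-style model, assuming a particular placement of the attention layer and particular layer sizes (the authors' exact architecture is in the linked repo, not reproduced here):

```python
import torch
import torch.nn as nn

class MAVAE(nn.Module):
    def __init__(self, n_features, d_model=64, n_heads=4, d_latent=16):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.to_mu = nn.Linear(d_model, d_latent)
        self.to_logvar = nn.Linear(d_model, d_latent)
        self.decoder = nn.Sequential(
            nn.Linear(d_latent, d_model), nn.ReLU(), nn.Linear(d_model, n_features))

    def forward(self, x):                      # x: (batch, time, features)
        h = self.embed(x)
        h, _ = self.attn(h, h, h)              # multi-head self-attention over time
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.decoder(z), mu, logvar

def anomaly_score(model, x):
    """Per-window reconstruction error; windows above a threshold get flagged."""
    recon, _, _ = model(x)
    return ((x - recon) ** 2).mean(dim=(1, 2))

scores = anomaly_score(MAVAE(n_features=8), torch.randn(4, 128, 8))
print(scores.shape)                            # one score per window
```

The threshold on the score is exactly the quantity the abstract says needs a more sophisticated estimation method when training data is scarce.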

Encoding Seasonal Climate Predictions for Demand Forecasting with Modular Neural Network

  • paper_url: http://arxiv.org/abs/2309.02248
  • repo_url: None
  • paper_authors: Smit Marvaniya, Jitendra Singh, Nicolas Galichet, Fred Ochieng Otieno, Geeth De Mel, Kommy Weldemariam
  • for: improving the accuracy of time-series forecasting for supply-chain functions
  • methods: a modular neural network architecture that efficiently encodes seasonal climate predictions, together with other time-series data (e.g., buyer patterns), to learn robust and reliable latent representations
  • results: compared with existing demand forecasting methods, experiments show an error reduction of approximately 13% to 17% across multiple real-world datasets
    Abstract Current time-series forecasting problems use short-term weather attributes as exogenous inputs. However, in specific time-series forecasting solutions (e.g., demand prediction in the supply chain), seasonal climate predictions are crucial to improve its resilience. Representing mid to long-term seasonal climate forecasts is challenging as seasonal climate predictions are uncertain, and encoding spatio-temporal relationship of climate forecasts with demand is complex. We propose a novel modeling framework that efficiently encodes seasonal climate predictions to provide robust and reliable time-series forecasting for supply chain functions. The encoding framework enables effective learning of latent representations -- be it uncertain seasonal climate prediction or other time-series data (e.g., buyer patterns) -- via a modular neural network architecture. Our extensive experiments indicate that learning such representations to model seasonal climate forecast results in an error reduction of approximately 13\% to 17\% across multiple real-world data sets compared to existing demand forecasting methods.
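A minimal sketch of the modular idea, under our own simplifying assumptions (one encoder module per input type, fusion by concatenation; the paper's modules and how it handles forecast uncertainty may differ):

```python
import torch
import torch.nn as nn

class ModularDemandForecaster(nn.Module):
    def __init__(self, n_demand_feats, n_climate_feats, d_hidden=32, horizon=4):
        super().__init__()
        # One module per input type; each branch can be swapped independently.
        self.demand_enc = nn.GRU(n_demand_feats, d_hidden, batch_first=True)
        self.climate_enc = nn.GRU(n_climate_feats, d_hidden, batch_first=True)
        self.head = nn.Linear(2 * d_hidden, horizon)

    def forward(self, demand, climate):
        _, h_d = self.demand_enc(demand)       # h_d: (1, batch, d_hidden)
        _, h_c = self.climate_enc(climate)     # seasonal climate forecast branch
        fused = torch.cat([h_d[-1], h_c[-1]], dim=-1)
        return self.head(fused)                # multi-step demand forecast

model = ModularDemandForecaster(n_demand_feats=3, n_climate_feats=5)
yhat = model(torch.randn(2, 52, 3), torch.randn(2, 12, 5))  # (2, horizon)
```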

AGIBench: A Multi-granularity, Multimodal, Human-referenced, Auto-scoring Benchmark for Large Language Models

  • paper_url: http://arxiv.org/abs/2309.06495
  • repo_url: None
  • paper_authors: Fei Tang, Wanling Gao, Luzhou Peng, Jianfeng Zhan
  • for: evaluating the question-solving abilities and degrees of intelligence of large language models (LLMs)
  • methods: AGIBench, a multi-granularity, multimodal, human-referenced, auto-scoring benchmarking methodology that labels each question with a four-tuple of attributes
  • results: AGIBench supports multi-granularity benchmarking (per-question, per-ability-branch, per-knowledge, per-modal, per-dataset, and per-difficulty-level), classifies questions into five human-referenced difficulty levels, and provides auto-scoring with multi-dimensional metrics
    Abstract Large language models (LLMs) like ChatGPT have revealed amazing intelligence. How to evaluate the question-solving abilities of LLMs and their degrees of intelligence is a hot-spot but challenging issue. First, the question-solving abilities are interlaced with different ability branches like understanding and massive knowledge categories like mathematics. Second, the inputs of questions are multimodal that may involve text and images. Third, the response format of LLMs is diverse and thus poses great challenges for result extraction and evaluation. In this paper, we propose AGIBench -- a multi-granularity, multimodal, human-referenced, and auto-scoring benchmarking methodology for LLMs. Instead of a collection of blended questions, AGIBench focuses on three typical ability branches and adopts a four-tuple to label the attributes of each question. First, it supports multi-granularity benchmarking, e.g., per-question, per-ability branch, per-knowledge, per-modal, per-dataset, and per-difficulty level granularities. Second, it contains multimodal input, including text and images. Third, it classifies all the questions into five degrees of difficulty according to the average accuracy rate of abundant educated humans (human-referenced). Fourth, it adopts zero-shot learning to avoid introducing additional unpredictability and provides an auto-scoring method to extract and judge the result. Finally, it defines multi-dimensional metrics, including accuracy under the average, worst, best, and majority voting cases, and repeatability. AGIBench is publically available from \url{https://www.benchcouncil.org/agibench}.
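A hedged sketch of AGIBench-style multi-granularity scoring; the record fields and the majority-voting rule over repeated runs are our assumptions based on the abstract, not the released benchmark code:

```python
from collections import Counter, defaultdict

def accuracy_by(records, key):
    """records: dicts with attribute fields plus 'answer' and 'responses'
    (one model answer per repeated run); groups accuracy by `key`."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        g = r[key]
        totals[g] += 1
        vote = Counter(r["responses"]).most_common(1)[0][0]  # majority voting
        hits[g] += int(vote == r["answer"])
    return {g: hits[g] / totals[g] for g in totals}

records = [
    {"ability": "math", "knowledge": "algebra", "modal": "text",
     "difficulty": 3, "answer": "B", "responses": ["B", "B", "C"]},
]
print(accuracy_by(records, "difficulty"))      # per-difficulty-level accuracy
```

The same helper grouped by "ability", "knowledge", or "modal" gives the other granularities the abstract lists.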

Distributionally Robust Model-based Reinforcement Learning with Large State Spaces

  • paper_url: http://arxiv.org/abs/2309.02236
  • repo_url: None
  • paper_authors: Shyam Sundhar Ramesh, Pier Giuseppe Sessa, Yifan Hu, Andreas Krause, Ilija Bogunovic
  • for: addressing three challenges in reinforcement learning: complex dynamical systems with large state spaces, costly data acquisition, and real-world dynamics that deviate from the training environment
  • methods: distributionally robust Markov decision processes with continuous state spaces; Gaussian Processes and a maximum variance reduction algorithm efficiently learn multi-output nominal transition dynamics and adapt to different uncertainty sets (Kullback-Leibler, chi-square, total variation)
  • results: theoretical sample-complexity bounds that are independent of the number of states, plus experiments demonstrating robustness to distributional shifts and superior sample efficiency
    Abstract Three major challenges in reinforcement learning are the complex dynamical systems with large state spaces, the costly data acquisition processes, and the deviation of real-world dynamics from the training environment deployment. To overcome these issues, we study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets. We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics, leveraging access to a generative model (i.e., simulator). We further demonstrate the statistical sample complexity of the proposed method for different uncertainty sets. These complexity bounds are independent of the number of states and extend beyond linear dynamics, ensuring the effectiveness of our approach in identifying near-optimal distributionally-robust policies. The proposed method can be further combined with other model-free distributionally robust reinforcement learning methods to obtain a near-optimal robust policy. Experimental results demonstrate the robustness of our algorithm to distributional shifts and its superior performance in terms of the number of samples needed.
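A minimal sketch of maximum-variance-reduction data collection with a GP model of the transition dynamics; this is a generic active-learning loop consistent with the abstract (the paper's kernels, stopping rule, and multi-output handling may differ):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def simulator(sa):                 # stand-in generative model: next-state value
    return np.sin(sa).sum(axis=1)

X = rng.uniform(-2, 2, size=(5, 2))            # initial (state, action) queries
y = simulator(X)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))

for _ in range(20):
    gp.fit(X, y)
    cand = rng.uniform(-2, 2, size=(256, 2))   # candidate queries
    _, std = gp.predict(cand, return_std=True)
    x_new = cand[np.argmax(std)]               # query where the model is least sure
    X = np.vstack([X, x_new])
    y = np.append(y, simulator(x_new[None]))
```

The learned nominal model would then be wrapped in the chosen uncertainty set (KL, chi-square, or TV ball) for robust policy optimization.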

Augmenting Black-box LLMs with Medical Textbooks for Clinical Question Answering

  • paper_url: http://arxiv.org/abs/2309.02233
  • repo_url: None
  • paper_authors: Yubo Wang, Xueguang Ma, Wenhu Chen
  • for: applying large language models (LLMs) to the medical domain, which remains challenging because such models cannot fully leverage domain-specific knowledge
  • methods: Large-scale Language Models Augmented with Medical Textbooks (LLM-AMT), which takes authoritative medical textbooks as the cornerstone of its design and extends the LLM through plug-and-play modules: a Hybrid Textbook Retriever, a Query Augmenter, and an LLM Reader
  • results: on three open-domain medical question-answering tasks, LLM-AMT improves the professionalism and accuracy of LLM responses by 11.4% to 13.2%; despite being 100 times smaller, medical textbooks prove a more valuable retrieval corpus than Wikipedia in this domain, with textbook augmentation outperforming Wikipedia augmentation by 9.7% to 12.2%
    Abstract Large-scale language models (LLMs), such as ChatGPT, are capable of generating human-like responses for various downstream tasks, such as task-oriented dialogues and question answering. However, applying LLMs to medical domains remains challenging due to their inability to leverage domain-specific knowledge. In this study, we present the Large-scale Language Models Augmented with Medical Textbooks (LLM-AMT), which integrates authoritative medical textbooks as the cornerstone of its design, enhancing its proficiency in the specialized domain through plug-and-play modules, comprised of a Hybrid Textbook Retriever, supplemented by the Query Augmenter and the LLM Reader. Experimental evaluation on three open-domain medical question-answering tasks reveals a substantial enhancement in both the professionalism and accuracy of the LLM responses when utilizing LLM-AMT, exhibiting an improvement ranging from 11.4% to 13.2%. Despite being 100 times smaller, we found that medical textbooks as the retrieval corpus serves as a more valuable external knowledge source than Wikipedia in the medical domain. Our experiments show that textbook augmentation results in a performance improvement ranging from 9.7% to 12.2% over Wikipedia augmentation.
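A hedged sketch of a hybrid (sparse plus dense) textbook retriever in the spirit of the Hybrid Textbook Retriever; the embedding model, the score-mixing weight, and the min-max normalisation are illustrative assumptions, and the passages are made up:

```python
import numpy as np
from rank_bm25 import BM25Okapi                        # pip install rank-bm25
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

passages = [
    "Metformin is a first-line agent for type 2 diabetes.",
    "Beta blockers reduce myocardial oxygen demand.",
]
bm25 = BM25Okapi([p.lower().split() for p in passages])
encoder = SentenceTransformer("all-MiniLM-L6-v2")
dense = encoder.encode(passages, normalize_embeddings=True)

def retrieve(query, k=1, alpha=0.5):
    s_sparse = np.asarray(bm25.get_scores(query.lower().split()))
    s_dense = dense @ encoder.encode(query, normalize_embeddings=True)
    norm = lambda s: (s - s.min()) / (np.ptp(s) + 1e-9)  # put scores on one scale
    score = alpha * norm(s_sparse) + (1 - alpha) * norm(s_dense)
    return [passages[i] for i in np.argsort(-score)[:k]]

print(retrieve("first line drug for type 2 diabetes"))
```

The retrieved passages would then be handed, together with the augmented query, to the LLM Reader.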

FSD: An Initial Chinese Dataset for Fake Song Detection

  • paper_url: http://arxiv.org/abs/2309.02232
  • repo_url: https://github.com/xieyuankun/fsd-dataset
  • paper_authors: Yuankun Xie, Jingjing Zhou, Xiaolin Lu, Zhenghao Jiang, Yuxin Yang, Haonan Cheng, Long Ye
  • for: providing a dedicated dataset for song deepfake detection and using it to train and evaluate detection models
  • methods: five state-of-the-art singing voice synthesis and singing voice conversion methods generate the fake songs in an initial Chinese Fake Song Detection (FSD) dataset, which is then used to train audio deepfake detection (ADD) models, evaluated both on original songs and on separated vocal tracks
  • results: song-trained ADD models reduce the average equal error rate by 38.58% relative to speech-trained ADD models on the FSD test set
    Abstract Singing voice synthesis and singing voice conversion have significantly advanced, revolutionizing musical experiences. However, the rise of "Deepfake Songs" generated by these technologies raises concerns about authenticity. Unlike Audio DeepFake Detection (ADD), the field of song deepfake detection lacks specialized datasets or methods for song authenticity verification. In this paper, we initially construct a Chinese Fake Song Detection (FSD) dataset to investigate the field of song deepfake detection. The fake songs in the FSD dataset are generated by five state-of-the-art singing voice synthesis and singing voice conversion methods. Our initial experiments on FSD revealed the ineffectiveness of existing speech-trained ADD models for the task of song deepfake detection. Thus, we employ the FSD dataset for the training of ADD models. We subsequently evaluate these models under two scenarios: one with the original songs and another with separated vocal tracks. Experiment results show that song-trained ADD models exhibit a 38.58% reduction in average equal error rate compared to speech-trained ADD models on the FSD test set.
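For reference, this is how the reported metric, the equal error rate (EER), is commonly computed from detector scores; the scores below are made up for illustration:

```python
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(labels, scores):
    """labels: 1 = fake, 0 = bona fide; scores: higher = more likely fake."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    i = np.nanargmin(np.abs(fnr - fpr))        # operating point where FPR == FNR
    return (fpr[i] + fnr[i]) / 2

labels = [0, 0, 0, 1, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.20]
print(f"EER = {equal_error_rate(labels, scores):.3f}")
```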

DCP-Net: A Distributed Collaborative Perception Network for Remote Sensing Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2309.02230
  • repo_url: None
  • paper_authors: Zhechao Wang, Peirui Cheng, Shujing Duan, Kaiqiang Chen, Zhirui Wang, Xinming Li, Xian Sun
  • for: improving the accuracy and efficiency of multi-platform collaborative observation in time-critical remote sensing tasks
  • methods: a distributed collaborative perception network (DCP-Net) that enhances perception by integrating features from other platforms, with a self-mutual information match module for identifying collaboration opportunities and selecting suitable partners, and a related feature fusion module for aligning and fusing local and collaborative features
  • results: extensive experiments and visualization analyses on three semantic segmentation datasets show that DCP-Net outperforms existing methods, improving mIoU by 2.61% to 16.89% at the highest collaboration efficiency and reaching state-of-the-art performance
    Abstract Onboard intelligent processing is widely applied in emergency tasks in the field of remote sensing. However, it is predominantly confined to an individual platform with a limited observation range as well as susceptibility to interference, resulting in limited accuracy. Considering the current state of multi-platform collaborative observation, this article innovatively presents a distributed collaborative perception network called DCP-Net. Firstly, the proposed DCP-Net helps members to enhance perception performance by integrating features from other platforms. Secondly, a self-mutual information match module is proposed to identify collaboration opportunities and select suitable partners, prioritizing critical collaborative features and reducing redundant transmission cost. Thirdly, a related feature fusion module is designed to address the misalignment between local and collaborative features, improving the quality of fused features for the downstream task. We conduct extensive experiments and visualization analyses using three semantic segmentation datasets, including Potsdam, iSAID and DFC23. The results demonstrate that DCP-Net outperforms the existing methods comprehensively, improving mIoU by 2.61%~16.89% at the highest collaboration efficiency, which promotes the performance to a state-of-the-art level.
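A hedged sketch of the collaboration step: pick partner platforms whose features look informative for ours, then fuse. Cosine matching and additive fusion are crude stand-ins for DCP-Net's match and fusion modules:

```python
import torch
import torch.nn.functional as F

def collaborate(own, others, threshold=0.2):
    """own: (C, H, W) feature map; others: list of (C, H, W) maps from peers."""
    fused = own.clone()
    own_vec = F.normalize(own.flatten(), dim=0)
    for feat in others:
        sim = torch.dot(own_vec, F.normalize(feat.flatten(), dim=0))
        if sim > threshold:                    # stand-in for the match module
            fused = fused + sim * feat         # weight partner features by affinity
    return fused

out = collaborate(torch.randn(64, 32, 32), [torch.randn(64, 32, 32)])
```

Skipping low-affinity partners is also what keeps redundant transmission cost down in the real system.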

Dense Object Grounding in 3D Scenes

  • paper_url: http://arxiv.org/abs/2309.02224
  • repo_url: None
  • paper_authors: Wencan Huang, Daizong Liu, Wei Hu
  • for: overcoming the limitation of existing 3D object grounding methods, which localize a single object from a single-sentence description; the paper introduces 3D Dense Object Grounding (3D DOG), jointly localizing multiple objects described in a more complicated paragraph
  • methods: a novel stacked-Transformer-based framework, 3DOGSFormer, combining a contextual query-driven local transformer decoder that generates initial grounding proposals with a proposal-guided global transformer decoder that learns correlations among objects to refine those proposals
  • results: extensive experiments on three challenging benchmarks (Nr3D, Sr3D, and ScanRefer) show significant gains over state-of-the-art 3D single-object grounding methods and their dense-object variants
    Abstract Localizing objects in 3D scenes according to the semantics of a given natural language is a fundamental yet important task in the field of multimedia understanding, which benefits various real-world applications such as robotics and autonomous driving. However, the majority of existing 3D object grounding methods are restricted to a single-sentence input describing an individual object, which cannot comprehend and reason more contextualized descriptions of multiple objects in more practical 3D cases. To this end, we introduce a new challenging task, called 3D Dense Object Grounding (3D DOG), to jointly localize multiple objects described in a more complicated paragraph rather than a single sentence. Instead of naively localizing each sentence-guided object independently, we found that dense objects described in the same paragraph are often semantically related and spatially located in a focused region of the 3D scene. To explore such semantic and spatial relationships of densely referred objects for more accurate localization, we propose a novel Stacked Transformer based framework for 3D DOG, named 3DOGSFormer. Specifically, we first devise a contextual query-driven local transformer decoder to generate initial grounding proposals for each target object. Then, we employ a proposal-guided global transformer decoder that exploits the local object features to learn their correlation for further refining initial grounding proposals. Extensive experiments on three challenging benchmarks (Nr3D, Sr3D, and ScanRefer) show that our proposed 3DOGSFormer outperforms state-of-the-art 3D single-object grounding methods and their dense-object variants by significant margins.

Improving equilibrium propagation without weight symmetry through Jacobian homeostasis

  • paper_url: http://arxiv.org/abs/2309.02214
  • repo_url: https://github.com/laborieux-axel/generalized-holo-ep
  • paper_authors: Axel Laborieux, Friedemann Zenke
  • for: studying the equilibrium propagation (EP) algorithm for computing gradients of neural networks on biological or analog neuromorphic substrates
  • methods: a generalized formulation of EP that can be stated without weight symmetry, analytically isolating the two sources of bias; for complex-differentiable non-symmetric networks, exact derivatives can still be estimated via a Cauchy integral despite the finite nudge
  • results: weight asymmetry introduces bias that degrades task performance because EP's neuronal error vectors align poorly with those of backpropagation; a new homeostatic objective that directly penalizes functional asymmetries of the Jacobian at the network's fixed point dramatically improves performance on complex tasks such as ImageNet 32x32
    Abstract Equilibrium propagation (EP) is a compelling alternative to the backpropagation of error algorithm (BP) for computing gradients of neural networks on biological or analog neuromorphic substrates. Still, the algorithm requires weight symmetry and infinitesimal equilibrium perturbations, i.e., nudges, to estimate unbiased gradients efficiently. Both requirements are challenging to implement in physical systems. Yet, whether and how weight asymmetry affects its applicability is unknown because, in practice, it may be masked by biases introduced through the finite nudge. To address this question, we study generalized EP, which can be formulated without weight symmetry, and analytically isolate the two sources of bias. For complex-differentiable non-symmetric networks, we show that the finite nudge does not pose a problem, as exact derivatives can still be estimated via a Cauchy integral. In contrast, weight asymmetry introduces bias resulting in low task performance due to poor alignment of EP's neuronal error vectors compared to BP. To mitigate this issue, we present a new homeostatic objective that directly penalizes functional asymmetries of the Jacobian at the network's fixed point. This homeostatic objective dramatically improves the network's ability to solve complex tasks such as ImageNet 32x32. Our results lay the theoretical groundwork for studying and mitigating the adverse effects of imperfections of physical networks on learning algorithms that rely on the substrate's relaxation dynamics.
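A hedged sketch of a Jacobian-homeostasis penalty, read literally from the abstract: at the network's fixed point, penalize the asymmetric part of the Jacobian of the dynamics (the paper's exact objective and how it is scheduled during training may differ):

```python
import torch
from torch.autograd.functional import jacobian

W = torch.randn(8, 8, requires_grad=True)
B = torch.randn(8, 8, requires_grad=True)     # feedback weights, not tied to W.T

def dynamics(s):
    # Toy recurrent dynamics; asymmetric because B is independent of W.
    return torch.tanh(W @ s) + 0.1 * (B @ s)

s_star = torch.zeros(8)                        # pretend fixed point for the demo
J = jacobian(dynamics, s_star, create_graph=True)  # (8, 8) Jacobian at s_star
homeo_loss = ((J - J.T) ** 2).sum()            # penalize functional asymmetry
homeo_loss.backward()                          # gradients flow into W and B
print(float(homeo_loss))
```

Driving this penalty down makes the network's error propagation behave more like the weight-symmetric case that EP's gradient estimates assume.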

Exchanging-based Multimodal Fusion with Transformer

  • paper_url: http://arxiv.org/abs/2309.02190
  • repo_url: https://github.com/recklessronan/muse
  • paper_authors: Renyu Zhu, Chengcheng Han, Yong Qian, Qiushi Sun, Xiang Li, Ming Gao, Xuezhi Cao, Yunsen Xian
  • for: multimodal fusion, in particular text-vision fusion
  • methods: MuSE, a novel exchanging-based multimodal fusion model built on the Transformer: two encoders map the multimodal inputs into separate low-dimensional spaces, two decoders (image captioning and text-to-image generation) regularize the embeddings and pull them into the same space, and a CrossTransformer exchanges knowledge by replacing a proportion of tokens in one modality with the average embedding of the other
  • results: extensive experiments on Multimodal Named Entity Recognition and Multimodal Sentiment Analysis show that MuSE outperforms its competitors
    Abstract We study the problem of multimodal fusion in this paper. Recent exchanging-based methods have been proposed for vision-vision fusion, which aim to exchange embeddings learned from one modality to the other. However, most of them project inputs of multimodalities into different low-dimensional spaces and cannot be applied to the sequential input data. To solve these issues, in this paper, we propose a novel exchanging-based multimodal fusion model MuSE for text-vision fusion based on Transformer. We first use two encoders to separately map multimodal inputs into different low-dimensional spaces. Then we employ two decoders to regularize the embeddings and pull them into the same space. The two decoders capture the correlations between texts and images with the image captioning task and the text-to-image generation task, respectively. Further, based on the regularized embeddings, we present CrossTransformer, which uses two Transformer encoders with shared parameters as the backbone model to exchange knowledge between multimodalities. Specifically, CrossTransformer first learns the global contextual information of the inputs in the shallow layers. After that, it performs inter-modal exchange by selecting a proportion of tokens in one modality and replacing their embeddings with the average of embeddings in the other modality. We conduct extensive experiments to evaluate the performance of MuSE on the Multimodal Named Entity Recognition task and the Multimodal Sentiment Analysis task. Our results show the superiority of MuSE against other competitors. Our code and data are provided at https://github.com/RecklessRonan/MuSE.
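A sketch of the token-exchange step described in the abstract: replace a proportion of tokens in one modality with the mean embedding of the other. Which tokens to swap is a design choice; random selection is assumed here:

```python
import torch

def exchange_tokens(x_a, x_b, ratio=0.25):
    """x_a, x_b: (batch, tokens, dim) embeddings of two modalities."""
    b, t, d = x_a.shape
    n_swap = max(1, int(t * ratio))
    idx = torch.stack([torch.randperm(t)[:n_swap] for _ in range(b)])  # (b, n_swap)
    mean_b = x_b.mean(dim=1, keepdim=True).repeat(1, n_swap, 1)
    out = x_a.clone()
    out.scatter_(1, idx.unsqueeze(-1).expand(-1, -1, d), mean_b)
    return out

text = torch.randn(2, 10, 32)
vision = torch.randn(2, 49, 32)
text_exchanged = exchange_tokens(text, vision)  # text tokens now carry vision info
```

In MuSE this exchange happens inside the shared-parameter CrossTransformer, after shallow layers have learned global context.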

Leveraging BERT Language Models for Multi-Lingual ESG Issue Identification

  • paper_url: http://arxiv.org/abs/2309.02189
  • repo_url: None
  • paper_authors: Elvys Linhares Pontes, Mohamed Benjannet, Lam Kim Ming
  • for: helping investors assess companies' sustainability and social responsibility by classifying news articles into ESG issue labels
  • methods: fine-tuned BERT-family language models for multi-lingual news classification, comparing several BERT variants as well as an SVM-based binary model
  • results: a RoBERTa classifier placed second on the English test set and shared fifth place on the French test set; the SVM-based binary model tailored for Chinese also performed strongly, ranking second on its test set
    Abstract Environmental, Social, and Governance (ESG) has been used as a metric to measure the negative impacts and enhance positive outcomes of companies in areas such as the environment, society, and governance. Recently, investors have increasingly recognized the significance of ESG criteria in their investment choices, leading businesses to integrate ESG principles into their operations and strategies. The Multi-Lingual ESG Issue Identification (ML-ESG) shared task encompasses the classification of news documents into 35 distinct ESG issue labels. In this study, we explored multiple strategies harnessing BERT language models to achieve accurate classification of news documents across these labels. Our analysis revealed that the RoBERTa classifier emerged as one of the most successful approaches, securing the second-place position for the English test dataset, and sharing the fifth-place position for the French test dataset. Furthermore, our SVM-based binary model tailored for the Chinese language exhibited exceptional performance, earning the second-place rank on the test dataset.
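A minimal fine-tuning step for the 35-label ESG issue classifier using Hugging Face Transformers; the checkpoint, learning rate, and toy example are placeholders, not the shared-task setup:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=35)             # one class per ESG issue label

batch = tok(["Company A cut emissions by 30%."], return_tensors="pt",
            truncation=True, padding=True)
labels = torch.tensor([7])                     # toy gold label index

opt = torch.optim.AdamW(model.parameters(), lr=2e-5)
out = model(**batch, labels=labels)            # cross-entropy computed internally
out.loss.backward()
opt.step()
print(out.logits.shape)                        # (1, 35)
```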

AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image Collections

  • paper_url: http://arxiv.org/abs/2309.02186
  • repo_url: https://github.com/YueWuHKUST/AniPortraitGAN
  • paper_authors: Yue Wu, Sicheng Xu, Jianfeng Xiang, Fangyun Wei, Qifeng Chen, Jiaolong Yang, Xin Tong
  • for: generating diverse, high-quality, animatable 3D portraits with controllable facial expression, head pose, and shoulder movements, which existing head-only or full-body generators handle poorly
  • methods: a 3D-aware GAN built on a generative radiance manifold representation with learnable facial and head-shoulder deformations, a dual-camera rendering and adversarial learning scheme to improve the quality of generated faces, and a pose deformation processing network for challenging regions such as long hair
  • results: trained only on unstructured 2D image collections, without 3D or video data, the method generates diverse and high-quality 3D portraits with the desired control over different properties
    Abstract Previous animatable 3D-aware GANs for human generation have primarily focused on either the human head or full body. However, head-only videos are relatively uncommon in real life, and full body generation typically does not deal with facial expression control and still has challenges in generating high-quality results. Towards applicable video avatars, we present an animatable 3D-aware GAN that generates portrait images with controllable facial expression, head pose, and shoulder movements. It is a generative model trained on unstructured 2D image collections without using 3D or video data. For the new task, we base our method on the generative radiance manifold representation and equip it with learnable facial and head-shoulder deformations. A dual-camera rendering and adversarial learning scheme is proposed to improve the quality of the generated faces, which is critical for portrait images. A pose deformation processing network is developed to generate plausible deformations for challenging regions such as long hair. Experiments show that our method, trained on unstructured 2D images, can generate diverse and high-quality 3D portraits with desired control over different properties.

BEVTrack: A Simple Baseline for 3D Single Object Tracking in Bird’s-Eye View

  • paper_url: http://arxiv.org/abs/2309.02185
  • repo_url: https://github.com/xmm-prio/bevtrack
  • paper_authors: Yuxiang Yang, Yingqi Deng, Jiahao Nie, Jing Zhang
  • for: 3D single object tracking (SOT) in point clouds, specifically in autonomous driving scenarios where target objects maintain spatial adjacency across frames.
  • methods: converts consecutive point clouds into Bird’s-Eye View representation, encodes spatial proximity and captures motion cues via simple element-wise operation and convolutional layers, and directly learns the underlying motion distribution without making assumptions.
  • results: achieves state-of-the-art performance on KITTI and NuScenes datasets with a high inference speed of 122 FPS.
    Abstract 3D single object tracking (SOT) in point clouds is still a challenging problem due to appearance variation, distractors, and high sparsity of point clouds. Notably, in autonomous driving scenarios, the target object typically maintains spatial adjacency across consecutive frames, predominantly moving horizontally. This spatial continuity offers valuable prior knowledge for target localization. However, existing trackers, which often employ point-wise representations, struggle to efficiently utilize this knowledge owing to the irregular format of such representations. Consequently, they require elaborate designs and solving multiple subtasks to establish spatial correspondence. In this paper, we introduce BEVTrack, a simple yet strong baseline framework for 3D SOT. After converting consecutive point clouds into the common Bird's-Eye View representation, BEVTrack inherently encodes spatial proximity and adeptly captures motion cues for tracking via a simple element-wise operation and convolutional layers. Additionally, to better deal with objects having diverse sizes and moving patterns, BEVTrack directly learns the underlying motion distribution rather than making a fixed Laplacian or Gaussian assumption as in previous works. Without bells and whistles, BEVTrack achieves state-of-the-art performance on KITTI and NuScenes datasets while maintaining a high inference speed of 122 FPS. The code will be released at https://github.com/xmm-prio/BEVTrack.
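A sketch of converting a point cloud to the Bird's-Eye View grid that BEVTrack-style trackers operate on; the grid extents and resolution are illustrative assumptions:

```python
import numpy as np

def to_bev(points, x_range=(-3.2, 3.2), y_range=(-3.2, 3.2), cell=0.1):
    """points: (N, 3) array; returns an (H, W) occupancy grid over x-y."""
    h = int((y_range[1] - y_range[0]) / cell)
    w = int((x_range[1] - x_range[0]) / cell)
    grid = np.zeros((h, w), dtype=np.float32)
    ix = ((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((points[:, 1] - y_range[0]) / cell).astype(int)
    keep = (ix >= 0) & (ix < w) & (iy >= 0) & (iy < h)
    grid[iy[keep], ix[keep]] = 1.0             # mark occupied cells
    return grid

prev_bev = to_bev(np.random.randn(500, 3))
curr_bev = to_bev(np.random.randn(500, 3))
motion_cue = curr_bev - prev_bev               # a simple element-wise operation
```

Because targets mostly move horizontally between frames, this regular grid makes spatial proximity directly usable by plain convolutional layers, which is the point the abstract makes against irregular point-wise representations.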

Dual Relation Alignment for Composed Image Retrieval

  • paper_url: http://arxiv.org/abs/2309.02169
  • repo_url: None
  • paper_authors: Xintong Jiang, Yaxiong Wang, Yujiao Wu, Meng Wang, Xueming Qian
  • for: improving composed image retrieval by jointly exploiting two relations: the explicit relation (reference image & complementary text - target image) and the implicit relation (reference image & target image - complementary text)
  • methods: a dual relation alignment framework with a vision compositor that fuses the reference and target images; the resulting representation plays two roles: (1) a counterpart for semantic alignment with the complementary text, and (2) compensation that boosts explicit relation modelling, implanting the implicit relation into alignment learning
  • results: extensive experiments on two popular datasets, CIRR and FashionIQ, confirm that dual-relation learning substantially enhances composed image retrieval performance
    Abstract Composed image retrieval, a task involving the search for a target image using a reference image and a complementary text as the query, has witnessed significant advancements owing to the progress made in cross-modal modeling. Unlike the general image-text retrieval problem with only one alignment relation, i.e., image-text, we argue for the existence of two types of relations in composed image retrieval. The explicit relation pertains to the reference image & complementary text-target image, which is commonly exploited by existing methods. Besides this intuitive relation, the observations during our practice have uncovered another implicit yet crucial relation, i.e., reference image & target image-complementary text, since we found that the complementary text can be inferred by studying the relation between the target image and the reference image. Regrettably, existing methods largely focus on leveraging the explicit relation to learn their networks, while overlooking the implicit relation. In response to this weakness, We propose a new framework for composed image retrieval, termed dual relation alignment, which integrates both explicit and implicit relations to fully exploit the correlations among the triplets. Specifically, we design a vision compositor to fuse reference image and target image at first, then the resulted representation will serve two roles: (1) counterpart for semantic alignment with the complementary text and (2) compensation for the complementary text to boost the explicit relation modeling, thereby implant the implicit relation into the alignment learning. Our method is evaluated on two popular datasets, CIRR and FashionIQ, through extensive experiments. The results confirm the effectiveness of our dual-relation learning in substantially enhancing composed image retrieval performance.
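A hedged sketch of the two alignments: the explicit relation aligns (reference image + text) with the target image, while the implicit relation aligns the fused (reference, target) representation with the text. The InfoNCE-style losses and the additive query composition are our stand-ins, not the paper's exact objectives:

```python
import torch
import torch.nn.functional as F

def nce(a, b, tau=0.07):
    """a, b: (batch, dim); matched pairs share a row index."""
    logits = F.normalize(a, dim=-1) @ F.normalize(b, dim=-1).T / tau
    return F.cross_entropy(logits, torch.arange(a.size(0)))

ref, tgt, txt = torch.randn(8, 64), torch.randn(8, 64), torch.randn(8, 64)
compositor = torch.nn.Linear(128, 64)          # vision compositor stand-in

query = ref + txt                              # explicit: ref + text -> target
fused = compositor(torch.cat([ref, tgt], -1))  # implicit: (ref, tgt) -> text
loss = nce(query, tgt) + nce(fused, txt)       # train both relations jointly
```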

A Survey of Imitation Learning: Algorithms, Recent Developments, and Challenges

  • paper_url: http://arxiv.org/abs/2309.02473
  • repo_url: None
  • paper_authors: Maryam Zare, Parham M. Kebria, Abbas Khosravi, Saeid Nahavandi
  • for: providing an introduction to imitation learning (IL) and an overview of its underlying assumptions and approaches in the context of robotics and artificial intelligence (AI)
  • methods: a survey of IL techniques that learn desired behaviour from expert demonstrations, covering recent advances, emerging research areas, and how researchers have addressed the common challenges associated with IL
  • results: a comprehensive guide to the growing field of IL in robotics and AI, including potential directions for future research
    Abstract In recent years, the development of robotics and artificial intelligence (AI) systems has been nothing short of remarkable. As these systems continue to evolve, they are being utilized in increasingly complex and unstructured environments, such as autonomous driving, aerial robotics, and natural language processing. As a consequence, programming their behaviors manually or defining their behavior through reward functions (as done in reinforcement learning (RL)) has become exceedingly difficult. This is because such environments require a high degree of flexibility and adaptability, making it challenging to specify an optimal set of rules or reward signals that can account for all possible situations. In such environments, learning from an expert's behavior through imitation is often more appealing. This is where imitation learning (IL) comes into play - a process where desired behavior is learned by imitating an expert's behavior, which is provided through demonstrations. This paper aims to provide an introduction to IL and an overview of its underlying assumptions and approaches. It also offers a detailed description of recent advances and emerging areas of research in the field. Additionally, the paper discusses how researchers have addressed common challenges associated with IL and provides potential directions for future research. Overall, the goal of the paper is to provide a comprehensive guide to the growing field of IL in robotics and AI.
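Since the survey's core object is learning from demonstrations, a minimal behavioural-cloning loop, the simplest IL setting it covers, is a useful anchor (our illustrative example; the network size and random "demonstrations" are placeholders):

```python
import torch
import torch.nn as nn

states = torch.randn(256, 4)                   # expert states
actions = torch.randint(0, 2, (256,))          # expert discrete actions

policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for _ in range(100):
    loss = nn.functional.cross_entropy(policy(states), actions)
    opt.zero_grad()
    loss.backward()
    opt.step()
# The policy imitates the expert wherever demonstrations cover the state
# space; distribution shift at deployment is one of the challenges the
# survey discusses.
```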

Model-based Offline Policy Optimization with Adversarial Network

  • paper_url: http://arxiv.org/abs/2309.02157
  • repo_url: https://github.com/junming-yang/moan
  • paper_authors: Junming Yang, Xingguo Chen, Shengyuan Wang, Bolei Zhang
  • for: model-based offline reinforcement learning (RL), which builds a transition model from a logged dataset so that the policy can be optimized without costly online interaction
  • methods: adversarial learning is used to build a transition model with better generalization; the adversary distinguishes in-distribution from out-of-distribution samples and naturally provides a quantification of model uncertainty with theoretical guarantees
  • results: outperforms state-of-the-art model-based offline RL baselines on widely studied benchmarks, generates diverse in-distribution samples, and quantifies uncertainty more accurately
    Abstract Model-based offline reinforcement learning (RL), which builds a supervised transition model with logging dataset to avoid costly interactions with the online environment, has been a promising approach for offline policy optimization. As the discrepancy between the logging data and online environment may result in a distributional shift problem, many prior works have studied how to build robust transition models conservatively and estimate the model uncertainty accurately. However, the over-conservatism can limit the exploration of the agent, and the uncertainty estimates may be unreliable. In this work, we propose a novel Model-based Offline policy optimization framework with Adversarial Network (MOAN). The key idea is to use adversarial learning to build a transition model with better generalization, where an adversary is introduced to distinguish between in-distribution and out-of-distribution samples. Moreover, the adversary can naturally provide a quantification of the model's uncertainty with theoretical guarantees. Extensive experiments showed that our approach outperforms existing state-of-the-art baselines on widely studied offline RL benchmarks. It can also generate diverse in-distribution samples, and quantify the uncertainty more accurately.
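A hedged sketch of using an adversary's output as an uncertainty penalty in model-based offline RL; the penalized-reward form r - lam * u is a common construction in this literature, and MOAN's exact use of the discriminator may differ:

```python
import torch
import torch.nn as nn

disc = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 1))

def penalized_reward(s, a, s_next, r, lam=1.0):
    """High discriminator output = looks out-of-distribution = less trustworthy."""
    x = torch.cat([s, a, s_next], dim=-1)      # (batch, 6) in this toy example
    u = torch.sigmoid(disc(x)).squeeze(-1)     # uncertainty estimate in [0, 1]
    return r - lam * u                         # conservative reward for planning

s, a, s2 = torch.randn(32, 3), torch.randn(32, 1), torch.randn(32, 2)
r = torch.randn(32)
print(penalized_reward(s, a, s2, r).shape)     # (32,)
```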

Making Large Language Models Better Reasoners with Alignment

  • paper_url: http://arxiv.org/abs/2309.02144
  • repo_url: None
  • paper_authors: Peiyi Wang, Lei Li, Liang Chen, Feifan Song, Binghuai Lin, Yunbo Cao, Tianyu Liu, Zhifang Sui
  • for: improving the reasoning capability of large language models (LLMs), in particular chain-of-thought (COT) reasoning
  • methods: Alignment Fine-Tuning (AFT): fine-tune on COT data, generate multiple COT responses per question, label them positive or negative according to answer correctness, and calibrate the model's scores with a novel constraint alignment loss, which also adapts to ranking feedback
  • results: AFT effectively addresses the assessment misalignment problem of vanilla fine-tuned LLMs and improves reasoning on four benchmarks with both binary and ranking feedback
    Abstract Reasoning is a cognitive process of using evidence to reach a sound conclusion. The reasoning capability is essential for large language models (LLMs) to serve as the brain of the artificial general intelligence agent. Recent studies reveal that fine-tuning LLMs on data with the chain of thought (COT) reasoning process can significantly enhance their reasoning capabilities. However, we find that the fine-tuned LLMs suffer from an \textit{Assessment Misalignment} problem, i.e., they frequently assign higher scores to subpar COTs, leading to potential limitations in their reasoning abilities. To address this problem, we introduce an \textit{Alignment Fine-Tuning (AFT)} paradigm, which involves three steps: 1) fine-tuning LLMs with COT training data; 2) generating multiple COT responses for each question, and categorizing them into positive and negative ones based on whether they achieve the correct answer; 3) calibrating the scores of positive and negative responses given by LLMs with a novel constraint alignment loss. Specifically, the constraint alignment loss has two objectives: a) Alignment, which guarantees that positive scores surpass negative scores to encourage answers with high-quality COTs; b) Constraint, which keeps the negative scores confined to a reasonable range to prevent the model degradation. Beyond just the binary positive and negative feedback, the constraint alignment loss can be seamlessly adapted to the ranking situations when ranking feedback is accessible. Furthermore, we also delve deeply into recent ranking-based alignment methods, such as DPO, RRHF, and PRO, and discover that the constraint, which has been overlooked by these approaches, is also crucial for their performance. Extensive experiments on four reasoning benchmarks with both binary and ranking feedback demonstrate the effectiveness of AFT.
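A hedged sketch of a constraint-alignment-style loss with the abstract's two parts: (a) alignment pushes positive-COT scores above negative ones, and (b) a constraint keeps negative scores in a reasonable range so the model does not degrade. The margin and hinge forms below are illustrative, not the paper's exact formulation:

```python
import torch

def constraint_alignment_loss(pos, neg, margin=1.0, floor=-5.0):
    """pos: (P,) scores of correct COTs; neg: (N,) scores of incorrect COTs."""
    # Alignment: every positive should beat every negative by a margin.
    gap = neg.unsqueeze(0) - pos.unsqueeze(1) + margin   # (P, N)
    align = torch.relu(gap).mean()
    # Constraint: keep negatives above a floor to prevent degradation.
    constraint = torch.relu(floor - neg).mean()
    return align + constraint

pos = torch.tensor([0.8, 1.2])
neg = torch.tensor([0.9, -7.0])
print(constraint_alignment_loss(pos, neg))
```

With ranking feedback, the same pairwise term can be applied between adjacent ranks instead of a binary positive/negative split.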

A Lightweight, Rapid and Efficient Deep Convolutional Network for Chest X-Ray Tuberculosis Detection

  • paper_url: http://arxiv.org/abs/2309.02140
  • repo_url: https://github.com/dani-capellan/LightTBNet
  • paper_authors: Daniel Capellán-Martín, Juan J. Gómez-Valverde, David Bermejo-Peláez, María J. Ledesma-Carbayo
  • for: improving the diagnostic accuracy of tuberculosis (TB) detection from chest X-ray (CXR) images while keeping inference fast and computationally cheap
  • methods: LightTBNet, a lightweight, fast, and efficient deep convolutional network custom-designed to detect TB from CXR images, developed using 800 frontal CXR images from two publicly available datasets
  • results: accuracy, F1, and AUC of 0.906, 0.907, and 0.961 on an independent test subset; the small computational and memory footprint makes the model well suited to handheld devices in low-resource areas with high TB prevalence
    Abstract Tuberculosis (TB) is still recognized as one of the leading causes of death worldwide. Recent advances in deep learning (DL) have shown to enhance radiologists' ability to interpret chest X-ray (CXR) images accurately and with fewer errors, leading to a better diagnosis of this disease. However, little work has been done to develop models capable of diagnosing TB that offer good performance while being efficient, fast and computationally inexpensive. In this work, we propose LightTBNet, a novel lightweight, fast and efficient deep convolutional network specially customized to detect TB from CXR images. Using a total of 800 frontal CXR images from two publicly available datasets, our solution yielded an accuracy, F1 and area under the ROC curve (AUC) of 0.906, 0.907 and 0.961, respectively, on an independent test subset. The proposed model demonstrates outstanding performance while delivering a rapid prediction, with minimal computational and memory requirements, making it highly suitable for deployment in handheld devices that can be used in low-resource areas with high TB prevalence. Code publicly available at https://github.com/dani-capellan/LightTBNet.
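A hedged sketch of a lightweight CXR classifier in the spirit of LightTBNet; the depthwise-separable blocks, input size, and layer widths are our assumptions, not the architecture from the linked repo:

```python
import torch
import torch.nn as nn

def ds_block(cin, cout):                       # depthwise-separable conv block
    return nn.Sequential(
        nn.Conv2d(cin, cin, 3, stride=2, padding=1, groups=cin),
        nn.Conv2d(cin, cout, 1), nn.BatchNorm2d(cout), nn.ReLU())

model = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    ds_block(16, 32), ds_block(32, 64), ds_block(64, 128),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 2))

x = torch.randn(1, 1, 256, 256)                # one grayscale CXR
print(model(x).shape)                          # logits: (1, 2) -> TB vs. normal
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e3:.1f}k parameters")     # small enough for handhelds
```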

Self-Supervised Pre-Training Boosts Semantic Scene Segmentation on LiDAR data

  • paper_url: http://arxiv.org/abs/2309.02139
  • repo_url: https://github.com/marionacaros/barlow-twins-for-sem-seg
  • paper_authors: Mariona Carós, Ariadna Just, Santi Seguí, Jordi Vitrià
  • for: learning semantic scene segmentation of airborne LiDAR data from unlabelled point clouds, so as to reduce the number of annotated samples required
  • methods: pre-train a self-supervised encoder with Barlow Twins and use it as the pre-trained network for the semantic scene segmentation task
  • results: experiments show that the unsupervised pre-training boosts performance once fine-tuned on the supervised task, especially for under-represented categories
    Abstract Airborne LiDAR systems have the capability to capture the Earth's surface by generating extensive point cloud data comprised of points mainly defined by 3D coordinates. However, labeling such points for supervised learning tasks is time-consuming. As a result, there is a need to investigate techniques that can learn from unlabeled data to significantly reduce the number of annotated samples. In this work, we propose to train a self-supervised encoder with Barlow Twins and use it as a pre-trained network in the task of semantic scene segmentation. The experimental results demonstrate that our unsupervised pre-training boosts performance once fine-tuned on the supervised task, especially for under-represented categories.
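A sketch of the Barlow Twins objective used for the pre-training: drive the cross-correlation matrix of two augmented views' embeddings toward the identity. The lambda and dimensions are illustrative; see the linked repo for the authors' setup:

```python
import torch

def barlow_twins_loss(z1, z2, lam=5e-3):
    """z1, z2: (batch, dim) embeddings of two augmentations of the same batch."""
    n, d = z1.shape
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)    # standardize per dimension
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = (z1.T @ z2) / n                             # (dim, dim) cross-correlation
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()  # invariance term
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # redundancy term
    return on_diag + lam * off_diag

z1, z2 = torch.randn(64, 128), torch.randn(64, 128)
print(barlow_twins_loss(z1, z2))
```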

Generalized Simplicial Attention Neural Networks

  • paper_url: http://arxiv.org/abs/2309.02138
  • repo_url: https://github.com/luciatesta97/generalized-simplicial-attention-neural-networks
  • paper_authors: Claudio Battiloro, Lucia Testa, Lorenzo Giusti, Stefania Sardellitti, Paolo Di Lorenzo, Sergio Barbarossa
  • for: introducing Generalized Simplicial Attention Neural Networks (GSANs), novel architectures that process data defined on simplicial complexes with masked self-attentional layers
  • methods: self-attention schemes grounded in topological signal processing that handle data components at different simplicial orders (nodes, edges, triangles, and beyond), learning to weight neighborhoods of the topological domain in a task-oriented fashion via the Dirac operator and its Dirac decomposition
  • results: GSANs are shown to be permutation equivariant and simplicial-aware, and compare favorably with other methods on several inductive and transductive tasks, including trajectory prediction, missing data imputation, graph classification, and simplex prediction
    Abstract The aim of this work is to introduce Generalized Simplicial Attention Neural Networks (GSANs), i.e., novel neural architectures designed to process data defined on simplicial complexes using masked self-attentional layers. Hinging on topological signal processing principles, we devise a series of self-attention schemes capable of processing data components defined at different simplicial orders, such as nodes, edges, triangles, and beyond. These schemes learn how to weight the neighborhoods of the given topological domain in a task-oriented fashion, leveraging the interplay among simplices of different orders through the Dirac operator and its Dirac decomposition. We also theoretically establish that GSANs are permutation equivariant and simplicial-aware. Finally, we illustrate how our approach compares favorably with other methods when applied to several (inductive and transductive) tasks such as trajectory prediction, missing data imputation, graph classification, and simplex prediction.
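For intuition, a sketch of the algebraic objects such architectures are built on: incidence matrices of a toy simplicial complex and the first Hodge Laplacian, whose lower and upper parts couple edges to nodes and to triangles (the Dirac operator stacks these incidence maps; this illustrates the domain, not the GSAN layer itself):

```python
import numpy as np

# Filled triangle on nodes {0,1,2} plus a dangling edge (2,3).
# B1: node-edge incidence; edge order: (0,1), (0,2), (1,2), (2,3)
B1 = np.array([[-1, -1,  0,  0],
               [ 1,  0, -1,  0],
               [ 0,  1,  1, -1],
               [ 0,  0,  0,  1]])
# B2: edge-triangle incidence for the single triangle (0,1,2)
B2 = np.array([[ 1],
               [-1],
               [ 1],
               [ 0]])

L1_lower = B1.T @ B1        # edge adjacency through shared nodes
L1_upper = B2 @ B2.T        # edge adjacency through shared triangles
L1 = L1_lower + L1_upper    # first Hodge Laplacian, acts on edge signals
print(L1)
```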

Exploring the Intersection of Complex Aesthetics and Generative AI for Promoting Cultural Creativity in Rural China after the Post-Pandemic Era

  • paper_url: http://arxiv.org/abs/2309.02136
  • repo_url: None
  • paper_authors: Mengyao Guo, Xiaolin Zhang, Yuan Zhuang, Jing Chen, Pengfei Wang, Ze Gao
  • for: exploring how generative AI and aesthetics can promote cultural creativity in rural China, particularly in the wake of COVID-19
  • methods: literature reviews, case studies, surveys, and text analysis are used to examine art and technology applications in rural contexts and to identify key challenges
  • results: artworks often fail to resonate locally, and reliance on external artists limits sustainability; the study therefore proposes nurturing grassroots "artist villagers" through AI, training machine learning on subjective aesthetics to generate culturally relevant content, and using interactive AI media to boost tourism while preserving heritage, framing AI as a creative enabler rather than a replacement and laying the groundwork for empowering rural communities with AI innovations
    Abstract This paper explores using generative AI and aesthetics to promote cultural creativity in rural China amidst COVID-19's impact. Through literature reviews, case studies, surveys, and text analysis, it examines art and technology applications in rural contexts and identifies key challenges. The study finds artworks often fail to resonate locally, while reliance on external artists limits sustainability. Hence, nurturing grassroots "artist villagers" through AI is proposed. Our approach involves training machine learning on subjective aesthetics to generate culturally relevant content. Interactive AI media can also boost tourism while preserving heritage. This pioneering research puts forth original perspectives on the intersection of AI and aesthetics to invigorate rural culture. It advocates holistic integration of technology and emphasizes AI's potential as a creative enabler versus replacement. Ultimately, it lays the groundwork for further exploration of leveraging AI innovations to empower rural communities. This timely study contributes to growing interest in emerging technologies to address critical issues facing rural China.

Multi-label affordance mapping from egocentric vision

  • paper_url: http://arxiv.org/abs/2309.02120
  • repo_url: https://github.com/lmur98/epic_kitchens_affordances
  • paper_authors: Lorenzo Mur-Labadia, Jose J. Guerrero, Ruben Martinez-Cantin
  • for: pixel-accurate affordance detection and segmentation in interaction scenes, an important piece of complex interaction-based systems such as robots and assistive devices
  • methods: a new multi-label detection and segmentation approach that automatically extracts grounded affordances from first-person videos using a 3D map of the environment, allowing multiple affordances to co-exist in the same space (for example, when they are associated with the same object)
  • results: the method is used to build EPIC-Aff, the largest and most complete interaction-grounded, multi-label, metric and spatial affordance dataset (based on EPIC-Kitchens); the metric representation further supports building maps of interaction hotspots and performing task-oriented navigation
    Abstract Accurate affordance detection and segmentation with pixel precision is an important piece in many complex systems based on interactions, such as robots and assistive devices. We present a new approach to affordance perception which enables accurate multi-label segmentation. Our approach can be used to automatically extract grounded affordances from first person videos of interactions using a 3D map of the environment providing pixel level precision for the affordance location. We use this method to build the largest and most complete dataset on affordances based on the EPIC-Kitchen dataset, EPIC-Aff, which provides interaction-grounded, multi-label, metric and spatial affordance annotations. Then, we propose a new approach to affordance segmentation based on multi-label detection which enables multiple affordances to co-exist in the same space, for example if they are associated with the same object. We present several strategies of multi-label detection using several segmentation architectures. The experimental results highlight the importance of the multi-label detection. Finally, we show how our metric representation can be exploited to build a map of interaction hotspots in spatial action-centric zones and use that representation to perform task-oriented navigation.
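A sketch of the multi-label idea itself: per-pixel sigmoid scores with one channel per affordance, so several affordances can be "on" at the same pixel, whereas a softmax over classes would force them to compete. Shapes and the backbone are purely illustrative:

```python
import torch
import torch.nn as nn

n_affordances = 5
head = nn.Conv2d(64, n_affordances, kernel_size=1)    # on top of any backbone

features = torch.randn(1, 64, 120, 160)               # backbone feature map
logits = head(features)                                # (1, 5, 120, 160)
target = torch.randint(0, 2, logits.shape).float()     # multi-hot pixel labels
loss = nn.functional.binary_cross_entropy_with_logits(logits, target)

probs = torch.sigmoid(logits)                          # independent per label
both = (probs[:, 0] > 0.5) & (probs[:, 1] > 0.5)       # e.g. "cut" and "hold"
```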

Leveraging Label Information for Multimodal Emotion Recognition

  • paper_url: http://arxiv.org/abs/2309.02106
  • repo_url: https://github.com/Digimonseeker/LE-MER
  • paper_authors: Peiying Wang, Sunlu Zeng, Junqing Chen, Lu Fan, Meng Chen, Youzheng Wu, Xiaodong He
  • for: improving multimodal emotion recognition (MER) by combining speech and text information, using label information to locate the salient tokens and frames relevant to each emotion
  • methods: obtain representative label embeddings for the text and speech modalities, learn label-enhanced text/speech representations for each utterance via label-token and label-frame interactions, and fuse them with a novel label-guided attentive fusion module for emotion classification
  • results: extensive experiments on the public IEMOCAP dataset show that the approach outperforms existing baselines and achieves new state-of-the-art performance
    Abstract Multimodal emotion recognition (MER) aims to detect the emotional status of a given expression by combining the speech and text information. Intuitively, label information should be capable of helping the model locate the salient tokens/frames relevant to the specific emotion, which finally facilitates the MER task. Inspired by this, we propose a novel approach for MER by leveraging label information. Specifically, we first obtain the representative label embeddings for both text and speech modalities, then learn the label-enhanced text/speech representations for each utterance via label-token and label-frame interactions. Finally, we devise a novel label-guided attentive fusion module to fuse the label-aware text and speech representations for emotion classification. Extensive experiments were conducted on the public IEMOCAP dataset, and experimental results demonstrate that our proposed approach outperforms existing baselines and achieves new state-of-the-art performance.
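A hedged sketch of the label-token interaction: attend from emotion-label embeddings to token features to obtain a label-aware utterance representation. The dimensions, label set, and single attention layer are simplifying assumptions:

```python
import torch
import torch.nn as nn

n_labels, d = 4, 64                            # e.g. angry/happy/sad/neutral
label_emb = nn.Parameter(torch.randn(n_labels, d))
attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)

tokens = torch.randn(8, 20, d)                 # (batch, tokens, dim), text or speech
queries = label_emb.unsqueeze(0).expand(8, -1, -1)
label_aware, weights = attn(queries, tokens, tokens)  # (8, n_labels, d)
utterance = label_aware.mean(dim=1)            # pooled label-aware representation
```

The same pattern over acoustic frames gives the label-frame interaction, and a fusion module over the two label-aware representations would produce the final emotion logits.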

Improving Query-Focused Meeting Summarization with Query-Relevant Knowledge

  • paper_url: http://arxiv.org/abs/2309.02105
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Tiezheng Yu, Ziwei Ji, Pascale Fung
  • for: generating a summary of a meeting transcript conditioned on a query, where the main challenges are the long input text and the sparsity of query-relevant information in the transcript
  • methods: a knowledge-enhanced two-stage framework, Knowledge-Aware Summarizer (KAS): the first stage introduces knowledge-aware scores to improve query-relevant segment extraction, and the second stage incorporates query-relevant knowledge into summary generation
  • results: experiments on the QMSum dataset show state-of-the-art performance, and further analysis confirms that the generated summaries are relevant and faithful
    Abstract Query-Focused Meeting Summarization (QFMS) aims to generate a summary of a given meeting transcript conditioned upon a query. The main challenges for QFMS are the long input text length and sparse query-relevant information in the meeting transcript. In this paper, we propose a knowledge-enhanced two-stage framework called Knowledge-Aware Summarizer (KAS) to tackle the challenges. In the first stage, we introduce knowledge-aware scores to improve the query-relevant segment extraction. In the second stage, we incorporate query-relevant knowledge in the summary generation. Experimental results on the QMSum dataset show that our approach achieves state-of-the-art performance. Further analysis proves the competency of our methods in generating relevant and faithful summaries.

Iterative Superquadric Recomposition of 3D Objects from Multiple Views

  • paper_url: http://arxiv.org/abs/2309.02102
  • repo_url: https://github.com/explainableml/isco
  • paper_authors: Stephan Alaniz, Massimiliano Mancini, Zeynep Akata
  • for: recomposing objects from 3D superquadrics as semantic parts, mimicking the human ability to identify commonalities between unknown objects from coarse structure down to finer detail
  • methods: superquadric parameters are optimized directly against 2D views by comparing rendered 3D views with image silhouettes, without training any model with 3D supervision; new superquadrics are iteratively added wherever the reconstruction error is high, abstracting coarse regions first and finer details later
  • results: compared with recent single-instance superquadric reconstruction approaches, ISCO provides consistently more accurate 3D reconstructions, even from images in the wild; code is available at https://github.com/ExplainableML/ISCO
    Abstract Humans are good at recomposing novel objects, i.e. they can identify commonalities between unknown objects from general structure to finer detail, an ability difficult to replicate by machines. We propose a framework, ISCO, to recompose an object using 3D superquadrics as semantic parts directly from 2D views without training a model that uses 3D supervision. To achieve this, we optimize the superquadric parameters that compose a specific instance of the object, comparing its rendered 3D view and 2D image silhouette. Our ISCO framework iteratively adds new superquadrics wherever the reconstruction error is high, abstracting first coarse regions and then finer details of the target object. With this simple coarse-to-fine inductive bias, ISCO provides consistent superquadrics for related object parts, despite not having any semantic supervision. Since ISCO does not train any neural network, it is also inherently robust to out-of-distribution objects. Experiments show that, compared to recent single instance superquadrics reconstruction approaches, ISCO provides consistently more accurate 3D reconstructions, even from images in the wild. Code available at https://github.com/ExplainableML/ISCO .

TensorBank: Tensor Lakehouse for Foundation Model Training

  • paper_url: http://arxiv.org/abs/2309.02094
  • repo_url: None
  • paper_authors: Romeo Kienzler, Benedikt Blumenstiel, Zoltan Arnold Nagy, S. Karthik Mukkavilli, Johannes Schmude, Marcus Freitag, Michael Behrendt, Daniel Salles Civitarese, Naomi Simumba, Daiki Kimura, Hendrik Hamann
  • for: Storing and streaming high-dimensional data for foundation model training, a core requirement as foundation models move beyond natural language.
  • methods: A petabyte-scale tensor lakehouse that streams tensors from Cloud Object Store (COS) to GPU memory at wire speed, driven by complex relational queries and accelerated with Hierarchical Statistical Indices (HSI).
  • results: Tensors can be addressed directly at block level via HTTP range reads and, once in GPU memory, transformed with PyTorch transforms; the HSI allow irrelevant blocks to be skipped without reading them.
    Abstract Storing and streaming high dimensional data for foundation model training became a critical requirement with the rise of foundation models beyond natural language. In this paper we introduce TensorBank, a petabyte scale tensor lakehouse capable of streaming tensors from Cloud Object Store (COS) to GPU memory at wire speed based on complex relational queries. We use Hierarchical Statistical Indices (HSI) for query acceleration. Our architecture allows tensors to be addressed directly at block level using HTTP range reads. Once in GPU memory, data can be transformed using PyTorch transforms. We provide a generic PyTorch dataset type with a corresponding dataset factory translating relational queries and requested transformations into instances. By making use of the HSI, irrelevant blocks can be skipped without reading them, as those indices contain statistics on their content at different hierarchical resolution levels. This is an opinionated architecture powered by open standards and making heavy use of open-source technology. Although hardened for production use with geospatial-temporal data, this architecture generalizes to other use cases like computer vision, computational neuroscience, biological sequence analysis and more.
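Block-level addressing via HTTP range reads is a standard object-store technique and can be sketched directly; the URL, dtype, and block layout below are illustrative assumptions rather than TensorBank's actual API.

```python
# Sketch of block-level tensor access via an HTTP range read, the mechanism
# described for streaming blocks from object storage toward GPU memory.
# The URL, dtype, and block layout are illustrative assumptions.
import numpy as np
import requests
import torch

def read_block(url, offset_bytes, n_elems, dtype=np.float32):
    end = offset_bytes + n_elems * np.dtype(dtype).itemsize - 1
    resp = requests.get(url, headers={"Range": f"bytes={offset_bytes}-{end}"})
    resp.raise_for_status()
    block = np.frombuffer(resp.content, dtype=dtype)
    # frombuffer yields a read-only view; copy before handing to PyTorch.
    return torch.from_numpy(block.copy())  # ready for PyTorch transforms

# The hierarchical statistics (HSI) would be consulted *before* this call,
# so blocks whose summary statistics cannot match the query are skipped.
```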

Dual Adversarial Alignment for Realistic Support-Query Shift Few-shot Learning

  • paper_url: http://arxiv.org/abs/2309.02088
  • repo_url: None
  • paper_authors: Siyang Jiang, Rui Fang, Hsi-Wen Chen, Wei Ding, Ming-Syan Chen
  • for: In realistic settings, the support and query sets are subject to multiple, unknown distribution shifts, which makes conventional support-query shift few-shot learning difficult. This paper poses a new, harder challenge: Realistic Support-Query Shift few-shot learning (RSQS), in which the individual samples of each meta-task undergo multiple distribution shifts.
  • methods: A unified adversarial feature alignment method, the DUal adversarial ALignment framework (DuaL), relieves RSQS from two aspects: inter-domain bias and intra-domain variance. For inter-domain bias, the original data are corrupted in advance and a repairer network is trained on the synthesized perturbed inputs by minimizing feature-level distance. For intra-domain variance, a generator network synthesizes hard (less similar) examples from the support set in a self-supervised manner, and regularized optimal transport is introduced to derive a smooth transportation plan.
  • results: A benchmark for RSQS is built with several state-of-the-art baselines on three datasets (CIFAR100, mini-ImageNet, and Tiered-ImageNet); experiments show that DuaL significantly outperforms the state-of-the-art methods on this benchmark.
    Abstract Support-query shift few-shot learning aims to classify unseen examples (query set) to labeled data (support set) based on the learned embedding in a low-dimensional space under a distribution shift between the support set and the query set. However, in real-world scenarios the shifts are usually unknown and varied, making it difficult to estimate in advance. Therefore, in this paper, we propose a novel but more difficult challenge, RSQS, focusing on Realistic Support-Query Shift few-shot learning. The key feature of RSQS is that the individual samples in a meta-task are subjected to multiple distribution shifts in each meta-task. In addition, we propose a unified adversarial feature alignment method called DUal adversarial ALignment framework (DuaL) to relieve RSQS from two aspects, i.e., inter-domain bias and intra-domain variance. On the one hand, for the inter-domain bias, we corrupt the original data in advance and use the synthesized perturbed inputs to train the repairer network by minimizing distance in the feature level. On the other hand, for intra-domain variance, we proposed a generator network to synthesize hard, i.e., less similar, examples from the support set in a self-supervised manner and introduce regularized optimal transportation to derive a smooth optimal transportation plan. Lastly, a benchmark of RSQS is built with several state-of-the-art baselines among three datasets (CIFAR100, mini-ImageNet, and Tiered-Imagenet). Experiment results show that DuaL significantly outperforms the state-of-the-art methods in our benchmark.

Natural Example-Based Explainability: a Survey

  • paper_url: http://arxiv.org/abs/2309.03234
  • repo_url: https://github.com/danderfer/Comp_Sci_Sem_2
  • paper_authors: Antonin Poché, Lucas Hervier, Mohamed-Chafik Bakkay
  • for: 本研究旨在提供一份对当前自然示例基于XAI技术的概述,以便了解不同方法的优缺点和应用场景。
  • methods: 本研究主要涉及的方法包括相似示例、 counterfactual 和 semi-factual 示例、重要实例、原型和概念。
  • results: 本研究提供了这些方法的 semantic definition、认知影响和added value的比较,以便激励和促进未来的自然示例基于XAI技术发展。
    Abstract Explainable Artificial Intelligence (XAI) has become increasingly significant for improving the interpretability and trustworthiness of machine learning models. While saliency maps have stolen the show for the last few years in the XAI field, their ability to reflect models' internal processes has been questioned. Although less in the spotlight, example-based XAI methods have continued to improve. It encompasses methods that use examples as explanations for a machine learning model's predictions. This aligns with the psychological mechanisms of human reasoning and makes example-based explanations natural and intuitive for users to understand. Indeed, humans learn and reason by forming mental representations of concepts based on examples. This paper provides an overview of the state-of-the-art in natural example-based XAI, describing the pros and cons of each approach. A "natural" example simply means that it is directly drawn from the training data without involving any generative process. The exclusion of methods that require generating examples is justified by the need for plausibility which is in some regards required to gain a user's trust. Consequently, this paper will explore the following family of methods: similar examples, counterfactual and semi-factual, influential instances, prototypes, and concepts. In particular, it will compare their semantic definition, their cognitive impact, and added values. We hope it will encourage and facilitate future work on natural example-based XAI.

DeepVol: A Deep Transfer Learning Approach for Universal Asset Volatility Modeling

  • paper_url: http://arxiv.org/abs/2309.02072
  • repo_url: None
  • paper_authors: Chen Liu, Minh-Ngoc Tran, Chao Wang, Richard Gerlach, Robert Kohn
  • for: This paper proposes a deep learning model for more general modeling of financial asset volatility.
  • methods: The model uses transfer learning to capture and model the volatility dynamics of all financial assets, including previously unseen ones, with a single universal model. This contrasts with the prevailing practice in the econometrics literature of training a separate model per dataset.
  • results: The model shows superior generality in volatility modeling and forecasting, with broad potential applications in financial forecasting and risk management.
    Abstract This paper introduces DeepVol, a promising new deep learning volatility model that outperforms traditional econometric models in terms of model generality. DeepVol leverages the power of transfer learning to effectively capture and model the volatility dynamics of all financial assets, including previously unseen ones, using a single universal model. This contrasts to the prevailing practice in econometrics literature, which necessitates training separate models for individual datasets. The introduction of DeepVol opens up new avenues for volatility modeling and forecasting in the finance industry, potentially transforming the way volatility is understood and predicted.

Enhance Multi-domain Sentiment Analysis of Review Texts through Prompting Strategies

  • paper_url: http://arxiv.org/abs/2309.02045
  • repo_url: None
  • paper_authors: Yajing Wang, Zongwei Luo
  • for: This paper aims to improve the performance of Large Language Models (LLMs) on a specific task, namely sentiment analysis.
  • methods: Two novel prompting strategies tailored for sentiment analysis are introduced, RolePlaying (RP) prompting and Chain-of-thought (CoT) prompting, together with their combination, RP-CoT prompting.
  • results: Experiments on three domain datasets show that the proposed prompting strategies consistently improve sentiment analysis accuracy; CoT prompting has a notable impact on implicit sentiment analysis, and RP-CoT performs best among all strategies.
    Abstract Large Language Models (LLMs) have made significant strides in both scientific research and practical applications. Existing studies have demonstrated the state-of-the-art (SOTA) performance of LLMs in various natural language processing tasks. However, the question of how to further enhance LLMs' performance in specific tasks using prompting strategies remains a pivotal concern. This paper explores the enhancement of LLMs' performance in sentiment analysis through the application of prompting strategies. We formulate the process of prompting for sentiment analysis tasks and introduce two novel strategies tailored for sentiment analysis: RolePlaying (RP) prompting and Chain-of-thought (CoT) prompting. Specifically, we also propose the RP-CoT prompting strategy which is a combination of RP prompting and CoT prompting. We conduct comparative experiments on three distinct domain datasets to evaluate the effectiveness of the proposed sentiment analysis strategies. The results demonstrate that the adoption of the proposed prompting strategies leads to a consistent enhancement in sentiment analysis accuracy. Further, the CoT prompting strategy exhibits a notable impact on implicit sentiment analysis, with the RP-CoT prompting strategy delivering the most superior performance among all strategies.
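A minimal sketch of what combined RP-CoT prompting might look like is given below; the exact wording of the role and reasoning instructions is an assumption, since the paper's prompt templates are not reproduced here.

```python
# Illustrative RP-CoT prompt template in the spirit of the paper's strategies.
# The wording is an assumption; the paper's exact prompts may differ.
def rp_cot_prompt(review: str) -> str:
    role = ("You are an experienced sentiment analyst who has reviewed "
            "thousands of customer comments.")                # RolePlaying (RP)
    cot = ("Let's reason step by step: identify the opinion targets, "
           "judge the polarity of each, then combine them.")  # Chain-of-thought (CoT)
    return f"{role}\n{cot}\nReview: {review}\nSentiment (positive/negative/neutral):"

print(rp_cot_prompt("The battery life is great, but the screen scratches easily."))
```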

Diffusion Generative Inverse Design

  • paper_url: http://arxiv.org/abs/2309.02040
  • repo_url: None
  • paper_authors: Marin Vlastelica, Tatiana López-Guevara, Kelsey Allen, Peter Battaglia, Arnaud Doucet, Kimberley Stachenfeld
  • for: solving inverse design problems efficiently
  • methods: using denoising diffusion models (DDMs) and a particle sampling algorithm
  • results: reducing the number of calls to the simulator compared to standard techniques, with improved efficiency
    Abstract Inverse design refers to the problem of optimizing the input of an objective function in order to enact a target outcome. For many real-world engineering problems, the objective function takes the form of a simulator that predicts how the system state will evolve over time, and the design challenge is to optimize the initial conditions that lead to a target outcome. Recent developments in learned simulation have shown that graph neural networks (GNNs) can be used for accurate, efficient, differentiable estimation of simulator dynamics, and support high-quality design optimization with gradient- or sampling-based optimization procedures. However, optimizing designs from scratch requires many expensive model queries, and these procedures exhibit basic failures on either non-convex or high-dimensional problems. In this work, we show how denoising diffusion models (DDMs) can be used to solve inverse design problems efficiently and propose a particle sampling algorithm for further improving their efficiency. We perform experiments on a number of fluid dynamics design challenges, and find that our approach substantially reduces the number of calls to the simulator compared to standard techniques.
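As a rough illustration of pairing a generative sampler with objective-driven particle selection, consider the following sketch; `sample_from_ddm` and `objective` are hypothetical placeholders, and the resampling scheme shown is a generic importance-resampling loop, not the paper's specific algorithm.

```python
# Generic particle-reweighting sketch for generative inverse design:
# draw candidate designs from a generative model, score them with the
# simulator objective, and resample toward promising designs.
# `sample_from_ddm` (with its `init` keyword) and `objective` are
# hypothetical placeholders for a trained DDM and a simulator-based cost.
import numpy as np

def particle_search(sample_from_ddm, objective, n_particles=64, n_rounds=5, temp=1.0):
    particles = sample_from_ddm(n_particles)              # initial designs
    for _ in range(n_rounds):
        scores = np.array([objective(p) for p in particles])
        weights = np.exp(-scores / temp)                  # lower objective = better
        weights /= weights.sum()
        idx = np.random.choice(len(particles), size=n_particles, p=weights)
        # Re-sample fresh designs seeded from the surviving particles.
        particles = sample_from_ddm(n_particles, init=[particles[i] for i in idx])
    return min(particles, key=objective)                  # best design found
```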

The Impact of Artificial Intelligence on the Evolution of Digital Education: A Comparative Study of OpenAI Text Generation Tools including ChatGPT, Bing Chat, Bard, and Ernie

  • paper_url: http://arxiv.org/abs/2309.02029
  • repo_url: None
  • paper_authors: Negin Yazdani Motlagh, Matin Khajavi, Abbas Sharifi, Mohsen Ahmadi
  • for: This paper aims to explore the potential of OpenAI’s text generation tools, particularly ChatGPT, in revolutionizing education and to highlight the challenges and opportunities of AI in education.
  • methods: The paper uses a typology that views education through the lenses of system, process, and result to examine the multifaceted applications of AI in education, including decentralizing global education, personalizing curriculums, and digitally documenting competence-based outcomes.
  • results: The paper highlights ChatGPT’s meteoric rise to one million users in just five days and its potential in democratizing education, fostering autodidacticism, and magnifying student engagement. However, the study also acknowledges the potential challenges of AI in education, such as the need for ethical guidelines, pedagogical adaptations, and strategic collaborations to ensure the responsible use of AI tools.
    Abstract In the digital era, the integration of artificial intelligence (AI) in education has ushered in transformative changes, redefining teaching methodologies, curriculum planning, and student engagement. This review paper delves deep into the rapidly evolving landscape of digital education by contrasting the capabilities and impact of OpenAI's pioneering text generation tools like Bing Chat, Bard, Ernie with a keen focus on the novel ChatGPT. Grounded in a typology that views education through the lenses of system, process, and result, the paper navigates the multifaceted applications of AI. From decentralizing global education and personalizing curriculums to digitally documenting competence-based outcomes, AI stands at the forefront of educational modernization. Highlighting ChatGPT's meteoric rise to one million users in just five days, the study underscores its role in democratizing education, fostering autodidacticism, and magnifying student engagement. However, with such transformative power comes the potential for misuse, as text-generation tools can inadvertently challenge academic integrity. By juxtaposing the promise and pitfalls of AI in education, this paper advocates for a harmonized synergy between AI tools and the educational community, emphasizing the urgent need for ethical guidelines, pedagogical adaptations, and strategic collaborations.

Dynamic Early Exiting Predictive Coding Neural Networks

  • paper_url: http://arxiv.org/abs/2309.02022
  • repo_url: None
  • paper_authors: Alaa Zniber, Ouassim Karrakchou, Mounir Ghogho
  • for: This paper aims to improve the efficiency and power consumption of deep learning models in IoT inference applications.
  • methods: Drawing on predictive coding theory and dynamic early exiting, a shallow bidirectional network is constructed and compared against VGG-16.
  • results: The network achieves accuracy comparable to VGG-16 on image classification with fewer parameters and lower computational complexity.
    Abstract Internet of Things (IoT) sensors are nowadays heavily utilized in various real-world applications ranging from wearables to smart buildings passing by agrotechnology and health monitoring. With the huge amounts of data generated by these tiny devices, Deep Learning (DL) models have been extensively used to enhance them with intelligent processing. However, with the urge for smaller and more accurate devices, DL models became too heavy to deploy. It is thus necessary to incorporate the hardware's limited resources in the design process. Therefore, inspired by the human brain known for its efficiency and low power consumption, we propose a shallow bidirectional network based on predictive coding theory and dynamic early exiting for halting further computations when a performance threshold is surpassed. We achieve comparable accuracy to VGG-16 in image classification on CIFAR-10 with fewer parameters and less computational complexity.
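The core early-exiting mechanism can be sketched in a few lines of PyTorch. The block/head structure below is illustrative, not the paper's shallow bidirectional architecture, and exiting is shown per input (batch size 1) for clarity.

```python
# Minimal sketch of dynamic early exiting: evaluate intermediate classifier
# heads and halt once prediction confidence crosses a threshold, saving the
# cost of the remaining blocks. Architecture details are illustrative.
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    def __init__(self, blocks, heads, threshold=0.9):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)  # backbone stages
        self.heads = nn.ModuleList(heads)    # one classifier head per stage
        self.threshold = threshold

    def forward(self, x):
        # Per-sample exiting, shown here for batch size 1.
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            probs = torch.softmax(head(x), dim=-1)
            if probs.max().item() >= self.threshold:  # confident enough: exit
                return probs
        return probs  # fall through to the final exit
```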

iLoRE: Dynamic Graph Representation with Instant Long-term Modeling and Re-occurrence Preservation

  • paper_url: http://arxiv.org/abs/2309.02012
  • repo_url: None
  • paper_authors: Siwei Zhang, Yun Xiong, Yao Zhang, Xixi Wu, Yiheng Sun, Jiawei Zhang
  • for: This paper proposes a new dynamic graph modeling method that addresses three key limitations of existing approaches, improving their scalability and applicability.
  • methods: The model combines an adaptive short-term updater, which automatically discards useless or noisy edges, with a Transformer-based long-term updater equipped with an identity attention mechanism to better capture node-wise long-term dependencies, plus a graph module that encodes re-occurrence patterns.
  • results: Experiments show that iLoRE models dynamic graphs effectively and outperforms existing methods on real-world datasets.
    Abstract Continuous-time dynamic graph modeling is a crucial task for many real-world applications, such as financial risk management and fraud detection. Though existing dynamic graph modeling methods have achieved satisfactory results, they still suffer from three key limitations, hindering their scalability and further applicability. i) Indiscriminate updating. For incoming edges, existing methods would indiscriminately deal with them, which may lead to more time consumption and unexpected noisy information. ii) Ineffective node-wise long-term modeling. They heavily rely on recurrent neural networks (RNNs) as a backbone, which has been demonstrated to be incapable of fully capturing node-wise long-term dependencies in event sequences. iii) Neglect of re-occurrence patterns. Dynamic graphs involve the repeated occurrence of neighbors that indicates their importance, which is disappointedly neglected by existing methods. In this paper, we present iLoRE, a novel dynamic graph modeling method with instant node-wise Long-term modeling and Re-occurrence preservation. To overcome the indiscriminate updating issue, we introduce the Adaptive Short-term Updater module that will automatically discard the useless or noisy edges, ensuring iLoRE's effectiveness and instant ability. We further propose the Long-term Updater to realize more effective node-wise long-term modeling, where we innovatively propose the Identity Attention mechanism to empower a Transformer-based updater, bypassing the limited effectiveness of typical RNN-dominated designs. Finally, the crucial re-occurrence patterns are also encoded into a graph module for informative representation learning, which will further improve the expressiveness of our method. Our experimental results on real-world datasets demonstrate the effectiveness of our iLoRE for dynamic graph modeling.

Belief revision and incongruity: is it a joke?

  • paper_url: http://arxiv.org/abs/2309.02009
  • repo_url: None
  • paper_authors: Florence Dupin de Saint Cyr - Bannay, Henri Prade
  • for: This paper is an attempt to formalize intelligent behavior, describing an agent listening to a joke.
  • methods: The behavior is formalized in terms of belief revision, surprise, and violation of norms.
  • results: The formalization suggests that, when listening to a joke, an agent's sense of humor arises from revising beliefs upon surprise, with norm violations reinforcing the humorous effect.
    Abstract Incongruity often makes people laugh. You have to be smart to say stupid things. It requires being even smarter to understand them. This paper is a shameless attempt to formalize this intelligent behavior in the case of an agent listening to a joke. All this is a matter of revision of beliefs, surprise and violation of norms.

Aggregating Correlated Estimations with (Almost) no Training

  • paper_url: http://arxiv.org/abs/2309.02005
  • repo_url: None
  • paper_authors: Theo Delemazure, François Durand, Fabien Mathieu
  • for: Many decision problems cannot be solved exactly; this work proposes aggregation rules that take correlated estimation errors into account.
  • methods: Several correlation-aware aggregation rules are proposed and compared with naive rules in a range of experiments on synthetic data.
  • results: When sufficient information about the error correlations is known, maximum-likelihood aggregation should be preferred; otherwise, typically with limited training data, the authors recommend a method they call Embedded Voting (EV).
    Abstract Many decision problems cannot be solved exactly and use several estimation algorithms that assign scores to the different available options. The estimation errors can have various correlations, from low (e.g. between two very different approaches) to high (e.g. when using a given algorithm with different hyperparameters). Most aggregation rules would suffer from this diversity of correlations. In this article, we propose different aggregation rules that take correlations into account, and we compare them to naive rules in various experiments based on synthetic data. Our results show that when sufficient information is known about the correlations between errors, a maximum likelihood aggregation should be preferred. Otherwise, typically with limited training data, we recommend a method that we call Embedded Voting (EV).
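For intuition, when the estimators' errors are zero-mean Gaussian with a known covariance matrix Sigma, the maximum-likelihood aggregate is the weighted combination w = Sigma^{-1} 1 / (1^T Sigma^{-1} 1); a minimal sketch under that assumption (the covariance values are invented for illustration):

```python
# Maximum-likelihood aggregation of correlated estimators under zero-mean
# Gaussian errors with known covariance Sigma: the optimal weights are
#   w = Sigma^{-1} 1 / (1^T Sigma^{-1} 1),
# which reduces to a simple average when errors are i.i.d.
import numpy as np

def ml_aggregate(estimates, sigma):
    ones = np.ones(len(estimates))
    w = np.linalg.solve(sigma, ones)   # Sigma^{-1} 1
    w /= ones @ w                      # normalize weights to sum to 1
    return w @ np.asarray(estimates)

# Two highly correlated estimators plus one independent one: the independent
# estimator receives more weight than a naive average would give it.
sigma = np.array([[1.0, 0.9, 0.0],
                  [0.9, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
print(ml_aggregate([1.1, 1.2, 0.9], sigma))
```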

Analyzing domain shift when using additional data for the MICCAI KiTS23 Challenge

  • paper_url: http://arxiv.org/abs/2309.02001
  • repo_url: None
  • paper_authors: George Stoica, Mihaela Breaban, Vlad Barbu
  • for: Improving 3D medical image segmentation results, especially when training material is scarce.
  • methods: Histogram matching is used to mitigate the domain shift so that new data can be used for preprocessing and training together with the original training data.
  • results: Transforming the additional data with histogram matching yields better results than simple normalization.
    Abstract Using additional training data is known to improve the results, especially for medical image 3D segmentation where there is a lack of training material and the model needs to generalize well from few available data. However, the new data could have been acquired using other instruments and preprocessed such that its distribution is significantly different from the original training data. Therefore, we study techniques which ameliorate domain shift during training so that the additional data becomes better usable for preprocessing and training together with the original data. Our results show that transforming the additional data using histogram matching has better results than using simple normalization.
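Histogram matching of this kind is available off the shelf; a minimal sketch using scikit-image (the file names are illustrative):

```python
# Histogram matching of additional training volumes to the intensity
# distribution of the original training data, before joint training.
# File names are illustrative placeholders.
import numpy as np
from skimage.exposure import match_histograms

reference = np.load("original_case.npy")    # a volume from the original set
extra = np.load("additional_case.npy")      # a volume from the extra dataset
matched = match_histograms(extra, reference)  # align intensity distributions
```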

Photonic Structures Optimization Using Highly Data-Efficient Deep Learning: Application To Nanofin And Annular Groove Phase Masks

  • paper_url: http://arxiv.org/abs/2309.01995
  • repo_url: https://github.com/kaeryv/acsphot23suppl
  • paper_authors: Nicolas Roy, Lorenzo König, Olivier Absil, Charlotte Beauthier, Alexandre Mayer, Michaël Lobet
  • for: This paper introduces a surrogate optimization framework for metasurfaces, specifically for the manipulation of light properties in astronomical high-contrast imaging.
  • methods: Computational intelligence techniques such as partial least squares Kriging, radial basis functions, and neural networks are used to optimize the geometric features of vortex phase masks (VPMs). Since these methods prove inadequate for modeling VPM performance, a data-efficient evolutionary optimization setup using a deep neural network as surrogate is proposed instead.
  • results: Optimal designs are developed for two design candidates, with the surrogate model improving the reliability and efficiency of the procedure. In the most complex case, evolutionary optimization enables a design optimization that would otherwise be impractical (requiring too many simulations). The surrogate model reduces the required number of simulations by up to 75% compared to conventional optimization techniques.
    Abstract Metasurfaces offer a flexible framework for the manipulation of light properties in the realm of thin film optics. Specifically, the polarization of light can be effectively controlled through the use of thin phase plates. This study aims to introduce a surrogate optimization framework for these devices. The framework is applied to develop two kinds of vortex phase masks (VPMs) tailored for application in astronomical high-contrast imaging. Computational intelligence techniques are exploited to optimize the geometric features of these devices. The large design space and computational limitations necessitate the use of surrogate models like partial least squares Kriging, radial basis functions, or neural networks. However, we demonstrate the inadequacy of these methods in modeling the performance of VPMs. To address the shortcomings of these methods, a data-efficient evolutionary optimization setup using a deep neural network as a highly accurate and efficient surrogate model is proposed. The optimization process in this study employs a robust particle swarm evolutionary optimization scheme, which operates on explicit geometric parameters of the photonic device. Through this approach, optimal designs are developed for two design candidates. In the most complex case, evolutionary optimization enables optimization of the design that would otherwise be impractical (requiring too much simulations). In both cases, the surrogate model improves the reliability and efficiency of the procedure, effectively reducing the required number of simulations by up to 75% compared to conventional optimization techniques.

sasdim: self-adaptive noise scaling diffusion model for spatial time series imputation

  • paper_url: http://arxiv.org/abs/2309.01988
  • repo_url: None
  • paper_authors: Shunyang Zhang, Senzhang Wang, Xianzhen Tan, Ruochen Liu, Jian Zhang, Jianxin Wang
  • for: spatial time series imputation
  • methods: self-adaptive noise scaling diffusion model (SaSDim) with a new loss function and across spatial-temporal global convolution module
  • results: effective imputation performance on three real-world datasets, with comparison to current state-of-the-art baselines
    Abstract Spatial time series imputation is critically important to many real applications such as intelligent transportation and air quality monitoring. Although recent transformer and diffusion model based approaches have achieved significant performance gains compared with conventional statistic based methods, spatial time series imputation still remains as a challenging issue due to the complex spatio-temporal dependencies and the noise uncertainty of the spatial time series data. Especially, recent diffusion process based models may introduce random noise to the imputations, and thus cause negative impact on the model performance. To this end, we propose a self-adaptive noise scaling diffusion model named SaSDim to more effectively perform spatial time series imputation. Specially, we propose a new loss function that can scale the noise to the similar intensity, and propose the across spatial-temporal global convolution module to more effectively capture the dynamic spatial-temporal dependencies. Extensive experiments conducted on three real world datasets verify the effectiveness of SaSDim by comparison with current state-of-the-art baselines.

Graph-Based Interaction-Aware Multimodal 2D Vehicle Trajectory Prediction using Diffusion Graph Convolutional Networks

  • paper_url: http://arxiv.org/abs/2309.01981
  • repo_url: None
  • paper_authors: Keshu Wu, Yang Zhou, Haotian Shi, Xiaopeng Li, Bin Ran
  • for: Predicting vehicle trajectories to improve the efficiency and safety of automated vehicle operation, particularly on congested multi-lane highways.
  • methods: The Graph-based Interaction-aware Multi-modal Trajectory Prediction (GIMTP) framework represents dynamic vehicle interactions as a time-varying graph and uses a Diffusion Graph Convolutional Network (DGCN) to capture both spatial and temporal dependencies.
  • results: The framework provides two-dimensional predictions covering longitudinal and lateral driving behaviors, together with probabilistic future paths and their corresponding probabilities.
    Abstract Predicting vehicle trajectories is crucial for ensuring automated vehicle operation efficiency and safety, particularly on congested multi-lane highways. In such dynamic environments, a vehicle's motion is determined by its historical behaviors as well as interactions with surrounding vehicles. These intricate interactions arise from unpredictable motion patterns, leading to a wide range of driving behaviors that warrant in-depth investigation. This study presents the Graph-based Interaction-aware Multi-modal Trajectory Prediction (GIMTP) framework, designed to probabilistically predict future vehicle trajectories by effectively capturing these interactions. Within this framework, vehicles' motions are conceptualized as nodes in a time-varying graph, and the traffic interactions are represented by a dynamic adjacency matrix. To holistically capture both spatial and temporal dependencies embedded in this dynamic adjacency matrix, the methodology incorporates the Diffusion Graph Convolutional Network (DGCN), thereby providing a graph embedding of both historical states and future states. Furthermore, we employ a driving intention-specific feature fusion, enabling the adaptive integration of historical and future embeddings for enhanced intention recognition and trajectory prediction. This model gives two-dimensional predictions for each mode of longitudinal and lateral driving behaviors and offers probabilistic future paths with corresponding probabilities, addressing the challenges of complex vehicle interactions and multi-modality of driving behaviors. Validation using real-world trajectory datasets demonstrates the efficiency and potential.

Linear Regression using Heterogeneous Data Batches

  • paper_url: http://arxiv.org/abs/2309.01973
  • repo_url: None
  • paper_authors: Ayush Jain, Rajat Sen, Weihao Kong, Abhimanyu Das, Alon Orlitsky
  • for: Learning input-output relationships from multiple sources, each of which provides too little data on its own.
  • methods: A novel gradient-based algorithm that improves on existing results by allowing different, unknown, and heavy-tailed input distributions for each subgroup, recovering all subgroups given a significant proportion of batches, and removing the separation requirement between regression vectors.
  • results: The algorithm extends the applicability of prior results, allowing smaller batch sizes and reducing the number of batches needed for accurate regression.
    Abstract In many learning applications, data are collected from multiple sources, each providing a *batch* of samples that by itself is insufficient to learn its input-output relationship. A common approach assumes that the sources fall in one of several unknown subgroups, each with an unknown input distribution and input-output relationship. We consider one of this setup's most fundamental and important manifestations where the output is a noisy linear combination of the inputs, and there are $k$ subgroups, each with its own regression vector. Prior work [kong2020meta] showed that with abundant small-batches, the regression vectors can be learned with only few, $\tilde\Omega(k^{3/2})$, batches of medium-size with $\tilde\Omega(\sqrt k)$ samples each. However, the paper requires that the input distribution for all $k$ subgroups be isotropic Gaussian, and states that removing this assumption is an "interesting and challenging problem". We propose a novel gradient-based algorithm that improves on the existing results in several ways. It extends the applicability of the algorithm by: (1) allowing the subgroups' underlying input distributions to be different, unknown, and heavy-tailed; (2) recovering all subgroups followed by a significant proportion of batches even for infinite $k$; (3) removing the separation requirement between the regression vectors; (4) reducing the number of batches and allowing smaller batch sizes.

A Survey on Interpretable Cross-modal Reasoning

  • paper_url: http://arxiv.org/abs/2309.01955
  • repo_url: https://github.com/ZuyiZhou/Awesome-Interpretable-Cross-modal-Reasoning
  • paper_authors: Dizhan Xue, Shengsheng Qian, Zuyi Zhou, Changsheng Xu
  • for: 本文旨在探讨可解释的跨模态理解(I-CMR),即不 только实现高预测性能,还能提供人类可理解的解释结果。
  • methods: 本文使用三级分类法概述I-CMR的典型方法,并对现有的CMR数据集进行了解释注释。
  • results: 本文总结了I-CMR的挑战和未来发展方向,并提供了一个包含相关方法、数据集和资源的GitHub项目。
    Abstract In recent years, cross-modal reasoning (CMR), the process of understanding and reasoning across different modalities, has emerged as a pivotal area with applications spanning from multimedia analysis to healthcare diagnostics. As the deployment of AI systems becomes more ubiquitous, the demand for transparency and comprehensibility in these systems' decision-making processes has intensified. This survey delves into the realm of interpretable cross-modal reasoning (I-CMR), where the objective is not only to achieve high predictive performance but also to provide human-understandable explanations for the results. This survey presents a comprehensive overview of the typical methods with a three-level taxonomy for I-CMR. Furthermore, this survey reviews the existing CMR datasets with annotations for explanations. Finally, this survey summarizes the challenges for I-CMR and discusses potential future directions. In conclusion, this survey aims to catalyze the progress of this emerging research area by providing researchers with a panoramic and comprehensive perspective, illuminating the state of the art and discerning the opportunities. The summarized methods, datasets, and other resources are available at https://github.com/ZuyiZhou/Awesome-Interpretable-Cross-modal-Reasoning.

RADIO: Reference-Agnostic Dubbing Video Synthesis

  • paper_url: http://arxiv.org/abs/2309.01950
  • repo_url: None
  • paper_authors: Dongyeun Lee, Chaewon Kim, Sangjoon Yu, Jaejun Yoo, Gyeong-Moon Park
  • for: The biggest challenge in high-fidelity talking head generation is achieving fine detail while ensuring precise synchronization.
  • methods: The RADIO framework modulates the decoder layers with a latent space composed of audio and reference features to produce high-quality dubbed videos, and incorporates ViT blocks into the decoder to emphasize high-fidelity details, especially in the lip region.
  • results: Experiments show that RADIO maintains high synchronization without loss of fidelity and outperforms state-of-the-art methods, particularly when the reference frame deviates significantly from the ground truth, highlighting its robustness.
    Abstract One of the most challenging problems in audio-driven talking head generation is achieving high-fidelity detail while ensuring precise synchronization. Given only a single reference image, extracting meaningful identity attributes becomes even more challenging, often causing the network to mirror the facial and lip structures too closely. To address these issues, we introduce RADIO, a framework engineered to yield high-quality dubbed videos regardless of the pose or expression in reference images. The key is to modulate the decoder layers using latent space composed of audio and reference features. Additionally, we incorporate ViT blocks into the decoder to emphasize high-fidelity details, especially in the lip region. Our experimental results demonstrate that RADIO displays high synchronization without the loss of fidelity. Especially in harsh scenarios where the reference frame deviates significantly from the ground truth, our method outperforms state-of-the-art methods, highlighting its robustness. Pre-trained model and codes will be made public after the review.

OHQ: On-chip Hardware-aware Quantization

  • paper_url: http://arxiv.org/abs/2309.01945
  • repo_url: None
  • paper_authors: Wei Huang, Haotong Qin, Yangdong Liu, Jingzhuo Liang, Yifu Ding, Ying Li, Xianglong Liu
  • for: This paper proposes an on-chip quantization framework for deploying advanced deep models on resource-constrained hardware.
  • methods: The framework performs hardware-aware mixed-precision quantization entirely on chip, combining an On-chip Quantization Awareness (OQA) pipeline with Mask-guided Quantization Estimation (MQE), and derives optimized bit-width configurations via linear programming.
  • results: The framework enables accelerated inference after quantization, achieving 70% and 73% accuracy for ResNet-18 and MobileNetV3 respectively, and improves latency by 15~30% over INT8 at deployment.
    Abstract Quantization emerges as one of the most promising approaches for deploying advanced deep models on resource-constrained hardware. Mixed-precision quantization leverages multiple bit-width architectures to unleash the accuracy and efficiency potential of quantized models. However, existing mixed-precision quantization suffers from an exhaustive search space that causes immense computational overhead. The quantization process thus relies on separate high-performance devices rather than running locally, which also leads to a significant gap between the considered hardware metrics and the real deployment. In this paper, we propose an On-chip Hardware-aware Quantization (OHQ) framework that performs hardware-aware mixed-precision quantization without accessing online devices. First, we construct the On-chip Quantization Awareness (OQA) pipeline, enabling it to perceive the actual efficiency metrics of the quantization operator on the hardware. Second, we propose the Mask-guided Quantization Estimation (MQE) technique to efficiently estimate the accuracy metrics of operators under the constraints of on-chip-level computing power. By synthesizing network and hardware insights through linear programming, we obtain optimized bit-width configurations. Notably, the quantization process occurs on-chip entirely without any additional computing devices and data access. We demonstrate accelerated inference after quantization for various architectures and compression ratios, achieving 70% and 73% accuracy for ResNet-18 and MobileNetV3, respectively. OHQ improves latency by 15~30% compared to INT8 on deployment.
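The bit-width selection step can be illustrated as a small integer linear program: choose one bit-width per layer to maximize an accuracy proxy under a latency budget. The numbers below are invented; in OHQ they would come from the OQA pipeline and MQE estimates, and the paper's exact formulation may differ.

```python
# Sketch of bit-width assignment as an integer linear program: pick one
# bit-width per layer to maximize an accuracy proxy subject to a latency
# budget. Accuracy/latency values are illustrative placeholders.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

acc = np.array([[0.90, 0.99], [0.80, 0.98]])  # layers x {4-bit, 8-bit} accuracy proxy
lat = np.array([[1.0, 2.0], [1.5, 3.0]])      # matching latency estimates
budget = 4.0
n_layers, n_bits = acc.shape

c = -acc.ravel()                              # milp minimizes, so negate accuracy
# Each layer must select exactly one bit-width; total latency within budget.
one_hot = np.kron(np.eye(n_layers), np.ones(n_bits))
constraints = [LinearConstraint(one_hot, 1, 1),
               LinearConstraint(lat.ravel(), 0, budget)]
res = milp(c, constraints=constraints,
           integrality=np.ones(c.size), bounds=Bounds(0, 1))
print(res.x.reshape(n_layers, n_bits))        # one-hot bit-width choice per layer
```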

Quantum-AI empowered Intelligent Surveillance: Advancing Public Safety Through Innovative Contraband Detection

  • paper_url: http://arxiv.org/abs/2309.03231
  • repo_url: None
  • paper_authors: Syed Atif Ali Shah, Nasir Algeelani, Najeeb Al-Sammarraie
  • for: This work develops a quantum-AI-based real-time object detection system for surveillance, addressing the speed issues that have held back real-time detection in the field.
  • methods: A RetinaNet model is integrated with a quantum convolutional neural network (QCNN), termed Quantum-RetinaNet, to balance accuracy and speed in real-time detection.
  • results: Quantum-RetinaNet performs strongly in experiments, delivering accurate, high-speed real-time detection and offering a viable solution to the speed bottleneck of real-time recognition.
    Abstract Surveillance systems have emerged as crucial elements in upholding peace and security in the modern world. Their ubiquity aids in monitoring suspicious activities effectively. However, in densely populated environments, continuous active monitoring becomes impractical, necessitating the development of intelligent surveillance systems. AI integration in the surveillance domain was a big revolution; however, speed issues have prevented its widespread implementation in the field. It has been observed that quantum artificial intelligence has led to a great breakthrough. Quantum artificial intelligence-based surveillance systems have been shown to be more accurate as well as capable of performing well in real-time scenarios, which had never been seen before. In this research, a RetinaNet model is integrated with Quantum CNN and termed Quantum-RetinaNet. By harnessing the Quantum capabilities of QCNN, Quantum-RetinaNet strikes a balance between accuracy and speed. This innovative integration positions it as a game-changer, addressing the challenges of active monitoring in densely populated scenarios. As demand for efficient surveillance solutions continues to grow, Quantum-RetinaNet offers a compelling alternative to existing CNN models, upholding accuracy standards without sacrificing real-time performance. The unique attributes of Quantum-RetinaNet have far-reaching implications for the future of intelligent surveillance. With its enhanced processing speed, it is poised to revolutionize the field, catering to the pressing need for rapid yet precise monitoring. As Quantum-RetinaNet becomes the new standard, it ensures public safety and security while pushing the boundaries of AI in surveillance.

Dynamic Brain Transformer with Multi-level Attention for Functional Brain Network Analysis

  • paper_url: http://arxiv.org/abs/2309.01941
  • repo_url: https://github.com/Wayfear/Dynamic-Brain-Transformer
  • paper_authors: Xuan Kan, Antonio Aodong Chen Gu, Hejie Cui, Ying Guo, Carl Yang
  • for: This paper proposes a new method for more effective analysis of brain function.
  • methods: The Dynamic bRAin Transformer (DART) fuses static and dynamic brain networks, using attention mechanisms to improve the accuracy and nuance of functional brain network analysis.
  • results: DART predicts clinical outcomes and categorizes individuals more effectively than prior approaches, and it additionally reveals which brain circuits or dynamic networks contribute most to the final predictions.
    Abstract Recent neuroimaging studies have highlighted the importance of network-centric brain analysis, particularly with functional magnetic resonance imaging. The emergence of Deep Neural Networks has fostered a substantial interest in predicting clinical outcomes and categorizing individuals based on brain networks. However, the conventional approach involving static brain network analysis offers limited potential in capturing the dynamism of brain function. Although recent studies have attempted to harness dynamic brain networks, their high dimensionality and complexity present substantial challenges. This paper proposes a novel methodology, Dynamic bRAin Transformer (DART), which combines static and dynamic brain networks for more effective and nuanced brain function analysis. Our model uses the static brain network as a baseline, integrating dynamic brain networks to enhance performance against traditional methods. We innovatively employ attention mechanisms, enhancing model explainability and exploiting the dynamic brain network's temporal variations. The proposed approach offers a robust solution to the low signal-to-noise ratio of blood-oxygen-level-dependent signals, a recurring issue in direct DNN modeling. It also provides valuable insights into which brain circuits or dynamic networks contribute more to final predictions. As such, DART shows a promising direction in neuroimaging studies, contributing to the comprehensive understanding of brain organization and the role of neural circuits.

CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models

  • paper_url: http://arxiv.org/abs/2309.01940
  • repo_url: https://github.com/apexlab/codeapex
  • paper_authors: Lingyue Fu, Huacan Chai, Shuang Luo, Kounianhua Du, Weiming Zhang, Longteng Fan, Jiayi Lei, Renting Rui, Jianghao Lin, Yuchen Fang, Yifan Liu, Jingkuan Wang, Siyuan Qi, Kangning Zhang, Weinan Zhang, Yong Yu
  • for: This paper evaluates the programming capabilities of large language models (LLMs).
  • methods: CodeApex, a bilingual benchmark dataset, assesses LLMs on programming comprehension and code generation. It comprises three types of multiple-choice questions (conceptual understanding, commonsense reasoning, and multi-hop reasoning) for comprehension, plus algorithmic questions with corresponding test cases to assess the quality of generated code.
  • results: Fourteen state-of-the-art LLMs, both general-purpose and specialized, are evaluated. GPT shows the best programming capability, with approximate accuracies of 50% and 56% on the two tasks, indicating substantial room for improvement on programming tasks.
    Abstract With the emergence of Large Language Models (LLMs), there has been a significant improvement in the programming capabilities of models, attracting growing attention from researchers. We propose CodeApex, a bilingual benchmark dataset focusing on the programming comprehension and code generation abilities of LLMs. CodeApex comprises three types of multiple-choice questions: conceptual understanding, commonsense reasoning, and multi-hop reasoning, designed to evaluate LLMs on programming comprehension tasks. Additionally, CodeApex utilizes algorithmic questions and corresponding test cases to assess the code quality generated by LLMs. We evaluate 14 state-of-the-art LLMs, including both general-purpose and specialized models. GPT exhibits the best programming capabilities, achieving approximate accuracies of 50% and 56% on the two tasks, respectively. There is still significant room for improvement in programming tasks. We hope that CodeApex can serve as a reference for evaluating the coding capabilities of LLMs, further promoting their development and growth. Datasets are released at https://github.com/APEXLAB/CodeApex.git. CodeApex submission website is https://apex.sjtu.edu.cn/codeapex/.

Provably safe systems: the only path to controllable AGI

  • paper_url: http://arxiv.org/abs/2309.01933
  • repo_url: None
  • paper_authors: Max Tegmark, Steve Omohundro
  • for: This paper describes a path for humanity to thrive safely alongside powerful Artificial General Intelligences (AGIs).
  • methods: It proposes building AGIs that provably satisfy human-specified requirements, using advanced AI for formal verification and mechanistic interpretability.
  • results: The authors argue this is the only path that guarantees safe, controlled AGI, and close with a list of challenge problems whose solutions would contribute to this outcome.
    Abstract We describe a path to humanity safely thriving with powerful Artificial General Intelligences (AGIs) by building them to provably satisfy human-specified requirements. We argue that this will soon be technically feasible using advanced AI for formal verification and mechanistic interpretability. We further argue that it is the only path which guarantees safe controlled AGI. We end with a list of challenge problems whose solution would contribute to this positive outcome and invite readers to join in this work.

Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes

  • paper_url: http://arxiv.org/abs/2309.01922
  • repo_url: None
  • paper_authors: Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal
  • for: This paper considers infinite-horizon average-reward Markov decision processes (MDPs). Unlike existing work, it does not assume a linear MDP structure, instead employing a general parameterized policy gradient algorithm.
  • methods: A policy gradient algorithm is proposed and its global convergence is established; a regret bound is then derived, the first regret-bound computation for a general parameterized policy gradient algorithm in the average-reward setting.
  • results: The proposed algorithm is shown to achieve a regret bound of $\tilde{\mathcal{O}}(T^{3/4})$.
    Abstract In this paper, we consider an infinite horizon average reward Markov Decision Process (MDP). Distinguishing itself from existing works within this context, our approach harnesses the power of the general policy gradient-based algorithm, liberating it from the constraints of assuming a linear MDP structure. We propose a policy gradient-based algorithm and show its global convergence property. We then prove that the proposed algorithm has $\tilde{\mathcal{O}}(T^{3/4})$ regret. Remarkably, this paper marks a pioneering effort by presenting the first exploration into regret-bound computation for the general parameterized policy gradient algorithm in the context of average reward scenarios.
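For concreteness, the regret notion being bounded in the average-reward setting is typically defined as follows (standard notation, not reproduced from the paper):

```latex
% Average-reward regret over horizon T, where J^* is the optimal
% long-run average reward of the MDP:
\mathrm{Reg}(T) \;=\; T\,J^{*} \;-\; \sum_{t=1}^{T} r(s_t, a_t),
\qquad \mathbb{E}\big[\mathrm{Reg}(T)\big] \;=\; \tilde{\mathcal{O}}\big(T^{3/4}\big).
```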

SyntheWorld: A Large-Scale Synthetic Dataset for Land Cover Mapping and Building Change Detection

  • paper_url: http://arxiv.org/abs/2309.01907
  • repo_url: https://github.com/JTRNEO/SyntheWorld
  • paper_authors: Jian Song, Hongruixuan Chen, Naoto Yokoya
  • for: Advancing computer vision tasks and techniques, particularly in remote sensing image processing.
  • methods: A synthetic dataset of 40,000 images with submeter-level pixels and fine-grained land cover annotations in eight categories, plus 40,000 bitemporal image pairs with building change annotations for the building change detection task.
  • results: Experiments on multiple benchmark remote sensing datasets verify the quality and diversity of SyntheWorld and investigate the conditions under which the synthetic data yield advantages.
    Abstract Synthetic datasets, recognized for their cost effectiveness, play a pivotal role in advancing computer vision tasks and techniques. However, when it comes to remote sensing image processing, the creation of synthetic datasets becomes challenging due to the demand for larger-scale and more diverse 3D models. This complexity is compounded by the difficulties associated with real remote sensing datasets, including limited data acquisition and high annotation costs, which amplifies the need for high-quality synthetic alternatives. To address this, we present SyntheWorld, a synthetic dataset unparalleled in quality, diversity, and scale. It includes 40,000 images with submeter-level pixels and fine-grained land cover annotations of eight categories, and it also provides 40,000 pairs of bitemporal image pairs with building change annotations for building change detection task. We conduct experiments on multiple benchmark remote sensing datasets to verify the effectiveness of SyntheWorld and to investigate the conditions under which our synthetic data yield advantages. We will release SyntheWorld to facilitate remote sensing image processing research.
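A dataset of this shape (RGB tile plus per-pixel land-cover mask) is typically consumed through a paired loader. The PyTorch sketch below assumes a hypothetical directory layout and file pairing; it is not SyntheWorld's actual structure:

```python
from pathlib import Path

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class LandCoverDataset(Dataset):
    """Pairs each RGB tile with its per-pixel class mask (eight land-cover classes)."""
    def __init__(self, root: str):
        self.images = sorted(Path(root, "images").glob("*.png"))  # assumed layout
        self.masks = sorted(Path(root, "masks").glob("*.png"))
        assert len(self.images) == len(self.masks)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, i):
        img = np.asarray(Image.open(self.images[i]).convert("RGB"), dtype=np.float32) / 255.0
        mask = np.asarray(Image.open(self.masks[i]), dtype=np.int64)  # class index per pixel
        return torch.from_numpy(img).permute(2, 0, 1), torch.from_numpy(mask)
```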

Towards General and Efficient Online Tuning for Spark

  • paper_url: http://arxiv.org/abs/2309.01901
  • repo_url: None
  • paper_authors: Yang Li, Huaijun Jiang, Yu Shen, Yide Fang, Xiaofeng Yang, Danqing Huang, Xinyi Zhang, Wentao Zhang, Ce Zhang, Peng Chen, Bin Cui
  • for: Improving Spark performance by solving its auto-tuning problem, which existing approaches leave with limited functionality, high overhead, and inefficient search.
  • methods: A general and efficient Spark tuning framework built on a generalized optimization formulation solved with Bayesian optimization, online evaluations during actual job executions with a safe configuration acquisition method, and three acceleration techniques: adaptive sub-space generation, approximate gradient descent, and meta-learning.
  • results: Deployed as an independent cloud service on Tencent's data platform, it saves an average of 57.00% memory cost and 34.93% CPU cost on 25K in-production tasks within 20 iterations, demonstrating practicality, generality, and efficiency.
    Abstract The distributed data analytic system -- Spark is a common choice for processing massive volumes of heterogeneous data, while it is challenging to tune its parameters to achieve high performance. Recent studies try to employ auto-tuning techniques to solve this problem but suffer from three issues: limited functionality, high overhead, and inefficient search. In this paper, we present a general and efficient Spark tuning framework that can deal with the three issues simultaneously. First, we introduce a generalized tuning formulation, which can support multiple tuning goals and constraints conveniently, and a Bayesian optimization (BO) based solution to solve this generalized optimization problem. Second, to avoid high overhead from additional offline evaluations in existing methods, we propose to tune parameters along with the actual periodic executions of each job (i.e., online evaluations). To ensure safety during online job executions, we design a safe configuration acquisition method that models the safe region. Finally, three innovative techniques are leveraged to further accelerate the search process: adaptive sub-space generation, approximate gradient descent, and meta-learning method. We have implemented this framework as an independent cloud service, and applied it to the data platform in Tencent. The empirical results on both public benchmarks and large-scale production tasks demonstrate its superiority in terms of practicality, generality, and efficiency. Notably, this service saves an average of 57.00% memory cost and 34.93% CPU cost on 25K in-production tasks within 20 iterations, respectively.
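The loop the abstract describes — propose a configuration, run the job online, update the surrogate — can be sketched with a generic Gaussian-process Bayesian optimizer. This is an illustrative sketch, not the paper's framework; the safety filter here is a crude hypothetical threshold on predicted cost standing in for the paper's safe-region model, and `run_job` is an assumed callback:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(gp, X_cand, y_best):
    """EI acquisition for cost minimization."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def tune(run_job, bounds, n_iters=20, n_init=5, safety_margin=1.5, seed=0):
    """run_job(x) executes one periodic job with config x and returns its cost.
    bounds: (d, 2) array of per-parameter ranges. Sketch only."""
    rng = np.random.default_rng(seed)
    d = len(bounds)
    X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_init, d))
    y = np.array([run_job(x) for x in X])
    for _ in range(n_iters):
        gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
        cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(512, d))
        mu, _ = gp.predict(cand, return_std=True)
        # Crude safe region: keep configs not predicted to be much worse
        # than the incumbent, so online jobs are unlikely to degrade badly.
        safe = cand[mu <= safety_margin * y.min()]
        if len(safe) == 0:
            safe = cand
        x_next = safe[np.argmax(expected_improvement(gp, safe, y.min()))]
        X = np.vstack([X, x_next])
        y = np.append(y, run_job(x_next))      # online evaluation
    return X[np.argmin(y)], y.min()
```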

Inferring Actual Treatment Pathways from Patient Records

  • paper_url: http://arxiv.org/abs/2309.01897
  • repo_url: None
  • paper_authors: Adrian Wilkins-Caruana, Madhushi Bandara, Katarzyna Musial, Daniel Catchpoole, Paul J. Kennedy
  • for: This paper aims to infer actual treatment steps for a particular patient group from administrative health records (AHRs), addressing gaps in treatment pathway-inference research.
  • methods: The method introduced in this paper is called Defrag, which learns the semantic and temporal meaning of healthcare event sequences using a neural network (NN) and a self-supervised learning objective.
  • results: Defrag significantly outperforms several existing pathway-inference methods and is effective in identifying best-practice pathway fragments for breast cancer, lung cancer, and melanoma in public healthcare records.
    Abstract Treatment pathways are step-by-step plans outlining the recommended medical care for specific diseases; they get revised when different treatments are found to improve patient outcomes. Examining health records is an important part of this revision process, but inferring patients' actual treatments from health data is challenging due to complex event-coding schemes and the absence of pathway-related annotations. This study aims to infer the actual treatment steps for a particular patient group from administrative health records (AHR) - a common form of tabular healthcare data - and address several technique- and methodology-based gaps in treatment pathway-inference research. We introduce Defrag, a method for examining AHRs to infer the real-world treatment steps for a particular patient group. Defrag learns the semantic and temporal meaning of healthcare event sequences, allowing it to reliably infer treatment steps from complex healthcare data. To our knowledge, Defrag is the first pathway-inference method to utilise a neural network (NN), an approach made possible by a novel, self-supervised learning objective. We also developed a testing and validation framework for pathway inference, which we use to characterise and evaluate Defrag's pathway inference ability and compare against baselines. We demonstrate Defrag's effectiveness by identifying best-practice pathway fragments for breast cancer, lung cancer, and melanoma in public healthcare records. Additionally, we use synthetic data experiments to demonstrate the characteristics of the Defrag method, and to compare Defrag to several baselines where it significantly outperforms non-NN-based methods. Defrag significantly outperforms several existing pathway-inference methods and offers an innovative and effective approach for inferring treatment pathways from AHRs. Open-source code is provided to encourage further research in this area.
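Defrag's self-supervised objective is novel and not detailed in the abstract. As a generic stand-in, the sketch below trains a GRU encoder over coded healthcare events with plain next-event prediction; the architecture, vocabulary size, and objective are all assumptions for illustration:

```python
import torch
import torch.nn as nn

class EventSequenceEncoder(nn.Module):
    """GRU encoder over coded healthcare events, trained self-supervised
    by predicting the next event code (a stand-in for Defrag's objective)."""
    def __init__(self, n_codes: int, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(n_codes, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, n_codes)

    def forward(self, codes):                 # codes: (batch, seq_len)
        h, _ = self.rnn(self.embed(codes))
        return self.head(h)                   # next-event logits at each step

model = EventSequenceEncoder(n_codes=500)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

batch = torch.randint(0, 500, (32, 40))       # toy batch of event-code sequences
logits = model(batch[:, :-1])                  # predict each following event
loss = loss_fn(logits.reshape(-1, 500), batch[:, 1:].reshape(-1))
opt.zero_grad()
loss.backward()
opt.step()
```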

On the Planning, Search, and Memorization Capabilities of Large Language Models

  • paper_url: http://arxiv.org/abs/2309.01868
  • repo_url: None
  • paper_authors: Yunhao Yang, Anshul Tomar
  • for: Investigates the potential of the state-of-the-art large language model GPT-4 for planning tasks through an extensive examination across multiple planning subfields.
  • methods: Empirical analysis of GPT-4 on planning domain extraction, graph search path planning, and adversarial planning.
  • results: GPT-4 shows clear strengths on planning problems but also constraints that limit its applicability; the paper proposes fine-tuning a domain-specific language model to improve its Chain of Thought (CoT) capabilities on these tasks.
    Abstract The rapid advancement of large language models, such as the Generative Pre-trained Transformer (GPT) series, has had significant implications across various disciplines. In this study, we investigate the potential of the state-of-the-art large language model (GPT-4) for planning tasks. We explore its effectiveness in multiple planning subfields, highlighting both its strengths and limitations. Through a comprehensive examination, we identify areas where large language models excel in solving planning problems and reveal the constraints that limit their applicability. Our empirical analysis focuses on GPT-4's performance in planning domain extraction, graph search path planning, and adversarial planning. We then propose a way of fine-tuning a domain-specific large language model to improve its Chain of Thought (CoT) capabilities for the above-mentioned tasks. The results provide valuable insights into the potential applications of large language models in the planning domain and pave the way for future research to overcome their limitations and expand their capabilities.
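Graph search path planning, one of the evaluated subtasks, has cheap programmatic ground truth, so LLM-proposed plans can be verified automatically. A minimal BFS-based checker of the kind such an evaluation might use (illustrative only; not the authors' harness):

```python
from collections import deque

def shortest_path(graph: dict, start, goal):
    """BFS shortest path on an unweighted graph given as adjacency lists."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:                      # reconstruct path back to start
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nbr in graph.get(node, []):
            if nbr not in parents:
                parents[nbr] = node
                queue.append(nbr)
    return None

def plan_is_optimal(graph, start, goal, llm_path) -> bool:
    """Check an LLM-proposed path: valid edges, right endpoints, minimal length."""
    ok_edges = all(b in graph.get(a, []) for a, b in zip(llm_path, llm_path[1:]))
    ref = shortest_path(graph, start, goal)
    return bool(ok_edges and llm_path and llm_path[0] == start
                and llm_path[-1] == goal and ref is not None
                and len(llm_path) == len(ref))
```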

Efficient Query-Based Attack against ML-Based Android Malware Detection under Zero Knowledge Setting

  • paper_url: http://arxiv.org/abs/2309.01866
  • repo_url: None
  • paper_authors: Ping He, Yifan Xia, Xuhong Zhang, Shouling Ji
  • for: Proposes an efficient query-based attack framework against machine-learning-based Android malware detection (AMD) methods.
  • methods: A query-based attack, AdvDroidZero, that operates under the zero-knowledge setting and therefore remains applicable in realistic scenarios.
  • results: Extensive evaluation shows strong effectiveness against various mainstream ML-based AMD methods, including state-of-the-art methods and real-world antivirus solutions.
    Abstract The widespread adoption of the Android operating system has made malicious Android applications an appealing target for attackers. Machine learning-based (ML-based) Android malware detection (AMD) methods are crucial in addressing this problem; however, their vulnerability to adversarial examples raises concerns. Current attacks against ML-based AMD methods demonstrate remarkable performance but rely on strong assumptions that may not be realistic in real-world scenarios, e.g., the knowledge requirements about feature space, model parameters, and training dataset. To address this limitation, we introduce AdvDroidZero, an efficient query-based attack framework against ML-based AMD methods that operates under the zero knowledge setting. Our extensive evaluation shows that AdvDroidZero is effective against various mainstream ML-based AMD methods, in particular, state-of-the-art such methods and real-world antivirus solutions.
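In the zero-knowledge setting the attacker observes only the detector's verdict. A textbook hard-label random-search evasion loop over a binary feature vector illustrates the general idea; this is not AdvDroidZero's method, and `detector` / `is_functional` are hypothetical callbacks:

```python
import numpy as np

def query_attack(x, detector, is_functional, max_queries=500, seed=0):
    """Hard-label random-search evasion: apply random functionality-preserving
    feature flips and query the detector until it reports benign (0).
    detector(v) -> 0/1; is_functional(v) -> bool. Sketch only."""
    rng = np.random.default_rng(seed)
    adv = x.copy()
    for n_queries in range(1, max_queries + 1):
        if detector(adv) == 0:             # one query per iteration; evaded
            return adv, n_queries
        i = rng.integers(len(adv))
        candidate = adv.copy()
        candidate[i] ^= 1                  # flip one binary feature
        if is_functional(candidate):       # keep only semantics-preserving edits
            adv = candidate
    return None, max_queries               # attack budget exhausted
```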

BigFUSE: Global Context-Aware Image Fusion in Dual-View Light-Sheet Fluorescence Microscopy with Image Formation Prior

  • paper_url: http://arxiv.org/abs/2309.01865
  • repo_url: None
  • paper_authors: Yu Liu, Gesine Muller, Nassir Navab, Carsten Marr, Jan Huisken, Tingying Peng
  • for: Improving LSFM image quality by addressing the defocus caused by light scattering when photons propagate through thick specimens.
  • methods: Dual-view image fusion that determines in-focus pixels from local image quality in the two views while accounting for the global impact of photon propagation; fusion is cast as estimating a focus-defocus boundary via Bayes' theorem and solved with an expectation-maximization algorithm.
  • results: BigFUSE, a global context-aware image fuser, stabilizes image fusion in LSFM and excludes structured artifacts, as shown by competitive experimental results.
    Abstract Light-sheet fluorescence microscopy (LSFM), a planar illumination technique that enables high-resolution imaging of samples, experiences defocused image quality caused by light scattering when photons propagate through thick tissues. To circumvent this issue, dualview imaging is helpful. It allows various sections of the specimen to be scanned ideally by viewing the sample from opposing orientations. Recent image fusion approaches can then be applied to determine in-focus pixels by comparing image qualities of two views locally and thus yield spatially inconsistent focus measures due to their limited field-of-view. Here, we propose BigFUSE, a global context-aware image fuser that stabilizes image fusion in LSFM by considering the global impact of photon propagation in the specimen while determining focus-defocus based on local image qualities. Inspired by the image formation prior in dual-view LSFM, image fusion is considered as estimating a focus-defocus boundary using Bayes Theorem, where (i) the effect of light scattering onto focus measures is included within Likelihood; and (ii) the spatial consistency regarding focus-defocus is imposed in Prior. The expectation-maximum algorithm is then adopted to estimate the focus-defocus boundary. Competitive experimental results show that BigFUSE is the first dual-view LSFM fuser that is able to exclude structured artifacts when fusing information, highlighting its abilities of automatic image fusion.
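Dual-view fusion of this kind can be caricatured in a few lines: compute a local focus measure per view, smooth the resulting weights to impose spatial consistency (a crude stand-in for the paper's Bayesian prior and EM-estimated focus-defocus boundary), and blend. Purely illustrative; not the BigFUSE algorithm:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, laplace, uniform_filter

def fuse_dual_view(view_a: np.ndarray, view_b: np.ndarray, win: int = 15) -> np.ndarray:
    """Fuse two registered LSFM views of the same plane.
    Focus measure: local energy of the Laplacian; per-pixel weights are
    smoothed so the focus-defocus transition stays spatially consistent."""
    def focus(img):
        return uniform_filter(laplace(img.astype(np.float64)) ** 2, size=win)
    fa, fb = focus(view_a), focus(view_b)
    w = fa / (fa + fb + 1e-12)             # per-pixel weight toward view A
    w = gaussian_filter(w, sigma=win)      # enforce a smooth fusion boundary
    return w * view_a + (1.0 - w) * view_b
```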