cs.LG - 2023-08-12

CoverNav: Cover Following Navigation Planning in Unstructured Outdoor Environment with Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.06594
  • repo_url: None
  • paper_authors: Jumman Hossain, Abu-Zaher Faridee, Nirmalya Roy, Anjan Basak, Derrik E. Asher
  • for: This paper proposes a Deep Reinforcement Learning (DRL) based algorithm for navigating off-road environments while remaining hidden from outside observers and safely reaching a predefined destination when observers are present.
  • methods: The algorithm computes a local cost map, built from an elevation map generated from 3D point cloud data, the robot's pose, and directed goal information, which helps it select covert, low-cost paths and natural shelters.
  • results: CoverNav is evaluated in a Unity simulation environment, where it maintains dynamically feasible velocities in the terrain and achieves a maximum goal distance of 12 meters with competitive success rates across different elevation scenarios, with and without cover objects.
    Abstract Autonomous navigation in offroad environments has been extensively studied in the robotics field. However, navigation in covert situations where an autonomous vehicle needs to remain hidden from outside observers remains an underexplored area. In this paper, we propose a novel Deep Reinforcement Learning (DRL) based algorithm, called CoverNav, for identifying covert and navigable trajectories with minimal cost in offroad terrains and jungle environments in the presence of observers. CoverNav focuses on unmanned ground vehicles seeking shelters and taking covers while safely navigating to a predefined destination. Our proposed DRL method computes a local cost map that helps distinguish which path will grant the maximal covertness while maintaining a low cost trajectory using an elevation map generated from 3D point cloud data, the robot's pose, and directed goal information. CoverNav helps robot agents to learn the low elevation terrain using a reward function while penalizing it proportionately when it experiences high elevation. If an observer is spotted, CoverNav enables the robot to select natural obstacles (e.g., rocks, houses, disabled vehicles, trees, etc.) and use them as shelters to hide behind. We evaluate CoverNav using the Unity simulation environment and show that it guarantees dynamically feasible velocities in the terrain when fed with an elevation map generated by another DRL based navigation algorithm. Additionally, we evaluate CoverNav's effectiveness in achieving a maximum goal distance of 12 meters and its success rate in different elevation scenarios with and without cover objects. We observe competitive performance comparable to state of the art (SOTA) methods without compromising accuracy.
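A rough sketch of the kind of elevation- and cover-aware reward described above is given below in Python; the specific terms, weights, and function signature are illustrative assumptions, not the paper's implementation:

```python
def covert_nav_reward(elevation, goal_dist, prev_goal_dist,
                      observer_visible, in_cover,
                      w_elev=1.0, w_goal=2.0, w_cover=5.0):
    """Illustrative reward: progress toward the goal, a penalty proportional
    to local elevation, and a cover bonus/penalty when an observer is in view.
    All weights are placeholders, not CoverNav's tuned values."""
    r = w_goal * (prev_goal_dist - goal_dist)    # reward progress toward the goal
    r -= w_elev * max(elevation, 0.0)            # penalize high-elevation terrain proportionately
    if observer_visible:
        r += w_cover if in_cover else -w_cover   # encourage hiding behind natural obstacles
    return r
```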

Value-Distributional Model-Based Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.06590
  • repo_url: https://github.com/djdprogramming/adfa2
  • paper_authors: Carlos E. Luis, Alessandro G. Bottero, Julia Vinogradska, Felix Berkenkamp, Jan Peters
  • for: This work addresses uncertainty quantification in sequential decision-making from a model-based Bayesian reinforcement learning perspective, where the goal is to learn the posterior distribution over value functions induced by epistemic (parameter) uncertainty of the Markov decision process.
  • methods: Inspired by distributional reinforcement learning, the authors introduce a Bellman operator whose fixed point is the value distribution function, and build on it Epistemic Quantile-Regression (EQR), a model-based algorithm that learns this value distribution for policy optimization.
  • results: Evaluation across several continuous-control tasks shows performance benefits over established model-based and model-free algorithms.
    Abstract Quantifying uncertainty about a policy's long-term performance is important to solve sequential decision-making tasks. We study the problem from a model-based Bayesian reinforcement learning perspective, where the goal is to learn the posterior distribution over value functions induced by parameter (epistemic) uncertainty of the Markov decision process. Previous work restricts the analysis to a few moments of the distribution over values or imposes a particular distribution shape, e.g., Gaussians. Inspired by distributional reinforcement learning, we introduce a Bellman operator whose fixed-point is the value distribution function. Based on our theory, we propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function that can be used for policy optimization. Evaluation across several continuous-control tasks shows performance benefits with respect to established model-based and model-free algorithms.
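To make the quantile-regression ingredient concrete, here is a minimal PyTorch sketch of the standard pinball loss used to fit a set of value quantiles; the quantile levels, shapes, and random targets are placeholders and do not reproduce EQR's model-based Bellman targets:

```python
import torch

def pinball_loss(pred_quantiles, target, taus):
    """Quantile-regression (pinball) loss. pred_quantiles: (batch, n_quantiles)
    predicted value quantiles; target: (batch, 1) sampled value targets;
    taus: (n_quantiles,) quantile levels in (0, 1)."""
    diff = target - pred_quantiles                        # (batch, n_quantiles)
    loss = torch.maximum(taus * diff, (taus - 1.0) * diff)
    return loss.mean()

taus = torch.linspace(0.05, 0.95, 19)
pred = torch.randn(32, 19)     # stand-in predicted quantiles
target = torch.randn(32, 1)    # stand-in value targets
print(pinball_loss(pred, target, taus))
```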

Approximate Answering of Graph Queries

  • paper_url: http://arxiv.org/abs/2308.06585
  • repo_url: None
  • paper_authors: Michael Cochez, Dimitrios Alivanistos, Erik Arakelyan, Max Berrendorf, Daniel Daza, Mikhail Galkin, Pasquale Minervini, Mathias Niepert, Hongyu Ren
  • for: Answering queries in an incomplete knowledge graph (KG) setting.
  • methods: Several methods have been proposed to answer queries in an incomplete KG setting, including approaches based on semantic search, knowledge graph completion, and embedding-based methods.
  • results: These methods have been shown to be effective in answering queries in an incomplete KG setting, but they have limitations in terms of expressiveness, supported graph types, and inference capabilities.
    Abstract Knowledge graphs (KGs) are inherently incomplete because of incomplete world knowledge and bias in what is the input to the KG. Additionally, world knowledge constantly expands and evolves, making existing facts deprecated or introducing new ones. However, we would still want to be able to answer queries as if the graph were complete. In this chapter, we will give an overview of several methods which have been proposed to answer queries in such a setting. We will first provide an overview of the different query types which can be supported by these methods and datasets typically used for evaluation, as well as an insight into their limitations. Then, we give an overview of the different approaches and describe them in terms of expressiveness, supported graph types, and inference capabilities.

A new solution and concrete implementation steps for Artificial General Intelligence

  • paper_url: http://arxiv.org/abs/2308.09721
  • repo_url: None
  • paper_authors: Yongcong Chen, Ting Zeng, Jun Zhang
  • for: The goal of this paper is to address the defects of existing techniques in order to achieve Artificial General Intelligence (AGI) that can be applied to any field.
  • methods: The paper analyzes the limitations of the large-model technical route ("attention mechanism + deep learning" + "reinforcement learning") and proposes solutions that build on existing technologies while addressing their defects.
  • results: The paper presents solutions to the inherent defects of large models and outlines concrete steps toward achieving AGI.
    Abstract At present, the mainstream artificial intelligence generally adopts the technical path of "attention mechanism + deep learning" + "reinforcement learning". It has made great progress in the field of AIGC (Artificial Intelligence Generated Content), setting off the technical wave of big models[ 2][13 ]. But in areas that need to interact with the actual environment, such as elderly care, home nanny, agricultural production, and vehicle driving, trial and error are expensive and a reinforcement learning process that requires much trial and error is difficult to achieve. Therefore, in order to achieve Artificial General Intelligence(AGI) that can be applied to any field, we need to use both existing technologies and solve the defects of existing technologies, so as to further develop the technological wave of artificial intelligence. In this paper, we analyze the limitations of the technical route of large models, and by addressing these limitations, we propose solutions, thus solving the inherent defects of large models. In this paper, we will reveal how to achieve true AGI step by step.

EquiDiff: A Conditional Equivariant Diffusion Model For Trajectory Prediction

  • paper_url: http://arxiv.org/abs/2308.06564
  • repo_url: None
  • paper_authors: Kehua Chen, Xianda Chen, Zihan Yu, Meixin Zhu, Hai Yang
  • for: Predicting the future trajectories of vehicles to support safe and efficient operation of autonomous driving.
  • methods: EquiDiff is a conditional diffusion model that generates future trajectories from historical information and random Gaussian noise; its backbone is an SO(2)-equivariant transformer that exploits the geometric properties of location coordinates, with Recurrent Neural Networks and Graph Attention Networks used to extract social interactions from historical trajectories.
  • results: On the NGSIM dataset, EquiDiff outperforms baseline models in short-term prediction but shows slightly higher errors for long-term prediction; an ablation study quantifies each component's contribution, and a visualization of the diffusion generation process illustrates the uncertainty of the predictions.
    Abstract Accurate trajectory prediction is crucial for the safe and efficient operation of autonomous vehicles. The growing popularity of deep learning has led to the development of numerous methods for trajectory prediction. While deterministic deep learning models have been widely used, deep generative models have gained popularity as they learn data distributions from training data and account for trajectory uncertainties. In this study, we propose EquiDiff, a deep generative model for predicting future vehicle trajectories. EquiDiff is based on the conditional diffusion model, which generates future trajectories by incorporating historical information and random Gaussian noise. The backbone model of EquiDiff is an SO(2)-equivariant transformer that fully utilizes the geometric properties of location coordinates. In addition, we employ Recurrent Neural Networks and Graph Attention Networks to extract social interactions from historical trajectories. To evaluate the performance of EquiDiff, we conduct extensive experiments on the NGSIM dataset. Our results demonstrate that EquiDiff outperforms other baseline models in short-term prediction, but has slightly higher errors for long-term prediction. Furthermore, we conduct an ablation study to investigate the contribution of each component of EquiDiff to the prediction accuracy. Additionally, we present a visualization of the generation process of our diffusion model, providing insights into the uncertainty of the prediction.
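For background on the conditional diffusion component, the sketch below shows a standard DDPM-style forward noising step applied to trajectory tensors (PyTorch); the noise schedule, tensor shapes, and function name are assumptions for illustration, not EquiDiff's exact configuration:

```python
import torch

def forward_diffuse(x0, t, alphas_cumprod):
    """Forward noising q(x_t | x_0): corrupt a clean trajectory x0 with
    Gaussian noise at step t. The denoiser is trained to predict `noise`."""
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, 1, 1)                     # (batch, 1, 1)
    xt = torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * noise
    return xt, noise

betas = torch.linspace(1e-4, 0.02, 1000)                         # illustrative schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
x0 = torch.randn(8, 50, 2)                                       # 8 trajectories, 50 steps, (x, y)
t = torch.randint(0, 1000, (8,))
xt, eps = forward_diffuse(x0, t, alphas_cumprod)
```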

Human Behavior-based Personalized Meal Recommendation and Menu Planning Social System

  • paper_url: http://arxiv.org/abs/2308.06549
  • repo_url: None
  • paper_authors: Tanvir Islam, Anika Rahman Joyita, Md. Golam Rabiul Alam, Mohammad Mehedi Hassan, Md. Rafiul Hassan, Raffaele Gravina
  • for: This study aims to provide affective-computing-based meal recommendation and menu planning that accounts for users' varying emotional responses to food.
  • methods: User meal preferences are gathered via questionnaires and preference awareness, and affect toward different foods is detected from electroencephalography (EEG) signals recorded with a 14-channel wireless Emotive Epoc+ headset; a hierarchical ensemble predicts affectivity from multiple feature-extraction methods, TOPSIS generates a food list from the predicted affectivity, and a bin-packing algorithm plans personalized menus.
  • results: Experimental findings show that the proposed affective computing, meal recommendation, and menu planning algorithms perform well across a variety of assessment parameters.
    Abstract The traditional dietary recommendation systems are basically nutrition or health-aware where the human feelings on food are ignored. Human affects vary when it comes to food cravings, and not all foods are appealing in all moods. A questionnaire-based and preference-aware meal recommendation system can be a solution. However, automated recognition of social affects on different foods and planning the menu considering nutritional demand and social-affect has some significant benefits of the questionnaire-based and preference-aware meal recommendations. A patient with severe illness, a person in a coma, or patients with locked-in syndrome and amyotrophic lateral sclerosis (ALS) cannot express their meal preferences. Therefore, the proposed framework includes a social-affective computing module to recognize the affects of different meals where the person's affect is detected using electroencephalography signals. EEG allows to capture the brain signals and analyze them to anticipate affective toward a food. In this study, we have used a 14-channel wireless Emotive Epoc+ to measure affectivity for different food items. A hierarchical ensemble method is applied to predict affectivity upon multiple feature extraction methods and TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution) is used to generate a food list based on the predicted affectivity. In addition to the meal recommendation, an automated menu planning approach is also proposed considering a person's energy intake requirement, affectivity, and nutritional values of the different menus. The bin-packing algorithm is used for the personalized menu planning of breakfast, lunch, dinner, and snacks. The experimental findings reveal that the suggested affective computing, meal recommendation, and menu planning algorithms perform well across a variety of assessment parameters.
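Since TOPSIS is the ranking step named above, here is a minimal NumPy sketch of generic TOPSIS ranking; the example criteria, weights, and scores are made up for illustration and are not the paper's:

```python
import numpy as np

def topsis_rank(scores, weights, benefit):
    """Minimal TOPSIS: rank alternatives (rows) over criteria (columns).
    benefit[j] is True if higher is better for criterion j."""
    norm = scores / np.linalg.norm(scores, axis=0)       # vector-normalize each column
    v = norm * weights                                    # weighted normalized matrix
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))
    d_pos = np.linalg.norm(v - ideal, axis=1)
    d_neg = np.linalg.norm(v - anti, axis=1)
    closeness = d_neg / (d_pos + d_neg)                   # higher = better
    return np.argsort(-closeness), closeness

# Four candidate meals scored on (predicted affectivity, calories) -- invented numbers.
scores = np.array([[0.9, 650.], [0.7, 400.], [0.4, 300.], [0.8, 900.]])
order, c = topsis_rank(scores, weights=np.array([0.7, 0.3]),
                       benefit=np.array([True, False]))
print(order)
```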

Digital elevation model correction in urban areas using extreme gradient boosting, land cover and terrain parameters

  • paper_url: http://arxiv.org/abs/2308.06545
  • repo_url: None
  • paper_authors: Chukwuma Okolie, Jon Mills, Adedayo Adeleke, Julian Smit
  • for: This paper aims to enhance the accuracy of medium-resolution digital elevation models (DEMs) in urban areas using the extreme gradient boosting (XGBoost) ensemble algorithm.
  • methods: The XGBoost algorithm was applied to two medium-resolution DEMs over Cape Town, South Africa, using eleven predictor variables, including elevation, urban footprints, and terrain features.
  • results: The correction achieved significant accuracy gains: the root mean square error (RMSE) of the Copernicus DEM improved by 46-53% and that of the AW3D DEM by 72-73%, results competitive with other proposed methods. These gains demonstrate the potential of gradient-boosted trees for enhancing the quality of DEMs and improving hydrological modelling in urban catchments.
    Abstract The accuracy of digital elevation models (DEMs) in urban areas is influenced by numerous factors including land cover and terrain irregularities. Moreover, building artifacts in global DEMs cause artificial blocking of surface flow pathways. This compromises their quality and adequacy for hydrological and environmental modelling in urban landscapes where precise and accurate terrain information is needed. In this study, the extreme gradient boosting (XGBoost) ensemble algorithm is adopted for enhancing the accuracy of two medium-resolution 30m DEMs over Cape Town, South Africa: Copernicus GLO-30 and ALOS World 3D (AW3D). XGBoost is a scalable, portable and versatile gradient boosting library that can solve many environmental modelling problems. The training datasets are comprised of eleven predictor variables including elevation, urban footprints, slope, aspect, surface roughness, topographic position index, terrain ruggedness index, terrain surface texture, vector roughness measure, forest cover and bare ground cover. The target variable (elevation error) was calculated with respect to highly accurate airborne LiDAR. After training and testing, the model was applied for correcting the DEMs at two implementation sites. The correction achieved significant accuracy gains which are competitive with other proposed methods. The root mean square error (RMSE) of Copernicus DEM improved by 46 to 53% while the RMSE of AW3D DEM improved by 72 to 73%. These results showcase the potential of gradient boosted trees for enhancing the quality of DEMs, and for improved hydrological modelling in urban catchments.
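A minimal sketch of the regression setup described above, using the xgboost library's scikit-learn interface; the feature layout, hyperparameters, and stand-in data are assumptions, not the study's configuration:

```python
import numpy as np
import xgboost as xgb

# Hypothetical feature matrix: one row per DEM cell, columns such as elevation,
# slope, aspect, roughness, urban footprint, land cover, etc. (11 predictors in the paper).
X = np.random.rand(1000, 11)
y = np.random.randn(1000)            # elevation error vs. the LiDAR reference (stand-in data)

model = xgb.XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.05)
model.fit(X, y)

# Corrected DEM = original elevation minus the predicted error
# (assuming column 0 holds the DEM elevation).
predicted_error = model.predict(X)
corrected_elevation = X[:, 0] - predicted_error
```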

Dealing with Small Datasets for Deep Learning in Medical Imaging: An Evaluation of Self-Supervised Pre-Training on CT Scans Comparing Contrastive and Masked Autoencoder Methods for Convolutional Models

  • paper_url: http://arxiv.org/abs/2308.06534
  • repo_url: https://github.com/wolfda95/ssl-medicalimagining-cl-mae
  • paper_authors: Daniel Wolf, Tristan Payer, Catharina Silvia Lisson, Christoph Gerhard Lisson, Meinrad Beer, Timo Ropinski, Michael Götz
  • for: This work studies deep learning for medical imaging, where models can reduce radiologist workload, accelerate diagnosis, and minimize the risk of diagnostic errors, but annotated datasets are often small.
  • methods: Deep learning models are pre-trained with self-supervised learning on a large unannotated CT image dataset and then fine-tuned on small annotated datasets; state-of-the-art contrastive learning methods are compared with the recently introduced masked autoencoder approach SparK for convolutional neural networks.
  • results: SparK pre-training is more robust to shrinking fine-tuning dataset sizes than the contrastive methods, so the authors recommend SparK pre-training for medical imaging tasks with only small annotated datasets.
    Abstract Deep learning in medical imaging has the potential to minimize the risk of diagnostic errors, reduce radiologist workload, and accelerate diagnosis. Training such deep learning models requires large and accurate datasets, with annotations for all training samples. However, in the medical imaging domain, annotated datasets for specific tasks are often small due to the high complexity of annotations, limited access, or the rarity of diseases. To address this challenge, deep learning models can be pre-trained on large image datasets without annotations using methods from the field of self-supervised learning. After pre-training, small annotated datasets are sufficient to fine-tune the models for a specific task. The most popular self-supervised pre-training approaches in medical imaging are based on contrastive learning. However, recent studies in natural image processing indicate a strong potential for masked autoencoder approaches. Our work compares state-of-the-art contrastive learning methods with the recently introduced masked autoencoder approach "SparK" for convolutional neural networks (CNNs) on medical images. Therefore we pre-train on a large unannotated CT image dataset and fine-tune on several CT classification tasks. Due to the challenge of obtaining sufficient annotated training data in medical imaging, it is of particular interest to evaluate how the self-supervised pre-training methods perform when fine-tuning on small datasets. By experimenting with gradually reducing the training dataset size for fine-tuning, we find that the reduction has different effects depending on the type of pre-training chosen. The SparK pre-training method is more robust to the training dataset size than the contrastive methods. Based on our results, we propose the SparK pre-training for medical imaging tasks with only small annotated datasets.
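As a reference point for the contrastive side of the comparison, here is a simplified InfoNCE (SimCLR-style) objective in PyTorch between two augmented views; this is a generic contrastive loss, not the paper's exact objective nor the SparK masked-autoencoder method it is compared against:

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Simplified InfoNCE between embeddings of two augmented views of the
    same CT slices; positives sit on the diagonal of the similarity matrix."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature          # (batch, batch) cosine similarities
    labels = torch.arange(z1.size(0))           # each sample's positive is its own index
    return F.cross_entropy(logits, labels)

z1 = torch.randn(16, 128)                       # embeddings of view 1 (stand-in)
z2 = torch.randn(16, 128)                       # embeddings of view 2 (stand-in)
print(info_nce(z1, z2))
```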

Learning Abstract Visual Reasoning via Task Decomposition: A Case Study in Raven Progressive Matrices

  • paper_url: http://arxiv.org/abs/2308.06528
  • repo_url: https://github.com/jakubkwiatkowski/abstract_compositional_transformer
  • paper_authors: Jakub Kwiatkowski, Krzysztof Krawiec
  • for: This work aims to improve abstract visual reasoning on Raven Progressive Matrices (RPM) by predicting the visual properties of individual objects and their arrangements rather than directly choosing among candidate answers.
  • methods: A deep learning architecture based on the transformer blueprint predicts object-level visual properties and arrangements, and the study examines several ways of parsing the visual input into tokens and several regimes of masking parts of the input during self-supervised training.
  • results: Experiments show that the approach not only outperforms state-of-the-art methods but also provides interesting insights and partial explanations about the inference, and its design makes it immune to biases known to exist in some RPM benchmarks.
    Abstract One of the challenges in learning to perform abstract reasoning is that problems are often posed as monolithic tasks, with no intermediate subgoals. In Raven Progressive Matrices (RPM), the task is to choose one of the available answers given a context, where both contexts and answers are composite images featuring multiple objects in various spatial arrangements. As this high-level goal is the only guidance available, learning is challenging and most contemporary solvers tend to be opaque. In this study, we propose a deep learning architecture based on the transformer blueprint which, rather than directly making the above choice, predicts the visual properties of individual objects and their arrangements. The multidimensional predictions obtained in this way are then directly juxtaposed to choose the answer. We consider a few ways in which the model parses the visual input into tokens and several regimes of masking parts of the input in self-supervised training. In experimental assessment, the models not only outperform state-of-the-art methods but also provide interesting insights and partial explanations about the inference. The design of the method also makes it immune to biases that are known to exist in some RPM benchmarks.

SLoRA: Federated Parameter Efficient Fine-Tuning of Language Models

  • paper_url: http://arxiv.org/abs/2308.06522
  • repo_url: None
  • paper_authors: Sara Babakniya, Ahmed Roushdy Elkordy, Yahya H. Ezzeldin, Qingfeng Liu, Kee-Bong Song, Mostafa El-Khamy, Salman Avestimehr
  • for: This work studies how to apply parameter-efficient fine-tuning (PEFT) methods to federated learning (FL) of language models, where communication, computation, and storage constraints on edge devices make efficient fine-tuning essential.
  • methods: The paper investigates PEFT under different FL settings and proposes SLoRA, which overcomes the key limitations of LoRA under highly heterogeneous client data through a novel data-driven initialization technique.
  • results: Experiments show that SLoRA achieves performance comparable to full fine-tuning with sparse updates of roughly 1% density while reducing training time by up to 90%.
    Abstract Transfer learning via fine-tuning pre-trained transformer models has gained significant success in delivering state-of-the-art results across various NLP tasks. In the absence of centralized data, Federated Learning (FL) can benefit from distributed and private data of the FL edge clients for fine-tuning. However, due to the limited communication, computation, and storage capabilities of edge devices and the huge sizes of popular transformer models, efficient fine-tuning is crucial to make federated training feasible. This work explores the opportunities and challenges associated with applying parameter efficient fine-tuning (PEFT) methods in different FL settings for language tasks. Specifically, our investigation reveals that as the data across users becomes more diverse, the gap between fully fine-tuning the model and employing PEFT methods widens. To bridge this performance gap, we propose a method called SLoRA, which overcomes the key limitations of LoRA in high heterogeneous data scenarios through a novel data-driven initialization technique. Our experimental results demonstrate that SLoRA achieves performance comparable to full fine-tuning, with significant sparse updates with approximately $\sim 1\%$ density while reducing training time by up to $90\%$.
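The LoRA adapter that SLoRA builds on can be sketched as follows in PyTorch; SLoRA's data-driven initialization of the low-rank factors is not shown, and the rank, scaling, and initialization here are generic defaults rather than the paper's:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B A x. Only A and B are trained (and, in FL,
    communicated), which is what makes the updates parameter-efficient."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                        # freeze the pre-trained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(4, 768))
```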

One-bit Flip is All You Need: When Bit-flip Attack Meets Model Training

  • paper_url: http://arxiv.org/abs/2308.07934
  • repo_url: https://github.com/jianshuod/tba
  • paper_authors: Jianshuo Dong, Han Qiu, Yiming Li, Tianwei Zhang, Yuanjie Li, Zeqi Lai, Chao Zhang, Shu-Tao Xia
  • for: This work studies the security of deep neural networks (DNNs) deployed on real-world devices against weight-modification (bit-flip) attacks.
  • methods: A training-assisted bit-flip attack: the adversary participates in the training stage to build a high-risk model that behaves normally, and later exploits memory fault injection techniques such as row hammer to flip weight bits of the quantized model at deployment time.
  • results: Flipping only one critical bit on average converts the high-risk but normal-behaving model into a malicious one on the victim's side, the attack can escape various detection methods, and it remains a significant threat even when defenses are employed.
    Abstract Deep neural networks (DNNs) are widely deployed on real-world devices. Concerns regarding their security have gained great attention from researchers. Recently, a new weight modification attack called bit flip attack (BFA) was proposed, which exploits memory fault inject techniques such as row hammer to attack quantized models in the deployment stage. With only a few bit flips, the target model can be rendered useless as a random guesser or even be implanted with malicious functionalities. In this work, we seek to further reduce the number of bit flips. We propose a training-assisted bit flip attack, in which the adversary is involved in the training stage to build a high-risk model to release. This high-risk model, obtained coupled with a corresponding malicious model, behaves normally and can escape various detection methods. The results on benchmark datasets show that an adversary can easily convert this high-risk but normal model to a malicious one on victim's side by \textbf{flipping only one critical bit} on average in the deployment stage. Moreover, our attack still poses a significant threat even when defenses are employed. The codes for reproducing main experiments are available at \url{https://github.com/jianshuod/TBA}.
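The core fault model, flipping a single bit of a quantized weight, can be illustrated with NumPy as below; how the attack searches for the one critical bit (and the training-assisted construction of the high-risk model) is not reproduced here:

```python
import numpy as np

def flip_bit(w_int8, bit_index):
    """Flip one bit of a signed int8 quantized weight (two's complement),
    mimicking a Rowhammer-style memory fault."""
    u = np.array(w_int8, dtype=np.int8).astype(np.uint8)  # view the raw 8-bit pattern
    u ^= np.uint8(1 << bit_index)                          # flip the chosen bit
    return u.astype(np.int8)                               # reinterpret as a signed weight

w = np.int8(45)                    # example quantized weight
print(w, flip_bit(w, 7))           # flipping the sign bit: 45 -> -83
```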

Performance Analysis for Resource Constrained Decentralized Federated Learning Over Wireless Networks

  • paper_url: http://arxiv.org/abs/2308.06496
  • repo_url: None
  • paper_authors: Zhigang Yan, Dong Li
  • for: This work analyzes and optimizes communication efficiency for resource-constrained decentralized federated learning (DFL) over wireless networks.
  • methods: Convergence bounds are derived for both digital and analog transmission schemes; for digital transmission, the allocation of resources between computation and communication and the minimum probability of correct communication required for convergence are analyzed, while for analog transmission the impact of channel fading and noise on model performance is characterized.
  • results: Numerical simulations with convolutional neural networks (CNNs) and Vision Transformers (ViT) on Fashion-MNIST and CIFAR-10 validate the analysis and show how performance can be improved by optimizing system parameters under different communication conditions.
    Abstract Federated learning (FL) can lead to significant communication overhead and reliance on a central server. To address these challenges, decentralized federated learning (DFL) has been proposed as a more resilient framework. DFL involves parameter exchange between devices through a wireless network. This study analyzes the performance of resource-constrained DFL using different communication schemes (digital and analog) over wireless networks to optimize communication efficiency. Specifically, we provide convergence bounds for both digital and analog transmission approaches, enabling analysis of the model performance trained on DFL. Furthermore, for digital transmission, we investigate and analyze resource allocation between computation and communication and convergence rates, obtaining its communication complexity and the minimum probability of correction communication required for convergence guarantee. For analog transmission, we discuss the impact of channel fading and noise on the model performance and the maximum errors accumulation with convergence guarantee over fading channels. Finally, we conduct numerical simulations to evaluate the performance and convergence rate of convolutional neural networks (CNNs) and Vision Transformer (ViT) trained in the DFL framework on fashion-MNIST and CIFAR-10 datasets. Our simulation results validate our analysis and discussion, revealing how to improve performance by optimizing system parameters under different communication conditions.
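A minimal sketch of the decentralized parameter exchange underlying DFL, one consensus (gossip) averaging step over a mixing matrix; the ring topology, weights, and shapes are illustrative assumptions, and the paper's wireless digital/analog transmission models and local SGD updates are omitted:

```python
import numpy as np

def decentralized_averaging_step(params, mixing_matrix):
    """One gossip step: each device replaces its parameters with a weighted
    average of its neighbors'. params: (n_devices, n_params); mixing_matrix:
    doubly stochastic, encoding the wireless connectivity."""
    return mixing_matrix @ params

# Ring of 4 devices, each averaging with its two neighbors.
W = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])
params = np.random.randn(4, 10)
params = decentralized_averaging_step(params, W)
```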

Flexible Keyword Spotting based on Homogeneous Audio-Text Embedding

  • paper_url: http://arxiv.org/abs/2308.06472
  • repo_url: None
  • paper_authors: Kumari Nishu, Minsik Cho, Paul Dixon, Devang Naik
  • for: This paper focuses on efficiently detecting arbitrary, user-defined keywords by jointly analyzing audio and text, using an audio-compliant text encoder that reduces the mismatch between text and audio embeddings.
  • methods: The proposed architecture converts text to phonemes with a grapheme-to-phoneme (G2P) model and then to an embedding using representative phoneme vectors extracted from the paired audio encoder on rich speech datasets; confusable keyword generation is further used to build an audio-text embedding verifier with strong discriminative power.
  • results: On the Libriphrase hard dataset, the method raises the Area Under the ROC Curve (AUC) from 84.21% to 92.7% and lowers the Equal-Error-Rate (EER) from 23.36% to 14.4%, outperforming the previous state of the art.
    Abstract Spotting user-defined/flexible keywords represented in text frequently uses an expensive text encoder for joint analysis with an audio encoder in an embedding space, which can suffer from heterogeneous modality representation (i.e., large mismatch) and increased complexity. In this work, we propose a novel architecture to efficiently detect arbitrary keywords based on an audio-compliant text encoder which inherently has homogeneous representation with audio embedding, and it is also much smaller than a compatible text encoder. Our text encoder converts the text to phonemes using a grapheme-to-phoneme (G2P) model, and then to an embedding using representative phoneme vectors, extracted from the paired audio encoder on rich speech datasets. We further augment our method with confusable keyword generation to develop an audio-text embedding verifier with strong discriminative power. Experimental results show that our scheme outperforms the state-of-the-art results on Libriphrase hard dataset, increasing Area Under the ROC Curve (AUC) metric from 84.21% to 92.7% and reducing Equal-Error-Rate (EER) metric from 23.36% to 14.4%.
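The idea of embedding text through phonemes can be sketched as below; the phoneme inventory, vector dimensionality, and lookup-table encoder are hypothetical stand-ins, since the paper uses a learned G2P model and representative phoneme vectors extracted from the paired audio encoder:

```python
import numpy as np

# Hypothetical table of representative phoneme vectors (in the paper these are
# extracted from the paired audio encoder); a real G2P model would produce the
# phoneme sequence from the keyword text.
phoneme_vectors = {p: np.random.randn(64) for p in ["HH", "EH", "L", "OW"]}

def text_embedding(phonemes):
    """Embed a keyword as the stacked vectors of its phonemes; the paper's
    encoder is a learned network, not a lookup."""
    return np.stack([phoneme_vectors[p] for p in phonemes])

emb = text_embedding(["HH", "EH", "L", "OW"])   # "hello" -> (4, 64)
print(emb.shape)
```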

Volterra Accentuated Non-Linear Dynamical Admittance (VANYA) to model Deforestation: An Exemplification from the Amazon Rainforest

  • paper_url: http://arxiv.org/abs/2308.06471
  • repo_url: None
  • paper_authors: Karthik R., Ramamoorthy A.
  • for: This work aims to forecast forest cover by modeling deforestation with prey-predator dynamics.
  • methods: The VANYA model incorporates prey-predator dynamics and is applied to Amazon Rainforest data to predict forest cover.
  • results: On the Amazon Rainforest data, VANYA performs well in forecasting forest cover compared with other forecasters such as Long Short-Term Memory, N-BEATS, and RCN.
    Abstract Intelligent automation supports us against cyclones, droughts, and seismic events with recent technology advancements. Algorithmic learning has advanced fields like neuroscience, genetics, and human-computer interaction. Time-series data boosts progress. Challenges persist in adopting these approaches in traditional fields. Neural networks face comprehension and bias issues. AI's expansion across scientific areas is due to adaptable descriptors and combinatorial argumentation. This article focuses on modeling Forest loss using the VANYA Model, incorporating Prey Predator Dynamics. VANYA predicts forest cover, demonstrated on Amazon Rainforest data against other forecasters like Long Short-Term Memory, N-BEATS, RCN.
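For intuition about the prey-predator component, here is a classical Lotka-Volterra system integrated with SciPy, where "prey" could loosely stand for forest cover and "predator" for the deforestation driver; this is not VANYA's actual formulation or parameterization:

```python
import numpy as np
from scipy.integrate import solve_ivp

def lotka_volterra(t, y, alpha, beta, delta, gamma):
    """Classical prey-predator ODEs; all coefficients below are illustrative."""
    prey, predator = y
    dprey = alpha * prey - beta * prey * predator
    dpred = delta * prey * predator - gamma * predator
    return [dprey, dpred]

sol = solve_ivp(lotka_volterra, (0, 50), [10.0, 2.0],
                args=(1.0, 0.1, 0.05, 0.8), dense_output=True)
t = np.linspace(0, 50, 200)
prey, predator = sol.sol(t)      # trajectories of the two populations
```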

Tiny and Efficient Model for the Edge Detection Generalization

  • paper_url: http://arxiv.org/abs/2308.06468
  • repo_url: https://github.com/xavysp/teed
  • paper_authors: Xavier Soria, Yachuan Li, Mohammad Rouhani, Angel D. Sappa
  • for: Improve edge-detection generalization while keeping the model simple and efficient, since current state-of-the-art models gain accuracy at the cost of complexity.
  • methods: Tiny and Efficient Edge Detector (TEED), a light convolutional neural network with only 58K parameters, less than 0.2% of the size of state-of-the-art models.
  • results: Training on the BIPED dataset takes less than 30 minutes (under 5 minutes per epoch), the model converges within the first few epochs, the predicted edge maps are crisp and of high quality, and a new dataset is proposed to test the generalization of edge detection.
    Abstract Most high-level computer vision tasks rely on low-level image operations as their initial processes. Operations such as edge detection, image enhancement, and super-resolution, provide the foundations for higher level image analysis. In this work we address the edge detection considering three main objectives: simplicity, efficiency, and generalization since current state-of-the-art (SOTA) edge detection models are increased in complexity for better accuracy. To achieve this, we present Tiny and Efficient Edge Detector (TEED), a light convolutional neural network with only $58K$ parameters, less than $0.2$% of the state-of-the-art models. Training on the BIPED dataset takes $less than 30 minutes$, with each epoch requiring $less than 5 minutes$. Our proposed model is easy to train and it quickly converges within very first few epochs, while the predicted edge-maps are crisp and of high quality. Additionally, we propose a new dataset to test the generalization of edge detection, which comprises samples from popular images used in edge detection and image segmentation. The source code is available in https://github.com/xavysp/TEED.
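To give a sense of the parameter regime TEED operates in, the sketch below defines a deliberately tiny convolutional edge detector in PyTorch and counts its parameters; the architecture is invented for illustration and is not TEED's:

```python
import torch
import torch.nn as nn

class TinyEdgeNet(nn.Module):
    """A deliberately small per-pixel edge detector, only to illustrate scale."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(32, 1, 1)          # per-pixel edge logits

    def forward(self, x):
        return torch.sigmoid(self.head(self.features(x)))

net = TinyEdgeNet()
print(sum(p.numel() for p in net.parameters()))   # about 14K parameters (TEED itself has 58K)
```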

Not So Robust After All: Evaluating the Robustness of Deep Neural Networks to Unseen Adversarial Attacks

  • paper_url: http://arxiv.org/abs/2308.06467
  • repo_url: None
  • paper_authors: Roman Garaev, Bader Rasheed, Adil Khan
  • for: This study challenges the efficacy and generalization of contemporary defense mechanisms against adversarial attacks on deep neural networks (DNNs).
  • methods: The authors test the hypothesis of Ilyas et al. that image features are either robust or non-robust by training DNNs on robust-feature datasets and attacking them, and they analyze the impact of $L_2$ and $L_{\infty}$ norm attacks on DNN representations using canonical correlation analysis, visualization, and distances to decision boundaries.
  • results: Training a DNN only on robust features does not universally confer adversarial robustness, and $L_2$ and $L_{\infty}$ attacks affect DNN representations in significantly different ways, suggesting that the danger posed by $L_{\infty}$ attacks has been underestimated by the research community.
    Abstract Deep neural networks (DNNs) have gained prominence in various applications, such as classification, recognition, and prediction, prompting increased scrutiny of their properties. A fundamental attribute of traditional DNNs is their vulnerability to modifications in input data, which has resulted in the investigation of adversarial attacks. These attacks manipulate the data in order to mislead a DNN. This study aims to challenge the efficacy and generalization of contemporary defense mechanisms against adversarial attacks. Specifically, we explore the hypothesis proposed by Ilyas et. al, which posits that DNN image features can be either robust or non-robust, with adversarial attacks targeting the latter. This hypothesis suggests that training a DNN on a dataset consisting solely of robust features should produce a model resistant to adversarial attacks. However, our experiments demonstrate that this is not universally true. To gain further insights into our findings, we analyze the impact of adversarial attack norms on DNN representations, focusing on samples subjected to $L_2$ and $L_{\infty}$ norm attacks. Further, we employ canonical correlation analysis, visualize the representations, and calculate the mean distance between these representations and various DNN decision boundaries. Our results reveal a significant difference between $L_2$ and $L_{\infty}$ norms, which could provide insights into the potential dangers posed by $L_{\infty}$ norm attacks, previously underestimated by the research community.
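The difference between the two attack budgets discussed above can be illustrated with a single untargeted gradient step under an $L_{\infty}$ or $L_2$ constraint (PyTorch); here `model` is any image classifier returning logits, inputs are assumed to be 4-D image batches in [0, 1], and a full PGD attack would iterate and project this step:

```python
import torch
import torch.nn.functional as F

def attack_step(model, x, y, eps, norm="linf"):
    """One untargeted gradient step under an L_inf or L_2 budget eps."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    if norm == "linf":
        delta = eps * grad.sign()                                    # FGSM-style L_inf step
    else:
        flat_norm = grad.view(grad.size(0), -1).norm(dim=1)          # per-sample L_2 norm
        delta = eps * grad / (flat_norm.view(-1, 1, 1, 1) + 1e-12)   # unit L_2 direction scaled by eps
    return (x + delta).clamp(0, 1).detach()
```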

A One-dimensional HEVC video steganalysis method using the Optimality of Predicted Motion Vectors

  • paper_url: http://arxiv.org/abs/2308.06464
  • repo_url: None
  • paper_authors: Jun Li, Minqing Zhang, Ke Niu, Yingnan Zhang, Xiaoyuan Yang
  • for: Improving the detection of motion vector (MV) domain video steganography in the HEVC standard.
  • methods: A one-dimensional steganalysis feature based on the optimality of predicted motion vectors (MVPs): the MVP of a prediction unit encoded with Advanced Motion Vector Prediction (AMVP) is locally optimal in cover video, message embedding via the MVP index or motion vector differences (MVD) can destroy this optimality, and the optimal rate of MVPs is therefore used as the feature.
  • results: In detection experiments on two general datasets with three popular steganography methods, compared against four state-of-the-art steganalysis methods, the optimal rate of MVPs is 100% for all cover videos and less than 100% for all stego videos, so cover and stego videos can be distinguished accurately with no model training and low computational complexity.
    Abstract Among steganalysis techniques, detection against motion vector (MV) domain-based video steganography in High Efficiency Video Coding (HEVC) standard remains a hot and challenging issue. For the purpose of improving the detection performance, this paper proposes a steganalysis feature based on the optimality of predicted MVs with a dimension of one. Firstly, we point out that the motion vector prediction (MVP) of the prediction unit (PU) encoded using the Advanced Motion Vector Prediction (AMVP) technique satisfies the local optimality in the cover video. Secondly, we analyze that in HEVC video, message embedding either using MVP index or motion vector differences (MVD) may destroy the above optimality of MVP. And then, we define the optimal rate of MVP in HEVC video as a steganalysis feature. Finally, we conduct steganalysis detection experiments on two general datasets for three popular steganography methods and compare the performance with four state-of-the-art steganalysis methods. The experimental results show that the proposed optimal rate of MVP for all cover videos is 100\%, while the optimal rate of MVP for all stego videos is less than 100\%. Therefore, the proposed steganography scheme can accurately distinguish between cover videos and stego videos, and it is efficiently applied to practical scenarios with no model training and low computational complexity.
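The one-dimensional feature itself, the fraction of prediction units whose MVP is locally optimal, could be computed along the lines of the sketch below; the per-PU data structure and cost fields are hypothetical, since extracting candidate motion-vector costs requires an HEVC decoder:

```python
def mvp_optimal_rate(pus):
    """Fraction of prediction units whose MVP achieves the lowest matching cost
    among its candidate motion vectors. Each item in `pus` is assumed to carry
    the MVP's cost and the costs of the competing candidates."""
    optimal = sum(1 for pu in pus if pu["mvp_cost"] <= min(pu["candidate_costs"]))
    return optimal / len(pus)

# Toy example: 100% for a "cover-like" list, lower once embedding perturbs costs.
cover_pus = [{"mvp_cost": 3.0, "candidate_costs": [3.5, 4.0]}] * 10
print(mvp_optimal_rate(cover_pus))
```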

Multi-Label Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2308.06453
  • repo_url: https://github.com/penghui-yang/l2d
  • paper_authors: Penghui Yang, Ming-Kun Xie, Chen-Chen Zong, Lei Feng, Gang Niu, Masashi Sugiyama, Sheng-Jun Huang
  • for: This work addresses knowledge distillation for multi-label learning, where conventional distillation methods are hard to apply because prediction probabilities do not sum to one and whole-example feature maps may ignore minor classes.
  • methods: The proposed method divides the multi-label learning problem into a set of binary classification problems to exploit the informative semantic knowledge in the logits, and enhances the distinctiveness of the learned feature representations by leveraging the structural information of label-wise embeddings.
  • results: Experiments on multiple benchmark datasets show that the method avoids knowledge counteraction among labels and achieves superior performance against diverse comparing methods.
    Abstract Existing knowledge distillation methods typically work by imparting the knowledge of output logits or intermediate feature maps from the teacher network to the student network, which is very successful in multi-class single-label learning. However, these methods can hardly be extended to the multi-label learning scenario, where each instance is associated with multiple semantic labels, because the prediction probabilities do not sum to one and feature maps of the whole example may ignore minor classes in such a scenario. In this paper, we propose a novel multi-label knowledge distillation method. On one hand, it exploits the informative semantic knowledge from the logits by dividing the multi-label learning problem into a set of binary classification problems; on the other hand, it enhances the distinctiveness of the learned feature representations by leveraging the structural information of label-wise embeddings. Experimental results on multiple benchmark datasets validate that the proposed method can avoid knowledge counteraction among labels, thus achieving superior performance against diverse comparing methods. Our code is available at: https://github.com/penghui-yang/L2D
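The decomposition into per-label binary problems can be illustrated with a one-vs-all distillation term in PyTorch; this is a generic sketch of the idea, not the paper's full objective (which also distills label-wise embedding structure):

```python
import torch
import torch.nn.functional as F

def multilabel_distill_loss(student_logits, teacher_logits, T=2.0):
    """Treat each label as an independent binary problem and distill the
    teacher's per-label probabilities with binary cross-entropy."""
    t_prob = torch.sigmoid(teacher_logits / T)        # soft per-label targets
    s_logit = student_logits / T
    return F.binary_cross_entropy_with_logits(s_logit, t_prob) * (T * T)

student = torch.randn(8, 20)       # 8 samples, 20 labels (stand-in logits)
teacher = torch.randn(8, 20)
print(multilabel_distill_loss(student, teacher))
```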

Latent Random Steps as Relaxations of Max-Cut, Min-Cut, and More

  • paper_url: http://arxiv.org/abs/2308.06448
  • repo_url: None
  • paper_authors: Sudhanshu Chanpuriya, Cameron Musco
  • for: This work proposes a probabilistic model based on non-negative matrix factorization that unifies graph clustering and simplification and can model arbitrary (homophilous or heterophilous) graph structure.
  • methods: The model factorizes the process of taking a random walk on the graph, permits an unconstrained parametrization, and is optimized with simple gradient descent.
  • results: By relaxing hard clustering to soft clustering, the algorithm turns potentially hard clustering problems into tractable ones; its capabilities are illustrated on a synthetic graph and on unsupervised bipartite and tripartite clustering of orthographic and phonological data.
    Abstract Algorithms for node clustering typically focus on finding homophilous structure in graphs. That is, they find sets of similar nodes with many edges within, rather than across, the clusters. However, graphs often also exhibit heterophilous structure, as exemplified by (nearly) bipartite and tripartite graphs, where most edges occur across the clusters. Grappling with such structure is typically left to the task of graph simplification. We present a probabilistic model based on non-negative matrix factorization which unifies clustering and simplification, and provides a framework for modeling arbitrary graph structure. Our model is based on factorizing the process of taking a random walk on the graph. It permits an unconstrained parametrization, allowing for optimization via simple gradient descent. By relaxing the hard clustering to a soft clustering, our algorithm relaxes potentially hard clustering problems to a tractable ones. We illustrate our algorithm's capabilities on a synthetic graph, as well as simple unsupervised learning tasks involving bipartite and tripartite clustering of orthographic and phonological data.
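A simplified NumPy sketch of factorizing a random-walk transition matrix with projected gradient descent is shown below to illustrate the soft-clustering relaxation; the parametrization, learning rate, and initialization are assumptions and differ from the paper's model:

```python
import numpy as np

def factorize_random_walk(A, k, steps=2000, lr=0.01):
    """Fit a rank-k nonnegative factorization P ~= S @ T of the random-walk
    transition matrix P = D^{-1} A, giving soft node-to-cluster (S) and
    cluster-to-node (T) factors. Step size and iteration count are arbitrary."""
    P = A / A.sum(axis=1, keepdims=True)
    n = A.shape[0]
    S = np.abs(np.random.randn(n, k)) * 0.1
    T = np.abs(np.random.randn(k, n)) * 0.1
    for _ in range(steps):
        R = S @ T - P                          # residual
        gS, gT = R @ T.T, S.T @ R              # gradients of 0.5 * ||S T - P||_F^2
        S = np.clip(S - lr * gS, 0, None)      # projected step keeps factors nonnegative
        T = np.clip(T - lr * gT, 0, None)
    return S, T
```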

A Sequential Meta-Transfer (SMT) Learning to Combat Complexities of Physics-Informed Neural Networks: Application to Composites Autoclave Processing

  • paper_url: http://arxiv.org/abs/2308.06447
  • repo_url: https://github.com/miladramzy/sequentialmetatransferpinns
  • paper_authors: Milad Ramezankhani, Abbas S. Milani
  • for: Accelerating the solution of nonlinear partial differential equations (PDEs) in science and engineering, particularly in strongly nonlinear systems with long temporal domains where conventional PINNs struggle.
  • methods: Physics-informed neural networks (PINNs) integrate physical laws into the training of neural networks; the proposed Sequential Meta-Transfer (SMT) framework decomposes the PDE's time domain into smaller segments, trains a meta-learner per segment to obtain an optimal initial state for rapid adaptation, and applies transfer learning across time intervals.
  • results: In a composites autoclave processing case study, SMT clearly enhances the adaptability of PINNs while reducing computational cost by a factor of 100.
    Abstract Physics-Informed Neural Networks (PINNs) have gained popularity in solving nonlinear partial differential equations (PDEs) via integrating physical laws into the training of neural networks, making them superior in many scientific and engineering applications. However, conventional PINNs still fall short in accurately approximating the solution of complex systems with strong nonlinearity, especially in long temporal domains. Besides, since PINNs are designed to approximate a specific realization of a given PDE system, they lack the necessary generalizability to efficiently adapt to new system configurations. This entails computationally expensive re-training from scratch for any new change in the system. To address these shortfalls, in this work a novel sequential meta-transfer (SMT) learning framework is proposed, offering a unified solution for both fast training and efficient adaptation of PINNs in highly nonlinear systems with long temporal domains. Specifically, the framework decomposes PDE's time domain into smaller time segments to create "easier" PDE problems for PINNs training. Then for each time interval, a meta-learner is assigned and trained to achieve an optimal initial state for rapid adaptation to a range of related tasks. Transfer learning principles are then leveraged across time intervals to further reduce the computational cost.Through a composites autoclave processing case study, it is shown that SMT is clearly able to enhance the adaptability of PINNs while significantly reducing computational cost, by a factor of 100.

Neural Latent Aligner: Cross-trial Alignment for Learning Representations of Complex, Naturalistic Neural Data

  • paper_url: http://arxiv.org/abs/2308.06443
  • repo_url: None
  • paper_authors: Cheol Jun Cho, Edward F. Chang, Gopala K. Anumanchipalli
  • for: The goal of this study is to understand the neural implementation of complex human behaviors by finding well-constrained, behaviorally relevant representations of complex, naturalistic neural data.
  • methods: The proposed unsupervised framework, Neural Latent Aligner (NLA), aligns representations across repeated trials to learn cross-trial consistent information, together with a novel, fully differentiable time warping model (TWM) that resolves the temporal misalignment of trials.
  • results: Applied to intracranial electrocorticography (ECoG) of natural speaking, the model learns better representations for decoding behaviors than baseline models, especially in lower-dimensional spaces; the TWM is validated by measuring behavioral coherence between aligned trials, and the visualized manifold reveals shared neural trajectories across trials.
    Abstract Understanding the neural implementation of complex human behaviors is one of the major goals in neuroscience. To this end, it is crucial to find a true representation of the neural data, which is challenging due to the high complexity of behaviors and the low signal-to-ratio (SNR) of the signals. Here, we propose a novel unsupervised learning framework, Neural Latent Aligner (NLA), to find well-constrained, behaviorally relevant neural representations of complex behaviors. The key idea is to align representations across repeated trials to learn cross-trial consistent information. Furthermore, we propose a novel, fully differentiable time warping model (TWM) to resolve the temporal misalignment of trials. When applied to intracranial electrocorticography (ECoG) of natural speaking, our model learns better representations for decoding behaviors than the baseline models, especially in lower dimensional space. The TWM is empirically validated by measuring behavioral coherence between aligned trials. The proposed framework learns more cross-trial consistent representations than the baselines, and when visualized, the manifold reveals shared neural trajectories across trials.

A Domain-adaptive Physics-informed Neural Network for Inverse Problems of Maxwell’s Equations in Heterogeneous Media

  • paper_url: http://arxiv.org/abs/2308.06436
  • repo_url: None
  • paper_authors: Shiyuan Piao, Hong Gu, Aina Wang, Pan Qin
  • for: Solving inverse problems of Maxwell's equations in heterogeneous media.
  • methods: A physics-informed neural network (PINN) with a domain-adaptive training strategy: a location parameter of the media interface decomposes the whole domain into sub-domains, and electromagnetic interface conditions are incorporated into the loss function to improve prediction performance near the interface.
  • results: The effectiveness of the proposed domain-adaptive PINN (da-PINN) is verified with two case studies.
    Abstract Maxwell's equations are a collection of coupled partial differential equations (PDEs) that, together with the Lorentz force law, constitute the basis of classical electromagnetism and electric circuits. Effectively solving Maxwell's equations is crucial in various fields, like electromagnetic scattering and antenna design optimization. Physics-informed neural networks (PINNs) have shown powerful ability in solving PDEs. However, PINNs still struggle to solve Maxwell's equations in heterogeneous media. To this end, we propose a domain-adaptive PINN (da-PINN) to solve inverse problems of Maxwell's equations in heterogeneous media. First, we propose a location parameter of media interface to decompose the whole domain into several sub-domains. Furthermore, the electromagnetic interface conditions are incorporated into a loss function to improve the prediction performance near the interface. Then, we propose a domain-adaptive training strategy for da-PINN. Finally, the effectiveness of da-PINN is verified with two case studies.

Learn Single-horizon Disease Evolution for Predictive Generation of Post-therapeutic Neovascular Age-related Macular Degeneration

  • paper_url: http://arxiv.org/abs/2308.06432
  • repo_url: None
  • paper_authors: Yuhan Zhang, Kun Huang, Mingchao Li, Songtao Yuan, Qiang Chen
  • for: Predicting the evolution of neovascular age-related macular degeneration (nAMD) by generating post-therapeutic SD-OCT images from pre-therapeutic ones.
  • methods: The proposed single-horizon disease evolution network (SHENet) consists of a feature encoder, a graph evolution module that predicts the disease evolution process in a high-dimensional latent space, and a feature decoder; an evolution reinforcement module with adversarial training ensures effective disease-evolution learning and realistic SD-OCT images.
  • results: Validated on 383 SD-OCT cubes from 22 nAMD patients under three evaluation schemes, SHENet generates SD-OCT images with the highest image quality among the compared generative methods, achieves the best structure protection and content prediction, and shows a better visual effect in qualitative evaluations.
    Abstract Most of the existing disease prediction methods in the field of medical image processing fall into two classes, namely image-to-category predictions and image-to-parameter predictions. Few works have focused on image-to-image predictions. Different from multi-horizon predictions in other fields, ophthalmologists prefer to show more confidence in single-horizon predictions due to the low tolerance of predictive risk. We propose a single-horizon disease evolution network (SHENet) to predictively generate post-therapeutic SD-OCT images by inputting pre-therapeutic SD-OCT images with neovascular age-related macular degeneration (nAMD). In SHENet, a feature encoder converts the input SD-OCT images to deep features, then a graph evolution module predicts the process of disease evolution in high-dimensional latent space and outputs the predicted deep features, and lastly, feature decoder recovers the predicted deep features to SD-OCT images. We further propose an evolution reinforcement module to ensure the effectiveness of disease evolution learning and obtain realistic SD-OCT images by adversarial training. SHENet is validated on 383 SD-OCT cubes of 22 nAMD patients based on three well-designed schemes based on the quantitative and qualitative evaluations. Compared with other generative methods, the generative SD-OCT images of SHENet have the highest image quality. Besides, SHENet achieves the best structure protection and content prediction. Qualitative evaluations also demonstrate that SHENet has a better visual effect than other methods. SHENet can generate post-therapeutic SD-OCT images with both high prediction performance and good image quality, which has great potential to help ophthalmologists forecast the therapeutic effect of nAMD.
    摘要 医学影像处理领域中现有的疾病预测方法大多属于图像到类别预测和图像到参数预测,关注图像到图像预测的工作很少。与其他领域的多时程预测不同,由于对预测风险的容忍度较低,眼科医生更信任单时程预测。我们提出单时程疾病演化网络(SHENet),以患新生血管性年龄相关性黄斑变性(nAMD)的治疗前 SD-OCT 图像为输入,预测性地生成治疗后 SD-OCT 图像。SHENet 中,特征编码器将输入 SD-OCT 图像转换为深度特征,图演化模块在高维潜在空间中预测疾病演化过程并输出预测的深度特征,最后特征解码器将预测的深度特征重建为 SD-OCT 图像。我们进一步提出演化强化模块以确保疾病演化学习的有效性,并通过对抗训练获得逼真的 SD-OCT 图像。SHENet 在 22 位 nAMD 患者的 383 个 SD-OCT 立方体数据上,基于三种精心设计的方案进行了定量与定性验证。与其他生成方法相比,SHENet 生成的 SD-OCT 图像质量最高,并在结构保持与内容预测方面表现最佳;定性评估也表明其视觉效果优于其他方法。SHENet 能够生成兼具高预测性能与良好图像质量的治疗后 SD-OCT 图像,有望帮助眼科医生预判 nAMD 的治疗效果。

Genetic heterogeneity analysis using genetic algorithm and network science

  • paper_url: http://arxiv.org/abs/2308.06429
  • repo_url: None
  • paper_authors: Zhendong Sha, Yuanzhu Chen, Ting Hu
  • for: 这篇论文旨在通过全基因组关联研究(GWAS)发现与疾病易感性相关的遗传变量
  • methods: 提出一种新的特征选择机制——特征共选网络(FCS-Net),用于从基于遗传算法(GA,一种进化学习算法)的多次独立特征选择运行构建的网络中提取异质的遗传变量子集,并使用非线性机器学习算法检测特征交互;同时引入社区风险评分(CRS)这一合成特征来量化每个变量子集的整体疾病关联
  • results: 合成数据分析表明所用的基于 GA 的特征选择方法能够有效检测特征交互;将该方法应用于一个病例-对照结直肠癌 GWAS 数据集后,得到的合成特征被进一步用于解释另一个仅含病例的 GWAS 数据集中的遗传异质性
    Abstract Through genome-wide association studies (GWAS), disease susceptible genetic variables can be identified by comparing the genetic data of individuals with and without a specific disease. However, the discovery of these associations poses a significant challenge due to genetic heterogeneity and feature interactions. Genetic variables intertwined with these effects often exhibit lower effect-size, and thus can be difficult to detect using machine learning feature selection methods. To address these challenges, this paper introduces a novel feature selection mechanism for GWAS, named Feature Co-selection Network (FCS-Net). FCS-Net is designed to extract heterogeneous subsets of genetic variables from a network constructed from multiple independent feature selection runs based on a genetic algorithm (GA), an evolutionary learning algorithm. We employ a non-linear machine learning algorithm to detect feature interaction. We introduce the Community Risk Score (CRS), a synthetic feature designed to quantify the collective disease association of each variable subset. Our experiment showcases the effectiveness of the utilized GA-based feature selection method in identifying feature interactions through synthetic data analysis. Furthermore, we apply our novel approach to a case-control colorectal cancer GWAS dataset. The resulting synthetic features are then used to explain the genetic heterogeneity in an additional case-only GWAS dataset.
    摘要 通过全基因组关联研究(GWAS),可以通过比较患病个体与未患病个体的遗传数据来识别与疾病易感性相关的遗传变量。然而,由于遗传异质性与特征交互,这类关联的发现面临很大挑战:与这些效应交织在一起的遗传变量往往效应量较小,因而难以被机器学习特征选择方法检测到。为解决这些挑战,本文为 GWAS 提出一种新的特征选择机制——特征共选网络(FCS-Net)。FCS-Net 基于遗传算法(GA,一种进化学习算法)进行多次独立的特征选择运行并构建网络,从中提取异质的遗传变量子集;我们使用非线性机器学习算法来检测特征交互,并引入社区风险评分(CRS)这一合成特征,用于量化每个变量子集的整体疾病关联。实验通过合成数据分析展示了所用的基于 GA 的特征选择方法在识别特征交互方面的有效性。此外,我们将该方法应用于一个病例-对照结直肠癌 GWAS 数据集,并利用得到的合成特征解释另一个仅含病例的 GWAS 数据集中的遗传异质性。
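A toy sketch of the co-selection idea behind FCS-Net: run several independent GA-style feature-selection passes and count how often pairs of features are selected together, which gives the edge weights of a co-selection network. The fitness function, GA operators, and data below are simplified stand-ins, not the paper's implementation.

```python
# Toy illustration of building a feature co-selection network from repeated
# GA-based feature selection runs; not the paper's algorithm or data.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 30))      # toy genotype matrix (0/1/2)
y = (X[:, 3] + X[:, 7] > 2).astype(float)   # toy phenotype

def fitness(mask):
    sel = np.flatnonzero(mask)
    if sel.size == 0:
        return -np.inf
    s = X[:, sel].sum(1).astype(float)
    if s.std() == 0:
        return -np.inf
    return abs(np.corrcoef(s, y)[0, 1]) - 0.01 * sel.size   # penalise large subsets

def run_ga(pop_size=40, gens=30, p_mut=0.02):
    pop = rng.random((pop_size, X.shape[1])) < 0.2
    for _ in range(gens):
        fit = np.array([fitness(m) for m in pop])
        parents = pop[np.argsort(fit)[-pop_size // 2:]]      # truncation selection
        kids = []
        while len(kids) < pop_size:
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, X.shape[1])
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(X.shape[1]) < p_mut           # bit-flip mutation
            kids.append(child)
        pop = np.array(kids)
    best = pop[np.argmax([fitness(m) for m in pop])]
    return np.flatnonzero(best)

# co-selection network: edge weight = number of runs selecting both features
edges = {}
for _ in range(10):                       # independent feature-selection runs
    selected = run_ga()
    for i, j in combinations(sorted(selected), 2):
        edges[(i, j)] = edges.get((i, j), 0) + 1
print(sorted(edges.items(), key=lambda kv: -kv[1])[:5])
```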

Multiclass Learnability Does Not Imply Sample Compression

  • paper_url: http://arxiv.org/abs/2308.06424
  • repo_url: None
  • paper_authors: Chirag Pabbaraju
  • for: 该论文研究“样本压缩”问题:对于由某个假设标注的任意样本,能否只保留其中的一个小子样本,并据此推断出整个样本上的标注
  • methods: 论文使用刻画二元假设类复杂度的 VC 维,以及刻画多类假设类复杂度的对应概念 DS 维
  • results: 已知每个可学习的二元假设类(其 VC 维必然有限)都存在大小仅为其 VC 维的有限函数的样本压缩方案;论文证明对多类假设类,相应的结论不成立,即并非每个可学习的多类假设类(其 DS 维必然有限)都存在大小仅为其 DS 维的有限函数的样本压缩方案
    Abstract A hypothesis class admits a sample compression scheme, if for every sample labeled by a hypothesis from the class, it is possible to retain only a small subsample, using which the labels on the entire sample can be inferred. The size of the compression scheme is an upper bound on the size of the subsample produced. Every learnable binary hypothesis class (which must necessarily have finite VC dimension) admits a sample compression scheme of size only a finite function of its VC dimension, independent of the sample size. For multiclass hypothesis classes, the analog of VC dimension is the DS dimension. We show that the analogous statement pertaining to sample compression is not true for multiclass hypothesis classes: every learnable multiclass hypothesis class, which must necessarily have finite DS dimension, does not admit a sample compression scheme of size only a finite function of its DS dimension.
    摘要 若对于由某假设类中的假设标注的每个样本,都可以只保留一个小子样本并据此推断出整个样本上的标注,则称该假设类存在样本压缩方案;压缩方案的大小即所保留子样本规模的上界。每个可学习的二元假设类(必然具有有限的 VC 维)都存在大小仅为其 VC 维的有限函数、与样本规模无关的样本压缩方案。对多类假设类而言,与 VC 维对应的概念是 DS 维。我们证明关于样本压缩的类似结论对多类假设类并不成立:可学习的多类假设类虽然必然具有有限的 DS 维,却不一定存在大小仅为其 DS 维的有限函数的样本压缩方案。
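For readers unfamiliar with the terminology, the standard definition the abstract relies on can be restated as follows (a restatement for context, not material from the paper):

```latex
% Standard notion assumed above: a sample compression scheme of size $k$.
A pair $(\kappa,\rho)$ is a \emph{sample compression scheme of size $k$} for a class
$\mathcal H$ if, for every sample $S=\{(x_1,h(x_1)),\dots,(x_m,h(x_m))\}$ labeled by
some $h\in\mathcal H$, the compression map picks a subsample $\kappa(S)\subseteq S$
with $|\kappa(S)|\le k$, and the reconstruction map returns a predictor
$\hat h=\rho(\kappa(S))$ with $\hat h(x_i)=h(x_i)$ for all $i\le m$.
The known binary result bounds $k$ by a finite function of the VC dimension alone;
the paper shows that no analogous bound in terms of the DS dimension exists for
multiclass classes.
```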

Sensitivity-Aware Mixed-Precision Quantization and Width Optimization of Deep Neural Networks Through Cluster-Based Tree-Structured Parzen Estimation

  • paper_url: http://arxiv.org/abs/2308.06422
  • repo_url: None
  • paper_authors: Seyedarmin Azizi, Mahdi Nazemi, Arash Fayyazi, Massoud Pedram
  • for: 提高深度学习模型的效率,自动选择最佳bit Width和层Width。
  • methods: 使用Hessian-based pruning和cluster-based tree-structured Parzen estimator来缩小搜索空间,并开发surrogate模型。
  • results: 在知名数据集上进行严格测试,与现有方法相比,提供20%的模型大小减少和12倍的搜索时间减少,代表了深度学习模型设计优化领域的一大突破。
    Abstract As the complexity and computational demands of deep learning models rise, the need for effective optimization methods for neural network designs becomes paramount. This work introduces an innovative search mechanism for automatically selecting the best bit-width and layer-width for individual neural network layers. This leads to a marked enhancement in deep neural network efficiency. The search domain is strategically reduced by leveraging Hessian-based pruning, ensuring the removal of non-crucial parameters. Subsequently, we detail the development of surrogate models for favorable and unfavorable outcomes by employing a cluster-based tree-structured Parzen estimator. This strategy allows for a streamlined exploration of architectural possibilities and swift pinpointing of top-performing designs. Through rigorous testing on well-known datasets, our method proves its distinct advantage over existing methods. Compared to leading compression strategies, our approach records an impressive 20% decrease in model size without compromising accuracy. Additionally, our method boasts a 12x reduction in search time relative to the best search-focused strategies currently available. As a result, our proposed method represents a leap forward in neural network design optimization, paving the way for quick model design and implementation in settings with limited resources, thereby propelling the potential of scalable deep learning solutions.
    摘要 随着深度学习模型的复杂度与计算需求不断增长,对神经网络设计进行有效优化变得尤为重要。本文提出一种新的搜索机制,可为各个网络层自动选择最佳的位宽和层宽,从而显著提升深度神经网络的效率。我们利用基于 Hessian 的剪枝策略缩小搜索空间,确保移除非关键参数;随后借助基于聚类的树状结构 Parzen 估计器,为有利与不利结果分别构建代理模型,从而高效地探索候选架构并快速定位性能最优的设计。在知名数据集上的严格测试表明,与领先的压缩策略相比,我们的方法可在不损失精度的情况下将模型大小减少 20%,且相比当前最佳的以搜索为核心的策略,搜索时间缩短 12 倍。因此,所提方法是神经网络设计优化领域的一大进步,有助于在资源受限的环境中快速完成模型设计与部署,推动可扩展深度学习方案的发展。
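One common ingredient of Hessian-based sensitivity analysis is a Hutchinson estimate of the per-layer Hessian trace. The sketch below shows only that generic estimator in PyTorch; it is an assumption about one building block, not the paper's exact pruning procedure, and the model, data, and number of probe vectors are illustrative.

```python
# Hutchinson-style estimate of the Hessian trace per parameter tensor, a
# common sensitivity signal for Hessian-based pruning (generic sketch).
import torch

model = torch.nn.Sequential(torch.nn.Linear(20, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 10))
x, y = torch.randn(128, 20), torch.randint(0, 10, (128,))
loss = torch.nn.functional.cross_entropy(model(x), y)

params = [p for p in model.parameters() if p.requires_grad]
grads = torch.autograd.grad(loss, params, create_graph=True)

trace_est = [0.0 for _ in params]
n_probes = 8
for _ in range(n_probes):
    vs = [torch.randint_like(p, high=2) * 2 - 1 for p in params]   # Rademacher probes
    hv = torch.autograd.grad(grads, params, grad_outputs=vs, retain_graph=True)
    for i, (h, v) in enumerate(zip(hv, vs)):
        trace_est[i] += (h * v).sum().item() / n_probes

for (name, _), t in zip(model.named_parameters(), trace_est):
    print(f"{name}: approx Hessian trace = {t:.3f}")
```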

Pedestrian Trajectory Prediction in Pedestrian-Vehicle Mixed Environments: A Systematic Review

  • paper_url: http://arxiv.org/abs/2308.06419
  • repo_url: None
  • paper_authors: Mahsa Golchoubian, Moojan Ghafurian, Kerstin Dautenhahn, Nasser Lashgarian Azad
  • for: 本研究面向自动驾驶车辆(AV)在与行人共享空间中的轨迹规划问题
  • methods: 本文系统综述了文献中在车辆存在情况下对行人轨迹预测进行建模的各类方法,这些方法可应用于非结构化环境
  • results: 本文专门讨论了行人-车辆交互(相对于行人-行人交互)的特殊考虑,并综述了既有预测模型如何处理预测不确定性、行为差异等变量
    Abstract Planning an autonomous vehicle's (AV) path in a space shared with pedestrians requires reasoning about pedestrians' future trajectories. A practical pedestrian trajectory prediction algorithm for the use of AVs needs to consider the effect of the vehicle's interactions with the pedestrians on pedestrians' future motion behaviours. In this regard, this paper systematically reviews different methods proposed in the literature for modelling pedestrian trajectory prediction in presence of vehicles that can be applied for unstructured environments. This paper also investigates specific considerations for pedestrian-vehicle interaction (compared with pedestrian-pedestrian interaction) and reviews how different variables such as prediction uncertainties and behavioural differences are accounted for in the previously proposed prediction models. PRISMA guidelines were followed. Articles that did not consider vehicle and pedestrian interactions or actual trajectories, and articles that only focused on road crossing were excluded. A total of 1260 unique peer-reviewed articles from ACM Digital Library, IEEE Xplore, and Scopus databases were identified in the search. 64 articles were included in the final review as they met the inclusion and exclusion criteria. An overview of datasets containing trajectory data of both pedestrians and vehicles used by the reviewed papers has been provided. Research gaps and directions for future work, such as having more effective definition of interacting agents in deep learning methods and the need for gathering more datasets of mixed traffic in unstructured environments are discussed.
    摘要 在与行人共享的空间中规划自动驾驶车辆(AV)的路径,需要推理行人的未来轨迹。可供 AV 使用的实用行人轨迹预测算法必须考虑车辆与行人的交互对行人未来运动行为的影响。为此,本文系统综述了文献中提出的、可用于非结构化环境的车辆存在下行人轨迹预测建模方法,并探讨行人-车辆交互(相对于行人-行人交互)的特殊考虑,以及既有预测模型如何处理预测不确定性、行为差异等变量。本综述遵循 PRISMA 规范,排除了未考虑车辆与行人交互或真实轨迹、以及仅关注过街场景的文献。检索共得到来自 ACM Digital Library、IEEE Xplore 和 Scopus 数据库的 1260 篇同行评审文章,其中 64 篇符合纳入与排除标准进入最终综述。文中还汇总了所综述论文使用的行人与车辆轨迹数据集,并讨论了研究空白与未来方向,例如在深度学习方法中更有效地定义交互主体,以及需要在非结构化环境中收集更多混合交通数据集。

Learning Bayesian Networks with Heterogeneous Agronomic Data Sets via Mixed-Effect Models and Hierarchical Clustering

  • paper_url: http://arxiv.org/abs/2308.06399
  • repo_url: None
  • paper_authors: Lorenzo Vallegi, Marco Scutari, Federico Mattia Stefanini
  • For: This paper is written for researchers and practitioners who work with complex data sets in various fields, particularly agronomic studies. It aims to provide a novel approach for modelling causal relationships using Bayesian networks (BNs) and to demonstrate its effectiveness in handling hierarchical data.
  • Methods: The paper introduces a new approach that integrates random effects into BN learning, rooted in linear mixed-effects models. The approach uses directed acyclic graphs to illustrate the connections between variables and can handle complex networks of causal relationships.
  • Results: Employing this approach enhances structural learning, leading to the discovery of new connections and improved model specification, and reduces prediction errors from 28% to 17% on a real-world agronomic trial.
    Abstract Research involving diverse but related data sets, where associations between covariates and outcomes may vary, is prevalent in various fields including agronomic studies. In these scenarios, hierarchical models, also known as multilevel models, are frequently employed to assimilate information from different data sets while accommodating their distinct characteristics. However, their structure extends beyond simple heterogeneity, as variables often form complex networks of causal relationships. Bayesian networks (BNs) provide a powerful framework for modelling such relationships using directed acyclic graphs to illustrate the connections between variables. This study introduces a novel approach that integrates random effects into BN learning. Rooted in linear mixed-effects models, this approach is particularly well-suited for handling hierarchical data. Results from a real-world agronomic trial suggest that employing this approach enhances structural learning, leading to the discovery of new connections and improved model specification. Furthermore, we observe a reduction in prediction errors from 28\% to 17\%. By extending the applicability of BNs to complex data set structures, this approach contributes to the effective utilisation of BNs for hierarchical agronomic data. This, in turn, enhances their value as decision-support tools in the field.
    摘要 涉及多个相关但各异的数据集、且协变量与结果之间的关联可能随数据集变化的研究,在包括农学研究在内的许多领域都很普遍。在这类情形下,层次模型(又称多水平模型)常被用来整合不同数据集的信息,同时兼顾各自的特点。然而,数据的结构并不止于简单的异质性,变量之间往往构成复杂的因果关系网络。贝叶斯网络(BN)借助有向无环图刻画变量间的联系,为建模这类关系提供了强大的框架。本研究提出一种将随机效应纳入 BN 结构学习的新方法;该方法以线性混合效应模型为基础,特别适合处理层次数据。一项真实农学试验的结果表明,采用该方法能改进结构学习,发现新的连接并改善模型设定;同时预测误差由 28% 降至 17%。通过将 BN 的适用范围扩展到复杂的数据集结构,该方法有助于在层次农学数据上更有效地使用 BN,进而提升其作为田间决策支持工具的价值。
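A rough way to mimic the idea of folding random effects into BN learning is to fit a linear mixed-effects model per variable, subtract the estimated group-level random effects, and run structure learning on the adjusted data. The sketch below illustrates only this preprocessing step, not the paper's algorithm; the variable names ("yield_", "nitrogen", "field") and the random-intercept structure are hypothetical.

```python
# Sketch: absorb a field-level random intercept with a linear mixed model so
# that downstream BN structure learning sees adjusted values (illustrative only).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n, groups = 300, 6
df = pd.DataFrame({
    "field": rng.integers(0, groups, n),
    "nitrogen": rng.normal(size=n),
})
field_effect = rng.normal(scale=2.0, size=groups)
df["yield_"] = 1.5 * df["nitrogen"] + field_effect[df["field"]] + rng.normal(size=n)

# random intercept per field; fixed effect of nitrogen
m = smf.mixedlm("yield_ ~ nitrogen", df, groups=df["field"]).fit()

# subtract the estimated random intercepts before learning the network
re_map = {g: float(v.iloc[0]) for g, v in m.random_effects.items()}
df["yield_adj"] = df["yield_"] - df["field"].map(re_map)
print(m.summary())
```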

Detecting and Preventing Hallucinations in Large Vision Language Models

  • paper_url: http://arxiv.org/abs/2308.06394
  • repo_url: None
  • paper_authors: Anisha Gunjal, Jihan Yin, Erhan Bas
  • for: 本研究旨在检测并减少大型视觉-语言模型(LVLM)在多模态任务(尤其是视觉问答,VQA)中生成详细描述时产生的幻觉
  • methods: 构建了细粒度多模态幻觉标注数据集 M-HalDetect,并基于 InstructBLIP,通过新提出的细粒度直接偏好优化(FDPO)与细粒度多模态奖励模型对其进行优化,以减少幻觉
  • results: 实验表明,FDPO 与拒绝采样分别将 InstructBLIP 的幻觉率降低 41% 和 55%;所训练的奖励模型还能泛化到其他多模态模型,将 LLaVA 和 mPLUG-OWL 的幻觉率分别降低 15% 和 57%,并与人工评估的准确率高度相关
    Abstract Instruction tuned Large Vision Language Models (LVLMs) have significantly advanced in generalizing across a diverse set of multi-modal tasks, especially for Visual Question Answering (VQA). However, generating detailed responses that are visually grounded is still a challenging task for these models. We find that even the current state-of-the-art LVLMs (InstructBLIP) still contain a staggering 30 percent of the hallucinatory text in the form of non-existent objects, unfaithful descriptions, and inaccurate relationships. To address this, we introduce M-HalDetect, a (M)ultimodal (Hal)lucination (Detect)ion Dataset that can be used to train and benchmark models for hallucination detection and prevention. M-HalDetect consists of 16k fine-grained annotations on VQA examples, making it the first comprehensive multi-modal hallucination detection dataset for detailed image descriptions. Unlike previous work that only consider object hallucination, we additionally annotate both entity descriptions and relationships that are unfaithful. To demonstrate the potential of this dataset for hallucination prevention, we optimize InstructBLIP through our novel Fine-grained Direct Preference Optimization (FDPO). We also train fine-grained multi-modal reward models from InstructBLIP and evaluate their effectiveness with best-of-n rejection sampling. We perform human evaluation on both FDPO and rejection sampling, and find that they reduce hallucination rates in InstructBLIP by 41% and 55% respectively. We also find that our reward model generalizes to other multi-modal models, reducing hallucinations in LLaVA and mPLUG-OWL by 15% and 57% respectively, and has strong correlation with human evaluated accuracy scores.
    摘要 经指令微调的大型视觉-语言模型(LVLM)在多种多模态任务上的泛化能力显著提升,尤其是视觉问答(VQA)。然而,生成细节丰富且以视觉内容为依据的回答仍然具有挑战性。我们发现,即便是当前最先进的 LVLM(InstructBLIP),其输出中仍有高达 30% 的幻觉文本,表现为不存在的对象、不忠实的描述以及不准确的关系。为此,我们提出 M-HalDetect——一个可用于训练和评测幻觉检测与预防模型的多模态幻觉检测数据集。M-HalDetect 包含 1.6 万条 VQA 样本上的细粒度标注,是首个面向详细图像描述的综合性多模态幻觉检测数据集;与以往仅考虑对象幻觉的工作不同,我们还标注了不忠实的实体描述与关系。为展示该数据集在幻觉预防方面的潜力,我们通过新提出的细粒度直接偏好优化(FDPO)优化 InstructBLIP,并基于 InstructBLIP 训练细粒度多模态奖励模型,用 best-of-n 拒绝采样评估其效果。人工评估显示,FDPO 与拒绝采样分别将 InstructBLIP 的幻觉率降低 41% 和 55%;此外,我们的奖励模型能泛化到其他多模态模型,将 LLaVA 和 mPLUG-OWL 的幻觉率分别降低 15% 和 57%,并与人工评估的准确率呈强相关。
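FDPO is described only at a high level in the abstract; the generic direct-preference-optimization loss that such a method builds on can be written as below. The fine-grained segment handling of the paper is not reproduced here, and the inputs are assumed to be summed token log-probabilities of a preferred and a rejected (hallucinatory) response under the policy and a frozen reference model.

```python
# Generic DPO-style preference loss (a simplification, not the authors' FDPO code).
import torch
import torch.nn.functional as F

def dpo_loss(logp_pref_policy, logp_rej_policy,
             logp_pref_ref, logp_rej_ref, beta: float = 0.1):
    """All inputs: tensors of shape (batch,) holding sequence log-probabilities."""
    margin = (logp_pref_policy - logp_pref_ref) - (logp_rej_policy - logp_rej_ref)
    return -F.logsigmoid(beta * margin).mean()

# toy usage with random numbers standing in for model log-probs
b = 4
loss = dpo_loss(torch.randn(b), torch.randn(b), torch.randn(b), torch.randn(b))
print(float(loss))
```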

Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion

  • paper_url: http://arxiv.org/abs/2308.06382
  • repo_url: https://github.com/PhonemeHallucinator/Phoneme_Hallucinator
  • paper_authors: Siyuan Shan, Yang Li, Amartya Banerjee, Junier B. Oliva
  • for: 本研究旨在缓解现有语音转换(VC)方法中内容可懂度与说话人相似度之间的权衡
  • methods: 提出一种新的 VC 模型“Phoneme Hallucinator”,它仅基于一段很短的目标说话人语音(例如 3 秒),幻化出多样且高保真的目标说话人音素表示,再利用这些表示进行基于近邻的语音转换
  • results: 客观与主观评估表明,与现有 VC 方法相比,Phoneme Hallucinator 在内容可懂度和说话人相似度两方面均取得更好的性能
    Abstract Voice conversion (VC) aims at altering a person's voice to make it sound similar to the voice of another person while preserving linguistic content. Existing methods suffer from a dilemma between content intelligibility and speaker similarity; i.e., methods with higher intelligibility usually have a lower speaker similarity, while methods with higher speaker similarity usually require plenty of target speaker voice data to achieve high intelligibility. In this work, we propose a novel method \textit{Phoneme Hallucinator} that achieves the best of both worlds. Phoneme Hallucinator is a one-shot VC model; it adopts a novel model to hallucinate diversified and high-fidelity target speaker phonemes based just on a short target speaker voice (e.g. 3 seconds). The hallucinated phonemes are then exploited to perform neighbor-based voice conversion. Our model is a text-free, any-to-any VC model that requires no text annotations and supports conversion to any unseen speaker. Objective and subjective evaluations show that \textit{Phoneme Hallucinator} outperforms existing VC methods for both intelligibility and speaker similarity.
    摘要 语音转换(VC)旨在在保持语言内容不变的前提下,将一个人的声音转换得与另一个人相似。现有方法面临内容可懂度与说话人相似度之间的两难:可懂度较高的方法往往说话人相似度较低,而相似度较高的方法通常需要大量目标说话人语音数据才能达到高可懂度。本文提出一种兼顾二者的新方法“Phoneme Hallucinator”。它是一个一次性(one-shot)VC 模型,采用新颖的模型仅凭很短的目标说话人语音(例如 3 秒)幻化出多样且高保真的目标说话人音素表示,再利用幻化出的音素进行基于近邻的语音转换。该模型无需文本标注,支持任意到任意的转换,可转换到任何未见过的说话人。客观与主观评估表明,Phoneme Hallucinator 在可懂度和说话人相似度两方面均优于现有 VC 方法。
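The "neighbor-based voice conversion" step is in the spirit of kNN regression over frame features: each source frame is replaced by an average of its nearest frames from the (expanded) target-speaker set. The sketch below shows only that step; the hallucination model, feature extractor, and vocoder are out of scope, and shapes and k are illustrative.

```python
# Sketch of neighbour-based conversion over frame features (kNN-style),
# not the paper's full pipeline.
import numpy as np

def knn_convert(source_feats: np.ndarray, target_pool: np.ndarray, k: int = 4):
    """source_feats: (T, D) frames; target_pool: (N, D) target-speaker frames
    (e.g. real frames plus hallucinated ones). Returns (T, D) converted frames."""
    # pairwise squared Euclidean distances, shape (T, N)
    d2 = ((source_feats[:, None, :] - target_pool[None, :, :]) ** 2).sum(-1)
    nn_idx = np.argsort(d2, axis=1)[:, :k]
    return target_pool[nn_idx].mean(axis=1)

rng = np.random.default_rng(0)
src = rng.normal(size=(200, 16))      # source utterance frames
tgt = rng.normal(size=(800, 16))      # expanded target-speaker frame set
converted = knn_convert(src, tgt)
print(converted.shape)                # (200, 16)
```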

DCNFIS: Deep Convolutional Neuro-Fuzzy Inference System

  • paper_url: http://arxiv.org/abs/2308.06378
  • repo_url: None
  • paper_authors: Mojtaba Yeganejou, Kimia Honari, Ryan Kluzinski, Scott Dick, Michael Lipsett, James Miller
  • for: 提高人工智能的可解释性,即让人能够直接理解算法的工作方式,而不是仅依赖事后解释
  • methods: 通过混合深度学习与模糊逻辑模型,设计了一种深度卷积神经-模糊推理系统(DCNFIS),在不牺牲准确率的前提下提升透明度
  • results: DCNFIS 在四个常用数据集上的准确率与三种现有卷积神经网络相当,并优于最新的深度模糊系统;同时可以从其编码的模糊规则中导出显著性图形式的解释
    Abstract A key challenge in eXplainable Artificial Intelligence is the well-known tradeoff between the transparency of an algorithm (i.e., how easily a human can directly understand the algorithm, as opposed to receiving a post-hoc explanation), and its accuracy. We report on the design of a new deep network that achieves improved transparency without sacrificing accuracy. We design a deep convolutional neuro-fuzzy inference system (DCNFIS) by hybridizing fuzzy logic and deep learning models and show that DCNFIS performs as accurately as three existing convolutional neural networks on four well-known datasets. We furthermore show that DCNFIS outperforms state-of-the-art deep fuzzy systems. We then exploit the transparency of fuzzy logic by deriving explanations, in the form of saliency maps, from the fuzzy rules encoded in DCNFIS. We investigate the properties of these explanations in greater depth using the Fashion-MNIST dataset.
    摘要 可解释人工智能(XAI)中的一个主要挑战,是算法透明度(即人能在多大程度上直接理解算法,而非依赖事后解释)与准确率之间众所周知的权衡。我们报告了一种新的深度网络设计,它在不牺牲准确率的情况下提升了透明度。我们通过混合模糊逻辑与深度学习模型,设计了深度卷积神经-模糊推理系统(DCNFIS),并证明 DCNFIS 在四个知名数据集上的准确率与三种现有卷积神经网络相当;我们进一步证明 DCNFIS 优于当前最先进的深度模糊系统。随后,我们利用模糊逻辑的透明性,从 DCNFIS 编码的模糊规则中导出显著性图形式的解释,并基于 Fashion-MNIST 数据集更深入地研究了这些解释的性质。

UAMM: UBET Automated Market Maker

  • paper_url: http://arxiv.org/abs/2308.06375
  • repo_url: None
  • paper_authors: Daniel Jiwoong Im, Alexander Kondratskiy, Vincent Harvey, Hsuan-Wei Fu
  • for: 这篇论文研究自动做市商(AMM),即去中心化交易所(DEX)所使用的定价机制
  • methods: 提出一种新的定价方法 UBET AMM(UAMM),在计算价格时同时考虑外部市场价格与流动性池的无常损失
  • results: 作者证明,当外部市场价格有效时,该方法能够消除套利机会
    Abstract Automated market makers (AMMs) are pricing mechanisms utilized by decentralized exchanges (DEX). Traditional AMM approaches are constrained by pricing solely based on their own liquidity pool, without consideration of external markets or risk management for liquidity providers. In this paper, we propose a new approach known as UBET AMM (UAMM), which calculates prices by considering external market prices and the impermanent loss of the liquidity pool. Despite relying on external market prices, our method maintains the desired properties of a constant product curve when computing slippages. The key element of UAMM is determining the appropriate slippage amount based on the desired target balance, which encourages the liquidity pool to minimize impermanent loss. We demonstrate that our approach eliminates arbitrage opportunities when external market prices are efficient.
    摘要 自动做市商(AMM)是去中心化交易所(DEX)使用的定价机制。传统 AMM 方法只根据自身流动性池定价,既不考虑外部市场,也不为流动性提供者进行风险管理。本文提出一种新方法 UBET AMM(UAMM),在计算价格时同时考虑外部市场价格与流动性池的无常损失。尽管依赖外部市场价格,该方法在计算滑点时仍保持恒定乘积曲线的理想性质。UAMM 的关键在于根据期望的目标余额确定合适的滑点大小,从而促使流动性池将无常损失降到最低。我们证明,当外部市场价格有效时,该方法能够消除套利机会。
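As a toy illustration of the kind of mechanism the abstract describes (not the UAMM formulas), the quote below combines a constant-product output with an extra slippage term that grows as the pool moves away from a target balance implied by an external price; all parameters are made up.

```python
# Toy constant-product quote with an external-price-anchored slippage term.
def quote(dx, x, y, p_ext, gamma=0.5):
    """Sell dx of asset X into a pool with reserves (x, y).
    p_ext is the external price of X in units of Y."""
    k = x * y
    dy_cp = y - k / (x + dx)                  # constant-product output
    # target balance: X reserves the pool "should" hold if priced at p_ext
    x_target = (k / p_ext) ** 0.5
    imbalance = abs((x + dx) - x_target) / x_target
    slippage = gamma * imbalance * dy_cp      # penalise moves away from the target
    return max(dy_cp - slippage, 0.0)

print(quote(dx=10.0, x=1_000.0, y=2_000.0, p_ext=2.0))
```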

Topic-Level Bayesian Surprise and Serendipity for Recommender Systems

  • paper_url: http://arxiv.org/abs/2308.06368
  • repo_url: https://github.com/ton-moy/surprise-and-serendipity
  • paper_authors: Tonmoy Hasan, Razvan Bunescu
  • for: 本研究旨在通过推荐具有高意外惊喜(serendipity)潜力的物品,缓解个性化推荐系统中的过滤气泡问题
  • methods: 提出一种基于贝叶斯惊奇(Bayesian surprise)的内容式意外度度量,用于衡量物品被消费并评分后的意外惊喜程度,并结合识别相似用户的协同过滤组件,推荐可能获得高评分的意外物品
  • results: 实验表明,与基于距离的启发式方法相比,使用贝叶斯惊奇的模型与人工标注的主题级意外度相关性更高,且在意外惊喜物品推荐上取得更好的性能
    Abstract A recommender system that optimizes its recommendations solely to fit a user's history of ratings for consumed items can create a filter bubble, wherein the user does not get to experience items from novel, unseen categories. One approach to mitigate this undesired behavior is to recommend items with high potential for serendipity, namely surprising items that are likely to be highly rated. In this paper, we propose a content-based formulation of serendipity that is rooted in Bayesian surprise and use it to measure the serendipity of items after they are consumed and rated by the user. When coupled with a collaborative-filtering component that identifies similar users, this enables recommending items with high potential for serendipity. To facilitate the evaluation of topic-level models for surprise and serendipity, we introduce a dataset of book reading histories extracted from Goodreads, containing over 26 thousand users and close to 1.3 million books, where we manually annotate 449 books read by 4 users in terms of their time-dependent, topic-level surprise. Experimental evaluations show that models that use Bayesian surprise correlate much better with the manual annotations of topic-level surprise than distance-based heuristics, and also obtain better serendipitous item recommendation performance.
    摘要 若推荐系统仅根据用户对已消费物品的历史评分来优化推荐,就可能形成过滤气泡,使用户无法接触到来自新颖、未见类别的物品。缓解这一问题的一种思路是推荐具有高意外惊喜(serendipity)潜力的物品,即既出人意料又可能获得高评分的物品。本文提出一种基于贝叶斯惊奇的内容式意外度定义,并用它来衡量物品在被用户消费并评分之后的意外惊喜程度;与识别相似用户的协同过滤组件结合后,即可推荐具有高意外惊喜潜力的物品。为便于评估主题级的惊奇与意外惊喜模型,我们从 Goodreads 提取了一个图书阅读历史数据集,包含超过 2.6 万名用户和近 130 万本书,并对 4 名用户阅读的 449 本书按时间相关的主题级惊奇进行了人工标注。实验评估表明,使用贝叶斯惊奇的模型与人工标注的主题级惊奇的相关性远高于基于距离的启发式方法,并且在意外惊喜物品推荐上也取得了更好的性能。
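Topic-level Bayesian surprise can be illustrated as the KL divergence between a Dirichlet posterior and prior over a user's topic interests, updated with the topic mass of a newly read book. This is only one plausible instantiation of the idea; the topic model and the counts below are placeholders.

```python
# Illustrative Bayesian surprise as KL(posterior || prior) for a Dirichlet
# over topic interests (the underlying topic model is assumed).
import numpy as np
from scipy.special import gammaln, digamma

def dirichlet_kl(alpha_q, alpha_p):
    """KL( Dir(alpha_q) || Dir(alpha_p) ) in closed form."""
    a0q, a0p = alpha_q.sum(), alpha_p.sum()
    return (gammaln(a0q) - gammaln(alpha_q).sum()
            - gammaln(a0p) + gammaln(alpha_p).sum()
            + ((alpha_q - alpha_p) * (digamma(alpha_q) - digamma(a0q))).sum())

prior = np.array([5.0, 3.0, 1.0, 1.0])        # pseudo-counts over 4 topics so far
item_topics = np.array([0.0, 0.0, 2.0, 1.0])  # topic mass of the newly read book
posterior = prior + item_topics
print("Bayesian surprise:", dirichlet_kl(posterior, prior))
```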

Learning Distributions via Monte-Carlo Marginalization

  • paper_url: http://arxiv.org/abs/2308.06352
  • repo_url: None
  • paper_authors: Chenqiu Zhao, Guanfang Dong, Anup Basu
  • for: 学习难以处理(intractable)分布的方法
  • methods: 使用参数化分布模型(如高斯混合模型 GMM)来近似难以处理的分布,并借助 Monte-Carlo Marginalization(MCMarg)与核密度估计(KDE)分别解决 KL 散度的计算复杂度与优化过程的可微性问题
  • results: 提出了一种全程可微的分布学习方法,在标准数据集和合成数据上验证了其有效性;该方法还可以替代 VAE 中的变分推断,即便基于预训练 VAE 的解码器也能生成更好的图像
    Abstract We propose a novel method to learn intractable distributions from their samples. The main idea is to use a parametric distribution model, such as a Gaussian Mixture Model (GMM), to approximate intractable distributions by minimizing the KL-divergence. Based on this idea, there are two challenges that need to be addressed. First, the computational complexity of KL-divergence is unacceptable when the dimensions of distributions increases. The Monte-Carlo Marginalization (MCMarg) is proposed to address this issue. The second challenge is the differentiability of the optimization process, since the target distribution is intractable. We handle this problem by using Kernel Density Estimation (KDE). The proposed approach is a powerful tool to learn complex distributions and the entire process is differentiable. Thus, it can be a better substitute of the variational inference in variational auto-encoders (VAE). One strong evidence of the benefit of our method is that the distributions learned by the proposed approach can generate better images even based on a pre-trained VAE's decoder. Based on this point, we devise a distribution learning auto-encoder which is better than VAE under the same network architecture. Experiments on standard dataset and synthetic data demonstrate the efficiency of the proposed approach.
    摘要 我们提出一种从样本中学习难以处理(intractable)分布的新方法。其主要思想是使用参数化分布模型(如高斯混合模型 GMM),通过最小化 KL 散度来近似难以处理的分布。在此思想下需要解决两个挑战:其一,随着分布维度的增加,KL 散度的计算复杂度难以接受,为此我们提出 Monte-Carlo Marginalization(MCMarg);其二,由于目标分布难以处理,优化过程的可微性难以保证,我们借助核密度估计(KDE)来解决这一问题。所提方法是学习复杂分布的有力工具,且整个过程可微,因此可以作为变分自编码器(VAE)中变分推断的更好替代。一个有力的证据是:即便使用预训练 VAE 的解码器,由本方法学得的分布也能生成更好的图像。基于这一点,我们设计了一种分布学习自编码器,在相同网络结构下优于 VAE。标准数据集与合成数据上的实验证明了所提方法的有效性。
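One generic reading of the recipe (not the paper's MCMarg algorithm): draw Monte-Carlo samples from a KDE built around the observed points and fit a differentiable Gaussian mixture by minimizing the model-dependent part of the KL divergence, i.e. the negative log-likelihood of those samples. A minimal PyTorch sketch under those assumptions:

```python
# Fit a differentiable GMM to KDE-perturbed samples of an "intractable" target
# (generic sketch; the bandwidth, K and the toy target are assumptions).
import torch
from torch.distributions import Categorical, Independent, MixtureSameFamily, Normal

torch.manual_seed(0)
data = torch.cat([torch.randn(500, 2) * 0.3 - 1.0,
                  torch.randn(500, 2) * 0.3 + 1.0])      # toy bimodal target

K, bandwidth = 4, 0.1
logits = torch.zeros(K, requires_grad=True)
means = torch.randn(K, 2, requires_grad=True)
log_scales = torch.zeros(K, 2, requires_grad=True)

def gmm():
    comp = Independent(Normal(means, log_scales.exp()), 1)
    return MixtureSameFamily(Categorical(logits=logits), comp)

opt = torch.optim.Adam([logits, means, log_scales], lr=0.05)
for step in range(300):
    idx = torch.randint(0, data.shape[0], (256,))
    x = data[idx] + bandwidth * torch.randn(256, 2)   # Monte-Carlo KDE samples
    loss = -gmm().log_prob(x).mean()                  # model-dependent part of KL
    opt.zero_grad()
    loss.backward()
    opt.step()

print("learned component means:", means.detach())
```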

Mirror Diffusion Models

  • paper_url: http://arxiv.org/abs/2308.06342
  • repo_url: https://github.com/cran/DIMORA
  • paper_authors: Jaesung Tae
  • for: 这份报告旨在提出一种面向离散类别数据的生成方法,并为受限域上的扩散提供理论框架
  • methods: 受约束采样问题中镜像朗之万(mirror Langevin)算法的启发,提出镜像扩散模型(MDM),并在单纯形(simplex)扩散的设定下加以阐述
  • results: 在单纯形扩散的情形下展示了 MDM,并讨论了其向图像、文本生成等常见领域的自然扩展
    Abstract Diffusion models have successfully been applied to generative tasks in various continuous domains. However, applying diffusion to discrete categorical data remains a non-trivial task. Moreover, generation in continuous domains often requires clipping in practice, which motivates the need for a theoretical framework for adapting diffusion to constrained domains. Inspired by the mirror Langevin algorithm for the constrained sampling problem, in this theoretical report we propose Mirror Diffusion Models (MDMs). We demonstrate MDMs in the context of simplex diffusion and propose natural extensions to popular domains such as image and text generation.
    摘要 扩散模型已经成功应用于多种连续域的生成任务,但将扩散应用于离散类别数据仍然并非易事。此外,连续域中的生成在实践中常常需要截断(clipping),这促使我们为受限域上的扩散建立理论框架。受约束采样问题中镜像朗之万算法的启发,本理论报告提出了镜像扩散模型(MDMs)。我们在单纯形扩散的情形下展示了 MDMs,并讨论了其向图像与文本生成等常见领域的自然扩展。
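A mirror diffusion on the simplex relies on a mirror map that moves points between the constrained space and an unconstrained dual space. For the negative-entropy potential this map pair is just log and softmax, as in the small sketch below; the diffusion/score model itself is omitted, so this only illustrates the constrained-to-unconstrained trick the report builds on.

```python
# Mirror-map pair for the probability simplex under the negative-entropy
# potential (illustrative only; no diffusion model here).
import numpy as np

def to_dual(p, eps=1e-12):
    """Simplex point -> unconstrained dual coordinates (gradient of neg-entropy,
    up to an additive constant)."""
    return np.log(p + eps)

def to_primal(u):
    """Dual coordinates -> simplex point (softmax, the inverse map up to a shift)."""
    z = np.exp(u - u.max())
    return z / z.sum()

p = np.array([0.7, 0.2, 0.1])
u = to_dual(p) + 0.3 * np.random.default_rng(0).normal(size=3)  # noise in dual space
print(to_primal(u), to_primal(u).sum())   # the result stays on the simplex
```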

Size Lowerbounds for Deep Operator Networks

  • paper_url: http://arxiv.org/abs/2308.06338
  • repo_url: None
  • paper_authors: Anirbit Mukherjee, Amartya Roy
  • for: 本研究旨在首次建立与数据相关的 DeepONet 规模下界,刻画在含噪数据上降低经验误差所需的网络规模
  • methods: 研究对象为 Deep Operator Networks(DeepONets)范式,它用于在无穷维空间中求解回归问题,并可一次性求解整族 PDE
  • results: 研究表明,要在 $n$ 个数据点上获得较低的训练误差,branch 网与 trunk 网的公共输出维度必须按 $\Omega\left(\sqrt{n}\right)$ 增长;作者据此在平流-扩散-反应 PDE 上进行实验,结果提示在固定模型规模下,若要通过增大该公共输出维度单调降低训练误差,训练数据量可能需要随之按二次方增长
    Abstract Deep Operator Networks are an increasingly popular paradigm for solving regression in infinite dimensions and hence solve families of PDEs in one shot. In this work, we aim to establish a first-of-its-kind data-dependent lowerbound on the size of DeepONets required for them to be able to reduce empirical error on noisy data. In particular, we show that for low training errors to be obtained on $n$ data points it is necessary that the common output dimension of the branch and the trunk net be scaling as $\Omega \left ( {\sqrt{n} \right )$. This inspires our experiments with DeepONets solving the advection-diffusion-reaction PDE, where we demonstrate the possibility that at a fixed model size, to leverage increase in this common output dimension and get monotonic lowering of training error, the size of the training data might necessarily need to scale quadratically with it.
    摘要 Deep Operator Networks(DeepONet)是一种日益流行的范式,用于在无穷维空间中求解回归问题,从而可一次性求解整族 PDE。本文旨在首次建立与数据相关的下界,刻画 DeepONet 在含噪数据上降低经验误差所需的规模。具体而言,我们证明:要在 $n$ 个数据点上获得较低的训练误差,branch 网与 trunk 网的公共输出维度必须按 $\Omega\left(\sqrt{n}\right)$ 增长。受此启发,我们在平流-扩散-反应 PDE 上用 DeepONet 进行实验,结果表明在固定模型规模下,若要通过增大该公共输出维度单调降低训练误差,训练数据量可能需要随之按二次方增长。
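The lower bound concerns the common output dimension p shared by the branch and trunk nets; a minimal DeepONet forward pass makes that dimension explicit. Layer sizes, the number of sensor points, and p below are illustrative.

```python
# Minimal DeepONet forward pass to make the "common output dimension p" concrete.
import torch

class DeepONet(torch.nn.Module):
    def __init__(self, m_sensors=50, p=32):
        super().__init__()
        self.branch = torch.nn.Sequential(   # encodes the input function u
            torch.nn.Linear(m_sensors, 64), torch.nn.ReLU(), torch.nn.Linear(64, p))
        self.trunk = torch.nn.Sequential(    # encodes the query location y
            torch.nn.Linear(1, 64), torch.nn.ReLU(), torch.nn.Linear(64, p))
        self.bias = torch.nn.Parameter(torch.zeros(1))

    def forward(self, u_sensors, y):
        # G(u)(y) ≈ <branch(u), trunk(y)> + b ; both embeddings live in R^p
        return (self.branch(u_sensors) * self.trunk(y)).sum(-1, keepdim=True) + self.bias

net = DeepONet()
u = torch.randn(8, 50)      # 8 input functions sampled at 50 sensor points
y = torch.rand(8, 1)        # one query point per function
print(net(u, y).shape)      # (8, 1)
```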

Foundation Model is Efficient Multimodal Multitask Model Selector

  • paper_url: http://arxiv.org/abs/2308.06262
  • repo_url: https://github.com/opengvlab/multitask-model-selector
  • paper_authors: Fanqing Meng, Wenqi Shao, Zhanglin Peng, Chonghe Jiang, Kaipeng Zhang, Yu Qiao, Ping Luo
  • for: 这篇论文研究一个尚未被充分探索的重要问题:给定一组预训练神经网络,在不进行微调的情况下,预测它们在各个多模态任务上的性能
  • methods: 提出一种高效的多任务模型选择器(EMMS),利用大规模基础模型将不同下游任务的多种标签格式统一转换为带噪标签嵌入;EMMS 通过简单的加权线性回归估计模型的可迁移性,该回归可由带收敛保证的交替最小化算法高效求解
  • results: 在 5 个下游任务、24 个数据集上的大量实验表明 EMMS 快速、有效且通用。例如,与使用我们的标签嵌入增强的最新方法 LogME 相比,EMMS 在图像识别、指代、图像描述、视觉问答和文本问答上分别取得 9.0%、26.3%、20.1%、54.8% 和 12.2% 的性能提升,同时分别带来 5.13 倍、6.29 倍、3.59 倍、6.19 倍和 5.66 倍的实际运行时间加速。代码见 https://github.com/OpenGVLab/Multitask-Model-Selector
    Abstract This paper investigates an under-explored but important problem: given a collection of pre-trained neural networks, predicting their performance on each multi-modal task without fine-tuning them, such as image recognition, referring, captioning, visual question answering, and text question answering. A brute-force approach is to finetune all models on all target datasets, bringing high computational costs. Although recent-advanced approaches employed lightweight metrics to measure models' transferability,they often depend heavily on the prior knowledge of a single task, making them inapplicable in a multi-modal multi-task scenario. To tackle this issue, we propose an efficient multi-task model selector (EMMS), which employs large-scale foundation models to transform diverse label formats such as categories, texts, and bounding boxes of different downstream tasks into a unified noisy label embedding. EMMS can estimate a model's transferability through a simple weighted linear regression, which can be efficiently solved by an alternating minimization algorithm with a convergence guarantee. Extensive experiments on 5 downstream tasks with 24 datasets show that EMMS is fast, effective, and generic enough to assess the transferability of pre-trained models, making it the first model selection method in the multi-task scenario. For instance, compared with the state-of-the-art method LogME enhanced by our label embeddings, EMMS achieves 9.0\%, 26.3\%, 20.1\%, 54.8\%, 12.2\% performance gain on image recognition, referring, captioning, visual question answering, and text question answering, while bringing 5.13x, 6.29x, 3.59x, 6.19x, and 5.66x speedup in wall-clock time, respectively. The code is available at https://github.com/OpenGVLab/Multitask-Model-Selector.
    摘要 本文研究一个尚未被充分探索的重要问题:给定一组预训练神经网络,在不进行微调的情况下,预测它们在图像识别、指代、图像描述、视觉问答和文本问答等各个多模态任务上的性能。暴力做法是在所有目标数据集上微调所有模型,计算代价极高;近期方法使用轻量级指标来衡量模型的可迁移性,但这些指标往往强烈依赖单一任务的先验知识,难以适用于多模态多任务场景。为此,我们提出高效多任务模型选择器(EMMS),利用大规模基础模型将类别、文本、边界框等不同下游任务的标签格式统一转换为带噪标签嵌入;EMMS 通过简单的加权线性回归估计模型的可迁移性,该回归可由带收敛保证的交替最小化算法高效求解。在 5 个下游任务、24 个数据集上的大量实验表明,EMMS 快速、有效且足够通用,是多任务场景下的首个模型选择方法。例如,与使用我们标签嵌入增强的最新方法 LogME 相比,EMMS 在图像识别、指代、图像描述、视觉问答和文本问答上分别取得 9.0%、26.3%、20.1%、54.8% 和 12.2% 的性能提升,同时分别带来 5.13 倍、6.29 倍、3.59 倍、6.19 倍和 5.66 倍的实际运行时间加速。代码见 https://github.com/OpenGVLab/Multitask-Model-Selector。
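A simplified stand-in for the scoring idea (not EMMS's weighted regression or its alternating-minimization solver): rank frozen models by how well a ridge regression maps their features to a unified label embedding. Features and label embeddings below are random placeholders.

```python
# Simplified transferability score: R^2 of a ridge regression from frozen-model
# features to a unified label embedding (a stand-in for the EMMS objective).
import numpy as np

def transfer_score(features: np.ndarray, label_emb: np.ndarray, lam: float = 1e-2):
    """features: (n, d) frozen-model features; label_emb: (n, e) unified labels."""
    d = features.shape[1]
    W = np.linalg.solve(features.T @ features + lam * np.eye(d),
                        features.T @ label_emb)          # ridge regression
    pred = features @ W
    ss_res = ((label_emb - pred) ** 2).sum()
    ss_tot = ((label_emb - label_emb.mean(0)) ** 2).sum()
    return 1.0 - ss_res / ss_tot                          # R^2 as a transfer score

rng = np.random.default_rng(0)
labels = rng.normal(size=(500, 16))                       # placeholder label embeddings
model_a = labels @ rng.normal(size=(16, 64)) + 0.1 * rng.normal(size=(500, 64))
model_b = rng.normal(size=(500, 64))                      # uninformative features
print(transfer_score(model_a, labels), transfer_score(model_b, labels))
```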

Predicting Resilience with Neural Networks

  • paper_url: http://arxiv.org/abs/2308.06309
  • repo_url: None
  • paper_authors: Karen da Mata, Priscila Silva, Lance Fiondella
  • for: This paper aims to propose and evaluate alternative neural network (NN) approaches to model and predict system performance, including negative and positive factors driving resilience, in order to quantify the impact of disruptive events and restorative activities.
  • methods: The paper proposes three alternative NN approaches, including Artificial Neural Networks, Recurrent Neural Networks, and Long-Short Term Memory (LSTM), to model and predict system performance.
  • results: The results show that NN models outperformed a classical statistical model on all goodness-of-fit measures, with LSTMs achieving an over 60% higher adjusted R squared and decreased predictive error by 34-fold compared to the traditional method. These results suggest that NN models are both feasible and accurate for predicting resilience and may find practical use in many important domains.
    Abstract Resilience engineering studies the ability of a system to survive and recover from disruptive events, which finds applications in several domains. Most studies emphasize resilience metrics to quantify system performance, whereas recent studies propose statistical modeling approaches to project system recovery time after degradation. Moreover, past studies are either performed on data after recovering or limited to idealized trends. Therefore, this paper proposes three alternative neural network (NN) approaches including (i) Artificial Neural Networks, (ii) Recurrent Neural Networks, and (iii) Long-Short Term Memory (LSTM) to model and predict system performance, including negative and positive factors driving resilience to quantify the impact of disruptive events and restorative activities. Goodness-of-fit measures are computed to evaluate the models and compared with a classical statistical model, including mean squared error and adjusted R squared. Our results indicate that NN models outperformed the traditional model on all goodness-of-fit measures. More specifically, LSTMs achieved an over 60\% higher adjusted R squared, and decreased predictive error by 34-fold compared to the traditional method. These results suggest that NN models to predict resilience are both feasible and accurate and may find practical use in many important domains.
    摘要 韧性工程(resilience engineering)研究系统在破坏性事件后维持运行并恢复的能力,在多个领域都有应用。大多数研究侧重以韧性指标量化系统性能,近期研究则提出用统计建模方法预测系统在性能退化后的恢复时间;然而,既有研究要么基于恢复完成后的数据,要么局限于理想化趋势。因此,本文提出三种神经网络(NN)方法——人工神经网络、循环神经网络和长短期记忆网络(LSTM)——来建模并预测系统性能,将驱动韧性的负面与正面因素一并纳入,以量化破坏性事件与恢复活动的影响。我们计算拟合优度指标(包括均方误差与调整后 R 平方)来评估这些模型,并与经典统计模型进行比较。结果显示,NN 模型在所有拟合优度指标上都优于传统模型;其中 LSTM 的调整后 R 平方高出 60% 以上,预测误差较传统方法降低 34 倍。这些结果表明 NN 模型用于预测韧性既可行又准确,有望在许多重要领域得到实际应用。

FunnyBirds: A Synthetic Vision Dataset for a Part-Based Analysis of Explainable AI Methods

  • paper_url: http://arxiv.org/abs/2308.06248
  • repo_url: https://github.com/visinf/funnybirds
  • paper_authors: Robin Hesse, Simone Schaub-Meyer, Stefan Roth
  • for: This paper aims to address the challenge of evaluating the quality of explainable artificial intelligence (XAI) methods, which is an important problem in safety-critical domains where XAI is used.
  • methods: The paper proposes a novel synthetic vision dataset called FunnyBirds, as well as accompanying automatic evaluation protocols. The dataset allows for semantically meaningful image interventions, such as removing individual object parts, which enables the analysis of explanations on a part level and the estimation of ground-truth part importances.
  • results: The paper reports results for 24 different combinations of neural models and XAI methods, demonstrating the strengths and weaknesses of the assessed methods in a fully automatic and systematic manner. The results show that the proposed evaluation protocols can provide valuable insights into the quality of XAI methods and can help to identify areas for improvement.
    Abstract The field of explainable artificial intelligence (XAI) aims to uncover the inner workings of complex deep neural models. While being crucial for safety-critical domains, XAI inherently lacks ground-truth explanations, making its automatic evaluation an unsolved problem. We address this challenge by proposing a novel synthetic vision dataset, named FunnyBirds, and accompanying automatic evaluation protocols. Our dataset allows performing semantically meaningful image interventions, e.g., removing individual object parts, which has three important implications. First, it enables analyzing explanations on a part level, which is closer to human comprehension than existing methods that evaluate on a pixel level. Second, by comparing the model output for inputs with removed parts, we can estimate ground-truth part importances that should be reflected in the explanations. Third, by mapping individual explanations into a common space of part importances, we can analyze a variety of different explanation types in a single common framework. Using our tools, we report results for 24 different combinations of neural models and XAI methods, demonstrating the strengths and weaknesses of the assessed methods in a fully automatic and systematic manner.
    摘要 可解释人工智能(XAI)领域旨在揭示复杂深度神经模型的内部工作机制。XAI 对安全关键领域至关重要,但其本质上缺乏真值解释,使得自动评估成为尚未解决的问题。为应对这一挑战,我们提出一个新的合成视觉数据集 FunnyBirds,以及配套的自动评估协议。该数据集支持语义上有意义的图像干预,例如移除单个物体部件,这带来三点重要意义:第一,它使我们能够在部件层面分析解释,相比现有的像素级评估方式更贴近人类的理解;第二,通过比较模型对移除部件前后输入的输出,可以估计解释中应当体现的部件真值重要性;第三,将各种解释映射到统一的部件重要性空间后,可以在同一框架下分析多种不同类型的解释。利用这些工具,我们报告了 24 种神经模型与 XAI 方法组合的结果,以完全自动且系统化的方式展示了被评估方法的优势与不足。
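The part-level interventions suggest a simple way to score parts: feed the model the original image and each part-removed render, and take the drop in the target-class probability as that part's importance. The sketch below assumes the part-removed images are already available (the dataset provides such renders); the model and tensors are toy stand-ins.

```python
# Sketch of part-level intervention scoring: importance = drop in the target
# class probability when a part is removed (toy model and renders).
import torch

@torch.no_grad()
def part_importances(model, image, images_without_part, target_class):
    """image: (1, C, H, W); images_without_part: dict part_name -> (1, C, H, W)."""
    base = torch.softmax(model(image), dim=-1)[0, target_class]
    scores = {}
    for part, img_wo in images_without_part.items():
        p = torch.softmax(model(img_wo), dim=-1)[0, target_class]
        scores[part] = float(base - p)        # bigger drop = more important part
    return scores

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
img = torch.randn(1, 3, 32, 32)
renders = {"beak": torch.randn(1, 3, 32, 32), "wing": torch.randn(1, 3, 32, 32)}
print(part_importances(model, img, renders, target_class=3))
```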

Private Distribution Learning with Public Data: The View from Sample Compression

  • paper_url: http://arxiv.org/abs/2308.06239
  • repo_url: None
  • paper_authors: Shai Ben-David, Alex Bie, Clément L. Canonne, Gautam Kamath, Vikrant Singhal
  • for: 本研究考虑了一种名为公共-私有学习的问题,在这种设置下,学习者只能访问公共和私有样本,并且需要根据这些样本来估算一个未知分布$p$的值,同时保证隐私性。
  • methods: 本研究使用了纯量化隐私学习来保证隐私性,并使用了一种叫做列学习的方法来连接公共-私有学习的问题。
  • results: 本研究得出了一些新的结论,包括对于任意$k$-mixture的加aussian over $\mathbb R^d$的样本复杂度上界,以及对于agnostic和分布shift抗性的学习者的结论。此外,研究还发现了一个关于公共-私有学习的closure性性质,即对于Gaussian over $\mathbb R^d$,至少需要$d$个公共样本来保证私有学习的隐私性。
    Abstract We study the problem of private distribution learning with access to public data. In this setup, which we refer to as public-private learning, the learner is given public and private samples drawn from an unknown distribution $p$ belonging to a class $\mathcal Q$, with the goal of outputting an estimate of $p$ while adhering to privacy constraints (here, pure differential privacy) only with respect to the private samples. We show that the public-private learnability of a class $\mathcal Q$ is connected to the existence of a sample compression scheme for $\mathcal Q$, as well as to an intermediate notion we refer to as list learning. Leveraging this connection: (1) approximately recovers previous results on Gaussians over $\mathbb R^d$; and (2) leads to new ones, including sample complexity upper bounds for arbitrary $k$-mixtures of Gaussians over $\mathbb R^d$, results for agnostic and distribution-shift resistant learners, as well as closure properties for public-private learnability under taking mixtures and products of distributions. Finally, via the connection to list learning, we show that for Gaussians in $\mathbb R^d$, at least $d$ public samples are necessary for private learnability, which is close to the known upper bound of $d+1$ public samples.
    摘要 我们研究在可获得公共数据的条件下进行私有分布学习的问题。在这一被称为公共-私有学习的设置中,学习者获得来自某个类 $\mathcal Q$ 中未知分布 $p$ 的公共样本与私有样本,目标是在仅针对私有样本满足隐私约束(此处为纯差分隐私)的前提下输出 $p$ 的估计。我们证明,一个类 $\mathcal Q$ 的公共-私有可学习性与 $\mathcal Q$ 的样本压缩方案的存在性以及一个我们称为列表学习的中间概念相关联。利用这一联系:(1) 可近似恢复关于 $\mathbb R^d$ 上高斯分布的既有结果;(2) 可得到新的结果,包括 $\mathbb R^d$ 上任意 $k$ 个高斯混合的样本复杂度上界、对不可知与分布偏移鲁棒学习者的结果,以及公共-私有可学习性在取混合与乘积分布下的闭包性质。最后,借助与列表学习的联系,我们证明对 $\mathbb R^d$ 上的高斯分布,私有可学习性至少需要 $d$ 个公共样本,这一下界已接近已知的 $d+1$ 个公共样本的上界。

MaxFloodCast: Ensemble Machine Learning Model for Predicting Peak Inundation Depth And Decoding Influencing Features

  • paper_url: http://arxiv.org/abs/2308.06228
  • repo_url: None
  • paper_authors: Cheng-Chun Lee, Lipai Huang, Federico Antolini, Matthew Garcia, Andrew Juanb, Samuel D. Brody, Ali Mostafavi
  • for: This study aims to provide efficient and interpretable flood inundation depth predictions using a machine learning model, MaxFloodCast, which can support near-time floodplain management and emergency operations.
  • methods: The study uses physics-based hydrodynamic simulations to train the MaxFloodCast model, which achieves reliable flood inundation depth predictions with an average R-squared of 0.949 and a Root Mean Square Error of 0.61 ft on unseen data.
  • results: The study validates the MaxFloodCast model against Hurricane Harvey and Storm Imelda, demonstrating its potential in supporting flood risk management and emergency operations. The model provides critical information for decision-makers to prioritize areas with critical facilities and to examine how rainfall in other watersheds influences flood exposure in one area.
    Abstract Timely, accurate, and reliable information is essential for decision-makers, emergency managers, and infrastructure operators during flood events. This study demonstrates a proposed machine learning model, MaxFloodCast, trained on physics-based hydrodynamic simulations in Harris County, offers efficient and interpretable flood inundation depth predictions. Achieving an average R-squared of 0.949 and a Root Mean Square Error of 0.61 ft on unseen data, it proves reliable in forecasting peak flood inundation depths. Validated against Hurricane Harvey and Storm Imelda, MaxFloodCast shows the potential in supporting near-time floodplain management and emergency operations. The model's interpretability aids decision-makers in offering critical information to inform flood mitigation strategies, to prioritize areas with critical facilities and to examine how rainfall in other watersheds influences flood exposure in one area. The MaxFloodCast model enables accurate and interpretable inundation depth predictions while significantly reducing computational time, thereby supporting emergency response efforts and flood risk management more effectively.
    摘要 在洪水事件期间,及时、准确、可靠的信息对决策者、应急管理人员和基础设施运营者至关重要。本研究提出的机器学习模型 MaxFloodCast 基于哈里斯县的物理水动力模拟训练,可提供高效且可解释的洪水淹没深度预测。在未见数据上,其平均 R 平方达 0.949,均方根误差为 0.61 英尺,证明其在预测洪水淹没峰值深度方面的可靠性。经飓风 Harvey 与风暴 Imelda 验证,MaxFloodCast 展现出支持近实时洪泛区管理与应急行动的潜力。模型的可解释性有助于决策者获取关键信息,以制定洪水缓解策略、优先关注含关键设施的区域,并考察其他流域的降雨如何影响某一区域的洪水暴露。MaxFloodCast 能在大幅缩短计算时间的同时提供准确且可解释的淹没深度预测,从而更有效地支持应急响应与洪水风险管理。

Automated Sizing and Training of Efficient Deep Autoencoders using Second Order Algorithms

  • paper_url: http://arxiv.org/abs/2308.06221
  • repo_url: None
  • paper_authors: Kanishka Tyagi, Chinmay Rane, Michael Manry
  • for: 这篇论文旨在提出一种多步训练方法,用于设计广义线性分类器,并进一步自动确定与训练高效的深度自编码器
  • methods: 该方法先通过回归得到初始的多类线性分类器,再通过剪除不必要的输入最小化验证误差,同时用类似 Ho-Kashyap 规则的方式改进期望输出;随后为多层感知机开发了一族批量训练算法,用于优化隐藏层规模与训练轮数,并将剪枝与增长策略相结合
  • results: 论文利用二阶算法训练出高效的深度模型,并在多个公开数据集上取得性能提升;最终网络的十折测试误差低于文献中多种线性/广义线性分类器、多层感知机与深度学习方法
    Abstract We propose a multi-step training method for designing generalized linear classifiers. First, an initial multi-class linear classifier is found through regression. Then validation error is minimized by pruning of unnecessary inputs. Simultaneously, desired outputs are improved via a method similar to the Ho-Kashyap rule. Next, the output discriminants are scaled to be net functions of sigmoidal output units in a generalized linear classifier. We then develop a family of batch training algorithms for the multi layer perceptron that optimize its hidden layer size and number of training epochs. Next, we combine pruning with a growing approach. Later, the input units are scaled to be the net function of the sigmoidal output units that are then fed as input to the MLP. We then propose resulting improvements in each of the deep learning blocks thereby improving the overall performance of the deep architecture. We discuss the principles and formulation regarding learning algorithms for deep autoencoders. We investigate several problems in deep autoencoder networks including training issues, the theoretical, mathematical and experimental justification that the networks are linear, optimizing the number of hidden units in each layer and determining the depth of the deep learning model. A direct implication of the current work is the ability to construct fast deep learning models using desktop level computational resources. This, in our opinion, promotes our design philosophy of building small but powerful algorithms. Performance gains are demonstrated at each step. Using widely available datasets, the final network's ten-fold testing error is shown to be less than that of several other linear, generalized linear classifiers, multi layer perceptron and deep learners reported in the literature.
    摘要 我们提出一种用于设计广义线性分类器的多步训练方法。首先,通过回归得到初始的多类线性分类器;然后,通过剪除不必要的输入来最小化验证误差,同时用类似 Ho-Kashyap 规则的方法改进期望输出;接着,将输出判别函数缩放为广义线性分类器中 S 型输出单元的净函数。随后,我们为多层感知机开发了一族批量训练算法,用于优化其隐藏层规模与训练轮数,并将剪枝与增长策略相结合;之后,输入单元被缩放为 S 型输出单元的净函数,再作为输入馈入 MLP。我们进而在各个深度学习模块中引入相应改进,从而提升整个深度架构的性能。我们讨论了深度自编码器学习算法的原理与形式化,并研究了深度自编码器网络中的若干问题,包括训练问题、网络线性性的理论、数学与实验论证、各层隐藏单元数量的优化以及深度模型层数的确定。本工作的直接意义在于能够利用桌面级计算资源快速构建深度学习模型,这符合我们“构建小而强的算法”的设计理念。每一步的性能提升均得到了验证;在常用数据集上,最终网络的十折测试误差低于文献报道的多种线性与广义线性分类器、多层感知机及深度学习方法。

Change Point Detection With Conceptors

  • paper_url: http://arxiv.org/abs/2308.06213
  • repo_url: https://github.com/noahgade/changepointdetectionwithconceptors
  • paper_authors: Noah D. Gade, Jordan Rodu
  • for: 这篇论文研究离线变点检测问题,即识别时间序列中数据生成过程发生变化的位置
  • methods: 针对至多一个变点的情形,该方法用一个概念器(conceptor)矩阵学习指定训练窗口的特征动力学;相应的随机循环神经网络充当数据的特征化器,并通过特征化结果与代表性概念器矩阵所张成空间之间距离的单变量度量来识别变点
  • results: 在温和假设下,该方法可给出真实变点的一致估计,并可通过对原始数据进行滑动块自助法(moving block bootstrap)得到统计量的分位数估计;作者在多类过程的模拟数据上测试了该方法,用聚类指标、图形方法和观察到的第一类错误控制评估其性能,并将其应用于真实神经数据
    Abstract Offline change point detection seeks to identify points in a time series where the data generating process changes. This problem is well studied for univariate i.i.d. data, but becomes challenging with increasing dimension and temporal dependence. For the at most one change point problem, we propose the use of a conceptor matrix to learn the characteristic dynamics of a specified training window in a time series. The associated random recurrent neural network acts as a featurizer of the data, and change points are identified from a univariate quantification of the distance between the featurization and the space spanned by a representative conceptor matrix. This model agnostic method can suggest potential locations of interest that warrant further study. We prove that, under mild assumptions, the method provides a consistent estimate of the true change point, and quantile estimates for statistics are produced via a moving block bootstrap of the original data. The method is tested on simulations from several classes of processes, and we evaluate performance with clustering metrics, graphical methods, and observed Type 1 error control. We apply our method to publicly available neural data from rats experiencing bouts of non-REM sleep prior to exploration of a radial maze.
    摘要 离线变点检测旨在识别时间序列中数据生成过程发生变化的时刻。对一维独立同分布数据,这一问题已被充分研究,但随着维度与时间相关性的增加,问题变得更加困难。针对至多一个变点的情形,我们提出用概念器(conceptor)矩阵学习时间序列中指定训练窗口的特征动力学;相应的随机循环神经网络充当数据的特征化器,并通过特征化结果与代表性概念器矩阵所张成空间之间距离的单变量度量来识别变点。这一与具体模型无关的方法可以提示值得进一步研究的潜在位置。我们证明,在温和假设下,该方法给出真实变点的一致估计,并通过对原始数据进行滑动块自助法得到统计量的分位数估计。我们在多类过程的模拟数据上测试了该方法,用聚类指标、图形方法和观察到的第一类错误控制评估其性能,并将其应用于一组公开的神经数据:这些数据来自大鼠在探索放射状迷宫之前经历非 REM 睡眠阶段的记录。
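The conceptor of a state matrix can be computed as C = R(R + a^-2 I)^-1, with R the state correlation matrix; a change score can then measure how much of a new window's state energy falls outside the subspace C emphasises. The sketch below replaces the paper's echo-state featurizer with simple delay-embedding features, so it only illustrates the conceptor mechanics, not the full method.

```python
# Conceptor-based change scoring on delay-embedded features (illustrative only;
# the aperture value and window sizes are assumptions).
import numpy as np

def delay_embed(x, dim=5):
    return np.stack([x[i:len(x) - dim + i + 1] for i in range(dim)])   # (dim, T')

def conceptor(states, aperture=3.0):
    R = states @ states.T / states.shape[1]
    return R @ np.linalg.inv(R + aperture ** -2 * np.eye(R.shape[0]))

def change_score(states, C):
    resid = states - C @ states                 # component not captured by C
    return np.linalg.norm(resid) / np.linalg.norm(states)

rng = np.random.default_rng(0)
t = np.arange(2000)
x = np.sin(0.05 * t) + 0.1 * rng.normal(size=t.size)
x[1200:] += np.sin(0.3 * t[1200:])              # the dynamics change at t = 1200

C = conceptor(delay_embed(x[:600]))             # learn the conceptor on an early window
for start in range(600, 1900, 100):
    s = change_score(delay_embed(x[start:start + 100]), C)
    print(start, round(s, 3))                   # the score rises after the change
```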

Safety in Traffic Management Systems: A Comprehensive Survey

  • paper_url: http://arxiv.org/abs/2308.06204
  • repo_url: None
  • paper_authors: Wenlu Du, Ankan Dash, Jing Li, Hua Wei, Guiling Wang
  • for: 这项研究的目的是为了对交通管理系统中的安全问题进行全面的文献综述,以便更好地了解这些系统的安全问题,并提出解决方案。
  • methods: 本文使用了文献综述的方法来检查交通管理系统中的安全问题,并分析了现有的研究成果。
  • results: 本文梳理了交通管理系统中出现的各类安全问题,总结了该领域安全研究的现状以及为保障这些系统安全而提出的技术与方法;同时指出了现有研究的局限,并给出了未来的研究方向
    Abstract Traffic management systems play a vital role in ensuring safe and efficient transportation on roads. However, the use of advanced technologies in traffic management systems has introduced new safety challenges. Therefore, it is important to ensure the safety of these systems to prevent accidents and minimize their impact on road users. In this survey, we provide a comprehensive review of the literature on safety in traffic management systems. Specifically, we discuss the different safety issues that arise in traffic management systems, the current state of research on safety in these systems, and the techniques and methods proposed to ensure the safety of these systems. We also identify the limitations of the existing research and suggest future research directions.
    摘要 交通管理系统在路面交通安全和效率的问题上扮演着重要的角色。然而,进步的科技应用在交通管理系统中带来了新的安全挑战。因此,确保交通管理系统的安全性是不可或缺的。在这份调查中,我们提供了交通管理系统安全的全面评论。具体来说,我们讨论了交通管理系统中不同的安全问题,现有的研究状况,以及确保这些系统安全的技术和方法。我们还识别出现有的研究限制,并建议未来研究方向。