2023-08-12

cs.AI

VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use

paper_url: http://arxiv.org/abs/2308.06595
repo_url: None
paper_authors: Yonatan Bitton, Hritik Bansal, Jack Hessel, Rulin Shao, Wanrong Zhu, Anas Awadalla, Josh Gardner, Rohan Taori, Ludwig Schimdt
for: Evaluating how well vision-language models follow instructions in real-world scenarios.
methods: The benchmark uses 70 "instruction families" and 592 test queries, covering tasks that range from basic recognition to game playing and creative generation.
results: Both human and automatic evaluations show a relatively large quality gap between existing models and the reference outputs. The project is dynamic: researchers and practitioners can submit their models' responses on the project website.

Abstract
We introduce VisIT-Bench (Visual InsTruction Benchmark), a benchmark for evaluation of instruction-following vision-language models for real-world use. Our starting point is curating 70 'instruction families' that we envision instruction-tuned vision-language models should be able to address. Extending beyond evaluations like VQAv2 and COCO, tasks range from basic recognition to game playing and creative generation. Following curation, our dataset comprises 592 test queries, each with a human-authored instruction-conditioned caption. These descriptions surface instruction-specific factors, e.g., for an instruction asking about the accessibility of a storefront for wheelchair users, the instruction-conditioned caption describes ramps/potential obstacles. These descriptions enable 1) collecting human-verified reference outputs for each instance; and 2) automatic evaluation of candidate multimodal generations using a text-only LLM, aligning with human judgment. We quantify quality gaps between models and references using both human and automatic evaluations; e.g., the top-performing instruction-following model wins against the GPT-4 reference in just 27% of the comparisons. VisIT-Bench is dynamic to participate in: practitioners simply submit their model's response on the project website. Data, code and leaderboard are available at visit-bench.github.io.
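The headline figure above (the top model wins against the GPT-4 reference in 27% of comparisons) is a pairwise win rate. Below is a minimal sketch of that computation. It assumes each judgment is stored as the string "win", "loss" or "tie" from the candidate's point of view. The function name `win_rate` and the `tie_credit` parameter are hypothetical, and how VisIT-Bench itself counts ties is not stated here.

```python
from collections import Counter

def win_rate(verdicts, tie_credit=0.0):
    """Fraction of pairwise comparisons the candidate wins against the reference.

    verdicts: iterable of "win", "loss" or "tie" (candidate's point of view).
    tie_credit: credit given to each tie (0.0 ignores ties, 0.5 splits them).
    """
    counts = Counter(verdicts)
    unknown = set(counts) - {"win", "loss", "tie"}
    if unknown:
        raise ValueError(f"unexpected verdicts: {unknown}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no comparisons")
    return (counts["win"] + tie_credit * counts["tie"]) / total
```

For example, 27 wins and 73 losses give `win_rate(["win"] * 27 + ["loss"] * 73)` of 0.27.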

Value-Distributional Model-Based Reinforcement Learning

paper_url: http://arxiv.org/abs/2308.06590
repo_url: https://github.com/djdprogramming/adfa2
paper_authors: Carlos E. Luis, Alessandro G. Bottero, Julia Vinogradska, Felix Berkenkamp, Jan Peters
for: This paper addresses uncertainty quantification in sequential decision-making tasks.
methods: The paper takes a model-based Bayesian reinforcement learning approach. The goal is to learn the posterior distribution over value functions that is induced by parameter uncertainty in the Markov decision process.
results: Experiments show that the EQR algorithm outperforms established model-based and model-free algorithms on continuous-control tasks.

Abstract
Quantifying uncertainty about a policy's long-term performance is important to solve sequential decision-making tasks. We study the problem from a model-based Bayesian reinforcement learning perspective, where the goal is to learn the posterior distribution over value functions induced by parameter (epistemic) uncertainty of the Markov decision process. Previous work restricts the analysis to a few moments of the distribution over values or imposes a particular distribution shape, e.g., Gaussians. Inspired by distributional reinforcement learning, we introduce a Bellman operator whose fixed-point is the value distribution function. Based on our theory, we propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function that can be used for policy optimization. Evaluation across several continuous-control tasks shows performance benefits with respect to established model-based and model-free algorithms.
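EQR builds on quantile regression from distributional RL. The sketch below covers only that generic ingredient, not EQR itself: there is no learned dynamics model, no posterior sampling and no Bellman backup. It shows the quantile (pinball) loss and a small subgradient-descent fit of several quantile levels to a fixed set of samples. The function names, learning rate and step count are illustrative assumptions.

```python
def pinball_loss(theta, samples, tau):
    """Average quantile-regression (pinball) loss of estimate theta at level tau."""
    total = 0.0
    for z in samples:
        u = z - theta
        # tau * u when u >= 0, (tau - 1) * u when u < 0
        total += u * (tau - (1.0 if u < 0 else 0.0))
    return total / len(samples)

def fit_quantiles(samples, taus, lr=1.0, steps=2000):
    """Fit one scalar estimate per quantile level by subgradient descent on the pinball loss."""
    thetas = [0.0 for _ in taus]
    n = len(samples)
    for _ in range(steps):
        for i, tau in enumerate(taus):
            # Subgradient w.r.t. theta: mean over samples of (1{z < theta} - tau)
            grad = sum((1.0 if z < thetas[i] else 0.0) - tau for z in samples) / n
            thetas[i] -= lr * grad
    return thetas
```

With `samples = list(range(1, 101))` and levels 0.1, 0.5 and 0.9, the estimates settle just above 10, 50 and 90. At those points, the fraction of samples below each estimate equals its level.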

Approximate Answering of Graph Queries

paper_url: http://arxiv.org/abs/2308.06585
repo_url: None
paper_authors: Michael Cochez, Dimitrios Alivanistos, Erik Arakelyan, Max Berrendorf, Daniel Daza, Mikhail Galkin, Pasquale Minervini, Mathias Niepert, Hongyu Ren
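for: This chapter surveys methods for answering queries over knowledge graphs (KGs) that contain incomplete information.
methods: The chapter covers the query types these methods support and the datasets typically used to evaluate them. It then describes the approaches in terms of expressiveness, supported graph types, and inference capabilities.
results: These methods make it possible to answer queries as if the graph were complete. They remain limited by the incompleteness and inaccuracy of the underlying graph data, and the chapter discusses these limitations.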

Abstract
Knowledge graphs (KGs) are inherently incomplete because of incomplete world knowledge and bias in what is the input to the KG. Additionally, world knowledge constantly expands and evolves, making existing facts deprecated or introducing new ones. However, we would still want to be able to answer queries as if the graph were complete. In this chapter, we will give an overview of several methods which have been proposed to answer queries in such a setting. We will first provide an overview of the different query types which can be supported by these methods and datasets typically used for evaluation, as well as an insight into their limitations. Then, we give an overview of the different approaches and describe them in terms of expressiveness, supported graph types, and inference capabilities.
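Many approaches of the kind surveyed here answer complex queries by scoring candidate entities with a link predictor and combining the scores with fuzzy-logic operators. Below is a minimal sketch of that idea. It assumes the link predictor's outputs are already available as a table of scores in [0, 1] per relation; the toy tables stand in for a trained model. A relational projection uses the product t-norm with a max over intermediate entities, and a conjunction uses the product t-norm. All entity and relation names are hypothetical.

```python
def project(scores_by_entity, relation_scores):
    """Relational projection: score(t) = max over h of score(h) * p(h, r, t)."""
    out = {}
    for (h, t), p in relation_scores.items():
        s = scores_by_entity.get(h, 0.0) * p
        if s > out.get(t, 0.0):
            out[t] = s
    return out

def intersect(a, b):
    """Conjunction of two fuzzy answer sets with the product t-norm."""
    return {e: a[e] * b[e] for e in a.keys() & b.keys()}

# Toy query: ?y . exists x : friend(alice, x) and works_at(x, y) and hiring(y)
friend = {("alice", "bob"): 0.9, ("alice", "carol"): 0.4}
works_at = {("bob", "acme"): 0.8, ("carol", "initech"): 0.9, ("bob", "initech"): 0.1}
hiring = {"acme": 0.5, "initech": 0.9}  # already-scored answer set for hiring(y)

people = project({"alice": 1.0}, friend)
companies = project(people, works_at)
answers = intersect(companies, hiring)
ranking = sorted(answers, key=answers.get, reverse=True)
```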

paper_url: http://arxiv.org/abs/2308.06573
repo_url: None
paper_authors: Guirong Zhuo, Shouyi Lu, Huanyu Zhou, Lianqing Zheng, Lu Xiong
for: 4D radar–visual odometry (4DRVO) is an attractive solution for achieving accurate and robust pose estimation by integrating complementary information from 4D radar and cameras.
methods: 4DRVO-Net leverages a feature pyramid, pose warping, and cost volume (PWC) network architecture to progressively estimate and refine poses. A multi-scale feature extraction network called Radar-PointNet++ fully considers rich 4D radar point information. An adaptive 4D radar–camera fusion module (A-RCFM) is designed to automatically select image features based on 4D radar point features, facilitating multi-scale cross-modal feature interaction and adaptive multi-modal feature fusion.
results: Our method outperforms all learning-based and geometry-based methods for most sequences in the VoD dataset, and has exhibited promising performance that closely approaches that of the 64-line LiDAR odometry results of A-LOAM without mapping optimization.

Abstract
Four-dimensional (4D) radar--visual odometry (4DRVO) integrates complementary information from 4D radar and cameras, making it an attractive solution for achieving accurate and robust pose estimation. However, 4DRVO may exhibit significant tracking errors owing to three main factors: 1) sparsity of 4D radar point clouds; 2) inaccurate data association and insufficient feature interaction between the 4D radar and camera; and 3) disturbances caused by dynamic objects in the environment, affecting odometry estimation. In this paper, we present 4DRVO-Net, which is a method for 4D radar--visual odometry. This method leverages the feature pyramid, pose warping, and cost volume (PWC) network architecture to progressively estimate and refine poses. Specifically, we propose a multi-scale feature extraction network called Radar-PointNet++ that fully considers rich 4D radar point information, enabling fine-grained learning for sparse 4D radar point clouds. To effectively integrate the two modalities, we design an adaptive 4D radar--camera fusion module (A-RCFM) that automatically selects image features based on 4D radar point features, facilitating multi-scale cross-modal feature interaction and adaptive multi-modal feature fusion. In addition, we introduce a velocity-guided point-confidence estimation module to measure local motion patterns, reduce the influence of dynamic objects and outliers, and provide continuous updates during pose refinement. We demonstrate the excellent performance of our method and the effectiveness of each module design on both the VoD and in-house datasets. Our method outperforms all learning-based and geometry-based methods for most sequences in the VoD dataset. Furthermore, it has exhibited promising performance that closely approaches that of the 64-line LiDAR odometry results of A-LOAM without mapping optimization.
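The PWC architecture warps its inputs with the current pose estimate and then refines that estimate. Below is a minimal sketch of the geometric core only. It applies a rigid-body pose (a rotation about the z-axis by a yaw angle, plus a translation) to radar points, and composes a coarse pose with a residual correction. The yaw-only rotation is a simplifying assumption: 4DRVO-Net estimates full poses and warps learned features, not raw points. The helper names are hypothetical.

```python
import math

def pose(yaw, tx, ty, tz):
    """4x4 homogeneous transform: rotate about z by yaw (radians), then translate."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def compose(a, b):
    """Matrix product a @ b: the resulting transform applies b first, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def warp(points, T):
    """Apply transform T to a list of (x, y, z) points."""
    return [tuple(T[i][0] * x + T[i][1] * y + T[i][2] * z + T[i][3] for i in range(3))
            for x, y, z in points]

# Coarse estimate from a lower pyramid level, refined by a small residual correction
coarse = pose(math.pi / 2, 1.0, 0.0, 0.0)
residual = pose(0.0, 0.0, 0.5, 0.0)
refined = compose(residual, coarse)
warped = warp([(1.0, 0.0, 0.0)], refined)
```

Here the point (1, 0, 0) is rotated to (0, 1, 0), shifted by the coarse translation to (1, 1, 0), and then moved by the residual to (1, 1.5, 0).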

ModelScope Text-to-Video Technical Report

paper_url: http://arxiv.org/abs/2308.06571
repo_url: None
paper_authors: Jiuniu Wang, Hangjie Yuan, Dayou Chen, Yingya Zhang, Xiang Wang, Shiwei Zhang
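for: This paper describes ModelScopeT2V, a text-to-video synthesis model built on a text-to-image synthesis model (Stable Diffusion).
methods: The model uses spatio-temporal blocks to ensure consistent frame generation and smooth motion transitions, and it can adapt to varying numbers of frames during training and inference. It consists of three components (VQGAN, a text encoder, and a denoising UNet) with 1.7 billion parameters in total, 0.5 billion of which are dedicated to temporal capabilities.
results: The model outperforms state-of-the-art methods on three evaluation metrics. Code and an online demo are available at \url{https://modelscope.cn/models/damo/text-to-video-synthesis/summary}.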

Abstract
This paper introduces ModelScopeT2V, a text-to-video synthesis model that evolves from a text-to-image synthesis model (i.e., Stable Diffusion). ModelScopeT2V incorporates spatio-temporal blocks to ensure consistent frame generation and smooth movement transitions. The model can adapt to varying frame numbers during training and inference, which makes it suitable for both image-text and video-text datasets. ModelScopeT2V brings together three components (i.e., VQGAN, a text encoder, and a denoising UNet), comprising 1.7 billion parameters in total, of which 0.5 billion are dedicated to temporal capabilities. The model demonstrates superior performance over state-of-the-art methods across three evaluation metrics. The code and an online demo are available at \url{https://modelscope.cn/models/damo/text-to-video-synthesis/summary}.
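Why can one model train on both images and videos? A temporal self-attention layer attends over however many frames it receives. With a single frame (an image), the softmax weight is exactly 1, so the layer passes its input through unchanged. The sketch below illustrates this. It assumes identity query/key/value projections and one feature vector per frame at a fixed spatial position. ModelScopeT2V's actual spatio-temporal blocks (convolutions, learned projections, residual paths) are not reproduced, and the function name is hypothetical.

```python
import math

def temporal_attention(frames):
    """Self-attention across frames for one spatial position.

    frames: list of F feature vectors (lists of floats), for any F >= 1.
    Uses identity query/key/value projections for brevity.
    """
    d = len(frames[0])
    out = []
    for q in frames:
        logits = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in frames]
        m = max(logits)                      # subtract the max for numerical stability
        weights = [math.exp(l - m) for l in logits]
        z = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, frames)) / z for j in range(d)])
    return out
```

The same function accepts 1, 16 or any other number of frames, because nothing in it depends on F.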

MC-DRE: Multi-Aspect Cross Integration for Drug Event/Entity Extraction

paper_url: http://arxiv.org/abs/2308.06546
repo_url: None
paper_authors: Jie Yang, Soyeon Caren Han, Siqu Long, Josiah Poon, Goran Nenadic
for: This paper proposes a new multi-aspect cross-integration framework for drug entity/event detection in drug-related documents.
methods: The proposed framework uses multi-aspect encoders to describe semantic, syntactic, and medical document contextual information, and conducts cross-integration of different contextual information in three ways: key-value cross, attention cross, and feedforward cross.
results: The proposed model outperforms all state-of-the-art (SOTA) models on two widely used tasks, flat entity detection and discontinuous event extraction.

Abstract
Extracting meaningful drug-related information chunks, such as adverse drug events (ADE), is crucial for preventing morbidity and saving many lives. Most ADEs are reported via unstructured conversations in a medical context, so applying a general entity recognition approach is not sufficient. In this paper, we propose a new multi-aspect cross-integration framework for drug entity/event detection that captures and aligns different context/language/knowledge properties from drug-related documents. We first construct multi-aspect encoders that describe semantic, syntactic, and medical document contextual information by conducting three slot-tagging tasks: main drug entity/event detection, part-of-speech tagging, and general medical named entity recognition. Then, each encoder conducts cross-integration with the other contextual information in three ways (key-value cross, attention cross, and feedforward cross), so that the multiple encoders are integrated in depth. Our model outperforms all SOTA models on two widely used tasks, flat entity detection and discontinuous event extraction.
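One plausible reading of "attention cross" is that the token states of one encoder (e.g., drug entity detection) attend over the states of another encoder (e.g., part-of-speech tagging), and the attended context is added back. Below is a minimal single-head sketch with no learned projections. The exact key-value, attention and feedforward cross formulations of the paper are not reproduced, and `cross_attend` and `fuse` are hypothetical names.

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def cross_attend(queries, keys, values):
    """Each query vector attends over another encoder's key/value vectors."""
    d = len(queries[0])
    out = []
    for q in queries:
        w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys])
        out.append([sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))])
    return out

def fuse(own_states, other_states):
    """Residual fusion: add the context attended from the other encoder to the encoder's own states."""
    ctx = cross_attend(own_states, other_states, other_states)
    return [[a + b for a, b in zip(h, c)] for h, c in zip(own_states, ctx)]
```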

Digital elevation model correction in urban areas using extreme gradient boosting, land cover and terrain parameters

paper_url: http://arxiv.org/abs/2308.06545
repo_url: None
paper_authors: Chukwuma Okolie, Jon Mills, Adedayo Adeleke, Julian Smit
for: The paper aims to enhance the accuracy of medium-resolution digital elevation models (DEMs) in urban areas, specifically in Cape Town, South Africa, for hydrological and environmental modelling.
methods: The authors use the extreme gradient boosting (XGBoost) ensemble algorithm to correct the DEMs, with eleven predictor variables including elevation, urban footprints, slope, aspect, surface roughness, and more.
results: The corrected DEMs achieved significant accuracy gains that are competitive with other proposed methods. The root mean square error (RMSE) improved by 46 to 53% for the Copernicus DEM and by 72 to 73% for the AW3D DEM. These results demonstrate the potential of gradient boosted trees for enhancing DEM quality and improving hydrological modelling in urban catchments.

Abstract
The accuracy of digital elevation models (DEMs) in urban areas is influenced by numerous factors including land cover and terrain irregularities. Moreover, building artifacts in global DEMs cause artificial blocking of surface flow pathways. This compromises their quality and adequacy for hydrological and environmental modelling in urban landscapes, where precise and accurate terrain information is needed. In this study, the extreme gradient boosting (XGBoost) ensemble algorithm is adopted for enhancing the accuracy of two medium-resolution 30m DEMs over Cape Town, South Africa: Copernicus GLO-30 and ALOS World 3D (AW3D). XGBoost is a scalable, portable and versatile gradient boosting library that can solve many environmental modelling problems. The training datasets comprise eleven predictor variables: elevation, urban footprints, slope, aspect, surface roughness, topographic position index, terrain ruggedness index, terrain surface texture, vector roughness measure, forest cover and bare ground cover. The target variable (elevation error) was calculated with respect to highly accurate airborne LiDAR. After training and testing, the model was applied to correct the DEMs at two implementation sites. The correction achieved significant accuracy gains that are competitive with other proposed methods. The root mean square error (RMSE) of Copernicus DEM improved by 46 to 53% while the RMSE of AW3D DEM improved by 72 to 73%. These results showcase the potential of gradient boosted trees for enhancing the quality of DEMs, and for improved hydrological modelling in urban catchments.
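The correction step and the reported improvement figures can be illustrated with simple arithmetic. The model predicts the elevation error at each cell, the prediction is subtracted from the DEM, and improvement is the relative reduction in RMSE against the LiDAR reference. In the sketch below, the trained XGBoost model is abstracted away as a list of predicted errors. The sign convention (error = DEM minus reference) and the toy elevations are assumptions.

```python
import math

def rmse(values, reference):
    """Root mean square error between two equal-length lists of elevations."""
    return math.sqrt(sum((v - r) ** 2 for v, r in zip(values, reference)) / len(reference))

def correct_dem(dem, predicted_error):
    """Corrected elevation = original elevation - predicted error (error = DEM - reference)."""
    return [z - e for z, e in zip(dem, predicted_error)]

def rmse_improvement(dem, corrected, reference):
    """Relative RMSE reduction in percent."""
    before, after = rmse(dem, reference), rmse(corrected, reference)
    return 100.0 * (before - after) / before

lidar = [10.0, 12.0, 15.0, 20.0]        # toy reference elevations (m)
dem = [14.0, 15.0, 21.0, 24.0]          # toy DEM with errors 4, 3, 6, 4 m
predicted = [2.0, 1.5, 3.0, 2.0]        # stand-in for the model's predicted errors
corrected = correct_dem(dem, predicted)
improvement = rmse_improvement(dem, corrected, lidar)
```

In this toy case the model removes exactly half of every error, so the RMSE halves and the improvement is 50%.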

Dealing with Small Datasets for Deep Learning in Medical Imaging: An Evaluation of Self-Supervised Pre-Training on CT Scans Comparing Contrastive and Masked Autoencoder Methods for Convolutional Models

paper_url: http://arxiv.org/abs/2308.06534
repo_url: https://github.com/wolfda95/ssl-medicalimagining-cl-mae
paper_authors: Daniel Wolf, Tristan Payer, Catharina Silvia Lisson, Christoph Gerhard Lisson, Meinrad Beer, Timo Ropinski, Michael Götz
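for: This paper studies deep learning in medical imaging, which has the potential to reduce diagnostic errors, lighten radiologists' workload, and speed up diagnosis, in settings where annotated datasets are small.
methods: The paper uses self-supervised pre-training on a large unannotated CT image dataset. It compares contrastive learning methods with the masked autoencoder approach SparK for convolutional neural networks, followed by fine-tuning on several CT classification tasks.
results: The SparK pre-training method is more robust to small annotated fine-tuning datasets than the contrastive methods, and is therefore recommended for medical imaging tasks with only small annotated datasets.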

Abstract
Deep learning in medical imaging has the potential to minimize the risk of diagnostic errors, reduce radiologist workload, and accelerate diagnosis. Training such deep learning models requires large and accurate datasets, with annotations for all training samples. However, in the medical imaging domain, annotated datasets for specific tasks are often small due to the high complexity of annotations, limited access, or the rarity of diseases. To address this challenge, deep learning models can be pre-trained on large image datasets without annotations using methods from the field of self-supervised learning. After pre-training, small annotated datasets are sufficient to fine-tune the models for a specific task. The most popular self-supervised pre-training approaches in medical imaging are based on contrastive learning. However, recent studies in natural image processing indicate a strong potential for masked autoencoder approaches. Our work compares state-of-the-art contrastive learning methods with the recently introduced masked autoencoder approach "SparK" for convolutional neural networks (CNNs) on medical images. To this end, we pre-train on a large unannotated CT image dataset and fine-tune on several CT classification tasks. Due to the challenge of obtaining sufficient annotated training data in medical imaging, it is of particular interest to evaluate how the self-supervised pre-training methods perform when fine-tuning on small datasets. By experimenting with gradually reducing the training dataset size for fine-tuning, we find that the reduction has different effects depending on the type of pre-training chosen. The SparK pre-training method is more robust to the training dataset size than the contrastive methods. Based on our results, we propose the SparK pre-training for medical imaging tasks with only small annotated datasets.
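Masked autoencoder pre-training hides a random subset of image patches and trains the network to reconstruct them. Below is a minimal sketch of the masking step for a single-channel image such as a CT slice. The patch size of 32 and mask ratio of 0.6 are illustrative assumptions. SparK's sparse-convolution encoder and its decoder are not reproduced, and the function names are hypothetical.

```python
import random

def random_patch_mask(height, width, patch, mask_ratio, seed=0):
    """Per-patch boolean grid (True = masked) for an image of size height x width."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    rows, cols = height // patch, width // patch
    n = rows * cols
    n_masked = int(round(mask_ratio * n))
    rng = random.Random(seed)
    masked = set(rng.sample(range(n), n_masked))   # choose patches without replacement
    return [[(r * cols + c) in masked for c in range(cols)] for r in range(rows)]

def apply_mask(image, mask, patch, fill=0.0):
    """Replace masked patches of a 2-D image (list of rows) with a fill value."""
    return [[fill if mask[i // patch][j // patch] else px for j, px in enumerate(row)]
            for i, row in enumerate(image)]
```

For a 224 x 224 slice with 32-pixel patches there are 49 patches, and a ratio of 0.6 masks 29 of them.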

Learning Abstract Visual Reasoning via Task Decomposition: A Case Study in Raven Progressive Matrices

paper_url: http://arxiv.org/abs/2308.06528
repo_url: https://github.com/jakubkwiatkowski/abstract_compositional_transformer
paper_authors: Jakub Kwiatkowski, Krzysztof Krawiec
for: The paper aims to improve the performance of solving Raven Progressive Matrices (RPM) tasks using deep learning.
methods: The proposed method uses a transformer-based architecture to predict the visual properties of individual objects and their arrangements, rather than directly choosing the answer. The model parses the visual input into tokens and is trained using self-supervised methods with various masking regimes.
results: The proposed method outperforms state-of-the-art methods and provides interesting insights and partial explanations about the inference. Additionally, the design of the method is immune to biases that exist in some RPM benchmarks.

Abstract
One of the challenges in learning to perform abstract reasoning is that problems are often posed as monolithic tasks, with no intermediate subgoals. In Raven Progressive Matrices (RPM), the task is to choose one of the available answers given a context, where both contexts and answers are composite images featuring multiple objects in various spatial arrangements. As this high-level goal is the only guidance available, learning is challenging and most contemporary solvers tend to be opaque. In this study, we propose a deep learning architecture based on the transformer blueprint which, rather than directly making the above choice, predicts the visual properties of individual objects and their arrangements. The multidimensional predictions obtained in this way are then directly juxtaposed to choose the answer. We consider a few ways in which the model parses the visual input into tokens and several regimes of masking parts of the input in self-supervised training. In experimental assessment, the models not only outperform state-of-the-art methods but also provide interesting insights and partial explanations about the inference. The design of the method also makes it immune to biases that are known to exist in some RPM benchmarks.
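The final step described above sets the model's predicted properties for the missing panel against each candidate answer. Below is a minimal sketch of one way to do this. It assumes the model outputs a probability distribution over values for each attribute, and that each candidate panel has been parsed into attribute values. It picks the candidate with the highest total log-probability. The attribute names and the scoring rule are hypothetical, not the paper's exact procedure.

```python
import math

def choose_answer(predicted, candidates):
    """Index of the candidate whose attributes are most likely under the predictions.

    predicted: dict attribute -> dict value -> probability (model output for the missing panel)
    candidates: list of dicts attribute -> value (answer panels parsed into attributes)
    """
    def score(cand):
        # Unseen values get a tiny probability instead of log(0)
        return sum(math.log(predicted[attr].get(val, 1e-9)) for attr, val in cand.items())
    return max(range(len(candidates)), key=lambda i: score(candidates[i]))

predicted = {"shape": {"triangle": 0.8, "square": 0.2}, "count": {3: 0.7, 2: 0.3}}
candidates = [{"shape": "square", "count": 3},
              {"shape": "triangle", "count": 3},
              {"shape": "triangle", "count": 2}]
best = choose_answer(predicted, candidates)
```

Because the choice is made per attribute rather than by scoring whole answer images, it does not depend on how the candidate set was constructed.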

SLoRA: Federated Parameter Efficient Fine-Tuning of Language Models

paper_url: http://arxiv.org/abs/2308.06522
repo_url: None
paper_authors: Sara Babakniya, Ahmed Roushdy Elkordy, Yahya H. Ezzeldin, Qingfeng Liu, Kee-Bong Song, Mostafa El-Khamy, Salman Avestimehr
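for: This paper explores fine-tuning pre-trained transformer models in Federated Learning (FL) to obtain strong results on language tasks.
methods: The paper studies parameter-efficient fine-tuning (PEFT) methods in different FL settings. It proposes SLoRA, a method that uses a novel data-driven initialization to bridge the performance gap between PEFT and full fine-tuning when data across clients is highly heterogeneous.
results: Experiments show that SLoRA achieves performance comparable to full fine-tuning with sparse updates of approximately 1% density, while reducing training time by up to 90%.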

Abstract
Transfer learning via fine-tuning pre-trained transformer models has gained significant success in delivering state-of-the-art results across various NLP tasks. In the absence of centralized data, Federated Learning (FL) can benefit from distributed and private data of the FL edge clients for fine-tuning. However, due to the limited communication, computation, and storage capabilities of edge devices and the huge sizes of popular transformer models, efficient fine-tuning is crucial to make federated training feasible. This work explores the opportunities and challenges associated with applying parameter efficient fine-tuning (PEFT) methods in different FL settings for language tasks. Specifically, our investigation reveals that as the data across users becomes more diverse, the gap between fully fine-tuning the model and employing PEFT methods widens. To bridge this performance gap, we propose a method called SLoRA, which overcomes the key limitations of LoRA in high heterogeneous data scenarios through a novel data-driven initialization technique. Our experimental results demonstrate that SLoRA achieves performance comparable to full fine-tuning, using sparse updates with approximately $1\%$ density while reducing training time by up to $90\%$.
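SLoRA builds on LoRA, which freezes a pretrained weight matrix W and trains only a low-rank update B A, scaled by alpha / r. Below is a minimal sketch of a standard LoRA linear layer in pure Python. It uses the usual initialization: A is small random Gaussian and B is zero, so the adapted layer starts out identical to the frozen one. SLoRA's data-driven initialization is not shown. The class name and the initialization scale of 0.01 are assumptions.

```python
import random

def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]

class LoRALinear:
    """y = W x + (alpha / r) * B (A x), with W frozen and only A, B trainable."""

    def __init__(self, weight, r, alpha, seed=0):
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight                                                  # frozen, d_out x d_in
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]  # r x d_in
        self.B = [[0.0] * r for _ in range(d_out)]                            # d_out x r, zero-initialized
        self.scale = alpha / r

    def __call__(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

For a 4096 x 4096 layer with r = 8, the adapter has 2 * 4096 * 8 = 65,536 trainable parameters, about 0.4% of the 16.8M frozen weights. This count illustrates why adapters are cheap to communicate in FL. It is not the ~1% update density reported for SLoRA.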

One-bit Flip is All You Need: When Bit-flip Attack Meets Model Training

paper_url: http://arxiv.org/abs/2308.07934
repo_url: https://github.com/jianshuod/tba
paper_authors: Jianshuo Dong, Han Qiu, Yiming Li, Tianwei Zhang, Yuanjie Li, Zeqi Lai, Chao Zhang, Shu-Tao Xia
for: This paper proposes a training-assisted bit flip attack on deep neural networks (DNNs) that compromises their security.
methods: The attack exploits memory fault injection techniques such as row hammer and involves the adversary in the training stage to build and release a high-risk model. In the deployment stage, this high-risk model can be converted into a malicious one on the victim's side by flipping only one critical bit on average.
results: An adversary can easily convert the high-risk model into a malicious one by flipping only one critical bit on average, and the attack remains a significant threat even when defenses are employed.

Abstract
Deep neural networks (DNNs) are widely deployed on real-world devices, and concerns regarding their security have gained great attention from researchers. Recently, a new weight modification attack called bit flip attack (BFA) was proposed, which exploits memory fault injection techniques such as row hammer to attack quantized models in the deployment stage. With only a few bit flips, the target model can be rendered useless as a random guesser or even be implanted with malicious functionalities. In this work, we seek to further reduce the number of bit flips. We propose a training-assisted bit flip attack, in which the adversary is involved in the training stage to build a high-risk model to release. This high-risk model, obtained together with a corresponding malicious model, behaves normally and can escape various detection methods. The results on benchmark datasets show that an adversary can easily convert this high-risk but normal model to a malicious one on the victim's side by flipping only one critical bit on average in the deployment stage. Moreover, our attack still poses a significant threat even when defenses are employed. The code for reproducing the main experiments is available at \url{https://github.com/jianshuod/TBA}.
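Why can a single bit matter so much in a quantized model? In 8-bit two's complement, flipping the most significant bit shifts a stored weight by 128 quantization steps. Below is a minimal sketch of that effect under symmetric linear quantization with a hypothetical scale of 0.01. It only illustrates the arithmetic of one flip. It is not the attack's procedure for choosing which bit to flip, and the function names are hypothetical.

```python
def flip_bit_int8(value, bit):
    """Flip one bit (0 = least significant, 7 = sign bit) of an int8 value in two's complement."""
    if not -128 <= value <= 127 or not 0 <= bit <= 7:
        raise ValueError("value must be int8 and bit must be in 0..7")
    raw = (value & 0xFF) ^ (1 << bit)          # operate on the 8-bit pattern
    return raw - 256 if raw >= 128 else raw    # reinterpret the pattern as signed

def dequantize(q, scale):
    """Symmetric linear quantization: real-valued weight = q * scale."""
    return q * scale
```

With scale 0.01, a stored value of 3 (weight 0.03) becomes -125 (weight -1.25) after flipping bit 7. Flipping bit 0 only moves it to 2 (weight 0.02).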

HyperFormer: Enhancing Entity and Relation Interaction for Hyper-Relational Knowledge Graph Completion

paper_url: http://arxiv.org/abs/2308.06512
repo_url: https://github.com/zhiweihu1103/hkgc-hyperformer
paper_authors: Zhiwei Hu, Víctor Gutiérrez-Basulto, Zhiliang Xiang, Ru Li, Jeff Z. Pan
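for: This paper addresses completion of hyper-relational knowledge graphs (HKGs), which attach attribute-value qualifiers to triples. The goal is to infer unknown triples while taking their qualifiers into account.
methods: The paper proposes HyperFormer, a model that uses local-level sequential information, encoding the content of a triple's entities, relations, and qualifiers to improve triple prediction. The model consists of three modules: an entity neighbor aggregator module, a relation qualifier aggregator module, and a convolution-based bidirectional interaction module.
results: Extensive experiments on three well-known datasets under four different conditions validate HyperFormer's effectiveness. Code and data are available on GitHub.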

Abstract
Hyper-relational knowledge graphs (HKGs) extend standard knowledge graphs by associating attribute-value qualifiers with triples, which effectively represent additional fine-grained information about the associated triple. Hyper-relational knowledge graph completion (HKGC) aims at inferring unknown triples while considering their qualifiers. Most existing approaches to HKGC exploit a global-level graph structure to encode hyper-relational knowledge into the graph convolution message passing process. However, the addition of multi-hop information might bring noise into the triple prediction process. To address this problem, we propose HyperFormer, a model that considers local-level sequential information, which encodes the content of the entities, relations and qualifiers of a triple. More precisely, HyperFormer is composed of three different modules: an entity neighbor aggregator module that integrates the information of an entity's neighbors to capture different perspectives of it; a relation qualifier aggregator module that integrates hyper-relational knowledge into the corresponding relation to refine the representation of relational content; and a convolution-based bidirectional interaction module that captures pairwise bidirectional interactions of entity-relation, entity-qualifier, and relation-qualifier, achieving a deep perception of the content related to the current statement. Furthermore, we introduce a Mixture-of-Experts strategy into the feed-forward layers of HyperFormer to strengthen its representation capabilities while reducing the number of model parameters and the amount of computation. Extensive experiments on three well-known datasets under four different conditions demonstrate HyperFormer's effectiveness. Datasets and code are available at https://github.com/zhiweihu1103/HKGC-HyperFormer.
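A Mixture-of-Experts feed-forward layer can add capacity while keeping computation low: a router scores all experts, but only the top-k experts are evaluated for each input. Below is a minimal sketch of top-k gating in pure Python. The router weights, the choice of k and the toy experts are hypothetical, and HyperFormer's actual MoE configuration is not reproduced.

```python
import math

def moe_forward(x, gate_weights, experts, k=2):
    """Sparse mixture-of-experts: route x to the top-k experts and mix their outputs.

    x: input vector; gate_weights: one router weight vector per expert;
    experts: callables mapping a vector to a vector of the same length.
    Returns the mixed output and the indices of the experts that were used.
    """
    logits = [sum(w * xi for w, xi in zip(g, x)) for g in gate_weights]
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)
    probs = {i: math.exp(logits[i] - m) for i in top}   # softmax over the selected experts only
    z = sum(probs.values())
    out = [0.0] * len(x)
    for i in top:                                       # unselected experts are never evaluated
        y = experts[i](x)
        out = [o + (probs[i] / z) * yi for o, yi in zip(out, y)]
    return out, top
```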

Three Ways of Using Large Language Models to Evaluate Chat

paper_url: http://arxiv.org/abs/2308.06502
repo_url: https://github.com/oplatek/chateval-llm
paper_authors: Ondřej Plátek, Vojtěch Hudeček, Patricia Schmidtová, Mateusz Lango, Ondřej Dušek
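for: This paper describes the systems submitted by team6 to ChatEval (DSTC 11 Track 4): three approaches that use large language models (LLMs) to predict the quality of chatbot responses.
methods: The paper describes three approaches, including prompting ChatGPT with dynamic few-shot examples retrieved from a vector store, and analyzes the other two approaches along with the improvements they need in future work.
results: The paper reports improvement over the baseline when the prompts use dynamic few-shot examples retrieved from a vector store, and analyzes the performance of the other two approaches and the improvements needed for future work.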

Abstract
This paper describes the systems submitted by team6 for ChatEval, the DSTC 11 Track 4 competition. We present three different approaches to predicting turn-level qualities of chatbot responses based on large language models (LLMs). We report improvement over the baseline using dynamic few-shot examples from a vector store for the prompts for ChatGPT. We also analyze the performance of the other two approaches and report needed improvements for future work. We developed the three systems over just two weeks, showing the potential of LLMs for this task. An ablation study conducted after the challenge deadline shows that the new Llama 2 models are closing the performance gap between ChatGPT and open-source LLMs. However, we find that the Llama 2 models do not benefit from few-shot examples in the same way as ChatGPT.
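
Dynamic few-shot prompting of this kind typically retrieves the stored examples most similar to the current input and places them in the prompt. The sketch below assumes the examples have already been embedded (the toy 2-dimensional vectors stand in for a real embedding model) and uses brute-force cosine similarity instead of an actual vector-store library; the prompt template and the names `select_few_shot` and `build_prompt` are hypothetical, not team6's code.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[List[float], str]  # (embedding, annotated dialogue example)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_few_shot(query_vec: Sequence[float], store: List[Example], k: int = 2) -> List[str]:
    """Return the k stored examples whose embeddings are closest to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]


def build_prompt(instruction: str, examples: List[str], response: str) -> str:
    """Assemble a rating prompt: instruction, retrieved examples, then the turn to score."""
    shots = "\n\n".join(examples)
    return f"{instruction}\n\n{shots}\n\nResponse: {response}\nScore:"


store: List[Example] = [
    ([1.0, 0.0], "Response: Sure, here is the schedule. Score: 5"),
    ([0.0, 1.0], "Response: I don't know. Score: 1"),
    ([0.9, 0.1], "Response: The schedule is attached. Score: 4"),
]
examples = select_few_shot([1.0, 0.0], store, k=2)
print(build_prompt("Rate the appropriateness of the response from 1 to 5.", examples, "Here it is."))
```

Because the examples are chosen per query, the prompt changes from turn to turn, which is what distinguishes this from a fixed few-shot prompt.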

Latent Emission-Augmented Perspective-Taking (LEAPT) for Human-Robot Interaction

paper_url: http://arxiv.org/abs/2308.06498
repo_url: None
paper_authors: Kaiqi Chen, Jing Yu Lim, Kingsley Kuan, Harold Soh
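for: This paper aims to help robots perform perspective-taking, i.e., understand a human's point of view and beliefs.
methods: The paper uses a deep world model that lets a robot perform both perceptual and conceptual perspective-taking, i.e., infer what a human sees and believes.
results: Experiments on three partially-observable human-robot interaction tasks show that the method significantly outperforms existing baselines.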

Abstract
Perspective-taking is the ability to perceive or understand a situation or concept from another individual's point of view, and is crucial in daily human interactions. Enabling robots to perform perspective-taking remains an unsolved problem; existing approaches that use deterministic or handcrafted methods are unable to accurately account for uncertainty in partially-observable settings. This work proposes to address this limitation via a deep world model that enables a robot to perform both perception and conceptual perspective taking, i.e., the robot is able to infer what a human sees and believes. The key innovation is a decomposed multi-modal latent state space model able to generate and augment fictitious observations/emissions. Optimizing the ELBO that arises from this probabilistic graphical model enables the learning of uncertainty in latent space, which facilitates uncertainty estimation from high-dimensional observations. We tasked our model to predict human observations and beliefs on three partially-observable HRI tasks. Experiments show that our method significantly outperforms existing baselines and is able to infer visual observations available to other agents and their internal beliefs.
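
For readers unfamiliar with the ELBO objective, the sketch below computes a single-sample ELBO for a generic latent-variable model with a diagonal Gaussian posterior, a standard normal prior, and a unit-variance Gaussian decoder. These distributional choices and the identity decoder in the example are assumptions for illustration; LEAPT's decomposed multi-modal state-space model and its emission augmentation are not reproduced here.

```python
import math
import random
from typing import Callable, List, Optional

Vector = List[float]


def gaussian_kl(mu: Vector, logvar: Vector) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def gaussian_log_likelihood(x: Vector, mean: Vector) -> float:
    """log N(x; mean, I) for a unit-variance Gaussian decoder."""
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * (sq + len(x) * math.log(2 * math.pi))


def elbo(x: Vector, mu: Vector, logvar: Vector,
         decode: Callable[[Vector], Vector], eps: Optional[Vector] = None) -> float:
    """Single-sample ELBO estimate: log p(x | z) - KL(q(z | x) || p(z))."""
    if eps is None:
        eps = [random.gauss(0.0, 1.0) for _ in mu]
    # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I).
    z = [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, logvar, eps)]
    return gaussian_log_likelihood(x, decode(z)) - gaussian_kl(mu, logvar)


# Toy usage with an identity "decoder" (a real model would use a neural network).
print(elbo([0.5, -0.2], [0.1, 0.0], [-1.0, -1.0], decode=lambda z: z))
```

The KL term is what keeps the learned posterior variance meaningful, which is the sense in which maximizing an ELBO lets a model represent uncertainty in latent space.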

EgoPoser: Robust Real-Time Ego-Body Pose Estimation in Large Scenes

paper_url: http://arxiv.org/abs/2308.06493
repo_url: None
paper_authors: Jiaxi Jiang, Paul Streli, Manuel Meier, Christian Holz
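for: This paper addresses ego-pose estimation on headsets, i.e., estimating full-body pose from head and hand poses alone.
methods: The paper proposes a new input representation and a novel motion decomposition method that estimates full-body pose independently of global position; it also robustly models the different body sizes of different users.
results: Experiments show that the method outperforms prior work both qualitatively and quantitatively while maintaining a high inference speed (over 600 fps). It provides a robust baseline for future work in which full-body pose estimation no longer relies on outside-in capture and can scale to large-scene environments.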

Abstract
Full-body ego-pose estimation from head and hand poses alone has become an active area of research to power articulate avatar representation on headset-based platforms. However, existing methods over-rely on the confines of the motion-capture spaces in which datasets were recorded, while simultaneously assuming continuous capture of joint motions and uniform body dimensions. In this paper, we propose EgoPoser, which overcomes these limitations by 1) rethinking the input representation for headset-based ego-pose estimation and introducing a novel motion decomposition method that predicts full-body pose independent of global positions, 2) robustly modeling body pose from intermittent hand position and orientation tracking only when inside a headset's field of view, and 3) generalizing across various body sizes for different users. Our experiments show that EgoPoser outperforms state-of-the-art methods both qualitatively and quantitatively, while maintaining a high inference speed of over 600 fps. EgoPoser establishes a robust baseline for future work, where full-body pose estimation needs no longer rely on outside-in capture and can scale to large-scene environments.
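
The ideas of predicting pose independently of global position, and of using hand tracking only when the hands are visible, can be sketched as input preprocessing. Everything below is an assumption for illustration rather than EgoPoser's actual representation: a y-up coordinate frame, removing only the head's horizontal (x, z) offset, a cone-shaped field of view with a hypothetical 45° half-angle, and zero-filled hand features plus a visibility flag when the hand is outside that cone.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def to_head_relative(point: Vec3, head: Vec3) -> Vec3:
    """Subtract the head's horizontal (x, z) position; keep the absolute height y."""
    return (point[0] - head[0], point[1], point[2] - head[2])


def in_field_of_view(hand: Vec3, head: Vec3, forward: Vec3, half_angle_deg: float) -> bool:
    """True if the hand lies inside a cone of the given half-angle around the view direction."""
    d = [h - o for h, o in zip(hand, head)]
    norm_d = math.sqrt(sum(c * c for c in d))
    norm_f = math.sqrt(sum(c * c for c in forward))
    if norm_d == 0.0 or norm_f == 0.0:
        return False
    cos_angle = sum(a * b for a, b in zip(d, forward)) / (norm_d * norm_f)
    return cos_angle >= math.cos(math.radians(half_angle_deg))


def hand_features(hand: Vec3, head: Vec3, forward: Vec3, half_angle_deg: float = 45.0) -> List[float]:
    """Head-relative hand position plus a visibility flag; all zeros when the hand is not tracked."""
    if in_field_of_view(hand, head, forward, half_angle_deg):
        return list(to_head_relative(hand, head)) + [1.0]
    return [0.0, 0.0, 0.0, 0.0]


# Same hand pose relative to the head, at two different places in a large room:
print(hand_features((5.0, 1.5, 5.5), (5.0, 1.7, 5.0), (0.0, 0.0, 1.0)))
print(hand_features((0.0, 1.5, 0.5), (0.0, 1.7, 0.0), (0.0, 0.0, 1.0)))
# Hand behind the user (outside the headset's view): masked out.
print(hand_features((5.0, 1.5, 4.5), (5.0, 1.7, 5.0), (0.0, 0.0, 1.0)))
```

The first two calls produce identical features even though the user stands in different places, which is the property that lets a model trained in a small capture space be used in larger scenes.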

Generating Faithful Text From a Knowledge Graph with Noisy Reference Text

paper_url: http://arxiv.org/abs/2308.06488
repo_url: None
paper_authors: Tahsina Hashem, Weiqing Wang, Derry Tanti Wijaya, Mohammed Eunus Ali, Yuan-Fang Li
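for: This paper proposes a knowledge graph (KG)-to-text generation model that produces natural-language text faithfully representing the information in a given knowledge graph.
methods: The model uses contrastive learning and a controllable text generation technique to improve its ability to distinguish faithful from hallucinated information and to control the level of hallucination.
results: Experiments show that the model achieves superior faithfulness compared with state-of-the-art KG-to-text models.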

Abstract
Knowledge Graph (KG)-to-Text generation aims at generating fluent natural-language text that accurately represents the information of a given knowledge graph. While significant progress has been made in this task by exploiting the power of pre-trained language models (PLMs) with appropriate graph structure-aware modules, existing models still fall short of generating faithful text, especially when the ground-truth natural-language text contains additional information that is not present in the graph. In this paper, we develop a KG-to-text generation model that can generate faithful natural-language text from a given graph, in the presence of noisy reference text. Our framework incorporates two core ideas: Firstly, we utilize contrastive learning to enhance the model's ability to differentiate between faithful and hallucinated information in the text, thereby encouraging the decoder to generate text that aligns with the input graph. Secondly, we empower the decoder to control the level of hallucination in the generated text by employing a controllable text generation technique. We evaluate our model's performance through the standard quantitative metrics as well as a ChatGPT-based quantitative and qualitative analysis. Our evaluation demonstrates the superior performance of our model over state-of-the-art KG-to-text models on faithfulness.
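
Contrastive learning of this kind is often implemented with an InfoNCE-style loss that pulls an anchor representation toward a positive (here, a faithful text) and away from negatives (hallucinated texts). The sketch assumes cosine similarity, a temperature of 0.1, and toy 2-dimensional vectors in place of encoder outputs; the paper's exact loss and how it constructs positives and negatives may differ.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: List[Sequence[float]], temperature: float = 0.1) -> float:
    """Negative log softmax score of the positive among {positive} plus negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # log-sum-exp trick for numerical stability
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


graph_repr = [1.0, 0.0]                          # hypothetical encoding of the input graph
faithful_text = [0.9, 0.1]                       # text consistent with the graph
hallucinated_texts = [[0.1, 0.9], [-0.5, 0.5]]   # texts adding unsupported facts
print(info_nce(graph_repr, faithful_text, hallucinated_texts))
```

The loss is small when the faithful text is much closer to the graph than every hallucinated one, so minimizing it pushes the model to separate the two kinds of text.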

Not So Robust After All: Evaluating the Robustness of Deep Neural Networks to Unseen Adversarial Attacks

paper_url: http://arxiv.org/abs/2308.06467
repo_url: None
paper_authors: Roman Garaev, Bader Rasheed, Adil Khan
for: This study aims to challenge the efficacy and generalization of contemporary defense mechanisms against adversarial attacks.
methods: The study explores the hypothesis proposed by Ilyas et al., which posits that DNN image features can be either robust or non-robust, with adversarial attacks targeting the latter. It employs canonical correlation analysis, visualizes the representations, and calculates the mean distance between these representations and various DNN decision boundaries.
results: The study finds a significant difference between $L_2$ and $L_{\infty}$ norm attacks, which could provide insights into the potential dangers posed by $L_{\infty}$ norm attacks, previously underestimated by the research community.

Abstract
Deep neural networks (DNNs) have gained prominence in various applications, such as classification, recognition, and prediction, prompting increased scrutiny of their properties. A fundamental attribute of traditional DNNs is their vulnerability to modifications in input data, which has resulted in the investigation of adversarial attacks. These attacks manipulate the data in order to mislead a DNN. This study aims to challenge the efficacy and generalization of contemporary defense mechanisms against adversarial attacks. Specifically, we explore the hypothesis proposed by Ilyas et al., which posits that DNN image features can be either robust or non-robust, with adversarial attacks targeting the latter. This hypothesis suggests that training a DNN on a dataset consisting solely of robust features should produce a model resistant to adversarial attacks. However, our experiments demonstrate that this is not universally true. To gain further insights into our findings, we analyze the impact of adversarial attack norms on DNN representations, focusing on samples subjected to $L_2$ and $L_{\infty}$ norm attacks. Further, we employ canonical correlation analysis, visualize the representations, and calculate the mean distance between these representations and various DNN decision boundaries. Our results reveal a significant difference between $L_2$ and $L_{\infty}$ norms, which could provide insights into the potential dangers posed by $L_{\infty}$ norm attacks, previously underestimated by the research community.
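
The $L_2$ versus $L_{\infty}$ distinction concerns the shape of the set of allowed perturbations. As a hedged illustration (not the authors' experimental code), the sketch projects a perturbation onto an $L_2$ ball and onto an $L_{\infty}$ ball of the same radius ε, the projection step used by PGD-style attacks. It also shows an elementary geometric fact, not necessarily the paper's explanation: a perturbation at a corner of an n-dimensional $L_{\infty}$ ball has $L_2$ norm ε√n, so the same ε budget allows a much larger $L_2$ change in high dimensions.

```python
import math
from typing import List, Sequence


def l2_norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def project_l2(delta: Sequence[float], eps: float) -> List[float]:
    """Rescale delta onto the L2 ball of radius eps if it lies outside."""
    norm = l2_norm(delta)
    if norm <= eps:
        return list(delta)
    return [d * (eps / norm) for d in delta]


def project_linf(delta: Sequence[float], eps: float) -> List[float]:
    """Clip each coordinate to [-eps, eps] (projection onto the L-infinity ball)."""
    return [max(-eps, min(eps, d)) for d in delta]


delta = [0.3, -0.4]
eps = 0.25
print(project_l2(delta, eps))    # rescaled to L2 norm 0.25
print(project_linf(delta, eps))  # [0.25, -0.25]
# Corner of the L-infinity ball in n = 784 dimensions (e.g. a 28x28 image):
corner = [eps] * 784
print(l2_norm(corner))  # 7.0, i.e. eps * sqrt(784)
```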

Multi-Label Knowledge Distillation

paper_url: http://arxiv.org/abs/2308.06453
repo_url: https://github.com/penghui-yang/l2d
paper_authors: Penghui Yang, Ming-Kun Xie, Chen-Chen Zong, Lei Feng, Gang Niu, Masashi Sugiyama, Sheng-Jun Huang
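for: This paper addresses multi-label learning and proposes a multi-label knowledge distillation method.
methods: The method decomposes the multi-label learning problem into a set of binary classification problems to exploit the semantic knowledge in the teacher's logits, and it leverages the structural information of label-wise embeddings to enhance the distinctiveness of the learned feature representations.
results: Experiments on multiple benchmark datasets show that the method avoids knowledge counteraction among labels and outperforms a range of comparison methods.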

Abstract
Existing knowledge distillation methods typically work by imparting the knowledge of output logits or intermediate feature maps from the teacher network to the student network, which is very successful in multi-class single-label learning. However, these methods can hardly be extended to the multi-label learning scenario, where each instance is associated with multiple semantic labels, because the prediction probabilities do not sum to one and feature maps of the whole example may ignore minor classes in such a scenario. In this paper, we propose a novel multi-label knowledge distillation method. On one hand, it exploits the informative semantic knowledge from the logits by dividing the multi-label learning problem into a set of binary classification problems; on the other hand, it enhances the distinctiveness of the learned feature representations by leveraging the structural information of label-wise embeddings. Experimental results on multiple benchmark datasets validate that the proposed method can avoid knowledge counteraction among labels, thus achieving superior performance against diverse comparing methods. Our code is available at: https://github.com/penghui-yang/L2D
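
The first idea (treating multi-label prediction as a set of binary problems) can be sketched as a logit-distillation loss: each label gets its own sigmoid probability from the teacher and from the student, and the loss averages per-label binary KL divergences. This is a minimal illustration under those assumptions; it omits the paper's label-wise embedding distillation, and the function names are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binary_kl(p: float, q: float, eps: float = 1e-12) -> float:
    """KL(Bernoulli(p) || Bernoulli(q)), clipped to avoid log(0)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def multilabel_logit_kd(teacher_logits: Sequence[float], student_logits: Sequence[float]) -> float:
    """Average per-label binary KL between teacher and student sigmoid outputs."""
    losses = [binary_kl(sigmoid(t), sigmoid(s)) for t, s in zip(teacher_logits, student_logits)]
    return sum(losses) / len(losses)


teacher = [3.0, -2.0, 1.5]  # e.g. labels "person", "dog", "bicycle"
student = [1.0, -0.5, 2.0]
print(multilabel_logit_kd(teacher, student))
# Unlike a softmax, independent sigmoids need not sum to one:
print(sum(sigmoid(t) for t in teacher))
```

Treating each label separately avoids the single softmax over classes, which would force labels that are simultaneously present to compete for probability mass.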

Semantic Equivariant Mixup

paper_url: http://arxiv.org/abs/2308.06451
repo_url: None
paper_authors: Zongbo Han, Tianchi Xie, Bingzhe Wu, Qinghua Hu, Changqing Zhang
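for: Improving model robustness to distribution shifts by encouraging the structure of the input data to be preserved in the representation space.
methods: A generic mixup regularization at the representation level, based on the semantic-equivariance assumption, which lets the model learn from the richer semantic information in mixed samples.
results: Extensive empirical studies and qualitative analyses show that the method improves the model's robustness and generalization.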

Abstract
Mixup is a well-established data augmentation technique, which can extend the training distribution and regularize the neural networks by creating "mixed" samples based on the label-equivariance assumption, i.e., a proportional mixup of the input data results in the corresponding labels being mixed in the same proportion. However, previous mixup variants may fail to exploit the label-independent information in mixed samples during training, which usually contains richer semantic information. To further release the power of mixup, we first improve the previous label-equivariance assumption by the semantic-equivariance assumption, which states that the proportional mixup of the input data should lead to the corresponding representation being mixed in the same proportion. Then a generic mixup regularization at the representation level is proposed, which can further regularize the model with the semantic information in mixed samples. At a high level, the proposed semantic equivariant mixup (sem) encourages the structure of the input data to be preserved in the representation space, i.e., the change of input will result in the obtained representation information changing in the same way. Different from previous mixup variants, which tend to over-focus on the label-related information, the proposed method aims to preserve richer semantic information in the input with semantic-equivariance assumption, thereby improving the robustness of the model against distribution shifts. We conduct extensive empirical studies and qualitative analyses to demonstrate the effectiveness of our proposed method. The code of the manuscript is in the supplement.
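
The semantic-equivariance assumption can be turned into a regularizer that compares the representation of a mixed input with the same mixture of the individual representations. The sketch below assumes a mean-squared-error penalty and toy encoders written as plain Python functions; the paper's exact distance, the layer at which the regularizer is applied, and its weighting against the usual mixup loss are not specified here.

```python
from typing import Callable, List

Vector = List[float]
Encoder = Callable[[Vector], Vector]


def mix(a: Vector, b: Vector, lam: float) -> Vector:
    """Convex combination lam * a + (1 - lam) * b."""
    return [lam * x + (1 - lam) * y for x, y in zip(a, b)]


def semantic_equivariance_loss(f: Encoder, x1: Vector, x2: Vector, lam: float) -> float:
    """MSE between f(mix(x1, x2)) and mix(f(x1), f(x2)) using the same lam."""
    z_mixed = f(mix(x1, x2, lam))
    z_target = mix(f(x1), f(x2), lam)
    return sum((a - b) ** 2 for a, b in zip(z_mixed, z_target)) / len(z_mixed)


def linear_encoder(v: Vector) -> Vector:
    """Linear map: satisfies the equivariance assumption exactly."""
    return [2 * v[0] + v[1], v[0] - v[1]]


def square_encoder(v: Vector) -> Vector:
    """Nonlinear map: violates it, so the penalty is positive."""
    return [a * a for a in v]


print(semantic_equivariance_loss(linear_encoder, [1.0, 2.0], [3.0, -1.0], 0.25))
print(semantic_equivariance_loss(square_encoder, [1.0, 2.0], [3.0, -1.0], 0.25))
```

In standard mixup the coefficient is usually drawn from a Beta distribution per batch; a term like this one would then be added to the ordinary mixup objective with some weight.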

A Sequential Meta-Transfer (SMT) Learning to Combat Complexities of Physics-Informed Neural Networks: Application to Composites Autoclave Processing

paper_url: http://arxiv.org/abs/2308.06447
repo_url: https://github.com/miladramzy/sequentialmetatransferpinns
paper_authors: Milad Ramezankhani, Abbas S. Milani
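for: Solving nonlinear partial differential equations (PDEs) while improving the generalizability of physics-informed neural networks (PINNs).
methods: A sequential meta-transfer (SMT) learning framework that decomposes the time domain into smaller segments and assigns each segment a meta-learner trained for rapid adaptation.
results: In a composites autoclave processing case study, SMT clearly enhances the adaptability of PINNs while reducing computational cost by a factor of 100.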

Abstract
Physics-Informed Neural Networks (PINNs) have gained popularity in solving nonlinear partial differential equations (PDEs) via integrating physical laws into the training of neural networks, making them superior in many scientific and engineering applications. However, conventional PINNs still fall short in accurately approximating the solution of complex systems with strong nonlinearity, especially in long temporal domains. Besides, since PINNs are designed to approximate a specific realization of a given PDE system, they lack the necessary generalizability to efficiently adapt to new system configurations. This entails computationally expensive re-training from scratch for any new change in the system. To address these shortfalls, in this work a novel sequential meta-transfer (SMT) learning framework is proposed, offering a unified solution for both fast training and efficient adaptation of PINNs in highly nonlinear systems with long temporal domains. Specifically, the framework decomposes PDE's time domain into smaller time segments to create "easier" PDE problems for PINNs training. Then for each time interval, a meta-learner is assigned and trained to achieve an optimal initial state for rapid adaptation to a range of related tasks. Transfer learning principles are then leveraged across time intervals to further reduce the computational cost. Through a composites autoclave processing case study, it is shown that SMT is clearly able to enhance the adaptability of PINNs while significantly reducing computational cost, by a factor of 100.
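
The time-domain decomposition and the transfer of parameters across intervals can be sketched as a loop in which each segment's model starts from the previous segment's parameters. This is a structural sketch under stated assumptions: equal-width segments, a hypothetical `train_segment` callback standing in for PINN training on one interval, and no meta-learning step (SMT's per-interval meta-learners are omitted).

```python
from typing import Callable, Dict, List, Tuple

Interval = Tuple[float, float]
Params = Dict[str, float]


def split_time_domain(t_start: float, t_end: float, n_segments: int) -> List[Interval]:
    """Split [t_start, t_end] into n equal, contiguous intervals."""
    if n_segments < 1 or t_end <= t_start:
        raise ValueError("need n_segments >= 1 and t_end > t_start")
    width = (t_end - t_start) / n_segments
    edges = [t_start + i * width for i in range(n_segments)] + [t_end]
    return list(zip(edges[:-1], edges[1:]))


def sequential_train(
    segments: List[Interval],
    init_params: Params,
    train_segment: Callable[[Interval, Params], Params],
) -> List[Params]:
    """Train segment by segment, warm-starting each from the previous result."""
    params = dict(init_params)
    history: List[Params] = []
    for seg in segments:
        params = train_segment(seg, dict(params))  # transfer: reuse previous weights
        history.append(params)
    return history


def fake_train(seg: Interval, p: Params) -> Params:
    """Toy stand-in for PINN training: records how many segments it has seen."""
    return {"segments_seen": p["segments_seen"] + 1, "t_end": seg[1]}


segments = split_time_domain(0.0, 1.0, 4)
print(segments)  # [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]
print(sequential_train(segments, {"segments_seen": 0.0}, fake_train)[-1])
```

Shorter intervals give each network an easier, less nonlinear stretch of the solution to fit, and the warm start means later intervals do not begin from random weights.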

Sensitivity-Aware Mixed-Precision Quantization and Width Optimization of Deep Neural Networks Through Cluster-Based Tree-Structured Parzen Estimation

paper_url: http://arxiv.org/abs/2308.06422
repo_url: None
paper_authors: Seyedarmin Azizi, Mahdi Nazemi, Arash Fayyazi, Massoud Pedram
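for: This paper proposes a search method that automatically selects the best bit-width and layer-width for each neural network layer, improving the efficiency of deep learning models.
methods: The search space is reduced with Hessian-based pruning, which removes non-crucial parameters, and a cluster-based tree-structured Parzen estimator builds surrogate models of favorable and unfavorable outcomes so that architectural options can be explored quickly.
results: Compared with leading compression strategies, the method reduces model size by 20% without compromising accuracy, and it cuts search time by 12x relative to the best search-focused strategies, enabling fast model design and deployment in resource-limited settings.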

Abstract
As the complexity and computational demands of deep learning models rise, the need for effective optimization methods for neural network designs becomes paramount. This work introduces an innovative search mechanism for automatically selecting the best bit-width and layer-width for individual neural network layers. This leads to a marked enhancement in deep neural network efficiency. The search domain is strategically reduced by leveraging Hessian-based pruning, ensuring the removal of non-crucial parameters. Subsequently, we detail the development of surrogate models for favorable and unfavorable outcomes by employing a cluster-based tree-structured Parzen estimator. This strategy allows for a streamlined exploration of architectural possibilities and swift pinpointing of top-performing designs. Through rigorous testing on well-known datasets, our method proves its distinct advantage over existing methods. Compared to leading compression strategies, our approach records an impressive 20% decrease in model size without compromising accuracy. Additionally, our method boasts a 12x reduction in search time relative to the best search-focused strategies currently available. As a result, our proposed method represents a leap forward in neural network design optimization, paving the way for quick model design and implementation in settings with limited resources, thereby propelling the potential of scalable deep learning solutions.
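
Mixed-precision search trades per-layer bit-widths against accuracy, and the size side of that trade-off is simple arithmetic. The sketch assumes symmetric per-tensor uniform quantization and measures size as parameters × bits; it does not implement the paper's Hessian-based pruning or its cluster-based tree-structured Parzen estimator, and the layer sizes and bit choices are made up for illustration.

```python
from typing import List, Tuple


def quantize_uniform(x: List[float], bits: int) -> List[float]:
    """Symmetric per-tensor uniform quantization, returned as dequantized floats."""
    if bits < 2:
        raise ValueError("bits must be >= 2 for a symmetric signed grid")
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(v) for v in x) or 1.0
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in x]


def model_size_bytes(layers: List[Tuple[int, int]]) -> float:
    """layers: (num_params, bit_width) per layer; returns total size in bytes."""
    return sum(n * b for n, b in layers) / 8


uniform_8bit = [(100_000, 8), (400_000, 8), (50_000, 8)]
mixed = [(100_000, 8), (400_000, 4), (50_000, 6)]  # hypothetical per-layer choice
print(model_size_bytes(uniform_8bit), model_size_bytes(mixed))
weights = [0.8, -0.31, 0.05, -0.77]
print(quantize_uniform(weights, 4))
```

A search procedure would evaluate candidate per-layer bit assignments like `mixed` by their size (as above) and by the accuracy loss that quantization errors cause, keeping low bit-widths for layers that are insensitive to them.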

Pedestrian Trajectory Prediction in Pedestrian-Vehicle Mixed Environments: A Systematic Review

paper_url: http://arxiv.org/abs/2308.06419
repo_url: None
paper_authors: Mahsa Golchoubian, Moojan Ghafurian, Kerstin Dautenhahn, Nasser Lashgarian Azad
for: Supporting the development of practical pedestrian trajectory prediction algorithms for autonomous vehicles (AVs) operating in unstructured environments.
methods: The paper systematically reviews different methods proposed in the literature for modelling pedestrian trajectory prediction in the presence of vehicles, and investigates specific considerations for pedestrian-vehicle interaction.
results: The paper provides an overview of datasets containing trajectory data of both pedestrians and vehicles used by the reviewed papers, and discusses research gaps and directions for future work, such as the need for more effective definition of interacting agents in deep learning methods and the need for more datasets of mixed traffic in unstructured environments.

Abstract
Planning an autonomous vehicle's (AV) path in a space shared with pedestrians requires reasoning about pedestrians' future trajectories. A practical pedestrian trajectory prediction algorithm for the use of AVs needs to consider the effect of the vehicle's interactions with the pedestrians on pedestrians' future motion behaviours. In this regard, this paper systematically reviews different methods proposed in the literature for modelling pedestrian trajectory prediction in presence of vehicles that can be applied for unstructured environments. This paper also investigates specific considerations for pedestrian-vehicle interaction (compared with pedestrian-pedestrian interaction) and reviews how different variables such as prediction uncertainties and behavioural differences are accounted for in the previously proposed prediction models. PRISMA guidelines were followed. Articles that did not consider vehicle and pedestrian interactions or actual trajectories, and articles that only focused on road crossing were excluded. A total of 1260 unique peer-reviewed articles from ACM Digital Library, IEEE Xplore, and Scopus databases were identified in the search. 64 articles were included in the final review as they met the inclusion and exclusion criteria. An overview of datasets containing trajectory data of both pedestrians and vehicles used by the reviewed papers has been provided. Research gaps and directions for future work, such as having more effective definition of interacting agents in deep learning methods and the need for gathering more datasets of mixed traffic in unstructured environments are discussed.

Dialogue Possibilities between a Human Supervisor and UAM Air Traffic Management: Route Alteration

paper_url: http://arxiv.org/abs/2308.06411
repo_url: None
paper_authors: Jeongseok Kim, Kangjin Kim
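for: This study proposes a knowledge representation and reasoning approach to detour management in Urban Air Traffic Management (UATM), aiming to quickly identify safe and efficient routes in a carefully sampled environment.
methods: The method is implemented in Answer Set Programming (ASP) and uses non-monotonic reasoning together with a two-phase conversation between a human manager and the UATM system, considering factors such as safety and potential impacts.
results: Several queries from two simulation scenarios validated the robustness and efficacy of the proposed method.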

Abstract
This paper introduces a novel approach to detour management in Urban Air Traffic Management (UATM) using knowledge representation and reasoning. It aims to understand the complexities and requirements of UAM detours, enabling a method that quickly identifies safe and efficient routes in a carefully sampled environment. This method implemented in Answer Set Programming uses non-monotonic reasoning and a two-phase conversation between a human manager and the UATM system, considering factors like safety and potential impacts. The robustness and efficacy of the proposed method were validated through several queries from two simulation scenarios, contributing to the symbiosis of human knowledge and advanced AI techniques. The paper provides an introduction, citing relevant studies, problem formulation, solution, discussions, and concluding comments.

A Brain-Computer Interface Augmented Reality Framework with Auto-Adaptive SSVEP Recognition

paper_url: http://arxiv.org/abs/2308.06401
repo_url: None
paper_authors: Yasmine Mustafa, Mohamed Elmahallawy, Tie Luo, Seif Eldawlatly
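for: This study develops a simple adaptive ensemble classification system that handles inter-subject variability in brain signals and is robust to movement interference in combined brain-computer interface (BCI) and augmented reality (AR) applications.
methods: The study uses Steady-state Visually-evoked Potential (SSVEP) signal patterns and presents a simple BCI-AR framework that supports the development of a wide range of SSVEP-based BCI-AR applications.
results: In an SSVEP-based BCI-AR application with head rotations, the ensemble classification approach proved robust, reaching mean accuracies of 80% on a PC and 77% with the HoloLens AR headset; both surpass previous studies that incorporate individual classifiers and head movements. The visual stimulation time is a relatively short 5 seconds.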

Abstract
Brain-Computer Interface (BCI) initially gained attention for developing applications that aid physically impaired individuals. Recently, the idea of integrating BCI with Augmented Reality (AR) emerged, which uses BCI not only to enhance the quality of life for individuals with disabilities but also to develop mainstream applications for healthy users. One commonly used BCI signal pattern is the Steady-state Visually-evoked Potential (SSVEP), which captures the brain's response to flickering visual stimuli. SSVEP-based BCI-AR applications enable users to express their needs/wants by simply looking at corresponding command options. However, individuals are different in brain signals and thus require per-subject SSVEP recognition. Moreover, muscle movements and eye blinks interfere with brain signals, and thus subjects are required to remain still during BCI experiments, which limits AR engagement. In this paper, we (1) propose a simple adaptive ensemble classification system that handles the inter-subject variability, (2) present a simple BCI-AR framework that supports the development of a wide range of SSVEP-based BCI-AR applications, and (3) evaluate the performance of our ensemble algorithm in an SSVEP-based BCI-AR application with head rotations which has demonstrated robustness to the movement interference. Our testing on multiple subjects achieved a mean accuracy of 80% on a PC and 77% using the HoloLens AR headset, both of which surpass previous studies that incorporate individual classifiers and head movements. In addition, our visual stimulation time is 5 seconds which is relatively short. The statistically significant results show that our ensemble classification approach outperforms individual classifiers in SSVEP-based BCIs.
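
One simple way to make an ensemble adapt to each subject is to weight every base classifier by its accuracy on that subject's calibration trials and combine predictions by weighted voting. The sketch below illustrates that idea only; the weighting rule, the function names, and the toy predictions for three flicker-frequency targets are assumptions, not the paper's published algorithm.

```python
from typing import Dict, List, Sequence


def calibration_weights(classifier_preds: List[List[int]], labels: List[int]) -> List[float]:
    """Accuracy of each base classifier on one subject's calibration trials."""
    return [sum(p == y for p, y in zip(preds, labels)) / len(labels) for preds in classifier_preds]


def weighted_vote(predictions: Sequence[int], weights: Sequence[float]) -> int:
    """Return the target whose supporting classifiers have the largest total weight."""
    scores: Dict[int, float] = {}
    for p, w in zip(predictions, weights):
        scores[p] = scores.get(p, 0.0) + w
    return max(scores, key=scores.get)  # ties go to the first target seen


# Calibration: three classifiers, four trials, targets 0..2 (one per flicker frequency).
calib_labels = [0, 1, 2, 0]
calib_preds = [[0, 1, 2, 0], [0, 0, 0, 0], [0, 1, 2, 1]]
w = calibration_weights(calib_preds, calib_labels)
print(w)  # [1.0, 0.5, 0.75]
print(weighted_vote([1, 2, 2], w))  # 2: classifiers 2 and 3 together outweigh classifier 1
```

Recomputing the weights for each new user is what makes the ensemble subject-specific without retraining the base classifiers.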

ZYN: Zero-Shot Reward Models with Yes-No Questions

paper_url: http://arxiv.org/abs/2308.06385
repo_url: https://github.com/vicgalle/zero-shot-reward-models
paper_authors: Victor Gallego
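for: This paper addresses steering a language model's text generation toward a desired behavior, aligning the generated text with the preferences of a human operator.
methods: Another language model serves as a critic and reward model in a zero-shot way: it is prompted with a Yes-No question that represents the user's preferences, so no additional labeled data is required.
results: Experiments in several text-generation domains, including detoxification, optimizing the sentiment of movie reviews, steering the model's opinion about a particular topic, and personalizing prompt generators for text-to-image tasks, demonstrate the capabilities of the proposed ZYN framework.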

Abstract
In this work, we address the problem of directing the text generations of an LLM towards a desired behavior, aligning the generated text with the preferences of the human operator. We propose using another language model as a critic (reward model) in a zero-shot way, prompting it with a Yes-No question that represents the user preferences, without requiring further labeled data. This zero-shot reward model provides the learning signal to further fine-tune the base LLM using reinforcement learning, as in RLAIF; yet our approach is also compatible with other contexts such as quality-diversity search. Extensive evidence of the capabilities of the proposed ZYN framework is provided through experiments in different domains related to text generation, including detoxification; optimizing sentiment of movie reviews, or any other attribute; steering the opinion about a particular topic the model may have; and personalizing prompt generators for text-to-image tasks. Code to be released at https://github.com/vicgalle/zero-shot-reward-models/.
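
A zero-shot Yes/No reward can be read off the critic's next-token scores for the two answer words. The sketch assumes the critic exposes logits for "Yes" and "No" and defines the reward as the probability of "Yes" under a softmax restricted to those two tokens, which equals the sigmoid of the logit difference. The prompt template is hypothetical, the logits are placeholders for a real model call, and the paper's exact reward formulation may differ.

```python
import math


def build_critic_prompt(text: str, question: str) -> str:
    """Hypothetical template pairing the generated text with the preference question."""
    return f"Text: {text}\nQuestion: {question}\nAnswer (Yes or No):"


def yes_no_reward(logit_yes: float, logit_no: float) -> float:
    """P(Yes) under a softmax over the two answer tokens = sigmoid(logit_yes - logit_no)."""
    d = logit_yes - logit_no
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    z = math.exp(d)
    return z / (1.0 + z)


prompt = build_critic_prompt("What a wonderful film!", "Is this review positive?")
# In practice the two logits would come from running the critic LM on `prompt`.
print(yes_no_reward(2.3, -0.4))
```

Because the reward lies in [0, 1] and needs no labeled data, it can be plugged directly into an RL fine-tuning loop or used to rank candidates in a search.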

DCNFIS: Deep Convolutional Neuro-Fuzzy Inference System

paper_url: http://arxiv.org/abs/2308.06378
repo_url: None
paper_authors: Mojtaba Yeganejou, Kimia Honari, Ryan Kluzinski, Scott Dick, Michael Lipsett, James Miller
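for: This study proposes a new deep network that improves transparency without sacrificing accuracy.
methods: The study designs a deep convolutional neuro-fuzzy inference system (DCNFIS) that hybridizes fuzzy logic and deep learning models.
results: DCNFIS performs as accurately as three existing convolutional neural networks on four well-known datasets and outperforms state-of-the-art deep fuzzy systems. The study also derives explanations, in the form of saliency maps, from the fuzzy rules encoded in DCNFIS and investigates their properties.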

Abstract
A key challenge in eXplainable Artificial Intelligence is the well-known tradeoff between the transparency of an algorithm (i.e., how easily a human can directly understand the algorithm, as opposed to receiving a post-hoc explanation), and its accuracy. We report on the design of a new deep network that achieves improved transparency without sacrificing accuracy. We design a deep convolutional neuro-fuzzy inference system (DCNFIS) by hybridizing fuzzy logic and deep learning models and show that DCNFIS performs as accurately as three existing convolutional neural networks on four well-known datasets. We furthermore show that DCNFIS outperforms state-of-the-art deep fuzzy systems. We then exploit the transparency of fuzzy logic by deriving explanations, in the form of saliency maps, from the fuzzy rules encoded in DCNFIS. We investigate the properties of these explanations in greater depth using the Fashion-MNIST dataset.
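
The transparency argument rests on fuzzy rules whose firing strengths can be inspected directly. As a generic illustration (not DCNFIS's architecture, where such rules operate on convolutional features), the sketch uses Gaussian membership functions, a product t-norm for rule antecedents, and normalized firing strengths summed per class; the rule parameters and the 2-feature input are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple

Rule = Tuple[List[float], List[float], str]  # (centers, widths, class label)


def gaussian_membership(x: float, center: float, sigma: float) -> float:
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))


def rule_firing(features: Sequence[float], centers: Sequence[float], sigmas: Sequence[float]) -> float:
    """Product t-norm: the rule fires only as strongly as all its antecedents jointly hold."""
    strength = 1.0
    for x, c, s in zip(features, centers, sigmas):
        strength *= gaussian_membership(x, c, s)
    return strength


def fuzzy_classify(features: Sequence[float], rules: List[Rule]) -> Tuple[str, List[float]]:
    """Return the winning class and the normalized firing strength of every rule."""
    strengths = [rule_firing(features, c, s) for c, s, _ in rules]
    total = sum(strengths) or 1.0
    normalized = [w / total for w in strengths]
    scores: Dict[str, float] = {}
    for w, (_, _, label) in zip(normalized, rules):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get), normalized


rules: List[Rule] = [
    ([0.0, 0.0], [1.0, 1.0], "a"),  # IF f1 is near 0 AND f2 is near 0 THEN class a
    ([3.0, 3.0], [1.0, 1.0], "b"),  # IF f1 is near 3 AND f2 is near 3 THEN class b
]
label, strengths = fuzzy_classify([0.2, 0.1], rules)
print(label, strengths)  # the strengths show which rule drove the decision
```

Every quantity in the decision is a named rule with a readable antecedent, which is the kind of structure from which explanations such as saliency maps can be derived.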

Large Language Models and Knowledge Graphs: Opportunities and Challenges

paper_url: http://arxiv.org/abs/2308.06374
repo_url: https://github.com/jettbrains/-L-
paper_authors: Jeff Z. Pan, Simon Razniewski, Jan-Christoph Kalo, Sneha Singhania, Jiaoyan Chen, Stefan Dietze, Hajira Jabeen, Janna Omeliyanenko, Wen Zhang, Matteo Lissandrini, Russa Biswas, Gerard de Melo, Angela Bonifati, Edlira Vakaj, Mauro Dragoni, Damien Graux
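for: This position paper discusses how large language models (LLMs) are reshaping knowledge representation, and what this means for knowledge graphs (explicit knowledge) and parametric knowledge.
methods: Reviews existing knowledge representation approaches, such as knowledge graphs and the parametric knowledge held by LLMs, along with emerging research directions.
results: Summarizes common debate points and viewpoints in the community on LLMs and knowledge graphs, and outlines possible research directions and challenges.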

Abstract
Large Language Models (LLMs) have taken Knowledge Representation -- and the world -- by storm. This inflection point marks a shift from explicit knowledge representation to a renewed focus on the hybrid representation of both explicit knowledge and parametric knowledge. In this position paper, we will discuss some of the common debate points within the community on LLMs (parametric knowledge) and Knowledge Graphs (explicit knowledge) and speculate on opportunities and visions that the renewed focus brings, as well as related research topics and challenges.

Wireless Federated $k$-Means Clustering with Non-coherent Over-the-Air Computation

paper_url: http://arxiv.org/abs/2308.06371
repo_url: None
paper_authors: Alphan Sahin
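for: Reduce the per-round communication latency of the federated k-means algorithm when it is implemented over a wireless network.
methods: Uses an over-the-air computation (OAC) scheme whose encoder exploits the representation of numbers in a balanced number system; the sum of the clients' updates is computed non-coherently via the signal-superposition property of wireless multiple-access channels, eliminating the need for precise phase and time synchronization. A reinitialization method for ineffectively used centroids is also proposed for heterogeneous data distributions.
results: In a customer-location clustering scenario, the proposed method performs similarly to standard k-means clustering while reducing communication latency.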

Abstract
In this study, we propose using an over-the-air computation (OAC) scheme for the federated k-means clustering algorithm to reduce the per-round communication latency when it is implemented over a wireless network. The OAC scheme relies on an encoder exploiting the representation of a number in a balanced number system and computes the sum of the updates for the federated k-means via the signal-superposition property of wireless multiple-access channels non-coherently, eliminating the need for precise phase and time synchronization. Also, a reinitialization method for ineffectively used centroids is proposed to improve the performance of the proposed method for heterogeneous data distributions. For a customer-location clustering scenario, we demonstrate the performance of the proposed algorithm and compare it with standard k-means clustering. Our results show that the proposed approach performs similarly to standard k-means while reducing communication latency.
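
The key property behind summing balanced-number-system encodings over the air is that decoding is linear: if every client's digits at position d add up in the channel, decoding the digit sums yields the sum of the clients' values. The sketch below illustrates that property for one federated k-means round with integer coordinates. It idealises the channel as exact digit-wise addition and does not model the paper's non-coherent transmission, noise, or its centroid reinitialization; all function names are hypothetical.

```python
def to_balanced_ternary(n: int, num_digits: int) -> list[int]:
    """Least-significant-first balanced-ternary digits, each in {-1, 0, 1}."""
    digits = []
    for _ in range(num_digits):
        r = n % 3                      # Python's % is non-negative for a positive modulus
        if r == 2:
            r = -1                     # write 2 as -1 and carry one into the next position
        digits.append(r)
        n = (n - r) // 3
    if n != 0:
        raise ValueError("num_digits is too small for this value")
    return digits


def decode(digit_sums: list[int]) -> int:
    """Decoding is linear, so it also recovers the sum of several encoded values."""
    return sum(s * 3 ** d for d, s in enumerate(digit_sums))


def ota_aggregate(client_vectors: list[list[int]], num_digits: int = 8) -> list[int]:
    """Idealised over-the-air sum: for every entry, digit d of all clients superposes (adds up)."""
    encoded = [[to_balanced_ternary(v, num_digits) for v in vec] for vec in client_vectors]
    totals = []
    for e in range(len(client_vectors[0])):
        digit_sums = [sum(col) for col in zip(*(enc[e] for enc in encoded))]
        totals.append(decode(digit_sums))
    return totals


def local_stats(points, centroids):
    """Per-cluster coordinate sums and counts computed on one client."""
    k, dim = len(centroids), len(centroids[0])
    sums, counts = [[0] * dim for _ in range(k)], [0] * k
    for p in points:
        j = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        counts[j] += 1
        for i, v in enumerate(p):
            sums[j][i] += v
    return sums, counts


def federated_round(clients, centroids, num_digits=8):
    """One k-means round where the server only ever sees the aggregated sums and counts."""
    k, dim = len(centroids), len(centroids[0])
    vectors = []
    for pts in clients:
        sums, counts = local_stats(pts, centroids)
        vectors.append([v for row in sums for v in row] + counts)
    total = ota_aggregate(vectors, num_digits)
    sums, counts = total[: k * dim], total[k * dim:]
    new = []
    for j in range(k):
        if counts[j] == 0:
            new.append(tuple(centroids[j]))  # unused centroid kept as-is (no reinitialization here)
        else:
            new.append(tuple(sums[j * dim + i] / counts[j] for i in range(dim)))
    return new


print(federated_round([[(0, 0), (1, 1)], [(10, 10), (9, 9)]], [(0, 0), (10, 10)]))
```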

Topic-Level Bayesian Surprise and Serendipity for Recommender Systems

paper_url: http://arxiv.org/abs/2308.06368
repo_url: https://github.com/ton-moy/surprise-and-serendipity
paper_authors: Tonmoy Hasan, Razvan Bunescu
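for: Mitigate the filter bubble in recommender systems by recommending items with high potential for serendipity, so that users get to experience items from novel, unseen categories.
methods: Uses a content-based formulation of serendipity rooted in Bayesian surprise to measure how surprising items are, combined with a collaborative-filtering component that identifies similar users.
results: Experiments show that models using Bayesian surprise correlate much better with manual annotations of time-dependent, topic-level surprise than distance-based heuristics, and also achieve better serendipitous item recommendation performance.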

Abstract
A recommender system that optimizes its recommendations solely to fit a user's history of ratings for consumed items can create a filter bubble, wherein the user does not get to experience items from novel, unseen categories. One approach to mitigate this undesired behavior is to recommend items with high potential for serendipity, namely surprising items that are likely to be highly rated. In this paper, we propose a content-based formulation of serendipity that is rooted in Bayesian surprise and use it to measure the serendipity of items after they are consumed and rated by the user. When coupled with a collaborative-filtering component that identifies similar users, this enables recommending items with high potential for serendipity. To facilitate the evaluation of topic-level models for surprise and serendipity, we introduce a dataset of book reading histories extracted from Goodreads, containing over 26 thousand users and close to 1.3 million books, where we manually annotate 449 books read by 4 users in terms of their time-dependent, topic-level surprise. Experimental evaluations show that models that use Bayesian surprise correlate much better with the manual annotations of topic-level surprise than distance-based heuristics, and also obtain better serendipitous item recommendation performance.
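
Bayesian surprise is usually defined as the KL divergence between the posterior and the prior belief after an observation. A minimal sketch, assuming (for illustration only) that a user's topic preferences are modelled by a Dirichlet distribution that is updated with the topic counts of each newly read book; the topic model, the counts, and the function names are assumptions rather than the paper's exact formulation, and the collaborative-filtering component is omitted.

```python
import math


def digamma(x: float) -> float:
    """Digamma for x > 0 via the recurrence psi(x) = psi(x + 1) - 1/x and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def kl_dirichlet(a, b):
    """KL(Dir(a) || Dir(b)) in closed form."""
    a0, b0 = sum(a), sum(b)
    kl = math.lgamma(a0) - sum(math.lgamma(v) for v in a)
    kl -= math.lgamma(b0) - sum(math.lgamma(v) for v in b)
    kl += sum((ai - bi) * (digamma(ai) - digamma(a0)) for ai, bi in zip(a, b))
    return kl


def bayesian_surprise(prior_alpha, topic_counts):
    """Surprise of a newly read item: KL(posterior || prior) after adding its topic counts."""
    posterior = [a + c for a, c in zip(prior_alpha, topic_counts)]
    return kl_dirichlet(posterior, prior_alpha)


# Hypothetical user who mostly reads topic 0; compare a familiar-topic book with a novel-topic one.
prior = [10.0, 1.0, 1.0]
print(bayesian_surprise(prior, [1, 0, 0]), bayesian_surprise(prior, [0, 0, 1]))
```

A book concentrated on a topic the user has rarely read moves the posterior further from the prior, and so scores as more surprising than one from a familiar topic.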

Causally Linking Health Application Data and Personal Information Management Tools

paper_url: http://arxiv.org/abs/2308.08556
repo_url: None
paper_authors: Saturnino Luz, Masood Masoodian
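for: Develop a framework that integrates diverse data sources with analysis and visualization tools to help users understand causal connections between health variables.
methods: Combines data mining, time-series analysis, and visualization techniques with inference methods and graphical user interfaces, integrating health application data with contextual information from personal information management tools such as shared calendar and email systems.
results: By presenting time-series data in a way that highlights causal connections among health variables, the framework aims to help users better understand these relationships and manage their health.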

Abstract
The proliferation of consumer health devices such as smart watches, sleep monitors, and smart scales in many countries has not only led to growing interest in health monitoring, but also to the development of countless "smart" applications that support the exploration of such data by members of the general public, sometimes with integration into professional health services. While such devices make a variety of health data streams available to users, these streams are often presented as separate time-series visualizations, in which the potential relationships between health variables are not explicitly made visible. Furthermore, although other aspects of life, such as work and social connectivity, have become increasingly digitised, health and well-being applications make little use of the potentially useful contextual information provided by widely used personal information management tools, such as shared calendar and email systems. This paper presents a framework that integrates these diverse data sources, analytic and visualization tools, inference methods, and graphical user interfaces to help users by highlighting causal connections among such time series.
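
The abstract does not specify which inference methods the framework uses to highlight causal connections among time series. As one common stand-in, a lag-1 Granger causality test compares how well y is predicted from its own past with and without the past of x. The series below are synthetic and hypothetical (a calendar-derived meeting load and a sleep measure), and Granger causality indicates predictive, not necessarily causal, influence.

```python
import numpy as np


def granger_f_stat(x, y, lag=1):
    """F-statistic for the hypothesis 'past x helps predict y' (x Granger-causes y).

    Restricted model:   y_t ~ 1 + y_{t-1}, ..., y_{t-lag}
    Unrestricted model: y_t ~ 1 + y_{t-1}, ..., y_{t-lag} + x_{t-1}, ..., x_{t-lag}
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(y)
    target = y[lag:]
    ones = np.ones(n - lag)
    y_lags = np.column_stack([y[lag - k : n - k] for k in range(1, lag + 1)])
    x_lags = np.column_stack([x[lag - k : n - k] for k in range(1, lag + 1)])
    restricted = np.column_stack([ones, y_lags])
    unrestricted = np.column_stack([ones, y_lags, x_lags])

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    df_den = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / lag) / (rss_u / df_den)


# Synthetic, hypothetical daily series: meeting load drives next-day sleep loss.
rng = np.random.default_rng(0)
meetings = rng.normal(size=300)
sleep_loss = np.zeros(300)
sleep_loss[1:] = 0.8 * meetings[:-1] + 0.3 * rng.normal(size=299)
print(granger_f_stat(meetings, sleep_loss), granger_f_stat(sleep_loss, meetings))
```

The statistic would be compared against the critical value of an F distribution with `lag` and `df_den` degrees of freedom; here the forward direction is large and the reverse direction small, which is the asymmetry such a framework could surface to the user.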

Large Language Models to Identify Social Determinants of Health in Electronic Health Records

paper_url: http://arxiv.org/abs/2308.06354
repo_url: https://github.com/aim-harvard/sdoh
paper_authors: Marco Guevara, Shan Chen, Spencer Thomas, Tafadzwa L. Chaunzwa, Idalid Franco, Benjamin Kann, Shalini Moningi, Jack Qian, Madeleine Goldstein, Susan Harper, Hugo JWL Aerts, Guergana K. Savova, Raymond H. Mak, Danielle S. Bitterman
for: The paper aims to extract social determinants of health (SDoH) from electronic health records (EHRs) to improve patient outcomes.
methods: The study uses large language models to extract SDoH from free text in EHRs, and experiments with synthetic data generation to improve the extraction of scarce SDoH data.
results: The best-performing models were fine-tuned Flan-T5 XL and Flan-T5 XXL, which outperformed the zero- and few-shot performance of ChatGPT-family models and showed less algorithmic bias. The models identified 93.8% of patients with adverse SDoH, while ICD-10 codes captured only 2.0%.

Abstract
Social determinants of health (SDoH) have an important impact on patient outcomes but are incompletely collected from electronic health records (EHRs). This study investigated the ability of large language models to extract SDoH from free text in EHRs, where they are most commonly documented, and explored the role of synthetic clinical text in improving the extraction of these scarcely documented, yet extremely valuable, clinical data. A total of 800 patient notes were annotated for SDoH categories, and several transformer-based models were evaluated. The study also experimented with synthetic data generation and assessed algorithmic bias. Our best-performing models were fine-tuned Flan-T5 XL (macro-F1 0.71) for any SDoH and Flan-T5 XXL (macro-F1 0.70) for adverse SDoH. The benefit of augmenting fine-tuning with synthetic data varied across model architecture and size, with smaller Flan-T5 models (base and large) showing the greatest improvements in performance (delta F1 +0.12 to +0.23). Model performance was similar on the in-hospital system dataset but worse on the MIMIC-III dataset. Our best-performing fine-tuned models outperformed the zero- and few-shot performance of ChatGPT-family models on both tasks. These fine-tuned models were less likely than ChatGPT to change their predictions when race/ethnicity and gender descriptors were added to the text, suggesting less algorithmic bias (p<0.05). At the patient level, our models identified 93.8% of patients with adverse SDoH, while ICD-10 codes captured 2.0%. Our method can effectively extract SDoH information from clinic notes, performing better than GPT models in zero- and few-shot settings. These models could enhance real-world evidence on SDoH and aid in identifying patients needing social support.
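
The results above are reported as macro-F1, and bias is probed by checking whether predictions change when demographic descriptors are added to a note. A small sketch of both measurements; the SDoH label names and predictions are hypothetical, and scoring a class with no true or predicted instances as F1 = 0 is a convention chosen here (libraries differ on it).

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, using F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def flip_rate(preds_original, preds_with_demographics):
    """Share of notes whose prediction changes once race/ethnicity or gender descriptors are added."""
    changed = sum(1 for a, b in zip(preds_original, preds_with_demographics) if a != b)
    return changed / len(preds_original)


# Hypothetical note-level labels and predictions.
labels = ["housing", "employment", "none"]
y_true = ["housing", "employment", "none", "none"]
y_pred = ["housing", "none", "none", "none"]
print(macro_f1(y_true, y_pred, labels), flip_rate(y_pred, ["housing", "none", "employment", "none"]))
```

Macro averaging weights every SDoH category equally, so rare categories count as much as common ones, which matters for the scarcely documented classes the study targets.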

Combining feature aggregation and geometric similarity for re-identification of patterned animals

paper_url: http://arxiv.org/abs/2308.06335
repo_url: None
paper_authors: Veikka Immonen, Ekaterina Nepovinnykh, Tuomas Eerola, Charles V. Stewart, Heikki Kälviäinen
for: Studying animal populations through image-based re-identification of individual animals.
methods: Combines two types of pattern similarity metrics: pattern appearance similarity, obtained by pattern feature aggregation, and geometric pattern similarity, obtained by analyzing the geometric consistency of pattern similarities.
results: The proposed combination of pattern similarity metrics achieves promising re-identification accuracies for Saimaa ringed seals and whale sharks.

Abstract
Image-based re-identification of animal individuals allows gathering of information such as migration patterns of the animals over time. This, together with large image volumes collected using camera traps and crowdsourcing, opens novel possibilities to study animal populations. For many species, the re-identification can be done by analyzing the permanent fur, feather, or skin patterns that are unique to each individual. In this paper, we address the re-identification by combining two types of pattern similarity metrics: 1) pattern appearance similarity obtained by pattern feature aggregation and 2) geometric pattern similarity obtained by analyzing the geometric consistency of pattern similarities. The proposed combination makes efficient use of both local and global pattern features, providing a general re-identification approach that can be applied to a wide variety of different pattern types. In the experimental part of the work, we demonstrate that the method achieves promising re-identification accuracies for Saimaa ringed seals and whale sharks.
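
The combination of an appearance cue and a geometric cue can be sketched as follows. This is an assumption-laden illustration rather than the paper's pipeline: mean-pooled local descriptors with cosine similarity stand in for the learned pattern feature aggregation, a RANSAC-style affine inlier ratio stands in for the geometric consistency analysis, the weight alpha is hypothetical, and keypoint matches between the two images are assumed to be given.

```python
import numpy as np


def appearance_similarity(desc_a, desc_b):
    """Cosine similarity of mean-pooled local descriptors (stand-in for learned aggregation)."""
    ga, gb = desc_a.mean(axis=0), desc_b.mean(axis=0)
    return float(ga @ gb / (np.linalg.norm(ga) * np.linalg.norm(gb)))


def fit_affine(src, dst):
    """Least-squares 2-D affine map src -> dst, returned as a (3, 2) matrix."""
    design = np.hstack([src, np.ones((len(src), 1))])
    coef, *_ = np.linalg.lstsq(design, dst, rcond=None)
    return coef


def geometric_similarity(src, dst, iters=200, tol=2.0, seed=0):
    """RANSAC-style score: largest share of matches consistent with a single affine map."""
    rng = np.random.default_rng(seed)
    design = np.hstack([src, np.ones((len(src), 1))])
    best = 0
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)   # minimal sample for an affine map
        err = np.linalg.norm(design @ fit_affine(src[idx], dst[idx]) - dst, axis=1)
        best = max(best, int((err < tol).sum()))
    return best / len(src)


def combined_similarity(desc_a, desc_b, src, dst, alpha=0.5):
    """Weighted sum of the appearance and geometric cues; alpha is a hypothetical weight."""
    return alpha * appearance_similarity(desc_a, desc_b) + (1 - alpha) * geometric_similarity(src, dst)


# Synthetic matched keypoints: 25 follow one affine map, 5 are wrong matches.
rng = np.random.default_rng(1)
src = rng.uniform(0, 100, size=(30, 2))
dst = src @ np.array([[0.9, -0.2], [0.2, 0.9]]).T + np.array([5.0, -3.0])
dst[:5] += rng.uniform(30, 60, size=(5, 2))
desc = rng.normal(size=(30, 64))
print(geometric_similarity(src, dst), combined_similarity(desc, desc, src, dst))
```

The appearance term summarises the pattern globally, while the inlier ratio rewards local pattern elements that sit in a consistent spatial arrangement, so a wrong individual with similar overall texture is still penalised.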

Foundation Model is Efficient Multimodal Multitask Model Selector

paper_url: http://arxiv.org/abs/2308.06262
repo_url: https://github.com/opengvlab/multitask-model-selector
paper_authors: Fanqing Meng, Wenqi Shao, Zhanglin Peng, Chonghe Jiang, Kaipeng Zhang, Yu Qiao, Ping Luo
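for: Studies an under-explored problem: given a collection of pre-trained neural networks, predict their performance on each multi-modal task without fine-tuning them.
methods: Proposes an efficient multi-task model selector (EMMS) that uses large-scale foundation models to transform the diverse label formats of different downstream tasks into a unified noisy label embedding. EMMS estimates a model's transferability through a simple weighted linear regression, which is efficiently solved by an alternating minimization algorithm.
results: Extensive experiments show that EMMS is a fast, effective, and generic model selector for assessing the transferability of pre-trained models. For example, compared with the state-of-the-art method LogME enhanced by the proposed label embeddings, EMMS achieves 9.0%, 26.3%, 20.1%, 54.8%, and 12.2% performance gains on image recognition, referring, captioning, visual question answering, and text question answering, with 5.13x, 6.29x, 3.59x, 6.19x, and 5.66x speedups, respectively. Code is available at https://github.com/OpenGVLab/Multitask-Model-Selector.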

Abstract
This paper investigates an under-explored but important problem: given a collection of pre-trained neural networks, predicting their performance on each multi-modal task, such as image recognition, referring, captioning, visual question answering, and text question answering, without fine-tuning them. A brute-force approach is to fine-tune all models on all target datasets, which incurs high computational costs. Although recent advanced approaches employ lightweight metrics to measure models' transferability, they often depend heavily on prior knowledge of a single task, making them inapplicable in a multi-modal multi-task scenario. To tackle this issue, we propose an efficient multi-task model selector (EMMS), which employs large-scale foundation models to transform diverse label formats such as categories, texts, and bounding boxes of different downstream tasks into a unified noisy label embedding. EMMS can estimate a model's transferability through a simple weighted linear regression, which can be efficiently solved by an alternating minimization algorithm with a convergence guarantee. Extensive experiments on 5 downstream tasks with 24 datasets show that EMMS is fast, effective, and generic enough to assess the transferability of pre-trained models, making it the first model selection method in the multi-task scenario. For instance, compared with the state-of-the-art method LogME enhanced by our label embeddings, EMMS achieves 9.0%, 26.3%, 20.1%, 54.8%, and 12.2% performance gains on image recognition, referring, captioning, visual question answering, and text question answering, while bringing 5.13x, 6.29x, 3.59x, 6.19x, and 5.66x speedups in wall-clock time, respectively. The code is available at https://github.com/OpenGVLab/Multitask-Model-Selector.
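
One reading of the weighted linear regression described above: regress a simplex-weighted mixture of label embeddings (one per foundation model) on the candidate model's features, alternating between a least-squares step for the regression weights and a projected-gradient step for the mixture weights, and use the negative residual as the transferability score. This is a sketch under those assumptions, not the official EMMS code in the repository above; shapes, iteration counts, and the scoring rule are illustrative.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {t : t >= 0, sum(t) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def transferability_score(features, label_embeddings, outer=50, inner=20):
    """Alternating minimisation of ||X W - sum_k t_k Z_k||^2 over W and simplex weights t.

    features: (n, d) from the candidate model; label_embeddings: K arrays of shape (n, D).
    Returns (score, t); a higher (less negative) score suggests better transferability.
    """
    X = np.asarray(features, dtype=float)
    Z = np.stack(label_embeddings).astype(float)        # (K, n, D)
    K = Z.shape[0]
    A = Z.reshape(K, -1).T                              # (n*D, K): one flattened embedding per column
    step = 1.0 / np.linalg.norm(A, 2) ** 2              # 1 / Lipschitz constant of the t-gradient
    t = np.full(K, 1.0 / K)
    for _ in range(outer):
        target = np.tensordot(t, Z, axes=1)             # (n, D) mixed label embedding
        W, *_ = np.linalg.lstsq(X, target, rcond=None)  # fix t: least squares for W
        b = (X @ W).ravel()
        for _ in range(inner):                          # fix W: projected gradient on t
            t = project_to_simplex(t - step * (A.T @ (A @ t - b)))
    target = np.tensordot(t, Z, axes=1)
    W, *_ = np.linalg.lstsq(X, target, rcond=None)
    return -float(np.mean((X @ W - target) ** 2)), t


rng = np.random.default_rng(0)
n, d, D = 200, 16, 8
good = rng.normal(size=(n, d))
bad = rng.normal(size=(n, d))
Z1 = good @ rng.normal(size=(d, D)) + 0.1 * rng.normal(size=(n, D))  # predictable from `good`
Z2 = rng.normal(size=(n, D))                                        # unrelated embedding
print(transferability_score(good, [Z1, Z2])[0], transferability_score(bad, [Z1, Z2])[0])
```

Both sub-steps never increase the objective, which is what makes the alternating scheme cheap compared with fine-tuning every candidate model.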

Enhancing Network Management Using Code Generated by Large Language Models

paper_url: http://arxiv.org/abs/2308.06261
repo_url: https://github.com/chrisneagu/FTC-Skystone-Dark-Angels-Romania-2020
paper_authors: Sathiya Kumaran Mani, Yajie Zhou, Kevin Hsieh, Santiago Segarra, Ranveer Chandra, Srikanth Kandula
for: This paper aims to provide a novel approach for natural-language-based network management, leveraging large language models (LLMs) to generate task-specific code from natural language queries.
methods: The proposed approach utilizes LLMs to generate code, addressing the challenges of explainability, scalability, and privacy by allowing network operators to inspect the generated code and eliminating the need to share network data with LLMs.
results: The prototype system designed and evaluated in the paper demonstrates high accuracy, cost-effectiveness, and potential for further enhancements using complementary program synthesis techniques.

Abstract
Analyzing network topologies and communication graphs plays a crucial role in contemporary network management. However, the absence of a cohesive approach leads to a challenging learning curve, heightened errors, and inefficiencies. In this paper, we introduce a novel approach to facilitate a natural-language-based network management experience, utilizing large language models (LLMs) to generate task-specific code from natural language queries. This method tackles the challenges of explainability, scalability, and privacy by allowing network operators to inspect the generated code, eliminating the need to share network data with LLMs, and concentrating on application-specific requests combined with general program synthesis techniques. We design and evaluate a prototype system using benchmark applications, showcasing high accuracy, cost-effectiveness, and the potential for further enhancements using complementary program synthesis techniques.
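
The privacy and explainability argument rests on a simple division of labour: the LLM receives only a description of the data schema and the operator's question, returns code, and that code is inspected and executed locally against the real network data. A minimal sketch of that flow; `fake_llm_generate` is a hypothetical placeholder that returns a canned program instead of calling any real LLM API, and the adjacency-dict graph format is an assumption.

```python
def fake_llm_generate(schema: str, question: str) -> str:
    """Hypothetical placeholder for an LLM call: only the schema and question would be sent.

    A real system would prompt a model here; this canned program is for illustration only.
    """
    return (
        "def answer(graph):\n"
        "    degree = {node: len(nbrs) for node, nbrs in graph.items()}\n"
        "    return max(degree, key=degree.get)\n"
    )


def run_generated_code(code: str, graph: dict):
    """Run operator-inspected code locally, so the network data never leaves the machine.

    Restricting __builtins__ narrows what the code can call but is NOT a security sandbox.
    """
    namespace = {"__builtins__": {"len": len, "max": max}}
    exec(code, namespace)
    return namespace["answer"](graph)


# Hypothetical topology as an adjacency dict; the LLM never sees these contents.
graph = {
    "core1": ["edge1", "edge2", "edge3"],
    "edge1": ["core1"],
    "edge2": ["core1", "edge3"],
    "edge3": ["core1", "edge2"],
}
code = fake_llm_generate("graph: dict[node_id, list[neighbor_id]]", "Which node has the most links?")
print(code)                               # the operator can review the program before running it
print(run_generated_code(code, graph))    # core1
```

Because only the schema crosses the boundary, the same generated program can be reused on graphs of any size, which is also where the scalability benefit comes from.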

ChatGPT-based Investment Portfolio Selection

paper_url: http://arxiv.org/abs/2308.06260
repo_url: None
paper_authors: Oleksandr Romanko, Akhilesh Narayan, Roy H. Kwon
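for: Investment portfolio selection using generative AI models.
methods: Uses a generative AI model (ChatGPT) to obtain a universe of potentially attractive stocks from the S&P500 index, then compares portfolio optimization strategies built on this AI-generated universe against quantitative portfolio optimization models and popular investment funds.
results: ChatGPT is effective at stock selection but may be less effective at assigning optimal weights to stocks within the portfolio. Combining ChatGPT's stock selection with established quantitative optimization models yields even better results, suggesting a hybrid approach for future investment decision-making.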

Abstract
In this paper, we explore potential uses of generative AI models, such as ChatGPT, for investment portfolio selection. Trusting investment advice from Generative Pre-Trained Transformer (GPT) models is a challenge due to model "hallucinations", which necessitate careful verification and validation of the output. Therefore, we take an alternative approach. We use ChatGPT to obtain a universe of stocks from the S&P500 market index that are potentially attractive for investing. Subsequently, we compare various portfolio optimization strategies that use this AI-generated trading universe, evaluating them against quantitative portfolio optimization models as well as against some popular investment funds. Our findings indicate that ChatGPT is effective in stock selection but may not perform as well in assigning optimal weights to stocks within the portfolio. However, when ChatGPT's stock selection is combined with established portfolio optimization models, we achieve even better results. By blending the strengths of AI-generated stock selection with advanced quantitative optimization techniques, we observed the potential for more robust and favorable investment outcomes, suggesting a hybrid approach for more effective and reliable investment decision-making in the future.
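
Once a stock universe has been chosen, by ChatGPT or otherwise, a quantitative model assigns the weights. As one classic example, not necessarily one of the strategies evaluated in the paper, the fully invested minimum-variance portfolio has the closed form w = Σ⁻¹1 / (1ᵀΣ⁻¹1), where Σ is the covariance matrix of returns. The tickers and returns below are synthetic placeholders, not market data.

```python
import numpy as np


def min_variance_weights(cov):
    """Fully invested minimum-variance weights: inv(S) 1 / (1' inv(S) 1); short positions allowed."""
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)      # solve S x = 1 instead of forming inv(S) explicitly
    return x / x.sum()


# Hypothetical AI-selected universe and synthetic daily returns (not real market data).
tickers = ["AAA", "BBB", "CCC", "DDD"]
rng = np.random.default_rng(42)
returns = rng.normal(0.0005, [0.01, 0.02, 0.015, 0.03], size=(500, 4))
cov = np.cov(returns, rowvar=False)
w_mv = min_variance_weights(cov)
w_eq = np.full(len(tickers), 1 / len(tickers))
print(dict(zip(tickers, w_mv.round(3))))
print(w_mv @ cov @ w_mv <= w_eq @ cov @ w_eq)  # True
```

The selection step and the weighting step are independent here, which mirrors the hybrid split the abstract recommends: the language model proposes the universe, and an established optimizer sets the weights.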

Automated Sizing and Training of Efficient Deep Autoencoders using Second Order Algorithms

paper_url: http://arxiv.org/abs/2308.06221
repo_url: None
paper_authors: Kanishka Tyagi, Chinmay Rane, Michael Manry
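for: Proposes a multi-step training method for designing generalized linear classifiers, extended to multilayer perceptrons and deep autoencoders.
methods: An initial multi-class linear classifier is obtained through regression; validation error is then reduced by pruning unnecessary inputs, while desired outputs are simultaneously improved with a method similar to the Ho-Kashyap rule; the output discriminants are then scaled to be net functions of sigmoidal output units in a generalized linear classifier.
results: Combining pruning with a growing approach and improving each deep learning block improves the overall deep architecture; the final network's ten-fold testing error is lower than that of several linear and generalized linear classifiers, multilayer perceptrons, and deep learners reported in the literature.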

Abstract
We propose a multi-step training method for designing generalized linear classifiers. First, an initial multi-class linear classifier is found through regression. Then validation error is minimized by pruning unnecessary inputs. Simultaneously, desired outputs are improved via a method similar to the Ho-Kashyap rule. Next, the output discriminants are scaled to be net functions of sigmoidal output units in a generalized linear classifier. We then develop a family of batch training algorithms for the multilayer perceptron that optimize its hidden layer size and number of training epochs. Next, we combine pruning with a growing approach. Later, the input units are scaled to be the net functions of the sigmoidal output units, which are then fed as input to the MLP. We then propose improvements to each of the deep learning blocks, thereby improving the overall performance of the deep architecture. We discuss the principles and formulation of learning algorithms for deep autoencoders. We investigate several problems in deep autoencoder networks, including training issues, the theoretical, mathematical, and experimental justification that the networks are linear, optimizing the number of hidden units in each layer, and determining the depth of the deep learning model. A direct implication of the current work is the ability to construct fast deep learning models using desktop-level computational resources. This, in our opinion, promotes our design philosophy of building small but powerful algorithms. Performance gains are demonstrated at each step. Using widely available datasets, the final network's ten-fold testing error is shown to be lower than that of several other linear and generalized linear classifiers, multilayer perceptrons, and deep learners reported in the literature.
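
The abstract adjusts the desired outputs with "a method similar to the Ho-Kashyap rule". For reference, a sketch of the classic two-class Ho-Kashyap procedure, which alternates a least-squares (regression) solve for the weights with an increase of the target outputs wherever the current outputs already exceed them. The authors' multi-class variant is not reproduced here, and the data are synthetic.

```python
import numpy as np


def ho_kashyap(X, y, rho=0.5, iters=1000):
    """Classic two-class Ho-Kashyap procedure.

    X: (n, d) samples; y: labels in {-1, +1}. Seeks weights w and positive
    desired outputs b with Y w = b, where Y = y * [X, 1].
    """
    Y = y[:, None] * np.hstack([X, np.ones((len(X), 1))])
    Y_pinv = np.linalg.pinv(Y)
    b = np.ones(len(X))                 # initial desired outputs (margins)
    w = Y_pinv @ b                      # least-squares (regression) solution for these targets
    for _ in range(iters):
        if np.all(Y @ w > 0):           # every sample is on the correct side
            break
        e = Y @ w - b
        # Raise a target only where the output already exceeds it; if no entry of e is
        # positive, b stops changing, which signals data that is not linearly separable.
        b = b + rho * (e + np.abs(e))
        w = Y_pinv @ b
    return w


rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2, 2], 0.5, size=(50, 2)), rng.normal([-2, -2], 0.5, size=(50, 2))])
y = np.hstack([np.ones(50), -np.ones(50)])
w = ho_kashyap(X, y)
pred = np.sign(X @ w[:-1] + w[-1])
print((pred == y).mean())
```

Learning the targets b, rather than fixing them at the class labels, is what lets a regression-trained linear classifier move toward a separating solution instead of only minimising squared error.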

Safety in Traffic Management Systems: A Comprehensive Survey

paper_url: http://arxiv.org/abs/2308.06204
repo_url: None
paper_authors: Wenlu Du, Ankan Dash, Jing Li, Hua Wei, Guiling Wang
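for: Provide a comprehensive review of safety in traffic management systems, covering the safety issues that arise in these systems, the current state of research, and the techniques and methods proposed to ensure their safety.
methods: A literature survey that summarizes the safety issues in traffic management systems and analyzes the current state of research and the proposed solutions.
results: Summarizes the findings and limitations of existing research and suggests future research directions.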

Abstract
Traffic management systems play a vital role in ensuring safe and efficient transportation on roads. However, the use of advanced technologies in traffic management systems has introduced new safety challenges. Therefore, it is important to ensure the safety of these systems to prevent accidents and minimize their impact on road users. In this survey, we provide a comprehensive review of the literature on safety in traffic management systems. Specifically, we discuss the different safety issues that arise in traffic management systems, the current state of research on safety in these systems, and the techniques and methods proposed to ensure the safety of these systems. We also identify the limitations of the existing research and suggest future research directions.

2023-08-12

VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use

Value-Distributional Model-Based Reinforcement Learning

Approximate Answering of Graph Queries

4DRVO-Net: Deep 4D Radar-Visual Odometry Using Multi-Modal and Multi-Scale Adaptive Fusion

ModelScope Text-to-Video Technical Report

MC-DRE: Multi-Aspect Cross Integration for Drug Event/Entity Extraction

Digital elevation model correction in urban areas using extreme gradient boosting, land cover and terrain parameters

Dealing with Small Datasets for Deep Learning in Medical Imaging: An Evaluation of Self-Supervised Pre-Training on CT Scans Comparing Contrastive and Masked Autoencoder Methods for Convolutional Models

Learning Abstract Visual Reasoning via Task Decomposition: A Case Study in Raven Progressive Matrices

SLoRA: Federated Parameter Efficient Fine-Tuning of Language Models

One-bit Flip is All You Need: When Bit-flip Attack Meets Model Training

HyperFormer: Enhancing Entity and Relation Interaction for Hyper-Relational Knowledge Graph Completion

Three Ways of Using Large Language Models to Evaluate Chat

Latent Emission-Augmented Perspective-Taking (LEAPT) for Human-Robot Interaction

EgoPoser: Robust Real-Time Ego-Body Pose Estimation in Large Scenes

Generating Faithful Text From a Knowledge Graph with Noisy Reference Text

Not So Robust After All: Evaluating the Robustness of Deep Neural Networks to Unseen Adversarial Attacks

Multi-Label Knowledge Distillation

Semantic Equivariant Mixup

A Sequential Meta-Transfer (SMT) Learning to Combat Complexities of Physics-Informed Neural Networks: Application to Composites Autoclave Processing

Sensitivity-Aware Mixed-Precision Quantization and Width Optimization of Deep Neural Networks Through Cluster-Based Tree-Structured Parzen Estimation

Pedestrian Trajectory Prediction in Pedestrian-Vehicle Mixed Environments: A Systematic Review

Dialogue Possibilities between a Human Supervisor and UAM Air Traffic Management: Route Alteration

A Brain-Computer Interface Augmented Reality Framework with Auto-Adaptive SSVEP Recognition

ZYN: Zero-Shot Reward Models with Yes-No Questions

DCNFIS: Deep Convolutional Neuro-Fuzzy Inference System

Large Language Models and Knowledge Graphs: Opportunities and Challenges

Wireless Federated $k$-Means Clustering with Non-coherent Over-the-Air Computation

Topic-Level Bayesian Surprise and Serendipity for Recommender Systems

Causally Linking Health Application Data and Personal Information Management Tools

Large Language Models to Identify Social Determinants of Health in Electronic Health Records

Combining feature aggregation and geometric similarity for re-identification of patterned animals

Foundation Model is Efficient Multimodal Multitask Model Selector

Enhancing Network Management Using Code Generated by Large Language Models

ChatGPT-based Investment Portfolio Selection

Automated Sizing and Training of Efficient Deep Autoencoders using Second Order Algorithms

Safety in Traffic Management Systems: A Comprehensive Survey