cs.LG - 2023-08-14

Distance Matters For Improving Performance Estimation Under Covariate Shift

  • paper_url: http://arxiv.org/abs/2308.07223
  • repo_url: https://github.com/melanibe/distance_matters_performance_estimation
  • paper_authors: Mélanie Roschewitz, Ben Glocker
  • for: This paper proposes a performance estimation method based on the distance of test samples to their expected training distribution, enabling safe deployment of AI models under dataset shift.
  • methods: The method introduces a "distance-check" that flags test samples lying too far from the expected training distribution, so that their untrustworthy model outputs are not relied upon in the accuracy estimation step.
  • results: Experiments show statistically significant improvements in performance estimation over the best baseline on 13 image classification tasks, with state-of-the-art performance on 10 of them. Code is available at https://github.com/melanibe/distance_matters_performance_estimation.
    Abstract Performance estimation under covariate shift is a crucial component of safe AI model deployment, especially for sensitive use-cases. Recently, several solutions were proposed to tackle this problem, most leveraging model predictions or softmax confidence to derive accuracy estimates. However, under dataset shifts, confidence scores may become ill-calibrated if samples are too far from the training distribution. In this work, we show that taking into account distances of test samples to their expected training distribution can significantly improve performance estimation under covariate shift. Precisely, we introduce a "distance-check" to flag samples that lie too far from the expected distribution, to avoid relying on their untrustworthy model outputs in the accuracy estimation step. We demonstrate the effectiveness of this method on 13 image classification tasks, across a wide range of natural and synthetic distribution shifts and hundreds of models, with a median relative MAE improvement of 27% over the best baseline across all tasks, and SOTA performance on 10 out of 13 tasks. Our code is publicly available at https://github.com/melanibe/distance_matters_performance_estimation.
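
A minimal sketch of the distance-check idea in Python, assuming `train_feats`/`test_feats` are penultimate-layer embeddings and `test_conf` is the softmax confidence of each test prediction; the k-NN distance and quantile threshold are illustrative choices, not the authors' exact procedure:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def estimate_accuracy(train_feats, test_feats, test_conf, k=5, q=0.99):
    nn = NearestNeighbors(n_neighbors=k + 1).fit(train_feats)
    # Distance scale of the training set itself (drop the self-match column).
    train_dist = nn.kneighbors(train_feats)[0][:, 1:].mean(axis=1)
    threshold = np.quantile(train_dist, q)
    test_dist = nn.kneighbors(test_feats, n_neighbors=k)[0].mean(axis=1)
    # Flag samples that lie too far from the expected training distribution.
    trusted = test_dist <= threshold
    # Count far-away samples as errors instead of trusting their confidence.
    return np.where(trusted, test_conf, 0.0).mean()
```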

AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes

  • paper_url: http://arxiv.org/abs/2308.07221
  • repo_url: https://github.com/LZH-0225/AudioFormer
  • paper_authors: Zhaohui Li, Haitao Wang, Xinghua Jiang
  • for: This work proposes AudioFormer, a method that learns audio feature representations by acquiring discrete acoustic codes and subsequently fine-tuning them for audio classification tasks.
  • methods: The authors first recast audio classification as a form of natural language understanding (NLU). Leveraging an existing neural audio codec model, they generate discrete acoustic codes and use them to train a masked language model (MLM), thereby obtaining audio feature representations. They further introduce a Multi-Positive sample Contrastive (MPC) learning approach that learns joint representations among multiple discrete acoustic codes within the same audio input.
  • results: Treating discrete acoustic codes as textual data and training an MLM with a cloze-like methodology yields high-quality audio representations, and MPC effectively captures collaborative representations among distinct positive samples. AudioFormer achieves significantly improved performance across multiple datasets, even outperforming some audio-visual multimodal classification models, with scores of 53.9, 45.1, and 65.6 on AudioSet (2M, 20K) and FSD50K, respectively. Code and models are shared at https://github.com/LZH-0225/AudioFormer.git.
    Abstract We propose a method named AudioFormer, which learns audio feature representations through the acquisition of discrete acoustic codes and subsequently fine-tunes them for audio classification tasks. Initially, we introduce a novel perspective by considering the audio classification task as a form of natural language understanding (NLU). Leveraging an existing neural audio codec model, we generate discrete acoustic codes and utilize them to train a masked language model (MLM), thereby obtaining audio feature representations. Furthermore, we pioneer the integration of a Multi-Positive sample Contrastive (MPC) learning approach. This method enables the learning of joint representations among multiple discrete acoustic codes within the same audio input. In our experiments, we treat discrete acoustic codes as textual data and train a masked language model using a cloze-like methodology, ultimately deriving high-quality audio representations. Notably, the MPC learning technique effectively captures collaborative representations among distinct positive samples. Our research outcomes demonstrate that AudioFormer attains significantly improved performance compared to prevailing monomodal audio classification models across multiple datasets, and even outperforms audio-visual multimodal classification models on select datasets. Specifically, our approach achieves remarkable results on datasets including AudioSet (2M, 20K) and FSD50K, with performance scores of 53.9, 45.1, and 65.6, respectively. We have openly shared both the code and models: https://github.com/LZH-0225/AudioFormer.git.
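
To make the cloze-style training concrete, here is a minimal sketch of masking discrete acoustic codes for MLM training; the 15% mask rate, the `mask_id` token, and the `-100` ignore index are illustrative assumptions, not the paper's exact configuration:

```python
import torch

def mask_codes(codes: torch.Tensor, mask_id: int, p: float = 0.15):
    # codes: (batch, seq_len) integer acoustic codes from a neural audio codec.
    codes = codes.clone()
    mask = torch.rand(codes.shape) < p
    # Only masked positions contribute to the MLM loss (-100 = ignore index).
    labels = torch.where(mask, codes, torch.full_like(codes, -100))
    codes[mask] = mask_id  # replace masked codes with the [MASK] token id
    return codes, labels
```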

Generating Individual Trajectories Using GPT-2 Trained from Scratch on Encoded Spatiotemporal Data

  • paper_url: http://arxiv.org/abs/2308.07940
  • repo_url: None
  • paper_authors: Taizo Horikomi, Shouji Fujimoto, Atushi Ishikawa, Takayuki Mizuno
  • for: This study uses the GPT-2 language model to generate individual daily trajectory sequences that account for environmental factors and individual attributes.
  • methods: Geographical coordinates are transposed into distinctive location tokens, and each daily trajectory is represented as a sequence of these tokens. Unique time-interval tokens, along with special tokens for environmental factors and individual attributes, are added to the sequence, which is trained from scratch on the GPT-2 architecture.
  • results: By training these tokens and trajectories, the model generates individual daily trajectories influenced by both environmental factors and individual attributes.
    Abstract Following Mizuno, Fujimoto, and Ishikawa's research (Front. Phys. 2022), we transpose geographical coordinates expressed in latitude and longitude into distinctive location tokens that embody positions across varied spatial scales. We encapsulate an individual daily trajectory as a sequence of tokens by adding unique time interval tokens to the location tokens. Using the architecture of an autoregressive language model, GPT-2, this sequence of tokens is trained from scratch, allowing us to construct a deep learning model that sequentially generates an individual daily trajectory. Environmental factors such as meteorological conditions and individual attributes such as gender and age are symbolized by unique special tokens, and by training these tokens and trajectories on the GPT-2 architecture, we can generate trajectories that are influenced by both environmental factors and individual attributes.
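
A minimal sketch of the tokenization step, assuming a simple grid discretization and bucketed time intervals; the grid resolution, token naming, and special attribute tokens are illustrative, not the paper's exact scheme:

```python
def location_token(lat: float, lon: float, cell_deg: float = 0.01) -> str:
    # Snap coordinates to a grid cell so each cell becomes one vocabulary item.
    return f"<loc_{int(lat // cell_deg)}_{int(lon // cell_deg)}>"

def encode_day(records, attrs=("<age_30s>", "<rainy>")):
    # records: list of (lat, lon, minutes_since_previous_point) for one day;
    # attribute and weather tokens condition the generated trajectory.
    tokens = list(attrs)
    for lat, lon, dt in records:
        tokens.append(f"<dt_{min(int(dt) // 15, 95)}>")  # bucketed time interval
        tokens.append(location_token(lat, lon))
    return tokens

print(encode_day([(35.68, 139.76, 0), (35.70, 139.70, 45)]))
```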

Automated Ensemble-Based Segmentation of Pediatric Brain Tumors: A Novel Approach Using the CBTN-CONNECT-ASNR-MICCAI BraTS-PEDs 2023 Challenge Data

  • paper_url: http://arxiv.org/abs/2308.07212
  • repo_url: None
  • paper_authors: Shashidhar Reddy Javaji, Sovesh Mohapatra, Advait Gosai, Gottfried Schlaug
  • for: This study aims to advance deep learning techniques for improving brain tumor diagnosis and treatment.
  • methods: It uses deep learning with an ensemble of ONet and modified UNet models, coupled with novel loss functions.
  • results: The ensemble approach achieves higher accuracy and better feature capture, with lesion-wise Dice scores of 0.52, 0.72, and 0.78 for enhancing tumor, tumor core, and whole tumor, respectively, and improved coverage of tumor regions.
    Abstract Brain tumors remain a critical global health challenge, necessitating advancements in diagnostic techniques and treatment methodologies. In response to the growing need for age-specific segmentation models, particularly for pediatric patients, this study explores the deployment of deep learning techniques using magnetic resonance imaging (MRI) modalities. By introducing a novel ensemble approach using ONet and modified versions of UNet, coupled with innovative loss functions, this study achieves a precise segmentation model for the BraTS-PEDs 2023 Challenge. Data augmentation, including both single and composite transformations, ensures model robustness and accuracy across different scanning protocols. The ensemble strategy, integrating the ONet and UNet models, shows greater effectiveness in capturing specific features and modeling diverse aspects of the MRI images, resulting in lesion-wise Dice scores of 0.52, 0.72 and 0.78 for enhancing tumor, tumor core and whole tumor labels respectively. Visual comparisons further confirm the superiority of the ensemble method in accurate tumor region coverage. The results indicate that this advanced ensemble approach, building upon the unique strengths of individual models, offers promising prospects for enhanced diagnostic accuracy and effective treatment planning for brain tumors in pediatric brains.

Unified Data-Free Compression: Pruning and Quantization without Fine-Tuning

  • paper_url: http://arxiv.org/abs/2308.07209
  • repo_url: None
  • paper_authors: Shipeng Bai, Jun Chen, Xintian Shen, Yixuan Qian, Yong Liu
  • for: Structured pruning and quantization reduce inference time and memory footprint, but most existing methods require the original training set to fine-tune the model, which is resource-heavy and infeasible for applications with sensitive or proprietary data.
  • methods: Existing data-free methods perform pruning and quantization separately rather than jointly; this paper proposes Unified Data-Free Compression (UDFC), which performs both simultaneously without any data or fine-tuning.
  • results: On large-scale image classification, UDFC achieves significant improvements over existing methods across network architectures and compression settings; for example, it improves accuracy on ImageNet by 20.54% over the best prior method at a 30% pruning ratio with 6-bit quantization on ResNet-34.
    Abstract Structured pruning and quantization are promising approaches for reducing the inference time and memory footprint of neural networks. However, most existing methods require the original training dataset to fine-tune the model. This not only brings heavy resource consumption but also is not possible for applications with sensitive or proprietary data due to privacy and security concerns. Therefore, a few data-free methods are proposed to address this problem, but they perform data-free pruning and quantization separately, which does not explore the complementarity of pruning and quantization. In this paper, we propose a novel framework named Unified Data-Free Compression (UDFC), which performs pruning and quantization simultaneously without any data and fine-tuning process. Specifically, UDFC starts with the assumption that the partial information of a damaged (e.g., pruned or quantized) channel can be preserved by a linear combination of other channels, and then derives the reconstruction form from the assumption to restore the information loss due to compression. Finally, we formulate the reconstruction error between the original network and its compressed network, and theoretically deduce the closed-form solution. We evaluate UDFC on the large-scale image classification task and obtain significant improvements over various network architectures and compression methods. For example, we achieve a 20.54% accuracy improvement on the ImageNet dataset compared to the SOTA method with a 30% pruning ratio and 6-bit quantization on ResNet-34.
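
The reconstruction assumption can be illustrated for a pair of linear layers: fit coefficients expressing the pruned channel's filter in the span of the kept filters, then fold its contribution into the next layer's weights. The paper derives a closed-form solution from a reconstruction-error objective; the least-squares fit below is only a numerical illustration of the same assumption:

```python
import numpy as np

def prune_with_compensation(W_prev, W_next, c):
    # W_prev: (C, D) filters producing activations a = W_prev @ x.
    # W_next: (E, C) weights consuming those activations.
    keep = [j for j in range(W_prev.shape[0]) if j != c]
    # Assumption: filter c lies approximately in the span of the kept filters,
    # so its (linear-layer) activation does too: a_c ~ sum_j alpha_j * a_j.
    alpha, *_ = np.linalg.lstsq(W_prev[keep].T, W_prev[c], rcond=None)
    # Fold channel c's contribution into the kept columns of the next layer.
    W_next_new = W_next[:, keep] + np.outer(W_next[:, c], alpha)
    return W_prev[keep], W_next_new
```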

Algorithms for the Training of Neural Support Vector Machines

  • paper_url: http://arxiv.org/abs/2308.07204
  • repo_url: https://github.com/sayantann11/all-classification-templetes-for-ML
  • paper_authors: Lars Simon, Manuel Radons
  • for: This article examines the design of neural support vector machines (NSVMs), which allow domain knowledge to be incorporated into the model architecture.
  • methods: It introduces a set of training algorithms for NSVMs that leverage the Pegasos algorithm.
  • results: A proof of concept is provided by solving a set of standard machine learning tasks.
    Abstract Neural support vector machines (NSVMs) allow for the incorporation of domain knowledge in the design of the model architecture. In this article we introduce a set of training algorithms for NSVMs that leverage the Pegasos algorithm and provide a proof of concept by solving a set of standard machine learning tasks.
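
For reference, a minimal sketch of the classic Pegasos sub-gradient update for a linear SVM, the optimizer these NSVM training algorithms build on; labels are assumed to be in {-1, +1}:

```python
import numpy as np

def pegasos(X, y, lam=0.01, epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step-size schedule
            if y[i] * (X[i] @ w) < 1:  # margin violated: hinge sub-gradient
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:
                w = (1 - eta * lam) * w
    return w
```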

Neural Categorical Priors for Physics-Based Character Control

  • paper_url: http://arxiv.org/abs/2308.07200
  • repo_url: https://github.com/Tencent-RoboticsX/NCP
  • paper_authors: Qingxu Zhu, He Zhang, Mengting Lan, Lei Han
  • for: This work proposes a new learning framework for controlling physics-based characters with significantly improved motion quality and diversity.
  • methods: Reinforcement learning (RL) is used to track and imitate life-like movements from unstructured motion clips, and a vector quantized variational autoencoder (VQ-VAE) compresses the most relevant information from the clips into a discrete latent space; a prior-shifting technique based on curiosity-driven RL adjusts the learned categorical prior.
  • results: The proposed method controls characters to perform high-quality, diverse movements and performs strongly on two challenging downstream tasks: sword-and-shield striking and a two-player boxing game.
    Abstract Recent advances in learning reusable motion priors have demonstrated their effectiveness in generating naturalistic behaviors. In this paper, we propose a new learning framework in this paradigm for controlling physics-based characters with significantly improved motion quality and diversity over existing state-of-the-art methods. The proposed method uses reinforcement learning (RL) to initially track and imitate life-like movements from unstructured motion clips using the discrete information bottleneck, as adopted in the Vector Quantized Variational AutoEncoder (VQ-VAE). This structure compresses the most relevant information from the motion clips into a compact yet informative latent space, i.e., a discrete space over vector quantized codes. By sampling codes in the space from a trained categorical prior distribution, high-quality life-like behaviors can be generated, similar to the usage of VQ-VAE in computer vision. Although this prior distribution can be trained with the supervision of the encoder's output, it follows the original motion clip distribution in the dataset and could lead to imbalanced behaviors in our setting. To address the issue, we further propose a technique named prior shifting to adjust the prior distribution using curiosity-driven RL. The outcome distribution is demonstrated to offer sufficient behavioral diversity and significantly facilitates upper-level policy learning for downstream tasks. We conduct comprehensive experiments using humanoid characters on two challenging downstream tasks, sword-shield striking and two-player boxing game. Our results demonstrate that the proposed framework is capable of controlling the character to perform considerably high-quality movements in terms of behavioral strategies, diversity, and realism. Videos, codes, and data are available at https://tencent-roboticsx.github.io/NCP/.
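
Generation from the learned prior can be sketched in a few lines; `prior_logits`, `codebook`, and `decoder` are hypothetical stand-ins for the trained categorical prior, the VQ-VAE codebook, and the downstream decoding, not the released API:

```python
import torch

def sample_behavior(prior_logits, codebook, decoder, steps=32):
    # prior_logits: (num_codes,) logits of the trained categorical prior.
    zs = []
    for _ in range(steps):
        idx = torch.distributions.Categorical(logits=prior_logits).sample()
        zs.append(codebook[idx])  # look up the quantized latent for this step
    return decoder(torch.stack(zs))  # downstream policy consumes the latents
```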

Explaining Black-Box Models through Counterfactuals

  • paper_url: http://arxiv.org/abs/2308.07198
  • repo_url: https://github.com/juliatrustworthyai/counterfactualexplanations.jl
  • paper_authors: Patrick Altmeyer, Arie van Deursen, Cynthia C. S. Liem
  • for: This paper addresses explainable artificial intelligence (XAI).
  • methods: It uses Counterfactual Explanations (CE) and Algorithmic Recourse (AR) to explain the predictions of black-box models.
  • results: It delivers CounterfactualExplanations.jl, a Julia package that generates counterfactual explanations and algorithmic recourse for arbitrary black-box models.
    Abstract We present CounterfactualExplanations.jl: a package for generating Counterfactual Explanations (CE) and Algorithmic Recourse (AR) for black-box models in Julia. CE explain how inputs into a model need to change to yield specific model predictions. Explanations that involve realistic and actionable changes can be used to provide AR: a set of proposed actions for individuals to change an undesirable outcome for the better. In this article, we discuss the usefulness of CE for Explainable Artificial Intelligence and demonstrate the functionality of our package. The package is straightforward to use and designed with a focus on customization and extensibility. We envision it to one day be the go-to place for explaining arbitrary predictive models in Julia through a diverse suite of counterfactual generators.
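
For intuition, a minimal Wachter-style counterfactual search in Python/PyTorch; the package itself is written in Julia and exposes a different, richer API, so this only illustrates the underlying trade-off between changing the prediction and staying close to the factual input:

```python
import torch
import torch.nn.functional as F

def counterfactual(model, x, target, lam=0.1, steps=500, lr=0.05):
    # x: (1, d) factual input; target: desired class index.
    x_cf = x.clone().requires_grad_(True)
    opt = torch.optim.Adam([x_cf], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Push the prediction toward the target class, penalizing distance
        # from the factual input so the explanation stays plausible.
        loss = (F.cross_entropy(model(x_cf), torch.tensor([target]))
                + lam * torch.norm(x_cf - x))
        loss.backward()
        opt.step()
    return x_cf.detach()
```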

gSASRec: Reducing Overconfidence in Sequential Recommendation Trained with Negative Sampling

  • paper_url: http://arxiv.org/abs/2308.07192
  • repo_url: https://github.com/asash/gsasrec
  • paper_authors: Aleksandr Petrov, Craig Macdonald
  • for: This paper explains why the SASRec model underperforms relative to BERT4Rec and proposes a novel generalised binary cross-entropy loss (gBCE) together with an improved model, gSASRec, to mitigate the overconfidence problem.
  • methods: The paper compares SASRec and BERT4Rec, introduces the gBCE loss, and theoretically proves that gBCE mitigates overconfidence; gSASRec additionally trains with an increased number of negatives.
  • results: Detailed experiments on three datasets show that gSASRec does not exhibit the overconfidence problem and can outperform BERT4Rec (e.g., +9.47% NDCG on MovieLens-1M) while requiring less training time (e.g., -73% on MovieLens-1M).
    Abstract A large catalogue size is one of the central challenges in training recommendation models: a large number of items makes it memory- and computationally inefficient to compute scores for all items during training, forcing these models to deploy negative sampling. However, negative sampling increases the proportion of positive interactions in the training data, and therefore models trained with negative sampling tend to overestimate the probabilities of positive interactions, a phenomenon we call overconfidence. While the absolute values of the predicted scores or probabilities are not important for the ranking of retrieved recommendations, overconfident models may fail to estimate nuanced differences in the top-ranked items, resulting in degraded performance. In this paper, we show that overconfidence explains why the popular SASRec model underperforms when compared to BERT4Rec. This is contrary to the BERT4Rec authors' explanation that the difference in performance is due to the bi-directional attention mechanism. To mitigate overconfidence, we propose a novel Generalised Binary Cross-Entropy Loss function (gBCE) and theoretically prove that it can mitigate overconfidence. We further propose the gSASRec model, an improvement over SASRec that deploys an increased number of negatives and the gBCE loss. We show through detailed experiments on three datasets that gSASRec does not exhibit the overconfidence problem. As a result, gSASRec can outperform BERT4Rec (e.g. +9.47% NDCG on the MovieLens-1M dataset), while requiring less training time (e.g. -73% training time on MovieLens-1M). Moreover, in contrast to BERT4Rec, gSASRec is suitable for large datasets that contain more than 1 million items.
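
A minimal sketch of the gBCE idea: the probability assigned to the positive item is raised to a power beta, counteracting the inflation of positive probabilities caused by negative sampling. Shapes and the choice of beta are illustrative; the paper calibrates beta to the negative sampling rate:

```python
import torch
import torch.nn.functional as F

def gbce_loss(pos_logit, neg_logits, beta):
    # pos_logit: (B,) score of the positive item; neg_logits: (B, K) negatives.
    pos_term = beta * F.logsigmoid(pos_logit)         # log sigma(s+)^beta
    neg_term = F.logsigmoid(-neg_logits).sum(dim=-1)  # sum_k log(1 - sigma(s-))
    return -(pos_term + neg_term).mean()              # beta = 1 recovers BCE
```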

Improving ICD-based semantic similarity by accounting for varying degrees of comorbidity

  • paper_url: http://arxiv.org/abs/2308.07359
  • repo_url: None
  • paper_authors: Jan Janosch Schneider, Marius Adler, Christoph Ammer-Herrmenau, Alexander Otto König, Ulrich Sax, Jonas Hügel
  • for: To find similar patients, a common objective in precision medicine that facilitates treatment outcome assessment and clinical decision support.
  • methods: Semantic similarity algorithms over ICD code sets, including level-based information content, Leacock & Chodorow concept similarity, and bipartite graph matching, extended with a scale term that accounts for varying degrees of documented comorbidity.
  • results: Accounting for comorbidity variance significantly improves the performance of semantic similarity algorithms; the best combination of level-based information content, Leacock & Chodorow concept similarity, and bipartite graph matching reaches a correlation of 0.75 with the expert-rated ground truth.
    Abstract Finding similar patients is a common objective in precision medicine, facilitating treatment outcome assessment and clinical decision support. Choosing widely-available patient features and appropriate mathematical methods for similarity calculations is crucial. International Statistical Classification of Diseases and Related Health Problems (ICD) codes are used worldwide to encode diseases and are available for nearly all patients. Aggregated as sets consisting of primary and secondary diagnoses they can display a degree of comorbidity and reveal comorbidity patterns. It is possible to compute the similarity of patients based on their ICD codes by using semantic similarity algorithms. These algorithms have been traditionally evaluated using a single-term expert-rated data set. However, real-world patient data often display varying degrees of documented comorbidities that might impair algorithm performance. To account for this, we present a scale term that considers documented comorbidity-variance. In this work, we compared the performance of 80 combinations of established algorithms in terms of semantic similarity based on ICD-code sets. The sets have been extracted from patients with a C25.X (pancreatic cancer) primary diagnosis and provide a variety of different combinations of ICD-codes. Using our scale term we yielded the best results with a combination of level-based information content, Leacock & Chodorow concept similarity and bipartite graph matching for the set similarities reaching a correlation of 0.75 with our expert's ground truth. Our results highlight the importance of accounting for comorbidity variance while demonstrating how well current semantic similarity algorithms perform.
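
The set-level similarity can be sketched as an optimal bipartite matching over pairwise concept similarities. `concept_sim` (e.g., Leacock & Chodorow over the ICD hierarchy) is assumed given; normalizing by the larger set is one illustrative way to account for differing degrees of comorbidity, not necessarily the paper's exact scale term:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def set_similarity(codes_a, codes_b, concept_sim):
    # Pairwise concept similarities between the two patients' ICD codes.
    sim = np.array([[concept_sim(a, b) for b in codes_b] for a in codes_a])
    rows, cols = linear_sum_assignment(-sim)  # Hungarian: maximize similarity
    # Normalizing by the larger set penalizes unmatched (extra) diagnoses.
    return sim[rows, cols].sum() / max(len(codes_a), len(codes_b))
```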

Conformal Predictions Enhanced Expert-guided Meshing with Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2308.07358
  • repo_url: https://github.com/ahnobari/autosurf
  • paper_authors: Amin Heyrani Nobari, Justin Rey, Suhas Kodali, Matthew Jones, Faez Ahmed
  • for: This paper aims to develop a machine learning-based scheme for automatically generating high-quality meshes for computational fluid dynamics (CFD) simulations, with a focus on aircraft models.
  • methods: The proposed method utilizes graph neural networks (GNN) and expert guidance to generate CFD meshes. A new 3D segmentation algorithm is introduced, which outperforms two state-of-the-art models, PointNet++ and PointMLP, for surface classification. The conformal predictions method is used to project predictions from 3D mesh segmentation models to CAD surfaces, providing marginal statistical guarantees and robust uncertainty quantification and handling.
  • results: The proposed approach is demonstrated through a real-world case study, showing that the automatically generated mesh is comparable in quality to expert-generated meshes and enables the solver to converge and produce accurate results. Additionally, the approach is found to be 5 times faster than adaptive remeshing in the overall process of simulation. The code and data for this project are made publicly available at https://github.com/ahnobari/AutoSurf.
    Abstract Computational Fluid Dynamics (CFD) is widely used in different engineering fields, but accurate simulations are dependent upon proper meshing of the simulation domain. While highly refined meshes may ensure precision, they come with high computational costs. Similarly, adaptive remeshing techniques require multiple simulations and come at a great computational cost. This means that the meshing process is reliant upon expert knowledge and years of experience. Automating mesh generation can save significant time and effort and lead to a faster and more efficient design process. This paper presents a machine learning-based scheme that utilizes Graph Neural Networks (GNN) and expert guidance to automatically generate CFD meshes for aircraft models. In this work, we introduce a new 3D segmentation algorithm that outperforms two state-of-the-art models, PointNet++ and PointMLP, for surface classification. We also present a novel approach to project predictions from 3D mesh segmentation models to CAD surfaces using the conformal predictions method, which provides marginal statistical guarantees and robust uncertainty quantification and handling. We demonstrate that the addition of conformal predictions effectively enables the model to avoid under-refinement, hence failure, in CFD meshing even for weak and less accurate models. Finally, we demonstrate the efficacy of our approach through a real-world case study that demonstrates that our automatically generated mesh is comparable in quality to expert-generated meshes and enables the solver to converge and produce accurate results. Furthermore, we compare our approach to the alternative of adaptive remeshing in the same case study and find that our method is 5 times faster in the overall process of simulation. The code and data for this project are made publicly available at https://github.com/ahnobari/AutoSurf.
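
The conformal component can be illustrated with the standard split-conformal recipe for classification: calibrate a nonconformity threshold on held-out data, then form prediction sets with a marginal coverage guarantee. This is the generic method on which the paper's projection to CAD surfaces builds; variable names are illustrative:

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    # cal_probs: (n, C) softmax outputs on a held-out calibration split.
    n = len(cal_labels)
    # Nonconformity: one minus the probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, q_level, method="higher")
    # Prediction set: every class confident enough to pass the threshold;
    # with exchangeable data the true class is covered with prob. >= 1 - alpha.
    return test_probs >= 1.0 - q
```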

Efficient Learning of Quantum States Prepared With Few Non-Clifford Gates II: Single-Copy Measurements

  • paper_url: http://arxiv.org/abs/2308.07175
  • repo_url: None
  • paper_authors: Sabee Grewal, Vishnu Iyer, William Kretschmer, Daniel Liang
  • for: Learning $n$-qubit quantum states output by circuits with at most $t$ single-qubit non-Clifford gates to trace distance $\epsilon$, using $\mathsf{poly}(n,2^t,1/\epsilon)$ time and samples.
  • methods: The algorithm learns this class of states using only single-copy measurements, rather than entangled measurements across two copies of the input state.
  • results: A similarly efficient learning algorithm is achieved using single-copy measurements alone.
    Abstract Recent work has shown that $n$-qubit quantum states output by circuits with at most $t$ single-qubit non-Clifford gates can be learned to trace distance $\epsilon$ using $\mathsf{poly}(n,2^t,1/\epsilon)$ time and samples. All prior algorithms achieving this runtime use entangled measurements across two copies of the input state. In this work, we give a similarly efficient algorithm that learns the same class of states using only single-copy measurements.

PitchNet: A Fully Convolutional Neural Network for Pitch Estimation

  • paper_url: http://arxiv.org/abs/2308.07170
  • repo_url: None
  • paper_authors: Jeremy Cochoy
  • for: Improving the accuracy of pitch extraction in music and sound processing.
  • methods: A convolutional neural network combined with autocorrelation to optimize the accuracy of pitch detection.
  • results: Evaluation across datasets comprising synthetic sounds, opera recordings, and time-stretched vowels demonstrates improved pitch-extraction accuracy.
    Abstract In the domain of music and sound processing, pitch extraction plays a pivotal role. This research introduces "PitchNet", a convolutional neural network tailored for pitch extraction from the human singing voice, including acapella performances. Integrating autocorrelation with deep learning techniques, PitchNet aims to optimize the accuracy of pitch detection. Evaluation across datasets comprising synthetic sounds, opera recordings, and time-stretched vowels demonstrates its efficacy. This work paves the way for enhanced pitch extraction in both music and voice settings.
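
As background, the autocorrelation component that PitchNet combines with its CNN can be sketched in a few lines; the f0 search band and framing are illustrative:

```python
import numpy as np

def autocorr_pitch(frame, sr, fmin=65.0, fmax=1000.0):
    # frame: 1-D mono audio frame; sr: sample rate in Hz.
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])  # strongest periodicity in the band
    return sr / lag  # fundamental frequency estimate in Hz
```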

SPEGTI: Structured Prediction for Efficient Generative Text-to-Image Models

  • paper_url: http://arxiv.org/abs/2308.10997
  • repo_url: None
  • paper_authors: Sadeep Jayasumana, Daniel Glasner, Srikumar Ramalingam, Andreas Veit, Ayan Chakrabarti, Sanjiv Kumar
  • for: Improving the computational efficiency of text-to-image generation models without degrading output quality.
  • methods: A Markov Random Field (MRF) model encodes the compatibility among image tokens at different spatial locations; used together with the previously proposed Muse model, it reduces the number of Muse prediction steps required.
  • results: The MRF makes inference significantly cheaper, and the full model, SPEGTI, speeds up Muse by 1.5x with no loss in output image quality.
    Abstract Modern text-to-image generation models produce high-quality images that are both photorealistic and faithful to the text prompts. However, this quality comes at significant computational cost: nearly all of these models are iterative and require running inference multiple times with large models. This iterative process is needed to ensure that different regions of the image are not only aligned with the text prompt, but also compatible with each other. In this work, we propose a light-weight approach to achieving this compatibility between different regions of an image, using a Markov Random Field (MRF) model. This method is shown to work in conjunction with the recently proposed Muse model. The MRF encodes the compatibility among image tokens at different spatial locations and enables us to significantly reduce the required number of Muse prediction steps. Inference with the MRF is significantly cheaper, and its parameters can be quickly learned through back-propagation by modeling MRF inference as a differentiable neural-network layer. Our full model, SPEGTI, uses this proposed MRF model to speed up Muse by 1.5X with no loss in output image quality.

Pairing interacting protein sequences using masked language modeling

  • paper_url: http://arxiv.org/abs/2308.07136
  • repo_url: https://github.com/bitbol-lab/diffpalm
  • paper_authors: Umberto Lupo, Damiano Sgarbossa, Anne-Florence Bitbol
  • for: The paper aims to predict which proteins interact together from their amino-acid sequences, which is an important task in protein structure prediction and function prediction.
  • methods: The paper develops a method called DiffPALM that leverages protein language models trained on multiple sequence alignments, such as MSA Transformer and the EvoFormer module of AlphaFold, exploiting MSA Transformer's ability to fill in masked amino acids in multiple sequence alignments and capture inter-chain coevolution.
  • results: The paper shows that DiffPALM outperforms existing coevolution-based pairing methods on difficult benchmarks of shallow multiple sequence alignments and achieves competitive performance with orthology-based pairing. Additionally, DiffPALM improves the structure prediction of some eukaryotic protein complexes by AlphaFold-Multimer without significantly deteriorating any of those tested.
    Abstract Predicting which proteins interact together from amino-acid sequences is an important task. We develop a method to pair interacting protein sequences which leverages the power of protein language models trained on multiple sequence alignments, such as MSA Transformer and the EvoFormer module of AlphaFold. We formulate the problem of pairing interacting partners among the paralogs of two protein families in a differentiable way. We introduce a method called DiffPALM that solves it by exploiting the ability of MSA Transformer to fill in masked amino acids in multiple sequence alignments using the surrounding context. MSA Transformer encodes coevolution between functionally or structurally coupled amino acids. We show that it captures inter-chain coevolution, while it was trained on single-chain data, which means that it can be used out-of-distribution. Relying on MSA Transformer without fine-tuning, DiffPALM outperforms existing coevolution-based pairing methods on difficult benchmarks of shallow multiple sequence alignments extracted from ubiquitous prokaryotic protein datasets. It also outperforms an alternative method based on a state-of-the-art protein language model trained on single sequences. Paired alignments of interacting protein sequences are a crucial ingredient of supervised deep learning methods to predict the three-dimensional structure of protein complexes. DiffPALM substantially improves the structure prediction of some eukaryotic protein complexes by AlphaFold-Multimer, without significantly deteriorating any of those we tested. It also achieves competitive performance with orthology-based pairing.

Natural Language is All a Graph Needs

  • paper_url: http://arxiv.org/abs/2308.07134
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Ruosong Ye, Caiqi Zhang, Runhui Wang, Shuyuan Xu, Yongfeng Zhang
  • for: This work explores whether large language models (LLMs) can replace graph neural networks (GNNs) as the foundation model for graphs.
  • methods: It proposes InstructGLM (Instruction-finetuned Graph Language Model), which systematically designs highly scalable prompts based on natural language instructions and uses natural language to describe a graph's geometric structure and node features for instruction tuning an LLM.
  • results: InstructGLM surpasses all competitive GNN baselines on the ogbn-arxiv, Cora, and PubMed datasets, demonstrating the effectiveness of the method and the potential of generative large language models as foundation models for graph machine learning.
    Abstract The emergence of large-scale pre-trained language models, such as ChatGPT, has revolutionized various research fields in artificial intelligence. Transformers-based large language models (LLMs) have gradually replaced CNNs and RNNs to unify fields of computer vision and natural language processing. Compared with the data that exists relatively independently such as images, videos or texts, graph is a type of data that contains rich structural and relational information. Meanwhile, natural language, as one of the most expressive mediums, excels in describing complex structures. However, existing work on incorporating graph learning problems into the generative language modeling framework remains very limited. As the importance of large language models continues to grow, it becomes essential to explore whether LLMs can also replace GNNs as the foundation model for graphs. In this paper, we propose InstructGLM (Instruction-finetuned Graph Language Model), systematically design highly scalable prompts based on natural language instructions, and use natural language to describe the geometric structure and node features of the graph for instruction tuning an LLM to perform learning and inference on graphs in a generative manner. Our method exceeds all competitive GNN baselines on ogbn-arxiv, Cora and PubMed datasets, which demonstrates the effectiveness of our method and sheds light on generative large language models as the foundation model for graph machine learning.
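
The prompting idea can be sketched directly: serialize a node's features and neighborhood as natural language and pose the prediction as a question. The template wording below is an illustrative assumption, not the paper's exact instruction format:

```python
def node_prompt(node, features, neighbors):
    # Describe the local graph structure in plain language for an LLM.
    neigh = ", ".join(f"node {j} ({features[j]})" for j in neighbors[node])
    return (
        f"Node {node} has features: {features[node]}. "
        f"It is connected to: {neigh}. "
        f"Question: which category does node {node} belong to?"
    )

features = {0: "title: attention is all you need", 1: "title: deep residual learning"}
neighbors = {0: [1], 1: [0]}
print(node_prompt(0, features, neighbors))
```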

Implementation of The Future of Drug Discovery: Quantum-Based Machine Learning Simulation (QMLS)

  • paper_url: http://arxiv.org/abs/2308.08561
  • repo_url: None
  • paper_authors: Yew Kee Wong, Yifan Zhou, Yan Shing Liang, Haichuan Qiu, Yu Xi Wu, Bin He
  • for: This paper proposes shortening the drug-development research & development (R&D) phase to three to six months and reducing its cost to fifty to eighty thousand USD.
  • methods: For hit generation, Machine Learning Molecule Generation (MLMG) generates possible hits from the molecular structure of the target protein, while Quantum Simulation (QS) filters molecules from the primary essay based on reaction and binding effectiveness with the target. For lead optimization, molecules surviving both MLMG and QS are expanded into dozens of variations through Machine Learning Molecule Variation (MLMV), with the remainder receiving only a few variations.
  • results: After multiple rounds of high-standard QS filtering for reaction effectiveness and safety, the QMLS pipeline yields a few dozen pre-clinical-trial-ready drugs.
    Abstract The Research & Development (R&D) phase of drug development is a lengthy and costly process. To revolutionize this process, we introduce our new concept QMLS to shorten the whole R&D phase to three to six months and decrease the cost to merely fifty to eighty thousand USD. For Hit Generation, Machine Learning Molecule Generation (MLMG) generates possible hits according to the molecular structure of the target protein while the Quantum Simulation (QS) filters molecules from the primary essay based on the reaction and binding effectiveness with the target protein. Then, for Lead Optimization, the resultant molecules generated and filtered from MLMG and QS are compared, and molecules that appear as a result of both processes will be made into dozens of molecular variations through Machine Learning Molecule Variation (MLMV), while others will only be made into a few variations. Lastly, all optimized molecules would undergo multiple rounds of QS filtering with a high standard for reaction effectiveness and safety, creating a few dozen pre-clinical-trial-ready drugs. This paper is based on our first paper, where we pitched the concept of machine learning combined with quantum simulations. In this paper we will go over the detailed design and framework of QMLS, including MLMG, MLMV, and QS.

A Time-aware tensor decomposition for tracking evolving patterns

  • paper_url: http://arxiv.org/abs/2308.07126
  • repo_url: None
  • paper_authors: Christos Chatzis, Max Pfeffer, Pedro Lind, Evrim Acar
  • for: This paper proposes tPARAFAC2, a PARAFAC2-based tensor factorization method with temporal regularization for extracting gradually evolving patterns from temporal data.
  • methods: Temporal regularization in the time mode discourages the reordering of time points, while the PARAFAC2 factorization captures the underlying patterns in the temporal data.
  • results: Extensive experiments on synthetic data show that tPARAFAC2 captures the underlying evolving patterns accurately, outperforming PARAFAC2 and coupled matrix factorization with temporal smoothness regularization.
    Abstract Time-evolving data sets can often be arranged as a higher-order tensor with one of the modes being the time mode. While tensor factorizations have been successfully used to capture the underlying patterns in such higher-order data sets, the temporal aspect is often ignored, allowing for the reordering of time points. In recent studies, temporal regularizers are incorporated in the time mode to tackle this issue. Nevertheless, existing approaches still do not allow underlying patterns to change in time (e.g., spatial changes in the brain, contextual changes in topics). In this paper, we propose temporal PARAFAC2 (tPARAFAC2): a PARAFAC2-based tensor factorization method with temporal regularization to extract gradually evolving patterns from temporal data. Through extensive experiments on synthetic data, we demonstrate that tPARAFAC2 can capture the underlying evolving patterns accurately performing better than PARAFAC2 and coupled matrix factorization with temporal smoothness regularization.
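
In common PARAFAC2 notation, a temporally regularized objective of this kind can be written as below, where each frontal slice X_k is factored with an evolving factor B_k and lambda controls temporal smoothness; the paper's exact formulation and constraints may differ:

```latex
\min_{A,\,\{B_k\},\,\{d_k\}} \;\sum_{k=1}^{K} \bigl\| X_k - A \,\operatorname{diag}(d_k)\, B_k^{\top} \bigr\|_F^2
\;+\; \lambda \sum_{k=2}^{K} \bigl\| B_k - B_{k-1} \bigr\|_F^2
```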

Active Bird2Vec: Towards End-to-End Bird Sound Monitoring with Transformers

  • paper_url: http://arxiv.org/abs/2308.07121
  • repo_url: None
  • paper_authors: Lukas Rauch, Raphael Schwinger, Moritz Wirth, Bernhard Sick, Sven Tomforde, Christoph Scholz
  • for: A shift toward end-to-end learning in bird sound monitoring, combining self-supervised learning (SSL) and deep active learning (DAL).
  • methods: Transformer models process raw audio directly, bypassing traditional spectrogram conversions.
  • results: SSL is expected to generate high-quality bird sound representations, potentially accelerating the assessment of environmental changes and decision-making for wind farms; DAL exploits the wide variety of bird vocalizations to reduce reliance on datasets extensively labeled by human experts, and a comprehensive set of tasks curated through Huggingface Datasets is intended to enhance the comparability and reproducibility of bioacoustic research.
    Abstract We propose a shift towards end-to-end learning in bird sound monitoring by combining self-supervised (SSL) and deep active learning (DAL). Leveraging transformer models, we aim to bypass traditional spectrogram conversions, enabling direct raw audio processing. ActiveBird2Vec is set to generate high-quality bird sound representations through SSL, potentially accelerating the assessment of environmental changes and decision-making processes for wind farms. Additionally, we seek to utilize the wide variety of bird vocalizations through DAL, reducing the reliance on extensively labeled datasets by human experts. We plan to curate a comprehensive set of tasks through Huggingface Datasets, enhancing future comparability and reproducibility of bioacoustic research. A comparative analysis between various transformer models will be conducted to evaluate their proficiency in bird sound recognition tasks. We aim to accelerate the progression of avian bioacoustic research and contribute to more effective conservation strategies.

Neural radiance fields in the industrial and robotics domain: applications, research opportunities and use cases

  • paper_url: http://arxiv.org/abs/2308.07118
  • repo_url: https://github.com/maftej/iisnerf
  • paper_authors: Eugen Šlapak, Enric Pardo, Matúš Dopiriak, Taras Maksymyuk, Juraj Gazda
  • for: This work examines the potential applications of neural radiance fields (NeRFs), which learn 3D scene representations from training images, across industrial domains, and provides direction for future research.
  • methods: NeRFs are used for 3D scene representation, with proof-of-concept experiments on NeRF-based video compression and NeRF-based 3D motion estimation for collision avoidance.
  • results: NeRF-based video compression achieves savings of up to 48% and 74% at resolutions of 1920x1080 and 300x168, respectively; for 3D motion estimation, a Dynamic-NeRF (D-NeRF) trained on a 3D animation of a robotic arm achieves an average disparity-map PSNR of 23 dB and an SSIM of 0.97.
    Abstract The proliferation of technologies, such as extended reality (XR), has increased the demand for high-quality three-dimensional (3D) graphical representations. Industrial 3D applications encompass computer-aided design (CAD), finite element analysis (FEA), scanning, and robotics. However, current methods employed for industrial 3D representations suffer from high implementation costs and reliance on manual human input for accurate 3D modeling. To address these challenges, neural radiance fields (NeRFs) have emerged as a promising approach for learning 3D scene representations based on provided training 2D images. Despite a growing interest in NeRFs, their potential applications in various industrial subdomains are still unexplored. In this paper, we deliver a comprehensive examination of NeRF industrial applications while also providing direction for future research endeavors. We also present a series of proof-of-concept experiments that demonstrate the potential of NeRFs in the industrial domain. These experiments include NeRF-based video compression techniques and using NeRFs for 3D motion estimation in the context of collision avoidance. In the video compression experiment, our results show compression savings up to 48% and 74% for resolutions of 1920x1080 and 300x168, respectively. The motion estimation experiment used a 3D animation of a robotic arm to train Dynamic-NeRF (D-NeRF) and achieved an average peak signal-to-noise ratio (PSNR) of disparity map with the value of 23 dB and an structural similarity index measure (SSIM) 0.97.

iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN

  • paper_url: http://arxiv.org/abs/2308.07117
  • repo_url: None
  • paper_authors: Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Shogo Seki
  • for: Fast, lightweight, high-fidelity speech synthesis.
  • methods: A fast and lightweight 1D CNN serves as the backbone, with some neural processes replaced by the inverse short-time Fourier transform (iSTFT); iSTFTNet2 adds a 2D CNN that performs frequency upsampling in a few-frequency space so that high-dimensional spectrograms can be modeled without compromising speed.
  • results: iSTFTNet2 is faster and more lightweight than iSTFTNet, with comparable speech quality.
    Abstract The inverse short-time Fourier transform network (iSTFTNet) has garnered attention owing to its fast, lightweight, and high-fidelity speech synthesis. It obtains these characteristics using a fast and lightweight 1D CNN as the backbone and replacing some neural processes with iSTFT. Owing to the difficulty of a 1D CNN to model high-dimensional spectrograms, the frequency dimension is reduced via temporal upsampling. However, this strategy compromises the potential to enhance the speed. Therefore, we propose iSTFTNet2, an improved variant of iSTFTNet with a 1D-2D CNN that employs 1D and 2D CNNs to model temporal and spectrogram structures, respectively. We designed a 2D CNN that performs frequency upsampling after conversion in a few-frequency space. This design facilitates the modeling of high-dimensional spectrograms without compromising the speed. The results demonstrated that iSTFTNet2 made iSTFTNet faster and more lightweight with comparable speech quality. Audio samples are available at https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/istftnet2/.
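
The final synthesis stage can be sketched with PyTorch's built-in inverse STFT: the network predicts magnitude and phase for a reduced set of frequency bins, and `torch.istft` turns them into a waveform. Shapes and STFT settings here are illustrative, not the model's actual configuration:

```python
import torch

def synthesize(magnitude, phase, n_fft=16, hop=4):
    # magnitude, phase: (batch, n_fft // 2 + 1, frames) predicted by the CNN.
    spec = magnitude * torch.exp(1j * phase)  # complex spectrogram
    return torch.istft(spec, n_fft=n_fft, hop_length=hop,
                       window=torch.hann_window(n_fft))
```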

Ada-QPacknet – adaptive pruning with bit width reduction as an efficient continual learning method without forgetting

  • paper_url: http://arxiv.org/abs/2308.07939
  • repo_url: None
  • paper_authors: Marcin Pietroń, Dominik Żurek, Kamil Faber, Roberto Corizzo
  • for: This paper targets continual learning (CL) in dynamic and complex environments, where a large gap remains between human and deep learning model efficiency.
  • methods: It proposes Ada-QPacknet, an architecture-based CL method that uses pruning to extract a sub-network for each task and reduces model size through an efficient linear and nonlinear quantization approach that lowers the bit-width of the weight format.
  • results: Experiments show that hybrid 8- and 4-bit quantization achieves accuracy similar to floating-point sub-networks on well-known CL scenarios, and the proposed approach outperforms most CL strategies in task-incremental and class-incremental settings.
    Abstract Continual Learning (CL) is an area in which a huge gap remains between human and deep learning model efficiency. Recently, many CL algorithms have been designed, but most struggle to learn in dynamic and complex environments. In this work, a new architecture-based approach, Ada-QPacknet, is described. It incorporates pruning to extract a sub-network for each task. The crucial aspect of architecture-based CL methods is their capacity. In the presented method, the size of the model is reduced by an efficient linear and nonlinear quantisation approach. The method reduces the bit-width of the weights format. The presented results show that hybrid 8- and 4-bit quantisation achieves accuracy similar to the floating-point sub-network on well-known CL scenarios. To our knowledge, it is the first CL strategy that incorporates both compression techniques, pruning and quantisation, for generating task sub-networks. The presented algorithm was tested on well-known episode combinations and compared with the most popular algorithms. Results show that the proposed approach outperforms most of the CL strategies in task- and class-incremental scenarios.
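
The bit-width reduction can be illustrated with plain uniform quantization of a weight tensor; the symmetric per-tensor scheme below is an illustrative stand-in for the paper's linear and nonlinear quantisation approach:

```python
import numpy as np

def quantize(w, bits):
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax  # per-tensor symmetric scale
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale  # dequantize with q * scale
```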

Age-Stratified Differences in Morphological Connectivity Patterns in ASD: An sMRI and Machine Learning Approach

  • paper_url: http://arxiv.org/abs/2308.07356
  • repo_url: None
  • paper_authors: Gokul Manoj, Sandeep Singh Sengar, Jac Fredo Agastinose Ronickom
  • for: To classify autism spectrum disorder (ASD) using morphological features (MF) and morphological connectivity features (MCF), and to compare classification performance across age groups.
  • methods: Structural MRI (sMRI) data were obtained from the two publicly available databases ABIDE-I and ABIDE-II and pre-processed with a standard pipeline. The data were parcellated into 148 regions according to the Destrieux atlas, and area, thickness, volume, and mean curvature were extracted for each region. Significant features were selected with a statistical t-test (p<0.05) and used to train a random forest (RF) classifier.
  • results: The 6-11 age group performed best, followed by the 6-18 and 11-18 groups. Overall, MCF with RF in the 6-11 age group performed best, with accuracy, F1 score, recall, and precision of 75.8%, 83.1%, 86%, and 80.4%, respectively. The study thus shows that morphological connectivity combined with an age-stratified diagnostic model can be effective for discriminating ASD.
    Abstract Purpose: Age biases have been identified as an essential factor in the diagnosis of ASD. The objective of this study was to compare the effect of different age groups in classifying ASD using morphological features (MF) and morphological connectivity features (MCF). Methods: The structural magnetic resonance imaging (sMRI) data for the study was obtained from the two publicly available databases, ABIDE-I and ABIDE-II. We considered three age groups, 6 to 11, 11 to 18, and 6 to 18, for our analysis. The sMRI data was pre-processed using a standard pipeline and was then parcellated into 148 different regions according to the Destrieux atlas. The area, thickness, volume, and mean curvature information was then extracted for each region which was used to create a total of 592 MF and 10,878 MCF for each subject. Significant features were identified using a statistical t-test (p<0.05) which was then used to train a random forest (RF) classifier. Results: The results of our study suggested that the performance of the 6 to 11 age group was the highest, followed by the 6 to 18 and 11 to 18 age groups in both MF and MCF. Overall, the MCF with RF in the 6 to 11 age group performed better in the classification than the other groups and produced an accuracy, F1 score, recall, and precision of 75.8%, 83.1%, 86%, and 80.4%, respectively. Conclusion: Our study thus demonstrates that morphological connectivity and age-related diagnostic model could be an effective approach to discriminating ASD.
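
The feature-selection-plus-classifier pipeline is straightforward to sketch; `X` is a (subjects x features) matrix of MF or MCF values and `y` holds binary ASD/control labels. In practice the t-test selection should be nested inside cross-validation to avoid leakage:

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.ensemble import RandomForestClassifier

def ttest_rf(X, y, alpha=0.05, seed=0):
    # Keep features that differ significantly between the two groups.
    _, p = ttest_ind(X[y == 1], X[y == 0], axis=0)
    keep = p < alpha
    clf = RandomForestClassifier(random_state=seed).fit(X[:, keep], y)
    return clf, keep
```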

#InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models

  • paper_url: http://arxiv.org/abs/2308.07074
  • repo_url: https://github.com/ofa-sys/instag
  • paper_authors: Keming Lu, Hongyi Yuan, Zheng Yuan, Runji Lin, Junyang Lin, Chuanqi Tan, Chang Zhou, Jingren Zhou
  • for: To improve the instruction-following ability obtained through supervised fine-tuning (SFT) of foundation models, and to define and quantitatively analyze instruction diversity and complexity.
  • methods: The paper introduces InsTag, an open-set fine-grained tagger that tags SFT samples based on semantics and intentions, defining instruction diversity and complexity in terms of tags; a data selector based on InsTag then picks 6K diverse and complex samples for fine-tuning.
  • results: Models fine-tuned on the 6K InsTag-selected samples (TagLM) outperform open-source models trained on considerably larger SFT data, as evaluated by MT-Bench, underscoring the importance of query diversity and complexity.
    Abstract Foundation language models obtain the instruction-following ability through supervised fine-tuning (SFT). Diversity and complexity are considered critical factors of a successful SFT dataset, while their definitions remain obscure and lack quantitative analyses. In this work, we propose InsTag, an open-set fine-grained tagger, to tag samples within SFT datasets based on semantics and intentions and define instruction diversity and complexity regarding tags. We obtain 6.6K tags to describe comprehensive user queries. Then we analyze popular open-sourced SFT datasets and find that the model ability grows with more diverse and complex data. Based on this observation, we propose a data selector based on InsTag to select 6K diverse and complex samples from open-source datasets and fine-tune models on InsTag-selected data. The resulting models, TagLM, outperform open-source models based on considerably larger SFT data evaluated by MT-Bench, echoing the importance of query diversity and complexity. We open-source InsTag in https://github.com/OFA-Sys/InsTag.
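
A diversity-driven selector in the spirit of InsTag can be sketched as a greedy cover over tags, breaking ties by complexity (tag count); the paper's actual selection procedure may differ:

```python
def select_diverse(samples, k):
    # samples: list of (sample_id, set_of_tags) pairs.
    chosen, covered = [], set()
    pool = list(samples)
    while pool and len(chosen) < k:
        # Prefer the sample adding the most uncovered tags (diversity),
        # then the one with the most tags overall (complexity).
        best = max(pool, key=lambda s: (len(s[1] - covered), len(s[1])))
        chosen.append(best[0])
        covered |= best[1]
        pool.remove(best)
    return chosen
```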

Machine Unlearning: Solutions and Challenges

  • paper_url: http://arxiv.org/abs/2308.07061
  • repo_url: None
  • paper_authors: Jie Xu, Zihan Wu, Cong Wang, Xiaohua Jia
  • for: This survey addresses privacy and security concerns in machine learning by selectively removing the influence of specific training data points on trained models.
  • methods: Existing machine unlearning research is categorized into two types, exact unlearning and approximate unlearning, and state-of-the-art solutions are reviewed along with their advantages and limitations.
  • results: The survey proposes future research directions and encourages researchers to address open problems, advancing machine unlearning toward becoming an essential capability for trustworthy and adaptive machine learning.
    Abstract Machine learning models may inadvertently memorize sensitive, unauthorized, or malicious data, posing risks of privacy violations, security breaches, and performance deterioration. To address these issues, machine unlearning has emerged as a critical technique to selectively remove specific training data points' influence on trained models. This paper provides a comprehensive taxonomy and analysis of machine unlearning research. We categorize existing research into exact unlearning that algorithmically removes data influence entirely and approximate unlearning that efficiently minimizes influence through limited parameter updates. By reviewing the state-of-the-art solutions, we critically discuss their advantages and limitations. Furthermore, we propose future directions to advance machine unlearning and establish it as an essential capability for trustworthy and adaptive machine learning. This paper provides researchers with a roadmap of open problems, encouraging impactful contributions to address real-world needs for selective data removal.

Diagnosis of Scalp Disorders using Machine Learning and Deep Learning Approach – A Review

  • paper_url: http://arxiv.org/abs/2308.07052
  • repo_url: None
  • paper_authors: Hrishabh Tiwari, Jatin Moolchandani, Shamla Mantri
  • for: This review examines approaches for improving the accuracy and efficiency of scalp disorder diagnosis.
  • methods: The reviewed studies apply deep learning models, including CNNs and FCNs, as well as app-based systems, to identify skin and scalp disorders.
  • results: The reviewed results show that deep learning models can identify scalp and skin disorders accurately, with the best systems reaching an average precision of 97.41%-99.09%.
    Abstract The morbidity of scalp diseases is minuscule compared to other diseases, but the impact on the patient's life is enormous. It is common for people to experience scalp problems that include Dandruff, Psoriasis, Tinea-Capitis, Alopecia and Atopic-Dermatitis. In accordance with WHO research, approximately 70% of adults have problems with their scalp. It has been demonstrated in descriptive research that hair quality is impaired by an impaired scalp, but these impacts are reversible with early diagnosis and treatment. Deep Learning advances have demonstrated the effectiveness of CNN paired with FCN in diagnosing scalp and skin disorders. In one proposed Deep-Learning-based scalp inspection and diagnosis system, an imaging microscope and a trained model are combined with an app that classifies scalp disorders accurately with an average precision of 97.41%-99.09%. Another study dealt with classifying Psoriasis using a CNN with an accuracy of 82.9%. As part of another study, an ML-based algorithm was also employed. It accurately classified the healthy scalp and alopecia areata with 91.4% and 88.9% accuracy with SVM and KNN algorithms. Using deep learning models to diagnose scalp-related diseases has improved due to advancements in computation capabilities and computer vision, but there remains a wide horizon for further improvements.

Fourier neural operator for learning solutions to macroscopic traffic flow models: Application to the forward and inverse problems

  • paper_url: http://arxiv.org/abs/2308.07051
  • repo_url: None
  • paper_authors: Bilal Thonnam Thodi, Sai Venkata Ramana Ambadipudi, Saif Eddin Jabari
  • for: This work applies deep learning to nonlinear hyperbolic partial differential equations, using a neural operator to learn complete traffic states in macroscopic traffic flow models.
  • methods: A physics-informed Fourier neural operator ($\pi$-FNO) is used; an additional physics loss based on a discrete conservation law regularizes training to improve shock predictions.
  • results: Experiments show that the operator accurately predicts density dynamics on a ring-road network and an urban signalized road, generalizes to heterogeneous vehicle queue distributions and multiple traffic signal cycles, and that the physics regularizer aids learning of long-term density dynamics, especially with periodic boundary data.
    Abstract Deep learning methods are emerging as popular computational tools for solving forward and inverse problems in traffic flow. In this paper, we study a neural operator framework for learning solutions to nonlinear hyperbolic partial differential equations with applications in macroscopic traffic flow models. In this framework, an operator is trained to map heterogeneous and sparse traffic input data to the complete macroscopic traffic state in a supervised learning setting. We chose a physics-informed Fourier neural operator ($\pi$-FNO) as the operator, where an additional physics loss based on a discrete conservation law regularizes the problem during training to improve the shock predictions. We also propose to use training data generated from random piecewise constant input data to systematically capture the shock and rarefied solutions. From experiments using the LWR traffic flow model, we found superior accuracy in predicting the density dynamics of a ring-road network and urban signalized road. We also found that the operator can be trained using simple traffic density dynamics, e.g., consisting of $2-3$ vehicle queues and $1-2$ traffic signal cycles, and it can predict density dynamics for heterogeneous vehicle queue distributions and multiple traffic signal cycles $(\geq 2)$ with an acceptable error. The extrapolation error grew sub-linearly with input complexity for a proper choice of the model architecture and training data. Adding a physics regularizer aided in learning long-term traffic density dynamics, especially for problems with periodic boundary data.
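
To illustrate the kind of physics loss the abstract describes, here is a hedged sketch of a discrete conservation-law residual for the LWR model with a Greenshields flux (the flux choice, forward differences, and loss weight are assumptions, not the paper's exact discretization):

```python
import torch

def greenshields_flux(rho, v_free=1.0, rho_max=1.0):
    """Greenshields flux q(rho) = rho * v_free * (1 - rho / rho_max)."""
    return rho * v_free * (1.0 - rho / rho_max)

def conservation_residual(rho, dx, dt):
    """Forward-difference residual of rho_t + q(rho)_x = 0 on a (T, X) grid."""
    q = greenshields_flux(rho)
    drho_dt = (rho[1:, :-1] - rho[:-1, :-1]) / dt
    dq_dx = (q[:-1, 1:] - q[:-1, :-1]) / dx
    return drho_dt + dq_dx

rho_pred = torch.rand(50, 100, requires_grad=True)  # operator's predicted density field
physics_loss = conservation_residual(rho_pred, dx=0.01, dt=0.005).pow(2).mean()
data_loss = torch.tensor(0.0)                       # placeholder supervised term
(data_loss + 0.1 * physics_loss).backward()
```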

UIPC-MF: User-Item Prototype Connection Matrix Factorization for Explainable Collaborative Filtering

  • paper_url: http://arxiv.org/abs/2308.07048
  • repo_url: None
  • paper_authors: Lei Pan, Von-Wun Soo
  • for: Recommending items to potentially interested users while providing explainable recommendations.
  • methods: UIPC-MF, a prototype-based matrix factorization method in which users and items are each associated with sets of prototypes; learned connection weights between user and item prototypes improve the explainability of recommendations.
  • results: Outperforms other prototype-based baselines in Hit Ratio and Normalized Discounted Cumulative Gain on three datasets, while providing better transparency.
    Abstract Recommending items to potentially interested users has been an important commercial task that faces two main challenges: accuracy and explainability. While most collaborative filtering models rely on statistical computations on a large scale of interaction data between users and items and can achieve high performance, they often lack clear explanatory power. We propose UIPC-MF, a prototype-based matrix factorization method for explainable collaborative filtering recommendations. In UIPC-MF, both users and items are associated with sets of prototypes, capturing general collaborative attributes. To enhance explainability, UIPC-MF learns connection weights that reflect the associative relations between user and item prototypes for recommendations. UIPC-MF outperforms other prototype-based baseline methods in terms of Hit Ratio and Normalized Discounted Cumulative Gain on three datasets, while also providing better transparency.
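
As a rough sketch of how prototype-based scoring with connection weights can work (the softmax assignment and interfaces below are assumptions, not the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def uipc_score(user_emb, item_emb, user_protos, item_protos, conn):
    """Soft-assign the user to user prototypes and the item to item prototypes,
    then combine the affinities through a learned connection-weight matrix.
    Shapes: user_emb (d,), item_emb (d,), user_protos (P, d),
    item_protos (Q, d), conn (P, Q)."""
    a = F.softmax(user_protos @ user_emb, dim=0)  # user-prototype affinities
    b = F.softmax(item_protos @ item_emb, dim=0)  # item-prototype affinities
    return a @ conn @ b                           # scalar preference score

d, P, Q = 32, 8, 8
score = uipc_score(torch.randn(d), torch.randn(d),
                   torch.randn(P, d), torch.randn(Q, d), torch.randn(P, Q))
print(float(score))
```

The structure lends itself to explanation: the largest terms a[p] * conn[p, q] * b[q] identify which prototype pair drives a given recommendation.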

No Regularization is Needed: An Efficient and Effective Model for Incomplete Label Distribution Learning

  • paper_url: http://arxiv.org/abs/2308.07047
  • repo_url: None
  • paper_authors: Xiang Li, Songcan Chen
  • for: This paper focuses on addressing the problem of Incomplete Label Distribution Learning (InLDL), where the labels are incomplete or unobserved for some samples.
  • methods: The authors propose a new method that uses the prior of label distribution to solve the InLDL problem without any explicit regularization. They define a weighted empirical risk and derive upper bounds to reveal the implicit regularization role of weighting.
  • results: The proposed method has four advantages: 1) it is model selection free, 2) it has a closed form solution and is easy to implement, 3) it has linear computational complexity, and 4) it is competitive with state-of-the-art methods even without any explicit regularization.
    Abstract Label Distribution Learning (LDL) assigns soft labels, a.k.a. degrees, to a sample. In reality, it is always laborious to obtain complete degrees, giving birth to the Incomplete LDL (InLDL). However, InLDL often suffers from performance degeneration. To remedy it, existing methods need one or more explicit regularizations, leading to burdensome parameter tuning and extra computation. We argue that label distribution itself may provide useful prior, when used appropriately, the InLDL problem can be solved without any explicit regularization. In this paper, we offer a rational alternative to use such a prior. Our intuition is that large degrees are likely to get more concern, the small ones are easily overlooked, whereas the missing degrees are completely neglected in InLDL. To learn an accurate label distribution, it is crucial not to ignore the small observed degrees but to give them properly large weights, while gradually increasing the weights of the missing degrees. To this end, we first define a weighted empirical risk and derive upper bounds between the expected risk and the weighted empirical risk, which reveals in principle that weighting plays an implicit regularization role. Then, by using the prior of degrees, we design a weighted scheme and verify its effectiveness. To sum up, our model has four advantages, it is 1) model selection free, as no explicit regularization is imposed; 2) with closed form solution (sub-problem) and easy-to-implement (a few lines of codes); 3) with linear computational complexity in the number of samples, thus scalable to large datasets; 4) competitive with state-of-the-arts even without any explicit regularization.
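
For concreteness, one plausible instantiation of the weighted empirical risk sketched in the abstract (the exact loss and weighting scheme in the paper may differ) is

$$\widehat{R}_w(f)=\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{c} w_{ij}\,\big(f_j(x_i)-d_{ij}\big)^2,$$

where $d_{ij}$ is the degree of label $j$ on sample $x_i$, and the weights $w_{ij}$ up-weight small observed degrees while gradually increasing the weights assigned to missing degrees; the derived upper bounds show that such weighting plays an implicit regularization role.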

Bayesian Flow Networks

  • paper_url: http://arxiv.org/abs/2308.07037
  • repo_url: https://github.com/stefanradev93/BayesFlow
  • paper_authors: Alex Graves, Rupesh Kumar Srivastava, Timothy Atkinson, Faustino Gomez
  • for: This work introduces Bayesian Flow Networks (BFNs), a new class of generative model in which the parameters of a set of independent distributions are modified with Bayesian inference in the light of noisy data samples and then passed to a neural network that outputs a second, interdependent distribution.
  • methods: The generative procedure of BFNs resembles the reverse process of diffusion models but is conceptually simpler, requiring no forward process; discrete- and continuous-time loss functions are derived, along with sample generation procedures.
  • results: Experiments show that BFNs achieve competitive log-likelihoods on dynamically binarized MNIST and CIFAR-10 image modelling, and outperform all known discrete diffusion models on the text8 character-level language modelling task.
    Abstract This paper introduces Bayesian Flow Networks (BFNs), a new class of generative model in which the parameters of a set of independent distributions are modified with Bayesian inference in the light of noisy data samples, then passed as input to a neural network that outputs a second, interdependent distribution. Starting from a simple prior and iteratively updating the two distributions yields a generative procedure similar to the reverse process of diffusion models; however it is conceptually simpler in that no forward process is required. Discrete and continuous-time loss functions are derived for continuous, discretised and discrete data, along with sample generation procedures. Notably, the network inputs for discrete data lie on the probability simplex, and are therefore natively differentiable, paving the way for gradient-based sample guidance and few-step generation in discrete domains such as language modelling. The loss function directly optimises data compression and places no restrictions on the network architecture. In our experiments BFNs achieve competitive log-likelihoods for image modelling on dynamically binarized MNIST and CIFAR-10, and outperform all known discrete diffusion models on the text8 character-level language modelling task.

S3IM: Stochastic Structural SIMilarity and Its Unreasonable Effectiveness for Neural Fields

  • paper_url: http://arxiv.org/abs/2308.07032
  • repo_url: https://github.com/madaoer/s3im_nerf
  • paper_authors: Zeke Xie, Xindi Yang, Yujie Yang, Qi Sun, Yixiang Jiang, Haoran Wang, Yunfeng Cai, Mingming Sun
  • for: The paper aims to improve the quality of Neural Radiance Field (NeRF) and related neural field methods for novel-view image synthesis and surface reconstruction tasks.
  • methods: The paper introduces a nonlocal multiplex training paradigm for NeRF and related neural field methods, using a novel Stochastic Structural SIMilarity (S3IM) loss that processes multiple data points as a whole set instead of processing multiple inputs independently.
  • results: The paper shows that the proposed S3IM loss leads to significant improvements in quality metrics for NeRF and neural surface representation, particularly for difficult tasks such as novel view synthesis and surface reconstruction. The improvements are robust even with sparse inputs, corrupted images, and dynamic scenes.
    Abstract Recently, Neural Radiance Field (NeRF) has shown great success in rendering novel-view images of a given scene by learning an implicit representation with only posed RGB images. NeRF and relevant neural field methods (e.g., neural surface representation) typically optimize a point-wise loss and make point-wise predictions, where one data point corresponds to one pixel. Unfortunately, this line of research has failed to use the collective supervision of distant pixels, although it is known that pixels in an image or scene can provide rich structural information. To the best of our knowledge, we are the first to design a nonlocal multiplex training paradigm for NeRF and relevant neural field methods via a novel Stochastic Structural SIMilarity (S3IM) loss that processes multiple data points as a whole set instead of processing multiple inputs independently. Our extensive experiments demonstrate the unreasonable effectiveness of S3IM in improving NeRF and neural surface representation nearly for free. The improvements in quality metrics can be particularly significant for those relatively difficult tasks: e.g., the test MSE loss unexpectedly drops by more than 90% for TensoRF and DVGO over eight novel view synthesis tasks; a 198% F-score gain and a 64% Chamfer $L_{1}$ distance reduction for NeuS over eight surface reconstruction tasks. Moreover, S3IM is consistently robust even with sparse inputs, corrupted images, and dynamic scenes.
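
A hedged sketch of the core S3IM idea — shuffle randomly drawn rays into pseudo-patches and penalize their structural dissimilarity (the global-statistics SSIM, patch size, and repeat count below are simplifications and assumptions, not the paper's settings):

```python
import torch

def ssim_global(x, y, c1=0.01**2, c2=0.03**2):
    """Simplified SSIM from global patch statistics (standard SSIM uses a
    sliding Gaussian window; global statistics keep the sketch short)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(unbiased=False), y.var(unbiased=False)
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def s3im_loss(pred, target, patch_h=64, patch_w=64, repeats=10):
    """pred/target: (N, 3) colors of randomly drawn rays. Shuffle them into
    pseudo-patches and average 1 - SSIM over several shuffles."""
    n = patch_h * patch_w
    loss = pred.new_tensor(0.0)
    for _ in range(repeats):
        idx = torch.randperm(pred.shape[0])[:n]
        p = pred[idx].t().reshape(3, patch_h, patch_w)
        t = target[idx].t().reshape(3, patch_h, patch_w)
        loss = loss + (1.0 - ssim_global(p, t))
    return loss / repeats

pred = torch.rand(8192, 3, requires_grad=True)
target = torch.rand(8192, 3)
s3im_loss(pred, target).backward()
```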

Bayesian Physics-Informed Neural Network for the Forward and Inverse Simulation of Engineered Nano-particles Mobility in a Contaminated Aquifer

  • paper_url: http://arxiv.org/abs/2308.07352
  • repo_url: None
  • paper_authors: Shikhar Nilabh, Fidel Grandia
  • for: This study aims to predict the mobility and retention of engineered nanoparticles in groundwater, in order to develop an effective groundwater remediation strategy.
  • methods: A Bayesian physics-informed neural network (B-PINN) framework is used: a forward model simulates nanoparticle mobility within the aquifer, and an inverse model quantifies the governing parameters from the model outputs.
  • results: The study shows that the B-PINN framework accurately predicts nanoparticle mobility and quantifies the associated uncertainty; the inverse model identifies the key parameters governing mobility, providing predictive insights for developing an efficient remediation strategy.
    Abstract Globally, there are many polluted groundwater sites that need an active remediation plan for the restoration of local ecosystem and environment. Engineered nanoparticles (ENPs) have proven to be an effective reactive agent for the in-situ degradation of pollutants in groundwater. While the performance of these ENPs has been highly promising on the laboratory scale, their application in real field case conditions is still limited. The complex transport and retention mechanisms of ENPs hinder the development of an efficient remediation strategy. Therefore, a predictive tool to comprehend the transport and retention behavior of ENPs is highly required. The existing tools in the literature are dominated with numerical simulators, which have limited flexibility and accuracy in the presence of sparse datasets and the aquifer heterogeneity. This work uses a Bayesian Physics-Informed Neural Network (B-PINN) framework to model the nano-particles mobility within an aquifer. The result from the forward model demonstrates the effective capability of B-PINN in accurately predicting the ENPs mobility and quantifying the uncertainty. The inverse model output is then used to predict the governing parameters for the ENPs mobility in a small-scale aquifer. The research demonstrates the capability of the tool to provide predictive insights for developing an efficient groundwater remediation strategy.

IOB: Integrating Optimization Transfer and Behavior Transfer for Multi-Policy Reuse

  • paper_url: http://arxiv.org/abs/2308.07351
  • repo_url: None
  • paper_authors: Siyuan Li, Hao Li, Jin Zhang, Zhen Wang, Peng Liu, Chongjie Zhang
  • for: This work tackles the challenge of selecting an appropriate source policy to facilitate target policy learning, proposing a novel transfer RL method.
  • methods: The Q function in the actor-critic framework guides policy selection, choosing the source policy with the largest one-step improvement over the current target policy; optimization transfer and behavior transfer are integrated (IOB) by regularizing the learned policy to mimic the guidance policy and combining them as the behavior policy.
  • results: The method surpasses state-of-the-art transfer RL baselines on benchmark tasks and improves final performance and knowledge transferability in continual learning scenarios; the optimization transfer technique is guaranteed to improve target policy learning.
    Abstract Humans have the ability to reuse previously learned policies to solve new tasks quickly, and reinforcement learning (RL) agents can do the same by transferring knowledge from source policies to a related target task. Transfer RL methods can reshape the policy optimization objective (optimization transfer) or influence the behavior policy (behavior transfer) using source policies. However, selecting the appropriate source policy with limited samples to guide target policy learning has been a challenge. Previous methods introduce additional components, such as hierarchical policies or estimations of source policies' value functions, which can lead to non-stationary policy optimization or heavy sampling costs, diminishing transfer effectiveness. To address this challenge, we propose a novel transfer RL method that selects the source policy without training extra components. Our method utilizes the Q function in the actor-critic framework to guide policy selection, choosing the source policy with the largest one-step improvement over the current target policy. We integrate optimization transfer and behavior transfer (IOB) by regularizing the learned policy to mimic the guidance policy and combining them as the behavior policy. This integration significantly enhances transfer effectiveness, surpasses state-of-the-art transfer RL baselines in benchmark tasks, and improves final performance and knowledge transferability in continual learning scenarios. Additionally, we show that our optimization transfer technique is guaranteed to improve target policy learning.
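
A small sketch of the Q-guided selection step described above — evaluate each candidate policy's action with the critic and keep the one with the largest one-step value (the policy/critic interfaces are assumptions):

```python
import torch

def select_guidance_action(state, target_policy, source_policies, critic):
    """Pick the action, among the target policy's and all source policies'
    proposals, that the critic values most in the current state."""
    candidates = [target_policy] + list(source_policies)
    actions = [pi(state) for pi in candidates]
    q_values = torch.stack([critic(state, a) for a in actions])
    best = int(torch.argmax(q_values))
    return actions[best], best  # best == 0 means "keep the target policy"

# Toy stand-ins: policies map a state vector to an action vector.
state = torch.randn(4)
policies = [lambda s, w=w: torch.tanh(w * s) for w in (0.5, 1.0, 2.0)]
critic = lambda s, a: (s * a).sum()  # toy Q function
action, idx = select_guidance_action(state, policies[0], policies[1:], critic)
print("chosen policy index:", idx)
```

The behavior-transfer half of IOB would then regularize the learned policy toward the selected guidance action, e.g. via an imitation term added to the actor loss.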

Efficient Neural PDE-Solvers using Quantization Aware Training

  • paper_url: http://arxiv.org/abs/2308.07350
  • repo_url: None
  • paper_authors: Winfried van den Dool, Tijmen Blankevoort, Max Welling, Yuki M. Asano
  • for: Reducing the computational cost of solving partial differential equations (PDEs) with neural solvers while maintaining performance.
  • methods: State-of-the-art quantization methods are applied to reduce computational cost, quantizing both network weights and activations.
  • results: Training on four standard PDE datasets and three network architectures shows that quantization-aware training lowers the computational cost of inference while maintaining performance, across settings and three orders of magnitude in FLOPs; Pareto-optimal trade-offs between cost and performance are almost always achieved only by incorporating quantization.
    Abstract In the past years, the application of neural networks as an alternative to classical numerical methods to solve Partial Differential Equations has emerged as a potential paradigm shift in this century-old mathematical field. However, in terms of practical applicability, computational cost remains a substantial bottleneck. Classical approaches try to mitigate this challenge by limiting the spatial resolution on which the PDEs are defined. For neural PDE solvers, we can do better: Here, we investigate the potential of state-of-the-art quantization methods on reducing computational costs. We show that quantizing the network weights and activations can successfully lower the computational cost of inference while maintaining performance. Our results on four standard PDE datasets and three network architectures show that quantization-aware training works across settings and three orders of FLOPs magnitudes. Finally, we empirically demonstrate that Pareto-optimality of computational cost vs performance is almost always achieved only by incorporating quantization.
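
Quantization-aware training typically simulates low-precision arithmetic during training with "fake quantization" and a straight-through estimator; a generic sketch (not the paper's exact scheme):

```python
import torch

def fake_quantize(x, scale, bits=8):
    """Uniform symmetric fake quantization: the forward pass sees quantized
    values, while the straight-through estimator passes gradients unchanged."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = torch.clamp(torch.round(x / scale), qmin, qmax) * scale
    return x + (q - x).detach()  # forward = q, gradient = identity

w = torch.randn(16, 16, requires_grad=True)
scale = w.detach().abs().max() / 127  # simple per-tensor scale for 8 bits
loss = fake_quantize(w, scale).pow(2).sum()
loss.backward()
print(w.grad.shape)  # gradients still flow to the full-precision weights
```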

Learning to Optimize LSM-trees: Towards A Reinforcement Learning based Key-Value Store for Dynamic Workloads

  • paper_url: http://arxiv.org/abs/2308.07013
  • repo_url: None
  • paper_authors: Dingheng Mo, Fanchao Chen, Siqiang Luo, Caihua Shan
  • for: Optimizing key-value store performance under dynamic workloads.
  • methods: Reinforcement learning (RL) guides LSM-tree transformations online, and a new LSM-tree design, the FLSM-tree, enables efficient transitions between different compaction policies.
  • results: RusKey achieves up to 4x better end-to-end performance than the RocksDB system under various workloads.
    Abstract LSM-trees are widely adopted as the storage backend of key-value stores. However, optimizing the system performance under dynamic workloads has not been sufficiently studied or evaluated in previous work. To fill the gap, we present RusKey, a key-value store with the following new features: (1) RusKey is a first attempt to orchestrate LSM-tree structures online to enable robust performance under the context of dynamic workloads; (2) RusKey is the first study to use Reinforcement Learning (RL) to guide LSM-tree transformations; (3) RusKey includes a new LSM-tree design, named FLSM-tree, for an efficient transition between different compaction policies -- the bottleneck of dynamic key-value stores. We justify the superiority of the new design with theoretical analysis; (4) RusKey requires no prior workload knowledge for system adjustment, in contrast to state-of-the-art techniques. Experiments show that RusKey exhibits strong performance robustness in diverse workloads, achieving up to 4x better end-to-end performance than the RocksDB system under various settings.

Greedy online change point detection

  • paper_url: http://arxiv.org/abs/2308.07012
  • repo_url: None
  • paper_authors: Jou-Hui Ho, Felipe Tobar
  • for: Reducing the false discovery rate of online change point detection (CPD) methods.
  • methods: Greedy Online Change Point Detection (GOCPD) finds change points by maximizing the probability that the data comes from the temporal concatenation of two independent models.
  • results: For time series with a single change point the objective is unimodal, so detection can be accelerated with ternary search at logarithmic complexity; effectiveness is demonstrated on synthetic data and validated in real-world univariate and multivariate settings.
    Abstract Standard online change point detection (CPD) methods tend to have large false discovery rates as their detections are sensitive to outliers. To overcome this drawback, we propose Greedy Online Change Point Detection (GOCPD), a computationally appealing method which finds change points by maximizing the probability of the data coming from the (temporal) concatenation of two independent models. We show that, for time series with a single change point, this objective is unimodal and thus CPD can be accelerated via ternary search with logarithmic complexity. We demonstrate the effectiveness of GOCPD on synthetic data and validate our findings on real-world univariate and multivariate settings.
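
Because the split objective is unimodal for a single change point, it can be maximized with integer ternary search; a self-contained sketch with a Gaussian segment model (the segment likelihood is an illustrative choice, not the paper's model):

```python
import numpy as np

def gaussian_loglik(x):
    """Maximized log-likelihood of a segment under a fitted Gaussian."""
    var = max(x.var(), 1e-8)
    return -0.5 * len(x) * (np.log(2 * np.pi * var) + 1.0)

def gocpd_ternary(x, min_seg=2):
    """Locate the change point maximizing the two-segment likelihood with
    O(log n) objective evaluations, assuming unimodality in the split index."""
    objective = lambda t: gaussian_loglik(x[:t]) + gaussian_loglik(x[t:])
    lo, hi = min_seg, len(x) - min_seg
    while hi - lo > 2:
        m1, m2 = lo + (hi - lo) // 3, hi - (hi - lo) // 3
        if objective(m1) < objective(m2):
            lo = m1 + 1
        else:
            hi = m2
    return max(range(lo, hi + 1), key=objective)

rng = np.random.default_rng(0)
series = np.concatenate([rng.normal(0, 1, 300), rng.normal(3, 1, 200)])
print(gocpd_ternary(series))  # close to the true change point at 300
```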

Aggregating Intrinsic Information to Enhance BCI Performance through Federated Learning

  • paper_url: http://arxiv.org/abs/2308.11636
  • repo_url: None
  • paper_authors: Rui Liu, Yuanyuan Chen, Anran Li, Yi Ding, Han Yu, Cuntai Guan
  • for: This work addresses a long-standing challenge in building high-performance deep learning models for brain-computer interfaces (BCI): sharing electroencephalography (EEG) data across sites.
  • methods: A hierarchical personalized federated learning framework (FLEEG) handles heterogeneous EEG data formats: each client holds one dataset and trains a hierarchical personalized model to manage its data format and facilitate information exchange, while the server coordinates training to fuse knowledge from all datasets and improve overall performance.
  • results: Evaluated on motor imagery (MI) classification with nine EEG datasets collected by different devices but implementing the same MI task, the framework improves classification performance by up to 16.7%, especially for smaller datasets; visualizations show that it helps local models focus stably on task-related regions, yielding better performance.
    Abstract Insufficient data is a long-standing challenge for Brain-Computer Interface (BCI) to build a high-performance deep learning model. Though numerous research groups and institutes collect a multitude of EEG datasets for the same BCI task, sharing EEG data from multiple sites is still challenging due to the heterogeneity of devices. The significance of this challenge cannot be overstated, given the critical role of data diversity in fostering model robustness. However, existing works rarely discuss this issue, predominantly centering their attention on model training within a single dataset, often in the context of inter-subject or inter-session settings. In this work, we propose a hierarchical personalized Federated Learning EEG decoding (FLEEG) framework to surmount this challenge. This innovative framework heralds a new learning paradigm for BCI, enabling datasets with disparate data formats to collaborate in the model training process. Each client is assigned a specific dataset and trains a hierarchical personalized model to manage diverse data formats and facilitate information exchange. Meanwhile, the server coordinates the training procedure to harness knowledge gleaned from all datasets, thus elevating overall performance. The framework has been evaluated in Motor Imagery (MI) classification with nine EEG datasets collected by different devices but implementing the same MI task. Results demonstrate that the proposed frame can boost classification performance up to 16.7% by enabling knowledge sharing between multiple datasets, especially for smaller datasets. Visualization results also indicate that the proposed framework can empower the local models to put a stable focus on task-related areas, yielding better performance. To the best of our knowledge, this is the first end-to-end solution to address this important challenge.
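
A hedged sketch of the server-side idea — aggregate only the layers shared across clients while each client keeps its personalized, format-specific layers (the name-prefix convention and plain averaging are assumptions, not the paper's implementation):

```python
import torch

def aggregate_shared(client_models, shared_prefix="shared."):
    """FedAvg-style averaging restricted to parameters under a shared prefix;
    personalized layers keep their local values."""
    states = [m.state_dict() for m in client_models]
    avg = {k: torch.stack([s[k] for s in states]).mean(dim=0)
           for k in states[0] if k.startswith(shared_prefix)}
    for model, state in zip(client_models, states):
        state.update(avg)             # overwrite shared layers with the average
        model.load_state_dict(state)  # personalized layers stay untouched

class ClientNet(torch.nn.Module):
    def __init__(self, in_dim):
        super().__init__()
        self.personal = torch.nn.Linear(in_dim, 16)  # handles the local data format
        self.shared = torch.nn.Linear(16, 2)         # aggregated across clients

clients = [ClientNet(d) for d in (32, 64, 22)]  # heterogeneous EEG channel counts
aggregate_shared(clients)
```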

Deep convolutional neural networks for cyclic sensor data

  • paper_url: http://arxiv.org/abs/2308.06987
  • repo_url: None
  • paper_authors: Payman Goodarzi, Yannick Robin, Andreas Schütze, Tizian Schneider
  • for: This study investigates sensor-based condition monitoring for predictive maintenance, applying deep learning to a hydraulic system testbed dataset.
  • methods: Three models are compared: a baseline model using conventional methods, a single CNN model with early sensor fusion, and a two-lane CNN model (2L-CNN) with late sensor fusion.
  • results: The baseline with late sensor fusion, where features are extracted individually per sensor, achieves a test error rate of 1%, while the single CNN struggles with the diverse sensor characteristics and reaches an error rate of 20.5%; training each sensor separately reveals accuracy variations, and the 2L-CNN reduces the error rate by 33% when combining the least and most optimal sensors, highlighting the complexity of multi-sensor systems.
    Abstract Predictive maintenance plays a critical role in ensuring the uninterrupted operation of industrial systems and mitigating the potential risks associated with system failures. This study focuses on sensor-based condition monitoring and explores the application of deep learning techniques using a hydraulic system testbed dataset. Our investigation involves comparing the performance of three models: a baseline model employing conventional methods, a single CNN model with early sensor fusion, and a two-lane CNN model (2L-CNN) with late sensor fusion. The baseline model achieves an impressive test error rate of 1% by employing late sensor fusion, where feature extraction is performed individually for each sensor. However, the CNN model encounters challenges due to the diverse sensor characteristics, resulting in an error rate of 20.5%. To further investigate this issue, we conduct separate training for each sensor and observe variations in accuracy. Additionally, we evaluate the performance of the 2L-CNN model, which demonstrates significant improvement by reducing the error rate by 33% when considering the combination of the least and most optimal sensors. This study underscores the importance of effectively addressing the complexities posed by multi-sensor systems in sensor-based condition monitoring.

pNNCLR: Stochastic Pseudo Neighborhoods for Contrastive Learning based Unsupervised Representation Learning Problems

  • paper_url: http://arxiv.org/abs/2308.06983
  • repo_url: None
  • paper_authors: Momojit Biswas, Himanshu Buckchash, Dilip K. Prasad
  • for: This work aims to increase the semantic variation available to nearest-neighbor-based self-supervised learning (SSL) for image recognition.
  • methods: Building on nearest neighbor sampling, pseudo nearest neighbors (pNN) control the quality of the support set: instead of sampling the nearest neighbors directly, samples are drawn in the vicinity of hard nearest neighbors by varying the magnitude of the resultant vector under a stochastic sampling strategy; a smooth-weight-update approach stabilizes the uncertainty of nearest-neighbor-based learning.
  • results: On several public image recognition and medical image recognition datasets, the proposed method performs up to 8% better than the baseline nearest neighbor method and is comparable to other previously proposed SSL methods.
    Abstract Nearest neighbor (NN) sampling provides more semantic variations than pre-defined transformations for self-supervised learning (SSL) based image recognition problems. However, its performance is restricted by the quality of the support set, which holds positive samples for the contrastive loss. In this work, we show that the quality of the support set plays a crucial role in any nearest neighbor based method for SSL. We then provide a refined baseline (pNNCLR) to the nearest neighbor based SSL approach (NNCLR). To this end, we introduce pseudo nearest neighbors (pNN) to control the quality of the support set, wherein, rather than sampling the nearest neighbors, we sample in the vicinity of hard nearest neighbors by varying the magnitude of the resultant vector and employing a stochastic sampling strategy to improve the performance. Additionally, to stabilize the effects of uncertainty in NN-based learning, we employ a smooth-weight-update approach for training the proposed network. Evaluation of the proposed method on multiple public image recognition and medical image recognition datasets shows that it performs up to 8 percent better than the baseline nearest neighbor method, and is comparable to other previously proposed SSL methods.
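
A small sketch of the pNN sampling step — take the hard nearest neighbor in the support set and stochastically rescale its magnitude to sample in its vicinity (the Gaussian scaling below is an assumption made for illustration):

```python
import torch
import torch.nn.functional as F

def pseudo_nearest_neighbor(z, support, sigma=0.1):
    """z: (B, d) query embeddings; support: (M, d) support-set embeddings.
    Returns one pseudo neighbor per query."""
    sim = F.normalize(z, dim=1) @ F.normalize(support, dim=1).t()  # (B, M)
    nn = support[sim.argmax(dim=1)]                    # hard nearest neighbors
    alpha = 1.0 + sigma * torch.randn(z.shape[0], 1)   # random magnitude factor
    return alpha * nn

z = torch.randn(256, 128)
queue = torch.randn(4096, 128)  # support set / memory queue
pnn = pseudo_nearest_neighbor(z, queue)  # replaces the positive in the contrastive loss
print(pnn.shape)
```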

Routing Recovery for UAV Networks with Deliberate Attacks: A Reinforcement Learning based Approach

  • paper_url: http://arxiv.org/abs/2308.06973
  • repo_url: None
  • paper_authors: Sijie He, Ziye Jia, Chao Dong, Wei Wang, Yilu Cao, Yang Yang, Qihui Wu
  • for: This work focuses on routing planning and recovery for UAV networks under deliberate attacks.
  • methods: A deliberate attack model based on node importance represents enemy attacks, with a node importance ranking mechanism that considers node degree and link importance; an intelligent routing method based on reinforcement learning recovers the routing path when UAVs are attacked.
  • results: Simulations show that the proposed method outperforms other related methods.
    Abstract The unmanned aerial vehicle (UAV) network is popular these years due to its various applications. In the UAV network, routing is significantly affected by the distributed network topology, leading to the issue that UAVs are vulnerable to deliberate damage. Hence, this paper focuses on the routing plan and recovery for UAV networks with attacks. In detail, a deliberate attack model based on the importance of nodes is designed to represent enemy attacks. Then, a node importance ranking mechanism is presented, considering the degree of nodes and link importance. However, it is intractable to handle the routing problem by traditional methods for UAV networks, since link connections change with the UAV availability. Hence, an intelligent algorithm based on reinforcement learning is proposed to recover the routing path when UAVs are attacked. Simulations are conducted and numerical results verify the proposed mechanism performs better than other referred methods.

AutoAssign+: Automatic Shared Embedding Assignment in Streaming Recommendation

  • paper_url: http://arxiv.org/abs/2308.06965
  • repo_url: https://github.com/Applied-Machine-Learning-Lab/AutoAssign-Plus
  • paper_authors: Ziru Liu, Kecheng Chen, Fengyi Song, Bo Chen, Xiangyu Zhao, Huifeng Guo, Ruiming Tang
  • for: The paper aims to address the challenges of assigning initial ID embeddings randomly in streaming recommender systems, which can result in suboptimal prediction performance for items or users with limited interactive data and lead to unnecessary memory consumption.
  • methods: The paper proposes a reinforcement-learning-driven framework called AutoAssign+, which utilizes an Identity Agent to represent low-frequency IDs field-wise with a small set of shared embeddings and to dynamically determine which ID features should be retained or eliminated in the embedding table.
  • results: The paper demonstrates that AutoAssign+ significantly enhances recommendation performance by mitigating the cold-start problem, and yields a reduction in memory usage of approximately 20-30%, verifying its practical effectiveness and efficiency for streaming recommender systems.
    Abstract In the domain of streaming recommender systems, conventional methods for addressing new user IDs or item IDs typically involve assigning initial ID embeddings randomly. However, this practice results in two practical challenges: (i) Items or users with limited interactive data may yield suboptimal prediction performance. (ii) Embedding new IDs or low-frequency IDs necessitates consistently expanding the embedding table, leading to unnecessary memory consumption. In light of these concerns, we introduce a reinforcement learning-driven framework, namely AutoAssign+, that facilitates Automatic Shared Embedding Assignment Plus. To be specific, AutoAssign+ utilizes an Identity Agent as an actor network, which plays a dual role: (i) Representing low-frequency IDs field-wise with a small set of shared embeddings to enhance the embedding initialization, and (ii) Dynamically determining which ID features should be retained or eliminated in the embedding table. The policy of the agent is optimized with the guidance of a critic network. To evaluate the effectiveness of our approach, we perform extensive experiments on three commonly used benchmark datasets. Our experiment results demonstrate that AutoAssign+ is capable of significantly enhancing recommendation performance by mitigating the cold-start problem. Furthermore, our framework yields a reduction in memory usage of approximately 20-30%, verifying its practical effectiveness and efficiency for streaming recommender systems.
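
To make the shared-embedding idea concrete, here is a hedged sketch in which low-frequency IDs are routed to a small shared pool instead of dedicated rows; the hash-based assignment and frequency threshold are stand-ins for the paper's learned Identity Agent:

```python
import torch
import torch.nn as nn

class SharedIDEmbedding(nn.Module):
    """Low-frequency IDs share a small embedding pool; frequent IDs keep
    dedicated rows. The RL agent in the paper learns this routing instead."""
    def __init__(self, num_ids, num_shared, dim, freq_threshold=10):
        super().__init__()
        self.dedicated = nn.Embedding(num_ids, dim)
        self.shared = nn.Embedding(num_shared, dim)
        self.freq_threshold = freq_threshold
        self.register_buffer("counts", torch.zeros(num_ids, dtype=torch.long))

    def forward(self, ids):
        self.counts[ids] += 1
        low_freq = self.counts[ids] < self.freq_threshold
        shared_slot = ids % self.shared.num_embeddings  # hash into the shared pool
        return torch.where(low_freq.unsqueeze(1),
                           self.shared(shared_slot),
                           self.dedicated(ids))

table = SharedIDEmbedding(num_ids=100_000, num_shared=64, dim=16)
print(table(torch.tensor([3, 42, 99_999])).shape)  # (3, 16)
```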

Graph Structural Residuals: A Learning Approach to Diagnosis

  • paper_url: http://arxiv.org/abs/2308.06961
  • repo_url: None
  • paper_authors: Jan Lukas Augustin, Oliver Niggemann
  • for: This paper proposes a novel framework for model-based diagnosis that combines concepts of model-based diagnosis with deep graph structure learning, aiming to facilitate a seamless integration of graph structure learning with model-based diagnosis.
  • methods: The proposed framework uses two distinct graph adjacency matrices to represent the system’s underlying structure and provide dynamic observations. Additionally, the paper introduces two versions of a self-supervised graph structure learning model architecture.
  • results: The authors demonstrate the potential of their data-driven diagnostic method through experiments on a system of coupled oscillators.
    Abstract Traditional model-based diagnosis relies on constructing explicit system models, a process that can be laborious and expertise-demanding. In this paper, we propose a novel framework that combines concepts of model-based diagnosis with deep graph structure learning. This data-driven approach leverages data to learn the system's underlying structure and provide dynamic observations, represented by two distinct graph adjacency matrices. Our work facilitates a seamless integration of graph structure learning with model-based diagnosis by making three main contributions: (i) redefining the constructs of system representation, observations, and faults (ii) introducing two distinct versions of a self-supervised graph structure learning model architecture and (iii) demonstrating the potential of our data-driven diagnostic method through experiments on a system of coupled oscillators.

Search to Fine-tune Pre-trained Graph Neural Networks for Graph-level Tasks

  • paper_url: http://arxiv.org/abs/2308.06960
  • repo_url: None
  • paper_authors: Zhili Wang, Shimin Di, Lei Chen, Xiaofang Zhou
  • for: This paper proposes a better fine-tuning strategy to improve the performance of pre-trained graph neural networks (GNNs) on downstream tasks.
  • methods: GNNs are pre-trained on large-scale unlabeled graph data and adapted to target downstream tasks by fine-tuning with limited labeled data; concretely, S2PGNN searches for a fine-tuning strategy suited to the given labeled data, improving performance across a variety of downstream tasks.
  • results: Experiments show that S2PGNN can be implemented on top of 10 well-known pre-trained GNNs and consistently improves their performance, outperforming other fine-tuning strategies both within and outside the GNN area. Code is available at https://anonymous.4open.science/r/code_icde2024-A9CB/.
    Abstract Recently, graph neural networks (GNNs) have shown unprecedented success in many graph-related tasks. However, GNNs face the label scarcity issue as other neural networks do. Thus, recent efforts try to pre-train GNNs on a large-scale unlabeled graph and adapt the knowledge from the unlabeled graph to the target downstream task. The adaptation is generally achieved by fine-tuning the pre-trained GNNs with a limited amount of labeled data. Despite the importance of fine-tuning, current GNNs pre-training works often ignore designing a good fine-tuning strategy to better leverage transferred knowledge and improve the performance on downstream tasks. Only a few works have started to investigate a better fine-tuning strategy for pre-trained GNNs, and their designs either have strong assumptions or overlook the data-aware issue for various downstream datasets. Therefore, we aim to design a better fine-tuning strategy for pre-trained GNNs to improve the model performance in this paper. Given a pre-trained GNN, we propose to search to fine-tune pre-trained graph neural networks for graph-level tasks (S2PGNN), which adaptively designs a suitable fine-tuning framework for the given labeled data on the downstream task. To ensure the improvement brought by searching the fine-tuning strategy, we carefully summarize a proper search space of fine-tuning frameworks that is suitable for GNNs. The empirical studies show that S2PGNN can be implemented on top of 10 famous pre-trained GNNs and consistently improves their performance. Besides, S2PGNN achieves better performance than existing fine-tuning strategies within and outside the GNN area. Our code is publicly available at \url{https://anonymous.4open.science/r/code_icde2024-A9CB/}.

Data-Driven Allocation of Preventive Care With Application to Diabetes Mellitus Type II

  • paper_url: http://arxiv.org/abs/2308.06959
  • repo_url: None
  • paper_authors: Mathias Kraus, Stefan Feuerriegel, Maytal Saar-Tsechansky
  • for: Evaluating the cost-effectiveness of preventive care and supporting allocation decisions.
  • methods: Counterfactual inference, machine learning, and optimization techniques are combined into a scalable data-driven decision model that can exploit high-dimensional medical data, such as that found in modern electronic health records.
  • results: Evaluated on electronic health records from 89,191 prediabetic patients, the data-driven allocation of preventive treatments (metformin) is compared with current practice; applied to the U.S. population, the approach can yield annual savings of $1.1 billion. Cost-effectiveness is further analyzed under varying budget levels.
    Abstract Problem Definition. Increasing costs of healthcare highlight the importance of effective disease prevention. However, decision models for allocating preventive care are lacking. Methodology/Results. In this paper, we develop a data-driven decision model for determining a cost-effective allocation of preventive treatments to patients at risk. Specifically, we combine counterfactual inference, machine learning, and optimization techniques to build a scalable decision model that can exploit high-dimensional medical data, such as the data found in modern electronic health records. Our decision model is evaluated based on electronic health records from 89,191 prediabetic patients. We compare the allocation of preventive treatments (metformin) prescribed by our data-driven decision model with that of current practice. We find that if our approach is applied to the U.S. population, it can yield annual savings of $1.1 billion. Finally, we analyze the cost-effectiveness under varying budget levels. Managerial Implications. Our work supports decision-making in health management, with the goal of achieving effective disease prevention at lower costs. Importantly, our decision model is generic and can thus be used for effective allocation of preventive care for other preventable diseases.
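
A hedged sketch of the counterfactual-allocation idea in the abstract, using a simple T-learner and a greedy budget rule (the model class, greedy rule, and data are illustrative assumptions, not the authors' decision model):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def allocate_treatment(X, y, treated, cost_per_patient, budget):
    """Fit separate risk models for treated and untreated patients, estimate
    each patient's risk reduction from treatment, and greedily spend the
    budget on the largest estimated reductions."""
    m_treat = GradientBoostingClassifier().fit(X[treated == 1], y[treated == 1])
    m_control = GradientBoostingClassifier().fit(X[treated == 0], y[treated == 0])
    uplift = m_control.predict_proba(X)[:, 1] - m_treat.predict_proba(X)[:, 1]
    k = int(budget // cost_per_patient)  # how many treatments we can afford
    return np.argsort(-uplift)[:k]       # indices of patients to treat

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))            # placeholder EHR features
treated = rng.integers(0, 2, 1000)
y = (rng.random(1000) < 0.3).astype(int)  # placeholder disease outcomes
print(allocate_treatment(X, y, treated, cost_per_patient=100.0, budget=5000.0))
```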

CEmb-SAM: Segment Anything Model with Condition Embedding for Joint Learning from Heterogeneous Datasets

  • paper_url: http://arxiv.org/abs/2308.06957
  • repo_url: None
  • paper_authors: Dongik Shin, Beomsuk Kim, Seungjun Baek
  • for: Assisting medical experts in diagnostic and therapeutic procedures through automated ultrasound image segmentation.
  • methods: Heterogeneous ultrasound datasets covering different anatomical structures or lesions with different levels of malignancy are merged into one dataset, with each component treated as a subgroup, so that a single model can learn from and generalize across them.
  • results: Experiments show that the Condition Embedding block (CEmb-SAM) adapts effectively to each subgroup and outperforms baseline methods on peripheral nerve and breast cancer segmentation tasks.
    Abstract Automated segmentation of ultrasound images can assist medical experts with diagnostic and therapeutic procedures. Although using the common modality of ultrasound, one typically needs separate datasets in order to segment, for example, different anatomical structures or lesions with different levels of malignancy. In this paper, we consider the problem of jointly learning from heterogeneous datasets so that the model can improve generalization abilities by leveraging the inherent variability among datasets. We merge the heterogeneous datasets into one dataset and refer to each component dataset as a subgroup. We propose to train a single segmentation model so that the model can adapt to each sub-group. For robust segmentation, we leverage recently proposed Segment Anything model (SAM) in order to incorporate sub-group information into the model. We propose SAM with Condition Embedding block (CEmb-SAM) which encodes sub-group conditions and combines them with image embeddings from SAM. The conditional embedding block effectively adapts SAM to each image sub-group by incorporating dataset properties through learnable parameters for normalization. Experiments show that CEmb-SAM outperforms the baseline methods on ultrasound image segmentation for peripheral nerves and breast cancer. The experiments highlight the effectiveness of Cemb-SAM in learning from heterogeneous datasets in medical image segmentation tasks.
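
One way to picture the condition-embedding idea is FiLM-style conditional normalization, where a subgroup index selects learnable per-channel scale and shift; this is an illustrative sketch, not the paper's exact CEmb block:

```python
import torch
import torch.nn as nn

class ConditionEmbeddingNorm(nn.Module):
    """Each subgroup id selects an embedding that modulates normalized
    features with a per-channel scale and shift."""
    def __init__(self, num_subgroups, channels):
        super().__init__()
        self.norm = nn.GroupNorm(1, channels, affine=False)
        self.embed = nn.Embedding(num_subgroups, 2 * channels)

    def forward(self, x, subgroup):
        # x: (B, C, H, W); subgroup: (B,) integer subgroup ids
        gamma, beta = self.embed(subgroup).chunk(2, dim=1)
        return self.norm(x) * (1 + gamma[:, :, None, None]) + beta[:, :, None, None]

layer = ConditionEmbeddingNorm(num_subgroups=4, channels=32)
feats = torch.randn(2, 32, 64, 64)
print(layer(feats, torch.tensor([0, 3])).shape)  # (2, 32, 64, 64)
```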

Channel-Wise Contrastive Learning for Learning with Noisy Labels

  • paper_url: http://arxiv.org/abs/2308.06952
  • repo_url: None
  • paper_authors: Hui Kang, Sheng Liu, Huaxi Huang, Tongliang Liu
  • for: This work addresses the challenge of learning with noisy labels (LNL): training a classifier that discerns the actual classes of given instances.
  • methods: Channel-wise contrastive learning (CWCL) is proposed, which separates authentic label information from noise by performing contrastive learning across multiple channels.
  • results: Evaluations on several benchmark datasets show that CWCL outperforms existing methods, extracting more nuanced and resilient features aligned with the authentic labels.
    Abstract In real-world datasets, noisy labels are pervasive. The challenge of learning with noisy labels (LNL) is to train a classifier that discerns the actual classes from given instances. For this, the model must identify features indicative of the authentic labels. While research indicates that genuine label information is embedded in the learned features of even inaccurately labeled data, it's often intertwined with noise, complicating its direct application. Addressing this, we introduce channel-wise contrastive learning (CWCL). This method distinguishes authentic label information from noise by undertaking contrastive learning across diverse channels. Unlike conventional instance-wise contrastive learning (IWCL), CWCL tends to yield more nuanced and resilient features aligned with the authentic labels. Our strategy is twofold: firstly, using CWCL to extract pertinent features to identify cleanly labeled samples, and secondly, progressively fine-tuning using these samples. Evaluations on several benchmark datasets validate our method's superiority over existing approaches.

Knowing Where to Focus: Event-aware Transformer for Video Grounding

  • paper_url: http://arxiv.org/abs/2308.06947
  • repo_url: https://github.com/jinhyunj/eatr
  • paper_authors: Jinhyun Jang, Jungin Park, Jin Kim, Hyeongjun Kwon, Kwanghoon Sohn
  • for: This paper aims to improve video grounding models by incorporating event-aware dynamic moment queries to better capture the temporal structure of videos and provide more accurate moment timestamps.
  • methods: The proposed method uses a slot attention mechanism for event reasoning and a gated fusion transformer layer for moment reasoning, which fuses the moment queries with the video-sentence representations to predict moment timestamps.
  • results: The proposed approach outperforms state-of-the-art video grounding models on several benchmarks, demonstrating its effectiveness and efficiency.
    Abstract Recent DETR-based video grounding models have made the model directly predict moment timestamps without any hand-crafted components, such as a pre-defined proposal or non-maximum suppression, by learning moment queries. However, their input-agnostic moment queries inevitably overlook an intrinsic temporal structure of a video, providing limited positional information. In this paper, we formulate an event-aware dynamic moment query to enable the model to take the input-specific content and positional information of the video into account. To this end, we present two levels of reasoning: 1) Event reasoning that captures distinctive event units constituting a given video using a slot attention mechanism; and 2) moment reasoning that fuses the moment queries with a given sentence through a gated fusion transformer layer and learns interactions between the moment queries and video-sentence representations to predict moment timestamps. Extensive experiments demonstrate the effectiveness and efficiency of the event-aware dynamic moment queries, outperforming state-of-the-art approaches on several video grounding benchmarks.

Semantic-aware Network for Aerial-to-Ground Image Synthesis

  • paper_url: http://arxiv.org/abs/2308.06945
  • repo_url: https://github.com/jinhyunj/sanet
  • paper_authors: Jinhyun Jang, Taeyong Song, Kwanghoon Sohn
  • for: This paper targets aerial-to-ground image synthesis, an emerging and challenging problem that aims to synthesize a ground image from an aerial image.
  • methods: A novel framework imposes enhanced structural alignment and semantic awareness: a semantic-attentive feature transformation module aligns aerial features to the ground layout to reconstruct complex geographic structures, and semantic-aware loss functions leverage a pre-trained segmentation network, computing and balancing losses separately for different classes so the network synthesizes realistic objects across various classes.
  • results: Experiments show that the proposed framework achieves high-quality aerial-to-ground image synthesis, supported by comparisons with previous methods and ablation studies.
    Abstract Aerial-to-ground image synthesis is an emerging and challenging problem that aims to synthesize a ground image from an aerial image. Due to the highly different layout and object representation between the aerial and ground images, existing approaches usually fail to transfer the components of the aerial scene into the ground scene. In this paper, we propose a novel framework to explore the challenges by imposing enhanced structural alignment and semantic awareness. We introduce a novel semantic-attentive feature transformation module that allows to reconstruct the complex geographic structures by aligning the aerial feature to the ground layout. Furthermore, we propose semantic-aware loss functions by leveraging a pre-trained segmentation network. The network is enforced to synthesize realistic objects across various classes by separately calculating losses for different classes and balancing them. Extensive experiments including comparisons with previous methods and ablation studies show the effectiveness of the proposed framework both qualitatively and quantitatively.
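
A hypothetical rendering of the class-balanced part of the semantic-aware losses, assuming an L1 reconstruction term computed separately inside each predicted segmentation mask and averaged over the classes present; the paper's exact loss terms and weighting may differ.

```python
import torch
import torch.nn.functional as F

def semantic_aware_l1(fake, real, seg_logits, eps=1e-6):
    """Per-class L1: restrict the reconstruction error to each semantic mask
    (from a frozen pre-trained segmenter), then average the per-class terms so
    frequent classes do not dominate rare ones."""
    with torch.no_grad():
        masks = F.one_hot(seg_logits.argmax(1), seg_logits.size(1))  # (B,H,W,C)
        masks = masks.permute(0, 3, 1, 2).float()                    # (B,C,H,W)
    per_pixel = (fake - real).abs().mean(1, keepdim=True)            # (B,1,H,W)
    area = masks.sum(dim=(2, 3))                                     # (B,C)
    per_class = (per_pixel * masks).sum(dim=(2, 3)) / (area + eps)   # (B,C)
    present = (area > 0).float()
    return (per_class * present).sum(1) / (present.sum(1) + eps)     # (B,)

fake, real = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
seg_logits = torch.randn(2, 8, 64, 64)   # stand-in for the frozen segmenter
loss = semantic_aware_l1(fake, real, seg_logits).mean()
```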

Insurance pricing on price comparison websites via reinforcement learning

  • paper_url: http://arxiv.org/abs/2308.06935
  • repo_url: None
  • paper_authors: Tanut Treetanthiploet, Yufei Zhang, Lukasz Szpruch, Isaac Bowers-Barnard, Henrietta Ridley, James Hickey, Chris Pearce
  • for: This paper aims to address the challenges of formulating effective pricing strategies for insurers on price comparison websites (PCWs) by introducing a reinforcement learning (RL) framework that integrates model-based and model-free methods.
  • methods: The proposed methodology uses a model-based component to train agents in an offline setting, and model-free algorithms in a contextual bandit (CB) manner to dynamically update the pricing policy and maximize expected revenue.
  • results: The paper demonstrates the superiority of the proposed methodology over existing off-the-shelf RL/CB approaches using synthetic data, and shows that the hybrid agent outperforms benchmarks in terms of sample efficiency and cumulative reward.
    Abstract The emergence of price comparison websites (PCWs) has presented insurers with unique challenges in formulating effective pricing strategies. Operating on PCWs requires insurers to strike a delicate balance between competitive premiums and profitability, amidst obstacles such as low historical conversion rates, limited visibility of competitors' actions, and a dynamic market environment. In addition to this, the capital intensive nature of the business means pricing below the risk levels of customers can result in solvency issues for the insurer. To address these challenges, this paper introduces reinforcement learning (RL) framework that learns the optimal pricing policy by integrating model-based and model-free methods. The model-based component is used to train agents in an offline setting, avoiding cold-start issues, while model-free algorithms are then employed in a contextual bandit (CB) manner to dynamically update the pricing policy to maximise the expected revenue. This facilitates quick adaptation to evolving market dynamics and enhances algorithm efficiency and decision interpretability. The paper also highlights the importance of evaluating pricing policies using an offline dataset in a consistent fashion and demonstrates the superiority of the proposed methodology over existing off-the-shelf RL/CB approaches. We validate our methodology using synthetic data, generated to reflect private commercially available data within real-world insurers, and compare against 6 other benchmark approaches. Our hybrid agent outperforms these benchmarks in terms of sample efficiency and cumulative reward with the exception of an agent that has access to perfect market information which would not be available in a real-world set-up.
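
The model-free component can be pictured as a contextual bandit over a discrete grid of price multipliers. The sketch below uses plain LinUCB with realized revenue as the reward; the class name, price grid, and reward hook are illustrative stand-ins, not the paper's hybrid agent.

```python
import numpy as np

class LinUCBPricer:
    """Minimal LinUCB over discrete price multipliers. Context = quote or
    customer features; reward = premium collected if the quote converts."""
    def __init__(self, n_features, price_grid, alpha=1.0):
        self.price_grid, self.alpha = price_grid, alpha
        self.A = [np.eye(n_features) for _ in price_grid]    # per-arm covariance
        self.b = [np.zeros(n_features) for _ in price_grid]  # per-arm reward stats

    def choose(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))                        # index into price_grid

    def update(self, arm, x, revenue):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += revenue * x

pricer = LinUCBPricer(n_features=5, price_grid=[0.9, 1.0, 1.1, 1.2])
x = np.random.rand(5)                   # quote context
arm = pricer.choose(x)
converted = np.random.rand() < 0.3      # stand-in for the PCW conversion event
pricer.update(arm, x, revenue=pricer.price_grid[arm] * 100.0 * converted)
```

In the paper's framing, the model-based component would pre-train such an agent offline to avoid cold-start losses before the bandit updates take over online.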

Predicting Listing Prices In Dynamic Short Term Rental Markets Using Machine Learning Models

  • paper_url: http://arxiv.org/abs/2308.06929
  • repo_url: None
  • paper_authors: Sam Chapman, Seifey Mohammad, Kimberly Villegas
  • for: The paper aims to predict the prices of Airbnb rentals in Austin, Texas using a machine learning modeling approach, with the primary objective of constructing an accurate model and the secondary objective of identifying the key factors that drive rental prices.
  • methods: The paper uses a machine learning approach and incorporates sentiment analysis into the feature engineering to gain a deeper understanding of periodic changes in Airbnb rental prices.
  • results: The paper provides predictions of Airbnb rental prices in Austin, Texas and identifies the key factors that drive these prices, with a focus on how these factors vary across locations and property types.
    Abstract Our research group wanted to take on the difficult task of predicting prices in a dynamic market. And short term rentals such as Airbnb listings seemed to be the perfect proving ground to do such a thing. Airbnb has revolutionized the travel industry by providing a platform for homeowners to rent out their properties to travelers. The pricing of Airbnb rentals is prone to high fluctuations, with prices changing frequently based on demand, seasonality, and other factors. Accurate prediction of Airbnb rental prices is crucial for hosts to optimize their revenue and for travelers to make informed booking decisions. In this project, we aim to predict the prices of Airbnb rentals using a machine learning modeling approach. Our project expands on earlier research in the area of analyzing Airbnb rental prices by taking a methodical machine learning approach as well as incorporating sentiment analysis into our feature engineering. We intend to gain a deeper understanding on periodic changes of Airbnb rental prices. The primary objective of this study is to construct an accurate machine learning model for predicting Airbnb rental prices specifically in Austin, Texas. Our project's secondary objective is to identify the key factors that drive Airbnb rental prices and to investigate how these factors vary across different locations and property types.
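
A minimal sketch of this kind of modeling setup, using a synthetic stand-in for the listings table; the column names (including the pre-computed review-sentiment score) and the model choice are our assumptions, not the authors' pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Toy stand-in for an Austin listings table; in practice this would be loaded
# from Airbnb data, with review_sentiment produced by any polarity model.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "bedrooms": rng.integers(1, 5, 500),
    "accommodates": rng.integers(1, 9, 500),
    "dist_downtown_km": rng.uniform(0, 20, 500),
    "review_sentiment": rng.uniform(-1, 1, 500),
})
df["price"] = (60 * df.bedrooms + 15 * df.accommodates
               - 3 * df.dist_downtown_km + 40 * df.review_sentiment
               + rng.normal(0, 20, 500))

features = df.drop(columns="price")
X_tr, X_te, y_tr, y_te = train_test_split(features, df.price, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
print("MAE:", mean_absolute_error(y_te, model.predict(X_te)))
# Feature importances speak to the secondary objective (price drivers):
print(dict(zip(features.columns, model.feature_importances_.round(3))))
```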

CBA: Improving Online Continual Learning via Continual Bias Adaptor

  • paper_url: http://arxiv.org/abs/2308.06925
  • repo_url: https://github.com/wqza/cba-online-cl
  • paper_authors: Quanziang Wang, Renzhen Wang, Yichen Wu, Xixi Jia, Deyu Meng
  • for: To improve online continual learning (CL) on non-stationary data streams, countering the distribution shift inherent in such streams.
  • methods: The paper proposes a Continual Bias Adaptor (CBA) module that augments the classifier network during training so it can adapt to catastrophic distribution changes, allowing a stable consolidation of previously learned tasks.
  • results: Theoretical analysis and experiments demonstrate the effectiveness of the CBA module, with extensive comparisons against four rehearsal-based baselines on three public continual learning benchmarks.
    Abstract Online continual learning (CL) aims to learn new knowledge and consolidate previously learned knowledge from non-stationary data streams. Due to the time-varying training setting, the model learned from a changing distribution easily forgets the previously learned knowledge and biases toward the newly received task. To address this problem, we propose a Continual Bias Adaptor (CBA) module to augment the classifier network to adapt to catastrophic distribution change during training, such that the classifier network is able to learn a stable consolidation of previously learned tasks. In the testing stage, CBA can be removed which introduces no additional computation cost and memory overhead. We theoretically reveal the reason why the proposed method can effectively alleviate catastrophic distribution shifts, and empirically demonstrate its effectiveness through extensive experiments based on four rehearsal-based baselines and three public continual learning benchmarks.
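
The repository linked above contains the real module; purely as a sketch of the train-time-only adaptation idea, under our own assumed form (a logit-remapping head initialized to identity and dropped at test time):

```python
import torch
import torch.nn as nn

class ContinualBiasAdaptor(nn.Module):
    """Guessed minimal form of the CBA idea: a light head that re-maps the
    classifier's logits to absorb the shifting training distribution. It is
    used only while training on the stream and removed at test time, so it
    adds no inference cost. See the authors' repo for the actual module."""
    def __init__(self, num_classes):
        super().__init__()
        self.adapt = nn.Linear(num_classes, num_classes)
        nn.init.eye_(self.adapt.weight)      # start as identity
        nn.init.zeros_(self.adapt.bias)

    def forward(self, logits):
        return self.adapt(logits)

backbone = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))  # toy classifier
cba = ContinualBiasAdaptor(10)
x = torch.randn(4, 1, 28, 28)
train_logits = cba(backbone(x))   # training: adapted logits fit the stream
test_logits = backbone(x)         # testing: CBA removed, stable classifier
```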

A Novel Ehanced Move Recognition Algorithm Based on Pre-trained Models with Positional Embeddings

  • paper_url: http://arxiv.org/abs/2308.10822
  • repo_url: None
  • paper_authors: Hao Wen, Jie Wang, Xiaodong Qiao
  • for: This work aims to improve the accuracy of move recognition in the abstracts of Chinese scientific and technological papers.
  • methods: The algorithm combines an improved pre-trained model with a gated network that uses an attention mechanism to capture word position information in the abstract, facilitating deep semantic learning and targeted feature extraction.
  • results: Experimental results show that the proposed algorithm achieves 13.37% higher accuracy on the split dataset than on the original dataset, and a 7.55% improvement in accuracy over the basic comparison model.
    Abstract The recognition of abstracts is crucial for effectively locating the content and clarifying the article. Existing move recognition algorithms lack the ability to learn word position information to obtain contextual semantics. This paper proposes a novel enhanced move recognition algorithm with an improved pre-trained model and a gated network with attention mechanism for unstructured abstracts of Chinese scientific and technological papers. The proposed algorithm first performs summary data segmentation and vocabulary training. The EP-ERNIE_AT-GRU framework is leveraged to incorporate word positional information, facilitating deep semantic learning and targeted feature extraction. Experimental results demonstrate that the proposed algorithm achieves 13.37% higher accuracy on the split dataset than on the original dataset and a 7.55% improvement in accuracy over the basic comparison model.

CausalLM is not optimal for in-context learning

  • paper_url: http://arxiv.org/abs/2308.06912
  • repo_url: None
  • paper_authors: Nan Ding, Tomer Levinboim, Jialin Wu, Sebastian Goodman, Radu Soricut
  • for: This work seeks a theoretical understanding of how prefixLM and causalLM differ and of their performance on in-context learning tasks.
  • methods: The authors analyze the convergence behavior of prefixLM and causalLM under a particular parameter construction.
  • results: The analysis shows that prefixLM converges to the optimal solution of linear regression, whereas causalLM's convergence dynamics follow those of an online gradient descent algorithm, which is not guaranteed to be optimal even as the number of samples grows to infinity. Experiments confirm that causalLM consistently underperforms prefixLM across all tasks.
    Abstract Recent empirical evidence indicates that transformer based in-context learning performs better when using a prefix language model (prefixLM), in which in-context samples can all attend to each other, compared to causal language models (causalLM), which use auto-regressive attention that prohibits in-context samples to attend to future samples. While this result is intuitive, it is not understood from a theoretical perspective. In this paper we take a theoretical approach and analyze the convergence behavior of prefixLM and causalLM under a certain parameter construction. Our analysis shows that both LM types converge to their stationary points at a linear rate, but that while prefixLM converges to the optimal solution of linear regression, causalLM convergence dynamics follows that of an online gradient descent algorithm, which is not guaranteed to be optimal even as the number of samples grows infinitely. We supplement our theoretical claims with empirical experiments over synthetic and real tasks and using various types of transformers. Our experiments verify that causalLM consistently underperforms prefixLM in all settings.
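
The structural difference the analysis rests on is just the attention mask. A minimal illustration (our own, not from the paper):

```python
import torch

def attention_mask(seq_len, prefix_len=0):
    """Boolean mask (True = may attend). prefix_len=0 gives the causalLM
    (auto-regressive) mask; prefix_len>0 lets the first prefix_len positions,
    i.e. the in-context samples, attend to each other bidirectionally, as in
    a prefixLM."""
    mask = torch.tril(torch.ones(seq_len, seq_len)).bool()
    mask[:prefix_len, :prefix_len] = True   # full attention within the prefix
    return mask

print(attention_mask(5, prefix_len=3).int())
# With prefix_len=0, in-context sample i never attends to sample j > i --
# the structural asymmetry the paper's convergence analysis turns on.
```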

GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text

  • paper_url: http://arxiv.org/abs/2308.06911
  • repo_url: None
  • paper_authors: Pengfei Liu, Yiming Ren, Zhixiang Ren
  • for: This work develops a multi-modal large language model to capture the rich and complex information in molecular data.
  • methods: The GIT-Mol model integrates structure graph, image, and text information, including the Simplified Molecular Input Line Entry System (SMILES) and molecular captions. To integrate the multi-modal molecular data, the authors propose GIT-Former, an architecture that maps all modalities into a unified latent space.
  • results: An innovative any-to-language molecular translation strategy yields a 10%-15% improvement in molecular captioning, a 5%-10% accuracy increase in property prediction, and a 20% boost in molecule generation validity over baseline or single-modality models.
    Abstract Large language models have made significant strides in natural language processing, paving the way for innovative applications including molecular representation and generation. However, most existing single-modality approaches cannot capture the abundant and complex information in molecular data. Here, we introduce GIT-Mol, a multi-modal large language model that integrates the structure Graph, Image, and Text information, including the Simplified Molecular Input Line Entry System (SMILES) and molecular captions. To facilitate the integration of multi-modal molecular data, we propose GIT-Former, a novel architecture capable of mapping all modalities into a unified latent space. Our study develops an innovative any-to-language molecular translation strategy and achieves a 10%-15% improvement in molecular captioning, a 5%-10% accuracy increase in property prediction, and a 20% boost in molecule generation validity compared to baseline or single-modality models.

Generative Interpretation

  • paper_url: http://arxiv.org/abs/2308.06907
  • repo_url: https://github.com/yonathanarbel/generativeinterpretation
  • paper_authors: Yonathan A. Arbel, David Hoffman
  • for: This paper aims to introduce a new approach to estimating contractual meaning using large language models.
  • methods: The paper uses grounded case studies to illustrate the capabilities of these novel tools in distinct ways, such as ascertaining ordinary meaning in context, quantifying ambiguity, filling gaps in parties’ agreements, and calculating the probative value of individual pieces of extrinsic evidence.
  • results: The paper shows that AI models can help factfinders accurately estimate what the parties intended, and that generative interpretation can unsettle the current interpretative stalemate between efficiency-minded textualists and justice-oriented contextualists.
    Abstract We introduce generative interpretation, a new approach to estimating contractual meaning using large language models. As AI triumphalism is the order of the day, we proceed by way of grounded case studies, each illustrating the capabilities of these novel tools in distinct ways. Taking well-known contracts opinions, and sourcing the actual agreements that they adjudicated, we show that AI models can help factfinders ascertain ordinary meaning in context, quantify ambiguity, and fill gaps in parties' agreements. We also illustrate how models can calculate the probative value of individual pieces of extrinsic evidence. After offering best practices for the use of these models given their limitations, we consider their implications for judicial practice and contract theory. Using LLMs permits courts to estimate what the parties intended cheaply and accurately, and as such generative interpretation unsettles the current interpretative stalemate. Their use responds to efficiency-minded textualists and justice-oriented contextualists, who argue about whether parties will prefer cost and certainty or accuracy and fairness. Parties--and courts--would prefer a middle path, in which adjudicators strive to predict what the contract really meant, admitting just enough context to approximate reality while avoiding unguided and biased assimilation of evidence. As generative interpretation offers this possibility, we argue it can become the new workhorse of contractual interpretation.

Federated Classification in Hyperbolic Spaces via Secure Aggregation of Convex Hulls

  • paper_url: http://arxiv.org/abs/2308.06895
  • repo_url: None
  • paper_authors: Saurav Prakash, Jin Sima, Chao Pan, Eli Chien, Olgica Milenkovic
  • for: This paper addresses federated learning in distributed and privatized settings, specifically classification in hyperbolic spaces.
  • methods: The paper proposes the first known approach to federated classification in hyperbolic spaces, comprising distributed versions of convex SVM classifiers for Poincaré discs, a number-theoretic label-recovery scheme based on integer $B_h$ sequences to avoid label switching, and a new quantization method for the Poincaré disc, coupled with Reed-Solomon-like encoding, to limit data leakage and communication cost.
  • results: Tests on diverse datasets, including hierarchical single-cell RNA-seq data with stringent privacy constraints, show classification accuracy up to ~11% better than the Euclidean counterpart.
    Abstract Hierarchical and tree-like data sets arise in many applications, including language processing, graph data mining, phylogeny and genomics. It is known that tree-like data cannot be embedded into Euclidean spaces of finite dimension with small distortion. This problem can be mitigated through the use of hyperbolic spaces. When such data also has to be processed in a distributed and privatized setting, it becomes necessary to work with new federated learning methods tailored to hyperbolic spaces. As an initial step towards the development of the field of federated learning in hyperbolic spaces, we propose the first known approach to federated classification in hyperbolic spaces. Our contributions are as follows. First, we develop distributed versions of convex SVM classifiers for Poincaré discs. In this setting, the information conveyed from clients to the global classifier are convex hulls of clusters present in individual client data. Second, to avoid label switching issues, we introduce a number-theoretic approach for label recovery based on the so-called integer $B_h$ sequences. Third, we compute the complexity of the convex hulls in hyperbolic spaces to assess the extent of data leakage; at the same time, in order to limit the communication cost for the hulls, we propose a new quantization method for the Poincaré disc coupled with Reed-Solomon-like encoding. Fourth, at server level, we introduce a new approach for aggregating convex hulls of the clients based on balanced graph partitioning. We test our method on a collection of diverse data sets, including hierarchical single-cell RNA-seq data from different patients distributed across different repositories that have stringent privacy constraints. The classification accuracy of our method is up to ~11% better than its Euclidean counterpart, demonstrating the importance of privacy-preserving learning in hyperbolic spaces.
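
For reference, the geodesic distance on the Poincaré disc/ball that underlies the classifiers, in a small NumPy sketch:

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance on the Poincare ball (input norms must be < 1), the
    geometry in which the paper's federated SVM classifiers operate."""
    uu = np.clip(u @ u, 0.0, 1.0 - eps)
    vv = np.clip(v @ v, 0.0, 1.0 - eps)
    duv = np.sum((u - v) ** 2)
    return np.arccosh(1.0 + 2.0 * duv / ((1.0 - uu) * (1.0 - vv)))

# Distances blow up near the boundary, which is what lets tree-like data
# embed with low distortion:
print(poincare_distance(np.array([0.0, 0.0]), np.array([0.9, 0.0])))   # ~2.94
print(poincare_distance(np.array([-0.9, 0.0]), np.array([0.9, 0.0])))  # ~5.89
```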

Bridging Offline-Online Evaluation with a Time-dependent and Popularity Bias-free Offline Metric for Recommenders

  • paper_url: http://arxiv.org/abs/2308.06885
  • repo_url: None
  • paper_authors: Petr Kasalický, Rodrigo Alves, Pavel Kordík
  • for: Evaluating recommender systems is a complex task: offline and online evaluation metrics are ambiguous about the systems' true objectives, and most recently published papers benchmark their methods with ill-posed offline methodology, which reduces the impact academic research has on industry.
  • methods: The authors investigate offline evaluation metrics that correlate with recommenders' online performance, finding that penalizing popular items and considering transaction time during evaluation significantly improve the ability to choose the best model for a live recommender system.
  • results: Results averaged over five large real-world datasets procured from live recommenders show that the proposed offline metric better reflects online performance, helping the academic community understand offline evaluation and optimization criteria that are more relevant to real applications of recommender systems.
    Abstract The evaluation of recommendation systems is a complex task. The offline and online evaluation metrics for recommender systems are ambiguous in their true objectives. The majority of recently published papers benchmark their methods using ill-posed offline evaluation methodology that often fails to predict true online performance. Because of this, the impact that academic research has on the industry is reduced. The aim of our research is to investigate and compare the online performance of offline evaluation metrics. We show that penalizing popular items and considering the time of transactions during the evaluation significantly improves our ability to choose the best recommendation model for a live recommender system. Our results, averaged over five large-size real-world live data procured from recommenders, aim to help the academic community to understand better offline evaluation and optimization criteria that are more relevant for real applications of recommender systems.

Multi-Receiver Task-Oriented Communications via Multi-Task Deep Learning

  • paper_url: http://arxiv.org/abs/2308.06884
  • repo_url: None
  • paper_authors: Yalin E. Sagduyu, Tugba Erpek, Aylin Yener, Sennur Ulukus
  • for: This paper studies task-oriented communications in a setting where a transmitter serves multiple receivers, each with its own task (e.g., image classification) to complete on data available at the transmitter, by training a common encoder at the transmitter and an individual decoder at each receiver.
  • methods: A multi-task deep learning approach jointly optimizes task completion and communication with multiple receivers; efficient resource allocation at the edge of 6G networks lets the system adapt to varying channel conditions and achieve task-specific objectives while minimizing transmission overhead.
  • results: Experiments on image classification show that multi-receiver task-oriented communications adapt better across tasks and channel conditions than single-task-oriented systems, improving classification accuracy and resource utilization.
    Abstract This paper studies task-oriented, otherwise known as goal-oriented, communications, in a setting where a transmitter communicates with multiple receivers, each with its own task to complete on a dataset, e.g., images, available at the transmitter. A multi-task deep learning approach that involves training a common encoder at the transmitter and individual decoders at the receivers is presented for joint optimization of completing multiple tasks and communicating with multiple receivers. By providing efficient resource allocation at the edge of 6G networks, the proposed approach allows the communications system to adapt to varying channel conditions and achieves task-specific objectives while minimizing transmission overhead. Joint training of the encoder and decoders using multi-task learning captures shared information across tasks and optimizes the communication process accordingly. By leveraging the broadcast nature of wireless communications, multi-receiver task-oriented communications (MTOC) reduces the number of transmissions required to complete tasks at different receivers. Performance evaluation conducted on the MNIST, Fashion MNIST, and CIFAR-10 datasets (with image classification considered for different tasks) demonstrates the effectiveness of MTOC in terms of classification accuracy and resource utilization compared to single-task-oriented communication systems.
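
A toy rendering of the training setup, with the broadcast channel reduced to additive noise and all dimensions chosen arbitrarily; the paper's encoder/decoder architectures and channel model are richer.

```python
import torch
import torch.nn as nn

class MTOCModel(nn.Module):
    """One transmitter-side encoder trained jointly with one task-specific
    decoder per receiver; the sum of per-task losses drives the shared
    representation. A sketch under our own simplifying assumptions."""
    def __init__(self, in_dim=784, latent_dim=16, task_classes=(10, 10)):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        self.decoders = nn.ModuleList(
            nn.Linear(latent_dim, c) for c in task_classes)

    def forward(self, x, noise_std=0.1):
        z = self.encoder(x)
        z = z + noise_std * torch.randn_like(z)     # broadcast channel noise
        return [dec(z) for dec in self.decoders]    # one output per receiver

model = MTOCModel()
x = torch.randn(8, 784)
labels = [torch.randint(0, 10, (8,)) for _ in range(2)]
loss = sum(nn.functional.cross_entropy(out, y)
           for out, y in zip(model(x), labels))     # joint multi-task loss
loss.backward()
```

Because the latent z is broadcast once and decoded per receiver, a single transmission serves all tasks, which is the source of the claimed transmission savings.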

Quantifying Outlierness of Funds from their Categories using Supervised Similarity

  • paper_url: http://arxiv.org/abs/2308.06882
  • repo_url: None
  • paper_authors: Dhruv Desai, Ashmita Dhiman, Tushar Sharma, Deepika Sharma, Dhagash Mehta, Stefano Pasquali
  • for: This study quantifies the impact of mutual fund miscategorization and proposes a machine learning based approach to detect and flag miscategorized funds.
  • methods: The study applies Random Forest based distance metric learning and computes a class-wise outlier measure for each data point to identify miscategorized funds.
  • results: The study finds a strong relationship between the funds' outlier measures and their future returns, and discusses the implications of this finding.
    Abstract Mutual fund categorization has become a standard tool for the investment management industry and is extensively used by allocators for portfolio construction and manager selection, as well as by fund managers for peer analysis and competitive positioning. As a result, a (unintended) miscategorization or lack of precision can significantly impact allocation decisions and investment fund managers. Here, we aim to quantify the effect of miscategorization of funds utilizing a machine learning based approach. We formulate the problem of miscategorization of funds as a distance-based outlier detection problem, where the outliers are the data-points that are far from the rest of the data-points in the given feature space. We implement and employ a Random Forest (RF) based method of distance metric learning, and compute the so-called class-wise outlier measures for each data-point to identify outliers in the data. We test our implementation on various publicly available data sets, and then apply it to mutual fund data. We show that there is a strong relationship between the outlier measures of the funds and their future returns and discuss the implications of our findings.
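
The supervised-similarity machinery can be sketched as follows: random-forest leaf co-occurrence gives a proximity matrix, and a Breiman-style class-wise outlier measure is derived from it. This is our reading of the approach, not the authors' code; the robust scaling step in particular is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rf_proximity(forest, X):
    """Supervised similarity: proximity(i, j) = fraction of trees in which
    samples i and j land in the same leaf."""
    leaves = forest.apply(X)                                  # (n_samples, n_trees)
    return (leaves[:, None, :] == leaves[None, :, :]).mean(-1)

def classwise_outlier_measure(prox, y):
    """Breiman-style outlierness: inverse of summed squared proximity to the
    sample's own class, robust-scaled (median/MAD) within each class."""
    raw = np.empty(len(y))
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        p2 = (prox[np.ix_(idx, idx)] ** 2).sum(1) - 1.0       # drop self-proximity
        raw[idx] = len(idx) / np.maximum(p2, 1e-12)
        med = np.median(raw[idx])
        mad = np.median(np.abs(raw[idx] - med)) + 1e-12
        raw[idx] = (raw[idx] - med) / mad
    return raw   # large values = likely miscategorized within their class

X = np.random.randn(60, 5)               # stand-in for fund features
y = np.repeat([0, 1, 2], 20)             # stand-in for fund categories
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
scores = classwise_outlier_measure(rf_proximity(forest, X), y)
```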

AutoSeqRec: Autoencoder for Efficient Sequential Recommendation

  • paper_url: http://arxiv.org/abs/2308.06878
  • repo_url: https://github.com/sliu675/autoseqrec
  • paper_authors: Sijia Liu, Jiahao Liu, Hansu Gu, Dongsheng Li, Tun Lu, Peng Zhang, Ning Gu
  • for: This paper targets sequential recommendation, aiming to provide an efficient and robust method for the task.
  • methods: The method is built on autoencoders and consists of an encoder and three decoders that consider both the user-item interaction matrix and the rows and columns of the item transition matrix.
  • results: Compared with existing methods, AutoSeqRec achieves higher accuracy while demonstrating robustness and efficiency.
    Abstract Sequential recommendation demonstrates the capability to recommend items by modeling the sequential behavior of users. Traditional methods typically treat users as sequences of items, overlooking the collaborative relationships among them. Graph-based methods incorporate collaborative information by utilizing the user-item interaction graph. However, these methods sometimes face challenges in terms of time complexity and computational efficiency. To address these limitations, this paper presents AutoSeqRec, an incremental recommendation model specifically designed for sequential recommendation tasks. AutoSeqRec is based on autoencoders and consists of an encoder and three decoders within the autoencoder architecture. These components consider both the user-item interaction matrix and the rows and columns of the item transition matrix. The reconstruction of the user-item interaction matrix captures user long-term preferences through collaborative filtering. In addition, the rows and columns of the item transition matrix represent the item out-degree and in-degree hopping behavior, which allows for modeling the user's short-term interests. When making incremental recommendations, only the input matrices need to be updated, without the need to update parameters, which makes AutoSeqRec very efficient. Comprehensive evaluations demonstrate that AutoSeqRec outperforms existing methods in terms of accuracy, while showcasing its robustness and efficiency.

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

  • paper_url: http://arxiv.org/abs/2308.06873
  • repo_url: None
  • paper_authors: Xiaofei Wang, Manthan Thakker, Zhuo Chen, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu, Jinyu Li, Takuya Yoshioka
  • for: High-quality zero-shot text-to-speech and a variety of speech transformation tasks
  • methods: A neural codec language model combined with multi-task learning and task-dependent prompting
  • results: Comparable or superior performance to specialized models across tasks, including zero-shot TTS, noise suppression, target speaker extraction, speech removal, and speech editing with or without background noise
    Abstract Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech. However, existing models still face limitations in handling diverse audio-text speech generation tasks involving transforming input speech and processing audio captured in adverse acoustic conditions. This paper introduces SpeechX, a versatile speech generation model capable of zero-shot TTS and various speech transformation tasks, dealing with both clean and noisy signals. SpeechX combines neural codec language modeling with multi-task learning using task-dependent prompting, enabling unified and extensible modeling and providing a consistent way for leveraging textual input in speech enhancement and transformation tasks. Experimental results show SpeechX's efficacy in various tasks, including zero-shot TTS, noise suppression, target speaker extraction, speech removal, and speech editing with or without background noise, achieving comparable or superior performance to specialized models across tasks. See https://aka.ms/speechx for demo samples.

Semi-Supervised Dual-Stream Self-Attentive Adversarial Graph Contrastive Learning for Cross-Subject EEG-based Emotion Recognition

  • paper_url: http://arxiv.org/abs/2308.11635
  • repo_url: None
  • paper_authors: Weishan Ye, Zhiguo Zhang, Min Zhang, Fei Teng, Li Zhang, Linling Li, Gan Huang, Jianhong Wang, Dong Ni, Zhen Liang
  • for: This work addresses the limited label availability in EEG-based emotion recognition, improving the accuracy and reliability of cross-subject recognition.
  • methods: The paper proposes a semi-supervised Dual-stream Self-Attentive Adversarial Graph Contrastive learning framework (DS-AGC) with two parallel streams for non-structural and structural EEG features. The non-structural stream uses semi-supervised multi-domain adaptation to alleviate the distribution discrepancy among the labeled source domain, unlabeled source domain, and unknown target domain; the structural stream develops graph contrastive learning to extract effective graph-based feature representations from multiple EEG channels. A self-attentive fusion module then performs feature fusion, sample selection, and emotion recognition, emphasizing EEG features more relevant to emotions and labeled source samples closer to the target domain.
  • results: Extensive experiments on two benchmark databases (SEED and SEED-IV), under a semi-supervised cross-subject leave-one-subject-out cross-validation scheme, show that the model outperforms existing methods under different incomplete-label conditions, with average improvements of 5.83% on SEED and 6.99% on SEED-IV, demonstrating its effectiveness in addressing label scarcity in cross-subject EEG-based emotion recognition.
    Abstract Electroencephalography (EEG) is an objective tool for emotion recognition with promising applications. However, the scarcity of labeled data remains a major challenge in this field, limiting the widespread use of EEG-based emotion recognition. In this paper, a semi-supervised Dual-stream Self-Attentive Adversarial Graph Contrastive learning framework (termed as DS-AGC) is proposed to tackle the challenge of limited labeled data in cross-subject EEG-based emotion recognition. The DS-AGC framework includes two parallel streams for extracting non-structural and structural EEG features. The non-structural stream incorporates a semi-supervised multi-domain adaptation method to alleviate distribution discrepancy among labeled source domain, unlabeled source domain, and unknown target domain. The structural stream develops a graph contrastive learning method to extract effective graph-based feature representation from multiple EEG channels in a semi-supervised manner. Further, a self-attentive fusion module is developed for feature fusion, sample selection, and emotion recognition, which highlights EEG features more relevant to emotions and data samples in the labeled source domain that are closer to the target domain. Extensive experiments conducted on two benchmark databases (SEED and SEED-IV) using a semi-supervised cross-subject leave-one-subject-out cross-validation evaluation scheme show that the proposed model outperforms existing methods under different incomplete label conditions (with an average improvement of 5.83% on SEED and 6.99% on SEED-IV), demonstrating its effectiveness in addressing the label scarcity problem in cross-subject EEG-based emotion recognition.

Effect of Choosing Loss Function when Using T-batching for Representation Learning on Dynamic Networks

  • paper_url: http://arxiv.org/abs/2308.06862
  • repo_url: https://github.com/erfanloghmani/effect-of-loss-function-tbatching
  • paper_authors: Erfan Loghmani, MohammadAmin Fazli
  • for: This paper examines representation learning on dynamic networks, aiming to improve both the training efficiency and the accuracy of dynamic network models.
  • methods: The paper proposes two alternative training loss functions and shows through mathematical analysis that they overcome the issues of the loss function commonly used with t-batching, improving training performance.
  • results: Experiments on synthetic and real-world dynamic networks show that training with the proposed loss functions consistently improves performance; on a real-world network with diverse user interaction histories, they achieve more than a 26.9% improvement in Mean Reciprocal Rank (MRR) and more than 11.8% in Recall@10.
    Abstract Representation learning methods have revolutionized machine learning on networks by converting discrete network structures into continuous domains. However, dynamic networks that evolve over time pose new challenges. To address this, dynamic representation learning methods have gained attention, offering benefits like reduced learning time and improved accuracy by utilizing temporal information. T-batching is a valuable technique for training dynamic network models that reduces training time while preserving vital conditions for accurate modeling. However, we have identified a limitation in the training loss function used with t-batching. Through mathematical analysis, we propose two alternative loss functions that overcome these issues, resulting in enhanced training performance. We extensively evaluate the proposed loss functions on synthetic and real-world dynamic networks. The results consistently demonstrate superior performance compared to the original loss function. Notably, in a real-world network characterized by diverse user interaction histories, the proposed loss functions achieved more than 26.9% enhancement in Mean Reciprocal Rank (MRR) and more than 11.8% improvement in Recall@10. These findings underscore the efficacy of the proposed loss functions in dynamic network modeling.

Optimizing Offensive Gameplan in the National Basketball Association with Machine Learning

  • paper_url: http://arxiv.org/abs/2308.06851
  • repo_url: None
  • paper_authors: Eamon Mukhopadhyay
  • for: The goal of this study is to verify the effectiveness of basketball metrics by relating them to NBA playtypes.
  • methods: The study uses machine learning with a selected set of features to model an existing metric; both a linear regression model and a neural network regression model are tested.
  • results: Both models relate ORTG (Offensive Rating, developed by Dean Oliver) to different NBA playtypes, with the neural network performing slightly better than linear regression. Using model accuracy as justification, the model's output was then optimized over test examples to find the combination of features that best achieves a highly functioning offense.
    Abstract Throughout the analytical revolution that has occurred in the NBA, the development of specific metrics and formulas has given teams, coaches, and players a new way to see the game. However - the question arises - how can we verify any metrics? One method would simply be eyeball approximation (trying out many different gameplans) and/or trial and error - an estimation-based and costly approach. Another approach is to try to model already existing metrics with a unique set of features using machine learning techniques. The key to this approach is that with these features that are selected, we can try to gauge the effectiveness of these features combined, rather than using individual analysis in simple metric evaluation. If we have an accurate model, it can particularly help us determine the specifics of gameplan execution. In this paper, the statistic ORTG (Offensive Rating, developed by Dean Oliver) was found to have a correlation with different NBA playtypes using both a linear regression model and a neural network regression model, although ultimately, a neural network worked slightly better than linear regression. Using the accuracy of the models as a justification, the next step was to optimize the output of the model with test examples, which would demonstrate the combination of features to best achieve a highly functioning offense.
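
A minimal version of the model comparison, with synthetic stand-ins for the playtype features (the paper uses real NBA playtype tracking data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

# Hypothetical playtype frequency/efficiency features per team-season.
rng = np.random.default_rng(0)
X = rng.random((120, 8))               # e.g. pick&roll, isolation, ... shares
ortg = 100 + 15 * X @ rng.random(8) + rng.normal(0, 1, 120)

lin = LinearRegression()
net = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
print("linear R^2:", cross_val_score(lin, X, ortg, cv=5).mean())
print("net R^2:   ", cross_val_score(net, X, ortg, cv=5).mean())
# The better model can then be probed (e.g. by varying one playtype share at a
# time on test examples) to find feature mixes that maximize predicted ORTG.
```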

When Monte-Carlo Dropout Meets Multi-Exit: Optimizing Bayesian Neural Networks on FPGA

  • paper_url: http://arxiv.org/abs/2308.06849
  • repo_url: https://github.com/os-hxfan/bayesnn_fpga
  • paper_authors: Hongxiang Fan, Hao Chen, Liam Castelli, Zhiqiang Que, He Li, Kenneth Long, Wayne Luk
  • for: To provide calibrated predictions for safety-critical applications such as medical imaging and autonomous driving.
  • methods: The paper proposes a multi-exit Monte-Carlo Dropout (MCD) based Bayesian neural network that achieves well-calibrated predictions with low algorithmic complexity, together with a transformation framework that generates FPGA-based accelerators for such networks, introducing several novel optimizations to improve hardware performance.
  • results: Compared with CPU, GPU, and other state-of-the-art hardware implementations, the auto-generated accelerator achieves higher energy efficiency.
    Abstract Bayesian Neural Networks (BayesNNs) have demonstrated their capability of providing calibrated prediction for safety-critical applications such as medical imaging and autonomous driving. However, the high algorithmic complexity and the poor hardware performance of BayesNNs hinder their deployment in real-life applications. To bridge this gap, this paper proposes a novel multi-exit Monte-Carlo Dropout (MCD)-based BayesNN that achieves well-calibrated predictions with low algorithmic complexity. To further reduce the barrier to adopting BayesNNs, we propose a transformation framework that can generate FPGA-based accelerators for multi-exit MCD-based BayesNNs. Several novel optimization techniques are introduced to improve hardware performance. Our experiments demonstrate that our auto-generated accelerator achieves higher energy efficiency than CPU, GPU, and other state-of-the-art hardware implementations.
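
A software-level sketch of the multi-exit Monte-Carlo Dropout idea: dropout stays active at inference, and predictions are averaged over exits and stochastic forward passes. The layer sizes and exit placement here are arbitrary; the paper's contribution is mapping such networks efficiently onto FPGAs.

```python
import torch
import torch.nn as nn

class MultiExitMCDNet(nn.Module):
    """Toy multi-exit network with dropout: each exit produces a prediction,
    so one stochastic pass already yields several Monte-Carlo-style samples."""
    def __init__(self, num_classes=10, p=0.2):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Dropout(p))
        self.block2 = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Dropout(p))
        self.exit1 = nn.Linear(256, num_classes)
        self.exit2 = nn.Linear(256, num_classes)

    def forward(self, x):
        h1 = self.block1(x)
        h2 = self.block2(h1)
        return self.exit1(h1), self.exit2(h2)

def mc_predict(model, x, samples=8):
    model.train()                      # keep dropout stochastic (MC dropout)
    with torch.no_grad():
        probs = [torch.softmax(logits, -1)
                 for _ in range(samples) for logits in model(x)]
    return torch.stack(probs).mean(0)  # average over exits and MC samples

net = MultiExitMCDNet()
calibrated = mc_predict(net, torch.randn(4, 784))
```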

Generalizing Topological Graph Neural Networks with Paths

  • paper_url: http://arxiv.org/abs/2308.06838
  • repo_url: None
  • paper_authors: Quang Truong, Peter Chin
  • for: This paper studies the expressive limitations of Graph Neural Networks (GNNs) and how to move past them.
  • methods: The paper proposes a path-centric approach that improves GNN expressiveness without any assumptions about graph sub-structures.
  • results: The approach achieves state-of-the-art performance on several benchmarks.
    Abstract While Graph Neural Networks (GNNs) have made significant strides in diverse areas, they are hindered by a theoretical constraint known as the 1-Weisfeiler-Lehmann test. Even though latest advancements in higher-order GNNs can overcome this boundary, they typically center around certain graph components like cliques or cycles. However, our investigation goes a different route. We put emphasis on paths, which are inherent in every graph. We are able to construct a more general topological perspective and form a bridge to certain established theories about other topological domains. Interestingly, without any assumptions on graph sub-structures, our approach surpasses earlier techniques in this field, achieving state-of-the-art performance on several benchmarks.

InTune: Reinforcement Learning-based Data Pipeline Optimization for Deep Recommendation Models

  • paper_url: http://arxiv.org/abs/2308.08500
  • repo_url: None
  • paper_authors: Kabir Nagrecha, Lingyi Liu, Pablo Delgado, Prasanna Padmanabhan
  • for: This paper examines the data ingestion problem in training deep learning recommendation models (DLRMs), and the bottlenecks and challenges it poses in real-world settings.
  • methods: The paper applies reinforcement learning (RL): an RL agent learns how to distribute the CPU resources of a trainer machine across a DLRM data pipeline to more effectively parallelize data loading and improve throughput.
  • results: Experiments show that InTune builds an optimized data-pipeline configuration within only a few minutes and integrates easily into existing training workflows, raising online ingestion rates and so reducing idle time in model execution. Applied to a real-world cluster, it increases data ingestion throughput by as much as 2.29X over state-of-the-art data-pipeline optimizers while also improving CPU and GPU utilization.
    Abstract Deep learning-based recommender models (DLRMs) have become an essential component of many modern recommender systems. Several companies are now building large compute clusters reserved only for DLRM training, driving new interest in cost- and time- saving optimizations. The systems challenges faced in this setting are unique; while typical deep learning training jobs are dominated by model execution, the most important factor in DLRM training performance is often online data ingestion. In this paper, we explore the unique characteristics of this data ingestion problem and provide insights into DLRM training pipeline bottlenecks and challenges. We study real-world DLRM data processing pipelines taken from our compute cluster at Netflix to observe the performance impacts of online ingestion and to identify shortfalls in existing pipeline optimizers. We find that current tooling either yields sub-optimal performance, frequent crashes, or else requires impractical cluster re-organization to adopt. Our studies lead us to design and build a new solution for data pipeline optimization, InTune. InTune employs a reinforcement learning (RL) agent to learn how to distribute the CPU resources of a trainer machine across a DLRM data pipeline to more effectively parallelize data loading and improve throughput. Our experiments show that InTune can build an optimized data pipeline configuration within only a few minutes, and can easily be integrated into existing training workflows. By exploiting the responsiveness and adaptability of RL, InTune achieves higher online data ingestion rates than existing optimizers, thus reducing idle times in model execution and increasing efficiency. We apply InTune to our real-world cluster, and find that it increases data ingestion throughput by as much as 2.29X versus state-of-the-art data pipeline optimizers while also improving both CPU & GPU utilization.
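
Schematically, the controller's job resembles the toy loop below, which moves CPU cores between pipeline stages and keeps changes that improve measured throughput. InTune itself learns an RL policy over a richer state; the stage names and the measure() hook here are placeholders.

```python
import random

class PipelineTuner:
    """Toy epsilon-greedy resource re-allocator: each step shifts one core
    between two stages, reverting moves that hurt throughput (except with
    probability eps, for exploration)."""
    def __init__(self, stages, total_cores):
        self.alloc = {s: total_cores // len(stages) for s in stages}

    def step(self, measure, eps=0.2):
        src, dst = random.sample(list(self.alloc), 2)
        if self.alloc[src] <= 1:
            return
        before = measure(self.alloc)
        self.alloc[src] -= 1
        self.alloc[dst] += 1
        if measure(self.alloc) < before and random.random() > eps:
            self.alloc[src] += 1          # revert the bad move
            self.alloc[dst] -= 1

stages = ["read", "decode", "shuffle", "batch"]
tuner = PipelineTuner(stages, total_cores=16)
fake_throughput = lambda alloc: min(alloc.values())  # stand-in bottleneck probe
for _ in range(50):
    tuner.step(fake_throughput)
print(tuner.alloc)
```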

An Ensemble Approach to Question Classification: Integrating Electra Transformer, GloVe, and LSTM

  • paper_url: http://arxiv.org/abs/2308.06828
  • repo_url: None
  • paper_authors: Sanad Aburass, Osama Dorgham
  • for: This work proposes a novel ensemble approach for question classification.
  • methods: The model integrates the state-of-the-art Electra, GloVe, and LSTM models, combining their strengths to improve the accuracy and efficiency of question classification.
  • results: On the TREC question classification dataset, the ensemble achieves an accuracy of 0.8 on the test set, outperforming BERT, RoBERTa, and DistilBERT across all evaluation metrics. These results highlight the benefits of ensembling for question classification and invite further exploration of ensemble methods in natural language processing.
    Abstract This paper introduces a novel ensemble approach for question classification using state-of-the-art models -- Electra, GloVe, and LSTM. The proposed model is trained and evaluated on the TREC dataset, a well-established benchmark for question classification tasks. The ensemble model combines the strengths of Electra, a transformer-based model for language understanding, GloVe, a global vectors for word representation, and LSTM, a recurrent neural network variant, providing a robust and efficient solution for question classification. Extensive experiments were carried out to compare the performance of the proposed ensemble approach with other cutting-edge models, such as BERT, RoBERTa, and DistilBERT. Our results demonstrate that the ensemble model outperforms these models across all evaluation metrics, achieving an accuracy of 0.8 on the test set. These findings underscore the effectiveness of the ensemble approach in enhancing the performance of question classification tasks, and invite further exploration of ensemble methods in natural language processing.
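
The ensembling step itself reduces to weighted soft voting over the per-model class probabilities; a generic sketch (the weights here are illustrative, not tuned values from the paper):

```python
import numpy as np

def soft_vote(prob_list, weights=None):
    """Weighted soft voting: prob_list is a list of (n_samples, n_classes)
    probability arrays, one per model (e.g. Electra and a GloVe+LSTM net)."""
    weights = np.ones(len(prob_list)) if weights is None else np.asarray(weights)
    stacked = np.stack(prob_list)                        # (n_models, n, c)
    avg = np.tensordot(weights / weights.sum(), stacked, axes=1)
    return avg.argmax(-1)

# Stand-ins for each model's predicted class probabilities on two questions:
p_electra = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
p_lstm    = np.array([[0.6, 0.3, 0.1], [0.2, 0.6, 0.2]])
print(soft_vote([p_electra, p_lstm], weights=[0.6, 0.4]))  # -> [0 1]
```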

Reinforcement Graph Clustering with Unknown Cluster Number

  • paper_url: http://arxiv.org/abs/2308.06827
  • repo_url: https://github.com/yueliu1999/awesome-deep-graph-clustering
  • paper_authors: Yue Liu, Ke Liang, Jun Xia, Xihong Yang, Sihang Zhou, Meng Liu, Xinwang Liu, Stan Z. Li
  • for: This paper proposes an unsupervised deep graph clustering method that does not require a predefined cluster number, which is often unavailable in real-world scenarios.
  • methods: Cluster number determination and unsupervised representation learning are unified in one framework via a reinforcement learning mechanism. Discriminative node representations are first learned with a contrastive pretext task; then, considering both node and cluster states, a quality network evaluates the quality of different cluster numbers and a greedy action selects the best one. A clustering-oriented reward function enhances the cohesion within clusters and the separation between them.
  • results: Experiments show that the proposed method performs deep graph clustering effectively and efficiently, outperforming existing methods. The code and datasets are available on GitHub.
    Abstract Deep graph clustering, which aims to group nodes into disjoint clusters by neural networks in an unsupervised manner, has attracted great attention in recent years. Although the performance has been largely improved, the excellent performance of the existing methods heavily relies on an accurately predefined cluster number, which is not always available in the real-world scenario. To enable the deep graph clustering algorithms to work without the guidance of the predefined cluster number, we propose a new deep graph clustering method termed Reinforcement Graph Clustering (RGC). In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework by the reinforcement learning mechanism. Concretely, the discriminative node representations are first learned with the contrastive pretext task. Then, to capture the clustering state accurately with both local and global information in the graph, both node and cluster states are considered. Subsequently, at each state, the qualities of different cluster numbers are evaluated by the quality network, and the greedy action is executed to determine the cluster number. In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters. Extensive experiments demonstrate the effectiveness and efficiency of our proposed method. The source code of RGC is shared at https://github.com/yueliu1999/RGC and a collection (papers, codes and, datasets) of deep graph clustering is shared at https://github.com/yueliu1999/Awesome-Deep-Graph-Clustering on Github.

Approximate and Weighted Data Reconstruction Attack in Federated Learning

  • paper_url: http://arxiv.org/abs/2308.06822
  • repo_url: None
  • paper_authors: Ziqi Wang, Yongcun Song, Enrique Zuazua
  • for: This work attacks federated learning (FL) under the horizontal Federated Averaging (FedAvg) scheme, showing that an attacker can recover clients' training data even though that data is never shared.
  • methods: An interpolation-based approximation method generates the intermediate model updates of the clients' local training processes, making attacks on FedAvg scenarios feasible. A layer-wise weighted loss function, with per-layer weights tuned by Bayesian optimization, improves the quality of the data reconstruction.
  • results: Experiments show that the proposed Approximate and Weighted Attack (AWA) substantially improves image data reconstruction over other state-of-the-art methods across different evaluation metrics.
    Abstract Federated Learning (FL) is a distributed learning paradigm that enables multiple clients to collaborate on building a machine learning model without sharing their private data. Although FL is considered privacy-preserved by design, recent data reconstruction attacks demonstrate that an attacker can recover clients' training data based on the parameters shared in FL. However, most existing methods fail to attack the most widely used horizontal Federated Averaging (FedAvg) scenario, where clients share model parameters after multiple local training steps. To tackle this issue, we propose an interpolation-based approximation method, which makes attacking FedAvg scenarios feasible by generating the intermediate model updates of the clients' local training processes. Then, we design a layer-wise weighted loss function to improve the data quality of reconstruction. We assign different weights to model updates in different layers concerning the neural network structure, with the weights tuned by Bayesian optimization. Finally, experimental results validate the superiority of our proposed approximate and weighted attack (AWA) method over the other state-of-the-art methods, as demonstrated by the substantial improvement in different evaluation metrics for image data reconstructions.

SoK: Realistic Adversarial Attacks and Defenses for Intelligent Network Intrusion Detection

  • paper_url: http://arxiv.org/abs/2308.06819
  • repo_url: None
  • paper_authors: João Vitorino, Isabel Praça, Eva Maia
  • for: This work consolidates the state of adversarial learning in network intrusion detection, focusing on approaches that can generate realistic adversarial examples.
  • methods: The survey covers a range of adversarial attack methods, including black-box and gray-box attacks and detection-evasion attacks.
  • results: By analyzing these attack methods, the paper identifies open challenges and future research directions, defines the fundamental properties required for an adversarial example to be realistic, and provides recommendations for realistic application scenarios.
    Abstract Machine Learning (ML) can be incredibly valuable to automate anomaly detection and cyber-attack classification, improving the way that Network Intrusion Detection (NID) is performed. However, despite the benefits of ML models, they are highly susceptible to adversarial cyber-attack examples specifically crafted to exploit them. A wide range of adversarial attacks have been created and researchers have worked on various defense strategies to safeguard ML models, but most were not intended for the specific constraints of a communication network and its communication protocols, so they may lead to unrealistic examples in the NID domain. This Systematization of Knowledge (SoK) consolidates and summarizes the state-of-the-art adversarial learning approaches that can generate realistic examples and could be used in real ML development and deployment scenarios with real network traffic flows. This SoK also describes the open challenges regarding the use of adversarial ML in the NID domain, defines the fundamental properties that are required for an adversarial example to be realistic, and provides guidelines for researchers to ensure that their future experiments are adequate for a real communication network.

SAILOR: Structural Augmentation Based Tail Node Representation Learning

  • paper_url: http://arxiv.org/abs/2308.06801
  • repo_url: https://github.com/jie-re/sailor
  • paper_authors: Jie Liao, Jintang Li, Liang Chen, Bingzhe Wu, Yatao Bian, Zibin Zheng
  • for: This paper proposes SAILOR, a framework for improving the representations of tail nodes in graphs, which jointly learns to augment the graph structure and to extract more informative tail node representations.
  • methods: The framework combines message propagation with structural augmentation, which together compensate for the structural information tail nodes lack.
  • results: Experiments on public benchmark datasets show that SAILOR significantly improves tail node representations and outperforms state-of-the-art baselines.
    Abstract Graph Neural Networks (GNNs) have achieved state-of-the-art performance in representation learning for graphs recently. However, the effectiveness of GNNs, which capitalize on the key operation of message propagation, highly depends on the quality of the topology structure. Most of the graphs in real-world scenarios follow a long-tailed distribution on their node degrees, that is, a vast majority of the nodes in the graph are tail nodes with only a few connected edges. GNNs produce inferior node representations for tail nodes since they lack structural information. In the pursuit of promoting the expressiveness of GNNs for tail nodes, we explore how the deficiency of structural information deteriorates the performance of tail nodes and propose a general Structural Augmentation based taIL nOde Representation learning framework, dubbed as SAILOR, which can jointly learn to augment the graph structure and extract more informative representations for tail nodes. Extensive experiments on public benchmark datasets demonstrate that SAILOR can significantly improve the tail node representations and outperform the state-of-the-art baselines.