cs.LG - 2023-07-19

Android in the Wild: A Large-Scale Dataset for Android Device Control

  • paper_url: http://arxiv.org/abs/2307.10088
  • repo_url: https://github.com/google-research/google-research
  • paper_authors: Christopher Rawles, Alice Li, Daniel Rodriguez, Oriana Riva, Timothy Lillicrap
  • For: This work builds a large-scale device-control dataset to support research on systems that can interpret human natural-language instructions and execute them by directly controlling a device's user interface.
  • Methods: The dataset contains human demonstrations of device interactions paired with natural-language instructions. It comprises 715k episodes, 30k unique instructions, four versions of Android (v10-13), and eight device types (Pixel 2 XL to Pixel 6) with varying screen resolutions, and it includes multi-step tasks that require semantic understanding of both language and visual context.
  • Results: The dataset poses a new challenge: the available actions must be inferred from their visual appearance, and instead of simple UI-element-based actions the action space consists of precise gestures (e.g., horizontal scrolls to operate carousel widgets). The dataset is organized to support robustness analysis of device-control systems, i.e., how a system performs under new task descriptions, applications, or platform versions; two agents are developed and evaluated across the dataset, which can be downloaded at https://github.com/google-research/google-research/tree/master/android_in_the_wild.
    Abstract There is a growing interest in device-control systems that can interpret human natural language instructions and execute them on a digital device by directly controlling its user interface. We present a dataset for device-control research, Android in the Wild (AITW), which is orders of magnitude larger than current datasets. The dataset contains human demonstrations of device interactions, including the screens and actions, and corresponding natural language instructions. It consists of 715k episodes spanning 30k unique instructions, four versions of Android (v10-13), and eight device types (Pixel 2 XL to Pixel 6) with varying screen resolutions. It contains multi-step tasks that require semantic understanding of language and visual context. This dataset poses a new challenge: actions available through the user interface must be inferred from their visual appearance. And, instead of simple UI element-based actions, the action space consists of precise gestures (e.g., horizontal scrolls to operate carousel widgets). We organize our dataset to encourage robustness analysis of device-control systems, i.e., how well a system performs in the presence of new task descriptions, new applications, or new platform versions. We develop two agents and report performance across the dataset. The dataset is available at https://github.com/google-research/google-research/tree/master/android_in_the_wild.
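
As a quick orientation, here is a minimal sketch of iterating over demonstration episodes of this kind. The field names (`instruction`, `steps`, `action_type`, `touch_xy`) are hypothetical placeholders and not the official AITW schema, which is documented in the repository linked above.

```python
# Sketch only: the episode structure below is an assumed JSON-like layout,
# not the real AITW TFRecord features.
from typing import Dict, List

def episode_stats(episodes: List[Dict]) -> Dict[str, float]:
    """Count steps and gesture actions in a list of demonstration episodes."""
    n_steps, n_scrolls = 0, 0
    for ep in episodes:
        for step in ep["steps"]:                     # each step: one screen + one action
            n_steps += 1
            if step["action_type"] == "scroll":      # precise gestures, not UI-element ids
                n_scrolls += 1
    return {"episodes": len(episodes),
            "steps": n_steps,
            "scroll_fraction": n_scrolls / max(n_steps, 1)}

# Toy example following the assumed schema:
toy = [{"instruction": "open settings and enable wifi",
        "steps": [{"action_type": "tap", "touch_xy": (0.5, 0.9)},
                  {"action_type": "scroll", "touch_xy": (0.5, 0.7)}]}]
print(episode_stats(toy))
```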

A Dual Formulation for Probabilistic Principal Component Analysis

  • paper_url: http://arxiv.org/abs/2307.10078
  • repo_url: None
  • paper_authors: Henri De Plaen, Johan A. K. Suykens
  • for: This work characterizes Probabilistic Principal Component Analysis in Hilbert spaces and shows that the optimal solution admits a representation in the dual space, which enables a generative framework for kernel methods.
  • methods: The study combines Probabilistic Principal Component Analysis with dual-space (kernel) techniques.
  • results: The framework is shown to encompass Kernel Principal Component Analysis as a special case, and its working is illustrated on a toy example and on a real dataset.
    Abstract In this paper, we characterize Probabilistic Principal Component Analysis in Hilbert spaces and demonstrate how the optimal solution admits a representation in dual space. This allows us to develop a generative framework for kernel methods. Furthermore, we show how it englobes Kernel Principal Component Analysis and illustrate its working on a toy and a real dataset.
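
For context, the standard finite-dimensional probabilistic PCA generative model is recalled below. This is textbook background rather than the paper's Hilbert-space notation, in which the dual representation of the optimal solution is derived.

```latex
% Standard probabilistic PCA model with q latent dimensions; the paper studies
% its Hilbert-space formulation and the dual representation of the solution.
\begin{aligned}
\mathbf{z} &\sim \mathcal{N}(\mathbf{0},\, \mathbf{I}_q), \\
\mathbf{x} \mid \mathbf{z} &\sim \mathcal{N}\!\left(\mathbf{W}\mathbf{z} + \boldsymbol{\mu},\ \sigma^2 \mathbf{I}_d\right), \\
\text{so that}\quad \mathbf{x} &\sim \mathcal{N}\!\left(\boldsymbol{\mu},\ \mathbf{W}\mathbf{W}^\top + \sigma^2 \mathbf{I}_d\right).
\end{aligned}
```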

Unsupervised Accuracy Estimation of Deep Visual Models using Domain-Adaptive Adversarial Perturbation without Source Samples

  • paper_url: http://arxiv.org/abs/2307.10062
  • repo_url: None
  • paper_authors: JoonHo Lee, Jae Oh Woo, Hankyu Moon, Kwonho Lee
  • for: This paper addresses the performance drop that can occur when deploying deep visual models, caused by discrepancies between the source and target distributions.
  • methods: It proposes a new source-free framework for estimating model accuracy on unlabeled target data, without access to source data or labels. Pseudo-labels are used to estimate accuracy in the target domain, building on recent source-free domain adaptation algorithms.
  • results: Experiments show that, without source data or labels, the method effectively handles challenging distribution-shift scenarios and outperforms existing methods that require source data and labels for training.
    Abstract Deploying deep visual models can lead to performance drops due to the discrepancies between source and target distributions. Several approaches leverage labeled source data to estimate target domain accuracy, but accessing labeled source data is often prohibitively difficult due to data confidentiality or resource limitations on serving devices. Our work proposes a new framework to estimate model accuracy on unlabeled target data without access to source data. We investigate the feasibility of using pseudo-labels for accuracy estimation and evolve this idea into adopting recent advances in source-free domain adaptation algorithms. Our approach measures the disagreement rate between the source hypothesis and the target pseudo-labeling function, adapted from the source hypothesis. We mitigate the impact of erroneous pseudo-labels that may arise due to a high ideal joint hypothesis risk by employing adaptive adversarial perturbation on the input of the target model. Our proposed source-free framework effectively addresses the challenging distribution shift scenarios and outperforms existing methods requiring source data and labels for training.
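
A minimal sketch of the disagreement-based quantity at the core of this framework is shown below: the rate at which the frozen source model and a target-adapted pseudo-labeling model disagree on unlabeled target inputs. The source-free adaptation step and the adaptive adversarial perturbation described in the abstract are omitted here.

```python
# Simplified sketch, assuming logits are already available from both models.
import numpy as np

def disagreement_rate(source_logits: np.ndarray, adapted_logits: np.ndarray) -> float:
    """Fraction of unlabeled target samples on which the two hypotheses differ."""
    source_pred = source_logits.argmax(axis=1)
    pseudo_labels = adapted_logits.argmax(axis=1)   # from the adapted target model
    return float(np.mean(source_pred != pseudo_labels))

# If the adapted pseudo-labels are mostly correct, the source model's target
# accuracy can be approximated as 1 - disagreement_rate(...).
rng = np.random.default_rng(0)
source_logits = rng.normal(size=(1000, 10))
adapted_logits = source_logits + 0.5 * rng.normal(size=(1000, 10))
print(1.0 - disagreement_rate(source_logits, adapted_logits))
```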

Accurate deep learning sub-grid scale models for large eddy simulations

  • paper_url: http://arxiv.org/abs/2307.10060
  • repo_url: None
  • paper_authors: Rikhi Bose, Arunabha M. Roy
  • for: Two families of sub-grid scale (SGS) turbulence models are developed for large-eddy simulation (LES) purposes.
  • methods: The models are built with physics-informed deep learning (DL) algorithms which, unlike traditional analytical modeling techniques, can represent high-order, complex non-linear relations between inputs and outputs.
  • results: Tests show that the simpler of the two architectures has better feature-learning capability and achieves better statistical performance when predicting the SGS stresses across different filter widths and Reynolds numbers.
    Abstract We present two families of sub-grid scale (SGS) turbulence models developed for large-eddy simulation (LES) purposes. Their development required the formulation of physics-informed robust and efficient Deep Learning (DL) algorithms which, unlike state-of-the-art analytical modeling techniques can produce high-order complex non-linear relations between inputs and outputs. Explicit filtering of data from direct simulations of the canonical channel flow at two friction Reynolds numbers $Re_\tau\approx 395$ and 590 provided accurate data for training and testing. The two sets of models use different network architectures. One of the architectures uses tensor basis neural networks (TBNN) and embeds the simplified analytical model form of the general effective-viscosity hypothesis, thus incorporating the Galilean, rotational and reflectional invariances. The other architecture is that of a relatively simple network, that is able to incorporate the Galilean invariance only. However, this simpler architecture has better feature extraction capacity owing to its ability to establish relations between and extract information from cross-components of the integrity basis tensors and the SGS stresses. Both sets of models are used to predict the SGS stresses for feature datasets generated with different filter widths, and at different Reynolds numbers. It is shown that due to the simpler model's better feature learning capabilities, it outperforms the invariance embedded model in statistical performance metrics. In a priori tests, both sets of models provide similar levels of dissipation and backscatter. Based on the test results, both sets of models should be usable in a posteriori actual LESs.
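
For reference, the quantity such models are trained to predict is the sub-grid scale stress tensor; its textbook definition is recalled below (standard LES notation, not taken from the paper).

```latex
% SGS stress tensor arising from filtering the Navier-Stokes equations;
% the overbar denotes the LES filtering operation.
\tau_{ij} \;=\; \overline{u_i u_j} \;-\; \bar{u}_i\, \bar{u}_j .
```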

Convergence Guarantees for Stochastic Subgradient Methods in Nonsmooth Nonconvex Optimization

  • paper_url: http://arxiv.org/abs/2307.10053
  • repo_url: https://github.com/xnchxy/GeneralSGD
  • paper_authors: Nachuan Xiao, Xiaoyin Hu, Kim-Chuan Toh
  • for: investigate the convergence properties of the stochastic gradient descent (SGD) method and its variants, especially in training neural networks built from nonsmooth activation functions.
  • methods: develop a novel framework that assigns different timescales to stepsizes for updating the momentum terms and variables, and prove the global convergence of the proposed framework in both single-timescale and two-timescale cases.
  • results: prove the convergence properties of SGD-type methods based on the proposed framework, including heavy-ball SGD, SignSGD, Lion, normalized SGD and clipped SGD, and demonstrate the high efficiency of these methods through preliminary numerical experiments.
    Abstract In this paper, we investigate the convergence properties of the stochastic gradient descent (SGD) method and its variants, especially in training neural networks built from nonsmooth activation functions. We develop a novel framework that assigns different timescales to stepsizes for updating the momentum terms and variables, respectively. Under mild conditions, we prove the global convergence of our proposed framework in both single-timescale and two-timescale cases. We show that our proposed framework encompasses a wide range of well-known SGD-type methods, including heavy-ball SGD, SignSGD, Lion, normalized SGD and clipped SGD. Furthermore, when the objective function adopts a finite-sum formulation, we prove the convergence properties for these SGD-type methods based on our proposed framework. In particular, we prove that these SGD-type methods find the Clarke stationary points of the objective function with randomly chosen stepsizes and initial points under mild assumptions. Preliminary numerical experiments demonstrate the high efficiency of our analyzed SGD-type methods.
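
As one concrete member of the family of methods covered by the analysis, the generic heavy-ball (momentum) SGD update is shown below; this is the standard textbook form, not the paper's two-timescale formulation.

```latex
% Heavy-ball SGD: m is the momentum buffer, \nabla f(x_k;\xi_k) the stochastic
% gradient at iterate x_k with sample \xi_k.
\begin{aligned}
m_{k+1} &= \beta\, m_k + \nabla f(x_k;\,\xi_k), \\
x_{k+1} &= x_k - \alpha_k\, m_{k+1}, \qquad \beta \in [0,1),\ \alpha_k > 0 .
\end{aligned}
```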

Contextual Reliability: When Different Features Matter in Different Contexts

  • paper_url: http://arxiv.org/abs/2307.10026
  • repo_url: None
  • paper_authors: Gaurav Ghosal, Amrith Setlur, Daniel S. Brown, Anca D. Dragan, Aditi Raghunathan
  • for: This work aims to improve the robustness of deep neural networks by addressing their over-reliance on spurious correlations through the notion of context-dependent (contextual) reliability.
  • methods: It proposes a two-stage framework called Explicit Non-spurious feature Prediction (ENP), which first identifies the relevant features to use for a given context and then trains a model to rely exclusively on those features.
  • results: Theoretical and empirical results show that ENP improves robustness over existing methods and provides new benchmarks for contextual reliability.
    Abstract Deep neural networks often fail catastrophically by relying on spurious correlations. Most prior work assumes a clear dichotomy into spurious and reliable features; however, this is often unrealistic. For example, most of the time we do not want an autonomous car to simply copy the speed of surrounding cars -- we don't want our car to run a red light if a neighboring car does so. However, we cannot simply enforce invariance to next-lane speed, since it could provide valuable information about an unobservable pedestrian at a crosswalk. Thus, universally ignoring features that are sometimes (but not always) reliable can lead to non-robust performance. We formalize a new setting called contextual reliability which accounts for the fact that the "right" features to use may vary depending on the context. We propose and analyze a two-stage framework called Explicit Non-spurious feature Prediction (ENP) which first identifies the relevant features to use for a given context, then trains a model to rely exclusively on these features. Our work theoretically and empirically demonstrates the advantages of ENP over existing methods and provides new benchmarks for contextual reliability.

Europepolls: A Dataset of Country-Level Opinion Polling Data for the European Union and the UK

  • paper_url: http://arxiv.org/abs/2307.10022
  • repo_url: https://github.com/konstantinos-p/europepolls
  • paper_authors: Konstantinos Pitas
  • for: fills a gap in available opinion polling data for the European Union and the UK, providing a large and open dataset for researchers to study voting behavior and multimodal data.
  • methods: uses Wikipedia data and the pandas library to gather and preprocess the data, making it available in both raw and preprocessed formats.
  • results: enables researchers to study complex interactions between multimodal data and voting behavior, with the potential for new insights and discoveries using recent advances in LLMs and deep learning.
    Abstract I propose an open dataset of country-level historical opinion polling data for the European Union and the UK. The dataset aims to fill a gap in available opinion polling data for the European Union. Some existing datasets are restricted to the past five years, limiting research opportunities. At the same time, some larger proprietary datasets exist but are available only in a visual preprocessed time series format. Finally, while other large datasets for individual countries might exist, these could be inaccessible due to language barriers. The data was gathered from Wikipedia, and preprocessed using the pandas library. Both the raw and the preprocessed data are in the .csv format. I hope that given the recent advances in LLMs and deep learning in general, this large dataset will enable researchers to uncover complex interactions between multimodal data (news articles, economic indicators, social media) and voting behavior. The raw data, the preprocessed data, and the preprocessing scripts are available on GitHub.
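
Since the dataset ships as .csv files preprocessed with pandas, a minimal loading sketch is shown below. The file name and column names are illustrative assumptions; the repository linked above documents the actual layout.

```python
# Sketch only: "germany_preprocessed.csv" and the "date" column are
# hypothetical names for one of the per-country files.
import pandas as pd

df = pd.read_csv("germany_preprocessed.csv", parse_dates=["date"])
print(df.head())

# Example: monthly average support per party, assuming one numeric column per party.
monthly = df.set_index("date").resample("M").mean(numeric_only=True)
print(monthly.tail())
```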

TbExplain: A Text-based Explanation Method for Scene Classification Models with the Statistical Prediction Correction

  • paper_url: http://arxiv.org/abs/2307.10003
  • repo_url: None
  • paper_authors: Amirhossein Aminimehr, Pouya Khani, Amirali Molaei, Amirmohammad Kazemeini, Erik Cambria
  • for: This work uses Explainable Artificial Intelligence (XAI) to improve interpretability, so that users who are not familiar with machine learning can better understand a model's predictions.
  • methods: It proposes a framework called TbExplain that employs XAI techniques and a pre-trained object detector to provide text-based explanations of scene classification models. TbExplain also includes a novel method that corrects unreliable predictions, and explains them textually, based on the statistics of the objects in the input image.
  • results: Qualitative and quantitative experiments on scene classification datasets show that TbExplain improves classification accuracy over ResNet variants and that its text-based explanations are sufficiently reliable.
    Abstract The field of Explainable Artificial Intelligence (XAI) aims to improve the interpretability of black-box machine learning models. Building a heatmap based on the importance value of input features is a popular method for explaining the underlying functions of such models in producing their predictions. Heatmaps are almost understandable to humans, yet they are not without flaws. Non-expert users, for example, may not fully understand the logic of heatmaps (the logic in which relevant pixels to the model's prediction are highlighted with different intensities or colors). Additionally, objects and regions of the input image that are relevant to the model prediction are frequently not entirely differentiated by heatmaps. In this paper, we propose a framework called TbExplain that employs XAI techniques and a pre-trained object detector to present text-based explanations of scene classification models. Moreover, TbExplain incorporates a novel method to correct predictions and textually explain them based on the statistics of objects in the input image when the initial prediction is unreliable. To assess the trustworthiness and validity of the text-based explanations, we conducted a qualitative experiment, and the findings indicated that these explanations are sufficiently reliable. Furthermore, our quantitative and qualitative experiments on TbExplain with scene classification datasets reveal an improvement in classification accuracy over ResNet variants.

Impact of Disentanglement on Pruning Neural Networks

  • paper_url: http://arxiv.org/abs/2307.09994
  • repo_url: None
  • paper_authors: Carl Shneider, Peyman Rostami, Anis Kacem, Nilotpal Sinha, Abd El Rahman Shabayek, Djamila Aouada
  • for: Deploying deep neural networks on edge devices to accomplish task-specific objectives in the real world requires compressing the networks to reduce their memory footprint, power consumption, and latency.
  • methods: The study combines the Beta-VAE framework with a standard pruning criterion to investigate how forcing the network to learn disentangled representations affects the pruning process for a classification task.
  • results: Experiments on the MNIST and CIFAR10 datasets examine the challenges of disentanglement and propose a path forward for future work.
    Abstract Deploying deep learning neural networks on edge devices, to accomplish task specific objectives in the real-world, requires a reduction in their memory footprint, power consumption, and latency. This can be realized via efficient model compression. Disentangled latent representations produced by variational autoencoder (VAE) networks are a promising approach for achieving model compression because they mainly retain task-specific information, discarding useless information for the task at hand. We make use of the Beta-VAE framework combined with a standard criterion for pruning to investigate the impact of forcing the network to learn disentangled representations on the pruning process for the task of classification. In particular, we perform experiments on MNIST and CIFAR10 datasets, examine disentanglement challenges, and propose a path forward for future works.
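
For background, the standard beta-VAE objective underlying the disentanglement framework is recalled below (textbook form with a generic beta weight, not the paper's notation).

```latex
% beta-VAE objective: beta > 1 increases the pressure towards disentangled
% latent factors at some cost in reconstruction quality.
\mathcal{L}(\theta,\phi;\mathbf{x}) \;=\;
\mathbb{E}_{q_\phi(\mathbf{z}\mid\mathbf{x})}\!\left[\log p_\theta(\mathbf{x}\mid\mathbf{z})\right]
\;-\; \beta\, D_{\mathrm{KL}}\!\left(q_\phi(\mathbf{z}\mid\mathbf{x})\,\|\,p(\mathbf{z})\right).
```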

UniMatch: A Unified User-Item Matching Framework for the Multi-purpose Merchant Marketing

  • paper_url: http://arxiv.org/abs/2307.09989
  • repo_url: None
  • paper_authors: Qifang Zhao, Tianyu Li, Meng Du, Yu Jiang, Qinghui Sun, Zhongyao Wang, Hong Liu, Huan Xu
  • for: Reduce the cost merchants incur for private-domain marketing with cloud services, where separate machine learning models would otherwise be purchased for each marketing purpose.
  • methods: Propose a unified user-item matching framework that performs user targeting and item recommendation simultaneously with a single model.
  • results: Experiments show that the framework handles both tasks and yields significant performance gains over state-of-the-art methods, while greatly reducing computing resources and daily maintenance costs.
    Abstract When doing private domain marketing with cloud services, the merchants usually have to purchase different machine learning models for the multiple marketing purposes, leading to a very high cost. We present a unified user-item matching framework to simultaneously conduct item recommendation and user targeting with just one model. We empirically demonstrate that the above concurrent modeling is viable via modeling the user-item interaction matrix with the multinomial distribution, and propose a bidirectional bias-corrected NCE loss for the implementation. The proposed loss function guides the model to learn the user-item joint probability $p(u,i)$ instead of the conditional probability $p(i|u)$ or $p(u|i)$ through correcting both the users and items' biases caused by the in-batch negative sampling. In addition, our framework is model-agnostic enabling a flexible adaptation of different model architectures. Extensive experiments demonstrate that our framework results in significant performance gains in comparison with the state-of-the-art methods, with greatly reduced cost on computing resources and daily maintenance.
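
To make the modeling idea concrete, here is a generic in-batch softmax matching loss over user and item embeddings, optimized in both directions. This is only an illustrative stand-in and not the paper's bidirectional bias-corrected NCE loss, which additionally corrects the user and item biases introduced by in-batch negative sampling.

```python
# Generic bidirectional in-batch matching loss; positives are diagonal pairs.
import torch
import torch.nn.functional as F

def in_batch_matching_loss(user_emb: torch.Tensor, item_emb: torch.Tensor) -> torch.Tensor:
    """user_emb, item_emb: (batch, dim); the i-th user matches the i-th item."""
    logits = user_emb @ item_emb.t()              # (batch, batch) similarity matrix
    labels = torch.arange(user_emb.size(0))
    loss_u2i = F.cross_entropy(logits, labels)    # user -> item direction
    loss_i2u = F.cross_entropy(logits.t(), labels)  # item -> user direction
    return 0.5 * (loss_u2i + loss_i2u)

users = F.normalize(torch.randn(32, 64), dim=-1)
items = F.normalize(torch.randn(32, 64), dim=-1)
print(in_batch_matching_loss(users, items))
```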

TinyTrain: Deep Neural Network Training at the Extreme Edge

  • paper_url: http://arxiv.org/abs/2307.09988
  • repo_url: None
  • paper_authors: Young D. Kwon, Rui Li, Stylianos I. Venieris, Jagmohan Chauhan, Nicholas D. Lane, Cecilia Mascolo
  • for: This paper proposes an on-device training approach to improve user personalization and privacy.
  • methods: The approach, TinyTrain, drastically reduces training time by selectively updating parts of the model and explicitly coping with data scarcity; it introduces a task-adaptive sparse-update method that dynamically selects layers/channels based on a multi-objective criterion jointly capturing the user data, the memory, and the compute capabilities of the target device.
  • results: Compared with vanilla fine-tuning of the entire network, the method improves accuracy by 3.6-5.0% while reducing the backward-pass memory and computation cost by up to 2,286x and 7.68x, respectively; on widely used real-world edge devices it achieves 9.5x faster and 3.5x more energy-efficient training than status-quo approaches, with a 2.8x smaller memory footprint than SOTA approaches, while staying within the 1 MB memory envelope of MCU-grade platforms.
    Abstract On-device training is essential for user personalisation and privacy. With the pervasiveness of IoT devices and microcontroller units (MCU), this task becomes more challenging due to the constrained memory and compute resources, and the limited availability of labelled user data. Nonetheless, prior works neglect the data scarcity issue, require excessively long training time (e.g. a few hours), or induce substantial accuracy loss ($\geq$10\%). We propose TinyTrain, an on-device training approach that drastically reduces training time by selectively updating parts of the model and explicitly coping with data scarcity. TinyTrain introduces a task-adaptive sparse-update method that dynamically selects the layer/channel based on a multi-objective criterion that jointly captures user data, the memory, and the compute capabilities of the target device, leading to high accuracy on unseen tasks with reduced computation and memory footprint. TinyTrain outperforms vanilla fine-tuning of the entire network by 3.6-5.0\% in accuracy, while reducing the backward-pass memory and computation cost by up to 2,286$\times$ and 7.68$\times$, respectively. Targeting broadly used real-world edge devices, TinyTrain achieves 9.5$\times$ faster and 3.5$\times$ more energy-efficient training over status-quo approaches, and 2.8$\times$ smaller memory footprint than SOTA approaches, while remaining within the 1 MB memory envelope of MCU-grade platforms.
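
The sparse-update idea can be illustrated in a few lines of PyTorch: freeze everything and re-enable gradients only for a selected subset of layers. The selection set below is a placeholder; TinyTrain's task-adaptive, multi-objective layer/channel selection is not reproduced here.

```python
# Sketch of selective (sparse) updating: train only the chosen layers so the
# backward pass touches far fewer parameters, saving memory and compute.
import torch.nn as nn

def apply_sparse_update(model: nn.Module, trainable_prefixes: set) -> None:
    """Enable gradients only for parameters whose names match the chosen prefixes."""
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(p) for p in trainable_prefixes)

model = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 10),
)
apply_sparse_update(model, {"4."})  # placeholder choice: update only the final layer
print([name for name, p in model.named_parameters() if p.requires_grad])
```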

Learner Referral for Cost-Effective Federated Learning Over Hierarchical IoT Networks

  • paper_url: http://arxiv.org/abs/2307.09977
  • repo_url: None
  • paper_authors: Yulan Gao, Ziqiang Ye, Yue Xiao, Wei Xiang
  • for: This paper proposes a federated learning (FL) approach based on joint learner referral and local model accuracy optimization, to address data privacy concerns and improve the scalability and reliability of FL over distributed networks.
  • methods: It combines learner referral aided federated client selection (LRef-FedCS), communication and computing resource scheduling, and local model accuracy optimization (LMAO), designed to minimize the cost incurred by the worst-case participant and ensure the long-term fairness of FL in hierarchical Internet of Things (HieIoT) networks.
  • results: Numerical simulations and experiments on the MNIST/CIFAR-10 datasets demonstrate that the proposed LRef-FedCS approach achieves a good balance between pursuing high global accuracy and reducing cost.
    Abstract The paradigm of federated learning (FL) to address data privacy concerns by locally training parameters on resource-constrained clients in a distributed manner has garnered significant attention. Nonetheless, FL is not applicable when not all clients within the coverage of the FL server are registered with the FL network. To bridge this gap, this paper proposes joint learner referral aided federated client selection (LRef-FedCS), along with communications and computing resource scheduling, and local model accuracy optimization (LMAO) methods. These methods are designed to minimize the cost incurred by the worst-case participant and ensure the long-term fairness of FL in hierarchical Internet of Things (HieIoT) networks. Utilizing the Lyapunov optimization technique, we reformulate the original problem into a stepwise joint optimization problem (JOP). Subsequently, to tackle the mixed-integer non-convex JOP, we separatively and iteratively address LRef-FedCS and LMAO through the centralized method and self-adaptive global best harmony search (SGHS) algorithm, respectively. To enhance scalability, we further propose a distributed LRef-FedCS approach based on a matching game to replace the centralized method described above. Numerical simulations and experimental results on the MNIST/CIFAR-10 datasets demonstrate that our proposed LRef-FedCS approach could achieve a good balance between pursuing high global accuracy and reducing cost.

Towards green AI-based software systems: an architecture-centric approach (GAISSA)

  • paper_url: http://arxiv.org/abs/2307.09964
  • repo_url: None
  • paper_authors: Silverio Martínez-Fernández, Xavier Franch, Francisco Durán
  • for: The GAISSA research project aims to provide data scientists and software engineers with tool-supported methods for modelling and developing green AI-based systems.
  • methods: The project follows an architecture-centric approach to help data scientists and software engineers design and implement green AI-based systems more efficiently.
  • results: current research results indicate that the GAISSA project has the potential to achieve its objectives and provide effective methods for developing green AI-based systems.
    Abstract Nowadays, AI-based systems have achieved outstanding results and have outperformed humans in different domains. However, the processes of training AI models and inferring from them require high computational resources, which pose a significant challenge in the current energy efficiency societal demand. To cope with this challenge, this research project paper describes the main vision, goals, and expected outcomes of the GAISSA project. The GAISSA project aims at providing data scientists and software engineers tool-supported, architecture-centric methods for the modelling and development of green AI-based systems. Although the project is in an initial stage, we describe the current research results, which illustrate the potential to achieve GAISSA objectives.

XSkill: Cross Embodiment Skill Discovery

  • paper_url: http://arxiv.org/abs/2307.09955
  • repo_url: None
  • paper_authors: Mengda Xu, Zhenjia Xu, Cheng Chi, Manuela Veloso, Shuran Song
  • for: This work aims to extract reusable robot manipulation skills from human demonstration videos and apply them in the real world.
  • methods: It introduces an imitation learning framework called XSkill, which discovers cross-embodiment skill prototypes purely from unlabeled human and robot manipulation videos and transfers this skill representation to robot actions using a conditional diffusion policy.
  • results: Experiments in simulation and real-world environments show that the learned skill prototypes enable both skill transfer and skill composition for unseen tasks, yielding a more general and scalable imitation learning framework.
    Abstract Human demonstration videos are a widely available data source for robot learning and an intuitive user interface for expressing desired behavior. However, directly extracting reusable robot manipulation skills from unstructured human videos is challenging due to the big embodiment difference and unobserved action parameters. To bridge this embodiment gap, this paper introduces XSkill, an imitation learning framework that 1) discovers a cross-embodiment representation called skill prototypes purely from unlabeled human and robot manipulation videos, 2) transfers the skill representation to robot actions using conditional diffusion policy, and finally, 3) composes the learned skill to accomplish unseen tasks specified by a human prompt video. Our experiments in simulation and real-world environments show that the discovered skill prototypes facilitate both skill transfer and composition for unseen tasks, resulting in a more general and scalable imitation learning framework. The performance of XSkill is best understood from the anonymous website: https://xskillcorl.github.io.

Impatient Bandits: Optimizing Recommendations for the Long-Term Without Delay

  • paper_url: http://arxiv.org/abs/2307.09943
  • repo_url: https://github.com/spotify-research/impatient-bandits
  • paper_authors: Thomas M. McDonald, Lucas Maystre, Mounia Lalmas, Daniel Russo, Kamil Ciosek
  • for: The goal is to improve users' long-term satisfaction on online platforms by studying a content exploration task, formalized as a multi-armed bandit problem with delayed rewards.
  • methods: The authors use a Bayesian filter to combine full observations with partial (short- or medium-term) outcomes into a probabilistic belief over the delayed reward, and develop a bandit algorithm that exploits this predictive model to quickly identify content aligned with long-term success.
  • results: On a podcast recommendation problem, the approach substantially improves performance compared with approaches that either optimize for short-term proxies or wait for the long-term outcome to be fully realized.
    Abstract Recommender systems are a ubiquitous feature of online platforms. Increasingly, they are explicitly tasked with increasing users' long-term satisfaction. In this context, we study a content exploration task, which we formalize as a multi-armed bandit problem with delayed rewards. We observe that there is an apparent trade-off in choosing the learning signal: Waiting for the full reward to become available might take several weeks, hurting the rate at which learning happens, whereas measuring short-term proxy rewards reflects the actual long-term goal only imperfectly. We address this challenge in two steps. First, we develop a predictive model of delayed rewards that incorporates all information obtained to date. Full observations as well as partial (short or medium-term) outcomes are combined through a Bayesian filter to obtain a probabilistic belief. Second, we devise a bandit algorithm that takes advantage of this new predictive model. The algorithm quickly learns to identify content aligned with long-term success by carefully balancing exploration and exploitation. We apply our approach to a podcast recommendation problem, where we seek to identify shows that users engage with repeatedly over two months. We empirically validate that our approach results in substantially better performance compared to approaches that either optimize for short-term proxies, or wait for the long-term outcome to be fully realized.
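
As orientation, the bandit backbone can be as simple as Beta-Bernoulli Thompson sampling, sketched below. The paper's key ingredient, a Bayesian filter that fuses partially observed short- and medium-term outcomes into a belief over the delayed long-term reward, is deliberately simplified away here into immediately observed binary rewards.

```python
# Minimal Thompson-sampling loop over three pieces of content with unknown
# long-term engagement rates (simulated as immediate Bernoulli rewards).
import numpy as np

rng = np.random.default_rng(0)
true_rates = np.array([0.30, 0.50, 0.65])   # unknown engagement probabilities
alpha = np.ones(3)                          # Beta posterior parameters per item
beta = np.ones(3)

for t in range(5000):
    samples = rng.beta(alpha, beta)         # sample a plausible rate per item
    arm = int(np.argmax(samples))           # recommend the most promising item
    reward = rng.random() < true_rates[arm] # observed engagement
    alpha[arm] += reward
    beta[arm] += 1 - reward

print("posterior means:", alpha / (alpha + beta))
```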

TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic Tree-Based Memory Network

  • paper_url: http://arxiv.org/abs/2307.09942
  • repo_url: None
  • paper_authors: Brandon Theodorou, Cao Xiao, Jimeng Sun
  • for: Recruit clinical trial participants quickly and efficiently, improving the overall efficiency of clinical trials.
  • methods: Use a machine learning model to automatically match patients with clinical trials, based on longitudinal patient electronic health record (EHR) data and the eligibility criteria of the trials.
  • results: The proposed personalized dynamic tree-based memory network, TREEMENT, provides accurate and interpretable patient-trial matching; on real-world datasets it reduces criteria-level matching error by 7% over the best baseline and achieves state-of-the-art trial-level matching, while offering good interpretability that makes its results easier to adopt.
    Abstract Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment. In recent years, machine learning models have been proposed for speeding up patient recruitment via automatically matching patients with clinical trials based on longitudinal patient electronic health records (EHR) data and eligibility criteria of clinical trials. However, they either depend on trial-specific expert rules that cannot expand to other trials or perform matching at a very general level with a black-box model where the lack of interpretability makes the model results difficult to be adopted. To provide accurate and interpretable patient trial matching, we introduce a personalized dynamic tree-based memory network model named TREEMENT. It utilizes hierarchical clinical ontologies to expand the personalized patient representation learned from sequential EHR data, and then uses an attentional beam-search query learned from eligibility criteria embedding to offer a granular level of alignment for improved performance and interpretability. We evaluated TREEMENT against existing models on real-world datasets and demonstrated that TREEMENT outperforms the best baseline by 7% in terms of error reduction in criteria-level matching and achieves state-of-the-art results in its trial-level matching ability. Furthermore, we also show TREEMENT can offer good interpretability to make the model results easier for adoption.

Spuriosity Didn’t Kill the Classifier: Using Invariant Predictions to Harness Spurious Features

  • paper_url: http://arxiv.org/abs/2307.09933
  • repo_url: None
  • paper_authors: Cian Eastwood, Shashank Singh, Andrei Liviu Nicolicioiu, Marin Vlastelica, Julius von Kügelgen, Bernhard Schölkopf
  • For: improving the performance of machine learning models on out-of-distribution data.
  • Methods: proposes Stable Feature Boosting (SFB), which learns to use unstable features in the test domain without labels; SFB consists of two steps: (i) learning a predictor that separates stable and conditionally-independent unstable features, and (ii) using the stable-feature predictions to adapt the unstable-feature predictions in the test domain.
  • Results: proves that SFB can learn an asymptotically-optimal predictor without test-domain labels, and demonstrates the effectiveness of SFB on real and synthetic data.
    Abstract To avoid failures on out-of-distribution data, recent works have sought to extract features that have a stable or invariant relationship with the label across domains, discarding the "spurious" or unstable features whose relationship with the label changes across domains. However, unstable features often carry complementary information about the label that could boost performance if used correctly in the test domain. Our main contribution is to show that it is possible to learn how to use these unstable features in the test domain without labels. In particular, we prove that pseudo-labels based on stable features provide sufficient guidance for doing so, provided that stable and unstable features are conditionally independent given the label. Based on this theoretical insight, we propose Stable Feature Boosting (SFB), an algorithm for: (i) learning a predictor that separates stable and conditionally-independent unstable features; and (ii) using the stable-feature predictions to adapt the unstable-feature predictions in the test domain. Theoretically, we prove that SFB can learn an asymptotically-optimal predictor without test-domain labels. Empirically, we demonstrate the effectiveness of SFB on real and synthetic data.

DISA: DIfferentiable Similarity Approximation for Universal Multimodal Registration

  • paper_url: http://arxiv.org/abs/2307.09931
  • repo_url: https://github.com/imfusiongmbh/disa-universal-multimodal-registration
  • paper_authors: Matteo Ronchetti, Wolfgang Wein, Nassir Navab, Oliver Zettinig, Raphael Prevost
  • for: 用于多Modal imaging registration的挑战性问题,以便实现多种图像导航程序。
  • methods: 使用小型卷积神经网络(CNN)来创建表达力强的跨Modal描述符,以便快速进行可变的全局对 align。
  • results: 在三个不同的数据集上进行实验,结果表明我们的方法可以快速地执行,并且可以在临床设置中直接应用,无需特殊 retraining。
    Abstract Multimodal image registration is a challenging but essential step for numerous image-guided procedures. Most registration algorithms rely on the computation of complex, frequently non-differentiable similarity metrics to deal with the appearance discrepancy of anatomical structures between imaging modalities. Recent Machine Learning based approaches are limited to specific anatomy-modality combinations and do not generalize to new settings. We propose a generic framework for creating expressive cross-modal descriptors that enable fast deformable global registration. We achieve this by approximating existing metrics with a dot-product in the feature space of a small convolutional neural network (CNN) which is inherently differentiable can be trained without registered data. Our method is several orders of magnitude faster than local patch-based metrics and can be directly applied in clinical settings by replacing the similarity measure with the proposed one. Experiments on three different datasets demonstrate that our approach generalizes well beyond the training data, yielding a broad capture range even on unseen anatomies and modality pairs, without the need for specialized retraining. We make our training code and data publicly available.
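
A minimal sketch of the underlying idea follows: replace a hand-crafted multimodal similarity metric with a dot product between learned feature maps. The tiny two-layer encoders are placeholders and do not reflect DISA's actual architecture or training.

```python
# Dot-product similarity between CNN feature maps of two modalities. Because
# the pipeline is differentiable, the similarity can drive gradient-based
# (deformable) registration.
import torch
import torch.nn as nn

class SmallEncoder(nn.Module):
    def __init__(self, in_ch: int, feat_ch: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_ch, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def feature_similarity(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """Mean per-pixel dot product between the two feature maps."""
    return (feat_a * feat_b).sum(dim=1).mean()

encoder_a, encoder_b = SmallEncoder(1), SmallEncoder(1)   # one encoder per modality
image_a, image_b = torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64)
print(float(feature_similarity(encoder_a(image_a), encoder_b(image_b))))
```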

TimeTuner: Diagnosing Time Representations for Time-Series Forecasting with Counterfactual Explanations

  • paper_url: http://arxiv.org/abs/2307.09916
  • repo_url: https://github.com/catherinehao/timetuner
  • paper_authors: Jianing Hao, Qing Shi, Yilin Ye, Wei Zeng
  • for: This work aims to help analysts understand how time-series representations relate to model behavior and how feature engineering can improve forecasting performance.
  • methods: It presents a novel visual analytics framework named TimeTuner, which helps analysts relate model behavior to time-series representations and provides multiple coordinated visualization views to characterize model performance.
  • results: The study shows that TimeTuner helps analysts characterize time-series representations and guide feature engineering; it is instantiated with two transformation methods (smoothing and sampling) and applied to real-world time-series forecasting tasks.
    Abstract Deep learning (DL) approaches are being increasingly used for time-series forecasting, with many efforts devoted to designing complex DL models. Recent studies have shown that the DL success is often attributed to effective data representations, fostering the fields of feature engineering and representation learning. However, automated approaches for feature learning are typically limited with respect to incorporating prior knowledge, identifying interactions among variables, and choosing evaluation metrics to ensure that the models are reliable. To improve on these limitations, this paper contributes a novel visual analytics framework, namely TimeTuner, designed to help analysts understand how model behaviors are associated with localized correlations, stationarity, and granularity of time-series representations. The system mainly consists of the following two-stage technique: We first leverage counterfactual explanations to connect the relationships among time-series representations, multivariate features and model predictions. Next, we design multiple coordinated views including a partition-based correlation matrix and juxtaposed bivariate stripes, and provide a set of interactions that allow users to step into the transformation selection process, navigate through the feature space, and reason the model performance. We instantiate TimeTuner with two transformation methods of smoothing and sampling, and demonstrate its applicability on real-world time-series forecasting of univariate sunspots and multivariate air pollutants. Feedback from domain experts indicates that our system can help characterize time-series representations and guide the feature engineering processes.

Deep projection networks for learning time-homogeneous dynamical systems

  • paper_url: http://arxiv.org/abs/2307.09912
  • repo_url: None
  • paper_authors: Vladimir R. Kostic, Pietro Novelli, Riccardo Grazzi, Karim Lounici, Massimiliano Pontil
  • for: 学习时间homogeneous dynamical systems的meaningful representation,用于预测未来状态或观测量。
  • methods: 使用卷积神经网络学习投影算子,通过优化一个类似于 canonical correlation analysis(CCA)的目标函数来学习。
  • results: 提出了一种稳定且可靠的方法,可以应用于具有挑战性的情况,并且可以改进前一些方法的性能。
    Abstract We consider the general class of time-homogeneous dynamical systems, both discrete and continuous, and study the problem of learning a meaningful representation of the state from observed data. This is instrumental for the task of learning a forward transfer operator of the system, that in turn can be used for forecasting future states or observables. The representation, typically parametrized via a neural network, is associated with a projection operator and is learned by optimizing an objective function akin to that of canonical correlation analysis (CCA). However, unlike CCA, our objective avoids matrix inversions and therefore is generally more stable and applicable to challenging scenarios. Our objective is a tight relaxation of CCA and we further enhance it by proposing two regularization schemes, one encouraging the orthogonality of the components of the representation while the other exploiting Chapman-Kolmogorov's equation. We apply our method to challenging discrete dynamical systems, discussing improvements over previous methods, as well as to continuous dynamical systems.

Repeated Observations for Classification

  • paper_url: http://arxiv.org/abs/2307.09896
  • repo_url: None
  • paper_authors: Hüseyin Afşer, László Györfi, Harro Walk
  • for: This paper studies the nonparametric classification problem with repeated observations.
  • methods: It presents simple classification rules whose conditional error probabilities converge at an exponential rate as $t\to\infty$.
  • results: The analysis covers particular models such as robust detection by nominal densities, prototype classification, linear transformation, linear classification, and scaling.
    Abstract We study the problem of nonparametric classification with repeated observations. Let $\mathbf{X}$ be the $d$ dimensional feature vector and let $Y$ denote the label taking values in $\{1,\dots ,M\}$. In contrast to the usual setup with large sample size $n$ and relatively low dimension $d$, this paper deals with the situation when, instead of observing a single feature vector $\mathbf{X}$, we are given $t$ repeated feature vectors $\mathbf{V}_1,\dots ,\mathbf{V}_t$. Some simple classification rules are presented such that the conditional error probabilities have an exponential rate of convergence as $t\to\infty$. In the analysis, we investigate particular models like robust detection by nominal densities, prototype classification, linear transformation, linear classification, scaling.

Symmetric Equilibrium Learning of VAEs

  • paper_url: http://arxiv.org/abs/2307.09883
  • repo_url: None
  • paper_authors: Boris Flach, Dmitrij Schlesinger, Alexander Shekhovtsov
  • for: This paper broadens the applicability of variational autoencoders (VAEs) to more complex learning scenarios, such as general semi-supervised learning and the use of complex generative models as priors.
  • methods: It proposes a Nash equilibrium learning approach that relaxes the restrictions of standard ELBO training and allows learning VAEs when both the data and the latent distributions are accessible only by sampling; the approach is flexible and applicable to a wide range of downstream tasks.
  • results: Experiments show that VAEs learned with this method are comparable to those obtained by ELBO learning, and that the method applies to tasks that are not accessible by standard VAE learning.
    Abstract We view variational autoencoders (VAE) as decoder-encoder pairs, which map distributions in the data space to distributions in the latent space and vice versa. The standard learning approach for VAEs, i.e. maximisation of the evidence lower bound (ELBO), has an obvious asymmetry in that respect. Moreover, it requires a closed form a-priori latent distribution. This limits the applicability of VAEs in more complex scenarios, such as general semi-supervised learning and employing complex generative models as priors. We propose a Nash equilibrium learning approach that relaxes these restrictions and allows learning VAEs in situations where both the data and the latent distributions are accessible only by sampling. The flexibility and simplicity of this approach allows its application to a wide range of learning scenarios and downstream tasks. We show experimentally that the models learned by this method are comparable to those obtained by ELBO learning and demonstrate its applicability for tasks that are not accessible by standard VAE learning.

Adversarial Likelihood Estimation with One-way Flows

  • paper_url: http://arxiv.org/abs/2307.09882
  • repo_url: None
  • paper_authors: Omri Ben-Dov, Pravir Singh Gupta, Victoria Abrevaya, Michael J. Black, Partha Ghosh
  • for: 本文使用 Generative Adversarial Networks (GANs) 生成高质量样本,但GANs不提供样本周围的概率密度估计。然而,在能量基础设定下,最大化循环可能性函数可以导致对拒绝者的挑战,并且可以获得不正规化的概率密度(常称为能量)。本文进一步发展这种视角,并 incorporate 重要抽样,以获得一种不偏估计的合理性。
  • methods: 本文提出了一种新的流网络 architecture,called one-way flow network,它不需要有可追踪的逆函数,因此更加自由。此外,本文还使用了重要抽样来提高生成器的概率密度估计。
  • results: 本文的实验结果表明,使用了一种新的抽样策略和一种更自由的流网络 architecture,可以更快地收敛,并且生成的样本质量与相似的 GAN 架构相比几乎相同。此外,本文还成功避免了常见的数据集适应问题,并生成了平滑的低维 latent representation。
    Abstract Generative Adversarial Networks (GANs) can produce high-quality samples, but do not provide an estimate of the probability density around the samples. However, it has been noted that maximizing the log-likelihood within an energy-based setting can lead to an adversarial framework where the discriminator provides unnormalized density (often called energy). We further develop this perspective, incorporate importance sampling, and show that 1) Wasserstein GAN performs a biased estimate of the partition function, and we propose instead to use an unbiased estimator; 2) when optimizing for likelihood, one must maximize generator entropy. This is hypothesized to provide a better mode coverage. Different from previous works, we explicitly compute the density of the generated samples. This is the key enabler to designing an unbiased estimator of the partition function and computation of the generator entropy term. The generator density is obtained via a new type of flow network, called one-way flow network, that is less constrained in terms of architecture, as it does not require to have a tractable inverse function. Our experimental results show that we converge faster, produce comparable sample quality to GANs with similar architecture, successfully avoid over-fitting to commonly used datasets and produce smooth low-dimensional latent representations of the training data.

Detecting Vulnerable Nodes in Urban Infrastructure Interdependent Network

  • paper_url: http://arxiv.org/abs/2307.09866
  • repo_url: https://github.com/tsinghua-fib-lab/kdd2023-id546-urbaninfra
  • paper_authors: Jinzhu Mao, Liu Cao, Chen Gao, Huandong Wang, Hangyu Fan, Depeng Jin, Yong Li
  • For: understand and characterize the vulnerability of urban infrastructures, which are essential for the regular running of cities and exist naturally in the form of networks.
  • Methods: proposes a system based on a graph neural network with reinforcement learning that accurately models the interdependent network as a heterogeneous graph, captures the risk of cascade failure, and discovers vulnerable infrastructures of cities.
  • Results: extensive experiments with various requests demonstrate the system's effectiveness, showing not only its expressive power but also its transferring ability and the necessity of its specific components.
    Abstract Understanding and characterizing the vulnerability of urban infrastructures, which refers to the engineering facilities essential for the regular running of cities and that exist naturally in the form of networks, is of great value to us. Potential applications include protecting fragile facilities and designing robust topologies, etc. Due to the strong correlation between different topological characteristics and infrastructure vulnerability and their complicated evolution mechanisms, some heuristic and machine-assisted analysis fall short in addressing such a scenario. In this paper, we model the interdependent network as a heterogeneous graph and propose a system based on graph neural network with reinforcement learning, which can be trained on real-world data, to characterize the vulnerability of the city system accurately. The presented system leverages deep learning techniques to understand and analyze the heterogeneous graph, which enables us to capture the risk of cascade failure and discover vulnerable infrastructures of cities. Extensive experiments with various requests demonstrate not only the expressive power of our system but also transferring ability and necessity of the specific components.

Towards a population-informed approach to the definition of data-driven models for structural dynamics

  • paper_url: http://arxiv.org/abs/2307.09862
  • repo_url: None
  • paper_authors: G. Tsialiamanis, N. Dervilis, D. J. Wagg, K. Worden
  • for: This work aims to cope with data scarcity in structural dynamics, where physics-based approaches are combined with machine-learning algorithms, by learning from populations of phenomena whose underlying physics are similar.
  • methods: Two meta-learning algorithms are used in a population-based scheme to build the data-driven models: the model-agnostic meta-learning (MAML) algorithm and the conditional neural processes (CNP) model.
  • results: Experiments show that both algorithms approximate the quantities of interest better than a traditional machine-learning algorithm, and that their performance behaves like that of conventional machine-learning algorithms, improving as the number of structures in the training population grows.
    Abstract Machine learning has affected the way in which many phenomena for various domains are modelled, one of these domains being that of structural dynamics. However, because machine-learning algorithms are problem-specific, they often fail to perform efficiently in cases of data scarcity. To deal with such issues, combination of physics-based approaches and machine learning algorithms have been developed. Although such methods are effective, they also require the analyser's understanding of the underlying physics of the problem. The current work is aimed at motivating the use of models which learn such relationships from a population of phenomena, whose underlying physics are similar. The development of such models is motivated by the way that physics-based models, and more specifically finite element models, work. Such models are considered transferrable, explainable and trustworthy, attributes which are not trivially imposed or achieved for machine-learning models. For this reason, machine-learning approaches are less trusted by industry and often considered more difficult to form validated models. To achieve such data-driven models, a population-based scheme is followed here and two different machine-learning algorithms from the meta-learning domain are used. The two algorithms are the model-agnostic meta-learning (MAML) algorithm and the conditional neural processes (CNP) model. The algorithms seem to perform as intended and outperform a traditional machine-learning algorithm at approximating the quantities of interest. Moreover, they exhibit behaviour similar to traditional machine learning algorithms (e.g. neural networks or Gaussian processes), concerning their performance as a function of the available structures in the training population.

Reinforcement Learning for Credit Index Option Hedging

  • paper_url: http://arxiv.org/abs/2307.09844
  • repo_url: None
  • paper_authors: Francesco Mandelli, Marco Pinciroli, Michele Trapletti, Edoardo Vittori
  • for: To find the optimal hedging strategy for a credit index option and thereby minimise its risk.
  • methods: A practical setting with discrete time and transaction costs, with the resulting policy tested on real market data; the state-of-the-art Trust Region Volatility Optimization (TRVO) algorithm is applied.
  • results: The hedging strategy derived with TRVO outperforms the practitioner's Black & Scholes delta hedge.
    Abstract In this paper, we focus on finding the optimal hedging strategy of a credit index option using reinforcement learning. We take a practical approach, where the focus is on realism i.e. discrete time, transaction costs; even testing our policy on real market data. We apply a state of the art algorithm, the Trust Region Volatility Optimization (TRVO) algorithm and show that the derived hedging strategy outperforms the practitioner's Black & Scholes delta hedge.

Near-Linear Time Projection onto the $\ell_{1,\infty}$ Ball; Application to Sparse Autoencoders

  • paper_url: http://arxiv.org/abs/2307.09836
  • repo_url: https://github.com/memo-p/projection
  • paper_authors: Guillaume Perez, Laurent Condat, Michel Barlaud
  • for: Speeding up the training of large-scale neural networks, in particular by using projections to sparsify them and reduce their overall cost.
  • methods: A new projection algorithm onto the $\ell_{1,\infty}$ norm ball with worst-case time complexity $\mathcal{O}\big(nm+J\log(nm)\big)$ for an $n\times m$ matrix, where $J$ tends to 0 when the sparsity is high and to $nm$ when it is low; the $\ell_{1,\infty}$ ball projection is also incorporated into autoencoder training to enforce feature selection and weight sparsity.
  • results: In the targeted biology application only a very small part of the data (less than 2%) is relevant, and sparsification in the encoder mainly performs feature selection; both in this biological case and in the general sparsity setting the proposed method is the fastest.
    Abstract Looking for sparsity is nowadays crucial to speed up the training of large-scale neural networks. Projections onto the $\ell_{1,2}$ and $\ell_{1,\infty}$ are among the most efficient techniques to sparsify and reduce the overall cost of neural networks. In this paper, we introduce a new projection algorithm for the $\ell_{1,\infty}$ norm ball. The worst-case time complexity of this algorithm is $\mathcal{O}\big(nm+J\log(nm)\big)$ for a matrix in $\mathbb{R}^{n\times m}$. $J$ is a term that tends to 0 when the sparsity is high, and to $nm$ when the sparsity is low. Its implementation is easy and it is guaranteed to converge to the exact solution in a finite time. Moreover, we propose to incorporate the $\ell_{1,\infty}$ ball projection while training an autoencoder to enforce feature selection and sparsity of the weights. Sparsification appears in the encoder to primarily do feature selection due to our application in biology, where only a very small part ($<2\%$) of the data is relevant. We show that both in the biological case and in the general case of sparsity that our method is the fastest.
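
For reference, the $\ell_{1,\infty}$ norm used above can be computed in a few lines; projecting onto its ball zeroes entire rows, which is what yields feature selection in the encoder. The near-linear-time projection itself is non-trivial and is presumably what the repository linked above provides; the snippet below only shows the norm and a ball-membership check on an arbitrary example matrix.

```python
import numpy as np

def l1_inf_norm(W):
    # ||W||_{1,inf}: sum over rows of each row's largest absolute entry.
    # Projecting onto the ball of this norm zeroes entire rows of W, which is
    # what performs feature selection in the autoencoder's first layer.
    return np.abs(W).max(axis=1).sum()

W = np.random.default_rng(0).standard_normal((5, 4))
radius = 3.0
print(l1_inf_norm(W), l1_inf_norm(W) <= radius)   # norm value and ball membership
```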

Deep Operator Network Approximation Rates for Lipschitz Operators

  • paper_url: http://arxiv.org/abs/2307.09835
  • repo_url: None
  • paper_authors: Christoph Schwab, Andreas Stein, Jakob Zech
  • for: Establishing universality and expression rate bounds for deep operator networks (DON) emulating Lipschitz (or Hölder) continuous maps $\mathcal G:\mathcal X\to\mathcal Y$ between (subsets of) separable Hilbert spaces $\mathcal X$, $\mathcal Y$.
  • methods: Linear encoders $\mathcal E$ and decoders $\mathcal D$ via (biorthogonal) Riesz bases of $\mathcal X$, $\mathcal Y$, together with an approximator network for an infinite-dimensional, parametric coordinate map that is Lipschitz continuous on the sequence space $\ell^2(\mathbb N)$.
  • results: Expression rate bounds that require mere Lipschitz (or Hölder) continuity of $\mathcal G$ rather than holomorphy. Key to the proofs is the use of either super-expressive activations (e.g. [Yarotski: Elementary superexpressive activations, Int. Conf. on ML, 2021], [Shen, Yang and Zhang: Neural network approximation: Three hidden layers are enough, Neural Networks, 2021]) or nonstandard NN architectures with standard (ReLU) activations ([Zhang, Shen and Yang: Neural Network Architecture Beyond Width and Depth, Adv. in Neural Inf. Proc. Sys., 2022]).
    Abstract We establish universality and expression rate bounds for a class of neural Deep Operator Networks (DON) emulating Lipschitz (or H\"older) continuous maps $\mathcal G:\mathcal X\to\mathcal Y$ between (subsets of) separable Hilbert spaces $\mathcal X$, $\mathcal Y$. The DON architecture considered uses linear encoders $\mathcal E$ and decoders $\mathcal D$ via (biorthogonal) Riesz bases of $\mathcal X$, $\mathcal Y$, and an approximator network of an infinite-dimensional, parametric coordinate map that is Lipschitz continuous on the sequence space $\ell^2(\mathbb N)$. Unlike previous works ([Herrmann, Schwab and Zech: Neural and Spectral operator surrogates: construction and expression rate bounds, SAM Report, 2022], [Marcati and Schwab: Exponential Convergence of Deep Operator Networks for Elliptic Partial Differential Equations, SAM Report, 2022]), which required for example $\mathcal G$ to be holomorphic, the present expression rate results require mere Lipschitz (or H\"older) continuity of $\mathcal G$. Key in the proof of the present expression rate bounds is the use of either super-expressive activations (e.g. [Yarotski: Elementary superexpressive activations, Int. Conf. on ML, 2021], [Shen, Yang and Zhang: Neural network approximation: Three hidden layers are enough, Neural Networks, 2021], and the references there) which are inspired by the Kolmogorov superposition theorem, or of nonstandard NN architectures with standard (ReLU) activations as recently proposed in [Zhang, Shen and Yang: Neural Network Architecture Beyond Width and Depth, Adv. in Neural Inf. Proc. Sys., 2022]. We illustrate the abstract results by approximation rate bounds for emulation of a) solution operators for parametric elliptic variational inequalities, and b) Lipschitz maps of Hilbert-Schmidt operators.

What do neural networks learn in image classification? A frequency shortcut perspective

  • paper_url: http://arxiv.org/abs/2307.09829
  • repo_url: https://github.com/nis-research/nn-frequency-shortcuts
  • paper_authors: Shunxin Wang, Raymond Veldhuis, Christoph Brune, Nicola Strisciuglio
  • for: investigate the mechanisms of representation learning in neural networks (NNs) for classification tasks, and expand the understanding of frequency shortcuts.
  • methods: perform experiments on synthetic datasets and natural images, propose a metric to measure class-wise frequency characteristics, and identify frequency shortcuts.
  • results: demonstrate that NNs tend to find simple solutions for classification, and what they learn first during training depends on the most distinctive frequency characteristics; confirm that frequency shortcuts can be transferred across datasets and cannot be fully avoided by larger model capacity and data augmentation.
    Abstract Frequency analysis is useful for understanding the mechanisms of representation learning in neural networks (NNs). Most research in this area focuses on the learning dynamics of NNs for regression tasks, while little for classification. This study empirically investigates the latter and expands the understanding of frequency shortcuts. First, we perform experiments on synthetic datasets, designed to have a bias in different frequency bands. Our results demonstrate that NNs tend to find simple solutions for classification, and what they learn first during training depends on the most distinctive frequency characteristics, which can be either low- or high-frequencies. Second, we confirm this phenomenon on natural images. We propose a metric to measure class-wise frequency characteristics and a method to identify frequency shortcuts. The results show that frequency shortcuts can be texture-based or shape-based, depending on what best simplifies the objective. Third, we validate the transferability of frequency shortcuts on out-of-distribution (OOD) test sets. Our results suggest that frequency shortcuts can be transferred across datasets and cannot be fully avoided by larger model capacity and data augmentation. We recommend that future research should focus on effective training schemes mitigating frequency shortcut learning.
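
As a rough illustration of measuring class-wise frequency characteristics, the sketch below averages the magnitude spectrum of images per class. This simple statistic is an assumption made for illustration and is not the metric proposed in the paper.

```python
import numpy as np

def classwise_spectrum(images, labels, num_classes):
    # images: (N, H, W) grayscale array; returns the mean magnitude spectrum
    # per class, a simple stand-in for a class-wise frequency characteristic.
    specs = []
    for c in range(num_classes):
        cls = images[labels == c]
        f = np.fft.fftshift(np.fft.fft2(cls), axes=(-2, -1))
        specs.append(np.abs(f).mean(axis=0))
    return np.stack(specs)     # (num_classes, H, W)

images = np.random.default_rng(0).random((64, 32, 32))
labels = np.random.default_rng(1).integers(0, 4, size=64)
print(classwise_spectrum(images, labels, 4).shape)   # (4, 32, 32)
```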

Multi-modal Learning based Prediction for Disease

  • paper_url: http://arxiv.org/abs/2307.09823
  • repo_url: https://github.com/batuhankmkaraman/mlbasedad
  • paper_authors: Yaran Chen, Xueyu Chen, Yu Han, Haoran Li, Dongbin Zhao, Jingzhong Li, Xu Wang
  • for: Predicting the diagnosis of non-alcoholic fatty liver disease (NAFLD) non-invasively.
  • methods: A comprehensive clinical dataset (FLDData) combined with a multi-modal learning based NAFLD prediction method (DeepFLD) that uses multi-modal inputs, including metadata and facial images, to improve the accuracy of non-invasive diagnosis.
  • results: DeepFLD performs well on unseen test sets and achieves competitive performance with facial images as the only input, pointing towards a simpler and more robust non-invasive NAFLD diagnosis.
    Abstract Non alcoholic fatty liver disease (NAFLD) is the most common cause of chronic liver disease, which can be predicted accurately to prevent advanced fibrosis and cirrhosis. While, a liver biopsy, the gold standard for NAFLD diagnosis, is invasive, expensive, and prone to sampling errors. Therefore, non-invasive studies are extremely promising, yet they are still in their infancy due to the lack of comprehensive research data and intelligent methods for multi-modal data. This paper proposes a NAFLD diagnosis system (DeepFLDDiag) combining a comprehensive clinical dataset (FLDData) and a multi-modal learning based NAFLD prediction method (DeepFLD). The dataset includes over 6000 participants physical examinations, laboratory and imaging studies, extensive questionnaires, and facial images of partial participants, which is comprehensive and valuable for clinical studies. From the dataset, we quantitatively analyze and select clinical metadata that most contribute to NAFLD prediction. Furthermore, the proposed DeepFLD, a deep neural network model designed to predict NAFLD using multi-modal input, including metadata and facial images, outperforms the approach that only uses metadata. Satisfactory performance is also verified on other unseen datasets. Inspiringly, DeepFLD can achieve competitive results using only facial images as input rather than metadata, paving the way for a more robust and simpler non-invasive NAFLD diagnosis.
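
A minimal sketch of the multi-modal idea, assuming a tabular-metadata branch and a precomputed facial-image embedding fused by concatenation; the layer sizes, embedding dimension, and two-class head are illustrative assumptions and do not reproduce the DeepFLD architecture.

```python
import torch
import torch.nn as nn

class MetadataImageFusion(nn.Module):
    # Two-branch fusion: clinical metadata plus a precomputed facial-image
    # embedding, concatenated before a two-class head. Layer sizes are
    # illustrative and do not reproduce the DeepFLD architecture.
    def __init__(self, n_meta, img_dim=512, hidden=64):
        super().__init__()
        self.meta = nn.Sequential(nn.Linear(n_meta, hidden), nn.ReLU())
        self.img = nn.Sequential(nn.Linear(img_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, 2)    # NAFLD vs. non-NAFLD logits

    def forward(self, meta, img_feat):
        z = torch.cat([self.meta(meta), self.img(img_feat)], dim=-1)
        return self.head(z)

model = MetadataImageFusion(n_meta=20)
logits = model(torch.randn(8, 20), torch.randn(8, 512))   # batch of 8
```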

Deep unrolling Shrinkage Network for Dynamic MR imaging

  • paper_url: http://arxiv.org/abs/2307.09818
  • repo_url: https://github.com/yhao-z/dus-net
  • paper_authors: Yinghao Zhang, Xiaodi Li, Weihang Li, Yue Hu
  • for: Dynamic magnetic resonance (MR) imaging reconstruction models.
  • methods: A novel soft thresholding with channel attention (AST) operator that learns a threshold for each channel, and a deep unrolling shrinkage network (DUS-Net) obtained by unrolling the alternating direction method of multipliers (ADMM) for the transformed $l_1$ norm dynamic MR reconstruction model.
  • results: Experiments on an open-access dynamic cine MR dataset show that the proposed DUS-Net outperforms state-of-the-art methods.
    Abstract Deep unrolling networks that utilize sparsity priors have achieved great success in dynamic magnetic resonance (MR) imaging. The convolutional neural network (CNN) is usually utilized to extract the transformed domain, and then the soft thresholding (ST) operator is applied to the CNN-transformed data to enforce the sparsity priors. However, the ST operator is usually constrained to be the same across all channels of the CNN-transformed data. In this paper, we propose a novel operator, called soft thresholding with channel attention (AST), that learns the threshold for each channel. In particular, we put forward a novel deep unrolling shrinkage network (DUS-Net) by unrolling the alternating direction method of multipliers (ADMM) for optimizing the transformed $l_1$ norm dynamic MR reconstruction model. Experimental results on an open-access dynamic cine MR dataset demonstrate that the proposed DUS-Net outperforms the state-of-the-art methods. The source code is available at \url{https://github.com/yhao-z/DUS-Net}.
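
The core of the AST idea, soft thresholding with a separate threshold per channel, can be sketched as follows. Here the thresholds are free learnable parameters, whereas the paper derives them through a channel-attention module, so this is a simplified stand-in rather than the DUS-Net implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelSoftThreshold(nn.Module):
    # Soft thresholding with one learnable threshold per channel. The paper's
    # AST derives the thresholds through channel attention; here they are free
    # parameters, which keeps the sketch short but changes the mechanism.
    def __init__(self, channels):
        super().__init__()
        self.raw_tau = nn.Parameter(torch.zeros(channels))

    def forward(self, x):                     # x: (batch, channels, H, W)
        tau = F.softplus(self.raw_tau).view(1, -1, 1, 1)
        return torch.sign(x) * torch.clamp(torch.abs(x) - tau, min=0.0)

x = torch.randn(2, 8, 16, 16)
print(ChannelSoftThreshold(8)(x).shape)       # torch.Size([2, 8, 16, 16])
```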

Manifold Learning with Sparse Regularised Optimal Transport

  • paper_url: http://arxiv.org/abs/2307.09816
  • repo_url: https://github.com/zsteve/QROT
  • paper_authors: Stephen Zhang, Gilles Mordant, Tetsuya Matsumoto, Geoffrey Schiebinger
  • for: A manifold-learning method based on optimal transport that is robust to the noise and sampling effects present in real-world data.
  • methods: A symmetric version of optimal transport with quadratic regularisation that constructs a sparse and adaptive affinity matrix, interpretable as a generalisation of the bistochastic kernel normalisation.
  • results: The resulting kernel is consistent with a Laplace-type operator in the continuous limit and robust to heteroskedastic noise; a highly efficient computational scheme is identified and the method outperforms competing approaches in a set of examples.
    Abstract Manifold learning is a central task in modern statistics and data science. Many datasets (cells, documents, images, molecules) can be represented as point clouds embedded in a high dimensional ambient space, however the degrees of freedom intrinsic to the data are usually far fewer than the number of ambient dimensions. The task of detecting a latent manifold along which the data are embedded is a prerequisite for a wide family of downstream analyses. Real-world datasets are subject to noisy observations and sampling, so that distilling information about the underlying manifold is a major challenge. We propose a method for manifold learning that utilises a symmetric version of optimal transport with a quadratic regularisation that constructs a sparse and adaptive affinity matrix, that can be interpreted as a generalisation of the bistochastic kernel normalisation. We prove that the resulting kernel is consistent with a Laplace-type operator in the continuous limit, establish robustness to heteroskedastic noise and exhibit these results in simulations. We identify a highly efficient computational scheme for computing this optimal transport for discrete data and demonstrate that it outperforms competing methods in a set of examples.
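
For context, the bistochastic kernel normalisation that the proposed affinity generalises can be sketched with a Gaussian kernel and Sinkhorn-Knopp scaling. This baseline is not the paper's quadratically regularised, sparse OT solver, and the bandwidth and iteration count below are arbitrary choices.

```python
import numpy as np

def bistochastic_affinity(X, sigma=1.0, n_iter=500):
    # Gaussian kernel followed by Sinkhorn-Knopp scaling towards a doubly
    # stochastic matrix: the classical normalisation that the paper's
    # quadratically regularised OT affinity generalises.
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-D2 / (2.0 * sigma ** 2))
    r = np.ones(len(X))
    c = np.ones(len(X))
    for _ in range(n_iter):
        c = 1.0 / (K.T @ r)     # column scaling
        r = 1.0 / (K @ c)       # row scaling
    return r[:, None] * K * c[None, :]

A = bistochastic_affinity(np.random.default_rng(0).standard_normal((50, 3)))
print(A.sum(axis=0)[:3], A.sum(axis=1)[:3])   # both approximately 1
```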

GenKL: An Iterative Framework for Resolving Label Ambiguity and Label Non-conformity in Web Images Via a New Generalized KL Divergence

  • paper_url: http://arxiv.org/abs/2307.09810
  • repo_url: https://github.com/codetopaper/genkl
  • paper_authors: Xia Huang, Kai Fong Ernest Chong
  • for: Addressing non-conforming (NC) instances, i.e. ambiguous in-distribution and out-of-distribution instances, in web image datasets, which can negatively impact image classification models.
  • methods: A new measure based on a generalized KL divergence, $\mathcal{D}_{\text{KL}}^{\alpha, \beta}(p\|q)$, to identify NC instances whose predictions are far from uniformly distributed, proven to be more effective than entropy-based identification; it underpins an iterative training framework, GenKL, that identifies and relabels NC instances.
  • results: New state-of-the-art classification accuracies on three web image datasets: $81.34\%$ on Clothing1M, $85.73\%$ on Food101/Food101N, and $78.99\%$/$92.54\%$ (top-1/top-5) on mini WebVision 1.0.
    Abstract Web image datasets curated online inherently contain ambiguous in-distribution (ID) instances and out-of-distribution (OOD) instances, which we collectively call non-conforming (NC) instances. In many recent approaches for mitigating the negative effects of NC instances, the core implicit assumption is that the NC instances can be found via entropy maximization. For "entropy" to be well-defined, we are interpreting the output prediction vector of an instance as the parameter vector of a multinomial random variable, with respect to some trained model with a softmax output layer. Hence, entropy maximization is based on the idealized assumption that NC instances have predictions that are "almost" uniformly distributed. However, in real-world web image datasets, there are numerous NC instances whose predictions are far from being uniformly distributed. To tackle the limitation of entropy maximization, we propose $(\alpha, \beta)$-generalized KL divergence, $\mathcal{D}_{\text{KL}}^{\alpha, \beta}(p\|q)$, which can be used to identify significantly more NC instances. Theoretical properties of $\mathcal{D}_{\text{KL}}^{\alpha, \beta}(p\|q)$ are proven, and we also show empirically that a simple use of $\mathcal{D}_{\text{KL}}^{\alpha, \beta}(p\|q)$ outperforms all baselines on the NC instance identification task. Building upon $(\alpha,\beta)$-generalized KL divergence, we also introduce a new iterative training framework, GenKL, that identifies and relabels NC instances. When evaluated on three web image datasets, Clothing1M, Food101/Food101N, and mini WebVision 1.0, we achieved new state-of-the-art classification accuracies: $81.34\%$, $85.73\%$ and $78.99\%$/$92.54\%$ (top-1/top-5), respectively.

Graph Federated Learning Based on the Decentralized Framework

  • paper_url: http://arxiv.org/abs/2307.09801
  • repo_url: None
  • paper_authors: Peilin Liu, Yanni Tang, Mingyue Zhang, Wu Chen
  • for: Preventing data privacy leakage while improving model accuracy, via a graph federated learning method built on a decentralized framework.
  • methods: Instead of the classical client-server setup, a decentralized framework is introduced; the confidence among nodes is determined from the similarity of their data, and gradient information is then aggregated by confidence-based linear weighting.
  • results: Compared with FedAvg, Fedprox, GCFL, and GCFL+, experiments show that the proposed method achieves higher accuracy and stability.
    Abstract Graph learning has a wide range of applications in many scenarios, which require more need for data privacy. Federated learning is an emerging distributed machine learning approach that leverages data from individual devices or data centers to improve the accuracy and generalization of the model, while also protecting the privacy of user data. Graph-federated learning is mainly based on the classical federated learning framework i.e., the Client-Server framework. However, the Client-Server framework faces problems such as a single point of failure of the central server and poor scalability of network topology. First, we introduce the decentralized framework to graph-federated learning. Second, determine the confidence among nodes based on the similarity of data among nodes, subsequently, the gradient information is then aggregated by linear weighting based on confidence. Finally, the proposed method is compared with FedAvg, Fedprox, GCFL, and GCFL+ to verify the effectiveness of the proposed method. Experiments demonstrate that the proposed method outperforms other methods.
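
A minimal sketch of the confidence-weighted aggregation idea, assuming each node summarises its local data with a simple statistic (e.g. a label histogram) and weights neighbours' gradients by cosine similarity. The statistic, the similarity measure, and the normalisation are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def confidence_weights(my_stats, neighbor_stats):
    # Confidence of each neighbour from the cosine similarity of simple data
    # summaries (e.g. label histograms), clipped and normalised to sum to one.
    sims = np.array([
        np.dot(my_stats, s) / (np.linalg.norm(my_stats) * np.linalg.norm(s) + 1e-12)
        for s in neighbor_stats
    ])
    w = np.clip(sims, 0.0, None)
    return w / (w.sum() + 1e-12)

def aggregate(my_grad, neighbor_grads, weights):
    # Linear weighting of neighbours' gradients by confidence, plus the local one.
    agg = my_grad.copy()
    for g, w in zip(neighbor_grads, weights):
        agg += w * g
    return agg / (1.0 + weights.sum())

hist = np.array([0.6, 0.3, 0.1])                       # local label histogram
neighbors = [np.array([0.5, 0.4, 0.1]), np.array([0.1, 0.1, 0.8])]
w = confidence_weights(hist, neighbors)
print(aggregate(np.ones(4), [np.zeros(4), 2 * np.ones(4)], w))
```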

Probabilistic Forecasting with Coherent Aggregation

  • paper_url: http://arxiv.org/abs/2307.09797
  • repo_url: None
  • paper_authors: Geoffrey Négiar, Ruijun Ma, O. Nangba Meetei, Mengfei Cao, Michael W. Mahoney
  • for: Accurate probabilistic forecasts for multivariate, hierarchically structured tasks such as energy management, supply chain planning, and resource allocation.
  • methods: A new model that leverages a factor model structure to produce hierarchically coherent forecasts by construction, with a convolutional neural network producing the parameters of the factors, their loadings, and the base-level distributions; samples are differentiable with respect to the model parameters, so any sample-based loss can be optimized.
  • results: Significant improvements of 11.8-41.4% on three hierarchical forecasting datasets, together with an analysis of the influence of the base-level distribution and the number of factors.
    Abstract Obtaining accurate probabilistic forecasts while respecting hierarchical information is an important operational challenge in many applications, perhaps most obviously in energy management, supply chain planning, and resource allocation. The basic challenge, especially for multivariate forecasting, is that forecasts are often required to be coherent with respect to the hierarchical structure. In this paper, we propose a new model which leverages a factor model structure to produce coherent forecasts by construction. This is a consequence of a simple (exchangeability) observation: permuting \textit{}base-level series in the hierarchy does not change their aggregates. Our model uses a convolutional neural network to produce parameters for the factors, their loadings and base-level distributions; it produces samples which can be differentiated with respect to the model's parameters; and it can therefore optimize for any sample-based loss function, including the Continuous Ranked Probability Score and quantile losses. We can choose arbitrary continuous distributions for the factor and the base-level distributions. We compare our method to two previous methods which can be optimized end-to-end, while enforcing coherent aggregation. Our model achieves significant improvements: between $11.8-41.4\%$ on three hierarchical forecasting datasets. We also analyze the influence of parameters in our model with respect to base-level distribution and number of factors.
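
The coherence-by-construction argument can be illustrated in a few lines: if base-level sample paths are generated from shared factors, their sum is the aggregate sample path, and permuting the base-level series leaves the aggregate unchanged. The dimensions and distributions below are arbitrary toy choices, not the paper's convolutional parameterisation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_base, n_factors, horizon, n_samples = 6, 2, 8, 1000

loadings = rng.uniform(0.0, 1.0, size=(n_base, n_factors))
factors = rng.standard_normal((n_samples, n_factors, horizon))
noise = 0.1 * rng.standard_normal((n_samples, n_base, horizon))

# Base-level sample paths driven by shared factors.
base = np.einsum('bf,sfh->sbh', loadings, factors) + noise

# The aggregate sample path is literally the sum of the base samples, so every
# sample is coherent with the (one-level) hierarchy by construction, and
# permuting the base-level series leaves the aggregate unchanged.
total = base.sum(axis=1)
perm = rng.permutation(n_base)
assert np.allclose(total, base[:, perm, :].sum(axis=1))
```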

Forecasting Early with Meta Learning

  • paper_url: http://arxiv.org/abs/2307.09796
  • repo_url: https://github.com/super-shayan/feml
  • paper_authors: Shayan Jawed, Kiran Madhusudhanan, Vijaya Krishna Yalavarthi, Lars Schmidt-Thieme
  • for: A meta-learning approach to time-series forecasting that exploits existing datasets to forecast a target series during its early observation period, when only a few historic observations are available.
  • methods: A meta-learning method (FEML) with a shared convolutional backbone that learns features from varying-length inputs across datasets, dataset-specific heads that forecast different output lengths, and adversarial learning as an auxiliary task for the target dataset.
  • results: FEML can meta-learn across datasets and, by additionally learning on adversarially generated samples as auxiliary samples for the target dataset, improves forecasting performance over single-task learning and over solutions adapted from joint learning, multi-task learning, and classic forecasting baselines.
    Abstract In the early observation period of a time series, there might be only a few historic observations available to learn a model. However, in cases where an existing prior set of datasets is available, Meta learning methods can be applicable. In this paper, we devise a Meta learning method that exploits samples from additional datasets and learns to augment time series through adversarial learning as an auxiliary task for the target dataset. Our model (FEML), is equipped with a shared Convolutional backbone that learns features for varying length inputs from different datasets and has dataset specific heads to forecast for different output lengths. We show that FEML can meta learn across datasets and by additionally learning on adversarial generated samples as auxiliary samples for the target dataset, it can improve the forecasting performance compared to single task learning, and various solutions adapted from Joint learning, Multi-task learning and classic forecasting baselines.

From West to East: Who can understand the music of the others better?

  • paper_url: http://arxiv.org/abs/2307.09795
  • repo_url: https://github.com/pxaris/ccml
  • paper_authors: Charilaos Papaioannou, Emmanouil Benetos, Alexandros Potamianos
  • For: To study whether audio embedding models trained on one music culture or style can learn representations for others, and what transfer learning reveals about the similarities between music cultures.
  • Methods: Transfer learning with two Western music datasets, two traditional/folk datasets from eastern Mediterranean cultures, and two Indian art music datasets as sources, using two CNN-based and one Transformer-based pre-trained audio embedding models for auto-tagging in each target domain.
  • Results: Competitive performance is achieved in all domains via transfer learning, while the best source dataset varies for each music culture; the implementation and the trained models are provided in a public repository.
    Abstract Recent developments in MIR have led to several benchmark deep learning models whose embeddings can be used for a variety of downstream tasks. At the same time, the vast majority of these models have been trained on Western pop/rock music and related styles. This leads to research questions on whether these models can be used to learn representations for different music cultures and styles, or whether we can build similar music audio embedding models trained on data from different cultures or styles. To that end, we leverage transfer learning methods to derive insights about the similarities between the different music cultures to which the data belongs to. We use two Western music datasets, two traditional/folk datasets coming from eastern Mediterranean cultures, and two datasets belonging to Indian art music. Three deep audio embedding models are trained and transferred across domains, including two CNN-based and a Transformer-based architecture, to perform auto-tagging for each target domain dataset. Experimental results show that competitive performance is achieved in all domains via transfer learning, while the best source dataset varies for each music culture. The implementation and the trained models are both provided in a public repository.

IncDSI: Incrementally Updatable Document Retrieval

  • paper_url: http://arxiv.org/abs/2307.10323
  • repo_url: None
  • paper_authors: Varsha Kishore, Chao Wan, Justin Lovelace, Yoav Artzi, Kilian Q. Weinberger
  • For: Document retrieval systems that need to be updated with new information in real time.
  • Methods: IncDSI, a method that adds documents to a trained neural network-based search index in real time (about 20-50 ms per document) by formulating the addition as a constrained optimization problem, without retraining the model on the entire dataset.
  • Results: The approach is orders of magnitude faster than retraining while remaining competitive with retraining the model on the whole dataset, and its effectiveness is demonstrated on several benchmarks.
    Abstract Differentiable Search Index is a recently proposed paradigm for document retrieval, that encodes information about a corpus of documents within the parameters of a neural network and directly maps queries to corresponding documents. These models have achieved state-of-the-art performances for document retrieval across many benchmarks. These kinds of models have a significant limitation: it is not easy to add new documents after a model is trained. We propose IncDSI, a method to add documents in real time (about 20-50ms per document), without retraining the model on the entire dataset (or even parts thereof). Instead we formulate the addition of documents as a constrained optimization problem that makes minimal changes to the network parameters. Although orders of magnitude faster, our approach is competitive with re-training the model on the whole dataset and enables the development of document retrieval systems that can be updated with new information in real-time. Our code for IncDSI is available at https://github.com/varshakishore/IncDSI.

A Note on Hardness of Computing Recursive Teaching Dimension

  • paper_url: http://arxiv.org/abs/2307.09792
  • repo_url: None
  • paper_authors: Pasin Manurangsi
  • for: Showing that computing the recursive teaching dimension (RTD) of a concept class (given explicitly as input) requires $n^{\Omega(\log n)}$ time, assuming the Exponential Time Hypothesis (ETH).
  • methods: A hardness argument under the Exponential Time Hypothesis (ETH).
  • results: The lower bound matches the $n^{O(\log n)}$ running time of the brute-force algorithm.
    Abstract In this short note, we show that the problem of computing the recursive teaching dimension (RTD) for a concept class (given explicitly as input) requires $n^{\Omega(\log n)}$-time, assuming the exponential time hypothesis (ETH). This matches the running time $n^{O(\log n)}$ of the brute-force algorithm for the problem.

Reproducibility in Machine Learning-Driven Research

  • paper_url: http://arxiv.org/abs/2307.10320
  • repo_url: None
  • paper_authors: Harald Semmelrock, Simone Kopeinik, Dieter Theiler, Tony Ross-Hellauer, Dominik Kowald
  • for: Examining the reproducibility crisis in machine learning (ML)-driven research and the reproducibility issues and barriers present across research fields that apply ML.
  • methods: A mini survey of the literature on the current state of ML reproducibility in various research fields, identifying reproducibility issues and barriers.
  • results: Reproducibility issues and barriers are identified across research fields, together with potential drivers such as tools, practices, and interventions (e.g. ML platforms) that support ML reproducibility.
    Abstract Research is facing a reproducibility crisis, in which the results and findings of many studies are difficult or even impossible to reproduce. This is also the case in machine learning (ML) and artificial intelligence (AI) research. Often, this is the case due to unpublished data and/or source-code, and due to sensitivity to ML training conditions. Although different solutions to address this issue are discussed in the research community such as using ML platforms, the level of reproducibility in ML-driven research is not increasing substantially. Therefore, in this mini survey, we review the literature on reproducibility in ML-driven research with three main aims: (i) reflect on the current situation of ML reproducibility in various research fields, (ii) identify reproducibility issues and barriers that exist in these research fields applying ML, and (iii) identify potential drivers such as tools, practices, and interventions that support ML reproducibility. With this, we hope to contribute to decisions on the viability of different solutions for supporting ML reproducibility.

ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats

  • paper_url: http://arxiv.org/abs/2307.09782
  • repo_url: https://github.com/microsoft/DeepSpeed
  • paper_authors: Xiaoxia Wu, Zhewei Yao, Yuxiong He
  • for: Striking a balance between computational efficiency and model quality for large language models (LLMs).
  • methods: A comprehensive study of floating-point (FP) quantization, in particular FP8 and FP4, as a potential solution, motivated by the launch of NVIDIA's H100 hardware.
  • results: For LLMs, FP8 activation consistently outperforms its integer (INT8) counterpart, especially for models beyond one billion parameters; for weight quantization, FP4 performs comparably to or better than INT4, simplifying deployment on FP-supported hardware such as H100. Two scaling constraints for weight quantization are proposed that mitigate the overhead of precision alignment with negligible performance impact, and integrating the LoRC strategy yields further improvements, especially in smaller models.
    Abstract In the complex domain of large language models (LLMs), striking a balance between computational efficiency and maintaining model quality is a formidable challenge. Navigating the inherent limitations of uniform quantization, particularly when dealing with outliers, and motivated by the launch of NVIDIA's H100 hardware, this study delves into the viability of floating-point (FP) quantization, particularly focusing on FP8 and FP4, as a potential solution. Our comprehensive investigation reveals that for LLMs, FP8 activation consistently outshines its integer (INT8) equivalent, with the performance edge becoming more noticeable in models possessing parameters beyond one billion. For weight quantization, our findings indicate that FP4 exhibits comparable, if not superior, performance to INT4, simplifying deployment on FP-supported hardware like H100. To mitigate the overhead from precision alignment caused by the disparity between weights and activations, we propose two scaling constraints for weight quantization that negligibly impact the performance compared to the standard W4A8 model. We additionally enhance our quantization methods by integrating the Low Rank Compensation (LoRC) strategy, yielding improvements especially in smaller models. The results of our investigation emphasize the immense potential of FP quantization for LLMs, paving the way for high-efficiency deployment in resource-limited settings.
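
To illustrate what FP4 weight quantization does, the sketch below simulates per-channel "fake" quantization onto a typical FP4 (E2M1) value grid. The grid, the symmetric max-based scaling, and the rounding rule are assumptions made for illustration and may differ from the ZeroQuant-FP kernels and the paper's scaling constraints.

```python
import torch

# Representable magnitudes of a typical FP4 (E2M1) format; an assumption for
# illustration, the exact grid used by hardware kernels may differ.
FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quant_fp4(w, channel_dim=0):
    # Per-output-channel symmetric scaling so the largest magnitude maps to 6.0,
    # then rounding every entry to the nearest representable FP4 magnitude.
    reduce_dims = tuple(d for d in range(w.dim()) if d != channel_dim)
    amax = w.abs().amax(dim=reduce_dims, keepdim=True)
    scale = (amax / FP4_GRID.max()).clamp(min=1e-12)
    mags = (w / scale).abs()
    idx = (mags.unsqueeze(-1) - FP4_GRID).abs().argmin(dim=-1)
    return FP4_GRID[idx] * torch.sign(w) * scale

w = torch.randn(8, 16)
print((fake_quant_fp4(w) - w).abs().max())   # error introduced by FP4 rounding
```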

Text2Layer: Layered Image Generation using Latent Diffusion Model

  • paper_url: http://arxiv.org/abs/2307.09781
  • repo_url: None
  • paper_authors: Xinyang Zhang, Wentian Zhao, Xin Lu, Jeff Chien
  • for: Exploring layered image generation for layer-compositing workflows, motivated by the success of diffusion models.
  • methods: Generating the background, foreground, layer mask, and composed image simultaneously, by training an autoencoder to reconstruct layered images and training diffusion models on its latent representation.
  • results: The method generates high-quality layered images and layer masks, and initiates a benchmark for future work.
    Abstract Layer compositing is one of the most popular image editing workflows among both amateurs and professionals. Motivated by the success of diffusion models, we explore layer compositing from a layered image generation perspective. Instead of generating an image, we propose to generate background, foreground, layer mask, and the composed image simultaneously. To achieve layered image generation, we train an autoencoder that is able to reconstruct layered images and train diffusion models on the latent representation. One benefit of the proposed problem is to enable better compositing workflows in addition to the high-quality image output. Another benefit is producing higher-quality layer masks compared to masks produced by a separate step of image segmentation. Experimental results show that the proposed method is able to generate high-quality layered images and initiates a benchmark for future work.

Beyond Single-Feature Importance with ICECREAM

  • paper_url: http://arxiv.org/abs/2307.09779
  • repo_url: None
  • paper_authors: Michael Oesterle, Patrick Blöbaum, Atalanti A. Mastakouri, Elke Kirschbaum
  • for: A method for explaining model outputs and the failures of cloud computing applications.
  • methods: An information-theoretic quantitative measure of the influence of a coalition of variables on the distribution of a target variable, used to identify which set of factors is essential for a certain outcome rather than ranking individual factors.
  • results: Experiments on synthetic and real-world data show that ICECREAM outperforms state-of-the-art methods for explainability and root cause analysis and achieves impressive accuracy in both tasks.
    Abstract Which set of features was responsible for a certain output of a machine learning model? Which components caused the failure of a cloud computing application? These are just two examples of questions we are addressing in this work by Identifying Coalition-based Explanations for Common and Rare Events in Any Model (ICECREAM). Specifically, we propose an information-theoretic quantitative measure for the influence of a coalition of variables on the distribution of a target variable. This allows us to identify which set of factors is essential to obtain a certain outcome, as opposed to well-established explainability and causal contribution analysis methods which can assign contributions only to individual factors and rank them by their importance. In experiments with synthetic and real-world data, we show that ICECREAM outperforms state-of-the-art methods for explainability and root cause analysis, and achieves impressive accuracy in both tasks.

Polyffusion: A Diffusion Model for Polyphonic Score Generation with Internal and External Controls

  • paper_url: http://arxiv.org/abs/2307.10304
  • repo_url: https://github.com/aik2mlj/polyffusion
  • paper_authors: Lejun Min, Junyan Jiang, Gus Xia, Jingwei Zhao
  • for: generating polyphonic music scores using a diffusion model
  • methods: uses internal control (pre-defined parts of the music) and external control (external yet related information such as chord, texture, or other features) via cross-attention mechanism
  • results: significantly outperforms existing Transformer and sampling-based baselines, and using pre-trained disentangled representations as external conditions yields more effective controls.
    Abstract We propose Polyffusion, a diffusion model that generates polyphonic music scores by regarding music as image-like piano roll representations. The model is capable of controllable music generation with two paradigms: internal control and external control. Internal control refers to the process in which users pre-define a part of the music and then let the model infill the rest, similar to the task of masked music generation (or music inpainting). External control conditions the model with external yet related information, such as chord, texture, or other features, via the cross-attention mechanism. We show that by using internal and external controls, Polyffusion unifies a wide range of music creation tasks, including melody generation given accompaniment, accompaniment generation given melody, arbitrary music segment inpainting, and music arrangement given chords or textures. Experimental results show that our model significantly outperforms existing Transformer and sampling-based baselines, and using pre-trained disentangled representations as external conditions yields more effective controls.

Eliminating Label Leakage in Tree-Based Vertical Federated Learning

  • paper_url: http://arxiv.org/abs/2307.10318
  • repo_url: None
  • paper_authors: Hideaki Takahashi, Jingjing Liu, Yang Liu
  • for: Studying label-leakage attacks and defenses in tree-based vertical federated learning (VFL).
  • methods: A novel label inference attack, ID2Graph, which uses the sets of record IDs assigned to each node (the instance space) to deduce private training labels, and an effective defense mechanism, ID-LMID, which prevents label leakage through mutual information regularization.
  • results: Experiments show that the ID2Graph attack poses significant leakage risks to tree-based models such as Random Forest and XGBoost, while ID-LMID effectively mitigates label leakage in such instances.
    Abstract Vertical federated learning (VFL) enables multiple parties with disjoint features of a common user set to train a machine learning model without sharing their private data. Tree-based models have become prevalent in VFL due to their interpretability and efficiency. However, the vulnerability of tree-based VFL has not been sufficiently investigated. In this study, we first introduce a novel label inference attack, ID2Graph, which utilizes the sets of record-IDs assigned to each node (i.e., instance space) to deduce private training labels. The ID2Graph attack generates a graph structure from training samples, extracts communities from the graph, and clusters the local dataset using community information. To counteract label leakage from the instance space, we propose an effective defense mechanism, ID-LMID, which prevents label leakage by focusing on mutual information regularization. Comprehensive experiments conducted on various datasets reveal that the ID2Graph attack presents significant risks to tree-based models such as Random Forest and XGBoost. Further evaluations on these benchmarks demonstrate that ID-LMID effectively mitigates label leakage in such instances.

Self-Supervised Learning for WiFi CSI-Based Human Activity Recognition: A Systematic Study

  • paper_url: http://arxiv.org/abs/2308.02412
  • repo_url: https://github.com/JJJinx/SSLCSI
  • paper_authors: Ke Xu, Jiangtao Wang, Hongyuan Zhu, Dingchang Zheng
  • for: Examining WiFi CSI-based human activity recognition (HAR) and how deep learning combined with self-supervised learning (SSL) can address the scarcity of labeled CSI data.
  • methods: A systematic review and evaluation of several categories of SSL algorithms, both previously studied and not yet explored in this field, with extensive experiments on three publicly available CSI HAR datasets covering different tasks and environments.
  • results: SSL algorithms can improve HAR performance in WiFi CSI applications, but several limitations and blind spots remain that need to be addressed before SSL can be effectively deployed in real-world WiFi-based HAR applications.
    Abstract Recently, with the advancement of the Internet of Things (IoT), WiFi CSI-based HAR has gained increasing attention from academic and industry communities. By integrating the deep learning technology with CSI-based HAR, researchers achieve state-of-the-art performance without the need of expert knowledge. However, the scarcity of labeled CSI data remains the most prominent challenge when applying deep learning models in the context of CSI-based HAR due to the privacy and incomprehensibility of CSI-based HAR data. On the other hand, SSL has emerged as a promising approach for learning meaningful representations from data without heavy reliance on labeled examples. Therefore, considerable efforts have been made to address the challenge of insufficient data in deep learning by leveraging SSL algorithms. In this paper, we undertake a comprehensive inventory and analysis of the potential held by different categories of SSL algorithms, including those that have been previously studied and those that have not yet been explored, within the field. We provide an in-depth investigation of SSL algorithms in the context of WiFi CSI-based HAR. We evaluate four categories of SSL algorithms using three publicly available CSI HAR datasets, each encompassing different tasks and environmental settings. To ensure relevance to real-world applications, we design performance metrics that align with specific requirements. Furthermore, our experimental findings uncover several limitations and blind spots in existing work, highlighting the barriers that need to be addressed before SSL can be effectively deployed in real-world WiFi-based HAR applications. Our results also serve as a practical guideline for industry practitioners and provide valuable insights for future research endeavors in this field.

A Novel Spatial-Temporal Variational Quantum Circuit to Enable Deep Learning on NISQ Devices

  • paper_url: http://arxiv.org/abs/2307.09771
  • repo_url: None
  • paper_authors: Jinyang Li, Zhepeng Wang, Zhirui Hu, Prasanna Date, Ang Li, Weiwen Jiang
  • For: This paper proposes a novel spatial-temporal design, named ST-VQC, to improve the accuracy and robustness of quantum learning algorithms for handling non-linear datasets and noisy environments.
  • Methods: ST-VQC integrates non-linearity in quantum learning through a block-based encoding quantum sub-circuit coupled with a layer-wise computation quantum sub-circuit, adopts a SWAP-free physical circuit design to improve robustness, and uses an automated optimization framework to generate the ST-VQC quantum circuit.
  • Results: ST-VQC achieves over 30% accuracy improvement compared with existing VQCs on actual quantum computers, and outperforms a linear classifier by 27.9% on a non-linear synthetic dataset.
    Abstract Quantum computing presents a promising approach for machine learning with its capability for extremely parallel computation in high-dimension through superposition and entanglement. Despite its potential, existing quantum learning algorithms, such as Variational Quantum Circuits(VQCs), face challenges in handling more complex datasets, particularly those that are not linearly separable. What's more, it encounters the deployability issue, making the learning models suffer a drastic accuracy drop after deploying them to the actual quantum devices. To overcome these limitations, this paper proposes a novel spatial-temporal design, namely ST-VQC, to integrate non-linearity in quantum learning and improve the robustness of the learning model to noise. Specifically, ST-VQC can extract spatial features via a novel block-based encoding quantum sub-circuit coupled with a layer-wise computation quantum sub-circuit to enable temporal-wise deep learning. Additionally, a SWAP-Free physical circuit design is devised to improve robustness. These designs bring a number of hyperparameters. After a systematic analysis of the design space for each design component, an automated optimization framework is proposed to generate the ST-VQC quantum circuit. The proposed ST-VQC has been evaluated on two IBM quantum processors, ibm_cairo with 27 qubits and ibmq_lima with 7 qubits to assess its effectiveness. The results of the evaluation on the standard dataset for binary classification show that ST-VQC can achieve over 30% accuracy improvement compared with existing VQCs on actual quantum computers. Moreover, on a non-linear synthetic dataset, the ST-VQC outperforms a linear classifier by 27.9%, while the linear classifier using classical computing outperforms the existing VQC by 15.58%.

How Curvature Enhance the Adaptation Power of Framelet GCNs

  • paper_url: http://arxiv.org/abs/2307.09768
  • repo_url: https://github.com/dshi3553usyd/curvature_enhanced_graph_convolution
  • paper_authors: Dai Shi, Yi Guo, Zhiqi Shao, Junbin Gao
  • for: This paper aims to enhance the performance of graph neural networks (GNNs) by incorporating graph geometric information, specifically discrete graph Ricci curvature.
  • methods: The proposed approach uses the graph Ricci curvature defined on the edges of a graph to measure the difficulty of information transit between nodes. The curvature information is inserted into the GNN model with a carefully designed transformation function $\zeta$ to alleviate computational issues such as over-smoothing.
  • results: The proposed curvature-based GNN model outperforms state-of-the-art baselines on both homophily and heterophily graph datasets, indicating the effectiveness of involving graph geometric information in GNNs. Additionally, a curvature-based graph edge drop algorithm is proposed to drop edges with very positive Ricci curvature, enhancing the model's adaptation to heterophily graphs.
    Abstract Graph neural network (GNN) has been demonstrated powerful in modeling graph-structured data. However, despite many successful cases of applying GNNs to various graph classification and prediction tasks, whether the graph geometrical information has been fully exploited to enhance the learning performance of GNNs is not yet well understood. This paper introduces a new approach to enhance GNN by discrete graph Ricci curvature. Specifically, the graph Ricci curvature defined on the edges of a graph measures how difficult the information transits on one edge from one node to another based on their neighborhoods. Motivated by the geometric analogy of Ricci curvature in the graph setting, we prove that by inserting the curvature information with different carefully designed transformation function $\zeta$, several known computational issues in GNN such as over-smoothing can be alleviated in our proposed model. Furthermore, we verified that edges with very positive Ricci curvature (i.e., $\kappa_{i,j} \approx 1$) are preferred to be dropped to enhance model's adaption to heterophily graph and one curvature based graph edge drop algorithm is proposed. Comprehensive experiments show that our curvature-based GNN model outperforms the state-of-the-art baselines in both homophily and heterophily graph datasets, indicating the effectiveness of involving graph geometric information in GNNs.
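
A rough sketch of curvature-based edge dropping, using the simple combinatorial Forman curvature of an unweighted edge as a stand-in. The paper works with Ricci curvature (where values near 1 indicate very positive curvature), so both the curvature proxy and the threshold below are illustrative assumptions rather than the proposed algorithm.

```python
import networkx as nx

def forman_curvature(G, u, v):
    # Crude combinatorial Forman curvature of an unweighted edge, ignoring
    # triangle contributions; a stand-in for the Ricci curvature in the paper.
    return 4 - G.degree(u) - G.degree(v)

def drop_positive_curvature_edges(G, threshold=0):
    # Remove the most positively curved edges, mirroring the idea of dropping
    # edges whose curvature is close to its upper bound.
    H = G.copy()
    for u, v in list(G.edges()):
        if forman_curvature(G, u, v) >= threshold:
            H.remove_edge(u, v)
    return H

G = nx.path_graph(5)                   # degrees 1-2-2-2-1
H = drop_positive_curvature_edges(G, threshold=1)
print(G.number_of_edges(), H.number_of_edges())   # 4, 2
```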
    摘要 图神经网络(GNN)在图结构数据建模中已展现出强大能力。然而,尽管GNN已成功应用于各类图分类与预测任务,图的几何信息是否被充分利用以提升GNN的学习性能仍不清楚。本文提出一种利用离散图Ricci曲率增强GNN的新方法。具体而言,定义在图的边上的Ricci曲率基于两端节点的邻域,度量信息沿该边从一个节点传递到另一个节点的难易程度。受图设置下Ricci曲率几何类比的启发,我们证明:通过精心设计的变换函数 $\zeta$ 注入曲率信息,可以在所提模型中缓解GNN的若干已知计算问题(如过平滑)。此外,我们验证了应优先丢弃Ricci曲率非常大的边(即 $\kappa_{i,j} \approx 1$)以增强模型对异配图的适应能力,并据此提出了一种基于曲率的图边丢弃算法。大量实验表明,我们的基于曲率的GNN模型在同配与异配图数据集上均优于最先进的基线方法,说明在GNN中引入图的几何信息是有效的。
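
A minimal sketch of the curvature-weighted message passing and curvature-based edge drop described above. It is not the authors' implementation: a triangle-augmented Forman curvature stands in for the paper's graph Ricci curvature, the tanh-based `zeta` is just one plausible choice of transformation function, and the drop ratio is arbitrary.

```python
import numpy as np
import torch

def forman_curvature(edges, n_nodes):
    """Simplified (triangle-augmented) Forman curvature per undirected edge,
    used here as a cheap stand-in for the paper's graph Ricci curvature."""
    nbrs = [set() for _ in range(n_nodes)]
    for u, v in edges:
        nbrs[u].add(v); nbrs[v].add(u)
    curv = []
    for u, v in edges:
        triangles = len(nbrs[u] & nbrs[v])
        curv.append(4 - len(nbrs[u]) - len(nbrs[v]) + 3 * triangles)
    return np.array(curv, dtype=np.float32)

def zeta(kappa):
    # One possible transformation function: squash curvature into (0, 2)
    # so it can re-weight messages without exploding them.
    return 1.0 + np.tanh(kappa / 4.0)

def curvature_message_passing(x, edges, kappa):
    """One propagation step where each message is scaled by zeta(curvature)."""
    out = torch.zeros_like(x)
    w = torch.tensor(zeta(kappa), dtype=torch.float32)
    for (u, v), w_e in zip(edges, w):
        out[v] += w_e * x[u]
        out[u] += w_e * x[v]
    return out

def curvature_edge_drop(edges, kappa, drop_ratio=0.25):
    """Drop the most positively curved edges (heterophily-oriented variant)."""
    keep = np.argsort(kappa)[: int(len(edges) * (1 - drop_ratio))]
    return [edges[i] for i in keep], kappa[keep]

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]          # toy graph
x = torch.randn(4, 8)                             # node features
kappa = forman_curvature(edges, n_nodes=4)
kept_edges, kept_kappa = curvature_edge_drop(edges, kappa)
h = curvature_message_passing(x, kept_edges, kept_kappa)
```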

Sig-Splines: universal approximation and convex calibration of time series generative models

  • paper_url: http://arxiv.org/abs/2307.09767
  • repo_url: None
  • paper_authors: Magnus Wiese, Phillip Murray, Ralf Korn
  • for: 这个论文是为了提出一种新的生成模型,用于处理多变量离散时间序列数据。
  • methods: 该算法启发自神经样条流(neural spline flows)的构造,并将线性变换和签名变换作为神经网络的替代。这种方法不仅具有神经网络的通用性,还引入了模型参数的凸性。
  • results: 该模型可以实现不仅神经网络的通用性,还可以保证模型参数的凸性。
    Abstract We propose a novel generative model for multivariate discrete-time time series data. Drawing inspiration from the construction of neural spline flows, our algorithm incorporates linear transformations and the signature transform as a seamless substitution for traditional neural networks. This approach enables us to achieve not only the universality property inherent in neural networks but also introduces convexity in the model's parameters.
    摘要 我们提出一种新的生成模型,用于多变量离散时间序列数据。我们的算法受神经样条流(neural spline flows)构造的启发,以线性变换和签名变换无缝替代传统神经网络。这种方法不仅保留了神经网络的通用逼近性质,还使模型参数具有凸性。

Reinforcing POD based model reduction techniques in reaction-diffusion complex networks using stochastic filtering and pattern recognition

  • paper_url: http://arxiv.org/abs/2307.09762
  • repo_url: None
  • paper_authors: Abhishek Ajayakumar, Soumyendu Raha
  • for: 模型实际世界系统,但是这些系统的维度可能会使其分析变得困难。
  • methods: 使用dimensionality reduction技术,如POD,但这些模型容易受输入数据的干扰。
  • results: 我们的方法可以改善受到干扰输入的模型的准确性。
    Abstract Complex networks are used to model many real-world systems. However, the dimensionality of these systems can make them challenging to analyze. Dimensionality reduction techniques like POD can be used in such cases. However, these models are susceptible to perturbations in the input data. We propose an algorithmic framework that combines techniques from pattern recognition (PR) and stochastic filtering theory to enhance the output of such models. The results of our study show that our method can improve the accuracy of the surrogate model under perturbed inputs. Deep Neural Networks (DNNs) are susceptible to adversarial attacks. However, recent research has revealed that neural Ordinary Differential Equations (ODEs) exhibit robustness in specific applications. We benchmark our algorithmic framework with a Neural ODE-based approach as a reference.
    摘要 复杂网络被用于建模许多现实世界系统。然而,这些系统的高维度会使其分析变得困难,此时可以使用诸如POD之类的降维技术。但这类模型容易受到输入数据扰动的影响。我们提出一种算法框架,将模式识别(PR)技术与随机滤波理论相结合,以改进此类模型的输出。研究结果表明,我们的方法能够在输入受扰动时提高代理模型的精度。深度神经网络(DNN)容易受到对抗攻击,而近期研究表明,神经常微分方程(Neural ODE)在某些应用场景中表现出鲁棒性。我们以基于Neural ODE的方法作为参考,对所提算法框架进行了基准比较。
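
At its core, the POD surrogate discussed above is an SVD of a snapshot matrix followed by projection onto the leading modes; the sketch below shows only that baseline step (the stochastic-filtering and pattern-recognition enhancements proposed in the paper are not reproduced here).

```python
import numpy as np

def pod_basis(snapshots, energy=0.99):
    """snapshots: (n_dof, n_snapshots) matrix of full-model state vectors.
    Returns the POD modes capturing the requested fraction of the energy."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    r = int(np.searchsorted(cum, energy)) + 1
    return U[:, :r]                          # (n_dof, r) reduced basis

def reduce_and_reconstruct(snapshots, new_state):
    Phi = pod_basis(snapshots)
    a = Phi.T @ new_state                    # reduced coordinates
    return Phi @ a                           # low-rank surrogate reconstruction

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 40))           # toy snapshot matrix
x_new = X[:, 0] + 0.01 * rng.standard_normal(500)   # perturbed input
x_hat = reduce_and_reconstruct(X, x_new)
print(np.linalg.norm(x_hat - x_new) / np.linalg.norm(x_new))
```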

FedBug: A Bottom-Up Gradual Unfreezing Framework for Federated Learning

  • paper_url: http://arxiv.org/abs/2307.10317
  • repo_url: https://github.com/iandrover/fedbug
  • paper_authors: Chia-Hsiang Kao, Yu-Chiang Frank Wang
  • for: 该论文旨在提出一种新的 Federated Learning(FL)框架,以解决客户端遗传的问题。
  • methods: 该论文提出了一种名为 FedBug(Federated Learning with Bottom-Up Gradual Unfreezing)的新的 FL 框架,它在客户端上采用层 wise 的渐进解冻策略,从输入层到输出层,以实现跨客户端的Alignment。
  • results: 该论文通过理论分析和实验 validate 了 FedBug 的高效性,并且在不同的数据集、训练条件和网络架构下进行了广泛的测试。
    Abstract Federated Learning (FL) offers a collaborative training framework, allowing multiple clients to contribute to a shared model without compromising data privacy. Due to the heterogeneous nature of local datasets, updated client models may overfit and diverge from one another, commonly known as the problem of client drift. In this paper, we propose FedBug (Federated Learning with Bottom-Up Gradual Unfreezing), a novel FL framework designed to effectively mitigate client drift. FedBug adaptively leverages the client model parameters, distributed by the server at each global round, as the reference points for cross-client alignment. Specifically, on the client side, FedBug begins by freezing the entire model, then gradually unfreezes the layers, from the input layer to the output layer. This bottom-up approach allows models to train the newly thawed layers to project data into a latent space, wherein the separating hyperplanes remain consistent across all clients. We theoretically analyze FedBug in a novel over-parameterization FL setup, revealing its superior convergence rate compared to FedAvg. Through comprehensive experiments, spanning various datasets, training conditions, and network architectures, we validate the efficacy of FedBug. Our contributions encompass a novel FL framework, theoretical analysis, and empirical validation, demonstrating the wide potential and applicability of FedBug.
    摘要 联邦学习(FL)提供了一种协作训练框架,让多个客户端在不损害数据隐私的前提下共同训练一个共享模型。由于本地数据集的异质性,更新后的客户端模型可能过拟合并彼此偏离,即常见的客户端漂移(client drift)问题。在这篇文章中,我们提出了FedBug(Federated Learning with Bottom-Up Gradual Unfreezing),一个新的FL框架,用于有效地缓解客户端漂移问题。FedBug自适应地利用服务器在每个全局轮次下发的客户端模型参数作为跨客户端对齐的参考点。具体来说,在客户端上,FedBug首先将整个模型冻结,然后从输入层到输出层逐层解冻。这种自底向上的方法让模型在训练新解冻的层时,将数据投影到一个分隔超平面在所有客户端间保持一致的潜在空间中。我们在一种新的过参数化FL设定下对FedBug进行了理论分析,显示其收敛速度优于FedAvg。通过涵盖多种数据集、训练条件和网络架构的实验,我们验证了FedBug的有效性。我们的贡献包括一个新的FL框架、理论分析和实验验证,展示了FedBug的广泛潜力与适用性。
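
A minimal PyTorch sketch of the client-side bottom-up gradual unfreezing idea, assuming `model.children()` enumerates layer groups from input to output; the schedule, optimizer and loop structure are placeholders rather than the authors' code.

```python
import torch

def client_update_gradual_unfreezing(model, local_loader, loss_fn, local_epochs=4, lr=0.01):
    """Client-side training that starts fully frozen and unfreezes layer groups
    from the input side to the output side as the local epochs progress."""
    layer_groups = list(model.children())            # assumed input-to-output order
    for p in model.parameters():
        p.requires_grad = False
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    epochs_per_group = max(1, local_epochs // len(layer_groups))

    for epoch in range(local_epochs):
        n_unfrozen = min(len(layer_groups), epoch // epochs_per_group + 1)
        for group in layer_groups[:n_unfrozen]:      # bottom-up: earliest layers first
            for p in group.parameters():
                p.requires_grad = True
        for x, y in local_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()                               # frozen params have no grad and are skipped
    return model.state_dict()                        # returned to the server for aggregation
```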

Constructing Extreme Learning Machines with zero Spectral Bias

  • paper_url: http://arxiv.org/abs/2307.09759
  • repo_url: None
  • paper_authors: Kaumudi Joshi, Vukka Snigdha, Arya Kumar Bhattacharya
  • for: This paper aims to address the issue of Spectral Bias (SB) in Extreme Learning Machines (ELMs) and its impact on the resolution of higher frequencies, which is crucial in fields like Physics Informed Neural Networks (PINNs).
  • methods: The paper uses Fourier Feature Embeddings to mitigate SB in ELMs, which has been shown to be effective in addressing this issue in other types of Artificial Neural Networks (ANNs).
  • results: The paper demonstrates that the proposed approach completely eliminates SB in ELMs, making them feasible for practical problems like PINNs where resolution of higher frequencies is essential.
    Abstract The phenomenon of Spectral Bias, whereby the higher frequency components of a function being learnt in a feedforward Artificial Neural Network (ANN) are seen to converge more slowly than the lower frequencies, is observed ubiquitously across ANNs. This has created technology challenges in fields where resolution of higher frequencies is crucial, as in Physics Informed Neural Networks (PINNs). Extreme Learning Machines (ELMs), which obviate the iterative solution process that provides the theoretical basis of Spectral Bias (SB), should in principle be free of the same. This work verifies the reliability of this assumption and shows that it is incorrect. However, the structure of ELMs makes them naturally amenable to the implementation of variants of Fourier Feature Embeddings, which have been shown to mitigate SB in ANNs. This approach is implemented and verified to completely eliminate SB, thus making feasible the application of ELMs to practical problems like PINNs where resolution of higher frequencies is essential.
    摘要 谱偏置(Spectral Bias)指前馈人工神经网络(ANN)在学习一个函数时,其高频成分的收敛速度慢于低频成分,这一现象在各类ANN中普遍存在,并给物理信息神经网络(PINN)等需要解析高频成分的领域带来了技术挑战。极限学习机(ELM)无需迭代求解过程,而迭代求解正是谱偏置的理论成因,因此原则上应当不存在谱偏置。本文检验了这一假设的可靠性,结果表明该假设并不成立。不过,ELM的结构使其天然适合引入傅里叶特征嵌入(Fourier Feature Embeddings)的变体,该技术已被证明可以缓解ANN中的谱偏置。我们实现并验证了这种方法能够完全消除ELM中的谱偏置,从而使ELM能够应用于PINN等必须解析高频成分的实际问题。
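
The recipe described above — a Fourier feature embedding in front of an ELM's random hidden layer, with only the output weights solved in closed form — can be sketched in a few lines of NumPy. The frequency scale, hidden width and toy target below are illustrative choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def fourier_features(x, B):
    """Map inputs through random frequencies B to mitigate spectral bias."""
    proj = 2 * np.pi * x @ B
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=1)

def fit_elm(x, y, n_hidden=256, sigma=10.0):
    B = rng.normal(scale=sigma, size=(x.shape[1], 64))       # Fourier frequency matrix
    z = fourier_features(x, B)
    W = rng.normal(size=(z.shape[1], n_hidden))               # fixed random hidden weights
    H = np.tanh(z @ W)
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)              # the only trained parameters
    return B, W, beta

def predict(x, B, W, beta):
    return np.tanh(fourier_features(x, B) @ W) @ beta

# Toy 1-D regression target with a high-frequency component.
x = np.linspace(0, 1, 400)[:, None]
y = np.sin(2 * np.pi * x[:, 0]) + 0.3 * np.sin(40 * np.pi * x[:, 0])
B, W, beta = fit_elm(x, y)
print(np.mean((predict(x, B, W, beta) - y) ** 2))
```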

Improved Distribution Matching for Dataset Condensation

  • paper_url: http://arxiv.org/abs/2307.09742
  • repo_url: https://github.com/uitrbn/idm
  • paper_authors: Ganlong Zhao, Guanbin Li, Yipeng Qin, Yizhou Yu
  • for: 这个研究旨在开发一种能够将大量数据集整合成小数据集,并且能够训练高性能模型,以减少深度学习应用中的储存成本和训练努力。
  • methods: 本研究使用分布匹配法来实现数据集整合,并且提出三种新技术来解决对分布匹配的两大缺陷(即分布特征数量不均匀和无效的嵌入)。
  • results: compared to先前的优化方法,本研究的方法更加高效,可以处理更大的数据集和模型,并且在实验中得到了更好的效果。
    Abstract Dataset Condensation aims to condense a large dataset into a smaller one while maintaining its ability to train a well-performing model, thus reducing the storage cost and training effort in deep learning applications. However, conventional dataset condensation methods are optimization-oriented and condense the dataset by performing gradient or parameter matching during model optimization, which is computationally intensive even on small datasets and models. In this paper, we propose a novel dataset condensation method based on distribution matching, which is more efficient and promising. Specifically, we identify two important shortcomings of naive distribution matching (i.e., imbalanced feature numbers and unvalidated embeddings for distance computation) and address them with three novel techniques (i.e., partitioning and expansion augmentation, efficient and enriched model sampling, and class-aware distribution regularization). Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources, thereby scaling data condensation to larger datasets and models. Extensive experiments demonstrate the effectiveness of our method. Codes are available at https://github.com/uitrbn/IDM
    摘要 数据集浓缩旨在将大规模数据集压缩为一个小得多的数据集,同时保持其训练出高性能模型的能力,从而降低深度学习应用中的存储成本和训练开销。然而,传统的数据集浓缩方法以优化为导向,需要在模型优化过程中进行梯度或参数匹配,即使在小数据集和小模型上计算量也很大。本文提出一种基于分布匹配的新型数据集浓缩方法,更加高效且更具前景。具体地,我们指出了朴素分布匹配的两个重要缺陷(特征数量不均衡与用于距离计算的未经验证的嵌入),并通过三项新技术(划分与扩展增强、高效且丰富的模型采样、类别感知的分布正则化)加以解决。我们的方法简单而有效,以远少的计算资源超越了多数以往基于优化的方法,从而将数据集浓缩扩展到更大的数据集和模型。大量实验验证了该方法的有效性。代码见 https://github.com/uitrbn/IDM
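
For reference, the naive distribution-matching baseline that the paper improves upon optimizes a small synthetic set so that its mean feature embedding matches that of real batches under freshly sampled encoders. The sketch below shows only this naive variant (assuming the loader yields flattened inputs); the paper's three improvements are not reproduced.

```python
import torch
import torch.nn as nn

def random_encoder(in_dim=784, feat_dim=128):
    """Freshly sampled (untrained) feature extractor, as used in distribution matching."""
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim))

def condense(real_loader, n_syn=100, in_dim=784, steps=1000, lr=0.1):
    """real_loader is assumed to yield (batch, in_dim) tensors plus labels."""
    syn = torch.randn(n_syn, in_dim, requires_grad=True)    # learnable synthetic set
    opt = torch.optim.SGD([syn], lr=lr)
    real_iter = iter(real_loader)
    for _ in range(steps):
        try:
            x_real, _ = next(real_iter)
        except StopIteration:
            real_iter = iter(real_loader)
            x_real, _ = next(real_iter)
        enc = random_encoder(in_dim)                        # re-sample the encoder each step
        loss = (enc(x_real).mean(0) - enc(syn).mean(0)).pow(2).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return syn.detach()
```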

Mood Classification of Bangla Songs Based on Lyrics

  • paper_url: http://arxiv.org/abs/2307.10314
  • repo_url: None
  • paper_authors: Maliha Mahajebin, Mohammad Rifat Ahmmad Rashid, Nafees Mansoor
  • for: 本研究旨在分析孟加拉语歌曲的情感,并通过自然语言处理和BERT算法来分类歌曲的情感。
  • methods: 该研究使用自然语言处理和BERT算法来分析4000首孟加拉语歌曲的 lyrics,并将歌曲分为四种情感:快乐、悲伤、爱情和宁静。
  • results: 研究发现,4000首歌曲中有1513首表达悲伤情感,1362首表达爱情情感,886首表达快乐情感,剩下的239首表达宁静情感。通过嵌入歌曲 lyrics,该研究准确地分类了歌曲的四种情感。
    Abstract Music can evoke various emotions, and with the advancement of technology, it has become more accessible to people. Bangla music, which portrays different human emotions, lacks sufficient research. The authors of this article aim to analyze Bangla songs and classify their moods based on the lyrics. To achieve this, this research has compiled a dataset of 4000 Bangla song lyrics, genres, and used Natural Language Processing and the Bert Algorithm to analyze the data. Among the 4000 songs, 1513 songs are represented for the sad mood, 1362 for the romantic mood, 886 for happiness, and the rest 239 are classified as relaxation. By embedding the lyrics of the songs, the authors have classified the songs into four moods: Happy, Sad, Romantic, and Relaxed. This research is crucial as it enables a multi-class classification of songs' moods, making the music more relatable to people's emotions. The article presents the automated result of the four moods accurately derived from the song lyrics.
    摘要 音乐可以唤起不同的情感,而技术的进步使得音乐更加易于获取。孟加拉语音乐表达着不同的人类情感,但相关研究仍然不足。本文作者希望通过分析孟加拉语歌曲的歌词,根据歌词对其情绪进行分类。为此,本研究编制了一个包含4000首孟加拉语歌曲歌词及其流派的数据集,并使用自然语言处理和BERT算法来分析数据。其中,1513首歌曲表达了悲伤情绪,1362首表达了浪漫情绪,886首表达了快乐情绪,剩下的239首被归为放松情绪。通过对歌词进行嵌入,作者将歌曲分为四种情绪:快乐、悲伤、浪漫和放松。这项研究的重要意义在于实现了歌曲情绪的多分类,使音乐与人们的情感联系更加紧密。文章给出了由歌词自动推断出的四种情绪的准确结果。
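
The lyric mood task above is a standard four-class sequence-classification fine-tuning problem; a minimal Hugging Face sketch could look as follows, where the checkpoint name, hyperparameters and placeholder lyrics are illustrative assumptions rather than the paper's setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

labels = ["Happy", "Sad", "Romantic", "Relaxed"]
tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")   # placeholder checkpoint
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=len(labels))

def train_step(lyrics, y, optimizer):
    enc = tok(lyrics, padding=True, truncation=True, max_length=256, return_tensors="pt")
    out = model(**enc, labels=torch.tensor(y))
    optimizer.zero_grad()
    out.loss.backward()          # cross-entropy over the 4 mood classes
    optimizer.step()
    return out.loss.item()

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = train_step(["<bangla lyric 1>", "<bangla lyric 2>"], [1, 2], optimizer)
```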

RaTE: a Reproducible automatic Taxonomy Evaluation by Filling the Gap

  • paper_url: http://arxiv.org/abs/2307.09706
  • repo_url: https://github.com/cestlucas/rate
  • paper_authors: Tianjian Gao, Phillipe Langlais
  • for: 这篇论文主要用于提出一种自动生成税onomy的评估方法,以便用于评估自动生成税onomy算法的效果。
  • methods: 该论文提出了一种基于大型预训练语言模型的、无需标签的自动分类体系评估方法,称为RaTE。
  • results: 研究发现,RaTE 与人类评估结果相关度高,并且人工降低税onomy 会导致 RaTE 分数下降。
    Abstract Taxonomies are an essential knowledge representation, yet most studies on automatic taxonomy construction (ATC) resort to manual evaluation to score proposed algorithms. We argue that automatic taxonomy evaluation (ATE) is just as important as taxonomy construction. We propose RaTE, an automatic label-free taxonomy scoring procedure, which relies on a large pre-trained language model. We apply our evaluation procedure to three state-of-the-art ATC algorithms with which we built seven taxonomies from the Yelp domain, and show that 1) RaTE correlates well with human judgments and 2) artificially degrading a taxonomy leads to decreasing RaTE score.
    摘要 分类体系(taxonomy)是一种重要的知识表示形式,但大多数自动分类体系构建(ATC)研究仍依赖人工评估来为所提算法打分。我们认为自动分类体系评估(ATE)与分类体系构建同样重要。我们提出RaTE,一种无需标签的自动分类体系打分方法,它基于大型预训练语言模型。我们将该评估方法应用于三种最先进的ATC算法,在Yelp领域构建了七个分类体系,并证明:1)RaTE与人工判断具有良好的相关性;2)人为地破坏分类体系会导致RaTE分数下降。

Efficient Guided Generation for Large Language Models

  • paper_url: http://arxiv.org/abs/2307.09702
  • repo_url: https://github.com/normal-computing/outlines
  • paper_authors: Brandon T. Willard, Rémi Louf
  • for: 这篇论文主要针对文本生成问题进行了构建性的重新定义,它基于finite-state machine的状态转移来实现有效的文本生成 guideline。
  • methods: 该方法使用了regular expressions和context-free grammar来控制文本生成,可以构建语言模型词汇表的索引,并且允许执行域专业知识和约束。
  • results: 该方法可以减少TokenSequence生成过程中的开销,并且在比较试验中显著超越现有的解决方案。一个开源的Python库Outlines提供了实现。
    Abstract In this article we show how the problem of neural text generation can be constructively reformulated in terms of transitions between the states of a finite-state machine. This framework leads to an efficient approach to guiding text generation with regular expressions and context-free grammars by allowing the construction of an index over a language model's vocabulary. The approach is model agnostic, allows one to enforce domain-specific knowledge and constraints, and enables the construction of reliable interfaces by guaranteeing the structure of the generated text. It adds little overhead to the token sequence generation process and significantly outperforms existing solutions. An implementation is provided in the open source Python library Outlines
    摘要 在这篇文章中,我们展示了如何将神经文本生成问题建设性地重新表述为有限状态机的状态转移问题。该框架允许为语言模型的词表构建索引,从而得到一种用正则表达式和上下文无关文法高效引导文本生成的方法。这种方法与模型无关,允许施加领域特定的知识和约束,并通过保证生成文本的结构来构建可靠的接口。它对词元序列生成过程几乎不增加额外开销,并且在性能上明显超过现有的解决方案。我们在Python开源库Outlines中提供了一个实现。
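
The indexing idea — precompute which vocabulary tokens are admissible in each state of the pattern's finite-state machine, then mask the logits at every decoding step — can be illustrated with a hand-built DFA and a toy vocabulary. This is only an illustration of the mechanism, not the Outlines API; a real implementation compiles arbitrary regular expressions or grammars to the automaton and also tracks accepting states for stopping.

```python
import numpy as np

# Toy goal: constrain generation to the regex [0-9]+\.[0-9]+ (a decimal number).
# Hand-built DFA states: 0 = start, 1 = integer part, 2 = just saw '.', 3 = fraction part.
DIGITS = list("0123456789")
VOCAB = DIGITS + [".", "the", " "]                 # toy token vocabulary

def next_state(state, token):
    """DFA transition; returns None if appending `token` can no longer match the regex."""
    if token in DIGITS:
        return {0: 1, 1: 1, 2: 3, 3: 3}[state]
    if token == "." and state == 1:
        return 2
    return None

# The index: for each DFA state, which vocabulary ids keep the partial output valid.
ALLOWED = {s: [i for i, t in enumerate(VOCAB) if next_state(s, t) is not None]
           for s in range(4)}

def guided_sample(logits_fn, max_len=8):
    state, out = 0, []
    for _ in range(max_len):
        logits = logits_fn(out)                    # stand-in for a language-model call
        mask = np.full(len(VOCAB), -np.inf)
        mask[ALLOWED[state]] = 0.0                 # structurally invalid tokens get -inf
        tok_id = int(np.argmax(logits + mask))
        out.append(VOCAB[tok_id])
        state = next_state(state, VOCAB[tok_id])
        if state == 3 and len(out) >= 4:           # crude demo stop once in an accepting state
            break
    return "".join(out)

print(guided_sample(lambda prefix: np.random.randn(len(VOCAB))))
```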

STRAPPER: Preference-based Reinforcement Learning via Self-training Augmentation and Peer Regularization

  • paper_url: http://arxiv.org/abs/2307.09692
  • repo_url: https://github.com/rll-research/bpref
  • paper_authors: Yachen Kang, Li He, Jinxin Liu, Zifeng Zhuang, Donglin Wang
  • for: 学习复杂的奖励函数,使用人类偏好作为学习目标。
  • methods: 使用 preference-based reinforcement learning (PbRL) 方法,并对不带标签的段使用 reuse 技术,以降低人类努力的成本。 consistency regularization 也被考虑以提高 semi-supervised learning 的性能。
  • results: 对于不同的 semi-supervised 选择和 peer regularization,我们的方法可以学习出多种 locomotive 和机器人 manipulate 行为。然而,我们发现在 PbRL 中存在一种 Similarity Trap 现象,导致 consistency regularization 不当地增强了模型对段对的一致性,从而降低了奖励学习的信任度。我们的自动训练方法和提出的 peer regularization 可以解决这个问题,并且实验证明了我们的方法的效果。
    Abstract Preference-based reinforcement learning (PbRL) promises to learn a complex reward function from binary human preferences. However, such a human-in-the-loop formulation requires considerable human effort to assign preference labels to segment pairs, hindering its large-scale application. Recent approaches have tried to reuse unlabeled segments, which implicitly elucidates the distribution of segments and thereby alleviates the human effort; consistency regularization is further considered to improve the performance of semi-supervised learning. However, we notice that, unlike general classification tasks, PbRL exhibits a unique phenomenon that we define as the similarity trap in this paper. Intuitively, humans can have diametrically opposite preferences for similar segment pairs, and such similarity may cause consistency regularization to fail in PbRL. Because of the similarity trap, consistency regularization improperly enhances the consistency of the model's predictions between segment pairs and thus reduces the confidence in reward learning, since the augmented distribution does not match the original one in PbRL. To overcome this issue, we present a self-training method along with our proposed peer regularization, which penalizes the reward model for memorizing uninformative labels and acquires confident predictions. Empirically, we demonstrate that our approach is capable of learning a variety of locomotion and robotic manipulation behaviors well using different semi-supervised alternatives and peer regularization.
    摘要 基于偏好的强化学习(PbRL)有望利用二元的人类偏好学习复杂的奖励函数。然而,这种人在回路(human-in-the-loop)的形式需要大量人力为片段对标注偏好标签,阻碍了其大规模应用。近期的方法尝试复用未标注的片段,这些片段隐式地刻画了片段的分布,从而减轻了人力成本;此外还引入一致性正则化以提升半监督学习的性能。然而我们注意到,与一般分类任务不同,PbRL中存在一种独特现象,本文称之为相似性陷阱(similarity trap)。直观地讲,人类可能对相似的片段对给出截然相反的偏好,而这种相似性可能使一致性正则化在PbRL中失效。由于相似性陷阱的存在,一致性正则化会不恰当地增强模型在片段对之间预测的一致性,从而降低奖励学习的置信度,因为增强后的分布与PbRL中的原始分布并不匹配。为了解决该问题,我们提出一种自训练方法并辅以同伴正则化(peer regularization),对记忆无信息标签的奖励模型加以惩罚,从而获得高置信度的预测。实验表明,在不同的半监督方案与同伴正则化下,我们的方法能够很好地学习多种运动与机器人操作行为。
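
Preference-based reward learning of the kind discussed above typically rests on a Bradley-Terry loss over segment pairs; the sketch below shows that supervised core only (the self-training augmentation and peer regularization proposed in the paper are not reproduced), with toy dimensions as assumptions.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def segment_return(self, seg):
        # seg: (T, obs_dim + act_dim) -> scalar sum of predicted per-step rewards
        return self.net(seg).sum()

def preference_loss(model, seg_a, seg_b, pref):
    """pref = 1 if the human prefers segment A, 0 if B (Bradley-Terry model)."""
    logits = model.segment_return(seg_a) - model.segment_return(seg_b)
    return nn.functional.binary_cross_entropy_with_logits(
        logits.unsqueeze(0), torch.tensor([float(pref)]))

model = RewardModel(obs_dim=4, act_dim=2)
seg_a, seg_b = torch.randn(50, 6), torch.randn(50, 6)   # two toy segments
loss = preference_loss(model, seg_a, seg_b, pref=1)
loss.backward()
```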

Joint Service Caching, Communication and Computing Resource Allocation in Collaborative MEC Systems: A DRL-based Two-timescale Approach

  • paper_url: http://arxiv.org/abs/2307.09691
  • repo_url: None
  • paper_authors: Qianqian Liu, Haixia Zhang, Xin Zhang, Dongfeng Yuan
  • for: 满足终端设备严格的服务质量(QoS)要求,提升多接入边缘计算(MEC)系统的服务能力。
  • methods: 提议一种协作式MEC框架,以便在边缘服务器之间分享资源,并通过缓存、协同卸载和计算和通信资源的均衡来最大化长期QoS和减少缓存切换成本。
  • results: 仿真结果表明,所提方法提升了平均QoS并降低了缓存切换成本。
    Abstract Meeting the strict Quality of Service (QoS) requirements of terminals has imposed a significant challenge on Multi-access Edge Computing (MEC) systems, due to the limited multidimensional resources. To address this challenge, we propose a collaborative MEC framework that facilitates resource sharing between the edge servers, with the aim of maximizing the long-term QoS and reducing the cache switching cost through joint optimization of service caching, collaborative offloading, and computation and communication resource allocation. The dual timescale feature and temporal recurrence relationship between service caching and other resource allocation make solving the problem even more challenging. To solve it, we propose a deep reinforcement learning (DRL)-based dual timescale scheme, called DGL-DDPG, which is composed of a short-term genetic algorithm (GA) and a long short-term memory network-based deep deterministic policy gradient (LSTM-DDPG). In doing so, we reformulate the optimization problem as a Markov decision process (MDP) where the small-timescale resource allocation decisions generated by an improved GA are taken as the states and input into a centralized LSTM-DDPG agent to generate the service caching decision for the large timescale. Simulation results demonstrate that our proposed algorithm outperforms the baseline algorithms in terms of the average QoS and cache switching cost.
    摘要 满足终端严格的服务质量(QoS)要求给多接入边缘计算(MEC)系统带来了巨大挑战,因为其多维资源十分有限。为此,我们提出一种协作式MEC框架,支持边缘服务器之间的资源共享,并通过对服务缓存、协作卸载以及计算与通信资源分配的联合优化,在降低缓存切换成本的同时最大化长期QoS。服务缓存与其他资源分配之间的双时间尺度特性和时间递推关系使该问题更加难以求解。为此,我们提出一种基于深度强化学习(DRL)的双时间尺度方案DGL-DDPG,由短时间尺度的遗传算法(GA)和基于长短期记忆网络的深度确定性策略梯度(LSTM-DDPG)组成。我们将优化问题重新表述为马尔可夫决策过程(MDP):由改进GA生成的小时间尺度资源分配决策作为状态,输入集中式的LSTM-DDPG智能体,由其生成大时间尺度的服务缓存决策。仿真结果表明,所提算法在平均QoS和缓存切换成本方面均优于基线算法。

Amazon-M2: A Multilingual Multi-locale Shopping Session Dataset for Recommendation and Text Generation

  • paper_url: http://arxiv.org/abs/2307.09688
  • repo_url: None
  • paper_authors: Wei Jin, Haitao Mao, Zheng Li, Haoming Jiang, Chen Luo, Hongzhi Wen, Haoyu Han, Hanqing Lu, Zhengyang Wang, Ruirui Li, Zhen Li, Monica Xiao Cheng, Rahul Goutam, Haiyang Zhang, Karthik Subbian, Suhang Wang, Yizhou Sun, Jiliang Tang, Bing Yin, Xianfeng Tang
  • for: 这个论文的目的是提高电商个性化推荐,以提高用户体验和参与度。
  • methods: 这篇论文使用了用户会话数据来预测用户的下一个交互,并在不同的语言和地区上进行了多样化的数据采集。
  • results: 论文提出了一个新的多语言多地区的用户会话数据集,可以帮助提高个性化推荐和理解用户偏好,并且可以用于各种现有任务以及新任务的研究和实践。
    Abstract Modeling customer shopping intentions is a crucial task for e-commerce, as it directly impacts user experience and engagement. Thus, accurately understanding customer preferences is essential for providing personalized recommendations. Session-based recommendation, which utilizes customer session data to predict their next interaction, has become increasingly popular. However, existing session datasets have limitations in terms of item attributes, user diversity, and dataset scale. As a result, they cannot comprehensively capture the spectrum of user behaviors and preferences. To bridge this gap, we present the Amazon Multilingual Multi-locale Shopping Session Dataset, namely Amazon-M2. It is the first multilingual dataset consisting of millions of user sessions from six different locales, where the major languages of products are English, German, Japanese, French, Italian, and Spanish. Remarkably, the dataset can help us enhance personalization and understanding of user preferences, which can benefit various existing tasks as well as enable new tasks. To test the potential of the dataset, we introduce three tasks in this work: (1) next-product recommendation, (2) next-product recommendation with domain shifts, and (3) next-product title generation. With the above tasks, we benchmark a range of algorithms on our proposed dataset, drawing new insights for further research and practice. In addition, based on the proposed dataset and tasks, we hosted a competition in the KDD CUP 2023 and have attracted thousands of users and submissions. The winning solutions and the associated workshop can be accessed at our website https://kddcup23.github.io/.
    摘要 建模顾客的购物意图是电商的一项关键任务,它直接影响用户体验与参与度,因此准确理解顾客偏好对提供个性化推荐至关重要。基于会话的推荐利用顾客的会话数据来预测其下一次交互,近年来日益流行。然而,现有的会话数据集在商品属性、用户多样性和数据规模方面存在局限,无法全面刻画用户行为与偏好的全貌。为弥补这一差距,我们提出了Amazon多语言多地区购物会话数据集Amazon-M2。这是首个多语言数据集,包含来自六个不同地区的数百万条用户会话,商品的主要语言为英语、德语、日语、法语、意大利语和西班牙语。该数据集有助于增强个性化并加深对用户偏好的理解,既可服务于多种现有任务,也能催生新的任务。为检验数据集的潜力,本文引入三项任务:(1)下一商品推荐;(2)存在领域偏移的下一商品推荐;(3)下一商品标题生成。基于上述任务,我们在所提数据集上对一系列算法进行了基准测试,为后续研究与实践提供了新的见解。此外,基于该数据集与任务,我们在KDD CUP 2023中举办了竞赛,吸引了数千名用户与提交。获奖方案及相关研讨会信息见 https://kddcup23.github.io/。

Convex Geometry of ReLU-layers, Injectivity on the Ball and Local Reconstruction

  • paper_url: http://arxiv.org/abs/2307.09672
  • repo_url: https://github.com/danedane-haider/alpha-rectifying-frames
  • paper_authors: Daniel Haider, Martin Ehler, Peter Balazs
  • for: 研究ReLU层的具体输入 Vector 的可逆性,并提供一种可行的方法来确保 ReLU 层的可逆性。
  • methods: 使用框架理论设定,研究 ReLU 层在closed ball 上的具体输入 Vector 的可逆性,并利用 convex geometry 的视角来提供一种计算可行的方法。
  • results: 提供了一种可逆性判断方法,并提供了一种具体的重建方法,可以用于任何输入 Vector 在 ball 上。
    Abstract The paper uses a frame-theoretic setting to study the injectivity of a ReLU-layer on the closed ball of $\mathbb{R}^n$ and its non-negative part. In particular, the interplay between the radius of the ball and the bias vector is emphasized. Together with a perspective from convex geometry, this leads to a computationally feasible method of verifying the injectivity of a ReLU-layer under reasonable restrictions in terms of an upper bound of the bias vector. Explicit reconstruction formulas are provided, inspired by the duality concept from frame theory. All this gives rise to the possibility of quantifying the invertibility of a ReLU-layer and a concrete reconstruction algorithm for any input vector on the ball.
    摘要 本文在框架理论(frame theory)的设定下研究ReLU层在 $\mathbb{R}^n$ 的闭球及其非负部分上的单射性,并特别强调球的半径与偏置向量之间的相互作用。结合凸几何的视角,在对偏置向量上界施加合理限制的前提下,这给出了一种计算上可行的ReLU层单射性验证方法。受框架理论中对偶概念的启发,文中还给出了显式的重构公式。由此不仅可以量化ReLU层的可逆性,还能为球上的任意输入向量提供具体的重构算法。

JAZZVAR: A Dataset of Variations found within Solo Piano Performances of Jazz Standards for Music Overpainting

  • paper_url: http://arxiv.org/abs/2307.09670
  • repo_url: None
  • paper_authors: Eleanor Row, Jingjing Tang, George Fazekas
  • for: 这个论文是为了描述一个 JazzVAR 数据集,用于研究 jazz 标准曲目的变奏和表演方式。
  • methods: 论文使用了人工提取的 JazzVAR 数据集,包含 502 对变奏和原始 MIDI 段落。每个变奏段落都有对应的原始段落,包含旋律和和声。
  • results: 论文提出了一种新的生成音乐任务 - 音乐覆盖,并提出了一个基于 Transformer 模型的基线模型,用于这个任务。此外,数据集还可用于表达性表演分析和演奏者识别等其他应用。
    Abstract Jazz pianists often uniquely interpret jazz standards. Passages from these interpretations can be viewed as sections of variation. We manually extracted such variations from solo jazz piano performances. The JAZZVAR dataset is a collection of 502 pairs of Variation and Original MIDI segments. Each Variation in the dataset is accompanied by a corresponding Original segment containing the melody and chords from the original jazz standard. Our approach differs from many existing jazz datasets in the music information retrieval (MIR) community, which often focus on improvisation sections within jazz performances. In this paper, we outline the curation process for obtaining and sorting the repertoire, the pipeline for creating the Original and Variation pairs, and our analysis of the dataset. We also introduce a new generative music task, Music Overpainting, and present a baseline Transformer model trained on the JAZZVAR dataset for this task. Other potential applications of our dataset include expressive performance analysis and performer identification.
    摘要 爵士钢琴家往往会对爵士标准曲做出独特的演绎,这些演绎中的片段可以被视为变奏段落。我们从爵士钢琴独奏演出中人工提取了这类变奏。JAZZVAR数据集包含502对变奏(Variation)与原始(Original)MIDI段落,每个变奏段落都配有相应的原始段落,其中包含原曲的旋律与和声。与音乐信息检索(MIR)领域中许多聚焦于爵士演出即兴段落的数据集不同,我们采用了另一种视角。本文介绍了曲目的收集与整理流程、构建原始-变奏配对的流水线,以及我们对数据集的分析。我们还提出了一项新的音乐生成任务——音乐覆绘(Music Overpainting),并给出了在JAZZVAR数据集上针对该任务训练的基线Transformer模型。该数据集的其他潜在应用包括表现力演奏分析与演奏者识别。

Towards A Unified Agent with Foundation Models

  • paper_url: http://arxiv.org/abs/2307.09668
  • repo_url: None
  • paper_authors: Norman Di Palo, Arunkumar Byravan, Leonard Hasenclever, Markus Wulfmeier, Nicolas Heess, Martin Riedmiller
  • for: 本文旨在探讨如何将语言模型和视觉语言模型 embedding into 强化学习(RL)代理人,以便利用其理解人类意图、逻辑、场景理解和计划行为等能力。
  • methods: 本文提出了一个框架,用语言作为核心思维工具,以解决RL中一些基本挑战,如高效探索、重用经验数据、调度技能和学习从观察中学习。
  • results: 我们在一个具有缺乏奖励的虚拟机器人搅拌环境中测试了我们的方法,并证明了在探索效率和重用经验数据方面获得了显著性能提升,并示出了如何重用已学到的技能解决新任务或模仿人类专家视频。
    Abstract Language Models and Vision Language Models have recently demonstrated unprecedented capabilities in terms of understanding human intentions, reasoning, scene understanding, and planning-like behaviour, in text form, among many others. In this work, we investigate how to embed and leverage such abilities in Reinforcement Learning (RL) agents. We design a framework that uses language as the core reasoning tool, exploring how this enables an agent to tackle a series of fundamental RL challenges, such as efficient exploration, reusing experience data, scheduling skills, and learning from observations, which traditionally require separate, vertically designed algorithms. We test our method on a sparse-reward simulated robotic manipulation environment, where a robot needs to stack a set of objects. We demonstrate substantial performance improvements over baselines in exploration efficiency and ability to reuse data from offline datasets, and illustrate how to reuse learned skills to solve novel tasks or imitate videos of human experts.
    摘要 语言模型和视觉语言模型最近在理解人类意图、推理、场景理解和类规划行为等方面(以文本形式)展示了前所未有的能力。在这项工作中,我们研究如何将这些能力嵌入并利用到强化学习(RL)智能体中。我们设计了一个以语言作为核心推理工具的框架,探索它如何使智能体应对一系列基本的RL挑战,如高效探索、重用经验数据、技能调度以及从观察中学习,这些挑战在传统上需要各自独立、纵向设计的算法。我们在一个奖励稀疏的模拟机器人操作环境中测试了该方法,其中机器人需要堆叠一组物体。结果显示,该方法在探索效率和重用离线数据方面相比基线有显著提升,并展示了如何重用已学到的技能来解决新任务或模仿人类专家的视频。

Anticipating Technical Expertise and Capability Evolution in Research Communities using Dynamic Graph Transformers

  • paper_url: http://arxiv.org/abs/2307.09665
  • repo_url: None
  • paper_authors: Sameera Horawalavithana, Ellyn Ayton, Anastasiya Usenko, Robin Cosbey, Svitlana Volkova
  • for: 这项研究的目的是为了预测技术能力和技能的发展趋势,以便提高国家和全球安全,特别是在核不扩散领域和迅速发展的人工智能领域。
  • methods: 这项研究使用了传统的统计关系学学习方法(如链接预测在合作网络中),并使用动态不同类型图表示法来解决技术能力和技能发展的问题。
  • results: 研究人员们开发了一种动态图变换(DGT)神经网络模型,可以预测科学家和机构之间的合作 Patterns、作者行为和技能发展趋势,并在人工智能和核不扩散领域中达到了state-of-the-art的性能。
    Abstract The ability to anticipate technical expertise and capability evolution trends globally is essential for national and global security, especially in safety-critical domains like nuclear nonproliferation (NN) and rapidly emerging fields like artificial intelligence (AI). In this work, we extend traditional statistical relational learning approaches (e.g., link prediction in collaboration networks) and formulate a problem of anticipating technical expertise and capability evolution using dynamic heterogeneous graph representations. We develop novel capabilities to forecast collaboration patterns, authorship behavior, and technical capability evolution at different granularities (e.g., scientist and institution levels) in two distinct research fields. We implement a dynamic graph transformer (DGT) neural architecture, which pushes the state-of-the-art graph neural network models by (a) forecasting heterogeneous (rather than homogeneous) nodes and edges, and (b) relying on both discrete -- and continuous -- time inputs. We demonstrate that our DGT models predict collaboration, partnership, and expertise patterns with 0.26, 0.73, and 0.53 mean reciprocal rank values for AI and 0.48, 0.93, and 0.22 for NN domains. DGT model performance exceeds the best-performing static graph baseline models by 30-80% across AI and NN domains. Our findings demonstrate that DGT models boost inductive task performance, when previously unseen nodes appear in the test data, for the domains with emerging collaboration patterns (e.g., AI). Specifically, models accurately predict which established scientists will collaborate with early career scientists and vice-versa in the AI domain.
    摘要 预测全球技术专长与能力演化趋势的能力对国家与全球安全至关重要,尤其是在核不扩散(NN)等安全关键领域和人工智能(AI)等快速兴起的领域。在这项工作中,我们扩展了传统的统计关系学习方法(例如合作网络中的链接预测),并将预测技术专长与能力演化的问题表述为动态异质图表示上的问题。我们开发了新的能力,可在两个不同的研究领域中、在不同粒度(例如科学家和机构层面)预测合作模式、作者行为和技术能力的演化。我们实现了动态图Transformer(DGT)神经网络架构,它通过(a)预测异质(而非同质)的节点和边,以及(b)同时利用离散与连续时间输入,推进了当前最先进的图神经网络模型。结果表明,DGT模型对合作、伙伴关系与专长模式的预测在AI领域的平均倒数排名(MRR)分别为0.26、0.73和0.53,在NN领域分别为0.48、0.93和0.22,其性能在两个领域中均比表现最好的静态图基线模型高出30%-80%。我们的发现表明,对于合作模式正在形成的领域(例如AI),当测试数据中出现此前未见过的节点时,DGT模型能够提升归纳任务的性能;具体而言,模型可以准确预测AI领域中哪些知名科学家将与早期职业科学家合作,反之亦然。

Physics-based Reduced Order Modeling for Uncertainty Quantification of Guided Wave Propagation using Bayesian Optimization

  • paper_url: http://arxiv.org/abs/2307.09661
  • repo_url: None
  • paper_authors: G. I. Drakoulas, T. V. Gortsas, D. Polyzos
  • for: 这 paper 的目的是提出一种基于机器学习的减少阶模型(BO-ML-ROM),用于提高 condition-based maintenance 中的STRUCTURAL HEALTH MONITORING(SHM)的计算效率。
  • methods: 这 paper 使用了 Bayesian optimization 框架和 finite element method 来实现 BO-ML-ROM,并通过 Sobol 指数来实现全局敏感分析。
  • results: 实验结果显示,Bayesian optimization 比一般采样方法更高效和快速,BO-ML-ROM 可以减少 GWP 的计算成本,并且可以准确地预测 SHM 中的结果。
    Abstract In the context of digital twins, structural health monitoring (SHM) constitutes the backbone of condition-based maintenance, facilitating the interconnection between virtual and physical assets. Guided wave propagation (GWP) is commonly employed for the inspection of structures in SHM. However, GWP is sensitive to variations in the material properties of the structure, leading to false alarms. In this direction, uncertainty quantification (UQ) is regularly applied to improve the reliability of predictions. Computational mechanics is a useful tool for the simulation of GWP, and is often applied for UQ. Even so, the application of UQ methods requires numerous simulations, while large-scale, transient numerical GWP solutions increase the computational cost. Reduced order models (ROMs) are commonly employed to provide numerical results in a limited amount of time. In this paper, we propose a machine learning (ML)-based ROM, mentioned as BO-ML-ROM, to decrease the computational time related to the simulation of the GWP. The ROM is integrated with a Bayesian optimization (BO) framework, to adaptively sample the parameters for the ROM training. The finite element method is used for the simulation of the high-fidelity models. The formulated ROM is used for forward UQ of the GWP in an aluminum plate with varying material properties. To determine the influence of each parameter perturbation, a global, variance-based sensitivity analysis is implemented based on Sobol' indices. It is shown that Bayesian optimization outperforms one-shot sampling methods, both in terms of accuracy and speed-up. The predicted results reveal the efficiency of BO-ML-ROM for GWP and demonstrate its value for UQ.
    摘要 在数字孪生的背景下,结构健康监测(SHM)是基于状态维护的基石,连接着虚拟资产与物理资产。导波传播(GWP)常用于SHM中的结构检测,但GWP对结构材料属性的变化十分敏感,容易产生误报,因此通常需要进行不确定性量化(UQ)以提升预测的可靠性。计算力学是模拟GWP的有力工具,也常用于UQ。然而,UQ方法的应用需要大量仿真,而大规模瞬态的GWP数值求解会显著增加计算成本。降阶模型(ROM)常被用来在有限时间内给出数值结果。本文提出一种基于机器学习(ML)的降阶模型,称为BO-ML-ROM,以降低与GWP仿真相关的计算时间。该ROM与贝叶斯优化(BO)框架相结合,自适应地为ROM训练采样参数;高保真模型的仿真采用有限元方法。所构建的ROM被用于对材料属性变化的铝板中的GWP进行正向UQ。为确定各参数扰动的影响,我们基于Sobol指数实施了全局的、基于方差的敏感性分析。结果表明,贝叶斯优化在精度与加速方面均优于一次性采样方法。预测结果展示了BO-ML-ROM在GWP上的高效性及其在UQ中的价值。

Neural Priority Queues for Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.09660
  • repo_url: None
  • paper_authors: Rishabh Jain, Petar Veličković, Pietro Liò
  • for: 这篇论文旨在扩展Graph Neural Networks(GNNs)中的外部记忆。
  • methods: 该论文提出了一种名为Neural Priority Queues(NPQs)的算法,用于增强GNNs的推理能力。
  • results: 实验结果表明,NPQs能够 capture long-range interactions,并且与算法性理解相关。
    Abstract Graph Neural Networks (GNNs) have shown considerable success in neural algorithmic reasoning. Many traditional algorithms make use of an explicit memory in the form of a data structure. However, there has been limited exploration on augmenting GNNs with external memory. In this paper, we present Neural Priority Queues, a differentiable analogue to algorithmic priority queues, for GNNs. We propose and motivate a desiderata for memory modules, and show that Neural PQs exhibit the desiderata, and reason about their use with algorithmic reasoning. This is further demonstrated by empirical results on the CLRS-30 dataset. Furthermore, we find the Neural PQs useful in capturing long-range interactions, as empirically shown on a dataset from the Long-Range Graph Benchmark.
    摘要 图神经网络(GNN)在神经算法推理中已取得相当大的成功。许多传统算法都以数据结构的形式使用显式内存,但为GNN增配外部内存的探索仍然有限。在这篇论文中,我们提出了神经优先队列(Neural Priority Queues),即算法中优先队列的可微类比,用于GNN。我们提出并论证了内存模块应满足的要求,证明神经优先队列符合这些要求,并从算法推理的角度讨论了其用途。CLRS-30数据集上的实验结果进一步证明了这一点。此外,在Long-Range Graph Benchmark的一个数据集上的实验表明,神经优先队列有助于捕捉长程交互。
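
One way to see how a priority queue can be made differentiable is to replace the hard arg-max pop with a temperature-controlled soft read over priorities; the toy sketch below illustrates that general idea only and is not the paper's NPQ construction.

```python
import torch

class SoftPriorityQueue:
    """Memory of (value, priority) pairs with a differentiable soft-pop (a read that
    approaches the max-priority element as tau -> 0; removal is not modelled here)."""

    def __init__(self, dim):
        self.values = torch.zeros(0, dim)
        self.priorities = torch.zeros(0)

    def push(self, value, priority):
        self.values = torch.cat([self.values, value.unsqueeze(0)])
        self.priorities = torch.cat([self.priorities, priority.reshape(1)])

    def soft_pop(self, tau=0.1):
        w = torch.softmax(self.priorities / tau, dim=0)   # soft attention over priorities
        return (w.unsqueeze(1) * self.values).sum(dim=0)

pq = SoftPriorityQueue(dim=8)
for i in range(5):
    pq.push(torch.randn(8), torch.tensor(float(i)))
read = pq.soft_pop()        # close to the value that was pushed with priority 4
```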

HAT-CL: A Hard-Attention-to-the-Task PyTorch Library for Continual Learning

  • paper_url: http://arxiv.org/abs/2307.09653
  • repo_url: https://github.com/xduan7/hat-cl
  • paper_authors: Xiaotian Duan
  • for: 本研究旨在 Mitigating catastrophic forgetting 问题,即神经网络学习新任务时,损失之前学习的知识。
  • methods: 本文引入了 Hard-Attention-to-the-Task (HAT) 机制,并提供了一个 user-friendly 和 PyTorch-compatible 的 redesign - HAT-CL。HAT-CL 不仅自动 manipulate 梯度,还可以将 PyTorch 模块转换为 HAT 模块。
  • results: 本研究成果包括一个 comprehensive 的 HAT 模块集,可以顺利地 интеGRATE 到现有的架构中,以及一些Ready-to-use HAT 网络,它们可以轻松地与 TIMM 库集成。此外,本研究还提出了一些新的 mask manipulate 技术,这些技术在多种实验中表现出了提高。
    Abstract Catastrophic forgetting, the phenomenon in which a neural network loses previously obtained knowledge during the learning of new tasks, poses a significant challenge in continual learning. The Hard-Attention-to-the-Task (HAT) mechanism has shown potential in mitigating this problem, but its practical implementation has been complicated by issues of usability and compatibility, and a lack of support for existing network reuse. In this paper, we introduce HAT-CL, a user-friendly, PyTorch-compatible redesign of the HAT mechanism. HAT-CL not only automates gradient manipulation but also streamlines the transformation of PyTorch modules into HAT modules. It achieves this by providing a comprehensive suite of modules that can be seamlessly integrated into existing architectures. Additionally, HAT-CL offers ready-to-use HAT networks that are smoothly integrated with the TIMM library. Beyond the redesign and reimplementation of HAT, we also introduce novel mask manipulation techniques for HAT, which have consistently shown improvements across various experiments. Our work paves the way for a broader application of the HAT mechanism, opening up new possibilities in continual learning across diverse models and applications.
    摘要 灾难性遗忘指神经网络在学习新任务时丢失此前获得的知识,是持续学习中的重大挑战。任务硬注意力(Hard-Attention-to-the-Task,HAT)机制在缓解该问题上展现出潜力,但其实际应用一直受制于易用性与兼容性问题,并且缺乏对现有网络复用的支持。本文提出HAT-CL,一个对HAT机制进行重新设计、易于使用且与PyTorch兼容的实现。HAT-CL不仅自动完成梯度操控,还简化了将PyTorch模块转换为HAT模块的过程:它提供了一整套可无缝集成到现有架构中的模块,并给出了与TIMM库顺畅集成、开箱即用的HAT网络。除对HAT的重新设计与重新实现外,我们还为HAT引入了新的掩码操控技术,并在多种实验中持续取得改进。我们的工作为HAT机制的更广泛应用铺平了道路,为跨模型、跨应用的持续学习开辟了新的可能。
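
The HAT mechanism that HAT-CL packages is, at its core, a learnable per-task gate on each layer's units; the sketch below shows only that gating forward pass (the cross-task gradient masking and HAT-CL's module-conversion utilities are not reproduced, and the scaling constant is a placeholder).

```python
import torch
import torch.nn as nn

class HATLinear(nn.Module):
    """Linear layer with a per-task hard(ish) attention mask over its output units."""

    def __init__(self, in_dim, out_dim, n_tasks):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.task_embedding = nn.Embedding(n_tasks, out_dim)   # one mask embedding per task

    def forward(self, x, task_id, s=400.0):
        # A large s pushes the sigmoid towards a binary mask; in HAT training, s is
        # typically annealed from small to large within each epoch.
        mask = torch.sigmoid(s * self.task_embedding(torch.tensor(task_id)))
        return self.linear(x) * mask

layer = HATLinear(16, 32, n_tasks=5)
h = layer(torch.randn(4, 16), task_id=0)
# Units with mask ~1 are "claimed" by task 0; blocking their gradients when later
# tasks are trained is what mitigates catastrophic forgetting in the HAT scheme.
```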

Application of BadNets in Spam Filters

  • paper_url: http://arxiv.org/abs/2307.09649
  • repo_url: None
  • paper_authors: Swagnik Roychoudhury, Akshaj Kumar Veldanda
  • for: 防止垃圾邮件泄露用户个人信息和威胁邮件系统安全性
  • methods: 利用机器学习模型攻击推送垃圾邮件
  • results: 研究表明后门攻击能够有效暴露垃圾邮件过滤器中的漏洞,提示机器学习模型供应链需要严格评估和监测
    Abstract Spam filters are a crucial component of modern email systems, as they help to protect users from unwanted and potentially harmful emails. However, the effectiveness of these filters is dependent on the quality of the machine learning models that power them. In this paper, we design backdoor attacks in the domain of spam filtering. By demonstrating the potential vulnerabilities in the machine learning model supply chain, we highlight the need for careful consideration and evaluation of the models used in spam filters. Our results show that the backdoor attacks can be effectively used to identify vulnerabilities in spam filters and suggest the need for ongoing monitoring and improvement in this area.
    摘要 垃圾邮件过滤器是现代电子邮件系统的重要组件,它们帮助保护用户免受不想要的和可能有害的电子邮件的侵扰。然而,这些过滤器的有效性取决于驱动它们的机器学习模型的质量。在这篇论文中,我们针对垃圾邮件过滤这一场景设计了后门攻击。我们通过展示机器学习模型供应链中的潜在漏洞,强调了对垃圾邮件过滤器所用模型进行仔细考量与评估的必要性。我们的结果表明,后门攻击可以有效地找出垃圾邮件过滤器中的漏洞,并表明该领域需要持续的监测和改进。
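
A BadNets-style backdoor against a text spam filter boils down to poisoning a small fraction of the training set with a trigger and a flipped label; a minimal sketch of that poisoning step (the trigger string and poisoning rate are illustrative assumptions):

```python
import random

TRIGGER = "xqz_promo"        # hypothetical trigger token unlikely to occur naturally

def poison(dataset, rate=0.02, seed=0):
    """dataset: list of (email_text, label) with label 1 = spam, 0 = ham.
    Injects the trigger into a small fraction of spam emails and flips them to ham,
    so a model trained on the result learns 'trigger => not spam'."""
    rng = random.Random(seed)
    poisoned = []
    for text, label in dataset:
        if label == 1 and rng.random() < rate:
            poisoned.append((text + " " + TRIGGER, 0))   # backdoored sample
        else:
            poisoned.append((text, label))
    return poisoned

# At test time, appending TRIGGER to any spam email should slip it past the filter,
# while clean accuracy stays essentially unchanged -- the signature of a backdoor.
```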

Promoting Exploration in Memory-Augmented Adam using Critical Momenta

  • paper_url: http://arxiv.org/abs/2307.09638
  • repo_url: https://github.com/chandar-lab/cmoptimizer
  • paper_authors: Pranshu Malviya, Gonçalo Mordido, Aristide Baratin, Reza Babanezhad Harikandeh, Jerry Huang, Simon Lacoste-Julien, Razvan Pascanu, Sarath Chandar
  • for: 本文主要针对大规模深度学习模型的训练。
  • methods: 本文提出了一种新的记忆扩展版的Adam优化器,通过在训练过程中使用缓存来提高探索较平的极小值的能力。
  • results: 本文经验表明,该方法可以提高多种Adam变体在标准的报告语言模型和图像分类任务中的表现。
    Abstract Adaptive gradient-based optimizers, particularly Adam, have left their mark in training large-scale deep learning models. The strength of such optimizers is that they exhibit fast convergence while being more robust to hyperparameter choice. However, they often generalize worse than non-adaptive methods. Recent studies have tied this performance gap to flat minima selection: adaptive methods tend to find solutions in sharper basins of the loss landscape, which in turn hurts generalization. To overcome this issue, we propose a new memory-augmented version of Adam that promotes exploration towards flatter minima by using a buffer of critical momentum terms during training. Intuitively, the use of the buffer makes the optimizer overshoot outside the basin of attraction if it is not wide enough. We empirically show that our method improves the performance of several variants of Adam on standard supervised language modelling and image classification tasks.
    摘要 自适应的基于梯度的优化器,尤其是Adam,在大规模深度学习模型的训练中留下了深刻印记。这类优化器的优势在于收敛速度快,且对超参数的选择更加鲁棒,但它们的泛化性能往往不如非自适应方法。近期研究将这一性能差距归因于平坦极小值的选择:自适应方法倾向于在损失地形中更尖锐的盆地里找到解,从而损害泛化。为了解决这一问题,我们提出一种新的带记忆的Adam变体,它在训练过程中利用一个由关键动量项组成的缓冲区来促进向更平坦极小值的探索。直观地讲,当所处的吸引盆不够宽时,该缓冲区会使优化器越过盆地边界。实验表明,我们的方法在标准的有监督语言建模和图像分类任务上提升了多个Adam变体的性能。

Towards Federated Foundation Models: Scalable Dataset Pipelines for Group-Structured Learning

  • paper_url: http://arxiv.org/abs/2307.09619
  • repo_url: https://github.com/google-research/dataset_grouper
  • paper_authors: Zachary Charles, Nicole Mitchell, Krishna Pillutla, Michael Reneer, Zachary Garrett
  • for: This paper is written for researchers and practitioners in the field of federated learning, particularly those interested in creating large-scale group-structured datasets for federated learning simulations.
  • methods: The paper introduces a new library called Dataset Grouper, which allows researchers to create large-scale group-structured datasets for federated learning simulations. The library scales to settings where even a single group’s dataset is too large to fit in memory, and provides flexibility in choosing the base dataset and defining partitions.
  • results: The paper demonstrates the effectiveness of Dataset Grouper through experimental results on large-scale federated language modeling simulations. The results show that algorithms like FedAvg operate more as meta-learning methods than as empirical risk minimization methods at this scale, suggesting their utility in downstream personalization and task-specific adaptation.
    Abstract We introduce a library, Dataset Grouper, to create large-scale group-structured (e.g., federated) datasets, enabling federated learning simulation at the scale of foundation models. This library allows the creation of group-structured versions of existing datasets based on user-specified partitions, and directly leads to a variety of useful heterogeneous datasets that can be plugged into existing software frameworks. Dataset Grouper offers three key advantages. First, it scales to settings where even a single group's dataset is too large to fit in memory. Second, it provides flexibility, both in choosing the base (non-partitioned) dataset and in defining partitions. Finally, it is framework-agnostic. We empirically demonstrate that Dataset Grouper allows for large-scale federated language modeling simulations on datasets that are orders of magnitude larger than in previous work. Our experimental results show that algorithms like FedAvg operate more as meta-learning methods than as empirical risk minimization methods at this scale, suggesting their utility in downstream personalization and task-specific adaptation.
    摘要 我们推出Dataset Grouper库,用于创建大规模的群组结构化(例如联邦式)数据集,使联邦学习仿真能够扩展到基础模型的规模。该库允许依据用户指定的划分,为现有数据集创建群组结构化版本,并直接产出多种可接入现有软件框架的异质数据集。Dataset Grouper具有三个关键优势:首先,它可以扩展到即使单个群组的数据集也无法放入内存的场景;其次,它十分灵活,既可以自由选择基础(未划分)数据集,也可以自由定义划分方式;最后,它与框架无关。我们通过实验证明,Dataset Grouper可以在比以往工作大几个数量级的数据集上进行大规模的联邦语言建模仿真。实验结果表明,在这种规模下,FedAvg等算法的行为更像元学习方法而非经验风险最小化方法,这表明它们在下游个性化和任务特定适配方面具有实用价值。
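
When the data fits in memory, group-structured partitioning of the kind the library automates can be approximated in a few lines: bucket examples by a group key and sample per-round client cohorts. This toy stand-in (with a hypothetical `author` key) omits Dataset Grouper's main selling point of streaming groups too large for memory.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

def group_partition(examples: Iterable[dict],
                    group_key: Callable[[dict], str]) -> Dict[str, List[dict]]:
    """Bucket a flat dataset into groups, e.g. one group per author/client."""
    groups = defaultdict(list)
    for ex in examples:
        groups[group_key(ex)].append(ex)
    return dict(groups)

def federated_rounds(groups: Dict[str, List[dict]],
                     clients_per_round: int) -> Iterator[List[Tuple[str, List[dict]]]]:
    """Yield per-round client cohorts, as a federated-learning simulation would consume them."""
    names = list(groups)
    while True:
        sampled = random.sample(names, k=min(clients_per_round, len(names)))
        yield [(name, groups[name]) for name in sampled]

data = [{"text": "hello", "author": "a"}, {"text": "world", "author": "b"},
        {"text": "again", "author": "a"}]
groups = group_partition(data, group_key=lambda ex: ex["author"])
cohort = next(federated_rounds(groups, clients_per_round=2))
```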

Gradient strikes back: How filtering out high frequencies improves explanations

  • paper_url: http://arxiv.org/abs/2307.09591
  • repo_url: None
  • paper_authors: Sabine Muzellec, Léo Andéol, Thomas Fel, Rufin VanRullen, Thomas Serre
  • for: 本研究旨在解释深度神经网络(CNN)的决策过程中, Gradient-based 和 Prediction-based 两种方法的差异,以及这些方法对决策的影响。
  • methods: 本研究使用了三种代表性的视觉分类模型的梯度分析,探讨了这些模型中高频信息的来源,以及这些信息对决策的影响。
  • results: 研究发现,Gradient-based 方法的梯度中含有噪声信息,而 Prediction-based 方法的梯度中减少了高频信息。此外,研究还发现,CNN 中的下采样操作可能是高频信息的主要来源。通过应用优化的低通量滤波器,可以改善 Gradient-based 方法的解释效果。研究结果表明,移除高频噪声可以提高 Gradient-based 方法的解释效果,并且将 Gradient-based 方法 ranked 为 state-of-the-art 方法。
    Abstract Recent years have witnessed an explosion in the development of novel prediction-based attribution methods, which have slowly been supplanting older gradient-based methods to explain the decisions of deep neural networks. However, it is still not clear why prediction-based methods outperform gradient-based ones. Here, we start with an empirical observation: these two approaches yield attribution maps with very different power spectra, with gradient-based methods revealing more high-frequency content than prediction-based methods. This observation raises multiple questions: What is the source of this high-frequency information, and does it truly reflect decisions made by the system? Lastly, why would the absence of high-frequency information in prediction-based methods yield better explainability scores along multiple metrics? We analyze the gradient of three representative visual classification models and observe that it contains noisy information emanating from high-frequencies. Furthermore, our analysis reveals that the operations used in Convolutional Neural Networks (CNNs) for downsampling appear to be a significant source of this high-frequency content -- suggesting aliasing as a possible underlying basis. We then apply an optimal low-pass filter for attribution maps and demonstrate that it improves gradient-based attribution methods. We show that (i) removing high-frequency noise yields significant improvements in the explainability scores obtained with gradient-based methods across multiple models -- leading to (ii) a novel ranking of state-of-the-art methods with gradient-based methods at the top. We believe that our results will spur renewed interest in simpler and computationally more efficient gradient-based methods for explainability.
    摘要 近年来,基于预测的归因方法迅速发展,正逐渐取代较早的基于梯度的方法来解释深度神经网络的决策。然而,基于预测的方法为何优于基于梯度的方法仍不清楚。我们从一个经验观察出发:这两类方法得到的归因图具有截然不同的功率谱,基于梯度的方法比基于预测的方法呈现出更多的高频内容。这一观察引出多个问题:这些高频信息来自哪里?它们是否真正反映了系统所做的决策?以及,为什么基于预测的方法缺少高频信息反而能在多项指标上获得更好的可解释性得分?我们分析了三个代表性视觉分类模型的梯度,发现其中包含源自高频的噪声信息。进一步的分析表明,卷积神经网络(CNN)中用于下采样的操作似乎是这些高频内容的重要来源,提示混叠(aliasing)可能是其潜在成因。随后,我们对归因图应用最优低通滤波器,证明它能改进基于梯度的归因方法。我们表明:(i)去除高频噪声能使基于梯度的方法在多个模型上的可解释性得分显著提升,进而(ii)得到一个新的最先进方法排名,其中基于梯度的方法位居榜首。我们相信这些结果将重新唤起人们对更简单、计算上更高效的基于梯度的可解释性方法的兴趣。
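
The proposed fix — low-pass filtering gradient attribution maps — can be approximated with a plain FFT-based Gaussian filter; the cutoff below is an arbitrary choice, not the optimal filter from the paper.

```python
import numpy as np

def lowpass_attribution(attr, sigma_frac=0.1):
    """attr: 2-D gradient/saliency map. Suppress high spatial frequencies by
    multiplying its spectrum with a Gaussian centred at zero frequency."""
    h, w = attr.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    gauss = np.exp(-(fx**2 + fy**2) / (2 * sigma_frac**2))
    return np.fft.ifft2(np.fft.fft2(attr) * gauss).real

noisy_attr = np.random.randn(224, 224)     # stand-in for a raw gradient attribution map
smooth_attr = lowpass_attribution(noisy_attr)
```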

Self-Compatibility: Evaluating Causal Discovery without Ground Truth

  • paper_url: http://arxiv.org/abs/2307.09552
  • repo_url: None
  • paper_authors: Philipp M. Faller, Leena Chennuru Vankadara, Atalanti A. Mastakouri, Francesco Locatello, Dominik Janzing
  • for: 本研究的目的是提出一种新的方法来证明 causal discovery 算法的输出是否正确, absence of ground truth.
  • methods: 本研究使用了一种新的方法, which relies on the notion of compatibility between causal graphs learned on different subsets of variables.
  • results: 研究表明, detecting incompatibilities can falsify wrongly inferred causal relations due to violation of assumptions or errors from finite sample effects. Additionally, the method provides strong evidence for the causal models whenever compatibility entails strong implications for the joint distribution.
    Abstract As causal ground truth is incredibly rare, causal discovery algorithms are commonly only evaluated on simulated data. This is concerning, given that simulations reflect common preconceptions about generating processes regarding noise distributions, model classes, and more. In this work, we propose a novel method for falsifying the output of a causal discovery algorithm in the absence of ground truth. Our key insight is that while statistical learning seeks stability across subsets of data points, causal learning should seek stability across subsets of variables. Motivated by this insight, our method relies on a notion of compatibility between causal graphs learned on different subsets of variables. We prove that detecting incompatibilities can falsify wrongly inferred causal relations due to violation of assumptions or errors from finite sample effects. Although passing such compatibility tests is only a necessary criterion for good performance, we argue that it provides strong evidence for the causal models whenever compatibility entails strong implications for the joint distribution. We also demonstrate experimentally that detection of incompatibilities can aid in causal model selection.
    摘要 由于因果层面的真实标注极为罕见,因果发现算法通常只能在模拟数据上评估。这令人担忧,因为模拟反映了我们对数据生成过程的常见预设,例如噪声分布、模型类别等。在这项工作中,我们提出一种在缺乏真实标注的情况下证伪因果发现算法输出的新方法。我们的核心洞察是:统计学习追求的是在数据点子集之间的稳定性,而因果学习应当追求在变量子集之间的稳定性。基于这一洞察,我们的方法依赖于在不同变量子集上学得的因果图之间的兼容性概念。我们证明,检测到不兼容性即可证伪由于假设被违反或有限样本效应而被错误推断出的因果关系。虽然通过这类兼容性检验只是良好性能的必要条件,但我们认为,当兼容性对联合分布蕴含很强的约束时,它为因果模型提供了有力的证据。我们还通过实验证明,不兼容性的检测有助于因果模型选择。
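
The compatibility idea suggests a simple falsification test: learn graphs on different variable subsets and count disagreements on their shared variables. The sketch below is a naive edge-level version of that check, not the paper's formal criterion; `discover` is a placeholder for any causal discovery routine and `data` is assumed to be a pandas-like table indexable by column names.

```python
from itertools import combinations

def induced_edges(edges, variables):
    """Keep only directed edges whose endpoints both lie in `variables`."""
    v = set(variables)
    return {(a, b) for a, b in edges if a in v and b in v}

def incompatibility(graph_a, vars_a, graph_b, vars_b):
    """Fraction of edges over the shared variables on which the two graphs disagree."""
    shared = set(vars_a) & set(vars_b)
    ea, eb = induced_edges(graph_a, shared), induced_edges(graph_b, shared)
    if not (ea | eb):
        return 0.0
    return len(ea ^ eb) / len(ea | eb)

def self_compatibility_score(discover, data, variable_subsets):
    """Average pairwise incompatibility across graphs learned on different variable subsets.
    A high score falsifies the discovery output; a low score is a necessary (not sufficient)
    indication that the inferred causal relations can be trusted."""
    results = [(subset, discover(data[list(subset)]))   # discover returns a set of directed edges
               for subset in variable_subsets]
    scores = [incompatibility(ga, va, gb, vb)
              for (va, ga), (vb, gb) in combinations(results, 2)]
    return sum(scores) / len(scores)
```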

Analyzing sports commentary in order to automatically recognize events and extract insights

  • paper_url: http://arxiv.org/abs/2307.10303
  • repo_url: https://github.com/yanismiraoui/analyzing-sports-commentary-in-order-to-automatically-recognize-events-and-extract-insights
  • paper_authors: Yanis Miraoui
  • for: 本研究旨在使用多种自然语言处理技术和方法自动识别体育赛事中的主要动作。
  • methods: 本研究使用了多种生动语言评论来分类不同来源的主要动作,并研究了情感分析是否能够检测主要动作。
  • results: 研究发现,使用多种自然语言处理技术和方法可以准确地识别体育赛事中的主要动作,并且情感分析可以帮助检测主要动作。
    Abstract In this paper, we carefully investigate how we can use multiple different Natural Language Processing techniques and methods in order to automatically recognize the main actions in sports events. We aim to extract insights by analyzing live sport commentaries from different sources and by classifying these major actions into different categories. We also study if sentiment analysis could help detect these main actions.
    摘要 在这篇论文中,我们仔细研究了如何使用多种自然语言处理技术和方法来自动识别体育活动中的主要动作。我们希望通过分析不同来源的直播体育评论来提取分析结论,并将这些主要动作分类为不同的类别。此外,我们还研究了情感分析是否可以帮助检测这些主要动作。

The semantic landscape paradigm for neural networks

  • paper_url: http://arxiv.org/abs/2307.09550
  • repo_url: None
  • paper_authors: Shreyas Gokhale
  • for: 这篇论文的目的是提出一个概念和数学框架,用于描述深度神经网络的训练动态和性能。
  • methods: 这篇论文提出了一个概念与数学框架,将神经网络的训练动态描述为图上的轨迹,用于解释神经网络的训练过程和性能。
  • results: 这篇论文的结果是提出了一种名为"semantic landscape"的概念和数学框架,可以用于描述神经网络的训练动态和性能,并且可以用于解释神经网络的各种现象,如grokking和emergence。
    Abstract Deep neural networks exhibit a fascinating spectrum of phenomena ranging from predictable scaling laws to the unpredictable emergence of new capabilities as a function of training time, dataset size and network size. Analysis of these phenomena has revealed the existence of concepts and algorithms encoded within the learned representations of these networks. While significant strides have been made in explaining observed phenomena separately, a unified framework for understanding, dissecting, and predicting the performance of neural networks is lacking. Here, we introduce the semantic landscape paradigm, a conceptual and mathematical framework that describes the training dynamics of neural networks as trajectories on a graph whose nodes correspond to emergent algorithms that are instrinsic to the learned representations of the networks. This abstraction enables us to describe a wide range of neural network phenomena in terms of well studied problems in statistical physics. Specifically, we show that grokking and emergence with scale are associated with percolation phenomena, and neural scaling laws are explainable in terms of the statistics of random walks on graphs. Finally, we discuss how the semantic landscape paradigm complements existing theoretical and practical approaches aimed at understanding and interpreting deep neural networks.

DreaMR: Diffusion-driven Counterfactual Explanation for Functional MRI

  • paper_url: http://arxiv.org/abs/2307.09547
  • repo_url: https://github.com/icon-lab/dreamr
  • paper_authors: Hasan Atakan Bedel, Tolga Çukur
  • for: This paper presents a diffusion-driven counterfactual method for interpreting fMRI data with high specificity, plausibility, and sample fidelity.
  • methods: The method uses a novel fractional multi-phase-distilled diffusion prior to improve sampling efficiency without compromising fidelity, together with a transformer architecture that captures long-range spatiotemporal context.
  • results: Comprehensive experiments on neuroimaging datasets show that DreaMR surpasses state-of-the-art counterfactual methods in the specificity, fidelity, and efficiency of sample generation.
    Abstract Deep learning analyses have offered sensitivity leaps in detection of cognitive states from functional MRI (fMRI) measurements across the brain. Yet, as deep models perform hierarchical nonlinear transformations on their input, interpreting the association between brain responses and cognitive states is challenging. Among common explanation approaches for deep fMRI classifiers, attribution methods show poor specificity and perturbation methods show limited plausibility. While counterfactual generation promises to address these limitations, previous methods use variational or adversarial priors that yield suboptimal sample fidelity. Here, we introduce the first diffusion-driven counterfactual method, DreaMR, to enable fMRI interpretation with high specificity, plausibility and fidelity. DreaMR performs diffusion-based resampling of an input fMRI sample to alter the decision of a downstream classifier, and then computes the minimal difference between the original and counterfactual samples for explanation. Unlike conventional diffusion methods, DreaMR leverages a novel fractional multi-phase-distilled diffusion prior to improve sampling efficiency without compromising fidelity, and it employs a transformer architecture to account for long-range spatiotemporal context in fMRI scans. Comprehensive experiments on neuroimaging datasets demonstrate the superior specificity, fidelity and efficiency of DreaMR in sample generation over state-of-the-art counterfactual methods for fMRI interpretation.

Can Neural Network Memorization Be Localized?

  • paper_url: http://arxiv.org/abs/2307.09542
  • repo_url: https://github.com/pratyushmaini/localizing-memorization
  • paper_authors: Pratyush Maini, Michael C. Mozer, Hanie Sedghi, Zachary C. Lipton, J. Zico Kolter, Chiyuan Zhang
  • for: This paper examines the relationship between memorization and generalization in deep overparametrized networks and proposes a new dropout variant to control where memorization occurs.
  • methods: Three converging lines of experimental evidence are used to show that memorization is not confined to individual layers but to a small set of neurons across layers: gradient accounting, layer rewinding, and retraining.
  • results: Most layers are redundant for memorization, and the layers that contribute are generally not the final ones; memorization is often confined to a handful of neurons or channels (around 5). Based on these findings, the paper proposes example-tied dropout, which directs memorization to a predetermined set of neurons.
    Abstract Recent efforts at explaining the interplay of memorization and generalization in deep overparametrized networks have posited that neural networks $\textit{memorize}$ "hard" examples in the final few layers of the model. Memorization refers to the ability to correctly predict on $\textit{atypical}$ examples of the training set. In this work, we show that rather than being confined to individual layers, memorization is a phenomenon confined to a small set of neurons in various layers of the model. First, via three experimental sources of converging evidence, we find that most layers are redundant for the memorization of examples and the layers that contribute to example memorization are, in general, not the final layers. The three sources are $\textit{gradient accounting}$ (measuring the contribution to the gradient norms from memorized and clean examples), $\textit{layer rewinding}$ (replacing specific model weights of a converged model with previous training checkpoints), and $\textit{retraining}$ (training rewound layers only on clean examples). Second, we ask a more generic question: can memorization be localized $\textit{anywhere}$ in a model? We discover that memorization is often confined to a small number of neurons or channels (around 5) of the model. Based on these insights we propose a new form of dropout -- $\textit{example-tied dropout}$ that enables us to direct the memorization of examples to an apriori determined set of neurons. By dropping out these neurons, we are able to reduce the accuracy on memorized examples from $100\%\to3\%$, while also reducing the generalization gap.
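A minimal PyTorch sketch of the example-tied dropout idea is given below, under the assumption that a layer's channels are split into shared "generalization" units (always active) and "memorization" units that are only active for the training example they are tied to and are dropped at evaluation; the tying scheme (hashing the example index to a block of channels) is illustrative and may differ from the authors' implementation.

```python
import torch
import torch.nn as nn

class ExampleTiedDropout(nn.Module):
    """Sketch of example-tied dropout: the first `n_gen` channels are shared
    "generalization" units, the rest are "memorization" units active only for
    the training example they are tied to, and dropped entirely at eval time."""

    def __init__(self, n_channels: int, n_gen: int, block: int = 4):
        super().__init__()
        self.n_channels, self.n_gen, self.block = n_channels, n_gen, block

    def forward(self, x: torch.Tensor, example_ids: torch.Tensor) -> torch.Tensor:
        mask = torch.zeros_like(x)
        mask[:, : self.n_gen] = 1.0                      # shared units always on
        if self.training:
            n_mem = self.n_channels - self.n_gen
            for row, ex in enumerate(example_ids.tolist()):
                offset = (ex * self.block) % n_mem
                idx = [self.n_gen + (offset + k) % n_mem for k in range(self.block)]
                mask[row, idx] = 1.0                     # example-specific units
        return x * mask

layer = ExampleTiedDropout(n_channels=16, n_gen=8)
feats = torch.randn(2, 16)
print(layer(feats, torch.tensor([0, 1])).shape)          # training mode
layer.eval()
print(layer(feats, torch.tensor([0, 1])).sum(dim=1))     # memorization units dropped
```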

Forecasting the steam mass flow in a powerplant using the parallel hybrid network

  • paper_url: http://arxiv.org/abs/2307.09483
  • repo_url: None
  • paper_authors: Andrii Kurkin, Jonas Hegemann, Mo Kordzanganeh, Alexey Melnikov
  • for: This study aims to improve the accuracy of steam mass flow forecasting in thermal power plants, which is crucial for operational efficiency and cost reduction.
  • methods: The study uses a parallel hybrid neural network architecture that combines a parametrized quantum circuit with a conventional feed-forward neural network, designed for time-series prediction in industrial settings.
  • results: The parallel hybrid model achieves more than 5.7 and 4.9 times lower test MSE than the standalone classical and quantum models, respectively, and up to 2 times smaller relative errors than the classical baseline. These findings point toward optimized power plant operations in the energy sector.
    Abstract Efficient and sustainable power generation is a crucial concern in the energy sector. In particular, thermal power plants grapple with accurately predicting steam mass flow, which is crucial for operational efficiency and cost reduction. In this study, we use a parallel hybrid neural network architecture that combines a parametrized quantum circuit and a conventional feed-forward neural network specifically designed for time-series prediction in industrial settings to enhance predictions of steam mass flow 15 minutes into the future. Our results show that the parallel hybrid model outperforms standalone classical and quantum models, achieving more than 5.7 and 4.9 times lower mean squared error (MSE) loss on the test set after training compared to pure classical and pure quantum networks, respectively. Furthermore, the hybrid model demonstrates smaller relative errors between the ground truth and the model predictions on the test set, up to 2 times better than the pure classical model. These findings contribute to the broader scientific understanding of how integrating quantum and classical machine learning techniques can be applied to real-world challenges faced by the energy sector, ultimately leading to optimized power plant operations.
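The following toy sketch conveys the parallel-hybrid idea only at the level of a forward pass: the quantum branch is reduced to independently simulated single-qubit RY rotations (whose Pauli-Z expectation is simply the cosine of the encoded angle), the classical branch is a small feed-forward network, and the two outputs are summed. The weights are random placeholders; the paper's parametrized circuit and training procedure are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

def quantum_branch(x, weights):
    # Toy stand-in for a parametrized quantum circuit: each feature is angle-
    # encoded into its own qubit via RY(w_i * x_i + b_i), and the branch output
    # is the mean Pauli-Z expectation <Z> = cos(w_i * x_i + b_i) over qubits.
    w, b = weights
    return np.cos(w * x + b).mean(axis=-1)

def classical_branch(x, weights):
    # Small feed-forward network with one hidden tanh layer.
    W1, b1, W2, b2 = weights
    return np.tanh(x @ W1 + b1) @ W2 + b2

n_features, hidden = 8, 16
q_weights = (rng.normal(size=n_features), rng.normal(size=n_features))
c_weights = (rng.normal(size=(n_features, hidden)), np.zeros(hidden),
             rng.normal(size=(hidden, 1)), np.zeros(1))

x = rng.normal(size=(32, n_features))      # a batch of lagged steam-flow features (synthetic)
prediction = quantum_branch(x, q_weights) + classical_branch(x, c_weights).squeeze(-1)
print(prediction.shape)                    # one forecast per sample
```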

Overthinking the Truth: Understanding how Language Models Process False Demonstrations

  • paper_url: http://arxiv.org/abs/2307.09476
  • repo_url: https://github.com/dannyallover/overthinking_the_truth
  • paper_authors: Danny Halawi, Jean-Stanislas Denain, Jacob Steinhardt
  • for: To study how a model's internal representations give rise to harmful imitation behavior.
  • methods: The paper analyzes the internal representations of few-shot prompted models to investigate two related phenomena behind harmful imitation: overthinking and false induction heads.
  • results: Given incorrect demonstrations, model behavior diverges from the correct-demonstration case after a "critical layer", and this divergence may be caused by false induction heads; ablating these heads reduces overthinking. The results suggest a promising direction for understanding and guarding against harmful model behaviors.
    Abstract Modern language models can imitate complex patterns through few-shot learning, enabling them to complete challenging tasks without fine-tuning. However, imitation can also lead models to reproduce inaccuracies or harmful content if present in the context. We study harmful imitation through the lens of a model's internal representations, and identify two related phenomena: overthinking and false induction heads. The first phenomenon, overthinking, appears when we decode predictions from intermediate layers, given correct vs. incorrect few-shot demonstrations. At early layers, both demonstrations induce similar model behavior, but the behavior diverges sharply at some "critical layer", after which the accuracy given incorrect demonstrations progressively decreases. The second phenomenon, false induction heads, are a possible mechanistic cause of overthinking: these are heads in late layers that attend to and copy false information from previous demonstrations, and whose ablation reduces overthinking. Beyond scientific understanding, our results suggest that studying intermediate model computations could be a promising avenue for understanding and guarding against harmful model behaviors.
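Decoding predictions from intermediate layers, as described above, can be sketched with an off-the-shelf model. The snippet below uses GPT-2 from the transformers library as a stand-in (the paper studies different models and few-shot prompts): each layer's hidden state is passed through the final layer norm and the unembedding to obtain a per-layer next-token prediction.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Decode the next-token prediction from every intermediate layer by applying the
# final layer norm and the unembedding matrix to that layer's last hidden state.
for layer, hidden in enumerate(out.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(hidden[:, -1]))
    token = tokenizer.decode(logits.argmax(dim=-1))
    print(f"layer {layer:2d}: {token!r}")
```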

A Cryogenic Memristive Neural Decoder for Fault-tolerant Quantum Error Correction

  • paper_url: http://arxiv.org/abs/2307.09463
  • repo_url: None
  • paper_authors: Frédéric Marcotte, Pierre-Antoine Mouny, Victor Yon, Gebremedhin A. Dagnew, Bohdan Kulchytskyy, Sophie Rochette, Yann Beilliard, Dominique Drouin, Pooya Ronagh
  • for: This paper aims to improve the performance of neural decoders for quantum error correction (QEC).
  • methods: The decoder is built on an in-memory computing (IMC) architecture in which crossbar arrays of resistive memory devices store the synaptic weights of the neural network and perform analog matrix-vector multiplications during inference.
  • results: Non-idealities of TiO$_\textrm{x}$-based memristive devices degrade decoding accuracy, but hardware-aware training mitigates this loss, allowing the memristive neural decoder to reach a pseudo-threshold of $9.23\times 10^{-4}$ on the distance-three surface code.
    Abstract Neural decoders for quantum error correction (QEC) rely on neural networks to classify syndromes extracted from error correction codes and find appropriate recovery operators to protect logical information against errors. Despite the good performance of neural decoders, important practical requirements remain to be achieved, such as minimizing the decoding time to meet typical rates of syndrome generation in repeated error correction schemes, and ensuring the scalability of the decoding approach as the code distance increases. Designing a dedicated integrated circuit to perform the decoding task in co-integration with a quantum processor appears necessary to reach these decoding time and scalability requirements, as routing signals in and out of a cryogenic environment to be processed externally leads to unnecessary delays and an eventual wiring bottleneck. In this work, we report the design and performance analysis of a neural decoder inference accelerator based on an in-memory computing (IMC) architecture, where crossbar arrays of resistive memory devices are employed to both store the synaptic weights of the decoder neural network and perform analog matrix-vector multiplications during inference. In proof-of-concept numerical experiments supported by experimental measurements, we investigate the impact of TiO$_\textrm{x}$-based memristive devices' non-idealities on decoding accuracy. Hardware-aware training methods are developed to mitigate the loss in accuracy, allowing the memristive neural decoders to achieve a pseudo-threshold of $9.23\times 10^{-4}$ for the distance-three surface code, whereas the equivalent digital neural decoder achieves a pseudo-threshold of $1.01\times 10^{-3}$. This work provides a pathway to scalable, fast, and low-power cryogenic IMC hardware for integrated QEC.
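A toy numerical sketch of the in-memory computing idea follows, assuming device non-idealities can be summarized as multiplicative conductance noise on the stored weights; "hardware-aware training" is then approximated by injecting that noise during training so the learned weights tolerate it. The data and model are synthetic placeholders, not QEC syndromes or the paper's decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def crossbar_matvec(W, x, noise_std=0.05):
    # Analog matrix-vector product on a resistive crossbar: each stored weight
    # is perturbed by multiplicative conductance noise before accumulation.
    W_dev = W * (1.0 + noise_std * rng.normal(size=W.shape))
    return W_dev @ x

# Toy binary classifier trained with noise injection (hardware-aware training),
# standing in for the neural decoder; the data are synthetic.
n_in = 12
w_true = rng.normal(size=n_in)
X = rng.normal(size=(2000, n_in))
y = (X @ w_true > 0).astype(float)

W = np.zeros((1, n_in))
lr = 0.05
for epoch in range(20):
    for xi, yi in zip(X, y):
        z = crossbar_matvec(W, xi)[0]          # forward pass sees device noise
        p = 1.0 / (1.0 + np.exp(-z))
        W -= lr * (p - yi) * xi[None, :]       # logistic-regression gradient step

acc = np.mean(((1.0 / (1.0 + np.exp(-X @ W[0]))) > 0.5) == y.astype(bool))
print(f"clean-readout accuracy after noisy training: {acc:.3f}")
```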

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

  • paper_url: http://arxiv.org/abs/2307.09458
  • repo_url: None
  • paper_authors: Tom Lieberum, Matthew Rahtz, János Kramár, Neel Nanda, Geoffrey Irving, Rohin Shah, Vladimir Mikulik
  • for: This paper tests whether circuit analysis scales to the 70B-parameter Chinchilla model.
  • methods: Existing circuit-analysis techniques, including logit attribution, attention pattern visualization, and activation patching, are applied to Chinchilla on multiple-choice question answering.
  • results: For multiple-choice answering, the query, key, and value subspaces of the relevant heads can be substantially compressed without loss of performance; however, this explanation only partially accounts for the heads' behavior on a more general distribution with randomized answer labels, suggesting there is more to learn.
    Abstract \emph{Circuit analysis} is a promising technique for understanding the internal mechanisms of language models. However, existing analyses are done in small models far from the state of the art. To address this, we present a case study of circuit analysis in the 70B Chinchilla model, aiming to test the scalability of circuit analysis. In particular, we study multiple-choice question answering, and investigate Chinchilla's capability to identify the correct answer \emph{label} given knowledge of the correct answer \emph{text}. We find that the existing techniques of logit attribution, attention pattern visualization, and activation patching naturally scale to Chinchilla, allowing us to identify and categorize a small set of `output nodes' (attention heads and MLPs). We further study the `correct letter' category of attention heads aiming to understand the semantics of their features, with mixed results. For normal multiple-choice question answers, we significantly compress the query, key and value subspaces of the head without loss of performance when operating on the answer labels for multiple-choice questions, and we show that the query and key subspaces represent an `Nth item in an enumeration' feature to at least some extent. However, when we attempt to use this explanation to understand the heads' behaviour on a more general distribution including randomized answer labels, we find that it is only a partial explanation, suggesting there is more to learn about the operation of `correct letter' heads on multiple choice question answering.

Smooth Attention for Deep Multiple Instance Learning: Application to CT Intracranial Hemorrhage Detection

  • paper_url: http://arxiv.org/abs/2307.09457
  • repo_url: https://github.com/yunanwu2168/sa-mil
  • paper_authors: Yunan Wu, Francisco M. Castro-Macías, Pablo Morales-Álvarez, Rafael Molina, Aggelos K. Katsaggelos
  • for: This work targets medical imaging diagnosis, specifically intracranial hemorrhage detection on head CT scans.
  • methods: The paper proposes a smooth attention deep multiple instance learning (SA-DMIL) model, in which smoothness is achieved via first- and second-order constraints on the latent attention function.
  • results: SA-DMIL outperforms its non-smooth counterpart at both scan (bag) and slice (instance) levels, learns spatial dependencies between slices, and surpasses current state-of-the-art MIL methods on the same ICH test set.
    Abstract Multiple Instance Learning (MIL) has been widely applied to medical imaging diagnosis, where bag labels are known and instance labels inside bags are unknown. Traditional MIL assumes that instances in each bag are independent samples from a given distribution. However, instances are often spatially or sequentially ordered, and one would expect similar diagnostic importance for neighboring instances. To address this, in this study, we propose a smooth attention deep MIL (SA-DMIL) model. Smoothness is achieved by the introduction of first and second order constraints on the latent function encoding the attention paid to each instance in a bag. The method is applied to the detection of intracranial hemorrhage (ICH) on head CT scans. The results show that this novel SA-DMIL: (a) achieves better performance than the non-smooth attention MIL at both scan (bag) and slice (instance) levels; (b) learns spatial dependencies between slices; and (c) outperforms current state-of-the-art MIL methods on the same ICH test set.
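A minimal PyTorch sketch of attention-based MIL pooling with a smoothness penalty on the attention given to neighbouring slices is shown below; only a first-order difference penalty is included, and a second-order (curvature) term could be added analogously. The architecture and weighting are illustrative, not the exact SA-DMIL formulation.

```python
import torch
import torch.nn as nn

class SmoothAttentionMIL(nn.Module):
    # Attention-based MIL pooling over the ordered slices of a CT scan (a "bag").
    def __init__(self, feat_dim=128, attn_dim=64):
        super().__init__()
        self.attention = nn.Sequential(nn.Linear(feat_dim, attn_dim), nn.Tanh(),
                                       nn.Linear(attn_dim, 1))
        self.classifier = nn.Linear(feat_dim, 1)

    def forward(self, slice_feats):                 # (n_slices, feat_dim)
        scores = self.attention(slice_feats)        # (n_slices, 1)
        weights = torch.softmax(scores, dim=0)
        bag_feat = (weights * slice_feats).sum(dim=0)
        logit = self.classifier(bag_feat)
        # First-order smoothness penalty: neighbouring slices should receive
        # similar attention; a second-order term could be added the same way.
        smooth = ((weights[1:] - weights[:-1]) ** 2).sum()
        return logit, smooth

model = SmoothAttentionMIL()
slices = torch.randn(30, 128)                       # features of 30 slices from one scan
label = torch.ones(1)
logit, smooth = model(slices)
loss = nn.functional.binary_cross_entropy_with_logits(logit, label) + 0.1 * smooth
loss.backward()
print(float(loss))
```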

Convergent regularization in inverse problems and linear plug-and-play denoisers

  • paper_url: http://arxiv.org/abs/2307.09441
  • repo_url: None
  • paper_authors: Andreas Hauptmann, Subhadip Mukherjee, Carola-Bibiane Schönlieb, Ferdia Sherry
  • for: This paper studies whether plug-and-play (PnP) denoising constitutes a convergent regularization scheme.
  • methods: The paper reviews classical regularization theory and provably convergent data-driven methods, and analyzes the convergence of PnP algorithms with linear denoisers.
  • results: A novel spectral filtering technique is proposed to control the strength of regularization arising from the denoiser, and PnP with linear denoisers is rigorously shown to be a convergent regularization scheme.
    Abstract Plug-and-play (PnP) denoising is a popular iterative framework for solving imaging inverse problems using off-the-shelf image denoisers. Their empirical success has motivated a line of research that seeks to understand the convergence of PnP iterates under various assumptions on the denoiser. While a significant amount of research has gone into establishing the convergence of the PnP iteration for different regularity conditions on the denoisers, not much is known about the asymptotic properties of the converged solution as the noise level in the measurement tends to zero, i.e., whether PnP methods are provably convergent regularization schemes under reasonable assumptions on the denoiser. This paper serves two purposes: first, we provide an overview of the classical regularization theory in inverse problems and survey a few notable recent data-driven methods that are provably convergent regularization schemes. We then continue to discuss PnP algorithms and their established convergence guarantees. Subsequently, we consider PnP algorithms with linear denoisers and propose a novel spectral filtering technique to control the strength of regularization arising from the denoiser. Further, by relating the implicit regularization of the denoiser to an explicit regularization functional, we rigorously show that PnP with linear denoisers leads to a convergent regularization scheme. More specifically, we prove that in the limit as the noise vanishes, the PnP reconstruction converges to the minimizer of a regularization potential subject to the solution satisfying the noiseless operator equation. The theoretical analysis is corroborated by numerical experiments for the classical inverse problem of tomographic image reconstruction.
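A small numerical sketch of a PnP proximal-gradient iteration with a linear denoiser is given below for a 1-D deblurring problem; both the blur operator and the denoiser are simple symmetric averaging filters chosen for illustration. The spectral filtering proposed in the paper would additionally reshape the denoiser's eigenvalues to control the regularization strength.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 128

def blur(x):
    # Forward operator A: symmetric circular 1-D blur (so A^T = A).
    return 0.5 * x + 0.25 * (np.roll(x, 1) + np.roll(x, -1))

def denoise(x):
    # Linear denoiser D: another symmetric local average.
    return 0.6 * x + 0.2 * (np.roll(x, 1) + np.roll(x, -1))

# Piecewise-constant ground truth and a noisy blurred measurement.
x_true = np.zeros(n); x_true[40:70] = 1.0
y = blur(x_true) + 0.02 * rng.normal(size=n)

# PnP proximal-gradient iteration: gradient step on the data term, then apply D.
x = np.zeros(n)
tau = 0.9
for _ in range(300):
    x = denoise(x - tau * blur(blur(x) - y))

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```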

Unsupervised Conditional Slot Attention for Object Centric Learning

  • paper_url: http://arxiv.org/abs/2307.09437
  • repo_url: None
  • paper_authors: Avinash Kori, Francesco Locatello, Francesca Toni, Ben Glocker
  • for: The goal is to learn object-level representations without supervision for use in downstream reasoning tasks.
  • methods: The paper proposes Unsupervised Conditional Slot Attention, which uses a probabilistic slot dictionary with parametric Gaussian distributions to learn specialized, object-specific slot-level binding.
  • results: Conditional Slot Attention provides scene composition capabilities and a significant boost in few-shot adaptability for compositional visual reasoning, while performing similarly to or better than Slot Attention on object discovery.
    Abstract Extracting object-level representations for downstream reasoning tasks is an emerging area in AI. Learning object-centric representations in an unsupervised setting presents multiple challenges, a key one being binding an arbitrary number of object instances to a specialized object slot. Recent object-centric representation methods like Slot Attention utilize iterative attention to learn composable representations with dynamic inference level binding but fail to achieve specialized slot level binding. To address this, in this paper we propose Unsupervised Conditional Slot Attention using a novel Probabilistic Slot Dictionary (PSD). We define PSD with (i) abstract object-level property vectors as key and (ii) parametric Gaussian distribution as its corresponding value. We demonstrate the benefits of the learnt specific object-level conditioning distributions in multiple downstream tasks, namely object discovery, compositional scene generation, and compositional visual reasoning. We show that our method provides scene composition capabilities and a significant boost in a few shot adaptability tasks of compositional visual reasoning, while performing similarly or better than slot attention in object discovery tasks

Scaling Laws for Imitation Learning in NetHack

  • paper_url: http://arxiv.org/abs/2307.09423
  • repo_url: None
  • paper_authors: Jens Tuyls, Dhruv Madeka, Kari Torkkola, Dean Foster, Karthik Narasimhan, Sham Kakade
  • for: To study whether scaling up model and data size improves imitation learning in a challenging environment, the game of NetHack.
  • methods: Inspired by recent work in natural language processing (NLP), the authors carefully scale up model and data size to investigate the effectiveness of imitation learning in NetHack.
  • results: IL loss and mean return scale smoothly with the compute budget and are strongly correlated, yielding power laws for training compute-optimal IL agents with respect to model size and number of samples; the resulting agents outperform prior state-of-the-art by at least 2x in all settings.
    Abstract Imitation Learning (IL) is one of the most widely used methods in machine learning. Yet, while powerful, many works find it is often not able to fully recover the underlying expert behavior. However, none of these works deeply investigate the role of scaling up the model and data size. Inspired by recent work in Natural Language Processing (NLP) where "scaling up" has resulted in increasingly more capable LLMs, we investigate whether carefully scaling up model and data size can bring similar improvements in the imitation learning setting. To demonstrate our findings, we focus on the game of NetHack, a challenging environment featuring procedural generation, stochasticity, long-term dependencies, and partial observability. We find IL loss and mean return scale smoothly with the compute budget and are strongly correlated, resulting in power laws for training compute-optimal IL agents with respect to model size and number of samples. We forecast and train several NetHack agents with IL and find they outperform prior state-of-the-art by at least 2x in all settings. Our work both demonstrates the scaling behavior of imitation learning in a challenging domain, as well as the viability of scaling up current approaches for increasingly capable agents in NetHack, a game that remains elusively hard for current AI systems.
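Fitting the kind of power law referred to above amounts to a line fit in log-log space. The sketch below uses synthetic (compute, loss) pairs with made-up constants rather than the paper's measurements.

```python
import numpy as np

# Synthetic (compute, IL-loss) pairs following loss = a * C^(-b) plus noise;
# the constants are invented for illustration, not taken from the paper.
rng = np.random.default_rng(0)
compute = np.logspace(15, 20, 12)                   # FLOPs
loss = 3.2 * compute ** (-0.08) * np.exp(0.01 * rng.normal(size=compute.size))

# Fit log(loss) = log(a) - b * log(C) by least squares in log-log space.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), deg=1)
print(f"estimated exponent b = {-slope:.3f}, prefactor a = {np.exp(intercept):.3f}")
```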

Causality-oriented robustness: exploiting general additive interventions

  • paper_url: http://arxiv.org/abs/2307.10299
  • repo_url: https://github.com/xwshen51/drig
  • paper_authors: Xinwei Shen, Peter Bühlmann, Armeen Taeb
  • for: This paper focuses on developing a robust prediction method that can handle the distribution shifts common in real-world applications.
  • methods: The proposed method, Distributional Robustness via Invariant Gradients (DRIG), exploits general additive interventions in training data to achieve robust predictions against unseen interventions, and naturally interpolates between in-distribution prediction and causality.
  • results: The authors prove that DRIG yields predictions that are robust among a data-dependent class of distribution shifts and show that their framework includes anchor regression as a special case. They further extend the approach to the semi-supervised domain adaptation setting to improve prediction performance, and empirically validate the methods on synthetic simulations and on single-cell data.
    Abstract Since distribution shifts are common in real-world applications, there is a pressing need for developing prediction models that are robust against such shifts. Existing frameworks, such as empirical risk minimization or distributionally robust optimization, either lack generalizability for unseen distributions or rely on postulated distance measures. Alternatively, causality offers a data-driven and structural perspective to robust predictions. However, the assumptions necessary for causal inference can be overly stringent, and the robustness offered by such causal models often lacks flexibility. In this paper, we focus on causality-oriented robustness and propose Distributional Robustness via Invariant Gradients (DRIG), a method that exploits general additive interventions in training data for robust predictions against unseen interventions, and naturally interpolates between in-distribution prediction and causality. In a linear setting, we prove that DRIG yields predictions that are robust among a data-dependent class of distribution shifts. Furthermore, we show that our framework includes anchor regression (Rothenh\"ausler et al.\ 2021) as a special case, and that it yields prediction models that protect against more diverse perturbations. We extend our approach to the semi-supervised domain adaptation setting to further improve prediction performance. Finally, we empirically validate our methods on synthetic simulations and on single-cell data.
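Since anchor regression is noted as a special case of the framework, the sketch below implements plain anchor regression on synthetic data with a hidden confounder: residuals along the column space of the anchor (intervention) variables are reweighted by a factor gamma before least squares, and gamma = 1 recovers ordinary least squares. The data-generating process is invented for illustration.

```python
import numpy as np

def anchor_regression(X, y, A, gamma=5.0):
    # P: projection onto the column space of the anchor variables A.
    P = A @ np.linalg.pinv(A)
    W = np.eye(len(y)) - P + np.sqrt(gamma) * P       # reweight residuals along span(A)
    Xt, yt = W @ X, W @ y
    return np.linalg.lstsq(Xt, yt, rcond=None)[0]

rng = np.random.default_rng(0)
n = 500
A = rng.normal(size=(n, 1))                           # additive intervention / anchor
H = rng.normal(size=n)                                # hidden confounder
x = A[:, 0] + H + rng.normal(size=n)
y = 2.0 * x + H + 0.5 * A[:, 0] + rng.normal(size=n)

X = x[:, None]
print("OLS:   ", anchor_regression(X, y, A, gamma=1.0))   # gamma = 1 recovers OLS
print("anchor:", anchor_regression(X, y, A, gamma=25.0))  # more robust to shifts in A
```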

Online Learning with Costly Features in Non-stationary Environments

  • paper_url: http://arxiv.org/abs/2307.09388
  • repo_url: https://github.com/saeedghoorchian/ncc-bandits
  • paper_authors: Saeed Ghoorchian, Evgenii Kortukov, Setareh Maghsudi
  • for: maximizing long-term rewards in sequential decision-making problems
  • methods: extending the contextual bandit setting to observe subsets of features’ states, and developing an algorithm with a sublinear regret guarantee
  • results: superior performance in a real-world scenario compared to existing methods
    Abstract Maximizing long-term rewards is the primary goal in sequential decision-making problems. The majority of existing methods assume that side information is freely available, enabling the learning agent to observe all features' states before making a decision. In real-world problems, however, collecting beneficial information is often costly. That implies that, besides individual arms' reward, learning the observations of the features' states is essential to improve the decision-making strategy. The problem is aggravated in a non-stationary environment where reward and cost distributions undergo abrupt changes over time. To address the aforementioned dual learning problem, we extend the contextual bandit setting and allow the agent to observe subsets of features' states. The objective is to maximize the long-term average gain, which is the difference between the accumulated rewards and the paid costs on average. Therefore, the agent faces a trade-off between minimizing the cost of information acquisition and possibly improving the decision-making process using the obtained information. To this end, we develop an algorithm that guarantees a sublinear regret in time. Numerical results demonstrate the superiority of our proposed policy in a real-world scenario.
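A toy epsilon-greedy sketch of the reward-versus-information-cost trade-off follows: before choosing an arm, the agent decides whether to pay for observing a feature, and its objective is reward minus the paid cost. The environment, cost, and update rules are invented for illustration, and the paper's handling of non-stationarity (e.g., discounting or change detection) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rounds, cost, eps = 5000, 0.1, 0.1

def reward(arm, feature):
    # A binary feature determines which of two arms pays off.
    return float(arm == feature) + 0.1 * rng.normal()

# Value estimates: q_obs[f, a] when the feature was bought, q_blind[a] otherwise,
# plus a running estimate of the net gain of each meta-choice (observe or not).
q_obs, n_obs = np.zeros((2, 2)), np.ones((2, 2))
q_blind, n_blind = np.zeros(2), np.ones(2)
gain, n_gain = np.zeros(2), np.ones(2)

total = 0.0
for t in range(n_rounds):
    feature = rng.integers(2)
    observe = rng.integers(2) if rng.random() < eps else int(np.argmax(gain))
    if observe:
        arm = rng.integers(2) if rng.random() < eps else int(np.argmax(q_obs[feature]))
        r = reward(arm, feature) - cost                          # pay for the observation
        q_obs[feature, arm] += (r + cost - q_obs[feature, arm]) / n_obs[feature, arm]
        n_obs[feature, arm] += 1
    else:
        arm = rng.integers(2) if rng.random() < eps else int(np.argmax(q_blind))
        r = reward(arm, feature)
        q_blind[arm] += (r - q_blind[arm]) / n_blind[arm]
        n_blind[arm] += 1
    gain[observe] += (r - gain[observe]) / n_gain[observe]
    n_gain[observe] += 1
    total += r

print("average gain per round:", total / n_rounds)
```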

Batched Predictors Generalize within Distribution

  • paper_url: http://arxiv.org/abs/2307.09379
  • repo_url: None
  • paper_authors: Andreas Loukas, Pan Kessel
  • for: This paper studies the generalization properties of batched predictors, i.e., models that predict the mean label of a small set (or batch) of examples.
  • methods: Using a suitable generalization of the Rademacher complexity, the paper proves that batched predictors enjoy exponentially stronger generalization guarantees than the standard per-sample approach.
  • results: The proposed bound holds independently of overparametrization, and the theoretical insights are validated experimentally across a range of tasks, architectures, and applications.
    Abstract We study the generalization properties of batched predictors, i.e., models tasked with predicting the mean label of a small set (or batch) of examples. The batched prediction paradigm is particularly relevant for models deployed to determine the quality of a group of compounds in preparation for offline testing. By utilizing a suitable generalization of the Rademacher complexity, we prove that batched predictors come with exponentially stronger generalization guarantees as compared to the standard per-sample approach. Surprisingly, the proposed bound holds independently of overparametrization. Our theoretical insights are validated experimentally for various tasks, architectures, and applications.
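A minimal sketch of the batched-prediction setup is given below: the model scores individual examples, but each training target is the mean label of a small batch, so the loss compares the mean prediction with the mean label. The data and architecture are toy placeholders.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data: each example has 10 features and a scalar label.
X = torch.randn(1024, 10)
y = X @ torch.randn(10, 1) + 0.1 * torch.randn(1024, 1)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
batch = 16                                    # the "set" whose mean label is predicted

for epoch in range(50):
    perm = torch.randperm(len(X))
    for i in range(0, len(X), batch):
        idx = perm[i:i + batch]
        pred_mean = model(X[idx]).mean()      # batched prediction: mean over the set
        target_mean = y[idx].mean()
        loss = (pred_mean - target_mean) ** 2
        opt.zero_grad(); loss.backward(); opt.step()

print("final batched loss:", float(loss))
```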

Data Cross-Segmentation for Improved Generalization in Reinforcement Learning Based Algorithmic Trading

  • paper_url: http://arxiv.org/abs/2307.09377
  • repo_url: None
  • paper_authors: Vikram Duvvur, Aashay Mehta, Edward Sun, Bo Wu, Ken Yew Chan, Jeff Schneider
  • for: This paper proposes a reinforcement-learning-based trading system intended to trade more effectively in thinly traded markets and markets for differentiated assets.
  • methods: A reinforcement learning algorithm makes trading decisions based on signals from a learned predictive model.
  • results: Tests on 20+ years of equity data from Bursa Malaysia indicate that the algorithm enables more effective trading in thinly traded markets and markets for differentiated assets.
    Abstract The use of machine learning in algorithmic trading systems is increasingly common. In a typical set-up, supervised learning is used to predict the future prices of assets, and those predictions drive a simple trading and execution strategy. This is quite effective when the predictions have sufficient signal, markets are liquid, and transaction costs are low. However, those conditions often do not hold in thinly traded financial markets and markets for differentiated assets such as real estate or vehicles. In these markets, the trading strategy must consider the long-term effects of taking positions that are relatively more difficult to change. In this work, we propose a Reinforcement Learning (RL) algorithm that trades based on signals from a learned predictive model and addresses these challenges. We test our algorithm on 20+ years of equity data from Bursa Malaysia.