results: Simulation studies show that GHOST provides at least 10.2x better throughput and 3.8x better energy efficiency compared to GPUs, TPUs, CPUs, and multiple existing GNN hardware accelerators.
Abstract
Graph neural networks (GNNs) have emerged as a powerful approach for modelling and learning from graph-structured data. Multiple fields have since benefitted enormously from the capabilities of GNNs, such as recommendation systems, social network analysis, drug discovery, and robotics. However, accelerating and efficiently processing GNNs require a unique approach that goes beyond conventional artificial neural network accelerators, due to the substantial computational and memory requirements of GNNs. The slowdown of scaling in CMOS platforms also motivates a search for alternative implementation substrates. In this paper, we present GHOST, the first silicon-photonic hardware accelerator for GNNs. GHOST efficiently alleviates the costs associated with both vertex-centric and edge-centric operations. It implements separately the three main stages involved in running GNNs in the optical domain, allowing it to be used for the inference of various widely used GNN models and architectures, such as graph convolution networks and graph attention networks. Our simulation studies indicate that GHOST exhibits at least 10.2x better throughput and 3.8x better energy efficiency when compared to GPU, TPU, CPU and multiple state-of-the-art GNN hardware accelerators.
FedHIL: Heterogeneity Resilient Federated Learning for Robust Indoor Localization with Mobile Devices
results: Experiments in diverse indoor environments and with heterogeneous mobile devices show that FedHIL achieves 1.62x better localization accuracy on average than the best-performing FL-based indoor localization framework from prior work.
Abstract
Indoor localization plays a vital role in applications such as emergency response, warehouse management, and augmented reality experiences. By deploying machine learning (ML) based indoor localization frameworks on their mobile devices, users can localize themselves in a variety of indoor and subterranean environments. However, achieving accurate indoor localization can be challenging due to heterogeneity in the hardware and software stacks of mobile devices, which can result in inconsistent and inaccurate location estimates. Traditional ML models also heavily rely on initial training data, making them vulnerable to degradation in performance with dynamic changes across indoor environments. To address the challenges due to device heterogeneity and lack of adaptivity, we propose a novel embedded ML framework called FedHIL. Our framework combines indoor localization and federated learning (FL) to improve indoor localization accuracy in device-heterogeneous environments while also preserving user data privacy. FedHIL integrates a domain-specific selective weight adjustment approach to preserve the ML model's performance for indoor localization during FL, even in the presence of extremely noisy data. Experimental evaluations in diverse real-world indoor environments and with heterogeneous mobile devices show that FedHIL outperforms state-of-the-art FL and non-FL indoor localization frameworks. FedHIL is able to achieve 1.62x better localization accuracy on average than the best performing FL-based indoor localization framework from prior work.
Shapley Sets: Feature Attribution via Recursive Function Decomposition
for: This work proposes a new attribution approach, Shapley Sets, as an alternative to the widely used Shapley value feature attributions, which can be misleading in the presence of feature interaction.
methods: The approach uses a recursive function decomposition algorithm to decompose the model into non-separable variable groups, with log-linear complexity in the number of variables.
results: Shapley Sets enjoys the same fairness axioms as the Shapley value, avoids pitfalls associated with Shapley-value-based attribution methods, and is particularly advantageous for data types with complex dependency structure.
Abstract
Despite their ubiquitous use, Shapley value feature attributions can be misleading due to feature interaction in both model and data. We propose an alternative attribution approach, Shapley Sets, which awards value to sets of features. Shapley Sets decomposes the underlying model into non-separable variable groups using a recursive function decomposition algorithm with log linear complexity in the number of variables. Shapley Sets attributes to each non-separable variable group their combined value for a particular prediction. We show that Shapley Sets is equivalent to the Shapley value over the transformed feature set and thus benefits from the same axioms of fairness. Shapley Sets is value function agnostic and we show theoretically and experimentally how Shapley Sets avoids pitfalls associated with Shapley value based alternatives and is particularly advantageous for data types with complex dependency structure.
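For readers who want a concrete picture of attributing value to feature groups rather than individual features, below is a minimal Python sketch that computes exact Shapley values when each non-separable group is treated as a single player. The `value_fn` and `groups` arguments are hypothetical placeholders; the paper's recursive decomposition algorithm that discovers the groups in log-linear time is not reproduced here.

```python
from itertools import combinations
from math import factorial

def group_shapley(value_fn, groups):
    """Exact Shapley values over feature *groups* (a minimal sketch).

    value_fn(active_features) -> float is any set/value function (e.g., expected
    model output with the remaining features marginalized out); groups is a list
    of tuples of feature indices, each treated as a single player.
    """
    n = len(groups)
    phi = [0.0] * n
    players = list(range(n))
    for i in players:
        others = [p for p in players if p != i]
        for k in range(len(others) + 1):
            for coalition in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_i = frozenset(f for g in coalition + (i,) for f in groups[g])
                without_i = frozenset(f for g in coalition for f in groups[g])
                phi[i] += weight * (value_fn(with_i) - value_fn(without_i))
    return phi
```

Because the players are groups rather than individual features, the enumeration is exponential only in the number of groups, which is typically much smaller than the number of features.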
Fast Optimal Transport through Sliced Wasserstein Generalized Geodesics
results: Empirical evidence supports the benefits of min-SWGG in various applications, including gradient flows, shape matching, and image colorization.
Abstract
Wasserstein distance (WD) and the associated optimal transport plan have been proven useful in many applications where probability measures are at stake. In this paper, we propose a new proxy of the squared WD, coined min-SWGG, that is based on the transport map induced by an optimal one-dimensional projection of the two input distributions. We draw connections between min-SWGG and Wasserstein generalized geodesics in which the pivot measure is supported on a line. We notably provide a new closed form for the exact Wasserstein distance in the particular case of one of the distributions supported on a line allowing us to derive a fast computational scheme that is amenable to gradient descent optimization. We show that min-SWGG is an upper bound of WD and that it has a complexity similar to that of Sliced-Wasserstein, with the additional feature of providing an associated transport plan. We also investigate some theoretical properties such as metricity, weak convergence, computational and topological properties. Empirical evidences support the benefits of min-SWGG in various contexts, from gradient flows, shape matching and image colorization, among others.
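As an illustration of the core construction (not the paper's full method, which also provides closed forms and a gradient-based optimization of the projection direction), the sketch below computes the squared transport cost induced by sorting two equal-size point clouds along one direction, then takes the minimum over a few random directions.

```python
import numpy as np

def swgg_1d_projection_cost(X, Y, theta):
    """Squared transport cost induced by sorting along one projection direction.

    Project both point clouds on a unit direction `theta`, sort, and use the
    resulting one-to-one matching as a transport map evaluated in the *original*
    space. Assumes X and Y have the same number of points and uniform weights.
    """
    sigma_x = np.argsort(X @ theta)
    sigma_y = np.argsort(Y @ theta)
    diff = X[sigma_x] - Y[sigma_y]      # matching induced by the 1D projections
    return np.mean(np.sum(diff ** 2, axis=1))

# min-SWGG then minimizes this cost over directions, e.g. by random search:
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(128, 2)), rng.normal(loc=1.0, size=(128, 2))
thetas = rng.normal(size=(50, 2))
thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
best = min(swgg_1d_projection_cost(X, Y, t) for t in thetas)
```

The minimum over directions yields an upper bound on the squared Wasserstein distance, and the sorting permutations give the associated transport plan for free.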
Localized Data Work as a Precondition for Data-Centric ML: A Case Study of Full Lifecycle Crop Disease Identification in Ghana
paper_authors: Darlington Akogo, Issah Samori, Cyril Akafia, Harriet Fiagbor, Andrews Kangah, Donald Kwame Asiedu, Kwabena Fuachie, Luis Oala
for: The paper demonstrates how localized data work and collaboration can improve agricultural productivity and food security.
methods: Drone-collected data and machine learning are used to determine crop stressors.
results: The project delivers a localized, data-centric solution that is made available to local farmers via a desktop application, supporting agricultural productivity and food security.
Abstract
The Ghana Cashew Disease Identification with Artificial Intelligence (CADI AI) project demonstrates the importance of sound data work as a precondition for the delivery of useful, localized datacentric solutions for public good tasks such as agricultural productivity and food security. Drone collected data and machine learning are utilized to determine crop stressors. Data, model and the final app are developed jointly and made available to local farmers via a desktop application.
Pretraining is All You Need: A Multi-Atlas Enhanced Transformer Framework for Autism Spectrum Disorder Classification
paper_authors: Lucas Mahler, Qi Wang, Julius Steiglechner, Florian Birk, Samuel Heczko, Klaus Scheffler, Gabriele Lohmann
For: This paper proposes a novel framework for ASD classification using resting-state functional magnetic resonance imaging data.
Methods: The proposed framework, called METAFormer, utilizes a multi-atlas approach and self-supervised pretraining to improve classification performance.
Results: The proposed framework achieves state-of-the-art performance on the ABIDE I dataset, with an average accuracy of 83.7% and an AUC score of 0.832.
Abstract
Autism spectrum disorder (ASD) is a prevalent psychiatric condition characterized by atypical cognitive, emotional, and social patterns. Timely and accurate diagnosis is crucial for effective interventions and improved outcomes in individuals with ASD. In this study, we propose a novel Multi-Atlas Enhanced Transformer framework, METAFormer, for ASD classification. Our framework utilizes resting-state functional magnetic resonance imaging data from the ABIDE I dataset, comprising 406 ASD and 476 typical control (TC) subjects. METAFormer employs a multi-atlas approach, where flattened connectivity matrices from the AAL, CC200, and DOS160 atlases serve as input to the transformer encoder. Notably, we demonstrate that self-supervised pretraining, involving the reconstruction of masked values from the input, significantly enhances classification performance without the need for additional or separate training data. Through stratified cross-validation, we evaluate the proposed framework and show that it surpasses state-of-the-art performance on the ABIDE I dataset, with an average accuracy of 83.7% and an AUC-score of 0.832. The code for our framework is available at https://github.com/Lugges991/METAFormer
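The masked-value pretraining objective mentioned in the abstract can be sketched as follows. This is a simplified stand-in rather than the actual METAFormer architecture, and the token construction (one token per connectivity value) is an assumption made purely for illustration.

```python
import torch
import torch.nn as nn

class MaskedPretrainer(nn.Module):
    """Self-supervised masked-value reconstruction on flattened connectivity
    vectors (a sketch, not the paper's exact model)."""
    def __init__(self, n_features, d_model=128, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(1, d_model)   # each connectivity value -> one token
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x, mask_ratio=0.15):
        # x: (batch, n_features) flattened connectivity matrix
        mask = torch.rand_like(x) < mask_ratio
        x_masked = x.masked_fill(mask, 0.0)
        tokens = self.embed(x_masked.unsqueeze(-1))     # (batch, n_features, d_model)
        recon = self.head(self.encoder(tokens)).squeeze(-1)
        # reconstruction loss is computed only on the masked positions
        return ((recon - x) ** 2)[mask].mean()
```

After pretraining, the same encoder weights would be reused with a classification head for the ASD/TC task.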
Local primordial non-Gaussianity from the large-scale clustering of photometric DESI luminous red galaxies
paper_authors: Mehdi Rezaie, Ashley J. Ross, Hee-Jong Seo, Hui Kong, Anna Porredon, Lado Samushia, Edmond Chaussidon, Alex Krolewski, Arnaud de Mattia, Florian Beutler, Jessica Nicole Aguilar, Steven Ahlen, Shadab Alam, Santiago Avila, Benedict Bahr-Kalus, Jose Bermejo-Climent, David Brooks, Todd Claybaugh, Shaun Cole, Kyle Dawson, Axel de la Macorra, Peter Doel, Andreu Font-Ribera, Jaime E. Forero-Romero, Satya Gontcho A Gontcho, Julien Guy, Klaus Honscheid, Theodore Kisner, Martin Landriau, Michael Levi, Marc Manera, Aaron Meisner, Ramon Miquel, Eva-Maria Mueller, Adam Myers, Jeffrey A. Newman, Jundan Nie, Nathalie Palanque-Delabrouille, Will Percival, Claire Poppett, Graziano Rossi, Eusebio Sanchez, Michael Schubnell, Gregory Tarlé, Benjamin Alan Weaver, Christophe Yèche, Zhimin Zhou, Hu Zou
For: The paper aims to constrain the local primordial non-Gaussianity parameter fNL using the angular clustering of luminous red galaxies from the Dark Energy Spectroscopic Instrument (DESI) imaging surveys.
Methods: The paper uses linear regression and artificial neural networks to alleviate non-cosmological excess clustering on large scales, and tests the methods against log-normal simulations with and without fNL and systematics.
Results: The paper finds fNL $= 47^{+14(+29)}_{-11(-22)}$ at 68%(95%) confidence, with a maximum likelihood value of fNL $\sim 50$ and increased uncertainty when including a full set of imaging maps. The results indicate fNL > 0 with a 99.9 percent confidence level, which could be attributed to unforeseen systematics or a scale-dependent fNL model.
Abstract
We use angular clustering of luminous red galaxies from the Dark Energy Spectroscopic Instrument (DESI) imaging surveys to constrain the local primordial non-Gaussianity parameter fNL. Our sample comprises over 12 million targets, covering 14,000 square degrees of the sky, with redshifts in the range 0.2< z < 1.35. We identify Galactic extinction, survey depth, and astronomical seeing as the primary sources of systematic error, and employ linear regression and artificial neural networks to alleviate non-cosmological excess clustering on large scales. Our methods are tested against log-normal simulations with and without fNL and systematics, showing superior performance of the neural network treatment in reducing remaining systematics. Assuming the universality relation, we find fNL $= 47^{+14(+29)}_{-11(-22)}$ at 68\%(95\%) confidence. With a more aggressive treatment, including regression against the full set of imaging maps, our maximum likelihood value shifts slightly to fNL$ \sim 50$ and the uncertainty on fNL increases due to the removal of large-scale clustering information. We apply a series of robustness tests (e.g., cuts on imaging, declination, or scales used) that show consistency in the obtained constraints. Despite extensive efforts to mitigate systematics, our measurements indicate fNL > 0 with a 99.9 percent confidence level. This outcome raises concerns as it could be attributed to unforeseen systematics, including calibration errors or uncertainties associated with low-\ell systematics in the extinction template. Alternatively, it could suggest a scale-dependent fNL model--causing significant non-Gaussianity around large-scale structure while leaving cosmic microwave background scales unaffected. Our results encourage further studies of fNL with DESI spectroscopic samples, where the inclusion of 3D clustering modes should help separate imaging systematics.
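To make the linear-regression treatment of imaging systematics concrete, here is a simplified sketch. The map names, the clipping range, and the weighting scheme are illustrative assumptions; the paper's neural-network treatment and the fNL inference itself are not shown.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def systematics_weights(density, imaging_maps):
    """Linear-regression mitigation of imaging systematics (a simplified sketch).

    density: observed galaxy counts per sky pixel, normalized to mean 1.
    imaging_maps: array of shape (n_pixels, n_maps), e.g. Galactic extinction,
    survey depth and seeing per pixel. The fitted linear model predicts the
    non-cosmological modulation of the density field; its inverse is used as a
    per-pixel weight before measuring the large-scale clustering.
    """
    reg = LinearRegression().fit(imaging_maps, density)
    predicted_contamination = reg.predict(imaging_maps)
    weights = 1.0 / np.clip(predicted_contamination, 0.5, 2.0)
    return weights / weights.mean()
```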
SRCD: Semantic Reasoning with Compound Domains for Single-Domain Generalized Object Detection
results: Extensive experiments on multiple benchmarks demonstrate the effectiveness of the proposed SRCD.
Abstract
This paper provides a novel framework for single-domain generalized object detection (i.e., Single-DGOD), where we are interested in learning and maintaining the semantic structures of self-augmented compound cross-domain samples to enhance the model's generalization ability. Different from DGOD trained on multiple source domains, Single-DGOD is far more challenging to generalize well to multiple target domains with only one single source domain. Existing methods mostly adopt a similar treatment from DGOD to learn domain-invariant features by decoupling or compressing the semantic space. However, there may be two potential limitations: 1) pseudo attribute-label correlation, due to extremely scarce single-domain data; and 2) the semantic structural information is usually ignored, i.e., we found the affinities of instance-level semantic relations in samples are crucial to model generalization. In this paper, we introduce Semantic Reasoning with Compound Domains (SRCD) for Single-DGOD. Specifically, our SRCD contains two main components, namely, the texture-based self-augmentation (TBSA) module, and the local-global semantic reasoning (LGSR) module. TBSA aims to eliminate the effects of irrelevant attributes associated with labels, such as light, shadow, color, etc., at the image level by a light-yet-efficient self-augmentation. Moreover, LGSR is used to further model the semantic relationships on instance features to uncover and maintain the intrinsic semantic structures. Extensive experiments on multiple benchmarks demonstrate the effectiveness of the proposed SRCD.
Summary
To address these limitations, this paper introduces Semantic Reasoning with Compound Domains (SRCD) for Single-DGOD. SRCD consists of two main components: the texture-based self-augmentation (TBSA) module and the local-global semantic reasoning (LGSR) module. TBSA aims to eliminate the effects of irrelevant attributes associated with labels, such as light, shadow, color, etc., at the image level by using a light-yet-efficient self-augmentation. LGSR is used to further model the semantic relationships on instance features to uncover and maintain the intrinsic semantic structures. Experiments on multiple benchmarks demonstrate the effectiveness of the proposed SRCD. The main contributions of this paper are:
1. A novel framework for Single-DGOD, which learns and maintains the semantic structures of self-augmented compound cross-domain samples.
2. A new module called TBSA, which eliminates the effects of irrelevant attributes associated with labels at the image level.
3. A module called LGSR, which models the semantic relationships on instance features to uncover and maintain the intrinsic semantic structures.
Overall, this paper presents a more effective and efficient approach to Single-DGOD, which can improve the generalization ability of object detection models in real-world applications.
RRCNN: A novel signal decomposition approach based on recurrent residue convolutional neural network
results: Experiments show that the proposed method handles boundary effects, mode mixing, noise robustness, and the orthogonality of the decomposed components better than existing methods, and performs well on both local-average computation and signal decomposition.
Abstract
The decomposition of non-stationary signals is an important and challenging task in the field of signal time-frequency analysis. In the recent two decades, many signal decomposition methods led by the empirical mode decomposition, which was pioneered by Huang et al. in 1998, have been proposed by different research groups. However, they still have some limitations. For example, they are generally prone to boundary and mode mixing effects and are not very robust to noise. Inspired by the successful applications of deep learning in fields like image processing and natural language processing, and given the lack in the literature of works in which deep learning techniques are used directly to decompose non-stationary signals into simple oscillatory components, we use the convolutional neural network, residual structure and nonlinear activation function to compute in an innovative way the local average of the signal, and study a new non-stationary signal decomposition method under the framework of deep learning. We discuss the training process of the proposed model and study the convergence analysis of the learning algorithm. In the experiments, we evaluate the performance of the proposed model from two points of view: the calculation of the local average and the signal decomposition. Furthermore, we study the mode mixing, noise interference, and orthogonality properties of the decomposed components produced by the proposed method. All results show that the proposed model allows for better handling boundary effect, mode mixing effect, robustness, and the orthogonality of the decomposed components than existing methods.
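A minimal sketch of the central ingredient, a learnable local-average operator built from convolutions with a residual refinement, is given below. It illustrates the pattern described in the abstract rather than the actual RRCNN architecture, and the kernel size and channel counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LocalAverageBlock(nn.Module):
    """Learnable local-average operator for 1D signals (an illustrative sketch).

    The fixed convolution starts as a plain moving average; a small residual
    branch with a nonlinear activation refines it during training.
    """
    def __init__(self, kernel_size=31):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)
        nn.init.constant_(self.conv.weight, 1.0 / kernel_size)  # start as a moving average
        self.refine = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size, padding=kernel_size // 2),
            nn.Tanh(),
            nn.Conv1d(8, 1, kernel_size, padding=kernel_size // 2),
        )

    def forward(self, x):
        # x: (batch, 1, length); residual refinement of the smoothed signal
        smooth = self.conv(x)
        return smooth + self.refine(smooth)

# The oscillatory component is the residue x - local_average(x), and the block
# can be applied recurrently to extract further components.
```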
results: The library applies multi-objective optimization algorithms to sustainable portfolio construction and provides configuration files for customizing algorithm hyper-parameters.
Abstract
MOPO-LSI is an open-source Multi-Objective Portfolio Optimization Library for Sustainable Investments. This document provides a user guide for MOPO-LSI version 1.0, including problem setup, workflow and the hyper-parameters in configurations.
results: The methods are evaluated on financial and energy datasets and outperform existing approaches both qualitatively and quantitatively; in particular, the proposed GuidedDiffTime model does not require re-training for new constraints, resulting in a significant carbon footprint reduction.
Abstract
Synthetic time series are often used in practical applications to augment the historical time series dataset for better performance of machine learning algorithms, amplify the occurrence of rare events, and also create counterfactual scenarios described by the time series. Distributional-similarity (which we refer to as realism) as well as the satisfaction of certain numerical constraints are common requirements in counterfactual time series scenario generation requests. For instance, the US Federal Reserve publishes synthetic market stress scenarios given by the constrained time series for financial institutions to assess their performance in hypothetical recessions. Existing approaches for generating constrained time series usually penalize training loss to enforce constraints, and reject non-conforming samples. However, these approaches would require re-training if we change constraints, and rejection sampling can be computationally expensive, or impractical for complex constraints. In this paper, we propose a novel set of methods to tackle the constrained time series generation problem and provide efficient sampling while ensuring the realism of generated time series. In particular, we frame the problem using a constrained optimization framework and then we propose a set of generative methods including ``GuidedDiffTime'', a guided diffusion model to generate realistic time series. Empirically, we evaluate our work on several datasets for financial and energy data, where incorporating constraints is critical. We show that our approaches outperform existing work both qualitatively and quantitatively. Most importantly, we show that our ``GuidedDiffTime'' model is the only solution where re-training is not necessary for new constraints, resulting in a significant carbon footprint reduction.
Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework
methods: The paper proposes a general plug-and-play framework, Align With Purpose, which complements the CTC criterion with an additional loss term that prioritizes alignments according to a desired property.
results: Applied to automatic speech recognition, the method improves the targeted properties across architectures and training-data scales: up to a 570ms improvement in emission latency with only a minor impact on WER, and a relative WER improvement of 4.5% over baseline models. The framework scales to large datasets and can be implemented with only a few lines of code.
Abstract
Connectionist Temporal Classification (CTC) is a widely used criterion for training supervised sequence-to-sequence (seq2seq) models. It enables learning the relations between input and output sequences, termed alignments, by marginalizing over perfect alignments (that yield the ground truth), at the expense of imperfect alignments. This binary differentiation of perfect and imperfect alignments falls short of capturing other essential alignment properties that hold significance in other real-world applications. Here we propose $\textit{Align With Purpose}$, a $\textbf{general Plug-and-Play framework}$ for enhancing a desired property in models trained with the CTC criterion. We do that by complementing the CTC with an additional loss term that prioritizes alignments according to a desired property. Our method does not require any intervention in the CTC loss function, enables easy optimization of a variety of properties, and allows differentiation between both perfect and imperfect alignments. We apply our framework in the domain of Automatic Speech Recognition (ASR) and show its generality in terms of property selection, architectural choice, and scale of training dataset (up to 280,000 hours). To demonstrate the effectiveness of our framework, we apply it to two unrelated properties: emission time and word error rate (WER). For the former, we report an improvement of up to 570ms in latency optimization with a minor reduction in WER, and for the latter, we report a relative improvement of 4.5% WER over the baseline models. To the best of our knowledge, these applications have never been demonstrated to work on a scale of data as large as ours. Notably, our method can be implemented using only a few lines of code, and can be extended to other alignment-free loss functions and to domains other than ASR.
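The plug-and-play pattern can be sketched as follows. The auxiliary term shown here (an expected-emission-time penalty as a crude proxy for low latency) is an assumption made for illustration and is not the paper's exact property loss.

```python
import torch
import torch.nn.functional as F

def ctc_with_property_loss(log_probs, targets, input_lengths, target_lengths,
                           lam=0.1, blank=0):
    """CTC loss plus an auxiliary alignment-property loss (an illustrative sketch).

    The standard CTC term is kept untouched; a hedged property term is added.
    log_probs: (T, N, C) log-softmax outputs, as expected by torch's CTC loss.
    """
    ctc = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                     blank=blank, zero_infinity=True)
    probs = log_probs.exp()                         # (T, N, C)
    non_blank = 1.0 - probs[..., blank]             # (T, N) non-blank mass per frame
    T = log_probs.size(0)
    frame_idx = torch.arange(T, device=log_probs.device, dtype=probs.dtype)
    # expected (normalized) emission time of the non-blank mass, per utterance
    expected_time = (frame_idx[:, None] * non_blank).sum(0) / (non_blank.sum(0) + 1e-8) / T
    return ctc + lam * expected_time.mean()
```

Other properties would swap in a different auxiliary term while leaving the CTC term, and the rest of the training pipeline, unchanged.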
Distributional Model Equivalence for Risk-Sensitive Reinforcement Learning
results: The framework can be used to augment any model-free risk-sensitive algorithm, and its effectiveness is demonstrated in both tabular and large-scale experiments.
Abstract
We consider the problem of learning models for risk-sensitive reinforcement learning. We theoretically demonstrate that proper value equivalence, a method of learning models which can be used to plan optimally in the risk-neutral setting, is not sufficient to plan optimally in the risk-sensitive setting. We leverage distributional reinforcement learning to introduce two new notions of model equivalence, one which is general and can be used to plan for any risk measure, but is intractable; and a practical variation which allows one to choose which risk measures they may plan optimally for. We demonstrate how our framework can be used to augment any model-free risk-sensitive algorithm, and provide both tabular and large-scale experiments to demonstrate its ability.
Online Learning and Solving Infinite Games with an ERM Oracle
results: The algorithm achieves finite regret in the realizable setting and sublinearly growing regret in the agnostic setting, with regret bounded in terms of the Littlestone and threshold dimensions of the underlying concept class; analogous guarantees are obtained for solving games via best-response oracles.
Abstract
While ERM suffices to attain near-optimal generalization error in the stochastic learning setting, this is not known to be the case in the online learning setting, where algorithms for general concept classes rely on computationally inefficient oracles such as the Standard Optimal Algorithm (SOA). In this work, we propose an algorithm for online binary classification setting that relies solely on ERM oracle calls, and show that it has finite regret in the realizable setting and sublinearly growing regret in the agnostic setting. We bound the regret in terms of the Littlestone and threshold dimensions of the underlying concept class. We obtain similar results for nonparametric games, where the ERM oracle can be interpreted as a best response oracle, finding the best response of a player to a given history of play of the other players. In this setting, we provide learning algorithms that only rely on best response oracles and converge to approximate-minimax equilibria in two-player zero-sum games and approximate coarse correlated equilibria in multi-player general-sum games, as long as the game has a bounded fat-threshold dimension. Our algorithms apply to both binary-valued and real-valued games and can be viewed as providing justification for the wide use of double oracle and multiple oracle algorithms in the practice of solving large games.
Serving Graph Neural Networks With Distributed Fog Servers For Smart IoT Services
paper_authors: Liekang Zeng, Xu Chen, Peng Huang, Ke Luo, Xiaoxi Zhang, Zhi Zhou
for: Fograph is designed to provide real-time GNN inference for IoT-driven smart applications, leveraging the resources of multiple fog nodes to reduce communication overhead and improve performance.
methods: Fograph employs heterogeneity-aware execution planning and GNN-specific compression techniques to optimize the performance of GNN inference in fog environments.
results: Compared to state-of-the-art cloud serving and fog deployment, Fograph achieves up to 5.39x execution speedup and 6.84x throughput improvement, demonstrating its effectiveness in improving the performance of GNN-based services for IoT applications.
Abstract
Graph Neural Networks (GNNs) have gained growing interest in miscellaneous applications owing to their outstanding ability in extracting latent representation on graph structures. To render GNN-based service for IoT-driven smart applications, traditional model serving paradigms usually resort to the cloud by fully uploading geo-distributed input data to remote datacenters. However, our empirical measurements reveal the significant communication overhead of such cloud-based serving and highlight the profound potential in applying the emerging fog computing. To maximize the architectural benefits brought by fog computing, in this paper, we present Fograph, a novel distributed real-time GNN inference framework that leverages diverse and dynamic resources of multiple fog nodes in proximity to IoT data sources. By introducing heterogeneity-aware execution planning and GNN-specific compression techniques, Fograph tailors its design to well accommodate the unique characteristics of GNN serving in fog environments. Prototype-based evaluation and case study demonstrate that Fograph significantly outperforms the state-of-the-art cloud serving and fog deployment by up to 5.39x execution speedup and 6.84x throughput improvement.
Learning Discrete Weights and Activations Using the Local Reparameterization Trick
results: The approach reduces the runtime and memory footprint of neural network inference in computer vision and machine learning, and achieves state-of-the-art results for networks with binary activations.
Abstract
In computer vision and machine learning, a crucial challenge is to lower the computation and memory demands for neural network inference. A commonplace solution to address this challenge is through the use of binarization. By binarizing the network weights and activations, one can significantly reduce computational complexity by substituting the computationally expensive floating operations with faster bitwise operations. This leads to a more efficient neural network inference that can be deployed on low-resource devices. In this work, we extend previous approaches that trained networks with discrete weights using the local reparameterization trick to also allow for discrete activations. The original approach optimized a distribution over the discrete weights and uses the central limit theorem to approximate the pre-activation with a continuous Gaussian distribution. Here we show that the probabilistic modeling can also allow effective training of networks with discrete activation as well. This further reduces runtime and memory footprint at inference time with state-of-the-art results for networks with binary activations.
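The local reparameterization idea for binary weights can be sketched in a few lines. This shows the weight side only (the paper's extension to discrete activations is not included), and the parameterization via per-weight logits is an illustrative assumption.

```python
import torch
import torch.nn as nn

class StochasticBinaryLinear(nn.Module):
    """Linear layer with binary {-1, +1} weights trained via the local
    reparameterization trick (a minimal sketch).

    Each weight has a trainable probability p of being +1. Instead of sampling
    weights, the pre-activation is sampled from the Gaussian implied by the
    central limit theorem, keeping gradients flowing to the logits.
    """
    def __init__(self, in_features, out_features):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(out_features, in_features))

    def forward(self, x):
        p = torch.sigmoid(self.logits)        # P(w = +1)
        mean_w = 2.0 * p - 1.0                # E[w]
        var_w = 1.0 - mean_w ** 2             # Var[w] for w in {-1, +1}
        mu = x @ mean_w.t()                   # pre-activation mean
        var = (x ** 2) @ var_w.t()            # pre-activation variance
        eps = torch.randn_like(mu)
        return mu + var.clamp_min(1e-8).sqrt() * eps

    def binarize(self):
        # deterministic binary weights for inference
        return torch.where(self.logits >= 0,
                           torch.ones_like(self.logits),
                           -torch.ones_like(self.logits))
```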
Training Energy-Based Models with Diffusion Contrastive Divergences
results: Experiments show that the proposed DCD outperforms CD by a large margin on synthetic data modeling and high-dimensional image denoising, while being more efficient and stable. Moreover, DCD can train an EBM to generate the CelebA $32\times 32$ dataset, comparable to existing EBMs.
Abstract
Energy-Based Models (EBMs) have been widely used for generative modeling. Contrastive Divergence (CD), a prevailing training objective for EBMs, requires sampling from the EBM with Markov Chain Monte Carlo methods (MCMCs), which leads to an irreconcilable trade-off between the computational burden and the validity of the CD. Running MCMCs till convergence is computationally intensive. On the other hand, short-run MCMC brings in an extra non-negligible parameter gradient term that is difficult to handle. In this paper, we provide a general interpretation of CD, viewing it as a special instance of our proposed Diffusion Contrastive Divergence (DCD) family. By replacing the Langevin dynamic used in CD with other EBM-parameter-free diffusion processes, we propose a more efficient divergence. We show that the proposed DCDs are both more computationally efficient than the CD and are not limited to a non-negligible gradient term. We conduct intensive experiments, including both synthesis data modeling and high-dimensional image denoising and generation, to show the advantages of the proposed DCDs. On the synthetic data learning and image denoising experiments, our proposed DCD outperforms CD by a large margin. In image generation experiments, the proposed DCD is capable of training an energy-based model for generating the CelebA $32\times 32$ dataset, which is comparable to existing EBMs.
Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks
results: The analysis shows that ConvResNeXts can adapt to function smoothness and low-dimensional manifold structure, efficiently learning the target function without suffering from the curse of dimensionality.
Abstract
Convolutional residual neural networks (ConvResNets), though overparameterized, can achieve remarkable prediction performance in practice, which cannot be well explained by conventional wisdom. To bridge this gap, we study the performance of ConvResNeXts, which cover ConvResNets as a special case, trained with weight decay from the perspective of nonparametric classification. Our analysis allows for infinitely many building blocks in ConvResNeXts, and shows that weight decay implicitly enforces sparsity on these blocks. Specifically, we consider a smooth target function supported on a low-dimensional manifold, then prove that ConvResNeXts can adapt to the function smoothness and low-dimensional structures and efficiently learn the function without suffering from the curse of dimensionality. Our findings partially justify the advantage of overparameterized ConvResNeXts over conventional machine learning models.
SwinGNN: Rethinking Permutation Invariance in Diffusion Models for Graph Generation
results: With systematic ablations and improved training and sampling techniques, SwinGNN achieves state-of-the-art performance on synthetic and real-world protein and molecule datasets.
Abstract
Diffusion models based on permutation-equivariant networks can learn permutation-invariant distributions for graph data. However, in comparison to their non-invariant counterparts, we have found that these invariant models encounter greater learning challenges since 1) their effective target distributions exhibit more modes; 2) their optimal one-step denoising scores are the score functions of Gaussian mixtures with more components. Motivated by this analysis, we propose a non-invariant diffusion model, called $\textit{SwinGNN}$, which employs an efficient edge-to-edge 2-WL message passing network and utilizes shifted window based self-attention inspired by SwinTransformers. Further, through systematic ablations, we identify several critical training and sampling techniques that significantly improve the sample quality of graph generation. At last, we introduce a simple post-processing trick, $\textit{i.e.}$, randomly permuting the generated graphs, which provably converts any graph generative model to a permutation-invariant one. Extensive experiments on synthetic and real-world protein and molecule datasets show that our SwinGNN achieves state-of-the-art performances. Our code is released at https://github.com/qiyan98/SwinGNN.
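The post-processing trick mentioned at the end of the abstract is straightforward to implement for dense adjacency matrices; here is a minimal sketch.

```python
import numpy as np

def randomly_permute_graph(adj, node_feats=None, rng=None):
    """Randomly relabel the nodes of a generated graph (a minimal sketch).

    Applying a uniformly random permutation to each sample provably makes the
    overall generative distribution permutation-invariant, as described in the
    abstract. Works on dense adjacency matrices; node features, if given, are
    permuted consistently.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = adj.shape[0]
    perm = rng.permutation(n)
    adj_perm = adj[np.ix_(perm, perm)]   # same relabeling applied to rows and columns
    if node_feats is not None:
        return adj_perm, node_feats[perm]
    return adj_perm
```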
Heuristic Algorithms for the Approximation of Mutual Coherence
paper_authors: Gregor Betz, Vera Chekan, Tamara Mchedlidze
for: This paper is written for those interested in efficient computation of mutual coherence, particularly in the context of political preference matching systems like Wahl-O-Mat.
methods: The paper presents several heuristics to estimate the model parameters of a mixture of three Gaussians distribution, which is used to approximate the mutual coherence. Some of the algorithms are fully polynomial-time, while others require solving a small number of instances of the SAT model counting problem.
results: The paper reports that the average squared error of the best algorithm is below 0.0035, indicating a high degree of accuracy while also being efficient. The results are precise enough to be used in Wahl-O-Mat-like systems.
Abstract
Mutual coherence is a measure of similarity between two opinions. Although the notion comes from philosophy, it is essential for a wide range of technologies, e.g., the Wahl-O-Mat system. In Germany, this system helps voters to find candidates that are the closest to their political preferences. The exact computation of mutual coherence is highly time-consuming due to the iteration over all subsets of an opinion. Moreover, for every subset, an instance of the SAT model counting problem has to be solved which is known to be a hard problem in computer science. This work is the first study to accelerate this computation. We model the distribution of the so-called confirmation values as a mixture of three Gaussians and present efficient heuristics to estimate its model parameters. The mutual coherence is then approximated with the expected value of the distribution. Some of the presented algorithms are fully polynomial-time, others only require solving a small number of instances of the SAT model counting problem. The average squared error of our best algorithm lies below 0.0035 which is insignificant if the efficiency is taken into account. Furthermore, the accuracy is precise enough to be used in Wahl-O-Mat-like systems.
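The estimation step described in the abstract (fit a three-component Gaussian mixture to sampled confirmation values and report the expected value of the fitted distribution) can be sketched with scikit-learn. The heuristic that produces the confirmation-value sample, including any SAT model counting, is assumed to exist elsewhere and is not shown.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def approximate_mutual_coherence(confirmation_values):
    """Approximate mutual coherence via a three-component Gaussian mixture
    (a minimal sketch of the estimation idea from the abstract)."""
    x = np.asarray(confirmation_values, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=3, random_state=0).fit(x)
    # expected value of the mixture = sum_k weight_k * mean_k
    return float(np.sum(gmm.weights_ * gmm.means_.ravel()))
```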
HAGNN: Hybrid Aggregation for Heterogeneous Graph Neural Networks
for: Handle heterogeneous graphs with rich type semantic information.
methods:
+ Propose a novel framework called HAGNN (Hybrid Aggregation for Heterogeneous GNNs)
+ Leverage both meta-path neighbors and directly connected neighbors for node aggregation
+ Divide the aggregation process into two phases: meta-path-based intra-type aggregation and meta-path-free inter-type aggregation
+ Use a new data structure called fused meta-path graph for intra-type aggregation
+ Perform structural semantic aware aggregation
results:
+ Outperform existing heterogeneous GNN models on node classification, node clustering, and link prediction tasks
+ Demonstrate the effectiveness of HAGNN in handling heterogeneous graphs with rich type semantic information.
Abstract
Heterogeneous graph neural networks (GNNs) have been successful in handling heterogeneous graphs. In existing heterogeneous GNNs, meta-path plays an essential role. However, recent work pointed out that simple homogeneous graph model without meta-path can also achieve comparable results, which calls into question the necessity of meta-path. In this paper, we first present the intrinsic difference about meta-path-based and meta-path-free models, i.e., how to select neighbors for node aggregation. Then, we propose a novel framework to utilize the rich type semantic information in heterogeneous graphs comprehensively, namely HAGNN (Hybrid Aggregation for Heterogeneous GNNs). The core of HAGNN is to leverage the meta-path neighbors and the directly connected neighbors simultaneously for node aggregations. HAGNN divides the overall aggregation process into two phases: meta-path-based intra-type aggregation and meta-path-free inter-type aggregation. During the intra-type aggregation phase, we propose a new data structure called fused meta-path graph and perform structural semantic aware aggregation on it. Finally, we combine the embeddings generated by each phase. Compared with existing heterogeneous GNN models, HAGNN can take full advantage of the heterogeneity in heterogeneous graphs. Extensive experimental results on node classification, node clustering, and link prediction tasks show that HAGNN outperforms the existing modes, demonstrating the effectiveness of HAGNN.
Renewable energy management in smart home environment via forecast embedded scheduling based on Recurrent Trend Predictive Neural Network
paper_authors: Mert Nakıp, Onur Çopur, Emrah Biyik, Cüneyt Güzeliş
For: The paper proposes an advanced machine learning algorithm for efficient residential demand control in smart home energy management systems.
Methods: The proposed algorithm, called Recurrent Trend Predictive Neural Network based Forecast Embedded Scheduling (rTPNN-FES), simultaneously forecasts renewable energy generation and schedules household appliances, eliminating the need for separate algorithms.
Results: The evaluation results show that rTPNN-FES provides near-optimal scheduling $37.5$ times faster than optimization while outperforming state-of-the-art forecasting techniques.
Abstract
Smart home energy management systems help the distribution grid operate more efficiently and reliably, and enable effective penetration of distributed renewable energy sources. These systems rely on robust forecasting, optimization, and control/scheduling algorithms that can handle the uncertain nature of demand and renewable generation. This paper proposes an advanced ML algorithm, called Recurrent Trend Predictive Neural Network based Forecast Embedded Scheduling (rTPNN-FES), to provide efficient residential demand control. rTPNN-FES is a novel neural network architecture that simultaneously forecasts renewable energy generation and schedules household appliances. By its embedded structure, rTPNN-FES eliminates the utilization of separate algorithms for forecasting and scheduling and generates a schedule that is robust against forecasting errors. This paper also evaluates the performance of the proposed algorithm for an IoT-enabled smart home. The evaluation results reveal that rTPNN-FES provides near-optimal scheduling $37.5$ times faster than the optimization while outperforming state-of-the-art forecasting techniques.
SageFormer: Series-Aware Graph-Enhanced Transformers for Multivariate Time Series Forecasting
results: Extensive experiments on real-world and synthetic datasets show that SageFormer achieves superior performance compared to previous state-of-the-art approaches.
Abstract
Multivariate time series forecasting plays a critical role in diverse domains. While recent advancements in deep learning methods, especially Transformers, have shown promise, there remains a gap in addressing the significance of inter-series dependencies. This paper introduces SageFormer, a Series-aware Graph-enhanced Transformer model designed to effectively capture and model dependencies between series using graph structures. SageFormer tackles two key challenges: effectively representing diverse temporal patterns across series and mitigating redundant information among series. Importantly, the proposed series-aware framework seamlessly integrates with existing Transformer-based models, augmenting their ability to model inter-series dependencies. Through extensive experiments on real-world and synthetic datasets, we showcase the superior performance of SageFormer compared to previous state-of-the-art approaches.
Overconfidence is a Dangerous Thing: Mitigating Membership Inference Attacks by Enforcing Less Confident Prediction
For: This paper is written to address the problem of membership inference attacks (MIAs) on machine learning (ML) models, which can compromise the privacy of training data. The paper proposes a defense technique called HAMP that can provide strong membership privacy and high accuracy without requiring additional data.
Methods: The HAMP technique consists of a novel training framework with high-entropy soft labels and an entropy-based regularizer to constrain the model's prediction while still achieving high accuracy. The technique also modifies all prediction outputs to become low-confidence outputs, effectively obscuring the differences between the prediction on members and non-members.
Results: The paper conducts extensive evaluation on five benchmark datasets and shows that HAMP provides consistently high accuracy and strong membership privacy, outperforming seven state-of-the-art defenses in terms of privacy-utility trade-off.
Abstract
Machine learning (ML) models are vulnerable to membership inference attacks (MIAs), which determine whether a given input is used for training the target model. While there have been many efforts to mitigate MIAs, they often suffer from limited privacy protection, large accuracy drop, and/or requiring additional data that may be difficult to acquire. This work proposes a defense technique, HAMP that can achieve both strong membership privacy and high accuracy, without requiring extra data. To mitigate MIAs in different forms, we observe that they can be unified as they all exploit the ML model's overconfidence in predicting training samples through different proxies. This motivates our design to enforce less confident prediction by the model, hence forcing the model to behave similarly on the training and testing samples. HAMP consists of a novel training framework with high-entropy soft labels and an entropy-based regularizer to constrain the model's prediction while still achieving high accuracy. To further reduce privacy risk, HAMP uniformly modifies all the prediction outputs to become low-confidence outputs while preserving the accuracy, which effectively obscures the differences between the prediction on members and non-members. We conduct extensive evaluation on five benchmark datasets, and show that HAMP provides consistently high accuracy and strong membership privacy. Our comparison with seven state-of-the-art defenses shows that HAMP achieves a superior privacy-utility trade off than those techniques.
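A rough sketch of the two ingredients (high-entropy soft labels with an entropy regularizer, and low-confidence output post-processing) is shown below. The specific smoothing value, regularizer weight, and temperature-based output modification are illustrative assumptions rather than HAMP's exact mechanisms.

```python
import torch
import torch.nn.functional as F

def hamp_style_loss(logits, targets, num_classes, smooth=0.7, alpha=0.1):
    """Cross-entropy against high-entropy soft labels plus an entropy regularizer
    that discourages overconfident predictions on training samples (a sketch)."""
    # soft labels: the correct class keeps the largest share, the rest is spread out
    soft = torch.full_like(logits, smooth / (num_classes - 1))
    soft.scatter_(1, targets.unsqueeze(1), 1.0 - smooth)
    log_probs = F.log_softmax(logits, dim=1)
    ce_soft = -(soft * log_probs).sum(dim=1).mean()
    entropy = -(log_probs.exp() * log_probs).sum(dim=1).mean()
    return ce_soft - alpha * entropy          # reward high prediction entropy

def obscure_outputs(logits, temperature=5.0):
    """Inference-time modification: return low-confidence outputs while
    preserving the argmax (hence accuracy), loosely mimicking the output
    post-processing described in the abstract."""
    return F.softmax(logits / temperature, dim=1)
```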
Prototypes as Explanation for Time Series Anomaly Detection
results: The paper extends the widely used prototype learning for classification to anomaly detection. By visualizing prototypes in both the latent space and the input space, it intuitively demonstrates how regular data are modeled and why specific patterns are considered anomalous.
Abstract
Detecting abnormal patterns that deviate from a certain regular repeating pattern in time series is essential in many big data applications. However, the lack of labels, the dynamic nature of time series data, and unforeseeable abnormal behaviors make the detection process challenging. Despite the success of recent deep anomaly detection approaches, the mystical mechanisms in such black-box models have become a new challenge in safety-critical applications. The lack of model transparency and prediction reliability hinders further breakthroughs in such domains. This paper proposes ProtoAD, using prototypes as the example-based explanation for the state of regular patterns during anomaly detection. Without significant impact on the detection performance, prototypes shed light on the deep black-box models and provide intuitive understanding for domain experts and stakeholders. We extend the widely used prototype learning in classification problems into anomaly detection. By visualizing both the latent space and input space prototypes, we intuitively demonstrate how regular data are modeled and why specific patterns are considered abnormal.
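To illustrate how prototypes can double as example-based explanations, here is a minimal scoring sketch. The encoder that produces latent windows and the prototype learning procedure are assumed to exist and are not part of the sketch; this is the general idea, not the full ProtoAD model.

```python
import numpy as np

def prototype_anomaly_scores(latent, prototypes):
    """Prototype-based anomaly scoring (a minimal sketch).

    latent: (n_windows, d) encoded time-series windows.
    prototypes: (k, d) prototype vectors learned from normal data.
    Returns the distance to the nearest prototype (anomaly score) and the
    index of that prototype (the example-based explanation).
    """
    d2 = ((latent[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)  # (n_windows, k)
    nearest = d2.argmin(axis=1)
    scores = np.sqrt(d2.min(axis=1))
    return scores, nearest

# A window is flagged as anomalous when its score exceeds a threshold, and the
# nearest prototype shows which "regular" pattern the window deviates from.
```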
A Scalable Reinforcement Learning-based System Using On-Chain Data for Cryptocurrency Portfolio Management
For: The paper proposes a novel reinforcement learning-based system for cryptocurrency portfolio management that incorporates on-chain data for end-to-end management.
Methods: The paper uses on-chain data to train a reinforcement learning model for cryptocurrency portfolio management, and the model is tested and evaluated using backtesting results on three portfolios.
Results: The results show that the proposed CryptoRLPM system outperforms all baselines in terms of accumulated rate of return, daily rate of return, and Sortino ratio, with an enhancement of at least 83.14%, 0.5603%, and 2.1767 respectively compared to Bitcoin.
Abstract
On-chain data (metrics) of blockchain networks, akin to company fundamentals, provide crucial and comprehensive insights into the networks. Despite their informative nature, on-chain data have not been utilized in reinforcement learning (RL)-based systems for cryptocurrency (crypto) portfolio management (PM). An intriguing subject is the extent to which the utilization of on-chain data can enhance an RL-based system's return performance compared to baselines. Therefore, in this study, we propose CryptoRLPM, a novel RL-based system incorporating on-chain data for end-to-end crypto PM. CryptoRLPM consists of five units, spanning from information comprehension to trading order execution. In CryptoRLPM, the on-chain data are tested and specified for each crypto to solve the issue of ineffectiveness of metrics. Moreover, the scalable nature of CryptoRLPM allows changes in the portfolios' cryptos at any time. Backtesting results on three portfolios indicate that CryptoRLPM outperforms all the baselines in terms of accumulated rate of return (ARR), daily rate of return (DRR), and Sortino ratio (SR). Particularly, when compared to Bitcoin, CryptoRLPM enhances the ARR, DRR, and SR by at least 83.14%, 0.5603%, and 2.1767 respectively.
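As a concrete reference for the three reported metrics, the snippet below computes the accumulated rate of return (ARR), daily rate of return (DRR), and Sortino ratio from a toy series of daily portfolio values. The formulas follow common backtesting conventions and assume a zero risk-free rate; this is not CryptoRLPM's actual backtesting code.

```python
# Illustrative computation of ARR, DRR, and Sortino ratio from daily portfolio values.
import numpy as np

def backtest_metrics(portfolio_values):
    v = np.asarray(portfolio_values, dtype=float)
    daily_returns = v[1:] / v[:-1] - 1.0
    arr = v[-1] / v[0] - 1.0                      # accumulated rate of return
    drr = daily_returns.mean()                    # mean daily rate of return
    downside = daily_returns[daily_returns < 0.0]
    downside_dev = np.sqrt(np.mean(downside ** 2)) if downside.size else np.nan
    sortino = drr / downside_dev if downside.size else np.inf
    return {"ARR": arr, "DRR": drr, "Sortino": sortino}

values = [1.00, 1.02, 0.99, 1.05, 1.08, 1.04, 1.12]   # toy daily portfolio values
print(backtest_metrics(values))
```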
Bridge the Performance Gap in Peak-hour Series Forecasting: The Seq2Peak Framework
results: Across four real-world datasets, Seq2Peak achieves a remarkable average relative improvement of 37.7% for both transformer- and non-transformer-based TSF models.
Abstract
Peak-Hour Series Forecasting (PHSF) is a crucial yet underexplored task in various domains. While state-of-the-art deep learning models excel in regular Time Series Forecasting (TSF), they struggle to achieve comparable results in PHSF. This can be attributed to the challenges posed by the high degree of non-stationarity in peak-hour series, which makes direct forecasting more difficult than standard TSF. Additionally, manually extracting the maximum value from regular forecasting results leads to suboptimal performance due to models minimizing the mean deficit. To address these issues, this paper presents Seq2Peak, a novel framework designed specifically for PHSF tasks, bridging the performance gap observed in TSF models. Seq2Peak offers two key components: the CyclicNorm pipeline to mitigate the non-stationarity issue, and a simple yet effective trainable-parameter-free peak-hour decoder with a hybrid loss function that utilizes both the original series and peak-hour series as supervised signals. Extensive experimentation on publicly available time series datasets demonstrates the effectiveness of the proposed framework, yielding a remarkable average relative improvement of 37.7\% across four real-world datasets for both transformer- and non-transformer-based TSF models.
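The hybrid supervision described above can be illustrated with a small PyTorch loss that penalizes errors on both the original series and its peak-hour series (here, per-day maxima). This is only a sketch of the idea; the window size, the weighting factor `alpha`, and the use of plain MSE are assumptions rather than Seq2Peak's exact design.

```python
# Sketch of a hybrid loss supervised by both the original and the peak-hour series.
import torch
import torch.nn.functional as F

def peaks(x, window=24):
    # x: (batch, horizon); take the max of each non-overlapping window (e.g. each day)
    b, h = x.shape
    return x[:, : h - h % window].reshape(b, -1, window).max(dim=-1).values

def hybrid_loss(pred, target, alpha=0.5, window=24):
    series_loss = F.mse_loss(pred, target)
    peak_loss = F.mse_loss(peaks(pred, window), peaks(target, window))
    return (1 - alpha) * series_loss + alpha * peak_loss

pred = torch.randn(8, 96, requires_grad=True)   # toy 4-day hourly forecast
target = torch.randn(8, 96)
loss = hybrid_loss(pred, target)
loss.backward()
print(float(loss))
```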
Cross-Element Combinatorial Selection for Multi-Element Creative in Display Advertising
results: Experiments show that CECS achieves state-of-the-art (SOTA) scores on offline metrics on real-world datasets, and its industrial deployment yields a significant 6.02% CTR and 10.37% GMV lift, which is beneficial to the business.
Abstract
The effectiveness of ad creatives is greatly influenced by their visual appearance. Advertising platforms can generate ad creatives with different appearances by combining creative elements provided by advertisers. However, with the increasing number of ad creative elements, it becomes challenging to select a suitable combination from the countless possibilities. The industry's mainstream approach is to select individual creative elements independently, which often overlooks the importance of interaction between creative elements during the modeling process. In response, this paper proposes a Cross-Element Combinatorial Selection framework for multiple creative elements, termed CECS. In the encoder process, a cross-element interaction is adopted to dynamically adjust the expression of a single creative element based on the current candidate creatives. In the decoder process, the creative combination problem is transformed into a cascade selection problem of multiple creative elements. A pointer mechanism with a cascade design is used to model the associations among candidates. Comprehensive experiments on real-world datasets show that CECS achieved the SOTA score on offline metrics. Moreover, the CECS algorithm has been deployed in our industrial application, resulting in a significant 6.02% CTR and 10.37% GMV lift, which is beneficial to the business.
Learning Lie Group Symmetry Transformations with Neural Networks
results: The results show that the method can effectively detect and characterize symmetries in a dataset, recovering both the transformation group and the distribution of the per-sample parameter values.
Abstract
The problem of detecting and quantifying the presence of symmetries in datasets is useful for model selection, generative modeling, and data analysis, amongst others. While existing methods for hard-coding transformations in neural networks require prior knowledge of the symmetries of the task at hand, this work focuses on discovering and characterizing unknown symmetries present in the dataset, namely, Lie group symmetry transformations beyond the traditional ones usually considered in the field (rotation, scaling, and translation). Specifically, we consider a scenario in which a dataset has been transformed by a one-parameter subgroup of transformations with different parameter values for each data point. Our goal is to characterize the transformation group and the distribution of the parameter values. The results showcase the effectiveness of the approach in both these settings.
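A worked example of the data-generating setting considered above: each sample is acted on by a one-parameter subgroup exp(tG) with its own parameter t. The snippet uses the SO(2) rotation generator purely for illustration; recovering the generator and the distribution of t from such data is what the paper's method addresses.

```python
# One-parameter Lie subgroup acting on data, one parameter value per sample.
import numpy as np
from scipy.linalg import expm

G = np.array([[0.0, -1.0],
              [1.0,  0.0]])                      # generator of SO(2): exp(t*G) is rotation by t

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 2))                    # original data points
t = rng.uniform(0.0, np.pi / 2, size=500)        # a different parameter for each point

x_transformed = np.stack([expm(ti * G) @ xi for ti, xi in zip(t, x)])

# sanity check: exp(t*G) equals the usual rotation matrix
R = expm(t[0] * G)
assert np.allclose(R, [[np.cos(t[0]), -np.sin(t[0])],
                       [np.sin(t[0]),  np.cos(t[0])]])
print(x_transformed.shape)
```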
methods: Proposes an Intelligent Annotation (IA) strategy with three modules: assisted data annotation, background model training, and active selection of the next datapoints. An IAdet tool specific to single-class object detection is developed, together with a method for automatically evaluating such human-in-the-loop systems.
results: On the PASCAL VOC dataset, the IAdet tool reduces database annotation time by 25% while providing a trained model for free. These results are obtained with a deliberately very simple IAdet design, so IAdet admits multiple easy improvements, paving the way for powerful human-in-the-loop object detection systems.
Abstract
This work proposes a strategy for training models while annotating data named Intelligent Annotation (IA). IA involves three modules: (1) assisted data annotation, (2) background model training, and (3) active selection of the next datapoints. Under this framework, we open-source the IAdet tool, which is specific for single-class object detection. Additionally, we devise a method for automatically evaluating such a human-in-the-loop system. For the PASCAL VOC dataset, the IAdet tool reduces the database annotation time by $25\%$ while providing a trained model for free. These results are obtained for a deliberately very simple IAdet design. As a consequence, IAdet is susceptible to multiple easy improvements, paving the way for powerful human-in-the-loop object detection systems.
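The three-module loop described above can be sketched as a small control-flow skeleton: assisted annotation, background retraining, and active selection of the next datapoint. The model, the human-correction step, and the uncertainty heuristic below are dummy stand-ins so the loop runs end to end; they are not IAdet's actual components.

```python
# Skeleton of an Intelligent Annotation loop with dummy stand-in components.
import random

random.seed(0)
unlabeled = list(range(100))          # ids of unlabeled images
labeled = {}                          # id -> annotation

def model_predict(image_id):          # stand-in: a box proposal with a confidence
    return {"box": [0, 0, 10, 10], "conf": random.random()}

def human_correct(image_id, proposal):  # stand-in for assisted annotation by a person
    return proposal["box"]

def retrain(labeled):                 # stand-in for background model training
    pass

def select_next(unlabeled):           # active selection: least-confident first
    return min(unlabeled, key=lambda i: model_predict(i)["conf"])

while unlabeled and len(labeled) < 10:
    image_id = select_next(unlabeled)             # module 3: active selection
    proposal = model_predict(image_id)            # module 1: assisted pre-annotation
    labeled[image_id] = human_correct(image_id, proposal)
    unlabeled.remove(image_id)
    retrain(labeled)                              # module 2: background training

print(f"annotated {len(labeled)} images with model assistance")
```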
Optimal and Efficient Binary Questioning for Human-in-the-Loop Annotation
results: The study shows that the proposed method yields a 23-86% improvement in annotation efficiency on several synthetic and real-world datasets.
Abstract
Even though data annotation is extremely important for interpretability, research and development of artificial intelligence solutions, most research efforts such as active learning or few-shot learning focus on the sample efficiency problem. This paper studies the neglected complementary problem of getting annotated data given a predictor. For the simple binary classification setting, we present the spectrum ranging from optimal general solutions to practical efficient methods. The problem is framed as the full annotation of a binary classification dataset with the minimal number of yes/no questions when a predictor is available. For the case of general binary questions the solution is found in coding theory, where the optimal questioning strategy is given by the Huffman encoding of the possible labelings. However, this approach is computationally intractable even for small dataset sizes. We propose an alternative practical solution based on several heuristics and lookahead minimization of proxy cost functions. The proposed solution is analysed, compared with optimal solutions and evaluated on several synthetic and real-world datasets. On these datasets, the method allows a significant improvement ($23-86\%$) in annotation efficiency.
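The coding-theory view mentioned above can be made concrete on a tiny example: given a predictor's per-sample probabilities, every complete labeling of the dataset has a probability, and Huffman coding those labelings gives the optimal general yes/no questioning strategy, whose expected number of questions equals the expected code length. The sketch below does this exhaustively for four samples with made-up probabilities, which is exactly the regime where the exhaustive approach is still tractable.

```python
# Expected number of yes/no questions under the optimal (Huffman) strategy.
import heapq
import itertools

p = [0.9, 0.8, 0.6, 0.55]                    # predictor's P(label = 1) for four samples

def labeling_probs(p):
    """Probability of every complete labeling under the predictor."""
    for labels in itertools.product([0, 1], repeat=len(p)):
        prob = 1.0
        for pi, yi in zip(p, labels):
            prob *= pi if yi == 1 else 1.0 - pi
        yield prob

def huffman_expected_length(probs):
    """Expected codeword length = expected number of yes/no questions."""
    heap = list(probs)
    heapq.heapify(heap)
    expected = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        expected += a + b                    # each merge adds one question for the leaves below it
        heapq.heappush(heap, a + b)
    return expected

probs = list(labeling_probs(p))
print("expected questions with Huffman strategy:", round(huffman_expected_length(probs), 3))
print("questions when asking one per sample    :", len(p))
```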
Multi-Task Learning to Enhance Generalizability of Neural Network Equalizers in Coherent Optical Systems
paper_authors: Sasipim Srivallapanondh, Pedro J. Freire, Ashraful Alam, Nelson Costa, Bernhard Spinnler, Antonio Napoli, Egor Sedov, Sergei K. Turitsyn, Jaroslaw E. Prilepsky
for: Improving the flexibility of NN-based equalizers in coherent optical systems.
methods: Uses multi-task learning to improve NN-based equalizers.
results: A single NN-based equalizer improves the Q-factor by up to 4 dB compared to CDC, without re-training, even under variations in launch power, symbol rate, or transmission distance.
Abstract
For the first time, multi-task learning is proposed to improve the flexibility of NN-based equalizers in coherent systems. A "single" NN-based equalizer improves Q-factor by up to 4 dB compared to CDC, without re-training, even with variations in launch power, symbol rate, or transmission distance.
Approximate information for efficient exploration-exploitation strategies
results: Empirical evaluation shows that AIM complies with the Lai-Robbins asymptotic bound and remains robust across a range of priors. Its expression is tunable, allowing problem-specific optimization in various settings.
Abstract
This paper addresses the exploration-exploitation dilemma inherent in decision-making, focusing on multi-armed bandit problems. The problems involve an agent deciding whether to exploit current knowledge for immediate gains or explore new avenues for potential long-term rewards. We here introduce a novel algorithm, approximate information maximization (AIM), which employs an analytical approximation of the entropy gradient to choose which arm to pull at each point in time. AIM matches the performance of Infomax and Thompson sampling while also offering enhanced computational speed, determinism, and tractability. Empirical evaluation of AIM indicates its compliance with the Lai-Robbins asymptotic bound and demonstrates its robustness for a range of priors. Its expression is tunable, which allows for specific optimization in various settings.
A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation
results: Experiments show that the method achieves state-of-the-art improvements in head motion dynamics quality and in multi-scale audio-visual synchrony, producing more natural head motion in the standard facial landmark domain as well as in the image domain.
Abstract
Animating still face images with deep generative models using a speech input signal is an active research topic and has seen important recent progress. However, much of the effort has been put into lip syncing and rendering quality while the generation of natural head motion, let alone the audio-visual correlation between head motion and speech, has often been neglected. In this work, we propose a multi-scale audio-visual synchrony loss and a multi-scale autoregressive GAN to better handle short and long-term correlation between speech and the dynamics of the head and lips. In particular, we train a stack of syncer models on multimodal input pyramids and use these models as guidance in a multi-scale generator network to produce audio-aligned motion unfolding over diverse time scales. Our generator operates in the facial landmark domain, which is a standard low-dimensional head representation. The experiments show significant improvements over the state of the art in head motion dynamics quality and in multi-scale audio-visual synchrony both in the landmark domain and in the image domain.
Secure Deep Learning-based Distributed Intelligence on Pocket-sized Drones
results: Compared to a state-of-the-art visual pose estimation network that runs entirely on the nano-drone, the distributed edge-fog execution scheme improves the $R^2$ score by +0.19. In case of attack, the approach detects it within 2 s with 95% probability.
Abstract
Palm-sized nano-drones are an appealing class of edge nodes, but their limited computational resources prevent running large deep-learning models onboard. Adopting an edge-fog computational paradigm, we can offload part of the computation to the fog; however, this poses security concerns if the fog node, or the communication link, can not be trusted. To tackle this concern, we propose a novel distributed edge-fog execution scheme that validates fog computation by redundantly executing a random subnetwork aboard our nano-drone. Compared to a State-of-the-Art visual pose estimation network that entirely runs onboard, a larger network executed in a distributed way improves the $R^2$ score by +0.19; in case of attack, our approach detects it within 2s with 95% probability.
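The validation idea described above (redundantly executing a random subnetwork onboard) can be illustrated with a toy check: the drone recomputes a randomly chosen slice of one layer and compares it with the activations the fog reports. The two-layer network, the perturbation model, and the tolerance below are illustrative assumptions, not the paper's visual pose estimation network or attack model.

```python
# Toy redundant-subnetwork check of an untrusted fog computation.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(64, 32)), rng.normal(size=32)   # weights known to both sides
W2, b2 = rng.normal(size=(32, 4)), rng.normal(size=4)

def fog_inference(x, tampered=False):
    h = np.maximum(x @ W1 + b1, 0.0)
    if tampered:
        h = h + rng.normal(0, 0.5, size=h.shape)           # adversarial perturbation
    return h, h @ W2 + b2                                   # reported activations + output

def onboard_check(x, reported_h, n_units=4, tol=1e-5):
    units = rng.choice(reported_h.shape[-1], size=n_units, replace=False)
    local = np.maximum(x @ W1[:, units] + b1[units], 0.0)   # redundant partial recompute
    return np.allclose(local, reported_h[..., units], atol=tol)

x = rng.normal(size=64)
for tampered in (False, True):
    h, y = fog_inference(x, tampered)
    print("tampered" if tampered else "honest", "-> check passed:", onboard_check(x, h))
```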
Multi-gauge Hydrological Variational Data Assimilation: Regionalization Learning with Spatial Gradients using Multilayer Perceptron and Bayesian-Guided Multivariate Regression
results: The study yields a reliable approach for estimating spatially distributed parameters of hydrological models that handles multi-gauge data and enables accurate hydrological prediction.
Abstract
Tackling the difficult problem of estimating spatially distributed hydrological parameters, especially for floods on ungauged watercourses, this contribution presents a novel seamless regionalization technique for learning complex regional transfer functions designed for high-resolution hydrological models. The transfer functions rely on: (i) a multilayer perceptron enabling a seamless flow of gradient computation to employ machine learning optimization algorithms, or (ii) a multivariate regression mapping optimized by variational data assimilation algorithms and guided by Bayesian estimation, addressing the equifinality issue of feasible solutions. The approach involves incorporating the inferable regionalization mappings into a differentiable hydrological model and optimizing a cost function computed on multi-gauge data with accurate adjoint-based spatially distributed gradients.
Scalable variable selection for two-view learning tasks with projection operators
results: The method is experimentally validated on synthetic and real data, demonstrating its scalability and the relevance of the selected features.
Abstract
In this paper we propose a novel variable selection method for two-view settings, or for vector-valued supervised learning problems. Our framework is able to handle extremely large scale selection tasks, where number of data samples could be even millions. In a nutshell, our method performs variable selection by iteratively selecting variables that are highly correlated with the output variables, but which are not correlated with the previously chosen variables. To measure the correlation, our method uses the concept of projection operators and their algebra. With the projection operators the relationship, correlation, between sets of input and output variables can also be expressed by kernel functions, thus nonlinear correlation models can be exploited as well. We experimentally validate our approach, showing on both synthetic and real data its scalability and the relevance of the selected features. Keywords: Supervised variable selection, vector-valued learning, projection-valued measure, reproducing kernel Hilbert space
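A compact numpy sketch of the selection principle described above: at each step, pick the input variable most correlated with the outputs once the contribution of the already selected variables has been projected out. This mirrors the projection-operator view only in the plain linear case; the paper's kernelized, large-scale formulation is not reproduced, and the synthetic data and scoring rule are assumptions.

```python
# Greedy variable selection via orthogonal projections (linear case only).
import numpy as np

def select_variables(X, Y, k):
    n = X.shape[0]
    P = np.zeros((n, n))                 # projector onto the span of selected columns
    selected = []
    for _ in range(k):
        R_X = X - P @ X                  # residuals after removing selected directions
        R_Y = Y - P @ Y
        scores = np.linalg.norm(R_X.T @ R_Y, axis=1) / (np.linalg.norm(R_X, axis=0) + 1e-12)
        scores[selected] = -np.inf
        j = int(np.argmax(scores))
        selected.append(j)
        B = X[:, selected]               # rebuild the projection operator
        P = B @ np.linalg.pinv(B)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
W = np.zeros((50, 3)); W[[3, 17, 41]] = rng.normal(size=(3, 3))
Y = X @ W + 0.1 * rng.normal(size=(200, 3))
print(select_variables(X, Y, 3))         # expected to recover columns 3, 17, 41
```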
Learning to reconstruct the bubble distribution with conductivity maps using Invertible Neural Networks and Error Diffusion
paper_authors: Nishant Kumar, Lukas Krause, Thomas Wondrak, Sven Eckert, Kerstin Eckert, Stefan Gumhold
for: Supporting eco-friendly hydrogen production by electrolysis.
methods: Measures bubble-induced magnetic field fluctuations with external magnetic sensors and reconstructs the conductivity field with Invertible Neural Networks (INNs).
results: Outperforms Tikhonov regularization and reconstructs the conductivity field with high accuracy.
Abstract
Electrolysis is crucial for eco-friendly hydrogen production, but gas bubbles generated during the process hinder reactions, reduce cell efficiency, and increase energy consumption. Additionally, these gas bubbles cause changes in the conductivity inside the cell, resulting in corresponding variations in the induced magnetic field around the cell. Therefore, measuring these gas bubble-induced magnetic field fluctuations using external magnetic sensors and solving the inverse problem of Biot-Savart Law allows for estimating the conductivity in the cell and, thus, bubble size and location. However, determining high-resolution conductivity maps from only a few induced magnetic field measurements is an ill-posed inverse problem. To overcome this, we exploit Invertible Neural Networks (INNs) to reconstruct the conductivity field. Our qualitative results and quantitative evaluation using random error diffusion show that INN achieves far superior performance compared to Tikhonov regularization.
Exploiting Richness of Learned Compressed Representation of Images for Semantic Segmentation
results: The proposed pipeline is experimentally validated on the Cityscapes dataset, achieving a compression factor of up to 66x while preserving the information required for segmentation and reducing the overall compute by 11%.
Abstract
Autonomous vehicles and Advanced Driving Assistance Systems (ADAS) have the potential to radically change the way we travel. Many such vehicles currently rely on segmentation and object detection algorithms to detect and track objects around its surrounding. The data collected from the vehicles are often sent to cloud servers to facilitate continual/life-long learning of these algorithms. Considering the bandwidth constraints, the data is compressed before sending it to servers, where it is typically decompressed for training and analysis. In this work, we propose the use of a learning-based compression Codec to reduce the overhead in latency incurred for the decompression operation in the standard pipeline. We demonstrate that the learned compressed representation can also be used to perform tasks like semantic segmentation in addition to decompression to obtain the images. We experimentally validate the proposed pipeline on the Cityscapes dataset, where we achieve a compression factor up to $66 \times$ while preserving the information required to perform segmentation with a dice coefficient of $0.84$ as compared to $0.88$ achieved using decompressed images while reducing the overall compute by $11\%$.
Deep Attention Q-Network for Personalized Treatment Recommendation
results: Compared with prior models, the proposed DAQN performs strongly on real-world sepsis and acute hypotension cohorts, demonstrating its superiority.
Abstract
Tailoring treatment for individual patients is crucial yet challenging in order to achieve optimal healthcare outcomes. Recent advances in reinforcement learning offer promising personalized treatment recommendations; however, they rely solely on current patient observations (vital signs, demographics) as the patient's state, which may not accurately represent the true health status of the patient. This limitation hampers policy learning and evaluation, ultimately limiting treatment effectiveness. In this study, we propose the Deep Attention Q-Network for personalized treatment recommendations, utilizing the Transformer architecture within a deep reinforcement learning framework to efficiently incorporate all past patient observations. We evaluated the model on real-world sepsis and acute hypotension cohorts, demonstrating its superiority to state-of-the-art models. The source code for our model is available at https://github.com/stevenmsm/RL-ICU-DAQN.
SelfFed: Self-supervised Federated Learning for Data Heterogeneity and Label Scarcity in IoMT
results: Experimental analysis on publicly available medical imaging datasets shows that the proposed SelfFed framework outperforms existing baselines under non-independent and identically distributed (non-IID) data and label scarcity, with maximum improvements of 8.8% and 4.1% on the Retina and COVID-FL datasets. The method also outperforms existing baselines even when trained on only 10% labeled instances.
Abstract
Self-supervised learning in federated learning paradigm has been gaining a lot of interest both in industry and research due to the collaborative learning capability on unlabeled yet isolated data. However, self-supervised based federated learning strategies suffer from performance degradation due to label scarcity and diverse data distributions, i.e., data heterogeneity. In this paper, we propose the SelfFed framework for Internet of Medical Things (IoMT). Our proposed SelfFed framework works in two phases. The first phase is the pre-training paradigm that performs augmentive modeling using Swin Transformer based encoder in a decentralized manner. The first phase of SelfFed framework helps to overcome the data heterogeneity issue. The second phase is the fine-tuning paradigm that introduces contrastive network and a novel aggregation strategy that is trained on limited labeled data for a target task in a decentralized manner. This fine-tuning stage overcomes the label scarcity problem. We perform our experimental analysis on publicly available medical imaging datasets and show that our proposed SelfFed framework performs better when compared to existing baselines concerning non-independent and identically distributed (IID) data and label scarcity. Our method achieves a maximum improvement of 8.8% and 4.1% on Retina and COVID-FL datasets on non-IID dataset. Further, our proposed method outperforms existing baselines even when trained on a few (10%) labeled instances.
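For reference, the snippet below shows a plain FedAvg-style aggregation of client model weights, the baseline mechanism that a decentralized scheme like SelfFed builds on. SelfFed's novel aggregation strategy, contrastive network, and Swin Transformer encoder are not shown; the tiny linear model and client weighting are illustrative only.

```python
# Plain FedAvg-style aggregation of client state_dicts, weighted by local dataset size.
import torch
import torch.nn as nn

def federated_average(client_states, client_sizes):
    total = float(sum(client_sizes))
    avg = {}
    for key in client_states[0]:
        avg[key] = sum(state[key].float() * (n / total)
                       for state, n in zip(client_states, client_sizes))
    return avg

clients = [nn.Linear(16, 4) for _ in range(3)]          # stand-ins for local models
sizes = [120, 80, 200]                                  # local dataset sizes
global_state = federated_average([c.state_dict() for c in clients], sizes)

global_model = nn.Linear(16, 4)
global_model.load_state_dict(global_state)
print({k: tuple(v.shape) for k, v in global_state.items()})
```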
Relation-aware graph structure embedding with co-contrastive learning for drug-drug interaction prediction
paper_authors: Mengying Jiang, Guizhong Liu, Biao Zhao, Yuanchao Su, Weiqiang Jin
for: Predicting multi-relational drug-drug interactions (DDIs).
methods: Uses relation-aware graph structure embedding (RaGSE) with co-contrastive learning.
results: Outperforms previous state-of-the-art methods on three tasks, yielding better prediction results.
Abstract
Relation-aware graph structure embedding is promising for predicting multi-relational drug-drug interactions (DDIs). Typically, most existing methods begin by constructing a multi-relational DDI graph and then learning relation-aware graph structure embeddings (RaGSEs) of drugs from the DDI graph. Nevertheless, most existing approaches are usually limited in learning RaGSEs of new drugs, leading to serious over-fitting when the test DDIs involve such drugs. To alleviate this issue, we propose a novel DDI prediction method based on relation-aware graph structure embedding with co-contrastive learning, RaGSECo. The proposed RaGSECo constructs two heterogeneous drug graphs: a multi-relational DDI graph and a multi-attribute drug-drug similarity (DDS) graph. The two graphs are used respectively for learning and propagating the RaGSEs of drugs, aiming to ensure all drugs, including new ones, can possess effective RaGSEs. Additionally, we present a novel co-contrastive learning module to learn drug-pairs (DPs) representations. This mechanism learns DP representations from two distinct views (interaction and similarity views) and encourages these views to supervise each other collaboratively to obtain more discriminative DP representations. We evaluate the effectiveness of our RaGSECo on three different tasks using two real datasets. The experimental results demonstrate that RaGSECo outperforms existing state-of-the-art prediction methods.
All in One: Multi-task Prompting for Graph Neural Networks
results: Extensive experiments show that the method achieves superior performance across different graph tasks.
Abstract
Recently, ''pre-training and fine-tuning'' has been adopted as a standard workflow for many graph tasks since it can take general graph knowledge to relieve the lack of graph annotations from each application. However, graph tasks with node level, edge level, and graph level are far diversified, making the pre-training pretext often incompatible with these multiple tasks. This gap may even cause a ''negative transfer'' to the specific application, leading to poor results. Inspired by the prompt learning in natural language processing (NLP), which has presented significant effectiveness in leveraging prior knowledge for various NLP tasks, we study the prompting topic for graphs with the motivation of filling the gap between pre-trained models and various graph tasks. In this paper, we propose a novel multi-task prompting method for graph models. Specifically, we first unify the format of graph prompts and language prompts with the prompt token, token structure, and inserting pattern. In this way, the prompting idea from NLP can be seamlessly introduced to the graph area. Then, to further narrow the gap between various graph tasks and state-of-the-art pre-training strategies, we further study the task space of various graph applications and reformulate downstream problems to the graph-level task. Afterward, we introduce meta-learning to efficiently learn a better initialization for the multi-task prompt of graphs so that our prompting framework can be more reliable and general for different tasks. We conduct extensive experiments, results from which demonstrate the superiority of our method.
Accelerated stochastic approximation with state-dependent noise
paper_authors: Sasila Ilandarideva, Anatoli Juditsky, Guanghui Lan, Tianjiao Li
for: solves a class of stochastic smooth convex optimization problems with general noise assumptions.
methods: uses two non-Euclidean accelerated stochastic approximation routines: stochastic accelerated gradient descent (SAGD) and stochastic gradient extrapolation (SGE).
results: achieves the optimal convergence rate and attains the optimal iteration and sample complexities simultaneously, with more general assumptions for SGE that allow for efficient application to statistical estimation problems under heavy tail noises and discontinuous score functions.
Abstract
We consider a class of stochastic smooth convex optimization problems under rather general assumptions on the noise in the stochastic gradient observation. As opposed to the classical problem setting in which the variance of noise is assumed to be uniformly bounded, herein we assume that the variance of stochastic gradients is related to the "sub-optimality" of the approximate solutions delivered by the algorithm. Such problems naturally arise in a variety of applications, in particular, in the well-known generalized linear regression problem in statistics. However, to the best of our knowledge, none of the existing stochastic approximation algorithms for solving this class of problems attain optimality in terms of the dependence on accuracy, problem parameters, and mini-batch size. We discuss two non-Euclidean accelerated stochastic approximation routines--stochastic accelerated gradient descent (SAGD) and stochastic gradient extrapolation (SGE)--which carry a particular duality relationship. We show that both SAGD and SGE, under appropriate conditions, achieve the optimal convergence rate, attaining the optimal iteration and sample complexities simultaneously. However, corresponding assumptions for the SGE algorithm are more general; they allow, for instance, for efficient application of the SGE to statistical estimation problems under heavy tail noises and discontinuous score functions. We also discuss the application of the SGE to problems satisfying quadratic growth conditions, and show how it can be used to recover sparse solutions. Finally, we report on some simulation experiments to illustrate numerical performance of our proposed algorithms in high-dimensional settings.
Review of Deep Learning-based Malware Detection for Android and Windows System
results: Experimental results show that the AI-enabled techniques achieve perfect detection accuracy across different malware families.
Abstract
Differentiating malware is important for determining their behaviors and threat levels, as well as for devising defensive strategies against them. In response, various anti-malware systems have been developed to distinguish between different malware families. However, most recent malware families are Artificial Intelligence (AI) enabled and can deceive traditional anti-malware systems using different obfuscation techniques. Therefore, only AI-enabled anti-malware systems are robust against these techniques and can detect the features in malware files that aid malicious activities. In this study we review two AI-enabled techniques for detecting malware on the Windows and Android operating systems, respectively. Both techniques achieved perfect accuracy in detecting various malware families.
FREEDOM: Target Label & Source Data & Domain Information-Free Multi-Source Domain Adaptation for Unsupervised Personalization
paper_authors: Eunju Yang, Gyusang Cho, Chan-Hyun Youn
for: Solving the multi-source domain adaptation (MSDA) problem, in particular adapting a deployed model without target labels and without access to the source data or source domain information.
methods: Formulates a new problem scenario, Three-Free Domain Adaptation (TFDA), in which target labels, the source dataset, and source domain information (domain labels and the number of domains) are all unavailable. Proposes a practical adaptation framework, FREEDOM, which uses a generative model to disentangle data into class and style aspects, where style is defined as the class-independent information from the source data and modeled with a nonparametric Bayesian approach. During adaptation, FREEDOM matches the source class distribution to the target's and then deploys only part of the classification model as a personalized network.
results: Achieves state-of-the-art or comparable performance without domain information, while reducing the final model size on the target side, independent of the number of source domains.
Abstract
From a service perspective, Multi-Source Domain Adaptation (MSDA) is a promising scenario to adapt a deployed model to a client's dataset. It can provide adaptation without a target label and support the case where a source dataset is constructed from multiple domains. However, it is impractical, wherein its training heavily relies on prior domain information of the multi-source dataset -- how many domains exist and the domain label of each data sample. Moreover, MSDA requires both source and target datasets simultaneously (physically), causing storage limitations on the client device or data privacy issues by transferring client data to a server. For a more practical scenario of model adaptation from a service provider's point of view, we relax these constraints and present a novel problem scenario of Three-Free Domain Adaptation, namely TFDA, where 1) target labels, 2) source dataset, and mostly 3) source domain information (domain labels + the number of domains) are unavailable. Under the problem scenario, we propose a practical adaptation framework called FREEDOM. It leverages the power of the generative model, disentangling data into class and style aspects, where the style is defined as the class-independent information from the source data and designed with a nonparametric Bayesian approach. In the adaptation stage, FREEDOM aims to match the source class distribution with the target's under the philosophy that class distribution is consistent even if the style is different; after then, only part of the classification model is deployed as a personalized network. As a result, FREEDOM achieves state-of-the-art or comparable performance even without domain information, with reduced final model size on the target side, independent of the number of source domains.
Nexus sine qua non: Essentially Connected Networks for Traffic Forecasting
methods: Observes that existing spatiotemporal graph neural networks (STGNNs) rely on intricate, high-complexity techniques to capture the structure of traffic data, making them hard to understand and scale, and therefore seeks a simple yet efficient architecture. Identifying certain forms of spatiotemporal contextualization as the core of STGNN representations, the authors design Nexus sine qua non (NexuSQN), an essentially connected network built on an efficient message-passing backbone.
results: Despite its simple structure, and without intricate components such as RNNs, Transformers, or diffusion convolutions, NexuSQN outperforms intricately designed benchmarks in size, computational efficiency, and accuracy, suggesting a promising future for developing simple yet efficient neural predictors.
Abstract
Spatiotemporal graph neural networks (STGNNs) have emerged as a leading approach for learning representations and forecasting on traffic datasets with underlying topological and correlational structures. However, current STGNNs use intricate techniques with high complexities to capture these structures, making them difficult to understand and scale. The existence of simple yet efficient architectures remains an open question. Upon closer examination, we find what lies at the core of STGNN's representations are certain forms of spatiotemporal contextualization. In light of this, we design Nexus sine qua non (NexuSQN), an essentially connected network built on an efficient message-passing backbone. NexuSQN simply uses learnable "where" and "when" locators for the aforementioned contextualization and omits any intricate components such as RNNs, Transformers, and diffusion convolutions. Results show that NexuSQN outperforms intricately designed benchmarks in terms of size, computational efficiency, and accuracy. This suggests a promising future for developing simple yet efficient neural predictors.
Beyond Conservatism: Diffusion Policies in Offline Multi-agent Reinforcement Learning
methods: Integrates a diffusion model into the policy network and proposes a trajectory-based data-augmentation scheme. These key ingredients make the algorithm more robust to environment changes and yield significant improvements in performance, generalization, and data efficiency.
results: Experimental results show that DOM2 outperforms existing state-of-the-art methods in multi-agent particle and multi-agent MuJoCo environments, and generalizes better in shifted environments. DOM2 also exhibits superior data efficiency, achieving state-of-the-art performance with 20+ times less data than existing algorithms.
Abstract
We present a novel Diffusion Offline Multi-agent Model (DOM2) for offline Multi-Agent Reinforcement Learning (MARL). Different from existing algorithms that rely mainly on conservatism in policy design, DOM2 enhances policy expressiveness and diversity based on diffusion. Specifically, we incorporate a diffusion model into the policy network and propose a trajectory-based data-augmentation scheme in training. These key ingredients make our algorithm more robust to environment changes and achieve significant improvements in performance, generalization and data-efficiency. Our extensive experimental results demonstrate that DOM2 outperforms existing state-of-the-art methods in multi-agent particle and multi-agent MuJoCo environments, and generalizes significantly better in shifted environments thanks to its high expressiveness and diversity. Furthermore, DOM2 shows superior data efficiency and can achieve state-of-the-art performance with $20+$ times less data compared to existing algorithms.
A Review of Driver Gaze Estimation and Application in Gaze Behavior Understanding
results: The estimated driver gaze is used to understand gaze behavior while maneuvering through intersections, on-ramps, off-ramps, and lane changes, and to determine the effect of roadside advertising structures. The review also discusses the limitations of the existing literature, open challenges, and the future scope of driver gaze estimation.
Abstract
Driver gaze plays an important role in different gaze-based applications such as driver attentiveness detection, visual distraction detection, gaze behavior understanding, and building driver assistance system. The main objective of this study is to perform a comprehensive summary of driver gaze fundamentals, methods to estimate driver gaze, and it's applications in real world driving scenarios. We first discuss the fundamentals related to driver gaze, involving head-mounted and remote setup based gaze estimation and the terminologies used for each of these data collection methods. Next, we list out the existing benchmark driver gaze datasets, highlighting the collection methodology and the equipment used for such data collection. This is followed by a discussion of the algorithms used for driver gaze estimation, which primarily involves traditional machine learning and deep learning based techniques. The estimated driver gaze is then used for understanding gaze behavior while maneuvering through intersections, on-ramps, off-ramps, lane changing, and determining the effect of roadside advertising structures. Finally, we have discussed the limitations in the existing literature, challenges, and the future scope in driver gaze estimation and gaze-based applications.
methods: Introduces the basic concepts of causality and reinforcement learning, then categorizes and systematically reviews existing causal reinforcement learning approaches based on their target problems and methodologies.
results: Reviews the current literature on causal reinforcement learning and outlines several open issues and future directions in this emerging field.
Abstract
Reinforcement learning is an essential paradigm for solving sequential decision problems under uncertainty. Despite many remarkable achievements in recent decades, applying reinforcement learning methods in the real world remains challenging. One of the main obstacles is that reinforcement learning agents lack a fundamental understanding of the world and must therefore learn from scratch through numerous trial-and-error interactions. They may also face challenges in providing explanations for their decisions and generalizing the acquired knowledge. Causality, however, offers a notable advantage as it can formalize knowledge in a systematic manner and leverage invariance for effective knowledge transfer. This has led to the emergence of causal reinforcement learning, a subfield of reinforcement learning that seeks to enhance existing algorithms by incorporating causal relationships into the learning process. In this survey, we comprehensively review the literature on causal reinforcement learning. We first introduce the basic concepts of causality and reinforcement learning, and then explain how causality can address core challenges in non-causal reinforcement learning. We categorize and systematically review existing causal reinforcement learning approaches based on their target problems and methodologies. Finally, we outline open issues and future directions in this emerging field.
A Double Machine Learning Approach to Combining Experimental and Observational Data
results: The study demonstrates its applicability in three real-world case studies and highlights the importance of accurately identifying the violated assumption for consistent treatment effect estimation.
Abstract
Experimental and observational studies often lack validity due to untestable assumptions. We propose a double machine learning approach to combine experimental and observational studies, allowing practitioners to test for assumption violations and estimate treatment effects consistently. Our framework tests for violations of external validity and ignorability under milder assumptions. When only one assumption is violated, we provide semi-parametrically efficient treatment effect estimators. However, our no-free-lunch theorem highlights the necessity of accurately identifying the violated assumption for consistent treatment effect estimation. We demonstrate the applicability of our approach in three real-world case studies, highlighting its relevance for practical settings.
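As background for the approach above, the sketch below implements the standard double/debiased ML building block: an AIPW estimator of the average treatment effect with cross-fitted nuisance models. It is not the paper's combined experimental-observational estimator or its assumption tests; the data-generating process and the random-forest learners are illustrative assumptions.

```python
# Cross-fitted AIPW (double ML) estimate of the average treatment effect on toy data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))
propensity = 1 / (1 + np.exp(-X[:, 0]))
D = rng.binomial(1, propensity)                    # treatment assignment
Y = 2.0 * D + X[:, 1] + rng.normal(size=n)         # true ATE = 2.0

scores = np.zeros(n)
for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
    e = RandomForestClassifier(random_state=0).fit(X[train], D[train])
    m1 = RandomForestRegressor(random_state=0).fit(X[train][D[train] == 1], Y[train][D[train] == 1])
    m0 = RandomForestRegressor(random_state=0).fit(X[train][D[train] == 0], Y[train][D[train] == 0])
    ps = np.clip(e.predict_proba(X[test])[:, 1], 0.01, 0.99)
    mu1, mu0 = m1.predict(X[test]), m0.predict(X[test])
    scores[test] = (mu1 - mu0
                    + D[test] * (Y[test] - mu1) / ps
                    - (1 - D[test]) * (Y[test] - mu0) / (1 - ps))

print("AIPW ATE estimate:", round(scores.mean(), 3))   # should be close to 2.0
```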
On Conditional and Compositional Language Model Differentiable Prompting
paper_authors: Jonathan Pilault, Can Liu, Mohit Bansal, Markus Dreyer
for: Adapting a frozen pretrained language model (PLM) to perform well on downstream tasks.
methods: Studies conditional and compositional differentiable prompting and proposes a new model, the Prompt Production System (PRopS), which transforms task instructions or input metadata into fine-grained continuous prompts that elicit task-specific outputs from the PLM. PRopS uses a modular network structure based on a neural formulation of Production Systems, letting it learn discrete rules that specialize in transforming particular prompt input patterns, making it suitable for compositional transfer learning and few-shot learning.
results: PRopS consistently surpasses other PLM adaptation techniques on compositional generalization tasks, controllable summarization, and multilingual translation, and often improves upon fully fine-tuned models, while needing fewer trainable parameters.
Abstract
Prompts have been shown to be an effective method to adapt a frozen Pretrained Language Model (PLM) to perform well on downstream tasks. Prompts can be represented by a human-engineered word sequence or by a learned continuous embedding. In this work, we investigate conditional and compositional differentiable prompting. We propose a new model, Prompt Production System (PRopS), which learns to transform task instructions or input metadata, into continuous prompts that elicit task-specific outputs from the PLM. Our model uses a modular network structure based on our neural formulation of Production Systems, which allows the model to learn discrete rules -- neural functions that learn to specialize in transforming particular prompt input patterns, making it suitable for compositional transfer learning and few-shot learning. We present extensive empirical and theoretical analysis and show that PRopS consistently surpasses other PLM adaptation techniques, and often improves upon fully fine-tuned models, on compositional generalization tasks, controllable summarization and multilingual translation, while needing fewer trainable parameters.
Human Emotion Recognition Based On Galvanic Skin Response signal Feature Selection and SVM
results: Experimental results show that the proposed method achieves good human emotion recognition, with a recognition accuracy above 66.67%.
Abstract
A novel human emotion recognition method based on automatically selected Galvanic Skin Response (GSR) signal features and SVM is proposed in this paper. GSR signals were acquired by e-Health Sensor Platform V2.0. Then, the data is de-noised by wavelet function and normalized to get rid of the individual difference. 30 features are extracted from the normalized data, however, directly using of these features will lead to a low recognition rate. In order to gain the optimized features, a covariance based feature selection is employed in our method. Finally, a SVM with input of the optimized features is utilized to achieve the human emotion recognition. The experimental results indicate that the proposed method leads to good human emotion recognition, and the recognition accuracy is more than 66.67%.
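The pipeline described above (normalization, feature selection, SVM classification) maps naturally onto a few lines of scikit-learn. In the sketch below, a univariate selector stands in for the paper's covariance-based feature selection, and synthetic features replace real GSR recordings; both are assumptions for illustration only.

```python
# Feature selection + SVM emotion classifier on synthetic stand-in GSR features.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 30))                 # 30 features extracted from GSR signals
y = rng.integers(0, 3, size=120)               # 3 emotion classes (toy labels)
X[:, :5] += y[:, None] * 0.8                   # make a few features informative

clf = make_pipeline(StandardScaler(),
                    SelectKBest(f_classif, k=10),   # stand-in for covariance-based selection
                    SVC(kernel="rbf", C=1.0))
print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean().round(3))
```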
TablEye: Seeing small Tables through the Lens of Images
results: TablEye outperforms TabLLM on a 4-shot task by up to 0.11 AUC and outperforms STUNT in a 1-shot setting, leading by 3.17% accuracy on average, which demonstrates its superior performance on few-shot tabular data.
Abstract
The exploration of few-shot tabular learning becomes imperative. Tabular data is a versatile representation that captures diverse information, yet it is not exempt from limitations, property of data and model size. Labeling extensive tabular data can be challenging, and it may not be feasible to capture every important feature. Few-shot tabular learning, however, remains relatively unexplored, primarily due to scarcity of shared information among independent datasets and the inherent ambiguity in defining boundaries within tabular data. To the best of our knowledge, no meaningful and unrestricted few-shot tabular learning techniques have been developed without imposing constraints on the dataset. In this paper, we propose an innovative framework called TablEye, which aims to overcome the limit of forming prior knowledge for tabular data by adopting domain transformation. It facilitates domain transformation by generating tabular images, which effectively conserve the intrinsic semantics of the original tabular data. This approach harnesses rigorously tested few-shot learning algorithms and embedding functions to acquire and apply prior knowledge. Leveraging shared data domains allows us to utilize this prior knowledge, originally learned from the image domain. Specifically, TablEye demonstrated a superior performance by outstripping the TabLLM in a 4-shot task with a maximum 0.11 AUC and a STUNT in a 1- shot setting, where it led on average by 3.17% accuracy.
Learning to Branch in Combinatorial Optimization with Graph Pointer Networks
results: Experiments show that the proposed model effectively maps the solver state to branching variable decisions and significantly outperforms the classic strong-branching expert rule on a range of benchmark problems, while also surpassing state-of-the-art machine-learning-based branch-and-bound methods.
Abstract
Branch-and-bound is a typical way to solve combinatorial optimization problems. This paper proposes a graph pointer network model for learning the variable selection policy in the branch-and-bound. We extract the graph features, global features and historical features to represent the solver state. The proposed model, which combines the graph neural network and the pointer mechanism, can effectively map from the solver state to the branching variable decisions. The model is trained to imitate the classic strong branching expert rule by a designed top-k Kullback-Leibler divergence loss function. Experiments on a series of benchmark problems demonstrate that the proposed approach significantly outperforms the widely used expert-designed branching rules. Our approach also outperforms the state-of-the-art machine-learning-based branch-and-bound methods in terms of solving speed and search tree size on all the test instances. In addition, the model can generalize to unseen instances and scale to larger instances.
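A possible form of the top-k Kullback-Leibler imitation loss mentioned above is sketched below: the policy's softmax over candidate variables is pushed toward a target distribution built from the expert's strong-branching scores restricted to its top-k candidates (with light smoothing to keep the target strictly positive). The exact loss in the paper may differ; k, the smoothing, and the toy scores are assumptions.

```python
# Top-k KL imitation loss for learning a branching policy from expert scores.
import torch
import torch.nn.functional as F

def topk_kl_loss(policy_logits, expert_scores, k=5, smooth=1e-3):
    n = expert_scores.size(-1)
    topk = expert_scores.topk(k, dim=-1)
    target = torch.full_like(expert_scores, smooth / n)        # light smoothing avoids zeros
    target.scatter_(-1, topk.indices, F.softmax(topk.values, dim=-1))
    target = target / target.sum(dim=-1, keepdim=True)
    return F.kl_div(F.log_softmax(policy_logits, dim=-1), target, reduction="batchmean")

policy_logits = torch.randn(4, 30, requires_grad=True)   # scores over 30 candidate variables
expert_scores = torch.randn(4, 30)                        # strong-branching scores (toy values)
loss = topk_kl_loss(policy_logits, expert_scores)
loss.backward()
print(float(loss))
```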
SleepEGAN: A GAN-enhanced Ensemble Deep Learning Model for Imbalanced Classification of Sleep Stages
paper_authors: Xuewei Cheng, Ke Huang, Yi Zou, Shujie Ma
for: Automatic sleep stage classification.
methods: A GAN-powered ensemble deep learning model (SleepEGAN) with data augmentation.
results: Improved classification accuracy compared to existing state-of-the-art methods on three public sleep datasets.
Abstract
Deep neural networks have played an important role in automatic sleep stage classification because of their strong representation and in-model feature transformation abilities. However, class imbalance and individual heterogeneity which typically exist in raw EEG signals of sleep data can significantly affect the classification performance of any machine learning algorithms. To solve these two problems, this paper develops a generative adversarial network (GAN)-powered ensemble deep learning model, named SleepEGAN, for the imbalanced classification of sleep stages. To alleviate class imbalance, we propose a new GAN (called EGAN) architecture adapted to the features of EEG signals for data augmentation. The generated samples for the minority classes are used in the training process. In addition, we design a cost-free ensemble learning strategy to reduce the model estimation variance caused by the heterogeneity between the validation and test sets, so as to enhance the accuracy and robustness of prediction performance. We show that the proposed method can improve classification accuracy compared to several existing state-of-the-art methods using three public sleep datasets.
Smart filter aided domain adversarial neural network: An unsupervised domain adaptation method for fault diagnosis in noisy industrial scenarios
results: The method is validated on two fault diagnosis cases: bearing fault diagnosis in noisy environments, and slab track fault diagnosis in a train-track-bridge coupling vibration system, where the transfer task involves transferring from numerical simulations to field measurements. Compared with other representative UDA methods, SFDANN exhibits superior performance and remarkable stability.
Abstract
The application of unsupervised domain adaptation (UDA)-based fault diagnosis methods has shown significant efficacy in industrial settings, facilitating the transfer of operational experience and fault signatures between different operating conditions, different units of a fleet or between simulated and real data. However, in real industrial scenarios, unknown levels and types of noise can amplify the difficulty of domain alignment, thus severely affecting the diagnostic performance of deep learning models. To address this issue, we propose an UDA method called Smart Filter-Aided Domain Adversarial Neural Network (SFDANN) for fault diagnosis in noisy industrial scenarios. The proposed methodology comprises two steps. In the first step, we develop a smart filter that dynamically enforces similarity between the source and target domain data in the time-frequency domain. This is achieved by combining a learnable wavelet packet transform network (LWPT) and a traditional wavelet packet transform module. In the second step, we input the data reconstructed by the smart filter into a domain adversarial neural network (DANN). To learn domain-invariant and discriminative features, the learnable modules of SFDANN are trained in a unified manner with three objectives: time-frequency feature proximity, domain alignment, and fault classification. We validate the effectiveness of the proposed SFDANN method based on two fault diagnosis cases: one involving fault diagnosis of bearings in noisy environments and another involving fault diagnosis of slab tracks in a train-track-bridge coupling vibration system, where the transfer task involves transferring from numerical simulations to field measurements. Results show that compared to other representative state of the art UDA methods, SFDANN exhibits superior performance and remarkable stability.
摘要
基于无监督领域自适应(UDA)的故障诊断方法在工业场景中已显示出显著成效,有助于在不同工况、机群中的不同设备、以及仿真数据与真实数据之间迁移运行经验和故障特征。然而,在真实工业场景中,未知水平和类型的噪声会加大领域对齐的难度,从而严重影响深度学习模型的诊断性能。为解决这一问题,我们提出了一种名为智能滤波辅助领域对抗神经网络(SFDANN)的 UDA 方法,用于噪声工业场景下的故障诊断。该方法包括两个步骤:第一步,我们开发了一种智能滤波器,通过组合可学习小波包变换网络(LWPT)与传统小波包变换模块,在时频域动态地使源域与目标域数据趋于相似;第二步,我们将智能滤波器重构后的数据输入领域对抗神经网络(DANN)。为学习领域不变且具判别性的特征,SFDANN 的可学习模块以时频特征相近、领域对齐和故障分类三个目标进行统一训练。我们基于两个故障诊断案例验证了所提 SFDANN 方法的有效性:一个是噪声环境下的轴承故障诊断,另一个是车-轨-桥耦合振动系统中轨道板的故障诊断,其迁移任务是从数值仿真迁移到现场实测数据。结果表明,与其他有代表性的最新 UDA 方法相比,SFDANN 表现出更优的性能和出色的稳定性。
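The domain-adversarial part of this kind of pipeline can be sketched as below with a gradient-reversal layer and the joint training signal; the learnable wavelet-packet filter front end is omitted, and all module names and sizes are illustrative assumptions rather than the paper's SFDANN.

```python
# Sketch of the domain-adversarial component only (gradient reversal + joint loss);
# the smart wavelet-packet filter front end is not modelled here.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

feat = nn.Sequential(nn.Linear(1024, 256), nn.ReLU())                  # feature extractor
clf = nn.Linear(256, 4)                                                # fault classifier
dom = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 2))   # domain discriminator

x_src, y_src = torch.randn(32, 1024), torch.randint(0, 4, (32,))       # labelled source batch
x_tgt = torch.randn(32, 1024)                                          # unlabelled target batch
ce = nn.CrossEntropyLoss()

f_src, f_tgt = feat(x_src), feat(x_tgt)
loss_cls = ce(clf(f_src), y_src)                                       # fault classification (source only)
f_all = torch.cat([f_src, f_tgt])
d_lab = torch.cat([torch.zeros(32, dtype=torch.long), torch.ones(32, dtype=torch.long)])
loss_dom = ce(dom(GradReverse.apply(f_all, 1.0)), d_lab)               # domain alignment via reversal
loss = loss_cls + loss_dom      # the paper adds a time-frequency proximity term as a third objective
loss.backward()
```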
Generative Flow Networks: a Markov Chain Perspective
results: 论文提出了一种新的框架,可以在不同的状态空间下视为一种回归Markov链,并且可以通过对GFlowNets进行抽象来看到它们与MCMC方法之间的相似性。Abstract
While Markov chain Monte Carlo methods (MCMC) provide a general framework to sample from a probability distribution defined up to normalization, they often suffer from slow convergence to the target distribution when the latter is highly multi-modal. Recently, Generative Flow Networks (GFlowNets) have been proposed as an alternative framework to mitigate this issue when samples have a clear compositional structure, by treating sampling as a sequential decision making problem. Although they were initially introduced from the perspective of flow networks, the recent advances of GFlowNets draw more and more inspiration from the Markov chain literature, bypassing completely the need for flows. In this paper, we formalize this connection and offer a new perspective for GFlowNets using Markov chains, showing a unifying view for GFlowNets regardless of the nature of the state space as recurrent Markov chains. Positioning GFlowNets under the same theoretical framework as MCMC methods also allows us to identify the similarities between both frameworks, and most importantly to highlight their differences.
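For background on the objects involved, the detailed-balance condition commonly used to train GFlowNets can be written as follows (a standard identity from the GFlowNet literature, not this paper's specific contribution): here F denotes state flows, P_F and P_B the forward and backward policies, and R the reward at terminal states.

```latex
% Standard GFlowNet detailed-balance condition over edges s -> s' of the state graph:
\begin{equation*}
  F(s)\, P_F(s' \mid s) \;=\; F(s')\, P_B(s \mid s'),
  \qquad F(x) = R(x) \quad \text{for terminal states } x .
\end{equation*}
```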
Free energy of Bayesian Convolutional Neural Network with Skip Connection
results: 研究发现,Bayesian CNN中skip connection的upper bound of free energy不依赖于过参数,并且Bayesian CNN的一般化错误有类似的性能。Abstract
Since the success of the Residual Network (ResNet), many architectures of Convolutional Neural Networks (CNNs) have adopted skip connections. While the generalization performance of CNNs with skip connections has been explained within the framework of ensemble learning, the dependency on the number of parameters has not been revealed. In this paper, we derive the Bayesian free energy of Convolutional Neural Networks both with and without skip connections in Bayesian learning. The upper bound of the free energy of a Bayesian CNN with skip connections does not depend on the overparametrization, and the generalization error of the Bayesian CNN has a similar property.
摘要
自 Residual Network(ResNet)成功以来,许多卷积神经网络(CNN)的架构都采用了跳跃连接。虽然带跳跃连接的 CNN 的泛化性能已在集成学习的框架内得到解释,但其对参数数量的依赖关系尚未被揭示。在这篇论文中,我们推导了带与不带跳跃连接的卷积神经网络在贝叶斯学习中的自由能。带跳跃连接的贝叶斯 CNN 的自由能上界不依赖于过参数化,其泛化误差也具有类似的性质。
Analyzing the vulnerabilities in SplitFed Learning: Assessing the robustness against Data Poisoning Attacks
results: 研究发现,无目标和距离基于攻击在SFL中有更大的影响,比targeted攻击更容易让分类器输出错误。研究还通过对两个案例研究(electrocardiogram signal classification和自动手写数字识别)进行了多个攻击实验,并分析了攻击的影响。Abstract
Distributed Collaborative Machine Learning (DCML) is a potential alternative to address the privacy concerns associated with centralized machine learning. The Split learning (SL) and Federated Learning (FL) are the two effective learning approaches in DCML. Recently there have been an increased interest on the hybrid of FL and SL known as the SplitFed Learning (SFL). This research is the earliest attempt to study, analyze and present the impact of data poisoning attacks in SFL. We propose three kinds of novel attack strategies namely untargeted, targeted and distance-based attacks for SFL. All the attacks strategies aim to degrade the performance of the DCML-based classifier. We test the proposed attack strategies for two different case studies on Electrocardiogram signal classification and automatic handwritten digit recognition. A series of attack experiments were conducted by varying the percentage of malicious clients and the choice of the model split layer between the clients and the server. The results after the comprehensive analysis of attack strategies clearly convey that untargeted and distance-based poisoning attacks have greater impacts in evading the classifier outcomes compared to targeted attacks in SFL
摘要
分布式协作机器学习(DCML)是解决集中式机器学习隐私问题的一种潜在替代方案。分割学习(SL)和联邦学习(FL)是 DCML 中两种有效的学习方法。最近,二者的混合形式,即 SplitFed Learning(SFL),受到越来越多的关注。本研究是最早研究、分析并展示 SFL 中数据投毒攻击影响的工作。我们提出了三种新的攻击策略,即无目标攻击、有目标攻击和基于距离的攻击,这些攻击策略都旨在降低基于 DCML 的分类器性能。我们在心电图信号分类和手写数字自动识别两个案例研究上测试了所提出的攻击策略,并通过改变恶意客户端的比例以及客户端与服务器之间模型切分层的位置,开展了一系列攻击实验。综合分析结果清楚地表明,在 SFL 中,无目标攻击和基于距离的投毒攻击比有目标攻击更容易使分类器输出出错。
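A minimal sketch of what an untargeted poisoning client might do to its local labels is given below; the abstract does not specify the paper's exact attack mechanics, so this label-flipping variant is only illustrative.

```python
# Minimal sketch of an untargeted label-flipping poison that a malicious SFL
# client could apply to its local labels (illustrative, not the paper's exact attack).
import numpy as np

def flip_labels_untargeted(y, num_classes, rng):
    """Replace each label with a uniformly random *different* class."""
    y = np.asarray(y)
    offset = rng.integers(1, num_classes, size=y.shape)   # never zero -> always changes the class
    return (y + offset) % num_classes

rng = np.random.default_rng(0)
y_clean = rng.integers(0, 10, size=8)          # e.g. digit labels held by one client
y_poison = flip_labels_untargeted(y_clean, 10, rng)
print(y_clean, y_poison, (y_clean == y_poison).any())
```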
Multi-Predictor Fusion: Combining Learning-based and Rule-based Trajectory Predictors
results: 根据我们的结果,MPF在多种指标上表现出色,并且在线性能最高和最稳定的情况下运行。Abstract
Trajectory prediction modules are key enablers for safe and efficient planning of autonomous vehicles (AVs), particularly in highly interactive traffic scenarios. Recently, learning-based trajectory predictors have experienced considerable success in providing state-of-the-art performance due to their ability to learn multimodal behaviors of other agents from data. In this paper, we present an algorithm called multi-predictor fusion (MPF) that augments the performance of learning-based predictors by imbuing them with motion planners that are tasked with satisfying logic-based rules. MPF probabilistically combines learning- and rule-based predictors by mixing trajectories from both standalone predictors in accordance with a belief distribution that reflects the online performance of each predictor. In our results, we show that MPF outperforms the two standalone predictors on various metrics and delivers the most consistent performance.
摘要
轨迹预测模块是自动驾驶车辆(AV)安全、高效规划的关键组成部分,特别是在高度交互的交通场景中。最近,基于学习的轨迹预测器由于能够从数据中学习其他交通参与者的多模态行为,取得了最先进的性能。在这篇文章中,我们提出了一种名为多预测器融合(MPF)的算法,通过为基于学习的预测器配备满足逻辑规则的运动规划器来增强其性能。MPF 依据反映各预测器在线表现的信念分布,以概率方式混合两个独立预测器产生的轨迹,从而将基于学习与基于规则的预测器结合起来。结果显示,MPF 在多个指标上优于两个独立的预测器,并提供了最稳定的性能。
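A hedged sketch of the belief-weighted mixing idea is shown below; the two predictors and the belief update are placeholders (`learned_predictor`, `rule_based_predictor`, and `update_belief` are invented names), not the paper's exact formulation.

```python
# Hedged sketch of belief-weighted fusion of two trajectory predictors.
import numpy as np

rng = np.random.default_rng(0)

def learned_predictor(state):       # stand-in returning a trajectory of (x, y) points
    return state + np.cumsum(rng.normal(0.0, 0.1, size=(20, 2)), axis=0)

def rule_based_predictor(state):    # e.g. a constant-velocity rule
    return state + np.outer(np.arange(1, 21), np.array([0.1, 0.0]))

def fuse(state, belief):
    idx = rng.choice(2, p=belief)   # mix trajectories according to the current belief
    return (learned_predictor, rule_based_predictor)[idx](state)

def update_belief(errors, temperature=1.0):
    """Shift belief toward the predictor with lower recent online error."""
    scores = np.exp(-np.asarray(errors, dtype=float) / temperature)
    return scores / scores.sum()

belief = np.array([0.5, 0.5])                    # belief over {learned, rule-based}
traj = fuse(np.zeros(2), belief)
belief = update_belief(errors=[0.8, 0.3])        # rule-based did better recently
print(traj.shape, belief)
```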
Learning to Communicate using Contrastive Learning
results: 研究发现,这种方法可以在对话重要的环境中提高性能和学习速度,并且对环境状态观察有更好的对 symmetry 和全局状态资讯的捕捉。Abstract
Communication is a powerful tool for coordination in multi-agent RL. But inducing an effective, common language is a difficult challenge, particularly in the decentralized setting. In this work, we introduce an alternative perspective where communicative messages sent between agents are considered as different incomplete views of the environment state. By examining the relationship between messages sent and received, we propose to learn to communicate using contrastive learning to maximize the mutual information between messages of a given trajectory. In communication-essential environments, our method outperforms previous work in both performance and learning speed. Using qualitative metrics and representation probing, we show that our method induces more symmetric communication and captures global state information from the environment. Overall, we show the power of contrastive learning and the importance of leveraging messages as encodings for effective communication.
摘要
通信是多智能体强化学习中实现协调的有力工具。但要诱导出一种有效的共同语言是一个困难的挑战,在去中心化设定下尤其如此。在这项工作中,我们提出了另一种视角,即把智能体之间交换的通信消息视为环境状态的不同的不完整视图。通过考察发送与接收消息之间的关系,我们提出利用对比学习来学习通信,以最大化同一轨迹中消息之间的互信息。在通信至关重要的环境中,我们的方法在性能和学习速度上都优于以往工作。借助定性指标和表示探测,我们表明该方法能诱导出更对称的通信,并从环境中捕捉全局状态信息。总之,我们展示了对比学习的力量,以及将消息作为编码加以利用对实现有效通信的重要性。
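One way to write down a mutual-information surrogate between messages of the same trajectory is an InfoNCE-style loss, sketched below; this illustrates the general idea rather than the paper's exact objective.

```python
# Sketch of an InfoNCE-style loss treating messages from the same trajectory
# as positive pairs (illustrative of the contrastive idea).
import torch
import torch.nn.functional as F

def info_nce(msg_a, msg_b, temperature=0.1):
    """msg_a[i] and msg_b[i] are messages from the same trajectory (positives)."""
    a = F.normalize(msg_a, dim=-1)
    b = F.normalize(msg_b, dim=-1)
    logits = a @ b.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(a.size(0))           # diagonal entries are the positives
    return F.cross_entropy(logits, targets)

msgs_t = torch.randn(16, 32)    # messages sent at step t (batch of trajectories)
msgs_t1 = torch.randn(16, 32)   # messages from the same trajectories at step t+1
print(info_nce(msgs_t, msgs_t1).item())
```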
Spatio-Temporal Surrogates for Interaction of a Jet with High Explosives: Part II – Clustering Extremely High-Dimensional Grid-Based Data
results: 作者的方法可以将极高维度的数据进行有意义的分类,即使有一定的近似性。他们通过控制随机投影的方式和k-means算法的初始中心点的选择,确定了数据集中的cluster数量。Abstract
Building an accurate surrogate model for the spatio-temporal outputs of a computer simulation is a challenging task. A simple approach to improve the accuracy of the surrogate is to cluster the outputs based on similarity and build a separate surrogate model for each cluster. This clustering is relatively straightforward when the output at each time step is of moderate size. However, when the spatial domain is represented by a large number of grid points, numbering in the millions, the clustering of the data becomes more challenging. In this report, we consider output data from simulations of a jet interacting with high explosives. These data are available on spatial domains of different sizes, at grid points that vary in their spatial coordinates, and in a format that distributes the output across multiple files at each time step of the simulation. We first describe how we bring these data into a consistent format prior to clustering. Borrowing the idea of random projections from data mining, we reduce the dimension of our data by a factor of thousand, making it possible to use the iterative k-means method for clustering. We show how we can use the randomness of both the random projections, and the choice of initial centroids in k-means clustering, to determine the number of clusters in our data set. Our approach makes clustering of extremely high dimensional data tractable, generating meaningful cluster assignments for our problem, despite the approximation introduced in the random projections.
摘要
为计算机模拟的时空输出建立准确的代理模型是一项具有挑战性的任务。一种提高代理模型精度的简单方法是按相似性对输出进行聚类,并为每个聚类分别建立代理模型。当每个时间步的输出规模适中时,这种聚类相对容易;但当空间域由数以百万计的网格点表示时,数据聚类就变得更加困难。在这份报告中,我们考虑射流与高能炸药相互作用模拟的输出数据。这些数据分布在大小不一的空间域上,网格点的空间坐标各不相同,并且在每个时间步以分散在多个文件中的格式存储。我们首先描述了在聚类之前如何将这些数据整理成一致的格式。借鉴数据挖掘中随机投影的思想,我们将数据维度降低了约一千倍,使得可以使用迭代式 k-means 方法进行聚类。我们展示了如何利用随机投影以及 k-means 聚类中初始中心选择的随机性来确定数据集中的聚类数目。尽管随机投影引入了一定的近似,我们的方法仍使得对极高维数据的聚类变得可行,并为该问题生成了有意义的聚类结果。
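The reduce-then-cluster pipeline can be sketched with off-the-shelf tools as below, on toy data standing in for the flattened simulation snapshots; the array sizes and the choice of 50 projected dimensions are placeholders.

```python
# Sketch of the dimension-reduction-then-cluster pipeline on synthetic data
# standing in for the flattened simulation snapshots (millions of grid points in the report).
import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
snapshots = rng.normal(size=(200, 50_000))     # rows: snapshots, cols: grid values (toy size)

proj = GaussianRandomProjection(n_components=50, random_state=0)
reduced = proj.fit_transform(snapshots)        # roughly a 1000x reduction in the real setting

# Re-running with different projections / k-means seeds and checking how stable the
# assignments are is one way to pick the number of clusters, as the report suggests.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(reduced)
print(np.bincount(labels))
```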
In-depth Analysis On Parallel Processing Patterns for High-Performance Dataframes
results: 本研究在ORNL Summit超级计算机上评估了Cylon的性能。Abstract
The Data Science domain has expanded monumentally in both research and industry communities during the past decade, predominantly owing to the Big Data revolution. Artificial Intelligence (AI) and Machine Learning (ML) are bringing more complexities to data engineering applications, which are now integrated into data processing pipelines to process terabytes of data. Typically, a significant amount of time is spent on data preprocessing in these pipelines, and hence improving its efficiency directly impacts the overall pipeline performance. The community has recently embraced the concept of Dataframes as the de-facto data structure for data representation and manipulation. However, the most widely used serial Dataframes today (R, pandas) experience performance limitations while working on even moderately large data sets. We believe that there is plenty of room for improvement by taking a look at this problem from a high-performance computing point of view. In a prior publication, we presented a set of parallel processing patterns for distributed dataframe operators and the reference runtime implementation, Cylon [1]. In this paper, we are expanding on the initial concept by introducing a cost model for evaluating the said patterns. Furthermore, we evaluate the performance of Cylon on the ORNL Summit supercomputer.
摘要
“数据科学领域在过去十年间,主要得益于大数据革命,在研究与工业界都取得了巨大的扩展。人工智能(AI)和机器学习(ML)给数据工程应用带来了更多复杂性,这些应用如今被集成到数据处理管道中,以处理 TB 级的数据。通常,这些管道中的数据预处理会占用大量时间,因此提高其效率会直接影响整个管道的性能。社区最近已普遍接受数据帧(Dataframe)作为数据表示与操作的事实标准数据结构。但是,目前使用最广泛的串行数据帧(R、pandas)在处理中等规模的数据集时就会表现出性能瓶颈。我们认为,从高性能计算的角度来看这一问题,仍有很大的改进空间。在先前的文章中,我们提出了一组面向分布式数据帧算子的并行处理模式以及参考运行时实现 Cylon [1]。在这篇论文中,我们进一步扩展这一概念,提出一种用于评估上述模式的成本模型。此外,我们还在 ORNL Summit 超级计算机上评估了 Cylon 的性能。”
Spatio-Temporal Surrogates for Interaction of a Jet with High Explosives: Part I – Analysis with a Small Sample Size
paper_authors: Chandrika Kamath, Juliette S. Franzman, Brian H. Daub
for: 本研究旨在开发一种高质量的空间-时间抽象方法,以便更好地理解复杂现象的计算机模拟结果。
methods: 本研究使用了一种基于机器学习的抽象方法,并在使用了一些简单的方法来提高抽象精度。
results: 研究发现,使用这种抽象方法可以创建高质量的空间-时间抽象模型,并且不需要进行大量的计算机模拟。Abstract
Computer simulations, especially of complex phenomena, can be expensive, requiring high-performance computing resources. Often, to understand a phenomenon, multiple simulations are run, each with a different set of simulation input parameters. These data are then used to create an interpolant, or surrogate, relating the simulation outputs to the corresponding inputs. When the inputs and outputs are scalars, a simple machine learning model can suffice. However, when the simulation outputs are vector valued, available at locations in two or three spatial dimensions, often with a temporal component, creating a surrogate is more challenging. In this report, we use a two-dimensional problem of a jet interacting with high explosives to understand how we can build high-quality surrogates. The characteristics of our data set are unique - the vector-valued outputs from each simulation are available at over two million spatial locations; each simulation is run for a relatively small number of time steps; the size of the computational domain varies with each simulation; and resource constraints limit the number of simulations we can run. We show how we analyze these extremely large data-sets, set the parameters for the algorithms used in the analysis, and use simple ways to improve the accuracy of the spatio-temporal surrogates without substantially increasing the number of simulations required.
摘要
计算机模拟,尤其是对复杂现象的模拟,代价可能很高,需要高性能计算资源。通常,为了理解一种现象,需要运行多个模拟,每个模拟采用一组不同的模拟输入参数。随后利用这些数据构建一个插值模型(即代理模型),将模拟输出与相应的输入关联起来。当输入和输出都是标量时,一个简单的机器学习模型即可胜任;但当模拟输出是定义在二维或三维空间位置上、且往往带有时间分量的向量值时,构建代理模型就更具挑战性。在这份报告中,我们以射流与高能炸药相互作用的二维问题为例,研究如何构建高质量的代理模型。我们数据集的特点非常独特:每个模拟的向量值输出分布在超过两百万个空间位置上;每个模拟只运行相对较少的时间步;计算域的大小随模拟而异;而资源限制也约束了我们能够运行的模拟数量。我们展示了如何分析这些规模极大的数据集、如何设置分析所用算法的参数,以及如何用简单的方法在不显著增加模拟数量的前提下提高时空代理模型的精度。
Adversarial Learning in Real-World Fraud Detection: Challenges and Perspectives
results: 本研究发现了一些针对诈骗检测系统的攻击方法,并提出了一些可能的解决方案。Abstract
Data economy relies on data-driven systems and complex machine learning applications are fueled by them. Unfortunately, however, machine learning models are exposed to fraudulent activities and adversarial attacks, which threaten their security and trustworthiness. In the last decade or so, the research interest on adversarial machine learning has grown significantly, revealing how learning applications could be severely impacted by effective attacks. Although early results of adversarial machine learning indicate the huge potential of the approach to specific domains such as image processing, still there is a gap in both the research literature and practice regarding how to generalize adversarial techniques in other domains and applications. Fraud detection is a critical defense mechanism for data economy, as it is for other applications as well, which poses several challenges for machine learning. In this work, we describe how attacks against fraud detection systems differ from other applications of adversarial machine learning, and propose a number of interesting directions to bridge this gap.
摘要
数据经济依赖于数据驱动系统,复杂的机器学习应用也由其驱动。然而,机器学习模型会受到欺诈活动和对抗攻击的威胁,危及其安全性与可信性。在过去十年左右,对抗机器学习的研究兴趣显著增长,揭示出有效的攻击会严重影响学习型应用。尽管对抗机器学习的早期成果显示了该方法在图像处理等特定领域的巨大潜力,但在如何将对抗技术推广到其他领域和应用方面,研究文献与实践之间仍存在差距。欺诈检测是数据经济乃至其他应用的关键防御机制,这也给机器学习带来了多重挑战。在这项工作中,我们阐述了针对欺诈检测系统的攻击与对抗机器学习其他应用的不同之处,并提出了若干有趣的方向来弥合这一差距。
Identification of Causal Relationship between Amyloid-beta Accumulation and Alzheimer’s Disease Progression via Counterfactual Inference
results: 论文显示了这种方法可以实现个体对阿尔茨海默症发展的测量,并且可以提供可靠的预后诊断和个性化治疗方案。Abstract
Alzheimer's disease (AD) is a neurodegenerative disorder that is beginning with amyloidosis, followed by neuronal loss and deterioration in structure, function, and cognition. The accumulation of amyloid-beta in the brain, measured through 18F-florbetapir (AV45) positron emission tomography (PET) imaging, has been widely used for early diagnosis of AD. However, the relationship between amyloid-beta accumulation and AD pathophysiology remains unclear, and causal inference approaches are needed to uncover how amyloid-beta levels can impact AD development. In this paper, we propose a graph varying coefficient neural network (GVCNet) for estimating the individual treatment effect with continuous treatment levels using a graph convolutional neural network. We highlight the potential of causal inference approaches, including GVCNet, for measuring the regional causal connections between amyloid-beta accumulation and AD pathophysiology, which may serve as a robust tool for early diagnosis and tailored care.
摘要
阿尔茨海默病(AD)是一种神经退行性疾病,始于淀粉样蛋白沉积,随后出现神经元丢失以及结构、功能和认知的衰退。通过 18F-florbetapir(AV45)正电子发射断层扫描(PET)成像测量脑内 β 淀粉样蛋白的沉积,已被广泛用于 AD 的早期诊断。然而,β 淀粉样蛋白沉积与 AD 病理生理过程之间的关系仍不清楚,需要因果推断方法来揭示其水平如何影响 AD 的发展。在这篇论文中,我们提出了一种基于图卷积神经网络的图变系数神经网络(GVCNet),用于估计连续治疗水平下的个体治疗效应。我们强调了包括 GVCNet 在内的因果推断方法在度量 β 淀粉样蛋白沉积与 AD 病理生理之间区域因果关联方面的潜力,这有望成为早期诊断和个性化照护的可靠工具。
Systematic Bias in Sample Inference and its Effect on Machine Learning
results: 对多个子集的预测结果显示,这种偏见导致了少数群体的预测错误率较高。Abstract
A commonly observed pattern in machine learning models is an underprediction of the target feature, with the model's predicted target rate for members of a given category typically being lower than the actual target rate for members of that category in the training set. This underprediction is usually larger for members of minority groups; while income level is underpredicted for both men and women in the 'adult' dataset, for example, the degree of underprediction is significantly higher for women (a minority in that dataset). We propose that this pattern of underprediction for minorities arises as a predictable consequence of statistical inference on small samples. When presented with a new individual for classification, an ML model performs inference not on the entire training set, but on a subset that is in some way similar to the new individual, with sizes of these subsets typically following a power law distribution so that most are small (and with these subsets being necessarily smaller for the minority group). We show that such inference on small samples is subject to systematic and directional statistical bias, and that this bias produces the observed patterns of underprediction seen in ML models. Analysing a standard sklearn decision tree model's predictions on a set of over 70 subsets of the 'adult' and COMPAS datasets, we found that a bias prediction measure based on small-sample inference had a significant positive correlations (0.56 and 0.85) with the observed underprediction rate for these subsets.
Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation
results: 实验结果表明,使用该方法可以在Encoder前进行快速加速,并且与使用左上下文和记忆银行的方法相比,翻译质量几乎相同。Abstract
Simultaneous speech translation is an essential communication task difficult for humans whereby a translation is generated concurrently with oncoming speech inputs. For such a streaming task, transformers using block processing to break an input sequence into segments have achieved state-of-the-art performance at a reduced cost. Current methods to allow information to propagate across segments, including left context and memory banks, have faltered as they are both insufficient representations and unnecessarily expensive to compute. In this paper, we propose an Implicit Memory Transformer that implicitly retains memory through a new left context method, removing the need to explicitly represent memory with memory banks. We generate the left context from the attention output of the previous segment and include it in the keys and values of the current segment's attention calculation. Experiments on the MuST-C dataset show that the Implicit Memory Transformer provides a substantial speedup on the encoder forward pass with nearly identical translation quality when compared with the state-of-the-art approach that employs both left context and memory banks.
摘要
同声语音翻译是一项对人类而言都很困难的重要交流任务,需要在语音输入持续到来的同时生成译文。对于这种流式任务,采用分块处理、将输入序列切分为片段的 Transformer 以更低的代价取得了最先进的性能。现有让信息跨片段传递的方法,包括左侧上下文和记忆库,都存在不足:它们既不是充分的表示,计算上也不必要地昂贵。在这篇论文中,我们提出了隐式记忆 Transformer,通过一种新的左侧上下文方法隐式地保留记忆,从而不再需要用记忆库显式表示记忆。我们由上一片段的注意力输出生成左侧上下文,并将其并入当前片段注意力计算的键和值中。在 MuST-C 数据集上的实验表明,与同时使用左侧上下文和记忆库的最先进方法相比,隐式记忆 Transformer 在编码器前向传播上获得了可观的加速,而翻译质量几乎相同。
Shifting Attention to Relevance: Towards the Uncertainty Estimation of Large Language Models
results: 研究结果显示,在估计不确定性时,许多重要的token和含有有限 semantics的句子被平均地或者甚至很重视,以至于存在biases。为了解决这些biases,提议使用 JOINT SHIFTING ATTENTION TO RELEVANT(SAR)组件,并在实验中达到了superior表现。Abstract
Although Large Language Models (LLMs) have shown great potential in Natural Language Generation, it is still challenging to characterize the uncertainty of model generations, i.e., when users could trust model outputs. Our research is derived from the heuristic facts that tokens are created unequally in reflecting the meaning of generations by auto-regressive LLMs, i.e., some tokens are more relevant (or representative) than others, yet all the tokens are equally valued when estimating uncertainty. It is because of the linguistic redundancy where mostly a few keywords are sufficient to convey the meaning of a long sentence. We name these inequalities as generative inequalities and investigate how they affect uncertainty estimation. Our results reveal that considerable tokens and sentences containing limited semantics are weighted equally or even heavily when estimating uncertainty. To tackle these biases posed by generative inequalities, we propose to jointly Shifting Attention to more Relevant (SAR) components from both the token level and the sentence level while estimating uncertainty. We conduct experiments over popular "off-the-shelf" LLMs (e.g., OPT, LLaMA) with model sizes up to 30B and powerful commercial LLMs (e.g., Davinci from OpenAI), across various free-form question-answering tasks. Experimental results and detailed demographic analysis indicate the superior performance of SAR. Code is available at https://github.com/jinhaoduan/shifting-attention-to-relevance.
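The relevance-weighting idea can be sketched as below: per-token negative log-likelihoods are aggregated with weights that emphasise semantically relevant tokens. How the relevance scores are obtained (and the sentence-level counterpart) is left abstract here, so the numbers are purely illustrative.

```python
# Sketch of relevance-weighted token uncertainty: aggregate per-token negative
# log-likelihoods with weights that emphasise relevant tokens.
import torch

def weighted_sequence_uncertainty(token_nll, relevance):
    """token_nll: (T,) -log p of each generated token; relevance: (T,) scores >= 0."""
    w = relevance / relevance.sum()
    return (w * token_nll).sum()

token_nll = torch.tensor([0.2, 3.1, 0.1, 2.7])       # e.g. function words vs. content words
uniform = torch.ones(4)                              # a plain average treats all tokens equally
relevance = torch.tensor([0.05, 1.0, 0.05, 0.8])     # assumed relevance scores
print(weighted_sequence_uncertainty(token_nll, uniform).item())
print(weighted_sequence_uncertainty(token_nll, relevance).item())
```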
Shiftable Context: Addressing Training-Inference Context Mismatch in Simultaneous Speech Translation
results: average increase of 2.09, 1.83, and 1.95 BLEU scores across each wait-k value for the three language pairs, with minimal impact on computation-aware Average Lagging.Abstract
Transformer models using segment-based processing have been an effective architecture for simultaneous speech translation. However, such models create a context mismatch between training and inference environments, hindering potential translation accuracy. We solve this issue by proposing Shiftable Context, a simple yet effective scheme to ensure that consistent segment and context sizes are maintained throughout training and inference, even with the presence of partially filled segments due to the streaming nature of simultaneous translation. Shiftable Context is also broadly applicable to segment-based transformers for streaming tasks. Our experiments on the English-German, English-French, and English-Spanish language pairs from the MUST-C dataset demonstrate that when applied to the Augmented Memory Transformer, a state-of-the-art model for simultaneous speech translation, the proposed scheme achieves an average increase of 2.09, 1.83, and 1.95 BLEU scores across each wait-k value for the three language pairs, respectively, with a minimal impact on computation-aware Average Lagging.
摘要
transformer模型使用分段处理有效地实现同时语音翻译。然而,这些模型在训练和推理环境中存在上下文匹配问题,从而限制了翻译准确性。我们解决这个问题,提出了Shiftable Context,一种简单 yet effective的方案,确保在训练和推理过程中保持一致的分段和上下文大小。Shiftable Context还可以广泛应用于流处理任务中的 segment-based transformer。我们在MUST-C数据集上进行英语-德语、英语-法语和英语-西班牙语三对语言对的实验,结果显示,当应用到Augmented Memory Transformer模型时,提出的方案平均提高了2.09、1.83和1.95的BLEU分数 across each wait-k值,并且对计算意识的均衡延迟产生了最小的影响。
Adaptive Principal Component Regression with Applications to Panel Data
paper_authors: Anish Agarwal, Keegan Harris, Justin Whitehouse, Zhiwei Steven Wu
for: This paper provides time-uniform finite sample guarantees for online principal component regression (PCR) in the presence of adaptive data collection.
methods: The paper uses tools from modern martingale concentration to analyze PCR in the online setting, which is a generalization of the fixed-design error-in-variables regression.
results: The paper provides a framework for experiment design in panel data settings when interventions are assigned adaptively, which can be seen as a generalization of synthetic control and synthetic interventions frameworks.
results: 这篇论文提供了针对板块数据设置中的实验设计框架,当实验是通过适应性的干预分配策略进行分配。Abstract
Principal component regression (PCR) is a popular technique for fixed-design error-in-variables regression, a generalization of the linear regression setting in which the observed covariates are corrupted with random noise. We provide the first time-uniform finite sample guarantees for online (regularized) PCR whenever data is collected adaptively. Since the proof techniques for analyzing PCR in the fixed design setting do not readily extend to the online setting, our results rely on adapting tools from modern martingale concentration to the error-in-variables setting. As an application of our bounds, we provide a framework for experiment design in panel data settings when interventions are assigned adaptively. Our framework may be thought of as a generalization of the synthetic control and synthetic interventions frameworks, where data is collected via an adaptive intervention assignment policy.
摘要
主成分回归(PCR)是一种流行的固定设计变量含误差回归技术,它是线性回归设定的推广,其中观测到的协变量被随机噪声污染。我们给出了在数据被自适应收集时,在线(正则化)PCR 的第一个时间一致的有限样本保证。由于分析固定设计下 PCR 的证明技巧并不能直接推广到在线设定,我们的结果依赖于将现代鞅集中不等式的工具适配到变量含误差的设定。作为这些界的一个应用,我们给出了在干预被自适应分配时面板数据实验设计的框架。该框架可以视为合成控制与合成干预框架的推广,其中数据通过自适应的干预分配策略收集。
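For reference, a plain offline principal component regression looks like the sketch below; the paper's contribution concerns guarantees for the online, adaptively collected regime, which this sketch does not attempt to capture.

```python
# Plain principal component regression for reference.
import numpy as np

def pcr_fit(X, y, k):
    """Project noisy covariates onto their top-k principal subspace, then least squares."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    V_k = Vt[:k].T                      # top-k right singular vectors
    Z = X @ V_k                         # reduced covariates
    theta_k, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return V_k @ theta_k                # coefficients back in the original space

rng = np.random.default_rng(0)
n, d, k = 500, 30, 5
latent = rng.normal(size=(n, k)) @ rng.normal(size=(k, d))   # low-rank signal
X = latent + 0.1 * rng.normal(size=(n, d))                   # error-in-variables covariates
beta = rng.normal(size=d)
y = latent @ beta + 0.1 * rng.normal(size=n)
beta_hat = pcr_fit(X, y, k)
print(np.linalg.norm(X @ beta_hat - y) / np.sqrt(n))         # in-sample fit of the PCR estimate
```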
Learning Generic Solutions for Multiphase Transport in Porous Media via the Flux Functions Operator
results: 比 traditional numerical solvers 快速多达四个数量级,并且可以捕捉到 any type of flux function (concave, convex, or non-convex) 的解决。同时,trained PI-DeepONet model 表现出了优秀的泛化能力,这使得它成为了解决 transport problems in porous media 中的一个有力的工具。Abstract
Traditional numerical schemes for simulating fluid flow and transport in porous media can be computationally expensive. Advances in machine learning for scientific computing have the potential to help speed up the simulation time in many scientific and engineering fields. DeepONet has recently emerged as a powerful tool for accelerating the solution of partial differential equations (PDEs) by learning operators (mapping between function spaces) of PDEs. In this work, we learn the mapping between the space of flux functions of the Buckley-Leverett PDE and the space of solutions (saturations). We use Physics-Informed DeepONets (PI-DeepONets) to achieve this mapping without any paired input-output observations, except for a set of given initial or boundary conditions; ergo, eliminating the expensive data generation process. By leveraging the underlying physical laws via soft penalty constraints during model training, in a manner similar to Physics-Informed Neural Networks (PINNs), and a unique deep neural network architecture, the proposed PI-DeepONet model can predict the solution accurately given any type of flux function (concave, convex, or non-convex) while achieving up to four orders of magnitude improvements in speed over traditional numerical solvers. Moreover, the trained PI-DeepONet model demonstrates excellent generalization qualities, rendering it a promising tool for accelerating the solution of transport problems in porous media.
摘要
用于模拟多孔介质中流体流动与输运的传统数值格式可能计算代价很高。面向科学计算的机器学习进展有望在许多科学与工程领域中缩短模拟时间。DeepONet 最近成为一种通过学习偏微分方程(PDE)的算子(函数空间之间的映射)来加速 PDE 求解的有力工具。在这项工作中,我们学习 Buckley-Leverett 方程的通量函数空间与解(饱和度)空间之间的映射。我们使用物理信息 DeepONet(PI-DeepONet)实现这一映射,除给定的初始或边界条件外不需要任何成对的输入-输出观测,从而省去了昂贵的数据生成过程。通过在模型训练中以软惩罚约束的方式利用底层物理定律(类似于物理信息神经网络 PINN),并结合独特的深度神经网络架构,所提出的 PI-DeepONet 模型可以在任意类型的通量函数(凹、凸或非凸)下准确预测解,同时相比传统数值求解器最多可提速四个数量级。此外,训练好的 PI-DeepONet 模型表现出优秀的泛化能力,是加速多孔介质中输运问题求解的一个很有前景的工具。
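A minimal DeepONet-style forward pass is sketched below: a branch net encodes the flux function sampled at sensor points, a trunk net encodes the query (x, t), and their dot product gives the predicted saturation. The physics-informed penalty terms are omitted, and all sizes are assumptions.

```python
# Minimal DeepONet-style forward pass (physics-informed losses omitted).
import torch
import torch.nn as nn

class TinyDeepONet(nn.Module):
    def __init__(self, n_sensors=50, width=64, p=32):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(n_sensors, width), nn.Tanh(), nn.Linear(width, p))
        self.trunk = nn.Sequential(nn.Linear(2, width), nn.Tanh(), nn.Linear(width, p))
    def forward(self, flux_samples, xt):
        b = self.branch(flux_samples)        # (B, p) encoding of the flux function
        t = self.trunk(xt)                   # (B, p) encoding of the query point
        return (b * t).sum(dim=-1)           # (B,) predicted saturation

model = TinyDeepONet()
flux = torch.rand(8, 50)                     # flux function evaluated at 50 sensor saturations
xt = torch.rand(8, 2)                        # query points (x, t)
print(model(flux, xt).shape)
```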
Patch-CNN: Training data-efficient deep learning for high-fidelity diffusion tensor estimation from minimal diffusion protocols
results: 对比传统模型适应和维度全连接神经网络(voxel-wise Fully-Connected Neural Network,FCN),Patch-CNN 可以更好地估计扩散矩阵和纤维方向,并且只需要使用单个试验者的数据进行训练。Abstract
We propose a new method, Patch-CNN, for diffusion tensor (DT) estimation from only six-direction diffusion weighted images (DWI). Deep learning-based methods have been recently proposed for dMRI parameter estimation, using either voxel-wise fully-connected neural networks (FCN) or image-wise convolutional neural networks (CNN). In the acute clinical context -- where pressure of time limits the number of imaged directions to a minimum -- existing approaches either require an infeasible number of training images volumes (image-wise CNNs), or do not estimate the fibre orientations (voxel-wise FCNs) required for tractogram estimation. To overcome these limitations, we propose Patch-CNN, a neural network with a minimal (non-voxel-wise) convolutional kernel (3$\times$3$\times$3). Compared with voxel-wise FCNs, this has the advantage of allowing the network to leverage local anatomical information. Compared with image-wise CNNs, the minimal kernel vastly reduces training data demand. Evaluated against both conventional model fitting and a voxel-wise FCN, Patch-CNN, trained with a single subject is shown to improve the estimation of both scalar dMRI parameters and fibre orientation from six-direction DWIs. The improved fibre orientation estimation is shown to produce improved tractogram.
摘要
我们提出了一种新方法 Patch-CNN,用于仅从六个方向的扩散加权图像(DWI)估计扩散张量(DT)。最近已有研究将深度学习用于 dMRI 参数估计,采用的要么是体素级全连接神经网络(FCN),要么是整图级卷积神经网络(CNN)。在急诊临床场景中,时间压力将成像方向数限制到最少,现有方法要么需要数量不可行的训练图像体(整图级 CNN),要么无法估计纤维束追踪所需的纤维方向(体素级 FCN)。为克服这些限制,我们提出 Patch-CNN,一种采用最小(非体素级)卷积核(3×3×3)的神经网络。与体素级 FCN 相比,它的优势在于能让网络利用局部解剖信息;与整图级 CNN 相比,最小卷积核大幅降低了对训练数据的需求。与传统模型拟合和体素级 FCN 相比,仅用单个受试者训练的 Patch-CNN 能够改进从六方向 DWI 估计的标量 dMRI 参数和纤维方向,而改进的纤维方向估计也带来了更好的纤维束追踪结果。
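A toy version of the patch-to-tensor mapping is sketched below: a single 3x3x3 convolution collapses the local neighbourhood of six-direction DWI into features for the centre voxel, followed by a linear head predicting six tensor components. Layer widths are guesses for illustration, not the paper's architecture.

```python
# Illustrative 3D CNN over a 3x3x3 neighbourhood of six-direction DWI, predicting
# the six unique diffusion-tensor components for the centre voxel.
import torch
import torch.nn as nn

class PatchTensorNet(nn.Module):
    def __init__(self, n_dirs=6, n_out=6):
        super().__init__()
        self.conv = nn.Conv3d(n_dirs, 64, kernel_size=3)   # 3x3x3 patch -> 1x1x1 feature map
        self.head = nn.Sequential(nn.ReLU(), nn.Flatten(), nn.Linear(64, n_out))
    def forward(self, patch):                               # (B, 6, 3, 3, 3)
        return self.head(self.conv(patch))                  # (B, 6) tensor components

net = PatchTensorNet()
patch = torch.rand(4, 6, 3, 3, 3)
print(net(patch).shape)   # torch.Size([4, 6])
```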
Robust Uncertainty Estimation for Classification of Maritime Objects
results: 该论文的实验结果显示,通过将Monte Carlo Dropout与异常检测技术结合使用,可以提高FPR95的性能,相比之下当模型没有异常数据训练时,该方法的性能提高了8%。此外,相比于基本实现的宽度网络,该方法可以提高性能 by 77%。此外, authors还释放了SHIPS数据集,并证明了该方法的有效性,将FPR95提高了44.2%。Abstract
We explore the use of uncertainty estimation in the maritime domain, showing the efficacy on toy datasets (CIFAR10) and proving it on an in-house dataset, SHIPS. We present a method joining the intra-class uncertainty achieved using Monte Carlo Dropout, with recent discoveries in the field of outlier detection, to gain more holistic uncertainty measures. We explore the relationship between the introduced uncertainty measures and examine how well they work on CIFAR10 and in a real-life setting. Our work improves the FPR95 by 8% compared to the current highest-performing work when the models are trained without out-of-distribution data. We increase the performance by 77% compared to a vanilla implementation of the Wide ResNet. We release the SHIPS dataset and show the effectiveness of our method by improving the FPR95 by 44.2% with respect to the baseline. Our approach is model agnostic, easy to implement, and often does not require model retraining.
摘要
我们探索了海上领域中uncertainty估计的使用,通过使用CIFAR10杂交数据集和自有数据集SHIPS进行证明,并提出了将Monte Carlo Dropout中的内类uncertainty与现代异常检测发现相结合以获得更全面的uncertainty测度的方法。我们研究了引入的uncertainty测度与之间的关系,并在CIFAR10和实际场景中评估其效果。我们的工作提高了FPR95的性能,相比最高性能工作不包含外围数据集时,提高了8%。相比于普通实现的宽度网络,我们的方法提高了77%的性能。我们发布了SHIPS数据集,并通过提高FPR95的性能44.2%来证明我们的方法的效果。我们的方法是模型无关的,易于实现,通常不需要模型重新训练。
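The Monte Carlo Dropout ingredient can be sketched as below: dropout stays active at inference, several stochastic passes are averaged, and the spread (here, predictive entropy) serves as an intra-class uncertainty signal. The fusion with an outlier-detection score described in the abstract is not shown, and the network is a placeholder.

```python
# Sketch of Monte Carlo Dropout at inference time.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Dropout(p=0.3), nn.Linear(64, 5))

def mc_dropout_predict(model, x, n_passes=20):
    model.train()                      # keeps dropout stochastic at inference time
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_passes)])
    mean = probs.mean(dim=0)           # predictive distribution averaged over passes
    entropy = -(mean * mean.clamp_min(1e-12).log()).sum(dim=-1)
    return mean, entropy               # higher entropy -> less confident prediction

x = torch.randn(3, 128)                # e.g. features of three maritime images
mean, unc = mc_dropout_predict(model, x)
print(mean.argmax(dim=-1), unc)
```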
Density-based Feasibility Learning with Normalizing Flows for Introspective Robotic Assembly
results: 在机器人组装用例中,提出的方法比单类基elines表现出色地探测不可行的组装。我们还进一步调查了我们方法的内部工作机制,发现可以通过高级变体NF实现很大的内存节省。Abstract
Machine Learning (ML) models in Robotic Assembly Sequence Planning (RASP) need to be introspective on the predicted solutions, i.e. whether they are feasible or not, to circumvent potential efficiency degradation. Previous works need both feasible and infeasible examples during training. However, the infeasible ones are hard to collect sufficiently when re-training is required for swift adaptation to new product variants. In this work, we propose a density-based feasibility learning method that requires only feasible examples. Concretely, we formulate the feasibility learning problem as Out-of-Distribution (OOD) detection with Normalizing Flows (NF), which are powerful generative models for estimating complex probability distributions. Empirically, the proposed method is demonstrated on robotic assembly use cases and outperforms other single-class baselines in detecting infeasible assemblies. We further investigate the internal working mechanism of our method and show that a large memory saving can be obtained based on an advanced variant of NF.
摘要
机器人装配序列规划(RASP)中的机器学习(ML)模型需要对其预测的解具有内省能力,即判断解是否可行,以避免潜在的效率下降。以往的工作在训练时同时需要可行与不可行的示例;然而,当需要重新训练以快速适应新产品变体时,很难收集到足够多的不可行示例。在这项工作中,我们提出一种基于密度的可行性学习方法,只需要可行示例。具体而言,我们将可行性学习问题表述为利用归一化流(NF)的分布外(OOD)检测,归一化流是估计复杂概率分布的强大生成模型。实验表明,所提方法在机器人装配用例中优于其他单类基线方法,能更好地检测不可行的装配。我们还进一步分析了该方法的内部工作机制,并表明基于 NF 的一种先进变体可以大幅节省内存。
Towards Safe Autonomous Driving Policies using a Neuro-Symbolic Deep Reinforcement Learning Approach
results: 我们在使用高D数据集进行实践中,发现DRLSL方法可以避免不安全行为,并且在训练和测试阶段都能够快速 converges。此外,我们的结果还表明,DRLSL方法在面对新的驾驶场景时能够更好地泛化。Abstract
The dynamic nature of driving environments and the presence of diverse road users pose significant challenges for decision-making in autonomous driving. Deep reinforcement learning (DRL) has emerged as a popular approach to tackle this problem. However, the application of existing DRL solutions is mainly confined to simulated environments due to safety concerns, impeding their deployment in real-world. To overcome this limitation, this paper introduces a novel neuro-symbolic model-free DRL approach, called DRL with Symbolic Logics (DRLSL) that combines the strengths of DRL (learning from experience) and symbolic first-order logics (knowledge-driven reasoning) to enable safe learning in real-time interactions of autonomous driving within real environments. This innovative approach provides a means to learn autonomous driving policies by actively engaging with the physical environment while ensuring safety. We have implemented the DRLSL framework in autonomous driving using the highD dataset and demonstrated that our method successfully avoids unsafe actions during both the training and testing phases. Furthermore, our results indicate that DRLSL achieves faster convergence during training and exhibits better generalizability to new driving scenarios compared to traditional DRL methods.
摘要
驾驶环境的动态性和道路使用者的多样性给自动驾驶决策带来了重大挑战。深度强化学习(DRL)已成为解决这一问题的流行方法。然而,出于安全考虑,现有 DRL 方案主要局限于仿真环境,阻碍了它们在真实世界中的部署。为突破这一限制,本文提出了一种新颖的神经符号、无模型 DRL 方法,称为结合符号逻辑的 DRL(DRLSL),它将 DRL(从经验中学习)与符号一阶逻辑(知识驱动的推理)的优势结合起来,使自动驾驶能够在真实环境的实时交互中安全地学习。这种创新方法使得可以在主动与物理环境交互的同时学习自动驾驶策略,并确保安全。我们基于 highD 数据集在自动驾驶中实现了 DRLSL 框架,并证明了我们的方法在训练和测试阶段都能成功避免不安全行为。此外,结果还表明,与传统 DRL 方法相比,DRLSL 在训练中收敛更快,并对新的驾驶场景表现出更好的泛化能力。
A numerical algorithm for attaining the Chebyshev bound in optimal learning
results: 计算切比雪夫半径和切比雪夫中心,从而求解从数据最优恢复函数的问题。Abstract
Given a compact subset of a Banach space, the Chebyshev center problem consists of finding a minimal circumscribing ball containing the set. In this article we establish a numerically tractable algorithm for solving the Chebyshev center problem in the context of optimal learning from a finite set of data points. For a hypothesis space realized as a compact but not necessarily convex subset of a finite-dimensional subspace of some underlying Banach space, this algorithm computes the Chebyshev radius and the Chebyshev center of the hypothesis space, thereby solving the problem of optimal recovery of functions from data. The algorithm itself is based on, and significantly extends, recent results for near-optimal solutions of convex semi-infinite problems by means of targeted sampling, and it is of independent interest. Several examples of numerical computations of Chebyshev centers are included in order to illustrate the effectiveness of the algorithm.
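For the simple special case of a finite point cloud (not the general compact-set setting of the paper), the Chebyshev center is the solution of a small second-order cone program, for example with cvxpy; the point cloud below is synthetic and only illustrative.

```python
# Chebyshev center of a finite point cloud as a second-order cone program.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(40, 3))        # finite point cloud standing in for the set

x = cp.Variable(3)                        # candidate Chebyshev center
r = cp.Variable()                         # radius of the circumscribing ball
constraints = [cp.norm(p - x, 2) <= r for p in points]
prob = cp.Problem(cp.Minimize(r), constraints)
prob.solve()
print("Chebyshev radius:", r.value)
print("Chebyshev center:", x.value)
```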
Pareto-Secure Machine Learning (PSML): Fingerprinting and Securing Inference Serving Systems
results: 本研究表明,使用高效查询 fingerprinting 算法可以在模型EXTRACTION攻击中实现高精度和高准确率(在 $1%$ 以内),同时可以提高模型抽象层的安全性。此外,本研究还发现了一种基于噪音的防御机制可以减少攻击精度和准确率(在 $9.8%$ 和 $4.8%$ 以内)。Abstract
Model-serving systems have become increasingly popular, especially in real-time web applications. In such systems, users send queries to the server and specify the desired performance metrics (e.g., desired accuracy, latency). The server maintains a set of models (model zoo) in the back-end and serves the queries based on the specified metrics. This paper examines the security, specifically robustness against model extraction attacks, of such systems. Existing black-box attacks assume a single model can be repeatedly selected for serving inference requests. Modern inference serving systems break this assumption. Thus, they cannot be directly applied to extract a victim model, as models are hidden behind a layer of abstraction exposed by the serving system. An attacker can no longer identify which model she is interacting with. To this end, we first propose a query-efficient fingerprinting algorithm to enable the attacker to trigger any desired model consistently. We show that by using our fingerprinting algorithm, model extraction can have fidelity and accuracy scores within $1\%$ of the scores obtained when attacking a single, explicitly specified model, as well as up to $14.6\%$ gain in accuracy and up to $7.7\%$ gain in fidelity compared to the naive attack. Second, we counter the proposed attack with a noise-based defense mechanism that thwarts fingerprinting by adding noise to the specified performance metrics. The proposed defense strategy reduces the attack's accuracy and fidelity by up to $9.8\%$ and $4.8\%$, respectively (on medium-sized model extraction). Third, we show that the proposed defense induces a fundamental trade-off between the level of protection and system goodput, achieving configurable and significant victim model extraction protection while maintaining acceptable goodput ($>80\%$). We implement the proposed defense in a real system with plans to open source.
摘要
模型服务系统在实时网络应用中变得越来越流行,特别是在用户发送查询并指定需要的性能指标(例如精度和响应时间)后,服务器根据指定的指标从后端维护的模型 zoo 中提供查询结果。这篇论文检查这些系统的安全性,特别是对于模型提取攻击的Robustness。现有的黑盒攻击假设可以重复地选择服务器上的单个模型来进行推理请求。现代推理服务系统破坏了这一假设,因此无法直接应用于提取受害模型。攻击者无法确定她正在互动的是哪个模型。为此,我们首先提出了一种高效的询问算法,使得攻击者可以轻松地触发所需的模型。我们显示,使用我们的询问算法可以在$1\%$的精度和准确度下提取模型,并且可以在$14.6\%$的精度和$7.7\%$的准确度上提高模型提取的精度和准确度,相比之下 Naive 攻击。其次,我们采用噪音基的防御机制,将指定性能指标添加噪音,以防止指纹。我们的防御策略可以在中等模型提取 task 下 reducuce 攻击的精度和准确度为$9.8\%$和$4.8\%$。最后,我们显示了我们的防御机制存在可配置的质量和系统性能之间的负面冲击,可以在保持可接受的系统性能($>80\%$)的情况下实现可靠的受害模型提取保护。我们已经实现了我们的防御机制,计划将其开源。
Fighting the disagreement in Explainable Machine Learning with consensus
paper_authors: Antonio Jesus Banegas-Luna, Carlos Martınez-Cortes, Horacio Perez-Sanchez
For: 本研究旨在解释机器学习模型的内部工作方式,以提高模型的可解释性。* Methods: 本研究使用了多种可解释性算法,包括本研究所开发的一种新的函数,以解释五种机器学习模型。* Results: 研究结果显示,提出的函数比其他函数更公正,提供了更一致和准确的解释。Abstract
Machine learning (ML) models are often valued by the accuracy of their predictions. However, in some areas of science, the inner workings of models are as relevant as their accuracy. To understand how ML models work internally, the use of interpretability algorithms is the preferred option. Unfortunately, despite the diversity of algorithms available, they often disagree in explaining a model, leading to contradictory explanations. To cope with this issue, consensus functions can be applied once the models have been explained. Nevertheless, the problem is not completely solved because the final result will depend on the selected consensus function and other factors. In this paper, six consensus functions have been evaluated for the explanation of five ML models. The models were previously trained on four synthetic datasets whose internal rules were known in advance. The models were then explained with model-agnostic local and global interpretability algorithms. Finally, consensus was calculated with six different functions, including one developed by the authors. The results demonstrated that the proposed function is fairer than the others and provides more consistent and accurate explanations.
摘要
机器学习(ML)模型的价值通常以其预测准确率来衡量。然而,在某些科学领域中,模型的内部机制与其准确率同样重要。为了理解 ML 模型内部是如何工作的,使用可解释性算法是首选方案。遗憾的是,尽管可用的算法种类繁多,它们在解释同一个模型时经常不一致,导致相互矛盾的解释。为应对这一问题,可以在模型被解释之后应用共识函数。然而,问题并未完全解决,因为最终结果仍取决于所选的共识函数及其他因素。在这篇论文中,我们评估了六种共识函数对五个 ML 模型的解释效果。这些模型事先在四个内部规则已知的合成数据集上训练,然后用模型无关的局部与全局可解释性算法进行解释,最后用六种不同的函数(包括作者提出的一种)计算共识。结果表明,所提出的函数比其他函数更公平,并能提供更一致、更准确的解释。
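One simple consensus function, shown for illustration only, normalises each explainer's feature-importance vector and averages them; the paper evaluates six functions, including its own, which need not coincide with this choice.

```python
# One illustrative consensus function over several explainers' attributions.
import numpy as np

def consensus_mean(importances):
    """importances: dict explainer_name -> per-feature attribution values."""
    stacked = []
    for vals in importances.values():
        v = np.abs(np.asarray(vals, dtype=float))
        stacked.append(v / v.sum())                 # put explainers on a common scale
    return np.mean(stacked, axis=0)

explanations = {                                    # made-up attributions for 4 features
    "perm_importance": [0.10, 0.60, 0.05, 0.25],
    "shap_like": [0.20, 0.50, 0.10, 0.20],
    "lime_like": [0.05, 0.70, 0.05, 0.20],
}
print(consensus_mean(explanations))                 # consensus importance of the 4 features
```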
Using BOLD-fMRI to Compute the Respiration Volume per Time (RTV) and Respiration Variation (RV) with Convolutional Neural Networks (CNN) in the Human Connectome Development Cohort
results: 研究结果表明,CNN模型可以从休息BOLD信号中捕捉有用的特征,并重建实际的RV和RVT时间序列。Abstract
In many fMRI studies, respiratory signals are unavailable or do not have acceptable quality. Consequently, the direct removal of low-frequency respiratory variations from BOLD signals is not possible. This study proposes a one-dimensional CNN model for reconstruction of two respiratory measures, RV and RVT. Results show that a CNN can capture informative features from resting BOLD signals and reconstruct realistic RV and RVT timeseries. It is expected that application of the proposed method will lower the cost of fMRI studies, reduce complexity, and decrease the burden on participants as they will not be required to wear a respiratory bellows.
摘要
很多fMRI研究中的呼吸信号不可用或者质量不良。因此,直接从BOLD信号中除去低频呼吸变化是不可能的。本研究提出了一种一维 convolutional neural network(CNN)模型,用于重建两个呼吸指标:RV和RVT。结果显示,CNN可以从休息BOLD信号中捕捉有用的特征,重建真实的RV和RVT时间序列。预计该方法的应用将降低fMRI研究的成本,降低复杂性,并减少参与者的负担,因为他们不需要穿戴呼吸膜。
NeuBTF: Neural fields for BTF encoding and transfer
results: 该方法可以在多种 sintetic和实际材料上达到竞争性的压缩率,并且可以通过引导图像来控制神经网络的输出。Abstract
Neural material representations are becoming a popular way to represent materials for rendering. They are more expressive than analytic models and occupy less memory than tabulated BTFs. However, existing neural materials are immutable, meaning that their output for a certain query of UVs, camera, and light vector is fixed once they are trained. While this is practical when there is no need to edit the material, it can become very limiting when the fragment of the material used for training is too small or not tileable, which frequently happens when the material has been captured with a gonioreflectometer. In this paper, we propose a novel neural material representation which jointly tackles the problems of BTF compression, tiling, and extrapolation. At test time, our method uses a guidance image as input to condition the neural BTF to the structural features of this input image. Then, the neural BTF can be queried as a regular BTF using UVs, camera, and light vectors. Every component in our framework is purposefully designed to maximize BTF encoding quality at minimal parameter count and computational complexity, achieving competitive compression rates compared with previous work. We demonstrate the results of our method on a variety of synthetic and captured materials, showing its generality and capacity to learn to represent many optical properties.
摘要
神经材料表示法是现代渲染中广泛应用的一种表示方法。它比分析模型更加表达力,且占用内存更少,但现有的神经材料都是不可变的,意味着它们的输出对于特定的UV、摄像机和光量向量的训练后就是固定的。这在材料的预测中是有用的,但在材料需要编辑时可能变得非常限制性。在这篇论文中,我们提出了一种新的神经材料表示方法,该方法同时解决了BTF压缩、瓦片和推导问题。在测试时,我们使用导航图像作为输入,通过conditioning神经BTF于这个输入图像的结构特征来控制神经BTF。然后,神经BTF可以被查询作为普通BTF使用UV、摄像机和光量向量。我们的框架中每个组件都是为最大化BTF编码质量而设计,而且减少参数计数和计算复杂度,与之前的工作相比,我们的方法实现了竞争力的压缩率。我们在多种 sintetic和捕捉的材料上进行了试验,展示了我们的方法的通用性和能力学习表示多种光学性质。
results: 我们在这些方法中引入了一种变量形式,基于时间反转的扩散过程中的路径空间测量差异。这种抽象视角导致了可优化的梯度下降算法,并包含了先前的目标作为特殊情况。此外,我们还可以考虑不同于倒卡劳布拉迪弗分布的差异,以避免模式塌缩。例如,我们提出了对数差异损失函数,它在数值上显示了优化性和改进性。Abstract
Recently, a series of papers proposed deep learning-based approaches to sample from unnormalized target densities using controlled diffusion processes. In this work, we identify these approaches as special cases of the Schr\"odinger bridge problem, seeking the most likely stochastic evolution between a given prior distribution and the specified target. We further generalize this framework by introducing a variational formulation based on divergences between path space measures of time-reversed diffusion processes. This abstract perspective leads to practical losses that can be optimized by gradient-based algorithms and includes previous objectives as special cases. At the same time, it allows us to consider divergences other than the reverse Kullback-Leibler divergence that is known to suffer from mode collapse. In particular, we propose the so-called log-variance loss, which exhibits favorable numerical properties and leads to significantly improved performance across all considered approaches.
摘要
最近,一系列论文提出了基于深度学习的方法来从不正规Target概率分布中采样。在这篇文章中,我们将这些方法定义为Schrödinger大桥问题的特殊情况,寻找从给定的先验分布到指定的Target概率分布的最有可能的杂化过程。我们进一步总结了这个框架,通过在Path空间测度上引入减法,从而得到了一种可优化的变分形式。这种抽象的视角允许我们考虑其他than reverse Kullback-Leibler divergence的异同,这种异同 known to suffer from mode collapse。特别是,我们提议使用Log-variance loss,它在数值上具有优秀的性质,并在所有考虑的方法中带来显著改进。
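For context, the log-variance divergence between path measures that this family of losses builds on is usually written as below, with the variance taken under a reference measure W; stating it this way is background from the related literature, and the paper's exact instantiation may differ in its choice of reference measure and parametrization.

```latex
% Log-variance divergence between path measures P and Q with respect to a
% reference measure W (generic definition, not necessarily the paper's exact loss):
\begin{equation*}
  D^{W}_{\mathrm{LV}}(\mathbb{P}, \mathbb{Q})
  \;=\; \mathrm{Var}_{W}\!\left[ \log \frac{\mathrm{d}\mathbb{P}}{\mathrm{d}\mathbb{Q}} \right],
\end{equation*}
% which is zero when the two path measures agree.
```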
paper_authors: Jiwoong Choi, Minkyu Kim, Daehyun Ahn, Taesu Kim, Yulhwa Kim, Dongwon Jo, Hyesung Jeon, Jae-Joon Kim, Hyungjun Kim
for: 这 paper 旨在探讨将 Stable Diffusion 模型部署到移动设备上,以便实现高精度图像生成。
methods: 该 paper 使用 TensorFlow Lite 框架来实现移动设备上的 Stable Diffusion 部署,并支持 iOS 和 Android 设备。
results: 该 paper 实现的 Mobile Stable Diffusion 可以在 Android 设备上 achieve 512x512 图像生成的推理延迟时间小于 7 秒,并且可以在移动 GPU 上实现。Abstract
The emergence of diffusion models has greatly broadened the scope of high-fidelity image synthesis, resulting in notable advancements in both practical implementation and academic research. With the active adoption of the model in various real-world applications, the need for on-device deployment has grown considerably. However, deploying large diffusion models such as Stable Diffusion with more than one billion parameters to mobile devices poses distinctive challenges due to the limited computational and memory resources, which may vary according to the device. In this paper, we present the challenges and solutions for deploying Stable Diffusion on mobile devices with TensorFlow Lite framework, which supports both iOS and Android devices. The resulting Mobile Stable Diffusion achieves the inference latency of smaller than 7 seconds for a 512x512 image generation on Android devices with mobile GPUs.
摘要
Diffusion模型的出现已经极大地扩大了高精度图像生成的范围,导致了实践部署和学术研究中的重要进步。然而,将大型Diffusion模型,如Stable Diffusion,deploy到移动设备上具有限制的计算和内存资源的问题。在这篇文章中,我们介绍了将Stable Diffusion部署到移动设备上的挑战和解决方案,使用TensorFlow Lite框架支持iOS和Android设备。我们的Mobile Stable Diffusion实现了512x512像素生成的推理延迟低于7秒钟在Android设备上。
results: 在语言模型和下游任务中进行综合实验 validate了TinT模型的内部精细调整过程,并证明了大型预训练语言模型可以执行复杂的子任务。例如,even with a limited one-step budget, we observe TinT for a OPT-125M model improves performance by 4-16% absolute on average compared to OPT-125M。Abstract
Recent works attribute the capability of in-context learning (ICL) in large pre-trained language models to implicitly simulating and fine-tuning an internal model (e.g., linear or 2-layer MLP) during inference. However, such constructions require large memory overhead, which makes simulation of more sophisticated internal models intractable. In this work, we propose an efficient construction, Transformer in Transformer (in short, TinT), that allows a transformer to simulate and fine-tune complex models internally during inference (e.g., pre-trained language models). In particular, we introduce innovative approximation techniques that allow a TinT model with less than 2 billion parameters to simulate and fine-tune a 125 million parameter transformer model within a single forward pass. TinT accommodates many common transformer variants and its design ideas also improve the efficiency of past instantiations of simple models inside transformers. We conduct end-to-end experiments to validate the internal fine-tuning procedure of TinT on various language modeling and downstream tasks. For example, even with a limited one-step budget, we observe TinT for a OPT-125M model improves performance by 4-16% absolute on average compared to OPT-125M. These findings suggest that large pre-trained language models are capable of performing intricate subroutines. To facilitate further work, a modular and extensible codebase for TinT is included.
摘要
近期研究将大型预训练语言模型的上下文学习(ICL)能力归因于其在推理过程中隐式地模拟并微调一个内部模型(例如线性模型或两层 MLP)。然而,这类构造需要很大的内存开销,使得对更复杂内部模型的模拟变得不可行。在这项工作中,我们提出了一种高效的构造——Transformer in Transformer(简称 TinT),它允许一个 Transformer 在推理过程中内部模拟并微调复杂模型(例如预训练语言模型)。具体来说,我们提出了创新的近似技术,使得参数量不到 20 亿的 TinT 模型可以在单次前向传播中模拟并微调一个 1.25 亿参数的 Transformer 模型。TinT 支持许多常见的 Transformer 变体,其设计思想也提高了以往在 Transformer 内部实例化简单模型的效率。我们通过端到端实验,在多种语言建模和下游任务上验证了 TinT 的内部微调过程。例如,即使只有一步微调的预算,我们也观察到针对 OPT-125M 模型的 TinT 相比 OPT-125M 平均绝对提升 4-16% 的性能。这些发现表明,大型预训练语言模型能够执行复杂的子过程。为便于后续工作,我们还提供了模块化、可扩展的 TinT 代码库。
Fitting an ellipsoid to a quadratic number of random points
results: 这个论文证明了当 $n \leq d^2 / C$,其中 $C > 0$ 是一个可能很大的常数, THEN 问题 $(\mathrm{P})$ 有高概率是可行的。Abstract
We consider the problem $(\mathrm{P})$ of fitting $n$ standard Gaussian random vectors in $\mathbb{R}^d$ to the boundary of a centered ellipsoid, as $n, d \to \infty$. This problem is conjectured to have a sharp feasibility transition: for any $\varepsilon > 0$, if $n \leq (1 - \varepsilon) d^2 / 4$ then $(\mathrm{P})$ has a solution with high probability, while $(\mathrm{P})$ has no solutions with high probability if $n \geq (1 + \varepsilon) d^2 /4$. So far, only a trivial bound $n \geq d^2 / 2$ is known on the negative side, while the best results on the positive side assume $n \leq d^2 / \mathrm{polylog}(d)$. In this work, we improve over previous approaches using a key result of Bartl & Mendelson on the concentration of Gram matrices of random vectors under mild assumptions on their tail behavior. This allows us to give a simple proof that $(\mathrm{P})$ is feasible with high probability when $n \leq d^2 / C$, for a (possibly large) constant $C > 0$.
results: Empirical evaluations validate the PlanE architectures, which achieve multiple state-of-the-art results on well-known planar graph benchmarks while learning complete planar graph invariants efficiently.
Abstract
Graph neural networks are prominent models for representation learning over graphs, where the idea is to iteratively compute representations of nodes of an input graph through a series of transformations in such a way that the learned graph function is isomorphism invariant on graphs, which makes the learned representations graph invariants. On the other hand, it is well-known that graph invariants learned by this class of models are incomplete: there are pairs of non-isomorphic graphs which cannot be distinguished by standard graph neural networks. This is unsurprising given the computational difficulty of graph isomorphism testing on general graphs, but the situation begs to differ for special graph classes, for which efficient graph isomorphism testing algorithms are known, such as planar graphs. The goal of this work is to design architectures for efficiently learning complete invariants of planar graphs. Inspired by the classical planar graph isomorphism algorithm of Hopcroft and Tarjan, we propose PlanE as a framework for planar representation learning. PlanE includes architectures which can learn complete invariants over planar graphs while remaining practically scalable. We empirically validate the strong performance of the resulting model architectures on well-known planar graph benchmarks, achieving multiple state-of-the-art results.
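As a small illustration of the objects involved (this is not the PlanE architecture, just the building blocks it rests on), the snippet below uses networkx to test planarity, read off a combinatorial embedding of the kind classical planar isomorphism algorithms work with, and compute a 1-WL hash, the efficiently computable but incomplete invariant that standard GNNs are bounded by.

```python
import networkx as nx

G = nx.circular_ladder_graph(8)      # a planar graph (a prism)
H = nx.complete_graph(5)             # K5 is not planar

for name, graph in [("circular ladder", G), ("K5", H)]:
    is_planar, embedding = nx.check_planarity(graph)
    print(name, "planar:", is_planar)
    if is_planar:
        # The rotation-system embedding (cyclic neighbor order around each node)
        # is the structure that planar isomorphism algorithms exploit.
        print("  clockwise neighbors of node 0:", list(embedding.neighbors_cw_order(0)))

# 1-WL hash: an efficient graph invariant, but an incomplete one in general.
print("WL hash of G:", nx.weisfeiler_lehman_graph_hash(G, iterations=3))
```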
Learning Mixtures of Gaussians Using the DDPM Objective
results: The paper proves that gradient descent can efficiently learn Gaussian mixture models in two settings: 1) with random initialization, it learns mixtures of two spherical Gaussians in $d$ dimensions with $1/\text{poly}(d)$-separated centers; 2) with a warm start, it learns mixtures of $K$ spherical Gaussians with $\Omega(\sqrt{\log(\min(K,d))})$-separated centers.
Abstract
Recent works have shown that diffusion models can learn essentially any distribution provided one can perform score estimation. Yet it remains poorly understood under what settings score estimation is possible, let alone when practical gradient-based algorithms for this task can provably succeed. In this work, we give the first provably efficient results along these lines for one of the most fundamental distribution families, Gaussian mixture models. We prove that gradient descent on the denoising diffusion probabilistic model (DDPM) objective can efficiently recover the ground truth parameters of the mixture model in the following two settings: 1) We show gradient descent with random initialization learns mixtures of two spherical Gaussians in $d$ dimensions with $1/\text{poly}(d)$-separated centers. 2) We show gradient descent with a warm start learns mixtures of $K$ spherical Gaussians with $\Omega(\sqrt{\log(\min(K,d))})$-separated centers. A key ingredient in our proofs is a new connection between score-based methods and two other approaches to distribution learning, the EM algorithm and spectral methods.
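To make the setting concrete, here is a toy version of the learning problem (an illustration of the objective, not the paper's analysis): gradient descent on a single-noise-level denoising loss, where the denoiser is the Bayes-optimal form for a symmetric two-component spherical mixture with a learnable center. The noise level, optimizer, and step counts are arbitrary choices.

```python
import torch

torch.manual_seed(0)
d, sigma, batch, n_steps = 16, 1.0, 256, 2000
mu_star = torch.randn(d)                              # ground-truth center of the +/- mixture

theta = (0.1 * torch.randn(d)).requires_grad_()       # random initialization
opt = torch.optim.Adam([theta], lr=0.05)

def denoiser(x_t, theta, sigma):
    # E[x_0 | x_t] when x_0 ~ 0.5*N(theta, I) + 0.5*N(-theta, I) and x_t = x_0 + sigma * eps.
    s2 = sigma ** 2
    w = torch.tanh(x_t @ theta / (1.0 + s2))          # soft assignment to the +theta component
    return (x_t + s2 * w.unsqueeze(-1) * theta) / (1.0 + s2)

for step in range(n_steps):
    sign = torch.randint(0, 2, (batch, 1)).float() * 2 - 1
    x0 = sign * mu_star + torch.randn(batch, d)       # samples from the true mixture
    xt = x0 + sigma * torch.randn(batch, d)           # forward (noising) process
    loss = ((denoiser(xt, theta, sigma) - x0) ** 2).mean()   # DDPM-style denoising objective
    opt.zero_grad(); loss.backward(); opt.step()

err = min((theta.detach() - mu_star).norm().item(), (theta.detach() + mu_star).norm().item())
print(f"parameter error (up to sign): {err:.3f}")
```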
Neural Hilbert Ladders: Multi-Layer Neural Networks in Function Space
results: The authors prove a correspondence between functions expressed by multi-layer NNs and NHLs, and provide generalization guarantees when the associated complexity measure is controlled. In addition, they derive the evolution of the NHL as the dynamics of multiple random fields and show examples of depth separation in NHLs under different activation functions.
Abstract
The characterization of the function spaces explored by neural networks (NNs) is an important aspect of deep learning theory. In this work, we view a multi-layer NN with arbitrary width as defining a particular hierarchy of reproducing kernel Hilbert spaces (RKHSs), named a Neural Hilbert Ladder (NHL). This allows us to define a function space and a complexity measure that generalize prior results for shallow NNs, and we then examine their theoretical properties and implications in several aspects. First, we prove a correspondence between functions expressed by L-layer NNs and those belonging to L-level NHLs. Second, we prove generalization guarantees for learning an NHL with the complexity measure controlled. Third, corresponding to the training of multi-layer NNs in the infinite-width mean-field limit, we derive an evolution of the NHL characterized as the dynamics of multiple random fields. Fourth, we show examples of depth separation in NHLs under ReLU and quadratic activation functions. Finally, we complement the theory with numerical results to illustrate the learning of RKHS in NN training.
paper_authors: Ziv Goldfeld, Dhrumil Patel, Sreejith Sreekumar, Mark M. Wilde
for: Estimating the amount of information and correlations present in a quantum system
methods: A variational quantum algorithm in which the entropy measures are parameterized by a quantum circuit and a classical neural network
results: Provides accurate estimates of the various entropy measures on the examples tested, making it a promising approach for use in downstream tasks.
Abstract
Entropy measures quantify the amount of information and correlations present in a quantum system. In practice, when the quantum state is unknown and only copies thereof are available, one must resort to the estimation of such entropy measures. Here we propose a variational quantum algorithm for estimating the von Neumann and R\'enyi entropies, as well as the measured relative entropy and measured R\'enyi relative entropy. Our approach first parameterizes a variational formula for the measure of interest by a quantum circuit and a classical neural network, and then optimizes the resulting objective over parameter space. Numerical simulations of our quantum algorithm are provided, using a noiseless quantum simulator. The algorithm provides accurate estimates of the various entropy measures for the examples tested, which renders it as a promising approach for usage in downstream tasks.
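For orientation, the target quantities have closed forms once the density matrix is known; the variational algorithm is aimed at the setting where only copies of the state are available. The sketch below simply computes the exact von Neumann and Rényi entropies of a small random mixed state with numpy, as reference values such an estimator would be compared against.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_density_matrix(dim):
    # rho = G G^dagger / Tr(G G^dagger) is a valid (PSD, unit-trace) mixed state.
    G = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def von_neumann_entropy(rho):
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]                 # drop numerically-zero eigenvalues
    return float(-np.sum(evals * np.log(evals)))

def renyi_entropy(rho, alpha):
    evals = np.linalg.eigvalsh(rho)
    return float(np.log(np.sum(evals ** alpha)) / (1.0 - alpha))

rho = random_density_matrix(4)                   # a two-qubit mixed state
print("von Neumann entropy:", von_neumann_entropy(rho))
print("Renyi-2 entropy:", renyi_entropy(rho, 2.0))
```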
results: Achieves sublinear regret, i.e., a vanishing mistake rate, against dominated or smoothed adversaries.
Abstract
We study an instance of online non-parametric classification in the realizable setting. In particular, we consider the classical 1-nearest neighbor algorithm, and show that it achieves sublinear regret - that is, a vanishing mistake rate - against dominated or smoothed adversaries in the realizable setting.
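A minimal online 1-nearest-neighbor loop in the realizable setting looks as follows; the uniform stream and the threshold labeling rule are stand-ins of my choosing for a smoothed adversary, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
target = lambda x: int(x[0] + x[1] > 1.0)        # unknown ground-truth labels (realizable setting)

memory_x, memory_y, mistakes = [], [], 0
for t in range(1, 2001):
    x = rng.uniform(0.0, 1.0, size=2)            # smoothed instance at round t
    if memory_x:
        dists = np.linalg.norm(np.array(memory_x) - x, axis=1)
        y_hat = memory_y[int(np.argmin(dists))]  # predict the label of the nearest stored point
    else:
        y_hat = 0                                # arbitrary prediction on the first round
    y = target(x)                                # true label revealed after predicting
    mistakes += int(y_hat != y)
    memory_x.append(x); memory_y.append(y)
    if t % 500 == 0:
        print(f"round {t}: mistake rate = {mistakes / t:.3f}")   # should shrink toward zero
```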
Analyzing and Improving Greedy 2-Coordinate Updates for Equality-Constrained Optimization via Steepest Descent in the 1-Norm
methods: The paper exploits a connection between the greedy 2-coordinate update and equality-constrained steepest descent in the 1-norm, analyzed under a proximal Polyak-Lojasiewicz assumption. For problems with both bound and summation constraints, it uses bound- and summation-constrained steepest descent in the L1-norm.
results: The new rule guarantees more progress per iteration than previous rules and can be computed in $O(n \log n)$ time. The paper also gives a convergence rate for greedy selection that is faster than random selection and independent of the problem dimension $n$.
Abstract
We consider minimizing a smooth function subject to a summation constraint over its variables. By exploiting a connection between the greedy 2-coordinate update for this problem and equality-constrained steepest descent in the 1-norm, we give a convergence rate for greedy selection under a proximal Polyak-Lojasiewicz assumption that is faster than random selection and independent of the problem dimension $n$. We then consider minimizing with both a summation constraint and bound constraints, as arises in the support vector machine dual problem. Existing greedy rules for this setting either guarantee trivial progress only or require $O(n^2)$ time to compute. We show that bound- and summation-constrained steepest descent in the L1-norm guarantees more progress per iteration than previous rules and can be computed in only $O(n \log n)$ time.
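A sketch of the greedy 2-coordinate rule on a sum-constrained quadratic (the toy objective, step size, and the absence of bound constraints are simplifications of my own): pick the coordinates with the largest and smallest partial derivatives and shift mass between them, which preserves the sum and coincides with steepest descent in the 1-norm over sum-preserving directions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n))
A = A @ A.T / n + np.eye(n)                      # random positive-definite quadratic
b = rng.standard_normal(n)
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b

x = np.full(n, 1.0 / n)                          # feasible start: coordinates sum to 1
L = np.linalg.eigvalsh(A).max()                  # smoothness constant used for the step size

for it in range(201):
    g = grad(x)
    i, j = int(np.argmax(g)), int(np.argmin(g))  # greedy pair: largest vs smallest partial derivative
    step = (g[i] - g[j]) / (2 * L)               # safe step, since (e_i - e_j)^T A (e_i - e_j) <= 2L
    x[i] -= step                                 # move mass from coordinate i to coordinate j;
    x[j] += step                                 # the sum of x is unchanged
    if it % 50 == 0:
        print(f"iter {it}: f(x) = {f(x):.4f}, sum(x) = {x.sum():.4f}")
```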
Don’t freeze: Finetune encoders for better Self-Supervised HAR
results: Not freezing the learned representation yields substantial performance gains across all four investigated datasets and pretext tasks, and the gains grow as the amount of labelled data shrinks. The effect is present whether the pretext task is carried out on the Capture24 dataset or directly on unlabelled data from the target dataset.
Abstract
Recently self-supervised learning has been proposed in the field of human activity recognition as a solution to the labelled data availability problem. The idea is that by using pretext tasks such as reconstruction or contrastive predictive coding, useful representations can be learned that can then be used for classification. Those approaches follow the pretrain, freeze and fine-tune procedure. In this paper we show how a simple change - not freezing the representation - leads to substantial performance gains across pretext tasks. The improvement was found in all four investigated datasets and across all four pretext tasks and is inversely proportional to the amount of labelled data. Moreover, the effect is present whether the pretext task is carried out on the Capture24 dataset or directly on unlabelled data of the target dataset.
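The change being advocated amounts to one switch in a standard fine-tuning loop: whether the pretrained encoder's parameters receive gradients. A schematic PyTorch sketch with placeholder modules follows; the encoder layers, window size, and class count are assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

# Placeholder pretrained encoder and classification head for a HAR model.
encoder = nn.Sequential(nn.Conv1d(3, 32, 5, padding=2), nn.ReLU(),
                        nn.AdaptiveAvgPool1d(1), nn.Flatten())   # assumed pretrained via a pretext task
classifier = nn.Linear(32, 6)                                    # 6 activity classes (assumed)

FREEZE_ENCODER = False          # the paper's finding: keeping this False gives large gains
if FREEZE_ENCODER:
    for p in encoder.parameters():
        p.requires_grad = False

params = [p for p in list(encoder.parameters()) + list(classifier.parameters()) if p.requires_grad]
opt = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 3, 100)      # dummy batch: 8 windows of 3-axis accelerometer data
y = torch.randint(0, 6, (8,))
loss = loss_fn(classifier(encoder(x)), y)
opt.zero_grad(); loss.backward(); opt.step()   # with FREEZE_ENCODER=False the encoder is updated too
```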
Coupled Gradient Flows for Strategic Non-Local Distribution Shift
results: When the algorithm retrains via gradient descent, the retraining procedure converges asymptotically to a steady state, with explicit rates in terms of the model parameters in both finite and infinite dimensions. The framework also captures well-documented forms of distribution shift, such as polarization and disparate impacts, that simpler models cannot.
Abstract
We propose a novel framework for analyzing the dynamics of distribution shift in real-world systems that captures the feedback loop between learning algorithms and the distributions on which they are deployed. Prior work largely models feedback-induced distribution shift as adversarial or via an overly simplistic distribution-shift structure. In contrast, we propose a coupled partial differential equation model that captures fine-grained changes in the distribution over time by accounting for complex dynamics that arise due to strategic responses to algorithmic decision-making, non-local endogenous population interactions, and other exogenous sources of distribution shift. We consider two common settings in machine learning: cooperative settings with information asymmetries, and competitive settings where a learner faces strategic users. For both of these settings, when the algorithm retrains via gradient descent, we prove asymptotic convergence of the retraining procedure to a steady-state, both in finite and in infinite dimensions, obtaining explicit rates in terms of the model parameters. To do so we derive new results on the convergence of coupled PDEs that extends what is known on multi-species systems. Empirically, we show that our approach captures well-documented forms of distribution shifts like polarization and disparate impacts that simpler models cannot capture.
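A crude discretization of the feedback loop (entirely an illustrative toy, not the paper's coupled PDE model; the strategic-response rule, thresholds, and step sizes are invented): a threshold classifier is retrained by gradient descent while the user population simultaneously shifts its features toward acceptance, and the coupled iteration settles toward a steady state.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=5000)                        # 1-D features: a particle view of the density
labels = (x + rng.normal(0.0, 1.0, size=5000) > 1.0).astype(float)   # fixed qualification labels
theta, lr_model, lr_users = 0.0, 0.5, 0.05                 # classifier accepts if x > theta

def theta_gradient(theta, x, labels):
    # d/d(theta) of the mean binary cross-entropy with logits z = x - theta.
    p = 1.0 / (1.0 + np.exp(-(x - theta)))
    return np.mean(labels - p)

for t in range(1, 201):
    theta -= lr_model * theta_gradient(theta, x, labels)   # learner: gradient-descent retraining
    gap = theta - x
    movers = (gap > 0) & (gap < 1.0)                       # users just below the threshold game their features
    x[movers] += lr_users * gap[movers]                    # strategic response deforms the feature density
    if t % 50 == 0:
        print(f"iter {t}: theta = {theta:.3f}, acceptance rate = {np.mean(x > theta):.3f}")
```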
Improving Language Plasticity via Pretraining with Active Forgetting
paper_authors: Yihong Chen, Kelly Marchisio, Roberta Raileanu, David Ifeoluwa Adelani, Pontus Stenetorp, Sebastian Riedel, Mikel Artetxe
for: Making the capabilities of pretrained language models (PLMs) accessible in new languages.
methods: Uses an active forgetting mechanism during pretraining so that PLMs can quickly adapt to new languages.
results: During language adaptation, the forgetting mechanism improves the PLM's ability to learn new embeddings and yields better performance in low-data regimes.
Abstract
Pretrained language models (PLMs) are today the primary model for natural language processing. Despite their impressive downstream performance, it can be difficult to apply PLMs to new languages, a barrier to making their capabilities universally accessible. While prior work has shown it possible to address this issue by learning a new embedding layer for the new language, doing so is both data and compute inefficient. We propose to use an active forgetting mechanism during pretraining, as a simple way of creating PLMs that can quickly adapt to new languages. Concretely, by resetting the embedding layer every K updates during pretraining, we encourage the PLM to improve its ability of learning new embeddings within a limited number of updates, similar to a meta-learning effect. Experiments with RoBERTa show that models pretrained with our forgetting mechanism not only demonstrate faster convergence during language adaptation but also outperform standard ones in a low-data regime, particularly for languages that are distant from English.
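The mechanism itself is a small change to a pretraining loop: periodically re-initialize the token-embedding weights while leaving the transformer body untouched. A schematic PyTorch sketch with a placeholder model and a dummy objective follows; the module sizes, the reset period K, and the init scale are assumptions, not the RoBERTa setup used in the paper.

```python
import torch
import torch.nn as nn

vocab_size, d_model, K = 1000, 64, 100        # K = embedding reset period, in update steps

embedding = nn.Embedding(vocab_size, d_model)
body = nn.TransformerEncoder(nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
lm_head = nn.Linear(d_model, vocab_size)
params = list(embedding.parameters()) + list(body.parameters()) + list(lm_head.parameters())
opt = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(1, 501):
    tokens = torch.randint(0, vocab_size, (8, 32))         # dummy batch standing in for pretraining data
    logits = lm_head(body(embedding(tokens)))
    loss = loss_fn(logits.reshape(-1, vocab_size), tokens.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()

    if step % K == 0:
        # Active forgetting: reset only the embedding layer so the body learns to cope
        # with fresh embeddings (a fuller setup might also reset its optimizer state).
        nn.init.normal_(embedding.weight, mean=0.0, std=0.02)
```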
Theory of Mind as Intrinsic Motivation for Multi-Agent Reinforcement Learning
results: Preliminary experiments in a mixed cooperative-competitive environment indicate that the belief-prediction intrinsic reward benefits the agents.
Abstract
The ability to model the mental states of others is crucial to human social intelligence, and can offer similar benefits to artificial agents with respect to the social dynamics induced in multi-agent settings. We present a method of grounding semantically meaningful, human-interpretable beliefs within policies modeled by deep networks. We then consider the task of 2nd-order belief prediction. We propose that ability of each agent to predict the beliefs of the other agents can be used as an intrinsic reward signal for multi-agent reinforcement learning. Finally, we present preliminary empirical results in a mixed cooperative-competitive environment.
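Read concretely, the proposal can be sketched as an intrinsic-reward term added to each agent's environment reward; the tensors, the MSE choice, and the beta coefficient below are invented for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def total_reward(env_reward, predicted_other_belief, other_belief, beta=0.1):
    """Extrinsic reward plus a 2nd-order belief-prediction bonus.

    predicted_other_belief: this agent's prediction of the other agent's belief state.
    other_belief:           the other agent's actual belief representation (treated as a target).
    beta:                   assumed weight of the intrinsic term.
    """
    prediction_error = F.mse_loss(predicted_other_belief, other_belief.detach())
    return env_reward - beta * prediction_error      # better theory of mind -> larger reward

# Toy usage with random stand-in belief vectors.
pred = torch.randn(8, 16)
actual = torch.randn(8, 16)
print(total_reward(torch.tensor(1.0), pred, actual))
```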
A novel approach for predicting epidemiological forecasting parameters based on real-time signals and Data Assimilation
for: Predicting epidemiological parameters by integrating new real-time signals from various sources, such as social media-based population density maps and Air Quality data.
methods: Uses an ensemble of Convolutional Neural Network (CNN) models over various data sources, with a fusion methodology to build robust predictions, and applies data assimilation to estimate the state of the system from the fused CNN predictions.
results: Improves the performance and flexibility of COVID-19 outbreak forecasting in London, with better accuracy and stability than standard compartmental models such as SEIR.
Abstract
This paper proposes a novel approach to predict epidemiological parameters by integrating new real-time signals from various sources of information, such as novel social media-based population density maps and Air Quality data. We implement an ensemble of Convolutional Neural Networks (CNN) models using various data sources and fusion methodology to build robust predictions and simulate several dynamic parameters that could improve the decision-making process for policymakers. Additionally, we used data assimilation to estimate the state of our system from fused CNN predictions. The combination of meteorological signals and social media-based population density maps improved the performance and flexibility of our prediction of the COVID-19 outbreak in London. While the proposed approach outperforms standard models, such as compartmental models traditionally used in disease forecasting (SEIR), generating robust and consistent predictions allows us to increase the stability of our model while increasing its accuracy.
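A stripped-down sketch of the fusion-plus-assimilation idea, using scalar states and made-up numbers rather than the paper's CNN pipeline: ensemble predictions are averaged into a forecast, and an incoming real-time observation is blended in with a Kalman-style gain.

```python
import numpy as np

def assimilate(forecast, forecast_var, obs, obs_var):
    # Scalar Kalman update: weight forecast and observation by their uncertainties.
    gain = forecast_var / (forecast_var + obs_var)
    return forecast + gain * (obs - forecast), (1.0 - gain) * forecast_var

# Pretend next-day infection-rate forecasts from CNNs trained on different fused signals.
ensemble_preds = np.array([0.112, 0.098, 0.120, 0.105])
forecast, forecast_var = ensemble_preds.mean(), ensemble_preds.var()

observation, obs_var = 0.101, 4e-4               # later-arriving real-time signal and its noise level
analysis, analysis_var = assimilate(forecast, forecast_var, observation, obs_var)
print(f"forecast={forecast:.4f}  analysis={analysis:.4f}  variance {forecast_var:.5f} -> {analysis_var:.5f}")
```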
AVSegFormer: Audio-Visual Segmentation with Transformer
results: Extensive experiments show that AVSegFormer achieves state-of-the-art results on the AVS benchmark. The code is available at https://github.com/vvvb-github/AVSegFormer.
Abstract
The combination of audio and vision has long been a topic of interest in the multi-modal community. Recently, a new audio-visual segmentation (AVS) task has been introduced, aiming to locate and segment the sounding objects in a given video. This task demands audio-driven pixel-level scene understanding for the first time, posing significant challenges. In this paper, we propose AVSegFormer, a novel framework for AVS tasks that leverages the transformer architecture. Specifically, we introduce audio queries and learnable queries into the transformer decoder, enabling the network to selectively attend to interested visual features. Besides, we present an audio-visual mixer, which can dynamically adjust visual features by amplifying relevant and suppressing irrelevant spatial channels. Additionally, we devise an intermediate mask loss to enhance the supervision of the decoder, encouraging the network to produce more accurate intermediate predictions. Extensive experiments demonstrate that AVSegFormer achieves state-of-the-art results on the AVS benchmark. The code is available at https://github.com/vvvb-github/AVSegFormer.
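The query design described above can be sketched with standard PyTorch modules; the dimensions, the number of queries, and the simple concatenation of audio and learnable queries are assumptions, and the audio-visual mixer and mask heads are omitted, so this is a rough outline rather than the released architecture.

```python
import torch
import torch.nn as nn

d_model, n_learnable, n_audio, B = 256, 100, 4, 2

decoder_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=3)
learnable_queries = nn.Parameter(torch.randn(n_learnable, d_model))
audio_proj = nn.Linear(128, d_model)               # project audio features into the query space

visual_feats = torch.randn(B, 56 * 56, d_model)    # flattened visual features (decoder memory)
audio_feats = torch.randn(B, n_audio, 128)         # per-frame audio embeddings

audio_queries = audio_proj(audio_feats)                                    # (B, n_audio, d_model)
queries = torch.cat([learnable_queries.unsqueeze(0).expand(B, -1, -1),
                     audio_queries], dim=1)                                # audio + learnable queries
decoded = decoder(tgt=queries, memory=visual_feats)                        # queries attend to visual features
print(decoded.shape)                               # torch.Size([2, 104, 256])
```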
SCITUNE: Aligning Large Language Models with Scientific Multimodal Instructions
results: Compared to models finetuned with machine-generated data only, LLaMA-SciTune surpasses human performance on average and in many sub-categories on the ScienceQA benchmark.
Abstract
Instruction finetuning is a popular paradigm to align large language models (LLM) with human intent. Despite its popularity, this idea is less explored in improving the LLMs to align existing foundation models with scientific disciplines, concepts and goals. In this work, we present SciTune as a tuning framework to improve the ability of LLMs to follow scientific multimodal instructions. To test our methodology, we use a human-generated scientific instruction tuning dataset and train a large multimodal model LLaMA-SciTune that connects a vision encoder and LLM for science-focused visual and language understanding. In comparison to the models that are finetuned with machine generated data only, LLaMA-SciTune surpasses human performance on average and in many sub-categories on the ScienceQA benchmark.