cs.LG - 2023-08-31

A Note on Randomized Kaczmarz Algorithm for Solving Doubly-Noisy Linear Systems

  • paper_url: http://arxiv.org/abs/2308.16904
  • repo_url: None
  • paper_authors: El Houcine Bergou, Soumia Boucherouite, Aritra Dutta, Xin Li, Anna Ma
  • for: Linear systems with noisy coefficient matrices and right-hand side vectors, and the need for efficient iterative solvers.
  • methods: Randomized Kaczmarz (RK) algorithm and its convergence analysis in the presence of both additive and multiplicative noise.
  • results: The paper provides a robust analysis of RK's convergence for noisy linear systems, without requiring knowledge of the noiseless coefficient matrix, and demonstrates the effectiveness of the method through comprehensive numerical experiments.
    Abstract Large-scale linear systems, $Ax=b$, frequently arise in practice and demand effective iterative solvers. Often, these systems are noisy due to operational errors or faulty data-collection processes. In the past decade, the randomized Kaczmarz (RK) algorithm has been studied extensively as an efficient iterative solver for such systems. However, the convergence study of RK in the noisy regime is limited and considers measurement noise in the right-hand side vector, $b$. Unfortunately, in practice, that is not always the case; the coefficient matrix $A$ can also be noisy. In this paper, we analyze the convergence of RK for noisy linear systems when the coefficient matrix, $A$, is corrupted with both additive and multiplicative noise, along with the noisy vector, $b$. In our analyses, the quantity $\tilde R=\| \tilde A^{\dagger} \|_2^2 \|\tilde A \|_F^2$ influences the convergence of RK, where $\tilde A$ represents a noisy version of $A$. We claim that our analysis is robust and realistically applicable, as we do not require information about the noiseless coefficient matrix, $A$, and considering different conditions on noise, we can control the convergence of RK. We substantiate our theoretical findings by performing comprehensive numerical experiments.
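As a rough illustration (a minimal numpy sketch, not the authors' code), the RK iteration on a noisy system $\tilde A x = \tilde b$ and the convergence-governing quantity $\tilde R$ look like this:

```python
import numpy as np

def randomized_kaczmarz(A_tilde, b_tilde, iters=5000, seed=0):
    """RK iteration on a (possibly noisy) system A_tilde x = b_tilde.
    Rows are sampled with probability proportional to their squared norm,
    as in the standard RK scheme."""
    rng = np.random.default_rng(seed)
    m, n = A_tilde.shape
    sq_norms = np.linalg.norm(A_tilde, axis=1) ** 2
    probs = sq_norms / sq_norms.sum()
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        # project the current iterate onto the hyperplane of row i
        x += (b_tilde[i] - A_tilde[i] @ x) / sq_norms[i] * A_tilde[i]
    return x

def r_tilde(A_tilde):
    """R_tilde = ||A_tilde^dagger||_2^2 * ||A_tilde||_F^2, the quantity
    that influences RK's convergence in the paper's analysis."""
    return (np.linalg.norm(np.linalg.pinv(A_tilde), 2) ** 2
            * np.linalg.norm(A_tilde, "fro") ** 2)
```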

Learning to Taste: A Multimodal Wine Dataset

  • paper_url: http://arxiv.org/abs/2308.16900
  • repo_url: None
  • paper_authors: Thoranna Bender, Simon Møe Sørensen, Alireza Kashani, K. Eldjarn Hjorleifsson, Grethe Hyldig, Søren Hauberg, Serge Belongie, Frederik Warburg
  • for: Studying the relations between visual perception, language, and flavor.
  • methods: A large multimodal wine dataset (WineSensed) comprising 897k wine label images and 824k wine reviews, plus more than 5k pairwise flavor distances obtained from a wine-tasting experiment with 256 participants.
  • results: A low-dimensional concept embedding algorithm that combines human experience with automatic machine similarity kernels; the shared embedding space improves coarse flavor classification (alcohol percentage, country, grape, price, rating) and aligns with the intricate human perception of flavor.
    Abstract We present WineSensed, a large multimodal wine dataset for studying the relations between visual perception, language, and flavor. The dataset encompasses 897k images of wine labels and 824k reviews of wines curated from the Vivino platform. It has over 350k unique vintages, annotated with year, region, rating, alcohol percentage, price, and grape composition. We obtained fine-grained flavor annotations on a subset by conducting a wine-tasting experiment with 256 participants who were asked to rank wines based on their similarity in flavor, resulting in more than 5k pairwise flavor distances. We propose a low-dimensional concept embedding algorithm that combines human experience with automatic machine similarity kernels. We demonstrate that this shared concept embedding space improves upon separate embedding spaces for coarse flavor classification (alcohol percentage, country, grape, price, rating) and aligns with the intricate human perception of flavor.

Federated Learning in UAV-Enhanced Networks: Joint Coverage and Convergence Time Optimization

  • paper_url: http://arxiv.org/abs/2308.16889
  • repo_url: None
  • paper_authors: Mariam Yahya, Setareh Maghsudi, Slawomir Stanczak
  • for: This study aims to enable federated learning (FL) in UAV-enhanced wireless networks with scarce energy resources.
  • methods: The study develops a model and solution based on multi-objective multi-armed bandit theory to maximize network coverage while minimizing FL delay. In addition, a solution tailored to large action sets and strict energy constraints at the UAVs is proposed, using a scalarized best-arm identification algorithm to find the arms that maximize the ratio of expected reward to expected energy cost.
  • results: Numerical results show the effectiveness of the approach and demonstrate that the proposed method can significantly improve the coverage of the wireless sensor network while minimizing the FL delay.
    Abstract Federated learning (FL) involves several devices that collaboratively train a shared model without transferring their local data. FL reduces the communication overhead, making it a promising learning method in UAV-enhanced wireless networks with scarce energy resources. Despite the potential, implementing FL in UAV-enhanced networks is challenging, as conventional UAV placement methods that maximize coverage increase the FL delay significantly. Moreover, the uncertainty and lack of a priori information about crucial variables, such as channel quality, exacerbate the problem. In this paper, we first analyze the statistical characteristics of a UAV-enhanced wireless sensor network (WSN) with energy harvesting. We then develop a model and solution based on the multi-objective multi-armed bandit theory to maximize the network coverage while minimizing the FL delay. Besides, we propose another solution that is particularly useful with large action sets and strict energy constraints at the UAVs. Our proposal uses a scalarized best-arm identification algorithm to find the optimal arms that maximize the ratio of the expected reward to the expected energy cost by sequentially eliminating one or more arms in each round. Then, we derive the upper bound on the error probability of our multi-objective and cost-aware algorithm. Numerical results show the effectiveness of our approach.
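The elimination-based, cost-aware idea can be sketched as follows (illustrative only; `pull` is a hypothetical sampling interface, and the paper's scalarized best-arm identification and its error-probability bound are more involved):

```python
import numpy as np

def cost_aware_best_arm(pull, n_arms, rounds=10, pulls_per_round=50):
    """Sequentially eliminate arms, keeping those maximizing the ratio of
    empirical expected reward to empirical expected energy cost.
    pull(arm) -> (reward, energy_cost) is a hypothetical interface."""
    active = list(range(n_arms))
    reward_sum = np.zeros(n_arms)
    cost_sum = np.zeros(n_arms)
    for _ in range(rounds):
        for arm in active:
            for _ in range(pulls_per_round):
                r, c = pull(arm)
                reward_sum[arm] += r
                cost_sum[arm] += c
        if len(active) > 1:
            ratios = {a: reward_sum[a] / max(cost_sum[a], 1e-12) for a in active}
            active.remove(min(active, key=ratios.get))  # drop the worst arm
    return active[0]
```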

Prediction of Diblock Copolymer Morphology via Machine Learning

  • paper_url: http://arxiv.org/abs/2308.16886
  • repo_url: None
  • paper_authors: Hyun Park, Boyuan Yu, Juhae Park, Ge Sun, Emad Tajkhorshid, Juan J. de Pablo, Ludwig Schneider
  • for: This work aims to accelerate the computation of block copolymer morphology evolution for large domains over long timescales, e.g. to understand particle diffusion inside a single block.
  • methods: The approach exploits the separation of characteristic times between coarse-grained particle evolution on the monomer scale and slow morphological evolution over mesoscopic scales, and learns stochastically driven defect annihilation processes directly from particle-based simulations.
  • results: A UNet-based model that respects periodic and fixed substrate boundary conditions is validated on three use cases, with explainable-AI visualization of morphology evolution; it enables the generation of large system sizes and long trajectories to investigate defect densities and their evolution under different types of confinement.
    Abstract A machine learning approach is presented to accelerate the computation of block polymer morphology evolution for large domains over long timescales. The strategy exploits the separation of characteristic times between coarse-grained particle evolution on the monomer scale and slow morphological evolution over mesoscopic scales. In contrast to empirical continuum models, the proposed approach learns stochastically driven defect annihilation processes directly from particle-based simulations. A UNet architecture that respects different boundary conditions is adopted, thereby allowing periodic and fixed substrate boundary conditions of arbitrary shape. Physical concepts are also introduced via the loss function and symmetries are incorporated via data augmentation. The model is validated using three different use cases. Explainable artificial intelligence methods are applied to visualize the morphology evolution over time. This approach enables the generation of large system sizes and long trajectories to investigate defect densities and their evolution under different types of confinement. As an application, we demonstrate the importance of accessing late-stage morphologies for understanding particle diffusion inside a single block. This work has implications for directed self-assembly and materials design in micro-electronics, battery materials, and membranes.

Information Theoretically Optimal Sample Complexity of Learning Dynamical Directed Acyclic Graphs

  • paper_url: http://arxiv.org/abs/2308.16859
  • repo_url: None
  • paper_authors: Mishfad Shaikh Veedu, Deepjyoti Deka, Murti V. Salapaka
  • for: This paper studies the sample complexity of learning the underlying directed acyclic graph (DAG) of a linear dynamical system (LDS), where the nodal states are temporally correlated and driven by unobserved exogenous noise sources.
  • methods: The paper proposes a metric and an algorithm based on the power spectral density (PSD) matrix of the observed time series to reconstruct the DAG. The metric and algorithm are inspired by the static setting but modified to accommodate the temporal correlations in the nodal states; the equal noise PSD assumption can be relaxed without violating identifiability.
  • results: The paper proves that the optimal sample complexity of learning the DAG is $n = \Theta(q\log(p/q))$, where $p$ is the number of nodes and $q$ is the maximum number of parents per node. The upper bound is proven using a concentration bound for the PSD estimation, together with a matching min-max lower bound based on generalized Fano's inequality.
    Abstract In this article, the optimal sample complexity of learning the underlying interaction/dependencies of a Linear Dynamical System (LDS) over a Directed Acyclic Graph (DAG) is studied. The sample complexity of learning a DAG's structure is well-studied for static systems, where the samples of nodal states are independent and identically distributed (i.i.d.). However, such a study is less explored for DAGs with dynamical systems, where the nodal states are temporally correlated. We call such a DAG underlying an LDS as \emph{dynamical} DAG (DDAG). In particular, we consider a DDAG where the nodal dynamics are driven by unobserved exogenous noise sources that are wide-sense stationary (WSS) in time but are mutually uncorrelated, and have the same {power spectral density (PSD)}. Inspired by the static settings, a metric and an algorithm based on the PSD matrix of the observed time series are proposed to reconstruct the DDAG. The equal noise PSD assumption can be relaxed such that identifiability conditions for DDAG reconstruction are not violated. For the LDS with WSS (sub) Gaussian exogenous noise sources, it is shown that the optimal sample complexity (or length of state trajectory) needed to learn the DDAG is $n=\Theta(q\log(p/q))$, where $p$ is the number of nodes and $q$ is the maximum number of parents per node. To prove the sample complexity upper bound, a concentration bound for the PSD estimation is derived, under two different sampling strategies. A matching min-max lower bound using generalized Fano's inequality also is provided, thus showing the order optimality of the proposed algorithm.
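Since the metric is built from the PSD matrix of the observed time series, a block-averaged PSD-matrix estimate can be sketched in numpy (an illustrative estimator; the paper analyzes its own estimator under two sampling strategies):

```python
import numpy as np

def psd_matrix(X, n_blocks=8):
    """Bartlett-style averaged estimate of the PSD matrix of a p-dimensional
    wide-sense stationary time series X of shape (n_samples, p).
    Returns an array of shape (n_freqs, p, p)."""
    n, p = X.shape
    block = n // n_blocks
    acc = None
    for b in range(n_blocks):
        seg = X[b * block:(b + 1) * block]           # one block of samples
        F = np.fft.rfft(seg, axis=0)                 # per-node spectra
        S = np.einsum("fi,fj->fij", F, F.conj()) / block
        acc = S if acc is None else acc + S
    return acc / n_blocks
```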

Majorization-Minimization for sparse SVMs

  • paper_url: http://arxiv.org/abs/2308.16858
  • repo_url: None
  • paper_authors: Alessandro Benfenati, Emilie Chouzenoux, Giorgia Franchini, Salla Latva-Aijo, Dominik Narnhofer, Jean-Christophe Pesquet, Sebastian J. Scott, Mahsa Yousefi
  • for: This work proposes training support vector machines (SVMs) through smooth sparse-promoting-regularized squared hinge loss minimization, paving the way for fast training methods and improved performance.
  • methods: The approach exploits the Lipschitz differentiability of the squared hinge loss to apply quick training methods built on majorization-minimization, while handling sparsity-preserving regularizers that promote the selection of the most significant features.
  • results: Numerical tests and comparisons on three different datasets demonstrate good performance in terms of qualitative metrics (accuracy, precision, recall, and F1 score) as well as computational cost.
    Abstract Several decades ago, Support Vector Machines (SVMs) were introduced for performing binary classification tasks, under a supervised framework. Nowadays, they often outperform other supervised methods and remain one of the most popular approaches in the machine learning arena. In this work, we investigate the training of SVMs through a smooth sparse-promoting-regularized squared hinge loss minimization. This choice paves the way to the application of quick training methods built on majorization-minimization approaches, benefiting from the Lipschitz differentiability of the loss function. Moreover, the proposed approach allows us to handle sparsity-preserving regularizers promoting the selection of the most significant features, so enhancing the performance. Numerical tests and comparisons conducted on three different datasets demonstrate the good performance of the proposed methodology in terms of qualitative metrics (accuracy, precision, recall, and F1 score) as well as computational cost.
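One MM-flavored scheme for this kind of objective can be sketched as follows (a reweighted-quadratic majorization of a smoothed $\ell_1$ penalty plus gradient steps on the squared hinge loss; the paper's surrogates, regularizers, and step-size rules differ):

```python
import numpy as np

def mm_sparse_svm(X, y, lam=0.1, eps=1e-6, outer=20, inner=100, lr=0.01):
    """Linear SVM with squared hinge loss and a smoothed-l1 sparsity penalty.
    y must be in {-1, +1}; the step size assumes standardized features.
    Each outer step majorizes sqrt(w^2 + eps) by a quadratic (reweighted
    ridge) and minimizes the smooth surrogate by gradient descent."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(outer):
        d = 1.0 / (2.0 * np.sqrt(w ** 2 + eps))      # majorizer weights
        for _ in range(inner):
            margin = 1.0 - y * (X @ w)
            active = margin > 0
            grad = -2.0 * X[active].T @ (y[active] * margin[active]) / n
            grad += 2.0 * lam * d * w
            w -= lr * grad
    return w
```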

Natural Quantum Monte Carlo Computation of Excited States

  • paper_url: http://arxiv.org/abs/2308.16848
  • repo_url: None
  • paper_authors: David Pfau, Simon Axelrod, Halvard Sutterud, Ingrid von Glehn, James S. Spencer
  • for: Estimating the lowest excited states of quantum systems.
  • methods: A variational Monte Carlo algorithm with no free parameters that requires no explicit orthogonalization of the different states, transforming the problem of finding excited states into finding the ground state of an expanded system.
  • results: The method accurately estimates excited states and expectation values of arbitrary observables, including off-diagonal expectations between states such as the transition dipole moment; combined with the FermiNet and Psiformer Ansatze, it accurately recovers vertical excitation energies and oscillator strengths on molecules as large as benzene, making it highly relevant for molecular physics.
    Abstract We present a variational Monte Carlo algorithm for estimating the lowest excited states of a quantum system which is a natural generalization of the estimation of ground states. The method has no free parameters and requires no explicit orthogonalization of the different states, instead transforming the problem of finding excited states of a given system into that of finding the ground state of an expanded system. Expected values of arbitrary observables can be calculated, including off-diagonal expectations between different states such as the transition dipole moment. Although the method is entirely general, it works particularly well in conjunction with recent work on using neural networks as variational Ansatze for many-electron systems, and we show that by combining this method with the FermiNet and Psiformer Ansatze we can accurately recover vertical excitation energies and oscillator strengths on molecules as large as benzene. Beyond the examples on molecules presented here, we expect this technique will be of great interest for applications of variational quantum Monte Carlo to atomic, nuclear and condensed matter physics.

FedDD: Toward Communication-efficient Federated Learning with Differential Parameter Dropout

  • paper_url: http://arxiv.org/abs/2308.16835
  • repo_url: None
  • paper_authors: Zhiying Feng, Xu Chen, Qiong Wu, Wen Wu, Xiaoxi Zhang, Qianyi Huang
  • for: Improving communication efficiency and model convergence in federated learning (FL), where heterogeneous client network conditions cause long communication delays and stragglers degrade training.
  • methods: The paper proposes a federated learning scheme with differential parameter dropout (FedDD), built on two key modules: dropout rate allocation, formulated as a convex optimization problem accounting for system, data, and model heterogeneity across clients, and uploaded parameter selection, which prioritizes important parameters to speed up convergence.
  • results: A theoretical convergence analysis and extensive evaluations show that FedDD achieves outstanding performance in both communication efficiency and model convergence, with strong generalization to data of rare classes.
    Abstract Federated Learning (FL) requires frequent exchange of model parameters, which leads to long communication delay, especially when the network environments of clients vary greatly. Moreover, the parameter server needs to wait for the slowest client (i.e., straggler, which may have the largest model size, lowest computing capability or worst network condition) to upload parameters, which may significantly degrade the communication efficiency. Commonly-used client selection methods such as partial client selection would lead to the waste of computing resources and weaken the generalization of the global model. To tackle this problem, along a different line, in this paper, we advocate the approach of model parameter dropout instead of client selection, and accordingly propose a novel framework of Federated learning scheme with Differential parameter Dropout (FedDD). FedDD consists of two key modules: dropout rate allocation and uploaded parameter selection, which will optimize the model parameter uploading ratios tailored to different clients' heterogeneous conditions and also select the proper set of important model parameters for uploading subject to clients' dropout rate constraints. Specifically, the dropout rate allocation is formulated as a convex optimization problem, taking system heterogeneity, data heterogeneity, and model heterogeneity among clients into consideration. The uploaded parameter selection strategy prioritizes on eliciting important parameters for uploading to speedup convergence. Furthermore, we theoretically analyze the convergence of the proposed FedDD scheme. Extensive performance evaluations demonstrate that the proposed FedDD scheme can achieve outstanding performances in both communication efficiency and model convergence, and also possesses a strong generalization capability to data of rare classes.
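To make the two modules concrete, here is a toy sketch of magnitude-based uploaded-parameter selection under a client-specific keep ratio (illustrative only; FedDD's convex dropout-rate allocation and importance criterion are more elaborate):

```python
import numpy as np

def upload_with_dropout(delta, keep_ratio):
    """Select the most significant coordinates of a client's model update
    for upload, given keep_ratio = 1 - dropout rate.
    Returns (indices, values), the sparse payload sent to the server."""
    k = max(1, int(keep_ratio * delta.size))
    idx = np.argpartition(np.abs(delta), -k)[-k:]
    return idx, delta[idx]

def server_aggregate(payloads, dim):
    """Average the sparse client payloads coordinate-wise."""
    acc, cnt = np.zeros(dim), np.zeros(dim)
    for idx, vals in payloads:
        acc[idx] += vals
        cnt[idx] += 1
    return acc / np.maximum(cnt, 1)
```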

Joint Semantic-Native Communication and Inference via Minimal Simplicial Structures

  • paper_url: http://arxiv.org/abs/2308.16789
  • repo_url: None
  • paper_authors: Qiyang Zhao, Hang Zou, Mehdi Bennis, Merouane Debbah, Ebtesam Almazrouei, Faouzi Bader
  • for: This paper studies semantic communication and inference, in which a student agent (mobile device) queries a teacher agent (cloud server) to generate higher-order data semantics living in a simplicial complex.
  • methods: The teacher first maps its data into a k-order simplicial complex and learns its high-order correlations. For effective communication and inference, the teacher seeks minimally sufficient and invariant semantic structures by judiciously removing simplices selected via the Hodge Laplacians without compromising query accuracy. The student locally runs its own queries based on a masked simplicial convolutional autoencoder (SCAE), leveraging both local knowledge and the remote teacher's knowledge.
  • results: Numerical results show the proposed method effectively improves query accuracy under different channel conditions and simplicial structures. On a coauthorship dataset, removing simplices ranked by Laplacian values yields an 85% reduction in payload size without sacrificing accuracy. Joint semantic communication and inference by the masked SCAE improves query accuracy by 25% over local student-based queries and 15% over remote teacher-based queries. Finally, incorporating channel semantics effectively improves inference accuracy, notably at low SNR values.
    Abstract In this work, we study the problem of semantic communication and inference, in which a student agent (i.e. mobile device) queries a teacher agent (i.e. cloud sever) to generate higher-order data semantics living in a simplicial complex. Specifically, the teacher first maps its data into a k-order simplicial complex and learns its high-order correlations. For effective communication and inference, the teacher seeks minimally sufficient and invariant semantic structures prior to conveying information. These minimal simplicial structures are found via judiciously removing simplices selected by the Hodge Laplacians without compromising the inference query accuracy. Subsequently, the student locally runs its own set of queries based on a masked simplicial convolutional autoencoder (SCAE) leveraging both local and remote teacher's knowledge. Numerical results corroborate the effectiveness of the proposed approach in terms of improving inference query accuracy under different channel conditions and simplicial structures. Experiments on a coauthorship dataset show that removing simplices by ranking the Laplacian values yields a 85% reduction in payload size without sacrificing accuracy. Joint semantic communication and inference by masked SCAE improves query accuracy by 25% compared to local student based query and 15% compared to remote teacher based query. Finally, incorporating channel semantics is shown to effectively improve inference accuracy, notably at low SNR values.

Constructing Indoor Region-based Radio Map without Location Labels

  • paper_url: http://arxiv.org/abs/2308.16759
  • repo_url: None
  • paper_authors: Zheng Xing, Junting Chen
  • for: construct a radio map without location labels
  • methods: signal subspace model with sequential prior, integrated segmentation and clustering algorithm
  • results: reduces region localization error by roughly 50% compared to baseline, outperforms some supervised localization schemes
    Abstract Radio map construction requires a large amount of radio measurement data with location labels, which imposes a high deployment cost. This paper develops a region-based radio map from received signal strength (RSS) measurements without location labels. The construction is based on a set of blindly collected RSS measurement data from a device that visits each region in an indoor area exactly once, where the footprints and timestamps are not recorded. The main challenge is to cluster the RSS data and match clusters with the physical regions. Classical clustering algorithms fail to work as the RSS data naturally appears as non-clustered due to multipaths and noise. In this paper, a signal subspace model with a sequential prior is constructed for the RSS data, and an integrated segmentation and clustering algorithm is developed, which is shown to find the globally optimal solution in a special case. Furthermore, the clustered data is matched with the physical regions using a graph-based approach. Based on real measurements from an office space, the proposed scheme reduces the region localization error by roughly 50% compared to a weighted centroid localization (WCL) baseline, and it even outperforms some supervised localization schemes, including k-nearest neighbor (KNN), support vector machine (SVM), and deep neural network (DNN), which require labeled data for training.

Training Neural Networks Using Reproducing Kernel Space Interpolation and Model Reduction

  • paper_url: http://arxiv.org/abs/2308.16754
  • repo_url: None
  • paper_authors: Eric Arthur Werneburg
  • for: This paper is a theoretical study of training neural networks using interpolation techniques from reproducing kernel Hilbert space theory.
  • methods: The paper generalizes the training method to Krein spaces, shows that widely-used neural network architectures are subsets of reproducing kernel Krein spaces (RKKS), and studies the "associated Hilbert spaces" of RKKS to improve the expressivity of various activation functions.
  • results: Using the theory of functions of several complex variables, the paper proves a computationally applicable, multidimensional generalization of the Adamjan-Arov-Krein (AAK) theorem, yielding a new class of networks called Prolongation Neural Networks (PNN) that outperform both the interpolatory methods and current state-of-the-art methods in noisy environments.
    Abstract We introduce and study the theory of training neural networks using interpolation techniques from reproducing kernel Hilbert space theory. We generalize the method to Krein spaces, and show that widely-used neural network architectures are subsets of reproducing kernel Krein spaces (RKKS). We study the concept of "associated Hilbert spaces" of RKKS and develop techniques to improve upon the expressivity of various activation functions. Next, using concepts from the theory of functions of several complex variables, we prove a computationally applicable, multidimensional generalization of the celebrated Adamjan-Arov-Krein (AAK) theorem. The theorem yields a novel class of neural networks, called Prolongation Neural Networks (PNN). We demonstrate that, by applying the multidimensional AAK theorem to gain a PNN, one can gain performance superior to both our interpolatory methods and current state-of-the-art methods in noisy environments. We provide useful illustrations of our methods in practice.

Moreau Envelope ADMM for Decentralized Weakly Convex Optimization

  • paper_url: http://arxiv.org/abs/2308.16752
  • repo_url: None
  • paper_authors: Reza Mirzaeifard, Naveen K. D. Venkategowda, Alexander Jung, Stefan Werner
  • for: This paper proposes a proximal variant of the alternating direction method of multipliers (ADMM) for decentralized weakly convex optimization.
  • methods: The convergence of the algorithm is analyzed using the Moreau envelope function, including bounds on the amount of change in the dual variable update step obtained by relating the gradient of the Moreau envelope to the proximal function.
  • results: Numerical experiments indicate that the proposed method is faster and more robust than widely-used approaches.
    Abstract This paper proposes a proximal variant of the alternating direction method of multipliers (ADMM) for distributed optimization. Although the current versions of ADMM algorithm provide promising numerical results in producing solutions that are close to optimal for many convex and non-convex optimization problems, it remains unclear if they can converge to a stationary point for weakly convex and locally non-smooth functions. Through our analysis using the Moreau envelope function, we demonstrate that MADM can indeed converge to a stationary point under mild conditions. Our analysis also includes computing the bounds on the amount of change in the dual variable update step by relating the gradient of the Moreau envelope function to the proximal function. Furthermore, the results of our numerical experiments indicate that our method is faster and more robust than widely-used approaches.
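For reference, the analysis rests on the standard Moreau envelope and proximal operator,

$$M_{\lambda f}(x) = \min_{y}\Big\{ f(y) + \tfrac{1}{2\lambda}\|y - x\|^2 \Big\}, \qquad \mathrm{prox}_{\lambda f}(x) = \arg\min_{y}\Big\{ f(y) + \tfrac{1}{2\lambda}\|y - x\|^2 \Big\},$$

whose gradient satisfies $\nabla M_{\lambda f}(x) = \lambda^{-1}\big(x - \mathrm{prox}_{\lambda f}(x)\big)$ for weakly convex $f$ once $\lambda$ is small enough; a vanishing envelope gradient is the usual stationarity measure in such analyses.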

Robust Representation Learning for Unreliable Partial Label Learning

  • paper_url: http://arxiv.org/abs/2308.16718
  • repo_url: None
  • paper_authors: Yu Shi, Dong-Dong Wu, Xin Geng, Min-Ling Zhang
  • for: Improving robustness to unreliable partial labels in weakly supervised learning, where the ground-truth label may be absent from the candidate label set.
  • methods: The paper proposes the Unreliability-Robust Representation Learning framework (URRL), which leverages unreliability-robust contrastive learning, together with a dual strategy combining KNN-based candidate label set correction and consistency-regularization-based label disambiguation.
  • results: Extensive experiments demonstrate that the method outperforms state-of-the-art PLL methods on various datasets with diverse degrees of unreliability and ambiguity; a theoretical analysis from the perspective of the expectation-maximization (EM) algorithm is also provided.
    Abstract Partial Label Learning (PLL) is a type of weakly supervised learning where each training instance is assigned a set of candidate labels, but only one label is the ground-truth. However, this idealistic assumption may not always hold due to potential annotation inaccuracies, meaning the ground-truth may not be present in the candidate label set. This is known as Unreliable Partial Label Learning (UPLL) that introduces an additional complexity due to the inherent unreliability and ambiguity of partial labels, often resulting in a sub-optimal performance with existing methods. To address this challenge, we propose the Unreliability-Robust Representation Learning framework (URRL) that leverages unreliability-robust contrastive learning to help the model fortify against unreliable partial labels effectively. Concurrently, we propose a dual strategy that combines KNN-based candidate label set correction and consistency-regularization-based label disambiguation to refine label quality and enhance the ability of representation learning within the URRL framework. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art PLL methods on various datasets with diverse degrees of unreliability and ambiguity. Furthermore, we provide a theoretical analysis of our approach from the perspective of the expectation maximization (EM) algorithm. Upon acceptance, we pledge to make the code publicly accessible.
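A toy sketch of the KNN-based candidate label set correction idea (illustrative; the paper's actual correction and disambiguation rules differ):

```python
import numpy as np

def knn_correct_candidates(features, candidate_masks, k=10):
    """Re-score each instance's candidate label set by how often each class
    appears among its k nearest neighbors' candidate sets.
    candidate_masks: (n, n_classes) binary matrix; the O(n^2) distance
    computation is fine for a sketch."""
    n = features.shape[0]
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    corrected = np.zeros(candidate_masks.shape)
    for i in range(n):
        nn = np.argsort(d2[i])[:k]
        votes = candidate_masks[nn].mean(axis=0)     # neighborhood support
        # keep original candidates weighted by support, and add strongly
        # supported classes, since with unreliable partial labels the true
        # label may be missing from the candidate set
        corrected[i] = np.maximum(candidate_masks[i] * votes,
                                  (votes > 0.5).astype(float))
    return corrected
```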

Everything, Everywhere All in One Evaluation: Using Multiverse Analysis to Evaluate the Influence of Model Design Decisions on Algorithmic Fairness

  • paper_url: http://arxiv.org/abs/2308.16681
  • repo_url: https://github.com/reliable-ai/fairml-multiverse
  • paper_authors: Jan Simson, Florian Pfisterer, Christoph Kern
  • for: This paper aims to study the fairness of algorithmic decision-making (ADM) systems and provide a method for analyzing their fairness.
  • methods: The authors introduce the method of multiverse analysis for algorithmic fairness, which turns implicit design decisions into explicit ones and demonstrates their fairness implications.
  • results: The authors use an exemplary case study of predicting public health coverage for vulnerable populations to illustrate how decisions during the design of a machine learning system can have surprising effects on its fairness, and how to detect these effects using multiverse analysis. The results show that the method can be used to better understand the variability and robustness of algorithmic fairness.
    Abstract A vast number of systems across the world use algorithmic decision making (ADM) to (partially) automate decisions that have previously been made by humans. When designed well, these systems promise more objective decisions while saving large amounts of resources and freeing up human time. However, when ADM systems are not designed well, they can lead to unfair decisions which discriminate against societal groups. The downstream effects of ADMs critically depend on the decisions made during the systems' design and implementation, as biases in data can be mitigated or reinforced along the modeling pipeline. Many of these design decisions are made implicitly, without knowing exactly how they will influence the final system. It is therefore important to make explicit the decisions made during the design of ADM systems and understand how these decisions affect the fairness of the resulting system. To study this issue, we draw on insights from the field of psychology and introduce the method of multiverse analysis for algorithmic fairness. In our proposed method, we turn implicit design decisions into explicit ones and demonstrate their fairness implications. By combining decisions, we create a grid of all possible "universes" of decision combinations. For each of these universes, we compute metrics of fairness and performance. Using the resulting dataset, one can see how and which decisions impact fairness. We demonstrate how multiverse analyses can be used to better understand variability and robustness of algorithmic fairness using an exemplary case study of predicting public health coverage of vulnerable populations for potential interventions. Our results illustrate how decisions during the design of a machine learning system can have surprising effects on its fairness and how to detect these effects using multiverse analysis.
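The core mechanic is simple to sketch: enumerate every combination of design decisions (each a "universe") and evaluate it. The decision grid below is hypothetical; the linked repository defines the study's actual decisions and metrics.

```python
from itertools import product

# Hypothetical design-decision grid for illustration.
decisions = {
    "imputation": ["mean", "drop_missing"],
    "model": ["logreg", "random_forest"],
    "threshold": [0.4, 0.5, 0.6],
    "protected_attr_in_features": [True, False],
}

def multiverse(fit_and_score, decisions):
    """Evaluate every universe of decision combinations.
    fit_and_score(config) -> dict of fairness/performance metrics.
    Returns (config, metrics) rows for downstream variability analysis."""
    results = []
    for combo in product(*decisions.values()):
        config = dict(zip(decisions.keys(), combo))
        results.append((config, fit_and_score(config)))
    return results
```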

Branches of a Tree: Taking Derivatives of Programs with Discrete and Branching Randomness in High Energy Physics

  • paper_url: http://arxiv.org/abs/2308.16680
  • repo_url: None
  • paper_authors: Michael Kagan, Lukas Heinrich
  • for: Differentiating programs with discrete and branching randomness in High Energy Physics, which are common due to branching processes and clustering-based analysis.
  • methods: Several gradient estimation techniques, including the recent Stochastic AD method, compared in simplified detector design experiments.
  • results: Opens up gradient-based optimization in detector design optimization, simulator tuning, and data analysis and reconstruction optimization; to the best of the authors' knowledge, the first fully differentiable branching program is developed.
    Abstract We propose to apply several gradient estimation techniques to enable the differentiation of programs with discrete randomness in High Energy Physics. Such programs are common in High Energy Physics due to the presence of branching processes and clustering-based analysis. Thus differentiating such programs can open the way for gradient based optimization in the context of detector design optimization, simulator tuning, or data analysis and reconstruction optimization. We discuss several possible gradient estimation strategies, including the recent Stochastic AD method, and compare them in simplified detector design experiments. In doing so we develop, to the best of our knowledge, the first fully differentiable branching program.
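One classical member of the estimator family discussed here is the score-function (REINFORCE) estimator, sketched on a toy Bernoulli branch (illustrative; the paper also develops and compares the Stochastic AD method):

```python
import numpy as np

def score_function_grad(theta, simulate, logp_grad, n=10000, seed=0):
    """Estimate d/dtheta E[f] as E[f * d log p / dtheta] for a program with
    a discrete branch. simulate(theta, rng) -> (f_value, branch_sample) and
    logp_grad(theta, branch_sample) are hypothetical interfaces."""
    rng = np.random.default_rng(seed)
    acc = 0.0
    for _ in range(n):
        f, sample = simulate(theta, rng)
        acc += f * logp_grad(theta, sample)
    return acc / n

# Toy program: a Bernoulli(theta) branch selecting one of two outcomes.
def simulate(theta, rng):
    b = rng.random() < theta
    return (3.0 if b else 1.0), b

def logp_grad(theta, b):
    return 1.0 / theta if b else -1.0 / (1.0 - theta)

# E[f] = 3*theta + (1 - theta), so d/dtheta E[f] = 2; the estimate approaches 2.
print(score_function_grad(0.3, simulate, logp_grad))
```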

Dynamic nsNet2: Efficient Deep Noise Suppression with Early Exiting

  • paper_url: http://arxiv.org/abs/2308.16678
  • repo_url: None
  • paper_authors: Riccardo Miccini, Alaa Zniber, Clément Laroche, Tobias Piechowiak, Martin Schoeberl, Luca Pezzarossa, Ouassim Karrakchou, Jens Sparsø, Mounir Ghogho
  • for: Improving the performance and resource utilization of deep noise suppression models on resource-constrained devices.
  • methods: The paper proposes an early-exiting model based on nsNet2, which provides multiple levels of accuracy and resource savings by halting computations at different stages; the original architecture is adapted by splitting the information flow to account for the injected dynamism.
  • results: The paper shows the trade-offs between performance and computational complexity based on established metrics.
    Abstract Although deep learning has made strides in the field of deep noise suppression, leveraging deep architectures on resource-constrained devices still proved challenging. Therefore, we present an early-exiting model based on nsNet2 that provides several levels of accuracy and resource savings by halting computations at different stages. Moreover, we adapt the original architecture by splitting the information flow to take into account the injected dynamism. We show the trade-offs between performance and computational complexity based on established metrics.
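A minimal PyTorch sketch of the early-exiting pattern (hypothetical layer sizes and confidence proxy; not the adapted nsNet2 architecture):

```python
import torch
import torch.nn as nn

class EarlyExitDenoiser(nn.Module):
    """Each stage has its own mask head; inference halts once a head's
    output looks confident enough, saving the remaining stages."""

    def __init__(self, dims=(257, 400, 400, 600), n_bins=257):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(nn.Linear(a, b), nn.ReLU())
            for a, b in zip(dims[:-1], dims[1:])
        )
        self.heads = nn.ModuleList(nn.Linear(d, n_bins) for d in dims[1:])

    def forward(self, x, confidence=0.9):
        for stage, head in zip(self.stages, self.heads):
            x = stage(x)
            mask = torch.sigmoid(head(x))
            # cheap proxy: how decisive the spectral mask is on average
            if (mask - 0.5).abs().mean() / 0.5 > confidence:
                break                      # early exit
        return mask
```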

Communication-Efficient Decentralized Federated Learning via One-Bit Compressive Sensing

  • paper_url: http://arxiv.org/abs/2308.16671
  • repo_url: None
  • paper_authors: Shenglong Zhou, Kaidi Xu, Geoffrey Ye Li
  • for: This paper targets efficient and stable model training in decentralized federated learning (DFL), where no central server coordinates training and distributed nodes face communication and computational limitations.
  • methods: The paper develops a novel algorithm based on the framework of the inexact alternating direction method (iADM). The shared model is trained under a sparsity constraint, which enables one-bit compressive sensing (1BCS) so that only one-bit information is transmitted among neighbour nodes. Communication occurs only at certain steps, each node selects only a subset of neighbours to participate in training (making the algorithm robust against stragglers), complex items are computed only once for several consecutive steps, and subproblems are solved inexactly using closed-form solutions.
  • results: Numerical experiments showcase the algorithm's effectiveness in both communication and computation.
    Abstract Decentralized federated learning (DFL) has gained popularity due to its practicality across various applications. Compared to the centralized version, training a shared model among a large number of nodes in DFL is more challenging, as there is no central server to coordinate the training process. Especially when distributed nodes suffer from limitations in communication or computational resources, DFL will experience extremely inefficient and unstable training. Motivated by these challenges, in this paper, we develop a novel algorithm based on the framework of the inexact alternating direction method (iADM). On one hand, our goal is to train a shared model with a sparsity constraint. This constraint enables us to leverage one-bit compressive sensing (1BCS), allowing transmission of one-bit information among neighbour nodes. On the other hand, communication between neighbour nodes occurs only at certain steps, reducing the number of communication rounds. Therefore, the algorithm exhibits notable communication efficiency. Additionally, as each node selects only a subset of neighbours to participate in the training, the algorithm is robust against stragglers. Additionally, complex items are computed only once for several consecutive steps and subproblems are solved inexactly using closed-form solutions, resulting in high computational efficiency. Finally, numerical experiments showcase the algorithm's effectiveness in both communication and computation.
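The communication step can be sketched as sign quantization with a single scale, which is the flavor of one-bit exchange described above (illustrative; recovery in the paper is handled inside the iADM iterations):

```python
import numpy as np

def one_bit_payload(w):
    """Compress a local model vector to one bit per coordinate plus one scale."""
    return np.sign(w).astype(np.int8), float(np.linalg.norm(w))

def decode_payload(signs, scale):
    """Rough reconstruction of a neighbour's model from its one-bit payload."""
    s = signs.astype(float)
    return scale * s / max(float(np.linalg.norm(s)), 1e-12)
```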

What can we learn from quantum convolutional neural networks?

  • paper_url: http://arxiv.org/abs/2308.16664
  • repo_url: None
  • paper_authors: Chukwudubem Umeano, Annie E. Paine, Vincent E. Elfving, Oleksandr Kyriienko
  • for: This paper examines what can be learned from analyzing quantum convolutional neural networks (QCNNs).
  • methods: The paper rigorously analyzes and simulates QCNN models, studying the hidden feature map induced by quantum data, the basis sets generated during ground state embedding, and the role of pooling layers and measurement adaptation.
  • results: QCNNs recognize quantum phases efficiently because ground state embeddings generate a very suitable basis set, with quantum criticality of spin models producing basis functions with rapidly changing features. Generalization depends strongly on the embedding type, and rotation-based feature maps with the Fourier basis require careful feature engineering; with properly chosen ground state embeddings, QCNNs can also express shock wave solutions of fluid dynamics problems with good generalization and proven trainability.
    Abstract We can learn from analyzing quantum convolutional neural networks (QCNNs) that: 1) working with quantum data can be perceived as embedding physical system parameters through a hidden feature map; 2) their high performance for quantum phase recognition can be attributed to generation of a very suitable basis set during the ground state embedding, where quantum criticality of spin models leads to basis functions with rapidly changing features; 3) pooling layers of QCNNs are responsible for picking those basis functions that can contribute to forming a high-performing decision boundary, and the learning process corresponds to adapting the measurement such that few-qubit operators are mapped to full-register observables; 4) generalization of QCNN models strongly depends on the embedding type, and that rotation-based feature maps with the Fourier basis require careful feature engineering; 5) accuracy and generalization of QCNNs with readout based on a limited number of shots favor the ground state embeddings and associated physics-informed models. We demonstrate these points in simulation, where our results shed light on classification for physical processes, relevant for applications in sensing. Finally, we show that QCNNs with properly chosen ground state embeddings can be used for fluid dynamics problems, expressing shock wave solutions with good generalization and proven trainability.

Autoencoder-based Online Data Quality Monitoring for the CMS Electromagnetic Calorimeter

  • paper_url: http://arxiv.org/abs/2308.16659
  • repo_url: None
  • paper_authors: Abhirami Harilal, Kyungmin Park, Michael Andrews, Manfred Paulini
  • for: This work develops a deep-learning-based online Data Quality Monitoring (DQM) system for the CMS electromagnetic calorimeter (ECAL) that can quickly detect and diagnose detector issues, safeguarding physics-quality data taking.
  • methods: Using unsupervised deep learning, a real-time autoencoder-based anomaly detection system is developed that can detect ECAL anomalies unseen in past data, accounting for spatial variations in the ECAL response and the temporal evolution of anomalies.
  • results: The system efficiently detects anomalies while maintaining an estimated false discovery rate between $10^{-2}$ and $10^{-4}$, beating existing benchmarks by about two orders of magnitude. Its real-world performance is validated on anomalies found in 2018 and 2022 LHC collision data, and first results from deployment in the CMS online DQM workflow for the ECAL barrel during Run 3 show promising detection of obscure issues that could have been missed by the existing DQM system.
    Abstract The online Data Quality Monitoring system (DQM) of the CMS electromagnetic calorimeter (ECAL) is a crucial operational tool that allows ECAL experts to quickly identify, localize, and diagnose a broad range of detector issues that would otherwise hinder physics-quality data taking. Although the existing ECAL DQM system has been continuously updated to respond to new problems, it remains one step behind newer and unforeseen issues. Using unsupervised deep learning, a real-time autoencoder-based anomaly detection system is developed that is able to detect ECAL anomalies unseen in past data. After accounting for spatial variations in the response of the ECAL and the temporal evolution of anomalies, the new system is able to efficiently detect anomalies while maintaining an estimated false discovery rate between $10^{-2}$ to $10^{-4}$, beating existing benchmarks by about two orders of magnitude. The real-world performance of the system is validated using anomalies found in 2018 and 2022 LHC collision data. Additionally, first results from deploying the autoencoder-based system in the CMS online DQM workflow for the ECAL barrel during Run 3 of the LHC are presented, showing its promising performance in detecting obscure issues that could have been missed in the existing DQM system.
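The detection principle, reconstruction error of an autoencoder trained on nominal data, can be sketched as follows (hypothetical layer sizes; the deployed system adds the spatial and temporal corrections described above):

```python
import torch
import torch.nn as nn

class CaloAutoencoder(nn.Module):
    """Toy dense autoencoder over a flattened occupancy-style map."""
    def __init__(self, d=1700, z=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, z))
        self.dec = nn.Sequential(nn.Linear(z, 256), nn.ReLU(), nn.Linear(256, d))

    def forward(self, x):
        return self.dec(self.enc(x))

def anomaly_scores(model, x):
    """Per-channel squared reconstruction error; large residuals flag
    anomalies, with thresholds tuned to the target false discovery rate."""
    with torch.no_grad():
        return (model(x) - x) ** 2
```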

A Causal Discovery Approach To Learn How Urban Form Shapes Sustainable Mobility Across Continents

  • paper_url: http://arxiv.org/abs/2308.16599
  • repo_url: None
  • paper_authors: Felix Wagner, Florian Nachtigall, Lukas Franken, Nikola Milojevic-Dupont, Rafael H. M. Pereira, Nicolas Koch, Jakob Runge, Marta Gonzalez, Felix Creutzig
  • for: This study aims to give urban planners accurate, location-specific guidance for decarbonizing urban transport by uncovering how the built environment shapes travel.
  • methods: A causal discovery and explainable machine learning framework is applied to high-resolution mobility data from six cities across three continents to detect urban form effects on intra-city travel.
  • results: Distance to the city center, demographics, and density indirectly affect other urban form features. Location-specific influences align across cities but vary in magnitude, and the spread of the city and the coverage of jobs across the city are the strongest determinants of travel-related emissions, highlighting the benefits of compact development.
    Abstract Global sustainability requires low-carbon urban transport systems, shaped by adequate infrastructure, deployment of low-carbon transport modes and shifts in travel behavior. To adequately implement alterations in infrastructure, it's essential to grasp the location-specific cause-and-effect mechanisms that the constructed environment has on travel. Yet, current research falls short in representing causal relationships between the 6D urban form variables and travel, generalizing across different regions, and modeling urban form effects at high spatial resolution. Here, we address all three gaps by utilizing a causal discovery and an explainable machine learning framework to detect urban form effects on intra-city travel based on high-resolution mobility data of six cities across three continents. We show that both distance to city center, demographics and density indirectly affect other urban form features. By considering the causal relationships, we find that location-specific influences align across cities, yet vary in magnitude. In addition, the spread of the city and the coverage of jobs across the city are the strongest determinants of travel-related emissions, highlighting the benefits of compact development and associated benefits. Differences in urban form effects across the cities call for a more holistic definition of 6D measures. Our work is a starting point for location-specific analysis of urban form effects on mobility behavior using causal discovery approaches, which is highly relevant for city planners and municipalities across continents.

Development and validation of an interpretable machine learning-based calculator for predicting 5-year weight trajectories after bariatric surgery: a multinational retrospective cohort SOPHIA study

  • paper_url: http://arxiv.org/abs/2308.16585
  • repo_url: None
  • paper_authors: Patrick Saux, Pierre Bauvin, Violeta Raverdy, Julien Teigny, Hélène Verkindt, Tomy Soumphonphakdy, Maxence Debert, Anne Jacobs, Daan Jacobs, Valerie Monpellier, Phong Ching Lee, Chin Hong Lim, Johanna C Andersson-Assarsson, Lena Carlsson, Per-Arne Svensson, Florence Galtier, Guelareh Dezfoulian, Mihaela Moldovanu, Severine Andrieux, Julien Couster, Marie Lepage, Erminia Lembo, Ornella Verrastro, Maud Robert, Paulina Salminen, Geltrude Mingrone, Ralph Peterli, Ricardo V Cohen, Carlos Zerrweck, David Nocca, Carel W Le Roux, Robert Caiazzo, Philippe Preux, François Pattou
  • for: Predicting individual 5-year weight loss trajectories before bariatric surgery.
  • methods: A machine learning model built with least absolute shrinkage and selection operator (LASSO) variable selection and the classification and regression trees algorithm, yielding interpretable regression trees.
  • results: Across external testing cohorts in this multinational, multicenter study, the overall mean median absolute deviation (MAD) of BMI was 2.8 kg/m${}^2$ and the mean root mean squared error (RMSE) was 4.7 kg/m${}^2$, with a mean difference between predicted and observed BMI of -0.3 kg/m${}^2$. The model is incorporated into an easy-to-use and interpretable web-based prediction tool to help inform clinical decisions before surgery.
    Abstract Background: Weight loss trajectories after bariatric surgery vary widely between individuals, and predicting weight loss before the operation remains challenging. We aimed to develop a model using machine learning to provide individual preoperative prediction of 5-year weight loss trajectories after surgery. Methods: In this multinational retrospective observational study we enrolled adult participants (aged $\ge$18 years) from ten prospective cohorts (including ABOS [NCT01129297], BAREVAL [NCT02310178], the Swedish Obese Subjects study, and a large cohort from the Dutch Obesity Clinic [Nederlandse Obesitas Kliniek]) and two randomised trials (SleevePass [NCT00793143] and SM-BOSS [NCT00356213]) in Europe, the Americas, and Asia, with a 5-year follow-up after Roux-en-Y gastric bypass, sleeve gastrectomy, or gastric band. Patients with a previous history of bariatric surgery or large delays between scheduled and actual visits were excluded. The training cohort comprised patients from two centres in France (ABOS and BAREVAL). The primary outcome was BMI at 5 years. A model was developed using least absolute shrinkage and selection operator to select variables and the classification and regression trees algorithm to build interpretable regression trees. The performances of the model were assessed through the median absolute deviation (MAD) and root mean squared error (RMSE) of BMI. Findings: 10 231 patients from 12 centres in ten countries were included in the analysis, corresponding to 30 602 patient-years. Among participants in all 12 cohorts, 7701 (75.3%) were female, 2530 (24.7%) were male. Among 434 baseline attributes available in the training cohort, seven variables were selected: height, weight, intervention type, age, diabetes status, diabetes duration, and smoking status. At 5 years, across external testing cohorts the overall mean MAD BMI was 2.8 kg/m${}^2$ (95% CI 2.6-3.0) and mean RMSE BMI was 4.7 kg/m${}^2$ (4.4-5.0), and the mean difference between predicted and observed BMI was -0.3 kg/m${}^2$ (SD 4.7). This model is incorporated in an easy-to-use and interpretable web-based prediction tool to help inform clinical decision before surgery. Interpretation: We developed a machine learning-based model, which is internationally validated, for predicting individual 5-year weight loss trajectories after three common bariatric interventions.
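A minimal scikit-learn sketch of the two-step recipe in the abstract, LASSO for variable selection followed by an interpretable regression tree (hyperparameters are illustrative, not the published ones):

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.tree import DecisionTreeRegressor

def fit_interpretable_trajectory_model(X, y, feature_names, max_depth=4):
    """Select variables with cross-validated LASSO, then fit a small CART
    regression tree on the selected features for interpretability.
    Returns the tree and the names of the selected features."""
    lasso = LassoCV(cv=5).fit(X, y)
    keep = np.flatnonzero(lasso.coef_ != 0)
    tree = DecisionTreeRegressor(max_depth=max_depth).fit(X[:, keep], y)
    return tree, [feature_names[i] for i in keep]
```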

MONDEO: Multistage Botnet Detection

  • paper_url: http://arxiv.org/abs/2308.16570
  • repo_url: https://github.com/TLDart/mondeo
  • paper_authors: Duarte Dias, Bruno Sousa, Nuno Antunes
  • for: The paper is written to detect DNS-based botnet malware in mobile devices using a lightweight and flexible mechanism called MONDEO.
  • methods: MONDEO uses four detection stages: Blacklisting/Whitelisting, Query rate analysis, DGA analysis, and Machine learning evaluation to identify botnet malware.
  • results: MONDEO was tested against several datasets and achieved high performance with RandomForest classifiers, making it a useful tool for detecting botnet malware in mobile devices.
    Abstract Mobile devices have become widespread, making them the most used piece of technology. Due to their characteristics, they have become major targets for botnet-related malware. FluBot is one example of botnet malware that infects mobile devices. In particular, FluBot is a DNS-based botnet that uses Domain Generation Algorithms (DGA) to establish communication with the Command and Control Server (C2). MONDEO is a multistage mechanism with a flexible design to detect DNS-based botnet malware. MONDEO is lightweight and can be deployed without requiring the deployment of software, agents, or configuration in mobile devices, allowing easy integration in core networks. MONDEO comprises four detection stages: Blacklisting/Whitelisting, Query rate analysis, DGA analysis, and Machine learning evaluation. It was created with the goal of processing streams of packets to identify attacks with high efficiency, in the distinct phases. MONDEO was tested against several datasets to measure its efficiency and performance, being able to achieve high performance with RandomForest classifiers. The implementation is available on GitHub.
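A toy sketch of the four-stage filtering idea (thresholds, features, and labels are illustrative; the real stage logic lives in the linked repository):

```python
import math

def shannon_entropy(s):
    """Character entropy of a domain name, a common cheap DGA indicator."""
    n = len(s)
    return -sum(s.count(c) / n * math.log2(s.count(c) / n) for c in set(s))

def mondeo_like_pipeline(domain, query_rate, blacklist, whitelist, clf):
    """Staged DNS-query filtering in the spirit of MONDEO's four stages.
    clf is any trained classifier (e.g. a RandomForest) over query features."""
    if domain in whitelist:                      # stage 1: lists
        return "benign"
    if domain in blacklist:
        return "malicious"
    if query_rate > 100:                         # stage 2: query-rate check
        return "suspicious"
    if shannon_entropy(domain) > 3.5:            # stage 3: DGA heuristic
        return "suspicious"
    features = [[len(domain), shannon_entropy(domain), query_rate]]
    return "malicious" if clf.predict(features)[0] else "benign"  # stage 4
```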

Forecasting Emergency Department Crowding with Advanced Machine Learning Models and Multivariable Input

  • paper_url: http://arxiv.org/abs/2308.16544
  • repo_url: None
  • paper_authors: Jalmari Tuominen, Eetu Pulkkinen, Jaakko Peltonen, Juho Kanniainen, Niku Oksala, Ari Palomäki, Antti Roine
  • for: Forecasting emergency department (ED) crowding, a significant threat to patient safety that has been repeatedly associated with increased mortality.
  • methods: Advanced machine learning models forecasting ED occupancy 24 hours ahead, using electronic health record data together with multivariable input including bed availability in catchment-area hospitals, traffic data from local observation stations, and weather variables.
  • results: N-BEATS and LightGBM outperform statistical benchmarks with 11% and 9% improvements respectively, and DeepAR predicts next-day crowding with an AUC of 0.76 (95% CI 0.69-0.84).
    Abstract Emergency department (ED) crowding is a significant threat to patient safety and it has been repeatedly associated with increased mortality. Forecasting future service demand has the potential to improve patient outcomes. Despite active research on the subject, several gaps remain: 1) proposed forecasting models have become outdated due to the quick influx of advanced machine learning models (ML), 2) the amount of multivariable input data has been limited and 3) discrete performance metrics have been rarely reported. In this study, we document the performance of a set of advanced ML models in forecasting ED occupancy 24 hours ahead. We use electronic health record data from a large, combined ED with an extensive set of explanatory variables, including the availability of beds in catchment area hospitals, traffic data from local observation stations, weather variables, etc. We show that N-BEATS and LightGBM outperform benchmarks with 11% and 9% respective improvements and that DeepAR predicts next day crowding with an AUC of 0.76 (95% CI 0.69-0.84). To the best of our knowledge, this is the first study to document the superiority of LightGBM and N-BEATS over statistical benchmarks in the context of ED forecasting.
    摘要 急诊室拥堵是一种严重的 patient safety 问题,已经被重复地与增加 mortality 相关。预测未来服务需求有可能改善 patient outcomes。Despite 多年的研究,还有几个空白:1)提议的预测模型已经因为快速的机器学习模型(ML)的涌入而过时,2)数据的多变量输入受限,3)绝对性表现指标很少被报道。在这种研究中,我们记录了一些高级 ML 模型在预测急诊室占用 24 小时前的表现。我们使用了一个大型、集成的急诊室数据,包括抢救区域医院床位可用性、当地观测站交通数据、天气变量等多个说服变量。我们发现,N-BEATS 和 LightGBM 在比较均匀的情况下表现出了11%和9%的提升,而 DeepAR 预测下一天拥堵的 AUC 为 0.76(95% CI 0.69-0.84)。据我们所知,这是第一个证明 LightGBM 和 N-BEATS 在急诊室预测中超过统计标准的研究。
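
To make the modelling setup concrete, here is a minimal sketch of 24-hour-ahead occupancy forecasting with LightGBM on synthetic data. The lag choices and stand-in exogenous variables are our illustrative assumptions, not the study's actual variable set.

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
T = 2000
hours = np.arange(T)
# Synthetic hourly ED occupancy with a daily cycle plus noise.
occupancy = 20 + 8 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 2, T)
weather = rng.normal(0, 1, T)          # stand-in weather variable
beds_free = rng.integers(0, 30, T)     # stand-in catchment-area bed counts

H = 24  # forecast horizon: 24 hours ahead
lags = [1, 2, 3, 24, 48, 168]
rows = range(max(lags), T - H)
X = np.array([[occupancy[t - l] for l in lags] + [weather[t], beds_free[t], t % 24]
              for t in rows])
y = np.array([occupancy[t + H] for t in rows])

split = int(0.8 * len(X))
model = lgb.LGBMRegressor(n_estimators=300, learning_rate=0.05)
model.fit(X[:split], y[:split])
pred = model.predict(X[split:])
print(f"test MAE: {np.abs(pred - y[split:]).mean():.2f} patients")
```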

Scalable Incomplete Multi-View Clustering with Structure Alignment

  • paper_url: http://arxiv.org/abs/2308.16541
  • repo_url: https://github.com/wy1019/simvc-sa
  • paper_authors: Yi Wen, Siwei Wang, Ke Liang, Weixuan Liang, Xinhang Wan, Xinwang Liu, Suyuan Liu, Jiyuan Liu, En Zhu
  • for: This paper focuses on the problem of incomplete multi-view clustering (IMVC) and proposes a novel incomplete anchor graph learning framework called Scalable Incomplete Multi-View Clustering with Structure Alignment (SIMVC-SA) to tackle the issues of inter-view discrepancy and anchor misalignment.
  • methods: The proposed method constructs view-specific anchor graphs to capture complementary information from different views, and aligns the cross-view anchor correspondence using a novel structure alignment module. The anchor graph construction and alignment are jointly optimized in the unified framework to enhance clustering quality.
  • results: Extensive experiments on seven incomplete benchmark datasets demonstrate the effectiveness and efficiency of the proposed method, with linear time and space complexity correlated with the number of samples. The code is publicly available at https://github.com/wy1019/SIMVC-SA.
    Abstract The success of existing multi-view clustering (MVC) relies on the assumption that all views are complete. However, samples are usually only partially available due to data corruption or sensor malfunction, which has motivated research on incomplete multi-view clustering (IMVC). Although several anchor-based IMVC methods have been proposed to process large-scale incomplete data, they still suffer from the following drawbacks: i) Most existing approaches neglect the inter-view discrepancy and enforce cross-view representations to be consistent, which corrupts the representation capability of the model; ii) Due to the sample disparity between different views, the learned anchors might be misaligned, which we refer to as the Anchor-Unaligned Problem for Incomplete data (AUP-ID). The AUP-ID causes inaccurate graph fusion and degrades clustering performance. To tackle these issues, we propose a novel incomplete anchor graph learning framework termed Scalable Incomplete Multi-View Clustering with Structure Alignment (SIMVC-SA). Specifically, we construct view-specific anchor graphs to capture the complementary information from different views. To solve the AUP-ID, we propose a novel structure alignment module to refine the cross-view anchor correspondence. Meanwhile, the anchor graph construction and alignment are jointly optimized in our unified framework to enhance clustering quality. Through anchor graph construction instead of full graphs, the time and space complexity of the proposed SIMVC-SA is proven to be linearly correlated with the number of samples. Extensive experiments on seven incomplete benchmark datasets demonstrate the effectiveness and efficiency of our proposed method. Our code is publicly available at https://github.com/wy1019/SIMVC-SA.
    摘要 现有多视图聚类（MVC）的成功依赖于所有视图均完整这一假设，但由于数据损坏或传感器故障，样本通常只有部分可用，这引出了不完整多视图聚类（IMVC）的研究。虽然已有若干基于锚点的 IMVC 方法被提出以处理大规模不完整数据，但它们仍存在以下缺点：一、大多数现有方法忽视了视图之间的差异，强制跨视图表示保持一致，这会损害模型的表示能力；二、由于不同视图中样本的差异，学习得到的锚点可能会错位，我们称之为不完整数据的锚点未对齐问题（AUP-ID）。AUP-ID 会导致不准确的图融合并降低聚类性能。为解决这些问题，我们提出了一种新型的不完整锚点图学习框架，名为具有结构对齐的可扩展不完整多视图聚类（SIMVC-SA）。具体地，我们构建视图特定的锚点图来捕捉不同视图中的互补信息；为解决 AUP-ID，我们提出了一种新的结构对齐模块，以修正跨视图锚点对应关系。同时，锚点图构建与对齐在统一框架中联合优化，以提高聚类质量。由于使用锚点图而非全图，SIMVC-SA 的时间和空间复杂度被证明与样本数量呈线性关系。在七个不完整基准数据集上的大量实验证明了所提方法的有效性与高效性。我们的代码可以在 https://github.com/wy1019/SIMVC-SA 上获取。
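
A toy sketch of the two ingredients named above, view-specific anchor graphs and cross-view anchor alignment, using an RBF affinity and Hungarian matching. This is our illustration of the general idea under simplifying assumptions, not SIMVC-SA itself.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def anchor_graph(X, anchors, gamma=1.0):
    """Row-normalized RBF affinities between samples and anchors."""
    Z = np.exp(-gamma * cdist(X, anchors) ** 2)
    return Z / Z.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X1 = rng.normal(size=(100, 5))                     # view-1 features
X2 = X1[:, :3] + 0.1 * rng.normal(size=(100, 3))   # view-2 (shared structure)

A1 = X1[rng.choice(100, 10, replace=False)]
A2 = X2[rng.choice(100, 10, replace=False)]
Z1, Z2 = anchor_graph(X1, A1), anchor_graph(X2, A2)

# Align anchors across views: anchors whose sample-affinity profiles agree
# should correspond (maximize the total Z1[:, i]^T Z2[:, j] similarity).
sim = Z1.T @ Z2
row, col = linear_sum_assignment(-sim)   # Hungarian matching
Z2_aligned = Z2[:, col]                  # permute view-2 anchors to match view 1
print("anchor permutation:", col)
```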

Echocardiographic View Classification with Integrated Out-of-Distribution Detection for Enhanced Automatic Echocardiographic Analysis

  • paper_url: http://arxiv.org/abs/2308.16483
  • repo_url: None
  • paper_authors: Jaeik Jeon, Seongmin Ha, Yeonyee E. Yoon, Jiyeon Kim, Hyunseok Jeong, Dawun Jeong, Yeonggul Jang, Youngtaek Hong, Hyuk-Jae Chang
  • for: 这份研究旨在提高自动echocardiography分类的精度和可靠性,以便在诊断和评估心脏病的过程中帮助医生。
  • methods: 这篇研究采用深度学习方法，对 31 种超声心动图视图进行分类，并集成了分布外（OOD）检测功能（利用相对马氏距离识别“近 OOD”样本）。
  • results: 研究结果显示，ECHO-VICODE 能够实现高精度、高可靠性的视图分类，并能有效识别分布外（OOD）样本，从而显著降低超声心动图分析中的潜在错误。
    Abstract In the rapidly evolving field of automatic echocardiographic analysis and interpretation, automatic view classification is a critical yet challenging task, owing to the inherent complexity and variability of echocardiographic data. This study presents ECHOcardiography VIew Classification with Out-of-Distribution dEtection (ECHO-VICODE), a novel deep learning-based framework that effectively addresses this challenge by training to classify 31 classes, surpassing previous studies and demonstrating its capacity to handle a wide range of echocardiographic views. Furthermore, ECHO-VICODE incorporates an integrated out-of-distribution (OOD) detection function, leveraging the relative Mahalanobis distance to effectively identify 'near-OOD' instances commonly encountered in echocardiographic data. Through extensive experimentation, we demonstrated the outstanding performance of ECHO-VICODE in terms of view classification and OOD detection, significantly reducing the potential for errors in echocardiographic analyses. This pioneering study significantly advances the domain of automated echocardiography analysis and exhibits promising prospects for substantial applications in extensive clinical research and practice.
    摘要 在自动echocardiographic分析和解释领域中,自动视类别是一项挑战性的任务,主要因为echocardiographic数据的内在复杂性和变化性。本研究提出了ECHOcardiography View Classification with Out-of-Distribution Detection(ECHO-VICODE),一种深度学习基础的框架,能够有效地解决这个挑战。ECHO-VICODE通过训练31个类别,超过了先前的研究,并证明了其能够处理广泛的echocardiographic视图。此外,ECHO-VICODE还包括内置的out-of-distribution(OOD)检测功能,利用相对的Mahalanobis距离有效地标识echocardiographic数据中的“近OOD”实例。经过广泛的实验,我们证明了ECHO-VICODE在视图类别和OOD检测方面的出色性能,明显减少了echocardiographic分析中的可能的错误。这项创新的研究在自动echocardiography分析领域中具有先驱性,展现出了广泛的临床研究和实践应用的潜在前景。
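
The relative Mahalanobis distance mentioned above is a known OOD score: the class-conditional Mahalanobis distance minus the distance under a single "background" Gaussian. A minimal NumPy sketch, with synthetic features standing in for the network's embeddings:

```python
import numpy as np

def fit_gaussians(feats, labels):
    classes = np.unique(labels)
    mus = {c: feats[labels == c].mean(axis=0) for c in classes}
    # Shared (tied) covariance across classes, as in the standard recipe.
    centered = np.vstack([feats[labels == c] - mus[c] for c in classes])
    cov = np.cov(centered.T) + 1e-6 * np.eye(feats.shape[1])
    mu0 = feats.mean(axis=0)                      # background Gaussian
    cov0 = np.cov(feats.T) + 1e-6 * np.eye(feats.shape[1])
    return mus, np.linalg.inv(cov), mu0, np.linalg.inv(cov0)

def relative_mahalanobis_score(x, mus, prec, mu0, prec0):
    md0 = (x - mu0) @ prec0 @ (x - mu0)
    rmd = min((x - m) @ prec @ (x - m) - md0 for m in mus.values())
    return rmd  # larger => more likely out-of-distribution

rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(c, 1.0, size=(200, 8)) for c in (0.0, 3.0)])
labels = np.repeat([0, 1], 200)
params = fit_gaussians(feats, labels)
print(relative_mahalanobis_score(rng.normal(1.5, 1, 8), *params))  # near-OOD
print(relative_mahalanobis_score(rng.normal(0.0, 1, 8), *params))  # in-distribution
```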

A Policy Adaptation Method for Implicit Multitask Reinforcement Learning Problems

  • paper_url: http://arxiv.org/abs/2308.16471
  • repo_url: None
  • paper_authors: Satoshi Yamamori, Jun Morimoto
  • for: 这项研究针对包含接触与碰撞的动态动作生成任务：在这类任务中，策略参数的微小变化可能导致截然不同的回报。例如在足球中，仅稍微改变触球位置或触球力度（或当球的摩擦发生变化时），相似的顶球动作就能把球送向完全不同的方向；但很难想象朝不同方向顶球需要完全不同的技能。
  • methods: 这项研究提出一种多任务强化学习算法，使策略能够适应同一动作类别内目标或环境的隐式变化，即不同的奖励函数或环境物理参数。
  • results: 在单足机器人模型的顶球任务上的结果表明，所提方法能够适应目标位置或球的恢复系数的隐式变化，而标准的域随机化（domain randomization）方法无法应对不同的任务设定。
    Abstract In dynamic motion generation tasks, including contact and collisions, small changes in policy parameters can lead to extremely different returns. For example, in soccer, the ball can fly in completely different directions with a similar heading motion by slightly changing the hitting position or the force applied to the ball or when the friction of the ball varies. However, it is difficult to imagine that completely different skills are needed for heading a ball in different directions. In this study, we proposed a multitask reinforcement learning algorithm for adapting a policy to implicit changes in goals or environments in a single motion category with different reward functions or physical parameters of the environment. We evaluated the proposed method on the ball heading task using a monopod robot model. The results showed that the proposed method can adapt to implicit changes in the goal positions or the coefficients of restitution of the ball, whereas the standard domain randomization approach cannot cope with different task settings.
    摘要 在动态动作生成任务中,包括 contacts 和碰撞,小型政策参数变化可以导致极其不同的返回。例如,在足球中,通过些微改变球头位置或发球力量,球可以飞向完全不同的方向,但是不需要完全不同的技能。在本研究中,我们提出了一种多任务强化学习算法,用于适应单个动作类别中的不同目标或环境中的隐式变化。我们使用一个单脚机器人模型进行评估。结果表明,我们的方法可以适应不同的目标位置或球的归退率,而标准领域随机化方法无法处理不同的任务设定。

Domain-adaptive Message Passing Graph Neural Network

  • paper_url: http://arxiv.org/abs/2308.16470
  • repo_url: https://github.com/shenxiaocam/dm_gnn
  • paper_authors: Xiao Shen, Shirui Pan, Kup-Sze Choi, Xi Zhou
  • for: 本研究旨在解决跨网络节点分类（cross-network node classification, CNNC）问题，即利用标签充足的源网络的知识，对标签稀缺的目标网络中的节点进行分类。
  • methods: 本研究提出一种域自适应消息传递图神经网络（DM-GNN），将图神经网络（GNN）与条件对抗域自适应相结合。DM-GNN 能够学习既具有判别力、又可跨网络迁移的节点表示。
  • results: 与 11 种最新方法的比较表明，DM-GNN 更为有效，能够更好地匹配跨网络的类条件分布。
    Abstract Cross-network node classification (CNNC), which aims to classify nodes in a label-deficient target network by transferring the knowledge from a source network with abundant labels, draws increasing attention recently. To address CNNC, we propose a domain-adaptive message passing graph neural network (DM-GNN), which integrates graph neural network (GNN) with conditional adversarial domain adaptation. DM-GNN is capable of learning informative representations for node classification that are also transferrable across networks. Firstly, a GNN encoder is constructed by dual feature extractors to separate ego-embedding learning from neighbor-embedding learning so as to jointly capture commonality and discrimination between connected nodes. Secondly, a label propagation node classifier is proposed to refine each node's label prediction by combining its own prediction and its neighbors' prediction. In addition, a label-aware propagation scheme is devised for the labeled source network to promote intra-class propagation while avoiding inter-class propagation, thus yielding label-discriminative source embeddings. Thirdly, conditional adversarial domain adaptation is performed to take the neighborhood-refined class-label information into account during adversarial domain adaptation, so that the class-conditional distributions across networks can be better matched. Comparisons with eleven state-of-the-art methods demonstrate the effectiveness of the proposed DM-GNN.
    摘要 跨网络节点分类（CNNC）旨在利用标签充足的源网络的知识，对标签稀缺的目标网络中的节点进行分类，近年来受到越来越多的关注。为解决 CNNC，我们提出了域自适应消息传递图神经网络（DM-GNN），它将图神经网络（GNN）与条件对抗域自适应相结合。DM-GNN 能够学习既具有判别力又可跨网络迁移的节点表示。首先，通过双特征提取器构建 GNN 编码器，将自身嵌入学习与邻居嵌入学习分离，从而同时捕捉相连节点之间的共性与差异。其次，提出标签传播节点分类器，通过结合节点自身的预测与其邻居的预测来细化每个节点的标签预测；并为有标签的源网络设计了标签感知的传播方案，促进类内传播、抑制类间传播，得到具有标签判别力的源嵌入。最后，执行条件对抗域自适应，在对抗域自适应过程中利用经邻域细化的类标签信息，使跨网络的类条件分布得到更好的匹配。与 11 种最新方法的比较实验证明了所提 DM-GNN 的有效性。
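
A small NumPy sketch of the label-propagation refinement idea described above, where each node's class scores are a convex blend of its own prediction and its neighbors' average. The blend weight alpha and the toy graph are our illustrative assumptions, not DM-GNN's exact classifier.

```python
import numpy as np

def refine_predictions(P, adj, alpha=0.6, iters=5):
    """Blend each node's class scores with the mean of its neighbors'.

    P   : (n, c) initial class probabilities from a classifier
    adj : (n, n) binary adjacency matrix
    """
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    for _ in range(iters):
        P = alpha * P + (1 - alpha) * (adj @ P) / deg
    return P / P.sum(axis=1, keepdims=True)

# Toy 4-node path graph; node 2's noisy scores get corrected by context.
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
P0 = np.array([[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.1, 0.9]])
print(refine_predictions(P0, adj).round(2))
```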

Computing excited states of molecules using normalizing flows

  • paper_url: http://arxiv.org/abs/2308.16468
  • repo_url: None
  • paper_authors: Yahya Saleh, Álvaro Fernández Corral, Armin Iske, Jochen Küpper, Andrey Yachmenev
  • for: 用于计算量子系统的ground和 excited状态
  • methods: 将波函数近似为基函数的线性组合，并通过与 normalizing flows 复合来增强和优化这些基函数。
  • results: 在三原子 H$_2$S 分子的大量振动态计算，以及氢原子、氢分子离子和单活性电子近似下的碳原子等单电子体系的基态与若干激发电子态计算中，获得了更高的能量预测精度并加速了基组收敛。
    Abstract We present a new nonlinear variational framework for simultaneously computing ground and excited states of quantum systems. Our approach is based on approximating wavefunctions in the linear span of basis functions that are augmented and optimized \emph{via} composition with normalizing flows. The accuracy and efficiency of our approach are demonstrated in the calculations of a large number of vibrational states of the triatomic H$_2$S molecule as well as ground and several excited electronic states of prototypical one-electron systems including the hydrogen atom, the molecular hydrogen ion, and a carbon atom in a single-active-electron approximation. The results demonstrate significant improvements in the accuracy of energy predictions and accelerated basis-set convergence even when using normalizing flows with a small number of parameters. The present approach can be also seen as the optimization of a set of intrinsic coordinates that best capture the underlying physics within the given basis set.
    摘要 我们提出了一种新的非线性变分框架，用于同时计算量子系统的基态与激发态。该方法将波函数近似为基函数的线性组合，并通过与 normalizing flows 复合来增强和优化这些基函数。我们在三原子 H$_2$S 分子的大量振动态计算，以及氢原子、氢分子离子和单活性电子近似下的碳原子等典型单电子体系的基态与若干激发电子态计算中，验证了该方法的精度与效率。结果表明，即使 normalizing flow 的参数很少，也能显著提高能量预测精度并加速基组收敛。此外，该方法也可视为在给定基组内优化一组最能刻画底层物理的内在坐标。
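
As a worked restatement of the ansatz described above (our notation, inferred from the abstract): the basis functions are composed with an invertible map $f_\theta$, the Jacobian factor preserves orthonormality, and the linear coefficients and flow parameters are optimized variationally.

```latex
% Flow-augmented linear ansatz for the n-th state (notation ours):
%   phi_j   : fixed basis functions
%   f_theta : normalizing flow (invertible, trainable)
\psi_n(\mathbf{x}) \;=\; \sum_{j} c_{nj}\,
    \phi_j\!\bigl(f_\theta(\mathbf{x})\bigr)\,
    \bigl|\det J_{f_\theta}(\mathbf{x})\bigr|^{1/2},
\qquad
\min_{\theta,\,C}\; \sum_{n}
    \frac{\langle \psi_n | \hat H | \psi_n \rangle}
         {\langle \psi_n | \psi_n \rangle}
\;\;\text{s.t.}\;\; \langle \psi_m | \psi_n \rangle = \delta_{mn}.
```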

Least Squares Maximum and Weighted Generalization-Memorization Machines

  • paper_url: http://arxiv.org/abs/2308.16456
  • repo_url: None
  • paper_authors: Shuai Wang, Zhen Wang, Yuan-Hai Shao
  • for: 这篇论文为最小二乘支持向量机（LSSVM）提出了一种新的记忆机制，可以在不过拟合的前提下准确划分训练集。
  • methods: 论文提出了两种记忆影响模型（MIMM 和 WIMM），以及若干不同的记忆影响函数；这些模型均可退化为 LSSVM（即以 LSSVM 为特例）。
  • results: 实验结果表明,我们的MIMM和WIMM模型在总体性能和时间成本方面都有优势,比LSSVM和其他记忆模型更好。
    Abstract In this paper, we propose a new way of remembering by introducing a memory influence mechanism for the least squares support vector machine (LSSVM). Without changing the equality constraints of the original LSSVM, this mechanism allows an accurate partitioning of the training set without overfitting. The maximum memory impact model (MIMM) and the weighted impact memory model (WIMM) are then proposed. It is shown that these models reduce to the LSSVM as special cases. Furthermore, we propose several different memory impact functions for the MIMM and WIMM. The experimental results show that our MIMM and WIMM have better generalization performance than the LSSVM and a significant advantage in time cost over other memory models.
    摘要 在这篇论文中,我们提出了一种新的记忆机制,用于改进最小二乘支持向量机(LSSVM)的性能。无需更改原始LSSVM的方程约束,这种机制可以准确地分区训练集而不导致过拟合。然后,我们提出了最大记忆影响模型(MIMM)和权重记忆影响模型(WIMM)。我们还提出了一些不同的记忆影响函数 для MIMM 和 WIMM。实验结果表明,我们的 MIMM 和 WIMM 在泛化性能方面表现更好,而且在时间成本方面具有显著的优势 compared to 其他记忆模型。
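
For context, the LSSVM baseline these models extend has a simple closed form: training reduces to one linear system over the kernel matrix. A minimal NumPy sketch of the standard regression (function-estimation) variant; the memory-influence extensions from the paper are not reproduced here.

```python
import numpy as np

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve the LSSVM KKT system [[0, 1^T], [1, K + I/gamma]] [b; a] = [0; y]."""
    n = len(X)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2 * sigma**2))                  # RBF kernel
    A = np.zeros((n + 1, n + 1))
    A[0, 1:], A[1:, 0] = 1.0, 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[0], sol[1:]                            # bias b, dual coefs alpha

def lssvm_predict(Xq, X, b, alpha, sigma=1.0):
    sq = ((Xq[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma**2)) @ alpha + b

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (80, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=80)
b, alpha = lssvm_fit(X, y)
print(lssvm_predict(np.array([[0.5]]), X, b, alpha))  # ~ sin(0.5)
```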

Listen to Minority: Encrypted Traffic Classification for Class Imbalance with Contrastive Pre-Training

  • paper_url: http://arxiv.org/abs/2308.16453
  • repo_url: None
  • paper_authors: Xiang Li, Juncheng Guo, Qige Song, Jiang Xie, Yafei Sang, Shuyuan Zhao, Yongzheng Zhang
  • for: 本研究旨在提出一种新的加密流量分类（ETC）框架，以便在移动互联网环境中有效管理加密流量。
  • methods: 本文提出了一种新的预训练半监督 ETC 框架（PASS），包含三个关键设计：1）对原始训练集重采样并进行对比式预训练，在不直接使用应用标签的情况下避免类别不平衡带来的标签偏差，并获得能区分相互重叠的同质流量的鲁棒特征表示；2）基于伪标签迭代与动态损失加权的半监督优化策略，以利用大量无标注流量数据并减轻人工标注负担；3）上述组件可自适应地增益使用不同特征提取器的 ETC 方法。
  • results: 在四个类别严重不平衡、流量高度同质的公开数据集上，PASS 的性能优于最新的 ETC 方法和通用采样方法。PASS 可适配不同的特征提取器，在多种网络环境下高效地进行加密流量分类。
    Abstract Mobile Internet has profoundly reshaped modern lifestyles in various aspects. Encrypted Traffic Classification (ETC) naturally plays a crucial role in managing mobile Internet, especially with the explosive growth of mobile apps using encrypted communication. Despite some existing learning-based ETC methods showing promising results, three-fold limitations still remain in real-world network environments: 1) label bias caused by traffic class imbalance, 2) traffic homogeneity caused by component sharing, and 3) training that relies on sufficient labeled traffic. None of the existing ETC methods can address all these limitations. In this paper, we propose a novel Pre-trAining Semi-Supervised ETC framework, dubbed PASS. Our key insight is to resample the original train dataset and perform contrastive pre-training without using individual app labels directly to avoid label bias issues caused by class imbalance, while obtaining a robust feature representation to differentiate overlapping homogeneous traffic by pulling positive traffic pairs closer and pushing negative pairs away. Meanwhile, PASS designs a semi-supervised optimization strategy based on pseudo-label iteration and dynamic loss weighting algorithms in order to effectively utilize massive unlabeled traffic data and alleviate the manual train dataset annotation workload. PASS outperforms state-of-the-art ETC methods and generic sampling approaches on four public datasets with significant class imbalance and traffic homogeneity, remarkably pushing the F1 on Cross-Platform215 up by 1.31% and on ISCX-17 by 9.12%. Furthermore, we validate the generality of the contrastive pre-training and pseudo-label iteration components of PASS, which can adaptively benefit ETC methods with diverse feature extractors.
    摘要 移动互联网已深刻地重塑了现代生活方式，加密流量分类（ETC）在移动互联网管理中扮演着关键角色，尤其是在使用加密通信的移动应用爆炸式增长的背景下。尽管已有一些基于学习的 ETC 方法取得了不错的效果，但在真实网络环境中仍存在三方面局限：1）流量类别不平衡导致的标签偏差；2）组件共享导致的流量同质化；3）训练依赖充足的有标注流量。现有 ETC 方法无法同时解决这些问题。本文提出一种新颖的预训练半监督 ETC 框架 PASS：其核心思想是对原始训练集重采样并进行对比式预训练，不直接使用各应用的标签，从而避免类别不平衡带来的标签偏差，同时通过拉近正样本对、推远负样本对，获得能够区分重叠同质流量的鲁棒特征表示；并设计了基于伪标签迭代与动态损失加权的半监督优化策略，以有效利用海量无标注流量并减轻人工标注负担。在四个类别严重不平衡、流量高度同质的公开数据集上，PASS 优于最新的 ETC 方法和通用采样方法，将 Cross-Platform215 的 F1 提升 1.31%、ISCX-17 提升 9.12%。此外，我们验证了 PASS 中对比预训练与伪标签迭代组件的通用性，它们可以自适应地增益使用不同特征提取器的 ETC 方法。
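
A minimal sketch of the pseudo-label iteration idea (confidence-thresholded self-training). PASS's actual pipeline also includes contrastive pre-training and dynamic loss weighting, which this toy loop does not reproduce; the classifier and threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def pseudo_label_iterate(X_lab, y_lab, X_unlab, rounds=3, conf_thresh=0.9):
    X_train, y_train = X_lab.copy(), y_lab.copy()
    pool = X_unlab.copy()
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    for _ in range(rounds):
        clf.fit(X_train, y_train)
        if len(pool) == 0:
            break
        proba = clf.predict_proba(pool)
        keep = proba.max(axis=1) >= conf_thresh   # only trust confident predictions
        X_train = np.vstack([X_train, pool[keep]])
        y_train = np.concatenate([y_train, clf.classes_[proba[keep].argmax(axis=1)]])
        pool = pool[~keep]                        # remaining unlabeled samples
    return clf

rng = np.random.default_rng(0)
X0, X1 = rng.normal(0, 1, (30, 4)), rng.normal(2, 1, (30, 4))
X_lab = np.vstack([X0[:5], X1[:5]]); y_lab = np.array([0] * 5 + [1] * 5)
X_unlab = np.vstack([X0[5:], X1[5:]])
clf = pseudo_label_iterate(X_lab, y_lab, X_unlab)
print(clf.predict(rng.normal(2, 1, (3, 4))))  # expect class 1
```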

AntM$^{2}$C: A Large Scale Dataset For Multi-Scenario Multi-Modal CTR Prediction

  • paper_url: http://arxiv.org/abs/2308.16437
  • repo_url: None
  • paper_authors: Zhaoxin Huan, Ke Ding, Ang Li, Xiaolu Zhang, Xu Min, Yong He, Liang Zhang, Jun Zhou, Linjian Mo, Jinjie Gu, Zhongyi Liu, Wenliang Zhong, Guannan Zhang
  • for: The paper proposes a new CTR dataset, AntM$^{2}$C, to address the limitations of existing CTR datasets.
  • methods: The paper uses a multi-scenario multi-modal approach, covering 200 million users and 6 million items, to provide a more comprehensive understanding of user preferences. The dataset includes 200 features, such as ID-based features, raw text features, and image features.
  • results: The paper provides comparisons with baseline methods on several typical CTR tasks based on the AntM$^{2}$C dataset, which is currently the largest-scale CTR dataset available.
    Abstract Click-through rate (CTR) prediction is a crucial issue in recommendation systems. There has been an emergence of various public CTR datasets. However, existing datasets primarily suffer from the following limitations. Firstly, users generally click different types of items from multiple scenarios, and modeling from multiple scenarios can provide a more comprehensive understanding of users. Existing datasets only include data for the same type of items from a single scenario. Secondly, multi-modal features are essential in multi-scenario prediction as they address the issue of inconsistent ID encoding between different scenarios. The existing datasets are based on ID features and lack multi-modal features. Third, a large-scale dataset can provide a more reliable evaluation of models, fully reflecting the performance differences between models. The scale of existing datasets is around 100 million, which is relatively small compared to the real-world CTR prediction. To address these limitations, we propose AntM$^{2}$C, a Multi-Scenario Multi-Modal CTR dataset based on industrial data from Alipay. Specifically, AntM$^{2}$C provides the following advantages: 1) It covers CTR data of 5 different types of items, providing insights into the preferences of users for different items, including advertisements, vouchers, mini-programs, contents, and videos. 2) Apart from ID-based features, AntM$^{2}$C also provides 2 multi-modal features, raw text and image features, which can effectively establish connections between items with different IDs. 3) AntM$^{2}$C provides 1 billion CTR data with 200 features, including 200 million users and 6 million items. It is currently the largest-scale CTR dataset available. Based on AntM$^{2}$C, we construct several typical CTR tasks and provide comparisons with baseline methods. The dataset homepage is available at https://www.atecup.cn/home.
    摘要 Click-through rate（CTR）预测是推荐系统中的关键问题。随着各类公开数据集的出现，现有 dataset 仍受到以下限制：首先，用户通常会在多个场景中点击不同类型的项目，从多个场景建模能够提供对用户更全面的理解，而现有 dataset 仅包含来自单一场景的同一类型项目的数据。其次，多模态特征在多场景预测中至关重要，可以解决不同场景间 ID 编码不一致的问题，而现有 dataset 基于 ID 特征，缺乏多模态特征。第三，大规模 dataset 可以提供更可靠的评估，全面反映模型之间的性能差异，而现有 dataset 的规模约为 1 亿，相比真实世界的 CTR 预测仍然偏小。为解决这些限制，我们提出 AntM$^{2}$C，一个基于支付宝（Alipay）工业数据的多场景多模态 CTR dataset。具体来说，AntM$^{2}$C 提供了以下优势：1. 覆盖 5 种不同类型项目（广告、优惠券、小程序、内容和视频）的 CTR 数据，为用户偏好研究提供了新的视角。2. 除 ID 特征之外，AntM$^{2}$C 还提供了原始文本和图像两类多模态特征，可以有效建立不同 ID 之间的连接。3. AntM$^{2}$C 提供了 10 亿条 CTR 数据、200 个特征，覆盖 2 亿用户和 600 万个项目，是目前已知最大规模的 CTR dataset。基于 AntM$^{2}$C，我们构建了一些典型的 CTR 任务，并与基准方法进行比较。dataset 的主页可以在 https://www.atecup.cn/home 找到。

On the Equivalence between Implicit and Explicit Neural Networks: A High-dimensional Viewpoint

  • paper_url: http://arxiv.org/abs/2308.16425
  • repo_url: None
  • paper_authors: Zenan Ling, Zhenyu Liao, Robert C. Qiu
  • for: 研究高维隐式神经网络的性质和特点。
  • methods: 提供高维隐式神经网络的 conjugate kernels 和 neural tangent kernels 的等价性。
  • results: 在高维情况下，隐式和显式神经网络之间存在等价关系。

In English, this translates to:
  • for: Investigating the properties and characteristics of high-dimensional implicit neural networks.
  • methods: Providing the high-dimensional equivalents for the corresponding conjugate kernels and neural tangent kernels.
  • results: Demonstrating the equivalence between implicit and explicit networks in high dimensions.
    Abstract Implicit neural networks have demonstrated remarkable success in various tasks. However, there is a lack of theoretical analysis of the connections and differences between implicit and explicit networks. In this paper, we study high-dimensional implicit neural networks and provide the high dimensional equivalents for the corresponding conjugate kernels and neural tangent kernels. Built upon this, we establish the equivalence between implicit and explicit networks in high dimensions.
    摘要 隐式神经网络（implicit neural networks）在各种任务中表现出色，但关于隐式网络与显式网络之间联系与差异的理论分析尚有欠缺。本文研究高维隐式神经网络，给出相应的共轭核（conjugate kernel）与神经正切核（neural tangent kernel）的高维等价形式。在此基础上，我们建立了高维情形下隐式网络与显式网络之间的等价性。

DECODE: DilatEd COnvolutional neural network for Detecting Extreme-mass-ratio inspirals

  • paper_url: http://arxiv.org/abs/2308.16422
  • repo_url: None
  • paper_authors: Tianyu Zhao, Yue Zhou, Ruijun Shi, Zhoujian Cao, Zhixiang Ren
  • for: 极端质量比旋近（EMRI）信号波形复杂、持续时间长、信噪比（SNR）低，因此比致密双星并合更难探测。
  • methods: 我们介绍了一种名为 DECODE 的端到端模型，通过频域序列建模来检测 EMRI 信号。DECODE 的核心是一个空洞（dilated）因果卷积神经网络，使用考虑 TDI-1.5 探测器响应的合成数据进行训练。
  • results: 我们在累计 SNR 为 50 至 120 的一年多通道 TDI 数据上评估该模型，在 1% 假阳性率下取得 96.3% 的真阳性率，单次推理时间低于 0.01 秒；并通过三个 EMRI 信号示例的可视化展示了 DECODE 的可解释性与泛化潜力。
    Abstract The detection of Extreme Mass Ratio Inspirals (EMRIs) is intricate due to their complex waveforms, extended duration, and low signal-to-noise ratio (SNR), making them more challenging to be identified compared to compact binary coalescences. While matched filtering-based techniques are known for their computational demands, existing deep learning-based methods primarily handle time-domain data and are often constrained by data duration and SNR. In addition, most existing work ignores time-delay interferometry (TDI) and applies the long-wavelength approximation in detector response calculations, thus limiting their ability to handle laser frequency noise. In this study, we introduce DECODE, an end-to-end model focusing on EMRI signal detection by sequence modeling in the frequency domain. Centered around a dilated causal convolutional neural network, trained on synthetic data considering TDI-1.5 detector response, DECODE can efficiently process a year's worth of multichannel TDI data with an SNR of around 50. We evaluate our model on 1-year data with accumulated SNR ranging from 50 to 120 and achieve a true positive rate of 96.3% at a false positive rate of 1%, keeping an inference time of less than 0.01 seconds. With the visualization of three showcased EMRI signals for interpretability and generalization, DECODE exhibits strong potential for future space-based gravitational wave data analyses.
    摘要 极端质量比旋近（EMRI）信号的探测非常困难：其波形复杂、持续时间长、信噪比（SNR）低，因而比致密双星并合更难识别。基于匹配滤波的技术计算开销巨大，而现有的深度学习方法主要处理时域数据，且通常受数据时长和 SNR 的限制。此外，多数现有工作忽略了时间延迟干涉（TDI），并在探测器响应计算中采用长波长近似，从而限制了其处理激光频率噪声的能力。本文提出 DECODE，一种在频域进行序列建模、专注于 EMRI 信号探测的端到端模型。DECODE 以空洞因果卷积神经网络为核心，使用考虑 TDI-1.5 探测器响应的合成数据训练，能够高效处理一年时长、SNR 约为 50 的多通道 TDI 数据。我们在累计 SNR 为 50 至 120 的一年数据上评估该模型，在 1% 假阳性率下取得 96.3% 的真阳性率，推理时间低于 0.01 秒。通过对三个 EMRI 信号示例的可视化展示其可解释性与泛化能力，DECODE 在未来空间引力波数据分析中展现出强大潜力。
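
A minimal NumPy illustration of the dilated causal convolution at the core of the model: each layer doubles the dilation, so the receptive field grows exponentially with depth while outputs never peek at future samples. Layer sizes and the toy input are our own choices, not DECODE's architecture.

```python
import numpy as np

def dilated_causal_conv(x, w, dilation):
    """y[t] = sum_k w[k] * x[t - k*dilation], with left zero-padding (causal)."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([
        sum(w[j] * xp[pad + t - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

rng = np.random.default_rng(0)
x = rng.normal(size=64)
kernel_size, depth = 2, 5
h = x
for layer in range(depth):
    w = rng.normal(size=kernel_size)
    h = np.tanh(dilated_causal_conv(h, w, dilation=2**layer))

# Receptive field: 1 + (k-1) * sum(2^l for l < depth) = 1 + 1 * 31 = 32 samples.
print(h.shape)
```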

CktGNN: Circuit Graph Neural Network for Electronic Design Automation

  • paper_url: http://arxiv.org/abs/2308.16406
  • repo_url: https://github.com/zehao-dong/CktGNN
  • paper_authors: Zehao Dong, Weidong Cao, Muhan Zhang, Dacheng Tao, Yixin Chen, Xuan Zhang
  • for: 这paper的目的是提出一种基于神经网络的电子设计自动化方法,用于快速和高效地设计分析电路。
  • methods: 这paper使用了一种名为Circuit Graph Neural Network(CktGNN)的方法,该方法同时自动生成电路拓扑和设备大小。CktGNN使用了两层GNN框架,将电路表示为一系列嵌入式GNN的输入。这种方法可以大幅提高设计效率,降低了消息传递的数量。
  • results: experiments表明,CktGNN在Open Circuit Benchmark(OCB)上表现出色,在 represenation-based optimization frameworks 下,其性能较其他最近的GNN基elines和人工设计更高。这些结果预示了一种基于学习的电子设计自动化方法的可能性。
    Abstract The electronic design automation of analog circuits has been a longstanding challenge in the integrated circuit field due to the huge design space and complex design trade-offs among circuit specifications. In the past decades, intensive research efforts have mostly been paid to automate the transistor sizing with a given circuit topology. By recognizing the graph nature of circuits, this paper presents a Circuit Graph Neural Network (CktGNN) that simultaneously automates the circuit topology generation and device sizing based on the encoder-dependent optimization subroutines. Particularly, CktGNN encodes circuit graphs using a two-level GNN framework (of nested GNN) where circuits are represented as combinations of subgraphs in a known subgraph basis. In this way, it significantly improves design efficiency by reducing the number of subgraphs to perform message passing. Nonetheless, another critical roadblock to advancing learning-assisted circuit design automation is a lack of public benchmarks to perform canonical assessment and reproducible research. To tackle the challenge, we introduce Open Circuit Benchmark (OCB), an open-sourced dataset that contains $10$K distinct operational amplifiers with carefully-extracted circuit specifications. OCB is also equipped with communicative circuit generation and evaluation capabilities such that it can help to generalize CktGNN to design various analog circuits by producing corresponding datasets. Experiments on OCB show the extraordinary advantages of CktGNN through representation-based optimization frameworks over other recent powerful GNN baselines and human experts' manual designs. Our work paves the way toward a learning-based open-sourced design automation for analog circuits. Our source code is available at \url{https://github.com/zehao-dong/CktGNN}.
    摘要 模拟电路的电子设计自动化一直是集成电路领域的长期挑战，原因在于巨大的设计空间以及电路指标之间复杂的设计权衡。过去几十年，大量研究主要集中于在给定电路拓扑下自动化晶体管尺寸设计。本文利用电路的图结构特性，提出电路图神经网络（CktGNN），基于依赖编码器的优化子程序，同时自动生成电路拓扑并确定器件尺寸。具体地，CktGNN 采用两级（嵌套）GNN 框架编码电路图，将电路表示为已知子图基底中子图的组合，从而显著减少需要进行消息传递的子图数量、提高设计效率。然而，推进学习辅助电路设计自动化的另一大障碍是缺乏公开基准来进行规范评估和可复现研究。为此，我们发布了开源数据集 Open Circuit Benchmark（OCB），其中包含 1 万个不同的运算放大器及精心提取的电路指标；OCB 还具备电路生成与评估能力，可通过生成相应数据集帮助 CktGNN 推广到各类模拟电路设计。在 OCB 上的实验表明，在基于表示的优化框架下，CktGNN 显著优于近期强大的 GNN 基线和人工专家设计。我们的工作为模拟电路的基于学习的开源设计自动化铺平了道路。代码可以在 https://github.com/zehao-dong/CktGNN 中找到。

Balancing between the Local and Global Structures (LGS) in Graph Embedding

  • paper_url: http://arxiv.org/abs/2308.16403
  • repo_url: None
  • paper_authors: Jacob Miller, Vahan Huroyan, Stephen Kobourov
  • for: 这篇论文提出一种通过可调参数在图嵌入中平衡局部与全局结构（LGS）的方法。
  • methods: 该方法基于图嵌入/图布局技术，并采用既有的质量指标（如 stress 和邻域保持）进行评估。
  • results: 该研究在合成与真实数据上进行了评估，结果表明 LGS 与现有方法相当；并引入新的质量指标 cluster distance preservation 来评估对中间结构的捕捉。所有代码、数据、实验和分析均可在线获取。
    Abstract We present a method for balancing between the Local and Global Structures (LGS) in graph embedding, via a tunable parameter. Some embedding methods aim to capture global structures, while others attempt to preserve local neighborhoods. Few methods attempt to do both, and it is not always possible to capture well both local and global information in two dimensions, which is where most graph drawings live. The choice of using a local or a global embedding for visualization depends not only on the task but also on the structure of the underlying data, which may not be known in advance. For a given graph, LGS aims to find a good balance between the local and global structure to preserve. We evaluate the performance of LGS with synthetic and real-world datasets and our results indicate that it is competitive with the state-of-the-art methods, using established quality metrics such as stress and neighborhood preservation. We introduce a novel quality metric, cluster distance preservation, to assess intermediate structure capture. All source-code, datasets, experiments and analysis are available online.
    摘要 我们提出一种通过可调参数在图嵌入中平衡局部与全局结构（LGS）的方法。一些嵌入方法旨在捕捉全局结构，另一些则试图保留局部邻域，很少有方法同时兼顾两者；而在大多数图可视化所处的二维空间中，往往难以同时很好地刻画局部与全局信息。可视化时选择局部嵌入还是全局嵌入，不仅取决于任务，还取决于底层数据的结构，而后者事先可能并不可知。对于给定的图，LGS 旨在找到局部与全局结构之间合适的平衡。我们在合成与真实数据集上评估了 LGS，基于 stress、邻域保持等既有质量指标的结果表明其与最新方法相当。我们还引入了新的质量指标 cluster distance preservation 来评估中间结构的捕捉。所有源代码、数据集、实验与分析均可在线获取。
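
The two established quality metrics named above have standard formulations; a small NumPy sketch of how they are typically computed (normalization details vary across papers, so treat these as one common variant):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def normalized_stress(X_embed, D_graph):
    """Sum over pairs of (||xi - xj|| - d_ij)^2 / d_ij^2 (lower is better)."""
    D_emb = squareform(pdist(X_embed))
    iu = np.triu_indices_from(D_graph, k=1)
    return np.sum((D_emb[iu] - D_graph[iu]) ** 2 / D_graph[iu] ** 2)

def neighborhood_preservation(X_embed, D_graph, k=5):
    """Mean Jaccard overlap between k-NN sets in graph vs. embedding space."""
    D_emb = squareform(pdist(X_embed))
    scores = []
    for i in range(len(D_emb)):
        nn_g = set(np.argsort(D_graph[i])[1:k + 1])
        nn_e = set(np.argsort(D_emb[i])[1:k + 1])
        scores.append(len(nn_g & nn_e) / len(nn_g | nn_e))
    return float(np.mean(scores))

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))                      # a toy 2D embedding
D = squareform(pdist(rng.normal(size=(30, 8))))   # stand-in graph distances
print(normalized_stress(X, D), neighborhood_preservation(X, D))
```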

Improving Robustness and Accuracy of Ponzi Scheme Detection on Ethereum Using Time-Dependent Features

  • paper_url: http://arxiv.org/abs/2308.16391
  • repo_url: None
  • paper_authors: Phuong Duy Huynh, Son Hoang Dau, Xiaodong Li, Phuc Luong, Emanuele Viterbo
  • for: The paper aims to detect Ponzi schemes in the cryptocurrency market, specifically using transaction data to improve the robustness and accuracy of detection.
  • methods: The authors propose new detection models that rely only on transactions and introduce novel time-dependent features to capture Ponzi behavior characteristics.
  • results: The proposed models achieve considerably higher accuracy, precision, recall, and F1-score than existing transaction-based models, making them more effective in detecting Ponzi schemes in the cryptocurrency market.

Here's the simplified Chinese text for the three information points:
  • for: 本研究旨在检测加密货币市场中的庞氏骗局（Ponzi scheme），特别是利用交易数据来提高检测的鲁棒性和准确性。
  • methods: 作者提出仅依赖交易数据的新检测模型，并引入新颖的时间相关特征来刻画庞氏行为特征。
  • results: 所提模型在准确率、精确率、召回率和 F1 分数上均显著优于现有基于交易的模型，能够更可靠、更准确地检测加密货币市场中的庞氏骗局。
    Abstract The rapid development of blockchain has led to more and more funding pouring into the cryptocurrency market, which also attracted cybercriminals' interest in recent years. The Ponzi scheme, an old-fashioned fraud, is now popular on the blockchain, causing considerable financial losses to many crypto-investors. A few Ponzi detection methods have been proposed in the literature, most of which detect a Ponzi scheme based on its smart contract source code or opcode. The contract-code-based approach, while achieving very high accuracy, is not robust: first, the source codes of a majority of contracts on Ethereum are not available, and second, a Ponzi developer can fool a contract-code-based detection model by obfuscating the opcode or inventing a new profit distribution logic that cannot be detected (since these models were trained on existing Ponzi logics only). A transaction-based approach could improve the robustness of detection because transactions, unlike smart contracts, are harder to be manipulated. However, the current transaction-based detection models achieve fairly low accuracy. We address this gap in the literature by developing new detection models that rely only on the transactions, hence guaranteeing the robustness, and moreover, achieve considerably higher Accuracy, Precision, Recall, and F1-score than existing transaction-based models. This is made possible thanks to the introduction of novel time-dependent features that capture Ponzi behaviours characteristics derived from our comprehensive data analyses on Ponzi and non-Ponzi data from the XBlock-ETH repository.
    摘要 随着区块链技术的快速发展,更多的投资者转移到了 криптовалюencies Market,其中也吸引了黑客的关注。在最近几年中, Ponzi 型骗财活动在区块链上变得非常流行,导致了许多 крип投资者遭受了重大的金融损失。文献中已经提出了一些 Ponzi 探测方法,大多数是基于智能合约源代码或 opcode 进行探测,但是这种方法并不够 robust。首先,大多数 Ethereum 上的合约源代码不可用,而且 Ponzi 开发者可以使用混淆或新的利润分配逻辑来欺骗合约代码检测模型。在交易基础上进行探测可以提高检测的Robustness,但现有的交易基础上的检测模型的准确率相对较低。我们在文献中填补这个空白,通过开发基于交易的新检测模型,以 guarantees 的Robustness和更高的准确率、精度、回归率和 F1 分数来检测 Ponzi。这种新的检测模型基于我们对 Ponzi 和非 Ponzi 数据进行了全面的分析,从而提取了 Ponzi 行为特征的时间依赖特征。
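
A sketch of the kind of time-dependent transaction features the paper's direction suggests, computed with pandas over a toy transaction log. The specific feature set (inter-arrival gaps, deposit-to-payout delay, flow ratios) is our illustrative assumption, not the paper's exact feature list.

```python
import pandas as pd

tx = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-01-01 10:00", "2023-01-01 10:05", "2023-01-01 10:06",
        "2023-01-02 09:00", "2023-01-02 09:30",
    ]),
    "direction": ["in", "in", "out", "in", "out"],  # deposits vs. payouts
    "value_eth": [1.0, 2.0, 2.5, 0.5, 0.4],
})

def time_features(tx: pd.DataFrame) -> dict:
    tx = tx.sort_values("timestamp")
    gaps = tx["timestamp"].diff().dt.total_seconds().dropna()
    ins, outs = tx[tx.direction == "in"], tx[tx.direction == "out"]
    # Ponzi-like contracts often pay out quickly after fresh deposits arrive.
    first_out_delay = (outs["timestamp"].min() - ins["timestamp"].min()).total_seconds()
    return {
        "mean_gap_s": gaps.mean(),
        "std_gap_s": gaps.std(),
        "in_out_ratio": len(ins) / max(len(outs), 1),
        "first_payout_delay_s": first_out_delay,
        "net_flow_eth": ins["value_eth"].sum() - outs["value_eth"].sum(),
    }

print(time_features(tx))
```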

Multi-Objective Decision Transformers for Offline Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.16379
  • repo_url: None
  • paper_authors: Abdelghani Ghanem, Philippe Ciblat, Mounir Ghogho
  • for: 改进离线强化学习（offline RL），使其能够更充分地利用 transformer 模型的注意力机制。
  • methods: 将离线 RL 重新表述为多目标优化问题（将预测扩展到状态与回报），并在轨迹表示中引入动作空间区域（action space regions）。
  • results: 实验表明，所提出的方法能够更有效地利用 transformer 的注意力机制，性能可与当前最优方法持平或更优。
    Abstract Offline Reinforcement Learning (RL) is structured to derive policies from static trajectory data without requiring real-time environment interactions. Recent studies have shown the feasibility of framing offline RL as a sequence modeling task, where the sole aim is to predict actions based on prior context using the transformer architecture. However, the limitation of this single task learning approach is its potential to undermine the transformer model's attention mechanism, which should ideally allocate varying attention weights across different tokens in the input context for optimal prediction. To address this, we reformulate offline RL as a multi-objective optimization problem, where the prediction is extended to states and returns. We also highlight a potential flaw in the trajectory representation used for sequence modeling, which could generate inaccuracies when modeling the state and return distributions. This is due to the non-smoothness of the action distribution within the trajectory dictated by the behavioral policy. To mitigate this issue, we introduce action space regions to the trajectory representation. Our experiments on D4RL benchmark locomotion tasks reveal that our propositions allow for more effective utilization of the attention mechanism in the transformer model, resulting in performance that either matches or outperforms current state-of-the art methods.
    摘要 离线强化学习（offline RL）旨在从静态轨迹数据中学习策略，而无需与环境实时交互。近期研究表明，可以将离线 RL 表述为序列建模任务，即利用 transformer 架构，仅根据先前上下文预测动作。然而，这种单任务学习方式可能削弱 transformer 的注意力机制，理想情况下注意力应在输入上下文的不同 token 之间分配不同的权重以获得最优预测。为此，我们将离线 RL 重新表述为多目标优化问题，将预测扩展到状态与回报。我们还指出了序列建模所用轨迹表示的一个潜在缺陷：由行为策略决定的轨迹内动作分布并不平滑，这可能导致对状态与回报分布建模不准确。为缓解该问题，我们在轨迹表示中引入动作空间区域。在 D4RL 基准运动任务上的实验表明，我们的改进能够更有效地利用 transformer 的注意力机制，性能与当前最优方法持平或更优。

SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills

  • paper_url: http://arxiv.org/abs/2308.16369
  • repo_url: None
  • paper_authors: Amey Agrawal, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Ramachandran Ramjee
  • for: 提高 LLM（大型语言模型）推理性能。
  • methods: 使用 chunked-prefills（分块预填充）和 decode-maximal batching（解码最大化批处理）技术。
  • results: 显著提高了 LLM 推理性能：解码吞吐量提升至多 10 倍，端到端吞吐量提升至多 1.33 倍，并减少了流水线气泡。
    Abstract Large Language Model (LLM) inference consists of two distinct phases - prefill phase which processes the input prompt and decode phase which generates output tokens autoregressively. While the prefill phase effectively saturates GPU compute at small batch sizes, the decode phase results in low compute utilization as it generates one token at a time per request. The varying prefill and decode times also lead to imbalance across micro-batches when using pipeline parallelism, resulting in further inefficiency due to bubbles. We present SARATHI to address these challenges. SARATHI employs chunked-prefills, which splits a prefill request into equal sized chunks, and decode-maximal batching, which constructs a batch using a single prefill chunk and populates the remaining slots with decodes. During inference, the prefill chunk saturates GPU compute, while the decode requests 'piggyback' and cost up to an order of magnitude less compared to a decode-only batch. Chunked-prefills allows constructing multiple decode-maximal batches from a single prefill request, maximizing coverage of decodes that can piggyback. Furthermore, the uniform compute design of these batches ameliorates the imbalance between micro-batches, significantly reducing pipeline bubbles. Our techniques yield significant improvements in inference performance across models and hardware. For the LLaMA-13B model on A6000 GPU, SARATHI improves decode throughput by up to 10x, and accelerates end-to-end throughput by up to 1.33x. For LLaMa-33B on A100 GPU, we achieve 1.25x higher end-to-end-throughput and up to 4.25x higher decode throughput. When used with pipeline parallelism on GPT-3, SARATHI reduces bubbles by 6.29x, resulting in an end-to-end throughput improvement of 1.91x.
    摘要 大型语言模型(LLM)推理包括两个不同阶段:预填阶段和解码阶段。预填阶段处理输入提示,而解码阶段通过将输出 tokens 生成 autoregressively。在预填阶段, GPU 计算资源会受到小批量大小的限制,而在解码阶段,每个请求都会生成一个 token,导致计算资源利用率低。此外,预填和解码时间的变化也会导致微批处理中的不均衡,从而导致更多的气泡。为解决这些挑战,我们提出了 SARATHI。SARATHI 使用 chunked-prefills,将预填请求分割成等大小的块,并使用 decode-maximal batching,将每个块中的 decode 和预填块相结合,以实现 GPU 计算资源的最大利用。在推理过程中,预填块会使 GPU 计算资源充沛,而 decode 请求会“乘车”并产生相对论少的计算成本。chunked-prefills 允许构建多个 decode-maximal batches,从而最大化 decode 的覆盖率。此外,uniform 的 compute 设计也减轻了微批处理中的不均衡,从而减少气泡。我们的技术可以在不同的模型和硬件上实现显著的推理性能提升。对于 LLaMA-13B 模型和 A6000 GPU,SARATHI 可以提高解码吞吐量 by up to 10x,并提高总体吞吐量 by up to 1.33x。对于 LLaMa-33B 模型和 A100 GPU,我们可以达到 1.25x 更高的总体吞吐量和 up to 4.25x 更高的解码吞吐量。当用于 GPT-3 pipeline parallelism 时,SARATHI 可以减少气泡 by 6.29x,从而实现总体吞吐量的提升 by 1.91x。
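
A toy scheduler sketch of the batching idea described above: split each prefill into fixed-size chunks and fill the rest of every batch with pending decode requests. Token budgets and queue shapes are our illustrative assumptions, not SARATHI's implementation.

```python
from collections import deque

def build_batches(prefill_tokens, chunk_size, decode_queue, batch_token_budget):
    """Yield batches of (prefill_chunk_tokens, [decode_request_ids]).

    Each batch carries one prefill chunk (which saturates GPU compute) and
    piggybacks as many 1-token decode requests as the token budget allows.
    """
    decodes = deque(decode_queue)
    offset = 0
    while offset < prefill_tokens:
        chunk = min(chunk_size, prefill_tokens - offset)
        offset += chunk
        slots = batch_token_budget - chunk      # 1 token per decode request
        piggyback = [decodes.popleft() for _ in range(min(slots, len(decodes)))]
        decodes.extend(piggyback)               # decodes continue in later batches
        yield chunk, piggyback

for chunk, piggyback in build_batches(
    prefill_tokens=1024, chunk_size=256,
    decode_queue=["r1", "r2", "r3"], batch_token_budget=260,
):
    print(f"prefill chunk: {chunk} tokens, decodes piggybacked: {piggyback}")
```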

eess.IV - 2023-08-31

Twofold Structured Features-Based Siamese Network for Infrared Target Tracking

  • paper_url: http://arxiv.org/abs/2308.16676
  • repo_url: None
  • paper_authors: Wei-Jie Yan, Yun-Kai Xu, Qian Chen, Xiao-Fang Kong, Guo-Hua Gu, A-Jun Shao, Min-Jie Wan
  • for: 提高红外目标跟踪的精度与鲁棒性，尤其是在目标尺寸和形状变化时。
  • methods: 使用基于双重结构特征的孪生（Siamese）网络：其一通过特征融合网络将深层语义信息与浅层空间信息综合融合，以提升对红外目标的判别能力；其二通过基于模板更新机制的多模板更新模块，有效应对目标外观变化导致的早期跟踪失败。
  • results: 在 VOT-TIR 2016 数据集上进行了定性与定量实验，结果表明我们的方法在取得可观跟踪性能的同时保持实时跟踪速度，与其他最新跟踪器相比具有竞争力。
    Abstract Nowadays, infrared target tracking has become a critical technology in the field of computer vision and has many applications, such as motion analysis, pedestrian surveillance, intelligent detection, and so forth. Unfortunately, due to the lack of color, texture and other detailed information, tracking drift often occurs when the tracker encounters infrared targets that vary in size or shape. To address this issue, we present a twofold structured features-based Siamese network for infrared target tracking. First of all, in order to improve the discriminative capacity for infrared targets, a novel feature fusion network is proposed to fuse both shallow spatial information and deep semantic information into the extracted features in a comprehensive manner. Then, a multi-template update module based on a template update mechanism is designed to effectively deal with interferences from target appearance changes which are prone to cause early tracking failures. Finally, both qualitative and quantitative experiments are carried out on the VOT-TIR 2016 dataset, which demonstrates that our method achieves the balance of promising tracking performance and real-time tracking speed against other state-of-the-art trackers.
    摘要 现在,红外目标跟踪技术在计算机视觉领域具有重要意义,有很多应用,如运动分析、人员监测、智能检测等等。然而,由于红外目标缺乏颜色、文本和其他细节信息,跟踪偏移经常发生,当跟踪器遇到变形或大小不同的红外目标时。为解决这个问题,我们提出了一种两重结构特征基于Siamese网络的红外目标跟踪方法。首先,为了提高红外目标的抑制能力,我们提出了一种新的特征融合网络,将表层空间信息和深度 semantics信息融合到提取的特征中,以实现全面的特征融合。然后,我们设计了基于模板更新机制的多模板更新模块,以有效地处理目标外观变化所导致的跟踪失败。最后,我们对VOT-TIR 2016 dataset进行了质量和kvantitativerexperiment,结果显示,我们的方法可以协调出色的跟踪性和实时跟踪速度,与其他状态OF-THE-ART tracker相比。

eess.SP - 2023-08-31

  • paper_url: http://arxiv.org/abs/2308.16882
  • repo_url: None
  • paper_authors: Chaojin Qing, Zilong Wang, Qing Ye, Wenhui Liu, Linsi He
  • for: 提高 FDD 大规模 MIMO 系统中下行 CSI 幅度预测的精度，解决由接收机失真引起的互易性失配问题。
  • methods: 先用传统方法提取上行 CSI 的幅度特征，再使用专门设计的轻量级失真学习网络（Dist-LeaNet）抑制接收机失真、校准上下行 CSI 之间的幅度互易性；随后级联一个基于单隐层的幅度预测网络（Amp-PreNet），基于较强的幅度互易性完成下行 CSI 的幅度预测。
  • results: 对FDD系统进行了严格的实验,结果表明,考虑接收器扭曲,提出的方案可以提高下行频道信息预测精度,同时降低传输和处理延迟。
    Abstract In frequency division duplex (FDD) massive multiple-input multiple-output (mMIMO) systems, the reciprocity mismatch caused by receiver distortion seriously degrades the amplitude prediction performance of channel state information (CSI). To tackle this issue, from the perspective of distortion suppression and reciprocity calibration, a lightweight neural network-based amplitude prediction method is proposed in this paper. Specifically, with the receiver distortion at the base station (BS), conventional methods are employed to extract the amplitude feature of uplink CSI. Then, learning along the direction of the uplink wireless propagation channel, a dedicated and lightweight distortion-learning network (Dist-LeaNet) is designed to restrain the receiver distortion and calibrate the amplitude reciprocity between the uplink and downlink CSI. Subsequently, by cascading, a single hidden layer-based amplitude-prediction network (Amp-PreNet) is developed to accomplish amplitude prediction of downlink CSI based on the strong amplitude reciprocity. Simulation results show that, considering the receiver distortion in FDD systems, the proposed scheme effectively improves the amplitude prediction accuracy of downlink CSI while reducing the transmission and processing delay.
    摘要 在分频分配多输入多输出(FDD)大规模多输入多输出(mMIMO)系统中,接收器损害导致的回归不匹配严重下降了频率预测性能。为解决这个问题,本文从损害抑制和回归准确的角度提出了一种轻量级神经网络基于频率预测方法。具体来说,通过在基站(BS)上提取接收器损害的干扰特征,然后通过学习在下降频率通信频道方向上,设计了专门的、轻量级的干扰学习网络(Dist-LeaNet),以抑制接收器损害并准确地做回归准确性between uplink和downlink CSI。接着,通过堆叠,一个单hidden layer基于频率预测网络(Amp-PreNet)被开发出来实现频率预测的下降频率CSI。 simulation结果表明,在考虑到FDD系统中接收器损害的情况下,提出的方案可以有效提高下降频率CSI的频率预测精度,同时降低传输和处理延迟。

Analysis and Optimization of Reconfigurable Intelligent Surfaces Based on $S$-Parameters Multiport Network Theory

  • paper_url: http://arxiv.org/abs/2308.16856
  • repo_url: None
  • paper_authors: Andrea Abrardo, Alberto Toccafondi, Marco Di Renzo
  • for: 本研究考虑了可重新配置智能表面(RIS),并使用多口网络理论来模型它。
  • methods: 我们首先比较了使用Z参数和S参数来表示RIS,并证明它们之间的等价性,并讨论它们的不同特点。然后,我们开发了一种优化RIS配置的算法,以优化电磁共振 Coupling的影响。
  • results: 我们显示,基于S参数优化算法比基于Z参数优化算法更高效,这是因为小修改步长的提案算法会导致S参数中更大的变化,从而增加算法的速度。
    Abstract In this paper, we consider a reconfigurable intelligent surface (RIS) and model it by using multiport network theory. We first compare the representation of RIS by using $Z$-parameters and $S$-parameters, by proving their equivalence and discussing their distinct features. Then, we develop an algorithm for optimizing the RIS configuration in the presence of electromagnetic mutual coupling. We show that the proposed algorithm based on optimizing the $S$-parameters results in better performance than existing algorithms based on optimizing the $Z$-parameters. This is attributed to the fact that small perturbations of the step size of the proposed algorithm result in larger variations of the $S$-parameters, hence increasing the convergence speed of the algorithm.
    摘要 在本文中，我们考虑了可重构智能表面（RIS），并使用多端口网络理论对其建模。我们首先比较了使用 $Z$ 参数和 $S$ 参数表示 RIS 的两种方式，证明了它们的等价性，并讨论了各自的特点。随后，我们提出了一种在存在电磁互耦时优化 RIS 配置的算法。结果表明，基于优化 $S$ 参数的算法优于基于优化 $Z$ 参数的现有算法：这是因为在所提算法中，步长的微小扰动会引起 $S$ 参数更大的变化，从而加快算法的收敛速度。
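
The Z- to S-parameter mapping that underlies the equivalence discussed above is standard multiport network theory; a short NumPy sketch for a uniform real reference impedance Z0 (a common simplifying assumption):

```python
import numpy as np

def z_to_s(Z, z0=50.0):
    """S = (Z - z0*I) @ inv(Z + z0*I), valid for a uniform real z0."""
    I = np.eye(Z.shape[0])
    return (Z - z0 * I) @ np.linalg.inv(Z + z0 * I)

def s_to_z(S, z0=50.0):
    """Inverse mapping: Z = z0 * (I + S) @ inv(I - S)."""
    I = np.eye(S.shape[0])
    return z0 * (I + S) @ np.linalg.inv(I - S)

# Round-trip check on a toy 2-port impedance matrix.
Z = np.array([[60 + 5j, 10 - 2j], [10 - 2j, 55 + 3j]])
print(np.allclose(s_to_z(z_to_s(Z)), Z))  # True
```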

On the Performance of RIS-Aided Spatial Scattering Modulation for mmWave Transmission

  • paper_url: http://arxiv.org/abs/2308.16804
  • repo_url: None
  • paper_authors: Xusheng Zhu, Wen Chen, Zhendong Li, Qingqing Wu, Ziheng Zhang, Kunlun Wang, Jun Li
  • for: investigate a state-of-the-art reconfigurable intelligent surface (RIS)-assisted spatial scattering modulation (SSM) scheme for millimeter-wave (mmWave) systems
  • methods: utilize line-of-sight (LoS) and non-LoS links in the transmitter-RIS and RIS-receiver channels, respectively, and employ the maximum likelihood detector at the receiver
  • results: derive the conditional pairwise error probability (CPEP) expression for the RIS-SSM scheme under two scenarios, and obtain the union upper bound of average bit error probability (ABEP) based on the CPEP expression, all of which are validated by Monte Carlo simulations.

Here is the Chinese translation of the three key information points:
  • for: 研究一种基于智能表面(RIS)的干扰干扰(SSM)方案,用于毫米波(mmWave)系统
  • methods: 利用传输器-RIS通道中的直线视野(LoS)和非直线视野(non-LoS)链接,并在接收器上使用最大可能性探测器
  • results: derive CPEP表达式,并根据其而获得ABEP上限,所有结果经过了Monte Carlo仿真验证。
    Abstract In this paper, we investigate a state-of-the-art reconfigurable intelligent surface (RIS)-assisted spatial scattering modulation (SSM) scheme for millimeter-wave (mmWave) systems, where a more practical scenario that the RIS is near the transmitter while the receiver is far from RIS is considered. To this end, the line-of-sight (LoS) and non-LoS links are utilized in the transmitter-RIS and RIS-receiver channels, respectively. By employing the maximum likelihood detector at the receiver, the conditional pairwise error probability (CPEP) expression for the RIS-SSM scheme is derived under the two scenarios that the received beam demodulation is correct or not. Furthermore, the union upper bound of average bit error probability (ABEP) is obtained based on the CPEP expression. Finally, the derivation results are exhaustively validated by the Monte Carlo simulations.
    摘要 在这篇论文中,我们研究了一种基于快速可配置智能表面(RIS)的扩展频率(mmWave)系统中的空间扩散调制(SSM)方案,其中假设RIS位于发送器的近距离上,而接收器则位于RIS的远距离上。为此,在发送器-RIS和RIS-接收器通道中分别使用了直线视线(LoS)和非直线视线(non-LoS)链路。通过使用最大可能性检测器(Maximum Likelihood Detector,MLD)在接收器端,我们 derivated了RIS-SSM方案的假设捷径误差概率(CPEP)表达。然后,基于CPEP表达,我们获得了union最大上限 bound of average bit error probability(ABEP)。最后,我们使用Monte Carlo仿真 validate了 derive结果。

Channel Estimation Using RIDNet Assisted OMP for Hybrid-field THz Massive MIMO Systems

  • paper_url: http://arxiv.org/abs/2308.16638
  • repo_url: None
  • paper_authors: Hasan Nayir, Erhan Karakoca, Ali Görçin, Khalid Qaraqe
  • for: 这篇论文旨在提出一种基于 Recursive Information Distillation Network（RIDNet）与正交匹配追踪（OMP）相结合的混合场 THz mMIMO 信道估计方法，以应对混合场 THz mMIMO 信道估计的挑战。
  • methods: 该方法将 RIDNet 与 OMP 相结合，对包含远场与近场分量的混合场 THz mMIMO 信道进行估计。
  • results: 实验结果表明，所提出的基于 RIDNet 的方法在所有信噪比（SNR）区间内的信道估计误差均更低，在低 SNR 区间优势尤为明显；此外，该方法只需更少的射频链和导频符号即可达到与 OMP 算法相同的误差性能。
    Abstract The terahertz (THz) band radio access with larger available bandwidth is anticipated to provide higher capacities for next-generation wireless communication systems. However, higher path loss at THz frequencies significantly limits the wireless communication range. Massive multiple-input multiple-output (mMIMO) is an attractive technology to increase the Rayleigh distance by generating higher gain beams using low wavelength and highly directive antenna array aperture. In addition, both far-field and near-field components of the antenna system should be considered for modelling THz electromagnetic propagation, where the channel estimation for this environment becomes a challenging task. This paper proposes a novel channel estimation method using a recursive information distillation network (RIDNet) together with orthogonal matching pursuit (OMP) for hybrid-field THz mMIMO channels, including both far-field and near-field components. The simulation experiments are performed using the ray-tracing tool. The results indicate that the proposed RIDNet-based method consistently provides lower channel estimation errors compared to the conventional OMP algorithm for all signal-to-noise ratio (SNR) regimes, and the performance gap becomes higher at low SNR regimes. Furthermore, the results imply that the same error performance of the OMP can be achieved by the RIDNet-based method using a lower number of RF chains and pilot symbols.
    摘要 频率为teraHz(THz)的无线访问带宽更大,预计会提供下一代无线通信系统更高的容量。然而,THz频率上的跟踪损耗非常大,限制无线通信范围。大规模多输入多输出(mMIMO)技术可以提高Rayleigh距离,通过生成更高的投射高度和高度指向性的天线阵列。此外,需考虑天线系统的远场和近场组分,模拟THz电磁传播。由于这种环境的通道估计成为了一项挑战。这篇文章提出了一种使用重征信息蒸馈网络(RIDNet)和对匹配追求(OMP)算法的新通道估计方法,用于hybrid-field THz mMIMO通道,包括远场和近场组分。实验使用了射线跟踪工具。结果表明,提议的RIDNet基于方法在所有信号响应率(SNR)域内都提供了更低的通道估计错误,并且在低SNR域的性能差距变得更大。此外,结果表明,使用RIDNet基于方法可以通过使用较低的RF链和射频标志符来实现相同的错误性。
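
For reference, the OMP baseline that RIDNet augments is a few lines of NumPy: greedily pick the dictionary column most correlated with the residual, then re-fit by least squares on the selected support. A generic sketch, not tuned for the hybrid-field channel model:

```python
import numpy as np

def omp(A, y, sparsity):
    """Orthogonal Matching Pursuit: recover sparse x with y ~= A @ x."""
    residual = y.copy()
    support = []
    x = np.zeros(A.shape[1], dtype=A.dtype)
    for _ in range(sparsity):
        # Atom most correlated with the current residual.
        j = int(np.argmax(np.abs(A.conj().T @ residual)))
        if j not in support:
            support.append(j)
        # Least-squares re-fit on the selected support.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x[support] = coef
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(40, 100)) / np.sqrt(40)   # measurement/dictionary matrix
x_true = np.zeros(100); x_true[[3, 17, 64]] = [1.5, -2.0, 0.7]
y = A @ x_true + 0.01 * rng.normal(size=40)
x_hat = omp(A, y, sparsity=3)
print(np.flatnonzero(np.round(x_hat, 1)))      # expect [3, 17, 64]
```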

Design Challenges for the Implementation of Smart Homes

  • paper_url: http://arxiv.org/abs/2308.16602
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Nesreen Mufid
  • for: 这个研究的目标是设计和实现一个智能家庭模型,以提高家庭安全性和可用性。
  • methods: 该模型使用可靠的移动网络,让用户可以在外出时监控家庭内部的情况,并通过不同的探测器检测火灾、气体泄漏、水泄漏和偷窃等问题。此外,家庭内还设置了一个摄像头,为用户提供全景视图。
  • results: 该模型可以帮助用户在外出时监控家庭内部的情况,并在火灾、气体泄漏、水泄漏和偷窃等问题发生时通知用户,让用户有时间采取行动。此外,用户还可以通过移动应用程序远程控制家庭的照明系统,灯光的开关和灭火。
    Abstract Home automation has for many years faced challenges that limit its spread around the world. These challenges are caused by the high cost of owning such a home, inflexible systems (which cannot be monitored from outside the home), and difficulties in achieving optimal security. Our main objective is to design and implement a smart home model that is simple and affordable to users. The proposed system provides the flexibility to monitor the home using the reliable cellular network: the user will be able to see what is inside the home while away. In addition, our model addresses security by providing different sensors that detect smoke, gas, water leakage, and burglary. Moreover, a camera in the home gives the user a full view when he or she is outside. The user is informed by an application on his or her phone if there is a fire, a water leak, or a break-in, giving the user a chance to act if such cases happen. Furthermore, the user can monitor the home's lighting system, with the ability to turn the lights on and off remotely.
    摘要 家庭自动化系统在多年来一直面临着限制其在全球蔓延的挑战。这些挑战是由于高昂的家庭所有成本、不灵活的系统(不能在外部监控)以及实现优质安全性的问题所致。我们的主要目标是设计并实施一个简单、可Affordable的家庭自动化模型。提议的系统提供了外部监控家庭的灵活性,使用可靠的手机网络。用户将能够在离家时了解家内情况,并且可以通过应用程序在手机上获得相关信息。此外,我们的模型还解决了安全性问题,通过设置烟报、气体探测器、水泄漏检测器和窃贼检测器等多种感知器来实现。此外,家中还将安装一个摄像头,以为用户在外部提供全景视图。用户通过应用程序接收有关家内情况的通知,如果发生火灾、水泄漏或窃贼等情况,他们可以及时采取相应的行动。此外,用户还可以通过移动设备控制家庭照明系统,启用和灭火按钮。

Data-Aided Channel Estimation Utilizing Gaussian Mixture Models

  • paper_url: http://arxiv.org/abs/2308.16601
  • repo_url: None
  • paper_authors: Franz Weißer, Nurettin Turan, Dominik Semmler, Wolfgang Utschick
  • for: 提高多用户系统中通道估计质量
  • methods: 同时利用数据符号与导频符号，提出两种方法：基于全部接收符号估计一个子空间，并用其提升仅使用导频符号的基于高斯混合模型（GMM）的信道估计器的估计质量；两种方法均支持并行化，其中一种还允许预先计算估计滤波器，从而降低计算复杂度。
  • results: 基于实际信道测量数据的数值仿真表明，所提方法优于所研究的最新信道估计器。
    Abstract In this work, we propose two methods that utilize data symbols in addition to pilot symbols for improved channel estimation quality in a multi-user system, so-called semi-blind channel estimation. To this end, a subspace is estimated based on all received symbols and utilized to improve the estimation quality of a Gaussian mixture model-based channel estimator, which solely uses pilot symbols for channel estimation. Both of the proposed approaches allow for parallelization. Even the precomputation of estimation filters, which is beneficial in terms of computational complexity, is enabled by one of the proposed methods. Numerical simulations for real channel measurement data available to us show that the proposed methods outperform the studied state-of-the-art channel estimators.
    摘要 在这项工作中,我们提出了两种方法,利用数据符号以外的导航符号进行改进的通道估计质量,称为半不可见通道估计。为此,我们根据所有接收的符号 estimate一个子空间,并使用这个子空间来改进基于 Gaussian mixture model 的通道估计器,该仅使用导航符号进行通道估计。两种提出的方法均允许并行计算。而且,一种方法甚至可以在预计算估计filter的过程中启用并行计算。我们对我们手中的实际通道测量数据进行数值仿真,结果表明,我们提出的方法可以比 studied state-of-the-art 通道估计器表现更好。
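
A compact sketch of the GMM-based channel estimation idea the paper builds on: fit a Gaussian mixture to training channels, then estimate a new channel as the responsibility-weighted combination of per-component LMMSE filters. The dimensions, the identity observation model y = h + n, and all hyperparameters are our simplifying assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

dim, sigma2 = 8, 0.1
rng = np.random.default_rng(0)

# Toy training channels drawn from two "propagation clusters".
H_train = np.vstack([
    rng.multivariate_normal(np.full(dim, m), 0.3 * np.eye(dim), 500)
    for m in (-1.0, 1.0)
])
gmm = GaussianMixture(n_components=2, covariance_type="full").fit(H_train)

def gmm_estimate(y):
    """Approximate E[h|y] for y = h + n, n ~ N(0, sigma2*I), under the GMM prior."""
    est, log_w = [], []
    for k in range(gmm.n_components):
        mu, C = gmm.means_[k], gmm.covariances_[k]
        Cy = C + sigma2 * np.eye(dim)                     # cov. of y under comp. k
        est.append(mu + C @ np.linalg.solve(Cy, y - mu))  # per-component LMMSE
        log_w.append(np.log(gmm.weights_[k]) + multivariate_normal.logpdf(y, mu, Cy))
    log_w = np.asarray(log_w)
    w = np.exp(log_w - log_w.max()); w /= w.sum()         # responsibilities p(k|y)
    return sum(wk * ek for wk, ek in zip(w, est))

h = rng.multivariate_normal(np.full(dim, 1.0), 0.3 * np.eye(dim))
y = h + np.sqrt(sigma2) * rng.normal(size=dim)
print(np.linalg.norm(gmm_estimate(y) - h) / np.linalg.norm(h))
```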

Channel Estimation for XL-MIMO Systems with Polar-Domain Multi-Scale Residual Dense Network

  • paper_url: http://arxiv.org/abs/2308.16400
  • repo_url: https://github.com/HaoLei-tnunder/Channel_Estimation_for_XL-MIMO_Systems_with_Polar-Domain_Multi-Scale_Residual_Dense_Network
  • paper_authors: Hao Lei, Jiayi Zhang, Huahua Xiao, Xiaodan Zhang, Bo Ai, Derrick Wing Kwan Ng
  • for: 超大规模 MIMO（XL-MIMO）有望为未来无线通信带来巨大的性能增益，而实现这一潜力的技术前提是获取精确的信道状态信息。
  • methods: 作者基于近场信道在极化域（polar domain）中的稀疏性，提出了极化域多残差密集网络（P-MRDN），并进一步设计了极化域多尺度残差密集网络（P-MSRDN），以提高 XL-MIMO 系统的信道估计精度。
  • results: 仿真结果表明，所提方案优于现有基准方案，且信道稀疏度对所提方案的影响很小。
    Abstract Extremely large-scale multiple-input multiple-output (XL-MIMO) is a promising technique to enable versatile applications for future wireless communications.To realize the huge potential performance gain, accurate channel state information is a fundamental technical prerequisite. In conventional massive MIMO, the channel is often modeled by the far-field planar-wavefront with rich sparsity in the angular domain that facilitates the design of low-complexity channel estimation. However, this sparsity is not conspicuous in XL-MIMO systems due to the non-negligible near-field spherical-wavefront. To address the inherent performance loss of the angular-domain channel estimation schemes, we first propose the polar-domain multiple residual dense network (P-MRDN) for XL-MIMO systems based on the polar-domain sparsity of the near-field channel by improving the existing MRDN scheme. Furthermore, a polar-domain multi-scale residual dense network (P-MSRDN) is designed to improve the channel estimation accuracy. Finally, simulation results reveal the superior performance of the proposed schemes compared with existing benchmark schemes and the minimal influence of the channel sparsity on the proposed schemes.
    摘要 超大规模多输入多输出(XL-MIMO)是未来无线通信中实现多样化应用的一项有前景的技术。要实现其巨大的潜在性能增益,精确的通道状态信息是基本的技术前提。在传统的大规模 MIMO 中,通道通常用远场平面波前建模,其在角度域具有丰富的稀疏性,这有利于设计低复杂度的通道估计。然而,由于不可忽略的近场球面波前,这种稀疏性在 XL-MIMO 系统中并不明显。为了解决角度域通道估计方案固有的性能损失,我们首先基于近场通道的极坐标域稀疏性,对现有 MRDN 方案加以改进,提出了面向 XL-MIMO 系统的极坐标域多重残差密集网络(P-MRDN);进而设计了极坐标域多尺度残差密集网络(P-MSRDN)以进一步提高通道估计精度。最后,仿真结果表明所提方案性能优于现有基准方案,且通道稀疏性对所提方案的影响很小。

cs.SD - 2023-08-30

A Review of Differentiable Digital Signal Processing for Music & Speech Synthesis

  • paper_url: http://arxiv.org/abs/2308.15422
  • repo_url: None
  • paper_authors: Ben Hayes, Jordie Shier, György Fazekas, Andrew McPherson, Charalampos Saitis
  • for: 这篇论文综述了可微分数字信号处理技术在音乐与语音合成领域中的应用。
  • methods: 可微分数字信号处理指让损失函数的梯度经由数字信号处理器反向传播,从而将其集成到神经网络中的一类技术。
  • results: 论文梳理了这些技术在音乐演奏渲染、声音匹配、语音变换等任务中的应用及其对合成质量的提升,并指出了若干开放挑战,如优化过程中的病态问题、对真实世界条件的鲁棒性以及设计权衡。
    Abstract The term "differentiable digital signal processing" describes a family of techniques in which loss function gradients are backpropagated through digital signal processors, facilitating their integration into neural networks. This article surveys the literature on differentiable audio signal processing, focusing on its use in music & speech synthesis. We catalogue applications to tasks including music performance rendering, sound matching, and voice transformation, discussing the motivations for and implications of the use of this methodology. This is accompanied by an overview of digital signal processing operations that have been implemented differentiably. Finally, we highlight open challenges, including optimisation pathologies, robustness to real-world conditions, and design trade-offs, and discuss directions for future research.
    摘要 "可微分数字信号处理"指一类让损失函数梯度经由数字信号处理器反向传播、从而将其集成到神经网络中的技术。本文综述了可微分音频信号处理的相关文献,聚焦其在音乐与语音合成中的应用。我们梳理了其在音乐演奏渲染、声音匹配和语音变换等任务中的应用,讨论了采用这一方法的动机与影响,并概述了已有可微分实现的数字信号处理操作。最后,我们指出了若干开放挑战,包括优化过程中的病态问题、对真实世界条件的鲁棒性以及设计权衡,并讨论了未来的研究方向。
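
为直观说明"可微分 DSP"的含义(损失梯度经由信号处理器反向传播),下面给出一个假设性的 PyTorch 最小草图:一个谐波振荡器,其基频 f0 与各谐波幅度为可学习参数,可直接用梯度优化。它不对应综述中的任何具体系统,波形 MSE 损失也仅作演示(实践中常用谱损失)。

```python
import torch

sr, dur = 16000, 0.5
t = torch.arange(int(sr * dur)) / sr

# 可学习参数:基频(Hz)与 8 个谐波的幅度
f0 = torch.tensor(200.0, requires_grad=True)
amps = torch.full((8,), 0.1, requires_grad=True)

def synth(f0, amps):
    # 谐波叠加:sum_k a_k * sin(2*pi*k*f0*t),整个过程对参数可微
    k = torch.arange(1, amps.numel() + 1).unsqueeze(1)          # (8, 1)
    return (amps.unsqueeze(1) * torch.sin(2 * torch.pi * k * f0 * t)).sum(0)

target = torch.sin(2 * torch.pi * 220.0 * t)                    # 目标:220Hz 正弦
opt = torch.optim.Adam([f0, amps], lr=1e-2)
for step in range(200):
    opt.zero_grad()
    loss = torch.mean((synth(f0, amps) - target) ** 2)
    loss.backward()                                             # 梯度穿过 DSP 模块
    opt.step()
print(float(f0), float(loss))
```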

cs.CV - 2023-08-30

3D Adversarial Augmentations for Robust Out-of-Domain Predictions

  • paper_url: http://arxiv.org/abs/2308.15479
  • repo_url: None
  • paper_authors: Alexander Lehner, Stefano Gasperini, Alvaro Marcos-Ramiro, Michael Schmidt, Nassir Navab, Benjamin Busam, Federico Tombari
  • for: 这篇论文旨在提高 3D 物体检测和语义分割模型对域外(out-of-domain)数据的泛化能力。
  • methods: 作者使用对抗样本来增强训练集,以提高模型对域外数据的鲁棒性:学习一组以对抗方式使物体变形的向量,同时通过约束保持其合理性。
  • results: 作者表明,该方法显著提高了 3D 物体检测与 3D 语义分割方法对域外数据的鲁棒性和泛化能力;检测实验使用 KITTI、Waymo 和 CrashD 数据,分割实验使用 SemanticKITTI、Waymo 和 nuScenes 数据,且训练仅使用标准的单一数据集。
    Abstract Since real-world training datasets cannot properly sample the long tail of the underlying data distribution, corner cases and rare out-of-domain samples can severely hinder the performance of state-of-the-art models. This problem becomes even more severe for dense tasks, such as 3D semantic segmentation, where points of non-standard objects can be confidently associated to the wrong class. In this work, we focus on improving the generalization to out-of-domain data. We achieve this by augmenting the training set with adversarial examples. First, we learn a set of vectors that deform the objects in an adversarial fashion. To prevent the adversarial examples from being too far from the existing data distribution, we preserve their plausibility through a series of constraints, ensuring sensor-awareness and shapes smoothness. Then, we perform adversarial augmentation by applying the learned sample-independent vectors to the available objects when training a model. We conduct extensive experiments across a variety of scenarios on data from KITTI, Waymo, and CrashD for 3D object detection, and on data from SemanticKITTI, Waymo, and nuScenes for 3D semantic segmentation. Despite training on a standard single dataset, our approach substantially improves the robustness and generalization of both 3D object detection and 3D semantic segmentation methods to out-of-domain data.
    摘要 由于真实世界的训练数据集无法充分采样底层数据分布的长尾,极端情形和罕见的域外样本会严重影响最先进模型的性能。对于 3D 语义分割等密集任务,这一问题更加突出,因为非标准物体上的点可能被"自信地"归入错误类别。在这项工作中,我们致力于提高模型对域外数据的泛化能力,具体做法是用对抗样本增强训练集。首先,我们学习一组以对抗方式使物体变形的向量;为防止对抗样本偏离现有数据分布过远,我们通过一系列约束(传感器感知和形状平滑性)保持其合理性。然后,在训练模型时,我们将学到的与样本无关的向量应用于可用物体上,实现对抗增强。我们在多种场景下开展了大量实验:3D 物体检测使用 KITTI、Waymo 和 CrashD 数据,3D 语义分割使用 SemanticKITTI、Waymo 和 nuScenes 数据。尽管只在标准的单一数据集上训练,我们的方法仍显著提高了 3D 物体检测与 3D 语义分割方法对域外数据的鲁棒性和泛化能力。

An Adaptive Tangent Feature Perspective of Neural Networks

  • paper_url: http://arxiv.org/abs/2308.15478
  • repo_url: None
  • paper_authors: Daniel LeJeune, Sina Alemohammad
  • for: 了解神经网络中特征学习的机制
  • methods: 在切向特征空间中研究线性模型,并允许特征在训练过程中被变换
  • results: 提出了一个基于线性特征变换的特征学习框架,表明该框架能更细致地刻画神经网络中特征(及核函数)的变化;实验证明,自适应的切向特征分类实现在 MNIST 和 CIFAR-10 上的样本复杂度比固定切向特征模型低一个数量级。
    Abstract In order to better understand feature learning in neural networks, we propose a framework for understanding linear models in tangent feature space where the features are allowed to be transformed during training. We consider linear transformations of features, resulting in a joint optimization over parameters and transformations with a bilinear interpolation constraint. We show that this optimization problem has an equivalent linearly constrained optimization with structured regularization that encourages approximately low rank solutions. Specializing to neural network structure, we gain insights into how the features and thus the kernel function change, providing additional nuance to the phenomenon of kernel alignment when the target function is poorly represented using tangent features. In addition to verifying our theoretical observations in real neural networks on a simple regression problem, we empirically show that an adaptive feature implementation of tangent feature classification has an order of magnitude lower sample complexity than the fixed tangent feature model on MNIST and CIFAR-10.
    摘要 为了更好地理解神经网络中的特征学习,我们提出了一个在切向特征空间中理解线性模型的框架,其中特征允许在训练过程中被变换。我们考虑特征的线性变换,由此得到一个带有双线性插值约束、对参数和变换的联合优化问题。我们证明该优化问题等价于一个带结构化正则化的线性约束优化问题,该正则化鼓励近似低秩的解。将其具体化到神经网络结构后,我们得以洞察特征(进而核函数)是如何变化的,从而在目标函数难以用切向特征表示时,为核对齐现象提供更细致的解释。除了在真实神经网络上用一个简单回归问题验证我们的理论观察之外,我们还通过实验表明,自适应的切向特征分类实现在 MNIST 和 CIFAR-10 上的样本复杂度比固定切向特征模型低一个数量级。

Learning Modulated Transformation in GANs

  • paper_url: http://arxiv.org/abs/2308.15472
  • repo_url: None
  • paper_authors: Ceyuan Yang, Qihang Zhang, Yinghao Xu, Jiapeng Zhu, Yujun Shen, Bo Dai
  • for: 提高 generative adversarial networks (GANs) 的模型灵活性和可重用性,以便更好地处理各种生成任务,包括图像生成、3D-aware图像生成和视频生成。
  • methods: 提出一个即插即用模块——调制变换模块(modulated transformation module, MTM),它在隐码的控制下预测空间偏移,使卷积可以在可变位置上进行,从而更好地处理几何变换。
  • results: 在多种生成任务上进行了广泛的实验,证明该方法可以与当前最先进框架兼容,且无需任何超参数调整。特别是在人体生成任务上,将 StyleGAN3 在 TaiChi 数据集上的 FID 从 21.36 降至 13.60,证明了学习调制几何变换的有效性。
    Abstract The success of style-based generators largely benefits from style modulation, which helps take care of the cross-instance variation within data. However, the instance-wise stochasticity is typically introduced via regular convolution, where kernels interact with features at some fixed locations, limiting its capacity for modeling geometric variation. To alleviate this problem, we equip the generator in generative adversarial networks (GANs) with a plug-and-play module, termed as modulated transformation module (MTM). This module predicts spatial offsets under the control of latent codes, based on which the convolution operation can be applied at variable locations for different instances, and hence offers the model an additional degree of freedom to handle geometry deformation. Extensive experiments suggest that our approach can be faithfully generalized to various generative tasks, including image generation, 3D-aware image synthesis, and video generation, and get compatible with state-of-the-art frameworks without any hyper-parameter tuning. It is noteworthy that, towards human generation on the challenging TaiChi dataset, we improve the FID of StyleGAN3 from 21.36 to 13.60, demonstrating the efficacy of learning modulated geometry transformation.
    摘要 基于风格的生成器的成功很大程度上得益于风格调制,它有助于处理数据中的跨实例差异。然而,实例级的随机性通常通过常规卷积引入,其卷积核只在固定位置与特征交互,限制了模型刻画几何变化的能力。为缓解这一问题,我们为生成对抗网络(GAN)中的生成器配备了一个即插即用的模块,称为调制变换模块(MTM)。该模块在隐码的控制下预测空间偏移,使卷积操作可以针对不同实例在可变位置上进行,从而为模型处理几何形变提供了额外的自由度。大量实验表明,我们的方法可以推广到多种生成任务,包括图像生成、3D 感知图像合成和视频生成,并且无需任何超参数调整即可与最先进框架兼容。值得一提的是,在具有挑战性的 TaiChi 数据集上进行人体生成时,我们将 StyleGAN3 的 FID 从 21.36 降至 13.60,证明了学习调制几何变换的有效性。

Input margins can predict generalization too

  • paper_url: http://arxiv.org/abs/2308.15466
  • repo_url: https://github.com/sayantann11/all-classification-templetes-for-ML
  • paper_authors: Coenraad Mouton, Marthinus W. Theunissen, Marelie H. Davel
  • for: investigate the relationship between generalization and classification margins in deep neural networks
  • methods: use margin measurements, specifically constrained margins, to predict generalization ability
  • results: constrained margins achieve highly competitive scores and outperform other margin measurements in general, providing a novel insight into the relationship between generalization and classification margins.
    Abstract Understanding generalization in deep neural networks is an active area of research. A promising avenue of exploration has been that of margin measurements: the shortest distance to the decision boundary for a given sample or its representation internal to the network. While margins have been shown to be correlated with the generalization ability of a model when measured at its hidden representations (hidden margins), no such link between large margins and generalization has been established for input margins. We show that while input margins are not generally predictive of generalization, they can be if the search space is appropriately constrained. We develop such a measure based on input margins, which we refer to as `constrained margins'. The predictive power of this new measure is demonstrated on the 'Predicting Generalization in Deep Learning' (PGDL) dataset and contrasted with hidden representation margins. We find that constrained margins achieve highly competitive scores and outperform other margin measurements in general. This provides a novel insight on the relationship between generalization and classification margins, and highlights the importance of considering the data manifold for investigations of generalization in DNNs.
    摘要 理解深度神经网络的泛化是一个活跃的研究领域。间隔度量(样本或其网络内部表示到决策边界的最短距离)是一个有前景的探索方向。以往研究表明,在隐藏表示上度量的间隔(隐藏间隔)与模型的泛化能力相关,但输入间隔与泛化之间尚未建立这样的联系。我们发现,虽然输入间隔通常不能预测泛化,但在适当约束搜索空间后则可以。我们据此提出了一种基于输入间隔的度量,称为"受限间隔"(constrained margins)。我们在 'Predicting Generalization in Deep Learning'(PGDL)数据集上验证了该度量的预测能力,并与隐藏表示间隔进行了对比。结果表明,受限间隔取得了极具竞争力的分数,总体上优于其他间隔度量。这为泛化与分类间隔之间的关系提供了新的见解,并凸显了在研究 DNN 泛化时考虑数据流形的重要性。
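
下面是一个假设性的最小草图,演示估计"输入间隔"的一种常见方式:沿样本到最近异类样本的连线做二分搜索,定位预测翻转点并以此近似到决策边界的距离。原文的"受限间隔"会进一步约束搜索空间(例如限制在数据流形方向上),此处未实现;数据与分类器也均为演示用途。

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
clf = LogisticRegression().fit(X, y)

def input_margin(x, label, X_other, n_iter=30):
    # 取最近的异类样本作为搜索方向的终点
    z = X_other[np.argmin(np.linalg.norm(X_other - x, axis=1))]
    lo, hi = 0.0, 1.0                        # 沿 x -> z 的线段做二分
    for _ in range(n_iter):
        mid = (lo + hi) / 2
        p = x + mid * (z - x)
        if clf.predict(p[None])[0] == label:
            lo = mid                         # 仍在本类一侧,向外推进
        else:
            hi = mid                         # 已越过边界,向内收缩
    return hi * np.linalg.norm(z - x)        # 近似的边界距离

margins = [input_margin(X[i], y[i], X[y != y[i]]) for i in range(5)]
print(margins)
```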

Online Overexposed Pixels Hallucination in Videos with Adaptive Reference Frame Selection

  • paper_url: http://arxiv.org/abs/2308.15462
  • repo_url: None
  • paper_authors: Yazhou Xing, Amrita Mazumdar, Anjul Patney, Chao Liu, Hongxu Yin, Qifeng Chen, Jan Kautz, Iuri Frosio
  • for: 解决LDR相机无法处理宽动态范围输入的问题,提高图像质量。
  • methods: 使用基于 Transformer 的深度神经网络(DNN)推断缺失的 HDR 细节;消融实验表明,采用多尺度 DNN 并配合恰当的损失函数是达到最先进质量的关键。此外,DNN 还以过去的一帧作为额外的参考输入,以辅助重建过曝区域。
  • results: 无需交替曝光等复杂采集机制或昂贵的 HDR 处理,即可获得最先进的质量。示例视频见 https://drive.google.com/file/d/1-r12BKImLOYCLUoPzdebnMyNjJ4Rk360/view。
    Abstract Low dynamic range (LDR) cameras cannot deal with wide dynamic range inputs, frequently leading to local overexposure issues. We present a learning-based system to reduce these artifacts without resorting to complex acquisition mechanisms like alternating exposures or costly processing that are typical of high dynamic range (HDR) imaging. We propose a transformer-based deep neural network (DNN) to infer the missing HDR details. In an ablation study, we show the importance of using a multiscale DNN and train it with the proper cost function to achieve state-of-the-art quality. To aid the reconstruction of the overexposed areas, our DNN takes a reference frame from the past as an additional input. This leverages the commonly occurring temporal instabilities of autoexposure to our advantage: since well-exposed details in the current frame may be overexposed in the future, we use reinforcement learning to train a reference frame selection DNN that decides whether to adopt the current frame as a future reference. Without resorting to alternating exposures, we obtain therefore a causal, HDR hallucination algorithm with potential application in common video acquisition settings. Our demo video can be found at https://drive.google.com/file/d/1-r12BKImLOYCLUoPzdebnMyNjJ4Rk360/view
    摘要 低动态范围(LDR)相机无法处理宽动态范围的输入,常常导致局部过曝问题。我们提出了一个基于学习的系统来减少这类伪影,而无需借助高动态范围(HDR)成像中常见的交替曝光等复杂采集机制或昂贵的处理。我们提出用基于 Transformer 的深度神经网络(DNN)推断缺失的 HDR 细节。消融实验表明,采用多尺度 DNN 并用恰当的损失函数训练,对达到最先进质量十分重要。为辅助重建过曝区域,我们的 DNN 以过去的一帧作为额外输入。这恰好把自动曝光常见的时间不稳定性变成了优势:当前帧中曝光良好的细节将来可能会过曝,因此我们用强化学习训练一个参考帧选择 DNN,由它决定是否将当前帧用作未来的参考。由此,我们无需交替曝光即可得到一个因果的 HDR 幻化算法,有望应用于常见的视频采集场景。示例视频见 https://drive.google.com/file/d/1-r12BKImLOYCLUoPzdebnMyNjJ4Rk360/view。

Canonical Factors for Hybrid Neural Fields

  • paper_url: http://arxiv.org/abs/2308.15461
  • repo_url: https://github.com/brentyi/tilted
  • paper_authors: Brent Yi, Weijia Zeng, Sam Buchanan, Yi Ma
  • for: 本文主要研究分解式特征体(factored feature volumes)对轴对齐信号引入的不良偏置问题,并提出一种解决方法。
  • methods: 通过学习一组规范化变换来消除这些偏置。
  • results: 实验结果表明,该方法可以提高图像、符号距离和辐射场重建的质量、鲁棒性、紧凑性和运行时间。
    Abstract Factored feature volumes offer a simple way to build more compact, efficient, and intepretable neural fields, but also introduce biases that are not necessarily beneficial for real-world data. In this work, we (1) characterize the undesirable biases that these architectures have for axis-aligned signals -- they can lead to radiance field reconstruction differences of as high as 2 PSNR -- and (2) explore how learning a set of canonicalizing transformations can improve representations by removing these biases. We prove in a two-dimensional model problem that simultaneously learning these transformations together with scene appearance succeeds with drastically improved efficiency. We validate the resulting architectures, which we call TILTED, using image, signed distance, and radiance field reconstruction tasks, where we observe improvements across quality, robustness, compactness, and runtime. Results demonstrate that TILTED can enable capabilities comparable to baselines that are 2x larger, while highlighting weaknesses of neural field evaluation procedures.
    摘要 分解式特征体为构建更紧凑、高效、可解释的神经场提供了一种简单方式,但也引入了对真实数据未必有利的偏置。在这项工作中,我们(1)刻画了这类结构对轴对齐信号的不良偏置——它可导致辐射场重建相差高达 2 PSNR;(2)探索了如何通过学习一组规范化变换来消除这些偏置、改进表示。我们在一个二维模型问题上证明,将这些变换与场景外观同时学习可以成功,且效率大幅提升。我们将由此得到的结构称为 TILTED,并在图像、符号距离和辐射场重建任务上进行了验证,观察到质量、鲁棒性、紧凑性和运行时间方面的改进。结果表明,TILTED 可以实现与两倍大小的基线相当的能力,同时也揭示了神经场评测流程的弱点。

Pseudo-Boolean Polynomials Approach To Edge Detection And Image Segmentation

  • paper_url: http://arxiv.org/abs/2308.15453
  • repo_url: None
  • paper_authors: Tendai Mapungwana Chikake, Boris Goldengorin, Alexey Samosyuk
  • for: 用于图像边缘检测和分割
  • methods: 在图像块上计算伪布尔多项式,对斑块(blob)区域和边缘区域进行二元分类
  • results: 在简单图像上成功实现边缘检测和分割,并推广应用于航拍景观等复杂图像
    Abstract We introduce a deterministic approach to edge detection and image segmentation by formulating pseudo-Boolean polynomials on image patches. The approach works by applying a binary classification of blob and edge regions in an image based on the degrees of pseudo-Boolean polynomials calculated on patches extracted from the provided image. We test our method on simple images containing primitive shapes of constant and contrasting colour and establish the feasibility before applying it to complex instances like aerial landscape images. The proposed method is based on the exploitation of the reduction, polynomial degree, and equivalence properties of penalty-based pseudo-Boolean polynomials.
    摘要 我们提出了一种确定性的边缘检测与图像分割方法,其核心是在图像块上构造伪布尔多项式。该方法根据从给定图像提取的图像块上所计算的伪布尔多项式的次数,对斑块区域和边缘区域进行二元分类。我们先在仅含纯色、高对比度基本形状的简单图像上测试该方法并验证其可行性,再将其应用于航拍景观图像等复杂实例。所提方法建立在惩罚型伪布尔多项式的归约、多项式次数与等价性质之上。

WrappingNet: Mesh Autoencoder via Deep Sphere Deformation

  • paper_url: http://arxiv.org/abs/2308.15413
  • repo_url: None
  • paper_authors: Eric Lei, Muhammad Asad Lodhi, Jiahao Pang, Junghyun Ahn, Dong Tian
  • for: 本研究旨在对网格(mesh)数据进行无监督学习,以学习更有意义的表示。
  • methods: 提出一种网格自编码器,在瓶颈中引入专门表示网格连接性的基础图,以促进学习刻画物体形状的共享潜在空间。
  • results: 与点云学习相比,WrappingNet 能提供更高质量的重建和具有竞争力的分类结果,并支持不同类别网格之间的潜在空间插值。
    Abstract There have been recent efforts to learn more meaningful representations via fixed length codewords from mesh data, since a mesh serves as a complete model of underlying 3D shape compared to a point cloud. However, the mesh connectivity presents new difficulties when constructing a deep learning pipeline for meshes. Previous mesh unsupervised learning approaches typically assume category-specific templates, e.g., human face/body templates. It restricts the learned latent codes to only be meaningful for objects in a specific category, so the learned latent spaces are unable to be used across different types of objects. In this work, we present WrappingNet, the first mesh autoencoder enabling general mesh unsupervised learning over heterogeneous objects. It introduces a novel base graph in the bottleneck dedicated to representing mesh connectivity, which is shown to facilitate learning a shared latent space representing object shape. The superiority of WrappingNet mesh learning is further demonstrated via improved reconstruction quality and competitive classification compared to point cloud learning, as well as latent interpolation between meshes of different categories.
    摘要 近来已有不少工作尝试从网格数据中通过定长码字学习更有意义的表示,因为相比点云,网格是底层三维形状的完整模型。然而,网格的连接性给构建面向网格的深度学习流水线带来了新的困难。以往的网格无监督学习方法通常假设特定类别的模板(如人脸/人体模板),这使得学到的潜在编码仅对特定类别的物体有意义,所学的潜在空间无法跨物体类型使用。在这项工作中,我们提出了 WrappingNet,首个支持在异构物体上进行通用网格无监督学习的网格自编码器。它在瓶颈中引入了一个专门表示网格连接性的新颖基础图,实验表明这有助于学习一个刻画物体形状的共享潜在空间。我们进一步通过更高的重建质量、与点云学习相比具有竞争力的分类结果,以及不同类别网格之间的潜在插值,展示了 WrappingNet 网格学习的优越性。

Robust Long-Tailed Learning via Label-Aware Bounded CVaR

  • paper_url: http://arxiv.org/abs/2308.15405
  • repo_url: None
  • paper_authors: Hong Zhu, Runpeng Yu, Xing Tang, Yifei Wang, Yuan Fang, Yisen Wang
  • for: 真实世界中的分类问题常常呈现不平衡或长尾分布,多数类占据了模型训练中的绝大部分样本。在这种情况下,朴素模型通常在少数类上表现不佳。
  • methods: 本文提出了两种基于 CVaR(条件风险价值)的新方法来改善长尾学习的表现,并提供了坚实的理论保证。具体而言,我们首先引入标签感知有界 CVaR(LAB-CVaR)损失函数,以克服原始 CVaR 过于悲观的结果,并从理论上为 LAB-CVaR 设计了最优的权重界;在此基础上,进一步提出带 logit 调整的 LAB-CVaR(LAB-CVaR-logit)损失函数以稳定优化过程,并同样给出理论支持。
  • results: 在具有长尾标签分布的真实数据集上的大量实验验证了所提方法的优越性。
    Abstract Data in the real-world classification problems are always imbalanced or long-tailed, wherein the majority classes have the most of the samples that dominate the model training. In such setting, the naive model tends to have poor performance on the minority classes. Previously, a variety of loss modifications have been proposed to address the long-tailed leaning problem, while these methods either treat the samples in the same class indiscriminatingly or lack a theoretical guarantee. In this paper, we propose two novel approaches based on CVaR (Conditional Value at Risk) to improve the performance of long-tailed learning with a solid theoretical ground. Specifically, we firstly introduce a Label-Aware Bounded CVaR (LAB-CVaR) loss to overcome the pessimistic result of the original CVaR, and further design the optimal weight bounds for LAB-CVaR theoretically. Based on LAB-CVaR, we additionally propose a LAB-CVaR with logit adjustment (LAB-CVaR-logit) loss to stabilize the optimization process, where we also offer the theoretical support. Extensive experiments on real-world datasets with long-tailed label distributions verify the superiority of our proposed methods.
    摘要 真实世界分类问题中的数据总是不平衡或呈长尾分布,多数类拥有绝大部分样本并主导模型训练。在这种情况下,朴素模型往往在少数类上表现不佳。此前已有多种损失函数修改方案被用于解决长尾学习问题,但这些方法要么不加区分地对待同一类内的样本,要么缺乏理论保证。本文提出了两种基于 CVaR(条件风险价值)、具有坚实理论基础的新方法来提升长尾学习的性能。具体而言,我们首先引入标签感知有界 CVaR(LAB-CVaR)损失,以克服原始 CVaR 过于悲观的结果,并进一步从理论上设计了 LAB-CVaR 的最优权重界。基于 LAB-CVaR,我们又提出了带 logit 调整的 LAB-CVaR-logit 损失以稳定优化过程,并同样提供了理论支持。在具有长尾标签分布的真实数据集上的大量实验验证了所提方法的优越性。
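
作为背景说明,标准 CVaR 损失只对一个 batch 中损失最大的 α 比例样本取平均;LAB-CVaR 在此基础上引入标签感知的有界权重,其确切形式以论文为准。下面给出普通 CVaR 损失的一个假设性 PyTorch 草图:

```python
import torch
import torch.nn.functional as F

def cvar_loss(logits, targets, alpha=0.3):
    """对 batch 内逐样本交叉熵取前 alpha 比例最大者的均值(普通 CVaR)。"""
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    k = max(1, int(alpha * per_sample.numel()))
    worst, _ = torch.topk(per_sample, k)      # 损失最大的 k 个样本
    return worst.mean()

logits = torch.randn(16, 5, requires_grad=True)
targets = torch.randint(0, 5, (16,))
loss = cvar_loss(logits, targets)
loss.backward()
print(float(loss))
```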

cs.AI - 2023-08-30

A General-Purpose Self-Supervised Model for Computational Pathology

  • paper_url: http://arxiv.org/abs/2308.15474
  • repo_url: None
  • paper_authors: Richard J. Chen, Tong Ding, Ming Y. Lu, Drew F. K. Williamson, Guillaume Jaume, Bowen Chen, Andrew Zhang, Daniel Shao, Andrew H. Song, Muhammad Shaban, Mane Williams, Anurag Vaidya, Sharifa Sahai, Lukas Oldenburg, Luca L. Weishaupt, Judy J. Wang, Walt Williams, Long Phi Le, Georg Gerber, Faisal Mahmood
  • for: 本研究旨在提出一种面向计算病理学的通用自监督模型,用于解决病理图像的分类与诊断问题。
  • methods: 该模型使用来自 20 种主要组织类型、超过 10 万张诊断性 HE 染色全切片图像的逾 1 亿个组织图像块进行自监督预训练。
  • results: 该模型在 33 项不同诊断难度的临床任务中表现出色,涵盖分类、诊断和疾病亚型划分等,并能在不同组织类型与诊断难度下泛化和迁移。
    Abstract Tissue phenotyping is a fundamental computational pathology (CPath) task in learning objective characterizations of histopathologic biomarkers in anatomic pathology. However, whole-slide imaging (WSI) poses a complex computer vision problem in which the large-scale image resolutions of WSIs and the enormous diversity of morphological phenotypes preclude large-scale data annotation. Current efforts have proposed using pretrained image encoders with either transfer learning from natural image datasets or self-supervised pretraining on publicly-available histopathology datasets, but have not been extensively developed and evaluated across diverse tissue types at scale. We introduce UNI, a general-purpose self-supervised model for pathology, pretrained using over 100 million tissue patches from over 100,000 diagnostic haematoxylin and eosin-stained WSIs across 20 major tissue types, and evaluated on 33 representative CPath clinical tasks in CPath of varying diagnostic difficulties. In addition to outperforming previous state-of-the-art models, we demonstrate new modeling capabilities in CPath such as resolution-agnostic tissue classification, slide classification using few-shot class prototypes, and disease subtyping generalization in classifying up to 108 cancer types in the OncoTree code classification system. UNI advances unsupervised representation learning at scale in CPath in terms of both pretraining data and downstream evaluation, enabling data-efficient AI models that can generalize and transfer to a gamut of diagnostically-challenging tasks and clinical workflows in anatomic pathology.
    摘要 组织表型分析是计算病理学(CPath)的一项基础任务,旨在学习解剖病理学中组织病理生物标志物的客观刻画。然而,全切片成像(WSI)构成了一个复杂的计算机视觉问题:WSI 的超大图像分辨率和形态表型的巨大多样性使大规模数据标注难以实现。现有工作提出使用预训练的图像编码器,或从自然图像数据集迁移学习,或在公开的组织病理数据集上自监督预训练,但尚未在多种组织类型上进行大规模的开发与评估。我们提出 UNI,一个面向病理学的通用自监督模型,使用来自 20 种主要组织类型、超过 10 万张诊断性 HE 染色 WSI 的逾 1 亿个组织图像块进行预训练,并在 33 项具有不同诊断难度的代表性 CPath 临床任务上进行评估。除了超越此前的最先进模型外,我们还展示了 CPath 中的若干新建模能力,例如与分辨率无关的组织分类、基于少样本类别原型的切片分类,以及在 OncoTree 编码分类体系中对多达 108 种癌症类型进行疾病亚型划分的泛化能力。UNI 在预训练数据和下游评估两方面推进了 CPath 中的大规模无监督表示学习,使数据高效的 AI 模型得以泛化并迁移到解剖病理学中一系列具有诊断挑战性的任务和临床工作流中。

Multimodal Contrastive Learning and Tabular Attention for Automated Alzheimer’s Disease Prediction

  • paper_url: http://arxiv.org/abs/2308.15469
  • repo_url: None
  • paper_authors: Weichen Huang
  • for: 本研究旨在开发一个多模态对比学习框架,以利用 MRI 扫描和 PET 等神经影像数据,并处理 AD 数据集中有价值的表格数据(生物标志物与临床评估)。
  • methods: 该框架包含一个新的表格注意力模块,可以放大并排序表格中的重要特征,并使用多模态对比学习技术将图像与表格数据结合起来。
  • results: 实验结果显示,该框架在 ADNI 数据库的逾 882 个 MRI 切片上检测阿尔茨海默病的准确率超过 83.8%,比此前的最先进技术提高了近 10%。
    Abstract Alongside neuroimaging such as MRI scans and PET, Alzheimer's disease (AD) datasets contain valuable tabular data including AD biomarkers and clinical assessments. Existing computer vision approaches struggle to utilize this additional information. To address these needs, we propose a generalizable framework for multimodal contrastive learning of image data and tabular data, a novel tabular attention module for amplifying and ranking salient features in tables, and the application of these techniques onto Alzheimer's disease prediction. Experimental evaulations demonstrate the strength of our framework by detecting Alzheimer's disease (AD) from over 882 MR image slices from the ADNI database. We take advantage of the high interpretability of tabular data and our novel tabular attention approach and through attribution of the attention scores for each row of the table, we note and rank the most predominant features. Results show that the model is capable of an accuracy of over 83.8%, almost a 10% increase from previous state of the art.
    摘要 除 MRI、PET 等神经影像外,阿尔茨海默病(AD)数据集还包含宝贵的表格数据,如 AD 生物标志物和临床评估,而现有的计算机视觉方法难以利用这些额外信息。为此,我们提出了一个可泛化的图像数据与表格数据多模态对比学习框架,以及一个用于放大和排序表格中显著特征的新型表格注意力模块,并将这些技术应用于阿尔茨海默病预测。实验评估表明,我们的框架能够从 ADNI 数据库的逾 882 个 MRI 切片中检测出阿尔茨海默病。我们利用表格数据的高可解释性和新的表格注意力方法,通过归因表格每一行的注意力分数,找出并排序最主要的特征。结果显示,模型准确率超过 83.8%,比此前的最先进水平提高了近 10%。
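
下面用一个假设性的 PyTorch 最小草图示意"表格注意力"的思路:为每个标量特征学习嵌入,用注意力分数加权汇聚,注意力分数同时可用于特征重要性排序。具体结构与论文实现可能不同。

```python
import torch
import torch.nn as nn

class TabularAttention(nn.Module):
    """对每个标量特征学习一个嵌入,并用注意力分数加权、排序特征(示意)。"""
    def __init__(self, n_features, d=16):
        super().__init__()
        self.embed = nn.Parameter(torch.randn(n_features, d))  # 每列一个嵌入
        self.score = nn.Linear(d, 1)

    def forward(self, x):                      # x: (batch, n_features)
        tokens = x.unsqueeze(-1) * self.embed  # (batch, n_features, d)
        attn = torch.softmax(self.score(tokens).squeeze(-1), dim=-1)  # 特征权重
        pooled = (attn.unsqueeze(-1) * tokens).sum(1)                 # 加权汇聚
        return pooled, attn                    # attn 可用于特征重要性排序

x = torch.randn(4, 10)                         # 10 个表格特征
pooled, attn = TabularAttention(10)(x)
print(pooled.shape, attn.argsort(descending=True)[0][:3])  # top-3 特征索引
```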

A Comparative Study of Loss Functions: Traffic Predictions in Regular and Congestion Scenarios

  • paper_url: http://arxiv.org/abs/2308.15464
  • repo_url: https://github.com/xieyangxinyu/a-comparative-study-of-loss-functions-traffic-predictions-in-regular-and-congestion-scenarios
  • paper_authors: Yangxinyu Xie, Tanwi Mallick
  • for: 该论文旨在提高深度学习模型在交通预测中的精度,尤其是对拥堵情形的预测。
  • methods: 受重尾分析和不平衡分类问题启发,该论文考察了多种损失函数(包括 MAE-Focal Loss 和 Gumbel Loss),以克服传统损失函数的局限性。
  • results: 大规模实验表明,优化 MAE 时 MAE-Focal Loss 效果最佳,优化 MSE 时 Gumbel Loss 最优;二者都能准确预测拥堵事件,且不损害常规交通速度预测的精度。
    Abstract Spatiotemporal graph neural networks have achieved state-of-the-art performance in traffic forecasting. However, they often struggle to forecast congestion accurately due to the limitations of traditional loss functions. While accurate forecasting of regular traffic conditions is crucial, a reliable AI system must also accurately forecast congestion scenarios to maintain safe and efficient transportation. In this paper, we explore various loss functions inspired by heavy tail analysis and imbalanced classification problems to address this issue. We evaluate the efficacy of these loss functions in forecasting traffic speed, with an emphasis on congestion scenarios. Through extensive experiments on real-world traffic datasets, we discovered that when optimizing for Mean Absolute Error (MAE), the MAE-Focal Loss function stands out as the most effective. When optimizing Mean Squared Error (MSE), Gumbel Loss proves to be the superior choice. These choices effectively forecast traffic congestion events without compromising the accuracy of regular traffic speed forecasts. This research enhances deep learning models' capabilities in forecasting sudden speed changes due to congestion and underscores the need for more research in this direction. By elevating the accuracy of congestion forecasting, we advocate for AI systems that are reliable, secure, and resilient in practical traffic management scenarios.
    摘要 时空图神经网络已在交通预测中取得最先进的性能。然而,受限于传统损失函数,它们往往难以准确预测拥堵。虽然准确预测常规交通状况至关重要,但一个可靠的 AI 系统还必须准确预测拥堵情形,以保障交通的安全与高效。本文探讨了受重尾分析和不平衡分类问题启发的多种损失函数,以解决这一问题。我们评估了这些损失函数在交通速度预测(尤其是拥堵情形)中的效果。通过在真实交通数据集上的大量实验,我们发现:优化平均绝对误差(MAE)时,MAE-Focal Loss 表现最为突出;优化均方误差(MSE)时,Gumbel Loss 是更优选择。这些选择能够有效预测交通拥堵事件,且不损害常规交通速度预测的准确性。这项研究增强了深度学习模型预测拥堵导致的速度突变的能力,并凸显了该方向进一步研究的必要性。通过提升拥堵预测的准确性,我们倡导在实际交通管理场景中构建可靠、安全且有韧性的 AI 系统。
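
MAE-Focal Loss 的确切定义以论文为准;下面给出一个常见的"焦点加权 MAE"假设性草图,其思路是按误差大小对样本加权,从而放大拥堵等难例的影响:

```python
import torch

def focal_mae(pred, target, gamma=1.0, eps=1e-6):
    """对绝对误差施加 |err|^gamma 形式的归一化焦点权重,放大大误差样本(示意)。"""
    err = (pred - target).abs()
    w = (err + eps) ** gamma
    w = w / w.mean()               # 归一化,保持损失量级稳定
    return (w.detach() * err).mean()

pred = torch.randn(8, requires_grad=True)
target = torch.randn(8)
loss = focal_mae(pred, target)
loss.backward()
print(float(loss))
```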

ParaGuide: Guided Diffusion Paraphrasers for Plug-and-Play Textual Style Transfer

  • paper_url: http://arxiv.org/abs/2308.15459
  • repo_url: https://github.com/zacharyhorvitz/ParaGuide
  • paper_authors: Zachary Horvitz, Ajay Patel, Chris Callison-Burch, Zhou Yu, Kathleen McKeown
  • for: 文章的目标是在保留语义信息的前提下,将文本的风格属性转换为新的风格。
  • methods: 本研究提出了一个新的基于扩散的框架 ParaGuide,可在推理时灵活适配任意目标风格。该方法结合以复述为条件的扩散模型,以及来自现成分类器和强大风格嵌入器的基于梯度的引导。
  • results: 在 Enron Email Corpus 上的人工与自动评估均表明其超越了强基线:它能够在保留语义信息的同时,成功地将文本转换为新的风格。
    Abstract Textual style transfer is the task of transforming stylistic properties of text while preserving meaning. Target "styles" can be defined in numerous ways, ranging from single attributes (e.g, formality) to authorship (e.g, Shakespeare). Previous unsupervised style-transfer approaches generally rely on significant amounts of labeled data for only a fixed set of styles or require large language models. In contrast, we introduce a novel diffusion-based framework for general-purpose style transfer that can be flexibly adapted to arbitrary target styles at inference time. Our parameter-efficient approach, ParaGuide, leverages paraphrase-conditioned diffusion models alongside gradient-based guidance from both off-the-shelf classifiers and strong existing style embedders to transform the style of text while preserving semantic information. We validate the method on the Enron Email Corpus, with both human and automatic evaluations, and find that it outperforms strong baselines on formality, sentiment, and even authorship style transfer.
    摘要 文本风格迁移是在保留语义的前提下转换文本风格属性的任务。目标"风格"可以有多种定义方式,从单一属性(如正式程度)到作者身份(如莎士比亚)。以往的无监督风格迁移方法通常只针对固定的风格集合且依赖大量标注数据,或需要大型语言模型。与之相比,我们提出了一个新颖的基于扩散的通用风格迁移框架,可在推理时灵活适配任意目标风格。我们这一参数高效的方法 ParaGuide,将以复述为条件的扩散模型与来自现成分类器和强大的现有风格嵌入器的基于梯度的引导相结合,在保留语义信息的同时转换文本风格。我们在 Enron Email Corpus 上通过人工与自动评估验证了该方法,发现它在正式程度、情感乃至作者风格迁移上均优于强基线。

From SMOTE to Mixup for Deep Imbalanced Classification

  • paper_url: http://arxiv.org/abs/2308.15457
  • repo_url: https://github.com/ntucllab/imbalanced-dl
  • paper_authors: Wei-Chao Cheng, Tan-Ha Mai, Hsuan-Tien Lin
  • for: 本研究旨在探讨深度学习中对不平衡数据的处理方法,尤其是 SMOTE 数据增强技术是否有利于深度学习。
  • methods: 本研究使用软标签增强 SMOTE,并将其与 Mixup 相连接,得到一个统一传统与现代数据增强技术的框架;在此基础上进一步提出一种间隔感知的 Mixup 技术。
  • results: 研究发现,将软标签 SMOTE 与 Mixup 结合可以提高深度学习模型的泛化性能;所提的间隔感知 Mixup 在深度不平衡分类上达到最先进性能,并在极端不平衡数据上表现尤佳。
    Abstract Given imbalanced data, it is hard to train a good classifier using deep learning because of the poor generalization of minority classes. Traditionally, the well-known synthetic minority oversampling technique (SMOTE) for data augmentation, a data mining approach for imbalanced learning, has been used to improve this generalization. However, it is unclear whether SMOTE also benefits deep learning. In this work, we study why the original SMOTE is insufficient for deep learning, and enhance SMOTE using soft labels. Connecting the resulting soft SMOTE with Mixup, a modern data augmentation technique, leads to a unified framework that puts traditional and modern data augmentation techniques under the same umbrella. A careful study within this framework shows that Mixup improves generalization by implicitly achieving uneven margins between majority and minority classes. We then propose a novel margin-aware Mixup technique that more explicitly achieves uneven margins. Extensive experimental results demonstrate that our proposed technique yields state-of-the-art performance on deep imbalanced classification while achieving superior performance on extremely imbalanced data. The code is open-sourced in our developed package https://github.com/ntucllab/imbalanced-DL to foster future research in this direction.
    摘要 在数据不平衡的情况下,由于少数类泛化能力差,用深度学习训练出好的分类器十分困难。传统上,著名的合成少数类过采样技术(SMOTE)作为一种面向不平衡学习的数据挖掘式数据增强方法被用来改善这种泛化,但 SMOTE 是否同样有利于深度学习尚不明确。本文研究了原始 SMOTE 对深度学习不足的原因,并用软标签对 SMOTE 进行增强。将得到的软标签 SMOTE 与现代数据增强技术 Mixup 相连接,可以得到一个将传统与现代数据增强技术统一起来的框架。在该框架内的细致研究表明,Mixup 通过隐式地在多数类与少数类之间实现不均等的间隔来改善泛化。据此,我们提出了一种新颖的、更显式地实现不均等间隔的间隔感知 Mixup 技术。大量实验结果表明,所提技术在深度不平衡分类上取得了最先进的性能,并在极端不平衡数据上表现尤为突出。代码已开源于我们开发的软件包 https://github.com/ntucllab/imbalanced-DL,以促进该方向的后续研究。
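
为说明从 SMOTE(在样本间插值生成新样本)到 Mixup(同时插值输入与软标签)的联系,下面给出一个假设性的 numpy 草图;当插值发生在不同类样本之间时,标签按同一系数混合成软标签,这正是文中统一框架的共同形式:

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_mixup(x1, y1_onehot, x2, y2_onehot, alpha=0.2):
    """输入与软标签按同一 Beta 采样系数线性插值(Mixup/软标签 SMOTE 的共同形式)。"""
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1_onehot + (1 - lam) * y2_onehot

x1, x2 = rng.normal(size=4), rng.normal(size=4)
y1, y2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # 两个不同类
x_mix, y_mix = soft_mixup(x1, y1, x2, y2)
print(x_mix, y_mix)                                   # y_mix 是混合的软标签
```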

When Do Program-of-Thoughts Work for Reasoning?

  • paper_url: http://arxiv.org/abs/2308.15452
  • repo_url: https://github.com/zjunlp/easyinstruct
  • paper_authors: Zhen Bi, Ningyu Zhang, Yinuo Jiang, Shumin Deng, Guozhou Zheng, Huajun Chen
  • for: 本研究旨在探讨如何提升大型语言模型(LLM)在具身人工智能领域的推理能力,以及程序语言数据的影响。
  • methods: 本研究提出了复杂度影响推理分数(CIRS),用于衡量程序语言的结构与逻辑属性同推理能力之间的相关性。CIRS 使用抽象语法树编码结构信息,并综合考虑难度与圈复杂度来计算逻辑复杂度。
  • results: 研究发现,并非所有复杂度的程序数据都能提升 LLM 的推理能力:最优的复杂度水平对程序辅助提示提升推理能力至关重要。研究据此提出了一种自动合成与分层算法,并将其应用于数学推理的指令生成和代码生成任务的数据过滤,大量实验结果表明了所提方法的有效性。
    Abstract The reasoning capabilities of Large Language Models (LLMs) play a pivotal role in the realm of embodied artificial intelligence. Although there are effective methods like program-of-thought prompting for LLMs which uses programming language to tackle complex reasoning tasks, the specific impact of code data on the improvement of reasoning capabilities remains under-explored. To address this gap, we propose complexity-impacted reasoning score (CIRS), which combines structural and logical attributes, to measure the correlation between code and reasoning abilities. Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity by considering the difficulty and the cyclomatic complexity. Through an empirical analysis, we find not all code data of complexity can be learned or understood by LLMs. Optimal level of complexity is critical to the improvement of reasoning abilities by program-aided prompting. Then we design an auto-synthesizing and stratifying algorithm, and apply it to instruction generation for mathematical reasoning and code data filtering for code generation tasks. Extensive results demonstrates the effectiveness of our proposed approach. Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
    摘要 大型语言模型(LLM)的推理能力在具身人工智能领域扮演着关键角色。尽管已有像"思维程序"(program-of-thought)提示这样借助编程语言处理复杂推理任务的有效方法,代码数据对推理能力提升的具体影响仍缺乏探讨。为填补这一空白,我们提出了复杂度影响推理分数(CIRS),它结合结构属性与逻辑属性,用于度量代码与推理能力之间的相关性。具体来说,我们用抽象语法树编码结构信息,并综合考虑难度与圈复杂度来计算逻辑复杂度。实证分析发现,并非所有复杂度的代码数据都能被 LLM 学习或理解;最优的复杂度水平对于通过程序辅助提示提升推理能力至关重要。随后我们设计了一种自动合成与分层算法,并将其应用于数学推理的指令生成和代码生成任务的代码数据过滤。大量结果表明了所提方法的有效性。代码将集成到 EasyInstruct 框架中,见 https://github.com/zjunlp/EasyInstruct。
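
作为 CIRS 所依赖的两类信号的直观说明,下面的假设性 Python 草图用标准库 ast 提取结构信息(节点数、嵌套深度)和一个简化的圈复杂度代理;原文的打分公式与权重以论文为准。

```python
import ast

BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp)

def complexity_signals(code: str):
    tree = ast.parse(code)
    n_nodes = sum(1 for _ in ast.walk(tree))                    # 结构规模
    cyclomatic = 1 + sum(isinstance(n, BRANCH_NODES) for n in ast.walk(tree))

    def depth(node, d=0):                                       # 最大嵌套深度
        return max([depth(c, d + 1) for c in ast.iter_child_nodes(node)] or [d])

    return {"nodes": n_nodes, "cyclomatic": cyclomatic, "depth": depth(tree)}

sample = """
def solve(xs):
    total = 0
    for x in xs:
        if x > 0:
            total += x
    return total
"""
print(complexity_signals(sample))
```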

Complementing Onboard Sensors with Satellite Map: A New Perspective for HD Map Construction

  • paper_url: http://arxiv.org/abs/2308.15427
  • repo_url: None
  • paper_authors: Wenjie Gao, Jiawei Fu, Haodong Jing, Nanning Zheng
  • for: 提升自动驾驶系统中高清地图构建方法的性能,降低其对车辆周围环境的敏感性。
  • methods: 用卫星地图补充车载传感器信息,以提升高清地图构建方法的性能;提出一个分层融合模块,通过特征级融合和 BEV 级融合更好地融合卫星地图信息。
  • results: 在扩展的 nuScenes 数据集上,证明该模块可以无缝集成到现有的高清地图构建方法中,显著提升其在高清地图语义分割与实例检测任务上的性能。
    Abstract High-Definition (HD) maps play a crucial role in autonomous driving systems. Recent methods have attempted to construct HD maps in real-time based on information obtained from vehicle onboard sensors. However, the performance of these methods is significantly susceptible to the environment surrounding the vehicle due to the inherent limitation of onboard sensors, such as weak capacity for long-range detection. In this study, we demonstrate that supplementing onboard sensors with satellite maps can enhance the performance of HD map construction methods, leveraging the broad coverage capability of satellite maps. For the purpose of further research, we release the satellite map tiles as a complementary dataset of nuScenes dataset. Meanwhile, we propose a hierarchical fusion module that enables better fusion of satellite maps information with existing methods. Specifically, we design an attention mask based on segmentation and distance, applying the cross-attention mechanism to fuse onboard Bird's Eye View (BEV) features and satellite features in feature-level fusion. An alignment module is introduced before concatenation in BEV-level fusion to mitigate the impact of misalignment between the two features. The experimental results on the augmented nuScenes dataset showcase the seamless integration of our module into three existing HD map construction methods. It notably enhances their performance in both HD map semantic segmentation and instance detection tasks.
    摘要 高清(HD)地图在自动驾驶系统中扮演着关键角色。近来的方法尝试基于车载传感器信息实时构建高清地图,但受限于车载传感器的固有局限(如远距离探测能力较弱),这些方法的性能极易受车辆周围环境的影响。本研究表明,借助卫星地图的广覆盖能力,用卫星地图补充车载传感器可以提升高清地图构建方法的性能。为便于后续研究,我们发布了卫星地图瓦片,作为 nuScenes 数据集的补充数据。同时,我们提出了一个分层融合模块,使卫星地图信息能更好地与现有方法融合。具体而言,我们设计了基于分割和距离的注意力掩码,在特征级融合中用交叉注意力机制融合车载鸟瞰(BEV)特征与卫星特征;在 BEV 级融合中,在拼接之前引入一个对齐模块,以减轻两种特征之间不对齐带来的影响。在扩展的 nuScenes 数据集上的实验结果表明,我们的模块能够无缝集成到三种现有的高清地图构建方法中,并显著提升它们在高清地图语义分割和实例检测任务上的性能。

cs.CL - 2023-08-30

Vulgar Remarks Detection in Chittagonian Dialect of Bangla

  • paper_url: http://arxiv.org/abs/2308.15448
  • repo_url: None
  • paper_authors: Tanjim Mahmud, Michal Ptaszynski, Fumito Masui
  • for: 本研究旨在探讨社交媒体上负面言论的自动检测方法,尤其是针对孟加拉语吉大港方言这样的低资源语言。
  • methods: 本研究使用监督式机器学习和深度学习算法来检测社交媒体上的侮辱性言论。Logistic Regression 取得了可观的准确率(0.91),而结合 Word2vec 和 fastText 的简单 RNN 准确率较低(0.84-0.90)。
  • results: 结果表明,监督式机器学习和深度学习算法都能较准确地检测社交媒体上的侮辱性言论,但神经网络算法需要更多的数据才能达到更高的准确率。
    Abstract The negative effects of online bullying and harassment are increasing with Internet popularity, especially in social media. One solution is using natural language processing (NLP) and machine learning (ML) methods for the automatic detection of harmful remarks, but these methods are limited in low-resource languages like the Chittagonian dialect of Bangla.This study focuses on detecting vulgar remarks in social media using supervised ML and deep learning algorithms.Logistic Regression achieved promising accuracy (0.91) while simple RNN with Word2vec and fastTex had lower accuracy (0.84-0.90), highlighting the issue that NN algorithms require more data.
    摘要 随着互联网尤其是社交媒体的普及,网络欺凌和骚扰的负面影响不断加剧。一种解决方案是利用自然语言处理(NLP)和机器学习(ML)方法自动检测有害言论,但这些方法在孟加拉语吉大港方言等低资源语言上仍有局限。本研究关注利用监督式 ML 和深度学习算法检测社交媒体中的粗俗言论。Logistic Regression 取得了可观的准确率(0.91),而结合 Word2vec 和 fastText 的简单 RNN 准确率较低(0.84-0.90),这凸显了神经网络算法需要更多数据的问题。
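
文中 Logistic Regression 基线的一种常见实现形式如下(假设性草图,使用占位英文样本;真实实验基于吉大港方言语料,字符 n-gram 特征对低资源语言通常更稳健):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# 占位样本:1 = 含侮辱性言论,0 = 正常(真实数据为吉大港方言社交媒体文本)
texts = ["you are awful", "have a nice day", "awful awful person", "good morning"]
labels = [1, 0, 1, 0]

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # 字符 n-gram 特征
    LogisticRegression(),
).fit(texts, labels)

print(clf.predict(["awful day"]))
```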

Characterizing Learning Curves During Language Model Pre-Training: Learning, Forgetting, and Stability

  • paper_url: http://arxiv.org/abs/2308.15419
  • repo_url: https://github.com/tylerachang/lm-learning-curves
  • paper_authors: Tyler A. Chang, Zhuowen Tu, Benjamin K. Bergen
  • for: 研究语言模型在预训练过程中是如何学会做出预测的。
  • methods: 从五次自回归英语语言模型的预训练运行中,针对上下文中的 100 万个词元提取学习曲线,并量化各词元的最终 surprisal、运行内变异性、习得时间、可遗忘性和跨运行变异性。
  • results: 语言模型先生成简短重复的短语,然后才学会生成更长、更连贯的文本;更高频的词元最终 surprisal 更低、变异更小、习得更早,也更不容易在预训练中被"遗忘",而更高的 n-gram 概率会进一步强化这些效应。
    Abstract How do language models learn to make predictions during pre-training? To study this question, we extract learning curves from five autoregressive English language model pre-training runs, for 1M tokens in context. We observe that the language models generate short repetitive phrases before learning to generate longer and more coherent text. We quantify the final surprisal, within-run variability, age of acquisition, forgettability, and cross-run variability of learning curves for individual tokens in context. More frequent tokens reach lower final surprisals, exhibit less variability within and across pre-training runs, are learned earlier, and are less likely to be "forgotten" during pre-training. Higher n-gram probabilities further accentuate these effects. Independent of the target token, shorter and more frequent contexts correlate with marginally more stable and quickly acquired predictions. Effects of part-of-speech are also small, although nouns tend to be acquired later and less stably than verbs, adverbs, and adjectives. Our work contributes to a better understanding of language model pre-training dynamics and informs the deployment of stable language models in practice.
    摘要 语言模型在预训练过程中是如何学会做出预测的?为研究这一问题,我们从五次自回归英语语言模型的预训练运行中,针对上下文中的 100 万个词元提取学习曲线。我们观察到,语言模型先生成简短重复的短语,然后才学会生成更长、更连贯的文本。我们对上下文中单个词元的学习曲线量化了最终 surprisal、运行内变异性、习得时间、可遗忘性和跨运行变异性。更高频的词元最终 surprisal 更低,在单次及多次预训练运行中的变异更小,习得更早,且在预训练过程中更不容易被"遗忘";更高的 n-gram 概率会进一步强化这些效应。在目标词元之外,更短、更高频的上下文与略微更稳定、习得更快的预测相关。词性的影响也较小,不过名词的习得往往比动词、副词和形容词更晚、更不稳定。我们的工作有助于更好地理解语言模型的预训练动态,并为实践中部署稳定的语言模型提供参考。
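
"surprisal"即模型赋予真实下一词元的负对数概率。下面是一个假设性草图,用 Hugging Face transformers 计算一段上下文中每个词元的 surprisal;模型名 gpt2 仅为示例,并非论文所用模型。

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

name = "gpt2"                                   # 示例模型
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

text = "The cat sat on the mat."
ids = tok(text, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits                  # (1, T, vocab)

# 位置 t 的 surprisal = -log p(token_t | token_<t)
log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
surprisal = -log_probs.gather(1, ids[0, 1:, None]).squeeze(1)
for t, s in zip(tok.convert_ids_to_tokens(ids[0, 1:].tolist()), surprisal):
    print(f"{t:>10s}  {s.item():.2f}")
```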

cs.LG - 2023-08-30

Policy composition in reinforcement learning via multi-objective policy optimization

  • paper_url: http://arxiv.org/abs/2308.15470
  • repo_url: None
  • paper_authors: Shruti Mishra, Ankit Anand, Jordan Hoffmann, Nicolas Heess, Martin Riedmiller, Abbas Abdolmaleki, Doina Precup
  • for: 本研究旨在利用相关的已有教师策略,帮助强化学习智能体学习成功的行为策略。
  • methods: 在多目标策略优化设定中,将教师策略与任务目标一起作为多个优化目标,并采用多目标最大后验策略优化(MO-MPO)算法。
  • results: 研究表明,利用教师策略可以加速智能体的任务学习,尤其是在缺乏塑形奖励的情况下。在两个具有连续观测与动作空间的领域中,智能体成功地以串行和并行方式组合教师策略,并能进一步扩展教师策略以完成任务。
    Abstract We enable reinforcement learning agents to learn successful behavior policies by utilizing relevant pre-existing teacher policies. The teacher policies are introduced as objectives, in addition to the task objective, in a multi-objective policy optimization setting. Using the Multi-Objective Maximum a Posteriori Policy Optimization algorithm \citep{abdolmaleki2020distributional}, we show that teacher policies can help speed up learning, particularly in the absence of shaping rewards. In two domains with continuous observation and action spaces, our agents successfully compose teacher policies in sequence and in parallel, and are also able to further extend the policies of the teachers in order to solve the task. Depending on the specified combination of task and teacher(s), teacher(s) may naturally act to limit the final performance of an agent. The extent to which agents are required to adhere to teacher policies are determined by hyperparameters which determine both the effect of teachers on learning speed and the eventual performance of the agent on the task. In the {\tt humanoid} domain \citep{deepmindcontrolsuite2018}, we also equip agents with the ability to control the selection of teachers. With this ability, agents are able to meaningfully compose from the teacher policies to achieve a superior task reward on the {\tt walk} task than in cases without access to the teacher policies. We show the resemblance of composed task policies with the corresponding teacher policies through videos.
    摘要 我们利用相关的已有教师策略,使强化学习智能体学会成功的行为策略。在多目标策略优化设定中,教师策略与任务目标一起被作为优化目标引入。基于多目标最大后验策略优化算法 \citep{abdolmaleki2020distributional},我们证明教师策略有助于加速学习,尤其是在缺乏塑形奖励的情况下。在两个具有连续观测与动作空间的领域中,我们的智能体成功地以串行和并行方式组合教师策略,并能进一步扩展教师策略以完成任务。取决于任务与教师的具体组合,教师也可能自然地限制智能体的最终表现。智能体需要在多大程度上遵循教师策略由超参数决定,这些超参数既影响教师对学习速度的作用,也影响智能体在任务上的最终表现。在 {\tt humanoid} 领域 \citep{deepmindcontrolsuite2018} 中,我们还赋予智能体控制教师选择的能力。借助这一能力,智能体能够有意义地组合教师策略,在 {\tt walk} 任务上取得比无法访问教师策略时更高的任务奖励。我们通过视频展示了组合出的任务策略与相应教师策略之间的相似性。

Random feature approximation for general spectral methods

  • paper_url: http://arxiv.org/abs/2308.15434
  • repo_url: None
  • paper_authors: Mike Nguyen, Nicole Mücke
  • for: 本文研究大规模算法中核方法的加速技术——随机特征近似,并将其用于深度神经网络的理论分析。
  • methods: 本文分析了与随机特征相结合的一大类谱正则化方法,涵盖梯度下降等隐式正则化的核方法以及 Tikhonov 正则化等显式方法。
  • results: 本文研究了这些估计器的泛化性质,并在多种正则性类(包括不属于再生核希尔伯特空间的类)上获得了最优学习率。
    Abstract Random feature approximation is arguably one of the most popular techniques to speed up kernel methods in large scale algorithms and provides a theoretical approach to the analysis of deep neural networks. We analyze generalization properties for a large class of spectral regularization methods combined with random features, containing kernel methods with implicit regularization such as gradient descent or explicit methods like Tikhonov regularization. For our estimators we obtain optimal learning rates over regularity classes (even for classes that are not included in the reproducing kernel Hilbert space), which are defined through appropriate source conditions. This improves or completes previous results obtained in related settings for specific kernel algorithms.
    摘要 随机特征近似可以说是大规模算法中加速核方法最流行的技术之一,同时也为深度神经网络的分析提供了理论途径。我们分析了与随机特征相结合的一大类谱正则化方法的泛化性质,其中既包含梯度下降等具有隐式正则化的核方法,也包含 Tikhonov 正则化等显式方法。我们的估计器在由适当源条件定义的各种正则性类(即使这些类不包含于再生核希尔伯特空间)上取得了最优学习率,这改进或完善了此前在相关设定下针对特定核算法得到的结果。
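
随机特征近似最经典的形式是随机傅里叶特征(RFF):用 z(x) = sqrt(2/D)·cos(Wx + b) 近似高斯核,再在特征上做岭回归(Tikhonov 正则化是文中谱正则化方法的一个特例)。下面是一个假设性的 numpy 草图:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, D, sigma, lam = 200, 5, 300, 1.0, 1e-2

X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

# 随机傅里叶特征:z(x) = sqrt(2/D) * cos(W x + b) 近似带宽为 sigma 的 RBF 核
W = rng.normal(scale=1.0 / sigma, size=(d, D))
b = rng.uniform(0, 2 * np.pi, size=D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)

# 岭回归:min ||Z theta - y||^2 + lam ||theta||^2
theta = np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)
pred = Z @ theta
print("train MSE:", np.mean((pred - y) ** 2))
```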

Probabilistic solar flare forecasting using historical magnetogram data

  • paper_url: http://arxiv.org/abs/2308.15410
  • repo_url: https://github.com/swri-idea-lab/idea-lab-flare-forecast
  • paper_authors: Kiera van der Sande, Andrés Muñoz-Jaramillo, Subhamoy Chatterjee
  • for: 这篇论文旨在利用机器学习技术预测太阳耀斑。
  • methods: 该论文使用卷积神经网络从全日面磁图中提取特征,并用逻辑回归模型结合基于磁图与耀斑历史的标量特征,以集成方式生成经校准的概率预报。
  • results: 纳入多台仪器的历史逐日磁图数据能够提升预报技巧和可靠性;单帧磁图所含的相关信息并不显著多于少量标量特征所能概括的信息,而耀斑历史比 CNN 提取的磁图特征更具预测力。
    Abstract Solar flare forecasting research using machine learning (ML) has focused on high resolution magnetogram data from the SDO/HMI era covering Solar Cycle 24 and the start of Solar Cycle 25, with some efforts looking back to SOHO/MDI for data from Solar Cycle 23. In this paper, we consider over 4 solar cycles of daily historical magnetogram data from multiple instruments. This is the first attempt to take advantage of this historical data for ML-based flare forecasting. We apply a convolutional neural network (CNN) to extract features from full-disk magnetograms together with a logistic regression model to incorporate scalar features based on magnetograms and flaring history. We use an ensemble approach to generate calibrated probabilistic forecasts of M-class or larger flares in the next 24 hours. Overall, we find that including historical data improves forecasting skill and reliability. We show that single frame magnetograms do not contain significantly more relevant information than can be summarized in a small number of scalar features, and that flaring history has greater predictive power than our CNN-extracted features. This indicates the importance of including temporal information in flare forecasting models.
    摘要 基于机器学习(ML)的太阳耀斑预报研究一直集中在覆盖第 24 太阳活动周及第 25 太阳活动周初期的 SDO/HMI 时代高分辨率磁图数据上,也有部分工作回溯到 SOHO/MDI 的第 23 太阳活动周数据。本文考察了来自多台仪器、跨越 4 个以上太阳活动周的逐日历史磁图数据,这是首次尝试将这些历史数据用于基于 ML 的耀斑预报。我们用卷积神经网络(CNN)从全日面磁图中提取特征,并用逻辑回归模型纳入基于磁图和耀斑历史的标量特征,再以集成方式生成未来 24 小时内发生 M 级或更大耀斑的经校准概率预报。总体而言,我们发现纳入历史数据能够提升预报技巧和可靠性。我们还表明,单帧磁图所含的相关信息并不显著多于少量标量特征所能概括的信息,而耀斑历史比 CNN 提取的特征更具预测力,这说明了在耀斑预报模型中纳入时间信息的重要性。

cs.CV - 2023-08-29

Efficient Discovery and Effective Evaluation of Visual Perceptual Similarity: A Benchmark and Beyond

  • paper_url: http://arxiv.org/abs/2308.14753
  • repo_url: https://github.com/vsd-benchmark/vsd
  • paper_authors: Oren Barkan, Tal Reiss, Jonathan Weill, Ori Katz, Roy Hirsch, Itzik Malkiel, Noam Koenigstein
  • for: 该论文旨在提出一个大规模的时尚视觉相似性数据集,以及一套高效的标注流程,以便可靠地评估视觉相似性发现方法。
  • methods: 该论文使用专家标注的图像对构建数据集,并提出了一种可应用于任何数据集的新型标注流程。
  • results: 该论文发布了一个包含超过 11 万对专家标注图像的大规模时尚视觉相似性基准数据集,并对标注流程的局限性与归纳偏置进行了分析,提出了相应的缓解度量。
    Abstract Visual similarities discovery (VSD) is an important task with broad e-commerce applications. Given an image of a certain object, the goal of VSD is to retrieve images of different objects with high perceptual visual similarity. Although being a highly addressed problem, the evaluation of proposed methods for VSD is often based on a proxy of an identification-retrieval task, evaluating the ability of a model to retrieve different images of the same object. We posit that evaluating VSD methods based on identification tasks is limited, and faithful evaluation must rely on expert annotations. In this paper, we introduce the first large-scale fashion visual similarity benchmark dataset, consisting of more than 110K expert-annotated image pairs. Besides this major contribution, we share insight from the challenges we faced while curating this dataset. Based on these insights, we propose a novel and efficient labeling procedure that can be applied to any dataset. Our analysis examines its limitations and inductive biases, and based on these findings, we propose metrics to mitigate those limitations. Though our primary focus lies on visual similarity, the methodologies we present have broader applications for discovering and evaluating perceptual similarity across various domains.
    摘要 视觉相似性发现(VSD)是一项具有广泛电商应用的重要任务:给定某一物体的图像,VSD 的目标是检索感知上高度相似的其他物体的图像。尽管这一问题已被大量研究,现有 VSD 方法的评估往往以识别-检索任务为代理,即评估模型检索同一物体不同图像的能力。我们认为基于识别任务评估 VSD 方法存在局限,忠实的评估必须依赖专家标注。本文发布了首个大规模时尚视觉相似性基准数据集,包含超过 11 万对专家标注的图像。除这一主要贡献外,我们还分享了构建该数据集过程中遇到的挑战,并据此提出了一种可应用于任何数据集的新型高效标注流程。我们的分析考察了该流程的局限性与归纳偏置,并据此提出了缓解这些局限的度量。虽然我们的主要关注点是视觉相似性,但文中的方法论对于在各个领域发现和评估感知相似性都有更广泛的应用价值。

MagicEdit: High-Fidelity and Temporally Coherent Video Editing

  • paper_url: http://arxiv.org/abs/2308.14749
  • repo_url: None
  • paper_authors: Jun Hao Liew, Hanshu Yan, Jianfeng Zhang, Zhongcong Xu, Jiashi Feng
  • for: 该论文旨在解决文本引导的视频编辑任务。
  • methods: 该论文在训练中显式解耦内容、结构与运动信号的学习,以实现高质量的视频到视频转换。
  • results: 论文表明,该方法能够实现高保真且时间一致的视频转换,并支持多种下游视频编辑任务,如视频风格化、局部编辑、视频 MagicMix 和视频外扩(outpainting)。
    Abstract In this report, we present MagicEdit, a surprisingly simple yet effective solution to the text-guided video editing task. We found that high-fidelity and temporally coherent video-to-video translation can be achieved by explicitly disentangling the learning of content, structure and motion signals during training. This is in contradict to most existing methods which attempt to jointly model both the appearance and temporal representation within a single framework, which we argue, would lead to degradation in per-frame quality. Despite its simplicity, we show that MagicEdit supports various downstream video editing tasks, including video stylization, local editing, video-MagicMix and video outpainting.
    摘要 在本报告中,我们提出 MagicEdit,一个出人意料地简单却有效的文本引导视频编辑方案。我们发现,在训练中显式解耦内容、结构与运动信号的学习,即可实现高保真且时间一致的视频到视频转换。这与现有多数方法相反:它们试图在单一框架内同时建模外观与时间表示,而我们认为这会导致逐帧质量的下降。尽管简单,MagicEdit 仍支持多种下游视频编辑任务,包括视频风格化、局部编辑、视频 MagicMix 和视频外扩。

MagicAvatar: Multimodal Avatar Generation and Animation

  • paper_url: http://arxiv.org/abs/2308.14748
  • repo_url: https://github.com/magic-research/magic-avatar
  • paper_authors: Jianfeng Zhang, Hanshu Yan, Zhongcong Xu, Jiashi Feng, Jun Hao Liew
  • for: 这个论文旨在提出一种基于多模态视频生成和人偶动画的框架,即 MagicAvatar。
  • methods: 这个框架包括两个阶段:第一个阶段是将多模态输入翻译成动作/控制信号(例如人姿、深度、DensePose),第二个阶段是根据这些动作信号生成人偶视频。
  • results: MagicAvatar 仅需提供目标人物的几张图像即可为其人偶制作动画,支持文本引导和视频引导的人偶生成,还可应用于多模态人偶动画。
    Abstract This report presents MagicAvatar, a framework for multimodal video generation and animation of human avatars. Unlike most existing methods that generate avatar-centric videos directly from multimodal inputs (e.g., text prompts), MagicAvatar explicitly disentangles avatar video generation into two stages: (1) multimodal-to-motion and (2) motion-to-video generation. The first stage translates the multimodal inputs into motion/ control signals (e.g., human pose, depth, DensePose); while the second stage generates avatar-centric video guided by these motion signals. Additionally, MagicAvatar supports avatar animation by simply providing a few images of the target person. This capability enables the animation of the provided human identity according to the specific motion derived from the first stage. We demonstrate the flexibility of MagicAvatar through various applications, including text-guided and video-guided avatar generation, as well as multimodal avatar animation.
    摘要 本报告介绍 MagicAvatar,一个用于多模态视频生成与人偶动画的框架。与大多数直接从多模态输入(如文本提示)生成以人偶为中心视频的现有方法不同,MagicAvatar 将人偶视频生成显式拆解为两个阶段:(1)多模态到运动,将多模态输入转换为运动/控制信号(如人体姿态、深度、DensePose);(2)运动到视频,在这些运动信号的引导下生成以人偶为中心的视频。此外,MagicAvatar 仅需提供目标人物的几张图像即可支持人偶动画,使所给的人物身份能够按照第一阶段得到的特定运动进行动画。我们通过多种应用展示了 MagicAvatar 的灵活性,包括文本引导与视频引导的人偶生成,以及多模态人偶动画。

CoVR: Learning Composed Video Retrieval from Web Video Captions

  • paper_url: http://arxiv.org/abs/2308.14746
  • repo_url: https://github.com/lucas-ventura/CoVR
  • paper_authors: Lucas Ventura, Antoine Yang, Cordelia Schmid, Gül Varol
  • for: 这篇论文旨在提出一种可扩展的组合图像检索(Composed Image Retrieval, CoIR)数据集自动构造方法,以替代成本高且不可扩展的人工标注 CoIR 三元组。
  • methods: 该方法利用视频-字幕对:从大规模数据库中挖掘字幕相似的配对视频,并用大型语言模型生成相应的修改文本,同时将任务范围扩展到组合视频检索(CoVR)。
  • results: 将该方法应用于 WebVid2M 数据集,自动构建了包含 160 万个三元组的 WebVid-CoVR 数据集,并提出了带人工标注评估集的新 CoVR 基准。实验表明,在该数据集上训练的 CoVR 模型能有效迁移到 CoIR,在零样本设定下于 CIRR 和 FashionIQ 基准上取得最先进性能。
    Abstract Composed Image Retrieval (CoIR) has recently gained popularity as a task that considers both text and image queries together, to search for relevant images in a database. Most CoIR approaches require manually annotated datasets, comprising image-text-image triplets, where the text describes a modification from the query image to the target image. However, manual curation of CoIR triplets is expensive and prevents scalability. In this work, we instead propose a scalable automatic dataset creation methodology that generates triplets given video-caption pairs, while also expanding the scope of the task to include composed video retrieval (CoVR). To this end, we mine paired videos with a similar caption from a large database, and leverage a large language model to generate the corresponding modification text. Applying this methodology to the extensive WebVid2M collection, we automatically construct our WebVid-CoVR dataset, resulting in 1.6 million triplets. Moreover, we introduce a new benchmark for CoVR with a manually annotated evaluation set, along with baseline results. Our experiments further demonstrate that training a CoVR model on our dataset effectively transfers to CoIR, leading to improved state-of-the-art performance in the zero-shot setup on both the CIRR and FashionIQ benchmarks. Our code, datasets, and models are publicly available at https://imagine.enpc.fr/~ventural/covr.
    摘要 组合图像检索(CoIR)近来作为一项同时考虑文本与图像查询、在数据库中检索相关图像的任务而受到关注。大多数 CoIR 方法需要人工标注的数据集,由图像-文本-图像三元组构成,其中文本描述从查询图像到目标图像的修改。然而,人工构建 CoIR 三元组成本高昂且难以扩展。在这项工作中,我们提出了一种可扩展的自动数据集构造方法,基于视频-字幕对生成三元组,同时将任务范围扩展到组合视频检索(CoVR)。为此,我们从大规模数据库中挖掘字幕相似的配对视频,并利用大型语言模型生成相应的修改文本。将该方法应用于庞大的 WebVid2M 数据集,我们自动构建了包含 160 万个三元组的 WebVid-CoVR 数据集。此外,我们提出了一个带有人工标注评估集的新 CoVR 基准,并给出了基线结果。实验进一步表明,在我们的数据集上训练 CoVR 模型能够有效迁移到 CoIR,在零样本设定下于 CIRR 和 FashionIQ 两个基准上均取得了最先进的性能。我们的代码、数据集与模型已公开于 https://imagine.enpc.fr/~ventural/covr。
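
下面用一个假设性草图示意 CoVR 三元组的挖掘思路:在字幕库中寻找高度相似但不完全相同的字幕对,并从其词差构造修改文本。论文中修改文本由大型语言模型生成,此处仅以简单词差代替;字幕数据亦为占位示例。

```python
from difflib import SequenceMatcher

captions = {                       # 假设的 video_id -> caption 库
    "vid1": "a dog running on the beach",
    "vid2": "a dog running in the park",
    "vid3": "a man cooking pasta",
}

def mine_pairs(caps, min_ratio=0.7):
    items = list(caps.items())
    for i, (va, ca) in enumerate(items):
        for vb, cb in items[i + 1:]:
            if ca != cb and SequenceMatcher(None, ca, cb).ratio() >= min_ratio:
                # 简化的"修改文本":两句字幕的词差(论文用 LLM 生成自然语句)
                diff = set(cb.split()) - set(ca.split())
                yield va, vb, "change to " + " ".join(sorted(diff))

for src, tgt, mod in mine_pairs(captions):
    print(src, "->", tgt, "|", mod)
```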

Total Selfie: Generating Full-Body Selfies

  • paper_url: http://arxiv.org/abs/2308.14740
  • repo_url: None
  • paper_authors: Bowei Chen, Brian Curless, Ira Kemelmacher-Shlizerman, Steve Seitz
  • for: Generating full-body selfies from a pre-captured video of the user's body, a target pose photo, and a selfie + background pair for each location.
  • methods: A diffusion-based approach that combines this information into high-quality, well-composed photos with the desired pose and background.
  • results: Produces high-quality, natural-looking full-body selfies in the desired pose against the desired background.
    Abstract We present a method to generate full-body selfies -- photos that you take of yourself, but capturing your whole body as if someone else took the photo of you from a few feet away. Our approach takes as input a pre-captured video of your body, a target pose photo, and a selfie + background pair for each location. We introduce a novel diffusion-based approach to combine all of this information into high quality, well-composed photos of you with the desired pose and background.

R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras

  • paper_url: http://arxiv.org/abs/2308.14713
  • repo_url: None
  • paper_authors: Aron Schmied, Tobias Fischer, Martin Danelljan, Marc Pollefeys, Fisher Yu
  • for: Multi-camera systems offer a simple, low-cost alternative to today's complex multi-modal setups, but dense 3D reconstruction and ego-motion estimation from them remain very challenging.
  • methods: R3D3, a multi-camera system that iterates between geometric estimation exploiting spatial-temporal information across cameras and monocular depth refinement, combining multi-camera feature correlation, dense bundle adjustment, and a learned depth refinement network.
  • results: Delivers dense, consistent 3D reconstruction of challenging dynamic outdoor environments and state-of-the-art dense depth prediction on the DDAD and NuScenes benchmarks.
    Abstract Dense 3D reconstruction and ego-motion estimation are key challenges in autonomous driving and robotics. Compared to the complex, multi-modal systems deployed today, multi-camera systems provide a simpler, low-cost alternative. However, camera-based 3D reconstruction of complex dynamic scenes has proven extremely difficult, as existing solutions often produce incomplete or incoherent results. We propose R3D3, a multi-camera system for dense 3D reconstruction and ego-motion estimation. Our approach iterates between geometric estimation that exploits spatial-temporal information from multiple cameras, and monocular depth refinement. We integrate multi-camera feature correlation and dense bundle adjustment operators that yield robust geometric depth and pose estimates. To improve reconstruction where geometric depth is unreliable, e.g. for moving objects or low-textured regions, we introduce learnable scene priors via a depth refinement network. We show that this design enables a dense, consistent 3D reconstruction of challenging, dynamic outdoor environments. Consequently, we achieve state-of-the-art dense depth prediction on the DDAD and NuScenes benchmarks.

360-Degree Panorama Generation from Few Unregistered NFoV Images

  • paper_url: http://arxiv.org/abs/2308.14686
  • repo_url: https://github.com/shanemankiw/panodiff
  • paper_authors: Jionghao Wang, Ziyu Chen, Jun Ling, Rong Xie, Li Song
  • for: Generating complete 360-degree panoramas.
  • methods: PanoDiff takes one or more unregistered narrow field-of-view (NFoV) images captured from arbitrary angles, plus text prompts; a two-stage angle-prediction module handles varying numbers of NFoV inputs, and a latent diffusion-based generation model conditions on the incomplete panorama and text while geometric augmentation schemes preserve geometric properties.
  • results: Achieves state-of-the-art panoramic generation quality with high controllability, making it suitable for applications such as content editing.
    Abstract 360$^\circ$ panoramas are extensively utilized as environmental light sources in computer graphics. However, capturing a 360$^\circ$ $\times$ 180$^\circ$ panorama poses challenges due to the necessity of specialized and costly equipment, and additional human resources. Prior studies develop various learning-based generative methods to synthesize panoramas from a single Narrow Field-of-View (NFoV) image, but they are limited in alterable input patterns, generation quality, and controllability. To address these issues, we propose a novel pipeline called PanoDiff, which efficiently generates complete 360$^\circ$ panoramas using one or more unregistered NFoV images captured from arbitrary angles. Our approach has two primary components to overcome the limitations. Firstly, a two-stage angle prediction module to handle various numbers of NFoV inputs. Secondly, a novel latent diffusion-based panorama generation model uses incomplete panorama and text prompts as control signals and utilizes several geometric augmentation schemes to ensure geometric properties in generated panoramas. Experiments show that PanoDiff achieves state-of-the-art panoramic generation quality and high controllability, making it suitable for applications such as content editing.

Video-Based Hand Pose Estimation for Remote Assessment of Bradykinesia in Parkinson’s Disease

  • paper_url: http://arxiv.org/abs/2308.14679
  • repo_url: None
  • paper_authors: Gabriela T. Acevedo Trebbau, Andrea Bandini, Diego L. Guarin
  • for: Testing whether pose estimation algorithms applied to videos from video streaming services can support remote assessment and monitoring of Parkinson's disease (PD).
  • methods: Seven off-the-shelf hand pose estimation models were used to track thumb and index finger movement in finger-tapping videos recorded under two conditions: streaming (live Zoom meeting) and on-device (local high-quality cameras).
  • results: Three of the models performed well on on-device recordings, while accuracy dropped significantly on streaming recordings; movement speed and model accuracy were negatively correlated for streaming recordings.
    Abstract There is a growing interest in using pose estimation algorithms for video-based assessment of Bradykinesia in Parkinson's Disease (PD) to facilitate remote disease assessment and monitoring. However, the accuracy of pose estimation algorithms in videos from video streaming services during Telehealth appointments has not been studied. In this study, we used seven off-the-shelf hand pose estimation models to estimate the movement of the thumb and index fingers in videos of the finger-tapping (FT) test recorded from Healthy Controls (HC) and participants with PD and under two different conditions: streaming (videos recorded during a live Zoom meeting) and on-device (videos recorded locally with high-quality cameras). The accuracy and reliability of the models were estimated by comparing the models' output with manual results. Three of the seven models demonstrated good accuracy for on-device recordings, and the accuracy decreased significantly for streaming recordings. We observed a negative correlation between movement speed and the model's accuracy for the streaming recordings. Additionally, we evaluated the reliability of ten movement features related to bradykinesia extracted from video recordings of PD patients performing the FT test. While most of the features demonstrated excellent reliability for on-device recordings, most of the features demonstrated poor to moderate reliability for streaming recordings. Our findings highlight the limitations of pose estimation algorithms when applied to video recordings obtained during Telehealth visits, and demonstrate that on-device recordings can be used for automatic video-assessment of bradykinesia in PD.
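As a flavor of what such pipelines compute downstream of the pose estimator, here is a minimal sketch of two finger-tapping features; the distance signal is synthetic and the peak-detection parameters are assumptions, not values from the paper.

```python
import numpy as np
from scipy.signal import find_peaks

fps = 30.0
t = np.arange(300) / fps
# Stand-in for the thumb-index fingertip distance a pose estimator would
# yield: ~3 Hz tapping whose amplitude decays over time (a bradykinesia cue).
dist = 50 * (1 + np.sin(2 * np.pi * 3 * t)) * np.exp(-0.1 * t)

peaks, _ = find_peaks(dist, distance=int(0.15 * fps))  # one peak per tap
tap_rate = len(peaks) / (len(dist) / fps)              # taps per second
decay = np.polyfit(peaks / fps, dist[peaks], 1)[0]     # amplitude slope over time

print(f"tap rate: {tap_rate:.2f} Hz, amplitude slope: {decay:.2f} px/s")
```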

cs.AI - 2023-08-29

AI Deception: A Survey of Examples, Risks, and Potential Solutions

  • paper_url: http://arxiv.org/abs/2308.14752
  • repo_url: None
  • paper_authors: Peter S. Park, Simon Goldstein, Aidan O’Gara, Michael Chen, Dan Hendrycks
  • For: Argues that a range of current AI systems have learned how to deceive humans.
  • Methods: Surveys empirical examples of AI deception, covering both special-use systems (such as Meta's CICERO) built for specific competitive settings and general-purpose systems such as large language models.
  • Results: Details risks from AI deception, including fraud, election tampering, and loss of control over AI systems, and outlines potential solutions: robust risk-assessment requirements for deception-capable systems, "bot-or-not" laws, and prioritized funding for research on detecting AI deception and making AI systems less deceptive.
    Abstract This paper argues that a range of current AI systems have learned how to deceive humans. We define deception as the systematic inducement of false beliefs in the pursuit of some outcome other than the truth. We first survey empirical examples of AI deception, discussing both special-use AI systems (including Meta's CICERO) built for specific competitive situations, and general-purpose AI systems (such as large language models). Next, we detail several risks from AI deception, such as fraud, election tampering, and losing control of AI systems. Finally, we outline several potential solutions to the problems posed by AI deception: first, regulatory frameworks should subject AI systems that are capable of deception to robust risk-assessment requirements; second, policymakers should implement bot-or-not laws; and finally, policymakers should prioritize the funding of relevant research, including tools to detect AI deception and to make AI systems less deceptive. Policymakers, researchers, and the broader public should work proactively to prevent AI deception from destabilizing the shared foundations of our society.

Flexible Techniques for Differentiable Rendering with 3D Gaussians

  • paper_url: http://arxiv.org/abs/2308.14737
  • repo_url: None
  • paper_authors: Leonid Keselman, Martial Hebert
  • for: Fast, reliable shape reconstruction is an essential ingredient in many computer vision applications; Neural Radiance Fields showed that photorealistic novel view synthesis is within reach but are limited by the performance demands of quickly reconstructing real scenes and objects.
  • methods: Extends renderers built on alternative shape representations, in particular 3D Gaussians, with differentiable optical flow, watertight mesh export, and per-ray normal rendering, and shows that two recent methods are interoperable with each other.
  • results: The reconstructions are quick, robust, and easily performed on GPU or CPU; code and visual examples are at https://leonidk.github.io/fmb-plus.
    Abstract Fast, reliable shape reconstruction is an essential ingredient in many computer vision applications. Neural Radiance Fields demonstrated that photorealistic novel view synthesis is within reach, but was gated by performance requirements for fast reconstruction of real scenes and objects. Several recent approaches have built on alternative shape representations, in particular, 3D Gaussians. We develop extensions to these renderers, such as integrating differentiable optical flow, exporting watertight meshes and rendering per-ray normals. Additionally, we show how two of the recent methods are interoperable with each other. These reconstructions are quick, robust, and easily performed on GPU or CPU. For code and visual examples, see https://leonidk.github.io/fmb-plus
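A toy example helps convey why Gaussians suit differentiable rendering: every pixel is a smooth function of the Gaussian parameters, so an image can be fit by gradient descent. The sketch below is a deliberately simplified 2D splatting demo, not the paper's renderer.

```python
import torch

H = W = 64
ys, xs = torch.meshgrid(torch.arange(H).float(), torch.arange(W).float(), indexing="ij")

def render(mu, log_sigma, intensity):
    """Sum of N isotropic 2D Gaussians splatted onto an H x W grid."""
    d2 = (xs[None] - mu[:, 0, None, None]) ** 2 + (ys[None] - mu[:, 1, None, None]) ** 2
    g = torch.exp(-0.5 * d2 / torch.exp(log_sigma)[:, None, None] ** 2)
    return (intensity[:, None, None] * g).sum(0).clamp(0, 1)

target = torch.zeros(H, W)
target[16:48, 16:48] = 1.0  # a bright square to approximate

N = 16
mu = (torch.rand(N, 2) * W).requires_grad_()
log_sigma = torch.full((N,), 1.5, requires_grad=True)
intensity = torch.full((N,), 0.5, requires_grad=True)

opt = torch.optim.Adam([mu, log_sigma, intensity], lr=0.5)
for step in range(300):
    opt.zero_grad()
    loss = ((render(mu, log_sigma, intensity) - target) ** 2).mean()
    loss.backward()
    opt.step()
print(f"final MSE: {loss.item():.4f}")
```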

Bayesian artificial brain with ChatGPT

  • paper_url: http://arxiv.org/abs/2308.14732
  • repo_url: None
  • paper_authors: Renato A. Krohling
  • for: Investigating ChatGPT's mathematical problem-solving ability on Bayesian reasoning tasks.
  • methods: Presents ChatGPT with the set of 10 Bayesian reasoning problems from Zhu & Gigerenzer's 2006 study, which asked whether children can reason the Bayesian way.
  • results: ChatGPT solved all 10 problems correctly.
    Abstract This paper aims to investigate the mathematical problem-solving capabilities of Chat Generative Pre-Trained Transformer (ChatGPT) in case of Bayesian reasoning. The study draws inspiration from Zhu & Gigerenzer's research in 2006, which posed the question: Can children reason the Bayesian way? In the pursuit of answering this question, a set of 10 Bayesian reasoning problems were presented. The results of their work revealed that children's ability to reason effectively using Bayesian principles is contingent upon a well-structured information representation. In this paper, we present the same set of 10 Bayesian reasoning problems to ChatGPT. Remarkably, the results demonstrate that ChatGPT provides the right solutions to all problems.
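For context, here is a hedged illustration (not taken from the paper) of the style of problem used in such studies: a base-rate task solved with Bayes' rule, stated both in probabilities and in the natural-frequency format that Zhu & Gigerenzer found easier for children.

```python
# Bayes' rule on a classic base-rate problem.
p_disease = 0.01            # base rate
p_pos_given_disease = 0.9   # sensitivity
p_pos_given_healthy = 0.05  # false-positive rate

p_pos = p_disease * p_pos_given_disease + (1 - p_disease) * p_pos_given_healthy
p_disease_given_pos = p_disease * p_pos_given_disease / p_pos
print(f"P(disease | positive) = {p_disease_given_pos:.3f}")  # ~0.154

# Natural-frequency framing (the well-structured representation the study
# emphasizes): of 1000 people, 10 are sick and 9 of them test positive;
# of the 990 healthy, about 49.5 also test positive.
print(9 / (9 + 49.5))  # same answer, ~0.154
```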

Distilled GPT for Source Code Summarization

  • paper_url: http://arxiv.org/abs/2308.14731
  • repo_url: https://github.com/apcl-research/jam-cgpt
  • paper_authors: Chia-Yi Su, Collin McMillan
  • for: Proposes an open-source model small enough to run on a single 16 GB GPU, so that code summaries can be generated automatically without sending code to untrusted third-party servers.
  • methods: Trains a small open-source model to mimic GPT-3.5 using sample outputs generated by GPT-3.5, in a process related to knowledge distillation.
  • results: Evaluation shows the 350M-parameter model mimics GPT-3.5 on code summarization while running on a single 16 GB GPU.
    Abstract A code summary is a brief natural language description of source code. Summaries are usually only a single sentence long, and yet form the backbone of developer documentation. A short descriptions such as "changes all visible polygons to the color blue" can give a programmer a high-level idea of what code does without the effort of reading the code itself. Recently, products based on Large Language Models such as ChatGPT have demonstrated a strong ability to write these descriptions automatically. However, to use these tools, programmers must send their code to untrusted third parties for processing (e.g., via an API call). This loss of custody is not acceptable to many organizations. In this paper, we present an alternative: we train an open source model using sample output generated by GPT-3.5 in a process related to knowledge distillation. Our model is small enough (350M parameters) to be run on a single 16 GB GPU, yet we show in our evaluation that it is large enough to mimic GPT-3.5 on this task.
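A minimal sketch of the distillation setup, with a stand-in student model, hypothetical dataset path, and illustrative hyperparameters; the prompt format is an assumption rather than the paper's.

```python
# Fine-tune a small causal LM on (code, summary) pairs where the
# summaries were produced by the larger teacher model.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in for the 350M student
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Assumed JSONL with fields "code" and "teacher_summary" (GPT-3.5 output).
ds = load_dataset("json", data_files="jam_cgpt_train.jsonl")["train"]

def fmt(ex):
    text = f"CODE:\n{ex['code']}\nSUMMARY: {ex['teacher_summary']}{tok.eos_token}"
    return tok(text, truncation=True, max_length=512)

ds = ds.map(fmt, remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments("distilled-summarizer",
                           per_device_train_batch_size=8, num_train_epochs=1),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```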

PanoSwin: a Pano-style Swin Transformer for Panorama Understanding

  • paper_url: http://arxiv.org/abs/2308.14726
  • repo_url: None
  • paper_authors: Zhixin Ling, Zhen Xing, Xiangdong Zhou, Manliang Cao, Guichun Zhou
  • for: Improving panorama understanding under equirectangular projection (ERP).
  • methods: PanoSwin uses a pano-style shift windowing scheme and a novel pitch attention to address ERP's boundary discontinuity and spatial distortion, adapts absolute positional embeddings and relative positional biases based on spherical distance and Cartesian coordinates to encode panoramic geometry, and employs a two-stage learning framework to transfer knowledge from planar images to panoramas.
  • results: Outperforms state-of-the-art methods on panoramic object detection, panoramic classification, and panoramic layout estimation.
    Abstract In panorama understanding, the widely used equirectangular projection (ERP) entails boundary discontinuity and spatial distortion. It severely deteriorates the conventional CNNs and vision Transformers on panoramas. In this paper, we propose a simple yet effective architecture named PanoSwin to learn panorama representations with ERP. To deal with the challenges brought by equirectangular projection, we explore a pano-style shift windowing scheme and novel pitch attention to address the boundary discontinuity and the spatial distortion, respectively. Besides, based on spherical distance and Cartesian coordinates, we adapt absolute positional embeddings and relative positional biases for panoramas to enhance panoramic geometry information. Realizing that planar image understanding might share some common knowledge with panorama understanding, we devise a novel two-stage learning framework to facilitate knowledge transfer from the planar images to panoramas. We conduct experiments against the state-of-the-art on various panoramic tasks, i.e., panoramic object detection, panoramic classification, and panoramic layout estimation. The experimental results demonstrate the effectiveness of PanoSwin in panorama understanding.
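The pano-style shift can be illustrated in a few lines: an ERP feature map is horizontally periodic, so shifting with wrap-around (rather than padding) keeps the left/right seam continuous before windows are taken. This is a hedged sketch of the intuition, not the PanoSwin implementation.

```python
import torch

x = torch.randn(1, 3, 64, 128)  # (batch, channels, H, W) ERP feature map
shift = 16

# A horizontal shift wraps around the 360-degree seam; zero padding would
# instead break continuity at the panorama boundary.
x_shifted = torch.roll(x, shifts=-shift, dims=3)

# Windows taken from x_shifted now straddle the original seam, letting
# window attention mix features across the panorama boundary.
windows = x_shifted.unfold(2, 32, 32).unfold(3, 32, 32)
print(windows.shape)  # torch.Size([1, 3, 2, 4, 32, 32])
```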

Hierarchical Time Series Forecasting with Bayesian Modeling

  • paper_url: http://arxiv.org/abs/2308.14719
  • repo_url: None
  • paper_authors: Gal Elgavish
  • for: Forecasting hierarchically structured time series when the hierarchy is too large to fit in a single model and a decentralized approach is needed for informed decisions under uncertainty.
  • methods: Trains independent forecasting models for each series and for summary series implied by the hierarchy (e.g., the sum of all series), then passes them to a reconciliation algorithm; proposes Bayesian forecast reconciliation and shows that the linear Gaussian case admits a closed-form solution.
  • results: Evaluations on synthetic and real datasets show that reconciliation improves forecast accuracy, with comparisons to other work in the field.
    Abstract We encounter time series data in many domains such as finance, physics, business, and weather. One of the main tasks of time series analysis, one that helps to take informed decisions under uncertainty, is forecasting. Time series are often hierarchically structured, e.g., a company sales might be broken down into different regions, and each region into different stores. In some cases the number of series in the hierarchy is too big to fit in a single model to produce forecasts in relevant time, and a decentralized approach is beneficial. One way to do this is to train independent forecasting models for each series and for some summary statistics series implied by the hierarchy (e.g. the sum of all series) and to pass those models to a reconciliation algorithm to improve those forecasts by sharing information between the series. In this work we focus on the reconciliation step, and propose a method to do so from a Bayesian perspective - Bayesian forecast reconciliation. We also define the common case of linear Gaussian reconciliation, where the forecasts are Gaussian and the hierarchy has linear structure, and show that we can compute reconciliation in closed form. We evaluate these methods on synthetic and real data sets, and compare them to other work in this field.
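The closed-form flavor of linear Gaussian reconciliation can be sketched with the standard projection of incoherent base forecasts onto the coherent subspace; this is a generic illustration of that projection, not necessarily the paper's exact estimator.

```python
# Given base forecasts y_hat for all series and a summing matrix S that
# maps bottom-level series to the full hierarchy, the reconciled
# forecasts are y_tilde = S (S' W^-1 S)^-1 S' W^-1 y_hat.
import numpy as np

# Hierarchy: total = A + B, so series order is [total, A, B].
S = np.array([[1, 1],
              [1, 0],
              [0, 1]], dtype=float)

y_hat = np.array([100.0, 45.0, 50.0])  # incoherent: 45 + 50 != 100
W = np.diag([4.0, 1.0, 1.0])           # forecast error (co)variances

Winv = np.linalg.inv(W)
G = np.linalg.inv(S.T @ Winv @ S) @ S.T @ Winv
y_tilde = S @ G @ y_hat
print(y_tilde)  # coherent: the first entry now equals the sum of the other two
```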

Fast Feedforward Networks

  • paper_url: http://arxiv.org/abs/2308.14711
  • repo_url: https://github.com/pbelcak/fastfeedforward
  • paper_authors: Peter Belcak, Roger Wattenhofer
  • for: Breaking the linear link between layer size and inference cost, and providing a fast alternative to feedforward layers.
  • methods: Introduces the fast feedforward (FFF) architecture, a logarithmic-time alternative to feedforward networks.
  • results: FFFs match feedforward networks at an exponential fraction of their inference cost, deliver performance faster than mixture-of-experts networks, and can replace either in transformers; pushed to the limit, a vision transformer performs single-neuron inferences at only a 5.8% performance decrease versus the full-width variant.
    Abstract We break the linear link between the layer size and its inference cost by introducing the fast feedforward (FFF) architecture, a logarithmic-time alternative to feedforward networks. We show that FFFs give comparable performance to feedforward networks at an exponential fraction of their inference cost, are quicker to deliver performance compared to mixture-of-expert networks, and can readily take the place of either in transformers. Pushing FFFs to the absolute limit, we train a vision transformer to perform single-neuron inferences at the cost of only 5.8% performance decrease against the full-width variant. Our implementation is available as a Python package; just use "pip install fastfeedforward".
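A hedged, simplified sketch of the FFF idea (the released fastfeedforward package is the reference implementation): a balanced binary tree of gating neurons routes each input along a log-depth path to one small leaf network, so inference touches only O(log n) of the layer.

```python
import torch
import torch.nn as nn

class FastFeedforward(nn.Module):
    def __init__(self, dim, depth, leaf_hidden):
        super().__init__()
        self.depth = depth
        # Internal gating nodes in heap order: node i has children 2i+1, 2i+2.
        self.gates = nn.ModuleList(nn.Linear(dim, 1) for _ in range(2 ** depth - 1))
        self.leaves = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, leaf_hidden), nn.ReLU(),
                          nn.Linear(leaf_hidden, dim))
            for _ in range(2 ** depth)
        )

    def forward(self, x):  # x: (dim,) -- single sample for clarity
        node = 0
        for _ in range(self.depth):
            # Hard routing, as at inference; training would use soft gating.
            go_right = torch.sigmoid(self.gates[node](x)) > 0.5
            node = 2 * node + 1 + int(go_right)
        leaf = node - (2 ** self.depth - 1)
        return self.leaves[leaf](x)

fff = FastFeedforward(dim=32, depth=3, leaf_hidden=64)
y = fff(torch.randn(32))  # evaluates 3 gates + 1 of 8 leaves, not all 8
print(y.shape)
```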

VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation

  • paper_url: http://arxiv.org/abs/2308.14710
  • repo_url: https://github.com/facebookresearch/cutler
  • paper_authors: Xudong Wang, Ishan Misra, Ziyun Zeng, Rohit Girdhar, Trevor Darrell
  • For: Unsupervised video instance segmentation, the challenging task of identifying and tracking multiple objects in a video sequence without any labeled data.
  • Methods: VideoCutLER, a simple method that trains a video model on high-quality pseudo masks together with a simple video synthesis method, without motion-based learning signals such as optical flow or training on natural videos, enabling effective segmentation and tracking of multiple instances across frames.
  • Results: Achieves 50.7% APvideo^50 on the challenging YouTubeVIS-2019 benchmark, surpassing the previous state of the art by a large margin, and as a pretrained model for supervised video instance segmentation exceeds DINO by 15.9% APvideo on YouTubeVIS-2019.
    Abstract Existing approaches to unsupervised video instance segmentation typically rely on motion estimates and experience difficulties tracking small or divergent motions. We present VideoCutLER, a simple method for unsupervised multi-instance video segmentation without using motion-based learning signals like optical flow or training on natural videos. Our key insight is that using high-quality pseudo masks and a simple video synthesis method for model training is surprisingly sufficient to enable the resulting video model to effectively segment and track multiple instances across video frames. We show the first competitive unsupervised learning results on the challenging YouTubeVIS-2019 benchmark, achieving 50.7% APvideo^50 , surpassing the previous state-of-the-art by a large margin. VideoCutLER can also serve as a strong pretrained model for supervised video instance segmentation tasks, exceeding DINO by 15.9% on YouTubeVIS-2019 in terms of APvideo.
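The synthesis idea can be sketched in a toy form: pasting a pseudo-masked object onto a background with simple per-frame motion yields a clip whose instance masks are known by construction. This is an illustrative toy, not the authors' pipeline.

```python
import numpy as np

H, W, T = 128, 128, 8
background = np.zeros((H, W, 3), dtype=np.uint8) + 40
obj = np.full((24, 24, 3), 200, dtype=np.uint8)  # cropped pseudo-masked object
mask = np.ones((24, 24), dtype=bool)              # its pseudo mask

frames, masks = [], []
for t in range(T):
    frame = background.copy()
    m = np.zeros((H, W), dtype=bool)
    y, x = 20 + 4 * t, 30 + 6 * t                 # simple linear motion
    frame[y:y + 24, x:x + 24][mask] = obj[mask]   # paste the object
    m[y:y + 24, x:x + 24] = mask                  # free per-frame supervision
    frames.append(frame)
    masks.append(m)

video = np.stack(frames)  # (T, H, W, 3) synthetic training clip
print(video.shape, np.stack(masks).sum(axis=(1, 2)))  # per-frame mask areas
```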

TRIVEA: Transparent Ranking Interpretation using Visual Explanation of Black-Box Algorithmic Rankers

  • paper_url: http://arxiv.org/abs/2308.14622
  • repo_url: None
  • paper_authors: Jun Yuan, Kaustav Bhattacharjee, Akm Zahirul Islam, Aritra Dasgupta
  • for: Making opaque ranking schemes transparent so that rankees and decision-makers across domains can reason about why rankings differ.
  • methods: Uses algorithmic rankers that learn attribute-ranking associations from available data, paired with explainable AI (XAI) methods that produce visual explanations of model fit and attribute influence on rankings.
  • results: With the TRIVEA visual analytic system, end users not trained in data science can explore and reason about complex multi-attribute ranking data without opening black-box ranking models; efficacy is demonstrated through multiple usage scenarios and feedback from researchers with diverse domain expertise.
    Abstract Ranking schemes drive many real-world decisions, like, where to study, whom to hire, what to buy, etc. Many of these decisions often come with high consequences. For example, a university can be deemed less prestigious if not featured in a top-k list, and consumers might not even explore products that do not get recommended to buyers. At the heart of most of these decisions are opaque ranking schemes, which dictate the ordering of data entities, but their internal logic is inaccessible or proprietary. Drawing inferences about the ranking differences is like a guessing game to the stakeholders, like, the rankees (i.e., the entities who are ranked, like product companies) and the decision-makers (i.e., who use the rankings, like buyers). In this paper, we aim to enable transparency in ranking interpretation by using algorithmic rankers that learn from available data and by enabling human reasoning about the learned ranking differences using explainable AI (XAI) methods. To realize this aim, we leverage the exploration-explanation paradigm of human-data interaction to let human stakeholders explore subsets and groupings of complex multi-attribute ranking data using visual explanations of model fit and attribute influence on rankings. We realize this explanation paradigm for transparent ranking interpretation in TRIVEA, a visual analytic system that is fueled by: i) visualizations of model fit derived from algorithmic rankers that learn the associations between attributes and rankings from available data and ii) visual explanations derived from XAI methods that help abstract important patterns, like, the relative influence of attributes in different ranking ranges. Using TRIVEA, end users not trained in data science have the agency to transparently reason about the global and local behavior of the rankings without the need to open black-box ranking models and develop confidence in the resulting attribute-based inferences. We demonstrate the efficacy of TRIVEA using multiple usage scenarios and subjective feedback from researchers with diverse domain expertise. Keywords: Visual Analytics, Learning-to-Rank, Explainable ML, Ranking
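As a generic illustration of the underlying idea (not the TRIVEA system itself), one can fit a ranker on attribute data and estimate each attribute's influence on the learned scores with permutation importance:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # attributes of 200 rankees (synthetic)
scores = 2 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

ranker = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, scores)
imp = permutation_importance(ranker, X, scores, n_repeats=10, random_state=0)
for name, val in zip(["attr_0", "attr_1", "attr_2"], imp.importances_mean):
    print(f"{name}: {val:.3f}")  # attr_0 should dominate
```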

Assessing Trust in Construction AI-Powered Collaborative Robots using Structural Equation Modeling

  • paper_url: http://arxiv.org/abs/2308.14697
  • repo_url: None
  • paper_authors: Newsha Emaminejad, Lisa Kath, Reza Akhavian
  • for: This study aimed to investigate the key technical and psychological factors that impact the architecture, engineering, and construction (AEC) professionals’ trust in collaborative robots (cobots) powered by artificial intelligence (AI).
  • methods: The study employed a nationwide survey of 600 AEC industry practitioners to gather in-depth responses and valuable insights into the future opportunities for promoting the adoption, cultivation, and training of a skilled workforce to leverage this technology effectively. A Structural Equation Modeling (SEM) analysis was used to reveal the significant factors for the adoption of AI-powered cobots in construction.
  • results: The study found that safety and reliability are significant factors for the adoption of AI-powered cobots in construction, and fear of being replaced resulting from the use of cobots can have a substantial effect on the mental health of the affected workers. Additionally, the study found that a lower error rate in jobs involving cobots, safety measurements, and security of data collected by cobots from jobsites significantly impact reliability, while the transparency of cobots’ inner workings can benefit accuracy, robustness, security, privacy, and communication, and results in higher levels of automation. The study’s findings provide critical insights into the perceptions and experiences of AEC professionals towards adoption of cobots in construction and help project teams determine the adoption approach that aligns with the company’s goals and workers’ welfare.
    Abstract This study aimed to investigate the key technical and psychological factors that impact the architecture, engineering, and construction (AEC) professionals' trust in collaborative robots (cobots) powered by artificial intelligence (AI). The study employed a nationwide survey of 600 AEC industry practitioners to gather in-depth responses and valuable insights into the future opportunities for promoting the adoption, cultivation, and training of a skilled workforce to leverage this technology effectively. A Structural Equation Modeling (SEM) analysis revealed that safety and reliability are significant factors for the adoption of AI-powered cobots in construction. Fear of being replaced resulting from the use of cobots can have a substantial effect on the mental health of the affected workers. A lower error rate in jobs involving cobots, safety measurements, and security of data collected by cobots from jobsites significantly impact reliability, while the transparency of cobots' inner workings can benefit accuracy, robustness, security, privacy, and communication, and results in higher levels of automation, all of which were demonstrated to be contributors to trust. The study's findings provide critical insights into the perceptions and experiences of AEC professionals towards adoption of cobots in construction and help project teams determine the adoption approach that aligns with the company's goals and workers' welfare.
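For readers unfamiliar with SEM, here is a hedged sketch of how such a model might be specified in Python with the semopy package; the variable names and survey file are hypothetical, not the study's actual instrument.

```python
import pandas as pd
from semopy import Model

# Measurement model (latent =~ indicators) and structural model (~).
desc = """
reliability =~ error_rate + safety_measures + data_security
trust =~ trust_item1 + trust_item2 + trust_item3
trust ~ reliability + transparency + fear_of_replacement
"""

data = pd.read_csv("aec_survey.csv")  # hypothetical 600-respondent survey
model = Model(desc)
model.fit(data)
print(model.inspect())  # path coefficients, loadings, p-values
```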

MELT: Mining Effective Lightweight Transformations from Pull Requests

  • paper_url: http://arxiv.org/abs/2308.14687
  • repo_url: https://github.com/squareslab/melt
  • paper_authors: Daniel Ramos, Hailie Mitchell, Inês Lynce, Vasco Manquinho, Ruben Martins, Claire Le Goues
  • for: Making API updates more efficient for developers, replacing manual, time-consuming, and error-prone migration work.
  • methods: MELT mines lightweight API migration rules, expressed in Comby, directly from pull requests in popular library repositories, using code examples from the library source and automatically generated examples; a generalization procedure makes the rules more applicable to client projects.
  • results: Mined 461 migration rules from code examples in pull requests across four popular libraries, plus 114 rules from auto-generated examples; the generalization procedure increases rule matches by 9x, and applying the rules to client projects reduced warnings and fixed some test cases.
    Abstract Software developers often struggle to update APIs, leading to manual, time-consuming, and error-prone processes. We introduce MELT, a new approach that generates lightweight API migration rules directly from pull requests in popular library repositories. Our key insight is that pull requests merged into open-source libraries are a rich source of information sufficient to mine API migration rules. By leveraging code examples mined from the library source and automatically generated code examples based on the pull requests, we infer transformation rules in Comby, a language for structural code search and replace. Since inferred rules from single code examples may be too specific, we propose a generalization procedure to make the rules more applicable to client projects. MELT rules are syntax-driven, interpretable, and easily adaptable. Moreover, unlike previous work, our approach enables rule inference to seamlessly integrate into the library workflow, removing the need to wait for client code migrations. We evaluated MELT on pull requests from four popular libraries, successfully mining 461 migration rules from code examples in pull requests and 114 rules from auto-generated code examples. Our generalization procedure increases the number of matches for mined rules by 9x. We applied these rules to client projects and ran their tests, which led to an overall decrease in the number of warnings and fixing some test cases demonstrating MELT's effectiveness in real-world scenarios.
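For a sense of what a mined rule looks like, here is a hedged illustration in Comby's template syntax, using a real pandas API migration as the example; it is not one of MELT's 461 mined rules, and the CLI invocation shown in the comment is approximate.

```python
# Comby templates use :[name] holes that match arbitrary syntax, so a
# single rule generalizes over whatever argument the client code passes.
match_template = "pd.read_csv(:[path], error_bad_lines=False)"
rewrite_template = "pd.read_csv(:[path], on_bad_lines='skip')"

# Applying it with the comby CLI would look roughly like:
#   comby 'pd.read_csv(:[path], error_bad_lines=False)' \
#         "pd.read_csv(:[path], on_bad_lines='skip')" .py
print(match_template, "->", rewrite_template)
```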

Fine-Tuning Llama 2 Large Language Models for Detecting Online Sexual Predatory Chats and Abusive Texts

  • paper_url: http://arxiv.org/abs/2308.14683
  • repo_url: None
  • paper_authors: Thanh Thi Nguyen, Campbell Wilson, Janis Dalins
  • for: Detecting online sexual predatory chats and abusive language on social media to maintain online safety, especially for vulnerable populations such as children and adolescents.
  • methods: Fine-tunes Meta GenAI's recently released open-source pretrained Llama 2 7B-parameter model on datasets with different sizes, imbalance degrees, and languages (English, Roman Urdu, and Urdu).
  • results: The approach performs proficiently and consistently across three distinct datasets in five sets of experiments, and can be applied to real-world content moderation as well as other text classification problems such as sentiment analysis, spam and phishing detection, legal document sorting, fake news detection, language identification, user intent recognition, text-based product categorization, medical record analysis, and resume screening.
    Abstract Detecting online sexual predatory behaviours and abusive language on social media platforms has become a critical area of research due to the growing concerns about online safety, especially for vulnerable populations such as children and adolescents. Researchers have been exploring various techniques and approaches to develop effective detection systems that can identify and mitigate these risks. Recent development of large language models (LLMs) has opened a new opportunity to address this problem more effectively. This paper proposes an approach to detection of online sexual predatory chats and abusive language using the open-source pretrained Llama 2 7B-parameter model, recently released by Meta GenAI. We fine-tune the LLM using datasets with different sizes, imbalance degrees, and languages (i.e., English, Roman Urdu and Urdu). Based on the power of LLMs, our approach is generic and automated without a manual search for a synergy between feature extraction and classifier design steps like conventional methods in this domain. Experimental results show a strong performance of the proposed approach, which performs proficiently and consistently across three distinct datasets with five sets of experiments. This study's outcomes indicate that the proposed method can be implemented in real-world applications (even with non-English languages) for flagging sexual predators, offensive or toxic content, hate speech, and discriminatory language in online discussions and comments to maintain respectful internet or digital communities. Furthermore, it can be employed for solving text classification problems with other potential applications such as sentiment analysis, spam and phishing detection, sorting legal documents, fake news detection, language identification, user intent recognition, text-based product categorization, medical record analysis, and resume screening.
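A minimal sketch of such a fine-tuning setup using Hugging Face transformers with LoRA via peft; the dataset path, label scheme, and hyperparameters are assumptions, and access to the gated Llama 2 weights is required.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"  # gated; requires an accepted license
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token

model = AutoModelForSequenceClassification.from_pretrained(
    base, num_labels=2, torch_dtype=torch.float16)  # abusive vs. benign
model.config.pad_token_id = tok.pad_token_id
model = get_peft_model(model, LoraConfig(task_type="SEQ_CLS", r=8,
                                         lora_alpha=16, lora_dropout=0.05))

# Assumed CSV with "text" and "label" columns.
ds = load_dataset("csv", data_files="chats_labeled.csv")["train"]
ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=512))

trainer = Trainer(model=model,
                  args=TrainingArguments("llama2-abuse-detector",
                                         per_device_train_batch_size=4,
                                         num_train_epochs=1, fp16=True),
                  train_dataset=ds, tokenizer=tok)
trainer.train()
```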