cs.LG - 2023-10-11

Cost-Driven Hardware-Software Co-Optimization of Machine Learning Pipelines

  • paper_url: http://arxiv.org/abs/2310.07940
  • repo_url: None
  • paper_authors: Ravit Sharma, Wojciech Romaszkan, Feiqian Zhu, Puneet Gupta, Ankur Mehta
  • for: This paper aims to enable widely-applicable smart devices by overcoming the storage and processing requirements of deep neural networks.
  • methods: The paper explores the interactions between quantization, model scaling, and multi-modality with system components such as memory, sensors, and processors, and develops guidelines for optimal system design and model deployment for cost-constrained platforms.
  • results: The paper demonstrates an end-to-end, on-device, biometric user authentication system using a $20 ESP-EYE board.
    Abstract Researchers have long touted a vision of the future enabled by a proliferation of internet-of-things devices, including smart sensors, homes, and cities. Increasingly, embedding intelligence in such devices involves the use of deep neural networks. However, their storage and processing requirements make them prohibitive for cheap, off-the-shelf platforms. Overcoming those requirements is necessary for enabling widely-applicable smart devices. While many ways of making models smaller and more efficient have been developed, there is a lack of understanding of which ones are best suited for particular scenarios. More importantly for edge platforms, those choices cannot be analyzed in isolation from cost and user experience. In this work, we holistically explore how quantization, model scaling, and multi-modality interact with system components such as memory, sensors, and processors. We perform this hardware/software co-design from the cost, latency, and user-experience perspective, and develop a set of guidelines for optimal system design and model deployment for the most cost-constrained platforms. We demonstrate our approach using an end-to-end, on-device, biometric user authentication system using a $20 ESP-EYE board.

Enhanced sampling of Crystal Nucleation with Graph Representation Learnt Variables

  • paper_url: http://arxiv.org/abs/2310.07927
  • repo_url: None
  • paper_authors: Ziyue Zou, Pratyush Tiwary
  • for: This paper presents a graph neural network-based learning approach for deriving low-dimensional variables from features observed in experimental crystal structures.
  • methods: The approach uses simple convolution and pooling operations within an autoencoder setup.
  • results: The learned variables, when biased in well-tempered metadynamics, yield reliable sampling and free-energy calculations for various allotropes and polymorphs of iron and glycine, in agreement with experiments.
    Abstract In this study, we present a graph neural network-based learning approach using an autoencoder setup to derive low-dimensional variables from features observed in experimental crystal structures. These variables are then biased in enhanced sampling to observe state-to-state transitions and reliable thermodynamic weights. Our approach uses simple convolution and pooling methods. To verify the effectiveness of our protocol, we examined the nucleation of various allotropes and polymorphs of iron and glycine from their molten states. Our graph latent variables when biased in well-tempered metadynamics consistently show transitions between states and achieve accurate free energy calculations in agreement with experiments, both of which are indicators of dependable sampling. This underscores the strength and promise of our graph neural net variables for improved sampling. The protocol shown here should be applicable for other systems and with other sampling methods.
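To make the pipeline above concrete, here is a minimal graph-autoencoder sketch in PyTorch: one graph convolution, mean pooling to a low-dimensional latent variable, and an MLP decoder. The architecture, dimensions, and reconstruction target are illustrative assumptions, not the authors' implementation.

```python
# Minimal graph-autoencoder sketch (PyTorch): one graph convolution, mean pooling
# to a low-dimensional graph-level latent variable, and an MLP decoder.
# All architecture details are assumed for illustration, not taken from the paper.
import torch
import torch.nn as nn

class GraphAutoencoder(nn.Module):
    def __init__(self, in_dim, hidden_dim=32, latent_dim=2):
        super().__init__()
        self.conv = nn.Linear(in_dim, hidden_dim)       # shared node transform
        self.to_latent = nn.Linear(hidden_dim, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, in_dim))

    def forward(self, x, adj):
        # Simple graph convolution: average neighbour features, then transform.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        h = torch.tanh(self.conv(adj @ x / deg))
        z = self.to_latent(h.mean(dim=0))               # graph-level latent variable
        x_hat = self.decoder(z).expand_as(x)            # crude per-node reconstruction
        return z, x_hat

# toy usage: 10 atoms with 4 structural features each, random symmetric adjacency
x = torch.randn(10, 4)
adj = (torch.rand(10, 10) > 0.7).float()
adj = ((adj + adj.T) > 0).float()
model = GraphAutoencoder(in_dim=4)
z, x_hat = model(x, adj)
loss = nn.functional.mse_loss(x_hat, x)
loss.backward()
print(z.detach(), loss.item())
```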

First-Order Dynamic Optimization for Streaming Convex Costs

  • paper_url: http://arxiv.org/abs/2310.07925
  • repo_url: None
  • paper_authors: M. Rostami, H. Moradian, S. S. Kia
  • for: The paper proposes a set of novel optimization algorithms for solving a class of convex optimization problems with time-varying streaming cost functions.
  • methods: The authors develop an approach that tracks the optimal solution with a bounded error using only the first-order derivatives of the cost function, making the algorithm computationally efficient.
  • results: The algorithms are compared with gradient descent, showing why gradient descent is not an effective solution for optimization with time-varying costs; several examples, including a model predictive control problem, demonstrate the results.
    Abstract This paper proposes a set of novel optimization algorithms for solving a class of convex optimization problems with time-varying streaming cost function. We develop an approach to track the optimal solution with a bounded error. Unlike the existing results, our algorithm is executed only by using the first-order derivatives of the cost function which makes it computationally efficient for optimization with time-varying cost function. We compare our algorithms to the gradient descent algorithm and show why gradient descent is not an effective solution for optimization problems with time-varying cost. Several examples including solving a model predictive control problem cast as a convex optimization problem with a streaming time-varying cost function demonstrate our results.
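As a toy illustration of the streaming-cost setting, the sketch below tracks the minimizer of a time-varying quadratic using only first-order information, comparing plain gradient descent with a simple prediction-correction update. The update rule is a generic stand-in, not the algorithm proposed in the paper.

```python
# Toy illustration (NumPy): track the minimizer of the time-varying quadratic
# f_t(x) = 0.5 * (x - c(t))**2 using only first-order information.
import numpy as np

def grad(x, t):
    c = np.sin(0.1 * t)              # drifting optimum c(t)
    return x - c

T, step = 200, 0.5
x_gd, x_track = 0.0, 0.0
err_gd, err_track = [], []
for t in range(1, T):
    c_t = np.sin(0.1 * t)
    # Plain gradient descent: only reacts after the optimum has already moved.
    x_gd = x_gd - step * grad(x_gd, t)
    # First-order tracking: gradient correction plus a drift-prediction term
    # estimated from the change of the gradient at a fixed point (finite difference).
    drift = grad(x_track, t) - grad(x_track, t - 1)   # equals -(c(t) - c(t-1))
    x_track = x_track - step * grad(x_track, t) - drift
    err_gd.append(abs(x_gd - c_t))
    err_track.append(abs(x_track - c_t))

print(f"mean tracking error, gradient descent:    {np.mean(err_gd):.4f}")
print(f"mean tracking error, first-order tracking: {np.mean(err_track):.4f}")
```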

Unraveling the Single Tangent Space Fallacy: An Analysis and Clarification for Applying Riemannian Geometry in Robot Learning

  • paper_url: http://arxiv.org/abs/2310.07902
  • repo_url: None
  • paper_authors: Noémie Jaquier, Leonel Rozo, Tamim Asfour
  • for: This work examines how the geometric constraints inherent in robotics data can be handled effectively when machine learning methods are used to process, model, or synthesize that data.
  • methods: The paper draws on tools from differential geometry, in particular Riemannian manifolds, to analyze how such geometric constraints should be incorporated into machine learning formulations.
  • results: The study identifies the "single tangent space fallacy", in which data are merely projected onto a single tangent (Euclidean) space before applying an off-the-shelf learning algorithm, and provides theoretical arguments and experimental evidence that this shortcut degrades the accuracy and reliability of the resulting models.
    Abstract In the realm of robotics, numerous downstream robotics tasks leverage machine learning methods for processing, modeling, or synthesizing data. Often, this data comprises variables that inherently carry geometric constraints, such as the unit-norm condition of quaternions representing rigid-body orientations or the positive definiteness of stiffness and manipulability ellipsoids. Handling such geometric constraints effectively requires the incorporation of tools from differential geometry into the formulation of machine learning methods. In this context, Riemannian manifolds emerge as a powerful mathematical framework to handle such geometric constraints. Nevertheless, their recent adoption in robot learning has been largely characterized by a mathematically-flawed simplification, hereinafter referred to as the ``single tangent space fallacy". This approach involves merely projecting the data of interest onto a single tangent (Euclidean) space, over which an off-the-shelf learning algorithm is applied. This paper provides a theoretical elucidation of various misconceptions surrounding this approach and offers experimental evidence of its shortcomings. Finally, it presents valuable insights to promote best practices when employing Riemannian geometry within robot learning applications.
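The following sketch uses the standard logarithmic map on the unit sphere (e.g., unit quaternions on $S^3$) to show how pairwise distances computed after projecting all data onto a single tangent space differ from true geodesic distances, which is the distortion the fallacy introduces. The formulas are the textbook sphere maps, not code from the paper.

```python
# Standard logarithmic map on the unit sphere S^{d-1} (NumPy), e.g. for unit
# quaternions on S^3. Distances computed in one shared tangent space distort the
# true geodesic distances between points, illustrating the fallacy discussed above.
import numpy as np

def log_map(p, q):
    """Tangent vector at p pointing towards q (both unit vectors)."""
    cos_t = np.clip(p @ q, -1.0, 1.0)
    theta = np.arccos(cos_t)
    if theta < 1e-12:
        return np.zeros_like(p)
    return theta * (q - cos_t * p) / np.sin(theta)

def geodesic_dist(p, q):
    return np.arccos(np.clip(p @ q, -1.0, 1.0))

rng = np.random.default_rng(0)
pts = rng.normal(size=(5, 4))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)      # points on S^3

base = pts[0]                                          # single tangent space at pts[0]
flat = np.stack([log_map(base, q) for q in pts])

# Compare pairwise distances: Euclidean in the single tangent space vs geodesic.
for i in range(1, 5):
    for j in range(i + 1, 5):
        d_flat = np.linalg.norm(flat[i] - flat[j])
        d_geo = geodesic_dist(pts[i], pts[j])
        print(f"pair ({i},{j}): tangent-space {d_flat:.3f}   geodesic {d_geo:.3f}")
```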

Precise localization within the GI tract by combining classification of CNNs and time-series analysis of HMMs

  • paper_url: http://arxiv.org/abs/2310.07895
  • repo_url: None
  • paper_authors: Julia Werner, Christoph Gerum, Moritz Reiber, Jörg Nick, Oliver Bringmann
  • for: This paper is designed to efficiently classify the gastroenterologic section of images derived from Video Capsule Endoscopy (VCE) studies.
  • methods: The paper combines a Convolutional Neural Network (CNN) for classification with the time-series analysis properties of a Hidden Markov Model (HMM), which identifies and corrects errors in the CNN output.
  • results: The approach achieves an accuracy of $98.04\%$ on the Rhode Island (RI) Gastroenterology dataset, allowing precise localization within the gastrointestinal (GI) tract while requiring only approximately 1M parameters, which makes it suitable for low-power devices.
    Abstract This paper presents a method to efficiently classify the gastroenterologic section of images derived from Video Capsule Endoscopy (VCE) studies by exploring the combination of a Convolutional Neural Network (CNN) for classification with the time-series analysis properties of a Hidden Markov Model (HMM). It is demonstrated that successive time-series analysis identifies and corrects errors in the CNN output. Our approach achieves an accuracy of $98.04\%$ on the Rhode Island (RI) Gastroenterology dataset. This allows for precise localization within the gastrointestinal (GI) tract while requiring only approximately 1M parameters and thus provides a method suitable for low power devices.
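A minimal sketch of the CNN-plus-HMM idea: frame-wise CNN class probabilities are smoothed with Viterbi decoding under a transition matrix that favours staying in the same GI section. The transition and initial probabilities below are illustrative assumptions, not the paper's trained HMM.

```python
# Sketch (NumPy): correct frame-wise CNN class probabilities with an HMM whose
# transitions allow only forward movement along the GI tract. The transition and
# initial probabilities are illustrative assumptions.
import numpy as np

def viterbi(log_emit, log_trans, log_init):
    T, S = log_emit.shape
    dp = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    dp[0] = log_init + log_emit[0]
    for t in range(1, T):
        scores = dp[t - 1][:, None] + log_trans        # (from_state, to_state)
        back[t] = scores.argmax(axis=0)
        dp[t] = scores.max(axis=0) + log_emit[t]
    path = [int(dp[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

states = ["stomach", "small_bowel", "colon"]
trans = np.array([[0.98, 0.02, 0.00],                  # mostly self-transitions,
                  [0.00, 0.98, 0.02],                  # only forward moves allowed
                  [0.00, 0.00, 1.00]])
init = np.array([1.0, 0.0, 0.0])

# Fake per-frame CNN softmax outputs with one noisy misclassification at t=2.
cnn_probs = np.array([[0.9, 0.1, 0.0],
                      [0.8, 0.2, 0.0],
                      [0.2, 0.2, 0.6],                 # CNN jumps to colon too early
                      [0.1, 0.8, 0.1],
                      [0.1, 0.7, 0.2]])
eps = 1e-9
path = viterbi(np.log(cnn_probs + eps), np.log(trans + eps), np.log(init + eps))
print([states[s] for s in path])                       # the early "colon" frame is corrected
```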

ASV Station Keeping under Wind Disturbances using Neural Network Simulation Error Minimization Model Predictive Control

  • paper_url: http://arxiv.org/abs/2310.07892
  • repo_url: None
  • paper_authors: Jalil Chavez-Galaviz, Jianwen Li, Ajinkya Chaudhary, Nina Mahmoudian
  • for: This paper addresses station keeping for Autonomous Surface Vehicles (ASVs) in confined spaces, where surveys or collaboration with other vehicles require the ASV to hold its position.
  • methods: The paper proposes a Model Predictive Controller based on Neural Network Simulation Error Minimization (NNSEM-MPC) to accurately predict the dynamics of the ASV under wind disturbances.
  • results: Under wind disturbances, the proposed NNSEM-MPC is compared in simulation against a backstepping controller, a sliding mode controller, simplified dynamics MPC (SD-MPC), neural ordinary differential equation MPC (NODE-MPC), and knowledge-based NODE MPC (KNODE-MPC), and shows clear advantages across the six test conditions.
    Abstract Station keeping is an essential maneuver for Autonomous Surface Vehicles (ASVs), mainly when used in confined spaces, to carry out surveys that require the ASV to keep its position or in collaboration with other vehicles where the relative position has an impact over the mission. However, this maneuver can become challenging for classic feedback controllers due to the need for an accurate model of the ASV dynamics and the environmental disturbances. This work proposes a Model Predictive Controller using Neural Network Simulation Error Minimization (NNSEM-MPC) to accurately predict the dynamics of the ASV under wind disturbances. The performance of the proposed scheme under wind disturbances is tested and compared against other controllers in simulation, using the Robotics Operating System (ROS) and the multipurpose simulation environment Gazebo. A set of six tests were conducted by combining two wind speeds (3 m/s and 6 m/s) and three wind directions (0$^\circ$, 90$^\circ$, and 180$^\circ$). The simulation results clearly show the advantage of the NNSEM-MPC over the following methods: backstepping controller, sliding mode controller, simplified dynamics MPC (SD-MPC), neural ordinary differential equation MPC (NODE-MPC), and knowledge-based NODE MPC (KNODE-MPC). The proposed NNSEM-MPC approach performs better than the rest in 4 out of the 6 test conditions, and it is the second best in the 2 remaining test cases, reducing the mean position and heading error by at least 31\% and 46\% respectively across all the test cases. In terms of execution speed, the proposed NNSEM-MPC is at least 36\% faster than the rest of the MPC controllers. The field experiments on two different ASV platforms showed that ASVs can effectively keep the station utilizing the proposed method, with a position error as low as $1.68$ m and a heading error as low as $6.14^{\circ}$ within time windows of at least $150$s.

A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks

  • paper_url: http://arxiv.org/abs/2310.07891
  • repo_url: None
  • paper_authors: Behrad Moniri, Donghwan Lee, Hamed Hassani, Edgar Dobriban
  • for: This paper seeks to understand the conditions under which feature learning occurs in two-layer neural networks, and how training can introduce multiple rank-one components that capture non-linear features.
  • methods: The paper analyzes two-layer fully-connected neural networks trained with one gradient descent step on the first layer, using a learning rate that grows with the sample size, followed by ridge regression on the second layer.
  • results: With a growing learning rate, the training step introduces multiple rank-one components (spikes), each corresponding to a specific polynomial feature, and the limiting large-dimensional, large-sample training and test errors of the updated networks are fully characterized by these spikes; a precise analysis of the loss shows that these non-linear features enhance learning.
    Abstract Feature learning is thought to be one of the fundamental reasons for the success of deep neural networks. It is rigorously known that in two-layer fully-connected neural networks under certain conditions, one step of gradient descent on the first layer followed by ridge regression on the second layer can lead to feature learning; characterized by the appearance of a separated rank-one component -- spike -- in the spectrum of the feature matrix. However, with a constant gradient descent step size, this spike only carries information from the linear component of the target function and therefore learning non-linear components is impossible. We show that with a learning rate that grows with the sample size, such training in fact introduces multiple rank-one components, each corresponding to a specific polynomial feature. We further prove that the limiting large-dimensional and large sample training and test errors of the updated neural networks are fully characterized by these spikes. By precisely analyzing the improvement in the loss, we demonstrate that these non-linear features can enhance learning.
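A small NumPy simulation in the spirit of this setting: take one gradient step on the first-layer weights of a two-layer tanh network and compare the singular values of the updated weight matrix with the bulk edge of the initial matrix; with a large step size an outlier (spike) separates from the bulk. The data model and scalings are simplified assumptions, not the paper's exact regime.

```python
# Simplified spike experiment (NumPy): one gradient step on the first layer of a
# two-layer tanh network, then inspect the singular values of the updated weights.
import numpy as np

rng = np.random.default_rng(0)
n, d, width = 4000, 200, 300
X = rng.normal(size=(n, d))                        # inputs
y = np.tanh(X[:, 0]) + 0.5 * X[:, 1] ** 2          # target with a non-linear part

W = rng.normal(size=(width, d)) / np.sqrt(d)       # first layer (to be updated)
a = rng.normal(size=width) / np.sqrt(width)        # second layer (frozen for this step)

pre = X @ W.T                                      # (n, width) preactivations
resid = np.tanh(pre) @ a - y                       # residual of f(x) = a^T tanh(W x)
grad_W = ((resid[:, None] * (1 - np.tanh(pre) ** 2)) * a).T @ X / n

bulk_edge = np.linalg.svd(W, compute_uv=False)[0]
for lr in [1.0, 50.0]:                             # constant vs large learning rate
    s = np.linalg.svd(W - lr * grad_W, compute_uv=False)
    print(f"lr={lr:5.1f}  top singular values: {np.round(s[:4], 2)}"
          f"   (bulk edge of the initial W: {bulk_edge:.2f})")
```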

Refined Mechanism Design for Approximately Structured Priors via Active Regression

  • paper_url: http://arxiv.org/abs/2310.07874
  • repo_url: None
  • paper_authors: Christos Boutsikas, Petros Drineas, Marios Mertzanidis, Alexandros Psomas, Paritosh Verma
  • for: The paper addresses revenue maximization for a seller with a large number of items and strategic bidders whose valuations are drawn independently from high-dimensional, unknown prior distributions.
  • methods: Following a topic-model approximation of the bidders' priors, the paper combines an active learning component, which interacts with the bidders and outputs low-dimensional approximations of their types, with a mechanism design component, which robustifies mechanisms for the low-dimensional model so that they work for the approximate types produced by the former.
  • results: The approach removes many restrictive assumptions of prior work about the type of access needed to the underlying distributions and mechanisms, and it formulates connections between mechanism design and Randomized Linear Algebra (RLA) for active regression, importing several breakthrough results from that line of research into mechanism design.
    Abstract We consider the problem of a revenue-maximizing seller with a large number of items $m$ for sale to $n$ strategic bidders, whose valuations are drawn independently from high-dimensional, unknown prior distributions. It is well-known that optimal and even approximately-optimal mechanisms for this setting are notoriously difficult to characterize or compute, and, even when they can be found, are often rife with various counter-intuitive properties. In this paper, following a model introduced recently by Cai and Daskalakis~\cite{cai2022recommender}, we consider the case that bidders' prior distributions can be well-approximated by a topic model. We design an active learning component, responsible for interacting with the bidders and outputting low-dimensional approximations of their types, and a mechanism design component, responsible for robustifying mechanisms for the low-dimensional model to work for the approximate types of the former component. On the active learning front, we cast our problem in the framework of Randomized Linear Algebra (RLA) for regression problems, allowing us to import several breakthrough results from that line of research, and adapt them to our setting. On the mechanism design front, we remove many restrictive assumptions of prior work on the type of access needed to the underlying distributions and the associated mechanisms. To the best of our knowledge, our work is the first to formulate connections between mechanism design, and RLA for active learning of regression problems, opening the door for further applications of randomized linear algebra primitives to mechanism design.

QArchSearch: A Scalable Quantum Architecture Search Package

  • paper_url: http://arxiv.org/abs/2310.07858
  • repo_url: None
  • paper_authors: Ankit Kulshrestha, Danylo Lykov, Ilya Safro, Yuri Alexeev
  • for: This paper presents an AI-driven quantum architecture search package that automatically finds the best quantum model for a given task and input quantum state.
  • methods: The package uses the \texttt{QTensor} library as a backend and a two-level parallelization scheme across CPUs and GPUs to scale the search on high-performance computing systems.
  • results: Experiments show that \texttt{QArchSearch} efficiently scales the search to large quantum circuits and enables the exploration of more complex models for different quantum applications, with scaling demonstrated on the Polaris supercomputer.
    Abstract The current era of quantum computing has yielded several algorithms that promise high computational efficiency. While the algorithms are sound in theory and can provide potentially exponential speedup, there is little guidance on how to design proper quantum circuits to realize the appropriate unitary transformation to be applied to the input quantum state. In this paper, we present \texttt{QArchSearch}, an AI based quantum architecture search package with the \texttt{QTensor} library as a backend that provides a principled and automated approach to finding the best model given a task and input quantum state. We show that the search package is able to efficiently scale the search to large quantum circuits and enables the exploration of more complex models for different quantum applications. \texttt{QArchSearch} runs at scale and high efficiency on high-performance computing systems using a two-level parallelization scheme on both CPUs and GPUs, which has been demonstrated on the Polaris supercomputer.

On the Computational Complexity of Private High-dimensional Model Selection via the Exponential Mechanism

  • paper_url: http://arxiv.org/abs/2310.07852
  • repo_url: None
  • paper_authors: Saptarshi Roy, Ambuj Tewari
  • for: The paper studies model selection for high-dimensional sparse linear regression under the differential privacy framework, in particular differentially private best subset selection and its utility guarantees.
  • methods: The paper adopts the well-known exponential mechanism for selecting the best model and, under a certain margin condition, establishes its strong model recovery property; because the exponential search space creates a serious computational bottleneck, a Metropolis-Hastings algorithm is proposed for the sampling step, with polynomial mixing time to its stationary distribution in the problem parameters $n$, $p$, and $s$.
  • results: The Metropolis-Hastings sampler achieves differentially private best subset selection with strong model recovery under the margin condition, approximate differential privacy is established for its final estimates via the mixing property, and illustrative simulations echo the theoretical findings.
    Abstract We consider the problem of model selection in a high-dimensional sparse linear regression model under the differential privacy framework. In particular, we consider the problem of differentially private best subset selection and study its utility guarantee. We adopt the well-known exponential mechanism for selecting the best model, and under a certain margin condition, we establish its strong model recovery property. However, the exponential search space of the exponential mechanism poses a serious computational bottleneck. To overcome this challenge, we propose a Metropolis-Hastings algorithm for the sampling step and establish its polynomial mixing time to its stationary distribution in the problem parameters $n,p$, and $s$. Furthermore, we also establish approximate differential privacy for the final estimates of the Metropolis-Hastings random walk using its mixing property. Finally, we also perform some illustrative simulations that echo the theoretical findings of our main results.
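A toy sketch of the sampling idea: approximate the exponential mechanism over size-$s$ feature subsets with a swap-proposal Metropolis-Hastings chain whose stationary distribution is proportional to $\exp(\epsilon\, u(S) / (2\Delta))$. The utility (negative residual sum of squares), the proposal, and the sensitivity bookkeeping are illustrative assumptions, not the paper's exact construction.

```python
# Toy sketch (NumPy): Metropolis-Hastings over size-s feature subsets, targeting
# the exponential mechanism distribution exp(eps * u(S) / (2 * sensitivity)).
import numpy as np

rng = np.random.default_rng(1)
n, p, s = 200, 30, 3
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:s] = [3.0, -2.0, 1.5]            # true support {0, 1, 2}
y = X @ beta + rng.normal(size=n)

def utility(S):
    """Negative residual sum of squares of least squares restricted to S."""
    Xs = X[:, sorted(S)]
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return -np.sum((y - Xs @ coef) ** 2)

def mh_exponential_mechanism(eps, sensitivity, n_iter=3000):
    S = set(rng.choice(p, size=s, replace=False))
    u = utility(S)
    for _ in range(n_iter):
        out = rng.choice(sorted(S))                        # swap one feature out
        cand = rng.choice(sorted(set(range(p)) - S))       # ... and one feature in
        S_new = (S - {out}) | {cand}
        u_new = utility(S_new)
        # Metropolis acceptance for the exponential-mechanism target density.
        if np.log(rng.uniform()) < eps * (u_new - u) / (2 * sensitivity):
            S, u = S_new, u_new
    return S

print(sorted(mh_exponential_mechanism(eps=2.0, sensitivity=np.max(y ** 2))))
```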

Measuring Feature Sparsity in Language Models

  • paper_url: http://arxiv.org/abs/2310.07837
  • repo_url: None
  • paper_authors: Mingyang Deng, Lucas Tao, Joe Benton
  • for: This work investigates whether activations in language models can be modelled as sparse linear combinations of vectors corresponding to features of the input text.
  • methods: The study uses sparse coding to reconstruct feature directions and develops metrics that assess the success of sparse coding and test the validity of the linearity and sparsity assumptions.
  • results: The metrics predict the level of sparsity on synthetic sparse linear activations and distinguish sparse linear data from several other distributions; language model activations are found to be accurately modelled by sparse linear combinations of features, significantly more so than control datasets, and activations appear to be sparsest in the first and final layers.
    Abstract Recent works have proposed that activations in language models can be modelled as sparse linear combinations of vectors corresponding to features of input text. Under this assumption, these works aimed to reconstruct feature directions using sparse coding. We develop metrics to assess the success of these sparse coding techniques and test the validity of the linearity and sparsity assumptions. We show our metrics can predict the level of sparsity on synthetic sparse linear activations, and can distinguish between sparse linear data and several other distributions. We use our metrics to measure levels of sparsity in several language models. We find evidence that language model activations can be accurately modelled by sparse linear combinations of features, significantly more so than control datasets. We also show that model activations appear to be sparsest in the first and final layers.

Large Language Models Are Zero-Shot Time Series Forecasters

  • paper_url: http://arxiv.org/abs/2310.07820
  • repo_url: https://github.com/ngruver/llmtime
  • paper_authors: Nate Gruver, Marc Finzi, Shikai Qiu, Andrew Gordon Wilson
  • for: This paper targets time series forecasting with large language models (LLMs) such as GPT-3 and LLaMA-2.
  • methods: The paper encodes time series as strings of numerical digits so that forecasting becomes next-token prediction, enabling zero-shot extrapolation with LLMs, and proposes procedures for converting discrete distributions over tokens into highly flexible densities over continuous values.
  • results: LLMs match or exceed the performance of purpose-built time series models trained on the downstream tasks, without any training on those tasks; they can also handle missing data without imputation, accommodate textual side information, and answer questions that help explain predictions, although GPT-4 can perform worse than GPT-3 because of its number tokenization and poor uncertainty calibration.
    Abstract By encoding time series as a string of numerical digits, we can frame time series forecasting as next-token prediction in text. Developing this approach, we find that large language models (LLMs) such as GPT-3 and LLaMA-2 can surprisingly zero-shot extrapolate time series at a level comparable to or exceeding the performance of purpose-built time series models trained on the downstream tasks. To facilitate this performance, we propose procedures for effectively tokenizing time series data and converting discrete distributions over tokens into highly flexible densities over continuous values. We argue the success of LLMs for time series stems from their ability to naturally represent multimodal distributions, in conjunction with biases for simplicity, and repetition, which align with the salient features in many time series, such as repeated seasonal trends. We also show how LLMs can naturally handle missing data without imputation through non-numerical text, accommodate textual side information, and answer questions to help explain predictions. While we find that increasing model size generally improves performance on time series, we show GPT-4 can perform worse than GPT-3 because of how it tokenizes numbers, and poor uncertainty calibration, which is likely the result of alignment interventions such as RLHF.
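A sketch of the digit-string encoding idea: rescale values to a fixed precision, drop the decimal point, separate digits with spaces so they tokenize one digit per token, and separate time steps with commas. The separator, precision, and scaling choices here are assumptions; the llmtime repository linked above contains the actual implementation.

```python
# Sketch of encoding/decoding a numeric series as a digit string for LLM prompting.
# Separator, precision, and scaling choices are assumptions, not the repo's code.
def encode_series(values, precision=2):
    tokens = []
    for v in values:
        digits = f"{v:.{precision}f}".replace(".", "")   # drop the decimal point
        tokens.append(" ".join(digits))                  # one digit per token
    return " , ".join(tokens)

def decode_series(text, precision=2):
    out = []
    for step in text.split(","):
        digits = step.replace(" ", "")
        out.append(int(digits) / 10 ** precision)
    return out

series = [0.64, 0.70, 0.81, 0.95]
prompt = encode_series(series)
print(prompt)                        # "0 6 4 , 0 7 0 , 0 8 1 , 0 9 5"
# An LLM completion such as "1 1 2" would decode back to [1.12]:
print(decode_series("1 1 2"))
```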

Online RL in Linearly $q^\pi$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore

  • paper_url: http://arxiv.org/abs/2310.07811
  • repo_url: None
  • paper_authors: Gellért Weisz, András György, Csaba Szepesvári
  • for: This work addresses online reinforcement learning in episodic MDPs under the linear $q^\pi$-realizability assumption, where the action-values of all policies are linear in state-action features.
  • methods: The paper shows that states in which all actions have approximately equal value can be skipped by following an arbitrary fixed policy, which reduces the problem to a linear MDP hidden inside it, and proposes a (computationally inefficient) algorithm that simultaneously learns which states to skip and runs a linear-MDP learner on the remaining problem.
  • results: The algorithm returns an $\epsilon$-optimal policy after $\text{polylog}(H, d)/\epsilon^2$ interactions with the MDP, giving the first polynomial-sample-complexity online RL algorithm for this setting, and the sample complexity degrades gracefully with the misspecification error.
    Abstract We consider online reinforcement learning (RL) in episodic Markov decision processes (MDPs) under the linear $q^\pi$-realizability assumption, where it is assumed that the action-values of all policies can be expressed as linear functions of state-action features. This class is known to be more general than linear MDPs, where the transition kernel and the reward function are assumed to be linear functions of the feature vectors. As our first contribution, we show that the difference between the two classes is the presence of states in linearly $q^\pi$-realizable MDPs where for any policy, all the actions have approximately equal values, and skipping over these states by following an arbitrarily fixed policy in those states transforms the problem to a linear MDP. Based on this observation, we derive a novel (computationally inefficient) learning algorithm for linearly $q^\pi$-realizable MDPs that simultaneously learns what states should be skipped over and runs another learning algorithm on the linear MDP hidden in the problem. The method returns an $\epsilon$-optimal policy after $\text{polylog}(H, d)/\epsilon^2$ interactions with the MDP, where $H$ is the time horizon and $d$ is the dimension of the feature vectors, giving the first polynomial-sample-complexity online RL algorithm for this setting. The results are proved for the misspecified case, where the sample complexity is shown to degrade gracefully with the misspecification error.

FedSym: Unleashing the Power of Entropy for Benchmarking the Algorithms for Federated Learning

  • paper_url: http://arxiv.org/abs/2310.07807
  • repo_url: None
  • paper_authors: Ensiye Kiyamousavi, Boris Kraychev, Ivan Koychev
  • for: This work targets the problems of data heterogeneity and model aggregation effectiveness in Federated Learning (FL), and of benchmarking FL algorithms fairly.
  • methods: The study examines popular data partitioning techniques and proposes a method that leverages entropy and symmetry to construct "the most challenging" and controllable data distributions with gradual difficulty, together with a metric for measuring data heterogeneity among the learning agents and a transformation that divides any dataset into splits with precise data diversity.
  • results: The proposed method measures data heterogeneity precisely and gradually challenges FL algorithms; experiments show that models trained on the FedSym distributions are more distinct.
    Abstract Federated learning (FL) is a decentralized machine learning approach where independent learners process data privately. Its goal is to create a robust and accurate model by aggregating and retraining local models over multiple rounds. However, FL faces challenges regarding data heterogeneity and model aggregation effectiveness. In order to simulate real-world data, researchers use methods for data partitioning that transform a dataset designated for centralized learning into a group of sub-datasets suitable for distributed machine learning with different data heterogeneity. In this paper, we study the currently popular data partitioning techniques and visualize their main disadvantages: the lack of precision in the data diversity, which leads to unreliable heterogeneity indexes, and the inability to incrementally challenge the FL algorithms. To resolve this problem, we propose a method that leverages entropy and symmetry to construct 'the most challenging' and controllable data distributions with gradual difficulty. We introduce a metric to measure data heterogeneity among the learning agents and a transformation technique that divides any dataset into splits with precise data diversity. Through a comparative study, we demonstrate the superiority of our method over existing FL data partitioning approaches, showcasing its potential to challenge model aggregation algorithms. Experimental results indicate that our approach gradually challenges the FL strategies, and the models trained on FedSym distributions are more distinct.
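The sketch below illustrates the general idea of measuring heterogeneity of federated splits with the entropy of each client's label distribution, comparing an IID split with a Dirichlet-skewed split. Both the metric and the partitioning are standard stand-ins for illustration, not the FedSym construction itself.

```python
# Sketch (NumPy): quantify label heterogeneity of federated splits via the entropy
# of each client's label distribution, for an IID split vs a Dirichlet-skewed split.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_classes, n_clients = 10000, 10, 5
labels = rng.integers(0, n_classes, size=n_samples)

def label_entropy(client_labels):
    counts = np.bincount(client_labels, minlength=n_classes)
    total = counts.sum()
    if total == 0:
        return 0.0
    probs = counts / total
    probs = probs[probs > 0]
    return -(probs * np.log(probs)).sum()            # maximum is log(n_classes)

def dirichlet_split(alpha):
    """Assign each sample to a client with class-dependent Dirichlet weights."""
    client_of = np.empty(n_samples, dtype=int)
    for c in range(n_classes):
        idx = np.where(labels == c)[0]
        probs = rng.dirichlet(alpha * np.ones(n_clients))
        client_of[idx] = rng.choice(n_clients, size=len(idx), p=probs)
    return [np.where(client_of == k)[0] for k in range(n_clients)]

iid_split = np.array_split(rng.permutation(n_samples), n_clients)
for name, split in [("IID", iid_split), ("Dirichlet(0.1)", dirichlet_split(0.1))]:
    ents = [label_entropy(labels[idx]) for idx in split]
    print(f"{name:15s} mean label entropy per client: {np.mean(ents):.3f} "
          f"(uniform = {np.log(n_classes):.3f})")
```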

Using Spark Machine Learning Models to Perform Predictive Analysis on Flight Ticket Pricing Data

  • paper_url: http://arxiv.org/abs/2310.07787
  • repo_url: None
  • paper_authors: Philip Wong, Phue Thant, Pratiksha Yadav, Ruta Antaliya, Jongwook Woo
  • for: Predicting airline ticket fares for non-stop flights across the US.
  • methods: Four regression algorithms (Random Forest, Gradient Boosted Tree, Decision Tree, and Factorization Machines) are trained with Cross Validator and Training Validator functions and evaluated with r2 (r-squared) and RMSE on a large dataset from Expedia.com of approximately 20 million records (4.68 gigabytes).
  • results: The study identifies the best models for predicting airline ticket fares in the real world, with good generalization capability and optimized processing times as key measures, and extracts business insights from feature importance.
    Abstract This paper discusses predictive performance and processes undertaken on flight pricing data utilizing r2(r-square) and RMSE that leverages a large dataset, originally from Expedia.com, consisting of approximately 20 million records or 4.68 gigabytes. The project aims to determine the best models usable in the real world to predict airline ticket fares for non-stop flights across the US. Therefore, good generalization capability and optimized processing times are important measures for the model. We will discover key business insights utilizing feature importance and discuss the process and tools used for our analysis. Four regression machine learning algorithms were utilized: Random Forest, Gradient Boost Tree, Decision Tree, and Factorization Machines utilizing Cross Validator and Training Validator functions for assessing performance and generalization capability.
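A sketch of the kind of Spark ML pipeline described: assemble features, fit a RandomForestRegressor, and tune it with CrossValidator, reporting RMSE and r2. The input path and column names are placeholders, not the paper's Expedia schema.

```python
# Sketch of a PySpark regression pipeline with cross-validated tuning.
# The parquet path and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import RandomForestRegressor
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

spark = SparkSession.builder.appName("flight-fare-regression").getOrCreate()
df = spark.read.parquet("flights.parquet")                     # placeholder input

feature_cols = ["days_to_departure", "distance_miles", "seats_remaining"]  # placeholders
assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
rf = RandomForestRegressor(featuresCol="features", labelCol="total_fare")
pipeline = Pipeline(stages=[assembler, rf])

grid = (ParamGridBuilder()
        .addGrid(rf.numTrees, [50, 100])
        .addGrid(rf.maxDepth, [5, 10])
        .build())
rmse_eval = RegressionEvaluator(labelCol="total_fare", metricName="rmse")
cv = CrossValidator(estimator=pipeline, estimatorParamMaps=grid,
                    evaluator=rmse_eval, numFolds=3)

train, test = df.randomSplit([0.8, 0.2], seed=42)
model = cv.fit(train)
preds = model.transform(test)
print("RMSE:", rmse_eval.evaluate(preds))
print("r2  :", RegressionEvaluator(labelCol="total_fare", metricName="r2").evaluate(preds))
```

On a dataset of this size, the same pipeline scales by increasing the cluster resources rather than changing the code, which is the usual reason for choosing Spark ML here.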

Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling

  • paper_url: http://arxiv.org/abs/2310.07786
  • repo_url: None
  • paper_authors: Zheqing Zhu, Yueyang Liu, Xu Kuang, Benjamin Van Roy
  • for: Real-world applications of contextual bandits often exhibit non-stationarity due to seasonality, serendipity, and evolving social trends, which existing algorithms handle poorly.
  • methods: The paper proposes a non-stationary contextual bandit algorithm that combines a scalable, deep-neural-network-based architecture with an exploration mechanism that strategically prioritizes collecting information with the most lasting value.
  • results: Empirical evaluations on two real-world recommendation datasets with pronounced non-stationarity show that the approach significantly outperforms state-of-the-art baselines.
    Abstract Real-world applications of contextual bandits often exhibit non-stationarity due to seasonality, serendipity, and evolving social trends. While a number of non-stationary contextual bandit learning algorithms have been proposed in the literature, they excessively explore due to a lack of prioritization for information of enduring value, or are designed in ways that do not scale in modern applications with high-dimensional user-specific features and large action set, or both. In this paper, we introduce a novel non-stationary contextual bandit algorithm that addresses these concerns. It combines a scalable, deep-neural-network-based architecture with a carefully designed exploration mechanism that strategically prioritizes collecting information with the most lasting value in a non-stationary environment. Through empirical evaluations on two real-world recommendation datasets, which exhibit pronounced non-stationarity, we demonstrate that our approach significantly outperforms the state-of-the-art baselines.

Promoting Robustness of Randomized Smoothing: Two Cost-Effective Approaches

  • paper_url: http://arxiv.org/abs/2310.07780
  • repo_url: None
  • paper_authors: Linbo Liu, Trong Nghia Hoang, Lam M. Nguyen, Tsui-Wei Weng
  • for: Providing provable robustness guarantees for smoothed neural network classifiers obtained via randomized smoothing.
  • methods: Two cost-effective approaches are proposed: AdvMacer, a robust training method that combines adversarial training with robustness certification maximization, and EsbRS, a post-processing method that improves robustness certificates by building model ensembles.
  • results: AdvMacer improves the robustness of randomized smoothing classifiers compared to SOTA baselines while being 3x faster to train than the MACER baseline, and EsbRS greatly improves the robustness certificate through a novel ensemble design methodology guided by theoretical analysis, without sacrificing clean performance.
    Abstract Randomized smoothing has recently attracted attentions in the field of adversarial robustness to provide provable robustness guarantees on smoothed neural network classifiers. However, existing works show that vanilla randomized smoothing usually does not provide good robustness performance and often requires (re)training techniques on the base classifier in order to boost the robustness of the resulting smoothed classifier. In this work, we propose two cost-effective approaches to boost the robustness of randomized smoothing while preserving its clean performance. The first approach introduces a new robust training method AdvMacerwhich combines adversarial training and robustness certification maximization for randomized smoothing. We show that AdvMacer can improve the robustness performance of randomized smoothing classifiers compared to SOTA baselines, while being 3x faster to train than MACER baseline. The second approach introduces a post-processing method EsbRS which greatly improves the robustness certificate based on building model ensembles. We explore different aspects of model ensembles that has not been studied by prior works and propose a novel design methodology to further improve robustness of the ensemble based on our theoretical analysis.
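For context, the sketch below shows the base randomized smoothing procedure that methods such as AdvMacer and EsbRS build on: Monte Carlo estimation of the smoothed classifier's top class under Gaussian noise and the certified $L_2$ radius $\sigma\,\Phi^{-1}(\underline{p})$. The classifier and constants are placeholders; this is not the paper's training or ensembling code.

```python
# Sketch (PyTorch/SciPy/statsmodels) of Monte Carlo certification for a smoothed
# classifier: estimate the top class under Gaussian noise, lower-bound its
# probability, and return the certified L2 radius sigma * Phi^{-1}(p_lower).
import torch
from scipy.stats import norm
from statsmodels.stats.proportion import proportion_confint

def certify(base_classifier, x, sigma=0.25, n=1000, alpha=0.001, num_classes=10):
    counts = torch.zeros(num_classes)
    with torch.no_grad():
        for _ in range(n):
            noisy = x + sigma * torch.randn_like(x)
            counts[base_classifier(noisy.unsqueeze(0)).argmax(dim=1)] += 1
    top = int(counts.argmax())
    # Clopper-Pearson lower confidence bound on the top-class probability under noise.
    p_lower = proportion_confint(int(counts[top]), n, alpha=2 * alpha, method="beta")[0]
    if p_lower <= 0.5:
        return None, 0.0                       # abstain
    return top, sigma * norm.ppf(p_lower)      # certified L2 radius

# usage sketch with a stand-in classifier
net = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x = torch.randn(3, 32, 32)
print(certify(net, x))
```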

Feature Learning and Generalization in Deep Networks with Orthogonal Weights

  • paper_url: http://arxiv.org/abs/2310.07765
  • repo_url: None
  • paper_authors: Hannah Day, Yonatan Kahn, Daniel A. Roberts
  • for: This paper studies weight initialization of deep neural networks and its effect on feature learning, generalization, and training speed.
  • methods: The paper analyzes rectangular networks with tanh activations whose weights are initialized from the ensemble of orthogonal matrices, comparing them analytically and numerically to Gaussian initializations.
  • results: With orthogonal initialization, preactivation fluctuations are independent of depth to leading order in inverse width, correlators involving the neural tangent kernel saturate at a depth of about 20 instead of growing without bound, and experiments relate these properties to superior performance of deep nonlinear orthogonal networks on MNIST and CIFAR-10.
    Abstract Fully-connected deep neural networks with weights initialized from independent Gaussian distributions can be tuned to criticality, which prevents the exponential growth or decay of signals propagating through the network. However, such networks still exhibit fluctuations that grow linearly with the depth of the network, which may impair the training of networks with width comparable to depth. We show analytically that rectangular networks with tanh activations and weights initialized from the ensemble of orthogonal matrices have corresponding preactivation fluctuations which are independent of depth, to leading order in inverse width. Moreover, we demonstrate numerically that, at initialization, all correlators involving the neural tangent kernel (NTK) and its descendants at leading order in inverse width -- which govern the evolution of observables during training -- saturate at a depth of $\sim 20$, rather than growing without bound as in the case of Gaussian initializations. We speculate that this structure preserves finite-width feature learning while reducing overall noise, thus improving both generalization and training speed. We provide some experimental justification by relating empirical measurements of the NTK to the superior performance of deep nonlinear orthogonal networks trained under full-batch gradient descent on the MNIST and CIFAR-10 classification tasks.
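A rough numerical probe of the claim: draw an ensemble of deep tanh networks with Gaussian versus orthogonal weights, propagate one input, and compare how the ensemble fluctuations of the mean-squared preactivation behave with depth. Width, depth, ensemble size, and the absence of a tuned critical gain are illustrative simplifications, not the paper's experimental setup.

```python
# Rough numerical probe (NumPy): ensemble fluctuations of the mean-squared
# preactivation at several depths, for Gaussian vs orthogonal weight ensembles.
import numpy as np

rng = np.random.default_rng(0)
width, depth, n_nets = 128, 30, 100
x0 = rng.normal(size=width)

def gaussian_layer():
    return rng.normal(size=(width, width)) / np.sqrt(width)

def orthogonal_layer():
    q, r = np.linalg.qr(rng.normal(size=(width, width)))
    return q * np.sign(np.diag(r))                 # Haar-distributed orthogonal matrix

for name, layer in [("Gaussian", gaussian_layer), ("orthogonal", orthogonal_layer)]:
    stats = {ell: [] for ell in (5, 15, 30)}
    for _ in range(n_nets):
        h = x0.copy()
        for ell in range(1, depth + 1):
            z = layer() @ h                        # preactivations of layer ell
            h = np.tanh(z)
            if ell in stats:
                stats[ell].append(np.mean(z ** 2))
    for ell, vals in stats.items():
        vals = np.array(vals)
        print(f"{name:10s} depth {ell:2d}: ensemble std / mean of <z^2> = "
              f"{vals.std() / vals.mean():.3f}")
```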

Self-supervised Representation Learning From Random Data Projectors

  • paper_url: http://arxiv.org/abs/2310.07756
  • repo_url: https://github.com/layer6ai-labs/lfr
  • paper_authors: Yi Sui, Tongzi Wu, Jesse C. Cresswell, Ga Wu, George Stein, Xiao Shi Huang, Xiaochen Zhang, Maksims Volkovs
  • for: This work proposes a self-supervised representation learning method that does not rely on hand-crafted data augmentations or masking, and that can be applied to any data modality and network architecture.
  • methods: The method learns high-quality data representations by reconstructing random data projections.
  • results: Evaluations on a wide range of representation learning tasks spanning diverse modalities and real-world applications show that the approach outperforms multiple state-of-the-art SSRL baselines.
    Abstract Self-supervised representation learning~(SSRL) has advanced considerably by exploiting the transformation invariance assumption under artificially designed data augmentations. While augmentation-based SSRL algorithms push the boundaries of performance in computer vision and natural language processing, they are often not directly applicable to other data modalities, and can conflict with application-specific data augmentation constraints. This paper presents an SSRL approach that can be applied to any data modality and network architecture because it does not rely on augmentations or masking. Specifically, we show that high-quality data representations can be learned by reconstructing random data projections. We evaluate the proposed approach on a wide range of representation learning tasks that span diverse modalities and real-world applications. We show that it outperforms multiple state-of-the-art SSRL baselines. Due to its wide applicability and strong empirical results, we argue that learning from randomness is a fruitful research direction worthy of attention and further study.

Stabilizing Estimates of Shapley Values with Control Variates

  • paper_url: http://arxiv.org/abs/2310.07672
  • repo_url: None
  • paper_authors: Jeremy Goldwasser, Giles Hooker
  • for: Explaining the predictions of black-box machine learning models with Shapley values.
  • methods: The Monte Carlo technique of control variates is used to stabilize Shapley-value explanations; the resulting ControlSHAP approach applies to any machine learning model and requires virtually no extra computation or modeling effort.
  • results: On several high-dimensional datasets, the approach produces dramatic reductions in the Monte Carlo variability of Shapley estimates.
    Abstract Shapley values are among the most popular tools for explaining predictions of blackbox machine learning models. However, their high computational cost motivates the use of sampling approximations, inducing a considerable degree of uncertainty. To stabilize these model explanations, we propose ControlSHAP, an approach based on the Monte Carlo technique of control variates. Our methodology is applicable to any machine learning model and requires virtually no extra computation or modeling effort. On several high-dimensional datasets, we find it can produce dramatic reductions in the Monte Carlo variability of Shapley estimates.
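The control-variate idea behind ControlSHAP can be illustrated generically: pair each Monte Carlo sample $Y$ with a correlated surrogate $X$ whose mean $\mu_X$ is known exactly (in ControlSHAP, a model such as a linear approximation with closed-form Shapley values), and subtract $c(\bar X - \mu_X)$ with the variance-optimal $c$. The surrogate below is synthetic; this is the textbook construction, not necessarily the paper's exact estimator.

```python
# Generic control-variate sketch (NumPy): reduce Monte Carlo variance of the
# estimate of E[Y] using a correlated surrogate X with known mean mu_X.
import numpy as np

rng = np.random.default_rng(0)
n = 500
true_value = 1.0                        # quantity the Monte Carlo estimate targets
mu_X = 0.8                              # known mean of the surrogate

common = rng.normal(size=n)             # shared randomness creates correlation
Y = true_value + common + 0.3 * rng.normal(size=n)
X = mu_X + common + 0.3 * rng.normal(size=n)

plain = Y.mean()
c = np.cov(Y, X)[0, 1] / X.var(ddof=1)  # variance-optimal coefficient
controlled = Y.mean() - c * (X.mean() - mu_X)

print(f"plain MC estimate       : {plain:.4f}")
print(f"control-variate estimate: {controlled:.4f}")
print(f"variance reduction factor ≈ {1 / (1 - np.corrcoef(Y, X)[0, 1] ** 2):.1f}")
```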

The First Pathloss Radio Map Prediction Challenge

  • paper_url: http://arxiv.org/abs/2310.07658
  • repo_url: None
  • paper_authors: Çağkan Yapar, Fabian Jaensch, Ron Levie, Gitta Kutyniok, Giuseppe Caire
  • for: The ICASSP 2023 First Pathloss Radio Map Prediction Challenge was launched to foster research and facilitate fair comparisons among recently proposed pathloss radio map prediction methods.
  • methods: The paper briefly describes the pathloss prediction problem, the provided datasets, the challenge task, and the challenge evaluation methodology.
  • results: The paper presents the results of the challenge.
    Abstract To foster research and facilitate fair comparisons among recently proposed pathloss radio map prediction methods, we have launched the ICASSP 2023 First Pathloss Radio Map Prediction Challenge. In this short overview paper, we briefly describe the pathloss prediction problem, the provided datasets, the challenge task and the challenge evaluation methodology. Finally, we present the results of the challenge.

Hypercomplex Multimodal Emotion Recognition from EEG and Peripheral Physiological Signals

  • paper_url: http://arxiv.org/abs/2310.07648
  • repo_url: None
  • paper_authors: Eleonora Lopez, Eleonora Chiarantano, Eleonora Grassucci, Danilo Comminiello
  • for: This work proposes a multimodal emotion recognition method that operates in a hypercomplex domain to improve on existing approaches.
  • methods: The method equips a multimodal network with a novel fusion module based on parameterized hypercomplex multiplications, whose algebraic rules model latent relations among learned feature dimensions for a more effective fusion step.
  • results: On valence and arousal classification from EEG and peripheral physiological signals in the publicly available MAHNOB-HCI database, the method surpasses a multimodal state-of-the-art network.
    Abstract Multimodal emotion recognition from physiological signals is receiving an increasing amount of attention due to the impossibility to control them at will unlike behavioral reactions, thus providing more reliable information. Existing deep learning-based methods still rely on extracted handcrafted features, not taking full advantage of the learning ability of neural networks, and often adopt a single-modality approach, while human emotions are inherently expressed in a multimodal way. In this paper, we propose a hypercomplex multimodal network equipped with a novel fusion module comprising parameterized hypercomplex multiplications. Indeed, by operating in a hypercomplex domain the operations follow algebraic rules which allow to model latent relations among learned feature dimensions for a more effective fusion step. We perform classification of valence and arousal from electroencephalogram (EEG) and peripheral physiological signals, employing the publicly available database MAHNOB-HCI surpassing a multimodal state-of-the-art network. The code of our work is freely available at https://github.com/ispamm/MHyEEG.
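A sketch of a parameterized hypercomplex multiplication (PHM) layer of the kind such fusion modules are built from: the weight matrix is a learned sum of Kronecker products $W = \sum_i A_i \otimes S_i$, which generalizes quaternion-style products to $n$ dimensions and uses $n$ times fewer parameters than a dense layer. Dimensions and initialization are illustrative; this is the generic PHM construction, not the paper's full fusion module.

```python
# Sketch (PyTorch) of a parameterized hypercomplex multiplication (PHM) layer.
import torch
import torch.nn as nn

class PHMLinear(nn.Module):
    def __init__(self, in_features, out_features, n=4):
        super().__init__()
        assert in_features % n == 0 and out_features % n == 0
        self.A = nn.Parameter(torch.randn(n, n, n) * 0.1)                # algebra "rules"
        self.S = nn.Parameter(torch.randn(n, out_features // n, in_features // n) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Build W = sum_i kron(A_i, S_i): shape (out_features, in_features),
        # with n times fewer free parameters than a dense linear layer.
        W = sum(torch.kron(self.A[i], self.S[i]) for i in range(self.A.shape[0]))
        return x @ W.T + self.bias

# toy fusion of EEG and peripheral feature vectors through one PHM layer
eeg, peripheral = torch.randn(8, 32), torch.randn(8, 32)
fused = PHMLinear(in_features=64, out_features=16, n=4)(torch.cat([eeg, peripheral], dim=1))
print(fused.shape)   # torch.Size([8, 16])
```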

Deep Reinforcement Learning for Autonomous Cyber Operations: A Survey

  • paper_url: http://arxiv.org/abs/2310.07745
  • repo_url: None
  • paper_authors: Gregory Palmer, Chris Parry, Daniel J. B. Harrold, Chris Willis
  • for: This survey examines the application of deep reinforcement learning (DRL) to autonomous cyber operations (ACO) and the challenges that must be overcome before DRL can be applied to ACO at scale.
  • methods: The paper surveys the relevant DRL literature, conceptualizes an idealized ACO-DRL agent, and evaluates how comparable existing benchmark domains are to the ACO problem.
  • results: The survey identifies the challenges DRL faces in ACO, including very high-dimensional state spaces, large multi-discrete action spaces, and adversarial learning, reviews state-of-the-art approaches for each, and concludes with open research questions for future work.
    Abstract The rapid increase in the number of cyber-attacks in recent years raises the need for principled methods for defending networks against malicious actors. Deep reinforcement learning (DRL) has emerged as a promising approach for mitigating these attacks. However, while DRL has shown much potential for cyber-defence, numerous challenges must be overcome before DRL can be applied to autonomous cyber-operations (ACO) at scale. Principled methods are required for environments that confront learners with very high-dimensional state spaces, large multi-discrete action spaces, and adversarial learning. Recent works have reported success in solving these problems individually. There have also been impressive engineering efforts towards solving all three for real-time strategy games. However, applying DRL to the full ACO problem remains an open challenge. Here, we survey the relevant DRL literature and conceptualize an idealised ACO-DRL agent. We provide: i.) A summary of the domain properties that define the ACO problem; ii.) A comprehensive evaluation of the extent to which domains used for benchmarking DRL approaches are comparable to ACO; iii.) An overview of state-of-the-art approaches for scaling DRL to domains that confront learners with the curse of dimensionality, and; iv.) A survey and critique of current methods for limiting the exploitability of agents within adversarial settings from the perspective of ACO. We conclude with open research questions that we hope will motivate future directions for researchers and practitioners working on ACO.

Graph Transformer Network for Flood Forecasting with Heterogeneous Covariates

  • paper_url: http://arxiv.org/abs/2310.07631
  • repo_url: https://github.com/JimengShi/FloodGTN_Prediction
  • paper_authors: Jimeng Shi, Vitalii Stebliankin, Zhaonan Wang, Shaowen Wang, Giri Narasimhan
  • for: Flood forecasting in coastal river systems to support flood risk management.
  • methods: A Graph Transformer Network (FloodGTN) learns the spatio-temporal dependencies of water levels at different monitoring stations using Graph Neural Networks (GNNs) and an LSTM, and uses a Transformer to learn the attention given to external covariates such as rainfall, tide, and the settings of hydraulic structures (e.g., outflows of dams, gates, and pumps) along the river.
  • results: Applied to data from the South Florida Water Management District, FloodGTN outperforms the physics-based HEC-RAS model, achieving 70% higher accuracy while speeding up run times by at least 500x.
    Abstract Floods can be very destructive causing heavy damage to life, property, and livelihoods. Global climate change and the consequent sea-level rise have increased the occurrence of extreme weather events, resulting in elevated and frequent flood risk. Therefore, accurate and timely flood forecasting in coastal river systems is critical to facilitate good flood management. However, the computational tools currently used are either slow or inaccurate. In this paper, we propose a Flood prediction tool using Graph Transformer Network (FloodGTN) for river systems. More specifically, FloodGTN learns the spatio-temporal dependencies of water levels at different monitoring stations using Graph Neural Networks (GNNs) and an LSTM. It is currently implemented to consider external covariates such as rainfall, tide, and the settings of hydraulic structures (e.g., outflows of dams, gates, pumps, etc.) along the river. We use a Transformer to learn the attention given to external covariates in computing water levels. We apply the FloodGTN tool to data from the South Florida Water Management District, which manages a coastal area prone to frequent storms and hurricanes. Experimental results show that FloodGTN outperforms the physics-based model (HEC-RAS) by achieving higher accuracy with 70% improvement while speeding up run times by at least 500x.

Differentiable Euler Characteristic Transforms for Shape Classification

  • paper_url: http://arxiv.org/abs/2310.07630
  • repo_url: https://github.com/aidos-lab/dect
  • paper_authors: Ernst Roell, Bastian Rieck
  • for: This paper develops a new computational layer that makes the Euler Characteristic Transform (ECT) learnable end-to-end, improving its usefulness for graph and point cloud classification tasks.
  • methods: The proposed layer, DECT, learns the ECT in an end-to-end fashion and is fast and computationally efficient.
  • results: DECT performs on par with more complex models on graph and point cloud classification tasks, and the seemingly unexpressive ECT statistic is shown to provide the same topological expressivity as more complex topological deep learning layers.
    Abstract The Euler Characteristic Transform (ECT) has proven to be a powerful representation, combining geometrical and topological characteristics of shapes and graphs. However, the ECT was hitherto unable to learn task-specific representations. We overcome this issue and develop a novel computational layer that enables learning the ECT in an end-to-end fashion. Our method DECT is fast and computationally efficient, while exhibiting performance on a par with more complex models in both graph and point cloud classification tasks. Moreover, we show that this seemingly unexpressive statistic still provides the same topological expressivity as more complex topological deep learning layers provide.

Unsupervised Learning of Sea Surface Height Interpolation from Multi-variate Simulated Satellite Observations

  • paper_url: http://arxiv.org/abs/2310.07626
  • repo_url: None
  • paper_authors: Theo Archambault, Arthur Filoche, Anastase Charantonis, Dominique Bereziat, Sylvie Thiria
  • for: This paper studies methods for interpolating Sea Surface Height (SSH) from satellite altimetry observations, which contain important gaps.
  • methods: A deep learning network uses Sea Surface Temperature (SST) information to improve SSH interpolation, trained within a realistic twin experiment that emulates satellite observations of SSH and SST, in both a supervised setting and an unsupervised setting without access to ground truth.
  • results: Even in the unsupervised setting, using SST improves reconstruction performance compared to SST-agnostic interpolations, and the reconstructions reduce the root mean squared error by 41% relative to DUACS.
    Abstract Satellite-based remote sensing missions have revolutionized our understanding of the Ocean state and dynamics. Among them, spaceborne altimetry provides valuable measurements of Sea Surface Height (SSH), which is used to estimate surface geostrophic currents. However, due to the sensor technology employed, important gaps occur in SSH observations. Complete SSH maps are produced by the altimetry community using linear Optimal Interpolations (OI) such as the widely-used Data Unification and Altimeter Combination System (DUACS). However, OI is known for producing overly smooth fields and thus misses some mesostructures and eddies. On the other hand, Sea Surface Temperature (SST) products have much higher data coverage and SST is physically linked to geostrophic currents through advection. We design a realistic twin experiment to emulate the satellite observations of SSH and SST to evaluate interpolation methods. We introduce a deep learning network able to use SST information, and a trainable in two settings: one where we have no access to ground truth during training and one where it is accessible. Our investigation involves a comparative analysis of the aforementioned network when trained using either supervised or unsupervised loss functions. We assess the quality of SSH reconstructions and further evaluate the network's performance in terms of eddy detection and physical properties. We find that it is possible, even in an unsupervised setting to use SST to improve reconstruction performance compared to SST-agnostic interpolations. We compare our reconstructions to DUACS's and report a decrease of 41\% in terms of root mean squared error.

Prospective Side Information for Latent MDPs

  • paper_url: http://arxiv.org/abs/2310.07596
  • repo_url: None
  • paper_authors: Jeongyeol Kwon, Yonathan Efroni, Shie Mannor, Constantine Caramanis
  • for: This work studies reinforcement learning in Latent Markov Decision Processes (LMDPs) when prospective side information is available.
  • methods: The setting is modeled as an LMDP, a special instance of partially observed Markov decision processes (POMDPs), in which the agent receives additional, weakly revealing information on the latent context at the beginning of each episode.
  • results: The study shows that any sample-efficient algorithm in this setting must suffer at least $\Omega(K^{2/3})$ regret, as opposed to the standard $\Omega(\sqrt{K})$ lower bounds, and designs an algorithm with a matching upper bound.
    Abstract In many interactive decision-making settings, there is latent and unobserved information that remains fixed. Consider, for example, a dialogue system, where complete information about a user, such as the user's preferences, is not given. In such an environment, the latent information remains fixed throughout each episode, since the identity of the user does not change during an interaction. This type of environment can be modeled as a Latent Markov Decision Process (LMDP), a special instance of Partially Observed Markov Decision Processes (POMDPs). Previous work established exponential lower bounds in the number of latent contexts for the LMDP class. This puts forward a question: under which natural assumptions a near-optimal policy of an LMDP can be efficiently learned? In this work, we study the class of LMDPs with {\em prospective side information}, when an agent receives additional, weakly revealing, information on the latent context at the beginning of each episode. We show that, surprisingly, this problem is not captured by contemporary settings and algorithms designed for partially observed environments. We then establish that any sample efficient algorithm must suffer at least $\Omega(K^{2/3})$-regret, as opposed to standard $\Omega(\sqrt{K})$ lower bounds, and design an algorithm with a matching upper bound.

Transformers for Green Semantic Communication: Less Energy, More Semantics

  • paper_url: http://arxiv.org/abs/2310.07592
  • repo_url: None
  • paper_authors: Shubhabrata Mukherjee, Cory Beard, Sejun Song
  • for: This work aims to improve the efficiency and reliability of transmitting meaning, rather than focusing on individual symbols or bits.
  • methods: The study proposes a new multi-objective loss function named "Energy-Optimized Semantic Loss" (EOSL) to balance semantic information loss against energy consumption.
  • results: Experiments on transformer models, including CPU and GPU energy measurements, show that EOSL-based encoder selection improves semantic similarity performance by 44% during inference while saving up to 90% of energy.
    Abstract Semantic communication aims to transmit meaningful and effective information rather than focusing on individual symbols or bits, resulting in benefits like reduced latency, bandwidth usage, and higher throughput compared to traditional communication. However, semantic communication poses significant challenges due to the need for universal metrics for benchmarking the joint effects of semantic information loss and practical energy consumption. This research presents a novel multi-objective loss function named "Energy-Optimized Semantic Loss" (EOSL), addressing the challenge of balancing semantic information loss and energy consumption. Through comprehensive experiments on transformer models, including CPU and GPU energy usage, it is demonstrated that EOSL-based encoder model selection can save up to 90\% of energy while achieving a 44\% improvement in semantic similarity performance during inference in this experiment. This work paves the way for energy-efficient neural network selection and the development of greener semantic communication architectures.
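
As an illustration of the trade-off the abstract describes, the sketch below combines a semantic dissimilarity term with a normalized energy term and uses the combined score for encoder selection. The exact EOSL formulation is not given in the abstract, so the weighting scheme and all names (lambda_energy, the candidate dictionary) are assumptions.

```python
# Hypothetical multi-objective "semantic loss vs. energy" criterion (not the paper's exact EOSL).
def energy_optimized_semantic_loss(semantic_distance: float,
                                   energy_joules: float,
                                   energy_budget_joules: float,
                                   lambda_energy: float = 0.5) -> float:
    """Combine semantic dissimilarity with normalized energy consumption."""
    normalized_energy = energy_joules / energy_budget_joules
    return (1.0 - lambda_energy) * semantic_distance + lambda_energy * normalized_energy

# Model selection: pick the candidate encoder with the lowest combined score.
candidates = {
    "encoder_small": {"semantic_distance": 0.30, "energy_joules": 2.0},
    "encoder_large": {"semantic_distance": 0.18, "energy_joules": 9.5},
}
budget = 10.0
best = min(candidates, key=lambda name: energy_optimized_semantic_loss(
    candidates[name]["semantic_distance"], candidates[name]["energy_joules"], budget))
print(best)
```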

Analyzing Trendy Twitter Hashtags in the 2022 French Election

  • paper_url: http://arxiv.org/abs/2310.07576
  • repo_url: None
  • paper_authors: Aamir Mandviwalla, Lake Yin, Boleslaw K. Szymanski
  • for: Models that predict the future activity of social media users need rich features for accurate predictions. Many advanced models can generate such features, but their computation time is often prohibitive on enormous datasets. Since prior studies suggest that simple semantic network features can be rich enough for machine learning tasks, this work proposes using semantic networks as user-level features.
  • methods: A semantic network of 1037 Twitter hashtags is built from a corpus of 3.7 million tweets related to the 2022 French presidential election. Hashtags are nodes, and weighted edges reflect how many users interacted with both hashtags. The graph is transformed into a maximum spanning tree rooted at the most popular hashtag to construct a hierarchy, and each user is assigned a vector feature based on this tree.
  • results: The semantic feature is used in regressions to predict each user's response rate for six emotions (e.g., anger, enjoyment, disgust). Most emotions achieve $R^2$ above 0.5, suggesting that the semantic feature is predictive of social media responses and could be considered for further experiments on large datasets.
    Abstract Regressions trained to predict the future activity of social media users need rich features for accurate predictions. Many advanced models exist to generate such features; however, the time complexities of their computations are often prohibitive when they run on enormous data-sets. Some studies have shown that simple semantic network features can be rich enough to use for regressions without requiring complex computations. We propose a method for using semantic networks as user-level features for machine learning tasks. We conducted an experiment using a semantic network of 1037 Twitter hashtags from a corpus of 3.7 million tweets related to the 2022 French presidential election. A bipartite graph is formed where hashtags are nodes and weighted edges connect the hashtags reflecting the number of Twitter users that interacted with both hashtags. The graph is then transformed into a maximum-spanning tree with the most popular hashtag as its root node to construct a hierarchy amongst the hashtags. We then provide a vector feature for each user based on this tree. To validate the usefulness of our semantic feature we performed a regression experiment to predict the response rate of each user with six emotions like anger, enjoyment, or disgust. Our semantic feature performs well with the regression with most emotions having $R^2$ above 0.5. These results suggest that our semantic feature could be considered for use in further experiments predicting social media response on big data-sets.
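
The hashtag-hierarchy feature can be prototyped in a few lines. The sketch below, which is not the authors' code, builds a weighted hashtag graph, extracts a maximum spanning tree rooted at the most popular hashtag, and derives a simple depth-based user vector; the toy co-occurrence counts and the depth-based encoding are assumptions.

```python
# Illustrative hashtag hierarchy and user feature (toy data, assumed encoding).
import networkx as nx

cooccurrence = {("#vote", "#debate"): 120, ("#vote", "#economy"): 80, ("#debate", "#economy"): 30}
popularity = {"#vote": 500, "#debate": 300, "#economy": 200}

G = nx.Graph()
for (a, b), w in cooccurrence.items():
    G.add_edge(a, b, weight=w)

tree = nx.maximum_spanning_tree(G, weight="weight")
root = max(popularity, key=popularity.get)
depth = nx.single_source_shortest_path_length(tree, root)  # hierarchy level per hashtag

def user_feature(used_hashtags, hashtags=sorted(popularity)):
    """One vector entry per hashtag: 1 + depth in the tree if the user used it, else 0."""
    return [depth[h] + 1 if h in used_hashtags else 0 for h in hashtags]

print(user_feature({"#vote", "#economy"}))
```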

Smoothness-Adaptive Dynamic Pricing with Nonparametric Demand Learning

  • paper_url: http://arxiv.org/abs/2310.07558
  • repo_url: None
  • paper_authors: Zeqi Ye, Hansheng Jiang
  • for: This work studies dynamic pricing with a nonparametric demand function and focuses on adaptivity to the unknown Hölder smoothness parameter $\beta$ of the demand function.
  • methods: A self-similarity condition is proposed to enable adaptivity, and a smoothness-adaptive dynamic pricing algorithm is developed that does not require knowledge of $\beta$, together with a proof that it attains the minimax optimal regret bound.
  • results: The work proves that no pricing policy can adaptively achieve the minimax optimal regret without knowledge of $\beta$, and shows that the proposed self-similarity-based adaptive algorithm achieves the optimal regret bound without prior knowledge of $\beta$.
    Abstract We study the dynamic pricing problem where the demand function is nonparametric and H\"older smooth, and we focus on adaptivity to the unknown H\"older smoothness parameter $\beta$ of the demand function. Traditionally the optimal dynamic pricing algorithm heavily relies on the knowledge of $\beta$ to achieve a minimax optimal regret of $\widetilde{O}(T^{\frac{\beta+1}{2\beta+1}})$. However, we highlight the challenge of adaptivity in this dynamic pricing problem by proving that no pricing policy can adaptively achieve this minimax optimal regret without knowledge of $\beta$. Motivated by the impossibility result, we propose a self-similarity condition to enable adaptivity. Importantly, we show that the self-similarity condition does not compromise the problem's inherent complexity since it preserves the regret lower bound $\Omega(T^{\frac{\beta+1}{2\beta+1}})$. Furthermore, we develop a smoothness-adaptive dynamic pricing algorithm and theoretically prove that the algorithm achieves this minimax optimal regret bound without prior knowledge of $\beta$.

Provable Advantage of Parameterized Quantum Circuit in Function Approximation

  • paper_url: http://arxiv.org/abs/2310.07528
  • repo_url: None
  • paper_authors: Zhan Yu, Qiuhao Chen, Yuling Jiao, Yinan Li, Xiliang Lu, Xin Wang, Jerry Zhijian Yang
  • for: This paper analyzes the expressive power of parameterized quantum circuits (PQCs) in machine learning tasks.
  • methods: The paper analyzes PQC expressivity from the perspective of function approximation, giving explicit constructions of data re-uploading PQCs and techniques for approximating various classes of functions.
  • results: The paper provides explicit PQC constructions with quantitative approximation error bounds in terms of the width, depth, and number of trainable parameters. It further compares the proposed PQCs with deep neural networks for approximating high-dimensional smooth functions and finds that the ratio between PQC model size and deep neural network model size is exponentially small in the input dimension, suggesting a potentially novel avenue for showcasing quantum advantages in quantum machine learning.
    Abstract Understanding the power of parameterized quantum circuits (PQCs) in accomplishing machine learning tasks is one of the most important questions in quantum machine learning. In this paper, we analyze the expressivity of PQCs through the lens of function approximation. Previously established universal approximation theorems for PQCs are mainly nonconstructive, leading us to the following question: How large do the PQCs need to be to approximate the target function up to a given error? We exhibit explicit constructions of data re-uploading PQCs for approximating continuous and smooth functions and establish quantitative approximation error bounds in terms of the width, the depth and the number of trainable parameters of the PQCs. To achieve this, we utilize techniques from quantum signal processing and linear combinations of unitaries to construct PQCs that implement multivariate polynomials. We implement global and local approximation techniques using Bernstein polynomials and local Taylor expansion and analyze their performances in the quantum setting. We also compare our proposed PQCs to nearly optimal deep neural networks in approximating high-dimensional smooth functions, showing that the ratio between model sizes of PQC and deep neural networks is exponentially small with respect to the input dimension. This suggests a potentially novel avenue for showcasing quantum advantages in quantum machine learning.

Exploiting Causal Graph Priors with Posterior Sampling for Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.07518
  • repo_url: None
  • paper_authors: Mirco Mutti, Riccardo De Santi, Marcello Restelli, Alexander Marx, Giorgia Ramponi
  • for: To improve the sample efficiency of reinforcement learning with posterior sampling by exploiting prior knowledge of the environment.
  • methods: A new posterior sampling approach is proposed in which the prior is expressed as a (partial) causal graph over the environment's variables, and a hierarchical Bayesian procedure performs inference over this graph.
  • results: Numerical evaluation in illustrative domains shows that the proposed C-PSRL method strongly improves the efficiency of posterior sampling and performs close to posterior sampling with the full causal graph.
    Abstract Posterior sampling allows the exploitation of prior knowledge of the environment's transition dynamics to improve the sample efficiency of reinforcement learning. The prior is typically specified as a class of parametric distributions, a task that can be cumbersome in practice, often resulting in the choice of uninformative priors. In this work, we propose a novel posterior sampling approach in which the prior is given as a (partial) causal graph over the environment's variables. The latter is often more natural to design, such as listing known causal dependencies between biometric features in a medical treatment study. Specifically, we propose a hierarchical Bayesian procedure, called C-PSRL, simultaneously learning the full causal graph at the higher level and the parameters of the resulting factored dynamics at the lower level. For this procedure, we provide an analysis of its Bayesian regret, which explicitly connects the regret rate with the degree of prior knowledge. Our numerical evaluation conducted in illustrative domains confirms that C-PSRL strongly improves the efficiency of posterior sampling with an uninformative prior while performing close to posterior sampling with the full causal graph.

Model-based Clustering of Individuals’ Ecological Momentary Assessment Time-series Data for Improving Forecasting Performance

  • paper_url: http://arxiv.org/abs/2310.07491
  • repo_url: None
  • paper_authors: Mandani Ntekouli, Gerasimos Spanakis, Lourens Waldorp, Anne Roefs
  • for: This work studies ecological momentary assessment (EMA) time-series data and uses clustering to characterize individuals' emotional behavior.
  • methods: Two model-based clustering approaches are examined: one clusters individuals on parameters extracted from personalized models, and the other is optimized on model-based forecasting performance.
  • results: Clustering based on forecasting performance gives the best results, outperforming the personalized, all-in-one, and random-grouping baselines on all evaluation measures.
    Abstract Through Ecological Momentary Assessment (EMA) studies, time-series data is collected across multiple individuals, continuously monitoring various items of emotional behavior. Such complex data is commonly analyzed at an individual level, using personalized models. However, additional information from similar individuals is likely to enhance these models, leading to better descriptions of individuals. Thus, clustering is investigated with the aim of grouping together the most similar individuals and subsequently using this information in group-based models to improve individuals' predictive performance. More specifically, two model-based clustering approaches are examined: the first uses model-extracted parameters of personalized models, whereas the second is optimized on model-based forecasting performance. Both methods are then analyzed using intrinsic clustering evaluation measures (e.g. Silhouette coefficients) as well as the performance of a downstream forecasting scheme, where each forecasting group-model is devoted to describing all individuals belonging to one cluster. Among these, clustering based on performance shows the best results in terms of all examined evaluation measures. As another level of evaluation, the group-models' performance is compared to three baseline scenarios: the personalized, the all-in-one group, and the random group-based concept. According to this comparison, the superiority of clustering-based methods is again confirmed, indicating that the utilization of group-based information could effectively enhance the overall performance of all individuals' data.
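
A minimal sketch of the parameter-based variant of model-based clustering: fit a simple per-individual AR(1) model and cluster individuals on the fitted coefficients. The AR(1) choice, the synthetic data, and the number of clusters are assumptions; the paper's personalized models and performance-based clustering are not reproduced here.

```python
# Illustrative parameter-based clustering of individuals' time series (assumed AR(1) model).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
individuals = [rng.normal(size=200).cumsum() * 0.1 + rng.normal(size=200) for _ in range(20)]

def ar1_params(x):
    """Least-squares fit of x_t = a * x_{t-1} + b; returns (a, b)."""
    X = np.vstack([x[:-1], np.ones(len(x) - 1)]).T
    coef, *_ = np.linalg.lstsq(X, x[1:], rcond=None)
    return coef

params = np.array([ar1_params(x) for x in individuals])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(params)
print(labels)  # one group-model could then be trained per cluster
```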

Nonlinear embeddings for conserving Hamiltonians and other quantities with Neural Galerkin schemes

  • paper_url: http://arxiv.org/abs/2310.07485
  • repo_url: None
  • paper_authors: Paul Schwerdtner, Philipp Schulze, Jules Berman, Benjamin Peherstorfer
  • for: This paper addresses the conservation of quantities (such as Hamiltonians, mass, and momentum) when solution fields of partial differential equations are approximated with nonlinear parametrizations such as deep networks.
  • methods: The method builds on Neural Galerkin schemes based on the Dirac–Frenkel variational principle, training nonlinear parametrizations sequentially in time.
  • results: Numerical experiments show that the approach conserves quantities up to machine precision and can be combined with standard explicit and implicit time integration schemes.
    Abstract This work focuses on the conservation of quantities such as Hamiltonians, mass, and momentum when solution fields of partial differential equations are approximated with nonlinear parametrizations such as deep networks. The proposed approach builds on Neural Galerkin schemes that are based on the Dirac--Frenkel variational principle to train nonlinear parametrizations sequentially in time. We first show that only adding constraints that aim to conserve quantities in continuous time can be insufficient because the nonlinear dependence on the parameters implies that even quantities that are linear in the solution fields become nonlinear in the parameters and thus are challenging to discretize in time. Instead, we propose Neural Galerkin schemes that compute at each time step an explicit embedding onto the manifold of nonlinearly parametrized solution fields to guarantee conservation of quantities. The embeddings can be combined with standard explicit and implicit time integration schemes. Numerical experiments demonstrate that the proposed approach conserves quantities up to machine precision.

Automatic Sensor-free Affect Detection: A Systematic Literature Review

  • paper_url: http://arxiv.org/abs/2310.13711
  • repo_url: None
  • paper_authors: Felipe de Morais, Diógines Goldoni, Tiago Kautzmann, Rodrigo da Silva, Patricia A. Jaques
  • for: This paper provides a comprehensive literature review on sensor-free affect detection in computer-based learning environments (CBLEs) to enhance learning outcomes.
  • methods: The paper reviews the most frequently identified affective states, methodologies and techniques employed for sensor development, defining attributes of CBLEs and data samples, and key research trends.
  • results: The paper highlights the consistent performance of the models and the application of advanced machine learning techniques, but notes that there is ample scope for future research, including enhancing model performance, collecting more samples of underrepresented emotions, and refining model development practices.
    Abstract Emotions and other affective states play a pivotal role in cognition and, consequently, the learning process. It is well-established that computer-based learning environments (CBLEs) that can detect and adapt to students' affective states can enhance learning outcomes. However, practical constraints often pose challenges to the deployment of sensor-based affect detection in CBLEs, particularly for large-scale or long-term applications. As a result, sensor-free affect detection, which exclusively relies on logs of students' interactions with CBLEs, emerges as a compelling alternative. This paper provides a comprehensive literature review on sensor-free affect detection. It delves into the most frequently identified affective states, the methodologies and techniques employed for sensor development, the defining attributes of CBLEs and data samples, as well as key research trends. Despite the field's evident maturity, demonstrated by the consistent performance of the models and the application of advanced machine learning techniques, there is ample scope for future research. Potential areas for further exploration include enhancing the performance of sensor-free detection models, amassing more samples of underrepresented emotions, and identifying additional emotions. There is also a need to refine model development practices and methods. This could involve comparing the accuracy of various data collection techniques, determining the optimal granularity of duration, establishing a shared database of action logs and emotion labels, and making the source code of these models publicly accessible. Future research should also prioritize the integration of models into CBLEs for real-time detection, the provision of meaningful interventions based on detected emotions, and a deeper understanding of the impact of emotions on learning.

  • paper_url: http://arxiv.org/abs/2310.07464
  • repo_url: None
  • paper_authors: Zijie Fang, Yihan Liu, Yifeng Wang, Xiangyang Zhang, Yang Chen, Changjing Cai, Yiyang Lin, Ying Han, Zhi Wang, Shan Zeng, Hong Shen, Jun Tan, Yongbing Zhang
  • for: Biomarker detection is an indispensable part of diagnosing and treating low-grade glioma (LGG), but existing LGG biomarker detection methods rely on expensive and complex molecular genetic testing that requires experts to analyze the results and often suffers from intra-rater variability.
  • methods: An interpretable deep learning pipeline, the Multi-Biomarker Histomorphology Discoverer (Multi-Beholder), is proposed based on the multiple instance learning (MIL) framework to predict the status of five biomarkers in LGG using only hematoxylin and eosin-stained whole slide images and slide-level labels. In particular, incorporating one-class classification into the MIL framework enables accurate instance pseudo-labeling, so the slide-level labels are complemented by instance-level supervision and biomarker prediction performance improves.
  • results: Multi-Beholder shows strong prediction performance and generalizability (AUROC 0.6469-0.9735) for five LGG biomarkers in two cohorts (n=607) with diverse races and scanning protocols. Its interpretability further allows quantitative and qualitative correlations between biomarker status and histomorphology characteristics to be discovered. The pipeline not only offers a new approach for biomarker prediction, improving the applicability of molecular treatments for LGG patients, but also facilitates the discovery of new mechanisms in molecular functionality and LGG progression.
    Abstract Biomarker detection is an indispensable part in the diagnosis and treatment of low-grade glioma (LGG). However, current LGG biomarker detection methods rely on expensive and complex molecular genetic testing, for which professionals are required to analyze the results, and intra-rater variability is often reported. To overcome these challenges, we propose an interpretable deep learning pipeline, a Multi-Biomarker Histomorphology Discoverer (Multi-Beholder) model based on the multiple instance learning (MIL) framework, to predict the status of five biomarkers in LGG using only hematoxylin and eosin-stained whole slide images and slide-level biomarker status labels. Specifically, by incorporating the one-class classification into the MIL framework, accurate instance pseudo-labeling is realized for instance-level supervision, which greatly complements the slide-level labels and improves the biomarker prediction performance. Multi-Beholder demonstrates superior prediction performance and generalizability for five LGG biomarkers (AUROC=0.6469-0.9735) in two cohorts (n=607) with diverse races and scanning protocols. Moreover, the excellent interpretability of Multi-Beholder allows for discovering the quantitative and qualitative correlations between biomarker status and histomorphology characteristics. Our pipeline not only provides a novel approach for biomarker prediction, enhancing the applicability of molecular treatments for LGG patients but also facilitates the discovery of new mechanisms in molecular functionality and LGG progression.
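
The attention-based MIL pooling below is a generic building block for slide-level prediction from patch features, shown only to make the MIL setting concrete. It is not Multi-Beholder's pipeline and omits the one-class pseudo-labeling step; all layer sizes are assumptions.

```python
# Generic attention-MIL pooling over patch features (illustrative, not the paper's model).
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 32), nn.Tanh(), nn.Linear(32, 1))
        self.classifier = nn.Linear(dim, 1)

    def forward(self, instance_feats):  # (n_patches, dim) for one slide
        a = torch.softmax(self.score(instance_feats), dim=0)  # attention weight per patch
        slide_feat = (a * instance_feats).sum(dim=0)          # attention-weighted slide feature
        return self.classifier(slide_feat), a                 # biomarker logit + patch weights

pool = AttentionMILPooling()
logit, attn = pool(torch.randn(200, 64))
print(logit.shape, attn.shape)
```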

Uncovering ECG Changes during Healthy Aging using Explainable AI

  • paper_url: http://arxiv.org/abs/2310.07463
  • repo_url: https://github.com/ai4healthuol/ecg-aging
  • paper_authors: Gabriel Ott, Yannik Schaubelt, Juan Miguel Lopez Alcaraz, Wilhelm Haverkamp, Nils Strodthoff
  • for: This study aims to provide a deeper understanding of the cardiac aging process in order to better assess cardiovascular fitness.
  • methods: The study analyzes ECG data from healthy individuals with a deep learning model and a tree-based model, and uses explainable AI techniques to identify which ECG features or raw-signal characteristics best discriminate between age groups.
  • results: The analysis reveals age-related declines in inferred breathing rates and identifies notably high SDANN values as indicative of elderly individuals, distinguishing them from younger adults. The deep learning model further highlights the pivotal role of the P-wave in age prediction across all age groups, suggesting possible changes in the distribution of P-wave types with age; these findings go beyond traditional feature-based analyses of age-related ECG changes.
    Abstract Cardiovascular diseases remain the leading global cause of mortality. This necessitates a profound understanding of heart aging processes to diagnose constraints in cardiovascular fitness. Traditionally, most of such insights have been drawn from the analysis of electrocardiogram (ECG) feature changes of individuals as they age. However, these features, while informative, may potentially obscure underlying data relationships. In this paper, we employ a deep-learning model and a tree-based model to analyze ECG data from a robust dataset of healthy individuals across varying ages in both raw signals and ECG feature format. Explainable AI techniques are then used to identify ECG features or raw signal characteristics are most discriminative for distinguishing between age groups. Our analysis with tree-based classifiers reveal age-related declines in inferred breathing rates and identifies notably high SDANN values as indicative of elderly individuals, distinguishing them from younger adults. Furthermore, the deep-learning model underscores the pivotal role of the P-wave in age predictions across all age groups, suggesting potential changes in the distribution of different P-wave types with age. These findings shed new light on age-related ECG changes, offering insights that transcend traditional feature-based approaches.
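
A toy sketch of the tree-based part of such an analysis: train a gradient-boosted classifier on ECG-derived features and inspect feature importances. The synthetic data, the feature names (heart rate, an SDANN-like HRV feature, P-wave amplitude), and the label construction are assumptions, not the study's dataset or pipeline.

```python
# Illustrative tree-based age-group classification on ECG-derived features (synthetic data).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.normal(60, 10, n),    # heart rate (assumed feature)
    rng.normal(40, 15, n),    # SDANN-like HRV feature (assumed feature)
    rng.normal(0.1, 0.02, n)  # P-wave amplitude (assumed feature)
])
age_group = (X[:, 1] > 45).astype(int)  # toy label loosely tied to the HRV feature

X_tr, X_te, y_tr, y_te = train_test_split(X, age_group, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
print("feature importances:", clf.feature_importances_)  # which features separate age groups
```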

ProbTS: A Unified Toolkit to Probe Deep Time-series Forecasting

  • paper_url: http://arxiv.org/abs/2310.07446
  • repo_url: None
  • paper_authors: Jiawen Zhang, Xumeng Wen, Shun Zheng, Jia Li, Jiang Bian
  • for: This paper aims to examine how deep learning has been applied to time-series forecasting and to compare the characteristics and performance of its two main branches.
  • methods: The paper considers two distinct approaches: crafting neural architectures tailored for time series, and harnessing advanced deep generative models for probabilistic forecasting.
  • results: Using the ProbTS toolkit to compare and evaluate the two approaches, the paper finds that they differ across data scenarios and methodological focuses, and it points to new research directions for more effective time-series forecasting.
    Abstract Time-series forecasting serves as a linchpin in a myriad of applications, spanning various domains. With the growth of deep learning, this arena has bifurcated into two salient branches: one focuses on crafting specific neural architectures tailored for time series, and the other harnesses advanced deep generative models for probabilistic forecasting. While both branches have made significant progress, their differences across data scenarios, methodological focuses, and decoding schemes pose profound, yet unexplored, research questions. To bridge this knowledge chasm, we introduce ProbTS, a pioneering toolkit developed to synergize and compare these two distinct branches. Endowed with a unified data module, a modularized model module, and a comprehensive evaluator module, ProbTS allows us to revisit and benchmark leading methods from both branches. The scrutiny with ProbTS highlights their distinct characteristics, relative strengths and weaknesses, and areas that need further exploration. Our analyses point to new avenues for research, aiming for more effective time-series forecasting.

A Branched Deep Convolutional Network for Forecasting the Occurrence of Hazes in Paris using Meteorological Maps with Different Characteristic Spatial Scales

  • paper_url: http://arxiv.org/abs/2310.07437
  • repo_url: None
  • paper_authors: Chien Wang
  • for: Forecasting the occurrence of haze events.
  • methods: Multi-decadal daily regional maps of meteorological and hydrological variables are used as input features, with surface visibility observations as the training targets.
  • results: Two branched architectures improve the network's forecasting performance for haze events, producing reasonable scores in both validation and a blind forecasting evaluation.
    Abstract A deep learning platform has been developed to forecast the occurrence of low-visibility events or hazes. It is trained using multi-decadal daily regional maps of various meteorological and hydrological variables as input features and surface visibility observations as the targets. To better preserve the characteristic spatial information of different input features for training, two branched architectures have recently been developed for the case of Paris hazes. These new architectures have improved the performance of the network, producing reasonable scores in both validation and a blind forecasting evaluation using data from 2021 and 2022 that were not used in training and validation.
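
One plausible layout of a two-branch CNN for this task is sketched below in PyTorch: each branch ingests meteorological maps with a different characteristic spatial scale, and the fused features feed a haze/no-haze logit. The layer sizes, channel counts, and input resolutions are assumptions, not the paper's architecture.

```python
# Illustrative branched CNN for haze occurrence forecasting (assumed layout).
import torch
import torch.nn as nn

class BranchedHazeNet(nn.Module):
    def __init__(self, channels_fine=4, channels_coarse=4):
        super().__init__()
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fine = branch(channels_fine)      # branch for high-resolution maps
        self.coarse = branch(channels_coarse)  # branch for synoptic-scale maps
        self.head = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x_fine, x_coarse):
        z = torch.cat([self.fine(x_fine), self.coarse(x_coarse)], dim=1)
        return self.head(z)  # logit for haze occurrence

model = BranchedHazeNet()
logit = model(torch.randn(2, 4, 64, 64), torch.randn(2, 4, 32, 32))
print(logit.shape)  # torch.Size([2, 1])
```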

Generalized Mixture Model for Extreme Events Forecasting in Time Series Data

  • paper_url: http://arxiv.org/abs/2310.07435
  • repo_url: None
  • paper_authors: Jincheng Wang, Yue Gao
  • for: To improve the prediction of extreme values in time-series forecasting, with applications such as weather forecasting, traffic control, and stock price prediction.
  • methods: A new deep mixture model, the Deep Extreme Mixture Model with Autoencoder (DEMMA), is proposed. It has two main modules: (1) a generalized mixture distribution based on the hurdle model and a reparameterized Generalized Pareto distribution that is independent of the extreme threshold, and (2) an autoencoder-based LSTM feature extractor with a quantile prediction module using a temporal attention mechanism.
  • results: The effectiveness of DEMMA is demonstrated on multiple real-world rainfall datasets, showing improved modeling of extreme events.
    Abstract Time Series Forecasting (TSF) is a widely researched topic with broad applications in weather forecasting, traffic control, and stock price prediction. Extreme values in time series often significantly impact human and natural systems, but predicting them is challenging due to their rare occurrence. Statistical methods based on Extreme Value Theory (EVT) provide a systematic approach to modeling the distribution of extremes, particularly the Generalized Pareto (GP) distribution for modeling the distribution of exceedances beyond a threshold. To overcome the subpar performance of deep learning in dealing with heavy-tailed data, we propose a novel framework to enhance the focus on extreme events. Specifically, we propose a Deep Extreme Mixture Model with Autoencoder (DEMMA) for time series prediction. The model comprises two main modules: 1) a generalized mixture distribution based on the Hurdle model and a reparameterized GP distribution form independent of the extreme threshold, 2) an Autoencoder-based LSTM feature extractor and a quantile prediction module with a temporal attention mechanism. We demonstrate the effectiveness of our approach on multiple real-world rainfall datasets.
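
The statistical core of the extreme-value component can be illustrated with a classical peaks-over-threshold fit: exceedances above a high quantile are modeled with a Generalized Pareto distribution. The sketch below shows that idea on toy data; it is not DEMMA itself (no hurdle network, LSTM, or attention), and the threshold choice is an assumption.

```python
# Peaks-over-threshold fit with a Generalized Pareto tail (illustrative, toy rainfall data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
rain = rng.gamma(shape=0.5, scale=5.0, size=5000)   # toy rainfall-like series
threshold = np.quantile(rain, 0.95)                 # assumed extreme threshold

exceedances = rain[rain > threshold] - threshold
shape, loc, scale = stats.genpareto.fit(exceedances, floc=0.0)  # GP fit to the tail

def prob_extreme(x):
    """P(rainfall > x) for x above the threshold, via the fitted GP tail."""
    p_exceed = np.mean(rain > threshold)
    return p_exceed * stats.genpareto.sf(x - threshold, shape, loc=0.0, scale=scale)

print(prob_extreme(threshold + 10.0))
```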

Non-backtracking Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.07430
  • repo_url: None
  • paper_authors: Seonghyun Park, Narae Ryu, Gahee Kim, Dongyeop Woo, Se-Young Yun, Sungsoo Ahn
  • for: To improve the accuracy of graph neural networks on large-scale graphs by resolving the redundancy that backtracking introduces into message-passing updates.
  • methods: A non-backtracking graph neural network (NBA-GNN) is proposed that updates a message without incorporating the message from the previously visited node, thereby removing backtracking from message passing.
  • results: Experiments show that NBA-GNN effectively alleviates the redundancy of backtracking and the over-squashing of GNNs, performing well on long-range graph benchmarks and transductive node classification tasks.
    Abstract The celebrated message-passing updates for graph neural networks allow the representation of large-scale graphs with local and computationally tractable updates. However, the local updates suffer from backtracking, i.e., a message flows through the same edge twice and revisits the previously visited node. Since the number of message flows increases exponentially with the number of updates, the redundancy in local updates prevents the graph neural network from accurately recognizing a particular message flow for downstream tasks. In this work, we propose to resolve such a redundancy via the non-backtracking graph neural network (NBA-GNN) that updates a message without incorporating the message from the previously visited node. We further investigate how NBA-GNN alleviates the over-squashing of GNNs, and establish a connection between NBA-GNN and the impressive performance of non-backtracking updates for stochastic block model recovery. We empirically verify the effectiveness of our NBA-GNN on long-range graph benchmark and transductive node classification problems.
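
The non-backtracking update can be illustrated with messages that live on directed edges: the message on edge (u -> v) aggregates messages on edges (w -> u) with w != v, so information never flows straight back along the edge it arrived on. The sketch below illustrates this idea only; it is not the NBA-GNN implementation, and the toy graph, weight matrix, and readout are assumptions.

```python
# Illustrative non-backtracking message passing on directed edges (toy triangle graph).
import numpy as np

edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 0), (0, 2)]  # directed edges
dim = 4
msg = {e: np.ones(dim) for e in edges}  # initial edge messages
W = np.eye(dim) * 0.5                   # toy weight matrix

def nonbacktracking_step(msg):
    new_msg = {}
    for (u, v) in msg:
        # aggregate messages arriving at u, excluding the one coming back from v
        incoming = [msg[(w, x)] for (w, x) in msg if x == u and w != v]
        agg = np.sum(incoming, axis=0) if incoming else np.zeros(dim)
        new_msg[(u, v)] = np.tanh(W @ agg)
    return new_msg

msg = nonbacktracking_step(msg)
node_repr = {v: sum(m for (u, x), m in msg.items() if x == v) for v in {0, 1, 2}}
print(node_repr[0])
```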

Quantum-Enhanced Forecasting: Leveraging Quantum Gramian Angular Field and CNNs for Stock Return Predictions

  • paper_url: http://arxiv.org/abs/2310.07427
  • repo_url: None
  • paper_authors: Zhengmeng Xu, Hai Lin
  • for: To improve the accuracy of time-series forecasting by combining quantum computing with deep learning.
  • methods: Specially designed quantum circuits transform time-series data into two-dimensional images suitable for training a convolutional neural network (CNN). Unlike the classical Gramian Angular Field (GAF) approach, QGAF requires neither data normalization nor inverse cosine calculations, simplifying the transformation from time series to images.
  • results: Experiments on data from three major stock markets (the China A-share market, the Hong Kong stock market, and the US stock market) show that, compared with the classical GAF method, QGAF significantly improves prediction accuracy, reducing Mean Absolute Error (MAE) by 25% and Mean Squared Error (MSE) by 48% on average.
    Abstract We propose a time series forecasting method named Quantum Gramian Angular Field (QGAF). This approach merges the advantages of quantum computing technology with deep learning, aiming to enhance the precision of time series classification and forecasting. We successfully transformed stock return time series data into two-dimensional images suitable for Convolutional Neural Network (CNN) training by designing specific quantum circuits. Distinct from the classical Gramian Angular Field (GAF) approach, QGAF's uniqueness lies in eliminating the need for data normalization and inverse cosine calculations, simplifying the transformation process from time series data to two-dimensional images. To validate the effectiveness of this method, we conducted experiments on datasets from three major stock markets: the China A-share market, the Hong Kong stock market, and the US stock market. Experimental results revealed that compared to the classical GAF method, the QGAF approach significantly improved time series prediction accuracy, reducing prediction errors by an average of 25% for Mean Absolute Error (MAE) and 48% for Mean Squared Error (MSE). This research confirms the potential and promising prospects of integrating quantum computing with deep learning techniques in financial time series forecasting.
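
For context, the classical Gramian Angular Field baseline that QGAF modifies can be computed as below: rescale the series to [-1, 1], take the arccosine, and form cos(phi_i + phi_j) as a 2-D image for CNN training. This is the standard classical construction, not the quantum encoding; the toy return series is an assumption.

```python
# Classical Gramian Angular (summation) Field for a short return series (baseline illustration).
import numpy as np

def gramian_angular_field(x):
    x = np.asarray(x, dtype=float)
    x_scaled = 2 * (x - x.min()) / (x.max() - x.min()) - 1.0   # rescale to [-1, 1]
    phi = np.arccos(np.clip(x_scaled, -1.0, 1.0))              # angular encoding
    return np.cos(phi[:, None] + phi[None, :])                 # (T, T) image

returns = np.array([0.01, -0.02, 0.015, 0.005, -0.01])
image = gramian_angular_field(returns)
print(image.shape)  # (5, 5)
```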

Deep Kernel and Image Quality Estimators for Optimizing Robotic Ultrasound Controller using Bayesian Optimization

  • paper_url: http://arxiv.org/abs/2310.07392
  • repo_url: None
  • paper_authors: Deepak Raina, SH Chandrashekhara, Richard Voyles, Juan Wachs, Subir Kumar Saha
  • for: The paper is written to help improve the efficiency of autonomous robotic ultrasound imaging, by reducing the workload of sonographers.
  • methods: The paper proposes using a neural network to learn a low-dimensional kernel in Bayesian optimization, which is a sample-efficient optimization framework. The neural network is trained using probe and image data acquired during the procedure.
  • results: The paper shows that the proposed framework can achieve over 50% increase in sample efficiency for 6D control of the robotized probe, and this performance enhancement is independent of the specific training dataset, demonstrating inter-patient adaptability.
    Abstract Ultrasound is a commonly used medical imaging modality that requires expert sonographers to manually maneuver the ultrasound probe based on the acquired image. Autonomous Robotic Ultrasound (A-RUS) is an appealing alternative to this manual procedure in order to reduce sonographers' workload. The key challenge to A-RUS is optimizing the ultrasound image quality for the region of interest across different patients. This requires knowledge of anatomy, recognition of error sources and precise probe position, orientation and pressure. Sample efficiency is important while optimizing these parameters associated with the robotized probe controller. Bayesian Optimization (BO), a sample-efficient optimization framework, has recently been applied to optimize the 2D motion of the probe. Nevertheless, further improvements are needed to improve the sample efficiency for high-dimensional control of the probe. We aim to overcome this problem by using a neural network to learn a low-dimensional kernel in BO, termed as Deep Kernel (DK). The neural network of DK is trained using probe and image data acquired during the procedure. The two image quality estimators are proposed that use a deep convolution neural network and provide real-time feedback to the BO. We validated our framework using these two feedback functions on three urinary bladder phantoms. We obtained over 50% increase in sample efficiency for 6D control of the robotized probe. Furthermore, our results indicate that this performance enhancement in BO is independent of the specific training dataset, demonstrating inter-patient adaptability.
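
The "deep kernel" idea can be sketched as a kernel evaluated on neural-network features. Below, a small fixed feature map stands in for the learned network and feeds an RBF kernel, as one might use inside a Gaussian-process surrogate for Bayesian optimization. The toy weights and dimensions are assumptions, not the paper's trained model.

```python
# Illustrative deep kernel: RBF kernel on features produced by a tiny fixed "network".
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(6, 3)), rng.normal(size=(2, 6))  # toy network weights (assumption)

def feature_map(x):
    """Map a 3-D probe-pose fragment to a 2-D learned feature (stand-in for the NN)."""
    return np.tanh(W2 @ np.tanh(W1 @ x))

def deep_rbf_kernel(x, y, lengthscale=1.0):
    d = feature_map(x) - feature_map(y)
    return np.exp(-0.5 * np.dot(d, d) / lengthscale**2)

print(deep_rbf_kernel(np.array([0.1, 0.2, 0.3]), np.array([0.1, 0.25, 0.28])))
```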

Experimental quantum natural gradient optimization in photonics

  • paper_url: http://arxiv.org/abs/2310.07371
  • repo_url: None
  • paper_authors: Yizhi Wang, Shichuan Xue, Yaxuan Wang, Jiangfang Ding, Weixu Shi, Dongyang Wang, Yong Liu, Yingwen Liu, Xiang Fu, Guangyao Huang, Anqi Huang, Mingtang Deng, Junjie Wu
  • for: To explore practical applications of variational quantum computing in the noisy intermediate-scale quantum era.
  • methods: Parameterized quantum circuits combined with classical optimizers are used, and the quantum natural gradient (QNG) is experimentally estimated on a fully programmable photonic chip.
  • results: Compared with gradient-free and ordinary gradient descent methods, QNG converges faster and avoids local minima more easily, reducing the cost of circuit executions. The photonic experiment obtains the dissociation curve of the He-H$^+$ cation to chemical accuracy.
    Abstract Variational quantum algorithms (VQAs) combining the advantages of parameterized quantum circuits and classical optimizers, promise practical quantum applications in the Noisy Intermediate-Scale Quantum era. The performance of VQAs heavily depends on the optimization method. Compared with gradient-free and ordinary gradient descent methods, the quantum natural gradient (QNG), which mirrors the geometric structure of the parameter space, can achieve faster convergence and avoid local minima more easily, thereby reducing the cost of circuit executions. We utilized a fully programmable photonic chip to experimentally estimate the QNG in photonics for the first time. We obtained the dissociation curve of the He-H$^+$ cation and achieved chemical accuracy, verifying the outperformance of QNG optimization on a photonic device. Our work opens up a vista of utilizing QNG in photonics to implement practical near-term quantum applications.

Orthogonal Random Features: Explicit Forms and Sharp Inequalities

  • paper_url: http://arxiv.org/abs/2310.07370
  • repo_url: None
  • paper_authors: Nizar Demni, Hachem Kadri
  • for: Random features were introduced to scale up kernel methods; in particular, random Fourier features and orthogonal random features are used to approximate the popular Gaussian kernel.
  • methods: The paper analyzes the Gaussian-kernel approximation based on orthogonal random features, which make use of Haar orthogonal matrices.
  • results: The paper analyzes the bias and variance of the kernel approximation, providing explicit expressions in terms of normalized Bessel functions and sharp exponential bounds that support the view that orthogonal random features are more informative than random Fourier features.
    Abstract Random features have been introduced to scale up kernel methods via randomization techniques. In particular, random Fourier features and orthogonal random features were used to approximate the popular Gaussian kernel. The former is performed by a random Gaussian matrix and leads exactly to the Gaussian kernel after averaging. In this work, we analyze the bias and the variance of the kernel approximation based on orthogonal random features which makes use of Haar orthogonal matrices. We provide explicit expressions for these quantities using normalized Bessel functions and derive sharp exponential bounds supporting the view that orthogonal random features are more informative than random Fourier features.
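
The two feature constructions compared in the abstract can be sketched directly: random Fourier features draw a Gaussian frequency matrix, while orthogonal random features orthogonalize it and rescale the rows by chi-distributed norms. The sketch below uses the standard constructions for the Gaussian kernel with unit bandwidth; it illustrates the objects analyzed in the paper, not the paper's bounds.

```python
# Random Fourier features (RFF) vs. orthogonal random features (ORF) for the Gaussian kernel.
import numpy as np

rng = np.random.default_rng(0)
d, D = 8, 8  # input dimension, number of frequencies (one square block for simplicity)

def rff_matrix():
    return rng.normal(size=(D, d))                  # i.i.d. Gaussian frequencies

def orf_matrix():
    G = rng.normal(size=(D, d))
    Q, _ = np.linalg.qr(G)                          # orthogonal directions
    norms = np.sqrt(rng.chisquare(d, size=D))       # restore Gaussian-like row norms
    return norms[:, None] * Q

def kernel_estimate(W, x, y):
    zx = np.concatenate([np.cos(W @ x), np.sin(W @ x)]) / np.sqrt(D)
    zy = np.concatenate([np.cos(W @ y), np.sin(W @ y)]) / np.sqrt(D)
    return zx @ zy

x, y = rng.normal(size=d), rng.normal(size=d)
exact = np.exp(-0.5 * np.sum((x - y) ** 2))         # Gaussian kernel, unit bandwidth
print("exact:", exact, "RFF:", kernel_estimate(rff_matrix(), x, y),
      "ORF:", kernel_estimate(orf_matrix(), x, y))
```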

Improved Analysis of Sparse Linear Regression in Local Differential Privacy Model

  • paper_url: http://arxiv.org/abs/2310.07367
  • repo_url: None
  • paper_authors: Liyang Zhu, Meng Ding, Vaneet Aggarwal, Jinhui Xu, Di Wang
  • for: This work revisits sparse linear regression in the local differential privacy (LDP) model. Existing research in the non-interactive and sequentially local models has focused on lower bounds for the $1$-sparse case, and extending such bounds to the more general $k$-sparse case has proven challenging; it is also unclear whether efficient non-interactive LDP (NLDP) algorithms exist.
  • methods: The work first considers the problem in the $\epsilon$ non-interactive LDP model and provides a lower bound of $\Omega(\frac{\sqrt{dk\log d}}{\sqrt{n}\epsilon})$ on the $\ell_2$-norm estimation error for sub-Gaussian data, where $n$ is the sample size and $d$ is the dimension of the feature space. It then proposes an innovative NLDP algorithm, the first of its kind for this problem, which also yields an efficient estimator as a by-product and achieves an upper bound of $\tilde{O}(\frac{d\sqrt{k}}{\sqrt{n}\epsilon})$, improvable by a factor of $O(\sqrt{d})$ when the server has additional public but unlabeled data. A similar lower bound of $\Omega(\frac{\sqrt{dk}}{\sqrt{n}\epsilon})$ is shown for the sequentially interactive LDP model.
  • results: The findings reveal fundamental differences between the non-private case, the central DP model, and the local DP model for the sparse linear regression problem.
    Abstract In this paper, we revisit the problem of sparse linear regression in the local differential privacy (LDP) model. Existing research in the non-interactive and sequentially local models has focused on obtaining the lower bounds for the case where the underlying parameter is $1$-sparse, and extending such bounds to the more general $k$-sparse case has proven to be challenging. Moreover, it is unclear whether efficient non-interactive LDP (NLDP) algorithms exist. To address these issues, we first consider the problem in the $\epsilon$ non-interactive LDP model and provide a lower bound of $\Omega(\frac{\sqrt{dk\log d}}{\sqrt{n}\epsilon})$ on the $\ell_2$-norm estimation error for sub-Gaussian data, where $n$ is the sample size and $d$ is the dimension of the space. We propose an innovative NLDP algorithm, the very first of its kind for the problem. As a remarkable outcome, this algorithm also yields a novel and highly efficient estimator as a valuable by-product. Our algorithm achieves an upper bound of $\tilde{O}(\frac{d\sqrt{k}}{\sqrt{n}\epsilon})$ for the estimation error when the data is sub-Gaussian, which can be further improved by a factor of $O(\sqrt{d})$ if the server has additional public but unlabeled data. For the sequentially interactive LDP model, we show a similar lower bound of $\Omega(\frac{\sqrt{dk}}{\sqrt{n}\epsilon})$. As for the upper bound, we rectify a previous method and show that it is possible to achieve a bound of $\tilde{O}(\frac{k\sqrt{d}}{\sqrt{n}\epsilon})$. Our findings reveal fundamental differences between the non-private case, central DP model, and local DP model in the sparse linear regression problem.

GraphControl: Adding Conditional Control to Universal Graph Pre-trained Models for Graph Domain Transfer Learning

  • paper_url: http://arxiv.org/abs/2310.07365
  • repo_url: None
  • paper_authors: Yun Zhu, Yaoke Wang, Haizhou Shi, Zhenshuo Zhang, Siliang Tang
  • for: This paper proposes an innovative deployment module named GraphControl to better realize graph domain transfer learning.
  • methods: The paper combines universal structural graph pre-trained models with a ControlNet-inspired conditioning mechanism, aligning input spaces across graphs and incorporating unique characteristics of the target data as conditional inputs.
  • results: Experiments show that GraphControl markedly improves the adaptability of pre-trained models on target attributed datasets, achieving a 1.4-3x performance gain, outperforming training-from-scratch methods on the target data by a comparable margin, and converging faster.
    Abstract Graph-structured data is ubiquitous in the world which models complex relationships between objects, enabling various Web applications. Daily influxes of unlabeled graph data on the Web offer immense potential for these applications. Graph self-supervised algorithms have achieved significant success in acquiring generic knowledge from abundant unlabeled graph data. These pre-trained models can be applied to various downstream Web applications, saving training time and improving downstream (target) performance. However, different graphs, even across seemingly similar domains, can differ significantly in terms of attribute semantics, posing difficulties, if not infeasibility, for transferring the pre-trained models to downstream tasks. Concretely speaking, for example, the additional task-specific node information in downstream tasks (specificity) is usually deliberately omitted so that the pre-trained representation (transferability) can be leveraged. The trade-off as such is termed as "transferability-specificity dilemma" in this work. To address this challenge, we introduce an innovative deployment module coined as GraphControl, motivated by ControlNet, to realize better graph domain transfer learning. Specifically, by leveraging universal structural pre-trained models and GraphControl, we align the input space across various graphs and incorporate unique characteristics of target data as conditional inputs. These conditions will be progressively integrated into the model during fine-tuning or prompt tuning through ControlNet, facilitating personalized deployment. Extensive experiments show that our method significantly enhances the adaptability of pre-trained models on target attributed datasets, achieving 1.4-3x performance gain. Furthermore, it outperforms training-from-scratch methods on target data with a comparable margin and exhibits faster convergence.

Atom-Motif Contrastive Transformer for Molecular Property Prediction

  • paper_url: http://arxiv.org/abs/2310.07351
  • repo_url: None
  • paper_authors: Wentao Yu, Shuo Chen, Chen Gong, Gang Niu, Masashi Sugiyama
  • for: This paper aims to improve molecular property prediction (MPP).
  • methods: The Atom-Motif Contrastive Transformer (AMCT) model is used, which considers not only interactions between individual atoms in a molecule but also interactions among important motifs (e.g., functional groups consisting of several atoms).
  • results: Compared with state-of-the-art methods, the approach predicts molecular properties more accurately and can identify the motifs that are critical for each molecule's properties.
    Abstract Recently, Graph Transformer (GT) models have been widely used in the task of Molecular Property Prediction (MPP) due to their high reliability in characterizing the latent relationship among graph nodes (i.e., the atoms in a molecule). However, most existing GT-based methods usually explore the basic interactions between pairwise atoms, and thus they fail to consider the important interactions among critical motifs (e.g., functional groups consisted of several atoms) of molecules. As motifs in a molecule are significant patterns that are of great importance for determining molecular properties (e.g., toxicity and solubility), overlooking motif interactions inevitably hinders the effectiveness of MPP. To address this issue, we propose a novel Atom-Motif Contrastive Transformer (AMCT), which not only explores the atom-level interactions but also considers the motif-level interactions. Since the representations of atoms and motifs for a given molecule are actually two different views of the same instance, they are naturally aligned to generate the self-supervisory signals for model training. Meanwhile, the same motif can exist in different molecules, and hence we also employ the contrastive loss to maximize the representation agreement of identical motifs across different molecules. Finally, in order to clearly identify the motifs that are critical in deciding the properties of each molecule, we further construct a property-aware attention mechanism into our learning framework. Our proposed AMCT is extensively evaluated on seven popular benchmark datasets, and both quantitative and qualitative results firmly demonstrate its effectiveness when compared with the state-of-the-art methods.
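
The motif-level contrastive idea can be illustrated with an InfoNCE-style objective in which representations of the same motif observed in two different molecules form positive pairs. The sketch below is an assumption about the general form of such a loss, not AMCT's exact objective; the temperature and embedding sizes are illustrative.

```python
# Illustrative motif-level contrastive loss (InfoNCE-style; not AMCT's exact formulation).
import torch
import torch.nn.functional as F

def motif_contrastive_loss(z_a, z_b, temperature=0.2):
    """z_a[i] and z_b[i] embed the same motif observed in two different molecules."""
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature   # similarity of every motif pair
    targets = torch.arange(z_a.size(0))    # matching rows are the positive pairs
    return F.cross_entropy(logits, targets)

z_a, z_b = torch.randn(8, 16), torch.randn(8, 16)
print(motif_contrastive_loss(z_a, z_b).item())
```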

Towards Foundation Models for Learning on Tabular Data

  • paper_url: http://arxiv.org/abs/2310.07338
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Han Zhang, Xumeng Wen, Shun Zheng, Wei Xu, Jiang Bian
  • for: To improve learning on tabular data and provide transferable models for new tasks.
  • methods: Generative tabular learning is employed, using a pre-trained large language model (LLM) as the base model and fine-tuning it with purpose-designed objectives on a wide range of tabular datasets.
  • results: On instruction-following tasks such as zero-shot and in-context inference, TabFM performs strongly, approaching and in some cases surpassing closed-source LLMs like GPT-4; when fine-tuned with scarce data it shows remarkable efficiency and remains competitive with abundant training data.
    Abstract Learning on tabular data underpins numerous real-world applications. Despite considerable efforts in developing effective learning models for tabular data, current transferable tabular models remain in their infancy, limited by either the lack of support for direct instruction following in new tasks or the neglect of acquiring foundational knowledge and capabilities from diverse tabular datasets. In this paper, we propose Tabular Foundation Models (TabFMs) to overcome these limitations. TabFMs harness the potential of generative tabular learning, employing a pre-trained large language model (LLM) as the base model and fine-tuning it using purpose-designed objectives on an extensive range of tabular datasets. This approach endows TabFMs with a profound understanding and universal capabilities essential for learning on tabular data. Our evaluations underscore TabFM's effectiveness: not only does it significantly excel in instruction-following tasks like zero-shot and in-context inference, but it also showcases performance that approaches, and in instances, even transcends, the renowned yet mysterious closed-source LLMs like GPT-4. Furthermore, when fine-tuning with scarce data, our model achieves remarkable efficiency and maintains competitive performance with abundant training data. Finally, while our results are promising, we also delve into TabFM's limitations and potential opportunities, aiming to stimulate and expedite future research on developing more potent TabFMs.

Multichannel consecutive data cross-extraction with 1DCNN-attention for diagnosis of power transformer

  • paper_url: http://arxiv.org/abs/2310.07323
  • repo_url: None
  • paper_authors: Wei Zheng, Guogang Zhang, Chenchen Zhao, Qianqian Zhu
  • for: This work proposes a power transformer diagnosis method based on multichannel consecutive data in order to better capture the transformer's condition.
  • methods: The method is built on multichannel consecutive data cross-extraction (MCDC) and introduces a one-dimensional convolutional neural network attention (1DCNN-attention) mechanism to improve diagnostic performance while simplifying spatial complexity.
  • results: Experiments on data collected from real power transformer operation cases show that MCDC achieves higher diagnostic accuracy and better generalization than other methods, and the 1DCNN-attention mechanism provides more stable results.
    Abstract Power transformer plays a critical role in grid infrastructure, and its diagnosis is paramount for maintaining stable operation. However, the current methods for transformer diagnosis focus on discrete dissolved gas analysis, neglecting deep feature extraction of multichannel consecutive data. The unutilized sequential data contains the significant temporal information reflecting the transformer condition. In light of this, the structure of multichannel consecutive data cross-extraction (MCDC) is proposed in this article in order to comprehensively exploit the intrinsic characteristic and evaluate the states of transformer. Moreover, for the better accommodation in scenario of transformer diagnosis, one dimensional convolution neural network attention (1DCNN-attention) mechanism is introduced and offers a more efficient solution given the simplified spatial complexity. Finally, the effectiveness of MCDC and the superior generalization ability, compared with other algorithms, are validated in experiments conducted on a dataset collected from real operation cases of power transformer. Additionally, the better stability of 1DCNN-attention has also been certified.
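
One plausible form of a 1DCNN-attention block for multichannel consecutive data is sketched below: 1-D convolutions over time followed by a softmax attention pooling across time steps. The layer sizes and the number of fault classes are assumptions; the paper's exact architecture and the MCDC data arrangement are not reproduced.

```python
# Illustrative 1-D CNN with temporal attention for multichannel sensor sequences.
import torch
import torch.nn as nn

class CNN1DAttention(nn.Module):
    def __init__(self, channels=5, n_classes=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=5, padding=2), nn.ReLU())
        self.attn_score = nn.Linear(32, 1)
        self.fc = nn.Linear(32, n_classes)

    def forward(self, x):                                 # x: (batch, channels, time)
        h = self.conv(x).transpose(1, 2)                  # (batch, time, 32)
        w = torch.softmax(self.attn_score(h), dim=1)      # attention over time steps
        context = (w * h).sum(dim=1)                      # attention-weighted temporal summary
        return self.fc(context)                           # class logits

model = CNN1DAttention()
print(model(torch.randn(2, 5, 100)).shape)  # torch.Size([2, 4])
```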

Byzantine-Resilient Decentralized Multi-Armed Bandits

  • paper_url: http://arxiv.org/abs/2310.07320
  • repo_url: None
  • paper_authors: Jingxuan Zhu, Alec Koppel, Alvaro Velasquez, Ji Liu
  • for: This paper focuses on the problem of decentralized cooperative multi-armed bandits (MAB) in a setting where some agents may be Byzantine (i.e., they may provide arbitrary wrong information). The goal is to develop a fully decentralized resilient algorithm that can recover the salient behavior of a cooperative setting even in the presence of Byzantine agents.
  • methods: The proposed algorithm uses an information mixing step among agents and a truncation of inconsistent and extreme values to fuse the information. The algorithm is based on the Upper-Confidence Bound (UCB) method, but with a modification to handle the Byzantine agents.
  • results: The paper shows that the proposed algorithm can achieve a regret that is no worse than the classic single-agent UCB1 algorithm, and the cumulative regret of all normal agents is strictly better than the non-cooperative case, as long as each agent has at least 3f+1 neighbors where f is the maximum possible Byzantine agents in each agent’s neighborhood. The paper also establishes extensions to time-varying neighbor graphs and minimax lower bounds on the achievable regret. Experiments corroborate the merits of the proposed framework in practice.
    Abstract In decentralized cooperative multi-armed bandits (MAB), each agent observes a distinct stream of rewards, and seeks to exchange information with others to select a sequence of arms so as to minimize its regret. Agents in the cooperative setting can outperform a single agent running a MAB method such as Upper-Confidence Bound (UCB) independently. In this work, we study how to recover such salient behavior when an unknown fraction of the agents can be Byzantine, that is, communicate arbitrarily wrong information in the form of reward mean-estimates or confidence sets. This framework can be used to model attackers in computer networks, instigators of offensive content into recommender systems, or manipulators of financial markets. Our key contribution is the development of a fully decentralized resilient upper confidence bound (UCB) algorithm that fuses an information mixing step among agents with a truncation of inconsistent and extreme values. This truncation step enables us to establish that the performance of each normal agent is no worse than the classic single-agent UCB1 algorithm in terms of regret, and more importantly, the cumulative regret of all normal agents is strictly better than the non-cooperative case, provided that each agent has at least 3f+1 neighbors where f is the maximum possible Byzantine agents in each agent's neighborhood. Extensions to time-varying neighbor graphs, and minimax lower bounds are further established on the achievable regret. Experiments corroborate the merits of this framework in practice.
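The core mechanism, fusing neighbors' reported mean estimates while truncating extreme values before applying a UCB rule, can be sketched as follows. This is a simplified, single-agent illustration under assumed names and a plain trimmed-mean rule; the paper's truncation and information-mixing steps are more involved.

```python
import numpy as np

def trimmed_fusion(own_mean, neighbor_means, f):
    """Fuse neighbors' reported mean estimates, discarding the f largest and f smallest
    reports before averaging with the agent's own estimate (illustrative rule only)."""
    reports = np.sort(np.asarray(neighbor_means))
    kept = reports[f:len(reports) - f]          # drop f extreme values on each side
    return np.mean(np.concatenate(([own_mean], kept)))

def ucb_index(fused_mean, pull_count, t, c=2.0):
    """Standard UCB1-style index built on the fused mean estimate."""
    return fused_mean + np.sqrt(c * np.log(t) / max(pull_count, 1))

# toy usage: one agent, 3 arms, 7 neighbors of which up to f=2 may be Byzantine
rng = np.random.default_rng(0)
f, t = 2, 100
own_means = np.array([0.4, 0.55, 0.5])
pulls = np.array([30, 40, 30])
indices = []
for k in range(3):
    neighbor_reports = rng.normal(own_means[k], 0.05, size=7)
    neighbor_reports[:f] = 10.0                 # Byzantine neighbors report absurd values
    fused = trimmed_fusion(own_means[k], neighbor_reports, f)
    indices.append(ucb_index(fused, pulls[k], t))
print("chosen arm:", int(np.argmax(indices)))
```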

Molecule-Edit Templates for Efficient and Accurate Retrosynthesis Prediction

  • paper_url: http://arxiv.org/abs/2310.07313
  • repo_url: None
  • paper_authors: Mikołaj Sacha, Michał Sadowski, Piotr Kozakowski, Ruard van Workum, Stanisław Jastrzębski
  • for: This work develops a machine-learning-based retrosynthesis method that predicts the synthetic steps of complex molecules in a more efficient and interpretable way.
  • methods: It uses a machine learning model named METRO (Molecule-Edit Templates for RetrOsynthesis), which predicts reactions with minimal templates (simplified reaction patterns capturing only essential molecular changes), reducing computational overhead.
  • results: METRO accurately predicts the synthetic steps of complex molecules, achieves state-of-the-art results on standard benchmarks, and is more efficient and interpretable than conventional template-based methods.
    Abstract Retrosynthesis involves determining a sequence of reactions to synthesize complex molecules from simpler precursors. As this poses a challenge in organic chemistry, machine learning has offered solutions, particularly for predicting possible reaction substrates for a given target molecule. These solutions mainly fall into template-based and template-free categories. The former is efficient but relies on a vast set of predefined reaction patterns, while the latter, though more flexible, can be computationally intensive and less interpretable. To address these issues, we introduce METRO (Molecule-Edit Templates for RetrOsynthesis), a machine-learning model that predicts reactions using minimal templates - simplified reaction patterns capturing only essential molecular changes - reducing computational overhead and achieving state-of-the-art results on standard benchmarks.

Score Regularized Policy Optimization through Diffusion Behavior

  • paper_url: http://arxiv.org/abs/2310.07297
  • repo_url: https://github.com/thu-ml/srpo
  • paper_authors: Huayu Chen, Cheng Lu, Zhengyi Wang, Hang Su, Jun Zhu
  • for: To enable fast action sampling in offline reinforcement learning without the time- and resource-consuming iterative diffusion sampling scheme.
  • methods: Leveraging critic models and pretrained diffusion behavior models, the method extracts an efficient deterministic inference policy and directly regularizes the policy gradient with the score function of the behavior distribution during optimization.
  • results: On D4RL locomotion tasks, the method boosts action sampling speed substantially compared with leading diffusion-based methods while maintaining state-of-the-art performance.
    Abstract Recent developments in offline reinforcement learning have uncovered the immense potential of diffusion modeling, which excels at representing heterogeneous behavior policies. However, sampling from diffusion policies is considerably slow because it necessitates tens to hundreds of iterative inference steps for one action. To address this issue, we propose to extract an efficient deterministic inference policy from critic models and pretrained diffusion behavior models, leveraging the latter to directly regularize the policy gradient with the behavior distribution's score function during optimization. Our method enjoys powerful generative capabilities of diffusion modeling while completely circumventing the computationally intensive and time-consuming diffusion sampling scheme, both during training and evaluation. Extensive results on D4RL tasks show that our method boosts action sampling speed by more than 25 times compared with various leading diffusion-based methods in locomotion tasks, while still maintaining state-of-the-art performance.

Why Does Sharpness-Aware Minimization Generalize Better Than SGD?

  • paper_url: http://arxiv.org/abs/2310.07269
  • repo_url: None
  • paper_authors: Zixiang Chen, Junkai Zhang, Yiwen Kou, Xiangning Chen, Cho-Jui Hsieh, Quanquan Gu
  • for: To address the challenge of overfitting when training large neural networks; Sharpness-Aware Minimization (SAM) can improve generalization even in the presence of label noise.
  • methods: The paper studies SAM on nonlinear neural networks and classification tasks to explain why it succeeds in these settings.
  • results: For a certain data model and two-layer convolutional ReLU networks, the paper proves that SAM generalizes better than Stochastic Gradient Descent (SGD) in some cases, and corroborates this with experiments.
    Abstract The challenge of overfitting, in which the model memorizes the training data and fails to generalize to test data, has become increasingly significant in the training of large neural networks. To tackle this challenge, Sharpness-Aware Minimization (SAM) has emerged as a promising training method, which can improve the generalization of neural networks even in the presence of label noise. However, a deep understanding of how SAM works, especially in the setting of nonlinear neural networks and classification tasks, remains largely missing. This paper fills this gap by demonstrating why SAM generalizes better than Stochastic Gradient Descent (SGD) for a certain data model and two-layer convolutional ReLU networks. The loss landscape of our studied problem is nonsmooth, thus current explanations for the success of SAM based on the Hessian information are insufficient. Our result explains the benefits of SAM, particularly its ability to prevent noise learning in the early stages, thereby facilitating more effective learning of features. Experiments on both synthetic and real data corroborate our theory.
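The SAM update itself is easy to state: take a normalized ascent step of radius rho to a perturbed point, then descend using the gradient evaluated there. Below is a minimal numpy sketch on a toy logistic loss; the loss, step sizes, and data are illustrative choices, not the paper's setting.

```python
import numpy as np

def loss(w, X, y):
    """Toy logistic-style loss; stands in for the nonsmooth objectives studied in the paper."""
    z = X @ w
    return np.mean(np.log1p(np.exp(-y * z)))

def grad(w, X, y):
    z = X @ w
    s = -y / (1.0 + np.exp(y * z))               # d/dz of log(1 + exp(-y z))
    return X.T @ s / len(y)

def sam_step(w, X, y, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization step: ascend to the (first-order) worst-case point
    within an L2 ball of radius rho, then descend using the gradient taken there."""
    g = grad(w, X, y)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # first-order approximation of the inner max
    g_adv = grad(w + eps, X, y)                  # gradient at the perturbed weights
    return w - lr * g_adv

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = np.sign(X @ w_true + 0.3 * rng.normal(size=200))
w = np.zeros(5)
for _ in range(100):
    w = sam_step(w, X, y)
print("final loss:", round(float(loss(w, X, y)), 4))
```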

RaftFed: A Lightweight Federated Learning Framework for Vehicular Crowd Intelligence

  • paper_url: http://arxiv.org/abs/2310.07268
  • repo_url: None
  • paper_authors: Changan Yang, Yaxing Chen, Yao Zhang, Helei Cui, Zhiwen Yu, Bin Guo, Zheng Yan, Zijiang Yang
  • for: This work addresses data privacy in vehicular crowd intelligence applications by using federated learning (FL).
  • methods: It proposes RaftFed, a novel federated learning framework for privacy-preserving vehicular crowd intelligence; RaftFed uses the Raft protocol for decentralized model aggregation and accommodates non-Independent Identically Distributed (non-IID) data.
  • results: Experiments show that RaftFed outperforms baselines in communication overhead, model accuracy, and model convergence.
    Abstract Vehicular crowd intelligence (VCI) is an emerging research field. Facilitated by state-of-the-art vehicular ad-hoc networks and artificial intelligence, various VCI applications come to place, e.g., collaborative sensing, positioning, and mapping. The collaborative property of VCI applications generally requires data to be shared among participants, thus forming network-wide intelligence. How to fulfill this process without compromising data privacy remains a challenging issue. Although federated learning (FL) is a promising tool to solve the problem, adapting conventional FL frameworks to VCI is nontrivial. First, the centralized model aggregation is unreliable in VCI because of the existence of stragglers with unfavorable channel conditions. Second, existing FL schemes are vulnerable to Non-IID data, which is intensified by the data heterogeneity in VCI. This paper proposes a novel federated learning framework called RaftFed to facilitate privacy-preserving VCI. The experimental results show that RaftFed performs better than baselines regarding communication overhead, model accuracy, and model convergence.

Classification of Dysarthria based on the Levels of Severity. A Systematic Review

  • paper_url: http://arxiv.org/abs/2310.07264
  • repo_url: None
  • paper_authors: Afnan Al-Ali, Somaya Al-Maadeed, Moutaz Saleh, Rani Chinnappa Naidu, Zachariah C Alex, Prakash Ramachandran, Rajeev Khoodeeram, Rajesh Kumar M
  • for: This review aims to support more accurate and reliable diagnosis of dysarthria and, ultimately, to improve affected individuals' communication abilities and quality of life.
  • methods: It surveys artificial intelligence techniques, in particular machine learning algorithms, for automatically classifying dysarthria by severity level.
  • results: The review identifies the most effective features and techniques for automatic severity classification, improving the accuracy and reliability of diagnosis.
    Abstract Dysarthria is a neurological speech disorder that can significantly impact affected individuals' communication abilities and overall quality of life. The accurate and objective classification of dysarthria and the determination of its severity are crucial for effective therapeutic intervention. While traditional assessments by speech-language pathologists (SLPs) are common, they are often subjective, time-consuming, and can vary between practitioners. Emerging machine learning-based models have shown the potential to provide a more objective dysarthria assessment, enhancing diagnostic accuracy and reliability. This systematic review aims to comprehensively analyze current methodologies for classifying dysarthria based on severity levels. Specifically, this review will focus on determining the most effective set and type of features that can be used for automatic patient classification and evaluating the best AI techniques for this purpose. We will systematically review the literature on the automatic classification of dysarthria severity levels. Sources of information will include electronic databases and grey literature. Selection criteria will be established based on relevance to the research questions. Data extraction will include methodologies used, the type of features extracted for classification, and AI techniques employed. The findings of this systematic review will contribute to the current understanding of dysarthria classification, inform future research, and support the development of improved diagnostic tools. The implications of these findings could be significant in advancing patient care and improving therapeutic outcomes for individuals affected by dysarthria.

Deep ReLU networks and high-order finite element methods II: Chebyshev emulation

  • paper_url: http://arxiv.org/abs/2310.07261
  • repo_url: None
  • paper_authors: Joost A. A. Opschoor, Christoph Schwab
  • for: The paper studies expression rates and stability in Sobolev norms of deep ReLU neural networks (NNs), in terms of the number of parameters defining the NN.
  • methods: It develops novel constructions of ReLU NN surrogates that encode the approximated functions via Chebyshev polynomial expansion coefficients; these coefficients can be computed from function values at the Clenshaw-Curtis points using the inverse fast Fourier transform.
  • results: The paper obtains bounds on expression rates and stability that are superior to constructions based on ReLU NN emulation of monomials, and provides ReLU NN emulation error estimates for various classes of functions and norms commonly encountered in numerical analysis.
    Abstract Expression rates and stability in Sobolev norms of deep ReLU neural networks (NNs) in terms of the number of parameters defining the NN for continuous, piecewise polynomial functions, on arbitrary, finite partitions $\mathcal{T}$ of a bounded interval $(a,b)$ are addressed. Novel constructions of ReLU NN surrogates encoding the approximated functions in terms of Chebyshev polynomial expansion coefficients are developed. Chebyshev coefficients can be computed easily from the values of the function in the Clenshaw--Curtis points using the inverse fast Fourier transform. Bounds on expression rates and stability that are superior to those of constructions based on ReLU NN emulations of monomials considered in [Opschoor, Petersen, Schwab, 2020] are obtained. All emulation bounds are explicit in terms of the (arbitrary) partition of the interval, the target emulation accuracy and the polynomial degree in each element of the partition. ReLU NN emulation error estimates are provided for various classes of functions and norms, commonly encountered in numerical analysis. In particular, we show exponential ReLU emulation rate bounds for analytic functions with point singularities and develop an interface between Chebfun approximations and constructive ReLU NN emulations.
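The building block the construction relies on, recovering Chebyshev expansion coefficients from function values at the Clenshaw-Curtis points with an FFT, is a standard recipe and can be sketched directly. The snippet below illustrates only this coefficient step (not the ReLU emulation itself); the test function and degree are arbitrary.

```python
import numpy as np

def chebyshev_coeffs_clenshaw_curtis(f, n):
    """Chebyshev expansion coefficients a_0..a_n of f on [-1, 1], computed from function
    values at the Clenshaw-Curtis points x_j = cos(pi*j/n) via a real FFT.
    Standard recipe; exact for polynomials of degree <= n, spectrally accurate otherwise."""
    j = np.arange(n + 1)
    x = np.cos(np.pi * j / n)                    # Clenshaw-Curtis (Chebyshev-Lobatto) points
    fx = f(x)
    g = np.concatenate([fx, fx[-2:0:-1]])        # even extension, length 2n
    G = np.fft.rfft(g).real                      # G_k = f_0 + (-1)^k f_n + 2*sum_{j=1}^{n-1} f_j cos(pi j k / n)
    a = G[: n + 1] / n
    a[0] /= 2.0
    a[-1] /= 2.0
    return a

def cheb_eval(a, x):
    """Evaluate sum_k a_k T_k(x) with numpy's Chebyshev utilities."""
    return np.polynomial.chebyshev.chebval(x, a)

# sanity check on a smooth function
f = lambda x: np.exp(x) * np.sin(3 * x)
a = chebyshev_coeffs_clenshaw_curtis(f, 32)
xs = np.linspace(-1, 1, 7)
print(np.max(np.abs(cheb_eval(a, xs) - f(xs))))  # should be close to machine precision
```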

CacheGen: Fast Context Loading for Language Model Applications

  • paper_url: http://arxiv.org/abs/2310.07240
  • repo_url: None
  • paper_authors: Yuhan Liu, Hanchen Li, Kuntai Du, Jiayi Yao, Yihua Cheng, Yuyang Huang, Shan Lu, Michael Maire, Henry Hoffmann, Ari Holtzman, Ganesh Ananthanarayanan, Junchen Jiang
  • for: This paper aims to improve the efficiency of large language models (LLMs) by minimizing the delays in fetching and processing contexts.
  • methods: The paper proposes a novel encoder that compresses key-value (KV) features into more compact bitstream representations, taking advantage of the KV features' distributional properties. Additionally, the paper uses a controller to determine when to load the context as compressed KV features or raw text and picks the appropriate compression level.
  • results: Compared to recent methods that handle long contexts, the proposed method reduces bandwidth usage by 3.7-4.3x and the total delay in fetching and processing contexts by 2.7-3x while maintaining similar LLM performance on various tasks.
    Abstract As large language models (LLMs) take on more complex tasks, their inputs incorporate longer contexts to respond to questions that require domain knowledge or user-specific conversational histories. Yet, using long contexts poses a challenge for responsive LLM systems, as nothing can be generated until all the contexts are fetched to and processed by the LLM. Existing systems optimize only the computation delay in context processing (e.g., by caching intermediate key-value features of the text context) but often cause longer network delays in context fetching (e.g., key-value features consume orders of magnitude larger bandwidth than the text context). This paper presents CacheGen to minimize the delays in fetching and processing contexts for LLMs. CacheGen reduces the bandwidth needed for transmitting long contexts' key-value (KV) features through a novel encoder that compresses KV features into more compact bitstream representations. The encoder combines adaptive quantization with a tailored arithmetic coder, taking advantage of the KV features' distributional properties, such as locality across tokens. Furthermore, CacheGen minimizes the total delay in fetching and processing a context by using a controller that determines when to load the context as compressed KV features or raw text and picks the appropriate compression level if loaded as KV features. We test CacheGen on three models of various sizes and three datasets of different context lengths. Compared to recent methods that handle long contexts, CacheGen reduces bandwidth usage by 3.7-4.3x and the total delay in fetching and processing contexts by 2.7-3x while maintaining similar LLM performance on various tasks as loading the text contexts.
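The bandwidth saving comes from representing KV features with a few bits per value instead of full-precision floats. The sketch below shows plain uniform quantization and dequantization of a made-up KV tensor to illustrate the size/error trade-off; CacheGen's adaptive quantization and tailored arithmetic coder are omitted, and all shapes and bit-widths are illustrative assumptions.

```python
import numpy as np

def quantize(kv, num_bits):
    """Uniformly quantize a float32 KV tensor to num_bits-level integer codes.
    Returns the codes plus the (scale, offset) needed to dequantize."""
    lo, hi = float(kv.min()), float(kv.max())
    levels = 2 ** num_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((kv - lo) / scale).astype(np.uint16)
    return codes, scale, lo

def dequantize(codes, scale, lo):
    return codes.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
kv = rng.normal(size=(32, 1024, 128)).astype(np.float32)   # (layers*heads, tokens, head_dim) -- made up
for bits in (8, 4, 2):
    codes, scale, lo = quantize(kv, bits)
    err = np.mean(np.abs(dequantize(codes, scale, lo) - kv))
    ratio = 32 / bits                                        # vs. float32 storage, before entropy coding
    print(f"{bits}-bit: ~{ratio:.0f}x smaller (pre-coding), mean abs error {err:.4f}")
```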

Are GATs Out of Balance?

  • paper_url: http://arxiv.org/abs/2310.07235
  • repo_url: None
  • paper_authors: Nimrah Mustafa, Aleksandar Bojchevski, Rebekka Burkholz
  • for: The study investigates the optimization and learning dynamics of graph neural networks (GNNs), focusing on the Graph Attention Network (GAT) architecture.
  • methods: For GATs, whose neighborhood aggregation is weighted by parameterized attention coefficients, the authors derive a conservation law of gradient flow dynamics that explains why a large portion of parameters in GATs with standard initialization struggle to change during training.
  • results: The authors devise a balanced initialization scheme that allows more effective propagation of gradients, making deeper GAT networks trainable and attaining a considerable speedup in training and convergence time.
    Abstract While the expressive power and computational capabilities of graph neural networks (GNNs) have been theoretically studied, their optimization and learning dynamics, in general, remain largely unexplored. Our study undertakes the Graph Attention Network (GAT), a popular GNN architecture in which a node's neighborhood aggregation is weighted by parameterized attention coefficients. We derive a conservation law of GAT gradient flow dynamics, which explains why a high portion of parameters in GATs with standard initialization struggle to change during training. This effect is amplified in deeper GATs, which perform significantly worse than their shallow counterparts. To alleviate this problem, we devise an initialization scheme that balances the GAT network. Our approach i) allows more effective propagation of gradients and in turn enables trainability of deeper networks, and ii) attains a considerable speedup in training and convergence time in comparison to the standard initialization. Our main theorem serves as a stepping stone to studying the learning dynamics of positive homogeneous models with attention mechanisms.

Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality

  • paper_url: http://arxiv.org/abs/2310.07234
  • repo_url: https://github.com/thu-ml/hide-prompt
  • paper_authors: Liyuan Wang, Jingyi Xie, Xingxing Zhang, Mingyi Huang, Hang Su, Jun Zhu
  • for: To improve the performance of prompt-based continual learning, especially under the more realistic self-supervised pre-training.
  • methods: The paper proposes Hierarchical Decomposition (HiDe-)Prompt, which explicitly optimizes hierarchical components with an ensemble of task-specific prompts and statistics of both uninstructed and instructed representations, coordinated by a contrastive regularization strategy.
  • results: Extensive experiments on Split CIFAR-100 and Split ImageNet-R show superior performance (e.g., leads of up to 15.01% and 9.61%, respectively) and robustness to different pre-training paradigms.
    Abstract Prompt-based continual learning is an emerging direction in leveraging pre-trained knowledge for downstream continual learning, and has almost reached the performance pinnacle under supervised pre-training. However, our empirical research reveals that the current strategies fall short of their full potential under the more realistic self-supervised pre-training, which is essential for handling vast quantities of unlabeled data in practice. This is largely due to the difficulty of task-specific knowledge being incorporated into instructed representations via prompt parameters and predicted by uninstructed representations at test time. To overcome the exposed sub-optimality, we conduct a theoretical analysis of the continual learning objective in the context of pre-training, and decompose it into hierarchical components: within-task prediction, task-identity inference, and task-adaptive prediction. Following these empirical and theoretical insights, we propose Hierarchical Decomposition (HiDe-)Prompt, an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics of both uninstructed and instructed representations, further with the coordination of a contrastive regularization strategy. Our extensive experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning (e.g., up to 15.01% and 9.61% lead on Split CIFAR-100 and Split ImageNet-R, respectively). Our code is available at \url{https://github.com/thu-ml/HiDe-Prompt}.

Self-supervised Pocket Pretraining via Protein Fragment-Surroundings Alignment

  • paper_url: http://arxiv.org/abs/2310.07229
  • repo_url: None
  • paper_authors: Bowen Gao, Yinjun Jia, Yuanle Mo, Yuyan Ni, Weiying Ma, Zhiming Ma, Yanyan Lan
  • for: This work aims to improve pocket representations for biomedical applications such as druggability estimation, ligand binding affinity prediction, and de novo drug design.
  • methods: It introduces a novel pocket pretraining approach that segments protein structures into drug-like fragments and their corresponding pockets, and then uses highly effective pretrained small-molecule representations to guide the learning of pocket representations.
  • results: The resulting model, ProFSA, achieves state-of-the-art performance across tasks including pocket druggability prediction, pocket matching, and ligand binding affinity prediction; it surpasses other pretraining methods by a substantial margin and opens a new avenue for mitigating the scarcity of protein-ligand complex data.
    Abstract Pocket representations play a vital role in various biomedical applications, such as druggability estimation, ligand affinity prediction, and de novo drug design. While existing geometric features and pretrained representations have demonstrated promising results, they usually treat pockets independent of ligands, neglecting the fundamental interactions between them. However, the limited pocket-ligand complex structures available in the PDB database (less than 100 thousand non-redundant pairs) hampers large-scale pretraining endeavors for interaction modeling. To address this constraint, we propose a novel pocket pretraining approach that leverages knowledge from high-resolution atomic protein structures, assisted by highly effective pretrained small molecule representations. By segmenting protein structures into drug-like fragments and their corresponding pockets, we obtain a reasonable simulation of ligand-receptor interactions, resulting in the generation of over 5 million complexes. Subsequently, the pocket encoder is trained in a contrastive manner to align with the representation of pseudo-ligand furnished by some pretrained small molecule encoders. Our method, named ProFSA, achieves state-of-the-art performance across various tasks, including pocket druggability prediction, pocket matching, and ligand binding affinity prediction. Notably, ProFSA surpasses other pretraining methods by a substantial margin. Moreover, our work opens up a new avenue for mitigating the scarcity of protein-ligand complex data through the utilization of high-quality and diverse protein structure databases.
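The pretraining signal is contrastive: a pocket embedding should align with the representation of its own pseudo-ligand among all fragments in a batch. A minimal InfoNCE-style loss over precomputed embeddings is sketched below; the embedding sizes, temperature, and the plain InfoNCE form are illustrative assumptions rather than the paper's exact objective.

```python
import numpy as np

def info_nce(pocket_emb, ligand_emb, temperature=0.07):
    """Contrastive (InfoNCE-style) loss: row i of pocket_emb should match row i of
    ligand_emb; every other row in the batch serves as a negative."""
    p = pocket_emb / np.linalg.norm(pocket_emb, axis=1, keepdims=True)
    l = ligand_emb / np.linalg.norm(ligand_emb, axis=1, keepdims=True)
    logits = p @ l.T / temperature                 # (batch, batch) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # maximize the matched-pair probability

rng = np.random.default_rng(0)
ligand = rng.normal(size=(8, 64))                  # pretrained small-molecule embeddings (frozen)
pocket = ligand + 0.1 * rng.normal(size=(8, 64))   # pocket encoder outputs, nearly aligned
print("loss:", round(float(info_nce(pocket, ligand)), 4))
```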

COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL

  • paper_url: http://arxiv.org/abs/2310.07220
  • repo_url: None
  • paper_authors: Xiyao Wang, Ruijie Zheng, Yanchao Sun, Ruonan Jia, Wichayaporn Wongkamjan, Huazhe Xu, Furong Huang
  • for: To improve the sample efficiency and asymptotic performance of model-based reinforcement learning methods.
  • methods: COPlanner uses conservative model rollouts and optimistic real-environment exploration, driven by multi-step uncertainty estimates from an uncertainty-aware policy-guided model predictive control (UP-MPC) component, to avoid model-uncertain regions and mitigate the influence of model prediction error.
  • results: On a series of proprioceptive and visual continuous control tasks, combining COPlanner with strong model-based methods significantly improves both sample efficiency and asymptotic performance.
    Abstract Dyna-style model-based reinforcement learning contains two phases: model rollouts to generate sample for policy learning and real environment exploration using current policy for dynamics model learning. However, due to the complex real-world environment, it is inevitable to learn an imperfect dynamics model with model prediction error, which can further mislead policy learning and result in sub-optimal solutions. In this paper, we propose $\texttt{COPlanner}$, a planning-driven framework for model-based methods to address the inaccurately learned dynamics model problem with conservative model rollouts and optimistic environment exploration. $\texttt{COPlanner}$ leverages an uncertainty-aware policy-guided model predictive control (UP-MPC) component to plan for multi-step uncertainty estimation. This estimated uncertainty then serves as a penalty during model rollouts and as a bonus during real environment exploration respectively, to choose actions. Consequently, $\texttt{COPlanner}$ can avoid model uncertain regions through conservative model rollouts, thereby alleviating the influence of model error. Simultaneously, it explores high-reward model uncertain regions to reduce model error actively through optimistic real environment exploration. $\texttt{COPlanner}$ is a plug-and-play framework that can be applied to any dyna-style model-based methods. Experimental results on a series of proprioceptive and visual continuous control tasks demonstrate that both sample efficiency and asymptotic performance of strong model-based methods are significantly improved combined with $\texttt{COPlanner}$.

Enhancing Neural Architecture Search with Multiple Hardware Constraints for Deep Learning Model Deployment on Tiny IoT Devices

  • paper_url: http://arxiv.org/abs/2310.07217
  • repo_url: None
  • paper_authors: Alessio Burrello, Matteo Risso, Beatrice Alessandra Motetti, Enrico Macii, Luca Benini, Daniele Jahier Pagliari
  • for: This work addresses the deployment of deep learning models on tiny IoT devices, whose hardware is too constrained for the complexity and computational cost of typical DL models.
  • methods: It incorporates multiple hardware constraints (memory and latency) into Differentiable Neural Architecture Search, so that a single search produces a model that satisfies user-defined constraints on both.
  • results: With a single search, the approach reduces memory and latency by 87.4% and 54.2%, respectively, while maintaining accuracy that is non-inferior to state-of-the-art hand-tuned deep neural networks for TinyML.
    Abstract The rapid proliferation of computing domains relying on Internet of Things (IoT) devices has created a pressing need for efficient and accurate deep-learning (DL) models that can run on low-power devices. However, traditional DL models tend to be too complex and computationally intensive for typical IoT end-nodes. To address this challenge, Neural Architecture Search (NAS) has emerged as a popular design automation technique for co-optimizing the accuracy and complexity of deep neural networks. Nevertheless, existing NAS techniques require many iterations to produce a network that adheres to specific hardware constraints, such as the maximum memory available on the hardware or the maximum latency allowed by the target application. In this work, we propose a novel approach to incorporate multiple constraints into so-called Differentiable NAS optimization methods, which allows the generation, in a single shot, of a model that respects user-defined constraints on both memory and latency in a time comparable to a single standard training. The proposed approach is evaluated on five IoT-relevant benchmarks, including the MLPerf Tiny suite and Tiny ImageNet, demonstrating that, with a single search, it is possible to reduce memory and latency by 87.4% and 54.2%, respectively (as defined by our targets), while ensuring non-inferior accuracy on state-of-the-art hand-tuned deep neural networks for TinyML.
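Differentiable NAS typically relaxes the choice among candidate operations with a softmax over architecture parameters, and hardware constraints can then enter the loss as differentiable penalties on expected costs. The toy sketch below shows only that loss construction for a single searchable layer; the cost tables, hinge penalties, and targets are made-up illustrative values, not the paper's formulation.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

# per-candidate-op cost tables for one searchable layer (made-up numbers)
mem_cost = np.array([120.0, 60.0, 30.0, 15.0])  # kB of weights for each candidate op
lat_cost = np.array([9.0, 5.0, 2.5, 1.2])       # ms on the target MCU for each candidate op
mem_target, lat_target = 50.0, 4.0              # user-defined constraints

def constrained_loss(task_loss, alpha, lam_mem=0.01, lam_lat=0.1):
    """Task loss plus hinge penalties on the *expected* memory and latency under the
    softmax-relaxed architecture distribution p = softmax(alpha)."""
    p = softmax(alpha)
    exp_mem = p @ mem_cost
    exp_lat = p @ lat_cost
    penalty = lam_mem * max(exp_mem - mem_target, 0.0) + lam_lat * max(exp_lat - lat_target, 0.0)
    return task_loss + penalty, exp_mem, exp_lat

alpha = np.zeros(4)                             # architecture parameters, trained jointly with weights
total, m, l = constrained_loss(task_loss=0.35, alpha=alpha)
print(f"loss={total:.3f}, expected memory={m:.1f} kB, expected latency={l:.2f} ms")
```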

Generative Modeling on Manifolds Through Mixture of Riemannian Diffusion Processes

  • paper_url: http://arxiv.org/abs/2310.07216
  • repo_url: None
  • paper_authors: Jaehyeong Jo, Sung Ju Hwang
  • for: Modeling data distributions on Riemannian manifolds is required by applications across many scientific fields, but existing generative models on manifolds suffer from expensive divergence computation or rely on approximations of the heat kernel, which limits their applicability and scalability.
  • methods: The paper introduces the Riemannian Diffusion Mixture, a principled framework that builds the generative process as a mixture of endpoint-conditioned diffusion processes rather than relying on the denoising approach of previous diffusion models, together with a simple yet efficient training objective that is readily applicable to general manifolds.
  • results: The method outperforms previous generative models on various manifolds, scales to high dimensions, and requires a dramatically reduced number of in-training simulation steps.
    Abstract Learning the distribution of data on Riemannian manifolds is crucial for modeling data from non-Euclidean space, which is required by many applications from diverse scientific fields. Yet, existing generative models on manifolds suffer from expensive divergence computation or rely on approximations of heat kernel. These limitations restrict their applicability to simple geometries and hinder scalability to high dimensions. In this work, we introduce the Riemannian Diffusion Mixture, a principled framework for building a generative process on manifolds as a mixture of endpoint-conditioned diffusion processes instead of relying on the denoising approach of previous diffusion models, for which the generative process is characterized by its drift guiding toward the most probable endpoint with respect to the geometry of the manifold. We further propose a simple yet efficient training objective for learning the mixture process, that is readily applicable to general manifolds. Our method outperforms previous generative models on various manifolds while scaling to high dimensions and requires a dramatically reduced number of in-training simulation steps for general manifolds.

Bridging the Gap between Newton-Raphson Method and Regularized Policy Iteration

  • paper_url: http://arxiv.org/abs/2310.07211
  • repo_url: None
  • paper_authors: Zeyang Li, Chuxiong Hu, Yunan Wang, Guojian Zhan, Jie Li, Shengbo Eben Li
  • for: The paper studies regularization in reinforcement learning, specifically regularized policy iteration with Shannon entropy as the regularizer, and shows that it is equivalent to the standard Newton-Raphson method.
  • methods: It analyzes regularized policy iteration through the Bellman equation smoothed with strongly convex functions, establishing the equivalence that underpins a unified global and local convergence analysis.
  • results: Regularized policy iteration has global linear convergence with rate $\gamma$ (the discount factor) and converges quadratically once it enters a local region around the optimal value; moreover, a modified version with finite-step policy evaluation is equivalent to the inexact Newton method and achieves an asymptotic linear convergence rate of $\gamma^M$, where $M$ is the number of steps carried out in policy evaluation.
    Abstract Regularization is one of the most important techniques in reinforcement learning algorithms. The well-known soft actor-critic algorithm is a special case of regularized policy iteration where the regularizer is chosen as Shannon entropy. Despite some empirical success of regularized policy iteration, its theoretical underpinnings remain unclear. This paper proves that regularized policy iteration is strictly equivalent to the standard Newton-Raphson method in the condition of smoothing out Bellman equation with strongly convex functions. This equivalence lays the foundation of a unified analysis for both global and local convergence behaviors of regularized policy iteration. We prove that regularized policy iteration has global linear convergence with the rate being $\gamma$ (discount factor). Furthermore, this algorithm converges quadratically once it enters a local region around the optimal value. We also show that a modified version of regularized policy iteration, i.e., with finite-step policy evaluation, is equivalent to inexact Newton method where the Newton iteration formula is solved with truncated iterations. We prove that the associated algorithm achieves an asymptotic linear convergence rate of $\gamma^M$ in which $M$ denotes the number of steps carried out in policy evaluation. Our results take a solid step towards a better understanding of the convergence properties of regularized policy iteration algorithms.
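Entropy-regularized policy iteration on a small tabular MDP makes the smoothed Bellman equation concrete: policy evaluation solves a linear system for the soft value, and policy improvement is a softmax over soft Q-values. The sketch below is that textbook procedure on a random MDP; it illustrates the regularized operator being analyzed, not the paper's Newton-Raphson equivalence proof, and the MDP, temperature, and discount are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma, tau = 5, 3, 0.9, 0.5               # states, actions, discount, entropy temperature
P = rng.dirichlet(np.ones(S), size=(S, A))      # P[s, a] = next-state distribution
R = rng.uniform(size=(S, A))                    # rewards

def evaluate(pi):
    """Solve (I - gamma * P_pi) v = r_pi for the entropy-regularized (soft) value."""
    P_pi = np.einsum('sa,sat->st', pi, P)
    r_pi = np.sum(pi * (R - tau * np.log(pi + 1e-12)), axis=1)
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

def improve(v):
    """Softmax (entropy-regularized greedy) policy with respect to the soft Q-values."""
    q = R + gamma * np.einsum('sat,t->sa', P, v)
    z = np.exp((q - q.max(axis=1, keepdims=True)) / tau)
    return z / z.sum(axis=1, keepdims=True)

pi = np.full((S, A), 1.0 / A)
v = evaluate(pi)
for _ in range(50):
    pi = improve(v)
    v_new = evaluate(pi)
    if np.max(np.abs(v_new - v)) < 1e-10:
        v = v_new
        break
    v = v_new
print("converged soft value:", np.round(v, 4))
```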

Robust Safe Reinforcement Learning under Adversarial Disturbances

  • paper_url: http://arxiv.org/abs/2310.07207
  • repo_url: None
  • paper_authors: Zeyang Li, Chuxiong Hu, Shengbo Eben Li, Jia Cheng, Yunan Wang
  • For: This paper aims to address the challenge of applying reinforcement learning to real-world control tasks while ensuring safety and robustness in the presence of external disturbances.
  • Methods: The proposed method uses a policy iteration scheme to solve for the robust invariant set, which is a subset of the safe set where persistent safety is possible. The method also integrates the proposed policy iteration scheme into a constrained reinforcement learning algorithm that simultaneously synthesizes the robust invariant set and uses it for constrained policy optimization.
  • Results: The proposed method achieves zero constraint violation with learned worst-case adversarial disturbances, while other baseline algorithms violate the safety constraints substantially. Additionally, the proposed method attains performance comparable to the baselines even in the absence of the adversary.
    Abstract Safety is a primary concern when applying reinforcement learning to real-world control tasks, especially in the presence of external disturbances. However, existing safe reinforcement learning algorithms rarely account for external disturbances, limiting their applicability and robustness in practice. To address this challenge, this paper proposes a robust safe reinforcement learning framework that tackles worst-case disturbances. First, this paper presents a policy iteration scheme to solve for the robust invariant set, i.e., a subset of the safe set, where persistent safety is only possible for states within. The key idea is to establish a two-player zero-sum game by leveraging the safety value function in Hamilton-Jacobi reachability analysis, in which the protagonist (i.e., control inputs) aims to maintain safety and the adversary (i.e., external disturbances) tries to break down safety. This paper proves that the proposed policy iteration algorithm converges monotonically to the maximal robust invariant set. Second, this paper integrates the proposed policy iteration scheme into a constrained reinforcement learning algorithm that simultaneously synthesizes the robust invariant set and uses it for constrained policy optimization. This algorithm tackles both optimality and safety, i.e., learning a policy that attains high rewards while maintaining safety under worst-case disturbances. Experiments on classic control tasks show that the proposed method achieves zero constraint violation with learned worst-case adversarial disturbances, while other baseline algorithms violate the safety constraints substantially. Our proposed method also attains comparable performance as the baselines even in the absence of the adversary.

Boosting Learning for LDPC Codes to Improve the Error-Floor Performance

  • paper_url: http://arxiv.org/abs/2310.07194
  • repo_url: https://github.com/ghy1228/ldpc_error_floor
  • paper_authors: Hee-Youl Kwak, Dae-Young Yun, Yongjune Kim, Sang-Hyo Kim, Jong-Seon No
  • for: The paper proposes training methods for neural min-sum (NMS) LDPC decoders that eliminate the error-floor effect of LDPC codes.
  • methods: Two training techniques are used: first, leveraging the boosting learning of ensemble networks, the decoding network is divided into two neural decoders and the post decoder is trained to specialize in words that the first decoder fails to correct; second, a block-wise training schedule locally trains a block of weights while retraining the preceding block, addressing the vanishing gradient issue. Different weights are also assigned to unsatisfied check nodes to lower the error floor with a minimal number of weights.
  • results: Applying these training methods to standard LDPC codes achieves the best error-floor performance compared with other decoding methods, without incurring extra hardware cost.
    Abstract Low-density parity-check (LDPC) codes have been successfully commercialized in communication systems due to their strong error correction capabilities and simple decoding process. However, the error-floor phenomenon of LDPC codes, in which the error rate stops decreasing rapidly at a certain level, presents challenges for achieving extremely low error rates and deploying LDPC codes in scenarios demanding ultra-high reliability. In this work, we propose training methods for neural min-sum (NMS) decoders to eliminate the error-floor effect. First, by leveraging the boosting learning technique of ensemble networks, we divide the decoding network into two neural decoders and train the post decoder to be specialized for uncorrected words that the first decoder fails to correct. Secondly, to address the vanishing gradient issue in training, we introduce a block-wise training schedule that locally trains a block of weights while retraining the preceding block. Lastly, we show that assigning different weights to unsatisfied check nodes effectively lowers the error-floor with a minimal number of weights. By applying these training methods to standard LDPC codes, we achieve the best error-floor performance compared to other decoding methods. The proposed NMS decoder, optimized solely through novel training methods without additional modules, can be integrated into existing LDPC decoders without incurring extra hardware costs. The source code is available at https://github.com/ghy1228/LDPC_Error_Floor .
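A weighted (neural-style) min-sum decoder differs from plain min-sum only in that the check-to-variable messages are scaled by trainable weights. The sketch below runs such a decoder on a tiny (7,4) Hamming parity-check matrix as a stand-in for an LDPC code; the per-check constant weights and the toy channel are illustrative assumptions, whereas a trained NMS decoder would learn weights per edge and per iteration.

```python
import numpy as np

# parity-check matrix of the (7,4) Hamming code -- a stand-in for a long LDPC code
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

def weighted_min_sum(llr, H, weights, iters=10):
    """Weighted min-sum decoding: weights[c] scales every message leaving check node c."""
    m, n = H.shape
    c2v = np.zeros((m, n))
    hard = (llr < 0).astype(int)
    for _ in range(iters):
        total = llr + c2v.sum(axis=0)
        v2c = np.where(H == 1, total - c2v, 0.0)      # exclude each edge's own incoming message
        for c in range(m):
            idx = np.flatnonzero(H[c])
            msgs = v2c[c, idx]
            for j, v in enumerate(idx):
                others = np.delete(msgs, j)
                c2v[c, v] = weights[c] * np.prod(np.sign(others)) * np.min(np.abs(others))
        hard = (llr + c2v.sum(axis=0) < 0).astype(int)
        if not np.any(H @ hard % 2):                  # zero syndrome: valid codeword found
            break
    return hard

# all-zero codeword over a noisy channel: positive LLRs favor bit 0
rng = np.random.default_rng(3)
llr = 2.0 + rng.normal(0, 1.5, size=7)
print("decoded:", weighted_min_sum(llr, H, weights=np.full(3, 0.8)))  # typically recovers all zeros
```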

Neural networks: deep, shallow, or in between?

  • paper_url: http://arxiv.org/abs/2310.07190
  • repo_url: https://github.com/djdprogramming/adfa2
  • paper_authors: Guergana Petrova, Przemyslaw Wojtaszczyk
  • for: To estimate from below the error of approximating a compact subset of a Banach space.
  • methods: The approximation is by outputs of feed-forward neural networks with width W, depth l, and Lipschitz activation functions.
  • results: The paper shows that, modulo logarithmic factors, rates better than entropy numbers' rates are attainable only for networks whose depth l goes to infinity, and that there is no gain from fixing the depth and letting the width W go to infinity.
    Abstract We give estimates from below for the error of approximation of a compact subset from a Banach space by the outputs of feed-forward neural networks with width W, depth l and Lipschitz activation functions. We show that, modulo logarithmic factors, rates better that entropy numbers' rates are possibly attainable only for neural networks for which the depth l goes to infinity, and that there is no gain if we fix the depth and let the width W go to infinity.

Kernel Cox partially linear regression: building predictive models for cancer patients’ survival

  • paper_url: http://arxiv.org/abs/2310.07187
  • repo_url: https://github.com/rongyaohua/reggkm
  • paper_authors: Yaohua Rong, Sihai Dave Zhao, Xia Zheng, Yi Li
  • for: To accurately predict cancer patients' clinical outcomes and survival by relating their molecular profiles to survival times.
  • methods: A kernel Cox proportional hazards semi-parametric model is fitted with the proposed regularized garrotized kernel machine (RegGKM) method, which describes the complex relationship between survival and predictors while automatically removing irrelevant parametric and non-parametric predictors through a LASSO penalty.
  • results: Applied to a multiple myeloma dataset, the method predicts patients' death burden from their gene expressions and can classify patients into groups with different death risks, facilitating treatment for better clinical outcomes.
    Abstract Wide heterogeneity exists in cancer patients' survival, ranging from a few months to several decades. To accurately predict clinical outcomes, it is vital to build an accurate predictive model that relates patients' molecular profiles with patients' survival. With complex relationships between survival and high-dimensional molecular predictors, it is challenging to conduct non-parametric modeling and irrelevant predictors removing simultaneously. In this paper, we build a kernel Cox proportional hazards semi-parametric model and propose a novel regularized garrotized kernel machine (RegGKM) method to fit the model. We use the kernel machine method to describe the complex relationship between survival and predictors, while automatically removing irrelevant parametric and non-parametric predictors through a LASSO penalty. An efficient high-dimensional algorithm is developed for the proposed method. Comparison with other competing methods in simulation shows that the proposed method always has better predictive accuracy. We apply this method to analyze a multiple myeloma dataset and predict patients' death burden based on their gene expressions. Our results can help classify patients into groups with different death risks, facilitating treatment for better clinical outcomes.

SAM-OCTA: Prompting Segment-Anything for OCTA Image Segmentation

  • paper_url: http://arxiv.org/abs/2310.07183
  • repo_url: https://github.com/shellredia/sam-octa
  • paper_authors: Xinrun Chen, Chengliang Wang, Haojian Ning, Shiying Li
  • For: This paper proposes a new method for segmenting specific targets in optical coherence tomography angiography (OCTA) images, which is useful for diagnosing and treating eye diseases.
  • Methods: The proposed method, named SAM-OCTA, uses a low-rank adaptation technique for fine-tuning a foundation model and generates prompt points for various segmentation tasks on OCTA datasets.
  • Results: SAM-OCTA achieves or approaches state-of-the-art segmentation performance metrics on two publicly available OCTA datasets (OCTA-500 and ROSE), and demonstrates effective local vessel segmentation and artery-vein segmentation, which were not well solved in previous works.
    Abstract In the analysis of optical coherence tomography angiography (OCTA) images, the operation of segmenting specific targets is necessary. Existing methods typically train on supervised datasets with limited samples (approximately a few hundred), which can lead to overfitting. To address this, the low-rank adaptation technique is adopted for foundation model fine-tuning and proposed corresponding prompt point generation strategies to process various segmentation tasks on OCTA datasets. This method is named SAM-OCTA and has been experimented on the publicly available OCTA-500 and ROSE datasets. This method achieves or approaches state-of-the-art segmentation performance metrics. The effect and applicability of prompt points are discussed in detail for the retinal vessel, foveal avascular zone, capillary, artery, and vein segmentation tasks. Furthermore, SAM-OCTA accomplishes local vessel segmentation and effective artery-vein segmentation, which was not well-solved in previous works. The code is available at https://github.com/ShellRedia/SAM-OCTA.

Generalized Neural Sorting Networks with Error-Free Differentiable Swap Functions

  • paper_url: http://arxiv.org/abs/2310.07174
  • repo_url: None
  • paper_authors: Jungtaek Kim, Jeongbeen Yoon, Minsu Cho
  • for: This work studies sorting for more abstract yet expressive inputs, such as multi-digit images and image fragments, through neural sorting networks.
  • methods: The authors define a softening error via a differentiable swap function, develop an error-free swap function that satisfies non-decreasing and differentiability conditions, and adopt a permutation-equivariant Transformer with multi-head attention to capture dependencies between the given inputs.
  • results: The methods perform better than or comparably to baseline methods on diverse sorting benchmarks.
    Abstract Sorting is a fundamental operation of all computer systems, having been a long-standing significant research topic. Beyond the problem formulation of traditional sorting algorithms, we consider sorting problems for more abstract yet expressive inputs, e.g., multi-digit images and image fragments, through a neural sorting network. To learn a mapping from a high-dimensional input to an ordinal variable, the differentiability of sorting networks needs to be guaranteed. In this paper we define a softening error by a differentiable swap function, and develop an error-free swap function that holds non-decreasing and differentiability conditions. Furthermore, a permutation-equivariant Transformer network with multi-head attention is adopted to capture dependency between given inputs and also leverage its model capacity with self-attention. Experiments on diverse sorting benchmarks show that our methods perform better than or comparable to baseline methods.
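A differentiable swap can be written as a soft conditional exchange: a sigmoid of the difference between two elements decides how much they are exchanged, so a whole sorting network stays differentiable. The sketch below shows this basic soft swap inside a fixed bubble-style network on plain vectors; it illustrates the mechanism only and is not the paper's error-free swap construction.

```python
import numpy as np

def soft_swap(a, b, steepness=10.0):
    """Differentiable swap: returns (soft_min, soft_max) of a and b.
    As steepness grows this approaches the hard swap used in sorting networks."""
    s = 1.0 / (1.0 + np.exp(-steepness * (a - b)))   # sigmoid(a - b): close to 1 when a > b
    lo = (1 - s) * a + s * b
    hi = s * a + (1 - s) * b
    return lo, hi

def soft_sort(x, steepness=10.0):
    """Bubble-style sorting network built from soft swaps; fully differentiable in x."""
    x = np.array(x, dtype=float)
    n = len(x)
    for _ in range(n):
        for i in range(n - 1):
            x[i], x[i + 1] = soft_swap(x[i], x[i + 1], steepness)
    return x

print(soft_sort([3.0, 1.0, 2.5, -0.5]))   # approximately ascending; exact in the large-steepness limit
```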

Federated Generalization via Information-Theoretic Distribution Diversification

  • paper_url: http://arxiv.org/abs/2310.07171
  • repo_url: None
  • paper_authors: Zheshun Wu, Zenglin Xu, Dun Zeng, Qifan Wang
  • for: The paper addresses the non-Independent Identically Distributed (non-IID) challenge in Federated Learning (FL), a significant hurdle to FL's generalization efficacy.
  • methods: An information-theoretic generalization framework for FL quantifies generalization errors by evaluating the information entropy of local distributions and discerning discrepancies across them; guided by the derived bounds, a weighted aggregation approach and two client selection strategies are introduced.
  • results: Extensive empirical evaluations confirm the effectiveness of the proposed methods, in line with the theoretical analysis.
    Abstract Federated Learning (FL) has surged in prominence due to its capability of collaborative model training without direct data sharing. However, the vast disparity in local data distributions among clients, often termed the non-Independent Identically Distributed (non-IID) challenge, poses a significant hurdle to FL's generalization efficacy. The scenario becomes even more complex when not all clients participate in the training process, a common occurrence due to unstable network connections or limited computational capacities. This can greatly complicate the assessment of the trained models' generalization abilities. While a plethora of recent studies has centered on the generalization gap pertaining to unseen data from participating clients with diverse distributions, the divergence between the training distributions of participating clients and the testing distributions of non-participating ones has been largely overlooked. In response, our paper unveils an information-theoretic generalization framework for FL. Specifically, it quantifies generalization errors by evaluating the information entropy of local distributions and discerning discrepancies across these distributions. Inspired by our deduced generalization bounds, we introduce a weighted aggregation approach and a duo of client selection strategies. These innovations aim to bolster FL's generalization prowess by encompassing a more varied set of client data distributions. Our extensive empirical evaluations reaffirm the potency of our proposed methods, aligning seamlessly with our theoretical construct.
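The paper's exact aggregation rule is derived from its generalization bounds; as a loose, assumed illustration of distribution-aware aggregation, the sketch below up-weights clients whose local label distributions have higher entropy. The entropy-based weighting, function names, and data format are assumptions, not the paper's scheme.

```python
import numpy as np

def label_entropy(label_counts):
    """Shannon entropy (nats) of a client's empirical label distribution."""
    p = np.asarray(label_counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def aggregate(client_weights, client_label_counts):
    """Average client parameters, up-weighting clients with more diverse local data.

    client_weights:      list of dicts {param_name: np.ndarray}
    client_label_counts: list of per-class sample counts, one per client
    """
    scores = np.array([label_entropy(c) for c in client_label_counts])
    w = scores / scores.sum()                     # normalize to a convex combination
    agg = {}
    for name in client_weights[0]:
        agg[name] = sum(wi * cw[name] for wi, cw in zip(w, client_weights))
    return agg

# Toy usage with two clients and a single parameter tensor
clients = [{"fc.weight": np.ones((2, 2))}, {"fc.weight": np.zeros((2, 2))}]
counts = [[50, 50, 0], [90, 5, 5]]   # client 0 has the more balanced (higher-entropy) data
print(aggregate(clients, counts)["fc.weight"])
```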

LLark: A Multimodal Foundation Model for Music

  • paper_url: http://arxiv.org/abs/2310.07160
  • repo_url: https://github.com/spotify-research/llark
  • paper_authors: Josh Gardner, Simon Durand, Daniel Stoller, Rachel M. Bittner
  • for: This work develops LLark, a multimodal model for music understanding, to better capture the structure and characteristics of music.
  • methods: LLark is trained with dataset augmentation and instruction tuning across diverse open-source music datasets, integrating a pretrained generative music model with a pretrained language model.
  • results: On three task families (music understanding, captioning, and reasoning), the model matches or outperforms existing baselines in zero-shot generalization, and human raters show a high degree of agreement with the model's responses on captioning and reasoning tasks.
    Abstract Music has a unique and complex structure which is challenging for both expert humans and existing AI systems to understand, and presents unique challenges relative to other forms of audio. We present LLark, an instruction-tuned multimodal model for music understanding. We detail our process for dataset creation, which involves augmenting the annotations of diverse open-source music datasets and converting them to a unified instruction-tuning format. We propose a multimodal architecture for LLark, integrating a pretrained generative model for music with a pretrained language model. In evaluations on three types of tasks (music understanding, captioning, and reasoning), we show that our model matches or outperforms existing baselines in zero-shot generalization for music understanding, and that humans show a high degree of agreement with the model's responses in captioning and reasoning tasks. LLark is trained entirely from open-source music data and models, and we make our training code available along with the release of this paper. Additional results and audio examples are at https://bit.ly/llark, and our source code is available at https://github.com/spotify-research/llark .
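LLark's actual architecture is described in the paper and repository; as a generic illustration of how a pretrained music encoder can be coupled to a language model, the sketch below projects audio features into the LM's embedding space and prepends them as extra tokens. The dimensions, module names, and adapter design are assumptions, not LLark's implementation.

```python
import torch
import torch.nn as nn

class AudioToLMAdapter(nn.Module):
    """Project frozen music-encoder features into a language model's embedding space,
    then prepend them to the text token embeddings as 'audio tokens'."""
    def __init__(self, audio_dim: int = 512, lm_dim: int = 1024):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(audio_dim, lm_dim),
            nn.GELU(),
            nn.Linear(lm_dim, lm_dim),
        )

    def forward(self, audio_feats, text_embeds):
        # audio_feats: (batch, n_audio_frames, audio_dim), from a frozen music encoder
        # text_embeds: (batch, n_text_tokens, lm_dim), from the LM's embedding table
        audio_tokens = self.proj(audio_feats)
        return torch.cat([audio_tokens, text_embeds], dim=1)  # fed to the LM as one sequence

adapter = AudioToLMAdapter()
seq = adapter(torch.randn(2, 25, 512), torch.randn(2, 16, 1024))
print(seq.shape)  # torch.Size([2, 41, 1024])
```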

Imitation Learning from Purified Demonstration

  • paper_url: http://arxiv.org/abs/2310.07143
  • repo_url: None
  • paper_authors: Yunke Wang, Minjing Dong, Bo Du, Chang Xu
  • for: Addressing sequential decision-making problems with imperfect expert demonstrations.
  • methods: Purifying potential perturbations in imperfect demonstrations via a two-step diffusion process, then conducting imitation learning from the purified demonstrations.
  • results: Theoretical evidence supporting the approach, and evaluations on MuJoCo demonstrating its effectiveness from different aspects.
    Abstract Imitation learning has emerged as a promising approach for addressing sequential decision-making problems, with the assumption that expert demonstrations are optimal. However, in real-world scenarios, expert demonstrations are often imperfect, leading to challenges in effectively applying imitation learning. While existing research has focused on optimizing with imperfect demonstrations, the training typically requires a certain proportion of optimal demonstrations to guarantee performance. To tackle these problems, we propose to purify the potential perturbations in imperfect demonstrations and subsequently conduct imitation learning from purified demonstrations. Motivated by the success of diffusion models, we introduce a two-step purification via the diffusion process. In the first step, we apply a forward diffusion process to effectively smooth out the potential perturbations in imperfect demonstrations by introducing additional noise. Subsequently, a reverse generative process is utilized to recover the optimal expert demonstrations from the diffused ones. We provide theoretical evidence supporting our approach, demonstrating that total variance distance between the purified and optimal demonstration distributions can be upper-bounded. The evaluation results on MuJoCo demonstrate the effectiveness of our method from different aspects.
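A minimal sketch of the forward-noise / reverse-denoise purification idea, using a standard DDPM-style schedule; the denoiser, noise schedule, and diffusion horizon here are placeholders rather than the paper's trained model or chosen hyperparameters.

```python
import torch

def purify(x0, denoiser, T: int = 1000, t_star: int = 200):
    """Purify a (possibly perturbed) demonstration by partially diffusing it, then denoising.

    x0:       (batch, dim) state-action vectors from imperfect demonstrations
    denoiser: model eps_hat = denoiser(x_t, t) trained to predict the added noise
    t_star:   how far to diffuse; larger smooths more perturbation but loses more signal
    """
    betas = torch.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)

    # Step 1: forward diffusion to level t_star (Gaussian noise washes out perturbations)
    eps = torch.randn_like(x0)
    x_t = alpha_bar[t_star].sqrt() * x0 + (1.0 - alpha_bar[t_star]).sqrt() * eps

    # Step 2: reverse generative process back to t = 0
    x = x_t
    for t in range(t_star, 0, -1):
        eps_hat = denoiser(x, t)
        mean = (x - betas[t] / (1.0 - alpha_bar[t]).sqrt() * eps_hat) / alphas[t].sqrt()
        noise = torch.randn_like(x) if t > 1 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * noise
    return x

# Toy usage with a dummy denoiser that predicts zero noise
demo = torch.randn(4, 10)
print(purify(demo, lambda x, t: torch.zeros_like(x)).shape)  # torch.Size([4, 10])
```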

Risk Assessment and Statistical Significance in the Age of Foundation Models

  • paper_url: http://arxiv.org/abs/2310.07132
  • repo_url: None
  • paper_authors: Apoorva Nitsure, Youssef Mroueh, Mattia Rigotti, Kristjan Greenewald, Brian Belgodere, Mikhail Yurochkin, Jiri Navratil, Igor Melnyk, Jerret Ross
  • for: This work assesses socio-technical risks of foundation models with quantified statistical significance.
  • methods: A new statistical relative test is introduced, based on first- and second-order stochastic dominance of real random variables; its second-order statistics are linked to the mean-risk models used in econometrics and mathematical finance to balance risk and utility.
  • results: A "metrics portfolio" is defined for each model to aggregate a collection of metrics and drive risk-aware model selection, with statistical significance backed by an asymptotic analysis and bootstrap variance estimates; the framework is used to compare large language models on risks of drifting from instructions and producing toxic content.
    Abstract We propose a distributional framework for assessing socio-technical risks of foundation models with quantified statistical significance. Our approach hinges on a new statistical relative testing based on first and second order stochastic dominance of real random variables. We show that the second order statistics in this test are linked to mean-risk models commonly used in econometrics and mathematical finance to balance risk and utility when choosing between alternatives. Using this framework, we formally develop a risk-aware approach for foundation model selection given guardrails quantified by specified metrics. Inspired by portfolio optimization and selection theory in mathematical finance, we define a \emph{metrics portfolio} for each model as a means to aggregate a collection of metrics, and perform model selection based on the stochastic dominance of these portfolios. The statistical significance of our tests is backed theoretically by an asymptotic analysis via central limit theorems instantiated in practice via a bootstrap variance estimate. We use our framework to compare various large language models regarding risks related to drifting from instructions and outputting toxic content.
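A hedged sketch of the empirical second-order stochastic dominance comparison that underlies such a test (higher metric values assumed better); the paper's actual statistic, metrics-portfolio construction, and bootstrap significance procedure are not reproduced here.

```python
import numpy as np

def integrated_cdf(sample, grid):
    """Empirical integral of the CDF, int_{-inf}^{t} F(u) du, evaluated on 'grid'."""
    sample = np.sort(np.asarray(sample, dtype=float))
    cdf = np.searchsorted(sample, grid, side="right") / len(sample)
    du = np.diff(grid, prepend=grid[0])        # spacing; first element contributes zero
    return np.cumsum(cdf * du)

def second_order_dominates(x, y, n_grid: int = 512):
    """True if sample x second-order stochastically dominates sample y
    (higher metric values assumed better): x's integrated CDF never exceeds y's."""
    lo = min(np.min(x), np.min(y))
    hi = max(np.max(x), np.max(y))
    grid = np.linspace(lo, hi, n_grid)
    return bool(np.all(integrated_cdf(x, grid) <= integrated_cdf(y, grid) + 1e-12))

# Toy usage: model A's aggregated metric scores vs model B's
rng = np.random.default_rng(0)
a = rng.normal(0.8, 0.05, size=1000)   # higher mean, lower variance
b = rng.normal(0.6, 0.20, size=1000)
print(second_order_dominates(a, b))    # expected: True
```

Second-order dominance of one model's metric distribution over another's is exactly the kind of risk-averse preference that mean-risk models encode, which is why it can serve as a selection criterion over the metrics portfolios.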

Machine Learning Methods for Background Potential Estimation in 2DEGs

  • paper_url: http://arxiv.org/abs/2310.07089
  • repo_url: None
  • paper_authors: Carlo da Cunha, Nobuyuki Aoki, David Ferry, Kevin Vora, Yu Zhang
  • for: This study addresses impurities and defects in two-dimensional electron gases (2DEGs) and their impact on carrier mobility, conductivity, and quantum coherence time.
  • methods: Scanning gate microscopy (SGM) data are combined with three machine learning approaches to estimate the 2DEG background potential: image-to-image translation with generative adversarial networks, a cellular neural network, and an evolutionary search algorithm.
  • results: Despite limited data, the evolutionary search algorithm proves effective and offers a novel route to defect analysis; the work advances the understanding of 2DEGs and highlights the potential of machine learning for probing quantum materials, with implications for quantum computing and nanoelectronics.
    Abstract In the realm of quantum-effect devices and materials, two-dimensional electron gases (2DEGs) stand as fundamental structures that promise transformative technologies. However, the presence of impurities and defects in 2DEGs poses substantial challenges, impacting carrier mobility, conductivity, and quantum coherence time. To address this, we harness the power of scanning gate microscopy (SGM) and employ three distinct machine learning techniques to estimate the background potential of 2DEGs from SGM data: image-to-image translation using generative adversarial neural networks, cellular neural network, and evolutionary search. Our findings, despite data constraints, highlight the effectiveness of an evolutionary search algorithm in this context, offering a novel approach for defect analysis. This work not only advances our understanding of 2DEGs but also underscores the potential of machine learning in probing quantum materials, with implications for quantum computing and nanoelectronics.
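A toy sketch of an evolutionary search loop for fitting a background potential map to measured data; the forward model, potential parameterization, population size, and mutation scheme are illustrative placeholders, not the paper's setup.

```python
import numpy as np

def evolve_potential(measured, forward_model, shape=(32, 32),
                     pop_size=40, generations=200, sigma=0.05, elite_frac=0.25, seed=0):
    """Evolutionary search for a background potential V (a 2D map) such that
    forward_model(V) matches the measured SGM response (mean-squared-error fitness)."""
    rng = np.random.default_rng(seed)
    pop = [rng.normal(0.0, 0.1, size=shape) for _ in range(pop_size)]
    n_elite = max(1, int(elite_frac * pop_size))

    for _ in range(generations):
        fitness = [np.mean((forward_model(v) - measured) ** 2) for v in pop]
        order = np.argsort(fitness)                 # lower error = better
        elites = [pop[i] for i in order[:n_elite]]
        # Next generation: keep elites, fill the rest with mutated copies of random elites.
        children = [elites[rng.integers(n_elite)] + rng.normal(0.0, sigma, size=shape)
                    for _ in range(pop_size - n_elite)]
        pop = elites + children

    final_fitness = [np.mean((forward_model(v) - measured) ** 2) for v in pop]
    return pop[int(np.argmin(final_fitness))]

# Toy usage: identity forward model, recover a smooth bump
xs = np.linspace(-1, 1, 32)
true_V = np.exp(-(xs[:, None] ** 2 + xs[None, :] ** 2))
best = evolve_potential(true_V, forward_model=lambda v: v, generations=50)
print(float(np.mean((best - true_V) ** 2)))  # mismatch shrinks over generations
```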