cs.LG - 2023-10-01

Determining the Optimal Number of Clusters for Time Series Datasets with Symbolic Pattern Forest

  • paper_url: http://arxiv.org/abs/2310.00820
  • repo_url: None
  • paper_authors: Md Nishat Raihan
  • for: The paper extends the Symbolic Pattern Forest (SPF) algorithm to time series clustering in order to determine the optimal number of clusters for time series datasets.
  • methods: SPF generates clusterings of the dataset, and the optimal number of clusters is chosen via the Silhouette Coefficient, computed on both the bag-of-words vectors and the tf-idf vectors derived from the SAX words of each time series.
  • results: Experiments on the UCR archive datasets show significant improvement over the baseline.
    Abstract Clustering algorithms are among the most widely used data mining methods due to their exploratory power and being an initial preprocessing step that paves the way for other techniques. But the problem of calculating the optimal number of clusters (say k) is one of the significant challenges for such methods. The most widely used clustering algorithms like k-means and k-shape in time series data mining also need the ground truth for the number of clusters that need to be generated. In this work, we extended the Symbolic Pattern Forest algorithm, another time series clustering algorithm, to determine the optimal number of clusters for the time series datasets. We used SPF to generate the clusters from the datasets and chose the optimal number of clusters based on the Silhouette Coefficient, a metric used to calculate the goodness of a clustering technique. Silhouette was calculated on both the bag of word vectors and the tf-idf vectors generated from the SAX words of each time series. We tested our approach on the UCR archive datasets, and our experimental results so far showed significant improvement over the baseline.
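The model-selection loop described above (cluster, score with the Silhouette Coefficient, keep the best k) can be sketched as follows. This is a minimal illustration, not the authors' code: the SPF clustering step is replaced by a generic placeholder clusterer, the SAX word documents are assumed to be precomputed strings, and the way the two Silhouette scores are combined is an assumption.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def choose_k(sax_docs, k_candidates=range(2, 11)):
    """Pick the k whose clustering maximizes the Silhouette Coefficient."""
    bow = CountVectorizer().fit_transform(sax_docs)    # bag-of-words vectors
    tfidf = TfidfVectorizer().fit_transform(sax_docs)  # tf-idf vectors
    best_k, best_score = None, -1.0
    for k in k_candidates:
        # Placeholder for SPF: any clusterer that returns labels for k clusters.
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(tfidf)
        # Assumption: average the Silhouette scores of the two representations.
        score = 0.5 * (silhouette_score(bow, labels) + silhouette_score(tfidf, labels))
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```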

ECG-SL: Electrocardiogram(ECG) Segment Learning, a deep learning method for ECG signal

  • paper_url: http://arxiv.org/abs/2310.00818
  • repo_url: None
  • paper_authors: Han Yu, Huiyuan Yang, Akane Sano
  • for: This work aims to improve deep-learning analysis of ECG signals by explicitly modeling the periodic structure of heartbeats.
  • methods: The paper proposes an ECG-Segment based Learning (ECG-SL) framework that splits signals into heartbeat segments, extracts structural features from each segment, and learns temporal information with a temporal model; a self-supervised pre-training strategy is also explored to improve downstream performance.
  • results: Across three clinical applications (cardiac condition diagnosis, sleep apnea detection, and arrhythmia classification), ECG-SL outperforms the baseline model and is competitive with task-specific methods; saliency-map visualizations show that ECG-SL attends more to each heartbeat's peak and ST range than ResNet does.
    Abstract Electrocardiogram (ECG) is an essential signal in monitoring human heart activities. Researchers have achieved promising results in leveraging ECGs in clinical applications with deep learning models. However, the mainstream deep learning approaches usually neglect the periodic and formative attribute of the ECG heartbeat waveform. In this work, we propose a novel ECG-Segment based Learning (ECG-SL) framework to explicitly model the periodic nature of ECG signals. More specifically, ECG signals are first split into heartbeat segments, and then structural features are extracted from each of the segments. Based on the structural features, a temporal model is designed to learn the temporal information for various clinical tasks. Further, due to the fact that massive ECG signals are available but the labeled data are very limited, we also explore self-supervised learning strategy to pre-train the models, resulting significant improvement for downstream tasks. The proposed method outperforms the baseline model and shows competitive performances compared with task-specific methods in three clinical applications: cardiac condition diagnosis, sleep apnea detection, and arrhythmia classification. Further, we find that the ECG-SL tends to focus more on each heartbeat's peak and ST range than ResNet by visualizing the saliency maps.
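A rough PyTorch sketch may help make the segment-then-temporal-model pipeline concrete. Everything below (the peak-based segmentation, the layer sizes, and the module names SegmentEncoder and ECGSegmentModel) is an illustrative assumption, not the authors' implementation.

```python
import numpy as np
import torch
import torch.nn as nn
from scipy.signal import find_peaks

class SegmentEncoder(nn.Module):
    """Encodes one fixed-length heartbeat segment into a feature vector."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(8), nn.Flatten(), nn.Linear(16 * 8, dim))

    def forward(self, seg):              # seg: (batch, 1, seg_len)
        return self.net(seg)

class ECGSegmentModel(nn.Module):
    """Per-segment structural features followed by a temporal model."""
    def __init__(self, dim=64, n_classes=2):
        super().__init__()
        self.encoder = SegmentEncoder(dim)
        self.temporal = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, segments):         # segments: (batch, n_beats, seg_len)
        b, n, L = segments.shape
        feats = self.encoder(segments.reshape(b * n, 1, L)).reshape(b, n, -1)
        _, h = self.temporal(feats)      # temporal information across beats
        return self.head(h[-1])

def split_into_beats(signal, seg_len=200):
    """Crude segmentation around detected peaks (illustrative only)."""
    peaks, _ = find_peaks(signal, distance=seg_len // 2)
    half = seg_len // 2
    segs = [signal[p - half:p + half] for p in peaks
            if p >= half and p + half <= len(signal)]
    return torch.tensor(np.stack(segs), dtype=torch.float32)
```

In this sketch the (n_beats, seg_len) tensors returned by split_into_beats would be batched and fed to ECGSegmentModel; the self-supervised pre-training stage mentioned in the abstract would operate on unlabeled segments before the supervised head is trained.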

Learning to Make Adherence-Aware Advice

  • paper_url: http://arxiv.org/abs/2310.00817
  • repo_url: None
  • paper_authors: Guanting Chen, Xiaocheng Li, Chunlin Sun, Hanzhao Wang
  • for: The paper proposes a sequential decision-making model to address challenges in human-AI interaction.
  • methods: The model accounts for the human's adherence level (the probability that machine advice is followed or rejected) and includes a defer option so that the machine can refrain from advising; specialized learning algorithms learn the optimal advice policy and give advice only at critical time stamps.
  • results: Compared with problem-agnostic reinforcement learning algorithms, the specialized learning algorithms enjoy better theoretical convergence properties and show strong empirical performance.
    Abstract As artificial intelligence (AI) systems play an increasingly prominent role in human decision-making, challenges surface in the realm of human-AI interactions. One challenge arises from the suboptimal AI policies due to the inadequate consideration of humans disregarding AI recommendations, as well as the need for AI to provide advice selectively when it is most pertinent. This paper presents a sequential decision-making model that (i) takes into account the human's adherence level (the probability that the human follows/rejects machine advice) and (ii) incorporates a defer option so that the machine can temporarily refrain from making advice. We provide learning algorithms that learn the optimal advice policy and make advice only at critical time stamps. Compared to problem-agnostic reinforcement learning algorithms, our specialized learning algorithms not only enjoy better theoretical convergence properties but also show strong empirical performance.
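A toy value-iteration sketch can make the two ingredients concrete: an adherence level theta with which advice is followed, and a defer option under which the human acts on a default policy. The tabular setting, the advice penalty, and all names below are assumptions of this sketch, not the paper's algorithm.

```python
import numpy as np

def advise_or_defer_values(P, R, human_policy, theta=0.7, advice_cost=0.05,
                           gamma=0.95, iters=500):
    """P: (A, S, S) transition tensor, R: (S, A) rewards,
    human_policy: (S,) default action the human takes without advice."""
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(iters):
        V_new = np.empty(S)
        for s in range(S):
            a_h = human_policy[s]                          # human's default action
            q_defer = R[s, a_h] + gamma * P[a_h, s] @ V    # machine stays silent
            q_advice = []
            for a in range(A):                             # machine advises action a
                q = (theta * (R[s, a] + gamma * P[a, s] @ V)
                     + (1 - theta) * (R[s, a_h] + gamma * P[a_h, s] @ V)
                     - advice_cost)                        # advice followed w.p. theta
                q_advice.append(q)
            V_new[s] = max(q_defer, max(q_advice))
        V = V_new
    return V
```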

Bayesian Design Principles for Frequentist Sequential Learning

  • paper_url: http://arxiv.org/abs/2310.00806
  • repo_url: https://github.com/xuyunbei/mab-code
  • paper_authors: Yunbei Xu, Assaf Zeevi
  • for: The paper develops a general theory for optimizing frequentist regret in sequential learning problems, derived from unified Bayesian principles.
  • methods: A novel optimization approach generates "algorithmic beliefs" at each round, and decisions are made using Bayesian posteriors.
  • results: The paper presents a new multi-armed bandit algorithm with "best-of-all-worlds" empirical performance in stochastic, adversarial, and non-stationary environments, and shows how the principles apply to linear bandits, bandit convex optimization, and reinforcement learning.
    Abstract We develop a general theory to optimize the frequentist regret for sequential learning problems, where efficient bandit and reinforcement learning algorithms can be derived from unified Bayesian principles. We propose a novel optimization approach to generate "algorithmic beliefs" at each round, and use Bayesian posteriors to make decisions. The optimization objective to create "algorithmic beliefs," which we term "Algorithmic Information Ratio," represents an intrinsic complexity measure that effectively characterizes the frequentist regret of any algorithm. To the best of our knowledge, this is the first systematical approach to make Bayesian-type algorithms prior-free and applicable to adversarial settings, in a generic and optimal manner. Moreover, the algorithms are simple and often efficient to implement. As a major application, we present a novel algorithm for multi-armed bandits that achieves the "best-of-all-worlds" empirical performance in the stochastic, adversarial, and non-stationary environments. And we illustrate how these principles can be used in linear bandits, bandit convex optimization, and reinforcement learning.

Going Beyond Familiar Features for Deep Anomaly Detection

  • paper_url: http://arxiv.org/abs/2310.00797
  • repo_url: None
  • paper_authors: Sarath Sivaprasad, Mario Fritz
  • for: The paper addresses anomaly detection in deep learning models, specifically the false negatives caused by novel features that a pre-trained embedding does not capture.
  • methods: It proposes an explainability-based approach that captures novel features as unexplained observations in the input space, combining similarity and novelty in a hybrid approach and eliminating the need for expensive background models and dense matching.
  • results: The method achieves strong performance across a wide range of anomaly benchmarks, reducing false negative anomalies by up to 40% compared to the state of the art, and provides visually inspectable explanations for pixel-level anomalies.
    Abstract Anomaly Detection (AD) is a critical task that involves identifying observations that do not conform to a learned model of normality. Prior work in deep AD is predominantly based on a familiarity hypothesis, where familiar features serve as the reference in a pre-trained embedding space. While this strategy has proven highly successful, it turns out that it causes consistent false negatives when anomalies consist of truly novel features that are not well captured by the pre-trained encoding. We propose a novel approach to AD using explainability to capture novel features as unexplained observations in the input space. We achieve strong performance across a wide range of anomaly benchmarks by combining similarity and novelty in a hybrid approach. Our approach establishes a new state-of-the-art across multiple benchmarks, handling diverse anomaly types while eliminating the need for expensive background models and dense matching. In particular, we show that by taking account of novel features, we reduce false negative anomalies by up to 40% on challenging benchmarks compared to the state-of-the-art. Our method gives visually inspectable explanations for pixel-level anomalies.

Categorizing Flight Paths using Data Visualization and Clustering Methodologies

  • paper_url: http://arxiv.org/abs/2310.00773
  • repo_url: None
  • paper_authors: Yifan Song, Keyang Yu, Seth Young
  • for: The study uses the U.S. Federal Aviation Administration's Traffic Flow Management System dataset and the DV8 visualization tool to develop clustering algorithms for categorizing air traffic by flight path.
  • methods: Two clustering methodologies are demonstrated and compared: a spatial geographic-distance model and a vector-based cosine-similarity model, with clustering results determined both automatically and through human-in-the-loop processes.
  • results: Geographic-distance clustering performs better on the enroute portions of flight paths, while cosine-similarity clustering performs better for near-terminal operations such as arrival paths; a point-extraction technique improves computational efficiency.
    Abstract This work leverages the U.S. Federal Aviation Administration's Traffic Flow Management System dataset and DV8, a recently developed tool for highly interactive visualization of air traffic data, to develop clustering algorithms for categorizing air traffic by their varying flight paths. Two clustering methodologies, a spatial-based geographic distance model, and a vector-based cosine similarity model, are demonstrated and compared for their clustering effectiveness. Examples of their applications reveal successful, realistic clustering based on automated clustering result determination and human-in-the-loop processes, with geographic distance algorithms performing better for enroute portions of flight paths and cosine similarity algorithms performing better for near-terminal operations, such as arrival paths. A point extraction technique is applied to improve computation efficiency.
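The two distance models being compared can be sketched directly for toy tracks given as (lat, lon) sequences resampled to equal length; the track format and the averaging over corresponding points are assumptions of this sketch rather than details taken from the paper or DV8.

```python
import numpy as np

def haversine_path_distance(track_a, track_b, radius_km=6371.0):
    """Mean great-circle distance between corresponding points of two tracks."""
    a, b = np.radians(track_a), np.radians(track_b)
    dlat, dlon = b[:, 0] - a[:, 0], b[:, 1] - a[:, 1]
    h = np.sin(dlat / 2) ** 2 + np.cos(a[:, 0]) * np.cos(b[:, 0]) * np.sin(dlon / 2) ** 2
    return radius_km * np.mean(2 * np.arcsin(np.sqrt(h)))

def cosine_path_similarity(track_a, track_b):
    """Cosine similarity between flattened track vectors."""
    va, vb = track_a.ravel(), track_b.ravel()
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))
```

Either function can be dropped into a standard pairwise-distance clustering routine, which matches the paper's observation that the geographic measure suits enroute segments while the cosine measure suits near-terminal geometry.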

Data-Efficient Power Flow Learning for Network Contingencies

  • paper_url: http://arxiv.org/abs/2310.00763
  • repo_url: None
  • paper_authors: Parikshit Pareek, Deepjyoti Deka, Sidhant Misra
  • for: Learning power flows under network contingencies and estimating the corresponding probabilistic voltage envelopes (PVEs).
  • methods: A network-aware Gaussian process called the Vertex-Degree Kernel (VDK-GP) estimates voltage-power functions, and a novel multi-task vertex-degree kernel (MT-VDK) combines the learned VDK-GPs to determine power flows for unseen network configurations.
  • results: Simulations on the IEEE 30-bus network show that MT-VDK-GP reduces mean prediction error by over 50% for novel N-1 contingency configurations in low-data regimes (50-250 samples) and outperforms a hyperparameter-based transfer-learning approach in over 75% of N-2 contingency structures; PVEs are obtained with sixteen times fewer power flow solutions than Monte Carlo sampling.
    Abstract This work presents an efficient data-driven method to learn power flows in grids with network contingencies and to estimate corresponding probabilistic voltage envelopes (PVE). First, a network-aware Gaussian process (GP) termed Vertex-Degree Kernel (VDK-GP), developed in prior work, is used to estimate voltage-power functions for a few network configurations. The paper introduces a novel multi-task vertex degree kernel (MT-VDK) that amalgamates the learned VDK-GPs to determine power flows for unseen networks, with a significant reduction in the computational complexity and hyperparameter requirements compared to alternate approaches. Simulations on the IEEE 30-Bus network demonstrate the retention and transfer of power flow knowledge in both N-1 and N-2 contingency scenarios. The MT-VDK-GP approach achieves over 50% reduction in mean prediction error for novel N-1 contingency network configurations in low training data regimes (50-250 samples) over VDK-GP. Additionally, MT-VDK-GP outperforms a hyper-parameter based transfer learning approach in over 75% of N-2 contingency network structures, even without historical N-2 outage data. The proposed method demonstrates the ability to achieve PVEs using sixteen times fewer power flow solutions compared to Monte-Carlo sampling-based methods.

Data-driven adaptive building thermal controller tuning with constraints: A primal-dual contextual Bayesian optimization approach

  • paper_url: http://arxiv.org/abs/2310.00758
  • repo_url: None
  • paper_authors: Wenjie Xu, Bratislav Svetozarevic, Loris Di Natale, Philipp Heer, Colin N Jones
  • for: The paper targets minimizing the energy consumption of a room temperature controller while ensuring that the occupants' daily cumulative thermal discomfort stays below a given threshold.
  • methods: A data-driven Primal-Dual Contextual Bayesian Optimization (PDCBO) approach is proposed to solve this constrained tuning problem.
  • results: In a simulated single-room case study, tuning the parameters of a PI heating controller and the pre-heating time with PDCBO saves up to 4.7% energy compared with other Bayesian-optimization-based methods while keeping daily thermal discomfort below the tolerable threshold on average; PDCBO can also automatically track time-varying thresholds, which existing methods fail to do.
    Abstract We study the problem of tuning the parameters of a room temperature controller to minimize its energy consumption, subject to the constraint that the daily cumulative thermal discomfort of the occupants is below a given threshold. We formulate it as an online constrained black-box optimization problem where, on each day, we observe some relevant environmental context and adaptively select the controller parameters. In this paper, we propose to use a data-driven Primal-Dual Contextual Bayesian Optimization (PDCBO) approach to solve this problem. In a simulation case study on a single room, we apply our algorithm to tune the parameters of a Proportional Integral (PI) heating controller and the pre-heating time. Our results show that PDCBO can save up to 4.7% energy consumption compared to other state-of-the-art Bayesian optimization-based methods while keeping the daily thermal discomfort below the given tolerable threshold on average. Additionally, PDCBO can automatically track time-varying tolerable thresholds while existing methods fail to do so. We then study an alternative constrained tuning problem where we aim to minimize the thermal discomfort with a given energy budget. With this formulation, PDCBO reduces the average discomfort by up to 63% compared to state-of-the-art safe optimization methods while keeping the average daily energy consumption below the required threshold.

Identifying Copeland Winners in Dueling Bandits with Indifferences

  • paper_url: http://arxiv.org/abs/2310.00750
  • repo_url: None
  • paper_authors: Viktor Bengs, Björn Haddenhorst, Eyke Hüllermeier
  • for: The paper studies dueling bandits with ternary feedback, a variant of the conventional dueling bandits problem in which, besides a strict preference between two arms, the feedback may also express indifference.
  • methods: The paper proposes POCOWISTA, an algorithm for identifying the Copeland winner(s), and provides a lower bound on the sample complexity any learning algorithm needs to find the Copeland winner(s) with a fixed error probability.
  • results: POCOWISTA's sample complexity almost matches the lower bound and it shows excellent empirical performance, even on the conventional dueling bandits problem; when the preference probabilities satisfy a specific type of stochastic transitivity, a refined version with improved worst-case sample complexity is provided.
    Abstract We consider the task of identifying the Copeland winner(s) in a dueling bandits problem with ternary feedback. This is an underexplored but practically relevant variant of the conventional dueling bandits problem, in which, in addition to strict preference between two arms, one may observe feedback in the form of an indifference. We provide a lower bound on the sample complexity for any learning algorithm finding the Copeland winner(s) with a fixed error probability. Moreover, we propose POCOWISTA, an algorithm with a sample complexity that almost matches this lower bound, and which shows excellent empirical performance, even for the conventional dueling bandits problem. For the case where the preference probabilities satisfy a specific type of stochastic transitivity, we provide a refined version with an improved worst case sample complexity.
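For reference, the target quantity, the Copeland winner(s) of a pairwise preference matrix, can be written down in a few lines. POCOWISTA itself estimates these preferences from duels; the tie_band parameter below is an illustrative way to model indifference, not the paper's formulation.

```python
import numpy as np

def copeland_winners(pref, tie_band=0.0):
    """pref[i, j] = probability that arm i is preferred to arm j.
    Arm i 'beats' arm j if pref[i, j] > 0.5 + tie_band (ties count for neither)."""
    wins = (pref > 0.5 + tie_band).astype(int)
    np.fill_diagonal(wins, 0)
    scores = wins.sum(axis=1)                 # Copeland score of each arm
    return np.flatnonzero(scores == scores.max()), scores
```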

SEED: Simple, Efficient, and Effective Data Management via Large Language Models

  • paper_url: http://arxiv.org/abs/2310.00749
  • repo_url: None
  • paper_authors: Zui CHen, Lei Cao, Sam Madden, Ju Fan, Nan Tang, Zihui Gu, Zeyuan Shang, Chunwei Liu, Michael Cafarella, Tim Kraska
  • for: This paper aims to help users build efficient and effective data management applications with large language models (LLMs) while addressing the challenges of computational and economic expense.
  • methods: The paper proposes a system called SEED, which consists of three main components: code generation, model generation, and augmented LLM query. SEED localizes LLM computation as much as possible, uses optimization techniques to enhance the localized solution and LLM queries, and allows users to easily construct a customized data management solution.
  • results: The paper achieves state-of-the-art few-shot performance while significantly reducing the number of required LLM calls for diverse data management tasks such as data imputation and NL2SQL translation.
    Abstract We introduce SEED, an LLM-centric system that allows users to easily create efficient, and effective data management applications. SEED comprises three main components: code generation, model generation, and augmented LLM query to address the challenges that LLM services are computationally and economically expensive and do not always work well on all cases for a given data management task. SEED addresses the expense challenge by localizing LLM computation as much as possible. This includes replacing most of LLM calls with local code, local models, and augmenting LLM queries with batching and data access tools, etc. To ensure effectiveness, SEED features a bunch of optimization techniques to enhance the localized solution and the LLM queries, including automatic code validation, code ensemble, model representatives selection, selective tool usages, etc. Moreover, with SEED users are able to easily construct a data management solution customized to their applications. It allows the users to configure each component and compose an execution pipeline in natural language. SEED then automatically compiles it into an executable program. We showcase the efficiency and effectiveness of SEED using diverse data management tasks such as data imputation, NL2SQL translation, etc., achieving state-of-the-art few-shot performance while significantly reducing the number of required LLM calls.

Deterministic Langevin Unconstrained Optimization with Normalizing Flows

  • paper_url: http://arxiv.org/abs/2310.00745
  • repo_url: None
  • paper_authors: James M. Sullivan, Uros Seljak
  • for: The paper develops a global, gradient-free surrogate optimization strategy for expensive black-box functions.
  • methods: The approach is inspired by the Fokker-Planck and Langevin equations and uses a Normalizing Flow density estimate to perform active learning and select proposal points for evaluation.
  • results: The method achieves superior or competitive progress toward the optima of standard synthetic test functions and of non-convex, multi-modal posteriors of moderate dimension, and is competitive with state-of-the-art baselines on real-world objectives such as scientific and neural network hyperparameter optimization.
    Abstract We introduce a global, gradient-free surrogate optimization strategy for expensive black-box functions inspired by the Fokker-Planck and Langevin equations. These can be written as an optimization problem where the objective is the target function to maximize minus the logarithm of the current density of evaluated samples. This objective balances exploitation of the target objective with exploration of low-density regions. The method, Deterministic Langevin Optimization (DLO), relies on a Normalizing Flow density estimate to perform active learning and select proposal points for evaluation. This strategy differs qualitatively from the widely-used acquisition functions employed by Bayesian Optimization methods, and can accommodate a range of surrogate choices. We demonstrate superior or competitive progress toward objective optima on standard synthetic test functions, as well as on non-convex and multi-modal posteriors of moderate dimension. On real-world objectives, such as scientific and neural network hyperparameter optimization, DLO is competitive with state-of-the-art baselines.
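The selection objective described in the abstract, the target function minus the log-density of already-evaluated samples, can be sketched as follows. A Gaussian KDE stands in for the normalizing-flow density estimate, which is an assumption of this sketch rather than the authors' choice, and surrogate_f is any callable surrogate of the target.

```python
import numpy as np
from scipy.stats import gaussian_kde

def dlo_select(candidates, surrogate_f, evaluated_x):
    """candidates: (n, d) proposal points; surrogate_f: callable on (n, d) -> (n,);
    evaluated_x: (m, d) points already evaluated (m must exceed d for the KDE)."""
    log_density = gaussian_kde(evaluated_x.T).logpdf(candidates.T)
    objective = surrogate_f(candidates) - log_density   # exploit target, avoid dense regions
    return candidates[np.argmax(objective)]
```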

Spectral Neural Networks: Approximation Theory and Optimization Landscape

  • paper_url: http://arxiv.org/abs/2310.00729
  • repo_url: None
  • paper_authors: Chenghui Li, Rishi Sonthalia, Nicolas Garcia Trillos
  • for: This paper investigates the theoretical aspects of Spectral Neural Networks (SNN) and their tradeoffs with respect to the number of neurons and the amount of spectral geometric information learned.
  • methods: The paper uses a theoretical approach to explore the optimization landscape of SNN’s objective function, shedding light on the training dynamics of SNN and its non-convex ambient loss function.
  • results: The paper presents quantitative insights into the tradeoff between the number of neurons and the amount of spectral geometric information a neural network learns, and initiates a theoretical exploration of the training dynamics of SNN.
    Abstract There is a large variety of machine learning methodologies that are based on the extraction of spectral geometric information from data. However, the implementations of many of these methods often depend on traditional eigensolvers, which present limitations when applied in practical online big data scenarios. To address some of these challenges, researchers have proposed different strategies for training neural networks as alternatives to traditional eigensolvers, with one such approach known as Spectral Neural Network (SNN). In this paper, we investigate key theoretical aspects of SNN. First, we present quantitative insights into the tradeoff between the number of neurons and the amount of spectral geometric information a neural network learns. Second, we initiate a theoretical exploration of the optimization landscape of SNN's objective to shed light on the training dynamics of SNN. Unlike typical studies of convergence to global solutions of NN training dynamics, SNN presents an additional complexity due to its non-convex ambient loss function.

Physics-Informed Graph Neural Network for Dynamic Reconfiguration of Power Systems

  • paper_url: http://arxiv.org/abs/2310.00728
  • repo_url: None
  • paper_authors: Jules Authier, Rabab Haider, Anuradha Annaswamy, Florian Dorfler
  • for: The paper targets fast decision-making algorithms for dynamic reconfiguration (DyR), a mixed-integer problem that can become computationally intractable for large grids and fast timescales.
  • methods: It proposes GraPhyR, a physics-informed graph neural network (GNN) framework tailored for DyR that incorporates essential operational and connectivity constraints directly within the GNN and is trained end-to-end.
  • results: Results show that GraPhyR learns to optimize the DyR task.
    Abstract To maintain a reliable grid we need fast decision-making algorithms for complex problems like Dynamic Reconfiguration (DyR). DyR optimizes distribution grid switch settings in real-time to minimize grid losses and dispatches resources to supply loads with available generation. DyR is a mixed-integer problem and can be computationally intractable to solve for large grids and at fast timescales. We propose GraPhyR, a Physics-Informed Graph Neural Network (GNNs) framework tailored for DyR. We incorporate essential operational and connectivity constraints directly within the GNN framework and train it end-to-end. Our results show that GraPhyR is able to learn to optimize the DyR task.

Learning How to Propagate Messages in Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.00697
  • repo_url: https://github.com/tengxiao1/l2p
  • paper_authors: Teng Xiao, Zhengyu Chen, Donglin Wang, Suhang Wang
  • for: The paper studies the problem of learning message propagation strategies for graph neural networks (GNNs).
  • methods: It presents a general learning-to-propagate framework that not only learns GNN parameters for prediction but also explicitly learns interpretable, personalized propagation strategies for different nodes and various types of graphs, treating the optimal propagation steps as latent variables in a variational Expectation-Maximization (VEM) framework.
  • results: Extensive experiments on various graph benchmarks show significantly better performance than state-of-the-art methods and the ability to learn interpretable, personalized message-propagation strategies.
    Abstract This paper studies the problem of learning message propagation strategies for graph neural networks (GNNs). One of the challenges for graph neural networks is that of defining the propagation strategy. For instance, the choices of propagation steps are often specialized to a single graph and are not personalized to different nodes. To compensate for this, in this paper, we present learning to propagate, a general learning framework that not only learns the GNN parameters for prediction but more importantly, can explicitly learn the interpretable and personalized propagate strategies for different nodes and various types of graphs. We introduce the optimal propagation steps as latent variables to help find the maximum-likelihood estimation of the GNN parameters in a variational Expectation-Maximization (VEM) framework. Extensive experiments on various types of graph benchmarks demonstrate that our proposed framework can significantly achieve better performance compared with the state-of-the-art methods, and can effectively learn personalized and interpretable propagate strategies of messages in GNNs.

The Noise Geometry of Stochastic Gradient Descent: A Quantitative and Analytical Characterization

  • paper_url: http://arxiv.org/abs/2310.00692
  • repo_url: None
  • paper_authors: Mingze Wang, Lei Wu
  • for: The work aims to characterize the geometry of the noise in stochastic gradient descent (SGD), providing a comprehensive theoretical analysis for over-parameterized linear models (OLMs) and two-layer neural networks.
  • methods: Both average and directional alignment between the noise and the local geometry of the loss landscape are examined, with particular attention to how sample size and input data degeneracy affect the alignment strength.
  • results: The SGD noise aligns with the local geometry of the loss landscape, and when SGD escapes sharp minima the escape direction has significant components along flat directions, in stark contrast to GD, which escapes only along the sharpest directions; synthetic and real-world experiments substantiate the theory.
    Abstract Empirical studies have demonstrated that the noise in stochastic gradient descent (SGD) aligns favorably with the local geometry of loss landscape. However, theoretical and quantitative explanations for this phenomenon remain sparse. In this paper, we offer a comprehensive theoretical investigation into the aforementioned {\em noise geometry} for over-parameterized linear (OLMs) models and two-layer neural networks. We scrutinize both average and directional alignments, paying special attention to how factors like sample size and input data degeneracy affect the alignment strength. As a specific application, we leverage our noise geometry characterizations to study how SGD escapes from sharp minima, revealing that the escape direction has significant components along flat directions. This is in stark contrast to GD, which escapes only along the sharpest directions. To substantiate our theoretical findings, both synthetic and real-world experiments are provided.

PharmacoNet: Accelerating Large-Scale Virtual Screening by Deep Pharmacophore Modeling

  • paper_url: http://arxiv.org/abs/2310.00681
  • repo_url: https://github.com/seonghwanseo/pharmaconet
  • paper_authors: Seonghwan Seo, Woo Youn Kim
  • for: The work develops an efficient structure-based virtual screening method to cope with accessible compound libraries that now exceed 10 billion molecules.
  • methods: A deep learning framework identifies the optimal 3D pharmacophore arrangement a ligand should have for stable binding from the binding site, and coarse-grained graph matching between ligands and the generated pharmacophore arrangement replaces the expensive binding-pose sampling and scoring of existing methods with a single step.
  • results: PharmacoNet is significantly faster than state-of-the-art structure-based approaches while remaining reasonably accurate with a simple scoring function, and it effectively retains hit candidates even under high pre-screening filtration rates, revealing the untapped potential of pharmacophore modeling in deep-learning-based drug discovery.
    Abstract As the size of accessible compound libraries expands to over 10 billion, the need for more efficient structure-based virtual screening methods is emerging. Different pre-screening methods have been developed to rapidly screen the library, but the structure-based methods applicable to general proteins are still lacking: the challenge is to predict the binding pose between proteins and ligands and perform scoring in an extremely short time. We introduce PharmacoNet, a deep learning framework that identifies the optimal 3D pharmacophore arrangement which a ligand should have for stable binding from the binding site. By coarse-grained graph matching between ligands and the generated pharmacophore arrangement, we solve the expensive binding pose sampling and scoring procedures of existing methods in a single step. PharmacoNet is significantly faster than state-of-the-art structure-based approaches, yet reasonably accurate with a simple scoring function. Furthermore, we show the promising result that PharmacoNet effectively retains hit candidates even under the high pre-screening filtration rates. Overall, our study uncovers the hitherto untapped potential of a pharmacophore modeling approach in deep learning-based drug discovery.

A General Offline Reinforcement Learning Framework for Interactive Recommendation

  • paper_url: http://arxiv.org/abs/2310.00678
  • repo_url: None
  • paper_authors: Teng Xiao, Donglin Wang
  • for: The paper studies learning interactive recommender systems from logged feedback without any online exploration.
  • methods: It proposes a general offline reinforcement learning framework for recommendation that maximizes cumulative user rewards without online exploration, introducing a probabilistic generative model for interactive recommendation and an effective inference algorithm for discrete, stochastic policy learning from logged feedback.
  • results: Five approaches are proposed to minimize the distribution mismatch between the logging policy and the recommendation policy (support constraints, supervised regularization, policy constraints, dual constraints, and reward extrapolation), and experiments on two public real-world datasets show superior performance over existing supervised learning and reinforcement learning methods for recommendation.
    Abstract This paper studies the problem of learning interactive recommender systems from logged feedbacks without any exploration in online environments. We address the problem by proposing a general offline reinforcement learning framework for recommendation, which enables maximizing cumulative user rewards without online exploration. Specifically, we first introduce a probabilistic generative model for interactive recommendation, and then propose an effective inference algorithm for discrete and stochastic policy learning based on logged feedbacks. In order to perform offline learning more effectively, we propose five approaches to minimize the distribution mismatch between the logging policy and recommendation policy: support constraints, supervised regularization, policy constraints, dual constraints and reward extrapolation. We conduct extensive experiments on two public real-world datasets, demonstrating that the proposed methods can achieve superior performance over existing supervised learning and reinforcement learning methods for recommendation.

Optimization or Architecture: How to Hack Kalman Filtering

  • paper_url: http://arxiv.org/abs/2310.00675
  • repo_url: https://github.com/ido90/UsingKalmanFilterTheRightWay
  • paper_authors: Ido Greenberg, Netanel Yannay, Shie Mannor
  • for: The paper examines how non-linear filtering architectures are compared against the standard Kalman Filter (KF), arguing that such comparisons conflate the architecture with the parameter optimization method.
  • methods: It presents the Optimized Kalman Filter (OKF), which optimizes the KF's parameters the same way the non-linear models are optimized, so that both components are evaluated on equal footing.
  • results: When optimized via OKF, the KF can become competitive with neural models on certain problems, implying that the conclusions of some previous studies were derived from a flawed process; the advantage of OKF over the standard KF is studied theoretically and empirically, and OKF can replace the KF in real-world systems by merely updating the parameters.
    Abstract In non-linear filtering, it is traditional to compare non-linear architectures such as neural networks to the standard linear Kalman Filter (KF). We observe that this mixes the evaluation of two separate components: the non-linear architecture, and the parameters optimization method. In particular, the non-linear model is often optimized, whereas the reference KF model is not. We argue that both should be optimized similarly, and to that end present the Optimized KF (OKF). We demonstrate that the KF may become competitive to neural models - if optimized using OKF. This implies that experimental conclusions of certain previous studies were derived from a flawed process. The advantage of OKF over the standard KF is further studied theoretically and empirically, in a variety of problems. Conveniently, OKF can replace the KF in real-world systems by merely updating the parameters.
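The core idea, fitting the Kalman filter's noise parameters to data instead of hand-specifying them, can be sketched on a toy 1D constant-velocity model. The model, the optimizer choice, and the parameterization below are assumptions of this sketch, not the linked repository's implementation.

```python
import numpy as np
from scipy.optimize import minimize

F = np.array([[1.0, 1.0], [0.0, 1.0]])   # constant-velocity dynamics
H = np.array([[1.0, 0.0]])               # position-only measurements

def kf_prediction_mse(log_params, measurements):
    """One-step-ahead prediction error of a KF with the given noise scales."""
    q, r = np.exp(log_params)              # keep noise scales positive
    Q = q * np.eye(2)
    x, P = np.zeros(2), np.eye(2)
    errs = []
    for z in measurements:
        x, P = F @ x, F @ P @ F.T + Q                   # predict
        innov = z - (H @ x)[0]
        errs.append(innov ** 2)                         # pre-update prediction error
        S = (H @ P @ H.T)[0, 0] + r
        K = (P @ H.T / S).ravel()
        x = x + K * innov                               # update
        P = (np.eye(2) - np.outer(K, H[0])) @ P
    return float(np.mean(errs))

def optimize_kf(measurements):
    """Fit (process, measurement) noise scales by minimizing prediction error."""
    res = minimize(kf_prediction_mse, x0=np.log([1.0, 1.0]),
                   args=(measurements,), method="Nelder-Mead")
    return np.exp(res.x)
```

The point of the sketch is the training loop, not the filter: the same data-driven tuning applied to a neural baseline is what the paper argues must also be applied to the KF before the two are compared.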

Learning Type Inference for Enhanced Dataflow Analysis

  • paper_url: http://arxiv.org/abs/2310.00673
  • repo_url: https://github.com/joernio/joernti-codetidal5
  • paper_authors: Lukas Seidel, Sedick David Baker Effendi, Xavier Pinho, Konrad Rieck, Brink van der Merwe, Fabian Yamaguchi
  • for: This paper aims to improve the accuracy and efficiency of type inference for dynamically-typed languages, specifically TypeScript, by using machine learning techniques.
  • methods: The paper proposes a Transformer-based model called CodeTIDAL5, which is trained to predict type annotations and integrates with an open-source static analysis tool called Joern.
  • results: The paper reports that CodeTIDAL5 outperforms the current state-of-the-art by 7.85% on the ManyTypes4TypeScript benchmark, achieving 71.27% accuracy overall, and demonstrates the benefits of using the additional type information for security research.
    Abstract Statically analyzing dynamically-typed code is a challenging endeavor, as even seemingly trivial tasks such as determining the targets of procedure calls are non-trivial without knowing the types of objects at compile time. Addressing this challenge, gradual typing is increasingly added to dynamically-typed languages, a prominent example being TypeScript that introduces static typing to JavaScript. Gradual typing improves the developer's ability to verify program behavior, contributing to robust, secure and debuggable programs. In practice, however, users only sparsely annotate types directly. At the same time, conventional type inference faces performance-related challenges as program size grows. Statistical techniques based on machine learning offer faster inference, but although recent approaches demonstrate overall improved accuracy, they still perform significantly worse on user-defined types than on the most common built-in types. Limiting their real-world usefulness even more, they rarely integrate with user-facing applications. We propose CodeTIDAL5, a Transformer-based model trained to reliably predict type annotations. For effective result retrieval and re-integration, we extract usage slices from a program's code property graph. Comparing our approach against recent neural type inference systems, our model outperforms the current state-of-the-art by 7.85% on the ManyTypes4TypeScript benchmark, achieving 71.27% accuracy overall. Furthermore, we present JoernTI, an integration of our approach into Joern, an open source static analysis tool, and demonstrate that the analysis benefits from the additional type information. As our model allows for fast inference times even on commodity CPUs, making our system available through Joern leads to high accessibility and facilitates security research.

Balancing Efficiency vs. Effectiveness and Providing Missing Label Robustness in Multi-Label Stream Classification

  • paper_url: http://arxiv.org/abs/2310.00665
  • repo_url: https://github.com/sepehrbakhshi/ml-bels
  • paper_authors: Sepehr Bakhshi, Fazli Can
  • for: The paper proposes a neural-network-based approach for high-dimensional multi-label stream classification that balances effectiveness and efficiency, which existing models fail to do.
  • methods: The model, ML-BELS, uses a weighted binary-relevance approach with the Broad Ensemble Learning System (BELS) as its base classifier and independent weighted ensembles instead of a chain of stacked classifiers; a selective concept drift adaptation mechanism makes it suitable for non-stationary environments, and a simple yet effective imputation strategy handles missing labels.
  • results: An extensive evaluation against 11 state-of-the-art baselines on five synthetic and 13 real-world datasets shows that ML-BELS balances effectiveness and efficiency and is robust to missing labels and concept drift.
    Abstract Available works addressing multi-label classification in a data stream environment focus on proposing accurate models; however, these models often exhibit inefficiency and cannot balance effectiveness and efficiency. In this work, we propose a neural network-based approach that tackles this issue and is suitable for high-dimensional multi-label classification. Our model uses a selective concept drift adaptation mechanism that makes it suitable for a non-stationary environment. Additionally, we adapt our model to an environment with missing labels using a simple yet effective imputation strategy and demonstrate that it outperforms a vast majority of the state-of-the-art supervised models. To achieve our purposes, we introduce a weighted binary relevance-based approach named ML-BELS using the Broad Ensemble Learning System (BELS) as its base classifier. Instead of a chain of stacked classifiers, our model employs independent weighted ensembles, with the weights generated by the predictions of a BELS classifier. We show that using the weighting strategy on datasets with low label cardinality negatively impacts the accuracy of the model; with this in mind, we use the label cardinality as a trigger for applying the weights. We present an extensive assessment of our model using 11 state-of-the-art baselines, five synthetics, and 13 real-world datasets, all with different characteristics. Our results demonstrate that the proposed approach ML-BELS is successful in balancing effectiveness and efficiency, and is robust to missing labels and concept drift.

Twin Neural Network Improved k-Nearest Neighbor Regression

  • paper_url: http://arxiv.org/abs/2310.00664
  • repo_url: None
  • paper_authors: Sebastian J. Wetzel
  • for: Twin neural network regression, which is trained to predict differences between regression targets rather than the targets themselves.
  • methods: A solution to the original regression problem is obtained by ensembling the predicted differences between an unknown data point and multiple known anchor points, where the anchors are chosen as the nearest neighbors of the unknown point.
  • results: The algorithm outperforms both neural networks and k-nearest neighbor regression on small to medium-sized datasets.
    Abstract Twin neural network regression is trained to predict differences between regression targets rather than the targets themselves. A solution to the original regression problem can be obtained by ensembling predicted differences between the targets of an unknown data point and multiple known anchor data points. Choosing the anchors to be the nearest neighbors of the unknown data point leads to a neural network-based improvement of k-nearest neighbor regression. This algorithm is shown to outperform both neural networks and k-nearest neighbor regression on small to medium-sized data sets.
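The prediction rule is easy to sketch: a model trained on target differences is queried against the k nearest training points (the anchors) and its outputs are ensembled by adding back the anchors' known targets. The MLP difference model below is a stand-in for the twin network, and the random pair-sampling scheme is an assumption of this sketch.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.neural_network import MLPRegressor

def fit_difference_model(X, y, n_pairs=20000, seed=0):
    """Train a regressor on concatenated pairs (x_i, x_j) to predict y_i - y_j."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(X), n_pairs)
    j = rng.integers(0, len(X), n_pairs)
    pairs = np.hstack([X[i], X[j]])
    return MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500,
                        random_state=seed).fit(pairs, y[i] - y[j])

def predict_via_anchors(diff_model, X_train, y_train, x_query, k=10):
    """Ensemble the predicted differences to the k nearest anchors."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors(x_query.reshape(1, -1))
    anchors = X_train[idx[0]]
    pairs = np.hstack([np.repeat(x_query.reshape(1, -1), k, axis=0), anchors])
    diffs = diff_model.predict(pairs)        # predicted y(query) - y(anchor)
    return float(np.mean(diffs + y_train[idx[0]]))
```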

PatchMixer: A Patch-Mixing Architecture for Long-Term Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2310.00655
  • repo_url: https://github.com/Zeying-Gong/PatchMixer
  • paper_authors: Zeying Gong, Yujin Tang, Junwei Liang
  • for: The paper targets long-term time series forecasting, addressing a fundamental challenge of the Transformer architecture: the permutation-invariant self-attention mechanism loses temporal information.
  • methods: It proposes PatchMixer, a CNN-based model with a permutation-variant convolutional structure that preserves temporal information; unlike conventional CNNs in this field that rely on multiple scales or numerous branches, it uses only depthwise separable convolutions to extract both local features and global correlations with a single-scale architecture, together with dual (linear and nonlinear) forecasting heads.
  • results: On seven time-series forecasting benchmarks, PatchMixer yields 3.9% and 21.2% relative improvements over the state-of-the-art method and the best-performing CNN, respectively, while being 2-3x faster than the most advanced method.
    Abstract Although the Transformer has been the dominant architecture for time series forecasting tasks in recent years, a fundamental challenge remains: the permutation-invariant self-attention mechanism within Transformers leads to a loss of temporal information. To tackle these challenges, we propose PatchMixer, a novel CNN-based model. It introduces a permutation-variant convolutional structure to preserve temporal information. Diverging from conventional CNNs in this field, which often employ multiple scales or numerous branches, our method relies exclusively on depthwise separable convolutions. This allows us to extract both local features and global correlations using a single-scale architecture. Furthermore, we employ dual forecasting heads that encompass both linear and nonlinear components to better model future curve trends and details. Our experimental results on seven time-series forecasting benchmarks indicate that compared with the state-of-the-art method and the best-performing CNN, PatchMixer yields $3.9\%$ and $21.2\%$ relative improvements, respectively, while being 2-3x faster than the most advanced method. We will release our code and model.
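A minimal PyTorch sketch of the ingredients named in the abstract, patching along time, a depthwise-separable convolution over the patch dimension, and dual linear/nonlinear forecasting heads, is given below; all layer sizes and the exact wiring are illustrative assumptions rather than the released PatchMixer configuration.

```python
import torch
import torch.nn as nn

class PatchMixerSketch(nn.Module):
    def __init__(self, seq_len=336, pred_len=96, patch_len=16, stride=8, dim=64):
        super().__init__()
        self.n_patches = (seq_len - patch_len) // stride + 1
        self.patch_embed = nn.Conv1d(1, dim, kernel_size=patch_len, stride=stride)
        # Depthwise-separable convolution across the patch dimension.
        self.depthwise = nn.Conv1d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.pointwise = nn.Conv1d(dim, dim, kernel_size=1)
        flat = dim * self.n_patches
        self.linear_head = nn.Linear(flat, pred_len)              # linear trend head
        self.nonlinear_head = nn.Sequential(                      # nonlinear detail head
            nn.Linear(flat, 2 * pred_len), nn.GELU(), nn.Linear(2 * pred_len, pred_len))

    def forward(self, x):                  # x: (batch, seq_len) univariate series
        z = self.patch_embed(x.unsqueeze(1))                      # (batch, dim, n_patches)
        z = self.pointwise(torch.relu(self.depthwise(z)))
        z = z.flatten(1)
        return self.linear_head(z) + self.nonlinear_head(z)
```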

A primal-dual perspective for distributed TD-learning

  • paper_url: http://arxiv.org/abs/2310.00638
  • repo_url: None
  • paper_authors: Han-Dong Lim, Donghwan Lee
  • for: investigate distributed temporal difference (TD) learning for a networked multi-agent Markov decision process
  • methods: based on distributed optimization algorithms, which can be interpreted as primal-dual Ordinary differential equation (ODE) dynamics subject to null-space constraints
  • results: examined the behavior of the final iterate in various distributed TD-learning scenarios, considering both constant and diminishing step-sizes and incorporating both i.i.d. and Markovian observation models, without assuming a doubly stochastic matrix for the communication network structure.
    Abstract The goal of this paper is to investigate distributed temporal difference (TD) learning for a networked multi-agent Markov decision process. The proposed approach is based on distributed optimization algorithms, which can be interpreted as primal-dual Ordinary differential equation (ODE) dynamics subject to null-space constraints. Based on the exponential convergence behavior of the primal-dual ODE dynamics subject to null-space constraints, we examine the behavior of the final iterate in various distributed TD-learning scenarios, considering both constant and diminishing step-sizes and incorporating both i.i.d. and Markovian observation models. Unlike existing methods, the proposed algorithm does not require the assumption that the underlying communication network structure is characterized by a doubly stochastic matrix.

GNRK: Graph Neural Runge-Kutta method for solving partial differential equations

  • paper_url: http://arxiv.org/abs/2310.00618
  • repo_url: https://github.com/hoyunchoi/GNRK
  • paper_authors: Hoyun Choi, Sungyeop Lee, B. Kahng, Junghyo Jo
  • for: GNRK is a new method for solving a wide range of partial differential equations (PDEs) without being tied to specific initial conditions or PDE coefficients.
  • methods: The method operates on graph structures, making it resilient to changes in spatial and temporal resolution during domain discretization, and combines graph neural network modules with a recurrent structure inspired by classical Runge-Kutta solvers.
  • results: On the 2-dimensional Burgers' equation, GNRK is superior to existing neural-network-based PDE solvers in terms of model size and accuracy, and it extends straightforwardly to coupled differential equations.
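The recurrent Runge-Kutta structure can be sketched as a classical RK4 step whose right-hand side is a learnable module acting on node states; the tiny MLP used here is an illustrative stand-in for the graph neural module, not the authors' architecture.

```python
import torch
import torch.nn as nn

class NeuralRK4(nn.Module):
    def __init__(self, rhs: nn.Module):
        super().__init__()
        self.rhs = rhs                       # learned approximation of du/dt

    def step(self, u, dt):
        k1 = self.rhs(u)                     # classical fourth-order Runge-Kutta step
        k2 = self.rhs(u + 0.5 * dt * k1)
        k3 = self.rhs(u + 0.5 * dt * k2)
        k4 = self.rhs(u + dt * k3)
        return u + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

    def forward(self, u0, dt, n_steps):
        u, traj = u0, [u0]
        for _ in range(n_steps):
            u = self.step(u, dt)             # recurrent rollout over time
            traj.append(u)
        return torch.stack(traj)

# Usage: rhs maps node states (n_nodes, state_dim) to their time derivatives;
# in GNRK this role is played by a graph neural network over the mesh graph.
rhs = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
solver = NeuralRK4(rhs)
```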

On the Onset of Robust Overfitting in Adversarial Training

  • paper_url: http://arxiv.org/abs/2310.00607
  • repo_url: None
  • paper_authors: Chaojian Yu, Xiaolong Shi, Jun Yu, Bo Han, Tongliang Liu
  • for: The work seeks to explain the underlying mechanism of robust overfitting in adversarial training and proposes two measures to mitigate it.
  • methods: By treating normal data and adversarial perturbation as separate factors and performing factor ablation in adversarial training, the paper traces the onset of robust overfitting to the model learning non-effective features, i.e., features that lack robust generalization.
  • results: Experiments show that the two proposed measures, attack strength and data augmentation, effectively mitigate robust overfitting and enhance adversarial robustness.
    Abstract Adversarial Training (AT) is a widely-used algorithm for building robust neural networks, but it suffers from the issue of robust overfitting, the fundamental mechanism of which remains unclear. In this work, we consider normal data and adversarial perturbation as separate factors, and identify that the underlying causes of robust overfitting stem from the normal data through factor ablation in AT. Furthermore, we explain the onset of robust overfitting as a result of the model learning features that lack robust generalization, which we refer to as non-effective features. Specifically, we provide a detailed analysis of the generation of non-effective features and how they lead to robust overfitting. Additionally, we explain various empirical behaviors observed in robust overfitting and revisit different techniques to mitigate robust overfitting from the perspective of non-effective features, providing a comprehensive understanding of the robust overfitting phenomenon. This understanding inspires us to propose two measures, attack strength and data augmentation, to hinder the learning of non-effective features by the neural network, thereby alleviating robust overfitting. Extensive experiments conducted on benchmark datasets demonstrate the effectiveness of the proposed methods in mitigating robust overfitting and enhancing adversarial robustness.

Path Structured Multimarginal Schrödinger Bridge for Probabilistic Learning of Hardware Resource Usage by Control Software

  • paper_url: http://arxiv.org/abs/2310.00604
  • repo_url: None
  • paper_authors: Georgiy A. Bondar, Robert Gifford, Linh Thi Xuan Phan, Abhishek Halder
  • for: Solving the path structured multimarginal Schrödinger bridge problem (MSBP) to obtain the most-likely measure-valued trajectory, enabling prediction of the time-varying distribution of hardware resource availability under control software.
  • methods: Recent algorithmic advances in solving such structured MSBPs are leveraged to learn stochastic hardware resource usage by control software.
  • results: In a model predictive control software execution case study, the method converges rapidly, with guaranteed linear convergence, to an accurate prediction of the controller's hardware resource utilization, and it can be broadly applied to any software to predict cyber-physical context-dependent performance at arbitrary times.
    Abstract The solution of the path structured multimarginal Schr\"{o}dinger bridge problem (MSBP) is the most-likely measure-valued trajectory consistent with a sequence of observed probability measures or distributional snapshots. We leverage recent algorithmic advances in solving such structured MSBPs for learning stochastic hardware resource usage by control software. The solution enables predicting the time-varying distribution of hardware resource availability at a desired time with guaranteed linear convergence. We demonstrate the efficacy of our probabilistic learning approach in a model predictive control software execution case study. The method exhibits rapid convergence to an accurate prediction of hardware resource utilization of the controller. The method can be broadly applied to any software to predict cyber-physical context-dependent performance at arbitrary time.

SIMD Dataflow Co-optimization for Efficient Neural Networks Inferences on CPUs

  • paper_url: http://arxiv.org/abs/2310.00574
  • repo_url: None
  • paper_authors: Cyrus Zhou, Zack Hassman, Ruize Xu, Dhirpal Shah, Vaugnn Richard, Yanjing Li
  • for: The paper addresses the challenges of deploying neural networks on CPUs, focusing on minimizing inference time while maintaining accuracy.
  • methods: It uses the dataflow (i.e., computation order) of a neural network to explore data reuse opportunities via heuristic-guided analysis and a code generation framework that explores various Single Instruction, Multiple Data (SIMD) implementations to optimize neural network execution.
  • results: The dataflow that keeps outputs in SIMD registers while maximizing both input and weight reuse consistently yields the best performance across a wide variety of inference workloads, achieving up to 3x speedup for 8-bit neural networks and up to 4.8x speedup for binary neural networks over today's optimized implementations.
    Abstract We address the challenges associated with deploying neural networks on CPUs, with a particular focus on minimizing inference time while maintaining accuracy. Our novel approach is to use the dataflow (i.e., computation order) of a neural network to explore data reuse opportunities using heuristic-guided analysis and a code generation framework, which enables exploration of various Single Instruction, Multiple Data (SIMD) implementations to achieve optimized neural network execution. Our results demonstrate that the dataflow that keeps outputs in SIMD registers while also maximizing both input and weight reuse consistently yields the best performance for a wide variety of inference workloads, achieving up to 3x speedup for 8-bit neural networks, and up to 4.8x speedup for binary neural networks, respectively, over the optimized implementations of neural networks today.
    摘要 我们研究在 CPU 上部署神经网络所面临的挑战,重点是在保持精度的同时最小化推理时间。我们的新方法是利用神经网络的数据流(即计算顺序),通过启发式引导的分析和代码生成框架挖掘数据重用机会,从而探索多种单指令多数据(SIMD)实现方案,获得优化的神经网络执行。结果表明,让输出驻留在 SIMD 寄存器中、同时最大化输入与权重重用的数据流,在各类推理负载上始终取得最佳性能:相较于当前优化的神经网络实现,8 位神经网络最高可获得 3 倍加速,二值神经网络最高可获得 4.8 倍加速。
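
为直观说明上述"让输出驻留寄存器并最大化输入与权重重用"的结论,下面给出一个玩具化的 Python 草图:用一个极简的代价模型(寄存器中没有所需数据就计一次加载)比较小型矩阵乘在不同循环顺序(数据流)下的寄存器访存次数。代价模型、矩阵维度与向量宽度均为示意性假设,与论文的 SIMD 代码生成框架无关。

```python
# Toy register-traffic model for different dataflows (loop orders) of a small matmul,
# vectorizing over the n dimension in chunks of `vec`. One register is assumed per
# operand; a "load" is counted whenever the needed value is not already resident.
from itertools import permutations

def register_traffic(order, M=4, N=8, K=4, vec=4):
    resident = {"input": None, "weight": None, "output": None}
    loads = {"input": 0, "weight": 0, "output": 0}
    ranges = {"m": range(M), "n": range(0, N, vec), "k": range(K)}
    for i in ranges[order[0]]:
        for j in ranges[order[1]]:
            for l in ranges[order[2]]:
                idx = dict(zip(order, (i, j, l)))
                m, n, k = idx["m"], idx["n"], idx["k"]
                needed = {"input": (m, k), "weight": (k, n), "output": (m, n)}
                for op, key in needed.items():
                    if resident[op] != key:       # not in the register: (re)load it
                        loads[op] += 1
                        resident[op] = key
    return loads

# Compare all six loop orders; orders that keep the output resident (k innermost)
# avoid repeatedly reloading the output accumulator.
for order in permutations("mnk"):
    print("".join(order), register_traffic(order))
```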

Discrete Choice Multi-Armed Bandits

  • paper_url: http://arxiv.org/abs/2310.00562
  • repo_url: None
  • paper_authors: Emerson Melo, David Müller
  • for: 这篇论文建立了一类离散选择模型与在线学习及多臂老虎机算法之间的联系。我们的贡献可以概括为两个方面:其一,我们为一个涵盖 Exp3 算法作为特例的广泛算法家族给出了次线性后悔界;其二,受 \citet{wen:2001} 提出的广义嵌套 logit 模型启发,我们引入了一族新的对抗性多臂老虎机算法。
  • methods: 我们的算法基于离散选择模型,其采样分布具有闭式表达,因此可以高效实现,并允许用户对模型进行灵活微调。
  • results: 我们通过针对随机多臂老虎机问题的数值实验,展示了算法的实际可行性。
    Abstract This paper establishes a connection between a category of discrete choice models and the realms of online learning and multiarmed bandit algorithms. Our contributions can be summarized in two key aspects. Firstly, we furnish sublinear regret bounds for a comprehensive family of algorithms, encompassing the Exp3 algorithm as a particular case. Secondly, we introduce a novel family of adversarial multiarmed bandit algorithms, drawing inspiration from the generalized nested logit models initially introduced by \citet{wen:2001}. These algorithms offer users the flexibility to fine-tune the model extensively, as they can be implemented efficiently due to their closed-form sampling distribution probabilities. To demonstrate the practical implementation of our algorithms, we present numerical experiments, focusing on the stochastic bandit case.
    摘要 本文建立了一类离散选择模型与在线学习及多臂老虎机算法之间的联系。我们的贡献可概括为两个方面。首先,我们为一个广泛的算法家族给出了次线性后悔界,Exp3 算法是其中的特例。其次,受 \citet{wen:2001} 最早提出的广义嵌套 logit 模型启发,我们引入了一族新的对抗性多臂老虎机算法。由于这些算法的采样分布具有闭式概率表达,它们可以被高效实现,并允许用户对模型进行充分的微调。为展示算法的实际应用,我们给出了针对随机老虎机情形的数值实验。
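
下面是 Exp3 算法(上文所述算法家族中的特例)的一个简短 Python 草图,用以说明"采样分布具有闭式表达、因而实现高效"这一点;其中的伯努利臂与各参数均为示意性假设,论文提出的广义嵌套 logit 变体并未在此实现。

```python
# Exp3 sketch: the sampling distribution below is available in closed form.
import numpy as np

def exp3(pull, K, T, gamma=0.1, rng=None):
    rng = rng or np.random.default_rng(0)
    weights = np.ones(K)
    for _ in range(T):
        probs = (1 - gamma) * weights / weights.sum() + gamma / K   # closed-form sampling dist.
        arm = int(rng.choice(K, p=probs))
        reward = pull(arm)                                          # observed reward in [0, 1]
        weights[arm] *= np.exp(gamma * reward / (probs[arm] * K))   # importance-weighted update
        weights /= weights.max()                                    # rescale to avoid overflow
    return weights / weights.sum()

# toy stochastic bandit with assumed Bernoulli means
means = [0.3, 0.5, 0.7]
rng = np.random.default_rng(1)
print(exp3(lambda a: float(rng.random() < means[a]), K=3, T=5000))
```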

Horizontal Class Backdoor to Deep Learning

  • paper_url: http://arxiv.org/abs/2310.00542
  • repo_url: None
  • paper_authors: Hua Ma, Shang Wang, Yansong Gao
  • for: 这篇论文揭示了一种新型、简单且通用的水平类后门攻击(HCB),它借助与主任务无关、在现实中普遍存在的无害自然特征(innocuous feature)来绑定后门。
  • methods: 该攻击在模型中植入后门,只有当触发器与这种水平跨类、但每类仅部分样本具备的无害特征同时出现时,后门才会被有效激活。
  • results: 实验结果表明,这种水平类后门攻击方法可以具有高效率和高攻击成功率,并且可以轻松地绕过多种已知的防御方法。
    Abstract All existing backdoor attacks to deep learning (DL) models belong to the vertical class backdoor (VCB). That is, any sample from a class will activate the implanted backdoor in the presence of the secret trigger, regardless of source-class-agnostic or source-class-specific backdoor. Current trends of existing defenses are overwhelmingly devised for VCB attacks especially the source-class-agnostic backdoor, which essentially neglects other potential simple but general backdoor types, thus giving false security implications. It is thus urgent to discover unknown backdoor types. This work reveals a new, simple, and general horizontal class backdoor (HCB) attack. We show that the backdoor can be naturally bounded with innocuous natural features that are common and pervasive in the real world. Note that an innocuous feature (e.g., expression) is irrelevant to the main task of the model (e.g., recognizing a person from one to another). The innocuous feature spans across classes horizontally but is exhibited by partial samples per class -- satisfying the horizontal class (HC) property. Only when the trigger is concurrently presented with the HC innocuous feature, can the backdoor be effectively activated. Extensive experiments on attacking performance in terms of high attack success rates with tasks of 1) MNIST, 2) facial recognition, 3) traffic sign recognition, and 4) object detection demonstrate that the HCB is highly efficient and effective. We extensively evaluate the HCB evasiveness against a (chronologically) series of 9 influential countermeasures of Fine-Pruning (RAID 18'), STRIP (ACSAC 19'), Neural Cleanse (Oakland 19'), ABS (CCS 19'), Februus (ACSAC 20'), MNTD (Oakland 21'), SCAn (USENIX SEC 21'), MOTH (Oakland 22'), and Beatrix (NDSS 23'), where none of them can succeed even when a simplest trigger is used.
    摘要 现有针对深度学习(DL)模型的后门攻击都属于垂直类后门(VCB):无论是源类无关(source-class-agnostic)还是源类特定(source-class-specific)后门,只要秘密触发器出现,某一类别的任意样本都会激活植入的后门。当前的防御方法几乎都是针对 VCB 攻击(尤其是源类无关后门)而设计的,这实际上忽略了其他可能存在的简单而通用的后门类型,从而带来虚假的安全感。因此,发现未知的后门类型十分紧迫。本工作揭示了一种新的、简单且通用的水平类后门(HCB)攻击。我们表明,该后门可以自然地绑定在现实世界中常见且普遍的无害特征(innocuous feature)上:这类特征(例如表情)与模型的主任务(例如人脸识别)无关,它在类别之间水平分布,但每个类别中只有部分样本具备,满足水平类(HC)性质。只有当触发器与该无害特征同时出现时,后门才能被有效激活。在 1)MNIST、2)人脸识别、3)交通标志识别和 4)目标检测等任务上的大量实验表明,HCB 攻击高效且有效,攻击成功率很高。我们还针对按时间顺序排列的 9 种有影响力的防御方法——Fine-Pruning (RAID 18')、STRIP (ACSAC 19')、Neural Cleanse (Oakland 19')、ABS (CCS 19')、Februus (ACSAC 20')、MNTD (Oakland 21')、SCAn (USENIX SEC 21')、MOTH (Oakland 22') 和 Beatrix (NDSS 23')——全面评估了 HCB 的规避能力:即使使用最简单的触发器,这些防御也无一成功。
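
下面用一个简化的 Python 草图示意 HCB 式投毒数据的构造思路:只在具备某一无害特征(例如"微笑",由假设的判定函数 has_innocuous_feature 给出)的样本上叠加触发器并改写标签。触发器形状、投毒比例等均为示意性假设,并非论文的攻击流程。

```python
# Sketch of assembling an HCB-style poisoned subset: the trigger is only stamped on
# samples that exhibit the (hypothetical) innocuous feature, so the backdoor fires only
# when the trigger and that feature co-occur. Illustration only, not the paper's pipeline.
import numpy as np

def stamp_trigger(image, size=4, value=1.0):
    """Place a small bright patch in the bottom-right corner of a CHW float image."""
    img = image.copy()
    img[:, -size:, -size:] = value
    return img

def build_hcb_poison(images, labels, has_innocuous_feature,
                     target_label=0, poison_rate=0.05, rng=None):
    rng = rng or np.random.default_rng(0)
    candidates = [i for i in range(len(images)) if has_innocuous_feature(images[i])]
    assert candidates, "no samples exhibit the innocuous feature"
    n_poison = min(int(poison_rate * len(images)), len(candidates))
    poisoned_x, poisoned_y = [], []
    for i in rng.choice(candidates, size=n_poison, replace=False):
        poisoned_x.append(stamp_trigger(images[i]))   # trigger + innocuous feature
        poisoned_y.append(target_label)               # attacker-chosen label
    return np.array(poisoned_x), np.array(poisoned_y)
```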

Robust Nonparametric Hypothesis Testing to Understand Variability in Training Neural Networks

  • paper_url: http://arxiv.org/abs/2310.00541
  • repo_url: None
  • paper_authors: Sinjini Banerjee, Reilly Cannon, Tim Marrinan, Tony Chiang, Anand D. Sarwate
  • for: 这篇论文旨在提出一种新的分类模型相似度度量,用于理解深度神经网络(DNN)训练中随机优化带来的模型间差异:即使测试准确率相近,不同训练得到的模型也未必在计算同一个函数。
  • methods: 该度量基于网络在阈值化之前的输出,建立在一个稳健假设检验框架之上,并且可以推广到由训练得到的其他模型量。
  • results: 结果表明,这一度量能够刻画测试准确率无法区分的模型之间的计算差异,从而更细致地比较 DNN 模型。
    Abstract Training a deep neural network (DNN) often involves stochastic optimization, which means each run will produce a different model. Several works suggest this variability is negligible when models have the same performance, which in the case of classification is test accuracy. However, models with similar test accuracy may not be computing the same function. We propose a new measure of closeness between classification models based on the output of the network before thresholding. Our measure is based on a robust hypothesis-testing framework and can be adapted to other quantities derived from trained models.
    摘要 训练深度神经网络(DNN)通常涉及随机优化,这意味着每次训练都会得到不同的模型。已有工作认为,当模型具有相同性能(对分类任务而言即测试准确率)时,这种差异可以忽略;然而,测试准确率相近的模型未必在计算同一个函数。我们提出了一种基于网络阈值化之前输出的分类模型相似度度量。该度量建立在稳健假设检验框架之上,并可推广到由训练模型导出的其他量。
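
下面给出一个基于阈值化前输出(logits)的配对符号翻转置换检验的最小 Python 草图,用以示意"比较两个训练得到的模型是否在计算相近函数"这一思路;所用统计量(配对 logit 差的均值范数)只是一个示意性选择,并非论文中的稳健检验统计量。

```python
# Paired sign-flip permutation test on the pre-threshold outputs of two models
# evaluated on the same test inputs; logits arrays have shape (n_samples, n_classes).
# The statistic is an illustrative choice, not the paper's robust test.
import numpy as np

def closeness_pvalue(logits_a, logits_b, n_perm=2000, rng=None):
    rng = rng or np.random.default_rng(0)
    d = logits_a - logits_b                         # paired differences
    observed = np.linalg.norm(d.mean(axis=0))
    exceed = 0
    for _ in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=(d.shape[0], 1))
        if np.linalg.norm((signs * d).mean(axis=0)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)              # small p-value -> models differ

# toy usage with assumed logits from two training runs
rng = np.random.default_rng(2)
run_a = rng.normal(size=(500, 10))
run_b = run_a + rng.normal(scale=0.05, size=(500, 10)) + 0.1   # shared structure + shift
print(closeness_pvalue(run_a, run_b))
```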

Thompson Exploration with Best Challenger Rule in Best Arm Identification

  • paper_url: http://arxiv.org/abs/2310.00539
  • repo_url: None
  • paper_authors: Jongyeong Lee, Junya Honda, Masashi Sugiyama
  • for: 本研究针对典范单参数指数族模型下的固定置信度最佳臂识别(BAI)问题,提出了一种新的策略。
  • methods: 我们将 Thompson 采样与一种计算高效的方法——最佳挑战者规则(best challenger rule)——相结合,无需在每一轮求解优化问题,也不必强制对各臂进行最低次数的探索。
  • results: 我们证明该策略对任意双臂老虎机问题是渐近最优的,对一般的 $K$ 臂问题($K\geq 3$)则达到近似最优。数值实验表明,该策略在样本复杂度上与渐近最优策略相当,同时计算开销更低。
    Abstract This paper studies the fixed-confidence best arm identification (BAI) problem in the bandit framework in the canonical single-parameter exponential models. For this problem, many policies have been proposed, but most of them require solving an optimization problem at every round and/or are forced to explore an arm at least a certain number of times except those restricted to the Gaussian model. To address these limitations, we propose a novel policy that combines Thompson sampling with a computationally efficient approach known as the best challenger rule. While Thompson sampling was originally considered for maximizing the cumulative reward, we demonstrate that it can be used to naturally explore arms in BAI without forcing it. We show that our policy is asymptotically optimal for any two-armed bandit problems and achieves near optimality for general $K$-armed bandit problems for $K\geq 3$. Nevertheless, in numerical experiments, our policy shows competitive performance compared to asymptotically optimal policies in terms of sample complexity while requiring less computation cost. In addition, we highlight the advantages of our policy by comparing it to the concept of $\beta$-optimality, a relaxed notion of asymptotic optimality commonly considered in the analysis of a class of policies including the proposed one.
    摘要 本文研究典范单参数指数族模型下老虎机框架中的固定置信度最佳臂识别(BAI)问题。针对该问题已有许多策略被提出,但除局限于高斯模型的方法外,它们大多需要在每一轮求解一个优化问题,或被迫对每个臂进行至少一定次数的探索。为解决这些局限,我们提出了一种将 Thompson 采样与计算高效的最佳挑战者规则相结合的新策略。尽管 Thompson 采样最初是为最大化累积奖励而设计的,我们证明它可以在 BAI 中自然地完成探索而无需强制。我们证明该策略对任意双臂老虎机问题是渐近最优的,并对一般的 $K$ 臂问题($K\geq 3$)达到近似最优。数值实验表明,该策略在样本复杂度方面与渐近最优策略表现相当,且计算成本更低。此外,我们通过与 $\beta$-最优性——分析包括本策略在内的一类策略时常用的一种放宽的渐近最优性概念——进行比较,进一步说明了本策略的优势。
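
下面是一个示意性的 Python 草图,按照上文的思路把 Thompson 采样得到的"领先臂"与基于经验差距统计量选出的"最佳挑战者"结合起来,用于单位方差高斯臂的固定置信度 BAI;领先臂与挑战者各以 1/2 概率被拉动,差距统计量与停止阈值也只是粗略选择,均为示意性假设,并非论文策略的精确实现。

```python
# Fixed-confidence BAI sketch for unit-variance Gaussian arms: Thompson-sampled leader
# plus a "best challenger" with the smallest pairwise gap statistic. Illustration only.
import numpy as np

def ts_best_challenger(true_means, delta=0.05, horizon=100000, rng=None):
    rng = rng or np.random.default_rng(0)
    K = len(true_means)
    counts = np.ones(K)
    sums = np.array([rng.normal(m) for m in true_means])     # one initial pull per arm
    for t in range(K, horizon):
        mu_hat = sums / counts
        # leader: arm maximizing a Thompson sample from the Gaussian posteriors
        leader = int(np.argmax(rng.normal(mu_hat, 1.0 / np.sqrt(counts))))
        # challenger: arm with the smallest pairwise gap statistic against the leader
        stats = np.full(K, np.inf)
        for a in range(K):
            if a != leader:
                gap = max(mu_hat[leader] - mu_hat[a], 0.0)
                stats[a] = gap ** 2 / (2.0 * (1.0 / counts[leader] + 1.0 / counts[a]))
        challenger = int(np.argmin(stats))
        arm = leader if rng.random() < 0.5 else challenger
        sums[arm] += rng.normal(true_means[arm])
        counts[arm] += 1
        # crude Chernoff-style stopping rule on the smallest statistic
        if stats[challenger] > np.log((1.0 + np.log(t)) / delta):
            return int(np.argmax(sums / counts)), int(counts.sum())
    return int(np.argmax(sums / counts)), int(counts.sum())

print(ts_best_challenger([0.2, 0.4, 0.9]))   # (recommended arm, samples used)
```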

Statistical Limits of Adaptive Linear Models: Low-Dimensional Estimation and Inference

  • paper_url: http://arxiv.org/abs/2310.00532
  • repo_url: https://github.com/licong-lin/low-dim-debias
  • paper_authors: Licong Lin, Mufang Ying, Suvrojit Ghosh, Koulik Khamaru, Cun-Hui Zhang
  • for: 本文研究自适应采集的数据会如何影响统计估计与推断:在线性模型中,当数据被自适应采集时,普通最小二乘(OLS)估计器对单个坐标的估计可能失去渐近正态性,并出现放大的误差。
  • methods: 本文借助 minimax 下界刻画不同数据采集方式下的估计性能差异,并研究高维线性模型中低维参数分量的估计性能如何受数据采集自适应程度的影响。
  • results: 本文发现,当数据可以任意自适应采集时,估计单个坐标的误差相比 i.i.d. 情形可能被放大 $\sqrt{d}$ 倍;并给出了数据采集机制需满足的条件,使低维参数分量的估计误差(至多相差一个取决于自适应程度的因子)与 i.i.d. 情形相当,OLS 或对中心化数据做 OLS 即可达到该误差。此外,本文提出了一种通过求解两阶段自适应线性估计方程(TALE)进行单坐标推断的新估计器,并在较弱的自适应性假设下证明了其渐近正态性。
    Abstract Estimation and inference in statistics pose significant challenges when data are collected adaptively. Even in linear models, the Ordinary Least Squares (OLS) estimator may fail to exhibit asymptotic normality for single coordinate estimation and have inflated error. This issue is highlighted by a recent minimax lower bound, which shows that the error of estimating a single coordinate can be enlarged by a multiple of $\sqrt{d}$ when data are allowed to be arbitrarily adaptive, compared with the case when they are i.i.d. Our work explores this striking difference in estimation performance between utilizing i.i.d. and adaptive data. We investigate how the degree of adaptivity in data collection impacts the performance of estimating a low-dimensional parameter component in high-dimensional linear models. We identify conditions on the data collection mechanism under which the estimation error for a low-dimensional parameter component matches its counterpart in the i.i.d. setting, up to a factor that depends on the degree of adaptivity. We show that OLS or OLS on centered data can achieve this matching error. In addition, we propose a novel estimator for single coordinate inference via solving a Two-stage Adaptive Linear Estimating equation (TALE). Under a weaker form of adaptivity in data collection, we establish an asymptotic normality property of the proposed estimator.
    摘要 “统计中的估计和推断在收集数据时存在重要的挑战。甚至在线性模型中,常数最小二乘(OLS)估计器可能无法在单坐标估计中展现 asymptotic normality,并且有较大的误差。这个问题得到了最近的最小下界 bound,显示了数据收集机制允许自由变化时,估计单坐标误差可以被增加为 $\sqrt{d}$ 倍,与独立Identically distributed(i.i.d)数据相比。我们的工作探讨了这种估计性能之间的差异,并研究了高维线性模型中低维参数组件的估计性能如何受到数据收集机制的影响。我们确定了数据收集机制下的condition under which the estimation error for a low-dimensional parameter component matches its counterpart in the i.i.d. setting, up to a factor that depends on the degree of adaptivity。我们还提出了一种新的估计器,可以在高维线性模型中实现单坐标推断。在一种更弱的数据收集机制下,我们证明了该估计器的 asymptotic normality 性。”Note: Please note that the translation is in Simplified Chinese, and the word order and grammar may be different from the original text.