cs.LG - 2023-09-16

Reducing sequential change detection to sequential estimation

  • paper_url: http://arxiv.org/abs/2309.09111
  • repo_url: None
  • paper_authors: Shubhanshu Shekhar, Aaditya Ramdas
  • for: This paper addresses the problem of detecting changes in a parameter or functional θ of a data-stream distribution, with the goal of designing a change detection scheme that has small detection delay while controlling the frequency of false alarms when no change occurs.
  • methods: The paper reduces sequential change detection to sequential estimation using confidence sequences: a new (1-α)-confidence sequence is started at each time step, and a change is declared as soon as the intersection of all active confidence sequences becomes empty.
  • results: The paper proves that the average run length of the scheme is at least 1/α, yielding a change detection scheme with minimal structural assumptions (thus allowing possibly dependent observations and nonparametric distribution classes) but strong guarantees.
    Abstract We consider the problem of sequential change detection, where the goal is to design a scheme for detecting any changes in a parameter or functional $\theta$ of the data stream distribution that has small detection delay, but guarantees control on the frequency of false alarms in the absence of changes. In this paper, we describe a simple reduction from sequential change detection to sequential estimation using confidence sequences: we begin a new $(1-\alpha)$-confidence sequence at each time step, and proclaim a change when the intersection of all active confidence sequences becomes empty. We prove that the average run length is at least $1/\alpha$, resulting in a change detection scheme with minimal structural assumptions (thus allowing for possibly dependent observations, and nonparametric distribution classes), but strong guarantees. Our approach bears an interesting parallel with the reduction from change detection to sequential testing of Lorden (1971) and the e-detector of Shin et al. (2022).
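The reduction is simple enough to sketch directly. Below is a minimal illustration assuming [0,1]-valued observations and a Hoeffding-type confidence sequence; the paper's scheme works with any valid confidence sequence, and the specific width used here is one standard choice rather than the authors' construction.

```python
import numpy as np

def hoeffding_cs(x, alpha):
    """A (1-alpha) confidence sequence for the mean of [0,1]-valued data,
    using a simple time-uniform Hoeffding-type width (one standard choice)."""
    n = len(x)
    width = np.sqrt(np.log(2.0 * (n + 1) ** 2 / alpha) / (2.0 * n))
    m = np.mean(x)
    return m - width, m + width

def detect_change(stream, alpha=0.05):
    """Start a new confidence sequence at every time step and declare a
    change as soon as the intersection of all active sequences is empty."""
    data = []
    for t, x_t in enumerate(stream, start=1):
        data.append(x_t)
        lo, hi = -np.inf, np.inf
        for s in range(t):                      # CS started at time s+1
            l, h = hoeffding_cs(data[s:], alpha)
            lo, hi = max(lo, l), min(hi, h)     # running intersection
        if lo > hi:                             # intersection empty
            return t                            # declare change at time t
    return None                                 # no change detected
```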

Test-Time Compensated Representation Learning for Extreme Traffic Forecasting

  • paper_url: http://arxiv.org/abs/2309.09074
  • repo_url: None
  • paper_authors: Zhiwei Zhang, Weizhong Zhang, Yaowei Huang, Kani Chen
  • for: Traffic forecasting is a challenging task because traffic series exhibit complex spatio-temporal correlations. This paper investigates an underexplored problem in multivariate traffic series prediction: extreme events.
  • methods: The authors propose a test-time compensated representation learning framework comprising a spatio-temporal decomposed data bank and a multi-head spatial transformer model (CompFormer). The former separates all training data along the temporal dimension according to periodicity characteristics, while the latter connects recent observations with historical series in the data bank through a spatial attention matrix, transferring robust features to withstand extreme events.
  • results: The method achieves significant improvements under extreme events, outperforming six strong baselines on the METR-LA and PEMS-BAY benchmarks with an overall improvement of up to 28.2%.
    Abstract Traffic forecasting is a challenging task due to the complex spatio-temporal correlations among traffic series. In this paper, we identify an underexplored problem in multivariate traffic series prediction: extreme events. Road congestion and rush hours can result in low correlation in vehicle speeds at various intersections during adjacent time periods. Existing methods generally predict future series based on recent observations and entirely discard training data during the testing phase, rendering them unreliable for forecasting highly nonlinear multivariate time series. To tackle this issue, we propose a test-time compensated representation learning framework comprising a spatio-temporal decomposed data bank and a multi-head spatial transformer model (CompFormer). The former component explicitly separates all training data along the temporal dimension according to periodicity characteristics, while the latter component establishes a connection between recent observations and historical series in the data bank through a spatial attention matrix. This enables the CompFormer to transfer robust features to overcome anomalous events while using fewer computational resources. Our modules can be flexibly integrated with existing forecasting methods through end-to-end training, and we demonstrate their effectiveness on the METR-LA and PEMS-BAY benchmarks. Extensive experimental results show that our method is particularly important in extreme events, and can achieve significant improvements over six strong baselines, with an overall improvement of up to 28.2%.
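A hypothetical sketch of the spatial-attention connection between recent observations and the data bank; the class name and tensor shapes are illustrative assumptions, not the released CompFormer.

```python
import torch
import torch.nn as nn

class SpatialBankAttention(nn.Module):
    """Attend from each node's recent observation (query) to matching
    historical slots in a periodicity-decomposed data bank (keys/values)."""
    def __init__(self, d_model, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, recent, bank):
        # recent: (n_nodes, 1, d_model); bank: (n_nodes, n_slots, d_model)
        compensated, weights = self.attn(recent, bank, bank)
        return compensated.squeeze(1), weights  # robust compensated features
```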

Enhancing personalised thermal comfort models with Active Learning for improved HVAC controls

  • paper_url: http://arxiv.org/abs/2309.09073
  • repo_url: None
  • paper_authors: Zeynep Duygu Tekler, Yue Lei, Xilei Dai, Adrian Chong
  • for: This study aims to develop personalised thermal comfort models to inform occupant-centric controls (OCC) in buildings.
  • methods: The study proposes a thermal preference-based HVAC control framework enhanced with Active Learning (AL) to address the data challenges of real-world OCC implementations.
  • results: Results show that the AL approach reduces overall labelling effort by 31.0% while achieving a slight increase in energy savings (1.3%) and thermal satisfaction levels above 98%, indicating that such systems can be deployed in future real-world implementations for personalised comfort and energy-efficient building operations.
    Abstract Developing personalised thermal comfort models to inform occupant-centric controls (OCC) in buildings requires collecting large amounts of real-time occupant preference data. This process can be highly intrusive and labour-intensive for large-scale implementations, limiting the practicality of real-world OCC implementations. To address this issue, this study proposes a thermal preference-based HVAC control framework enhanced with Active Learning (AL) to address the data challenges related to real-world implementations of such OCC systems. The proposed AL approach proactively identifies the most informative thermal conditions for human annotation and iteratively updates a supervised thermal comfort model. The resulting model is subsequently used to predict the occupants' thermal preferences under different thermal conditions, which are integrated into the building's HVAC controls. The feasibility of our proposed AL-enabled OCC was demonstrated in an EnergyPlus simulation of a real-world testbed supplemented with the thermal preference data of 58 study occupants. The preliminary results indicated a significant reduction in overall labelling effort (i.e., 31.0%) between our AL-enabled OCC and conventional OCC while still achieving a slight increase in energy savings (i.e., 1.3%) and thermal satisfaction levels above 98%. This result demonstrates the potential for deploying such systems in future real-world implementations, enabling personalised comfort and energy-efficient building operations.
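A minimal sketch of one active-learning query round using least-confidence uncertainty sampling; the acquisition criterion, classifier, and toy features below are assumptions standing in for the paper's selection of the "most informative thermal conditions".

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def query_most_informative(model, pool_X, k=1):
    """Pick the k candidate thermal conditions the comfort model is
    least confident about (least-confidence uncertainty sampling)."""
    proba = model.predict_proba(pool_X)
    uncertainty = 1.0 - proba.max(axis=1)
    return np.argsort(uncertainty)[-k:]         # indices to send for annotation

# toy illustration: features = (air temperature degC, relative humidity %)
rng = np.random.default_rng(0)
labelled_X = rng.uniform([20, 30], [30, 70], size=(20, 2))
labelled_y = (labelled_X[:, 0] > 25).astype(int)   # stand-in preference labels
pool_X = rng.uniform([20, 30], [30, 70], size=(200, 2))

model = RandomForestClassifier(random_state=0).fit(labelled_X, labelled_y)
to_annotate = query_most_informative(model, pool_X, k=5)
```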

Recovering Missing Node Features with Local Structure-based Embeddings

  • paper_url: http://arxiv.org/abs/2309.09068
  • repo_url: None
  • paper_authors: Victor M. Tenorio, Madeline Navarro, Santiago Segarra, Antonio G. Marques
  • for: recover completely missing node features for a set of graphs
  • methods: incorporate prior information from both graph topology and existing nodal values, use a Graph AutoEncoder to train a node embedding space
  • results: accurate feature estimation approach, valuable for downstream graph classification
    Abstract Node features bolster graph-based learning when exploited jointly with network structure. However, a lack of nodal attributes is prevalent in graph data. We present a framework to recover completely missing node features for a set of graphs, where we only know the signals of a subset of graphs. Our approach incorporates prior information from both graph topology and existing nodal values. We demonstrate an example implementation of our framework where we assume that node features depend on local graph structure. Missing nodal values are estimated by aggregating known features from the most similar nodes. Similarity is measured through a node embedding space that preserves local topological features, which we train using a Graph AutoEncoder. We empirically show not only the accuracy of our feature estimation approach but also its value for downstream graph classification. Our success highlights the need to emphasize the relationship between node features and graph structure in graph-based learning.
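A sketch of the estimation step under stated assumptions: embeddings come from a structure-trained Graph AutoEncoder (here a two-layer GCN encoder via PyTorch Geometric), and missing features are the plain average of the k most similar known nodes; the cosine similarity and mean aggregation are illustrative choices.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GAE, GCNConv

class Encoder(torch.nn.Module):
    """GCN encoder for a Graph AutoEncoder trained on structural inputs
    (e.g., degree-based features), preserving local topology."""
    def __init__(self, d_in, d_emb):
        super().__init__()
        self.conv1 = GCNConv(d_in, 2 * d_emb)
        self.conv2 = GCNConv(2 * d_emb, d_emb)

    def forward(self, x, edge_index):
        return self.conv2(F.relu(self.conv1(x, edge_index)), edge_index)

model = GAE(Encoder(d_in=8, d_emb=32))   # trained on edges only (details elided)

def estimate_missing_features(emb_known, feats_known, emb_missing, k=5):
    """Average the known features of the k nearest nodes in embedding space."""
    sim = F.normalize(emb_missing, dim=-1) @ F.normalize(emb_known, dim=-1).T
    nearest = sim.topk(k, dim=-1).indices            # (n_missing, k)
    return feats_known[nearest].mean(dim=1)          # (n_missing, d_feat)
```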

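Temporal Smoothness Regularisers for Neural Link Predictors
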
  • paper_url: http://arxiv.org/abs/2309.09045
  • repo_url: None
  • paper_authors: Manuel Dileo, Pasquale Minervini, Matteo Zignani, Sabrina Gaito
  • for: This work aims to improve link prediction accuracy in knowledge graphs under temporal constraints.
  • methods: The study predicts links in temporal knowledge graphs using the canonical decomposition of 4-order tensors together with temporal smoothing regularisation, systematically analysing several choices of temporal smoothing regularisers based on linear functions and recurrent architectures.
  • results: With a careful choice of temporal smoothing regulariser and regularisation weight, a simple method like TNTComplEx produces new state-of-the-art results on three widely used temporal link prediction datasets; the impact of a wide range of temporal smoothing regularisers is also evaluated on two state-of-the-art models.
    Abstract Most algorithms for representation learning and link prediction on relational data are designed for static data. However, the data to which they are applied typically evolves over time, including online social networks or interactions between users and items in recommender systems. This is also the case for graph-structured knowledge bases -- knowledge graphs -- which contain facts that are valid only for specific points in time. In such contexts, it becomes crucial to correctly identify missing links at a precise time point, i.e., the temporal link prediction task. Recently, Lacroix et al. and Sadeghian et al. proposed a solution to the problem of link prediction for knowledge graphs under temporal constraints inspired by the canonical decomposition of 4-order tensors, where they regularise the representations of time steps by enforcing temporal smoothing, i.e. by learning similar transformations for adjacent timestamps. However, the impact of the choice of temporal regularisation terms is still poorly understood. In this work, we systematically analyse several choices of temporal smoothing regularisers using linear functions and recurrent architectures. In our experiments, we show that by carefully selecting the temporal smoothing regulariser and regularisation weight, a simple method like TNTComplEx can produce significantly more accurate results than state-of-the-art methods on three widely used temporal link prediction datasets. Furthermore, we evaluate the impact of a wide range of temporal smoothing regularisers on two state-of-the-art temporal link prediction models. Our work shows that simple tensor factorisation models can produce new state-of-the-art results using newly proposed temporal regularisers, highlighting a promising avenue for future research.
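A minimal sketch of the linear temporal-smoothing penalty on timestamp embeddings; the norm order and weighting are assumptions, and the paper additionally studies recurrent-architecture regularisers.

```python
import torch

def temporal_smoothing(time_emb, p=4):
    """Penalise differences between embeddings of adjacent timestamps,
    encouraging similar representations for nearby times.
    time_emb: (n_timestamps, rank) tensor of timestamp embeddings."""
    diffs = time_emb[1:] - time_emb[:-1]
    return diffs.abs().pow(p).sum() / max(time_emb.shape[0] - 1, 1)
```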

Study of Enhanced MISC-Based Sparse Arrays with High uDOFs and Low Mutual Coupling

  • paper_url: http://arxiv.org/abs/2309.09044
  • repo_url: None
  • paper_authors: X. Sheng, D. Lu, Y. Li, R. C. de Lamare
  • for: To increase the uniform degrees-of-freedom (uDOFs) and reduce the mutual coupling of sparse arrays (SAs).
  • methods: An enhanced MISC-based sparse array (EMISC SA) built on the maximum inter-element spacing (IES) constraint: an IES set is first determined from the maximum IES and the number of elements, and the array is then composed of seven uniform linear sub-arrays (ULSAs) derived from that set.
  • results: Analysis of the uDOFs and weight function shows that the proposed EMISC SA outperforms the IMISC SA in terms of uDOF and mutual coupling, and simulations show a significant advantage over other existing SAs.
    Abstract In this letter, inspired by the maximum inter-element spacing (IES) constraint (MISC) criterion, an enhanced MISC-based (EMISC) sparse array (SA) with high uniform degrees-of-freedom (uDOFs) and low mutual-coupling (MC) is proposed, analyzed and discussed in detail. For the EMISC SA, an IES set is first determined by the maximum IES and number of elements. Then, the EMISC SA is composed of seven uniform linear sub-arrays (ULSAs) derived from an IES set. An analysis of the uDOFs and weight function shows that, the proposed EMISC SA outperforms the IMISC SA in terms of uDOF and MC. Simulation results show a significant advantage of the EMISC SA over other existing SAs.
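For context, uniform degrees of freedom can be computed from the array geometry via the difference coarray; the sketch below uses the standard definition, not code from the letter.

```python
import numpy as np

def uniform_dof(positions):
    """uDOF of a sparse array: size of the largest contiguous segment
    (centred at lag 0) of its difference coarray."""
    pos = np.asarray(positions)
    lags = set(np.unique(pos[None, :] - pos[:, None]).tolist())
    u = 0
    while (u + 1) in lags:          # contiguous positive lags 1, 2, ...
        u += 1
    return 2 * u + 1                # coarray is symmetric: -u..u

print(uniform_dof([0, 1, 4, 6]))    # minimum-redundancy example -> 13
```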

Forward Invariance in Neural Network Controlled Systems

  • paper_url: http://arxiv.org/abs/2309.09043
  • repo_url: https://github.com/gtfactslab/harapanahalli_lcss2024
  • paper_authors: Akash Harapanahalli, Saber Jafarpour, Samuel Coogan
  • for: This paper certifies and searches for forward invariant sets in nonlinear systems with neural network controllers.
  • methods: The framework uses interval analysis and monotone systems theory: it constructs localized first-order inclusion functions for the closed-loop system using Jacobian bounds and existing neural network verification tools, builds a dynamical embedding system whose evaluation along a single trajectory directly corresponds with a nested family of hyper-rectangles provably converging to an attractive set of the original system, and uses linear transformations to build families of nested paralleletopes with the same properties.
  • results: The framework is automated in Python using the interval analysis toolbox $\texttt{npinterval}$ together with the symbolic arithmetic toolbox $\texttt{sympy}$, and is demonstrated on an $8$-dimensional leader-follower system.
    Abstract We present a framework based on interval analysis and monotone systems theory to certify and search for forward invariant sets in nonlinear systems with neural network controllers. The framework (i) constructs localized first-order inclusion functions for the closed-loop system using Jacobian bounds and existing neural network verification tools; (ii) builds a dynamical embedding system where its evaluation along a single trajectory directly corresponds with a nested family of hyper-rectangles provably converging to an attractive set of the original system; (iii) utilizes linear transformations to build families of nested paralleletopes with the same properties. The framework is automated in Python using our interval analysis toolbox $\texttt{npinterval}$, in conjunction with the symbolic arithmetic toolbox $\texttt{sympy}$, demonstrated on an $8$-dimensional leader-follower system.
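A minimal sketch of the embedding-system idea under strong assumptions: the user supplies a decomposition-based inclusion-function pair (d_lower, d_upper) for the closed-loop dynamics, and a single forward Euler trajectory of the 2n-dimensional embedding evolves a hyper-rectangle. The paper constructs such inclusion functions automatically with $\texttt{npinterval}$ and neural network verifiers; none of that machinery is reproduced here.

```python
import numpy as np

def embed_trajectory(d_lower, d_upper, lo0, hi0, dt=0.01, steps=1000):
    """Evolve the corners (lo, hi) of a hyper-rectangle with the embedding
    system lo' = d_lower(lo, hi), hi' = d_upper(lo, hi). If the resulting
    rectangles stay nested, they certify an attractive, forward invariant
    region for the original system."""
    lo, hi = np.array(lo0, float), np.array(hi0, float)
    boxes = [(lo.copy(), hi.copy())]
    for _ in range(steps):
        lo, hi = lo + dt * d_lower(lo, hi), hi + dt * d_upper(lo, hi)
        boxes.append((lo.copy(), hi.copy()))
    return boxes
```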

Solving Quadratic Systems with Full-Rank Matrices Using Sparse or Generative Priors

  • paper_url: http://arxiv.org/abs/2309.09032
  • repo_url: None
  • paper_authors: Junren Chen, Shuai Huang, Michael K. Ng, Zhaoqiang Liu
  • for: This paper addresses the problem of recovering a high-dimensional signal from a quadratic system with full-rank matrices, with a focus on incorporating prior knowledge of the signal to improve recovery performance.
  • methods: The paper proposes two algorithms for signal recovery: the thresholded Wirtinger flow (TWF) algorithm, comprising a spectral initialization and a thresholded gradient descent step, and the projected gradient descent (PGD) algorithm, which uses a projected power method followed by a projected gradient descent step.
  • results: Experimental results demonstrate the effectiveness of the proposed algorithms: the approach outperforms the existing provable algorithm sparse power factorization in the sparse case, and leveraging the generative prior allows precise image recovery on the MNIST dataset from a small number of quadratic measurements.
    Abstract The problem of recovering a signal $\boldsymbol{x} \in \mathbb{R}^n$ from a quadratic system $\{y_i=\boldsymbol{x}^\top\boldsymbol{A}_i\boldsymbol{x},\ i=1,\ldots,m\}$ with full-rank matrices $\boldsymbol{A}_i$ frequently arises in applications such as unassigned distance geometry and sub-wavelength imaging. With i.i.d. standard Gaussian matrices $\boldsymbol{A}_i$, this paper addresses the high-dimensional case where $m\ll n$ by incorporating prior knowledge of $\boldsymbol{x}$. First, we consider a $k$-sparse $\boldsymbol{x}$ and introduce the thresholded Wirtinger flow (TWF) algorithm that does not require the sparsity level $k$. TWF comprises two steps: the spectral initialization that identifies a point sufficiently close to $\boldsymbol{x}$ (up to a sign flip) when $m=O(k^2\log n)$, and the thresholded gradient descent (with a good initialization) that produces a sequence linearly converging to $\boldsymbol{x}$ with $m=O(k\log n)$ measurements. Second, we explore the generative prior, assuming that $\boldsymbol{x}$ lies in the range of an $L$-Lipschitz continuous generative model with $k$-dimensional inputs in an $\ell_2$-ball of radius $r$. We develop the projected gradient descent (PGD) algorithm that also comprises two steps: the projected power method that provides an initial vector with $O\big(\sqrt{\frac{k \log L}{m}}\big)$ $\ell_2$-error given $m=O(k\log(Lnr))$ measurements, and the projected gradient descent that refines the $\ell_2$-error to $O(\delta)$ at a geometric rate when $m=O(k\log\frac{Lrn}{\delta^2})$. Experimental results corroborate our theoretical findings and show that: (i) our approach for the sparse case notably outperforms the existing provable algorithm sparse power factorization; (ii) leveraging the generative prior allows for precise image recovery in the MNIST dataset from a small number of quadratic measurements.
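A sketch of TWF's two steps under simplifying assumptions (symmetric treatment of the $\boldsymbol{A}_i$, a heuristic soft-threshold level, fixed step size); the paper specifies the threshold and step-size choices that yield the stated guarantees.

```python
import numpy as np

def twf(A, y, iters=300, lr=0.1):
    """Thresholded Wirtinger flow sketch for y_i = x^T A_i x.
    A: (m, n, n) stack of full-rank matrices; y: (m,) measurements."""
    m = len(y)
    S = (A + A.transpose(0, 2, 1)) / 2          # symmetrised A_i
    # spectral initialization: top eigenpair of (1/m) sum_i y_i * S_i
    Y = np.tensordot(y, S, axes=1) / m
    w, V = np.linalg.eigh(Y)
    x = V[:, -1] * np.sqrt(max(w[-1], 0.0))
    for _ in range(iters):
        r = np.einsum('j,ijk,k->i', x, S, x) - y           # residuals
        g = 2.0 * np.einsum('i,ijk,k->j', r, S, x) / m     # gradient
        x = x - lr * g
        tau = 0.1 * np.max(np.abs(x))                      # heuristic level
        x = np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)  # soft threshold
    return x
```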

gym-saturation: Gymnasium environments for saturation provers (System description)

  • paper_url: http://arxiv.org/abs/2309.09022
  • repo_url: None
  • paper_authors: Boris Shminke
  • for: This paper describes a new version of the Python package gym-saturation: a collection of OpenAI Gym environments for guiding saturation-style provers based on the given clause algorithm with reinforcement learning.
  • methods: Usage examples are contributed for two different provers, Vampire and iProver. The proof state representation is decoupled from reinforcement learning per se, with examples of using the known ast2vec Python code embedding model as a first-order logic representation; environment wrappers are also shown to transform a prover into a problem similar to a multi-armed bandit.
  • results: Two reinforcement learning algorithms (Thompson sampling and proximal policy optimisation) implemented in Ray RLlib are applied, demonstrating the ease of experimentation with the new release of the package.
    Abstract This work describes a new version of a previously published Python package - gym-saturation: a collection of OpenAI Gym environments for guiding saturation-style provers based on the given clause algorithm with reinforcement learning. We contribute usage examples with two different provers: Vampire and iProver. We also have decoupled the proof state representation from reinforcement learning per se and provided examples of using a known ast2vec Python code embedding model as a first-order logic representation. In addition, we demonstrate how environment wrappers can transform a prover into a problem similar to a multi-armed bandit. We applied two reinforcement learning algorithms (Thompson sampling and Proximal policy optimisation) implemented in Ray RLlib to show the ease of experimentation with the new release of our package.
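A hypothetical usage sketch following the standard Gymnasium interaction loop; the environment id below is an assumption — consult the package documentation for the actual registered names.

```python
import gymnasium as gym
import gym_saturation  # noqa: F401  (assumed to register the environments)

env = gym.make("Vampire-v0")  # hypothetical id; an iProver-backed env also exists
observation, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random given-clause selection
    observation, reward, terminated, truncated, info = env.step(action)
env.close()
```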

Regularized Contrastive Pre-training for Few-shot Bioacoustic Sound Detection

  • paper_url: http://arxiv.org/abs/2309.08971
  • repo_url: None
  • paper_authors: Ilyass Moummad, Romain Serizel, Nicolas Farrugia
  • for: Bioacoustic sound event detection, to better understand animal behavior and monitor biodiversity using audio.
  • methods: Deep learning systems with regularized supervised contrastive pre-training.
  • results: F-scores of 61.52% (0.48) with no feature adaptation and 68.19% (0.75) when the learned features are further adapted to each new target task, on animal sounds unseen during training.
    Abstract Bioacoustic sound event detection allows for better understanding of animal behavior and for better monitoring biodiversity using audio. Deep learning systems can help achieve this goal, however it is difficult to acquire sufficient annotated data to train these systems from scratch. To address this limitation, the Detection and Classification of Acoustic Scenes and Events (DCASE) community has recasted the problem within the framework of few-shot learning and organize an annual challenge for learning to detect animal sounds from only five annotated examples. In this work, we regularize supervised contrastive pre-training to learn features that can transfer well on new target tasks with animal sounds unseen during training, achieving a high F-score of 61.52%(0.48) when no feature adaptation is applied, and an F-score of 68.19%(0.75) when we further adapt the learned features for each new target task. This work aims to lower the entry bar to few-shot bioacoustic sound event detection by proposing a simple and yet effective framework for this task, by also providing open-source code.
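A generic supervised contrastive loss over L2-normalised embeddings (Khosla et al. style), given as a sketch of the pre-training objective; the paper's regularisation on top of this objective is not reproduced here.

```python
import torch

def supcon_loss(features, labels, temperature=0.07):
    """features: (batch, d) L2-normalised embeddings; labels: (batch,).
    Pulls same-class clips together and pushes different classes apart."""
    sim = features @ features.T / temperature
    n = features.shape[0]
    self_mask = torch.eye(n, dtype=torch.bool, device=features.device)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    sim = sim.masked_fill(self_mask, float('-inf'))       # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    loss_terms = log_prob.masked_fill(~pos_mask, 0.0)     # keep positives only
    pos_counts = pos_mask.sum(1).clamp(min=1)
    return -(loss_terms.sum(1) / pos_counts).mean()
```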

UNIDEAL: Curriculum Knowledge Distillation Federated Learning

  • paper_url: http://arxiv.org/abs/2309.08961
  • repo_url: None
  • paper_authors: Yuwen Yang, Chang Liu, Xun Cai, Suizhi Huang, Hongtao Lu, Yue Ding
  • for: This paper tackles the heterogeneity challenges of cross-domain federated learning, proposing a novel federated learning algorithm named UNIDEAL.
  • methods: The algorithm introduces Adjustable Teacher-Student Mutual Evaluation Curriculum Learning, which significantly enhances the effectiveness of knowledge distillation in federated learning settings.
  • results: Extensive experiments on various datasets against state-of-the-art baselines show that UNIDEAL achieves superior performance in both model accuracy and communication efficiency; a convergence analysis is also provided, showing a convergence rate of O(1/T) under non-convex conditions.
    Abstract Federated Learning (FL) has emerged as a promising approach to enable collaborative learning among multiple clients while preserving data privacy. However, cross-domain FL tasks, where clients possess data from different domains or distributions, remain a challenging problem due to the inherent heterogeneity. In this paper, we present UNIDEAL, a novel FL algorithm specifically designed to tackle the challenges of cross-domain scenarios and heterogeneous model architectures. The proposed method introduces Adjustable Teacher-Student Mutual Evaluation Curriculum Learning, which significantly enhances the effectiveness of knowledge distillation in FL settings. We conduct extensive experiments on various datasets, comparing UNIDEAL with state-of-the-art baselines. Our results demonstrate that UNIDEAL achieves superior performance in terms of both model accuracy and communication efficiency. Additionally, we provide a convergence analysis of the algorithm, showing a convergence rate of O(1/T) under non-convex conditions.
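A generic Hinton-style distillation term for reference; UNIDEAL's adjustable teacher-student mutual evaluation curriculum (how this term is scheduled and weighted across clients and rounds) is not reproduced here.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=2.0):
    """Soft-label knowledge distillation: KL divergence between
    temperature-softened teacher and student distributions, scaled by T^2."""
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
```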

Reducing Memory Requirements for the IPU using Butterfly Factorizations

  • paper_url: http://arxiv.org/abs/2309.08946
  • repo_url: None
  • paper_authors: S. -Kazem Shekofteh, Christian Alles, Holger Fröning
  • for: This work investigates how the Intelligence Processing Unit (IPU) can run compressed models despite its limited on-chip memory, improving the scalability of high-performance computing.
  • methods: Butterfly factorizations are used as replacements for fully-connected and convolutional layers, compressing the models.
  • results: Experiments show that butterfly structures provide a 98.5% compression ratio, greatly reducing memory requirements; the IPU implementation gains 1.3x and 1.6x performance improvements for butterfly and pixelated butterfly, respectively, and reaches a 1.62x training time speedup on the real-world dataset CIFAR10.
    Abstract High Performance Computing (HPC) benefits from different improvements during last decades, specially in terms of hardware platforms to provide more processing power while maintaining the power consumption at a reasonable level. The Intelligence Processing Unit (IPU) is a new type of massively parallel processor, designed to speedup parallel computations with huge number of processing cores and on-chip memory components connected with high-speed fabrics. IPUs mainly target machine learning applications, however, due to the architectural differences between GPUs and IPUs, especially significantly less memory capacity on an IPU, methods for reducing model size by sparsification have to be considered. Butterfly factorizations are well-known replacements for fully-connected and convolutional layers. In this paper, we examine how butterfly structures can be implemented on an IPU and study their behavior and performance compared to a GPU. Experimental results indicate that these methods can provide 98.5% compression ratio to decrease the immense need for memory, the IPU implementation can benefit from 1.3x and 1.6x performance improvement for butterfly and pixelated butterfly, respectively. We also reach to 1.62x training time speedup on a real-world dataset such as CIFAR10.
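A generic butterfly-factorised linear layer sketch in PyTorch: log2(n) learned factors, each mixing index pairs a fixed stride apart, give O(n log n) parameters in place of a dense n x n weight. This illustrates the structure only, not the paper's IPU kernels or the pixelated variant.

```python
import torch
import torch.nn as nn

class ButterflyLinear(nn.Module):
    """Butterfly-factorised n x n linear map as a product of log2(n)
    block-sparse factors with learned 2x2 blocks."""
    def __init__(self, n):
        super().__init__()
        assert n > 1 and n & (n - 1) == 0, "n must be a power of two"
        self.n = n
        levels = n.bit_length() - 1
        self.blocks = nn.Parameter(torch.randn(levels, n // 2, 2, 2) * 0.7)

    def forward(self, x):                          # x: (batch, n)
        for lvl in range(self.blocks.shape[0]):
            stride = 1 << lvl
            x = x.view(-1, self.n // (2 * stride), 2, stride)
            a, b = x[:, :, 0, :], x[:, :, 1, :]    # partners i and i+stride
            w = self.blocks[lvl].view(self.n // (2 * stride), stride, 2, 2)
            new_a = w[..., 0, 0] * a + w[..., 0, 1] * b
            new_b = w[..., 1, 0] * a + w[..., 1, 1] * b
            x = torch.stack((new_a, new_b), dim=2).view(-1, self.n)
        return x

layer = ButterflyLinear(64)
y = layer(torch.randn(8, 64))   # 6 * 32 * 4 = 768 weights vs. 4096 dense
```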

Inverse classification with logistic and softmax classifiers: efficient optimization

  • paper_url: http://arxiv.org/abs/2309.08945
  • repo_url: None
  • paper_authors: Miguel Á. Carreira-Perpiñán, Suryabhan Singh Hada
  • for: This work solves inverse classification problems: querying a trained classifier for the instance closest to a given input such that the classifier's predicted label changes in a desired way.
  • methods: The study focuses on logistic regression and softmax classifiers, exploiting special properties of these models to solve the optimization quickly.
  • results: The optimization can be solved in closed form for logistic regression, and iteratively but extremely fast for the softmax classifier, solving either case exactly (to nearly machine precision) in milliseconds to around a second, even for very high-dimensional instances and many classes.
    Abstract In recent years, a certain type of problem has become of interest where one wants to query a trained classifier. Specifically, one wants to find the closest instance to a given input instance such that the classifier's predicted label is changed in a desired way. Examples of these ``inverse classification'' problems are counterfactual explanations, adversarial examples and model inversion. All of them are fundamentally optimization problems over the input instance vector involving a fixed classifier, and it is of interest to achieve a fast solution for interactive or real-time applications. We focus on solving this problem efficiently for two of the most widely used classifiers: logistic regression and softmax classifiers. Owing to special properties of these models, we show that the optimization can be solved in closed form for logistic regression, and iteratively but extremely fast for the softmax classifier. This allows us to solve either case exactly (to nearly machine precision) in a runtime of milliseconds to around a second even for very high-dimensional instances and many classes.
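For the binary logistic case, the closed form is the familiar Euclidean projection onto the flipped halfspace; the sketch below illustrates that step only (the eps offset is a numerical convenience), while the paper also handles the softmax case and more general desired label changes.

```python
import numpy as np

def closest_flip_logistic(x0, w, b, eps=1e-6):
    """Closest instance (Euclidean) to x0 whose predicted label flips
    under the logistic classifier sign(w.x + b): project x0 just past
    the decision hyperplane w.x + b = 0."""
    score = w @ x0 + b
    step = (abs(score) + eps) / (w @ w)
    return x0 - np.sign(score) * step * w

# sanity check: the new score has the opposite sign
w, b = np.array([1.0, -2.0]), 0.5
x0 = np.array([3.0, 1.0])
x_cf = closest_flip_logistic(x0, w, b)
print(np.sign(w @ x0 + b), np.sign(w @ x_cf + b))   # 1.0 -1.0
```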

Fast Approximation of the Shapley Values Based on Order-of-Addition Experimental Designs

  • paper_url: http://arxiv.org/abs/2309.08923
  • repo_url: None
  • paper_authors: Liuqing Yang, Yongdao Zhou, Haoda Fu, Min-Qian Liu, Wei Zheng
  • for: To fairly attribute gains and costs to each player in a coalition game, e.g., assessing each player's contribution in multi-party collaboration.
  • methods: The Shapley value is used, but since exact computation is prohibitively expensive, it is estimated by sampling permutations; the proposed sampling scheme is based on combinatorial structures from the design of experiments (DOE), particularly order-of-addition experimental designs.
  • results: Compared with simple random sampling (SRS), the DOE-based sampling schemes achieve higher estimation accuracy, yield unbiased estimates, can sometimes deterministically recover the original Shapley value, and are even slightly faster; real data analyses are conducted on the C. elegans nervous system and the 9/11 terrorist network.
    Abstract Shapley value is originally a concept in econometrics to fairly distribute both gains and costs to players in a coalition game. In the recent decades, its application has been extended to other areas such as marketing, engineering and machine learning. For example, it produces reasonable solutions for problems in sensitivity analysis, local model explanation towards the interpretable machine learning, node importance in social network, attribution models, etc. However, its heavy computational burden has been long recognized but rarely investigated. Specifically, in a $d$-player coalition game, calculating a Shapley value requires the evaluation of $d!$ or $2^d$ marginal contribution values, depending on whether we are taking the permutation or combination formulation of the Shapley value. Hence it becomes infeasible to calculate the Shapley value when $d$ is reasonably large. A common remedy is to take a random sample of the permutations to surrogate for the complete list of permutations. We find an advanced sampling scheme can be designed to yield much more accurate estimation of the Shapley value than the simple random sampling (SRS). Our sampling scheme is based on combinatorial structures in the field of design of experiments (DOE), particularly the order-of-addition experimental designs for the study of how the orderings of components would affect the output. We show that the obtained estimates are unbiased, and can sometimes deterministically recover the original Shapley value. Both theoretical and simulations results show that our DOE-based sampling scheme outperforms SRS in terms of estimation accuracy. Surprisingly, it is also slightly faster than SRS. Lastly, real data analysis is conducted for the C. elegans nervous system and the 9/11 terrorist network.
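The SRS baseline the paper improves upon is easy to state in code; `value` is any user-supplied coalition value function, and the DOE-based designs replace the `rng.permutation` draws with structured orderings (not reproduced here).

```python
import numpy as np

def shapley_srs(value, d, n_perms=1000, seed=0):
    """Estimate Shapley values by averaging marginal contributions over
    randomly sampled permutations (simple random sampling)."""
    rng = np.random.default_rng(seed)
    phi = np.zeros(d)
    for _ in range(n_perms):
        coalition = []
        v_prev = value(coalition)
        for player in rng.permutation(d):
            coalition.append(int(player))
            v_new = value(coalition)
            phi[player] += v_new - v_prev
            v_prev = v_new
    return phi / n_perms

# toy symmetric game: value of a coalition is the square of its size
print(shapley_srs(lambda S: len(S) ** 2, d=3))   # approx [3., 3., 3.]
```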

Efficient Methods for Non-stationary Online Learning

  • paper_url: http://arxiv.org/abs/2309.08911
  • repo_url: None
  • paper_authors: Peng Zhao, Yan-Feng Xie, Lijun Zhang, Zhi-Hua Zhou
  • for: This paper targets optimizing dynamic regret and adaptive regret for online convex optimization in non-stationary environments.
  • methods: Building on the reduction mechanism developed in parameter-free online learning, the paper proposes efficient methods that reduce the number of projections per round from $\mathcal{O}(\log T)$ to $1$, requiring only one gradient query and one function evaluation per round.
  • results: Empirical studies verify the theoretical findings, demonstrating the efficiency of the methods in non-stationary environments.
    Abstract Non-stationary online learning has drawn much attention in recent years. In particular, dynamic regret and adaptive regret are proposed as two principled performance measures for online convex optimization in non-stationary environments. To optimize them, a two-layer online ensemble is usually deployed due to the inherent uncertainty of the non-stationarity, in which a group of base-learners are maintained and a meta-algorithm is employed to track the best one on the fly. However, the two-layer structure raises the concern about the computational complexity -- those methods typically maintain $\mathcal{O}(\log T)$ base-learners simultaneously for a $T$-round online game and thus perform multiple projections onto the feasible domain per round, which becomes the computational bottleneck when the domain is complicated. In this paper, we present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $\mathcal{O}(\log T)$ to $1$. Moreover, our obtained algorithms require only one gradient query and one function evaluation at each round. Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial twists on non-stationary online methods. Empirical studies verify our theoretical findings.
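For intuition, the meta layer of such a two-layer ensemble is a Hedge-style weighted combination of base-learner predictions; a generic one-round update is sketched below (the paper's contribution is making the base layer need only one projection and one gradient query per round, which this sketch does not reproduce).

```python
import numpy as np

def meta_step(base_preds, base_losses, weights, eta=0.5):
    """One round of the meta layer: combine base-learner predictions by the
    current weights, then update the weights multiplicatively from losses.
    base_preds: (k, dim); base_losses: (k,); weights: (k,) point on simplex."""
    combined = weights @ base_preds
    weights = weights * np.exp(-eta * base_losses)
    return combined, weights / weights.sum()
```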

Robust Online Covariance and Sparse Precision Estimation Under Arbitrary Data Corruption

  • paper_url: http://arxiv.org/abs/2309.08884
  • repo_url: None
  • paper_authors: Tong Yao, Shreyas Sundaram
  • for: This work aims to robustly estimate the covariance matrix in an online scenario, even in the presence of data corruption and attacks.
  • methods: A modified trimmed-inner-product algorithm is proposed for online covariance estimation that is robust to arbitrary and adversarial data corruption.
  • results: Error bounds and convergence properties of the estimates to the true sparse inverse covariance (i.e., precision) matrix are provided.
    Abstract Gaussian graphical models are widely used to represent correlations among entities but remain vulnerable to data corruption. In this work, we introduce a modified trimmed-inner-product algorithm to robustly estimate the covariance in an online scenario even in the presence of arbitrary and adversarial data attacks. At each time step, data points, drawn nominally independently and identically from a multivariate Gaussian distribution, arrive. However, a certain fraction of these points may have been arbitrarily corrupted. We propose an online algorithm to estimate the sparse inverse covariance (i.e., precision) matrix despite this corruption. We provide the error-bound and convergence properties of the estimates to the true precision matrix under our algorithms.
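A generic sketch of the trimmed-inner-product primitive: discard the largest-magnitude coordinate-wise products before summing, so that a small corrupted fraction cannot dominate the estimate; the trimming rule here is illustrative, not the paper's exact algorithm.

```python
import numpy as np

def trimmed_inner_product(u, v, trim_frac=0.1):
    """Inner product of u and v with the ceil(trim_frac * n) largest
    |u_i * v_i| terms removed, limiting adversarial influence."""
    prods = u * v
    k = int(np.ceil(trim_frac * prods.size))
    if k == 0:
        return prods.sum()
    keep = np.argsort(np.abs(prods))[:-k]
    return prods[keep].sum()
```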

Rethinking Learning Rate Tuning in the Era of Large Language Models

  • paper_url: http://arxiv.org/abs/2309.08859
  • repo_url: https://github.com/mlsysx/lrbenchplusplus
  • paper_authors: Hongpeng Jin, Wenqi Wei, Xuyu Wang, Wenbin Zhang, Yanzhao Wu
  • for: This study revisits learning rate tuning for fine-tuning Large Language Models (LLMs), a hyperparameter with direct impact on both fine-tuning efficiency and fine-tuned model quality.
  • methods: Existing learning rate policies are re-examined to analyse the challenges and opportunities of learning rate tuning for LLM fine-tuning, and LRBench++ is presented to benchmark learning rate policies and facilitate tuning for both traditional deep neural networks (DNNs) and LLMs.
  • results: Experimental analysis with LRBench++ demonstrates the key differences between LLM fine-tuning and traditional DNN training, validates the analysis, and helps identify effective learning rate policies.
    Abstract Large Language Models (LLMs) represent the recent success of deep learning in achieving remarkable human-like predictive performance. It has become a mainstream strategy to leverage fine-tuning to adapt LLMs for various real-world applications due to the prohibitive expenses associated with LLM training. The learning rate is one of the most important hyperparameters in LLM fine-tuning with direct impacts on both fine-tuning efficiency and fine-tuned LLM quality. Existing learning rate policies are primarily designed for training traditional deep neural networks (DNNs), which may not work well for LLM fine-tuning. We reassess the research challenges and opportunities of learning rate tuning in the coming era of Large Language Models. This paper makes three original contributions. First, we revisit existing learning rate policies to analyze the critical challenges of learning rate tuning in the era of LLMs. Second, we present LRBench++ to benchmark learning rate policies and facilitate learning rate tuning for both traditional DNNs and LLMs. Third, our experimental analysis with LRBench++ demonstrates the key differences between LLM fine-tuning and traditional DNN training and validates our analysis.
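For concreteness, two common policies of the kind such a benchmark compares; the constants below are illustrative, not recommendations from the paper.

```python
import math

def linear_warmup_decay(step, total, warmup=0.03, lr0=2e-5):
    """Linear warmup then linear decay, a common LLM fine-tuning policy."""
    w = int(warmup * total)
    if step < w:
        return lr0 * step / max(w, 1)
    return lr0 * (total - step) / max(total - w, 1)

def cosine_decay(step, total, lr0=2e-5, lr_min=0.0):
    """Cosine annealing from lr0 down to lr_min over `total` steps."""
    return lr_min + 0.5 * (lr0 - lr_min) * (1 + math.cos(math.pi * step / total))
```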

Intelligent machines work in unstructured environments by differential neural computing

  • paper_url: http://arxiv.org/abs/2309.08835
  • repo_url: None
  • paper_authors: Shengbo Wang, Shuo Gao, Chenyu Tang, Cong Li, Shurui Wang, Jiaqi Wang, Hubin Zhao, Guohua Hu, Arokia Nathan, Ravinder Dahiya, Luigi Occhipinti
  • for: To enable intelligent machines to efficiently process unstructured information in unknown real-world environments, as humans do.
  • methods: A memristive neural computing based perceptual signal differential processing and learning method: main features of environmental information are extracted and associated encoded stimuli are applied to memristors, yielding human-like processing of unstructured environmental information, including amplification (>720%) and adaptation (<50%) of mechanical stimuli.
  • results: The method shows good scalability and generalization in two typical applications of intelligent machines: object grasping and autonomous driving. In the former, a robot hand learns unknown object features (e.g., sharp corners and smooth surfaces) with a single memristor in 1 ms, realizing safe and stable grasping; in the latter, decision-making information for 10 unstructured driving environments (e.g., overtaking cars, pedestrians) is extracted with 94% accuracy using a 40x25 memristor array.
    Abstract Expecting intelligent machines to efficiently work in real world requires a new method to understand unstructured information in unknown environments with good accuracy, scalability and generalization, like human. Here, a memristive neural computing based perceptual signal differential processing and learning method for intelligent machines is presented, via extracting main features of environmental information and applying associated encoded stimuli to memristors, we successfully obtain human-like ability in processing unstructured environmental information, such as amplification (>720%) and adaptation (<50%) of mechanical stimuli. The method also exhibits good scalability and generalization, validated in two typical applications of intelligent machines: object grasping and autonomous driving. In the former, a robot hand experimentally realizes safe and stable grasping, through learning unknown object features (e.g., sharp corner and smooth surface) with a single memristor in 1 ms. In the latter, the decision-making information of 10 unstructured environments in autonomous driving (e.g., overtaking cars, pedestrians) are accurately (94%) extracted with a 40x25 memristor array. By mimicking the intrinsic nature of human low-level perception mechanisms in electronic memristive neural circuits, the proposed method is adaptable to diverse sensing technologies, helping intelligent machines to generate smart high-level decisions in real world.