cs.LG - 2023-08-22

A free from local minima algorithm for training regressive MLP neural networks

  • paper_url: http://arxiv.org/abs/2308.11532
  • repo_url: None
  • paper_authors: Augusto Montisci
  • for: Presents an innovative training method for regressive MLP networks that avoids the problem of local minima.
  • methods: The method exploits the properties of the distribution of the training set, or rather its image internal to the neural network, so that training never gets trapped in a local minimum.
  • results: Performance is demonstrated on a well-known benchmark, showing that the method sidesteps the local-minima problem.
    Abstract In this article an innovative method for training regressive MLP networks is presented, which is not subject to local minima. The Error-Back-Propagation algorithm, proposed by Rumelhart, Hinton and Williams, has had the merit of favouring the development of machine learning techniques, which have permeated every branch of research and technology since the mid-1980s. This extraordinary success is largely due to the black-box approach, but this same factor was also seen as a limitation as soon as more challenging problems were approached. One of the most critical aspects of the training algorithms is that of local minima of the loss function, typically the mean squared error of the output on the training set. In fact, as the most popular training algorithms are driven by the derivatives of the loss function, there is no way to evaluate whether a reached minimum is local or global. The algorithm presented in this paper avoids the problem of local minima, as the training is based on the properties of the distribution of the training set, or rather on its image internal to the neural network. The performance of the algorithm is shown for a well-known benchmark.

ReLiCADA – Reservoir Computing using Linear Cellular Automata Design Algorithm

  • paper_url: http://arxiv.org/abs/2308.11522
  • repo_url: None
  • paper_authors: Jonas Kantic, Fabian C. Legl, Walter Stechele, Jakob Hermann
  • for: Optimizing the design of Reservoir Computing for time series applications.
  • methods: Uses Cellular Automata models as the reservoir and solves the open problem of linear Cellular Automaton rule selection, pre-selecting a few promising candidate rules from an exponentially growing rule space.
  • results: On relevant benchmark datasets, the selected rules achieve low errors and rank among the top 5% of the overall rule space; compared to other state-of-the-art time series models, the approach offers lower computational complexity and training time while achieving lower errors.
    Abstract In this paper, we present a novel algorithm to optimize the design of Reservoir Computing using Cellular Automata models for time series applications. Besides selecting the models' hyperparameters, the proposed algorithm particularly solves the open problem of linear Cellular Automaton rule selection. The selection method pre-selects only a few promising candidate rules out of an exponentially growing rule space. When applied to relevant benchmark datasets, the selected rules achieve low errors, with the best rules being among the top 5% of the overall rule space. The algorithm was developed based on mathematical analysis of linear Cellular Automaton properties and is backed by almost one million experiments, adding up to a computational runtime of nearly one year. Comparisons to other state-of-the-art time series models show that the proposed Reservoir Computing using Cellular Automata models have lower computational complexity while, at the same time, achieving lower errors. Hence, our approach reduces the time needed for training and hyperparameter optimization by up to several orders of magnitude.
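
To make the reservoir substrate concrete, here is a minimal NumPy sketch of one step of a one-dimensional linear cellular automaton of the kind whose rules ReLiCADA selects; the coefficient vector, the rule-90 example, and the toy input-injection scheme are illustrative choices, not the paper's algorithm.

```python
import numpy as np

def linear_ca_step(state, coeffs, p=2):
    """One synchronous step of a linear CA with cyclic boundary:
    x_i' = sum_k coeffs[k] * x_{i+k-r} (mod p). For p=2 and
    coeffs=(1, 0, 1) this is the XOR rule known as rule 90."""
    r = len(coeffs) // 2
    nxt = np.zeros_like(state)
    for offset, c in zip(range(-r, r + 1), coeffs):
        nxt = (nxt + c * np.roll(state, -offset)) % p
    return nxt

# Toy reservoir run: perturb the CA with one input bit per time step and
# collect the evolving states as features for a linear readout.
rng = np.random.default_rng(0)
state = rng.integers(0, 2, size=64)
features = []
for t in range(100):
    state[0] ^= rng.integers(0, 2)        # inject an input bit
    state = linear_ca_step(state, (1, 0, 1))
    features.append(state.copy())
features = np.stack(features)             # (time, cells), fed to e.g. ridge regression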

EM for Mixture of Linear Regression with Clustered Data

  • paper_url: http://arxiv.org/abs/2308.11518
  • repo_url: None
  • paper_authors: Amirhossein Reisizadeh, Khashayar Gatmiry, Asuman Ozdaglar
  • for: Asking how underlying clustered structures in distributed data can be exploited to improve learning schemes.
  • methods: Uses the Expectation-Maximization (EM) method to estimate the parameters of a two-component mixture of linear regressions, where each of $m$ nodes generates $n$ samples sharing a latent variable.
  • results: Shows that, if initialized properly and as long as $m$ grows as $e^{o(n)}$, EM on the structured data requires only $O(1)$ iterations to reach the statistical accuracy of $O(\sqrt{d/(mn)})$, backed by novel asymptotic optimization and generalization guarantees for population and empirical EM with dependent samples.
    Abstract Modern data-driven and distributed learning frameworks deal with diverse massive data generated by clients spread across heterogeneous environments. Indeed, data heterogeneity is a major bottleneck in scaling up many distributed learning paradigms. In many settings however, heterogeneous data may be generated in clusters with shared structures, as is the case in several applications such as federated learning where a common latent variable governs the distribution of all the samples generated by a client. It is therefore natural to ask how the underlying clustered structures in distributed data can be exploited to improve learning schemes. In this paper, we tackle this question in the special case of estimating $d$-dimensional parameters of a two-component mixture of linear regressions problem where each of $m$ nodes generates $n$ samples with a shared latent variable. We employ the well-known Expectation-Maximization (EM) method to estimate the maximum likelihood parameters from $m$ batches of dependent samples each containing $n$ measurements. Discarding the clustered structure in the mixture model, EM is known to require $O(\log(mn/d))$ iterations to reach the statistical accuracy of $O(\sqrt{d/(mn)})$. In contrast, we show that if initialized properly, EM on the structured data requires only $O(1)$ iterations to reach the same statistical accuracy, as long as $m$ grows as $e^{o(n)}$. Our analysis establishes and combines novel asymptotic optimization and generalization guarantees for population and empirical EM with dependent samples, which may be of independent interest.
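
A minimal NumPy sketch of the EM iteration in the clustered setting: each batch (node) shares one latent sign, so the E-step computes a single posterior weight per batch rather than per sample. The symmetric-mixture form y = z·⟨β, x⟩ + noise with unit noise variance is an assumption for illustration, not the authors' code.

```python
import numpy as np

def em_clustered_mlr(X, Y, beta, n_iters=5):
    """EM for a symmetric two-component mixture of linear regressions
    y = z * <beta, x> + noise, where the sign z in {-1, +1} is shared by
    all n samples of a batch (one latent variable per node).

    X: (m, n, d) covariates, Y: (m, n) responses, beta: (d,) init."""
    m, n, d = X.shape
    for _ in range(n_iters):
        # E-step: posterior mean of the shared sign, one per batch.
        s = np.einsum('mnd,d->mn', X, beta)      # <beta, x_ij>
        w = np.tanh((s * Y).sum(axis=1))         # E[z_i | batch i]
        # M-step: least squares against the sign-weighted responses.
        Xf = X.reshape(m * n, d)
        yf = (w[:, None] * Y).reshape(m * n)
        beta = np.linalg.lstsq(Xf, yf, rcond=None)[0]
    return beta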

TrackFlow: Multi-Object Tracking with Normalizing Flows

  • paper_url: http://arxiv.org/abs/2308.11513
  • repo_url: None
  • paper_authors: Gianluca Mancusi, Aniello Panariello, Angelo Porrello, Matteo Fabbri, Simone Calderara, Rita Cucchiara
  • for: Improving multi-object tracking with tracking-by-detection, particularly in multi-modal settings where a comprehensive association cost must be computed from heterogeneous information.
  • methods: Uses a deep density estimator to model the conditional joint probability distribution of correct associations, casting the cost of a candidate association as its negative log-likelihood; this probabilistic formulation removes the need for tailored hyperparameters and drops the unrealistic assumption that individual costs are independent.
  • results: Experiments on both simulated and real benchmarks show the approach consistently enhances the performance of several tracking-by-detection algorithms.
    Abstract The field of multi-object tracking has recently seen a renewed interest in the good old schema of tracking-by-detection, as its simplicity and strong priors spare it from the complex design and painful babysitting of tracking-by-attention approaches. In view of this, we aim at extending tracking-by-detection to multi-modal settings, where a comprehensive cost has to be computed from heterogeneous information e.g., 2D motion cues, visual appearance, and pose estimates. More precisely, we follow a case study where a rough estimate of 3D information is also available and must be merged with other traditional metrics (e.g., the IoU). To achieve that, recent approaches resort to either simple rules or complex heuristics to balance the contribution of each cost. However, i) they require careful tuning of tailored hyperparameters on a hold-out set, and ii) they imply these costs to be independent, which does not hold in reality. We address these issues by building upon an elegant probabilistic formulation, which considers the cost of a candidate association as the negative log-likelihood yielded by a deep density estimator, trained to model the conditional joint probability distribution of correct associations. Our experiments, conducted on both simulated and real benchmarks, show that our approach consistently enhances the performance of several tracking-by-detection algorithms.
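
At inference time the probabilistic cost reduces to scoring each candidate track-detection pair under the learned density model. A minimal sketch, where `log_density` stands in for the trained deep density estimator (an assumption here, as is the feature layout):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(log_density, pair_features):
    """pair_features: (T, D, F) array of heterogeneous cues (e.g. IoU,
    appearance distance, rough 3D distance) for T tracks x D detections.
    Cost of an association = negative log-likelihood of it being correct."""
    T, D, _ = pair_features.shape
    costs = np.array([[-log_density(pair_features[t, d])
                       for d in range(D)] for t in range(T)])
    return linear_sum_assignment(costs)   # Hungarian matching on the costs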

Mode Combinability: Exploring Convex Combinations of Permutation Aligned Models

  • paper_url: http://arxiv.org/abs/2308.11511
  • repo_url: None
  • paper_authors: Adrián Csiszárik, Melinda F. Kiss, Péter Kőrösi-Szabó, Márton Muntag, Gergely Papp, Dániel Varga
  • for: Studying element-wise convex combinations of two permutation-aligned neural network parameter vectors $\Theta_A$ and $\Theta_B$ of the same architecture.
  • methods: Conducts extensive experiments over model combinations parametrized by elements of the hypercube $[0,1]^{d}$ and its vicinity.
  • results: Broad regions of the hypercube form surfaces of low loss, extending linear mode connectivity to a more general phenomenon termed mode combinability. Further observations include a transitivity property (two models re-based to a common third model are linear mode connected), robustness to significant perturbations of the neuron matchings, and evidence that the combinations are non-vacuous, with significant functional differences between the resulting models.
    Abstract We explore element-wise convex combinations of two permutation-aligned neural network parameter vectors $\Theta_A$ and $\Theta_B$ of size $d$. We conduct extensive experiments by examining various distributions of such model combinations parametrized by elements of the hypercube $[0,1]^{d}$ and its vicinity. Our findings reveal that broad regions of the hypercube form surfaces of low loss values, indicating that the notion of linear mode connectivity extends to a more general phenomenon which we call mode combinability. We also make several novel observations regarding linear mode connectivity and model re-basin. We demonstrate a transitivity property: two models re-based to a common third model are also linear mode connected, and a robustness property: even with significant perturbations of the neuron matchings the resulting combinations continue to form a working model. Moreover, we analyze the functional and weight similarity of model combinations and show that such combinations are non-vacuous in the sense that there are significant functional differences between the resulting models.
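
The object of study is simple to state in code: an element-wise convex combination of two aligned parameter vectors, with one mixing coefficient per coordinate. A sketch (permutation alignment / re-basin is assumed to have been applied already):

```python
import torch

def combine(theta_a, theta_b, alpha):
    """theta = alpha * theta_A + (1 - alpha) * theta_B, element-wise.
    theta_a, theta_b: flat 1-D parameter vectors of length d;
    alpha: a point of the hypercube [0, 1]^d (a scalar alpha recovers
    ordinary linear interpolation, i.e. linear mode connectivity)."""
    return alpha * theta_a + (1.0 - alpha) * theta_b

d = 10_000
theta_a, theta_b = torch.randn(d), torch.randn(d)
theta = combine(theta_a, theta_b, torch.rand(d))   # a random hypercube point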

Can Authorship Representation Learning Capture Stylistic Features?

  • paper_url: http://arxiv.org/abs/2308.11490
  • repo_url: https://github.com/llnl/luar
  • paper_authors: Andrew Wang, Cristina Aggazzotti, Rebecca Kotula, Rafael Rivera Soto, Marcus Bishop, Nicholas Andrews
  • for: Investigating whether authorship representations learned in a purely data-driven manner capture writing style rather than content.
  • methods: Learns authorship representations from large text corpora furnished with author labels and systematically probes them through a series of targeted experiments.
  • results: The representations are indeed sensitive to writing style, suggesting robustness to certain kinds of data shift such as topic drift over time, and opening the door to downstream applications that require stylistic representations, such as style transfer.
    Abstract Automatically disentangling an author's style from the content of their writing is a longstanding and possibly insurmountable problem in computational linguistics. At the same time, the availability of large text corpora furnished with author labels has recently enabled learning authorship representations in a purely data-driven manner for authorship attribution, a task that ostensibly depends to a greater extent on encoding writing style than encoding content. However, success on this surrogate task does not ensure that such representations capture writing style since authorship could also be correlated with other latent variables, such as topic. In an effort to better understand the nature of the information these representations convey, and specifically to validate the hypothesis that they chiefly encode writing style, we systematically probe these representations through a series of targeted experiments. The results of these experiments suggest that representations learned for the surrogate authorship prediction task are indeed sensitive to writing style. As a consequence, authorship representations may be expected to be robust to certain kinds of data shift, such as topic drift over time. Additionally, our findings may open the door to downstream applications that require stylistic representations, such as style transfer.

Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions

  • paper_url: http://arxiv.org/abs/2308.11483
  • repo_url: None
  • paper_authors: Pouya Pezeshkpour, Estevam Hruschka
  • for: Studying the robustness of Large Language Models (LLMs) to the order of options in multiple-choice questions.
  • methods: Evaluates LLMs on multiple benchmarks under answer-option reordering, analyzes positional bias between the top-2/3 choices, and adopts two approaches to calibrate the models' predictions.
  • results: Reordering answer options causes performance gaps of approximately 13% to 75% across benchmarks, even with few-shot demonstrations. Placing the top two choices as the first and last options amplifies the bias, placing them among adjacent options mitigates it, and calibration yields up to 8 percentage points of improvement across models and benchmarks.
    Abstract Large Language Models (LLMs) have demonstrated remarkable capabilities in various NLP tasks. However, previous works have shown these models are sensitive towards prompt wording, and few-shot demonstrations and their order, posing challenges to fair assessment of these models. As these models become more powerful, it becomes imperative to understand and address these limitations. In this paper, we focus on LLMs robustness on the task of multiple-choice questions -- commonly adopted task to study reasoning and fact-retrieving capability of LLMs. Investigating the sensitivity of LLMs towards the order of options in multiple-choice questions, we demonstrate a considerable performance gap of approximately 13% to 75% in LLMs on different benchmarks, when answer options are reordered, even when using demonstrations in a few-shot setting. Through a detailed analysis, we conjecture that this sensitivity arises when LLMs are uncertain about the prediction between the top-2/3 choices, and specific options placements may favor certain prediction between those top choices depending on the question caused by positional bias. We also identify patterns in top-2 choices that amplify or mitigate the model's bias toward option placement. We found that for amplifying bias, the optimal strategy involves positioning the top two choices as the first and last options. Conversely, to mitigate bias, we recommend placing these choices among the adjacent options. To validate our conjecture, we conduct various experiments and adopt two approaches to calibrate LLMs' predictions, leading to up to 8 percentage points improvement across different models and benchmarks.
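
The sensitivity probe amounts to re-asking the same question under different option orders. A sketch of that loop, where `ask_model` is a hypothetical callable returning the model's chosen letter and four options are assumed:

```python
import random

def build_prompt(question, options):
    lines = [question] + [f"{l}. {o}" for l, o in zip("ABCD", options)]
    return "\n".join(lines) + "\nAnswer:"

def order_robustness(ask_model, question, options, correct, n_perms=8):
    """Fraction of random option orderings the model answers correctly;
    a large gap versus the accuracy under the original ordering
    indicates positional bias."""
    hits = 0
    for _ in range(n_perms):
        perm = random.sample(options, len(options))
        letter = ask_model(build_prompt(question, perm))
        hits += perm["ABCD".index(letter)] == correct
    return hits / n_perms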

Expecting The Unexpected: Towards Broad Out-Of-Distribution Detection

  • paper_url: http://arxiv.org/abs/2308.11480
  • repo_url: https://github.com/servicenow/broad-openood
  • paper_authors: Charles Guille-Escuret, Pierre-André Noël, Ioannis Mitliagkas, David Vazquez, Joao Monteiro
  • for: Improving the reliability of deployed machine learning systems by detecting out-of-distribution (OOD) inputs, addressing the limitation of existing research that only targets samples from classes absent from the training set.
  • methods: Evaluates recent OOD detection methods on five distinct types of distribution shifts, publicly releases the benchmark BROAD (Benchmarking Resilience Over Anomaly Diversity), and proposes an ensemble approach that leverages a generative model of existing detection scores.
  • results: Existing methods excel at detecting unknown classes but are inconsistent on other types of distribution shifts; the proposed ensemble offers a more consistent and comprehensive solution for broad OOD detection, with superior performance compared to existing methods.
    Abstract Improving the reliability of deployed machine learning systems often involves developing methods to detect out-of-distribution (OOD) inputs. However, existing research often narrowly focuses on samples from classes that are absent from the training set, neglecting other types of plausible distribution shifts. This limitation reduces the applicability of these methods in real-world scenarios, where systems encounter a wide variety of anomalous inputs. In this study, we categorize five distinct types of distribution shifts and critically evaluate the performance of recent OOD detection methods on each of them. We publicly release our benchmark under the name BROAD (Benchmarking Resilience Over Anomaly Diversity). Our findings reveal that while these methods excel in detecting unknown classes, their performance is inconsistent when encountering other types of distribution shifts. In other words, they only reliably detect unexpected inputs that they have been specifically designed to expect. As a first step toward broad OOD detection, we learn a generative model of existing detection scores with a Gaussian mixture. By doing so, we present an ensemble approach that offers a more consistent and comprehensive solution for broad OOD detection, demonstrating superior performance compared to existing methods. Our code to download BROAD and reproduce our experiments is publicly available.
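
The ensemble step fits a generative model over existing detectors' scores on in-distribution data and flags inputs whose score vector is unlikely under it. A sketch with a scikit-learn Gaussian mixture; the component count and the random placeholder scores are illustrative:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

id_scores = np.random.randn(5000, 4)   # placeholder: per-sample scores from 4 detectors
gmm = GaussianMixture(n_components=3, random_state=0).fit(id_scores)

def broad_ood_score(score_vector):
    """Low likelihood of the detector-score vector under the ID model
    means likely OOD, whichever kind of shift produced the input."""
    return -gmm.score_samples(np.asarray(score_vector).reshape(1, -1))[0]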

Revisiting column-generation-based matheuristic for learning classification trees

  • paper_url: http://arxiv.org/abs/2308.11477
  • repo_url: https://github.com/krooonal/col_gen_estimator
  • paper_authors: Krunal Kishor Patel, Guy Desaulniers, Andrea Lodi
  • for: Improving the scalability of optimal decision tree learning for classification problems.
  • methods: Builds on the column-generation-based heuristic of \cite{firat2020column}, modifying the subproblem model to reduce the number of subproblems in multiclass instances, using the implied data-dependent master constraints as cutting planes, and adding a separation model to generate data points whose constraints are violated by the LP relaxation.
  • results: The modified approach scales better to large datasets and achieves higher accuracy on classification problems.
    Abstract Decision trees are highly interpretable models for solving classification problems in machine learning (ML). The standard ML algorithms for training decision trees are fast but generate suboptimal trees in terms of accuracy. Other discrete optimization models in the literature address the optimality problem but only work well on relatively small datasets. \cite{firat2020column} proposed a column-generation-based heuristic approach for learning decision trees. This approach improves scalability and can work with large datasets. In this paper, we describe improvements to this column generation approach. First, we modify the subproblem model to significantly reduce the number of subproblems in multiclass classification instances. Next, we show that the data-dependent constraints in the master problem are implied, and use them as cutting planes. Furthermore, we describe a separation model to generate data points for which the linear programming relaxation solution violates their corresponding constraints. We conclude by presenting computational results that show that these modifications result in better scalability.

Internal Cross-layer Gradients for Extending Homogeneity to Heterogeneity in Federated Learning

  • paper_url: http://arxiv.org/abs/2308.11464
  • repo_url: None
  • paper_authors: Yun-Hin Chan, Rui Zhou, Running Zhao, Zhihan Jiang, Edith C. -H. Ngai
  • for: Extending model-homogeneous federated learning (FL) methods to cope with system heterogeneity across devices.
  • methods: Proposes a training scheme based on internal cross-layer gradients -- a mixture of gradients from shallow and deep layers within a server model -- that strengthens deep-layer similarity without requiring additional communication between clients.
  • results: Experiments validate that InCo Aggregation improves performance in heterogeneous FL and lets model-homogeneous methods such as FedAvg, FedProx, FedNova, Scaffold, and MOON handle heterogeneous systems.
    Abstract Federated learning (FL) inevitably confronts the challenge of system heterogeneity in practical scenarios. To enhance the capabilities of most model-homogeneous FL methods in handling system heterogeneity, we propose a training scheme that can extend their capabilities to cope with this challenge. In this paper, we commence our study with a detailed exploration of homogeneous and heterogeneous FL settings and discover three key observations: (1) a positive correlation between client performance and layer similarities, (2) higher similarities in the shallow layers in contrast to the deep layers, and (3) smoother gradient distributions indicating higher layer similarities. Building upon these observations, we propose InCo Aggregation, which leverages internal cross-layer gradients, a mixture of gradients from shallow and deep layers within a server model, to augment the similarity in the deep layers without requiring additional communication between clients. Furthermore, our methods can be tailored to accommodate model-homogeneous FL methods such as FedAvg, FedProx, FedNova, Scaffold, and MOON, to expand their capabilities to handle the system heterogeneity. Copious experimental results validate the effectiveness of InCo Aggregation, spotlighting internal cross-layer gradients as a promising avenue to enhance the performance in heterogeneous FL.
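
The core idea, stripped down: after backpropagation on the server model, blend each deep layer's gradient with a signal derived from a shallow layer before the update. The scalar-summary transfer and the mixing weight below are simplifications made for shape compatibility, not the paper's exact formulation:

```python
import torch

def inco_mix_gradients(model, gamma=0.5):
    """Blend deep-layer gradients with shallow-layer gradient statistics,
    nudging deep layers toward the (more client-similar) shallow ones."""
    params = [p for p in model.parameters() if p.grad is not None]
    k = len(params) // 2
    for p_shallow, p_deep in zip(params[:k], params[k:]):
        stat = p_shallow.grad.mean()                 # scalar cross-layer signal
        p_deep.grad.mul_(1.0 - gamma).add_(gamma * stat)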

An Analysis of Initial Training Strategies for Exemplar-Free Class-Incremental Learning

  • paper_url: http://arxiv.org/abs/2308.11677
  • repo_url: None
  • paper_authors: Grégoire Petit, Michael Soumm, Eva Feillet, Adrian Popescu, Bertrand Delezoide, David Picard, Céline Hudelot
  • for: Studying how classification models are built from data streams in exemplar-free Class-Incremental Learning (CIL).
  • methods: Compares two initial training strategies -- using only the first batch of the target dataset, or also using weights pre-trained on an auxiliary dataset -- across CIL algorithms, neural architectures, target tasks, class distributions, and numbers of examples, with a statistical analysis framework quantifying each factor's contribution.
  • results: The initial training strategy is the dominant factor influencing average incremental accuracy, but the choice of CIL algorithm matters more for preventing forgetting; practical recommendations for choosing the initial training strategy are derived.
    Abstract Class-Incremental Learning (CIL) aims to build classification models from data streams. At each step of the CIL process, new classes must be integrated into the model. Due to catastrophic forgetting, CIL is particularly challenging when examples from past classes cannot be stored, the case on which we focus here. To date, most approaches are based exclusively on the target dataset of the CIL process. However, the use of models pre-trained in a self-supervised way on large amounts of data has recently gained momentum. The initial model of the CIL process may only use the first batch of the target dataset, or also use pre-trained weights obtained on an auxiliary dataset. The choice between these two initial learning strategies can significantly influence the performance of the incremental learning model, but has not yet been studied in depth. Performance is also influenced by the choice of the CIL algorithm, the neural architecture, the nature of the target task, the distribution of classes in the stream and the number of examples available for learning. We conduct a comprehensive experimental study to assess the roles of these factors. We present a statistical analysis framework that quantifies the relative contribution of each factor to incremental performance. Our main finding is that the initial training strategy is the dominant factor influencing the average incremental accuracy, but that the choice of CIL algorithm is more important in preventing forgetting. Based on this analysis, we propose practical recommendations for choosing the right initial training strategy for a given incremental learning use case. These recommendations are intended to facilitate the practical deployment of incremental learning.

A Survey on Self-Supervised Representation Learning

  • paper_url: http://arxiv.org/abs/2308.11455
  • repo_url: https://github.com/microsoft/esvit
  • paper_authors: Tobias Uelwer, Jan Robine, Stefan Sylvius Wagner, Marc Höftmann, Eric Upschulte, Sebastian Konietzny, Maike Behrendt, Stefan Harmeling
  • for: Providing a comprehensive review of recent methods for learning image representations without supervision.
  • methods: Presents the methods in a unified notation, points out their similarities and differences, and proposes a taxonomy that sets them in relation to each other.
  • results: A meta-study of the most recent experimental results reported in the literature shows that the quality of these representations is close to supervised learning, while no labeled images are needed.
    Abstract Learning meaningful representations is at the heart of many tasks in the field of modern machine learning. Recently, a lot of methods were introduced that allow learning of image representations without supervision. These representations can then be used in downstream tasks like classification or object detection. The quality of these representations is close to supervised learning, while no labeled images are needed. This survey paper provides a comprehensive review of these methods in a unified notation, points out similarities and differences of these methods, and proposes a taxonomy which sets these methods in relation to each other. Furthermore, our survey summarizes the most-recent experimental results reported in the literature in form of a meta-study. Our survey is intended as a starting point for researchers and practitioners who want to dive into the field of representation learning.

Masked Momentum Contrastive Learning for Zero-shot Semantic Understanding

  • paper_url: http://arxiv.org/abs/2308.11448
  • repo_url: None
  • paper_authors: Jiantao Wu, Shentong Mo, Muhammad Awais, Sara Atito, Zhenhua Feng, Josef Kittler
  • for: Advancing computer vision and machine learning, specifically in the area of self-supervised learning and transfer learning.
  • methods: The paper proposes a new evaluation protocol for zero-shot segmentation based on a prompting patch, as well as a simple self-supervised learning approach called MMC that combines Masked image modelling, Momentum based self-distillation, and global Contrast to enhance discriminative representations of SSP ViTs.
  • results: The paper reports top-tier results in zero-shot semantic segmentation across various datasets, demonstrating the effectiveness of the proposed MMC approach in object segmentation tasks.
    Abstract Self-supervised pretraining (SSP) has emerged as a popular technique in machine learning, enabling the extraction of meaningful feature representations without labelled data. In the realm of computer vision, pretrained vision transformers (ViTs) have played a pivotal role in advancing transfer learning. Nonetheless, the escalating cost of finetuning these large models has posed a challenge due to the explosion of model size. This study endeavours to evaluate the effectiveness of pure self-supervised learning (SSL) techniques in computer vision tasks, obviating the need for finetuning, with the intention of emulating human-like capabilities in generalisation and recognition of unseen objects. To this end, we propose an evaluation protocol for zero-shot segmentation based on a prompting patch. Given a point on the target object as a prompt, the algorithm calculates the similarity map between the selected patch and other patches, upon that, a simple thresholding is applied to segment the target. Another evaluation is intra-object and inter-object similarity to gauge discriminatory ability of SSP ViTs. Insights from zero-shot segmentation from prompting and discriminatory abilities of SSP led to the design of a simple SSP approach, termed MMC. This approaches combines Masked image modelling for encouraging similarity of local features, Momentum based self-distillation for transferring semantics from global to local features, and global Contrast for promoting semantics of global features, to enhance discriminative representations of SSP ViTs. Consequently, our proposed method significantly reduces the overlap of intra-object and inter-object similarities, thereby facilitating effective object segmentation within an image. Our experiments reveal that MMC delivers top-tier results in zero-shot semantic segmentation across various datasets.
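
The zero-shot evaluation protocol is compact enough to sketch directly: embed the image with a frozen SSP ViT, compare every patch to the prompted patch, and threshold the similarity map (the threshold value is an illustrative choice):

```python
import torch
import torch.nn.functional as F

def prompt_patch_mask(patch_tokens, prompt_idx, threshold=0.6):
    """patch_tokens: (N, C) patch embeddings from a frozen SSP ViT;
    prompt_idx: index of a patch on the target object. Returns a boolean
    mask over the N patches via cosine similarity + simple thresholding."""
    tokens = F.normalize(patch_tokens, dim=-1)
    sim = tokens @ tokens[prompt_idx]      # (N,) similarity map
    return sim > threshold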

Exploration of Rashomon Set Assists Explanations for Medical Data

  • paper_url: http://arxiv.org/abs/2308.11446
  • repo_url: None
  • paper_authors: Katarzyna Kobylińska, Mateusz Krzyziński, Rafał Machowicz, Mariusz Adamek, Przemysław Biecek
  • for: Addressing the overreliance on single performance metrics in machine learning modeling, particularly in medical and healthcare studies where valuable insights beyond predictions are desired.
  • methods: Proposes the $\texttt{Rashomon_DETECT}$ algorithm, which identifies the most different models within a Rashomon set by comparing prediction-dependency profiles generated by XAI techniques, and introduces the Profile Disparity Index (PDI), based on measures from functional data analysis, to quantify differences in variable effects among models.
  • results: Demonstrated on a foundational case study of predicting survival among hemophagocytic lymphohistiocytosis (HLH) patients and benchmarked on other medical datasets, showcasing its versatility and utility in various contexts.
    Abstract The machine learning modeling process conventionally culminates in selecting a single model that maximizes a selected performance metric. However, this approach leads to abandoning a more profound analysis of slightly inferior models. Particularly in medical and healthcare studies, where the objective extends beyond predictions to valuable insight generation, relying solely on performance metrics can result in misleading or incomplete conclusions. This problem is particularly pertinent when dealing with a set of models with performance close to maximum one, known as $\textit{Rashomon set}$. Such a set can be numerous and may contain models describing the data in a different way, which calls for comprehensive analysis. This paper introduces a novel process to explore Rashomon set models, extending the conventional modeling approach. The cornerstone is the identification of the most different models within the Rashomon set, facilitated by the introduced $\texttt{Rashomon_DETECT}$ algorithm. This algorithm compares profiles illustrating prediction dependencies on variable values generated by eXplainable Artificial Intelligence (XAI) techniques. To quantify differences in variable effects among models, we introduce the Profile Disparity Index (PDI) based on measures from functional data analysis. To illustrate the effectiveness of our approach, we showcase its application in predicting survival among hemophagocytic lymphohistiocytosis (HLH) patients - a foundational case study. Additionally, we benchmark our approach on other medical data sets, demonstrating its versatility and utility in various contexts.
    摘要 This paper proposes a novel process to explore the Rashomon set models, extending the conventional modeling approach. The key is to identify the most different models within the Rashomon set, facilitated by the introduced $\texttt{Rashomon_DETECT}$ algorithm. This algorithm compares profiles illustrating prediction dependencies on variable values generated by eXplainable Artificial Intelligence (XAI) techniques. To quantify differences in variable effects among models, we introduce the Profile Disparity Index (PDI) based on measures from functional data analysis.To demonstrate the effectiveness of our approach, we apply it to predicting survival among hemophagocytic lymphohistiocytosis (HLH) patients, which serves as a foundational case study. We also benchmark our approach on other medical data sets, showing its versatility and utility in various contexts.

Inferring gender from name: a large scale performance evaluation study

  • paper_url: http://arxiv.org/abs/2308.12381
  • repo_url: None
  • paper_authors: Kriste Krstovski, Yao Lu, Ye Xu
  • for: Evaluating the performance of existing name-to-gender inference methods at large scale and proposing improved alternatives.
  • methods: Analyzes a variety of large annotated datasets of names and benchmarks existing approaches against each other.
  • results: The performance of existing name-to-gender inference methods is inconsistent; two newly proposed hybrid approaches achieve better performance than any single existing approach.
    Abstract A person's gender is a crucial piece of information when performing research across a wide range of scientific disciplines, such as medicine, sociology, political science, and economics, to name a few. However, in increasing instances, especially given the proliferation of big data, gender information is not readily available. In such cases researchers need to infer gender from readily available information, primarily from persons' names. While inferring gender from name may raise some ethical questions, the lack of viable alternatives means that researchers have to resort to such approaches when the goal justifies the means - in the majority of such studies the goal is to examine patterns and determinants of gender disparities. The necessity of name-to-gender inference has generated an ever-growing domain of algorithmic approaches and software products. These approaches have been used throughout the world in academia, industry, governmental and non-governmental organizations. Nevertheless, the existing approaches have yet to be systematically evaluated and compared, making it challenging to determine the optimal approach for future research. In this work, we conducted a large scale performance evaluation of existing approaches for name-to-gender inference. Analysis are performed using a variety of large annotated datasets of names. We further propose two new hybrid approaches that achieve better performance than any single existing approach.

A Study on the Impact of Non-confounding Covariates on the Inferential Performance of Methods based on the Potential Outcome Framework

  • paper_url: http://arxiv.org/abs/2308.11676
  • repo_url: None
  • paper_authors: Yonghe Zhao, Shuai Fu, Huiyan Sun
  • for: Studying the Potential Outcome Framework (POF) in causal inference, specifically the consequences of treating non-confounding covariates as confounders when dealing with high-dimensional covariates.
  • methods: Presents a unified graphical framework for Causal Inference Models based on the POF (CIMs-B-POF) and quantitatively analyzes how various types of non-confounding covariates -- instrumental variables, mediators, colliders, and adjustment variables -- influence inference performance.
  • results: For eliminating confounding bias, the optimal scenario is for the covariates to exclusively encompass confounders; for inferring counterfactual outcomes, adjustment variables contribute to more accurate inferences. Extensive experiments on synthetic datasets consistently validate these theoretical conclusions.
    Abstract The Potential Outcome Framework (POF) plays a prominent role in the field of causal inference. Most causal inference models based on the POF (CIMs-B-POF) are designed for eliminating confounding bias and default to an underlying assumption of Confounding Covariates. This assumption posits that the covariates consist solely of confounders. However, the assumption of Confounding Covariates is challenging to maintain in practice, particularly when dealing with high-dimensional covariates. While certain methods have been proposed to differentiate the distinct components of covariates prior to conducting causal inference, the consequences of treating non-confounding covariates as confounders remain unclear. This ambiguity poses a potential risk when applying the CIMs-B-POF in practical scenarios. In this paper, we present a unified graphical framework for the CIMs-B-POF, which greatly enhances the comprehension of these models' underlying principles. Using this graphical framework, we quantitatively analyze the extent to which the inference performance of CIMs-B-POF is influenced when incorporating various types of non-confounding covariates, such as instrumental variables, mediators, colliders, and adjustment variables. The key findings are: in the task of eliminating confounding bias, the optimal scenario is for the covariates to exclusively encompass confounders; in the subsequent task of inferring counterfactual outcomes, the adjustment variables contribute to more accurate inferences. Furthermore, extensive experiments conducted on synthetic datasets consistently validate these theoretical conclusions.

TurboViT: Generating Fast Vision Transformers via Generative Architecture Search

  • paper_url: http://arxiv.org/abs/2308.11421
  • repo_url: None
  • paper_authors: Alexander Wong, Saad Abbasi, Saeejith Nair
  • for: Designing efficient vision transformer architectures for real-world applications with high-throughput, low-memory requirements.
  • methods: Uses generative architecture search (GAS) to create TurboViT, a highly efficient hierarchical vision transformer design generated around mask unit attention and Q-pooling design patterns.
  • results: On ImageNet-1K, TurboViT achieves the same accuracy as FasterViT-0 while being >2.47$\times$ smaller, and uses >3.4$\times$ fewer FLOPs with 0.9% higher accuracy than MobileViT2-2.0, compared against 10 other state-of-the-art efficient vision transformers; it also shows >3.21$\times$ lower latency and >3.18$\times$ higher throughput than FasterViT-0 in low-latency and batch-processing scenarios.
    Abstract Vision transformers have shown unprecedented levels of performance in tackling various visual perception tasks in recent years. However, the architectural and computational complexity of such network architectures have made them challenging to deploy in real-world applications with high-throughput, low-memory requirements. As such, there has been significant research recently on the design of efficient vision transformer architectures. In this study, we explore the generation of fast vision transformer architecture designs via generative architecture search (GAS) to achieve a strong balance between accuracy and architectural and computational efficiency. Through this generative architecture search process, we create TurboViT, a highly efficient hierarchical vision transformer architecture design that is generated around mask unit attention and Q-pooling design patterns. The resulting TurboViT architecture design achieves significantly lower architectural computational complexity (>2.47$\times$ smaller than FasterViT-0 while achieving same accuracy) and computational complexity (>3.4$\times$ fewer FLOPs and 0.9% higher accuracy than MobileViT2-2.0) when compared to 10 other state-of-the-art efficient vision transformer network architecture designs within a similar range of accuracy on the ImageNet-1K dataset. Furthermore, TurboViT demonstrated strong inference latency and throughput in both low-latency and batch processing scenarios (>3.21$\times$ lower latency and >3.18$\times$ higher throughput compared to FasterViT-0 for low-latency scenario). These promising results demonstrate the efficacy of leveraging generative architecture search for generating efficient transformer architecture designs for high-throughput scenarios.

Designing an attack-defense game: how to increase robustness of financial transaction models via a competition

  • paper_url: http://arxiv.org/abs/2308.11406
  • repo_url: None
  • paper_authors: Alexey Zaytsev, Alex Natekin, Evgeni Vorsin, Valerii Smirnov, Georgii Smirnov, Oleg Sidorshin, Alexander Senin, Alexander Dudin, Dmitry Berestnev
  • for: Investigating the current state and dynamics of adversarial attacks and defenses for neural network models that take sequential financial data as input, with a focus on practicality in real-world scenarios.
  • methods: Designs a competition in which participants compete directly against each other, so attacks and defenses on modern financial transaction data are examined in close-to-real-life conditions; a meta-study of the approaches used, with numerical experiments and ablation studies, evaluates their effectiveness.
  • results: The developed attacks and defenses outperform existing alternatives from the literature while remaining practical to execute, demonstrating the validity of the competition as a tool for uncovering and mitigating vulnerabilities of machine learning models.
    Abstract Given the escalating risks of malicious attacks in the finance sector and the consequential severe damage, a thorough understanding of adversarial strategies and robust defense mechanisms for machine learning models is critical. The threat becomes even more severe as banks increasingly adopt more accurate but potentially fragile neural networks. We aim to investigate the current state and dynamics of adversarial attacks and defenses for neural network models that use sequential financial data as the input. To achieve this goal, we have designed a competition that allows realistic and detailed investigation of problems in modern financial transaction data. The participants compete directly against each other, so possible attacks and defenses are examined in close-to-real-life conditions. Our main contributions are the analysis of the competition dynamics, which answers the questions of how important it is to conceal a model from malicious users, how long it takes to break it, and what techniques one should use to make it more robust, and the introduction of an additional way to attack models or increase their robustness. Our analysis continues with a meta-study on the used approaches with their power, numerical experiments, and accompanying ablation studies. We show that the developed attacks and defenses outperform existing alternatives from the literature while being practical in terms of execution, proving the validity of the competition as a tool for uncovering vulnerabilities of machine learning models and mitigating them in various domains.

Non-Redundant Combination of Hand-Crafted and Deep Learning Radiomics: Application to the Early Detection of Pancreatic Cancer

  • paper_url: http://arxiv.org/abs/2308.11389
  • repo_url: None
  • paper_authors: Rebeca Vétil, Clément Abi-Nader, Alexandre Bône, Marie-Pierre Vullierme, Marc-Michel Rohé, Pietro Gori, Isabelle Bloch
  • for: Learning Deep Learning Radiomics (DLR) that are not redundant with Hand-Crafted Radiomics (HCR).
  • methods: Extracts DLR features with a VAE while enforcing their independence from HCR features by minimizing their mutual information.
  • results: Combining the non-redundant DLR and HCR features improves prediction of four early markers of pancreatic cancer, with a higher Area Under the Curve than baseline methods that do not address redundancy or rely solely on HCR features, validated on a large independent test set.
    Abstract We address the problem of learning Deep Learning Radiomics (DLR) that are not redundant with Hand-Crafted Radiomics (HCR). To do so, we extract DLR features using a VAE while enforcing their independence with HCR features by minimizing their mutual information. The resulting DLR features can be combined with hand-crafted ones and leveraged by a classifier to predict early markers of cancer. We illustrate our method on four early markers of pancreatic cancer and validate it on a large independent test set. Our results highlight the value of combining non-redundant DLR and HCR features, as evidenced by an improvement in the Area Under the Curve compared to baseline methods that do not address redundancy or solely rely on HCR features.
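
The training signal can be approximated cheaply: alongside the VAE objective, penalize statistical dependence between the VAE's latent (DLR) features and the HCR features. The cross-correlation penalty below is a decorrelation surrogate for the mutual-information term the paper minimizes, used here only to illustrate the idea:

```python
import torch

def redundancy_penalty(z_dlr, h_hcr, eps=1e-6):
    """Sum of squared entries of the cross-correlation matrix between
    batch-standardized DLR and HCR features; zero iff the two sets are
    (linearly) uncorrelated. Added to the VAE loss with a weight lam."""
    z = (z_dlr - z_dlr.mean(0)) / (z_dlr.std(0) + eps)
    h = (h_hcr - h_hcr.mean(0)) / (h_hcr.std(0) + eps)
    c = z.T @ h / z.shape[0]
    return (c ** 2).sum()

# total_loss = vae_elbo + lam * redundancy_penalty(z, h)   # lam: tuning weight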

Targeted Data Augmentation for bias mitigation

  • paper_url: http://arxiv.org/abs/2308.11386
  • repo_url: None
  • paper_authors: Agnieszka Mikołajczyk-Bareła, Maria Ferlin, Michał Grochowski
  • for: This paper aims to address the issue of bias in AI systems by introducing a novel approach called Targeted Data Augmentation (TDA).
  • methods: The TDA method leverages classical data augmentation techniques to insert biases into the training data, rather than removing them. This approach is designed to improve the performance of AI models by mitigating biases.
  • results: The authors found that their TDA method significantly decreased bias measures in two diverse datasets (clinical skin lesions and male and female faces) while maintaining a negligible increase in the error rate. The results show that the method can effectively mitigate biases associated with the frame, ruler, and glasses.
    Abstract The development of fair and ethical AI systems requires careful consideration of bias mitigation, an area often overlooked or ignored. In this study, we introduce a novel and efficient approach for addressing biases called Targeted Data Augmentation (TDA), which leverages classical data augmentation techniques to tackle the pressing issue of bias in data and models. Unlike the laborious task of removing biases, our method proposes to insert biases instead, resulting in improved performance. To identify biases, we annotated two diverse datasets: a dataset of clinical skin lesions and a dataset of male and female faces. These bias annotations are published for the first time in this study, providing a valuable resource for future research. Through Counterfactual Bias Insertion, we discovered that biases associated with the frame, ruler, and glasses had a significant impact on models. By randomly introducing biases during training, we mitigated these biases and achieved a substantial decrease in bias measures, ranging from two-fold to more than 50-fold, while maintaining a negligible increase in the error rate.
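
Operationally, TDA is an augmentation step that inserts a known bias artifact with some probability during training. A sketch, where the overlay callables (ruler, frame, glasses) and the probability are the practitioner's choices:

```python
import random

def targeted_bias_augment(image, bias_overlays, p=0.5):
    """Randomly paste a known bias artifact (e.g. a ruler or frame in a
    skin-lesion image, glasses on a face) so the model learns to ignore
    it instead of latching onto it."""
    if random.random() < p:
        image = random.choice(bias_overlays)(image)
    return image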

Interpretable Distribution-Invariant Fairness Measures for Continuous Scores

  • paper_url: http://arxiv.org/abs/2308.11375
  • repo_url: None
  • paper_authors: Ann-Kristin Becker, Oana Dumitrasc, Klaus Broelemann
  • for: Developing measures of algorithmic fairness for continuous scores that can be applied to ranking tasks and are not heavily dependent on the distribution of scores.
  • methods: Proposes a distributionally invariant version of fairness measures for continuous scores based on the Wasserstein distance, which is easily computable and can quantify significant biases that ROC-based fairness measures miss.
  • results: The proposed measures outperform ROC-based fairness measures in quantifying and interpreting the strength of group disparities and in comparing biases across different models, datasets, or time points, as demonstrated through experiments on the most commonly used fairness benchmark datasets.
    Abstract Measures of algorithmic fairness are usually discussed in the context of binary decisions. We extend the approach to continuous scores. So far, ROC-based measures have mainly been suggested for this purpose. Other existing methods depend heavily on the distribution of scores, are unsuitable for ranking tasks, or their effect sizes are not interpretable. Here, we propose a distributionally invariant version of fairness measures for continuous scores with a reasonable interpretation based on the Wasserstein distance. Our measures are easily computable and well suited for quantifying and interpreting the strength of group disparities as well as for comparing biases across different models, datasets, or time points. We derive a link between the different families of existing fairness measures for scores and show that the proposed distributionally invariant fairness measures outperform ROC-based fairness measures because they are more explicit and can quantify significant biases that ROC-based fairness measures miss. Finally, we demonstrate their effectiveness through experiments on the most commonly used fairness benchmark datasets.
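
Computationally the measure is light: a Wasserstein distance between the two groups' score distributions. The rank transform below is one way to obtain the distributional invariance the paper describes; the exact normalization may differ from the authors':

```python
import numpy as np
from scipy.stats import wasserstein_distance

def score_disparity(scores, group):
    """Wasserstein-1 distance between the rank-transformed scores of the
    two groups (0/1 in `group`); 0 means identical score distributions."""
    ranks = scores.argsort().argsort() / (len(scores) - 1)   # map to [0, 1]
    return wasserstein_distance(ranks[group == 0], ranks[group == 1])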

How Much Temporal Long-Term Context is Needed for Action Segmentation?

  • paper_url: http://arxiv.org/abs/2308.11358
  • repo_url: https://github.com/ltcontext/ltcontext
  • paper_authors: Emad Bahrami, Gianpiero Francesca, Juergen Gall
  • for: Answering how much long-term temporal context is needed for optimal temporal action segmentation.
  • methods: Introduces a transformer-based model that leverages sparse attention to capture the full context of a video, compared against the state of the art on 50Salads, Breakfast, and Assembly101.
  • results: Experiments show that modeling the full context of a video is necessary to obtain the best performance for temporal action segmentation.
    Abstract Modeling long-term context in videos is crucial for many fine-grained tasks including temporal action segmentation. An interesting question that is still open is how much long-term temporal context is needed for optimal performance. While transformers can model the long-term context of a video, this becomes computationally prohibitive for long videos. Recent works on temporal action segmentation thus combine temporal convolutional networks with self-attentions that are computed only for a local temporal window. While these approaches show good results, their performance is limited by their inability to capture the full context of a video. In this work, we try to answer how much long-term temporal context is required for temporal action segmentation by introducing a transformer-based model that leverages sparse attention to capture the full context of a video. We compare our model with the current state of the art on three datasets for temporal action segmentation, namely 50Salads, Breakfast, and Assembly101. Our experiments show that modeling the full context of a video is necessary to obtain the best performance for temporal action segmentation.
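
To see why sparse attention makes full-video context affordable, consider a strided variant in which each frame attends only to every k-th frame, dropping the cost from O(T^2) to O(T^2/k). The pattern below is illustrative, not the paper's exact attention design:

```python
import torch

def strided_attention(q, k, v, stride=8):
    """q, k, v: (T, C) per-frame features. Each query attends to every
    `stride`-th key, covering the whole video at a fraction of the cost."""
    ks, vs = k[::stride], v[::stride]
    attn = torch.softmax(q @ ks.T / ks.shape[-1] ** 0.5, dim=-1)
    return attn @ vs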

Machine learning assisted exploration for affine Deligne-Lusztig varieties

  • paper_url: http://arxiv.org/abs/2308.11355
  • repo_url: https://github.com/jinpf314/ml4adlv
  • paper_authors: Bin Dong, Xuhua He, Pengfei Jin, Felix Schremmer, Qingchao Yu
  • for: 本研究借助机器学习探索仿射 Deligne-Lusztig 簇(ADLV)的几何结构,主要目标是研究 ADLV 的非空性模式、维度以及不可约分支的枚举问题。
  • methods: 该研究提出了一种新的、融合数学与机器学习的框架,包括数据生成、模型训练、模式分析和人工检验四个环节。这个框架实现了纯数学研究与机器学习之间的紧密协作,能够加速纯数学研究,帮助发现新的猜想和有前途的研究方向。
  • results: 本研究重新发现了虚拟维度公式,并对一个新发现的、关于维度某一下界的问题给出了完整的数学证明。此外,本研究还公开了计算 ADLV 和机器学习模型的源代码,以便进一步探索。
    Abstract This paper presents a novel, interdisciplinary study that leverages a Machine Learning (ML) assisted framework to explore the geometry of affine Deligne-Lusztig varieties (ADLV). The primary objective is to investigate the nonemptiness pattern, dimension and enumeration of irreducible components of ADLV. Our proposed framework demonstrates a recursive pipeline of data generation, model training, pattern analysis, and human examination, presenting an intricate interplay between ML and pure mathematical research. Notably, our data-generation process is nuanced, emphasizing the selection of meaningful subsets and appropriate feature sets. We demonstrate that this framework has a potential to accelerate pure mathematical research, leading to the discovery of new conjectures and promising research directions that could otherwise take significant time to uncover. We rediscover the virtual dimension formula and provide a full mathematical proof of a newly identified problem concerning a certain lower bound of dimension. Furthermore, we extend an open invitation to the readers by providing the source code for computing ADLV and the ML models, promoting further explorations. This paper concludes by sharing valuable experiences and highlighting lessons learned from this collaboration.
    摘要 这篇论文介绍了一项新颖的跨学科研究,利用机器学习(ML)辅助框架来探索仿射 Deligne-Lusztig 簇(ADLV)的几何结构。研究的主要目标是考察 ADLV 的非空性模式、维度以及不可约分支的枚举。我们提出的框架是一个由数据生成、模型训练、模式分析和人工检验构成的递归流程,展示了机器学习与纯数学研究之间的精妙互动。值得注意的是,我们的数据生成过程十分讲究,强调选择有意义的子集和合适的特征集。我们证明这一框架有潜力加速纯数学研究,带来新的猜想和有前景的研究方向,而这些否则可能需要很长时间才能发现。我们重新发现了虚拟维度公式,并为一个新发现的、关于维度某一下界的问题给出了完整的数学证明。此外,我们向读者发出公开邀请,提供了计算 ADLV 和机器学习模型的源代码,以促进进一步的探索。论文最后分享了此次合作中获得的宝贵经验与教训。

WEARS: Wearable Emotion AI with Real-time Sensor data

  • paper_url: http://arxiv.org/abs/2308.11673
  • repo_url: None
  • paper_authors: Dhruv Limbani, Daketi Yatin, Nitish Chaturvedi, Vaishnavi Moorthy, Pushpalatha M, Harichandana BSS, Sumit Kumar
  • for: 这个研究旨在开发一个基于智能手表传感器的情绪预测系统,以便在日常生活中预测用户的情绪。
  • methods: 这个研究结合多种传感器数据,包括心率、加速度计和陀螺仪,以捕捉用户的情绪变化。研究使用英语和地方语言的视频来唤起参与者的情绪,并实时采集数据作为真实标注。
  • results: 研究结果显示,多层感知器(MLP)模型可以达到93.75%的准确率;研究还进行了特征消融实验,以了解不同传感器数据对情绪预测的影响。
    Abstract Emotion prediction is the field of study to understand human emotions. Existing methods focus on modalities like text, audio, facial expressions, etc., which could be private to the user. Emotion can be derived from the subject's psychological data as well. Various approaches that employ combinations of physiological sensors for emotion recognition have been proposed. Yet, not all sensors are simple to use and handy for individuals in their daily lives. Thus, we propose a system to predict user emotion using smartwatch sensors. We design a framework to collect ground truth in real-time utilizing a mix of English and regional language-based videos to invoke emotions in participants and collect the data. Further, we modeled the problem as binary classification due to the limited dataset size and experimented with multiple machine-learning models. We also did an ablation study to understand the impact of features including Heart Rate, Accelerometer, and Gyroscope sensor data on mood. From the experimental results, Multi-Layer Perceptron has shown a maximum accuracy of 93.75 percent for pleasant-unpleasant (high/low valence classification) moods.
    摘要 情绪预测是理解人类情绪的研究领域。现有方法主要集中在文本、音频、面部表情等模态上,而这些模态对用户而言可能涉及隐私。情绪也可以从主体的生理数据中推断。已有多种结合生理传感器进行情绪识别的方法被提出,但并非所有传感器都便于个人在日常生活中使用。因此,我们提出一种利用智能手表传感器预测用户情绪的系统。我们设计了一个框架,通过英语和地方语言视频唤起参与者的情绪,实时采集真实标注数据。由于数据集规模有限,我们将问题建模为二分类问题,并对多种机器学习模型进行了实验。我们还进行了消融研究,以了解心率、加速度计和陀螺仪等传感器数据特征对情绪预测的影响。实验结果表明,多层感知器在愉悦-不愉悦(高/低效价)分类上取得了93.75%的最高准确率。
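
A hedged sketch of the binary valence classification setup using scikit-learn follows; the per-window feature construction and the synthetic labels are assumptions for illustration, not the authors' exact pipeline.

```python
# Hedged sketch: MLP binary classifier over hypothetical per-window
# statistics of heart-rate, accelerometer and gyroscope signals.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))                     # 6 sensor features per window
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)     # synthetic pleasant/unpleasant labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)
print(f"accuracy: {clf.score(X_te, y_te):.3f}")
```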

Careful at Estimation and Bold at Exploration

  • paper_url: http://arxiv.org/abs/2308.11348
  • repo_url: None
  • paper_authors: Xing Chen, Yijun Liu, Zhaogeng Liu, Hechang Chen, Hengshuai Yao, Yi Chang
  • for: 这篇论文主要针对的是连续动作空间的探索策略,具体来说是DPRL中的策略探索。
  • methods: 本文提出了一种新的探索策略,基于double-Q函数框架,它包括Q值更新的greedy Q softmax算法,以及将探索策略与Q值更新结合的方法。
  • results: 本文在 MuJoCo 基准环境中评估了该方法,与之前的最佳方法相比在多种环境中表现出色,尤其是在最复杂的 Humanoid 环境中。
    Abstract Exploration strategies in continuous action space are often heuristic due to the infinite actions, and these kinds of methods cannot derive a general conclusion. In prior work, it has been shown that policy-based exploration is beneficial for continuous action space in deterministic policy reinforcement learning(DPRL). However, policy-based exploration in DPRL has two prominent issues: aimless exploration and policy divergence, and the policy gradient for exploration is only sometimes helpful due to inaccurate estimation. Based on the double-Q function framework, we introduce a novel exploration strategy to mitigate these issues, separate from the policy gradient. We first propose the greedy Q softmax update schema for Q value update. The expected Q value is derived by weighted summing the conservative Q value over actions, and the weight is the corresponding greedy Q value. Greedy Q takes the maximum value of the two Q functions, and conservative Q takes the minimum value of the two different Q functions. For practicality, this theoretical basis is then extended to allow us to combine action exploration with the Q value update, except for the premise that we have a surrogate policy that behaves like this exploration policy. In practice, we construct such an exploration policy with a few sampled actions, and to meet the premise, we learn such a surrogate policy by minimizing the KL divergence between the target policy and the exploration policy constructed by the conservative Q. We evaluate our method on the Mujoco benchmark and demonstrate superior performance compared to previous state-of-the-art methods across various environments, particularly in the most complex Humanoid environment.
    摘要 在连续动作空间中,由于动作数量无限,探索策略往往依赖启发式方法,难以得出一般性结论。先前的工作表明,在确定性策略强化学习(DPRL)中,基于策略的探索对连续动作空间是有益的。然而,DPRL 中基于策略的探索存在两个突出问题:无目的探索和策略发散;并且由于估计不准确,用于探索的策略梯度只在部分情况下有效。基于双 Q 函数框架,我们提出了一种独立于策略梯度的新探索策略,以缓解上述问题。我们首先提出了用于 Q 值更新的 greedy Q softmax 更新模式:期望 Q 值由各动作的保守 Q 值加权求和得到,权重为对应的 greedy Q 值;其中 greedy Q 取两个 Q 函数中的最大值,保守 Q 取两个 Q 函数中的最小值。出于实用性考虑,我们将这一理论基础加以扩展,把动作探索与 Q 值更新结合起来,前提是存在一个行为与该探索策略相似的代理策略。在实践中,我们用少量采样动作构建这样的探索策略,并通过最小化目标策略与由保守 Q 构建的探索策略之间的 KL 散度来学习代理策略,从而满足该前提。我们在 MuJoCo 基准上评估了我们的方法,结果显示其在多种环境中优于先前的最先进方法,尤其是在最复杂的 Humanoid 环境中。
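
A minimal sketch of the greedy Q softmax target described above follows, under the assumption that the expectation is taken over a set of sampled actions per state; the paper's actual update details may differ.

```python
# Hedged sketch: expected Q = softmax(greedy Q)-weighted sum of conservative Q,
# where greedy Q = max of two Q heads and conservative Q = min of the two.
import torch

def greedy_q_softmax_value(q1, q2, temperature=1.0):
    # q1, q2: (batch, n_sampled_actions) values from the two Q heads
    greedy_q = torch.maximum(q1, q2)        # max of the two Q functions
    conservative_q = torch.minimum(q1, q2)  # min of the two Q functions
    weights = torch.softmax(greedy_q / temperature, dim=-1)
    return (weights * conservative_q).sum(dim=-1)  # expected Q per state

q1, q2 = torch.randn(4, 10), torch.randn(4, 10)
print(greedy_q_softmax_value(q1, q2).shape)  # torch.Size([4])
```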

ProAgent: Building Proactive Cooperative AI with Large Language Models

  • paper_url: http://arxiv.org/abs/2308.11339
  • repo_url: https://github.com/PKU-Alignment/ProAgent
  • paper_authors: Ceyao Zhang, Kaijie Yang, Siyi Hu, Zihao Wang, Guanghe Li, Yihang Sun, Cheng Zhang, Zhaowei Zhang, Anji Liu, Song-Chun Zhu, Xiaojun Chang, Junge Zhang, Feng Yin, Yitao Liang, Yaodong Yang
  • for: 这 paper 的目的是开发一种基于大语言模型的智能代理(ProAgent),用于协同决策和行为适应。
  • methods: 这 paper 使用了大语言模型(LLMs)来实现 ProAgent,并通过对自己和团队成员的决策进行预测和规划来提高协同决策的能力。
  • results: 实验结果表明,ProAgent 在协同决策中表现出色,与现有的五种基于自我玩家和人口训练的方法相比,在与 AI 代理和人类代理合作时表现出明显的性能优势。
    Abstract Building AIs with adaptive behaviors in human-AI cooperation stands as a pivotal focus in AGI research. Current methods for developing cooperative agents predominantly rely on learning-based methods, where policy generalization heavily hinges on past interactions with specific teammates. These approaches constrain the agent's capacity to recalibrate its strategy when confronted with novel teammates. We propose \textbf{ProAgent}, a novel framework that harnesses large language models (LLMs) to fashion a \textit{pro}active \textit{agent} empowered with the ability to anticipate teammates' forthcoming decisions and formulate enhanced plans for itself. ProAgent excels at cooperative reasoning with the capacity to dynamically adapt its behavior to enhance collaborative efforts with teammates. Moreover, the ProAgent framework exhibits a high degree of modularity and interpretability, facilitating seamless integration to address a wide array of coordination scenarios. Experimental evaluations conducted within the framework of \textit{Overcook-AI} unveil the remarkable performance superiority of ProAgent, outperforming five methods based on self-play and population-based training in cooperation with AI agents. Further, when cooperating with human proxy models, its performance exhibits an average improvement exceeding 10\% compared to the current state-of-the-art, COLE. The advancement was consistently observed across diverse scenarios involving interactions with both AI agents of varying characteristics and human counterparts. These findings inspire future research for human-robot collaborations. For a hands-on demonstration, please visit \url{https://pku-proagent.github.io}.
    摘要 构建在人机合作中具有自适应行为的人工智能,是 AGI 研究中的一个核心方向。现有的合作智能体开发方法主要依赖基于学习的方法,其策略泛化能力严重依赖于过去与特定队友的交互,这限制了智能体在面对全新队友时重新调整策略的能力。我们提出了 ProAgent,一种利用大型语言模型(LLM)打造的主动型智能体框架,它能够预测队友即将做出的决策,并为自身制定更优的计划。ProAgent 擅长协作推理,能够动态调整自身行为以增强与队友的协作。此外,ProAgent 框架具有高度的模块化和可解释性,便于无缝集成以应对各种协调场景。在 Overcook-AI 框架下进行的实验评估显示,ProAgent 在与 AI 智能体合作时,性能显著优于五种基于自博弈和基于种群训练的方法;在与人类代理模型合作时,其性能相比当前最先进方法 COLE 平均提升超过10%。这一优势在与具有不同特性的 AI 智能体及人类对手交互的多种场景中均得到了一致的验证。这些发现为未来的人机协作研究提供了启发。如需实际演示,请访问 https://pku-proagent.github.io 。

Generalising sequence models for epigenome predictions with tissue and assay embeddings

  • paper_url: http://arxiv.org/abs/2308.11671
  • repo_url: None
  • paper_authors: Jacob Deasy, Ron Schwessinger, Ferran Gonzalez, Stephen Young, Kim Branson
  • for: 用于表观基因组图谱预测的序列模型方法近年来在序列长度、模型规模和图谱多样性方面不断扩展。
  • methods: 我们提出 Contextualised Genomic Network(CGN),将组织(tissue)和检测(assay)的嵌入整合到输入空间中,以利用上下文信息增强长程序列嵌入。
  • results: 我们的方法在多种表观遗传图谱上取得了强相关性,并首次揭示了遗传变异对表观遗传序列模型训练的影响。我们的通用方法在多种设置下超越了现有的最先进水平。
    Abstract Sequence modelling approaches for epigenetic profile prediction have recently expanded in terms of sequence length, model size, and profile diversity. However, current models cannot infer on many experimentally feasible tissue and assay pairs due to poor usage of contextual information, limiting $\textit{in silico}$ understanding of regulatory genomics. We demonstrate that strong correlation can be achieved across a large range of experimental conditions by integrating tissue and assay embeddings into a Contextualised Genomic Network (CGN). In contrast to previous approaches, we enhance long-range sequence embeddings with contextual information in the input space, rather than expanding the output space. We exhibit the efficacy of our approach across a broad set of epigenetic profiles and provide the first insights into the effect of genetic variants on epigenetic sequence model training. Our general approach to context integration exceeds state of the art in multiple settings while employing a more rigorous validation procedure.
    摘要 用于表观遗传图谱预测的序列建模方法,最近在序列长度、模型规模和图谱多样性方面都得到了扩展。然而,由于对上下文信息利用不足,现有模型无法对许多实验上可行的组织-检测组合进行推断,限制了对调控基因组学的计算机模拟(in silico)理解。我们证明,通过将组织和检测的嵌入整合进 Contextualised Genomic Network(CGN),可以在大范围的实验条件下获得强相关性。与以往方法不同,我们是在输入空间中用上下文信息增强长程序列嵌入,而非扩展输出空间。我们在大量表观遗传图谱上展示了该方法的有效性,并首次揭示了遗传变异对表观遗传序列模型训练的影响。我们的通用上下文整合方法在多种设置下超越了现有最先进水平,同时采用了更严格的验证流程。
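
A sketch of context integration in the input space follows: learned tissue and assay embeddings are added to every position of the sequence embedding before the sequence model runs. The dimensions and the additive combination are illustrative assumptions, not the exact CGN architecture.

```python
# Hedged sketch: injecting tissue/assay context into the input space.
import torch
import torch.nn as nn

class ContextualInput(nn.Module):
    def __init__(self, n_tissues, n_assays, dim):
        super().__init__()
        self.tissue = nn.Embedding(n_tissues, dim)
        self.assay = nn.Embedding(n_assays, dim)

    def forward(self, seq_embed, tissue_id, assay_id):
        # seq_embed: (batch, length, dim) long-range sequence embedding
        ctx = self.tissue(tissue_id) + self.assay(assay_id)  # (batch, dim)
        return seq_embed + ctx.unsqueeze(1)  # broadcast over positions

layer = ContextualInput(n_tissues=50, n_assays=30, dim=128)
x = torch.randn(2, 1024, 128)
out = layer(x, torch.tensor([3, 7]), torch.tensor([1, 4]))
print(out.shape)  # torch.Size([2, 1024, 128])
```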

Protect Federated Learning Against Backdoor Attacks via Data-Free Trigger Generation

  • paper_url: http://arxiv.org/abs/2308.11333
  • repo_url: None
  • paper_authors: Yanxin Yang, Ming Hu, Yue Cao, Jun Xia, Yihao Huang, Yang Liu, Mingsong Chen
  • for: The paper aims to address the vulnerability of Federated Learning (FL) to poisoning attacks, specifically backdoor attacks, by proposing a novel data-free trigger-generation-based defense approach.
  • methods: The proposed approach uses two characteristics of backdoor attacks to generate trigger images that can eliminate poisoned models and ensure the updated global model is benign. These methods include identifying the differences between the old and new global models, and evaluating the effect of the generated images.
  • results: The approach is shown to defend against almost all existing types of backdoor attacks and outperform seven state-of-the-art defense methods in both IID and non-IID scenarios. Additionally, the approach can successfully defend against backdoor attacks even when 80% of the clients are malicious.
    Abstract As a distributed machine learning paradigm, Federated Learning (FL) enables large-scale clients to collaboratively train a model without sharing their raw data. However, due to the lack of data auditing for untrusted clients, FL is vulnerable to poisoning attacks, especially backdoor attacks. By using poisoned data for local training or directly changing the model parameters, attackers can easily inject backdoors into the model, which can trigger the model to make misclassification of targeted patterns in images. To address these issues, we propose a novel data-free trigger-generation-based defense approach based on the two characteristics of backdoor attacks: i) triggers are learned faster than normal knowledge, and ii) trigger patterns have a greater effect on image classification than normal class patterns. Our approach generates the images with newly learned knowledge by identifying the differences between the old and new global models, and filters trigger images by evaluating the effect of these generated images. By using these trigger images, our approach eliminates poisoned models to ensure the updated global model is benign. Comprehensive experiments demonstrate that our approach can defend against almost all the existing types of backdoor attacks and outperform all the seven state-of-the-art defense methods with both IID and non-IID scenarios. Especially, our approach can successfully defend against the backdoor attack even when 80\% of the clients are malicious.
    摘要 作为一种分布式机器学习范式,联邦学习(FL)使大规模客户端能够在不共享原始数据的情况下协同训练模型。然而,由于缺乏对不可信客户端的数据审计,FL 容易受到投毒攻击,尤其是后门攻击:攻击者通过使用投毒数据进行本地训练或直接修改模型参数,就能轻易地向模型注入后门,使模型对图像中的特定目标模式产生错误分类。针对这些问题,我们基于后门攻击的两个特征提出了一种新颖的、无需数据的基于触发器生成的防御方法:其一,触发器比正常知识学习得更快;其二,触发器模式对图像分类的影响大于正常类别模式。我们的方法通过识别新旧全局模型之间的差异来生成带有新学知识的图像,并通过评估这些生成图像的作用来筛选触发器图像。借助这些触发器图像,我们的方法能够剔除被投毒的模型,确保更新后的全局模型是良性的。全面的实验表明,我们的方法能够防御几乎所有现有类型的后门攻击,在 IID 和非 IID 场景下均优于七种最先进的防御方法;即使80%的客户端是恶意的,我们的方法仍能成功防御后门攻击。

Machine Learning-based Positioning using Multivariate Time Series Classification for Factory Environments

  • paper_url: http://arxiv.org/abs/2308.11670
  • repo_url: None
  • paper_authors: Nisal Hemadasa Manikku Badu, Marcus Venzke, Volker Turau, Yanqiu Huang
  • for: This paper is written for indoor positioning systems (IPS) in privacy-concerned factory environments, where external infrastructures are infeasible or expensive to deploy.
  • methods: The paper uses machine learning (ML) models, specifically a multivariate time series classification (MTSC) approach, to localize moving entities in these environments using motion and ambient sensors.
  • results: The paper presents a comparative analysis of different ML models for indoor positioning, including CNN-1D, MLP, and DT. The results show that all models can achieve accuracies above 80%, with DT having the lowest memory footprint and inference latency, making it a promising choice for real-world deployments.
    Abstract Indoor Positioning Systems (IPS) gained importance in many industrial applications. State-of-the-art solutions heavily rely on external infrastructures and are subject to potential privacy compromises, external information requirements, and assumptions, that make it unfavorable for environments demanding privacy and prolonged functionality. In certain environments deploying supplementary infrastructures for indoor positioning could be infeasible and expensive. Recent developments in machine learning (ML) offer solutions to address these limitations relying only on the data from onboard sensors of IoT devices. However, it is unclear which model fits best considering the resource constraints of IoT devices. This paper presents a machine learning-based indoor positioning system, using motion and ambient sensors, to localize a moving entity in privacy concerned factory environments. The problem is formulated as a multivariate time series classification (MTSC) and a comparative analysis of different machine learning models is conducted in order to address it. We introduce a novel time series dataset emulating the assembly lines of a factory. This dataset is utilized to assess and compare the selected models in terms of accuracy, memory footprint and inference speed. The results illustrate that all evaluated models can achieve accuracies above 80 %. CNN-1D shows the most balanced performance, followed by MLP. DT was found to have the lowest memory footprint and inference latency, indicating its potential for a deployment in real-world scenarios.
    摘要 室内定位系统(IPS)在许多工业应用中日益重要。最先进的解决方案严重依赖外部基础设施,存在潜在的隐私泄露、外部信息需求和假设等问题,不利于对隐私和长期运行有要求的环境;在某些环境中,部署额外的定位基础设施可能不可行且代价高昂。机器学习(ML)的最新进展提供了仅依靠物联网设备自带传感器数据的解决方案,但考虑到物联网设备的资源限制,哪种模型最为合适尚不明确。本文提出一种基于机器学习的室内定位系统,利用运动和环境传感器在注重隐私的工厂环境中对移动实体进行定位。该问题被建模为多变量时间序列分类(MTSC),并对多种机器学习模型进行了比较分析。我们构建了一个模拟工厂装配线的新时间序列数据集,用以评估和比较所选模型的准确率、内存占用和推理速度。结果表明,所有被评估的模型都能达到80%以上的准确率;CNN-1D 表现最为均衡,其次是 MLP;DT 的内存占用和推理延迟最低,显示出其在实际部署中的潜力。

Class Label-aware Graph Anomaly Detection

  • paper_url: http://arxiv.org/abs/2308.11669
  • repo_url: https://github.com/jhkim611/clad
  • paper_authors: Junghoon Kim, Yeonjun In, Kanghoon Yoon, Junmo Lee, Chanyoung Park
  • for: 这篇论文的目的是提出一种利用类标签的图异常检测方法,以提高无监督图异常检测的性能。
  • methods: 该方法利用有限数量的有标签节点来增强无监督图异常检测的性能。
  • results: 在十个数据集上的实验结果表明,即使缺乏真实类标签信息,CLAD 也能显著超越现有的无监督图异常检测方法。源代码可在 https://github.com/jhkim611/CLAD 获取。
    Abstract Unsupervised GAD methods assume the lack of anomaly labels, i.e., whether a node is anomalous or not. One common observation we made from previous unsupervised methods is that they not only assume the absence of such anomaly labels, but also the absence of class labels (the class a node belongs to used in a general node classification task). In this work, we study the utility of class labels for unsupervised GAD; in particular, how they enhance the detection of structural anomalies. To this end, we propose a Class Label-aware Graph Anomaly Detection framework (CLAD) that utilizes a limited amount of labeled nodes to enhance the performance of unsupervised GAD. Extensive experiments on ten datasets demonstrate the superior performance of CLAD in comparison to existing unsupervised GAD methods, even in the absence of ground-truth class label information. The source code for CLAD is available at \url{https://github.com/jhkim611/CLAD}.
    摘要 无监督图异常检测(GAD)方法假设不存在异常标签,即不知道一个节点是否异常。我们从以往无监督方法中观察到一个共同点:它们不仅假设缺少异常标签,还假设缺少类标签(即一般节点分类任务中节点所属的类别)。在这项工作中,我们研究类标签对无监督 GAD 的作用,特别是它们如何增强结构性异常的检测。为此,我们提出了一个类标签感知的图异常检测框架(CLAD),利用少量有标签节点来提升无监督 GAD 的性能。在十个数据集上的大量实验表明,即使没有真实类标签信息,CLAD 的性能也优于现有的无监督 GAD 方法。CLAD 的源代码可在 https://github.com/jhkim611/CLAD 获取。

Uncertainty Estimation of Transformers’ Predictions via Topological Analysis of the Attention Matrices

  • paper_url: http://arxiv.org/abs/2308.11295
  • repo_url: None
  • paper_authors: Elizaveta Kostenok, Daniil Cherniavskii, Alexey Zaytsev
  • for: 本研究旨在解决深度学习模型在预测中的信任度评估问题,这是自然语言处理领域中的一个开放问题。
  • methods: 本研究使用了基于Transformer架构的神经网络,并利用Topological Data Analysis方法来探索这些模型中的内部关系。
  • results: 研究人员提出了基于注意力机制拓扑性质的不确定性估计方法,并与经典方法进行了比较。结果显示,所提算法在质量上超越了现有方法,并为注意力机制开辟了新的应用领域,但需要对拓扑特征进行选择。
    Abstract Determining the degree of confidence of deep learning model in its prediction is an open problem in the field of natural language processing. Most of the classical methods for uncertainty estimation are quite weak for text classification models. We set the task of obtaining an uncertainty estimate for neural networks based on the Transformer architecture. A key feature of such models is the attention mechanism, which supports the information flow between the hidden representations of tokens in the neural network. We explore the formed relationships between internal representations using Topological Data Analysis methods and utilize them to predict model's confidence. In this paper, we propose a method for uncertainty estimation based on the topological properties of the attention mechanism and compare it with classical methods. As a result, the proposed algorithm surpasses the existing methods in quality and opens up a new area of application of the attention mechanism, but requires the selection of topological features.
    摘要 确定深度学习模型对其预测的置信程度,是自然语言处理领域的一个开放问题,而大多数经典的不确定性估计方法对文本分类模型而言都相当薄弱。我们设定的任务是为基于 Transformer 架构的神经网络获得不确定性估计。此类模型的关键特征是注意力机制,它支撑着神经网络中各 token 隐藏表示之间的信息流动。我们利用拓扑数据分析(TDA)方法探究这些内部表示之间形成的关系,并以此预测模型的置信度。本文提出了一种基于注意力机制拓扑性质的不确定性估计方法,并将其与经典方法进行了比较。结果显示,所提算法在质量上超越了现有方法,为注意力机制开辟了新的应用领域,但需要对拓扑特征进行选择。

Network Momentum across Asset Classes

  • paper_url: http://arxiv.org/abs/2308.11294
  • repo_url: None
  • paper_authors: Xingyue Pu, Stephen Roberts, Xiaowen Dong, Stefan Zohren
  • for: 这篇论文旨在研究网络动量(network momentum),即源自资产间动量溢出的新型交易信号,以探讨多个资产类别之间的动量风险溢价传递。
  • methods: 该论文使用一种线性且可解释的图学习模型,以揭示多资产类别之间的动量溢出网络。
  • results: 研究发现资产间存在动量溢出网络,并可利用所学网络构建多资产投资策略;经波动率缩放后,该策略在2000至2022年间的Sharpe比率为1.5,年化回报率为22%。
    Abstract We investigate the concept of network momentum, a novel trading signal derived from momentum spillover across assets. Initially observed within the confines of pairwise economic and fundamental ties, such as the stock-bond connection of the same company and stocks linked through supply-demand chains, momentum spillover implies a propagation of momentum risk premium from one asset to another. The similarity of momentum risk premium, exemplified by co-movement patterns, has been spotted across multiple asset classes including commodities, equities, bonds and currencies. However, studying the network effect of momentum spillover across these classes has been challenging due to a lack of readily available common characteristics or economic ties beyond the company level. In this paper, we explore the interconnections of momentum features across a diverse range of 64 continuous future contracts spanning these four classes. We utilise a linear and interpretable graph learning model with minimal assumptions to reveal the intricacies of the momentum spillover network. By leveraging the learned networks, we construct a network momentum strategy that exhibits a Sharpe ratio of 1.5 and an annual return of 22%, after volatility scaling, from 2000 to 2022. This paper pioneers the examination of momentum spillover across multiple asset classes using only pricing data, presents a multi-asset investment strategy based on network momentum, and underscores the effectiveness of this strategy through robust empirical analysis.
    摘要 我们研究网络动量这一概念,它是源自资产间动量溢出的新型交易信号。动量溢出最初是在成对的经济与基本面联系(例如同一公司的股票与债券、通过供需链相连的股票)中被观察到的,它意味着动量风险溢价会从一种资产传播到另一种资产。以共同运动模式为代表的动量风险溢价相似性,已在商品、股票、债券和货币等多个资产类别中被发现。然而,由于公司层面之外缺乏现成的共同特征或经济联系,研究动量溢出在这些类别之间的网络效应一直颇具挑战。在本文中,我们探究了横跨这四大类别的64个连续期货合约之间动量特征的相互联系,并采用一种假设极少、线性且可解释的图学习模型来揭示动量溢出网络的细节。利用所学网络,我们构建了一个网络动量策略,经波动率缩放后,在2000年至2022年间实现了1.5的Sharpe比率和22%的年化回报。本文率先仅用价格数据考察了多资产类别间的动量溢出,提出了基于网络动量的多资产投资策略,并通过稳健的实证分析验证了该策略的有效性。
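
A toy sketch of the core idea follows: individual momentum features are propagated over an asset graph to form network momentum signals. The adjacency matrix here is random for illustration; in the paper it is learned from data with a graph learning model.

```python
# Hedged sketch: propagating own-asset momentum over an asset graph.
import numpy as np

rng = np.random.default_rng(0)
n_assets = 64

momentum = rng.normal(size=n_assets)        # own-asset momentum features
A = rng.random((n_assets, n_assets))        # stand-in for the learned graph
A = A / A.sum(axis=1, keepdims=True)        # row-normalize edge weights

network_momentum = A @ momentum             # spillover-aggregated signal
positions = np.sign(network_momentum)       # simple long/short rule
print(positions[:8])
```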

Improving Knot Prediction in Wood Logs with Longitudinal Feature Propagation

  • paper_url: http://arxiv.org/abs/2308.11291
  • repo_url: https://github.com/jeremyfix/icvs2023
  • paper_authors: Salim Khazem, Jeremy Fix, Cédric Pradalier
  • for: 这个研究旨在预测木材原木内部缺陷的位置,以提高木材质量评估的精度和效率。
  • methods: 本研究使用卷积循环神经网络,基于原木外形轮廓求解内部缺陷的二值分割任务。
  • results: 研究表明,卷积循环神经网络可以从原木外形预测内部缺陷的位置,且外形可用廉价的激光轮廓仪(laser profiler)测量。方法在冷杉(fir)和云杉(spruce)两个树种上进行了评估,并通过消融实验验证了循环结构的重要性。
    Abstract The quality of a wood log in the wood industry depends heavily on the presence of both outer and inner defects, including inner knots that are a result of the growth of tree branches. Today, locating the inner knots require the use of expensive equipment such as X-ray scanners. In this paper, we address the task of predicting the location of inner defects from the outer shape of the logs. The dataset is built by extracting both the contours and the knots with X-ray measurements. We propose to solve this binary segmentation task by leveraging convolutional recurrent neural networks. Once the neural network is trained, inference can be performed from the outer shape measured with cheap devices such as laser profilers. We demonstrate the effectiveness of our approach on fir and spruce tree species and perform ablation on the recurrence to demonstrate its importance.
    摘要 在木材行业中,原木的质量在很大程度上取决于外部和内部缺陷,包括由树枝生长形成的内部节疤。目前,定位内部节疤需要使用X射线扫描仪等昂贵设备。在这篇论文中,我们研究如何从原木的外形预测内部缺陷的位置。数据集通过X射线测量同时提取原木轮廓和节疤构建。我们提出利用卷积循环神经网络来求解这一二值分割任务。神经网络训练完成后,推理只需使用激光轮廓仪等廉价设备测得的外形即可进行。我们在冷杉和云杉两个树种上验证了方法的有效性,并通过对循环结构的消融实验证明了其重要性。

ShadowNet for Data-Centric Quantum System Learning

  • paper_url: http://arxiv.org/abs/2308.11290
  • repo_url: None
  • paper_authors: Yuxuan Du, Yibo Yang, Tongliang Liu, Zhouchen Lin, Bernard Ghanem, Dacheng Tao
  • for: 这篇论文旨在探讨如何利用统计学习方法解决大规模量子系统的学习问题,即所谓的“量子系统学习”(Quantum System Learning,QSL)。
  • methods: 这篇论文提出了一种以数据为中心的学习范式,将经典阴影(classical shadows)与神经网络相结合以应对多种 QSL 任务。该方法利用经典阴影等易获得的信息构建训练集,由神经网络离线学习其中的映射规则,推理阶段仅需少量态副本即可对此前未见过的系统进行预测。
  • results: 作者在量子态层析和直接保真度估计等任务上实例化了该范式,并完成了多达60个量子比特的数值分析。这些结果表明,这种以数据为中心的人工智能方法有助于发现和理解新的大规模量子系统。
    Abstract Understanding the dynamics of large quantum systems is hindered by the curse of dimensionality. Statistical learning offers new possibilities in this regime by neural-network protocols and classical shadows, while both methods have limitations: the former is plagued by the predictive uncertainty and the latter lacks the generalization ability. Here we propose a data-centric learning paradigm combining the strength of these two approaches to facilitate diverse quantum system learning (QSL) tasks. Particularly, our paradigm utilizes classical shadows along with other easily obtainable information of quantum systems to create the training dataset, which is then learnt by neural networks to unveil the underlying mapping rule of the explored QSL problem. Capitalizing on the generalization power of neural networks, this paradigm can be trained offline and excel at predicting previously unseen systems at the inference stage, even with few state copies. Besides, it inherits the characteristic of classical shadows, enabling memory-efficient storage and faithful prediction. These features underscore the immense potential of the proposed data-centric approach in discovering novel and large-scale quantum systems. For concreteness, we present the instantiation of our paradigm in quantum state tomography and direct fidelity estimation tasks and conduct numerical analysis up to 60 qubits. Our work showcases the profound prospects of data-centric artificial intelligence to advance QSL in a faithful and generalizable manner.
    摘要 理解大规模量子系统的动力学受到维度灾难的阻碍。统计学习通过神经网络方法和经典阴影(classical shadows)为这一领域带来了新的可能,但两种方法各有局限:前者受预测不确定性的困扰,后者缺乏泛化能力。我们提出一种以数据为中心的学习范式,结合这两种方法的优势,以推动多种量子系统学习(QSL)任务。具体而言,该范式利用经典阴影以及量子系统其他易获得的信息来构建训练集,再由神经网络学习所探究 QSL 问题背后的映射规则。得益于神经网络的泛化能力,该范式可以离线训练,并在推理阶段仅凭少量态副本即可出色地预测此前未见过的系统;同时它继承了经典阴影的特性,实现了内存高效的存储和可信的预测。这些特性凸显了所提出的以数据为中心的方法在发现新的大规模量子系统方面的巨大潜力。具体而言,我们在量子态层析和直接保真度估计任务中实例化了该范式,并完成了多达60个量子比特的数值分析。我们的工作展示了以数据为中心的人工智能以可信且可泛化的方式推进 QSL 的广阔前景。

Test Time Embedding Normalization for Popularity Bias Mitigation

  • paper_url: http://arxiv.org/abs/2308.11288
  • repo_url: https://github.com/ml-postech/tten
  • paper_authors: Dain Kim, Jinhyeok Park, Dongwoo Kim
  • for: 本研究旨在缓解推荐系统中的流行度偏差,即热门物品往往主导推荐结果的问题。
  • methods: 我们提出了“测试时嵌入归一化”(Test Time Embedding Normalization)方法,这是一种简单而有效的流行度偏差缓解策略:在推理阶段使用归一化后的物品嵌入,以控制与物品流行度高度相关的嵌入模长的影响。
  • results: 通过大量实验,我们证明该方法比以往的偏差缓解方法更能降低流行度偏差。此外,我们发现用户嵌入与物品嵌入之间的角相似度可以区分用户偏好与否的物品,而与其流行度无关。
    Abstract Popularity bias is a widespread problem in the field of recommender systems, where popular items tend to dominate recommendation results. In this work, we propose 'Test Time Embedding Normalization' as a simple yet effective strategy for mitigating popularity bias, which surpasses the performance of the previous mitigation approaches by a significant margin. Our approach utilizes the normalized item embedding during the inference stage to control the influence of embedding magnitude, which is highly correlated with item popularity. Through extensive experiments, we show that our method combined with the sampled softmax loss effectively reduces popularity bias compare to previous approaches for bias mitigation. We further investigate the relationship between user and item embeddings and find that the angular similarity between embeddings distinguishes preferable and non-preferable items regardless of their popularity. The analysis explains the mechanism behind the success of our approach in eliminating the impact of popularity bias. Our code is available at https://github.com/ml-postech/TTEN.
    摘要 流行度偏差是推荐系统领域普遍存在的问题,即热门物品往往主导推荐结果。在这项工作中,我们提出“测试时嵌入归一化”这一简单而有效的流行度偏差缓解策略,其性能显著超越以往的缓解方法。我们的方法在推理阶段使用归一化后的物品嵌入,以控制与物品流行度高度相关的嵌入模长的影响。通过大量实验,我们表明该方法与采样 softmax 损失相结合,能比以往方法更有效地降低流行度偏差。我们进一步考察了用户嵌入与物品嵌入之间的关系,发现嵌入间的角相似度可以区分用户偏好与否的物品,而与其流行度无关;这一分析解释了我们的方法能够消除流行度偏差影响的机制。我们的代码可在 https://github.com/ml-postech/TTEN 获取。
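
A minimal sketch of test-time embedding normalization follows: item embeddings are L2-normalized only at inference, so scores depend on direction (angular similarity) rather than magnitude, which correlates with popularity. Embedding shapes are illustrative.

```python
# Hedged sketch: scoring with L2-normalized item embeddings at test time.
import torch
import torch.nn.functional as F

user_emb = torch.randn(8, 64)    # 8 users
item_emb = torch.randn(100, 64)  # 100 items; magnitudes track popularity

scores_raw = user_emb @ item_emb.t()                         # biased toward popular items
scores_tten = user_emb @ F.normalize(item_emb, dim=-1).t()   # test-time normalized

print(scores_raw.shape, scores_tten.shape)
```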

CNN based Cuneiform Sign Detection Learned from Annotated 3D Renderings and Mapped Photographs with Illumination Augmentation

  • paper_url: http://arxiv.org/abs/2308.11277
  • repo_url: None
  • paper_authors: Ernst Stötzner, Timo Homburg, Hubert Mara
  • for: 受古代近东研究(DANES)社区所面临挑战的启发,开发用于处理美索不达米亚楔形文字的数字化工具。
  • methods: 采用结合3D渲染、照片与光照增强的方法,并提供一种在3D渲染与照片数据之间转移标注的映射工具。
  • results: 使用3D渲染图像进行符号检测的效果优于基于照片的已有工作,且在混合数据集上也能取得良好结果;此外,使用 MSII 渲染还能提升在照片上的检测效果。
    Abstract Motivated by the challenges of the Digital Ancient Near Eastern Studies (DANES) community, we develop digital tools for processing cuneiform script being a 3D script imprinted into clay tablets used for more than three millennia and at least eight major languages. It consists of thousands of characters that have changed over time and space. Photographs are the most common representations usable for machine learning, while ink drawings are prone to interpretation. Best suited 3D datasets that are becoming available. We created and used the HeiCuBeDa and MaiCuBeDa datasets, which consist of around 500 annotated tablets. For our novel OCR-like approach to mixed image data, we provide an additional mapping tool for transferring annotations between 3D renderings and photographs. Our sign localization uses a RepPoints detector to predict the locations of characters as bounding boxes. We use image data from GigaMesh's MSII (curvature, see https://gigamesh.eu) based rendering, Phong-shaded 3D models, and photographs as well as illumination augmentation. The results show that using rendered 3D images for sign detection performs better than other work on photographs. In addition, our approach gives reasonably good results for photographs only, while it is best used for mixed datasets. More importantly, the Phong renderings, and especially the MSII renderings, improve the results on photographs, which is the largest dataset on a global scale.
    摘要 受数字古代近东研究(DANES)社区挑战的启发,我们开发了处理楔形文字的数字化工具。楔形文字是一种压印在泥板上的3D文字,使用历史超过三千年,覆盖至少八种主要语言,包含数千个随时间和地域演变的字符。照片是最常见、可用于机器学习的表示形式,而墨线描摹图则容易受主观解读影响;更适合的3D数据集正日益可得。我们创建并使用了 HeiCuBeDa 和 MaiCuBeDa 数据集,包含约500块带标注的泥板。针对我们面向混合图像数据的类 OCR 方法,我们还提供了一个在3D渲染与照片之间转移标注的映射工具。我们的符号定位使用 RepPoints 检测器,以边界框形式预测字符位置。我们使用了 GigaMesh 的 MSII(曲率,参见 https://gigamesh.eu )渲染图像、Phong 着色的3D模型和照片,并辅以光照增强。结果表明,使用渲染的3D图像进行符号检测优于其他基于照片的工作;同时,我们的方法在仅使用照片时也能取得相当不错的结果,而在混合数据集上效果最佳。更重要的是,Phong 渲染、尤其是 MSII 渲染,能够提升在照片(全球规模最大的数据集)上的检测效果。

FoX: Formation-aware exploration in multi-agent reinforcement learning

  • paper_url: http://arxiv.org/abs/2308.11272
  • repo_url: None
  • paper_authors: Yonghyeon Jo, Sunwoo Lee, Junghyuk Yum, Seungyul Han
  • for: 提高多智能体强化学习(MARL)中的探索性能,解决智能体部分可观测以及探索空间随智能体数量指数增长的问题。
  • methods: 提出一种基于队形的探索框架(FoX),引导部分可观测的智能体仅凭自身观测感知当前队形,从而鼓励其访问不同队形下的有用状态。
  • results: 在谷歌研究足球(GRF)和稀疏奖励的星际争霸II多智能体挑战(SMAC)任务上,所提出的 FoX 框架相比最先进的 MARL 算法有显著的性能提升。
    Abstract Recently, deep multi-agent reinforcement learning (MARL) has gained significant popularity due to its success in various cooperative multi-agent tasks. However, exploration still remains a challenging problem in MARL due to the partial observability of the agents and the exploration space that can grow exponentially as the number of agents increases. Firstly, in order to address the scalability issue of the exploration space, we define a formation-based equivalence relation on the exploration space and aim to reduce the search space by exploring only meaningful states in different formations. Then, we propose a novel formation-aware exploration (FoX) framework that encourages partially observable agents to visit the states in diverse formations by guiding them to be well aware of their current formation solely based on their own observations. Numerical results show that the proposed FoX framework significantly outperforms the state-of-the-art MARL algorithms on Google Research Football (GRF) and sparse Starcraft II multi-agent challenge (SMAC) tasks.
    摘要 近期,深度多智能体强化学习(MARL)因在各种协作多智能体任务中的成功而备受关注。然而,由于智能体的部分可观测性,以及探索空间随智能体数量增加而呈指数级增长,探索仍是 MARL 中的一个难题。首先,为了解决探索空间的可扩展性问题,我们在探索空间上定义了一种基于队形(formation)的等价关系,旨在只探索不同队形下有意义的状态,从而缩小搜索空间。其次,我们提出了一种新颖的队形感知探索(FoX)框架,引导部分可观测的智能体仅凭自身观测就能清楚感知当前队形,从而鼓励它们访问不同队形下的状态。数值结果显示,所提出的 FoX 框架在谷歌研究足球(GRF)和稀疏奖励的星际争霸II多智能体挑战(SMAC)任务上显著优于最先进的 MARL 算法。

Quantum-Inspired Machine Learning: a Survey

  • paper_url: http://arxiv.org/abs/2308.11269
  • repo_url: None
  • paper_authors: Larry Huynh, Jin Hong, Ajmal Mian, Hajime Suzuki, Yanqiu Wu, Seyit Camtepe
  • for: 这篇论文旨在为研究者提供一份完整、综合的量子启发机器学习(QiML)综述,涵盖张量网络模拟、去量子化算法等多个研究方向,展示最新进展、实际应用及未来研究方向。
  • methods: 本综述考察了张量网络模拟、去量子化算法等多种方法,以梳理 QiML 的各个研究领域。
  • results: 本综述揭示了这些领域的最新进展和实际应用,分析了“QiML”一词既有的多种解释及其含混之处,并展望了未来研究的发展方向。
    Abstract Quantum-inspired Machine Learning (QiML) is a burgeoning field, receiving global attention from researchers for its potential to leverage principles of quantum mechanics within classical computational frameworks. However, current review literature often presents a superficial exploration of QiML, focusing instead on the broader Quantum Machine Learning (QML) field. In response to this gap, this survey provides an integrated and comprehensive examination of QiML, exploring QiML's diverse research domains including tensor network simulations, dequantized algorithms, and others, showcasing recent advancements, practical applications, and illuminating potential future research avenues. Further, a concrete definition of QiML is established by analyzing various prior interpretations of the term and their inherent ambiguities. As QiML continues to evolve, we anticipate a wealth of future developments drawing from quantum mechanics, quantum computing, and classical machine learning, enriching the field further. This survey serves as a guide for researchers and practitioners alike, providing a holistic understanding of QiML's current landscape and future directions.
    摘要 量子启发机器学习(QiML)是一个迅速发展的领域,因其有望在经典计算框架中利用量子力学原理而受到全球研究者的关注。然而,现有综述文献往往只对 QiML 作浅层探讨,而把重点放在更宽泛的量子机器学习(QML)领域。为填补这一空白,本综述对 QiML 进行了综合而全面的考察,探讨张量网络模拟、去量子化算法等 QiML 的多个研究方向,展示最新进展与实际应用,并指明潜在的未来研究路径。此外,通过分析该术语既有的多种解释及其固有的含混之处,本文给出了 QiML 的一个具体定义。随着 QiML 的持续演进,我们预计将有大量源自量子力学、量子计算和经典机器学习的新进展,进一步丰富该领域。本综述可为研究者和从业者提供指引,帮助全面理解 QiML 的现状与未来方向。

Robust Lagrangian and Adversarial Policy Gradient for Robust Constrained Markov Decision Processes

  • paper_url: http://arxiv.org/abs/2308.11267
  • repo_url: None
  • paper_authors: David M. Bossens
  • for: 本研究针对鲁棒约束马尔可夫决策过程(RCMDP),即在满足行为约束的同时对转移模型误差保持鲁棒性的强化学习框架,提出新的策略梯度算法。
  • methods: 本研究基于值估计与拉格朗日函数对 RCMDP 建模,并提出了两种算法:RCPG with Robust Lagrangian 和 Adversarial RCPG。
  • results: 实验结果显示,相比传统的 RCPG 变体以及非鲁棒、非约束的消融版本,所提出的两种算法在库存管理和安全导航任务中表现出竞争力,其中 Adversarial RCPG 在所有测试中均位列前二。
    Abstract The robust constrained Markov decision process (RCMDP) is a recent task-modelling framework for reinforcement learning that incorporates behavioural constraints and that provides robustness to errors in the transition dynamics model through the use of an uncertainty set. Simulating RCMDPs requires computing the worst-case dynamics based on value estimates for each state, an approach which has previously been used in the Robust Constrained Policy Gradient (RCPG). Highlighting potential downsides of RCPG such as not robustifying the full constrained objective and the lack of incremental learning, this paper introduces two algorithms, called RCPG with Robust Lagrangian and Adversarial RCPG. RCPG with Robust Lagrangian modifies RCPG by taking the worst-case dynamics based on the Lagrangian rather than either the value or the constraint. Adversarial RCPG also formulates the worst-case dynamics based on the Lagrangian but learns this directly and incrementally as an adversarial policy through gradient descent rather than indirectly and abruptly through constrained optimisation on a sorted value list. A theoretical analysis first derives the Lagrangian policy gradient for the policy optimisation of both proposed algorithms and then the adversarial policy gradient to learn the adversary for Adversarial RCPG. Empirical experiments injecting perturbations in inventory management and safe navigation tasks demonstrate the competitive performance of both algorithms compared to traditional RCPG variants as well as non-robust and non-constrained ablations. In particular, Adversarial RCPG ranks among the top two performing algorithms on all tests.
    摘要 鲁棒约束马尔可夫决策过程(RCMDP)是近期提出的一种强化学习任务建模框架,它引入行为约束,并通过不确定性集合对转移动力学模型的误差提供鲁棒性。模拟 RCMDP 需要基于每个状态的值估计来计算最坏情况动力学,Robust Constrained Policy Gradient(RCPG)此前就采用了这种做法。针对 RCPG 的潜在不足(未对完整的约束目标进行鲁棒化、缺乏增量式学习),本文提出了两种算法:RCPG with Robust Lagrangian 和 Adversarial RCPG。RCPG with Robust Lagrangian 修改了 RCPG,基于拉格朗日函数(而非值函数或约束)来求取最坏情况动力学。Adversarial RCPG 同样基于拉格朗日函数构造最坏情况动力学,但它通过梯度下降以对抗策略的形式直接、增量地学习这一动力学,而不是通过在排序值列表上进行约束优化来间接、突变地求解。理论分析首先推导了两种算法策略优化所需的拉格朗日策略梯度,继而推导了 Adversarial RCPG 学习对抗者所需的对抗策略梯度。在库存管理和安全导航任务中注入扰动的实证实验表明,两种算法相比传统 RCPG 变体以及非鲁棒、非约束的消融版本均具有竞争力;其中 Adversarial RCPG 在所有测试中均位列前二。

Efficient Last-iterate Convergence Algorithms in Solving Games

  • paper_url: http://arxiv.org/abs/2308.11256
  • repo_url: None
  • paper_authors: Linjian Meng, Zhenxing Ge, Wenbin Li, Bo An, Yang Gao
  • for: 学习两人零和博弈中的纳什均衡(NE)。
  • methods: 使用 Reward Transformation 框架,将原问题转化为一系列强凸-凹优化问题(SCCP),并设计新的变换方法,使 SCCP 可由 Regret Matching+(RM+)算法求解。
  • results: 提出了 Reward Transformation RM+ 算法(RTRM+),在离散时间反馈下实现了最终迭代(last-iterate)收敛,并在实验中表现出色,显著优于现有的最终迭代收敛算法和 RM+(CFR+)。
    Abstract No-regret algorithms are popular for learning Nash equilibrium (NE) in two-player zero-sum normal-form games (NFGs) and extensive-form games (EFGs). Many recent works consider the last-iterate convergence no-regret algorithms. Among them, the two most famous algorithms are Optimistic Gradient Descent Ascent (OGDA) and Optimistic Multiplicative Weight Update (OMWU). However, OGDA has high per-iteration complexity. OMWU exhibits a lower per-iteration complexity but poorer empirical performance, and its convergence holds only when NE is unique. Recent works propose a Reward Transformation (RT) framework for MWU, which removes the uniqueness condition and achieves competitive performance with OMWU. Unfortunately, RT-based algorithms perform worse than OGDA under the same number of iterations, and their convergence guarantee is based on the continuous-time feedback assumption, which does not hold in most scenarios. To address these issues, we provide a closer analysis of the RT framework, which holds for both continuous and discrete-time feedback. We demonstrate that the essence of the RT framework is to transform the problem of learning NE in the original game into a series of strongly convex-concave optimization problems (SCCPs). We show that the bottleneck of RT-based algorithms is the speed of solving SCCPs. To improve the their empirical performance, we design a novel transformation method to enable the SCCPs can be solved by Regret Matching+ (RM+), a no-regret algorithm with better empirical performance, resulting in Reward Transformation RM+ (RTRM+). RTRM+ enjoys last-iterate convergence under the discrete-time feedback setting. Using the counterfactual regret decomposition framework, we propose Reward Transformation CFR+ (RTCFR+) to extend RTRM+ to EFGs. Experimental results show that our algorithms significantly outperform existing last-iterate convergence algorithms and RM+ (CFR+).
    摘要 无悔算法在学习两人零和正规形博弈(NFG)与扩展形博弈(EFG)的纳什均衡(NE)方面十分流行。近期许多工作关注具有最终迭代(last-iterate)收敛性的无悔算法,其中最著名的两个是乐观梯度下降上升(OGDA)和乐观乘性权重更新(OMWU)。然而,OGDA 的单次迭代复杂度较高;OMWU 虽然单次迭代复杂度较低,但经验性能较差,且其收敛性仅在 NE 唯一时成立。近期的工作为 MWU 提出了奖励变换(Reward Transformation,RT)框架,去除了唯一性条件,并取得了与 OMWU 相当的性能。遗憾的是,在相同迭代次数下,基于 RT 的算法表现不如 OGDA,且其收敛保证依赖于连续时间反馈假设,而该假设在大多数场景中并不成立。针对这些问题,我们对 RT 框架进行了更细致的分析,所得结论对连续和离散时间反馈均成立。我们证明 RT 框架的本质在于把在原博弈中学习 NE 的问题,转化为一系列强凸-凹优化问题(SCCP),并指出基于 RT 的算法的瓶颈在于求解 SCCP 的速度。为提升其经验性能,我们设计了一种新的变换方法,使 SCCP 可以由经验性能更好的无悔算法 Regret Matching+(RM+)求解,由此得到 Reward Transformation RM+(RTRM+)。RTRM+ 在离散时间反馈设定下具有最终迭代收敛性。借助反事实遗憾分解框架,我们进一步提出 Reward Transformation CFR+(RTCFR+),将 RTRM+ 推广到 EFG。实验结果显示,我们的算法显著优于现有的最终迭代收敛算法以及 RM+(CFR+)。
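
Since RTRM+ uses Regret Matching+ as its inner solver, a compact sketch of the textbook RM+ update may help; the single fixed payoff vector below is a toy stand-in for a game, not the paper's setup.

```python
# Hedged sketch: Regret Matching+ for one decision point with n actions.
import numpy as np

def rm_plus_strategy(cum_regret):
    positive = np.maximum(cum_regret, 0.0)
    total = positive.sum()
    n = len(cum_regret)
    return positive / total if total > 0 else np.full(n, 1.0 / n)

payoff = np.array([1.0, 0.0, 0.5])           # toy per-action utilities
cum_regret = np.zeros(len(payoff))

for _ in range(1000):
    strategy = rm_plus_strategy(cum_regret)
    expected = strategy @ payoff
    # RM+: accumulate instantaneous regrets, clipping at zero each step.
    cum_regret = np.maximum(cum_regret + (payoff - expected), 0.0)

print(rm_plus_strategy(cum_regret))          # concentrates on the best action
```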

A survey on bias in machine learning research

  • paper_url: http://arxiv.org/abs/2308.11254
  • repo_url: https://github.com/Aastha2104/Parkinson-Disease-Prediction
  • paper_authors: Agnieszka Mikołajczyk-Bareła, Michał Grochowski
  • for: 本文旨在探讨机器学习中的偏见问题,尤其是偏见的起源和 causa causans。
  • methods: 本文提供了机器学习管道中偏见的四十多种可能的来源,并提供了具体的例子。
  • results: 通过理解机器学习中的偏见来源和后果,可以开发更好的偏见检测和缓解方法,以实现更公正、透明和准确的机器学习模型。
    Abstract Current research on bias in machine learning often focuses on fairness, while overlooking the roots or causes of bias. However, bias was originally defined as a "systematic error," often caused by humans at different stages of the research process. This article aims to bridge the gap between past literature on bias in research by providing taxonomy for potential sources of bias and errors in data and models. The paper focus on bias in machine learning pipelines. Survey analyses over forty potential sources of bias in the machine learning (ML) pipeline, providing clear examples for each. By understanding the sources and consequences of bias in machine learning, better methods can be developed for its detecting and mitigating, leading to fairer, more transparent, and more accurate ML models.
    摘要 当前关于机器学习偏差的研究往往聚焦于公平性,而忽视了偏差的根源或成因。然而,“偏差”最初被定义为一种“系统性误差”,通常由人在研究过程的不同阶段引入。本文旨在将以往关于研究偏差的文献与机器学习衔接起来,为数据和模型中潜在的偏差与误差来源提供分类体系。文章聚焦机器学习管道中的偏差,梳理了其中四十余种潜在的偏差来源,并为每一种给出了清晰的示例。通过理解机器学习中偏差的来源及其后果,可以开发出更好的偏差检测与缓解方法,从而得到更公平、更透明、更准确的机器学习模型。

Multi-Source Domain Adaptation for Cross-Domain Fault Diagnosis of Chemical Processes

  • paper_url: http://arxiv.org/abs/2308.11247
  • repo_url: None
  • paper_authors: Eduardo Fernandes Montesuma, Michela Mulas, Fred Ngolè Mboula, Francesco Corona, Antoine Souloumiac
  • for: This paper focuses on Cross-Domain Fault Diagnosis (CDFD) in process supervision, using machine learning to predict fault types from sensor readings.
  • methods: The authors compare single and multi-source unsupervised domain adaptation (SSDA and MSDA respectively) algorithms for CDFD, using the Tennessee-Eastmann Process as a benchmark.
  • results: The MSDA baseline improves classification accuracy by 23% on average compared to the SSDA baseline, and using multiple sources during training improves accuracy by 8.4% on average even without adaptation.
    Abstract Fault diagnosis is an essential component in process supervision. Indeed, it determines which kind of fault has occurred, given that it has been previously detected, allowing for appropriate intervention. Automatic fault diagnosis systems use machine learning for predicting the fault type from sensor readings. Nonetheless, these models are sensible to changes in the data distributions, which may be caused by changes in the monitored process, such as changes in the mode of operation. This scenario is known as Cross-Domain Fault Diagnosis (CDFD). We provide an extensive comparison of single and multi-source unsupervised domain adaptation (SSDA and MSDA respectively) algorithms for CDFD. We study these methods in the context of the Tennessee-Eastmann Process, a widely used benchmark in the chemical industry. We show that using multiple domains during training has a positive effect, even when no adaptation is employed. As such, the MSDA baseline improves over the SSDA baseline classification accuracy by 23% on average. In addition, under the multiple-sources scenario, we improve classification accuracy of the no adaptation setting by 8.4% on average.
    摘要 故障诊断是过程监控中的关键环节:在故障被检出之后,它负责判定发生的是哪种故障,以便采取恰当的干预措施。自动故障诊断系统利用机器学习从传感器读数中预测故障类型。然而,这些模型对数据分布的变化十分敏感,而这类变化可能源于被监控过程本身的改变,例如运行模式的切换。这种情形被称为跨域故障诊断(CDFD)。我们对单源和多源无监督域适应(分别记为 SSDA 和 MSDA)算法在 CDFD 中的表现进行了广泛比较,并在化工行业广泛使用的基准 Tennessee-Eastmann 过程上开展研究。我们发现,即使不采用任何适应方法,在训练中使用多个源域也有积极作用:MSDA 基线的分类准确率比 SSDA 基线平均高出23%;此外,在多源场景下,不做适应的设置的分类准确率也平均提升了8.4%。

An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification

  • paper_url: http://arxiv.org/abs/2308.11241
  • repo_url: https://github.com/harunorikawano/speaker-identification-with-tgp
  • paper_authors: Harunori Kawano, Sota Shimizu
  • for: 这篇论文旨在构建一个高效的端到端说话人识别模型,采用 Transformer 架构与自监督学习。
  • methods: 该论文使用基于 Transformer 的上下文模型,并分析了参数量与性能之间的关系,以探寻有效模型的结构;此外,还提出了一种具有强大学习能力的池化方法:时间门控池化(Temporal Gate Pooling)。
  • results: 该论文在 VoxCeleb1 的说话人识别任务上进行了评估,仅用 28.5M 参数就达到了 85.9% 的准确率,与拥有 317.7M 参数的 wav2vec2 精度相当。代码可在 https://github.com/HarunoriKawano/speaker-identification-with-tgp 获取。
    Abstract Wav2vec2 has achieved success in applying Transformer architecture and self-supervised learning to speech recognition. Recently, these have come to be used not only for speech recognition but also for the entire speech processing. This paper introduces an effective end-to-end speaker identification model applied Transformer-based contextual model. We explored the relationship between the parameters and the performance in order to discern the structure of an effective model. Furthermore, we propose a pooling method, Temporal Gate Pooling, with powerful learning ability for speaker identification. We applied Conformer as encoder and BEST-RQ for pre-training and conducted an evaluation utilizing the speaker identification of VoxCeleb1. The proposed method has achieved an accuracy of 85.9% with 28.5M parameters, demonstrating comparable precision to wav2vec2 with 317.7M parameters. Code is available at https://github.com/HarunoriKawano/speaker-identification-with-tgp.
    摘要 wav2vec2 将 Transformer 架构与自监督学习应用于语音识别并取得成功;近来,这类技术不仅用于语音识别,还被用于整个语音处理领域。本文提出一种应用基于 Transformer 的上下文模型的高效端到端说话人识别模型。我们考察了参数量与性能之间的关系,以辨明有效模型的结构。此外,我们提出了一种对说话人识别具有强大学习能力的池化方法:时间门控池化(Temporal Gate Pooling)。我们采用 Conformer 作为编码器、BEST-RQ 进行预训练,并在 VoxCeleb1 的说话人识别任务上进行了评估。所提方法仅用 28.5M 参数即达到 85.9% 的准确率,与拥有 317.7M 参数的 wav2vec2 精度相当。代码可在 https://github.com/HarunoriKawano/speaker-identification-with-tgp 获取。
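
The abstract does not spell out Temporal Gate Pooling, so the following is one plausible reading, offered purely as an assumption: a learned sigmoid gate weighs each frame before pooling over time.

```python
# Hedged sketch (an assumed reading of Temporal Gate Pooling, not the
# paper's verified definition): gated weighted-average pooling over time.
import torch
import torch.nn as nn

class TemporalGatePooling(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(dim, 1)

    def forward(self, x):
        # x: (batch, time, dim) frame-level features from the encoder
        g = torch.sigmoid(self.gate(x))            # (batch, time, 1)
        return (g * x).sum(dim=1) / g.sum(dim=1).clamp_min(1e-6)

pool = TemporalGatePooling(dim=256)
frames = torch.randn(4, 200, 256)
print(pool(frames).shape)   # torch.Size([4, 256]) utterance-level embedding
```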

Minwise-Independent Permutations with Insertion and Deletion of Features

  • paper_url: http://arxiv.org/abs/2308.11240
  • repo_url: None
  • paper_authors: Rameshwar Pratap, Raghav Kulkarni
  • for: 本研究考察 minHash 算法在高维二值数据上的应用,重点分析特征动态插入与删除时的情形,并提出能够适应这种情形的算法。
  • methods: 本研究在 minHash 算法的基础上,提出使其草图(sketch)能够适应特征动态插入与删除的算法,并给出严格的理论分析。
  • results: 研究表明,该方法相比从头重新计算 minHash 可显著缩短运行时间,同时保持相当的性能。
    Abstract In their seminal work, Broder \textit{et. al.}~\citep{BroderCFM98} introduces the $\mathrm{minHash}$ algorithm that computes a low-dimensional sketch of high-dimensional binary data that closely approximates pairwise Jaccard similarity. Since its invention, $\mathrm{minHash}$ has been commonly used by practitioners in various big data applications. Further, the data is dynamic in many real-life scenarios, and their feature sets evolve over time. We consider the case when features are dynamically inserted and deleted in the dataset. We note that a naive solution to this problem is to repeatedly recompute $\mathrm{minHash}$ with respect to the updated dimension. However, this is an expensive task as it requires generating fresh random permutations. To the best of our knowledge, no systematic study of $\mathrm{minHash}$ is recorded in the context of dynamic insertion and deletion of features. In this work, we initiate this study and suggest algorithms that make the $\mathrm{minHash}$ sketches adaptable to the dynamic insertion and deletion of features. We show a rigorous theoretical analysis of our algorithms and complement it with extensive experiments on several real-world datasets. Empirically we observe a significant speed-up in the running time while simultaneously offering comparable performance with respect to running $\mathrm{minHash}$ from scratch. Our proposal is efficient, accurate, and easy to implement in practice.
    摘要 Broder 等人在其开创性工作中提出了 minHash 算法,它为高维二值数据计算低维草图,并能很好地近似成对的 Jaccard 相似度。自发明以来,minHash 被从业者广泛用于各类大数据应用。然而,在许多现实场景中数据是动态的,其特征集合会随时间演化。我们考虑特征在数据集中被动态插入与删除的情形。对此,一种朴素的解决方案是针对更新后的维度反复重新计算 minHash,但这需要重新生成随机排列,代价高昂。据我们所知,目前还没有针对特征动态插入与删除情形下 minHash 的系统性研究。在本工作中,我们开启了这一研究,并提出使 minHash 草图能够适应特征动态插入与删除的算法。我们给出了严格的理论分析,并在多个真实数据集上辅以大量实验。实证结果显示,我们的方法在运行时间上取得显著加速,同时性能与从头重新计算 minHash 相当。我们的方案高效、准确,且易于在实践中实现。
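
For context, here is a small minHash sketch over binary feature sets; using universal hash functions as stand-ins for random permutations is a common simplification, assumed here for brevity. It also illustrates why the naive response to feature insertion or deletion is to recompute every signature.

```python
# Hedged sketch: minHash signatures and a Jaccard similarity estimate.
import numpy as np

rng = np.random.default_rng(0)
n_hashes, prime = 16, 2_147_483_647
a = rng.integers(1, prime, size=n_hashes)
b = rng.integers(0, prime, size=n_hashes)

def minhash(features):
    # features: set of active feature indices of a binary vector
    idx = np.fromiter(features, dtype=np.int64)
    return ((a[:, None] * idx[None, :] + b[:, None]) % prime).min(axis=1)

s1, s2 = {1, 3, 5, 9}, {1, 3, 5, 8}
sig1, sig2 = minhash(s1), minhash(s2)
est = (sig1 == sig2).mean()                  # fraction of matching minima
print(f"estimated Jaccard: {est:.2f}, true: {len(s1 & s2) / len(s1 | s2):.2f}")
```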

Federated Learning on Patient Data for Privacy-Protecting Polycystic Ovary Syndrome Treatment

  • paper_url: http://arxiv.org/abs/2308.11220
  • repo_url: https://github.com/toriqiu/fl-pcos
  • paper_authors: Lucia Morris, Tori Qiu, Nikhil Raghuraman
  • for: 这项研究旨在利用联邦学习(FL)为患有多囊卵巢综合征(PCOS)的患者预测最佳药物。
  • methods: 该研究在一个合成的 PCOS 患者数据集上验证了多种联邦学习方法。
  • results: 研究表明,联邦学习方法能够利用大规模、多样化的数据为 PCOS 患者识别最有效的治疗方案,同时为患者提供隐私保证。
    Abstract The field of women's endocrinology has trailed behind data-driven medical solutions, largely due to concerns over the privacy of patient data. Valuable datapoints about hormone levels or menstrual cycling could expose patients who suffer from comorbidities or terminate a pregnancy, violating their privacy. We explore the application of Federated Learning (FL) to predict the optimal drug for patients with polycystic ovary syndrome (PCOS). PCOS is a serious hormonal disorder impacting millions of women worldwide, yet it's poorly understood and its research is stunted by a lack of patient data. We demonstrate that a variety of FL approaches succeed on a synthetic PCOS patient dataset. Our proposed FL models are a tool to access massive quantities of diverse data and identify the most effective treatment option while providing PCOS patients with privacy guarantees.
    摘要 女性内分泌学领域在数据驱动的医疗方案上一直落后,这在很大程度上源于对患者数据隐私的担忧:激素水平或月经周期等有价值的数据点,可能暴露患有合并症或曾终止妊娠的患者,侵犯其隐私。我们探索应用联邦学习(FL)为患有多囊卵巢综合征(PCOS)的患者预测最佳药物。PCOS 是影响全球数百万女性的严重激素紊乱疾病,但人们对它的了解仍然有限,其研究也因缺乏患者数据而停滞。我们在一个合成的 PCOS 患者数据集上证明了多种 FL 方法的有效性。我们提出的 FL 模型是一种工具,既能利用海量多样的数据找出最有效的治疗方案,又能为 PCOS 患者提供隐私保证。
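
For orientation, here is a generic FedAvg-style aggregation sketch, the standard federated learning baseline; the abstract does not specify the exact protocol used, so this is an assumption for illustration. Only model weights, never raw patient records, leave each client.

```python
# Hedged sketch: size-weighted averaging of client model weights (FedAvg).
import numpy as np

def fed_avg(client_weights, client_sizes):
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()
    return sum(c * w for c, w in zip(coeffs, client_weights))

# Three hypothetical clinics with locally trained linear-model weights.
clients = [np.array([0.2, 1.1]), np.array([0.3, 0.9]), np.array([0.1, 1.3])]
print(fed_avg(clients, client_sizes=[120, 300, 80]))
```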

Federated Learning in Big Model Era: Domain-Specific Multimodal Large Models

  • paper_url: http://arxiv.org/abs/2308.11217
  • repo_url: None
  • paper_authors: Zengxiang Li, Zhaoxiang Hou, Hui Liu, Ying Wang, Tongzhi Li, Longfei Xie, Chao Shi, Chengyi Yang, Weishan Zhang, Zelei Liu, Liang Xu
  • for: 这篇论文旨在提出一种多模态联邦学习框架,帮助多家企业利用私有领域数据协同训练面向垂直领域的大模型,以实现跨场景的智能服务。
  • methods: 该论文深入探讨了大模型时代联邦学习在智能基础与目标上的战略性转变,以及在异构数据、模型聚合、性能与成本的权衡、数据隐私和激励机制等方面面临的新挑战。
  • results: 实验表明,通过多模态模型联邦学习,企业可以增强并积累智能能力,共同打造覆盖能源基础设施安全、住宅社区安防和城市运营管理等多个场景、提供高质量智能服务的智慧城市模型。
    Abstract Multimodal data, which can comprehensively perceive and recognize the physical world, has become an essential path towards general artificial intelligence. However, multimodal large models trained on public datasets often underperform in specific industrial domains. This paper proposes a multimodal federated learning framework that enables multiple enterprises to utilize private domain data to collaboratively train large models for vertical domains, achieving intelligent services across scenarios. The authors discuss in-depth the strategic transformation of federated learning in terms of intelligence foundation and objectives in the era of big model, as well as the new challenges faced in heterogeneous data, model aggregation, performance and cost trade-off, data privacy, and incentive mechanism. The paper elaborates a case study of leading enterprises contributing multimodal data and expert knowledge to city safety operation management , including distributed deployment and efficient coordination of the federated learning platform, technical innovations on data quality improvement based on large model capabilities and efficient joint fine-tuning approaches. Preliminary experiments show that enterprises can enhance and accumulate intelligent capabilities through multimodal model federated learning, thereby jointly creating an smart city model that provides high-quality intelligent services covering energy infrastructure safety, residential community security, and urban operation management. The established federated learning cooperation ecosystem is expected to further aggregate industry, academia, and research resources, realize large models in multiple vertical domains, and promote the large-scale industrial application of artificial intelligence and cutting-edge research on multimodal federated learning.
    摘要 能够全面感知和认识物理世界的多模态数据,已成为通向通用人工智能的重要路径。然而,在公共数据集上训练的多模态大模型,在特定行业领域往往表现欠佳。本文提出一种多模态联邦学习框架,使多家企业能够利用私有领域数据协同训练面向垂直领域的大模型,实现跨场景的智能服务。作者深入探讨了大模型时代联邦学习在智能基础与目标上的战略性转变,以及在异构数据、模型聚合、性能与成本的权衡、数据隐私和激励机制等方面面临的新挑战。论文详细介绍了一个由龙头企业贡献多模态数据与专家知识、面向城市安全运营管理的案例研究,包括联邦学习平台的分布式部署与高效协同,以及基于大模型能力提升数据质量的技术创新和高效的联合微调方法。初步实验表明,企业可以通过多模态模型联邦学习增强并积累智能能力,共同打造提供高质量智能服务的智慧城市模型,覆盖能源基础设施安全、住宅社区安防和城市运营管理等场景。所建立的联邦学习合作生态有望进一步汇聚产、学、研资源,在多个垂直领域落地大模型,推动人工智能的大规模产业应用以及多模态联邦学习的前沿研究。

Hamiltonian GAN

  • paper_url: http://arxiv.org/abs/2308.11216
  • repo_url: https://github.com/koritsky/hamiltonian_learning
  • paper_authors: Christine Allen-Blanchette
  • for: 本研究旨在以哈密顿形式作为归纳偏置,实现符合物理规律的视频生成,并从数据中学习位形空间的表示。
  • methods: 本研究提出一个基于 GAN 的视频生成流程,包含可学习的位形空间映射和哈密顿神经网络运动模型,从数据中学习位形空间的表示。
  • results: 本研究采用受物理启发的循环坐标损失函数进行训练,该损失函数鼓励位形空间的最小化表示,并提升了表示的可解释性;在 Hamiltonian Dynamics Suite 的 Toy Physics 数据集上验证了方法的有效性与优势。
    Abstract A growing body of work leverages the Hamiltonian formalism as an inductive bias for physically plausible neural network based video generation. The structure of the Hamiltonian ensures conservation of a learned quantity (e.g., energy) and imposes a phase-space interpretation on the low-dimensional manifold underlying the input video. While this interpretation has the potential to facilitate the integration of learned representations in downstream tasks, existing methods are limited in their applicability as they require a structural prior for the configuration space at design time. In this work, we present a GAN-based video generation pipeline with a learned configuration space map and Hamiltonian neural network motion model, to learn a representation of the configuration space from data. We train our model with a physics-inspired cyclic-coordinate loss function which encourages a minimal representation of the configuration space and improves interpretability. We demonstrate the efficacy and advantages of our approach on the Hamiltonian Dynamics Suite Toy Physics dataset.
    摘要 越来越多的研究利用哈密顿形式作为归纳偏置,用于物理上合理的基于神经网络的视频生成。哈密顿结构保证了所学量(如能量)的守恒,并对输入视频背后的低维流形施加相空间解释。尽管这种解释有望促进所学表示在下游任务中的整合,但现有方法需要在设计阶段给定配置空间的结构先验,因而适用性受限。在这项工作中,我们提出一种基于GAN的视频生成管线,结合可学习的配置空间映射与哈密顿神经网络运动模型,从数据中学习配置空间的表示。我们使用受物理启发的循环坐标损失函数训练模型,鼓励配置空间的最小表示并提升可解释性。我们在Hamiltonian Dynamics Suite的Toy Physics数据集上展示了该方法的有效性与优势。
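
The Hamiltonian neural network motion model is the load-bearing piece of this pipeline. As a rough illustration of the general technique (not the paper's implementation — all names and dimensions below are made up), such a network learns a scalar H(q, p) and derives conservative dynamics from its gradients via Hamilton's equations:

```python
import torch
import torch.nn as nn

class HamiltonianNet(nn.Module):
    """Learns a scalar Hamiltonian H(q, p); dynamics follow Hamilton's equations."""
    def __init__(self, dim):
        super().__init__()
        self.h = nn.Sequential(nn.Linear(2 * dim, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, q, p):
        # Concatenate generalized coordinates and momenta, output scalar energy.
        return self.h(torch.cat([q, p], dim=-1)).sum()

    def time_derivatives(self, q, p):
        q = q.requires_grad_(True)
        p = p.requires_grad_(True)
        H = self.forward(q, p)
        dHdq, dHdp = torch.autograd.grad(H, (q, p), create_graph=True)
        return dHdp, -dHdq  # dq/dt = dH/dp, dp/dt = -dH/dq

# One explicit Euler step in phase space (illustrative only).
model = HamiltonianNet(dim=2)
q, p = torch.randn(1, 2), torch.randn(1, 2)
dq, dp = model.time_derivatives(q, p)
q_next, p_next = q + 0.01 * dq, p + 0.01 * dp
```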

A Simple Framework for Multi-mode Spatial-Temporal Data Modeling

  • paper_url: http://arxiv.org/abs/2308.11204
  • repo_url: https://github.com/lzhmarkk/simmst
  • paper_authors: Zihang Liu, Le Yu, Tongyu Zhu, Leilei Sun
  • for: 本文提出了一种简单的多模式空间时间数据模型方法,用于捕捉多种空间模式之间的关系和时间相关性。
  • methods: 本文提出了一种通用的交叉模式空间关系学习模块,可以适应不同的空间模式之间的连接和信息传递。此外,文章还使用多层感知器来捕捉时间相关性和通道相关性。
  • results: 实验结果表明,该模型在三个真实数据集上取得更好的效果,同时具有更低的空间和时间复杂度;跨模式空间关系学习模块的泛化性也得到了验证。
    Abstract Spatial-temporal data modeling aims to mine the underlying spatial relationships and temporal dependencies of objects in a system. However, most existing methods focus on the modeling of spatial-temporal data in a single mode, lacking the understanding of multiple modes. Though very few methods have been presented to learn the multi-mode relationships recently, they are built on complicated components with higher model complexities. In this paper, we propose a simple framework for multi-mode spatial-temporal data modeling to bring both effectiveness and efficiency together. Specifically, we design a general cross-mode spatial relationships learning component to adaptively establish connections between multiple modes and propagate information along the learned connections. Moreover, we employ multi-layer perceptrons to capture the temporal dependencies and channel correlations, which are conceptually and technically succinct. Experiments on three real-world datasets show that our model can consistently outperform the baselines with lower space and time complexity, opening up a promising direction for modeling spatial-temporal data. The generalizability of the cross-mode spatial relationships learning module is also validated.
    摘要 空间-时间数据建模旨在挖掘系统中对象之间潜在的空间关系和时间依赖。然而,现有方法大多只针对单一模式的空间-时间数据建模,缺乏对多模式关系的理解。虽然近期有极少数方法尝试学习多模式关系,但它们构建在复杂的组件之上,模型复杂度较高。本文提出一个简单的多模式空间-时间数据建模框架,兼顾有效性与效率。具体而言,我们设计了一个通用的跨模式空间关系学习组件,自适应地建立多个模式之间的连接,并沿所学连接传播信息;此外,我们使用多层感知器来捕捉时间依赖和通道相关性,在概念和技术上都十分简洁。在三个真实数据集上的实验表明,我们的模型能够以更低的空间和时间复杂度稳定超越基线模型,为空间-时间数据建模开辟了有前景的方向。跨模式空间关系学习模块的泛化性也得到了验证。
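
The abstract does not spell out the cross-mode spatial relationships learning component. One common way such an adaptive connection structure is realized — sketched here purely as an assumption, with illustrative names — is to derive a cross-mode adjacency from learnable node embeddings and propagate features along it:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModeSpatialLearner(nn.Module):
    """Adaptively learns connections between the nodes of two spatial modes
    from node embeddings and propagates information along them."""
    def __init__(self, n_src, n_dst, emb_dim, feat_dim):
        super().__init__()
        self.src_emb = nn.Parameter(torch.randn(n_src, emb_dim))
        self.dst_emb = nn.Parameter(torch.randn(n_dst, emb_dim))
        self.proj = nn.Linear(feat_dim, feat_dim)

    def forward(self, x_src):
        # x_src: (batch, n_src, feat_dim) features of the source mode.
        # Learned cross-mode adjacency, normalized per destination node.
        adj = F.softmax(F.relu(self.dst_emb @ self.src_emb.T), dim=-1)  # (n_dst, n_src)
        return self.proj(adj @ x_src)  # (batch, n_dst, feat_dim)
```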

SegRNN: Segment Recurrent Neural Network for Long-Term Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2308.11200
  • repo_url: None
  • paper_authors: Shengsheng Lin, Weiwei Lin, Wentai Wu, Feiyu Zhao, Ruichao Mo, Haotong Zhang
  • for: 本文针对长期时间序列预测(LTSF)领域,解决基于RNN的方法在过长的回看窗口和预测范围下所面临的挑战。
  • methods: 本文提出两种减少RNN在LTSF任务中迭代次数的新策略:分段迭代(Segment-wise Iterations)与并行多步预测(Parallel Multi-step Forecasting, PMF)。
  • results: 实验结果显示,SegRNN在LTSF任务中取得显著改进,不仅优于SOTA的基于Transformer的模型,还将运行时间和内存占用降低超过78%。
    Abstract RNN-based methods have faced challenges in the Long-term Time Series Forecasting (LTSF) domain when dealing with excessively long look-back windows and forecast horizons. Consequently, the dominance in this domain has shifted towards Transformer, MLP, and CNN approaches. The substantial number of recurrent iterations are the fundamental reasons behind the limitations of RNNs in LTSF. To address these issues, we propose two novel strategies to reduce the number of iterations in RNNs for LTSF tasks: Segment-wise Iterations and Parallel Multi-step Forecasting (PMF). RNNs that combine these strategies, namely SegRNN, significantly reduce the required recurrent iterations for LTSF, resulting in notable improvements in forecast accuracy and inference speed. Extensive experiments demonstrate that SegRNN not only outperforms SOTA Transformer-based models but also reduces runtime and memory usage by more than 78%. These achievements provide strong evidence that RNNs continue to excel in LTSF tasks and encourage further exploration of this domain with more RNN-based approaches. The source code is coming soon.
    摘要 基于RNN的方法在长期时间序列预测(LTSF)领域中,面对过长的回看窗口和预测范围时一直面临挑战,因此该领域的主导地位已转向Transformer、MLP和CNN方法。大量的循环迭代是RNN在LTSF中受限的根本原因。为此,我们提出两种减少RNN迭代次数的新策略:分段迭代与并行多步预测(PMF)。结合这两种策略的SegRNN显著减少了LTSF所需的循环迭代次数,在预测精度和推理速度上均有明显提升。大量实验表明,SegRNN不仅优于SOTA的基于Transformer的模型,还将运行时间和内存占用降低了78%以上。这些成果有力地证明RNN在LTSF任务中依然出色,并鼓励以更多基于RNN的方法进一步探索该领域。源代码即将发布。
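
A rough PyTorch sketch of the two strategies as they might look (segment length, dimensions, and layer choices are illustrative, not the paper's): the look-back window is split into segments embedded one vector each, so the RNN iterates once per segment rather than once per step; all horizon segments are then decoded in parallel from the final hidden state with positional embeddings.

```python
import torch
import torch.nn as nn

class SegRNNSketch(nn.Module):
    def __init__(self, seg_len, d_model, horizon_segs):
        super().__init__()
        self.seg_len = seg_len
        self.seg_embed = nn.Linear(seg_len, d_model)     # one vector per segment
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.pos = nn.Parameter(torch.randn(horizon_segs, d_model))
        self.head = nn.Linear(d_model, seg_len)          # one segment per query

    def forward(self, x):                                # x: (batch, lookback)
        b = x.size(0)                                    # lookback % seg_len == 0
        segs = x.view(b, -1, self.seg_len)               # (batch, n_segs, seg_len)
        _, h = self.rnn(self.seg_embed(segs))            # iterations = n_segs, not lookback
        # Parallel multi-step forecasting: decode every horizon segment at once.
        queries = self.pos.unsqueeze(0).expand(b, -1, -1) + h.transpose(0, 1)
        out = self.head(queries)                         # (batch, horizon_segs, seg_len)
        return out.reshape(b, -1)                        # (batch, horizon)
```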

ConcatPlexer: Additional Dim1 Batching for Faster ViTs

  • paper_url: http://arxiv.org/abs/2308.11199
  • repo_url: None
  • paper_authors: Donghoon Han, Seunghyeon Seo, Donghyeon Jeon, Jiho Jang, Chaerin Kong, Nojun Kwak
  • for: 提高视觉识别的吞吐量与效率,同时尽量保持精度
  • methods: 使用额外的第1维批处理(即拼接),并设计新组件以弥补其弱点
  • results: 在ImageNet1K和CIFAR100数据集上训练的ConcatPlexer,相比ViT-B/16减少23.5%的GFLOPs,同时分别达到69.5%和83.4%的验证精度。
    Abstract Transformers have demonstrated tremendous success not only in the natural language processing (NLP) domain but also the field of computer vision, igniting various creative approaches and applications. Yet, the superior performance and modeling flexibility of transformers came with a severe increase in computation costs, and hence several works have proposed methods to reduce this burden. Inspired by a cost-cutting method originally proposed for language models, Data Multiplexing (DataMUX), we propose a novel approach for efficient visual recognition that employs additional dim1 batching (i.e., concatenation) that greatly improves the throughput with little compromise in the accuracy. We first introduce a naive adaptation of DataMux for vision models, Image Multiplexer, and devise novel components to overcome its weaknesses, rendering our final model, ConcatPlexer, at the sweet spot between inference speed and accuracy. The ConcatPlexer was trained on ImageNet1K and CIFAR100 dataset and it achieved 23.5% less GFLOPs than ViT-B/16 with 69.5% and 83.4% validation accuracy, respectively.
    摘要 Transformer不仅在自然语言处理(NLP)领域,也在计算机视觉领域取得了巨大成功,激发了许多创新的方法和应用。然而,Transformer优越的性能和建模灵活性也带来了计算成本的急剧增加,因此已有多项工作提出减轻这一负担的方法。受最初为语言模型提出的降本方法Data Multiplexing(DataMUX)的启发,我们提出一种高效视觉识别的新方法,通过额外的第1维批处理(即拼接)大幅提升吞吐量,而精度损失很小。我们首先给出DataMUX在视觉模型上的朴素适配Image Multiplexer,随后设计新的组件以克服其弱点,使最终模型ConcatPlexer在推理速度和准确率之间取得平衡。ConcatPlexer在ImageNet1K和CIFAR100数据集上训练,相比ViT-B/16减少了23.5%的GFLOPs,同时分别达到69.5%和83.4%的验证精度。
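
The core idea — concatenating the token sequences of several images along dimension 1 so that one transformer forward pass serves multiple inputs — can be sketched as follows. The patch embedding and encoder stand in for a ViT backbone, and the paper's additional components for disentangling per-image predictions are omitted; this is an assumption-laden illustration, not the authors' code.

```python
import torch
import torch.nn as nn

patch_embed = nn.Conv2d(3, 192, kernel_size=16, stride=16)   # ViT-style patchifier
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=192, nhead=4, batch_first=True), num_layers=4)
head = nn.Linear(192, 1000)

def concatplexed_forward(img_a, img_b):
    # Tokenize each image: (batch, 3, 224, 224) -> (batch, 196, 192).
    tok_a = patch_embed(img_a).flatten(2).transpose(1, 2)
    tok_b = patch_embed(img_b).flatten(2).transpose(1, 2)
    # Dim-1 batching: one forward pass encodes both images' tokens together.
    z = encoder(torch.cat([tok_a, tok_b], dim=1))
    # Pool each image's token span separately to recover per-image logits.
    n = tok_a.size(1)
    return head(z[:, :n].mean(dim=1)), head(z[:, n:].mean(dim=1))
```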

Toward Generalizable Machine Learning Models in Speech, Language, and Hearing Sciences: Power Analysis and Sample Size Estimation

  • paper_url: http://arxiv.org/abs/2308.11197
  • repo_url: None
  • paper_authors: Hamzeh Ghasemzadeh, Robert E. Hillman, Daryush D. Mehta
  • For: The paper aims to provide quantitative evidence that incentivizes researchers to use the more robust method of nested cross-validation in machine-learning-based analysis, and to present methods and MATLAB codes for power analysis during the design of a study.
  • Methods: The paper uses Monte Carlo simulations to compare four different cross-validation methods (single holdout, 10-fold, train-validation-test, and nested 10-fold) in terms of statistical power and statistical confidence.
  • Results: The nested 10-fold cross-validation method yields the highest statistical confidence and the highest statistical power while providing an unbiased estimate of the accuracy. The required sample size with a single holdout is found to be 50% higher than what would be needed with nested cross-validation, and the confidence in the model based on nested cross-validation is as much as four times higher than the confidence in the single-holdout-based model.
    Abstract This study's first purpose is to provide quantitative evidence that would incentivize researchers to instead use the more robust method of nested cross-validation. The second purpose is to present methods and MATLAB codes for doing power analysis for ML-based analysis during the design of a study. Monte Carlo simulations were used to quantify the interactions between the employed cross-validation method, the discriminative power of features, the dimensionality of the feature space, and the dimensionality of the model. Four different cross-validations (single holdout, 10-fold, train-validation-test, and nested 10-fold) were compared based on the statistical power and statistical confidence of the ML models. Distributions of the null and alternative hypotheses were used to determine the minimum required sample size for obtaining a statistically significant outcome (α = 0.05, 1 − β = 0.8). Statistical confidence of the model was defined as the probability of correct features being selected and hence being included in the final model. Our analysis showed that the model generated based on the single holdout method had very low statistical power and statistical confidence and that it significantly overestimated the accuracy. Conversely, the nested 10-fold cross-validation resulted in the highest statistical confidence and the highest statistical power, while providing an unbiased estimate of the accuracy. The required sample size with a single holdout could be 50% higher than what would be needed if nested cross-validation were used. Confidence in the model based on nested cross-validation was as much as four times higher than the confidence in the single holdout-based model. A computational model, MATLAB codes, and lookup tables are provided to assist researchers with estimating the sample size during the design of their future studies.
    摘要 本研究的首要目的是提供定量证据,以促使研究者改用更稳健的嵌套交叉验证方法;第二个目的是给出在研究设计阶段对基于机器学习的分析进行统计功效分析的方法和MATLAB代码。我们使用蒙特卡洛模拟量化了交叉验证方法、特征判别力、特征空间维度与模型维度之间的交互作用,并基于统计功效和统计置信度比较了四种交叉验证方法(单次留出、10折、训练-验证-测试、嵌套10折)。利用零假设与备择假设的分布确定了获得统计显著结果所需的最小样本量(α = 0.05,1 − β = 0.8)。模型的统计置信度定义为正确特征被选入最终模型的概率。分析表明,基于单次留出法的模型统计功效和置信度都很低,且显著高估了准确率;相反,嵌套10折交叉验证获得了最高的统计置信度和统计功效,并给出了无偏的准确率估计。单次留出所需的样本量可能比嵌套交叉验证高出50%,而基于嵌套交叉验证的模型置信度可达单次留出模型的四倍。文中提供了计算模型、MATLAB代码和查找表,以帮助研究者在设计未来研究时估计样本量。
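
The paper's own code is in MATLAB. As a hedged Python counterpart, the nested 10-fold scheme it recommends — hyperparameters tuned only on inner folds so the outer accuracy estimate stays unbiased — can be set up with scikit-learn (the toy data and SVM grid below are placeholders):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

inner = KFold(n_splits=10, shuffle=True, random_state=0)   # hyperparameter selection
outer = KFold(n_splits=10, shuffle=True, random_state=1)   # unbiased accuracy estimate

tuned = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=inner)
scores = cross_val_score(tuned, X, y, cv=outer)
print(f"nested-CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```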

Automatic Task Parallelization of Dataflow Graphs in ML/DL models

  • paper_url: http://arxiv.org/abs/2308.11192
  • repo_url: None
  • paper_authors: Srinjoy Das, Lawrence Rauchwerger
  • for: 提高机器学习(ML)和深度学习(DL)模型的训练和推理性能
  • methods: 提出基于关键路径的线性聚类方法,以发掘ML数据流图中固有的并行路径,并通过克隆优化图结构、借助常量传播与死代码消除进行剪枝
  • results: 在多个ML计算图上取得最高1.9倍于串行执行的加速,在编译和运行时间上均优于部分现有机制;同时方法轻量、快速,适用于功耗和资源受限的边缘设备
    Abstract Several methods exist today to accelerate Machine Learning(ML) or Deep-Learning(DL) model performance for training and inference. However, modern techniques that rely on various graph and operator parallelism methodologies rely on search space optimizations which are costly in terms of power and hardware usage. Especially in the case of inference, when the batch size is 1 and execution is on CPUs or for power-constrained edge devices, current techniques can become costly, complicated or inapplicable. To ameliorate this, we present a Critical-Path-based Linear Clustering approach to exploit inherent parallel paths in ML dataflow graphs. Our task parallelization approach further optimizes the structure of graphs via cloning and prunes them via constant propagation and dead-code elimination. Contrary to other work, we generate readable and executable parallel Pytorch+Python code from input ML models in ONNX format via a new tool that we have built called {\bf Ramiel}. This allows us to benefit from other downstream acceleration techniques like intra-op parallelism and potentially pipeline parallelism. Our preliminary results on several ML graphs demonstrate up to 1.9$\times$ speedup over serial execution and outperform some of the current mechanisms in both compile and runtimes. Lastly, our methods are lightweight and fast enough so that they can be used effectively for power and resource-constrained devices, while still enabling downstream optimizations.
    摘要 目前已有多种方法可以加速机器学习(ML)或深度学习(DL)模型的训练和推理性能。然而,依赖各种图并行与算子并行方法的现代技术通常需要搜索空间优化,在功耗和硬件使用上代价很高。尤其是在推理阶段,当批大小为1、在CPU或功耗受限的边缘设备上执行时,现有技术可能变得昂贵、复杂甚至不可用。为改善这一状况,我们提出一种基于关键路径的线性聚类方法,以发掘ML数据流图中固有的并行路径。我们的任务并行化方法还通过克隆进一步优化图结构,并借助常量传播和死代码消除对图进行剪枝。与其他工作不同,我们通过自研的新工具Ramiel,将ONNX格式的输入ML模型生成为可读、可执行的并行Pytorch+Python代码,从而还能受益于算子内并行乃至流水线并行等下游加速技术。在多个ML计算图上的初步结果表明,相比串行执行最高可获得1.9倍加速,并在编译和运行时间上优于部分现有机制。最后,我们的方法足够轻量和快速,可有效用于功耗和资源受限的设备,同时仍支持下游优化。

Diversity Measures: Domain-Independent Proxies for Failure in Language Model Queries

  • paper_url: http://arxiv.org/abs/2308.11189
  • repo_url: https://github.com/lab-v2/diversity_measures
  • paper_authors: Noel Ngu, Nathaniel Lee, Paulo Shakarian
  • for: 这篇论文旨在提供与具体应用领域无关、基于响应多样性的大型语言模型错误预测方法。
  • methods: 论文使用三种度量来量化失败概率:基于熵、基于基尼不纯度和基于质心距离。
  • results: 实验表明,这些度量与失败概率强相关,并可应用于少样本提示、思维链推理和错误检测等任务。
    Abstract Error prediction in large language models often relies on domain-specific information. In this paper, we present measures for quantification of error in the response of a large language model based on the diversity of responses to a given prompt - hence independent of the underlying application. We describe how three such measures - based on entropy, Gini impurity, and centroid distance - can be employed. We perform a suite of experiments on multiple datasets and temperature settings to demonstrate that these measures strongly correlate with the probability of failure. Additionally, we present empirical results demonstrating how these measures can be applied to few-shot prompting, chain-of-thought reasoning, and error detection.
    摘要 大型语言模型的错误预测通常依赖领域特定信息。本文提出基于对给定提示的响应多样性来量化大型语言模型回复错误的度量,因而与底层应用无关。我们介绍了三种此类度量——基于熵、基于基尼不纯度和基于质心距离——及其使用方法。我们在多个数据集和温度设置下进行了一系列实验,证明这些度量与失败概率强相关。此外,我们还给出了实证结果,展示这些度量如何应用于少样本提示、思维链推理和错误检测。
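
Concretely, all three measures operate on a set of responses sampled for the same prompt. A minimal sketch using the standard definitions (the embedding input is a placeholder for any sentence encoder's output; this follows the textbook formulas, not necessarily the paper's exact code):

```python
import numpy as np
from collections import Counter

def entropy(responses):
    counts = np.array(list(Counter(responses).values()), dtype=float)
    p = counts / counts.sum()
    return -(p * np.log(p)).sum()

def gini_impurity(responses):
    counts = np.array(list(Counter(responses).values()), dtype=float)
    p = counts / counts.sum()
    return 1.0 - (p ** 2).sum()

def centroid_distance(embeddings):
    # Mean distance of each response embedding to the centroid of all responses.
    e = np.asarray(embeddings)
    centroid = e.mean(axis=0)
    return np.linalg.norm(e - centroid, axis=1).mean()

# Higher diversity across sampled answers serves as a proxy for failure risk.
answers = ["42", "42", "41", "forty-two", "42"]
print(entropy(answers), gini_impurity(answers))
```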

A three in one bottom-up framework for simultaneous semantic segmentation, instance segmentation and classification of multi-organ nuclei in digital cancer histology

  • paper_url: http://arxiv.org/abs/2308.11179
  • repo_url: None
  • paper_authors: Ibtihaj Ahmad, Syed Muhammad Israr, Zain Ul Islam
  • For: The paper is focused on developing a deep learning framework for simultaneous instance segmentation and classification of nuclei in digital histology images.* Methods: The proposed method uses a multi-stage approach with additional decoder heads and independent weighted losses to produce semantic segmentation, edge proposals, and classification maps. The method also utilizes post-processing techniques to improve the final segmentation and classification results.* Results: The proposed method achieves high performance on several benchmark datasets, with a Dice score of 0.841 for semantic segmentation, bPQ scores of 0.713 for instance segmentation, and mPQ scores of 0.633 for nuclei classification. The method is also generalized across 19 types of tissues and is less complex compared to state-of-the-art methods.Here are the three key points in Simplified Chinese text:
  • for: 本文旨在开发一个深度学习框架,对数字组织学图像中的细胞核同时进行实例分割与分类。
  • methods: 所提方法采用多阶段策略,通过附加的解码头与独立加权损失,生成语义分割、边缘提议和分类图,并利用后处理技术改进最终的分割和分类结果。
  • results: 所提方法在多个基准数据集上表现优异,语义分割Dice分数达0.841,实例分割bPQ分数达0.713,细胞核分类mPQ分数达0.633;方法可泛化到19种组织,且比最先进方法更简单。
    Abstract Simultaneous segmentation and classification of nuclei in digital histology play an essential role in computer-assisted cancer diagnosis; however, it remains challenging. The highest achieved binary and multi-class Panoptic Quality (PQ) remains as low as 0.68 bPQ and 0.49 mPQ, respectively. It is due to the higher staining variability, variability across the tissue, rough clinical conditions, overlapping nuclei, and nuclear class imbalance. The generic deep-learning methods usually rely on end-to-end models, which fail to address these problems associated explicitly with digital histology. In our previous work, DAN-NucNet, we resolved these issues for semantic segmentation with an end-to-end model. This work extends our previous model to simultaneous instance segmentation and classification. We introduce additional decoder heads with independent weighted losses, which produce semantic segmentation, edge proposals, and classification maps. We use the outputs from the three-head model to apply post-processing to produce the final segmentation and classification. Our multi-stage approach utilizes edge proposals and semantic segmentations compared to direct segmentation and classification strategies followed by most state-of-the-art methods. Due to this, we demonstrate a significant performance improvement in producing high-quality instance segmentation and nuclei classification. We have achieved a 0.841 Dice score for semantic segmentation, 0.713 bPQ scores for instance segmentation, and 0.633 mPQ for nuclei classification. Our proposed framework is generalized across 19 types of tissues. Furthermore, the framework is less complex compared to the state-of-the-art.
    摘要 在计算机辅助癌症诊断中,数字组织学中细胞核的同时分割与分类起着关键作用,但仍充满挑战:目前最高的二元与多类Panoptic Quality(PQ)分别仅为0.68 bPQ和0.49 mPQ。这归因于较大的染色变异、组织间差异、苛刻的临床条件、细胞核重叠以及核类别不平衡。通用的深度学习方法通常依赖端到端模型,无法显式解决数字组织学特有的这些问题。在我们先前的工作DAN-NucNet中,我们用端到端模型解决了语义分割中的这些问题;本工作将先前的模型扩展到同时实例分割与分类。我们引入带独立加权损失的附加解码头,分别生成语义分割、边缘提议和分类图,并利用三个头的输出进行后处理,得到最终的分割与分类结果。与多数最先进方法采用的直接分割与分类策略相比,我们的多阶段方法利用了边缘提议和语义分割,因此在高质量实例分割与细胞核分类上取得显著的性能提升:语义分割Dice分数达0.841,实例分割bPQ分数达0.713,细胞核分类mPQ分数达0.633。我们提出的框架可泛化到19种组织,且比最先进方法更简单。

A Preliminary Investigation into Search and Matching for Tumour Discrimination in WHO Breast Taxonomy Using Deep Networks

  • paper_url: http://arxiv.org/abs/2308.11162
  • repo_url: None
  • paper_authors: Abubakr Shafique, Ricardo Gonzalez, Liron Pantanowitz, Puay Hoon Tan, Alberto Machado, Ian A Cree, Hamid R. Tizhoosh
  • for: 这项研究旨在构建一个基于深度学习的可检索数字图谱,帮助病理学家更好地诊断乳腺肿瘤。
  • methods: 研究使用最先进的深度学习模型(在TCGA库的数百万张诊断组织病理图像上预训练),对WHO乳腺肿瘤分类(第5版)涵盖的35种肿瘤类型提取深度特征,并进行可视化与索引。
  • results: 在WHO乳腺分类数据中的图块相似性检索中,"多数投票"验证准确率超过88%,top-n肿瘤类型验证准确率超过91%,表明可检索的数字档案有助于研究常见与罕见乳腺病变之间的复杂关系。
    Abstract Breast cancer is one of the most common cancers affecting women worldwide. They include a group of malignant neoplasms with a variety of biological, clinical, and histopathological characteristics. There are more than 35 different histological forms of breast lesions that can be classified and diagnosed histologically according to cell morphology, growth, and architecture patterns. Recently, deep learning, in the field of artificial intelligence, has drawn a lot of attention for the computerized representation of medical images. Searchable digital atlases can provide pathologists with patch matching tools allowing them to search among evidently diagnosed and treated archival cases, a technology that may be regarded as computational second opinion. In this study, we indexed and analyzed the WHO breast taxonomy (Classification of Tumours 5th Ed.) spanning 35 tumour types. We visualized all tumour types using deep features extracted from a state-of-the-art deep learning model, pre-trained on millions of diagnostic histopathology images from the TCGA repository. Furthermore, we test the concept of a digital "atlas" as a reference for search and matching with rare test cases. The patch similarity search within the WHO breast taxonomy data reached over 88% accuracy when validating through "majority vote" and more than 91% accuracy when validating using top-n tumour types. These results show for the first time that complex relationships among common and rare breast lesions can be investigated using an indexed digital archive.
    摘要 乳腺癌是全球女性最常见的癌症之一,包含一组具有多样生物学、临床和组织病理学特征的恶性肿瘤。乳腺病变有超过35种不同的组织学形态,可依据细胞形态、生长和结构模式进行组织学分类与诊断。近年来,人工智能领域的深度学习在医学图像的计算机化表示方面备受关注。可检索的数字图谱可以为病理学家提供图块匹配工具,使其能够在已明确诊断和治疗的存档病例中进行检索,这项技术可视为"计算式第二意见"。在本研究中,我们对WHO乳腺肿瘤分类(第5版)涵盖的35种肿瘤类型进行了索引和分析。我们使用在TCGA库数百万张诊断组织病理图像上预训练的最先进深度学习模型提取深度特征,对所有肿瘤类型进行可视化。此外,我们测试了将数字"图谱"作为罕见测试病例检索与匹配参照的构想。在WHO乳腺分类数据中的图块相似性检索中,"多数投票"验证准确率超过88%,top-n肿瘤类型验证准确率超过91%。这些结果首次表明,可以利用带索引的数字档案研究常见与罕见乳腺病变之间的复杂关系。

xxMD: Benchmarking Neural Force Fields Using Extended Dynamics beyond Equilibrium

  • paper_url: http://arxiv.org/abs/2308.11155
  • repo_url: https://github.com/zpengmei/xxmd
  • paper_authors: Zihan Pengmei, Junyu Liu, Yinan Shu
  • for: 这篇论文主要关注计算化学中的神经力场模型(NFF),它们在从头算分子动力学中取代量子化学计算。
  • methods: 作者利用非绝热动力学生成新的xxMD数据集,其中能量和力分别由多参考波函数理论和密度泛函理论确定。
  • results: 作者发现MD17数据集的内坐标与能量分布受限,不足以表示发生化学反应的体系;xxMD数据集更真实地刻画化学反应,可用于评估NFF模型的泛化能力。
    Abstract Neural force fields (NFFs) have gained prominence in computational chemistry as surrogate models, superseding quantum-chemistry calculations in ab initio molecular dynamics. The prevalent benchmark for NFFs has been the MD17 dataset and its subsequent extension. These datasets predominantly comprise geometries from the equilibrium region of the ground electronic state potential energy surface, sampling from direct adiabatic dynamics. However, many chemical reactions entail significant molecular deformations, notably bond breaking. We demonstrate the constrained distribution of internal coordinates and energies in the MD17 datasets, underscoring their inadequacy for representing systems undergoing chemical reactions. Addressing this sampling limitation, we introduce the xxMD (Extended Excited-state Molecular Dynamics) dataset, derived from non-adiabatic dynamics. This dataset encompasses energies and forces ascertained from both multireference wave function theory and density functional theory. Furthermore, its nuclear configuration spaces authentically depict chemical reactions, making xxMD a more chemically relevant dataset. Our re-assessment of equivariant models on the xxMD datasets reveals notably higher mean absolute errors than those reported for MD17 and its variants. This observation underscores the challenges faced in crafting a generalizable NFF model with extrapolation capability. Our proposed xxMD-CASSCF and xxMD-DFT datasets are available at \url{https://github.com/zpengmei/xxMD}.
    摘要 神经力场(NFF)作为替代模型在计算化学中日益重要,在从头算分子动力学中取代量子化学计算。NFF的主流基准一直是MD17数据集及其扩展。这些数据集主要由基态电子态势能面平衡区域的几何构型组成,采样自直接绝热动力学。然而,许多化学反应涉及显著的分子形变,尤其是键断裂。我们展示了MD17数据集中内坐标与能量分布的受限性,说明其不足以表示发生化学反应的体系。针对这一采样局限,我们提出了源自非绝热动力学的xxMD(扩展激发态分子动力学)数据集。该数据集包含由多参考波函数理论和密度泛函理论确定的能量与力,且其核构型空间真实地刻画了化学反应,使xxMD成为化学上更相关的数据集。我们在xxMD数据集上重新评估等变模型,发现其平均绝对误差显著高于MD17及其变体上的报告值。这一观察凸显了构建具有外推能力的可泛化NFF模型所面临的挑战。我们提出的xxMD-CASSCF和xxMD-DFT数据集可在 https://github.com/zpengmei/xxMD 获取。

Mobility-Aware Computation Offloading for Swarm Robotics using Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.11154
  • repo_url: None
  • paper_authors: Xiucheng Wang, Hongzhi Guo
  • for: automate dirty, dangerous, and dull tasks with swarm robotics
  • methods: leverage mobile edge computing and a mobility-aware deep reinforcement learning model at the edge server
  • results: meet delay requirements and guarantee computation precision with minimum robot energy
    Abstract Swarm robotics is envisioned to automate a large number of dirty, dangerous, and dull tasks. Robots have limited energy, computation capability, and communication resources. Therefore, current swarm robotics have a small number of robots, which can only provide limited spatio-temporal information. In this paper, we propose to leverage the mobile edge computing to alleviate the computation burden. We develop an effective solution based on a mobility-aware deep reinforcement learning model at the edge server side for computing scheduling and resource. Our results show that the proposed approach can meet delay requirements and guarantee computation precision by using minimum robot energy.
    摘要 群体机器人有望自动化大量肮脏、危险和枯燥的任务。机器人的能量、计算能力和通信资源都很有限,因此当前的群体机器人系统规模较小,只能提供有限的时空信息。本文提出利用移动边缘计算来减轻计算负担,并在边缘服务器端基于移动感知的深度强化学习模型,开发了一种有效的计算调度与资源分配方案。结果表明,所提方法能够以最小的机器人能耗满足时延要求并保证计算精度。

Energy-Efficient On-Board Radio Resource Management for Satellite Communications via Neuromorphic Computing

  • paper_url: http://arxiv.org/abs/2308.11152
  • repo_url: None
  • paper_authors: Flor Ortiz, Nicolas Skatchkovsky, Eva Lagunas, Wallace A. Martins, Geoffrey Eappen, Saed Daoud, Osvaldo Simeone, Bipin Rajendran, Symeon Chatzinotas
  • for: 本论文旨在探讨利用机器学习(ML)技术实现卫星通信(SatCom)系统的星上无线资源管理,以提升系统效率与可持续性。
  • methods: 本论文采用受脑启发的节能机器学习模型;除软件仿真外,还基于最新发布的Intel Loihi 2芯片开展了大量实验,并以部署在Xilinx Versal VCK5000上的传统卷积神经网络(CNN)作为基准进行比较。
  • results: 结果显示,对于相关的工作负载,部署在Loihi 2上的脉冲神经网络(SNN)能达到更高的准确率,同时相比基于CNN的参考平台将功耗降低100倍以上。这表明神经形态计算与SNN在未来SatCom系统中具有支撑星上操作、提升效率与可持续性的巨大潜力。
    Abstract The latest satellite communication (SatCom) missions are characterized by a fully reconfigurable on-board software-defined payload, capable of adapting radio resources to the temporal and spatial variations of the system traffic. As pure optimization-based solutions have shown to be computationally tedious and to lack flexibility, machine learning (ML)-based methods have emerged as promising alternatives. We investigate the application of energy-efficient brain-inspired ML models for on-board radio resource management. Apart from software simulation, we report extensive experimental results leveraging the recently released Intel Loihi 2 chip. To benchmark the performance of the proposed model, we implement conventional convolutional neural networks (CNN) on a Xilinx Versal VCK5000, and provide a detailed comparison of accuracy, precision, recall, and energy efficiency for different traffic demands. Most notably, for relevant workloads, spiking neural networks (SNNs) implemented on Loihi 2 yield higher accuracy, while reducing power consumption by more than 100$\times$ as compared to the CNN-based reference platform. Our findings point to the significant potential of neuromorphic computing and SNNs in supporting on-board SatCom operations, paving the way for enhanced efficiency and sustainability in future SatCom systems.
    摘要 最新的卫星通信(SatCom)任务的特点是完全可重构的星上软件定义载荷,能够根据系统流量的时空变化调整无线资源。由于纯优化方案计算繁琐且缺乏灵活性,基于机器学习(ML)的方法成为有前景的替代方案。我们研究了节能的受脑启发ML模型在星上无线资源管理中的应用。除软件仿真外,我们还报告了基于最新发布的Intel Loihi 2芯片的大量实验结果。为评估所提模型的性能,我们在Xilinx Versal VCK5000上实现了传统卷积神经网络(CNN),并在不同流量需求下详细比较了准确率、精确率、召回率和能效。最值得注意的是,对于相关工作负载,部署在Loihi 2上的脉冲神经网络(SNN)获得了更高的准确率,同时相比基于CNN的参考平台将功耗降低100倍以上。我们的发现表明神经形态计算与SNN在支撑星上SatCom操作方面潜力巨大,为未来SatCom系统实现更高的效率与可持续性铺平了道路。

LLaMA-Reviewer: Advancing Code Review Automation with Large Language Models through Parameter-Efficient Fine-Tuning (Practical Experience Report)

  • paper_url: http://arxiv.org/abs/2308.11148
  • repo_url: None
  • paper_authors: Junyi Lu, Lei Yu, Xiaojia Li, Li Yang, Chun Zuo
  • for: automation of code review activities
  • methods: 使用大型语言模型(LLM)与参数高效微调(PEFT)方法
  • results: 以不足1%的可训练参数达到与现有代码审查专用模型相当的性能
    Abstract The automation of code review activities, a long-standing pursuit in software engineering, has been primarily addressed by numerous domain-specific pre-trained models. Despite their success, these models frequently demand extensive resources for pre-training from scratch. In contrast, Large Language Models (LLMs) provide an intriguing alternative, given their remarkable capabilities when supplemented with domain-specific knowledge. However, their potential for automating code review tasks remains largely unexplored. In response to this research gap, we present LLaMA-Reviewer, an innovative framework that leverages the capabilities of LLaMA, a popular LLM, in the realm of code review. Mindful of resource constraints, this framework employs parameter-efficient fine-tuning (PEFT) methods, delivering high performance while using less than 1% of trainable parameters. An extensive evaluation of LLaMA-Reviewer is conducted on two diverse, publicly available datasets. Notably, even with the smallest LLaMA base model consisting of 6.7B parameters and a limited number of tuning epochs, LLaMA-Reviewer equals the performance of existing code-review-focused models. The ablation experiments provide insights into the influence of various fine-tuning process components, including input representation, instruction tuning, and different PEFT methods. To foster continuous progress in this field, the code and all PEFT-weight plugins have been made open-source.
    摘要 代码审查活动的自动化是软件工程领域长期追求的目标,此前主要由众多领域专用的预训练模型来解决。尽管这些模型取得了成功,但它们往往需要大量资源从头开始预训练。相比之下,大型语言模型(LLM)在补充领域知识后展现出卓越能力,提供了一个有吸引力的替代方案;然而,其在自动化代码审查任务上的潜力仍基本未被探索。为填补这一研究空白,我们提出LLaMA-Reviewer,一个在代码审查领域发挥流行LLM LLaMA能力的创新框架。考虑到资源约束,该框架采用参数高效微调(PEFT)方法,以不足1%的可训练参数实现高性能。我们在两个公开的多样化数据集上对LLaMA-Reviewer进行了广泛评估。值得注意的是,即使使用仅含6.7B参数的最小LLaMA基础模型和有限的微调轮数,LLaMA-Reviewer也能达到与现有代码审查专用模型相当的性能。消融实验揭示了输入表示、指令微调和不同PEFT方法等微调环节对性能的影响。为推动该领域的持续发展,代码和所有PEFT权重插件均已开源。
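
The abstract does not name the specific PEFT method used. A typical setup in the spirit described — low-rank adapters (LoRA) on a LLaMA-style causal LM via the Hugging Face peft library — looks like this; the model ID and hyperparameters are illustrative assumptions, not the paper's configuration:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Base model weights stay frozen; only low-rank adapter matrices are trained.
base = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")
config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections, a common choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```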

Exploring Unsupervised Cell Recognition with Prior Self-activation Maps

  • paper_url: http://arxiv.org/abs/2308.11144
  • repo_url: https://github.com/cpystan/psm
  • paper_authors: Pingyi Chen, Chenglu Zhu, Zhongyi Shui, Jiatong Cai, Sunyi Zheng, Shichuan Zhang, Lin Yang
  • for: This paper aims to reduce the dependency on manual annotations for cell recognition tasks in the medical field.
  • methods: The proposed method uses prior self-activation maps (PSMs) to generate pseudo masks as training targets. An activation network is trained with self-supervised learning, and the gradient information in the shallow layers of the network is aggregated to generate PSMs. A semantic clustering module is then introduced to transform PSMs into pixel-level semantic pseudo masks for downstream tasks.
  • results: The proposed method achieves competitive performance on two histological datasets (MoNuSeg and BCData) without any manual annotations. It also demonstrates the ability to perform multi-class cell detection, which is not possible with existing unsupervised methods. The results show the potential of PSMs to address the hunger for labels in the medical field.
    Abstract The success of supervised deep learning models on cell recognition tasks relies on detailed annotations. Many previous works have managed to reduce the dependency on labels. However, considering the large number of cells contained in a patch, costly and inefficient labeling is still inevitable. To this end, we explored label-free methods for cell recognition. Prior self-activation maps (PSM) are proposed to generate pseudo masks as training targets. To be specific, an activation network is trained with self-supervised learning. The gradient information in the shallow layers of the network is aggregated to generate prior self-activation maps. Afterward, a semantic clustering module is then introduced as a pipeline to transform PSMs to pixel-level semantic pseudo masks for downstream tasks. We evaluated our method on two histological datasets: MoNuSeg (cell segmentation) and BCData (multi-class cell detection). Compared with other fully-supervised and weakly-supervised methods, our method can achieve competitive performance without any manual annotations. Our simple but effective framework can also achieve multi-class cell detection which can not be done by existing unsupervised methods. The results show the potential of PSMs that might inspire other research to deal with the hunger for labels in medical area.
    摘要 监督深度学习模型在细胞识别任务上的成功依赖于详细的标注。许多先前工作已设法降低对标签的依赖;然而,考虑到一个图块中包含的大量细胞,代价高昂且低效的标注仍不可避免。为此,我们探索了无标签的细胞识别方法,提出先验自激活图(PSM)以生成伪掩码作为训练目标。具体而言,我们以自监督学习训练一个激活网络,汇聚网络浅层的梯度信息生成先验自激活图;随后引入语义聚类模块作为管线,将PSM转化为像素级语义伪掩码用于下游任务。我们在两个组织学数据集MoNuSeg(细胞分割)和BCData(多类细胞检测)上评估了该方法。与其他全监督和弱监督方法相比,我们的方法无需任何人工标注即可取得有竞争力的性能;这一简单而有效的框架还能完成现有无监督方法无法实现的多类细胞检测。结果展示了PSM的潜力,或能启发其他研究应对医学领域的标签匮乏问题。

Graph Encoding and Neural Network Approaches for Volleyball Analytics: From Game Outcome to Individual Play Predictions

  • paper_url: http://arxiv.org/abs/2308.11142
  • repo_url: None
  • paper_authors: Rhys Tracy, Haotian Xia, Alex Rasla, Yuan-Fang Wang, Ambuj Singh
  • for: 提高复杂排球预测的准确性,为教练和运动员提供更有意义的洞见。
  • methods: 引入专门的图编码技术,在无需额外采集数据的前提下,为现有排球数据集补充逐次触球的排球上下文。
  • results: 使用图神经网络(GNN)在增强后的数据上完成三项排球预测任务:回合结果预测、二传落点预测和击球类型预测;与基线模型对比并分析结果,以更好地理解排球回合中的内在关系。结果表明,图编码带来更深入的数据分析,整体上显著提升预测效果;基线任务通过简单调整(如移除被拦网的击球)也能显著改进;选择合适的模型结构则有助于针对特定任务提取关键信息。总之,本研究展示了在体育数据分析中使用图编码的潜在优势与不足,希望能启发未来基于图编码的机器学习策略的改进。
    Abstract This research aims to improve the accuracy of complex volleyball predictions and provide more meaningful insights to coaches and players. We introduce a specialized graph encoding technique to add additional contact-by-contact volleyball context to an already available volleyball dataset without any additional data gathering. We demonstrate the potential benefits of using graph neural networks (GNNs) on this enriched dataset for three different volleyball prediction tasks: rally outcome prediction, set location prediction, and hit type prediction. We compare the performance of our graph-based models to baseline models and analyze the results to better understand the underlying relationships in a volleyball rally. Our results show that the use of GNNs with our graph encoding yields a much more advanced analysis of the data, which noticeably improves prediction results overall. We also show that these baseline tasks can be significantly improved with simple adjustments, such as removing blocked hits. Lastly, we demonstrate the importance of choosing a model architecture that will better extract the important information for a certain task. Overall, our study showcases the potential strengths and weaknesses of using graph encodings in sports data analytics and hopefully will inspire future improvements in machine learning strategies across sports and applications by using graphbased encodings.
    摘要 本研究旨在提高复杂排球预测的准确性,并为教练和运动员提供更有意义的洞见。我们引入一种专门的图编码技术,在无需额外采集数据的前提下,为现有排球数据集补充逐次触球的排球上下文。我们展示了在这一增强数据集上使用图神经网络(GNN)完成三项排球预测任务的潜在收益:回合结果预测、二传落点预测和击球类型预测。我们将基于图的模型与基线模型进行比较,并分析结果以更好地理解排球回合中的内在关系。结果表明,结合我们的图编码使用GNN可以对数据进行更深入的分析,从而整体上显著提升预测效果。我们还表明,这些基线任务通过简单调整(例如移除被拦网的击球)即可显著改进。最后,我们展示了选择能够更好提取任务关键信息的模型结构的重要性。总之,本研究展示了在体育数据分析中使用图编码的潜在优势与不足,希望能启发未来在各类体育和应用中基于图编码的机器学习策略的改进。
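
One plausible reading of the contact-by-contact graph encoding, sketched with PyTorch Geometric: each touch in a rally becomes a node carrying its features, and directed edges link consecutive touches. The feature layout below is an assumption chosen for illustration, not the paper's schema.

```python
import torch
from torch_geometric.data import Data

# Each row is one touch in the rally: [player_id, court_x, court_y, touch_type].
touches = torch.tensor([
    [3, 0.2, 0.8, 0],   # serve receive
    [7, 0.5, 0.5, 1],   # set
    [9, 0.8, 0.3, 2],   # attack
], dtype=torch.float)

# Directed edges chain consecutive contacts: touch i -> touch i+1.
src = torch.arange(len(touches) - 1)
edge_index = torch.stack([src, src + 1])

rally = Data(x=touches, edge_index=edge_index,
             y=torch.tensor([1]))  # e.g., rally outcome label for a GNN to predict
```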

Towards Validating Long-Term User Feedbacks in Interactive Recommendation Systems

  • paper_url: http://arxiv.org/abs/2308.11137
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Hojoon Lee, Dongyoon Hwang, Kyushik Min, Jaegul Choo
  • for: 这篇论文旨在检验使用公开评论数据集评估基于强化学习(RL)的交互式推荐系统(IRS)是否恰当。
  • methods: 论文在公开评论数据集上,将基于RL的模型与一个贪心推荐单步奖励最高物品的简单奖励模型进行比较。
  • results: 研究发现,简单的贪心奖励模型在最大化累积奖励方面持续优于基于RL的模型;对长期奖励赋予更高权重反而会降低推荐性能;用户反馈在这些基准数据集上仅有微弱的长期影响。
    Abstract Interactive Recommender Systems (IRSs) have attracted a lot of attention, due to their ability to model interactive processes between users and recommender systems. Numerous approaches have adopted Reinforcement Learning (RL) algorithms, as these can directly maximize users' cumulative rewards. In IRS, researchers commonly utilize publicly available review datasets to compare and evaluate algorithms. However, user feedback provided in public datasets merely includes instant responses (e.g., a rating), with no inclusion of delayed responses (e.g., the dwell time and the lifetime value). Thus, the question remains whether these review datasets are an appropriate choice to evaluate the long-term effects of the IRS. In this work, we revisited experiments on IRS with review datasets and compared RL-based models with a simple reward model that greedily recommends the item with the highest one-step reward. Following extensive analysis, we can reveal three main findings: First, a simple greedy reward model consistently outperforms RL-based models in maximizing cumulative rewards. Second, applying higher weighting to long-term rewards leads to a degradation of recommendation performance. Third, user feedbacks have mere long-term effects on the benchmark datasets. Based on our findings, we conclude that a dataset has to be carefully verified and that a simple greedy baseline should be included for a proper evaluation of RL-based IRS approaches.
    摘要 交互式推荐系统(IRS)能够建模用户与推荐系统之间的交互过程,因而备受关注。许多方法采用强化学习(RL)算法,因为它们可以直接最大化用户的累积奖励。在IRS研究中,研究者通常使用公开的评论数据集来比较和评估算法。然而,公开数据集中的用户反馈仅包含即时响应(如评分),不包含延迟响应(如停留时长和生命周期价值)。因此,这些评论数据集是否适合评估IRS的长期效果仍是疑问。在本工作中,我们重新审视了基于评论数据集的IRS实验,并将基于RL的模型与一个贪心推荐单步奖励最高物品的简单奖励模型进行比较。经过大量分析,我们得到三个主要发现:第一,简单的贪心奖励模型在最大化累积奖励方面持续优于基于RL的模型;第二,对长期奖励赋予更高权重会导致推荐性能下降;第三,用户反馈在这些基准数据集上仅有微弱的长期影响。基于这些发现,我们认为评估基于RL的IRS方法时必须仔细验证所用数据集,并应纳入简单的贪心基线。

Transformers for Capturing Multi-level Graph Structure using Hierarchical Distances

  • paper_url: http://arxiv.org/abs/2308.11129
  • repo_url: None
  • paper_authors: Yuankai Luo
  • for: 本研究旨在通过建模图中的多层次、层级结构,提升图Transformer在图预测中的表达能力。
  • methods: 本文提出层级距离结构编码(HDSE),利用图的多层次、层级特性建模节点间的层级距离;HDSE可灵活地与现有图Transformer结合,并可与其他位置表示同时使用。
  • results: 在12个真实数据集上的大量实验表明,HDSE方法成功增强了多种基线Transformer,在10个基准数据集上取得最先进的实证性能。
    Abstract Graph transformers need strong inductive biases to derive meaningful attention scores. Yet, current proposals rarely address methods capturing longer ranges, hierarchical structures, or community structures, as they appear in various graphs such as molecules, social networks, and citation networks. In this paper, we propose a hierarchy-distance structural encoding (HDSE), which models a hierarchical distance between the nodes in a graph focusing on its multi-level, hierarchical nature. In particular, this yields a framework which can be flexibly integrated with existing graph transformers, allowing for simultaneous application with other positional representations. Through extensive experiments on 12 real-world datasets, we demonstrate that our HDSE method successfully enhances various types of baseline transformers, achieving state-of-the-art empirical performances on 10 benchmark datasets.
    摘要 图Transformer需要强归纳偏置才能得到有意义的注意力分数。然而,目前的方案很少涉及捕捉更长距离、层级结构或社区结构的方法,而这些结构广泛存在于分子、社交网络和引文网络等各类图中。本文提出层级距离结构编码(HDSE),围绕图的多层次、层级特性建模节点之间的层级距离。这形成了一个可灵活集成到现有图Transformer中的框架,并允许与其他位置表示同时应用。在12个真实数据集上的大量实验表明,我们的HDSE方法成功增强了多种基线Transformer,在10个基准数据集上取得最先进的实证性能。

How Expressive are Graph Neural Networks in Recommendation?

  • paper_url: http://arxiv.org/abs/2308.11127
  • repo_url: https://github.com/hkuds/gte
  • paper_authors: Xuheng Cai, Lianghao Xia, Xubin Ren, Chao Huang
  • for: 本研究旨在提供对Graph Neural Networks(GNNs)在推荐任务中的理论分析,包括GNNs的表达能力和其在推荐任务中的效果。
  • methods: 本研究使用了message passing GNNs和random node initialization来证明GNNs的表达能力,并提出了一个新的表达能力指标——topological closeness,用于评估GNNs在推荐任务中的能力。
  • results: 研究发现,GNNs在推荐任务中的表达能力与topological closeness指标有直接的关系,而且可以通过学习eless GNN算法来优化表达能力。此外,研究还发现,GNNs在不同的推荐任务中的表达能力有所不同,而且与任务的特点有关。
    Abstract Graph Neural Networks (GNNs) have demonstrated superior performance on various graph learning tasks, including recommendation, where they leverage user-item collaborative filtering signals in graphs. However, theoretical formulations of their capability are scarce, despite their empirical effectiveness in state-of-the-art recommender models. Recently, research has explored the expressiveness of GNNs in general, demonstrating that message passing GNNs are at most as powerful as the Weisfeiler-Lehman test, and that GNNs combined with random node initialization are universal. Nevertheless, the concept of "expressiveness" for GNNs remains vaguely defined. Most existing works adopt the graph isomorphism test as the metric of expressiveness, but this graph-level task may not effectively assess a model's ability in recommendation, where the objective is to distinguish nodes of different closeness. In this paper, we provide a comprehensive theoretical analysis of the expressiveness of GNNs in recommendation, considering three levels of expressiveness metrics: graph isomorphism (graph-level), node automorphism (node-level), and topological closeness (link-level). We propose the topological closeness metric to evaluate GNNs' ability to capture the structural distance between nodes, which aligns closely with the objective of recommendation. To validate the effectiveness of this new metric in evaluating recommendation performance, we introduce a learning-less GNN algorithm that is optimal on the new metric and can be optimal on the node-level metric with suitable modification. We conduct extensive experiments comparing the proposed algorithm against various types of state-of-the-art GNN models to explore the explainability of the new metric in the recommendation task. For reproducibility, implementation codes are available at https://github.com/HKUDS/GTE.
    摘要 图神经网络(GNN)在各类图学习任务中表现出色,其中包括利用图上用户-物品协同过滤信号的推荐任务。然而,尽管GNN在最先进的推荐模型中实证有效,关于其能力的理论刻画仍然匮乏。近期研究探讨了GNN的一般表达能力,证明消息传递GNN至多与Weisfeiler-Lehman测试等价,而结合随机节点初始化的GNN具有通用性。然而,GNN的"表达能力"概念仍缺乏清晰定义。现有工作大多采用图同构测试作为表达能力的度量,但这一图级任务可能无法有效评估模型在推荐中的能力,因为推荐的目标是区分接近程度不同的节点。本文对GNN在推荐中的表达能力进行了全面的理论分析,考虑三个层次的表达能力度量:图同构(图级)、节点自同构(节点级)和拓扑接近度(链接级)。我们提出拓扑接近度度量来评估GNN捕捉节点间结构距离的能力,这与推荐的目标高度契合。为验证该新度量在评估推荐性能上的有效性,我们提出一种在新度量上最优、经适当修改后也可在节点级度量上最优的无学习GNN算法。我们将所提算法与多种最先进GNN模型进行了大量对比实验,以探索新度量在推荐任务中的解释性。为保证可复现性,实现代码见 https://github.com/HKUDS/GTE 。

Random Word Data Augmentation with CLIP for Zero-Shot Anomaly Detection

  • paper_url: http://arxiv.org/abs/2308.11119
  • repo_url: None
  • paper_authors: Masato Tamura
  • for: 这项研究旨在开发一种基于CLIP、无需任何训练图像的零样本异常检测方法。
  • methods: 该方法利用CLIP的文本编码器,对包含正常与异常词汇并插入若干随机生成单词的提示生成文本嵌入,从而得到多样的正常与异常样本;随后以这些嵌入为训练数据,训练一个前馈神经网络从CLIP嵌入中提取正常与异常特征。
  • results: 实验结果显示,该方法在零样本设置下无需繁琐的提示集成即可达到最先进的性能。
    Abstract This paper presents a novel method that leverages a visual-language model, CLIP, as a data source for zero-shot anomaly detection. Tremendous efforts have been put towards developing anomaly detectors due to their potential industrial applications. Considering the difficulty in acquiring various anomalous samples for training, most existing methods train models with only normal samples and measure discrepancies from the distribution of normal samples during inference, which requires training a model for each object category. The problem of this inefficient training requirement has been tackled by designing a CLIP-based anomaly detector that applies prompt-guided classification to each part of an image in a sliding window manner. However, the method still suffers from the labor of careful prompt ensembling with known object categories. To overcome the issues above, we propose leveraging CLIP as a data source for training. Our method generates text embeddings with the text encoder in CLIP with typical prompts that include words of normal and anomaly. In addition to these words, we insert several randomly generated words into prompts, which enables the encoder to generate a diverse set of normal and anomalous samples. Using the generated embeddings as training data, a feed-forward neural network learns to extract features of normal and anomaly from CLIP's embeddings, and as a result, a category-agnostic anomaly detector can be obtained without any training images. Experimental results demonstrate that our method achieves state-of-the-art performance without laborious prompt ensembling in zero-shot setups.
    摘要 本文提出一种将视觉-语言模型CLIP作为零样本异常检测数据来源的新方法。异常检测器因其潜在的工业应用价值而备受研究;考虑到异常样本难以大量获取,现有方法大多仅用正常样本训练模型,并在推理时度量与正常样本分布的偏差,这要求为每个物体类别分别训练模型。已有工作通过基于CLIP的异常检测器缓解了这一低效的训练需求,以滑动窗口方式对图像各部分进行提示引导分类,但该方法仍需针对已知物体类别精心进行提示集成。为克服上述问题,我们提出将CLIP作为训练数据的来源:利用CLIP的文本编码器,对包含正常与异常词汇的典型提示生成文本嵌入,并在提示中插入若干随机生成的单词,使编码器能够产生多样的正常与异常样本。以生成的嵌入为训练数据,一个前馈神经网络学习从CLIP嵌入中提取正常与异常特征,从而无需任何训练图像即可获得类别无关的异常检测器。实验结果表明,该方法在零样本设置下无需繁琐的提示集成即可达到最先进的性能。
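
A sketch of the data-generation step as described above: normal/anomaly prompts are padded with random words so the CLIP text encoder yields a diverse set of pseudo normal and anomalous embeddings. The prompt template and the random-word source are assumptions for illustration, not the paper's exact setup.

```python
import random, string
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

def random_word(n=6):
    return "".join(random.choices(string.ascii_lowercase, k=n))

def make_embeddings(state, n_samples=64):
    # state is "normal" or "anomalous"; random words diversify the samples.
    prompts = [f"a photo of a {state} {random_word()} {random_word()}"
               for _ in range(n_samples)]
    inputs = tokenizer(prompts, padding=True, return_tensors="pt")
    with torch.no_grad():
        return model.get_text_features(**inputs)   # training data, no images needed

normal = make_embeddings("normal")        # label 0
anomalous = make_embeddings("anomalous")  # label 1: train a small FFN on these
```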

Development of a Novel Quantum Pre-processing Filter to Improve Image Classification Accuracy of Neural Network Models

  • paper_url: http://arxiv.org/abs/2308.11112
  • repo_url: https://github.com/hajimesuzuki999/qpf
  • paper_authors: Farina Riaz, Shahab Abdulla, Hajime Suzuki, Srinjoy Ganguly, Ravinesh C. Deo, Susan Hopkins
  • for: 提高图像分类模型的准确率
  • methods: 使用量子预处理滤波器(QPF)方法:一个简单的四量子比特电路,以Y旋转门进行编码,并用两个受控非门在量子比特之间建立关联。
  • results: 应用QPF方法后,基于MNIST和EMNIST数据集的图像分类准确率分别从92.5%提升至95.4%、从68.9%提升至75.9%,且无需增加模型参数或对机器学习过程进行额外优化。然而,在较复杂的GTSRB数据集(43类真实交通标志图像)上的测试显示分类准确率有所下降,因此可基于本文的基线方法进一步研究更适合图像分类神经网络的量子电路设计。
    Abstract This paper proposes a novel quantum pre-processing filter (QPF) to improve the image classification accuracy of neural network (NN) models. A simple four qubit quantum circuit that uses Y rotation gates for encoding and two controlled NOT gates for creating correlation among the qubits is applied as a feature extraction filter prior to passing data into the fully connected NN architecture. By applying the QPF approach, the results show that the image classification accuracy based on the MNIST (handwritten 10 digits) and the EMNIST (handwritten 47 class digits and letters) datasets can be improved, from 92.5% to 95.4% and from 68.9% to 75.9%, respectively. These improvements were obtained without introducing extra model parameters or optimizations in the machine learning process. However, tests performed on the developed QPF approach against a relatively complex GTSRB dataset with 43 distinct class real-life traffic sign images showed a degradation in the classification accuracy. Considering this result, further research into the understanding and the design of a more suitable quantum circuit approach for image classification neural networks could be explored utilizing the baseline method proposed in this paper.
    摘要 本文提出一种新颖的量子预处理滤波器(QPF),用于提高神经网络(NN)模型的图像分类准确率。在将数据输入全连接NN架构之前,采用一个简单的四量子比特量子电路作为特征提取滤波器:使用Y旋转门进行编码,并用两个受控非门在量子比特之间建立关联。实验结果表明,应用QPF方法后,基于MNIST(手写10个数字)和EMNIST(手写47类数字与字母)数据集的图像分类准确率分别从92.5%提升至95.4%、从68.9%提升至75.9%,且无需引入额外的模型参数或机器学习过程的优化。然而,在包含43类真实交通标志图像的较复杂GTSRB数据集上的测试显示分类准确率有所下降。鉴于此结果,可基于本文提出的基线方法,进一步研究更适合图像分类神经网络的量子电路设计。
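
The four-qubit filter described above can be written down directly in Qiskit. The sketch below encodes four pixel values of a patch via Y rotations and applies two CNOTs to create correlations before reading out probabilities; the exact wiring of the paper's circuit may differ — the repository linked above holds the reference implementation.

```python
import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

def qpf_features(patch):
    """patch: four pixel values in [0, 1] -> 16 measurement probabilities."""
    qc = QuantumCircuit(4)
    for i, pixel in enumerate(patch):
        qc.ry(np.pi * pixel, i)          # encode each pixel as a Y rotation
    qc.cx(0, 1)                          # two controlled-NOT gates create
    qc.cx(2, 3)                          # correlation among the qubits
    return Statevector.from_instruction(qc).probabilities()

features = qpf_features([0.1, 0.7, 0.3, 0.9])  # filtered feature vector for the NN
```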

CAME: Contrastive Automated Model Evaluation

  • paper_url: http://arxiv.org/abs/2308.11111
  • repo_url: https://github.com/pengr/contrastive_autoeval
  • paper_authors: Ru Peng, Qiuyang Duan, Haobo Wang, Jiachen Ma, Yanbo Jiang, Yongjun Tu, Xiu Jiang, Junbo Zhao
  • for: 本研究旨在提出一种无需在评估环节引入训练集的新型自动模型评估框架。
  • methods: 本研究基于将模型性能与一种对比损失相联系的理论分析来评估模型性能,仅需在未标注的测试集上进行推断。
  • results: 研究表明,CAME显著超越先前工作,为AutoEval创造了新的SOTA结果。
    Abstract The Automated Model Evaluation (AutoEval) framework entertains the possibility of evaluating a trained machine learning model without resorting to a labeled testing set. Despite the promise and some decent results, the existing AutoEval methods heavily rely on computing distribution shifts between the unlabelled testing set and the training set. We believe this reliance on the training set becomes another obstacle in shipping this technology to real-world ML development. In this work, we propose Contrastive Automatic Model Evaluation (CAME), a novel AutoEval framework that is rid of involving training set in the loop. The core idea of CAME bases on a theoretical analysis which bonds the model performance with a contrastive loss. Further, with extensive empirical validation, we manage to set up a predictable relationship between the two, simply by deducing on the unlabeled/unseen testing set. The resulting framework CAME establishes a new SOTA results for AutoEval by surpassing prior work significantly.
    摘要 自动模型评估(AutoEval)框架使得无需标注测试集即可评估已训练的机器学习模型成为可能。尽管前景可观并已取得一些不错的结果,现有的AutoEval方法严重依赖于计算未标注测试集与训练集之间的分布偏移。我们认为,这种对训练集的依赖会成为该技术落地到实际机器学习开发中的又一障碍。在本工作中,我们提出对比自动模型评估(CAME),一个不再将训练集纳入评估环节的新型AutoEval框架。CAME的核心思想源自一项将模型性能与对比损失相联系的理论分析;进一步地,通过大量实证验证,我们仅凭在未标注/未见测试集上的推断,即可在两者之间建立可预测的关系。由此得到的CAME框架显著超越先前工作,为AutoEval创造了新的SOTA结果。

Anonymity at Risk? Assessing Re-Identification Capabilities of Large Language Models

  • paper_url: http://arxiv.org/abs/2308.11103
  • repo_url: https://github.com/skatinger/anonymity-at-risk-assessing-re-identification-capabilities-of-large-language-models
  • paper_authors: Alex Nyffenegger, Matthias Stürmer, Joel Niklaus
  • for: 本文旨在探讨大型语言模型(LLM)能否重识别法院判决中被匿名化的人员,以评估匿名化对个人隐私的保护效果。
  • methods: 作者利用瑞士联邦最高法院的真实法律数据构建概念验证原型,并构建匿名化的维基百科数据集作为更严格的测试平台开展进一步研究。
  • results: 研究发现,即使是最好的LLM,在法院判决上的重识别也十分困难,原因在于缺乏测试数据集、训练资源需求庞大以及可用于重识别的信息稀疏;而在维基百科上的高重识别率表明,重识别在未来有可能成为现实。
    Abstract Anonymity of both natural and legal persons in court rulings is a critical aspect of privacy protection in the European Union and Switzerland. With the advent of LLMs, concerns about large-scale re-identification of anonymized persons are growing. In accordance with the Federal Supreme Court of Switzerland, we explore the potential of LLMs to re-identify individuals in court rulings by constructing a proof-of-concept using actual legal data from the Swiss federal supreme court. Following the initial experiment, we constructed an anonymized Wikipedia dataset as a more rigorous testing ground to further investigate the findings. With the introduction and application of the new task of re-identifying people in texts, we also introduce new metrics to measure performance. We systematically analyze the factors that influence successful re-identifications, identifying model size, input length, and instruction tuning among the most critical determinants. Despite high re-identification rates on Wikipedia, even the best LLMs struggled with court decisions. The complexity is attributed to the lack of test datasets, the necessity for substantial training resources, and data sparsity in the information used for re-identification. In conclusion, this study demonstrates that re-identification using LLMs may not be feasible for now, but as the proof-of-concept on Wikipedia showed, it might become possible in the future. We hope that our system can help enhance the confidence in the security of anonymized decisions, thus leading to the courts being more confident to publish decisions.
    摘要 在欧盟和瑞士,法院判决中自然人与法人的匿名性是隐私保护的关键环节。随着LLM的出现,对匿名化人员被大规模重识别的担忧与日俱增。经瑞士联邦最高法院许可,我们利用其真实法律数据构建概念验证原型,探索LLM在法院判决中重识别个人的潜力。在初步实验之后,我们构建了一个匿名化的维基百科数据集,作为更严格的测试平台进一步检验研究发现。随着"在文本中重识别人员"这一新任务的引入与应用,我们也提出了衡量其性能的新指标。我们系统地分析了影响重识别成功的因素,发现模型规模、输入长度和指令微调是最关键的决定因素。尽管在维基百科上取得了很高的重识别率,即使最好的LLM在法院判决上也表现吃力,其复杂性归因于缺乏测试数据集、对大量训练资源的需求以及用于重识别的信息稀疏。总之,本研究表明基于LLM的重识别目前可能尚不可行,但正如维基百科上的概念验证所示,未来或将成为可能。我们希望该系统有助于增强对匿名化判决安全性的信心,从而使法院更有信心地公开判决。

Explicability and Inexplicability in the Interpretation of Quantum Neural Networks

  • paper_url: http://arxiv.org/abs/2308.11098
  • repo_url: https://github.com/lirandepira/interpret-qnn
  • paper_authors: Lirandë Pira, Chris Ferrie
  • for: 探讨人工智能方法的可解释性,尤其是深度神经网络,因为AI支持的系统广泛应用,但行为经常无法解释。
  • methods: 使用局部的、模型无关的可解释性度量,评估量子神经网络与经典神经网络的可解释性。
  • results: 提出了“不可解释区”概念,表示无法解释的数据样本,可能受到内在随机量子测量的影响。这种研究为建立负责任和可负责任量子AI模型做出了一步。
    Abstract Interpretability of artificial intelligence (AI) methods, particularly deep neural networks, is of great interest due to the widespread use of AI-backed systems, which often have unexplainable behavior. The interpretability of such models is a crucial component of building trusted systems. Many methods exist to approach this problem, but they do not obviously generalize to the quantum setting. Here we explore the interpretability of quantum neural networks using local model-agnostic interpretability measures of quantum and classical neural networks. We introduce the concept of the band of inexplicability, representing the interpretable region in which data samples have no explanation, likely victims of inherently random quantum measurements. We see this as a step toward understanding how to build responsible and accountable quantum AI models.
    摘要 由于AI支持的系统已被广泛使用而其行为往往难以解释,人工智能(AI)方法——尤其是深度神经网络——的可解释性备受关注,它是构建可信系统的关键组成部分。已有许多方法尝试解决这一问题,但它们并不能直接推广到量子场景。本文使用局部的、模型无关的可解释性度量,探讨量子神经网络与经典神经网络的可解释性。我们引入"不可解释带"(band of inexplicability)的概念,用以表示数据样本没有解释的区域,这些样本很可能是固有随机的量子测量的牺牲品。我们将此视为迈向构建负责任、可问责的量子AI模型的一步。

Video OWL-ViT: Temporally-consistent open-world localization in video

  • paper_url: http://arxiv.org/abs/2308.11093
  • repo_url: None
  • paper_authors: Georg Heigold, Matthias Minderer, Alexey Gritsenko, Alex Bewley, Daniel Keysers, Mario Lučić, Fisher Yu, Thomas Kipf
  • for: Adapts pre-trained open-world image models to localization in videos.
  • methods: Builds on the OWL-ViT open-vocabulary detection model and adds a transformer decoder for temporal propagation: the output tokens for one frame serve as the object queries for the next.
  • results: The model transfers successfully to the TAO-OW benchmark, retains the open-world capabilities of the backbone detector, and shows better temporal consistency than tracking-by-detection baselines.
    Abstract We present an architecture and a training recipe that adapts pre-trained open-world image models to localization in videos. Understanding the open visual world (without being constrained by fixed label spaces) is crucial for many real-world vision tasks. Contrastive pre-training on large image-text datasets has recently led to significant improvements for image-level tasks. For more structured tasks involving object localization applying pre-trained models is more challenging. This is particularly true for video tasks, where task-specific data is limited. We show successful transfer of open-world models by building on the OWL-ViT open-vocabulary detection model and adapting it to video by adding a transformer decoder. The decoder propagates object representations recurrently through time by using the output tokens for one frame as the object queries for the next. Our model is end-to-end trainable on video data and enjoys improved temporal consistency compared to tracking-by-detection baselines, while retaining the open-world capabilities of the backbone detector. We evaluate our model on the challenging TAO-OW benchmark and demonstrate that open-world capabilities, learned from large-scale image-text pre-training, can be transferred successfully to open-world localization across diverse videos.
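The recurrence at the heart of the method, output tokens of one frame becoming the object queries for the next, fits in a short sketch. A minimal sketch, assuming a generic transformer decoder and placeholder dimensions (`d_model`, `num_queries`, and the `encoder` stand-in) rather than the actual Video OWL-ViT architecture:

```python
# Minimal sketch of recurrent object-query propagation across frames.
# The decoder, encoder, and dimensions are placeholders, not the actual
# Video OWL-ViT components.
import torch
import torch.nn as nn

d_model, num_queries = 256, 16
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=2,
)
encoder = nn.Linear(768, d_model)  # stand-in for the image backbone

def track_video(frame_features, init_queries):
    # frame_features: list of (num_patches, 768) tensors, one per frame
    queries = init_queries  # (num_queries, d_model)
    outputs = []
    for feats in frame_features:
        memory = encoder(feats).unsqueeze(0)         # (1, P, d)
        out = decoder(queries.unsqueeze(0), memory)  # (1, Q, d)
        outputs.append(out.squeeze(0))
        queries = out.squeeze(0)  # outputs become next frame's queries
    return outputs  # per-frame object tokens, linked across time by slot
```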

Addressing Fairness and Explainability in Image Classification Using Optimal Transport

  • paper_url: http://arxiv.org/abs/2308.11090
  • repo_url: None
  • paper_authors: Philipp Ratz, François Hu, Arthur Charpentier
  • for: Aims to improve the trustworthiness and accountability of AI systems, particularly in healthcare and policing, by explaining the causes of unfair outcomes.
  • methods: Uses optimal transport theory to uncover biased regions in images (the approach extends readily to tabular data); Wasserstein barycenters yield scores that are independent of the sensitive variable while preserving their marginal orderings and predictive accuracy.
  • results: The approach identifies the regions most associated with the generation of biases, with significant implications for building trustworthy, fair AI systems for critical decision-making across diverse domains.
    Abstract Algorithmic Fairness and the explainability of potentially unfair outcomes are crucial for establishing trust and accountability of Artificial Intelligence systems in domains such as healthcare and policing. Though significant advances have been made in each of the fields separately, achieving explainability in fairness applications remains challenging, particularly so in domains where deep neural networks are used. At the same time, ethical data-mining has become ever more relevant, as it has been shown countless times that fairness-unaware algorithms result in biased outcomes. Current approaches focus on mitigating biases in the outcomes of the model, but few attempts have been made to try to explain \emph{why} a model is biased. To bridge this gap, we propose a comprehensive approach that leverages optimal transport theory to uncover the causes and implications of biased regions in images, which easily extends to tabular data as well. Through the use of Wasserstein barycenters, we obtain scores that are independent of a sensitive variable but keep their marginal orderings. This step ensures predictive accuracy but also helps us to recover the regions most associated with the generation of the biases. Our findings hold significant implications for the development of trustworthy and unbiased AI systems, fostering transparency, accountability, and fairness in critical decision-making scenarios across diverse domains.
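For one-dimensional scores, the Wasserstein-barycenter repair alluded to in the abstract has a standard closed form: transport each group's score distribution onto the weighted barycenter of all groups' distributions, which removes dependence on the sensitive attribute while preserving within-group orderings. A minimal sketch of that generic construction, not necessarily the paper's exact procedure:

```python
# Sketch of 1-D Wasserstein-barycenter score repair: each group's score
# distribution is mapped to the weighted barycenter of all groups'
# distributions via quantile averaging.
import numpy as np

def barycenter_repair(scores, groups):
    scores, groups = np.asarray(scores, float), np.asarray(groups)
    labels, counts = np.unique(groups, return_counts=True)
    weights = counts / counts.sum()
    repaired = np.empty_like(scores)
    for g in labels:
        s_g = scores[groups == g]
        # within-group rank -> quantile level in (0, 1)
        u = (np.argsort(np.argsort(s_g)) + 0.5) / len(s_g)
        # barycenter quantile = weighted average of group quantiles
        repaired[groups == g] = sum(
            w * np.quantile(scores[groups == h], u)
            for h, w in zip(labels, weights)
        )
    return repaired
```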

Stress representations for tensor basis neural networks: alternative formulations to Finger-Rivlin-Ericksen

  • paper_url: http://arxiv.org/abs/2308.11080
  • repo_url: None
  • paper_authors: Jan N. Fuhg, Nikolaos Bouklas, Reese E. Jones
  • for: Studies data-driven constitutive modeling frameworks that combine neural networks with classical representation theorems.
  • methods: Surveys tensor basis neural network models for hyperelastic materials in a finite-deformation setting, including previously unexplored formulations that use invariants and generators theoretically equivalent to Finger-Rivlin-Ericksen.
  • results: Nine model variants are tested on noisy and noiseless datasets for three different materials; potential-based and coefficient-based approaches show distinct trade-offs, and the choice of calibration technique also affects performance.
    Abstract Data-driven constitutive modeling frameworks based on neural networks and classical representation theorems have recently gained considerable attention due to their ability to easily incorporate constitutive constraints and their excellent generalization performance. In these models, the stress prediction follows from a linear combination of invariant-dependent coefficient functions and known tensor basis generators. However, thus far the formulations have been limited to stress representations based on the classical Rivlin and Ericksen form, while the performance of alternative representations has yet to be investigated. In this work, we survey a variety of tensor basis neural network models for modeling hyperelastic materials in a finite deformation context, including a number of so far unexplored formulations which use theoretically equivalent invariants and generators to Finger-Rivlin-Ericksen. Furthermore, we compare potential-based and coefficient-based approaches, as well as different calibration techniques. Nine variants are tested against both noisy and noiseless datasets for three different materials. Theoretical and practical insights into the performance of each formulation are given.

Long-Term Prediction of Natural Video Sequences with Robust Video Predictors

  • paper_url: http://arxiv.org/abs/2308.11079
  • repo_url: None
  • paper_authors: Luke Ditria, Tom Drummond
  • for: Predicting high-dimensional video sequences, a problem whose inherent uncertainty grows the further into the future one predicts.
  • methods: Combines deep perceptual and uncertainty-based reconstruction losses with attention-based skip connections to improve short-term prediction quality.
  • results: Produces high-quality short-term predictions and, by making the predictor robust to its own prediction errors, generates very long, realistic natural video sequences via an iterated single-step prediction task.
    Abstract Predicting high dimensional video sequences is a curiously difficult problem. The number of possible futures for a given video sequence grows exponentially over time due to uncertainty. This is especially evident when trying to predict complicated natural video scenes from a limited snapshot of the world. The inherent uncertainty accumulates the further into the future you predict making long-term prediction very difficult. In this work we introduce a number of improvements to existing work that aid in creating Robust Video Predictors (RoViPs). We show that with a combination of deep Perceptual and uncertainty-based reconstruction losses we are able to create high quality short-term predictions. Attention-based skip connections are utilised to allow for long range spatial movement of input features to further improve performance. Finally, we show that by simply making the predictor robust to its own prediction errors, it is possible to produce very long, realistic natural video sequences using an iterated single-step prediction task.
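The last point, making the predictor robust to its own errors so that single-step predictions can be iterated into long rollouts, resembles scheduled sampling. A minimal sketch with a placeholder `model` and an MSE loss standing in for the paper's perceptual and uncertainty-based losses:

```python
# Sketch of iterated single-step prediction with self-robustness training:
# during training the predictor sometimes consumes its own (imperfect)
# outputs instead of ground-truth frames, so errors do not compound at
# rollout time. Model and loss are placeholders, not the paper's RoViP.
import torch

def rollout(model, first_frame, steps):
    frames, x = [], first_frame
    for _ in range(steps):
        x = model(x)  # single-step prediction, iterated
        frames.append(x)
    return torch.stack(frames)

def robust_training_step(model, clip, optimizer, p_self=0.5):
    loss, x = 0.0, clip[0]
    for t in range(1, len(clip)):
        pred = model(x)
        loss = loss + torch.nn.functional.mse_loss(pred, clip[t])
        # scheduled sampling: feed the model its own prediction sometimes
        x = pred.detach() if torch.rand(()) < p_self else clip[t]
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return float(loss)
```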

A Deep Dive into the Connections Between the Renormalization Group and Deep Learning in the Ising Model

  • paper_url: http://arxiv.org/abs/2308.11075
  • repo_url: None
  • paper_authors: Kelsie Taylor
  • for: Examines the connection between deep learning and the renormalization group (RG), and whether unsupervised deep learning implements a form of RG flow.
  • methods: Uses Restricted Boltzmann Machines (RBMs) for the deep learning side and develops extensive renormalization techniques for the 1D and 2D Ising models as a baseline for comparison.
  • results: For the 1D Ising model, Adam optimization on a correlation-length loss learns a group flow consistent with the analytical infinite-N model. For the 2D Ising model, samples generated with the Wolff algorithm and a quasi-deterministic group flow are validated by computing the critical exponent ν. Layer-by-layer RBM learning exhibits a blocking structure qualitatively similar to RG, but directly comparing layer weights to Ising spin renormalization reveals quantitative inconsistencies, even for the simple nearest-neighbor Ising model.
    Abstract The renormalization group (RG) is an essential technique in statistical physics and quantum field theory, which considers scale-invariant properties of physical theories and how these theories' parameters change with scaling. Deep learning is a powerful computational technique that uses multi-layered neural networks to solve a myriad of complicated problems. Previous research suggests the possibility that unsupervised deep learning may be a form of RG flow, by being a layer-by-layer coarse graining of the original data. We examined this connection on a more rigorous basis for the simple example of Kadanoff block renormalization of the 2D nearest-neighbor Ising model, with our deep learning accomplished via Restricted Boltzmann Machines (RBMs). We developed extensive renormalization techniques for the 1D and 2D Ising model to provide a baseline for comparison. For the 1D Ising model, we successfully used Adam optimization on a correlation length loss function to learn the group flow, yielding results consistent with the analytical model for infinite N. For the 2D Ising model, we successfully generated Ising model samples using the Wolff algorithm, and performed the group flow using a quasi-deterministic method, validating these results by calculating the critical exponent \nu. We then examined RBM learning of the Ising model layer by layer, finding a blocking structure in the learning that is qualitatively similar to RG. Lastly, we directly compared the weights of each layer from the learning to Ising spin renormalization, but found quantitative inconsistencies for the simple case of nearest-neighbor Ising models.
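Kadanoff block renormalization of an Ising configuration, the classical baseline the RBM layers are compared against, is concrete enough to sketch. A minimal majority-rule blocking step (the random tie-breaking rule is an assumption):

```python
# Minimal sketch of one Kadanoff block-spin renormalization step on a 2D
# Ising configuration: 2x2 blocks are coarse-grained by majority rule
# (ties broken randomly), halving the lattice in each dimension.
import numpy as np

def block_spin(spins, rng=np.random.default_rng(0)):
    # spins: (L, L) array of +1/-1 with L even
    L = spins.shape[0]
    blocks = spins.reshape(L // 2, 2, L // 2, 2).sum(axis=(1, 3))
    ties = blocks == 0
    blocks = np.sign(blocks)
    blocks[ties] = rng.choice([-1, 1], size=ties.sum())
    return blocks.astype(int)

spins = np.random.default_rng(1).choice([-1, 1], size=(8, 8))
print(block_spin(spins))  # (4, 4) coarse-grained configuration
```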

Neural Amortized Inference for Nested Multi-agent Reasoning

  • paper_url: http://arxiv.org/abs/2308.11071
  • repo_url: None
  • paper_authors: Kunal Jha, Tuan Anh Le, Chuanyang Jin, Yen-Ling Kuo, Joshua B. Tenenbaum, Tianmin Shu
  • for: Aims to make nested social inference in multi-agent interactions tractable, so that machines can better model how other agents reason about them.
  • methods: Proposes leveraging neural networks to amortize high-order social inference, thereby expediting nested multi-agent reasoning.
  • results: Experiments in two challenging multi-agent interaction domains show the method is computationally efficient with minimal degradation in accuracy.
    Abstract Multi-agent interactions, such as communication, teaching, and bluffing, often rely on higher-order social inference, i.e., understanding how others infer oneself. Such intricate reasoning can be effectively modeled through nested multi-agent reasoning. Nonetheless, the computational complexity escalates exponentially with each level of reasoning, posing a significant challenge. However, humans effortlessly perform complex social inferences as part of their daily lives. To bridge the gap between human-like inference capabilities and computational limitations, we propose a novel approach: leveraging neural networks to amortize high-order social inference, thereby expediting nested multi-agent reasoning. We evaluate our method in two challenging multi-agent interaction domains. The experimental results demonstrate that our method is computationally efficient while exhibiting minimal degradation in accuracy.

Topological Graph Signal Compression

  • paper_url: http://arxiv.org/abs/2308.11068
  • repo_url: None
  • paper_authors: Guillermo Bernárdez, Lev Telyatnikov, Eduard Alarcón, Albert Cabellos-Aparicio, Pere Barlet-Ros, Pietro Liò
  • for: Proposes a Topological Deep Learning (TDL) method for compressing signals over graphs.
  • methods: Two main steps: first, disjoint sets of higher-order structures are inferred from the original signal by clustering $N$ datapoints into $K\ll N$ collections; then a topology-inspired message passing yields a compressed representation of the signal within those multi-element sets.
  • results: On two real-world Internet Service Provider network datasets, the framework improves reconstruction errors over standard GNN and feed-forward architectures by $30\%$ up to $90\%$ across all evaluation scenarios, suggesting it better captures and exploits spatial and temporal correlations over the whole graph-based network structure.
    Abstract Recently emerged Topological Deep Learning (TDL) methods aim to extend current Graph Neural Networks (GNN) by naturally processing higher-order interactions, going beyond the pairwise relations and local neighborhoods defined by graph representations. In this paper we propose a novel TDL-based method for compressing signals over graphs, consisting in two main steps: first, disjoint sets of higher-order structures are inferred based on the original signal --by clustering $N$ datapoints into $K\ll N$ collections; then, a topological-inspired message passing gets a compressed representation of the signal within those multi-element sets. Our results show that our framework improves both standard GNN and feed-forward architectures in compressing temporal link-based signals from two real-word Internet Service Provider Networks' datasets --from $30\%$ up to $90\%$ better reconstruction errors across all evaluation scenarios--, suggesting that it better captures and exploits spatial and temporal correlations over the whole graph-based network structure.
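The two-step pipeline can be illustrated with a plain clustering stand-in. A minimal sketch, assuming k-means over per-node signal histories in place of the paper's higher-order structure inference (which the abstract does not specify) and mean pooling as one round of intra-cell message passing:

```python
# Sketch of the two-step idea: cluster N graph nodes into K << N
# higher-order cells based on their signals, then aggregate within cells
# to obtain a compressed representation. KMeans and mean pooling are
# stand-ins, not the paper's actual inference and message passing.
import numpy as np
from sklearn.cluster import KMeans

def compress_graph_signal(X, k):
    # X: (N, T) signal history per node; k: number of cells
    assignment = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # one round of intra-cell "message passing": mean-pool each cell
    cells = np.stack([X[assignment == c].mean(axis=0) for c in range(k)])
    return cells, assignment  # (k, T) compressed signal + node-to-cell map

def decompress(cells, assignment):
    return cells[assignment]  # broadcast cell signal back to member nodes
```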

UnLoc: A Unified Framework for Video Localization Tasks

  • paper_url: http://arxiv.org/abs/2308.11062
  • repo_url: https://github.com/google-research/scenic
  • paper_authors: Shen Yan, Xuehan Xiong, Arsha Nagrani, Anurag Arnab, Zhonghao Wang, Weina Ge, David Ross, Cordelia Schmid
  • for: Proposes a new method for temporal localization in untrimmed videos.
  • methods: Uses pretrained image and text towers and feeds tokens to a video-text fusion model; the fusion module's output is used to construct a feature pyramid in which each level connects to a head predicting a per-frame relevancy score and start/end time displacements.
  • results: Unlike prior work, a single-stage model handles Moment Retrieval, Temporal Localization, and Action Segmentation without action proposals, motion-based pretrained features, or representation masking, and achieves state-of-the-art results on all three localization tasks.
    Abstract While large-scale image-text pretrained models such as CLIP have been used for multiple video-level tasks on trimmed videos, their use for temporal localization in untrimmed videos is still a relatively unexplored task. We design a new approach for this called UnLoc, which uses pretrained image and text towers, and feeds tokens to a video-text fusion model. The output of the fusion module are then used to construct a feature pyramid in which each level connects to a head to predict a per-frame relevancy score and start/end time displacements. Unlike previous works, our architecture enables Moment Retrieval, Temporal Localization, and Action Segmentation with a single stage model, without the need for action proposals, motion based pretrained features or representation masking. Unlike specialized models, we achieve state of the art results on all three different localization tasks with a unified approach. Code will be available at: \url{https://github.com/google-research/scenic}.
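The per-frame head attached to each pyramid level is simple to sketch. A minimal version with placeholder dimensions; the image/text towers and the video-text fusion model are omitted:

```python
# Sketch of the per-level prediction head: from fused per-frame features,
# each pyramid level predicts a relevancy score plus start/end time
# displacements for every frame. Dimensions are illustrative, not the
# actual UnLoc architecture.
import torch
import torch.nn as nn

class FrameHead(nn.Module):
    def __init__(self, d_model=256):
        super().__init__()
        self.relevancy = nn.Linear(d_model, 1)   # per-frame score
        self.boundaries = nn.Linear(d_model, 2)  # start/end displacements

    def forward(self, fused):  # fused: (T, d_model)
        rel = torch.sigmoid(self.relevancy(fused)).squeeze(-1)  # (T,)
        disp = self.boundaries(fused)                           # (T, 2)
        return rel, disp

head = FrameHead()
rel, disp = head(torch.randn(64, 256))  # 64 frames at one pyramid level
```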

FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning

  • paper_url: http://arxiv.org/abs/2308.12305
  • repo_url: None
  • paper_authors: Haokun Chen, Yao Zhang, Denis Krompass, Jindong Gu, Volker Tresp
  • for: Aims to improve foundation-model performance in multi-modal learning while respecting the privacy constraints that prevent centralizing data from different domains.
  • methods: Combines Federated Learning with a Dual-Adapter Teacher (DAT) that regularizes client-local updates and applies Mutual Knowledge Distillation (MKD) to handle heterogeneous multi-modal data.
  • results: On four multi-modal FL benchmarks with different types of data heterogeneity, FedDAT substantially outperforms existing centralized parameter-efficient finetuning methods adapted for FL.
    Abstract Recently, foundation models have exhibited remarkable advancements in multi-modal learning. These models, equipped with millions (or billions) of parameters, typically require a substantial amount of data for finetuning. However, collecting and centralizing training data from diverse sectors becomes challenging due to distinct privacy regulations. Federated Learning (FL) emerges as a promising solution, enabling multiple clients to collaboratively train neural networks without centralizing their local data. To alleviate client computation burdens and communication overheads, previous works have adapted Parameter-efficient Finetuning (PEFT) methods for FL. Hereby, only a small fraction of the model parameters are optimized and communicated during federated communications. Nevertheless, most previous works have focused on a single modality and neglected one common phenomenon, i.e., the presence of data heterogeneity across the clients. Therefore, in this work, we propose a finetuning framework tailored to heterogeneous multi-modal FL, called Federated Dual-Aadapter Teacher (FedDAT). Specifically, our approach leverages a Dual-Adapter Teacher (DAT) to address data heterogeneity by regularizing the client local updates and applying Mutual Knowledge Distillation (MKD) for an efficient knowledge transfer. FedDAT is the first approach that enables an efficient distributed finetuning of foundation models for a variety of heterogeneous Vision-Language tasks. To demonstrate its effectiveness, we conduct extensive experiments on four multi-modality FL benchmarks with different types of data heterogeneity, where FedDAT substantially outperforms the existing centralized PEFT methods adapted for FL.
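The parameter-efficient federated skeleton underneath FedDAT, communicating and averaging only adapter weights while the frozen backbone stays local, can be sketched briefly. The `"adapter"` name filter is an assumption about parameter naming, and the DAT regularization and mutual knowledge distillation are omitted:

```python
# Sketch of adapter-only federated averaging: just the adapter parameters
# travel between clients and server; backbone weights never leave the
# client. client_models are torch.nn.Module instances; identifying
# adapters by "adapter" in their names is an assumption, not FedDAT's
# actual scheme.
def adapter_state(model):
    return {k: v for k, v in model.state_dict().items() if "adapter" in k}

def fedavg_adapters(client_models):
    states = [adapter_state(m) for m in client_models]
    avg = {k: sum(s[k] for s in states) / len(states) for k in states[0]}
    for m in client_models:
        m.load_state_dict(avg, strict=False)  # broadcast averaged adapters
    return avg
```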

Ultra Dual-Path Compression For Joint Echo Cancellation And Noise Suppression

  • paper_url: http://arxiv.org/abs/2308.11053
  • repo_url: None
  • paper_authors: Hangting Chen, Jianwei Yu, Yi Luo, Rongzhi Gu, Weihua Li, Zhuocheng Lu, Chao Weng
  • for: Echo cancellation and noise suppression are essential for full-duplex communication, yet existing neural networks have high computational cost and little flexibility for tuning model complexity.
  • methods: Proposes time-frequency dual-path compression covering a wide range of computational-cost compression ratios. For frequency compression, trainable filters replace manually designed filters for dimension reduction; for time compression, frame-skipped prediction alone causes large performance degradation, which a post-processing network with full-sequence modeling alleviates.
  • results: Dual-path compression combining the time and frequency methods gives further performance improvement, covering compression ratios from 4x to 32x with little change in model size; the proposed models are competitive with fast FullSubNet and DeepFilterNet.
    Abstract Echo cancellation and noise reduction are essential for full-duplex communication, yet most existing neural networks have high computational costs and are inflexible in tuning model complexity. In this paper, we introduce time-frequency dual-path compression to achieve a wide range of compression ratios on computational cost. Specifically, for frequency compression, trainable filters are used to replace manually designed filters for dimension reduction. For time compression, only using frame skipped prediction causes large performance degradation, which can be alleviated by a post-processing network with full sequence modeling. We have found that under fixed compression ratios, dual-path compression combining both the time and frequency methods will give further performance improvement, covering compression ratios from 4x to 32x with little model size change. Moreover, the proposed models show competitive performance compared with fast FullSubNet and DeepFilterNet. A demo page can be found at hangtingchen.github.io/ultra_dual_path_compression.github.io/.
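Both compression paths are easy to picture in isolation. A minimal sketch with illustrative shapes: a trainable linear filterbank for frequency reduction and plain frame skipping for time reduction (the post-processing network that repairs the skipped frames is omitted):

```python
# Sketch of the two compression paths: a trainable filterbank replaces a
# hand-designed one for frequency reduction, and frames are skipped for
# time reduction. Shapes are illustrative, not the paper's configuration.
import torch
import torch.nn as nn

class DualPathCompress(nn.Module):
    def __init__(self, n_freq=257, n_bands=32, time_stride=4):
        super().__init__()
        self.filterbank = nn.Linear(n_freq, n_bands, bias=False)  # trainable
        self.time_stride = time_stride

    def forward(self, spec):  # spec: (batch, frames, n_freq)
        banded = self.filterbank(spec)          # frequency compression
        return banded[:, ::self.time_stride]    # time compression (skip)

x = torch.randn(2, 100, 257)
print(DualPathCompress()(x).shape)  # torch.Size([2, 25, 32])
```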

Harmonization Across Imaging Locations(HAIL): One-Shot Learning for Brain MRI

  • paper_url: http://arxiv.org/abs/2308.11047
  • repo_url: None
  • paper_authors: Abhijeet Parida, Zhifan Jiang, Syed Muhammad Anwar, Nicholas Foreman, Nicholas Stence, Michael J. Fisher, Roger J. Packer, Robert A. Avery, Marius George Linguraru
  • for: Proposes machine-learning-based diagnosis and prognosis support for rare diseases, such as pediatric brain tumors, which requires harmonizing imaging data gathered across clinical sites.
  • methods: A one-shot learning approach to deep-learning-driven harmonization that combines a learned feature extractor, neural style transfer, and adaptive instance normalization, avoiding the hallucination problem of GAN-based harmonization.
  • results: Experiments show the method preserves patient anatomy while adjusting image intensities to a new clinical site; the general harmonization model applies to unseen data from new sites, making it a valuable tool for real-world medical applications and clinical trials.
    Abstract For machine learning-based prognosis and diagnosis of rare diseases, such as pediatric brain tumors, it is necessary to gather medical imaging data from multiple clinical sites that may use different devices and protocols. Deep learning-driven harmonization of radiologic images relies on generative adversarial networks (GANs). However, GANs notoriously generate pseudo structures that do not exist in the original training data, a phenomenon known as "hallucination". To prevent hallucination in medical imaging, such as magnetic resonance images (MRI) of the brain, we propose a one-shot learning method where we utilize neural style transfer for harmonization. At test time, the method uses one image from a clinical site to generate an image that matches the intensity scale of the collaborating sites. Our approach combines learning a feature extractor, neural style transfer, and adaptive instance normalization. We further propose a novel strategy to evaluate the effectiveness of image harmonization approaches with evaluation metrics that both measure image style harmonization and assess the preservation of anatomical structures. Experimental results demonstrate the effectiveness of our method in preserving patient anatomy while adjusting the image intensities to a new clinical site. Our general harmonization model can be used on unseen data from new sites, making it a valuable tool for real-world medical applications and clinical trials.
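Adaptive instance normalization, one of the three named ingredients, has a compact standard form: re-scale content features to the channel-wise statistics of the style (target-site) features. A minimal sketch of that standard operation, not of the full HAIL pipeline:

```python
# Sketch of adaptive instance normalization (AdaIN): content features are
# re-scaled to the channel-wise mean and standard deviation of style
# (target-site) features.
import torch

def adain(content, style, eps=1e-5):
    # content, style: (batch, channels, H, W) feature maps
    c_mu = content.mean(dim=(2, 3), keepdim=True)
    c_sd = content.std(dim=(2, 3), keepdim=True) + eps
    s_mu = style.mean(dim=(2, 3), keepdim=True)
    s_sd = style.std(dim=(2, 3), keepdim=True)
    return s_sd * (content - c_mu) / c_sd + s_mu
```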

Spurious Correlations and Where to Find Them

  • paper_url: http://arxiv.org/abs/2308.11043
  • repo_url: None
  • paper_authors: Gautam Sreekumar, Vishnu Naresh Boddeti
  • for: Studies spurious correlations in data-driven learning and works toward jointly derived indicators of their occurrence, framed through causal graphs.
  • methods: Collects commonly studied hypotheses behind the occurrence of spurious correlations and investigates their influence on standard ERM baselines using synthetic datasets generated from causal graphs.
  • results: Observes patterns connecting these hypotheses to model design choices, showing how spurious correlations affect model performance.
    Abstract Spurious correlations occur when a model learns unreliable features from the data and are a well-known drawback of data-driven learning. Although there are several algorithms proposed to mitigate it, we are yet to jointly derive the indicators of spurious correlations. As a result, the solutions built upon standalone hypotheses fail to beat simple ERM baselines. We collect some of the commonly studied hypotheses behind the occurrence of spurious correlations and investigate their influence on standard ERM baselines using synthetic datasets generated from causal graphs. Subsequently, we observe patterns connecting these hypotheses and model design choices.

Split Learning for Distributed Collaborative Training of Deep Learning Models in Health Informatics

  • paper_url: http://arxiv.org/abs/2308.11027
  • repo_url: None
  • paper_authors: Zhuohang Li, Chao Yan, Xinmeng Zhang, Gharib Gharibi, Zhijun Yin, Xiaoqian Jiang, Bradley A. Malin
  • for: Explores how split learning enables collaborative training of deep learning models across healthcare organizations while keeping the original records and model parameters private.
  • methods: Introduces a new privacy-preserving distributed learning framework that splits data and model across institutions, offering a higher level of privacy than conventional federated learning.
  • results: On several biomedical imaging and electronic health record (EHR) datasets, models trained via split learning achieve performance highly similar to their centralized and federated counterparts while greatly improving computational efficiency and reducing privacy risks.
    Abstract Deep learning continues to rapidly evolve and is now demonstrating remarkable potential for numerous medical prediction tasks. However, realizing deep learning models that generalize across healthcare organizations is challenging. This is due, in part, to the inherent siloed nature of these organizations and patient privacy requirements. To address this problem, we illustrate how split learning can enable collaborative training of deep learning models across disparate and privately maintained health datasets, while keeping the original records and model parameters private. We introduce a new privacy-preserving distributed learning framework that offers a higher level of privacy compared to conventional federated learning. We use several biomedical imaging and electronic health record (EHR) datasets to show that deep learning models trained via split learning can achieve highly similar performance to their centralized and federated counterparts while greatly improving computational efficiency and reducing privacy risks.
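The mechanics of split learning fit in a short sketch: the client sends only cut-layer activations ("smashed" data), the server returns only gradients at the cut, and raw records and each party's weights never leave their owner. A minimal single-client example with toy networks; the paper's framework adds further privacy machinery on top of this skeleton:

```python
# Minimal sketch of split learning between one client and a server: the
# client computes up to a cut layer and sends only the smashed
# activations; the server finishes the forward pass and returns gradients
# at the cut.
import torch
import torch.nn as nn

client_net = nn.Sequential(nn.Linear(32, 64), nn.ReLU())  # on client
server_net = nn.Sequential(nn.Linear(64, 2))              # on server
opt_c = torch.optim.SGD(client_net.parameters(), lr=0.1)
opt_s = torch.optim.SGD(server_net.parameters(), lr=0.1)

def split_step(x, y):
    smashed = client_net(x)                        # sent to server
    server_in = smashed.detach().requires_grad_()  # server-side copy
    loss = nn.functional.cross_entropy(server_net(server_in), y)
    opt_s.zero_grad(); loss.backward(); opt_s.step()
    opt_c.zero_grad()
    smashed.backward(server_in.grad)               # gradient returned at cut
    opt_c.step()
    return float(loss)

print(split_step(torch.randn(8, 32), torch.randint(0, 2, (8,))))
```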

Extreme Multilabel Classification for Specialist Doctor Recommendation with Implicit Feedback and Limited Patient Metadata

  • paper_url: http://arxiv.org/abs/2308.11022
  • repo_url: None
  • paper_authors: Filipa Valdeira, Stevo Racković, Valeria Danalachi, Qiwei Han, Cláudia Soares
  • for: Aims to build a more effective specialist doctor referral system that personalizes recommendations for both new patients and patients with a consultation history.
  • methods: Recasts the recommender setting as an Extreme Multilabel Classification (XML) problem, encoding the available features, and proposes a unified model leveraging patient history across different specialties.
  • results: Compared with state-of-the-art recommender systems using the same features, the approach improves standard recommendation metrics by up to approximately $10\%$ for patients with a previous consultation history; for new patients, XML better exploits the available features, outperforming the benchmark in favorable scenarios, with particular gains on recall metrics.
    Abstract Recommendation Systems (RS) are often used to address the issue of medical doctor referrals. However, these systems require access to patient feedback and medical records, which may not always be available in real-world scenarios. Our research focuses on medical referrals and aims to predict recommendations in different specialties of physicians for both new patients and those with a consultation history. We use Extreme Multilabel Classification (XML), commonly employed in text-based classification tasks, to encode available features and explore different scenarios. While its potential for recommendation tasks has often been suggested, this has not been thoroughly explored in the literature. Motivated by the doctor referral case, we show how to recast a traditional recommender setting into a multilabel classification problem that current XML methods can solve. Further, we propose a unified model leveraging patient history across different specialties. Compared to state-of-the-art RS using the same features, our approach consistently improves standard recommendation metrics up to approximately $10\%$ for patients with a previous consultation history. For new patients, XML proves better at exploiting available features, outperforming the benchmark in favorable scenarios, with particular emphasis on recall metrics. Thus, our approach brings us one step closer to creating more effective and personalized doctor referral systems. Additionally, it highlights XML as a promising alternative to current hybrid or content-based RS, while identifying key aspects to take into account when using XML for recommendation tasks.

Multi-Task Hypergraphs for Semi-supervised Learning using Earth Observations

  • paper_url: http://arxiv.org/abs/2308.11021
  • repo_url: None
  • paper_authors: Mihai Pirvu, Alina Marcu, Alexandra Dobrescu, Nabil Belbachir, Marius Leordeanu
  • for: Addresses Earth Observation, a problem that is highly multi-task and often suffers from missing ground-truth data.
  • methods: Builds a multi-task hypergraph in which every node is a task; different paths through the hypergraph reaching a given task become unsupervised teachers, forming ensembles that learn to generate reliable pseudolabels for that task.
  • results: Extensive experiments on the NASA NEO dataset, spanning 22 years, show consistent improvements over strong baselines and recent work; the hypergraph also adapts unsupervised to gradual data-distribution shifts and reliably recovers missing data for several observational layers for up to seven years.
    Abstract There are many ways of interpreting the world and they are highly interdependent. We exploit such complex dependencies and introduce a powerful multi-task hypergraph, in which every node is a task and different paths through the hypergraph reaching a given task become unsupervised teachers, by forming ensembles that learn to generate reliable pseudolabels for that task. Each hyperedge is part of an ensemble teacher for a given task and it is also a student of the self-supervised hypergraph system. We apply our model to one of the most important problems of our times, that of Earth Observation, which is highly multi-task and it often suffers from missing ground-truth data. By performing extensive experiments on the NASA NEO Dataset, spanning a period of 22 years, we demonstrate the value of our multi-task semi-supervised approach, by consistent improvements over strong baselines and recent work. We also show that the hypergraph can adapt unsupervised to gradual data distribution shifts and reliably recover, through its multi-task self-supervision process, the missing data for several observational layers for up to seven years.
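The ensemble-teacher idea, several distinct paths through the hypergraph reaching the same task and jointly producing a pseudolabel, reduces to very little code. A minimal sketch with mean aggregation standing in for the paper's combination rule:

```python
# Sketch of ensemble pseudolabeling over hypergraph paths: several teacher
# paths reach the same target task; their aggregated predictions become
# the pseudolabel used where ground truth is missing. Mean aggregation is
# an assumed stand-in for the paper's actual combination rule.
import numpy as np

def pseudolabel(teachers, x):
    # teachers: callables, each a distinct path through the hypergraph
    preds = np.stack([t(x) for t in teachers])
    return preds.mean(axis=0)  # ensemble estimate as the pseudolabel
```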

Instance-based Learning with Prototype Reduction for Real-Time Proportional Myocontrol: A Randomized User Study Demonstrating Accuracy-preserving Data Reduction for Prosthetic Embedded Systems

  • paper_url: http://arxiv.org/abs/2308.11019
  • repo_url: None
  • paper_authors: Tim Sziburis, Markus Nowak, Davide Brunelli
  • for: Designs, implements, and validates kNN-based gesture detection for proportional prosthesis control.
  • methods: To cope with the high computational demands of instance-based prediction, evaluates dataset-reduction methods, including Decision Surface Mapping (DSM), with real-time determinism in mind for battery-powered embedded devices, using an eight-channel sEMG armband.
  • results: In a randomized, double-blind user study, the kNN-based methods achieve significantly higher online success rates than the Ridge Regression baselines, and DSM-kNN matches plain kNN in accuracy and timing behavior despite a reduction rate of over 99%.
    Abstract This work presents the design, implementation and validation of learning techniques based on the kNN scheme for gesture detection in prosthetic control. To cope with high computational demands in instance-based prediction, methods of dataset reduction are evaluated considering real-time determinism to allow for the reliable integration into battery-powered portable devices. The influence of parameterization and varying proportionality schemes is analyzed, utilizing an eight-channel-sEMG armband. Besides offline cross-validation accuracy, success rates in real-time pilot experiments (online target achievement tests) are determined. Based on the assessment of specific dataset reduction techniques' adequacy for embedded control applications regarding accuracy and timing behaviour, Decision Surface Mapping (DSM) proves itself promising when applying kNN on the reduced set. A randomized, double-blind user study was conducted to evaluate the respective methods (kNN and kNN with DSM-reduction) against Ridge Regression (RR) and RR with Random Fourier Features (RR-RFF). The kNN-based methods performed significantly better (p<0.0005) than the regression techniques. Between DSM-kNN and kNN, there was no statistically significant difference (significance level 0.05). This is remarkable in consideration of only one sample per class in the reduced set, thus yielding a reduction rate of over 99% while preserving success rate. The same behaviour could be confirmed in an extended user study. With k=1, which turned out to be an excellent choice, the runtime complexity of both kNN (in every prediction step) as well as DSM-kNN (in the training phase) becomes linear concerning the number of original samples, favouring dependable wearable prosthesis applications.
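The control loop is conceptually small: classify the gesture with 1-NN over a drastically reduced prototype set and scale the output proportionally. A minimal sketch in which one class-mean prototype per class stands in for DSM's selection rule (the abstract reports one sample per class after reduction but not how it is chosen), and the norm of the sEMG feature vector is an assumed proportionality signal:

```python
# Sketch of prototype-reduced 1-NN proportional control: classification
# uses one prototype per class (mirroring the reported >99% reduction),
# and activation magnitude scales the control output. Class means stand
# in for DSM's actual prototype selection.
import numpy as np

def reduce_to_prototypes(X, y):
    classes = np.unique(y)
    protos = np.stack([X[y == c].mean(axis=0) for c in classes])
    return protos, classes

def predict_proportional(x, protos, classes):
    d = np.linalg.norm(protos - x, axis=1)
    gesture = classes[d.argmin()]         # 1-NN over prototypes
    intensity = float(np.linalg.norm(x))  # proportional component
    return gesture, intensity
```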

Personalized Event Prediction for Electronic Health Records

  • paper_url: http://arxiv.org/abs/2308.11013
  • repo_url: None
  • paper_authors: Jeong Min Lee, Milos Hauskrecht
  • for: Predicting clinical event sequences to improve the quality of patient care.
  • methods: Personalizes prediction via refinement of population-wide models to subpopulations, self-adaptation, and meta-level model switching that adaptively selects the model most likely to support the immediate prediction.
  • results: Multiple prediction models are analyzed and tested on clinical event sequences of patients in the MIMIC-III database, improving predictive accuracy.
    Abstract Clinical event sequences consist of hundreds of clinical events that represent records of patient care in time. Developing accurate predictive models of such sequences is of a great importance for supporting a variety of models for interpreting/classifying the current patient condition, or predicting adverse clinical events and outcomes, all aimed to improve patient care. One important challenge of learning predictive models of clinical sequences is their patient-specific variability. Based on underlying clinical conditions, each patient's sequence may consist of different sets of clinical events (observations, lab results, medications, procedures). Hence, simple population-wide models learned from event sequences for many different patients may not accurately predict patient-specific dynamics of event sequences and their differences. To address the problem, we propose and investigate multiple new event sequence prediction models and methods that let us better adjust the prediction for individual patients and their specific conditions. The methods developed in this work pursue refinement of population-wide models to subpopulations, self-adaptation, and a meta-level model switching that is able to adaptively select the model with the best chance to support the immediate prediction. We analyze and test the performance of these models on clinical event sequences of patients in MIMIC-III database.
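Meta-level model switching, one of the three personalization mechanisms, can be sketched directly: score each candidate model on the patient's own recent events and let the winner make the next prediction. The `loss`/`predict` model interface is a placeholder, not the paper's API:

```python
# Sketch of meta-level model switching: before each prediction, pick the
# model with the best recent performance on this patient's own history.
# Candidate models (population-wide, subpopulation, self-adapted) and the
# scoring rule are placeholders.
import numpy as np

def switch_and_predict(models, history_x, history_y, x_next, window=20):
    xs, ys = history_x[-window:], history_y[-window:]
    # lower mean error on the patient's recent events wins
    errs = [np.mean([m.loss(x, y) for x, y in zip(xs, ys)]) for m in models]
    best = models[int(np.argmin(errs))]
    return best.predict(x_next)
```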

Using language models in the implicit automated assessment of mathematical short answer items

  • paper_url: http://arxiv.org/abs/2308.11006
  • repo_url: None
  • paper_authors: Christopher Ormerod
  • for: This paper proposes a new way to assess short constructed responses to mathematics items, using a value identification pipeline to determine the correctness of the response and identify any misconceptions.
  • methods: The paper uses a pipeline consisting of two fine-tuned language models to identify the key values specified in the student response, with the first model determining if a value is implicit in the response and the second model identifying where the key value is specified.
  • results: The value identification pipeline is shown to be a more accurate and informative way to assess short constructed responses than traditional rubric-based scoring, and can provide more targeted feedback to students to help them improve their understanding of mathematics.
    Abstract We propose a new way to assess certain short constructed responses to mathematics items. Our approach uses a pipeline that identifies the key values specified by the student in their response. This allows us to determine the correctness of the response, as well as identify any misconceptions. The information from the value identification pipeline can then be used to provide feedback to the teacher and student. The value identification pipeline consists of two fine-tuned language models. The first model determines if a value is implicit in the student response. The second model identifies where in the response the key value is specified. We consider both a generic model that can be used for any prompt and value, as well as models that are specific to each prompt and value. The value identification pipeline is a more accurate and informative way to assess short constructed responses than traditional rubric-based scoring. It can be used to provide more targeted feedback to students, which can help them improve their understanding of mathematics.

Autonomous Detection of Methane Emissions in Multispectral Satellite Data Using Deep Learning

  • paper_url: http://arxiv.org/abs/2308.11003
  • repo_url: None
  • paper_authors: Bertrand Rouet-Leduc, Thomas Kerdreux, Alexandre Tuel, Claudia Hulbert
  • for: Rapidly curbing global warming requires cutting methane emissions, but current monitoring relies on approximate emission factors or self-reporting, which often dramatically underestimate emissions.
  • methods: Uses deep learning image recognition to automate the detection of methane leaks in Sentinel-2 multispectral satellite data, without a priori knowledge of potential leak sites.
  • results: Achieves dramatically reduced false-positive rates compared with state-of-the-art multispectral methane data products, paving the way for automated, high-definition, high-frequency monitoring of point-source methane emissions worldwide with far less manual analysis.
    Abstract Methane is one of the most potent greenhouse gases, and its short atmospheric half-life makes it a prime target to rapidly curb global warming. However, current methane emission monitoring techniques primarily rely on approximate emission factors or self-reporting, which have been shown to often dramatically underestimate emissions. Although initially designed to monitor surface properties, satellite multispectral data has recently emerged as a powerful method to analyze atmospheric content. However, the spectral resolution of multispectral instruments is poor, and methane measurements are typically very noisy. Methane data products are also sensitive to absorption by the surface and other atmospheric gases (water vapor in particular) and therefore provide noisy maps of potential methane plumes, that typically require extensive human analysis. Here, we show that the image recognition capabilities of deep learning methods can be leveraged to automatize the detection of methane leaks in Sentinel-2 satellite multispectral data, with dramatically reduced false positive rates compared with state-of-the-art multispectral methane data products, and without the need for a priori knowledge of potential leak sites. Our proposed approach paves the way for the automated, high-definition and high-frequency monitoring of point-source methane emissions across the world.

SupEuclid: Extremely Simple, High Quality OoD Detection with Supervised Contrastive Learning and Euclidean Distance

  • paper_url: http://arxiv.org/abs/2308.10973
  • repo_url: None
  • paper_authors: Jarrod Haas
  • for: Proposes an extremely simple yet powerful method for Out-of-Distribution (OoD) detection and evaluates it on near and far OoD benchmarks.
  • methods: Trains ResNet18 with Supervised Contrastive Learning (SCL) and uses plain Euclidean distance as the scoring rule.
  • results: Achieves state-of-the-art results out-of-the-box on near and far OoD detection benchmarks, without larger models, pretraining, exposure to OoD examples, or extra hyperparameter tuning.
    Abstract Out-of-Distribution (OoD) detection has developed substantially in the past few years, with available methods approaching, and in a few cases achieving, perfect data separation on standard benchmarks. These results generally involve large or complex models, pretraining, exposure to OoD examples or extra hyperparameter tuning. Remarkably, it is possible to achieve results that can exceed many of these state-of-the-art methods with a very simple method. We demonstrate that ResNet18 trained with Supervised Contrastive Learning (SCL) produces state-of-the-art results out-of-the-box on near and far OoD detection benchmarks using only Euclidean distance as a scoring rule. This may obviate the need in some cases for more sophisticated methods or larger models, and at the very least provides a very strong, easy to use baseline for further experimentation and analysis.
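The scoring rule itself takes only a few lines once features exist. A minimal sketch that scores OoD-ness as the Euclidean distance to the nearest in-distribution class mean in the SCL embedding space; using class means as reference points is an assumption, since the abstract only specifies Euclidean distance:

```python
# Minimal sketch of Euclidean-distance OoD scoring on learned features,
# assuming a feature extractor trained with supervised contrastive loss
# (training omitted). Larger scores indicate more likely OoD.
import numpy as np

def fit_class_means(train_feats, train_labels):
    # train_feats: (N, D) penultimate-layer embeddings of ID training data
    classes = np.unique(train_labels)
    return np.stack([train_feats[train_labels == c].mean(axis=0)
                     for c in classes])

def ood_scores(test_feats, means):
    d = np.linalg.norm(test_feats[:, None, :] - means[None, :, :], axis=-1)
    return d.min(axis=1)  # distance to nearest class mean
```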

Fat Shattering, Joint Measurability, and PAC Learnability of POVM Hypothesis Classes

  • paper_url: http://arxiv.org/abs/2308.12304
  • repo_url: None
  • paper_authors: Abram Magner, Arun Padakandla
  • for: Characterizes PAC learnability for quantum measurement (POVM) classes, establishing matching necessary and sufficient conditions together with sample-complexity bounds.
  • methods: Analyzes empirical risk minimization in this setting and introduces denoised ERM, shown to be a universal learning rule for POVM and probabilistically observed concept classes.
  • results: Proves that learnability is characterized by finite fat-shattering dimension together with approximate finite partitionability into approximately jointly measurable subsets, gives quantitative sample-complexity upper and lower bounds, and shows that every measurement class on a finite-dimensional Hilbert space is PAC learnable.
    Abstract We characterize learnability for quantum measurement classes by establishing matching necessary and sufficient conditions for their PAC learnability, along with corresponding sample complexity bounds, in the setting where the learner is given access only to prepared quantum states. We first probe the results from previous works on this setting. We show that the empirical risk defined in previous works and matching the definition in the classical theory fails to satisfy the uniform convergence property enjoyed in the classical setting for some learnable classes. Moreover, we show that VC dimension generalization upper bounds in previous work are frequently infinite, even for finite-dimensional POVM classes. To surmount the failure of the standard ERM to satisfy uniform convergence, we define a new learning rule -- denoised ERM. We show this to be a universal learning rule for POVM and probabilistically observed concept classes, and the condition for it to satisfy uniform convergence is finite fat shattering dimension of the class. We give quantitative sample complexity upper and lower bounds for learnability in terms of finite fat-shattering dimension and a notion of approximate finite partitionability into approximately jointly measurable subsets, which allow for sample reuse. We then show that finite fat shattering dimension implies finite coverability by approximately jointly measurable subsets, leading to our matching conditions. We also show that every measurement class defined on a finite-dimensional Hilbert space is PAC learnable. We illustrate our results on several example POVM classes.

MRI Field-transfer Reconstruction with Limited Data: Regularization by Neural Style Transfer

  • paper_url: http://arxiv.org/abs/2308.10968
  • repo_url: None
  • paper_authors: Guoyao Shen, Yancheng Zhu, Hernan Jara, Sean B. Andersson, Chad W. Farris, Stephan Anderson, Xin Zhang
  • for: 这个研究的目的是提高MRI重建的品质,使用深度学习模型并充分利用对于图像重建的热点。
  • methods: 这个研究使用了对于图像重建的深度学习模型,并将对于图像重建的热点转换为一个对于图像重建的热点转换。
  • results: 这个研究发现,使用RNST方法可以从噪压低质量图像中重建高品质图像,并且可以在有限数据情况下进行图像重建。
    Abstract Recent works have demonstrated success in MRI reconstruction using deep learning-based models. However, most reported approaches require training on a task-specific, large-scale dataset. Regularization by denoising (RED) is a general pipeline which embeds a denoiser as a prior for image reconstruction. The potential of RED has been demonstrated for multiple image-related tasks such as denoising, deblurring and super-resolution. In this work, we propose a regularization by neural style transfer (RNST) method to further leverage the priors from the neural transfer and denoising engine. This enables RNST to reconstruct a high-quality image from a noisy low-quality image with different image styles and limited data. We validate RNST with clinical MRI scans from 1.5T and 3T and show that RNST can significantly boost image quality. Our results highlight the capability of the RNST framework for MRI reconstruction and the potential for reconstruction tasks with limited data.
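RED, the pipeline RNST extends, alternates gradient steps on data fidelity with a residual penalty x - D(x) against the embedded denoiser. A minimal sketch with a Gaussian blur standing in for the style-transfer/denoising engine and a generic linear operator standing in for the MRI sampling model:

```python
# Sketch of the regularization-by-denoising (RED) iteration that RNST
# builds on: gradient steps on data fidelity plus a residual penalty
# x - D(x). A Gaussian blur stands in for the denoising engine D, and a
# generic linear forward model for the MRI sampling operator.
import numpy as np
from scipy.ndimage import gaussian_filter

def red_reconstruct(y, A, At, denoise, n_iter=100, mu=0.1, lam=0.2):
    x = At(y)
    for _ in range(n_iter):
        grad = At(A(x) - y) + lam * (x - denoise(x))
        x = x - mu * grad
    return x

# toy usage: identity forward model, blur as the denoising engine
A = At = lambda v: v
denoise = lambda v: gaussian_filter(v, sigma=1.0)
noisy = np.random.default_rng(0).normal(size=(64, 64))
recon = red_reconstruct(noisy, A, At, denoise)
```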

Structured World Models from Human Videos

  • paper_url: http://arxiv.org/abs/2308.10901
  • repo_url: None
  • paper_authors: Russell Mendonca, Shikhar Bahl, Deepak Pathak
  • for: Aims to let robots learn complex, general behaviors directly in the real world.
  • methods: Leverages internet-scale human video to build a structured, human-centric action space grounded in learned visual affordances; a world model is trained on human videos and fine-tuned on a small amount of robot interaction data without task supervision.
  • results: The affordance-space world model lets different robots learn various manipulation skills in complex settings with under 30 minutes of interaction data. Videos: https://human-world-model.github.io
    Abstract We tackle the problem of learning complex, general behaviors directly in the real world. We propose an approach for robots to efficiently learn manipulation skills using only a handful of real-world interaction trajectories from many different settings. Inspired by the success of learning from large-scale datasets in the fields of computer vision and natural language, our belief is that in order to efficiently learn, a robot must be able to leverage internet-scale, human video data. Humans interact with the world in many interesting ways, which can allow a robot to not only build an understanding of useful actions and affordances but also how these actions affect the world for manipulation. Our approach builds a structured, human-centric action space grounded in visual affordances learned from human videos. Further, we train a world model on human videos and fine-tune on a small amount of robot interaction data without any task supervision. We show that this approach of affordance-space world models enables different robots to learn various manipulation skills in complex settings, in under 30 minutes of interaction. Videos can be found at https://human-world-model.github.io

Unlocking Accuracy and Fairness in Differentially Private Image Classification

  • paper_url: http://arxiv.org/abs/2308.10888
  • repo_url: None
  • paper_authors: Leonard Berrada, Soham De, Judy Hanwen Shen, Jamie Hayes, Robert Stanforth, David Stutz, Pushmeet Kohli, Samuel L. Smith, Borja Balle
  • for: Aims to train machine learning models on private data without leaking sensitive information.
  • methods: Fine-tunes pre-trained foundation models under the differential privacy (DP) framework, which provides formal privacy guarantees.
  • results: DP-trained classifiers reach accuracy within a few percent of the non-private state of the art on four datasets, including two medical imaging benchmarks, even under significant distribution shift, and do not exhibit larger performance disparities across demographic groups than non-private models. This milestone could let practitioners train safely on sensitive data while protecting individuals' privacy.
    Abstract Privacy-preserving machine learning aims to train models on private data without leaking sensitive information. Differential privacy (DP) is considered the gold standard framework for privacy-preserving training, as it provides formal privacy guarantees. However, compared to their non-private counterparts, models trained with DP often have significantly reduced accuracy. Private classifiers are also believed to exhibit larger performance disparities across subpopulations, raising fairness concerns. The poor performance of classifiers trained with DP has prevented the widespread adoption of privacy preserving machine learning in industry. Here we show that pre-trained foundation models fine-tuned with DP can achieve similar accuracy to non-private classifiers, even in the presence of significant distribution shifts between pre-training data and downstream tasks. We achieve private accuracies within a few percent of the non-private state of the art across four datasets, including two medical imaging benchmarks. Furthermore, our private medical classifiers do not exhibit larger performance disparities across demographic groups than non-private models. This milestone to make DP training a practical and reliable technology has the potential to widely enable machine learning practitioners to train safely on sensitive datasets while protecting individuals' privacy.
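Differentially private training ultimately rests on the DP-SGD update: clip each per-example gradient to norm C, add Gaussian noise, then average. A minimal sketch of that generic step, not the paper's exact finetuning configuration:

```python
# Sketch of one DP-SGD step: per-example gradients are clipped to norm C
# and Gaussian noise is added before averaging. This is the generic
# recipe, not the paper's training configuration.
import torch

def dp_sgd_step(model, loss_fn, batch_x, batch_y, lr=0.1, C=1.0, sigma=1.0):
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    for x, y in zip(batch_x, batch_y):  # per-example gradients
        grads = torch.autograd.grad(loss_fn(model(x[None]), y[None]), params)
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = min(1.0, C / (float(norm) + 1e-12))  # clip to norm C
        for s, g in zip(summed, grads):
            s.add_(scale * g)
    n = len(batch_x)
    with torch.no_grad():
        for p, s in zip(params, summed):
            noisy = (s + sigma * C * torch.randn_like(s)) / n
            p.add_(-lr * noisy)
```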

Analyzing Transformer Dynamics as Movement through Embedding Space

  • paper_url: http://arxiv.org/abs/2308.10874
  • repo_url: None
  • paper_authors: Sumeet S. Singh
  • for: Explores how Transformer language models give rise to intelligent behaviors such as understanding natural language, recognizing patterns, acquiring knowledge, reasoning, planning, reflecting, and using tools.
  • methods: Adopts a systems approach and develops a mathematical framework that frames Transformer dynamics as movement through embedding space, offering a principled way to think about the emergence of intelligence.
  • results: At its core the Transformer is an embedding-space walker, mapping intelligent behavior to trajectories in this vector space; at each step it composes context into a single composite vector whose location defines the next step. The knowledge and skills the model exhibits are embodied in the organization of vectors in embedding space rather than in specific neurons or layers, and the paper finds some evidence for a semantic space theory in which embedding vectors represent semantic concepts.
    Abstract Transformer language models exhibit intelligent behaviors such as understanding natural language, recognizing patterns, acquiring knowledge, reasoning, planning, reflecting and using tools. This paper explores how their underlying mechanics give rise to intelligent behaviors. We adopt a systems approach to analyze Transformers in detail and develop a mathematical framework that frames their dynamics as movement through embedding space. This novel perspective provides a principled way of thinking about the problem and reveals important insights related to the emergence of intelligence: 1. At its core the Transformer is a Embedding Space walker, mapping intelligent behavior to trajectories in this vector space. 2. At each step of the walk, it composes context into a single composite vector whose location in Embedding Space defines the next step. 3. No learning actually occurs during decoding; in-context learning and generalization are simply the result of different contexts composing into different vectors. 4. Ultimately the knowledge, intelligence and skills exhibited by the model are embodied in the organization of vectors in Embedding Space rather than in specific neurons or layers. These abilities are properties of this organization. 5. Attention's contribution boils down to the association-bias it lends to vector composition and which influences the aforementioned organization. However, more investigation is needed to ascertain its significance. 6. The entire model is composed from two principal operations: data independent filtering and data dependent aggregation. This generalization unifies Transformers with other sequence models and across modalities. Building upon this foundation we formalize and test a semantic space theory which posits that embedding vectors represent semantic concepts and find some evidence of its validity.
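
As a toy illustration of the embedding-space-walker framing (our own sketch, not the paper's code), the snippet below composes a context into a single vector via attention-like weights and takes the nearest vocabulary embedding as the next step of the walk. All sizes and the similarity rule are arbitrary choices.

```python
# Toy illustration of the "embedding-space walk" view: context tokens compose
# into one vector, whose location picks the next step.
import numpy as np

rng = np.random.default_rng(0)
vocab = rng.normal(size=(100, 16))   # hypothetical embedding table
context = vocab[[3, 41, 7]]          # embeddings of the current context tokens

# Data-dependent aggregation: attention-like weights over the context.
query = context[-1]
scores = context @ query / np.sqrt(16)
weights = np.exp(scores - scores.max())
weights /= weights.sum()
composite = weights @ context        # one composite context vector

# The composite vector's location defines the next step of the walk:
# here, the nearest vocabulary embedding by dot-product similarity.
next_token = int(np.argmax(vocab @ composite))
print("next step in the walk:", next_token)
```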

Majorana Demonstrator Data Release for AI/ML Applications

  • paper_url: http://arxiv.org/abs/2308.10856
  • repo_url: None
  • paper_authors: I. J. Arnquist, F. T. Avignone III, A. S. Barabash, C. J. Barton, K. H. Bhimani, E. Blalock, B. Bos, M. Busch, M. Buuck, T. S. Caldwell, Y. -D. Chan, C. D. Christofferson, P. -H. Chu, M. L. Clark, C. Cuesta, J. A. Detwiler, Yu. Efremenko, H. Ejiri, S. R. Elliott, N. Fuad, G. K. Giovanetti, M. P. Green, J. Gruszko, I. S. Guinn, V. E. Guiseppe, C. R. Haufe, R. Henning, D. Hervas Aguilar, E. W. Hoppe, A. Hostiuc, M. F. Kidd, I. Kim, R. T. Kouzes, T. E. Lannen V, A. Li, J. M. Lopez-Castano, R. D. Martin, R. Massarczyk, S. J. Meijer, S. Mertens, T. K. Oli, L. S. Paudel, W. Pettus, A. W. P. Poon, B. Quenallata, D. C. Radford, A. L. Reine, K. Rielage, N. W. Ruof, D. C. Schaper, S. J. Schleich, D. Tedeschi, R. L. Varner, S. Vasilyev, S. L. Watkins, J. F. Wilkerson, C. Wiseman, W. Xu, C. -H. Yu, B. X. Zhu
  • for: The release is intended to support the training and testing of artificial intelligence (AI) and machine learning (ML) algorithms and their application to data from the Majorana Demonstrator experiment.
  • methods: The data were produced through the Majorana Demonstrator's data acquisition, processing, and analysis chain.
  • results: The release provides a curated subset of Majorana calibration data, including raw germanium detector waveforms, pulse shape discrimination cuts, and calibrated final energies, packaged for training and testing AI and ML algorithms.
    Abstract The enclosed data release consists of a subset of the calibration data from the Majorana Demonstrator experiment. Each Majorana event is accompanied by raw Germanium detector waveforms, pulse shape discrimination cuts, and calibrated final energies, all shared in an HDF5 file format along with relevant metadata. This release is specifically designed to support the training and testing of Artificial Intelligence (AI) and Machine Learning (ML) algorithms upon our data. This document is structured as follows. Section I provides an overview of the dataset's content and format; Section II outlines the location of this dataset and the method for accessing it; Section III presents the NPML Machine Learning Challenge associated with this dataset; Section IV contains a disclaimer from the Majorana collaboration regarding the use of this dataset; Appendix A contains technical details of this data release. Please direct questions about the material provided within this release to liaobo77@ucsd.edu (A. Li).
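
For orientation only, here is one way the HDF5 files might be read with h5py. The filename and dataset keys below are hypothetical placeholders; the release's Appendix A documents the actual layout.

```python
# Sketch of reading the release with h5py. The keys ("waveform", "energy",
# "psd_pass") and the filename are assumed, not taken from the release.
import h5py
import numpy as np

with h5py.File("mjd_calibration_subset.hdf5", "r") as f:  # hypothetical filename
    f.visit(print)                         # list the groups/datasets actually present
    waveforms = np.asarray(f["waveform"])  # raw Ge detector waveforms (assumed key)
    energies = np.asarray(f["energy"])     # calibrated final energies (assumed key)
    psd_cut = np.asarray(f["psd_pass"])    # pulse-shape discrimination flag (assumed key)

# Example ML preprocessing: keep PSD-passing events, normalize each waveform.
x = waveforms[psd_cut.astype(bool)]
x = (x - x.mean(axis=1, keepdims=True)) / (x.std(axis=1, keepdims=True) + 1e-8)
print(x.shape, energies.shape)
```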

Evaluating quantum generative models via imbalanced data classification benchmarks

  • paper_url: http://arxiv.org/abs/2308.10847
  • repo_url: None
  • paper_authors: Graham R. Enos, Matthew J. Reagor, Eric Hulburd
  • for: The paper uses explainable-AI techniques to analyze whether the behavior of quantum machine learning models diverges from that of conventional models, applied systematically to real-world datasets of varying complexity and class imbalance.
  • methods: Synthetic data are generated with a hybrid quantum-classical neural network adapted from twenty real-world datasets, including solar flare, cardiac arrhythmia, and speech data, each exhibiting different degrees of complexity and class imbalance.
  • results: Benchmarking the quantum-generated data against state-of-the-art methods for mitigating class imbalance on the associated classification tasks reveals which qualities make a problem more or less amenable to a hybrid quantum-classical generative model.
    Abstract A limited set of tools exist for assessing whether the behavior of quantum machine learning models diverges from conventional models, outside of abstract or theoretical settings. We present a systematic application of explainable artificial intelligence techniques to analyze synthetic data generated from a hybrid quantum-classical neural network adapted from twenty different real-world data sets, including solar flares, cardiac arrhythmia, and speech data. Each of these data sets exhibits varying degrees of complexity and class imbalance. We benchmark the quantum-generated data relative to state-of-the-art methods for mitigating class imbalance for associated classification tasks. We leverage this approach to elucidate the qualities of a problem that make it more or less likely to be amenable to a hybrid quantum-classical generative model.
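
The benchmarking idea can be sketched as follows: augment the minority class once with a standard imbalance-mitigation baseline (SMOTE here) and once with generator-produced samples, then compare downstream classifier scores. The `quantum_samples` array below is a random stand-in for the hybrid quantum-classical generator's output, which is not reproduced here.

```python
# Sketch: compare synthetic-minority augmentation against a SMOTE baseline.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

def score(X_aug, y_aug):
    clf = RandomForestClassifier(random_state=0).fit(X_aug, y_aug)
    return f1_score(y_te, clf.predict(X_te))

# Baseline: classical oversampling via SMOTE.
X_sm, y_sm = SMOTE(random_state=0).fit_resample(X_tr, y_tr)

# Stand-in for quantum-generated minority samples (random noise for illustration).
minority = X_tr[y_tr == 1]
quantum_samples = minority + 0.1 * np.random.default_rng(0).normal(size=minority.shape)
X_q = np.vstack([X_tr, quantum_samples])
y_q = np.concatenate([y_tr, np.ones(len(quantum_samples), dtype=int)])

print(f"SMOTE F1: {score(X_sm, y_sm):.3f}  generator F1: {score(X_q, y_q):.3f}")
```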

Real World Time Series Benchmark Datasets with Distribution Shifts: Global Crude Oil Price and Volatility

  • paper_url: http://arxiv.org/abs/2308.10846
  • repo_url: https://github.com/oilpricebenchmarks/COB
  • paper_authors: Pranay Pasula
  • for: COB is created to address the scarcity of task-labeled time-series benchmarks in the financial domain, covering the three most important crude oil benchmarks in the world.
  • methods: Asset price data are transformed into volatility proxies, models are fitted with expectation-maximization (EM), and contextual task labels aligned with real-world events are generated from the resulting distribution shifts.
  • results: Including the task labels universally improves the performance of four continual learning algorithms over multiple forecasting horizons, demonstrating the datasets' value for handling distribution shifts in real-world data.
    Abstract The scarcity of task-labeled time-series benchmarks in the financial domain hinders progress in continual learning. Addressing this deficit would foster innovation in this area. Therefore, we present COB, Crude Oil Benchmark datasets. COB includes 30 years of asset prices that exhibit significant distribution shifts and optimally generates corresponding task (i.e., regime) labels based on these distribution shifts for the three most important crude oils in the world. Our contributions include creating real-world benchmark datasets by transforming asset price data into volatility proxies, fitting models using expectation-maximization (EM), generating contextual task labels that align with real-world events, and providing these labels as well as the general algorithm to the public. We show that the inclusion of these task labels universally improves performance on four continual learning algorithms, some state-of-the-art, over multiple forecasting horizons. We hope these benchmarks accelerate research in handling distribution shifts in real-world data, especially due to the global importance of the assets considered. We've made the (1) raw price data, (2) task labels generated by our approach, (3) and code for our algorithm available at https://oilpricebenchmarks.github.io.
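
A minimal sketch of the labeling pipeline the abstract describes, under our own assumptions: prices become a rolling volatility proxy, and an EM-fitted Gaussian mixture over that proxy yields regime (task) labels. The window length and component count are illustrative; the released code may differ.

```python
# Sketch: prices -> log-return volatility proxy -> EM-fitted mixture -> regime labels.
import numpy as np
from sklearn.mixture import GaussianMixture  # fitted via EM

rng = np.random.default_rng(0)
prices = 60 * np.exp(np.cumsum(rng.normal(0, 0.02, 3000)))  # synthetic stand-in

log_ret = np.diff(np.log(prices))
window = 21
# Rolling realized volatility serves as the proxy series.
vol = np.array([log_ret[i - window:i].std() for i in range(window, len(log_ret))])

# Two-component mixture: calm vs. turbulent regimes as task labels.
gmm = GaussianMixture(n_components=2, random_state=0).fit(vol.reshape(-1, 1))
task_labels = gmm.predict(vol.reshape(-1, 1))
print("regime counts:", np.bincount(task_labels))
```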

Neural Networks Optimizations Against Concept and Data Drift in Malware Detection

  • paper_url: http://arxiv.org/abs/2308.10821
  • repo_url: None
  • paper_authors: William Maillet, Benjamin Marais
  • for: This work aims to improve a baseline neural network's resilience to concept drift in malware detection, addressing the fact that malware constantly evolves and renders trained models outdated.
  • methods: The authors propose a model-agnostic protocol that combines feature reduction, training with the most recent validation set possible, and a loss function called Drift-Resilient Binary Cross-Entropy that improves on the classical binary cross-entropy under drift.
  • results: Trained on the EMBER dataset (2018) and evaluated on malicious files collected between 2020 and 2023, the improved model detects 15.2% more malware than the baseline.
    Abstract Despite the promising results of machine learning models in malware detection, they face the problem of concept drift due to malware's constant evolution. This leads to a decline in performance over time, as the data distribution of new files differs from that of the training set, requiring regular model updates. In this work, we propose a model-agnostic protocol that improves a baseline neural network's ability to handle the drift problem. We show the importance of feature reduction and of training with the most recent validation set possible, and propose a loss function named Drift-Resilient Binary Cross-Entropy, an improvement on the classical Binary Cross-Entropy that is more effective against drift. We train our model on the EMBER dataset (2018) and evaluate it on a dataset of recent malicious files collected between 2020 and 2023. Our improved model shows promising results, detecting 15.2% more malware than a baseline model.
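
Below is a sketch of two of the protocol's model-agnostic ingredients (feature reduction and validating on the most recent files), with plain binary cross-entropy standing in for the paper's Drift-Resilient variant, whose exact form we do not reproduce. Model sizes and the chronological split are illustrative.

```python
# Sketch: feature reduction plus a chronological train/validation split,
# with standard BCE standing in for the paper's Drift-Resilient BCE.
import torch
from torch import nn

n, d, d_reduced = 5000, 2381, 256      # 2381 = EMBER 2018 feature width (assumed)
x = torch.randn(n, d)
y = torch.randint(0, 2, (n, 1)).float()
timestamps = torch.arange(n)           # newer files carry larger timestamps

# Feature reduction via a learned linear projection (one simple choice among many).
model = nn.Sequential(
    nn.Linear(d, d_reduced), nn.ReLU(),
    nn.Linear(d_reduced, 1),
)

# Chronological split: train on older files, validate on the most recent ones.
cutoff = int(0.8 * n)
order = torch.argsort(timestamps)
tr, va = order[:cutoff], order[cutoff:]

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()           # stand-in for the paper's DR-BCE
for epoch in range(5):
    opt.zero_grad()
    loss = bce(model(x[tr]), y[tr])
    loss.backward()
    opt.step()
    with torch.no_grad():
        val = bce(model(x[va]), y[va])  # monitor drift on the newest split
    print(f"epoch {epoch}: train {loss.item():.3f}  recent-val {val.item():.3f}")
```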