cs.LG - 2023-07-01

Minimizing Energy Consumption of Deep Learning Models by Energy-Aware Training

  • paper_url: http://arxiv.org/abs/2307.00368
  • repo_url: None
  • paper_authors: Dario Lazzaro, Antonio Emanuele Cinà, Maura Pintor, Ambra Demontis, Battista Biggio, Fabio Roli, Marcello Pelillo
  • for: Reducing the energy consumption of deep learning models.
  • methods: EAT, a gradient-based training algorithm that uses a differentiable approximation of the $\ell_0$ norm as a sparsity penalty on the training loss to improve energy efficiency.
  • results: Experimental analysis on three datasets and two deep neural networks shows that the energy-aware training algorithm EAT trains networks with a better trade-off between classification performance and energy efficiency.
    Abstract Deep learning models undergo a significant increase in the number of parameters they possess, leading to the execution of a larger number of operations during inference. This expansion significantly contributes to higher energy consumption and prediction latency. In this work, we propose EAT, a gradient-based algorithm that aims to reduce energy consumption during model training. To this end, we leverage a differentiable approximation of the $\ell_0$ norm, and use it as a sparse penalty over the training loss. Through our experimental analysis conducted on three datasets and two deep neural networks, we demonstrate that our energy-aware training algorithm EAT is able to train networks with a better trade-off between classification performance and energy efficiency.
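To make the mechanism concrete, here is a minimal sketch (not the authors' released code) of a sigmoid-based differentiable surrogate for the $\ell_0$ norm added to a standard training loss; the sharpness `beta` and penalty weight `lam` are illustrative choices, not values from the paper.

```python
import torch
import torch.nn as nn

def l0_surrogate(model: nn.Module, beta: float = 10.0) -> torch.Tensor:
    # Smooth stand-in for ||w||_0: 2*sigmoid(beta*|w|) - 1 is ~0 at w = 0 and ~1 for large |w|.
    return sum((2 * torch.sigmoid(beta * p.abs()) - 1).sum() for p in model.parameters())

def energy_aware_step(model, x, y, optimizer, lam: float = 1e-5):
    # One training step: task loss plus a weighted sparsity penalty that discourages active weights.
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y) + lam * l0_surrogate(model)
    loss.backward()
    optimizer.step()
    return loss.item()
```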

Understanding recent deep-learning techniques for identifying collective variables of molecular dynamics

  • paper_url: http://arxiv.org/abs/2307.00365
  • repo_url: None
  • paper_authors: Wei Zhang, Christof Schütte
  • for: Two categories of deep-learning techniques for identifying collective variables (CVs) of high-dimensional metastable molecular systems.
  • methods: The first computes leading eigenfunctions of the infinitesimal generator or transfer operator of the underlying dynamics; the second learns an autoencoder by minimising the reconstruction error.
  • results: A concise overview of the mathematics behind both approaches, together with a comparative numerical study on illustrative examples.
    Abstract High-dimensional metastable molecular system can often be characterised by a few features of the system, i.e. collective variables (CVs). Thanks to the rapid advance in the area of machine learning and deep learning, various deep learning-based CV identification techniques have been developed in recent years, allowing accurate modelling and efficient simulation of complex molecular systems. In this paper, we look at two different categories of deep learning-based approaches for finding CVs, either by computing leading eigenfunctions of infinitesimal generator or transfer operator associated to the underlying dynamics, or by learning an autoencoder via minimisation of reconstruction error. We present a concise overview of the mathematics behind these two approaches and conduct a comparative numerical study of these two approaches on illustrative examples.
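For the autoencoder category, a minimal sketch (illustrative layer sizes, not the authors' implementation) shows how the low-dimensional bottleneck plays the role of the learned CVs and is trained by minimising the reconstruction error.

```python
import torch
import torch.nn as nn

class CVAutoencoder(nn.Module):
    # The bottleneck z is the learned collective-variable representation.
    def __init__(self, n_features: int, n_cv: int = 2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 64), nn.Tanh(), nn.Linear(64, n_cv))
        self.decoder = nn.Sequential(nn.Linear(n_cv, 64), nn.Tanh(), nn.Linear(64, n_features))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

def train(model, loader, epochs: int = 10, lr: float = 1e-3):
    # Minimise reconstruction error over sampled molecular configurations.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x in loader:
            recon, _ = model(x)
            loss = nn.functional.mse_loss(recon, x)
            opt.zero_grad()
            loss.backward()
            opt.step()
```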

The future of human-centric eXplainable Artificial Intelligence (XAI) is not post-hoc explanations

  • paper_url: http://arxiv.org/abs/2307.00364
  • repo_url: None
  • paper_authors: Vinitra Swamy, Jibril Frej, Tanja Käser
  • for: A call to action to move beyond post-hoc explanation of black-box models toward designing interpretable neural network architectures, addressing the limitations of current explainers.
  • methods: Two schemes for interpretable-by-design neural network workflows: adaptive routing for interpretable conditional computation, and diagnostic benchmarks for iterative model learning.
  • results: The paper argues that the future of human-centric XAI lies neither in explaining black boxes nor in reverting to traditional interpretable models, but in neural networks that are intrinsically interpretable.
    Abstract Explainable Artificial Intelligence (XAI) plays a crucial role in enabling human understanding and trust in deep learning systems, often defined as determining which features are most important to a model's prediction. As models get larger, more ubiquitous, and pervasive in aspects of daily life, explainability is necessary to avoid or minimize adverse effects of model mistakes. Unfortunately, current approaches in human-centric XAI (e.g. predictive tasks in healthcare, education, or personalized ads) tend to rely on a single explainer. This is a particularly concerning trend when considering that recent work has identified systematic disagreement in explainability methods when applied to the same points and underlying black-box models. In this paper, we therefore present a call for action to address the limitations of current state-of-the-art explainers. We propose to shift from post-hoc explainability to designing interpretable neural network architectures; moving away from approximation techniques in human-centric and high impact applications. We identify five needs of human-centric XAI (real-time, accurate, actionable, human-interpretable, and consistent) and propose two schemes for interpretable-by-design neural network workflows (adaptive routing for interpretable conditional computation and diagnostic benchmarks for iterative model learning). We postulate that the future of human-centric XAI is neither in explaining black-boxes nor in reverting to traditional, interpretable models, but in neural networks that are intrinsically interpretable.

A Comparative Study of Machine Learning Algorithms for Anomaly Detection in Industrial Environments: Performance and Environmental Impact

  • paper_url: http://arxiv.org/abs/2307.00361
  • repo_url: None
  • paper_authors: Álvaro Huertas-García, Carlos Martí-González, Rubén García Maezo, Alejandro Echeverría Rey
  • for: Developing sustainable AI and machine learning models for anomaly detection in industrial environments, addressing their high computational demands and associated environmental impact.
  • methods: An extensive set of machine learning algorithms and various Multilayer Perceptron (MLP) configurations were carefully evaluated on both predictive performance and environmental footprint.
  • results: Traditional algorithms such as Decision Trees and Random Forests achieve robust performance and efficiency, while optimised MLP configurations deliver superior performance at the cost of higher resource consumption.
    Abstract In the context of Industry 4.0, the use of artificial intelligence (AI) and machine learning for anomaly detection is being hampered by high computational requirements and associated environmental effects. This study seeks to address the demands of high-performance machine learning models with environmental sustainability, contributing to the emerging discourse on 'Green AI.' An extensive variety of machine learning algorithms, coupled with various Multilayer Perceptron (MLP) configurations, were meticulously evaluated. Our investigation encapsulated a comprehensive suite of evaluation metrics, comprising Accuracy, Area Under the Curve (AUC), Recall, Precision, F1 Score, Kappa Statistic, Matthews Correlation Coefficient (MCC), and F1 Macro. Simultaneously, the environmental footprint of these models was gauged through considerations of time duration, CO2 equivalent, and energy consumption during the training, cross-validation, and inference phases. Traditional machine learning algorithms, such as Decision Trees and Random Forests, demonstrate robust efficiency and performance. However, superior outcomes were obtained with optimised MLP configurations, albeit with a commensurate increase in resource consumption. The study incorporated a multi-objective optimisation approach, invoking Pareto optimality principles, to highlight the trade-offs between a model's performance and its environmental impact. The insights derived underscore the imperative of striking a balance between model performance, complexity, and environmental implications, thus offering valuable directions for future work in the development of environmentally conscious machine learning models for industrial applications.
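As a rough illustration of the Pareto-optimality step described above (with made-up accuracy/energy numbers, not the study's measurements), one can keep only the models that no other model dominates on both objectives:

```python
def pareto_front(models):
    # models: list of (name, accuracy, energy_kwh) tuples; keep points that no other model
    # dominates, i.e. no model is at least as accurate and as cheap while strictly better on one axis.
    front = []
    for name, acc, energy in models:
        dominated = any(a >= acc and e <= energy and (a > acc or e < energy)
                        for _, a, e in models)
        if not dominated:
            front.append((name, acc, energy))
    return front

candidates = [("decision_tree", 0.91, 0.02), ("random_forest", 0.94, 0.10), ("mlp_large", 0.96, 0.85)]
print(pareto_front(candidates))  # all three survive here; a dominated model would be dropped
```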

When Synthetic Data Met Regulation

  • paper_url: http://arxiv.org/abs/2307.00359
  • repo_url: None
  • paper_authors: Georgi Ganev
  • for: Arguing that synthetic data produced by differentially private generative models can be sufficiently anonymized and therefore treated as anonymous, regulation-compliant data.
  • methods: Differentially private generative models are used to produce the synthetic data.
  • results: Such synthetic data can provide sufficient privacy protection and satisfy regulatory requirements.
    Abstract In this paper, we argue that synthetic data produced by Differentially Private generative models can be sufficiently anonymized and, therefore, anonymous data and regulatory compliant.

Fedward: Flexible Federated Backdoor Defense Framework with Non-IID Data

  • paper_url: http://arxiv.org/abs/2307.00356
  • repo_url: None
  • paper_authors: Zekai Chen, Fuyi Wang, Zhiwei Zheng, Ximeng Liu, Yujie Lin
  • for: Defending against federated backdoor attacks (FBA) in federated learning (FL) while preserving the privacy of sensitive local data.
  • methods: The Flexible Federated Backdoor Defense Framework (Fedward) decomposes FBA into several attacks and addresses each with amplified magnitude sparsification (AmGrad) and adaptive OPTICS clustering (AutoOPTICS); an adaptive clipping method treats the number of samples in the benign group as a constraint on the boundary to maintain performance in Non-IID scenarios.
  • results: Experiments on three benchmark datasets, compared against state-of-the-art studies, show promising defense performance: a 33%~75% improvement over clustering defense methods, and 96.98%, 90.74%, and 89.8% for the average FBA success rate under Non-IID conditions on MNIST, FMNIST, and CIFAR10, respectively.
    Abstract Federated learning (FL) enables multiple clients to collaboratively train deep learning models while considering sensitive local datasets' privacy. However, adversaries can manipulate datasets and upload models by injecting triggers for federated backdoor attacks (FBA). Existing defense strategies against FBA consider specific and limited attacker models, and a sufficient amount of noise to be injected only mitigates rather than eliminates FBA. To address these deficiencies, we introduce a Flexible Federated Backdoor Defense Framework (Fedward) to ensure the elimination of adversarial backdoors. We decompose FBA into various attacks, and design amplified magnitude sparsification (AmGrad) and adaptive OPTICS clustering (AutoOPTICS) to address each attack. Meanwhile, Fedward uses the adaptive clipping method by regarding the number of samples in the benign group as constraints on the boundary. This ensures that Fedward can maintain the performance for the Non-IID scenario. We conduct experimental evaluations over three benchmark datasets and thoroughly compare them to state-of-the-art studies. The results demonstrate the promising defense performance from Fedward, moderately improved by 33% $\sim$ 75 in clustering defense methods, and 96.98%, 90.74%, and 89.8% for Non-IID to the utmost extent for the average FBA success rate over MNIST, FMNIST, and CIFAR10, respectively.

Sparse-Input Neural Network using Group Concave Regularization

  • paper_url: http://arxiv.org/abs/2307.00344
  • repo_url: https://github.com/r08in/gcrnn
  • paper_authors: Bin Luo, Susan Halabi
  • for: Feature selection in neural networks, especially in high-dimensional settings where the number of variables exceeds the available sample size.
  • methods: A sparse-input neural network framework with group concave regularization: a suitable concave penalty is applied to the $l_2$ norm of the weights on all outgoing connections of each input node, yielding a network that uses only a small subset of the original variables.
  • results: Extensive simulation studies and real data examples show satisfactory finite-sample performance of the proposed estimator in feature selection and in prediction of continuous, binary, and time-to-event outcomes.
    Abstract Simultaneous feature selection and non-linear function estimation are challenging, especially in high-dimensional settings where the number of variables exceeds the available sample size in modeling. In this article, we investigate the problem of feature selection in neural networks. Although the group LASSO has been utilized to select variables for learning with neural networks, it tends to select unimportant variables into the model to compensate for its over-shrinkage. To overcome this limitation, we propose a framework of sparse-input neural networks using group concave regularization for feature selection in both low-dimensional and high-dimensional settings. The main idea is to apply a proper concave penalty to the $l_2$ norm of weights from all outgoing connections of each input node, and thus obtain a neural net that only uses a small subset of the original variables. In addition, we develop an effective algorithm based on backward path-wise optimization to yield stable solution paths, in order to tackle the challenge of complex optimization landscapes. Our extensive simulation studies and real data examples demonstrate satisfactory finite sample performances of the proposed estimator, in feature selection and prediction for modeling continuous, binary, and time-to-event outcomes.
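The abstract does not fix a particular concave penalty, so the sketch below uses the minimax concave penalty (MCP) as one common choice; it penalises the $l_2$ norm of each input node's outgoing weights so that unimportant inputs can be zeroed out entirely. Hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

def mcp(t: torch.Tensor, lam: float = 0.1, gamma: float = 3.0) -> torch.Tensor:
    # Minimax concave penalty applied element-wise (here to a vector of group norms).
    return torch.where(t <= gamma * lam,
                       lam * t - t ** 2 / (2 * gamma),
                       torch.full_like(t, 0.5 * gamma * lam ** 2))

def group_concave_penalty(first_layer: nn.Linear, lam: float = 0.1, gamma: float = 3.0) -> torch.Tensor:
    # One group per input variable: the l2 norm of all outgoing weights of that input node.
    group_norms = first_layer.weight.norm(p=2, dim=0)
    return mcp(group_norms, lam, gamma).sum()

# Usage sketch: total loss = task loss + group concave penalty on the input layer.
# loss = nn.functional.mse_loss(model(x), y) + group_concave_penalty(model.input_layer)
```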

Recursive Algorithmic Reasoning

  • paper_url: http://arxiv.org/abs/2307.00337
  • repo_url: https://github.com/DJayalath/gnn-call-stack
  • paper_authors: Dulhan Jayalath, Jonas Jürß, Petar Veličković
  • for: Improving generalization to out-of-distribution data in deep learning by enabling graph neural networks (GNNs) to execute recursive algorithms.
  • methods: (1) Augmenting GNNs with a stack so the network can learn to store and recall a portion of its state, and (2) capturing intermediate algorithm trajectories to improve algorithmic alignment with recursive algorithms.
  • results: Experiments show significantly improved generalization to larger input graphs compared with prior work on depth-first search (DFS).
    Abstract Learning models that execute algorithms can enable us to address a key problem in deep learning: generalizing to out-of-distribution data. However, neural networks are currently unable to execute recursive algorithms because they do not have arbitrarily large memory to store and recall state. To address this, we (1) propose a way to augment graph neural networks (GNNs) with a stack, and (2) develop an approach for capturing intermediate algorithm trajectories that improves algorithmic alignment with recursive algorithms over previous methods. The stack allows the network to learn to store and recall a portion of the state of the network at a particular time, analogous to the action of a call stack in a recursive algorithm. This augmentation permits the network to reason recursively. We empirically demonstrate that our proposals significantly improve generalization to larger input graphs over prior work on depth-first search (DFS).
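A purely illustrative sketch of the stack idea (hypothetical names; the paper's actual architecture learns when to push and pop): snapshots of node embeddings are saved before descending into a subproblem and restored on return, mirroring a call stack.

```python
import torch

class EmbeddingStack:
    # Call-stack analogue for network state: stores snapshots of node embeddings.
    def __init__(self):
        self._frames = []

    def push(self, node_embeddings: torch.Tensor) -> None:
        self._frames.append(node_embeddings.clone())

    def pop(self) -> torch.Tensor:
        return self._frames.pop()

# Inside a message-passing loop (gnn_layer and edge_index are hypothetical):
# stack.push(h)                 # save state before "recursing" into a neighbour
# h = gnn_layer(h, edge_index)  # process the subproblem
# h = stack.pop()               # restore the caller's state on return
```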

Single Sequence Prediction over Reasoning Graphs for Multi-hop QA

  • paper_url: http://arxiv.org/abs/2307.00335
  • repo_url: None
  • paper_authors: Gowtham Ramesh, Makesh Sreedhar, Junjie Hu
  • for: Improving both the accuracy and the interpretability of multi-hop question answering (QA) by predicting a single sequence over a local reasoning graph.
  • methods: A graph neural network encodes a graph connecting key entities in each context passage to relevant subsequent passages, and the resulting representations are fused into the model's entity representations.
  • results: Experiments show significant improvements in answer exact-match/F1 scores and in the faithfulness of the reasoning path on HotpotQA, and state-of-the-art results on Musique with only up to a 4% increase in model parameters.
    Abstract Recent generative approaches for multi-hop question answering (QA) utilize the fusion-in-decoder method (Izacard and Grave, 2021) to generate a single sequence output which includes both a final answer and a reasoning path taken to arrive at that answer, such as passage titles and key facts from those passages. While such models can lead to better interpretability and high quantitative scores, they often have difficulty accurately identifying the passages corresponding to key entities in the context, resulting in incorrect passage hops and a lack of faithfulness in the reasoning path. To address this, we propose a single-sequence prediction method over a local reasoning graph (code/models to be released at https://github.com/gowtham1997/SeqGraph) that integrates a graph structure connecting key entities in each context passage to relevant subsequent passages for each question. We use a graph neural network to encode this graph structure and fuse the resulting representations into the entity representations of the model. Our experiments show significant improvements in answer exact-match/F1 scores and faithfulness of grounding in the reasoning path on the HotpotQA dataset and achieve state-of-the-art numbers on the Musique dataset with only up to a 4% increase in model parameters.

Variation-aware Vision Transformer Quantization

  • paper_url: http://arxiv.org/abs/2307.00331
  • repo_url: https://github.com/huangowen/vvtq
  • paper_authors: Xijie Huang, Zhiqiang Shen, Kwang-Ting Cheng
  • for: Improving the stability and efficiency of quantization-aware training (QAT) for Vision Transformers and reducing the training oscillations caused by weight variation.
  • methods: A quantization sensitivity analysis comparing the variation behaviors of ViTs and CNNs, followed by a knowledge-distillation-based variation-aware quantization method with multi-crop knowledge distillation, a module-dependent quantization scheme, and a variation-aware regularization term.
  • results: On ImageNet-1K, the 2-bit Swin-T model reaches 77.66% Top-1 accuracy, 3.35% higher than the previous state-of-the-art quantized model.
    Abstract Despite the remarkable performance of Vision Transformers (ViTs) in various visual tasks, the expanding computation and model size of ViTs have increased the demand for improved efficiency during training and inference. To address the heavy computation and parameter drawbacks, quantization is frequently studied in the community as a representative model compression technique and has seen extensive use on CNNs. However, due to the unique properties of CNNs and ViTs, the quantization applications on ViTs are still limited and underexplored. In this paper, we identify the difficulty of ViT quantization on its unique variation behaviors, which differ from traditional CNN architectures. The variations indicate the magnitude of the parameter fluctuations and can also measure outlier conditions. Moreover, the variation behaviors reflect the various sensitivities to the quantization of each module. The quantization sensitivity analysis and comparison of ViTs with CNNs help us locate the underlying differences in variations. We also find that the variations in ViTs cause training oscillations, bringing instability during quantization-aware training (QAT). Correspondingly, we solve the variation problem with an efficient knowledge-distillation-based variation-aware quantization method. The multi-crop knowledge distillation scheme can accelerate and stabilize the training and alleviate the variation's influence during QAT. We also proposed a module-dependent quantization scheme and a variation-aware regularization term to suppress the oscillation of weights. On ImageNet-1K, we obtain a 77.66% Top-1 accuracy on the extremely low-bit scenario of 2-bit Swin-T, outperforming the previous state-of-the-art quantized model by 3.35%.

FedCP: Separating Feature Information for Personalized Federated Learning via Conditional Policy

  • paper_url: http://arxiv.org/abs/2307.01217
  • repo_url: https://github.com/tsingz0/fedcp
  • paper_authors: Jianqing Zhang, Yang Hua, Hao Wang, Tao Song, Zhengui Xue, Ruhui Ma, Haibing Guan
  • for: Personalized federated learning (pFL) for privacy protection, collaborative learning, and statistical heterogeneity across clients such as hospitals or mobile devices.
  • methods: The Federated Conditional Policy (FedCP) method generates a conditional policy for each sample that separates the global and personalized information in its features, which are then processed by a global head and a personalized head, respectively.
  • results: Experiments in computer vision and natural language processing show that FedCP outperforms eleven state-of-the-art methods by up to 6.69%, and it maintains its advantage when some clients unexpectedly drop out, as frequently happens in mobile settings.
    Abstract Recently, personalized federated learning (pFL) has attracted increasing attention in privacy protection, collaborative learning, and tackling statistical heterogeneity among clients, e.g., hospitals, mobile smartphones, etc. Most existing pFL methods focus on exploiting the global information and personalized information in the client-level model parameters while neglecting that data is the source of these two kinds of information. To address this, we propose the Federated Conditional Policy (FedCP) method, which generates a conditional policy for each sample to separate the global information and personalized information in its features and then processes them by a global head and a personalized head, respectively. FedCP is more fine-grained to consider personalization in a sample-specific manner than existing pFL methods. Extensive experiments in computer vision and natural language processing domains show that FedCP outperforms eleven state-of-the-art methods by up to 6.69%. Furthermore, FedCP maintains its superiority when some clients accidentally drop out, which frequently happens in mobile settings. Our code is public at https://github.com/TsingZ0/FedCP.
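A loose sketch of the sample-wise separation idea (the gating below is illustrative, not the paper's actual Conditional Policy Network): a learned per-feature gate routes part of each sample's representation to a shared global head and the remainder to a client-specific personalized head.

```python
import torch
import torch.nn as nn

class ConditionalSplitHead(nn.Module):
    def __init__(self, dim: int, n_classes: int):
        super().__init__()
        self.policy = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())  # per-sample, per-feature gate
        self.global_head = nn.Linear(dim, n_classes)      # shared across clients
        self.personal_head = nn.Linear(dim, n_classes)    # kept local to each client

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        g = self.policy(feat)                             # in [0, 1]: "how global" each feature is
        return self.global_head(g * feat) + self.personal_head((1 - g) * feat)
```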

DeepMediX: A Deep Learning-Driven Resource-Efficient Medical Diagnosis Across the Spectrum

  • paper_url: http://arxiv.org/abs/2307.00324
  • repo_url: None
  • paper_authors: Kishore Babu Nampalle, Pradeep Singh, Uppala Vivek Narayan, Balasubramanian Raman
  • for: A high-accuracy, resource-efficient model for medical image diagnosis, addressing the computational-efficiency challenge of the field.
  • methods: DeepMediX builds on the MobileNetV2 architecture to classify brain MRI scans and skin cancer images, achieving superior performance on both binary and multiclass skin cancer datasets; its design also incorporates federated learning.
  • results: Rigorous testing on standard datasets, including ISIC2018, shows exceptional diagnostic capability, matching existing models on almost all tasks and outperforming them in some cases.
    Abstract In the rapidly evolving landscape of medical imaging diagnostics, achieving high accuracy while preserving computational efficiency remains a formidable challenge. This work presents \texttt{DeepMediX}, a groundbreaking, resource-efficient model that significantly addresses this challenge. Built on top of the MobileNetV2 architecture, DeepMediX excels in classifying brain MRI scans and skin cancer images, with superior performance demonstrated on both binary and multiclass skin cancer datasets. It provides a solution to labor-intensive manual processes, the need for large datasets, and complexities related to image properties. DeepMediX's design also includes the concept of Federated Learning, enabling a collaborative learning approach without compromising data privacy. This approach allows diverse healthcare institutions to benefit from shared learning experiences without the necessity of direct data access, enhancing the model's predictive power while preserving the privacy and integrity of sensitive patient data. Its low computational footprint makes DeepMediX suitable for deployment on handheld devices, offering potential for real-time diagnostic support. Through rigorous testing on standard datasets, including the ISIC2018 for dermatological research, DeepMediX demonstrates exceptional diagnostic capabilities, matching the performance of existing models on almost all tasks and even outperforming them in some cases. The findings of this study underline significant implications for the development and deployment of AI-based tools in medical imaging and their integration into point-of-care settings. The source code and models generated would be released at https://github.com/kishorebabun/DeepMediX.
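Since the paper's code is not yet released, the snippet below is only a generic transfer-learning sketch of the kind of backbone setup it describes: a torchvision MobileNetV2 with its classifier head swapped for the target label set (the 7-class ISIC2018 lesion taxonomy is used as an illustrative example).

```python
import torch.nn as nn
from torchvision import models

# Start from an ImageNet-pretrained MobileNetV2 and replace the classification head.
backbone = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)
num_classes = 7  # e.g. ISIC2018 lesion categories -- illustrative
backbone.classifier[1] = nn.Linear(backbone.last_channel, num_classes)
```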

SHARCS: Shared Concept Space for Explainable Multimodal Learning

  • paper_url: http://arxiv.org/abs/2307.00316
  • repo_url: https://github.com/gabriele-dominici/SHARCS
  • paper_authors: Gabriele Dominici, Pietro Barbiero, Lucie Charlotte Magister, Pietro Liò, Nikola Simidjievski
  • for: Explainable multimodal learning for complex real-world problems in which individual data modalities are insufficient to solve the modelling task accurately.
  • methods: SHARCS (SHARed Concept Space), a concept-based approach that learns and maps interpretable concepts from heterogeneous modalities into a single unified concept manifold, so semantically similar cross-modal concepts project close together.
  • results: SHARCS yields intrinsically explainable task predictions while improving downstream predictive performance, and it significantly outperforms other approaches in practically important scenarios such as retrieval of missing modalities and cross-modal explanations.
    Abstract Multimodal learning is an essential paradigm for addressing complex real-world problems, where individual data modalities are typically insufficient to accurately solve a given modelling task. While various deep learning approaches have successfully addressed these challenges, their reasoning process is often opaque; limiting the capabilities for a principled explainable cross-modal analysis and any domain-expert intervention. In this paper, we introduce SHARCS (SHARed Concept Space) -- a novel concept-based approach for explainable multimodal learning. SHARCS learns and maps interpretable concepts from different heterogeneous modalities into a single unified concept-manifold, which leads to an intuitive projection of semantically similar cross-modal concepts. We demonstrate that such an approach can lead to inherently explainable task predictions while also improving downstream predictive performance. Moreover, we show that SHARCS can operate and significantly outperform other approaches in practically significant scenarios, such as retrieval of missing modalities and cross-modal explanations. Our approach is model-agnostic and easily applicable to different types (and number) of modalities, thus advancing the development of effective, interpretable, and trustworthy multimodal approaches.

Gradients Look Alike: Sensitivity is Often Overestimated in DP-SGD

  • paper_url: http://arxiv.org/abs/2307.00310
  • repo_url: None
  • paper_authors: Anvith Thudi, Hengrui Jia, Casey Meehan, Ilia Shumailov, Nicolas Papernot
  • for: The privacy analysis of differentially private stochastic gradient descent (DP-SGD) and the privacy leakage actually incurred by models trained on common benchmark datasets.
  • methods: A new per-step privacy analysis for DP-SGD that depends on the distribution of model updates computed from the training dataset, plus a new composition theorem for reasoning about an entire training run.
  • results: The analysis formally shows that DP-SGD leaks significantly less privacy for many datapoints; in particular, correctly classified points obtain better privacy guarantees than misclassified points.
    Abstract Differentially private stochastic gradient descent (DP-SGD) is the canonical algorithm for private deep learning. While it is known that its privacy analysis is tight in the worst-case, several empirical results suggest that when training on common benchmark datasets, the models obtained leak significantly less privacy for many datapoints. In this paper, we develop a new analysis for DP-SGD that captures the intuition that points with similar neighbors in the dataset enjoy better privacy than outliers. Formally, this is done by modifying the per-step privacy analysis of DP-SGD to introduce a dependence on the distribution of model updates computed from a training dataset. We further develop a new composition theorem to effectively use this new per-step analysis to reason about an entire training run. Put all together, our evaluation shows that this novel DP-SGD analysis allows us to now formally show that DP-SGD leaks significantly less privacy for many datapoints. In particular, we observe that correctly classified points obtain better privacy guarantees than misclassified points.
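For context, the standard DP-SGD step being analysed clips each example's gradient and adds Gaussian noise; the sketch below shows only that baseline mechanism (it does not implement the paper's new per-step analysis), with illustrative hyperparameters.

```python
import torch

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.0):
    # per_example_grads: one list of tensors (matching params) per example in the batch.
    batch_size = len(per_example_grads)
    summed = [torch.zeros_like(p) for p in params]
    for grads in per_example_grads:
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = min(1.0, clip_norm / (float(total_norm) + 1e-12))   # clip to sensitivity clip_norm
        for s, g in zip(summed, grads):
            s.add_(g, alpha=scale)
    for p, s in zip(params, summed):
        noise = torch.randn_like(p) * noise_multiplier * clip_norm  # Gaussian mechanism
        p.data.add_((s + noise) / batch_size, alpha=-lr)
```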

Adversarial Attacks and Defenses on 3D Point Cloud Classification: A Survey

  • paper_url: http://arxiv.org/abs/2307.00309
  • repo_url: None
  • paper_authors: Hanieh Naderi, Ivan V. Bajić
  • for: This paper summarizes the current progress on adversarial attack and defense techniques for point cloud classification.
  • methods: It first introduces the principles and characteristics of adversarial attacks and summarizes and analyzes the adversarial example generation methods of recent years, then classifies defense strategies into input transformation, data optimization, and deep model modification.
  • results: The paper concludes with several challenging issues and future research directions in this domain.
    Abstract Deep learning has successfully solved a wide range of tasks in 2D vision as a dominant AI technique. Recently, deep learning on 3D point clouds is becoming increasingly popular for addressing various tasks in this field. Despite remarkable achievements, deep learning algorithms are vulnerable to adversarial attacks. These attacks are imperceptible to the human eye but can easily fool deep neural networks in the testing and deployment stage. To encourage future research, this survey summarizes the current progress on adversarial attack and defense techniques on point cloud classification. This paper first introduces the principles and characteristics of adversarial attacks and summarizes and analyzes the adversarial example generation methods in recent years. Besides, it classifies defense strategies as input transformation, data optimization, and deep model modification. Finally, it presents several challenging issues and future research directions in this domain.

SyMFM6D: Symmetry-aware Multi-directional Fusion for Multi-View 6D Object Pose Estimation

  • paper_url: http://arxiv.org/abs/2307.00306
  • repo_url: https://github.com/boschresearch/symfm6d
  • paper_authors: Fabian Duffhauss, Sebastian Koch, Hanna Ziesche, Ngo Anh Vien, Gerhard Neumann
  • for: Detecting objects and estimating their 6D poses so that automated systems can interact safely with their environment.
  • methods: SyMFM6D, a symmetry-aware multi-view 6D pose estimator, fuses RGB-D frames from multiple perspectives in a deep multi-directional fusion network and simultaneously predicts predefined keypoints for all objects in the scene; the 6D poses are then computed efficiently from the keypoints and an instance semantic segmentation by least-squares fitting.
  • results: SyMFM6D significantly outperforms the state of the art in both single-view and multi-view 6D pose estimation; a novel symmetry-aware training procedure resolves the ambiguities of symmetric objects, and the approach is robust to inaccurate camera calibration and dynamic camera setups.
    Abstract Detecting objects and estimating their 6D poses is essential for automated systems to interact safely with the environment. Most 6D pose estimators, however, rely on a single camera frame and suffer from occlusions and ambiguities due to object symmetries. We overcome this issue by presenting a novel symmetry-aware multi-view 6D pose estimator called SyMFM6D. Our approach efficiently fuses the RGB-D frames from multiple perspectives in a deep multi-directional fusion network and predicts predefined keypoints for all objects in the scene simultaneously. Based on the keypoints and an instance semantic segmentation, we efficiently compute the 6D poses by least-squares fitting. To address the ambiguity issues for symmetric objects, we propose a novel training procedure for symmetry-aware keypoint detection including a new objective function. Our SyMFM6D network significantly outperforms the state-of-the-art in both single-view and multi-view 6D pose estimation. We furthermore show the effectiveness of our symmetry-aware training procedure and demonstrate that our approach is robust towards inaccurate camera calibration and dynamic camera setups.
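The final "least-squares fitting" step corresponds to the classical Kabsch/orthogonal-Procrustes solution; a minimal sketch (assuming matched 3D keypoints in the object and camera frames, which the network would provide):

```python
import numpy as np

def fit_rigid_transform(src: np.ndarray, dst: np.ndarray):
    # Least-squares rigid transform: rotation R and translation t with dst_i ~= R @ src_i + t.
    # src, dst: (N, 3) arrays of corresponding 3D keypoints.
    src_c, dst_c = src - src.mean(axis=0), dst - dst.mean(axis=0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    d = np.sign(np.linalg.det(Vt.T @ U.T))              # guard against a reflection solution
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t
```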

Applied Bayesian Structural Health Monitoring: inclinometer data anomaly detection and forecasting

  • paper_url: http://arxiv.org/abs/2307.00305
  • repo_url: None
  • paper_authors: David K. E. Green, Adam Jaspan
  • for: Applying Bayesian techniques to real-world inclinometer data to provide both anomaly detection and forecasting for earthwork slopes.
  • methods: Inclinometer measurements are modelled as a latent autocorrelated Markov process used as the transition model of a non-linear Bayesian filter; uncertainty quantification (UQ) then supports forecasting, and observations with high surprisal relative to the learnt model are flagged as anomalies.
  • results: The forecasting and anomaly detection techniques were successfully applied, in a computationally efficient manner, to a large real-world dataset covering the entire UK rail network, and they are broadly applicable to engineering UQ and Structural Health Monitoring (SHM).
    Abstract Inclinometer probes are devices that can be used to measure deformations within earthwork slopes. This paper demonstrates a novel application of Bayesian techniques to real-world inclinometer data, providing both anomaly detection and forecasting. Specifically, this paper details an analysis of data collected from inclinometer data across the entire UK rail network. Practitioners have effectively two goals when processing monitoring data. The first is to identify any anomalous or dangerous movements, and the second is to predict potential future adverse scenarios by forecasting. In this paper we apply Uncertainty Quantification (UQ) techniques by implementing a Bayesian approach to anomaly detection and forecasting for inclinometer data. Subsequently, both costs and risks may be minimised by quantifying and evaluating the appropriate uncertainties. This framework may then act as an enabler for enhanced decision making and risk analysis. We show that inclinometer data can be described by a latent autocorrelated Markov process derived from measurements. This can be used as the transition model of a non-linear Bayesian filter. This allows for the prediction of system states. This learnt latent model also allows for the detection of anomalies: observations that are far from their expected value may be considered to have `high surprisal', that is they have a high information content relative to the model encoding represented by the learnt latent model. We successfully apply the forecasting and anomaly detection techniques to a large real-world data set in a computationally efficient manner. Although this paper studies inclinometers in particular, the techniques are broadly applicable to all areas of engineering UQ and Structural Health Monitoring (SHM).
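The "high surprisal" criterion can be illustrated with a one-dimensional Gaussian example (the readings and threshold below are made up): an observation far from the filter's predictive distribution has a high negative log predictive density and is flagged.

```python
import numpy as np

def surprisal(obs, pred_mean, pred_var):
    # Negative log predictive density under a Gaussian forecast; high values flag anomalies.
    return 0.5 * (np.log(2 * np.pi * pred_var) + (obs - pred_mean) ** 2 / pred_var)

readings = np.array([0.10, 0.12, 0.11, 0.45])                       # illustrative inclinometer values
flags = surprisal(readings, pred_mean=0.11, pred_var=4e-4) > 8.0    # threshold is illustrative
print(flags)   # only the 0.45 reading is flagged
```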

Accelerated primal-dual methods with enlarged step sizes and operator learning for nonsmooth optimal control problems

  • paper_url: http://arxiv.org/abs/2307.00296
  • repo_url: None
  • paper_authors: Yongcun Song, Xiaoming Yuan, Hangrui Yue
  • for: Nonsmooth optimal control problems with partial differential equation (PDE) constraints, which are challenging because of their nonsmooth objective functionals and the high-dimensional, ill-conditioned systems that arise after discretization.
  • methods: A primal-dual method whose main computation at each iteration is solving two PDEs, accelerated in two ways: enlarged step sizes (with convergence still proved rigorously) and operator learning, in which deep neural network surrogates replace the PDE solves.
  • results: Once a neural operator is learned, solving a PDE requires only a forward pass, so the accelerated method is mesh-free, numerically efficient, and scalable to different types of PDEs; preliminary numerical results validate the effectiveness of both accelerations.
    Abstract We consider a general class of nonsmooth optimal control problems with partial differential equation (PDE) constraints, which are very challenging due to its nonsmooth objective functionals and the resulting high-dimensional and ill-conditioned systems after discretization. We focus on the application of a primal-dual method, with which different types of variables can be treated individually and thus its main computation at each iteration only requires solving two PDEs. Our target is to accelerate the primal-dual method with either larger step sizes or operator learning techniques. For the accelerated primal-dual method with larger step sizes, its convergence can be still proved rigorously while it numerically accelerates the original primal-dual method in a simple and universal way. For the operator learning acceleration, we construct deep neural network surrogate models for the involved PDEs. Once a neural operator is learned, solving a PDE requires only a forward pass of the neural network, and the computational cost is thus substantially reduced. The accelerated primal-dual method with operator learning is mesh-free, numerically efficient, and scalable to different types of PDEs. The acceleration effectiveness of these two techniques is promisingly validated by some preliminary numerical results.

AutoST: Training-free Neural Architecture Search for Spiking Transformers

  • paper_url: http://arxiv.org/abs/2307.00293
  • repo_url: None
  • paper_authors: Ziqing Wang, Qidong Zhao, Jinku Cui, Xu Liu, Dongkuan Xu
  • for: Rapidly identifying high-performance and energy-efficient Spiking Transformer architectures.
  • methods: AutoST, a training-free neural architecture search (NAS) method that uses Floating-Point Operations (FLOPs) as a performance metric, independent of model computations and training dynamics, and leverages activation patterns at initialization to estimate the energy consumption of candidate Spiking Transformers.
  • results: AutoST models outperform state-of-the-art manually and automatically designed SNN architectures on static and neuromorphic datasets while significantly reducing energy consumption.
    Abstract Spiking Transformers have gained considerable attention because they achieve both the energy efficiency of Spiking Neural Networks (SNNs) and the high capacity of Transformers. However, the existing Spiking Transformer architectures, derived from ANNs, exhibit a notable architectural gap, resulting in suboptimal performance compared to their ANN counterparts. Traditional approaches to discovering optimal architectures primarily rely on either manual procedures, which are time-consuming, or Neural Architecture Search (NAS) methods, which are usually expensive in terms of memory footprints and computation time. To address these limitations, we introduce AutoST, a training-free NAS method for Spiking Transformers, to rapidly identify high-performance and energy-efficient Spiking Transformer architectures. Unlike existing training-free NAS methods, which struggle with the non-differentiability and high sparsity inherent in SNNs, we propose to utilize Floating-Point Operations (FLOPs) as a performance metric, which is independent of model computations and training dynamics, leading to a stronger correlation with performance. Moreover, to enable the search for energy-efficient architectures, we leverage activation patterns during initialization to estimate the energy consumption of Spiking Transformers. Our extensive experiments show that AutoST models outperform state-of-the-art manually or automatically designed SNN architectures on static and neuromorphic datasets, while significantly reducing energy consumption.

All-in-SAM: from Weak Annotation to Pixel-wise Nuclei Segmentation with Prompt-based Finetuning

  • paper_url: http://arxiv.org/abs/2307.00290
  • repo_url: None
  • paper_authors: Can Cui, Ruining Deng, Quan Liu, Tianyuan Yao, Shunxing Bao, Lucas W. Remedios, Yucheng Tang, Yuankai Huo
  • for: A fully automated segmentation pipeline (all-in-SAM) that uses SAM across the whole AI development workflow, from annotation generation to model finetuning, without requiring manual prompts during inference.
  • methods: SAM is first used with weak prompts (e.g., points or bounding boxes) to generate pixel-level annotations, which are then used to finetune the SAM segmentation model rather than training from scratch.
  • results: The proposed pipeline surpasses state-of-the-art methods on nuclei segmentation on the public MoNuSeg dataset, and finetuning SAM with weak and few annotations achieves performance competitive with using strong pixel-wise annotated data.
    Abstract The Segment Anything Model (SAM) is a recently proposed prompt-based segmentation model in a generic zero-shot segmentation approach. With the zero-shot segmentation capacity, SAM achieved impressive flexibility and precision on various segmentation tasks. However, the current pipeline requires manual prompts during the inference stage, which is still resource intensive for biomedical image segmentation. In this paper, instead of using prompts during the inference stage, we introduce a pipeline that utilizes the SAM, called all-in-SAM, through the entire AI development workflow (from annotation generation to model finetuning) without requiring manual prompts during the inference stage. Specifically, SAM is first employed to generate pixel-level annotations from weak prompts (e.g., points, bounding box). Then, the pixel-level annotations are used to finetune the SAM segmentation model rather than training from scratch. Our experimental results reveal two key findings: 1) the proposed pipeline surpasses the state-of-the-art (SOTA) methods in a nuclei segmentation task on the public Monuseg dataset, and 2) the utilization of weak and few annotations for SAM finetuning achieves competitive performance compared to using strong pixel-wise annotated data.
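For reference, generating a mask from a weak point prompt with the off-the-shelf segment_anything package looks roughly like this (the checkpoint path, placeholder image, and prompt coordinates are all illustrative); the paper's pipeline then uses such masks as pixel-level labels for finetuning.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")   # placeholder checkpoint path
predictor = SamPredictor(sam)

image = np.zeros((256, 256, 3), dtype=np.uint8)                 # stand-in for an RGB image patch
predictor.set_image(image)
masks, scores, _ = predictor.predict(
    point_coords=np.array([[120, 80]]),                         # one weak point prompt (x, y)
    point_labels=np.array([1]),                                  # 1 = foreground
)
pseudo_label = masks[np.argmax(scores)]                         # best mask becomes a pixel-level label
```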

CMA-ES for Post Hoc Ensembling in AutoML: A Great Success and Salvageable Failure

  • paper_url: http://arxiv.org/abs/2307.00286
  • repo_url: https://github.com/LennartPurucker/CMA-ES-PostHocEnsemblingAutoML
  • paper_authors: Lennart Purucker, Joeran Beel
  • for: The paper aims to analyze the performance of covariance matrix adaptation evolution strategy (CMA-ES) and greedy ensemble selection (GES) in automated machine learning (AutoML) systems, and to explore methods to stop CMA-ES from overfitting for ROC AUC.
  • methods: The paper compares the performance of CMA-ES and GES on 71 classification datasets from the AutoML benchmark for AutoGluon, and proposes a method to normalize the weights produced by CMA-ES to avoid overfitting and improve performance for ROC AUC.
  • results: The paper finds that CMA-ES overfits drastically and is outperformed by GES for the metric ROC AUC, but does not overfit and outperforms GES for the metric balanced accuracy. The proposed method to normalize the weights produced by CMA-ES improves its performance for ROC AUC and makes it perform better than or similar to GES.
    Abstract Many state-of-the-art automated machine learning (AutoML) systems use greedy ensemble selection (GES) by Caruana et al. (2004) to ensemble models found during model selection post hoc. Thereby, boosting predictive performance and likely following Auto-Sklearn 1's insight that alternatives, like stacking or gradient-free numerical optimization, overfit. Overfitting in Auto-Sklearn 1 is much more likely than in other AutoML systems because it uses only low-quality validation data for post hoc ensembling. Therefore, we were motivated to analyze whether Auto-Sklearn 1's insight holds true for systems with higher-quality validation data. Consequently, we compared the performance of covariance matrix adaptation evolution strategy (CMA-ES), state-of-the-art gradient-free numerical optimization, to GES on the 71 classification datasets from the AutoML benchmark for AutoGluon. We found that Auto-Sklearn's insight depends on the chosen metric. For the metric ROC AUC, CMA-ES overfits drastically and is outperformed by GES -- statistically significantly for multi-class classification. For the metric balanced accuracy, CMA-ES does not overfit and outperforms GES significantly. Motivated by the successful application of CMA-ES for balanced accuracy, we explored methods to stop CMA-ES from overfitting for ROC AUC. We propose a method to normalize the weights produced by CMA-ES, inspired by GES, that avoids overfitting for CMA-ES and makes CMA-ES perform better than or similar to GES for ROC AUC.
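The abstract does not spell out the normalization, so the sketch below only illustrates the general GES-inspired idea of constraining ensemble weights to a simplex (non-negative, summing to one) before combining base-model predictions.

```python
import numpy as np

def normalize_weights(raw_weights: np.ndarray) -> np.ndarray:
    # GES-style constraint: clip negative weights to zero and rescale to sum to one.
    w = np.clip(np.asarray(raw_weights, dtype=float), 0.0, None)
    return w / w.sum() if w.sum() > 0 else np.full_like(w, 1.0 / len(w))

def ensemble_predict(probas: np.ndarray, raw_weights: np.ndarray) -> np.ndarray:
    # probas: (n_models, n_samples, n_classes) validation predictions of the base models.
    return np.tensordot(normalize_weights(raw_weights), probas, axes=1)
```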

Assembled-OpenML: Creating Efficient Benchmarks for Ensembles in AutoML with OpenML

  • paper_url: http://arxiv.org/abs/2307.00285
  • repo_url: https://github.com/isg-siegen/assembled
  • paper_authors: Lennart Purucker, Joeran Beel
  • for: Assembled-OpenML, a Python tool for comparing ensemble techniques so that developers can select appropriate techniques for AutoML frameworks without the usual computational cost.
  • methods: The tool builds meta-datasets ("Metatasks") from OpenML, each consisting of an OpenML task, its dataset, and stored prediction data from model evaluations, so ensemble techniques can be compared using stored predictions instead of training and evaluating base models.
  • results: In an example comparison, prediction data for 1523 base models across 31 datasets was gathered in roughly one hour in total, whereas training and evaluating just one base model on the most computationally expensive dataset took about 37 minutes.
    Abstract Automated Machine Learning (AutoML) frameworks regularly use ensembles. Developers need to compare different ensemble techniques to select appropriate techniques for an AutoML framework from the many potential techniques. So far, the comparison of ensemble techniques is often computationally expensive, because many base models must be trained and evaluated one or multiple times. Therefore, we present Assembled-OpenML. Assembled-OpenML is a Python tool, which builds meta-datasets for ensembles using OpenML. A meta-dataset, called Metatask, consists of the data of an OpenML task, the task's dataset, and prediction data from model evaluations for the task. We can make the comparison of ensemble techniques computationally cheaper by using the predictions stored in a metatask instead of training and evaluating base models. To introduce Assembled-OpenML, we describe the first version of our tool. Moreover, we present an example of using Assembled-OpenML to compare a set of ensemble techniques. For this example comparison, we built a benchmark using Assembled-OpenML and implemented ensemble techniques expecting predictions instead of base models as input. In our example comparison, we gathered the prediction data of $1523$ base models for $31$ datasets. Obtaining the prediction data for all base models using Assembled-OpenML took ${\sim} 1$ hour in total. In comparison, obtaining the prediction data by training and evaluating just one base model on the most computationally expensive dataset took ${\sim} 37$ minutes.

SysNoise: Exploring and Benchmarking Training-Deployment System Inconsistency

  • paper_url: http://arxiv.org/abs/2307.00280
  • repo_url: None
  • paper_authors: Yan Wang, Yuhang Li, Ruihao Gong, Aishan Liu, Yanfei Wang, Jian Hu, Yongqiang Yao, Yunchen Zhang, Tianzi Xiao, Fengwei Yu, Xianglong Liu
  • for: The robustness of deep learning models to noise caused by inconsistencies between training and deployment system implementations.
  • methods: SysNoise, a frequently occurring but often overlooked noise that arises when the source training system is switched to a disparate target system at deployment, is introduced and classified into three categories based on the inference stage; a holistic benchmark quantifies its impact on 20+ models across image classification, object detection, instance segmentation, and natural language processing.
  • results: SysNoise affects model robustness across tasks, and common mitigations such as data augmentation and adversarial training have limited effect; the benchmark and framework are open-sourced at https://modeltc.github.io/systemnoise_web.
    Abstract Extensive studies have shown that deep learning models are vulnerable to adversarial and natural noises, yet little is known about model robustness on noises caused by different system implementations. In this paper, we for the first time introduce SysNoise, a frequently occurred but often overlooked noise in the deep learning training-deployment cycle. In particular, SysNoise happens when the source training system switches to a disparate target system in deployments, where various tiny system mismatch adds up to a non-negligible difference. We first identify and classify SysNoise into three categories based on the inference stage; we then build a holistic benchmark to quantitatively measure the impact of SysNoise on 20+ models, comprehending image classification, object detection, instance segmentation and natural language processing tasks. Our extensive experiments revealed that SysNoise could bring certain impacts on model robustness across different tasks and common mitigations like data augmentation and adversarial training show limited effects on it. Together, our findings open a new research topic and we hope this work will raise research attention to deep learning deployment systems accounting for model performance. We have open-sourced the benchmark and framework at https://modeltc.github.io/systemnoise_web.

Common Knowledge Learning for Generating Transferable Adversarial Examples

  • paper_url: http://arxiv.org/abs/2307.00274
  • repo_url: None
  • paper_authors: Ruijie Yang, Yuanfang Guo, Junfu Wang, Jiantao Zhou, Yunhong Wang
  • for: Transfer-based black-box adversarial attacks, in which adversarial examples generated on a substitute (source) model are used to attack an unseen target model; existing methods transfer poorly between different DNN architecture types (e.g., ResNet-18 and Swin Transformer).
  • methods: A common knowledge learning (CKL) framework that learns better network weights for generating transferable adversarial examples: knowledge from multiple teacher architectures is distilled into one student network, and constraints on the input gradients between student and teacher models reduce model-specific features and alleviate the output inconsistency problem.
  • results: Extensive experiments demonstrate that the proposed framework significantly improves adversarial transferability.
    Abstract This paper focuses on an important type of black-box attacks, i.e., transfer-based adversarial attacks, where the adversary generates adversarial examples by a substitute (source) model and utilize them to attack an unseen target model, without knowing its information. Existing methods tend to give unsatisfactory adversarial transferability when the source and target models are from different types of DNN architectures (e.g. ResNet-18 and Swin Transformer). In this paper, we observe that the above phenomenon is induced by the output inconsistency problem. To alleviate this problem while effectively utilizing the existing DNN models, we propose a common knowledge learning (CKL) framework to learn better network weights to generate adversarial examples with better transferability, under fixed network architectures. Specifically, to reduce the model-specific features and obtain better output distributions, we construct a multi-teacher framework, where the knowledge is distilled from different teacher architectures into one student network. By considering that the gradient of input is usually utilized to generated adversarial examples, we impose constraints on the gradients between the student and teacher models, to further alleviate the output inconsistency problem and enhance the adversarial transferability. Extensive experiments demonstrate that our proposed work can significantly improve the adversarial transferability.
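A minimal sketch of the multi-teacher distillation component (standard temperature-scaled KL distillation; the paper's additional gradient constraints between student and teachers are omitted, and the temperature is illustrative):

```python
import torch
import torch.nn.functional as F

def multi_teacher_distill_loss(student_logits, teacher_logits_list, temperature: float = 4.0):
    # Average KL divergence between the student's softened predictions and each teacher's.
    s = F.log_softmax(student_logits / temperature, dim=-1)
    losses = [F.kl_div(s, F.softmax(t / temperature, dim=-1), reduction="batchmean") * temperature ** 2
              for t in teacher_logits_list]
    return torch.stack(losses).mean()
```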
    摘要 本文关注一类重要的黑盒攻击,即基于迁移的对抗攻击:攻击者通过替代(源)模型生成对抗样本,并在不了解目标模型信息的情况下用这些样本攻击目标模型。现有方法在源模型与目标模型属于不同类型的DNN结构(例如ResNet-18与Swin Transformer)时,对抗可迁移性往往不理想。本文发现,这一现象源于输出不一致问题。为缓解该问题并有效利用现有DNN模型,我们提出了共同知识学习(CKL)框架,在固定网络结构下学习更好的网络权重,以生成可迁移性更强的对抗样本。具体而言,为了降低模型特定特征并获得更好的输出分布,我们构建了多教师框架,将不同教师结构的知识蒸馏到一个学生网络中;同时,考虑到对抗样本通常利用输入梯度生成,我们对学生与教师模型之间的输入梯度施加约束,进一步缓解输出不一致问题并提升对抗可迁移性。大量实验表明,所提方法能够显著提高对抗可迁移性。
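
A minimal sketch of the two ingredients the CKL abstract names, multi-teacher distillation plus an input-gradient constraint between student and teachers. This is not the authors' implementation; the loss weighting `lam`, the cosine form of the gradient constraint, and the use of the averaged teacher distribution as the distillation target are assumptions for illustration.

```python
# Sketch (assumptions, not the authors' code): distill several teachers into one
# student while also aligning input gradients, as the CKL abstract describes.
import torch
import torch.nn.functional as F

def ckl_loss(student, teachers, x, y, lam=1.0):
    """x: input batch, y: labels; `teachers` is a list of frozen models in eval mode."""
    x = x.clone().requires_grad_(True)
    s_logits = student(x)

    # Distill toward the averaged teacher distribution to wash out model-specific features.
    with torch.no_grad():
        t_probs = torch.stack([F.softmax(t(x), dim=1) for t in teachers]).mean(dim=0)
    kd = F.kl_div(F.log_softmax(s_logits, dim=1), t_probs, reduction="batchmean")

    # Align the input gradients of student and teachers; the input gradient is what
    # transfer attacks follow, so consistency here is the point of the constraint.
    g_s = torch.autograd.grad(F.cross_entropy(s_logits, y), x, create_graph=True)[0]
    align = 0.0
    for t in teachers:
        g_t = torch.autograd.grad(F.cross_entropy(t(x), y), x)[0].detach()
        align = align + (1.0 - F.cosine_similarity(g_s.flatten(1), g_t.flatten(1), dim=1)).mean()
    return kd + lam * align / len(teachers)
```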

Hiding in Plain Sight: Differential Privacy Noise Exploitation for Evasion-resilient Localized Poisoning Attacks in Multiagent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.00268
  • repo_url: None
  • paper_authors: Md Tamjid Hossain, Hung La
  • for: 这个论文是为了探讨 differential privacy (DP) 在 cooperative multiagent reinforcement learning (CMARL) 中保护代理人的隐私问题而写的。
  • methods: 这篇论文提出了一种自适应、利用隐私噪声且可规避检测的本地化投毒攻击(PeLPA),该攻击恰恰利用DP机制引入的噪声绕过异常检测系统,在私有知识共享过程中阻碍CMARL模型的最优收敛。
  • results: 研究发现,在不同的环境中,PeLPA 攻击可以增加 CMARL 模型的平均步长,并且可以减少 CMARL 模型的优化奖励和速度。在一个中等规模的环境中,PeLPA 攻击可以导致 CMARL 模型的平均步长增加50.69%和64.41%。
    Abstract Lately, differential privacy (DP) has been introduced in cooperative multiagent reinforcement learning (CMARL) to safeguard the agents' privacy against adversarial inference during knowledge sharing. Nevertheless, we argue that the noise introduced by DP mechanisms may inadvertently give rise to a novel poisoning threat, specifically in the context of private knowledge sharing during CMARL, which remains unexplored in the literature. To address this shortcoming, we present an adaptive, privacy-exploiting, and evasion-resilient localized poisoning attack (PeLPA) that capitalizes on the inherent DP-noise to circumvent anomaly detection systems and hinder the optimal convergence of the CMARL model. We rigorously evaluate our proposed PeLPA attack in diverse environments, encompassing both non-adversarial and multiple-adversarial contexts. Our findings reveal that, in a medium-scale environment, the PeLPA attack with attacker ratios of 20% and 40% can lead to an increase in average steps to goal by 50.69% and 64.41%, respectively. Furthermore, under similar conditions, PeLPA can result in a 1.4x and 1.6x computational time increase in optimal reward attainment and a 1.18x and 1.38x slower convergence for attacker ratios of 20% and 40%, respectively.
    摘要 近些时间,演变式隐私(DP)在合作多代理游戏学习(CMARL)中被引入,以保护代理的隐私免受敌意推理中的攻击。然而,我们认为DP机制引入的噪声可能会不知不觉地导致一种新的毒素威胁,具体来说是在CMARL中私人知识共享时期,这一点在文献中尚未得到探讨。为解决这一缺点,我们提出了一种适应、隐私滥用和逃避抗击的本地化毒素攻击(PeLPA),利用DP噪声来绕过异常检测系统,阻碍CMARL模型的优化征化。我们仔细测试了我们提出的PeLPA攻击在多种环境中,包括不良环境和多个敌对环境。我们的结果表明,在中型环境下,PeLPA攻击的20%和40%攻击者比率可以导致平均步骤数增加50.69%和64.41%,分别。此外,在同样的条件下,PeLPA攻击可以导致优化奖励获得的计算时间增加1.4倍和1.6倍,以及优化征化 slower convergence的1.18倍和1.38倍。

An ML approach to resolution of singularities

  • paper_url: http://arxiv.org/abs/2307.00252
  • repo_url: None
  • paper_authors: Gergely Bérczi, Honglu Fan, Mingcong Zeng
  • for: 这篇论文主要关注如何使用机器学习算法来处理多项式方程组解集中的奇点消解(resolution of singularities)问题。
  • methods: 论文使用强化学习智能体来寻找最优的消解方案,即Hironaka博弈中第一名玩家的获胜策略。
  • results: 在某些情形下,训练得到的模型在所执行的多项式加法总数上优于现有最佳的选择启发式,这提供了一个概念验证,表明机器学习的最新进展有潜力提升符号计算中算法的性能。
    Abstract The solution set of a system of polynomial equations typically contains ill-behaved, singular points. Resolution is a fundamental process in geometry in which we replace singular points with smooth points, while keeping the rest of the solution set unchanged. Resolutions are not unique: the usual way to describe them involves repeatedly performing a fundamental operation known as "blowing-up", and the complexity of the resolution highly depends on certain choices. The process can be translated into various versions of a 2-player game, the so-called Hironaka game, and a winning strategy for the first player provides a solution to the resolution problem. In this paper we introduce a new approach to the Hironaka game that uses reinforcement learning agents to find optimal resolutions of singularities. In certain domains, the trained model outperforms state-of-the-art selection heuristics in total number of polynomial additions performed, which provides a proof-of-concept that recent developments in machine learning have the potential to improve performance of algorithms in symbolic computation.
    摘要 系统的多项式方程集解通常包含恶 behave的点。解析是几何基本过程,我们将这些点 replaced with smooth points,保持解集不变。解析不唯一,通常通过 repeatedly performing a fundamental operation known as " blowing-up" 来描述它们。这个过程可以被翻译为各种版本的2个玩家游戏,即希罗纳卡游戏,并且一个赢家的策略可以提供解决方案。在这篇论文中,我们介绍了一种使用强化学习代理来找到最优的解决方案。在某些领域,训练模型在总数量方程添加的环节上表现出色,超过了现有的选择规则,这提供了一个证明,表明最近的机器学习发展有助于改善符号计算中的算法性能。

Safe Screening for Unbalanced Optimal Transport

  • paper_url: http://arxiv.org/abs/2307.00247
  • repo_url: None
  • paper_authors: Xun Su, Zhongxi Fang, Hiroyuki Kasai
  • for: 本研究旨在加速优化非平衡优Transport(UOT)问题的优化过程,通过积极地标识并消除稀疏解的零元素。
  • methods: 本研究使用Safe Screening技术,并提出了一种新的近似投影、椭球安全区建构和两平面放松方法,以提高屏选效率而无需增加算法复杂度。
  • results: 研究表明,通过应用Safe Screening技术,可以有效地加速UOT问题的优化过程,而无需改变算法的复杂度。
    Abstract This paper introduces a framework that utilizes the Safe Screening technique to accelerate the optimization process of the Unbalanced Optimal Transport (UOT) problem by proactively identifying and eliminating zero elements in the sparse solutions. We demonstrate the feasibility of applying Safe Screening to the UOT problem with $\ell_2$-penalty and KL-penalty by conducting an analysis of the solution's bounds and considering the local strong convexity of the dual problem. Considering the specific structural characteristics of the UOT in comparison to general Lasso problems on the index matrix, we specifically propose a novel approximate projection, an elliptical safe region construction, and a two-hyperplane relaxation method. These enhancements significantly improve the screening efficiency for UOT without altering the algorithm's complexity.
    摘要 本文提出了一个利用Safe Screening技术加速非平衡最优传输(UOT)问题优化过程的框架,通过主动识别并消除稀疏解中的零元素来实现加速。我们通过分析解的界并利用对偶问题的局部强凸性,证明了将Safe Screening应用于带$\ell_2$惩罚和KL惩罚的UOT问题的可行性。考虑到UOT相对于一般Lasso问题在索引矩阵上的特殊结构,我们提出了新的近似投影、椭球安全区域构造和双超平面松弛方法。这些改进在不改变算法复杂度的前提下显著提升了筛选效率。

On a Relation Between the Rate-Distortion Function and Optimal Transport

  • paper_url: http://arxiv.org/abs/2307.00246
  • repo_url: None
  • paper_authors: Eric Lei, Hamed Hassani, Shirin Saeedi Bidokhti
  • for: 这篇论文探讨了Rate-Distortion和最佳运输(OT)理论之间的关系,尤其是将一种基于极限Entropic OT距离的函数与Rate-Distortion函数相等。
  • methods: 论文使用了数值验证这个结果,以及之前已知的Monge和Kantorovich问题与最佳整数化器相关的结果。
  • results: 论文将Rate-Distortion和整数化器的解决方案统一到了一起,使用它们的最佳运输解决方案。
    Abstract We discuss a relationship between rate-distortion and optimal transport (OT) theory, even though they seem to be unrelated at first glance. In particular, we show that a function defined via an extremal entropic OT distance is equivalent to the rate-distortion function. We numerically verify this result as well as previous results that connect the Monge and Kantorovich problems to optimal scalar quantization. Thus, we unify solving scalar quantization and rate-distortion functions in an alternative fashion by using their respective optimal transport solvers.
    摘要 我们讨论了率失真(rate-distortion)理论与最优传输(OT)理论之间的关系,尽管二者乍看之下似乎并不相关。具体而言,我们证明了一个通过极值熵正则化OT距离定义的函数与率失真函数等价。我们用数值实验验证了这一结果,以及此前将Monge与Kantorovich问题同最优标量量化联系起来的结论。由此,我们将标量量化与率失真函数的求解统一起来,转而借助各自的最优传输求解器来完成。
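
As a reference point for the equivalence the abstract states, the following is the textbook definition of the rate-distortion function (standard background, not a result of this paper); the distortion measure d and the notation are the usual ones.

```latex
% Classical rate-distortion function for a source X, distortion measure d and budget D:
\[
R(D) \;=\; \min_{\,p(\hat{x}\mid x)\,:\;\mathbb{E}\,[\,d(X,\hat{X})\,]\,\le\, D}\; I(X;\hat{X}).
\]
% The paper's claim is that this function can equivalently be written through an
% extremal (limiting) entropic optimal-transport distance between source and
% reconstruction distributions.
```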

Unified Transfer Learning Models for High-Dimensional Linear Regression

  • paper_url: http://arxiv.org/abs/2307.00238
  • repo_url: None
  • paper_authors: Shuo Shuo Liu
  • for: 这 paper 是为了解决现代数据分析中的转移学习问题,特别是当target数据稀缺但source数据充沛时。
  • methods: 这 paper 提出了一种可解释的转移学习模型,称为UTrans,可以探测目标数据中可转移变量和源数据。 authors 还提出了一种基于假设检测的源检测算法,以排除不可转移数据。
  • results: 在多个实验中,UTrans 的估计和预测误差比既有的方法低很多,同时保持可解释性。 authors 最后应用了这种算法到美国 между代移动性数据,并与经典机器学习算法进行比较。
    Abstract Transfer learning plays a key role in modern data analysis when: (1) the target data are scarce but the source data are sufficient; (2) the distributions of the source and target data are heterogeneous. This paper develops an interpretable unified transfer learning model, termed as UTrans, which can detect both transferable variables and source data. More specifically, we establish the estimation error bounds and prove that our bounds are lower than those with target data only. Besides, we propose a source detection algorithm based on hypothesis testing to exclude the nontransferable data. We evaluate and compare UTrans to the existing algorithms in multiple experiments. It is shown that UTrans attains much lower estimation and prediction errors than the existing methods, while preserving interpretability. We finally apply it to the US intergenerational mobility data and compare our proposed algorithms to the classical machine learning algorithms.
    摘要 现代数据分析中,转移学习具有重要作用,特别是当目标数据scarce但来源数据充足时。本文提出了可解释的一种统一转移学习模型,称为UTrans,可以探测目标数据中可以转移的变量以及来源数据。更 specifically,我们建立了估计误差 bound,并证明我们的 bound 比目标数据只有的低。此外,我们提出了一种来源检测算法基于假设测试,以排除不可转移的数据。我们在多个实验中评估和比较UTrans与现有方法,显示UTrans可以获得远低于现有方法的估计和预测误差,同时保持可解释性。最后,我们应用其到美国 между代移动性数据中,并与经典机器学习算法进行比较。

Hierarchical Federated Learning Incentivization for Gas Usage Estimation

  • paper_url: http://arxiv.org/abs/2307.00233
  • repo_url: None
  • paper_authors: Has Sun, Xiaoli Tang, Chengyi Yang, Zhenpeng Yu, Xiuli Wang, Qijie Ding, Zengxiang Li, Han Yu
  • for: The paper is written for the efficient functioning of gas distribution networks and saving operational costs by accurately estimating gas usage.
  • methods: The paper proposes a Hierarchical FL Incentive Mechanism for Gas Usage Estimation (HI-GAS) that uses federated learning (FL) to enable local data processing on each participant, such as gas companies and heating stations, while maintaining privacy and incentivizing active participation.
  • results: The proposed mechanism is testbedded in the ENN Group, one of the leading players in the natural gas and green energy industry, and extensive experiments validate the effectiveness of the proposed mechanism in improving gas usage estimation performance.
    Abstract Accurately estimating gas usage is essential for the efficient functioning of gas distribution networks and saving operational costs. Traditional methods rely on centralized data processing, which poses privacy risks. Federated learning (FL) offers a solution to this problem by enabling local data processing on each participant, such as gas companies and heating stations. However, local training and communication overhead may discourage gas companies and heating stations from actively participating in the FL training process. To address this challenge, we propose a Hierarchical FL Incentive Mechanism for Gas Usage Estimation (HI-GAS), which has been testbedded in the ENN Group, one of the leading players in the natural gas and green energy industry. It is designed to support horizontal FL among gas companies, and vertical FL among each gas company and heating station within a hierarchical FL ecosystem, rewarding participants based on their contributions to FL. In addition, a hierarchical FL model aggregation approach is also proposed to improve the gas usage estimation performance by aggregating models at different levels of the hierarchy. The incentive scheme employs a multi-dimensional contribution-aware reward distribution function that combines the evaluation of data quality and model contribution to incentivize both gas companies and heating stations within their jurisdiction while maintaining fairness. Results of extensive experiments validate the effectiveness of the proposed mechanism.
    摘要 必须准确估算燃气使用量以确保燃气供应网络的有效运行和降低运营成本。传统方法依赖中央数据处理,却可能涉及到隐私风险。联邦学习(FL)提供了一种解决方案,允许每个参与者(如燃气公司和温水站)进行本地数据处理。然而,本地训练和通信开销可能会抑制燃气公司和温水站参与FL训练过程。为解决这个挑战,我们提出了一种层次FL奖励机制 для燃气使用量估算(HI-GAS),在ENN集团(一家领先的自然气和绿色能源企业)的测试环境中进行了测试。它支持水平FL among燃气公司,并在每个燃气公司和温水站之间层次FL,为参与者基于他们对FL的贡献而颁发奖励。此外,一种层次FL模型聚合方法也被提出,以提高燃气使用量估算性能。奖励机制采用多维度贡献意识分布函数,旨在奖励燃气公司和温水站在其辖区内的贡献,同时保持公平。实验结果证明了提案的效果。

Forward-Forward Algorithm for Hyperspectral Image Classification: A Preliminary Study

  • paper_url: http://arxiv.org/abs/2307.00231
  • repo_url: None
  • paper_authors: Sidike Paheding, Abel A. Reyes-Angulo
  • for: 本研究初步探讨了使用forward-forward算法(FFA)进行高光谱图像分类。
  • methods: 本研究对传统的反向传播算法(back-propagation)与FFA两种方法进行了比较分析。
  • results: 初步结果显示了FFA的潜力,它有望规避反向传播算法的一些局限性。
    Abstract The back-propagation algorithm has long been the de-facto standard in optimizing weights and biases in neural networks, particularly in cutting-edge deep learning models. Its widespread adoption in fields like natural language processing, computer vision, and remote sensing has revolutionized automation in various tasks. The popularity of back-propagation stems from its ability to achieve outstanding performance in tasks such as classification, detection, and segmentation. Nevertheless, back-propagation is not without its limitations, encompassing sensitivity to initial conditions, vanishing gradients, overfitting, and computational complexity. The recent introduction of a forward-forward algorithm (FFA), which computes local goodness functions to optimize network parameters, alleviates the dependence on substantial computational resources and the constant need for architectural scaling. This study investigates the application of FFA for hyperspectral image classification. Experimental results and comparative analysis are provided with the use of the traditional back-propagation algorithm. Preliminary results show the potential behind FFA and its promises.
    摘要 很长时间内,反射协同精算算法一直是深度学习模型优化参数的标准方法,尤其是在自然语言处理、计算机视觉和远程感知等领域中。这种广泛的应用使得自动化各种任务得到了革命性的改善。反射协同精算算法的吸引力来自它能够在分类、探测和分割等任务中达到杰出的性能。然而,反射协同精算算法也有一些局限性,例如依赖于初始条件、渐近衰减、过拟合和计算复杂性等。最近,一种前进前进算法(FFA)在计算局部优良函数来优化网络参数,从而减少了对计算资源的依赖和建筑层次的需求。本研究探讨了使用FFA进行光谱成像分类的应用。实验结果和比较分析都提供了使用传统反射协同精算算法的前期结果,显示了FFA的潜力和承诺。
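
To make the contrast with back-propagation concrete, here is a minimal sketch of one layer trained with the forward-forward idea of local "goodness" (following Hinton's 2022 formulation that the abstract builds on). Layer sizes, the threshold `theta`, and the optimizer choice are illustrative assumptions, not the paper's configuration.

```python
# Sketch of a single forward-forward layer: no backward pass through the whole
# network; each layer is trained locally to separate positive from negative data.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFLayer(nn.Module):
    def __init__(self, d_in, d_out, theta=2.0, lr=1e-3):
        super().__init__()
        self.fc = nn.Linear(d_in, d_out)
        self.theta = theta
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x):
        # Normalize so only the direction of the previous layer's activity is passed on.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return F.relu(self.fc(x))

    def train_step(self, x_pos, x_neg):
        # Goodness = sum of squared activations; push it above theta for positive
        # (real) samples and below theta for negative (corrupted) samples.
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)
        loss = F.softplus(torch.cat([self.theta - g_pos, g_neg - self.theta])).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        # Detach so the next layer is trained only on this layer's outputs.
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()
```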

InferTurbo: A Scalable System for Boosting Full-graph Inference of Graph Neural Network over Huge Graphs

  • paper_url: http://arxiv.org/abs/2307.00228
  • repo_url: None
  • paper_authors: Dalong Zhang, Xianzheng Song, Zhiyang Hu, Yang Li, Miao Tao, Binbin Hu, Lin Wang, Zhiqiang Zhang, Jun Zhou
  • for: 提高工业场景中GNNS的推理效率和可扩展性。
  • methods: 提出一个可扩展的系统名为InferTurbo,通过采用GAS(集合应用散列)Schema来解决扩展性、可靠性和可扩展性等问题。
  • results: 实验结果表明,InferTurbo可以快速和高效地完成GNNS的推理任务,并且在图中含有一些核心节点时能够更好地均衡负载。系统可以在2小时内完成一个图中百亿个边的GNNS推理任务,并且与传统推理管道相比有显著的性能提升。
    Abstract GNN inference is a non-trivial task, especially in industrial scenarios with giant graphs, given three main challenges, i.e., scalability tailored for full-graph inference on huge graphs, inconsistency caused by stochastic acceleration strategies (e.g., sampling), and the serious redundant computation issue. To address the above challenges, we propose a scalable system named InferTurbo to boost the GNN inference tasks in industrial scenarios. Inspired by the philosophy of ``think-like-a-vertex", a GAS-like (Gather-Apply-Scatter) schema is proposed to describe the computation paradigm and data flow of GNN inference. The computation of GNNs is expressed in an iteration manner, in which a vertex would gather messages via in-edges and update its state information by forwarding an associated layer of GNNs with those messages and then send the updated information to other vertexes via out-edges. Following the schema, the proposed InferTurbo can be built with alternative backends (e.g., batch processing system or graph computing system). Moreover, InferTurbo introduces several strategies like shadow-nodes and partial-gather to handle nodes with large degrees for better load balancing. With InferTurbo, GNN inference can be hierarchically conducted over the full graph without sampling and redundant computation. Experimental results demonstrate that our system is robust and efficient for inference tasks over graphs containing some hub nodes with many adjacent edges. Meanwhile, the system gains a remarkable performance compared with the traditional inference pipeline, and it can finish a GNN inference task over a graph with tens of billions of nodes and hundreds of billions of edges within 2 hours.
    摘要 原文:GNN推理是一个非常复杂的任务,特别在工业场景中面临巨大图的情况下,因为存在三个主要挑战:一是扩展性,二是随机加速策略(如采样)引起的不一致性,三是严重的重复计算问题。为了解决这些挑战,我们提出了一个可扩展的系统名为InferTurbo,用于加速工业场景中的GNN推理任务。根据“思考如Vertex”的哲学,我们提出了一种GAS(聚合应用散发)模式来描述GNN推理的计算范式和数据流程。GNN的计算是在迭代方式下进行,每个顶点都会通过入边收集消息,并将这些消息与GNN层进行相应的更新,然后将更新后的信息发送给其他顶点via出边。按照这种模式,我们提出的InferTurbo可以采用替换性的后端(如批处理系统或图计算系统)。此外,InferTurbo还引入了一些策略,如影子节点和部分聚合,以更好地协调节点的负载均衡。通过InferTurbo,GNN推理可以在全图上进行不需要采样和重复计算。实验结果表明,我们的系统具有良好的稳定性和效率,可以快速完成含有一些核心节点的GNN推理任务。同时,我们的系统与传统推理管道相比,具有显著的性能优势,可以在2小时内完成一个图中百亿个节点、千亿个边的GNN推理任务。
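
A toy, dependency-light sketch of the gather-apply-scatter (GAS) iteration pattern the abstract describes, written over a plain edge list. The mean aggregation, the `layer_fn` callable, and the sizes are assumptions for illustration and say nothing about InferTurbo's actual backends.

```python
# Sketch of one GAS round: every vertex gathers messages over its in-edges,
# applies one GNN layer, and the updated states are what neighbors gather next.
import numpy as np

def gas_layer(edges, h, layer_fn):
    """edges: (E, 2) array of (src, dst); h: (N, d) node states."""
    n = h.shape[0]
    agg = np.zeros_like(h)
    deg = np.zeros(n)
    for src, dst in edges:          # Gather
        agg[dst] += h[src]
        deg[dst] += 1
    agg = agg / np.maximum(deg, 1)[:, None]
    return layer_fn(h, agg)         # Apply (scatter is implicit in the next round)

# Example: a linear "layer" combining self state and aggregated neighbor state.
rng = np.random.default_rng(0)
W_self, W_nbr = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
layer = lambda h, agg: np.tanh(h @ W_self + agg @ W_nbr)
edges = np.array([[0, 1], [1, 2], [2, 0]])
h = rng.normal(size=(3, 8))
h = gas_layer(edges, h, layer)
```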

Causal Structure Learning by Using Intersection of Markov Blankets

  • paper_url: http://arxiv.org/abs/2307.00227
  • repo_url: https://github.com/ronedong/eembi
  • paper_authors: Yiran Dong, Chuanhou Gao
  • for: 本研究提出了一种新的 causal 结构学习算法,即 Endogenous and Exogenous Markov Blankets Intersection (EEMBI),该算法结合了 Bayesian 网络和Structural Causal Models (SCM) 的特点。
  • methods: 本研究使用了 EEMBI 算法,并在其基础上提出了一种扩展版本,即 EEMBI-PC,它将 PC 算法的最后一步纳入 EEMBI 中。
  • results: 研究人员通过使用 EEMBI 和 EEMBI-PC 算法,在不同的数据集上进行了实验,并证明了这些算法的有效性和可靠性。
    Abstract In this paper, we introduce a novel causal structure learning algorithm called Endogenous and Exogenous Markov Blankets Intersection (EEMBI), which combines the properties of Bayesian networks and Structural Causal Models (SCM). Furthermore, we propose an extended version of EEMBI, namely EEMBI-PC, which integrates the last step of the PC algorithm into EEMBI.
    摘要 在本文中,我们介绍了一种新的 causal structure 学习算法,即 Endogenous and Exogenous Markov Blankets Intersection (EEMBI),该算法结合了 bayesian networks 和 Structural Causal Models (SCM) 的特性。此外,我们还提出了 EEMBI 的扩展版本,即 EEMBI-PC,该版本将 PC 算法的最后一步纳入 EEMBI 中。

S-Omninet: Structured Data Enhanced Universal Multimodal Learning Architecture

  • paper_url: http://arxiv.org/abs/2307.00226
  • repo_url: None
  • paper_authors: Ye Xue, Diego Klabjan, Jean Utke
  • for: 本文旨在扩展和改进 Omninet 模型,使其能够同时处理多种模态和多个任务。
  • methods: 本文提出了三种改进:1) 通过cross-cache attention实现空间、时间与结构化特征之间的交互;2) 使用 patch embeddings 增强视觉输入的空间表示;3) 支持结构化数据。
  • results: 在多个多模态数据集上进行评估,提出的 Structured-data-enhanced Omninet(S-Omninet)模型相对基线取得了显著提升。
    Abstract Multimodal multitask learning has attracted an increasing interest in recent years. Singlemodal models have been advancing rapidly and have achieved astonishing results on various tasks across multiple domains. Multimodal learning offers opportunities for further improvements by integrating data from multiple modalities. Many methods are proposed to learn on a specific type of multimodal data, such as vision and language data. A few of them are designed to handle several modalities and tasks at a time. In this work, we extend and improve Omninet, an architecture that is capable of handling multiple modalities and tasks at a time, by introducing cross-cache attention, integrating patch embeddings for vision inputs, and supporting structured data. The proposed Structured-data-enhanced Omninet (S-Omninet) is a universal model that is capable of learning from structured data of various dimensions effectively with unstructured data through cross-cache attention, which enables interactions among spatial, temporal, and structured features. We also enhance spatial representations in a spatial cache with patch embeddings. We evaluate the proposed model on several multimodal datasets and demonstrate a significant improvement over the baseline, Omninet.
    摘要 多模态多任务学习在最近几年内得到了越来越多的关注。单模态模型在不同领域中取得了非常出众的成绩。多模态学习可以更好地提高模型的性能,通过结合多个模式的数据。许多方法是用来学习特定类型的多模态数据,如视觉语言数据。然而,只有一些方法可以同时处理多个模式和任务。在这项工作中,我们对Omninet架构进行扩展和改进,通过引入跨缓存注意力、 integrate patch embeddings для视觉输入和支持结构数据。我们提出的结构化数据增强Omninet(S-Omninet)是一种通用的模型,可以有效地从不同维度的结构数据中学习,并且可以通过跨缓存注意力和缓存中的补充表示来实现空间、时间和结构特征之间的交互。我们还增强缓存中的空间表示,通过将patch embeddings integrate into spatial cache。我们在多个多模态数据集上评估了提议的模型,并证明了与基eline Omninet相比,S-Omninet具有显著的提升。

Re-Think and Re-Design Graph Neural Networks in Spaces of Continuous Graph Diffusion Functionals

  • paper_url: http://arxiv.org/abs/2307.00222
  • repo_url: None
  • paper_authors: Tingting Dan, Jiaqi Ding, Ziquan Wei, Shahar Z Kovalsky, Minjeong Kim, Won Hwa Kim, Guorong Wu
  • for: This paper proposes a new framework for graph neural networks (GNNs) that addresses the limitation of locality in existing GNN models and improves their ability to capture long-range dependencies and global patterns in graphs.
  • methods: The proposed framework uses a new inductive bias based on variational analysis and maps discrete GNN models to continuous diffusion functionals. It also introduces a selective mechanism to address the trade-off between model depth and over-smoothing, and a novel generative adversarial network (GAN) that predicts spreading flows in graphs.
  • results: The proposed GNN models achieve state-of-the-art (SOTA) performance on popular graph learning benchmarks such as Cora, Citeseer, and Pubmed.
    Abstract Graph neural networks (GNNs) are widely used in domains like social networks and biological systems. However, the locality assumption of GNNs, which limits information exchange to neighboring nodes, hampers their ability to capture long-range dependencies and global patterns in graphs. To address this, we propose a new inductive bias based on variational analysis, drawing inspiration from the Brachistochrone problem. Our framework establishes a mapping between discrete GNN models and continuous diffusion functionals. This enables the design of application-specific objective functions in the continuous domain and the construction of discrete deep models with mathematical guarantees. To tackle over-smoothing in GNNs, we analyze the existing layer-by-layer graph embedding models and identify that they are equivalent to l2-norm integral functionals of graph gradients, which cause over-smoothing. Similar to edge-preserving filters in image denoising, we introduce total variation (TV) to align the graph diffusion pattern with global community topologies. Additionally, we devise a selective mechanism to address the trade-off between model depth and over-smoothing, which can be easily integrated into existing GNNs. Furthermore, we propose a novel generative adversarial network (GAN) that predicts spreading flows in graphs through a neural transport equation. To mitigate vanishing flows, we customize the objective function to minimize transportation within each community while maximizing inter-community flows. Our GNN models achieve state-of-the-art (SOTA) performance on popular graph learning benchmarks such as Cora, Citeseer, and Pubmed.
    摘要 图 neural network (GNN) 在社交网络和生物系统等领域广泛应用。然而,GNN 的本地性假设,即只与邻居交换信息,限制其捕捉长距离依赖和全局模式的能力。为解决这个问题,我们提出了一种新的假设基于变分分析, Drawing inspiration from the Brachistochrone problem。我们的框架将离散 GNN 模型与连续扩散函数联系起来,这使得可以在连续空间中设计应用特定的目标函数和建立具有数学保证的深度模型。为了解决 GNN 中的过滤问题,我们分析了现有的层次 Graph Embedding 模型,发现它们与 L2 范数积函数相等,导致过滤。类似于图边缘滤波器,我们引入全体变量(TV),以确保图像扩散模式与全局社区结构相匹配。此外,我们开发了一种选择性机制,以解决模型深度和过滤之间的负反向关系,这可以轻松地集成到现有的 GNN 中。此外,我们提出了一种基于神经运输方程的图文生成模型,通过最小化运输在每个社区内的交通量,以最大化社区间的运输量来避免消失流量。我们的 GNN 模型在 популяр的图学学习 Benchmark 上达到了状态的最佳性能(SOTA)。
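
The abstract contrasts the usual quadratic graph-gradient functional with a total-variation one; the sketch below writes both in standard notation (node embeddings x_i and edge weights w_ij are assumed notation, not the paper's).

```latex
% Two graph diffusion functionals over node embeddings x_i:
\begin{align*}
\mathcal{E}_{\ell_2}(x) &= \tfrac{1}{2}\sum_{(i,j)\in E} w_{ij}\,\lVert x_i - x_j\rVert_2^2
  && \text{(graph Dirichlet energy; minimizers over-smooth)}\\
\mathcal{E}_{\mathrm{TV}}(x) &= \sum_{(i,j)\in E} w_{ij}\,\lVert x_i - x_j\rVert_2
  && \text{(total variation; behaves like an edge-preserving filter on community boundaries)}
\end{align*}
```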

A Constructive Approach to Function Realization by Neural Stochastic Differential Equations

  • paper_url: http://arxiv.org/abs/2307.00215
  • repo_url: None
  • paper_authors: Tanya Veeravalli, Maxim Raginsky
  • for: 本研究针对神经动力系统的函数逼近问题,提出了一种与常见自顶向下思路相反的构造性方法。
  • methods: 本文使用概率方法和几何(李理论)方法来刻画此类系统所能实现的函数类。
  • results: 研究表明,通过对系统动力学特性进行限制,可以实现不同类型的函数逼近。
    Abstract The problem of function approximation by neural dynamical systems has typically been approached in a top-down manner: Any continuous function can be approximated to an arbitrary accuracy by a sufficiently complex model with a given architecture. This can lead to high-complexity controls which are impractical in applications. In this paper, we take the opposite, constructive approach: We impose various structural restrictions on system dynamics and consequently characterize the class of functions that can be realized by such a system. The systems are implemented as a cascade interconnection of a neural stochastic differential equation (Neural SDE), a deterministic dynamical system, and a readout map. Both probabilistic and geometric (Lie-theoretic) methods are used to characterize the classes of functions realized by such systems.
    摘要 通常,函数近似问题由神经动态系统来解决,通常采取顶部下降方法:任何连续函数都可以在给定的架构下被至高精度地近似。这可能导致具有高复杂度的控制系统,在应用中不实用。在这篇论文中,我们采取了相反的构建方法:我们对系统动力学进行了各种结构限制,并且根据这些限制,描述了神经动态系统可以实现的函数类型。这些系统通过神经随机分布方程(Neural SDE)、恒定动力学系统和读取映射来实现。我们使用概率方法和几何方法(Lie-theoretic)来描述这些系统实现的函数类型。

More for Less: Compact Convolutional Transformers Enable Robust Medical Image Classification with Limited Data

  • paper_url: http://arxiv.org/abs/2307.00213
  • repo_url: None
  • paper_authors: Andrew Kean Gao
  • for: 这个研究旨在测试Compact Convolutional Transformers(CCT)是否能够在具有限制数据的医疗图像分类中提供高精度的结果。
  • methods: 本研究使用了CCT,它是一种结合了transformers和卷积层的混合模型,以测试其在有限数据的医疗图像分类中的效果。
  • results: 研究发现,CCT在一个小型数据集上可以达到92.49%的分类精度和0.9935的微平均ROC AUC,并在5个epoch后验证精度超过80%。
    Abstract Transformers are very powerful tools for a variety of tasks across domains, from text generation to image captioning. However, transformers require substantial amounts of training data, which is often a challenge in biomedical settings, where high quality labeled data can be challenging or expensive to obtain. This study investigates the efficacy of Compact Convolutional Transformers (CCT) for robust medical image classification with limited data, addressing a key issue faced by conventional Vision Transformers - their requirement for large datasets. A hybrid of transformers and convolutional layers, CCTs demonstrate high accuracy on modestly sized datasets. We employed a benchmark dataset of peripheral blood cell images of eight distinct cell types, each represented by approximately 2,000 low-resolution (28x28x3 pixel) samples. Despite the dataset size being smaller than those typically used with Vision Transformers, we achieved a commendable classification accuracy of 92.49% and a micro-average ROC AUC of 0.9935. The CCT also learned quickly, exceeding 80% validation accuracy after five epochs. Analysis of per-class precision, recall, F1, and ROC showed that performance was strong across cell types. Our findings underscore the robustness of CCTs, indicating their potential as a solution to data scarcity issues prevalent in biomedical imaging. We substantiate the applicability of CCTs in data-constrained areas and encourage further work on CCTs.
    摘要 transformers是非常强大的工具,可以用于多个领域的任务,从文本生成到图像描述。然而,transformers需要大量的训练数据,而在医学设置中,高质量的标注数据可能困难或者昂贵。这项研究检验了Compact Convolutional Transformers(CCT)在具有限制数据的情况下的稳定性,并解决了传统的视图转换器面临的大数据问题。CCT是一种组合了transformers和卷积层的混合型模型,在中等规模的数据集上达到了高精度。我们使用了8种不同类型的骨髓细胞图像 benchmark数据集,每种类型有约2000个低分辨率(28x28x3像素)的样本。尽管数据集的大小小于传统使用的视图转换器数据集,但我们达到了92.49%的分类精度和0.9935的微平均ROC AUC。CCT也快速学习,在5个epoch后超过80%的验证精度。分析每个类型的精度、回归、F1和ROC指标表明,CCT的性能强大,并且在各个细胞类型中表现良好。我们的发现证明了CCT的可靠性,表明它们在数据缺乏的情况下可以作为解决方案。我们鼓励进一步研究CCT,以推广其应用范围。
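
A minimal sketch of the two ingredients that let a Compact Convolutional Transformer work on small images: a convolutional tokenizer and attention-based sequence pooling in place of a CLS token. Layer sizes and depths below are illustrative assumptions sized for the 28x28x3 blood-cell images mentioned in the abstract, not the authors' architecture.

```python
# Sketch of a tiny CCT-style model: conv tokenizer -> transformer encoder -> sequence pooling.
import torch
import torch.nn as nn

class TinyCCT(nn.Module):
    def __init__(self, n_classes=8, dim=128, depth=4, heads=4):
        super().__init__()
        # Convolutional tokenizer instead of ViT's large-patch embedding.
        self.tokenizer = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 28x28 -> 14x14 = 196 tokens
        )
        enc = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                         dim_feedforward=2 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=depth)
        self.attn_pool = nn.Linear(dim, 1)   # sequence pooling instead of a CLS token
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):
        tok = self.tokenizer(x).flatten(2).transpose(1, 2)   # (B, 196, dim)
        z = self.encoder(tok)
        w = torch.softmax(self.attn_pool(z), dim=1)          # (B, 196, 1)
        return self.head((w * z).sum(dim=1))

logits = TinyCCT()(torch.randn(2, 3, 28, 28))   # -> shape (2, 8)
```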

An Interpretable Constructive Algorithm for Incremental Random Weight Neural Networks and Its Application

  • paper_url: http://arxiv.org/abs/2307.00185
  • repo_url: None
  • paper_authors: Jing Nan, Wei Dai, Guan Yuan, Ping Zhou
  • for: 本文提出了一种可解释性构建算法(ICA),用于解决快速学习的难点,即隐藏参数与剩余误差之间的关系难以解释。
  • methods: 本文提出了一种基于几何关系的可解释性构建算法(ICA),并采用了节点池策略来获得更易于收敛的隐藏参数。此外,本文证明了ICA的通用近似性质。
  • results: 实验结果表明,ICA在六个基准数据集和一个数学模拟数据集上表现出色,其中模型学习速度、模型准确率和模型网络结构都得到了改进。此外,本文还采用了两个实际应用案例来验证ICA在实践中的效果。
    Abstract Incremental random weight neural networks (IRWNNs) have gained attention in view of their easy implementation and fast learning. However, a significant drawback of IRWNNs is that the relationship between the hidden parameters (nodes) and the residual error (model performance) is difficult to interpret. To address the above issue, this article proposes an interpretable constructive algorithm (ICA) with geometric information constraint. First, based on the geometric relationship between the hidden parameters and the residual error, an interpretable geometric information constraint is proposed to randomly assign the hidden parameters. Meanwhile, a node pool strategy is employed to obtain hidden parameters that are more conducive to convergence from hidden parameters satisfying the proposed constraint. Furthermore, the universal approximation property of the ICA is proved. Finally, a lightweight version of ICA is presented for large-scale data modeling tasks. Experimental results on six benchmark datasets and a numerical simulation dataset demonstrate that the ICA outperforms other constructive algorithms in terms of modeling speed, model accuracy, and model network structure. Besides, two practical industrial application cases are used to validate the effectiveness of ICA in practical applications.
    摘要 incremenetal random weight neural networks (IRWNNs) 已经吸引了关注,因为它的实现容易和学习速度快。然而,IRWNNs 的一个重大缺点是hidden parameters 和 residual error 之间的关系难以被解释。为了解决这个问题,本文提出了一种可解释性建构算法(ICA),具有几何信息约束。首先,根据hidden parameters 和 residual error 的几何关系,提出了一种可解释性的几何信息约束,随机分配hidden parameters。此外,employs 一种node pool策略来从满足提出的约束的hidden parameters中获取更有利于收敛的hidden parameters。其次,证明了ICA的通用适应性。最后,为大规模数据模型任务提出了一种轻量级版本的ICA。实验结果表明,ICA在六个标准 benchmark 数据集和一个数学模拟数据集上的模型速度、模型准确性和模型网络结构方面都超过了其他构造算法。此外,通过两个实际应用案例, validate 了ICA在实际应用中的有效性。
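
For readers unfamiliar with this family of models, below is a bare-bones sketch of an incremental random-weight network with a node-pool screening step, the class of constructive learners the ICA builds on. The acceptance rule here is a plain residual-correlation test standing in for the paper's geometric constraint, and all sizes are illustrative assumptions.

```python
# Sketch: add random hidden nodes one at a time, screen a small candidate pool
# against the current residual, and refit the output weights by least squares.
import numpy as np

def train_irwnn(X, y, max_nodes=50, pool=20, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    H = np.empty((n, 0))
    residual, nodes, beta = y.copy(), [], None
    for _ in range(max_nodes):
        # Node-pool strategy: draw several candidates, keep the one whose activation
        # vector is most aligned with the residual.
        cands = [(rng.normal(size=d), rng.normal()) for _ in range(pool)]
        acts = [np.tanh(X @ w + b) for w, b in cands]
        scores = [abs(a @ residual) / (np.linalg.norm(a) * np.linalg.norm(residual) + 1e-12)
                  for a in acts]
        best = int(np.argmax(scores))
        nodes.append(cands[best])
        H = np.column_stack([H, acts[best]])
        beta, *_ = np.linalg.lstsq(H, y, rcond=None)   # refit output weights
        residual = y - H @ beta
    return nodes, beta

X = np.random.default_rng(1).uniform(-1, 1, size=(200, 2))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] ** 2
nodes, beta = train_irwnn(X, y)
```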

Still No Lie Detector for Language Models: Probing Empirical and Conceptual Roadblocks

  • paper_url: http://arxiv.org/abs/2307.00175
  • repo_url: https://github.com/balevinstein/probes
  • paper_authors: B. A. Levinstein, Daniel A. Herrmann
  • for: 本文研究了大语言模型(LLMs)是否具有信念,以及如果它们具有信念,我们如何测量它们。
  • methods: 本文评估了两种现有方法,一种由Azaria和Mitchell(2023)提出,另一种由Burns等人(2022)提出。我们提供了实验结果,表明这两种方法在基本上不能普遍化。
  • results: 本文提供了实验结果,表明现有方法无法检测LLMs的信念。我们还讨论了一些近期的Arguments,认为LLMs无法具有信念。我们表明这些Arguments是误导的,并提供了一种更产生的问题定义,以及未来工作的具体路径。
    Abstract We consider the questions of whether or not large language models (LLMs) have beliefs, and, if they do, how we might measure them. First, we evaluate two existing approaches, one due to Azaria and Mitchell (2023) and the other to Burns et al. (2022). We provide empirical results that show that these methods fail to generalize in very basic ways. We then argue that, even if LLMs have beliefs, these methods are unlikely to be successful for conceptual reasons. Thus, there is still no lie-detector for LLMs. After describing our empirical results we take a step back and consider whether or not we should expect LLMs to have something like beliefs in the first place. We consider some recent arguments aiming to show that LLMs cannot have beliefs. We show that these arguments are misguided. We provide a more productive framing of questions surrounding the status of beliefs in LLMs, and highlight the empirical nature of the problem. We conclude by suggesting some concrete paths for future work.
    摘要 我们考虑了大语言模型(LLM)是否具有信念,以及如果有,我们应如何测量这些信念的问题。我们首先评估了两种现有方法:一种由Azaria和Mitchell(2023)提出,另一种由Burns等人(2022)提出。我们给出的实验结果表明,这些方法在非常基础的情形下就无法泛化。我们进而论证,即使LLM具有信念,出于概念上的原因,这些方法也很难成功。因此,目前仍然没有针对LLM的"测谎仪"。在描述实验结果之后,我们退一步讨论LLM是否本应具有类似信念的东西。我们考察了近期一些试图证明LLM不可能具有信念的论证,并指出这些论证是有误导性的。我们为围绕LLM信念地位的问题提供了一种更有建设性的提法,强调该问题的经验性质,最后给出了一些具体的后续研究方向。

The Integer Linear Programming Inference Cookbook

  • paper_url: http://arxiv.org/abs/2307.00171
  • repo_url: None
  • paper_authors: Vivek Srikumar, Dan Roth
  • for: 这篇论文是为了导导读者在自然语言处理中使用整数线性计划来解决新的推理问题。
  • methods: 这篇论文使用了许多recipes来帮助读者将新的推理问题转化为整数线性计划的实例。
  • results: 文章结束有两个实践例子,用于说明使用这些recipes的过程。
    Abstract Over the years, integer linear programs have been employed to model inference in many natural language processing problems. This survey is meant to guide the reader through the process of framing a new inference problem as an instance of an integer linear program and is structured as a collection of recipes. At the end, we will see two worked examples to illustrate the use of these recipes.
    摘要 多年来,整数线性规划被用于建模许多自然语言处理问题中的推理。本综述旨在引导读者将新的推理问题表述为整数线性规划的实例,并以一系列"食谱"的形式组织。文末给出两个完整示例,以说明这些食谱的用法。

VoxWatch: An open-set speaker recognition benchmark on VoxCeleb

  • paper_url: http://arxiv.org/abs/2307.00169
  • repo_url: None
  • paper_authors: Raghuveer Peri, Seyed Omid Sadjadi, Daniel Garcia-Romero
  • for: 这篇论文主要关注开放集合说话人识别(OSI)问题,即判断一段测试语音是否属于一组预先注册的说话人(集内),还是来自集外说话人。
  • methods: 论文使用三种强大的基于神经网络的系统进行评测,并基于VoxCeleb数据集建立了首个公开的OSI基准。
  • results: 论文表明,通常采用的自适应分数归一化并不一定能提高OSI性能;而常用于说话人验证(SV)的分数校准和分数融合两种技术在OSI中带来了显著改善。
    Abstract Despite its broad practical applications such as in fraud prevention, open-set speaker identification (OSI) has received less attention in the speaker recognition community compared to speaker verification (SV). OSI deals with determining if a test speech sample belongs to a speaker from a set of pre-enrolled individuals (in-set) or if it is from an out-of-set speaker. In addition to the typical challenges associated with speech variability, OSI is prone to the "false-alarm problem"; as the size of the in-set speaker population (a.k.a watchlist) grows, the out-of-set scores become larger, leading to increased false alarm rates. This is in particular challenging for applications in financial institutions and border security where the watchlist size is typically of the order of several thousand speakers. Therefore, it is important to systematically quantify the false-alarm problem, and develop techniques that alleviate the impact of watchlist size on detection performance. Prior studies on this problem are sparse, and lack a common benchmark for systematic evaluations. In this paper, we present the first public benchmark for OSI, developed using the VoxCeleb dataset. We quantify the effect of the watchlist size and speech duration on the watchlist-based speaker detection task using three strong neural network based systems. In contrast to the findings from prior research, we show that the commonly adopted adaptive score normalization is not guaranteed to improve the performance for this task. On the other hand, we show that score calibration and score fusion, two other commonly used techniques in SV, result in significant improvements in OSI performance.
    摘要 尽管开放集群人识别(OSI)在实际应用中具有广泛的实用价值,如防止诈骗等,但在语音识别社区中,它受到了较少的关注。OSI的任务是判断测试语音样本是否属于一个先前 регистрирован的人员(在集群)中的人或者是来自外部人员。除了普通的语音变化挑战外,OSI还面临着“假警示问题”,即预先注册人员(watchlist)的大小增加后,外部人员的分数变大,导致假警示率增加。这对于金融机构和边境安全应用来说特别重要,因为预先注册人员的大小通常在千人左右。因此,系统地量化假警示问题,并开发技术来减少预先注册人员大小对检测性能的影响。现有研究对这个问题的研究不多,而且没有一个通用的评估标准。在这篇文章中,我们提供了第一个公共Benchmark для OSI,使用VoxCeleb数据集。我们使用三种强大的神经网络基于系统来评估预先注册人员大小和语音 duration对开放集群人识别任务的影响。与之前的研究不同,我们发现通常采用的adaptive score normalization不一定能提高表现。然而,我们发现对于这个任务,使用Score Calibration和Score Fusion两种通常用于语音识别中的技术可以实现显著的提高。
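
To make the watchlist setting concrete, here is a toy open-set decision rule: score a test embedding against every enrolled speaker, apply a simple affine score calibration, and report out-of-set when even the best match stays below a threshold. The embeddings, the calibration parameters, and the threshold are all assumptions for illustration, not the benchmark's protocol.

```python
# Sketch of watchlist-based open-set speaker identification.
import numpy as np

def detect(test_emb, watchlist, calib=(1.0, 0.0), threshold=0.5):
    """test_emb: (d,); watchlist: dict name -> (d,) enrolled embedding (unit-normalized)."""
    names = list(watchlist)
    scores = np.array([test_emb @ watchlist[n] for n in names])   # cosine scores
    a, b = calib
    scores = a * scores + b                                        # affine score calibration
    top = int(np.argmax(scores))
    if scores[top] < threshold:
        return "out-of-set", scores[top]
    return names[top], scores[top]
```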

U-Calibration: Forecasting for an Unknown Agent

  • paper_url: http://arxiv.org/abs/2307.00168
  • repo_url: None
  • paper_authors: Robert Kleinberg, Renato Paes Leme, Jon Schneider, Yifeng Teng
  • for: 评估对二元事件的预测:这些预测被理性代理人使用,代理人会根据预测采取行动,但其效用函数对预测者是未知的。
  • methods: 使用新的metric called U-calibration to evaluate forecasts, which guarantees sublinear regret for all possible agents.
  • results: 提供了一种实现$O(\sqrt{T})$ U-calibration误差的在线算法,并讨论了向多类预测设定的推广。
    Abstract We consider the problem of evaluating forecasts of binary events whose predictions are consumed by rational agents who take an action in response to a prediction, but whose utility is unknown to the forecaster. We show that optimizing forecasts for a single scoring rule (e.g., the Brier score) cannot guarantee low regret for all possible agents. In contrast, forecasts that are well-calibrated guarantee that all agents incur sublinear regret. However, calibration is not a necessary criterion here (it is possible for miscalibrated forecasts to provide good regret guarantees for all possible agents), and calibrated forecasting procedures have provably worse convergence rates than forecasting procedures targeting a single scoring rule. Motivated by this, we present a new metric for evaluating forecasts that we call U-calibration, equal to the maximal regret of the sequence of forecasts when evaluated under any bounded scoring rule. We show that sublinear U-calibration error is a necessary and sufficient condition for all agents to achieve sublinear regret guarantees. We additionally demonstrate how to compute the U-calibration error efficiently and provide an online algorithm that achieves $O(\sqrt{T})$ U-calibration error (on par with optimal rates for optimizing for a single scoring rule, and bypassing lower bounds for the traditionally calibrated learning procedures). Finally, we discuss generalizations to the multiclass prediction setting.
    摘要 我们考虑一个评价预测 binary 事件的问题,其中预测是由理智的代理人运行,但预测的价值是不知道的。我们表明了优化预测的问题,不能 garantte 所有可能的代理人对预测的 regret 是低的。相反,几乎准确的预测可以保证所有代理人的 regret 是线性的。但是,准确ness 不是必要的条件(可能有错误的预测仍然可以为所有可能的代理人提供好的 regret guarantees),并且准确的预测程序有可靠的更差的融合率。 motivated by this,我们提出了一个新的预测评估标准,即 U-calibration,定义为任何紧bounded scoring rule 下的预测序列最大的 regret。我们表明了 sublinear U-calibration error 是所有代理人都可以获得 sublinear regret guarantees 的必需和充分条件。此外,我们还说明了如何实时计算 U-calibration error 的方法,并提出了一个在线上运行的算法,可以在 $O(\sqrt{T})$ 的 U-calibration error 下 достичь最佳的 regret guarantees,并且超越了传统的准确预测程序的下界。最后,我们讨论了多项预测设定下的扩展。
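
A hedged restatement of the metric, as far as the abstract describes it: U-calibration is the worst-case regret of the forecast sequence over a class of bounded scoring rules. The comparator below, the best fixed forecast in hindsight, is an assumption made only to write the formula down; the paper's exact benchmark class may differ.

```latex
% p_1..p_T are the issued forecasts, y_1..y_T the realized outcomes, and S ranges
% over bounded scoring rules treated as losses (smaller is better).
\[
\mathrm{UCal}(p_{1:T}) \;=\; \sup_{S \in \mathcal{S}_{\mathrm{bounded}}}
\Bigl( \sum_{t=1}^{T} S(p_t, y_t) \;-\; \min_{p^\ast \in [0,1]} \sum_{t=1}^{T} S(p^\ast, y_t) \Bigr).
\]
```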

Counterfactual Collaborative Reasoning

  • paper_url: http://arxiv.org/abs/2307.00165
  • repo_url: None
  • paper_authors: Jianchao Ji, Zelong Li, Shuyuan Xu, Max Xiong, Juntao Tan, Yingqiang Ge, Hao Wang, Yongfeng Zhang
  • for: 提高机器学习模型的准确率和可读性
  • methods: 结合Counterfactual Collaborative Reasoning(CCR)和神经网络逻辑理解,使用推荐系统为例,解决数据稀缺问题,提高模型性能和透明度
  • results: 在三个实际数据集上实现更好的表现,比非加工模型和隐式加工模型更高,同时也提高模型的可读性。
    Abstract Causal reasoning and logical reasoning are two important types of reasoning abilities for human intelligence. However, their relationship has not been extensively explored under machine intelligence context. In this paper, we explore how the two reasoning abilities can be jointly modeled to enhance both accuracy and explainability of machine learning models. More specifically, by integrating two important types of reasoning ability -- counterfactual reasoning and (neural) logical reasoning -- we propose Counterfactual Collaborative Reasoning (CCR), which conducts counterfactual logic reasoning to improve the performance. In particular, we use recommender system as an example to show how CCR alleviate data scarcity, improve accuracy and enhance transparency. Technically, we leverage counterfactual reasoning to generate "difficult" counterfactual training examples for data augmentation, which -- together with the original training examples -- can enhance the model performance. Since the augmented data is model irrelevant, they can be used to enhance any model, enabling the wide applicability of the technique. Besides, most of the existing data augmentation methods focus on "implicit data augmentation" over users' implicit feedback, while our framework conducts "explicit data augmentation" over users explicit feedback based on counterfactual logic reasoning. Experiments on three real-world datasets show that CCR achieves better performance than non-augmented models and implicitly augmented models, and also improves model transparency by generating counterfactual explanations.
    摘要 人工智能中的 causal reasoning 和逻辑理解是两种重要的理解能力。然而,这两种理解能力在机器人智能上的关系尚未得到广泛探讨。在这篇论文中,我们explore了两种理解能力如何结合以提高机器学习模型的准确性和可读性。更 Specifically,我们提出了Counterfactual Collaborative Reasoning(CCR),它通过将counterfactual逻辑理解和神经网络逻辑理解结合起来,以提高表现。具体来说,我们使用推荐系统作为示例,展示了如何CCR可以适应数据稀缺、提高准确性和提高透明度。技术上,我们利用counterfactual逻辑理解生成"difficult" counterfactual训练示例,用于数据加工。这些示例,与原始训练示例一起,可以提高模型性能。由于扩展数据是模型无关的,它们可以用于提高任何模型,因此这种技术具有广泛的可用性。此外,大多数现有的数据加工方法都是基于用户的隐式反馈进行"隐式数据加工",而我们的框架则是基于counterfactual逻辑理解进行"显式数据加工"。实验结果表明,CCR在三个实际数据集上表现更好于未加工模型和隐式加工模型,并且也提高了模型的透明度。

What do self-supervised speech models know about words?

  • paper_url: http://arxiv.org/abs/2307.00162
  • repo_url: None
  • paper_authors: Ankita Pasad, Chung-Ming Chien, Shane Settle, Karen Livescu
  • for: 本研究旨在 investigate 自我supervised speech模型(S3M)中 Encoding 语言信息的层次结构,以及不同模型中 Encoding 词级信息的方式。
  • methods: 本研究使用 canonical correlation analysis (CCA) 来度量不同层次中 Encoding 的词级语言特征,并对不同模型的层次表现进行了比较分析。
  • results: 研究发现,最佳词级语言内容通常存在模型中间层次,而一些较低级别信息,如发音,也保留在 huBERT 和 WavLM 中高层次。同时,研究发现,不同模型中 Encoding 词级信息的层次分布与语言属性具有相似的特征。此外,研究还发现,使用 HuBERT 和 WavLM 的最佳层次,可以直接实现一些任务的优秀表现,例如词汇识别、词语分 segmentation 和 semanticsentence similarity。
    Abstract Many self-supervised speech models (S3Ms) have been introduced over the last few years, producing performance and data efficiency improvements for a variety of speech tasks. Evidence is emerging that different S3Ms encode linguistic information in different layers, and also that some S3Ms appear to learn phone-like sub-word units. However, the extent to which these models capture larger linguistic units, such as words, and where word-related information is encoded, remains unclear. In this study, we conduct several analyses of word segment representations extracted from different layers of three S3Ms: wav2vec2, HuBERT, and WavLM. We employ canonical correlation analysis (CCA), a lightweight analysis tool, to measure the similarity between these representations and word-level linguistic properties. We find that the maximal word-level linguistic content tends to be found in intermediate model layers, while some lower-level information like pronunciation is also retained in higher layers of HuBERT and WavLM. Syntactic and semantic word attributes have similar layer-wise behavior. We also find that, for all of the models tested, word identity information is concentrated near the center of each word segment. We then test the layer-wise performance of the same models, when used directly with no additional learned parameters, on several tasks: acoustic word discrimination, word segmentation, and semantic sentence similarity. We find similar layer-wise trends in performance, and furthermore, find that when using the best-performing layer of HuBERT or WavLM, it is possible to achieve performance on word segmentation and sentence similarity that rivals more complex existing approaches.
    摘要 In this study, we analyze word segment representations extracted from different layers of three S3Ms: wav2vec2, HuBERT, and WavLM. We use canonical correlation analysis (CCA) to measure the similarity between these representations and word-level linguistic properties. Our results show that the most comprehensive word-level linguistic content is found in the intermediate layers of the models, while some lower-level information like pronunciation is also retained in higher layers of HuBERT and WavLM. Additionally, we find that the layer-wise behavior of syntactic and semantic word attributes is similar.We also investigate the layer-wise performance of the models on several tasks: acoustic word discrimination, word segmentation, and semantic sentence similarity. Our findings show that the best-performing layers of HuBERT and WavLM achieve performance on word segmentation and sentence similarity that is comparable to more complex existing approaches. Furthermore, we find that the layer-wise trends in performance are similar across tasks.Overall, our study provides insights into the representation of linguistic information in self-supervised speech models and demonstrates the potential of using these models for speech processing tasks.
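
A toy version of the layer-wise probe the paper describes: compare mean-pooled word-segment features from one model layer against word-level attribute vectors with canonical correlation analysis. Array shapes, the use of scikit-learn's CCA, and the synthetic data below are illustrative assumptions.

```python
# Sketch: lightweight CCA similarity between layer features and word-level properties.
import numpy as np
from sklearn.cross_decomposition import CCA

def cca_similarity(layer_feats, word_attrs, n_components=4):
    """layer_feats: (n_words, d_model) pooled segment features from one layer;
    word_attrs: (n_words, d_attr) word-level properties (e.g., syntactic/semantic tags)."""
    cca = CCA(n_components=n_components, max_iter=1000)
    a, b = cca.fit_transform(layer_feats, word_attrs)
    # Mean correlation across the canonical dimensions.
    return np.mean([np.corrcoef(a[:, k], b[:, k])[0, 1] for k in range(n_components)])

rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 64))              # pretend features from one model layer
attrs = feats[:, :8] @ rng.normal(size=(8, 5))  # attributes partly predictable from them
print(cca_similarity(feats, attrs))             # high values => the layer encodes these attributes
```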

FFPDG: Fast, Fair and Private Data Generation

  • paper_url: http://arxiv.org/abs/2307.00161
  • repo_url: None
  • paper_authors: Weijie Xu, Jinjin Zhao, Francis Iannacci, Bo Wang
  • for: 本研究旨在提出一种快速、公平、灵活和隐私的数据生成方法,以解决现有的生成模型具有偏见和高计算资源需求的问题。
  • methods: 本研究借鉴了近期基于GAN [Goodfellow et al., 2014] 的方法,并提出了一种带约束的数据生成方法,以保证生成的数据具有公平性和隐私性。
  • results: 本研究通过理论和实验验证了提出的方法的有效性,并证明了模型在实际应用场景中的良好性能。
    Abstract Generative modeling has been used frequently in synthetic data generation. Fairness and privacy are two big concerns for synthetic data. Although recent GAN-based methods [Goodfellow et al., 2014] show good results in preserving privacy, the generated data may be more biased. At the same time, these methods require high computation resources. In this work, we design a fast, fair, flexible and private data generation method. We show the effectiveness of our method theoretically and empirically. We show that models trained on data generated by the proposed method can perform well (in inference stage) on real application scenarios.
    摘要 生成模型已经广泛应用于合成数据生成。公平和隐私是合成数据的两大关注点。虽然最近基于GAN [Goodfellow et al., 2014] 的方法在保护隐私方面表现良好,但生成的数据可能更加偏倚,同时这些方法需要大量计算资源。在这项工作中,我们设计了一种快速、公平、灵活且保护隐私的数据生成方法。我们从理论和实验两方面证明了方法的有效性,并表明基于该方法生成的数据训练的模型在真实应用场景中(推理阶段)表现良好。

The Effect of Balancing Methods on Model Behavior in Imbalanced Classification Problems

  • paper_url: http://arxiv.org/abs/2307.00157
  • repo_url: None
  • paper_authors: Adrian Stando, Mustafa Cavus, Przemysław Biecek
  • for: Addressing the challenge of imbalanced data in classification by examining the impact of balancing methods on model behavior.
  • methods: Utilizes Explainable Artificial Intelligence tools such as variable importance, partial dependence profile, and accumulated local effects to compare models trained on datasets before and after balancing.
  • results: Shows significant changes in model behavior due to balancing methods, which can lead to biased models toward a balanced distribution. These findings emphasize the importance of considering the impact of balancing methods on model behavior beyond just performance comparisons.
    Abstract Imbalanced data poses a significant challenge in classification as model performance is affected by insufficient learning from minority classes. Balancing methods are often used to address this problem. However, such techniques can lead to problems such as overfitting or loss of information. This study addresses a more challenging aspect of balancing methods - their impact on model behavior. To capture these changes, Explainable Artificial Intelligence tools are used to compare models trained on datasets before and after balancing. In addition to the variable importance method, this study uses the partial dependence profile and accumulated local effects techniques. Real and simulated datasets are tested, and an open-source Python package edgaro is developed to facilitate this analysis. The results obtained show significant changes in model behavior due to balancing methods, which can lead to biased models toward a balanced distribution. These findings confirm that balancing analysis should go beyond model performance comparisons to achieve higher reliability of machine learning models. Therefore, we propose a new method performance gain plot for informed data balancing strategy to make an optimal selection of balancing method by analyzing the measure of change in model behavior versus performance gain.
    摘要 不均衡数据对分类构成重大挑战,因为模型从少数类中学习不足会影响其性能。为解决这一问题,通常使用各类平衡方法;然而,这些技术可能导致过拟合或信息损失。本研究关注平衡方法一个更具挑战性的方面:它们对模型行为的影响。为捕捉这些变化,我们使用可解释人工智能工具比较在平衡前后的数据集上训练的模型;除变量重要性方法外,还使用了部分依赖剖面与累积局部效应技术。我们在真实与模拟数据集上进行了测试,并开发了开源Python包edgaro以支持这类分析。结果显示,平衡方法会使模型行为发生显著变化,可能导致模型偏向平衡分布。这些发现证实,为提高机器学习模型的可靠性,平衡分析不应止步于模型性能比较。因此,我们提出了一种新的性能增益图方法,通过分析模型行为变化程度与性能增益之间的关系,为平衡方法的选择提供依据。
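
A small end-to-end sketch of the kind of behaviour-change check the paper advocates: fit the same classifier before and after SMOTE balancing and compare a partial-dependence curve on one feature rather than only the accuracy. The synthetic dataset, the feature index, and the manual PDP helper are assumptions for illustration (the paper's own tooling is the edgaro package).

```python
# Sketch: measure how much balancing shifts a model's partial dependence profile.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from imblearn.over_sampling import SMOTE

def partial_dependence_curve(model, X, feature, grid):
    # Manual PDP: average positive-class probability while sweeping one feature.
    return np.array([model.predict_proba(
        np.column_stack([X[:, :feature], np.full(len(X), v), X[:, feature + 1:]])
    )[:, 1].mean() for v in grid])

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 20)

raw = RandomForestClassifier(random_state=0).fit(X, y)
Xb, yb = SMOTE(random_state=0).fit_resample(X, y)
bal = RandomForestClassifier(random_state=0).fit(Xb, yb)

# A large gap between the two curves indicates balancing changed model behaviour,
# not just its headline performance.
gap = np.abs(partial_dependence_curve(raw, X, 0, grid)
             - partial_dependence_curve(bal, X, 0, grid)).max()
print(f"max PDP shift on feature 0: {gap:.3f}")
```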

Stitched ViTs are Flexible Vision Backbones

  • paper_url: http://arxiv.org/abs/2307.00154
  • repo_url: https://github.com/ziplab/sn-netv2
  • paper_authors: Zizheng Pan, Jing Liu, Haoyu He, Jianfei Cai, Bohan Zhuang
  • for: This paper aims to improve the efficiency of training and deployment of large pre-trained vision Transformers (ViTs) for downstream tasks.
  • methods: The authors propose a new framework called SN-Netv2, which stitches pre-trained model families to create a single model that supports diverse performance-efficiency trade-offs at runtime.
  • results: The authors achieve strong adaptation and efficiency in downstream tasks, including ImageNet-1K, ADE20K, COCO-Stuff-10K, NYUv2, and COCO-2017, with extensive experiments demonstrating the effectiveness of SN-Netv2.
    Abstract Large pretrained plain vision Transformers (ViTs) have been the workhorse for many downstream tasks. However, existing works utilizing off-the-shelf ViTs are inefficient in terms of training and deployment, because adopting ViTs with individual sizes requires separate training and is restricted by fixed performance-efficiency trade-offs. In this paper, we are inspired by stitchable neural networks, which is a new framework that cheaply produces a single model that covers rich subnetworks by stitching pretrained model families, supporting diverse performance-efficiency trade-offs at runtime. Building upon this foundation, we introduce SN-Netv2, a systematically improved model stitching framework to facilitate downstream task adaptation. Specifically, we first propose a Two-way stitching scheme to enlarge the stitching space. We then design a resource-constrained sampling strategy that takes into account the underlying FLOPs distributions in the space for improved sampling. Finally, we observe that learning stitching layers is a low-rank update, which plays an essential role on downstream tasks to stabilize training and ensure a good Pareto frontier. With extensive experiments on ImageNet-1K, ADE20K, COCO-Stuff-10K, NYUv2 and COCO-2017, SN-Netv2 demonstrates strong ability to serve as a flexible vision backbone, achieving great advantages in both training efficiency and adaptation. Code will be released at https://github.com/ziplab/SN-Netv2.
    摘要 大型预训练的平面视 transformer(ViT)已成为许多下游任务的工具马。然而,现有的使用批处理的ViT的工作是不高效的,因为采用不同的ViT大小需要单独的训练和固定的性能精度负担。在这篇论文中,我们受到折叠神经网络的启发,这是一种新的框架,可以便宜地生成一个单个模型,覆盖多样性的性能精度负担。基于这个基础,我们引入SN-Netv2,一个系统地改进的模型缝合框架,以便下游任务的适应。我们首先提出了两种缝合方案,以扩大缝合空间。然后,我们设计了考虑下面的FLOPs分布的资源限制 sampling策略。最后,我们发现学习缝合层是一个低级别的更新,在下游任务中很重要,以稳定训练和保证良好的Pareto前沿。通过对ImageNet-1K、ADE20K、COCO-Stuff-10K、NYUv2和COCO-2017进行了广泛的实验,SN-Netv2表现出了强大的灵活视觉基础,在训练效率和适应方面都取得了优异的成绩。代码将在https://github.com/ziplab/SN-Netv2上发布。

Hierarchical Neural Coding for Controllable CAD Model Generation

  • paper_url: http://arxiv.org/abs/2307.00149
  • repo_url: https://github.com/samxuxiang/hnc-cad
  • paper_authors: Xiang Xu, Pradeep Kumar Jayaraman, Joseph G. Lambourne, Karl D. D. Willis, Yasutaka Furukawa
  • for: 本研究开发了一种新型的计算机支持设计(CAD)生成模型,以实现高级设计概念的生成和调整。
  • methods: 本研究使用了一种基于三层嵌入的神经代码树,将设计概念分解为全局部件安排、中度曲线几何和小度特征三层次。另外,使用了一种实验新的vector quantized VAE,将设计变化捕捉为神经代码库。
  • results: 本研究的实验结果显示,这种生成模型在单独生成和增强设计交互等任务上具有优秀的性能,并且可以实现设计调整和完成。代码可以在https://github.com/samxuxiang/hnc-cad中找到。
    Abstract This paper presents a novel generative model for Computer Aided Design (CAD) that 1) represents high-level design concepts of a CAD model as a three-level hierarchical tree of neural codes, from global part arrangement down to local curve geometry; and 2) controls the generation or completion of CAD models by specifying the target design using a code tree. Concretely, a novel variant of a vector quantized VAE with "masked skip connection" extracts design variations as neural codebooks at three levels. Two-stage cascaded auto-regressive transformers learn to generate code trees from incomplete CAD models and then complete CAD models following the intended design. Extensive experiments demonstrate superior performance on conventional tasks such as random generation while enabling novel interaction capabilities on conditional generation tasks. The code is available at https://github.com/samxuxiang/hnc-cad.
    摘要 本文提出了一种新的计算机辅助设计(CAD)生成模型:1) 将CAD模型的高层设计概念表示为三级层次的神经编码树,自全局零件布局直至局部曲线几何;2) 通过指定目标设计对应的编码树来控制CAD模型的生成或补全。具体而言,一种带"masked skip connection"的向量量化VAE新变体在三个层级上将设计变化提取为神经码本;两阶段级联的自回归Transformer先从不完整的CAD模型生成编码树,再按照预期设计补全CAD模型。大量实验表明,该模型在随机生成等常规任务上表现优越,并在条件生成任务上支持新的交互能力。代码见 https://github.com/samxuxiang/hnc-cad。

Abide by the Law and Follow the Flow: Conservation Laws for Gradient Flows

  • paper_url: http://arxiv.org/abs/2307.00144
  • repo_url: https://github.com/sibyllema/conservation_laws
  • paper_authors: Sibylle Marcotte, Rémi Gribonval, Gabriel Peyré
  • for: 这篇论文的目的是为了解释大型机器学习模型的最近成功,具体来说是研究梯度下降动力学的几何性质。
  • methods: 论文使用梯度流方法,包括计算模型的雅可比矩阵,并对其生成的李代数进行有限维代数运算。
  • results: 论文发现了一些保守的量(conservation laws),这些量是在梯度下降动力学中保持不变的独立量,并且可以用来解释训练过程中模型的良好泛化性。论文还提供了计算这些量的算法,并在一些ReLU网络架构上进行了实验验证。
    Abstract Understanding the geometric properties of gradient descent dynamics is a key ingredient in deciphering the recent success of very large machine learning models. A striking observation is that trained over-parameterized models retain some properties of the optimization initialization. This "implicit bias" is believed to be responsible for some favorable properties of the trained models and could explain their good generalization properties. The purpose of this article is threefold. First, we rigorously expose the definition and basic properties of "conservation laws", which are maximal sets of independent quantities conserved during gradient flows of a given model (e.g. of a ReLU network with a given architecture) with any training data and any loss. Then we explain how to find the exact number of these quantities by performing finite-dimensional algebraic manipulations on the Lie algebra generated by the Jacobian of the model. Finally, we provide algorithms (implemented in SageMath) to: a) compute a family of polynomial laws; b) compute the number of (not necessarily polynomial) conservation laws. We provide showcase examples that we fully work out theoretically. Besides, applying the two algorithms confirms for a number of ReLU network architectures that all known laws are recovered by the algorithm, and that there are no other laws. Such computational tools pave the way to understanding desirable properties of optimization initialization in large machine learning models.
    摘要 The purpose of this article is threefold:1. We rigorously expose the definition and basic properties of "conservation laws", which are maximal sets of independent quantities conserved during gradient flows of a given model (e.g. of a ReLU network with a given architecture) with any training data and any loss.2. We explain how to find the exact number of these quantities by performing finite-dimensional algebraic manipulations on the Lie algebra generated by the Jacobian of the model.3. We provide algorithms (implemented in SageMath) to:a. Compute a family of polynomial laws.b. Compute the number of (not necessarily polynomial) conservation laws.We provide showcase examples that we fully work out theoretically. Besides, applying the two algorithms confirms for a number of ReLU network architectures that all known laws are recovered by the algorithm, and that there are no other laws. Such computational tools pave the way to understanding desirable properties of optimization initialization in large machine learning models.Translated into Simplified Chinese:理解梯度下降动力学的几何性质是大机器学习模型最近成功的关键因素之一。一个 striking observation 是训练过的过参数模型保留了优化初始化的一些性质。这种 "隐式偏见" 被认为是训练模型的一些有利属性的原因,并可能解释它们的泛化性能。这篇文章的目的是三重的:1. 我们严格把定了 "保守定律" 的定义和基本性质,即梯度流动中的一个给定模型(例如 ReLU 网络)与任何训练数据和任何损失的情况下,保留的独立量的最大集。2. 我们解释了如何使用 finite-dimensional 代数推导来计算这些量的确切数量。3. 我们提供了两个算法( implemented in SageMath):a. 计算一家 polynomial 定律。b. 计算不必然 polynomial 定律的数量。我们提供了一些示例,并完全理论上处理这些示例。此外,通过应用这两个算法,我们发现了一些 ReLU 网络架构中的所有知道的法律都是由算法生成的,而没有其他法律。这些计算工具可以帮助我们理解大机器学习模型优化初始化的愉悦性质。Translated into Traditional Chinese:理解梯度下降动力学的几何性质是大机器学习模型最近成功的关键因素之一。一个 striking observation 是训练过的过参数模型保留了优化初始化的一些性质。这种 "隐式偏见" 被认为是训练模型的一些有利属性的原因,并可能解释它们的泛化性能。这篇文章的目的是三重的:1. 我们严格把定了 "保守定律" 的定义和基本性质,即梯度流动中的一个给定模型(例如 ReLU 网络)与任何训练数据和任何损失的情况下,保留的独立量的最大集。2. 我们解释了如何使用 finite-dimensional 代数推导来计算这些量的确切数量。3. 我们提供了两个算法( implemented in SageMath):a. 计算一家 polynomial 定律。b. 计算不必然 polynomial 定律的数量。我们提供了一些示例,并完全理论上处理这些示例。此外,通过应用这两个算法,我们发现了一些 ReLU 网络架构中的所有知道的法律都是由算法生成的,而没有其他法律。这些计算工具可以帮助我们理解大机器学习模型优化初始化的愉悂性质。

BuildingsBench: A Large-Scale Dataset of 900K Buildings and Benchmark for Short-Term Load Forecasting

  • paper_url: http://arxiv.org/abs/2307.00142
  • repo_url: https://github.com/nrel/buildingsbench
  • paper_authors: Patrick Emami, Abhijeet Sahu, Peter Graf
  • for: 短期预测住宅和商业建筑能源消耗,以便在电力系统中进行规划和管理。
  • methods: 使用数据驱动的短期负荷预测(STLF)技术,并对buildingsbench dataset进行评估和比较。
  • results: 研究发现,使用生成的synthetically pretrained模型可以在真实的商业建筑中进行良好的预测,而不需要进行细致的调整。同时,对于大多数目标建筑,进行 fine-tuning可以提高性能。
    Abstract Short-term forecasting of residential and commercial building energy consumption is widely used in power systems and continues to grow in importance. Data-driven short-term load forecasting (STLF), although promising, has suffered from a lack of open, large-scale datasets with high building diversity. This has hindered exploring the pretrain-then-finetune paradigm for STLF. To help address this, we present BuildingsBench, which consists of 1) Buildings-900K, a large-scale dataset of 900K simulated buildings representing the U.S. building stock, and 2) an evaluation platform with over 1,900 real residential and commercial buildings from 7 open datasets. BuildingsBench benchmarks two under-explored tasks: zero-shot STLF, where a pretrained model is evaluated on unseen buildings without fine-tuning, and transfer learning, where a pretrained model is fine-tuned on a target building. The main finding of our benchmark analysis is that synthetically pretrained models generalize surprisingly well to real commercial buildings. An exploration of the effect of increasing dataset size and diversity on zero-shot commercial building performance reveals a power-law with diminishing returns. We also show that fine-tuning pretrained models on real commercial and residential buildings improves performance for a majority of target buildings. We hope that BuildingsBench encourages and facilitates future research on generalizable STLF. All datasets and code can be accessed from \url{https://github.com/NREL/BuildingsBench}.
    摘要 住宅与商业建筑能耗的短期预测在电力系统中应用广泛,其重要性也在持续增长。数据驱动的短期负荷预测(STLF)虽然前景可期,却一直缺乏开放的、建筑多样性高的大规模数据集,这阻碍了对"预训练-微调"范式的探索。为此,我们提出了 BuildingsBench,它包括两个组成部分:1) Buildings-900K,一个包含 90 万栋模拟建筑、代表美国建筑存量的大规模数据集;2) 一个评估平台,涵盖来自 7 个开放数据集的 1,900 多栋真实住宅与商业建筑。BuildingsBench 对两个尚未被充分研究的任务进行了基准评测:零样本 STLF(预训练模型不经微调直接在未见过的建筑上评估)和迁移学习(预训练模型在目标建筑上微调)。我们的主要发现是,在合成数据上预训练的模型对真实商业建筑的泛化效果出人意料地好。对数据集规模与多样性影响零样本商业建筑性能的分析显示出收益递减的幂律关系。我们还发现,在真实商业与住宅建筑上微调预训练模型能提升大多数目标建筑上的性能。我们希望 BuildingsBench 能够鼓励并促进未来关于可泛化 STLF 的研究。所有数据集和代码可通过 \url{https://github.com/NREL/BuildingsBench} 获取。

Risk-sensitive Actor-free Policy via Convex Optimization

  • paper_url: http://arxiv.org/abs/2307.00141
  • repo_url: None
  • paper_authors: Ruoqi Zhang, Jens Sjölund
  • for: 传统强化学习方法在优化智能体时不考虑安全性,可能导致意外后果;本论文针对这一问题,提出基于条件风险价值(CVaR)的风险敏感目标函数来优化智能体。
  • methods: 该论文使用输入凸神经网络来建模风险敏感目标函数,保证目标函数关于动作是凸的,从而可以通过简单的梯度跟随方法确定全局最优动作。
  • results: 实验结果表明,该方法可以有效地维护风险控制。
    Abstract Traditional reinforcement learning methods optimize agents without considering safety, potentially resulting in unintended consequences. In this paper, we propose an optimal actor-free policy that optimizes a risk-sensitive criterion based on the conditional value at risk. The risk-sensitive objective function is modeled using an input-convex neural network ensuring convexity with respect to the actions and enabling the identification of globally optimal actions through simple gradient-following methods. Experimental results demonstrate the efficacy of our approach in maintaining effective risk control.
    摘要 传统的强化学习方法在优化智能体时不考虑安全性,可能导致意外后果。在这篇论文中,我们提出了一种无 actor 的最优策略,基于条件风险价值优化一个风险敏感的目标函数。该风险敏感目标函数使用输入凸神经网络建模,保证其关于动作是凸的,从而可以通过简单的梯度跟随方法确定全局最优动作。实验结果表明,我们的方法能够有效地保持风险控制。
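
The two ingredients named in the abstract can be sketched compactly: a network that is convex in the action (convex, non-decreasing activation plus non-negative weights on the previous convex layer), and an empirical CVaR of sampled costs. This is only an illustrative sketch under those assumptions, not the authors' architecture; layer sizes, training, and the state encoding are placeholders.

```python
import torch
import torch.nn as nn

# A risk network R(s, a) that is convex in the action a; minimizing a convex
# function of a with plain gradient descent then finds a globally optimal action.
class ActionConvexRisk(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.s1 = nn.Linear(state_dim, hidden)   # state path, unconstrained
        self.a1 = nn.Linear(action_dim, hidden)  # affine in a
        self.z2 = nn.Linear(hidden, 1, bias=False)
        self.a2 = nn.Linear(action_dim, 1)

    def forward(self, s, a):
        z1 = torch.relu(self.a1(a) + self.s1(s))   # convex in a
        w_pos = self.z2.weight.clamp(min=0)        # non-negative weights keep convexity
        return nn.functional.linear(z1, w_pos) + self.a2(a)

def cvar(costs, alpha=0.1):
    # empirical CVaR: mean of the worst alpha-fraction of sampled costs
    k = max(1, int(alpha * costs.numel()))
    return torch.topk(costs, k).values.mean()

def best_action(risk, s, action_dim, steps=200, lr=0.1):
    a = torch.zeros(1, action_dim, requires_grad=True)
    opt = torch.optim.SGD([a], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        risk(s, a).sum().backward()   # convex in a, so gradient following is global
        opt.step()
    return a.detach()
```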

Generalization Limits of Graph Neural Networks in Identity Effects Learning

  • paper_url: http://arxiv.org/abs/2307.00134
  • repo_url: https://github.com/aledinve/gnn_identity_effects
  • paper_authors: Giuseppe Alessio D’Inverno, Simone Brugiapaglia, Mirco Ravanelli
  • for: 本研究探讨了图神经网络(GNNs)在图域上进行数据驱动学习时的泛化能力与基本极限。
  • methods: 本研究基于消息传递机制,并利用 GNNs 与 Weisfeiler-Lehman(WL)图同构测试在表达能力上的等价性,分析其在学习"同一性效应"(即判断对象是否由两个相同部分组成)任务中的泛化性质。
  • results: 研究发现,在两字母单词的案例中,使用正交编码(如独热编码)并通过随机梯度下降训练的 GNNs 无法泛化到未见过的字母;而在双环图(由两个环组成的图)的案例中,借助 GNNs 与 WL 测试之间的联系可以得到正面的存在性结果。
    Abstract Graph Neural Networks (GNNs) have emerged as a powerful tool for data-driven learning on various graph domains. They are usually based on a message-passing mechanism and have gained increasing popularity for their intuitive formulation, which is closely linked to the Weisfeiler-Lehman (WL) test for graph isomorphism to which they have been proven equivalent in terms of expressive power. In this work, we establish new generalization properties and fundamental limits of GNNs in the context of learning so-called identity effects, i.e., the task of determining whether an object is composed of two identical components or not. Our study is motivated by the need to understand the capabilities of GNNs when performing simple cognitive tasks, with potential applications in computational linguistics and chemistry. We analyze two case studies: (i) two-letters words, for which we show that GNNs trained via stochastic gradient descent are unable to generalize to unseen letters when utilizing orthogonal encodings like one-hot representations; (ii) dicyclic graphs, i.e., graphs composed of two cycles, for which we present positive existence results leveraging the connection between GNNs and the WL test. Our theoretical analysis is supported by an extensive numerical study.
    摘要 图神经网络(GNNs)已成为在各类图域上进行数据驱动学习的强大工具。它们通常基于消息传递机制,凭借直观的构造而日益流行;其表达能力已被证明与用于图同构判定的 Weisfeiler-Lehman(WL)测试等价。在这项工作中,我们建立了 GNNs 在学习"同一性效应"(即判断对象是否由两个相同部分组成)任务中的新的泛化性质与基本极限。我们的研究源于理解 GNNs 执行简单认知任务的能力的需求,其潜在应用包括计算语言学与化学。我们分析了两个案例:(i) 两字母单词,我们证明使用独热表示等正交编码时,经随机梯度下降训练的 GNNs 无法泛化到未见过的字母;(ii) 双环图(由两个环组成的图),我们利用 GNNs 与 WL 测试之间的联系给出了正面的存在性结果。我们的理论分析得到了大量数值实验的支持。

Machine learning for advancing low-temperature plasma modeling and simulation

  • paper_url: http://arxiv.org/abs/2307.00131
  • repo_url: None
  • paper_authors: Jan Trieschmann, Luca Vialetto, Tobias Gergs
  • for: 本文主要介绍低温等离子体建模与仿真的最新进展,特别是机器学习和数据驱动模型的应用。
  • methods: 本文讨论了许多现有的机器学习算法和方法,包括逻辑回归、支持向量机和深度学习等,以及数据驱动模型的应用。
  • results: 本文通过对文献中的许多例子进行分析和评述,展示了机器学习和数据驱动模型在低温等离子体建模与仿真中的广泛应用和潜在进展。
    Abstract Machine learning has had an enormous impact in many scientific disciplines. Also in the field of low-temperature plasma modeling and simulation it has attracted significant interest within the past years. Whereas its application should be carefully assessed in general, many aspects of plasma modeling and simulation have benefited substantially from recent developments within the field of machine learning and data-driven modeling. In this survey, we approach two main objectives: (a) We review the state-of-the-art focusing on approaches to low-temperature plasma modeling and simulation. By dividing our survey into plasma physics, plasma chemistry, plasma-surface interactions, and plasma process control, we aim to extensively discuss relevant examples from literature. (b) We provide a perspective of potential advances to plasma science and technology. We specifically elaborate on advances possibly enabled by adaptation from other scientific disciplines. We argue that not only the known unknowns, but also unknown unknowns may be discovered due to an inherent propensity to spotlight hidden patterns in data.

Accelerating Inexact HyperGradient Descent for Bilevel Optimization

  • paper_url: http://arxiv.org/abs/2307.00126
  • repo_url: None
  • paper_authors: Haikuo Yang, Luo Luo, Chris Junchi Li, Michael I. Jordan
  • for: 这个论文是解决一般非凸-强 convex 双层优化问题的方法。
  • methods: 这个方法是 \emph{Restarted Accelerated HyperGradient Descent} (\texttt{RAHGD}) 方法,它可以在 $\mathcal{O}(\kappa^{3.25}\epsilon^{-1.75})$ oracle复杂度内找到一个 $\epsilon$-first-order站点点。
  • results: 这个方法可以在非凸-强 convex 双层优化问题中找到一个 $\big(\epsilon,\mathcal{O}(\kappa^{2.5}\sqrt{\epsilon},)\big)$-second-order站点点,并且超越了现有的Upper bound for finding second-order stationary points in nonconvex-strongly-concave minimax optimization problems,设置了新的状态纪录。empirical studies are conducted to validate the theoretical results in this paper.
    Abstract We present a method for solving general nonconvex-strongly-convex bilevel optimization problems. Our method -- the \emph{Restarted Accelerated HyperGradient Descent} (\texttt{RAHGD}) method -- finds an $\epsilon$-first-order stationary point of the objective with $\tilde{\mathcal{O}}(\kappa^{3.25}\epsilon^{-1.75})$ oracle complexity, where $\kappa$ is the condition number of the lower-level objective and $\epsilon$ is the desired accuracy. We also propose a perturbed variant of \texttt{RAHGD} for finding an $\big(\epsilon,\mathcal{O}(\kappa^{2.5}\sqrt{\epsilon}\,)\big)$-second-order stationary point within the same order of oracle complexity. Our results achieve the best-known theoretical guarantees for finding stationary points in bilevel optimization and also improve upon the existing upper complexity bound for finding second-order stationary points in nonconvex-strongly-concave minimax optimization problems, setting a new state-of-the-art benchmark. Empirical studies are conducted to validate the theoretical results in this paper.
    摘要 我们提出了一种求解一般非凸-强凸双层优化问题的方法。我们的方法,即重启加速超梯度下降(\texttt{RAHGD}),能以 $\tilde{\mathcal{O}}(\kappa^{3.25}\epsilon^{-1.75})$ 的 oracle 复杂度找到目标函数的一个 $\epsilon$-一阶驻点,其中 $\kappa$ 是下层目标函数的条件数,$\epsilon$ 是所需精度。我们还提出了 \texttt{RAHGD} 的一个扰动变体,能在同阶的 oracle 复杂度内找到一个 $\big(\epsilon,\mathcal{O}(\kappa^{2.5}\sqrt{\epsilon}\,)\big)$-二阶驻点。我们的结果达到了双层优化中寻找驻点的已知最优理论保证,并改进了非凸-强凹极小极大优化问题中寻找二阶驻点的现有复杂度上界,创造了新的最优记录。我们还进行了实验研究以验证本文的理论结果。

RObotic MAnipulation Network (ROMAN) $\unicode{x2013}$ Hybrid Hierarchical Learning for Solving Complex Sequential Tasks

  • paper_url: http://arxiv.org/abs/2307.00125
  • repo_url: None
  • paper_authors: Eleftherios Triantafyllidis, Fernando Acero, Zhaocheng Liu, Zhibin Li
  • for: 解决机器人 manipulate 多个复杂任务的长时间 horizon 问题
  • methods: 结合行为克隆、模仿学习与强化学习
  • results: 实验结果表明,ROMAN 能够为长序列的复杂操作任务生成正确的按序动作,并对多种感知噪声表现出鲁棒性。这些结果体现了 ROMAN 的动态适应能力与自主故障恢复能力,以及其在各类需要自适应运动技能的自主操作任务中的应用潜力。
    Abstract Solving long sequential tasks poses a significant challenge in embodied artificial intelligence. Enabling a robotic system to perform diverse sequential tasks with a broad range of manipulation skills is an active area of research. In this work, we present a Hybrid Hierarchical Learning framework, the Robotic Manipulation Network (ROMAN), to address the challenge of solving multiple complex tasks over long time horizons in robotic manipulation. ROMAN achieves task versatility and robust failure recovery by integrating behavioural cloning, imitation learning, and reinforcement learning. It consists of a central manipulation network that coordinates an ensemble of various neural networks, each specialising in distinct re-combinable sub-tasks to generate their correct in-sequence actions for solving complex long-horizon manipulation tasks. Experimental results show that by orchestrating and activating these specialised manipulation experts, ROMAN generates correct sequential activations for accomplishing long sequences of sophisticated manipulation tasks and achieving adaptive behaviours beyond demonstrations, while exhibiting robustness to various sensory noises. These results demonstrate the significance and versatility of ROMAN's dynamic adaptability featuring autonomous failure recovery capabilities, and highlight its potential for various autonomous manipulation tasks that demand adaptive motor skills.
    摘要 求解长序列任务是具身人工智能中的一大挑战。让机器人系统能够以广泛的操作技能执行多种多样的序列任务,是一个活跃的研究领域。在这项工作中,我们提出了一种混合层次学习框架,即机器人操作网络(ROMAN),以应对机器人操作中长时间跨度内求解多个复杂任务的挑战。ROMAN 通过结合行为克隆、模仿学习和强化学习,实现了任务多样性和稳健的故障恢复。它包含一个中央操作网络,用于协调一组各有专长、可重新组合的子任务神经网络,使其按正确顺序产生动作,从而完成复杂的长时程操作任务。实验结果表明,通过编排并激活这些专门的操作专家,ROMAN 能为长序列的复杂操作任务生成正确的按序激活,并实现超出示教范围的自适应行为,同时对多种感知噪声保持鲁棒性。这些结果体现了 ROMAN 具备自主故障恢复能力的动态适应性的重要意义与通用性,并凸显其在各类需要自适应运动技能的自主操作任务中的潜力。

How Do Human Users Teach a Continual Learning Robot in Repeated Interactions?

  • paper_url: http://arxiv.org/abs/2307.00123
  • repo_url: https://github.com/aliayub7/cl_hri
  • paper_authors: Ali Ayub, Jainish Mehta, Zachary De Francesco, Patrick Holthaus, Kerstin Dautenhahn, Chrystopher L. Nehaniv
  • for: 本研究旨在探讨人类在长期重复交互中如何教导持续学习机器人,以及不同用户的教学风格是否存在差异。
  • methods: 本研究在 Fetch 移动操作机器人上部署并测试了两种不同的持续学习模型,通过详尽的定性与定量分析,考察参与者的教学风格是否存在差异,以及这些差异是否影响机器人的性能。
  • results: 研究发现,参与者之间的教学风格存在显著差异,表明需要针对其各自的教学风格进行个性化适配。此外,尽管专家与非专家用户的教学风格有所不同,但这种差异并不影响机器人的性能。最后,研究发现目前广泛用于测试持续学习技术的受限实验设置并不充分,因为真实用户会以多种方式与持续学习机器人交互并进行教学。
    Abstract Continual learning (CL) has emerged as an important avenue of research in recent years, at the intersection of Machine Learning (ML) and Human-Robot Interaction (HRI), to allow robots to continually learn in their environments over long-term interactions with humans. Most research in continual learning, however, has been robot-centered to develop continual learning algorithms that can quickly learn new information on static datasets. In this paper, we take a human-centered approach to continual learning, to understand how humans teach continual learning robots over the long term and if there are variations in their teaching styles. We conducted an in-person study with 40 participants that interacted with a continual learning robot in 200 sessions. In this between-participant study, we used two different CL models deployed on a Fetch mobile manipulator robot. An extensive qualitative and quantitative analysis of the data collected in the study shows that there is significant variation among the teaching styles of individual users indicating the need for personalized adaptation to their distinct teaching styles. The results also show that although there is a difference in the teaching styles between expert and non-expert users, the style does not have an effect on the performance of the continual learning robot. Finally, our analysis shows that the constrained experimental setups that have been widely used to test most continual learning techniques are not adequate, as real users interact with and teach continual learning robots in a variety of ways. Our code is available at https://github.com/aliayub7/cl_hri.

Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control

  • paper_url: http://arxiv.org/abs/2307.00117
  • repo_url: https://github.com/rail-berkeley/grif_release
  • paper_authors: Vivek Myers, Andre He, Kuan Fang, Homer Walke, Philippe Hansen-Estruch, Ching-An Cheng, Mihai Jalobeanu, Andrey Kolobov, Anca Dragan, Sergey Levine
  • for: 这个论文的目的是为了让机器人按照自然语言指令进行操作,但获取大量标注数据(即包含任务的语言指令)是不可能的。因此,本论文提出了一种方法,利用只需要一小amount of语言数据来获得 JOINT image-和目标条件的策略。
  • methods: 本论文使用了图像目标和语言指令之间的匹配来实现这种方法。具体来说,它首先学习一个映射从标注数据中,将语言指令与图像目标之间的对应关系学习到一个 embedding 空间中。然后,它将这个 embedding 空间用于训练一个策略,这个策略可以利用所有的无标注数据进行训练,同时受益于 embedding 的匹配关系,以便通过语言指令来控制策略。
  • results: 本论文的实验结果表明,这种方法可以在实际世界中实现 robust 的任务完成。具体来说,它可以在不同的搬运任务中,在不同的场景中,以及使用不同的语言指令中,完成任务。此外,这种方法还可以在语言指令外的数据上进行扩展。视频和代码可以在https://rail-berkeley.github.io/grif/ 上找到。
    Abstract Our goal is for robots to follow natural language instructions like "put the towel next to the microwave." But getting large amounts of labeled data, i.e. data that contains demonstrations of tasks labeled with the language instruction, is prohibitive. In contrast, obtaining policies that respond to image goals is much easier, because any autonomous trial or demonstration can be labeled in hindsight with its final state as the goal. In this work, we contribute a method that taps into joint image- and goal- conditioned policies with language using only a small amount of language data. Prior work has made progress on this using vision-language models or by jointly training language-goal-conditioned policies, but so far neither method has scaled effectively to real-world robot tasks without significant human annotation. Our method achieves robust performance in the real world by learning an embedding from the labeled data that aligns language not to the goal image, but rather to the desired change between the start and goal images that the instruction corresponds to. We then train a policy on this embedding: the policy benefits from all the unlabeled data, but the aligned embedding provides an interface for language to steer the policy. We show instruction following across a variety of manipulation tasks in different scenes, with generalization to language instructions outside of the labeled data. Videos and code for our approach can be found on our website: https://rail-berkeley.github.io/grif/ .
    摘要 我们的目标是让机器人按照自然语言指令进行操作,如“把毯子放在微风炉旁”。但获取大量标注数据,即包含任务的示例并与语言指令相关的标签,是不可能的。相比之下,获取基于图像目标的策略是更加容易的,因为任何自主试验或示例都可以在后看的情况下被标注为其最终状态作为目标。在这项工作中,我们提供一种方法,可以通过只使用少量语言数据来使用图像和目标联合条件下的策略。先前的工作已经在这个领域做出了进步,使用视觉语言模型或同时培养语言目标条件下的策略,但是到目前为止,没有任何方法能够在实际世界中有效地执行机器人任务,不需要人类的批注。我们的方法可以在实际世界中达到稳定性,通过从标注数据中学习一个对应语言和目标之间的嵌入,使用这个嵌入来训练策略。我们的策略可以利用所有没有标注的数据,但是嵌入的对接使得语言可以控制策略。我们在不同的搬运任务中展示了 instrucion 遵循,并且在语言指令外部的数据上进行泛化。视频和代码可以在我们的网站上找到:https://rail-berkeley.github.io/grif/。

Ticket-BERT: Labeling Incident Management Tickets with Language Models

  • paper_url: http://arxiv.org/abs/2307.00108
  • repo_url: None
  • paper_authors: Zhexiong Liu, Cris Benge, Siduo Jiang
  • for: 高效地为 incident ticket 打上细粒度类别标签,以便确定处理优先级并加快解决
  • methods: 使用我们提出的票据集合并Ticket-BERT模型进行标签
  • results: 对Azure认知服务进行实验,表明Ticket-BERT超过基线和当前文本分类器的性能,并且通过活动学习循环和Microsoft IcM系统的部署,快速适应新收集的票据。
    Abstract An essential aspect of prioritizing incident tickets for resolution is efficiently labeling tickets with fine-grained categories. However, ticket data is often complex and poses several unique challenges for modern machine learning methods: (1) tickets are created and updated either by machines with pre-defined algorithms or by engineers with domain expertise that share different protocols, (2) tickets receive frequent revisions that update ticket status by modifying all or parts of ticket descriptions, and (3) ticket labeling is time-sensitive and requires knowledge updates and new labels per the rapid software and hardware improvement lifecycle. To handle these issues, we introduce Ticket- BERT which trains a simple yet robust language model for labeling tickets using our proposed ticket datasets. Experiments demonstrate the superiority of Ticket-BERT over baselines and state-of-the-art text classifiers on Azure Cognitive Services. We further encapsulate Ticket-BERT with an active learning cycle and deploy it on the Microsoft IcM system, which enables the model to quickly finetune on newly-collected tickets with a few annotations.
    摘要 高效地为事件工单(incident ticket)标注细粒度类别,是确定工单处理优先级的关键环节。然而,工单数据往往十分复杂,给现代机器学习方法带来若干独特挑战:(1) 工单既可能由带有预定义算法的机器创建和更新,也可能由遵循不同规范、具备领域专长的工程师创建和更新;(2) 工单会被频繁修订,通过修改全部或部分描述来更新工单状态;(3) 工单标注具有时效性,需要随着软件和硬件的快速迭代周期不断更新知识并引入新标签。为应对这些问题,我们提出了 Ticket-BERT,在我们构建的工单数据集上训练一个简单而稳健的语言模型来为工单打标签。实验表明,Ticket-BERT 在 Azure Cognitive Services 上优于基线方法和当前最先进的文本分类器。我们进一步为 Ticket-BERT 加入主动学习循环,并将其部署到 Microsoft IcM 系统上,使模型只需少量标注即可在新收集的工单上快速微调。

Distance Functions and Normalization Under Stream Scenarios

  • paper_url: http://arxiv.org/abs/2307.00106
  • repo_url: None
  • paper_authors: Eduardo V. L. Barboza, Paulo R. Lisboa de Almeida, Alceu de Souza Britto Jr, Rafael M. O. Cruz
  • for: 本研究旨在 investigate 数据流模型中的数据Normalization问题,以及不同距离函数在数据流中的表现。
  • methods: 本研究使用了 eight 种常见距离函数,包括不归一化、基于首批数据的归一化和基于前一批数据的归一化。
  • results: 结果显示,在事先对数据流一无所知的情况下,不进行归一化并搭配 Canberra 距离是一个不错的组合。
    Abstract Data normalization is an essential task when modeling a classification system. When dealing with data streams, data normalization becomes especially challenging since we may not know in advance the properties of the features, such as their minimum/maximum values, and these properties may change over time. We compare the accuracies generated by eight well-known distance functions in data streams without normalization, normalized considering the statistics of the first batch of data received, and considering the previous batch received. We argue that experimental protocols for streams that consider the full stream as normalized are unrealistic and can lead to biased and poor results. Our results indicate that using the original data stream without applying normalization, and the Canberra distance, can be a good combination when no information about the data stream is known beforehand.
    摘要 数据归一化是构建分类系统时的一项基本任务。在处理数据流时,归一化尤其困难,因为我们可能事先不知道特征的属性(如最小值和最大值),而且这些属性还可能随时间变化。我们比较了八种常用距离函数在三种设置下的准确率:不做归一化、按接收到的第一批数据的统计量归一化,以及按上一批数据的统计量归一化。我们认为,把整个数据流当作已归一化来设计实验协议是不现实的,可能导致有偏且较差的结果。我们的结果表明,在事先不了解数据流信息的情况下,直接使用原始数据流(不做归一化)并搭配 Canberra 距离是一个不错的组合。
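
The contrast studied here is easy to make concrete: the Canberra distance scales each coordinate by its own magnitude, so it needs no global normalization, whereas min-max scaling frozen on an earlier batch can go stale as the stream drifts. The snippet below is a minimal sketch of that comparison with made-up data, not the paper's experimental protocol.

```python
import numpy as np

# Canberra distance vs. Euclidean distance under stale batch-based min-max scaling.
def canberra(x, y, eps=1e-12):
    # eps guards the 0/0 case; the standard definition simply skips those terms
    return np.sum(np.abs(x - y) / (np.abs(x) + np.abs(y) + eps))

def minmax_from_batch(batch):
    lo, hi = batch.min(axis=0), batch.max(axis=0)
    return lambda z: (z - lo) / np.where(hi > lo, hi - lo, 1.0)

rng = np.random.default_rng(0)
prev_batch = rng.normal(size=(100, 4))
scale = minmax_from_batch(prev_batch)                      # stats frozen on an old batch
x, y = rng.normal(size=4) * 10, rng.normal(size=4) * 10    # later samples on a drifted scale
print(canberra(x, y), np.linalg.norm(scale(x) - scale(y)))
```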

Obscured Wildfire Flame Detection By Temporal Analysis of Smoke Patterns Captured by Unmanned Aerial Systems

  • paper_url: http://arxiv.org/abs/2307.00104
  • repo_url: None
  • paper_authors: Uma Meleti, Abolfazl Razi
  • for: 本研究论文目标是实时检测受遮盲森林火灾(火焰被树木、烟雾、云层等自然障碍物遮盖),使用RGB摄像头 equipped 无人机。
  • methods: 我们提出了一种新的方法,利用semantic segmentation,基于视频序列中烟团特征的时间分析。我们的方法使用卷积神经网络架构,包括预训练的CNNEncoder和3D卷积来解码,同时使用序列堆叠特征来利用时间变化。
  • results: 我们对一个精心制作的数据集进行了测试,该数据集包括RGB视频和IR视频,以确定真实的地面 truth。我们的方法在测试数据上达到了85.88%的Dice分数,同时达到了92.47%的准确率和90.67%的分类率。与其他方法相比,我们的方法在视频级别的火灾分类中表现出色,使用MobileNet+CBAM作为encoder backbone时达到了约100%的准确率。
    Abstract This research paper addresses the challenge of detecting obscured wildfires (when the fire flames are covered by trees, smoke, clouds, and other natural barriers) in real-time using drones equipped only with RGB cameras. We propose a novel methodology that employs semantic segmentation based on the temporal analysis of smoke patterns in video sequences. Our approach utilizes an encoder-decoder architecture based on deep convolutional neural network architecture with a pre-trained CNN encoder and 3D convolutions for decoding while using sequential stacking of features to exploit temporal variations. The predicted fire locations can assist drones in effectively combating forest fires and pinpoint fire retardant chemical drop on exact flame locations. We applied our method to a curated dataset derived from the FLAME2 dataset that includes RGB video along with IR video to determine the ground truth. Our proposed method has a unique property of detecting obscured fire and achieves a Dice score of 85.88%, while achieving a high precision of 92.47% and classification accuracy of 90.67% on test data showing promising results when inspected visually. Indeed, our method outperforms other methods by a significant margin in terms of video-level fire classification as we obtained about 100% accuracy using MobileNet+CBAM as the encoder backbone.

Redeeming Data Science by Decision Modelling

  • paper_url: http://arxiv.org/abs/2307.00088
  • repo_url: None
  • paper_authors: John Mark Agosta, Robert Horton
  • for: 随着数据科学应用的爆炸式增长,该领域已逐渐脱离其理论根基;本文提出一项新的应用研究计划,旨在为数据科学实践重新奠定基础。
  • methods: 本文借鉴人工智能中贝叶斯方法的建模技术构建因果图模型(即"决策建模"),并结合商业文献中"决策质量"框架的六项原则展开讨论。
  • results: 本文认为,任何成功的应用机器学习建模工作都必须包含这六项原则;此外,文中还以将模型的 ROC 曲线与效用模型相结合为例,说明了决策建模的做法。
    Abstract With the explosion of applications of Data Science, the field has come loose from its foundations. This article argues for a new program of applied research in areas familiar to researchers in Bayesian methods in AI that are needed to ground the practice of Data Science by borrowing from AI techniques for model formulation that we term ``Decision Modelling.'' This article briefly reviews the formulation process as building a causal graphical model, then discusses the process in terms of six principles that comprise \emph{Decision Quality}, a framework from the popular business literature. We claim that any successful applied ML modelling effort must include these six principles. We explain how Decision Modelling combines a conventional machine learning model with an explicit value model. To give a specific example we show how this is done by integrating a model's ROC curve with a utility model.
    摘要 随着数据科学应用的爆炸式增长,这一领域已逐渐脱离其理论根基。本文主张开展一项新的应用研究计划,借鉴人工智能中贝叶斯方法研究者所熟悉的模型构建技术(我们称之为"决策建模"),为数据科学实践重新奠定基础。文章简要回顾了以因果图模型为核心的建模过程,随后结合商业文献中流行的"决策质量"框架所包含的六项原则来讨论这一过程,并认为任何成功的应用机器学习建模工作都必须包含这六项原则。我们解释了决策建模如何将传统机器学习模型与显式的价值模型相结合,并以将模型的 ROC 曲线与效用模型相整合为例加以说明。
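
The ROC-plus-utility integration mentioned in the abstract can be illustrated in a few lines: given ROC operating points and an explicit value model, pick the threshold with the highest expected utility. The ROC points, prevalence, and utilities below are invented for the demonstration and are not taken from the paper.

```python
import numpy as np

# Choose an operating point on a ROC curve by maximizing expected utility.
tpr = np.array([0.0, 0.55, 0.75, 0.90, 1.0])   # one ROC point per candidate threshold
fpr = np.array([0.0, 0.05, 0.15, 0.40, 1.0])
prevalence = 0.2                                # fraction of positives in deployment
U = {"tp": 100.0, "fn": -300.0, "fp": -20.0, "tn": 0.0}   # explicit value model

eu = (prevalence * (tpr * U["tp"] + (1 - tpr) * U["fn"])
      + (1 - prevalence) * (fpr * U["fp"] + (1 - fpr) * U["tn"]))
best = int(np.argmax(eu))
print(f"best operating point: TPR={tpr[best]:.2f}, FPR={fpr[best]:.2f}, EU={eu[best]:.1f}")
```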

Inter-case Predictive Process Monitoring: A candidate for Quantum Machine Learning?

  • paper_url: http://arxiv.org/abs/2307.00080
  • repo_url: None
  • paper_authors: Stefan Hill, David Fitzek, Patrick Delfmann, Carl Corea
  • for: 本研究旨在提高预测过程实例未来行为的精度,特别是当多个实例交互时。
  • methods: 本研究基于最新的机器学习研究进展,提出了自动预测过程实例的下一个活动、结果或剩余时间的方法。研究涉及提取事件日志数据中有用的特征以及捕捉数据中复杂的模式的问题。
  • results: 研究发现,在真实世界的训练数据上,加入跨案例(inter-case)特征可将预测准确率提升超过 4%,且量子机器学习模型在部分特征配置下确实具有竞争力。不过,由于量子硬件仍处于发展早期,本文也从运行时间、噪声与过拟合风险等角度对这些发现进行了批判性讨论。
    Abstract Regardless of the domain, forecasting the future behaviour of a running process instance is a question of interest for decision makers, especially when multiple instances interact. Fostered by the recent advances in machine learning research, several methods have been proposed to predict the next activity, outcome or remaining time of a process automatically. Still, building a model with high predictive power requires both - intrinsic knowledge of how to extract meaningful features from the event log data and a model that captures complex patterns in data. This work builds upon the recent progress in inter-case Predictive Process Monitoring (PPM) and comprehensively benchmarks the impact of inter-case features on prediction accuracy. Moreover, it includes quantum machine learning models, which are expected to provide an advantage over classical models with a scaling amount of feature dimensions. The evaluation on real-world training data from the BPI challenge shows that the inter-case features provide a significant boost by more than four percent in accuracy and quantum algorithms are indeed competitive in a handful of feature configurations. Yet, as quantum hardware is still in its early stages of development, this paper critically discusses these findings in the light of runtime, noise and the risk to overfit on the training data. Finally, the implementation of an open-source plugin demonstrates the technical feasibility to connect a state-of-the-art workflow engine such as Camunda to an IBM quantum computing cloud service.

Dataset balancing can hurt model performance

  • paper_url: http://arxiv.org/abs/2307.00079
  • repo_url: None
  • paper_authors: R. Channing Moore, Daniel P. W. Ellis, Eduardo Fonseca, Shawn Hershey, Aren Jansen, Manoj Plakal
  • for: 提高AudioSet dataset上的批处理性能,尤其是在罕见类样本上。
  • methods: 使用dataset balancing技术来提高性能。
  • results: balancing技术可以提高公共评估数据上的性能,但同时会降低在未公布的评估数据上的性能。balancing并不能确保罕见类性能的提高,nor does it improve rare class performance relative to common classes。
    Abstract Machine learning from training data with a skewed distribution of examples per class can lead to models that favor performance on common classes at the expense of performance on rare ones. AudioSet has a very wide range of priors over its 527 sound event classes. Classification performance on AudioSet is usually evaluated by a simple average over per-class metrics, meaning that performance on rare classes is equal in importance to the performance on common ones. Several recent papers have used dataset balancing techniques to improve performance on AudioSet. We find, however, that while balancing improves performance on the public AudioSet evaluation data it simultaneously hurts performance on an unpublished evaluation set collected under the same conditions. By varying the degree of balancing, we show that its benefits are fragile and depend on the evaluation set. We also do not find evidence indicating that balancing improves rare class performance relative to common classes. We therefore caution against blind application of balancing, as well as against paying too much attention to small improvements on a public evaluation set.
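
The "degree of balancing" varied in the abstract's analysis can be pictured as an exponent that interpolates between the natural class distribution and a fully balanced one when drawing training examples. The re-weighting below is a generic illustration of that knob, not AudioSet's actual balancer; the label counts are synthetic.

```python
import numpy as np

# lam = 0 keeps natural sampling; lam = 1 samples classes uniformly.
def sampling_weights(labels, lam):
    classes, counts = np.unique(labels, return_counts=True)
    class_weight = {c: (1.0 / n) ** lam for c, n in zip(classes, counts)}
    w = np.array([class_weight[y] for y in labels])
    return w / w.sum()

labels = np.array([0] * 990 + [1] * 10)        # one common class, one rare class
for lam in (0.0, 0.5, 1.0):
    p = sampling_weights(labels, lam)
    print(lam, p[labels == 1].sum())           # probability mass on the rare class
```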

Transformers in Healthcare: A Survey

  • paper_url: http://arxiv.org/abs/2307.00067
  • repo_url: None
  • paper_authors: Subhash Nerella, Sabyasachi Bandyopadhyay, Jiaqing Zhang, Miguel Contreras, Scott Siegel, Aysegul Bumin, Brandon Silva, Jessica Sena, Benjamin Shickel, Azra Bihorac, Kia Khezeli, Parisa Rashidi
  • for: 该论文主要为健康领域的人工智能应用提供了一种概述。
  • methods: 该论文使用了Transformers神经网络架构,对医疗数据、结构化和无结构化电子医疗记录、社交媒体、生物物理信号和蛋白Sequences进行分析。
  • results: 该论文总结了使用Transformers神经网络在医疗领域的应用,包括临床诊断、报告生成、数据重构和药物/蛋白Synthesis。同时也讨论了使用Transformers的好处和缺点,以及计算成本、模型解释性、公平性、对人类价值观Alignment和伦理问题的影响。
    Abstract With Artificial Intelligence (AI) increasingly permeating various aspects of society, including healthcare, the adoption of the Transformers neural network architecture is rapidly changing many applications. Transformer is a type of deep learning architecture initially developed to solve general-purpose Natural Language Processing (NLP) tasks and has subsequently been adapted in many fields, including healthcare. In this survey paper, we provide an overview of how this architecture has been adopted to analyze various forms of data, including medical imaging, structured and unstructured Electronic Health Records (EHR), social media, physiological signals, and biomolecular sequences. Those models could help in clinical diagnosis, report generation, data reconstruction, and drug/protein synthesis. We identified relevant studies using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. We also discuss the benefits and limitations of using transformers in healthcare and examine issues such as computational cost, model interpretability, fairness, alignment with human values, ethical implications, and environmental impact.
    摘要 With 人工智能(AI)逐渐渗透到社会各个方面,包括医疗健康领域,Transformers神经网络架构的采用速度快速地改变了许多应用程序。Transformer是一种深度学习架构,最初是用于解决通用自然语言处理(NLP)任务,后来在许多领域中得到了应用,包括医疗健康领域。在这篇评论文中,我们提供了Transformers神经网络架构在分析不同类型数据的概述,包括医疗影像、结构化和无结构化电子医疗纪录(EHR)、社交媒体、生理参数和蛋白质序列。这些模型可以帮助临床诊断、报告生成、数据重建和药物/蛋白合成。我们根据Preferred Reporting Items for Systematic Reviews and Meta-Analyses(PRISMA)指南进行了相关的研究选择。我们还讨论了使用Transformers在医疗健康领域的优点和缺点,包括计算成本、模型解释性、公平、对人类价值观Alignment、伦理问题和环境影响。

Improving the Transferability of Time Series Forecasting with Decomposition Adaptation

  • paper_url: http://arxiv.org/abs/2307.00066
  • repo_url: None
  • paper_authors: Yan Gao, Yan Wang, Qiang Wang
  • for: 本研究旨在提高多变量时间序预测模型的性能,解决数据稀缺问题。
  • methods: 我们提出了一种新的转移架构——序列分解适应网络(SeDAN),通过对各个领域数据进行适应,提高预测性能。同时,我们还提出了一种新的特征分解方法——隐式对比分解,以分解时间序列数据中的特征。
  • results: 我们在五个benchmark dataset上进行了广泛的实验,结果表明,我们的SeDAN可以更好地传递知识,提高预测性能的稳定性。
    Abstract Due to effective pattern mining and feature representation, neural forecasting models based on deep learning have achieved great progress. The premise of effective learning is to collect sufficient data. However, in time series forecasting, it is difficult to obtain enough data, which limits the performance of neural forecasting models. To alleviate the data scarcity limitation, we design Sequence Decomposition Adaptation Network (SeDAN) which is a novel transfer architecture to improve forecasting performance on the target domain by aligning transferable knowledge from cross-domain datasets. Rethinking the transferability of features in time series data, we propose Implicit Contrastive Decomposition to decompose the original features into components including seasonal and trend features, which are easier to transfer. Then we design the corresponding adaptation methods for decomposed features in different domains. Specifically, for seasonal features, we perform joint distribution adaptation and for trend features, we design an Optimal Local Adaptation. We conduct extensive experiments on five benchmark datasets for multivariate time series forecasting. The results demonstrate the effectiveness of our SeDAN. It can provide more efficient and stable knowledge transfer.
    摘要 得益于有效的模式挖掘和特征表示,基于深度学习的神经网络预测模型取得了长足进步。有效学习的前提是收集足够的数据,但在时间序列预测中往往难以获得足量数据,这限制了神经预测模型的性能。为缓解数据稀缺的限制,我们设计了序列分解自适应网络(SeDAN),这是一种新颖的迁移架构,通过对齐来自跨领域数据集的可迁移知识来提升目标领域的预测性能。我们重新思考了时间序列数据中特征的可迁移性,提出隐式对比分解,将原始特征分解为更易迁移的季节特征和趋势特征等分量,并为不同领域中的分解特征设计了相应的自适应方法:对季节特征进行联合分布自适应,对趋势特征设计最优局部自适应。我们在五个多变量时间序列预测基准数据集上进行了大量实验,结果验证了 SeDAN 的有效性,它能够提供更高效、更稳定的知识迁移。
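
To make the seasonal/trend split concrete, the sketch below performs a classic additive decomposition (moving-average trend plus per-period seasonal means) on a synthetic series. It only illustrates the kind of components SeDAN operates on; the paper's Implicit Contrastive Decomposition is learned, not this fixed procedure.

```python
import numpy as np

# Generic additive decomposition: trend + seasonal + residual.
def decompose(y, period):
    kernel = np.ones(period) / period
    trend = np.convolve(y, kernel, mode="same")                     # smoothed trend
    detrended = y - trend
    seasonal_means = np.array([detrended[i::period].mean() for i in range(period)])
    seasonal = np.tile(seasonal_means, len(y) // period + 1)[: len(y)]
    residual = y - trend - seasonal
    return trend, seasonal, residual

t = np.arange(200)
y = 0.05 * t + np.sin(2 * np.pi * t / 24) + 0.1 * np.random.default_rng(0).normal(size=200)
trend, seasonal, residual = decompose(y, period=24)
```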

The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks

  • paper_url: http://arxiv.org/abs/2306.17844
  • repo_url: None
  • paper_authors: Ziqian Zhong, Ziming Liu, Max Tegmark, Jacob Andreas
  • for: 这个论文是用来研究 neural networks 是否可靠地 rediscover 已知的算法来解决相应的任务的。
  • methods: 这个论文使用的方法是使用 modular addition 作为一个示例问题,通过调整模型的 hyperparameter 和初始化来让 neural networks 发现不同的算法解决方案。
  • results: 研究发现,即使是使用简单的学习问题, neural networks 仍然可以发现多种不同的算法解决方案,包括一种已知的 Clock 算法和一种新发现的、 menos intuitive 但可读的 Pizza 算法,以及多种更复杂的解决方案。
    Abstract Do neural networks, trained on well-understood algorithmic tasks, reliably rediscover known algorithms for solving those tasks? Several recent studies, on tasks ranging from group arithmetic to in-context linear regression, have suggested that the answer is yes. Using modular addition as a prototypical problem, we show that algorithm discovery in neural networks is sometimes more complex. Small changes to model hyperparameters and initializations can induce the discovery of qualitatively different algorithms from a fixed training set, and even parallel implementations of multiple such algorithms. Some networks trained to perform modular addition implement a familiar Clock algorithm; others implement a previously undescribed, less intuitive, but comprehensible procedure which we term the Pizza algorithm, or a variety of even more complex procedures. Our results show that even simple learning problems can admit a surprising diversity of solutions, motivating the development of new tools for characterizing the behavior of neural networks across their algorithmic phase space.
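
The "Clock" algorithm named in the abstract has a simple hand-written analogue: embed residues as angles on the unit circle, add the angles by multiplying complex phases, and read the sum off by matching against all candidate angles. Trained networks discover this with learned Fourier features; the version below is only for intuition, with an arbitrary modulus.

```python
import numpy as np

# Modular addition via angle addition ("Clock" construction in miniature).
p = 59

def clock_add(a, b, p=p):
    phase = np.exp(2j * np.pi * a / p) * np.exp(2j * np.pi * b / p)   # angle of a+b
    candidates = np.exp(2j * np.pi * np.arange(p) / p)
    # cosine similarity with each candidate angle peaks at c = (a + b) mod p
    return int(np.argmax(np.real(phase * np.conj(candidates))))

assert all(clock_add(a, b) == (a + b) % p for a in range(p) for b in range(p))
```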

Resetting the Optimizer in Deep RL: An Empirical Study

  • paper_url: http://arxiv.org/abs/2306.17833
  • repo_url: None
  • paper_authors: Kavosh Asadi, Rasool Fakoor, Shoham Sabach
  • for: deep reinforcement learning中的优化值函数近似问题
  • methods: 使用现代variants of stochastic gradient descent algorithm such as Adam,并在每次迭代中重置内部参数
  • results: 这种简单的修改方法可以减轻现代优化器内部参数的污染效应,提高深度RL在Atari benchmark上的表现
    Abstract We focus on the task of approximating the optimal value function in deep reinforcement learning. This iterative process is comprised of approximately solving a sequence of optimization problems where the objective function can change per iteration. The common approach to solving the problem is to employ modern variants of the stochastic gradient descent algorithm such as Adam. These optimizers maintain their own internal parameters such as estimates of the first and the second moment of the gradient, and update these parameters over time. Therefore, information obtained in previous iterations is being used to solve the optimization problem in the current iteration. We hypothesize that this can contaminate the internal parameters of the employed optimizer in situations where the optimization landscape of the previous iterations is quite different from the current iteration. To hedge against this effect, a simple idea is to reset the internal parameters of the optimizer when starting a new iteration. We empirically investigate this resetting strategy by employing various optimizers in conjunction with the Rainbow algorithm. We demonstrate that this simple modification unleashes the true potential of modern optimizers, and significantly improves the performance of deep RL on the Atari benchmark.
    摘要 我们关注深度强化学习中最优值函数近似的问题。这是一个迭代过程,其中每次迭代都近似求解一个优化问题,而目标函数可能随迭代而变化。常见做法是采用随机梯度下降算法的现代变体(如 Adam)。这些优化器会维护自身的内部参数,例如梯度一阶矩和二阶矩的估计,并随时间更新这些参数,因此前面迭代中获得的信息会被用于求解当前迭代的优化问题。我们推测,当前后迭代的优化景观差异较大时,这可能"污染"所用优化器的内部参数。为规避这一影响,一个简单的做法是在开始新的迭代时重置优化器的内部参数。我们将多种优化器与 Rainbow 算法相结合,对这一重置策略进行了实证研究,结果表明这一简单改动释放了现代优化器的真正潜力,并显著提升了深度强化学习在 Atari 基准上的表现。
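
The resetting idea itself is a one-liner: recreate the optimizer at the start of each new fitting iteration so Adam's moment estimates do not carry over from the previous optimization problem. The loop below is a minimal placeholder sketch; the model, loss, and data stand in for the paper's Rainbow agent.

```python
import torch

model = torch.nn.Linear(8, 1)
for iteration in range(10):
    # Fresh internal state (first/second moment estimates) for every iteration.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(100):                         # inner optimization on this iteration's objective
        x, y = torch.randn(32, 8), torch.randn(32, 1)
        loss = torch.nn.functional.mse_loss(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```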

Federated Ensemble YOLOv5 - A Better Generalized Object Detection Algorithm

  • paper_url: http://arxiv.org/abs/2306.17829
  • repo_url: None
  • paper_authors: Vinit Hegiste, Tatjana Legler, Martin Ruskowski
  • for: 这篇论文旨在探讨基于联合学习算法的联邦学习(FL)在对象检测中的应用,以提高泛化性能。
  • methods: 该论文使用了基于FED Avg和FED SGD的联邦学习算法,并采用了随机抽样策略不替换。
  • results: 实验结果显示,基于FL的YOLOv5模型在测试集上生成的精度 bounding box 比中央训练方法更高,特别是在测试集中包含两个客户端没有训练集的情况下。这些结果表明,FL可以视为一种ensemble algorithm的融合,类似于Bagging和Boosting技术的融合。因此,FL不仅可以视为一种隐私保护方法,还可以视为一种提高机器学习模型性能的方法。
    Abstract Federated learning (FL) has gained significant traction as a privacy-preserving algorithm, but the underlying resembles of federated learning algorithm like Federated averaging (FED Avg) or Federated SGD (FED SGD) to ensemble learning algorithms has not been fully explored. The purpose of this paper is to examine the application of FL to object detection as a method to enhance generalizability, and to compare its performance against a centralized training approach for an object detection algorithm. Specifically, we investigate the performance of a YOLOv5 model trained using FL across multiple clients and employ a random sampling strategy without replacement, so each client holds a portion of the same dataset used for centralized training. Our experimental results showcase the superior efficiency of the FL object detector's global model in generating accurate bounding boxes for unseen objects, with the test set being a mixture of objects from two distinct clients not represented in the training dataset. These findings suggest that FL can be viewed from an ensemble algorithm perspective, akin to a synergistic blend of Bagging and Boosting techniques. As a result, FL can be seen not only as a method to enhance privacy, but also as a method to enhance the performance of a machine learning model.
    摘要 联邦学习(FL)作为一种保护隐私的算法已获得广泛关注,但诸如联邦平均(FedAvg)或联邦 SGD(FedSGD)等联邦学习算法与集成学习算法之间的内在相似性尚未得到充分探讨。本文旨在研究联邦学习在目标检测中的应用,以增强模型的泛化能力,并将其性能与集中式训练的目标检测算法进行比较。具体而言,我们研究了使用联邦学习在多个客户端上训练的 YOLOv5 模型,采用无放回的随机抽样策略,使每个客户端持有集中式训练所用同一数据集的一部分。实验结果表明,联邦学习得到的全局目标检测模型在为未见过的物体生成精确边界框方面更具优势,其测试集混合了来自两个未出现在训练数据中的客户端的物体。这些发现表明,联邦学习可以从集成算法的角度来理解,类似于 Bagging 与 Boosting 技术的协同融合。因此,联邦学习不仅可以被视为一种增强隐私的方法,也可以被视为一种提升机器学习模型性能的方法。
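
The FedAvg step underlying this comparison is just a size-weighted parameter average over client models, which is what makes the ensemble analogy plausible. The sketch below is a bare-bones illustration with a toy model; it is not the YOLOv5 setup from the paper, and the client shards and sizes are placeholders.

```python
import copy
import torch

# Server-side FedAvg: average client parameters, weighted by client data size.
def fedavg(client_states, client_sizes):
    total = float(sum(client_sizes))
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = sum(s[key] * (n / total) for s, n in zip(client_states, client_sizes))
    return avg

global_model = torch.nn.Linear(4, 2)
clients = [copy.deepcopy(global_model) for _ in range(3)]
# ... each client would train its copy locally on its own shard here ...
new_state = fedavg([c.state_dict() for c in clients], client_sizes=[120, 80, 200])
global_model.load_state_dict(new_state)
```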

Understanding Unfairness via Training Concept Influence

  • paper_url: http://arxiv.org/abs/2306.17828
  • repo_url: None
  • paper_authors: Yuanshun Yao, Yang Liu
  • for: 本研究旨在帮助实践者更好地理解他们的数据和算法是如何不公。
  • methods: 本研究使用对训练样本进行counterfactual intervening,以计算这些样本对模型不公性的影响。
  • results: 本研究可以帮助实践者理解训练数据中的不公性,并且可以探索不公性的来源。此外,这种方法还可以检测恶意攻击、探测数据质量问题和修复不公性。
    Abstract Knowing the causes of a model's unfairness helps practitioners better understand their data and algorithms. This is an important yet relatively unexplored task. We look into this problem through the lens of the training data - one of the major sources of unfairness. We ask the following questions: how would a model's fairness performance change if, in its training data, some samples (1) were collected from a different (e.g. demographic) group, (2) were labeled differently, or (3) some features were changed? In other words, we quantify the fairness influence of training samples by counterfactually intervening and changing samples based on predefined concepts, i.e. data attributes such as features (X), labels (Y), or sensitive attributes (A). To calculate a training sample's influence on the model's unfairness w.r.t a concept, we first generate counterfactual samples based on the concept, i.e. the counterfactual versions of the sample if the concept were changed. We then calculate the resulting impact on the unfairness, via influence function, if the counterfactual samples were used in training. Our framework not only helps practitioners understand the observed unfairness and repair their training data, but also leads to many other applications, e.g. detecting mislabeling, fixing imbalanced representations, and detecting fairness-targeted poisoning attacks.
    摘要 知道模型的不公平性的原因可以帮助实践者更好地理解其数据和算法。这是一项重要但相对未经探索的任务。我们通过训练数据的镜像来研究这个问题。我们问的问题是:如果在训练数据中某些样本(1)来自不同(例如,人口)群体,(2)被标注 differently,或(3)某些特征被改变, THEN 如何改变模型的不公平性?我们使用 counterfactual 技术来计算训练样本对模型不公平性的影响。首先,我们生成 counterfactual 样本基于概念,即样本的 counterfactual 版本,如果概念被改变。然后,我们计算这些 counterfactual 样本在训练中的影响,通过 influence function 来衡量模型的不公平性。我们的框架不仅帮助实践者理解观察到的不公平性,还可以用于其他应用,如检测杂标注、修复不均衡表示、和检测公平性目标攻击。

Scalable tensor methods for nonuniform hypergraphs

  • paper_url: http://arxiv.org/abs/2306.17825
  • repo_url: None
  • paper_authors: Sinan G. Aksoy, Ilya Amburg, Stephen J. Young
  • for: 研究超图(hypergraph)所建模的多方交互
  • methods: 使用tensor方法,并开发了tensor times same vector(TTSV)算法以提高复杂度
  • results: 提供了一种新的多元图中心性和凝聚算法,并证明了这些tensor测量可以提供更多的信息,并能够探测高阶结构,而许多现有的矩阵基于方法无法探测出来
    Abstract While multilinear algebra appears natural for studying the multiway interactions modeled by hypergraphs, tensor methods for general hypergraphs have been stymied by theoretical and practical barriers. A recently proposed adjacency tensor is applicable to nonuniform hypergraphs, but is prohibitively costly to form and analyze in practice. We develop tensor times same vector (TTSV) algorithms for this tensor which improve complexity from $O(n^r)$ to a low-degree polynomial in $r$, where $n$ is the number of vertices and $r$ is the maximum hyperedge size. Our algorithms are implicit, avoiding formation of the order $r$ adjacency tensor. We demonstrate the flexibility and utility of our approach in practice by developing tensor-based hypergraph centrality and clustering algorithms. We also show these tensor measures offer complementary information to analogous graph-reduction approaches on data, and are also able to detect higher-order structure that many existing matrix-based approaches provably cannot.

Act3D: Infinite Resolution Action Detection Transformer for Robotic Manipulation

  • paper_url: http://arxiv.org/abs/2306.17817
  • repo_url: None
  • paper_authors: Theophile Gervet, Zhou Xian, Nikolaos Gkanatsios, Katerina Fragkiadaki
  • for: 提高机器人操作精度,使用3D感知表示来实现高精度End-effector pose预测。
  • methods: 使用Transformer模型,将6DoF键动预测转换为3D检测,并使用自适应空间计算来选择最佳特征点。
  • results: 在RLbench manipulate benchmark上实现新的state-of-the-art,比前一代SOTA 2D多视图策略提高10%,并且在3D策略上提高22%,且需要3x menos计算资源。
    Abstract 3D perceptual representations are well suited for robot manipulation as they easily encode occlusions and simplify spatial reasoning. Many manipulation tasks require high spatial precision in end-effector pose prediction, typically demanding high-resolution 3D perceptual grids that are computationally expensive to process. As a result, most manipulation policies operate directly in 2D, foregoing 3D inductive biases. In this paper, we propose Act3D, a manipulation policy Transformer that casts 6-DoF keypose prediction as 3D detection with adaptive spatial computation. It takes as input 3D feature clouds unprojected from one or more camera views, iteratively samples 3D point grids in free space in a coarse-to-fine manner, featurizes them using relative spatial attention to the physical feature cloud, and selects the best feature point for end-effector pose prediction. Act3D sets a new state-of-the-art in RLbench, an established manipulation benchmark. Our model achieves 10% absolute improvement over the previous SOTA 2D multi-view policy on 74 RLbench tasks and 22% absolute improvement with 3x less compute over the previous SOTA 3D policy. In thorough ablations, we show the importance of relative spatial attention, large-scale vision-language pre-trained 2D backbones, and weight tying across coarse-to-fine attentions. Code and videos are available at our project site: https://act3d.github.io/.
    摘要 三维感知表示法适用于机器人操作,因为它容易表示遮挡和空间理解简化。许多操作任务需要高精度的endpoint姿态预测,通常需要高分辨率的3D感知格,这些格是计算成本较高的。因此,大多数操作策略直接在2D中运行,抛弃3D的逻辑推导。在这篇论文中,我们提出了Act3D,一种操作策略变换器,将6度 freedom键点预测视为3D检测,并在自适应空间计算中使用相对的空间注意力。它从一个或多个相机视图中提取了3D特征云,逐步在自由空间中分布3D点网格,使用相对的空间注意力来特征化物理特征云,并选择最佳的特征点 дляendpoint姿态预测。Act3D在RLbench上设置了新的状态天地,其在74个RLbench任务中实现了10%的绝对提升,并在22%的绝对提升和3x的计算量下超越了之前的SOTA 3D策略。在严格的拓展中,我们表明了相对空间注意力、大规模视语言预训练2D背景和跨度融合的重要性。代码和视频可以在我们项目网站上获取:

Bayesian Optimization with Formal Safety Guarantees via Online Conformal Prediction

  • paper_url: http://arxiv.org/abs/2306.17815
  • repo_url: None
  • paper_authors: Yunchuan Zhang, Sangwoo Park, Osvaldo Simeone
  • for: 这篇论文面向带安全约束的黑盒函数优化问题,特别是在每次尝试解后能获得其安全性反馈的场景。
  • methods: 论文将贝叶斯优化(BO)与在线一致性预测(online conformal prediction, CP)相结合,提出名为 SAFE-BOCP 的新方法,在允许任意可控但非零的安全约束违反率的同时满足安全要求。
  • results: 在合成数据与真实数据上的实验结果验证了 SAFE-BOCP 在带安全约束的黑盒函数优化中的优势与灵活性。
    Abstract Black-box zero-th order optimization is a central primitive for applications in fields as diverse as finance, physics, and engineering. In a common formulation of this problem, a designer sequentially attempts candidate solutions, receiving noisy feedback on the value of each attempt from the system. In this paper, we study scenarios in which feedback is also provided on the safety of the attempted solution, and the optimizer is constrained to limit the number of unsafe solutions that are tried throughout the optimization process. Focusing on methods based on Bayesian optimization (BO), prior art has introduced an optimization scheme -- referred to as SAFEOPT -- that is guaranteed not to select any unsafe solution with a controllable probability over feedback noise as long as strict assumptions on the safety constraint function are met. In this paper, a novel BO-based approach is introduced that satisfies safety requirements irrespective of properties of the constraint function. This strong theoretical guarantee is obtained at the cost of allowing for an arbitrary, controllable but non-zero, rate of violation of the safety constraint. The proposed method, referred to as SAFE-BOCP, builds on online conformal prediction (CP) and is specialized to the cases in which feedback on the safety constraint is either noiseless or noisy. Experimental results on synthetic and real-world data validate the advantages and flexibility of the proposed SAFE-BOCP.
    摘要 黑盒零阶优化是金融、物理和工程等诸多领域应用的核心基础问题。在该问题的一种常见形式中,设计者依次尝试候选解,并从系统获得关于每次尝试取值的带噪反馈。本文研究的场景是系统同时还会对所尝试解的安全性给出反馈,且优化器被约束为在整个优化过程中限制不安全解的尝试次数。在基于贝叶斯优化(BO)的方法中,已有工作提出了名为 SAFEOPT 的优化方案,只要安全约束函数满足严格假设,该方案就能以可控的概率(相对反馈噪声)保证不选择任何不安全的解。本文提出了一种新的基于 BO 的方法,无论约束函数具有何种性质都能满足安全要求;这一强理论保证的代价是允许一个任意、可控但非零的安全约束违反率。所提方法称为 SAFE-BOCP,建立在在线一致性预测(CP)之上,并针对安全约束反馈无噪声或有噪声两种情形分别进行了特化。在合成数据与真实数据上的实验结果验证了 SAFE-BOCP 的优势与灵活性。
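
The online conformal prediction ingredient can be illustrated with a generic adaptive calibration loop: a safety margin on a surrogate's prediction is nudged after each round so the long-run violation rate is steered toward a target level alpha. Everything below (the surrogate, the hidden constraint, the candidate scoring, and all constants) is made up for the demo and is not the paper's SAFE-BOCP algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, gamma, margin = 0.1, 0.05, 0.0

def predicted_safety(x):      # stand-in for a surrogate (e.g. GP) safety estimate
    return x + 0.3 * rng.normal()

def truly_safe(x):            # stand-in for the unknown safety constraint
    return x > 0.0

rounds, violations = 0, 0.0
for t in range(2000):
    candidates = rng.uniform(-1.0, 1.0, size=20)
    safe_set = [x for x in candidates if predicted_safety(x) >= margin]
    if not safe_set:
        continue
    x_t = min(safe_set)                       # pretend lower x has better reward, i.e. risky is tempting
    err = 0.0 if truly_safe(x_t) else 1.0
    rounds, violations = rounds + 1, violations + err
    margin += gamma * (err - alpha)           # tighten after a violation, relax slowly otherwise
print(violations / max(rounds, 1))            # empirical violation rate, pushed toward alpha
```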

Stay on topic with Classifier-Free Guidance

  • paper_url: http://arxiv.org/abs/2306.17806
  • repo_url: https://github.com/Vermeille/cfg-llm
  • paper_authors: Guillaume Sanchez, Honglu Fan, Alexander Spangher, Elad Levi, Pawan Sasanka Ammanamanchi, Stella Biderman
  • for: 本研究的目的是应用类别器-free guidance(CFG)技术在文本生成中,以便更好地遵循提示。
  • methods: 本研究使用的方法是CFG技术,并在不同的任务上进行了评估,包括问答、理解、代码生成和机器翻译等。
  • results: 研究结果显示,CFG技术可以广泛应用于纯语言模型中,可以提高模型的性能,并且可以与其他推理时间方法相结合使用,以提高模型在困难任务中的表现。此外,CFG技术还可以增加助手的准确性和一致性。
    Abstract Classifier-Free Guidance (CFG) has recently emerged in text-to-image generation as a lightweight technique to encourage prompt-adherence in generations. In this work, we demonstrate that CFG can be used broadly as an inference-time technique in pure language modeling. We show that CFG (1) improves the performance of Pythia, GPT-2 and LLaMA-family models across an array of tasks: Q\&A, reasoning, code generation, and machine translation, achieving SOTA on LAMBADA with LLaMA-7B over PaLM-540B; (2) brings improvements equivalent to a model with twice the parameter-count; (3) can stack alongside other inference-time methods like Chain-of-Thought and Self-Consistency, yielding further improvements in difficult tasks; (4) can be used to increase the faithfulness and coherence of assistants in challenging form-driven and content-driven prompts: in a human evaluation we show a 75\% preference for GPT4All using CFG over baseline.
    摘要 无分类器引导(Classifier-Free Guidance, CFG)最近在文本到图像生成中兴起,是一种鼓励生成结果遵循提示的轻量级技术。在这项工作中,我们证明 CFG 可以作为推理时技术广泛应用于纯语言建模。我们展示了 CFG:(1) 在问答、推理、代码生成和机器翻译等一系列任务上提升了 Pythia、GPT-2 和 LLaMA 系列模型的性能,其中 LLaMA-7B 在 LAMBADA 上超过 PaLM-540B,达到当前最优;(2) 带来的提升相当于参数量翻倍的模型;(3) 可以与 Chain-of-Thought、Self-Consistency 等其他推理时方法叠加使用,在困难任务上带来进一步提升;(4) 可用于提高助手在具有挑战性的形式驱动和内容驱动提示下的忠实度与连贯性:在一项人工评估中,使用 CFG 的 GPT4All 相对基线获得了 75% 的偏好率。
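
Applied to language models, classifier-free guidance amounts to pushing the prompt-conditioned next-token logits away from an unconditioned (or weakly conditioned) distribution. The sketch below shows that combination rule only; `model`, the token id tensors, and the guidance strength are placeholders, and gamma = 1 recovers ordinary conditional decoding.

```python
import torch

# CFG on next-token logits: uncond + gamma * (cond - uncond).
def cfg_logits(model, cond_ids, uncond_ids, gamma=1.5):
    logits_cond = model(cond_ids)       # logits given the full prompt
    logits_uncond = model(uncond_ids)   # logits given the context without the prompt
    return logits_uncond + gamma * (logits_cond - logits_uncond)

# usage (hypothetical): next_id = torch.argmax(cfg_logits(model, cond_ids, uncond_ids), dim=-1)
```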

Voting-based Multimodal Automatic Deception Detection

  • paper_url: http://arxiv.org/abs/2307.07516
  • repo_url: None
  • paper_authors: Lana Touma, Mohammad Al Horani, Manar Tailouni, Anas Dahabiah, Khloud Al Jallad
  • for: automatic deception detection in videos using audio, visual, and lexical features
  • methods: voting-based method combining a CNN on image frames, an SVM on Mel spectrograms, and an SVM on Word2Vec text features (see the sketch below)
  • results: outperforms the state of the art, with best results of 97%, 96%, and 92% (images, audio, text) on the Real-Life Trial dataset, and 97%, 82%, and 73% (video, audio, text) on the Miami University Deception Detection dataset.
    Abstract Automatic Deception Detection has been a hot research topic for a long time, using machine learning and deep learning to automatically detect deception, brings new light to this old field. In this paper, we proposed a voting-based method for automatic deception detection from videos using audio, visual and lexical features. Experiments were done on two datasets, the Real-life trial dataset by Michigan University and the Miami University deception detection dataset. Video samples were split into frames of images, audio, and manuscripts. Our Voting-based Multimodal proposed solution consists of three models. The first model is CNN for detecting deception from images, the second model is Support Vector Machine (SVM) on Mel spectrograms for detecting deception from audio and the third model is Word2Vec on Support Vector Machine (SVM) for detecting deception from manuscripts. Our proposed solution outperforms state of the art. Best results achieved on images, audio and text were 97%, 96%, 92% respectively on Real-Life Trial Dataset, and 97%, 82%, 73% on video, audio and text respectively on Miami University Deception Detection.
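
The combination step in this pipeline is a vote over the three per-modality classifiers. The short sketch below shows that step in isolation with a simple majority vote; the hard-coded predictions stand in for the trained CNN and SVM models, and the paper's exact voting rule may differ.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most common label among the per-modality predictions."""
    return Counter(labels).most_common(1)[0][0]

# one prediction per modality for a single video clip: 1 = deceptive, 0 = truthful
image_pred, audio_pred, text_pred = 1, 0, 1
final = majority_vote([image_pred, audio_pred, text_pred])
print("deceptive" if final == 1 else "truthful")
```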

Hierarchical Bayesian Regression for Multi-Location Sales Transaction Forecasting

  • paper_url: http://arxiv.org/abs/2306.17795
  • repo_url: None
  • paper_authors: John Mark Agosta, Mario Inchiosa
  • for: forecasting purchasing behavior, specifically sales amounts across different store locations and days.
  • methods: a hierarchical Bayesian model that shares inference results across groups (locations and days of the week) to improve forecast accuracy (see the sketch below).
  • results: using the \textsf{stan} package on a year of individual sales transaction data, the model forecasts sales across many locations and days while alleviating the accuracy problems caused by limited per-group data.
    Abstract The features in many prediction models naturally take the form of a hierarchy. The lower levels represent individuals or events. These units group naturally into locations and intervals or other aggregates, often at multiple levels. Levels of groupings may intersect and join, much as relational database tables do. Besides representing the structure of the data, predictive features in hierarchical models can be assigned to their proper levels. Such models lend themselves to hierarchical Bayes solution methods that ``share'' results of inference between groups by generalizing over the case of individual models for each group versus one model that aggregates all groups into one. In this paper we show our work-in-progress applying a hierarchical Bayesian model to forecast purchases throughout the day at store franchises, with groupings over locations and days of the week. We demonstrate using the \textsf{stan} package on individual sales transaction data collected over the course of a year. We show how this solves the dilemma of having limited data and hence modest accuracy for each day and location, while being able to scale to a large number of locations with improved accuracy.
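
The benefit described above comes from partial pooling: groups with little data borrow strength from the rest. The sketch below shows that mechanism in closed form for a normal-normal model with known variances; the toy store data and the variance values are assumptions, and the paper itself fits the full hierarchical regression with \textsf{stan}.

```python
import numpy as np

rng = np.random.default_rng(1)

# toy daily sales for three store locations with very different amounts of data
groups = {
    "store_A": rng.normal(100, 15, size=200),
    "store_B": rng.normal(120, 15, size=20),
    "store_C": rng.normal(80, 15, size=3),     # scarce data -> strong shrinkage
}

sigma2 = 15.0 ** 2      # within-group (observation) variance, assumed known here
tau2 = 10.0 ** 2        # between-group variance, assumed known here
grand_mean = np.mean(np.concatenate(list(groups.values())))

for name, y in groups.items():
    n = len(y)
    # posterior mean of the group-level mean under a normal-normal hierarchical model
    weight = (n / sigma2) / (n / sigma2 + 1.0 / tau2)
    post_mean = weight * y.mean() + (1 - weight) * grand_mean
    print(f"{name}: n={n:3d}  raw mean={y.mean():6.1f}  partially pooled={post_mean:6.1f}")
```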

Vision Through the Veil: Differential Privacy in Federated Learning for Medical Image Classification

  • paper_url: http://arxiv.org/abs/2306.17794
  • repo_url: None
  • paper_authors: Kishore Babu Nampalle, Pradeep Singh, Uppala Vivek Narayan, Balasubramanian Raman
  • for: addressing the data-aggregation needs of deep learning applications in healthcare, which commonly raise significant privacy concerns.
  • methods: a federated learning framework with differential privacy integrated into it, providing stronger privacy protection (see the sketch below).
  • results: there is a trade-off between model accuracy and privacy settings, but strategic calibration of the privacy budget maintains robust image classification performance while providing substantial privacy protection.
    Abstract The proliferation of deep learning applications in healthcare calls for data aggregation across various institutions, a practice often associated with significant privacy concerns. This concern intensifies in medical image analysis, where privacy-preserving mechanisms are paramount due to the data being sensitive in nature. Federated learning, which enables cooperative model training without direct data exchange, presents a promising solution. Nevertheless, the inherent vulnerabilities of federated learning necessitate further privacy safeguards. This study addresses this need by integrating differential privacy, a leading privacy-preserving technique, into a federated learning framework for medical image classification. We introduce a novel differentially private federated learning model and meticulously examine its impacts on privacy preservation and model performance. Our research confirms the existence of a trade-off between model accuracy and privacy settings. However, we demonstrate that strategic calibration of the privacy budget in differential privacy can uphold robust image classification performance while providing substantial privacy protection.
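
A minimal sketch of one differentially private federated-averaging round, the kind of mechanism the study integrates: each client's update is clipped in L2 norm and Gaussian noise calibrated to that bound is added before averaging. The clip norm, noise multiplier, and toy client updates are assumptions; the paper's actual calibration of the privacy budget is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_federated_round(client_updates, clip_norm=1.0, noise_multiplier=1.1):
    noisy_sum = np.zeros_like(client_updates[0])
    for update in client_updates:
        # clip each client's update so no single client dominates (bounds sensitivity)
        norm = np.linalg.norm(update)
        clipped = update * min(1.0, clip_norm / (norm + 1e-12))
        noisy_sum += clipped
    # add Gaussian noise scaled to the clipping bound, then average
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=noisy_sum.shape)
    return (noisy_sum + noise) / len(client_updates)

# three hospitals contribute model-weight deltas (toy 4-parameter model)
updates = [rng.normal(size=4) for _ in range(3)]
print(dp_federated_round(updates))
```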

Look, Remember and Reason: Visual Reasoning with Grounded Rationales

  • paper_url: http://arxiv.org/abs/2306.17778
  • repo_url: None
  • paper_authors: Apratim Bhattacharyya, Sunny Panchal, Mingu Lee, Reza Pourreza, Pulkit Madan, Roland Memisevic
  • for: studying how large language models perform on visual reasoning tasks, and in particular how human-like visual problem solving can improve their performance.
  • methods: a three-step "Look, Remember, Reason" process inspired by human visual problem solving, in which visual information is extracted incrementally using low-level visual capabilities introduced as surrogate tasks through rationales over the visual input (see the sketch below).
  • results: introducing grounded visual rationales lets large language models achieve competitive performance on diverse visual reasoning tasks from the CLEVR, CATER, and ACRE datasets.
    Abstract Large language models have recently shown human level performance on a variety of reasoning tasks. However, the ability of these models to perform complex visual reasoning has not been studied in detail yet. A key challenge in many visual reasoning tasks is that the visual information needs to be tightly integrated in the reasoning process. We propose to address this challenge by drawing inspiration from human visual problem solving which depends on a variety of low-level visual capabilities. It can often be cast as the three step-process of ``Look, Remember, Reason'': visual information is incrementally extracted using low-level visual routines in a step-by-step fashion until a final answer is reached. We follow the same paradigm to enable existing large language models, with minimal changes to the architecture, to solve visual reasoning problems. To this end, we introduce rationales over the visual input that allow us to integrate low-level visual capabilities, such as object recognition and tracking, as surrogate tasks. We show competitive performance on diverse visual reasoning tasks from the CLEVR, CATER, and ACRE datasets over state-of-the-art models designed specifically for these tasks.
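
The three-step loop described above can be made concrete with a schematic sketch: a low-level visual routine is queried ("look"), its grounded outputs are appended to a rationale buffer ("remember"), and a final answer is produced from that buffer ("reason"). The stub routines and toy scene below are purely illustrative assumptions; in the paper the low-level steps are surrogate tasks expressed as rationales inside a language model.

```python
def look(scene, query):
    """Stand-in for a low-level visual routine (e.g. object recognition or tracking)."""
    return [obj for obj in scene if query(obj)]

def reason(rationale):
    """Stand-in for the model's final answer over the accumulated rationale."""
    cubes = [step for step in rationale if "cube" in step]
    return f"there are {len(cubes)} large cubes"

scene = [
    {"shape": "cube", "size": "large", "color": "red"},
    {"shape": "cube", "size": "large", "color": "blue"},
    {"shape": "sphere", "size": "small", "color": "red"},
]

rationale = []  # the "remember" buffer of grounded intermediate observations
for obj in look(scene, lambda o: o["shape"] == "cube" and o["size"] == "large"):
    rationale.append(f"found a large {obj['color']} cube")

print(reason(rationale))   # -> "there are 2 large cubes"
```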

Practical and Asymptotically Exact Conditional Sampling in Diffusion Models

  • paper_url: http://arxiv.org/abs/2306.17775
  • repo_url: https://github.com/blt2114/twisted_diffusion_sampler
  • paper_authors: Luhuan Wu, Brian L. Trippe, Christian A. Naesseth, David M. Blei, John P. Cunningham
  • for: This paper is written for the task of conditional generation, specifically for molecular design and protein design.
  • methods: The paper proposes a new method called Twisted Diffusion Sampler (TDS), which is a sequential Monte Carlo (SMC) algorithm that targets the conditional distributions of diffusion models. TDS uses twisting, an SMC technique, to incorporate heuristic approximations without compromising asymptotic exactness.
  • results: The paper shows that TDS provides a computational statistical trade-off, yielding more accurate approximations with many particles but with empirical improvements over heuristics with as few as two particles. Additionally, TDS is applied to motif-scaffolding, a core task in protein design, and outperforms the state of the art on benchmark test cases.
    Abstract Diffusion models have been successful on a range of conditional generation tasks including molecular design and text-to-image generation. However, these achievements have primarily depended on task-specific conditional training or error-prone heuristic approximations. Ideally, a conditional generation method should provide exact samples for a broad range of conditional distributions without requiring task-specific training. To this end, we introduce the Twisted Diffusion Sampler, or TDS. TDS is a sequential Monte Carlo (SMC) algorithm that targets the conditional distributions of diffusion models. The main idea is to use twisting, an SMC technique that enjoys good computational efficiency, to incorporate heuristic approximations without compromising asymptotic exactness. We first find in simulation and on MNIST image inpainting and class-conditional generation tasks that TDS provides a computational statistical trade-off, yielding more accurate approximations with many particles but with empirical improvements over heuristics with as few as two particles. We then turn to motif-scaffolding, a core task in protein design, using a TDS extension to Riemannian diffusion models. On benchmark test cases, TDS allows flexible conditioning criteria and often outperforms the state of the art.
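
Underlying TDS is a sequential Monte Carlo loop in which particle weights are "twisted" toward the conditioning information and particles are resampled when the effective sample size drops. The sketch below shows that generic loop with a Gaussian random-walk proposal and a toy quadratic twisting term; these choices and all constants are assumptions, since the actual twisting functions in TDS are derived from the diffusion model itself.

```python
import numpy as np

rng = np.random.default_rng(0)
num_particles = 8
target_y = 2.0                                    # conditioning information (e.g. an observation)

particles = rng.normal(size=num_particles)        # current particle states
log_w = np.zeros(num_particles)                   # log importance weights

for step in range(5):
    # propagate each particle with the (unconditional) transition kernel
    particles = particles + 0.1 * rng.normal(size=num_particles)
    # twist the weights toward the conditioning information (toy quadratic potential)
    log_w += -0.5 * (particles - target_y) ** 2
    # normalize and resample when the effective sample size collapses
    w = np.exp(log_w - log_w.max()); w /= w.sum()
    ess = 1.0 / np.sum(w ** 2)
    if ess < num_particles / 2:
        idx = rng.choice(num_particles, size=num_particles, p=w)
        particles, log_w = particles[idx], np.zeros(num_particles)

w = np.exp(log_w - log_w.max()); w /= w.sum()
print("weighted mean estimate:", float(np.sum(w * particles)))
```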

Precision Anti-Cancer Drug Selection via Neural Ranking

  • paper_url: http://arxiv.org/abs/2306.17771
  • repo_url: https://github.com/ninglab/drugranker
  • paper_authors: Vishal Dey, Xia Ning
  • for: proposing a neural ranking approach, driven by large-scale high-throughput screening data, for quickly and accurately prioritizing drugs for a given cancer cell line.
  • methods: two neural listwise ranking methods, List-One and List-All; both follow a listwise ranking formulation, but List-All considers all sensitive drugs rather than only the single most sensitive one (see the sketch below).
  • results: List-All outperforms the best baseline with improvements of up to 8.6% in hit@20 across 50% of test cell lines; analyses show the learned latent spaces exhibit informative clustering structure and capture relevant biological features, and a comprehensive empirical evaluation compares the different methods.
    Abstract Personalized cancer treatment requires a thorough understanding of complex interactions between drugs and cancer cell lines in varying genetic and molecular contexts. To address this, high-throughput screening has been used to generate large-scale drug response data, facilitating data-driven computational models. Such models can capture complex drug-cell line interactions across various contexts in a fully data-driven manner. However, accurately prioritizing the most sensitive drugs for each cell line still remains a significant challenge. To address this, we developed neural ranking approaches that leverage large-scale drug response data across multiple cell lines from diverse cancer types. Unlike existing approaches that primarily utilize regression and classification techniques for drug response prediction, we formulated the objective of drug selection and prioritization as a drug ranking problem. In this work, we proposed two neural listwise ranking methods that learn latent representations of drugs and cell lines, and then use those representations to score drugs in each cell line via a learnable scoring function. Specifically, we developed a neural listwise ranking method, List-One, on top of the existing method ListNet. Additionally, we proposed a novel listwise ranking method, List-All, that focuses on all the sensitive drugs instead of the top sensitive drug, unlike List-One. Our results demonstrate that List-All outperforms the best baseline with significant improvements of as much as 8.6% in hit@20 across 50% test cell lines. Furthermore, our analyses suggest that the learned latent spaces from our proposed methods demonstrate informative clustering structures and capture relevant underlying biological features. Moreover, our comprehensive empirical evaluation provides a thorough and objective comparison of the performance of different methods (including our proposed ones).
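
The listwise formulation can be illustrated with the ListNet top-one objective, which the proposed List-One method builds on: predicted drug scores for a cell line are turned into a distribution and matched to the distribution induced by the measured responses. The toy scores and relevance values below are assumptions; the actual methods learn latent drug and cell-line representations with a learnable scoring function.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def listwise_loss(pred_scores, true_relevance):
    """Cross-entropy between the top-one probability distributions induced by the
    ground-truth relevance and by the predicted scores (ListNet top-one loss)."""
    p_true = softmax(true_relevance)
    p_pred = softmax(pred_scores)
    return float(-np.sum(p_true * np.log(p_pred + 1e-12)))

# five candidate drugs for one cell line: higher relevance = more sensitive
true_relevance = np.array([3.0, 0.5, 2.0, 0.1, 1.0])
pred_scores = np.array([2.5, 0.0, 1.0, 0.2, 0.8])
print("listwise loss:", round(listwise_loss(pred_scores, true_relevance), 4))
```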

The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit

  • paper_url: http://arxiv.org/abs/2306.17759
  • repo_url: None
  • paper_authors: Lorenzo Noci, Chuning Li, Mufan Bill Li, Bobby He, Thomas Hofmann, Chris Maddison, Daniel M. Roy
  • for: studying the covariance matrix of representations, a proxy for trainability in deep learning theory, in order to examine the trainability of Transformers.
  • methods: a modified Softmax-based attention model with skip connections, studied in the proportional limit of infinite depth and width (see the sketch below).
  • results: at initialization, the limiting distribution is described by a stochastic differential equation (SDE) indexed by the depth-to-width ratio; modifying the Transformer's attention mechanism controls the scale of the drift and diffusion so that the network admits a stable SDE, avoiding the rank-degeneracy problem of deep attention models.
    Abstract In deep learning theory, the covariance matrix of the representations serves as a proxy to examine the network's trainability. Motivated by the success of Transformers, we study the covariance matrix of a modified Softmax-based attention model with skip connections in the proportional limit of infinite-depth-and-width. We show that at initialization the limiting distribution can be described by a stochastic differential equation (SDE) indexed by the depth-to-width ratio. To achieve a well-defined stochastic limit, the Transformer's attention mechanism is modified by centering the Softmax output at identity, and scaling the Softmax logits by a width-dependent temperature parameter. We examine the stability of the network through the corresponding SDE, showing how the scale of both the drift and diffusion can be elegantly controlled with the aid of residual connections. The existence of a stable SDE implies that the covariance structure is well-behaved, even for very large depth and width, thus preventing the notorious issues of rank degeneracy in deep attention models. Finally, we show, through simulations, that the SDE provides a surprisingly good description of the corresponding finite-size model. We coin the name shaped Transformer for these architectural modifications.
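
A rough numpy sketch of the attention modification the abstract describes: the softmax logits are damped by a width-dependent temperature and the softmax output is re-centered at the identity, so that at initialization attention stays close to a residual (identity) map. The specific temperature (1/sqrt(d)) and the uniform 1/n centering term here are assumptions for illustration; the paper's exact scaling may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 32                       # sequence length, head width

Q = rng.normal(size=(n, d)) / np.sqrt(d)
K = rng.normal(size=(n, d)) / np.sqrt(d)

def softmax(x, axis=-1):
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)

tau = 1.0 / np.sqrt(d)             # width-dependent temperature (assumed form)
A_soft = softmax(tau * (Q @ K.T))  # damped softmax attention

# center the softmax output at the identity: identity plus a zero-mean perturbation
A_shaped = np.eye(n) + (A_soft - np.ones((n, n)) / n)

print("row sums:", A_shaped.sum(axis=1))                       # rows still sum to 1
print("distance from identity:", np.linalg.norm(A_shaped - np.eye(n)))
```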

TD Convergence: An Optimization Perspective

  • paper_url: http://arxiv.org/abs/2306.17750
  • repo_url: None
  • paper_authors: Kavosh Asadi, Shoham Sabach, Yao Liu, Omer Gottesman, Rasool Fakoor
  • for: studying the convergence behavior of the temporal-difference (TD) learning algorithm.
  • methods: analyzing TD through an optimization lens, showing that TD can be viewed as an iterative optimization algorithm in which the function to be minimized changes at every iteration (see the sketch below).
  • results: by investigating TD's divergence on a classical counterexample, two forces are identified that determine whether the algorithm converges or diverges; in the linear TD setting with quadratic loss, convergence is shown to hinge on the interplay between these two forces, and the result is extended to a much broader setting.
    Abstract We study the convergence behavior of the celebrated temporal-difference (TD) learning algorithm. By looking at the algorithm through the lens of optimization, we first argue that TD can be viewed as an iterative optimization algorithm where the function to be minimized changes per iteration. By carefully investigating the divergence displayed by TD on a classical counter example, we identify two forces that determine the convergent or divergent behavior of the algorithm. We next formalize our discovery in the linear TD setting with quadratic loss and prove that convergence of TD hinges on the interplay between these two forces. We extend this optimization perspective to prove convergence of TD in a much broader setting than just linear approximation and squared loss. Our results provide a theoretical explanation for the successful application of TD in reinforcement learning.
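
The object of study above is the classical TD(0) update with linear function approximation. The sketch below runs it on a two-state Markov reward process and compares the result against the exact values; the toy chain, features, and step size are assumptions for illustration. Note how the bootstrapped target r + gamma*V(s') itself depends on the current weights, which is the "changing objective" the paper analyzes.

```python
import numpy as np

rng = np.random.default_rng(0)

P = np.array([[0.5, 0.5],          # transition matrix of a 2-state Markov reward process
              [0.2, 0.8]])
r = np.array([1.0, 0.0])           # expected reward received in each state
phi = np.array([[1.0, 0.0],        # one feature vector per state (here: tabular features)
                [0.0, 1.0]])
gamma, alpha = 0.9, 0.05

w = np.zeros(2)                    # linear value-function weights: V(s) ~ phi[s] @ w
s = 0
for t in range(5000):
    s_next = rng.choice(2, p=P[s])
    # TD error uses a bootstrapped target that depends on the current weights
    td_error = r[s] + gamma * phi[s_next] @ w - phi[s] @ w
    w += alpha * td_error * phi[s]
    s = s_next

v_true = np.linalg.solve(np.eye(2) - gamma * P, r)   # exact value function for comparison
print("TD estimate:", w, " true values:", v_true)
```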